How do programs change over time?

Jason Dagit and I recently wrote a paper that was accepted at the “DChanges 2013” workshop in Florence, Italy this September.  Alas, I didn’t get to attend since I had to be at another conference at the same time.  Jason had to endure a trip to Italy solo – which, I’m sure, was a huge burden.  :-)  That said, this is a bit of work that I am quite excited to share with the world.

The paper, entitled “Identifying Change Patterns in Software History”, is available on the arXiv.  The final version isn’t much different (the reviewers liked it and didn’t request many changes), and fortunately the workshop has an open access policy for its publications, so the final version will be available in the wild in the not-too-distant future and shouldn’t be hard to turn up.

The problem we looked at is simply stated: what does the change history of a piece of software, as represented in a source control system, tell us about what people did to it over time?  Anyone who has worked on a project for any substantial amount of time knows that working on code isn’t dominated by adding new features – it is mostly an exercise in cleaning up and repairing flaws, reorganizing code to make it easier to extend later, and adding things that make the code more robust.  While making these changes, I have often found that I do similar things over and over – add a null pointer check here, rearrange loop counters there, add parameters to functions elsewhere.  Odds are, if you asked a programmer “what did you have to do to address issue X in your code?”, they would describe a pattern rather than an explicit list of specific changes – something like “we had to add a status parameter and tweak loop termination tests”.

This led us to ask: can we identify the patterns that the programmers had in mind – the ones they would describe in response to the question “what did you do?” – simply by looking at the changes stored in a version control system?  While easily stated, the real question our paper addresses is how one actually goes about doing this.

We started with some work I had developed as part of a Dept. of Energy project called “COMPOSE-HPC”, where we built infrastructure to manipulate programs in their abstract syntax form via a generic text representation of their syntax trees.  The representation we chose was the Annotated Term (ATerm) format used by the Stratego/XT and Spoofax projects.  A benefit of the ATerm form is that it separates the language parser from the analyzer – parsing takes place in whatever compiler front end is available, and all we require is a traversal of the resulting parse tree or abstract syntax tree that emits terms conforming to the ATerm format.

Instead of reproducing the paper here, I recommend that interested readers download it and read it over.  I believe the slides will be posted online at some point so you can see the presentation that Jason gave.  The short story is that, given the generic ATerm representation of a program, we combined a structural differencing algorithm (specifically, the one discussed in this paper) with the well-known anti-unification algorithm to identify change points and distill the abstract pattern represented by each change.  The details in the paper that aren’t reproduced here include how we defined a similarity metric for clustering changes, so that anti-unification could be applied over groups of similar changes to yield meaningful patterns.  To show the idea at work, we used the existing Haskell language-java parser and wrote a small amount of code to emit an ATerm representation that could be analyzed.  We applied it to two real open source repositories – the one for the ANTLR parser project and the one for the Clojure compiler.  It was satisfying to apply the technique to real repositories instead of contrived toy repositories – the fact that the idea didn’t fall over when faced with the complexity and size of real projects suggested that we had something of real interest to share with the world.
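
To give a flavor of the anti-unification step, here is a minimal sketch over a toy term type: where two terms agree structurally, the structure is kept; where they disagree, a metavariable is introduced.  This is only an illustration of the general idea – the Term type, names, and example terms below are invented for this post, and the implementation described in the paper additionally handles the ATerm representation, reuse of metavariables, and the differencing and clustering machinery that sits on top.

import Control.Monad (zipWithM)
import Control.Monad.State (State, evalState, get, put)

-- Toy term type: a labeled node with children, or a metavariable
-- introduced during generalization.
data Term = Node String [Term] | Var Int
  deriving (Eq, Show)

-- Anti-unify two terms: keep the structure where they agree, and
-- introduce a fresh metavariable where they differ.
antiUnify :: Term -> Term -> State Int Term
antiUnify (Node f xs) (Node g ys)
  | f == g && length xs == length ys = Node f <$> zipWithM antiUnify xs ys
antiUnify _ _ = do
  i <- get
  put (i + 1)
  return (Var i)

-- Two similar edits (a null check added on "p" in one place and on "q"
-- in another) generalize to a pattern with a hole where they differed:
--   Node "If" [Node "IsNull" [Var 0], Node "Return" []]
examplePattern :: Term
examplePattern = evalState (antiUnify t1 t2) 0
  where
    t1 = Node "If" [Node "IsNull" [Node "p" []], Node "Return" []]
    t2 = Node "If" [Node "IsNull" [Node "q" []], Node "Return" []]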

The code for this project will be up on github in the not too distant future – I will update this post when it becomes available.

Finding dents in a blobby shape

Recently someone on a forum I read on researchgate.net posted a question: given some blobby shapes like:

[figure: example blobby shapes with indentations]

How do we find the number of indentations in each shape?  A few years ago I was doing work with microscope slide images looking at cells in tissue, and similar shape analysis problems arose when trying to reason about the types of cells that appeared in the image.

Aside from any specific context (like biology), counting the dents in the black and white blobs shown above looked like an interesting little toy problem that wouldn’t be too hard to solve for simple shapes.  The key idea is to interpret the boundary of the shape as a parametric curve along which we can approximate the curvature.  When the sign of the curvature changes (i.e., when the normal to the curve switches from pointing into the shape to pointing out of it, or vice versa), we can infer that an inflection point has occurred, corresponding to a dent beginning or ending.

The tools we need to solve this are:

  • A way to find a sequence of points (x,y) that represent a walk around the boundary of the shape,
  • A way to compute the derivative and second derivative of that boundary to compute an approximation of the curvature of the boundary at each point,
  • A method for identifying dents as changes in the curvature value.

In Matlab, most of these tools are readily available.  The bwboundaries function takes a binary image containing the shape and produces the (x,y) sequence that forms the set of points around the boundary, in order.  The fast Fourier transform can then be used to turn this sequence into a trigonometric polynomial and to compute the derivatives needed to approximate the curvature at each point along the boundary.

Working backwards, our goal is to be able to compute the curvature at any point along the boundary of the shape.  We will be working with a parametric function defined as x=x(t) and y=y(t), which allows us to compute the curvature via:

\kappa = \frac{x'y''-y'x''}{\left( {x'}^2 + {y'}^2 \right)^\frac{3}{2}}

This means we need to have a way to compute the derivatives of the parametric function x’, x”, y’, and y”. It turns out this is pretty straightforward if we employ the Fourier transform. A useful property of the Fourier transform is its relationship to derivatives of functions. Specifically, given a function x(t), the following property holds:

\mathcal{F} \left[ \frac{d^n}{dt^n} x(t) \right] = (i \omega)^n X(i \omega)

where:

X(i \omega) = \mathcal{F} [x(t)]

That relationship happens to be quite convenient, since it means that if we can take the Fourier transform of our parametric function, then the derivatives come with little effort. We can perform some multiplications in the frequency domain and then use the inverse Fourier transform to recover the derivative.

x^{(n)}(t) = \mathcal{F}^{-1}\left[ (i \omega)^n X(i \omega) \right]

Armed with this information, we must first obtain the sequence of points along the shape boundary that represent discrete samples of the parametric function that defines the boundary curve.  Starting with a binary image with just a few dents in the shape, we can extract this with bwboundaries.

% trace the boundary of the binary image using 8-connectivity; with no
% holes, the first boundary returned is the outer boundary of the shape
[b,~] = bwboundaries(im, 8);
b1 = b{1};

% bwboundaries returns (row, column) pairs, i.e., (y, x)
x = b1(:,2);
y = b1(:,1);

This function returns a cell array in the event that more than one boundary is found (e.g., if the shape has holes). In the images considered here there are no holes, so the first element of the cell array is used and the rest (if any are present) are ignored. The x coordinates are taken from the second column of b1, and the y coordinates from the first column.  Now we want to head into the frequency domain.

xf = fft(x);
yf = fft(y);

As discussed above, differentiation via the FFT is just a matter of scaling the Fourier coefficients. More detail on how this works can be found in this document as well as this set of comments on Matlab central.
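
Concretely, the inverse DFT writes the n-point boundary sequence x as a sum of complex exponentials, so differentiating with respect to the sample index t simply scales each frequency bin:

x(t) = \frac{1}{n}\sum_{k} X_k e^{2 \pi i k t / n} \quad \Rightarrow \quad x'(t) = \frac{1}{n}\sum_{k} \frac{2 \pi i k}{n} X_k e^{2 \pi i k t / n}

where k runs over the positive and negative frequencies represented by the FFT bins (the negative frequencies live in the upper half of the FFT output). The first-derivative multiplier for bin k is therefore 2 \pi i k / n, and the second-derivative multiplier is its square, -(2 \pi k / n)^2. The ftdiff and ftddiff vectors constructed below hold exactly these per-bin multipliers.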

nx = length(xf);
hx = ceil(nx/2)-1;
% first derivative multipliers: i*2*pi*k/nx for the positive frequencies,
% mirrored with a sign flip for the negative frequencies in the upper half
ftdiff = (2i*pi/nx)*(0:hx);
ftdiff(nx:-1:nx-hx+1) = -ftdiff(2:hx+1);
% second derivative multipliers: the square of the first derivative
% multipliers, -(2*pi*k/nx)^2, which is symmetric in k
ftddiff = ((2i*pi/nx)*(0:hx)).^2;
ftddiff(nx:-1:nx-hx+1) = ftddiff(2:hx+1);

Before we continue, we have to take care of pixel effects that will show up as unwanted high frequency components in the FFT. If we look closely at the boundary that is traced by bwboundaries, we see that it is jagged due to the discrete pixels. To the FFT, this looks like a really high frequency component of the shape boundary. In practice though, we aren't interested in pixel-level distortions of the shape - we care about much larger features, which lie in much lower frequencies than the pixel effects. If we don't deal with these high frequency components, we will see oscillations all over the boundary, and a resulting huge number of places where the curvature approaches zero. In the figure below, the red curve is the discrete boundary defined by bwboundaries, and the green curve is the boundary after low pass filtering.

[figure: close-up of the boundary traced by bwboundaries (red) and the low pass filtered boundary (green)]

We can apply a crude low pass filter by simply zeroing out frequency components above a cutoff chosen arbitrarily. There are better ways to perform filtering, but they aren't needed for this post. In this case, we preserve only the 24 lowest frequency components. Note that we preserve both ends of the FFT since the spectrum of a real signal is symmetric, and we preserve one extra value at the lower end of the sequence since the first element is the DC offset rather than the lowest frequency component.


xf(25:end-24) = 0;
yf(25:end-24) = 0;

The result looks reasonable.

[figure: the full shape boundary after low pass filtering]

Finally, we can apply our multipliers to compute the first and second derivatives of the parametric function describing the shape boundary. We keep only the real part of the inverse FFT, since the boundary coordinates are real valued.


% use .' (transpose without conjugation) so the complex multipliers are not
% conjugated when reshaped to match the column vectors xf and yf
dx = real(ifft(xf.*ftdiff.'));
dy = real(ifft(yf.*ftdiff.'));
ddx = real(ifft(xf.*ftddiff.'));
ddy = real(ifft(yf.*ftddiff.'));

Here we can see the derivatives plotted along the curve. The blue arrows are the first, and the cyan arrows are the second derivative.

[figure: first derivative (blue) and second derivative (cyan) vectors plotted along the boundary]

Finally, we can compute our approximation for the curvature.


% unsigned curvature: the magnitude of the usual signed expression
k = abs(ddy.*dx - ddx.*dy) ./ ((dx.^2 + dy.^2).^(3/2));

[figure: curvature estimate along the boundary]

The last step is to identify places where the curvature approaches zero. Ideally, we'd work with the signed curvature, identifying inflection points by watching the normal vector to the boundary switch from pointing outward to pointing inward and vice versa. The approximation above is not signed, so we rely on another little hack and make the following assumption: the shape we are dealing with never has flat edges where the curvature is zero. Obviously this isn't a good assumption in the general case, but it is sufficient to demonstrate the technique here. To do this properly, we would compute the signed curvature (drop the absolute value above and keep the sign of x'y'' - y'x'') and look for sign changes.

Instead, we look for places where the curvature gets very close to zero and assume those are our inflection points. Often more than one point near an inflection exhibits near-zero curvature, so we group consecutive near-zero points into runs and count the runs rather than the individual points. For each dent, we should see two inflection points – one where the boundary enters the dent, and one where it leaves.

% points where the curvature is (nearly) zero
thresh = 0.001;
pts = find(k < thresh);
% gaps in the index sequence separate distinct runs of near-zero points;
% each dent contributes two such runs (entering and leaving the dent)
dpts = diff(pts);
n = (length(find(abs(dpts) > 1))+1)/2;

For the example images above, the code yields the correct number of dents (2, 7, and 3).

[figure: results for the example shapes]

Before leaving, it's worth noting some issues with this approach.  First, the assumption that the curvature dipping toward zero means the signed curvature would have changed sign is definitely not valid in general.  Second, the filtering process is damaging to dents that are "sharp" - it has the effect of rounding them off, which could cause problems.

On the other hand, working with the boundary as a parametric curve and computing derivatives using the Fourier transform does buy us robustness since we stay relatively true to a well defined (in a mathematical sense) notion of what a dent in the shape is.

[NOTE: After posting, I discovered that some of the images were corrupted and have since replaced them with JPG equivalents. The quality inline is not as good as I'd like, but I am having issues exporting my Matlab plots to PNG.]

Updates coming for out-of-date public code

For anyone who has read any of my blog posts from the last couple of years and tried to run the code unsuccessfully, this is my fault.  I didn't include proper cabal files with each program to pin the versions of the libraries they depend on, so nothing stops you from unsuccessfully attempting to build them against the latest and greatest library versions on Hackage, a few of which have changed in API-incompatible ways.  This will be repaired in the not-too-distant future.  In the meantime, the safest way to make sure an example builds is to look at the blog posting date and install the versions of the libraries from Hackage that were current around that time.  As far as I know, the only major problem dependency is the Gloss visualization library, which underwent a few minor API changes that broke a number of my examples.
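
For illustration, a minimal cabal file along the lines of what each example should have shipped with might look like the sketch below.  The package name, executable name, and version bounds are placeholders I made up for this post – the bounds that actually apply to a given example depend on its posting date.

name:                blog-example
version:             0.1
build-type:          Simple
cabal-version:       >= 1.10

executable blog-example
  main-is:           Main.hs
  -- the gloss bound is a placeholder; pick the range matching the post date
  build-depends:     base >= 4 && < 5,
                     gloss >= 1.5 && < 1.6
  default-language:  Haskell2010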

2012 Summer School on Geometry and Data

This summer in Moscow, Idaho, I will be participating in a summer school focused on mathematical methods in data analysis from the geometric perspective.  The title is the “2012 Summer School on Geometry and Data.”  Specifically, from the description of the program:

The topic of the summer school will be research at the intersection of geometric measure theory and geometric analysis with data analysis, especially in the presence of uncertainty.

This should be very interesting, as it is often the case that finding structure and information in a large pile of data is a question of understanding its fundamental geometric properties.  I’ll be participating as one of the people who comes at the problem from the computational side.  Check out the page for the summer school here, which has links to the poster for the session as well as a document that summarizes the topics to be covered.  Students can find information on applying on the page as well.

Simulation, Dynamics, and the FPU Problem

I recently wandered by Powell’s Technical Books (one of the many perks of working in Portland is having this nearby) and left with the book “The Genesis of Simulation in Dynamics: Pursuing the Fermi-Pasta-Ulam Problem”. I think it’s a pretty interesting book relative to others that I’ve picked up on dynamics and chaotic systems. The book revolves around one problem and follows its development since it was first posed in the 1950s, as the field of dynamics unfolded around it. I figured I’d give a quick description of the problem that the book discusses and perhaps interest others in grabbing the book.

The model that the book discusses is known as the Fermi-Pasta-Ulam (or FPU) problem. It is quite simple. Consider a 1D string that is represented as a sequence of line segments connected at n vertices. The connections between vertices are basically springs that obey Hooke’s law. In the model, the focus is on the displacement of these vertices over time. For the case where all vertices are equally spaced, the string doesn’t move at all since it is in an equilibrium state. When they are not uniformly spaced, interesting things start happening. The following video shows the case where the vertices are displaced based on a sine wave.

(Note: the slowing and speeding up of the video is artificial and not part of the model, but is due to variations in compute time per frame on my computer since I recorded this as a screen capture).

The model is defined by tracking the second derivative of the position of the vertices. The unperturbed linear model is described as:

\ddot{q}_j = (q_{j+1}-q_j) + (q_{j-1}-q_j)

Adding in a perturbation that is nonlinear is what introduces the interesting behavior:

\ddot{q}_j = (q_{j+1}-q_j) + (q_{j-1}-q_j) + \alpha \left[(q_{j+1}-q_j)^2 - (q_j - q_{j-1})^2\right]

The parameter alpha allows us to control the amount that this nonlinear term contributes to the evolution of the system. In my example movies, I let alpha be somewhere around 0.5. In Matlab, this is easily written as:

function fpu(n,alpha,s,dt,iters)
  % initial positions for n points on the line 
  % and two additional points for the fixed ends
  q = (0:n+1)./(n+1);
  qrest = (0:n+1)./(n+1);

  % velocities start as zero
  qvel = zeros(1,n+2);
  
  % perturbation added with a sinusoid
  q = q + 0.1.*sin(s.*q .* pi);

  % make sure ends are fixed at 0 and 1
  q(n+2) = 1; q(1) = 0;
  
  for i=1:iters
    % first term of q"
    qterm1 = q(3:end)-q(2:end-1);
    % second term of q"
    qterm2 = q(1:end-2)-q(2:end-1);

    % qacc is q".  note that (a-b)^2 = (-(b-a))^2
    qacc = qterm1 + qterm2 + alpha .* (qterm1.^2 - qterm2.^2);

    % velocity is velocity + acc
    qvel(2:end-1) = qvel(2:end-1) + qacc;
    
    % position is updated by velocity * time step
    q(2:end-1) = q(2:end-1) + qvel(2:end-1).*dt;
  end
end

Adding a few strategic plot commands at the end of that loop lets you visualize the system and reproduce the motion shown in the video above.

The goal of the model originally was to look at how the energy in the system moved between the different vibrational modes of the system. As can be seen pretty clearly in the video above, over time, the single low frequency sinusoid that the system starts with evolves into a more complex motion where higher frequency modes gain energy and contribute more to the state of the system.

We can look at the energy in each mode by writing the Hamiltonian of the system as:

H_0(a,\dot{a}) = \frac{1}{2}\sum_{k=1}^n(\dot{a}_k^2 + \omega_k^2a_k^2)

where:

\omega_k = 2 sin \left(\frac{k \pi}{2(n+1)}\right)

So, if we compute each component of the sum and look at them individually (instead of looking at the overall energy of the system represented by H_0), we can see how the different modes contribute to the overall energy of the system. The code above can be easily modified to add this computation after the update to the q vector:

    % mode amplitudes: project the displacements onto the normal modes
    for k=1:n
        a(i,k) = sqrt(2/(n+1)) * sum(q(2:end-1).*sin((1:n)*k*pi/(n+1)));
        % mode velocities: the same projection applied to the velocities
        ap(i,k) = sqrt(2/(n+1)) * sum(qvel(2:end-1).*sin((1:n)*k*pi/(n+1)));
    end
    % mode frequencies: omega_k = 2 sin(k*pi/(2(n+1)))
    omega = 2*sin((1:n)*pi/(2*(n+1)));
    % energy in each mode, from the Hamiltonian above
    h(i,:) = 0.5 .* (ap(i,:).^2 + omega.^2 .* a(i,:).^2);

Here is a look at the energy of modes 2-8 for the first 30000 iterations of the Matlab code:

As we can see, over time some of the higher frequency modes begin to gain more energy. This is apparent in the video as well.

Just for fun, we can also look at an instance of the problem where we intentionally start off with a perturbation. In this case, we start off with initial conditions that cause the right-most link in the chain to be stretched further than the rest, leading to a wave that moves back and forth down the string. The impact of this on the energy distribution is interesting as well. This video was created using the code above, but instead of passing a nice integer value for the parameter s (like 1.0), I passed in a value like s=1.3.

Looking at the plot of the energy in each mode, we see that now things are less regular. For example, look at the plot for mode 3 - instead of being a nice smooth peak, the shape of the peaks over time changes. This plot spans the first 3000 iterations of the model for this case, allowing us to look a bit more carefully at the behavior at a fine time scale.

Overall, I think this problem is an interesting one to play with and use to learn a little about dynamical systems. I'm not done with the book yet, but so far it is pretty interesting. Fortunately, these days we have computers far faster than those that Fermi and friends had, so it is possible to casually play with problems like this that were quite challenging to work on back in the earliest days of computing.

Concurrency in Languages Book

Introduction to Concurrency in Programming Languages: The Book

[figure: book cover]

Over the last few years, thanks to multicore and manycore processor trends, concurrency and parallelism have been written about a great deal. For many programmers, books still play an important role relative to blogs or magazines simply due to the level of detail and breadth that can fit in a document with a generous page budget that is rigorously edited and technically reviewed. As some readers may have noticed in the sidebar next to the posts (you probably can’t see the link if you get my posts via an RSS reader), I’m an author of one of these books, and I wanted to talk a little about it here. I recently found out that the book is offered in a Kindle for Mac or PC edition at a fraction of the price of the original, so it seemed worth discussing given the lower cost now available to potential readers.

The book came out and was widely available in early 2010, and it has been selling as well as I could hope with essentially no marketing. My hope is that getting the word out via an article here will lead interested people to notice this project, which took about two years to put together with my co-authors Tim Mattson of Intel and Craig Rasmussen of LANL (I started the book with Craig while I was still working at LANL).

Why did we write the book?

Our goal in writing this book was to provide a resource for people interested in concurrency and parallelism that takes a somewhat different approach to the topic than other books out there. As I discuss in more detail below, we made a point of covering the topic without adopting a specific implementation technique or programming model, with a large dose of historical context and a grounding in generally applicable patterns that appear in concurrent code. I believe it is important to learn the topic through a general conceptual foundation rooted in the history of how today’s technology evolved from prior efforts over the past 30-40 years.

In researching the existing books in the area when I started writing, I asked myself “why don’t I find these books totally satisfying?”. The most common answer I came up with was that the books were often heavily rooted in one way of implementing concurrent or parallel programs. For example, it is not uncommon to find a book that discusses the topic solely in the context of MPI, or OpenMP, or Java, or (insert your favorite library here). This is quite reasonable for a practitioner who wants to learn the theory of the area in the context of the specific implementation in which they plan to work. For example, I found a book on concurrency in Java (“Java Concurrency in Practice”) to be quite wonderful, especially if you then intend to go write Java code (which, at the time I read it, I did). Similar books exist for other popular systems, and even for lesser-known ones like ZPL (also a very good book).

Unfortunately, mingling general concepts with specific implementations of them often means that some of the generality of the basic concepts is lost. Implementations frequently adopt one particular way of approaching a problem, leading to a skewed view of the broader topic. This isn’t unique to the topic of this book – a similar limited focus arises in other areas, like GUI programming, distributed programming, web frameworks, etc. Furthermore, it is quite common for libraries or languages to fall out of favor over time, either due to technology trends or simply a lack of maintenance of the specific tools. For example, the Parallel Virtual Machine library, a one-time competitor to MPI, is no longer a serious contender for parallel programming – leading many of the books that chose it as their implementation of choice to become similarly dusty and inaccessible to readers over time.

So, we embarked on the project with a few high-level goals in mind:

Don’t start with the premise that parallelism and concurrency are intrinsically difficult concepts to grasp.

As was recently pointed out to me in a conversation at a conference, words like concurrent, simultaneous, contention, and so on were in common usage long before computers existed. Unsurprisingly, this is because they come up in our daily lives quite frequently – in activities ranging from cooking a meal (which I use in the book as part of my argument that concurrency is quite natural), to coordinating a meeting with a group of people, to “multitasking” at work. Concurrency is an intrinsic property of the real world, and humans are actually pretty good at managing it when it comes to physical and mental activities. Starting with this premise, and focusing on the technologies we have at hand, we can begin to understand where the difficulties actually lie. How often are they due to limitations of technology and implementation choices, versus inherently challenging properties of a particular problem to be solved?

Choose established, stable languages.

While it was inevitable that we would choose concrete languages to demonstrate examples, I set down strict criteria for the languages we chose. MPI isn’t a language (it’s a library), and it is relatively well covered in other literature. OpenMP, on the other hand, is standardized, is available to many programmers in multiple languages, and resides fairly close to the language itself. Cilk represents a relatively mature extension of C that is very (very!) simple, yet very powerful – it was not surprising to see Intel buy Cilk++ a year or two ago. Erlang was chosen because it was well established and mature, and represented a fundamentally different programming model from the rest. Other declarative languages have been in the news for their concurrency features, but they were still moving targets in 2008 when the bulk of the book was being written. Fortran was originally included, but was cut due to a likely limited audience (the appendix that was cut from the book is available for free on the book web site). I put the appendix on the web site after the book was published since, while not a commonplace language, Fortran 2008 actually represents a fairly modern parallel programming language. This surprises some folks, especially those whose last contact with Fortran was code peppered with capital letters, line numbers, and goto statements. It seems many people are unaware of the 1990, 1995, 2003, and 2008 revisions of the language, which removed much of the arcane cruft and added relatively interesting new features.

Provide a strong, language neutral conceptual basis.

Readers really should learn about concurrency control and the corresponding types of problems that arise in concurrent systems before diving into any specific implementation. This is often given light treatment in books, and students must accumulate the knowledge by fusing concepts that they learn in operating systems, databases, distributed systems, and parallel programming courses. I tried to pull much of this together into a discussion that covers the essentials from all of these topical sources. This was rooted in my observations during a couple-year stay at the University of Oregon as an adjunct faculty member, where I taught parallel programming, operating systems, and distributed systems. Those courses were quite valuable in testing the material that eventually made it into the book, and some of my students from those courses provided great feedback when I was putting the material together (they are acknowledged in the introduction).

Present the history – How did technology evolve to what we see now?

This is probably the most fun part of the book for me, and I really think it is critical for any reader who wants to deeply understand the state of the art. The hot technologies of today are definitely not new – almost all of them derive directly from work performed in the past, and how those past efforts evolved into what we see now is very informative. History tells us what worked, why it worked, and what didn’t work. History also shows us technologies that fell out of popularity due to trends in industry or technology, yet are relevant again because hardware has come full circle back to concepts invented decades ago. For example, GPUs bear a strong resemblance to vector machines and massively parallel computers from the 1980s, and unsurprisingly, the programming models popular today from NVidia and others are extremely similar to those developed twenty or thirty years ago. Programmers are not immune from the old saying, “Those who cannot remember the past are condemned to repeat it” – knowing the history of your technologies makes you a wiser software engineer and developer.

About nine months into working on the book, we asked Tim to join us to help ground the book in terms of how we discussed and demonstrated the concepts. The original plan was to have the later third to half of the book focus on demonstrating concepts in the context of his parallel design patterns – so, why not ask him to help out? I was quite happy when he agreed. This led to the next goal:

Present examples in a framework that programmers can apply generally.

The later chunk of the book is focused on applying what we discuss early on through a series of examples. Instead of laying chapters out in a specialized fashion (e.g., chapter on a web server, chapter on a parallel image processing algorithm, etc…), we chose instead to frame the examples in terms of a set of generalizable patterns that emerge in concurrent and parallel programs. By structuring the book this way, presenting concrete examples in the context of a generalizable framework for thinking about programming, we hoped to provide the reader with knowledge that was more easily applicable to problems they will face themselves.

Concluding remarks

Two years on from finishing, I am glad we chose to write the book the way we did instead of falling victim to the temptation to focus either on technologies that may be passing fads or on unstable and ever-changing research platforms. I think the book should have, forgive the pun, a long shelf life.

I do hope you enjoy the book if you choose to grab a copy, and we are quite receptive to feedback on ways the book could be improved if we ever get around to releasing a second edition. We maintain an errata list on the book web page, along with a set of source code examples, pointers to a variety of concurrent and parallel programming languages, and slides for those who wish to use the book in a classroom setting.

For those interested in more, here are some links:

- Table of contents (PDF)
- Book web page
- Amazon
- CRC Press book page

And don’t take my word on the book – we’ve had a number of positive reviews!  I keep a set of links to book reviews here, and currently have four up – from ACM Computing Reviews, the Times Higher Education Supplement, InsideHPC, and the Scientific Programming journal.

Harmonic Motion Simulation

(Edit: This article has been revised to fix a problem with the original in which the model didn’t properly reproduce the video of the real system. That’s all fixed now.)

The model described in this post is pretty simple. Recently I found a pointer to this page that demonstrated harmonic motion with a collection of pendulums of different lengths. The apparatus shown here is based on a system described by Richard Berg in the article “Pendulum waves: A demonstration of wave motion using pendula” from the American Journal of Physics, Volume 59, Issue 2, 1991. Here is the video from the Harvard page showing the apparatus at work:

It seemed like an interesting and simple physical system, so I decided to see if I could replicate it computationally. This is fairly basic physics – the system is probably early-undergraduate-level physics and math, right around the time you first encounter non-linear ordinary differential equations.

To start, let’s remember what the equation of motion is for a damped pendulum (using Newton’s notation for derivatives):

\ddot{\theta} + \gamma \dot{\theta} + \frac{g}{l} sin(\theta) = 0

In this equation, \gamma is the damping coefficient, g is gravitational acceleration, and l is the length of the pendulum. In the video shown at the end, we assume that the damping coefficient is zero, so our little computational pendulum exists in a vacuum. If you download the code and set it to a non-zero value, you will see a nice damping out of the motion over time as one would expect in reality due to air resistance.

In the simulation, the variable we are concerned with is \theta:

The second order differential equation that describes this system isn’t easily solvable directly, so we resort to methods from numerical analysis to attack it. In this case, it is common to employ a fourth order Runge-Kutta scheme (see this document for a derivation of the Runge-Kutta scheme from the second order equation). To start, we rewrite the equation as a pair of first order equations:

\omega = \dot{\theta}
\dot{\omega} = -\gamma \omega - \frac{g}{l}sin(\theta)

Now that we have the system in terms of simultaneous first order equations, we can use the Runge-Kutta scheme to define a sequence of \theta and \omega values that are easily computable as:

\theta_{n+1} = \theta_n + \frac{1}{6}(k_{1a} + 2k_{2a} + 2k_{3a} + k_{4a})
\omega_{n+1} = \omega_n + \frac{1}{6}(k_{1b} + 2k_{2b} + 2k_{3b} + k_{4b})

where the four coefficients for each are defined for a time step \Delta t as:

k_{1a} = \omega_n \Delta t
k_{2a} = (\omega_n + \frac{k_{1b}}{2}) \Delta t
k_{3a} = (\omega_n + \frac{k_{2b}}{2}) \Delta t
k_{4a} = (\omega_n + k_{3b}) \Delta t

And

k_{1b} = f(\theta_n, \omega_n, t) \Delta t
k_{2b} = f(\theta_n + \frac{k_{1a}}{2}, \omega_n + \frac{k_{1b}}{2}, t + \frac{\Delta t}{2}) \Delta t
k_{3b} = f(\theta_n + \frac{k_{2a}}{2}, \omega_n + \frac{k_{2b}}{2}, t + \frac{\Delta t}{2}) \Delta t
k_{4b} = f(\theta_n + k_{3a}, \omega_n + k_{3b}, t + \Delta t) \Delta t

where

f(\theta, \omega, t) = -\gamma \omega - \frac{g}{l}sin(\theta)

At this point, writing the code is really straightforward. First, define f. One change that we will make is that we will also specify the length of the pendulum, l, as a parameter. This won’t impact the derivations above though – it just makes it easier later on to compose together the set of individual pendulums of different lengths to mimic what we see in the video.

g :: Double
g = 9.80665 -- gravitational acceleration = 9.80665m/s^2

gamma :: Double
gamma = 0.0 -- damping coefficient; zero here, per the discussion above

f :: Double -> Double -> Double -> Double -> Double
f theta omega t l = (-gamma)*omega - (g/l)*sin(theta)

Note that the t value isn't used; we keep it for completeness, given that f as formulated above is a function of \theta, \omega, and t. Once we have this, we can define the code that implements one step of the Runge-Kutta solver for a given time step dt:

dt :: Double
dt = 0.01

solve :: (Double, Double, Double, Double) -> (Double, Double, Double, Double)
solve (theta, omega, t, l) =
  (theta + (1/6)*(k1a + 2*k2a + 2*k3a + k4a),
   omega + (1/6)*(k1b + 2*k2b + 2*k3b + k4b),
   t', l)
  where
    t' = t + dt
    k1a = omega * dt
    k1b = (f theta omega t' l) * dt
    k2a = (omega + k1b/2) * dt
    k2b = (f (theta + k1a/2) (omega + k1b/2) (t' + dt/2) l) * dt
    k3a = (omega + k2b/2) * dt
    k3b = (f (theta + k2a/2) (omega + k2b/2) (t' + dt/2) l) * dt
    k4a = (omega + k3b) * dt
    k4b = (f (theta + k3a) (omega + k3b) (t' + dt) l) * dt

Now, the original video that inspired me to spend my Friday evening playing with this model included a set of pendulums of different lengths. The Berg paper describes the lengths as follows: the longest pendulum is intended to complete X oscillations over a time period of T seconds, and each successively shorter pendulum should complete one additional oscillation over that same period. We know that the period of one oscillation of a pendulum of length l is given by:

t = 2 \pi \sqrt{\frac{l}{g}}

From this we can derive an equation for the length of pendulum n (where n runs from 1 to N, for N pendula, with the Nth being the longest): pendulum n must complete X + (N - n) oscillations in T seconds, so its period is T / (X + (N - n)), and substituting that period into the expression above and solving for l gives:

l_n = g\left(\frac{T}{2 \pi (X+(N-n))}\right)^2

In the Berg paper, X is 60 and T is 54 seconds, which we can then implement:

-- lengths n t l: lengths for n pendula, where the longest completes l
-- oscillations in t seconds and each shorter pendulum completes one more
lengths :: Int -> Double -> Double -> [Double]
lengths n t l =
  let thelen curn = (t/((l+((fromIntegral n)-curn)) * 2.0 * pi))**2 * g
  in
      [thelen (fromIntegral i) | i <- [1..n]]

theta0 = pi / 8
omega0 = 0
t      = 0
npendu = 24
starts = map (\i -> (theta0, omega0, t, i)) (lengths npendu 54.0 60.0)

So, every pendulum starts at \theta = \frac{\pi}{8}, and we have 24 pendulums that take on lengths corresponding to those in the paper.

In order to watch the code run and replicate (as best as possible with this simple simulation) the video, I wrapped this code with Gloss to add simple visualization capabilities. The code is pretty simple: I set up some constants to define the window size, and then convert the angle into (x,y) coordinates via (x,y) = (l sin(\theta), -l cos(\theta)) where l is the length of the pendulum being rendered.

-- note: this rendering code assumes the usual Gloss import (import
-- Graphics.Gloss) and double2Float from GHC.Float

-- window parameters
winWidth :: Int
winWidth  = 600

winHeight :: Int
winHeight = 600

-- circle radius
cradius   = 5

-- colors for the pendulum and the string
clr = makeColor 1.0 1.0 0.0 1.0
lineclr = makeColor 0.3 0.3 0.3 0.5

-- render one pendulum
renderPendulum :: (Double, Double, Double, Double) -> Picture
renderPendulum (theta, omega, t, l) =
  let x = double2Float $ l*sin theta
      y = double2Float $ -(l*cos theta)
      twidth = ((fromIntegral winWidth) / 2)-15
      theight = ((fromIntegral winHeight) / 2)-15
  in
    Pictures $ [
      Color lineclr $
        Line [(0,0), (x*twidth,y*theight)],
      Color clr $
        Translate (x*twidth) (y*theight) $
        Circle cradius ]

-- render the list of pendulums
renderPendulums ps = Pictures $ map renderPendulum ps

This is then called from the main function that sets up the initial conditions for each, and then uses the Gloss simulateInWindow function to do the animation. Note that we don't use the time parameter of the simulateInWindow function, and control the time stepping using the dt value defined before.

main :: IO ()
main = do
  let niter = 1000
      theta0 = pi / 8
      omega0 = 0
      t = 0
      npendu = 15
      starts = map (\i -> (theta0, omega0, t, i)) (lengths npendu 54.0 60.0)
  simulateInWindow
    "Pendulums"
    (winWidth, winHeight)
    (1, 1)
    (greyN 0.1)
    120
    starts
    renderPendulums
    (\vp f m -> map solve m)

The result of this is shown in the video below.

If you are interested in playing with this code, you can find it in my public github repository. Enjoy playing with the pendula!