2012 Summer School on Geometry and Data

This summer in Moscow, Idaho, I will be participating in a summer school focused on mathematical methods in data analysis from the geometric perspective.  The title is the “2012 Summer School on Geometry and Data.”  Specifically, from the description of the program:

The topic of the summer school will be research at the intersection of geometric measure theory and geometric analysis with data analysis, especially in the presence of uncertainty.

This should be very interesting, as it is often the case that finding structure and information in a large pile of data is a question of understanding its fundamental geometric properties.  I’ll be participating as one of the people who comes at the problem from the computational side.  Check out the page for the summer school here, which has links to the poster for the session as well as a document that summarizes the topics to be covered.  Students can find information on applying on the page as well.

Advertisements

Simulation, Dynamics, and the FPU Problem

I recently wandered by Powell’s Technical Books (one of the many perks of working in Portland is having this nearby) and left with the book “The Genesis of Simulation in Dynamics: Pursuing the Fermi-Pasta-Ulam Problem“. I think it’s a pretty interesting book relative to others that I’ve picked up on dynamics and chaotic systems. The book revolves around one problem and follows its development since it was first posed in the 1950s as the field of dynamics unfolded around it. I figured I’d give a quick description of the problem that the book discusses here and possibly interest others to grab the book.

The model that the book discusses is known as the Fermi-Pasta-Ulam (or FPU) problem. It is quite simple. Consider a 1D string that is represented as a sequence of line segments connected at n vertices. The connections between vertices are basically springs that obey Hooke’s law. In the model, the focus is on the displacement of these vertices over time. For the case where all vertices are equally spaced, the string doesn’t move at all since it is in an equilibrium state. When they are not uniformly spaced, interesting things start happening. The following video shows the case where the vertices are displaced based on a sine wave.

(Note: the slowing and speeding up of the video is artificial and not part of the model, but is due to variations in compute time per frame on my computer since I recorded this as a screen capture).

The model is defined by tracking the second derivative of the position of the vertices. The unperturbed linear model is described as:

\ddot{q}_j = (q_{j+1}-q_j) + (q_{j-1}-q_j)

Adding in a perturbation that is nonlinear is what introduces the interesting behavior:

\ddot{q}_j = (q_{j+1}-q_j) + (q_{j-1}-q_j) + \alpha \left[(q_{j+1}-q_j)^2 - (q_j - q_{j-1})^2\right]

The parameter alpha allows us to control the amount that this nonlinear term contributes to the evolution of the system. In my example movies, I let alpha be somewhere around 0.5. In Matlab, this is easily written as:

function fpu(n,alpha,s,dt,iters)
  % initial positions for n points on the line 
  % and two additional points for the fixed ends
  q = (0:n+1)./(n+1);
  qrest = (0:n+1)./(n+1);

  % velocities start as zero
  qvel = zeros(1,n+2);
  
  % perturbation added with a sinusoid
  q = q + 0.1.*sin(s.*q .* pi);

  % make sure ends are fixed at 0 and 1
  q(n+2) = 1; q(1) = 0;
  
  for i=1:iters
    % first term of q"
    qterm1 = q(3:end)-q(2:end-1);
    % second term of q"
    qterm2 = q(1:end-2)-q(2:end-1);

    % qacc is q".  note that (a-b)^2 = (-(b-a))^2
    qacc = qterm1 + qterm2 + alpha .* (qterm1.^2 - qterm2.^2);

    % velocity is velocity + acc
    qvel(2:end-1) = qvel(2:end-1) + qacc;
    
    % position is updated by velocity * time step
    q(2:end-1) = q(2:end-1) + qvel(2:end-1).*dt;
  end
end

Adding a few strategic plot commands at the end of that loop lets you visualize it and see the plot I showed above.

The goal of the model originally was to look at how the energy in the system moved between the different vibrational modes of the system. As can be seen pretty clearly in the video above, over time, the single low frequency sinusoid that the system starts with evolves into a more complex motion where higher frequency modes gain energy and contribute more to the state of the system.

We can look at the energy in each mode by writing the Hamiltonian of the system as:

H_0(a,\dot{a}) = \frac{1}{2}\sum_{k=1}^n(\dot{a}_k^2 + \omega_k^2a_k^2)

where:

\omega_k = 2 sin \left(\frac{k \pi}{2(n+1)}\right)

So, if we compute each component of the sum and look at them individually (instead of looking at the overall energy of the system represented by H_0), we can see how the different modes contribute to the overall energy of the system. The code above can be easily modified to add this computation after the update to the q vector:

    for k=1:n
        a(i,k) = (sqrt(2/(n+1))) * sum(q(2:end-1).*sin((1:n)*k*pi/(n+1)));
        ap(i,k) = (sqrt(2/(n+1))) * ...
                  sum((j/n+1).*pi.*q(2:end-1).*cos((1:n)*k*pi/(n+1)));
    end
    h(i,:) = 0.5 .* (ap(i,:).^2 + omega.^2 .* a(i,:).^2);

Here is a look at the energy of modes 2-8 for the first 30000 iterations of the Matlab code:

As we can see, over time some of the higher frequency modes begin to gain more energy. This is apparent in the video as well.

Just for fun, we can also look at an instance of the problem where we intentionally start off with a perturbation. In this case, we start off with initial conditions that cause the right-most link in the chain to be stretched further than the rest, leading to a wave that moves back and forth down the string. The impact of this on the energy distribution is interesting as well. This video was created using the code above, but instead of passing a nice integer value for the parameter s (like 1.0), I passed in a value like s=1.3.

Looking at the plot of the energy in each mode, we see that now things are less regular. For example, look at the plot for mode 3 – instead of being a nice smooth peak, the shape of the peaks over time changes. This plot spans the first 3000 iterations of the model for this case, allowing us to look a bit more carefully at the behavior at a fine time scale.

Overall, I think this problem is an interesting one to play with and use to learn a little about dynamical systems. I’m not done with the book yet, but so far it is pretty interesting. Fortunately, these days we have computers far faster than those that Fermi and friends had, so it is possible to casually play with problems like this that were quite challenging to work on back in the earliest days of computing.

Concurrency in Languages Book

Introduction to Concurrency in Programming Languages: The Book

Book Cover

Over the last few years, due to multicore and manycore processor trends, concurrency and parallelism have been written about a number of times. For many programmers, books play an important role relative to blogs or magazines simply due to the level of detail and breadth that can fit in a document with a generous page budget that is rigorously edited and technically reviewed. As some of my readers here may have noticed in the side bar off to the side of the posts (you probably can’t see the link if you get my posts via an RSS reader), I’m an author of one of these books, and wanted to talk a little about it here. I recently found out that the book is offered in a Kindle for Mac or PC edition, at a fraction of the price of the original, so I thought it would be worth discussing given the lower cost now available to potential readers.

The book came out and was widely available in early 2010, and has been selling as well as I would hope with essentially no marketing. My hope is that getting word out via an article on here will lead to interested people noticing this project that took about two years to put together with my co-authors Tim Mattson of Intel and Craig Rasmussen of LANL (I started the book with Craig while I was still working at LANL).

Why did we write the book?

Our goal in writing this book was to provide a resource to people interested in concurrency and parallelism that took a bit of a different approach to the topic than other books out there. As I discuss in more detail below, we made a point of focusing on the topic without adopting a specific implementation technique or programming model, with a large dose of historical context and a grounding in generally applicable patterns that appear in concurrent code. I believe it is important to learn the topic through a general conceptual foundation rooted in the history of how technology today evolved from prior efforts over 30-40 years.

In researching the current playing field of books in the area when I started writing, I asked myself “why don’t I find these books totally satisfying?”. The most common answer that I found myself coming up with was that the books were often heavily rooted in one way of implementing concurrent or parallel programs. For example, it is not uncommon to find a book that discusses the topic solely in the context of MPI, or OpenMP, or Java, or (insert your favorite library here). This is quite reasonable for a practitioner who wants to learn the theory of the area in the context of the specific implementation in which they plan to work. For example, I found a book on concurrency in Java (“Java Concurrency in Practice”) to be quite wonderful, especially if you were interested in then going to write Java code (which, at the time I read it, I was). Similar books exist for other popular systems, and even lesser known ones like ZPL (also, a very good book).

Unfortunately, this mingling of general concepts with specific implementations of them can often lead to some generality being lost in the basic concepts. Often implementations adopt a specific way of approaching a problem leading to a skewed view on a broader topic. This isn’t unique to the topic of this book – a similar limited focus arises in other areas, like GUI programming, distributed programming, web frameworks, etc… Furthermore, it is quite common for libraries or languages to fall out of favor over time, either due to technology trends or simply a lack of maintenance in the specific tools. For example, the Parallel Virtual Machine library, a one time competitor to MPI, is no longer a serious contender for parallel programming – leading to many of the books that chose it as their implementation of choice to become similarly dusty and inaccessible to readers over time.

So, we embarked on the project with a few high-level goals in mind:

Don’t start with the premise that parallelism and concurrency are intrinsically difficult concepts to grasp.

As was recently pointed out to me when talking to someone about this at a conference, words like concurrent, simultaneous, contention, and so on, were in common usage far before computers existed. Unsurprisingly, this is because they come up in our daily lives quite frequently – in activities ranging from cooking a meal (which, I use as part of my argument that concurrency is quite natural in the book), to coordinating a meeting with a group of people, and performing “multitasking” at work. Concurrency is an intrinsic property of the real world, and humans are actually pretty good at managing it when it comes to physical and mental activities. Starting with this premise, and focusing on what technologies we have at hand, we can begin to understand where the difficulties actually lie. How often are they are due to limitations of technology and implementation choices versus inherently challenging properties of a particular problem to be solved?

Choose established, stable languages.

While it was inevitable that we would choose concrete languages to demonstrate examples, I set down strict criteria about the languages that were chosen. MPI isn’t a language (it’s a library), and it is relatively well covered in other literature. OpenMP on the other hand, is standardized and available to many programmers in multiple languages and resides fairly close to the language itself. Cilk represents a relatively mature extension on C that is very (very!) simple, yet very powerful. It was not surprising to see Intel buy Cilk++ a year or two ago. Erlang was chosen because it was well established and mature, and represented a fundamentally different programming model from the rest. Other declarative languages have been in the news for their concurrency features, but they were still moving targets in 2008 when the bulk of the book was being written. Fortran was originally included (and the appendix that was cut from the book is available for free on the book web site), but was cut due to a likely limited interested audience. I put the appendix on the web site after the book was published since, while not a common-place language, Fortran 2008 actually represents a fairly modern parallel programming language. This surprises some folks, especially when the last they heard of Fortran it was peppered with capital letters, line numbers, and goto statements everywhere. It seems many people are unaware of the 1990, 1995, 2003, and 2008 revisions of that language that removed much of the arcane cruft and added relatively interesting new features.

Provide a strong, language neutral conceptual basis.

Readers really should learn about concurrency control and the corresponding types of problems that arise in concurrent systems before diving into any specific implementation. This is often given light treatment in books, and students must accumulate the knowledge by fusing concepts that they learn in operating systems, databases, distributed systems, and parallel programming courses. I tried to pull much of this together into a discussion that covered the essentials from all of these topical sources. This was rooted in my observations during a couple year stay at the University of Oregon as an adjunct faculty member where I taught parallel programming, operating systems, and distributed systems. Those courses were quite valuable in testing the material that eventually made it into the book. Some of my students from those courses provided great feedback when I was putting the material together (they are acknowledged in the introduction).

Present the history – How did technology evolve to what we see now?

This is probably the most fun part of the book to me, and I really think this is critical for any reader who wants to know and deeply understand the state of the art. The hot technologies of today are definitely not new — almost all of them derive directly from work that was performed in the past, and how those past efforts evolved into what we see now is very informative. History tells us what worked, why it worked, and what didn’t work. History is where we see technologies that went out of popularity due to trends in industry or technology, yet are relevant again due to technology coming full circle back to concepts invented decades ago. For example, GPUs bear a strong resemblance to vector machines and massively parallel computers from the 1980s. Unsurprisingly, the programming models popular today from NVidia and others are extremely similar to those developed twenty or thirty years ago. Programmers are not immune from the old saying, “Those who cannot remember the past are condemned to repeat it” — knowing the history of your technologies makes you a wiser software engineer and developer.

As we were working on the book, about 9 months in, we asked Tim to join in to help us ground the book in terms of how we discussed and demonstrated the concepts. The original plan was to have the later third to half of the book focused on demonstrating concepts in the context of his parallel design patterns – so, why not ask him to help out. I was quite happy when he agreed to join in. This led to the next goal:

Present examples in a framework that programmers can apply generally.

The later chunk of the book is focused on applying what we discuss early on through a series of examples. Instead of laying chapters out in a specialized fashion (e.g., chapter on a web server, chapter on a parallel image processing algorithm, etc…), we chose instead to frame the examples in terms of a set of generalizable patterns that emerge in concurrent and parallel programs. By structuring the book this way, presenting concrete examples in the context of a generalizable framework for thinking about programming, we hoped to provide the reader with knowledge that was more easily applicable to problems they will face themselves.

Concluding remarks

Two years on from finishing, I am glad we chose to write the book the way we did instead of falling victim to the temptations to either focus on technologies that may be passing fads or unstable and ever-changing research platforms. I think the book should have, forgive the pun, a long shelf life.

I do hope you enjoy the book if you choose to grab a copy, and we are quite receptive to feedback for ways that the book could be improved if we ever get around to releasing a second edition. We maintain an errata list on the book web page, along with a set of source code examples, pointers to a variety of concurrent and parallel programming languages, and slides for those who wish to use the book in the classroom setting.

For those interested in more, here are some links:

Table of contents (PDF)
Book web page
Amazon
CRC Press book page

And don’t take my word on the book – we’ve had a number of postive reviews! I keep a set of links to book reviews here, and currently have four up – the ACM computing reviews, Times Higher Education Supplement, InsideHPC, and Scientific Programming journal.

Harmonic Motion Simulation

(Edit: This article has been revised to fix a problem with the original in which the model didn’t properly reproduce the video of the real system. That’s all fixed now.)

The model described in this post is pretty simple. Recently I found a pointer to this page that demonstrated harmonic motion with a collection of pendulums of different lengths. The apparatus shown here is based on a system described by Richard Berg in the article “Pendulum waves: A demonstration of wave motion using pendula” from the American Journal of Physics, Volume 59, Issue 2, 1991. Here is the video from the Harvard page showing the apparatus at work:

It seemed like an interesting and simple physical system, so I decided to see if I could replicate it computationally. This is fairly basic physics – this system is probably early undergrad level physics and math, right around the time when you first encounter non-linear partial differential equations.

To start, let’s remember what the equation of motion is for a damped pendulum (using Newton’s notation for derivatives):

\ddot{\theta} + \gamma \dot{\theta} + \frac{g}{l} sin(\theta) = 0

In this equation, \gamma is the damping coefficient, g is gravitational acceleration, and l is the length of the pendulum. In the video shown at the end, we assume that the damping coefficient is zero, so our little computational pendulum exists in a vacuum. If you download the code and set it to a non-zero value, you will see a nice damping out of the motion over time as one would expect in reality due to air resistance.

In the simulation, the variable we are concerned with is \theta:

The second order partial differential equation that describes this system isn’t one that is easily solvable directly, so we resort to methods from numerical analysis to attack it. In this case, it is common to employ a fourth order Runge-Kutta scheme (see this document for a derivation of the Runge-Kutta scheme from the second order PDE). To start, we rewrite the equation in two parts:

\omega = \dot{\theta}
\dot{\omega} = -\gamma \omega - \frac{g}{l}sin(\theta)

Now that we have the system in terms of simultaneous first order equations, we can use the Runge-Kutta scheme to define a sequence of \theta and \omega values that are easily computable as:

\theta_{n+1} = \theta_n + \frac{1}{6}(k_{1a} + 2k_{2a} + 2k_{3a} + k_{4a})
\omega_{n+1} = \omega_n + \frac{1}{6}(k_{1b} + 2k_{2b} + 2k_{3b} + k_{4b})

Where we have the four coefficients for each defined for a time step \Delta t as:

k_{1a} = \omega_n \Delta t
k_{2a} = (\omega_n + \frac{k_{1b}}{2}) \Delta t
k_{3a} = (\omega_n + \frac{k_{2b}}{2}) \Delta t
k_{4a} = (\omega_n + k_{3b}) \Delta t

And

k_{1b} = f(\theta_n, \omega_n, t) \Delta t
k_{2b} = f(\theta_n + \frac{k_{1a}}{2}, \omega_n + \frac{k_{1b}}{2}, t + \frac{\Delta t}{2}) \Delta t
k_{3b} = f(\theta_n + \frac{k_{2a}}{2}, \omega_n + \frac{k_{2b}}{2}, t + \frac{\Delta t}{2}) \Delta t
k_{4b} = f(\theta_n + k_{3a}, \omega_n + k_{3b}, t + \Delta t) \Delta t

where

f(\theta, \omega, t) = -\gamma \omega - \frac{g}{l}sin(\theta)

At this point, writing the code is really straightforward. First, define f. One change that we will make is that we will also specify the length of the pendulum, l, as a parameter. This won’t impact the derivations above though – it just makes it easier later on to compose together the set of individual pendulums of different lengths to mimic what we see in the video.

g :: Double
g = 9.80665 -- gravitational acceleration = 9.80665m/s^2

f :: Double -> Double -> Double -> Double -> Double
f theta omega t l = (-gamma)*omega - (g/l)*sin(theta)

Note that the t value isn’t used. We keep it for completeness, given that f as formulated formally is a function of \ddot{\theta}, \dot{\theta}, \theta, and t. Once we have this, we can then define the code that implements one step of the Runge-Kutta solver for a given time step dt:

dt :: Double
dt = 0.01

solve :: (Double, Double, Double, Double) -> (Double, Double, Double, Double)
solve (theta, omega, t, l) =
  (theta + (1/6)*(k1a + 2*k2a + 2*k3a + k4a),
   omega + (1/6)*(k1b + 2*k2b + 2*k3b + k4b),
   t', l)
  where
    t' = t + dt
    k1a = omega * dt
    k1b = (f theta omega t' l) * dt
    k2a = (omega + k1b/2) * dt
    k2b = (f (theta + k1a/2) (omega + k1b/2) (t' + dt/2) l) * dt
    k3a = (omega + k2b/2) * dt
    k3b = (f (theta + k2a/2) (omega + k2b/2) (t' + dt/2) l) * dt
    k4a = (omega + k3b) * dt
    k4b = (f (theta + k3a) (omega + k3b) (t' + dt) l) * dt

Now, the original video that inspired me to spend my Friday evening playing with this model included a set of pendulums of different lengths. The Berg paper describes the lengths as follows: the longest pendulum is intended to achieve X oscillations over a time period of T seconds. Each shorter pendulum should achieve one additional oscillation over that same time period. Now, we know that the time for one oscillation of a pendulum of length l is given by:

t = 2 \pi \sqrt{\frac{l}{g}}

From this, we can derive an equation for the length of pendulum n (where n runs from 1 to N, for N pendula with the Nth longest):

l_n = g\left(\frac{T}{2 \pi (X+(N-n))}\right)^2

In the Berg paper, X is 60 and T is 54 seconds, which we can then implement:

lengths :: Int -> Double -> Double -> [Double]
lengths n t l =
  let thelen curn = (t/((l+((fromIntegral n)-curn)) * 2.0 * pi))**2 * g
  in
      [thelen (fromIntegral i) | i <- [1..n]]

theta0 = pi / 8
omega0 = 0
npendu = 24
starts = map (\i -> (theta0, omega0, t, i)) (lengths npendu 54.0 60.0)

So, every pendulum starts at \theta = \frac{\pi}{8}, and we have 24 pendulums that take on lengths corresponding to those in the paper.

In order to watch the code run and replicate (as best as possible with this simple simulation) the video, I wrapped this code with Gloss to add simple visualization capabilities. The code is pretty simple: I set up some constants to define the window size, and then convert the angle into (x,y) coordinates via (x,y) = (l sin(\theta), -l cos(\theta)) where l is the length of the pendulum being rendered.

-- window parameters
winWidth :: Int
winWidth  = 600

winHeight :: Int
winHeight = 600

-- circle radius
cradius   = 5

-- colors for the pendulum and the string
clr = makeColor 1.0 1.0 0.0 1.0
lineclr = makeColor 0.3 0.3 0.3 0.5

-- render one pendulum
renderPendulum :: (Double, Double, Double, Double) -> Picture
renderPendulum (theta, omega, t, l) =
  let x = double2Float $ l*sin theta
      y = double2Float $ -(l*cos theta)
      twidth = ((fromIntegral winWidth) / 2)-15
      theight = ((fromIntegral winHeight) / 2)-15
  in
    Pictures $ [
      Color lineclr $
        Line [(0,0), (x*twidth,y*theight)],
      Color clr $
        Translate (x*twidth) (y*theight) $
        Circle cradius ]

-- render the list of pendulums
renderPendulums ps = Pictures $ map renderPendulum ps

This is then called from the main function that sets up the initial conditions for each, and then uses the Gloss simulateInWindow function to do the animation. Note that we don’t use the time parameter of the simulateInWindow function, and control the time stepping using the dt value defined before.

main :: IO ()
main = do
  let niter = 1000
      theta0 = pi / 8
      omega0 = 0
      t = 0
      npendu = 15
      starts = map (\i -> (theta0, omega0, t, i)) (lengths npendu 54.0 60.0)
  simulateInWindow
    "Pendulums"
    (winWidth, winHeight)
    (1, 1)
    (greyN 0.1)
    120
    starts
    renderPendulums
    (\vp f m -> map solve m)

The result of this is shown in the video below.

If you are interested in playing with this code, you can find it in my public github repository. Enjoy playing with the pendula!

Numerical optimization informed by computational cost for simulation tuning

Here’s another short post that refers interested people to a recently accepted paper for publication.

M. A. Abramson, T. J. Asaki, J. E. Dennis, Jr., R. Magallanez, and M. J. Sottile. “Efficently solving computationally expensive optimization problems with CPU time-related functions”, to appear in Structural and Multidisciplinary Optimization.

A preprint lives here. The abstract is:

In this paper, we characterize a new class of computationally expensive optimization problems and introduce an approach for solving them. In this class of problems, objective function values may be directly related to the computational time required to obtain them, so that, as the optimal solution is approached, the computational time required to evaluate the objective is significantly less than at points farther away from the solution. This is motivated by an application in which each objective function evaluation requires both a numerical fluid dynamics simulation and an image registration process, and the goal is to find the parameter values of a predetermined reference image by comparing the flow dynamics from the numerical simulation and the reference image through the image comparison process. In designing an approach to numerically solve the more general class of problems in an efficient way, we make use of surrogates based on CPU times of previously evaluated points, rather than their function values, all within the search step framework of mesh adaptive direct search algorithms. Because of the expected positive correlation between function values and their CPU times, a time cutoff parameter is added to the objective function evaluation to allow its termination during the comparison process if the computational time exceeds a specified threshold. The approach was tested using the NOMADm and DACE MATLAB software packages, and results are presented.

The basic problem that this paper is related to is parameter optimization for simulations, specifically those that involve fluid dynamics models. The figure above shows a snapshot of a fluid model for the simple lid-driven cavity problem that we used in the paper. In a number of problems that I encountered in my work at LANL, we were faced with the question of finding the optimal set of parameters for a simulation to match data that was experimentally obtained. Often this included algorithmic parameters, like mesh and grid resolutions, along with solver parameters (e.g., convergence criteria), and physical parameters. What is interesting is that a number of parameters have a direct impact on compute time. For example, resolving a grid to a finer resolution will result in a higher per-iteration compute time. The question that we posed was, what can one do if attacking the problem of both tuning parameters based on observed experimental data and identifying parameters that minimize compute time. That’s what this paper is all about.

Playing with the Naming Game

Introduction

After reading a recent paper in Physical Review E, “Role of feedback and broadcasting in the naming game“, I decided to implement the model used in the paper just to play around with the algorithm and reproduce the results. Working through the rather simple algorithm, I realized that it would be an interesting little exercise to share.

The naming game is a simple model that was introduced a number of years in the article “A Self-Organizing Spatial Vocabulary” by L. Steels. It turns out that it captures features of distributed algorithms, such as distributed leader election, and allows analysis of properties of systems that implement these algorithms. People have looked at the model from the point of view of statistical physics in recent years — a number of articles on the topic are referenced by the recent Phys. Rev. E article for those curious about digging in deeper.

The model proceeds as follows. We start with a set of n entities, each of which know no words. Each time step, we pick two of these entities, and select one to be the speaker and the other to be the hearer. The speaker selects a word from the set of words that they know, and asks the hearer if they know the word as well. If the speaker knows no words, it invents a unique one and uses it. If the hearer doesn’t know the word, then it adds it to its dictionary. If it does know the word, then we have a few possible outcomes. In one outcome, both the speaker and hearer clear their dictionaries and only store the word that they have discovered someone else knows. In another outcome, the hearer clears their dictionary while the speaker keeps theirs full. Similarly, the third outcome is the one in which the speaker clears the dictionary, with the hearer keeping theirs intact.

It’s a simple model, and when we run it, we see interesting behavior like the following:

In this plot (made with GNU R), the x-axis represents time and the y-axis represents the number of unique words that exist in the population. The plots show the average number of words known within the population for 15 trials, each with 200 individuals over 10000 time steps. As we can see, there is a period of invention, in which a set of words are invented, followed by a long period over which time the population slowly converges on a single common word. In this case, we see all three update scenarios — the original rule in which both speaker and hearer clear dictionaries on agreement, red when the speaker is the one who clears, and blue when the hearer is the one who clears. These plots appear to agree with those in the paper, which means the little implementation I put together works.

Implementing this is pretty straightforward, but in doing so an interesting (albeit not very complicated) pattern emerges that can be elegantly dealt with in Haskell using the type system.

Consider the constraints of the model:

  • Random number generation implies some changing state.
  • Generation of unique new words efficiently also can be implemented via state.
  • As the model runs, at each time step we care about logging the size of the world vocabulary.
  • A set of parameters related to the model need to be accessed at various points in the code.

As usual, for those who wish to follow along and see all of the code (not all of it is posted here in the blog post), you can find it in the “ng” subdirectory of my public github repo.

Naive implementation

A first, quick and dirty implementation that I expected to run reasonably quickly was written by letting the C part of my brain drive, but using Haskell syntax. A simple data type was created in which IORefs are used to store mutable state:

data NGState = NGState {
  randState :: IORef StdGen,
  wordState :: IORef Int
}

One field of the record holds the state of the random number generator, and the other the counter used to generate new words. A set of simple functions is then responsible for manipulating values of this type.

newNGState :: IO NGState
newNGState = do
  r <- newIORef (mkStdGen 1)
  w <- newIORef 0
  return $ NGState { randState = r, wordState = w }

getInt :: NGState -> Int -> IO Int
getInt s n = do
  let randGen = randState s
  gen <- readIORef randGen
  let (val, gen') = randomR (0,(n-1)) gen
  writeIORef randGen gen'
  return val

newWord :: NGState -> IO Int
newWord w = do
  let wordGenerator = wordState w
  cur <- readIORef wordGenerator
  writeIORef wordGenerator (cur+1)
  return cur

Not the worlds most elegant way to solve the problem, but it works. More on the cleaner solution below, but first, we can see the other routines that make up the model. First, we define an individual to be a list of words, which here are represented as Ints.

type Word = Int
type Individual = [Word]

Now, given two individuals and a word to communicate between them, we have the function that updates their dictionaries.

knowsWord_NG :: (Individual,Individual) -> Word -> (Individual, Individual)
knowsWord_NG (a,b) w = 
  case (elem w b) of
    True  -> ([w],[w])
    False -> (a, b++[w]) 

If the word is an element of the dictionary that defines the second individual, we return the two individuals with their dictionaries set to just the word. Otherwise, we pass the first individual out unmodified, and attach the new word to the dictionary of the second. The other two rules are similar to define.

knowsWord_HO_NG :: (Individual,Individual) -> Word 
                -> (Individual, Individual)
knowsWord_HO_NG (a,b) w =
  case (elem w b) of
    True  -> (a,[w])
    False -> (a, b++[w])

knowsWord_SO_NG :: (Individual,Individual) -> Word 
                -> (Individual, Individual)
knowsWord_SO_NG (a,b) w =
  case (elem w b) of
    True  -> ([w],b)
    False -> (a, b++[w])

Now, this is called by the routine that, given two individuals, selects the word to test from the dictionary of the first (or generates a word if that first individual knows nothing so far).

testWord :: (Individual, Individual) -> NGState 
         -> IO (Individual, Individual)
testWord (a,b) s | (a == []) = do
  w <- newWord s
  return ([w],b++[w])
testWord (a,b) s | otherwise = do
  let n = length a
  widx <- getInt s n
  let wval = (!!) a widx
      (a',b') = knowsWord_NG (a,b) wval
  return (a',b')

Finally, the time step function that selects two individuals and tests them using the code above.

timestepOnePair :: [Individual] -> NGState -> IO [Individual]
timestepOnePair w s = do
  let n = length w
  a <- getInt s n
  let (aval, w') = removeAt a w
  b <- getInt s (n-1)
  let (bval, w'') = removeAt b w'
  (aval',bval') <- testWord (aval,bval) s
  return (aval':bval':w'')

removeAt :: Int -> [a] -> (a,[a])
removeAt k xs = case back of
    [] -> error "bad index"
    x:rest -> (x, front ++ rest)
  where (front,back) = splitAt k xs

(Note: that removeAt code came from here).

The full code for this, including the drivers that iterate the time steps and do basic IO to save data to disk, can be found at my public github repo in the directory “ng” within the file “ng_io.hs”.

Nicer implementation

What is wrong with that implementation anyways? Nothing is “wrong” exactly, but it isn’t the most elegant way to go about solving the problem in a language like Haskell. First off, state is explicitly strung through the code, and effects are managed through mutable IO references. Next, the accumulation of results from a sequence of timesteps is explicitly passed back as a return value in the main function that iterates the time stepper (this code wasn’t shown above – it’s in the repo). Overall, I ended up writing a bunch of code that feels like boilerplate for these kinds of models after you’ve written a few of them for different problems.

Of course, little pieces of this are solved in various educational materials (or even one of my old posts) : use the state monad to hide the state that we update, use a reader monad to access parameters that are not ever changed during the model execution, and use the writer monad to log results as it runs. The code shrinks a little and becomes nicely pure if we use all of these monads to build a little monad that takes care of all of the common actions that I tend to encounter in these simple Monte Carlo-style models.

To build this monad, first we define the state that is updated. We remove the IORefs, and make the simulation-specific state a generic type. The random number state remains a plain old StdGen.

data MonteCarloState a = MonteCarloState {
  randState :: StdGen,
  simState :: a
}

Next, we define the monad. We want to stack state, writer, and reader to build the whole monad, in which the state has one type, the logged values have another, and the parameters have a third. All of these are left to the user to define later.

type MCMonad m a b c = 
  StateT (MonteCarloState b) (WriterT [a] (ReaderT c m))

Now, we just need to have a function to enter the monad and extract the values when we’re done in there.

runMonteCarlo initstate seed params f = do 
  let gen = mkStdGen seed
  let mcstate = MonteCarloState { randState = gen, simState = initstate }
  ((a, i), l) <- runReaderT (runWriterT (runStateT f mcstate)) params
  return (a, l)

The rest of the code is pretty simple and straightforward. Sampling random numbers is similar to any other monadic solution, and the other functions provide a thin wrapper around the reader and writer monads. Note that I chose to not make the caller use the typical functions like ask, tell, put, and get directly. This is to allow me to play tricks later if I choose, where I may decide to change how these functions are actually implemented. For example, I might remove the writer code and put IO in its place to dump to disk. I’d prefer to have the code that uses the MCMonad be isolated from this kind of detail. In any case, the code follows for the routines that the core model calls.

sampleInt :: Monad m => Int -> (MCMonad m a b c) Int
sampleInt n = do
  mcstate <- get
  let rs = randState mcstate
      (val, rs') = randomR (0,(n-1)) rs
  put ( mcstate { randState = rs' } )
  return val
  
logValue :: Monad m => a -> (MCMonad m a b c) ()
logValue val = do
  tell [val]

getSimState :: Monad m => (MCMonad m a b c) b
getSimState = do
  s <- get
  return (simState s)

updateSimState :: Monad m => b -> (MCMonad m a b c) ()
updateSimState newstate = do
  mcs <- get
  put ( mcs { simState = newstate } )

getParameters :: Monad m => (MCMonad m a b c) c
getParameters = do
  p <- ask
  return p

Now, how does this impact the model code itself? Functionally, it doesn’t impact it at all. Most of the changes are at the type signature level, and replacing the calls that worked with the IOrefs with calls to functions that access functionality provided by the state, reader, or writer monads.

First, I define the data type to hold the parameters of the model that are accessed in a read-only fashion. Note that I’m also refactoring the IO-based code to abstract out the function used to update individuals when a common word is found. This makes the core of the model more generic, and allows the specific update function to be used to become a parameter that is passed in via the model parameters.

type KnowsWord = 
  (Individual, Individual) -> Word -> (Individual, Individual)

data SimParams = SimParams { 
  wordChecker :: KnowsWord
}

Similarly, we define a type for the generic MCMonad specific to this problem:

type NGMonad m = MCMonad m Int Int SimParams

We see that the mutable simulation state is an Int, the data that is logged is also Int type, and we will pass parameters of the type we just defined above. Two of the important functions that use all of the features of the monad are now shown below. They should be familiar from above, but are now based on the NGMonad and not IO.

testWord :: Monad m => (Individual, Individual) -> KnowsWord ->
            NGMonad m (Individual, Individual)
testWord (a,b) _ | (a == []) = do
  w <- newWord
  return ([w],b++[w])
testWord (a,b) f | otherwise = do
  let n = length a
  widx <- sampleInt n
  let wval = (!!) a widx
      (a',b') = f (a,b) wval
  return (a',b')

newWord :: Monad m => NGMonad m Int
newWord = do
  cur <- getSimState
  updateSimState (cur+1)
  return cur

timestepOnePair :: Monad m => [Individual] -> NGMonad m [Individual]
timestepOnePair w = do
  let n = length w
  a <- sampleInt n
  let (aval, w') = removeAt a w
  b <- sampleInt (n-1)
  let (bval, w'') = removeAt b w'
  params <- getParameters
  (aval',bval') <- testWord (aval,bval) (wordChecker params)
  return (aval':bval':w'')

We can see that newWord uses the getSimState and updateSimState functions to modify the contents of the state managed by the state monad. The testWord function uses the random number generation, and the time stepping function uses getParameters to access the parameter data structure from which the specific word checking function (of the three alternatives, NG, SO_NG, and HO_NG) is held.

One of the problems with codes like this is that one typically wants to run a number of trials to show the average behavior of the model – not just a single run. This is pretty easy to accomplish. First, we have a routine to run the model for a number of time steps, and this is called by an outer driver that iterates it a number of times. Interestingly, this is one of the places where purity is nice! Recall that with the IOrefs, we are changing state, so reuse of the data structure will show evidence that a previous computation executed. For example, after N iterations we generated 100 words, the next time we run the model, our first word will be 100, not 0 — there will be residue of the previous run visible to the new one. By removing the impure IOrefs, we remove this residue. While it wouldn’t impact the correctness of this particular model, it is a nice feature to have. As models get more and more complex, it is hard to definitively tell if this kind of residual state that crosses between trials can have an impact on the model correctness, so we might as well avoid it altogether if possible.

goNTimes :: Monad m => [Individual] -> Int -> NGMonad m ()
goNTimes p i | i < 1     = do 
  logValue (numUnique p)
  return ()
goNTimes p i | otherwise = do
  p' <- timestepOnePair p
  logValue (numUnique p')
  goNTimes p' (i-1)

Now, this is called by a driver. The driver handles timing, calling the code that steps the model N times, and doing so for 15 different random number seeds. The set of runs is then aggregated together into a sequence of average vocabulary sizes over time so we can see the average behavior of the model.

runTrial f fname = do
  start <- getCPUTime
  let pop = replicate 200 ([]::Individual)
      seeds = [1..15]
      params = SimParams { wordChecker = f, betaValue = 1.0 }
  vals <- mapM (\i -> runMonteCarlo 0 i params $ goNTimes pop 10000) seeds
  let lists = map snd vals
  let avgs = averager lists
  dumpCounts fname avgs
  end <- getCPUTime
  let diff = (fromIntegral (end-start)) / (10^12)
  printf "%s: %0.4f sec.\n" (fname :: String) (diff :: Double)
  return ()

averager :: [[Int]] -> [Float]
averager vals =
  let n = length vals
      sums = map sum $ transpose vals
  in
    map (\i -> (fromIntegral i) / (fromIntegral n)) sums

We also dump the data to a file, but that’s not so interesting so we won’t show it here. The results of the code being run are shown in the plot at the beginning of the article.

Final words

Interestingly, the code with IOrefs versus the one with the state/reader/writer monad don’t perform much differently. This is very good news – it means that we don’t need to necessarily sacrifice all high-level abstractions for performance. Note that if you are looking at the code in github, the ng_io.hs code will run fast, simply because I didn’t put the code in there to run a set of trials – it just runs one trial for one method for updating when a common word is found.

Another interesting part of working on the code was performance tuning. It turns out that all of the time is spent in the code that counts the number of unique words present in the entire population at each timestep. My original code to do this, which was based only on code from Data.List, looked like this.

numUnique :: [Individual] -> Int
numUnique p = length $ foldl1 union p

The idea is simple – individuals are lists of Ints, so if we just take the length of the union of all of them, we get the number of unique words present. This is not fast though. It turns out that there is a faster solution that isn’t much harder to code up:

data Tree = Node Word Tree Tree | Empty

-- number of unique words is number of nodes in a tree containing
-- no duplicates
numUnique p = sizeTree $ foldl (\t ps -> foldl insTree t ps) Empty p

{-# INLINE insTree #-}
insTree :: Tree -> Word -> Tree
insTree Empty          i         = Node i Empty Empty
insTree n@(Node j l r) i | i==j  = n
insTree n@(Node j l r) i | i<j   = Node j (insTree l i) r
insTree n@(Node j l r) i | i>j   = Node j l (insTree r i) 

{-# INLINE sizeTree #-}
sizeTree Empty        = 0
sizeTree (Node _ l r) = 1 + (sizeTree l) + (sizeTree r)

The logarithmic time of the tree beats some linear access time when using the lists. It is likely that, had I taken more than a couple minutes to think about it, I would have found a similarly efficient data structure somewhere in the set of libraries that come with GHC and the Haskell Platform.

One other interesting performance related item that I don’t have an answer to yet (maybe a reader out there will get interested and play with it) is taking advantage of parallelism available in the code. Within a single run of N time steps, there is limited parallelism possible due to the dependence of each iteration on its predecessor. On the other hand, when we do the set of trials in which we do N time steps for a set of seeds, there is parallelism available – each trial of N steps can be run independently. Unfortunately, my rather simple attempts to parallelize the code didn’t do so well. The best I did was create a code using `par` and `pseq` that did utilize all of the cores in my machine (according to the activity monitor), but ended up running slower than the sequential code (the parallel code run with +RTS -N1). I admit, I haven’t spent any serious time on trying to parallelize it – but I am curious to find out what method works best.

So, if someone out there finds the paper interesting that I found, or happens to want to play with this code, here are a few items for the todo list that may be entertaining/educational:

  • Parallelize the code.
  • Add a few more parameters to, say, use a probability parameter to tune the likelihood of success of a shared word being communicated. The code described above is equivalent to that parameter being 1.0 (always succeed). According to the PRE paper, interesting things occur when you play with that parameter.
  • Impose a topology on the individuals. Currently, everyone can talk to everyone else. What if we restrict which individuals are allowed to talk to others? We should see interesting behavior there. I have a tiny little Haskell program that generates R-MAT random graphs with nice properties that can likely be used for this if anyone is interested.

Again, the code is available in my public github repo, so feel free to grab it and play around with it.

Functional flocks

One of my favorite topics is Artificial Life – how can we build simple computational models of things that we would consider to be living. Often, this focuses on finding simple models for behavior. I’m a member of the International Society of Artificial Life, which has an interesting journal that (in my opinion) justifies my yearly membership fee. Two topics in ALife that I have found consistently interesting are artificial chemistries and flocking behavior. This post focuses on flocking. One of the reasons I decided to write this post was a weekend spent coding a simple flocking algorithm after reading Brian Hayes’ article on the topic in American Scientist over the winter. The video below shows the code described in this post running with a few hundred (750 if I recall) entities. They start off in a random configuration at the beginning, so it takes a few seconds for them to start coalescing into little blobs that wander around in coherent groups.

In the late 1980s (1986 to be exact), Craig Reynolds introduced the world to Boids. An outstanding problem at the time in studying complex systems was how to model flocking behavior in animals like birds and fish. Many people have looked up (or into an aquarium) and wondered what mechanism was keeping birds or fish in the patterns that we see as flocks (or schools, in the case of fish). How do they form, how are they controlled, and why do they occur? Reynolds proposed a very simple model that yields very realistic behavior in simulated environments, and he called that model “boids”. Flocking models have since found their way into a number of application areas – the study of biological systems, computationally creating scenes in movies involving crowds moving around, and optimization techniques inspired by flocking and swarm behavior.

Birds and fish are not networked computers with connections to every other bird or fish in their group, nor do they possess unlimited computing power. So, techniques that one might come up with algorithmically to coordinate a set of entities may not match the constraints of real animals that flock or swarm. Birds are actually quite simple, and live in the physical world and have fairly limited senses and intelligence. Like any other animal, birds interact with other birds that they are near, and interact less interestingly with birds that they are not near. Nothing in that assumption about bird behavior should be much of a surprise – nearby birds interact strongly, and the interaction strength drops as birds get further away from each other.

The boids model was based on this observation, and is derived from a small set of simple rules that make sense for anyone who has tried to navigate a crowded space with a group of friends or family. The rules for each individual are:

  • Separation: Try to maintain some degree of “personal space” around yourself with respect to your neighbors. Avoid collisions.
  • Alignment: Given the group that you seek to follow, if they are moving as a group in one direction, adjust your movement to move in the same direction.
  • Cohesion: Try to move towards the group that you are a member of.

So, we end up with three criteria to guide adjustments to the movement of each entity: avoid hitting our neighbors, seek the group we are trying to keep up with, and try to move in the same direction that all members of that group are moving in. We also constrain things to assume that the entities don’t talk to each other, so all decisions are based on passive observations made by the observer of its neighbors. The beauty of boids is the simplicity of stating the rules and the subsequent behavior that we see coming out of them being applied to a group of entities.

This algorithm turns out to be quite easy to implement, and using the gloss library in Haskell, we can look at the results of a boids simulation with very little effort. This visualization was also invaluable for debugging, since it made it very easy to look at the various contributors to the motion of each entity — which was necessary when tuning parameters and chasing bugs. There is more on the visualization aspect of the implementation further down in this post.

There are a number of perfectly good posts on the web about boids, variants of boids, and the boids model implemented in a number of languages. So, it’s not really worth rehashing what can be found elsewhere, and instead I wanted to focus on the title of the post — functional flocking. Specifically, what it was like to write the algorithm in a functional style in Haskell, including some performance-oriented algorithms that are often omitted in other introductions, leading to flocking implementations with O(n^2) performance and limited performance scaling. So, in this post I’ll talk about:

  • Basic encoding of boids in Haskell.
  • The use of a KD-tree data structure to avoid O(n^2) operations during each iteration.
  • The use of the gloss library for algorithm visualization

Note that the KDTree code here is a simplified version of the code from my older DLA post – in this case since I am only working with 2D flocks, I use a version that is tuned for two dimensions.

The code accompanying this post resides in my public github repo. Feel free to check it out, play with it, tweak it, and follow along. There are some details not described here that were not so interesting and would have consumed space, but are visible in the code in github.

To start, we first must define an individual boid.

data Boid = Boid { identifier :: Int,
                   position :: Vec2,
                   velocity :: Vec2 } deriving Show

The code is simple: a boid has a position and velocity, and an identifier used to distinguish them from each other. In the code on github there are a couple of other fields used for visualizing the vectors contributing to the three factors listed above (cohesion, separation, alignment).

Next, we have some code for implementing these three factors. There are a set of parameters used here that can tune the contribution of each factor. The code in github has a few interesting values, but for the purposes of this post, assume that the parameters are provided elsewhere – we won’t dwell on them now.

The first rule we tackle is cohesion – a group of entities try to stay together by moving a bit towards their center of mass or centroid. So, given a list of boids in a neighborhood, find their centroid as a 2D position.

findCentroid :: [Boid] -> Vec2
findCentroid []    = error "Bad centroid"
findCentroid boids =
  let n = length boids
  in vecScale (foldl1 vecAdd (map position boids)) (1.0 / (fromIntegral n))

Pretty easy – add all of the position vectors, and scale by 1/n where n is the number of boids in the neighborhood. This yields the centroid from which the cohesion vector is computed.

cohesion :: Boid -> [Boid] -> Double -> Vec2
cohesion b boids a = 
  vecScale diff a
  where c = findCentroid boids
        p = position b
        diff = vecSub c p

Given a boid and the set of boids that surround it (including itself), and some scaling parameter, we find the centroid, subtract the position of the boid we care about from the centroid, and return that difference vector scaled by the parameter.

Next, we have the rule for separation.

separation :: Boid -> [Boid] -> Double -> Vec2
separation b []    a = vecZero
separation b boids a =
  let diff_positions = map (\i -> vecSub (position i) (position b)) boids
      closeby = filter (\i -> (vecNorm i) < a) diff_positions
      sep = foldl vecSub vecZero closeby
  in
    vecScale sep sScale

Again, we have the boid that we are adjusting, the list of neighbors, and a tuning parameter. Clearly a boid in isolation with no neighbors sees no influence based on the separation rule, so the zero vector is returned. If a neighborhood does exist (potentially containing the boid itself), then we compute a set of vectors for each neighbor that point away from the boid to adjust. We filter the neighborhood to only consider those that are close by based on the tuning parameter representing the separation radius. We then accumulate up all of the vectors pointing away from the boid we care about in the opposite direction of each of the neighbors within the separation radius. A scaling parameter is then applied to control how severely boids change their velocity in reaction to closeby neighbors. Depending on how this parameter is selected, boids either subtly move away from each other when they get close, or they bounce and run away really quickly from close encounters.

Finally, the third rule is alignment.

alignment :: Boid -> [Boid] -> Double -> Vec2
alignment b [] a = vecZero
alignment b boids a =
  let v = foldl1 vecAdd (map velocity boids)
      s = 1.0 / (fromIntegral $ length boids)
      v' = vecScale v s
  in
   vecScale (vecSub v' (velocity b)) a

In this rule, we again have the same parameters – the boid we adjust, its neighbors, and a parameter. Again, adjusting to an empty neighborhood contributes nothing. Otherwise, we compute the average velocity of all neighbors, subtract the velocity of the boid from it, and scale it by the parameter.

Putting these all together, for a single boid we have a simple function to adjust the velocity and position during a single timestep.

oneboid b boids =
  let c = cohesion b boids cParam
      s = separation b boids sParam
      a = alignment b boids aParam
      p = position b
      v = velocity b
      id = identifier b
      v' = vecAdd v (vecScale (vecAdd c (vecAdd s a)) 0.1)
      v'' = limiter (vecScale v' 1.0025) vLimit
      p' = vecAdd p v''
  in
   Boid { identifier = id,
          position = wraparound p',
          velocity = v''}

First, we compute the cohesion, separation, and alignment adjustments. These are then all added to the velocity of the boid to compute a new velocity. This is then scaled by some small constant greater than one to prevent the boids from slowing to a standstill, allowing them to speed up to some limiting velocity if none of the adjustments prevent it. Finally, we add the velocity to the position, and return the updated boid.

Updating the whole set of boids is pretty simple as well.

iterationkd :: ViewPort -> Float -> KDTreeNode Boid -> KDTreeNode Boid
iterationkd vp step w =
  let boids = mapKDTree w (\i -> oneboid i (findNeighbors w i))
  in
    foldl (\t b -> kdtAddPoint t (position b) b) newKDTree boids

The signature here has changed a little. Ignore the first two parameters, as those are used by gloss to automatically invoke the iteration function for each frame. The state of the world is a KDTree of Boid data elements, and after application of the single boid update to all of them, we create a new KDTree of Boids to return. To apply the single boid update to each of them, we use a function mapKDTree that maps a function over all elements of the tree. The input to the function is the boid at each KDTreeNode, and the body of the function is a call to the oneboid function with the boid at the node and the list of neighbors (we describe that next). Finally, we accumulate up the updated boids into a new KDTree and return it.

The function to find all boids within a given neighborhood is one of the places where one can see algorithmic inefficiencies if a naive data structure is used to store the boids. A spatial data structure that allows one to query the set for all boids within a given distance of a location is key to being able to prune the number of boids to look at as a neighborhood. Of course, depending on how tightly packed they can get, the number of boids that can fall within a given region can get quite large. So, careful parameter choices for spatial scales for both querying and separation are necessary to avoid accidentally falling back to O(n^2) work.


findNeighbors :: KDTreeNode Boid -> Boid -> [Boid]
findNeighbors w b =
  let p = position b
      
      -- bounds
      vlo = vecSub p epsvec
      vhi = vecAdd p epsvec
      
      -- split the boxes
      splith = splitBoxHoriz (vlo, vhi, 0.0, 0.0)
      splitv = concatMap splitBoxVert splith
      
      -- adjuster for wraparound
      adj1 ax ay (pos, theboid) = (vecAdd pos av,
                                   theboid { position = vecAdd p av })
        where av = Vec2 ax ay
              p = position theboid

      adjuster lo hi ax ay = let neighbors = kdtRangeSearch w lo hi
                             in map (adj1 ax ay) neighbors
      
      -- do the sequence of range searches
      ns = concatMap (\(lo,hi,ax,ay) -> adjuster lo hi ax ay) splitv
      
      -- compute the distances from boid b to members
      dists = map (\(np,n) -> (vecNorm (vecSub p np), n)) ns
  in
    b:(map snd (filter (\(d,_) -> d<=epsilon) dists))

The complication that arises in this code is that we want to maintain a toroidal topology for the world, so both the left and right hand sides must wrap around. Unfortunately, the KDTree as implemented assumes a flat plane that extends indefinitely in all directions. If we want to impose a toroidal topology on it, we need to do a little work. I am sure that there exist other ways to do this, but the one I quickly put together is pretty simple. The figure below should help a little in visualizing this. First, given a boid we want to query around (the green dot), we need to remap the parts of the spatial query box that fall off the world onto the other side appropriately. This allows us to capture points like the grey one that are far away in the plane sense, but nearby given the torus that we want to model. Furthermore, we need to remap the points that fall into the regions that wrap around into a coordinate system that makes sense for the boid that we are querying around. That remapping is what the red arrow corresponds to.

Wrapping around regions for queries to the KDTree
Wrapping around regions for queries to the KDTree

The code isn’t too bad to achieve this. First, we define a bounding box around the boid by defining an upper right and lower left corner. Two helpers (defined below) are used to split the bounding box horizontally, and then split the results of that vertically to yield the set of bounding boxes that we need to query. In addition to splitting the bounding boxes, we also pass around for each bounding box the adjustment that needs to be made to translate the contents of each box into the region as seen by the green boid. Once we have split the boxes and computed the offsets for each, we iterate the sequence of KDTree queries (range searches), adjust the results of each, and return the set that fall within some epsilon radius of the central boid. Note that the range search covers a rectangular region, but to maintain isotropy (direction invariance), we prune that down to a circle so that no directions are favored over others.


splitBoxHoriz :: (Vec2,Vec2,Double,Double) -> [(Vec2,Vec2,Double,Double)]
splitBoxHoriz (lo@(Vec2 lx ly), hi@(Vec2 hx hy), ax, ay) =
  if (hx-lx > w) 
  then [(Vec2 minx ly, Vec2 maxx hy, ax, ay)]
  else if (lx < minx) 
       then [(Vec2 minx ly, Vec2 hx hy, ax, ay),
             (Vec2 (maxx-(minx-lx)) ly, Vec2 maxx hy, (ax-w), ay)]
       else if (hx > maxx) 
            then [(Vec2 lx ly, Vec2 maxx hy, ax, ay),
                  (Vec2 minx ly, Vec2 (minx + (hx-maxx)) hy, ax+w, ay)]
            else [(lo,hi,ax,ay)]
  where w = maxx-minx

splitBoxVert :: (Vec2,Vec2,Double,Double) -> [(Vec2,Vec2,Double,Double)]
splitBoxVert (lo@(Vec2 lx ly), hi@(Vec2 hx hy), ax, ay) =
  if (hy-ly > h) 
  then [(Vec2 lx miny, Vec2 hx maxy, ax, ay)]
  else if (ly < miny) 
       then [(Vec2 lx miny, Vec2 hx hy, ax, ay),
             (Vec2 lx (maxy-(miny-ly)), Vec2 hx maxy, ax, ay-h)]
       else if (hy > maxy) 
            then [(Vec2 lx ly, Vec2 hx maxy, ax, ay),
                  (Vec2 lx miny, Vec2 hx (miny + (hy-maxy)), ax, ay+h)]
            else [(lo,hi,ax,ay)]
  where h = maxy-miny

That’s most of the interesting code! As mentioned above, the full program resides in github for those who wish to see further detail or try it out.

Of course, the code is only really satisfying if we can see what the output looks like. This is achieved using gloss. For example, in addition to the view above, we can observe the underlying contributors to the flocking motion, which is very useful in debugging. A video of a small number of boids running around with their epsilon regions, and alignment, cohesion, and separation vectors shown, can be seen below:

I apologize for the low quality of the video – I need to learn how to make better screen videos that don’t lose so much quality when they get uploaded to youtube (the raw files on my screen before upload look much better). A zoomed out version where the vectors are hard to resolve, but the epsilon regions are clear for a larger region with more boids can be found here:

How did this get visualized? Very simple! First, there is the code to visualize one boid.

-- some colors
boidColor       = makeColor 1.0 1.0 0.0 1.0
radiusColor     = makeColor 0.5 1.0 1.0 0.2
cohesionColor   = makeColor 1.0 0.0 0.0 1.0
separationColor = makeColor 0.0 1.0 0.0 1.0
alignmentColor  = makeColor 0.0 0.0 1.0 1.0

renderboid :: World -> Boid -> Picture
renderboid world b =
  let (Vec2 x y) = position b
      (Vec2 vx vy) = velocity b
      v = velocity b
      (Vec2 dCX dCY) = dbgC b
      (Vec2 dSX dSY) = dbgS b
      (Vec2 dAX dAY) = dbgA b
      sf = 5.0 * (scaleFactor world)
      sf' = 1.0 * (scaleFactor world)
      sf2 = sf * 10
      (xs,ys) = modelToScreen world (x,y)
      vxs = sf * (realToFrac vx) :: Float
      vys = sf * (realToFrac vy) :: Float
  in
    Pictures $ [
      Color boidColor $ 
        Translate xs ys $
        Circle 2 ,
      Color radiusColor $
        Translate xs ys $
        Circle ((realToFrac epsilon) * sf'),
      Color boidColor $          
        Line [(xs,ys), ((xs+vxs),(ys+vys))],
      Color cohesionColor $
        Line [(xs,ys), ((xs+(sf2*realToFrac dCX)),
                        (ys+(sf2*realToFrac dCY)))],
      Color alignmentColor $
        Line [(xs,ys), ((xs+(sf2*realToFrac dAX)),
                        (ys+(sf2*realToFrac dAY)))],
      Color separationColor $
        Line [(xs,ys), ((xs+(sf'*realToFrac dSX)),
                        (ys+(sf'*realToFrac dSY)))]
    ]

We see a bit of code that is used to scale vectors and positions to fit well on the screen, and then a list of visual elements that define the lines and circles along with their colors and positions. One quirk with the code is that my boid logic was based on Doubles, while Gloss uses Floats, so a necessary conversion takes place here.

Rendering the full set of boids is pretty straight forward:

renderboids :: World -> KDTreeNode Boid -> Picture
renderboids world bs = Pictures $ mapKDTree bs (renderboid world)

We also have some code that is used to map “world” coordinates to screen coordinates. This allows us to have a window that is, say, 800×800, but treat the boids as if they live in a world that spans [-4,4] in each dimension. We use a World type to pass this information around so we can do the conversion:

data World = World { width :: Double,
                     height :: Double,
                     pixWidth :: Int,
                     pixHeight :: Int } deriving Show

modelToScreen :: World -> (Double, Double) -> (Float, Float)
modelToScreen world (x,y) =
  let xscale = (fromIntegral (pixWidth world)) / (width world)
      yscale = (fromIntegral (pixHeight world)) / (height world)
  in
    (realToFrac $ x * xscale, realToFrac $ y * yscale)

scaleFactor :: World -> Float
scaleFactor world =
  let xscale = (fromIntegral (pixWidth world)) / (width world)
      yscale = (fromIntegral (pixHeight world)) / (height world)
  in
   realToFrac $ max xscale yscale

Last but not least, we have the main function that drives the whole simulation. The “simulateInWindow” function is a helper from gloss that, given window parameters, timing parameters (e.g., framerate), and a function to be called on each frame, automatically handles driving the time stepper.

main :: IO ()
main = do
  let w = World { width = (maxx-minx), height = (maxy-miny), pixWidth = 700, pixHeight = 700 }
      bs = initialize 150 10.0 0.5
      t = foldl (\t b -> kdtAddPoint t (position b) b) newKDTree bs
  simulateInWindow
    "Boids"
    (pixWidth w, pixHeight w)
    (10,10)
    (greyN 0.1)
    30 
    t
    (renderboids w)
    iterationkd

Hopefully this post is informative to people who are interested in basic algorithm visualization in Haskell, flocking algorithms, agent-based simulation, or artificial life. Please keep in mind that this is just a simple example hacked together over a weekend and tuned during the evening in my hotel room during a workshop. The code could use a significant amount of polish, but in my opinion, is at a point where people reading the post will have enough information to play with it themselves. Enjoy!