Modeling Audience Group Behavior


This page describes a project by Nuria Oliver and Stephen Intille for the MIT Media Lab Spring 1996 class Modeling Autonomous Agents.

The Goal

The goal of this project is to implement a system that can model audience behavior using multiple, communicating agents. Each agent is a simple model of a person sitting in a grid-like auditorium. The person can listen for different types of stimuli, such as clapping and whistling, coming from nearby audience members. Each audience agent then has instructions on how to respond to what it observes. For example, some agents may be given the instruction "Clap in sync with the people who are clapping," while another group of agents may be given the instruction "Whistle for a second every 4 claps you hear," and so on. There can be several different types of audience agents, each with different behaviors. The idea is to try to construct audience simulations that start off with chaotic, random behavior and eventually converge to some interesting "collaborative" behavior like a special rhythm or a visual display. The higher-level structure emerges from the individual behaviors in a completely decentralized way. Agents have no particular language and do not explicitly communicate with each other. Communication happens through the environment: at each instant, an agent pays attention to how other agents are acting and have acted in the recent past.

The inspiration for this project came from a class brainstorming session on ways that a speaker at a graduation with several thousand people could do some type of interactive task with the audience. A model that works successfully in our system could be tested with a real crowd by taping small slips of paper to auditorium chairs before the crowd arrives and then surprising the crowd during a talk by asking everyone to reach under their chair, get their instruction, and perform the action described. With a bit of luck, chaos would gradually turn into collaborative behavior.

This document describes our current system, the tests we have performed, and the issues that it raises.


The Agent Types

Each audience simulation can use up to ten different types of agents. The agent types are defined by the user by setting four parameters (a code sketch collecting all the per-type settings follows this list):
Action Type
Each agent has a type of action that it can perform. Currently the actions are clap, whistle, or do nothing. New actions, such as displaying a card of a certain color, can easily be added. Agents are limited to performing a single type of action.
Listen Type
Each agent can perceive one or more types of behavior. For example, some agents might listen for clapping, whereas other agents might listen for both whistling and clapping. The resulting agent behavior depends on the types of behavior it is listening to.
Desired Frequency
An agent has the goal of performing its designated action at a certain frequency relative to the frequency of those neighbors whose behavior it is able to perceive. For instance, one can define an agent that whistles once for every four claps it perceives. The details of how the agent perceives a clap, and how it determines its sync frequency with respect to the group, are discussed later.
Connection Type
All agents can "listen" to a grid of agents centered on their own position. The size of this grid can be changed interactively by the user. For example, if the user sets a grid size of 1 x 1, the agent attends to a grid with one seat in front, behind, left, and right (in other words, an 8-connected region). A grid of size 2 x 2 means an agent listens to 24 agents (25 minus the agent itself). Clearly, such a simple listening strategy does not accurately model real perception in an auditorium-type situation. Audience members can attend to stimuli from many more people than those immediately next to them, including people making noise who may be hundreds of seats away. Moreover, the people who are listened to are probably not uniformly distributed; in addition to distance, factors like tone and loudness determine whether a person is attended to or not. The computational load of modeling each agent listening to hundreds of other agents prevented us from implementing such models, but we think even local neighborhoods should be useful for studying audience behaviors. If the user is willing to wait, a very large listening neighborhood can still be set.
For each type of agent, the user also sets two variables:
Percentage
This number indicates the percentage of the audience of the given type. For example, in a simulation with three agent types, the audience can be set to be 30% of agent type 1, 20% of agent type 2, and 50% of agent type 3. The agents are randomly distributed in the seating grid.
Color
Each type of agent has a different color, which is shown at the agent's grid location every time the agent acts. The color is selected interactively by the user.
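
As a concrete illustration, these per-type settings might be gathered into a structure like the one below. This is a minimal hypothetical C++ sketch with our own field names, not the actual project code:

    #include <string>
    #include <vector>

    enum class Action { None, Clap, Whistle };

    // One user-defined agent type (hypothetical sketch; the real program
    // stores equivalent settings entered through the Tcl/Tk interface).
    struct AgentType {
        Action action;                   // the single action this type performs
        std::vector<Action> listensFor;  // behaviors this type can perceive
        int desiredFrequency;            // e.g. whistle once per 4 perceived claps
        int listenRadius;                // radius r gives a (2r+1) x (2r+1) neighborhood
        double percentage;               // fraction of the audience of this type
        std::string color;               // display color for the agent's seat
    };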

The System

The images below show two screen shots. The first is the main user interface. The user can adjust the size of the seating grid, the number of agent types, and the maximum number of iterations. It is possible to turn the display on and off so that a very large number of iterations can be run quickly. The Tcl/Tk interface is reasonably fast. However, displaying a grid with hundreds of agents might take around one second.

Once the number of agent types is set, an additional window allows the user to specify the characteristics of each type, among them the percentage of the audience of that type and its color. Up to ten agent types can be created. After entering all the features for each type of agent, a new audience can be created by pressing the Create Audience button.

Once the audience has been created, a simulation of its behavior can be run by hitting Run Audience. The audience agents then start performing their behaviors, and their activity is represented by the user-selected colors in the audience grid. Whenever an agent is acting, i.e., performing its specific behavior, its seat flashes with the agent's color. For instance, a clap action with a desired frequency of 2 might be represented by a "red seat" lighting up in the grid on the clock tick when an audience member performs the clap. The following image is a screen shot of the audience grid at one random clock tick during a run.

The simulation continues to run until it reaches the desired number of iterations. With the display deactivated and an 8-connected listening grid, a grid with several hundred seats can be run for several hundred iterations in just a few seconds.
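
In outline, one iteration of such a run might look like the loop below. This is a hypothetical sketch under our own names; Agent::decide, Agent::record, and drawGrid are stand-ins for the behavior and Tcl/Tk display code described elsewhere in this document:

    #include <cstddef>
    #include <vector>

    struct Agent {
        bool decide(const std::vector<Agent>& all);  // reads neighbors' histories
        void record(bool acted);                     // appends to own 16-step memory
    };
    void drawGrid(const std::vector<Agent>& agents); // Tcl/Tk seat display

    // All agents decide from the previous tick's histories before any
    // history is updated, so the update is synchronous.
    void runAudience(std::vector<Agent>& agents, int maxIterations, bool displayOn) {
        for (int t = 0; t < maxIterations; ++t) {
            std::vector<bool> acts(agents.size());
            for (std::size_t i = 0; i < agents.size(); ++i)
                acts[i] = agents[i].decide(agents);
            for (std::size_t i = 0; i < agents.size(); ++i)
                agents[i].record(acts[i]);
            if (displayOn)
                drawGrid(agents);  // flash the seats of acting agents
        }
    }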


Temporal Perception

One tricky aspect of the simulation is modeling temporal behavior. If several of the agents that one agent listens to all acted in the last second, but at different times, what time does the agent use to decide when it should perform its own action? It needs to do some type of simple frequency detection.

For instance, an agent who is supposed to whistle every 2 claps needs to listen to other agents and try to detect their clapping frequency. Since it may hear many non-synchronized claps within a short time period (particularly at the start of the simulation before behavior has converged), it needs a method for determining a frequency of clapping given what it heard.

It seems obvious that agents need memory in order to infer their neighbors' frequency from the environment. In our simulation, each agent stores a memory of what it has done over the last sixteen time steps (the memory length is a parameter of the program). Other agents, those whose neighborhoods include the particular agent, can then access that representation, thereby knowing what their neighbors did in the last sixteen time steps. Agents use this temporal information to compute their own action frequencies.
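
For example, the sixteen-step memory can be kept as a small per-agent ring buffer, as in the hypothetical sketch below (the names are our own; only the memory length of 16 comes from the program):

    #include <array>

    constexpr int MEMORY = 16;  // memory length, a program parameter

    // Per-agent action history (hypothetical sketch). acted(k) answers
    // "did this agent act k ticks ago?" for k = 0 .. MEMORY - 1.
    struct History {
        std::array<bool, MEMORY> acts{};  // circular buffer of past actions
        int head = 0;                     // index of the most recent tick

        void record(bool acted) {
            head = (head + 1) % MEMORY;
            acts[head] = acted;
        }
        bool acted(int ticksAgo) const {
            return acts[(head - ticksAgo % MEMORY + MEMORY) % MEMORY];
        }
    };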

The method currently in use works as follows:

The agent looks at the temporal history of its neighbors. Assume the agent is listening for claps. For each of the last sixteen time steps, it sums up the number of claps made by neighbors during that clock tick. To compute its neighbors' frequency, it looks for the two time steps with the most claps and declares the "clapping frequency" to be the amount of time between them. In effect, a majority rule determines the frequency, which seems close to what we humans do when trying to determine the global behavior of a group. The agent then decides when it needs to clap next to satisfy its goal. If the agent cannot find two time steps that have more claps than the rest, it randomly decides when to clap next. Other, more complex frequency detection strategies could be implemented.
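
A sketch of this rule, reusing the MEMORY constant from the history sketch above (again hypothetical names, not the actual project code): sum the claps heard at each remembered tick, find the two busiest ticks, and take the gap between them as the period, falling back to a random choice when no two ticks stand out:

    #include <cstdlib>

    // counts[k] = number of listened-to neighbors that clapped k ticks ago.
    // Returns the inferred clapping period in ticks, or a random period
    // when no two time steps have more claps than the rest.
    int detectPeriod(const int (&counts)[MEMORY]) {
        int best = 0, second = 1;  // indices of the two busiest ticks
        if (counts[second] > counts[best]) { best = 1; second = 0; }
        for (int k = 2; k < MEMORY; ++k) {
            if (counts[k] > counts[best])        { second = best; best = k; }
            else if (counts[k] > counts[second]) { second = k; }
        }
        int third = 0;             // most claps among the remaining ticks
        for (int k = 0; k < MEMORY; ++k)
            if (k != best && k != second && counts[k] > third) third = counts[k];
        if (counts[second] <= third)             // no clear pair of peaks:
            return 1 + std::rand() % MEMORY;     // choose the next clap time at random
        int gap = best - second;
        return gap < 0 ? -gap : gap;             // time between the two busiest ticks
    }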

The situation is actually a bit more complicated than this, because some agents listen to two types of behavior: their own and another type. If an agent is supposed to whistle every three claps, it is possible for every agent in that group to whistle every three claps without all whistling together. If the agents are also supposed to listen to whistling, the desired behavior is that each agent whistles every three claps and they all whistle together.

Audience Simulations

We ran a number of different audience simulations; the general trends that summarize them are described below. We found that the system performs as we expected, so it seems to be an adequate initial model of this type of audience behavior.

  1. Two groups, not inter-related
     When two groups of agents are run that do not listen to each other (for instance, a clap group listening only to the clap group and a whistle group listening only to the whistle group), each converges rapidly within its group, but the two groups do not converge with each other, since they are completely independent. In this case, the environment is only used as a communication medium within each of the groups.

  2. Two groups, inter-related, same frequency
     An example of this case is clap listens to whistle, whistle listens to clap, both with a frequency of one. This simulation rapidly converges so that everyone is acting simultaneously.

  3. Two groups, inter-related, different frequency
     This case will either converge or diverge, depending upon who is listening to what. An illustration of convergence is clap listens to clap and whistle listens to clap, with frequencies of one and two respectively. Here, the clapping group synchronizes its clapping and the whistling group whistles every two claps. However, if the test run is clap listens to clap and whistle, and whistle listens to clap, the clappers can never converge, because the whistlers are trying to synchronize to the clappers, who are in turn trying to synchronize to the whistlers, and so on.

Audience parameters and behavior analysis


Problems/Issues


Conclusions and Possible Extensions

Many extensions are possible. The ultimate and most interesting extension would be to test a working model with a real audience!


Technical Details

The code is written in C++ and Tcl/Tk, and the current version has been compiled on an SGI Indy workstation. A significant amount of time was required to get the interface code running properly, since new Tcl/Tk commands had to be added and implemented in C++. The code is available by request to anyone who wants it, but it is not supported. If you'd like a copy or have any questions or feedback about this project, contact us. We will be happy to answer your questions, and any constructive criticism is welcome.


Who Did What?

Interface Grid code modifications: Nuria
Agent behavior code: Nuria and Stephen
Test runs: Nuria and Stephen
Video: Nuria and Stephen
Report/Web page: Stephen
Review of Report: Nuria and Stephen
Presentation: Nuria and Stephen


And Finally...

Nuria Oliver Ramirez and Stephen Intille were graduate students at the Vision and Modeling Group of the MIT Media Lab when they did this project. In addition to fun projects like this one, they also do research in computer vision and perception understanding.

We'd appreciate any comments that you may have on this work or on possible extensions.


Last modified: 5/21/96 by intille@media.mit.edu & nuria@microsoft.com.