I am using the dataset provided by Yelp.com to explore the spatial distribution of restaurants in Pittsburgh. I created a small program to parse the JSON file provided on the website and extract restaurant information from it. The program is available here.
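The parsing step can be sketched along these lines (a minimal Python sketch, not the original program; the field names follow the Yelp academic dataset's business records, and the list-valued `categories` field is an assumption, since some dataset versions store it as a comma-separated string):

```python
import json

def load_restaurants(path, city="Pittsburgh"):
    """Read a Yelp business file (one JSON object per line) and keep
    restaurants in the given city, with location, stars, and review count."""
    restaurants = []
    with open(path) as f:
        for line in f:
            biz = json.loads(line)
            if biz.get("city") == city and "Restaurants" in (biz.get("categories") or []):
                restaurants.append({
                    "name": biz["name"],
                    "latitude": biz["latitude"],
                    "longitude": biz["longitude"],
                    "stars": biz["stars"],
                    "review_count": biz["review_count"],
                })
    return restaurants
```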

The graph below shows the locations of pizza places plotted on the map of Pittsburgh. The colors of the nodes represent the Stars (ratings from reviewers), and the sizes represent the number of reviews.

I clustered the pizza places into five groups using the K-means algorithm, according to their geographical locations and numbers of reviews.

| Cluster | Number of Items | Center Latitude | Center Longitude | Center Review Count |
|---|---|---|---|---|
| Cluster 1 | 145 | 40.345 | -80.055 | 17.952 |
| Cluster 2 | 73 | 40.353 | -79.791 | 11.945 |
| Cluster 3 | 215 | 40.444 | -79.966 | 38.888 |
| Cluster 4 | 121 | 40.531 | -80.086 | 18.397 |
| Cluster 5 | 79 | 40.494 | -79.78 | 12.823 |
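For illustration, the clustering step can be reproduced with a minimal pure-Python version of Lloyd's algorithm (the post does not say which K-means implementation was used; this sketch only illustrates the idea, using latitude, longitude, and review count as the feature tuple):

```python
import math
import random

def kmeans(points, k, iterations=100, seed=0):
    """Lloyd's algorithm: assign each point to the nearest center,
    then move each center to the mean of its assigned points."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            # assign the point to its nearest center
            i = min(range(k), key=lambda c: math.dist(p, centers[c]))
            clusters[i].append(p)
        # recompute each center as the mean of its cluster
        new_centers = [
            tuple(sum(coord) / len(cl) for coord in zip(*cl)) if cl else centers[i]
            for i, cl in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return centers, clusters
```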

It is interesting that the center of Cluster 3 (red) has a significantly higher review count than the other clusters. Comparing the results with the map of Pittsburgh neighborhoods (Image: Tom Murphy VII.), most nodes of Cluster 3 are located in the central district and the areas near the city center. That may be why these pizza places receive more reviews, which implies that they are more popular.

Next, I clustered the pizza places into five groups according to their geographical locations and Stars instead of the number of reviews. This time, Cluster 3 (red) and Cluster 4 (light green) overlap considerably near the city center. The Stars of the centers of Clusters 3 and 4 are very different: 2.4 and 3.8 respectively. In other words, the pizza places in the same area are split into two groups by rating. Therefore, while pizza places near the city center are more popular than others, they are not rated higher. I find it very interesting to explore social science questions by analyzing social media data, and I hope to do more in the future!

I have developed a location model based on rent. In this model, the rent of each cell is calculated as the average income of the agents in that area. Agents have different income levels and requirements for space. Agents want to be located in the most accessible area they can afford where their preferences for space are matched.

There are two types of agents: residents and employers. Residents are high-income (e.g. financial services), middle-income (e.g. teachers and other professional occupations), or low-income workers, classed as ‘commerce’, ‘service’ and ‘industry’ respectively. These classes are additionally broken down by age as young (18-34), middle-aged (35-65) and old (66+). An agent’s age is assigned randomly (18-67) when it is first created. Each agent desires a certain amount of space, which is broken down by age category.

Employer agents were designed to reflect the residential agents’ employers, and the same three groups of ‘commerce’, ‘service’ and ‘industry’ were used to represent employers’ different roles. Instead of age, employers have a tenure, set between 0 and 6. Each iteration, an employer’s tenure decreases towards zero; once zero is reached, the employer can move. As with residents, employers have a space requirement. For example, industrial firms are driven by the need for large amounts of land, while financial services (i.e. ‘commerce’ employers) need less land but want a more central location. Each employer also has an income, which is four times that of residents.

It is assumed that younger residential agents move more frequently (every 2 iterations on average) than the middle-aged (every 5 iterations), with the older residents moving the least (every 10 iterations). Employers, on the other hand, only move when their tenure reaches 0. Once an employer agent has moved and found a suitable location, its tenure is reset to 6 and it cannot move for another 6 iterations of the model.

Agents of either type want to be located in the most accessible area they can afford where their preferences for space are matched. An alternative zonal system is used, based on a series of small overlapping areas; this allows agents to search the entire space without being restricted to fixed boundaries, and to identify clusters that spread across such boundaries.

When an agent decides to move, it goes through the list of areas and finds the most attractive one (in this model, attractiveness is based on accessibility). The agent initially moves to the centre of that area, then searches the area for an affordable neighborhood.
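The area-selection step can be sketched as follows (a sketch only; the `rent`, `accessibility`, and `free_space` fields are hypothetical stand-ins for the model's internal state):

```python
def choose_area(agent_income, agent_space, areas):
    """Pick the most accessible area the agent can afford and that meets
    its space requirement. Each area is a dict with 'rent', 'accessibility',
    and 'free_space'. Returns the chosen area, or None if nothing fits."""
    candidates = [a for a in areas
                  if a["rent"] <= agent_income and a["free_space"] >= agent_space]
    if not candidates:
        return None
    # among affordable areas with enough space, prefer the most accessible
    return max(candidates, key=lambda a: a["accessibility"])
```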

This is a Netlogo reimplementation of the pedestrian model in “Walk This Way:
Improving Pedestrian Agent-Based Models through Scene Activity Analysis”
by Andrew Crooks et al. The purpose of pedestrian models in general is to better understand and model how pedestrians utilize and move through space. This model makes use of mobility datasets from video surveillance to explore the potential that this type of information offers for the improvement of agent-based pedestrian models.

The visualization of the model looks like this:
(Grey boxes are the obstacles. Yellow triangles are the agents.)

Here is a video showing the simulation process:

There are 16 entrances and 18 exits in the model. An agent is created at an entrance and chooses one exit as its destination. Agents move towards their destinations along the shortest route while avoiding both the fixed obstacles and the other agents. The rule for selecting the shortest route is simple: set the visible patch with the lowest gradient as the target, and move towards it. A patch is visible if it is both within the agent's vision and not blocked by obstacles. The method of calculating gradients is explained in the following text.
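The target-selection rule can be sketched like this (the grid representation and the `visible` test are simplified assumptions; the original is a NetLogo patch model):

```python
def next_target(pos, gradient, visible):
    """Among the patches the agent can see (within vision and not blocked
    by obstacles, as decided by the `visible` predicate), pick the one
    with the lowest gradient value as the next movement target."""
    candidates = [p for p in gradient if visible(pos, p)]
    # if nothing is visible, stay where we are
    return min(candidates, key=lambda p: gradient[p], default=pos)
```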

Diagram of the route-planning algorithm:

Two types of empirical data are used in this model. Firstly, the empirical probability of choosing each entrance and exit is used when creating agents and assigning their entrances and exits. Secondly, the empirical data of how people moved on this map on August 25th is used to construct the gradient maps, according to which agents select their paths towards their destinations. The more frequently a patch was chosen as a path, and the closer it is to the destination, the lower its gradient. When the empirical gradient maps are not used, the gradient map is constructed purely from distance to the destination. Four scenarios are designed to compare the simulation results with the empirical result, in order to show how mobility data can help to improve pedestrian models.
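The post does not give the exact formula for combining the two ingredients, but a blend along these lines would match the description: higher observed path frequency and smaller distance to the destination both lower a patch's gradient (the weight `w` and the normalization are assumptions):

```python
def build_gradient(distance, frequency, w=0.5):
    """Combine distance-to-destination with normalized empirical path
    frequency: frequently walked patches get a discount on their
    distance-based gradient, so popular paths read as 'downhill'."""
    max_freq = max(frequency.values()) or 1  # avoid division by zero
    return {p: distance[p] * (1 - w * frequency[p] / max_freq)
            for p in distance}
```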

Scenario 1: No Realistic Information about Entrance/Exit Probabilities or Heat Maps
In this scenario, entrance and exit locations are considered known, but traffic flow through them is considered unknown. Under such conditions, we run the model to understand its basic functionality without calibrating it with real data about entrance and exit probabilities or activity-based heat maps. This will serve as a comparison benchmark, to assess later on how calibrating the ABM with such information improves (or reduces) our ability to model movement within our scene.

Scenario 2: Realistic Entrance/Exit Probabilities But Disabled Heat Maps
In this scenario, we explore the effects of introducing realistic entrance and exit probabilities into the model. The heat maps are not informed by the real datasets; instead, we use distance-based gradients (i.e., agents choose an exit and walk the shortest route to that exit).

Scenario 3: Realistic Heat Maps but Disabled Entrance/Exit Probabilities
In this scenario we introduce real data-derived heat maps into the model calibration. These activity-based, heat map-informed gradients are derived from harvesting the scene activity data; however, entrance and exit probabilities are turned off. In a sense, one could consider this a very simple form of learning how agents walk on paths more frequently traveled within the scene. It also allows us to compare the extent to which the quality of the results is due to the heat maps versus the entrance and exit probabilities.

Scenario 4: Realistic Entrance/Exit Probabilities and Heat Maps Enabled
In the final scenario we use all available information to calibrate our ABM, namely the heat map-informed gradients and the entrance-exit combinations, and see how this knowledge impacts the performance of the ABM.

Please note that there is one gradient map for each pair of entrance and exit; therefore, 16 * 18 = 288 maps are loaded. However, the final result is compared to only one path frequency map, which is the empirical data obtained on August 25th. Also note that, when the entrance/exit probability table is used, some entrances and exits have a probability of zero of being chosen. When the table is not used, agents simply choose entrances and exits at random.

I built a model of pedestrians who try to leave the floor through one or
two exits. The map being used is from GMU’s Krasnow Institute. The model
records the frequency of each cell being chosen as a path and draws the
result into a path graph, which can be exported to ArcGIS for further
analysis.

Here is a graph showing the path graph opened in ArcGIS:

Here is a video showing the simulation process:

Each patch has a variable called elevation, which is determined by (1) the shortest distance to the exit, and (2) if the patch is in a room, its elevation is lower the closer it is to the room's gate. If there is more than one exit patch, the elevation is equal to the shortest distance to the closest exit patch. People use the gravity model (always flow to lower elevation, if space is available) to move to the exit.

In this model, the “elevation” of a patch is decided by its distance to the exits as well as how close it is to the gate of its room, so that people can run out of rooms. When running the model, people always try to move to lower elevation. This algorithm can also be used to build a rainfall model to analyze the movement of raindrops on the ground. See this link for the Rainfall model. (http://geospatialcss.blogspot.com/2015/10/rainfall-model-of-crater-lake-national.html)
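The elevation computation can be sketched as a breadth-first flood fill from the exit cells (a simplified grid version; the actual model works on NetLogo patches and additionally lowers elevation near room gates):

```python
from collections import deque

def compute_elevation(width, height, exits, obstacles=frozenset()):
    """Breadth-first flood fill from the exit cells: each cell's elevation
    is its shortest grid distance to the nearest exit, so agents can
    simply step 'downhill' to escape."""
    elevation = {e: 0 for e in exits}
    queue = deque(exits)
    while queue:
        x, y = queue.popleft()
        for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if (0 <= nx < width and 0 <= ny < height
                    and (nx, ny) not in obstacles and (nx, ny) not in elevation):
                elevation[(nx, ny)] = elevation[(x, y)] + 1
                queue.append((nx, ny))
    return elevation
```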

I have also added an export function that writes the path frequency graph to an asc file. You may open the file in ArcGIS for further analysis.

This is a model of agents who try to leave the room through the exit on the right-hand side. The model also records the frequency of
each cell being chosen as a path and draws the result into a path graph,
which can be exported to GIS for further analysis.

Here is a graph showing the path graph opened in GIS:

In order to calculate the “elevation”, each patch calculates its distance to each exit patch and sets the lowest distance as its elevation.
When running the model, people always try to move to lower elevation.
This algorithm can also be used to build a rainfall model to analyze the
movement of rain drops on the ground.

This is a path-finding model that uses the A-star algorithm to find the shortest path. The model uses the map of George Mason University, including the buildings, walkways, driveways, and water. Commuters randomly select a building as their destination, then find and follow the shortest path to reach it.

The following is the original map this model uses. It has been simplified in the model for faster computation.

Here is a video showing the process:

How does it work?

In the beginning, each commuter randomly selects a destination and then identifies the shortest path to it. The A-star algorithm is used to find the shortest path in terms of distance. Commuters move one node per tick. When they reach the destination, they stay there for one tick, then find the next destination and move again.

The code for path selection can be explained simply as follows:

Each node has a variable "distance" that records the shortest distance to the origin. It is set to 9999 by default. The origin has distance 0.

While not all nodes have updated their neighbors:
    ask those nodes to update their neighbors:
        if the distance through this node is shorter than a neighbor's existing distance, update the neighbor and mark the updated neighbor as "has not updated its neighbors"
    mark the node as "has updated its neighbors"

The loop stops when all nodes have updated their neighbors, in other words, no node can be updated with a shorter distance. The nodes of the shortest path are then put into a list for the commuter to follow.
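The loop above amounts to repeated edge relaxation, which can be sketched like this (the graph structure, node names, and edge lengths are illustrative, not the model's actual data):

```python
def shortest_distances(graph, origin):
    """Repeatedly relax edges until no node's distance can be improved.
    `graph` maps each node to a list of (neighbor, edge_length) pairs."""
    INF = 9999  # the model's default 'distance'
    dist = {node: INF for node in graph}
    dist[origin] = 0
    pending = {origin}  # nodes that still have to update their neighbors
    while pending:
        node = pending.pop()
        for neighbor, length in graph[node]:
            if dist[node] + length < dist[neighbor]:
                dist[neighbor] = dist[node] + length
                pending.add(neighbor)  # the neighbor must update its own neighbors again
    return dist
```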

How is the map simplified?

For faster computation, this model simplifies the original data by reducing the number of nodes. To do that, the walkway data is loaded onto the 20 x 20 grid in NetLogo, which is small, so many nodes fall on the same patch. We only want to keep one node per patch, so duplicate nodes are removed and their neighbors are connected to the remaining node.
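The deduplication step can be sketched like this (the `cell_size` parameter and the coordinate handling are assumptions; the model works on NetLogo patches):

```python
def simplify_nodes(nodes, edges, cell_size):
    """Snap each node to a grid cell, keep one representative node per cell,
    and rewire edges to the representatives (dropping self-loops)."""
    representative = {}
    for node in nodes:
        cell = (int(node[0] // cell_size), int(node[1] // cell_size))
        representative.setdefault(cell, node)  # first node seen wins
    def rep(node):
        cell = (int(node[0] // cell_size), int(node[1] // cell_size))
        return representative[cell]
    new_nodes = set(representative.values())
    new_edges = {(rep(a), rep(b)) for a, b in edges if rep(a) != rep(b)}
    return new_nodes, new_edges
```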

Also, links are created in this model to represent roads. This is so far the best way I have found to deal with road-related problems in NetLogo. However, because links are created by connecting nodes one by one (see the code for details), some roads are likely to be left out, and I have not found a better way around this. Therefore, I also use a loop in setup to delete nodes that are not connected to the whole network.

Recently I have created a segregation model with the calculation of Moran's I, a measure of spatial autocorrelation developed by Patrick Alfred Pierce Moran. In this model, I am using the map of Washington, DC. The data is in vector format.

Each turtle here represents a household that is either blue or red. All turtles want to have neighbors of the same color. The simple rule is that they move to unoccupied patches until they are happy with their neighbors.

Here is the map I am using in this model.

In the beginning, 10 to 80 turtles are created in each polygon,
depending on the population data. Turtles are either blue or red. Red
polygons have 60% red and 40% blue. Blue polygons have 60% blue and 40%
red.

In each tick, turtles look at two kinds of neighborhoods to decide whether they are happy: one is their geometrical neighboring polygons; the other is their 8-connected patch neighbors. If the share of differently colored neighbors in either neighborhood exceeds the specified threshold, the turtle is unhappy and moves to an unoccupied patch in a polygon that is either unoccupied or has the same color as the turtle. The color of each polygon is decided by the majority of turtles living in it, and the colors are updated every tick.
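The happiness test can be sketched as follows (a simplification; in the model the two neighbor lists come from polygon adjacency and the 8-connected patch grid):

```python
def is_unhappy(my_color, neighborhoods, threshold):
    """A turtle is unhappy if, in either neighborhood (polygon neighbors
    or 8-connected patch neighbors), the fraction of differently colored
    neighbors exceeds the threshold."""
    for neighbors in neighborhoods:
        if not neighbors:
            continue  # no neighbors here, nothing to object to
        different = sum(1 for c in neighbors if c != my_color)
        if different / len(neighbors) > threshold:
            return True
    return False
```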

Here is a video recording the simulation process.

How to identify polygon neighbors?

It is tricky to find the geometrical neighbors of each polygon, since NetLogo does not have this function. I used the Polygon Neighbors function in ArcGIS 10.2 to create a text file which maps each polygon to its neighbors. Then I deleted unnecessary information like headers and had NetLogo read the file. Notice that neighbors are polygons that share either a boundary (edge) or a corner (node).
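Reading the neighbor file back can be sketched like this (the exact column layout of the ArcGIS output is an assumption; here each line holds a polygon ID followed by one neighbor ID, with headers already stripped):

```python
def read_polygon_neighbors(lines):
    """Parse 'polygon_id neighbor_id' pairs into a dict mapping each
    polygon to the set of its neighboring polygons."""
    neighbors = {}
    for line in lines:
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        src, nbr = int(parts[0]), int(parts[1])
        neighbors.setdefault(src, set()).add(nbr)
    return neighbors
```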

How to export to ArcGIS?

There is an Export button to export the map to GIS. It exports the current map to finalmap.csv in the data folder. The information includes the color and the percentage red for each polygon. To analyze it in ArcGIS, open the csv file in ArcGIS and export the data as a dbf file to replace the original DC.dbf file.

How to calculate Moran's I and verify it?

Moran’s I is a measure of spatial autocorrelation. Values range from −1 (indicating perfect dispersion) to +1 (perfect correlation); if the items are randomly distributed, Moran’s I is close to 0. There is a slider to choose whether to do row standardization or not. Row standardization is a technique for adjusting the weights in a spatial weights matrix: each weight is divided by its row sum, which is the sum of the weights of a feature’s neighbors.
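The statistic itself can be computed like this (a plain-Python sketch with optional row standardization; the weight matrix here is illustrative, while the model derives its weights from the polygon adjacency):

```python
def morans_i(values, weights, row_standardize=False):
    """Moran's I = (n / S0) * sum_ij w_ij (x_i - xbar)(x_j - xbar)
                   / sum_i (x_i - xbar)^2,
    where S0 is the sum of all weights and weights[i][j] is the spatial
    weight between features i and j."""
    n = len(values)
    if row_standardize:
        # divide each weight by its row sum (the summed weights of i's neighbors)
        standardized = []
        for row in weights:
            s = sum(row)
            standardized.append([w / s if s else 0.0 for w in row])
        weights = standardized
    xbar = sum(values) / n
    dev = [x - xbar for x in values]
    s0 = sum(sum(row) for row in weights)
    num = sum(weights[i][j] * dev[i] * dev[j]
              for i in range(n) for j in range(n))
    den = sum(d * d for d in dev)
    return (n / s0) * (num / den)
```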

I have verified the Moran's I calculated in my model against ArcGIS, and the values are the same. To verify it, open the final map in GIS and create a new numeric field equal to the percentage-red field. Then use the tool "Spatial Autocorrelation (Morans I)" in ArcGIS: choose the numeric field as the input, "CONTIGUITY_EDGES_CORNERS" as the conceptualization of spatial relationships, and whether to do row standardization. See below for the settings.