Analyzing Urban Mobility with Uber Movement Data

Mapshaper is free and open-source software for spatial data processing. It is written in JavaScript, runs in your browser without any extra plugins, and can perform a wide range of analyses. It started out as a tool for topologically-aware simplification, but has evolved into a Swiss Army knife of spatial data processing tools. All processing is done locally in the browser, and I have found that it can handle large volumes of data easily; processing is usually much faster than in desktop-based GIS software.

I had recently learnt about Uber Movement data, which shares anonymized data aggregated from over ten billion trips to help urban planning around the world. The data is freely available for download in open formats. The datasets include information such as zone-to-zone travel times and speeds along road segments.

I set out to analyze this data entirely in the browser, using tools such as Mapshaper and geojson.io. Below is a step-by-step tutorial that takes the Uber Movement data for the city of Bengaluru and analyzes it using the Mapshaper web interface. The tutorial will try to answer the following questions from the data:

  • Given a place in the city, find all places within 30 minutes of driving distance.
  • Given an origin and a destination, find the best and worst day of the week for commuting.

Get the Data

We will use the Travel Times Uber Movement data for Bangalore. Download the Travel Times by Day of the Week dataset for 2019 Quarter 1. This data comes as a CSV file named bangalore-wards-2019-1-WeeklyAggregate.csv. Also download the Geo Boundaries file, which has the ward boundaries for Bangalore in GeoJSON format. This will be a file named bangalore_wards.json.
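
For reference, the weekly aggregate CSV has one row per origin/destination zone pair per day of the week. The header row looks roughly like the line below (these are the column names referenced later in this tutorial; the exact set of statistic columns in your download may differ slightly).

sourceid,dstid,dow,mean_travel_time,standard_deviation_travel_time,geometric_mean_travel_time,geometric_standard_deviation_travel_time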

Procedure

1. Open the Mapshaper Web Interface by going to https://mapshaper.org/

2. You can drag-and-drop files to load them in mapshaper. Drag the bangalore_wards.json file to the mapshaper tab. Once loaded, click Import. You will notice that files load instantly, even large ones. This is because the data is not uploaded anywhere. All processing is done in the browser, so your data stays private, even when using the website.

3. Once the ward boundaries are loaded, you can click the info button and hover over any feature to display the attributes.

4. The web user interface can be used for basic tasks like input, output and feature simplification. For all other operations, you must type commands into the Console. Click the Console button in the top-right corner.

5. At the console prompt, type help to see the full list of commands available.
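
You should also be able to get help for a specific command by typing help followed by the command name, for example:

help join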

6. Let’s explore basic data editing operations. Type info at the prompt and press Enter. The command returns basic information about the bangalore_wards layer, including the different fields in the attribute data. Each feature in this layer has 4 attributes: DISPLAY_NAME, MOVEMENT_ID, WARD_NAME and WARD_NO.

7. Let’s delete one of the attributes from the layer, which is not needed for our analysis. Enter the following expression in the console. The main command here is each, which runs the given JavaScript expression on each feature. The delete DISPLAY_NAME expression deletes the DISPLAY_NAME attribute. The target option specifies which of the input layers the command applies to.

each 'delete DISPLAY_NAME' target=bangalore_wards

8. You can click the info button and hover over any feature to verify that the DISPLAY_NAME attribute is indeed deleted. Now, instead of deleting, let’s add another attribute. Having information about the area of each ward is useful. Mapshaper has a built-in property called this.area which returns the area of the polygon in square meters. Enter the following expression in the console to add a new property AREA_KM2 which contains the area in square kilometers for each feature.

each 'this.properties = {WARD_NAME:WARD_NAME, WARD_NO:WARD_NO, 
  MOVEMENT_ID:MOVEMENT_ID, AREA_KM2:this.area/1e6}' 
  target=bangalore_wards

9. Let’s do some geoprocessing on this layer. We have polygons for individual wards with an attribute for their area. We can run a Dissolve operation to merge these into a single polygon while summing the AREA_KM2 property. The result will be another layer representing the administrative boundary of the city with its total area.

dissolve sum-fields=AREA_KM2 name=dissolved +
 target=bangalore_wards

10. Another common GIS operation is a spatial join. Let’s see how we can do it in mapshaper. First, let’s create a new layer which has a point of interest. An easy way to create data is http://geojson.io. Go to this site and locate the place of your interest by zooming/panning the base map. Click the Draw a marker button and drop a pin at the location to automatically create a GeoJSON representation of it. You can also add a property called name and fill it with the name of the place. Once satisfied, go to Save → GeoJSON. A new file named map.json will be downloaded to your computer. See the sample below for roughly what this file will contain.
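
The downloaded map.json will be a GeoJSON FeatureCollection containing your marker. It will look something like this (the coordinates and name below are just an illustration; yours will reflect the location you picked and the name you entered):

{
  "type": "FeatureCollection",
  "features": [
    {
      "type": "Feature",
      "properties": { "name": "My Office" },
      "geometry": { "type": "Point", "coordinates": [77.5946, 12.9716] }
    }
  ]
}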

11. Drag-and-drop map.json to mapshaper and click Import. You will see the point of interest displayed.

12. By default, only the active layer is displayed. But it is useful to see all the layers overlaid together. Click the Layers dropdown at the top and click the eye icon to turn on other layers.

13. Let’s join the point-of-interest map layer with the bangalore_wards layer. This operation will extract the attributes from the polygon in which the point is located.

join bangalore_wards target=map

14. Once the join is complete, inspect the point of interest. You will notice additional attributes extracted from the intersecting ward polygon. Note the MOVEMENT_ID value, which we will need later.
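
If you prefer the console, you should also be able to print these attributes (including MOVEMENT_ID) by targeting the point layer with the info command:

info target=map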

15. Now, let’s import the CSV file containing the travel time data. Drag-and-drop bangalore-wards-2019-1-WeeklyAggregate.csv into the mapshaper tab. Mapshaper tries to guess the data type of each field in the CSV file. In our case, the CSV file has numeric fields containing the origin and destination zone ids, but the wards layer stores them as text fields. For the join to work, the field types on both sides must match, so we can provide a type hint to mapshaper before importing this data. Enter string-fields=sourceid,dstid in the command line options box and click Import. Once imported, you can run the info command to verify that the attributes were indeed imported as string (text).
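
If you have already imported the CSV without the type hint, one possible fix (a sketch, not the approach used in the rest of this tutorial) is to convert the fields to strings after import with an each expression:

each 'sourceid = String(sourceid), dstid = String(dstid)'
 target=bangalore-wards-2019-1-WeeklyAggregate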

16. We are now ready to answer the first question we set out to answer: find all places within 30 minutes of driving distance from the point of interest. Recall the MOVEMENT_ID that we noted for the point of interest. We can filter the travel times data to all records whose sourceid matches that MOVEMENT_ID. This will give us the records containing travel times from the zone of our point of interest to all other zones. (Replace 45 with the MOVEMENT_ID of your point of interest.) Notice that the input file contains more than 250k records, but the filter operation is almost instantaneous.

filter 'sourceid == 45' name=filtered + 
 target=bangalore-wards-2019-1-WeeklyAggregate

17. A new layer filtered will be added containing the subset of records for our analysis. Now we need to join these records to the ward polygons. This operation is known as a Table Join. We need to provide field names from both layers that will be used to match records to the appropriate geometry. As the input data has origin/destination records for each day of the week, we will have 7 records for each polygon. For such many-to-one joins, we also need to provide a calc expression to compute a statistic. In our case, we can use the median() function to calculate the median travel time across all 7 days of the week.

join filtered keys=MOVEMENT_ID,dstid 
 calc='median_time = median(mean_travel_time), join_count = count()'
 target=bangalore_wards

18. The bangalore_wards layer has now been joined with the travel time records; each zone polygon now has a new attribute median_time containing the median travel time in seconds from the zone containing our point of interest.
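
Since the travel times are in seconds, you may optionally add a more readable attribute in minutes. This step is not required; the rest of the tutorial keeps working in seconds. The expression below skips zones that did not get a matching record.

each 'median_time_min = median_time != null ? Math.round(median_time / 60) : null'
 target=bangalore_wards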

19. To find all the zones that fall within 30 minutes of the origin zone, we can apply a filter to select the zones with median_time less than or equal to 30 × 60 = 1800 seconds.

filter 'median_time <= 1800' name=30mins + target=bangalore_wards

20. A new layer 30mins with all the zones within 30 minutes is created. We can dissolve it to get the entire region that meets our criteria.

dissolve target=30mins

21. There you have it. We have performed multiple GIS and statistical operations to find the region that is accessible within 30 minutes from our point of interest.
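
If you want to save this intermediate result, you can use the Export button, or (I believe equivalently) run the o command in the console to download the layer, for example:

o format=geojson target=30mins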

22. Mapshaper also has some basic styling options to visualize the result better. Run the following commands to set the styling for our layers. Make sure to run each statement individually and in the sequence given below.

style fill=#f0f0f0 stroke=#bdbdbd target=bangalore_wards
style stroke=#2b8cbe target=dissolved
style label-text=name dx=15 target=map
style stroke=red fill=#fee8c8 opacity=0.5 target=30mins

23. We can also create an isochrone map by categorizing the travel times and assigning a color ramp. This is done in 2 steps. First, we define a color ramp using the colorizer command, and next we apply the color ramp to the layer. The commands below produce a visualization showing the 15, 30 and 45 minute travel zones. The resulting visualization can be exported as SVG using the Export button in the top-right corner. The color values are stored as attributes that can be used in a graphics program such as Inkscape to create a high-quality graphic.

colorizer name=traveltime colors='#f0f9e8,#bae4bc,#7bccc4,#2b8cbe'
 breaks=900,1800,2700
style fill=traveltime(median_time) target=bangalore_wards

24. Now let’s try to answer the second question: given an origin and a destination, find the best and worst day of the week for commuting. Pick an origin zone and a destination zone and note their MOVEMENT_ID values. We can filter the data to all records matching this origin and destination that occur on weekdays. (In the command below, replace 45 and 164 with your chosen origin and destination zone ids.)

filter 'dstid == 164 && sourceid == 45 && "1,2,3,4,5".indexOf(dow) > -1'
 name=commute + target=bangalore-wards-2019-1-WeeklyAggregate

25. The new layer commute should contain only 5 records, 1 for each day of the work-week. We can sort the records by mean_travel_time in descending order and use a calc expression to pick the days: with the descending sort, first(dow) returns the worst day for this commute (longest travel time) and last(dow) returns the best day (shortest travel time). The calc command prints its result in the console.

sort 'mean_travel_time' descending -calc 'first(dow)'
sort 'mean_travel_time' descending -calc 'last(dow)'

Data Credits

  • Travel Times: Data retrieved from Uber Movement, (c) 2019 Uber Technologies, Inc., https://movement.uber.com.
  • Ward Boundaries: Maps Provided by Spatial Data of Municipalities (Maps) Project [http://projects.datameet.org/Municipal_Spatial_Data/] by Data{Meet}.
