This weekend, I got an opportunity to volunteer with a non-profit called Junglescapes. We took a day trip to the Bandipur forest in Karnataka, where they have done extensive forest restoration work. One of their success stories is working with the locals to remove invasive species such as Lantana from the forest. Junglescapes volunteers and locals carry out regular line transect surveys to determine the impact of their interventions. One of the goals of my participation was to see if we could replace the cumbersome paper forms and handheld GPS devices with a mobile-phone-based survey using ODK. I am sharing my notes on how we set up the survey and mapped the results.

Setting up ODK

I created a survey form using the XLSForm standard. The form is quite simple and uses the geopoint field type to collect the Lat/Lon of the start and end points of the transect. There is also a repeated field of type image to capture pictures along the transect. You can make a copy of this XLSForm spreadsheet and use it to customize your transect survey. Once the spreadsheet was ready, I converted it to XForm using the online converter. The resulting XML file was uploaded to an ODK Aggregate server running on AppEngine. Individual Android devices running the ODK Collect app were configured to use this server, and the forms were downloaded to the devices. You may also skip the server and copy the XML directly to mobile devices running the ODK Collect app.
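
As a rough illustration of the form layout described above, here is a minimal Python sketch that produces the rows of an XLSForm survey sheet as CSV text. The field names (start_point, photos and so on) and the labels are hypothetical, not the ones from the actual Junglescapes form. Only the type, name and label columns and the geopoint, integer, image and begin repeat / end repeat types come from the XLSForm standard.

```python
import csv
import io

# Rows of the XLSForm "survey" sheet as (type, name, label).
# All names and labels are made up for illustration.
SURVEY_ROWS = [
    ("geopoint", "start_point", "Start of the transect"),
    ("geopoint", "end_point", "End of the transect"),
    ("integer", "tree_count", "Number of trees"),
    ("begin repeat", "photos", "Photos along the transect"),
    ("image", "photo", "Take a picture"),
    ("end repeat", "", ""),
]


def survey_sheet_csv(rows=SURVEY_ROWS):
    """Return the survey sheet as CSV text with XLSForm column headers."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["type", "name", "label"])
    writer.writerows(rows)
    return out.getvalue()


if __name__ == "__main__":
    print(survey_sheet_csv())
```

The printed rows can be pasted into the survey sheet of a spreadsheet before running it through the converter.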

Collecting Field Data

Upon entering the wildlife sanctuary, we trekked up to the region where the restoration has been done. There were 4 sites where we had to carry out the line transect survey. We laid down a 150ft line using a tape measure and collected data about the number of trees, shrubs, plants and grass in a 10ft zone around the line. The data was entered directly in the ODK Collect app.

The ODK Collect app works completely offline. We were in the middle of a forest with no cell coverage, but data collection worked seamlessly. The app syncs the data once the devices get cell reception.

Processing Field Data

After we finished the surveys, we headed back from the sanctuary. When our devices got a mobile data signal, the form submissions were sent to the ODK Aggregate server automatically. You can also use the ODK Briefcase application on your computer to copy the data from the devices directly.

Once you pull the data from the devices using ODK Aggregate or ODK Briefcase, you will have a spreadsheet with individual form submissions. This itself can be the final product of the survey and can be maintained as a record of periodic surveys. We also wanted to map the data since we had collected GPS coordinates for the start and end of each transect. This required a little bit of post-processing of the data to create a spreadsheet like this. Once this spreadsheet is saved as a CSV, it can be imported into QGIS.

Using the Convert points to line tool from the Processing Toolbox, I created a line layer from the start and end points of each transect.
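
If you prefer to script the post-processing, the sketch below shows one possible way to turn start and end points into line geometries. It is not what I did for this survey. It assumes the geopoints are in the raw ODK form of space-separated latitude, longitude, altitude and accuracy (exports may already split these into separate columns, in which case the parsing step is unnecessary). The column names and sample coordinates are made up. The output is CSV text with a WKT column, which QGIS can read as a delimited text layer.

```python
import csv
import io


def parse_geopoint(value):
    """Return (lat, lon) from an ODK geopoint string 'lat lon alt accuracy'."""
    parts = value.split()
    return float(parts[0]), float(parts[1])


def transect_wkt(start, end):
    """Build a WKT LINESTRING. WKT expects x (lon) before y (lat)."""
    lat1, lon1 = parse_geopoint(start)
    lat2, lon2 = parse_geopoint(end)
    return f"LINESTRING ({lon1} {lat1}, {lon2} {lat2})"


def submissions_to_wkt_csv(rows):
    """Turn submission dicts into CSV text with a WKT geometry column."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["site", "wkt"])
    for row in rows:
        wkt = transect_wkt(row["start_point"], row["end_point"])
        writer.writerow([row["site"], wkt])
    return out.getvalue()


# Hypothetical submission, only for illustration.
SAMPLE = [
    {"site": "A",
     "start_point": "11.66 76.62 900 5",
     "end_point": "11.6604 76.6201 902 5"},
]

if __name__ == "__main__":
    print(submissions_to_wkt_csv(SAMPLE))
```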

The resulting QGIS line layer was exported as a KML and imported to Google My Maps. I had also recorded our entire trek and some waypoints using the My Tracks app. This was also imported to Google My Maps. These layers resulted in a rich and informative map from our field data exercise. See the live interactive map here.

Overall, using ODK for the transect surveys helped speed up the data collection. There was no data entry after the surveys, and creating an interactive map visualization was a breeze. If you are an individual or a team doing field data collection, moving to ODK would help you reduce errors and collect accurate data without the hassles of data entry.

GeoPDF is a unique data format that brings the portability of PDF to geospatial data. A GeoPDF document can present raster and vector data and preserve the georeference information. This can be a useful format for non-GIS folks to consume GIS data without needing GIS-software. While GeoPDF is a proprietary format, we have a close alternative in the open Geospatial PDF format. GDAL has added support for creating Geospatial PDF documents from version 1.10 onwards. In this post, I will show how to create a GeoPDF document containing multiple vector layers.

Get the Tools


OSGeo4W is the best way to install GDAL on Windows. The default installation gives you GDAL tools with PDF format support. You can use the GDAL tools via the OSGeo4W Shell included in the install.


KyngChaos provides a convenient GDAL installer for Mac. You also need to install the additional GeoPDF plugin to enable support for the PDF format.

Once installed, add the path to GDAL library to your .bash_profile file to be able to use the commands easily from the terminal. Launch a Terminal and type in the following commands.

echo 'export PATH=/Library/Frameworks/GDAL.framework/Programs:$PATH' >> ~/.bash_profile
source ~/.bash_profile


Installation instructions will vary with the distribution. On Ubuntu, you can install the gdal-bin package.

sudo apt-get install gdal-bin

Verify GDAL Install

If you already have GDAL installed, or just installed it, run the following command in a terminal to verify that your GDAL installation is working and has support for GeoPDF format.

gdalinfo --formats | grep -i pdf

If you see Geospatial PDF printed in the output – you are all set. If you do not get any output or get an error, your install is not correctly configured.

Get the Data

For this example, I chose to use OpenStreetMap Metro Extracts from MapZen. Download the shapefiles (OSM2PGSQL SHP format) for the city of your choice. I am using the extract for Bangalore city in this example. Unzip the downloaded file to a folder on your computer.


Creating a GeoPDF file from a bunch of shapefiles is a matter of running a single gdal_translate command. But we need to prepare the data and figure out the correct command-line options first. So follow along to understand how you can arrive at the final command, or simply scroll to the end to see it. The following steps are an adapted and simplified version of a more comprehensive guide that covers all the options available for GeoPDF creation via GDAL.

  • The first step is to create a .vrt file that can hold all the vector layers we want in the PDF. If you just need a single layer in the PDF, you can skip creating the .vrt file and directly reference the layer in place of the VRT. Note the <SrcSQL> tag in the VRT file. This filters out all features where the ‘name’ field is empty. You can leave that out or modify it to suit your dataset. Name this file osm.vrt and save it in the same folder as your data.
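    <OGRVRTDataSource>
        <OGRVRTLayer name="roads">
            <SrcDataSource relativeToVRT="1">bengaluru_india_osm_line.shp</SrcDataSource>
            <SrcSQL dialect="sqlite">SELECT name, highway, geometry from bengaluru_india_osm_line where name is not NULL</SrcSQL>
        </OGRVRTLayer>
        <OGRVRTLayer name="pois">
            <SrcDataSource relativeToVRT="1">bengaluru_india_osm_point.shp</SrcDataSource>
            <SrcSQL dialect="sqlite">SELECT name, geometry from bengaluru_india_osm_point where name is not NULL</SrcSQL>
        </OGRVRTLayer>
    </OGRVRTDataSource>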
  • GeoPDF is a raster format that can overlay vectors on top. So we need a raster layer as the base. If you have some satellite imagery or a scanned raster for the area, you can use it as the base layer. Otherwise, we can create an empty raster for the extent of the vector layers. The ogrtindex command creates a bounding box polygon from the given input layers. The gdal_rasterize command then fills this polygon with the given value and creates a raster. The -tr option specifies the pixel resolution of the raster in degrees. You can tweak that to get the output size you need. cd to the directory where you have extracted the vector layers and run the following commands.
cd Users\Ujaval\Downloads\bengaluru_india.osm2pgsql-shapefiles

ogrtindex -accept_different_schemas extent.shp osm.vrt

gdal_rasterize -burn 255 -ot Byte -tr 0.0001 0.0001 extent.shp bangalore.tif
  • Now we can convert the empty bangalore.tif raster to a PDF – overlaying the vector layers from the osm.vrt file.
gdal_translate -of PDF -a_srs EPSG:4326 bangalore.tif bangalore.pdf -co OGR_DATASOURCE=osm.vrt -co OGR_DISPLAY_FIELD="name"
  • Once the conversion finishes, you can open the resulting bangalore.pdf file in any PDF viewer. Opening it in Adobe Acrobat viewer, you can see the map data layers. You can browse the features in the layer panel, search for any attribute value and zoom/pan the map.
  • Another popular use of GeoPDF files is as offline base maps in programs such as Avenza PDF Maps. After loading the bangalore.pdf file in Avenza Maps on your mobile phone, you can use the GPS to view your current location or trace a GPS route on top. Search also works across layers in the PDF.
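
To get a feel for how the -tr value affects the size of the empty base raster, here is a small Python sketch that estimates the pixel dimensions from a bounding box and a resolution. The extent values are hypothetical and the result is an estimate only; GDAL's own rounding may differ by a pixel.

```python
def raster_size(xmin, ymin, xmax, ymax, res_x, res_y):
    """Estimate (columns, rows) for a raster covering the extent at the given pixel size."""
    cols = round((xmax - xmin) / res_x)
    rows = round((ymax - ymin) / res_y)
    return cols, rows


def byte_raster_size_mb(cols, rows):
    """Uncompressed size in megabytes of a single-band Byte raster (1 byte per pixel)."""
    return cols * rows / 1_000_000


if __name__ == "__main__":
    # Hypothetical extent of 0.5 x 0.4 degrees at the 0.0001 degree resolution used above.
    cols, rows = raster_size(0.0, 0.0, 0.5, 0.4, 0.0001, 0.0001)
    print(cols, rows, byte_raster_size_mb(cols, rows))
```

Doubling the -tr value halves both dimensions, so the pixel count drops to a quarter.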

You can download the sample bangalore.pdf Geospatial PDF format file for exploring the format yourself.

Mapshaper is a free and open-source tool that is best known for fast and easy simplification. Other tools for simplification, like QGIS or ogr2ogr, do not preserve topology while simplifying. This means you may get sliver polygons or missing intersections. Mapshaper performs topologically-aware simplification and gives you much more control over the process.

Other popular open-source tools, PostGIS and GRASS can do topologically-aware simplifications as well.
But Mapshaper is much more than a simplification tool. It is in active development and has many more data processing and editing capabilities now. It also has a command-line version of the tool which can be run from a terminal. In this post, we will explore the command-line tool to carry out some complex geoprocessing tasks.

Mapshaper is a Node.js application. Download and install Node.js for your platform. You will need the Node Package Manager (NPM) to install mapshaper, so make sure it is enabled while going through the installer.

Once Node.js is installed, launch the Windows Command Prompt (cmd.exe) and run the following command to install mapshaper.

npm install -g mapshaper

Get the Data

Review the data and problem statement from the Performing Table Joins tutorial. Download the Census Tracts shapefile and the Population CSV ca_tracts_pop.csv. Unzip the file and extract it to a folder.


Mapshaper command takes an input, an output and a sequence of commands to execute. Each command is followed by options specific to that command. All the commands and options are well documented at the Mapshaper Wiki.

Let’s start with simplification. We will take the census tracts shapefile and simplify it to reduce the number of vertices and the total size. The command for simplification is -simplify. You can supply a percentage value as an option to specify the aggressiveness of the simplification. Another useful option is keep-shapes, which ensures that none of the polygons from the input get deleted. Run the following command. Make sure you cd to the directory where the data has been downloaded.

Note: The percentage value in the -simplify command can be a little misleading. The value indicates how many vertices to keep, not how many to remove. So a lower value results in MORE simplification.

mapshaper -i tl_2013_06_tract\tl_2013_06_tract.shp -simplify 20% keep-shapes -o output.shp

Mapshaper can also do Table Joins. We can now join the population field D001 from the ca_tracts_pop.csv file. The join matches records on the fields we specify as keys and adds the CSV fields to the output file. For the join to work correctly, we need to specify the field types in the CSV file. (This is similar to how a .csvt file is needed by QGIS.) We can ‘chain’ the -join command after the -simplify command to perform both operations in a single command.

mapshaper -i tl_2013_06_tract\tl_2013_06_tract.shp -simplify 20% keep-shapes -join ca_tracts_pop.csv keys=GEOID,GEO.id2 field-types GEO.id2:str,D001:num -o output.shp

Mapshaper can also dissolve features. In my testing, Mapshaper’s dissolve operation was many times faster than QGIS or GRASS. Let’s add a -dissolve command and merge all census tracts for a county. We can also sum up the values of the D001 field to get the total population of the county from the sum of individual census tracts.

mapshaper -i tl_2013_06_tract\tl_2013_06_tract.shp -simplify 20% keep-shapes -join ca_tracts_pop.csv keys=GEOID,GEO.id2 field-types GEO.id2:str,D001:num -dissolve COUNTYFP sum-fields D001 -o output.shp
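
To make the join and dissolve steps above more concrete, here is a minimal Python sketch of the attribute side of both operations. It is only a conceptual illustration, not how Mapshaper is implemented, and it ignores geometry entirely. The sample records and population values are made up. It also shows why the key is read as a string: California GEOIDs start with 06, and reading them as numbers would drop the leading zero and break the match.

```python
from collections import defaultdict

# Made-up attribute records standing in for the shapefile and the CSV.
SAMPLE_TRACTS = [
    {"GEOID": "06001400100", "COUNTYFP": "001"},
    {"GEOID": "06001400200", "COUNTYFP": "001"},
    {"GEOID": "06075010100", "COUNTYFP": "075"},
]
SAMPLE_POP = [
    {"GEO.id2": "06001400100", "D001": "3000"},
    {"GEO.id2": "06001400200", "D001": "2000"},
    {"GEO.id2": "06075010100", "D001": "1500"},
]


def join_population(tracts, pop_rows):
    """Attach D001 to each tract by matching GEOID against GEO.id2 as strings."""
    population = {str(r["GEO.id2"]): float(r["D001"]) for r in pop_rows}
    joined = []
    for tract in tracts:
        record = dict(tract)
        record["D001"] = population.get(tract["GEOID"])  # None when unmatched
        joined.append(record)
    return joined


def dissolve_sum(records, key="COUNTYFP", field="D001"):
    """Group records by key and sum the field, similar in spirit to sum-fields."""
    totals = defaultdict(float)
    for record in records:
        totals[record[key]] += record[field] or 0
    return dict(totals)


if __name__ == "__main__":
    joined = join_population(SAMPLE_TRACTS, SAMPLE_POP)
    print(dissolve_sum(joined))
    # A numeric key loses the leading zero and no longer matches the GEOID.
    print(str(int("06001400100")) == "06001400100")
```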

The output format needed by many web apps is geojson or topojson. Mapshaper can write the output in these formats as well. Let’s add a format=geojson option to the -o command to write a geojson output.

mapshaper -i tl_2013_06_tract\tl_2013_06_tract.shp -simplify 20% keep-shapes -join ca_tracts_pop.csv keys=GEOID,GEO.id2 field-types GEO.id2:str,D001:num -dissolve COUNTYFP sum-fields D001 -o format=geojson output.geojson

Finally, let’s visualize our output. Upload the resulting output.geojson to a web-based GeoJSON viewer. You will be able to visualize the output shapes and their properties.

By now, you must have figured out that we have a very powerful tool on our hands. With a single command and just a few seconds of computing, we did Simplification, Table Join, Dissolve and Format translation.

GDAL and OGR libraries come with handy command-line tools. These tools are quite powerful and can save you a lot of effort if you know how to use them. Here I will show how to use the ogrinfo and ogr2ogr tools to perform spatial joins. A single command can do complex operations on your spatial data and save you a lot of clicking-around and data-munging in a GIS.

Get the Tools

The best way to get the command-line tools on Windows is via the OSGeo4W Installer. If you are on Linux or Mac, see these instructions to get the package for your platform.

Get the Data

Review the data and problem statement from the Performing Spatial Joins tutorial. Download the Borough Boundaries and Nursing Homes shapefiles.


OGR command-line tools accept only 1 input. But we have 2 inputs for the spatial join. An easy way to fix this is to use a VRT file. A VRT file allows us to specify multiple inputs and pass them to the command-line tool as layers of a single input.

Unzip the input shapefiles in a single folder on your drive. Create a file named input.vrt in the same folder with the following content.

    <OGRVRTLayer name="boroughs">
    <OGRVRTLayer name="nursinghomes">

Open the OSGeo4W shell and cd to the directory containing the shapefiles and the vrt file. Run the ogrinfo command to check if the VRT file is correct.

ogrinfo input.vrt 

OGR tools can run SQL queries on the input layers. We will use the ST_INTERSECTS function to find all nursing homes that intersect the boundary of a borough and use the SUM function to find the total nursing home capacity of a borough. Run the following command.

ogrinfo -sql "SELECT b.BoroName, sum(n.Capacity) as total_capacity from boroughs b, nursinghomes n WHERE ST_INTERSECTS(b.geometry, n.geometry) group by b.BoroName" -dialect SQLITE input.vrt
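
For readers who want to see what this kind of query computes, here is a minimal Python sketch of a spatial join followed by a grouped sum. It uses a simple ray-casting point-in-polygon test on toy square "boroughs" and made-up capacities. It is a conceptual illustration only and does not reflect how ST_INTERSECTS is implemented; real borough boundaries are far more complex than single rings.

```python
from collections import defaultdict


def point_in_polygon(x, y, ring):
    """Ray-casting test: is (x, y) inside the polygon ring [(x, y), ...]?"""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Does this edge straddle the horizontal line through y?
        if (y1 > y) != (y2 > y):
            # x coordinate where the edge crosses that line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def capacity_by_borough(boroughs, nursing_homes):
    """Sum the Capacity of the nursing homes that fall inside each borough."""
    totals = defaultdict(int)
    for home in nursing_homes:
        for name, ring in boroughs.items():
            if point_in_polygon(home["x"], home["y"], ring):
                totals[name] += home["Capacity"]
    return dict(totals)


# Toy data, only for illustration.
BOROUGHS = {
    "West": [(0, 0), (1, 0), (1, 1), (0, 1)],
    "East": [(1, 0), (2, 0), (2, 1), (1, 1)],
}
HOMES = [
    {"x": 0.5, "y": 0.5, "Capacity": 100},
    {"x": 0.25, "y": 0.75, "Capacity": 50},
    {"x": 1.5, "y": 0.5, "Capacity": 80},
    {"x": 3.0, "y": 3.0, "Capacity": 10},  # outside every borough
]

if __name__ == "__main__":
    print(capacity_by_borough(BOROUGHS, HOMES))
```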

You can see that in a single command we got the results by doing a spatial join that takes a lot of clicking around in a GIS environment. We can do a reverse spatial join as well. We can join the name of the Borough to each feature of the Nursing Homes layer. Using the ogr2ogr tool we can write out a shapefile from the resulting join. Note that we are adding a geometry column in the SELECT statement which results in a spatial output. Run the following command:

ogr2ogr -sql "SELECT n.Name, n.Capacity, n.geometry, b.BoroName from boroughs b, nursinghomes n WHERE ST_INTERSECTS(b.geometry, n.geometry)" -dialect SQLITE output.shp input.vrt

Open output.shp in a GIS to verify that the new shapefile has attributes joined from the intersecting borough. You can use the ogrinfo command to check that as well.

ogrinfo -al output.shp 

The Association for People with Disability (APD) is a non-profit organization based in Bangalore, India. Their mission is to reach out to and rehabilitate people with disabilities from the underprivileged segment. Over the past year, my colleagues and I have been volunteering with them to develop a system that can help improve their field data collection efforts.


APD provides a variety of services to rehabilitate underprivileged people with disabilities. Before they can render any service, they need to register the individuals with the organization and collect basic background information. Registrations are mostly done in their field offices or at camps organized in rural areas. Paper forms were filled in at the site and shipped to their field office. A staff member then entered the data manually into their software platform.

A registration camp

This process had several problems:

  • The text on the paper form was often illegible. Some fields were missing or inaccurate.
  • Data entry was laborious and introduced errors.
  • There was a lag of 4–6 weeks before the data was available in the system.

Our solution

We helped APD implement a process using OpenDataKit (ODK) that allows the form to be captured on Android devices. With the new system, the data is captured in the field on a mobile device using the ODK Collect app and sent to an ODK Aggregate server running on Google AppEngine. The data ends up in a shared spreadsheet, which is imported to APD’s system after each registration camp.

APD staff using the ODK collect mobile app

This new workflow offers several advantages over the paper based forms:

  • Reliable data collection in areas with poor network connectivity. The ODK Collect app can work completely offline, and the data is stored in the device memory. Once the staff members are back in the office and connect to a WiFi network, the data is sent to the server.
  • The mobile app enforces checks, so all the data is consistent and there are no missing fields.
  • Allows for the capture of pictures and additional metadata (such as time, location, staff id).
  • The data is exported from the spreadsheet and imported to their system the next day — cutting the lag from weeks to hours.


  • October 2014: Met with the APD registration team to understand their requirements and design a process.
  • November 2014: A prototype is created by migrating the registration form to ODK using XLSForm. APD trains field staff on using the mobile app. Successful field test.
First field test of the app with newly trained staff
  • December 2014: First full-fledged camp with 3 devices. Successful registration of 55 participants. Staff is very happy with the increased speed, accuracy and reduction in delay in getting the registrations processed.
First ‘real’ deployment. Staff had paper forms as backup, but did not need them
  • January 2015: APD moves their registrations completely to mobile devices. All registrations are paperless and have the added benefit of including the participant’s picture as part of the registration process. Over 500 registrations processed without a problem.
Completely paperless registration camp after migration to the mobile app

While this is a great start, we are looking at helping them with other challenges in the field. In the coming months, we want to tackle the following problems:

  • Migrate patient visit and treatment forms to OpenDataKit. These require having access to the patient’s medical history in the field. OpenDataKit’s 2.0 suite of tools would be a good fit.
  • Task allocation and scheduling optimization for the field staff.
  • Encoding the knowledge from the training manual to a mobile app.

A long-pending weekend project is done. I printed, cut and folded a sturdy globe using the template from Le Paper Globe.

This is not only fun, but a good prop to learn more about Geography. I envision it would make a fun do-it-yourself project with kids of all ages.

I recently needed to calculate the distance between a large number of latitude/longitude coordinate pairs. There are many options available if you want to import these into a GIS and run an analysis. But there is a simpler and much more accessible way if you aren’t doing very high accuracy calculations.

Here I have a spreadsheet which implements the well-known Haversine formula to calculate distance between 2 coordinates. You can structure your point coordinates into 4 columns Lat1, Lon1, Lat2, Lon2 in decimal degrees and the distance will be calculated in meters.

You can give it a try. Just open this spreadsheet, make a copy of it and play with it as you like.

The raw formula is below (thanks to the reader Samuel, who suggested it).

=2 * 6371000 * ASIN(SQRT((SIN((LAT2*(3.14159/180)-LAT1*(3.14159/180))/2))^2+COS(LAT2*(3.14159/180))*COS(LAT1*(3.14159/180))*SIN(((LONG2*(3.14159/180)-LONG1*(3.14159/180))/2))^2))
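
If you would rather run the same calculation in a script, here is a small Python sketch of the Haversine formula using the same Earth radius of 6371000 meters. The function name is my own choice. It uses the full-precision value of pi from the math module rather than 3.14159, so the results may differ very slightly from the spreadsheet.

```python
import math

EARTH_RADIUS_M = 6371000  # same radius, in meters, as the spreadsheet formula


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two points given in decimal degrees."""
    phi1 = math.radians(lat1)
    phi2 = math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


if __name__ == "__main__":
    # One degree of longitude along the equator, roughly 111 km.
    print(haversine_m(0, 0, 0, 1))
```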