Generating pseudo-random data is important for many aspects of research work. QGIS provides many methods of generating random points to facilitate this.
Recently, I ran into a problem where I wanted to generate random points inside a polygon, but I wanted the points to follow a certain distribution. I wanted to generate a dataset showing employee home locations for a company. Given a city boundary and the location of the office, I wanted a point layer that showed where the employees lived. A simple ‘Random points within Polygon’ algorithm would not work here: it spreads points uniformly across the polygon, whereas employee homes are not distributed uniformly within the city.
A more realistic distribution would look something like this:
- X% of employees live within XX mins from office
- Y% of employees live within YY mins from office
There are additional criteria, such as zoning (home locations would be in residential zones) and land use/land cover (no homes in water bodies, parks etc.). These could be incorporated in the analysis below if we had the right data layers representing them. But for this post, we will focus only on distance from the office.
How do we generate random points that match these criteria?
I came up with a simple solution to this in QGIS. Here’s the workflow.
Below is the city of Hyderabad, India and a point layer representing the office location.
We want to generate random points with the following rules:
- 40% of employees live within 30 mins of travel from office
- 30% of employees live within 30-45 mins of travel from office
- 20% of employees live within 45-60 mins of travel from office
- 10% of employees live more than 60 mins of travel from office
The first step is to generate isochrone polygons representing the zones of 30, 45, 60 and >60 mins of travel time from the office. You can use the isochrone tool from the OpenRouteService plugin in QGIS to find areas that are within the specified travel time from a point. Another, more accurate way is to use the Uber Movement data. I used mapshaper and the Uber Movement dataset to generate the isochrone polygons shown below.
Next, we add a new field (here named percentage_employees) to the isochrones layer and enter values that specify the percentage of employees we expect to live in each zone.
Now we can use the built-in Random Points inside Polygons algorithm to generate random points. To make the number of points in each isochrone polygon match our expected employee percentage, we can use an expression like the one below for the Point count or density parameter, assuming we have 200 employees.
(200 * "percentage_employees")/100
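The expression is evaluated once per polygon. As a rough check of the arithmetic outside QGIS, here is a minimal Python sketch using the percentages from the rules above and the assumed 200 employees. The zone labels and the `points_per_zone` helper are hypothetical names for illustration. The largest-remainder rounding step is not something the QGIS expression does; it is one possible way to keep the total exact when the headcount does not divide evenly.

```python
import math

# Share of employees per travel-time zone, taken from the rules above.
# The zone labels are illustrative, not field values from the actual layer.
ZONE_PERCENTAGES = {
    "0-30 min": 40,
    "30-45 min": 30,
    "45-60 min": 20,
    ">60 min": 10,
}


def points_per_zone(total, percentages):
    """Split `total` points across zones in proportion to `percentages`.

    Mirrors (total * percentage) / 100, rounds each value down, then gives
    the points lost to rounding to the zones with the largest fractional
    remainders so the counts always add up to `total`.
    """
    if sum(percentages.values()) != 100:
        raise ValueError("percentages must add up to 100")

    exact = {zone: total * pct / 100 for zone, pct in percentages.items()}
    counts = {zone: math.floor(value) for zone, value in exact.items()}

    leftover = total - sum(counts.values())
    by_remainder = sorted(
        exact, key=lambda zone: exact[zone] - counts[zone], reverse=True
    )
    for zone in by_remainder[:leftover]:
        counts[zone] += 1
    return counts


counts = points_per_zone(200, ZONE_PERCENTAGES)
print(counts)
# {'0-30 min': 80, '30-45 min': 60, '45-60 min': 40, '>60 min': 20}
```

With 200 employees every zone gets a whole number, so the rounding step changes nothing. It only matters for totals such as 205, where some zones would otherwise get fractional counts.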
Here’s the output of the algorithm.
This method ensures that the overall distribution of points in the city matches the more realistic pattern that we set out to achieve.
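For readers curious about what happens inside each zone, uniform random points in a polygon can be produced by rejection sampling: draw candidate points in the polygon's bounding box and keep only those that fall inside. The sketch below is pure Python and is not QGIS's actual implementation. The L-shaped polygon, the `point_in_polygon` and `random_points_in_polygon` helpers, and the seed are all illustrative assumptions.

```python
import random


def point_in_polygon(x, y, ring):
    """Ray-casting test: True if (x, y) lies inside the polygon `ring`.

    `ring` is a list of (x, y) vertices; the last vertex connects back
    to the first.
    """
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            # x-coordinate where the edge crosses that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def random_points_in_polygon(ring, count, rng):
    """Return `count` uniform random points inside `ring` by rejection sampling.

    Assumes the polygon has non-zero area; otherwise the loop never ends.
    """
    xs = [x for x, _ in ring]
    ys = [y for _, y in ring]
    min_x, max_x, min_y, max_y = min(xs), max(xs), min(ys), max(ys)

    points = []
    while len(points) < count:
        x = rng.uniform(min_x, max_x)
        y = rng.uniform(min_y, max_y)
        if point_in_polygon(x, y, ring):
            points.append((x, y))
    return points


# Illustrative L-shaped zone; real isochrone polygons would come from the layer.
zone = [(0, 0), (4, 0), (4, 2), (2, 2), (2, 4), (0, 4)]
rng = random.Random(42)  # fixed seed so the run is repeatable
points = random_points_in_polygon(zone, 80, rng)
print(len(points))
```

Running this once per zone, with the count taken from that zone's percentage, reproduces the idea of the workflow above: points are uniform within each zone, but the zones hold different shares of the total.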