K-Means Clustering is a popular algorithm for automatically grouping points into natural clusters. QGIS comes with a Processing Toolbox algorithm ‘K-means clustering’ that can take a vector layer and group features into N clusters. A problem with this algorithm is that you do not have control over how many points end up in each cluster. Many applications require you to segment your data layer into equal-sized clusters or clusters having a minimum number of points. Some examples where you may need this:

  • When planning an FTTH (Fiber-to-the-Home) network, one may want to divide a neighborhood into clusters of at least 250 houses for placement of a node.
  • Dividing a sales territory or its customers equally among sales teams, so that customers in the same region are assigned to the same team.

There is a variation of the K-means algorithm called Constrained K-Means Clustering that uses graph theory to find optimal clusters with a user-supplied minimum number of points in each cluster. Stanislaw Adaszewski has a nice Python implementation of this algorithm that I have adapted to be used as a Processing Toolbox algorithm in QGIS.


I have heard feedback from users that this algorithm doesn’t work on all types of point distributions and may get stuck while finding an optimal solution. I am looking into ways to improve the code and would appreciate any feedback you have.

Continue reading

Spatial indexing methods help speed up spatial queries. Most GIS software and databases provide a mechanism to compute and use a spatial index for your data layers. QGIS as well as PostGIS use a spatial indexing scheme based on the R-Tree data structure, which creates a hierarchical tree using the bounding boxes of geometries. This is quite efficient and results in a big speedup for certain types of spatial queries. Check out the Spatial Indexing section of my course Advanced QGIS, where I show how to use an R-Tree based spatial index in QGIS.

If you use Python for geoprocessing, the GeoPandas library also provides an easy-to-use implementation of an R-Tree based spatial index through the .sindex attribute. University of Helsinki’s AutoGIS course has an excellent example of using a spatial index with GeoPandas.
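
To make the bounding-box idea concrete, here is a minimal pure-Python sketch of the two-stage filtering that bounding-box based indexes rely on: a cheap bounding-box test first, then an exact geometry test only for the surviving candidates. This is not an R-Tree and not how QGIS, PostGIS or GeoPandas implement their indexes; a real R-Tree arranges the boxes in a hierarchy so that most of them are never checked at all. The function names and sample polygons are made up for illustration.

```python
def bbox(polygon):
    """Return (minx, miny, maxx, maxy) for a list of (x, y) vertices."""
    xs = [x for x, _ in polygon]
    ys = [y for _, y in polygon]
    return (min(xs), min(ys), max(xs), max(ys))


def in_bbox(point, box):
    """Cheap test: is the point inside the bounding box?"""
    x, y = point
    minx, miny, maxx, maxy = box
    return minx <= x <= maxx and miny <= y <= maxy


def point_in_polygon(point, polygon):
    """Exact but slower ray-casting point-in-polygon test."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point matter.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def polygons_containing(point, polygons, boxes):
    """Stage 1: bounding-box filter. Stage 2: exact test on candidates only."""
    candidates = [i for i, box in enumerate(boxes) if in_bbox(point, box)]
    return [i for i in candidates if point_in_polygon(point, polygons[i])]


# Hypothetical sample data: a triangle and a square far away from it.
polygons = [
    [(0, 0), (4, 0), (0, 4)],
    [(10, 10), (12, 10), (12, 12), (10, 12)],
]
boxes = [bbox(p) for p in polygons]  # computed once, reused for every query

hits_inside = polygons_containing((1, 1), polygons, boxes)
hits_corner = polygons_containing((3.5, 3.5), polygons, boxes)
print(hits_inside, hits_corner)
```

The second query point lies inside the triangle's bounding box but outside the triangle itself, which is why the exact test is still needed after the bounding-box filter.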

In this post, I want to talk about another spatial indexing system called H3.

Continue reading

When working with raster data, you may sometimes need to deal with data gaps. These could be the result of sensor malfunction, processing errors or data corruption. Below is an example of a data gap (i.e. no-data values) in aerial imagery.

Source Image: © Commission for Lands (COLA); Revolutionary Government of Zanzibar (RGoZ), downloaded from OpenAerialMap. (Note: The data gap is simulated using a Python script and is not part of the original dataset.)

Continue reading

Google Earth Engine (GEE) is a powerful cloud-based system for analysing massive amounts of remote sensing data. One area where Google Earth Engine shines is the ability to calculate time series of values extracted from a deep stack of imagery. While GEE is great at crunching numbers, it has limited cartographic capabilities. That’s where QGIS comes in. Using the Google Earth Engine Plugin for QGIS and Python, you can combine the computing power of GEE with the cartographic capabilities of QGIS. In this post, I will show how to write PyQGIS code to programmatically fetch time-series data and render a map template to create animated maps like the one below.

Continue reading

When you want to buffer features that are spread across a large area (such as global layers), there is no single projection that can give you accurate results. This is the classic case for Geodesic Buffers, where distances are measured on an ellipsoid or a sphere rather than on a projected plane. This post explains the basics of geodesic vs. planar buffers well.
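
As a rough illustration of the idea, the sketch below builds a geodesic buffer around a single point by stepping out a fixed distance along great circles in every direction. It assumes a spherical Earth with a mean radius of 6371008.8 m, which is simpler and less accurate than an ellipsoidal calculation, and it ignores complications such as buffers that cross the antimeridian or a pole. The function names and the sample coordinates are made up for this example.

```python
import math

EARTH_RADIUS_M = 6371008.8  # assumption: spherical Earth with mean radius


def destination(lon, lat, bearing_deg, distance_m):
    """Point reached by travelling distance_m from (lon, lat) along a great circle."""
    phi1 = math.radians(lat)
    lam1 = math.radians(lon)
    theta = math.radians(bearing_deg)
    delta = distance_m / EARTH_RADIUS_M  # angular distance in radians
    phi2 = math.asin(
        math.sin(phi1) * math.cos(delta)
        + math.cos(phi1) * math.sin(delta) * math.cos(theta)
    )
    lam2 = lam1 + math.atan2(
        math.sin(theta) * math.sin(delta) * math.cos(phi1),
        math.cos(delta) - math.sin(phi1) * math.sin(phi2),
    )
    lon2 = (math.degrees(lam2) + 540) % 360 - 180  # normalize to [-180, 180)
    return lon2, math.degrees(phi2)


def haversine(lon1, lat1, lon2, lat2):
    """Great-circle distance in metres between two lon/lat points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def geodesic_point_buffer(lon, lat, radius_m, segments=72):
    """Closed ring of (lon, lat) vertices, each radius_m from the centre."""
    step = 360.0 / segments
    ring = [destination(lon, lat, i * step, radius_m) for i in range(segments)]
    ring.append(ring[0])  # close the ring
    return ring


# Hypothetical example: a 100 km buffer around a point at 64.1 N, 21.9 W.
center_lon, center_lat = -21.9, 64.1
ring = geodesic_point_buffer(center_lon, center_lat, 100_000)

lons = [p[0] for p in ring]
lats = [p[1] for p in ring]
print(f"longitude span: {max(lons) - min(lons):.2f} degrees")
print(f"latitude span:  {max(lats) - min(lats):.2f} degrees")
```

At this latitude the ring spans roughly twice as many degrees of longitude as of latitude, even though every vertex is the same distance from the centre on the sphere. A planar buffer computed directly in degrees would miss exactly this distortion.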

Continue reading