Most optical satellite imagery products come with one or more QA bands that allow the user to assess the quality of each pixel and extract the pixels that meet their requirements. The most common application of QA bands is to identify cloudy pixels and mask them, but the bands contain a wealth of other information that can help you remove low-quality data from your analysis. This information is typically stored as bitwise flags. In this post, I will cover the basic concepts of bitwise operations and show how to extract specific quality indicators and mask with them using bitmasks.
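As a rough illustration of the idea, here is a minimal plain-Python sketch of pulling a flag out of a packed QA value with a shift and a mask. The bit positions (`CLOUD_BIT`, `CONFIDENCE_BITS`) are hypothetical and chosen only for this example; real products document their own layouts.

```python
def extract_bits(value, start, end):
    """Return the integer stored in bits start..end (inclusive, bit 0 is the least significant)."""
    width = end - start + 1
    mask = (1 << width) - 1          # e.g. width 2 -> 0b11
    return (value >> start) & mask   # shift the field down, then keep only its bits


# Hypothetical layout, for illustration only (not taken from any specific product).
CLOUD_BIT = 3              # single-bit flag: 1 = cloud
CONFIDENCE_BITS = (8, 9)   # two-bit field: 0..3


def is_clear(qa_value):
    """True when the (hypothetical) cloud flag is not set."""
    return extract_bits(qa_value, CLOUD_BIT, CLOUD_BIT) == 0


def cloud_confidence(qa_value):
    """Value of the (hypothetical) two-bit confidence field."""
    return extract_bits(qa_value, *CONFIDENCE_BITS)


qa_pixels = [0b0100000000, 0b1000001000]
clear_pixels = [qa for qa in qa_pixels if is_clear(qa)]
```

Single-bit flags and multi-bit fields are handled by the same function, which is why a shift followed by a mask is usually preferred over testing individual powers of two.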
Split Polygons into Equal Parts using QGIS
In this post, I describe how we can use built-in QGIS processing tools to create a workflow to split polygons into equal parts. Using a clever algorithm and Feature Iterator tool in the Processing Framework, we can easily split all features in a given polygon layer into equal parts.

The algorithm for splitting a polygon of any shape into equal parts is described in the post PostGIS Polygon Splitting by Paul Ramsey. We will see how this can be implemented in QGIS.
Calculating Weighted Centroids
In this post, I will outline techniques for computing weighted centroids in both QGIS and Google Earth Engine. For a polygon feature, the centroid is the geometric center. It can also be thought of as the average coordinate of all points within the polygon. There are some use cases where you may want to compute a weighted centroid, in which some parts of the polygon get a higher ‘weight’ than others. The main use case is calculating a population-weighted centroid. One can also use Night Lights data as a proxy for urbanized population and calculate a nightlights-weighted centroid. Some applications include:
- Regional Planning: Locate the population-weighted centroid to find the most accessible location in the region.
- Network Analysis: To generate demand points for location-allocation analysis, you need to convert demand from regions to points. It is preferable to compute population-weighted centroids for a more accurate analysis.
Do check out this Twitter thread by Raj Bhagat P for more discussion on weighted centroids.
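The arithmetic behind a weighted centroid is a weighted average of coordinates. The following is a minimal plain-Python sketch, not the QGIS or Earth Engine workflow from the post. It assumes each sample is an `(x, y, weight)` tuple in a projected coordinate system, with the weight being, for example, the population of a grid cell.

```python
def weighted_centroid(samples):
    """Return (x, y) of the weighted average of (x, y, weight) samples.

    Assumes planar (projected) coordinates and non-negative weights.
    """
    total_weight = sum(w for _, _, w in samples)
    if total_weight == 0:
        raise ValueError("total weight is zero; weighted centroid is undefined")
    cx = sum(x * w for x, _, w in samples) / total_weight
    cy = sum(y * w for _, y, w in samples) / total_weight
    return cx, cy


# Two cells: the one at x=10 holds three times the population of the one at x=0,
# so the centroid is pulled three quarters of the way towards it.
cells = [(0.0, 0.0, 1.0), (10.0, 0.0, 3.0)]
centroid = weighted_centroid(cells)
```

With equal weights the result reduces to the plain average of the coordinates.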

Aggregating Gridded Population Data in Google Earth Engine
Google Earth Engine makes it easy to compute statistics on gridded raster datasets. While calculating statistics on imagery datasets is easy, special care must be taken when working with population datasets. In this post, I will outline the correct technique for computing statistics for population rasters and aggregating pixels.

GIS Applications in Urban and Regional Planning
I recently taught a month-long course on GIS Applications in Urban and Regional Planning. We explored how GIS can be applied to solve problems in 6 different thematic areas. In this post, I will outline the different applications and show concrete examples using open datasets and the open-source GIS software QGIS.
Update!
The full course material – including data packages and PDF handouts – is now available for free download. Download [qgis_urban_planning.zip] containing detailed step-by-step instructions and datasets. These materials are ideal for self-study and complement the QGIS tutorials on my website.
Here are the 6 thematic areas:
- Land Use Planning and Management
- Crime Mapping and Analysis
- Solid Waste Management
- Urban Infrastructure and Utilities
- Urban Transportation
- Spatial Planning
K-Means Clustering with Equal Sized Clusters in QGIS
K-Means Clustering is a popular algorithm for automatically grouping points into natural clusters. QGIS comes with a Processing Toolbox algorithm, ‘K-means clustering’, that can take a vector layer and group features into N clusters. A problem with this algorithm is that you have no control over how many points end up in each cluster. Many applications require you to segment your data layer into equal-sized clusters or clusters with a minimum number of points. Some examples where you may need this:
- When planning an FTTH (Fiber-to-the-Home) network, one may want to divide a neighborhood into clusters of at least 250 houses for placement of a node.
- Dividing a sales territory or customers equally among sales teams, so that customers in the same region are assigned to the same team.
There is a variation of the K-means algorithm called Constrained K-Means Clustering that uses graph theory to find optimal clusters with a user-supplied minimum number of points in each cluster. Stanislaw Adaszewski has a nice Python implementation of this algorithm that I have adapted for use as a Processing Toolbox algorithm in QGIS.
Warning!
I have heard from users that this algorithm doesn’t work on all types of point distributions and may get stuck while searching for an optimal solution. I am looking into ways to improve the code and would appreciate any feedback.
Spatial Homogeneity Testing of Raingauge Data with Advanced QGIS Expressions
Rainfall is arguably the most frequently measured hydro-meteorological variable. It is a required input for many hydrological applications such as runoff computations and flood forecasting, as well as the engineering design of structures. However, rainfall data in its raw form contains many gaps and inconsistent values. It is therefore important to rigorously validate rain-gauge observations before incorporating them into an analysis.
World Bank’s National Hydrology Project (NHP) prescribes a set of primary and secondary validation methods in the Manual of Rainfall Data Validation.
Of particular interest to me are the spatial methods that aim to identify suspect values by comparison with neighboring stations. This spatial homogeneity test requires complex spatial and statistical data processing that can be quite challenging. I got an opportunity to work on a project that required automating the entire process of identifying and testing suspect stations. I ended up implementing it in QGIS using just Expressions and the Processing Modeler. The whole solution required no custom code and was easily usable by an analyst in the QGIS environment. In this post, I will explain the details of the test and show you how you can use similar techniques for your own analysis.
This workflow was presented as a live session on QGIS Open Day. You can watch the recording to understand the concepts and implementation.
Working with Gridded Rainfall Data in Google Earth Engine
Many useful climate and weather datasets come as gridded rasters. The techniques for working with them are slightly different from those used for other remote sensing datasets. In this post, I will show how to work with gridded rainfall data in Google Earth Engine. This post also serves as an example of how to use the map/reduce programming style to work efficiently with such large datasets.
Histogram Matching in Google Earth Engine
Color correction is an important process when working with satellite and aerial imagery. A common technique used to balance the colors across multiple images is Histogram Matching. While the algorithm has been around for a long time, there aren’t many free and open-source tools that can be used at scale. Mapbox has released an open-source tool called rio-hist that works well for small and medium-sized images. Whitebox Tools has a Histogram Matching algorithm that can be used in QGIS via the Whitebox Tools Processing Plugin. But with large mosaics, such as the ones used in this post, these tools run out of memory or take a very long time. Google Earth Engine is a good alternative for performing fast histogram matching across large images.
In this post, I will first give an overview of the histogram matching algorithm and then show you how it can be implemented in Earth Engine. The example images are large high-resolution orthomosaics (3cm/pixel resolution) collected by UAV around the Oakland, CA area.
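To make the general idea concrete, here is a minimal plain-Python sketch of classic CDF-based histogram matching on a single band. It is not the Earth Engine implementation from the post. The function names are made up for this example, and pixel values are assumed to be given as flat lists of numbers.

```python
from bisect import bisect_left
from collections import Counter


def cumulative_distribution(values):
    """Return (sorted distinct levels, cumulative fraction of pixels at or below each level)."""
    counts = Counter(values)
    levels = sorted(counts)
    total = len(values)
    running = 0
    cdf = []
    for level in levels:
        running += counts[level]
        cdf.append(running / total)
    return levels, cdf


def build_lookup(source, reference):
    """Map each source level to the first reference level whose CDF reaches the same fraction."""
    src_levels, src_cdf = cumulative_distribution(source)
    ref_levels, ref_cdf = cumulative_distribution(reference)
    lookup = {}
    for level, fraction in zip(src_levels, src_cdf):
        i = min(bisect_left(ref_cdf, fraction), len(ref_levels) - 1)
        lookup[level] = ref_levels[i]
    return lookup


def match_histogram(source, reference):
    """Remap source pixel values so their distribution follows the reference."""
    lookup = build_lookup(source, reference)
    return [lookup[value] for value in source]


source_band = [10, 10, 20, 30]
reference_band = [100, 150, 200, 250]
matched_band = match_histogram(source_band, reference_band)
```

For a multi-band image the same lookup would be built and applied to each band independently.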

Fast Point-in-Polygon Analysis with GeoPandas and Uber’s H3 Spatial Index
Spatial indexing methods help speed up spatial queries. Most GIS software and databases provide a mechanism to compute and use a spatial index for your data layers. QGIS and PostGIS both use a spatial indexing scheme based on the R-Tree data structure, which creates a hierarchical tree using the bounding boxes of geometries. This is quite efficient and results in a big speedup for certain types of spatial queries. Check out the Spatial Indexing section of my course Advanced QGIS, where I show how to use an R-Tree based spatial index in QGIS.
If you use Python for geoprocessing, the GeoPandas library also provides an easy-to-use implementation of an R-Tree based spatial index through the .sindex attribute. University of Helsinki’s AutoGIS course has an excellent example of using a spatial index with GeoPandas.
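The reason bounding boxes help can be shown with a small plain-Python sketch of the filter-and-refine pattern: a cheap bounding-box check discards most points, and the exact point-in-polygon test runs only on the remaining candidates. This is a simplified illustration, not an R-Tree and not the GeoPandas API. A real index organizes many bounding boxes into a tree so that the filter step itself avoids scanning every feature. All function names here are made up for the example.

```python
def bounding_box(ring):
    """Return (min_x, min_y, max_x, max_y) of a polygon ring given as (x, y) tuples."""
    xs = [x for x, _ in ring]
    ys = [y for _, y in ring]
    return min(xs), min(ys), max(xs), max(ys)


def in_bounding_box(x, y, box):
    """Cheap test: is the point inside the rectangle?"""
    min_x, min_y, max_x, max_y = box
    return min_x <= x <= max_x and min_y <= y <= max_y


def point_in_polygon(x, y, ring):
    """Exact test using ray casting: count edge crossings of a ray going in +x."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal line through the point
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def points_within(points, ring):
    """Filter by bounding box first, then refine with the exact test."""
    box = bounding_box(ring)
    candidates = [p for p in points if in_bounding_box(p[0], p[1], box)]
    hits = [p for p in candidates if point_in_polygon(p[0], p[1], ring)]
    return candidates, hits


triangle = [(0, 0), (10, 0), (0, 10)]
points = [(1, 1), (9, 9), (20, 20)]
candidates, hits = points_within(points, triangle)
```

In this example the point at (20, 20) is rejected by the rectangle check alone, while (9, 9) passes the filter but fails the exact test.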
In this post, I want to talk about another spatial indexing system called H3.