K-Means Clustering is a popular algorithm for automatically grouping points into natural clusters. QGIS comes with a Processing Toolbox algorithm ‘K-means clustering’ that can take a vector layer and group features into N clusters. A problem with this algorithm is that you do not have control over how many points end up in each cluster. Many applications require you to segment your data layer into equal-sized clusters or clusters having a minimum number of points. Some examples where you may need this:
When planning an FTTH (Fiber-to-the-Home) network, one may want to divide a neighborhood into clusters of at least 250 houses for placement of a node.
Dividing a sales territory or customer base equally among sales teams, while making sure that customers in the same region are assigned to the same team.
There is a variation of the K-means algorithm called Constrained K-Means Clustering that uses graph theory to find optimal clusters with a user-supplied minimum number of points per cluster: its assignment step is formulated as a minimum-cost flow problem on a network linking points to cluster centers. Stanislaw Adaszewski has a nice Python implementation of this algorithm that I have adapted to be used as a Processing Toolbox algorithm in QGIS.
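To make the graph-theory idea concrete, here is a minimal, self-contained sketch of the constrained assignment step only (no centroid update loop), written in pure Python rather than taken from Adaszewski's code. It routes one unit of flow per point from a source, through a point-to-cluster edge whose cost is the squared distance, to a sink. The function name `constrained_assign` and the "penalty edge" trick for enforcing the minimum are my own assumptions for illustration: each cluster gets `min_size` free slots to the sink plus extra slots that carry a large penalty, so the solver fills every cluster's minimum before anything else.

```python
def constrained_assign(points, centroids, min_size):
    """Assign every point to one centroid so that each cluster receives at least
    min_size points, minimising the total squared distance.
    Solved as a min-cost flow with successive shortest paths (Bellman-Ford)."""
    n, k = len(points), len(centroids)
    if n < k * min_size:
        raise ValueError("too few points to give every cluster min_size members")

    # Node ids: 0 = source, 1..n = points, n+1..n+k = clusters, n+k+1 = sink
    source, sink = 0, n + k + 1
    graph = [[] for _ in range(n + k + 2)]  # node -> list of edge indices
    to, cap, cost = [], [], []

    def add_edge(u, v, capacity, weight):
        # Forward edge at an even index, its residual twin at index ^ 1
        for a, b, c, w in ((u, v, capacity, weight), (v, u, 0, -weight)):
            graph[a].append(len(to))
            to.append(b)
            cap.append(c)
            cost.append(w)

    sq = [[sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids] for p in points]
    # Hypothetical penalty, larger than any possible saving in distance, so the
    # free min_size slots of every cluster are used before the penalised ones
    penalty = 1 + sum(max(row) for row in sq)

    for i in range(n):
        add_edge(source, 1 + i, 1, 0)
        for h in range(k):
            add_edge(1 + i, 1 + n + h, 1, sq[i][h])
    for h in range(k):
        add_edge(1 + n + h, sink, min_size, 0)  # free slots up to the minimum
        add_edge(1 + n + h, sink, n, penalty)   # any extra slots cost a penalty

    for _ in range(n):  # each augmenting path routes one more point
        best = [float("inf")] * len(graph)
        via = [-1] * len(graph)
        best[source] = 0
        for _ in range(len(graph) - 1):  # Bellman-Ford: residual costs can be negative
            changed = False
            for u, edges in enumerate(graph):
                if best[u] == float("inf"):
                    continue
                for e in edges:
                    if cap[e] > 0 and best[u] + cost[e] < best[to[e]]:
                        best[to[e]] = best[u] + cost[e]
                        via[to[e]] = e
                        changed = True
            if not changed:
                break
        v = sink
        while v != source:  # push one unit back along the shortest path
            e = via[v]
            cap[e] -= 1
            cap[e ^ 1] += 1
            v = to[e ^ 1]

    labels = []
    for i in range(n):
        for e in graph[1 + i]:
            # A saturated forward point -> cluster edge is the chosen assignment
            if e % 2 == 0 and cap[e] == 0 and to[e] > n:
                labels.append(to[e] - n - 1)
    return labels
```

For example, with five points near `(0, 0)`, one point at `(10, 10)` and `min_size=3`, plain nearest-centroid assignment would leave the second cluster with a single point; the constrained version moves the two points closest to `(10, 10)` over to it. A full Constrained K-Means would alternate this step with recomputing centroids until the labels stop changing.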
I have heard feedback from users that this algorithm doesn’t work on all types of point distributions and may get stuck while finding an optimal solution. I am looking into ways to improve the code and would appreciate any feedback.
Rainfall is arguably the most frequently measured hydro-meteorological variable. It is a required input for many hydrological applications such as runoff computation, flood forecasting and the engineering design of structures. However, raw rainfall data contains many gaps and inconsistent values. Therefore, it is important to rigorously validate rain-gauge observations before incorporating them into any analysis.
The World Bank’s National Hydrology Project (NHP) prescribes a set of primary and secondary validation methods in the Manual of Rainfall Data Validation. Of particular interest to me are the spatial methods, which aim to identify suspect values by comparison with neighboring stations. This spatial homogeneity test requires complex spatial and statistical data processing that can be quite challenging. I got an opportunity to work on a project that required automating the entire process of identifying and testing suspect stations. I ended up implementing it in QGIS using just Expressions and the Processing Modeler. The whole solution required no custom code and was easily usable by an analyst in the QGIS environment. In this post, I will explain the details of the test and show you how you can use similar techniques for your own analysis.
This workflow was presented as a live session on QGIS Open Day. You can watch the recording to understand the concepts and implementation.
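The spatial homogeneity test boils down to estimating what a station "should" have recorded from its neighbors and flagging large departures. The sketch below is a simplified, plain-Python illustration of that idea, assuming an inverse-distance-weighted (IDW) estimate and a single absolute threshold. The function names, the default `power=2` and the threshold value are hypothetical; the exact estimation and acceptance criteria are defined in the NHP manual, and the actual workflow in this post uses QGIS Expressions and the Processing Modeler rather than Python.

```python
def idw_estimate(target, neighbors, power=2):
    """Inverse-distance-weighted estimate at `target` (x, y) from
    neighboring stations given as (x, y, rainfall) tuples."""
    if not neighbors:
        raise ValueError("at least one neighboring station is required")
    num = den = 0.0
    for x, y, value in neighbors:
        d = ((x - target[0]) ** 2 + (y - target[1]) ** 2) ** 0.5
        if d == 0:
            return float(value)  # co-located station: use its value directly
        w = 1.0 / d ** power     # closer stations get larger weights
        num += w * value
        den += w
    return num / den


def flag_suspect(observed, target, neighbors, max_abs_diff, power=2):
    """Return (is_suspect, estimate): suspect if the observation departs from
    the neighbor-based estimate by more than a hypothetical threshold (mm)."""
    estimate = idw_estimate(target, neighbors, power)
    return abs(observed - estimate) > max_abs_diff, estimate
```

For instance, a station at the origin reporting 50 mm while neighbors at distances 1, 1 and 2 report 10, 10 and 20 mm gets an estimate of about 11.1 mm; with a 25 mm threshold it would be flagged for further checks, whereas a 12 mm reading would pass.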
Many useful climate and weather datasets come as gridded rasters. The techniques for working with them are slightly different from those used for other remote sensing datasets. In this post, I will show how to work with gridded rainfall data in Google Earth Engine. This post also serves as an example of how to use the map/reduce programming style to work efficiently with such large datasets.
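The map/reduce style means expressing the computation as a function mapped over every image in a collection, followed by a reducer that aggregates the results. In Earth Engine this is done with methods such as `ee.ImageCollection.map()` and reducers like `.sum()`. The pure-Python sketch below is only an analogy on tiny in-memory grids (the name `monthly_totals` and the input layout are assumptions): it maps each daily rainfall grid to a month key, then reduces each month's grids with a per-pixel sum.

```python
from collections import defaultdict
from functools import reduce


def add_grids(a, b):
    """Per-pixel sum of two equally shaped 2D grids."""
    return [[x + y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


def monthly_totals(daily_grids):
    """daily_grids: list of ('YYYY-MM-DD', 2D grid of rainfall in mm)."""
    # map: tag every daily grid with its month ('YYYY-MM')
    tagged = map(lambda item: (item[0][:7], item[1]), daily_grids)
    by_month = defaultdict(list)
    for month, grid in tagged:
        by_month[month].append(grid)
    # reduce: collapse each month's grids into one total grid
    return {month: reduce(add_grids, grids) for month, grids in sorted(by_month.items())}
```

Because each mapped step touches one image independently, this pattern parallelizes well, which is why Earth Engine favors it over explicit loops.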
Color correction is an important step when working with satellite and aerial imagery. A common technique used to balance the colors across multiple images is Histogram Matching. While the algorithm has been around for a long time, there aren’t many free and open-source tools that can be used at scale. Mapbox has released an open-source tool called rio-hist that works well for small and medium-sized images. Whitebox Tools has a Histogram Matching algorithm that can be used in QGIS via the Whitebox Tools Processing Plugin. But when working with large mosaics, such as the ones used in this post, it runs out of memory or takes a very long time. Google Earth Engine is a good alternative for fast histogram matching across large images.
In this post, I will first give an overview of the histogram matching algorithm and then show you how it can be implemented in Earth Engine. The example images are large, high-resolution orthomosaics (3 cm/pixel) collected by UAV around the Oakland, CA area.
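The core of histogram matching is simple: compute the cumulative distribution function (CDF) of the source and the reference band, then replace each source value with the reference value found at the same cumulative proportion. The sketch below shows this for a single band of discrete values as flat Python lists; it is an illustration of the algorithm, not the Earth Engine implementation, and the function names are hypothetical.

```python
from bisect import bisect_left
from collections import Counter


def cdf(values):
    """Return the sorted distinct values and their cumulative proportions."""
    counts = Counter(values)
    levels = sorted(counts)
    running, cumulative = 0, []
    for level in levels:
        running += counts[level]
        cumulative.append(running / len(values))
    return levels, cumulative


def match_histogram(source, reference):
    """Remap `source` values so their distribution follows `reference`."""
    src_levels, src_cdf = cdf(source)
    ref_levels, ref_cdf = cdf(reference)
    lookup = {}
    for level, p in zip(src_levels, src_cdf):
        # first reference level whose cumulative proportion reaches p
        idx = min(bisect_left(ref_cdf, p), len(ref_levels) - 1)
        lookup[level] = ref_levels[idx]
    return [lookup[v] for v in source]
```

For a multi-band image the same lookup is built independently for each band (red, green, blue), which is what balances the colors between overlapping images.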
Spatial indexing methods help speed up spatial queries. Most GIS software and databases provide a mechanism to compute and use a spatial index for your data layers. QGIS as well as PostGIS use a spatial indexing scheme based on the R-Tree data structure, which creates a hierarchical tree using the bounding boxes of geometries. This is quite efficient and results in a big speedup for certain types of spatial queries, such as finding the features that intersect a given area. Check out the Spatial Indexing section of my course Advanced QGIS, where I show how to use an R-Tree based spatial index in QGIS.
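To see why a bounding-box hierarchy speeds things up, consider the toy two-level version below. It is deliberately simplified and is not a real R-tree: there is no insertion, node splitting or deeper levels, and grouping boxes by sorting on their center x-coordinate is an assumption made only for illustration. The payoff is the same, though: a query first tests each group's combined bounding box and skips every member of a group that cannot match.

```python
def union(boxes):
    """Bounding box (minx, miny, maxx, maxy) enclosing all given boxes."""
    return (min(b[0] for b in boxes), min(b[1] for b in boxes),
            max(b[2] for b in boxes), max(b[3] for b in boxes))


def intersects(a, b):
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def build(boxes, node_size=4):
    """Pack {feature_id: bbox} into leaf nodes of up to node_size boxes."""
    items = sorted(boxes.items(), key=lambda kv: (kv[1][0] + kv[1][2]) / 2)
    nodes = []
    for i in range(0, len(items), node_size):
        chunk = items[i:i + node_size]
        nodes.append((union([box for _, box in chunk]), chunk))
    return nodes


def query(nodes, window):
    """Ids of features whose bounding box intersects the query window."""
    hits = []
    for node_box, chunk in nodes:
        if not intersects(node_box, window):
            continue  # skip the whole group without testing its members
        hits.extend(fid for fid, box in chunk if intersects(box, window))
    return sorted(hits)
```

As in a real index, this only returns candidates whose bounding boxes intersect; an exact geometry test is still needed afterwards for non-rectangular features.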
If you use Python for geoprocessing, the GeoPandas library also provides an easy-to-use implementation of an R-Tree based spatial index via the .sindex attribute. The University of Helsinki’s AutoGIS course has an excellent example of using a spatial index with GeoPandas.
In this post, I want to talk about another spatial indexing system called H3.
When working on Remote Sensing applications, many operations require calculating area. For example, one needs to calculate the area covered by each class after a supervised classification, or find out how much area within a region was affected by a disaster. Calculating area for rasters and vectors is a straightforward operation in most software packages, but it is done in a slightly different way in Google Earth Engine, which can be confusing to beginners. In this post, I will outline methods for calculating areas of both vectors and images. We will cover the following topics, going from simple to complex:
Area Calculation for Features (i.e. Vector Data)
Area Calculation for Images (Single Class)
Area Calculation for Images by Class
Area Calculation for Images by Class by Region
Area Calculation for Images by Class by Region by Year
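The key idea behind image area calculation is to sum the area of each pixel rather than count pixels, because on a latitude/longitude grid pixels shrink toward the poles. In Earth Engine this is the role of `ee.Image.pixelArea()`, which gives each pixel's area in square meters, combined with a sum reducer. The plain-Python sketch below only illustrates the idea under stated assumptions: a spherical Earth of mean radius 6,371,008.8 m, a regular grid with row 0 at `top_lat`, and hypothetical function names.

```python
import math

EARTH_RADIUS_M = 6371008.8  # mean Earth radius; spherical approximation


def pixel_area_m2(lat_top, lat_bottom, width_deg):
    """Area of a lat/lon cell on a sphere: R^2 * d(lon) * |sin(lat1) - sin(lat2)|."""
    return EARTH_RADIUS_M ** 2 * math.radians(width_deg) * abs(
        math.sin(math.radians(lat_top)) - math.sin(math.radians(lat_bottom)))


def area_by_class(classified, top_lat, res_deg):
    """Total area (m^2) per class code for a classified lat/lon grid."""
    totals = {}
    for r, row in enumerate(classified):
        # every pixel in a row spans the same latitude band, so same area
        area = pixel_area_m2(top_lat - r * res_deg, top_lat - (r + 1) * res_deg, res_deg)
        for cls in row:
            totals[cls] = totals.get(cls, 0.0) + area
    return totals
```

Grouping by region or year, as in the later topics, is the same summation with an extra key (for example `(region, year, class)`) in the totals dictionary.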
When working with raster data, you may sometimes need to deal with data gaps. These could be the result of sensor malfunction, processing errors or data corruption. Below is an example of a data gap (i.e. no-data values) in aerial imagery.
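One simple way to fill such gaps is to replace each no-data cell with the average of its valid neighbors and repeat the process so that larger gaps fill in from the edges. The sketch below is a minimal pure-Python illustration of that idea; it is an assumed, simplified approach and not the algorithm used by GDAL's or QGIS's Fill NoData tools. The `nodata` marker and `max_passes` limit are hypothetical parameters.

```python
def fill_gaps(grid, nodata=None, max_passes=10):
    """Fill nodata cells with the mean of their valid 8-neighbors,
    repeating so that larger gaps are filled inward from their edges."""
    rows, cols = len(grid), len(grid[0])
    out = [row[:] for row in grid]  # do not modify the input grid
    for _ in range(max_passes):
        updates = {}
        for r in range(rows):
            for c in range(cols):
                if out[r][c] != nodata:
                    continue
                vals = [out[rr][cc]
                        for rr in range(max(r - 1, 0), min(r + 2, rows))
                        for cc in range(max(c - 1, 0), min(c + 2, cols))
                        if out[rr][cc] != nodata]
                if vals:
                    updates[(r, c)] = sum(vals) / len(vals)
        if not updates:
            break  # nothing left to fill (or gap has no valid neighbors)
        # apply after the scan so each pass only uses the previous pass's values
        for (r, c), value in updates.items():
            out[r][c] = value
    return out
```

Applying updates only after each full scan keeps the result independent of the order in which cells are visited.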
As everyone who is involved in teaching and training knows, the past few months have been hard. We all had to make changes to accommodate working from home and adopt online teaching methods. Before the COVID-19 outbreak, I used to conduct all my training in person, either hosting it at a training center or at a client location. My materials, structure and instruction style were tuned to this setup. I was skeptical whether the classroom experience could be replicated online, even partially.
Over the past 2 months, I have conducted numerous online training sessions. All my courses have moved to a ‘live’ online format, and I have even started offering short-format classes. I did a lot of research, talked to other trainers and spent considerable effort trying to make this transition. I thought sharing some of the lessons and best practices here would help fellow educators.
I was invited to participate in a panel discussion on Geospatial Intelligence for the #LetsTalkDeepTech webcast hosted by Swiggy. I talked about the history and evolution of this space and did a deep dive into solutions for deriving intelligence from imagery.
Below is a longer version of my talk on the evolution of location intelligence, with some references. I also share a copy of my presentation at the end. Hope you find it useful. Agree or disagree with my views? Let me know in the comments.
Time series analysis is one of the most common operations in Remote Sensing. It helps in understanding and modeling seasonal patterns as well as monitoring land cover changes. Earth Engine is uniquely suited for extracting dense time series over long periods of time.
In this post, I will go through different methods and approaches for time series extraction. While there are plenty of examples that show how to extract a time series for a single location, unique challenges come up when you need a time series for many locations spanning a large area. I will explain those challenges and present code samples to solve them.
The ultimate goal for this exercise is to extract NDVI time series from Sentinel-2 data over 1 year for 100 farm locations spanning an entire state in India.
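NDVI is computed from the near-infrared and red bands as (NIR − Red) / (NIR + Red); for Sentinel-2 these are bands B8 and B4. When extracting values for many locations, it helps to think of the output as a long table with one row per location per image date, which is easy to export and analyze. The pure-Python sketch below shows only that reshaping step on already-sampled reflectance values; the input layout and function names are assumptions, and the actual Earth Engine extraction code is not reproduced here.

```python
def ndvi(nir, red):
    """Normalized Difference Vegetation Index; None when undefined (NIR + Red = 0)."""
    total = nir + red
    return (nir - red) / total if total != 0 else None


def to_rows(samples):
    """samples: {location_id: [(date, {'B4': red, 'B8': nir}), ...]}
    Returns (location_id, date, ndvi) rows sorted by location, then date."""
    rows = []
    for loc_id, observations in samples.items():
        for date, bands in observations:
            rows.append((loc_id, date, ndvi(bands["B8"], bands["B4"])))
    return sorted(rows, key=lambda row: (row[0], row[1]))
```

Keeping undefined values as `None` instead of dropping them makes gaps in the series (for example, cloudy dates) visible when the table is later pivoted into one column per location.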