Learning SQL with DuckDB

By ujaval | May 5, 2026

Historically, open-source spatial analysis workflows lived in two separate silos:

Python: You use Pandas, GeoPandas, and Jupyter Notebooks.
SQL: You use SQL with PostGIS.

If you lived in the Python ecosystem, you rarely had to switch to SQL for analysis, and vice versa. Many people, including me, had little motivation to sharpen their SQL skills when they could get away with doing everything in Python. Many of our students who wanted to learn SQL found themselves choosing between these two stacks and found that SQL had much more friction to get started.

Recently, I have been using DuckDB and find that it is the perfect bridge between these two ecosystems. DuckDB + Python + LLMs provide the easiest and most rewarding pathway for Python users to learn and incorporate SQL in their workflows.

In this post, we will cover the following topics:

What is DuckDB?
Using DuckDB to learn SQL with the help of LLMs
Example Workflow with LLMs
- Querying and Loading Administrative Boundaries from GeoBoundaries
Advanced Workflows
- Extracting Overture Maps Data
- Extracting Farm Boundaries from Global Fields of The World (FTW)

Open our companion notebook learning_sql_duckdb.ipynb in Google Colab to follow along and run the queries yourself.


Get a Country-specific World Map in QGIS

By ujaval | October 13, 2025

QGIS comes bundled with a simplified version of the Natural Earth Countries shapefile that is suitable for quick map-making. The layer can be loaded into your canvas by typing the keyword world in the coordinates bar.

While this is useful, there is no single political map of the world that every country accepts. Many international boundaries are disputed, and each country has its own version of the accepted international boundaries. To help mapmakers adhere to local mapping regulations, Natural Earth also publishes point-of-view versions of its Countries shapefile for many countries, which depict the world map according to each country's laws and/or local conventions. We provide a simple script to replace the bundled world map with your country's point-of-view layer.


#PythonDatavizChallenge – Learn Mapping and Data Visualization with Python in 30 Days

By ujaval | October 13, 2024

Welcome to #PythonDatavizChallenge – Learn Mapping and Data Visualization with Python in 30 Days! We have designed this challenge to help you learn how to create charts, maps, animations, dashboards and interactive mapping applications using Python! Spend 30 minutes each day for the next 30 days to level up your Python dataviz skills. We have spent over 2 years building and refining this course and are excited to share it with you all – completely free.

We will be posting short videos every day, covering the full course material step by step. The material covers both static and dynamic plotting libraries along with the app framework Streamlit. By the end of the course, you will have the skills needed to build data-powered web mapping apps and dashboards. Ready for #PythonDatavizChallenge? Read on for the details.

This is an intermediate course that assumes good working knowledge of Python. If you are new to programming, complete our Python Foundation for Spatial Analysis course first.


#PyQGISChallenge – Master QGIS Python Development in 30 Days

By ujaval | July 15, 2024

Welcome to #PyQGISChallenge – Master QGIS Python Development in 30 Days! We are launching our PyQGIS Masterclass course on YouTube and have designed this challenge to help you learn how to customize QGIS using Python with scripts, custom algorithms, actions and plugins! Spend 30 minutes each day for the next 30 days to level up your QGIS skills. This course is the result of my 15+ years of experience doing QGIS development – including building enterprise-grade plugins and deploying QGIS to thousands of users. I am really excited to share this content with you – completely free.

We will be posting short videos every day, covering the full course material step by step. The material is designed to help you ramp up gradually and learn complex concepts! All you have to do is show up every day and spend half an hour watching the videos and practicing the exercises. At the end, you can take up a mini-project and apply your newly acquired skills. Ready for #PyQGISChallenge? Read on for the details.

This is an advanced course that assumes good working knowledge of both Python and QGIS. If you are new to programming, complete our Python Foundation for Spatial Analysis course first.


LISS4 Image Processing using XArray and Dask

By ujaval | December 25, 2023

ISRO recently released its full archive of medium- and low-resolution Earth Observation datasets to the public. This includes imagery from the LISS-IV camera aboard the ResourceSat-2 and ResourceSat-2A satellites, which is currently the highest spatial resolution imagery available in the public domain for India. In this post, I want to cover the steps required to download the imagery and apply the pre-processing needed to make this data ready for analysis – specifically, how to programmatically convert the DN values to TOA Reflectance. We will use modern Python libraries such as XArray, rioxarray, and dask, which allow you to seamlessly work with large datasets and use all the available compute power on your machine.
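The DN to TOA Reflectance conversion is usually a two-step formula: DN values are first turned into at-sensor radiance with a linear calibration, and radiance is then normalized by solar irradiance, sun angle and Earth-Sun distance. The sketch below assumes a generic `gain`/`offset` calibration and a per-band `esun` value; the actual LISS-IV coefficients come from the scene metadata, and the post's own steps may differ. It uses a common first-order approximation for Earth-Sun distance. Because `dn` is only used in arithmetic, the same function should also work on NumPy arrays or (dask-backed) xarray DataArrays.

```python
import math


def earth_sun_distance(day_of_year):
    """Approximate Earth-Sun distance in astronomical units (AU)."""
    # First-order approximation based on the eccentricity of Earth's orbit
    return 1 - 0.01672 * math.cos(math.radians(0.9856 * (day_of_year - 4)))


def dn_to_toa_reflectance(dn, gain, offset, esun, sun_elevation_deg, day_of_year):
    """Convert DN values to top-of-atmosphere (TOA) reflectance.

    `dn` may be a number, a NumPy array or an xarray DataArray, since it is
    only used in arithmetic. `gain`, `offset` and `esun` are per-band values
    ASSUMED to come from the scene metadata / calibration files.
    """
    # Step 1: DN -> at-sensor spectral radiance (linear calibration)
    radiance = dn * gain + offset
    # Step 2: radiance -> reflectance, correcting for sun angle and distance
    d = earth_sun_distance(day_of_year)
    sun_zenith = math.radians(90 - sun_elevation_deg)
    return (math.pi * radiance * d ** 2) / (esun * math.cos(sun_zenith))
```

With placeholder values, `dn_to_toa_reflectance(100, 0.1, 0.0, 1000.0, 60, 150)` returns a unitless reflectance; swapping in a DataArray of DNs applies the same formula to every pixel.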


Understanding Pixel Weights in Zonal Statistics

By ujaval | July 13, 2023

An important concept in zonal statistics is pixel weights. When calculating pixel statistics within a polygon, different packages treat partially overlapping pixels differently, and you need to understand this to evaluate the accuracy of your results. Consider the following image. What is the correct answer?
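To see why the answer depends on the package, here is a minimal sketch that computes a polygon's mean value under three illustrative rules: weighting each pixel by the fraction of it covered by the polygon, counting every touched pixel fully, or counting only pixels that are at least half covered. The pixel values and coverage fractions are hypothetical, and the rule names are made up for this example rather than taken from any specific package.

```python
def zonal_mean(values, coverage, method="weighted"):
    """Mean of pixel values within a polygon under different weighting rules.

    values   -- values of the pixels touched by the polygon
    coverage -- fraction (0-1) of each pixel's area inside the polygon
    """
    if method == "weighted":
        # Each pixel contributes in proportion to its covered area
        weights = coverage
    elif method == "all_touched":
        # Any pixel touched by the polygon counts fully
        weights = [1.0 if c > 0 else 0.0 for c in coverage]
    elif method == "half_covered":
        # Only pixels at least half inside the polygon count
        weights = [1.0 if c >= 0.5 else 0.0 for c in coverage]
    else:
        raise ValueError(f"unknown method: {method}")
    total_weight = sum(weights)
    return sum(v * w for v, w in zip(values, weights)) / total_weight


values = [10, 20, 30, 40]
coverage = [1.0, 0.5, 0.25, 0.0]
for method in ("weighted", "all_touched", "half_covered"):
    print(method, zonal_mean(values, coverage, method))
```

The same four pixels give a mean of about 15.71, 20.0 or 15.0 depending on the rule, which is exactly the kind of discrepancy you need to be aware of.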


Creating Animated Plots with Matplotlib

By ujaval | January 14, 2022

Matplotlib has functionality to create animations, which can be used to build dynamic visualizations. In this post, I will explain the concepts and techniques for creating animated charts using Python and Matplotlib.

I find this technique very helpful for creating animations that show how certain algorithms work. This post also contains Python implementations of two common geometry simplification algorithms, and they will be used to create animations showing each step of the algorithm. Since both implementations use a recursive function, the technique shown in the post can be extended to visualize other recursive functions using Matplotlib. You will learn how to create animated plots like the one below.
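The key idea for animating a recursive algorithm is to record the state at every recursive call so that each recorded state can later be drawn as one animation frame. The sketch below assumes Douglas-Peucker as one of the simplification algorithms (the post's exact implementations may differ) and appends the segment being tested to a `frames` list. The `frames` parameter is a hypothetical addition for illustration. A Matplotlib `FuncAnimation` could then replay the list one entry per frame.

```python
import math


def perpendicular_distance(point, start, end):
    """Distance from `point` to the infinite line through `start` and `end`."""
    (x, y), (x1, y1), (x2, y2) = point, start, end
    dx, dy = x2 - x1, y2 - y1
    if dx == 0 and dy == 0:
        return math.hypot(x - x1, y - y1)
    return abs(dy * x - dx * y + x2 * y1 - y2 * x1) / math.hypot(dx, dy)


def douglas_peucker(points, tolerance, frames=None):
    """Simplify a polyline, recording each recursive step in `frames`."""
    if frames is not None:
        # One frame per call: the segment currently being tested
        frames.append((points[0], points[-1]))
    if len(points) < 3:
        return list(points)
    # Find the intermediate vertex farthest from the start-end segment
    distances = [perpendicular_distance(p, points[0], points[-1])
                 for p in points[1:-1]]
    max_dist = max(distances)
    index = distances.index(max_dist) + 1
    if max_dist <= tolerance:
        # Every intermediate vertex is close enough: keep only the endpoints
        return [points[0], points[-1]]
    # Otherwise split at the farthest vertex and simplify both halves
    left = douglas_peucker(points[: index + 1], tolerance, frames)
    right = douglas_peucker(points[index:], tolerance, frames)
    return left[:-1] + right  # drop the split vertex duplicated in both halves
```

For example, `douglas_peucker([(0, 0), (1, 1), (2, 0)], 0.5, frames)` keeps all three vertices and records three frames: the initial call plus one for each half.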


K-Means Clustering with Equal Sized Clusters in QGIS

By ujaval | January 31, 2021

K-Means Clustering is a popular algorithm for automatically grouping points into natural clusters. QGIS comes with a Processing Toolbox algorithm ‘K-means clustering’ that can take a vector layer and group features into N clusters. A problem with this algorithm is that you do not have control over how many points end up in each cluster. Many applications require you to segment your data layer into equal-sized clusters or clusters having a minimum number of points. Some examples where you may need this:

When planning for FTTH (Fiber-to-the-Home) network one may want to divide a neighborhood into clusters of at least 250 houses for placement of a node.
Dividing a sales territory or customer base equally among sales teams, so that customers in the same region are assigned to the same team.

There is a variation of the K-means algorithm called Constrained K-Means Clustering that uses graph theory to find optimal clusters with a user-supplied minimum number of points in each cluster. Stanislaw Adaszewski has a nice Python implementation of this algorithm that I have adapted to be used as a Processing Toolbox algorithm in QGIS.
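To get a feel for how a size constraint changes K-means, here is a much simpler greedy heuristic. It is not the graph-based Constrained K-Means algorithm described above. Instead of a minimum size, it caps each cluster at `ceil(n / k)` points. In the assignment step, every point goes to the nearest centroid that still has room. When `n` is divisible by `k`, this forces equal-sized clusters. The function name, the naive initialization and the fixed iteration count are all assumptions made for this sketch.

```python
import math


def balanced_kmeans(points, k, iterations=10):
    """Group 2D points into k clusters of at most ceil(n / k) points each.

    Greedy, capacity-limited K-means variant for illustration only; it is
    NOT the min-cost-flow formulation of Constrained K-Means.
    Assumes 1 <= k <= len(points).
    """
    n = len(points)
    capacity = math.ceil(n / k)
    step = n // k
    # Naive initialization: k points spread evenly through the input list
    centroids = [points[i * step] for i in range(k)]
    labels = [None] * n
    for _ in range(iterations):
        # Every (distance, point, cluster) combination, nearest first
        pairs = sorted(
            (math.dist(p, c), i, j)
            for i, p in enumerate(points)
            for j, c in enumerate(centroids)
        )
        labels = [None] * n
        sizes = [0] * k
        for _, i, j in pairs:
            # Give each point to the nearest cluster that still has room
            if labels[i] is None and sizes[j] < capacity:
                labels[i] = j
                sizes[j] += 1
        # Move each non-empty cluster's centroid to the mean of its members
        for j in range(k):
            members = [points[i] for i in range(n) if labels[i] == j]
            if members:
                centroids[j] = (
                    sum(x for x, _ in members) / len(members),
                    sum(y for _, y in members) / len(members),
                )
    return labels, centroids
```

Because the total capacity `k * ceil(n / k)` is at least `n`, every point is guaranteed a cluster. The greedy rule can, however, produce less compact clusters than an optimal solver would.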

Warning!

I have heard feedback from users that this algorithm doesn’t work on all types of point distributions and may get stuck while searching for an optimal solution. I am looking into ways to improve the code and would appreciate any feedback.


Fast Point-in-Polygon Analysis with GeoPandas and Uber’s H3 Spatial Index

By ujaval | July 1, 2020

Spatial indexing methods help speed up spatial queries. Most GIS software and databases provide a mechanism to compute and use a spatial index for your data layers. QGIS and PostGIS both use a spatial indexing scheme based on the R-Tree data structure, which creates a hierarchical tree using the bounding boxes of geometries. This is quite efficient and results in a big speedup for certain types of spatial queries. Check out the Spatial Indexing section of my Advanced QGIS course, where I show how to use an R-Tree based spatial index in QGIS.

If you use Python for geoprocessing, the GeoPandas library also provides an easy-to-use implementation of an R-Tree based spatial index through the .sindex attribute. University of Helsinki’s AutoGIS course has an excellent example of using a spatial index with GeoPandas.
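The speedup from a bounding-box index comes from a two-step "filter, then refine" pattern. A cheap bounding-box comparison first discards polygons that cannot contain a point, and the expensive exact point-in-polygon test runs only on the remaining candidates. The pure-Python sketch below shows only that pattern. It still scans every box, whereas a real R-Tree (as used by GeoPandas' `.sindex`) organizes the boxes hierarchically so that most of them are never visited. All function names here are hypothetical.

```python
def bbox(polygon):
    """Bounding box (min_x, min_y, max_x, max_y) of a list of (x, y) vertices."""
    xs, ys = zip(*polygon)
    return min(xs), min(ys), max(xs), max(ys)


def point_in_polygon(x, y, polygon):
    """Ray casting: an odd number of edge crossings means the point is inside."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):
            # x-coordinate where this edge crosses the horizontal line at y
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def points_in_polygons(points, polygons):
    """Return (point_index, polygon_index) pairs using a bounding-box prefilter."""
    boxes = [bbox(poly) for poly in polygons]
    matches = []
    for pi, (x, y) in enumerate(points):
        for gi, (minx, miny, maxx, maxy) in enumerate(boxes):
            # Filter: skip polygons whose bounding box misses the point
            if not (minx <= x <= maxx and miny <= y <= maxy):
                continue
            # Refine: exact test only for the surviving candidates
            if point_in_polygon(x, y, polygons[gi]):
                matches.append((pi, gi))
    return matches
```

Note that a point can fall inside a polygon's bounding box and still be outside the polygon itself, which is why the refine step is always needed.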

In this post, I want to talk about another spatial indexing system called H3.


Fixing Rasters with Missing Data using QGIS, GDAL and Python

By ujaval | June 17, 2020

When working with raster data, you may sometimes need to deal with data gaps. These could be the result of sensor malfunction, processing errors or data corruption. Below is an example of a data gap (i.e. NoData values) in aerial imagery.
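A common way to repair such gaps is to interpolate each missing pixel from nearby valid pixels. GDAL's `gdal_fillnodata` utility, for instance, interpolates with inverse distance weighting. The pure-Python sketch below uses a simpler stand-in: it replaces each NoData cell with the mean of its valid 8-connected neighbours and repeats the pass so that larger gaps are filled from their edges inward. The function name and `max_passes` parameter are hypothetical, and the result will differ from GDAL's interpolation.

```python
def fill_nodata(grid, nodata, max_passes=10):
    """Fill NoData cells with the mean of their valid 8-connected neighbours."""
    rows, cols = len(grid), len(grid[0])
    result = [row[:] for row in grid]  # leave the input grid untouched
    for _ in range(max_passes):
        filled = [row[:] for row in result]
        remaining = 0
        for r in range(rows):
            for c in range(cols):
                if result[r][c] != nodata:
                    continue
                # Valid neighbours from the previous pass (clipped at edges)
                neighbours = [
                    result[rr][cc]
                    for rr in range(max(0, r - 1), min(rows, r + 2))
                    for cc in range(max(0, c - 1), min(cols, c + 2))
                    if (rr, cc) != (r, c) and result[rr][cc] != nodata
                ]
                if neighbours:
                    filled[r][c] = sum(neighbours) / len(neighbours)
                else:
                    remaining += 1  # interior of a large gap: try next pass
        result = filled
        if remaining == 0:
            break
    return result
```

For example, `fill_nodata([[5, -1, -1, -1, 5]], -1)` needs two passes. The first fills the two cells next to valid data, and the second fills the middle cell from them.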

Source Image: © Commission for Lands (COLA) ; Revolutionary Government of Zanzibar (RGoZ), Downloaded from OpenAerialMap. (Note: The data gap is simulated using a Python script and is not part of the original dataset.)
