Fixing Rasters with Missing Data using QGIS, GDAL and Python

When working with raster data, you may sometimes need to deal with data gaps. These could be the result of sensor malfunction, processing errors or data corruption. Below is an example of data gap (i.e. no data values) in aerial imagery.

Source Image: © Commission for Lands (COLA) ; Revolutionary Government of Zanzibar (RGoZ), Downloaded from OpenAerialMap. (Note: The data gap is simulated using a python script and is not part of the original dataset)

If the data gap is small, it can be effectively addressed by interpolating values from neighboring pixels. I will outline 2 approaches for fixing this. First one using QGIS and another one with pure Python.

The method shown here applies an inverse distance weighted interpolation and smoothing using the gdal_fillnodata tool. As pointed out in the documentation, this is suitable for filling missing regions in continuous raster data such as elevation. It also works for very small gaps in varying data such as aerial imagery. If you are looking to interpolate point data to create rasters, you should use the gdal_grid tool instead. This is available as a few separate algorithms in QGIS under Processing → Toolbox → Raster Analysis → Grid …

Fixing Data Gaps in QGIS

GDAL comes with a tool gdal_fillnodata that can be used from the Processing Toolbox within QGIS.

If the source raster has a nodata value set and it is the same as the missing data value, then you can skip this step. Otherwise, the first step would be to set the raster’s nodata value to the pixel value of the data gap. From Processing → ToolBox, search and locate the Translate (convert format) tool

In our example the nodata pixel value is 0. Set the value 0 for Assign a specified nodata value to output bands option and enter a filename for the converted raster.

Now we are ready to run the Fill nodata tool from the Processing Toolbox

This tool works on 1 band at a time. Select Band 1 (Red). Set the Maximum distance to search out for values to interpolate to 1, since we have only 1 pixel gap. Save the output as 01_red.tif and click Run. Saving the file with prefix such as 01_ is important because the next step will merge the bands in the alphabetical order of the file names.

Repeat the process for Band 2 (Green) and Band 2 (Blue), choosing appropriate file names for them. You should have 3 separate rasters with no data values filled. Now we can merge them to a single file. Search and locate the Merge tool from the Processing Toolbox.

In the Merge tool, select all 3 individual rasters. Check the Place each input file into a separate band box. Enter a filename for the output and click Run.

The resulting merged raster will have 3 bands and the no-data gaps will be filled with interpolated values from neighboring pixels.

Here’s the animation showing before and after versions.

Fixing Data Gaps with Python

Usually when you have such issues in your dataset, it is persistent across all source images. If you have large amounts of data with such data gaps, fixing them in QGIS manually is not feasible. So here’s another approach using Python.

Those who have taken my Python Foundation for Spatial Analysis course, would know that this particular problem motivated me to learn Python 15 years back since there was no readily available solution. Below is a script that shows how to solve this problem in Python with the help of rasterio and numpy libraries.

# Fill Missing Rows with Data
# This script shows how to read an image where certain rows have missing data (i.e. 0) and fill that with the average of adjacent rows.
import rasterio
import numpy as np
filename = 'original.tif'
dataset = rasterio.open(filename)
metadata = dataset.meta
red = dataset.read(1)
green = dataset.read(2)
blue = dataset.read(3)
# The following returns an array where each item is True/False based on the condition `red==0`
result = np.all(red == 0, axis=1)
def average_rows(array, index):
    result = np.round(np.mean( np.array([array[index[0]-1], array[index[0]+1] ]), axis=0 ))
    array[index] = result
    
for index, x in np.ndenumerate(result):
    if (x and index[0] != 0 and index[0] != (dataset.height - 1)):
        average_rows(red, index)
        average_rows(blue, index)
        average_rows(green, index)
output_filename = 'fixed.tif'
metadata.update({'driver': 'GTiff'})
rgb_dataset = rasterio.open(output_filename, 'w', **metadata)
rgb_dataset.write(red, 1)
rgb_dataset.write(green, 2)
rgb_dataset.write(blue, 3)
rgb_dataset.close()

If you want to try out this example, you can download the demo dataset which has the aerial image and Jupyter notebook with the Python code.

Leave a Reply