Histogram Matching in Google Earth Engine

Color correction is an important process working with satellite and aerial imagery. A common technique used to balance the colors across multiple images is Histogram Matching. While the algorithm has been around for a long time, there aren’t many free and open-source tools that can used at scale. Mapbox has released an open-source tool called rio-hist that works well for small and medium sized images. Whitebox Tools has a Histogram Matching algorithm that can be used in QGIS via Whitebox Tools Processing Plugin. But when working with large mosaics, such as the ones used in this post – it runs out of memory or takes a very long time. Google Earth Engine is a good alternative to perform fast histogram matching across large images.

In this post, I will first give an overview of the histogram matching algorithm and then show you how it can be implemented in Earth Engine. The example images are large high resolution orthomosaics (3cm/pixel resolution) collected by UAV around Oakland, CA area.

Source Images: © Dan Koopman, Downloaded from OpenAerialMap

The histogram matching algorithm requires first computing the histogram of both reference and target images. Then for each of them we must calculate a Cumulative Distribution Function (CDF). This article gives a good overview of the process. Once we have the CDF for both the images, for each pixel in the target image, we can find the cumulative probability using the target CDF and then for that probability, lookup the pixel value using the reference CDF. The diagram below shows an example of how the matching algorithm works.

Implementation

Below is the earth engine implementation of this algorithm with explanations of each step. It is largely adapted from the code from this thread on the user forum.

Computing a histogram in Earth Engine is easy using the built-in ee.Reducer.histogram() reducer.

 var histogram = image.reduceRegion({
    reducer: ee.Reducer.histogram({
      maxBuckets: Math.pow(2, 8), 
    }), 
    geometry: image.geometry(), 
    scale: 10, 
    maxPixels: 1e9, 
  });

Once we have the histogram computed, we need to calculate the cumulative probabilities for each DN value. This can be accomplished using arrays – so we must first get familiar with how arrays work. Arrays and Matrix tutorial in the developer’s guide is a great overview of the topic. Below is a quick overview of the functions we need. The accum() function helps us create an array where each value is the cumulative sum of all previous values.

var test1 = ee.Array([0, 1, 2, 3, 4, 5])
print(test1.accum({axis:0})) // [0, 1, 3, 6, 10, 15]

Another function we need to understand is the ee.Array.cat() – which combines arrays across a given axis. The below snippet shows how 2 arrays can be combined across different axis values.

var test2 = ee.Array([1, 1, 1, 1, 1, 1])
// Combine along axis 0
print(ee.Array.cat([test1, test2], 0)) // [0,1,2,3,4,5,1,1,1,1,1,1]
// Combine along axis 1
print(ee.Array.cat([test1, test2], 1)) // [[0,1],[1,1],[2,1],[3,1],[4,1],[5,1]]

With the help of above functions, we can compute cumulative probabilities for each DN value. When coding in Earth Engine, it is always better to get your data structured as a FeatureCollection or an ImageCollection. These structures allow us to manipulate and visualize the data much more easily. You can create features with null geometries if your data is not spatial. Here’s the function that takes an image and returns a feature collection with cumulative probabilities.

var getHistData = function(image, band) {
  var histogram = image.reduceRegion({
    reducer: ee.Reducer.histogram({
      maxBuckets: Math.pow(2, 8), 
    }), 
    geometry: image.geometry(), 
    scale: 10, 
    maxPixels: 1e9, 
  });
  var dnList = ee.List(ee.Dictionary(histogram.get(band)).get('bucketMeans'));
  var countsList = ee.List(ee.Dictionary(histogram.get(band)).get('histogram'));
  var cumulativeCountsArray = ee.Array(countsList).accum({axis:0});
  var totalCount = cumulativeCountsArray.get([-1]);
  var cumulativeProbabilities = cumulativeCountsArray.divide(totalCount);
  
  var array = ee.Array.cat({arrays: [dnList, cumulativeProbabilities], axis:1});
  var fc = ee.FeatureCollection(array.toList().map(function(list) {
    return ee.Feature(null, {
      dn: ee.List(list).get(0), 
      probability: ee.List(list).get(1)});
  }));
  return fc
};

Here’s what the feature collection looks like when plotted DNs against cumulative probabilities.

Now we have a feature collection that has values for cumulative probabilities for each DN. We need to now find a function that fits these values. This is a classic regression problem. In Earth Engine, we can use classifiers to do standard regression.

When you build a classifier, the default mode is to output discrete class numbers. But you can set the mode to REGRESSION to have the classifier output continuous values from standard regression.

We build 2 classifiers to fit the CDF of each image. One will take the a DN value and predict the cumulative probability and another that will take the cumulative probability and predict the matched DN value.

Below snippet shows the equalize function that will compute the CDF and match a band of target image to the reference image.

var equalize = function(referenceImage, targetImage, band) {
  var referenceHistData = getHistData(referenceImage, band);
  var targetHistData = getHistData(targetImage, band);
  // We train dnToProb on target image and probToDn on reference image
    
    var dnToProb = ee.Classifier.smileRandomForest(100)
      .setOutputMode('REGRESSION')
      .train({
        features: targetHistData, 
        classProperty: 'probability', 
        inputProperties: ['dn']
    });
  
    var probToDn = ee.Classifier.smileRandomForest(100)
      .setOutputMode('REGRESSION')
      .train({
        features: referenceHistData, 
        classProperty: 'dn', 
        inputProperties: ['probability']
    });
    // Now we can take the target image and get cumulative probability
    var targetImageProb = targetImage.select(band).rename('dn').classify(dnToProb, 'probability')
    var targetImageDn = targetImageProb.classify(probToDn, band)
    return targetImageDn
};

The last part is calling the equalize function for all 3 bands and combining the results.

var match = function(referenceImage, targetImage, bandNames) {
  var matchedBands = bandNames.map(function(band) {
    return equalize(referenceImage, targetImage, band);
  })
  return ee.Image.cat(matchedBands)
};

Here’s the link to the complete script. If you want to try it with your own dataset, upload them to your account and replace the reference and target imports. The result is quite satisfying and the target image’s color tone matches the reference image quite well.

4 Comments

Leave a Comment

  1. How do I find or determine a target image when I’m working with a Sentinel-2 Level-2A image collection?

      • Thanks for your reply. The S-2 L2A Image collection should be already atmospheric corrected when used by Google. The paper deals with S-2 L-1C and only cover certain continents. I work with imagery from Northern Europe, so I cannot use their data. I could not find concrete information about a code for BRDF correction, but I will check again. I also do not understand why histogram matching should not be done with scientific data…

Leave a Reply