Estimating Above Ground Biomass using Random Forest Regression in GEE

In this post, we will learn how to build a regression model in Google Earth Engine and use it to estimate total above ground biomass using openly available Earth Observation datasets.

NASA’s Global Ecosystem Dynamics Investigation (GEDI) mission collects LiDAR measurements along ground transects, with footprints of approximately 25m diameter spaced at 60m intervals. The GEDI waveform captures the vertical distribution of vegetation structure and is used to derive estimates of Aboveground Biomass Density (AGBD) at each sample. These sample estimates of AGBD are useful, but since they are point measurements, you cannot directly use them to calculate the total aboveground biomass for a region. We can combine the GEDI samples with other satellite datasets to build a regression model and use it to map and quantify biomass across a region.

Regression Workflow

This article shows how we can build and run the entire workflow in Earth Engine – from pre-processing to building a regression model to running the predictions. You will also learn some advanced techniques and best practices such as:

  • How to fuse datasets of different resolutions by using setDefaultProjection() and reduceResolution() to align them to a common grid.
  • How to sample pixels from rasters with sparse data efficiently and precisely by leveraging the image mask using stratifiedSample().
  • How to split your workflow into separate steps and use Exports to avoid 'user memory limit exceeded' or 'computation timed out' errors.

The post is divided into the following 3 parts.

  • Part-1. Preprocessing and Data Preparation
  • Part-2. Building a Regression Model
  • Part-3. Estimation of Total Biomass

You will find the complete scripts for each part at the end of the post.

Part-1. Preprocessing and Data Preparation

Datasets

  • GEDI L4A Raster Aboveground Biomass Density: These are point estimates of Above Ground Biomass Density (AGBD) that will be used as ground-truth and the predicted variable in the regression model.
  • Sentinel-2 Surface Reflectance: Sentinel-2 spectral bands and derived spectral indices will be used as independent variables in the regression model. Google’s Cloud Score+ dataset will be used to apply a cloud mask and prepare an annual cloud-free composite.
  • Copernicus GLO-30 DEM: Elevation and slope bands computed from GLO30 DEM will be used in both pre-processing of GEDI data and also as independent variables in the regression model.

GEDI L4A data itself can have large errors in many parts of the world due to the lack of ground-truth data for the underlying models. If you have access to field measurements, you can substitute the GEDI data with them by uploading a shapefile of plot-level measurements and using reduceToImage() to convert it to a raster, as shown in the sketch below.
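
Here is a minimal sketch of that substitution. The asset path and the 'agbd' property name are hypothetical placeholders for your own uploaded data:

// Hypothetical FeatureCollection uploaded from a shapefile of
// plot-level AGBD measurements (path and property are placeholders)
var plots = ee.FeatureCollection('users/yourusername/field_plots');

// Convert the plot measurements to a raster, averaging values
// where multiple plots fall within the same pixel
var plotMosaic = plots.reduceToImage({
  properties: ['agbd'],
  reducer: ee.Reducer.mean()
}).rename('agbd');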

You may also add additional datasets as covariates to further improve the model performance. We have code examples showing how to stack Sentinel-1 SAR bands and Dynamic World landcover probability bands, which may improve the model estimates; a brief sketch for Dynamic World follows.
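
The sketch below stacks Dynamic World class probabilities as extra covariates (a similar approach works for Sentinel-1 SAR bands). It is illustrative only and reuses the geometry, startDate, endDate, and s2Composite variables defined later in this post:

// Mean class probabilities from Dynamic World over the time-period
var dw = ee.ImageCollection('GOOGLE/DYNAMICWORLD/V1')
  .filter(ee.Filter.date(startDate, endDate))
  .filter(ee.Filter.bounds(geometry));
var probabilityBands = [
  'water', 'trees', 'grass', 'flooded_vegetation', 'crops',
  'shrub_and_scrub', 'built', 'bare', 'snow_and_ice'
];
var dwComposite = dw.select(probabilityBands).mean();

// Stack the probability bands alongside the other covariates
var stackedWithDw = s2Composite.addBands(dwComposite);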

Select Region and Period of Interest

We will use a polygon to define the region of interest. The script is set up so that you can simply delete the 'geometry' import and draw a polygon anywhere in the world to use that region. We also define a time-period for our analysis.

// Select a region
// ****************************************************
// Delete the 'geometry' import and draw a polygon
// for your region of interest
Map.centerObject(geometry);

// Select a time-period
// ****************************************************
var startDate = ee.Date.fromYMD(2022, 1, 1);
var endDate = startDate.advance(1, 'year');

Preparing Sentinel-2 composite

We filter the Sentinel-2 collection to images within the area and time-period of interest. We then apply a cloud mask using the Cloud Score+ dataset, which provides state-of-the-art cloud masks for Sentinel-2. We map() functions over the collection to apply the pixel scaling factor to get band reflectances and to compute spectral indices. Once processed, we create a median composite. When we create a composite image, Earth Engine assigns a default projection to the image. Since we will be exporting and resampling the data later on, it is preferable to use setDefaultProjection() to assign the original projection of the source images to the composite.

Figure: Sentinel-2 Cloud-free Composite
// The Sentinel-2 SR collection ('s2' is an import in the
// original script)
var s2 = ee.ImageCollection('COPERNICUS/S2_SR_HARMONIZED');

var filteredS2 = s2
  .filter(ee.Filter.date(startDate, endDate))
  .filter(ee.Filter.bounds(geometry));

// Extract the projection before any processing
var s2Projection = ee.Image(filteredS2.first()).select('B4')
  .projection();
// Function to apply scale factor to convert
// pixel values to reflectances
var scaleBands = function(image) {
  return image.multiply(0.0001)
    .copyProperties(image, ['system:time_start']);
};

// Use Cloud Score+ cloud mask
var csPlus = ee.ImageCollection(
    'GOOGLE/CLOUD_SCORE_PLUS/V1/S2_HARMONIZED');
var csPlusBands = csPlus.first().bandNames();

// Function to mask pixels with low CS+ QA scores.
function maskLowQA(image) {
  var qaBand = 'cs';
  var clearThreshold = 0.5;
  var mask = image.select(qaBand).gte(clearThreshold);
  return image.updateMask(mask);
}


// Function to compute spectral indices
var addIndices = function(image) {
  var ndvi = image.normalizedDifference(['B8', 'B4'])
    .rename('ndvi');

  var mndwi = image.normalizedDifference(['B3', 'B11'])
    .rename('mndwi'); 

  var ndbi = image.normalizedDifference(['B11', 'B8'])
    .rename('ndbi');

  var evi = image.expression(
    '2.5 * ((NIR - RED)/(NIR + 6*RED - 7.5*BLUE + 1))', {
      'NIR': image.select('B8'),
      'RED': image.select('B4'),
      'BLUE': image.select('B2')
    }).rename('evi');

  var bsi = image.expression(
      '((X + Y) - (A + B)) / ((X + Y) + (A + B))', {
        'X': image.select('B11'),
        'Y': image.select('B4'),
        'A': image.select('B8'),
        'B': image.select('B2'),
  }).rename('bsi');
  
  return image
    .addBands(ndvi)
    .addBands(mndwi)
    .addBands(ndbi)
    .addBands(evi)
    .addBands(bsi);
};
// We need to add Cloud Score + bands to each Sentinel-2
// image in the collection
// This is done using the linkCollection() function
var filteredS2WithCs = filteredS2.linkCollection(
    csPlus, csPlusBands);
// Apply all the pre-processing steps
// The order in which the functions are applied is important:
// maskLowQA() needs the 'cs' band, which is dropped by select('B.*')
var s2Processed = filteredS2WithCs
  .map(maskLowQA)
  .select('B.*')
  .map(scaleBands)
  .map(addIndices);

// Create the S2 composite
var s2Composite = s2Processed.median()
  .setDefaultProjection(s2Projection);

Preparing GLO-30 slope and elevation

The GLO-30 DEM comes as an ImageCollection consisting of several images. We filter it for images in our region, create a mosaic, and set its default projection before computing the slope. Finally, we create a stacked image with both slope and elevation bands.

// The GLO-30 collection ('glo30' is an import in the
// original script)
var glo30 = ee.ImageCollection('COPERNICUS/DEM/GLO30');

var glo30Filtered = glo30
  .filter(ee.Filter.bounds(geometry))
  .select('DEM');

// Extract the projection
var demProj = glo30Filtered.first().select(0).projection();

// The dataset consists of individual images
// Create a mosaic and set the projection
var elevation = glo30Filtered.mosaic().rename('dem')
  .setDefaultProjection(demProj);

// Compute the slope
var slope = ee.Terrain.slope(elevation);

// Create an image with slope and elevation bands
var demBands = elevation.addBands(slope);

Preparing GEDI L4A mosaic

It is critical to filter out invalid or unreliable GEDI data before using it. We apply several masks to remove potentially erroneous measurements.

  • Remove all measurements not meeting the quality requirement (l4_quality_flag = 0 or degrade_flag > 0)
  • Remove all measurements with high relative error (agbd_se / agbd > 50%)
  • Remove all measurements on slopes > 30°

Finally we select all remaining measurements for the time-period and region of interest and create a mosaic.

// Function to select highest quality GEDI data
var qualityMask = function(image) {
  return image.updateMask(image.select('l4_quality_flag').eq(1))
      .updateMask(image.select('degrade_flag').eq(0));
};

// Function to mask unreliable GEDI measurements
// with a relative standard error > 50% 
// agbd_se / agbd > 0.5
var errorMask = function(image) {
  var relative_se = image.select('agbd_se')
    .divide(image.select('agbd'));
  return image.updateMask(relative_se.lte(0.5));
};

// Function to mask GEDI measurements on slopes > 30°
var slopeMask = function(image) {
  return image.updateMask(slope.lt(30));
};

// The GEDI L4A collection ('gedi' is an import in the
// original script)
var gedi = ee.ImageCollection('LARSE/GEDI/GEDI04_A_002_MONTHLY');

var gediFiltered = gedi
  .filter(ee.Filter.date(startDate, endDate))
  .filter(ee.Filter.bounds(geometry));

var gediProjection = ee.Image(gediFiltered.first())
  .select('agbd').projection();

var gediProcessed = gediFiltered
  .map(qualityMask)
  .map(errorMask)
  .map(slopeMask);

var gediMosaic = gediProcessed.mosaic()
  .select('agbd').setDefaultProjection(gediProjection);
  

Clip and Export All Images

Once the data preparation is done, we export all the resulting images as Assets. This is a crucial step in the workflow that ensures we can process large datasets without running into 'user memory limit exceeded' or 'computation timed out' errors. It also helps greatly in the model-building step: since the inputs are pre-computed, we can interactively explore the model outputs.

// Replace this with your asset folder
// The folder must exist before exporting
var exportPath = 'users/ujavalgandhi/public/blog/';

Export.image.toAsset({
  image: s2Composite.clip(geometry),
  description: 'S2_Composite_Export',
  assetId: exportPath + 's2_composite',
  region: geometry,
  scale: 10,
  maxPixels: 1e10
});

Export.image.toAsset({
  image: demBands.clip(geometry),
  description: 'DEM_Bands_Export',
  assetId: exportPath + 'dem_bands',
  region: geometry,
  scale: 30,
  maxPixels: 1e10
});

Export.image.toAsset({
  image: gediMosaic.clip(geometry),
  description: 'GEDI_Mosaic_Export',
  assetId: exportPath + 'gedi_mosaic',
  region: geometry,
  scale: 25,
  maxPixels: 1e10
});

Once the Asset export finishes, we move to the next part.

Part-2. Building a Regression Model

Import Assets

For Part-2, we start by accessing the assets that were exported in Part-1.

// Replace this with your asset folder used in Part-1
var exportPath = 'users/ujavalgandhi/public/blog/';

// Construct the path to the exported images
var s2Composite = ee.Image(exportPath + 's2_composite');
var demBands = ee.Image(exportPath + 'dem_bands');
var gediMosaic = ee.Image(exportPath + 'gedi_mosaic');

// Region of interest ('geometry' is an import in the original
// script; here we derive it from the exported composite)
var geometry = s2Composite.geometry();

Resample Data to a Grid

GEDI measurements have a horizontal accuracy of +/- 9 m [reference]. This is problematic when matching the GEDI AGBD values to individual Sentinel-2 pixels. To overcome this, we resample and aggregate all input images to a larger pixel grid, taking the mean of the original pixels. This also helps remove noise from the data and leads to a better machine-learning model.

// Choose the grid size and projection
var gridScale = 100;
var gridProjection = ee.Projection('EPSG:3857')
  .atScale(gridScale);

// Create a stacked image
// We assemble a composite with all the bands
var stacked = s2Composite
  .addBands(demBands)
  .addBands(gediMosaic);

// Set the resampling mode and aggregate pixels
// with 'mean' statistics
var stackedResampled = stacked
  .resample('bilinear')
  .reduceResolution({
    reducer: ee.Reducer.mean(),
    maxPixels: 1024
  })
  .reproject({
    crs: gridProjection
  });

// Since the larger grid cells contain masked original
// pixels, the result has a partially transparent mask.
// We update the mask to remove the transparency.
stackedResampled = stackedResampled
  .updateMask(stackedResampled.mask().gt(0));

Extract Training Features

We have our input data ready for extracting training features. We use the Sentinel-2 bands and indices along with elevation and slope as the independent variables (the predictors), and the GEDI AGBD values as the dependent variable (the predicted value) in the regression model. We can extract the coincident values at each pixel and prepare our training dataset. Our GEDI image is mostly masked and contains values at only a small subset of pixels. If we used sample(), it would return mostly empty values. To overcome this, we create a class band from the GEDI mask and use stratifiedSample() to ensure we sample only from the non-masked pixels.

Figure: Training features extracted by random sub-sampling
var predictors = s2Composite.bandNames().cat(demBands.bandNames());
var predicted = gediMosaic.bandNames().get(0);
print('predictors', predictors);
print('predicted', predicted);

var predictorImage = stackedResampled.select(predictors);
var predictedImage = stackedResampled.select([predicted]);

var classMask = predictedImage.mask().toInt().rename('class');
var numSamples = 1000;

// We set classPoints to [0, numSamples]
// This will give us 0 points for class 0 (masked areas)
// and numSample points for class 1 (non-masked areas)
var training = stackedResampled.addBands(classMask)
  .stratifiedSample({
    numPoints: numSamples,
    classBand: 'class',
    region: geometry,
    scale: gridScale,
    classValues: [0, 1],
    classPoints: [0, numSamples],  
    dropNulls: true,
    tileScale: 16
});

print('Number of Features Extracted', training.size());
print('Sample Training Feature', training.first());

Train a Regression Model

We are now ready to train the model. Many classifiers in Earth Engine can be used for both classification and regression tasks. Since we want to predict a numeric value (instead of a class), we set the classifier to run in REGRESSION mode and train it using the training data. Once the model is trained, we can compare the model’s predictions against the input values and compute the Root-Mean-Square Error (RMSE) and the coefficient of determination (r²) to check the model’s performance.

You should not split your samples into training and validation partitions, as is commonly done for classification models. Instead, use all the available data to build the regression model and assess the model accuracy using RMSE. See the discussion to learn more.

To improve the model’s performance and decrease the RMSE, you can try Hyperparameter Tuning and Feature Scaling. A sketch of feature scaling follows, and a hyperparameter tuning sketch appears after the model evaluation code below.
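
Below is a minimal sketch of feature scaling, not part of the original scripts; minMax, scaledImages, and scaledImage are illustrative names. It normalizes each predictor band to [0, 1] using its min/max over the region. The scaled image could then be used in place of stackedResampled when sampling and predicting:

// Compute the min/max of each predictor band over the region
var minMax = stackedResampled.select(predictors).reduceRegion({
  reducer: ee.Reducer.minMax(),
  geometry: geometry,
  scale: gridScale,
  maxPixels: 1e10,
  tileScale: 16
});

// Scale each band to [0, 1] using the computed min/max values
var scaledImages = predictors.map(function(band) {
  band = ee.String(band);
  var min = ee.Number(minMax.get(band.cat('_min')));
  var max = ee.Number(minMax.get(band.cat('_max')));
  return stackedResampled.select([band]).unitScale(min, max);
});
var scaledImage = ee.ImageCollection.fromImages(scaledImages)
  .toBands().rename(predictors);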

// Use the RandomForest classifier and set the 
// output mode to REGRESSION
var model = ee.Classifier.smileRandomForest(50)
  .setOutputMode('REGRESSION')
  .train({
    features: training,
    classProperty: predicted,
    inputProperties: predictors
  });

// Get model predictions for training samples
var predicted = training.classify({
  classifier: model,
  outputName: 'agbd_predicted'
});
// Calculate RMSE
var calculateRmse = function(input) {
    var observed = ee.Array(
      input.aggregate_array('agbd'));
    var predicted = ee.Array(
      input.aggregate_array('agbd_predicted'));
    var rmse = observed.subtract(predicted).pow(2)
      .reduce('mean', [0]).sqrt().get([0]);
    return rmse;
};
var rmse = calculateRmse(predicted);
print('RMSE', rmse);
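
The chart below reports r² via its trendline; you can also compute the correlation directly. Here is a minimal sketch using ee.Reducer.pearsonsCorrelation() (this step is not in the original script):

// Compute Pearson's correlation coefficient (r) between
// observed and predicted values; r^2 is its square
var correlation = predicted.reduceColumns({
  reducer: ee.Reducer.pearsonsCorrelation(),
  selectors: ['agbd', 'agbd_predicted']
});
print('Correlation (r)', correlation.get('correlation'));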

// Create a plot of observed vs. predicted values
var chart = ui.Chart.feature.byFeature({
  features: predicted.select(['agbd', 'agbd_predicted']),
  xProperty: 'agbd',
  yProperties: ['agbd_predicted'],
}).setChartType('ScatterChart')
  .setOptions({
    title: 'Aboveground Biomass Density (Mg/Ha)',
    dataOpacity: 0.8,
    hAxis: {'title': 'Observed'},
    vAxis: {'title': 'Predicted'},
    legend: {position: 'right'},
    series: {
      0: {
        visibleInLegend: false,
        color: '#525252',
        pointSize: 3,
        pointShape: 'triangle',
      },
    },
    trendlines: {
      0: {
        type: 'linear', 
        color: 'black', 
        lineWidth: 1,
        pointSize: 0,
        labelInLegend: 'Linear Fit',
        visibleInLegend: true,
        showR2: true
      }
    },
    chartArea: {left: 100, bottom:100, width:'50%'},

});
print(chart);
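
As noted earlier, hyperparameter tuning may help lower the RMSE. The following is a minimal sketch, not part of the original scripts, that varies the number of trees and compares the resulting training RMSE; numTreesList and tuningRmse are illustrative names.

// Try different values for the number of trees and
// compute the RMSE of each trained model
var numTreesList = ee.List.sequence(10, 100, 10);
var tuningRmse = numTreesList.map(function(numTrees) {
  var tunedModel = ee.Classifier.smileRandomForest(numTrees)
    .setOutputMode('REGRESSION')
    .train({
      features: training,
      classProperty: 'agbd',
      inputProperties: predictors
    });
  var tunedPredictions = training.classify({
    classifier: tunedModel,
    outputName: 'agbd_predicted'
  });
  return calculateRmse(tunedPredictions);
});
print('RMSE for 10-100 trees', tuningRmse);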

Generate Predictions for Unknown Values

Once we are happy with the model, we can use the trained model to generate predictions at unknown locations from the image containing predictor bands.

var predictedImage = stackedResampled.classify({
  classifier: model,
  outputName: 'agbd'
});

Export the image with predicted values

The image containing predicted AGBD values at each pixel is now ready for export. We will use this in the next part to compute total biomass in the region.

// Replace this with your asset folder
// The folder must exist before exporting
var exportPath = 'users/ujavalgandhi/public/blog/';

Export.image.toAsset({
  image: predictedImage.clip(geometry),
  description: 'Predicted_Image_Export',
  assetId: exportPath + 'predicted_agbd',
  region: geometry,
  scale: gridScale,
  maxPixels: 1e10
});

Part-3. Estimation of Total Biomass

We now have predicted AGBD values at each pixel of the image, which can be used to estimate the total Aboveground Biomass (AGB) stock in the region.

Import Assets

For Part-3, we start by accessing the assets that were exported in Part-2.

// Replace this with your asset folder used in Part-2
var exportPath = 'users/ujavalgandhi/public/blog/';

// Construct the path to the exported images
var s2Composite = ee.Image(exportPath + 's2_composite');
var predictedImage = ee.Image(exportPath + 'predicted_agbd');

// Re-create the grid parameters and region used in Part-2
var gridScale = 100;
var gridProjection = ee.Projection('EPSG:3857')
  .atScale(gridScale);
var geometry = s2Composite.geometry();

Mask Non-Vegetated Pixels

We must first remove all pixels belonging to non-vegetated areas. We can use the ESA WorldCover landcover dataset to select only the vegetated pixels.

var worldcover = ee.ImageCollection('ESA/WorldCover/v200').first();

// Aggregate pixels to the same grid as other dataset
// with 'mode' value. 
// i.e. The landcover with highest occurrence within the grid
var worldcoverResampled = worldcover
  .reduceResolution({
    reducer: ee.Reducer.mode(),
    maxPixels: 1024
  })
  .reproject({
    crs: gridProjection
});
// Select grids for the following classes
// | Class Name | Value |
// | Tree cover | 10    |
// | Shrubland  | 20    |
// | Grassland  | 30    |
// | Cropland   | 40    |
// | Mangroves  | 95    |
var landCoverMask = worldcoverResampled.eq(10)
    .or(worldcoverResampled.eq(20))
    .or(worldcoverResampled.eq(30))
    .or(worldcoverResampled.eq(40))
    .or(worldcoverResampled.eq(95));

var predictedImageMasked = predictedImage
  .updateMask(landCoverMask);

Calculate Total AGB

GEDI AGBD values are in units of megagrams per hectare (Mg/ha). To get the total AGB, we multiply each pixel by its area in hectares and then sum the values.

var pixelAreaHa = ee.Image.pixelArea().divide(10000);
var predictedAgb = predictedImageMasked.multiply(pixelAreaHa);

var stats = predictedAgb.reduceRegion({
  reducer: ee.Reducer.sum(),
  geometry: geometry,
  scale: gridScale,
  maxPixels: 1e10,
  tileScale: 16
});

// Result is a dictionary with key for each band
var totalAgb = stats.getNumber('agbd');

print('Total AGB (Mg)', totalAgb);

Access the Complete Scripts

The complete scripts for all the parts are linked below. The code is licensed under the open-source MIT license.

The assets used in the code are public and you can run each part to see the results. If you want to run the code on your own region, change the ‘geometry’ in Part-1 and change the Export folder to your own folder. Once the Exports finish running, you can run the next part.

If you have feedback on the code or methodology, please leave a comment below.

References

  1. Kellner, J. R., Armston, J., & Duncanson, L. (2023). Algorithm theoretical basis document for GEDI footprint aboveground biomass density. Earth and Space Science, 10, e2022EA002516. https://doi.org/10.1029/2022EA002516
  2. Shendryk, Y. (2022). Fusing GEDI with earth observation data for large area aboveground biomass mapping. International Journal of Applied Earth Observation and Geoinformation, 115, 103108. https://doi.org/10.1016/j.jag.2022.103108
