Tiling Large Exports in Google Earth Engine

ujaval

1 year ago

When exporting large rasters from Google Earth Engine, it is recommended that you split your exports into several smaller tiles. In this post, I will share the best practices for creating tiled exports in your target projection that can be mosaicked together without any pixel gaps or overlaps. The key concept is the use of the crsTransform parameter to ensure that each individual tile is on the same pixel grid.

When exporting global or country-scale images at high-resolution, the job can take hours – and even days. A recommended practice to speed up very large exports is to split the image into smaller tiles and export each tile in a separate task.

Recommended Practice for Very Large Exports

Doing this correctly, however, requires a nuanced understanding of projections in Earth Engine and making sure all projection-related operations are done consistently within GEE. Here is the recommended workflow to split a large image into multiple tiles:

Create a tiled grid in your required CRS using the coveringGrid() function in Earth Engine.
Calculate a CRS Transform that defines the target pixel grid shared by every output tile.
Resample or Aggregate Pixels.
Set a NoData value.
Export each tile with the same CRS Transform.

We will go through an example with code to see how this works. We will take the ESA WorldCover land cover dataset and export image tiles covering the entire country of Estonia.

The EE Python API is preferred for batch exports like these, which require creating and starting multiple export tasks.

Data Prep

We select a country and create a clipped ESA WorldCover 2020 classification for the region. We use geemap to display the image.

import ee
import geemap

# Run ee.Authenticate() once if needed. Newer API versions may require
# ee.Initialize(project='your-cloud-project') instead.
ee.Initialize()

worldcover = ee.ImageCollection("ESA/WorldCover/v100")
lsib = ee.FeatureCollection("USDOS/LSIB_SIMPLE/2017")

# Select the country
country = lsib.filter(ee.Filter.eq('country_na', 'Estonia'))
geometry = country.geometry()

# Select the image for export
image = worldcover.first().clip(geometry)

m = geemap.Map(width=800)
m.addLayer(image, {}, 'Input Image')
m.centerObject(country, 7)
m

Create a Grid

We create a grid and calculate the parameters for the CRS Transform. Each tile of the grid will be exported as a separate image on the chosen pixel grid. If your exported image is larger than 10,000 x 10,000 pixels, Earth Engine will split the output into multiple files. To avoid this, keep the tile size at or below 10,000 x 10,000 pixels. Our input image has a 10 m pixel resolution, so a 10,000-pixel tile spans 100,000 m, and we set the grid size to 100,000 x 100,000 meters.

# Choose the export CRS
crs = 'EPSG:3301'

# Choose the pixel size for export (meters)
pixelSize = 10

# Choose the export tile size (pixels)
tileSize = 10000

# Calculate the grid size (meters)
gridSize = tileSize * pixelSize

# Create the grid covering the geometry bounds
bounds = geometry.bounds(**{
  'proj': crs, 'maxError': 1
})

grid = bounds.coveringGrid(**{
  'proj':crs, 'scale': gridSize
})

m.addLayer(grid, {'color': 'blue'}, 'Grid')
m
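To build intuition for what the covering grid looks like, here is a pure-Python sketch that snaps a bounding box outward to multiples of the grid size and lists the resulting cells. It assumes, as a conceptual model only, that coveringGrid() aligns its cells to multiples of the scale in the target projection; the helper name covering_grid and the bounds below are hypothetical, not taken from Earth Engine.

```python
import math


def covering_grid(xmin, ymin, xmax, ymax, grid_size):
    """Hypothetical helper: return (x0, y0, x1, y1) cells, aligned to
    multiples of grid_size, that together cover the bounding box."""
    col_start = math.floor(xmin / grid_size)
    col_end = math.ceil(xmax / grid_size)
    row_start = math.floor(ymin / grid_size)
    row_end = math.ceil(ymax / grid_size)
    cells = []
    for row in range(row_start, row_end):
        for col in range(col_start, col_end):
            cells.append((col * grid_size, row * grid_size,
                          (col + 1) * grid_size, (row + 1) * grid_size))
    return cells


# Made-up projected bounds (meters) with the 100 km grid used above
cells = covering_grid(370_000, 6_380_000, 740_000, 6_640_000, 100_000)
print(len(cells))  # 5 columns x 4 rows
```

Because every cell edge sits on a multiple of 100,000 m, and 100,000 is a whole number of 10 m pixels, the cell edges can all fall on pixel boundaries of a single pixel grid.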

Calculate the CRS Transform

When a raster is reprojected to another CRS, a target grid of pixels is created and the value of each pixel is computed from the input raster pixels. The output pixel grid is determined by the CRS transform, which specifies the origin of the grid (the coordinates of the top-left corner of the top-left pixel) and the size of each pixel (resolution). GEE accepts a crsTransform parameter in projection-related functions, such as reproject() and the Export functions, to ensure that the calculations happen on the specified pixel grid.

Remember that in Earth Engine you seldom need to worry about the input projection of images. You can mix and combine images of different projections without reprojecting them first. You only specify the output projection when performing calculations or exporting, and GEE internally reprojects all inputs to the specified target projection.
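The six transform values define an affine mapping between pixel indices and projected coordinates. The pure-Python sketch below applies that mapping in both directions, assuming the [xScale, xShearing, xTranslation, yShearing, yScale, yTranslation] order used in this post; the function names and example numbers are hypothetical.

```python
import math


def pixel_to_map(transform, col, row):
    """Projected coordinates of the top-left corner of pixel (col, row)."""
    x_scale, x_shear, x_trans, y_shear, y_scale, y_trans = transform
    x = x_scale * col + x_shear * row + x_trans
    y = y_shear * col + y_scale * row + y_trans
    return x, y


def map_to_pixel(transform, x, y):
    """(col, row) of the pixel containing (x, y); assumes zero shearing."""
    x_scale, _, x_trans, _, y_scale, y_trans = transform
    return (math.floor((x - x_trans) / x_scale),
            math.floor((y - y_trans) / y_scale))


# 10 m pixels, origin at the top-left corner (x=300000, y=6700000).
# yScale is negative because rows increase downwards while y decreases.
demo_transform = [10, 0, 300000, 0, -10, 6700000]
print(pixel_to_map(demo_transform, 10000, 0))
print(map_to_pixel(demo_transform, 400005, 6699995))
```

Any tile exported with this same transform places a given point in the same global pixel, which is why tiles cut along 100 km boundaries line up exactly.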

# Calculate the coordinates of the top-left corner of the grid
bounds = grid.geometry().bounds(**{
  'proj': crs, 'maxError': 1
})

# Extract the coordinates of the grid
coordList = ee.Array.cat(bounds.coordinates(), 1)

xCoords = coordList.slice(1, 0, 1)
yCoords = coordList.slice(1, 1, 2)

# We need the coordinates of the top-left pixel
xMin = xCoords.reduce('min', [0]).get([0,0])
yMax = yCoords.reduce('max', [0]).get([0,0])

# Create the CRS Transform

# The transform consists of 6 parameters:
# [xScale, xShearing, xTranslation, 
#  yShearing, yScale, yTranslation]
transform = ee.List([
    pixelSize, 0, xMin, 0, -pixelSize, yMax]).getInfo()
print(transform)
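The server-side array operations above take the minimum x and maximum y of the grid's bounding ring. The same arithmetic in plain Python is sketched below on a hypothetical ring, together with a check that a grid-cell corner lands exactly on a pixel corner. The names transform_from_bounds_ring and is_on_grid are illustrative, not part of the Earth Engine API.

```python
def transform_from_bounds_ring(ring, pixel_size):
    """Build [xScale, xShearing, xTranslation, yShearing, yScale, yTranslation]
    from the (x, y) corner coordinates of a bounding-box ring."""
    x_min = min(x for x, _ in ring)
    y_max = max(y for _, y in ring)
    return [pixel_size, 0, x_min, 0, -pixel_size, y_max]


def is_on_grid(transform, x, y):
    """True if (x, y) is exactly a pixel corner of the transform's grid."""
    col = (x - transform[2]) / transform[0]
    row = (y - transform[5]) / transform[4]
    return col.is_integer() and row.is_integer()


# Hypothetical closed ring (projected meters) of a grid's bounding box
ring = [[300000, 6300000], [800000, 6300000], [800000, 6700000],
        [300000, 6700000], [300000, 6300000]]
ring_transform = transform_from_bounds_ring(ring, 10)
print(ring_transform)
print(is_on_grid(ring_transform, 400000, 6600000))  # a 100 km cell corner
```

Since each cell corner is a multiple of 100,000 m and the origin is itself a cell corner, every tile boundary is a whole number of 10 m pixels from the origin.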

Resample or Aggregate Pixels

By default, the images are resampled to the target pixel grid using the Nearest Neighbor method. This is fine for most types of images, but you may want to change this behavior for certain types of operations.

When to Resample Pixels?

When the original and the target pixels are of similar size, resampling is appropriate.

For discrete rasters, such as landcover classification, the default nearest neighbor method is appropriate as it preserves the original class values.
For continuous rasters, such as elevation models or climate rasters, you may want to use the bilinear or bicubic resampling methods. This can be done using image.resample('bilinear') or image.resample('bicubic').
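The toy sketch below contrasts the two rules on tiny grids, assuming pixel centres sit at integer (col, row) positions. It only mirrors the arithmetic; Earth Engine performs its own resampling server-side, and these helper names are hypothetical.

```python
import math


def nearest(grid, x, y):
    """Nearest-neighbour sample at fractional (col=x, row=y)."""
    return grid[math.floor(y + 0.5)][math.floor(x + 0.5)]


def bilinear(grid, x, y):
    """Bilinear sample at fractional (col=x, row=y), clamped at the edges."""
    x0, y0 = math.floor(x), math.floor(y)
    x1 = min(x0 + 1, len(grid[0]) - 1)
    y1 = min(y0 + 1, len(grid) - 1)
    fx, fy = x - x0, y - y0
    top = grid[y0][x0] * (1 - fx) + grid[y0][x1] * fx
    bottom = grid[y1][x0] * (1 - fx) + grid[y1][x1] * fx
    return top * (1 - fy) + bottom * fy


landcover = [[10, 10], [50, 80]]   # class codes
dem = [[100, 110], [120, 130]]     # elevation in meters

print(nearest(landcover, 0.25, 0.75))   # a real class code
print(bilinear(landcover, 0.25, 0.75))  # 45.625: not a valid class
print(bilinear(dem, 0.5, 0.5))          # smooth value between neighbours
```

Bilinear averaging produces a meaningful in-between elevation but an invalid land cover code, which is why nearest neighbor is kept for discrete rasters.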

When to Aggregate Pixels?

When the target pixels are much bigger in size than the original pixels, you need to aggregate the input pixels using an appropriate statistical function into larger pixels.

For discrete rasters, aggregate using the mode reducer, i.e. ee.Reducer.mode().
For continuous rasters, aggregate using the mean reducer, i.e. ee.Reducer.mean().
For population rasters, aggregate using the sum reducer, i.e. ee.Reducer.sum().
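The pure-Python sketch below aggregates blocks of fine pixels into coarser ones with the three reducers listed above. In Earth Engine you would typically do this with image.reduceResolution() followed by reproject(); this toy version only mirrors the arithmetic, and the helper names are hypothetical.

```python
from collections import Counter
from statistics import mean


def aggregate(grid, factor, reducer):
    """Reduce each factor x factor block of grid to one value."""
    rows, cols = len(grid), len(grid[0])
    out = []
    for r in range(0, rows, factor):
        out_row = []
        for c in range(0, cols, factor):
            block = [grid[i][j]
                     for i in range(r, min(r + factor, rows))
                     for j in range(c, min(c + factor, cols))]
            out_row.append(reducer(block))
        out.append(out_row)
    return out


def mode(values):
    """Most frequent value in the block (the majority class)."""
    return Counter(values).most_common(1)[0][0]


landcover = [[10, 10, 20, 20], [10, 30, 20, 20],
             [40, 40, 50, 50], [40, 10, 50, 60]]
population = [[1, 2, 0, 0], [3, 4, 0, 1],
              [5, 5, 2, 2], [5, 5, 2, 2]]
temperature = [[1, 3], [5, 7]]

print(aggregate(landcover, 2, mode))    # keeps valid class codes
print(aggregate(population, 2, sum))    # preserves the total count
print(aggregate(temperature, 2, mean))  # average of the block
```

Using sum for population keeps the total headcount unchanged (39 people in both the fine and the coarse grid here), whereas mean would understate it.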

Resampling and aggregation are covered in the GEE User Guide. You can also refer to our post on Aggregating Population Rasters for more details.

In our case, since we have a discrete raster with the same pixel size, we don’t need to do anything for this step.

Set a NoData Value

This is an important step. If you have masked pixels in your image, the output tiles will not be of equal size. To ensure each tile has the same dimensions with no gaps or overlapping pixels, fill all masked pixels using unmask() with a nodata value that does not occur in your data. ESA WorldCover class codes start at 10, so 0 is a safe choice here.

noDataValue = 0
exportImage = image.unmask(**{
    'value':noDataValue,
    'sameFootprint': False
})
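The key constraint when choosing a nodata value is that it must not collide with a real data value. The pure-Python sketch below fills masked pixels (represented as None, an assumption of this sketch) and refuses a nodata value that matches an ESA WorldCover v100 class code. The helper unmask_grid is hypothetical and only mimics what unmask() does server-side.

```python
# ESA WorldCover v100 class codes (10 = tree cover ... 100 = moss and lichen)
WORLDCOVER_CLASSES = {10, 20, 30, 40, 50, 60, 70, 80, 90, 95, 100}


def unmask_grid(grid, nodata, valid_values=WORLDCOVER_CLASSES):
    """Replace masked pixels (None) with nodata, keeping the grid shape."""
    if nodata in valid_values:
        raise ValueError(f"nodata value {nodata} collides with a real class")
    return [[nodata if v is None else v for v in row] for row in grid]


tile = [[None, 10, 10],
        [20, None, 80],
        [None, None, 95]]
filled = unmask_grid(tile, nodata=0)
print(filled)
```

Every pixel now carries a value, so the tile keeps its full extent, and 0 can later be declared as nodata in a GIS without hiding any land cover class.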

Export Tiles

We are now ready to export the images into tiles using the grid layer. We created the tiling grid using the bounding box of the region geometry. This may result in certain grid cells that have no overlap with the region and thus will be empty. We can filter out those empty grid cells before exporting.

# Keep only the grid cells that intersect the country geometry
filtered_grid = grid.filter(ee.Filter.intersects('.geo', geometry))
m.addLayer(
    filtered_grid, {'color': 'red'}, 'Filtered Grid')
m
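Why does a bounding-box grid contain empty cells? The pure-Python sketch below models a hypothetical two-part region (a mainland and a small island) as rectangles, builds a grid over the overall bounding box, and keeps only the cells that intersect a part. It is a coarse stand-in for ee.Filter.intersects(), which tests against the real geometry; treating cells that merely touch a part's edge as non-intersecting is a choice of this sketch.

```python
def intersects(a, b):
    """True if rectangles (x0, y0, x1, y1) overlap with positive area."""
    ax0, ay0, ax1, ay1 = a
    bx0, by0, bx1, by1 = b
    return ax0 < bx1 and bx0 < ax1 and ay0 < by1 and by0 < ay1


# Hypothetical region made of two parts, in projected units
parts = [(0, 0, 150, 150), (320, 320, 380, 380)]
grid_size = 100

# Grid over the overall bounding box (0, 0, 380, 380): 4 x 4 cells
cells = [(c * grid_size, r * grid_size, (c + 1) * grid_size, (r + 1) * grid_size)
         for r in range(4) for c in range(4)]

filtered = [cell for cell in cells
            if any(intersects(cell, part) for part in parts)]
print(len(cells), len(filtered))
```

Only 5 of the 16 cells contain any part of the region, so filtering avoids starting 11 exports that would produce nothing but nodata.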

We can now iterate through each grid feature, obtain its geometry and export the image using the chosen export CRS and CRS Transform. We use the ee.batch.Export API to create and start a task for each tile automatically.

tile_ids = filtered_grid.aggregate_array('system:index').getInfo()
print('Total tiles', len(tile_ids))

# Export each tile
for i, tile_id in enumerate(tile_ids):
    # Fetch the i-th grid cell and use its geometry as the export region.
    # A separate name avoids overwriting the country `geometry` variable.
    feature = ee.Feature(filtered_grid.toList(1, i).get(0))
    tile_geometry = feature.geometry()
    # Grid IDs contain commas; replace them to get a clean file name
    task_name = 'tile_' + tile_id.replace(',', '_')
    task = ee.batch.Export.image.toDrive(**{
        'image': exportImage,
        'description': f'Image_Export_{task_name}',
        'fileNamePrefix': task_name,
        'folder': 'earthengine',
        'crs': crs,
        # The same transform for every tile keeps all tiles on one pixel grid
        'crsTransform': transform,
        'region': tile_geometry,
        'maxPixels': 1e10
    })
    task.start()
    print('Started Task: ', i+1)
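Before starting dozens of tasks, it can help to do a dry run. The sketch below builds the same export parameters as the loop above as plain dictionaries, without calling Earth Engine, so you can confirm that file names are unique and every tile shares one transform. The helper build_export_configs and the tile IDs are hypothetical; the image and region arguments are left out because they are Earth Engine objects.

```python
def build_export_configs(tile_ids, crs, transform, folder='earthengine'):
    """Return one dict of export parameters per tile (no tasks started)."""
    configs = []
    for tile_id in tile_ids:
        task_name = 'tile_' + tile_id.replace(',', '_')
        configs.append({
            'description': f'Image_Export_{task_name}',
            'fileNamePrefix': task_name,
            'folder': folder,
            'crs': crs,
            'crsTransform': transform,
            'maxPixels': 1e10,
        })
    return configs


# Hypothetical grid IDs in the comma-separated style used above
configs = build_export_configs(['3,63', '4,63', '3,64'], 'EPSG:3301',
                               [10, 0, 300000, 0, -10, 6700000])
for cfg in configs:
    print(cfg['description'], cfg['fileNamePrefix'])
```

If the prefixes are unique and the transforms identical, the real loop can be run with confidence that no tile overwrites another or lands on a different grid.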

You can view the complete Google Colab Notebook with a fully reproducible example for splitting a large image into tiles and exporting each tile separately. The resulting images are perfectly aligned with no gaps or pixel overlaps.
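Once the tiles are downloaded, you can sanity-check the alignment yourself. The sketch below assumes you have read each tile's top-left origin and size (for example from the GeoTIFF metadata with GDAL or rasterio); here those values are hard-coded, hypothetical tuples. It checks that every origin is a whole number of pixels from the first tile and that two side-by-side tiles meet with no gap or overlap.

```python
def on_same_grid(tiles, pixel_size):
    """True if every tile origin is a whole number of pixels from the first."""
    ref_x, ref_y = tiles[0]['origin']
    for tile in tiles:
        x, y = tile['origin']
        dx = (x - ref_x) / pixel_size
        dy = (ref_y - y) / pixel_size
        if not (dx.is_integer() and dy.is_integer()):
            return False
    return True


def edges_meet(left, right, pixel_size):
    """True if `right` starts exactly where `left` ends, on the same row."""
    left_x, left_y = left['origin']
    right_x, right_y = right['origin']
    return left_x + left['width'] * pixel_size == right_x and left_y == right_y


# Hypothetical tile metadata: top-left origin (meters) and size (pixels)
tiles = [
    {'origin': (300000, 6700000), 'width': 10000, 'height': 10000},
    {'origin': (400000, 6700000), 'width': 10000, 'height': 10000},
]
misaligned = {'origin': (400005, 6700000), 'width': 10000, 'height': 10000}

print(on_same_grid(tiles, 10), edges_meet(tiles[0], tiles[1], 10))
print(on_same_grid(tiles + [misaligned], 10))
```

A tile exported without the shared crsTransform could end up with an origin like the misaligned example, shifted by a fraction of a pixel, which is exactly the kind of seam this workflow avoids.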

A JavaScript API version of this workflow is also available in this script. Since there is no way to start tasks automatically in the GEE JavaScript API, you will have to start each task manually or use a browser extension like OEEL that allows batch exports.