Managing Earth Engine Assets using the GEE Python API

If you are like me, you have a lot of assets uploaded to Earth Engine. As you upload more and more assets, managing this data becomes quite a cumbersome task. Earth Engine provides a handy Command-Line Tool that helps with asset management. While the command-line tool is very useful, it falls short when it comes to bulk data management tasks.

What if you want to rename an ImageCollection? You will need to manually move each child image to a new collection. If you want to delete assets matching certain keywords, you'll need to write a custom shell script. If you are running low on your asset quota and want to delete large assets, there is no direct way to list them by size. Fortunately, the Earth Engine Python Client API comes with a handy ee.data module that we can leverage to write custom scripts. In this post, I will cover the following use cases with full Python scripts that anyone can use to manage their assets:

  • How to get a list of all your assets (including folders/sub-folders/collections)
  • How to find the quota consumed by each asset and find large assets
  • How to rename ImageCollections

The post explains each use-case with code snippets. If you want to just grab the scripts, they are linked at the end of the post.

Installing the Python Client Library

I recommend using Anaconda to create a new environment and install the earthengine-api library. We have detailed step-by-step instructions to install and configure your environment. Once installed, open a Terminal/Anaconda Prompt and activate the environment where the library is installed.

The Python scripts in this post should be run from a terminal with this environment activated. Before running any of the scripts, you must also complete a one-time authentication by running the earthengine authenticate command.

Listing All Your Assets

Let’s say you have a lot of assets in your Assets folder and you want a list of all assets. This can get challenging as you can have Folders containing other folders containing collections of assets.

Earth Engine Asset Manager with Nested Folders

To get a list of all images and tables, we need to recursively query the folders and ImageCollections to obtain the full list. The script below uses the ee.data.listAssets() method to get the child assets of a folder and calls itself recursively until it has listed the contents of all sub-folders and ImageCollections.

import argparse
import ee

parser = argparse.ArgumentParser()
parser.add_argument('--asset_folder', help='full path to the asset folder')
args = parser.parse_args()
parent = args.asset_folder

ee.Initialize()


def get_asset_list(parent):
    # Look up the parent to get its full asset name
    parent_asset = ee.data.getAsset(parent)
    parent_id = parent_asset['name']
    asset_list = []
    # Use .get() so a response without an 'assets' key yields an empty list
    child_assets = ee.data.listAssets({'parent': parent_id}).get('assets', [])
    for child_asset in child_assets:
        child_id = child_asset['name']
        child_type = child_asset['type']
        if child_type in ['FOLDER','IMAGE_COLLECTION']:
            # Recursively call the function to get child assets
            asset_list.extend(get_asset_list(child_id))
        else:
            asset_list.append(child_id)
    return asset_list
    
all_assets = get_asset_list(parent)

print('Found {} assets'.format(len(all_assets)))

for asset in all_assets:
    print(asset)

We save this script as list_all_assets.py and run it from the terminal. Make sure to first activate the environment where the earthengine-api is installed. You can use the --asset_folder flag to specify either your asset root folder or any sub-folder/collection within your root folder.

python list_all_assets.py \
  --asset_folder projects/earthengine-legacy/assets/users/ujavalgandhi
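If you want to try out the recursion before pointing it at your real assets, the sketch below separates the traversal from the Earth Engine call. It is a minimal illustration, not part of the original scripts: walk_assets, fake_list_children and the FAKE_TREE asset names are all hypothetical, and the fake tree simply stands in for what ee.data.listAssets() would return.

```python
from typing import Callable, Dict, List

# Asset types that can contain other assets.
CONTAINER_TYPES = {'FOLDER', 'IMAGE_COLLECTION'}


def walk_assets(parent: str,
                list_children: Callable[[str], List[Dict[str, str]]]) -> List[str]:
    """Return the names of all non-container assets under `parent`."""
    leaves = []
    for child in list_children(parent):
        if child['type'] in CONTAINER_TYPES:
            # Descend into folders and collections
            leaves.extend(walk_assets(child['name'], list_children))
        else:
            leaves.append(child['name'])
    return leaves


# A made-up asset tree standing in for Earth Engine (hypothetical names).
FAKE_TREE = {
    'root': [
        {'name': 'root/table1', 'type': 'TABLE'},
        {'name': 'root/sub', 'type': 'FOLDER'},
    ],
    'root/sub': [
        {'name': 'root/sub/col', 'type': 'IMAGE_COLLECTION'},
    ],
    'root/sub/col': [
        {'name': 'root/sub/col/img1', 'type': 'IMAGE'},
        {'name': 'root/sub/col/img2', 'type': 'IMAGE'},
    ],
}


def fake_list_children(parent):
    """Look up children in the fake tree; unknown parents have none."""
    return FAKE_TREE.get(parent, [])


if __name__ == '__main__':
    for name in walk_assets('root', fake_list_children):
        print(name)
```

To run the same traversal against Earth Engine, you would pass a function that wraps ee.data.listAssets() in place of fake_list_children, for example lambda p: ee.data.listAssets({'parent': p}).get('assets', []).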

Find Large Assets Consuming Quota

Now that we know how to get a list of all the assets, we can build on the script to get information about each asset’s size and locate large assets consuming quota. This is helpful if you want to free up some quota by deleting large assets. We make use of the ee.data.getAsset() method which queries for asset metadata.

import argparse
import ee
import csv

parser = argparse.ArgumentParser(
    usage='python asset_size.py --asset_folder <path to asset folder> '
          '--output_file <output file>')
parser.add_argument('--asset_folder', help='full path to the asset folder')
parser.add_argument('--output_file', help='output file to write')

args = parser.parse_args()
parent = args.asset_folder

ee.Initialize()

def get_asset_list(parent):
    # Look up the parent to get its full asset name
    parent_asset = ee.data.getAsset(parent)
    parent_id = parent_asset['name']
    asset_list = []
    # Use .get() so a response without an 'assets' key yields an empty list
    child_assets = ee.data.listAssets({'parent': parent_id}).get('assets', [])
    for child_asset in child_assets:
        child_id = child_asset['name']
        child_type = child_asset['type']
        if child_type in ['FOLDER','IMAGE_COLLECTION']:
            # Recursively call the function to get child assets
            asset_list.extend(get_asset_list(child_id))
        else:
            asset_list.append(child_id)
    return asset_list
    
all_assets = get_asset_list(parent)

print('Found {} assets'.format(len(all_assets)))

data = []

for asset in all_assets:
    print('Processing {}'.format(asset))
    info = ee.data.getAsset(asset)
    asset_type = info['type']
    size = info['sizeBytes']
    size_mb = round(int(size)/1e6, 2)
    data.append({
        'asset': asset, 
        'type': asset_type,
        'size_mb': size_mb
    })
    

# Sort the assets by size
sorted_data = sorted(data, key=lambda d: d['size_mb'], reverse=True)

# Write the data to a file
fieldnames = ['asset', 'type', 'size_mb']
# newline='' lets the csv module control line endings (avoids blank rows on Windows)
with open(args.output_file, mode='w', newline='') as output_file:
    csv_writer = csv.DictWriter(output_file, fieldnames=fieldnames)
    csv_writer.writeheader()
    for row in sorted_data:
        csv_writer.writerow(row)
        
print('Successfully written output file at {}'.format(args.output_file))

We save this file as asset_size.py and run it from a terminal as shown below.

python asset_size.py \
--asset_folder projects/earthengine-legacy/assets/users/ujavalgandhi/temp \
--output_file assetdata.csv

This generates a CSV file with one row per asset, sorted so that the largest assets are at the top. You can then use the earthengine rm command or write a script using the ee.data.deleteAsset() method to delete the offending assets.
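As a starting point for such a cleanup script, here is a minimal sketch that reads a CSV in the layout written by asset_size.py and picks out the assets above a size threshold. The function name large_assets, the 400 MB threshold and the sample rows are made up for illustration. The sketch only prints what it would delete, so you can review the list before wiring in ee.data.deleteAsset().

```python
import csv
import io


def large_assets(csv_file, min_size_mb):
    """Return (asset, size_mb) pairs at or above `min_size_mb`, largest first."""
    reader = csv.DictReader(csv_file)
    rows = [(row['asset'], float(row['size_mb'])) for row in reader]
    selected = [r for r in rows if r[1] >= min_size_mb]
    return sorted(selected, key=lambda r: r[1], reverse=True)


# Made-up sample using the same columns as the CSV written by asset_size.py.
SAMPLE_CSV = """asset,type,size_mb
users/example/big_image,IMAGE,2500.5
users/example/boundaries,TABLE,12.25
users/example/medium_image,IMAGE,480.0
"""

if __name__ == '__main__':
    for asset, size_mb in large_assets(io.StringIO(SAMPLE_CSV), min_size_mb=400):
        # Dry run: print only. After reviewing the list, this is where a
        # call to ee.data.deleteAsset(asset) would go.
        print('Would delete {} ({} MB)'.format(asset, size_mb))
```

With a real file, you would pass open('assetdata.csv', newline='') instead of the io.StringIO sample.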

Renaming ImageCollections

There is no easy way to rename ImageCollections in Earth Engine. If you try to use the earthengine mv command to rename a collection, you will get an error message such as “Before trying to move an asset, delete its children”. Instead, we can leverage the Python API and use ee.data.copyAsset() to copy the images from one collection to the other and then ee.data.deleteAsset() to delete the old ones. The script below provides the implementation of this logic.

import argparse
import ee

parser = argparse.ArgumentParser()
parser.add_argument('--old_collection', help='old collection')
parser.add_argument('--new_collection', help='new collection')
# Note: argparse.BooleanOptionalAction requires Python 3.9 or newer
parser.add_argument('--delete', help='delete old collection',
    action=argparse.BooleanOptionalAction)

args = parser.parse_args()

old_collection = args.old_collection
new_collection = args.new_collection

ee.Initialize()

# Check if the new collection exists by requesting its metadata
try:
    ee.data.getAsset(new_collection)
except ee.EEException:
    # The lookup failed, so assume the collection is missing and create it
    print('Collection {} does not exist'.format(new_collection))
    ee.data.createAsset({'type': ee.data.ASSET_TYPE_IMAGE_COLL}, new_collection)
    print('Created a new empty collection {}.'.format(new_collection))
    

assets = ee.data.listAssets({'parent': old_collection})['assets']


for asset in assets:
    old_name = asset['name']
    new_name = old_name.replace(old_collection, new_collection)
    print('Copying {} to {}'.format(old_name, new_name))
    ee.data.copyAsset(old_name, new_name, True)
    if args.delete:
        print('Deleting <{}>'.format(old_name))
        ee.data.deleteAsset(old_name)

if args.delete:
    print('Deleting Collection <{}>'.format(old_collection))
    ee.data.deleteAsset(old_collection)

We save the script as rename_collection.py and use it to rename a collection containing several child assets. If you want to rename col1 to col2, run the script with the following flags.

python rename_collection.py \
--old_collection projects/earthengine-legacy/assets/users/ujavalgandhi/col1 \
--new_collection projects/earthengine-legacy/assets/users/ujavalgandhi/col2

This will create a new collection col2 and copy over all child images to the new collection. This operation doesn’t delete the old collection. You can use the --delete option to delete the old collection after a successful copy.

Warning!
Please use the --delete option with care. The script is not tested for all possible edge cases.

python rename_collection.py \
--old_collection projects/earthengine-legacy/assets/users/ujavalgandhi/col1 \
--new_collection projects/earthengine-legacy/assets/users/ujavalgandhi/col2 \
--delete
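If you would like a more cautious variant, one option is to copy every image first, check that all the copies exist, and only then delete the originals. The sketch below illustrates that ordering with an in-memory set standing in for Earth Engine. It is not the script above: rename_plan, safe_rename and the copy, exists and delete callbacks are hypothetical names, and in real use you would pass functions built on ee.data.copyAsset(), ee.data.getAsset() and ee.data.deleteAsset().

```python
def rename_plan(image_names, new_collection):
    """Map each old image name to its name inside `new_collection`."""
    return {name: new_collection + '/' + name.split('/')[-1]
            for name in image_names}


def safe_rename(image_names, new_collection, copy, exists, delete):
    """Copy every image, verify all copies exist, and only then delete."""
    plan = rename_plan(image_names, new_collection)
    for old_name, new_name in plan.items():
        copy(old_name, new_name)
    # Stop before deleting anything if a copy is missing
    missing = [new for new in plan.values() if not exists(new)]
    if missing:
        raise RuntimeError('Copies missing, nothing deleted: {}'.format(missing))
    for old_name in plan:
        delete(old_name)
    return plan


if __name__ == '__main__':
    # An in-memory set of asset names standing in for Earth Engine (made up).
    store = {'users/example/col1/a', 'users/example/col1/b'}
    safe_rename(sorted(store), 'users/example/col2',
                copy=lambda old, new: store.add(new),
                exists=lambda name: name in store,
                delete=store.discard)
    print(sorted(store))
```

Because deletion happens only after every copy has been verified, a failure partway through leaves the old collection intact and the script can simply be run again.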

You can get all the scripts covered in this post from the following links

If you are new to Earth Engine and want to master it, check out my course End-to-End Google Earth Engine, which covers both the JavaScript and Python APIs.
