Cache strategies for Google App Engine

Google App Engine (GAE) provides a distributed in-memory cache called Memcache. Because GAE's quotas are quite rigid, it may be necessary to use the cache extensively.

In my example, a web-based photo gallery, a lot of image scaling is performed. For example, when an album is loaded, all images are shown as thumbnails. These thumbnails are generated with the Images service, and sooner rather than later an OverQuotaException is thrown. Image transformation can be resumed a few minutes after an OverQuotaException, but the interruption is still annoying for the user. My application catches the exception and shows a “sorry, over quota” default image.

To reduce the number of image transformations, all transformed data is put into the cache. For this I use a very simple custom API consisting of a CacheProducer, which creates the data, and a CacheService, which checks whether the data is already in the cache or needs to be created.


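import java.io.Serializable;

// Creates the value to cache; called only when the value is not cached yet.
public interface CacheProducer<T> {
    T produce();
}

public interface CacheService {
    // Returns the value cached under the key built from params, producing it on a miss.
    <T> T cache(CacheProducer<T> producer, Serializable... params);
    void invalidate();
}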

The Serializable... params parameter represents the components of the cache key.
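
One way the key components could be combined into a single key object is sketched below. The CacheKeys helper is hypothetical and not part of the API above. It assumes every component implements equals() and hashCode() consistently, which holds for Long, String and enum values.

```java
import java.io.Serializable;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper (an assumption, not part of the API above):
// turns the varargs key components into one key object.
final class CacheKeys {
    private CacheKeys() {}

    // A List compares element by element in equals() and hashCode(),
    // so two calls with equal components yield equal keys.
    // The copy keeps the key stable if the caller later changes the array.
    static List<Serializable> of(Serializable... params) {
        return Arrays.asList(params.clone());
    }
}
```

With such a key, the same photo ID in two different formats ends up in two separate cache entries, while repeated requests for the same combination share one entry.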

Now let’s have a look at how to use this API. First we look at the PhotoDataService (which is used to get the actual image data from the datastore):


public class PhotoDataService {
...
    public byte[] getPhotoData(final Long photoId, final ImageFormat format, final ContentType type) {
        return cacheService.cache(new CacheProducer<byte[]>() {
            public byte[] produce() {
                PhotoData photoData = photoDataRepository.find(photoId);
                byte[] input = photoData.getDataAsArray();
                return imageService.process(input, format, type);
            }
        }, photoId, format, type);
    }
...
}

For this example I removed all code that is not related to caching. What you don’t see here: the CacheService first checks whether the data has already been cached. If not, it calls the produce() method to create the data, puts the result into the cache, and returns it. With this approach, I don’t need to care about the actual cache mechanism, and I reduce clutter in the business logic. The cache API is reusable and works for any other data and key parameters.
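
The check-then-produce flow described above can be sketched roughly as follows. This is a minimal illustration under assumptions, not the implementation used in the gallery: InMemoryCacheService is a hypothetical name, and a ConcurrentHashMap stands in for Memcache so that the sketch runs without the App Engine SDK. The two interfaces are repeated without the public modifier so the block compiles as a single file.

```java
import java.io.Serializable;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Same shape as the interfaces shown earlier, repeated so the sketch compiles on its own.
interface CacheProducer<T> {
    T produce();
}

interface CacheService {
    <T> T cache(CacheProducer<T> producer, Serializable... params);
    void invalidate();
}

// Hypothetical implementation; the map is only a stand-in for Memcache.
class InMemoryCacheService implements CacheService {
    private final Map<List<Serializable>, Object> store = new ConcurrentHashMap<>();

    @Override
    @SuppressWarnings("unchecked")
    public <T> T cache(CacheProducer<T> producer, Serializable... params) {
        // Assumption: the key is simply the list of all key components.
        List<Serializable> key = Arrays.asList(params.clone());
        Object cached = store.get(key);
        if (cached != null) {
            return (T) cached;            // cache hit: the producer is never called
        }
        T produced = producer.produce();  // cache miss: do the expensive work
        if (produced != null) {
            store.put(key, produced);     // null results are not cached
        }
        return produced;
    }

    @Override
    public void invalidate() {
        store.clear();                    // assumption: invalidate drops every entry
    }
}
```

On App Engine, the map lookups would presumably be replaced by calls such as MemcacheService.get and MemcacheService.put. Note that the sketch does not lock around the miss path, so two concurrent requests for the same key may both call produce().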

However, the first run still produces a lot of requests to the Images service, so OverQuotaExceptions are still thrown. Next I’ll explain how to use a background task to prepopulate the cache in order to reduce the exceptions even further.