Cache strategies for Google App Engine

Google App Engine (GAE) provides a distributed in-memory cache called Memcache. Because the platform's quotas are quite strict, it can be necessary to use this cache extensively.

In my example, a web-based photo gallery, a lot of image scaling is performed. For example, when an album is loaded, all images are shown as thumbnails. These thumbnails are generated with the Images service, and sooner rather than later an OverQuotaException is thrown. Although image transformation can be resumed a few minutes after an OverQuotaException, this is still annoying for the user. My application catches the exception and shows a “sorry, over quota” default image.

To reduce the number of image transformations, all transformed data is put into the cache. For this I am using a very simple custom API consisting of a CacheProducer, which creates the data, and a CacheService, which checks whether the data is already in the cache or needs to be created.


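import java.io.Serializable;

// Creates the value on a cache miss.
public interface CacheProducer<T> {
    T produce();
}

// Returns the cached value for the given key components, or produces and caches it.
public interface CacheService {
    <T> T cache(CacheProducer<T> producer, Serializable... params);
    // Invalidates the cached data.
    void invalidate();
}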

The Serializable... params parameter represents the components of the cache key.
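To illustrate how such key components could be combined into a single cache key, here is a minimal sketch. It is not the implementation used in the gallery application: the CacheKeys class and its keyOf method are hypothetical names introduced only for this example, and the string values stand in for the format and type parameters. The idea is simply to copy the components into a serializable list, so that two calls with equal components yield equal keys.

```java
import java.io.Serializable;
import java.util.ArrayList;
import java.util.Arrays;

// Hypothetical helper, not part of the original API.
final class CacheKeys {
    private CacheKeys() {
    }

    // Copies the key components into an ArrayList. ArrayList is Serializable
    // and implements equals()/hashCode() element by element, so equal
    // components in the same order produce equal keys.
    static Serializable keyOf(Serializable... params) {
        return new ArrayList<Serializable>(Arrays.asList(params));
    }
}

class CacheKeysDemo {
    public static void main(String[] args) {
        // Assumed example values: a photo id, a format name and a content type.
        Serializable first = CacheKeys.keyOf(42L, "THUMBNAIL", "image/jpeg");
        Serializable second = CacheKeys.keyOf(42L, "THUMBNAIL", "image/jpeg");
        System.out.println(first.equals(second));
    }
}
```

The order of the components matters with this approach: the same values passed in a different order form a different key.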

Now let’s have a look at how to use this API. First we look at the PhotoDataService (which is used to get the actual image data from the datastore):


public class PhotoDataService {
...
    public byte[] getPhotoData(final Long photoId, final ImageFormat format, final ContentType type) {
        return cacheService.cache(new CacheProducer<byte[]>() {
            // Only called when the data is not in the cache yet.
            public byte[] produce() {
                PhotoData photoData = photoDataRepository.find(photoId);
                byte[] input = photoData.getDataAsArray();
                return imageService.process(input, format, type);
            }
        }, photoId, format, type); // photoId, format and type form the cache key
    }
...
}

For this example I removed all code that is not related to caching. What you don’t see here: the CacheService first checks whether the data is already in the cache. If it is not, it calls the produce() method to create the data, puts the result into the cache, and returns it. With this approach, I don’t need to care about the actual cache mechanism, and I reduce clutter in the business logic. The cache API is reusable and works for any other data and key parameters.
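The lookup-then-produce flow described above could be sketched roughly as follows. This is not the CacheService implementation from the application: InMemoryCacheService is a hypothetical name, and a ConcurrentHashMap stands in for Memcache so that the example runs without the App Engine SDK. The two interfaces are repeated (without the public modifier) only to keep the example self-contained.

```java
import java.io.Serializable;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

interface CacheProducer<T> {
    T produce();
}

interface CacheService {
    <T> T cache(CacheProducer<T> producer, Serializable... params);
    void invalidate();
}

// Hypothetical stand-in: a local map plays the role of Memcache.
class InMemoryCacheService implements CacheService {
    private final Map<List<Serializable>, Object> store =
            new ConcurrentHashMap<List<Serializable>, Object>();

    @SuppressWarnings("unchecked")
    public <T> T cache(CacheProducer<T> producer, Serializable... params) {
        // The key components are combined into one list-valued key.
        List<Serializable> key = new ArrayList<Serializable>(Arrays.asList(params));
        Object cached = store.get(key);
        if (cached != null) {
            return (T) cached; // cache hit: the producer is not called
        }
        T value = producer.produce(); // cache miss: create the data
        if (value != null) {
            store.put(key, value); // ConcurrentHashMap rejects null values
        }
        return value;
    }

    public void invalidate() {
        // Assumption: invalidate() drops every cached entry.
        store.clear();
    }
}

class InMemoryCacheServiceDemo {
    public static void main(String[] args) {
        final int[] calls = {0};
        CacheService service = new InMemoryCacheService();
        CacheProducer<String> producer = new CacheProducer<String>() {
            public String produce() {
                calls[0]++;
                return "thumbnail bytes";
            }
        };
        service.cache(producer, 1L, "SMALL");
        service.cache(producer, 1L, "SMALL");
        System.out.println("produce() calls: " + calls[0]);
    }
}
```

Unlike this local map, a real distributed cache may evict entries at any time, so the producer has to be safe to call again for the same key.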

However, the first run still sends a lot of requests to the Images service, so OverQuotaExceptions can still be thrown at that point. Next, I’ll explain how to use a background task to prepopulate the cache and reduce the number of exceptions even further.

4 thoughts on “Cache strategies for Google App Engine”

  1. Are you generating the thumbnails every time the page with a thumbnail is requested? If so, instead, why not store the thumbnails in the GAE data store? You could generate the thumbnails using the GAE task queue.

  2. That is another good idea. However, I would also like to reduce the number of datastore queries, so I decided on Memcache instead. I’m using a scheduled task to refresh the cache, but haven’t looked at the task queue API yet.

    There are two problems I’m dealing with when generating thumbnails: (1) the Images service goes over quota quite quickly, and (2) the scheduled task I’ve written so far (which uses a brute-force approach) gets killed after 30 seconds. Maybe a task queue is a better solution.

  3. The task queue tasks are also killed at 30 seconds. You will have to break the task into smaller chunks. How about generating the thumbs when they are uploaded?

  4. So far, my application works quite well and there are only a few over quota exceptions, so I haven’t looked at task queues yet…
    Why do I want to have the cache built up lazily? Because I believe I have better control over the thumbnail creation process that way and don’t need to worry about whether a thumbnail was created or not. If I stored thumbnails only in the DB, I would have to handle errors with some custom retry mechanism or a combination of all sorts of methods (including the one described in the post). So I decided to go with the simple solution.
