Migrating blobstore data between App Engine applications using Fantasm.


Introduction

At VendAsta we needed to migrate a number of applications from the Master/Slave datastore to the High-Replication datastore. Migrating the datastore itself is straightforward, and Google provides tools for it. However, Google provides no tool to migrate blobstore data between applications.

This posting describes the approach developed and used by VendAsta.

NOTE: As written, the approach in this posting works properly only before aliasing, while writes are still enabled on both the source and target applications.

Overview

The approach developed at VendAsta uses the open-source Fantasm workflow engine to run state machines (implemented as App Engine tasks) on both the source (Master/Slave) and target (High-Replication) applications. These state machines pass blobstore data between the two applications using the urlfetch API.

Figure 1 gives an overview of the messages passed during a blobstore migration.

A. The process is started by calling /start_migration/ on the source application.

B. /start_migration/ starts a named state machine, Send, on the source application. Send iterates over the source blobs, using a Fantasm continuation to process them in parallel.

C. The Send machine calls /start_pull/ on the target application.

D. /start_pull/ starts a named state machine Pull on the target application to perform the copy.

E. The Pull machine fetches blob data from the source application via the /pull_endpoint/ and saves it on the target application.

Figure 1: Messages passed during blobstore migration.
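Step E moves the blob bytes over HTTP. The posting does not describe the request format of /pull_endpoint/, so the following is only a minimal, hypothetical sketch of one common way to structure such a copy: the target asks for the blob in fixed-size, inclusive byte ranges and reassembles them. Inclusive ranges match the convention of blobstore.fetch_data(blob, start_index, end_index), which caps each read at blobstore.MAX_BLOB_FETCH_SIZE bytes. The names byte_ranges, pull_blob and fetch_range are illustrative and are not part of the tool.

```python
def byte_ranges(blob_size, chunk_size):
    """Yield inclusive (start, end) byte ranges covering blob_size bytes."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    start = 0
    while start < blob_size:
        # Last byte of this chunk, clipped to the end of the blob.
        end = min(start + chunk_size, blob_size) - 1
        yield (start, end)
        start = end + 1


def pull_blob(blob_size, fetch_range, chunk_size):
    """Reassemble a blob by fetching it one inclusive byte range at a time.

    fetch_range(start, end) is a hypothetical callable standing in for an
    HTTP request to the source application's /pull_endpoint/.
    """
    chunks = []
    for start, end in byte_ranges(blob_size, chunk_size):
        data = fetch_range(start, end)
        # Refuse to silently write a truncated copy on the target.
        if len(data) != end - start + 1:
            raise IOError("short read for bytes %d-%d" % (start, end))
        chunks.append(data)
    return b"".join(chunks)
```

Keeping each request small means a single failed fetch can be retried on its own, instead of restarting the whole blob.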

Named tasks are used throughout to ensure idempotency. For instance, a blobstore migration can only be started once, and a specific blob will only be copied once.
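The posting does not show how the tool names its tasks. The sketch below assumes one plausible scheme: derive each task name deterministically from the blob key, so enqueuing the same blob twice collides on the name. On App Engine, adding a task whose name is already in use raises taskqueue.TaskAlreadyExistsError (or taskqueue.TombstonedTaskError once that task has run). FakeQueue and TaskAlreadyExists are hypothetical in-memory stand-ins that let the idea run outside App Engine. The key is hashed because task names may contain only letters, digits, hyphens and underscores.

```python
import hashlib


class TaskAlreadyExists(Exception):
    """Hypothetical stand-in for taskqueue.TaskAlreadyExistsError."""


class FakeQueue(object):
    """In-memory queue that, like App Engine, refuses a reused task name."""

    def __init__(self):
        self.names = set()
        self.tasks = []

    def add(self, name, payload):
        if name in self.names:
            raise TaskAlreadyExists(name)
        self.names.add(name)
        self.tasks.append((name, payload))


def pull_task_name(blob_key):
    """Derive a deterministic, charset-safe task name from a blob key."""
    digest = hashlib.sha1(blob_key.encode("utf-8")).hexdigest()
    return "pull-" + digest


def enqueue_pull(queue, blob_key):
    """Enqueue a copy of blob_key; return False if it was already enqueued."""
    try:
        queue.add(pull_task_name(blob_key), {"blobKey": blob_key})
        return True
    except TaskAlreadyExists:
        # Same name means the same blob: the duplicate is safely ignored.
        return False
```

Because the name depends only on the blob key, a retried Send step that re-enqueues a blob is rejected rather than producing a second copy.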

Downloading

Python code for this migration tool can be downloaded from Fantasm’s Google Code SVN repository:

> svn export https://fantasm.googlecode.com/svn/trunk fantasm
> cd fantasm/applications/blobstore_migration

Installing

These instructions assume you want to copy blobstore data from the Master/Slave application sourceapp.appspot.com to the HRD application targetapp-hrd.appspot.com.

Deploy the code to both the source and target applications: set the application ID in app.yaml, then run appcfg.py update, once for each application.

> head -n 2 app.yaml
application: sourceapp
version: blobstore-migration
> appcfg.py update .

> head -n 2 app.yaml
application: targetapp-hrd
version: blobstore-migration
> appcfg.py update .
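Switching the application line by hand between the two deploys is easy to get wrong. As a small, hypothetical convenience (set_application is not part of the tool), the edit can be scripted. This sketch assumes app.yaml has a single top-level application: line, as in the listing above.

```python
import re

# Matches a top-level "application:" line (no leading indentation).
_APPLICATION_LINE = re.compile(r"^application:.*$", re.MULTILINE)


def set_application(app_yaml_text, app_id):
    """Return app.yaml text with its top-level 'application:' value replaced."""
    if not _APPLICATION_LINE.search(app_yaml_text):
        raise ValueError("no top-level 'application:' line found")
    # A function replacement avoids backslash-escape processing of app_id.
    return _APPLICATION_LINE.sub(
        lambda match: "application: " + app_id, app_yaml_text, count=1)
```

You would write the returned text back to app.yaml before each appcfg.py update . run, once with sourceapp and once with targetapp-hrd, leaving the version: blobstore-migration line untouched.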

Running

The blobstore migration code runs as a separate version, with its own URL, on each App Engine application. To kick off a migration, open the following URL on the source application:

http://blobstore-migration.sourceapp.appspot.com/blobstore_migration/start_migration/

Figure 2: User interface to start a blobstore migration.
  1. Edit the “FIXME” in the Target URL to point to your targetapp-hrd application.
  2. If you need to re-start a migration for any reason, supply a new Task Name.
  3. Click “Submit”.

You will receive a “Success!!” message if the migration was successfully started.
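The start URL shown above follows App Engine’s version-specific hostname pattern, http://<version>.<app-id>.appspot.com/. The helper below is a hypothetical illustration of that pattern. The exact path the form expects in its Target URL field is not given here, so for the target application the sketch only builds the host.

```python
def version_url(app_id, version, path="/"):
    """Build the version-specific appspot.com URL for an application."""
    if not path.startswith("/"):
        path = "/" + path
    return "http://%s.%s.appspot.com%s" % (version, app_id, path)
```

For example, version_url("sourceapp", "blobstore-migration", "/blobstore_migration/start_migration/") reproduces the start URL, and version_url("targetapp-hrd", "blobstore-migration") gives the target version’s host.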

Migrating Data Models

The Pull state machine provides a hook method, migrate(), that runs on the target application and lets you update data models that reference the source blob keys.

def migrate(self, sourceBlobKey, targetBlobKey):
     """ Contains user code to migrate data

     @param sourceBlobKey: a string blob key in the source app
     @param targetBlobKey: a string blob key in the target app
     """
     pass

This method is defined in the file blobstore_migration_fsm.py, and you can add any user code you like to it. If your code raises a recoverable error (e.g. DeadlineExceeded), the infrastructure ensures that subsequent retries of the Pull machine pass the same blob keys to the method, so you can update your references idempotently.
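Here is a minimal, hypothetical sketch of what user code in migrate() might do. Photo and migrate_references are illustrative names, and a plain list stands in for datastore entities so the example runs outside App Engine. Because the update is keyed on the source blob key, a retry with the same arguments finds nothing left to change.

```python
class Photo(object):
    """Hypothetical stand-in for an entity storing a blob key as a string."""

    def __init__(self, name, blob_key):
        self.name = name
        self.blob_key = blob_key


def migrate_references(entities, source_blob_key, target_blob_key):
    """Point every entity that references source_blob_key at target_blob_key.

    Safe to re-run: entities already updated no longer match the source key,
    so a retry with the same arguments changes nothing further.
    Returns the number of entities updated on this call.
    """
    updated = 0
    for entity in entities:
        if entity.blob_key == source_blob_key:
            entity.blob_key = target_blob_key
            updated += 1
    return updated
```

Inside migrate() itself, the loop would typically become a datastore query on your own model (for example Photo.all().filter('blobKey =', sourceBlobKey) for a hypothetical string property blobKey), with each matching entity put() back.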

At VendAsta we chose not to use this hook and instead refactored our BlobInfo lookup code, as described in the following section.

Querying for blobstore.BlobInfo Instances

VendAsta’s customers have existing URLs that reference blob keys on the source (Master/Slave) applications. These URLs must keep working rather than return error responses, and a small refactoring of the blobstore.BlobInfo lookup code is enough to achieve this.

This refactoring relies on the BlobKeyMap entities that the blobstore migration application creates on the target application. To query them, copy the following db.Model definition into your own source code. Each entity’s key name is identical to the value of its unindexed sourceBlobKey property, so a source blob key can be resolved with a direct get_by_key_name() lookup rather than a query.

from google.appengine.ext import db

class BlobKeyMap(db.Model):
  """ Maps Master/Slave blobs to High-Replication blobs.

  The key name of each entity equals its sourceBlobKey value. """
  sourceBlobKey = db.StringProperty(required=True, indexed=False)
  targetBlobKey = db.StringProperty(required=True)

So, instead of looking up a single (non-list) BlobInfo like this:

blobInfo = blobstore.BlobInfo.get(blobKey)

…use something like the following instead:

from google.appengine.ext import blobstore

def getBlobInfo(blobKey):
  # first, look up the blob directly on the target application
  blobInfo = blobstore.BlobInfo.get(blobKey)
  if not blobInfo:
    # not found: treat blobKey as a source (Master/Slave) key and look up
    # its BlobKeyMap, whose key name is the source blob key
    blobKeyMap = BlobKeyMap.get_by_key_name(str(blobKey))
    if blobKeyMap:
      # finally, look up the migrated copy by its target blob key
      blobInfo = blobstore.BlobInfo.get(blobKeyMap.targetBlobKey)
  return blobInfo

blobInfo = getBlobInfo(blobKey)

You can do similar things with lists of blob keys.
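Here is a hedged sketch of the list case. get_blob_infos is a hypothetical batch counterpart to getBlobInfo that resolves all keys in at most three batched lookups instead of one round trip per key. The lookups are passed in as functions so the logic can be exercised without App Engine. In an application you would pass blobstore.BlobInfo.get and BlobKeyMap.get_by_key_name, both of which accept a list and return a list aligned with it, with None for missing entries.

```python
def get_blob_infos(blob_keys, fetch_infos, fetch_maps):
    """Batch version of getBlobInfo().

    fetch_infos(keys) -> list of BlobInfo-or-None aligned with keys.
    fetch_maps(key_names) -> list of BlobKeyMap-or-None aligned with key_names.
    Returns a list aligned with blob_keys; unresolved entries stay None.
    """
    keys = [str(k) for k in blob_keys]
    if not keys:
        return []
    # Batch 1: look every key up directly on the target application.
    infos = list(fetch_infos(keys))
    missing = [i for i, info in enumerate(infos) if info is None]
    if not missing:
        return infos
    # Batch 2: treat the misses as source keys and fetch their BlobKeyMaps.
    key_maps = fetch_maps([keys[i] for i in missing])
    mapped = [(i, m.targetBlobKey)
              for i, m in zip(missing, key_maps) if m is not None]
    if mapped:
        # Batch 3: fetch the migrated copies by their target blob keys.
        target_infos = fetch_infos([target for _, target in mapped])
        for (i, _), info in zip(mapped, target_infos):
            infos[i] = info
    return infos
```

The result keeps the input order, so callers can zip it back against their original key list.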

Existing migration examples required blobstore.BlobReferenceProperty so that references could be migrated automatically. VendAsta’s data model included some stringified or serialized blob keys, so converting the data model before the migration was not practical.

Conclusion

VendAsta has written a self-contained application that you can use to copy blobstore data between App Engine applications. It is straightforward to deploy as a separate version on each application, and it provides a robust, fault-tolerant copy mechanism.

Related: In September, VendAsta’s Jason A. Collins wrote about using Fantasm for incremental data backups on Google App Engine.
