At VendAsta we needed to migrate a number of applications from the Master/Slave datastore to the High-Replication datastore. This process is straightforward, and Google provides tools to facilitate it. However, there is no Google tool to migrate blobstore data between applications.
This posting describes the approach developed and used by VendAsta.
NOTE: As written, this blog posting only works properly prior to aliasing, with writes enabled on the source and target applications.
The approach developed at VendAsta uses the open-source Fantasm workflow engine to run state machines (App Engine
Tasks) on the source (Master/Slave) and target (High-Replication) applications that pass blobstore data to each other via the
Figure 1 shows an overview of the messages passed during the operation of the blobstore migration.
A. The process is started by calling /start_migration/ on the source application.
B. /start_migration/ starts a named state machine, Send, to continue (in parallel) over source blobs on the source application.
C. The Send machine calls /start_pull/ on the target application.
D. /start_pull/ starts a named state machine, Pull, on the target application to perform the copy.
E. The Pull machine fetches blob data from the source application via the /pull_endpoint/ and saves it on the target application.
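The message flow above can be modelled in a few lines of plain Python. This is only an in-memory sketch for orientation: the classes `SourceApp` and `TargetApp`, the generated target key format, and the `key_map` dictionary are all hypothetical stand-ins, and no App Engine or Fantasm API is used.

```python
# Illustrative in-memory model of the message flow; every name is hypothetical.

class SourceApp(object):
    def __init__(self, blobs):
        self.blobs = dict(blobs)  # source blob key -> blob data

    def start_migration(self, target):
        # Stands in for the Send machine: visit every source blob and ask
        # the target application to pull it.
        for source_key in sorted(self.blobs):
            target.start_pull(self, source_key)

    def pull_endpoint(self, source_key):
        # Serves the raw blob data to the target application.
        return self.blobs[source_key]


class TargetApp(object):
    def __init__(self):
        self.blobs = {}    # target blob key -> blob data
        self.key_map = {}  # source blob key -> target blob key

    def start_pull(self, source, source_key):
        # Stands in for the Pull machine: fetch the data, store it under a
        # new key, and remember which source key it came from.
        data = source.pull_endpoint(source_key)
        target_key = "target-%d" % len(self.blobs)
        self.blobs[target_key] = data
        self.key_map[source_key] = target_key


source = SourceApp({"src-a": b"hello", "src-b": b"world"})
target = TargetApp()
source.start_migration(target)
print(target.key_map)
```

In the real tool each of these calls is an HTTP request or a task, so steps can fail and be retried independently. The sketch runs them inline and ignores that.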
Tasks are used throughout to ensure idempotency. For instance, a blobstore migration can only be started once, and a specific blob will only be copied across a single time.
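One way to picture this guarantee is to treat a task's name as a deduplication key: a queue that rejects a name it has already seen cannot start the same piece of work twice. The sketch below uses a hypothetical in-memory queue (`NamedTaskQueue`, `DuplicateTaskError`) and a hypothetical naming scheme. It is not the App Engine Task Queue API or the naming Fantasm actually uses.

```python
class DuplicateTaskError(Exception):
    """Raised when a task name has already been used (hypothetical)."""


class NamedTaskQueue(object):
    """In-memory stand-in for a queue that rejects repeated task names."""

    def __init__(self):
        self._seen = set()
        self.pending = []

    def add(self, name, payload):
        if name in self._seen:
            raise DuplicateTaskError(name)
        self._seen.add(name)
        self.pending.append((name, payload))


def enqueue_copy(queue, migration_id, source_key):
    """Enqueue a copy task; return False if it was already enqueued."""
    # Assumed naming scheme: one task name per (migration, blob) pair.
    name = "copy-%s-%s" % (migration_id, source_key)
    try:
        queue.add(name, source_key)
        return True
    except DuplicateTaskError:
        return False


queue = NamedTaskQueue()
first = enqueue_copy(queue, "run1", "src-a")  # accepted
again = enqueue_copy(queue, "run1", "src-a")  # rejected: same name
rerun = enqueue_copy(queue, "run2", "src-a")  # accepted: different name
print(first, again, rerun)
```

Under this scheme, repeating a request is harmless, and work is only redone when a different identifier is deliberately supplied.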
Python code for this migration tool can be downloaded from Fantasm’s SVN repository on Google Code:
> svn export https://fantasm.googlecode.com/svn/trunk fantasm
> cd fantasm/applications/blobstore_migration
These instructions assume you want to copy blobstore data from the Master/Slave application
sourceapp.appspot.com to the HRD application
Deploy the source code into both the source and target applications by editing app.yaml and running appcfg.py update for each.
> head -n 2 app.yaml
application: sourceapp
version: blobstore-migration
> appcfg.py update .
> head -n 2 app.yaml
application: targetapp-hrd
version: blobstore-migration
> appcfg.py update .
The blobstore migration code will be available as a separate version (on its own unique URL) on each App Engine application. You can kick off a migration by calling the following URL on the source application:
- Edit the “FIXME” in the Target URL to point to your
- If you need to re-start a migration for any reason, supply a new
- Click “Submit”.
You will receive a “Success!!” message if the migration was successfully started.
Migrating Data Models
The Pull state machine provides a hook method, migrate(), that runs on the target application and allows modifications to data models that reference the source blob keys.

    def migrate(self, sourceBlobKey, targetBlobKey):
        """ Contains user code to migrate data

        @param sourceBlobKey: a string blob key in the source app
        @param targetBlobKey: a string blob key in the target app
        """
        pass
This method is available in the file
blobstore_migration_fsm.py. You can add any user code you would like. If this code results in a recoverable error (e.g.
DeadlineExceeded) then the infrastructure will ensure that subsequent retries of the
Pull machine will pass the same blob keys to the method, ensuring you can upgrade your references in an idempotent manner.
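As an illustration of what an idempotent body for this hook might look like, the sketch below rewrites references held in a plain dictionary. The `RECORDS` dictionary, its `blobKey` field, and the function name `migrate_references` are hypothetical stand-ins for your own datastore models. A real hook would query and put entities instead.

```python
# Hypothetical stand-in for datastore entities that store blob keys as strings.
RECORDS = {
    "report-1": {"blobKey": "src-a"},
    "report-2": {"blobKey": "src-b"},
    "report-3": {"blobKey": "src-a"},
}


def migrate_references(records, sourceBlobKey, targetBlobKey):
    """Point every record that references sourceBlobKey at targetBlobKey.

    Returns the number of records changed by this call.
    """
    changed = 0
    for record in records.values():
        # Only records still holding the source key are touched, so a
        # retry with the same pair of keys finds nothing left to do.
        if record["blobKey"] == sourceBlobKey:
            record["blobKey"] = targetBlobKey
            changed += 1
    return changed


first = migrate_references(RECORDS, "src-a", "tgt-a")
retry = migrate_references(RECORDS, "src-a", "tgt-a")  # simulated retry
print(first, retry)
```

Because the update is keyed on the source blob key, receiving the same pair of keys again after a recoverable error leaves the data unchanged.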
At VendAsta we chose not to use this hook, and simply refactored our
BlobInfo querying code as described in the following section.
VendAsta’s customers have existing URLs that reference blob keys on source (Master/Slave) applications. These URLs still need to be supported, and not return error response codes. This can be accomplished with a minor bit of refactoring of
blobstore.BlobInfo querying code.
This refactoring is facilitated by the
BlobKeyMap models that get created on the target application when running the blobstore migration application. You must copy this
db.Model definition into your own source code in order to use it. These models have key names identical to the values in the unindexed property
    from google.appengine.ext import db

    class BlobKeyMap(db.Model):
        """ Maps Master/Slave blobs to High-Replication blobs """
        sourceBlobKey = db.StringProperty(required=True, indexed=False)
        targetBlobKey = db.StringProperty(required=True)
So, instead of looking up a single (non-list)
BlobInfo like this:
blobInfo = blobstore.BlobInfo.get(blobKey)
...use something like the following instead:
    from google.appengine.ext import blobstore

    def getBlobInfo(blobKey):
        # first, look up the blob in the target application
        blobInfo = blobstore.BlobInfo.get(blobKey)
        if not blobInfo:
            # if not found, treat the key as a source-application key
            blobKeyMap = BlobKeyMap.get_by_key_name(str(blobKey))
            if blobKeyMap:
                blobKey = blobKeyMap.targetBlobKey
                # finally, look up the migrated blob by its target key
                blobInfo = blobstore.BlobInfo.get(blobKey)
        return blobInfo

    blobInfo = getBlobInfo(blobKey)
You can do similar things with lists of blob keys.
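A minimal sketch of the list case follows. To keep it runnable outside App Engine, the two batch lookups are passed in as parameters. On App Engine you would presumably pass blobstore.BlobInfo.get and BlobKeyMap.get_by_key_name, assuming both accept a list and return a same-length list with None for misses (check this against the SDK you deploy). The name `getBlobInfos`, its signature, and the in-memory test data are hypothetical.

```python
import collections


def getBlobInfos(blobKeys, getInfos, getMaps):
    """Resolve a list of blob keys, falling back to the source-to-target map.

    getInfos and getMaps are batch lookups: each takes a list and returns a
    list of the same length, with None for anything not found.
    """
    blobKeys = [str(key) for key in blobKeys]
    # first, look up every key in the target application
    blobInfos = list(getInfos(blobKeys))
    missing = [i for i, info in enumerate(blobInfos) if info is None]
    if missing:
        # treat the misses as source keys and map them to target keys
        maps = getMaps([blobKeys[i] for i in missing])
        mapped = [(i, m.targetBlobKey)
                  for i, m in zip(missing, maps) if m is not None]
        if mapped:
            # finally, look up the migrated blobs by their target keys
            found = getInfos([key for _, key in mapped])
            for (i, _), info in zip(mapped, found):
                blobInfos[i] = info
    return blobInfos


# In-memory stand-ins for the blobstore and the BlobKeyMap models.
KeyMap = collections.namedtuple("KeyMap", ["sourceBlobKey", "targetBlobKey"])
TARGET_BLOBS = {"tgt-a": "info-a", "tgt-b": "info-b"}
KEY_MAPS = {"src-a": KeyMap("src-a", "tgt-a")}

infos = getBlobInfos(
    ["tgt-b", "src-a", "gone"],
    lambda keys: [TARGET_BLOBS.get(key) for key in keys],
    lambda names: [KEY_MAPS.get(name) for name in names],
)
print(infos)  # ['info-b', 'info-a', None]
```

This version makes at most three batch calls however long the list is, and it preserves the order of the input keys.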
Existing migration examples enforced the use of
blobstore.BlobReferenceProperty to automatically migrate references. VendAsta’s data model included some stringified/serialized blob keys so it was not practical to convert the data model prior to migration.
VendAsta has written a self-contained application that you can use to copy blobstore data between App Engine applications. It is straightforward to deploy this code as a separate version on your applications, and it provides a robust, fault-tolerant mechanism for the copy.
Related: In September, VendAsta's Jason A. Collins wrote about using Fantasm for incremental data backups on Google App Engine.