25 Dec 2008

Update: Beta 2 Released! See the Gaebar Beta 2 announcement.

Here's my Christmas present for the Google App Engine community: Google App Engine Backup and Restore (or Gaebar, for short).

Gaebar is an easy-to-use, standalone Django application that you can plug in to your existing Google App Engine Django or app-engine-patch-based Django applications to give them datastore backup and restore functionality.

For a quick overview of Gaebar, watch the screencast, above. For the impatient, links to the project pages where you can download Gaebar follow.

Downloads

Gaebar is hosted on GitHub. You can either download archives or clone the repository (or install Gaebar as a git submodule) via git. Alongside the Gaebar project itself are two sample applications — one built on Google App Engine Django and the other on app-engine-patch — that contain the Gaebar functional test suite. The functional test suite tests every datatype supported by Google App Engine as well as references, Expandos, and ancestor relationships.

Please make sure you read the readme files after downloading the projects for installation and usage instructions.

What you can do with Gaebar

  • Back up your deployed application's datastore for safekeeping.
  • Restore that backup on your local development server for testing with real data during development.
  • Restore that backup to a different Google App Engine application and use that application as a staging application much in the same way you would use a staging server in traditional development.

(You can, of course, also back up and restore your local development datastore as well as your staging application, etc.)

I've got a huge datastore, will Gaebar work for me?

Congratulations on the impressive size of your datastore! The answer should be "yes!" The largest datastore I've tested it with is the <head> web conference datastore. The latest backup had 18,969 rows from 14 models backed up into 225 code shards.

What's a staging application?

A staging application is a new Google App Engine concept made possible by Gaebar. Basically, if your application is myapp.appspot.com, you can use a separate application (say, myapp-staging.appspot.com) in the same way as you would use a staging server in traditional development.

Your staging application can let you try out new features and test with real data without having your users see your changes until you are ready to deploy to your main application.

In fact, it's a perfect staging environment since it is identical to your deployment environment. For an example of this, see the screencast.

How Gaebar works

Gaebar backs up the data in your datastore to Python code. It restores your data by running the generated Python code.
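
As a rough illustration of the idea (not Gaebar's actual generator), the sketch below turns one row into the source of a Python function and restores it by executing that source. The `FAKE_DATASTORE` dict, the `generate_row_function` helper, and the row layout are hypothetical stand-ins for the real datastore and models.

```python
import pickle

# Hypothetical in-memory stand-in for the datastore: {key_name: properties}.
FAKE_DATASTORE = {}


def generate_row_function(index, key_name, properties):
    """Return Python source for a function that restores one row."""
    # Pickle the property dict (protocol 0 is ASCII-only) and embed the
    # resulting bytes literal in the generated source via repr().
    pickled = pickle.dumps(properties, protocol=0)
    return (
        f"def row_{index}(datastore):\n"
        f"    datastore[{key_name!r}] = pickle.loads({pickled!r})\n"
    )


# "Back up" one row to source code.
source = generate_row_function(0, "id1", {"full_name": "Paul Booth", "friends": []})

# "Restore" by executing the generated code and calling the row function.
namespace = {"pickle": pickle}
exec(source, namespace)
namespace["row_0"](FAKE_DATASTORE)
```

Because the backup is ordinary Python source, it can be read, diffed, and deployed alongside the application like any other module.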

Since a backup is a long-running process, and since Google App Engine doesn't support long-running processes, Gaebar fakes one by breaking up the backup and restore processes into bite-sized chunks and repeatedly hitting the server via Ajax calls.

By default, Gaebar backs up 5 rows at a time to avoid the short-term CPU and 10-second call duration quotas, and splits the generated code into code shards of approximately 300KB to avoid the 1MB limit on objects. You can change these defaults in the views.py file if your app has higher quotas and you want faster backups and restores.
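
The chunking and sharding described above can be sketched as follows. This is a simplified, in-memory illustration rather than Gaebar's code: `backup_chunk` stands in for one short server request, the `while` loop plays the role of the repeated Ajax calls, and all names and the row format are assumptions. Only the batch size of 5 and the 300KB shard size come from the text.

```python
ROWS_PER_REQUEST = 5          # rows backed up per call (default from the text)
MAX_SHARD_BYTES = 300 * 1024  # approximate shard size (default from the text)


def backup_chunk(rows, offset, shards):
    """Simulate one short request: back up a few rows, return the next offset.

    Returns None once every row has been processed.
    """
    for row in rows[offset:offset + ROWS_PER_REQUEST]:
        code = f"row = {row!r}\n"  # stand-in for a generated row function
        # Start a new shard when the current one is non-empty and adding
        # this code would push it past the size limit. (len() counts
        # characters, which equals bytes for the ASCII data used here.)
        if not shards or (shards[-1] and len(shards[-1]) + len(code) > MAX_SHARD_BYTES):
            shards.append("")
        shards[-1] += code
    next_offset = offset + ROWS_PER_REQUEST
    return next_offset if next_offset < len(rows) else None


def run_backup(rows):
    """Drive the backup the way a client would: one small call at a time."""
    shards, offset, calls = [], 0, 0
    while offset is not None:
        offset = backup_chunk(rows, offset, shards)
        calls += 1
    return shards, calls


rows = [{"id": i, "payload": "x" * 100_000} for i in range(12)]
shards, calls = run_backup(rows)
```

Raising `ROWS_PER_REQUEST` means fewer round trips but more work per call, which is the trade-off to weigh against your app's quotas.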

Once a backup of a remote server is complete, Gaebar automatically hits your local development server. From there on, the local development server makes a series of calls to the remote server to download the backup files (code shards). Once the download is complete, you will see a new backup folder in gaebar/backups with the contents of your backup.

Here's an example of some generated backup code from the Google App Engine Django test application (updated for the upcoming Beta 2 release):

import pickle
from google.appengine.api.datastore import datastore_types
from app1.models import *
from app2.models import *
 
def row_0(app_name):
	existing_entity = Profile.get(datastore_types.Key.from_path('Profile', 1, _app=app_name))
	if existing_entity:
		existing_entity.delete()
	profile_0 = Profile(key_name="id1", friends = pickle.loads('(lp0\n.'), in_relationship_with = pickle.loads('N.'), full_name = pickle.loads('VPaul Booth\np0\n.'))
	profile_0.put()
 
def row_1(app_name):
	existing_entity = Profile.get(datastore_types.Key.from_path('Profile', 2, _app=app_name))
	if existing_entity:
		existing_entity.delete()
	profile_1 = Profile(key_name="id2", full_name = pickle.loads('VAral Balkan\np0\n.'), friends = [datastore_types.Key.from_path('Profile', u'stephalicious', _app=app_name), datastore_types.Key.from_path('Profile', 'id1', _app=app_name)], in_relationship_with = datastore_types.Key.from_path('Profile', u'stephalicious', _app=app_name))
	profile_1.put()

To restore, you simply deploy your application, along with the backup folder, to your deployment environment and hit the Restore button in Gaebar. (If you have a large datastore both the backup and restore processes will take a long time, especially when restoring to a local development server.)

The restore process simply calls each of the generated row functions and each row function restores a single row into the datastore.
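
A minimal sketch of that loop, assuming a shard is available as a namespace containing functions named `row_0`, `row_1`, and so on (the naming used in the generated code above). The `restore_shard` helper and the toy shard are hypothetical; Gaebar itself spreads these calls across many short requests rather than running a single loop.

```python
import itertools


def restore_shard(shard_namespace, app_name):
    """Call row_0, row_1, ... in order until no further row function exists."""
    restored = 0
    for index in itertools.count():
        row_function = shard_namespace.get(f"row_{index}")
        if row_function is None:
            break
        row_function(app_name)  # each call restores exactly one row
        restored += 1
    return restored


# Toy shard: two row functions that record what they would restore.
restored_rows = []
toy_shard = {
    "row_0": lambda app_name: restored_rows.append(("Profile", "id1", app_name)),
    "row_1": lambda app_name: restored_rows.append(("Profile", "id2", app_name)),
}
count = restore_shard(toy_shard, "myapp-staging")
```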

A note on the screencast

When mentioning how to install Gaebar, I left out that you also need to add the URL mapping for Gaebar to your application's urls.py. That, along with the rest of the installation instructions, is in the readme.txt file, which I highly recommend that you peruse.

Have your say!

As in all things, my approach to blog posts is that they should evolve over time and your feedback is invaluable in achieving this by helping me fix factual errors, fill in details, and expand the original post.

What do you think of Gaebar? Have you run into any issues that need fixing? Do you have other suggestions on how to improve it? Or do you, perhaps, have a patch to send me that adds Webapp support or some other feature? Leave me a comment and let me know!

Gaebar is a Naklab™ production released under GNU GPL v3 and sponsored by the <head> web conference.


Google App Engine Backup and Restore (Gaebar) released

  1. Awesome!

    How many times did it take for it to upload? :D

    Henri Watson
  2. Pretty cool. No surprise, you keep coming up with cool and new stuff.

    keep it up.

    Thanks

    -abdul

    Abdul Qabiz
  3. The innuendo in the name is intentional, yes?

    BigGary
  4. @Henri: Six or seven tries and about five or six hours :)

    @Abdul: Thanks!

    @BigGary: In-nu-en-do? :)

    Aral
  5. Awesome! That’s exactly what I need in my project.

    One quick question, does the backup restore support the reference properties in the datastore well? I will try it and test myself also.

    Robert
  6. Hi Robert,

    Yes, reference properties are supported. The restore process rewrites all keys using key names and, in the second pass of the restore process, recreates the reference relationships.

    Look through models.py and views.py in app1 in the Gaebar-gaed or Gaebar-aep sample projects to see the functional tests. They cover all of the datastore datatypes as well as references, ancestors, and Expando models.

    Do let me know if I’ve missed anything or if you run into any issues while testing with your real-world data.

    Aral
  7. Right, I just got the un(?)intentional innuendo thingy… “gay bar” for the record.

    I for one have unconsciously spelled it “GaeBear” which could stand for “Google App Engine Backup Everything and Restore”.

    Anyway, I guess it’s too late to rechristen the baeby, now that it’s being called weird names at school ;)

    Patrick Welfringer
  8. Great tool, thanks!

    The name sucks though :o

    brr
  9. Thanks for coming out with this application ;)

    Luke
  10. brilliant! love the play on the name

    Matthew
  11. You might want to add to the documentation (unless I just missed it) that an index.yaml entry is required for creating backups:

    - kind: GaebarCodeShard
      properties:
      - name: backup
      - name: created_at

    Thomas Bohmbach, Jr.
  12. I have just been playing with this, but it isn’t working.

    The problem is that my model class has a required field, and the app_engine validates the model on construction, not saving. Any thoughts??

    I get an error like:
    Traceback (most recent call last):
      File "../common/zip-packages/django.zip/django/core/handlers/base.py", line 86, in get_response
      File "../gaebar/views.py", line 1236, in backup_restore_row
        row_function(pass_number, application_name)
      File "../gaebar/backups/backup_2008_12_31_at_11_23_33_529775/shard0.py", line 15, in row_0
        account_0 = Account(key_name="id801")
      File "/opt/google_appengine.117/google/appengine/ext/db/__init__.py", line 587, in __init__
        prop.__set__(self, value)
      File "/opt/google_appengine.117/google/appengine/ext/db/__init__.py", line 387, in __set__
        value = self.validate(value)
      File "/opt/google_appengine.117/google/appengine/ext/db/__init__.py", line 2100, in validate
        value = super(FloatProperty, self).validate(value)
      File "/opt/google_appengine.117/google/appengine/ext/db/__init__.py", line 414, in validate
        raise BadValueError('Property %s is required' % self.name)
    BadValueError: Property interestRate is required

    Jonathan
  13. It would be easy enough to set all the fields in the construction line instead of on following lines. This would also save bytes in the shards.

    Jonathan
  14. Thanks Thomas, I didn’t think to add it since the local dev server automatically creates them, but you’re right, that should be in the docs. I just committed the change. Docs now read:

    6. If you are declaring your indices manually, add the following to your index.yaml file (or run Gaebar locally in the dev server so that the index is created for you automatically):

    - kind: GaebarCodeShard
      properties:
      - name: backup
      - name: created_at

    Aral
  15. Ah, there you go, I didn’t catch this as I don’t think I have any required fields in my models (I prefer to do my own validation). I’m going to look into this ASAP and add a required field to the functional tests. Could be a hairy one… my first thought is to monkey patch the validation routines to disable them. Other suggestions welcome.

    Just read your follow-up comment. Of course, initializing everything in the constructor should solve the problem. Will update you once I’ve tested.

    Thanks again for telling me about this.

    Aral
  16. I can help test the required properties too. I ran into this and had started on a patch, but now I’ll just wait for your update :)

    Thomas Bohmbach, Jr.
  17. Just a quick note: if you have required properties, the quick workaround (if you can) is to specify a default value for your required properties (default=something) — this will prevent the exception.

    In the meanwhile, I’m rewriting the generated code to support required properties and will update this thread once I’ve pushed my changes to the remote repository.

    Aral
  18. Thomas, Jonathan: I’ve pushed my latest changes to the remote repository. I’ve added support for all required properties _except_ for ReferenceProperty instances. Going to look into that now as that’s probably going to be trickier.

    Aral
  19. OK, since we specify the model order in the settings file, we can be sure that referenced properties will exist before the properties that refer to them are created so I’m going to make the restore process single pass and create reference properties in the constructor along with all other properties so we can support required reference properties.

    Aral
  20. Hey guys, I just pushed another update to the Git repositories for Gaebar and Gaebar-gaed (Gaebar-aep has _not_ been updated yet).

    Changes:

    • All required properties, including references should work correctly now.
    • No longer using actual entity references (which was unnecessary) but keys to create reference properties. This means that creation order of references doesn’t matter any more (thanks to Pete Koomen who explained all this to me in an email ages ago; it only just sunk in, Pete!)
    • Due to above changes, the restore process is now one-pass, not two.

    I’m going to update gaebar-aep (I want to refactor to make the functional tests their own Git submodule) and then push a Beta 2 release and announce it.

    Any testing in the meanwhile would be welcome but, given that it’s New Year’s Eve, I won’t hold my breath!

    Happy New Year everyone!

    Aral
  21. OK, I just updated gaebar, gaebar-gaed, gaebar-aep, and gaebar-gaed-skeleton (a new app that gives you a ready-to-use Google App Engine Django/Helper skeleton app with Gaebar pre-installed) to the latest gaebar codebase.

    I also refactored the functional tests into their own app on Github (gaebar-functional-tests). Both the App Engine Django and app-engine-patch test apps use that submodule now.

    I’m going to update the docs and the version numbers and we should have our Beta 2 release.

    Thank you again for your help and I hope you’ll test Beta 2 and let me know if you encounter any further issues and/or have comments or suggestions.

    Aral
  22. [...] happy to announce that there’s quite an important update to Gaebar that brings with it some essential bug fixes and should help shave quite a bit of time off of your [...]

    Aral Balkan - Gaebar Beta 2 Released
  23. Great idea and a cool name also. :D

    Ante Vrli
  24. Thanks for writing this…I’m really excited to try it out!

    I have one problem. When I try to download the code shards to my local machine, the URLs on the production server (e.g. http://some-app.appspot.com/gaebar/download-py/2009-01-06%2006:58:36.733418/098f6bcd4621d373cade4e832627b4f6/) give me the following error:

    NeedIndexError at /gaebar/download-py/2009-01-06 06:58:36.733418/098f6bcd4621d373cade4e832627b4f6/

    no matching index found.
    This query needs this index:
    - kind: GaebarCodeShard
      properties:
      - name: backup
      - name: created_at

    Did I miss something? I thought the indexes are supposed to be automatically generated.

    Tim
  25. Sorry, I was mistaken. I added the entry to index.yaml, updated the deployed app and it works as expected. I misunderstood the previous comments to mean that the index.yaml would be updated after accessing /gaebar/ on the development server.

    Thanks again!

    Tim
  26. Thanks Aral! My project was webapp-based and I *really* needed a better staging solution so gaebar was a good excuse to port to Django. I used your Gaebar-gaed template and had everything running in about two days.

    Very nice work!

    Tony Andrews
  27. [...] See the original announcement for a screencast and general information on Gaebar. [...]

    Aral Balkan - Gaebar Beta 3 (version 0.3) released
  28. What about an app that is not running the helper or patch? Any way to use this?

    ray malone
  29. That was a beautiful screen cast that you made. If you don’t mind me asking, what tools did you use to create it?

    Jim
  30. It is GPLv3 so I can’t use it. I use BSD for my open source.

    Alvin
  31. Hi Alvin,

    What specifically is stopping you from using it? Since you’re not redistributing it or providing access to it to others, you should be able to use it without open sourcing your app. Do let me know if I’m missing something and I’ll consider releasing it under LGPL.

    Aral
  32. Aral
  33. Hi Aral,

    Great idea with the data extraction by chunk and great implementation! You have even been referenced on GAE Official blog! Congratulations.

    I imagine you’ll modify the scripts to save more rows at a time because of the increase of the limit of 10 seconds per computation to 30 seconds.

    I also want to mention I referenced this post in mine which explains the benefits of GAE service: Google App Engine: Free Hosting and Powerful SDK.

    A+, Dom

    Dom Derrien
  34. Nicely done.

    One enhancement that I could see useful in the future is to limit the number of records returned for a particular table. Maybe to pull randomly too.

    You may not want to develop on your local machine with a huge data set but use real user data.

    Just thoughts. Will be plugging it in.

    Thanks. I have added your site to my list of reading now!

    Glenn
  35. appreciate the info guys, thanks

    Malati
  36. [...] The next problem area is data migration with new software versions. I might see if I can make use of this somewhere: GAE BAR [...]

    BoraDev Software Technology » Python and Django and the Google App Engine
  37. Gaebar is not working for me. I had to hack the views.py file for it to setup the models properly as the kind map appears to have changed from what Gaebar is expecting. It is a very quick and nasty hack starting on line 448.

    for model_name in kind_map:
        model_classes.append(kind_map[model_name])
        model_name = model_name.split('_')
        if model_name[-1] == current_model.lower():
            model_name_from_current_model = model_name[0] + '_' + current_model.lower()

    Model = kind_map[model_name_from_current_model]

    This works fine and the back up is generated. But now when the local server tries to download it all I get back are 500 errors in the shard files and no data. Plus I am getting far more shard files than I should be as it appears to go into an infinite loop. I have had a quick look into why this may be and cannot immediately figure it out.

    I am running this on the latest AEP. Any ideas?

    Thanks,
    Simon

    Simon Holywell
  38. Damn right, Simon, Gaebar is not working. Line 448, Model = kind_map[current_model], raises a “KeyError” because the key for the current model does not exist in the dict obtained from db._kind_map.
    The keys exist only in lowercase and with the “_”. Why? I don’t know.
    So I get the same 500 errors too after that.

    Well, I’ll try to figure it out too. If I find a solution, I’ll post it here.
    Thank you.
    Ivan

    Ivan
  39. Alexander Vasiljev
  40. 1. remote_api
    2. retry for datastore timeout
    3. multi-thread like bulkloader.py

    Suggeest
  41. Hi, I tested gaebar with App Engine 1.1.8 and it works perfectly but it doesn’t work with App Engine 1.1.9 and 1.2.0. Any ideas how I can make it work?

    Stefan
  42. Hi Stefan …

    Did you remember to “*IMPORTANT* Patch your dev_appserver.py as per the instructions here: http://aralbalkan.com/1440”?

    When I updated … my dev_appserver.py was also updated and the patch was lost.

    Ernesto
  43. Hi Ernesto,

    yes, I applied the patch but it didn’t have any effect with App Engine 1.1.9 and 1.2.0.

    Stefan
  44. Hi … I had a lot of trouble making gaebar work with aep 1 … but I finally got it up & running.

    Some hints in case you had the same problems I did:

    1) AEP uses by default a Django Style naming (app_model vs Model) and this has conflicts with gaebar (0.3) … so I had to use the patch provided by Alexander Vasiljev (http://groups.google.com/group/app-engine-patch/browse_thread/thread/441ad1a7a47cf59b/f0007e42a53924a9#f0007e42a53924a9)

    2) After applying the patch, GAEBAR worked perfectly on my localhost, but it did not work at the deployment server (appspot.com). I got 500 errors and MANY shard files .

    The problem this time was with the index.yaml. For some reason my index.yaml file was not updated with new indexes, and this produced an error at the deployment server.

    I manually added the following index definition and it finally worked smoothly:

    - kind: gaebar_gaebarcodeshard
      properties:
      - name: backup
      - name: created_at

    P.S.: After erasing my index.yaml file, the development server started updating the index.yaml file again (automatically)

    Hope this helps …

    Ernesto
  45. thx Ernesto it works perfectly
    i have been working on this for a long time :p

    Leo Lou
  46. When I input data in the local server,
    model.key().id() returns its numerical ID properly
    but when I upload it to google server by gaebar,
    model.key().id() returns null

    Any one experienced in this issue?

    Leo Lou
  47. I just installed the gaebar in my application.

    After a few tries I was able to make a backup of my data (I got some Timeout errors, but today datastore was in anomaly with high latency times).

    My first comment: Awesome!

    Reinhard Spisser
  48. Hi Aral,

    This tool is so awesome! Thanks so much for creating it. Do you know of any way to auto backup your GAE datastore to Amazon S3? That would be the holy grail :-)

    Thanks,
    David

    David Nelson