Archive for the 'python' Category

Function.apply (is even easier) in Python

I can't believe I've been using Python for several months now without really understanding the extended call syntax.

You know how in ActionScript you can do functionRef.apply(thisObj, argumentsArray) if you need to call a function with a dynamic list of arguments? I was looking for a way to do this in Python and googled for "apply". Lo and behold, I found that it was deprecated.

Instead of using a separate function, you can simply pass in your arguments (and keyword arguments) to the function itself.

e.g., to call function add_numbers(first_number=a,second_number=b) with the list special_arguments=[2,2], you'd write:

add_numbers(*special_arguments)

And, for keyword arguments, where my_awesome_keyword_arguments = {'first_number':2, 'second_number':2}:

add_numbers(**my_awesome_keyword_arguments)

Finally, you can mix both positional and keyword arguments (say positional=[2] and keyword={'second_number':2}):

add_numbers(*positional, **keyword)
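To see the three call styles side by side, here is a minimal sketch. The body of add_numbers is an assumption (only its name and its two parameters are given above); here it simply returns the sum.

```python
def add_numbers(first_number, second_number):
    # Hypothetical body: only the signature is given in the post.
    return first_number + second_number


special_arguments = [2, 2]
my_awesome_keyword_arguments = {'first_number': 2, 'second_number': 2}
positional = [2]
keyword = {'second_number': 2}

# * unpacks a sequence into positional arguments.
from_list = add_numbers(*special_arguments)

# ** unpacks a dictionary into keyword arguments.
from_dict = add_numbers(**my_awesome_keyword_arguments)

# Both forms can be combined in a single call.
mixed = add_numbers(*positional, **keyword)

print(from_list, from_dict, mixed)
```

All three calls pass the same two values; only the way they are packed differs.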

The crazy thing is that I've been using methods that use extended syntax for months now and yet I didn't actually grok exactly what was going on until today. Ah, I love it when something clicks. (And did I mention that the more I use Python, the more I love it?) :)

Python, the learn-at-home language

One of the things that I love about Python is that it has all the documentation you ever need (all right, almost) in the code itself. Many moons ago, the very first framework I wrote in ActionScript (Flash 5) used the same technique by placing documentation on the activation objects of functions (and it would be cool to see that practice make a comeback in AS3.)

In Python, to find out what properties an object has, you just ask for a listing. The following, for example, shows you all properties and methods on the os module.

import os
dir(os)

In fact, I was doing this just today as I wanted to find out which functions are available for working with folders for the automatic restore feature I'm building for my Google App Engine backup solution.

Of course, that brings back a lot of items.

But have no fear, because we have regular expressions. I've grown to loooooove regular expressions thanks to being finally forced to learn them and use them daily in Django. I probably don't write the most concise ones but I've gotten to the point where I use them all the time and they simplify my life to no end.

So, to see just the methods that have "dir" in them:

import re
r = r'(^.*?dir.*?$)'
rc = re.compile(r)
matches = map(rc.match, dir(os))
[x.groups() for x in matches if x is not None]

And Bob's your uncle.

(Oh yeah, and list comprehensions rock too!)
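For a fixed piece of text like "dir", a shorter variant probably does the same job. This is a sketch rather than a replacement for the approach above: it uses re.search (which looks anywhere in the string, so no anchors or groups are needed) and, for comparison, the plain in operator.

```python
import os
import re

# search() finds "dir" anywhere in the name.
pattern = re.compile(r'dir')
with_regex = [name for name in dir(os) if pattern.search(name)]

# For a fixed substring, the "in" operator selects the same names.
with_in = [name for name in dir(os) if 'dir' in name]

print(with_regex)
```

The regular expression version earns its keep once the pattern is more than a literal substring, for example names that end in "dir".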

And the coolest thing is that since Python is interpreted, I'm doing all this in the excellent IPython shell and using the language to learn the language and as a reference.

The more I use Python, the more I love it. They really did everything right. And that success, no doubt, is firmly planted in the unerring focus and the core values instilled in the language by Guido -- you're a freakin' genius, dude, and it gives me no end of confidence in Google App Engine that you're on the project! :)

Coming soon: backup and restore the data in your Google App Engine applications


Today, I made a full backup of the data on the Singularity web site on Google App Engine and restored it on my local machine running the SDK.

If you're not familiar with Google App Engine, you may be thinking, "so what?" Big deal, Aral, I can do backups with a click of the button in PhpMyAdmin. Unfortunately, though, there isn't currently a publicly available data export feature for Google App Engine, much less a solution to backup and restore your data easily. (One of the top criticisms aimed at Google App Engine is that you cannot backup/download your data.)

As far as I know, this is the first time a backup and restore has been done on Google App Engine (though Google engineers may have successfully tested their own solutions internally.)

My solution works by backing up the datastore incrementally into Python code (I ran into every possible limit on App Engine while developing this, as those of you following my Twitters today will have witnessed).

Yep, that's right, the backups are stored as Python code. "What about restores?", I hear you ask. Well, a backup is pretty useless if you cannot restore it. Restoring the backups is as easy as running the generated Python code (either all at once, on the local SDK, or incrementally on the deployment environment.)
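To illustrate the general idea of a backup that is itself Python code, here is a toy sketch. It is not the backup solution described in this post: it uses plain dictionaries instead of datastore entities, touches no App Engine APIs, and the names dump_as_python and restore are made up for the example.

```python
def dump_as_python(records):
    """Return Python source that rebuilds the given records when run."""
    lines = ["restored = []"]
    for record in records:
        # repr() of simple built-in values is valid Python source.
        lines.append("restored.append(%r)" % (record,))
    return "\n".join(lines) + "\n"


def restore(source):
    """Run the generated source and return the rebuilt records."""
    namespace = {}
    exec(source, namespace)
    return namespace["restored"]


# Hypothetical sample data.
records = [{'name': 'Aral', 'tickets': 2}, {'name': 'Guido', 'tickets': 1}]
backup_source = dump_as_python(records)
print(backup_source)
print(restore(backup_source) == records)
```

Because each record becomes its own statement, the generated source could also be split into smaller files and run piece by piece, which is one way to think about an incremental restore.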

The four use cases I see for this are as follows:

  1. Back up your data (data safety)
  2. Back up your data and restore to the local SDK (local testing with real data)
  3. Back up your data and restore to a different App Engine instance (staging server)
  4. Back up your data and restore to your live application instance (data recovery)

I've already successfully handled use case 1 and I'm currently almost done with a generic restore feature that should correctly handle use cases 2-4.

When I've got restores working properly for the Singularity app, I'm going to decouple these handlers from the app and create a separate Django app that you can include in your own projects to give you backup and restore functionality in Google App Engine.

I will be releasing this as an open source project as part of Singularity's Open Source Initiative, alongside OpenCountryCodes, The European VAT Number Validation API, and The GAE SWF Project.

I've also got plans to make the whole process entirely seamless but my priority is working on the Singularity web conference so don't expect a very polished solution immediately. I will be working on this as my "20% project" of sorts, though, as it is essential to have a solid backup/restore solution for commercial apps on Google App Engine.

To read more about this, including some of the challenges, check out the following thread on the forums: Datastore backup solution (almost ready).

Based on forum postings, it would appear that Google is also working on the data export issue and I'm talking with Pete Koomen at the moment regarding my solution. Check out Pete's thread on backups here: Feedback on data exporter design?

Finally, one of the challenges in working with real data on the local SDK was that the SDK would grind to a halt after you populated a certain number of rows in the local datastore; see performance issue with SDK datastore with large volumne (>1000 rows). Thankfully, Baptiste Lepilleur created a very timely patch (see Issue 390) to work around the problem. This means that restoring a backup to your local machine will not take forever (it's still not blisteringly fast -- taking 0.1 sec per put but at least the duration does not increase linearly like it used to).

I plan to get restores fully working tomorrow (along with my other, more social responsibilities for Singularity and Pistach.io). Follow me on Twitter or keep an eye on the blog and/or the relevant thread on the forums for updates.

We're close to having a working data backup and restore solution on Google App Engine! :)

The 1MB hard limit in Google App Engine

There isn't much documentation on what exactly is affected by the 1MB limit in Google App Engine so here's my effort at documenting this based on my empirical findings:

  • You cannot have a data structure (e.g., variable) larger than 1MB in size or you'll get a MemoryError.
  • You cannot return a response that is larger than 1MB in size or you'll get an error similar to: HTTP response was too large: 3457738. The limit is: 1048576. (I got that error when trying to circumvent the 1MB MemoryError issue by returning a generator for my HttpResponse).
  • Model instances cannot be larger than 1MB in size or you'll get a RequestTooLarge error.
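One way to stay under these limits is to cut large data into pieces before storing or returning it. The sketch below only shows the slicing; split_into_chunks is a hypothetical helper, and the limit value is the one quoted in the error message above. The findings above do not say whether a piece of exactly 1048576 bytes is accepted, so a real version might want a safety margin.

```python
LIMIT = 1048576  # the limit quoted in the "HTTP response was too large" error


def split_into_chunks(data, limit=LIMIT):
    """Split a byte string into pieces of at most `limit` bytes."""
    return [data[i:i + limit] for i in range(0, len(data), limit)]


# A payload the size of the rejected response from the error message.
payload = b'x' * 3457738
chunks = split_into_chunks(payload)
print(len(chunks), [len(chunk) for chunk in chunks])
```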

No nudity please, we’re Google (or why you shouldn’t mix naked domains and www on Google App Engine)

I have to confess, I love naked domains. You might say, I have somewhat of a fetish.

Naked domains, of course, are domain names without the www prefix. So, instead of www.singularity08.com, for example, having singularity08.com.

One of my pet peeves is sites that don't display correctly without the www prefix. I've found that it's usually a good sign that the site is going to be pretty crap. In fact, I was hoping that some day we would have www disappear from use altogether and that we'd all be swimming in a sea of naked domains. Well, I almost got my wish -- we've at least got heaps of domains with nudity.

The truth is, however, that the www subdomain is not going anywhere, especially on the cloud, and the thing to do is to have your naked domain forward to www (you listening, no-www?)

Why GAE naked domains don't play nice with others

When you host with Google App Engine, you can choose to use your own domain name for your app. You do this through Google Apps (Confused? You should be. The two sound very similar.) Google Apps, of course, is Google's online office suite. You get a Google Apps account and create A-records for your naked domain (four of them, pointing to 216.239.32.21, 216.239.34.21, 216.239.36.21, and 216.239.38.21) and you create a CNAME for your www subdomain that points to ghs.google.com. (I use DynDNS for all my DNS hosting -- they rock -- and make it really easy to set this stuff up.)

If all this DNS voodoo sounds confusing, it's because it is. I still wouldn't know a CNAME from an A-Record if I met one in a dimly-lit alley (ok, maybe I'd recognize the A-Record if it wasn't wearing make-up -- just maybe though). All you really need to know is that if you use those settings, things will work thanks to the magic of those hard-working DNS gnomes. No really, they exist. They're cute little things too, all bright-eyed and furry.

The problem you're left with, however, is that your domain is reachable from two URLs: one using the naked domain and one using the www.

"So what's wrong with that?", I hear you ask. Ah, a number of things, my inquisitive fellow, a number of things...

Firstly, it's not good to have two sets of URLs for the same resource (if you don't have time to digest the in-depth reasons why, at least know that it's bad for search engine rankings.)

Secondly, that address that you set your CNAME to, ghs.google.com, does special things. Like load balance your requests among the hundreds, no, thousands, no, gazillions of servers that Google has in its cloud. Your puny "A" list is not going to compete with that. In other words: www 1, naked domain 0.

Finally, and most importantly, your app will break. Woah, that's a big one... care to explain, Aral? Sure, Aral, I thought you'd never ask (it's not considered talking to yourself if you do it in a blog post, you know!)

Take this scenario:

You hit the Singularity web site at http://singularity08.com. Next, you go to buy a ticket and you get forwarded to PayPal. In the forwarding URL, PayPal gets told to return you to http://www.singularity08.com. Not a big deal, right? Oooh, but it is. When you return, you end up losing your session. Ouch!

You can work around this by making sure that you always use request.META['HTTP_HOST'] in Django when creating callback URLs but I guarantee you that you'll forget at some point and mix your naked domain and www and end up scratching your head at the random errors that result. That's exactly what happened to me earlier this week while gearing up to launch the Singularity web site.
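As a rough sketch of that HTTP_HOST workaround: build the callback URL from whatever host the visitor actually used, so the return trip lands on the same host. The function name build_callback_url and the path are hypothetical, and a plain dictionary stands in for Django's request.META so the snippet runs on its own.

```python
def build_callback_url(meta, path):
    """Build a return URL on the same host the request came in on.

    `meta` stands in for Django's request.META dictionary.
    """
    return 'http://' + meta['HTTP_HOST'] + path


# Hypothetical path, for illustration only.
from_naked = build_callback_url({'HTTP_HOST': 'singularity08.com'}, '/tickets/thanks/')
from_www = build_callback_url({'HTTP_HOST': 'www.singularity08.com'}, '/tickets/thanks/')
print(from_naked)
print(from_www)
```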

So how do you work around it?

The simplest way I found was to write a simple piece of middleware in Django to handle the forwarding. Here it is, released under the MIT license:

import os
from django.conf import settings
from django import http

class NakedDomainRedirectMiddleware(object):
	def process_request(self, request):
		"""
		If the domain is being accessed from the naked domain, forward it to www.

		Copyright (c) 2008, Aral Balkan, Singularity Web Conference (http://www.singularity08.com)
		Released under the open source MIT License.

		"""

		naked_domain = settings.NAKED_DOMAIN
		host_name = os.environ['HTTP_HOST']
		start_of_uri = host_name[0:len(naked_domain)]

		if start_of_uri == naked_domain:
			full_path = request.get_full_path()
			uri = 'http://www.' + naked_domain + full_path

			return http.HttpResponsePermanentRedirect(uri)

Save the above class and then add it to your settings file, at the top of your MIDDLEWARE_CLASSES tuple. For example, I have it in a module called middleware:

MIDDLEWARE_CLASSES = (
    'middleware.NakedDomainRedirectMiddleware',
    'django.middleware.common.CommonMiddleware',
    # etc.
)

Finally, set your naked domain in the settings file:

NAKED_DOMAIN = 'singularity08.com'

This should forward all requests to the naked domain to www. You'll end up not having two sets of URLs for each resource and you'll save yourself a lot of headache.
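To check the redirect decision without running Django or App Engine, the same logic can be pulled into a plain function. This is a sketch under that assumption: www_redirect_target is a made-up name, and it uses str.startswith in place of the slice comparison in the middleware.

```python
def www_redirect_target(host_name, full_path, naked_domain):
    """Return the www URL to redirect to, or None if no redirect is needed."""
    if host_name.startswith(naked_domain):
        return 'http://www.' + naked_domain + full_path
    return None


naked_hit = www_redirect_target('singularity08.com', '/tickets/?id=1', 'singularity08.com')
www_hit = www_redirect_target('www.singularity08.com', '/tickets/?id=1', 'singularity08.com')
print(naked_hit)
print(www_hit)
```

A request that already arrives on www returns None, which mirrors the middleware falling through without a redirect.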

Google is aware of this issue and they were trying to implement a fix on their end to help me out but that's not in place yet. It's possible that they may implement the fix and make it the default behavior for all accounts (which is what I think should be the case) but it may take a little while as any such change will have to go through full QA testing.

In the meanwhile, this is a stop gap measure that's working out fine for me currently on the Singularity web site. I hope it helps you out too.

New runserver options for manage.py in Google App Engine Helper for Django

Update: Just as I was writing/releasing my patch, Matt Brown apparently committed r38 (diff) to the Google App Engine Helper for Django source tree which resolves the same issue -- and much more simply and elegantly than my solution, by using Django's own settings file. Use Matt's solution instead of my patch.

Original post follows:

The Google App Engine Helper for Django is an excellent library if you want to build Django apps on Google App Engine. I've been using it without any hassles on the new Singularity Web Conference site. One thing that it currently lacks, however, is a way to specify dev_appserver options (like smtp_host and smtp_port, for example) when you start the server.

So I got into the lazy habit of using dev_appserver.py to start the server when I needed to test email functionality and using ./manage.py runserver at other times.

Don't do this! :)

It started tripping me up because the two use different datastore instances. So I would do dumb things like run the server using dev_appserver.py and then start a shell with ./manage.py shell and wonder why certain entities were not in the datastore. Doh! :)

Instead, I'd recommend that you use manage.py for everything.

The problem then remains, how do you pass options to manage.py? Well, I just bit the bullet and added that functionality to the Google App Engine Helper for Django.

Here's the patch (runserver.patch.zip, 1kb).

After you apply it to your appengine_django folder, you will be able to pass the smtp_host and smtp_port options to the server just like you can with dev_appserver.py.

For example:

./manage.py runserver --smtp_host=localhost --smtp_port=5000 192.168.1.2:8080

I've only set those two options up because those are what I need at the moment but the code is flexible. Just add more valid dev_appserver.py options to the valid_args tuple in runserver.py and they'll automatically be proxied to the dev_appserver instance.
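The idea of a tuple of allowed option names can be sketched as follows. This is not the code from the patch, just an illustration of the filtering step: split_runserver_args is a hypothetical helper, and only the two option names mentioned above are assumed to be valid.

```python
VALID_ARGS = ('smtp_host', 'smtp_port')  # the two options named in the post


def split_runserver_args(argv, valid_args=VALID_ARGS):
    """Separate --name=value options that should be proxied from the rest."""
    proxied = {}
    remaining = []
    for arg in argv:
        if arg.startswith('--') and '=' in arg:
            name, value = arg[2:].split('=', 1)
            if name in valid_args:
                proxied[name] = value
                continue
        # Anything not recognised is left for manage.py to handle.
        remaining.append(arg)
    return proxied, remaining


proxied, remaining = split_runserver_args(
    ['--smtp_host=localhost', '--smtp_port=5000', '192.168.1.2:8080'])
print(proxied)
print(remaining)
```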

Winning at the shell game: iPython on Google App Engine

iPython is an awesome extended Python shell that gives you goodies like tab completion for instances, history tracing (so you can easily copy interactive sessions as doctests), etc. And, if you install it, your Django project on Google App Engine will automatically start using it instead of the regular python shell when you use ./manage.py shell.

To install iPython on OS X Tiger (yes, my Leopard discs are still safely in their box since I downgraded and I don't see any reason to bring them back out yet), I took the following steps:

  1. Download the latest iPython from the iPython distributions page (ipython-0.8.4.tar.gz)
  2. Untar it, cd into the folder
  3. As per the instructions on the iPython download page:
    python setup.py build
    sudo python setup.py install
  4. To test it out on my Google App Engine/Django project, from my project folder: ./manage.py shell

(Note: The docs mention that you need to have readline installed on Mac OS X in order to use some of the features like tab completion and syntax highlighting. It just worked out of the box for me on OS X Tiger 10.4.11 -- I'm not sure if I had installed readline at some point or whether it was just there. Check out these instructions if you're having trouble.)

Once you have it installed, try out the cool code completion:

from my_app import models
models.

Press ⇥, and you'll see a list of all your models. models.my_model. ⇥ will show you the properties for that model and so on.

To create doctests, simply enter your test instructions in the shell and then type hist -n to get a dump of your history without line numbers that you can copy and paste into your doctest.

You can press ⌃ P and ⌃ ⇧ P to interactively bring up the previous and next commands in the history. If you've typed a bit of the command before doing this, it will filter to show you only those commands from your history that match the text you've entered.

You can also access the system shell without leaving iPython by preceding system calls with an exclamation mark. !ls, for example, will show you a listing of the current working directory.

And there's much more you can do that you can read about on the iPython documentation (or just type ? in the iPython shell itself and browse the docs interactively).

Check out iPython, it's yummy!

I found out about iPython from an excellent blog post by AkH on useful tips and good practices for Django projects. Thanks, dude!

Conditionally displaying sIFR text for different languages using content negotiation with Django

I'm using sIFR on the new Singularity teaser site (not least because Mark Wubben is speaking at the conference, mind you) and ran into an issue today with extended characters (such as extended Western characters, Chinese, Japanese, etc.) not displaying properly as they weren't embedded into the Flash text field.

Getting extended Western characters working is not too difficult as you can embed most of them in the sIFR SWF without increasing the size of the SWF too much. The GillSans SWF currently on the site is 32KB and includes the following character sets from Flash (note that some of these contain overlaps):

  • Uppercase
  • Lowercase
  • Numerals
  • Punctuation
  • Basic Latin
  • Latin I
  • Latin Extended A
  • Latin Extended B
  • Greek
  • Cyrillic
  • Armenian

Embedding fonts for other sets such as Hebrew, however, balloons the SWF to an unacceptable size.

The workaround I implemented was to use content negotiation to switch sIFR off for various languages. The code (which you can put inside your request handler or in a decorator or middleware method):

use_sifr = True
if 'HTTP_ACCEPT_LANGUAGE' in request.META:
	language = request.META['HTTP_ACCEPT_LANGUAGE']
	remove_sifr_for = ('ar', 'zh', 'he', 'ja', 'el', 'ko', 'pa', 'th')
	for lang_test in remove_sifr_for:
		if lang_test in language:
			use_sifr = False
context['use_sifr'] = use_sifr
 

Then, based on the setting of the use_sifr template variable, I conditionally include the JS for sIFR:

{% if use_sifr %}
	<script type="text/javascript" src="/js/sifr.js"></script>
	<script type="text/javascript" src="/js/sifr-config.js"></script>
{% endif %}

You can easily test this out in Firefox through the Preferences panel (⌘ ,) → General → Languages → Choose... and adding one of the languages for which sIFR is switched off (such as Chinese).
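One thing to watch with the substring test in the handler code: it matches anywhere in the header, so a value such as es-ar (Spanish, Argentina) would also match 'ar'. A possible refinement, sketched here as a standalone function with the made-up name should_use_sifr, is to split the header into entries and compare only the primary language tag.

```python
REMOVE_SIFR_FOR = ('ar', 'zh', 'he', 'ja', 'el', 'ko', 'pa', 'th')


def should_use_sifr(accept_language):
    """Return False if any accepted language is in REMOVE_SIFR_FOR."""
    if not accept_language:
        return True
    for entry in accept_language.split(','):
        # Drop any quality value, e.g. "zh;q=0.9" becomes "zh".
        tag = entry.split(';', 1)[0].strip().lower()
        # Keep only the primary tag, e.g. "zh-cn" becomes "zh".
        primary = tag.split('-', 1)[0]
        if primary in REMOVE_SIFR_FOR:
            return False
    return True


print(should_use_sifr('en-US,en;q=0.8'))
print(should_use_sifr('zh-CN,zh;q=0.9,en;q=0.5'))
print(should_use_sifr('es-ar,es;q=0.8'))
```

In a request handler this would be called with request.META.get('HTTP_ACCEPT_LANGUAGE') and the result stored in the context as before.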

Running doctests from TextMate for Google App Engine modules

Developer emptor: I just lost a couple of hours to this. Make sure you disable the Google App Engine doctest import in your apps when you're done testing a module, lest you encounter _weird_ errors. When I forgot to remove the doctest import, users.create_login_url() started returning an incorrect login URL, forwarding to https://www.google.com/accounts/Login?continue=. Check out my forum post on it here.

I love Python's doctests. Basically, you test out your functions in the interactive shell and copy the results into the comments for a function. That's it! So simple.

Example:

def http_request(self, url, data, method=urlfetch.POST):
	"""
	Makes an API call to Triggermail and returns the response.
 
	>>> client = TriggerMail()
	>>> client.http_request('email', {'email':'EMAIL_REMOVED'}, urlfetch.GET)
	{u'blacklist': u'0', u'templates': {u'test2': 0}, u'verified': u'0', u'vars': {u'first_name': u'Aral', u'last_name': u'Balkan'}, u'optout': u'0'}
	"""

To run the doctests, you just need a main method in your module that looks like this:

if __name__ == "__main__":
	import doctest
	doctest.testmod()
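As a self-contained illustration with no external services involved, a whole module with one doctest and the main block might look like this. The function full_name is a made-up example, not part of the Triggermail client above.

```python
import doctest


def full_name(first_name, last_name):
    """
    Joins a first and last name with a single space.

    >>> full_name('Aral', 'Balkan')
    'Aral Balkan'
    """
    return first_name + ' ' + last_name


if __name__ == "__main__":
    # Prints nothing when every doctest passes.
    doctest.testmod()
```

Running the file directly stays silent on success; passing -v on the command line makes doctest report each example it tried.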

And, if you're working with TextMate, you can run the current script and its doctests by pressing ⌘ R.

Sweet!

However, when working with Google App Engine, this doesn't work out of the box.

If you try it, you'll get an error similar to the following:

ImportError: No module named google.appengine.api

This is because the local GAE environment isn't set up properly. The same goes when trying to test your apps from the Python interactive shell.

(If you're using Django for your app, you're in luck, all you have to do is ./manage.py shell and you're up and running with an interactive shell that's configured for your GAE project.)

Thankfully, Duncan over at the GAE forums went to the trouble of finding out exactly which imports are necessary to get you up and running.

His code listing actually goes beyond setting up the environment to finding your modules and running the tests. For my purposes, I just want to be able to hit ⌘ R in TextMate and run the tests for my current module while developing it, so I took the top bit of his code and put it into a module called gae_doctests.py.

It looks like this:

# To enable doctests to run from TextMate, import this module
# (Use only when testing, then comment out.)
# From: http://groups.google.com/group/google-appengine/browse_thread/thread/fa81f6abd95aa8b9/efed988b302aafb4?lnk=gst&q=duncan+doctests#efed988b302aafb4
 
import sys
import os
sys.path = sys.path + ['/usr/local/google_appengine', '/usr/local/google_appengine/lib/django', '/usr/local/google_appengine/lib/webob', '/usr/local/google_appengine/lib/yaml/lib', '/usr/local/google_appengine/google/appengine','/Users/aral/singularity/']
 
from google.appengine.api import apiproxy_stub_map
from google.appengine.api import datastore_file_stub
from google.appengine.api import mail_stub
from google.appengine.api import urlfetch_stub
from google.appengine.api import user_service_stub
 
APP_ID = u'test_app'
AUTH_DOMAIN = 'gmail.com'
LOGGED_IN_USER = 't... {at} example(.)com'  # set to '' for no logged in user
 
# Start with a fresh api proxy.
apiproxy_stub_map.apiproxy = apiproxy_stub_map.APIProxyStubMap()
 
# Use a fresh stub datastore.
stub = datastore_file_stub.DatastoreFileStub(APP_ID, '/dev/null', '/dev/null')
apiproxy_stub_map.apiproxy.RegisterStub('datastore_v3', stub)
 
# Use a fresh stub UserService.
apiproxy_stub_map.apiproxy.RegisterStub(
    'user', user_service_stub.UserServiceStub())
os.environ['AUTH_DOMAIN'] = AUTH_DOMAIN
os.environ['USER_EMAIL'] = LOGGED_IN_USER
 
# Use a fresh urlfetch stub.
apiproxy_stub_map.apiproxy.RegisterStub(
    'urlfetch', urlfetch_stub.URLFetchServiceStub())
 
# Use a fresh mail stub.
apiproxy_stub_map.apiproxy.RegisterStub(
  'mail', mail_stub.MailServiceStub())
 

(Either replace the /usr/local/ bit with the actual path to your GAE install or use Duncan's code which is neater -- I was lazy and copied the contents of the sys.path list from the Django interactive shell.)
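If you would rather not hard-code the install location, the path list can be built from a single root. This is only a sketch: the helper gae_sys_path and the GAE_SDK_ROOT environment variable are assumptions made for the example, not something the SDK defines, and the subfolders are simply the ones from the listing above.

```python
import os
import sys


def gae_sys_path(sdk_root, project_dir):
    """Build the extra sys.path entries from one SDK root folder."""
    sdk_root = sdk_root.rstrip('/')
    subfolders = ['lib/django', 'lib/webob', 'lib/yaml/lib', 'google/appengine']
    paths = [sdk_root] + [sdk_root + '/' + sub for sub in subfolders]
    paths.append(project_dir)
    return paths


# GAE_SDK_ROOT is a hypothetical variable; the fallback is the path used above.
sdk_root = os.environ.get('GAE_SDK_ROOT', '/usr/local/google_appengine')
extra_paths = gae_sys_path(sdk_root, '/Users/aral/singularity/')
sys.path = sys.path + extra_paths
print(extra_paths)
```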

To use it, simply:

import gae_doctests

And hit ⌘ R in TextMate. Sweet!

Once I'm done hacking away on a module, I simply comment out the import.


“Google App Engine: To Django or to webapp?” Revisited

When I was first starting out with Google App Engine (GAE), I wrote a short post detailing my thoughts on whether to use Django or Google's own webapp framework for GAE projects. In that post, I concluded that "there isn't a compelling reason to use Django at the moment with GAE".

Since then, I've re-evaluated my decision and, a few weeks ago, I ported the Singularity web app to Django from webapp.

Here are the main factors that contributed to my decision:

  • I wanted to use features in the latest Django trunk (0.97+) and the Google App Engine SDK ships with 0.96. I also did not want to be tied in to updates from Google for the Django framework.
  • Google released the Google App Engine Helper for Django which makes it very simple to set up a GAE Django app.
  • And, finally, looking through the source of Rietveld, I saw that Guido van Rossum is using Django for his Google App Engine app. And heck, I think he knows a thing or two that I may not.

Going with Django means that you can take advantage of more existing functionality (Django middleware, etc.) and, should you want to move away from Google App Engine in the future, there will be less code to rewrite. (Though, realistically, if you're doing anything with data, you're pretty much tied in to the DataStore at the moment until there is a suitable, scalable alternative that you can port to without completely refactoring your data layer.)

So my new take on this is that it makes a lot of sense to build Google App Engine applications using Django instead of pure webapp.

To get started, check out the Google App Engine Helper for Django.





