Update: The quotas have been reset, the site is back up, and engineers at Google are monitoring them and working closely with me until we can figure out what the issue is and fix it.
If you try to access the new <head> web site to see the new design, you might find it a little minimalist for your liking. You might also think that we've gone and changed the name again, this time to "403 Over Quota".
Google App Engine's short-term quota errors are killing us.
The new site for the <head> conference went live late last night and has been up and down all morning.
The Google App Engine quota system is fundamentally broken. You cannot have a cloud solution that "intelligently" takes sites down, essentially making every site running on it into a Twitter at the height of its troubles.
I can't even catch those Over Quota errors to display an apology message -- and I am profusely sorry and apologize if you have been trying to access the site this morning (believe me, the code's in there, it's just not getting triggered by these short-term quota errors.) It's very frustrating to say the least.
And I have no idea what's causing them. The logs have over quota messages, telling me that I'm 1.0x over quota (???) on requests that result in 404 errors. That's right, you can hit a non-existing page on the site and cause it to have an over quota error. That should be great news for denial-of-service attackers everywhere.
Sorry to vent but it just plain sucks when you work so hard on something only to have it made unavailable in such an inconsiderate way.
The way quotas are (mis)handled is the biggest thorn in Google App Engine's side. I can only conceive that the decision to implement quotas in such a barbaric fashion came from a Microsoft spy trained personally by Steve Ballmer and sent into the bowels of the Googleplex to infiltrate the Google App Engine team and cripple an otherwise excellent and revolutionary system that is -- this one major showstopper aside -- a joy to develop on.
This isn't Google Toy Engine, it's Google App Engine and it's about time that it started acting like a scalable cloud solution instead of a flashback to free Geocities or Tripod hosting with its over quota messages.
There is so much to love in Google App Engine but the quota system is simply broken and must be fixed.
The We haven’t changed the name of the conference to “Over Quota” article by Aral Balkan, unless otherwise expressly stated, is licensed under a Creative Commons Attribution-Noncommercial 2.0 UK: England License.
I think you might be over estimating Microsoft a little :)
I guess it’s a fairly cutting edge platform and it’s going to have problems, but Google really need to be straight up about that.
Are the quotas based on bandwidth or hits? Could you point your domain at another app and redirect all requests over (or other hosting), so if the main app starts doing this you can catch it at the redirect to give people an apology/explanation?
I’m still surprised that you insist on Google App Engine after experiencing first hand that it isn’t a mature option yet.
I would love to know why you stick with it instead of moving to another cloud option or even a dedicated server.
You use the GAE framework, but with Django, so I don’t think the change would be that terrible, though I may be wrong.
Cheers and good luck with the new name!
GAE is GAY.
actually — it kind of _is_ Google Toy Engine, right now. Until it’s out of ‘Preview Release’ status, and you can choose to pay for the service level you want, it’s just not going to be a reliable hosting platform…
But… it’s still in beta, right?
yeah it’s still beta, and so is gmail; give them both 5-6 years and they might come out of beta state ;-)
Google App Engine actually isn’t one of Google’s “ForeverBetas” yet. It’s still just a preview release, as they call it (which seems more like something other companies would call a beta). It really isn’t meant to be used for production sites yet.
HOWEVER, if it can’t be used for production sites, then how are folks supposed to figure out whether it can work for them at all? The quotas the <head> site is hitting are not ones you can pay to get over. They are specific limits on processing and bandwidth over a very short period of time, put in place so you don’t use up your quotas or overload a particular back-end machine in a short period of time.
This line pretty much sums it up: “All of our quotas are listed in terms of a rolling 24 [hour] window, although there are limits on the amount of resources your application can consume during shorter periods of time.”
One thing is a quota and another appears to be a fixed limit of an undocumented amount. Also read the paragraph at the bottom of this page: http://code.google.com/appengine/articles/quotas.html
The problem here is that they make it unclear if you’ll ever be able to pay to get over the limits. I’ve seen Google apps return errors at times and then start working again. This limiting factor may be part of how their infrastructure works to keep as much as possible running for as much time as possible.
The fact that the errors can’t be caught is interesting. The fact that the limits can be hit with 404s is very bad.
If the limits hold true even into “ForeverBeta” and if 404s still count against quotas and limits, then GAE may forever remain Google Toy Engine for micro-projects.
Let’s just hope efforts that Aral, and others, are putting in will sway Google in a way that allows true scaling of apps under stress rather than just blocking apps under stress.
Hey Thom,
The limits, as Shane stated, are short term CPU limits. The problem is that I have no idea what is causing them. I’m not doing a lot of parsing or anything. It’s a simple site and I’m trying to be as good a citizen as possible — caching stuff whenever possible, etc.
I mean, I’m getting over quota errors on a view that does essentially nothing more than render a Django template. There must be something wrong with the system.
My friends at Google just cleared the quotas again but that’s just a short-term solution. This short-term quota thing has to be fixed somehow.
I’m sending over my code to the Google team in a few moments so they can look it over and make sure that I haven’t implemented the KillGoogleAppEngine anti-pattern or anything so maybe something will come of that.
But, looking at the logs, I really don’t think it’s my code.
@Fernando: There are several reasons I’m sticking with them. First and foremost, I still believe that it’s going to be a great platform and I’ve learned heaps about it. When it does mature a bit more, I plan for us to take advantage of it. The way I see it, we have a group of some of the top engineers in the world working for our app every day at Google.
Also, it’s not trivial to port away from Google App Engine. Yes, the site is 90% Django but the remaining 10% is the datastore and that’s not trivial to port. Hosting on the SDK is not an option (it’s not a deployment environment.)
There’s a tax you pay for being an early adopter but, in this case, I feel that it will be a worthy investment. And, if none of us took the plunge and braved it, then there would be no innovation. Standing and pointing from the sidelines is all well and good but new platforms need early adopters and — even more importantly — they need to listen to them and their feedback.
We just have to get through this rough patch — and the Google team are helping out as much as they can so I’m sure that we will.
Hi Aral, fellow App Engine dev here.
I’ve run into some of the same issues. Looking at your site, I notice that none of your static files return 304s on subsequent requests, so Firefox re-requests them on every single page load. You might want to look into generating proper expirations and ETags. I found this to help a metric ton on App Engine, particularly for things like uploaded images (since they come from the datastore). Your 10-minute expirations don’t seem to make Firefox actually bother caching anything.
In addition to this, the short-term CPU quota is extremely easy to saturate during spikes if your site does not run with a good deal of headroom below the ~350ms Google considers the max for a reasonable request. The reason is that Google will spawn new processes, and this is very expensive (typically 3x as long, if not more, depending on complexity). At the same time, request handling time will increase on existing processes (why that counts in the request time I don’t know; it’s in their framework/infrastructure, not in our code).
All in all, the quota system is funky as hell, and I struggle to understand how they expect us to ever use those 200k gigacycles you get every ~5 hours (the stats shown on the dash are for a 5 hour window), since any load that comes near that will end up triggering enough high CPU warnings to exhaust that separate quota before even getting close. I suspect the quota works in this way so they can charge you for spikes, and use the free quota as a buffer for real sites, so you end up paying for getting slashdotted, but not normally.
Out of curiosity, how many requests have you been getting?
Thomas, do you have any advice or resources for setting up said 304s, expirations and ETags on App Engine? The only cache management method I’m finding is to set expiration times in app.yaml, which seems suboptimal.
Nick –
Unfortunately, for static content all you can do is set expirations, unless you route static files through your application, which I wouldn’t recommend.
For dynamic content, however, you should make sure to generate ETags and expiration times for as much stuff as possible, for example user avatars. What you then do is compare the ETag the browser sends you with the one you’ve generated, and return a 304 if they match. I’d recommend caching the generated ETag but not the actual content. Caching the content is not worth it; simply pull it out of the datastore when it’s actually needed, in other words when the ETags don’t match or the browser doesn’t provide one.
My personal experience tells me that setting expiration and ETag works best for caching; the other options are not worth the bother.
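For anyone who wants to see the ETag comparison described above in code, here is a minimal, framework-agnostic sketch in Python. It is not Thomas's code or an App Engine API: the names `make_etag` and `conditional_response`, the SHA-1 hash, and the 600-second `max-age` are all assumptions made for illustration, and it only handles a single ETag in `If-None-Match` (no `*`, no comma-separated lists, no weak validators).

```python
import hashlib


def make_etag(content: bytes) -> str:
    """Build a quoted ETag from the content (hypothetical helper).

    The idea is to compute this once, e.g. when an avatar is uploaded,
    and cache the short string rather than the content itself.
    """
    return '"' + hashlib.sha1(content).hexdigest() + '"'


def conditional_response(if_none_match, cached_etag, load_content):
    """Return a (status, headers, body) tuple for a conditional GET.

    if_none_match: value of the browser's If-None-Match header, or None.
    cached_etag:   the ETag generated earlier for this resource.
    load_content:  zero-argument callable that fetches the body (for
                   example from the datastore); it is called only when
                   the ETags do not match or none was sent.
    """
    headers = {
        "ETag": cached_etag,
        # Assumed expiration of 10 minutes; tune per resource.
        "Cache-Control": "public, max-age=600",
    }
    if if_none_match is not None and if_none_match.strip() == cached_etag:
        # Browser copy is current: skip the expensive load entirely.
        return 304, headers, b""
    return 200, headers, load_content()
```

In a real handler you would read `If-None-Match` from the request headers, look up the stored ETag, and pass a function that pulls the image out of the datastore as `load_content`. The point of passing a callable is that a matching ETag never touches the datastore at all.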
On the contrary, it is your website that is broken! You might want to check IE7 compatibility.
Thanks, Thomas, will try that out. Appreciate the advice!
So, you mean that any domain that points to Google App Engine results in
Server Not Found
404 Error
Right? Because that’s the problem I am facing right now. I cannot find what’s wrong. Do you guys experience the same thing?