Why are some things just much harder than they need to be? Finding a current, easy-to-use data source for populating a country selection box, for example.
Most of the clean country data out on the web is second-hand, having at one point in time been scraped and tediously hand-converted from the International Organization for Standardization (ISO)'s page of English country names and code elements.
That list "states the country names (official short names in English) in alphabetical order as given in ISO 3166-1 and the corresponding ISO 3166-1-alpha-2 code elements" of 246 countries and is free to use.
Unfortunately, it's an HTML table. Which means that you have some manual data massaging to do if you want to use it. Sucks big time.
ISO also sells the ISO 3166 "database" for about US$148. And it comes in -- get this -- Microsoft Access format. I bet we've all got a copy lying around somewhere, right? Oh wait, it's not 1999! So they want to charge you about 60 cents per country name? Not acceptable.
So what's a developer to do? Scrape the site? Write his one-line regular expression to pull out the data? Solve his own problem and move on?
Nah, let's make things better so that other developers don't lose precious time having to jump through ridiculous hoops! :)
Et voilà, I present to you Open Country Codes.
This simple and deliberately unstyled little app gives you the ISO 3166-1 country names and their ISO 3166-1-alpha-2 country codes in a variety of ready-to-use formats. There's an HTML country selection box that you can drop into your page, pure data structures in Python, JavaScript and ActionScript, and a CountryComboBox component for Flex.
There are also JSON and XML feeds that you can use in your applications.
The application pulls in a fresh copy of the list every day and the code samples are dynamically generated.
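To give a feel for what the "ready-to-use" formats amount to, here is a minimal sketch in Python of turning a list of (code, name) pairs into an HTML selection box. It is not the app's actual code: the sample data, the function name render_country_select and the field name "country" are all assumptions made for illustration.

```python
from html import escape

# Hypothetical sample data: (alpha-2 code, English short name) pairs.
SAMPLE_COUNTRIES = [
    ("FR", "France"),
    ("DE", "Germany"),
    ("CI", "Côte d'Ivoire"),
]


def render_country_select(countries, field_name="country"):
    """Return an HTML <select> element with one <option> per country."""
    lines = ['<select name="%s">' % escape(field_name, quote=True)]
    for code, name in countries:
        # Escape both values so names containing &, < or quotes stay valid HTML.
        lines.append(
            '  <option value="%s">%s</option>'
            % (escape(code, quote=True), escape(name))
        )
    lines.append("</select>")
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_country_select(SAMPLE_COUNTRIES))
```

Escaping matters here because some official names contain apostrophes and other punctuation.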
I hope you find this useful. If you have any suggestions for improving it, please leave me a comment here.
The Open Country Codes: ISO 3166 country names and Alpha-2 country codes in HTML, Python, JavaScript, ActionScript, Flex, JSON, and XML article by Aral Balkan, unless otherwise expressly stated, is licensed under a Creative Commons Attribution-Noncommercial 2.0 UK: England License.
excellent – that’s really useful! :-D
This is really handy Aral, thanks a lot!
(Note: the very picky person that I am noticed that on the Python page you’d really want to say “As a list of dictionaries”, but, yeah, whatever…)
Good to see you playing with Django btw!
Aral,
Thanks for this, I’ve scraped that page numerous times myself so it’s comforting to know I’ll never have to bother again.
One minor issue I noticed is that your Title Casing of the country names is a bit off: “Virgin Islands, U.s.” should of course be “U.S.” and there’s debatable casing of “And” and “Of”.
Maybe you’ll have better results with Gruber’s style?
Excellent. Hoping to port this to a few modules that Drupal uses. Thanks :D
Any hope that we might see a Web Services enabled, or better yet (from my perspective) a Flash Remoting enabled version of the “feeds?”
Thank you so, so much! This is excellent.
Thanks…very useful – but what about alpha-3? Hint, hint…
If anyone is looking for languages, I made up a Flex comboBox component for ISO 639-1 language codes. It’s based off a similar component made by Peter Elst. Demo and source available here: http://dansbigwebpage.com/blog/downloads/
-Dan
Just to let you know, I can’t comment on your site using Firefox (Linux) so I’m using Opera. Here’s my user agent:
“Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.1.14) Gecko/20080418 Ubuntu/7.10 (gutsy) Firefox/2.0.0.14”
Thank you so, so much!
@Cyril: Fixed, thanks for the heads up — and Django rocks big time! :)
@James: Ah, I’m going to look into the capitalization issue. I’m using my own implementation that handles embedded unicode in the form \uNNNN correctly (neither the built-in string.title() nor Gruber’s handles this well; they both capitalize the U). I will look through Gruber’s again and implement some of the niceties there while maintaining the unicode support.
@Dru: I might add new formats as time goes on. Adding Flash Remoting support is trivial with pyamf. Not sure about web services (just haven’t done anything with them in Python) but I’m assuming it’ll be one import and a couple of lines of code (like most things in Python) :) I am concentrating on getting the new Singularity web site up this week, though, so don’t expect an immediate update on this front.
@JB: Can you give me a real-world use-case for alpha-3? I’m not against implementing it, I just want to know where it would be used (everything I’ve seen on the Net pretty much seems to use alpha-2.)
@Nathan: Sorry about that — not sure why you’re experiencing it. I wonder if the theme I’m using on the blog has something to do with it. Must port blog to Python, must port blog to Python (2009?) :) If you can narrow this down at all, please do let me know.
@atom, @Ashe, @jnakai, @alftuga: You’re very welcome — I’m glad you’re finding it useful :)
James, I ended up hacking my own implementation to get the title-case right. Thank you so much for the heads up.
It should be correct now — please let me know if you still notice any issues.
Nice one Aral this is very handy.
Minor point: in Python a tuple of tuples would work well for data like this (which is unlikely to be changed at runtime) as it’s a more efficient use of memory.
Hey Stuart,
Cool, I added tuple of tuples to the Python view. Thanks for the tip :)
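For readers comparing the two Python shapes mentioned in this thread, here is a small sketch of the same data as a list of dictionaries and as a tuple of tuples. The sample entries and the key names "code" and "name" are assumptions; the site's actual output may use different keys.

```python
# Hypothetical sample entries; the key names are assumed for illustration.
COUNTRIES_AS_DICTS = [
    {"code": "FR", "name": "France"},
    {"code": "DE", "name": "Germany"},
]

# The same data as an immutable tuple of (code, name) tuples.
COUNTRIES_AS_TUPLES = tuple(
    (entry["code"], entry["name"]) for entry in COUNTRIES_AS_DICTS
)


def names_by_code(pairs):
    """Build a code -> name dictionary from (code, name) pairs."""
    return dict(pairs)
```

A sequence of two-item tuples is also the shape Django expects for a field's choices, so the tuple form can be passed there directly.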
Hey Aral
How did you get a Google app to ‘refresh’ once a day?
I’ve been thinking about using App Engine for an app that would need to query an RSS feed once an hour and then generate database entries based on that feed.
cheers
Edd
Hi Edd,
I use memcache to cache the results with the refresh set to one day.
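The idea of caching with a one-day expiry can be sketched without any App Engine code. The class below is a plain in-process stand-in, not the memcache API and not the app's source. The name TimedCache and the fetch and clock parameters are assumptions made so the example runs anywhere.

```python
import time

ONE_DAY_SECONDS = 24 * 60 * 60


class TimedCache:
    """Cache the result of `fetch` and refresh it after `ttl_seconds`."""

    def __init__(self, fetch, ttl_seconds=ONE_DAY_SECONDS, clock=time.time):
        self._fetch = fetch              # callable that loads fresh data
        self._ttl_seconds = ttl_seconds  # how long a cached value stays valid
        self._clock = clock              # injectable so tests can fake time
        self._value = None
        self._expires_at = None          # None means nothing is cached yet

    def get(self):
        """Return the cached value, re-fetching it if it has expired."""
        now = self._clock()
        if self._expires_at is None or now >= self._expires_at:
            self._value = self._fetch()
            self._expires_at = now + self._ttl_seconds
        return self._value
```

Note that nothing runs on a schedule here: the refresh happens on the first request after the cached copy expires. For Edd's hourly RSS case the same pattern would apply with ttl_seconds set to 3600.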
This is great. I’ve scraped wikipedia for similar content in the past, but this makes it way easier.
The only feature you’re missing is the ability to pass in a locale to get language-specific translations of the country names.
Noel
Hi Aral,
Good thinking. Thanks for taking the time out of your busy schedule to do this for the development community.
Adrian
Aral,
Well, seeing as you asked, here are the edge cases I spotted; there are no doubt more. Title casing English is stupidly complex, but introduce foreign country names and it just gets silly!
Heard Island and Mcdonald Islands (McDonald)
RÉunion (Réunion)
Saint BarthÉlemy (Barthélemy)
CÔte D’ivoire (Côte d’Ivoire)
Congo, the Democratic Republic of the
Macedonia, the Former Yugoslav Republic of (for these two the first “The” should be capitalised).
Is the source of the scraper open sourced?
Hey James,
Thanks so much for taking the time to send me these.
The erroneous capitalization of É, Ô, etc. was because I was using string methods in certain places instead of unicode e.g., string.lower() instead of unicode.lower(). I’ve now made sure that I’m using unicode everywhere.
I’ve special-cased the other three edge cases as I don’t have too much time to devote to making the titlecaser truly generic.
I’m going to put the source up as soon as I get a moment (I’ll update this thread when I do).
_runs off to continue working on the Singularity web site_
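Since the source has not been posted in this thread, the following is only a rough sketch of how the title-casing fixes discussed above could look, not the actual implementation. It is written for Python 3, where every str is Unicode, so the byte-string versus unicode pitfall does not arise in the same way. The small-word list, the special-case table and the rule of capitalising a small word that follows a comma are all assumptions drawn from the edge cases James listed.

```python
# Words kept lowercase unless they start the name or follow a comma (assumed rule).
SMALL_WORDS = {"and", "of", "the"}

# Hand-maintained exceptions, keyed by the lowercased word (assumed table).
SPECIAL_CASES = {
    "mcdonald": "McDonald",
    "u.s.": "U.S.",
    "d'ivoire": "d'Ivoire",
}


def title_case(name):
    """Title-case an upper-case country name, honouring the exceptions above."""
    words = name.lower().split()
    result = []
    for index, word in enumerate(words):
        follows_comma = index > 0 and words[index - 1].endswith(",")
        if word in SPECIAL_CASES:
            result.append(SPECIAL_CASES[word])
        elif word in SMALL_WORDS and index > 0 and not follows_comma:
            result.append(word)
        else:
            # Upper-case only the first character; accented letters work too.
            result.append(word[0].upper() + word[1:])
    return " ".join(result)
```

A table of exceptions is crude but predictable, which suits a list that changes rarely.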
By the way, I quickly added (only to the HTML, JS, AS, and Python views at the moment) the option to have the country names truncated by passing a length in the URL (e.g., http://opencountrycodes.appspot.com/html/33). Don’t bank on this syntax staying the same (especially if I’m going to add language support in the future, the URIs might be /html/en/10.)
Great post!
Comes in very handy for a country login thingie-majingy I need to finish, thanks a lot.
Murten Saerbi
thanks. super site
Hi,
Congrats! Big idea and great implementation!
In my company, I’m in charge of such a subject and we intend to internally implement a central service (maybe RDF-centric?) for all our applications.
Where do you go from here now?
What would be really cool is if you could also include the 3 letter codes, the 3 digit codes, and the currency code for each country. :)
Thanks for what you have done.
Why not a Java enum implementation as well?
I’ve got another vote for 3 letter country codes (commonly used on machine readable passports and using 3 characters allows the implementation to be more flexible for ‘made-up countries’ and future expansion).
PHP arrays (and different translations of the country names) would be brilliant!
This is amazing. I would like to see it go a step further and include 3166-2 so that state/province can be populated as well.
It looks like this has been broken for a number of months – all the feeds are empty.
Any chance of fixing it?
Great idea! However, I find the data structure you chose (e.g. for JSON) somewhat sub-optimal. Checking a value for validity means iterating over the whole array and checking each object’s key for the value. Why not use a flat object?
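The trade-off raised here can be sketched in a few lines of Python. The feed shape below (an array of objects with "code" and "name" keys) is a guess made for illustration, not the feed's documented format, and the function names are hypothetical.

```python
import json

# Hypothetical feed shape: an array of {"code": ..., "name": ...} objects.
FEED_JSON = (
    '[{"code": "FR", "name": "France"}, '
    '{"code": "DE", "name": "Germany"}]'
)


def is_valid_code_linear(entries, code):
    """Scan every entry until one matches: O(n) per check."""
    return any(entry["code"] == code for entry in entries)


def flatten(entries):
    """Convert the array into a flat code -> name mapping for direct lookups."""
    return {entry["code"]: entry["name"] for entry in entries}


entries = json.loads(FEED_JSON)
names_by_code = flatten(entries)
```

With the flat mapping a validity check is simply "DE" in names_by_code. One reason to keep the array form is that JSON arrays preserve order, which is convenient for an alphabetical selection box, whereas the members of a JSON object are unordered by specification.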