Up until recently, I didn't really get regular expressions. Working with Django forced me to change that and I'm so thankful I took the time to delve deeper into them. These days, I pretty much use them on a daily basis and it shaves so much time off of some mundane tasks that I can't believe I ever got by without them. (They're especially helpful since TextMate has a regular expression engine built into its Find and Replace dialog.)
Today, I needed to repeat a capture group that could occur one or more times in a string and I kept getting just the last iteration. Some quick googling brought up a very informative page on Repeating a Capturing Group vs. Capturing a Repeated Group by the author of RegexBuddy.
It appears I was making a common mistake by repeating a capturing group instead of capturing a repeated group.
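To make the distinction concrete, here is a minimal sketch using a toy pattern and string. Both are made up for illustration and are not taken from the App Engine code. A quantifier placed on a capturing group keeps only the last iteration, while a group wrapped around the quantified expression keeps the whole run.

```python
import re

text = "abcabcabd"  # hypothetical sample string

# Repeating a capturing group: the group is overwritten on each iteration,
# so only the last repetition survives.
repeated_capture = re.match(r"(ab[cd])+", text)
last_only = repeated_capture.group(1)

# Capturing a repeated group: the outer group spans every repetition,
# and the inner (?:...) group does the repeating without capturing.
captured_repeat = re.match(r"((?:ab[cd])+)", text)
whole_run = captured_repeat.group(1)

print(last_only)   # abd
print(whole_run)   # abcabcabd
```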
The code in question parses Google App Engine datastore keys so that I capture the whole key path, including all ancestors. A sample string:
s = "datastore_types.Key.from_path('Parent', 1L, 'Child', 30L, _app=u'myapp')"
So my first attempt, the flawed one, was:
import re
r = r"datastore_types.Key.from_path\(('.*?', \d*?L, )+_app=u'.*?'\)"
rc = re.compile(r)
rc.match(s).groups()
>>> ("'Child', 30L, ",)
What I should have written, to capture the repeated group:
r = r"datastore_types.Key.from_path\((('.*?', \d*?L, )+)_app=u'.*?'\)"
rc = re.compile(r)
rc.match(s).groups()
>>> ("'Parent', 1L, 'Child', 30L, ", "'Child', 30L, ")
This returns two groups: the outer group (the repeated group, which is what we want) and the last iteration of the inner group (which we don't care about).
To optimize it further, you can make the inner group non-capturing. So the final version looks like this:
r = r"datastore_types.Key.from_path\(((?:'.*?', \d*?L, )+)_app=u'.*?'\)"
rc = re.compile(r)
rc.match(s).groups()
>>> ("'Parent', 1L, 'Child', 30L, ",)
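If the individual ancestors are needed rather than the path as one string, one possible follow-up (not part of the original code) is to run re.findall over the captured group. The helper name split_key_path and its inner pattern are assumptions made for this sketch.

```python
import re

s = "datastore_types.Key.from_path('Parent', 1L, 'Child', 30L, _app=u'myapp')"
r = r"datastore_types.Key.from_path\(((?:'.*?', \d*?L, )+)_app=u'.*?'\)"

def split_key_path(path):
    """Hypothetical helper: return (kind, id) pairs from a captured key path."""
    # Each pair looks like 'Kind', 123L so capture the quoted kind and the digits.
    return [(kind, int(num)) for kind, num in re.findall(r"'(.*?)', (\d+)L", path)]

key_path = re.match(r, s).group(1)
pairs = split_key_path(key_path)
print(pairs)  # [('Parent', 1), ('Child', 30)]
```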
I may be more comfortable with regular expressions, but there's still so much to learn! :)
Update: And, like most things, the actual solution I ended up going with is much simpler:
r = r'datastore_types.Key.from_path\((.*?), _app'
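Run against the sample string from earlier, the simpler pattern captures everything between the opening parenthesis and the first ", _app". A quick check (the variable name key_path is just for this sketch):

```python
import re

s = "datastore_types.Key.from_path('Parent', 1L, 'Child', 30L, _app=u'myapp')"
r = r'datastore_types.Key.from_path\((.*?), _app'

# re.match anchors at the start of the string; the lazy .*? expands only
# until ", _app" can match, so the capture stops before that separator.
key_path = re.match(r, s).group(1)
print(key_path)  # 'Parent', 1L, 'Child', 30L
```

Note that this version leaves off the trailing ", " that the earlier patterns kept at the end of the captured path.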
The Regular expressions: repeating a capturing group and making the inner group non-repeating article by Aral Balkan, unless otherwise expressly stated, is licensed under a Creative Commons Attribution-Noncommercial 2.0 UK: England License.

You probably know about this already, but I find this quite useful…
http://gskinner.com/RegExr/desktop/
Woot, that’s awesome, Keith — I didn’t know about it! :) Go, Grant! :)
Could someone explain exactly what is going on above, like the Regex tutorial does? I have read the REGEX tutorial about 3 times and am still lost when I look at someone's complex expression.
If you could email exactly how the expression works on the string at
steve_44@inbox.com
Here’s a native Mac regex app that I’ve found very useful: http://homepage.mac.com/roger_jolly/software/