20 Aug 2008

[Image: Cotton Candy Corral Reef]

The cool thing about user-submitted content is that you can't always predict what you're going to get. Our speakers at the Singularity Web Conference, for example, submit and update their own bios and session descriptions on the site. Yesterday, I noticed that Dr. Woohoo had put up an image of one of his awesome generative artworks in his session description.

Of course, since I hadn't considered images in session descriptions, this had the side effect of breaking the layout of the sessions page.

(In case you're wondering, yes, this is the way I like to work. Instead of over-engineering things, I like to see how people actually use them and then evolve them to meet people's needs.)

So tonight I wrote a bit of code to massage and tame how images in session descriptions are displayed and I thought I'd share it with you in case it helps anyone else. (Another, more complicated way to go about things would have been to grab the images using urlfetch, store them in the datastore, and resize them via the image API -- but that would have been overkill for my needs.)

# Copyright (c) 2008 Aral Balkan, Singularity Web Conference
# http://www.singularity08.com
# Released under the open source MIT license.
 
import logging
import re

from markdown import Markdown
 
image_tag_width_re = r'(?P<img><img.*?width=")(?P<width>\d*?)"'
image_tag_height_re = r'(?P<img><img.*?height=")(?P<height>\d*?)"'
image_tag_re = r'(<img)(.*?)>'
image_tag_src_re = r'<img.*?src="(.*?)"'
image_tag_alt_re = r'<img.*?alt="(.*?)"'
 
image_tag_width_rc = re.compile(image_tag_width_re)
image_tag_height_rc = re.compile(image_tag_height_re)
image_tag_rc = re.compile(image_tag_re)
image_tag_src_rc = re.compile(image_tag_src_re)
image_tag_alt_rc = re.compile(image_tag_alt_re)
 
IMAGE_SAFE_WIDTH = 160.0
 
def massage_images(html):
	"""Helper: Alters dimensions of any images in the passed HTML to make them safe for the site's design."""

	def massage_image(match):
		# Handle one image tag at a time so that several images in the same
		# description don't pick up each other's dimensions, styles, or links.
		tag = match.group(0)

		width_match = image_tag_width_rc.search(tag)
		if width_match is None or not width_match.group('width'):
			# No usable width attribute: leave the tag as it is.
			return tag

		original_width = int(width_match.group('width'))
		height_match = image_tag_height_rc.search(tag)

		# We can only maintain the aspect ratio if the tag also has a height
		# (and a non-zero width to divide by).
		if height_match and height_match.group('height') and original_width > 0:
			original_height = int(height_match.group('height'))
			new_height = int(IMAGE_SAFE_WIDTH*original_height/original_width)

			# Substitute the new height
			tag = image_tag_height_rc.sub(r'\g<img>'+str(new_height)+'"', tag)
			logging.info(tag)

		# Substitute the new width (160px, so as not to break the layout)
		tag = image_tag_width_rc.sub(r'\g<img>'+str(int(IMAGE_SAFE_WIDTH))+'"', tag)

		# Look up src and alt before the tag gets wrapped in a link
		src_match = image_tag_src_rc.search(tag)
		alt_match = image_tag_alt_rc.search(tag)

		# Add float:left and slight margin so that text flows around the image
		tag = image_tag_rc.sub(r'\1 style="float:left; margin-right:.5em;" \2>', tag)

		# Finally, add a link to the original image if people want to see it larger
		if src_match:
			alt = alt_match.group(1) if alt_match else ''
			tag = '<a href="'+src_match.group(1)+'" title="'+alt+'">'+tag+'</a>'

		return tag

	# Run the helper once per image tag found in the HTML
	return image_tag_rc.sub(massage_image, html)
 

There are a couple of basic but helpful regular expressions in there and you might find the snippet useful if you want to manipulate image tags generated from user submitted content.
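To see what those regular expressions actually capture, here is a small standalone sketch. The patterns are copied from the snippet; the sample HTML and its URL are made up for illustration, and this is only a quick demonstration rather than part of the site's code.

```python
import re

# Patterns copied from the snippet above.
image_tag_width_rc = re.compile(r'(?P<img><img.*?width=")(?P<width>\d*?)"')
image_tag_src_rc = re.compile(r'<img.*?src="(.*?)"')
image_tag_alt_rc = re.compile(r'<img.*?alt="(.*?)"')

# Hypothetical input: one image tag with explicit dimensions.
SAMPLE_HTML = '<p><img src="http://example.com/reef.png" alt="Reef" width="640" height="480" /></p>'

# With several groups in the pattern, findall() returns one tuple per match:
# (everything up to the opening quote of the width value, the width digits).
widths = image_tag_width_rc.findall(SAMPLE_HTML)

# With a single group, findall() returns just the captured strings.
srcs = image_tag_src_rc.findall(SAMPLE_HTML)
alts = image_tag_alt_rc.findall(SAMPLE_HTML)

# Named groups can be reused in the replacement via \g<name>, which is how
# the width is swapped while the rest of the tag is left alone.
resized = image_tag_width_rc.sub(r'\g<img>160"', SAMPLE_HTML)

print(widths[0][1], srcs, alts)
print(resized)
```

Note that these patterns assume double-quoted attributes and tags that sit on a single line, which is fine for Markdown output but not for arbitrary HTML.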

Oh, and before I forget, Dr. Woohoo is going to be talking about Generating Artwork at the Singularity Web Conference. Check out his bio and session and the other sessions at the conference.

(You can find out more about Dr. Woohoo on his web site and take a look at his latest book, Color Visualizations: Exploring the Circle, vol 02.)

If you haven't booked your ticket for Singularity yet, hurry, as the $99 early bird discount ends at the end of this month.


Dr. Woohoo, Generating Artwork, and some Python code to massage user submitted content (specifically, images).

  1. Not over-engineering: good, even great.
    Allowing unescaped user input: bad

    Thinking about and anticipating user input is not over-engineering — it’s good practice. Essentially, if you were not expecting images in a description, he shouldn’t have been able to post one. This means that you’ve left yourself open to XSS attacks. Sure, your speakers *probably* aren’t going to misuse your website, but the number one rule of thumb is never ever trust user input. Ever. A nice side-effect of this is that your layout rarely gets broken (you still have to look out for non-breaking-lines-such-as-this-one-if-you-know-what-i-mean-right?)

    In this case, the more ‘complicated’ way is actually the right way. If you’re serving images, ideally you are serving them from YOUR servers, or from trusted servers. Again, the nice side effect is you can manipulate them using a server-side image API, and deliver a faster and better looking thumb.

    I realize it’s not worth it for this obvious edge case, but something to think about.

    rajbot
  2. Hi rajbot,

    It’s not unescaped input. It accepts Markdown — I didn’t think that people would put images in there but they could :)

    And Django escapes everything by default as of 0.97 unless you specifically mark it as |safe.

    Aral
  3. We use a similar system at work, since our clients use our own custom bilingual CMS to update content on their sites. Chances are that they will put up an image that is way too big and break the layout, so we use .NET to resize the images to fit the exact sizes we specify.

    The Django method looks nice, but as we do a lot of public sector work we use .NET on our back end. Thankfully, as I just design and integrate CSS, I don’t touch that crazy code :)

    Johnny