Over the years, Twitter has become increasingly popular with indie game developers. At some point, the hashtag #ScreenshotSaturday was invented to tag screenshots of game developers' work in progress each week. The hashtag is a great way to get a sneak peek at what people are cooking up, and a great way to discover creators and games that you might not otherwise have heard of.
As the tag gained popularity, the indie community created a couple of websites that parse the Twitter feed and showcase the images being posted: screenshotsaturday.com, created by Pekuja, and screenshotsaturday.frogames.com, created by Mathieu of Frogames. There may be more that I'm not aware of.
As an avid user of Google Reader, however, I thought it was a shame that neither site had a working RSS feed. I had been using the RSS feed from Twitter directly, but this of course only gave me a list of links to images. What I was looking for was a feed of the images themselves, so that I could browse the wonderful eye-candy from within Google Reader.
So, I decided to scratch that itch myself and put together a crawler to generate the XML feed I was after. You can find the result here:
The crawler is a fairly simple Python script which runs every hour on Friday, Saturday and Sunday each week (because there are always late posts). On each run it requests the list of #ScreenshotSaturday tagged tweets from the Twitter API, using Python's handy urllib2 module.
Next it iterates over the tweets, discarding those which have no link or can be identified as retweets. If a link points directly to an image, or a known image host, the image link is extracted and added to a list. Otherwise a request is made to the link URL to check for a redirect, and the new URL is evaluated.
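To make the filtering step concrete, here is a minimal sketch in Python 3. It is not the crawler's actual code: the tweet dictionary shape (`text` and `links` keys), the retweet heuristic, the extension list and the host name `images.example.com` are all assumptions made for illustration.

```python
from urllib.parse import urlparse

# Assumed values for illustration; the real crawler's lists are not shown in this post.
IMAGE_EXTENSIONS = (".png", ".jpg", ".jpeg", ".gif")
KNOWN_IMAGE_HOSTS = {"images.example.com"}  # hypothetical image host


def is_retweet(tweet):
    """Guess whether a tweet is a retweet from its text (a simple heuristic)."""
    text = tweet["text"]
    return text.startswith("RT @") or " RT @" in text


def classify_link(url):
    """Return 'image' for a direct image link, 'host' for a known image host,
    or 'unknown' when the link still needs a redirect check."""
    parsed = urlparse(url)
    if parsed.path.lower().endswith(IMAGE_EXTENSIONS):
        return "image"
    if parsed.netloc.lower() in KNOWN_IMAGE_HOSTS:
        return "host"
    return "unknown"


def select_tweets(tweets):
    """Drop tweets with no link and likely retweets; classify the first link of the rest."""
    selected = []
    for tweet in tweets:
        links = tweet.get("links", [])
        if not links or is_retweet(tweet):
            continue
        selected.append((tweet, classify_link(links[0])))
    return selected
```

Links classified as `unknown` are the ones that would go on to the redirect check described above.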
Because each image host works differently, the logic for determining the image link from the URL has to be implemented separately for each host. The list of supported image hosts, at the time of writing, stands at:
Many of these image hosts have an API from which image URLs can be obtained. For others, one has to resort to web scraping. For scraping, I use html5lib to parse the page HTML into a DOM tree, and py-dom-xpath to extract the link using an XPath expression.
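One way to organise the per-host logic is a lookup table from hostname to extractor function, sketched below in Python 3. To keep the example self-contained it swaps in the standard library's `xml.etree.ElementTree` (which supports a limited XPath subset) in place of html5lib and py-dom-xpath, and parses a well-formed snippet rather than real-world HTML. The host names, the `main-image` id and the URL rewriting rule are hypothetical.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse


def extract_by_rewrite(url, page_tree):
    """Hypothetical host whose image URL can be derived from the page URL alone."""
    return url.rstrip("/") + "/full.png"


def extract_by_xpath(url, page_tree):
    """Hypothetical host that needs scraping: find the main <img> in the parsed page."""
    if page_tree is None:
        return None
    node = page_tree.find(".//img[@id='main-image']")
    return node.get("src") if node is not None else None


# One entry per supported host; each host gets its own extraction logic.
EXTRACTORS = {
    "pics.example.com": extract_by_rewrite,
    "shots.example.org": extract_by_xpath,
}


def image_url_for(url, page_tree=None):
    """Dispatch to the extractor for the link's host, or return None if unsupported."""
    host = urlparse(url).netloc.lower()
    extractor = EXTRACTORS.get(host)
    if extractor is None:
        return None
    return extractor(url, page_tree)
```

Adding support for a new host then means writing one function and adding one entry to the table.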
After a bit of subclassing in urllib2, I was able to stop it from automatically following redirects, giving me a chance to check the robots.txt of each domain before making further requests to it (like a responsible web user!). The robotparser module in the standard library was perfect for this.
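The sketch below shows roughly what this can look like in Python 3, where urllib2 has become `urllib.request` and robotparser has become `urllib.robotparser`. It is an illustration under those assumptions, not the script's real code; the helper names `resolve_redirect` and `robots_allow` and the user agent string are made up for the example.

```python
import urllib.error
import urllib.request
import urllib.robotparser


class NoRedirectHandler(urllib.request.HTTPRedirectHandler):
    """Redirect handler that declines to follow any redirect."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        # Returning None means the redirect is not followed; urllib then
        # raises HTTPError for the 3xx response instead.
        return None


def resolve_redirect(url, opener):
    """Return the redirect target of `url`, or None if it does not redirect."""
    try:
        opener.open(url, timeout=10).close()
    except urllib.error.HTTPError as err:
        if err.code in (301, 302, 303, 307, 308):
            return err.headers.get("Location")
        raise
    return None


def robots_allow(robots_lines, user_agent, url):
    """Check a URL against robots.txt content supplied as a list of lines."""
    parser = urllib.robotparser.RobotFileParser()
    parser.parse(robots_lines)
    return parser.can_fetch(user_agent, url)


# Opener whose default redirect handler is replaced by the subclass above.
opener = urllib.request.build_opener(NoRedirectHandler)
```

With this in place, the crawler can take the target returned by `resolve_redirect`, fetch that domain's robots.txt, and only request the target if `robots_allow` says so.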
Finally, I use PyRSS2Gen to format the image tweets into an RSS XML document, and write it out to be served to the web.
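For readers who want to see the shape of the output, here is a rough equivalent in Python 3 that builds an RSS 2.0 document with the standard library's `xml.etree.ElementTree` instead of PyRSS2Gen. The entry fields (`author`, `tweet_url`, `image_url`, `posted`) and the choice to embed the image as an `<img>` tag in the item description are assumptions for illustration.

```python
import html
import xml.etree.ElementTree as ET
from email.utils import format_datetime


def build_feed(title, link, description, entries):
    """Build an RSS 2.0 document (as a string) with one item per image tweet."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = description

    for entry in entries:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = entry["author"]
        ET.SubElement(item, "link").text = entry["tweet_url"]
        # Embed the image itself so feed readers display it inline.
        img_tag = '<img src="%s" />' % html.escape(entry["image_url"], quote=True)
        ET.SubElement(item, "description").text = img_tag
        # RSS expects RFC 822 style dates; `posted` must be a timezone-aware datetime.
        ET.SubElement(item, "pubDate").text = format_datetime(entry["posted"])

    return ET.tostring(rss, encoding="unicode")
```

The returned string can then be written to a file and served as the feed.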
And There You Have It
Maybe someone else will find the feed useful - who knows. But certainly I've satisfied my own desire for directly-syndicated visual inspiration, and learnt a bit about web scraping in Python at the same time.