Expand those shortened URLs before archiving twitter messages

What if a shortening service goes down?

People love to talk about the implications of twitter.com going down, but what if a URL-shortening service goes down? When I had trouble getting to is.gd recently, I realized that whenever the service is down, tweets referencing is.gd URLs are worthless, and that it wouldn't be too difficult to do something about it before this happens. After all, if you're saving any tweets, why save them with a dependency on some potentially fly-by-night point of failure? (I have wondered, though: why doesn't twitter grab some short domain name and offer their own shortening service?)

My wrapShortenedURLs.py Python script, available at http://www.snee.com/xml/twclient/wrapShortenedURLs.py.txt, looks for URLs from five shortening services (defined in a list at the top of the script, in case you want to add others) and wraps each of those URLs in an HTML a element whose href attribute stores the URL that the shortened URL redirects to. For example, it will turn 'See http://is.gd/p3zb for Joseph Beuys fronting a bad German New Wave band' into 'See <a href="http://www.youtube.com/watch?v=DQ1_ALxGbGk">http://is.gd/p3zb</a> for Joseph Beuys fronting a bad German New Wave band'. (When I was writing the script, tweets with multiple shortened URLs were the difficult part; handling them required an upgrade to my skill with Python regular expression functions.)
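For readers who want to see the general idea without downloading the script, here is a minimal sketch in Python 3. It is not the code from wrapShortenedURLs.py. The host list, the function names, and the resolve parameter are all assumptions made for illustration, and the script's actual list of five services may differ. Passing a function to re.sub is one way to handle tweets with several shortened URLs, because each match gets replaced independently.

```python
import html
import re
import urllib.request

# Hypothetical list of shortener hosts (an assumption for this sketch;
# the list in the original script may differ). Add others as needed.
SHORTENER_HOSTS = ["is.gd", "tinyurl.com", "bit.ly"]

# Matches a URL on any of the hosts above, followed by a short path token.
SHORT_URL_RE = re.compile(
    r"https?://(?:"
    + "|".join(re.escape(host) for host in SHORTENER_HOSTS)
    + r")/[A-Za-z0-9_-]+"
)


def resolve_redirect(short_url):
    """Follow redirects over the network and return the final URL."""
    with urllib.request.urlopen(short_url, timeout=10) as response:
        return response.geturl()


def wrap_shortened_urls(text, resolve=resolve_redirect):
    """Wrap each shortened URL in text in an a element pointing at its target.

    resolve is any callable that maps a short URL to its destination.
    If it raises OSError (which covers urllib's network errors), the
    short URL is left as it was.
    """
    def wrap(match):
        short_url = match.group(0)
        try:
            target = resolve(short_url)
        except OSError:
            return short_url
        # Escape the target so characters such as & are valid in the attribute.
        return '<a href="%s">%s</a>' % (html.escape(target, quote=True), short_url)

    # re.sub calls wrap once per match, so several URLs in one tweet all get wrapped.
    return SHORT_URL_RE.sub(wrap, text)
```

Calling wrap_shortened_urls(tweet_text) with the default resolver makes one network request per shortened URL. For offline testing, a dictionary lookup can be passed as resolve instead.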

I've tested this with some XML pulled down using the twitter API and with a CSV file from tweetake.com, a service that lets you back up information you've stored on twitter, and it seems to work fine. I'll be using it with all my archived tweets from tweetake.com from now on, and if I ever write my own twitter archiving routine using the API, this will certainly be a part of it.

6 Comments

Sure seems like a good idea to me!


url shortening is probably turning out to be a bad idea and even though I have use/d it, completely agree


Now you've made me look at that truly horrible video...


fwiw, I wrote a small (and probably not very good) ruby script to expand tinyurls: http://planb.nicecupoftea.org/2009/02/02/expand-tinyurls-using-ruby/


for what reason would you archive twitter messages? the content is intended to be outdated after a day.

archive blogs!


Leo,

Of course I archive blogs. Some consider twitter to be a "microblog", and since I sometimes use it to mention interesting websites I've found, I like to archive those as well.

Many twitter messages are very ephemeral, and many aren't. I usually don't follow people who tweet things like "just finished breakfast", because I prefer the ones that say a little more. The people posting those may well consider them worth archiving.