Ever hear about the Library of Congress’s promise to archive every public tweet in the world from 2006 onwards? Heh. As one could expect, it’s running a little behind on that promise. For example: it takes the library roughly 24 hours to search for a tweet posted anywhere between 2006 and 2010.
Surprisingly, that admission comes from a report by the Library of Congress itself, outlining why, exactly, the archive isn’t available and what the library is doing about it.
Despite nearly 400 requests by researchers, the library still hasn’t given anyone access to the archive in a useful manner. One reason is Twitter: the company’s contract prevents the library from making the archive widely accessible to the public, or easily downloadable, according to BuzzFeed.
Another reason is that the library, which is typically tasked with preserving an immense amount of physical data (its Virginia campus is the largest library in the world by shelf space alone), is having a hard time storing digital information.
The official line suggests that making the information relatively easy to access would require an expensive infrastructure of servers. High-traffic instant searches of the Twitter archive, the library suggests, would be cost-prohibitive. But the library has already spent “tens of thousands” of dollars on the project, according to a Washington Post report, and the cost line may be a little suspect: the library has requested $643.5 million for its 2013 budget.
Regardless, the library has recruited Gnip, a Colorado data company, to process tweets and their metadata from around the world. The metadata, according to the Washington Post, will record where the tweet came from, who sent it, who follows the poster, and how many times it was retweeted, among other things. The archive, however, won’t store photos, videos, or links attached to a tweet.
But it’s not all doom and gloom for a massive Twitter archive: the work of establishing a system to store and organize tweets ought to be finished by the end of January 2013, as should the acquisition of all public tweets posted between 2006 and 2010. The library is also, perhaps most impressively, on track to preserve the ongoing stream of new tweets, which currently numbers upwards of 400 million a day.