Disk is the New Tape

I came across this gem at Data Center Knowledge, which reports on Twitter’s plans to move into its own data center, having outgrown the managed hosting services it uses at NTT America. The article includes a great slide deck by Twitter’s John Adams entitled Scaling Twitter, which was presented at this week’s Chirp 2010 conference (sadly, I was unable to attend). There’s a ton of great stuff in here detailing some of the techniques, tools and technologies (including current darlings like Kestrel and Cassandra) that Twitter has used to scale its service in the face of 752% growth in 2008 followed by further growth in 2009, a feat somewhat akin to upgrading a jet engine in flight.

But my favorite slide in the presentation is the one entitled “Disk is the new Tape”, which refers to the heavy I/O challenges that social graph applications face. Disk is just way too slow for most Web 2.0 applications, which means apps need lots of RAM and must minimize disk access wherever they can in order to provide reasonable (sub-500ms) response times.
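To make the "keep it in RAM" idea concrete, here is a minimal sketch of a read-through cache with least-recently-used eviction. It is not taken from the Twitter deck: the class name `ReadThroughCache`, the `fake_disk_lookup` function and the tiny capacity are all hypothetical, chosen only to show how repeat reads can be served from memory so that only misses reach the slow store.

```python
from collections import OrderedDict


class ReadThroughCache:
    """Tiny in-memory LRU cache sitting in front of a slow backing store."""

    def __init__(self, load_from_disk, capacity=1024):
        self._load = load_from_disk   # hypothetical slow lookup (disk, database)
        self._capacity = capacity
        self._items = OrderedDict()   # insertion order doubles as recency order
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self._items:
            self._items.move_to_end(key)      # mark as most recently used
            self.hits += 1
            return self._items[key]
        self.misses += 1
        value = self._load(key)               # only a miss touches the slow path
        self._items[key] = value
        if len(self._items) > self._capacity:
            self._items.popitem(last=False)   # evict the least recently used entry
        return value


disk_reads = []


def fake_disk_lookup(user_id):
    # Stand-in for a slow disk read; records every call so we can count them.
    disk_reads.append(user_id)
    return {"user_id": user_id, "followers": []}


cache = ReadThroughCache(fake_disk_lookup, capacity=2)
for user_id in [1, 2, 1, 1, 3, 2]:
    cache.get(user_id)

print(cache.hits, cache.misses, disk_reads)
```

With a capacity of only two entries, the repeated reads of user 1 are served from memory, while user 2 has to be reloaded after being evicted. That is the trade-off in miniature: the more of the working set that fits in RAM, the fewer requests ever reach the disk.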

Clearly, smart software like Kestrel and Cassandra, built from the ground up to run in highly distributed environments, has enabled the building of apps at internet scale. But the slide also suggests that server (and data center) architectures must evolve over time too: moving hard drives out of the critical path (perhaps transitioning to SSDs for non-volatile storage as costs fall?) and thereby relegating hard disks to offline archival storage, a fate met by tape drives years ago.