UPDATE, 2013-05-05, 5:56pm Pacific:
STOP RETWEETING THIS LIKE IT’S NEWS. THIS IS SPECULATION. I ASKED IF SOMEONE WHO KNOWS BETTER COULD CHECK MY WORK.
I’m getting a lot of conspiracy theorist followers retweeting this uncritically, and not very much informed comment on the calculation I made.
On second thought, I don’t think I’m really qualified to wade into a debate that attracts this kind of attention. I’m afraid that someone’s going to quote me as an authority, but I do software; I don’t build data centers or run operations. I’m leaving it up so you don’t think “the man” got to me.
The following is a draft post based on napkin calculations and may be completely wrong. I posted it on the draft version of my blog for people to check the math. Please reply to me on Twitter or ten.klien@klien with corrections/confirmations.
Some people read stories like “Are all telephone calls recorded and accessible to the US government?” and dismiss them out of hand: for one, the headline ends in a question mark, and for another, it seems to them that this would simply be beyond the government’s abilities.
But let’s find out! Let’s look at the cost of storing all these calls.
Americans use about 2.3 trillion minutes on cell phones per year, which is about 378 billion seconds per day. I didn’t find a good stat for landline usage, so let’s double it to 750 billion seconds per day.
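For anyone checking the math, here is the unit conversion above as a small Python sketch. The 2.3 trillion figure is the one quoted in this post; the 365-day year and the doubling for landlines are my own rough assumptions, and the variable names are just for illustration.

```python
# Napkin conversion: cell minutes per year -> seconds per day.
MINUTES_PER_YEAR = 2.3e12   # figure quoted in the post
SECONDS_PER_MINUTE = 60
DAYS_PER_YEAR = 365         # assumption: ignore leap years

cell_seconds_per_day = MINUTES_PER_YEAR * SECONDS_PER_MINUTE / DAYS_PER_YEAR

# Assumption from the post: landlines roughly double the total.
total_seconds_per_day = 2 * cell_seconds_per_day

print(f"cell only:       {cell_seconds_per_day:.3g} seconds/day")
print(f"cell + landline: {total_seconds_per_day:.3g} seconds/day")
```

The doubled figure comes out slightly above 750 billion, so rounding down to 7.5e11 changes very little.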
The open source codec Speex can encode voice at 1400 bits per second.
This gives us:
7.5e11 seconds/day * 1.4e3 bits/second = 1.05e15 bits/day
1.05e15 bits/day / 8 bits/byte = 1.3125e14 bytes/day
1.3125e14 bytes/day / 1e12 bytes/terabyte = 131.25 terabytes/day
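The same arithmetic as a Python sketch, so the steps can be rerun with a different bitrate or call volume. The inputs are the numbers used in this post, not measured values, and a terabyte here is taken as 1e12 bytes.

```python
# Napkin estimate of raw audio volume per day.
SECONDS_PER_DAY = 7.5e11    # call-seconds per day, from the post
BITS_PER_SECOND = 1.4e3     # codec bitrate assumed in the post
BITS_PER_BYTE = 8
BYTES_PER_TERABYTE = 1e12   # decimal terabyte

bits_per_day = SECONDS_PER_DAY * BITS_PER_SECOND
bytes_per_day = bits_per_day / BITS_PER_BYTE
terabytes_per_day = bytes_per_day / BYTES_PER_TERABYTE

print(f"{terabytes_per_day:.2f} terabytes/day")
```

Because the result scales linearly with the bitrate, a codec that needs twice the bits per second simply doubles the daily volume.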
How much does that storage cost? Cloud storage is getting crazy cheap: even if you outsourced it to Amazon, their lowest-priced archival storage would cost just $0.01 per gigabyte per month.
131.25 TB/day * $10.24/TB = $1344.00/day
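Here is that cost step as a Python sketch. It follows the conversion used above, 1024 gigabytes per terabyte, which is why the price per terabyte comes out at $10.24 rather than $10.00. The price is the one quoted in this post and the names are illustrative.

```python
# Napkin estimate: cost of storing one day's worth of calls.
TERABYTES_PER_DAY = 131.25      # daily volume computed in the post
DOLLARS_PER_GIGABYTE = 0.01     # archival price quoted in the post
GIGABYTES_PER_TERABYTE = 1024   # conversion used in the post

dollars_per_terabyte = DOLLARS_PER_GIGABYTE * GIGABYTES_PER_TERABYTE
cost_per_day_of_calls = TERABYTES_PER_DAY * dollars_per_terabyte

print(f"${dollars_per_terabyte:.2f} per terabyte")
print(f"${cost_per_day_of_calls:.2f} per day of recorded calls")
```

Using 1000 gigabytes per terabyte instead would lower the result by about 2%, which is well inside the error bars of everything else here.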
The actual price would be lower - the government can splurge on a better codec, and can probably achieve similar (or better) economies of scale.
I’m surprised by this result myself, but it looks right. I did the calculation another way with different sources, and got a very similar result.
Now, this is just the storage cost for a certain amount of time. It doesn’t count all the other costs of encoding and infrastructure and searching this information. But it’s really well within the government’s capabilities. Google does far more impressive things with search, and Facebook eats 50 TB of photos and videos per day.
And the federal government, while less nimble, has far more resources than the top ten websites put together. You might have needed to be Google to do this in the early 2000s, but these days cloud storage companies are a dime a dozen, and they can use off-the-shelf or open source technology.
So if that’s what private companies can do, what would you expect the government to be able to do?