How big is 5 Zettabyte?

Since Edward Snowden’s interview with the Guardian, the discussion about privacy, and about companies storing and sharing unencrypted private data, has been picking up. Americans in particular are worried about what it means for their national security and their private data. But that’s actually a naive view, given that the NSA stores data from all over the world.

In recent coverage on a rather tabloid-looking news station in the U.S., the interviewers are shocked to see that the NSA spies on “every American”.

This is a limited view of the world, and it fails to see the importance of spying on people outside the U.S., but let’s start with the technical side of things first. What data are they storing, and how big is their hard disk?

As you may know, the internet is a packet-switching network. Although people try to mess with the routing and sometimes even disconnect whole countries, it is hard to make a packet travel a certain route or to prevent it from traveling through a certain country. Much of the internet traffic passes through the U.S. regardless of its source and destination. But even if it didn’t, the NSA has that covered. Many companies, like Facebook and Google, have servers all over the world. Since Google and Facebook are companies registered in the U.S., the NSA could technically tap, indirectly, into one of their servers in a foreign country, even one containing data which never travelled through the U.S. in the first place.

According to the worldwide distribution of internet users, if the NSA wanted to store data just about Americans, they could have built a much smaller, simpler facility. But they didn’t, which makes you think. So how big is this facility exactly?

5 Zettabyte
If you’ve watched the coverage, you’ll have heard James Bamford say that the facility the NSA is building can hold enough physical machines to store around 5 Zettabyte of data, given current technology standards. It sounds like a lot, but how big is a Zettabyte really?

According to Wikipedia, 5 Zettabyte is 5,000,000,000,000,000,000,000 bytes. If, like me, you cannot get your head around a number with 21 zeroes in it, I’ll give you some of the calculations I made to understand just how big this is.

I started with the Internet Archive. Currently, the Internet Archive and its Wayback Machine hold 10 Petabytes of data (10,000,000,000,000,000 bytes). The servers at the Internet Archive crawl the Internet and store all the public data they can find in a searchable manner. They provide the public with that data, so you can do cool things like see how my blog looked 10 years ago. The Internet Archive is pretty big, you might think. With 5ZB, the NSA could store the complete Internet Archive 500,000 times.
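
For anyone who wants to check that ratio, here is a minimal sketch in Python. It assumes decimal (SI) units, which is what the byte counts above use; the variable names are just for illustration.

```python
# Decimal (SI) units, matching the byte counts quoted in the post.
PETABYTE = 10**15
ZETTABYTE = 10**21

nsa_capacity = 5 * ZETTABYTE        # capacity figure attributed to James Bamford
internet_archive = 10 * PETABYTE    # Internet Archive size quoted above

# How many complete copies of the Internet Archive fit in the facility?
copies = nsa_capacity // internet_archive

print(f"NSA capacity:     {nsa_capacity:,} bytes")
print(f"Internet Archive: {internet_archive:,} bytes")
print(f"Copies that fit:  {copies:,}")
```

Integer arithmetic is used on purpose: Python integers have arbitrary precision, so a number with 21 zeroes is handled exactly.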

Relative to world population
OK, so maybe the size of the internet or the size of a Petabyte is not very easy to imagine. Let’s take it to the individual level. At the time of this writing, there are roughly 7,124,855,134 people in the world.

If the NSA wanted to store information on every person in the world, regardless of whether they are a president or a newborn child in Africa, they would have 701,768,654,374 bytes per person to do so:

[Image: 5ZB per person]

Yes, they can store 701GB for each person in the world. My personal email archive is around 5GB right now, and I have a 1TB hard disk containing every (digital or digitized) photo I have ever taken. If I were typical (which I am probably not), the NSA could very easily store everybody’s personal data, worldwide.

Relative to the people connected to the internet
Not everybody is connected to the internet (in fact, that’s not entirely true, but we’ll come to that later). As you have probably already seen on the site, there are roughly 2,552,435,328 active internet users (at the time of writing). So let’s see what the NSA can store if they “only” target the internet users:

[Image: 5ZB per Internet User]

Ah, that’s better. Almost 2TB per Internet user. The NSA can now not only store my private data, photos and videos, but there is also room to spare so I can put a couple of blockbuster movies in there, too.
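
Both per-head figures come from the same division, so here is a small Python sketch that reproduces them. The population and user counts are the ones quoted in this post (they were moving counters at the time of writing), and `bytes_per_head` is a helper name made up for this example.

```python
ZETTABYTE = 10**21
GIGABYTE = 10**9
TERABYTE = 10**12

capacity = 5 * ZETTABYTE

# Counts quoted in the post at the time of writing.
world_population = 7_124_855_134
internet_users = 2_552_435_328


def bytes_per_head(capacity_bytes: int, people: int) -> int:
    """Whole bytes available per person if the capacity is split evenly."""
    return capacity_bytes // people


per_person = bytes_per_head(capacity, world_population)
per_user = bytes_per_head(capacity, internet_users)

print(f"Per person:        {per_person:,} bytes (about {per_person // GIGABYTE} GB)")
print(f"Per internet user: {per_user:,} bytes (about {per_user / TERABYTE:.2f} TB)")
```

Splitting evenly is of course a simplification: in practice some people would generate far more data than others.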

Since the NSA probably does some smart data reduction, like storing an email I send to you just once instead of a copy for each of us, I’m guessing the practical space per user will be more like 4TB.
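
To make the “store it just once” idea concrete, here is a toy sketch of content-addressed deduplication in Python. This is only an assumption about how such data reduction might work, not a description of what the NSA actually does; the `DedupStore` class and its methods are invented for this example.

```python
import hashlib


class DedupStore:
    """Toy store that keeps each distinct piece of content only once."""

    def __init__(self):
        self.blobs = {}    # SHA-256 digest -> content bytes
        self.owners = {}   # SHA-256 digest -> set of people linked to it

    def put(self, owner: str, content: bytes) -> str:
        digest = hashlib.sha256(content).hexdigest()
        # Store the content only if this digest has not been seen before.
        self.blobs.setdefault(digest, content)
        # Always remember who the content belongs to.
        self.owners.setdefault(digest, set()).add(owner)
        return digest

    def stored_bytes(self) -> int:
        return sum(len(blob) for blob in self.blobs.values())


store = DedupStore()
email = b"Hi, can I borrow your car tomorrow?"

# The same email shows up in the sender's and the recipient's mailbox.
store.put("sender", email)
store.put("recipient", email)

print(f"Logical size:  {2 * len(email)} bytes")
print(f"Physical size: {store.stored_bytes()} bytes")
```

Two mailboxes reference the email, but only one copy takes up space, which is why the effective space per user can be larger than the raw figure.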

Am I connected to the internet?
In short: Yes. We are all connected in some way to the internet. Particularly if you are living in a “Western Country”, you can be sure that in some way you, or at least data about your existence, is online somewhere. Cities store data about their inhabitants and share it with other departments and services to do their work. By default, this data travels across the internet, and we all know how good governments are at protecting and encrypting their data, right?

Let’s do an interesting thought experiment to see how far this goes. Suppose I have never connected to the internet, and neither have you. I own an old Toyota Corolla E70, and you and I stayed with our trusty Nokia P-30s. Now let’s say I lend you my car. The NSA would know about that, instantly. How? My car is registered to me. The roads are covered with cameras to detect traffic violations, but in fact, most governments like to store data about which car is going where. Our cellphones are connected to cell towers, which reveal pretty accurate data about where they are. So the NSA can map the time and place of my car to the time and place of my cellphone and of all the cellphones I’ve called. They would quickly and easily discover that my car is moving with your cellphone in it. The fact that their data shows that my cellphone is at home means that you are probably driving my car. Based on our cellphone data, it would even be pretty easy to see our daily commutes, and to see where you and I work and when and where we met in person.
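
The matching step in this thought experiment is nothing more than a join on time and place. Here is a deliberately tiny Python sketch with made-up data; the `Sighting` record, the `co_travellers` function, the place names and the hour-level granularity are all assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sighting:
    subject: str   # a licence plate or a phone identifier (hypothetical values)
    place: str     # camera location or cell tower area
    hour: int      # coarse timestamp, hour of the day


def co_travellers(car_sightings, phone_pings):
    """Return phones seen at the same place and hour every time the car was seen."""
    car_points = {(s.place, s.hour) for s in car_sightings}
    if not car_points:
        return []
    points_by_phone = {}
    for ping in phone_pings:
        points_by_phone.setdefault(ping.subject, set()).add((ping.place, ping.hour))
    return sorted(
        phone for phone, points in points_by_phone.items() if car_points <= points
    )


# My car, spotted by two traffic cameras.
car = [Sighting("my-corolla", "highway", 9), Sighting("my-corolla", "city ring", 10)]

# My phone stays at home, yours moves along with the car.
phones = [
    Sighting("my-phone", "home", 9),
    Sighting("my-phone", "home", 10),
    Sighting("your-phone", "highway", 9),
    Sighting("your-phone", "city ring", 10),
]

print(co_travellers(car, phones))
```

With real data the matching would need tolerance for time and distance, but the principle stays the same: two unrelated registrations become one story as soon as they share a time and a place.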

Conspiracy? Maybe. Possibility? Yes. Privacy breach? Absolutely.

Different countries have different meanings of the word “Privacy”. By definition, gathering data which is considered to be private (secret or confidential) in the country of origin is espionage. This is not surprising, as old habits die hard.

I don’t care, I have nothing to hide.
Not so fast, grasshopper. You do. In fact, the NSA just introduced the very need to keep your data private and store it locally, indefinitely. In the example above, suppose you used my car to take pictures, as a hobby photographer, of a crime scene that was in the news. With the data they gathered, the NSA could place either you or me at that scene, as they see fit. Because you and I have no access to the data they are storing about us, it will be very hard for you or me to prove that we weren’t involved, because the NSA will not provide you or the court with data that would prove otherwise.

The same goes for mail and other communications. Tabloid journalists can explain how to use one email, or one sentence from an email, out of context to start nasty gossip. The NSA could do that too. Suppose you are sharing role-playing stories with a friend over email. Those emails could suddenly look very interesting when placed out of context. Don’t think they will not do that. Power corrupts. They will, and very likely already have done so.

Don’t make me think
I’ve always said that the failure to make stuff user friendly is the single most important reason for brilliant ideas to fail miserably. In the light of current events and the power of modern computers, can we at least try to make strong encryption like PGP stupid-easy to use, and add “coolness” so people will want to use it?

Because much like openness, responsibility and privacy are now more important than ever.

Related material I found while writing this blog post:

  • Surveiller et punir: Naissance de la Prison is a 1975 book by the French philosopher Michel Foucault. If he were still alive, Foucault would probably see that after a period of individualism and consumer-centric change, the NSA is the re-introduction of the Panopticon, a prison (or society) where the prisoners cannot see the guards.
  • This Dutch program “Dare to Think” discusses how Foucault thought, and places it in modern times. It discusses how modern individualism and a market-driven society need new mechanisms to steer the individual. The theory is in essence that people need to learn to steer themselves and check the self-set boundaries with others from time to time.
  • In Windows NT 4 Service Pack 5, a variable with the name _NSAKEY was discovered, which led to speculation that Windows computers could leak private data to the NSA.
  • Because few people do, you are likely to gain some attention when you decide to use PGP to encrypt your mail, particularly from American “authorities”. You stand out. If you decide to stand out and encrypt stuff, you had better do it right. Gather some information about Uncrackable Email (yes, it’s possible). Please note that all your encrypted mails will be stored, so when somebody does find reason to detain you and get your passphrase, wiping your drive will be useless.
  • Estimates in the linked articles say that the NSA facility cost $2B to build, and has its own 65MW power station, 60,000 tons of cooling equipment, and an electricity bill of $40M per year (unclear if that includes running their own power station). If you looked closely at the Worldometers link, you may have noticed that there are almost 4 million cancer-related deaths per year and 35 million HIV-related deaths, versus fewer than 8,000 terrorism-related deaths in 2010. If the NSA is here to protect us from dying, they’d better use their budget and intelligence to start investigating other stuff.

3 Responses to How big is 5 Zettabyte?

  1. rolfje says:

    And even if you never joined Facebook, your data is already there, handed over by your “friends”. Oh, and don’t forget contact leeches like WhatsApp and the like.

  2. rolfje says:

    News: Not only can they selectively use the evidence, they can actually plant evidence on *your* computer without you even knowing:

