By now, as the end of the first quarter of the 21st century draws near, everyone is surely aware that user passwords are digital gold, and that protecting them is a key aspect of ensuring data security and privacy. Yet despite this, some companies still fail to store passwords properly.
In this post we look at how NOT to store user passwords, and what methods are used by services that take security seriously.
The wrong way: storing passwords in plaintext
The simplest method is to store passwords in an unencrypted database. When a user tries to sign in, authentication is just a matter of matching what they enter against what’s in the database.
But there’s always a risk that attackers might steal this database one way or another, for example by exploiting vulnerabilities in the database software. Or a password table might get stolen by an ill-intentioned employee with high access privileges. Leaked or intercepted employee credentials could also be used to steal passwords. Put simply, there are plenty of scenarios where things can go pear-shaped. Remember that data stored in open form is precisely that: open.
A slightly better way: encrypted passwords
What if you store passwords in encrypted form? Not a bad idea at first glance, but it doesn’t work great in practice. After all, if you store encrypted passwords in the database, they have to be decrypted each and every time to compare them with user input.
And that means the encryption key will be somewhere close by. If that’s the case, this key can easily fall into hackers’ hands along with the password database. So, that defeats the whole purpose: the cybercriminals will be able to quickly decrypt this database and get passwords in plaintext, so we end up back where we started.
As cryptographers jest in all seriousness, encryption doesn’t solve the problem of data privacy; it just turns it into a problem of secure key storage. You can come up with cunning schemes that reduce the risks, but in general it won’t be possible to reliably secure passwords this way.
The proper way: storing password hashes
The best method is not to store passwords at all. If you don’t have something, it can’t get stolen, right?
But how do you check whether a user signing in has entered the correct password? That’s where hash functions come into play: special cryptographic algorithms that scramble any data into a fixed-length string of bits in a predictable but irreversible way.
Predictable here means that the same data is always converted into the same hash. And irreversible means that there is no practical way to compute the original data back from the hash. Storing hashes instead of passwords is what any online service does if it cares about user data even just a tiny bit and values its reputation.
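To make the "predictable" and "fixed-length" properties concrete, here is a minimal Python sketch. It assumes SHA-256 from the standard library purely for illustration; the choice of algorithm and the sample inputs are ours, not something this post prescribes.

```python
import hashlib


def sha256_hex(data: str) -> str:
    """Return the SHA-256 digest of a string as 64 hexadecimal characters."""
    return hashlib.sha256(data.encode("utf-8")).hexdigest()


# Predictable: hashing the same input twice gives the same result.
first = sha256_hex("correct horse battery staple")
second = sha256_hex("correct horse battery staple")
print(first == second)

# Fixed length: a 1-character input and a 10,000-character input
# both produce a digest of the same size.
short_digest = sha256_hex("a")
long_digest = sha256_hex("a" * 10_000)
print(len(short_digest), len(long_digest))
```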
When a user creates a password during registration, the database stores not the password itself but its hash, along with the username. Then, during sign-in, the stored hash is compared against the hash of the password the user enters. If they match, it means the passwords are the same.
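The registration and sign-in flow just described might look roughly like this in Python. This is a sketch under stated assumptions: `user_db`, `register` and `sign_in` are hypothetical names, the "database" is an in-memory dictionary, and the plain unsalted SHA-256 hash is used only to mirror this stage of the explanation.

```python
import hashlib
import hmac

# Hypothetical stand-in for the user table: username -> password hash.
user_db: dict[str, str] = {}


def hash_password(password: str) -> str:
    """Hash a password with SHA-256 (an assumed algorithm, for illustration)."""
    return hashlib.sha256(password.encode("utf-8")).hexdigest()


def register(username: str, password: str) -> None:
    """Store only the hash; the password itself is never written anywhere."""
    user_db[username] = hash_password(password)


def sign_in(username: str, password: str) -> bool:
    """Hash the entered password and compare it with the stored hash."""
    stored = user_db.get(username)
    if stored is None:
        return False
    # compare_digest compares in constant time to avoid leaking timing information.
    return hmac.compare_digest(stored, hash_password(password))


register("alice", "s3cret-Passphrase")
print(sign_in("alice", "s3cret-Passphrase"))  # correct password
print(sign_in("alice", "wrong-guess"))        # wrong password
```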
In the event of a database leak, it’s not the passwords that the attackers get hold of, but their hashes, from which the original data cannot be recovered (irreversibility, remember?). Of course, this is a vast improvement security-wise, but it’s still too soon to rejoice: if the cybercriminals get their hands on the hashes, they might attempt a brute-force attack.
The even better way: salted hashes
After obtaining your database, the hackers might try to extract the passwords through brute force. This means taking a combination of characters, calculating its hash, and looking for matches across all entries in the database. If no matches are found, they’ll try another combination, and so on. If there’s a match, the password that was used to calculate the hash in the database is now known.
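Here is a toy Python sketch of that brute-force loop. Everything in it is an assumption made for illustration: the leaked table, the SHA-256 algorithm, and the tiny search space (lowercase letters, up to three characters), which keeps the run time short.

```python
import hashlib
import itertools
import string


def sha256_hex(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


# Hypothetical leaked table: username -> unsalted password hash.
leaked = {"alice": sha256_hex("cab"), "bob": sha256_hex("zz")}


def brute_force(leaked_hashes: dict[str, str], alphabet: str, max_length: int) -> dict[str, str]:
    """Try every combination up to max_length and match it against all entries."""
    # Index the leaked hashes so each candidate is checked against every entry at once.
    hash_to_users: dict[str, list[str]] = {}
    for user, digest in leaked_hashes.items():
        hash_to_users.setdefault(digest, []).append(user)

    cracked: dict[str, str] = {}
    for length in range(1, max_length + 1):
        for combo in itertools.product(alphabet, repeat=length):
            candidate = "".join(combo)
            for user in hash_to_users.get(sha256_hex(candidate), []):
                cracked[user] = candidate
    return cracked


cracked = brute_force(leaked, string.ascii_lowercase, max_length=3)
print(cracked)
```

Each extra character multiplies the number of candidates by the alphabet size, which is why long passwords from a large alphabet are so much harder to brute-force.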
Worse still, the process of cracking hashed passwords can be sped up considerably by means of so-called rainbow tables. Rainbow tables are huge data arrays with precalculated hashes for the most commonly used passwords. As such, they make it easy to search for matches in the stolen database. And it’s all done automatically, of course, so the password-cracking process becomes too quick for comfort.
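The idea of precalculated hashes can be sketched in Python with a plain lookup table. Note the simplification: real rainbow tables use chains of hashes to save space, whereas this sketch just maps each hash straight to its password. The password list, the stolen records and the use of SHA-256 are all assumptions for illustration.

```python
import hashlib


def sha256_hex(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


# Built once, ahead of time, from a list of commonly used passwords.
COMMON_PASSWORDS = ["12345678", "password", "qwerty", "letmein"]
precomputed = {sha256_hex(p): p for p in COMMON_PASSWORDS}

# Hypothetical stolen table: username -> unsalted password hash.
stolen = {
    "alice": sha256_hex("password"),
    "bob": sha256_hex("Tr0ub4dor&3"),
    "carol": sha256_hex("12345678"),
}

# Cracking is now a dictionary lookup per record; no hashing is needed at attack time.
recovered = {user: precomputed[h] for user, h in stolen.items() if h in precomputed}
print(recovered)
```

Only the accounts with common passwords fall to the lookup; the uncommon one is not in the table and would still need brute force.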
However, there is some good news: it’s impossible to calculate the hashes of all possible character combinations in advance. A complete rainbow table for any hashing algorithm would take up more disk space than there is on the planet. Even for the not-overly-reliable MD5 algorithm, whose hashes are only 128 bits long, a hypothetical table covering every possible hash value would contain (deep breath) 340 282 366 920 938 463 463 374 607 431 768 211 456 records. Which is why only the most common combinations get included in rainbow tables.
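The MD5 figure is simply the number of distinct 128-bit values, that is, 2 to the power of 128. A quick Python check of the arithmetic follows; the storage estimate assumes, hypothetically, that each record holds nothing but a 16-byte digest.

```python
MD5_BITS = 128  # an MD5 hash is 128 bits long

# Number of distinct values a 128-bit hash can take.
records = 2 ** MD5_BITS
print(f"{records:,}")

# Rough lower bound on storage, assuming 16 bytes per record (the digest alone).
BYTES_PER_RECORD = MD5_BITS // 8
bytes_needed = records * BYTES_PER_RECORD
print(f"{bytes_needed:.3e} bytes")
```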
To combat the use of rainbow tables, cryptographers came up with a solution that utilizes another important property of hash functions: even the tiniest change in the source text alters the hashing result beyond all recognition.
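This property is easy to observe for yourself. The Python sketch below hashes two inputs that differ in a single character and counts how many of the output bits differ; SHA-256 and the sample strings are assumptions chosen for illustration.

```python
import hashlib


def sha256_int(text: str) -> int:
    """Return the SHA-256 digest of a string as a 256-bit integer."""
    return int.from_bytes(hashlib.sha256(text.encode("utf-8")).digest(), "big")


original = "password"
tweaked = "passwore"  # only the last character differs

hash_a = sha256_int(original)
hash_b = sha256_int(tweaked)

# XOR leaves a 1 in every bit position where the two hashes disagree.
differing_bits = bin(hash_a ^ hash_b).count("1")

print(f"{hash_a:064x}")
print(f"{hash_b:064x}")
print(f"{differing_bits} of 256 bits differ")
```

For a good hash function, roughly half of the output bits are expected to flip, so the two hashes look unrelated.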
Before a password hash is computed and written to the database, a random set of characters (called a salt) is added to the password. This way, the stored hashes are changed to the extent that even the most basic, obvious and frequently used passwords like “12345678” and “password” cannot be looked up in rainbow tables.
The simplest variant uses the same salt for all passwords, but then attackers only need to build one new table for that salt. The most hack-resistant variant creates a separate salt for each individual record. The beauty of this approach is that salts can be stored in the same database with no additional risk: knowing the salt does not make the attackers’ task much easier. To crack the hashes, they will still have to apply pure brute force, going through every single combination for each record separately.
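A per-record salt could be sketched in Python as follows. The names `users`, `register` and `sign_in` are hypothetical, the in-memory dictionary stands in for the database, and the 16-byte salt length is an assumption. One deliberate choice goes beyond what this post describes: instead of hashing salt plus password directly, the sketch uses the standard library's `hashlib.pbkdf2_hmac`, which takes the salt as a parameter and repeats the hashing many times; the iteration count is an assumed value.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # assumed work factor, for illustration only
SALT_BYTES = 16       # assumed salt length

# Hypothetical user table: username -> (salt, salted hash). The salt sits next to the hash.
users: dict[str, tuple[bytes, bytes]] = {}


def derive(password: str, salt: bytes) -> bytes:
    """Compute the salted hash of a password with PBKDF2-HMAC-SHA256."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)


def register(username: str, password: str) -> None:
    salt = os.urandom(SALT_BYTES)  # a fresh random salt for every record
    users[username] = (salt, derive(password, salt))


def sign_in(username: str, password: str) -> bool:
    record = users.get(username)
    if record is None:
        return False
    salt, stored = record
    # Recompute with the stored salt and compare in constant time.
    return hmac.compare_digest(stored, derive(password, salt))


register("alice", "password")
register("bob", "password")

# Same password, different salts, so the stored hashes differ.
print(users["alice"][1] != users["bob"][1])
print(sign_in("alice", "password"), sign_in("alice", "12345678"))
```

Because every record has its own salt, a table precomputed for one record is useless for all the others, even when two users pick the same password.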
The more online services adopt this approach of not storing the passwords themselves, the less likely a mass theft of user credentials (and the subsequent trouble associated with account hacking) becomes.