The Lazy Way to Hash Passwords
Every now and again, I’m asked to do an audit on a company’s software, systems, and practices. The same problems crop up time and time again (doing fake agile development, incorrect testing procedures, weak passwords, etc…).
Interestingly, outside of startups, the developers I talk to are almost always aware that they’re doing things incorrectly, and they end up blaming ‘management’ for not giving them the time and resources to develop correctly. Right or wrong, that’s the pattern.
The reason I exclude startups from that list is that a lot of young startups are founded by a group of friends just out of school with no real-world experience. That’s fine; it just means that whenever I talk to those people, they THINK they’re testing correctly, working in an agile fashion, etc… Unfortunately, they usually just don’t know any better. In the grand scheme, it’s not a huge problem given the size of the team, but they’re incurring a lot of technical debt from the start.
Aside: ‘Agile’ is a buzzword that just won’t die, and people still don’t know what Agile development looks like. While I’m not an industry expert on it, I know what it is, and more importantly, I know what it’s not… Also, for you Agile manifesto zealots: Agile is NOT always the solution. It’s not a magic wand that sweeps away all problems, nor is it even applicable in many industries.
But buzzwords are not what I wanted to talk about today. I wanted to talk about…
Specifically, the storage of passwords (on a server or similar).
All too often, I’m given the (SSH) keys to the castle and asked to poke around. And what do I see (after I create a few accounts)?
Yeah… My username/password in plain text… Sigh…
What If You Don’t Have Server Access?
A little trick I’ve used to figure out independently whether the database passwords are hashed, when I’m not given full access, is to sit next to the youngest DevOps person and, after a few hours, shout a few expletives about how I forgot my password and I’m locked out of a bunch of data. Then I politely (but urgently) ask them to email me my password so I can log back in (saying something about how the CEO needs this thing I’m working on right NOW).
If the answer I get is anything other than “we don’t store passwords in plain text” or “passwords are stored hashed”, then I know something shady is going on and I need to investigate.
Why Does This Still Happen?
I don’t have a single solitary clue…
For new companies, I would love to blame inexperience and a lack of understanding of crypto, but open-source libraries reduce this to a few lines of code.
Old companies with legacy databases tend to say something like “it’ll cost too much to fix”. While I can’t speak to the enormity of the work involved, I can safely say that it’s usually worth it once you consider what happens if something goes off the rails, and the mountain of liability that might (read: I hope) come your way.
What Can I Do About It?
Nothing… Wait, no, I mean ANYTHING AND EVERYTHING!!!
First thing though, read through this article about plaintext vs fast hashes vs slow hashes.
Fast hashes (SHA-256 or similar) are great for digests and the like, but not for hashing passwords. I’m surprised by the number of people who think using a fast hash for passwords is a GOOD thing! Quite the opposite: the point is to inject some computational cost so that brute-forcing a password is a pain in the ass. An attacker pays that cost on every single guess, while a legitimate user pays it once per login.
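To make the fast-versus-slow difference concrete, here’s a rough sketch using only Python’s standard library. PBKDF2-HMAC-SHA256 (`hashlib.pbkdf2_hmac`) stands in for a slow password hash because BCrypt isn’t in the standard library. The 200,000-iteration count and the helper names (`fast_hash`, `slow_hash`, `seconds_per_guess`) are illustrative assumptions, not tuned recommendations.

```python
import hashlib
import os
import time


def fast_hash(password: str) -> bytes:
    # A single SHA-256 call: cheap for you, and just as cheap for an attacker.
    return hashlib.sha256(password.encode()).digest()


def slow_hash(password: str, salt: bytes, iterations: int = 200_000) -> bytes:
    # PBKDF2 runs HMAC-SHA256 `iterations` times, so every guess costs that much work.
    # In a real system the salt is random per user and stored alongside the hash.
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, iterations)


def seconds_per_guess(fn, *args, repeat: int = 5) -> float:
    # Average wall-clock time of a single call to fn(*args).
    start = time.perf_counter()
    for _ in range(repeat):
        fn(*args)
    return (time.perf_counter() - start) / repeat


if __name__ == "__main__":
    salt = os.urandom(16)
    fast = seconds_per_guess(fast_hash, "hunter2", repeat=10_000)
    slow = seconds_per_guess(slow_hash, "hunter2", salt)
    print(f"SHA-256: {fast:.2e} s per guess")
    print(f"PBKDF2:  {slow:.2e} s per guess (~{slow / fast:,.0f}x slower)")
```

The exact numbers depend on your hardware, but the ratio is what matters. Each extra factor of slowdown applies to every one of an attacker’s guesses.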
BCrypt or PBKDF2 or SCrypt?
I’m going to anger a lot of crypto fanboys, but in the most pragmatic way… It doesn’t matter…
I don’t have the pedigree to debate crypto pros/cons across those algorithms, but what I’m trying to say is that if you’re debating between those 3, you’re already ahead of the game. You’re not asking “SHOULD I hash” nor are you bringing SHA-X into the mix…
If you really care, read some journal articles on those 3 and then tell me which one I should be using.
My personal preference is BCrypt, because open-source implementations tend to have a very clean, simple API with minimal inputs (I don’t want to have to pick the number of rounds, or pass in a salt, or anything I can screw up - the implementation should decide something reasonable, and if I care, I can change it).
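Here’s a sketch of the kind of API I mean, again using the standard library’s PBKDF2 in place of BCrypt. The caller only ever passes a password. The function picks a random salt and a cost, and it stores both inside the returned string so that `verify_password` can read them back later. The `pbkdf2_sha256$iterations$salt$hash` layout, the `SCHEME` label, and the default iteration count are assumptions made up for this sketch. They are not a standard format.

```python
import base64
import hashlib
import hmac
import os

SCHEME = "pbkdf2_sha256"  # made-up label for this sketch's storage format
DEFAULT_ITERATIONS = 200_000  # illustrative cost; a real library picks (and updates) this


def _b64(raw: bytes) -> str:
    return base64.b64encode(raw).decode("ascii")


def hash_password(password: str, iterations: int = DEFAULT_ITERATIONS) -> str:
    """Only the password is required: salt and cost are chosen here and stored in the result."""
    salt = os.urandom(16)
    key = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, iterations)
    return f"{SCHEME}${iterations}${_b64(salt)}${_b64(key)}"


def verify_password(password: str, stored: str) -> bool:
    """Recover salt and cost from the stored string, re-derive, and compare in constant time."""
    scheme, iterations, salt_b64, key_b64 = stored.split("$")
    if scheme != SCHEME:
        raise ValueError(f"unknown scheme: {scheme}")
    salt = base64.b64decode(salt_b64)
    expected = base64.b64decode(key_b64)
    key = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, int(iterations))
    return hmac.compare_digest(key, expected)


stored = hash_password("correct horse battery staple")
print(stored.split("$")[:2])  # ['pbkdf2_sha256', '200000']
print(verify_password("correct horse battery staple", stored))  # True
print(verify_password("Tr0ub4dor&3", stored))  # False
```

Because the cost travels with each stored hash, you can raise `DEFAULT_ITERATIONS` later without breaking existing users. BCrypt hash strings such as `$2a$12$...` use the same idea: the `12` is the cost factor.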
Passlib also gives a quick summary of its own thoughts (and I agree with most of the comments/concerns).
Fixing The Problem
Okay, let’s say you’re in the unfortunate state of having an unhashed set of passwords in your database. What do you do…?
Well, hash them. Obviously.
My first recommendation, if possible, is to run a crawler through the database and hash all passwords in place. If there are a lot of passwords (more than a million), and assuming each BCrypt takes at least 100 ms, you could be there a while. Ideally, split up the DB and operate in parallel (or, arguably, take the database down for a weekend and do it all in one go).
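Here is a back-of-the-envelope sketch that puts “a while” in numbers. It assumes hashing dominates the runtime and that work splits perfectly across workers, so a real migration will be slower. `migration_hours` and `split_id_range` are hypothetical helpers. The second one assumes an integer primary key that you can partition into contiguous ranges, one per worker.

```python
from typing import List, Tuple


def migration_hours(num_passwords: int, seconds_per_hash: float, workers: int = 1) -> float:
    """Optimistic wall-clock estimate: assumes hashing dominates and scales perfectly."""
    return num_passwords * seconds_per_hash / workers / 3600


def split_id_range(min_id: int, max_id: int, workers: int) -> List[Tuple[int, int]]:
    """Split the inclusive primary-key range [min_id, max_id] into one contiguous chunk per worker."""
    size, extra = divmod(max_id - min_id + 1, workers)
    ranges, start = [], min_id
    for i in range(workers):
        # The first `extra` chunks take one additional id so every id is covered.
        end = start + size + (1 if i < extra else 0) - 1
        ranges.append((start, end))
        start = end + 1
    return ranges


# A million passwords at 100 ms each is 100,000 CPU-seconds of hashing.
print(f"{migration_hours(1_000_000, 0.1):.1f} h on one worker")  # 27.8 h on one worker
print(f"{migration_hours(1_000_000, 0.1, workers=8):.1f} h on eight workers")  # 3.5 h on eight workers
print(split_id_range(1, 1_000_000, 4))
# [(1, 250000), (250001, 500000), (500001, 750000), (750001, 1000000)]
```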
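I can see how that’s not an entirely endearing solution. Doing it in place on a live DB might also introduce the occasional race condition (say, a user logs in or changes their password between the crawler reading their row and writing it back), as well as an awkward hot-swap where the code has to handle both plaintext and hashed passwords mid-migration.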
The other option is…
Lazily Fixing The Problem
Assuming the underlying codebase has a database and an ORM of some sort (rather than raw queries littered throughout), there is a good secondary option. If you don’t have an ORM, or have other architectural problems, well… fix those first, then come back to this.
What you can do is update your ORM layer to support both hashed and unhashed passwords, and then only hash passwords for new users and existing users who have successfully (or unsuccessfully) attempted a login. That way, for all your active users, the problem is taken care of on the fly.
This still leaves you with unhashed passwords for your inactive users - but you can probably mark those and use the database crawler to handle those without worrying too much about race conditions.
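For the leftover rows, a crawler can sidestep the race condition mentioned earlier by making each write conditional. It only overwrites a row if that row still holds the exact plaintext the crawler read. If a user logged in (and was lazily re-hashed) or changed their password in the meantime, the `UPDATE` matches zero rows and that user is left alone. This sketch uses `sqlite3` and the standard library’s PBKDF2 rather than BCrypt. It assumes a hypothetical `users(id, password)` table in which hashed values start with a made-up `pbkdf2_sha256$` prefix.

```python
import hashlib
import os
import sqlite3

PREFIX = "pbkdf2_sha256$"  # hypothetical marker that a stored value is already hashed
ITERATIONS = 200_000  # illustrative cost, not a recommendation


def hash_password(password: str) -> str:
    salt = os.urandom(16)
    key = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return f"{PREFIX}{ITERATIONS}${salt.hex()}${key.hex()}"


def crawl_unhashed(conn: sqlite3.Connection, batch_size: int = 100) -> int:
    """Hash every password still stored as plaintext; return how many rows were updated."""
    updated, last_id = 0, 0
    while True:
        # Page through rows by id, skipping any that already carry the hashed prefix.
        rows = conn.execute(
            "SELECT id, password FROM users "
            "WHERE id > ? AND substr(password, 1, ?) != ? ORDER BY id LIMIT ?",
            (last_id, len(PREFIX), PREFIX, batch_size),
        ).fetchall()
        if not rows:
            return updated
        for user_id, plaintext in rows:
            # Only write if the row still holds the plaintext we read. If a login or
            # password change rewrote it in the meantime, this matches zero rows.
            cur = conn.execute(
                "UPDATE users SET password = ? WHERE id = ? AND password = ?",
                (hash_password(plaintext), user_id, plaintext),
            )
            updated += cur.rowcount
        conn.commit()
        last_id = rows[-1][0]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, password TEXT)")
    conn.executemany("INSERT INTO users (password) VALUES (?)", [("hunter2",), ("letmein",)])
    print(crawl_unhashed(conn))  # 2
```

The crawler and the lazy login path have to agree on what a hashed value looks like (here, the prefix). A legacy plaintext password that happens to start with that prefix would be skipped.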
Below is a Python snippet of what I often see in a User ORM class. It just copies the password right in:
```python
class BadUser:
    """A trivial example, with no error checking on inputs."""

    def __init__(self, username, password):
        self._username = username
        self.set_password(password)

    def set_password(self, password):
        # Note the lack of hashing on the password: that's why this is a BadUser.
        self._password = password

    def verify_password(self, password):
        # Here, we just compare plaintext passwords. This is bad.
        return password == self._password

    def __str__(self):
        return "BadUser - username: %s, password: %s" % (self._username, self._password)
```
Here is what I would prefer to see (using Passlib’s BCrypt as an example):
```python
from passlib.hash import bcrypt


class GoodUser:
    """A trivial example, with no error checking on inputs."""

    def __init__(self, username, password):
        self._username = username
        self.set_password(password)

    def set_password(self, password):
        # When we set the password, we explicitly BCrypt it.
        # bcrypt.hash() is the passlib 1.7+ name; older releases called it bcrypt.encrypt().
        self._password = bcrypt.hash(password)

    def verify_password(self, password):
        # Verification BCrypts the password we pass in (using the salt and cost
        # stored in the existing hash) and compares the result to the stored hash.
        return bcrypt.verify(password, self._password)

    def __str__(self):
        return "GoodUser - username: %s, password: %s" % (self._username, self._password)
```
I’m just using Python as an example because it’s a very concise way to make a point, but you’d see something similar in whatever language you choose to use.
So, now, assuming your codebase started off at BadUser (which we’ll say is due to ‘legacy’ problems), how do you turn it into something like GoodUser?
Simple! As I said above, just hash passwords by default for new users, and for users who have successfully (or unsuccessfully) logged in.
```python
from passlib.hash import bcrypt


class LegacyUser:
    """A trivial example, with no error checking on inputs."""

    def __init__(self, username, password):
        self._username = username
        self.set_password(password)

    def set_password(self, password):
        # New passwords/users will, by default, have BCrypted passwords.
        self._password = bcrypt.hash(password)

    def verify_password(self, password):
        """Here's the trick. We first check (by some mechanism) whether the existing
        password is BCrypted. If so, verify using BCrypt. If not, do a plaintext
        comparison, and if it succeeds, THEN BCrypt the password. Otherwise, fail
        out (or BCrypt anyway upon any activity on a user's account, if you want to).
        """
        # Check whether this password was previously BCrypted.
        # There are better ways to check, but this is the least amount of code.
        if bcrypt.identify(self._password):
            return bcrypt.verify(password, self._password)

        # Check the plaintext, and if it's valid, hash the password.
        if password == self._password:
            self.set_password(self._password)
            return True

        # Optionally, BCrypt the stored password anyway on a failed attempt:
        # self.set_password(self._password)
        return False

    def __str__(self):
        return "LegacyUser - username: %s, password: %s" % (self._username, self._password)
```
So, as shown below, when a correct password is verified, the LegacyUser’s password ends up hashed.
```pycon
>>> # Mimic a legacy user
... old_legacy_user = LegacyUser("sj", "DontHashMeBro!!!")
>>> old_legacy_user._password = "DontHashMeBro!!!"
>>> print("Old Legacy User - " + str(old_legacy_user))
Old Legacy User - LegacyUser - username: sj, password: DontHashMeBro!!!
>>> print("Incorrect Password Returned: " + str(old_legacy_user.verify_password("WrongPassword")))
Incorrect Password Returned: False
>>> print("Old Legacy User's password is not hashed - " + str(old_legacy_user))
Old Legacy User's password is not hashed - LegacyUser - username: sj, password: DontHashMeBro!!!
>>>
>>> # Next time the correct password is verified, the internal password comes out hashed
... print("Correct Password Returned: " + str(old_legacy_user.verify_password("DontHashMeBro!!!")))
Correct Password Returned: True
>>> print("Old Legacy User's password is hashed - " + str(old_legacy_user))
Old Legacy User's password is hashed - LegacyUser - username: sj, password: $2a$12$hcQpDZP1l/pk7nYSUYc15OYKm.Kr.XIKdQlv7sDhM0Q27S9ldC8X.
>>>
```
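The `bcrypt.identify()` check in `LegacyUser` works by sniffing the format of the stored string. A legacy user whose plaintext password happens to look like a BCrypt hash would therefore be misclassified and effectively locked out. One of the “better ways” is to store the scheme explicitly next to the password, so that nothing has to be guessed. Below is a minimal sketch of that idea. The `FlaggedUser` class, its field names, and the scheme labels are all hypothetical. So is the fixed PBKDF2 iteration count, which stands in for BCrypt so the sketch runs on the standard library alone.

```python
import hashlib
import hmac
import os
from dataclasses import dataclass

# Hypothetical scheme labels, stored in their own column next to the password.
PLAINTEXT = "plaintext"
PBKDF2_SHA256 = "pbkdf2_sha256"
ITERATIONS = 200_000  # illustrative, fixed cost for this sketch only


def _derive(password: str, salt: bytes) -> bytes:
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)


@dataclass
class FlaggedUser:
    username: str
    password_scheme: str  # which format password_value is in; read, never guessed
    password_salt: bytes  # empty for legacy plaintext rows
    password_value: bytes  # raw plaintext bytes (legacy) or the derived key

    @classmethod
    def legacy(cls, username: str, plaintext: str) -> "FlaggedUser":
        """Mimic a row carried over as-is from the old plaintext table."""
        return cls(username, PLAINTEXT, b"", plaintext.encode())

    @classmethod
    def new(cls, username: str, password: str) -> "FlaggedUser":
        user = cls(username, PLAINTEXT, b"", b"")
        user.set_password(password)
        return user

    def set_password(self, password: str) -> None:
        self.password_salt = os.urandom(16)
        self.password_value = _derive(password, self.password_salt)
        self.password_scheme = PBKDF2_SHA256

    def verify_password(self, password: str) -> bool:
        if self.password_scheme == PBKDF2_SHA256:
            candidate = _derive(password, self.password_salt)
            return hmac.compare_digest(candidate, self.password_value)
        if self.password_scheme == PLAINTEXT:
            # Legacy row: constant-time plaintext check, then upgrade on success.
            if hmac.compare_digest(password.encode(), self.password_value):
                self.set_password(password)
                return True
            return False
        raise ValueError(f"unknown password scheme: {self.password_scheme}")
```

A real schema would also store the cost per row (or the whole self-describing hash string) so it can be raised later. It is fixed here only to keep the sketch short.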
And there you have an easy way to lazily hash passwords.