Introduction
Storing passwords securely should be imperative for any credible engineer. Plain-text passwords are extremely insecure - you shouldn't even consider storing them in a plain format. It's enough for someone to gain read access to a database for an entire user base to be compromised.
Passwords must be stored in a database in a secure, yet manageable way.
You should always assume that your database will be compromised, and take all necessary precautions to prevent anyone who gets hold of your data from exploiting it in any way. That is especially true for databases that store users' login credentials or other sensitive data.
Additionally - it's a question of ethical conduct. If a user signs up for your website, should you be able to read their password verbatim? Passwords are often reused across multiple websites, may contain personal information, and could expose a side of the user that they wouldn't like to make public. Neither you nor a malicious actor should be able to read a plain-text password at any point. This is why well-designed websites can't email you your password when you forget it - they don't know it. You have to reset it.
Hashing passwords is a cheap, secure and standard procedure that keeps passwords safe from both a webmaster and malicious actors.
To prevent anyone from blatantly exploiting login credentials, you should always hash passwords before storing them in a database. That is the simplest, yet most effective way to prevent the unauthorized use of passwords stored in your database. Even if someone gets hold of the stored hashes, they can't use them directly to log in, since a hash isn't the password itself and, with a suitable hashing function, is computationally hard to reverse.
In this guide, we'll explain how to hash your passwords in Python using BCrypt. We'll cover what hashing is, how hashes are compared, how "salting" works and how hashing even makes passwords secure.
What Is Password Hashing?
In its most basic form, hashing refers to converting one string into another (called a hash) using a hash function. Regardless of the size of the input string, the hash has a fixed size that is predefined by the hashing algorithm itself. The goal is that the hash doesn't look anything like the input string and that any change in the input string produces a change in the hash.
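To make the fixed-size property concrete, here is a minimal sketch using SHA-256 from Python's standard `hashlib` module. SHA-256 is picked only for illustration; on its own it is a fast general-purpose hash, not a password hashing scheme.

```python
import hashlib

# Inputs of very different lengths
inputs = ["a", "myPwd", "a much longer input string " * 100]

# SHA-256 always yields a 32-byte digest, i.e. 64 hexadecimal characters
digests = {text: hashlib.sha256(text.encode("utf-8")).hexdigest() for text in inputs}

for text, digest in digests.items():
    print(f"input length {len(text):5d} -> hash length {len(digest)}")
```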
Additionally - hashing functions work in one direction only. It's not a round trip: a hashed password cannot be unhashed. The only way to check whether an input password matches the one in the database is to hash the input password as well and then compare the hashes. This way, we don't need to know the actual password to ascertain whether it matches the one in the database.
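The check-by-comparing-hashes idea can be sketched as follows. The helper names `sha256_hex` and `matches` are hypothetical, and plain unsalted SHA-256 is used only to show the flow; `hmac.compare_digest` performs the comparison in constant time.

```python
import hashlib
import hmac


def sha256_hex(password: str) -> str:
    """Hypothetical helper: hex-encoded SHA-256 of a password (illustration only)."""
    return hashlib.sha256(password.encode("utf-8")).hexdigest()


# Only the hash is "stored"; the plain-text password is never kept.
stored_hash = sha256_hex("myPwd")


def matches(candidate: str) -> bool:
    # Hash the candidate and compare the two hashes, never the passwords.
    return hmac.compare_digest(sha256_hex(candidate), stored_hash)


print(matches("myPwd"))   # True
print(matches("myPwd1"))  # False
```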
Note: In this guide, we'll use the term "hashing function" for a mathematical function used to calculate the fixed-size hash based on the input string (popular hashing functions include SHA256, SHA1, MD5, CRC32, BCrypt etc.). A "hashing algorithm" refers to the whole process of hashing, including not only a hashing function used but many more parameters that can be altered during the process of hashing.
Every time you put something such as "myPwd" into the hashing algorithm, you'll get the exact same output. But if you change "myPwd" even a bit, the output will change beyond recognition.
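A quick way to see both properties, determinism and the drastic change caused by a tiny edit, is sketched below with SHA-256 again standing in for "the hashing algorithm":

```python
import hashlib


def sha256_hex(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


first = sha256_hex("myPwd")
second = sha256_hex("myPwd")   # the same input again
tweaked = sha256_hex("myPwe")  # only the last character changed

print(first == second)  # True: same input, same output

# Count how many of the 64 hex characters differ between the two hashes
differing = sum(a != b for a, b in zip(first, tweaked))
print(f"{differing} of 64 hex characters differ")
```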
That ensures that even similar input strings produce completely different hashes. If similar passwords produced similar hashes, cracking one simple password could help an attacker work out others that differ from it by only a few characters. On the other hand, since the same input always yields the same output, hashing is pretty predictable.
Predictability is easily exploitable.
If someone knows which hashing function was used to hash a certain password (and there isn't a large list of hash functions in use), they can crack it by guessing all possible passwords, hashing each guess with the same hashing function, and comparing the obtained hashes to the hash of the password they want to crack. This type of attack is called a brute-force attack, and it works extremely well against simple passwords such as password123, 12345678, etc.
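The attack described above boils down to a loop. The sketch below tries a tiny, made-up wordlist against a leaked unsalted SHA-256 hash; trying a list of likely passwords rather than every combination is usually called a dictionary attack, a common form of brute-forcing. The names `crack`, `leaked_hash` and `candidates` are illustrative only.

```python
import hashlib


def sha256_hex(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


# The attacker has obtained this unsalted hash from a leaked database.
leaked_hash = sha256_hex("12345678")

# A tiny wordlist of common passwords (real ones hold millions of entries).
candidates = ["password", "qwerty", "password123", "12345678", "letmein"]


def crack(target_hash, wordlist):
    for guess in wordlist:
        # Hash each guess with the same function and compare the hashes.
        if sha256_hex(guess) == target_hash:
            return guess
    return None


print(crack(leaked_hash, candidates))  # 12345678
```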
The easiest way to mitigate brute-force attacks is to use a hashing function that is relatively slow to compute. That way, computing all possible hashes takes so much time that the attack isn't even worth performing.
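To get a feel for what "slow to compute" means, the sketch below times one SHA-256 call against PBKDF2-HMAC-SHA256 from the standard library. PBKDF2 is a different deliberately slow function than the BCrypt covered later, used here only because it ships with Python; the iteration count of 200,000 is an arbitrary choice, and the timings will vary with your hardware.

```python
import hashlib
import os
import time

password = b"password123"
salt = os.urandom(16)

start = time.perf_counter()
fast = hashlib.sha256(password).digest()
fast_seconds = time.perf_counter() - start

start = time.perf_counter()
# 200,000 iterations: every single guess now costs this much work.
slow = hashlib.pbkdf2_hmac("sha256", password, salt, 200_000)
slow_seconds = time.perf_counter() - start

print(f"SHA-256:       {fast_seconds:.6f} s per guess")
print(f"PBKDF2-SHA256: {slow_seconds:.6f} s per guess")
```

An attacker has to pay the slower cost for every password they try, which is what makes large-scale guessing impractical.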
Additionally, most web applications have built-in "timeouts" after a certain number of incorrect passwords have been entered, making brute-force guessing through a controlled UI not viable. This doesn't hold, though, if someone obtains a local copy of a hashed password.
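Such a timeout can be sketched as a small in-memory counter. `LoginThrottle` and its parameters are hypothetical, and a real application would keep this state in shared storage; as noted above, none of this helps once an attacker has a local copy of the hashes.

```python
import time


class LoginThrottle:
    """Hypothetical lockout helper: blocks a user after too many failures."""

    def __init__(self, max_attempts: int = 5, lockout_seconds: float = 300.0):
        self.max_attempts = max_attempts
        self.lockout_seconds = lockout_seconds
        self._failures: dict[str, int] = {}
        self._locked_until: dict[str, float] = {}

    def is_locked(self, username: str) -> bool:
        until = self._locked_until.get(username)
        return until is not None and time.monotonic() < until

    def record_failure(self, username: str) -> None:
        count = self._failures.get(username, 0) + 1
        self._failures[username] = count
        if count >= self.max_attempts:
            # Lock the account and reset the counter for the next window.
            self._locked_until[username] = time.monotonic() + self.lockout_seconds
            self._failures[username] = 0

    def record_success(self, username: str) -> None:
        self._failures.pop(username, None)


throttle = LoginThrottle(max_attempts=3)
for _ in range(3):
    throttle.record_failure("alice")

print(throttle.is_locked("alice"))  # True
print(throttle.is_locked("bob"))    # False
```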
What Is Salt in Password Hashing?
As cryptography, price per computation and technology advance - just choosing a proper hashing function isn't quite enough to secure passwords stored in a database. In some cases, even a great hashing function can't prevent an attack. Therefore, it's advised to take additional precautions to make cracking stored passwords even more difficult.
The problem with hashing is that the output (i.e. hash) is always the same for the same input. That makes hashing predictable, thus vulnerable. You can solve that by passing an additional random string alongside the input string when performing hashing. That will ensure that the hashing no longer produces the same output every time it gets the same string as the input.
That fixed-length, pseudo-random string passed alongside the input string when performing hashing is called a salt. Every time you want to store a password in a database, a new random salt is created and passed alongside the password to the hashing function. Consequently, even if two users have the same password, their records in the database will be totally different.
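Here is a minimal sketch of salting, assuming a 16-byte random salt that is simply prepended to the password before a single SHA-256 call. That construction only demonstrates the idea; it is not a recommended storage scheme, and the helper name `salted_hash` is hypothetical.

```python
import hashlib
import os


def salted_hash(password: str, salt: bytes) -> str:
    # Prepend the salt to the password bytes before hashing.
    return hashlib.sha256(salt + password.encode("utf-8")).hexdigest()


# Two users who happen to choose the same password
salt_alice = os.urandom(16)  # a fresh random salt for each stored password
salt_bob = os.urandom(16)

hash_alice = salted_hash("myPwd", salt_alice)
hash_bob = salted_hash("myPwd", salt_bob)

print(hash_alice == hash_bob)  # False (with overwhelming probability)
```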
Remember that adding a single character to the end of a string before hashing changes the hash completely.
The salt used to hash a password is stored (often right alongside the hash itself) and applied to any new input that's hashed and compared to the stored hash, ensuring that, even with the addition of random elements, the user can log in using their password. The point of salting isn't to make a single password much more computationally difficult to crack - it's to prevent finding similarities between hashed strings, and to prevent an attacker from cracking multiple passwords at once if they're the same.
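The login side of salting can be sketched like this: the salt is saved with the hash at sign-up and re-applied to each login attempt. `hash_password` and `verify_password` are hypothetical helper names, and the single salted SHA-256 call is again only an illustration; BCrypt, discussed later in this guide, packs the salt and the hash into one string for you.

```python
import hashlib
import hmac
import os


def hash_password(password: str) -> tuple[bytes, str]:
    """Hypothetical helper: return (salt, hash) to store in the user record."""
    salt = os.urandom(16)
    digest = hashlib.sha256(salt + password.encode("utf-8")).hexdigest()
    return salt, digest


def verify_password(candidate: str, salt: bytes, stored_hash: str) -> bool:
    # Re-apply the *stored* salt to the login attempt, then compare the
    # hashes in constant time.
    digest = hashlib.sha256(salt + candidate.encode("utf-8")).hexdigest()
    return hmac.compare_digest(digest, stored_hash)


# At sign-up: store both values.
salt, stored_hash = hash_password("myPwd")

print(verify_password("myPwd", salt, stored_hash))   # True
print(verify_password("myPwd!", salt, stored_hash))  # False
```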
Through salting, the computationally expensive work of cracking is localized to a single password and has to be repeated for every password in the database, stopping a cascade of broken security.
Thankfully - the entirety of this logic is typically abstracted away by security frameworks and modules that we can readily use in code.
What is BCrypt?
BCrypt is a password hashing algorithm designed with all the security precautions we've mentioned in mind. It is used as the default password hashing algorithm in OpenBSD, an open-source, security-focused operating system, and is one of the most widely supported password hashing algorithms to date.
BCrypt is considered to be fairly secure. Its hashing function is based on the Blowfish block cipher, and it implements salting and adaptive computation speed. Adaptive speed refers to the ability to increase the complexity of calculating the hash value, which future-proofs the algorithm: it can be kept slow enough to prevent brute-force attacks even as hardware keeps getting faster.
BCrypt is widely supported and implemented in most mainstream languages. There are publicly available implementations for Java, JavaScript, C, C++, C#, Go, Perl, PHP, etc. In this guide, we'll cover the Python implementation of the BCrypt algorithm.
How to Hash a Password in Python Using BCrypt
The bcrypt module on PyPI offers a great implementation of BCrypt that we can easily install via pip:
$ pip install bcrypt
Note: To make sure that all required dependencies are installed, the official documentation advises you to run the following commands based on your operating system of choice.
For Debian and Ubuntu:
$ sudo apt-get install build-essential libffi-dev python-dev
For Fedora and RHEL-derivatives:
$ sudo yum install gcc libffi-devel python-devel
For Alpine:
$ apk add --update musl-dev gcc libffi-dev
After you've installed BCrypt using pip, you can import it into your project:
import bcrypt
To hash your password using BCrypt, you must convert it to bytes first. To achieve that, we can use the encode() method of the str class! It encodes the string version of the password into bytes using a given encoding, making it possible to hash using BCrypt.
Let's take 'MyPassWord' as the example password to illustrate the usage of BCrypt:
pwd = 'MyPassWord'
bytePwd = pwd.encode('utf-8')
The encode() method encodes a string using the given encoding (e.g. ASCII, UTF-8, etc.) and returns the corresponding sequence of bytes, a Python bytes object. That byte sequence formed from a string is called a b-string.
Note: In the previous example, pwd is a string (str) and bytePwd is a bytes object. But if you print both variables, the only visible difference is that bytePwd has b as a prefix before its value - b'MyPassWord'. Hence the name of that type of byte sequence - a b-string.
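The difference between the two types is easy to inspect directly. The accented example password below is made up; it just shows that in UTF-8 a character can take more than one byte, which matters because BCrypt's input limit is measured in bytes.

```python
pwd = 'MyPassWord'
bytePwd = pwd.encode('utf-8')

print(type(pwd))      # <class 'str'>
print(type(bytePwd))  # <class 'bytes'>
print(bytePwd)        # b'MyPassWord'

# Decoding with the same encoding gives the original string back
print(bytePwd.decode('utf-8') == pwd)  # True

# Non-ASCII characters can take more than one byte in UTF-8,
# so the byte length may differ from the character count.
accented = 'Pässwörd'
print(len(accented), len(accented.encode('utf-8')))  # 8 10
```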
Finally, you can hash the encoded password using BCrypt:
# Generate salt
mySalt = bcrypt.gensalt()
# Hash password
pwd_hash = bcrypt.hashpw(bytePwd, mySalt)
As you can see, the method used for hashing in BCrypt is hashpw(). It takes two arguments: the b-string representation of a password and a salt. You could construct a salt manually, but it's strongly recommended to use the gensalt() method instead. It's a BCrypt method created specifically for generating salt in a cryptographically secure fashion.
Note: Adaptive computation speed in BCrypt is achieved through a cost factor, passed as the rounds argument of the gensalt() method and encoded into the salt it returns. The default value is 12, meaning that BCrypt performs 2^12 (4096) rounds of its expensive key setup when computing the hash. Increasing that argument by one doubles the number of rounds, and by extension, the time needed to compute the hash.
Now, pwd_hash is storing the hashed version of the password pwd. It should look somewhat similar to:
b'$2b$12$1XCXpgmbzURJvo.bA5m58OSE4qhe6pukgSRMrxI9aNSlePy06FuTi'
Not very similar to the original password, right? But if you compare pwd_hash to the original password using BCrypt's checkpw() method, it returns True!
Note: The checkpw() method is designed for validating hashed passwords. It reads the salt and cost factor stored inside the hash, hashes the new input password with them, and then compares the result with the stored hash.
Let's check whether the plain-text password 'MyPassWord' is valid for the pwd_hash we've just created:
password = 'MyPassWord'
password = password.encode('utf-8')
print(bcrypt.checkpw(password, pwd_hash))
# Output: True
Components of a BCrypt output
As we've seen in the previous example, the input to BCrypt is a password (up to 72 bytes) and a salt (with the associated cost factor), and the output is a 24-byte hash.
Let's examine the following illustration to get a grasp of how BCrypt constructs the produced hash:
This illustration shows the hashing of the password 'MyPassword', so it illustrates the hashing from the previous section.
As we've discussed before, every time you call the gensalt() method, it produces a new fixed-size byte-array (represented by a b-string). In this example, the gensalt() method produced the output marked as salt in the illustration. Let's decompose the salt section and explain each individual subsection.
The salt has three subsections divided by the $ sign:
- bcrypt version: A special hashing algorithm identifier, in this case 2b, the newest version of the BCrypt algorithm.
- exponent: The cost factor passed to the gensalt() method, i.e. the base-2 logarithm of the number of key-setup rounds used when computing the hash. If no argument is passed, the default value is 12, so 2^12 (4096) rounds are used.
- generated salt: A radix-64 encoding of the 16-byte random salt, represented by 22 characters.
After that, BCrypt sticks the salt together with the hashed value of MyPassword and thus creates the final hash of MyPassword.
Note: The hashed value of MyPassword (or any other password) refers to a radix-64 encoding of the first 23 bytes of the 24-byte hash. It is represented by 31 characters.
Conclusion
After reading this article, you should have a solid understanding of how to use BCrypt to hash a password before storing it in a database. To put things into perspective, we've explained basic terminology in a general sense and then illustrated the process of hashing a password using BCrypt as an example.