What Is Hashing?
Hashing is a technique used to convert a key into another value (typically for cryptographic or data storage purposes). It works by running a mathematical function (called a hash function) on a key to produce a new value: the hash (or hash value).
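Here is a minimal TypeScript sketch of the idea. The function hashes a string key into a 32-bit unsigned integer using the well-known djb2 scheme: start at 5381, then multiply by 33 and add each character code. We chose that scheme only for illustration. It shows what a hash function does, but it is not cryptographically secure.

```typescript
// djb2-style string hash: a simple, NON-cryptographic hash function.
// Maps any string key to a 32-bit unsigned integer.
function djb2(key: string): number {
  let hash = 5381;
  for (let i = 0; i < key.length; i++) {
    // hash * 33 + charCode, wrapped to 32 bits via `>>> 0`
    hash = ((hash << 5) + hash + key.charCodeAt(i)) >>> 0;
  }
  return hash;
}

console.log(djb2("a")); // 177670
console.log(djb2("apple") === djb2("apple")); // true: same key, same hash
```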
One-way hashes
A cryptographic hash function is considered good when it is one-way. This means that, given only the hash value, you cannot feasibly recover the original key.
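This sketch assumes a Node.js runtime and uses the built-in `crypto` module's `createHash` to compute a SHA-256 digest. Code can't prove that a function is one-way, but it does show the shape of the result. Any input, short or long, maps to a fixed-length digest, and the API has no function that turns a digest back into its input. `sha256Hex` is a hypothetical helper name.

```typescript
import { createHash } from "node:crypto";

// Hypothetical helper: hash a string key with SHA-256 and return a hex digest.
function sha256Hex(key: string): string {
  return createHash("sha256").update(key, "utf8").digest("hex");
}

const short = sha256Hex("hello");
const long = sha256Hex("a much longer key ".repeat(100));

// Both digests are 64 hex characters (256 bits), regardless of input length.
console.log(short.length, long.length); // 64 64
console.log(short);
```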
Collisions
If a hash function yields the same hash value for two or more different keys, we end up with a collision, which is undesirable. There are ways to handle this (see this article).
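To see a collision happen, here is a deliberately weak hash. The hypothetical `charSumHash` just adds up character codes. Any two keys made of the same characters in a different order, such as anagrams, therefore produce the same hash value.

```typescript
// Hypothetical, deliberately weak hash: the sum of the key's character codes.
// Character order is ignored, so rearranged keys always collide.
function charSumHash(key: string): number {
  let sum = 0;
  for (let i = 0; i < key.length; i++) {
    sum += key.charCodeAt(i);
  }
  return sum;
}

console.log(charSumHash("listen"), charSumHash("silent")); // 655 655
console.log(charSumHash("listen") === charSumHash("silent")); // true: a collision
```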
When do we use hashing?
Implementing hash tables
Hash tables are a data structure that helps us store key/value data. They are composed of two parts: a storage structure (be it an array, an object, etc.) and a hash function.
The hash function decides where each piece of data is placed within the storage structure, so that it can be retrieved later.
If the storage is bounded (fixed-size), such as an array, the hash function must produce a value that can be used as an index and that fits within the bounds of the array.
For example, a trivial formula for computing the index of an integer key (a string key must first be converted to a number) is:
index = key % sizeOfTable
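The formula above can be sketched in TypeScript as follows. `indexForInt` applies `key % sizeOfTable` directly. A string can't be used with `%`, so `indexForString` first folds it into a number with a simple polynomial hash, reducing modulo the table size at every step. The multiplier 31 is an illustrative choice, and both function names are hypothetical.

```typescript
// Map an integer key to a slot in a fixed-size table: index = key % sizeOfTable.
// Assumes a non-negative integer key (JavaScript's % can return negatives).
function indexForInt(key: number, sizeOfTable: number): number {
  return key % sizeOfTable;
}

// Convert a string key to a slot: polynomial hash (base 31), kept in range
// by taking the remainder after each character.
function indexForString(key: string, sizeOfTable: number): number {
  let hash = 0;
  for (let i = 0; i < key.length; i++) {
    hash = (hash * 31 + key.charCodeAt(i)) % sizeOfTable;
  }
  return hash;
}

const size = 10;
console.log(indexForInt(42, size)); // 2
console.log(indexForInt(3, size), indexForInt(13, size)); // 3 3: different keys, same slot
console.log(indexForString("cat", size)); // 2
```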
This works, but with any fixed-size table we will eventually run into collisions: once there are more distinct keys than slots, at least two keys must share an index.
See here for "Collision detection and dynamic array resizing".
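The linked article isn't reproduced here, but one common way to handle both concerns is separate chaining plus resizing. In this sketch, each array slot holds a list of entries, so colliding keys simply share a slot. When the ratio of entries to slots exceeds a threshold, the array doubles in size and every entry is rehashed into it. The `ChainedHashTable` class and its 0.75 threshold are our own illustrative choices.

```typescript
type Entry<V> = { key: string; value: V };

// Hypothetical hash table: separate chaining + doubling on high load factor.
class ChainedHashTable<V> {
  private buckets: Entry<V>[][];
  private count = 0;

  constructor(initialSize = 8, private readonly maxLoadFactor = 0.75) {
    this.buckets = Array.from({ length: initialSize }, (): Entry<V>[] => []);
  }

  // Polynomial string hash reduced modulo the given number of buckets.
  private indexFor(key: string, size: number): number {
    let hash = 0;
    for (let i = 0; i < key.length; i++) {
      hash = (hash * 31 + key.charCodeAt(i)) % size;
    }
    return hash;
  }

  set(key: string, value: V): void {
    const bucket = this.buckets[this.indexFor(key, this.buckets.length)];
    const existing = bucket.find((e) => e.key === key);
    if (existing) {
      existing.value = value; // same key: overwrite in place
      return;
    }
    bucket.push({ key, value }); // empty slot or collision: append to the chain
    this.count++;
    if (this.count / this.buckets.length > this.maxLoadFactor) {
      this.resize(this.buckets.length * 2);
    }
  }

  get(key: string): V | undefined {
    const bucket = this.buckets[this.indexFor(key, this.buckets.length)];
    return bucket.find((e) => e.key === key)?.value;
  }

  get size(): number {
    return this.count;
  }

  get capacity(): number {
    return this.buckets.length;
  }

  // Move every entry into a new, larger bucket array (indices change with size).
  private resize(newSize: number): void {
    const old = this.buckets;
    this.buckets = Array.from({ length: newSize }, (): Entry<V>[] => []);
    for (const bucket of old) {
      for (const entry of bucket) {
        this.buckets[this.indexFor(entry.key, newSize)].push(entry);
      }
    }
  }
}

const table = new ChainedHashTable<number>(4);
for (let i = 0; i < 10; i++) table.set(`key${i}`, i);
console.log(table.get("key7")); // 7
console.log(table.size, table.capacity); // 10 16
```

Doubling keeps the average chain short, so `get` and `set` stay close to constant time on average, at the cost of an occasional full rehash.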
When storage is effectively unbounded, as when using a JavaScript object, we don't have to handle collisions ourselves, because the JavaScript engine takes care of them internally. We can just use some unique aspect of the data (such as an email address, phone number, or compound key) as the key.
However, if security is important, we should still use a cryptographic function to create the hash.
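Here is a minimal sketch of both ideas, assuming a Node.js runtime. A built-in `Map` lets the email address itself act as the key, since the engine resolves collisions internally. A plain object works similarly for string keys. When the raw identifier shouldn't be exposed, one option is to key on a keyed cryptographic hash of it. We use Node's `createHmac` with SHA-256 here, because anyone could reproduce a plain unkeyed hash of a guessable value like an email address. `KEY_SECRET` and `hashedKey` are hypothetical names.

```typescript
import { createHmac } from "node:crypto";

type User = { email: string; name: string };
const ada: User = { email: "ada@example.com", name: "Ada" };

// The engine hashes Map keys and resolves collisions, so the email can be the key.
const usersByEmail = new Map<string, User>();
usersByEmail.set(ada.email, ada);
console.log(usersByEmail.get("ada@example.com")?.name); // Ada

// Hypothetical secret; in practice load it from configuration, never hard-code it.
const KEY_SECRET = "replace-with-a-real-secret";

// Hypothetical helper: HMAC-SHA-256 of the identifier, as a 64-char hex string.
function hashedKey(identifier: string): string {
  return createHmac("sha256", KEY_SECRET).update(identifier, "utf8").digest("hex");
}

// Same lookup, but the stored keys no longer reveal the email addresses.
const usersByHashedEmail = new Map<string, User>();
usersByHashedEmail.set(hashedKey(ada.email), ada);
console.log(usersByHashedEmail.get(hashedKey("ada@example.com"))?.name); // Ada
```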
That leads to the final point.
Cryptography & data encryption
Hashing is a very common technique in cryptography. The most common reason we hash is to convert passwords into something safer to store than plain text before saving them to a database. We do this for two reasons:
- If an attacker steals the records from the database, they don't get the actual passwords, only the hashed ones (which cannot be directly reversed, since they were produced by a one-way hash).
- To prevent staff from accidentally viewing plain-text passwords in the production database.
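MD5 and SHA-2 are two widely used cryptographic hash functions, but neither should be used on its own to hash passwords. MD5 is considered broken, and both are designed to be fast, which makes brute-force guessing cheap. For passwords, use a salted, deliberately slow password-hashing function such as bcrypt, scrypt, Argon2, or PBKDF2.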
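For password storage specifically, the usual recommendation is a deliberately slow, salted function. This sketch assumes a Node.js runtime, whose built-in `crypto` module provides `scryptSync`, `randomBytes`, and `timingSafeEqual`. The helper names, the 16-byte salt, the 64-byte output, and the `salt:hash` storage format are illustrative assumptions. scrypt's cost parameters are left at Node's defaults, which a production system may want to tune.

```typescript
import { randomBytes, scryptSync, timingSafeEqual } from "node:crypto";

const KEY_LENGTH = 64; // bytes of derived hash (illustrative choice)

// Hypothetical helper: hash a password with a fresh random salt.
// Returns "saltHex:hashHex", which is what you'd store in the database.
function hashPassword(password: string): string {
  const salt = randomBytes(16);
  const hash = scryptSync(password, salt, KEY_LENGTH);
  return `${salt.toString("hex")}:${hash.toString("hex")}`;
}

// Hypothetical helper: re-derive the hash with the stored salt and
// compare in constant time to avoid leaking information via timing.
function verifyPassword(password: string, stored: string): boolean {
  const [saltHex, hashHex] = stored.split(":");
  const salt = Buffer.from(saltHex, "hex");
  const expected = Buffer.from(hashHex, "hex");
  const actual = scryptSync(password, salt, expected.length);
  return timingSafeEqual(actual, expected);
}

const stored = hashPassword("correct horse battery staple");
console.log(verifyPassword("correct horse battery staple", stored)); // true
console.log(verifyPassword("wrong password", stored)); // false
```

Each call draws a fresh salt, so hashing the same password twice yields different stored strings. As a result, users with identical passwords don't end up with identical database entries.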