What Is Hashing?
Hashing is a technique for converting a key into another value, typically for cryptographic or data storage purposes. It works by running a mathematical function (called a hash function) on a key to produce a new value: the hash (or hash value).
One-way hashes
Cryptographic hash functions are designed to be one-way (hence “one-way hashes”). This means it is computationally infeasible to recover the original key from the hash value.
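To make the one-way idea a little more concrete, here is a minimal sketch in Python using SHA-256 from the standard-library `hashlib` module. The language, the algorithm, and the helper name `sha256_hex` are assumptions chosen for illustration. The sketch shows that the same key always produces the same fixed-length hash and that a one-character change produces a different one. The module offers no call that turns a digest back into its key.

```python
import hashlib

def sha256_hex(key: str) -> str:
    """Return the SHA-256 hash of a string key as 64 hex characters."""
    return hashlib.sha256(key.encode("utf-8")).hexdigest()

h1 = sha256_hex("hello")
h2 = sha256_hex("hello")
h3 = sha256_hex("hellp")  # one character changed

print(h1 == h2)  # True: the same key always gives the same hash
print(len(h1))   # 64: output length is fixed, whatever the input length
print(h1 == h3)  # False: a small change gives a different hash
```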
Collisions
If a hash function yields the same hash value for two or more different keys, we have a collision. This is undesirable because those keys can no longer be told apart by their hash alone. There are ways to handle this (see this article).
When do we use hashing?
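A collision is easy to see with a deliberately weak hash function. The sketch below is a toy for illustration only: `toy_hash` is a hypothetical function that adds up character codes, so any two keys made of the same letters in a different order collide.

```python
def toy_hash(key: str) -> int:
    """Deliberately weak hash: the sum of the character codes in the key."""
    return sum(ord(ch) for ch in key)

a = toy_hash("listen")
b = toy_hash("silent")

print("listen" == "silent")  # False: the keys are different
print(a == b)                # True: but their hash values collide
```

Real hash functions mix their input far more thoroughly, which makes collisions rarer but, for a fixed-size output, never impossible.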
Implementing hash tables
Hash tables are a data structure for storing key/value data. They consist of two parts: a storage structure (an array, object, etc.) and a hash function.
The hash function is responsible for mapping each key to a location in the storage structure so that the data can be retrieved later.
If the storage is bounded (fixed), such as an array, then the hash function must generate a value that can be used as an index and that fits within the size constraints of the array.
For example, a trivial formula to compute the index from an integer key (or a string key that has first been converted to an integer) is:
index = key % sizeOfTable
This works, but for any fixed-size table we will run into collisions eventually: once there are more distinct keys than slots, at least two keys must share an index.
See here for “Collision detection and dynamic array resizing”.
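The modulo formula can be sketched in Python as follows. The helper names `to_int` and `index_for` are hypothetical, and turning a string into an integer by reading its UTF-8 bytes as one large number is just one assumed approach among many.

```python
def to_int(key) -> int:
    """Convert an integer or string key to an integer."""
    if isinstance(key, int):
        return key
    # Interpret the UTF-8 bytes of the string as one big integer.
    return int.from_bytes(str(key).encode("utf-8"), "big")

def index_for(key, size_of_table: int) -> int:
    """Map a key to a slot in the range [0, size_of_table)."""
    # Python's % returns a non-negative result for a positive modulus.
    return to_int(key) % size_of_table

SIZE = 10
print(index_for(7, SIZE))   # 7
print(index_for(17, SIZE))  # 7: collides with key 7
print(0 <= index_for("bob@example.com", SIZE) < SIZE)  # True
```

Keys 7 and 17 land in the same slot of a ten-slot table, which is exactly the collision the formula makes unavoidable once the key space is larger than the table.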
When storage is unbounded, as with a JavaScript object, we don’t have to handle collisions ourselves. We can just use some unique aspect of the data (such as an email address, phone number, or a compound key) to create the value we use as the key.
However, if security is important, we should still use a cryptographic function to create the hash.
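As a rough illustration, the sketch below uses a Python `dict` as a stand-in for the unbounded storage described above. First it keys records directly by email. Then, as one possible reading of the security advice, it derives the key with SHA-256 so the raw email is not stored as the key. The names `users`, `private_users`, and `hashed_key` are hypothetical.

```python
import hashlib

users = {}  # unbounded storage: the dict grows as needed

# Use a unique attribute of each record (here, the email) directly as the key.
users["ada@example.com"] = {"name": "Ada"}
users["bob@example.com"] = {"name": "Bob"}
print(users["ada@example.com"]["name"])  # Ada

def hashed_key(email: str) -> str:
    """Derive a lookup key from the lowercased email using SHA-256."""
    return hashlib.sha256(email.lower().encode("utf-8")).hexdigest()

# Same idea, but the stored key is a hash rather than the raw email.
private_users = {hashed_key("ada@example.com"): {"name": "Ada"}}
print(hashed_key("ADA@example.com") in private_users)  # True
```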
That leads to the final point.
Cryptography & data encryption
Hashing is a very common technique in cryptography. The most common reason we hash is to convert passwords into something more secure than plain text before we store them in a database. We do this for two reasons:
- If an attacker steals the records from the database, they don’t get the actual passwords, only the hashed ones (which cannot be reversed, since one-way hashes were used).
- To prevent staff from accidentally viewing plain-text passwords in the production database.
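MD5 and SHA-2 are two widely used cryptographic hashes, but neither is a good choice for passwords on its own: MD5 is considered broken for security purposes, and fast general-purpose hashes like SHA-2 make brute-force guessing cheap. To hash passwords, check out algorithms designed for the job, such as bcrypt, scrypt, Argon2, or PBKDF2, and use a unique salt per password.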
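As a hedged sketch of what storing a password hash can look like in practice, the Python example below uses PBKDF2 with HMAC-SHA-256 (SHA-256 is a member of the SHA-2 family) from the standard library, together with a random per-password salt. The function names, the 16-byte salt, and the `ITERATIONS` value are assumptions for illustration. Pick a work factor based on current guidance and your own hardware.

```python
import hashlib
import hmac
import os
from typing import Optional, Tuple

ITERATIONS = 200_000  # illustrative work factor, not a recommendation

def hash_password(password: str, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """Return (salt, derived_key); a fresh random salt is created if none is given."""
    if salt is None:
        salt = os.urandom(16)
    key = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, key

def verify_password(password: str, salt: bytes, expected_key: bytes) -> bool:
    """Re-derive the key with the stored salt and compare in constant time."""
    _, key = hash_password(password, salt)
    return hmac.compare_digest(key, expected_key)

# Store `salt` and `stored` in the database, never the password itself.
salt, stored = hash_password("correct horse battery staple")
print(verify_password("correct horse battery staple", salt, stored))  # True
print(verify_password("wrong guess", salt, stored))                   # False
```

The salt means two users with the same password end up with different stored values, and the iteration count makes each guess expensive for someone who has stolen the records.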