How Does a Hashing Algorithm Work?
Hash values play a central role in public-key cryptography, for example in digital signatures. A hash value is generated by applying a hashing algorithm to an input. The important thing to know about a hash value is that it should be practically infeasible to recover the original input from the hash value alone.
Here is a simplified example of how hashing works (a real hashing algorithm is far harder to reverse than this multiplication):
Input Number: 365,258
Hashing Algorithm: Input# x 124
Hash Value: 45,291,992
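The arithmetic in this example can be checked in a few lines of Python. This is only a sketch of the toy example: the names `toy_hash` and `toy_unhash` are made up for illustration, and the second function shows why plain multiplication would not be acceptable as a real hashing algorithm.

```python
MULTIPLIER = 124  # the "hashing algorithm" from the toy example


def toy_hash(number: int) -> int:
    """Toy 'hash': multiply the input by a fixed constant."""
    return number * MULTIPLIER


def toy_unhash(value: int) -> int:
    """Undo the toy hash; a single division recovers the input exactly."""
    return value // MULTIPLIER


hash_value = toy_hash(365_258)
print(f"{hash_value:,}")        # 45,291,992
print(toy_unhash(hash_value))   # 365258
```

Because one division undoes the multiplication, this toy function is not one-way. Real hashing algorithms are designed so that no such shortcut is known.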
A hashing algorithm is a cryptographic hash function that maps data of arbitrary size to a hash of a fixed size. Although these functions are designed to be one-way, meaning infeasible to invert, many hashing algorithms have been compromised over time.
Cryptographic hashes are used mostly in IT for digital signatures, password storage, file verification systems, message authentication codes and other types of authentication.
Hashes can also be used for indexing data in hash tables, for fingerprinting, for identifying files, or for detecting duplicates. The basic idea is to use a deterministic algorithm that takes an input of any length and generates a fixed-length string every time. As a result, the same input will always give the same output.
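A minimal sketch of these two properties, using SHA-256 from Python's standard `hashlib` module (the helper name `sha256_hex` is just for illustration):

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 hash of data as a hexadecimal string."""
    return hashlib.sha256(data).hexdigest()


short_input = b"a"
long_input = b"a" * 1_000_000

# Fixed length: both digests are 64 hex characters (256 bits),
# regardless of how long the input is.
print(len(sha256_hex(short_input)), len(sha256_hex(long_input)))

# Deterministic: hashing the same input twice gives the same output.
print(sha256_hex(short_input) == sha256_hex(short_input))
```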
An issue with hashing algorithms is the certainty of collisions. Because a hash is a fixed-length string, there are only finitely many possible hash values but infinitely many possible inputs, so for every input there are other inputs that generate the same hash.
If attackers succeed in creating collisions on demand, they can pass off malicious files or data as legitimate because the hash matches. A good hash function should make it extremely difficult for attackers to generate inputs that hash to the same value.
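To see why output length matters, here is a rough sketch that deliberately weakens SHA-256 by keeping only its first two bytes. With just 65,536 possible values, a collision turns up almost immediately. The truncation and the function names are assumptions made for this illustration; no collision on full SHA-256 is being claimed.

```python
import hashlib


def truncated_hash(data: bytes, n_bytes: int = 2) -> bytes:
    """Deliberately weak hash: only the first n_bytes of SHA-256."""
    return hashlib.sha256(data).digest()[:n_bytes]


def find_collision(n_bytes: int = 2):
    """Hash the messages b"0", b"1", b"2", ... until two of them collide."""
    seen = {}  # truncated hash -> first message that produced it
    counter = 0
    while True:
        message = str(counter).encode()
        digest = truncated_hash(message, n_bytes)
        if digest in seen:
            return seen[digest], message, digest
        seen[digest] = message
        counter += 1


first, second, shared = find_collision()
print(first, second, shared.hex())
```

With a 16-bit output the loop must stop after at most 65,537 messages, and in practice it stops after a few hundred. Every extra bit of output makes this search harder, which is why real hash functions use 256 bits or more.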
Hash computation should not be too cheap, as that makes it easier for attackers to compute collisions by brute force. Hashing algorithms also have to be resilient against “pre-image attacks”. Specifically, given a hash, it should be extremely difficult to work backwards through the deterministic steps and recover an input that produced it.
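When the hash function itself cannot be reversed, an attacker is left with guessing inputs until one matches. The sketch below assumes a hypothetical secret that is only a 4-digit PIN, so the whole input space can be tried in an instant. It illustrates that pre-image resistance protects only inputs that are hard to guess.

```python
import hashlib


def sha256_hex(text: str) -> str:
    """Return the SHA-256 hash of a string as hex."""
    return hashlib.sha256(text.encode()).hexdigest()


def brute_force_pin(target_hex: str):
    """Try every 4-digit PIN until one hashes to the target value."""
    for candidate in range(10_000):
        pin = f"{candidate:04d}"
        if sha256_hex(pin) == target_hex:
            return pin
    return None  # the target was not the hash of a 4-digit PIN


target = sha256_hex("4821")      # hypothetical secret PIN
print(brute_force_pin(target))   # 4821
```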
An ideal cryptographic hash function should meet the following criteria:
- it should be able to rapidly compute the hash value for any kind of data
- it should be infeasible to regenerate a message from its hash value (a brute force attack being the only option)
- it should be infeasible to find two different messages with the same hash value; in practice, each message should have its own hash.
- every modification made to a message should change the hash value. Any kind of change should result in a completely different hash. This phenomenon is called the avalanche effect.
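The avalanche effect from the last criterion can be observed directly. This sketch hashes two messages that differ in a single character with SHA-256 and counts how many of the 256 output bits differ. The messages and the helper name are arbitrary choices for illustration.

```python
import hashlib


def differing_bits(a: bytes, b: bytes) -> int:
    """Count the bit positions in which two equal-length byte strings differ."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


digest_one = hashlib.sha256(b"The quick brown fox").digest()
digest_two = hashlib.sha256(b"The quick brown fix").digest()  # one letter changed

changed = differing_bits(digest_one, digest_two)
print(f"{changed} of {len(digest_one) * 8} output bits changed")
```

For a well-behaved hash function, roughly half of the output bits are expected to flip, even though only one character of the input changed.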
Popular hashing algorithms
MD5 is one of the best-known hashing algorithms and was used extensively until it was rendered ineffective. Because of its extensive vulnerabilities, it has been compromised. With MD5, it is quite easy to manipulate a document by inserting malicious code while still producing the same hash. Its popularity was responsible for its demise: it was used so extensively that the MD5 hashes of many common inputs can now be reversed simply by searching for them on Google.
The CMU Software Engineering Institute views MD5 as being “cryptographically broken and unsuitable for further use”. It was used for many years, but now its main use consists of verifying data against unintended corruption.
Secure Hash Algorithm (SHA) is a family of cryptographic hash functions developed by the NSA. The first algorithm, SHA-0 (released in 1993), was compromised years ago. SHA-1 (1995) generates a 160-bit (20-byte) hash output. Compared with MD5, SHA-1 mainly increased the length of the hash value, from a 32-digit to a 40-digit hexadecimal number. SHA-1 was compromised in turn in 2005, when theoretical collisions were discovered, but its real fall came in 2010, when many organizations started recommending its replacement.
SHA-2 is currently the most widely used secure version. It incorporates significant changes from its predecessor, SHA-1. Its family features six hash functions: SHA-224, SHA-256, SHA-384, SHA-512, SHA-512/224, SHA-512/256.
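The output sizes of these algorithms can be compared with Python's standard `hashlib` module. This is a small sketch that covers MD5, SHA-1 and four of the SHA-2 functions; SHA-512/224 and SHA-512/256 are left out because their availability depends on how Python was built.

```python
import hashlib

ALGORITHMS = ["md5", "sha1", "sha224", "sha256", "sha384", "sha512"]


def digest_bits(name: str) -> int:
    """Return the output size of the named hash algorithm in bits."""
    return len(hashlib.new(name, b"example").digest()) * 8


for name in ALGORITHMS:
    bits = digest_bits(name)
    # Each hexadecimal digit encodes 4 bits.
    print(f"{name:>6}: {bits:3d} bits, {bits // 4:3d} hex digits")
```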
In 2006, the National Institute of Standards and Technology (NIST) began organizing a competition to find an alternative to SHA-2, with a completely different design, that would be adopted as a standard later on. SHA-3 is part of a family of hashing algorithms known as KECCAK (pronounced ketch-ak).
In spite of sharing a similar name, SHA-3 differs from SHA-2 in its internals. It is built on a mechanism known as a sponge construction, which uses a permutation of an internal state to absorb input data and then output the hash, so that each block absorbed also randomizes how later inputs are mixed in.
In the absorbing phase, the message blocks are XORed into a subset of the state, which is then transformed as a whole by the permutation. In the output phase, output blocks are read from the same subset of the state, alternated with further state transformations. SHA-3 allows overcoming the limitations of preceding algorithms. It became a standard in 2015.
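The absorb and output phases can be sketched with a toy sponge. Everything specific here is an assumption made for illustration: the 8-byte rate, the 24-byte capacity, the simplified padding, and above all the state transformation, which is a stand-in built from SHA-256 and not the real KECCAK permutation. The result is not SHA-3 and should not be used for anything real.

```python
import hashlib

RATE = 8        # bytes of state that input is XORed into (toy value)
CAPACITY = 24   # bytes of state never touched directly by input (toy value)


def transform(state: bytes) -> bytes:
    """Stand-in state transformation; NOT the real KECCAK permutation."""
    return hashlib.sha256(state).digest()  # 32 bytes = RATE + CAPACITY


def toy_sponge(message: bytes, out_len: int) -> bytes:
    # Simplified padding (an assumption): a 0x80 marker, then zeros
    # up to the next multiple of RATE.
    padded = message + b"\x80"
    padded += b"\x00" * (-len(padded) % RATE)

    state = bytes(RATE + CAPACITY)  # all-zero initial state

    # Absorbing phase: XOR each block into the first RATE bytes of the
    # state, then transform the whole state.
    for i in range(0, len(padded), RATE):
        block = padded[i:i + RATE]
        mixed = bytes(s ^ b for s, b in zip(state[:RATE], block))
        state = transform(mixed + state[RATE:])

    # Output phase: read RATE bytes at a time, transforming in between.
    output = b""
    while len(output) < out_len:
        output += state[:RATE]
        state = transform(state)
    return output[:out_len]


print(toy_sponge(b"hello", 16).hex())
print(toy_sponge(b"hello", 32).hex())  # begins with the same 16 bytes
```

Python's `hashlib` ships the real thing as `hashlib.sha3_256` and, for variable-length output, `hashlib.shake_128`.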
Hashing in Blockchain
Bitcoin's SHA256 hashing can be computed with great efficiency by Application-Specific Integrated Circuits (ASICs). Proof of Work incentivizes miners to combine their machines into pools and increase what is called “hash power”, a measure of the number of hashes a machine can compute in a given time interval.
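The idea behind Proof of Work and hash power can be sketched with a toy miner that tries nonces until a hash falls below a target. The header bytes, the 8-byte nonce encoding, the use of a single SHA-256 pass and the 12-bit difficulty are all assumptions chosen so the example runs quickly. Bitcoin's real block header format and target encoding are different.

```python
import hashlib
import time


def mine(header: bytes, difficulty_bits: int):
    """Find a nonce whose hash has at least difficulty_bits leading zero bits."""
    target = 1 << (256 - difficulty_bits)
    nonce = 0
    while True:
        digest = hashlib.sha256(header + nonce.to_bytes(8, "little")).digest()
        if int.from_bytes(digest, "big") < target:
            return nonce, digest
        nonce += 1


start = time.perf_counter()
nonce, digest = mine(b"toy block header", difficulty_bits=12)
elapsed = max(time.perf_counter() - start, 1e-9)  # guard against division by zero

print(f"nonce={nonce} hash={digest.hex()}")
print(f"roughly {(nonce + 1) / elapsed:,.0f} hashes per second")
```

The last line is a crude hash power measurement for one machine. Each additional difficulty bit doubles the expected number of hashes needed.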
Bitcoin hashes data with SHA256 by running two iterations of the algorithm in its protocol. A double SHA256 is used to mitigate length-extension attacks.
In this sort of attack, a malicious actor who knows a hash value and the length of the input that produced it can resume the hash function from the internal state that the hash value reveals. This lets the attacker compute a valid hash for the original input with extra data appended, without knowing the original input itself.
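A double SHA256 is simply the hash of a hash. A minimal sketch with Python's standard `hashlib` module (the function name `double_sha256` is chosen here for illustration):

```python
import hashlib


def double_sha256(data: bytes) -> bytes:
    """Apply SHA-256 twice: hash the data, then hash the resulting digest."""
    first_pass = hashlib.sha256(data).digest()
    return hashlib.sha256(first_pass).digest()


print(double_sha256(b"hello").hex())
```

The second pass takes the raw 32-byte digest of the first pass as its input, not the hexadecimal text.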
Ethereum uses KECCAK256, a variant of KECCAK that differs slightly from the finalized SHA-3 standard. In addition, Ethereum’s proof of work algorithm, Dagger-Hashimoto, was designed to be memory-hard, which makes it difficult to speed up with specialized hardware.
As history has shown us, hashing algorithms continuously evolve and get replaced by new ones, as the older ones are proven ineffective against attacks and ingenious hackers.