25 min read12 hours ago
–
Press enter or click to view image in full size
Cryptographic hash functions are foundational to the construction of practical cryptographic systems. A hash function transforms input data of arbitrary length into a fixed-length message digest that is easy to compute but impossible to invert. Once a digest is produced, recovering the original message is computationally infeasible. Hash functions are widely used in cryptographic applications, including verifying data integrity, authenticating messages, *generating deterministic randomness, *and *deriving keys *for secure communication protocols.
For many years, the SHA-2 family served as the primary cryptographic hash standard. However, advances in cryptanalysis and the growing security requireme…
25 min read12 hours ago
–
Press enter or click to view image in full size
Cryptographic hash functions are foundational to the construction of practical cryptographic systems. A hash function transforms input data of arbitrary length into a fixed-length message digest that is easy to compute but impossible to invert. Once a digest is produced, recovering the original message is computationally infeasible. Hash functions are widely used in cryptographic applications, including verifying data integrity, authenticating messages, *generating deterministic randomness, *and *deriving keys *for secure communication protocols.
For many years, the SHA-2 family served as the primary cryptographic hash standard. However, advances in cryptanalysis and the growing security requirements for more flexible constructions led to the development of a new standard. In 2015, NIST published FIPS 202, which defined the SHA-3 family of hash functions and introduced a new class of primitives known as extendable-output functions (XOFs).
Unlike earlier designs based on compression functions, SHA-3 is built on a general framework called the sponge construction. The sponge absorbs input data into a large internal state and later squeezes out the output bytes as needed. This design allows the same construction to support both fixed-length hashes (such as SHA-3–256) and variable-length outputs (such as SHAKE128 and SHAKE256).
This article provides a detailed design of the FIPS 202 standard. I’ll begin by explaining the sponge construction and its security properties, then examine how SHA-3 achieves diffusion and irreversibility through its internal permutation. Along the way, I’ll clarify which components of the design are reversible in isolation and why the overall construction remains a *one-way *function. Finally, the SHA-3 and SHAKE design will highlight that they are particularly well-suited for modern cryptographic systems, including post-quantum cryptography.
What is a Cryptographic Hash Function?
A cryptographic hash function is a deterministic algorithm that maps arbitrary-length input data to a fixed-length output, known as a message digest. The foundational mechanism of a secure hash function is that even a slight change in the input, such as flipping a single bit, produces an entirely different output, a behaviour commonly referred to as the avalanche effect.
Despite the sensitivity, the function remains fully deterministic; for a given input, the resulting digest is always the same. This combination of determinism and strong diffusion makes hash functions reliable fingerprints for data.
Unlike general-purpose checksums, cryptographic hash functions are built to satisfy specific security properties. A secure hash function must be computationally infeasible to invert, meaning that given a hash output, it should be impractical to recover the original input.
Preimage resistance: Given a hash value h, it is infeasible to find any message m such that hash(m) = h.
**Second-preimage resistance: **Given a message m, it is infeasible to find a different message m2, such that hash(m1) = hash(m2).
**Collision resistance: **It is infeasible to find any two distinct messages m1 and m2, such that hash(m1) = hash(m2).
These security properties make hash functions suitable for security-critical applications that require integrity, authenticity, and unpredictability.
SHA-2 Overview (FIPS 180–4)
The SHA-2 family of hash functions has long served as the primary cryptographic hashing standard in modern security systems. It includes widely deployed algorithms such as SHA-224, SHA-256, SHA-384, and SHA-512. The SHA-2 family is standardized by NIST in FIPS 180–4 (Secure Hash Standard).
SHA-2 hash functions are built using the Merkle-Damgård construction, in which input data is processed in fixed-size blocks by a compression function. Each block updates an internal chaining state, and after the final block is processed, the resulting state is output as the message digest. The Merkle-Damgård construction has been extensively analyzed and has proven suitable to meet the security goals of the SHA-2 message digest.
This article does not cover the internal compression function, message schedules, or execution flow of the SHA-2 hash functions. Readers interested in the detailed design and implementation of SHA-2 are encouraged to refer to the official NIST specification:
**NIST FIPS 180–4 (Secure Hash Standard) ** https://nvlpubs.nist.gov/nistpubs/fips/nist.fips.180-4.pdf
SHA-3 and Sponge Construction
SHA-3 is the latest member of the Secure Hash Algorithm family, standardized by NIST in FIPS 202. Unlike** SHA-2**, which evolved from earlier hash designs, SHA-3 is built on a completely different internal model.
At a high level, SHA-3 provides the same essential security guarantees as earlier hash functions: preimage-resistance, collision resistance, and strong diffusion, while also introducing a more flexible framework for constructing cryptographic digests. Instead of processing input through a chain of compression steps, SHA-3 operates on a large internal state and repeatedly applies a single cryptographic permutation. This design approach is known as the sponge construction.
In the Sponge model, input data is absorbed into the internal state in a fixed-size block, and output data is later squeezed from the same state. The same underlying construction supports both fixed-length hash functions (such as SHA-3–256) and extendable-output functions (such as SHAKE128 and SHAKE256).
Extendable-Output Functions (XOFs)
An extendable-output function generalizes the concept of a hash function by allowing the output length to be chosen arbitrarily. Instead of producing a fixed-size digest, an XOF can generate as many output bytes as required.
SHAKE128 and SHAKE256, defined in FIPS 202, are examples of XOFs. They are useful for applications that require deterministic pseudorandom output, such as key derivation, masking, and post-quantum cryptography* algorithms.*
Merkle–Damgård vs Sponge Construction
To understand why SHA-3 behaves differently from SHA-2, it is useful to compare the underlying constructions that define how input data is processed and how output is produced.
Merkle–Damgård Construction (SHA-2)
The Merkle–Damgård construction processes the input data sequentially in fixed-size blocks using a compression function. Each block is combined with an internal chaining value to produce a new chaining value, which is then fed into the next level. After the final block is processed, the resulting chaining value becomes the hash digest. This approach is efficient and well understood, but it’s tightly coupled to the hash function and to a fixed output size, and it introduces structural properties such as length-extension behaviour.
Merkle–Damgård hashes form a chain in which each block cryptographically links to the previous one, and the final block generates a hash that leaves a new state, which can be used to extend the chain.
As a result, specific features, such as variable length or domain separation, must be implemented outside the core construction.
Sponge Construction (SHA-3)
The Sponge construction replaces the compression pipeline with a large internal state that is reused throughout the computation. Instead of compressing blocks into a chaining value, the sponge alternates between two phases:
Absorbing: input blocks are mixed into the internal state and XORed with the rate portion of the state, then permuted.
Squeezing: output blocks are extracted from the same state and read from the rate portion.
The internal state is divided into two logical regions:
- Rate: the portion of the state that interacts with input and output.
- Capacity: the portion of the state preserve for security and never directly exposed.
The sponge design model changes the behaviour of how data flows through a hash function. Instead of chaining states forward, the sponge continuously evolves a single state, with security determined by how much of that state remains hidden.
Press enter or click to view image in full size
SHA-3 Sponge Construction Process
Keccak-p Permutations: The Engine Behind SHA-3
At the core of SHA-3 lies a single cryptographic primitive: a permutation applied repeatedly to a fixed-size internal state. This family of permutations is known as Keccak-p[b, nr], where b denotes the width of the state in bits and nr denotes the number of rounds applied to the permutation.
Valid state width includes b ∈ {25, 50, 100, 200, 400, 800, 1600}, with Keccak-f[1600] — a 1600-bit permutation with 24 rounds being the variant standardized in FIPS 202. All the diffusion, mixing, and security properties of SHA-3 ultimately rely on the repeated permutation of Keccak-f across many rounds.
The Keccak permutation is a bijective transformation: it deterministically rearranges and mixes bits. The permutation can be inverted with the given full internal state. At first glance, this may appear counterintuitive for a cryptographic hash function, which is expected to be one-way. However, this reversibility is not a weakness — it is a deliberate and essential design choice.
In SHA-3, irreversibility does not come from the permutation itself, but from the way the permutation is embedded within the sponge construction. Only a portion of the internal state, known as the rate, is ever exposed to input or output. The remaining capacity is never directly revealed. Because an attacker never observes the full state, reversing the permutation in practice becomes computationally infeasible, even though the permutation is mathematically invertible.
Keccak State
Keccak-f[1600] operates on a 1600-bit internal state, organized as a 5x5 matrix of 64-bit lanes.
Get Deeptiman Pattnaik’s stories in your inbox
Join Medium for free to get updates from this writer.
Each element of A[x, y, z] is a 64-bit word, often called a lane. This structure layout allows Keccak to mix data across rows, columns, and diagonals in a highly mechanical way. The entire state is treated as a single object: the permutation operates on all 1600 bits in 24 rounds.
Let A be the state array and to define a lane(x, y) computation is done as follows:
A[x ,y, z] where 0 ≤ x < 5, 0 ≤ y < 5, 0 ≤ z < w,
where
— x, y → lane coordinates in the 5x5 state.
— z → bit index wihtin the lane.
— w →64-bit word for Keccak-f[1600]
// lane[x, y] — -> lane[x + 5y] func idx(x, y int) int { return x + 5*y }
lane(x, y) = A[idx(x, y)] // 64-bit word
Keccak-p vs Keccak-f
The notation Keccak-p[b, nr] refers to a permutation operating on a b-bit state for nr rounds. And the ***Keccak-f ***variants are fixed, full-round permutations defined for specific state sizes. Each round applies the same sequence of transformation steps with round-specific constants.
For SHA-3: — State size: 1600 bits (b) — Number of rounds: 24 (nr) —Permutation: Keccak-f[1600]
The Five-Step Mappings
Each round of the ***Keccak-f ***permutation consists of five transformations in the following order:
1. θ (Theta) — mixes columns
For each column, θ computes the XOR of all five lanes in that column and then propagates this value into neighbouring columns. This operation ensures that every bit is touched by multiple columns in subsequent rounds.
— Spreads input differences across the entire state. — Provides global diffusion. — Preserves linearity.
// idx maps 2D Keccak coordinates (x, y) into a linear index.//// The state is stored in row order://// l[x, y] ---> l[x + 5y]func idx(x, y int) int { return x + 5*y }// Theta implements the θ step of Keccak.//// Theta mixes the bit with the parity of its column and adjacent columns.// This step provides diffusion across the entire state.func (l *Lanes) Theta() { var col, diagonal [5]uint64 for x := 0; x < 5; x++ { col[x] = l[idx(x, 0)] ^ l[idx(x, 1)] ^ l[idx(x, 2)] ^ l[idx(x, 3)] ^ l[idx(x, 4)] } // Compute the diagonal correction for each column. for x := 0; x < 5; x++ { diagonal[x] = col[(x+4)%5] ^ bits.RotateLeft64(col[(x+1)%5], 1) } // Apply the correction to every lane in the column. for x := 0; x < 5; x++ { for y := 0; y < 5; y++ { l[idx(x, y)] ^= diagonal[x] } }}
2. ρ (Rho) — Bitwise Rotation
The ρ step rotates each lane by a fixed lane-specific offset. Each of the 25 lanes is rotated left by a predetermined number of bits. These offsets are chosen so that bits are redistributed across different positions in later steps.
— Breaks bits alignment. — Preserves bit values, only changes positions.
// Rho implements the ρ steps of Keccak.//// Each lane is rotated left by a position-specific offset.// This step moves bits within each lane to different bit positions.func (l *Lanes) Rho(rhoOffsets [25]int) { for i := 0; i < 25; i++ { l[i] = bits.RotateLeft64(l[i], rhoOffsets[i]) }}// RhoOffsets computes the rotation left by a fixed amount that depends on its position.// The offsets are defined by the Keccak specification and ensure that bits are spread// across different bit positions in subsequent rounds.func RhoOffsets() [25]int { var rho [25]int x, y := 1, 0 for t := 0; t < 24; t++ { rho[x+5*y] = ((t + 1) * (t + 2) / 2) % 64 x, y = y, (2*x+3*y)%5 } return rho}
3. π (Pi) — permutes lane positions
The π step permutes the positions of the lanes within the 5x5 matrix. Each lane is moved to a new location according to a fixed mapping. The rearrangement ensures that bits from one column are prepositioned in different columns in later rounds.
— Shuffles structural relationships of the bits in the column. — Prepares the state for non-linear mixing.
// Pi implements the π step of Keccak.//// Pi permutes the positions of the lanes within the state, rearranging them to break column-wise// structure.func (l *Lanes) Pi() { var a [25]uint64 for x := 0; x < 5; x++ { for y := 0; y < 5; y++ { a[idx(y, (2*x+3*y)%5)] = l[idx(x, y)] } } *l = a}
4. χ (Chi) — introduces non-linearity
The (x) step is the only non-linear operation in the Keccak permutation. It operates on each row independently and updates each lane based on itself and its neighbours. This stage introduces conditional dependencies between lanes and breaks linear relationships created by the earlier stage.
The χ step can be understood as a controlled three-stage disruption. It first inverts a neighbouring lane, then conditionally gates information through an AND operation, and finally XORs the result back into the original state. Together, these operations break the linear structure and prevent the output from being presented as a linear function of the inputs.
— Destroys the linear correlation or transformation. — Prevents algebraic reconstruction. — Provides cryptographic confusion.
Swap under the Tunnel (Intuition)
As an intuition χ step can be compared like a controlled swap under a tunnel. Just during a heist, swaps three identical trucks inside the tunnel so observers lose track of which one carries the payload, χ temporarily obsures the relationship between neighbouring lanes. Each bit exits the round carrying the information from multiple sources, but without any reliable indication of where it originated.
// Chi implements the χ step of Keccak.//// Chi is the only non-linear step. It operates on each row independently combining bits using// AND, NOT, and XOR operations.func (l *Lanes) Chi() { for y := 0; y < 5; y++ { var row [5]uint64 for x := 0; x < 5; x++ { row[x] = l[idx(x, y)] } for x := 0; x < 5; x++ { // Triple bluff: // // 1. Negation (^row[(x+1)%5]) inverts the bits of the neighbouring // lane breaking linear expectations. // // 2. AND creates (^row[(x+1)%5] & row[(x+2)%5]) conditional dependency // on two neighbours. A bit is returned only if both conditions are met. // // 3. XOR with the original lane (⊕ row[x]) injects this conditional mask // to back into the state but output cannot be recomputed as a linear function // of the inputs. l[idx(x, y)] = row[x] ^ (^row[(x+1)%5] & row[(x+2)%5]) } }}
5. ι (Iota) — injects round constants
The ι step XORs a round-specific constant into one of the states. This prevents the permutation from exhibiting symmetry or fixed-points across rounds. Without (ι), certain structured inputs could evolve identically across rounds.
The ι step injects a round-specific constant that acts as an identity marker for each permutation round. Much like a VISA stamp marks each entry and exit across the border, this constant ensures that each round is uniquely tagged. Although the same permutation logic is applied repeatedly, the injected constants prevent the state from evolving symmetrically or cycling across rounds.
// round constantsvar RC = [24]uint64{ 0x0000000000000001, 0x0000000000008082, 0x800000000000808A, 0x8000000080008000, 0x000000000000808B, 0x0000000080000001, 0x8000000080008081, 0x8000000000008009, 0x000000000000008A, 0x0000000000000088, 0x0000000080008009, 0x000000008000000A, 0x000000008000808B, 0x800000000000008B, 0x8000000000008089, 0x8000000000008003, 0x8000000000008002, 0x8000000000000080, 0x000000000000800A, 0x800000008000000A, 0x8000000080008081, 0x8000000000008080, 0x0000000080000001, 0x8000000080008008,}// Strike applies the Iota(ι) step of Keccak.//// The round constant is XORed into lane (0, 0) to break symmetry between rounds.func (l *Lanes) Strike(round int) { l[0] ^= RC[round]}
Non-linearity: The Core of Keccak’s Security
The first three stages, θ, ρ, and π, spread information efficiently across the state; they are all linear operations. The χ step breaks the linear relationship between bits across neighbouring columns.
— Output bits are no longer linear combinations of input bits. — Small differences become conditionally dependent on neighbouring values. — After (χ) stage, algebraic relations between bits are destroyed.
In practical terms, χ breaks all meaningful relationships between adjacent and diagonal lanes, producing values that no longer resemble their neighbours. This is where structured diffusion becomes cryptographic confusion.
Keccak’s non-linearity is best understood as boxes over boxes rather than boxes inside boxes. Each round overlays conditional behaviour onto the current state, creating intermediate values whose apparent structure is intentionally deceptive and only gains meaning as it propagates through later rounds.
SHA-3 Permutation Mental Model and Why 24 Rounds?
The sponge construction works like reading a book where several critical pages are permanently missing. No matter how many times you read the visible pages (the rate), one can never reconstruct the missing ones (the capacity). The permutation rearranges all pages each time, but the hidden ones are never revealed.
The Keccak-f[1600] permutation used by SHA-3 applies 24 rounds of transformation. This round count is fixed by the FIPS 202 standard and was selected as a security margin based on extensive cryptanalysis during the SHA-3 competition.
Internal State Partitions: Rate vs Capacity
The security of SHA-3 does not come from secrecy nor from the complexity of its Keccak mechanical permutation. Instead, it is defined by how the internal state is partitioned into rate and capacity, and by how these two portions interact within the sponge construction.
State (1600 bits)
**Rate (r bits): ** 1. The portion of the state that absorbs input and produces output. 2. Input data is XORed into the rate portion, and then a permutation is applied. 3. Output bytes are read from the rate portion.
**Capacity (c bits): ** 1. The portion of the state that is never directly exposed. 2. The capacity contains bits that are never revealed, even during the output. 3. The resistance of SHA-3 to generic attacks is directly tied to the size of the capacity.
In practical terms: — Collision resistance: 2^(c/2) , where c = capacity bits which remains hidden — Preimage resistance: 2^c
// debugCapacity returns the capacity lanes.// FOR TESTING / DEMONSTRATION ONLY.func (s *State) debugCapacity() []uint64 { rateLanes := s.rate / 8 return s.lanes[rateLanes:]}func TestCapacityHidden(t *testing.T) { s := NewShake128() s.Write([]byte("Hello World")) s.Read(make([]byte, 32)) capacity := s.debugCapacity() for i, lane := range capacity { t.Logf("capacity lane %d: %016x", i, lane) }}
Press enter or click to view image in full size
Keccak Lane View (Rate vs Capcity)
SHAKE128 vs SHAKE256
Both SHAKE128 and SHAKE256 are extendable-output functions (XOFs) built on the same Keccak-f[1600] permutation and the same sponge construction. They differ only in how the 1600-bit state is split between rate and capacity.
Press enter or click to view image in full size
SHAKE128 and SHAKE256 Properites
Press enter or click to view image in full size
SHAKE128
Press enter or click to view image in full size
SHAKE256
// xorIn XORs a rate-sized block into the Keccak state.//// The input block is interpreted as little-endian 64-bit words and XORed lane-by-lane into the state.// Only the first rate/8 lanes are modified, the remaining lanes forms the sponge capacity and are// untouched.//// This function is used during the absorbing phase of the sponge.func (s *State) xorIn(block []byte) { bits := len(block) / 8 for i := 0; i < bits; i++ { s.lanes[i] ^= binary.LittleEndian.Uint64(block) block = block[8:] }}// copyOut copies output from the Keccak state into the provided buffer.//// State lanes are written in little-endian order one lane (8 bytes) at a time. And only the rate// portion of the state is exposed, capacity lanes are never copied out.//// This function is used during the squeezing phase of the sponge.func (s *State) copyOut(b []byte) { for i := 0; len(b) >= 8; i++ { binary.LittleEndian.PutUint64(b, s.lanes[i]) b = b[8:] }}
SHA-3 Padding: Domain Separation and Message Finalization Bit (0x80)
SHA-3 uses a carefully designed padding scheme that serves two distinct but equally important purposes: domain separation and the final padding bit, which marks the end of the message.
Domain Separator Byte
The domain-separator byte prepositions the hash block and enforces alignment between the input and the intended construction. Its purpose is to ensure that the same message, when processed under different hash function paradigms built on the same permutation, cannot collide or be misinterpreted.
The main purpose of the domain-separator byte is to identify which hash-function construction is being used on top of the Keccak permutation.
— SHA-3 fixed-length hashes use 0x06
— SHAKE extendable-output functions use 0x1F
Although SHA-3 and SHAKE share the same Keccak-f[1600] permutation, they must operate in distinct cryptography domains. Let’s run a quick test with the input message “Hello World” against all SHA-3 and SHAKE hash functions; the output blocks will differ.
func TestHelloWorldHash(t *testing.T) { msg := "Hello World" for _, paradigm := range []string{"SHA3-224", "SHA3-256", "SHA3-384", "SHA3-512", "SHAKE128", "SHAKE256"} { var s State switch paradigm { case "SHA3-224": s = New224() case "SHA3-256": s = New256() case "SHA3-384": s = New384() case "SHA3-512": s = New512() case "SHAKE128": s = NewShake128() case "SHAKE256": s = NewShake256() } h := make([]byte, 64) msgBytes := []byte(msg) s.Write(msgBytes) s.Read(h[:]) fmt.Println("Paradigm = ", paradigm, " - hashBlocks = ", h) } }Output:Paradigm = SHA3-224 - hashBlocks = [142 128 0 121 160 179 17 120 139 242 147 83 244 0 239 249 105 182 80 163 89 124 145 239 217 170 91 56 125 135 205 161 200 196 70 237 97 88 224 135 171 14 236 25 67 234 56 120 117 120 244 60 201 140 76 254 230 127 29 227 56 242 112 165]Paradigm = SHA3-256 - hashBlocks = [225 103 246 141 101 99 215 91 178 95 58 164 156 41 239 97 45 65 53 45 192 6 6 222 124 189 99 11 178 102 95 81 132 219 99 6 49 203 184 44 101 25 48 246 53 216 231 42 10 207 211 88 124 178 239 16 79 25 117 119 163 162 54 178]Paradigm = SHA3-384 - hashBlocks = [167 142 194 133 30 153 22 56 206 80 93 74 68 239 166 6 221 64 86 211 171 39 78 198 253 186 192 12 222 22 71 130 99 239 114 19 186 213 167 219 112 68 245 141 99 122 253 235 108 172 131 32 214 216 18 154 63 218 8 195 223 146 116 131]Paradigm = SHA3-512 - hashBlocks = [61 88 167 25 198 134 107 2 20 249 107 10 103 179 126 81 169 30 35 60 224 190 18 106 8 243 95 223 76 4 60 97 38 244 1 57 191 188 51 141 68 235 42 3 222 159 123 184 239 240 172 38 11 54 41 129 30 56 154 95 190 232 168 148]Paradigm = SHAKE128 - hashBlocks = [18 39 197 248 130 249 197 123 242 227 228 141 44 135 235 32 243 130 164 182 57 181 77 38 246 213 149 255 61 185 6 77 7 78 231 136 240 116 124 163 252 70 206 134 147 108 252 107 211 99 141 174 90 43 125 101 146 82 41 152 214 218 111 170]Paradigm = SHAKE256 - hashBlocks = [132 13 28 232 26 67 39 132 11 84 203 29 65 153 7 253 31 98 53 155 173 51 101 110 5 134 83 210 228 23 42 67 172 201 88 219 236 12 240 212 115 219 69 140 225 192 7 170 110 180 14 172 146 170 14 101 32 46 219 77 127 238 211 120]
Final Padding Bit (0x80)
The final padding step in SHA-3 applies an XOR with 0x80 to the last byte of the rate buffer. This sets the most significant bit of the final rate byte, corresponding to the closing 1 bit in the multi-rate padding scheme pad10*1.
This final 0x80 bit acts as an explicit end-of-message delimiter. It guarantees that every padded message has a unique representation and that no padded message can be a prefix of another.
Without this final bit, different messages could produce identical padded states. Even if a domain-separator byte is present, omitting the final 0x80 would allow distinct inputs to collide after padding, breaking the injective property required for secure hashing.
SHA-3 and SHAKE in the Post-Quantum Cryptography schemes
In modern cryptography, Post-quantum cryptography schemes rely heavily on cryptographic hash functions, not only for integrity but also as core building blocks. Unlike the classical public-key systems, PQC scheme constructions are built around structured randomness, deterministic randomness, and domain-separated hashing.
SHA-3 and its extendable-output variants (SHAKE128 and SHAKE256) are particularly well-suited for these requirements. The sponge construction provides a clean, flexible interface for absorbing structured input and deterministically expanding it into large, pseudorandom outputs. This capability is essential in post-quantum schemes where small seeds must be expanded into vectors, matrices, or challenges with well-defined statistical properties. Domain separation is also an important aspect when building Post-quantum protocols, which frequently reuse hash functions across multiple security layers within the same algorithm. This additional byte separates hash function results even though the underlying permutation is the same.
In later articles, I’ll provide design implementation of how SHA-3 and SHAKE serve as fundamental components in lattice-based cryptography, where hash functions are used for key generation, encapsulation, and signature mechanisms.
Appendix: Full SHA-3 Source Code
package sha3import ( "encoding/binary")// SpongePhase represents the current phase of the sponge construction.type SpongePhase intconst ( // Absorbing indicates that input data is being absorbed. Absorbing SpongePhase = iota // Squeezing indicates that output data is being produced. Squeezing)const ( // Maximum rate in bytes (SHAKE128). maxRateBytes int = 168 // Domain separation bytes. // These distinguish SHAKE from fixed-length SHA-3 hashes. dsbyteShake = 0x1f dsbyteShake2 = 0x06 // Default output sizes for fixed-length SHA-3 hashes. outputLenSHA3_224 = 28 outputLenSHA3_256 = 32 outputLenSHA3_384 = 48 outputLenSHA3_512 = 64 // SpongePhase rates in bytes for each SHA-3 variants. rate128 = 168 rate224 = 144 rate384 = 104 rate512 = 72 rate256 = 136)// State represents the internal state of the SHA-3 / SHAKE sponge.type State struct { // lanes: Keccak-f[1600] state (5x5 lanes of 64 bits). lanes keccak.Lanes // rate: Number of bytes absorbed/sequeezed per block. rate int // dsbyte: Domain-separator byte. dsbyte byte // bufLen: Number of buffered bytes. bufLen int // rateBuf: Maximum rate-sliced buffer bytes. rateBuf [maxRateBytes]byte // outputLen: Default output size for fixed-length hashes. outputLen int // phase: Current sponge phase. phase SpongePhase}// NewShake128 returns a new SHAKE128 sponge state.func NewShake128() State { return State{rate: rate128, dsbyte: dsbyteShake, phase: Absorbing}}// NewShake256 returns a new SHAKE256 sponge state.func NewShake256() State { return State{rate: rate256, dsbyte: dsbyteShake, phase: Absorbing}}// New224 returns a new SHA-3-224 sponge state.func New224() State { return State{rate: rate224, dsbyte: dsbyteShake2, outputLen: outputLenSHA3_224}}// New256 returns a new SHA-3-256 sponge state.func New256() State { return State{rate: rate256, dsbyte: dsbyteShake2, outputLen: outputLenSHA3_256}}// New384 returns a new SHA-3-384 sponge state.func New384() State { return State{rate: rate384, dsbyte: dsbyteShake2, outputLen: outputLenSHA3_384}}// New512 returns a new SHA-3-512 sponge state.func New512() State { return State{rate: rate512, dsbyte: dsbyteShake2, outputLen: outputLenSHA3_512}}// ShakeSum256 computes a SHAKE256 digest of the data into hash.func ShakeSum256(hash, data []byte) { h := NewShake256() _, _ = h.Write(data) h.Read(hash)}// Write absorbs input data into the sponge.//// Data is buffered until a full rate of block is available then XORed into the Keccak state followed// by a permutation.func (s *State) Write(data []byte) (written int, err error) { var block []byte written = len(data) for len(data) > 0 { block, data = s.absorbBytes(data) copy(s.rateBuf[s.bufLen:], block) s.bufLen += len(block) if s.bufLen == s.rate { s.permute() } } return written, nil}// Read squeezes output bytes from the sponge.//// If the sponge is still in the absorbing phase, padding is applied and the state transitioned to// squeezing.func (s *State) Read(out []byte) { if s.phase == Absorbing { s.padAndPermute(s.dsbyte) } consumed := 0 // track how many bytes of the current absorbed block have been consumed. for len(out) > 0 { n := copy(out, s.buf()) consumed += n out = out[n:] if consumed == s.bufLen { s.permute() } }}// absorbBytes returns the maximum input slice that fits into the rate buffer.func (s *State) absorbBytes(input []byte) ([]byte, []byte) { maxRate := s.rate - s.bufLen if maxRate > len(input) { maxRate = len(input) } return input[:maxRate], input[maxRate:]}// buf returns the active portion of the rate buffer.func (s *State) buf() []byte { return s.rateBuf[:s.bufLen]}// padAndPermute applies domain separation and multi-rate padding then transition the sponge from// absorbing to squeezing.func (s *State) padAndPermute(dsbyte byte) { padIndex := s.bufLen s.bufLen = s.rate buf := s.buf() buf[padIndex] = dsbyte for i := padIndex + 1; i < s.rate; i++ { buf[i] = 0 } // Final padding bit. buf[s.rate-1] ^= 0x80 // XORing the last-1-byte s.permute() s.phase = Squeezing s.bufLen = s.rate s.copyOut(buf)}// permute applies the Keccak-f[1600] permutation depending on sponge phase.func (s *State) permute() { switch s.phase { case Absorbing: // XOR buffered input into the state then permute. s.xorIn(s.buf()) s.bufLen = 0 s.lanes.PermuteWith1600() case Squeezing: // Permute state and extract new output block. s.lanes.PermuteWith1600() s.bufLen = s.rate s.copyOut(s.buf()) }}// clone returns a shallow copy of the sponge state.func (s *State) clone() *State { ret := *s return &ret}// Sum finalizes the sponge and returns the fixed-length hash output.func (s *State) Sum(in []byte) []byte { dup := s.clone() hash := make([]byte, dup.outputLen) dup.Read(hash) return append(in, hash...)}// Reset clears the sponge state for reuse.func (s *State) Reset() { for i := range s.lanes { s.lanes[i] = 0 } s.phase = Absorbing s.bufLen = 0}// xorIn XORs a rate-sized block into the Keccak state.//// The input block is interpreted as little-endian 64-bit words and XORed lane-by-lane into the state.// Only the first rate/8 lanes are modified, the remaining lanes forms the sponge capacity and are// untouched.//// This function is used during the absorbing phase of the sponge.func (s *State) xorIn(block []byte) { bits := len(block) / 8 for i := 0; i < bits; i++ { s.lanes[i] ^= binary.LittleEndian.Uint64(block) block = block[8:] }}// copyOut copies output from the Keccak state into the provided buffer.//// State lanes are written in little-endian order one lane (8 bytes) at a time. And only the rate// portion of the state is exposed, capacity lanes are never copied out.//// This function is used during the squeezing phase of the sponge.func (s *State) copyOut(b []byte) { for i := 0; len(b) >= 8; i++ { binary.LittleEndian.PutUint64(b, s.lanes[i]) b = b[8:] }}
Appendix: Full Keccak Permutation Source Code
package keccakimport "math/bits"// Lanes represents the 1600-bit Keccak state as 25 lanes of 64 bits each.//// The state is conceptually a 5x5 matrix of lanes://// l[x, y] with 0 <= x,y < 5//// It is stored in linear form using the mapping://// l[x, y] --> lanes[x + 5*y]//// This layout matches the Keccak-f[1600] specification and is used by all permutation// steps (θ, ρ, π, χ, ι).type Lanes [25]uint64// idx maps 2D Keccak coordinates (x, y) into a linear index.//// The state is stored in row order://// l[x, y] ---> l[x + 5y]func idx(x, y int) int { return x + 5*y }// RhoOffsets computes the rotation left by a fixed amount that depends on its position.// The offsets are defined by the Keccak specification and ensure that bits are spread// across different bit positions in subsequent rounds.func RhoOffsets() [25]int { var rho [25]int x, y := 1, 0 for t := 0; t < 24; t++ { rho[x+5*y] = ((t + 1) * (t + 2) / 2) % 64 x, y = y, (2*x+3*y)%5 } return rho}// Theta implements the θ step of Keccak.//// Theta mixes the bit with the parity of its column and adjacent columns.// This step provides diffusion across the entire state.func (l *Lanes) Theta() { var col, diagonal [5]uint64 for x := 0; x < 5; x++ { col[x] = l[idx(x, 0)] ^ l[idx(x, 1)] ^ l[idx(x, 2)] ^ l[idx(x, 3)] ^ l[idx(x, 4)] } // Compute the diagonal correction for each column. for x := 0; x < 5; x++ { diagonal[x] = col[(x+4)%5] ^ bits.RotateLeft64(col[(x+1)%5], 1) } // Apply the correction to every lane in the column. for x := 0; x < 5; x++ { for y := 0; y < 5; y++ { l[idx(x, y)] ^= diagonal[x] } }}// Rho implements the ρ steps of Keccak.//// Each lane is rotated left by a position-specific offset.// This step moves bits within each lane to different bit positions.func (l *Lanes) Rho(rhoOffsets [25]int) { for i := 0; i < 25; i++ { l[i] = bits.RotateLeft64(l[i], rhoOffsets[i]) }}// Pi implements the π step of Keccak.//// Pi permutes the positions of the lanes within the state, rearranging them to break column-wise// structure.func (l *Lanes) Pi() { var a [25]uint64 for x := 0; x < 5; x++ { for y := 0; y < 5; y++ { a[idx(y, (2*x+3*y)%5)] = l[idx(x, y)] } } *l = a}// Chi implements the χ step of Keccak.//// Chi is the only non-linear step. It operates on each row independently combining bits using// AND, NOT, and XOR operations.func (l *Lanes) Chi() { for y := 0; y < 5; y++ { var row [5]uint64 for x := 0; x < 5; x++ { row[x] = l[idx(x, y)] } for x := 0; x < 5; x++ { // Triple bluff: // // 1. Negation (^row[(x+1)%5]) inverts the bits of the neighbouring // lane breaking linear expectations. // // 2. AND creates (^row[(x+1)%5] & row[(x+2)%5]) conditional dependency // on two neighbours. A bit is returned only if both conditions are met. // // 3. XOR with the original lane (⊕ row[x]) injects this conditional mask // to back into the state but output cannot be recomputed as a linear function // of the inputs. l[idx(x, y)] = row[x] ^ (^row[(x+1)%5] & row[(x+2)%5]) } }}// Strike applies the Iota(ι) step of Keccak.//// The round constant is XORed into lane (0, 0) to break symmetry between rounds.func (l *Lanes) Strike(round int) { l[0] ^= RC[round]}// Round applies the one round of the Keccak-f[1600] permutation.//// Each round consist of:// θ → ρ → π → χ → ιfunc (l *Lanes) Round(round int) { l.Theta() l.Rho(RhoOffsets()) l.Pi() l.Chi() l.Strike(round)}// PermuteWith1600 applies the full Keccak-f[1600] permutation.//// It executes 24 rounds over the 1600-bit state.// This permutation is used by SHA-3 and SHAKE.func (l *Lanes) PermuteWith1600() { for r := 0; r < 24; r++ { l.Round(r) }}
Conclusion
SHA-3 represents a fundamental shift in the design of cryptographic hash functions. Abandoning the conventional *block-chaining *construction in SHA-2 and adopting an entirely different approach with an internal state and a mechanical permutation, SHA-3 achieves strong diffusion, clean domain separation, and flexible output generation within a unified framework.
SHA-3 derives its security not from hiding linear structure, but from exposing it. The linear diffusion rules of the Keccak permutation are fully public and reversible; security arises because this known linear diffusion is repeatedly conditioned by carefully placed non-linear transformations that irreversibly entangle the internal state. As a result, linear reasoning spreads information efficiently but cannot be composed to recover or predict the hidden state. In SHA-3, non-linearity is the mechanism that makes the diffusion one-way.