Silent data corruption — bit rot, undetected disk errors, firmware bugs, faulty memory, incomplete writes — is a real and measurable phenomenon in any storage system operating at scale. Research from CERN, Google, and NetApp has documented corruption rates that, while individually rare, become statistically significant when you manage thousands of files over years.
Fixity checking — computing and verifying cryptographic hashes of stored files — is the primary defence. But doing it well at scale requires careful decisions about when to hash, what hash metadata to store, how often to verify, and how to handle the inevitable detection of corruption.
## Why fixity matters
Every file in a storage system can be silently modified by:
- Bit rot: random bit flips in storage media over time
- Firmware bugs: storage controller firmware that silently corrupts data during read/write operations
- Incomplete writes: power loss or crash during a write operation that leaves a file in a partially written state
- Silent disk errors: sectors that return incorrect data without signalling an I/O error
- Software bugs: application-level bugs that modify files unintentionally
Without fixity checking, you discover corruption only when a user reports a broken download, an archive fails to extract, or a backup restores a corrupted file. By then, the good copy may already be gone from your backup rotation.
## Hashing strategy

### Which hash algorithm
SHA-256 is the standard choice for fixity in 2026:
- Widely supported across all platforms and tools
- 256-bit output is collision-resistant for the foreseeable future
- Fast enough for bulk hashing with hardware acceleration (SHA-NI instructions on modern x86)
- Compatible with integrity manifests, SRI, and most verification tooling
SHA-512 offers a wider output and can be faster on 64-bit systems without SHA-NI, but the extra bits provide no practical benefit for fixity checking.
BLAKE3 is significantly faster than SHA-256 (especially on multi-core systems) and equally secure, but tooling support is still catching up. Use it if your verification pipeline supports it.
MD5 and SHA-1 are broken for collision resistance and should not be used for new fixity implementations. They are acceptable only for compatibility with existing systems that cannot be upgraded.
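Whichever algorithm you choose, hash files in fixed-size chunks rather than reading them into memory whole. A minimal sketch using Python's standard `hashlib` (the function name and chunk size are illustrative choices, not part of any standard):

```python
import hashlib

def sha256_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 in 1 MiB chunks so memory use
    stays constant regardless of file size."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            h.update(chunk)
    return h.hexdigest()
```

Swapping `hashlib.sha256` for `hashlib.sha512` (or a BLAKE3 binding, where available) changes nothing else in the loop.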
### When to hash
Hash at these points in the file lifecycle:
- At ingest: when a file first enters your system. This establishes the baseline hash.
- After transfer: when a file is copied to a new location (backup, mirror, CDN origin). Compare the hash at the destination with the origin hash.
- Periodically at rest: schedule regular verification of stored files against their recorded hashes.
- Before serving: optionally, verify the hash before serving a download to a user (adds latency; appropriate for high-assurance scenarios).
### What to store

For each file, store:

```yaml
filepath: /archives/project-v2.4.1.tar.gz
sha256: 9f86d081884c7d659a2feaa0c55ad015a3bf4f1b2b0b822cd15d6c15b0f00a08
size: 248791040
hash_time: 2026-03-01T12:00:00Z
source: ingest
```
The size field is a cheap first check — if the file size doesn't match, something is wrong without needing to compute the hash. The hash_time tells you when the hash was last verified, and source indicates whether this hash is from original ingest or a periodic verification.
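The record and its verification logic can be sketched as follows, with the size comparison as the free early exit before any hashing (a sketch; the dataclass and function names are illustrative, with field names taken from the example record):

```python
import hashlib
import os
from dataclasses import dataclass

@dataclass
class FixityRecord:
    filepath: str
    sha256: str
    size: int
    hash_time: str   # ISO 8601 timestamp of the last verification
    source: str      # "ingest" or "verification"

def check_fixity(record: FixityRecord) -> bool:
    """Verify a stored file against its record: compare size first
    (cheap early exit), then compute the full SHA-256 only if it matches."""
    if os.path.getsize(record.filepath) != record.size:
        return False
    h = hashlib.sha256()
    with open(record.filepath, "rb") as f:
        while chunk := f.read(1 << 20):
            h.update(chunk)
    return h.hexdigest() == record.sha256
```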
## Manifest formats

### Simple checksum files

The traditional SHA256SUMS format:

```
9f86d081884c7d659a2feaa0c55ad015a3bf4f1b2b0b822cd15d6c15b0f00a08  project-v2.4.1.tar.gz
a1b2c3d4e5f6a1b2c3d4e5f6a1b2c3d4e5f6a1b2c3d4e5f6a1b2c3d4e5f6a1b2  project-v2.4.0.tar.gz
```

Verify with: `sha256sum -c SHA256SUMS`
Advantages: universal tooling support, human-readable, trivial to generate.
Disadvantages: no metadata beyond filename and hash; no provenance information.
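When you need to consume these files programmatically (for example to compare a mirror against its origin manifest), a small parser is enough. A sketch, assuming coreutils-style lines where the digest and filename are separated by whitespace and binary-mode entries prefix the name with `*`:

```python
def parse_sha256sums(text: str) -> dict[str, str]:
    """Parse coreutils-style SHA256SUMS content into {filename: digest}."""
    entries: dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        digest, name = line.split(maxsplit=1)
        # Binary-mode entries look like '<digest> *<filename>'
        entries[name.lstrip("*")] = digest.lower()
    return entries
```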
### Structured manifests (JSON/YAML)

For richer metadata:

```json
{
  "files": [
    {
      "path": "project-v2.4.1.tar.gz",
      "sha256": "9f86d081...",
      "size": 248791040,
      "created": "2026-03-01T12:00:00Z"
    }
  ],
  "manifest_version": "1.0",
  "generated": "2026-03-01T12:05:00Z"
}
```
Better for programmatic consumption, but requires custom verification tooling.
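A verifier for this manifest shape might look like the following sketch (it assumes the schema shown above; the function name is illustrative). It returns the failing paths rather than raising, so a caller can quarantine and report each one:

```python
import hashlib
import json
import os

def verify_manifest(manifest_path: str, base_dir: str) -> list[str]:
    """Check every file listed in a JSON manifest against base_dir;
    return the relative paths that fail the size or hash check."""
    with open(manifest_path) as f:
        manifest = json.load(f)
    failures = []
    for entry in manifest["files"]:
        full = os.path.join(base_dir, entry["path"])
        # Size first: a missing file or size mismatch needs no hashing.
        if not os.path.isfile(full) or os.path.getsize(full) != entry["size"]:
            failures.append(entry["path"])
            continue
        h = hashlib.sha256()
        with open(full, "rb") as fh:
            while chunk := fh.read(1 << 20):
                h.update(chunk)
        if h.hexdigest() != entry["sha256"]:
            failures.append(entry["path"])
    return failures
```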
## Periodic verification schedule
How often to verify depends on your risk tolerance and storage characteristics:
- Critical archives (signed releases, legal documents): weekly or daily
- Standard content (general file storage): monthly
- Cold storage (rarely accessed backups): quarterly
- Active serving paths (CDN origin): continuously via sample-based checking
### Sample-based checking
For very large collections (millions of files), verifying everything daily is impractical. Instead:
- Verify a random sample of N files per day
- Over time, every file gets checked
- Prioritise recently-ingested files and files that haven't been verified recently
- Track verification coverage: what percentage of files have been verified in the last 30/90/365 days?
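One way to sketch the daily selection, assuming each record carries a `last_verified` timestamp (or `None` if never verified); the function name and 90-day staleness threshold are illustrative:

```python
import random
from datetime import datetime, timedelta, timezone

def pick_daily_sample(records: list[dict], n: int,
                      stale_days: int = 90) -> list[dict]:
    """Select up to n records for today's verification pass.
    Never-verified or stale records are taken first; any remaining
    slots are filled by random sampling from the rest."""
    cutoff = datetime.now(timezone.utc) - timedelta(days=stale_days)

    def is_stale(r: dict) -> bool:
        return r["last_verified"] is None or r["last_verified"] < cutoff

    stale = [r for r in records if is_stale(r)]
    if len(stale) >= n:
        return random.sample(stale, n)
    fresh = [r for r in records if not is_stale(r)]
    return stale + random.sample(fresh, min(n - len(stale), len(fresh)))
```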
## Detecting and handling corruption

When a hash mismatch is detected:

### Immediate response
- Quarantine the file: do not serve, copy, or back up a corrupted file
- Log the event: record the file path, expected hash, actual hash, detection time, and storage device
- Alert operators: corruption detection is an urgent event that may indicate a larger problem
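The quarantine-and-log steps above can be combined into one handler; a minimal sketch, in which the function name and the choice of quarantine directory are assumptions for illustration:

```python
import logging
import os
import shutil
from datetime import datetime, timezone

def quarantine_corrupt_file(filepath: str, expected: str, actual: str,
                            quarantine_dir: str) -> str:
    """Move a file that failed fixity out of the serving path and log
    the details needed for later root-cause analysis."""
    os.makedirs(quarantine_dir, exist_ok=True)
    dest = os.path.join(quarantine_dir, os.path.basename(filepath))
    shutil.move(filepath, dest)
    logging.critical(
        "fixity failure: path=%s expected_sha256=%s actual_sha256=%s "
        "detected=%s quarantined_to=%s",
        filepath, expected, actual,
        datetime.now(timezone.utc).isoformat(), dest)
    return dest
```

Alerting (paging, ticketing) would hang off the same event; the log record carries everything an operator needs.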
### Recovery
- Check backups: find the most recent backup with a matching hash
- Check mirrors: if the file exists on multiple mirrors, verify which copies are correct
- Restore from verified copy: replace the corrupted file with a verified-good copy
- Re-verify after restore: confirm the restored file hashes correctly
### Root cause analysis
A single corrupted file may be a random event. Multiple corrupted files on the same storage device suggest a hardware problem:
- Check disk SMART data for signs of degradation
- Check recent I/O error logs
- Test the storage device with vendor diagnostic tools
- Consider proactively migrating data off the affected device
## Common mistakes
Storing hashes on the same disk as the data. If the disk fails catastrophically, you lose both the files and the hashes. Store fixity manifests on a separate storage system.
Hashing only at ingest and never again. A single hash at ingest proves the file was correct when received. It does not detect corruption that occurs afterward. Periodic verification is essential.
Using fast but weak hashes for "performance." CRC32 and Adler32 are fast but detect only a fraction of corruption patterns. Use SHA-256 — the performance cost is acceptable on modern hardware.
Not tracking verification timestamps. Without knowing when each file was last verified, you cannot assess your overall integrity coverage or prioritise verification of unverified files.
Ignoring file size as a pre-check. Checking file size before computing a hash is a free early-exit that catches truncated files, incomplete copies, and some corruption scenarios without any computation.
## Verification checklist

- Generate `SHA256SUMS` for your archive directory: `sha256sum /path/to/archives/* > SHA256SUMS`
- Verify immediately: `sha256sum -c SHA256SUMS` — all files should pass
- Schedule periodic re-verification (cron job or equivalent)
- Store `SHA256SUMS` on a separate storage system
- Set up alerting for any verification failure
- Test recovery: intentionally corrupt a test file and verify the detection and recovery pipeline works
## Related reading on wplus.net
- Tamper-Evident Downloads — signing for provenance verification
- Infrastructure hub — hosting and serving fundamentals
- Operations hub — monitoring and diagnostics