Origins of data compression

Data compression began as a simple observation: common things deserve shorter descriptions.

Historical background page prepared for the Codec Help reference structure.

Short version

Compression is the art of representing information with fewer bits. Sometimes it keeps everything exactly, which is called lossless compression. Sometimes it removes details that are considered less important, which is called lossy compression.

Modern codecs often combine both ideas: lossy stages discard detail, and a lossless stage packs what remains. A video codec may predict what changes from frame to frame, transform image detail into frequency information, quantize less-visible detail, and then use entropy coding to store the result efficiently.
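The predict, quantize, and entropy-code steps can be sketched on a one-dimensional signal. This is a toy illustration, not how any real codec is implemented: the names `encode`, `decode`, and `step` are made up for this page, the transform stage is left out, and `zlib` stands in for a codec's own entropy coder.

```python
import zlib
from array import array

def encode(samples, step):
    """Toy lossy pipeline: predict, quantize the residual, entropy-code."""
    quantized = []
    prev = 0  # prediction = previous *reconstructed* sample, so the decoder can follow
    for s in samples:
        residual = s - prev            # prediction error
        q = round(residual / step)     # quantization: the only lossy step
        quantized.append(q)
        prev = prev + q * step         # what the decoder will rebuild
    # Assumption: every q fits in a signed 16-bit integer.
    raw = array("h", quantized).tobytes()
    return zlib.compress(raw)          # stand-in for a real entropy coder

def decode(blob, step):
    """Undo the entropy coding, then rebuild samples from quantized residuals."""
    quantized = array("h")
    quantized.frombytes(zlib.decompress(blob))
    out, prev = [], 0
    for q in quantized:
        prev = prev + q * step
        out.append(prev)
    return out

samples = [100, 102, 105, 105, 104, 110, 120, 121]
approx = decode(encode(samples, step=4), step=4)   # close to samples, not equal
exact = decode(encode(samples, step=1), step=1)    # step 1 loses nothing here
```

A larger `step` throws away more detail. In this sketch each reconstructed sample stays within half a step of the original, because the encoder predicts from the same reconstructed values the decoder sees.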

A short timeline

1830s: Morse code. Common letters are given shorter signals, making it an early practical example of variable-length coding.

1948: Information theory. Claude Shannon gives engineers a mathematical way to talk about information, uncertainty, and the limits of compression.

1949: Shannon-Fano coding. Claude Shannon and Robert Fano describe probability-based variable-length codes.

1952: Huffman coding. David A. Huffman publishes a method for constructing minimum-redundancy prefix codes, which are optimal when each symbol is coded separately.

1977 and 1978: LZ77 and LZ78. Abraham Lempel and Jacob Ziv publish dictionary-style compression methods that influence ZIP, GIF, PNG, and many general-purpose compressors.

1980s onward: media codecs. Audio, image, and video codecs combine prediction, transforms, perceptual models such as psychoacoustics, quantization, and entropy coding.
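To make "dictionary-style" concrete, here is a minimal sketch in the spirit of LZ77: each output triple says "go back `offset` symbols, copy `length` symbols, then emit one literal". The function names and the `window` parameter are illustrative assumptions. The match search is brute force, and real formats such as DEFLATE encode matches quite differently.

```python
def lz77_encode(data, window=255):
    """Return a list of (offset, length, next_symbol) triples for a string."""
    i, out = 0, []
    while i < len(data):
        best_off, best_len = 0, 0
        for j in range(max(0, i - window), i):
            length = 0
            # A match may run past position i (overlapping copy).
            # Always leave one symbol to serve as the literal.
            while (i + length < len(data) - 1
                   and data[j + length] == data[i + length]):
                length += 1
            if length > best_len:
                best_off, best_len = i - j, length
        out.append((best_off, best_len, data[i + best_len]))
        i += best_len + 1
    return out

def lz77_decode(triples):
    """Rebuild the string by replaying back-references and literals."""
    out = []
    for off, length, ch in triples:
        for _ in range(length):
            out.append(out[-off])  # one symbol at a time, so overlaps work
        out.append(ch)
    return "".join(out)

text = "abracadabra abracadabra"
triples = lz77_encode(text)
restored = lz77_decode(triples)
```

Repeated text collapses into a few back-references, which is why this family of methods works well on data with recurring substrings.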

The big idea

Most real-world data contains patterns. Text repeats letters and words. Images contain flat areas and repeated textures. Video frames resemble the frames before and after them. In audio, human hearing is not equally sensitive to all frequencies, and a loud sound can mask quieter ones near it.

Compression works by exploiting these patterns. The cleaner the pattern, the more easily it can be described in fewer bits.
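A quick way to see this is to compress highly repetitive data and random data of the same size with a general-purpose compressor. The sketch below uses Python's standard `zlib` module; the helper name `ratio` and the sample inputs are only illustrative.

```python
import os
import zlib

def ratio(data: bytes) -> float:
    """Compressed size divided by original size (smaller is better)."""
    return len(zlib.compress(data, 9)) / len(data)

repetitive = b"the quick brown fox " * 500   # strong, regular pattern
random_bytes = os.urandom(len(repetitive))   # no pattern to exploit

print(f"repetitive: {ratio(repetitive):.3f}")
print(f"random:     {ratio(random_bytes):.3f}")
```

The repetitive input should shrink to a small fraction of its size, while the random input should stay about the same size or grow slightly, since there is nothing for the compressor to exploit.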

Lossless: restores the original data exactly.

Lossy: trades perfect reconstruction for much smaller files.

Entropy coding: uses shorter codes for more likely symbols.
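As an illustration of entropy coding, the following sketch builds a Huffman code from symbol counts and compares its average code length with the Shannon entropy of the same text. The function names `huffman_code` and `entropy_bits` are assumptions made for this example, and the code is written for readability rather than speed.

```python
import heapq
from collections import Counter
from math import log2

def huffman_code(text):
    """Build a prefix code {symbol: bitstring} from symbol frequencies."""
    freq = Counter(text)
    if not freq:
        return {}
    if len(freq) == 1:                       # only one distinct symbol
        return {next(iter(freq)): "0"}
    # Heap entries: (weight, unique tie-breaker, {symbol: code so far})
    heap = [(w, i, {sym: ""}) for i, (sym, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)    # two least likely subtrees
        w2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w1 + w2, counter, merged))
        counter += 1
    return heap[0][2]

def entropy_bits(text):
    """Shannon entropy of the symbol distribution, in bits per symbol."""
    n = len(text)
    return -sum(w / n * log2(w / n) for w in Counter(text).values())

text = "abracadabra"
code = huffman_code(text)
avg_len = sum(len(code[ch]) for ch in text) / len(text)
print(code)
print(f"average {avg_len:.2f} bits/symbol, entropy {entropy_bits(text):.2f}")
```

The most frequent symbol, `a`, receives the shortest code, and the average code length lands between the entropy and the entropy plus one bit, which is the guarantee Huffman coding offers for symbol-by-symbol codes.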

Why this matters for codecs

When people compare MP3, AAC, Opus, FLAC, H.264, H.265, or AV1, or weigh any of them against uncompressed audio in a WAV file, they are often comparing different answers to the same old question: how can we keep the useful information while spending fewer bits?

That is why a codec page is easier to understand once you know the roots of compression. Codecs are not magic. They are layers of practical tricks built on older mathematical ideas.
