Digital Audio Theory Fundamentals

March 5, 2024

Digital audio forms the backbone of music production, broadcasting, and countless other industries. With digital audio technology, we have unprecedented power to capture, manipulate, and reproduce sound. From recording a hit song to designing immersive soundscapes for film, the possibilities are endless. But how do we bridge the gap between the analog world of sound waves and the realm of ones and zeroes? This article delves into digital audio theory, exploring the core principles that turn vibrations in the air into the music, podcasts, and soundtracks that enrich our lives.

What is Sound?

Sound is a series of pressure variations in the air around us. These variations ripple outward from a source, like the vibrations of a guitar string or a person's vocal cords. Microphones convert these pressure changes into continuously varying electrical voltages. In the days of analog recording, these voltages would be imprinted directly onto magnetic tape. But how do we bring this signal into the precise world of digital technology?

From Analog to Digital

To turn continuously varying signals into a digital representation, we need to perform a two-step process. First, we must take lots of "snapshots" of the audio signal over time; this is called sampling. The more samples we take per second, the more accurately we capture the shape of the sound wave. Second, each sample needs a precise measurement of its amplitude (essentially, how loud it is at that instant), a process called quantization. Imagine a graph where the smoothly curving line of the analog waveform is replaced with a series of steps, each step representing a sample's amplitude. We've now replaced the continuous flow of the original signal with a series of discrete numbers, each representing the sound at a specific moment in time. So, how do we turn these numbers into electrical signals, and back again? That's where converters come into play.
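
Before moving on, here is the two-step process as a minimal Python sketch. The 440 Hz tone, the 44.1 kHz rate, and the 16-bit depth are illustrative choices, not fixed requirements:

    import numpy as np

    sample_rate = 44_100   # samples per second (Hz)
    bit_depth = 16         # bits per sample
    freq = 440.0           # an illustrative "analog" source: a 440 Hz sine

    # Step 1: Sampling -- take snapshots at regular intervals.
    t = np.arange(int(sample_rate * 0.01)) / sample_rate   # 10 ms of audio
    analog = np.sin(2 * np.pi * freq * t)                  # values in [-1, 1]

    # Step 2: Quantization -- round each snapshot to one of 2^16 integer levels.
    max_level = 2 ** (bit_depth - 1) - 1                   # 32,767 for 16-bit
    digital = np.round(analog * max_level).astype(np.int16)

    print(digital[:6])   # the sound is now a series of discrete numbers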

Analog to Digital and Back Again

The heart of digital audio systems relies on specialized circuits that bridge the gap between analog and digital worlds. The circuit that converts the incoming, continuously varying analog signal into a stream of digits is an Analog-to-Digital converter (ADC or A to D). Think of it as translating the fluid language of sound waves into precise numerical code a computer can understand. Conversely, to turn those digital files back into sound waves our ears can hear, we need a Digital-to-Analog converter (DAC or D to A). The DAC's job is to carefully reconstruct a smooth analog signal from those stored "snapshots" of the original sound.

Sampling: Snapshots of Sound

Think of sampling as taking a series of ultra-fast photographs of the sound wave. Here's how it works:

  • Samples are taken at precise, regular intervals.
  • During each "snapshot," the amplitude (or height) of the signal at that exact moment is measured and immediately stored as a number with a defined resolution. The more detail we want in representing this amplitude, the more digital space each number needs; this resolution is the bit depth, which we'll cover soon.

The number of samples taken per second is called the sampling rate, commonly expressed in kilohertz (kHz). Standard rates include:

  • 44.1 kHz (44,100 samples per second, the CD standard)
  • 48 kHz (a common standard, often used for audio in video)
  • Higher rates such as 88.2 kHz, 96 kHz, and even 192 kHz, used for high-definition audio
  • Lower rates such as 22 kHz or 11 kHz, used by older, low-resolution devices
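
To get a feel for what these rates mean in data terms, here is a quick back-of-the-envelope sketch (the three-minute track length is a made-up example):

    sample_rate = 44_100          # CD standard, samples per second
    channels = 2                  # stereo
    bytes_per_sample = 2          # 16-bit samples
    duration_s = 3 * 60           # a hypothetical 3-minute track

    total_samples = sample_rate * channels * duration_s
    size_mb = total_samples * bytes_per_sample / 1_000_000
    print(total_samples)          # 15,876,000 samples
    print(round(size_mb, 1))      # ~31.8 MB of raw 16-bit PCM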

Pulse Code Modulation

The process of sampling gives us snapshots of a sound wave, but how do we actually encode these into a format for storage and playback? The most common method is Pulse Code Modulation (PCM). In PCM, the measured amplitude of each sample is rounded to the nearest value within a fixed set of discrete values. The higher the number of possible values, the more accurately we can represent the intricacies of the original sound. Popular file formats like WAV and AIFF use PCM encoding. An interesting alternative, though less common, is Pulse Density Modulation (PDM), which represents amplitude by changes in the density of pulses rather than by their magnitude.
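
Because WAV stores plain PCM, Python's built-in wave module can write one directly. A minimal sketch, assuming a mono 440 Hz test tone (the tone and filename are arbitrary):

    import math, struct, wave

    SAMPLE_RATE = 44_100
    FREQ = 440.0   # test tone, Hz

    # Quantize one second of a sine wave to 16-bit PCM values.
    samples = [
        round(32_767 * math.sin(2 * math.pi * FREQ * n / SAMPLE_RATE))
        for n in range(SAMPLE_RATE)
    ]

    with wave.open("tone.wav", "wb") as wav:
        wav.setnchannels(1)        # mono
        wav.setsampwidth(2)        # 2 bytes per sample = 16-bit words
        wav.setframerate(SAMPLE_RATE)
        wav.writeframes(struct.pack(f"<{len(samples)}h", *samples))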

Bit Depth (Word Size)

Bit depth determines how accurately we can represent the amplitude of each audio sample. Think of it as the number of "steps" available on a volume scale. A higher bit depth means a wider range of those steps, allowing us to capture smoother transitions from the quietest whisper to the loudest explosion. In digital audio, bit depth is the number of bits (the fundamental 0s and 1s of digital data) used to describe the possible values for each snapshot of the sound wave.

  • The Basics of Binary: Computers work with binary, a system where information is represented using only two digits: 0 and 1. Each digit in a binary number is called a "bit."
  • Bit Depth and Values: A 16-bit word size means we have 16 slots for those 0s and 1s (e.g., 0011010110101101). Each combination of 0s and 1s represents a unique number. Since there are two possible values (0 or 1) for each of the 16 slots, the total number of possible combinations is 2^16, or 65,536.
  • Loudness Representation: These 65,536 binary numbers map the range from complete silence to the loudest possible sound our 16-bit system can handle. Similarly, a 24-bit word has 2^24 (16,777,216) possible combinations, resulting in a much wider range of amplitude levels!
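
A two-line check of those numbers:

    for bits in (8, 16, 24):
        print(f"{bits}-bit: {2 ** bits:,} possible amplitude values")
    # 8-bit: 256
    # 16-bit: 65,536
    # 24-bit: 16,777,216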

Practical Examples:

  • 64-bit: Often used for internal calculations within computers, offering immense precision.
  • 32-bit: Considered "high-resolution" audio, sometimes used for recording and mastering.
  • 24-bit: The standard in most modern DAWs, providing excellent dynamic range.
  • 16-bit: The classic CD standard, sufficient for many forms of playback.
  • 8-bit: Gives the "lo-fi" sound of older games and samplers, deliberately limited for a specific aesthetic.

Dynamic Range

Bit depth directly impacts a digital audio system's dynamic range – the difference between the quietest and loudest sounds it can handle without distortion. Imagine dynamic range as a scale for loudness. More bits give us more finely spaced markings on that scale, letting us capture everything from delicate whispers to a full orchestra at its peak. Each additional bit adds approximately 6dB of dynamic range.

For example, a CD with its 16-bit format has a theoretical dynamic range of 96dB (16 x 6). This is sufficient for most music, but modern DAWs often use 24-bit audio for a reason. That wider 144dB (24 x 6) dynamic range provides greater flexibility, allowing engineers to capture soft details along with powerful crescendos, all with increased headroom to prevent distortion during recording and editing.
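
The rule of thumb comes from the exact relation: each bit doubles the number of amplitude levels, and doubling amplitude corresponds to 20 · log10(2) ≈ 6.02 dB. A quick verification:

    import math

    for bits in (16, 24):
        dynamic_range_db = 20 * math.log10(2 ** bits)
        print(f"{bits}-bit: ~{dynamic_range_db:.1f} dB")
    # 16-bit: ~96.3 dB
    # 24-bit: ~144.5 dB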

Nyquist Theorem

The Nyquist Theorem is a cornerstone of how we capture sound digitally. In essence, it states that to accurately represent an audio signal, your sampling rate (how often you take those "snapshots") must be at least twice the highest frequency you wish to record. This means if you want to capture frequencies up to 20kHz (the rough upper limit of human hearing), you need at least a 40kHz sampling rate.

The Nyquist Frequency is defined as half of the sampling rate. So, with a 44.1 kHz sampling rate (standard for CDs), the Nyquist Frequency is 22.05 kHz; a CD theoretically cannot reproduce frequencies higher than 22.05 kHz. This is why higher sampling rates are used for "high definition" audio formats.

Why Twice the Frequency?

Think of trying to draw a wavy line by only putting dots on the paper. If the dots are too far apart, you won't be able to recreate the original shape. To accurately capture the wave's peaks and valleys, we need at least two samples per cycle of the highest frequency present. Sampling below the Nyquist limit leads to aliasing, a form of distortion where false, lower frequencies "fold" into the audible spectrum, making your audio sound harsh or unnatural.
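
You can see this folding numerically. The sketch below (the 30 kHz tone is an arbitrary choice above the 22.05 kHz Nyquist limit) shows that, once sampled at 44.1 kHz, a 30 kHz tone produces exactly the same samples as a phase-inverted 14.1 kHz tone:

    import numpy as np

    fs = 44_100                    # sampling rate (Hz); Nyquist = 22,050 Hz
    f_high = 30_000                # above Nyquist -- will alias
    f_alias = fs - f_high          # folds back to 14,100 Hz

    n = np.arange(1_000)           # sample indices
    high_tone = np.sin(2 * np.pi * f_high * n / fs)
    folded = -np.sin(2 * np.pi * f_alias * n / fs)

    # Identical samples: after sampling, the two tones cannot be told apart.
    print(np.allclose(high_tone, folded))   # True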

Aliasing: The Unwanted Artifact

Aliasing is an unavoidable consequence of the sampling process. Essentially, if frequencies above the Nyquist Frequency are present in the original signal, they "fold back" into the audible range during the digital conversion, creating new frequencies that weren't there to begin with.

It's important to note that these aliases appear both above and below the sampling frequency. Mathematically, the aliased frequencies are the sum (+) and difference (−) of the incoming frequency and the sampling rate. So, if you have a 5 kHz input tone and sample at 40 kHz, you'd get aliases at 35 kHz and 45 kHz (both above human hearing).

The Problem When It's Audible

The trouble starts when aliasing creates frequencies within the audible range. Say you record a 4 kHz tone at a low 15 kHz sample rate. You wouldn't just hear the original 4 kHz: you'd also get an alias at 19 kHz (the sum) and another at 11 kHz (the difference), leading to a harsh, distorted sound.
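
A tiny helper (the function is mine, purely for illustration) that reproduces the article's sum-and-difference arithmetic:

    def alias_frequencies(tone_hz, sample_rate_hz):
        """Sum and difference frequencies produced around the sampling rate."""
        return sample_rate_hz - tone_hz, sample_rate_hz + tone_hz

    print(alias_frequencies(5_000, 40_000))   # (35000, 45000) -- both inaudible
    print(alias_frequencies(4_000, 15_000))   # (11000, 19000) -- both audible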

Prevention: The Low Pass Filter

To combat this, anti-aliasing filters are essential. These are low-pass filters (LPF) placed before the ADC, aggressively cutting off frequencies above the Nyquist Frequency. With a 44.1kHz sample rate, for instance, the anti-aliasing filter blocks anything above 22.05kHz, ensuring only the desired frequencies reach the converter. This keeps aliases out of the audible range.
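
In real converters this filter sits in the analog domain, in front of the ADC. Purely as a digital stand-in for the idea, here is a sketch using SciPy (the 8th-order Butterworth and 20 kHz cutoff are illustrative choices; hardware anti-aliasing filters are far steeper):

    import numpy as np
    from scipy.signal import butter, sosfilt

    fs = 44_100                  # sampling rate (Hz); Nyquist = 22,050 Hz
    cutoff = 20_000              # pass the audible band, stop before Nyquist

    # An 8th-order Butterworth low-pass, applied before any downsampling.
    sos = butter(8, cutoff, btype="lowpass", fs=fs, output="sos")

    signal = np.random.randn(fs)         # stand-in for one second of input
    filtered = sosfilt(sos, signal)      # content above ~20 kHz is attenuated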

Summary

Succinctly: sound is a series of pressure variations in the air. To bring sound into the digital realm, we first need to convert these continuous vibrations into discrete numbers. This conversion process has two key steps: sampling and quantization. During sampling, we take regular "snapshots" of the sound wave's amplitude. Quantization measures those snapshots and represents each amplitude as a precise number with a defined resolution (bit depth). The Nyquist Theorem is crucial in this process. It states that our sampling rate must be at least double the highest frequency we want to capture. Failure to do so results in aliasing, where false frequencies intrude on our recording. Anti-aliasing filters are used to prevent this. Understanding these core principles is essential for capturing, storing, manipulating, and reproducing sound in the digital domain. From simple recording and playback to complex audio processing and synthesis, digital audio theory provides the foundation for a vast array of creative sonic possibilities.
By Laurynas Ereksonas
