July 23 2023

Sound Properties: All You Need to Know

Mahmoud Elzeiny

Sound is an essential element of our sensory experience. It plays a pivotal role in shaping our understanding of the environment and in communicating complex emotions. This article looks at the mechanisms that govern sound waves, frequencies, and amplitudes, and at how sound behaves in various mediums, as a guide to the science behind sound properties. In this article we’ll cover the following topics:

What are sound waves?
How is sound digitally recorded?
What are the common sound properties?
What is noise?

Let’s start by asking what are sound waves?

A sound wave is the pattern of disturbance caused by energy traveling through a medium (such as air, water, or any other liquid or solid matter) as it propagates away from the source of the sound. Sound waves are created when an object vibrates, for example a ringing cellphone, and the vibration produces pressure waves in the surrounding medium.

The pressure wave disturbs the particles in the surrounding medium, and those particles disturb others next to them, and so on. The disturbance spreads outward in a wave pattern, like waves on the sea.

The wave carries the sound energy through the medium, usually in all directions and less intensely as it moves farther from the source.

How is sound digitally recorded?

Sound is digitally recorded through a process known as analog-to-digital conversion. The sound waves described above are continuous, but digital systems can only store and process a finite list of numbers. Analog sound waves are first captured by a microphone and converted into electrical signals. These signals are then sampled at regular intervals by an analog-to-digital converter (ADC), which measures the amplitude of the signal at each sample point.

The resulting series of digital values is then encoded and stored as binary data, forming a digital audio file that can be played back with high fidelity to the original sound.

These values are called samples. A frame holds the samples for all channels at one point in time, so in a single-channel recording each audio frame represents one value.
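The sampling step described above can be sketched in a few lines of Python. This is only an illustration under simple assumptions: the "microphone signal" is a pure sine tone, and the function name `sample_and_quantize` and its parameters are made up for this example rather than taken from any audio library.

```python
import math

def sample_and_quantize(freq_hz, sample_rate, duration_s, bit_depth):
    """Sample a sine tone at regular intervals and round each value to an integer."""
    max_level = 2 ** (bit_depth - 1) - 1       # largest signed value, e.g. 32767 for 16-bit
    n_samples = round(sample_rate * duration_s)
    samples = []
    for n in range(n_samples):
        t = n / sample_rate                    # time of the n-th sample, in seconds
        amplitude = math.sin(2 * math.pi * freq_hz * t)   # continuous value in [-1, 1]
        samples.append(round(amplitude * max_level))      # stored as a whole number
    return samples

# 10 ms of a 440 Hz tone, sampled 8000 times per second at 16 bits.
samples = sample_and_quantize(freq_hz=440, sample_rate=8000, duration_s=0.01, bit_depth=16)
print(len(samples), samples[:5])
```

Each entry in `samples` is one measured amplitude, which is all a digital system needs to store in order to reproduce the wave later.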

What are the common sound properties?

Loudness:

The larger the amplitude of the sound wave, the stronger and louder the sound; the smaller the amplitude, the quieter it is.

Channel:

It simply represents the number of channels used to describe the audio wave.

Each audio wave can be described by either:

A single channel shared by all speakers (Mono), as in Figure 4.
Two separate channels, one for each speaker (Stereo).

A higher number of channels carries more spatial information, but it does not by itself improve the fidelity of each channel. Refer to Figure 4 for a comparison.
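To make the mono/stereo difference concrete, here is a minimal sketch that represents stereo audio as a list of (left, right) frames. The helper names `interleave` and `to_mono` are hypothetical, and averaging the two channels is just one simple way to fold stereo down to mono, assumed here for illustration.

```python
def interleave(left, right):
    """Combine two equal-length channels into a list of stereo frames."""
    if len(left) != len(right):
        raise ValueError("channels must have the same length")
    return [(l, r) for l, r in zip(left, right)]

def to_mono(frames):
    """Average the two channels of each frame into a single value (assumed downmix)."""
    return [(l + r) / 2 for l, r in frames]

left = [0.0, 0.5, 1.0]
right = [1.0, 0.5, 0.0]
stereo = interleave(left, right)   # one (left, right) pair per point in time
mono = to_mono(stereo)             # one value per point in time
print(stereo)
print(mono)
```

A stereo recording therefore stores twice as many values as a mono recording of the same length.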

Sampling rate:

The audio sample rate is the number of samples per second that the system takes from a continuous analog signal. It is measured in hertz (Hz). For reference, the human ear can hear frequencies between roughly 20 hertz (20 Hz) and 20 kilohertz (20 kHz).

The most common sampling rate, 44.1 kHz, is more than twice the top of the range of human hearing. This follows from the Nyquist-Shannon theorem, which states that the sampling rate must be more than twice the highest frequency present in the original signal; otherwise the sound is not faithfully reproduced. A sampling rate lower than twice that frequency leads to aliasing.

Aliasing (Figure 6) is an effect that causes different signals to become indistinguishable when sampled.

Common sampling rates: 8, 16, 44.1, 48, 88.2 and 96 kHz.
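Aliasing is easy to reproduce numerically. The sketch below assumes an 8 kHz sampling rate, so only frequencies below 4 kHz can be captured faithfully. It samples a 1 kHz cosine and a 7 kHz cosine and compares the results; the function name `sample_cosine` is made up for this example.

```python
import math

def sample_cosine(freq_hz, sample_rate, n_samples):
    """Return n_samples of a cosine tone taken at the given sampling rate."""
    return [math.cos(2 * math.pi * freq_hz * n / sample_rate) for n in range(n_samples)]

sample_rate = 8000                               # half of this, 4000 Hz, is the limit
low = sample_cosine(1000, sample_rate, 16)       # below the limit
high = sample_cosine(7000, sample_rate, 16)      # above the limit

# Largest difference between the two sampled signals.
max_diff = max(abs(a - b) for a, b in zip(low, high))
print(max_diff)
```

The difference is zero up to floating-point rounding: once sampled at 8 kHz, the 7 kHz tone cannot be told apart from the 1 kHz tone.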

Bit depth:

Bit depth, also referred to as word length, plays a crucial role in digital audio recording. Here the term “bit” refers to a binary digit, which is either 1 (“on”) or 0 (“off”). Digital audio data is stored as strings of bits, which computers can interpret and process.

The bit depth determines the precision of the values in the recording process, analogous to using a ruler with finer increments for more accurate measurements. Every sample is still quantized, that is, rounded to the nearest available level, but higher bit depths such as 16, 24, or 32 bits provide more levels, so the rounding error is smaller and the digital audio keeps more fidelity and detail.

In Figure 8, you can observe the following two examples:

16-bit: each sample can take one of 65,536 (2^16) levels.
24-bit: each sample can take one of 16,777,216 (2^24) levels.
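The level counts above come directly from the number of bits, and the effect on precision can be checked with a small experiment. This sketch assumes sample values in the range -1.0 to 1.0 and evenly spaced levels; `quantization_levels` and `quantize` are illustrative names, not part of any audio library.

```python
def quantization_levels(bit_depth):
    """Number of distinct values a sample can take at this bit depth."""
    return 2 ** bit_depth

def quantize(value, bit_depth):
    """Round a value in [-1.0, 1.0] to the nearest of the evenly spaced levels."""
    steps = 2 ** bit_depth - 1                  # intervals between the lowest and highest level
    index = round((value + 1.0) / 2.0 * steps)  # which level is closest
    return index / steps * 2.0 - 1.0            # map the level back to [-1.0, 1.0]

print(quantization_levels(16))   # 65536
print(quantization_levels(24))   # 16777216

# The same value stored with a coarse and a fine "ruler".
value = 0.3
error_4_bit = abs(quantize(value, 4) - value)
error_16_bit = abs(quantize(value, 16) - value)
print(error_4_bit, error_16_bit)
```

The 16-bit rounding error is far smaller than the 4-bit one, which is the ruler analogy in numbers.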

Bit rate:

It is simply the product of the sampling rate, the bit depth, and the number of channels, which gives the number of bits per second for uncompressed audio.
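As a worked example, here is the calculation for CD-quality audio (44.1 kHz, 16-bit, stereo). The function names are made up for this sketch, and the size estimate assumes uncompressed audio with no file header.

```python
def bit_rate(sample_rate_hz, bit_depth, channels):
    """Bits per second of uncompressed audio."""
    return sample_rate_hz * bit_depth * channels

def uncompressed_size_bytes(sample_rate_hz, bit_depth, channels, seconds):
    """Approximate size of the raw audio data, ignoring any file header."""
    return bit_rate(sample_rate_hz, bit_depth, channels) * seconds // 8

cd = bit_rate(44_100, 16, 2)
print(cd)                                            # 1411200 bits per second
print(uncompressed_size_bytes(44_100, 16, 2, 60))    # 10584000 bytes for one minute
```

So one minute of uncompressed CD-quality stereo takes roughly 10.6 million bytes.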

Pitch:

The pitch of a sound is determined by the frequency of vibration of its waves. When the frequency is high, the sound is perceived as shrill and high-pitched; when the frequency is low, the sound is perceived as low-pitched. This link between frequency and pitch shapes how we perceive the wide variety of sounds we encounter every day.

In Figure 9, you can observe the following examples:

A bird produces a high-pitched sound, whereas the roar of a lion is a low-pitched sound.
A woman’s voice typically has a higher pitch than a man’s.
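The link between pitch and frequency can be seen by counting how often a wave completes a cycle each second. The sketch below generates two pure tones and estimates their frequency from upward zero crossings. This is a rough method that only works for clean single tones, and the names `tone` and `estimate_frequency` are hypothetical.

```python
import math

def tone(freq_hz, sample_rate, duration_s):
    """Generate a pure sine tone as a list of samples."""
    n = round(sample_rate * duration_s)
    return [math.sin(2 * math.pi * freq_hz * i / sample_rate) for i in range(n)]

def estimate_frequency(samples, sample_rate):
    """Estimate frequency by counting upward zero crossings per second."""
    crossings = sum(1 for a, b in zip(samples, samples[1:]) if a < 0 <= b)
    duration_s = len(samples) / sample_rate
    return crossings / duration_s

sample_rate = 8000
low_tone = tone(220, sample_rate, 1.0)     # lower pitch
high_tone = tone(880, sample_rate, 1.0)    # higher pitch, two octaves up

print(estimate_frequency(low_tone, sample_rate))
print(estimate_frequency(high_tone, sample_rate))
```

The higher-pitched tone crosses zero about four times as often, reflecting its fourfold higher frequency.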

Now that we understand sound and its properties, what is noise?

Noise, in its essence, constitutes any sound that is undesired or unwelcome. However, the distinction between what qualifies as sound versus noise is subjective and contingent upon both the listener and the prevailing circumstances. What may be perceived as a soothing waterfall sound to one person might be an irritating noise to another. Moreover, even for the same individual, a sound can evoke pleasure on one occasion and annoyance on another. Thus, discerning noise from sound hinges on the specific task, purpose of the audio recording, and the context in which it is experienced. The subjective nature of this differentiation highlights the complexity of auditory perception and reminds us of the diverse ways in which we interact with the auditory world around us.

Examples:

Crowd chatter in the background while a person is speaking.
White noise (a random signal with equal intensity across frequencies).
Pink noise (similar to white noise, but with more energy at low frequencies, so it sounds deeper).
Students talking in the background of a lecture.
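To see what noise does to a recording, the sketch below mixes white noise into a clean tone. It makes two assumptions for illustration: the noise is drawn from a Gaussian distribution (white noise does not have to be Gaussian), and the function names `white_noise` and `add_noise` are made up for this example.

```python
import math
import random

def white_noise(n_samples, scale=0.1, seed=0):
    """Independent random samples; Gaussian values are an assumption of this sketch."""
    rng = random.Random(seed)               # fixed seed so the example is repeatable
    return [rng.gauss(0.0, scale) for _ in range(n_samples)]

def add_noise(signal, noise):
    """Mix noise into a signal sample by sample."""
    return [s + n for s, n in zip(signal, noise)]

sample_rate = 8000
clean = [math.sin(2 * math.pi * 440 * i / sample_rate) for i in range(sample_rate)]
noise = white_noise(len(clean))
noisy = add_noise(clean, noise)

print(clean[:3])
print(noisy[:3])
```

Whether the added component counts as noise still depends on the task: here the tone is the wanted signal, so the random part is the noise.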
