> The encoder was mainly optimized for 48Khz audio. Get over it. It's 2026, resampling is free, 48Khz is the standard. 44.1Khz will work, and so will 96Khz but use 48Khz if you want the best quality.
More or less. Streaming is often done at 48 kHz and video content has been 48 kHz for a while now, so unless you still produce content for CDs it is the standard.
44100 Hz had reasons no longer really needed (storing audio in 3 samples per line in VHS: 490 lines × 3 samples × 30 GPS = 44100 sample/s).
Quality-wise both are more than enough, and 99.99% of people would not be able to tell them apart in a blind test. Higher sample rates than 48 kHz are only needed when you want to pitch down ultrasonic recordings (of whales, bats and other such animals, for example).
Aside from this, sample rates higher than 48 kHz may have only downsides, like increased file size and potential distortion in the ultrasonic frequency range that has sidebands in the audible range. Yet there is a persistent but unscientific "more-is-better" crowd in the hi-fi sector.
> Higher sample rates than 48kHz are only needed when you want to pitch down ultrasonic recordings (of whales, bats and other such animals for example).
There are numerous use cases for higher sample rates that go beyond this but it's hard to talk about it without starting flame wars filled with junk science.
Say it or don't but "I have evidence otherwise but don't think I should say" is just as bad a flame war gateway as tempting the junk science audiophiles directly.
I know that with oscilloscopes, it’s recommended to use 5x instead of the Nyquist 2x of the highest frequency you want to use, but the most reasonable argument I’ve heard for higher than 48kHz sampling is digital audio effects.
But for the end result 48kHz is more than necessary. I can’t even hear any frequency above 17kHz.
Yeah, for real-time signals a higher frequency makes sense (very briefly, before you FFT and kill the high frequencies), but for stored signals Nyquist is king.
> I know that with oscilloscopes, it’s recommended to use 5x instead of Nyquist 2x of the highest frequency you want to use.
For capturing analog signals, 2.5X is enough headroom.
The 5X recommendation is probably for digital signals where the frequency refers to the baud rate, not the highest frequency coming through. A fast switching digital signal will have components with higher bandwidth than the fundamental. Using a higher multiple of samples (assuming the bandwidth is there) will let you see the shape of the waveform and rise and fall times better.
> But for the end result 48kHz is more than necessary. I can’t even hear any frequency above 17kHz
And even if you could, would the frequencies that all humans lose with age really be all that essential for the enjoyment of music? We are talking about frequencies most instruments won't even produce unless severely abused.
For some reason, in audiophile-land the magic is always in some elusive outer realms and never right there where the important stuff happens. They spend a fortune on speaker cables, while often not giving a second thought to room acoustics beyond the cosmetic. The magic sparkle is all the way in the ultrasonic, while their listening spaces have deep nulls in the mid-range due to comb filtering from reflective surfaces caused by a lack of acoustic treatment.
I love music (enough to have mixed it for a living) and to me it is very clear how the priorities are ordered when it comes to audio fidelity:
1. Room Acoustics
2. Speakers
3. Electronics & Digital
Going from the back: assuming you don't get the cheapest of the cheap and don't abuse the gear by making it do things it wasn't built for, electronics and digital audio nowadays are transparent. That means they essentially sound the same if operated within spec. Even a 0,50 € IC will have distortion figures so staggeringly low that they are below human perception, and equipment is getting better still. A decent opamp can have distortion figures like 0.005 % THD with a linear frequency response all the way up to radio frequencies. There can be challenges with driving very weird speakers or headphones, but if you have the right combination of gear it doesn't have to be expensive to be indistinguishably good in its audio performance.
This means speakers are way more important than the electronics before them. Their distortion numbers are orders of magnitude higher (in the ballpark of 3% THD), their frequency response is inherently problematic (often many dB up and down even in expensive speakers), they will have different beaming characteristics at different frequencies, small speakers lack bass, placement is essential, etc. So getting good speakers is important.
But all of this is dwarfed by the impact of acoustics. The position of the speakers alone makes a huge difference. The impact of an acoustically untreated space is severe: you can get a completely smeared time response with deep nulls of 20dB and more while other frequencies are highly resonant. Even a budget speaker won't have problems of that magnitude.
So get some ok electronics, even more ok speakers, but invest the bulk of the money/time into the setup of the room itself.
Many audiophiles have that priority list reversed. Room acoustics suck. You need to measure a lot, add ugly absorbers in inconvenient places, and you can't place speakers where they look nice and conserve space but need to place them where they work well acoustically. There is no ideal solution and everything is a compromise. So buying a gold-plated HDMI cable and imagining the improvement appears to be the better option. Only you might be doing it in a room where a positional difference of a few centimeters changes the frequency response at the listening position massively.
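To put rough numbers on the comb filtering and the "few centimeters" point, here is a minimal sketch. It assumes one single reflection arriving at the same level as the direct sound and a speed of sound of 343 m/s, both simplifications, and `comb_null_frequencies` is just a made-up helper name. Nulls fall wherever the extra path length of the reflection equals an odd number of half wavelengths.

```python
SPEED_OF_SOUND_M_S = 343.0  # assumed: dry air at roughly 20 °C


def comb_null_frequencies(path_difference_m, count=3):
    """First `count` null frequencies (Hz) for one reflection.

    A null occurs where the reflected path is longer than the direct
    path by an odd number of half wavelengths:
        f_k = (2k + 1) * c / (2 * path_difference)
    """
    if path_difference_m <= 0:
        raise ValueError("path difference must be positive")
    return [
        (2 * k + 1) * SPEED_OF_SOUND_M_S / (2.0 * path_difference_m)
        for k in range(count)
    ]


if __name__ == "__main__":
    # Hypothetical path differences: 1 m, then the same setup moved by 5 cm.
    for diff in (1.00, 1.05):
        nulls = ", ".join(f"{f:7.1f}" for f in comb_null_frequencies(diff))
        print(f"path difference {diff:.2f} m -> nulls at {nulls} Hz")
```

With these assumed numbers, changing the path difference from 1.00 m to 1.05 m moves the third null from about 857 Hz to about 817 Hz, so a small move really does reshuffle the response. A real room has many reflections at different levels, so treat this as an illustration of the mechanism only.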
Higher sample rates are lower latency for the same block size and resampling is not "free" (pick 2: performance, aliasing, latency) so there can be advantages to working with audio archived at higher sample rates.
But all the advantages come down to professional or editing use cases. There's next to zero advantage to using it as a storage format for listening. Just like 24 bit audio (do you have an amp with 96dB SNR?).
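For reference, the rule of thumb behind that 96 dB figure: an ideal N-bit quantizer driven by a full-scale sine has a signal-to-quantization-noise ratio of roughly 6.02·N + 1.76 dB. A quick sketch (the function name is only for illustration, and real converters and amps fall short of these ideal numbers):

```python
import math


def ideal_sqnr_db(bits):
    """Ideal SQNR in dB of an N-bit quantizer for a full-scale sine.

    Computed as 20*log10(2**bits) + 10*log10(1.5), which is the
    formula usually rounded to 6.02 * bits + 1.76.
    """
    return 20.0 * math.log10(2.0 ** bits) + 10.0 * math.log10(1.5)


if __name__ == "__main__":
    for bits in (16, 24):
        print(f"{bits}-bit: {ideal_sqnr_db(bits):.1f} dB")
```

So 16 bits already lands at about 98 dB in the ideal case, and 24 bits at about 146 dB, which is far beyond what the analog stages of a playback chain typically deliver.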
Just personally, I have seen little evidence (personally, professionally, or academically) that there is any advantage for lossless audio for consumer applications. For professional applications there are plenty, and it's endlessly tiring to convince people that "no, actually I need 96kHz for my use case."
Where the audiophiles have _some_ argument here is the design of reconstruction filters which I've heard alleged can perform better in the audible frequency range if the stop band is outside of it. But I have never personally tested this, nor cared enough to. But the theory is sound.
Whether or not it's perceptible depends on what you're measuring, though. In theory, there should be perceptual differences in sound localization if your DAC's reconstruction filter is at 24kHz vs 48kHz since it will change the group delay in a critical frequency region, where you'll get sound at >~2kHz arriving later at the lower sample rate. I think it would be extremely hard to test this though, because humans are really shitty at sound localization to begin with, and practically speaking most recorded material is processed to shit in that frequency range to intentionally decorrelate the channels for the perception of "width."
> Higher sample rates are lower latency for the same block size
This is a truly bizarre statement. On the one hand, of course higher sampling rates are lower latency for the same block size measured in samples. But all sampling rates have (almost [0]) identical latency for the same block size measured in time, and lower sampling rates allow less computation for those shorter blocks.
[0] If you are concerned about needing to know future samples in order to calculate the actual signal amplitude at a time between samples, then (a) this matters less at higher sampling rates and (b) this is at most a small number of samples and we're talking about block sizes that presumably exceed, say, 5, so this isn't really a big deal.
The unit of a block size is samples (frames, technically), not seconds. When configuring audio devices for playback you tune both sample rate and block size for latency. It used to be far more common to tune sample rate than block size alone for tracking. This is getting into the weeds of actual devices though.
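Both sides of this are just the same division read in different directions. A small sketch, with made-up helper names and example block sizes that are not taken from any particular device:

```python
def block_latency_ms(block_size_frames, sample_rate_hz):
    """Duration of one buffer of `block_size_frames` frames, in milliseconds."""
    return 1000.0 * block_size_frames / sample_rate_hz


def frames_for_latency(target_ms, sample_rate_hz):
    """Block size in frames that spans `target_ms` at the given rate."""
    return round(target_ms * sample_rate_hz / 1000.0)


if __name__ == "__main__":
    # Same block size in frames: latency shrinks as the rate goes up.
    for rate in (44_100, 48_000, 96_000):
        print(f"256 frames at {rate} Hz = {block_latency_ms(256, rate):.2f} ms")
    # Same latency in time: the higher rate needs more frames per block.
    for rate in (48_000, 96_000):
        print(f"5 ms at {rate} Hz = {frames_for_latency(5.0, rate)} frames")
```

A 256-frame block is about 5.3 ms at 48 kHz and about 2.7 ms at 96 kHz, while a 5 ms block is 240 frames at 48 kHz and 480 frames at 96 kHz, i.e. twice the samples to process for the same wall-clock latency.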
Also to your point, this is why compliant peak meters use a mandatory 4x upsampling at 48k.
> Just personally, I have seen little evidence (personally, professionally, or academically) that there is any advantage for lossless audio for consumer applications
I think the advantage of lossless audio is for archival: rip once, archive as lossless; then you can reencode your library with the latest and greatest lossy encoders over time, or just use the lossless if your player can manage it. CPU and storage are less of a limiting factor for players than 20 years ago.
I don't know how many people are actually managing their libraries these days though, so I dunno if it makes a huge difference.
I wouldn't call archiving a consumer application but I understand the point. Really it gets back to the word: fidelity. Some say it means "truth" but really it's Latin for faithful or, in the context of audio, perceptually identical (a faithful representation). Even among highly trained and skilled listeners, lossy codecs are faithful and imperceptible.
Unless you also have a pretty decent monitoring system the group delay of the speakers isn't going to be consistent so the filters before them wouldn't matter all that much...
Even in that case I would have a hard time believing that any human in a blind test would be able to perceive a group delay of even 360deg above 2k...
You are talking about sub-millisecond differences in the time frequency content arrives at the ears, just tilting your head slightly will have a greater impact...
VHS doesn't store audio in samples nor does it have 490 lines or 30 G(?)PS. NTSC uses 525 lines per frame and PAL uses 625, both with interlacing at 60 fields per second. The VHS system is analog for audio and video, though analog video has discrete lines, and VHS records discrete stripes on the tape which should be one field each.
44100 was chosen for CD, as 20kHz upper limit of human hearing, doubled for Nyquist theorem, plus a 10% guard band so that anti-aliasing filters don't have to be made of magical fairy dust, plus a bit (maybe to make it relatively prime with something else in the system).
The first digital audio systems encoded the audio as a black-and-white video signal on video tapes. 44100 Hz was selected as it was the highest sampling rate achievable on both NTSC and PAL video tapes.
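The arithmetic usually quoted for those video-based PCM adaptors can be checked in a few lines. The line and field counts below are the commonly cited figures for 3 samples per usable video line, not something stated in this thread, so treat them as assumptions:

```python
# Commonly cited figures for PCM adaptors that stored audio in the
# picture area of a video signal: 3 samples per usable line.
SAMPLES_PER_LINE = 3

VIDEO_FORMATS = {
    # name: (usable lines per field, fields per second)
    "NTSC (monochrome, 60 Hz)": (245, 60),
    "PAL (50 Hz)": (294, 50),
}


def pcm_adaptor_sample_rate(lines_per_field, fields_per_second):
    """Audio samples per second that fit into the usable video lines."""
    return lines_per_field * fields_per_second * SAMPLES_PER_LINE


if __name__ == "__main__":
    for name, (lines, fields) in VIDEO_FORMATS.items():
        print(f"{name}: {pcm_adaptor_sample_rate(lines, fields)} Hz")
```

Both come out at 44100 Hz. The "490 lines × 3 samples × 30" version quoted earlier in the thread is the same NTSC number counted per frame (2 × 245 lines, 30 frames) instead of per field.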
AAC has a strange quirk that the window size is dependent on the sampling rate, thus requiring a complete psychoacoustics reoptimization of all encoder parameters for each sampling rate, since a 20msec window sounds very different than a 60msec window, to human ears.
I think the closest thing to an actual "standard" is AES5-2018, "Recommended practice for professional digital audio".
Abstract:
> A sampling frequency of 48 kHz is recommended for the origination, processing, and interchange of audio programs employing pulse-code modulation. Recognition is also given to the use of a 44.1-kHz sampling frequency related to certain consumer digital applications, the use of a 32-kHz sampling frequency for transmission-related applications, and the use of a 96-kHz sampling frequency for applications requiring a higher bandwidth or more relaxed anti-alias filtering. This revision further quantifies the preferred choices for higher sampling frequencies.
Edit: From my personal perspective, 44.1kHz is a legacy minor annoyance
Yes and no. It is the standard for audio in film, which explains the author's focus. But is the audio CD bigger and more "standarder" than DVD and Blu-Ray? I think they're equals, and I personally think this encoder only makes sense for video content. Given all the caveats the author mentions (in particular about the sample rate) I would steer clear of using it when ripping CDs.
Pipewire will quite happily pipe through audio without resampling if it is the only source on a system. You can see this by running pw-top and using speaker-test with various sample rates.
Is 48kHz really the standard nowadays?
44.1kHz, isn't that what lameMP3 uses as default?
> Also to your point, this is why compliant peak meters use a mandatory 4x upsampling at 48k.
This isn't due to latency, it's because the true peak (in the analog waveform) could be between samples.
And if your goal is latency, it makes far more sense to change the block size rather than the sample rate.
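The inter-sample peak point is easy to demonstrate. A minimal sketch, assuming a test tone at a quarter of the sample rate with a 45° phase offset, and using a generic Hann-windowed sinc interpolator (not the filter any metering standard specifies; the function names and the 16-sample kernel half-width are my own choices):

```python
import math


def windowed_sinc(x, half_width):
    """Hann-windowed sinc kernel evaluated at offset x (in samples)."""
    if abs(x) >= half_width:
        return 0.0
    window = 0.5 * (1.0 + math.cos(math.pi * x / half_width))
    sinc = 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)
    return sinc * window


def oversampled_peak(samples, factor=4, half_width=16):
    """Largest absolute value after interpolating `factor` points per sample.

    Only positions with a full kernel on both sides are evaluated,
    so the edges of the buffer are ignored.
    """
    peak = 0.0
    for n in range(half_width, len(samples) - half_width):
        for step in range(factor):
            t = n + step / factor
            value = sum(
                samples[k] * windowed_sinc(t - k, half_width)
                for k in range(n - half_width + 1, n + half_width + 1)
            )
            peak = max(peak, abs(value))
    return peak


if __name__ == "__main__":
    # Full-scale sine at fs/4, phased so every sample misses the crest.
    signal = [math.sin(math.pi * n / 2 + math.pi / 4) for n in range(64)]
    sample_peak = max(abs(s) for s in signal)
    true_peak = oversampled_peak(signal)
    print(f"sample peak:      {sample_peak:.3f}")
    print(f"4x interpolated:  {true_peak:.3f}")
    print(f"under-read by:    {20 * math.log10(true_peak / sample_peak):.2f} dB")
```

Every stored sample of this tone sits at about 0.707, while the reconstructed waveform reaches close to 1.0, so a plain sample-peak meter under-reads by roughly 3 dB. None of that has anything to do with latency.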
> But all the advantages come down to professional or editing use cases.
That sounds about right.
> AAC has a strange quirk that the window size is dependent on the sampling rate
This was of course fixed in Opus.