Does digital audio work like digital images... i.e. more bits for the high end?

Discussion in 'Audio Hardware' started by Kustom 250, Oct 16, 2008.

Thread Status:
Not open for further replies.
  1. Kustom 250

    Kustom 250 Active Member Thread Starter

    Location:
    Wisconsin
    I've worked with digital image files for a goodly part of my career and I've learned how to deal with the quirks of digital images.

    One thing I need to deal with a lot is how the data is divided up and stored. In the most basic terms, there are more bits available as you go up in value: there's a lot more data in light values than in dark ones. I need to take this into account and use it to my advantage all the time.

    As I start dealing with needledrops and digital audio files, I wonder if the same kind of thing is going on there. Is there more data on the high end than on the low?


    Thanks to anyone who can lend a hand.
     
  2. Max F

    Max F Member

    I'll give this a shot.

    I think you want to record with a high enough input level that you are close to clipping on the loudest passages, but with all of the music kept well above the noise floor.

    Same with photos: you want to expose up to just short of clipping so that you don't have too many shadows that can contain pixel noise.
     
  3. Joe Nino-Hernes

    Joe Nino-Hernes Active Member

    Location:
    Chicago, IL
    Bit depth determines maximum dynamic range. Sample rate determines maximum frequency response. Each additional bit gives you approximately 6 dB of dynamic range.

    A CD recorded at 16 bits and 44,100 samples per second has a maximum dynamic range of 96 dB, and the highest frequency that can be recorded is 22,050 Hz.

    The highest frequency that can be recorded for a given sample rate is determined by dividing the sample rate in half.
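    A minimal Python sketch of those two relationships (the 6.02 dB-per-bit figure is the standard quantization approximation):

        def format_limits(bits, sample_rate):
            dynamic_range_db = 6.02 * bits   # ~6 dB of dynamic range per bit
            nyquist_hz = sample_rate / 2     # highest representable frequency
            return dynamic_range_db, nyquist_hz

        print(format_limits(16, 44_100))     # (96.32, 22050.0) -- the CD numbers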
     
  4. Dansk

    Dansk rational romantic mystic cynical idealist

    Location:
    Ontario, Canada
    It's the opposite, actually. CD audio has a sample rate of 44,100 Hz, so there are 44,100 samples in one second of music. Higher-frequency sounds are waves that take less time to go from zero to peak and back again. So essentially, the higher the frequency, the fewer samples there are per cycle.

    This isn't really a problem until you start getting up to the really high stuff, close to 20,000 Hz. There's a cutoff point for CDs called the Nyquist frequency (22,050 Hz, half the sample rate) where you're down to just two samples per cycle. Naturally CDs can't reproduce anything higher than that frequency, because your waves aren't really waves anymore.

    So essentially, to answer your question, there's more data in a low-frequency wave than there is in a high-frequency one.
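    For what it's worth, the samples-per-cycle arithmetic is easy to check (a quick Python sketch):

        fs = 44_100                   # CD sample rate, samples per second
        for f in (100, 1_000, 10_000, 22_050):
            print(f, fs / f)          # samples per cycle: 441.0, 44.1, 4.41, 2.0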
     
  5. bdiament

    bdiament Producer, Engineer, Soundkeeper

    Location:
    New York
    Hi Kustom,


    Added word length (i.e. more bits) in digital audio increases the dynamic range, just as it does with digital video --in video, the dynamic range being the range from black to white. (I believe some here are thinking in terms of frequency rather than dynamics, and dynamics is where the added bits come into play.)

    With audio, the added bits are put at the low end of the dynamic range (i.e. the quietest sounds). The key advantage of more bits in digital audio is better resolution of the quiet sounds. This is where the instrumental harmonics and much of the spatial information reside, and in my view it is why many find 16/44 ("CD standard") digital audio somewhat harmonically thin and spatially "out of focus".
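    A quick back-of-envelope sketch in Python of where those extra bits sit: the level of the smallest nonzero sample value, taking digital full scale as 1.0:

        import math

        for bits in (16, 24):
            smallest = 1 / 2 ** (bits - 1)          # smallest nonzero sample
            print(bits, 20 * math.log10(smallest))  # ~ -90 dBFS and ~ -138 dBFS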

    Best regards,
    Barry
    www.soundkeeperrecordings.com
    www.barrydiamentaudio.com
     
  6. Kustom 250

    Kustom 250 Active Member Thread Starter

    Location:
    Wisconsin
    Thanks!
     
  7. Natt

    Natt Forum Resident

    Location:
    Acton, Canada
    Although higher frequencies are represented by fewer samples, they don't need as many samples either, so that balances out. Lower frequencies only need more samples when they contain higher frequencies. Because we record all frequencies up to a limit, we need a sample rate equal to the task of recording that limit.

    Both light and sound are perceived in such a way that a doubling of intensity is not perceived as a doubling by the eye or ear. Thus when a sensor that records linear light is digitized, the brightest areas of the image are recorded with vast over-precision compared to the darkest areas. Either a gamma or a log curve is used to redistribute those code values so that what is recorded is much more balanced and, depending on the recording method, can be stored more efficiently.

    With audio, log encoding of the linear sample values has been used as a simple form of lossy compression.

    So both in audio and images, if the data is linear, there is an over-abundance of code values for loud sounds and bright images.
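    That log redistribution is easy to see with μ-law companding, the classic telephony example. A minimal Python sketch of the standard μ-law curve (μ = 255):

        import math

        MU = 255.0  # standard mu-law constant

        def mu_law(x):
            # map a linear sample in [-1, 1] to a companded value in [-1, 1]
            return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)

        # A signal at 1% of full scale uses ~23% of the companded code range,
        # so quiet sounds get far more code values than linear coding gives them.
        print(mu_law(0.01), mu_law(0.1), mu_law(1.0))  # ~0.23, ~0.59, 1.0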
     
  8. bdiament

    bdiament Producer, Engineer, Soundkeeper

    Location:
    New York
    Hi Natt,

    Can you explain this please?

    My understanding is that low frequencies are low frequencies and high frequencies are high frequencies. Lows don't "contain" highs and highs don't "contain" lows.

    Best regards,
    Barry
    www.soundkeeperrecordings.com
    www.barrydiamentaudio.com
     
  9. FalloutBoy

    FalloutBoy New Member

    Location:
    Sweden
    Assuming you already have sufficient dynamic range, I don't see how lowering the noise floor further would change the quiet sounds.

    "When we have enough bits to transmit a waveform we are able to transmit the entire dynamic range of the waveform completely transparently for the benefit of the human auditory system. If we use more bits than are necessary then we do not gain any benefits as far as the human auditory is concerned. The waveform will sound completely accurate with the fewest number of required bits, and more bits only lowers the quantization noise further below the noise inherent in the waveform. If this noise is already below the lowest level we can hear then the waveform does not sound any different if we lower the level of the quantization noise even further."
    -- Aldrich, N. (2004). Digital Audio Explained: For The Audio Engineer. Sweetwater
     
  10. bdiament

    bdiament Producer, Engineer, Soundkeeper

    Location:
    New York
    Hi FalloutBoy,

    The question becomes "What is sufficient dynamic range?"

    In a related area, it has already been shown (by KEF decades ago, and subsequently borne out in numerous designs) that to capture all the frequencies we want, we need a bandwidth extending to multiples of the highest desired frequency. In other words, to properly capture a frequency, we need a bandwidth approximately 5 times greater than that frequency.

    Once we've determined what "sufficient" dynamic range is, it has yet to be shown just how much dynamic capability is required to successfully capture it, store it and play it back (the quote you offered notwithstanding).

    All I can suggest is, if the opportunity avails for you to make a recording using very high quality gear, do so at 16-bits and also at 24-bits. Then listen to the results on a good playback system and draw your own conclusions.

    Best regards,
    Barry
    www.soundkeeperrecordings.com
    www.barrydiamentaudio.com
     
  11. Metoo

    Metoo Forum Hall Of Fame

    Location:
    Spain (EU)
    Have you considered the fact that the bits are registering amplitude? The change in amplitude over time is what makes the difference between the timbre of a piano and, say, a clavichord.

    Anybody in the know, please correct me if I am wrong, but this is the reason why I prefer a high-sampling-rate/high-bit-depth combination. In this case, IMHO, you get enough samples to register the changes over time and the bit depth to register more exact changes in amplitude at each sample. This, again IMHO, translates into better instrument texture/timbre rendering and, as Barry mentioned before, better spatial information.
     
  12. Metoo

    Metoo Forum Hall Of Fame

    Location:
    Spain (EU)
    BTW, to reply to the OP: Yes, I believe there are many similarities. The same way that higher bit depths in images show more color realism, higher bit depths in music files show more timbre (instrument tone) fidelity.
     
  13. Rock Klammer

    Rock Klammer Formerly pompatusoflove

    Location:
    Clarkesville, Ga.
    Higher bit depth should also result in more voltage choices for the A/D or the D/A conversion, i.e. more resolution in the voltage domain. This should mean less guesswork for the software that has to fill in the blanks between the voltage steps in the converter.
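    The step sizes themselves are easy to compute. A small sketch, assuming a hypothetical converter with a 2 V peak-to-peak input range (the voltage is an assumption; the step counts are not):

        FULL_SCALE_VPP = 2.0                 # hypothetical 2 V p-p input range
        for bits in (16, 20, 24):
            steps = 2 ** bits
            print(bits, steps, FULL_SCALE_VPP / steps)   # volts per step
        # 16 bits -> ~30.5 uV per step; 24 bits -> ~0.12 uV per step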
     
  14. FalloutBoy

    FalloutBoy New Member

    Location:
    Sweden
    I may have misread you, but it sounds like you're confusing sample rate with modulator speed. The target sample rate only needs to be more than twice the signal bandwidth, but the actual sampling is never done at that rate. All modern converters work at rates from 64 to over 1,000 times the target sample rate.
    If, for example, the target sample rate is 44.1 kS/s and the sigma-delta modulator works at 512x, then:

    1. First the analog signal is filtered by an analog anti-aliasing filter at 11.2896 MHz (half the modulator speed).
    2. The sampling is done by the sigma-delta modulator at 22.5792 MS/s (512 x 44.1 kS/s).
    3. Then the signal is passed through a digital anti-aliasing filter that removes all information above 20 kHz.
    4. Finally it's decimated (511 of every 512 samples are removed) down to the target sample rate of 44.1 kS/s (1x).

    So a target sample rate of 44.1kS/s is all that is needed to capture any waveform with a frequency of up to 20kHz.
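    (A quick sanity check of those numbers in Python, using the 512x ratio from the example above:)

        fs_target = 44_100                 # target sample rate, S/s
        oversampling = 512                 # sigma-delta modulator ratio (example)
        fs_mod = fs_target * oversampling
        print(fs_mod)                      # 22_579_200 S/s = 22.5792 MS/s
        print(fs_mod / 2)                  # 11_289_600 Hz analog anti-alias corner
        print(fs_target / 2)               # 22_050 Hz Nyquist of the target rate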

    I'll let Dan Lavry (A/D-D/A developer) sum it up:

    "You DO NOT need more dots. There is NO ADDITIONAL INFORMATION in higher sampling rates. As pointed out by the VERY FUNDAMENTAL Nyquist theory, we need to sample at above twice the audio bandwidth to contain ALL the information."[1]

    "The optimal sample rate should be largely based on the required signal bandwidth. Audio industry salesmen have been promoting faster than optimal rates. The promotion of such ideas is based on the fallacy that faster rates yield more accuracy and/or more detail. Weather motivated by profit or ignorance, the promoters, leading the industry in the wrong direction, are stating the opposite of what is true."[1]

    How much dynamic range you need depends on what you are recording and how loud you are going to play it back.

    Since (as you wrote earlier) every bit adds about 6 dB to the dynamic range, it's not too difficult to calculate how many bits you need to capture the complete dynamic range of any given signal. There are, however, benefits to using higher bit depths when recording: more bits allow for more headroom to prevent clipping.

    And here is an example of how much you need for playback:
    "[...]if the material is to be played back, the number of bits needed is limited to the amplitude with which the material is to be played. If the material is to be played at a level of peak=85dB SPL then the noise level of an optimal 16-bit system will be -11dB SPL, well below the threshold of hearing. If the material is to be played at a level of peak=85dB SPL, but the white noise floor in the room it is to be played in is 45dB SPL then the maximum dynamic range that will be heard is around 40dB SPL. Even recognizing the human ability to hear up to 25dB or so below a given white noise floor, the maximum dynamic range that will be heard (including the range below the noise floor) is only 65dB. 65dB of dynamic range only requires 11 bits. Therefore, an 11-bit recording will sound the same in this situation as a 24-bit recording."[2]

    I've actually done that. I've also experimented with various bit reduction technologies (24/44.1 -> 16/44.1 with "colored" dither, 24/96 -> 8/96 with noise shaping, etc.). The results were consistent with the technical literature: a difference in bit depth is a difference in noise floor and dynamic range (and thus a difference in when the white noise becomes audible).


    [1] Lavry, Dan. (2004). Sampling Theory For Digital Audio. Lavry Engineering, Inc.
    [2] Aldrich, Nika. (2004). Digital Audio Explained: For The Audio Engineer. Sweetwater
     
  15. FalloutBoy

    FalloutBoy New Member

    Location:
    Sweden
    Of course. That is after all the only thing they do :).

    You're not alone in drawing those conclusions, but it does not really work like that.

    There are only two variables in digital sampling technology: sample rate and bit depth. The first determines the bandwidth and the second the dynamic range.
    Increasing the sample rate or bit depth will only extend those limits. It will not increase the accuracy of the represented waveforms.
    As long as the captured waveforms are within those limits, they will be accurately represented.

    Bit depth is usually the variable people have the most difficulty understanding, since it is a bit counterintuitive -- like the fact that just two quantization steps can describe a complete waveform with all its complexity and detail.
    It also requires a basic understanding of quantization, noise and dither.
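    Here is a toy numpy sketch of that two-level claim. One caveat: with only two output levels, a comparator needs uniform (RPDF) dither for its average to track the signal, so that is what is used here; real word-length reduction normally uses TPDF dither instead:

        import numpy as np

        rng = np.random.default_rng(0)
        fs, f = 44_100, 1_000
        n = np.arange(fs)                          # one second of samples
        x = 0.5 * np.sin(2 * np.pi * f * n / fs)   # -6 dBFS 1 kHz tone

        dither = rng.uniform(-1, 1, n.size)        # RPDF dither spanning the range
        y = np.where(x + dither >= 0, 1.0, -1.0)   # 1-bit (two-level) quantizer

        # The tone survives in the average: project the 1-bit stream onto it.
        ref = np.sin(2 * np.pi * f * n / fs)
        print(2 * np.dot(y, ref) / n.size)         # ~0.5, the original amplitude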
     
  16. Sgt. Pepper

    Sgt. Pepper Member

    Location:
    Pittsburgh, PA
    As far as bit depth is concerned, there is also the issue of quantization error, though. I would think that most recordings have a dynamic range that fits within the CD standard, but those low-level sounds would not be accurately represented. Our ears couldn't handle a recording that utilized all 144 dB of dynamic range available in 24 bit, but the extra bits help with the low-level sounds. Of course, you know more about this than I do. :)
     
  17. Metoo

    Metoo Forum Hall Of Fame

    Location:
    Spain (EU)
    I can understand this, but we also know that in digital audio there are always more samples representing the lower frequencies than the higher ones, no matter what the sampling frequency.

    On the other hand, the limitations I notice in Redbook-quality music compared to high-bit-depth/high-sample-rate versions could be due to brickwall filter issues, but I don't think those explain the fact that the higher-bit-depth examples usually render the music with a more majestic/full feeling, whereas 16 bits sounds flat compared to them. What, in your view, would explain this?
     
  18. dartira

    dartira rise and shine like a far out superstar

    Perfect theory does not equal perfect implementation.

    I think the differences we hear between 44.1 / 96K and 16 / 24 bits may have to do with a possible abundance of information obscuring the inadequacies of the technology that's supposed to reproduce all this perfection.
     
  19. bdiament

    bdiament Producer, Engineer, Soundkeeper

    Location:
    New York
    Hi FalloutBoy,

    I am not confusing them. (Also, you are describing one particular approach, not how all A-D converters work.)


    I am familiar with Nyquist. It does a very good job of describing the requirements for digitally recording sine waves. Try looking at a swept square wave (which is closer to a musical waveform than a sine wave is) on an oscilloscope if you need evidence beyond what your ears provide.


    Playback volume is unrelated to the word length required to properly capture the sound of Life in a digital recording.

    What you describe would be fine if we were recording sine waves. After all, CD can do 96 dB (allegedly). But at the lower end of that 96 dB, the sound coarsens and is not very pretty or anything close to what occurred at the microphone inputs.


    I don't know what you've listened to or what recordings you've made and with what gear. If all you get from 24-bits over 16 is noise floor and dynamic range, then we just hear it very, very differently.

    Best regards,
    Barry
    www.soundkeeperrecordings.com
    www.barrydiamentaudio.com
     
  20. bdiament

    bdiament Producer, Engineer, Soundkeeper

    Location:
    New York
    Hi Michael,

    I think we are saying the same thing, just with different words.

    Some get lost in the theories and tend to get simplistic in their analysis. What is forgotten (or perhaps not experienced directly) is that most of the popular theoretical analyses are based on static signals, which are nothing like a music signal. These same theories were used 25 years ago in the attempt to convince us (successfully in some quarters, by the looks of it) that CD is "perfect".

    Best regards,
    Barry
    www.soundkeeperrecordings.com
    www.barrydiamentaudio.com
     
  21. Kustom 250

    Kustom 250 Active Member Thread Starter

    Location:
    Wisconsin

    I'm an artist, not a scientist, so I'm out of my depth here. But I'm a reasonably smart guy and I work with REALLY smart people all the time at the engineering school where I spend my days.

    One of the groups I help make graphics/videos for is the people who study the flow of liquids.

    All I know is that they are constantly coming up with new ways to model those waveforms. They're not even close to getting them "perfect". And at least to me, the waveforms they're dealing with aren't even as complex as sound. They're still working on modeling exactly how liquids flow out of the end of a pipe. Recorded sound seems a lot more complicated than that to me.


    Anyway. Thanks to everyone for helping me better understand digital audio.
     
  22. Taurus

    Taurus Senior Member

    Location:
    Houston, Texas
    That's basically what I also believe, and why people (including myself) usually hear more detail in parts of music that are recorded at low levels using 24-bit words vs. 16-bit... hopefully this isn't a placebo effect! But I don't think that effect is at work, because before I had even heard of the low-level detail issue, I noticed that "reverb tails" lasted longer while listening to my just-purchased copy of Eric Johnson's Ah Via Musicom DVD-Audio. That disc used the 96kHz/24-bit format. I've owned the CD version and listened to it regularly since that album debuted in 1991(?) and am familiar with its sonic personality. So at the time I figured the longer tails were just a result of the increased sampling rate.

    But after researching more about those longer sample words, I realized why the sound they represent should be of higher quality, theoretically anyway. Using some knowledge from my digital coursework back in the 80s (I have a little A.A.S. two-year degree in industrial electronics) and my statistics courses, I scribbled out the following calculations.

    Voltage levels possible with 16 bits, in binary (each bit has two possible values, 1 or 0):

    2^16 = 65,536 possible voltage levels

    Voltage levels possible with 24 bits:

    2^24 = 16,777,216 levels

    So it's apparent the 24-bit word provides much less "chunky" voltage divisions with which to build a waveform.

    Reality check!

    1) I read a few years ago that many 24-bit converters cannot actually resolve such fine divisions, and act more like 20-bit converters instead. But 20 bits is still much better than 16.

    2) I have yet to find out if the average human hearing system can actually perceive all of the divisions possible with 24 bits.

    Lastly, I read on this forum somewhere that using more than X bits in a sample word means an engineer no longer has to use dithering noise (or at least much less of it) to generate a clean signal, but that kind of knowledge is beyond the scope of this writer's brain :D so maybe someone more knowledgeable about that issue could comment on it.

    Edit: I wonder if increased bit depth is sort of equivalent to the contrast ratio of a monitor. Last night I was comparing plasma HDTVs to LCD HDTVs at an A/V retailer and noticed the plasmas' ratios were much larger, which IIRC is supposed to partly account for the blacker blacks & richer colors plasma sets can produce.
     
  23. bdiament

    bdiament Producer, Engineer, Soundkeeper

    Location:
    New York
    Hi Taurus,

    The benefits of the additional bits occur at the bottom of the dynamic range. They do not result in smaller "steps". The 16th bit from the top in a 24-bit file is identical to the 16th bit from the top in a 16-bit file; it is at the 17th bit that the benefits of 24 bits begin.


    Dither is (or should be) used only when reducing the word length of a file, say from a 24-bit original to a 16-bit CD master. If you're staying at 24 bits or staying at 16 bits, there is no need for dither, so being at 24 doesn't make a difference there. Dither only applies when word length is being changed to a smaller value.
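    A minimal numpy sketch of that word-length reduction, with a test tone deliberately quieter than one 16-bit step so the effect of the dither is stark (sample values are in 24-bit integer counts, so one 16-bit LSB is 256 counts):

        import numpy as np

        rng = np.random.default_rng(0)
        fs, f = 44_100, 1_000
        n = np.arange(fs)
        x = 100.0 * np.sin(2 * np.pi * f * n / fs)   # quiet tone, < one 16-bit LSB

        lsb = 256                                    # one 16-bit LSB in 24-bit counts
        trunc = np.floor(x / lsb) * lsb              # undithered word-length reduction
        tpdf = (rng.random(n.size) - rng.random(n.size)) * lsb
        dith = np.round((x + tpdf) / lsb) * lsb      # TPDF-dithered reduction

        for name, y in (("truncated", trunc), ("dithered", dith)):
            e = y - x                                # quantization error
            print(name, np.corrcoef(e, x)[0, 1])     # dithered error ~uncorrelated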

    Best regards,
    Barry
    www.soundkeeperrecordings.com
    www.barrydiamentaudio.com
     
  24. Grant

    Grant Life is a rock, but the radio rolled me!

    Sure about that?
     
  25. FalloutBoy

    FalloutBoy New Member

    Location:
    Sweden
    Yes, but that does not matter. As long as you have more than two samples per cycle, you can recreate any waveform.
    That is the fundamental point of the Nyquist theorem. It's all or nothing: either the waveform can be recreated (it's within the bandwidth) or it can't be recreated (it's outside the bandwidth).

    To reiterate Dan Lavry's words:
    "You DO NOT need more dots. There is NO ADDITIONAL INFORMATION in higher sampling rates. As pointed out by the VERY FUNDAMENTAL Nyquist theory, we need to sample at above twice the audio bandwidth to contain ALL the information."

    There are many variables that can make material at different bit depths sound different, but most are related to hardware/software/listening issues and not the actual bit depths themselves.

    The only way to test the difference between bit depths is with everything else equal. And with everything else being equal, the difference between bit depths is just a difference in noise floor and dynamic range. This was also confirmed by the only major listening test that has been published on this issue:

    "The test results show that the CD-quality A/D/A loop was undetectable at normal-to-loud listening levels, by any of the subjects, on any of the playback systems. The noise of the CD-quality loop was audible only at very elevated levels."[1]

    You need to play at levels of about 96 dB SPL plus the background noise of the room (>20 dB SPL), which is over 116 dB SPL, to hear the noise floor of a 16-bit system.


    [1] Meyer, E. Brad; Moran, David R. (2007). Audibility of a CD-Standard A/D/A Loop Inserted into High-Resolution Audio Playback. JAES, Volume 55, Issue 9, pp. 775-779
     