Native sound format

SpyderTL · Post by **SpyderTL** » Sat Jan 17, 2015 8:22 pm

I'm trying to decide on a native audio stream format -- something that I can convert any audio source to, and later convert to any other format.

The most obvious one would be PCM 44.1 16-bit stereo, but I'm also considering using a Float based format, as well.

I'm also thinking about the possibility of creating a few audio stream readers that would read a particular format, and convert each sample to several other formats on-the-fly.

On an unrelated note, I've put my networking/Internet code away for a while, but I did have one question that you guys may be able to answer. If you are sending IP packets to a machine on your local subnet, I know you can send the Ethernet packets directly to the target MAC address. However, does anyone know what happens if you just send every packet to the gateway, even for local network machines? Will the gateway route them back to the local network, or do they get dropped? Just curious.

Thanks guys.

Edit: More thoughts on the Float audio format... The only reason I'm considering using it is so that all audio can be normalized between 0 and 1 (or -1 and 1). This would be much easier to deal with when generating waveforms, and processing the data in-line (changing volume, adding effects, etc.), and using the audio data for non-audio purposes, like rendering visualizations, or converting to various formats.
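To make the normalization idea concrete, here is a minimal sketch of converting between integer PCM and floats in the -1 to 1 range. The function names and the choice of 32768 as the scale factor are assumptions made for illustration, not something specified in the post; other scaling conventions exist.

```python
def s16_to_float(sample: int) -> float:
    """Map a signed 16-bit sample (-32768..32767) into the range -1.0..1.0."""
    return sample / 32768.0


def float_to_s16(value: float) -> int:
    """Clamp to [-1.0, 1.0], then scale back to a signed 16-bit sample."""
    clamped = max(-1.0, min(1.0, value))
    # +1.0 would scale to 32768, which does not fit, so clamp again.
    return max(-32768, min(32767, round(clamped * 32768.0)))


def u8_to_float(sample: int) -> float:
    """Unsigned 8-bit PCM is centered on 128 rather than on 0."""
    return (sample - 128) / 128.0


def apply_gain(samples, gain):
    """Changing volume on normalized floats is a plain multiplication."""
    return [s * gain for s in samples]
```

With this scaling, an in-range 16-bit value survives a round trip through float unchanged, and a volume change does not need to know the bit depth of the original source.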

It seems working with integer based formats would require more work. However, most audio will be pulled from MP3 files, so I'm not sure which approach would work better in that case. Anyone happen to know anything about compressed audio formats?

embryo · Post by **embryo** » Sun Jan 18, 2015 4:13 am

SpyderTL wrote:The most obvious one would be PCM 44.1 16-bit stereo, but I'm also considering using a Float based format, as well.

Float is a bit less efficient from the storage and performance point of view. It requires more space (including some unused exponent bits), and its processing is more expensive because 16-bit words can be processed twice as fast as a float's 4 bytes.

But if your goal is simplicity, then float can deliver a uniform way of storing and processing many formats (after conversion, of course). However, the uniformity here is just a data format, so you can choose any other format and use it in a uniform way, just like floats, achieving an even simpler solution by removing the exponent and the float-to-integer conversion.

More interesting variants of storage/processing can be found if audio is expressed as a bigger (than 16-bit or float) data structure with compression and many other features combined. Other features can include anything from a composition name up to song text, music notes, related compositions and embedded artificial intelligence.

Octocontrabass · Post by **Octocontrabass** » Sun Jan 18, 2015 5:59 am

SpyderTL wrote:The most obvious one would be PCM 44.1 16-bit stereo, but I'm also considering using a Float based format, as well.

The two most useful sample rates for audio are 44.1kHz and 48kHz. All modern audio hardware supports 48kHz natively, and most is also capable of 44.1kHz. Lower rates start throwing away audible information, and higher rates are a waste of space and processing power.

16 bits is enough for storing recordings with proper mastering, but float is the most useful for performing inline manipulation of the audio data due to its immense dynamic range.
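As a rough illustration of why the extra range matters for inline processing, the sketch below boosts a signal and then cuts it back by the same factor, once with 16-bit intermediates and once with float intermediates. The function names and sample values are made up for the example.

```python
def clip_s16(x):
    """Clamp a value to the signed 16-bit range."""
    return max(-32768, min(32767, x))


def boost_then_cut_s16(samples, gain):
    """Every intermediate result is stored as 16-bit, so peaks are clipped."""
    boosted = [clip_s16(s * gain) for s in samples]
    return [b // gain for b in boosted]


def boost_then_cut_float(samples, gain):
    """Intermediates are floats, so values above full scale survive."""
    as_float = [s / 32768.0 for s in samples]
    boosted = [f * gain for f in as_float]   # may exceed 1.0 here
    restored = [b / gain for b in boosted]
    # Convert back to 16-bit only once, at the very end.
    return [clip_s16(round(r * 32768.0)) for r in restored]
```

For the input `[1000, 20000, -30000]` with a gain of 4, the 16-bit path returns `[1000, 8191, -8192]` because the two loud samples were clipped in the middle of the chain, while the float path returns the original values.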

Stereo is certainly good enough for most users, but keep in mind you may eventually want to support more channels.

There are also some more exotic features you may wish to implement in the future, but I'll ignore those for now as they are way beyond what you're looking at for the moment.

SpyderTL wrote:It seems working with integer based formats would require more work. However, most audio will be pulled from MP3 files, so I'm not sure which approach would work better in that case. Anyone happen to know anything about compressed audio formats?

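MP3 doesn't store PCM samples at all. It stores quantized frequency-domain coefficients, and the decoder runs an inverse transform (a modified discrete cosine transform, which is closely related to the Fourier transform) to produce the samples. That transform is usually implemented in floating point (though fixed-point decoders exist), so floating-point makes the most sense in this situation.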

Why is most audio coming from MP3 files? There have been many improvements in lossy audio compression in the past 20 years.

SpyderTL · Post by **SpyderTL** » Sun Jan 18, 2015 12:57 pm

Yeah, I probably should have said multi-channel instead of stereo, and compressed instead of MP3...

So, now I'm thinking about using Audio Stream Reader objects instead of picking a single native format. Anyone that's used an XMLReader in .NET should understand how this would work.

Let's take MP3 data, for an example. With a MP3Reader object (or any reader object, for that matter), you would call one method if you want 8-bit unsigned values, or call a different method if you want 16-bit signed values, or a third method if you want float values. Each call would give you one value, but it would also update several fields on the reader object that you could use to determine which channel that value belongs to, and the time offset for that value, which would allow you to "read" data for any number of channels, at any bit rate. The calling function would then decide what to do with the data -- use it directly, combine it with other data (ex. stereo to mono), or throw it away (for example, if the source bit rate is too high, and the timestamp is earlier than the time that is needed at the moment). The MP3 reader object would decompress the MP3 data, as needed, but other readers could exist to read AAC streams, WAV files, or pull live audio from a microphone or line in on the sound card.
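A minimal sketch of what such a reader might look like, assuming a pull-style interface where every call returns one value and updates `channel` and `time_offset`. All class, method and field names here are hypothetical, and a sine source stands in for a real MP3 decoder so the example stays self-contained.

```python
import math


class AudioStreamReader:
    """Hypothetical base class; subclasses only supply _next_float()."""

    def __init__(self, sample_rate, channels):
        self.sample_rate = sample_rate
        self.channels = channels
        self.channel = -1        # channel of the value most recently read
        self.time_offset = 0.0   # seconds from stream start for that value
        self._index = 0          # running count of values handed out

    def _next_float(self, frame, channel):
        raise NotImplementedError

    def read_float(self):
        # Values are interleaved: frame 0 ch 0, frame 0 ch 1, frame 1 ch 0...
        frame, self.channel = divmod(self._index, self.channels)
        self.time_offset = frame / self.sample_rate
        self._index += 1
        return self._next_float(frame, self.channel)

    def read_s16(self):
        v = max(-1.0, min(1.0, self.read_float()))
        return min(32767, round(v * 32768.0))

    def read_u8(self):
        v = max(-1.0, min(1.0, self.read_float()))
        return min(255, round(v * 128.0) + 128)


class SineReader(AudioStreamReader):
    """Stand-in source; a real MP3Reader would decode frames here instead."""

    def __init__(self, frequency, sample_rate=44100, channels=2):
        super().__init__(sample_rate, channels)
        self.frequency = frequency

    def _next_float(self, frame, channel):
        return math.sin(2.0 * math.pi * self.frequency * frame / self.sample_rate)
```

A caller would loop on `read_s16()` (or `read_float()`), then inspect `reader.channel` and `reader.time_offset` to decide whether to use, mix or discard the value.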

Hopefully, I can make all of this work fast enough to keep a consistent stream of data supplied to the sound card (using the OS level object framework that I currently have). I'll start by creating a simple sine wave generator, and see if I can get it to supply data to the sound card at 44.1x16x2, and go from there.
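For reference, a sine generator producing that 44.1 kHz, 16-bit, stereo layout could be sketched as follows. The function name, the little-endian interleaved byte order and the default amplitude are assumptions chosen for the example; the OS's actual object framework is not shown in this thread.

```python
import math
import struct

SAMPLE_RATE = 44100
CHANNELS = 2
BYTES_PER_SAMPLE = 2
# Data rate the sound card has to be fed at: 176400 bytes per second.
BYTES_PER_SECOND = SAMPLE_RATE * CHANNELS * BYTES_PER_SAMPLE


def sine_s16_stereo(frequency, seconds, sample_rate=SAMPLE_RATE, amplitude=0.5):
    """Return interleaved little-endian signed 16-bit stereo PCM bytes."""
    frames = int(seconds * sample_rate)
    out = bytearray()
    for n in range(frames):
        value = amplitude * math.sin(2.0 * math.pi * frequency * n / sample_rate)
        sample = max(-32768, min(32767, round(value * 32767.0)))
        # Same value on the left and right channels.
        out += struct.pack("<hh", sample, sample)
    return bytes(out)
```

Half a second of audio comes out as 88200 bytes, which gives a feel for how quickly buffers have to be refilled.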

Octocontrabass · Post by **Octocontrabass** » Sun Jan 18, 2015 3:25 pm

SpyderTL wrote:compressed instead of MP3...

Lossless or lossy? (MP3 and similar are lossy.)

Lossy encoders and decoders usually operate on floats, due to the ease of maintaining appropriate precision during the various transformations that are applied on the signal, though decoders that use only integers are popular for embedded applications that often don't have good (or any!) floating-point capabilities.

Lossless encoders and decoders almost always operate on integers, due to the difficulty of maintaining deterministic results with floats.
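The determinism point can be illustrated with a toy example. The delta predictor below is only a stand-in (real lossless codecs such as FLAC use more elaborate integer predictors), and the function names are made up for the sketch.

```python
def delta_encode(samples):
    """Store each integer sample as its difference from the previous one."""
    prev = 0
    residuals = []
    for s in samples:
        residuals.append(s - prev)
        prev = s
    return residuals


def delta_decode(residuals):
    """Undo delta_encode exactly; integer addition has no rounding."""
    prev = 0
    out = []
    for r in residuals:
        prev += r
        out.append(prev)
    return out


def float_sum_depends_on_order():
    """Float addition is not associative, so results can vary with order."""
    return (0.1 + 0.2) + 0.3 != 0.1 + (0.2 + 0.3)
```

The integer round trip is bit-exact on any machine, whereas a float-based predictor could produce slightly different results if an encoder and a decoder evaluated the same expression in a different order or precision.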

SpyderTL wrote:or throw it away (for example, if the source bit rate is too high, and the timestamp is earlier than the time that is needed at the moment).

That will sound pretty bad. You'll probably want to implement a better sample rate converter once you get everything working. (Also, you mean sample rate, not bit rate.)
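As a small step up from discarding samples, a converter can interpolate between neighbouring samples. The sketch below uses linear interpolation on a single channel; this is an assumption chosen for brevity, not a recommendation from the thread. It applies no low-pass filtering, so downsampling can still alias, and a proper converter would use a band-limited filter instead.

```python
def resample_linear(samples, src_rate, dst_rate):
    """Resample one channel by linear interpolation between neighbours."""
    if not samples:
        return []
    out_len = (len(samples) * dst_rate) // src_rate
    out = []
    for i in range(out_len):
        pos = i * src_rate / dst_rate          # position in the source stream
        i0 = int(pos)
        i1 = min(i0 + 1, len(samples) - 1)     # hold the last sample at the end
        frac = pos - i0
        out.append(samples[i0] * (1.0 - frac) + samples[i1] * frac)
    return out
```

For example, `resample_linear([0.0, 1.0], 1, 2)` doubles the rate and yields `[0.0, 0.5, 1.0, 1.0]`.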

Geri · Post by **Geri** » Fri Jan 23, 2015 9:32 am

44.1 kHz + short int + stereo for me.
If your Sound Blaster code is working, please let me know on Skype.

SpyderTL · Post by **SpyderTL** » Fri Jan 23, 2015 5:45 pm

My Sound Blaster 16 (DSP 4.05) code is "working", in that I can play audio clips in VirtualBox.

I am waiting on a VirtualBox update to fix an issue with its interrupt flags register before I get back into it. But if you are up for the challenge, I can update the ISO on CodePlex for you to test with.

There are several different ways to play audio on the SB16, and I'm only using one of them at the moment (16-bit stereo 44100 DMA auto-initialize), but it's better than nothing.

I'll let you know when the ISO is ready.

EDIT: The ISO is ready. It's uploaded to https://ozone.codeplex.com.

Just type "Audio.GetDevices" and hit enter. If it says "Creative SoundBlaster ISA", then it's working!

The code sends the Get DSP Version command (0xE1) to port 0x22c, and reads two bytes from port 0x22a. It expects 0x04 and 0x05 to be returned. Once you have that working, you can type "Audio.GetDevices.First.Play(SineWave.Create(220.0, 44100))" and hit enter to hear a sine wave for a few seconds. You can change the frequency (220.0) if you want a different note, and you can also use "SquareWave" and "TriangleWave", as well.
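The detection step can be sketched as below. The port numbers and the 0xE1 command come from the post; the status polling on ports 0x22C and 0x22E follows the usual Sound Blaster DSP convention and is an assumption about what a robust probe would add, not a description of the Ozone code. `FakeSB16Ports` is a hypothetical stand-in for real port I/O so the sketch can run anywhere.

```python
SB_BASE = 0x220
DSP_READ_DATA = SB_BASE + 0xA    # 0x22A: read data from the DSP
DSP_WRITE = SB_BASE + 0xC        # 0x22C: write command; bit 7 on read = busy
DSP_READ_STATUS = SB_BASE + 0xE  # 0x22E: bit 7 set = data available
CMD_GET_VERSION = 0xE1


class FakeSB16Ports:
    """Hypothetical simulated device; real code would use in/out instructions."""

    def __init__(self, major=4, minor=5):
        self._version = (major, minor)
        self._pending = []

    def outb(self, port, value):
        if port == DSP_WRITE and value == CMD_GET_VERSION:
            self._pending = list(self._version)

    def inb(self, port):
        if port == DSP_WRITE:
            return 0x00                      # bit 7 clear: ready for a write
        if port == DSP_READ_STATUS:
            return 0x80 if self._pending else 0x00
        if port == DSP_READ_DATA and self._pending:
            return self._pending.pop(0)
        return 0xFF


def get_dsp_version(ports, max_polls=1000):
    """Return (major, minor), or None if the DSP never responds."""
    for _ in range(max_polls):
        if not (ports.inb(DSP_WRITE) & 0x80):
            break
    else:
        return None
    ports.outb(DSP_WRITE, CMD_GET_VERSION)
    version = []
    for _ in range(2):
        for _ in range(max_polls):
            if ports.inb(DSP_READ_STATUS) & 0x80:
                break
        else:
            return None
        version.append(ports.inb(DSP_READ_DATA))
    return tuple(version)
```

Against the simulated device, `get_dsp_version(FakeSB16Ports())` returns `(4, 5)`, matching the 0x04 and 0x05 bytes the post expects.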

This method uses port 0x220, interrupt 7 and DMA channel 5 to communicate with the Sound Blaster. Eventually, it will autodetect these values...

The ISO version above ignores the interrupt status register (Mixer register 0x82), but I will re-enable that code when VirtualBox is fixed.

Let me know if you run into any problems.

Thanks.

Octocontrabass · Post by **Octocontrabass** » Sat Jan 24, 2015 3:50 am

SpyderTL wrote:ISO

Do you have a floppy disk image that I can use to try it with my Sound Blaster 16? (It would also need to run on a 386, which might be a bigger problem than fitting your OS on a floppy disk...)

SpyderTL · Post by **SpyderTL** » Sat Jan 24, 2015 10:32 am

Yep. It's up on CodePlex, now.

Octocontrabass · Post by **Octocontrabass** » Sun Jan 25, 2015 4:37 pm

Unfortunately, the floppy disk doesn't work on the 386. It sits for a while with the disk light on, then presumably triple-faults and reboots.

I did notice your boot code does a few things that might not work on a PC old enough to have a real Sound Blaster 16:

INT 0x15 AX=0x2401 (sys:EnableA20Gate) is known to be unreliable
INT 0x15 EAX=0xE820 probably isn't implemented
CR0.NE requires at least a 486DX

As I said earlier, getting it to run on a 386 might be a bigger problem.

Looks like I'll have to pull out a different test box...

Octocontrabass · Post by **Octocontrabass** » Sun Jan 25, 2015 6:50 pm

I pulled out something a little newer (the BIOS date is 1998 instead of 1991), so I can test both the floppy disk and CD version.

The floppy disk reboots immediately.

The CD gets me to a functional command line, but doesn't detect the installed AWE64. (Not that I expected it to.) It also likes to reboot a lot: pressing tab on a blank command line prints some text and then reboots. Trying to get the ACPI processor count reboots. Trying to get the status of the (nonexistent) first audio device reboots.

You might want to invest in an error handler that does something other than reboot.

Geri · Post by **Geri** » Sun Jan 25, 2015 7:43 pm

Thank you, I will try it. I hope I can add sound support in the next version.

carbonBased · Post by **carbonBased** » Sun Jan 25, 2015 7:46 pm

Why do you need *a* native sound format?

Why not implement a pluggable system and allow for many sound formats?

--Jeff

SpyderTL · Post by **SpyderTL** » Mon Jan 26, 2015 4:04 pm

Octocontrabass wrote:Unfortunately, the floppy disk doesn't work on the 386. It sits for a while with the disk light on, then presumably triple-faults and reboots.

I did notice your boot code does a few things that might not work on a PC old enough to have a real Sound Blaster 16:
INT 0x15 AX=0x2401 (sys:EnableA20Gate) is known to be unreliable
INT 0x15 EAX=0xE820 probably isn't implemented
CR0.NE requires at least a 486DX
As I said earlier, getting it to run on a 386 might be a bigger problem.

Looks like I'll have to pull out a different test box...

Yep.. Sorry about that. I don't have a machine old enough to test that kind of stuff. I could comment all of that code out and everything would probably work fine.

Let me know if you can't get it working and I'll see what I can do...

Maybe the Sound Blaster wasn't the way to go, after all.

Octocontrabass wrote:The floppy disk reboots immediately.

That's disappointing, but good to know. I'd be interested to know what the specs are on that machine.

Octocontrabass wrote:The CD gets me to a functional command line, but doesn't detect the installed AWE64. (Not that I expected it to.) It also likes to reboot a lot: pressing tab on a blank command line prints some text and then reboots.

How much RAM is on that machine? I am making some assumptions about available RAM, but that code "should" work with anything over 4 MB.

Octocontrabass wrote:Trying to get the ACPI processor count reboots. Trying to get the status of the (nonexistent) first audio device reboots. You might want to invest in an error handler that does something other than reboot.

For now, the code assumes that you won't call anything that's not available. I started to work on handling errors and exceptions, but that opened up a pretty big can of worms, so I decided to put it on my to-do list.

SpyderTL · Post by **SpyderTL** » Mon Jan 26, 2015 4:13 pm

carbonBased wrote:Why do you need *a* native sound format?

Why not implement a pluggable system and allow for many sound formats?

--Jeff

Using a standard audio format was my initial idea, but I think that I've decided to go with a more dynamic approach, and let the "consuming" application decide what format it wants, and force the "provider" of the data to provide the data in several key formats (8-bit unsigned, 16-bit signed, 32-bit float, etc.).

I posted more information above, if you are interested. Here is a snip:

SpyderTL wrote:So, now I'm thinking about using Audio Stream Reader objects instead of picking a single native format. Anyone that's used an XMLReader in .NET should understand how this would work.

Let's take MP3 data, for an example. With a MP3Reader object (or any reader object, for that matter), you would call one method if you want 8-bit unsigned values, or call a different method if you want 16-bit signed values, or a third method if you want float values. Each call would give you one value, but it would also update several fields on the reader object that you could use to determine which channel that value belongs to, and the time offset for that value, which would allow you to "read" data for any number of channels, at any bit rate. The calling function would then decide what to do with the data -- use it directly, combine it with other data (ex. stereo to mono), or throw it away (for example, if the source bit rate is too high, and the timestamp is earlier than the time that is needed at the moment). The MP3 reader object would decompress the MP3 data, as needed, but other readers could exist to read AAC streams, WAV files, or pull live audio from a microphone or line in on the sound card.

Hopefully, I can make all of this work fast enough to keep a consistent stream of data supplied to the sound card (using the OS level object framework that I currently have). I'll start by creating a simple sine wave generator, and see if I can get it to supply data to the sound card at 44.1x16x2, and go from there.

OSDev.org
