Audio Device Design

Jeko
Member
Posts: 500
Joined: Fri Mar 17, 2006 12:00 am
Location: Napoli, Italy

Audio Device Design

Post by Jeko »

After many hours of reading other OSes' code, I've managed to design my own audio device driver infrastructure. (The video one too, but I'll talk about that in another post.)
From what I've seen, audio devices are just simple block devices: if you want to play something you write to the device, and if you want to change the DSP speed, mixer values, etc. you must use IOCTLs.
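For example, the userland code I've seen looks roughly like this (a sketch only, assuming an OSS-style /dev/dsp and <sys/soundcard.h>):

Code: Select all

/* Classic "write + IOCTL" playback, OSS style (sketch). */
#include <sys/soundcard.h>
#include <sys/ioctl.h>
#include <fcntl.h>
#include <unistd.h>

int main(void)
{
    int fd = open("/dev/dsp", O_WRONLY);
    if (fd < 0)
        return 1;

    int fmt = AFMT_S16_LE, channels = 2, rate = 44100;
    ioctl(fd, SNDCTL_DSP_SETFMT, &fmt);        /* sample format */
    ioctl(fd, SNDCTL_DSP_CHANNELS, &channels); /* stereo        */
    ioctl(fd, SNDCTL_DSP_SPEED, &rate);        /* "DSP speed"   */

    short buf[4096] = { 0 };   /* silence; real PCM data goes here */
    write(fd, buf, sizeof(buf));

    close(fd);
    return 0;
}
At least this is what I've understood. Is it right?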
Rewriting virtual memory manager - Working on ELF support - Working on Device Drivers Handling

http://sourceforge.net/projects/jeko - Jeko Operating System
Brendan
Member
Posts: 8561
Joined: Sat Jan 15, 2005 12:00 am
Location: At his keyboard!

Re: Audio Device Design

Post by Brendan »

Hi,
Jeko wrote:From what I've seen, audio devices are just simple block devices: if you want to play something you write to the device, and if you want to change the DSP speed, mixer values, etc. you must use IOCTLs.
At least this is what I've understood. Is it right?
That depends on what the OS provides.

I'd probably prefer a "play sound <fileName> at <time> from <position> with <volume> using <effects>" style of interface; where the sound driver loads the file from disk (if it's not already cached), converts it into whatever format the sound card wants, applies any special effects, adjusts the volume for each speaker, then mixes it with any data already in the output buffer so that it's heard at exactly the right time.
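As a very rough sketch, the entry point for that kind of interface might look like this (every name and type here is hypothetical, just to make the idea concrete):

Code: Select all

#include <stddef.h>
#include <stdint.h>

typedef struct { float x, y, z; } vec3_t;
typedef int sound_handle_t;          /* opaque instance handle    */
typedef struct effect effect_t;      /* opaque effect description */

/* Hypothetical "fire and forget" call; the sound driver loads and
 * caches fileName, converts it, applies effects, sets per-speaker
 * volume, and mixes it so it is heard at exactly `when`. */
sound_handle_t playSound(const char *fileName,
                         uint64_t when,       /* absolute play time */
                         vec3_t from,         /* 3D source position */
                         float volume,        /* 0.0 .. 1.0         */
                         const effect_t *effects,
                         size_t effectCount);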

The old "everything is a file (except when it's an IOCTL)" theory isn't really that useful IMHO (especially for a sound driver where many different applications should be able to write to the same "file" at the same time)...


Cheers,

Brendan
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.
Brynet-Inc
Member
Posts: 2426
Joined: Tue Oct 17, 2006 9:29 pm
Libera.chat IRC: brynet
Location: Canada

Re: Audio Device Design

Post by Brynet-Inc »

Brendan wrote:The old "everything is a file (except when it's an IOCTL)" theory isn't really that useful IMHO (especially for a sound driver where many different applications should be able to write to the same "file" at the same time)...
That's why we have sound servers/audio mixers like ESD, JACK, aRts, etc. ;)

I think it's very useful, but I guess this is where preferences kick in... I, for instance, think it's a brilliant design (everything is a file, and some may *also* have IOCTLs).

The fact that *you* don't think it's a good design doesn't mean it's a bad one... ;)

@Jeko, OpenBSD's audio(4)/mixer(4) API layer is device-independent, and the devices it exposes are several *character* devices, not block devices.

To quote the man page:
audio(4) wrote:There are four device files available for audio operation: /dev/audio, /dev/sound, /dev/audioctl, and /dev/mixer. /dev/audio and /dev/sound are used for recording or playback of digital samples. /dev/mixer is used to manipulate volume, recording source, or other audio mixer functions. /dev/audioctl accepts the same ioctl(2) operations as /dev/sound, but no other operations. In contrast to /dev/sound, which has the exclusive open property, /dev/audioctl can be opened at any time and can be used to manipulate the audio device while it is in use.
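In practice, setting playback parameters on those devices also goes through ioctl(2); a sketch against <sys/audioio.h> (see audio(4) for the full set of operations):

Code: Select all

/* Native audio(4) playback setup on OpenBSD (sketch). */
#include <sys/types.h>
#include <sys/ioctl.h>
#include <sys/audioio.h>
#include <fcntl.h>
#include <unistd.h>

int main(void)
{
    int fd = open("/dev/audio", O_WRONLY);
    if (fd < 0)
        return 1;

    struct audio_info info;
    AUDIO_INITINFO(&info);          /* mark every field "unchanged" */
    info.play.sample_rate = 44100;
    info.play.channels    = 2;
    info.play.precision   = 16;
    info.play.encoding    = AUDIO_ENCODING_SLINEAR_LE;
    ioctl(fd, AUDIO_SETINFO, &info);

    /* write() raw PCM here, as with any character device */
    close(fd);
    return 0;
}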
OpenBSD also provides an OSS-compatible layer, so userland applications tend to use that.
Twitter: @canadianbryan. Award by smcerm, I stole it. Original was larger.
bewing
Member
Posts: 1401
Joined: Wed Feb 07, 2007 1:45 pm
Location: Eugene, OR, US

Re: Audio Device Design

Post by bewing »

At the very lowest driver level, that is the way a sound card works. You preset the DSP speed, and then you send an enormous amount of data to it (usually through DMA). The data is in PCM format (think of Windoze .WAV files) -- with a 16-bit value for each channel's voltage at each time interval. It might be a good idea to always set the DSP speed to 44.1 kHz, permanently.

And as Brendan and Brynet say, an audio mixing driver is a wonderful thing. You have to be able to build your drivers so that they can send their output to other drivers, rather than always directly to hardware. All applications send their audio output to the mixer driver, not to the low-level driver. The mixer driver is asynchronous and accepts many streams of audio data. It adjusts all the data streams to the same datarate, adds them together, and sends it all on to the synchronous lowest-level audio driver, which spools the data directly to the device.
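The "adds them together" step is the heart of it; a minimal sketch, assuming every stream has already been converted to the same rate and 16-bit format:

Code: Select all

/* Sum pre-converted streams into one output buffer, saturating
 * at the 16-bit limits instead of wrapping around. */
#include <stdint.h>
#include <stddef.h>

void mix_streams(int16_t *out, size_t frames,
                 int16_t **streams, size_t n_streams)
{
    for (size_t i = 0; i < frames; i++) {
        int32_t acc = 0;                 /* wider type to catch overflow */
        for (size_t s = 0; s < n_streams; s++)
            acc += streams[s][i];
        if (acc >  32767) acc =  32767;  /* clip */
        if (acc < -32768) acc = -32768;
        out[i] = (int16_t)acc;
    }
}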

You could also try (as Brendan seems to be suggesting) to combine both of these aspects into just one big driver -- but I'd say that keeping them separate adds a bit of flexibility. You will need to be creating "stacks" of drivers anyway, for other devices (USB, TCP/IP).
Brendan
Member
Posts: 8561
Joined: Sat Jan 15, 2005 12:00 am
Location: At his keyboard!

Re: Audio Device Design

Post by Brendan »

Hi,
bewing wrote:You could also try (as Brendan seems to be suggesting) to combine both of these aspects into just one big driver -- but I'd say that keeping them separate adds a bit of flexibility. You will need to be creating "stacks" of drivers anyway, for other devices (USB, TCP/IP).
Keeping the mixing and the low level device driver/s separate would be more flexible, if only to support unusual situations (e.g. one stereo sound card for front_left and front_right speakers, and another stereo sound card for back_left and back_right speakers).

However, applications can't do more than the interfaces they use allow, so interfaces should be designed to handle the most complex situations you can imagine. This basically means that the audio interface should be designed for high quality 3D games - if it can handle everything a 3D game wants, then it can handle anything any application will throw at it.

So, imagine you're writing a 3D game, and the player is facing north standing next to a railway track. The railway track runs from north-east to south-west. From overhead, it looks like this (where FL, FR, BL and BR are speakers):

Code: Select all

FL        FR
       /
      /
     /
    /^
   /
BL        BR
Your game knows that in 2 seconds time a train will pass by.

Your game knows that it'll need to play "train_engine1.wav" (in a loop) in 2 seconds time, and that the sound needs to fade in on the front_right speaker, then shift from the front_right speaker to the front_left speaker to the back_left speaker, then fade out on the back_left speaker. Your game also knows that it'll need to play "train_wheel_click_clack.wav" many times, from various points along the train as it passes by (with correct sound positioning used each time the sound is played).

In addition, the sound files use 22 kHz sampling with 8-bit data, but the sound card/s want 44 kHz with 16-bit data; and it needs to give the best possible results regardless of how many speakers there are, where the speakers are in relation to the player/user, and the characteristics of each speaker (e.g. frequency response).

Also, you don't want to load the sound data from disk every time the sounds are played - the data should be cached somewhere, and (for performance reasons) you don't want to convert the sound data into a different format each time it's used (it's better to cache pre-converted data so that it's already in the correct format when it's needed).
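That pre-conversion only has to happen once per file. For the case above (8-bit 22 kHz in, 16-bit 44 kHz out) it could be as simple as this sketch (mono, naive linear interpolation to double the rate):

Code: Select all

/* Convert 8-bit unsigned samples to 16-bit signed at twice the
 * rate; `out` must hold 2*n samples. Done once at load time,
 * then cached. */
#include <stdint.h>
#include <stddef.h>

void convert_8u22_to_16s44(const uint8_t *in, size_t n, int16_t *out)
{
    for (size_t i = 0; i < n; i++) {
        int16_t cur  = (int16_t)((in[i] - 128) << 8);  /* widen 8 -> 16 bit */
        int16_t next = (i + 1 < n)
                     ? (int16_t)((in[i + 1] - 128) << 8)
                     : cur;
        out[2 * i]     = cur;                          /* original sample   */
        out[2 * i + 1] = (int16_t)((cur + next) / 2);  /* midpoint          */
    }
}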

Now, how much of this should the game itself have to do, how much should the game offload onto the mixer, and how much should the mixer offload onto the driver/s?

Myself, I'd probably want a mixer interface (that the game uses) that has something like the following:
  • soundRef = loadSoundFile(fileName);
  • soundRef = loadSoundData(data, format);
  • soundInstance = playSoundOnce(soundRef, starting_time, 3D_starting_position, 3D_displacement_vector, volume);
  • soundInstance = playSoundLoop(soundRef, starting_time, 3D_starting_position, 3D_displacement_vector, volume);
  • stopSound(soundInstance, ending_time);
  • moveSound(soundInstance, change_time, 3D_displacement_vector);
  • changeVolume(soundInstance, change_time, new_volume);
That way the game doesn't need to touch the sound data itself, and can tell the mixer where sounds start from and the direction and speed the sounds are moving. The "3D_displacement_vector" means that the game doesn't need to constantly update the sound's position, and the game only needs to tell the mixer if the sound changes direction.
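For the train example, the game's side of the conversation would boil down to something like this (a sketch using the calls listed above; now(), the vector type and all the numbers are made up):

Code: Select all

/* Play the looping engine sound in 2 seconds, starting north-east
 * of the player and heading south-west. */
vec3 start = {  20.0f, 20.0f, 0.0f };   /* NE of the player, metres */
vec3 vel   = { -10.0f, -10.0f, 0.0f };  /* toward SW, metres/second */

soundRef engine = loadSoundFile("train_engine1.wav");
soundInstance train = playSoundLoop(engine, now() + 2.0, start, vel, 0.8f);

/* ...no per-frame updates needed; the mixer extrapolates the
 * position from the displacement vector until the train is gone... */
stopSound(train, now() + 30.0);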

For the sound drivers, the interface would end up something like:
  • channels = getChannels();
  • channelDetails = getChannelDetails(channel);
  • setOutputData(channel, change_time, soundData);
In this case, there's one channel per speaker, and the sound driver maintains a buffer containing "N seconds" of data (to be played in the future) for each channel. "setOutputData()" is used by the mixer to replace/overwrite the sound data starting at a specific time. The "channelDetails" would be a structure of information about the channel (the speaker's position relative to the user, dynamic range, frequency response, etc.).
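A possible shape for that "channelDetails" structure (the field names are illustrative only, not a real API):

Code: Select all

typedef struct {
    float    pos_x, pos_y, pos_z;  /* speaker position relative to user */
    float    min_freq_hz;          /* frequency response, low end       */
    float    max_freq_hz;          /* frequency response, high end      */
    float    dynamic_range_db;
    unsigned native_rate;          /* sample rate the card is set to    */
} channelDetails;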

Lastly, I guess you could slap a "cat sound.dat > /dev/sound" interface on top of this for legacy purposes... :roll:


Cheers,

Brendan
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.
jal
Member
Posts: 1385
Joined: Wed Oct 31, 2007 9:09 am

Re: Audio Device Design

Post by jal »

Brendan wrote:Your game knows that it'll need to play "train_engine1.wav" (in a loop) in 2 seconds time, and that the sound needs to fade in on the front_right speaker, then shift from the front_right speaker to the front_left speaker to the back_left speaker, then fade out on the back_left speaker. Your game also knows that it'll need to play "train_wheel_click_clack.wav" many times, from various points along the train as it passes by (with correct sound positioning used each time the sound is played).
You forget Doppler shifting!
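For a source moving straight toward (or away from) the listener, the shift is simple enough to compute (a sketch; c is the speed of sound in air):

Code: Select all

/* Classic Doppler formula: f_observed = f * c / (c - v), with v
 * positive when the source moves toward the listener. */
float doppler_shift(float f_source, float v_toward_listener)
{
    const float c = 343.0f;   /* speed of sound in air, m/s */
    return f_source * c / (c - v_toward_listener);
}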


JAL
jal
Member
Posts: 1385
Joined: Wed Oct 31, 2007 9:09 am

Re: Audio Device Design

Post by jal »

bewing wrote:It might be a good idea to always set the DSP speed to 44.1 kHz, permanently.
Why? I'd set it to the highest possible value; 96 kHz seems common nowadays.


JAL
bewing
Member
Posts: 1401
Joined: Wed Feb 07, 2007 1:45 pm
Location: Eugene, OR, US

Re: Audio Device Design

Post by bewing »

44.1 kHz per channel was chosen for CD audio because anything beyond that is already beyond the limits of human hearing -- especially if you use a cubic spline to interpolate points between samples (inside the audio device's firmware) to reduce high-frequency (harmonic) noise. So setting it to a higher number is just wasting bandwidth / CPU power.
Also, speaker systems generally have no ability to respond at frequencies higher than about 30 kHz.
Most audio programmers / applications understand these facts, and you will find that 44.1 kHz is by far the most common datarate for audio output from applications -- so for most apps, your driver will not have to do any datarate conversion if you set its speed to 44.1 kHz.
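For reference, the cubic interpolation mentioned above can be done with a Catmull-Rom spline; a sketch, interpolating between samples p1 and p2 (p0 and p3 are their neighbours):

Code: Select all

/* Catmull-Rom cubic interpolation; t runs from 0.0 (at p1)
 * to 1.0 (at p2). */
float cubic_interp(float p0, float p1, float p2, float p3, float t)
{
    return p1 + 0.5f * t * ((p2 - p0)
             + t * ((2.0f * p0 - 5.0f * p1 + 4.0f * p2 - p3)
             + t * (3.0f * (p1 - p2) + p3 - p0)));
}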
JackScott
Member
Posts: 1031
Joined: Thu Dec 21, 2006 3:03 am
Location: Hobart, Australia

Re: Audio Device Design

Post by JackScott »

44.1K samples/sec is effectively only 22.05 kHz (as a theoretical maximum), because each cycle needs at least two samples: one for the positive half and one for the negative half. So a speaker with 30 kHz response could deal with sample rates up to 60K samples/sec.

Also, DVDs have a sample rate of 48K samples/sec, and higher sample rates become more common as we have more CPU cycles to spare. My sound card can do 192K samples/sec, and I have it set that way for at least part of each day, since I do audio editing.