Audio Device Design

Jeko
Member
Posts: 500
Joined: Fri Mar 17, 2006 12:00 am
Location: Napoli, Italy

Audio Device Design

Post by Jeko »

After many hours of reading other OSes' code, I've managed to design my own audio device driver infrastructure. (The video one too, but I'll talk about that in another post.)
From what I've seen, audio devices are just simple block devices: if you want to play something you write to the device, and if you want to change the DSP speed, mixer values, etc. you must use IOCTLs.
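For example, the userland code I've seen looks roughly like this (a sketch only, assuming an OSS-style /dev/dsp and <sys/soundcard.h>):

Code: Select all

/* Classic "write + IOCTL" playback, OSS style (sketch). */
#include <sys/soundcard.h>
#include <sys/ioctl.h>
#include <fcntl.h>
#include <unistd.h>

int main(void)
{
    int fd = open("/dev/dsp", O_WRONLY);
    if (fd < 0)
        return 1;

    int fmt = AFMT_S16_LE, channels = 2, rate = 44100;
    ioctl(fd, SNDCTL_DSP_SETFMT, &fmt);        /* sample format */
    ioctl(fd, SNDCTL_DSP_CHANNELS, &channels); /* stereo        */
    ioctl(fd, SNDCTL_DSP_SPEED, &rate);        /* "DSP speed"   */

    short buf[4096] = { 0 };   /* silence; real PCM data goes here */
    write(fd, buf, sizeof(buf));

    close(fd);
    return 0;
}
At least this is what I've understood. Is it right?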
Rewriting virtual memory manager - Working on ELF support - Working on Device Drivers Handling

http://sourceforge.net/projects/jeko - Jeko Operating System
Brendan
Member
Posts: 8561
Joined: Sat Jan 15, 2005 12:00 am
Location: At his keyboard!

Re: Audio Device Design

Post by Brendan »

Hi,
Jeko wrote:From what I've seen, audio devices are just simple block devices: if you want to play something you write to the device, and if you want to change the DSP speed, mixer values, etc. you must use IOCTLs.
At least this is what I've understood. Is it right?
That depends on what the OS provides.

I'd probably prefer a "play sound <fileName> at <time> from <position> with <volume> using <effects>" style of interface; where the sound driver loads the file from disk (if it's not already cached), converts it into whatever format the sound card wants, applies any special effects, adjusts the volume for each speaker, then mixes it with any data already in the output buffer so that it's heard at exactly the right time.
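As a very rough sketch, the entry point for that kind of interface might look like this (every name and type here is hypothetical, just to make the idea concrete):

Code: Select all

#include <stddef.h>
#include <stdint.h>

typedef struct { float x, y, z; } vec3_t;
typedef int sound_handle_t;          /* opaque instance handle    */
typedef struct effect effect_t;      /* opaque effect description */

/* Hypothetical "fire and forget" call; the sound driver loads and
 * caches fileName, converts it, applies effects, sets per-speaker
 * volume, and mixes it so it is heard at exactly `when`. */
sound_handle_t playSound(const char *fileName,
                         uint64_t when,       /* absolute play time */
                         vec3_t from,         /* 3D source position */
                         float volume,        /* 0.0 .. 1.0         */
                         const effect_t *effects,
                         size_t effectCount);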

The old "everything is a file (except when it's an IOCTL)" theory isn't really that useful IMHO (especially for a sound driver where many different applications should be able to write to the same "file" at the same time)...


Cheers,

Brendan
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.
Brynet-Inc
Member
Posts: 2426
Joined: Tue Oct 17, 2006 9:29 pm
Libera.chat IRC: brynet
Location: Canada

Re: Audio Device Design

Post by Brynet-Inc »

Brendan wrote:The old "everything is a file (except when it's an IOCTL)" theory isn't really that useful IMHO (especially for a sound driver where many different applications should be able to write to the same "file" at the same time)...
That's why we have sound servers/audio mixers like ESD, JACK, aRts, etc. ;)

I think it's very useful, but I guess this is where preferences kick in... I, for instance, think it's a brilliant design (everything is a file, and some may *also* have IOCTLs).

The fact that *you* don't think it's a good design doesn't mean it's a bad one... ;)

@Jeko, OpenBSD's audio(4)/mixer(4) API layer is device-independent, and the devices it exposes are several *character* devices, not block devices.

To quote the man page:
audio(4) wrote:There are four device files available for audio operation: /dev/audio, /dev/sound, /dev/audioctl, and /dev/mixer. /dev/audio and /dev/sound are used for recording or playback of digital samples. /dev/mixer is used to manipulate volume, recording source, or other audio mixer functions. /dev/audioctl accepts the same ioctl(2) operations as /dev/sound, but no other operations. In contrast to /dev/sound, which has the exclusive open property, /dev/audioctl can be opened at any time and can be used to manipulate the audio device while it is in use.
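In practice, setting playback parameters on those devices also goes through ioctl(2); a sketch against <sys/audioio.h> (see audio(4) for the full set of operations):

Code: Select all

/* Native audio(4) playback setup on OpenBSD (sketch). */
#include <sys/types.h>
#include <sys/ioctl.h>
#include <sys/audioio.h>
#include <fcntl.h>
#include <unistd.h>

int main(void)
{
    int fd = open("/dev/audio", O_WRONLY);
    if (fd < 0)
        return 1;

    struct audio_info info;
    AUDIO_INITINFO(&info);          /* mark every field "unchanged" */
    info.play.sample_rate = 44100;
    info.play.channels    = 2;
    info.play.precision   = 16;
    info.play.encoding    = AUDIO_ENCODING_SLINEAR_LE;
    ioctl(fd, AUDIO_SETINFO, &info);

    /* write() raw PCM here, as with any character device */
    close(fd);
    return 0;
}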
OpenBSD also provides an OSS-compatible layer, so userland applications tend to use that.
Twitter: @canadianbryan. Award by smcerm, I stole it. Original was larger.
bewing
Member
Posts: 1401
Joined: Wed Feb 07, 2007 1:45 pm
Location: Eugene, OR, US

Re: Audio Device Design

Post by bewing »

At the very lowest driver level, that is the way a sound card works. You preset the DSP speed, and then you send an enormous amount of data to it (usually through DMA). The data is in PCM format (think of Windoze .WAV files) -- with a 16-bit value for each channel's voltage at each time interval. It might be a good idea to always set the DSP speed to 44.1 kHz, permanently.

And as Brendan and Brynet say, an audio mixing driver is a wonderful thing. You have to be able to build your drivers so that they can send their output to other drivers, rather than always directly to hardware. All applications send their audio output to the mixer driver, not to the low-level driver. The mixer driver is asynchronous and accepts many streams of audio data. It adjusts all the data streams to the same datarate, adds them together, and sends it all on to the synchronous lowest-level audio driver, which spools the data directly to the device.
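The "adds them together" step is the heart of it; a minimal sketch, assuming every stream has already been converted to the same rate and 16-bit format:

Code: Select all

/* Sum pre-converted streams into one output buffer, saturating
 * at the 16-bit limits instead of wrapping around. */
#include <stdint.h>
#include <stddef.h>

void mix_streams(int16_t *out, size_t frames,
                 int16_t **streams, size_t n_streams)
{
    for (size_t i = 0; i < frames; i++) {
        int32_t acc = 0;                 /* wider type to catch overflow */
        for (size_t s = 0; s < n_streams; s++)
            acc += streams[s][i];
        if (acc >  32767) acc =  32767;  /* clip */
        if (acc < -32768) acc = -32768;
        out[i] = (int16_t)acc;
    }
}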

You could also try (as Brendan seems to be suggesting) to combine both of these aspects into just one big driver -- but I'd say that keeping them separate adds a bit of flexibility. You will need to be creating "stacks" of drivers anyway, for other devices (USB, TCP/IP).
Brendan
Member
Posts: 8561
Joined: Sat Jan 15, 2005 12:00 am
Location: At his keyboard!

Re: Audio Device Design

Post by Brendan »

Hi,
bewing wrote:You could also try (as Brendan seems to be suggesting) to combine both of these aspects into just one big driver -- but I'd say that keeping them separate adds a bit of flexibility. You will need to be creating "stacks" of drivers anyway, for other devices (USB, TCP/IP).
Keeping the mixing and the low level device driver/s separate would be more flexible, if only to support unusual situations (e.g. one stereo sound card for front_left and front_right speakers, and another stereo sound card for back_left and back_right speakers).

However, applications can't do more than the interfaces they use allow, so interfaces should be designed to handle the most complex situations you can imagine. This basically means that the audio interface should be designed for high quality 3D games - if it can handle everything a 3D game wants, then it can handle anything any application will throw at it.

So, imagine you're writing a 3D game, and the player is facing north standing next to a railway track. The railway track runs from north-east to south-west. From overhead, it looks like this (where FL, FR, BL and BR are speakers):

Code: Select all

FL        FR
       /
      /
     /
    /^
   /
BL        BR
Your game knows that in 2 seconds time a train will pass by.

Your game knows that it'll need to play "train_engine1.wav" (in a loop) in 2 seconds time, and that the sound needs to fade in on the front_right speaker, then shift from the front_right speaker to the front_left speaker to the back_left speaker, then fade out on the back_left speaker. Your game also knows that it'll need to play "train_wheel_click_clack.wav" many times, from various points along the train as it passes by (with correct sound positioning used each time the sound is played).

In addition, the sound files use 22 kHz sampling with 8-bit data, but the sound card/s want 44 kHz with 16-bit data; and it needs to give the best possible results regardless of how many speakers there are, where the speakers are in relation to the player/user, and the characteristics of each speaker (e.g. frequency response).

Also, you don't want to load the sound data from disk every time the sounds are played - the data should be cached somewhere, and (for performance reasons) you don't want to convert the sound data into a different format each time it's used (it's better to cache pre-converted data so that it's already in the correct format when it's needed).
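That pre-conversion only has to happen once per file. For the case above (8-bit 22 kHz in, 16-bit 44 kHz out) it could be as simple as this sketch (mono, naive linear interpolation to double the rate):

Code: Select all

/* Convert 8-bit unsigned samples to 16-bit signed at twice the
 * rate; `out` must hold 2*n samples. Done once at load time,
 * then cached. */
#include <stdint.h>
#include <stddef.h>

void convert_8u22_to_16s44(const uint8_t *in, size_t n, int16_t *out)
{
    for (size_t i = 0; i < n; i++) {
        int16_t cur  = (int16_t)((in[i] - 128) << 8);  /* widen 8 -> 16 bit */
        int16_t next = (i + 1 < n)
                     ? (int16_t)((in[i + 1] - 128) << 8)
                     : cur;
        out[2 * i]     = cur;                          /* original sample   */
        out[2 * i + 1] = (int16_t)((cur + next) / 2);  /* midpoint          */
    }
}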

Now, how much of this should the game itself have to do, how much should the game offload onto the mixer, and how much should the mixer offload onto the driver/s?

Myself, I'd probably want a mixer interface (that the game uses) that has something like the following:
  • soundRef = loadSoundFile(fileName);
  • soundRef = loadSoundData(data, format);
  • soundInstance = playSoundOnce(soundRef, starting_time, 3D_starting_position, 3D_displacement_vector, volume);
  • soundInstance = playSoundLoop(soundRef, starting_time, 3D_starting_position, 3D_displacement_vector, volume);
  • stopSound(soundInstance, ending_time);
  • moveSound(soundInstance, change_time, 3D_displacement_vector);
  • changeVolume(soundInstance, change_time, new_volume);
That way the game doesn't need to touch the sound data itself, and can tell the mixer where sounds start from and the direction and speed the sounds are moving. The "3D_displacement_vector" means that the game doesn't need to constantly update the sound's position, and the game only needs to tell the mixer if the sound changes direction.
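For the train example, the game's side of the conversation would boil down to something like this (a sketch using the calls listed above; now(), the vector type and all the numbers are made up):

Code: Select all

/* Play the looping engine sound in 2 seconds, starting north-east
 * of the player and heading south-west. */
vec3 start = {  20.0f, 20.0f, 0.0f };   /* NE of the player, metres */
vec3 vel   = { -10.0f, -10.0f, 0.0f };  /* toward SW, metres/second */

soundRef engine = loadSoundFile("train_engine1.wav");
soundInstance train = playSoundLoop(engine, now() + 2.0, start, vel, 0.8f);

/* ...no per-frame updates needed; the mixer extrapolates the
 * position from the displacement vector until the train is gone... */
stopSound(train, now() + 30.0);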

For the sound drivers, the interface would end up something like:
  • channels = getChannels();
  • channelDetails = getChannelDetails(channel);
  • setOutputData(channel, change_time, soundData);
In this case, there's one channel per speaker, and the sound driver maintains a buffer containing "N seconds" of data (to be played in the future) for each channel. "setOutputData()" is used by the mixer to replace/overwrite the sound data starting at a specific time. The "channelDetails" would be a structure of information about the channel (the speaker's position relative to the user, dynamic range, frequency response, etc.).
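A possible shape for that "channelDetails" structure (the field names are illustrative only, not a real API):

Code: Select all

typedef struct {
    float    pos_x, pos_y, pos_z;  /* speaker position relative to user */
    float    min_freq_hz;          /* frequency response, low end       */
    float    max_freq_hz;          /* frequency response, high end      */
    float    dynamic_range_db;
    unsigned native_rate;          /* sample rate the card is set to    */
} channelDetails;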

Lastly, I guess you could slap a "cat sound.dat > /dev/sound" interface on top of this for legacy purposes... :roll:


Cheers,

Brendan
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.
jal
Member
Posts: 1385
Joined: Wed Oct 31, 2007 9:09 am

Re: Audio Device Design

Post by jal »

Brendan wrote:Your game knows that it'll need to play "train_engine1.wav" (in a loop) in 2 seconds time, and that the sound needs to fade in on the front_right speaker, then shift from the front_right speaker to the front_left speaker to the back_left speaker, then fade out on the back_left speaker. Your game also knows that it'll need to play "train_wheel_click_clack.wav" many times, from various points along the train as it passes by (with correct sound positioning used each time the sound is played).
You forget Doppler shifting!
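For a source moving straight toward (or away from) the listener, the shift is simple enough to compute (a sketch; c is the speed of sound in air):

Code: Select all

/* Classic Doppler formula: f_observed = f * c / (c - v), with v
 * positive when the source moves toward the listener. */
float doppler_shift(float f_source, float v_toward_listener)
{
    const float c = 343.0f;   /* speed of sound in air, m/s */
    return f_source * c / (c - v_toward_listener);
}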


JAL
jal
Member
Posts: 1385
Joined: Wed Oct 31, 2007 9:09 am

Re: Audio Device Design

Post by jal »

bewing wrote:It might be a good idea to always set the DSP speed to 44.1 kHz, permanently.
Why? I'd set it to the highest possible value; 96 kHz seems common nowadays.


JAL
bewing
Member
Posts: 1401
Joined: Wed Feb 07, 2007 1:45 pm
Location: Eugene, OR, US

Re: Audio Device Design

Post by bewing »

44.1 kHz per channel was chosen for CD audio because anything beyond that is already beyond the limits of human hearing -- especially if you use a cubic spline to interpolate points between samples (inside the audio device's firmware) to reduce high-frequency (harmonic) noise. So setting it to a higher number is just wasting bandwidth / CPU power.
Also, speaker systems generally have no ability to respond at frequencies higher than about 30 kHz.
Most audio programmers / applications understand these facts, and you will find that 44.1 kHz is by far the most common datarate for audio output from applications -- so for most apps, your driver will not have to do any datarate conversion if you set its speed to 44.1 kHz.
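For reference, the cubic interpolation mentioned above can be done with a Catmull-Rom spline; a sketch, interpolating between samples p1 and p2 (p0 and p3 are their neighbours):

Code: Select all

/* Catmull-Rom cubic interpolation; t runs from 0.0 (at p1)
 * to 1.0 (at p2). */
float cubic_interp(float p0, float p1, float p2, float p3, float t)
{
    return p1 + 0.5f * t * ((p2 - p0)
             + t * ((2.0f * p0 - 5.0f * p1 + 4.0f * p2 - p3)
             + t * (3.0f * (p1 - p2) + p3 - p0)));
}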
JackScott
Member
Posts: 1031
Joined: Thu Dec 21, 2006 3:03 am
Location: Hobart, Australia

Re: Audio Device Design

Post by JackScott »

44.1K samples/sec is effectively only 22.05 kHz (as a theoretical maximum), because each cycle needs at least two samples: one for the positive half and one for the negative half. So a speaker with 30 kHz response could deal with sample rates up to 60K samples/sec.

Also, DVDs have a sample rate of 48K samples/sec, and higher sample rates become more common as we have more CPU cycles to spare. My sound card can do 192K samples/sec, and I have it set that way for at least part of each day, since I do audio editing.