Making audio complicated
Introduction
This chapter could equally well be called "Advanced audio programming".
However, I'm trying to avoid advertising the features described here too
much. They are very useful, or even necessary, when used in the right
place, but they don't automatically make your application better if used
in situations where they are not needed. Some of the features presented
below don't work with all devices (full duplex audio and direct DMA
access among others) or make your application very operating system
dependent (direct DMA access).
It is assumed that you have a thorough understanding of the features described in the introduction and basic audio sections of this guide. The features described here will work only if the guidelines given in the basic sections have been followed carefully.
Audio Internals
An application program doesn't (normally) access the audio hardware
directly. All data being recorded or played back is stored in a kernel
DMA buffer while the device is accessing it. The application uses normal
read and write calls to transfer data between the kernel buffer and
a buffer in the application's data segment.
The audio driver uses an improved version of the so-called double buffering method. In the basic double buffering method there are two buffers. One of them is being accessed by the device while the other is being read or written by the application. When the device has processed the first buffer, it moves to the other one. This process is repeated as long as the device is in use. This method gives the application time to do some processing while the device is running, which makes it possible to record and play back without pauses.

The amount of time the application can spend on processing one buffer half depends on the buffer size and the data rate. For example, when a program is recording audio using 8 kHz/8-bit/mono sampling, the data rate is 8 kilobytes/second. If there is 2*4 kbytes of buffer, the application has more than 0.5 seconds of time to store the data to disk and to come back to read from the device. If it spends more than 0.5 seconds, the buffer overruns and the driver has to discard some data. 0.5 seconds is plenty of time to store 4 kbytes of data to disk. However, things become more complicated when the data rate is increased. For example, with audio CD quality the data rate is 172 kilobytes/second and the available time is just 23 milliseconds. This is about the same as the worst case seek time of normal disk drives, which means that recording is likely to fail. Better results can be achieved by using larger buffers, but that increases the latencies related to the buffering.

The method used by the audio driver of OSS can be called multi-buffering. In this method the available buffer space is divided into several equally sized blocks called fragments. In this way it is possible to increase the available buffer size without increasing the latencies related to the buffering. By default the driver computes the fragment size so that latencies are about 0.5 seconds (for output) or about 0.1 seconds (for input) at the current data rate. There is an ioctl call for adjusting the fragment size in case the application wants to use a different size.

[TODO: Insert an illustration here]

Normal operation when writing to the device

When the program calls write for the first time after opening the device, the driver performs the following steps:

- programs the hardware to use the sampling parameters (speed, channels and bits) the program has selected;
- computes the fragment size and allocates the kernel buffer, unless this has already been done;
- copies the data to the kernel buffer and starts the device.
When the application calls write a second time, the data is simply stored in the playback buffer and the internal pointers of the driver are updated accordingly. If the application has attempted to write more data than there is currently free space in the buffer, it will be forced to wait until one fragment gets completely played by the device. This is the normal situation with programs that work properly. They usually write data at least slightly faster than the device plays it. Sooner or later they get the buffer completely filled and the driver forces them to work at the same speed as the device.

A playback underrun situation occurs when the application fails to write more data before the device gets the earlier data completely played. This kind of underrun occurs if:

- the application spends too much time (for example on disk I/O or computation) between writes;
- other processes consume so much CPU time that the application doesn't get scheduled soon enough;
- the application simply produces data more slowly than the device consumes it.
Normal operation when reading from the device

When the program calls read for the first time after opening the device, the driver performs the following steps:

- programs the hardware to use the selected sampling parameters and allocates the kernel buffer, unless this has already been done;
- starts the recording process;
- puts the calling process to sleep until the first fragment of recorded data is complete, then copies the data to the application's buffer.
A recording overrun situation occurs if the device fills the recording buffer completely. If this happens, the device is stopped and further samples being recorded will be discarded. The causes of recording overruns are very similar to the causes of playback underruns. A very common situation where a recording overrun may occur is recording high speed audio directly to disk. In Linux this doesn't work except with a very fast disk (in other environments this should not be a problem).
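To make the arithmetic above concrete, the following small program (an illustration only, not part of the OSS API) computes the byte rate and the time available per fragment for a given set of sampling parameters:

#include <stdio.h>

int main(void)
{
    int speed = 44100;    /* samples per second (per channel) */
    int channels = 2;     /* stereo */
    int bits = 16;        /* bits per sample */
    int frag_size = 4096; /* fragment size in bytes */

    int byte_rate = speed * channels * (bits / 8);

    /* Time the application has before one fragment is consumed. */
    double frag_ms = 1000.0 * frag_size / byte_rate;

    printf("byte rate: %d bytes/s\n", byte_rate);    /* 176400      */
    printf("time per fragment: %.1f ms\n", frag_ms); /* about 23 ms */
    return 0;
}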
Buffering - Improving real time performance
Normally programs don't need to care about the buffering parameters of
audio devices. However, most of the features presented in this document
have been designed to work with full fragments. For this reason your program
may work better if it reads and writes data one buffer fragment at a time
(note that this is not normally required).
Determining buffering parameters

The driver computes the optimum fragment size automatically depending on the sampling parameters (speed, bits and number of channels) and the amount of available memory. The application may query the fragment size using the following ioctl call:

int frag_size;
if (ioctl(audio_fd, SNDCTL_DSP_GETBLKSIZE, &frag_size) == -1)
    error();

The fragment size in bytes is returned in frag_size. The application may use this value as the size when allocating (malloc) a buffer for audio data and as the count when reading from or writing to the device.

NOTE! This ioctl call also computes the fragment size in case this has not already been done. For this reason you should call it only after setting the sampling parameters or setting the fragment size explicitly.

NOTE2! Some (old) audio applications written for Linux check that the returned fragment size is between arbitrary limits (this was necessary with version 0.1 of the driver). New applications should not make this kind of test.

The above call returns the "static" fragment size. There are two additional calls which return information about the live situation:

audio_buf_info info;
ioctl(audio_fd, SNDCTL_DSP_GETISPACE, &info);
ioctl(audio_fd, SNDCTL_DSP_GETOSPACE, &info);

The above calls return information about input and output buffering, respectively. The audio_buf_info record contains the following fields:
fragments  - Number of full fragments that can be read or written without blocking. Note that this field is reliable only when the application reads/writes full fragments at a time.
fragstotal - Total number of fragments allocated for buffering.
fragsize   - Size of a fragment in bytes. This is the same value as returned by ioctl(SNDCTL_DSP_GETBLKSIZE).
bytes      - Number of bytes that can be read or written immediately without blocking.

Selecting buffering parameters

In some cases it may be desirable to select the fragment size explicitly. For example, in real time applications (such as games) it is necessary to use relatively short fragments. Otherwise delays between events on the screen and their associated sound effects become too long. The OSS API contains an ioctl call for setting the fragment size and the maximum number of fragments:

int arg = 0xMMMMSSSS;
if (ioctl(audio_fd, SNDCTL_DSP_SETFRAGMENT, &arg))
    error();

The argument of this call is an integer encoded as 0xMMMMSSSS (in hex). The 16 least significant bits determine the fragment size: the size is 2^SSSS. For example SSSS=0008 gives a fragment size of 256 bytes (2^8). The minimum is 16 bytes (SSSS=4) and the maximum is total_buffer_size/2. Some devices or processor architectures may require larger fragments; in this case the requested fragment size is automatically increased. The 16 most significant bits (MMMM) determine the maximum number of fragments. By default the driver computes this based on the available buffer space. The minimum value is 2 and the maximum depends on the situation. Set MMMM=0x7fff if you don't want to limit the number of fragments.

NOTE! This ioctl call must be used as early as possible. The optimum location is immediately after opening the device. It is NOT possible to change the fragmenting parameters a second time without closing and reopening the device. Also note that calling read(), write() or the above three ioctl calls "locks" the buffering parameters, which may not be changed after that.

NOTE2! Setting the fragment size and/or the number of fragments too small may have unexpected results (at least on slow machines). UNIX is a multitasking environment where other processes may use CPU time unexpectedly. The application must ensure that the selected fragmenting parameters provide enough "slack" so that other concurrently running processes don't cause underruns. Each underrun causes a click or pause in the output signal. With relatively short fragments this may cause a whining sound which is very difficult to identify. Using fragment sizes shorter than 256 bytes is not recommended as the default mode of an application. Short fragments should only be used when explicitly requested by the user.
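As a concrete illustration of this encoding, the following sketch asks for 512 byte fragments (SSSS=9) and at most 4 fragments (MMMM=4); the requested values are arbitrary examples:

#include <stdio.h>
#include <sys/ioctl.h>
#include <sys/soundcard.h>

/* audio_fd is assumed to be a freshly opened audio device descriptor. */
void set_fragments(int audio_fd)
{
    int frag_shift = 9;          /* 2^9 = 512 byte fragments */
    int max_frags  = 4;          /* at most four fragments */
    int arg = (max_frags << 16) | frag_shift; /* 0x00040009 */

    /* Must be issued before any read/write or buffer-related ioctl. */
    if (ioctl(audio_fd, SNDCTL_DSP_SETFRAGMENT, &arg) == -1)
        perror("SNDCTL_DSP_SETFRAGMENT");
}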
Obtaining buffering information (pointers)

In some cases it is necessary for the application to know exactly how much data has been played or recorded. The OSS API provides two ioctl calls for these purposes. The information returned by these calls is not precise in all cases. Some sound devices use internal buffering which makes the returned pointer value very imprecise. In addition, some operating systems don't allow obtaining the value of the actual DMA pointer. Using these calls is likely to make an application non-portable between operating systems and incompatible with many popular devices (such as the "original" Gravis Ultrasound).

Applications should use ioctl(SNDCTL_DSP_GETCAPS) to check the device's capabilities before using these calls.

count_info info;
ioctl(audio_fd, SNDCTL_DSP_GETIPTR, &info);
ioctl(audio_fd, SNDCTL_DSP_GETOPTR, &info);

These calls return information about the recording and playback pointers, respectively. The count_info structure contains the following fields:
bytes  - Number of bytes processed since opening the device. This field divided by the number of bytes per sample can be used as a precise timer. However, underruns, overruns and some ioctl calls (SNDCTL_DSP_RESET, SNDCTL_DSP_POST and SNDCTL_DSP_SYNC) decrease the precision of the value. Also, some operating systems don't permit reading the value of the actual DMA pointer, in which case the value is truncated to the previous fragment boundary.
blocks - Number of fragment transitions (hardware interrupts) processed since the previous call to this ioctl (the value is reset to 0 after each call). This field is valid only when using direct access to the audio buffer.
ptr    - Byte offset of the current playback/recording position from the beginning of the audio buffer. This field has little value except when using direct access to the audio buffer.

Non-blocking reads and writes

All audio read and write calls are non-blocking as long as there is enough space/data in the buffer when the application makes the call. The application may use SNDCTL_DSP_GETOSPACE and SNDCTL_DSP_GETISPACE to check the device's status before making the call. The bytes field tells how many bytes can be read or written without blocking. It is highly recommended to read and write full fragments at a time when using select().
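For example, a write that never blocks could be sketched as follows (a minimal illustration; error handling is reduced to returning -1):

#include <unistd.h>
#include <sys/ioctl.h>
#include <sys/soundcard.h>

/* Writes at most one fragment, and only if it fits without blocking.
   Returns the number of bytes written, or 0 if writing would block. */
int write_nonblocking(int audio_fd, const char *data, int frag_size)
{
    audio_buf_info info;

    if (ioctl(audio_fd, SNDCTL_DSP_GETOSPACE, &info) == -1)
        return -1;

    if (info.bytes < frag_size)  /* a full fragment doesn't fit yet */
        return 0;

    return write(audio_fd, data, frag_size);
}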
Using select()
The OSS driver supports the standard select() system call. With audio
devices, select() returns 1 in the read or write descriptor bitmask
when it is possible to read or write at least one byte without blocking.
The application should use SNDCTL_DSP_GETOSPACE
and SNDCTL_DSP_GETISPACE to
check the actual situation. Reading and writing full fragments at a time
is recommended when select() is used.
Calling select() with the audio_fd bit set in the readfds parameter has an important side effect: the call starts recording immediately if recording is enabled and has not already started. (Due to a bug in OSS versions earlier than 3.6 this may not work with all cards.) Some operating systems (such as Solaris) don't support select(). In these cases the poll() system call can be used in place of select().
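Putting the pieces together, a playback loop driven by select() might look roughly like the sketch below. fill_next_fragment() is a hypothetical application routine that generates the next fragment of output data:

#include <unistd.h>
#include <sys/time.h>
#include <sys/types.h>
#include <sys/ioctl.h>
#include <sys/soundcard.h>

extern void fill_next_fragment(char *buf, int len); /* supplied by the app */

void play_loop(int audio_fd, int frag_size)
{
    char buf[4096]; /* assume frag_size <= 4096 */
    fd_set writefds;

    for (;;) {
        FD_ZERO(&writefds);
        FD_SET(audio_fd, &writefds);

        /* Sleep until at least one byte can be written. */
        if (select(audio_fd + 1, NULL, &writefds, NULL, NULL) == -1)
            break;

        if (FD_ISSET(audio_fd, &writefds)) {
            audio_buf_info info;

            /* Check how much really fits; write full fragments only. */
            if (ioctl(audio_fd, SNDCTL_DSP_GETOSPACE, &info) == -1)
                break;

            while (info.fragments > 0) {
                fill_next_fragment(buf, frag_size);
                if (write(audio_fd, buf, frag_size) != frag_size)
                    return;
                info.fragments--;
            }
        }
    }
}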
Checking device capabilities
There are some features in the OSS API that don't work with all devices
and/or operating systems. For this reason it is important to check that
a feature is available before trying to use it. The effect of using features
not supported by the current hardware/operating system combination is undefined.
It is possible to check the availability of certain features using the SNDCTL_DSP_GETCAPS ioctl as below:

int caps;
ioctl(audio_fd, SNDCTL_DSP_GETCAPS, &caps);

This call returns a bitmask defining the available features. The possible bits are:
DSP_CAP_REVISION - The least significant byte of the mask contains the revision level of this capability information (not an actual feature bit).
DSP_CAP_DUPLEX   - The device supports full duplex (simultaneous recording and playback).
DSP_CAP_REALTIME - The device can report the playback/recording pointer precisely (see SNDCTL_DSP_GETIPTR/SNDCTL_DSP_GETOPTR).
DSP_CAP_BATCH    - The device has internal buffering (a local FIFO or similar), which makes pointer and timing information less precise.
DSP_CAP_COPROC   - The device has a programmable coprocessor (such as a DSP chip).
DSP_CAP_TRIGGER  - The device supports the SNDCTL_DSP_SETTRIGGER feature.
DSP_CAP_MMAP     - The device supports direct access to its hardware level buffer.
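The returned bitmask can then be tested against these constants before an optional feature is used, for example:

#include <sys/ioctl.h>
#include <sys/soundcard.h>

/* Returns nonzero if the device behind audio_fd supports the trigger
   feature. A similar test works for DSP_CAP_DUPLEX, DSP_CAP_MMAP etc. */
int supports_trigger(int audio_fd)
{
    int caps;

    if (ioctl(audio_fd, SNDCTL_DSP_GETCAPS, &caps) == -1)
        return 0;

    return (caps & DSP_CAP_TRIGGER) != 0;
}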
Synchronization issues
In some applications it is necessary to synchronize audio playback/recording
with screen updates, MIDI playback or some other "external" events.
This section describes some ways to implement this kind
of feature. When using the features described in this section it is very
important to access the device by writing and reading full fragments at
a time. Using partial fragments is possible but may introduce problems
which are very difficult to handle.
There are several different reasons for using synchronization. They are covered in the subsections below.

Avoiding blocking in audio operations

The recommended method for implementing non-blocking reads or writes is to use select(). Further instructions for using this method have been given above.

Synchronizing external events with audio

When audio recording or playback needs to work in sync with screen updates, it is easier to let audio play at its own speed and to synchronize the screen updates with it. To do this, you can use the SNDCTL_DSP_GETxPTR calls to obtain the number of bytes that have been processed since opening the device. Then divide the bytes field returned by the call by the number of bytes per sample (for example 4 in 16-bit stereo mode). To get the number of milliseconds since start, multiply the sample count by 1000 and divide by the sampling rate. In this way you can use normal UNIX alarm timers or select() to control the interval between screen updates while still being able to obtain the exact "audio" time, as the sketch below shows. Note that any kind of performance problem (playback underruns and recording overruns) disturbs the audio timing and decreases its precision.
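A sketch of this computation, assuming 16-bit stereo data (4 bytes per sample pair):

#include <sys/ioctl.h>
#include <sys/soundcard.h>

/* Returns milliseconds of audio played since the device was opened,
   assuming 16-bit stereo at 'speed' Hz, or -1 on error. */
long audio_time_ms(int audio_fd, int speed)
{
    count_info info;

    if (ioctl(audio_fd, SNDCTL_DSP_GETOPTR, &info) == -1)
        return -1;

    long samples = info.bytes / 4;             /* bytes -> sample pairs */
    return (long)(samples * 1000.0 / speed);   /* pairs -> milliseconds */
}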
Synchronizing audio with external events

In games and some other real time applications there is a need to keep sound effects playing at the same time as the game events. For example, the sound of an explosion should be played exactly at the same time as (or slightly after) the flash on the screen. The recommended method in this case is to decrease the fragment size and the maximum number of fragments used with the device. In most cases these applications work best with just 2 or 3 fragments. A suitable fragment size can be determined by dividing the byte rate of audio playback by the number of frames/second displayed by the game. It is recommended to avoid too tight timing, since otherwise random performance problems may degrade the audio output seriously. Another way to synchronize audio playback with other events is to use direct access to the audio device buffer. However, this way is not recommended since it is not possible with all devices and operating systems. When using the methods described above, there may be a need to start playback and/or recording precisely at the right time (this should be a fairly rare requirement). This is possible using the trigger feature described below.

Synchronizing recording and playback together

In full duplex applications it may be necessary to keep audio playback and recording synchronized together. For example, it may be necessary to play back earlier recorded material at the same time as recording new audio tracks. Note that this kind of application is possible only with devices supporting full duplex operation or by using two separate audio devices together. In the second case it is important that both devices support precisely the sampling rate to be used (otherwise synchronization is not possible). Use the trigger feature when you need this kind of synchronization.

Implementing real time effect processors and other oddities

The term "real time" here means an application which records audio data, performs some kind of processing on it and outputs it immediately with practically no delay. Unfortunately this kind of application is not possible using a UNIX-like multitasking operating system and general purpose computer hardware. There is always some delay between recording a sample and the moment it is available for processing by the application (the same is true of playback). In addition, multitasking overhead (other simultaneously running processes) causes unexpected pauses in the operation of the application itself. Normally these kinds of operations are done with dedicated hardware and system software designed for this kind of use. It is possible to decrease the delay between input and output by decreasing the fragment size. In theory the fragment size can be as short as 16 bytes on a fast machine. However, in practice it is difficult to get fragment sizes shorter than 128 to 256 bytes to work. Using direct access to the hardware level audio buffer may provide better results on systems where this feature works. If you still want to implement this kind of application, you should use short fragments together with select(). The shortest fragment size that works depends on the situation, and the only way to find it out is to experiment. And (of course) you should use a device with full duplex capability or two separate devices together.

Starting audio playback and/or recording with precise timing

The SNDCTL_DSP_SETTRIGGER ioctl call has been designed to be used in applications which require starting recording and/or playback with precise timing. Before you use this ioctl, you should check that the DSP_CAP_TRIGGER feature is supported by the device. Trying to use this ioctl with a device not supporting it will give undefined results. This ioctl accepts an integer parameter where two bits are used to enable and disable playback, recording or both. The PCM_ENABLE_INPUT bit controls recording and PCM_ENABLE_OUTPUT controls playback. These bits can be used together provided that the device supports full duplex and has been opened for O_RDWR access. In other cases the application should use only one of these bits without reopening the device.

The driver maintains these bits for each audio device (supporting this feature). Initially (after open) these bits are set to 1, which makes the device work normally. Before the application can use the trigger ioctl to start device operations, the bit to be used should be set to 0. To do this you can use the following code. It is important to note that this can be done only immediately after opening the device (before writing to or reading from it). It is currently NOT possible to stop or restart a device that has already been active without first reopening the device file.

int enable_bits = ~PCM_ENABLE_OUTPUT; /* This disables playback */
ioctl(audiofd, SNDCTL_DSP_SETTRIGGER, &enable_bits);

After the above call, writes to the device don't start the actual device operation. The application can fill the audio buffer by outputting data using write(). write() will return -1 with errno set to EAGAIN if the application tries to write when the buffer is full. This permits preloading the buffer with output data in advance. Calling read() when PCM_ENABLE_INPUT is not set will always fail with EAGAIN. To actually activate the operation, call SNDCTL_DSP_SETTRIGGER with the appropriate bits set. This starts the enabled operations immediately (provided that there is already data in the output buffer). It is also possible to leave one of the directions disabled while starting the other one. The complete sequence is sketched below.
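The complete sequence, from clearing the trigger bit to actually firing playback, could look roughly like this (a sketch; the preloaded data and its length are assumed to be prepared by the caller):

#include <errno.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <sys/soundcard.h>

void start_playback_triggered(int audio_fd, const char *buf, int len)
{
    int enable_bits;

    /* 1. Immediately after open: clear the playback trigger bit. */
    enable_bits = ~PCM_ENABLE_OUTPUT;
    ioctl(audio_fd, SNDCTL_DSP_SETTRIGGER, &enable_bits);

    /* 2. Preload the buffer; writes no longer start the device.
          write() fails with EAGAIN once the buffer is full. */
    while (len > 0) {
        int n = write(audio_fd, buf, len);
        if (n == -1) {
            if (errno == EAGAIN)
                break;      /* buffer completely preloaded */
            return;         /* real error */
        }
        buf += n;
        len -= n;
    }

    /* 3. Fire: playback starts at this precise moment. */
    enable_bits = PCM_ENABLE_OUTPUT;
    ioctl(audio_fd, SNDCTL_DSP_SETTRIGGER, &enable_bits);
}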
Starting audio recording or playback in sync with /dev/sequencer or /dev/music

In some cases it is necessary to synchronize playback of audio sequences with MIDI output (this is possible with recording too). To do this you need to suspend the device before writing to or reading from it. This can be done by calling ioctl(audiofd, SNDCTL_DSP_SETSYNCRO, 0). After this the device works just as if both the recording and the playback trigger bits (see above) had been set to 0. The difference is that it is not possible to reactivate the device without using the features of /dev/sequencer or /dev/music (the SEQ_PLAYAUDIO event).
Full duplex mode
Full duplex means an audio device's capability to do input and output
in parallel.
Most audio devices are half duplex, which means that they support both recording and playback but can't do them simultaneously due to hardware level limitations (some devices can't do recording at all). In these cases it is very difficult to implement applications which do both recording and playback. It is recommended that the device is reopened when switching between recording and playback.

It is possible to get full duplex features by using two separate devices. In the context of OSS this is not called full duplex but simultaneous use of two devices. Full duplex does _NOT_ mean that the same device can be used twice. With current OSS it is not possible to open a device that is already open. This feature may be implemented in future versions. Until then you will need to use two separate devices in this kind of situation.

Some applications require full duplex operation. It is important that such applications verify that full duplex is possible (using DSP_CAP_DUPLEX) before trying to use the device. Otherwise the behaviour of the application will be unpredictable. The application should switch the full duplex feature on immediately after opening the device using ioctl(audiofd, SNDCTL_DSP_SETDUPLEX, 0). This call switches the device to full duplex mode and prepares the driver for full duplex access. It must be done before checking the DSP_CAP_DUPLEX bit, since otherwise the driver may report that the device doesn't support full duplex.

Using full duplex is simple in theory. The application just:
- opens the device for both input and output (O_RDWR);
- switches full duplex mode on with SNDCTL_DSP_SETDUPLEX;
- selects the sampling parameters, which are shared by the input and output directions;
- starts reading from and writing to the device as usual.
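A sketch of this opening sequence follows; the device name and sampling parameters are arbitrary examples:

#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <sys/soundcard.h>

int open_duplex(const char *dev)   /* e.g. "/dev/dsp" */
{
    int fd, caps, tmp;

    if ((fd = open(dev, O_RDWR, 0)) == -1)
        return -1;

    /* Request full duplex before checking the capability bit. */
    ioctl(fd, SNDCTL_DSP_SETDUPLEX, 0);

    if (ioctl(fd, SNDCTL_DSP_GETCAPS, &caps) == -1 ||
        !(caps & DSP_CAP_DUPLEX)) {
        close(fd);
        return -1;      /* device can't do full duplex */
    }

    /* Sampling parameters are shared by both directions. */
    tmp = AFMT_S16_LE; ioctl(fd, SNDCTL_DSP_SETFMT, &tmp);
    tmp = 2;           ioctl(fd, SNDCTL_DSP_CHANNELS, &tmp);
    tmp = 44100;       ioctl(fd, SNDCTL_DSP_SPEED, &tmp);

    return fd;
}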
Accessing DMA buffer directly
In some (rather rare) cases it is possible to map an audio device's hardware
level buffer area into the address space of an application. This method
is very operating system dependent and is currently available only in
Linux. For more info about this method (in Linux) you should look at
a demonstration
program provided by 4Front Technologies.
The direct mapping method is possible only with devices that have a hardware level buffer which is directly accessible from the host CPU's address space (for example a DMA buffer or a shared memory area).

The basic idea is simple. The application uses an operating system dependent method to map the input or output buffer into its own virtual address space. In the case of full duplex devices there are two separate buffers (one for input and one for output). The application then triggers the desired transfer operation(s). From that point on the buffer is continuously accessed by the hardware until the device is closed. The application can access the buffer area(s) using pointers, but normal read() and write() calls can no longer be used.

The buffer area is continuously scanned by the hardware. When the pointer reaches the end of the buffer, it is moved back to the beginning. The application can locate the data using the SNDCTL_DSP_GETxPTR calls. The bytes field tells how many bytes the device has processed since the beginning. The ptr field gives an offset relative to the beginning of the buffer. This pointer must be aligned to the nearest sample boundary before being used to access the buffer. The pointer returned by this call is not absolutely precise, due to possible delays in executing the ioctl call and possible FIFOs inside the hardware device itself. For this reason the application should assume that the actual pointer is a few samples ahead of the returned value.

When using direct access, the blocks field returned by the SNDCTL_DSP_GETxPTR calls has a special meaning: it is the number of fragments that have been processed since the previous call to the same ioctl (the counter is cleared after the call).

select() also works in a special way with mapped access. select() returns a bit in the readfds or writefds parameter after each interrupt generated by the device. This happens when the pointer moves from one buffer fragment to another. However, the application should check the actual pointer very carefully. It is possible that the select() call returns a relatively long time after the interrupt. It is even possible that further interrupts occur before the application gets control again.

Note that the playback buffer is never cleaned by the driver. If the application stops updating the buffer, its present contents will be played in a loop again and again. Sufficient play-ahead is recommended, since otherwise the device may play uninitialized (old) samples if there are any performance problems.

No software based sample format conversions are performed by the driver. For this reason the application must use a sample format that is directly supported by the driver.
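To give an idea of the overall shape of the Linux method, here is a rough sketch of mapping the output buffer and starting playback with the trigger ioctl. This is an illustration only; consult the 4Front demonstration program for a complete, tested version:

#include <fcntl.h>
#include <unistd.h>
#include <sys/mman.h>
#include <sys/ioctl.h>
#include <sys/soundcard.h>

/* Maps the playback buffer and starts output. Returns the buffer,
   or NULL on failure. Sampling parameters must be set beforehand. */
char *map_output_buffer(int audio_fd, int *buf_size)
{
    audio_buf_info info;
    char *buf;
    int trig;

    /* Total buffer size = number of fragments * fragment size. */
    if (ioctl(audio_fd, SNDCTL_DSP_GETOSPACE, &info) == -1)
        return NULL;
    *buf_size = info.fragstotal * info.fragsize;

    buf = mmap(NULL, *buf_size, PROT_WRITE, MAP_SHARED, audio_fd, 0);
    if (buf == MAP_FAILED)
        return NULL;

    /* The device keeps looping over this area once triggered. */
    trig = 0;
    ioctl(audio_fd, SNDCTL_DSP_SETTRIGGER, &trig); /* clear first */
    trig = PCM_ENABLE_OUTPUT;
    ioctl(audio_fd, SNDCTL_DSP_SETTRIGGER, &trig); /* start */

    return buf;
}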