Fragments - history, theory and applications
It looks like the concept of fragments in OSS is causing some confusion. I think it’s necessary to explain that concept in detail. I will probably keep updating this page for a while so stay tuned.
History
The first versions of OSS (the Linux sound driver) in the early 90’s used a primitive dual buffering (ping-pong buffering) approach where the DMA buffer being read/written by the sound card was divided into two equally sized halves. The buffer size was determined by a compile time constant (originally 64k), so each half was fixed at half of that. The application was not able to change it.
The first version was also very primitive in another way: it required that each read/write was equal to the fragment size. For this reason all applications were required to call SNDCTL_DSP_GETBLKSIZE and to allocate their local buffer with that size. This limitation was fixed within a few weeks and the use of SNDCTL_DSP_GETBLKSIZE became optional. However, there were still good reasons to use it in many applications.
In the early days of OSS computers were very slow. I had a 25 MHz 386 and it was barely capable of doing any audio processing. My favourite application was a module (.mod) player called str (still included in the tutorials/sndkit/dsp/str directory of the OSS source tree). It used about 90% of the CPU time and caused a lot of hiccups. The only way to get it working properly was to use as large a buffer as possible. The remaining 10% of CPU power made it possible to fill the buffer completely in reasonable time. After the buffer was completely full there was plenty of time to compute more audio content before the first half of the buffer had been played. Even minor CPU load caused by other processes didn’t cause underruns. However, this worked only as long as the application wrote exactly the right amount of data each time (the size of the buffer half or 1/N of it). Other write sizes caused additional delays because the process occasionally had to sleep longer than necessary.
So in the beginning the only goal in audio application development was to use as large a buffer as possible and to keep it completely filled most of the time. Large buffers mean large latencies. In the beginning they didn’t matter because the CPUs were not capable of running any fancy applications that would have cared about latencies.
A bit later faster 486 and Pentium systems got cheaper and Linux hackers (including me) started to use them. Suddenly it became possible to develop applications like games. I think the first Linux game was Sasteroids (an SVGAlib based asteroid shooting game by Brad Pitz). Suddenly the use of large buffers became a problem because the sound effects lagged far behind the action on the screen. The solution was to use smaller buffers. I added the SNDCTL_DSP_SUBDIVIDE ioctl, which can be used to request a buffer that is 2, 4, 8 or N times smaller than the default one. That solved the latency problem at the time. Some other ioctl calls (SNDCTL_DSP_GETOSPACE/GETISPACE) were also added so that applications could avoid blocking in read and write calls if there was not enough space immediately available (IMHO adding these calls was a mistake).
The original dual buffering model had some problems. The most difficult one was that the full/half buffer interrupts raised by the sound cards were not always reliable. The interrupts sometimes occurred a bit too early, when the device was still reading the last sample(s) from the buffer half that was reported to be free. When the driver copied new data to the buffer, it overwrote some samples the device had not read yet. This caused serious clicking. The other problem was that often only half of the buffer contained valid audio data before the application got woken up to feed new data to the buffer. Under heavy CPU load this caused unnecessary underruns when the application got delayed.
The next step was dropping the dual buffering scheme and replacing it with the multi buffering model still used by OSS today. Instead of splitting the 64k buffer into two 32k halves, OSS started using (by default) 16 sub buffers (fragments) of 4k each. One of the fragments was used as a guard zone and kept empty, and the remaining 15 fragments were available to the application. This fixed the clicking problems caused by too early buffer interrupts. In addition, the SNDCTL_DSP_SETFRAGMENT call was added so that applications can control the fragment size and the buffer size (number of fragments).
There is also a new SNDCTL_DSP_POLICY ioctl that gives a slightly easier way to control the buffer size. A policy value of 5 gives the “default” fragment size. A value of 10 selects a buffering mode that is suitable for applications that don’t have any latency requirements. A value of 0 gives a buffering policy suitable for applications with aggressive low latency requirements. The problem with this ioctl call is that the policy levels are not precisely defined. In the future this call may be extended to support frame rate (FPS) selection, so that for example a value of 24 would give a fragment/buffer size that is optimal for playing 24 fps movies. However, at this moment this feature has not been implemented.
The theory part
In the new/current model the fragment size defines the maximum time the application may get blocked when doing a normal (blocking) read or write. The number of fragments together with the fragment size defines the total buffer size. Together, the total buffer size and the fragment size define the latency when the application uses normal (blocking) writes. Reads work in a different way because their latency depends only on the fragment size (unless the application is too slow to keep the buffer empty).
The actual device can work in two different ways:
- Typical PCI/ISA devices use continuous DMA transfers to read/write the DMA buffer allocated by OSS. When the active DMA position reaches the last sample in the buffer, the device automatically wraps around to the first sample in the buffer. This means that if the application or OSS doesn’t do anything, the device will keep playing the current buffer content again and again. In practice OSS will detect the underrun if the application has not written data fast enough. When a playback underrun is detected, OSS automatically wipes out the DMA buffer and the device keeps playing silence. During recording the oldest fragments in the buffer get automatically discarded every time a recording overrun occurs. The audio device is programmed to raise a buffer interrupt at each fragment boundary. OSS uses these interrupts for bookkeeping and to wake up the application when more data/space becomes available.
- USB devices work in a slightly different way. The device doesn’t read the kernel level DMA buffer directly. Instead the USB host controller (driver) calls back OSS after each fragment, and OSS sets up the transfer of the next fragment to the device. New data gets transferred to a USB audio device every millisecond, which means that the fragment size always corresponds to one millisecond of audio (see below).
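The continuous DMA behaviour of PCI/ISA devices can be pictured as a ring of fragments. The following is a toy user-space model, not OSS kernel code: the struct `ring`, the functions `ring_write_fragment()` and `ring_fragment_done()`, and the tiny geometry are all hypothetical. It covers the playback side only: each simulated buffer interrupt finishes one fragment and wraps the position at the end of the buffer, and when no data is queued the whole buffer is wiped so that the device keeps playing silence.

```c
#include <assert.h>
#include <string.h>

#define NFRAGS   4   /* hypothetical ring geometry for the example */
#define FRAGSIZE 8   /* bytes per fragment */

/* Toy model of a playback DMA ring; not actual OSS code. */
struct ring {
    unsigned char buf[NFRAGS * FRAGSIZE];
    int play_frag;   /* fragment the "device" is currently playing */
    int queued;      /* fragments of valid data, starting at play_frag */
    int underruns;
};

/* Queue one fragment of application data after the last queued one.
 * Returns 0 on success, -1 if the ring is full (a blocking write would sleep). */
static int ring_write_fragment(struct ring *r, const unsigned char *data)
{
    assert(r != NULL && data != NULL);
    if (r->queued == NFRAGS)
        return -1;
    int slot = (r->play_frag + r->queued) % NFRAGS;
    memcpy(r->buf + slot * FRAGSIZE, data, FRAGSIZE);
    r->queued++;
    return 0;
}

/* Called at each fragment boundary (the "buffer interrupt"). */
static void ring_fragment_done(struct ring *r)
{
    if (r->queued == 0) {
        /* Underrun: wipe the buffer so the wrapping DMA plays silence
         * instead of repeating stale data. */
        memset(r->buf, 0, sizeof r->buf);
        r->underruns++;
    } else {
        r->queued--;
    }
    /* The DMA position wraps from the last fragment back to the first. */
    r->play_frag = (r->play_frag + 1) % NFRAGS;
}
```

Calling `ring_fragment_done()` from a loop mimics the device; in a real driver this bookkeeping happens in the interrupt handler, which also wakes up any sleeping writer.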
The fragment/buffer interrupts are synchronous to the actual audio stream. They always occur at the moment when new data/space becomes available to the application. When the application uses normal (blocking) reads and writes, it’s guaranteed that the application will have the maximum amount of time to do its processing before an underrun/overrun occurs.
What is the effect of the fragment and buffer sizes?
The fragment size defines the amount of time between buffer/fragment interrupts. Every time the device has processed a fragment, it raises an interrupt and the interrupt handler of the sound card driver gets called. The driver informs the audio core of OSS, which in turn wakes up the application waiting in the read, write, select or poll call. If the application uses reads/writes of exactly the fragment size, the call will return as soon as possible after the application can process one fragment of data. If the application uses longer or shorter reads/writes, the timing becomes a bit more complicated. Usually this is not a problem. However, applications that require very uniform timing should use roughly fragment sized reads/writes (the fragment size should be equal to or less than the read/write count). Uniform timing means that writing/reading N samples takes roughly N/sample_rate seconds. If the fragment size is much larger than the read/write size, some read/write calls may return immediately while others take much longer.
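The uneven timing with small writes can be shown with an idealised model. This is a sketch under stated assumptions, not a description of OSS internals: the buffer starts completely full, the application spends no time between writes, space is freed only at fragment interrupts (one fragment per tick), and a blocked write returns at the first interrupt after which all of its data fits. The function names are hypothetical; time is counted in fragment periods.

```c
#include <assert.h>

/* Tick (fragment period) at which the i-th write (1-based) of
 * write_bytes returns, under the idealised model described above.
 * After t interrupts, t * fragment_bytes of space has been freed. */
static long write_return_tick(long i, long write_bytes, long fragment_bytes)
{
    assert(i >= 1 && write_bytes > 0 && fragment_bytes > 0);
    long total = i * write_bytes;                           /* bytes written so far */
    return (total + fragment_bytes - 1) / fragment_bytes;   /* ceil(total / frag) */
}

/* Number of fragment periods the i-th write spends blocked. */
static long write_block_ticks(long i, long write_bytes, long fragment_bytes)
{
    long prev = (i == 1) ? 0 : write_return_tick(i - 1, write_bytes, fragment_bytes);
    return write_return_tick(i, write_bytes, fragment_bytes) - prev;
}
```

With 4096-byte fragments and 1024-byte writes the model gives the pattern 1, 0, 0, 0, 1, 0, 0, 0, ...: one write waits a whole fragment period and the next three return immediately. With 4096-byte writes every write blocks for exactly one period, which is the uniform timing described above.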
If the fragment size is N samples and the application reads/writes audio data in chunks of N samples, then the maximum time the application can block is N/Fs seconds (Fs is the sampling rate). The average blocking time is N/(2*Fs) and the minimum is 0.
The total buffer size (fragment_size*number_of_fragments) defines the maximum delay between writing a sample and the moment it actually gets played (the audio hardware may add further latency). This figure is only valid if the application uses normal (blocking) writes and tries to keep the buffer completely filled all the time. In practice the fragment currently being played by the device may be anything from completely full to completely empty, which reduces the actual latency depending on the situation.
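The formulas above can be evaluated directly. This sketch works in bytes rather than samples, so N/Fs becomes fragment_bytes divided by the byte rate (Fs * channels * bytes_per_sample); the `stream_fmt` struct and function names are hypothetical, and hardware latency is ignored.

```c
#include <assert.h>

/* Hypothetical description of a PCM stream, for illustration only. */
struct stream_fmt {
    long rate;              /* Fs: frames per second */
    int channels;
    int bytes_per_sample;
};

static long long bytes_per_second(const struct stream_fmt *f)
{
    assert(f->rate > 0 && f->channels > 0 && f->bytes_per_sample > 0);
    return (long long)f->rate * f->channels * f->bytes_per_sample;
}

/* Maximum blocking time of a fragment-sized read/write, N/Fs, in microseconds. */
static long long max_block_us(long fragment_bytes, const struct stream_fmt *f)
{
    return fragment_bytes * 1000000LL / bytes_per_second(f);
}

/* Average blocking time, N/(2*Fs), in microseconds. */
static long long avg_block_us(long fragment_bytes, const struct stream_fmt *f)
{
    return max_block_us(fragment_bytes, f) / 2;
}

/* Maximum write-to-playback delay with a completely filled buffer:
 * (fragment_size * number_of_fragments) / byte rate, in microseconds. */
static long long max_latency_us(long fragment_bytes, int nfrags,
                                const struct stream_fmt *f)
{
    return (long long)fragment_bytes * nfrags * 1000000LL / bytes_per_second(f);
}
```

For 48 kHz/16 bits/stereo (192000 bytes per second) and 16 fragments of 4096 bytes, one fragment lasts about 21.3 ms (average blocking about 10.7 ms), while a full buffer delays output by roughly 341 ms, which is why large buffers made game sound effects lag.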
Are fragment sizes always powers of two and why?
The SNDCTL_DSP_SETFRAGMENT ioctl call can be used to select the fragment size in powers of two. Using powers of two is ideal in many ways. It eliminates nasty side effects caused by implementation details (DMA FIFOs, etc.) of the sound devices. Also, the conversion ratios caused by sample format and channel count conversions are usually powers of two, which makes bookkeeping much easier. However, fragment sizes are not always powers of two (2, 4, 8, 16, 32, 64, …, 65536).
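The SNDCTL_DSP_SETFRAGMENT argument packs both values into one integer of the form 0xMMMMSSSS, where the fragment size is 2^SSSS bytes and MMMM is the maximum number of fragments. The helper below is a hypothetical convenience function, not part of OSS; rounding a non-power-of-two request up to the next power of two is this sketch’s own choice. It deliberately does not include the OSS header so that it compiles anywhere.

```c
#include <assert.h>

/* Build a SNDCTL_DSP_SETFRAGMENT argument (0xMMMMSSSS).
 * fragment_bytes is rounded up to the next power of two (sketch's choice). */
static int setfragment_arg(unsigned fragment_bytes, unsigned max_fragments)
{
    assert(fragment_bytes > 0 && fragment_bytes <= 65536);
    assert(max_fragments > 0 && max_fragments <= 0x7fff);
    unsigned selector = 0;                 /* SSSS: log2 of fragment size */
    while ((1u << selector) < fragment_bytes)
        selector++;
    return (int)((max_fragments << 16) | selector);
}
```

Usage in an application would look like `int frag = setfragment_arg(4096, 16); ioctl(fd, SNDCTL_DSP_SETFRAGMENT, &frag);`, typically issued right after open() and before the first read or write. As explained in the next questions, the value is only a hint, so the actual size must still be checked afterwards.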
First of all, USB audio devices transfer audio data in blocks of 1 millisecond. The actual fragment size is sample_rate*sample_size_in_bytes*number_of_channels/1000. When using 48 kHz/16 bits/stereo the fragment size is 48000*2*2/1000=192 bytes, which is not a power of two. Using 44.1 kHz makes things even more complicated because 44100*2*2/1000 is 176.4. This means that some fragments will be 176 bytes (44 samples) and some of them will be 180 bytes (45 samples).
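One way to split a 44.1 kHz stream into 1 ms fragments without drifting is to let each millisecond carry the difference between two running totals: frames(k) = floor((k+1)*rate/1000) - floor(k*rate/1000). This accumulator scheme is an assumption made for illustration, not a description of how the OSS USB driver does it, and `usb_fragment_bytes()` is a hypothetical name.

```c
#include <assert.h>

/* Bytes in the k-th 1 ms fragment (k = 0, 1, 2, ...). The running total of
 * frames after k+1 milliseconds is always floor((k+1) * rate / 1000),
 * so rounding errors never accumulate. */
static long usb_fragment_bytes(long k, long rate, int channels, int bytes_per_sample)
{
    assert(k >= 0 && rate > 0 && channels > 0 && bytes_per_sample > 0);
    long long frames = (long long)(k + 1) * rate / 1000
                     - (long long)k * rate / 1000;
    return (long)(frames * channels * bytes_per_sample);
}
```

At 44.1 kHz/16 bits/stereo this produces nine 176-byte fragments followed by one 180-byte fragment in every 10 ms (1764 bytes in total); at 48 kHz every fragment is 192 bytes.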
There are also devices that require 6 channel (5.1) audio streams to use fragments of a specific size. Otherwise the output will be badly garbled.
I used SNDCTL_DSP_SETFRAGMENT to get a given fragment/buffer size but got something different. Is this a bug?
No. It is not a bug but a feature. The SNDCTL_DSP_SETFRAGMENT and SNDCTL_DSP_POLICY calls only give a hint to OSS and the actual device driver. There is no guarantee that the request will be satisfied. In many cases the device has some limitations and cannot support the requested fragment/buffer size. For example:
- The buffer/fragment size may be fixed. The device may support just one fragment/buffer size, or the size may be fixed to a value defined in the control panel (ossmix/ossxmix). This is how the highest quality professional audio cards usually work.
- The max_int_rate setting in osscore.conf limits the maximum buffer/fragment interrupt rate. If the requested fragment size requires more interrupts, the fragment size will be multiplied by two until the interrupt rate drops below the configured limit. To get smaller fragment sizes, the max_int_rate parameter needs to be increased. This feature was added to OSS to prevent poorly designed applications from causing unnecessary interrupt overhead that slows down the whole system.
- The virtual mixing (vmix) subsystem sets the device to use a fixed fragment size (depending on the max_int_rate setting). This dictates the fragment size used by the client devices. The virtual mixer can be bypassed in some special cases by opening the audio device with the O_EXCL flag.
- Automatic sample rate and audio format conversions done by OSS will affect the fragment size seen by the application. For example, sample rate conversions between 44100 Hz and 48000 Hz will result in fragment sizes that are not powers of two. In current versions of OSS any format conversion that changes the stream size will affect the actual fragment size. The application can use the SNDCTL_DSP_COOKEDMODE ioctl to disable these conversions.
- Some OSS implementations (including Sun’s Boomer) use a fixed fragment/buffer size and don’t let applications change it.
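Two of the effects in the list above can be estimated with a little arithmetic. The sketch assumes that the max_int_rate rule is “double until bytes_per_second / fragment_bytes no longer exceeds max_int_rate” (the exact comparison inside OSS may differ), and that a rate/format conversion scales the fragment by the ratio of the application and device byte rates, truncated to whole bytes (also an assumption). Both function names are hypothetical; a “frame” here means one sample for every channel.

```c
#include <assert.h>

/* max_int_rate rule (sketch): double the fragment size while the
 * interrupt rate, bytes_per_second / fragment_bytes, exceeds the limit. */
static long limit_fragment_size(long fragment_bytes, long bytes_per_second,
                                long max_int_rate)
{
    assert(fragment_bytes > 0 && bytes_per_second > 0 && max_int_rate > 0);
    /* rate > limit  <=>  bytes_per_second > max_int_rate * fragment_bytes */
    while (bytes_per_second > (long long)max_int_rate * fragment_bytes)
        fragment_bytes *= 2;
    return fragment_bytes;
}

/* Approximate fragment size seen by the application when OSS converts
 * between the application's rate/format and the device's. */
static long app_fragment_bytes(long dev_fragment_bytes,
                               long dev_rate, int dev_frame_bytes,
                               long app_rate, int app_frame_bytes)
{
    assert(dev_fragment_bytes > 0 && dev_rate > 0 && dev_frame_bytes > 0 &&
           app_rate > 0 && app_frame_bytes > 0);
    long long num = (long long)dev_fragment_bytes * app_rate * app_frame_bytes;
    long long den = (long long)dev_rate * dev_frame_bytes;
    return (long)(num / den);
}
```

For a 192000 bytes/s stream, a 256-byte request (750 interrupts per second) grows to 2048 bytes under a limit of 100, but stays at 256 if the limit is raised to 750. A 4096-byte device fragment at 48 kHz appears to a 44.1 kHz application as roughly 3763 bytes, which is clearly not a power of two.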
For this reason applications that depend on specific fragment sizes should bypass vmix and disable the rate/format conversions. Even then the fragment size may be dictated by the device, so it is necessary to check the actual fragment size by calling SNDCTL_DSP_GETBLKSIZE or SNDCTL_DSP_GETOSPACE/GETISPACE. If the fragment/buffer size differs too much from the desired value, the user should be notified. However, refusing to work would be stupid since there may be no way to make the device provide the required fragment/buffer size. OTOH, using different application level logic should usually remove the need for specific buffer/fragment sizes.
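The “notify, don’t refuse” advice can be sketched as a simple tolerance check. The factor-of-two tolerance and the function names are hypothetical choices for illustration; each application has to decide what “too much” means for it.

```c
#include <assert.h>
#include <stdio.h>

/* Returns 1 if actual is within a factor of max_ratio of requested. */
static int fragment_close_enough(long requested, long actual, double max_ratio)
{
    assert(requested > 0 && actual > 0 && max_ratio >= 1.0);
    double r = (double)actual / (double)requested;
    return r <= max_ratio && r >= 1.0 / max_ratio;
}

/* Warn the user but keep running with whatever the device provides. */
static void report_fragment_size(long requested, long actual)
{
    if (!fragment_close_enough(requested, actual, 2.0))
        fprintf(stderr, "warning: requested %ld-byte fragments, device uses %ld\n",
                requested, actual);
}
```

In a real program `actual` would come from SNDCTL_DSP_GETBLKSIZE, or from the `fragsize` field of the `audio_buf_info` structure filled in by SNDCTL_DSP_GETOSPACE/GETISPACE.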
Applications
For prehistoric reasons many applications set the buffering parameters and depend on them. However, this should not be necessary in most applications. There are application design techniques that make applications immune to the buffering details. Some applications do depend on deterministic buffering and timing, but in reality (I think) they are very rare. Depending on the fragment/buffer sizes would be very stupid if a better design can eliminate the need for it. In a large number of situations the application will not be able to control these parameters at all. It would be sad if poor application design prevented an application from running with the highest quality audio devices of the future.
When should I care about fragments?
The idea of OSS application development is that you use as few features and ioctl calls as possible. If the default settings don’t work, then you can start looking for things to optimize. Typical applications can use the default fragment size determined by OSS (with Boomer the application can’t even change the fragment size/count). The write/read size can be almost anything (say 4k) and it doesn’t need to be the same as the actual fragment size. Even in many “advanced” cases you only need to use select()/poll() without having to worry about fragments.
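A select()/poll() based writer that ignores fragments entirely might look like the following sketch. `write_all_polled()` is a hypothetical helper; it writes in chunks of an arbitrary size (for example 4k) and uses the standard POSIX poll() call to wait for space, so it works on an audio device descriptor as well as on a pipe.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <stddef.h>
#include <unistd.h>

/* Write len bytes to fd in chunks of at most chunk bytes, waiting for
 * space with poll(). The chunk size is unrelated to the fragment size.
 * Returns 0 on success, -1 on error (errno set). */
static int write_all_polled(int fd, const void *data, size_t len, size_t chunk)
{
    const unsigned char *p = data;
    assert(chunk > 0);
    while (len > 0) {
        struct pollfd pfd = { .fd = fd, .events = POLLOUT };
        if (poll(&pfd, 1, -1) < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (pfd.revents & (POLLERR | POLLNVAL))
            return -1;
        size_t n = len < chunk ? len : chunk;
        ssize_t w = write(fd, p, n);
        if (w < 0) {
            if (errno == EINTR || errno == EAGAIN)
                continue;
            return -1;
        }
        p += w;
        len -= (size_t)w;
    }
    return 0;
}
```

In a real application the pollfd array would also contain the other descriptors the program waits on (GUI events, network sockets), which is the main reason to use poll() here instead of plain blocking writes.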
If the timing of the application seems to be unpredictable, or if there are too long delays between audio output and other events (“lip sync” problems), then you may need to pay attention to the fragment size. If you change the fragment size, then you typically also need to use fragment sized reads and writes.
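When the application produces audio in arbitrary amounts but must issue fragment sized writes, a small accumulator can sit in between. This is a sketch with hypothetical names (`frag_writer` and its functions); `fragsize` would normally come from SNDCTL_DSP_GETBLKSIZE after the fragment size has been set. For brevity a short write is treated as an error, and a partial fragment left at the end of the stream is the caller’s responsibility (pad it with silence or write it as is).

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Collects arbitrary-sized input and issues write() calls of exactly
 * fragsize bytes. */
struct frag_writer {
    int fd;
    size_t fragsize;
    size_t pending;          /* buffered bytes, always < fragsize between calls */
    unsigned char *buf;
};

static int frag_writer_init(struct frag_writer *fw, int fd, size_t fragsize)
{
    assert(fw != NULL && fragsize > 0);
    fw->fd = fd;
    fw->fragsize = fragsize;
    fw->pending = 0;
    fw->buf = malloc(fragsize);
    return fw->buf ? 0 : -1;
}

/* Returns the number of full fragments written, or -1 on a write error. */
static long frag_writer_feed(struct frag_writer *fw, const void *data, size_t len)
{
    const unsigned char *p = data;
    long written = 0;
    while (len > 0) {
        size_t room = fw->fragsize - fw->pending;
        size_t n = len < room ? len : room;
        memcpy(fw->buf + fw->pending, p, n);
        fw->pending += n;
        p += n;
        len -= n;
        if (fw->pending == fw->fragsize) {
            if (write(fw->fd, fw->buf, fw->fragsize) != (ssize_t)fw->fragsize)
                return -1;
            fw->pending = 0;
            written++;
        }
    }
    return written;
}

static void frag_writer_free(struct frag_writer *fw)
{
    free(fw->buf);
    fw->buf = NULL;
}
```

Feeding two 10-byte chunks into a writer with 16-byte fragments issues a single 16-byte write and keeps the remaining 4 bytes for the next call.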