More moving parts is better????

I first got to know Unix (HP-UX) in December 1988, and it was love at first sight. I liked the simplicity and elegance of concepts like piping the output of a command to (a chain of) other commands. I was particularly impressed by the simple and elegant file and device model, which made it possible to handle any device just like a regular disk file. This was very different from the HP3000/MPE environment I had used before. Some devices (like serial ports) needed a few setup steps done with ioctl calls. After that, all you had to do was call read and write until you decided to close the device.

Following this model was my primary goal when I started working on OSS (at that time the Sound Blaster driver for Linux). I implemented a set of ioctl calls for setting up the device. After that, the device could be used just by calling read and write. The select (later poll) call was used when the application needed to serve multiple devices at the same time. In my opinion this approach is still the mother of all approaches.
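A minimal sketch of that "set up with ioctl, then just write" pattern, assuming an OSS-compatible device node (such as /dev/dsp) and the `<sys/soundcard.h>` header. The helper name `oss_play` and the choice of 16-bit little-endian stereo are illustrative assumptions, not part of any official API.

```c
#include <fcntl.h>
#include <sys/ioctl.h>
#include <sys/soundcard.h>
#include <unistd.h>

/* Hypothetical helper: play interleaved 16-bit stereo samples on an OSS
 * device. Returns 0 on success, -1 on any error. */
int oss_play(const char *dev, const short *samples, size_t nbytes, int rate)
{
    int fd = open(dev, O_WRONLY);
    if (fd < 0)
        return -1;

    /* Setup phase: a handful of ioctl calls. The driver may adjust the
     * requested values; give up if format or channel count changed. */
    int fmt = AFMT_S16_LE, channels = 2;
    if (ioctl(fd, SNDCTL_DSP_SETFMT, &fmt) < 0 || fmt != AFMT_S16_LE ||
        ioctl(fd, SNDCTL_DSP_CHANNELS, &channels) < 0 || channels != 2 ||
        ioctl(fd, SNDCTL_DSP_SPEED, &rate) < 0) {
        close(fd);
        return -1;
    }

    /* Data phase: plain write() calls, exactly as with a regular file.
     * write() blocks while all buffer fragments are full. */
    const char *p = (const char *)samples;
    while (nbytes > 0) {
        ssize_t n = write(fd, p, nbytes);
        if (n < 0) {
            close(fd);
            return -1;
        }
        p += n;
        nbytes -= (size_t)n;
    }
    close(fd);
    return 0;
}
```

An application serving several devices at once would put the descriptors into select() or poll() instead of blocking in write() on one of them.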

The audio model of OSS is like a big, horizontally mounted wheel with two or more buckets (fragments) attached to it. The wheel rotates at a fixed rate (the sampling rate). The platform has two sides. A worker (the application) works on the loading side. When an empty bucket arrives at the loading area, a bell rings and the worker has some time to fill it. After that the bucket moves out of reach and gets emptied on the other side. While it moves away, another (empty) bucket arrives and the bell rings again. If the worker can fill the buckets faster than required, he can take a nap before the next bucket arrives. If the worker is too slow or sleeps too much, buckets will leave the loading area empty or half empty (underrun).
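The size of the wheel can be requested from OSS with the SNDCTL_DSP_SETFRAGMENT ioctl, whose argument packs the number of fragments into the high 16 bits and log2 of the fragment size in bytes into the low 16 bits. The sketch below computes that argument and the interval between two bells. Both helper names are hypothetical, and the fragment size is assumed to be a power of two.

```c
#include <stdint.h>

/* Hypothetical helper: encode a SNDCTL_DSP_SETFRAGMENT request.
 * High 16 bits: number of fragments (buckets on the wheel).
 * Low 16 bits: log2 of the fragment size in bytes. */
uint32_t oss_fragment_request(unsigned nfrags, unsigned frag_bytes)
{
    unsigned shift = 0;
    /* Largest shift with 2^shift <= frag_bytes (guarded against overflow). */
    while (shift < 31 && (1u << (shift + 1)) <= frag_bytes)
        shift++;
    return ((uint32_t)nfrags << 16) | shift;
}

/* Hypothetical helper: playback time of one fragment in milliseconds,
 * i.e. the interval between two bells at the loading area. */
double fragment_ms(unsigned frag_bytes, unsigned rate,
                   unsigned channels, unsigned bytes_per_sample)
{
    double bytes_per_second = (double)rate * channels * bytes_per_sample;
    return 1000.0 * frag_bytes / bytes_per_second;
}
```

For example, 4 fragments of 4096 bytes encode as 0x0004000C, and at 48 kHz 16-bit stereo each fragment lasts about 21.3 ms. The value would be passed as `ioctl(fd, SNDCTL_DSP_SETFRAGMENT, &arg)` before the first read or write; the driver treats it as a request and may choose different sizes.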

In mechanical engineering one of the main goals is to reduce the number of moving parts that could break or malfunction. The model used by OSS has very few moving parts (the wheel, the bell and the tool the worker uses to fill the buckets). Some energy is needed to turn the wheel, and the worker needs to get paid for his work.

Several years later another group of developers started the ALSA project because they felt that OSS was too limited. It didn’t support some super-duper features of some of the most “advanced” sound cards. They developed a new kernel-level API. However, that API turned out to be flawed in one way or another. The solution was to hide all the mess behind a library API (alsa-lib). Many years later even this library API was found to be flawed, and even its developers abandoned it. They moved to a brand new library called Jack. Now it looks like the Linux community has abandoned Jack too and is moving to PulseAudio or something else (still to come). The result is that all open source Unix/Linux applications have dozens of different audio plugins for different APIs like OSS, ALSA, Jack, PulseAudio, ESD, aRts, GStreamer and more to come. Every time somebody is unhappy with the earlier ones, he writes a new one on top of one or all of them. Then he becomes the one and only tailor providing the “emperor’s new clothes”. Where is the child who finally says that the emperor is actually naked?

Does this make any sense at all? Certainly not. In my personal opinion this is all bogus. If the original alternative(s) are broken, then no additional layer built on top of them can make any difference. By using aggressive workaround strategies it might be possible to sweep some of the problems under the carpet. However, sooner or later (probably sooner) somebody will end up in a situation where the solution doesn’t work properly. All the additional layers have their own timing and control models, which are more or less (in)compatible with the layers below them. This approach will not work. When somebody figures this out, he will probably implement yet another superior software layer of his own on top of all the shit under it.

Adding more software layers between the actual application and the device means adding more and more moving parts to the system. There may now be more advanced machinery that needs more power. There may be groups of additional workers who need their salaries. The transport chain becomes longer, which means that more machinery has to keep all the buckets and conveyor belts filled (which means additional latency). The additional machines may jam, or the additional workers may have a stroke or be disturbed by nasty insects. Additional alarm bells or flow control systems are required, which makes things harder to manage. All this is more expensive (CPU load). And if something goes wrong, finding out who is responsible may be impossible (the developers of the additional layers will blame each other, the customer, or design problems of the original wheel). Does this make any sense at all?
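To put a number on the longer transport chain, here is a simplified model, assuming each layer (library, sound server, driver) is nothing more than a buffer of some number of frames that must be passed through before the audio reaches the hardware. It ignores processing time and assumes the buffers are full, so the per-layer delays simply add up. The function name and the buffer sizes in the example are hypothetical.

```c
#include <stddef.h>

/* Hypothetical helper: worst-case latency in milliseconds of a chain of
 * buffered layers, where buffer_frames[i] is the buffer size of layer i
 * in frames and rate is the sampling rate in frames per second. */
double chain_latency_ms(const unsigned *buffer_frames, size_t nlayers,
                        unsigned rate)
{
    unsigned long total = 0;
    for (size_t i = 0; i < nlayers; i++)
        total += buffer_frames[i];
    return 1000.0 * (double)total / rate;
}
```

At 48 kHz a single 1024-frame buffer adds about 21.3 ms; putting two more layers with 2048 and 1024 frames in front of it raises the total to about 85.3 ms.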

Just about a week ago I was contacted by a software developer who was trying to write an application that plays different sounds in a row. It was a Java application. The Java engine was running on top of ESD. ESD in turn was using OSS (or was it using ALSA’s OSS emulation?). The problem was that only the last sound clip got played properly. The earlier ones were cut off when a later clip was started. Was this problem caused by the Java engine, ESD, OSS or something between them? There was no way to find out. If the cause cannot be found, the problem cannot be fixed. So who will be responsible if the company developing the actual application cannot deliver the product?

There is nothing wrong with having some kind of umbrella audio library that hides the differences between the “native” audio device APIs of different operating systems. The problem is that none of these libraries does just that. Instead they all seem to provide unrelated functionality like mixing multiple streams or handling “effect” plugins. Mixing streams is the responsibility of the “host” audio API. Mixing done at any other layer benefits only the applications that use that particular API. Other library layers/APIs can do mixing only on top of the APIs/libraries they depend on. The result is simply multiple layers and mixing daemons running in the same system. What a wonderful waste of resources.

IMHO the only sane place for (virtual) mixing is the very lowest API layer in each system. In the case of OSS it’s the vmix engine that runs at kernel level. Some wise men have said that mixing at kernel level is banned because everything that can be done in user space should be done there. However, this is their personal opinion, biased by their own interests. Mixing at kernel level consumes some (not that many) CPU cycles in kernel space and increases the share of the CPU load attributed to the kernel. However, it really doesn’t matter where the CPU cycles are spent. The same instructions executed in user space take exactly the same number of CPU cycles. In fact user space mixing consumes many more CPU cycles, since additional context switches are required. Kernel-level mixing, in turn, guarantees that all the audio processing gets executed at the highest priority (interrupt time), so other tasks competing for the same CPU core cannot disturb it. The result is lower overall CPU load without the risk of audio hiccups caused by concurrent CPU load. In other words, mixing done in kernel space may reduce the performance of the kernel itself, but it boosts the usefulness of the whole system.
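The core of any virtual mixer is a small inner loop like the one below, shown as a sketch of saturating 16-bit mixing in general and not as the actual vmix code (which may differ in sample format, volume scaling and other details). The function name `mix_s16` is hypothetical. The point the argument relies on is that this loop costs the same number of instructions wherever it runs; a user space mixer additionally has to move every stream through another process.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: mix nstreams buffers of signed 16-bit samples
 * into out[]. Samples are summed in a 32-bit accumulator and clipped to
 * the 16-bit range, so loud passages saturate instead of wrapping. */
void mix_s16(int16_t *out, const int16_t *const *streams,
             size_t nstreams, size_t nsamples)
{
    for (size_t i = 0; i < nsamples; i++) {
        int32_t acc = 0;
        for (size_t s = 0; s < nstreams; s++)
            acc += streams[s][i];
        if (acc > INT16_MAX)
            acc = INT16_MAX;
        else if (acc < INT16_MIN)
            acc = INT16_MIN;
        out[i] = (int16_t)acc;
    }
}
```

Integer-only arithmetic is used here because it is a common choice for code that may run in kernel context, where floating point is generally avoided.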

The Linux audio community has abandoned the “legacy” Unix/POSIX/Linux device API for no valid reason. The current solution is based on a large set of competing and parallel libraries. They are all different, but none of them provides any benefit over the original Unix/POSIX/Linux device file model. The only result is that there are now many more audio APIs to support than there were 15 years ago. The actual applications don’t work any better; they just have to support dozens of redundant audio APIs (and counting). These libraries are no better than the lowest-level APIs. If the low-level APIs are broken, the effort should be put into fixing them. Adding more and more software/library layers that try to fix the original problems is doomed to fail miserably.

Conclusion

  • The audio API to be used should be at the lowest possible level. That is the Unix/POSIX/Linux device API that is common to most devices (not just audio).
  • If mixing of multiple streams is necessary, then it should be done at kernel level. This gives the best overall CPU usage and minimizes the problems caused by concurrent processes.
  • If the audio stream (or a mix of multiple streams) needs to be fed to a remote system, then only the final mix should be sent. This can be done using the oss_userdev interface of OSS. Passing each individual stream to a user space daemon is a pure waste of resources when only the final mix needs to be passed to the daemon. A single (dedicated) audio pipe (like oss_userdev) can be implemented just as efficiently as any other kernel-level (process-to-process) pipe.
  • If a common API is required for multiple host audio APIs (OSS, Windows, Mac), then such a library should focus only on hiding the differences between the host APIs. Additional features such as mixing (daemons) are considered harmful and must be avoided.