Why is sound quality bad?
For years I have been following the discussion on many Linux audio-related forums and mailing lists. Users keep complaining that the sound is garbled. This has gone on for years, and it will go on for decades if nothing changes.
Why is sound garbled under Linux while it works under Windows/Mac?
There are a few reasons for this. The first one is specific to OSSv4. The others are application design issues common to all audio subsystems (OSS, OSS/Free and ALSA). I have said all this many times before, so it may sound like I’m repeating myself. However …
- Practically all Linux audio applications open the audio device with the O_NDELAY/O_NONBLOCK flags. However, after that they don’t handle non-blocking reads and writes as POSIX defines them: a call that cannot proceed returns -1 with errno set to EAGAIN, and a call may transfer fewer bytes than requested. OSSv4 implements POSIX-compatible non-blocking I/O, so applications that don’t expect non-blocking behavior will run at FFWD (fast-forward) speed.
- Applications try to get lower latencies than they actually need. This makes them less tolerant of the CPU load caused by other applications running on the same system. Lower latency means shorter buffers, and shorter buffers leave the application less time to wait for the CPU to become available to it. Past some limit the application fails to write/read new audio fast enough, which causes a pause/gap/click in the signal. If this happens too often, the sound becomes completely garbled. This is an application-level bug, and the sound system (be it ALSA or OSS) has no way to fix it.
- For some reason, most Linux audio applications try to avoid blocking on the audio device. Instead of normal blocking reads/writes, they use asynchronous timers (usleep/nanosleep/poll/select/whatever) to wait until they can read/write without blocking. Unfortunately this method is not reliable, because the timer may wake the application up too late. The application may work just fine most of the time, but sooner or later it will run out of luck. This is more likely to happen when there is other CPU activity on the system.
- Using asynchronous timers together with short buffers (low latencies) is potentially dangerous. Short buffers leave the application less spare time to tolerate the poor precision of the system timer. If asynchronous timing is used, the buffer size (latency) should be significantly larger than the resolution of the system timer. Typical Linux/Unix systems use a 100 Hz system timer, which calls for fragment sizes much larger than 1/100th of a second (10 ms). I don’t know exactly what is safe, but buffers shorter than 30 to 50 ms are likely to be unreliable. Systems with a 1000 Hz system timer can work down to 3 to 5 ms latencies.
- The current trend is to layer different sound systems on top of each other. In the worst case there is also a server running in the background, and the audio streams to/from all applications get looped through it. These top-level sound systems are supposed to fix problems in the lower-level APIs. IMHO this is pretty much impossible: bugs should be fixed in the original software. Upper layers can add some workarounds, but at the same time they add their own bugs to the soup. A sound server running in the background also makes timing less reliable and adds extra context switches to the system.
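For the first point, here is a minimal sketch in C, assuming the application opened the device with O_NONBLOCK only so that open() would not hang on a busy device. Clearing the flag with fcntl() right after a successful open gives ordinary blocking reads and writes, which is what most of these applications actually expect. The helper name clear_nonblock is hypothetical.

```c
#include <fcntl.h>

/* Hypothetical helper: after open(..., O_WRONLY | O_NONBLOCK) has succeeded,
   drop O_NONBLOCK so that later read()/write() calls block instead of
   returning -1/EAGAIN or short counts that the application might ignore.
   Returns 0 on success, -1 on error (errno is set by fcntl). */
int clear_nonblock(int fd)
{
    int flags = fcntl(fd, F_GETFL);
    if (flags == -1)
        return -1;
    if (fcntl(fd, F_SETFL, flags & ~O_NONBLOCK) == -1)
        return -1;
    return 0;
}
```

Typical use would be `int fd = open("/dev/dsp", O_WRONLY | O_NONBLOCK);` followed by `clear_nonblock(fd)` once the descriptor is known to be valid.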
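The alternative to the timer-driven loop in the third point is simply to let write() block. A minimal sketch, assuming fd is a blocking descriptor (for example one opened without O_NONBLOCK): the loop hands the whole buffer to the driver and lets the kernel put the process to sleep until the device has room, so no usleep/poll interval has to be guessed. write_all is a hypothetical name; it still resumes after short writes and EINTR, which POSIX permits even in blocking mode.

```c
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: write all `len` bytes to a blocking descriptor.
   The kernel sleeps inside write() until the device can accept more data;
   the loop only continues after short writes or signal interruptions.
   Returns 0 on success, -1 on error (errno set by write). */
int write_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n == -1) {
            if (errno == EINTR)
                continue;   /* interrupted by a signal: retry */
            return -1;      /* real error (or EAGAIN if fd is non-blocking) */
        }
        p += n;
        len -= (size_t)n;
    }
    return 0;
}
```

In a playback loop the application would generate one fragment and pass it to write_all(); the blocking write paces the loop at the rate the device consumes data.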
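The buffer-size arguments in the second and fourth points come down to simple arithmetic. The sketch below converts a buffer size in bytes into milliseconds of audio and compares it with the length of a system timer tick. The "at least 3 ticks" threshold is an assumption read off the figures above (30 to 50 ms at 100 Hz and 3 to 5 ms at 1000 Hz are both 3 to 5 ticks), not a guaranteed safe limit, and the function names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Milliseconds of audio held by `bytes` bytes of interleaved PCM data. */
double buffer_ms(size_t bytes, unsigned rate_hz, unsigned channels,
                 unsigned bytes_per_sample)
{
    double bytes_per_second = (double)rate_hz * channels * bytes_per_sample;
    return 1000.0 * (double)bytes / bytes_per_second;
}

/* Length of one system timer tick in milliseconds (10 ms at 100 Hz). */
double tick_ms(unsigned timer_hz)
{
    return 1000.0 / (double)timer_hz;
}

/* Assumed heuristic: when waiting with usleep/poll/select-style timers,
   treat a buffer shorter than MIN_TICKS timer ticks as likely to underrun. */
#define MIN_TICKS 3.0

bool buffer_is_risky(double buf_ms, unsigned timer_hz)
{
    return buf_ms < MIN_TICKS * tick_ms(timer_hz);
}
```

For example, 16-bit stereo at 48 kHz is 192000 bytes per second, so a 4096-byte fragment holds about 21 ms of audio: below the assumed threshold with a 100 Hz timer (30 ms), but comfortably above it with a 1000 Hz timer (3 ms).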
As I said, these problems are caused mostly by application design errors. If the application doesn’t work properly, no sound subsystem can compensate for that. The situation is different if the sound subsystem forces applications to use dangerous techniques like the ones mentioned above.
These problems must be fixed at the application level. The current trend is to blame the sound subsystem for them. Then a new sound system is developed and the faulty applications get ported to it. The new system inherits the problems of the earlier ones, and the cycle starts again.