Low latencies are bullshit
Linux/Unix audio is such a marginal business that there is room for only one idea at a time. This idea has remained the same for the past 15 years. The magic subject is “low latencies”. In particular, this is the favourite subject of many ALSA believers, because ALSA is said to provide superior latencies. IMHO the whole low latency hoopla is pure bullshit.
It looks like very few Linux/Unix audio hackers understand what low latencies really mean. Having “zero” latency might look much better (or sexier) than latencies of ~1 ms, and ~1 ms in turn looks better than ~10 ms. However, they forget that the speed of sound is limited to ~340 m/s (at sea level). This means that:
- During 10 ms sound travels 3.4 m.
- During 1 ms it travels 34 cm.
- During a 48 kHz sample period it travels a mighty 7 mm.
Put the other way around, this means that:
- A 3.4 m distance between the sound source and the microphone (or the speaker and the listener) causes a delay of 10 ms.
- A 34 cm distance causes a delay of 1 ms.
- A 7 mm distance causes a delay of 1 sample at 48 kHz.
And also:
- Most sound sources (musical instruments) are much larger than 34 cm, and it is impractical to place the microphone closer to them than 34 cm. This means that in the real (analog) world there are always sound propagation delays in the ~1 ms range.
- The distance between the front and back rows of a big symphony orchestra or choir is several meters, so the differences in the delays are in the ~10 ms range.
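The figures in the lists above are plain distance = speed × time arithmetic. Here is a minimal Python sketch that reproduces them, assuming the same ~340 m/s speed of sound; the function names are made up for illustration.

```python
# Assumed constant: ~340 m/s, the sea-level figure used in the text.
SPEED_OF_SOUND_M_PER_S = 340.0


def distance_travelled_m(delay_s: float) -> float:
    """Distance sound covers in air during delay_s seconds."""
    return SPEED_OF_SOUND_M_PER_S * delay_s


def acoustic_delay_s(distance_m: float) -> float:
    """Delay caused by distance_m metres between source and listener."""
    return distance_m / SPEED_OF_SOUND_M_PER_S


def sample_period_s(sample_rate_hz: float) -> float:
    """Duration of one sample at the given sample rate."""
    return 1.0 / sample_rate_hz


if __name__ == "__main__":
    cases = [
        ("10 ms", 0.010),
        ("1 ms", 0.001),
        ("1 sample at 48 kHz", sample_period_s(48_000)),
    ]
    for label, seconds in cases:
        print(f"{label}: {distance_travelled_m(seconds) * 100:.2f} cm")
```

Going the other way, `acoustic_delay_s(5.0)` gives the delay for a back row sitting 5 m behind the front row, which lands in the same ~10 ms range.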
Does this kind of delay matter in the real world? Do recording engineers pay any attention to getting the microphones located within a few centimeters of the instrument? Have you heard anybody complaining that a symphony recording has annoying delays between instruments? No, you haven’t. Latencies of this kind simply don’t matter in the real world.
So why all this low latency bullshit in audio programming? Why is an application with “zero latencies” considered superior to an application that does “low enough” latencies? Getting reasonable latencies (down to ~10 ms) is trivial. You don’t need any special techniques to get there. Maybe the problem is that this is not sexy enough.
There are many situations where latencies need to be kept under precise control. However, in these situations there is no need to push the application or the device to work in some special hyper low latency mode. Instead, very simple math can be used to correlate different streams with each other. This does not require clever programming, just some elementary school math. When two or more audio streams (locked to the same sample clock) are started at the same moment (sample), they will keep counting at the same rate forever. This means that sample N will always belong to the same moment of time on all the streams.
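The “elementary school math” can be sketched roughly like this in Python. It assumes both streams share one sample clock, were started on the same sample, and that the capture path has a known, constant delay measured in samples. The names `sample_time_s`, `align_capture` and `fixed_delay_samples` are hypothetical and not taken from any audio API.

```python
from typing import List


def sample_time_s(sample_index: int, sample_rate_hz: int) -> float:
    """Moment of time (seconds since stream start) that sample N belongs to."""
    return sample_index / sample_rate_hz


def align_capture(captured: List[float], fixed_delay_samples: int) -> List[float]:
    """Drop a known, constant delay so captured[n] lines up with playback[n].

    fixed_delay_samples is an assumed, previously measured value; because
    both streams count at the same rate, it never changes after the start.
    """
    if fixed_delay_samples < 0:
        raise ValueError("delay must not be negative")
    return captured[fixed_delay_samples:]


if __name__ == "__main__":
    playback = [0.0, 1.0, 0.5, -0.5]
    delay = 3  # assumed fixed delay of the capture path, in samples
    captured = [0.0] * delay + playback
    print(align_capture(captured, delay) == playback)
    print(sample_time_s(48_000, 48_000))
```

The point of the sketch is that the shift is applied once, as an index offset. Nothing about it requires the device to run with tiny buffers.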
There is also one special situation where the driver/device level latencies matter. This is when the latencies can accumulate in one way or another. However (IMHO) this can only happen if the overall application design is busted.
So next time you hear/see somebody talking about the lowest possible latencies, you know this guy is potentially a clueless idiot. In particular if he is blaming the lack of super duper real time extensions in the kernel for a hiccup caused by the application.
Remember that for any level of latency the application must be able to do the processing for every single latency period within the previous latency period. If the latency is 100 ms, this is pretty much guaranteed to work in all cases (even if you do some heavy computing in the background). However, if the latency is 1 ms, the operating system must be able to reschedule the application to run once during every single 1 ms period. Any time the time window gets missed there will be some kind of hiccup in the audio. This may be possible by using nasty real time kernel hacks, at least if you can tolerate occasional clicks in the audio.
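To put rough numbers on the deadline argument, here is a deliberately simplified Python model. It assumes a period is missed whenever the scheduling delay plus the processing time exceeds the period length. The wakeup delays in the example are invented values for illustration, not measurements, and all function names are hypothetical.

```python
from typing import Iterable


def period_frames(latency_s: float, sample_rate_hz: int) -> int:
    """Number of sample frames the application must produce per period."""
    return round(latency_s * sample_rate_hz)


def wakeups_per_second(latency_s: float) -> float:
    """How often the OS has to schedule the application."""
    return 1.0 / latency_s


def count_missed_periods(wakeup_delays_s: Iterable[float],
                         processing_time_s: float,
                         latency_s: float) -> int:
    """Count periods where late scheduling plus processing overruns the window."""
    return sum(1 for delay in wakeup_delays_s
               if delay + processing_time_s > latency_s)


if __name__ == "__main__":
    # Invented scheduling delays: mostly small, with one 3 ms stall.
    delays = [0.0002, 0.003, 0.0001]
    for latency in (0.100, 0.001):
        print(latency,
              period_frames(latency, 48_000),
              wakeups_per_second(latency),
              count_missed_periods(delays, 0.0005, latency))
```

In this toy model a single 3 ms stall is harmless with a 100 ms period but costs one period (an audible hiccup) with a 1 ms period.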