UPDATED: See below.

Okay, so the latest: we’re pretty sure this is not actually xorg now. We’re back to session saves. Not I/O in general: specifically session saves, which is to say, saving the entire project.

See, the every two-minutes thing turned out to be a new feature in Ardour I hadn’t noticed: scheduled auto-saves, which turned out to be… every two minutes. Saves also happen whenever you enable master record, which is the other time I see it. So we’re pretty damn sure it’s Save Session.

We know it’s not I/O in general. Recording is actually far more I/O-intensive, and once record is enabled and the save is done, you can record all you want without any problems. Bouncing existing material is also a complete non-issue.

It’s also not a filesystem issue: it happens even on a RAM disk, which is faster than anything else. And the behaviour reproduces perfectly on my non-USB on-motherboard Intel HD Audio card, so it’s not USB.

Now, to get into more details, I’ve gone digging deep into Ardour source code. BUT I HAVE AN IDEA, so bear with me.

In the source code, most of the save path lives in libs/ardour/session_state.cc.

Save works fine when plugins are deactivated, but triggers xruns – buffer under- or overruns, which happen when audio processing can’t finish within the buffer period, i.e. DSP load momentarily exceeds 100% – when plugins are active.

That’s any kind of plugin, and it doesn’t seem to matter how few.

Save Session calls a lot of things, including get_state(), which in turn gets latency data from plugins via (eventually) latency_compute_run() – the code for which is the same (!) in both the LV2 and LADSPA plugin interfaces.

latency_compute_run() calculates the latency by actually running the plugin. Not a copy: it runs, in place, the actual plugin instance that’s in use.

This is all in here:

latency_compute_run() activates the plugin even if it’s already activated (!), then deactivates it on exit (which I guess is stacked somehow, because plugins don’t deactivate in Ardour itself), and runs a second thread against the same instance of the plugin – presumably, because how else would it work?
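For context, an impulse-based latency probe of this general kind works by feeding the plugin a single-sample impulse and finding the first non-zero output sample. This is a sketch of the technique only – the names here are hypothetical, not Ardour’s actual API:

```python
# Sketch of an impulse-based plugin latency probe (the general
# technique; `process` and `probe_latency` are hypothetical names,
# not Ardour's actual functions).

def probe_latency(process, n_frames=8192):
    """Feed a unit impulse through `process` (a block-based plugin
    callback) and return the index of the first non-zero output
    sample, i.e. the plugin's reported latency in samples."""
    impulse = [0.0] * n_frames
    impulse[0] = 1.0
    out = process(impulse)
    for i, sample in enumerate(out):
        if abs(sample) > 1e-9:
            return i
    return None  # plugin produced only silence

# A toy "plugin" that delays its input by 64 samples:
def delay_64(buf):
    return [0.0] * 64 + buf[:-64]

print(probe_latency(delay_64))  # → 64
```

The hazard described above isn’t the probe itself – it’s running it against the live, in-use instance from a second thread while the audio thread may also be processing it.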

This strikes me as a minefield.

And so, a hypothesis: this is forcing the hyperthreaded, speculatively-executing Intel CPU I have into pipeline flushes and re-execution, via bad prediction and/or bad hyperthreading.

The penalty for that is large in Intel land, and I’ve seen commentary to the effect that it’s especially large in the Intel Core series I have. I suspect the two concurrent runs of the active plugin may be continually invalidating each other’s cache lines (!) for the duration of the latency test. It may even be causing the on-chip cache to be thrown out entirely.

This would explain why it stops being an issue when the plugin is not active.


ETA: Brent over on Facebook pointed me at this 5-year-old bug, which led me to try fencing Ardour off to a single CPU. And when I do that… the problem goes away. Now, this sounds terrible, but I’m finding even with my semi-pathological test project (which I built to repro this problem) I can get down to 23-ish ms latency with a good degree of safety. So clearly, no matter what’s happening, it does. not. like. multicore.

That said, with hardware monitoring (which I have) that’s plenty good enough. I could live with 60ms if I knew it was safe. 23ms being safe (and 11.7 being mostly ok but a little iffy)? Awesome. Still: what is this?
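For anyone wanting to reproduce the fencing trick: from the shell it’s `taskset -c 0 <command>`, and on Linux the same thing can be done from inside a process. A minimal sketch, assuming Linux (these are real stdlib calls, but CPU 0 as the target core is just my example choice):

```python
import os

# Pin the current process (pid 0 = ourselves) to CPU core 0 only.
# Linux-specific; equivalent in effect to `taskset -c 0 <command>`.
os.sched_setaffinity(0, {0})

# Confirm the new affinity mask.
print(sorted(os.sched_getaffinity(0)))  # → [0]
```

Note this pins the whole process, including any threads it spawns afterwards – which is exactly the “fence Ardour off to a single CPU” experiment above.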

ETA2: las, who wrote most of and manages the plugin code, popped on and said what I described would totally happen … except the latency recalculation doesn’t actually get called during save. I appear to have just misread the code, which is easy to do when all you have is grep and vi and an unfamiliar codebase.

ETA3: Well, hey! Turns out that setting Input Device and Output Device separately to the same device – instead of setting Interface to the device and leaving the input and output devices at their default assignments – makes JACK load the device handler twice, as two instances: once for input, once for output. Thanks to rgareus on Ardour Chat for that pointer.

I can see how they get there, but there really ought to be a warning dialogue if you do that.
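At the jackd command line, the distinction looks roughly like this – a sketch only, and `hw:USB` is a placeholder device name, not my actual hardware:

```shell
# One duplex instance of the ALSA backend handling both directions
# (what setting Interface to the device amounts to):
jackd -d alsa -d hw:USB

# Separate capture and playback devices. Even when both point at the
# same hardware, this is roughly what the split Input Device /
# Output Device setting produces:
jackd -d alsa -C hw:USB -P hw:USB
```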

That means on a single-processor I can get down to 5.6ms latency and past my pathological repro tests cleanly. This is the kind of performance I’ve been expecting out of this box – at a minimum. Attained. I could in theory not even hardware monitor at these speeds – tho’ you really want to be down around 3ms for that ideally. (I can actually kinda run at 2.8ms – but it’s dodgy.) Since I have hardware monitoring I’m setting it all the way up to 11.6ms just to keep DSP numbers down. But any way you look at it – this is awesome.
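Those figures line up neatly with JACK buffer sizes, since one period of latency is just frames divided by sample rate. A quick worked check, assuming a 44.1 kHz sample rate (my assumption – the post doesn’t state it, and round-trip latency adds further periods plus converter delay):

```python
# One JACK period of latency in milliseconds: frames / sample_rate.
# 44.1 kHz is assumed; the post's figures are close to these values.
def period_ms(frames, rate=44100):
    return frames / rate * 1000

for frames in (128, 256, 512, 1024):
    print(frames, round(period_ms(frames), 1))
# 128  → 2.9   (the "kinda works but dodgy" setting)
# 256  → 5.8   (the comfortable post-fix setting)
# 512  → 11.6  (the hardware-monitoring setting)
# 1024 → 23.2  (the pre-fix safe setting)
```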

I was really hoping to get this system back to usability before heading off, and – success! Thanks to everybody who threw out ideas, even if they didn’t work, because at least there are things we get to rule out when that happens.

Also, I’ve started putting together a dev environment (with help from Tom – thanks!) so I can explore this further when I get back into town. Saves shouldn’t be doing this. It’d be one thing if it only happened writing to a hard disk and not to a RAM disk – that’d be fine. But to a RAM disk? No. Just… no. And the processor-core thing, and the plugins-active-vs-not thing, are just odd. Maybe I can find it.