This article describes some of the existing infrastructure for Linux applications and/or audio processing modules to exchange audio and MIDI data. It reflects the personal experience of the author, so it is necessarily not exhaustive and may contain some subjective bias.

A first distinction must be made between data transport among different applications and data transport inside a single application.

Data transport between applications

In this case there are at least two different applications, each one having its own independent execution environment (process). These applications can exchange data via an external service, normally some kind of third party transport daemon running in parallel.

Most of the external transport services take the form of an audio server (daemon) offering an API to its client applications that includes, among other functions:

  • Registering the application with the service as a client.
  • Registering one or more data input/output ports.
  • Managing connections between any existing compatible ports registered with the server.

Transport services also provide built-in ports for the hardware devices found in the system, like any audio or MIDI channels detected in the installed soundcards.

Some simple client applications will connect their ports automatically to sensible defaults as a convenience. For instance, many media players will automatically connect their audio output ports to the hardware playback ports. (Note the distinction between capture/playback, input/output or read/write ports: hardware audio playback channels, which ultimately output sound to the speakers, are actually audio input ports from the point of view of the transport server, since other applications write into them the data coming from their own output ports.)
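A small, hedged illustration of this point using the jack API (described in the next section): asking the server for its physical input ports returns the hardware playback ports, e.g. system:playback_1. The client name “list-ports” is made up for the example; it can be compiled with gcc list-ports.c -o list-ports -ljack.

    /* List the ports other clients can write audio to: the hardware playback ports. */
    #include <stdio.h>
    #include <jack/jack.h>

    int main(void)
    {
        jack_client_t *client = jack_client_open("list-ports", JackNullOption, NULL);
        if (!client)
            return 1;

        /* Physical + input = hardware playback channels, from the server's point of view */
        const char **ports = jack_get_ports(client, NULL, NULL,
                                            JackPortIsPhysical | JackPortIsInput);
        for (int i = 0; ports && ports[i]; ++i)
            printf("%s\n", ports[i]);

        jack_free(ports);
        jack_client_close(client);
        return 0;
    }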

Some applications also provide some kind of interface so the user can choose where to connect the ports they register.

Finally, there are third-party applications that act as a patchbay. A patchbay's role is to let the user manage the connections between the different existing ports. Some examples are:

  • qjackctl: the “Audio”, “MIDI” and “ALSA” tabs in the connection dialog provide patchbays for the jack-audio, jack-midi and alsa-seq services, respectively.
  • aconnect: a CLI utility that lets the user manage connections between alsa-seq ports.

Audio transport between applications

Nowadays most audio production applications use jack, the Jack Audio Connection Kit, to route audio from one to another. jack provides a callback-based API. Every “jackified application” has to register with the jack daemon (jackd) as a jack client and provide one or several callback functions that will be called by the jackd process when appropriate.

jack callbacks should be carefully designed according to a number of guidelines so they don’t hog the computer’s resources and play nicely with the rest of the jack clients. This is especially important when jackd is running with real-time privileges, where audio processing and routing take precedence over any other process running in the system to guarantee a glitch-free audio stream within the capabilities of the hardware. A badly behaved client could render the system unusable.

jack clients register input and output ports with the jackd server that can then be connected to other jack ports. This routing can be handled in the application itself or via an external patchbay, like qjackctl.
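To make the registration and callback workflow above concrete, here is a minimal sketch of a jack pass-through client, assuming jackd is already running. The client name (“thru”), the port names and the connection to system:playback_1 are only illustrative, and a real client would handle errors and shutdown more carefully. It can be compiled with gcc thru.c -o thru -ljack.

    /* Minimal jack pass-through client (illustrative names). */
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>
    #include <jack/jack.h>

    static jack_port_t *in_port, *out_port;

    /* Process callback: called by jackd once per period.
       It must be real-time safe: no locks, no allocation, no blocking I/O. */
    static int process(jack_nframes_t nframes, void *arg)
    {
        jack_default_audio_sample_t *in  = jack_port_get_buffer(in_port,  nframes);
        jack_default_audio_sample_t *out = jack_port_get_buffer(out_port, nframes);
        memcpy(out, in, nframes * sizeof(jack_default_audio_sample_t));
        return 0;
    }

    int main(void)
    {
        /* Register with jackd as a client called "thru" */
        jack_client_t *client = jack_client_open("thru", JackNullOption, NULL);
        if (!client) {
            fprintf(stderr, "could not connect to jackd\n");
            return 1;
        }
        jack_set_process_callback(client, process, NULL);

        /* Register one audio input port and one audio output port */
        in_port  = jack_port_register(client, "in",  JACK_DEFAULT_AUDIO_TYPE,
                                      JackPortIsInput,  0);
        out_port = jack_port_register(client, "out", JACK_DEFAULT_AUDIO_TYPE,
                                      JackPortIsOutput, 0);

        jack_activate(client);   /* the process callback starts running here */

        /* Connections can be made by the client itself or left to an external
           patchbay such as qjackctl; here we connect to a hardware playback port */
        jack_connect(client, "thru:out", "system:playback_1");

        sleep(30);               /* keep processing for a while, then clean up */
        jack_client_close(client);
        return 0;
    }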

The jackd server also provides ports to the audio hardware devices using different backends depending on the hardware driver. Available back-ends (hardware drivers) may include: alsa, dummy, freebob, firewire, net, oss or sun.

+-------------------------------+
|             jack              |
+------+------------------+-----+
| ALSA | firewire (FFADO) | ... |
+------+------------------+-----+

MIDI transport between applications

As of this writing, MIDI routing is a bit less streamlined than audio routing. There are three external MIDI routing APIs popular with MIDI software authors:

  • alsa-seq: at this moment this is probably the most widely used. This API provides timestamped MIDI event handling. alsa-seq clients register themselves with the system and provide a number of input and/or output ports. Ports are routed from the client applications or via an external patchbay like qjackctl or aconnect (see the sketch after this list). alsa-seq provides ports to the available MIDI hardware devices.
  • alsa-raw: this API provides raw MIDI event handling (thus no timestamping). It is used for very specific applications that access the hardware ports directly, but not for connecting one application to another.
  • jack-midi: the jack server can also provide sample-accurate timestamped MIDI event handling. Again the routing can be done from the client application or using an external patchbay like qjackctl. jack-midi can use alsa-seq or alsa-raw as backends. The FFADO project provides jack-midi ports for FireWire MIDI devices.
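A hedged sketch of what an alsa-seq client looks like: it registers with the sequencer, creates one writable (input) port that can then be connected from aconnect or qjackctl, and waits for a single timestamped event. The client and port names are invented for the example; it can be compiled with gcc seq.c -o seq -lasound.

    /* Minimal alsa-seq client with one writable MIDI input port. */
    #include <stdio.h>
    #include <alsa/asoundlib.h>

    int main(void)
    {
        snd_seq_t *seq;

        /* Register with the ALSA sequencer as a client */
        if (snd_seq_open(&seq, "default", SND_SEQ_OPEN_DUPLEX, 0) < 0) {
            fprintf(stderr, "cannot open the ALSA sequencer\n");
            return 1;
        }
        snd_seq_set_client_name(seq, "example-seq");

        /* Create a port other clients (or aconnect/qjackctl) can connect to */
        int port = snd_seq_create_simple_port(seq, "midi in",
                        SND_SEQ_PORT_CAP_WRITE | SND_SEQ_PORT_CAP_SUBS_WRITE,
                        SND_SEQ_PORT_TYPE_MIDI_GENERIC |
                        SND_SEQ_PORT_TYPE_APPLICATION);
        if (port < 0) {
            fprintf(stderr, "cannot create the port\n");
            return 1;
        }

        /* Block until one timestamped MIDI event arrives and report its type */
        snd_seq_event_t *ev;
        if (snd_seq_event_input(seq, &ev) >= 0)
            printf("received an event of type %d\n", ev->type);

        snd_seq_close(seq);
        return 0;
    }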

It is worth mentioning a2jmidid, which is a daemon that creates a bridge between alsa-seq ports and jack-midi ports, allowing alsa-seq applications to be used in a jack-midi setup.

Data transport between applications and “The Session Issue”

External mechanisms for audio and MIDI data transport between applications are a very convenient framework for developers: they allow applications to communicate while remaining very independent of each other. This is especially relevant because of “The GUI Toolkit Issue”, discussed later in this document.

However, it is not the most convenient framework for the user: a complex project can consist of several applications working together. For instance: a sequencer, an FX plugin rack, a couple of virtual instruments… all of them interconnected in a possibly complex way, perhaps including hardware audio and MIDI ports, too. Every time the user wants to work on that project he/she has to repeat the painstaking job of launching all those applications and recreating the connections among them.

There are some solutions to ease this task:

  • Patchbay configuration files: patchbays like those in qjackctl provide a mechanism to store all the connections for a session in a file, so after the user has launched all the involved applications he/she can recreate the connections by loading this file.
  • Audio session managers: daemons like lash, ladish or, very recently, jackd itself provide a mechanism so clients can register their configuration with a session server. This session can then be stored and later retrieved, so the session manager will take care of launching the clients and recreating their connections. For this to work, all the applications involved in a project must include support for the session protocol in question. The main problem with this approach is that many popular applications don’t include support for session management yet. It is also necessary to introduce a new application, the session manager, which provides the user interface to construct the session, store it and retrieve it.
  • Internal data transport: the third solution would be to do away with the need for different applications, resulting in sessions that use just one host and one or several plugins. The host will load and connect all the plugins configured for the project. This only works if all the modules used by the session can be loaded by the host as plugins, but the fact is that many modules are not available as plugins because they are too complex to fit within one of the available plugin standards or because of “the GUI Toolkit Issue.”

Data transport within one single application

In this case, an application called the host can extend its audio processing/generating capabilities by dynamically loading external libraries (plugins). The routing among modules is handled internally by the application, so the external infrastructure is used only for accessing hardware devices.

Plugins are dynamically loadable libraries that comply with a certain standard that must also be supported by the host. Plugins provide input and/or output ports, control ports and processing functions that are called by the host.

The most widely used audio plugin standards in Linux are:

  • LADSPA: roughly equivalent to VST on Windows, this standard doesn’t provide MIDI data handling, so it is only used for audio processing or generation. LADSPA doesn’t provide any support for custom GUIs, so the host must programmatically generate a generic one. (A sketch of how a host loads a LADSPA library follows after this list.)
  • DSSI: roughly equivalent to VSTi, DSSI is aimed at virtual instruments. Designed as an extension to LADSPA, it provides, among other features, MIDI event handling, MIDI-controller automation of control ports and off-process GUIs using OSC as the IPC mechanism. The DSSI specification doesn’t support multi-channel instruments.
  • LV2: an evolution of LADSPA, LV2 is an extensible specification. The core specification is equivalent to LADSPA. Some of the available extensions include: MIDI ports, in-process GUIs, off-process GUIs, string ports, port grouping… If a plugin requires one or more of these extensions, the host must also support them in order to run the plugin. As of this writing LV2 core is not yet supported by many of the most popular host applications. There are two plugin racks: lv2rack for audio effects and zynjacku for virtual instruments.
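To illustrate the plugin mechanism with the simplest of these standards, here is a hedged sketch of how a host might load a LADSPA library and list the plugins it contains. The path below points to the example amplifier shipped with the LADSPA SDK and is only an assumption; any LADSPA library would do. It can be compiled with gcc list-ladspa.c -o list-ladspa -ldl (the ladspa.h header must be installed).

    /* Load a LADSPA library and enumerate the plugin descriptors it exports. */
    #include <stdio.h>
    #include <dlfcn.h>
    #include <ladspa.h>

    int main(void)
    {
        void *lib = dlopen("/usr/lib/ladspa/amp.so", RTLD_NOW);
        if (!lib) {
            fprintf(stderr, "%s\n", dlerror());
            return 1;
        }

        /* Every LADSPA library exports this single entry point */
        LADSPA_Descriptor_Function descfn =
            (LADSPA_Descriptor_Function) dlsym(lib, "ladspa_descriptor");
        if (!descfn) {
            fprintf(stderr, "not a LADSPA library\n");
            return 1;
        }

        /* Each descriptor describes one plugin: its ports and the processing
           functions (instantiate, connect_port, run...) that the host will call */
        for (unsigned long i = 0; ; ++i) {
            const LADSPA_Descriptor *d = descfn(i);
            if (!d)
                break;
            printf("%lu: %s (%s), %lu ports\n", i, d->Name, d->Label, d->PortCount);
        }

        dlclose(lib);
        return 0;
    }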

Audio plugins and “The GUI Toolkit Issue”

A plugin GUI is a dialog containing widgets that display the values of the plugin's control ports and allow the user to change them.

Any GUI host can use the port description information provided by a plugin to algorithmically create a generic GUI, assigning a suitable widget to every control port. However, these generic GUIs tend to be badly organized and change from host to host.

Most users will find it more practical for the plugin author to also provide a custom GUI, so that its layout is optimized, practical and consistent across different hosts.

There are two main approaches to plugin GUIs in Linux:

  • Embedded (in-process) GUIs: the plugin GUI is created and executed by the host process and is included in the host GUI event loop. For this to be possible, both plugin and host must use the same GUI toolkit (there is actually some talk about the possibility of mixing toolkits in one event loop, but no one has come up with a workable solution yet.) And herein lies the problem: in Linux there are many GUI toolkits available. The most widely used ones are Qt and Gtk, but there are many more: wxWidgets, FLTK, Motif, FOX, Tk… each with its own particular focus and strengths. This lets every developer choose the toolkit that better suits his/her technical needs and programming style, but mixing toolkits in one single application is very complicated, if at all possible. Two solutions for this predicament are:
    • Do as VSTgui: settle on one of the existing toolkits, or create yet another one, and mandate that every plugin and host author use it. This is not likely to go down well in a community built around the concepts of freedom and choice.
    • Define an abstract GUI description language and create a parser and renderer for every different toolkit out there. The problem with this solution is how to determine the capabilities of this hypothetical language so that it is powerful and open enough to allow arbitrarily complex GUIs without making the implementation of the renderer library an unmaintainable nightmare.
  • External (off-process) GUIs: the GUI is a stand-alone application with its own process and event loop that communicates with the host via some IPC mechanism. This is the approach used by DSSI, which uses the OSC protocol as transport (a sketch of the idea follows below). External GUIs solve the toolkit problem, since they allow developers to use whichever toolkit they wish, but they have their own set of problems: each GUI is an independent top-level window (which some people find disorganized), they consume more resources, they are less responsive, and so on.
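As a hedged sketch of the off-process idea only (not of DSSI's actual OSC interface), the snippet below shows a GUI process sending a control-port change to its host over OSC using liblo. The OSC path and port number are invented for the example; it can be compiled with gcc gui.c -o gui -llo.

    /* Off-process GUI side: send one control change to the host over OSC. */
    #include <lo/lo.h>

    int main(void)
    {
        /* Address of the host's OSC server (hypothetical port) */
        lo_address host = lo_address_new("localhost", "9000");

        /* Tell the host that control port 3 should now take the value 0.5 */
        lo_send(host, "/example/control", "if", 3, 0.5f);

        lo_address_free(host);
        return 0;
    }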

To summarize, many clever people have dedicated a long time to thinking this problem through and discussing different possibilities, but the fact is that there is no perfect solution that every single developer can agree upon.