Learning Track Player - POC

March 23, 2025 · 6 min read

Jeroen Janssen

I build a Proof of Concept (POC) Multi-Channel Opus Player for Learning Tracks

Press the below "Play" button to start playback, and use the volume and panning sliders to change the mix. See the full explanation after the player.

Shower The People - Player

The original vocal arrangement by Juan Garcia as recorded bei Klangbezirk can be found on the MusikalSpezial website.

Overview

This Proof of Concept uses:

Opus Audio Compression to store the different parts of the learning track in separate channels.
- The example opus file I generated is 10Mb for 9 channels of data (4 voices, 4 piano and 1 click track)
Ogg Description Metadata to store the names of the parts of the learning track
WebAssembly version of the Ogg Opus Decoder
WebAudio for playback and control of panning & volume

I build this Proof Of Concept with the help of Claude and GitHub CoPilot.

Why?

The inspiration for this POC was re-ignited while working on the Examples - Custom Playback on Website section of my Creating Vocal Learning Tracks documentation.
I already had some past experience with the Opus codec and its support for multi-channel audio.

The idea is simple

Instead of preparing all the mixes in advance, enable a singer to just choose what mix they want on demand.

Using LLMs for the POC

I wanted to get some more practical experience in using LLMs for prototyping software.

There are lot of online talks on AI and LLMs, but until you are actually used it to solve a practical problem it is difficult to really have a better understanding.

warning

You still need to know what you are doing.
The generated code is not always correct and that requires debugging (or re-prompting).

The first step: Encoding the Learning track

This was my first prompt in Claude:

I am interested in using the opus audio codec multiple channel support as a way 
to store multiple voice parts of a choir arrangement (soprano, alto, tenor, bass). 
The idea is to generate an opus file containing all voices where each voice is 
stored in its own channel. Then during playback it should be possible to decide 
on the final volume and panning of the channels so a listener has the option to 
choose their own custom mix. Give me a short overview of steps to create such an 
opus file using s.wav,a.wav,t.wav and b.wav and also give an overview how to 
playback such an opus file using browser apis. 
As an alternative list what technology could be best used to reach these goals

This resulted in (abbreviated version):

Creating a multi-channel Opus file from separate choir voice recordings is definitely possible. Opus supports up to 255 channels, making it suitable for your SATB (Soprano, Alto, Tenor, Bass) arrangement.
You can create a multi-channel Opus file using FFmpeg
Playback using the Web Audio API in modern browsers
Alternative Technologies

If Opus doesn't suit your needs, consider these alternatives:
- WebM with Vorbis or Opus codec - Good browser support, similar capabilities
- Web Audio API with individual audio files - Load each voice separately and mix them at runtime
- Stem files (used in music production) - Store multiple tracks in a format designed for remixing
- WebAssembly audio mixing framework - For more complex real-time processing
- Ogg Vorbis - Another open audio codec with multi-channel support
The Web Audio API approach with separate files might actually be easier to implement and gives you more flexibility for custom mixing without having to re-encode.

And after an additional prompt on storing metadata about the channels:

Does opus have a way to store metadata about each channel to store the voice name?

After a while I ended up with an ffmpeg commandline to encode my input files:

ffmpeg -i ./data/shower-the-people/S.wav -i ./data/shower-the-people/A.wav -i ./data/shower-the-people/T.wav -i ./data/shower-the-people/B.wav -i ./data/shower-the-people/S-P.wav -i ./data/shower-the-people/A-P.wav -i ./data/shower-the-people/T-P.wav -i ./data/shower-the-people/B-P.wav -i ./data/shower-the-people/Click.wav \
-filter_complex "[0:a][1:a][2:a][3:a][4:a][5:a][6:a][7:a][8:a]amerge=inputs=9[aout]" \
-map "[aout]" -c:a libopus -b:a 288k -mapping_family 255 \
-metadata comment="Channel order: S, A, T, B, S-P, A-P, T-P, B-P, Click" \
./data/shower-the-people/output.opus

Oops: Opus Mapping Family 255

Now that I have this output.opus file containing all the tracks, it is time to see if I can also actually playback the audio in the browser. And this is where I ran into a problem, because I couldn't get the generated opus file loading correctly using the audioContext.decodeAudioData method.

Error

Error loading audio file: EncodingError: Decoding failed

I confirmed I could open the file in Audacity, but it didn't work from the browser. And I believe this is because of the mapping_family 255 option that I actually used because I needed 9 channels.
The other mappings mostly relate to surround sound where channels have a specific location meaning (front left, front center, side left, rear right, etc).

From the Ogg Encapsulation Specification - Channel Mapping Family 255

Channel Mapping Family 255

Allowed numbers of channels: 1...255. No defined channel meaning.

Channels are unidentified. General-purpose players SHOULD NOT attempt to play these streams. Offline implementations MAY deinterleave the output into separate PCM files, one per channel. Implementations SHOULD NOT produce output for channels mapped to stream index 255 (pure silence) unless they have no other way to indicate the index of non-silent channels.

Next step: decoding the Opus file 'manually' in the browser.
(I am splitting this into 2 parts, since it is taking a fair bit of time to read back through my notes and write this post)

Shower The People - Player​

Overview​

Why?​

Using LLMs for the POC​

The first step: Encoding the Learning track​

Oops: Opus Mapping Family 255​