<h1 id="implementing-low-latency-audio-output-duplex">Implementing low-latency shared/exclusive mode audio output/duplex</h1>
<p>Audio output and duplex is actually quite tricky, and even libraries like RtAudio get it wrong. If you’re writing an app that needs low-latency audio without glitches, the proper implementation architecture differs between apps talking to pull-mode (well-designed, low-latency) mixing daemons and apps talking to hardware. (I hear push-mode mixing daemons are incompatible with low latency; I discuss this at the end.) This is my best understanding of the problem right now.</p>
<h2 id="prior-art">Prior art</h2>
<p>There are some previous resources on implementing ALSA duplex, but I find them to be unclear and/or incomplete:</p>
<ul>
<li><a href="https://git.alsa-project.org/?p=alsa-lib.git;a=blob;f=test/latency.c">https://git.alsa-project.org/?p=alsa-lib.git;a=blob;f=test/latency.c</a>; gets the “write silence” part right but doesn’t explain what it’s doing, and the main loop is confusing.</li>
<li><a href="https://web.archive.org/web/20211003144458/http://www.saunalahti.fi/~s7l/blog/2005/08/21/Full%20Duplex%20ALSA">https://web.archive.org/web/20211003144458/http://www.saunalahti.fi/~s7l/blog/2005/08/21/Full%20Duplex%20ALSA</a> gets the “write silence” part right, but doesn’t know <em>why</em> it’s necessary.</li>
<li><a href="http://equalarea.com/paul/alsa-audio.html#duplexex">http://equalarea.com/paul/alsa-audio.html#duplexex</a> says: “The interrupt-driven example represents a fundamentally better design for many situations. It is, however, rather complex to extend to full duplex. This is why I suggest you forget about all of this… In a word: JACK.” However, this doesn’t answer the question of how <em>JACK</em> itself implements full duplex audio.</li>
</ul>
<h2 id="alsa-terminology">ALSA terminology</h2>
<p>These are some background terms which are helpful to understand before writing an audio backend.</p>
<p><strong>Sample:</strong> one amplitude in a discrete-time signal, or the time interval between an ADC generating or DAC playing adjacent samples.</p>
<p><strong>Frame:</strong> one sample of time, or one sample across all audio channels.</p>
<p><strong>Period:</strong> Every time the hardware record/play point advances by this many frames, the app is woken up to read or generate audio. In most ALSA apps, the hardware period determines the chunks of audio read, generated, or written.</p>
<p>However, you can read and write arbitrarily sized chunks of audio anyway, and query the exact point where the hardware is writing or playing audio at any time, even between periods. For example, PulseAudio’s and PipeWire’s ALSA backends ignore/disable periods altogether, and instead fetch and play audio based on a variable-interval OS timer loosely synchronized with the hardware’s write and play points.</p>
<ul>
<li>PipeWire (timer-based scheduling) experiences extra latency with batch devices (<a href="https://gitlab.freedesktop.org/pipewire/pipewire/-/wikis/FAQ#pipewire-buffering-explained">link</a>), and PulseAudio used to turn off timer-based scheduling for batch devices (<a href="https://www.alsa-project.org/pipermail/alsa-devel/2014-March/073816.html">link</a>).</li>
<li>On the other hand, Paul Davis says conventional <em>period-based</em> scheduling struggles <em>more</em> than timer-based (PulseAudio, PipeWire) for batch devices (<a href="https://blog.linuxplumbersconf.org/2009/slides/Paul-Davis-lpc2009.pdf">link</a> @ “The Importance of Timing”). I’m not sure how to reconcile this.</li>
</ul>
<p><strong>Batch device:</strong> Represented by <code class="language-plaintext highlighter-rouge">SNDRV_PCM_INFO_BATCH</code> in the Linux kernel. I’m not exactly sure what it means. <a href="https://www.alsa-project.org/pipermail/alsa-devel/2014-March/073816.html">https://www.alsa-project.org/pipermail/alsa-devel/2014-March/073816.html</a> says it’s a device where audio can only be sent to the device in period-sized chunks. <a href="https://www.alsa-project.org/pipermail/alsa-devel/2015-June/094037.html">https://www.alsa-project.org/pipermail/alsa-devel/2015-June/094037.html</a> is too complicated for me to understand.</p>
<p><strong>Quantum:</strong> PipeWire’s app-facing equivalent to ALSA/JACK periods.</p>
<p><strong>Buffer size:</strong> the total amount of audio which an input ALSA device can buffer for an app to read, or can be buffered by an app for an output ALSA device to play. Always at least 2 periods long.</p>
<p><strong>Available frames:</strong> The number of frames (channel-independent samples) of audio readable/buffered (for input streams) or writable (for output streams).</p>
<p><strong>“Buffered” frames:</strong> For input devices, this matches available (readable) frames. For output devices, this equals the buffer size minus available (writable) frames.</p>
<p><strong>hw devices, plugins, etc:</strong> See <a href="https://www.volkerschatz.com/noise/alsa.html">https://www.volkerschatz.com/noise/alsa.html</a>.</p>
<h2 id="minimum-achievable-inputoutputduplex-latency">Minimum achievable input/output/duplex latency</h2>
<p>The minimum achievable audio latency at a given period size is achieved by having 2 periods of total capture/playback buffering between the hardware and an app (RtApiAlsa, JACK2, or PipeWire).</p>
<ul>
<li>If an audio daemon mixes audio from multiple apps, it can only avoid adding latency if there is no buffering (but instead synchronous execution) between the daemon and apps. JACK2 in synchronous mode and PipeWire support this, but pipewire-alsa fails this test by default, so ALSA is not a zero-latency way of talking to PipeWire.</li>
</ul>
<p>For duplex streams, the total round-trip (microphone-to-speaker) latency is <code class="language-plaintext highlighter-rouge">N</code> periods (the maximum amount of buffered audio in the output buffer). <code class="language-plaintext highlighter-rouge">N</code> is always ≥ 2 and almost always an integer.</p>
<p>For capture and duplex streams, there are <code class="language-plaintext highlighter-rouge">0</code> to <code class="language-plaintext highlighter-rouge">1</code> periods of capture (microphone-to-screen) latency (since microphone input can occur at any time, but is always processed at period boundaries).</p>
<p>For playback and duplex streams, there are <code class="language-plaintext highlighter-rouge">N-1</code> to <code class="language-plaintext highlighter-rouge">N</code> periods of playback (keyboard-to-speaker) latency (since keyboard input can occur at any point, but is always converted into audio at period boundaries).</p>
<p>These values only include delay caused by audio buffers, and exclude extra latency in the input stack, display stack, sound drivers, resamplers, or ADC/DAC.</p>
<p>Note that this article doesn’t cover the advantages of extra buffering, like smoothing over hitches, or JACK2 async mode ensuring that an app that stalls won’t cause the system audio and all apps to xrun. I have not studied JACK2 async mode though.</p>
<h2 id="avoid-blocking-writes-both-exclusive-and-shared-output-only">Avoid blocking writes (both exclusive and shared, output only)</h2>
<p>If your app generates one output period of audio at a time and you want to minimize keypress-to-audio latency, then regardless of whether your app outputs to hardware devices or pull-mode daemons, it should never rely on blocking writes to act as output backpressure. Instead it should wait until 1 period of audio is writable, <em>then</em> generate 1 period of audio and nonblocking-write it. (This does not apply to duplex apps, since waiting for available <em>input</em> data effectively acts as <em>output</em> throttling.)</p>
<p>If your app generates audio <em>before</em> performing blocking writes for throttling, you will generate a new period of audio as soon as the previous period of audio is written (a full period of real time before a new period of audio is writable). This audio gets buffered for an extra period (while <code class="language-plaintext highlighter-rouge">snd_pcm_writei()</code> blocks) before reaching the speakers, so <strong>external (eg. keyboard) input takes a period longer to be audible.</strong></p>
<p>(Note that avoiding blocking writes isn’t necessarily beneficial if you don’t generate and play audio in chunks synchronized with output periods.)</p>
<p><strong>Issue:</strong> RtAudio relies on blocking <code class="language-plaintext highlighter-rouge">snd_pcm_writei</code> in pure-output streams. This adds 1 period of keyboard-to-speaker latency to output streams. (It also relies on blocking <code class="language-plaintext highlighter-rouge">snd_pcm_writei</code> for duplex streams, but this is essentially harmless since RtAudio first blocks on <code class="language-plaintext highlighter-rouge">snd_pcm_readi</code>, and by the time the function returns, if the input and output streams are synchronized <code class="language-plaintext highlighter-rouge">snd_pcm_writei</code> is effectively a nonblocking write call.)</p>
<h3 id="alsa-blocking-readswrites-vs-snd_pcm_wait-vs-poll">ALSA: blocking reads/writes vs. snd_pcm_wait() vs. poll()</h3>
<p>Making a blocking call to <code class="language-plaintext highlighter-rouge">snd_pcm_readi()</code> before generating sound is basically fine and does not add latency relative to nonblocking reads (<code class="language-plaintext highlighter-rouge">snd_pcm_sw_params_set_avail_min(1 period)</code> during setup, and calling <code class="language-plaintext highlighter-rouge">snd_pcm_wait()</code> before every read).</p>
<p>On the other hand, generating sound then making a blocking call to <code class="language-plaintext highlighter-rouge">snd_pcm_writei()</code> (in output-only streams) adds a full period of keyboard-to-speaker latency relative to nonblocking writes (<code class="language-plaintext highlighter-rouge">snd_pcm_sw_params_set_avail_min(unused_buffer_size + 1 period)</code> during setup, and calling <code class="language-plaintext highlighter-rouge">snd_pcm_wait()</code> before generating and writing audio).</p>
<p><code class="language-plaintext highlighter-rouge">poll()</code> has the same latency as <code class="language-plaintext highlighter-rouge">snd_pcm_wait()</code> and is more difficult to set up. The advantage is that you can pass in an extra file descriptor, allowing the main thread to interrupt the audio thread if <code class="language-plaintext highlighter-rouge">poll/snd_pcm_wait()</code> is stuck waiting on a stalled ALSA device. (I’m not sure if stalled ALSA is common, but I’ve seen stalled shared-mode WASAPI happen.)</p>
<h2 id="avoid-buffering-shared-output-streams-output-and-duplex">Avoid buffering shared output streams (output and duplex)</h2>
<p>Most apps use shared-mode streams, since exclusive-mode streams take up an entire audio device, preventing other apps from playing sound. Shared-mode streams generally communicate with a userspace audio daemon<sup id="fnref:1" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup>, which is responsible for mixing audio from various programs and feeding it into hardware sound buffers, and ideally even routing audio from app to app.</p>
<p>If an app needs an output-only or duplex shared-mode stream, and must avoid unnecessary output latency, it should not buffer output audio itself (or generate audio <em>before</em> performing a blocking write, discussed above). Instead it should wait for the daemon to request output audio (and optionally provide input audio), <em>then</em> generate output audio and send it to the daemon. This minimizes output latency, and in the case of duplex streams, enables <em>zero-latency</em> app chaining between apps in an audio graph! To achieve this, the pull-mode mixing daemon (for example JACK2 or PipeWire) requests audio from the first app, and synchronously passes it to later apps within the <em>same period</em> of real-world time. Sending audio through two apps in series has zero added latency compared to sending audio through one app. The downside is that if you chain too many apps, JACK2 can’t finish ticking all the apps in a single period, and fails to output audio to the speakers in time, resulting in an audio glitch or xrun.</p>
<p><strong>Issue:</strong> Any ALSA app talking to pulseaudio-alsa or pipewire-alsa (and possibly any PulseAudio app talking to pipewire-pulse) will perform extra buffering. Hopefully RtAudio, PortAudio, etc. will all add PipeWire backends someday (SDL2 already has it: <a href="https://www.phoronix.com/scan.php?page=news_item&px=SDL2-Lands-PipeWire-Audio">https://www.phoronix.com/scan.php?page=news_item&px=SDL2-Lands-PipeWire-Audio</a>).</p>
<p>As a result, for the remainder of the article, I will be focusing on using ALSA to talk to <em>hardware</em> devices.</p>
<h2 id="buffer-1-2-periods-in-exclusive-output-streams-output-and-duplex">Buffer 1-2 periods in exclusive output streams (output and duplex)</h2>
<p>It is useful for some apps to open hardware devices directly (such that no other app can output or even receive audio), using exclusive-mode APIs like ALSA. These apps include audio daemons like PipeWire and JACK2 (which mix audio output from multiple shared-mode apps), or DAWs (which occupy an entire audio device for low-latency low-overhead audio recording and playback).</p>
<p>Apps which open hardware in exclusive mode must handle output timing in real-world time themselves. They must read input audio as the hardware writes it into buffers, and send output audio to the buffers <em>ahead</em> of the hardware playing it back.</p>
<p>In well-designed duplex apps that talk to hardware, such as JACK2 talking to ALSA, the general approach is:</p>
<ul>
<li>Pick a mic-to-speaker delay (called <code class="language-plaintext highlighter-rouge">used_buffer_size</code> and measured in frames).</li>
<li>Pick a period size, which divides <code class="language-plaintext highlighter-rouge">used_buffer_size</code> into <code class="language-plaintext highlighter-rouge">N</code> periods. <code class="language-plaintext highlighter-rouge">N</code> is usually an integer ≥ 2.</li>
<li>Tell ALSA to allocate an input and output buffer, each of size ≥ <code class="language-plaintext highlighter-rouge">used_buffer_size</code>, each with the correct period size.</li>
<li>Write <code class="language-plaintext highlighter-rouge">used_buffer_size</code> frames of silence to the output</li>
</ul>
<p>Then loop:</p>
<ul>
<li>wait for 1 period/block of input to be available/readable, and 1 period/block of output to play and be available/writable. JACK2 uses <code class="language-plaintext highlighter-rouge">poll()</code>; if you don’t need cancellation, you can use <code class="language-plaintext highlighter-rouge">snd_pcm_wait()</code> or even a blocking <code class="language-plaintext highlighter-rouge">snd_pcm_readi()</code>.</li>
<li>read 1 period of input, and pass it to the user callback which generates 1 period of output</li>
<li>write 1 period of output into the available/writable room</li>
</ul>
<h2 id="implementing-exclusive-mode-duplex-like-jack2">Implementing exclusive-mode duplex like JACK2</h2>
<p>JACK2’s ALSA backend, and this guide, assume the input and output device in a duplex pair share the same underlying sample clock and never go out of sync. Calling <code class="language-plaintext highlighter-rouge">snd_pcm_link()</code> on two streams is supposed to succeed if and only if they share the same sample clock, buffer size and period count, etc. (the exact criteria are undocumented, and I didn’t read the kernel source yet). If it succeeds, it not only starts and stops the streams together, but is supposed to synchronize the input’s write pointer and the output’s read pointer.</p>
<p>PipeWire supports rate-matching resampling (<a href="https://gitlab.freedesktop.org/pipewire/pipewire/-/wikis/FAQ#how-are-multiple-devices-handled">link</a>), but (like timer-based scheduling) it introduces a great deal of complexity (<em>heuristic</em> clock skew estimation, resampling latency compensation), which I have not studied, is out of scope for opening a simple duplex stream, and <em>actively detracts</em> from learning the fundamentals.</p>
<p>Note that <code class="language-plaintext highlighter-rouge">unused_buffer_size > 0</code> is also incidental complexity, and not essential to understanding the concepts. Normally <code class="language-plaintext highlighter-rouge">buffer_size = N periods</code>.</p>
<p>On ALSA, you can implement full duplex period-based audio by:</p>
<ul>
<li>Optionally(?) open input and output <code class="language-plaintext highlighter-rouge">snd_pcm_t</code> in <code class="language-plaintext highlighter-rouge">SND_PCM_NONBLOCK</code>.</li>
<li>Set up both the input and output streams with <code class="language-plaintext highlighter-rouge">N</code> periods of audio. <code class="language-plaintext highlighter-rouge">N</code> is selected by the user, and is usually 2-4. (If the device only supports <code class="language-plaintext highlighter-rouge">>N</code> periods of audio, JACK2 can open the device with <code class="language-plaintext highlighter-rouge">>N</code> periods, but simulate <code class="language-plaintext highlighter-rouge">N</code> periods of latency by never filling the output device beyond <code class="language-plaintext highlighter-rouge">N</code> periods.)</li>
<li>Let <code class="language-plaintext highlighter-rouge">used_buffer_size = N periods</code> (in frames). This equals the total <code class="language-plaintext highlighter-rouge">buffer_size</code> unless the device only supports <code class="language-plaintext highlighter-rouge">>N</code> periods.</li>
<li>Let <code class="language-plaintext highlighter-rouge">unused_buffer_size = buffer_size - used_buffer_size</code> (in frames). This equals 0 unless the device only supports <code class="language-plaintext highlighter-rouge">>N</code> periods.</li>
<li>Set up the input and output streams, so software waiting/polling will wake up when the hardware writes or reads the correct amount of data.
<ul>
<li>For the input stream, we want to read as soon as 1 period of data is readable/available, so call <code class="language-plaintext highlighter-rouge">snd_pcm_sw_params_set_avail_min(1 period)</code>. You can skip this call if you open the device without <code class="language-plaintext highlighter-rouge">SND_PCM_NONBLOCK</code> and use blocking <code class="language-plaintext highlighter-rouge">snd_pcm_readi</code>, but to my knowledge <code class="language-plaintext highlighter-rouge">snd_pcm_sw_params_set_avail_min()</code> is not optional in the lower-overhead mmap mode.</li>
</ul>
</li>
<li>The output stream is more complicated if <code class="language-plaintext highlighter-rouge">unused_buffer_size != 0</code>.
<ul>
<li>We want to write 1 period of audio once <code class="language-plaintext highlighter-rouge">buffered ≤ used_buffer_size - 1 period</code> (in frames). And we know <code class="language-plaintext highlighter-rouge">writable/available = buffer_size - buffered</code>. So we want to write audio once <code class="language-plaintext highlighter-rouge">writable/available ≥ unused_buffer_size + 1 period</code>.</li>
<li>Call <code class="language-plaintext highlighter-rouge">snd_pcm_sw_params_set_avail_min(unused_buffer_size + 1 period)</code>, so polling/waiting on the output stream will unblock once that much audio is writable.</li>
</ul>
</li>
<li>For duplex streams, write <code class="language-plaintext highlighter-rouge">N</code> periods of silence. This can be skipped for output-only streams, but JACK2 does it for those too.</li>
<li><code class="language-plaintext highlighter-rouge">snd_pcm_start()</code> the input stream if available, and the output stream if available and not linked to the input.</li>
</ul>
<p>And in the audio loop:</p>
<ul>
<li>Either call <code class="language-plaintext highlighter-rouge">poll()</code> (like JACK2, can wait on multiple fds) or <code class="language-plaintext highlighter-rouge">snd_pcm_wait</code> (simpler, synchronous), to wait until 1 period of room is readable from the input stream and writable to the output stream (excluding <code class="language-plaintext highlighter-rouge">unused_buffer_size</code>).
<ul>
<li>At this point, we have <code class="language-plaintext highlighter-rouge">N-1</code> periods of time to generate audio, before the input buffer runs out of room for capturing audio and the output runs out of buffered audio to play. This is why <code class="language-plaintext highlighter-rouge">N</code> must be greater than 1; if not we have <em>no</em> time to generate 1 period of audio to play.</li>
</ul>
</li>
<li>Read 1 period of audio from the input buffer, generate 1 period of output audio, and write it to the output buffer.
<ul>
<li>Now the output buffer holds <code class="language-plaintext highlighter-rouge">≤ used_buffer_size</code> frames, leaving <code class="language-plaintext highlighter-rouge">≥ unused_buffer_size</code> room writable/available.</li>
</ul>
</li>
</ul>
<h3 id="rtaudio-gets-duplex-wrong-can-have-xruns-and-glitches">RtAudio gets duplex wrong, can have xruns and glitches</h3>
<p><strong>Issue:</strong> RtAudio opens and polls an ALSA duplex stream (in this case, duplex.cpp with <a href="https://github.com/nyanpasu64/rtaudio/tree/alsa-duplex-buffering">extra debug prints added</a>, opening my motherboard’s hw device) by:</p>
<ul>
<li>Don’t fill the output with silence.</li>
<li>Call <code class="language-plaintext highlighter-rouge">snd_pcm_sw_params_set_start_threshold()</code> on both streams (though RtAudio only triggers on the input, which starts both streams).</li>
<li><code class="language-plaintext highlighter-rouge">snd_pcm_link()</code> the input and output streams so they both start at the same time. Set up the streams the same way regardless of whether it succeeds or fails. (On my motherboard audio, it succeeds.)</li>
</ul>
<p>Then loop:</p>
<ul>
<li>Call <code class="language-plaintext highlighter-rouge">snd_pcm_readi(1 period)</code> of input (blocking until available), and pass it to the user callback which generates 1 period of output.
<ul>
<li>Because RtAudio calls <code class="language-plaintext highlighter-rouge">snd_pcm_sw_params_set_start_threshold</code> on the input stream, and the two streams are linked, <code class="language-plaintext highlighter-rouge">snd_pcm_readi()</code> starts both the input and output streams <em>immediately</em> (upon call, not upon return). The output stream is started with no data inside, and tries to play the absence of data. It’s a miracle it doesn’t xrun immediately.</li>
<li>Once the input stream has 1 period of input, <code class="language-plaintext highlighter-rouge">snd_pcm_readi</code> returns. By this point, the output stream has more <code class="language-plaintext highlighter-rouge">snd_pcm_avail()</code> than the total buffer size, and <em>negative</em> <code class="language-plaintext highlighter-rouge">snd_pcm_delay()</code>, yet <em>somehow</em> it does not xrun on the first <code class="language-plaintext highlighter-rouge">snd_pcm_writei()</code>.</li>
</ul>
</li>
<li>Call <code class="language-plaintext highlighter-rouge">snd_pcm_writei(1 period)</code> of output. This does not block since there are three periods available/writable (or two if the input/output streams are not linked).
<ul>
<li>This is supposed to be called when there is 1 period of empty/available space in the buffer to write to. Instead it’s called when there is 1 period of empty space <em>more</em> than the entire buffer size! I don’t understand how ALSA even allows this.</li>
</ul>
</li>
</ul>
<h3 id="fixing-rtaudio-output-and-duplex">Fixing RtAudio output and duplex</h3>
<p>To resolve this for duplex streams, the easiest approach is to change stream starting:</p>
<ul>
<li>Write 1 full buffer (or the used portion) of silence into the output.</li>
<li>Don’t call <code class="language-plaintext highlighter-rouge">snd_pcm_sw_params_set_start_threshold()</code> on the output stream of a duplex pair. Instead use <code class="language-plaintext highlighter-rouge">snd_pcm_link()</code> to start the output stream upon the first input read (or if <code class="language-plaintext highlighter-rouge">snd_pcm_link()</code> fails, start the output stream yourself before the first input read).</li>
</ul>
<p>This approach fails for output-only streams. To resolve the issue in both duplex and output streams, you must:</p>
<ul>
<li>Call <code class="language-plaintext highlighter-rouge">snd_pcm_sw_params_set_avail_min(unused_buffer_size + 1 period)</code> before starting the output stream.</li>
<li>Call <code class="language-plaintext highlighter-rouge">snd_pcm_wait()</code> (or <code class="language-plaintext highlighter-rouge">poll()</code>) on the output stream every period, <em>before</em> generating audio.</li>
</ul>
<p>I haven’t looked into how RtAudio stops ALSA streams (with or without <code class="language-plaintext highlighter-rouge">snd_pcm_link()</code>), then starts them again, and what happens if you call them quickly enough that the buffers haven’t fully drained yet.</p>
<h2 id="optional-replacing-blocking-readswrites-with-cancellable-polling">(optional) Replacing blocking reads/writes with cancellable polling</h2>
<p>RtAudio needs to use polling to avoid extra latency in output-only streams. Should it be used for duplex and input-only streams as well? Is it worth adding an extra pollfd for cancelling blocking writes (possibly replacing the condvar)?</p>
<p>I don’t know how to refactor RtAudio to allow cancelling a blocked <code class="language-plaintext highlighter-rouge">snd_pcm_readi/writei</code>. Maybe pthread cancellation is sufficient, I don’t know. If not, one JACK2 and cpal-inspired approach is:</p>
<ul>
<li>Open all <code class="language-plaintext highlighter-rouge">snd_pcm_t</code> in <code class="language-plaintext highlighter-rouge">SND_PCM_NONBLOCK</code></li>
<li>Fetch fds for each <code class="language-plaintext highlighter-rouge">snd_pcm_t</code> using <code class="language-plaintext highlighter-rouge">snd_pcm_poll_descriptors()</code></li>
<li>Share an interrupt pipefd/eventfd between the GUI and audio thread</li>
<li>In the audio callback:
<ul>
<li><code class="language-plaintext highlighter-rouge">poll()</code> the input, output, and interrupt fds</li>
<li>Pass the result into <code class="language-plaintext highlighter-rouge">snd_pcm_poll_descriptors_revents()</code></li>
<li>Only perform non-blocking PCM reads/writes, or exit the loop if the interrupt fd is signalled.</li>
</ul>
</li>
</ul>
<p>Unfortunately this requires a pile of refactoring for relatively little gain.</p>
<h2 id="is-rtaudios-current-approach-appropriate-for-low-latency-pipewire-alsa">Is RtAudio’s current approach appropriate for low-latency pipewire-alsa?</h2>
<p><strong>Update: No.</strong></p>
<p>pipewire-alsa in its current form (<a href="https://gitlab.freedesktop.org/pipewire/pipewire/-/commit/774ade1467b8c68ac9646624d941be994bd3702b">774ade146</a>) is wholly unsuitable for low-latency audio.</p>
<p>I use <code class="language-plaintext highlighter-rouge">jack_iodelay</code> to measure signal latency, by using Helvum (a PipeWire graph editor) to route <code class="language-plaintext highlighter-rouge">jack_iodelay</code>’s output (which generates audio) through other nodes (which should pass-through audio with a delay) and back into its input (which measures audio and determines latency). When <code class="language-plaintext highlighter-rouge">jack_iodelay</code> is routed through hardware alone, it reports the usual 2 periods/quantums of latency. When I start RtAudio’s ALSA duplex app with period matched to the PipeWire quantum (which should add only 1 period of latency since <code class="language-plaintext highlighter-rouge">snd_pcm_link()</code> fails), and route <code class="language-plaintext highlighter-rouge">jack_iodelay</code> through hardware and duplex in series, <code class="language-plaintext highlighter-rouge">jack_iodelay</code> reports a whopping 7 periods of latency. My guess is that pipewire-alsa adds a full 2 periods of buffering to both its input and output streams. I’m not sure if I have the motivation to understand and fix it.</p>
<p><strong>Earlier:</strong></p>
<p>RtAudio doesn’t write silence to the output of a duplex stream before starting the streams, and only writes to the output stream once one period of data arrives at the input stream. This is unambiguously wrong for hw device streams. Is it the best way to achieve zero-latency ALSA passthrough, when using the pipewire-alsa ALSA plugin? I don’t know if it works or if the output stream xruns, I don’t know if this is contractually guaranteed to work, and I’d have to test it and read the pipewire-alsa source (<a href="https://gitlab.freedesktop.org/pipewire/pipewire/-/blob/master/pipewire-alsa/alsa-plugins/pcm_pipewire.c">link</a>).</p>
<p>Is it possible to achieve low-latency <em>output-only</em> ALSA, perhaps by waiting until the buffer is entirely empty (<code class="language-plaintext highlighter-rouge">snd_pcm_sw_params_set_avail_min()</code>)? Again I don’t know, and I’d have to test.</p>
<h2 id="push-mode-audio-loses-the-battle-before-its-even-fought">Push-mode audio loses the battle before it’s even fought</h2>
<p>I hear push-mode mixing daemons like PulseAudio (or possibly WASAPI) are fundamentally bad designs, incompatible with low-latency or consistent-latency audio output.</p>
<p><a href="https://superpowered.com/androidaudiopathlatency">https://superpowered.com/androidaudiopathlatency</a> (<a href="https://news.ycombinator.com/item?id=9386994">discussion</a>) is a horror story. In fact I read elsewhere that pre-AAudio Android duplex loopback latency is <em>different</em> on every run; I can no longer recall the source, but it’s entirely consistent with the user application’s own ring buffering, or with input and output streams being started separately rather than started and run in sync at a driver level like <code class="language-plaintext highlighter-rouge">snd_pcm_link</code>.</p>
<p>Note that Android audio may have improved since then, see AAudio and <a href="https://android-developers.googleblog.com/2021/03/an-update-on-androids-audio-latency.html">https://android-developers.googleblog.com/2021/03/an-update-on-androids-audio-latency.html</a>.</p>
<div class="footnotes" role="doc-endnotes">
<ol>
<li id="fn:1" role="doc-endnote">
<p>ALSA dmix may be kernel-based. I’m not sure, and I haven’t looked into it. <a href="#fnref:1" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
</ol>
</div>
<h1 id="this-site-has-moved">This site has moved!</h1>
<p>This site has moved to <a href="https://nyanpasu64.gitlab.io/">https://nyanpasu64.gitlab.io/</a>. Be sure to update your feed readers.</p>
<h1 id="the-missing-guide-for-arch-linux-pkgbuild-s-pkgver-version-numbers">The missing guide for Arch Linux PKGBUILD’s pkgver() version numbers</h1>
<p>Pacman’s version comparison algorithm was designed over a decade ago to properly sort many categories of real-world version numbers, and is now set in stone, quirks and all. Later on, the AUR developed <code class="language-plaintext highlighter-rouge">pkgver()</code> conventions and templates which turn Git commits into version numbers that sort properly in Pacman. But what are Pacman’s requirements for sorting real-world version numbers, how does Pacman’s version comparison algorithm work, and how are AUR <code class="language-plaintext highlighter-rouge">pkgver()</code> functions built around the algorithm?</p>
<h1 id="how-pacman-compares-versions">How Pacman compares versions</h1>
<p><code class="language-plaintext highlighter-rouge">vercmp</code> is a command-line utility which takes two string arguments and compares them using Pacman’s version comparison algorithm.</p>
<p>The <code class="language-plaintext highlighter-rouge">vercmp</code> executable exposes the algorithm used by Pacman to determine whether a different package version is newer than what you have currently installed. Sadly, <a href="https://man.archlinux.org/man/vercmp.8">the vercmp manpage</a> (as well as the pacman manpage) fails to explain the algorithm, only providing a few examples.</p>
<h2 id="requirements-for-comparing-versions">Requirements for comparing versions</h2>
<p>Pacman needs to correctly compare the versions of real-world software programs, as well as versions following its own packaging conventions:</p>
<ul>
<li>1.0-beta < 1.0 (from semver)
<ul>
<li>pacman and vercmp fail to fulfill this requirement, because they interpret <code class="language-plaintext highlighter-rouge">-beta</code> as build metadata (see the <code class="language-plaintext highlighter-rouge">-release</code> field in <code class="language-plaintext highlighter-rouge">parseEVR()</code>).</li>
</ul>
</li>
<li>1.0beta < 1.0 (Arch labels pre-release packages as 1.0beta rather than 1.0-beta)</li>
<li>1.0 < 1.0.1</li>
<li>1.0.1 < 1.0.2</li>
</ul>
<p>Pacman’s version comparison algorithm also has incidental properties that I don’t consider to be first principles. However, AUR <code class="language-plaintext highlighter-rouge">pkgver()</code> functions depend on certain ones to generate unusual-looking, unintuitive version numbers that nonetheless sort properly in Pacman.</p>
<ul>
<li>1.0 < 1.0.0 (I think they should be equal)</li>
<li>alpha < beta < 1.0</li>
<li>1.0 < 1.0.alpha (it’s strange that 1.0 < 1.0.alpha < 1.0.0)</li>
<li>1.0.alpha < 1.0.0</li>
<li>1.0.alpha < 1.0.1</li>
</ul>
<h2 id="algorithm-implementation">Algorithm implementation</h2>
<p>The algorithm is implemented in <code class="language-plaintext highlighter-rouge">alpm_pkg_vercmp()</code> in the Pacman source code (<a href="https://gitlab.archlinux.org/pacman/pacman/-/blob/master/lib/libalpm/version.c"><code class="language-plaintext highlighter-rouge">:lib/libalpm/version.c</code></a>). The file is 260 lines of code, with multiple functions dedicated to different aspects of version comparison. The algorithm is written in raw C, with <em>glorious</em> null-terminated strings, and string slicing implemented via <code class="language-plaintext highlighter-rouge">const</code>-incompatible null byte insertion. 😿</p>
<h3 id="epoch-version-and-release">Epoch, version, and release</h3>
<p><code class="language-plaintext highlighter-rouge">parseEVR()</code> parses Arch package versions using the format <code class="language-plaintext highlighter-rouge">[epoch:]version[-release]</code>. More specifically, all characters after the last hyphen form the release (even if there are colons afterwards), and the epoch defaults to “0” unless the first non-digit character is a colon.</p>
<p><code class="language-plaintext highlighter-rouge">parseEVR()</code> allows only numbers in the epoch field. It is usually absent, but can be used as a “major version” to ensure that newer program versions compare higher, even if the newer program’s version number (stored in the version field) is <em>lower</em> than in older versions.</p>
<p>The release field is an optional location for “build metadata”. A version with no release field is considered equal to otherwise-identical versions with any release field, but two otherwise-identical versions with different release fields use the release field to break ties.</p>
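<p>As a rough sketch (not Pacman’s actual C code), the split performed by <code class="language-plaintext highlighter-rouge">parseEVR()</code> can be mimicked with shell parameter expansion. This is a hypothetical illustration assuming both an epoch and a release are present; the real code also handles their absence:</p>

```shell
# Hypothetical illustration of the [epoch:]version[-release] split,
# assuming both an epoch and a release are present (Pacman's real
# code also handles their absence).
evr="2:1.0beta-3"
release="${evr##*-}"   # everything after the last hyphen -> "3"
ev="${evr%-*}"         # "2:1.0beta"
epoch="${ev%%:*}"      # digits before the first colon -> "2"
version="${ev#*:}"     # "1.0beta"
printf '%s %s %s\n' "$epoch" "$version" "$release"
# prints: 2 1.0beta 3
```

<p>Note how <code class="language-plaintext highlighter-rouge">${evr##*-}</code> (longest-prefix removal) implements “after the <em>last</em> hyphen”.</p>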
<h3 id="comparing-versions">Comparing versions</h3>
<p>Each field is then compared using <code class="language-plaintext highlighter-rouge">rpmvercmp()</code>. Missing epochs are assumed to be 0, and missing releases are assumed to be equal to any numbered release.</p>
<p><code class="language-plaintext highlighter-rouge">rpmvercmp()</code> decomposes its input into “segments”, where each segment starts with 0 or more “separator” characters (any non-alphanumeric character), which are followed by 1 or more “body” characters (each body contains either alphabetic characters or numeric characters, so “1a” is 2 segments). The input may be terminated by a “dangling” segment with only separator characters and no body (but realistic version numbers will not have a dangling segment).</p>
<p>This can be modeled, more or less, as the regex <code class="language-plaintext highlighter-rouge">([^a-zA-Z0-9]* ([a-zA-Z]+ | [0-9]+) )* [^a-zA-Z0-9]*</code>.</p>
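<p>As a quick illustration (not Pacman’s code), the segment <em>bodies</em> can be extracted with grep; note that this drops the leading-separator counts, which the real algorithm also compares:</p>

```shell
# Rough illustration of segment splitting: extract each alphabetic or
# numeric "body". "1.0rc2" decomposes into 4 segment bodies.
echo "1.0rc2" | grep -oE '[a-zA-Z]+|[0-9]+'
# prints:
# 1
# 0
# rc
# 2
```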
<p>Both inputs are split into segments (including dangling segments), starting at the beginning. The algorithm loops over segments from both inputs, starting with the first segment from each, until either input runs out of segments entirely (one or both segments are absent).</p>
<p>Each loop iteration receives one segment from each version, for as long as both versions have segments remaining:</p>
<ul>
<li>All leading separators are trimmed off both segments. Results:
<ul>
<li>1.1 = 1_1</li>
</ul>
</li>
<li>If either segment is empty after trimming separators (because it’s a dangling segment), the loop breaks.</li>
<li>If one segment started with more separator characters, it’s a larger version. Note that the Pacman developers believe that realistic version numbers do not have multiple separator characters in a row, and Pacman isn’t designed to handle this situation perfectly. Results:
<ul>
<li>1 < .1 = _1 < ..1</li>
<li>1.1 < 1..1</li>
<li>1.a < 1..a</li>
<li>1rev < 1.rev < 1..rev</li>
<li>a10 < a.10</li>
</ul>
</li>
<li>Alphabetic segments are sorted lexicographically, and sort before numeric segments (sorted numerically). Results:
<ul>
<li>a < aa < z < zz < 1 = 01 < 9 < 10</li>
</ul>
</li>
</ul>
<p>The function returns immediately if the loop finds a pair of segments that compare unequal. Otherwise the loop stops (without stripping separators) when one or both inputs reach the end of the string, or breaks (after stripping separators) when one or both inputs reach a final dangling segment.</p>
<p>At this point, one of these is true:</p>
<ul>
<li>at least one version has no segment.</li>
<li>no versions have missing segments, but at least one version has a dangling segment (causing both segments to be stripped, so at least one version <em>now</em> has no segment).</li>
</ul>
<p>The segments are compared as follows:</p>
<ul>
<li>none = none</li>
<li>alpha < none (a leftover alphabetic segment makes that version older)</li>
<li>none < separator or number (a leftover separator or numeric segment makes that version newer)</li>
</ul>
<p>The algorithm is complete.</p>
<p>All dangling segments compare equal to one another, but come after “segment with text” and “no segment” and before “segment with number”.</p>
<ul>
<li>1a < 1 < 1.a < 1. = 1.. < 1.0</li>
<li>’’ < ‘.’ = ‘..’</li>
<li>1 < 1. = 1_ = 1..</li>
</ul>
<p>Unfortunately this algorithm has a cycle, caused by the fact that having more leading separators wins a version comparison (even if followed by a losing body) when both segments have bodies, but is ignored when one or both segments are empty after trimming.</p>
<ul>
<li>1.0 < 1..a (more leading separators wins since both segments have bodies)</li>
<li>1..a < 1. (leading separators ignored since 1. is empty after trimming, ‘a’ < ‘’)</li>
<li>1. < 1.0 (leading separators ignored since 1. is empty after trimming, ‘’ < ‘1’)</li>
</ul>
<p>Note that 1. and 1.. are interchangeable, because the dangling separators get stripped out either way.</p>
<p>The Pacman developers commented, “Fun example :) Like I said, having multiple delimiters in a row doesn’t make a lot of sense, so that is pretty much undefined behaviour”</p>
<h2 id="testing-the-requirements">Testing the requirements</h2>
<p>Dangling segments and multiple separators don’t occur in real-world version numbers and can be ignored. Does this algorithm properly order real-world versions?</p>
<ul>
<li>1.0beta < 1.0</li>
</ul>
<p>Yes, “beta” < “”.</p>
<ul>
<li>1.0 < 1.0.1</li>
</ul>
<p>Yes, “” < “.1”.</p>
<ul>
<li>1.0.1 < 1.0.2</li>
</ul>
<p>Yes, “.1” < “.2”.</p>
<ul>
<li>1.0 < 1.0.alpha</li>
</ul>
<p>Yes, “” < “.alpha”.</p>
<h1 id="what-is-pkgbuild-and-pkgver">What is PKGBUILD and <code class="language-plaintext highlighter-rouge">pkgver</code>?</h1>
<p>PKGBUILD files are shell scripts defining variables and functions used by <code class="language-plaintext highlighter-rouge">makepkg</code> to build a binary package. The <code class="language-plaintext highlighter-rouge">pkgver</code> variable serves as the version number of the PKGBUILD and the package produced. All PKGBUILD files contain a <code class="language-plaintext highlighter-rouge">pkgver</code> variable, storing the package’s version at the time the file was written. However, this is insufficient for VCS/<code class="language-plaintext highlighter-rouge">-git</code> packages tracking the latest commit in a Git repository, where the version of software built by a PKGBUILD can change even when the PKGBUILD does not. To accommodate this, <code class="language-plaintext highlighter-rouge">makepkg</code> also supports a <code class="language-plaintext highlighter-rouge">pkgver()</code> function, which when run produces the <em>current</em> version of the package.</p>
<p>If <code class="language-plaintext highlighter-rouge">pkgver</code> is a variable only, then an unmodified PKGBUILD and <code class="language-plaintext highlighter-rouge">pkgver</code> means the package has not been updated. But if a <code class="language-plaintext highlighter-rouge">pkgver()</code> function is present, then an AUR helper trying to determine if an installed package is outdated must re-clone/pull the VCS repo listed in <code class="language-plaintext highlighter-rouge">source=(...)</code> and call <code class="language-plaintext highlighter-rouge">pkgver()</code> again, even if the PKGBUILD and <code class="language-plaintext highlighter-rouge">pkgver</code> are unmodified.</p>
<p>If a <code class="language-plaintext highlighter-rouge">pkgver()</code> function is present, then running <code class="language-plaintext highlighter-rouge">makepkg</code> to build the PKGBUILD into a binary package also rewrites the PKGBUILD file with a <em>new</em> value for the <code class="language-plaintext highlighter-rouge">pkgver</code> variable. A few fixed-version packages like <a href="https://github.com/archlinux/svntogit-packages/blob/master/qt5-base/trunk/PKGBUILD">qt5-base</a> and <a href="https://github.com/archlinux/svntogit-packages/blob/master/qt5-wayland/trunk/PKGBUILD">qt5-wayland</a> use this property by defining a <code class="language-plaintext highlighter-rouge">pkgver()</code> function to automatically recompute complex version numbers. Unlike <code class="language-plaintext highlighter-rouge">-git</code> packages in the AUR, these PKGBUILDs build a fixed version of the source code, and their <code class="language-plaintext highlighter-rouge">pkgver()</code> functions return a fixed value.</p>
<h1 id="building-a-pkgver-so-pacman-sorts-git-repositories-correctly">Building a <code class="language-plaintext highlighter-rouge">pkgver()</code> so Pacman sorts Git repositories correctly</h1>
<p>Git repositories in the wild have a lot of variance; some don’t have tags, some have tags that sort properly, and some have tags in the wrong order. And some repositories start with no tags, but create tags later on when they make their first release.</p>
<h2 id="requirements-for-comparing-versions-1">Requirements for comparing versions</h2>
<p>What are the requirements for generating version numbers from a Git repository?</p>
<ul>
<li>As a repository without tags creates more commits, the version number should increase.</li>
<li>When a repository creates its first release/tag, the version number should increase.</li>
<li>As a repository with tags creates more commits, the version number should increase.</li>
<li>If the most recent tag changes from 1.0 to 1.1, the version number should increase.</li>
<li>If the most recent tag changes from 1.0 to 1.0.1, the version number should increase.</li>
</ul>
<p>How can we achieve these criteria, given how Pacman works?</p>
<h2 id="arch-wiki-templates">Arch Wiki templates</h2>
<p><a href="https://wiki.archlinux.org/index.php/VCS_package_guidelines#The_pkgver()_function">The Arch wiki</a> provides copy-paste snippets of example pkgver() functions, but fails to explain the underlying concepts (what <code class="language-plaintext highlighter-rouge">git describe</code> outputs, what the sed expression does, how the resulting expression is evaluated by <code class="language-plaintext highlighter-rouge">vercmp</code> and <code class="language-plaintext highlighter-rouge">pacman</code>).</p>
<h3 id="untagged-git-repositories">Untagged Git repositories</h3>
<p>In a Git repo where the history of <code class="language-plaintext highlighter-rouge">master</code> has no tags, the recommended <code class="language-plaintext highlighter-rouge">pkgver()</code> counts commits:</p>
<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code>pkgver<span class="o">()</span> <span class="o">{</span>
<span class="nb">cd</span> <span class="s2">"</span><span class="nv">$pkgname</span><span class="s2">"</span>
<span class="nb">printf</span> <span class="s2">"r%s.%s"</span> <span class="s2">"</span><span class="si">$(</span>git rev-list <span class="nt">--count</span> HEAD<span class="si">)</span><span class="s2">"</span> <span class="s2">"</span><span class="si">$(</span>git rev-parse <span class="nt">--short</span> HEAD<span class="si">)</span><span class="s2">"</span>
<span class="o">}</span>
</code></pre></div></div>
<p>This produces a string <code class="language-plaintext highlighter-rouge">r{number of commits}.{commit hash}</code>.</p>
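<p>With made-up values standing in for the two git commands (the real count and hash depend on the repository), the printf produces, for example:</p>

```shell
# Hypothetical stand-ins for $(git rev-list --count HEAD) and
# $(git rev-parse --short HEAD); any repository will yield different
# values.
count=1068
hash=a7b43a1
printf "r%s.%s\n" "$count" "$hash"
# prints: r1068.a7b43a1
```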
<p>Any letter would work equally well for the version comparison algorithm; <code class="language-plaintext highlighter-rouge">r</code> was chosen because it sounds like “revision”. But what is the purpose of a letter?</p>
<h3 id="tagged-git-repositories">Tagged Git repositories</h3>
<p>If the repo has tags like 0.2.5 which begin with a number (no leading “v” prefix like v0.2.5), <code class="language-plaintext highlighter-rouge">git describe --long --tags</code> can be used as the root source for the version:</p>
<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code>pkgver<span class="o">()</span> <span class="o">{</span>
<span class="nb">cd</span> <span class="s2">"</span><span class="nv">$pkgname</span><span class="s2">"</span>
git describe <span class="nt">--long</span> <span class="nt">--tags</span> | <span class="nb">sed</span> <span class="s1">'s/\([^-]*-g\)/r\1/;s/-/./g'</span>
<span class="o">}</span>
</code></pre></div></div>
<p><code class="language-plaintext highlighter-rouge">git describe --long</code> produces a string with format <code class="language-plaintext highlighter-rouge">{most recent tag}-{commits since tag}-g{commit hash}</code>.</p>
<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code>git checkout master
git describe <span class="nt">--long</span> <span class="nt">--tags</span> <span class="c"># v2.4-25-ga240b43</span>
git checkout v2.4 <span class="c"># or git checkout HEAD~25</span>
git describe <span class="nt">--long</span> <span class="nt">--tags</span> <span class="c"># v2.4-0-g51e51f4</span>
</code></pre></div></div>
<p>The sed expression turns it into <code class="language-plaintext highlighter-rouge">{most recent tag}.r{commits since tag}.g{commit hash}</code>.</p>
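<p>For instance, feeding the example <code class="language-plaintext highlighter-rouge">git describe</code> output from above through the sed expression:</p>

```shell
# The first s-command prefixes "r" to the "commits since tag" number
# (the segment before "-g"); the second replaces every hyphen with a
# dot.
echo "v2.4-25-ga240b43" | sed 's/\([^-]*-g\)/r\1/;s/-/./g'
# prints: v2.4.r25.ga240b43
```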
<h2 id="testing-the-requirements-1">Testing the requirements</h2>
<ul>
<li>As a repository without tags creates more commits, the version number should increase.</li>
</ul>
<p>“r###” < “r###+1”? Trivially so, as the “r” segment is the same, but the “number of commits” segment increases.</p>
<ul>
<li>When a repository creates its first release/tag, the version number should increase.</li>
</ul>
<p>“r###” < “1.0.r###”? Yes. The version of the untagged repository starts with a “r” segment. The version of the tagged repository starts with a numeric segment (taken from the tag), which comes after.</p>
<ul>
<li>As a repository with tags creates more commits, the version number should increase.</li>
</ul>
<p>“1.0.r###” < “1.0.r###+1”? Yes. “most recent tag” is unchanged, “.r” is unchanged, and “commits since tag” increases.</p>
<ul>
<li>If the most recent tag changes from 1.0 to 1.1, the version number should increase.</li>
</ul>
<p>“1.0.r###” < “1.1.r###”? Yes. “most recent tag” increases.</p>
<ul>
<li>If the most recent tag changes from 1.0 to 1.0.1, the version number should increase.</li>
</ul>
<p>“1.0.r###” < “1.0.1.r###”? Yes. “1”=”1”, “.0”=”.0”, and “.r” < “.1”.</p>
<h3 id="why-not-">Why not “.”?</h3>
<p>If tag-based versions were <code class="language-plaintext highlighter-rouge">{most recent tag}.{commits since tag}...</code>, then if the most recent tag changes from 1.0 to 1.0.1, the version would change from “1.0.###” to “1.0.1.###”, where “.1” sorts <em>before</em> “.###” despite 1.0.1 being a newer program version.</p>
<p>This was first brought up by @diabonas:archlinux.org:</p>
<blockquote>
<p>You need it because otherwise 1.0.500 (where 500 is the revision count) would be newer than 1.0.1.30 (again, 30 is the revision count) - this doesn’t happen with 1.0.r500, which is older than 1.0.1.r30 because a letter is always older than a digit</p>
</blockquote>
<h3 id="why-not-r">Why not “r”?</h3>
<blockquote>
<p>It’s 1.0.1.r30 - the dot is important as 1.0.1r30 would be older than 1.0.1, but 1.0.1.r30 is newer - it’s a revision after 1.0.1 after all. And yeah, 1.0r31 is a revision after 1.0, but before the next upstream release 1.0.1, whole 1.0.1.r31 is a revision after 1.0.1, so newer than 1.0.r30</p>
</blockquote>
<h2 id="the-arch-wiki-is-wrong">The Arch Wiki is wrong</h2>
<p>The Arch wiki’s stated requirements for generating version numbers are:</p>
<blockquote>
<p>It is recommended to have following version format: <em>RELEASE.rREVISION</em> where <em>REVISION</em> is a monotonically increasing number that uniquely identifies the source tree (VCS revisions do this).</p>
</blockquote>
<p>The Arch wiki is wrong; given the “RELEASE.RELEASE.rREVISION” convention recommended by the wiki, for Pacman to properly identify older and newer packages, REVISION does not need to be globally monotonic, only within a given RELEASE. And the Arch wiki even breaks its own rules: the example “Git with tags” <code class="language-plaintext highlighter-rouge">pkgver()</code>’s REVISION is not monotonic except within a given RELEASE (Git tag).</p>
<p>Even if the Arch wiki was changed to say that REVISION needs to be monotonic within a given RELEASE, it states that <code class="language-plaintext highlighter-rouge">0.1.r456 > r454</code> but <code class="language-plaintext highlighter-rouge">0.1.456 < 454</code>, without explaining the algorithm used to compare revisions. This only serves to confuse the reader.</p>

ExoTracker Newsletter #2 - Pivoting to SNES, designing an instrument list
2021-03-10T00:00:00-08:00
https://nyanpasu64.github.io/blog/exotracker-newsletter-2-pivoting-to-snes
<p>For those of you who aren’t already aware, ExoTracker is a tracker-like composing tool, based around subdividing beats instead of integer rows. This allows the user to place notes at arbitrary fractions of a beat (like sheet music), and additionally allows tracker-like delay effects (which can be negative, which is impossible in most trackers). Beat subdivision allows for mixing eighth notes and triplets, and using beats for timing (rather than rows) could make tempo calculation more intuitive than other trackers.</p>
<h2 id="pivoting-to-a-snes-tracker">Pivoting to a SNES tracker</h2>
<p>After spending several months away from ExoTracker, I’ve decided to switch away from emulating a Famicom with expansions, to the SNES’s SPC700 sound chip. I chose to do this because the SNES has fewer pre-existing options for composing (especially if you limit yourself to free options, ruling out chipsynth SFC and, to a lesser extent, SNES Tracker). Another benefit is that it’s simpler to write a SNES sound engine; the SNES only has 1 type of channel, so I don’t need to find a way to modularize/abstract the sound driver to reuse instrument code for the Famicom’s numerous expansion chips, which have different register addresses, sizes, and interpretations (pitch: period vs frequency vs Yamaha, volume: linear vs. log vs. hardware envelopes).</p>
<p>Issue is, I haven’t decided how to handle timing… On the NES, the vblank interrupt is the most processor-efficient way to tick the audio engine, and you normally run one tick per vblank. But (to the best of my understanding) the S-SMP (CPU) has 1 fast and 2 slow timers (with configurable dividers), and they don’t interrupt the S-SMP, so you need to busy-wait and poll them manually. And some SNES games change the timer speed to adjust song tempo (so each quarter note is a fixed number of timer ticks like MIDI). Others have unchanging timer speeds and let an uneven number of timer ticks pass between each subsequent quarter note (like FamiTracker’s tempo).</p>
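<p>As a concrete sketch of the timer math: to my understanding the two slow timers count at an 8000 Hz base rate divided by a configurable divider (roughly 1–256). Treat the exact rates as an assumption to verify against hardware documentation, but under it, a divider of 160 would give 50 engine ticks per second:</p>

```shell
# Assumed: the slow SPC700 timers tick at 8000 Hz divided by the
# divider register. The 8000 Hz base rate is my understanding of the
# hardware, not something verified here.
divider=160
echo $((8000 / divider))
# prints: 50
```

<p>Under this model, changing the divider register is what lets MIDI-style drivers retune the tick rate to the song tempo.</p>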
<p>Looking at how pre-existing trackers behave, FamiTracker allows users to configure Speed and Tempo, which interact strangely<sup id="fnref:1" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup>. 0CC-FT adds more modes: “fixed” to turn off Tempo so Speed controls “ticks/row” directly, and grooves to switch Speed on every row.</p>
<p>OpenMPT has <a href="https://wiki.openmpt.org/Manual:_Song_Properties#Overview">3 tempo modes</a> comprising 2 conceptually different types: Classic/Alternative let users pick the duration of rows, whereas Modern lets users pick the duration of beats. Both come with customizable “ticks/second” (which is fixed in FamiTracker), and Classic/Alternative (but not Modern) suffer from tempo rounding errors. Since ExoTracker doesn’t have rows (but instead arbitrary beat subdivisions), copying Classic/Alternative is not an option, and only copying Modern is.</p>
<p>I’m going to use a SPC700 emulation core (likely Blargg’s). I think only the S-SMP can read the SPC700’s timers… but I probably won’t write S-SMP code, but instead will reimplement the driver in C++, using native x86 instructions to communicate with the GUI and S-DSP emulator, so I’ll have to simulate the timers myself. Anyway I need to pick whether to use the fast or slow timer (probably copy existing games), what GUI to provide for customizing the timer rate (either expose the raw register value, or a tempo which gets converted/rounded to a divider register), and what tempo mode to use (fixed-timer/FamiTracker tempo, vs variable-timer/MIDI/OpenMPT Modern) since it’s impractical to implement multiple tempo modes in the C++ and ASM drivers.</p>
<h2 id="designing-an-instrument-list">Designing an instrument list</h2>
<p>There is no instrument editor, and I don’t know when there will be one. In the meantime I’ve been working on adding an instrument list.</p>
<figure class="image">
<img srcset="instrument-list.png 1.25x" src="instrument-list.png" alt="I added an instrument list widget. The instruments form columns, just like FamiTracker's instrument list. Unlike FamiTracker, it shows all 128 instrument slots, even though most of them are empty. It's confusing to look at." />
<figcaption>I added an instrument list widget. The instruments form columns, just like FamiTracker's instrument list. Unlike FamiTracker, it shows all 128 instrument slots, even though most of them are empty. It's confusing to look at.</figcaption>
</figure>
<p>So looking at this picture, obviously it needs improvement. Aside from showing dozens of empty slots, another difference from FamiTracker is that each column has its own width, instead of matching the width of the widest instrument in any column. I’m not sure if that’s a good or bad thing, or if the Qt GUI library allows me to change it.</p>
<p>One solution is to copy how FamiTracker only shows occupied instrument slots. Implementing this will take work, because you need to filter the array of numbered instruments and only expose the slots with instruments, and when a user clicks an item, map from the item’s position in the widget back to an instrument number. One approach is to keep a cached vector of items, each one holding an instrument number that’s guaranteed to point to a non-empty instrument, and regenerate this vector whenever the document is modified.</p>
<h3 id="missing-functionality-in-famitracker">Missing functionality in FamiTracker</h3>
<p>Unfortunately FamiTracker’s instrument drag-and-drop behavior leaves something to be desired. FamiTracker defines drag-and-drop to swap instruments (and not empty slots). But sometimes I want to move an instrument into an empty slot, which is not possible (unless you fill empty slots with placeholder instruments). And sometimes I want to insert, remove, or move instruments, which shifts all subsequent instrument numbers by 1. (In some cases, this may even include empty slots as well, which may or may not be desirable.) This is not possible in FamiTracker unless you drag each instrument over one by one, which is tedious.</p>
<figure class="image">
<img src="famitracker-instrument-groups.png" alt="At 96 DPI, FamiTracker's instruments are grouped into 8-instrument columns. Each has the same leading digit, and each leading digit is split into exactly 2 columns." />
<figcaption>At 96 DPI, FamiTracker's instruments are grouped into 8-instrument columns. Each has the same leading digit, and each leading digit is split into exactly 2 columns.</figcaption>
</figure>
<p>I also like to categorize instruments into percussion, melodic, and expansion chip instruments, then divide them into groups of 8. This is because in FamiTracker, the instrument list is rendered as groups of 8 instruments. However since FamiTracker does not render empty instrument slots, this requires creating empty instruments to fill in any gaps in the numbering scheme.</p>
<p>(Sidenote: If the list widget isn’t exactly 8 instruments tall, this grouping system breaks, and the instruments are no longer arranged in visually neat columns corresponding to 0x0 through 0x7 and 0x8 through 0xf.)</p>
<figure class="image">
<img srcset="famitracker-instrument-groups-125.png 1.25x" src="famitracker-instrument-groups-125.png" alt="Unfortunately at 120 DPI, FamiTracker's instrument list is 9 instruments tall rather than 8 (due to rounding differences), breaking the groups. At 192 DPI, the list is 10 instruments tall!" />
<figcaption>Unfortunately at 120 DPI, FamiTracker's instrument list is 9 instruments tall rather than 8 (due to rounding differences), breaking the groups. At 192 DPI, the list is 10 instruments tall!</figcaption>
</figure>
<h3 id="solution-showing-placeholders">Solution: showing placeholders?</h3>
<p>One possibility is providing a user option to show all instruments, including placeholders, from zero until the last occupied instrument slot. You can create or delete instruments in-place (filling or creating an empty slot), drag-and-drop to swap slots (both empty and full), and even insert, delete, or move instruments while shifting the rest forwards or backwards.</p>
<p>If you want to create or insert instruments past the largest-numbered slot, you’ll have to check a box to show all slots, even unoccupied ones. This will look ugly if empty columns are very narrow, but will look less ugly if all columns are the same width.</p>
<p>Unfortunately this solution doesn’t have the same properties as the “empty-named instruments” I’ve been using in FamiTracker, requiring users to adjust. If you’re using empty-but-shown instrument slots (instead of empty-named instruments as in FamiTracker), then pressing the “New Instrument” button won’t append an instrument to the end of the list (after all the empty-name instruments), but will instead fill the first empty slot.</p>
<h3 id="another-approach-openmpt">Another approach: OpenMPT</h3>
<p>OpenMPT has a tree view on the left of the window, showing a list of numbered samples, and (in many module formats) a list of numbered instruments. The numbers are integers starting from 1, unlike FamiTracker’s hex values starting from 00. OpenMPT behaves like a dynamic-size list of samples/instruments which may have empty names and no data. Contrast this with my previous idea of a fixed-size list of instruments, where each may be absent.</p>
<p>I’ve run some testing in a .mptm file on OpenMPT 1.29.07. It seems simple at first, but gets weirder the further you investigate.</p>
<ul>
<li>Right-clicking any sample/instrument and clicking “Insert Sample” or “Insert Instrument” will insert one <em>after</em> the one you’ve clicked, increasing the number of each subsequent sample/instrument by 1.
<ul>
<li>This makes sense under my proposed instrument scheme, and is not possible in FamiTracker.</li>
</ul>
</li>
<li>Inserting a sample/instrument at the very end of the list will create a blank sample/instrument (with a dimmed icon) at the end of the list. This can be repeated to add multiple blank samples/instruments.
<ul>
<li>This shows that OpenMPT displays (and probably stores) the sample/instrument lists with a variable “length” field. This functions quite differently than my proposed “show trailing placeholders” checkbox, and I suspect OpenMPT’s UI is better. OpenMPT makes it easier to append instruments, whereas my code makes it easier to insert instruments at large indices without filling the space before it.</li>
<li>I’m concerned that copying OpenMPT’s approach can lead to bugs. If my code stores instruments in a dynamic-length vector, it’s easy to index out of bounds (which can be avoided if I use custom getter functions that treat out-of-bounds indices as “no instrument present”). If my code stores instruments in a fixed-size array with a cosmetic length field (whether saved in the module or not), I can accidentally set a length shorter than the index of the largest instrument present + 1.</li>
</ul>
</li>
<li>Right-clicking any sample/instrument and clicking “Delete Sample/Instrument” replaces it with an empty slot (with a dimmed icon), and does not shift subsequent instruments/samples back by 1 to fill the hole. The exception is deleting the last sample/instrument, which will decrease the list’s length by 1 and not leave behind an empty slot.
<ul>
<li>It would be nice to have a way to delete full/empty slots and shift everything backwards by 1.</li>
</ul>
</li>
</ul>
<p>Samples have strange behavior:</p>
<ul>
<li>Samples with no wave data have dimmed icons. This doesn’t mean much.</li>
<li>Opening the Samples tab and clicking the Insert Sample button (not to be confused with the Insert Sample right-click menu item) will <em>sometimes</em> insert one at the end of the list of visible samples… and <em>other times</em> overwrite samples with no name and no sample data (created by “Insert Sample” or “Delete Sample”) (and only insert a new sample if none exist). The resulting sample will be called <code class="language-plaintext highlighter-rouge">untitled</code>, and because of the non-empty name, cannot be overwritten or deleted.
<ul>
<li>Bizarre.</li>
</ul>
</li>
<li>Deleting any sample will delete all trailing “empty” samples (no name <em>and</em> no waveform). The only exception is if all samples in a module are empty, in which case it leaves one behind (all modules have at least 1 sample, and OpenMPT will never delete the last sample).</li>
</ul>
<p>Instruments have a different set of strange behavior:</p>
<ul>
<li>Instruments which have just been deleted have dimmed icons. The “Insert Instrument” <em>button</em> will insert an instrument into the first dimmed instrument, and append a new one if none exist. Changing any property of a deleted/dimmed instrument (name, contents) will undim its icon permanently (until deleted again). The “Insert Instrument” <em>menu item</em> will undim <em>all</em> icons.
<ul>
<li>So much for consistency.</li>
</ul>
</li>
<li>Trying to delete an instrument slot does nothing if it’s dimmed.</li>
<li>Deleting any instrument does not clear trailing empty instruments. Deleting the last instrument shrinks the list by 1 instead of dimming the last instrument, but if the second-last instrument was dimmed, it turns into the last instrument and remains dimmed. In this scenario, you cannot shrink the list any further; the Delete key does nothing, and the right-click menu does nothing. The only solution is to undim the icons and then delete them.</li>
</ul>
<h3 id="sidenote-openmpt-bugs">Sidenote: OpenMPT bugs</h3>
<p>In the sidebar, click a sample. Click again (which opens a rename field after a time delay) and rapidly press Delete before the rename is initiated. The rename will pop up after the delete dialog appears. Clicking Yes to delete will delete the sample, but keep the rename field open.</p>
<p>I’ve gotten OpenMPT to omit a number in the sidebar’s sample/instrument numbering scheme (probably samples, forgot), after messing with it. It reappeared when I pressed F5 (which began playback).</p>
<h2 id="footnotes">Footnotes</h2>
<div class="footnotes" role="doc-endnotes">
<ol>
<li id="fn:1" role="doc-endnote">
<p>In FamiTracker:</p>
<ul>
<li>Tempo only matches “beats/min” if Speed * “Highlight 1” = 24.</li>
<li>Speed only matches “ticks/row” if Tempo = “ticks/second” * 2.5.</li>
<li>Speed defaults to 6 “ticks/row”. Highlight 1 defaults to 4 rows/beat. Tempo defaults to 150 “beats/min”. Ticks/second defaults to 60 (or 50 on PAL) because ticks are usually triggered by vblanks/frames.</li>
</ul>
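<p>A quick sanity check of those defaults (a sketch of the arithmetic only; the function is invented here, not FamiTracker code):</p>

```rust
// Ticks per beat = Speed (ticks/row) * Highlight 1 (rows/beat).
// Beats/min = ticks/second * 60 s/min / (Speed * Highlight 1).
fn beats_per_min(ticks_per_second: f64, speed: f64, highlight_1: f64) -> f64 {
    ticks_per_second * 60.0 / (speed * highlight_1)
}

fn main() {
    // Defaults: 60 ticks/s, Speed 6, Highlight 1 = 4 → 150 beats/min.
    // Tempo (150) matches beats/min because Speed * Highlight 1 = 24,
    // and Speed matches ticks/row because Tempo = 60 * 2.5 = 150.
    assert_eq!(beats_per_min(60.0, 6.0, 4.0), 150.0);
}
```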
<p><a href="#fnref:1" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
</ol>
</div>nyanpasu64

For those of you who aren’t already aware, ExoTracker is a tracker-like composing tool, based around subdividing beats instead of integer rows. This allows the user to place notes at arbitrary fractions of a beat (like sheet music), and additionally allows tracker-like delay effects (which can be negative, which is impossible in most trackers). Beat subdivision allows for mixing eighth notes and triplets, and using beats for timing (rather than rows) could make tempo calculation more intuitive than other trackers.

An unsafe tour of Rust’s Send and Sync
2021-01-01T06:54:00-08:00
https://nyanpasu64.github.io/blog/an-unsafe-tour-of-rust-s-send-and-sync

<p>Rust’s concurrency safety is based around the <code class="language-plaintext highlighter-rouge">Send</code> and <code class="language-plaintext highlighter-rouge">Sync</code> traits. For people writing safe code, you don’t really need to understand these traits on a deep level, only enough to satisfy the compiler when it spits errors at you (or switch from <code class="language-plaintext highlighter-rouge">std</code> threads to Crossbeam scoped threads to make errors go away). However if you’re writing unsafe concurrent code, such as having a <code class="language-plaintext highlighter-rouge">&UnsafeCell<T></code> hand out <code class="language-plaintext highlighter-rouge">&T</code> and <code class="language-plaintext highlighter-rouge">&mut T</code>, you need to understand <code class="language-plaintext highlighter-rouge">Send</code> and <code class="language-plaintext highlighter-rouge">Sync</code> at a more fundamental level, to pick the appropriate trait bounds when writing <code class="language-plaintext highlighter-rouge">unsafe impl Send/Sync</code> statements, or add the appropriate <code class="language-plaintext highlighter-rouge">PhantomData<T></code> to your types.</p>
<p>In this article, I will explore the precise behavior of <code class="language-plaintext highlighter-rouge">Send</code> and <code class="language-plaintext highlighter-rouge">Sync</code>, and explain <em>why</em> the standard library’s trait bounds are the way they are.</p>
<h2 id="prior-art">Prior art</h2>
<blockquote>
<p>You can think of Send as “Exclusive access is thread-safe,” and Sync as “Shared access is thread-safe.”</p>
<p><a href="https://www.reddit.com/r/rust/comments/9elom2/why_does_implt_send_for_mut_t_require_t_send/">[Source]</a></p>
</blockquote>
<p>I recommend first reading <a href="https://limpet.net/mbrubeck/2019/02/07/rust-a-unique-perspective.html">“Rust: A unique perspective”</a>. This article gives a conceptual overview of the mechanics (unique and shared references) I will analyze in more depth.</p>
<h2 id="defining-sync-and-send">Defining Sync and Send</h2>
<p><code class="language-plaintext highlighter-rouge">T: Send</code> means <code class="language-plaintext highlighter-rouge">T</code> and <code class="language-plaintext highlighter-rouge">&mut T</code> (which allow dropping <code class="language-plaintext highlighter-rouge">T</code>) can be passed between threads. <code class="language-plaintext highlighter-rouge">T: Sync</code> means <code class="language-plaintext highlighter-rouge">&T</code> (which allows shared/aliased access to <code class="language-plaintext highlighter-rouge">T</code>) can be passed between threads. Either or both may be true for any given type. <code class="language-plaintext highlighter-rouge">T: Sync</code> ≡ <code class="language-plaintext highlighter-rouge">&T: Send</code> (by definition).</p>
<p>One way that <code class="language-plaintext highlighter-rouge">T: !Sync</code> can occur is <strong>if a type has non-atomic interior mutability</strong>. This means that every <code class="language-plaintext highlighter-rouge">&T</code> (there can be more than one) can mutate <code class="language-plaintext highlighter-rouge">T</code> at the same time non-atomically, causing data races if a <code class="language-plaintext highlighter-rouge">&T</code> is sent to another thread. <code class="language-plaintext highlighter-rouge">T: !Sync</code> includes <code class="language-plaintext highlighter-rouge">Cell<V></code> and <code class="language-plaintext highlighter-rouge">RefCell<V></code>, as well as <code class="language-plaintext highlighter-rouge">Rc<V></code> (which acts like <code class="language-plaintext highlighter-rouge">&(Cell<RefCount>, V)</code>).</p>
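<p>A single-threaded sketch of that hazard (the helper function is invented for illustration): both aliases can write to the <code class="language-plaintext highlighter-rouge">Cell</code>, which would be a data race if the aliases lived on different threads.</p>

```rust
use std::cell::Cell;

// Two shared references to the same Cell can both mutate it. That's fine
// on one thread, but would be a data race if `&Cell<i32>` could cross
// threads — which is exactly why Cell<i32> is !Sync.
fn aliased_mutation(cell: &Cell<i32>) -> i32 {
    let (a, b) = (cell, cell); // two aliases, both usable for writes
    a.set(1);
    b.set(a.get() + 1);
    cell.get()
}

fn main() {
    assert_eq!(aliased_mutation(&Cell::new(0)), 2);
}
```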
<p><code class="language-plaintext highlighter-rouge">T: !Send</code> <strong>if a type is bound to the current thread</strong>. Examples:</p>
<ul>
<li><code class="language-plaintext highlighter-rouge">MutexGuard</code>, where the “unlock” syscall must occur on the same thread as “lock”.</li>
<li><code class="language-plaintext highlighter-rouge">&V</code> where <code class="language-plaintext highlighter-rouge">V</code> can be modified non-atomically (only safe from a single thread) through multiple <code class="language-plaintext highlighter-rouge">&V</code> (explained above).</li>
</ul>
<h2 id="primitives">Primitives</h2>
<p>Most primitive types (like <code class="language-plaintext highlighter-rouge">i32</code>) are <code class="language-plaintext highlighter-rouge">Send+Sync</code>. They can be read through shared references (<code class="language-plaintext highlighter-rouge">&</code>) by multiple threads at once (<code class="language-plaintext highlighter-rouge">Sync</code>), and modified through unique references (<code class="language-plaintext highlighter-rouge">&mut</code>) by any one thread at a time (<code class="language-plaintext highlighter-rouge">Send</code>).</p>
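<p>A short demonstration of both properties, using scoped threads (stable since Rust 1.63; the function name here is mine, not from the standard library):</p>

```rust
use std::thread;

// i32 is Sync: many threads may read it through &i32 concurrently.
fn read_from_two_threads(x: &i32) -> i32 {
    thread::scope(|s| {
        let a = s.spawn(|| *x); // &i32 crosses the thread boundary
        let b = s.spawn(|| *x);
        a.join().unwrap() + b.join().unwrap()
    })
}

fn main() {
    let x = 21;
    assert_eq!(read_from_two_threads(&x), 42);
    // i32 is also Send: a brand-new thread can take ownership of it.
    thread::spawn(move || assert_eq!(x, 21)).join().unwrap();
}
```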
<h2 id="owning-references">Owning references</h2>
<p><code class="language-plaintext highlighter-rouge">Box<T></code> and <code class="language-plaintext highlighter-rouge">&mut T</code> give the same access as having a <code class="language-plaintext highlighter-rouge">T</code> directly, so it shares the same Sync/Send status as <code class="language-plaintext highlighter-rouge">T</code>.</p>
<p>(Sidenote) Technically, <code class="language-plaintext highlighter-rouge">&mut T</code> allows swapping the <code class="language-plaintext highlighter-rouge">T</code> for another <code class="language-plaintext highlighter-rouge">T</code> (which cannot panic), but prohibits moving the <code class="language-plaintext highlighter-rouge">T</code> out. This is because moving the <code class="language-plaintext highlighter-rouge">T</code> out would invalidate not only that <code class="language-plaintext highlighter-rouge">&mut T</code>, but also any outer <code class="language-plaintext highlighter-rouge">&mut T</code>s and the <code class="language-plaintext highlighter-rouge">T</code> it was constructed from.</p>
<p>For a demonstration of <code class="language-plaintext highlighter-rouge">&mut T</code>, see <a href="#example-passing-mut-t-send-between-threads">“Example: Passing <code class="language-plaintext highlighter-rouge">&mut (T: Send)</code> between threads”</a> section in this page.</p>
<h3 id="where-these-semantics-are-defined">Where these semantics are defined</h3>
<ul>
<li><a href="https://doc.rust-lang.org/std/primitive.reference.html#impl-Send-1"><code class="language-plaintext highlighter-rouge">impl Send for &mut T where T: Send</code></a></li>
<li><code class="language-plaintext highlighter-rouge">impl Sync for &mut T where T: Sync</code> is not on the page…</li>
<li><a href="https://doc.rust-lang.org/std/boxed/struct.Box.html#impl-Send"><code class="language-plaintext highlighter-rouge">impl Send for Box<T> where T: Send</code></a></li>
<li><a href="https://doc.rust-lang.org/std/boxed/struct.Box.html#impl-Sync"><code class="language-plaintext highlighter-rouge">impl Sync for Box<T> where T: Sync</code></a></li>
</ul>
<h2 id="shared-references">Shared references</h2>
<p>Unlike owning references, many <code class="language-plaintext highlighter-rouge">&T</code> can be created from the same <code class="language-plaintext highlighter-rouge">T</code>. And an unlimited number of <code class="language-plaintext highlighter-rouge">&T</code> and <code class="language-plaintext highlighter-rouge">Rc<T></code> and <code class="language-plaintext highlighter-rouge">Arc<T></code> copies/clones can point to the same <code class="language-plaintext highlighter-rouge">T</code>.</p>
<p>By definition, you can <code class="language-plaintext highlighter-rouge">Send</code> <code class="language-plaintext highlighter-rouge">&T</code> instances to other threads iff <code class="language-plaintext highlighter-rouge">T</code> is <code class="language-plaintext highlighter-rouge">Sync</code>. For example, <code class="language-plaintext highlighter-rouge">&i32</code> is <code class="language-plaintext highlighter-rouge">Send</code> because <code class="language-plaintext highlighter-rouge">i32</code> is <code class="language-plaintext highlighter-rouge">Sync</code>.</p>
<p>Less obvious is that <code class="language-plaintext highlighter-rouge">&T: Sync</code> requires that <code class="language-plaintext highlighter-rouge">T: Sync</code>. Why is this the case?</p>
<ul>
<li>Why must <code class="language-plaintext highlighter-rouge">T</code> be <code class="language-plaintext highlighter-rouge">Sync</code>? We want <code class="language-plaintext highlighter-rouge">&T: Sync</code>. This means <code class="language-plaintext highlighter-rouge">&&T</code> (which is clonable/copyable) is <code class="language-plaintext highlighter-rouge">Send</code>, allowing multiple threads to concurrently obtain <code class="language-plaintext highlighter-rouge">&&T</code> and <code class="language-plaintext highlighter-rouge">&T</code>, which is only legal if <code class="language-plaintext highlighter-rouge">T: Sync</code>.</li>
<li>Why is <code class="language-plaintext highlighter-rouge">&&T: Send</code> legal? Because <code class="language-plaintext highlighter-rouge">&T</code> lacks interior mutability (a <code class="language-plaintext highlighter-rouge">&&T</code> can’t modify the <code class="language-plaintext highlighter-rouge">&T</code> to point to a different <code class="language-plaintext highlighter-rouge">T</code>).</li>
</ul>
<h3 id="sources">Sources</h3>
<ul>
<li><a href="https://doc.rust-lang.org/std/primitive.reference.html#impl-Send"><code class="language-plaintext highlighter-rouge">impl Send for &T where T: Sync</code></a></li>
<li><code class="language-plaintext highlighter-rouge">impl Sync for &T where T: Sync</code> is not on the page…
<ul>
<li>For a demonstration, see the <a href="#example-t-send-or-sync-both-depend-on-t-sync">“Example: <code class="language-plaintext highlighter-rouge">&T: Send or Sync</code> both depend on <code class="language-plaintext highlighter-rouge">T: Sync</code>”</a> section in this page.</li>
</ul>
</li>
</ul>
<h2 id="interior-mutability">Interior mutability</h2>
<p><code class="language-plaintext highlighter-rouge">Cell<i32></code> (and <code class="language-plaintext highlighter-rouge">RefCell<i32></code>) is <code class="language-plaintext highlighter-rouge">!Sync</code> because it has single-threaded <strong>interior mutability</strong>, which translates to multithreaded <strong>data races</strong>.</p>
<p><code class="language-plaintext highlighter-rouge">UnsafeCell<i32></code> is <code class="language-plaintext highlighter-rouge">!Sync</code> to prevent misuse, since only some usages are <code class="language-plaintext highlighter-rouge">Sync</code> and <code class="language-plaintext highlighter-rouge">impl !Sync</code> is unstable. As a result, you need to <code class="language-plaintext highlighter-rouge">unsafe impl Sync</code> (which shows up in grep) if you want concurrent access.</p>
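<p>As a hedged sketch of what such an <code class="language-plaintext highlighter-rouge">unsafe impl Sync</code> might look like (an illustrative toy, not a vetted synchronization primitive; all names are invented): a spinlock-guarded cell whose every access is serialized through an <code class="language-plaintext highlighter-rouge">AtomicBool</code>, making shared access sound even though <code class="language-plaintext highlighter-rouge">UnsafeCell</code> alone is <code class="language-plaintext highlighter-rouge">!Sync</code>.</p>

```rust
use std::cell::UnsafeCell;
use std::sync::atomic::{AtomicBool, Ordering};

pub struct SpinCell<T> {
    locked: AtomicBool,
    value: UnsafeCell<T>,
}

// SAFETY: every access to `value` goes through `with`, which serializes
// access using the atomic flag, so shared access is data-race free.
// T: Send because the lock hands exclusive access between threads.
unsafe impl<T: Send> Sync for SpinCell<T> {}

impl<T> SpinCell<T> {
    pub fn new(value: T) -> Self {
        SpinCell { locked: AtomicBool::new(false), value: UnsafeCell::new(value) }
    }

    pub fn with<R>(&self, f: impl FnOnce(&mut T) -> R) -> R {
        // Spin until we flip the flag from false to true.
        while self.locked.swap(true, Ordering::Acquire) {
            std::hint::spin_loop();
        }
        // SAFETY: the flag guarantees exclusive access while we hold it.
        let r = f(unsafe { &mut *self.value.get() });
        self.locked.store(false, Ordering::Release);
        r
    }
}

fn main() {
    let cell = SpinCell::new(0u32);
    // Sharing &cell across threads requires SpinCell<u32>: Sync,
    // which our unsafe impl provides.
    std::thread::scope(|s| {
        for _ in 0..4 {
            s.spawn(|| {
                for _ in 0..1000 {
                    cell.with(|v| *v += 1);
                }
            });
        }
    });
    assert_eq!(cell.with(|v| *v), 4000);
}
```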
<h2 id="smart-pointers-rct">Smart pointers: <code class="language-plaintext highlighter-rouge">Rc<T></code></h2>
<p><code class="language-plaintext highlighter-rouge">Rc<i32></code> acts like <code class="language-plaintext highlighter-rouge">&(Cell<RefCount>, i32)</code>. It is <code class="language-plaintext highlighter-rouge">!Sync</code> because <code class="language-plaintext highlighter-rouge">Cell<RefCount></code> has <strong>interior mutability</strong> and <strong>data races</strong> on <code class="language-plaintext highlighter-rouge">RefCount</code>, and <code class="language-plaintext highlighter-rouge">!Send</code> because <code class="language-plaintext highlighter-rouge">Rc<T></code> is clonable, acts like a <code class="language-plaintext highlighter-rouge">&Cell<RefCount></code>, and <code class="language-plaintext highlighter-rouge">Cell<RefCount></code> is <code class="language-plaintext highlighter-rouge">!Sync</code>.</p>
<p>(Technically <code class="language-plaintext highlighter-rouge">Rc<i32></code> also acts like <code class="language-plaintext highlighter-rouge">&mut T</code> in its ability to drop <code class="language-plaintext highlighter-rouge">T</code>, but it doesn’t matter because it’s always <code class="language-plaintext highlighter-rouge">!Send</code> and <code class="language-plaintext highlighter-rouge">!Sync</code>.)</p>
<h3 id="sources-1">Sources</h3>
<ul>
<li><a href="https://doc.rust-lang.org/std/rc/struct.Rc.html#impl-Send"><code class="language-plaintext highlighter-rouge">impl<T> !Send for Rc<T></code></a></li>
<li><a href="https://doc.rust-lang.org/std/rc/struct.Rc.html#impl-Sync"><code class="language-plaintext highlighter-rouge">impl<T> !Sync for Rc<T></code></a></li>
</ul>
<h2 id="smart-pointers-arct-atomic-refcounting">Smart pointers: <code class="language-plaintext highlighter-rouge">Arc<T></code> (atomic refcounting)</h2>
<p><code class="language-plaintext highlighter-rouge">Arc<T></code> is a doozy. It acts like <code class="language-plaintext highlighter-rouge">&(Atomic<RefCount>, T)</code> in its ability to alias <code class="language-plaintext highlighter-rouge">T</code>, and <code class="language-plaintext highlighter-rouge">T</code>/<code class="language-plaintext highlighter-rouge">&mut T</code> in its ability to drop or <code class="language-plaintext highlighter-rouge">get_mut</code> or <code class="language-plaintext highlighter-rouge">try_unwrap</code> the <code class="language-plaintext highlighter-rouge">T</code>.</p>
<p>Because <code class="language-plaintext highlighter-rouge">&T</code> can alias, <code class="language-plaintext highlighter-rouge">Arc<T>: Send+Sync</code> requires <code class="language-plaintext highlighter-rouge">T: Sync</code>.</p>
<p>Additionally, <code class="language-plaintext highlighter-rouge">Arc<T>: Send</code> requires <code class="language-plaintext highlighter-rouge">T: Send</code> (because you can move <code class="language-plaintext highlighter-rouge">Arc<T></code> across threads, and <code class="language-plaintext highlighter-rouge">T</code> with it).</p>
<p>And <code class="language-plaintext highlighter-rouge">Arc<T>: Sync</code> requires <code class="language-plaintext highlighter-rouge">T: Send</code>, because if <code class="language-plaintext highlighter-rouge">T: !Send</code> but <code class="language-plaintext highlighter-rouge">Arc<T>: Sync</code>, you could clone the Arc (via <code class="language-plaintext highlighter-rouge">&Arc<T></code>) from another thread, and drop (or <code class="language-plaintext highlighter-rouge">get_mut</code> or <code class="language-plaintext highlighter-rouge">try_unwrap</code>) the clone last, violating <code class="language-plaintext highlighter-rouge">T: !Send</code>.</p>
<p>(<code class="language-plaintext highlighter-rouge">Atomic<RefCount></code> is <code class="language-plaintext highlighter-rouge">Send+Sync</code> and does not contribute to <code class="language-plaintext highlighter-rouge">Arc</code>’s thread safety.)</p>
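<p>A small demonstration of these bounds in action (the helper name is invented): clones of an <code class="language-plaintext highlighter-rouge">Arc</code> can move into plain <code class="language-plaintext highlighter-rouge">thread::spawn</code> threads precisely because <code class="language-plaintext highlighter-rouge">Vec<i32></code> is <code class="language-plaintext highlighter-rouge">Send + Sync</code>.</p>

```rust
use std::sync::Arc;
use std::thread;

// Arc<Vec<i32>> is Send + Sync because Vec<i32> is Send + Sync, so each
// clone can be moved into a non-scoped thread and read there.
fn sum_on_three_threads(shared: &Arc<Vec<i32>>) -> i32 {
    let handles: Vec<_> = (0..3)
        .map(|_| {
            let a = Arc::clone(shared); // bump the atomic refcount
            thread::spawn(move || a.iter().sum::<i32>())
        })
        .collect();
    handles.into_iter().map(|h| h.join().unwrap()).sum()
}

fn main() {
    let shared = Arc::new(vec![1, 2, 3]);
    assert_eq!(sum_on_three_threads(&shared), 18); // 6 per thread, 3 threads
}
```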
<h3 id="sources-2">Sources</h3>
<ul>
<li><a href="https://doc.rust-lang.org/std/sync/struct.Arc.html#impl-Send"><code class="language-plaintext highlighter-rouge">impl<T> Send for Arc<T> where T: Send + Sync</code></a></li>
<li><a href="https://doc.rust-lang.org/std/sync/struct.Arc.html#impl-Sync"><code class="language-plaintext highlighter-rouge">impl<T> Sync for Arc<T> where T: Send + Sync</code></a></li>
</ul>
<p>This was also discussed in a <a href="https://stackoverflow.com/questions/41909811/why-does-arct-require-t-to-be-both-send-and-sync-in-order-to-be-send">Stack Overflow question</a>.</p>
<h2 id="mutexes">Mutexes</h2>
<p><code class="language-plaintext highlighter-rouge">Mutex<T></code> is <code class="language-plaintext highlighter-rouge">Sync</code> even if <code class="language-plaintext highlighter-rouge">T</code> isn’t, because if multiple threads obtain <code class="language-plaintext highlighter-rouge">&Mutex<T></code>, they can’t all obtain <code class="language-plaintext highlighter-rouge">&T</code>.</p>
<p><code class="language-plaintext highlighter-rouge">Mutex<T>: Sync</code> requires <code class="language-plaintext highlighter-rouge">T: Send</code>. We want <code class="language-plaintext highlighter-rouge">&Mutex</code> to be <code class="language-plaintext highlighter-rouge">Send</code>, meaning multiple threads can lock the mutex and obtain a <code class="language-plaintext highlighter-rouge">&mut T</code> (which lets you swap <code class="language-plaintext highlighter-rouge">T</code> and control which thread calls <code class="language-plaintext highlighter-rouge">Drop</code>). To hand-wave, exclusive access to <code class="language-plaintext highlighter-rouge">T</code> gets passed between threads, requiring that <code class="language-plaintext highlighter-rouge">T: Send</code>.</p>
<p><code class="language-plaintext highlighter-rouge">Mutex<T>: Send</code> requires <code class="language-plaintext highlighter-rouge">T: Send</code> because <code class="language-plaintext highlighter-rouge">Mutex</code> is a value type.</p>
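<p>To illustrate (helper name invented): <code class="language-plaintext highlighter-rouge">Cell<i32></code> is <code class="language-plaintext highlighter-rouge">Send + !Sync</code>, yet wrapping it in a <code class="language-plaintext highlighter-rouge">Mutex</code> lets multiple threads mutate it, because only the lock holder can reach the <code class="language-plaintext highlighter-rouge">&Cell</code> inside.</p>

```rust
use std::cell::Cell;
use std::sync::Mutex;
use std::thread;

// Cell<i32> is Send + !Sync, yet Mutex<Cell<i32>> is Sync: only the
// thread currently holding the lock can touch the Cell.
fn increment_from_threads(m: &Mutex<Cell<i32>>, n: i32) {
    thread::scope(|s| {
        for _ in 0..n {
            s.spawn(|| {
                let guard = m.lock().unwrap();
                guard.set(guard.get() + 1); // non-atomic, but serialized
            });
        }
    });
}

fn main() {
    let m = Mutex::new(Cell::new(0));
    increment_from_threads(&m, 4);
    assert_eq!(m.lock().unwrap().get(), 4);
}
```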
<p><code class="language-plaintext highlighter-rouge">MutexGuard<T></code> is <code class="language-plaintext highlighter-rouge">!Send</code> because it’s <strong>bound to the constructing thread</strong> (on some OSes including Windows, you can’t send or exchange “responsibility for freeing a mutex” to another thread). Otherwise it acts like a <code class="language-plaintext highlighter-rouge">&mut T</code>, which is <code class="language-plaintext highlighter-rouge">Sync</code> if T is <code class="language-plaintext highlighter-rouge">Sync</code>. Additionally you can extract a <code class="language-plaintext highlighter-rouge">&mut T</code> (which is <code class="language-plaintext highlighter-rouge">Send</code>) using <code class="language-plaintext highlighter-rouge">&mut *guard</code>.</p>
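<p>A sketch of that last point (helper name invented): the guard itself stays on the locking thread, while the reborrowed <code class="language-plaintext highlighter-rouge">&mut i32</code> crosses into a scoped thread.</p>

```rust
use std::sync::Mutex;
use std::thread;

// MutexGuard is !Send, but the &mut i32 reborrowed from it is Send, so a
// scoped thread can mutate the contents while the guard stays put.
fn bump_on_another_thread(m: &Mutex<i32>) -> i32 {
    let mut guard = m.lock().unwrap();
    let inner: &mut i32 = &mut *guard; // extract the Send reference
    thread::scope(|s| {
        s.spawn(move || *inner += 1); // &mut i32 crosses the boundary
    });
    // The reborrow ended when the scope joined, so the guard is usable.
    *guard
}

fn main() {
    assert_eq!(bump_on_another_thread(&Mutex::new(0)), 1);
}
```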
<h3 id="sources-3">Sources</h3>
<ul>
<li><a href="https://doc.rust-lang.org/std/sync/struct.Mutex.html#impl-Send"><code class="language-plaintext highlighter-rouge">Mutex</code> traits</a></li>
<li><a href="https://doc.rust-lang.org/std/sync/struct.MutexGuard.html#impl-Send"><code class="language-plaintext highlighter-rouge">MutexGuard</code> traits</a></li>
</ul>
<h3 id="contrived-corner-cases">Contrived corner cases</h3>
<p><code class="language-plaintext highlighter-rouge">Mutex<MutexGuard<i32>></code> is <code class="language-plaintext highlighter-rouge">!Sync</code> because <code class="language-plaintext highlighter-rouge">MutexGuard<i32></code> is <code class="language-plaintext highlighter-rouge">!Send</code>.</p>
<h2 id="thoughts-on-trait-bounds-and-flexibility-for-users">Thoughts on trait bounds and flexibility for users</h2>
<p>Why does <code class="language-plaintext highlighter-rouge">Arc<T></code> not have a <code class="language-plaintext highlighter-rouge">where T: Send + Sync</code> trait bound, but instead allows you to construct <code class="language-plaintext highlighter-rouge">Arc<T></code> for any <code class="language-plaintext highlighter-rouge">T</code> (but just not send/share it across threads)?</p>
<p>I’ve heard that you should avoid putting trait bounds in types, but (if I remember correctly) instead in method implementations, or (in the case of <code class="language-plaintext highlighter-rouge">Arc</code>) in conditional <code class="language-plaintext highlighter-rouge">Send</code>/<code class="language-plaintext highlighter-rouge">Sync</code> implementations. One person said:</p>
<blockquote>
<p>The reason the restrictions are usually on the implementations rather than on the type in general is that you don’t usually know every possible implementation.
If you later realize you can add other functionality, you can just add additional impl blocks with different restrictions, whereas if they were on the type you would potentially have to worry about unifying the restrictions (which can be really awkward) or removing them altogether.</p>
</blockquote>
<p>When asking about this topic, I was pointed to the <a href="https://rust-lang.github.io/api-guidelines/about.html">Rust API guidelines</a>, but I couldn’t find any discussion of this issue.</p>
<hr />
<p>I personally encountered this topic when I used an <code class="language-plaintext highlighter-rouge">Arc</code> internally for <a href="https://github.com/nyanpasu64/spectro2/blob/master/flip-cell/src/lib.rs">the <code class="language-plaintext highlighter-rouge">flip-cell</code> crate</a> (which turns out to be equivalent to <a href="https://github.com/Ralith/oddio/blob/main/src/swap.rs">Oddio’s <code class="language-plaintext highlighter-rouge">Swap</code> type</a> and the <a href="https://github.com/HadrienG2/triple-buffer"><code class="language-plaintext highlighter-rouge">triple-buffer</code> crate</a>).</p>
<p><code class="language-plaintext highlighter-rouge">Arc<T>: Sync</code> is only safe if <code class="language-plaintext highlighter-rouge">T: Send</code>, not just <code class="language-plaintext highlighter-rouge">T: Sync</code>; this is because another thread can look at an <code class="language-plaintext highlighter-rouge">&Arc<T></code>, clone it, and obtain an <code class="language-plaintext highlighter-rouge">Arc<T></code> sharing ownership over the same object. But if we create a type <code class="language-plaintext highlighter-rouge">FlipReader<T></code> (<a href="https://github.com/nyanpasu64/spectro2/blob/05561a21d85fc5fc0e8e92140edf01d6b64401bc/flip-cell/src/lib.rs#L188-L201">source</a>) which contains an <code class="language-plaintext highlighter-rouge">Arc<Wrapper<T>></code> but prohibits cloning it, then making <code class="language-plaintext highlighter-rouge">FlipReader<T>: Sync</code> does not allow another thread to take shared ownership of <code class="language-plaintext highlighter-rouge">Wrapper<T></code>, so the <code class="language-plaintext highlighter-rouge">Wrapper<T>: Send</code> trait bound is unnecessary.</p>
<p>Had the struct <code class="language-plaintext highlighter-rouge">Arc<T></code> required <code class="language-plaintext highlighter-rouge">T: Send + Sync</code> to even be constructed, <code class="language-plaintext highlighter-rouge">Arc</code> would be crippled as a building block for unsafe code.</p>
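<p>To make that concrete, here is a minimal hypothetical sketch of the idea (all names are invented here; this is not the real <code class="language-plaintext highlighter-rouge">flip-cell</code> API, and the safety argument holds only because nothing can clone the inner <code class="language-plaintext highlighter-rouge">Arc</code> out):</p>

```rust
use std::sync::Arc;

// Hypothetical sketch: a reader that owns an Arc but never clones it or
// hands it out, so sharing &FlipReader<T> across threads cannot grant
// anyone shared ownership of the T.
pub struct FlipReader<T> {
    shared: Arc<T>,
}

// SAFETY (sketch): a &FlipReader<T> only allows reading the T; the Arc
// can never be cloned out through it, so T: Sync alone suffices — the
// usual T: Send bound on Arc<T>: Sync is not needed here.
unsafe impl<T: Sync> Sync for FlipReader<T> {}

impl<T> FlipReader<T> {
    pub fn new(value: T) -> Self {
        FlipReader { shared: Arc::new(value) }
    }

    pub fn get(&self) -> &T {
        &self.shared
    }
}

fn main() {
    let r = FlipReader::new(7);
    assert_eq!(*r.get(), 7);
}
```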
<h2 id="example-passing-mut-t-send-between-threads">Example: Passing <code class="language-plaintext highlighter-rouge">&mut (T: Send)</code> between threads</h2>
<p>Cell is <code class="language-plaintext highlighter-rouge">Send</code> but not <code class="language-plaintext highlighter-rouge">Sync</code>. Both <code class="language-plaintext highlighter-rouge">Cell</code> and <code class="language-plaintext highlighter-rouge">&mut Cell</code> can be passed between threads. The following code builds as-is, but not if <code class="language-plaintext highlighter-rouge">&mut</code> is changed to <code class="language-plaintext highlighter-rouge">&</code>.</p>
<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">use</span> <span class="nn">std</span><span class="p">::</span><span class="n">thread</span><span class="p">;</span>
<span class="k">use</span> <span class="nn">std</span><span class="p">::</span><span class="nn">cell</span><span class="p">::</span><span class="n">Cell</span><span class="p">;</span>
<span class="k">fn</span> <span class="nf">main</span><span class="p">()</span> <span class="p">{</span>
<span class="c">// Send + !Sync</span>
<span class="k">let</span> <span class="n">cell_ref</span><span class="p">:</span> <span class="o">&</span><span class="k">mut</span> <span class="n">Cell</span><span class="o"><</span><span class="nb">i32</span><span class="o">></span> <span class="o">=</span> <span class="nn">Box</span><span class="p">::</span><span class="nf">leak</span><span class="p">(</span><span class="nn">Box</span><span class="p">::</span><span class="nf">new</span><span class="p">(</span><span class="nn">Cell</span><span class="p">::</span><span class="nf">new</span><span class="p">(</span><span class="mi">0</span><span class="p">)));</span>
<span class="nn">thread</span><span class="p">::</span><span class="nf">spawn</span><span class="p">(</span><span class="k">move</span> <span class="p">||</span> <span class="p">{</span>
<span class="n">cell_ref</span><span class="nf">.replace</span><span class="p">(</span><span class="mi">1</span><span class="p">);</span>
<span class="p">});</span>
<span class="p">}</span>
</code></pre></div></div>
<h2 id="example-t-send-or-sync-both-depend-on-t-sync">Example: <code class="language-plaintext highlighter-rouge">&T: Send or Sync</code> both depend on <code class="language-plaintext highlighter-rouge">T: Sync</code></h2>
<p>If <code class="language-plaintext highlighter-rouge">T: !Sync</code> (for example <code class="language-plaintext highlighter-rouge">Cell</code>), then <code class="language-plaintext highlighter-rouge">&T</code> is neither <code class="language-plaintext highlighter-rouge">Send</code> nor <code class="language-plaintext highlighter-rouge">Sync</code>.</p>
<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">use</span> <span class="nn">std</span><span class="p">::</span><span class="nn">cell</span><span class="p">::</span><span class="n">Cell</span><span class="p">;</span>
<span class="k">fn</span> <span class="n">ensure_sync</span><span class="o"><</span><span class="n">T</span><span class="p">:</span> <span class="n">Sync</span><span class="o">></span><span class="p">(</span><span class="mi">_</span><span class="p">:</span> <span class="n">T</span><span class="p">)</span> <span class="p">{}</span>
<span class="k">fn</span> <span class="n">ensure_send</span><span class="o"><</span><span class="n">T</span><span class="p">:</span> <span class="nb">Send</span><span class="o">></span><span class="p">(</span><span class="mi">_</span><span class="p">:</span> <span class="n">T</span><span class="p">)</span> <span class="p">{}</span>
<span class="k">fn</span> <span class="nf">main</span><span class="p">()</span> <span class="p">{</span>
<span class="k">let</span> <span class="n">foo</span> <span class="o">=</span> <span class="nn">Cell</span><span class="p">::</span><span class="nf">new</span><span class="p">(</span><span class="mi">1i32</span><span class="p">);</span>
<span class="nf">ensure_sync</span><span class="p">(</span><span class="o">&</span><span class="n">foo</span><span class="p">);</span>
<span class="nf">ensure_send</span><span class="p">(</span><span class="o">&</span><span class="n">foo</span><span class="p">);</span>
<span class="p">}</span>
</code></pre></div></div>
<p>Trying to compile this code returns the errors:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Standard Error
Compiling playground v0.0.1 (/playground)
error[E0277]: `Cell<i32>` cannot be shared between threads safely
--> src/main.rs:8:17
|
3 | fn ensure_sync<T: Sync>(_: T) {}
| ---- required by this bound in `ensure_sync`
...
8 | ensure_sync(&foo);
| ^^^^ `Cell<i32>` cannot be shared between threads safely
|
= help: within `&Cell<i32>`, the trait `Sync` is not implemented for `Cell<i32>`
= note: required because it appears within the type `&Cell<i32>`
error[E0277]: `Cell<i32>` cannot be shared between threads safely
--> src/main.rs:9:17
|
4 | fn ensure_send<T: Send>(_: T) {}
| ---- required by this bound in `ensure_send`
...
9 | ensure_send(&foo);
| ^^^^ `Cell<i32>` cannot be shared between threads safely
|
= help: the trait `Sync` is not implemented for `Cell<i32>`
= note: required because of the requirements on the impl of `Send` for `&Cell<i32>`
</code></pre></div></div>
<p>If <code class="language-plaintext highlighter-rouge">T: !Send + Sync</code> (for example <code class="language-plaintext highlighter-rouge">MutexGuard</code>), then <code class="language-plaintext highlighter-rouge">&T</code> is still <code class="language-plaintext highlighter-rouge">Send + Sync</code>. (This makes sense, because <code class="language-plaintext highlighter-rouge">T: !Send</code> only constrains the behavior of a <code class="language-plaintext highlighter-rouge">&mut T</code>, and should not affect the properties of a <code class="language-plaintext highlighter-rouge">&T</code>.)</p>
<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">use</span> <span class="nn">std</span><span class="p">::</span><span class="nn">marker</span><span class="p">::</span><span class="n">PhantomData</span><span class="p">;</span>
<span class="k">use</span> <span class="nn">std</span><span class="p">::</span><span class="nn">sync</span><span class="p">::</span><span class="n">MutexGuard</span><span class="p">;</span>
<span class="k">fn</span> <span class="n">ensure_sync</span><span class="o"><</span><span class="n">T</span><span class="p">:</span> <span class="n">Sync</span><span class="o">></span><span class="p">(</span><span class="mi">_</span><span class="p">:</span> <span class="n">T</span><span class="p">)</span> <span class="p">{}</span>
<span class="k">fn</span> <span class="n">ensure_send</span><span class="o"><</span><span class="n">T</span><span class="p">:</span> <span class="nb">Send</span><span class="o">></span><span class="p">(</span><span class="mi">_</span><span class="p">:</span> <span class="n">T</span><span class="p">)</span> <span class="p">{}</span>
<span class="k">fn</span> <span class="nf">main</span><span class="p">()</span> <span class="p">{</span>
<span class="k">let</span> <span class="n">foo</span> <span class="o">=</span> <span class="nn">PhantomData</span><span class="p">::</span><span class="o"><</span><span class="n">MutexGuard</span><span class="o"><</span><span class="nb">i32</span><span class="o">>></span> <span class="p">{};</span>
<span class="nf">ensure_sync</span><span class="p">(</span><span class="o">&</span><span class="n">foo</span><span class="p">);</span>
<span class="nf">ensure_send</span><span class="p">(</span><span class="o">&</span><span class="n">foo</span><span class="p">);</span>
<span class="p">}</span>
</code></pre></div></div>
<p><em>This blog post was edited on 2021-02-09 to fix minor errors and clarify <code class="language-plaintext highlighter-rouge">Rc<V></code>.</em></p>nyanpasu64Rust’s concurrency safety is based around the Send and Sync traits. For people writing safe code, you don’t really need to understand these traits on a deep level, only enough to satisfy the compiler when it spits errors at you (or switch from std threads to Crossbeam scoped threads to make errors go away). However if you’re writing unsafe concurrent code, such as having a &UnsafeCell<T> hand out &T and &mut T, you need to understand Send and Sync at a more fundamental level, to pick the appropriate trait bounds when writing unsafe impl Send/Sync statements, or add the appropriate PhantomData<T> to your types.ExoTracker Issues - Abandoning the grid2020-10-10T19:16:00-07:002020-10-10T19:16:00-07:00https://nyanpasu64.github.io/blog/exotracker-issues-abandoning-the-grid<p>Trackers have decades of design, with interlocking features and design decisions, many based on the assumption that every event is quantized on a grid:</p>
<ul>
<li>You don’t need lines above events, since it’s obvious which row the event is in.
<ul>
<li>In regular trackers, events are treated as taking up height. Most events are triggered when the cursor enters them, but pattern-jump effects are triggered when the cursor exits them.</li>
</ul>
</li>
<li>Empty grid cells have dashes in them, to indicate an empty slot belonging to a subcolumn (note/instrument/volume/effect…) and row. (I refer to channels as columns, for historical reasons and because Renoise uses the same terminology.)</li>
</ul>
<p>ExoTracker’s central feature is that events/notes no longer have to be quantized to a grid, but are stored as rational numbers (fractions) in terms of “beats”. Events are mapped onto rows using a zoom level (rows per beat), and changing the zoom level causes notes to fall onto or off gridlines.</p>
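<p>A minimal sketch of this mapping in Python (hypothetical names; ExoTracker itself is C++): an event lands on a gridline only when its beat fraction times the zoom level works out to an integer.</p>

```python
from fractions import Fraction

def event_row(beat, rows_per_beat):
    """Map a beat timestamp to a row index, or None if the event
    falls between rows at this zoom level. (Hypothetical sketch,
    not ExoTracker's actual code.)"""
    row = Fraction(beat) * rows_per_beat
    return int(row) if row.denominator == 1 else None

# A triplet note at beat 1/3 is off-grid at 4 rows/beat,
# but lands on row 1 at 3 rows/beat.
print(event_row(Fraction(1, 3), 4))  # None
print(event_row(Fraction(1, 3), 3))  # 1
```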
<p>(tl;dr: Skip forwards to <a href="#per-digit-cursors">“Per-digit cursors”</a> where the problems start.)</p>
<figure class="image">
<img srcset="exotracker_full_subcolumn_cursor.png 1.25x" src="exotracker_full_subcolumn_cursor.png" alt="In ExoTracker, each subcolumn has a different background color. The instrument subcolumn has two digits plus padding. The cursor is located in the instrument subcolumn, and takes up both digits plus padding." />
<figcaption>In ExoTracker, each subcolumn has a different background color. The instrument subcolumn has two digits plus padding. The cursor is located in the instrument subcolumn, and takes up both digits plus padding.</figcaption>
</figure>
<p>As a result, many tracker conventions must be adapted to work with off-grid notes.</p>
<ul>
<li>Closely spaced notes/events can cause text to draw over other text. I programmed lower events to erase the text drawn by upper events.</li>
<li>The “events have height” model breaks down when off-grid events exist and you can change the zoom level. So I draw a line above every note, representing the instant in time the event occurs at (which I find easier to reason about than 1-row-tall events).
<ul>
<li>Initially, for each event, I drew a line across all subcolumns (even empty ones), but that became confusing when combined with closely spaced events erasing text in only occupied subcolumns (not empty ones).</li>
<li>So now only occupied subcolumns have lines.</li>
</ul>
</li>
<li>Pressing Delete 4 times will no longer delete all notes in the next 4 rows, because it won’t erase events between rows.
<ul>
<li>I felt that the alternative, where pressing Delete also deletes notes in between rows, would be more confusing and inflexible for users.</li>
</ul>
</li>
<li>So I made “cursor step” snap to both the grid and off-grid notes… which is actually rather janky and surprising.
<ul>
<li>The issue is that my code snaps to all events in the channel, not just events with a non-empty value in the cursor X position’s subcolumn. Changing this would complicate the code.</li>
<li>To alleviate this, I turned off off-grid event snapping, and added a mode where pressing Delete (or entering a value) steps to the next event, regardless of the cursor’s X subcolumn. This is useful for deleting many events, adding or changing instruments, etc.</li>
</ul>
</li>
<li>Trackers usually have gaps between note/event/volume/effect, so I add padding around each subcolumn.</li>
<li>If empty grid cells are drawn with dashes, then on-grid notes will erase the dash. But it’s not clear if off-grid or triplet notes should erase the dashes or not. I chose to replace dashes with horizontal gridlines. But they don’t indicate subcolumns, so I add colored backgrounds.
<ul>
<li>One alternative would be to draw dashes before drawing events, and simply erase dashes whenever text overlaps them. This looks fine with on-grid events, but with off-grid events, you can end up with partial dashes which might look a bit ugly.
<ul>
<li><strong>In retrospect, this was probably a better idea, since colored backgrounds proved to be a disaster.</strong></li>
</ul>
</li>
</ul>
</li>
</ul>
<h2 id="per-digit-cursors">Per-digit cursors</h2>
<p>Everything was more-or-less working, until I added per-digit cursors and the setup broke down. Suddenly the cursor width became highly inconsistent; the cursor is wider when you place it in the leftmost or rightmost digit of a subcolumn.</p>
<figure class="image">
<img srcset="exotracker_cursor_positions.gif 1.25x" src="exotracker_cursor_positions.gif" alt="The effect subcolumn has three positions, or one character plus two digits. The cursor is wider in the left and right subcolumn, and narrower in the center." />
<figcaption>The effect subcolumn has three positions, or one character plus two digits. The cursor is wider in the left and right subcolumn, and narrower in the center.</figcaption>
</figure>
<p>So I can make the cursor narrower… now there’s a gap around the cursor. And several people have said the gap looks very weird, so this isn’t a good solution.</p>
<figure class="image">
<img srcset="exotracker_gap_around_cursor.png 1.25x" src="exotracker_gap_around_cursor.png" alt="There is a gap between the left border of the subcolumn's background, and the cursor's left boundary. The same issue occurs with the right of each subcolumn." />
<figcaption>There is a gap between the left border of the subcolumn's background, and the cursor's left boundary. The same issue occurs with the right of each subcolumn.</figcaption>
</figure>
<figure class="image">
<img srcset="exotracker_gap_around_cursor_2.png 1.25x" src="exotracker_gap_around_cursor_2.png" alt="There is a gap between the note subcolumn's boundaries and the cursor's boundaries." />
<figcaption>There is a gap between the note subcolumn's boundaries and the cursor's boundaries.</figcaption>
</figure>
<p>This is actually what FamiTracker does too, but it doesn’t look ugly because there aren’t background stripes.</p>
<figure class="image">
<img srcset="famitracker_cursor_positions.gif 1.25x" src="famitracker_cursor_positions.gif" alt="I screen-recorded all cursor positions of FamiTracker into an animated GIF, and overlayed the frames into a single screenshot. There are gaps to the left and right of each subcolumn (note, instrument, volume, and effect). The composite screenshot is ugly." />
<figcaption>I screen-recorded all cursor positions of FamiTracker into an animated GIF, and overlayed the frames into a single screenshot. There are gaps to the left and right of each subcolumn (note, instrument, volume, and effect). The composite screenshot is ugly.</figcaption>
</figure>
<p>So do I make the stripes narrower?</p>
<figure class="image">
<img srcset="exotracker_narrow_subcolumn_background.png 1.25x" src="exotracker_narrow_subcolumn_background.png" alt="The background color is confined to the width filled by text and selectable by the cursor. There are gray stripes around each subcolumn. It's still ugly." />
<figcaption>The background color is confined to the width filled by text and selectable by the cursor. There are gray stripes around each subcolumn. It's still ugly.</figcaption>
</figure>
<p>Get rid of the dividers too?</p>
<figure class="image">
<img srcset="no_dividers.png 1.25x" src="no_dividers.png" alt="Now the subcolumn dividers are gone, replaced by gray stripes. It looks weird to have middle gaps wider than side gaps." />
<figcaption>Now the subcolumn dividers are gone, replaced by gray stripes. It looks weird to have middle gaps wider than side gaps.</figcaption>
</figure>
<p>So I can make them the same width… but then either the note lines overlap, or they have gaps, or the widths are inconsistent. This will cause problems with mouse handling as well.</p>
<figure class="image">
<img srcset="no_dividers_rearranged.png 1.25x" src="no_dividers_rearranged.png" alt="The gray lines are rearranged, such that the width between subcolumns equals the width between a subcolumn and a channel divider. The note lines overlap." />
<figcaption>The gray lines are rearranged, such that the width between subcolumns equals the width between a subcolumn and a channel divider. The note lines overlap.</figcaption>
</figure>
<h2 id="a-solution">A solution?</h2>
<p>At this point, I feel that background stripes to indicate subcolumn boundaries, and cursors which occupy digits within a multi-digit subcolumn, are simply incompatible.</p>
<p>If I want to keep that latter feature, I could remove background stripes and replace them with dashes, which are erased whenever text overlaps them. This looks fine with on-grid events, but with off-grid events, you can end up with partial dashes which might look a bit ugly.</p>nyanpasu64Trackers have decades of design, with interlocking features and design decisions, many based on the assumption that every event is quantized on a grid:ExoTracker Newsletter #12020-08-26T01:46:00-07:002020-08-26T01:46:00-07:00https://nyanpasu64.github.io/blog/exotracker-newsletter-1<p>I just finished implementing timeline entry editing. Since I have school coming up, I decided to release a demo of its current state. Since my summary was getting a bit too long to post in Discord, I decided to write a blog post / newsletter.</p>
<h2 id="demo-download">Demo download</h2>
<p>Windows 64-bit: <a href="https://ci.appveyor.com/api/buildjobs/3e0uv6sxov74g50d/artifacts/exotracker-v1.0.60-dev.7z">https://ci.appveyor.com/api/buildjobs/3e0uv6sxov74g50d/artifacts/exotracker-v1.0.60-dev.7z</a></p>
<p>Source: <a href="https://gitlab.com/nyanpasu64/exotracker-cpp/-/tree/timeline-editor">https://gitlab.com/nyanpasu64/exotracker-cpp/-/tree/timeline-editor</a> (currently commit d5386ea0). It only compiles with recent GCC and Clang (only Clang 10 has been tested), due to its use of statement expressions.</p>
<h2 id="demo-notes">Demo notes</h2>
<ul>
<li><strong>Press Space to enable note entry, and Enter to play.</strong> Unfortunately, note preview is not supported yet.</li>
<li>ExoTracker uses a FamiTracker-style piano layout.</li>
<li>Only Famicom/NES APU1 is supported. Some demo songs have dual APU1 which can be used for composing.</li>
<li>Notes, instruments, and volumes are supported. Effects are not.</li>
<li>Try passing in names of sample documents as command-line arguments. Listed in order from most to least useful:
<ul>
<li>Partial songs: <code class="language-plaintext highlighter-rouge">dream-fragments</code>, <code class="language-plaintext highlighter-rouge">world-revolution</code> (default song)</li>
<li><code class="language-plaintext highlighter-rouge">empty</code> (add your own notes)</li>
<li><code class="language-plaintext highlighter-rouge">audio-test</code> (dual APU1) (sounds bad, but useful for finding audio stuttering)</li>
<li><code class="language-plaintext highlighter-rouge">block-test</code> (dual APU1) (rendering test for block system, no notes or events)</li>
<li><code class="language-plaintext highlighter-rouge">render-test</code> (sounds bad, negative octave text is too wide for the screen)</li>
</ul>
</li>
<li>Some sample documents have short and/or looped blocks (the gray rectangles to the left of each channel), which are not possible in most other trackers (I don’t know if LSDj and C64 trackers support this). But right now, users can only create full-grid blocks, and cannot delete blocks.
<ul>
<li>The block system is powerful, but unfortunately not editable through the UI yet, so you can’t try it out to see how useful it is.</li>
<li>Pattern reuse is not implemented.</li>
</ul>
</li>
<li>All edits are undoable. Some but not all timeline edits save cursor position.</li>
<li>There are buttons for reordering timeline entries. They’re supposed to have icons instead of text, but icons are only available on my machine.</li>
<li>The actual timeline widget (list of rows) is unfinished and will be replaced with a custom-drawn widget.</li>
<li>The audio code will lock up if you decrease the timeline row length until a block has a negative length 😉</li>
</ul>
<h2 id="timeline-system-overview">Timeline system overview</h2>
<p><strong>tl;dr skip forward to “Demo feedback” if you want to just play with the program instead of reading documentation.</strong></p>
<p>The frame/order editor is replaced with a timeline editor, and its functionality is changed significantly.</p>
<p>The pattern grid structure from existing trackers is carried over (under the name of timeline rows and grid cells). Each timeline row has its own length which can vary between rows (like OpenMPT, unlike FamiTracker). Each timeline row holds one timeline cell (or grid cell) per channel. However, unlike patterns, timeline cells do not contain events directly, but through several layers of indirection.</p>
<p>A timeline cell can hold zero or more blocks, which carry a start and end time (in integer beats) and a pattern. These blocks have nonzero length, do not overlap in time, occur in increasing time order, and lie between 0 and the timeline cell’s length (the last block’s end time can take on a special value corresponding to “end of cell”)[1].</p>
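<p>The invariants on a cell’s block list can be captured in a few lines. A hedged Python sketch (not ExoTracker’s actual code, and it ignores the special “end of cell” value), checking nonzero length, increasing order, no overlap, and containment in <code class="language-plaintext highlighter-rouge">[0, cell_length]</code>:</p>

```python
def blocks_valid(blocks, cell_length):
    """blocks: list of (start, end) pairs in integer beats.
    Checks: nonzero length, sorted, non-overlapping, within the cell."""
    prev_end = 0
    for start, end in blocks:
        if not (prev_end <= start < end <= cell_length):
            return False
        prev_end = end
    return True

print(blocks_valid([(0, 2), (2, 4)], 4))  # True
print(blocks_valid([(0, 3), (2, 4)], 4))  # False: blocks overlap
```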
<p>Each block contains a single pattern, consisting of a list of events and an optional loop duration (in integer beats). The pattern starts playing when absolute time reaches the block’s start time, and stops playing when absolute time reaches the block’s end time. If the loop duration is set, whenever relative time (within the pattern) reaches the loop duration, playback jumps back to the pattern’s beginning. A block can cut off a pattern’s events early when time reaches the block’s end time (either during the pattern’s initial play or during a loop). However, a block cannot start playback partway into a pattern (no plans to add support yet).</p>
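<p>As I understand the looping rule, mapping absolute time to pattern-relative time works out to a modulo when a loop duration is set. A rough Python sketch with hypothetical names:</p>

```python
def pattern_time(now, block_start, block_end, loop_duration=None):
    """Return pattern-relative time at absolute time `now`,
    or None when the block is not playing. (Hypothetical sketch.)"""
    if not (block_start <= now < block_end):
        return None
    rel = now - block_start
    if loop_duration is not None:
        rel %= loop_duration  # jump back to the pattern's beginning
    return rel

print(pattern_time(9, 2, 10, loop_duration=4))  # 3: partway into the second loop
print(pattern_time(11, 2, 10))                  # None: past the block's end
```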
<p>Eventually, patterns can be reused in multiple blocks at different times (and possibly different channels).</p>
<p>[1] I’m not sure what to do if a user shrinks a timeline row, which causes a numeric-end block to end past the cell, or an “end of cell” block to have a size ≤ 0, etc.</p>
<h3 id="motivation">Motivation</h3>
<p>The timeline system is intended to allow treating the program like FamiStudio or a tracker, with timestamps encoded relative to current pattern/frame begin, and reuse at pattern-level granularity. If you try to enter a note/volume/effect in a region without a block in place, a block is automatically created in the current channel, filling all empty space available (up to an entire grid cell) (not implemented yet).</p>
<p>It is also intended to have a similar degree of flexibility as a DAW like Reaper (fine-grained block splitting and looping). The tradeoff is that because global timestamps are relative to grid cell begin, blocks are not allowed to cross grid cell boundaries (otherwise it would be painful to convert between block/pattern-relative and global timestamps).</p>
<h2 id="unresolved-questions">Unresolved questions</h2>
<ul>
<li>Are the gray block rectangles (to the left of each channel) ugly? I’m planning to use those to allow dragging patterns around, resizing them, and distinguishing reused patterns through color.</li>
<li>Should I rename the timeline to something else?
<ul>
<li>Sequence?</li>
<li>Order? (I feel it’s bad because “order” implies every entry is merely the ID of a single pattern, but in reality each entry is a container for 0 or more loopable patterns.)</li>
<li>OpenMPT has an “order list” widget to edit a “sequence” of patterns; it uses two names for similar concepts.</li>
</ul>
</li>
<li>What should I call a row/unit in the timeline editor? It’s treated as a coarse unit of time and a container for blocks/patterns, but is not a pattern.
<ul>
<li>Grid row? Timeline/sequence row? (I keep confusing “timeline row” with “pattern row”. “Grid” is concise, but I don’t know if it’s an unintuitive name.)</li>
<li>Segment?</li>
<li>(Timeline/sequence) entry?</li>
<li>Cell? (currently used for “one row, one channel”)</li>
</ul>
</li>
<li>How should I improve my current behavior when adding and deleting timeline entries?
<ul>
<li>Should adding a new timeline entry move the cursor to the new entry’s row 0? Should undoing move the cursor to the same spot, move it to the old location, or leave the cursor in place?</li>
<li>Should deleting a timeline entry move the cursor to the former entry’s row 0? Should undoing move the cursor to the same spot, move it to the old location, or leave the cursor in place?</li>
<li>Eventually I’ll add the ability to right-click and add/delete timeline entries other than the one the cursor is in. Should the cursor move to the right-clicked entry, and stay there after undoing?</li>
</ul>
</li>
<li>Once the timeline widget is implemented, what should it show?
<ul>
<li>Titles for each sequence entry?</li>
<li>“Pattern overview” with coarse-grained visualizations of blocks (gray for unique blocks, colored for shared)? Should shared patterns have numbers? Names? Should all patterns have numbers (an idea I’m not a fan of)?</li>
<li>Draw both (and somehow try to find enough room for both)?</li>
<li>0CC has bookmarks and highlights, but doesn’t show all names.</li>
</ul>
</li>
</ul>
<h2 id="feedback">Feedback</h2>
<p>If you find any crash bugs, let me know. (Some tricky-to-get-right areas were deleting the last row in the timeline, or deleting a long row and the cursor moves into the next, shorter, row.)</p>
<p>If you have any UI or behavior suggestions, tell me too. (I personally think I got the code reasonably watertight, but the UI behavior is a toss-up and I have no clue how people will react.)</p>
<p>You can report issues at <a href="https://gitlab.com/nyanpasu64/exotracker-cpp/-/issues">https://gitlab.com/nyanpasu64/exotracker-cpp/-/issues</a>.</p>nyanpasu64I just finished implementing timeline entry editing. Since I have school coming up, I decided to release a demo of its current state. Since my summary was getting a bit too long to post in Discord, I decided to write a blog post / newsletter.Describing convolution using item-based indexing and inclusive ranges2020-05-08T13:54:00-07:002020-05-08T13:54:00-07:00https://nyanpasu64.github.io/blog/describing-convolution-using-item-based-indexing-and-inclusive-ranges<p><em>This is a follow-up to my previous post, <a href="../the-gridline-mental-model-of-indexing-and-slicing">“The gridline mental model of indexing and slicing”</a>. I split this out because it’s related to DSP as well as programming, and may not be as interesting to the broader programming audience.</em></p>
<hr />
<p>In some cases, it’s useful to think of array indices as pointing to individual items (not fenceposts), and represent sets of items using inclusive ranges. For example, in DSP (digital signal processing), “signals” are effectively arrays of samples (AKA amplitudes) at signed-integer indices. If you have a signal of length N starting at index 0, you can treat it as an infinite signal that’s only nonzero at indices <code class="language-plaintext highlighter-rouge">[0, N-1]</code> inclusive. All other indices (negative indices, and indices N and above) have a value of zero.</p>
<p>Convolution is a process where you “spread out” each nonzero element in a signal by an “impulse response”. One example of convolution is taking a picture with a shaky or defocused camera, where we assume all objects in the image are distorted or defocused equally.</p>
<p>Every point of light is smudged into a blob or streak. If you assume the point of light starts at an “original position”, the blob or streak is an image (two-dimensional signal) which maps positions (relative to the point of light) onto intensities. This signal is known as an “impulse response”. Every object gets “smudged” by that impulse response (blob or streak). This process of “smudging” is convolving the image by the impulse response.</p>
<p>Convolution also applies to 1-dimensional signals like audio. Filtering or adding reverb to audio is convolving the signal by an impulse response (which is the result of sending a short impulse or pop through the filter/reverb).</p>
<p>If you convolve (or smudge) a one-dimensional signal of length <code class="language-plaintext highlighter-rouge">L</code> (which can only be nonzero at indices <code class="language-plaintext highlighter-rouge">[0, L-1]</code>) by an impulse response of length <code class="language-plaintext highlighter-rouge">P</code> (which can only be nonzero at indices <code class="language-plaintext highlighter-rouge">[0, P-1]</code>), the resulting signal can only be nonzero at indices <code class="language-plaintext highlighter-rouge">[0, (L-1) + (P-1)]</code> or <code class="language-plaintext highlighter-rouge">[0, L+P-2]</code>, and will have length <code class="language-plaintext highlighter-rouge">L+P-1</code>. (I think this formula also generalizes to two or more dimensions!)</p>
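<p>The support and length formulas are easy to check against a direct (naive) convolution. A sketch in Python, treating signals as lists indexed from 0:</p>

```python
def convolve(x, h):
    """Direct convolution: each sample of x is spread out by h."""
    y = [0] * (len(x) + len(h) - 1)  # length L + P - 1
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y

x = [1, 0, 0, 2]   # L = 4, support [0, 3]
h = [1, 1]         # P = 2, support [0, 1]
print(convolve(x, h))       # [1, 1, 0, 2, 2]
print(len(convolve(x, h)))  # 5 = L + P - 1
```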
<h2 id="inclusive-ranges-in-block-convolution-very-technical">Inclusive ranges in block convolution (very technical)</h2>
<p>Convolving a long signal by a short kernel (aka impulse response) of length <code class="language-plaintext highlighter-rouge">P</code> is often faster if you split the long signal into chunks, then compute the output in blocks of length <code class="language-plaintext highlighter-rouge">L</code>. This is because the FFT has runtime O(N log N), and one large FFT can be slower than many smaller FFTs.</p>
<p>One method is overlap-add convolution. If you break a long signal into blocks of length <code class="language-plaintext highlighter-rouge">L</code>, each block is only nonzero between <code class="language-plaintext highlighter-rouge">[0, L-1]</code>. And if your filter kernel has length <code class="language-plaintext highlighter-rouge">P</code> and starts at index 0, it’s only nonzero between <code class="language-plaintext highlighter-rouge">[0, P-1]</code>. And if you convolve signals of length <code class="language-plaintext highlighter-rouge">L</code> and <code class="language-plaintext highlighter-rouge">P</code>, the largest index with nonzero amplitude is <code class="language-plaintext highlighter-rouge">L+P-2</code>. And the resulting signal has support <code class="language-plaintext highlighter-rouge">[0, L+P-2]</code> and length <code class="language-plaintext highlighter-rouge">L+P-1</code>.</p>
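<p>A toy overlap-add sketch in Python (hypothetical names, and a nested-loop convolution standing in for the FFTs a real implementation would use): each length-<code class="language-plaintext highlighter-rouge">L</code> block is convolved into a length-<code class="language-plaintext highlighter-rouge">L+P-1</code> chunk, and overlapping chunks are summed.</p>

```python
def convolve(x, h):
    """Direct convolution (stands in for an FFT-based one)."""
    y = [0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y

def overlap_add(x, h, L):
    """Convolve x with h by splitting x into blocks of length L."""
    y = [0] * (len(x) + len(h) - 1)
    for start in range(0, len(x), L):
        chunk = convolve(x[start:start + L], h)  # length <= L + P - 1
        for k, v in enumerate(chunk):
            y[start + k] += v  # adjacent chunks overlap by up to P - 1 samples
    return y

x, h = [1, 2, 3, 4, 5], [1, 1, 1]
print(overlap_add(x, h, L=2) == convolve(x, h))  # True
```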
<p>Another example is overlap-save convolution. If you pick a block of <code class="language-plaintext highlighter-rouge">L+P-1</code> samples from the input signal, the <em>beginning</em> of each convolution output is corrupted and must be discarded. In particular, samples before the block’s start (at indices ≤ <code class="language-plaintext highlighter-rouge">-1</code>) are spread out by a filter kernel of support <code class="language-plaintext highlighter-rouge">[0, P-1]</code>, corrupting outputs <code class="language-plaintext highlighter-rouge">[0, P-2]</code> and forcing you to discard the first <code class="language-plaintext highlighter-rouge">P-1</code> samples. If you don’t prepend zeros to the input signal, overlap-save convolution loses the first <code class="language-plaintext highlighter-rouge">P-1</code> samples of the input/output.</p>
<h2 id="credits">Credits</h2>
<p>Thanks to ax6 for helping me edit this article.</p>nyanpasu64This is a follow-up to my previous post, “The gridline mental model of indexing and slicing”. I split this out because it’s related to DSP as well as programming, and may not be as interesting to the broader programming audience.The gridline mental model of indexing and slicing2020-05-08T13:26:00-07:002020-05-08T13:26:00-07:00https://nyanpasu64.github.io/blog/the-gridline-mental-model-of-indexing-and-slicing<p><em>Republished from my <a href="https://gist.github.com/nyanpasu64/c01e50ad97b1a92ccea374c3f941dd93#file-index-md">Github gist</a>.</em></p>
<p>Integer indexes can either represent fenceposts (gridlines) or item pointers, and there’s a sort of duality.</p>
<h2 id="mental-model-gridline-based-asymmetric-indexing">Mental model: Gridline-based “asymmetric indexing”</h2>
<p>Memory or data is treated as a “pool of memory”. Pointers and indices do not refer to <em>elements</em>, but <em>gaps between elements</em> (in other words, fenceposts or gridlines). This is the same way I think about wall clocks and musical time subdivision, where time is continuous and timestamps refer to <em>instants</em> which separate regions of time.</p>
<p>In C and Python, array indexing can be interpreted via a mental model of gridlines. If <code class="language-plaintext highlighter-rouge">a</code> is an array holding elements, then <code class="language-plaintext highlighter-rouge">a[x]</code> is the element after gridline <code class="language-plaintext highlighter-rouge">x</code>. I call this “asymmetric indexing” (since every pointer refers to memory lying on the right side of it), but it’s a useful convention. In C, if the array <code class="language-plaintext highlighter-rouge">a</code> holds elements of size <code class="language-plaintext highlighter-rouge">s</code>, <code class="language-plaintext highlighter-rouge">a[x]</code> occupies bytes from <code class="language-plaintext highlighter-rouge">(byte*)(a) + s*x</code> up until <code class="language-plaintext highlighter-rouge">(byte*)(a) + s*(x+1)</code>.</p>
<p>In Python, if <code class="language-plaintext highlighter-rouge">x</code> is a list (actually a resizable contiguous array), <code class="language-plaintext highlighter-rouge">x[0]</code> is the first element (after gridline 0), and <code class="language-plaintext highlighter-rouge">x[-1]</code> is the last element, 1 before the end (after gridline <code class="language-plaintext highlighter-rouge">len(x)-1</code>). This behavior matches a subset of modular arithmetic.</p>
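<p>For example, Python’s negative indices behave like ordinary indices reduced modulo the list length:</p>

```python
x = [10, 20, 30]
assert x[-1] == x[len(x) - 1] == 30  # x[-k] is x[len(x) - k]
assert x[-3] == x[0] == 10
# Unlike full modular arithmetic, indices outside [-len(x), len(x))
# raise IndexError: x[3] and x[-4] are both errors.
```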
<p>In C++, <code class="language-plaintext highlighter-rouge">iterator</code> and <code class="language-plaintext highlighter-rouge">reverse_iterator</code> both point to fenceposts between items. An array can have valid iterators or reverse iterators pointing to “before the first element”, “after the last element”, or anywhere in between.</p>
<p>Dereferencing a forward iterator accesses the element <em>after</em> the gridline, much like <code class="language-plaintext highlighter-rouge">*ptr</code> with a raw pointer. However, dereferencing a reverse iterator accesses the element <em>before</em> the gridline, which compiles to <code class="language-plaintext highlighter-rouge">*(ptr - 1)</code>. As a result, <code class="language-plaintext highlighter-rouge">reverse_iterator</code> appears to be slightly slower on actual CPUs: <a href="https://stackoverflow.com/a/2549554">https://stackoverflow.com/a/2549554</a>.</p>
<p>cppreference.com has a diagram attempting to explain <code class="language-plaintext highlighter-rouge">reverse_iterator</code>:</p>
<p><img src="http://upload.cppreference.com/mwiki/images/3/39/range-rbegin-rend.svg" alt="cppreference.com reverse_iterator diagram" /></p>
<p>I think the diagram is badly designed and unnecessarily confusing, with two arrows coming from the top of the diagram, and two pictures of the array offset by one element. It’s technically not wrong, but it assumes that pointers point to <em>objects</em>, not <em>fenceposts</em>, which is a very inelegant mental model for this purpose.</p>
<h2 id="alternative-mental-model-item-based-indexing">Alternative mental model: Item-based indexing</h2>
<p>In pure math and DSP, and at high levels of abstraction, you can instead treat each item as an indivisible entity, rather than occupying a region of memory bounded between 2 endpoints. Then indexing points to an object, not an address or gridline in memory. In this mental model, slicing behaves quite differently.</p>
<p>You can choose to index from 0 or 1. Indexing from 0 or 1 is somewhat orthogonal to gridline-based or item-based indexing. However, most gridline-based languages index from 0, and many item-based languages index from 1.</p>
<p>The R language operates under this mental model. Much like mathematical notation, indexes begin at 1, and ranges of items <code class="language-plaintext highlighter-rouge">a[1:5]</code> are inclusive on both ends. In fact, <code class="language-plaintext highlighter-rouge">1:5</code> generates a vector of integers <code class="language-plaintext highlighter-rouge">1 2 3 4 5</code>.</p>
<p>The item-based mental model (with inclusive ranges) is useful in some cases, for example in DSP. However I moved that to a separate article, <a href="../describing-convolution-using-item-based-indexing-and-inclusive-ranges">“Describing convolution using item-based indexing and inclusive ranges”</a>, since it’s not closely related to indexing.</p>
<h2 id="gridline-based-slicing-and-closed-closed-indexing">Gridline-based slicing and closed-closed indexing</h2>
<p>Assume you have an array <code class="language-plaintext highlighter-rouge">arr</code> with <code class="language-plaintext highlighter-rouge">N</code> elements.</p>
<p>For a region between gridlines <code class="language-plaintext highlighter-rouge">a ≤ b</code> to be valid, <code class="language-plaintext highlighter-rouge">a ≥ arr</code> and <code class="language-plaintext highlighter-rouge">b ≤ arr + N</code>. Note that <code class="language-plaintext highlighter-rouge">b</code> (and also <code class="language-plaintext highlighter-rouge">a</code>) is allowed to be equal to the final gridline, which is a perfectly valid gridline! The only reason people consider it “out of bounds” or “past the end of the array” is because it has no element to its right (cannot be used for asymmetric indexing).</p>
<p>What are the valid indices into an array of length N, treating the first element as 0?</p>
<ul>
<li>Conventional wisdom holds that valid array indices lie in a closed-open range.</li>
<li>begin ∈ [0..N) since element 0 is valid, but element N is past the end of the array.</li>
</ul>
<p>Another approach is to model “valid array indices” as a special case of “valid array slices”, where the slice is of length 1. Under this approach, valid indices lie within a “closed-closed” inclusive range.</p>
<p>What are the valid starting indices of length-1 regions within an array?</p>
<ul>
<li>begin ≥ 0, otherwise the start of the region will lie outside the array.</li>
<li>begin + 1 ≤ N, otherwise the end of the region will lie outside the array.</li>
<li>begin ∈ [0..N-1]</li>
</ul>
<p>What are the valid starting indices of length-2 regions within an array?</p>
<ul>
<li>begin ≥ 0, otherwise the start of the region will lie outside the array.</li>
<li>begin + 2 ≤ N, otherwise the end of the region will lie outside the array.</li>
<li>begin ∈ [0..N-2]</li>
</ul>
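<p>The pattern above can be checked in a few lines of Python (the helper name <code class="language-plaintext highlighter-rouge">valid_starts</code> is made up for illustration):</p>

```python
def valid_starts(N, k):
    """All begin indices where a length-k slice [begin, begin+k) fits
    in a length-N array. The bounds begin >= 0 and begin + k <= N
    give the inclusive range [0 .. N-k]."""
    return list(range(0, N - k + 1))

# Length-1 slices into a length-5 array: begin ∈ [0..4]
assert valid_starts(5, 1) == [0, 1, 2, 3, 4]
# Length-2 slices: begin ∈ [0..3]
assert valid_starts(5, 2) == [0, 1, 2, 3]
# A length-N slice has exactly one valid start
assert valid_starts(5, 5) == [0]
```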
<p>In summary, obtaining indexes from inclusive ranges is good because “valid indexes” are a special case of “valid slice starting indices”, which are modeled well by inclusive ranges. Under this line of logic, a pair of pointers, <code class="language-plaintext highlighter-rouge">(pointer to begin, pointer to end)</code>, describes a slice of memory. I feel like this mental model is underused, and explaining it would help people understand C++’s <code class="language-plaintext highlighter-rouge">reverse_iterator</code> better.</p>
<p>Obtaining indexes from half-open ranges is good if you either assume “asymmetric indexing” (don’t think in terms of slicing), or treat each item as an indivisible entity (the alternative mental model, zero-indexed). Under this line of logic, <code class="language-plaintext highlighter-rouge">(pointer to begin, pointer to end)</code> is interpreted as <code class="language-plaintext highlighter-rouge">(pointer to first element, pointer past the final element)</code>, which is how I’ve seen some people describe it.</p>
<p>I feel languages should have both half-open ranges to generate indexes, and inclusive ranges to generate slice endpoints. Python only has half-open ranges, and math only has inclusive ranges. Rust has both, but unfortunately inclusive ranges are very slow and unoptimized compared to half-open ranges.</p>
<h2 id="issue-negative-indexing-is-asymmetric">Issue: negative indexing is asymmetric</h2>
<p>To me, negative indexing is awkward in Python. The first 2 elements in a list are <code class="language-plaintext highlighter-rouge">a[0]</code> and <code class="language-plaintext highlighter-rouge">a[1]</code>, but the last 2 elements are <code class="language-plaintext highlighter-rouge">a[-1]</code> and <code class="language-plaintext highlighter-rouge">a[-2]</code>. Interpreting this under the grid model, this arises because indexing <code class="language-plaintext highlighter-rouge">a[i]</code> takes the element <em>after</em> gridline <code class="language-plaintext highlighter-rouge">i</code>, which is inherently asymmetric.</p>
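<p>The asymmetry in a few lines:</p>

```python
a = [10, 20, 30, 40]

# Positive indexing takes the element *after* gridline i, starting at 0...
assert (a[0], a[1]) == (10, 20)
# ...but negative indexing starts at -1, not -0:
assert (a[-1], a[-2]) == (40, 30)

# The symmetric-looking a[-0] is just a[0], since -0 == 0 for integers:
assert a[-0] == 10
```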
<h2 id="issue-modular-negative-slicing-and-circularity-is-ambiguous">Issue: modular negative slicing and circularity is ambiguous</h2>
<p>In Python, <em>item indexes</em> into a length-N array (where integer indices refer to the item after the gridline) conform to mod-N arithmetic. Each integer index is either interpreted mod N, or raises an “out of bounds” exception.</p>
<p>However, <em>slice endpoints</em> do <em>not</em> quite conform to modular indexing mod N. This is because fenceposts 0 and N are distinct gridlines in memory, but are conflated under mod-N operation.</p>
<p>In Python, if you want to access the last 2 elements in a length-N array, you can write <code class="language-plaintext highlighter-rouge">a[N-2:N-0]</code>. If you treat slice endpoints as modular indexes mod N, you can abbreviate this to <code class="language-plaintext highlighter-rouge">[-2:-0]</code>. But this instead returns an empty slice from N-2 to 0, since unlike array indexes, Python slice endpoints don’t quite conform to modular indexing. And Python has no concept of a “negative zero” integer index meaning something different.</p>
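<p>A sketch of the failure mode:</p>

```python
a = [10, 20, 30, 40, 50]
N = len(a)

# The last two elements, using positive fenceposts:
assert a[N-2:N] == [40, 50]

# Naively negating both endpoints fails: -0 == 0, so the "end" fencepost
# wraps around to the start of the array and the slice comes out empty:
assert a[-2:-0] == []

# The working spellings omit the end fencepost, or spell it as None:
assert a[-2:] == [40, 50]
assert a[-2:None] == [40, 50]
```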
<p>Because CSS grid has no fencepost 0, it sidesteps this issue entirely. Negating a slice endpoint always switches between “counting from the left” and “counting from the right”.</p>
<h3 id="numpy-violates-modular-arithmetic">Numpy violates modular arithmetic</h3>
<p>One place where this issue comes up is in Numpy. By analogy, in Python, you can assign to list slices to replace part of a list with other elements. For example, you can write <code class="language-plaintext highlighter-rouge">a[x:y] = [...]</code>. To insert one item, you can write <code class="language-plaintext highlighter-rouge">a[x:x] = [1]</code>. Given Python’s slicing rules, <code class="language-plaintext highlighter-rouge">a[0:0] = [1]</code> inserts an element before the beginning of the list, and <code class="language-plaintext highlighter-rouge">a[-1:-1] = [1]</code> inserts an element <em>before</em> the last element of the list (not at the end of the list!) This is better written as <code class="language-plaintext highlighter-rouge">a.insert(x, 1)</code> where x can be any valid fencepost (including N which is not a valid index).</p>
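<p>The insertion behavior described above, sketched out:</p>

```python
# Inserting at fencepost 0 prepends:
a = [1, 2, 3]
a[0:0] = [99]
assert a == [99, 1, 2, 3]

# But a[-1:-1] inserts *before* the last element, not at the end:
b = [1, 2, 3]
b[-1:-1] = [99]
assert b == [1, 2, 99, 3]

# list.insert accepts every fencepost 0..N, including N itself:
c = [1, 2, 3]
c.insert(len(c), 99)   # same as c.append(99)
assert c == [1, 2, 3, 99]
```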
<p>Numpy has an operation called <code class="language-plaintext highlighter-rouge">np.stack()</code> where you combine two or more N-dimensional arrays into an N+1-dimensional array. All input arrays must have an identical <code class="language-plaintext highlighter-rouge">shape</code>, an N-element tuple giving the array’s size along each dimension. The output array has the same <code class="language-plaintext highlighter-rouge">shape</code> as the inputs, but with an extra element equal to the number of arrays you’ve passed in.</p>
<p><code class="language-plaintext highlighter-rouge">np.stack(axis=0)</code> is analogous to <code class="language-plaintext highlighter-rouge">shape.insert(0, number of inputs)</code>. But <code class="language-plaintext highlighter-rouge">np.stack(axis=-1)</code> is analogous to <code class="language-plaintext highlighter-rouge">shape.insert(N - 0, number of inputs)</code>, not <code class="language-plaintext highlighter-rouge">N - 1</code>. 🤢</p>
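<p>The axis arithmetic can be modeled in pure Python. This is a sketch, and the helper name <code class="language-plaintext highlighter-rouge">stacked_shape</code> is made up; the point is that a negative axis is resolved against the <em>output</em> rank:</p>

```python
def stacked_shape(shape, count, axis):
    """Model of np.stack's output shape (hypothetical helper).
    A negative axis is resolved against the output rank len(shape) + 1,
    so axis=-1 means insertion position N, i.e. N - 0, not N - 1."""
    ndim_out = len(shape) + 1
    pos = axis if axis >= 0 else axis + ndim_out
    out = list(shape)
    out.insert(pos, count)
    return tuple(out)

# Stacking four (2, 3) arrays:
assert stacked_shape((2, 3), 4, axis=0) == (4, 2, 3)
assert stacked_shape((2, 3), 4, axis=-1) == (2, 3, 4)  # inserted at N, not N-1
```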
<h3 id="css-grid-fixes-negative-slicing-but-not-negative-indexing">CSS Grid fixes negative slicing but not negative indexing</h3>
<p>CSS Grid allows web developers to dynamically position elements in table-like grids. In this case, fenceposts are <em>literally</em> gridlines between on-screen items. A layout with N columns (declared using <code class="language-plaintext highlighter-rouge">grid-template-columns</code>) has N+1 gridlines. (Tracks are columns or rows.)</p>
<p>Interestingly, gridline 0 does not exist. Gridline 1 is the leftmost gridline before the first item, and gridline N+1 is the rightmost gridline after the last item. Also, gridline -1 is the rightmost gridline, and gridline -(N+1) is the leftmost gridline. This is 1 greater than Python’s positive slicing, and 1 smaller than Python’s negative slicing.</p>
<p>In the case of <code class="language-plaintext highlighter-rouge">grid-column</code> and <code class="language-plaintext highlighter-rouge">grid-row</code>, when inserting an item into the table, you can “slice” using <code class="language-plaintext highlighter-rouge">a / b</code> syntax to specify a start and end gridline. Or you can “index” using <code class="language-plaintext highlighter-rouge">a</code> syntax, so the browser infers <code class="language-plaintext highlighter-rouge">b = a+1</code> (the item spans one track = row or column). Which is <em>almost</em> an amazing idea. Except when <code class="language-plaintext highlighter-rouge">a</code> is -1, then <code class="language-plaintext highlighter-rouge">b</code> is inferred to be 0, not -2. And you end up with an item placed “out of bounds” and past the last column and gridline you declared. They were <em>so close</em> to achieving perfect symmetry between positive and negative indexing. At least in CSS you won’t get any buffer overflows 😉</p>
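<p>Because line 0 doesn’t exist, CSS’s positive and negative line numbers map cleanly onto 0-based fenceposts. A sketch of that mapping (the helper name is made up):</p>

```python
def css_line_to_fencepost(line, n_tracks):
    """Convert a CSS Grid line number to a 0-based fencepost.
    Hypothetical helper; CSS Grid has no line 0."""
    assert line != 0
    return line - 1 if line > 0 else n_tracks + 1 + line

# A 4-column grid has fenceposts 0..4, and CSS lines 1..5 and -1..-5:
assert css_line_to_fencepost(1, 4) == 0    # leftmost line
assert css_line_to_fencepost(5, 4) == 4    # rightmost line
assert css_line_to_fencepost(-1, 4) == 4   # also the rightmost line
assert css_line_to_fencepost(-5, 4) == 0   # also the leftmost line
```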
<h3 id="text-field-cursor-affinity">Text field cursor affinity</h3>
<p>A related issue is in text editing. If you’re in a long paragraph and press the End key on a keyboard, the cursor will be placed after the last word in the current line, and after the space too. If you go to the next line and press the Home key, the cursor will be placed before the first word. But these 2 locations represent the same byte index into the text document! At this point, if you press the left and right arrow keys, you’ll get unusual cursor behavior which differs between programs:</p>
<ul>
<li>Sublime Text snaps the cursor to the previous line (which I don’t like).</li>
<li>VS Code keeps the cursor on the current line.</li>
<li>Chrome, Notepad, and Qt apps snap the cursor to the next line.</li>
<li>Firefox treats “end of the current line” and “beginning of the second line” as separate locations. If you’re at the end of a line (the same spot as the beginning of the next word), you need to press Right twice to get 1 character into the next word!</li>
</ul>
<p>The same behavior occurs if a single very long word is wrapped across multiple lines. Each program listed above behaves identically regardless of whether you’re wrapping a paragraph or a single word.</p>
<p>This behavior was briefly described in <a href="https://lord.io/blog/2019/text-editing-hates-you-too/">https://lord.io/blog/2019/text-editing-hates-you-too/</a> “Affinity”. That site only mentions single long words wrapped across multiple lines.</p>
<p>Reality is awful. There is no perfect solution.</p>
<h3 id="ring-buffers">Ring buffers</h3>
<p>(Note that I am not an expert on ring buffers.)</p>
<p>A ring buffer contains a length-N array and (in one design) two pointers/indices into the array. By convention, the “write pointer” points to (the gridline before) the first element not written yet, and the “read pointer” points to (the gridline before) the first element which can be read. If a ring buffer has <code class="language-plaintext highlighter-rouge">read_ptr == write_ptr</code>, is it empty or full? You can’t tell! One solution is to always leave 1 element unwritten at all times. Another is to keep one pointer and a length counter (which ranges from 0 through N inclusive).</p>
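<p>A minimal sketch of the second design, one read index plus a length counter, which distinguishes empty (count of 0) from full (count of N):</p>

```python
class RingBuffer:
    """One-pointer-plus-counter ring buffer (illustrative sketch)."""
    def __init__(self, n):
        self.buf = [None] * n
        self.read = 0    # index of the oldest unread element
        self.count = 0   # number of readable elements, 0..n inclusive

    def push(self, item):
        if self.count == len(self.buf):
            raise IndexError("full")
        self.buf[(self.read + self.count) % len(self.buf)] = item
        self.count += 1

    def pop(self):
        if self.count == 0:
            raise IndexError("empty")
        item = self.buf[self.read]
        self.read = (self.read + 1) % len(self.buf)
        self.count -= 1
        return item

rb = RingBuffer(2)
rb.push(1); rb.push(2)   # now full: count == N, no slot wasted
assert rb.pop() == 1
rb.push(3)               # write wraps around to index 0
assert (rb.pop(), rb.pop()) == (2, 3)
```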
<h2 id="prior-art">Prior art</h2>
<p><a href="https://wiki.c2.com/?FencePostError">https://wiki.c2.com/?FencePostError</a></p>
<p><a href="https://news.ycombinator.com/item?id=6601515">https://news.ycombinator.com/item?id=6601515</a>, first comment <a href="https://news.ycombinator.com/item?id=6602497">https://news.ycombinator.com/item?id=6602497</a></p>
<p>Republished from my Github gist.</p>