RSS.Style RSS/Atom Feed Analysis


Analysis of https://jvns.ca/atom.xml

Feed fetched in 155 ms.
Warning Content type is application/xml, not text/xml.
Feed is 303,418 characters long.
Feed has an ETag of W/"0c063f078053a15687b2faaae11f146b-ssl-df".
Warning Feed is missing the Last-Modified HTTP header.
Warning This feed does not have a stylesheet.
This appears to be an Atom feed.
Feed title: Julia Evans
Error Feed self link does not match feed URL: http://jvns.ca/atom.xml.
Feed has 20 items.
Home page URL: http://jvns.ca/
Error Home page URL is on a different protocol: http:.
Warning Home page URL redirected to https://jvns.ca/.
Home page has feed discovery link in <head>.
Home page has a link to the feed in the <body>

Formatted XML
<feed xmlns="http://www.w3.org/2005/Atom">
    <title><![CDATA[Julia Evans]]></title>
    <link href="http://jvns.ca/atom.xml" rel="self"/>
    <link href="http://jvns.ca"/>
    <updated>2025-03-07T13:18:31+00:00</updated>
    <id>http://jvns.ca</id>
    <author>
        <name>Julia Evans</name>
    </author>
    <generator uri="http://gohugo.io/">Hugo</generator>
    <entry>
        <title type="html"><![CDATA[Standards for ANSI escape codes]]></title>
        <link href="https://jvns.ca/blog/2025/03/07/escape-code-standards/"/>
        <updated>2025-03-07T00:00:00+00:00</updated>
        <id>https://jvns.ca/blog/2025/03/07/escape-code-standards/</id>
        <content type="html"><![CDATA[<p>Hello! Today I want to talk about ANSI escape codes.</p>
<p>For a long time I was vaguely aware of ANSI escape codes (&ldquo;that&rsquo;s how you make
text red in the terminal and stuff&rdquo;) but I had no real understanding of where they were
supposed to be defined or whether or not there were standards for them. I just
had a kind of vague &ldquo;there be dragons&rdquo; feeling around them. While learning
about the terminal this year, I&rsquo;ve learned that:</p>
<ol>
<li>ANSI escape codes are responsible for a lot of usability improvements
in the terminal (did you know there&rsquo;s a way to copy to your system clipboard
when SSHed into a remote machine?? It&rsquo;s an escape code called <a href="https://jvns.ca/til/vim-osc52/">OSC 52</a>!)</li>
<li>They aren&rsquo;t completely standardized, and because of that they don&rsquo;t always
work reliably. And because they&rsquo;re also invisible, it&rsquo;s extremely
frustrating to troubleshoot escape code issues.</li>
</ol>
<p>So I wanted to put together a list for myself of some standards that exist
around escape codes, because I want to know if they <em>have</em> to feel unreliable
and frustrating, or if there&rsquo;s a future where we could all rely on them with
more confidence.</p>
<ul>
<li><a href="#what-s-an-escape-code">what&rsquo;s an escape code?</a></li>
<li><a href="#ecma-48">ECMA-48</a></li>
<li><a href="#xterm-control-sequences">xterm control sequences</a></li>
<li><a href="#terminfo">terminfo</a></li>
<li><a href="#should-programs-use-terminfo">should programs use terminfo?</a></li>
<li><a href="#is-there-a-single-common-set-of-escape-codes">is there a &ldquo;single common set&rdquo; of escape codes?</a></li>
<li><a href="#some-reasons-to-use-terminfo">some reasons to use terminfo</a></li>
<li><a href="#some-more-documents-standards">some more documents/standards</a></li>
<li><a href="#why-i-think-this-is-interesting">why I think this is interesting</a></li>
</ul>
<h3 id="what-s-an-escape-code">what&rsquo;s an escape code?</h3>
<p>Have you ever pressed the left arrow key in your terminal and seen <code>^[[D</code>?
That&rsquo;s an escape code! It&rsquo;s called an &ldquo;escape code&rdquo; because the first character
is the &ldquo;escape&rdquo; character, which is usually written as <code>ESC</code>, <code>\x1b</code>, <code>\E</code>,
<code>\033</code>, or <code>^[</code>.</p>
<p>Escape codes are how your terminal emulator communicates various kinds of
information (colours, mouse movement, etc) with programs running in the
terminal. There are two kinds of escape codes:</p>
<ol>
<li><strong>input codes</strong> which your terminal emulator sends for keypresses or mouse
movements that don&rsquo;t fit into Unicode. For example &ldquo;left arrow key&rdquo; is
<code>ESC[D</code>, &ldquo;Ctrl+left arrow&rdquo; might be <code>ESC[1;5D</code>, and clicking the mouse might
be something like <code>ESC[M :3</code>.</li>
<li><strong>output codes</strong> which programs can print out to colour text, move the
cursor around, clear the screen, hide the cursor, copy text to the
clipboard, enable mouse reporting, set the window title, etc.</li>
</ol>
<p>Now let&rsquo;s talk about standards!</p>
<h3 id="ecma-48">ECMA-48</h3>
<p>The first standard I found relating to escape codes was
<a href="https://ecma-international.org/wp-content/uploads/ECMA-48_5th_edition_june_1991.pdf">ECMA-48</a>,
which was originally published in 1976.</p>
<p>ECMA-48 does two things:</p>
<ol>
<li>Define some general <em>formats</em> for escape codes (like &ldquo;CSI&rdquo; codes, which are
<code>ESC[</code> + something and &ldquo;OSC&rdquo; codes, which are <code>ESC]</code> + something)</li>
<li>Define some specific escape codes, like how &ldquo;move the cursor to the left&rdquo; is
<code>ESC[D</code>, or &ldquo;turn text red&rdquo; is  <code>ESC[31m</code>. In the spec, the &ldquo;cursor left&rdquo;
one is called <code>CURSOR LEFT</code> and the one for changing colours is called
<code>SELECT GRAPHIC RENDITION</code>.</li>
</ol>
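<p>As a minimal sketch of the CSI format, the two codes above can be printed from a shell like this. The helper names <code>csi</code>, <code>red_text</code>, and <code>cursor_left</code> are made up for illustration and are not from the spec:</p>

```shell
# csi (hypothetical helper): print ESC followed by "[" and the rest of the sequence.
csi() {
  printf '\033[%s' "$1"
}

# SELECT GRAPHIC RENDITION: 31 sets a red foreground, 0 resets to the default.
red_text() {
  csi 31m
  printf '%s' "$1"
  csi 0m
  printf '\n'
}

# CURSOR LEFT: move the cursor left by N columns (1 if no argument is given).
cursor_left() {
  csi "${1:-1}D"
}

red_text "this should be red in most terminals"
```

<p>Whether the text actually shows up red depends on the terminal emulator, which is the whole problem this post is about.</p>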
<p>The formats are extensible, so there&rsquo;s room for others to define more escape
codes in the future. Lots of escape codes that are popular today aren&rsquo;t defined
in ECMA-48: for example it&rsquo;s pretty common for terminal applications (like vim,
htop, or tmux) to support using the mouse, but ECMA-48 doesn&rsquo;t define escape
codes for the mouse.</p>
<h3 id="xterm-control-sequences">xterm control sequences</h3>
<p>There are a bunch of escape codes that aren&rsquo;t defined in ECMA-48, for example:</p>
<ul>
<li>enabling mouse reporting (where did you click in your terminal?)</li>
<li>bracketed paste (did you paste that text or type it in?)</li>
<li>OSC 52 (which terminal applications can use to copy text to your system clipboard)</li>
</ul>
<p>I believe (correct me if I&rsquo;m wrong!) that these and some others came from
xterm, are documented in <a href="https://invisible-island.net/xterm/ctlseqs/ctlseqs.html">XTerm Control Sequences</a>, and have
been widely implemented by other terminal emulators.</p>
<p>This list of &ldquo;what xterm supports&rdquo; is not a standard exactly, but xterm is
extremely influential and so it seems like an important document.</p>
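<p>To make OSC 52 a bit more concrete, here is a rough sketch of what a program prints to ask the terminal to copy some text. The function name <code>osc52_copy</code> is hypothetical, and this assumes a <code>base64</code> command is available and that the terminal emulator supports OSC 52 and has it enabled:</p>

```shell
# osc52_copy (hypothetical helper): print an OSC 52 sequence asking the
# terminal to put the given text on the system clipboard.
# Format: ESC ] 52 ; c ; base64-of-the-text BEL
osc52_copy() {
  # base64 may wrap long output across lines, so strip the newlines
  encoded=$(printf '%s' "$1" | base64 | tr -d '\n')
  printf '\033]52;c;%s\a' "$encoded"
}
```

<p>Running something like <code>osc52_copy "hello from the remote machine"</code> over SSH should then put that text on the local clipboard, if the terminal cooperates.</p>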
<h3 id="terminfo">terminfo</h3>
<p>In the 80s (and to some extent today, but my understanding is that it was MUCH
more dramatic in the 80s) there was a huge amount of variation in what escape
codes terminals actually supported.</p>
<p>To deal with this, there&rsquo;s a database of escape codes for various terminals
called &ldquo;terminfo&rdquo;.</p>
<p>It looks like the standard for terminfo is called <a href="https://publications.opengroup.org/c243-1">X/Open Curses</a>, though you need to create
an account to view that standard for some reason. It defines the database format as well
as a C library interface (&ldquo;curses&rdquo;) for accessing the database.</p>
<p>For example you can run this bash snippet to see every possible escape code for
&ldquo;clear screen&rdquo; for all of the different terminals your system knows about:</p>
<pre><code>for term in $(toe -a | awk '{print $1}')
do
  echo $term
  infocmp -1 -T &quot;$term&quot; 2&gt;/dev/null | grep 'clear=' | sed 's/clear=//g;s/,//g'
done
</code></pre>
<p>On my system (and probably every system I&rsquo;ve ever used?), the terminfo database is managed by ncurses.</p>
<h3 id="should-programs-use-terminfo">should programs use terminfo?</h3>
<p>I think it&rsquo;s interesting that there are two main approaches that applications
take to handling ANSI escape codes:</p>
<ol>
<li>Use the terminfo database to figure out which escape codes to use, depending
on what&rsquo;s in the <code>TERM</code> environment variable. Fish does this, for example.</li>
<li>Identify a &ldquo;single common set&rdquo; of escape codes which works in &ldquo;enough&rdquo;
terminal emulators and just hardcode those.</li>
</ol>
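<p>Here is a small sketch of the difference between the two approaches, using &ldquo;clear screen&rdquo; as the example. The function names are made up, and the hardcoded sequence is an assumption about what xterm-like terminals accept, not a universal code:</p>

```shell
# Approach 1: look up the "clear" capability in terminfo for whatever $TERM says.
clear_via_terminfo() {
  tput clear
}

# Approach 2: hardcode a sequence (move the cursor home, then erase the display)
# and hope it works in enough terminal emulators.
clear_hardcoded() {
  printf '\033[H\033[2J'
}
```

<p>The first version only works if the terminfo database has a correct entry for the current <code>TERM</code>. The second ignores <code>TERM</code> completely.</p>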
<p>Some examples of programs/libraries that take approach #2 (&ldquo;don&rsquo;t use terminfo&rdquo;) include:</p>
<ul>
<li><a href="https://github.com/mawww/kakoune/commit/c12699d2e9c2806d6ed184032078d0b84a3370bb">kakoune</a></li>
<li><a href="https://github.com/prompt-toolkit/python-prompt-toolkit/blob/165258d2f3ae594b50f16c7b50ffb06627476269/src/prompt_toolkit/input/ansi_escape_sequences.py#L5-L8">python-prompt-toolkit</a></li>
<li><a href="https://github.com/antirez/linenoise">linenoise</a></li>
<li><a href="https://github.com/rockorager/libvaxis">libvaxis</a></li>
<li><a href="https://github.com/chalk/chalk">chalk</a></li>
</ul>
<p>I got curious about why folks might be moving away from terminfo and I found
this very interesting and extremely detailed
<a href="https://twoot.site/@bean/113056942625234032">rant about terminfo from one of the fish maintainers</a>, which argues that:</p>
<blockquote>
<p>[the terminfo authors] have done a lot of work that, at the time, was
extremely important and helpful. My point is that it no longer is.</p>
</blockquote>
<p>I wouldn&rsquo;t do it justice by summarizing it, so I&rsquo;ll just say that I think it&rsquo;s
worth reading.</p>
<h3 id="is-there-a-single-common-set-of-escape-codes">is there a &ldquo;single common set&rdquo; of escape codes?</h3>
<p>I was just talking about the idea that you can use a &ldquo;common set&rdquo; of escape
codes that will work for most people. But what is that set? Is there any agreement?</p>
<p>I really do not know the answer to this at all, but from doing some reading it
seems like it&rsquo;s some combination of:</p>
<ul>
<li>The codes that the VT100 supported (though some aren&rsquo;t relevant on modern terminals)</li>
<li>what&rsquo;s in ECMA-48 (which I think also has some things that are no longer relevant)</li>
<li>What xterm supports (though I&rsquo;d guess that not everything in there is actually widely supported enough)</li>
</ul>
<p>and maybe ultimately &ldquo;identify the terminal emulators you think your users are
going to use most frequently and test in those&rdquo;, the same way web developers do
when deciding which CSS features are okay to use.</p>
<p>I don&rsquo;t think there are any resources like <a href="https://caniuse.com/">Can I use&hellip;?</a> or
<a href="https://web-platform-dx.github.io/web-features/">Baseline</a> for the terminal
though. (in theory terminfo is supposed to be the &ldquo;caniuse&rdquo; for the terminal
but it seems like it often takes 10+ years to add new terminal features when
people invent them which makes it very limited)</p>
<h3 id="some-reasons-to-use-terminfo">some reasons to use terminfo</h3>
<p>I also asked on Mastodon why people found terminfo valuable in 2025 and got a
few reasons that made sense to me:</p>
<ul>
<li>some people expect to be able to use the <code>TERM</code> environment variable to
control how programs behave (for example with <code>TERM=dumb</code>), and there&rsquo;s
no standard for how that should work in a post-terminfo world</li>
<li>even though there&rsquo;s <em>less</em> variation between terminal emulators than
there was in the 80s, there&rsquo;s far from zero variation: there are graphical
terminals, the Linux framebuffer console, the situation you&rsquo;re in when
connecting to a server via its serial console, Emacs shell mode, and probably
more that I&rsquo;m missing</li>
<li>there is no one standard for what the &ldquo;single common set&rdquo; of escape codes
is, and sometimes programs use escape codes which aren&rsquo;t actually widely
supported enough</li>
</ul>
<h3 id="terminfo-user-agent-detection">terminfo &amp; user agent detection</h3>
<p>The way that ncurses uses the <code>TERM</code> environment variable to decide which
escape codes to use reminds me of how webservers used to sometimes use the
browser user agent to decide which version of a website to serve.</p>
<p>It also seems like it&rsquo;s had some of the same results &ndash; the way iTerm2 reports
itself as being &ldquo;xterm-256color&rdquo; feels similar to how Safari&rsquo;s user agent is
&ldquo;Mozilla/5.0 (Macintosh; Intel Mac OS X 14_7_4) AppleWebKit/605.1.15 (KHTML,
like Gecko) Version/18.3 Safari/605.1.15&rdquo;. In both cases the terminal emulator
/ browser ends up changing its user agent to get around user agent detection
that isn&rsquo;t working well.</p>
<p>On the web we ended up deciding that user agent detection was not a good
practice and to instead focus on standardization so we can serve the same
HTML/CSS to all browsers. I don&rsquo;t know if the same approach is the future in
the terminal though &ndash; I think the terminal landscape today is much more
fragmented than the web ever was as well as being much less well funded.</p>
<h3 id="some-more-documents-standards">some more documents/standards</h3>
<p>A few more documents and standards related to escape codes, in no particular order:</p>
<ul>
<li>the <a href="https://man7.org/linux/man-pages/man4/console_codes.4.html">Linux console_codes man page</a> documents
escape codes that Linux supports</li>
<li>how the <a href="https://vt100.net/docs/vt100-ug/chapter3.html">VT 100</a> handles escape codes &amp; control sequences</li>
<li>the <a href="https://sw.kovidgoyal.net/kitty/keyboard-protocol/">kitty keyboard protocol</a></li>
<li><a href="https://gist.github.com/egmontkob/eb114294efbcd5adb1944c9f3cb5feda">OSC 8</a> for links in the terminal (and notes on <a href="https://github.com/Alhadis/OSC8-Adoption?tab=readme-ov-file">adoption</a>)</li>
<li>A <a href="https://github.com/tmux/tmux/blob/882fb4d295deb3e4b803eb444915763305114e4f/tools/ansicode.txt">summary of ANSI standards from tmux</a></li>
<li>this <a href="https://iterm2.com/feature-reporting/">terminal features reporting specification from iTerm</a></li>
<li>sixel graphics</li>
</ul>
<h3 id="why-i-think-this-is-interesting">why I think this is interesting</h3>
<p>I sometimes see people saying that the unix terminal is &ldquo;outdated&rdquo;, and since I
love the terminal so much I&rsquo;m always curious about what incremental changes
might make it feel less &ldquo;outdated&rdquo;.</p>
<p>Maybe if we had a clearer standards landscape (like we do on the web!) it would
be easier for terminal emulator developers to build new features and for
authors of terminal applications to more confidently adopt those features so
that we can all benefit from them and have a richer experience in the terminal.</p>
<p>Obviously standardizing ANSI escape codes is not easy (ECMA-48 was first
published almost 50 years ago and we&rsquo;re still not there!). I don&rsquo;t even know
what all of the challenges are. But the situation with HTML/CSS/JS used to be
extremely bad too and now it&rsquo;s MUCH better, so maybe there&rsquo;s hope.</p>
]]></content>
    </entry>
    <entry>
        <title type="html"><![CDATA[How to add a directory to your PATH]]></title>
        <link href="https://jvns.ca/blog/2025/02/13/how-to-add-a-directory-to-your-path/"/>
        <updated>2025-02-13T12:27:56+00:00</updated>
        <id>https://jvns.ca/blog/2025/02/13/how-to-add-a-directory-to-your-path/</id>
        <content type="html"><![CDATA[<p>I was talking to a friend about how to add a directory to your PATH today. It&rsquo;s
something that feels &ldquo;obvious&rdquo; to me since I&rsquo;ve been using the terminal for a
long time, but when I searched for instructions for how to do it, I actually
couldn&rsquo;t find something that explained all of the steps &ndash; a lot of them just
said &ldquo;add this to <code>~/.bashrc</code>&rdquo;, but what if you&rsquo;re not using bash? What if your
bash config is actually in a different file? And how are you supposed to figure
out which directory to add anyway?</p>
<p>So I wanted to try to write down some more complete directions and mention some
of the gotchas I&rsquo;ve run into over the years.</p>
<p>Here&rsquo;s a table of contents:</p>
<ul>
<li><a href="#step-1-what-shell-are-you-using">step 1: what shell are you using?</a></li>
<li><a href="#step-2-find-your-shell-s-config-file">step 2: find your shell&rsquo;s config file</a>
<ul>
<li><a href="#a-note-on-bash-s-config-file">a note on bash&rsquo;s config file</a></li>
</ul>
</li>
<li><a href="#step-3-figure-out-which-directory-to-add">step 3: figure out which directory to add</a>
<ul>
<li><a href="#step-3-1-double-check-it-s-the-right-directory">step 3.1: double check it&rsquo;s the right directory</a></li>
</ul>
</li>
<li><a href="#step-4-edit-your-shell-config">step 4: edit your shell config</a></li>
<li><a href="#step-5-restart-your-shell">step 5: restart your shell</a></li>
<li>problems:
<ul>
<li><a href="#problem-1-it-ran-the-wrong-program">problem 1: it ran the wrong program</a></li>
<li><a href="#problem-2-the-program-isn-t-being-run-from-your-shell">problem 2: the program isn&rsquo;t being run from your shell</a></li>
<li><a href="#problem-3-duplicate-path-entries-making-it-harder-to-debug">problem 3: duplicate PATH entries making it harder to debug</a></li>
<li><a href="#problem-4-losing-your-history-after-updating-your-path">problem 4: losing your history after updating your PATH</a></li>
</ul>
</li>
<li>notes:
<ul>
<li><a href="#a-note-on-source">a note on source</a></li>
<li><a href="#a-note-on-fish-add-path">a note on fish_add_path</a></li>
</ul>
</li>
</ul>
<h3 id="step-1-what-shell-are-you-using">step 1: what shell are you using?</h3>
<p>If you&rsquo;re not sure what shell you&rsquo;re using, here&rsquo;s a way to find out. Run this:</p>
<pre><code>ps -p $$ -o pid,comm=
</code></pre>
<ul>
<li>if you&rsquo;re using <strong>bash</strong>, it&rsquo;ll print out <code>97295 bash</code></li>
<li>if you&rsquo;re using <strong>zsh</strong>, it&rsquo;ll print out <code>97295 zsh</code></li>
<li>if you&rsquo;re using <strong>fish</strong>, it&rsquo;ll print out an error like &ldquo;In fish, please use
$fish_pid&rdquo; (<code>$$</code> isn&rsquo;t valid syntax in fish, but in any case the error
message tells you that you&rsquo;re using fish, which you probably already knew)</li>
</ul>
<p>Also bash is the default on Linux and zsh is the default on macOS (as of
2024). I&rsquo;ll only cover bash, zsh, and fish in these directions.</p>
<h3 id="step-2-find-your-shell-s-config-file">step 2: find your shell&rsquo;s config file</h3>
<ul>
<li>in zsh, it&rsquo;s probably <code>~/.zshrc</code></li>
<li>in bash, it might be <code>~/.bashrc</code>, but it&rsquo;s complicated, see the note in the next section</li>
<li>in fish, it&rsquo;s probably <code>~/.config/fish/config.fish</code> (you can run <code>echo $__fish_config_dir</code> if you want to be 100% sure)</li>
</ul>
<h3 id="a-note-on-bash-s-config-file">a note on bash&rsquo;s config file</h3>
<p>Bash has three possible config files: <code>~/.bashrc</code>, <code>~/.bash_profile</code>, and <code>~/.profile</code>.</p>
<p>If you&rsquo;re not sure which one your system is set up to use, I&rsquo;d recommend
testing this way:</p>
<ol>
<li>add <code>echo hi there</code> to your <code>~/.bashrc</code></li>
<li>Restart your terminal</li>
<li>If you see &ldquo;hi there&rdquo;, that means <code>~/.bashrc</code> is being used! Hooray!</li>
<li>Otherwise remove it and try the same thing with <code>~/.bash_profile</code></li>
<li>You can also try <code>~/.profile</code> if the first two options don&rsquo;t work.</li>
</ol>
<p>(there are a lot of <a href="https://blog.flowblok.id.au/2013-02/shell-startup-scripts.html">elaborate flow charts</a> out there that explain how bash
decides which config file to use but IMO it&rsquo;s not worth it to internalize them
and just testing is the fastest way to be sure)</p>
<h3 id="step-3-figure-out-which-directory-to-add">step 3: figure out which directory to add</h3>
<p>Let&rsquo;s say that you&rsquo;re trying to install and run a program called <code>http-server</code>
and it doesn&rsquo;t work, like this:</p>
<pre><code>$ npm install -g http-server
$ http-server
bash: http-server: command not found
</code></pre>
<p>How do you find what directory <code>http-server</code> is in? Honestly in general this is
not that easy &ndash; often the answer is something like &ldquo;it depends on how npm is
configured&rdquo;. A few ideas:</p>
<ul>
<li>Often when setting up a new installer (like <code>cargo</code>, <code>npm</code>, <code>homebrew</code>, etc),
when you first set it up it&rsquo;ll print out some directions about how to update
your PATH. So if you&rsquo;re paying attention you can get the directions then.</li>
<li>Sometimes installers will automatically update your shell&rsquo;s config file
to update your <code>PATH</code> for you</li>
<li>Sometimes just Googling &ldquo;where does npm install things?&rdquo; will turn up the
answer</li>
<li>Some tools have a subcommand that tells you where they&rsquo;re configured to
install things, like:
<ul>
<li>Node/npm: <code>npm config get prefix</code> (then append <code>/bin/</code>)</li>
<li>Go: <code>go env GOPATH</code> (then append <code>/bin/</code>)</li>
<li>asdf: <code>asdf info | grep ASDF_DIR</code> (then append <code>/bin/</code> and <code>/shims/</code>)</li>
</ul>
</li>
</ul>
<h3 id="step-3-1-double-check-it-s-the-right-directory">step 3.1: double check it&rsquo;s the right directory</h3>
<p>Once you&rsquo;ve found a directory you think might be the right one, make sure it&rsquo;s
actually correct! For example, I found out that on my machine, <code>http-server</code> is
in <code>~/.npm-global/bin</code>. I can make sure that it&rsquo;s the right directory by trying to
run the program <code>http-server</code> in that directory like this:</p>
<pre><code>$ ~/.npm-global/bin/http-server
Starting up http-server, serving ./public
</code></pre>
<p>It worked! Now that you know what directory you need to add to your <code>PATH</code>,
let&rsquo;s move to the next step!</p>
<h3 id="step-4-edit-your-shell-config">step 4: edit your shell config</h3>
<p>Now we have the 2 critical pieces of information we need:</p>
<ol>
<li>Which directory you&rsquo;re trying to add to your PATH (like  <code>~/.npm-global/bin/</code>)</li>
<li>Where your shell&rsquo;s config is (like <code>~/.bashrc</code>, <code>~/.zshrc</code>, or <code>~/.config/fish/config.fish</code>)</li>
</ol>
<p>Now what you need to add depends on your shell:</p>
<p><strong>bash instructions:</strong></p>
<p>Open your shell&rsquo;s config file, and add a line like this:</p>
<pre><code>export PATH=$PATH:~/.npm-global/bin/
</code></pre>
<p>(obviously replace <code>~/.npm-global/bin</code> with the actual directory you&rsquo;re trying to add)</p>
<p><strong>zsh instructions:</strong></p>
<p>You can do the same thing as in bash, but zsh also has some slightly fancier
syntax you can use if you prefer:</p>
<pre><code>path=(
  $path
  ~/.npm-global/bin
)
</code></pre>
<p><strong>fish instructions:</strong></p>
<p>In fish, the syntax is different:</p>
<pre><code>set PATH $PATH ~/.npm-global/bin
</code></pre>
<p>(in fish you can also use <code>fish_add_path</code>, some notes on that <a href="#a-note-on-fish-add-path">further down</a>)</p>
<h3 id="step-5-restart-your-shell">step 5: restart your shell</h3>
<p>Now, an extremely important step: updating your shell&rsquo;s config won&rsquo;t take
effect if you don&rsquo;t restart it!</p>
<p>Two ways to do this:</p>
<ol>
<li>open a new terminal (or terminal tab), and maybe close the old one so you don&rsquo;t get confused</li>
<li>Run <code>bash</code> to start a new shell (or <code>zsh</code> if you&rsquo;re using zsh, or <code>fish</code> if you&rsquo;re using fish)</li>
</ol>
<p>I&rsquo;ve found that both of these usually work fine.</p>
<p>And you should be done! Try running the program you were trying to run and
hopefully it works now.</p>
<p>If not, here are a couple of problems that you might run into:</p>
<h3 id="problem-1-it-ran-the-wrong-program">problem 1: it ran the wrong program</h3>
<p>If the wrong <strong>version</strong> of a program is running, you might need to add the
directory to the <em>beginning</em> of your PATH instead of the end.</p>
<p>For example, on my system I have two versions of <code>python3</code> installed, which I
can see by running <code>which -a</code>:</p>
<pre><code>$ which -a python3
/usr/bin/python3
/opt/homebrew/bin/python3
</code></pre>
<p>The one your shell will use is the <strong>first one listed</strong>.</p>
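<p>The &ldquo;first one listed wins&rdquo; rule can be sketched as a loop over the entries in <code>PATH</code>. This is a simplified illustration with a made-up function name (<code>first_in_path</code>), not how any shell is actually implemented, and it ignores details like empty <code>PATH</code> entries:</p>

```shell
# first_in_path (hypothetical helper): print the first executable called NAME
# found while walking the PATH entries from left to right.
first_in_path() {
  old_ifs=$IFS
  IFS=:
  for dir in $PATH; do
    if [ -x "$dir/$1" ] && [ ! -d "$dir/$1" ]; then
      IFS=$old_ifs
      printf '%s\n' "$dir/$1"
      return 0
    fi
  done
  IFS=$old_ifs
  return 1
}
```

<p>For example <code>first_in_path python3</code> should print the same path as the first line of <code>which -a python3</code>.</p>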
<p>If you want to use the Homebrew version, you need to add that directory
(<code>/opt/homebrew/bin</code>) to the <strong>beginning</strong> of your PATH instead, by putting this in
your shell&rsquo;s config file (it&rsquo;s <code>/opt/homebrew/bin/:$PATH</code> instead of the usual <code>$PATH:/opt/homebrew/bin/</code>)</p>
<pre><code>export PATH=/opt/homebrew/bin/:$PATH
</code></pre>
<p>or in fish:</p>
<pre><code>set PATH /opt/homebrew/bin $PATH
</code></pre>
<h3 id="problem-2-the-program-isn-t-being-run-from-your-shell">problem 2: the program isn&rsquo;t being run from your shell</h3>
<p>All of these directions only work if you&rsquo;re running the program <strong>from your
shell</strong>. If you&rsquo;re running the program from an IDE, from a GUI, in a cron job,
or some other way, you&rsquo;ll need to add the directory to your PATH in a different
way, and the exact details might depend on the situation.</p>
<p><strong>in a cron job</strong></p>
<p>Some options:</p>
<ul>
<li>use the full path to the program you&rsquo;re running, like <code>/home/bork/bin/my-program</code></li>
<li>put the full PATH you want as the first line of your crontab (something like
PATH=/bin:/usr/bin:/usr/local/bin:&hellip;.). You can get the full PATH you&rsquo;re
using in your shell by running <code>echo &quot;PATH=$PATH&quot;</code>.</li>
</ul>
<p>I&rsquo;m honestly not sure how to handle it in an IDE/GUI because I haven&rsquo;t run into
that in a long time, will add directions here if someone points me in the right
direction.</p>
<h3 id="problem-3-duplicate-path-entries-making-it-harder-to-debug">problem 3: duplicate <code>PATH</code> entries making it harder to debug</h3>
<p>If you edit your path and start a new shell by running <code>bash</code> (or <code>zsh</code>, or
<code>fish</code>), you&rsquo;ll often end up with duplicate <code>PATH</code> entries, because the shell
keeps adding new things to your <code>PATH</code> every time you start your shell.</p>
<p>Personally I don&rsquo;t think I&rsquo;ve run into a situation where this kind of
duplication breaks anything, but the duplicates can make it harder to debug
what&rsquo;s going on with your <code>PATH</code> if you&rsquo;re trying to understand its contents.</p>
<p>Some ways you could deal with this:</p>
<ol>
<li>If you&rsquo;re debugging your <code>PATH</code>, open a new terminal to do it in so you get
a &ldquo;fresh&rdquo; state. This should avoid the duplication.</li>
<li>Deduplicate your <code>PATH</code> at the end of your shell&rsquo;s config  (for example in
zsh apparently you can do this with <code>typeset -U path</code>)</li>
<li>Check that the directory isn&rsquo;t already in your <code>PATH</code> when adding it (for
example in fish I believe you can do this with <code>fish_add_path --path /some/directory</code>)</li>
</ol>
<p>How to deduplicate your <code>PATH</code> is shell-specific and there isn&rsquo;t always a
built in way to do it so you&rsquo;ll need to look up how to accomplish it in your
shell.</p>
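<p>For bash (and other POSIX-style shells), one way to sketch the &ldquo;check before adding&rdquo; idea is with a couple of small functions in your config file. The names <code>path_append</code> and <code>path_prepend</code> are made up, and <code>/opt/demo/bin</code> is a placeholder directory:</p>

```shell
# path_append (hypothetical helper): add a directory to the end of PATH,
# but only if it is not already there.
path_append() {
  case ":$PATH:" in
    *":$1:"*) ;;                       # already present: do nothing
    *) PATH="${PATH:+$PATH:}$1" ;;
  esac
}

# path_prepend (hypothetical helper): same idea, but add to the beginning.
path_prepend() {
  case ":$PATH:" in
    *":$1:"*) ;;                       # already present: do nothing
    *) PATH="$1${PATH:+:$PATH}" ;;
  esac
}

path_append /opt/demo/bin
path_append /opt/demo/bin   # second call changes nothing
```

<p>One caveat: this compares strings exactly, so <code>/opt/demo/bin</code> and <code>/opt/demo/bin/</code> count as different directories.</p>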
<h3 id="problem-4-losing-your-history-after-updating-your-path">problem 4: losing your history after updating your <code>PATH</code></h3>
<p>Here&rsquo;s a situation that&rsquo;s easy to get into in bash or zsh:</p>
<ol>
<li>Run a command (it fails)</li>
<li>Update your <code>PATH</code></li>
<li>Run <code>bash</code> to reload your config</li>
<li>Press the up arrow a couple of times to rerun the failed command (or open a new terminal)</li>
<li>The failed command isn&rsquo;t in your history! Why not?</li>
</ol>
<p>This happens because in bash, by default, history is not saved until you exit
the shell.</p>
<p>Some options for fixing this:</p>
<ul>
<li>Instead of running <code>bash</code> to reload your config, run <code>source ~/.bashrc</code> (or
<code>source ~/.zshrc</code> in zsh). This will reload the config inside your current
session.</li>
<li>Configure your shell to continuously save your history instead of only saving
the history when the shell exits. (How to do this depends on whether you&rsquo;re
using bash or zsh, the history options in zsh are a bit complicated and I&rsquo;m
not exactly sure what the best way is)</li>
</ul>
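<p>For bash specifically, here is a sketch of the &ldquo;continuously save your history&rdquo; option that you could put in <code>~/.bashrc</code>. This is one common setup and not necessarily the best one, and it doesn&rsquo;t cover zsh:</p>

```shell
# Sketch for ~/.bashrc. The guard makes this a no-op in other shells.
if [ -n "$BASH_VERSION" ]; then
  # Append to the history file instead of overwriting it when the shell exits.
  shopt -s histappend
  # Before each prompt, write any new history lines to the history file,
  # keeping whatever PROMPT_COMMAND was already set to.
  PROMPT_COMMAND="history -a${PROMPT_COMMAND:+; $PROMPT_COMMAND}"
fi
```

<p>With this in place, a command that failed should already be in the history file by the time you start a new shell.</p>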
<h3 id="a-note-on-source">a note on <code>source</code></h3>
<p>When you install <code>cargo</code> (Rust&rsquo;s installer) for the first time, it gives you
these instructions for how to set up your PATH, which don&rsquo;t mention a specific
directory at all.</p>
<pre><code>This is usually done by running one of the following (note the leading DOT):

. &quot;$HOME/.cargo/env&quot;        	# For sh/bash/zsh/ash/dash/pdksh
source &quot;$HOME/.cargo/env.fish&quot;  # For fish
</code></pre>
<p>The idea is that you add that line to your shell&rsquo;s config, and their script
automatically sets up your <code>PATH</code> (and potentially other things) for you.</p>
<p>This is pretty common (for example <a href="https://github.com/Homebrew/install/blob/deacfa6a6e62e5f4002baf9e1fac7a96e9aa5d41/install.sh#L1072-L1087">Homebrew</a> suggests you eval <code>brew shellenv</code>), and there are
two ways to approach this:</p>
<ol>
<li>Just do what the tool suggests (like adding <code>. &quot;$HOME/.cargo/env&quot;</code> to your shell&rsquo;s config)</li>
<li>Figure out which directories the script they&rsquo;re telling you to run would add
to your PATH, and then add those manually. Here&rsquo;s how I&rsquo;d do that:
<ul>
<li>Run <code>. &quot;$HOME/.cargo/env&quot;</code> in my shell (or the fish version if using fish)</li>
<li>Run <code>echo &quot;$PATH&quot; | tr ':' '\n' | grep cargo</code> to figure out which directories it added</li>
<li>See that it says <code>/Users/bork/.cargo/bin</code> and shorten that to <code>~/.cargo/bin</code></li>
<li>Add the directory <code>~/.cargo/bin</code> to PATH (with the directions in this post)</li>
</ul>
</li>
</ol>
<p>I don&rsquo;t think there&rsquo;s anything wrong with doing what the tool suggests (it
might be the &ldquo;best way&rdquo;!), but personally I usually use the second approach
because I prefer knowing exactly what configuration I&rsquo;m changing.</p>
<h3 id="a-note-on-fish-add-path">a note on <code>fish_add_path</code></h3>
<p>fish has a handy function called <code>fish_add_path</code> that you can run to add a directory to your <code>PATH</code> like this:</p>
<pre><code>fish_add_path /some/directory
</code></pre>
<p>This is cool (it&rsquo;s such a simple command!) but I&rsquo;ve stopped using it for a couple of reasons:</p>
<ol>
<li>Sometimes <code>fish_add_path</code> will update the <code>PATH</code> for every session in the
future (with a &ldquo;universal variable&rdquo;) and sometimes it will update the <code>PATH</code>
just for the current session and it&rsquo;s hard for me to tell which one it will
do. In theory the docs explain this but I could not understand them.</li>
<li>If you ever need to <em>remove</em> the directory from your <code>PATH</code> a few weeks or
months later because maybe you made a mistake, it&rsquo;s kind of hard to do
(there are <a href="https://github.com/fish-shell/fish-shell/issues/8604">instructions in the comments of this GitHub issue though</a>).</li>
</ol>
<h3 id="that-s-all">that&rsquo;s all</h3>
<p>Hopefully this will help some people. Let me know (on Mastodon or Bluesky) if
there are other major gotchas that have tripped you up when adding a
directory to your PATH, or if you have questions about this post!</p>
]]></content>
    </entry>
    <entry>
        <title type="html"><![CDATA[Some terminal frustrations]]></title>
        <link href="https://jvns.ca/blog/2025/02/05/some-terminal-frustrations/"/>
        <updated>2025-02-05T16:57:00+00:00</updated>
        <id>https://jvns.ca/blog/2025/02/05/some-terminal-frustrations/</id>
        <content type="html"><![CDATA[<p>A few weeks ago I ran a terminal survey (you can <a href="https://jvns.ca/terminal-survey/results-bsky.html">read the results here</a>) and at the end I asked:</p>
<blockquote>
<p>What’s the most frustrating thing about using the terminal for you?</p>
</blockquote>
<p>1600 people answered, and I decided to spend a few days categorizing all the
responses. Along the way I learned that classifying qualitative data is not
easy but I gave it my best shot. I ended up building a custom
<a href="https://github.com/jvns/classificator">tool</a> to make it faster to categorize
everything.</p>
<p>As with all of my surveys the methodology isn&rsquo;t particularly scientific. I just
posted the survey to Mastodon and Twitter, ran it for a couple of days, and got
answers from whoever happened to see it and felt like responding.</p>
<p>I think it&rsquo;s worth keeping in mind while reading these comments that</p>
<ul>
<li>40% of people answering this survey have been using the terminal for <strong>21+ years</strong></li>
<li>95% of people answering the survey have been using the terminal for at least 4 years</li>
</ul>
<p>These comments aren&rsquo;t coming from total beginners.</p>
<p>Here are the categories of frustrations! The number in brackets is the number
of people with that frustration. I&rsquo;m mostly writing this up for myself because
I&rsquo;m trying to write a zine about the terminal and I wanted to get a sense for
what people are having trouble with.</p>
<h3 id="remembering-syntax-115">remembering syntax (115)</h3>
<p>People talked about struggles remembering:</p>
<ul>
<li>the syntax for CLI tools like awk, jq, sed, etc</li>
<li>the syntax for redirects</li>
<li>keyboard shortcuts for tmux, text editing, etc</li>
</ul>
<p>One example comment:</p>
<blockquote>
<p>There are just so many little &ldquo;trivia&rdquo; details to remember for full
functionality. Even after all these years I&rsquo;ll sometimes forget where it&rsquo;s 2
or 1 for stderr, or forget which is which for <code>&gt;</code> and <code>&gt;&gt;</code>.</p>
</blockquote>
<h3 id="switching-terminals-is-hard-91">switching terminals is hard (91)</h3>
<p>People talked about struggling with switching systems (for example home/work
computer or when SSHing) and running into:</p>
<ul>
<li>OS differences in keyboard shortcuts (like Linux vs Mac)</li>
<li>systems which don&rsquo;t have their preferred text editor (&ldquo;no vim&rdquo; or &ldquo;only vim&rdquo;)</li>
<li>different versions of the same command (like Mac OS grep vs GNU grep)</li>
<li>no tab completion</li>
<li>a shell they aren&rsquo;t used to (&ldquo;the subtle differences between zsh and bash&rdquo;)</li>
</ul>
<p>as well as differences inside the same system like pagers not being consistent
with each other (git diff pagers, other pagers).</p>
<p>One example comment:</p>
<blockquote>
<p>I got used to fish and vi mode which are not available when I ssh into
servers, containers.</p>
</blockquote>
<h3 id="color-85">color (85)</h3>
<p>Lots of problems with color, like:</p>
<ul>
<li>programs setting colors that are unreadable with a light background color</li>
<li>finding a colorscheme they like (and getting it to work consistently across different apps)</li>
<li>color not working inside several layers of SSH/tmux/etc</li>
<li>not liking the defaults</li>
<li>not wanting color at all and struggling to turn it off</li>
</ul>
<p>This comment felt relatable to me:</p>
<blockquote>
<p>Getting my terminal theme configured in a reasonable way between the terminal
emulator and fish (I did this years ago and remember it being tedious and
fiddly and now feel like I&rsquo;m locked into my current theme because it works
and I dread touching any of that configuration ever again).</p>
</blockquote>
<h3 id="keyboard-shortcuts-84">keyboard shortcuts (84)</h3>
<p>Half of the comments on keyboard shortcuts were about how on Linux/Windows, the
keyboard shortcut to copy/paste in the terminal is different from in the rest
of the OS.</p>
<p>Some other issues with keyboard shortcuts other than copy/paste:</p>
<ul>
<li>using <code>Ctrl-W</code> in a browser-based terminal and closing the window</li>
<li>the terminal only supports a limited set of keyboard shortcuts (no
<code>Ctrl-Shift-</code>, no <code>Super</code>, no <code>Hyper</code>, lots of <code>Ctrl-</code> shortcuts aren&rsquo;t
possible like <code>Ctrl-,</code>)</li>
<li>the OS stopping you from using a terminal keyboard shortcut (like by default
Mac OS uses <code>Ctrl+left arrow</code> for something else)</li>
<li>issues using emacs in the terminal</li>
<li>backspace not working (2)</li>
</ul>
<h3 id="other-copy-and-paste-issues-75">other copy and paste issues (75)</h3>
<p>Aside from &ldquo;the keyboard shortcut for copy and paste is different&rdquo;, there were
a lot of OTHER issues with copy and paste, like:</p>
<ul>
<li>copying over SSH</li>
<li>how tmux and the terminal emulator both do copy/paste in different ways</li>
<li>dealing with many different clipboards (system clipboard, vim clipboard, the
&ldquo;middle click&rdquo; clipboard on Linux, tmux&rsquo;s clipboard, etc) and potentially
synchronizing them</li>
<li>random spaces added when copying from the terminal</li>
<li>pasting multiline commands which automatically get run in a terrifying way</li>
<li>wanting a way to copy text without using the mouse</li>
</ul>
<h3 id="discoverability-55">discoverability (55)</h3>
<p>There were lots of comments about this, which all came down to the same basic
complaint &ndash; it&rsquo;s hard to discover useful tools or features! This comment kind of
summed it all up:</p>
<blockquote>
<p>How difficult it is to learn independently. Most of what I know is an
assorted collection of stuff I&rsquo;ve been told by random people over the years.</p>
</blockquote>
<h3 id="steep-learning-curve-44">steep learning curve (44)</h3>
<p>A lot of comments about it generally having a steep learning curve. A couple of
example comments:</p>
<blockquote>
<p>After 15 years of using it, I’m not much faster than using it than I was 5 or
maybe even 10 years ago.</p>
</blockquote>
<p>and</p>
<blockquote>
<p>That I know I could make my life easier by learning more about the shortcuts
and commands and configuring the terminal but I don&rsquo;t spend the time because it
feels overwhelming.</p>
</blockquote>
<h3 id="history-42">history (42)</h3>
<p>Some issues with shell history:</p>
<ul>
<li>history not being shared between terminal tabs (16)</li>
<li>limits that are too short (4)</li>
<li>history not being restored when terminal tabs are restored</li>
<li>losing history because the terminal crashed</li>
<li>not knowing how to search history</li>
</ul>
<p>One example comment:</p>
<blockquote>
<p>It wasted a lot of time until I figured it out and still annoys me that
&ldquo;history&rdquo; on zsh has such a small buffer;  I have to type &ldquo;history 0&rdquo; to get
any useful length of history.</p>
</blockquote>
<h3 id="bad-documentation-37">bad documentation (37)</h3>
<p>People talked about:</p>
<ul>
<li>documentation being generally opaque</li>
<li>lack of examples in man pages</li>
<li>programs which don&rsquo;t have man pages</li>
</ul>
<p>Here&rsquo;s a representative comment:</p>
<blockquote>
<p>Finding good examples and docs. Man pages often not enough, have to wade
through stack overflow</p>
</blockquote>
<h3 id="scrollback-36">scrollback (36)</h3>
<p>A few issues with scrollback:</p>
<ul>
<li>programs printing out too much data making you lose scrollback history</li>
<li>resizing the terminal messes up the scrollback</li>
<li>lack of timestamps</li>
<li>GUI programs that you start in the background printing stuff out that gets in
the way of other programs&rsquo; outputs</li>
</ul>
<p>One example comment:</p>
<blockquote>
<p>When resizing the terminal (in particular: making it narrower) leads to
broken rewrapping of the scrollback content because the commands formatted
their output based on the terminal window width.</p>
</blockquote>
<h3 id="it-feels-outdated-33">&ldquo;it feels outdated&rdquo; (33)</h3>
<p>Lots of comments about how the terminal feels hampered by legacy decisions and
how users often end up needing to learn implementation details that feel very
esoteric. One example comment:</p>
<blockquote>
<p>Most of the legacy cruft, it would be great to have a green field
implementation of the CLI interface.</p>
</blockquote>
<h3 id="shell-scripting-32">shell scripting (32)</h3>
<p>Lots of complaints about POSIX shell scripting. There&rsquo;s a general feeling that
shell scripting is difficult but also that switching to a different less
standard scripting language (fish, nushell, etc) brings its own problems.</p>
<blockquote>
<p>Shell scripting. My tolerance to ditch a shell script and go to a scripting
language is pretty low. It’s just too messy and powerful. Screwing up can be
costly so I don’t even bother.</p>
</blockquote>
<h3 id="more-issues">more issues</h3>
<p>Some more issues that were mentioned at least 10 times:</p>
<ul>
<li>(31) inconsistent command line arguments: is it -h or help or --help?</li>
<li>(24) keeping dotfiles in sync across different systems</li>
<li>(23) performance (e.g. &ldquo;my shell takes too long to start&rdquo;)</li>
<li>(20) window management (potentially with some combination of tmux tabs, terminal tabs, and multiple terminal windows. Where did that shell session go?)</li>
<li>(17) generally feeling scared/uneasy (&ldquo;The debilitating fear that I’m going
to do some mysterious Bad Thing with a command and I will have absolutely no
idea how to fix or undo it or even really figure out what happened&rdquo;)</li>
<li>(16) terminfo issues (&ldquo;Having to learn about terminfo if/when I try a new terminal emulator and ssh elsewhere.&rdquo;)</li>
<li>(16) lack of image support (sixel etc)</li>
<li>(15) SSH issues (like having to start over when you lose the SSH connection)</li>
<li>(15) various tmux/screen issues (for example lack of integration between tmux and the terminal emulator)</li>
<li>(15) typos &amp; slow typing</li>
<li>(13) the terminal getting messed up for various reasons (pressing <code>Ctrl-S</code>, <code>cat</code>ing a binary, etc)</li>
<li>(12) quoting/escaping in the shell</li>
<li>(11) various Windows/PowerShell issues</li>
</ul>
<h3 id="n-a-122">n/a (122)</h3>
<p>There were also 122 answers to the effect of &ldquo;nothing really&rdquo; or &ldquo;only that I
can&rsquo;t do EVERYTHING in the terminal&rdquo;</p>
<p>One example comment:</p>
<blockquote>
<p>Think I&rsquo;ve found work arounds for most/all frustrations</p>
</blockquote>
<h3 id="that-s-all">that&rsquo;s all!</h3>
<p>I&rsquo;m not going to make a lot of commentary on these results, but here are a
couple of categories that feel related to me:</p>
<ul>
<li>remembering syntax &amp; history (often the thing you need to remember is something you&rsquo;ve run before!)</li>
<li>discoverability &amp; the learning curve (the lack of discoverability is definitely a big part of what makes it hard to learn)</li>
<li>&ldquo;switching terminals is hard&rdquo; &amp; &ldquo;it feels outdated&rdquo; (tools that haven&rsquo;t really
changed in 30 or 40 years have many problems but they do tend to be always
<em>there</em> no matter what system you&rsquo;re on, which is very useful and makes them
hard to stop using)</li>
</ul>
<p>Trying to categorize all these results in a reasonable way really gave me an
appreciation for social science researchers&rsquo; skills.</p>
]]></content>
    </entry>
    <entry>
        <title type="html"><![CDATA[What's involved in getting a "modern" terminal setup?]]></title>
        <link href="https://jvns.ca/blog/2025/01/11/getting-a-modern-terminal-setup/"/>
        <updated>2025-01-11T09:46:01+00:00</updated>
        <id>https://jvns.ca/blog/2025/01/11/getting-a-modern-terminal-setup/</id>
        <content type="html"><![CDATA[<p>Hello! Recently I ran a terminal survey and I asked people what frustrated
them. One person commented:</p>
<blockquote>
<p>There are so many pieces to having a modern terminal experience. I wish it
all came out of the box.</p>
</blockquote>
<p>My immediate reaction was &ldquo;oh, getting a modern terminal experience isn&rsquo;t that
hard, you just need to&hellip;.&rdquo;, but the more I thought about it, the longer the
&ldquo;you just need to&hellip;&rdquo; list got, and I kept thinking about more and more
caveats.</p>
<p>So I thought I would write down some notes about what it means to me personally
to have a &ldquo;modern&rdquo; terminal experience and what I think can make it hard for
people to get there.</p>
<h3 id="what-is-a-modern-terminal-experience">what is a &ldquo;modern terminal experience&rdquo;?</h3>
<p>Here are a few things that are important to me, with which part of the system
is responsible for them:</p>
<ul>
<li><strong>multiline support for copy and paste</strong>: if you paste 3 commands in your shell, it should not immediately run them all! That&rsquo;s scary! (<strong>shell</strong>, <strong>terminal emulator</strong>)</li>
<li><strong>infinite shell history</strong>: if I run a command in my shell, it should be saved forever, not deleted after 500 history entries or whatever. Also I want commands to be saved to the history immediately when I run them, not only when I exit the shell session (<strong>shell</strong>)</li>
<li><strong>a useful prompt</strong>: I can&rsquo;t live without having my <strong>current directory</strong> and <strong>current git branch</strong> in my prompt (<strong>shell</strong>)</li>
<li><strong>24-bit colour</strong>: this is important to me because I find it MUCH easier to theme neovim with 24-bit colour support than in a terminal with only 256 colours (<strong>terminal emulator</strong>)</li>
<li><strong>clipboard integration</strong> between vim and my operating system so that when I copy in Firefox, I can just press <code>p</code> in vim to paste (<strong>text editor</strong>, maybe the OS/terminal emulator too)</li>
<li><strong>good autocomplete</strong>: for example commands like git should have command-specific autocomplete (<strong>shell</strong>)</li>
<li><strong>having colours in <code>ls</code></strong> (<strong>shell config</strong>)</li>
<li><strong>a terminal theme I like</strong>: I spend a lot of time in my terminal, I want it to look nice and I want its theme to match my terminal editor&rsquo;s theme. (<strong>terminal emulator</strong>, <strong>text editor</strong>)</li>
<li><strong>automatic terminal fixing</strong>: If a program prints out some weird escape
codes that mess up my terminal, I want that to automatically get reset so
that my terminal doesn&rsquo;t get messed up (<strong>shell</strong>)</li>
<li><strong>keybindings</strong>: I want <code>Ctrl+left arrow</code> to work (<strong>shell</strong> or <strong>application</strong>)</li>
<li><strong>being able to use the scroll wheel in programs like <code>less</code></strong>: (<strong>terminal emulator</strong> and <strong>applications</strong>)</li>
</ul>
<p>There are a million other terminal conveniences out there and different people
value different things, but those are the ones that I would be really unhappy
without.</p>
<h3 id="how-i-achieve-a-modern-experience">how I achieve a &ldquo;modern experience&rdquo;</h3>
<p>My basic approach is:</p>
<ol>
<li>use the <code>fish</code> shell. Mostly don&rsquo;t configure it, except to:
<ul>
<li>set the <code>EDITOR</code> environment variable to my favourite terminal editor</li>
<li>alias <code>ls</code> to <code>ls --color=auto</code></li>
</ul>
</li>
<li>use any terminal emulator with 24-bit colour support. In the past I&rsquo;ve used
GNOME Terminal, Terminator, and iTerm, but I&rsquo;m not picky about this. I don&rsquo;t really
configure it other than to choose a font.</li>
<li>use <code>neovim</code>, with a configuration that I&rsquo;ve been very slowly building over the last 9 years or so (the last time I deleted my vim config and started from scratch was 9 years ago)</li>
<li>use the <a href="https://github.com/chriskempson/base16">base16 framework</a> to theme everything</li>
</ol>
<p>A few things that affect my approach:</p>
<ul>
<li>I don&rsquo;t spend a lot of time SSHed into other machines</li>
<li>I&rsquo;d rather use the mouse a little than come up with keyboard-based ways to do everything</li>
<li>I work on a lot of small projects, not one big project</li>
</ul>
<h3 id="some-out-of-the-box-options-for-a-modern-experience">some &ldquo;out of the box&rdquo; options for a &ldquo;modern&rdquo; experience</h3>
<p>What if you want a nice experience, but don&rsquo;t want to spend a lot of time on
configuration? Figuring out how to configure vim in a way that I was satisfied
with really did take me like ten years, which is a long time!</p>
<p>My best ideas for how to get a reasonable terminal experience with minimal
config are:</p>
<ul>
<li>shell: either <code>fish</code> or <code>zsh</code> with <a href="https://ohmyz.sh/">oh-my-zsh</a></li>
<li>terminal emulator: almost anything with 24-bit colour support, for example all of these are popular:
<ul>
<li>linux: GNOME Terminal, Konsole, Terminator, xfce4-terminal</li>
<li>mac: iTerm (Terminal.app doesn&rsquo;t have 24-bit colour support)</li>
<li>cross-platform: kitty, alacritty, wezterm, or ghostty</li>
</ul>
</li>
<li>shell config:
<ul>
<li>set the <code>EDITOR</code> environment variable to your favourite terminal text
editor</li>
<li>maybe alias <code>ls</code> to <code>ls --color=auto</code></li>
</ul>
</li>
<li>text editor: this is a tough one, maybe <a href="https://micro-editor.github.io/">micro</a> or <a href="https://helix-editor.com/">helix</a>? I haven&rsquo;t used
either of them seriously but they both seem like very cool projects and I
think it&rsquo;s amazing that you can just use all the usual GUI editor commands
(<code>Ctrl-C</code> to copy, <code>Ctrl-V</code> to paste, <code>Ctrl-A</code> to select all) in micro and
they do what you&rsquo;d expect. I would probably try switching to helix except
that retraining my vim muscle memory seems way too hard. Also helix doesn&rsquo;t
have a GUI or plugin system yet.</li>
</ul>
<p>Personally I <strong>wouldn&rsquo;t</strong> use xterm, rxvt, or Terminal.app as a terminal emulator,
because I&rsquo;ve found in the past that they&rsquo;re missing core features (like 24-bit
colour in Terminal.app&rsquo;s case) that make the terminal harder to use for me.</p>
<p>I don&rsquo;t want to pretend that getting a &ldquo;modern&rdquo; terminal experience is easier
than it is though &ndash; I think there are three issues that make it hard. Let&rsquo;s talk
about them!</p>
<h3 id="issue-1-with-getting-to-a-modern-experience-the-shell">issue 1 with getting to a &ldquo;modern&rdquo; experience: the shell</h3>
<p>bash and zsh are by far the two most popular shells, and neither of them
provides a default experience that I would be happy using out of the box, for
example:</p>
<ul>
<li>you need to customize your prompt</li>
<li>they don&rsquo;t come with git completions by default, you have to set them up</li>
<li>by default, bash only stores 500 (!) lines of history and (at least on Mac OS)
zsh is only configured to store 2000 lines, which is still not a lot</li>
<li>I find bash&rsquo;s tab completion very frustrating, if there&rsquo;s more than
one match then you can&rsquo;t tab through them</li>
</ul>
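<p>For the history limits in particular, here&rsquo;s one possible sketch of what you might put in <code>~/.bashrc</code>. It&rsquo;s not a complete config: the specific numbers are arbitrary, and the <code>PROMPT_COMMAND</code> line assumes <code>PROMPT_COMMAND</code> is a plain string (newer bash versions also allow it to be an array).</p>

```shell
# ~/.bashrc (bash only)
HISTSIZE=1000000        # number of commands kept in memory for this session
HISTFILESIZE=1000000    # number of lines kept in the history file
shopt -s histappend     # on exit, append to the history file instead of overwriting it
# Write each command to the history file as soon as it's run,
# rather than only when the shell exits.
PROMPT_COMMAND="history -a${PROMPT_COMMAND:+; $PROMPT_COMMAND}"
```

<p>This only covers the &ldquo;history is too short&rdquo; and &ldquo;history is only saved on exit&rdquo; problems, the prompt and completions still need their own setup.</p>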
<p>And even though <a href="https://jvns.ca/blog/2024/09/12/reasons-i--still--love-fish/">I love fish</a>, the fact
that it isn&rsquo;t POSIX does make it hard for a lot of folks to make the switch.</p>
<p>Of course it&rsquo;s totally possible to learn how to customize your prompt in bash
or whatever, and it doesn&rsquo;t even need to be that complicated (in bash I&rsquo;d
probably start with something like <code>export PS1='[\u@\h \W$(__git_ps1 &quot; (%s)&quot;)]\$ '</code>, or maybe use <a href="https://starship.rs/">starship</a>).
But each of these &ldquo;not complicated&rdquo; things really does add up and it&rsquo;s
especially tough if you need to keep your config in sync across several
systems.</p>
<p>An extremely popular solution to getting a &ldquo;modern&rdquo; shell experience is
<a href="https://ohmyz.sh/">oh-my-zsh</a>. It seems like a great project and I know a lot
of people use it very happily, but I&rsquo;ve struggled with configuration systems
like that in the past &ndash; it looks like right now the base oh-my-zsh adds about
3000 lines of config, and often I find that having an extra configuration
system makes it harder to debug what&rsquo;s happening when things go wrong. I
personally have a tendency to use the system to add a lot of extra plugins,
make my system slow, get frustrated that it&rsquo;s slow, and then delete it
completely and write a new config from scratch.</p>
<h3 id="issue-2-with-getting-to-a-modern-experience-the-text-editor">issue 2 with getting to a &ldquo;modern&rdquo; experience: the text editor</h3>
<p>In the terminal survey I ran recently, the most popular terminal text editors
by far were <code>vim</code>, <code>emacs</code>, and <code>nano</code>.</p>
<p>I think the main options for terminal text editors are:</p>
<ul>
<li>use vim or emacs and configure it to your liking, you can probably have any
feature you want if you put in the work</li>
<li>use nano and accept that you&rsquo;re going to have a pretty limited experience
(for example I don&rsquo;t think you can select text with the mouse and then &ldquo;cut&rdquo;
it in nano)</li>
<li>use <code>micro</code> or <code>helix</code> which seem to offer a pretty good out-of-the-box
experience, potentially occasionally run into issues with using a less
mainstream text editor</li>
<li>just avoid using a terminal text editor as much as possible, maybe use VSCode, use
VSCode&rsquo;s terminal for all your terminal needs, and mostly never edit files in
the terminal. Or I know a lot of people use <code>code</code> as their <code>EDITOR</code> in the terminal.</li>
</ul>
<h3 id="issue-3-individual-applications">issue 3: individual applications</h3>
<p>The last issue is that sometimes individual programs that I use are kind of
annoying. For example on my Mac OS machine, <code>/usr/bin/sqlite3</code> doesn&rsquo;t support
the <code>Ctrl+Left Arrow</code> keyboard shortcut. Fixing this to get a reasonable
terminal experience in SQLite was a little complicated, I had to:</p>
<ul>
<li>realize why this is happening (Mac OS won&rsquo;t ship GNU tools, and &ldquo;Ctrl-Left arrow&rdquo; support comes from GNU readline)</li>
<li>find a workaround (install sqlite from homebrew, which does have readline support)</li>
<li>adjust my environment (put Homebrew&rsquo;s sqlite3 in my PATH)</li>
</ul>
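<p>For the last step, a minimal sketch of what the PATH change might look like in a bash or zsh config, assuming Homebrew is installed and that the formula is named <code>sqlite</code> (worth double checking with <code>brew info sqlite</code> on your machine):</p>

```shell
# Put Homebrew's sqlite3 ahead of /usr/bin/sqlite3 in PATH.
# Assumes the Homebrew formula is called "sqlite"; does nothing if brew isn't installed.
if command -v brew >/dev/null 2>&1; then
    PATH="$(brew --prefix sqlite)/bin:$PATH"
    export PATH
fi
```

<p>After opening a new shell, <code>command -v sqlite3</code> should tell you which <code>sqlite3</code> you&rsquo;re actually getting.</p>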
<p>I find that debugging application-specific issues like this is really not easy
and often it doesn&rsquo;t feel &ldquo;worth it&rdquo; &ndash; often I&rsquo;ll end up just dealing with
various minor inconveniences because I don&rsquo;t want to spend hours investigating
them. The only reason I was even able to figure this one out at all is that
I&rsquo;ve been spending a huge amount of time thinking about the terminal recently.</p>
<p>A big part of having a &ldquo;modern&rdquo; experience using terminal programs is just
using newer terminal programs, for example I can&rsquo;t be bothered to learn a
keyboard shortcut to sort the columns in <code>top</code>, but in <code>htop</code>  I can just click
on a column heading with my mouse to sort it. So I use htop instead! But discovering new more &ldquo;modern&rdquo; command line tools isn&rsquo;t easy (though
I made <a href="https://jvns.ca/blog/2022/04/12/a-list-of-new-ish--command-line-tools/">a list here</a>),
finding ones that I actually like using in practice takes time, and if you&rsquo;re
SSHed into another machine, they won&rsquo;t always be there.</p>
<h3 id="everything-affects-everything-else">everything affects everything else</h3>
<p>Something I find tricky about configuring my terminal to make everything &ldquo;nice&rdquo;
is that changing one seemingly small thing about my workflow can really affect
everything else. For example right now I don&rsquo;t use tmux. But if I needed to use
tmux again (for example because I was doing a lot of work SSHed into another
machine), I&rsquo;d need to think about a few things, like:</p>
<ul>
<li>if I wanted tmux&rsquo;s copy to synchronize with my system clipboard over
SSH, I&rsquo;d need to make sure that my terminal emulator has <a href="https://old.reddit.com/r/vim/comments/k1ydpn/a_guide_on_how_to_copy_text_from_anywhere/">OSC 52 support</a></li>
<li>if I wanted to use iTerm&rsquo;s tmux integration (which makes tmux tabs into iTerm
tabs), I&rsquo;d need to change how I configure colours &ndash; right now I set them
with a <a href="https://github.com/chriskempson/base16-shell/blob/588691ba71b47e75793ed9edfcfaa058326a6f41/scripts/base16-solarized-light.sh">shell script</a> that I run when my shell starts, but that means the
colours get lost when restoring a tmux session.</li>
</ul>
<p>and probably more things I haven&rsquo;t thought of. &ldquo;Using tmux means that I have to
change how I manage my colours&rdquo; sounds unlikely, but that really did happen to
me and I decided &ldquo;well, I don&rsquo;t want to change how I manage colours right now,
so I guess I&rsquo;m not using that feature!&rdquo;.</p>
<p>It&rsquo;s also hard to remember which features I&rsquo;m relying on &ndash; for example maybe
my current terminal <em>does</em> have OSC 52 support and because copying from tmux over SSH
has always Just Worked I don&rsquo;t even realize that that&rsquo;s something I need, and
then it mysteriously stops working when I switch terminals.</p>
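<p>For context, OSC 52 is an escape sequence that a program prints to ask the terminal emulator to put some text on the system clipboard. Here&rsquo;s a minimal sketch of what that looks like from a shell. The function name <code>osc52_copy</code> is made up for this example, and whether anything actually lands on your clipboard depends on your terminal emulator (and on tmux&rsquo;s settings if you&rsquo;re inside tmux):</p>

```shell
# osc52_copy TEXT: print an OSC 52 sequence asking the terminal to copy TEXT.
# Format: ESC ] 52 ; c ; <base64 of the text> BEL   ("c" selects the clipboard)
# The function name is hypothetical.
osc52_copy() {
    # tr strips the line wrapping that some base64 implementations add
    encoded=$(printf '%s' "$1" | base64 | tr -d '\n')
    printf '\033]52;c;%s\a' "$encoded"
}

osc52_copy "hello from the other side of an SSH connection"
```

<p>Because the sequence is just bytes written to the terminal, it can travel back over an SSH connection, which is why terminal support for it matters for copying from a remote tmux.</p>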
<h3 id="change-things-slowly">change things slowly</h3>
<p>Personally even though I think my setup is not <em>that</em> complicated, it&rsquo;s taken
me 20 years to get to this point! Because terminal config changes are so likely
to have unexpected and hard-to-understand consequences, I&rsquo;ve found that if I
change a lot of terminal configuration all at once it makes it much harder to
understand what went wrong if there&rsquo;s a problem, which can be really
disorienting.</p>
<p>So I usually prefer to make pretty small changes, and accept that changes
might take me a REALLY long time to get used to. For example I switched from
using <code>ls</code> to <a href="https://github.com/eza-community/eza">eza</a> a year or two ago and
while I like it (because <code>eza -l</code> prints human-readable file sizes by default)
I&rsquo;m still not quite sure about it. But also sometimes it&rsquo;s worth it to make a
big change, like I made the switch to fish (from bash) 10 years ago and I&rsquo;m
very happy I did.</p>
<h3 id="getting-a-modern-terminal-is-not-that-easy">getting a &ldquo;modern&rdquo; terminal is not that easy</h3>
<p>Trying to explain how &ldquo;easy&rdquo; it is to configure your terminal really just made
me think that it&rsquo;s kind of hard and that I still sometimes get confused.</p>
<p>I&rsquo;ve found that there&rsquo;s never one perfect way to configure things in the
terminal that will be compatible with every single other thing. I just need to
try stuff, figure out some kind of locally stable state that works for me, and
accept that if I start using a new tool it might disrupt the system and I might
need to rethink things.</p>
]]></content>
    </entry>
    <entry>
        <title type="html"><![CDATA["Rules" that terminal programs follow]]></title>
        <link href="https://jvns.ca/blog/2024/11/26/terminal-rules/"/>
        <updated>2024-12-12T09:28:22+00:00</updated>
        <id>https://jvns.ca/blog/2024/11/26/terminal-rules/</id>
        <content type="html"><![CDATA[<p>Recently I&rsquo;ve been thinking about how everything that happens in the terminal
is some combination of:</p>
<ol>
<li>Your <strong>operating system</strong>&rsquo;s job</li>
<li>Your <strong>shell</strong>&rsquo;s job</li>
<li>Your <strong>terminal emulator</strong>&rsquo;s job</li>
<li>The job of <strong>whatever program you happen to be running</strong> (like <code>top</code> or <code>vim</code> or <code>cat</code>)</li>
</ol>
<p>The first three (your operating system, shell, and terminal emulator) are all kind of
known quantities &ndash; if you&rsquo;re using bash in GNOME Terminal on Linux, you can
more or less reason about how all of those things interact, and some of
their behaviour is standardized by POSIX.</p>
<p>But the fourth one (&ldquo;whatever program you happen to be running&rdquo;) feels like it
could do ANYTHING. How are you supposed to know how a program is going to
behave?</p>
<p>This post is kind of long so here&rsquo;s a quick table of contents:</p>
<ul>
<li><a href="#programs-behave-surprisingly-consistently">programs behave surprisingly consistently</a></li>
<li><a href="#these-are-meant-to-be-descriptive-not-prescriptive">these are meant to be descriptive, not prescriptive</a></li>
<li><a href="#it-s-not-always-obvious-which-rules-are-the-program-s-responsibility-to-implement">it&rsquo;s not always obvious which &ldquo;rules&rdquo; are the program&rsquo;s responsibility to implement</a></li>
<li><a href="#rule-1-noninteractive-programs-should-quit-when-you-press-ctrl-c">rule 1: noninteractive programs should quit when you press <code>Ctrl-C</code></a></li>
<li><a href="#rule-2-tuis-should-quit-when-you-press-q">rule 2: TUIs should quit when you press <code>q</code></a></li>
<li><a href="#rule-3-repls-should-quit-when-you-press-ctrl-d-on-an-empty-line">rule 3: REPLs should quit when you press <code>Ctrl-D</code> on an empty line</a></li>
<li><a href="#rule-4-don-t-use-more-than-16-colours">rule 4: don&rsquo;t use more than 16 colours</a></li>
<li><a href="#rule-5-vaguely-support-readline-keybindings">rule 5: vaguely support readline keybindings</a></li>
<li><a href="#rule-5-1-ctrl-w-should-delete-the-last-word">rule 5.1: <code>Ctrl-W</code> should delete the last word</a></li>
<li><a href="#rule-6-disable-colours-when-writing-to-a-pipe">rule 6: disable colours when writing to a pipe</a></li>
<li><a href="#rule-7-means-stdin-stdout">rule 7: <code>-</code> means stdin/stdout</a></li>
<li><a href="#these-rules-take-a-long-time-to-learn">these &ldquo;rules&rdquo; take a long time to learn</a></li>
</ul>
<h3 id="programs-behave-surprisingly-consistently">programs behave surprisingly consistently</h3>
<p>As far as I know, there are no real standards for how programs in the terminal
should behave &ndash; the closest things I know of are:</p>
<ul>
<li>POSIX, which mostly dictates how your terminal emulator / OS / shell should
work together. I think it does specify a few things about how core utilities like
<code>cp</code> should work but AFAIK it doesn&rsquo;t have anything to say about how for
example <code>htop</code> should behave.</li>
<li>these <a href="https://clig.dev/">command line interface guidelines</a></li>
</ul>
<p>But even though there are no standards, in my experience programs in the
terminal behave in a pretty consistent way. So I wanted to write down a list of
&ldquo;rules&rdquo; that in my experience programs mostly follow.</p>
<h3 id="these-are-meant-to-be-descriptive-not-prescriptive">these are meant to be descriptive, not prescriptive</h3>
<p>My goal here isn&rsquo;t to convince authors of terminal programs that they <em>should</em>
follow any of these rules. There are lots of exceptions to these and often
there&rsquo;s a good reason for those exceptions.</p>
<p>But it&rsquo;s very useful for me to know what behaviour to expect from a random new
terminal program that I&rsquo;m using. Instead of &ldquo;uh, programs could do literally
anything&rdquo;, it&rsquo;s &ldquo;ok, here are the basic rules I expect, and then I can keep a
short mental list of exceptions&rdquo;.</p>
<p>So I&rsquo;m just writing down what I&rsquo;ve observed about how programs behave in my 20
years of using the terminal, why I think they behave that way, and some
examples of cases where that rule is &ldquo;broken&rdquo;.</p>
<h3 id="it-s-not-always-obvious-which-rules-are-the-program-s-responsibility-to-implement">it&rsquo;s not always obvious which &ldquo;rules&rdquo; are the program&rsquo;s responsibility to implement</h3>
<p>There are a bunch of common conventions that I think are pretty clearly the
program&rsquo;s responsibility to implement, like:</p>
<ul>
<li>config files should go in <code>~/.BLAHrc</code> or <code>~/.config/BLAH/FILE</code> or <code>/etc/BLAH/</code> or something</li>
<li><code>--help</code> should print help text</li>
<li>programs should print &ldquo;regular&rdquo; output to stdout and errors to stderr</li>
</ul>
<p>But in this post I&rsquo;m going to focus on things that it&rsquo;s not 100% obvious are
the program&rsquo;s responsibility. For example it feels to me like a &ldquo;law of nature&rdquo;
that pressing <code>Ctrl-D</code> should quit a REPL, but programs often
need to explicitly implement support for it &ndash; even though <code>cat</code> doesn&rsquo;t need
to implement <code>Ctrl-D</code> support, <code>ipython</code> <a href="https://github.com/prompt-toolkit/python-prompt-toolkit/blob/a2a12300c635ab3c051566e363ed27d853af4b21/src/prompt_toolkit/shortcuts/prompt.py#L824-L837">does</a>. (more about that in &ldquo;rule 3&rdquo; below)</p>
<p>Understanding which things are the program&rsquo;s responsibility makes it much less
surprising when different programs&rsquo; implementations are slightly different.</p>
<h3 id="rule-1-noninteractive-programs-should-quit-when-you-press-ctrl-c">rule 1: noninteractive programs should quit when you press <code>Ctrl-C</code></h3>
<p>The main reason for this rule is that noninteractive programs will quit by
default on <code>Ctrl-C</code> if they don&rsquo;t set up a <code>SIGINT</code> signal handler, so this is
kind of a &ldquo;you should act like the default&rdquo; rule.</p>
<p>Something that trips a lot of people up is that this doesn&rsquo;t apply to
<strong>interactive</strong> programs like <code>python3</code> or <code>bc</code> or <code>less</code>. This is because in
an interactive program, <code>Ctrl-C</code> has a different job &ndash; if the program is
running an operation (like for example a search in <code>less</code> or some Python code
in <code>python3</code>), then <code>Ctrl-C</code> will interrupt that operation but not stop the
program.</p>
<p>As an example of how this works in an interactive program: here&rsquo;s the code <a href="https://github.com/prompt-toolkit/python-prompt-toolkit/blob/a2a12300c635ab3c051566e363ed27d853af4b21/src/prompt_toolkit/key_binding/bindings/vi.py#L2225">in prompt-toolkit</a> (the library that IPython uses for handling input)
that aborts a search when you press <code>Ctrl-C</code>.</p>
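<p>To make the interactive case a bit more concrete, here&rsquo;s a minimal Python sketch (it is not how <code>python3</code> or <code>less</code> are actually implemented). It relies on the fact that Python&rsquo;s default <code>SIGINT</code> handler raises <code>KeyboardInterrupt</code>. The names <code>run_one_command</code> and <code>pretend_ctrl_c</code> are made up for illustration.</p>

```python
def run_one_command(operation):
    """Run one command for an interactive program.

    Ctrl-C cancels the operation that is running, but the caller
    (for example a REPL loop) keeps going instead of exiting.
    """
    try:
        return ("ok", operation())
    except KeyboardInterrupt:
        # Python's default SIGINT handler raises KeyboardInterrupt
        # in the main thread, so Ctrl-C lands here.
        return ("interrupted", None)


def pretend_ctrl_c():
    # Stand-in for a long operation that the user interrupts.
    raise KeyboardInterrupt


print(run_one_command(lambda: 1 + 1))   # ('ok', 2)
print(run_one_command(pretend_ctrl_c))  # ('interrupted', None)
```

<p>A noninteractive program would simply not catch the exception (or not install a handler at all), so <code>Ctrl-C</code> ends the whole program.</p>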
<h3 id="rule-2-tuis-should-quit-when-you-press-q">rule 2: TUIs should quit when you press <code>q</code></h3>
<p>TUI programs (like <code>less</code> or <code>htop</code>) will usually quit when you press <code>q</code>.</p>
<p>This rule doesn&rsquo;t apply to any program where pressing <code>q</code> to quit wouldn&rsquo;t make
sense, like <code>tmux</code> or text editors.</p>
<h3 id="rule-3-repls-should-quit-when-you-press-ctrl-d-on-an-empty-line">rule 3: REPLs should quit when you press <code>Ctrl-D</code> on an empty line</h3>
<p>REPLs (like <code>python3</code> or <code>ed</code>) will usually quit when you press <code>Ctrl-D</code> on an
empty line. This rule is similar to the <code>Ctrl-C</code> rule &ndash; the reason for this is
that by default if you&rsquo;re running a program (like <code>cat</code>) in &ldquo;cooked mode&rdquo;, then
the operating system will return an <code>EOF</code> when you press <code>Ctrl-D</code> on an empty
line.</p>
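<p>From the program&rsquo;s point of view, that <code>EOF</code> is just a read that returns no data. Here&rsquo;s a small Python sketch of a loop that quits on it. <code>tiny_repl</code> is a made-up name, and <code>io.StringIO</code> stands in for a real terminal so that the example runs anywhere.</p>

```python
import io


def tiny_repl(stream):
    """Echo lines until EOF, then quit like a REPL would."""
    replies = []
    while True:
        line = stream.readline()
        if line == "":
            # Empty string means EOF. On a terminal in cooked mode this is
            # what the program sees after Ctrl-D on an empty line.
            replies.append("bye")
            return replies
        replies.append("you typed: " + line.rstrip("\n"))


# io.StringIO stands in for sys.stdin so the sketch runs anywhere.
print(tiny_repl(io.StringIO("1 + 1\nhelp\n")))
```

<p>Pressing Enter on an empty line is different: the program gets <code>"\n"</code>, not an empty string, so the loop keeps going.</p>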
<p>Most of the REPLs I use (sqlite3, python3, fish, bash, etc) don&rsquo;t actually use
cooked mode, but they all implement this keyboard shortcut anyway to mimic the
default behaviour.</p>
<p>For example, here&rsquo;s <a href="https://github.com/prompt-toolkit/python-prompt-toolkit/blob/a2a12300c635ab3c051566e363ed27d853af4b21/src/prompt_toolkit/shortcuts/prompt.py#L824-L837">the code in prompt-toolkit</a>
that quits when you press <code>Ctrl-D</code>, and here&rsquo;s <a href="https://github.com/bminor/bash/blob/6794b5478f660256a1023712b5fc169196ed0a22/lib/readline/readline.c#L658-L672">the same code in readline</a>.</p>
<p>I actually thought that this one was a &ldquo;Law of Terminal Physics&rdquo; until very
recently because I&rsquo;ve basically never seen it broken, but you can see that it&rsquo;s
just something that each individual input library has to implement in the links
above.</p>
<p>Someone pointed out that the Erlang REPL does not quit when you press <code>Ctrl-D</code>,
so I guess not every REPL follows this &ldquo;rule&rdquo;.</p>
<h3 id="rule-4-don-t-use-more-than-16-colours">rule 4: don&rsquo;t use more than 16 colours</h3>
<p>Terminal programs rarely use colours other than the base 16 ANSI colours. This
is because if you specify colours with a hex code, it&rsquo;s very likely to clash
with some users&rsquo; background colour. For example if I print out some text as
<code>#EEEEEE</code>, it would be almost invisible on a white background, though it would
look fine on a dark background.</p>
<p>But if you stick to the default 16 base colours, you have a much better chance
that the user has configured those colours in their terminal emulator so that
they work reasonably well with their background colour. Another reason to stick
to the default base 16 colours is that it makes fewer assumptions about what
colours the terminal emulator supports.</p>
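<p>The difference shows up in the escape codes themselves. Here&rsquo;s a rough Python sketch: a base 16 colour is only a palette slot number (SGR codes 30 to 37 and 90 to 97), while a 24-bit colour spells out exact red/green/blue values with <code>38;2;R;G;B</code>. The function names are made up for illustration.</p>

```python
RESET = "\x1b[0m"


def ansi16(text, index):
    """Colour text with one of the 16 base colours (index 0 to 15).

    The escape code only names a palette slot. The terminal emulator's
    theme decides which actual colour that slot is.
    """
    if index not in range(16):
        raise ValueError("index must be between 0 and 15")
    # 0-7 are the standard colours (codes 30-37),
    # 8-15 are the bright colours (codes 90-97).
    code = 30 + index if index in range(8) else 90 + index - 8
    return "\x1b[{}m{}{}".format(code, text, RESET)


def truecolor(text, hex_code):
    """Colour text with an exact 24-bit colour such as '#EEEEEE'."""
    red, green, blue = (int(hex_code[i:i + 2], 16) for i in (1, 3, 5))
    return "\x1b[38;2;{};{};{}m{}{}".format(red, green, blue, text, RESET)


print(repr(ansi16("error", 1)))             # slot 1 is "red" in most themes
print(repr(truecolor("error", "#EEEEEE")))  # always the same light grey
```

<p>With the first function the user&rsquo;s theme gets the final say, with the second one it doesn&rsquo;t.</p>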
<p>The only programs I usually see breaking this &ldquo;rule&rdquo; are text editors, for
example Helix by default will use a purple background which is not a default
ANSI colour. It seems fine for Helix to break this rule since Helix isn&rsquo;t a
&ldquo;core&rdquo; program and I assume any Helix user who doesn&rsquo;t like that colour scheme
will just change the theme.</p>
<h3 id="rule-5-vaguely-support-readline-keybindings">rule 5: vaguely support readline keybindings</h3>
<p>Almost every program I use supports <code>readline</code> keybindings if it would make
sense to do so. For example, here are a bunch of different programs and a link
to where they define <code>Ctrl-E</code> to go to the end of the line:</p>
<ul>
<li>ipython (<a href="https://github.com/prompt-toolkit/python-prompt-toolkit/blob/a2a12300c635ab3c051566e363ed27d853af4b21/src/prompt_toolkit/key_binding/bindings/emacs.py#L72">Ctrl-E defined here</a>)</li>
<li>atuin (<a href="https://github.com/atuinsh/atuin/blob/a67cfc82fe0dc907a01f07a0fd625701e062a33b/crates/atuin/src/command/client/search/interactive.rs#L407">Ctrl-E defined here</a>)</li>
<li>fzf (<a href="https://github.com/junegunn/fzf/blob/bb55045596d6d08445f3c6d320c3ec2b457462d1/src/terminal.go#L611">Ctrl-E defined here</a>)</li>
<li>zsh (<a href="https://github.com/zsh-users/zsh/blob/86d5f24a3d28541f242eb3807379301ea976de87/Src/Zle/zle_bindings.c#L94">Ctrl-E defined here</a>)</li>
<li>fish (<a href="https://github.com/fish-shell/fish-shell/blob/99fa8aaaa7956178973150a03ce4954ab17a197b/share/functions/fish_default_key_bindings.fish#L43">Ctrl-E defined here</a>)</li>
<li>tmux&rsquo;s command prompt (<a href="https://github.com/tmux/tmux/blob/ae8f2208c98e3c2d6e3fe4cad2281dce8fd11f31/key-bindings.c#L490">Ctrl-E defined here</a>)</li>
</ul>
<p>None of those programs actually uses <code>readline</code> directly, they just sort of
mimic emacs/readline keybindings. They don&rsquo;t always mimic them <em>exactly</em>: for
example atuin seems to use <code>Ctrl-A</code> as a prefix, so <code>Ctrl-A</code> doesn&rsquo;t go to the
beginning of the line.</p>
<p>Also all of these programs seem to implement their own internal cut and paste
buffers so you can delete a line with <code>Ctrl-U</code> and then paste it with <code>Ctrl-Y</code>.</p>
<p>The exceptions to this are:</p>
<ul>
<li>some programs (like <code>git</code>, <code>cat</code>, and <code>nc</code>) don&rsquo;t have any line editing support at all (except for backspace, <code>Ctrl-W</code>, and <code>Ctrl-U</code>)</li>
<li>as usual text editors are an exception, every text editor has its own
approach to editing text</li>
</ul>
<p>I wrote more about this &ldquo;what keybindings does a program support?&rdquo; question in
<a href="https://jvns.ca/blog/2024/07/08/readline/">entering text in the terminal is complicated</a>.</p>
<h3 id="rule-5-1-ctrl-w-should-delete-the-last-word">rule 5.1: <code>Ctrl-W</code> should delete the last word</h3>
<p>I&rsquo;ve never seen a program (other than a text editor) where <code>Ctrl-W</code> <em>doesn&rsquo;t</em>
delete the last word. This is similar to the <code>Ctrl-C</code> rule &ndash; by default if a
program is in &ldquo;cooked mode&rdquo;, the OS will delete the last word if you press
<code>Ctrl-W</code>, and delete the whole line if you press <code>Ctrl-U</code>. So usually programs
will imitate that behaviour.</p>
<p>I can&rsquo;t think of any exceptions to this other than text editors but if there
are I&rsquo;d love to hear about them!</p>
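<p>As a rough model of what &ldquo;delete the last word&rdquo; means, here&rsquo;s a Python sketch. It only treats spaces as word boundaries, which is an assumption: the exact rules vary a little between the OS&rsquo;s terminal driver and different input libraries. The function names are made up.</p>

```python
def delete_last_word(line):
    """Roughly what Ctrl-W does: drop trailing spaces, then the last word."""
    without_trailing = line.rstrip(" ")
    # rfind returns -1 when there is no space, so the cut point becomes 0.
    cut = without_trailing.rfind(" ") + 1
    return line[:cut]


def delete_line(line):
    """Roughly what Ctrl-U does: throw away everything typed so far."""
    return ""


print(repr(delete_last_word("git commit --amend")))  # 'git commit '
print(repr(delete_last_word("git commit ")))         # 'git '
print(repr(delete_last_word("git")))                 # ''
```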
<h3 id="rule-6-disable-colours-when-writing-to-a-pipe">rule 6: disable colours when writing to a pipe</h3>
<p>Most programs will disable colours when writing to a pipe. For example:</p>
<ul>
<li><code>rg blah</code> will highlight all occurrences of <code>blah</code> in the output, but if the
output is to a pipe or a file, it&rsquo;ll turn off the highlighting.</li>
<li><code>ls --color=auto</code> will use colour when writing to a terminal, but not when
writing to a pipe</li>
</ul>
<p>Both of those programs will also format their output differently when writing
to the terminal: <code>ls</code> will organize files into columns, and ripgrep will group
matches with headings.</p>
<p>If you want to force the program to use colour (for example because you want to
look at the colour), you can use <code>unbuffer</code> to force the program&rsquo;s output to be
a tty like this:</p>
<pre><code>unbuffer rg blah |  less -R
</code></pre>
<p>I&rsquo;m sure that there are some programs that &ldquo;break&rdquo; this rule but I can&rsquo;t think
of any examples right now. Some programs have a <code>--color</code> flag that you can
use to force colour to be on: in the example above you could also do <code>rg --color=always blah | less -R</code>.</p>
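<p>The check behind this rule is usually tiny. Here&rsquo;s a Python sketch of the usual &ldquo;auto / always / never&rdquo; logic. <code>should_use_colour</code> and its <code>mode</code> parameter are made-up names that mirror the common <code>--color</code> flag values, not any particular program&rsquo;s code.</p>

```python
import io


def should_use_colour(stream, mode="auto"):
    """Decide whether to emit colour escape codes on this stream.

    mode mirrors the common --color=always/never/auto flag values.
    """
    if mode == "always":
        return True
    if mode == "never":
        return False
    # auto: only use colour when the output is going to a terminal
    return stream.isatty()


# io.StringIO stands in for a pipe or file: isatty() is False for it.
print(should_use_colour(io.StringIO()))                 # False
print(should_use_colour(io.StringIO(), mode="always"))  # True
```

<p>In a real program you&rsquo;d pass <code>sys.stdout</code> as the stream.</p>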
<h3 id="rule-7-means-stdin-stdout">rule 7: <code>-</code> means stdin/stdout</h3>
<p>Usually if you pass <code>-</code> to a program instead of a filename, it&rsquo;ll read from
stdin or write to stdout (whichever is appropriate). For example, if you want
to format the Python code that&rsquo;s on your clipboard with <code>black</code> and then copy
it, you could run:</p>
<pre><code>pbpaste | black - | pbcopy
</code></pre>
<p>(<code>pbpaste</code> is a Mac program, you can do something similar on Linux with <code>xclip</code>)</p>
<p>My impression is that most programs implement this if it would make sense and I
can&rsquo;t think of any exceptions right now, but I&rsquo;m sure there are many
exceptions.</p>
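<p>Like the other rules, this is something each program has to implement itself. A minimal Python sketch of how a program might do it (the helper names are made up):</p>

```python
import sys


def open_input(name):
    """Return a readable text stream for a filename, or stdin for '-'."""
    if name == "-":
        return sys.stdin
    return open(name, encoding="utf-8")


def open_output(name):
    """Return a writable text stream for a filename, or stdout for '-'."""
    if name == "-":
        return sys.stdout
    return open(name, "w", encoding="utf-8")
```

<p>One consequence of this convention is that a file that&rsquo;s literally named <code>-</code> has to be written as <code>./-</code>.</p>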
<h3 id="these-rules-take-a-long-time-to-learn">these &ldquo;rules&rdquo; take a long time to learn</h3>
<p>These rules took a long time for me to learn because I had to:</p>
<ol>
<li>learn that the rule applied anywhere at all (&ldquo;<code>Ctrl-C</code> will exit programs&rdquo;)</li>
<li>notice some exceptions (&ldquo;okay, <code>Ctrl-C</code> will exit <code>find</code> but not <code>less</code>&rdquo;)</li>
<li>subconsciously figure out what the pattern is (&ldquo;<code>Ctrl-C</code> will generally quit
noninteractive programs, but in interactive programs it might interrupt the
current operation instead of quitting the program&rdquo;)</li>
<li>eventually maybe formulate it into an explicit rule that I know</li>
</ol>
<p>A lot of my understanding of the terminal is honestly still in the
&ldquo;subconscious pattern recognition&rdquo; stage. The only reason I&rsquo;ve been taking the
time to make things explicit at all is because I&rsquo;ve been trying to explain how
it works to others. Hopefully writing down these &ldquo;rules&rdquo; explicitly will make
learning some of this stuff a little bit faster for others.</p>
]]></content>
    </entry>
    <entry>
        <title type="html"><![CDATA[Why pipes sometimes get "stuck": buffering]]></title>
        <link href="https://jvns.ca/blog/2024/11/29/why-pipes-get-stuck-buffering/"/>
        <updated>2024-11-29T08:23:31+00:00</updated>
        <id>https://jvns.ca/blog/2024/11/29/why-pipes-get-stuck-buffering/</id>
        <content type="html"><![CDATA[<p>Here&rsquo;s a niche terminal problem that has bothered me for years but that I never
really understood until a few weeks ago. Let&rsquo;s say you&rsquo;re running this command
to watch for some specific output in a log file:</p>
<pre><code>tail -f /some/log/file | grep thing1 | grep thing2
</code></pre>
<p>If log lines are being added to the file relatively slowly, the result I&rsquo;d see
is&hellip; nothing! It doesn&rsquo;t matter if there were matches in the log file or not,
there just wouldn&rsquo;t be any output.</p>
<p>I internalized this as &ldquo;uh, I guess pipes just get stuck sometimes and don&rsquo;t
show me the output, that&rsquo;s weird&rdquo;, and I&rsquo;d handle it by just
running <code>grep thing1 /some/log/file | grep thing2</code> instead, which would work.</p>
<p>So as I&rsquo;ve been doing a terminal deep dive over the last few months I was
really excited to finally learn exactly why this happens.</p>
<h3 id="why-this-happens-buffering">why this happens: buffering</h3>
<p>The reason why &ldquo;pipes get stuck&rdquo; sometimes is that it&rsquo;s VERY common for
programs to buffer their output before writing it to a pipe or file. So the
pipe is working fine, the problem is that the program never even wrote the data
to the pipe!</p>
<p>This is for performance reasons: writing all output immediately as soon as you
can uses more system calls, so it&rsquo;s more efficient to save up data until you
have 8KB or so of data to write (or until the program exits) and THEN write it
to the pipe.</p>
<p>In this example:</p>
<pre><code>tail -f /some/log/file | grep thing1 | grep thing2
</code></pre>
<p>the problem is that <code>grep thing1</code> is saving up all of its matches until it has
8KB of data to write, which might literally never happen.</p>
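<p>To get a feel for the &ldquo;fewer system calls&rdquo; argument, here&rsquo;s a Python sketch that counts how many times a buffered writer actually calls <code>write</code> on the thing underneath it. <code>CountingSink</code> is a made-up stand-in for a pipe, and the 8192 byte buffer size is just the &ldquo;8KB or so&rdquo; from above.</p>

```python
import io


class CountingSink(io.RawIOBase):
    """Fake pipe that counts write calls (a stand-in for write system calls)."""

    def __init__(self):
        super().__init__()
        self.write_calls = 0

    def writable(self):
        return True

    def write(self, data):
        self.write_calls += 1
        return len(data)


def count_writes(flush_every_line, lines=1000):
    sink = CountingSink()
    out = io.BufferedWriter(sink, buffer_size=8192)
    for _ in range(lines):
        out.write(b"a log line\n")
        if flush_every_line:
            out.flush()
    out.flush()
    return sink.write_calls


print(count_writes(flush_every_line=True))   # one write call per line
print(count_writes(flush_every_line=False))  # only a handful of write calls
```

<p>The catch is that in the buffered case nothing reaches the sink until the buffer fills up or something flushes it.</p>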
<h3 id="programs-don-t-buffer-when-writing-to-a-terminal">programs don&rsquo;t buffer when writing to a terminal</h3>
<p>Part of why I found this so disorienting is that <code>tail -f file | grep thing</code>
will work totally fine, but then when you add the second <code>grep</code>, it stops
working!! The reason for this is that the way <code>grep</code> handles buffering depends
on whether it&rsquo;s writing to a terminal or not.</p>
<p>Here&rsquo;s how <code>grep</code> (and many other programs) decides to buffer its output:</p>
<ul>
<li>Check if stdout is a terminal or not using the <code>isatty</code> function
<ul>
<li>If it&rsquo;s a terminal, use line buffering (print every line immediately as soon as you have it)</li>
<li>Otherwise, use &ldquo;block buffering&rdquo; &ndash; only print data if you have at least 8KB or so of data to print</li>
</ul>
</li>
</ul>
<p>So if <code>grep</code> is writing directly to your terminal then you&rsquo;ll see the line as
soon as it&rsquo;s printed, but if it&rsquo;s writing to a pipe, you won&rsquo;t.</p>
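<p>Here&rsquo;s a Python sketch of that decision, using Python&rsquo;s own <code>io</code> classes rather than libc. The <code>is_terminal</code> parameter is a stand-in for the result of the <code>isatty</code> check, and the in-memory <code>destination</code> stands in for the terminal or pipe.</p>

```python
import io


def bytes_delivered(is_terminal):
    """Write one line and report how many bytes reached the destination."""
    destination = io.BytesIO()
    out = io.TextIOWrapper(
        io.BufferedWriter(destination, buffer_size=8192),
        encoding="utf-8",
        newline="\n",
        # Line buffering for a terminal, block buffering otherwise.
        line_buffering=is_terminal,
    )
    out.write("a matching line\n")
    return len(destination.getvalue())


print(bytes_delivered(is_terminal=True))   # 16: the line went out right away
print(bytes_delivered(is_terminal=False))  # 0: still sitting in the buffer
```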
<p>Of course the buffer size isn&rsquo;t always 8KB for every program, it depends on the implementation. For <code>grep</code> the buffering is handled by libc, and libc&rsquo;s default buffer size is
defined by the <code>BUFSIZ</code> macro. <a href="https://github.com/bminor/glibc/blob/c69e8cccaff8f2d89cee43202623b33e6ef5d24a/libio/stdio.h#L100">Here&rsquo;s where that&rsquo;s defined in glibc</a>.</p>
<p>(as an aside: &ldquo;programs do not use 8KB output buffers when writing to a
terminal&rdquo; isn&rsquo;t, like, a law of terminal physics, a program COULD use an 8KB
buffer when writing output to a terminal if it wanted, it would just be
extremely weird if it did that, I can&rsquo;t think of any program that behaves that
way)</p>
<h3 id="commands-that-buffer-commands-that-don-t">commands that buffer &amp; commands that don&rsquo;t</h3>
<p>One annoying thing about this buffering behaviour is that you kind of need to
remember which commands buffer their output when writing to a pipe.</p>
<p>Some commands that <strong>don&rsquo;t</strong> buffer their output:</p>
<ul>
<li>tail</li>
<li>cat</li>
<li>tee</li>
</ul>
<p>I think almost everything else will buffer output, especially if it&rsquo;s a command
where you&rsquo;re likely to be using it for batch processing. Here&rsquo;s a list of some
common commands that buffer their output when writing to a pipe, along with the
flag that disables block buffering.</p>
<ul>
<li>grep (<code>--line-buffered</code>)</li>
<li>sed (<code>-u</code>)</li>
<li>awk (there&rsquo;s a <code>fflush()</code> function)</li>
<li>tcpdump (<code>-l</code>)</li>
<li>jq (<code>--unbuffered</code>)</li>
<li>tr (<code>-u</code> on the BSD / Mac OS version, the GNU version doesn&rsquo;t have this flag)</li>
<li>cut (can&rsquo;t disable buffering)</li>
</ul>
<p>Those are all the ones I can think of, lots of unix commands (like <code>sort</code>) may
or may not buffer their output but it doesn&rsquo;t matter because <code>sort</code> can&rsquo;t do
anything until it finishes receiving input anyway.</p>
<p>Also I did my best to test both the Mac OS and GNU versions of these but there
are a lot of variations and I might have made some mistakes.</p>
<h3 id="programming-languages-where-the-default-print-statement-buffers">programming languages where the default &ldquo;print&rdquo; statement buffers</h3>
<p>Also, here are a few programming languages where the default print statement
will buffer output when writing to a pipe, and some ways to disable buffering
if you want:</p>
<ul>
<li>C (disable with <code>setvbuf</code>)</li>
<li>Python (disable with <code>python -u</code>, or <code>PYTHONUNBUFFERED=1</code>, or <code>sys.stdout.reconfigure(line_buffering=True)</code>, or <code>print(x, flush=True)</code>)</li>
<li>Ruby (disable with <code>STDOUT.sync = true</code>)</li>
<li>Perl (disable with <code>$| = 1</code>)</li>
</ul>
<p>I assume that these languages are designed this way so that the default print
function will be fast when you&rsquo;re doing batch processing.</p>
<p>Also whether output is buffered or not might depend on how you print, for
example in C++ <code>cout &lt;&lt; &quot;hello\n&quot;</code> buffers when writing to a pipe but <code>cout &lt;&lt; &quot;hello&quot; &lt;&lt; endl</code> will flush its output.</p>
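<p>Here&rsquo;s a small Python sketch of the &ldquo;it depends how you print&rdquo; point, using <code>print(x, flush=True)</code>. The in-memory <code>destination</code> stands in for a pipe, and <code>delivered_after_print</code> is a made-up name.</p>

```python
import io


def delivered_after_print(flush):
    """Print one line to a block-buffered stream and see what got through."""
    destination = io.BytesIO()
    # A TextIOWrapper without line_buffering behaves like stdout on a pipe:
    # it holds on to small writes until its buffer fills or it is flushed.
    out = io.TextIOWrapper(destination, encoding="utf-8", newline="\n")
    print("hello", file=out, flush=flush)
    return destination.getvalue()


print(delivered_after_print(flush=False))  # b''
print(delivered_after_print(flush=True))   # b'hello\n'
```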
<h3 id="when-you-press-ctrl-c-on-a-pipe-the-contents-of-the-buffer-are-lost">when you press <code>Ctrl-C</code> on a pipe, the contents of the buffer are lost</h3>
<p>Let&rsquo;s say you&rsquo;re running this command as a hacky way to watch for DNS requests
to <code>example.com</code>, and you forgot to pass <code>-l</code> to tcpdump:</p>
<pre><code>sudo tcpdump -ni any port 53 | grep example.com
</code></pre>
<p>When you press <code>Ctrl-C</code>, what happens? In a magical perfect world, what I would
<em>want</em> to happen is for <code>tcpdump</code> to flush its buffer, <code>grep</code> would search for
<code>example.com</code>, and I would see all the output I missed.</p>
<p>But in the real world, what happens is that all the programs get killed and the
output in <code>tcpdump</code>&rsquo;s buffer is lost.</p>
<p>I think this problem is probably unavoidable &ndash; I spent a little time with
<code>strace</code> to see how this works and <code>grep</code> receives the <code>SIGINT</code> before
<code>tcpdump</code> anyway so even if <code>tcpdump</code> tried to flush its buffer <code>grep</code> would
already be dead.</p>
<small>
<p>After a little more investigation, there is a workaround: if you find
<code>tcpdump</code>&rsquo;s PID and <code>kill -TERM $PID</code>, then tcpdump will flush the buffer so
you can see the output. That&rsquo;s kind of a pain but I tested it and it seems to
work.</p>
</small>
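<p>I don&rsquo;t know exactly how <code>tcpdump</code> implements this, but here&rsquo;s a sketch of what &ldquo;flush the buffer when you get <code>SIGTERM</code>&rdquo; could look like in a Python program. All of the names are made up, and the choice of exit status is an assumption.</p>

```python
import signal
import sys


def make_flush_and_exit_handler(stream):
    """Build a signal handler that flushes stream before exiting."""

    def handler(signum, frame):
        stream.flush()
        # 128 + signal number is the exit status shells conventionally
        # report for a process that was ended by that signal.
        raise SystemExit(128 + signum)

    return handler


def install(stream=None):
    """Call this once at startup, from the main thread."""
    target = sys.stdout if stream is None else stream
    signal.signal(signal.SIGTERM, make_flush_and_exit_handler(target))
```

<p>This only helps with the <code>kill -TERM</code> workaround. It doesn&rsquo;t fix the <code>Ctrl-C</code> case, because there the program reading from the pipe can already be gone by the time the flush happens.</p>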
<h3 id="redirecting-to-a-file-also-buffers">redirecting to a file also buffers</h3>
<p>It&rsquo;s not just pipes, this will also buffer:</p>
<pre><code>sudo tcpdump -ni any port 53 &gt; output.txt
</code></pre>
<p>Redirecting to a file doesn&rsquo;t have the same &ldquo;<code>Ctrl-C</code> will totally destroy the
contents of the buffer&rdquo; problem though &ndash; in my experience it usually behaves
more like you&rsquo;d want, where the contents of the buffer get written to the file
before the program exits. I&rsquo;m not 100% sure whether this is something you can
always rely on or not.</p>
<h3 id="a-bunch-of-potential-ways-to-avoid-buffering">a bunch of potential ways to avoid buffering</h3>
<p>Okay, let&rsquo;s talk solutions. Let&rsquo;s say you&rsquo;ve run this command:</p>
<pre><code>tail -f /some/log/file | grep thing1 | grep thing2
</code></pre>
<p>I asked people on Mastodon how they would solve this in practice and there were
5 basic approaches. Here they are:</p>
<h4 id="solution-1-run-a-program-that-finishes-quickly">solution 1: run a program that finishes quickly</h4>
<p>Historically my solution to this has been to just avoid the &ldquo;command writing to
pipe slowly&rdquo; situation completely and instead run a program that will finish quickly
like this:</p>
<pre><code>cat /some/log/file | grep thing1 | grep thing2 | tail
</code></pre>
<p>This doesn&rsquo;t do the same thing as the original command but it does mean that
you get to avoid thinking about these weird buffering issues.</p>
<p>(you could also do <code>grep thing1 /some/log/file</code> but I often prefer to use an
&ldquo;unnecessary&rdquo; <code>cat</code>)</p>
<h4 id="solution-2-remember-the-line-buffer-flag-to-grep">solution 2: remember the &ldquo;line buffer&rdquo; flag to grep</h4>
<p>You could remember that grep has a flag to avoid buffering and pass it like this:</p>
<pre><code>tail -f /some/log/file | grep --line-buffered thing1 | grep thing2
</code></pre>
<h4 id="solution-3-use-awk">solution 3: use awk</h4>
<p>Some people said that if they&rsquo;re specifically dealing with a multiple greps
situation, they&rsquo;ll rewrite it to use a single <code>awk</code> instead, like this:</p>
<pre><code>tail -f /some/log/file |  awk '/thing1/ &amp;&amp; /thing2/'
</code></pre>
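<p>Or you could write a more complicated <code>grep</code>, like this (though this version only matches lines where <code>thing1</code> appears before <code>thing2</code>):</p>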
<pre><code>tail -f /some/log/file |  grep -E 'thing1.*thing2'
</code></pre>
<p>(<code>awk</code> also buffers, so for this to work you&rsquo;ll want <code>awk</code> to be the last command in the pipeline)</p>
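<p>The same &ldquo;one filter instead of two&rdquo; idea works in any language. Here&rsquo;s a Python sketch of a filter that keeps lines containing both strings, in either order. The function names are made up, and the sample list stands in for lines arriving on stdin.</p>

```python
def lines_matching_all(lines, needles):
    """Yield lines containing every needle, in any order."""
    for line in lines:
        if all(needle in line for needle in needles):
            yield line


def write_matches(lines, out):
    for line in lines_matching_all(lines, ["thing1", "thing2"]):
        # flush=True so each match shows up right away, even on a pipe
        print(line, end="", file=out, flush=True)


sample = ["thing1 then thing2\n", "only thing1\n", "thing2 before thing1\n"]
print(list(lines_matching_all(sample, ["thing1", "thing2"])))
```

<p>To use it on real input you&rsquo;d call <code>write_matches(sys.stdin, sys.stdout)</code> from a script at the end of the pipeline.</p>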
<h4 id="solution-4-use-stdbuf">solution 4: use <code>stdbuf</code></h4>
<p><code>stdbuf</code> uses <code>LD_PRELOAD</code> to turn off libc&rsquo;s buffering, and you can use it to turn off output buffering like this:</p>
<pre><code>tail -f /some/log/file | stdbuf -o0 grep thing1 | grep thing2
</code></pre>
<p>Like any <code>LD_PRELOAD</code> solution it&rsquo;s a bit unreliable &ndash; it doesn&rsquo;t work on
static binaries, I think won&rsquo;t work if the program isn&rsquo;t using libc&rsquo;s
buffering, and doesn&rsquo;t always work on Mac OS. Harry Marr has a really nice <a href="https://hmarr.com/blog/how-stdbuf-works/">How stdbuf works</a> post.</p>
<h4 id="solution-5-use-unbuffer">solution 5: use <code>unbuffer</code></h4>
<p><code>unbuffer program</code> will force the program&rsquo;s output to be a TTY, which means
that it&rsquo;ll behave the way it normally would on a TTY (less buffering, colour
output, etc). You could use it in this example like this:</p>
<pre><code>tail -f /some/log/file | unbuffer grep thing1 | grep thing2
</code></pre>
<p>Unlike <code>stdbuf</code> it will always work, though it might have unwanted side
effects, for example <code>grep thing1</code> might also colour its matches.</p>
<p>If you want to install unbuffer, it&rsquo;s in the <code>expect</code> package.</p>
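<p>The reason this trick works is that <code>unbuffer</code> runs the program attached to a pseudo-terminal, so the program&rsquo;s <code>isatty</code> check says yes. Here&rsquo;s a small Python sketch that compares a pipe with a pseudo-terminal. It uses the <code>pty</code> module, so it only runs on Unix-like systems, and <code>tty_check</code> is a made-up name.</p>

```python
import os
import pty


def tty_check():
    """Compare isatty() for a pipe and for a pseudo-terminal."""
    pipe_read, pipe_write = os.pipe()
    pty_master, pty_slave = pty.openpty()
    try:
        return {
            "pipe": os.isatty(pipe_write),
            "pseudo-terminal": os.isatty(pty_slave),
        }
    finally:
        for fd in (pipe_read, pipe_write, pty_master, pty_slave):
            os.close(fd)


print(tty_check())  # on Linux or Mac OS: {'pipe': False, 'pseudo-terminal': True}
```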
<h3 id="that-s-all-the-solutions-i-know-about">that&rsquo;s all the solutions I know about!</h3>
<p>It&rsquo;s a bit hard for me to say which one is &ldquo;best&rdquo;, I think personally I&rsquo;m
most likely to use <code>unbuffer</code> because I know it&rsquo;s always going to work.</p>
<p>If I learn about more solutions I&rsquo;ll try to add them to this post.</p>
<h3 id="i-m-not-really-sure-how-often-this-comes-up">I&rsquo;m not really sure how often this comes up</h3>
<p>I think it&rsquo;s not very common for me to have a program that slowly trickles data
into a pipe like this, normally if I&rsquo;m using a pipe a bunch of data gets
written very quickly, processed by everything in the pipeline, and then
everything exits. The only examples I can come up with right now are:</p>
<ul>
<li>tcpdump</li>
<li><code>tail -f</code></li>
<li>watching log files in a different way like with <code>kubectl logs</code></li>
<li>the output of a slow computation</li>
</ul>
<h3 id="what-if-there-were-an-environment-variable-to-disable-buffering">what if there were an environment variable to disable buffering?</h3>
<p>I think it would be cool if there were a standard environment variable to turn
off buffering, like <code>PYTHONUNBUFFERED</code> in Python. I got this idea from a
<a href="https://blog.plover.com/Unix/stdio-buffering.html">couple</a> of <a href="https://blog.plover.com/Unix/stdio-buffering-2.html">blog posts</a> by Mark Dominus
in 2018. Maybe <code>NO_BUFFER</code> like <a href="https://no-color.org/">NO_COLOR</a>?</p>
<p>The design seems tricky to get right; Mark points out that NetBSD has <a href="https://man.netbsd.org/setbuf.3">environment variables called <code>STDBUF</code>, <code>STDBUF1</code>, etc</a> which give you a
ton of control over buffering but I imagine most developers don&rsquo;t want to
implement many different environment variables to handle a relatively minor
edge case.</p>
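<p>Just to show how small the simplest version could be, here&rsquo;s a Python sketch of a program honouring a <code>NO_BUFFER</code> variable. This variable is purely hypothetical (no such standard exists), and the &ldquo;set and not empty&rdquo; rule is an assumption borrowed from how <code>NO_COLOR</code> and <code>PYTHONUNBUFFERED</code> are defined.</p>

```python
import os
import sys


def unbuffered_requested(environ):
    """True if the (hypothetical) NO_BUFFER variable is set and non-empty."""
    return environ.get("NO_BUFFER", "") != ""


def configure_stdout(environ=None):
    """Switch stdout to line buffering if NO_BUFFER asks for it."""
    environ = os.environ if environ is None else environ
    if unbuffered_requested(environ):
        # reconfigure() is available on the default sys.stdout in Python 3.7+
        sys.stdout.reconfigure(line_buffering=True)


print(unbuffered_requested({"NO_BUFFER": "1"}))  # True
print(unbuffered_requested({}))                  # False
```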
<p>I&rsquo;m also curious about whether there are any programs that just automatically
flush their output buffers after some period of time (like 1 second). It feels
like it would be nice in theory but I can&rsquo;t think of any program that does that
so I imagine there are some downsides.</p>
<h3 id="stuff-i-left-out">stuff I left out</h3>
<p>Some things I didn&rsquo;t talk about in this post since these posts have been
getting pretty long recently and seriously does anyone REALLY want to read 3000
words about buffering?</p>
<ul>
<li>the difference between line buffering and having totally unbuffered output</li>
<li>how buffering to stderr is different from buffering to stdout</li>
<li>this post is only about buffering that happens <strong>inside the program</strong>, your
operating system&rsquo;s TTY driver also does a little bit of buffering sometimes</li>
<li>other reasons you might need to flush your output other than &ldquo;you&rsquo;re writing
to a pipe&rdquo;</li>
</ul>
]]></content>
    </entry>
    <entry>
        <title type="html"><![CDATA[Importing a frontend Javascript library without a build system]]></title>
        <link href="https://jvns.ca/blog/2024/11/18/how-to-import-a-javascript-library/"/>
        <updated>2024-11-18T09:35:42+00:00</updated>
        <id>https://jvns.ca/blog/2024/11/18/how-to-import-a-javascript-library/</id>
        <content type="html"><![CDATA[<p>I like writing Javascript <a href="https://jvns.ca/blog/2023/02/16/writing-javascript-without-a-build-system/">without a build system</a>
and for the millionth time yesterday I ran into a problem where I needed to
figure out how to import a Javascript library in my code without using a build
system, and it took FOREVER to figure out how to import it because the
library&rsquo;s setup instructions assume that you&rsquo;re using a build system.</p>
<p>Luckily at this point I&rsquo;ve mostly learned how to navigate this situation and
either successfully use the library or decide it&rsquo;s too difficult and switch to
a different library, so here&rsquo;s the guide I wish I had to importing Javascript
libraries years ago.</p>
<p>I&rsquo;m only going to talk about using Javascript libraries on the frontend, and
only about how to use them in a no-build-system setup.</p>
<p>In this post I&rsquo;m going to talk about:</p>
<ol>
<li>the three main types of Javascript files a library might provide (ES Modules, the &ldquo;classic&rdquo; global variable kind, and CommonJS)</li>
<li>how to figure out which types of files a Javascript library includes in its build</li>
<li>ways to import each type of file in your code</li>
</ol>
<h3 id="the-three-kinds-of-javascript-files">the three kinds of Javascript files</h3>
<p>There are 3 basic types of Javascript files a library can provide:</p>
<ol>
<li>the &ldquo;classic&rdquo; type of file that defines a global variable. This is the kind
of file that you can just <code>&lt;script src&gt;</code> and it&rsquo;ll Just Work. Great if you
can get it but not always available</li>
<li>an ES module (which may or may not depend on other files, we&rsquo;ll get to that)</li>
<li>a &ldquo;CommonJS&rdquo; module. This is for Node, you can&rsquo;t use it in a browser at all
without using a build system.</li>
</ol>
<p>I&rsquo;m not sure if there&rsquo;s a better name for the &ldquo;classic&rdquo; type but I&rsquo;m just going
to call it &ldquo;classic&rdquo;. Also there&rsquo;s a type called &ldquo;AMD&rdquo; but I&rsquo;m not sure how
relevant it is in 2024.</p>
<p>Now that we know the 3 types of files, let&rsquo;s talk about how to figure out which
of these the library actually provides!</p>
<h3 id="where-to-find-the-files-the-npm-build">where to find the files: the NPM build</h3>
<p>Every Javascript library has a <strong>build</strong> which it uploads to NPM. You might be
thinking (like I did originally) &ndash; Julia! The whole POINT is that we&rsquo;re not
using Node to build our library! Why are we talking about NPM?</p>
<p>But if you&rsquo;re using a link from a CDN like <a href="https://cdnjs.cloudflare.com/ajax/libs/Chart.js/4.4.1/chart.umd.min.js">https://cdnjs.cloudflare.com/ajax/libs/Chart.js/4.4.1/chart.umd.min.js</a>,
you&rsquo;re still using the NPM build! All the files on the CDNs originally come
from NPM.</p>
<p>Because of this, I sometimes like to <code>npm install</code> the library even if I&rsquo;m not
planning to use Node to build my library at all &ndash; I&rsquo;ll just create a new temp
folder, <code>npm install</code> there, and then delete it when I&rsquo;m done. I like being able to poke
around in the files in the NPM build on my filesystem, because then I can be
100% sure that I&rsquo;m seeing everything that the library is making available in
its build and that the CDN isn&rsquo;t hiding something from me.</p>
<p>So let&rsquo;s <code>npm install</code> a few libraries and try to figure out what types of
Javascript files they provide in their builds!</p>
<h3 id="example-library-1-chart-js">example library 1: chart.js</h3>
<p>First let&rsquo;s look inside <a href="https://www.chartjs.org">Chart.js</a>, a plotting library.</p>
<pre><code>$ cd /tmp/whatever
$ npm install chart.js
$ cd node_modules/chart.js/dist
$ ls *.*js
chart.cjs  chart.js  chart.umd.js  helpers.cjs  helpers.js
</code></pre>
<p>This library seems to have 3 basic options:</p>
<p><strong>option 1:</strong> <code>chart.cjs</code>. The <code>.cjs</code> suffix tells me that this is a <strong>CommonJS
file</strong>, for using in Node. This means it&rsquo;s impossible to use it directly in the
browser without some kind of build step.</p>
<p><strong>option 2: <code>chart.js</code></strong>. The <code>.js</code> suffix by itself doesn&rsquo;t tell us what kind of
file it is, but if I open it up, I see <code>import '@kurkle/color';</code> which is an
immediate sign that this is an ES module &ndash; the <code>import ...</code> syntax is ES
module syntax.</p>
<p><strong>option 3: <code>chart.umd.js</code></strong>. &ldquo;UMD&rdquo; stands for &ldquo;Universal Module Definition&rdquo;,
which I think means that you can use this file either with a basic <code>&lt;script src&gt;</code>, CommonJS,
or some third thing called AMD that I don&rsquo;t understand.</p>
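<p>Here&rsquo;s a minimal sketch of the idea behind a UMD wrapper. This is not Chart.js&rsquo;s actual code: <code>defineUmd</code>, <code>env</code>, <code>makeLibrary</code> and <code>MyLibrary</code> are names made up for illustration, and a real UMD file checks the ambient <code>define</code>, <code>module</code> and global object instead of taking an <code>env</code> argument.</p>

```javascript
// Simplified illustration of how a UMD file decides how to expose itself.
// `env` stands in for the ambient environment (hypothetical, so the three
// cases can be tried out side by side).
function defineUmd(env, name, factory) {
  if (typeof env.define === "function" && env.define.amd) {
    // An AMD loader (like RequireJS) is present: register with it.
    env.define([], factory);
    return "amd";
  }
  if (env.module && typeof env.module.exports === "object") {
    // CommonJS (Node): assign to module.exports.
    env.module.exports = factory();
    return "commonjs";
  }
  // Plain script tag: attach a global variable, like `Chart`.
  env.root[name] = factory();
  return "global";
}

// A made-up library to export.
const makeLibrary = () => ({ greet: (who) => `hello, ${who}` });

// Simulate a browser page with no module system at all.
const browser = { root: {} };
console.log(defineUmd(browser, "MyLibrary", makeLibrary)); // "global"
console.log(browser.root.MyLibrary.greet("world"));
```

<p>The last branch is the one that matters for the &ldquo;no build system&rdquo; case: with no AMD loader and no <code>module</code> object around, the library just ends up as a global variable.</p>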
<h3 id="how-to-use-a-umd-file">how to use a UMD file</h3>
<p>When I was using Chart.js I picked Option 3. I just needed to add this to my
code:</p>
<pre><code>&lt;script src=&quot;./chart.umd.js&quot;&gt; &lt;/script&gt;
</code></pre>
<p>and then I could use the library with the global <code>Chart</code> variable.
Couldn&rsquo;t be easier. I just copied <code>chart.umd.js</code> into my Git repository so that
I didn&rsquo;t have to worry about using NPM or the CDNs going down or anything.</p>
<h3 id="the-build-files-aren-t-always-in-the-dist-directory">the build files aren&rsquo;t always in the <code>dist</code> directory</h3>
<p>A lot of libraries will put their build in the <code>dist</code> directory, but not
always! The build files&rsquo; location is specified in the library&rsquo;s <code>package.json</code>.</p>
<p>For example here&rsquo;s an excerpt from Chart.js&rsquo;s <code>package.json</code>.</p>
<pre><code>  &quot;jsdelivr&quot;: &quot;./dist/chart.umd.js&quot;,
  &quot;unpkg&quot;: &quot;./dist/chart.umd.js&quot;,
  &quot;main&quot;: &quot;./dist/chart.cjs&quot;,
  &quot;module&quot;: &quot;./dist/chart.js&quot;,
</code></pre>
<p>I think this is saying that if you want to use an ES Module (<code>module</code>) you
should use <code>dist/chart.js</code>, but the jsDelivr and unpkg CDNs should use
<code>./dist/chart.umd.js</code>. I guess <code>main</code> is for Node.</p>
<p><code>chart.js</code>&rsquo;s <code>package.json</code> also says <code>&quot;type&quot;: &quot;module&quot;</code>, which <a href="https://nodejs.org/api/packages.html#modules-packages">according to this documentation</a>
tells Node to treat files as ES modules by default. I think it doesn&rsquo;t tell us
specifically which files are ES modules and which ones aren&rsquo;t but it does tell
us that <em>something</em> in there is an ES module.</p>
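<p>As far as I can tell from the Node documentation linked above, the rule is based on the file extension: <code>.mjs</code> is always an ES module, <code>.cjs</code> is always CommonJS, and a plain <code>.js</code> file follows the <code>type</code> field (defaulting to CommonJS). Here&rsquo;s a small sketch of that rule. <code>nodeModuleFormat</code> is a made-up helper, not a Node API, and it only describes how Node would treat a file, not what the file actually contains.</p>

```javascript
// Sketch of Node's documented rule for deciding a file's module format.
// `pkg` is the parsed package.json nearest to the file.
function nodeModuleFormat(filename, pkg = {}) {
  if (filename.endsWith(".mjs")) return "module";
  if (filename.endsWith(".cjs")) return "commonjs";
  if (filename.endsWith(".js")) {
    // Plain .js files follow the "type" field; the default is CommonJS.
    return pkg.type === "module" ? "module" : "commonjs";
  }
  return "unknown";
}

// The fields from the Chart.js package.json excerpt above.
const chartPkg = {
  type: "module",
  main: "./dist/chart.cjs",
  module: "./dist/chart.js",
};
console.log(nodeModuleFormat(chartPkg.main, chartPkg));   // "commonjs"
console.log(nodeModuleFormat(chartPkg.module, chartPkg)); // "module"
```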
<h3 id="example-library-2-atcute-oauth-browser-client">example library 2: <code>@atcute/oauth-browser-client</code></h3>
<p><a href="https://github.com/mary-ext/atcute/tree/trunk/packages/oauth/browser-client"><code>@atcute/oauth-browser-client</code></a>
is a library for logging into Bluesky with OAuth in the browser.</p>
<p>Let&rsquo;s see what kinds of Javascript files it provides in its build!</p>
<pre><code>$ npm install @atcute/oauth-browser-client
$ cd node_modules/@atcute/oauth-browser-client/dist
$ ls *js
constants.js  dpop.js  environment.js  errors.js  index.js  resolvers.js
</code></pre>
<p>It seems like the only plausible root file in here is <code>index.js</code>, which looks
something like this:</p>
<pre><code>export { configureOAuth } from './environment.js';
export * from './errors.js';
export * from './resolvers.js';
</code></pre>
<p>This <code>export</code> syntax means it&rsquo;s an <strong>ES module</strong>. That means we can use it in
the browser without a build step! Let&rsquo;s see how to do that.</p>
<h3 id="how-to-use-an-es-module-with-importmaps">how to use an ES module with importmaps</h3>
<p>Using an ES module isn&rsquo;t as easy as just adding a <code>&lt;script src=&quot;whatever.js&quot;&gt;</code>. Instead, if
the ES module has dependencies (like <code>@atcute/oauth-browser-client</code> does) the
steps are:</p>
<ol>
<li>Set up an import map in your HTML</li>
<li>Put import statements like <code>import { configureOAuth } from '@atcute/oauth-browser-client';</code> in your JS code</li>
<li>Include your JS code in your HTML like this: <code>&lt;script type=&quot;module&quot; src=&quot;YOURSCRIPT.js&quot;&gt;&lt;/script&gt;</code></li>
</ol>
<p>The reason we need an import map instead of just doing something like <code>import { configureOAuth } from &quot;./node_modules/@atcute/oauth-browser-client/dist/index.js&quot;</code> is that internally the module has more import statements like <code>import {something} from &quot;@atcute/client&quot;</code>, and we need to tell the browser where to get the code for <code>@atcute/client</code> and all of its other dependencies.</p>
<p>Here&rsquo;s what the importmap I used looks like for <code>@atcute/oauth-browser-client</code>:</p>
<pre><code>&lt;script type=&quot;importmap&quot;&gt;
{
  &quot;imports&quot;: {
    &quot;nanoid&quot;: &quot;./node_modules/nanoid/bin/dist/index.js&quot;,
    &quot;nanoid/non-secure&quot;: &quot;./node_modules/nanoid/non-secure/index.js&quot;,
    &quot;nanoid/url-alphabet&quot;: &quot;./node_modules/nanoid/url-alphabet/dist/index.js&quot;,
    &quot;@atcute/oauth-browser-client&quot;: &quot;./node_modules/@atcute/oauth-browser-client/dist/index.js&quot;,
    &quot;@atcute/client&quot;: &quot;./node_modules/@atcute/client/dist/index.js&quot;,
    &quot;@atcute/client/utils/did&quot;: &quot;./node_modules/@atcute/client/dist/utils/did.js&quot;
  }
}
&lt;/script&gt;
</code></pre>
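<p>To make it a bit more concrete what the browser does with an import map, here&rsquo;s a simplified sketch of the lookup. <code>resolveSpecifier</code> is a hypothetical function written for illustration. It ignores parts of the real algorithm (like <code>scopes</code> and remapping of URL-like specifiers), so treat it as a mental model and not as the spec.</p>

```javascript
// Simplified model of how a module specifier is resolved with an import map.
function resolveSpecifier(importMap, specifier) {
  const imports = importMap.imports || {};
  // Relative paths and full URLs are used as-is in this simplified model.
  if (/^(\.{0,2}\/|https?:\/\/)/.test(specifier)) return specifier;
  // Exact match on a bare specifier like "@atcute/client".
  if (Object.prototype.hasOwnProperty.call(imports, specifier)) {
    return imports[specifier];
  }
  // Keys ending in "/" act as prefixes; the longest matching prefix wins.
  const prefixes = Object.keys(imports)
    .filter((key) => key.endsWith("/") && specifier.startsWith(key))
    .sort((a, b) => b.length - a.length);
  if (prefixes.length > 0) {
    const prefix = prefixes[0];
    return imports[prefix] + specifier.slice(prefix.length);
  }
  // Browsers refuse to load bare specifiers that aren't in the map.
  throw new TypeError(`bare specifier "${specifier}" is not in the import map`);
}

// Two entries copied from the import map above.
const importMap = {
  imports: {
    "@atcute/client": "./node_modules/@atcute/client/dist/index.js",
    "@atcute/client/utils/did": "./node_modules/@atcute/client/dist/utils/did.js",
  },
};
console.log(resolveSpecifier(importMap, "@atcute/client"));
console.log(resolveSpecifier(importMap, "@atcute/client/utils/did"));
```

<p>In this model every bare specifier that any dependency imports needs its own entry (or a matching prefix entry), which is why the map has to list <code>nanoid</code> and friends too.</p>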
<p>Getting these import maps to work is pretty fiddly, I feel like there must be a
tool to generate them automatically but I haven&rsquo;t found one yet. It&rsquo;s definitely possible to
write a script that automatically generates the importmaps using <a href="https://esbuild.github.io/api/#metafile">esbuild&rsquo;s metafile</a> but I haven&rsquo;t done that and
maybe there&rsquo;s a better way.</p>
<p>I decided to set up importmaps yesterday to get
<a href="https://github.com/jvns/bsky-oauth-example">github.com/jvns/bsky-oauth-example</a>
to work, so there&rsquo;s some example code in that repo.</p>
<p>Also someone pointed me to Simon Willison&rsquo;s
<a href="https://simonwillison.net/2023/May/2/download-esm/">download-esm</a>, which will
download an ES module and rewrite the imports to point to the JS files directly
so that you don&rsquo;t need importmaps. I haven&rsquo;t tried it yet but it seems like a
great idea.</p>
<h3 id="problems-with-importmaps-too-many-files">problems with importmaps: too many files</h3>
<p>I did run into some problems with using importmaps in the browser though &ndash; it
needed to download dozens of Javascript files to load my site, and my webserver
in development couldn&rsquo;t keep up for some reason. I kept seeing files fail to
load randomly and then had to reload the page and hope that they would succeed
this time.</p>
<p>It wasn&rsquo;t an issue anymore when I deployed my site to production, so I guess it
was a problem with my local dev environment.</p>
<p>Also one slightly annoying thing about ES modules in general is that you need to
be running a webserver to use them, I&rsquo;m sure this is for a good reason but it&rsquo;s
easier when you can just open your <code>index.html</code> file without starting a
webserver.</p>
<p>Because of the &ldquo;too many files&rdquo; thing, using ES modules with
importmaps in this way isn&rsquo;t actually that appealing to me, but it&rsquo;s good to
know it&rsquo;s possible.</p>
<h3 id="how-to-use-an-es-module-without-importmaps">how to use an ES module without importmaps</h3>
<p>If the ES module doesn&rsquo;t have dependencies then it&rsquo;s even easier &ndash; you don&rsquo;t
need the importmaps! You can just:</p>
<ul>
<li>put <code>&lt;script type=&quot;module&quot; src=&quot;YOURCODE.js&quot;&gt;&lt;/script&gt;</code> in your HTML. The <code>type=&quot;module&quot;</code> is important.</li>
<li>put <code>import {whatever} from &quot;https://example.com/whatever.js&quot;</code> in <code>YOURCODE.js</code></li>
</ul>
<h3 id="alternative-use-esbuild">alternative: use esbuild</h3>
<p>If you don&rsquo;t want to use importmaps, you can also use a build system like <a href="https://esbuild.github.io/">esbuild</a>. I talked about how to do
that in <a href="https://jvns.ca/blog/2021/11/15/esbuild-vue/">Some notes on using esbuild</a>, but this blog post is
about ways to avoid build systems completely so I&rsquo;m not going to talk about
that option here. I do still like esbuild though and I think it&rsquo;s a good option
in this case.</p>
<h3 id="what-s-the-browser-support-for-importmaps">what&rsquo;s the browser support for importmaps?</h3>
<p><a href="https://caniuse.com/import-maps">CanIUse</a> says that importmaps are in
&ldquo;Baseline 2023: newly available across major browsers&rdquo; so my sense is that in
2024 that&rsquo;s still maybe a little bit too new? I think I would use importmaps
for some fun experimental code that I only wanted like myself and 12 people to
use, but if I wanted my code to be more widely usable I&rsquo;d use <code>esbuild</code> instead.</p>
<h3 id="example-library-3-atproto-oauth-client-browser">example library 3: <code>@atproto/oauth-client-browser</code></h3>
<p>Let&rsquo;s look at one final example library! This is a different Bluesky auth
library than <code>@atcute/oauth-browser-client</code>.</p>
<pre><code>$ npm install @atproto/oauth-client-browser
$ cd node_modules/@atproto/oauth-client-browser/dist
$ ls *js
browser-oauth-client.js  browser-oauth-database.js  browser-runtime-implementation.js  errors.js  index.js  indexed-db-store.js  util.js
</code></pre>
<p>Again, it seems like the only real candidate file here is <code>index.js</code>. But this is a
different situation from the previous example library! There&rsquo;s a bunch of stuff like this in
<code>index.js</code>:</p>
<pre><code>__exportStar(require(&quot;@atproto/oauth-client&quot;), exports);
__exportStar(require(&quot;./browser-oauth-client.js&quot;), exports);
__exportStar(require(&quot;./errors.js&quot;), exports);
var util_js_1 = require(&quot;./util.js&quot;);
</code></pre>
<p>This <code>require()</code> syntax is CommonJS syntax, which means that we can&rsquo;t use this
file in the browser directly: we need some kind of build step to convert it
into something the browser understands first.</p>
<p>Also in this library&rsquo;s <code>package.json</code> it says <code>&quot;type&quot;: &quot;commonjs&quot;</code> which is
another way to tell it&rsquo;s CommonJS.</p>
<h3 id="how-to-use-a-commonjs-module-with-esm-sh-https-esm-sh">how to use a CommonJS module with <a href="https://esm.sh">esm.sh</a></h3>
<p>Originally I thought it was impossible to use CommonJS modules without learning
a build system, but then someone on Bluesky told me about
<a href="https://esm.sh">esm.sh</a>! It&rsquo;s a CDN that will translate anything into an ES
Module. <a href="https://www.skypack.dev/">skypack.dev</a> does something similar, I&rsquo;m not
sure what the difference is but one person mentioned that if one doesn&rsquo;t work
sometimes they&rsquo;ll try the other one.</p>
<p>For <code>@atproto/oauth-client-browser</code> using it seems pretty simple, I just need to put this in my HTML:</p>
<pre><code>&lt;script type=&quot;module&quot; src=&quot;script.js&quot;&gt; &lt;/script&gt;
</code></pre>
<p>and then put this in <code>script.js</code>.</p>
<pre><code>import { BrowserOAuthClient } from &quot;https://esm.sh/@atproto/oauth-client-browser&quot;
</code></pre>
<p>It seems to Just Work, which is cool! Of course this is still sort of using a
build system &ndash; it&rsquo;s just that esm.sh is running the build instead of me. My
main concerns with this approach are:</p>
<ul>
<li>I don&rsquo;t really trust CDNs to keep working forever &ndash; usually I like to copy dependencies into my repository so that they don&rsquo;t go away for some reason in the future.</li>
<li>I&rsquo;ve heard of some issues with CDNs having security compromises which scares me.</li>
<li>I don&rsquo;t really understand what esm.sh is doing.</li>
</ul>
<h3 id="esbuild-can-also-convert-commonjs-modules-into-es-modules">esbuild can also convert CommonJS modules into ES modules</h3>
<p>I also learned that you can use <code>esbuild</code> to convert a CommonJS module
into an ES module, though there are some limitations &ndash; the <code>import { BrowserOAuthClient } from</code> syntax doesn&rsquo;t work. Here&rsquo;s a <a href="https://github.com/evanw/esbuild/issues/442">github issue about that</a>.</p>
<p>I think the <code>esbuild</code> approach is probably more appealing to me than the
<code>esm.sh</code> approach because it&rsquo;s a tool that I already have on my computer so I
trust it more. I haven&rsquo;t experimented with this much yet though.</p>
<h3 id="summary-of-the-three-types-of-files">summary of the three types of files</h3>
<p>Here&rsquo;s a summary of the three types of JS files you might encounter, options
for how to use them, and how to identify them.</p>
<p>Unhelpfully a <code>.js</code> or <code>.min.js</code> file extension could be any of these 3
options, so if the file is <code>something.js</code> you need to do more detective work to
figure out what you&rsquo;re dealing with.</p>
<ol>
<li><strong>&ldquo;classic&rdquo; JS files</strong>
<ul>
<li><strong>How to use it:</strong> <code>&lt;script src=&quot;whatever.js&quot;&gt;&lt;/script&gt;</code></li>
<li><strong>Ways to identify it:</strong>
<ul>
<li>The website has a big friendly banner in its setup instructions saying &ldquo;Use this with a CDN!&rdquo;  or something</li>
<li>A <code>.umd.js</code> extension</li>
<li>Just try to put it in a <code>&lt;script src=...</code> tag and see if it works</li>
</ul>
</li>
</ul>
</li>
<li><strong>ES Modules</strong>
<ul>
<li><strong>Ways to use it:</strong>
<ul>
<li>If there are no dependencies, just <code>import {whatever} from &quot;./my-module.js&quot;</code> directly in your code</li>
<li>If there are dependencies, create an importmap and <code>import {whatever} from &quot;my-module&quot;</code>
<ul>
<li>or use <a href="https://simonwillison.net/2023/May/2/download-esm/">download-esm</a> to remove the need for an importmap</li>
</ul>
</li>
<li>Use <a href="https://esbuild.github.io/">esbuild</a> or any ES Module bundler</li>
</ul>
</li>
<li><strong>Ways to identify it:</strong>
<ul>
<li>Look for an <code>import </code> or <code>export </code> statement. (not <code>module.exports = ...</code>, that&rsquo;s CommonJS)</li>
<li>An <code>.mjs</code> extension</li>
<li>maybe <code>&quot;type&quot;: &quot;module&quot;</code> in <code>package.json</code> (though it&rsquo;s not clear to me which file exactly this refers to)</li>
</ul>
</li>
</ul>
</li>
<li><strong>CommonJS Modules</strong>
<ul>
<li><strong>Ways to use it:</strong>
<ul>
<li>Use <a href="https://esm.sh/#docs">https://esm.sh</a> to convert it into an ES module, like <code>https://esm.sh/@atproto/oauth-client-browser</code></li>
<li>Use a build somehow (??)</li>
</ul>
</li>
<li><strong>Ways to identify it:</strong>
<ul>
<li>Look for <code>require()</code> or <code>module.exports = ...</code> in the code</li>
<li>A <code>.cjs</code> extension</li>
<li>maybe <code>&quot;type&quot;: &quot;commonjs&quot;</code> in <code>package.json</code> (though it&rsquo;s not clear to me which file exactly this refers to)</li>
</ul>
</li>
</ul>
</li>
</ol>
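<p>The &ldquo;ways to identify it&rdquo; rules in this summary can be turned into a rough heuristic. This is a sketch and not a reliable detector: <code>guessModuleType</code> is a made-up function, it uses regular expressions instead of a real parser, and it can be fooled by comments, strings, or minified code where <code>export</code> isn&rsquo;t at the start of a line.</p>

```javascript
// Rough heuristic for guessing what kind of JS file some source text is.
function guessModuleType(source) {
  // `import ...` or `export ...` at the start of a line: ES module.
  if (/^\s*(?:import|export)[\s{*'"]/m.test(source)) return "es-module";
  // UMD wrappers typically check for an AMD loader via `define.amd`.
  if (/\bdefine\.amd\b/.test(source)) return "umd";
  // require(...) calls or module.exports / exports.x assignments: CommonJS.
  if (/\brequire\s*\(|\bmodule\.exports\b|\bexports\.\w+\s*=/.test(source)) {
    return "commonjs";
  }
  // Could be a classic script, or something this heuristic can't see.
  return "unknown";
}

// Snippets from the three example libraries in this post.
console.log(guessModuleType("import '@kurkle/color';"));               // "es-module"
console.log(guessModuleType("export * from './errors.js';"));          // "es-module"
console.log(guessModuleType('var util_js_1 = require("./util.js");')); // "commonjs"
```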
<h3 id="it-s-really-nice-to-have-es-modules-standardized">it&rsquo;s really nice to have ES modules standardized</h3>
<p>The main difference between CommonJS modules and ES modules from my perspective
is that ES modules are actually a standard. This makes me feel a lot more
confident using them, because browsers commit to backwards compatibility for
web standards forever &ndash; if I write some code using ES modules today, I can
feel sure that it&rsquo;ll still work the same way in 15 years.</p>
<p>It also makes me feel better about using tooling like <code>esbuild</code> because even if
the esbuild project dies, because it&rsquo;s implementing a standard it feels likely
that there will be another similar tool in the future that I can replace it
with.</p>
<h3 id="the-js-community-has-built-a-lot-of-very-cool-tools">the JS community has built a lot of very cool tools</h3>
<p>A lot of the time when I talk about this stuff I get responses like &ldquo;I hate
javascript!!! it&rsquo;s the worst!!!&rdquo;. But my experience is that there are a lot of great tools for Javascript
(I just learned about <a href="https://esm.sh">https://esm.sh</a> yesterday which seems great! I love
esbuild!), and that if I take the time to learn how things work I can take
advantage of some of those tools and make my life a lot easier.</p>
<p>So the goal of this post is definitely not to complain about Javascript, it&rsquo;s
to understand the landscape so I can use the tooling in a way that feels good
to me.</p>
<h3 id="questions-i-still-have">questions I still have</h3>
<p>Here are some questions I still have, I&rsquo;ll add the answers into the post if I
learn the answer.</p>
<ul>
<li>Is there a tool that automatically generates importmaps for an ES Module that
I have set up locally? (apparently yes: <a href="https://jspm.org/getting-started">jspm</a>)</li>
<li>How can I convert a CommonJS module into an ES module on my computer, the way
<a href="https://esm.sh">https://esm.sh</a> does? (apparently esbuild can sort of do this, though <a href="https://github.com/evanw/esbuild/issues/442">named exports don&rsquo;t work</a>)</li>
<li>When people normally build CommonJS modules into regular JS code, what code is
doing that? Obviously there are tools like webpack, rollup, esbuild, etc, but
do those tools all implement their own JS parsers/static analysis? How many
JS parsers are there out there?</li>
<li>Is there any way to bundle an ES module into a single file (like
<code>atcute-client.js</code>), but so that in the browser I can still import multiple
different paths from that file (like both <code>@atcute/client/lexicons</code> and
<code>@atcute/client</code>)?</li>
</ul>
<h3 id="all-the-tools">all the tools</h3>
<p>Here&rsquo;s a list of every tool we talked about in this post:</p>
<ul>
<li>Simon Willison&rsquo;s
<a href="https://simonwillison.net/2023/May/2/download-esm/">download-esm</a> which will
download an ES module and convert the imports to point at JS files so you
don&rsquo;t need an importmap</li>
<li><a href="https://esm.sh">https://esm.sh/</a> and <a href="https://www.skypack.dev/">skypack.dev</a></li>
<li><a href="https://esbuild.github.io/">esbuild</a></li>
<li><a href="https://jspm.org/getting-started">JSPM</a> can generate importmaps</li>
</ul>
<p>Writing this post has made me think that even though I usually don&rsquo;t want to
have a build that I run every time I update the project, I might be willing to
have a build step (using <code>download-esm</code> or something) that I run <strong>only once</strong>
when setting up the project and never run again except maybe if I&rsquo;m updating my
dependency versions.</p>
<h3 id="that-s-all">that&rsquo;s all!</h3>
<p>Thanks to <a href="https://polotek.net/">Marco Rogers</a> who taught me a lot of the things
in this post. I&rsquo;ve probably made some mistakes in this post and I&rsquo;d love to
know what they are &ndash; let me know on Bluesky or Mastodon!</p>
]]></content>
    </entry>
    <entry>
        <title type="html"><![CDATA[New microblog with TILs]]></title>
        <link href="https://jvns.ca/blog/2024/11/09/new-microblog/"/>
        <updated>2024-11-09T09:24:29+00:00</updated>
        <id>https://jvns.ca/blog/2024/11/09/new-microblog/</id>
        <content type="html"><![CDATA[<p>I added a new section to this site a couple weeks ago called
<a href="https://jvns.ca/til/">TIL</a> (&ldquo;today I learned&rdquo;).</p>
<h3 id="the-goal-save-interesting-tools-facts-i-posted-on-social-media">the goal: save interesting tools &amp; facts I posted on social media</h3>
<p>One kind of thing I like to post on Mastodon/Bluesky is &ldquo;hey, here&rsquo;s a cool
thing&rdquo;, like <a href="https://github.com/dbcli/litecli">the great SQLite repl litecli</a>, or
the fact that cross compiling in Go Just Works and it&rsquo;s amazing, or
<a href="https://www.latacora.com/blog/2018/04/03/cryptographic-right-answers/">cryptographic right answers</a>,
or <a href="https://diffdiff.net/">this great diff tool</a>. Usually I don&rsquo;t want to write
a whole blog post about those things because I really don&rsquo;t have much more to
say than &ldquo;hey this is useful!&rdquo;</p>
<p>It started to bother me that I didn&rsquo;t have anywhere to put those things: for
example recently I wanted to use <a href="https://diffdiff.net/">diffdiff</a> and I just
could not remember what it was called.</p>
<h3 id="the-solution-make-a-new-section-of-this-blog">the solution: make a new section of this blog</h3>
<p>So I quickly made a new folder called <a href="https://jvns.ca/til/">/til/</a>, added some
custom styling (I wanted to style the posts to look a little bit like a tweet),
made a little Rake task to help me create new posts quickly (<code>rake new_til</code>), and
set up a separate RSS Feed for it.</p>
<p>I think this new section of the blog might be more for myself than anything,
now when I forget the link to Cryptographic Right Answers I can hopefully look
it up on the TIL page. (you might think &ldquo;julia, why not use bookmarks??&rdquo; but I
have been failing to use bookmarks for my whole life and I don&rsquo;t see that
changing ever, putting things in public is for whatever reason much easier for
me)</p>
<p>So far it&rsquo;s been working, often I can actually just make a quick post in 2
minutes which was the goal.</p>
<h3 id="inspired-by-simon-willison-s-til-blog">inspired by Simon Willison&rsquo;s TIL blog</h3>
<p>My page is inspired by <a href="https://til.simonwillison.net/">Simon Willison&rsquo;s great TIL blog</a>, though my TIL posts are a lot shorter.</p>
<h3 id="i-don-t-necessarily-want-everything-to-be-archived">I don&rsquo;t necessarily want everything to be archived</h3>
<p>This came about because I spent a lot of time on Twitter, so I&rsquo;ve been thinking
about what I want to do about all of my tweets.</p>
<p>I keep reading the advice to &ldquo;POSSE&rdquo; (&ldquo;post on your own site, syndicate
elsewhere&rdquo;), and while I find the idea appealing in principle, for me part of
the appeal of social media is that it&rsquo;s a little bit ephemeral. I can
post polls or questions or observations or jokes and then they can just kind of
fade away as they become less relevant.</p>
<p>I find it a lot easier to identify specific categories of things that I actually
want to have on a Real Website That I Own:</p>
<ul>
<li>blog posts here!</li>
<li>comics at <a href="https://wizardzines.com/comics/">https://wizardzines.com/comics/</a>!</li>
<li>now TILs at <a href="https://jvns.ca/til/">https://jvns.ca/til/</a>!</li>
</ul>
<p>and then let everything else be kind of ephemeral.</p>
<p>I really believe in the advice to make email lists though &ndash; the first two
(blog posts &amp; comics) both have email lists and RSS feeds that people can
subscribe to if they want. I might add a quick summary of any TIL posts from
that week to the &ldquo;blog posts from this week&rdquo; mailing list.</p>
]]></content>
    </entry>
    <entry>
        <title type="html"><![CDATA[ASCII control characters in my terminal]]></title>
        <link href="https://jvns.ca/blog/2024/10/31/ascii-control-characters/"/>
        <updated>2024-10-31T08:00:10+00:00</updated>
        <id>https://jvns.ca/blog/2024/10/31/ascii-control-characters/</id>
        <content type="html"><![CDATA[<p>Hello! I&rsquo;ve been thinking about the terminal a lot and yesterday I got curious
about all these &ldquo;control codes&rdquo;, like <code>Ctrl-A</code>, <code>Ctrl-C</code>, <code>Ctrl-W</code>, etc. What&rsquo;s
the deal with all of them?</p>
<h3 id="a-table-of-ascii-control-characters">a table of ASCII control characters</h3>
<p>Here&rsquo;s a table of all 33 ASCII control characters, and what they do on my
machine (on Mac OS), more or less. There are about a million caveats, but I&rsquo;ll talk about
what it means and all the problems with this diagram that I know about.</p>
<p><a href="https://jvns.ca/ascii.html"><img src="https://jvns.ca/images/ascii-control.png"></a></p>
<p>You can also view it <a href="https://jvns.ca/ascii.html">as an HTML page</a> (I just made it an image so
it would show up in RSS).</p>
<h3 id="different-kinds-of-codes-are-mixed-together">different kinds of codes are mixed together</h3>
<p>The first surprising thing about this diagram to me is that there are 33
control codes, split into (very roughly speaking) these categories:</p>
<ol>
<li>Codes that are handled by the operating system&rsquo;s terminal driver, for
example when the OS sees a <code>3</code> (<code>Ctrl-C</code>), it&rsquo;ll send a <code>SIGINT</code> signal to
the current program</li>
<li>Everything else is passed through to the application as-is and the
application can do whatever it wants with them. Some subcategories of
those:
<ul>
<li>Codes that correspond to a literal keypress of a key on your keyboard
(<code>Enter</code>, <code>Tab</code>, <code>Backspace</code>). For example when you press <code>Enter</code>, your
terminal gets sent <code>13</code>.</li>
<li>Codes used by <code>readline</code>: &ldquo;the application can do whatever it wants&rdquo;
often means &ldquo;it&rsquo;ll do more or less what the <code>readline</code> library does,
whether the application actually uses <code>readline</code> or not&rdquo;, so I&rsquo;ve
labelled a bunch of the codes that <code>readline</code> uses</li>
<li>Other codes, for example I think <code>Ctrl-X</code> has no standard meaning in the
terminal in general but emacs uses it very heavily</li>
</ul>
</li>
</ol>
<p>There&rsquo;s no real structure to which codes are in which categories, they&rsquo;re all
just kind of randomly scattered because this evolved organically.</p>
<p>(If you&rsquo;re curious about readline, I wrote more about readline in <a href="https://jvns.ca/blog/2024/07/08/readline/">entering text in the terminal is complicated</a>, and there are a lot of
<a href="https://github.com/chzyer/readline/blob/master/doc/shortcut.md">cheat sheets out there</a>)</p>
<h3 id="there-are-only-33-control-codes">there are only 33 control codes</h3>
<p>Something else that I find a little surprising is that there are only 33 control codes &ndash;
A to Z, plus 7 more (<code>@, [, \, ], ^, _, ?</code>). This means that if you want to
have for example <code>Ctrl-1</code> as a keyboard shortcut in a terminal application,
that&rsquo;s not really meaningful &ndash; on my machine at least <code>Ctrl-1</code> is exactly the
same thing as just pressing <code>1</code>, <code>Ctrl-3</code> is the same as <code>Ctrl-[</code>, etc.</p>
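<p>The pattern behind the 33 codes is that <code>Ctrl</code> plus a key gives you that key&rsquo;s ASCII code with one bit (<code>0x40</code>) flipped. Here&rsquo;s a small sketch of that mapping. <code>ctrlCode</code> is a name invented for this example.</p>

```javascript
// Map the key in Ctrl-<key> to the control code it produces.
// Valid keys are ? @ A-Z [ \ ] ^ _  (ASCII 63 through 95), which is
// exactly 33 keys.
function ctrlCode(key) {
  const code = key.toUpperCase().charCodeAt(0);
  if (code < 63 || code > 95) {
    throw new RangeError(`Ctrl-${key} is not one of the 33 control codes`);
  }
  // Flipping bit 0x40 maps @ A-Z [ \ ] ^ _ to 0-31, and ? to 127.
  return code ^ 0x40;
}

console.log(ctrlCode("C")); // 3   (sends SIGINT by default)
console.log(ctrlCode("M")); // 13  (same as Enter)
console.log(ctrlCode("I")); // 9   (same as Tab)
console.log(ctrlCode("[")); // 27  (Escape)
console.log(ctrlCode("?")); // 127
```

<p>Passing <code>&quot;1&quot;</code> throws a <code>RangeError</code>, which matches the point above: there&rsquo;s no control code for <code>Ctrl-1</code>.</p>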
<p>Also <code>Ctrl+Shift+C</code> isn&rsquo;t a control code &ndash; what it does depends on your
terminal emulator. On Linux <code>Ctrl-Shift-X</code> is often used by the terminal
emulator to copy or open a new tab or paste for example, it&rsquo;s not sent to the
TTY at all.</p>
<p>Also I use <code>Ctrl+Left Arrow</code> all the time, but that isn&rsquo;t a control code,
instead it sends an ANSI escape sequence (<code>ctrl-[[1;5D</code>) which is a different
thing which we absolutely do not have space for in this post.</p>
<p>This &ldquo;there are only 33 codes&rdquo; thing is totally different from how keyboard
shortcuts work in a GUI where you can have <code>Ctrl+KEY</code> for any key you want.</p>
<h3 id="the-official-ascii-names-aren-t-very-meaningful-to-me">the official ASCII names aren&rsquo;t very meaningful to me</h3>
<p>Each of these 33 control codes has a name in ASCII (for example <code>3</code> is <code>ETX</code>).
When all of these control codes were originally defined, they weren&rsquo;t being
used for computers or terminals at all, they were used for <a href="https://falsedoor.com/doc/ascii_evolution-of-character-codes.pdf">the telegraph machine</a>.
Telegraph machines aren&rsquo;t the same as UNIX terminals so a lot of the codes were repurposed to mean something else.</p>
<p>Personally I don&rsquo;t find these ASCII names very useful, because 50% of the time
the name in ASCII has no actual relationship to what that code does on UNIX
systems today. So it feels easier to just ignore the ASCII names completely
instead of trying to figure out which ones still match their original meaning.</p>
<h3 id="it-s-hard-to-use-ctrl-m-as-a-keyboard-shortcut">it&rsquo;s hard to use Ctrl-M as a keyboard shortcut</h3>
<p>Another thing that&rsquo;s a bit weird is that <code>Ctrl-M</code> is literally the same as
<code>Enter</code>, and <code>Ctrl-I</code> is the same as <code>Tab</code>, which makes it hard to use those two as keyboard shortcuts.</p>
<p>From some quick research, it seems like some folks do still use <code>Ctrl-I</code> and
<code>Ctrl-M</code> as keyboard shortcuts (<a href="https://github.com/tmux/tmux/issues/2705">here&rsquo;s an example</a>), but to do that
you need to configure your terminal emulator to treat them differently than the
default.</p>
<p>For me the main takeaway is that if I ever write a terminal application I
should avoid <code>Ctrl-I</code> and <code>Ctrl-M</code> as keyboard shortcuts in it.</p>
<h3 id="how-to-identify-what-control-codes-get-sent">how to identify what control codes get sent</h3>
<p>While writing this I needed to do a bunch of experimenting to figure out what
various key combinations did, so I wrote this Python script
<a href="https://gist.github.com/jvns/a2ea09dbfbe03cc75b7bfb381941c742">echo-key.py</a>
that will print them out.</p>
<p>There&rsquo;s probably a more official way but I appreciated having a script I could
customize.</p>
<h3 id="caveat-on-canonical-vs-noncanonical-mode">caveat: on canonical vs noncanonical mode</h3>
<p>Two of these codes (<code>Ctrl-W</code> and <code>Ctrl-U</code>) are labelled in the table as
&ldquo;handled by the OS&rdquo;, but actually they&rsquo;re not <strong>always</strong> handled by the OS, it
depends on whether the terminal is in &ldquo;canonical&rdquo; mode or in &ldquo;noncanonical&rdquo; mode.</p>
<p>In <a href="https://www.man7.org/linux/man-pages/man3/termios.3.html">canonical mode</a>,
programs only get input when you press <code>Enter</code> (and the OS is in charge of deleting characters when you press <code>Backspace</code> or <code>Ctrl-W</code>). But in noncanonical mode the program gets
input immediately when you press a key, and the <code>Ctrl-W</code> and <code>Ctrl-U</code> codes are passed through to the program to handle any way it wants.</p>
<p>Generally in noncanonical mode the program will handle <code>Ctrl-W</code> and <code>Ctrl-U</code>
similarly to how the OS does, but there are some small differences.</p>
<p>Some examples of programs that use canonical mode:</p>
<ul>
<li>probably pretty much any noninteractive program, like <code>grep</code> or <code>cat</code></li>
<li><code>git</code>, I think</li>
</ul>
<p>Examples of programs that use noncanonical mode:</p>
<ul>
<li><code>python3</code>, <code>irb</code> and other REPLs</li>
<li>your shell</li>
<li>any full screen TUI like <code>less</code> or <code>vim</code></li>
</ul>
<h3 id="caveat-all-of-the-os-terminal-driver-codes-are-configurable-with-stty">caveat: all of the &ldquo;OS terminal driver&rdquo; codes are configurable with <code>stty</code></h3>
<p>I said that <code>Ctrl-C</code> sends <code>SIGINT</code> but technically this is not necessarily
true, if you really want to you can remap all of the codes labelled &ldquo;OS
terminal driver&rdquo;, plus Backspace, using a tool called <code>stty</code>, and you can view
the mappings with <code>stty -a</code>.</p>
<p>Here are the mappings on my machine right now:</p>
<pre><code>$ stty -a
cchars: discard = ^O; dsusp = ^Y; eof = ^D; eol = &lt;undef&gt;;
	eol2 = &lt;undef&gt;; erase = ^?; intr = ^C; kill = ^U; lnext = ^V;
	min = 1; quit = ^\; reprint = ^R; start = ^Q; status = ^T;
	stop = ^S; susp = ^Z; time = 0; werase = ^W;
</code></pre>
<p>I have personally never remapped any of these and I cannot imagine a reason I
would (I think it would be a recipe for confusion and disaster for me), but I
<a href="TODO">asked on Mastodon</a> and people said the most common reasons they used
<code>stty</code> were:</p>
<ul>
<li>fix a broken terminal with <code>stty sane</code></li>
<li>set <code>stty erase ^H</code> to change how Backspace works</li>
<li>set <code>stty ixoff</code></li>
<li>some people even map <code>SIGINT</code> to a different key, like their <code>DELETE</code> key</li>
</ul>
<h3 id="caveat-on-signals">caveat: on signals</h3>
<p>Two signals caveats:</p>
<ol>
<li>If the <code>ISIG</code> terminal mode is turned off, then the OS won&rsquo;t send signals. For example <code>vim</code> turns off <code>ISIG</code></li>
<li>Apparently on BSDs, there&rsquo;s an extra control code (<code>Ctrl-T</code>) which sends <code>SIGINFO</code></li>
</ol>
<p>You can see which terminal modes a program is setting using <code>strace</code> like this,
terminal modes are set with the <code>ioctl</code> system call:</p>
<pre><code>$ strace -tt -o out  vim
$ grep ioctl out | grep SET
</code></pre>
<p>here are the modes <code>vim</code> sets when it starts (<code>ISIG</code> and <code>ICANON</code> are
missing!):</p>
<pre><code>17:43:36.670636 ioctl(0, TCSETS, {c_iflag=IXANY|IMAXBEL|IUTF8,
c_oflag=NL0|CR0|TAB0|BS0|VT0|FF0|OPOST, c_cflag=B38400|CS8|CREAD,
c_lflag=ECHOK|ECHOCTL|ECHOKE|PENDIN, ...}) = 0
</code></pre>
<p>and it resets the modes when it exits:</p>
<pre><code>17:43:38.027284 ioctl(0, TCSETS, {c_iflag=ICRNL|IXANY|IMAXBEL|IUTF8,
c_oflag=NL0|CR0|TAB0|BS0|VT0|FF0|OPOST|ONLCR, c_cflag=B38400|CS8|CREAD,
c_lflag=ISIG|ICANON|ECHO|ECHOE|ECHOK|IEXTEN|ECHOCTL|ECHOKE|PENDIN, ...}) = 0
</code></pre>
<p>I think the specific combination of modes vim is using here might be called
&ldquo;raw mode&rdquo;, <a href="https://linux.die.net/man/3/cfmakeraw">man cfmakeraw</a> talks about
that.</p>
<h3 id="there-are-a-lot-of-conflicts">there are a lot of conflicts</h3>
<p>Related to &ldquo;there are only 33 codes&rdquo;, there are a lot of conflicts where
different parts of the system want to use the same code for different things,
for example by default <code>Ctrl-S</code> will freeze your screen, but if you turn that
off then <code>readline</code> will use <code>Ctrl-S</code> to do a forward search.</p>
<p>Another example is that on my machine sometimes <code>Ctrl-T</code> will send <code>SIGINFO</code>
and sometimes it&rsquo;ll transpose 2 characters and sometimes it&rsquo;ll do something
completely different depending on:</p>
<ul>
<li>whether the program has <code>ISIG</code> set</li>
<li>whether the program uses <code>readline</code> / imitates readline&rsquo;s behaviour</li>
</ul>
<h3 id="caveat-on-backspace-and-other-backspace">caveat: on &ldquo;backspace&rdquo; and &ldquo;other backspace&rdquo;</h3>
<p>In this diagram I&rsquo;ve labelled code 127 as &ldquo;backspace&rdquo; and 8 as &ldquo;other
backspace&rdquo;. Uh, what?</p>
<p>I think this was the single biggest topic of discussion in the replies on Mastodon &ndash; apparently there&rsquo;s a LOT of history to this and I&rsquo;d never heard of any of it before.</p>
<p>First, here&rsquo;s how it works on my machine:</p>
<ol>
<li>I press the <code>Backspace</code> key</li>
<li>The TTY gets sent the byte <code>127</code>, which is called <code>DEL</code> in ASCII</li>
<li>the OS terminal driver and readline both have <code>127</code> mapped to &ldquo;backspace&rdquo; (so it works both in canonical mode and noncanonical mode)</li>
<li>The previous character gets deleted</li>
</ol>
<p>If I press <code>Ctrl+H</code>, it has the same effect as <code>Backspace</code> if I&rsquo;m using
readline, but in a program without readline support (like <code>cat</code> for instance),
it just prints out <code>^H</code>.</p>
<p>Apparently Step 2 above is different for some folks &ndash; their <code>Backspace</code> key sends
the byte <code>8</code> instead of <code>127</code>, and so if they want Backspace to work then they
need to configure the OS (using <code>stty</code>) to set <code>erase = ^H</code>.</p>
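<p>The <code>^H</code> and <code>^?</code> that <code>stty -a</code> prints are caret notation for those two bytes. Here&rsquo;s a minimal sketch of the conversion (the <code>caret</code> function is just for illustration): for control bytes, you flip bit <code>0x40</code> and put a <code>^</code> in front.</p>

```go
package main

import "fmt"

// caret (a hypothetical helper) returns the caret notation for a byte:
// control bytes (0-31 and 127) become '^' followed by the byte with
// bit 0x40 flipped; anything else is returned unchanged.
func caret(b byte) string {
	if b < 32 || b == 127 {
		return "^" + string(rune(b^0x40))
	}
	return string(rune(b))
}

func main() {
	fmt.Println(caret(8))   // ^H (the "other backspace")
	fmt.Println(caret(127)) // ^? (DEL, what my Backspace key sends)
	fmt.Println(caret(23))  // ^W
}
```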
<p>There&rsquo;s an incredible <a href="https://www.debian.org/doc/debian-policy/ch-opersys.html#keyboard-configuration">section of the Debian Policy Manual on keyboard configuration</a>
that describes how <code>Delete</code> and <code>Backspace</code> should work according to Debian
policy, which seems very similar to how it works on my Mac today.  My
understanding (via <a href="https://tech.lgbt/@Diziet/113396035847619715">this mastodon post</a>)
is that this policy was written in the 90s because there was a lot of confusion
about what <code>Backspace</code> should do in the 90s and there needed to be a standard
to get everything to work.</p>
<p>There&rsquo;s a bunch more historical terminal stuff here but that&rsquo;s all I&rsquo;ll say for
now.</p>
<h3 id="there-s-probably-a-lot-more-diversity-in-how-this-works">there&rsquo;s probably a lot more diversity in how this works</h3>
<p>I&rsquo;ve probably missed a bunch more ways that &ldquo;how it works on my machine&rdquo; might
be different from how it works on other people&rsquo;s machines, and I&rsquo;ve probably
made some mistakes about how it works on my machine too. But that&rsquo;s all I&rsquo;ve
got for today.</p>
<p>Some more stuff I know that I&rsquo;ve left out: according to <code>stty -a</code> <code>Ctrl-O</code> is
&ldquo;discard&rdquo;, <code>Ctrl-R</code> is &ldquo;reprint&rdquo;, and <code>Ctrl-Y</code> is &ldquo;dsusp&rdquo;. I have no idea how
to make those actually do anything (pressing them does not do anything
obvious, and some people have told me what they used to do historically but
it&rsquo;s not clear to me if they have a use in 2024), and a lot of the time in practice
they seem to just be passed through to the application anyway so I just
labelled <code>Ctrl-R</code> and <code>Ctrl-Y</code> as
<code>readline</code>.</p>
<h3 id="not-all-of-this-is-that-useful-to-know">not all of this is that useful to know</h3>
<p>Also I want to say that I think the contents of this post are kind of interesting
but I don&rsquo;t think they&rsquo;re necessarily that <em>useful</em>. I&rsquo;ve used the terminal
pretty successfully every day for the last 20 years without knowing literally
any of this &ndash; I just knew what <code>Ctrl-C</code>, <code>Ctrl-D</code>, <code>Ctrl-Z</code>, <code>Ctrl-R</code>,
<code>Ctrl-L</code> did in practice (plus maybe <code>Ctrl-A</code>, <code>Ctrl-E</code> and <code>Ctrl-W</code>) and did
not worry about the details for the most part, and that was
almost always totally fine except when I was <a href="https://jvns.ca/blog/2022/07/20/pseudoterminals/">trying to use xterm.js</a>.</p>
<p>But I had fun learning about it so maybe it&rsquo;ll be interesting to you too.</p>
]]></content>
    </entry>
    <entry>
        <title type="html"><![CDATA[Using less memory to look up IP addresses in Mess With DNS]]></title>
        <link href="https://jvns.ca/blog/2024/10/27/asn-ip-address-memory/"/>
        <updated>2024-10-27T07:47:04+00:00</updated>
        <id>https://jvns.ca/blog/2024/10/27/asn-ip-address-memory/</id>
        <content type="html"><![CDATA[<p>I&rsquo;ve been having problems for the last 3 years or so where <a href="https://messwithdns.net/">Mess With DNS</a>
periodically runs out of memory and gets OOM killed.</p>
<p>This hasn&rsquo;t been a big priority for me: usually it just goes down for a few
minutes while it restarts, and it only happens once a day at most, so I&rsquo;ve just
been ignoring it. But last week it started actually causing a problem so I decided
to look into it.</p>
<p>This was kind of a winding road where I learned a lot so here&rsquo;s a table of contents:</p>
<ul>
<li><a href="#there-s-about-100mb-of-memory-available">there&rsquo;s about 100MB of memory available</a></li>
<li><a href="#the-problem-oom-killing-the-backup-script">the problem: OOM killing the backup script</a></li>
<li><a href="#attempt-1-use-sqlite">attempt 1: use SQLite</a>
<ul>
<li><a href="#problem-how-to-store-ipv6-addresses">problem: how to store IPv6 addresses</a></li>
<li><a href="#problem-it-s-500x-slower">problem: it&rsquo;s 500x slower</a></li>
<li><a href="#time-for-explain-query-plan">time for EXPLAIN QUERY PLAN</a></li>
</ul>
</li>
<li><a href="#attempt-2-use-a-trie">attempt 2: use a trie</a>
<ul>
<li><a href="#some-notes-on-memory-profiling">some notes on memory profiling</a></li>
</ul>
</li>
<li><a href="#attempt-3-make-my-array-use-less-memory">attempt 3: make my array use less memory</a>
<ul>
<li><a href="#idea-3-1-deduplicate-the-name-and-country">idea 3.1: deduplicate the Name and Country</a></li>
<li><a href="#how-big-are-asns">how big are ASNs?</a></li>
<li><a href="#idea-3-2-use-netip-addr-instead-of-net-ip">idea 3.2: use netip.Addr instead of net.IP</a></li>
<li><a href="#the-result-saved-70mb-of-memory">the result: saved 70MB of memory!</a></li>
</ul>
</li>
</ul>
<h3 id="there-s-about-100mb-of-memory-available">there&rsquo;s about 100MB of memory available</h3>
<p>I run Mess With DNS on a VM with about 465MB of RAM, which according to
<code>ps aux</code> (the <code>RSS</code> column) is split up something like:</p>
<ul>
<li>100MB for PowerDNS</li>
<li>200MB for Mess With DNS</li>
<li>40MB for <a href="https://fly.io/blog/ssh-and-user-mode-ip-wireguard/">hallpass</a></li>
</ul>
<p>That leaves about 110MB of memory free.</p>
<p>A while back I set <a href="https://tip.golang.org/doc/gc-guide">GOMEMLIMIT</a> to 250MB
to try to make sure the garbage collector ran if Mess With DNS used more than
250MB of memory, and I think this helped but it didn&rsquo;t solve everything.</p>
<h3 id="the-problem-oom-killing-the-backup-script">the problem: OOM killing the backup script</h3>
<p>A few weeks ago I started backing up Mess With DNS&rsquo;s database for the first time <a href="https://jvns.ca/til/restic-for-backing-up-sqlite-dbs/">using restic</a>.</p>
<p>This has been working okay, but since Mess With DNS operates without much extra
memory I think <code>restic</code> sometimes needed more memory than was available on the
system, and so the backup script sometimes got OOM killed.</p>
<p>This was a problem because</p>
<ol>
<li>backups might be corrupted sometimes</li>
<li>more importantly, restic takes out a lock when it runs, and so I&rsquo;d have to manually do an
unlock if I wanted the backups to continue working. Doing manual work like
this is the #1 thing I try to avoid with all my web services (who has time
for that!) so I really wanted to do something about it.</li>
</ol>
<p>There&rsquo;s probably more than one solution to this, but I decided to try to make
Mess With DNS use less memory so that there was more available memory on the
system, mostly because it seemed like a fun problem to try to solve.</p>
<h3 id="what-s-using-memory-ip-addresses">what&rsquo;s using memory: IP addresses</h3>
<p>I&rsquo;d run a memory profile of Mess With DNS a bunch of times in the past, so I
knew exactly what was using most of Mess With DNS&rsquo;s memory: IP addresses.</p>
<p>When it starts, Mess With DNS loads this <a href="https://iptoasn.com/">database where you can look up the
ASN of every IP address</a> into memory, so that when it
receives a DNS query it can take the source IP address like <code>74.125.16.248</code> and
tell you that IP address belongs to <code>GOOGLE</code>.</p>
<p>This database by itself used about 117MB of memory, and a simple <code>du</code> told me
that was too much &ndash; the original text files were only 37MB!</p>
<pre><code>$ du -sh *.tsv
26M	ip2asn-v4.tsv
11M	ip2asn-v6.tsv
</code></pre>
<p>The way it worked originally is that I had an array of these:</p>
<pre><code>type IPRange struct {
	StartIP net.IP
	EndIP   net.IP
	Num     int
	Name    string
	Country string
}
</code></pre>
<p>and I searched through it with a binary search to figure out if any of the
ranges contained the IP I was looking for. Basically the simplest possible
thing and it&rsquo;s super fast, my machine can do about 9 million lookups per
second.</p>
<h3 id="attempt-1-use-sqlite">attempt 1: use SQLite</h3>
<p>I&rsquo;ve been using SQLite recently, so my first thought was &ndash; maybe I can store
all of this data on disk in an SQLite database, give the tables an index, and
that&rsquo;ll use less memory.</p>
<p>So I:</p>
<ul>
<li>wrote a quick Python script using <a href="https://sqlite-utils.datasette.io/en/stable/">sqlite-utils</a> to import the TSV files into an SQLite database</li>
<li>adjusted my code to select from the database instead</li>
</ul>
<p>This did solve the initial memory goal (after a GC it now hardly used any
memory at all because the table was on disk!), though I&rsquo;m not sure how much GC
churn this solution would cause if we needed to do a lot of queries at once. I
did a quick memory profile and it seemed to allocate about 1KB of memory per
lookup.</p>
<p>Let&rsquo;s talk about the issues I ran into with using SQLite though.</p>
<h3 id="problem-how-to-store-ipv6-addresses">problem: how to store IPv6 addresses</h3>
<p>SQLite doesn&rsquo;t have support for big integers and IPv6 addresses are 128 bits,
so I decided to store them as text. I think <code>BLOB</code> might have been better, I
originally thought <code>BLOB</code>s couldn&rsquo;t be compared but the <a href="https://www.sqlite.org/datatype3.html#sort_order">sqlite docs</a> say they can.</p>
<p>I ended up with this schema:</p>
<pre><code>CREATE TABLE ipv4_ranges (
   start_ip INTEGER NOT NULL,
   end_ip INTEGER NOT NULL,
   asn INTEGER NOT NULL,
   country TEXT NOT NULL,
   name TEXT NOT NULL
);
CREATE TABLE ipv6_ranges (
   start_ip TEXT NOT NULL,
   end_ip TEXT NOT NULL,
   asn INTEGER,
   country TEXT,
   name TEXT
);
CREATE INDEX idx_ipv4_ranges_start_ip ON ipv4_ranges (start_ip);
CREATE INDEX idx_ipv6_ranges_start_ip ON ipv6_ranges (start_ip);
CREATE INDEX idx_ipv4_ranges_end_ip ON ipv4_ranges (end_ip);
CREATE INDEX idx_ipv6_ranges_end_ip ON ipv6_ranges (end_ip);
</code></pre>
<p>Also I learned that Python has an <code>ipaddress</code> module, so I could use
<code>ipaddress.ip_address(s).exploded</code> to make sure that the IPv6 addresses were
expanded so that a string comparison would compare them properly.</p>
<h3 id="problem-it-s-500x-slower">problem: it&rsquo;s 500x slower</h3>
<p>I ran a quick microbenchmark, something like this. It printed out that it could
look up 17,000 IPv6 addresses per second, and similarly for IPv4 addresses.</p>
<p>This was pretty discouraging &ndash; being able to look up 17k addresses per second
is kind of fine (Mess With DNS does not get a lot of traffic), but I compared it to
the original binary search code and the original code could do 9 million per second.</p>
<pre><code>	ips := []net.IP{}
	count := 20000
	for i := 0; i &lt; count; i++ {
		// create a random IPv6 address
		bytes := randomBytes()
		ip := net.IP(bytes[:])
		ips = append(ips, ip)
	}
	now := time.Now()
	success := 0
	for _, ip := range ips {
		_, err := ranges.FindASN(ip)
		if err == nil {
			success++
		}
	}
	fmt.Println(success)
	elapsed := time.Since(now)
	fmt.Println(&quot;number per second&quot;, float64(count)/elapsed.Seconds())
</code></pre>
<h3 id="time-for-explain-query-plan">time for EXPLAIN QUERY PLAN</h3>
<p>I&rsquo;d never really done an EXPLAIN in sqlite, so I thought it would be a fun
opportunity to see what the query plan was doing.</p>
<pre><code>sqlite&gt; explain query plan select * from ipv6_ranges where '2607:f8b0:4006:0824:0000:0000:0000:200e' BETWEEN start_ip and end_ip;
QUERY PLAN
`--SEARCH ipv6_ranges USING INDEX idx_ipv6_ranges_end_ip (end_ip&gt;?)
</code></pre>
<p>It looks like it&rsquo;s just using the <code>end_ip</code> index and not the <code>start_ip</code> index,
so maybe it makes sense that it&rsquo;s slower than the binary search.</p>
<p>I tried to figure out if there was a way to make SQLite use both indexes, but I
couldn&rsquo;t find one and maybe it knows best anyway.</p>
<p>At this point I gave up on the SQLite solution, I didn&rsquo;t love that it was
slower and also it&rsquo;s a lot more complex than just doing a binary search. I felt
like I&rsquo;d rather keep something much more similar to the binary search.</p>
<p>A few things I tried with SQLite that did not cause it to use both indexes:</p>
<ul>
<li>using a compound index instead of two separate indexes</li>
<li>running <code>ANALYZE</code></li>
<li>using <code>INTERSECT</code> to intersect the results of <code>start_ip &lt; ?</code> and <code>? &lt; end_ip</code>. This did make it use both indexes, but it also seemed to make the
query literally 1000x slower, probably because it needed to create the
results of both subqueries in memory and intersect them.</li>
</ul>
<h3 id="attempt-2-use-a-trie">attempt 2: use a trie</h3>
<p>My next idea was to use a
<a href="https://medium.com/basecs/trying-to-understand-tries-3ec6bede0014">trie</a>,
because I had some vague idea that maybe a trie would use less memory, and
I found this library called
<a href="https://github.com/seancfoley/ipaddress-go">ipaddress-go</a> that lets you look up IP addresses using a trie.</p>
<p>I tried using it (<a href="https://gist.github.com/jvns/3ce617796b22127017590ac62c57fddd">here&rsquo;s the code</a>), but I
think I was doing something wildly wrong because, compared to my naive array + binary search:</p>
<ul>
<li>it used WAY more memory (800MB to store just the IPv4 addresses)</li>
<li>it was a lot slower to do the lookups (it could do only 100K/second instead of 9 million/second)</li>
</ul>
<p>I&rsquo;m not really sure what went wrong here but I gave up on this approach and
decided to just try to make my array use less memory and stick to a simple
binary search.</p>
<h3 id="some-notes-on-memory-profiling">some notes on memory profiling</h3>
<p>One thing I learned about memory profiling is that you can use the <code>runtime</code>
package to see how much memory is currently allocated in the program. That&rsquo;s
how I got all the memory numbers in this post. Here&rsquo;s the code:</p>
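<pre><code>import (
	&quot;fmt&quot;
	&quot;log&quot;
	&quot;os&quot;
	&quot;runtime&quot;
	&quot;runtime/pprof&quot;
)

func memusage() {
	// force a GC first so that Alloc only counts live memory
	runtime.GC()
	var m runtime.MemStats
	runtime.ReadMemStats(&amp;m)
	fmt.Printf(&quot;Alloc = %v MiB\n&quot;, m.Alloc/1024/1024)
	// write mem.prof
	f, err := os.Create(&quot;mem.prof&quot;)
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()
	if err := pprof.WriteHeapProfile(f); err != nil {
		log.Fatal(err)
	}
}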
</code></pre>
<p>Also I learned that if you use <code>pprof</code> to analyze a heap profile there are two
ways to analyze it: you can pass either <code>--alloc_space</code> or <code>--inuse_space</code> to
<code>go tool pprof</code>. I don&rsquo;t know how I didn&rsquo;t realize this before but
<code>alloc_space</code> will tell you about everything that was allocated, and
<code>inuse_space</code> will just include memory that&rsquo;s currently in use.</p>
<p>Anyway I ran <code>go tool pprof -pdf --inuse_space mem.prof &gt; mem.pdf</code> a lot. Also
every time I use pprof I find myself referring to <a href="https://jvns.ca/blog/2017/09/24/profiling-go-with-pprof/">my own intro to pprof</a>, it&rsquo;s probably
the blog post I wrote that I use the most often. I should add <code>--alloc_space</code>
and <code>--inuse_space</code> to it.</p>
<h3 id="attempt-3-make-my-array-use-less-memory">attempt 3: make my array use less memory</h3>
<p>I was storing my ip2asn entries like this:</p>
<pre><code>type IPRange struct {
	StartIP net.IP
	EndIP   net.IP
	Num     int
	Name    string
	Country string
}
</code></pre>
<p>I had 3 ideas for ways to improve this:</p>
<ol>
<li>There was a lot of repetition of <code>Name</code> and the <code>Country</code>, because a lot of IP ranges belong to the same ASN</li>
<li><code>net.IP</code> is an <code>[]byte</code> under the hood, which felt like it involved an unnecessary pointer, was there a way to inline it into the struct?</li>
<li>Maybe I didn&rsquo;t need both the start IP and the end IP, often the ranges were consecutive so maybe I could rearrange things so that I only had the start IP</li>
</ol>
<h3 id="idea-3-1-deduplicate-the-name-and-country">idea 3.1: deduplicate the Name and Country</h3>
<p>I figured I could store the ASN info in an array, and then just store the index
into the array in my <code>IPRange</code> struct. Here are the structs so you can see what
I mean:</p>
<pre><code>type IPRange struct {
	StartIP netip.Addr
	EndIP   netip.Addr
	ASN     uint32
	Idx     uint32
}

type ASNInfo struct {
	Country string
	Name    string
}

type ASNPool struct {
	asns   []ASNInfo
	lookup map[ASNInfo]uint32
}
</code></pre>
<p>This worked! It brought memory usage from 117MB to 65MB &ndash; a 50MB savings. I felt good about this.</p>
<p><a href="https://github.com/jvns/mess-with-dns/blob/94f77b4bb1597b5e2a6768e33bd6c285919aa1bf/api/streamer/ip2asn/ip2asn.go#L18-L54">Here&rsquo;s all of the code for that part</a>.</p>
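<p>If you don&rsquo;t want to click through, here&rsquo;s a minimal sketch of how a pool like this can work. It reuses the <code>ASNInfo</code> and <code>ASNPool</code> structs from above, but the <code>NewASNPool</code>, <code>Intern</code> and <code>Get</code> names are made up for this example and may not match the linked code.</p>

```go
package main

import "fmt"

type ASNInfo struct {
	Country string
	Name    string
}

type ASNPool struct {
	asns   []ASNInfo
	lookup map[ASNInfo]uint32
}

// NewASNPool is a hypothetical constructor for this sketch.
func NewASNPool() *ASNPool {
	return &ASNPool{lookup: make(map[ASNInfo]uint32)}
}

// Intern returns the index of info in the pool, storing it only the
// first time it's seen. This is the index that would go in IPRange.Idx.
func (p *ASNPool) Intern(info ASNInfo) uint32 {
	if idx, ok := p.lookup[info]; ok {
		return idx
	}
	idx := uint32(len(p.asns))
	p.asns = append(p.asns, info)
	p.lookup[info] = idx
	return idx
}

// Get turns an index back into the ASN info.
func (p *ASNPool) Get(idx uint32) ASNInfo {
	return p.asns[idx]
}

func main() {
	pool := NewASNPool()
	a := pool.Intern(ASNInfo{"US", "GOOGLE"})
	b := pool.Intern(ASNInfo{"US", "GOOGLE"}) // duplicate: same index, nothing stored
	c := pool.Intern(ASNInfo{"CA", "EXAMPLE"})
	fmt.Println(a, b, c, len(pool.asns)) // 0 0 1 2
}
```

<p>The <code>lookup</code> map is only needed while loading the data, so in principle it could be thrown away afterwards to save a bit more memory.</p>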
<h3 id="how-big-are-asns">how big are ASNs?</h3>
<p>As an aside &ndash; I&rsquo;m storing the ASN in a <code>uint32</code>, is that right? I looked in the ip2asn
file and the biggest one seems to be 401307, though there are a few lines that
say <code>4294901931</code> which is much bigger, but those are also just inside the range of a
uint32 (the maximum is 4294967295). So I can definitely use a <code>uint32</code>.</p>
<pre><code>59.101.179.0	59.101.179.255	4294901931	Unknown	AS4294901931
</code></pre>
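<p>One way to double check this while parsing is to ask <code>strconv.ParseUint</code> for a 32-bit result, which returns an error if the number doesn&rsquo;t fit. A minimal sketch (the <code>parseASN</code> name is made up for this example):</p>

```go
package main

import (
	"fmt"
	"strconv"
)

// parseASN (a hypothetical helper) parses an ASN from the TSV file and
// fails if it doesn't fit in a uint32.
func parseASN(s string) (uint32, error) {
	n, err := strconv.ParseUint(s, 10, 32) // bitSize 32: error if out of range
	if err != nil {
		return 0, err
	}
	return uint32(n), nil
}

func main() {
	n, err := parseASN("4294901931")
	fmt.Println(n, err) // 4294901931 <nil>

	// one more than the largest uint32, so this returns an error
	_, err = parseASN("4294967296")
	fmt.Println(err != nil) // true
}
```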
<h3 id="idea-3-2-use-netip-addr-instead-of-net-ip">idea 3.2: use <code>netip.Addr</code> instead of <code>net.IP</code></h3>
<p>It turns out that I&rsquo;m not the only one who felt that <code>net.IP</code> was using an
unnecessary amount of memory &ndash; in 2021 the folks at Tailscale released a new
IP address library for Go which solves this and many other issues. <a href="https://tailscale.com/blog/netaddr-new-ip-type-for-go">They wrote a great blog post about it</a>.</p>
<p>I discovered (to my delight) that not only does this new IP address library exist and do exactly what I want, it&rsquo;s also now in the Go
standard library as <a href="https://pkg.go.dev/net/netip#Addr">netip.Addr</a>. Switching to <code>netip.Addr</code> was
very easy and saved another 20MB of memory, bringing us to 46MB.</p>
<p>I didn&rsquo;t try my third idea (remove the end IP from the struct) because I&rsquo;d
already been programming for long enough on a Saturday morning and I was happy
with my progress.</p>
<p>It&rsquo;s always such a great feeling when I think &ldquo;hey, I don&rsquo;t like this, there
must be a better way&rdquo; and then immediately discover that someone has already
made the exact thing I want, thought about it a lot more than me, and
implemented it much better than I would have.</p>
<h3 id="all-of-this-was-messier-in-real-life">all of this was messier in real life</h3>
<p>Even though I tried to explain this in a simple linear way &ldquo;I tried X, then I
tried Y, then I tried Z&rdquo;, that&rsquo;s kind of a lie &ndash; I always try to take my
actual debugging process (total chaos) and make it seem more linear and
understandable because the reality is just too annoying to write down. It&rsquo;s
more like:</p>
<ul>
<li>try sqlite</li>
<li>try a trie</li>
<li>second guess everything that I concluded about sqlite, go back and look at
the results again</li>
<li>wait what about indexes</li>
<li>very very belatedly realize that I can use <code>runtime</code> to check how much
memory everything is using, start doing that</li>
<li>look at the trie again, maybe I misunderstood everything</li>
<li>give up and go back to binary search</li>
<li>look at all of the numbers for tries/sqlite again to make sure I didn&rsquo;t misunderstand</li>
</ul>
<h3 id="a-note-on-using-512mb-of-memory">A note on using 512MB of memory</h3>
<p>Someone asked why I don&rsquo;t just give the VM more memory. I could very easily
afford to pay for a VM with 1GB of memory, but I feel like 512MB really
<em>should</em> be enough (and really that 256MB should be enough!) so I&rsquo;d rather stay
inside that constraint. It&rsquo;s kind of a fun puzzle.</p>
<h3 id="a-few-ideas-from-the-replies">a few ideas from the replies</h3>
<p>Folks had a lot of good ideas I hadn&rsquo;t thought of. Recording them as
inspiration if I feel like having another Fun Performance Day at some point.</p>
<ul>
<li>Try Go&rsquo;s <a href="https://pkg.go.dev/unique">unique</a> package for the <code>ASNPool</code>. Someone tried this and it uses more memory, probably because Go&rsquo;s pointers are 64 bits</li>
<li>Try compiling with <code>GOARCH=386</code> to use 32-bit pointers to save space (maybe in combination with using <code>unique</code>!)</li>
<li>It should be possible to store all of the IPv6 addresses in just 64 bits, because only the first 64 bits of the address are public</li>
<li><a href="https://en.m.wikipedia.org/wiki/Interpolation_search">Interpolation search</a> might be faster than binary search since IP addresses are numeric</li>
<li>Try the MaxMind db format with <a href="https://github.com/maxmind/mmdbwriter">mmdbwriter</a> or <a href="https://github.com/ipinfo/mmdbctl">mmdbctl</a></li>
<li>Tailscale&rsquo;s <a href="https://github.com/tailscale/art">art</a> routing table package</li>
</ul>
<h3 id="the-result-saved-70mb-of-memory">the result: saved 70MB of memory!</h3>
<p>I deployed the new version and now Mess With DNS is using less memory! Hooray!</p>
<p>A few other notes:</p>
<ul>
<li>lookups are a little slower &ndash; in my microbenchmark they went from 9 million
lookups/second to 6 million, maybe because I added a little indirection.
Using less memory and a little more CPU seemed like a good tradeoff though.</li>
<li>it&rsquo;s still using more memory than the raw text files do (46MB vs 37MB), I
guess pointers take up space and that&rsquo;s okay.</li>
</ul>
<p>I&rsquo;m honestly not sure if this will solve all my memory problems, probably not!
But I had fun, I learned a few things about SQLite, I still don&rsquo;t know what to
think about tries, and it made me love binary search even more than I already
did.</p>
]]></content>
    </entry>
    <entry>
        <title type="html"><![CDATA[Some notes on upgrading Hugo]]></title>
        <link href="https://jvns.ca/blog/2024/10/07/some-notes-on-upgrading-hugo/"/>
        <updated>2024-10-07T09:19:57+00:00</updated>
        <id>https://jvns.ca/blog/2024/10/07/some-notes-on-upgrading-hugo/</id>
        <content type="html"><![CDATA[<p>Warning: this is a post about very boring yakshaving, probably only of interest
to people who are trying to upgrade Hugo from a very old version to a new
version. But what are blogs for if not documenting one&rsquo;s very boring yakshaves
from time to time?</p>
<p>So yesterday I decided to try to upgrade Hugo. There&rsquo;s no real reason to do
this &ndash; I&rsquo;ve been using Hugo version 0.40 to generate this blog since 2018, it
works fine, and I don&rsquo;t have any problems with it. But I thought &ndash; maybe it
won&rsquo;t be as hard as I think, and I kind of like a tedious computer task sometimes!</p>
<p>I thought I&rsquo;d document what I learned along the way in case it&rsquo;s useful to
anyone else doing this very specific migration. I upgraded from Hugo v0.40
(from 2018) to v0.135 (from 2024).</p>
<p>Here are most of the changes I had to make:</p>
<h3 id="change-1-template-theme-partials-thing-html-is-now-partial-thing-html">change 1: <code>template &quot;theme/partials/thing.html&quot;</code> is now <code>partial &quot;thing.html&quot;</code></h3>
<p>I had to replace a bunch of instances of <code>{{ template &quot;theme/partials/header.html&quot; . }}</code> with <code>{{ partial &quot;header.html&quot; . }}</code>.</p>
<p>This happened in <a href="https://github.com/gohugoio/hugo/releases/tag/v0.42">v0.42</a>:</p>
<blockquote>
<p>We have now virtualized the filesystems for project and theme files. This
makes everything simpler, faster and more powerful. But it also means that
template lookups on the form {{ template “theme/partials/pagination.html” .
}} will not work anymore. That syntax has never been documented, so it&rsquo;s not
expected to be in wide use.</p>
</blockquote>
<h3 id="change-2-data-pages-is-now-site-regularpages">change 2: <code>.Data.Pages</code> is now <code>site.RegularPages</code></h3>
<p>This seems to be discussed in the <a href="https://github.com/gohugoio/hugo/releases/tag/v0.57.2">release notes for 0.57.2</a></p>
<p>I just needed to replace <code>.Data.Pages</code> with <code>site.RegularPages</code> in the template on the homepage as well as in my RSS feed template.</p>
<h3 id="change-3-next-and-prev-got-flipped">change 3:  <code>.Next</code> and <code>.Prev</code> got flipped</h3>
<p>I had this comment in the part of my theme where I link to the next/previous blog post:</p>
<blockquote>
<p>&ldquo;next&rdquo; and &ldquo;previous&rdquo; in hugo apparently mean the opposite of what I&rsquo;d think
they&rsquo;d mean intuitively. I&rsquo;d expect &ldquo;next&rdquo; to mean &ldquo;in the future&rdquo; and
&ldquo;previous&rdquo; to mean &ldquo;in the past&rdquo; but it&rsquo;s the opposite</p>
</blockquote>
<p>It looks like they changed this in
<a href="https://github.com/gohugoio/hugo/commit/ad705aac0649fa3102f7639bc4db65d45e108ee2">ad705aac064</a>
so that &ldquo;next&rdquo; actually is in the future and &ldquo;prev&rdquo; actually is in the past. I
definitely find the new behaviour more intuitive.</p>
<h3 id="downloading-the-hugo-changelogs-with-a-script">downloading the Hugo changelogs with a script</h3>
<p>Figuring out why/when all of these changes happened was a little difficult. I
ended up hacking together a bash script to <a href="https://gist.github.com/jvns/dbe4bd9271a56f1f8562bfe329c2aa9e">download all of the changelogs from github as text files</a>, which I
could then grep to try to figure out what happened. It turns out it&rsquo;s pretty
easy to get all of the changelogs from the GitHub API.</p>
<p>So far everything was not so bad &ndash; there was also a change around taxonomies
that I can&rsquo;t quite explain, but it was all pretty manageable. But then we got
to the really tough one: the markdown renderer.</p>
<h3 id="change-4-the-markdown-renderer-blackfriday-goldmark">change 4: the markdown renderer (blackfriday -&gt; goldmark)</h3>
<p>The blackfriday markdown renderer (which was previously the default) was removed in <a href="https://github.com/gohugoio/hugo/releases/tag/v0.100.0">v0.100.0</a>. This seems pretty reasonable:</p>
<blockquote>
<p>It has been deprecated for a long time, its v1 version is not maintained
anymore, and there are many known issues. Goldmark should be a mature
replacement by now.</p>
</blockquote>
<p>Fixing all my Markdown changes was a huge pain &ndash; I ended up having to update
80 different Markdown files (out of 700) so that they would render properly, and I&rsquo;m not totally sure</p>
<h3 id="why-bother-switching-renderers">why bother switching renderers?</h3>
<p>The obvious question here is &ndash; why bother even trying to upgrade Hugo at all
if I have to switch Markdown renderers?
My old site was running totally fine and I think it wasn&rsquo;t necessarily a <em>good</em>
use of time, but the one reason I think it might be useful in the future is
that the new renderer (goldmark) uses the <a href="https://commonmark.org/">CommonMark markdown standard</a>, which I&rsquo;m hoping will be somewhat
more futureproof. So maybe I won&rsquo;t have to go through this again? We&rsquo;ll see.</p>
<p>Also it turned out that the new Goldmark renderer does fix some problems I had
(but didn&rsquo;t know that I had) with smart quotes and how lists/blockquotes
interact.</p>
<h3 id="finding-all-the-markdown-problems-the-process">finding all the Markdown problems: the process</h3>
<p>The hard part of this Markdown change was even figuring out what changed.
Almost all of the problems (including #2 and #3 above) just silently broke the
site, they didn&rsquo;t cause any errors or anything. So I had to diff the HTML to
hunt them down.</p>
<p>Here&rsquo;s what I ended up doing:</p>
<ol>
<li>Generate the site with the old version, put it in <code>public_old</code></li>
<li>Generate the new version, put it in <code>public</code></li>
<li>Diff every single HTML file in <code>public/</code> and <code>public_old</code> with <a href="https://gist.github.com/jvns/c7272cfb906e3ed0a3e9f8d361c5b5fc">this diff.sh script</a> and put the results in a <code>diffs/</code> folder</li>
<li>Run variations on <code>find diffs -type f | xargs cat | grep -E -C 5 '(31m|32m)' | less -r</code> over and over again to look at every single change until I found something that seemed wrong</li>
<li>Update the Markdown to fix the problem</li>
<li>Repeat until everything seemed okay</li>
</ol>
<p>(the <code>grep 31m|32m</code> thing is searching for red/green text in the diff)</p>
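<p>Here&rsquo;s a rough Python sketch of steps 1-3, in case that&rsquo;s easier to adapt than a shell script. To be clear, this is not the <code>diff.sh</code> script linked above: it&rsquo;s a stand-in that uses Python&rsquo;s <code>difflib</code>, and the function name <code>diff_sites</code> is made up for this example.</p>
<pre><code class="language-python">import difflib
from pathlib import Path

def diff_sites(old_dir, new_dir, out_dir):
    '''Write one unified diff per changed HTML file; return the changed paths.'''
    old_dir, new_dir, out_dir = Path(old_dir), Path(new_dir), Path(out_dir)
    changed = []
    for new_file in sorted(new_dir.rglob('*.html')):
        rel = new_file.relative_to(new_dir)
        old_file = old_dir / rel
        # A page that only exists in the new build is compared against nothing.
        old_lines = []
        if old_file.exists():
            old_lines = old_file.read_text(encoding='utf-8').splitlines()
        new_lines = new_file.read_text(encoding='utf-8').splitlines()
        diff = list(difflib.unified_diff(old_lines, new_lines,
                                         str(old_file), str(new_file),
                                         lineterm=''))
        if diff:
            target = out_dir / (rel.as_posix() + '.diff')
            target.parent.mkdir(parents=True, exist_ok=True)
            target.write_text('\n'.join(diff) + '\n', encoding='utf-8')
            changed.append(rel.as_posix())
    return changed

if __name__ == '__main__':
    # Assumes both builds already exist in these folders.
    for path in diff_sites('public_old', 'public', 'diffs'):
        print(path)
</code></pre>
<p>The output here is a plain unified diff with no colours, so with this version you&rsquo;d look for lines starting with <code>+</code> or <code>-</code> instead of grepping for colour codes.</p>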
<p>This was very time consuming but it was a little bit fun for some reason so I
kept doing it until it seemed like nothing too horrible was left.</p>
<h3 id="the-new-markdown-rules">the new markdown rules</h3>
<p>Here&rsquo;s a list of every type of Markdown change I had to make. It&rsquo;s very
possible these are all extremely specific to me but it took me a long time to
figure them all out so maybe this will be helpful to one other person who finds
this in the future.</p>
<h4 id="4-1-mixing-html-and-markdown">4.1: mixing HTML and markdown</h4>
<p>This doesn&rsquo;t work anymore (it doesn&rsquo;t expand the link):</p>
<pre><code>&lt;small&gt;
[a link](https://example.com)
&lt;/small&gt;
</code></pre>
<p>I need to do this instead:</p>
<pre><code>&lt;small&gt;

[a link](https://example.com)

&lt;/small&gt;
</code></pre>
<p>This works too:</p>
<pre><code>&lt;small&gt; [a link](https://example.com) &lt;/small&gt;
</code></pre>
<h4 id="4-2-is-changed-into">4.2: <code>&lt;&lt;</code> is changed into «</h4>
<p>I didn&rsquo;t want this so I needed to configure:</p>
<pre><code>markup:
  goldmark:
    extensions:
      typographer:
        leftAngleQuote: '&amp;lt;&amp;lt;'
        rightAngleQuote: '&amp;gt;&amp;gt;'
</code></pre>
<h4 id="4-3-nested-lists-sometimes-need-4-space-indents">4.3: nested lists sometimes need 4 space indents</h4>
<p>This doesn&rsquo;t render as a nested list anymore if I only indent by 2 spaces, I need to put 4 spaces.</p>
<pre><code>1. a
  * b
  * c
2. b
</code></pre>
<p>The problem is that the amount of indent needed depends on the size of the list
markers. <a href="https://spec.commonmark.org/0.29/#example-263">Here&rsquo;s a reference in CommonMark for this</a>.</p>
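<p>As a rough illustration of that rule: nested content has to be indented at least as far as the text of the parent list item starts, which is the width of the marker plus the spaces after it. This is a simplified sketch and not the full CommonMark algorithm, and the helper name <code>nested_indent</code> is made up for this example.</p>
<pre><code class="language-python">import re

def nested_indent(line):
    '''Return how many spaces a nested block needs under this list item.

    Simplified: assumes the line starts with a list marker followed by
    1 to 4 spaces and then some text.
    '''
    match = re.match(r'( *)([-+*]|\d{1,9}[.)])( {1,4})\S', line)
    if match is None:
        raise ValueError('not a list item: ' + repr(line))
    leading, marker, padding = match.groups()
    return len(leading) + len(marker) + len(padding)

print(nested_indent('* a'))    # 2
print(nested_indent('1. a'))   # 3
print(nested_indent('10. a'))  # 4
</code></pre>
<p>So under <code>1. a</code> the nested bullets need at least 3 spaces, which would explain why 2 spaces stopped working and 4 spaces is fine.</p>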
<h4 id="4-4-blockquotes-inside-lists-work-better">4.4: blockquotes inside lists work better</h4>
<p>Previously the <code>&gt; quote</code> here didn&rsquo;t render as a blockquote, and with the new renderer it does.</p>
<pre><code>* something
&gt; quote
* something else
</code></pre>
<p>I found a bunch of Markdown that had been kind of broken (which I hadn&rsquo;t
noticed) that works better with the new renderer, and this is an example of
that.</p>
<p>Lists inside blockquotes also seem to work better.</p>
<h4 id="4-5-headings-inside-lists">4.5: headings inside lists</h4>
<p>Previously this didn&rsquo;t render as a heading, but now it does. So I needed to
replace the <code>#</code> with <code>&amp;num;</code>.</p>
<pre><code>* # passengers: 20
</code></pre>
<h4 id="4-6-or-1-at-the-beginning-of-the-line-makes-it-a-list">4.6: <code>+</code> or <code>1)</code> at the beginning of the line makes it a list</h4>
<p>I had something which looked like this:</p>
<pre><code>`1 / (1
+ exp(-1)) = 0.73`
</code></pre>
<p>With Blackfriday it rendered like this:</p>
<pre><code>&lt;p&gt;&lt;code&gt;1 / (1
+ exp(-1)) = 0.73&lt;/code&gt;&lt;/p&gt;
</code></pre>
<p>and with Goldmark it rendered like this:</p>
<pre><code>&lt;p&gt;`1 / (1&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;exp(-1)) = 0.73`&lt;/li&gt;
&lt;/ul&gt;
</code></pre>
<p>Same thing if there was an accidental <code>1)</code> at the beginning of a line, like in this Markdown snippet:</p>
<pre><code>I set up a small Hadoop cluster (1 master, 2 workers, replication set to 
1) on 
</code></pre>
<p>To fix this I just had to rewrap the line so that the <code>+</code> (or the <code>1)</code>) wasn&rsquo;t the first thing on the line.</p>
<p>The Markdown is formatted this way because I wrap my Markdown to 80 characters
a lot and the wrapping isn&rsquo;t very context sensitive.</p>
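<p>If you want to hunt for these before switching renderers, here&rsquo;s a small Python sketch that flags suspicious lines. It&rsquo;s a heuristic I&rsquo;m assuming is good enough for a manual review, not a real Markdown parser: it ignores code fences and will produce false positives (for example in lists with multi-line items). The name <code>accidental_lists</code> is made up.</p>
<pre><code class="language-python">import re

# A bullet (+, - or *) or the number 1 followed by . or ), then spaces and text.
LIST_START = re.compile(r' {0,3}(?:[-+*]|1[.)]) +\S')

def accidental_lists(markdown):
    '''Yield (line_number, line) for lines that may turn into a list.

    Flags a list-looking line that directly follows a non-blank line
    which is not itself the start of a list item.
    '''
    previous = ''
    for number, line in enumerate(markdown.splitlines(), start=1):
        if (LIST_START.match(line) and previous.strip()
                and not LIST_START.match(previous)):
            yield number, line
        previous = line

sample = 'replication set to\n1) on\n\n`1 / (1\n+ exp(-1)) = 0.73`\n'
for number, line in accidental_lists(sample):
    print(number, line)
</code></pre>
<p>On the two snippets above this prints line 2 (<code>1) on</code>) and line 5 (the one starting with <code>+</code>).</p>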
<h4 id="4-7-no-more-smart-quotes-in-code-blocks">4.7: no more smart quotes in code blocks</h4>
<p>There were a bunch of places where the old renderer (Blackfriday) was doing
unwanted things in code blocks like replacing <code>...</code> with <code>…</code> or replacing
quotes with smart quotes. I hadn&rsquo;t realized this was happening and I was very
happy to have it fixed.</p>
<h4 id="4-8-better-quote-management">4.8: better quote management</h4>
<p>The way this gets rendered got better:</p>
<pre><code>&quot;Oh, *interesting*!&quot;
</code></pre>
<ul>
<li>old: “Oh, <em>interesting</em>!“</li>
<li>new: “Oh, <em>interesting</em>!”</li>
</ul>
<p>Before there were two left smart quotes, now the quotes match.</p>
<h4 id="4-9-images-are-no-longer-wrapped-in-a-p-tag">4.9: images are no longer wrapped in a <code>p</code> tag</h4>
<p>Previously if I had an image like this:</p>
<pre><code>&lt;img src=&quot;https://jvns.ca/images/rustboot1.png&quot;&gt;
</code></pre>
<p>it would get wrapped in a <code>&lt;p&gt;</code> tag, now it doesn&rsquo;t anymore. I dealt with this
just by adding a <code>margin-bottom: 0.75em</code> to images in the CSS, hopefully
that&rsquo;ll make them display well enough.</p>
<h4 id="4-10-br-is-now-wrapped-in-a-p-tag">4.10: <code>&lt;br&gt;</code> is now wrapped in a <code>p</code> tag</h4>
<p>Previously this wouldn&rsquo;t get wrapped in a <code>p</code> tag, but now it seems to:</p>
<pre><code>&lt;br&gt;&lt;br&gt;
</code></pre>
<p>I just gave up on fixing this though and resigned myself to maybe having some
extra space in some cases. Maybe I&rsquo;ll try to fix it later if I feel like
another yakshave.</p>
<h4 id="4-11-some-more-goldmark-settings">4.11: some more goldmark settings</h4>
<p>I also needed to</p>
<ul>
<li>turn off code highlighting (because it wasn&rsquo;t working properly and I didn&rsquo;t have it before anyway)</li>
<li>use the old &ldquo;blackfriday&rdquo; method to generate heading IDs so they didn&rsquo;t change</li>
<li>allow raw HTML in my markdown</li>
</ul>
<p>Here&rsquo;s what I needed to add to my <code>config.yaml</code> to do all that:</p>
<pre><code>markup:
  highlight:
    codeFences: false
  goldmark:
    renderer:
      unsafe: true
    parser:
      autoHeadingIDType: blackfriday
</code></pre>
<p>Maybe I&rsquo;ll try to get syntax highlighting working one day, who knows. I might
prefer having it off though.</p>
<h3 id="a-little-script-to-compare-blackfriday-and-goldmark">a little script to compare blackfriday and goldmark</h3>
<p>I also wrote a little program to compare the Blackfriday and Goldmark output
for various markdown snippets, <a href="https://gist.github.com/jvns/9cc3024ff98433ced5e3a2304c5fc5e4">here it is in a gist</a>.</p>
<p>It&rsquo;s not really configured the exact same way Blackfriday and Goldmark were in
my Hugo versions, but it was still helpful to have to help me understand what
was going on.</p>
<h3 id="a-quick-note-on-maintaining-themes">a quick note on maintaining themes</h3>
<p>My approach to themes in Hugo has been:</p>
<ol>
<li>pay someone to make a nice design for the site (for example wizardzines.com was designed by <a href="https://melody.dev/">Melody Starling</a>)</li>
<li>use a totally custom theme</li>
<li>commit that theme to the same GitHub repo as the site</li>
</ol>
<p>So I just need to edit the theme files to fix any problems. Also I wrote a lot
of the theme myself so I&rsquo;m pretty familiar with how it works.</p>
<p>Relying on someone else to keep a theme updated feels kind of scary to me. I
think if I were using a third-party theme I&rsquo;d just copy the code into my site&rsquo;s
GitHub repo and then maintain it myself.</p>
<h3 id="which-static-site-generators-have-better-backwards-compatibility">which static site generators have better backwards compatibility?</h3>
<p>I <a href="https://social.jvns.ca/@b0rk/113260718682453232">asked on Mastodon</a> if
anyone had used a static site generator with good backwards compatibility.</p>
<p>The main answers seemed to be Jekyll and 11ty. Several people said they&rsquo;d been
using Jekyll for 10 years without any issues, and 11ty says it has
<a href="https://www.11ty.dev/blog/stability/">stability as a core goal</a>.</p>
<p>I think a big factor in how appealing Jekyll/11ty are is how easy it is for you
to maintain a working Ruby / Node environment on your computer: part of the
reason I stopped using Jekyll was that I got tired of having to maintain a
working Ruby installation. But I imagine this wouldn&rsquo;t be a problem for a Ruby
or Node developer.</p>
<p>Several people said that they don&rsquo;t build their Jekyll site locally at all &ndash;
they just use GitHub Pages to build it.</p>
<h3 id="that-s-it">that&rsquo;s it!</h3>
<p>Overall I&rsquo;ve been happy with Hugo &ndash; I <a href="https://jvns.ca/blog/2016/10/09/switching-to-hugo/">started using it</a> because it had fast
build times and it was a static binary, and both of those things are still
extremely useful to me. I might have spent 10 hours on this upgrade, but I&rsquo;ve
probably spent 1000+ hours writing blog posts without thinking about Hugo at
all so that seems like an extremely reasonable ratio.</p>
<p>I find it hard to be too mad about the backwards incompatible changes, most of
them were quite a long time ago, Hugo does a great job of making their old
releases available so you can use the old release if you want, and the most
difficult one is removing support for the <code>blackfriday</code> Markdown renderer in
favour of using something CommonMark-compliant which seems pretty reasonable to
me even if it is a huge pain.</p>
<p>But it did take a long time and I don&rsquo;t think I&rsquo;d particularly recommend moving
700 blog posts to a new Markdown renderer unless you&rsquo;re really in the mood for
a lot of computer suffering for some reason.</p>
<p>The new renderer did fix a bunch of problems so I think overall it might be a
good thing, even if I&rsquo;ll have to remember to make 2 changes to how I write
Markdown (4.1 and 4.3).</p>
<p>Also I&rsquo;m still using Hugo 0.54 for <a href="https://wizardzines.com">https://wizardzines.com</a> so maybe these notes
will be useful to Future Me if I ever feel like upgrading Hugo for that site.</p>
<p>Hopefully I didn&rsquo;t break too many things on the blog by doing this, let me know
if you see anything broken!</p>
]]></content>
    </entry>
    <entry>
        <title type="html"><![CDATA[Terminal colours are tricky]]></title>
        <link href="https://jvns.ca/blog/2024/10/01/terminal-colours/"/>
        <updated>2024-10-01T10:01:44+00:00</updated>
        <id>https://jvns.ca/blog/2024/10/01/terminal-colours/</id>
        <content type="html"><![CDATA[<p>Yesterday I was thinking about how long it took me to get a colorscheme in my
terminal that I was mostly happy with (SO MANY YEARS), and it made me wonder
what about terminal colours made it so hard.</p>
<p>So I <a href="https://social.jvns.ca/@b0rk/113226972156366201">asked people on Mastodon</a> what problems
they&rsquo;ve run into with colours in the terminal, and I got a ton of interesting
responses! Let&rsquo;s talk about some of the problems and a few possible ways to fix
them.</p>
<h3 id="problem-1-blue-on-black">problem 1: blue on black</h3>
<p>One of the top complaints was &ldquo;blue on black is hard to read&rdquo;. Here&rsquo;s an
example of that: if I open Terminal.app, set the background to black, and run
<code>ls</code>, the directories are displayed in a blue that isn&rsquo;t that easy to read:</p>
<img src="https://jvns.ca/images/terminal-blue.png" style="max-width: 400px">
<p>To understand why we&rsquo;re seeing this blue, let&rsquo;s talk about ANSI colours!</p>
<h3 id="the-16-ansi-colours">the 16 ANSI colours</h3>
<p>Your terminal has 16 numbered colours &ndash; black, red, green, yellow, blue,
magenta, cyan, white, and a &ldquo;bright&rdquo; version of each of those.</p>
<p>Programs can use them by printing out an &ldquo;ANSI escape code&rdquo; &ndash; for example if
you want to see each of the 16 colours in your terminal, you can run this
Python program:</p>
<pre><code class="language-python">def color(num, text):
    return f&quot;\033[38;5;{num}m{text}\033[0m&quot;

for i in range(16):
    print(color(i, f&quot;number {i:02}&quot;))
</code></pre>
<h3 id="what-are-the-ansi-colours">what are the ANSI colours?</h3>
<p>This made me wonder &ndash; if blue is colour number 4, who decides what hex colour
that should correspond to?</p>
<p>The answer seems to be &ldquo;there&rsquo;s no standard, terminal emulators just choose
colours and it&rsquo;s not very consistent&rdquo;. Here&rsquo;s a <a href="https://en.m.wikipedia.org/wiki/ANSI_escape_code#Colors">screenshot of a table from Wikipedia</a>, where you
can see that there&rsquo;s a lot of variation:</p>
<img src="https://jvns.ca/images/wikipedia.png"> 
<h3 id="problem-1-5-bright-yellow-on-white">problem 1.5: bright yellow on white</h3>
<p>Bright yellow on white is even worse than blue on black, here&rsquo;s what I get in
a terminal with the default settings:</p>
<img src="https://jvns.ca/images/terminal-yellow.png" style="max-height: 40px">
<p>That&rsquo;s almost impossible to read (and some other colours like light green cause
similar issues), so let&rsquo;s talk about solutions!</p>
<h3 id="two-ways-to-reconfigure-your-colours">two ways to reconfigure your colours</h3>
<p>If you&rsquo;re annoyed by these colour contrast issues (or maybe you just think the
default ANSI colours are ugly), you might think &ndash; well, I&rsquo;ll just choose a
different &ldquo;blue&rdquo; and pick something I like better!</p>
<p>There are two ways you can do this:</p>
<p><strong>Way 1: Configure your terminal emulator</strong>: I think most modern terminal emulators
have a way to reconfigure the colours, and some of them even come with some
preinstalled themes that you might like better than the defaults.</p>
<p><strong>Way 2: Run a shell script</strong>: There are ANSI escape codes that you can print
out to tell your terminal emulator to reconfigure its colours. <a href="https://github.com/chriskempson/base16-shell/blob/master/scripts/base16-solarized-light.sh">Here&rsquo;s a shell script that does that</a>,
from the <a href="https://github.com/chriskempson/base16-shell">base16-shell</a> project.
You can see that it has a few different conventions for changing the colours &ndash;
I guess different terminal emulators have different escape codes for changing
their colour palette, and so the script is trying to pick the right style of
escape code based on the <code>TERM</code> environment variable.</p>
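<p>To give a feel for what those escape codes look like, here&rsquo;s a minimal Python sketch of one of the conventions, the xterm-style &ldquo;OSC 4&rdquo; sequence. I&rsquo;m assuming your terminal understands this particular form; the function name <code>set_palette_color</code> is made up, and this only covers one of the styles the base16 script handles.</p>
<pre><code class="language-python">def set_palette_color(index, hex_color):
    '''Build the escape sequence that redefines one palette colour.

    Assumes the xterm-style OSC 4 sequence; terminals that use a different
    convention (or none) will ignore it or need something else.
    '''
    hex_color = hex_color.lstrip('#')
    red, green, blue = hex_color[0:2], hex_color[2:4], hex_color[4:6]
    return f'\033]4;{index};rgb:{red}/{green}/{blue}\033\\'

# Example: remap colour 4 (blue) to the Solarized blue.
print(set_palette_color(4, '#268bd2'), end='')
</code></pre>
<p>The change only applies to the terminal window that receives the sequence, which is why a script like this usually runs from your shell&rsquo;s startup config.</p>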
<h3 id="what-are-the-pros-and-cons-of-the-2-ways-of-configuring-your-colours">what are the pros and cons of the 2 ways of configuring your colours?</h3>
<p>I prefer to use the &ldquo;shell script&rdquo; method, because:</p>
<ul>
<li>if I switch terminal emulators for some reason, I don&rsquo;t need to learn a different configuration system, my colours still Just Work</li>
<li>I use <a href="https://github.com/chriskempson/base16-shell">base16-shell</a> with base16-vim to make my vim colours match my terminal colours, which is convenient</li>
</ul>
<p>some advantages of configuring colours in your terminal emulator:</p>
<ul>
<li>if you use a popular terminal emulator, there are probably a lot more nice terminal themes out there that you can choose from</li>
<li>not all terminal emulators support the &ldquo;shell script method&rdquo;, and even if
they do, the results can be a little inconsistent</li>
</ul>
<p>This is what my shell has looked like for probably the last 5 years (using the
solarized light base16 theme), and I&rsquo;m pretty happy with it. Here&rsquo;s <code>htop</code>:</p>
<img src="https://jvns.ca/images/terminal-my-colours.png" style="max-width: 400px">
<p>Okay, so let&rsquo;s say you&rsquo;ve found a terminal colorscheme that you like. What else
can go wrong?</p>
<h3 id="problem-2-programs-using-256-colours">problem 2: programs using 256 colours</h3>
<p>Here&rsquo;s what some output of <code>fd</code>, a <code>find</code> alternative, looks like in my
colorscheme:</p>
<img src="https://jvns.ca/images/terminal-problem-fd.png" style="max-width: 400px">
<p>The contrast is pretty bad here, and I definitely don&rsquo;t have that lime green in
my normal colorscheme. What&rsquo;s going on?</p>
<p>We can see what color codes <code>fd</code> is using using the <code>unbuffer</code> program to
capture its output including the color codes:</p>
<pre><code>$ unbuffer fd . &gt; out
$ vim out
^[[38;5;48mbad-again.sh^[[0m
^[[38;5;48mbad.sh^[[0m
^[[38;5;48mbetter.sh^[[0m
out
</code></pre>
<p><code>^[[38;5;48m</code> means &ldquo;set the foreground color to color <code>48</code>&rdquo;. Terminals don&rsquo;t
only have 16 colours &ndash; many terminals these days actually have 3 ways of
specifying colours:</p>
<ol>
<li>the 16 ANSI colours we already talked about</li>
<li>an extended set of 256 colours</li>
<li>a further extended set of 24-bit hex colours, like <code>#ffea03</code></li>
</ol>
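<p>Here&rsquo;s a small Python sketch of what the escape codes for those three look like side by side. The helper names are made up. For the 16 ANSI colours I&rsquo;m using the classic codes 30-37, plus 90-97 for the bright versions, which as far as I know are widely supported but weren&rsquo;t part of the original standard.</p>
<pre><code class="language-python">RESET = '\033[0m'

def ansi16(num, text):
    '''Foreground from the 16 ANSI colours (0-7 normal, 8-15 bright).'''
    code = 30 + num if num in range(8) else 90 + (num - 8)
    return f'\033[{code}m{text}{RESET}'

def color256(num, text):
    '''Foreground from the extended 256-colour palette.'''
    return f'\033[38;5;{num}m{text}{RESET}'

def truecolor(hex_color, text):
    '''Foreground from a 24-bit hex colour like #ffea03.'''
    value = int(hex_color.lstrip('#'), 16)
    red, green, blue = value // 65536, (value // 256) % 256, value % 256
    return f'\033[38;2;{red};{green};{blue}m{text}{RESET}'

print(ansi16(4, 'ANSI blue'))
print(color256(48, 'colour 48 from the 256-colour set'))
print(truecolor('#ffea03', '24-bit #ffea03'))
</code></pre>
<p>The first line follows whatever your terminal theme says &ldquo;blue&rdquo; is. The third one asks for an exact colour, and what happens in a terminal without 24-bit support varies.</p>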
<p>So <code>fd</code> is using one of the colours from the extended 256-color set. <code>bat</code> (a
<code>cat</code> alternative) does something similar &ndash; here&rsquo;s what it looks like by
default in my terminal.</p>
<img src="https://jvns.ca/images/terminal-bat.png" style="max-width: 400px">
<p>This looks fine though and it really seems like it&rsquo;s trying to work well with a
variety of terminal themes.</p>
<h3 id="some-newer-tools-seem-to-have-theme-support">some newer tools seem to have theme support</h3>
<p>I think it&rsquo;s interesting that some of these newer terminal tools (<code>fd</code>, <code>bat</code>,
<code>delta</code>, and probably more) have support for arbitrary custom themes. I guess
the downside of this approach is that the default theme might clash with your
terminal&rsquo;s background, but the upside is that it gives you a lot more control
over theming the tool&rsquo;s output than just choosing 16 ANSI colours.</p>
<p>I don&rsquo;t really use <code>bat</code>, but if I did I&rsquo;d probably use <code>bat --theme ansi</code> to
just use the ANSI colours that I have set in my normal terminal colorscheme.</p>
<h3 id="problem-3-the-grays-in-solarized">problem 3: the grays in Solarized</h3>
<p>A bunch of people on Mastodon mentioned a specific issue with grays in the
Solarized theme: when I list a directory, the base16 Solarized Light theme
looks like this:</p>
<img src="https://jvns.ca/images/terminal-solarized-base16.png" style="max-width: 400px">
<p>but iTerm&rsquo;s default Solarized Light theme looks like this:</p>
<img src="https://jvns.ca/images/terminal-solarized-iterm.png" style="max-width: 400px">
<p>This is because in the iTerm theme (which is the <a href="https://ethanschoonover.com/solarized/#the-values">original Solarized design</a>), colors 9-14 (the &ldquo;bright blue&rdquo;, &ldquo;bright
red&rdquo;, etc) are mapped to a series of grays, and when I run <code>ls</code>, it&rsquo;s trying to
use those &ldquo;bright&rdquo; colours to color my directories and executables.</p>
<p>My best guess for why the original Solarized theme is designed this way is to
make the grays available to the <a href="https://github.com/altercation/vim-colors-solarized/blob/528a59f26d12278698bb946f8fb82a63711eec21/colors/solarized.vim">vim Solarized colorscheme</a>.</p>
<p>I&rsquo;m pretty sure I prefer the modified base16 version I use where the &ldquo;bright&rdquo;
colours are actually colours instead of all being shades of gray though. (I
didn&rsquo;t actually realize the version I was using wasn&rsquo;t the &ldquo;original&rdquo; Solarized
theme until I wrote this post)</p>
<p>In any case I really love Solarized and I&rsquo;m very happy it exists so that I can
use a modified version of it.</p>
<h3 id="problem-4-a-vim-theme-that-doesn-t-match-the-terminal-background">problem 4: a vim theme that doesn&rsquo;t match the terminal background</h3>
<p>If my vim theme has a different background colour than my terminal theme, I
get this ugly border, like this:</p>
<img src="https://jvns.ca/images/terminal-vim-black-bg.png" style="max-width: 400px">
<p>This one is a pretty minor issue though and I think making your terminal
background match your vim background is pretty straightforward.</p>
<h3 id="problem-5-programs-setting-a-background-color">problem 5: programs setting a background color</h3>
<p>A few people mentioned problems with terminal applications setting an
unwanted background colour, so let&rsquo;s look at an example of that.</p>
<p>Here <code>ngrok</code> has set the background to color #16 (&ldquo;black&rdquo;), but the
<code>base16-shell</code> script I use sets color 16 to be bright orange, so I get this,
which is pretty bad:</p>
<img src="https://jvns.ca/images/terminal-ngrok-solarized.png" style="max-width: 400px">
<p>I think the intention is for ngrok to look something like this:</p>
<img src="https://jvns.ca/images/terminal-ngrok-regular.png" style="max-width: 400px">
<p>I think <code>base16-shell</code> sets color #16 to orange (instead of black)
so that it can provide extra colours for use by <a href="https://github.com/chriskempson/base16-vim/blob/3be3cd82cd31acfcab9a41bad853d9c68d30478d/colors/base16-solarized-light.vim">base16-vim</a>.
This feels reasonable to me &ndash; I use <code>base16-vim</code> in the terminal, so I guess I&rsquo;m
using that feature and it&rsquo;s probably more important to me than <code>ngrok</code> (which I
rarely use) behaving a bit weirdly.</p>
<p>This particular issue is a maybe obscure clash between ngrok and my colorscheme,
but I think this kind of clash is pretty common when a program sets an ANSI
background color that the user has remapped for some reason.</p>
<h3 id="a-nice-solution-to-contrast-issues-minimum-contrast">a nice solution to contrast issues: &ldquo;minimum contrast&rdquo;</h3>
<p>A bunch of terminals (iTerm2, <a href="https://github.com/Eugeny/tabby">tabby</a>, kitty&rsquo;s <a href="https://sw.kovidgoyal.net/kitty/conf/#opt-kitty.text_fg_override_threshold">text_fg_override_threshold</a>, and
folks tell me also Ghostty and Windows Terminal) have a &ldquo;minimum
contrast&rdquo; feature that will automatically adjust colours to make sure they have enough contrast.</p>
<p>Here&rsquo;s an example from iTerm. This ngrok accident from before has pretty bad
contrast, I find it pretty difficult to read:</p>
<img src="https://jvns.ca/images/terminal-ngrok-solarized.png" style="max-width: 400px">
<p>With &ldquo;minimum contrast&rdquo; set to 40 in iTerm, it looks like this instead:</p>
<img src="https://jvns.ca/images/terminal-ngrok-solarized-contrast.png" style="max-width: 400px">
<p>I didn&rsquo;t have minimum contrast turned on before but I just turned it on today
because it makes such a big difference when something goes wrong with colours
in the terminal.</p>
<h3 id="problem-6-term-being-set-to-the-wrong-thing">problem 6: <code>TERM</code> being set to the wrong thing</h3>
<p>A few people mentioned that they&rsquo;ll SSH into a system that doesn&rsquo;t support the
<code>TERM</code> environment variable that they have set locally, and then the colours
won&rsquo;t work.</p>
<p>I think the way <code>TERM</code> works is that systems have a <code>terminfo</code> database, so if
the value of the <code>TERM</code> environment variable isn&rsquo;t in the system&rsquo;s terminfo
database, then it won&rsquo;t know how to output colours for that terminal. I don&rsquo;t
know too much about terminfo, but someone linked me to this <a href="https://twoot.site/@bean/113056942625234032">terminfo rant</a> that talks about a few other
issues with terminfo.</p>
<p>I don&rsquo;t have a system on hand to reproduce this one so I can&rsquo;t say for sure how
to fix it, but <a href="https://unix.stackexchange.com/questions/67537/prevent-ssh-client-passing-term-environment-variable-to-server">this Stack Exchange question</a>
suggests running something like <code>TERM=xterm ssh</code> instead of <code>ssh</code>.</p>
<h3 id="problem-7-picking-good-colours-is-hard">problem 7: picking &ldquo;good&rdquo; colours is hard</h3>
<p>A couple of problems people mentioned with designing / finding terminal colorschemes:</p>
<ul>
<li>some folks are colorblind and have trouble finding an appropriate colorscheme</li>
<li>accidentally making the background color too close to the cursor or selection color, so they&rsquo;re hard to find</li>
<li>generally finding colours that work with every program is a struggle (for example you can see me having a problem with this with ngrok above!)</li>
</ul>
<h3 id="problem-8-making-nethack-mc-look-right">problem 8: making nethack/mc look right</h3>
<p>Another problem people mentioned is using a program like nethack or midnight
commander which you might expect to have a specific colourscheme based on the
default ANSI terminal colours.</p>
<p>For example, midnight commander has a really specific classic look:</p>
<img src="https://jvns.ca/images/terminal-mc-normal.png" style="max-width: 200px">
<p>But in my Solarized theme, midnight commander looks like this:</p>
<img src="https://jvns.ca/images/terminal-mc-solarized.png" style="max-width: 200px">
<p>The Solarized version feels like it could be disorienting if you&rsquo;re
very used to the &ldquo;classic&rdquo; look.</p>
<p>One solution Simon Tatham mentioned to this is using some palette customization
ANSI codes (like the ones base16 uses that I talked about earlier) to change
the color palette right before starting the program, for example remapping
yellow to a brighter yellow before starting Nethack so that the yellow
characters look better.</p>
<h3 id="problem-9-commands-disabling-colours-when-writing-to-a-pipe">problem 9: commands disabling colours when writing to a pipe</h3>
<p>If I run <code>fd | less</code>, I see something like this, with the colours disabled.</p>
<img src="https://jvns.ca/images/terminal-fd-bw.png" style="max-width: 300px">
<p>In general I find this useful &ndash; if I pipe a command to <code>grep</code>, I don&rsquo;t want it
to print out all those color escape codes, I just want the plain text. But what if you want to see the colours?</p>
<p>To see the colours, you can run <code>unbuffer fd | less -r</code>! I just learned about
<code>unbuffer</code> recently and I think it&rsquo;s really cool, <code>unbuffer</code> opens a tty for the
command to write to so that it thinks it&rsquo;s writing to a TTY. It also fixes
issues with programs buffering their output when writing to a pipe, which is
why it&rsquo;s called <code>unbuffer</code>.</p>
<p>Here&rsquo;s what the output of <code>unbuffer fd | less -r</code> looks like for me:</p>
<img src="https://jvns.ca/images/terminal-fd-color.png" style="max-width: 300px">
<p>Also some commands (including <code>fd</code>) support a <code>--color=always</code> flag which will
force them to always print out the colours.</p>
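<p>Here&rsquo;s a minimal Python sketch of how a program might make this decision. I don&rsquo;t know how <code>fd</code> actually implements it, so treat this as an illustration of the general pattern: the function name <code>should_use_color</code> and the <code>flag</code> parameter are made up, and the <code>NO_COLOR</code> check follows the convention described at no-color.org.</p>
<pre><code class="language-python">import os
import sys

def should_use_color(stream=None, flag='auto', env=None):
    '''Decide whether to print colour escape codes.

    flag mimics a --color=always/never/auto command line option.
    '''
    stream = sys.stdout if stream is None else stream
    env = os.environ if env is None else env
    if flag == 'always':
        return True
    if flag == 'never':
        return False
    # NO_COLOR convention: any non-empty value turns colour off.
    if env.get('NO_COLOR'):
        return False
    # Pipes and files are not TTYs, so colour is off unless forced.
    return stream.isatty()

text = 'hello'
if should_use_color():
    text = f'\033[32m{text}\033[0m'
print(text)
</code></pre>
<p>If you save this to a file and pipe its output to <code>less</code>, the <code>isatty()</code> check fails and you get plain text. Running it under <code>unbuffer</code> should bring the colour back, because then stdout looks like a TTY again.</p>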
<h3 id="problem-10-unwanted-colour-in-ls-and-other-commands">problem 10: unwanted colour in <code>ls</code> and other commands</h3>
<p>Some people mentioned that they don&rsquo;t want <code>ls</code> to use colour at all, perhaps
because <code>ls</code> uses blue, it&rsquo;s hard to read on black, and maybe they don&rsquo;t feel like
customizing their terminal&rsquo;s colourscheme to make the blue more readable or
just don&rsquo;t find the use of colour helpful.</p>
<p>Some possible solutions to this one:</p>
<ul>
<li>you can run <code>ls --color=never</code>, which is probably easiest</li>
<li>you can also set <code>LS_COLORS</code> to customize the colours used by <code>ls</code>. I think some other programs other than <code>ls</code> support the <code>LS_COLORS</code> environment variable too.</li>
<li>also some programs support setting <code>NO_COLOR=true</code> (there&rsquo;s a <a href="https://no-color.org/">list here</a>)</li>
</ul>
<p>Here&rsquo;s an example of running <code>LS_COLORS=&quot;fi=0:di=0:ln=0:pi=0:so=0:bd=0:cd=0:or=0:ex=0&quot; ls</code>:</p>
<img src="https://jvns.ca/images/terminal-ls-colors.png" style="max-width: 500px">
<h3 id="problem-11-the-colours-in-vim">problem 11: the colours in vim</h3>
<p>I used to have a lot of problems with configuring my colours in vim &ndash; I&rsquo;d set
up my terminal colours in a way that I thought was okay, and then I&rsquo;d start vim
and it would just be a disaster.</p>
<p>I think what was going on here is that today, there are two ways to set up a vim colorscheme in the terminal:</p>
<ol>
<li>using your ANSI terminal colours &ndash; you tell vim which ANSI colour number to use for the background, for functions, etc.</li>
<li>using 24-bit hex colours &ndash; instead of ANSI terminal colours, the vim colorscheme can use hex codes like #faea99 directly</li>
</ol>
<p>20 years ago when I started using vim, terminals with 24-bit hex color support
were a lot less common (or maybe they didn&rsquo;t exist at all), and vim certainly
didn&rsquo;t have support for using 24-bit colour in the terminal. From some quick
searching through git, it looks like <a href="https://github.com/vim/vim/commit/8a633e3427b47286869aa4b96f2bfc1fe65b25cd">vim added support for 24-bit colour in 2016</a>
&ndash; just 8 years ago!</p>
<p>So to get colours to work properly in vim before 2016, you needed to synchronize
your terminal colorscheme and your vim colorscheme. <a href="https://github.com/chriskempson/base16-vim/blob/3be3cd82cd31acfcab9a41bad853d9c68d30478d/colors/base16-solarized-light.vim#L52-L71">Here&rsquo;s what that looked like</a>,
the colorscheme needed to map the vim color classes like <code>cterm05</code> to ANSI colour numbers.</p>
<p>But in 2024, the story is really different! Vim (and Neovim, which I use now)
support 24-bit colours, and as of Neovim 0.10 (released in May 2024), the
<code>termguicolors</code> setting (which tells Vim to use 24-bit hex colours for
colorschemes) is <a href="https://neovim.io/doc/user/news-0.10.html">turned on by default</a> in any terminal with 24-bit
color support.</p>
<p>So this &ldquo;you need to synchronize your terminal colorscheme and your vim
colorscheme&rdquo; problem is not an issue anymore for me in 2024, since I
don&rsquo;t plan to use terminals without 24-bit color support in the future.</p>
<p>The biggest consequence for me of this whole thing is that I don&rsquo;t need base16
to set colors 16-21 to weird stuff anymore to integrate with vim &ndash; I can just
use a terminal theme and a vim theme, and as long as the two themes use similar
colours (so it&rsquo;s not jarring for me to switch between them) there&rsquo;s no problem.
I think I can just remove those parts from my <code>base16</code> shell script and totally
avoid the problem with ngrok and the weird orange background I talked about
above.</p>
<h3 id="some-more-problems-i-left-out">some more problems I left out</h3>
<p>I think there are a lot of issues around the intersection of multiple programs,
like using some combination of tmux/ssh/vim that I couldn&rsquo;t figure out how to
reproduce well enough to talk about them. Also I&rsquo;m sure I missed a lot of other
things too.</p>
<h3 id="base16-has-really-worked-for-me">base16 has really worked for me</h3>
<p>I&rsquo;ve personally had a lot of success with using
<a href="https://github.com/chriskempson/base16-shell">base16-shell</a> with
<a href="https://github.com/chriskempson/base16-vim">base16-vim</a> &ndash; I just need to add <a href="https://github.com/chriskempson/base16-shell?tab=readme-ov-file#fish">a couple of lines</a> to my
fish config to set it up (+ a few <code>.vimrc</code> lines) and then I can move on and
accept any remaining problems that that doesn&rsquo;t solve.</p>
<p>I don&rsquo;t think base16 is for everyone though, some limitations I&rsquo;m aware
of with base16 that might make it not work for you:</p>
<ul>
<li>it comes with a limited set of builtin themes and you might not like any of them</li>
<li>the Solarized base16 theme (and maybe all of the themes?) sets the &ldquo;bright&rdquo;
ANSI colours to be exactly the same as the normal colours, which might cause
a problem if you&rsquo;re relying on the &ldquo;bright&rdquo; colours to be different from the
regular ones</li>
<li>it sets colours 16-21 in order to give the vim colorschemes from <code>base16-vim</code>
access to more colours, which might not be relevant if you always use a
terminal with 24-bit color support, and can cause problems like the ngrok
issue above</li>
<li>also the way it sets colours 16-21 could be a problem in terminals that don&rsquo;t
have 256-color support, like the linux framebuffer terminal</li>
</ul>
<p>Apparently there&rsquo;s a community fork of base16 called
<a href="https://github.com/tinted-theming/home">tinted-theming</a>, which I haven&rsquo;t
looked into much yet.</p>
<h3 id="some-other-colorscheme-tools">some other colorscheme tools</h3>
<p>Just one so far but I&rsquo;ll link more if people tell me about them:</p>
<ul>
<li><a href="https://rootloops.sh/">rootloops.sh</a> for generating colorschemes (and <a href="https://hamvocke.com/blog/lets-create-a-terminal-color-scheme/">&ldquo;let&rsquo;s create a terminal color scheme&rdquo;</a>)</li>
<li>Some popular colorschemes (according to people I asked on Mastodon): <a href="https://catppuccin.com/">Catppuccin</a>, Monokai, Gruvbox, <a href="https://github.com/dracula">Dracula</a>, <a href="https://protesilaos.com/emacs/modus-themes">Modus (a high contrast theme)</a>, <a href="https://github.com/folke/tokyonight.nvim">Tokyo Night</a>, <a href="https://www.nordtheme.com/">Nord</a>, <a href="https://rosepinetheme.com/">Rosé Pine</a></li>
</ul>
<h3 id="okay-that-was-a-lot">okay, that was a lot</h3>
<p>We talked about a lot in this post, and while I think learning about all these
details is kind of fun if I&rsquo;m in the mood to do a deep dive, I find it SO
FRUSTRATING to deal with it when I just want my colours to work! Being
surprised by unreadable text and having to find a workaround is just not my
idea of a good day.</p>
<p>Personally I&rsquo;m a zero-configuration kind of person and it&rsquo;s not that appealing
to me to have to put together a lot of custom configuration just to make my
colours in the terminal look acceptable. I&rsquo;d much rather just have some
reasonable defaults that I don&rsquo;t have to change.</p>
<h3 id="minimum-contrast-seems-like-an-amazing-feature">minimum contrast seems like an amazing feature</h3>
<p>My one big takeaway from writing this was to turn on &ldquo;minimum contrast&rdquo; in my
terminal, I think it&rsquo;s going to fix most of the occasional accidental
unreadable text issues I run into and I&rsquo;m pretty excited about it.</p>
]]></content>
    </entry>
    <entry>
        <title type="html"><![CDATA[Some Go web dev notes]]></title>
        <link href="https://jvns.ca/blog/2024/09/27/some-go-web-dev-notes/"/>
        <updated>2024-09-27T11:16:00+00:00</updated>
        <id>https://jvns.ca/blog/2024/09/27/some-go-web-dev-notes/</id>
        <content type="html"><![CDATA[<p>I spent a lot of time in the past couple of weeks working on a website in Go
that may or may not ever see the light of day, but I learned a couple of things
along the way I wanted to write down. Here they are:</p>
<h3 id="go-1-22-now-has-better-routing">go 1.22 now has better routing</h3>
<p>I&rsquo;ve never felt motivated to learn any of the Go routing libraries
(gorilla/mux, chi, etc), so I&rsquo;ve been doing all my routing by hand, like this.</p>
<pre><code>	// DELETE /records:
	case r.Method == &quot;DELETE&quot; &amp;&amp; n == 1 &amp;&amp; p[0] == &quot;records&quot;:
		if !requireLogin(username, r.URL.Path, r, w) {
			return
		}
		deleteAllRecords(ctx, username, rs, w, r)
	// POST /records/&lt;ID&gt;
	case r.Method == &quot;POST&quot; &amp;&amp; n == 2 &amp;&amp; p[0] == &quot;records&quot; &amp;&amp; len(p[1]) &gt; 0:
		if !requireLogin(username, r.URL.Path, r, w) {
			return
		}
		updateRecord(ctx, username, p[1], rs, w, r)

</code></pre>
<p>But apparently <a href="https://go.dev/blog/routing-enhancements">as of Go 1.22</a>, Go
now has better support for routing in the standard library, so that code can be
rewritten something like this:</p>
<pre><code>	mux.HandleFunc(&quot;DELETE /records/&quot;, app.deleteAllRecords)
	mux.HandleFunc(&quot;POST /records/{record_id}&quot;, app.updateRecord)
</code></pre>
<p>Though it would also need a login middleware, so maybe something more like
this, with a <code>requireLogin</code> middleware.</p>
<pre><code>	mux.Handle(&quot;DELETE /records/&quot;, requireLogin(http.HandlerFunc(app.deleteAllRecords)))
</code></pre>
<h3 id="a-gotcha-with-the-built-in-router-redirects-with-trailing-slashes">a gotcha with the built-in router: redirects with trailing slashes</h3>
<p>One annoying gotcha I ran into was: if I make a route for <code>/records/</code>, then a
request for <code>/records</code> <a href="https://pkg.go.dev/net/http#hdr-Trailing_slash_redirection-ServeMux">will be redirected</a> to <code>/records/</code>.</p>
<p>I ran into an issue with this where sending a POST request to <code>/records</code>
redirected to a GET request for <code>/records/</code>, which broke the POST request
because it removed the request body. Thankfully <a href="https://xeiaso.net/blog/go-servemux-slash-2021-11-04/">Xe Iaso wrote a blog post about the exact same issue</a> which made it
easier to debug.</p>
<p>I think the solution to this is just to use API endpoints like <code>POST /records</code>
instead of <code>POST /records/</code>, which seems like a more normal design anyway.</p>
<h3 id="sqlc-automatically-generates-code-for-my-db-queries">sqlc automatically generates code for my db queries</h3>
<p>I got a little bit tired of writing so much boilerplate for my SQL queries, but
I didn&rsquo;t really feel like learning an ORM, because I know what SQL queries I
want to write, and I didn&rsquo;t feel like learning the ORM&rsquo;s conventions for
translating things into SQL queries.</p>
<p>But then I found <a href="https://sqlc.dev/">sqlc</a>, which will compile a query like this:</p>
<pre><code>
-- name: GetVariant :one
SELECT *
FROM variants
WHERE id = ?;

</code></pre>
<p>into Go code like this:</p>
<pre><code>const getVariant = `-- name: GetVariant :one
SELECT id, created_at, updated_at, disabled, product_name, variant_name
FROM variants
WHERE id = ?
`

func (q *Queries) GetVariant(ctx context.Context, id int64) (Variant, error) {
	row := q.db.QueryRowContext(ctx, getVariant, id)
	var i Variant
	err := row.Scan(
		&amp;i.ID,
		&amp;i.CreatedAt,
		&amp;i.UpdatedAt,
		&amp;i.Disabled,
		&amp;i.ProductName,
		&amp;i.VariantName,
	)
	return i, err
}
</code></pre>
<p>What I like about this is that if I&rsquo;m ever unsure about what Go code to write
for a given SQL query, I can just write the query I want, read the generated
function and it&rsquo;ll tell me exactly what to do to call it. It feels much easier
to me than trying to dig through the ORM&rsquo;s documentation to figure out how to
construct the SQL query I want.</p>
<p>Reading <a href="https://brandur.org/fragments/sqlc-2024">Brandur&rsquo;s sqlc notes from 2024</a> also gave me some confidence
that this is a workable path for my tiny programs. That post gives a really
helpful example of how to conditionally update fields in a table using CASE
statements (for example if you have a table with 20 columns and you only want
to update 3 of them).</p>
<h3 id="sqlite-tips">sqlite tips</h3>
<p>Someone on Mastodon linked me to this post called <a href="https://kerkour.com/sqlite-for-servers">Optimizing sqlite for servers</a>. My projects are small and I&rsquo;m
not so concerned about performance, but my main takeaways were:</p>
<ul>
<li>have a dedicated object for <strong>writing</strong> to the database, and run
<code>db.SetMaxOpenConns(1)</code> on it. I learned the hard way that if I don&rsquo;t do this
then I&rsquo;ll get <code>SQLITE_BUSY</code> errors from two threads trying to write to the db
at the same time.</li>
<li>if I want to make reads faster, I could have 2 separate db objects, one for writing and one for reading</li>
</ul>
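<p>A rough sketch of the two-handle setup is below. To keep it runnable with only the standard library it uses a made-up <code>stubConnector</code> in place of a real SQLite driver, so it never touches a database. With a real driver, both handles would come from <code>sql.Open</code> pointed at the same file. <code>openPools</code> is also a hypothetical name.</p>

```go
package main

import (
	"context"
	"database/sql"
	"database/sql/driver"
	"errors"
	"fmt"
)

// stubConnector stands in for a real SQLite driver so that this sketch runs
// with only the standard library. It never opens a real connection.
type stubConnector struct{}

func (stubConnector) Connect(context.Context) (driver.Conn, error) {
	return nil, errors.New("stub: no real database")
}

func (stubConnector) Driver() driver.Driver { return nil }

// openPools returns one handle for writes and one for reads.
func openPools() (writer, reader *sql.DB) {
	writer = sql.OpenDB(stubConnector{})
	writer.SetMaxOpenConns(1) // all writes share a single connection
	reader = sql.OpenDB(stubConnector{})
	return writer, reader
}

func main() {
	writer, reader := openPools()
	defer writer.Close()
	defer reader.Close()
	// MaxOpenConnections is 0 when no limit has been set.
	fmt.Println("writer max open conns:", writer.Stats().MaxOpenConnections)
	fmt.Println("reader max open conns:", reader.Stats().MaxOpenConnections)
}
```

<p>The idea is that every write goes through <code>writer</code>, so two goroutines can&rsquo;t try to write on separate connections at the same time, while reads can still use as many connections as they like.</p>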
<p>There are a few more tips in that post that seem useful (like &ldquo;COUNT queries are
slow&rdquo; and &ldquo;Use STRICT tables&rdquo;), but I haven&rsquo;t done those yet.</p>
<p>Also sometimes if I have two tables where I know I&rsquo;ll never need to do a <code>JOIN</code>
between them, I&rsquo;ll just put them in separate databases so that I can connect
to them independently.</p>
<h3 id="go-1-19-introduced-a-way-to-set-a-gc-memory-limit">Go 1.19 introduced a way to set a GC memory limit</h3>
<p>I run all of my Go projects in VMs with relatively little memory, like 256MB or
512MB. I ran into an issue where my application kept getting OOM killed and it
was confusing &ndash; did I have a memory leak? What?</p>
<p>After some Googling, I realized that maybe I didn&rsquo;t have a memory leak, maybe I
just needed to reconfigure the garbage collector! It turns out that by default (according to <a href="https://tip.golang.org/doc/gc-guide">A Guide to the Go Garbage Collector</a>), Go&rsquo;s garbage collector will
let the application allocate memory up to <strong>2x</strong> the current heap size.</p>
<p><a href="https://messwithdns.net">Mess With DNS</a>&rsquo;s base heap size is around 170MB and
the amount of memory free on the VM is around 160MB right now, so if its memory
use doubled, it would get OOM killed.</p>
<p>In Go 1.19, they added a way to tell Go &ldquo;hey, if the application starts using
this much memory, run a GC&rdquo;. So I set the GC memory limit to 250MB and it seems
to have resulted in the application getting OOM killed less often:</p>
<pre><code>export GOMEMLIMIT=250MiB
</code></pre>
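<p>The same limit can also be set from inside the program with <code>debug.SetMemoryLimit</code> from <code>runtime/debug</code> (also added in Go 1.19). The sketch below assumes the 250MiB figure from above, and <code>setLimitMiB</code> is a made-up helper name. As a rough worked example of the default behaviour: with a live heap of about 170MB and the 2x rule, the next collection would not be triggered until around 340MB, which is more than the VM has to spare.</p>

```go
package main

import (
	"fmt"
	"runtime/debug"
)

// setLimitMiB sets the runtime's soft memory limit and returns the limit
// that is now in effect, in bytes.
func setLimitMiB(mib int64) int64 {
	debug.SetMemoryLimit(mib * 1024 * 1024)
	// A negative argument leaves the limit unchanged and just reports it.
	return debug.SetMemoryLimit(-1)
}

func main() {
	fmt.Println("memory limit in bytes:", setLimitMiB(250))
}
```

<p>The limit is a soft one: the garbage collector works harder as the program approaches it, but the runtime can still go over it if the live heap really is that large.</p>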
<h3 id="some-reasons-i-like-making-websites-in-go">some reasons I like making websites in Go</h3>
<p>I&rsquo;ve been making tiny websites (like the <a href="https://nginx-playground.wizardzines.com/">nginx playground</a>) in Go on and off for the last 4 years or so and it&rsquo;s really been working for me. I think I like it because:</p>
<ul>
<li>there&rsquo;s just 1 static binary, all I need to do to deploy it is copy the binary. If there are static files I can just embed them in the binary with <a href="https://pkg.go.dev/embed">embed</a>.</li>
<li>there&rsquo;s a built-in webserver that&rsquo;s okay to use in production, so I don&rsquo;t need to configure WSGI or whatever to get it to work. I can just put it behind <a href="https://caddyserver.com/">Caddy</a> or run it on fly.io or whatever.</li>
<li>Go&rsquo;s toolchain is very easy to install, I can just do <code>apt-get install golang-go</code> or whatever and then a <code>go build</code> will build my project</li>
<li>it feels like there&rsquo;s very little to remember to start sending HTTP responses
&ndash; basically there are just functions like <code>Serve(w http.ResponseWriter, r *http.Request)</code> which read the request and send a response. If I need to
remember some detail of how exactly that&rsquo;s accomplished, I just have to read
the function!</li>
<li>also <code>net/http</code> is in the standard library, so you can start making websites
without installing any libraries at all. I really appreciate this one.</li>
<li>Go is a pretty systems-y language, so if I need to run an <code>ioctl</code> or
something that&rsquo;s easy to do</li>
</ul>
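<p>To make the &ldquo;very little to remember&rdquo; point concrete, here is a small sketch of a complete handler using only the standard library. The <code>hello</code> and <code>get</code> names and the <code>name</code> query parameter are invented for the example, and the handler is called in memory with <code>httptest</code> so that nothing has to listen on a port.</p>

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// hello reads one query parameter from the request and writes a response.
func hello(w http.ResponseWriter, r *http.Request) {
	name := r.URL.Query().Get("name")
	if name == "" {
		name = "world"
	}
	fmt.Fprintf(w, "hello, %s", name)
}

// get calls the handler in memory, without opening a network port.
func get(target string) string {
	rec := httptest.NewRecorder()
	hello(rec, httptest.NewRequest("GET", target, nil))
	return rec.Body.String()
}

func main() {
	fmt.Println(get("/?name=bork"))
	// To serve real traffic instead, something like:
	//   http.HandleFunc("/", hello)
	//   log.Fatal(http.ListenAndServe(":8080", nil))
}
```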
<p>In general everything about it feels like it makes projects easy to work on for
5 days, abandon for 2 years, and then get back into writing code without a lot
of problems.</p>
<p>For contrast, I&rsquo;ve tried to learn Rails a couple of times and I really <em>want</em>
to love Rails &ndash; I&rsquo;ve made a couple of toy websites in Rails and it&rsquo;s always
felt like a really magical experience. But ultimately when I come back to those
projects I can&rsquo;t remember how anything works and I just end up giving up. It
feels easier to me to come back to my Go projects that are full of a lot of
repetitive boilerplate, because at least I can read the code and figure out how
it works.</p>
<h3 id="things-i-haven-t-figured-out-yet">things I haven&rsquo;t figured out yet</h3>
<p>some things I haven&rsquo;t done much of yet in Go:</p>
<ul>
<li>rendering HTML templates: usually my Go servers are just APIs and I make the
frontend a single-page app with Vue. I&rsquo;ve used <code>html/template</code> a lot in Hugo (which I&rsquo;ve used for this blog for the last 8 years)
but I&rsquo;m still not sure how I feel about it.</li>
<li>I&rsquo;ve never made a real login system, usually my servers don&rsquo;t have users at all.</li>
<li>I&rsquo;ve never tried to implement CSRF</li>
</ul>
<p>In general I&rsquo;m not sure how to implement security-sensitive features so I don&rsquo;t
start projects which need login/CSRF/etc. I imagine this is where a framework
would help.</p>
<h3 id="it-s-cool-to-see-the-new-features-go-has-been-adding">it&rsquo;s cool to see the new features Go has been adding</h3>
<p>Both of the Go features I mentioned in this post (<code>GOMEMLIMIT</code> and the routing)
are new in the last couple of years and I didn&rsquo;t notice when they came out. It
makes me think I should pay closer attention to the release notes for new Go
versions.</p>
]]></content>
    </entry>
    <entry>
        <title type="html"><![CDATA[Reasons I still love the fish shell]]></title>
        <link href="https://jvns.ca/blog/2024/09/12/reasons-i--still--love-fish/"/>
        <updated>2024-09-12T15:09:12+00:00</updated>
        <id>https://jvns.ca/blog/2024/09/12/reasons-i--still--love-fish/</id>
        <content type="html"><![CDATA[<p>I wrote about how much I love <a href="https://fishshell.com/">fish</a> in <a href="https://jvns.ca/blog/2017/04/23/the-fish-shell-is-awesome/">this blog post from 2017</a> and, 7 years
of using it every day later, I&rsquo;ve found even more reasons to love it. So I
thought I&rsquo;d write a new post with both the old reasons I loved it and some
new reasons.</p>
<p>This came up today because I was trying to figure out why my terminal doesn&rsquo;t
break anymore when I cat a binary to my terminal, the answer was &ldquo;fish fixes
the terminal!&rdquo;, and I just thought that was really nice.</p>
<h3 id="1-no-configuration">1. no configuration</h3>
<p>In 10 years of using fish I have never found a single thing I wanted to configure. It just works the way I want. My fish config file just has:</p>
<ul>
<li>environment variables</li>
<li>aliases (<code>alias ls eza</code>, <code>alias vim nvim</code>, etc)</li>
<li>the occasional <code>direnv hook fish | source</code> to integrate a tool like direnv</li>
<li>a script I run to set up my <a href="https://github.com/chriskempson/base16-shell/blob/588691ba71b47e75793ed9edfcfaa058326a6f41/scripts/base16-solarized-light.sh">terminal colours</a></li>
</ul>
<p>I&rsquo;ve been told that configuring things in fish is really easy if you ever do
want to configure something though.</p>
<h3 id="2-autosuggestions-from-my-shell-history">2. autosuggestions from my shell history</h3>
<p>My absolute favourite thing about fish is that as I type, it’ll automatically
suggest (in light grey) a matching command that I ran recently. I can press the
right arrow key to accept the completion, or keep typing to ignore it.</p>
<p>Here’s what that looks like. In this example I just typed the “v” key and it
guessed that I want to run the previous vim command again.</p>
<img src="https://jvns.ca/images/fish-2024.png">
<h3 id="2-5-smart-shell-autosuggestions">2.5 &ldquo;smart&rdquo; shell autosuggestions</h3>
<p>One of my favourite subtle autocomplete features is how fish handles autocompleting commands that contain paths in them. For example, if I run:</p>
<pre><code>$ ls blah.txt
</code></pre>
<p>that command will only be autocompleted in directories that contain <code>blah.txt</code> &ndash; it won&rsquo;t show up in a different directory. (here&rsquo;s <a href="https://github.com/fish-shell/fish-shell/issues/120#issuecomment-6376019">a short comment about how it works</a>)</p>
<p>As an example, if in this directory I type <code>bash scripts/</code>, it&rsquo;ll only suggest
history commands including files that <em>actually exist</em> in my blog&rsquo;s scripts
folder, and not the dozens of other irrelevant <code>scripts/</code> commands I&rsquo;ve run in
other folders.</p>
<p>I didn&rsquo;t understand exactly how this worked until last week, it just felt like fish was
magically able to suggest the right commands. It still feels a little like magic and I love it.</p>
<h3 id="3-pasting-multiline-commands">3. pasting multiline commands</h3>
<p>If I copy and paste multiple lines, bash will run them all, like this:</p>
<pre><code>[bork@grapefruit linux-playground (main)]$ echo hi
hi
[bork@grapefruit linux-playground (main)]$ touch blah
[bork@grapefruit linux-playground (main)]$ echo hi
hi
</code></pre>
<p>This is a bit alarming &ndash; what if I didn&rsquo;t actually <em>want</em> to run all those
commands?</p>
<p>Fish will paste them all at a single prompt, so that I can press Enter if I
actually want to run them. Much less scary.</p>
<pre><code>bork@grapefruit ~/work/&gt; echo hi

                         touch blah
                         echo hi
</code></pre>
<h3 id="4-nice-tab-completion">4. nice tab completion</h3>
<p>If I run <code>ls</code> and press tab, it&rsquo;ll display all the filenames in a nice grid. I can use either Tab, Shift+Tab, or the arrow keys to navigate the grid.</p>
<p>Also, I can tab complete from the <strong>middle</strong> of a filename &ndash; if the filename
starts with a weird character (or if it&rsquo;s just not very unique), I can type
some characters from the middle and press tab.</p>
<p>Here&rsquo;s what the tab completion looks like:</p>
<pre><code>bork@grapefruit ~/work/&gt; ls 
api/  blah.py     fly.toml   README.md
blah  Dockerfile  frontend/  test_websocket.sh
</code></pre>
<p>I honestly don&rsquo;t complete things other than filenames very much so I can&rsquo;t
speak to that, but I&rsquo;ve found the experience of tab completing filenames to be
very good.</p>
<h3 id="5-nice-default-prompt-including-git-integration">5. nice default prompt (including git integration)</h3>
<p>Fish&rsquo;s default prompt includes everything I want:</p>
<ul>
<li>username</li>
<li>hostname</li>
<li>current folder</li>
<li>git integration</li>
<li>status of last command exit (if the last command failed)</li>
</ul>
<p>Here&rsquo;s a screenshot with a few different variations on the default prompt,
including if the last command was interrupted (the <code>SIGINT</code>) or failed.</p>
<img src="https://jvns.ca/images/fish-prompt-2024.png">
<h3 id="6-nice-history-defaults">6. nice history defaults</h3>
<p>In bash, the maximum history size is 500 by default, presumably because
computers used to be slow and not have a lot of disk space. Also, by default,
commands don&rsquo;t get added to your history until you end your session. So if your
computer crashes, you lose some history.</p>
<p>In fish:</p>
<ol>
<li>the default history size is 256,000 commands. I don&rsquo;t see any reason I&rsquo;d ever need more.</li>
<li>if you open a new tab, everything you&rsquo;ve ever run (including commands in
open sessions) is immediately available to you</li>
<li>in an existing session, the history search will only include commands from
the current session, plus everything that was in history at the time that
you started the shell</li>
</ol>
<p>I&rsquo;m not sure how clearly I&rsquo;m explaining how fish&rsquo;s history system works here,
but it feels really good to me in practice. My impression is that the way it&rsquo;s
implemented is the commands are continually added to the history file, but fish
only loads the history file once, on startup.</p>
<p>I&rsquo;ll mention here that if you want to have a fancier history system in another
shell it might be worth checking out <a href="https://github.com/atuinsh/atuin">atuin</a> or <a href="https://github.com/junegunn/fzf">fzf</a>.</p>
<h3 id="7-press-up-arrow-to-search-history">7. press up arrow to search history</h3>
<p>I also like fish&rsquo;s interface for searching history: for example if I want to
edit my fish config file, I can just type:</p>
<pre><code>$ config.fish
</code></pre>
<p>and then press the up arrow to go back to the last command that included <code>config.fish</code>. That&rsquo;ll complete to:</p>
<pre><code>$ vim ~/.config/fish/config.fish
</code></pre>
<p>and I&rsquo;m done. This isn&rsquo;t <em>so</em> different from using <code>Ctrl+R</code> in bash to search
your history but I think I like it a little better over all, maybe because
<code>Ctrl+R</code> has some behaviours that I find confusing (for example you can
end up accidentally editing your history which I don&rsquo;t like).</p>
<h3 id="8-the-terminal-doesn-t-break">8. the terminal doesn&rsquo;t break</h3>
<p>I used to run into issues with bash where I&rsquo;d accidentally <code>cat</code> a binary to
the terminal, and it would break the terminal.</p>
<p>Every time fish displays a prompt, it&rsquo;ll try to fix up your terminal so that
you don&rsquo;t end up in weird situations like this. I think <a href="https://github.com/fish-shell/fish-shell/blob/a979b6341d7fc4c466b3992f25da3209e0808aaa/src/reader.rs#L3601-L3623">this is some of the
code in fish to prevent broken terminals</a>.</p>
<p>Some things that it does are:</p>
<ul>
<li>turn on <code>echo</code> so that you can see the characters you type</li>
<li>make sure that newlines work properly so that you don&rsquo;t get that weird staircase effect</li>
<li>reset your terminal background colour, etc</li>
</ul>
<p>I don&rsquo;t think I&rsquo;ve run into any of these &ldquo;my terminal is broken&rdquo; issues in a
very long time, and I actually didn&rsquo;t even realize that this was because of
fish &ndash; I thought that things somehow magically just got better, or maybe I
wasn&rsquo;t making as many mistakes. But I think it was mostly fish saving me from
myself, and I really appreciate that.</p>
<h3 id="9-ctrl-s-is-disabled">9. Ctrl+S is disabled</h3>
<p>Also related to terminals breaking: fish disables Ctrl+S (which freezes your
terminal and then you need to remember to press Ctrl+Q to unfreeze it). It&rsquo;s a
feature that I&rsquo;ve never wanted and I&rsquo;m happy to not have it.</p>
<p>Apparently you can disable <code>Ctrl+S</code> in other shells with <code>stty -ixon</code>.</p>
<h3 id="10-nice-syntax-highlighting">10. nice syntax highlighting</h3>
<p>By default commands that don&rsquo;t exist are highlighted in red, like this.</p>
<img src="https://jvns.ca/images/fish-syntax-2024.png">
<h3 id="11-easier-loops">11. easier loops</h3>
<p>I find the loop syntax in fish a lot easier to type than the bash syntax. It looks like this:</p>
<pre><code>for i in *.yaml
  echo $i
end
</code></pre>
<p>Also it&rsquo;ll add indentation in your loops which is nice.</p>
<h3 id="12-easier-multiline-editing">12. easier multiline editing</h3>
<p>Related to loops: you can edit multiline commands much more easily than in bash
(just use the arrow keys to navigate the multiline command!). Also when you use
the up arrow to get a multiline command from your history, it&rsquo;ll show you the
whole command the exact same way you typed it instead of squishing it all onto
one line like bash does:</p>
<pre><code>$ bash
$ for i in *.png
&gt; do
&gt; echo $i
&gt; done
$ # press up arrow
$ for i in *.png; do echo $i; done
</code></pre>
<h3 id="13-ctrl-left-arrow">13. Ctrl+left arrow</h3>
<p>This might just be me, but I really appreciate that fish has the <code>Ctrl+left arrow</code> / <code>Ctrl+right arrow</code> keyboard shortcut for moving between
words when writing a command.</p>
<p>I&rsquo;m honestly a bit confused about where this keyboard shortcut is coming from
(the only documented keyboard shortcut for this I can find in fish is <code>Alt+left arrow</code> / <code>Alt+right arrow</code> which seems to do the same thing), but I&rsquo;m pretty
sure this is a fish shortcut.</p>
<p>A couple of notes about getting this shortcut to work / where it comes from:</p>
<ul>
<li>one person said they needed to switch their terminal emulator from the &ldquo;Linux
console&rdquo; keybindings to &ldquo;Default (XFree 4)&rdquo; to get it to work in fish</li>
<li>on Mac OS, <code>Ctrl+left arrow</code> switches workspaces by default, so I had to turn
that off.</li>
<li>Also apparently Ubuntu configures libreadline in <code>/etc/inputrc</code> to make
<code>Ctrl+left/right arrow</code> go back/forward a word, so it&rsquo;ll work in bash on
Ubuntu and maybe other Linux distros too. Here&rsquo;s a <a href="https://stackoverflow.com/questions/5029118/bash-ctrl-to-move-cursor-between-words-strings">stack overflow question talking about that</a></li>
</ul>
<h3 id="a-downside-not-everything-has-a-fish-integration">a downside: not everything has a fish integration</h3>
<p>Sometimes tools don&rsquo;t have instructions for integrating them with fish. That&rsquo;s annoying, but:</p>
<ul>
<li>I&rsquo;ve found this has gotten better over the last 10 years as fish has gotten
more popular. For example Python&rsquo;s virtualenv has had a fish integration for
a long time now.</li>
<li>If I need to run a POSIX shell command real quick, I can always just run <code>bash</code> or <code>zsh</code></li>
<li>I&rsquo;ve gotten much better over the years at translating simple commands to fish syntax when I need to</li>
</ul>
<p>My biggest day-to-day annoyance is probably that for whatever reason I&rsquo;m
still not used to fish&rsquo;s syntax for setting environment variables, I get confused
about <code>set</code> vs <code>set -x</code>.</p>
<h3 id="another-downside-fish-add-path">another downside: <code>fish_add_path</code></h3>
<p>fish has a function called <code>fish_add_path</code> that you can run to add a directory
to your <code>PATH</code> like this:</p>
<pre><code>fish_add_path /some/directory
</code></pre>
<p>I love the idea of it and I used to use it all the time, but I&rsquo;ve stopped using
it for two reasons:</p>
<ol>
<li>Sometimes <code>fish_add_path</code> will update the <code>PATH</code> for every session in the
future (with a &ldquo;universal variable&rdquo;) and sometimes it will update the <code>PATH</code>
just for the current session. It&rsquo;s hard for me to tell which one it will
do: in theory the docs explain this but I could not understand them.</li>
<li>If you ever need to <em>remove</em> the directory from your <code>PATH</code> a few weeks or
months later because maybe you made a mistake, that&rsquo;s also kind of hard to do
(there are <a href="https://github.com/fish-shell/fish-shell/issues/8604">instructions in the comments of this GitHub issue though</a>).</li>
</ol>
<p>Instead I just update my PATH like this, similarly to how I&rsquo;d do it in bash:</p>
<pre><code>set PATH $PATH /some/directory/bin
</code></pre>
<h3 id="on-posix-compatibility">on POSIX compatibility</h3>
<p>When I started using fish, you couldn&rsquo;t do things like <code>cmd1 &amp;&amp; cmd2</code> &ndash; it
would complain &ldquo;no, you need to run <code>cmd1; and cmd2</code>&rdquo; instead.</p>
<p>It seems like over the years fish has started accepting a little more POSIX-style syntax than it used to, like:</p>
<ul>
<li><code>cmd1 &amp;&amp; cmd2</code></li>
<li><code>export a=b</code> to set an environment variable (though this seems a bit limited, you can&rsquo;t do <code>export PATH=$PATH:/whatever</code> so I think it&rsquo;s probably better to learn <code>set</code> instead)</li>
</ul>
<h3 id="on-fish-as-a-default-shell">on fish as a default shell</h3>
<p>Changing my default shell to fish is always a little annoying, I occasionally get myself into a situation where</p>
<ol>
<li>I install fish somewhere like maybe <code>/home/bork/.nix-stuff/bin/fish</code></li>
<li>I add the new fish location to <code>/etc/shells</code> as an allowed shell</li>
<li>I change my shell with <code>chsh</code></li>
<li>at some point months/years later I reinstall fish in a different location for some reason and remove the old one</li>
<li>oh no!!! I have no valid shell! I can&rsquo;t open a new terminal tab anymore!</li>
</ol>
<p>This has never been a major issue because I always have a terminal open
somewhere where I can fix the problem and rescue myself, but it&rsquo;s a bit
alarming.</p>
<p>If you don&rsquo;t want to use <code>chsh</code> to change your shell to fish (which is very reasonable,
maybe I shouldn&rsquo;t be doing that), the <a href="https://wiki.archlinux.org/title/Fish">Arch wiki page</a> has a couple of good suggestions &ndash;
either configure your terminal emulator to run fish or add an <code>exec fish</code> to
your <code>.bashrc</code>.</p>
<h3 id="i-ve-never-really-learned-the-scripting-language">I&rsquo;ve never really learned the scripting language</h3>
<p>Other than occasionally writing a for loop interactively on the command line,
I&rsquo;ve never really learned the fish scripting language. I still do all of my
shell scripting in bash.</p>
<p>I don&rsquo;t think I&rsquo;ve ever written a fish function or <code>if</code> statement.</p>
<h3 id="it-seems-like-fish-is-getting-pretty-popular">it seems like fish is getting pretty popular</h3>
<p>I ran a highly unscientific poll on Mastodon asking people what shell they <a href="https://social.jvns.ca/@b0rk/112722850642874842">use interactively</a>. The results were (of 2600 responses):</p>
<ul>
<li>46% bash</li>
<li>49% zsh</li>
<li>16% fish</li>
<li>5% other</li>
</ul>
<p>I think 16% for fish is pretty remarkable, since (as far as I know) there isn&rsquo;t
any system where fish is the default shell, and my sense is that it&rsquo;s very
common to just stick to whatever your system&rsquo;s default shell is.</p>
<p>It feels like a big achievement for the fish project, even if maybe my Mastodon
followers are more likely than the average shell user to use fish for some
reason.</p>
<h3 id="who-might-fish-be-right-for">who might fish be right for?</h3>
<p>Fish definitely isn&rsquo;t for everyone. I think I like it because:</p>
<ol>
<li>I really dislike configuring my shell (and honestly my dev environment in general), I want things to &ldquo;just work&rdquo; with the default settings</li>
<li>fish&rsquo;s defaults feel good to me</li>
<li>I don&rsquo;t spend that much time logged into random servers using other shells
so there&rsquo;s not too much context switching</li>
<li>I liked its features so much that I was willing to relearn how to do a few
&ldquo;basic&rdquo; shell things, like using parentheses <code>(seq 1 10)</code> to run a command
instead of backticks or using <code>set</code> instead of <code>export</code></li>
</ol>
<p>Maybe you&rsquo;re also a person who would like fish! I hope a few more of the people
who fish is for can find it, because I spend so much of my time in the terminal
and it&rsquo;s made that time much more pleasant.</p>
]]></content>
    </entry>
    <entry>
        <title type="html"><![CDATA[Migrating Mess With DNS to use PowerDNS]]></title>
        <link href="https://jvns.ca/blog/2024/08/19/migrating-mess-with-dns-to-use-powerdns/"/>
        <updated>2024-08-19T08:15:28+00:00</updated>
        <id>https://jvns.ca/blog/2024/08/19/migrating-mess-with-dns-to-use-powerdns/</id>
        <content type="html"><![CDATA[<p>About 3 years ago, I announced <a href="https://messwithdns.net/">Mess With DNS</a> in
<a href="https://jvns.ca/blog/2021/12/15/mess-with-dns/">this blog post</a>, a playground
where you can learn how DNS works by messing around and creating records.</p>
<p>I wasn&rsquo;t very careful with the DNS implementation though (to quote the release blog
post: &ldquo;following the DNS RFCs? not exactly&rdquo;), and people started reporting
problems that I eventually decided I wanted to fix.</p>
<h3 id="the-problems">the problems</h3>
<p>Some of the problems people have reported were:</p>
<ul>
<li>domain names with underscores weren&rsquo;t allowed, even though they should be</li>
<li>if there was a CNAME record for a domain name, it allowed you to create other records for that domain name, even though it shouldn&rsquo;t</li>
<li>you could create 2 different CNAME records for the same domain name, which shouldn&rsquo;t be allowed</li>
<li>no support for the SVCB or HTTPS record types, which seemed a little complex to implement</li>
<li>no support for upgrading from UDP to TCP for big responses</li>
</ul>
<p>And there are certainly more issues that nobody got around to reporting, for
example that if you added an NS record for a subdomain to delegate it, Mess
With DNS wouldn&rsquo;t handle the delegation properly.</p>
<h3 id="the-solution-powerdns">the solution: PowerDNS</h3>
<p>I wasn&rsquo;t sure how to fix these problems for a long time &ndash; technically I
<em>could</em> have started addressing them individually, but it felt like there were
a million edge cases and I&rsquo;d never get there.</p>
<p>But then one day I was chatting with someone else who was working on a DNS
server and they said they were using <a href="https://github.com/PowerDNS/pdns/">PowerDNS</a>: an open
source DNS server with an HTTP API!</p>
<p>This seemed like an obvious solution to my problems &ndash; I could just swap out my
own crappy DNS implementation for PowerDNS.</p>
<p>There were a couple of challenges I ran into when setting up PowerDNS that I&rsquo;ll
talk about here. I really don&rsquo;t do a lot of web development and I think I&rsquo;ve never
built a website that depends on a relatively complex API before, so it was a
bit of a learning experience.</p>
<h3 id="challenge-1-getting-every-query-made-to-the-dns-server">challenge 1: getting every query made to the DNS server</h3>
<p>One of the main things Mess With DNS does is give you a live view of every DNS
query it receives for your subdomain, using a websocket. To make this work, it
needs to intercept every DNS query before it gets sent to the PowerDNS DNS
server.</p>
<p>There were 2 options I could think of for how to intercept the DNS queries:</p>
<ol>
<li>dnstap: <code>dnsdist</code> (a DNS load balancer from the PowerDNS project) has
support for logging all DNS queries it receives using
<a href="https://dnstap.info/">dnstap</a>, so I could put dnsdist in front of PowerDNS
and then log queries that way</li>
<li>Have my Go server listen on port 53 and proxy the queries myself</li>
</ol>
<p>I originally implemented option #1, but for some reason there was a 1 second
delay before every query got logged. I couldn&rsquo;t figure out why, so I
implemented my own <a href="https://github.com/jvns/mess-with-dns/blob/3423c9496dd772f7157a56f9e068fd926e89c331/api/main.go#L265-L310">very simple proxy</a> instead.</p>
<h3 id="challenge-2-should-the-frontend-have-direct-access-to-the-powerdns-api">challenge 2: should the frontend have direct access to the PowerDNS API?</h3>
<p>The frontend used to have a lot of DNS logic in it &ndash; it converted emoji domain
names to ASCII using punycode, had a lookup table to convert numeric DNS query
types (like <code>1</code>) to their human-readable names (like <code>A</code>), did a little bit of
validation, and more.</p>
<p>Originally I considered keeping this pattern and just giving the frontend (more
or less) direct access to the PowerDNS API to create and delete records, but writing
even more complex code in Javascript didn&rsquo;t feel that appealing to me &ndash; I
don&rsquo;t really know how to write tests in Javascript and it seemed like it
wouldn&rsquo;t end well.</p>
<p>So I decided to take all of the DNS logic out of the frontend and write a new
DNS API for managing records, shaped something like this:</p>
<ul>
<li><code>GET /records</code></li>
<li><code>DELETE /records/&lt;ID&gt;</code></li>
<li><code>DELETE /records/</code> (delete all records for a user)</li>
<li><code>POST /records/</code> (create record)</li>
<li><code>POST /records/&lt;ID&gt;</code> (update record)</li>
</ul>
<p>This meant that I could actually write tests for my code, since the backend is
in Go and I do know how to write tests in Go.</p>
<h3 id="what-i-learned-it-s-okay-for-an-api-to-duplicate-information">what I learned: it&rsquo;s okay for an API to duplicate information</h3>
<p>I had this idea that APIs shouldn&rsquo;t return duplicate information &ndash; for example
if I get a DNS record, it should only include a given piece of information
once.</p>
<p>But I ran into a problem with that idea when displaying MX records: an MX
record has 2 fields, &ldquo;preference&rdquo;, and &ldquo;mail server&rdquo;. And I needed to display
that information in 2 different ways on the frontend:</p>
<ol>
<li>In a form, where &ldquo;Preference&rdquo; and &ldquo;Mail Server&rdquo; are 2 different form fields (like <code>10</code> and <code>mail.example.com</code>)</li>
<li>In a summary view, where I wanted to just show the record (<code>10 mail.example.com</code>)</li>
</ol>
<p>This is kind of a small problem, but it came up in a few different places.</p>
<p>I talked to my friend Marco Rogers about this, and based on some advice from
him I realized that I could return the same information in the API in 2
different ways! Then the frontend just has to display it. So I started just
returning duplicate information in the API, something like this:</p>
<pre><code>{
  &quot;values&quot;: {&quot;Preference&quot;: 10, &quot;Server&quot;: &quot;mail.example.com&quot;},
  &quot;content&quot;: &quot;10 mail.example.com&quot;,
  ...
}
</code></pre>
<p>I ended up using this pattern in a couple of other places where I needed to
display the same information in 2 different ways and it was SO much easier.</p>
<p>I think what I learned from this is that if I&rsquo;m making an API that isn&rsquo;t
intended for external use (there are no users of this API other than the
frontend!), I can tailor it very specifically to the frontend&rsquo;s needs and
that&rsquo;s okay.</p>
<h3 id="challenge-3-what-s-a-record-s-id">challenge 3: what&rsquo;s a record&rsquo;s ID?</h3>
<p>In Mess With DNS (and I think in most DNS user interfaces!), you create, update, and delete <strong>records</strong>.</p>
<p>But that&rsquo;s not how the PowerDNS API works. In PowerDNS, you create a <strong>zone</strong>,
which is made of <strong>record sets</strong>. Records don&rsquo;t have any ID in the API at all.</p>
<p>I ended up solving this by generating a fake ID for each record, which is made of:</p>
<ul>
<li>its <strong>name</strong></li>
<li>its <strong>type</strong></li>
<li>and its <strong>content</strong> (base64-encoded)</li>
</ul>
<p>For example one record&rsquo;s ID is <code>brooch225.messwithdns.com.|NS|bnMxLm1lc3N3aXRoZG5zLmNvbS4=</code></p>
<p>Then I can search through the zone and find the appropriate record to update
it.</p>
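<p>Here&rsquo;s a small sketch of how an ID like that can be built and taken apart again. The function names are made up, and using the standard padded base64 alphabet is an assumption: it happens to reproduce the example ID above, but the real code may differ.</p>

```go
package main

import (
	"encoding/base64"
	"fmt"
	"strings"
)

// recordID joins name, type, and base64-encoded content with "|".
// Base64 output never contains "|", so the ID can be split unambiguously.
func recordID(name, rtype, content string) string {
	enc := base64.StdEncoding.EncodeToString([]byte(content))
	return name + "|" + rtype + "|" + enc
}

// parseRecordID reverses recordID.
func parseRecordID(id string) (name, rtype, content string, err error) {
	parts := strings.Split(id, "|")
	if len(parts) != 3 {
		return "", "", "", fmt.Errorf("malformed record ID %q", id)
	}
	raw, err := base64.StdEncoding.DecodeString(parts[2])
	if err != nil {
		return "", "", "", err
	}
	return parts[0], parts[1], string(raw), nil
}

func main() {
	id := recordID("brooch225.messwithdns.com.", "NS", "ns1.messwithdns.com.")
	fmt.Println(id)
	fmt.Println(parseRecordID(id))
}
```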
<p>This means that if you update a record then its ID will change, which isn&rsquo;t
usually what I want in an ID, but that seems fine.</p>
<h3 id="challenge-4-making-clear-error-messages">challenge 4: making clear error messages</h3>
<p>I think the error messages that the PowerDNS API returns aren&rsquo;t really intended to be shown to end users, for example:</p>
<ul>
<li><code>Name 'new\032site.island358.messwithdns.com.' contains unsupported characters</code> (this error encodes the space as <code>\032</code>, which is a bit disorienting if you don&rsquo;t know that the space character is 32 in ASCII)</li>
<li><code>RRset test.pear5.messwithdns.com. IN CNAME: Conflicts with pre-existing RRset</code> (this talks about RRsets, which aren&rsquo;t a concept that the Mess With DNS UI has at all)</li>
<li><code>Record orange.beryl5.messwithdns.com./A '1.2.3.4$': Parsing record content (try 'pdnsutil check-zone'): unable to parse IP address, strange character: $</code> (mentions &ldquo;pdnsutil&rdquo;, a utility which Mess With DNS&rsquo;s users don&rsquo;t have
access to in this context)</li>
</ul>
<p>I ended up handling this in two ways:</p>
<ol>
<li>Do some initial basic validation of values that users enter (like IP addresses), so I can just return errors like <code>Invalid IPv4 address: &quot;1.2.3.4$&quot;</code></li>
<li>If that goes well, send the request to PowerDNS and if we get an error back, then do some <a href="https://github.com/jvns/mess-with-dns/blob/c02579190e103218b2c8dfc6dceb19f863752f15/api/records/pdns_errors.go">hacky translation</a> of those messages to make them clearer.</li>
</ol>
<p>Sometimes users will still get errors from PowerDNS directly, but I added some
logging of all the errors that users see, so hopefully I can review them and
add extra translations if there are other common errors that come up.</p>
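<p>The &ldquo;translation&rdquo; step can be as simple as matching on pieces of the upstream message. This is a hypothetical sketch, not the linked <code>pdns_errors.go</code>: the function name and the friendlier wording are invented, and only the PowerDNS fragments come from the examples above.</p>

```go
package main

import (
	"fmt"
	"strings"
)

// translateError maps known fragments of PowerDNS error messages to wording
// that makes sense in the UI. Unknown messages pass through unchanged so
// that nothing gets hidden.
func translateError(msg string) string {
	switch {
	case strings.Contains(msg, "contains unsupported characters"):
		return "Domain names can't contain spaces or other unsupported characters"
	case strings.Contains(msg, "Conflicts with pre-existing RRset"):
		return "This record conflicts with another record for the same name"
	case strings.Contains(msg, "unable to parse IP address"):
		return "That doesn't look like a valid IP address"
	default:
		return msg
	}
}

func main() {
	fmt.Println(translateError("RRset test.pear5.messwithdns.com. IN CNAME: Conflicts with pre-existing RRset"))
	fmt.Println(translateError("some error nobody has seen before"))
}
```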
<p>I think what I learned from this is that if I&rsquo;m building a user-facing
application on top of an API, I need to be pretty thoughtful about how I
resurface those errors to users.</p>
<h3 id="challenge-5-setting-up-sqlite">challenge 5: setting up SQLite</h3>
<p>Previously Mess With DNS was using a Postgres database. This was problematic
because I only gave the Postgres machine 256MB of RAM, which meant that the
database got OOM killed almost every single day. I never really worked out
exactly why it got OOM killed every day, but that&rsquo;s how it was. I spent some
time trying to tune Postgres&rsquo; memory usage by setting <code>max_connections</code> /
<code>work_mem</code> / <code>maintenance_work_mem</code> and it helped a bit but didn&rsquo;t solve the
problem.</p>
<p>So for this refactor I decided to use SQLite instead, because the website
doesn&rsquo;t really get that much traffic. There are some choices involved with
using SQLite, and I decided to:</p>
<ol>
<li>Run <code>db.SetMaxOpenConns(1)</code> to make sure that we only open 1 connection to
the database at a time, to prevent <code>SQLITE_BUSY</code> errors from two threads
trying to access the database at the same time (just setting WAL mode didn&rsquo;t
work)</li>
<li>Use separate databases for each of the 3 tables (users, records, and
requests) to reduce contention. This maybe isn&rsquo;t really necessary, but there
was no reason I needed the tables to be in the same database so I figured I&rsquo;d set
up separate databases to be safe.</li>
<li>Use the cgo-free <a href="https://pkg.go.dev/modernc.org/sqlite?utm_source=godoc">modernc.org/sqlite</a>, which <a href="https://datastation.multiprocess.io/blog/2022-05-12-sqlite-in-go-with-and-without-cgo.html">translates SQLite&rsquo;s source code to Go</a>.
I might switch to a more &ldquo;normal&rdquo; sqlite implementation instead at some point and use cgo though.
I think the main reason I prefer to avoid cgo is that cgo has landed me with <a href="https://jvns.ca/blog/2021/11/17/debugging-a-weird--file-not-found--error/">difficult-to-debug errors in the past</a>.</li>
<li>use WAL mode</li>
</ol>
<p>I still haven&rsquo;t set up backups, though I don&rsquo;t think my Postgres database had
backups either. I think I&rsquo;m unlikely to use
<a href="https://litestream.io/">litestream</a> for backups &ndash; Mess With DNS is very far
from a critical application, and I think daily backups that I could recover
from in case of a disaster are more than good enough.</p>
<h3 id="challenge-6-upgrading-vue-managing-forms">challenge 6: upgrading Vue &amp; managing forms</h3>
<p>This has nothing to do with PowerDNS but I decided to upgrade Vue.js from
version 2 to 3 as part of this refresh. The main problem with that is that the
form validation library I was using (FormKit) completely changed its API
between Vue 2 and Vue 3, so I decided to just stop using it instead of learning
the new API.</p>
<p>I ended up switching to some form validation tools that are built into the
browser like <code>required</code> and <code>oninvalid</code> (<a href="https://github.com/jvns/mess-with-dns/blob/90f7a2d2982c8151a3ddcab532bc1db07a043f84/frontend/components/NewRecord.html#L5-L8">here&rsquo;s the code</a>).
I think it could use some improvement; I still don&rsquo;t understand forms very well.</p>
<h3 id="challenge-7-managing-state-in-the-frontend">challenge 7: managing state in the frontend</h3>
<p>This also has nothing to do with PowerDNS, but when modifying the frontend I
realized that my state management in the frontend was a mess &ndash; in every place
where I made an API request to the backend that modified the state, I had to
remember to add a &ldquo;refresh records&rdquo; call afterwards, and I wasn&rsquo;t always
consistent about it.</p>
<p>With some more advice from Marco, I ended up implementing a single global
<a href="https://github.com/jvns/mess-with-dns/blob/90f7a2d2982c8151a3ddcab532bc1db07a043f84/frontend/store.ts#L32-L44">state management store</a>
which stores all the state for the application, and which lets me
create/update/delete records.</p>
<p>Then my components can just call <code>store.createRecord(record)</code>, and the store
will automatically resynchronize all of the state as needed.</p>
<h3 id="challenge-8-sequencing-the-project">challenge 8: sequencing the project</h3>
<p>This project ended up having several steps because I reworked the whole
integration between the frontend and the backend. I ended up splitting it into
a few different phases:</p>
<ol>
<li>Upgrade Vue from v2 to v3</li>
<li>Make the state management store</li>
<li>Implement a different backend API, move a lot of DNS logic out of the frontend, and add tests for the backend</li>
<li>Integrate PowerDNS</li>
</ol>
<p>I made sure that the website was (more or less) 100% working and then deployed
it in between phases, so that the amount of changes I was managing at a time
stayed somewhat under control.</p>
<h3 id="the-new-website-is-up-now">the new website is up now!</h3>
<p>I released the upgraded website a few days ago and it seems to work!
The PowerDNS API has been great to work on top of, and I&rsquo;m relieved that
there&rsquo;s a whole class of problems that I now don&rsquo;t have to think about at all,
other than potentially trying to make the error messages from PowerDNS a little
clearer. Using PowerDNS has fixed a lot of the DNS issues that folks have
reported in the last few years and it feels great.</p>
<p>If you run into problems with the new Mess With DNS I&rsquo;d love to <a href="https://github.com/jvns/mess-with-dns/issues/">hear about them here</a>.</p>
]]></content>
    </entry>
    <entry>
        <title type="html"><![CDATA[Go structs are copied on assignment (and other things about Go I'd missed)]]></title>
        <link href="https://jvns.ca/blog/2024/08/06/go-structs-copied-on-assignment/"/>
        <updated>2024-08-06T08:38:35+00:00</updated>
        <id>https://jvns.ca/blog/2024/08/06/go-structs-copied-on-assignment/</id>
        <content type="html"><![CDATA[<p>I&rsquo;ve been writing Go pretty casually for years &ndash; the backends for all of my
playgrounds (<a href="https://nginx-playground.wizardzines.com/">nginx</a>, <a href="https://messwithdns.net/">dns</a>, <a href="https://memory-spy.wizardzines.com/">memory</a>, <a href="https://dns-lookup.jvns.ca/">more DNS</a>) are written in Go, but many of those projects are just a few hundred lines and I don&rsquo;t come back to those codebases much.</p>
<p>I thought I more or less understood the basics of the language, but this week
I&rsquo;ve been writing a lot more Go than usual while working on some upgrades to
<a href="https://messwithdns.net">Mess With DNS</a>, and ran into a bug that revealed I
was missing a very basic concept!</p>
<p>Then I posted about this on Mastodon and someone linked me to this very cool
site (and book) called <a href="https://100go.co">100 Go Mistakes and How To Avoid Them</a> by <a href="https://teivah.dev/">Teiva Harsanyi</a>. It just came out in 2022 so it&rsquo;s relatively new.</p>
<p>I decided to read through the site to see what <em>else</em> I was missing, and found
a couple of other misconceptions I had about Go. I&rsquo;ll talk about some of the
mistakes that jumped out at me the most, but really the whole
<a href="https://100go.co/">100 Go Mistakes</a> site is great and I&rsquo;d recommend reading it.</p>
<p>Here&rsquo;s the initial mistake that started me on this journey:</p>
<h3 id="mistake-1-not-understanding-that-structs-are-copied-on-assignment">mistake 1: not understanding that structs are copied on assignment</h3>
<p>Let&rsquo;s say we have a struct:</p>
<pre><code>type Thing struct {
    Name string
}
</code></pre>
<p>and this code:</p>
<pre><code>thing := Thing{&quot;record&quot;}
other_thing := thing
other_thing.Name = &quot;banana&quot;
fmt.Println(thing)
</code></pre>
<p>This prints &ldquo;record&rdquo; and not &ldquo;banana&rdquo; (<a href="https://go.dev/play/p/kUeP2ocFtXw">play.go.dev link</a>), because <code>thing</code> is copied when you
assign it to <code>other_thing</code>.</p>
<h3 id="the-problem-this-caused-me-ranges">the problem this caused me: ranges</h3>
<p>The bug I spent 2 hours of my life debugging last week was effectively this code (<a href="https://go.dev/play/p/85FnGG86UBP">play.go.dev link</a>):</p>
<pre><code>package main

import &quot;fmt&quot;

type Thing struct {
  Name string
}

func findThing(things []Thing, name string) *Thing {
  // BUG: `thing` is a copy of each element, not the element itself
  for _, thing := range things {
    if thing.Name == name {
      return &amp;thing // pointer to the copy
    }
  }
  return nil
}

func main() {
  things := []Thing{{&quot;record&quot;}, {&quot;banana&quot;}}
  thing := findThing(things, &quot;record&quot;)
  thing.Name = &quot;gramophone&quot; // only changes the copy
  fmt.Println(things)
}
</code></pre>
<p>This prints out <code>[{record} {banana}]</code> &ndash; because <code>findThing</code> returned a pointer to a copy, we didn&rsquo;t change the name in the original slice.</p>
<p>This mistake is <a href="https://100go.co/#ignoring-that-elements-are-copied-in-range-loops-30">#30 in 100 Go Mistakes</a>.</p>
<p>I fixed the bug by changing it to something like this (<a href="https://go.dev/play/p/CKZCRUwv_nG">play.go.dev link</a>), which returns a
pointer to the item in the slice we&rsquo;re looking for instead of a pointer to a copy.</p>
<pre><code>func findThing(things []Thing, name string) *Thing {
  for i := range things {
    if things[i].Name == name {
      return &amp;things[i]
    }
  }
  return nil
}
</code></pre>
<h3 id="why-didn-t-i-realize-this">why didn&rsquo;t I realize this?</h3>
<p>When I learned that I was mistaken about how assignment worked in Go I was
really taken aback, like &ndash; it&rsquo;s such a basic fact about how the language works!
If I was wrong about that then what ELSE am I wrong about in Go????</p>
<p>My best guess for what happened is:</p>
<ol>
<li>I&rsquo;ve heard for my whole life that when you define a function,
you need to think about whether its arguments are passed by <strong>reference</strong> or
by <strong>value</strong></li>
<li>So I&rsquo;d thought about this in Go, and I knew that if you pass a struct as a
value to a function, it gets copied &ndash; if you want to pass a reference then
you have to pass a pointer</li>
<li>But somehow it never occurred to me that you need to think about the same
thing for <strong>assignments</strong>, perhaps because in most of the other languages I
use (Python, JS, Java) I think everything is a reference anyway. Except for
in Rust, where you do have values that you make copies of but I think most of the time I had to run <code>.clone()</code> explicitly.
(though apparently structs will be automatically copied on assignment if the struct implements the <code>Copy</code> trait)</li>
<li>Also obviously I just don&rsquo;t write that much Go so I guess it&rsquo;s never come
up.</li>
</ol>
<h3 id="mistake-2-side-effects-appending-slices-25-https-100go-co-unexpected-side-effects-using-slice-append-25">mistake 2: side effects appending slices (<a href="https://100go.co/#unexpected-side-effects-using-slice-append-25">#25</a>)</h3>
<p>When you subset a slice with <code>x[2:3]</code>, the original slice and the sub-slice
share the same backing array, so if you append to the new slice, it can
unintentionally change the old slice:</p>
<p>For example, this code prints <code>[1 2 3 555 5]</code> (<a href="https://go.dev/play/p/qssfM_NSXJD">code on play.go.dev</a>)</p>
<pre><code>x := []int{1, 2, 3, 4, 5}
y := x[2:3]
y = append(y, 555)
fmt.Println(x)
</code></pre>
<p>I don&rsquo;t think this has ever actually happened to me, but it&rsquo;s alarming and I&rsquo;m
very happy to know about it.</p>
<p>Apparently you can avoid this problem by changing <code>y := x[2:3]</code> to <code>y := x[2:3:3]</code>, which restricts the new slice&rsquo;s capacity so that appending to it
will allocate a new backing array instead of writing into the old one. Here&rsquo;s some <a href="https://go.dev/play/p/aE78JUL4-Iv">code on play.go.dev</a> that does that.</p>
<h3 id="mistake-3-not-understanding-the-different-types-of-method-receivers-42">mistake 3: not understanding the different types of method receivers (#42)</h3>
<p>This one isn&rsquo;t a &ldquo;mistake&rdquo; exactly, but it&rsquo;s been a source of confusion for me
and it&rsquo;s pretty simple so I&rsquo;m glad to have it cleared up.</p>
<p>In Go you can declare methods in 2 different ways:</p>
<ol>
<li><code>func (t Thing) Function()</code> (a &ldquo;value receiver&rdquo;)</li>
<li><code>func (t *Thing) Function()</code> (a &ldquo;pointer receiver&rdquo;)</li>
</ol>
<p>My understanding now is that basically:</p>
<ul>
<li>If you want the method to mutate the struct <code>t</code>, you need a pointer receiver.</li>
<li>If you want to make sure the method <strong>doesn&rsquo;t</strong> mutate the struct <code>t</code>, use a value receiver.</li>
</ul>
<p><a href="https://100go.co/#not-knowing-which-type-of-receiver-to-use-42">Explanation #42</a> has a
bunch of other interesting details though. There&rsquo;s definitely still something
I&rsquo;m missing about value vs pointer receivers (I got a compile error related to
them a couple of times in the last week that I still don&rsquo;t understand), but
hopefully I&rsquo;ll run into that error again soon and I can figure it out.</p>
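<p>A tiny sketch of the difference between the two kinds of receivers (the method names here are made up):</p>

```go
package main

import "fmt"

type Thing struct {
	Name string
}

// RenameCopy has a value receiver: t is a copy of the caller's struct,
// so this assignment is lost when the method returns.
func (t Thing) RenameCopy(name string) {
	t.Name = name
}

// Rename has a pointer receiver: t points at the caller's struct,
// so the change is visible to the caller.
func (t *Thing) Rename(name string) {
	t.Name = name
}

func main() {
	thing := Thing{"record"}

	thing.RenameCopy("banana")
	fmt.Println(thing.Name) // record

	// thing is an addressable variable, so Go takes its address automatically.
	thing.Rename("banana")
	fmt.Println(thing.Name) // banana
}
```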
<h3 id="more-interesting-things-i-noticed">more interesting things I noticed</h3>
<p>Some more notes from 100 Go Mistakes:</p>
<ul>
<li>apparently you can <a href="https://100go.co/#never-using-named-result-parameters-43">name the outputs of your function (#43)</a>, though that can have <a href="https://100go.co/#unintended-side-effects-with-named-result-parameters-44">issues (#44)</a> and I&rsquo;m not sure I want to</li>
<li><a href="https://100go.co/#not-exploring-all-the-go-testing-features-90">apparently you can put tests in a different package (#90)</a> to
ensure that you only use the package&rsquo;s public interfaces, which seems really
useful</li>
<li>there are lots of notes about how to use contexts, channels, goroutines,
mutexes, sync.WaitGroup, etc. I&rsquo;m sure I have something to learn about all of
those but today is not the day I&rsquo;m going to learn them.</li>
</ul>
<p>Also there are some things that have tripped me up in the past, like:</p>
<ul>
<li><a href="https://100go.co/#forgetting-the-return-statement-after-replying-to-an-http-request-80">forgetting the return statement after replying to an HTTP request (#80)</a></li>
<li><a href="https://100go.co/#not-using-testing-utility-packages-httptest-and-iotest-88">not realizing the httptest package exists (#88)</a></li>
</ul>
<h3 id="this-100-common-mistakes-format-is-great">this &ldquo;100 common mistakes&rdquo; format is great</h3>
<p>I really appreciated this &ldquo;100 common mistakes&rdquo; format &ndash; it made it really
easy for me to skim through the mistakes and very quickly mentally classify
them into:</p>
<ol>
<li>yep, I know that</li>
<li>not interested in that one right now</li>
<li>WOW WAIT I DID NOT KNOW THAT, THAT IS VERY USEFUL!!!!</li>
</ol>
<p>It looks like &ldquo;100 Common Mistakes&rdquo; is a series of books from Manning and they
also have &ldquo;100 Java Mistakes&rdquo; and an upcoming &ldquo;100 SQL Server Mistakes&rdquo;.</p>
<p>Also I enjoyed what I&rsquo;ve read of <a href="https://effectivepython.com/">Effective Python</a> by Brett Slatkin, which has a similar &ldquo;here are a bunch of
short Python style tips&rdquo; structure where you can quickly skim it and take
what&rsquo;s useful to you. There&rsquo;s also Effective C++, Effective Java, and probably
more.</p>
<h3 id="some-other-go-resources">some other Go resources</h3>
<p>other resources I&rsquo;ve appreciated:</p>
<ul>
<li><a href="https://gobyexample.com/">Go by example</a> for basic syntax</li>
<li><a href="https://go.dev/play/">go.dev/play</a></li>
<li>obviously <a href="https://pkg.go.dev">https://pkg.go.dev</a> for documentation about literally everything</li>
<li><a href="https://staticcheck.dev/">staticcheck</a> seems like a useful linter &ndash; for
example I just started using it to tell me when I&rsquo;ve forgotten to handle an
error</li>
<li>apparently <a href="https://golangci-lint.run/">golangci-lint</a> includes a bunch of different linters</li>
</ul>
]]></content>
    </entry>
    <entry>
        <title type="html"><![CDATA[Entering text in the terminal is complicated]]></title>
        <link href="https://jvns.ca/blog/2024/07/08/readline/"/>
        <updated>2024-07-08T13:00:15+00:00</updated>
        <id>https://jvns.ca/blog/2024/07/08/readline/</id>
        <content type="html"><![CDATA[<p>The other day I asked what folks on Mastodon find confusing about working in
the terminal, and one thing that stood out to me was &ldquo;editing a command you
already typed in&rdquo;.</p>
<p>This really resonated with me: even though entering some text and editing it is
a very &ldquo;basic&rdquo; task, it took me maybe 15 years of using the terminal every
single day to get used to using <code>Ctrl+A</code> to go to the beginning of the line (or
<code>Ctrl+E</code> for the end &ndash; I think I used <code>Home</code>/<code>End</code> instead).</p>
<p>So let&rsquo;s talk about why entering text might be hard! I&rsquo;ll also share a few tips
that I wish I&rsquo;d learned earlier.</p>
<h3 id="it-s-very-inconsistent-between-programs">it&rsquo;s very inconsistent between programs</h3>
<p>A big part of what makes entering text in the terminal hard is the
inconsistency between how different programs handle entering text. For example:</p>
<ol>
<li>some programs (<code>cat</code>, <code>nc</code>, <code>git commit --interactive</code>, etc) don&rsquo;t support using arrow keys at all: if you press arrow keys, you&rsquo;ll just see <code>^[[D^[[D^[[C^[[C^</code></li>
<li>many programs (like <code>irb</code>, <code>python3</code> on a Linux machine and many many more) use the <code>readline</code> library, which gives you a lot of basic functionality (history, arrow keys, etc)</li>
<li>some programs (like <code>/usr/bin/python3</code> on my Mac) do support very basic features like arrow keys, but not other features like <code>Ctrl+left</code> or reverse searching with <code>Ctrl+R</code></li>
<li>some programs (like the <code>fish</code> shell or <code>ipython3</code> or <code>micro</code> or <code>vim</code>) have their own fancy system for accepting input which is totally custom</li>
</ol>
<p>So there&rsquo;s a lot of variation! Let&rsquo;s talk about each of those a little more.</p>
<h3 id="mode-1-the-baseline">mode 1: the baseline</h3>
<p>First, there&rsquo;s &ldquo;the baseline&rdquo; &ndash; what happens if a program just accepts text by
calling <code>fgets()</code> or whatever and doing absolutely nothing else to provide a
nicer experience. Here&rsquo;s what using these tools typically looks like for me: if I
start the version of <a href="https://wiki.archlinux.org/title/Dash">dash</a> installed on
my machine (a pretty minimal shell) and press the left arrow key, it just prints
<code>^[[D</code> to the terminal.</p>
<pre><code>$ ls l-^[[D^[[D^[[D
</code></pre>
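<p>If you&rsquo;re wondering where <code>^[[D</code> comes from: the left arrow key typically sends the bytes <code>ESC [ D</code>, and the terminal echoes control characters like <code>ESC</code> in &ldquo;caret notation&rdquo;. Here&rsquo;s a small Go sketch of that notation. The <code>caret</code> function is only an illustration of the convention, not code from any real terminal driver.</p>

```go
package main

import (
	"fmt"
	"strings"
)

// caret renders control bytes the way a terminal usually echoes them:
// "^" followed by the byte with bit 0x40 flipped. ESC (0x1b) becomes "^[",
// Ctrl+C (0x03) becomes "^C", and DEL (0x7f) becomes "^?".
func caret(s string) string {
	var b strings.Builder
	for _, c := range []byte(s) {
		if c < 0x20 || c == 0x7f {
			b.WriteByte('^')
			b.WriteByte(c ^ 0x40)
		} else {
			b.WriteByte(c)
		}
	}
	return b.String()
}

func main() {
	fmt.Println(caret("\x1b[D")) // ^[[D
	fmt.Println(caret("\x03"))   // ^C
}
```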
<p>At first it doesn&rsquo;t seem like all of these &ldquo;baseline&rdquo; tools have much in
common, but there are actually a few features that you get for free just from
your terminal, without the program needing to do anything special at all.</p>
<p>The things you get for free are:</p>
<ol>
<li>typing in text, obviously</li>
<li>backspace</li>
<li><code>Ctrl+W</code>, to delete the previous word</li>
<li><code>Ctrl+U</code>, to delete the whole line</li>
<li>a few other things unrelated to text editing (like <code>Ctrl+C</code> to interrupt the process, <code>Ctrl+Z</code> to suspend, etc)</li>
</ol>
<p>This is not <em>great</em>, but it means that if you want to delete a word you
generally can do it with <code>Ctrl+W</code> instead of pressing backspace 15 times, even
if you&rsquo;re in an environment which is offering you absolutely zero features.</p>
<p>You can get a list of all the ctrl codes that your terminal supports with <code>stty -a</code>.</p>
<h3 id="mode-2-tools-that-use-readline">mode 2: tools that use <code>readline</code></h3>
<p>The next group is tools that use readline! Readline is a GNU library to make
entering text more pleasant, and it&rsquo;s very widely used.</p>
<p>My favourite readline keyboard shortcuts are:</p>
<ol>
<li><code>Ctrl+E</code> (or <code>End</code>) to go to the end of the line</li>
<li><code>Ctrl+A</code> (or <code>Home</code>) to go to the beginning of the line</li>
<li><code>Ctrl+left/right arrow</code> to go back/forward 1 word</li>
<li>up arrow to go back to the previous command</li>
<li><code>Ctrl+R</code> to search your history</li>
</ol>
<p>And you can use <code>Ctrl+W</code> / <code>Ctrl+U</code> from the &ldquo;baseline&rdquo; list, though <code>Ctrl+U</code>
deletes from the cursor to the beginning of the line instead of deleting the
whole line. I think <code>Ctrl+W</code> might also have a slightly different definition of
what a &ldquo;word&rdquo; is.</p>
<p>There are a lot more (<a href="https://www.man7.org/linux/man-pages/man3/readline.3.html#EDITING_COMMANDS">here&rsquo;s a full list</a>), but those are the only ones that I personally use.</p>
<p>The <code>bash</code> shell is probably the most famous readline user (when you use
<code>Ctrl+R</code> to search your history in bash, that feature actually comes from
readline), but there are TONS of programs that use it &ndash; for example <code>psql</code>,
<code>irb</code>, <code>python3</code>, etc.</p>
<h3 id="tip-you-can-make-anything-use-readline-with-rlwrap">tip: you can make ANYTHING use readline with <code>rlwrap</code></h3>
<p>One of my absolute favourite things is that if you have a program like <code>nc</code>
without readline support, you can just run <code>rlwrap nc</code> to turn it into a
program with readline support!</p>
<p>This is incredible and makes a lot of tools that are borderline unusable MUCH
more pleasant to use. You can even apparently set up <a href="https://github.com/hanslub42/rlwrap">rlwrap</a> to include your own
custom autocompletions, though I&rsquo;ve never tried that.</p>
<h3 id="some-reasons-tools-might-not-use-readline">some reasons tools might not use readline</h3>
<p>I think some reasons tools might not use readline include:</p>
<ul>
<li>the program is very simple (like <code>cat</code> or <code>nc</code>) and maybe the maintainers don&rsquo;t want to bring in a relatively large dependency</li>
<li>license reasons, if the program&rsquo;s license is not GPL-compatible &ndash; readline is GPL-licensed, not LGPL</li>
<li>only a very small part of the program is interactive, and maybe readline
support isn&rsquo;t seen as important. For example <code>git</code> has a few interactive
features (like <code>git add -p</code>), but not very many, and usually you&rsquo;re just
typing a single character like <code>y</code> or <code>n</code> &ndash; most of the time when you really
need to type something significant in git, it&rsquo;ll drop you into a text editor instead.</li>
</ul>
<p>For example idris2 says <a href="https://idris2.readthedocs.io/en/latest/tutorial/interactive.html#editing-at-the-repl">they don&rsquo;t use readline</a>
to keep dependencies minimal and suggest using <code>rlwrap</code> to get better
interactive features.</p>
<h3 id="how-to-know-if-you-re-using-readline">how to know if you&rsquo;re using readline</h3>
<p>The simplest test I can think of is to press <code>Ctrl+R</code>, and if you see:</p>
<pre><code>(reverse-i-search)`':
</code></pre>
<p>then you&rsquo;re probably using readline. This obviously isn&rsquo;t a guarantee (some
other library could use the term <code>reverse-i-search</code> too!), but I don&rsquo;t know of
another system that uses that specific term to refer to searching history.</p>
<h3 id="the-readline-keybindings-come-from-emacs">the readline keybindings come from Emacs</h3>
<p>Because I&rsquo;m a vim user, it took me a very long time to understand where these
keybindings come from (why <code>Ctrl+A</code> to go to the beginning of a line??? so
weird!)</p>
<p>My understanding is these keybindings actually come from Emacs &ndash; <code>Ctrl+A</code> and
<code>Ctrl+E</code> do the same thing in Emacs as they do in Readline and I assume the
other keyboard shortcuts mostly do as well, though I tried out <code>Ctrl+W</code> and
<code>Ctrl+U</code> in Emacs and they don&rsquo;t do the same thing as they do in the terminal
so I guess there are some differences.</p>
<p>There&rsquo;s some more <a href="https://twobithistory.org/2019/08/22/readline.html">history of the Readline project here</a>.</p>
<h3 id="mode-3-another-input-library-like-libedit">mode 3: another input library (like <code>libedit</code>)</h3>
<p>On my Mac laptop, <code>/usr/bin/python3</code> is in a weird middle ground where it
supports <em>some</em> readline features (for example the arrow keys), but not the
other ones. For example when I press <code>Ctrl+left arrow</code>, it prints out <code>;5D</code>,
like this:</p>
<pre><code>$ python3
&gt;&gt;&gt; importt subprocess;5D
</code></pre>
<p>Folks on Mastodon helped me figure out that this is because in the default
Python install on Mac OS, the Python <code>readline</code> module is actually backed by
<code>libedit</code>, which is a similar library which has fewer features, presumably
because Readline is <a href="https://en.wikipedia.org/wiki/GNU_Readline#Choice_of_the_GPL_as_GNU_Readline's_license">GPL licensed</a>.</p>
<p>Here&rsquo;s how I was eventually able to figure out that Python was using libedit on
my system:</p>
<pre><code>$ python3 -c &quot;import readline; print(readline.__doc__)&quot;
Importing this module enables command line editing using libedit readline.
</code></pre>
<p>Python generally does use readline if you install it on Linux or through
Homebrew. It&rsquo;s just that the specific version that Apple includes on their
systems doesn&rsquo;t have readline. Also <a href="https://docs.python.org/3.13/whatsnew/3.13.html#a-better-interactive-interpreter">Python 3.13 is going to remove the readline dependency</a>
in favour of a custom library, so &ldquo;Python uses readline&rdquo; won&rsquo;t be true in the
future.</p>
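<p>If you want to do that same check from a script, here&rsquo;s a minimal sketch. It assumes <code>python3</code> is on your <code>PATH</code> and that the <code>readline</code> module can be imported at all, and the <code>backend</code> variable name is just something made up for this example.</p>
<pre><code># Ask Python which library its readline module says it is built on.
# This only looks for the word "libedit" in the module's docstring.
backend=$(python3 -c 'import readline; print("libedit" if "libedit" in (readline.__doc__ or "") else "readline")')
echo "python3 line editing is backed by: $backend"
</code></pre>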
<p>I assume that there are more programs on my Mac that use libedit but I haven&rsquo;t
looked into it.</p>
<h3 id="mode-4-something-custom">mode 4: something custom</h3>
<p>The last group of programs is programs that have their own custom (and sometimes
much fancier!) system for editing text. This includes:</p>
<ul>
<li>most terminal text editors (nano, micro, vim, emacs, etc)</li>
<li>some shells (like fish), for example it seems like fish supports <code>Ctrl+Z</code> for undo when typing in a command. Zsh&rsquo;s line editor is called <a href="https://zsh.sourceforge.io/Guide/zshguide04.html">zle</a>.</li>
<li>some REPLs (like <code>ipython</code>), for example IPython uses the <a href="https://python-prompt-toolkit.readthedocs.io/">prompt_toolkit</a> library instead of readline</li>
<li>lots of other programs (like <code>atuin</code>)</li>
</ul>
<p>Some features you might see are:</p>
<ul>
<li>better autocomplete which is more customized to the tool</li>
<li>nicer history management (for example with syntax highlighting) than the default you get from readline</li>
<li>more keyboard shortcuts</li>
</ul>
<h3 id="custom-input-systems-are-often-readline-inspired">custom input systems are often readline-inspired</h3>
<p>I went looking at how <a href="https://atuin.sh/">Atuin</a> (a wonderful tool for
searching your shell history that I started using recently) handles text input.
Looking at <a href="https://github.com/atuinsh/atuin/blob/a67cfc82fe0dc907a01f07a0fd625701e062a33b/crates/atuin/src/command/client/search/interactive.rs#L382-L430">the code</a>
and some of the discussion around it, their implementation is custom but it&rsquo;s
inspired by readline, which makes sense to me &ndash; a lot of users are used to
those keybindings, and it&rsquo;s convenient for them to work even though atuin
doesn&rsquo;t use readline.</p>
<p><a href="https://python-prompt-toolkit.readthedocs.io/">prompt_toolkit</a> (the library
IPython uses) is similar &ndash; it actually supports a lot of options (including
vi-like keybindings), but the default is to support the readline-style
keybindings.</p>
<p>This is like how you see a lot of programs which support very basic vim
keybindings (like <code>j</code> for down and <code>k</code> for up). For example Fastmail supports
<code>j</code> and <code>k</code> even though most of its other keybindings don&rsquo;t have much
relationship to vim.</p>
<p>I assume that most &ldquo;readline-inspired&rdquo; custom input systems have various subtle
incompatibilities with readline, but this doesn&rsquo;t really bother me at all
personally because I&rsquo;m extremely ignorant of most of readline&rsquo;s features. I only use
maybe 5 keyboard shortcuts, so as long as they support the 5 basic commands I
know (which they always do!) I feel pretty comfortable. And usually these
custom systems have much better autocomplete than you&rsquo;d get from just using
readline, so generally I prefer them over readline.</p>
<h3 id="lots-of-shells-support-vi-keybindings">lots of shells support vi keybindings</h3>
<p>Bash, zsh, and fish all have a &ldquo;vi mode&rdquo; for entering text. In a
<a href="https://social.jvns.ca/@b0rk/112723846172173621">very unscientific poll</a> I ran on
Mastodon, 12% of people said they use it, so it seems pretty popular.</p>
<p>Readline also has a &ldquo;vi mode&rdquo; (which is how Bash&rsquo;s support for it works), so by
extension lots of other programs have it too.</p>
<p>I&rsquo;ve always thought that vi mode seems really cool, but for some reason even
though I&rsquo;m a vim user it&rsquo;s never stuck for me.</p>
<h3 id="understanding-what-situation-you-re-in-really-helps">understanding what situation you&rsquo;re in really helps</h3>
<p>I&rsquo;ve spent a lot of my life being confused about why a command line application
I was using wasn&rsquo;t behaving the way I wanted, and it feels good to be able to
more or less understand what&rsquo;s going on.</p>
<p>I think this is roughly my mental flowchart when I&rsquo;m entering text at a command
line prompt:</p>
<ol>
<li>Do the arrow keys not work? Probably there&rsquo;s no input system at all, but at
least I can use <code>Ctrl+W</code> and <code>Ctrl+U</code>, and I can <code>rlwrap</code> the tool if I
want more features.</li>
<li>Does <code>Ctrl+R</code> print <code>reverse-i-search</code>? Probably it&rsquo;s readline, so I can use
all of the readline shortcuts I&rsquo;m used to, and I know I can get some basic
history and press up arrow to get the previous command.</li>
<li>Does <code>Ctrl+R</code> do something else? This is probably some custom input library:
it&rsquo;ll probably act more or less like readline, and I can check the
documentation if I really want to know how it works.</li>
</ol>
<p>Being able to diagnose what&rsquo;s going on like this makes the command line feel
more predictable and less chaotic.</p>
<h3 id="some-things-this-post-left-out">some things this post left out</h3>
<p>There are lots more complications related to entering text that we didn&rsquo;t talk
about at all here, like:</p>
<ul>
<li>issues related to ssh / tmux / etc</li>
<li>the <code>TERM</code> environment variable</li>
<li>how different terminals (gnome terminal, iTerm, xterm, etc) have different kinds of support for copying/pasting text</li>
<li>unicode</li>
<li>probably a lot more</li>
</ul>
]]></content>
    </entry>
    <entry>
        <title type="html"><![CDATA[Reasons to use your shell's job control]]></title>
        <link href="https://jvns.ca/blog/2024/07/03/reasons-to-use-job-control/"/>
        <updated>2024-07-03T08:00:20+00:00</updated>
        <id>https://jvns.ca/blog/2024/07/03/reasons-to-use-job-control/</id>
        <content type="html"><![CDATA[<p>Hello! Today someone on Mastodon asked about job control (<code>fg</code>, <code>bg</code>, <code>Ctrl+z</code>,
<code>wait</code>, etc). It made me think about how I don&rsquo;t use my shell&rsquo;s job
control interactively very often: usually I prefer to just open a new terminal
tab if I want to run multiple terminal programs, or use tmux if it&rsquo;s over ssh.
But I was curious about whether other people used job control more often than me.</p>
<p>So I <a href="https://social.jvns.ca/@b0rk/112716835387523648">asked on Mastodon</a> for
reasons people use job control. There were a lot of great responses, and it
even made me want to consider using job control a little more!</p>
<p>In this post I&rsquo;m only going to talk about using job control interactively (not
in scripts) &ndash; the post is already long enough just talking about interactive
use.</p>
<h3 id="what-s-job-control">what&rsquo;s job control?</h3>
<p>First: what&rsquo;s job control? Well &ndash; in a terminal, your processes can be in one of 3 states:</p>
<ol>
<li>in the <strong>foreground</strong>. This is the normal state when you start a process.</li>
<li>in the <strong>background</strong>. This is what happens when you run <code>some_process &amp;</code>: the process is still running, but you can&rsquo;t interact with it anymore unless you bring it back to the foreground.</li>
<li><strong>stopped</strong>. This is what happens when you start a process and then press <code>Ctrl+Z</code>. This pauses the process: it won&rsquo;t keep using the CPU, but you can restart it if you want.</li>
</ol>
<p>&ldquo;Job control&rdquo; is a set of commands for seeing which processes are running in a terminal and moving processes between these 3 states.</p>
<h3 id="how-to-use-job-control">how to use job control</h3>
<ul>
<li><code>fg</code> brings a process to the foreground. It works on both stopped processes and background processes. For example, if you start a background process with <code>cat &lt; /dev/zero &amp;</code>, you can bring it back to the foreground by running <code>fg</code></li>
<li><code>bg</code> restarts a stopped process and puts it in the background.</li>
<li>Pressing <code>Ctrl+z</code> stops the current foreground process.</li>
<li><code>jobs</code> lists all processes that are active in your terminal</li>
<li><code>kill</code> sends a signal (like <code>SIGKILL</code>) to a job (this is the shell builtin <code>kill</code>, not <code>/bin/kill</code>)</li>
<li><code>disown</code> removes the job from the list of running jobs, so that it doesn&rsquo;t get killed when you close the terminal</li>
<li><code>wait</code> waits for all background processes to complete. I only use this in scripts though.</li>
<li>apparently in bash/zsh you can also just type <code>%2</code> instead of <code>fg %2</code></li>
</ul>
<p>I might have forgotten some other job control commands but I think those are all the ones I&rsquo;ve ever used.</p>
<p>You can also give <code>fg</code> or <code>bg</code> a specific job to foreground/background. For example if I see this in the output of <code>jobs</code>:</p>
<pre><code>$ jobs
Job Group State   Command
1   3161  running cat &lt; /dev/zero &amp;
2   3264  stopped nvim -w ~/.vimkeys $argv
</code></pre>
<p>then I can foreground <code>nvim</code> with <code>fg %2</code>. You can also kill it with <code>kill -9 %2</code>, or just <code>kill %2</code> if you want to be more gentle.</p>
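<p>Here&rsquo;s a small sketch of a few of those commands together, written as a bash script rather than typed interactively. Job numbers like <code>%1</code> work in bash scripts too, though <code>fg</code> and <code>bg</code> need job control turned on, which by default only happens in interactive shells. <code>sleep 300</code> is just a stand-in for a real program, and <code>still_running</code> is a made-up variable name.</p>
<pre><code># Start two background jobs.
sleep 300 &amp;
sleep 300 &amp;

# List the jobs this shell knows about, with their job numbers.
jobs

# Kill each job by job number instead of by PID.
kill %2
kill %1

# Wait for both to exit, then count the jobs that are still running.
wait
still_running=$(jobs -r | wc -l)
echo "jobs still running: $still_running"
</code></pre>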
<h3 id="how-is-kill-2-implemented">how is <code>kill %2</code> implemented?</h3>
<p>I was curious about how <code>kill %2</code> works &ndash; does <code>%2</code> just get replaced with the
PID of the relevant process when you run the command, the way environment
variables are? Some quick experimentation shows that it isn&rsquo;t:</p>
<pre><code>$ echo kill %2
kill %2
$ type kill
kill is a function with definition
# Defined in /nix/store/vicfrai6lhnl8xw6azq5dzaizx56gw4m-fish-3.7.0/share/fish/config.fish
</code></pre>
<p>So <code>kill</code> is a fish function that knows how to interpret <code>%2</code>. Looking at
the source code (which is very easy in fish!), it uses <code>jobs -p %2</code> to expand <code>%2</code>
into a PID, and then runs the regular <code>kill</code> command.</p>
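<p>You can do the same expansion by hand. This is a rough bash analogue of that idea, not fish&rsquo;s actual code: it assumes a single background job, and <code>bg_pid</code> and <code>pid</code> are names made up for the example.</p>
<pre><code># Start one background job and remember the PID the shell reports for it.
sleep 300 &amp;
bg_pid=$!

# Expand the job spec %1 into a PID by hand.
pid=$(jobs -p %1)
echo "job %1 is PID $pid (the shell started PID $bg_pid)"

# Now kill can be given a plain PID instead of a job spec.
kill "$pid"
wait
</code></pre>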
<h3 id="on-differences-between-shells">on differences between shells</h3>
<p>Job control is implemented by your shell. I use fish, but my sense is that the
basics of job control work pretty similarly in bash, fish, and zsh.</p>
<p>There are definitely some shells which don&rsquo;t have job control at all, but I&rsquo;ve
only used bash/fish/zsh so I don&rsquo;t know much about that.</p>
<p>Now let&rsquo;s get into a few reasons people use job control!</p>
<h3 id="reason-1-kill-a-command-that-s-not-responding-to-ctrl-c">reason 1: kill a command that&rsquo;s not responding to Ctrl+C</h3>
<p>I run into processes that don&rsquo;t respond to <code>Ctrl+C</code> pretty regularly, and it&rsquo;s
always a little annoying &ndash; I usually switch terminal tabs to find and kill
the process. A bunch of people pointed out that you can do this in a faster way
using job control!</p>
<p>How to do this: Press <code>Ctrl+Z</code>, then <code>kill %1</code> (or the appropriate job number
if there&rsquo;s more than one stopped/background job, which you can get from
<code>jobs</code>). You can also <code>kill -9</code> if it&rsquo;s really not responding.</p>
<h3 id="reason-2-background-a-gui-app-so-it-s-not-using-up-a-terminal-tab">reason 2: background a GUI app so it&rsquo;s not using up a terminal tab</h3>
<p>Sometimes I start a GUI program from the command line (for example with
<code>wireshark some_file.pcap</code>), forget to start it in the background, and don&rsquo;t want it eating up my terminal tab.</p>
<p>How to do this:</p>
<ul>
<li>move the GUI program to the background by pressing <code>Ctrl+Z</code> and then running <code>bg</code>.</li>
<li>you can also run <code>disown</code> to remove it from the list of jobs, to make sure that
the GUI program won&rsquo;t get closed when you close your terminal tab.</li>
</ul>
<p>Personally I try to avoid starting GUI programs from the terminal if possible
because I don&rsquo;t like how their stdout pollutes my terminal (on a Mac I use
<code>open -a Wireshark</code> instead because I find it works better), but sometimes you
don&rsquo;t have another choice.</p>
<h3 id="reason-2-5-accidentally-started-a-long-running-job-without-tmux">reason 2.5: accidentally started a long-running job without <code>tmux</code></h3>
<p>This is basically the same as the GUI app thing &ndash; you can move the job to the
background and disown it.</p>
<p>I was also curious about whether there are ways to redirect a process&rsquo;s output to a
file after it&rsquo;s already started. A quick search turned up <a href="https://github.com/jerome-pouiller/reredirect/">this Linux-only tool</a> which is based on
<a href="https://blog.nelhage.com/">nelhage</a>&rsquo;s <a href="https://github.com/nelhage/reptyr">reptyr</a> (which lets you for example move a
process that you started outside of tmux to tmux) but I haven&rsquo;t tried either of
those.</p>
<h3 id="reason-3-running-a-command-while-using-vim">reason 3: running a command while using <code>vim</code></h3>
<p>A lot of people mentioned that if they want to quickly test something while
editing code in <code>vim</code> or another terminal editor, they like to use <code>Ctrl+Z</code>
to stop vim, run the command, and then run <code>fg</code> to go back to their editor.</p>
<p>You can also use this to check the output of a command that you ran before
starting <code>vim</code>.</p>
<p>I&rsquo;ve never gotten in the habit of this, probably because I mostly use a GUI
version of vim. I feel like I&rsquo;d also be likely to switch terminal tabs and end
up wondering &ldquo;wait&hellip; where did I put my editor???&rdquo; and have to go searching
for it.</p>
<h3 id="reason-4-preferring-interleaved-output">reason 4: preferring interleaved output</h3>
<p>A few people said that they prefer the output of all of their commands being
interleaved in the terminal. This really surprised me because I usually think
of having the output of lots of different commands interleaved as being a <em>bad</em>
thing, but one person said that they like to do this with tcpdump specifically
and I think that actually sounds extremely useful. Here&rsquo;s what it looks like:</p>
<pre><code># start tcpdump
$ sudo tcpdump -ni any port 1234 &amp;
tcpdump: data link type PKTAP
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on any, link-type PKTAP (Apple DLT_PKTAP), snapshot length 524288 bytes

# run curl
$ curl google.com:1234
13:13:29.881018 IP 192.168.1.173.49626 &gt; 142.251.41.78.1234: Flags [S], seq 613574185, win 65535, options [mss 1460,nop,wscale 6,nop,nop,TS val 2730440518 ecr 0,sackOK,eol], length 0
13:13:30.881963 IP 192.168.1.173.49626 &gt; 142.251.41.78.1234: Flags [S], seq 613574185, win 65535, options [mss 1460,nop,wscale 6,nop,nop,TS val 2730441519 ecr 0,sackOK,eol], length 0
13:13:31.882587 IP 192.168.1.173.49626 &gt; 142.251.41.78.1234: Flags [S], seq 613574185, win 65535, options [mss 1460,nop,wscale 6,nop,nop,TS val 2730442520 ecr 0,sackOK,eol], length 0
 
# when you're done, kill the tcpdump in the background
$ kill %1 
</code></pre>
<p>I think it&rsquo;s really nice here that you can see the output of tcpdump inline in
your terminal &ndash; when I&rsquo;m using tcpdump I&rsquo;m always switching back and forth and
I always get confused trying to match up the timestamps, so keeping everything
in one terminal seems like it might be a lot clearer. I&rsquo;m going to try it.</p>
<h3 id="reason-5-suspend-a-cpu-hungry-program">reason 5: suspend a CPU-hungry program</h3>
<p>One person said that sometimes they&rsquo;re running a very CPU-intensive program,
for example converting a video with <code>ffmpeg</code>, and they need to use the CPU for
something else, but don&rsquo;t want to lose the work that ffmpeg already did.</p>
<p>You can do this by pressing <code>Ctrl+Z</code> to pause the process, and then run <code>fg</code>
when you want to start it again.</p>
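<p>Under the hood this is just signals: <code>Ctrl+Z</code> sends <code>SIGTSTP</code> to the foreground process, and <code>fg</code> / <code>bg</code> send <code>SIGCONT</code>. Here&rsquo;s a sketch of doing the same pause and resume by PID from a script. It uses <code>SIGSTOP</code> (which can&rsquo;t be caught or ignored) instead of <code>SIGTSTP</code>, uses <code>sleep</code> as a stand-in for the CPU-hungry program, and assumes your <code>ps</code> supports <code>-o stat=</code> (the ones on Linux and Mac OS do).</p>
<pre><code># sleep stands in for the CPU-hungry program.
sleep 300 &amp;
pid=$!

# Pause it, then look at its process state.
kill -STOP "$pid"
sleep 1
state_paused=$(ps -o stat= -p "$pid")    # usually starts with T (stopped)

# Resume it with the same signal that fg and bg send.
kill -CONT "$pid"
sleep 1
state_resumed=$(ps -o stat= -p "$pid")   # usually starts with S (sleeping)

echo "paused: $state_paused, resumed: $state_resumed"

# Clean up the stand-in process.
kill "$pid"
wait
</code></pre>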
<h3 id="reason-6-you-accidentally-ran-ctrl-z">reason 6: you accidentally ran Ctrl+Z</h3>
<p>Many people replied that they didn&rsquo;t use job control <em>intentionally</em>, but
that they sometimes accidentally ran Ctrl+Z, which stopped whatever program was
running, so they needed to learn how to use <code>fg</code> to bring it back to the
foreground.</p>
<p>There were also some mentions of accidentally running <code>Ctrl+S</code> too (which stops
your terminal and I think can be undone with <code>Ctrl+Q</code>). My terminal totally
ignores <code>Ctrl+S</code> so I guess I&rsquo;m safe from that one though.</p>
<h3 id="reason-7-already-set-up-a-bunch-of-environment-variables">reason 7: already set up a bunch of environment variables</h3>
<p>Some folks mentioned that they already set up a bunch of environment variables
that they need to run various commands, so it&rsquo;s easier to use job control to
run multiple commands in the same terminal than to redo that work in another
tab.</p>
<h3 id="reason-8-it-s-your-only-option">reason 8: it&rsquo;s your only option</h3>
<p>Probably the most obvious reason to use job control to manage multiple
processes is &ldquo;because you have to&rdquo; &ndash; maybe you&rsquo;re in single-user mode, or on a
very restricted computer, or SSH&rsquo;d into a machine that doesn&rsquo;t have tmux or
screen and you don&rsquo;t want to create multiple SSH sessions.</p>
<h3 id="reason-9-some-people-just-like-it-better">reason 9: some people just like it better</h3>
<p>Some people also said that they just don&rsquo;t like using terminal tabs: for
instance a few folks mentioned that they prefer to be able to see all of their
terminals on the screen at the same time, so they&rsquo;d rather have 4 terminals on
the screen and then use job control if they need to run more than 4 programs.</p>
<h3 id="i-learned-a-few-new-tricks">I learned a few new tricks!</h3>
<p>I think my two main takeaways from this post are that I&rsquo;ll probably try out job control a little more for:</p>
<ol>
<li>killing processes that don&rsquo;t respond to Ctrl+C</li>
<li>running <code>tcpdump</code> in the background with whatever network command I&rsquo;m running, so I can see both of their output in the same place</li>
</ol>
]]></content>
    </entry>
    <entry>
        <title type="html"><![CDATA[New zine: How Git Works!]]></title>
        <link href="https://jvns.ca/blog/2024/04/25/new-zine--how-git-works-/"/>
        <updated>2024-06-03T09:45:11+00:00</updated>
        <id>https://jvns.ca/blog/2024/04/25/new-zine--how-git-works-/</id>
        <content type="html"><![CDATA[<p>Hello! I&rsquo;ve been writing about git on here nonstop for months, and the git zine
is FINALLY done! It came out on Friday!</p>
<p>You can get it for $12 here:
<a href="https://wizardzines.com/zines/git">https://wizardzines.com/zines/git</a>, or get
a <a href="https://wizardzines.com/zines/all-the-zines/">14-pack of all my zines here</a>.</p>
<p>Here&rsquo;s the cover:</p>
<div align="center">
<a href="https://wizardzines.com/zines/git">
  <img width="600px" src="https://wizardzines.com/zines/git/cover-small.jpg">
  </a>
</div>
<h3 id="the-table-of-contents">the table of contents</h3>
<p>Here&rsquo;s the table of contents:</p>
<a href="https://wizardzines.com/zines/git/toc.png">
  <img width="600px" src="https://wizardzines.com/zines/git/toc.png">
</a>
<h3 id="who-is-this-zine-for">who is this zine for?</h3>
<p>I wrote this zine for people who have been using git for years and are still
afraid of it. As always &ndash; I think it sucks to be afraid of the tools that you
use in your work every day! I want folks to feel confident using git.</p>
<p>My goals are:</p>
<ul>
<li>To explain how some parts of git that initially seem scary (like &ldquo;detached
HEAD state&rdquo;) are pretty straightforward to deal with once you understand
what&rsquo;s going on</li>
<li>To show some parts of git you probably <em>should</em> be careful around.  For
example, the stash is one of the places in git where it&rsquo;s easiest to lose
your work in a way that&rsquo;s incredibly annoying to recover from, and I avoid
using it heavily because of that.</li>
<li>To clear up a few common misconceptions about how the core parts of git (like
commits, branches, and merging) work</li>
</ul>
<h3 id="what-s-the-difference-between-this-and-oh-shit-git">what&rsquo;s the difference between this and Oh Shit, Git!</h3>
<p>You might be wondering – Julia! You already have a zine about git! What’s going
on? <a href="https://wizardzines.com/zines/oh-shit-git">Oh Shit, Git!</a> is a set of tricks for fixing git messes. <a href="https://wizardzines.com/zines/git/">&ldquo;How Git Works&rdquo;</a>
explains how Git <strong>actually</strong> works.</p>
<p>Also, Oh Shit, Git! is the amazing <a href="https://sylormiller.com/">Katie Sylor Miller</a>&rsquo;s <a href="https://ohshitgit.com/">concept</a>: we made it
into a zine because I was such a huge fan of her work on it.</p>
<p>I think they go really well together.</p>
<h3 id="what-s-so-confusing-about-git-anyway">what&rsquo;s so confusing about git, anyway?</h3>
<p>This zine was really hard for me to write because when I started writing it,
I&rsquo;d been using git pretty confidently for 10 years. I had no real memory of
what it was <em>like</em> to struggle with git.</p>
<p>But thanks to a huge amount of help from <a href="https://marieflanagan.com/">Marie</a> as
well as everyone who talked to me about git on Mastodon, eventually I was able
to see that there are a lot of things about git that are counterintuitive,
misleading, or just plain confusing. These include:</p>
<ul>
<li><a href="https://jvns.ca/blog/2023/11/01/confusing-git-terminology/">confusing terminology</a> (for example &ldquo;fast-forward&rdquo;, &ldquo;reference&rdquo;, or &ldquo;remote-tracking branch&rdquo;)</li>
<li>misleading messages (for example how <code>Your branch is up to date with 'origin/main'</code> doesn&rsquo;t necessarily mean that your branch is up to date with the <code>main</code> branch on the origin)</li>
<li>uninformative output (for example how I <em>STILL</em> can&rsquo;t reliably figure out which code comes from which branch when I&rsquo;m looking at a merge conflict)</li>
<li>a lack of guidance around handling diverged branches (for example how when you run <code>git pull</code> and your branch has diverged from the origin, it doesn&rsquo;t give you great guidance about how to handle the situation)</li>
<li>inconsistent behaviour (for example how git&rsquo;s reflogs are almost always append-only, EXCEPT for the stash, where git will delete entries when you run <code>git stash drop</code>)</li>
</ul>
<p>The more I heard from people about how confusing they find git, the more it
became clear that git really does not make it easy to figure out what its
internal logic is just by using it.</p>
<h3 id="handling-git-s-weirdnesses-becomes-pretty-routine">handling git&rsquo;s weirdnesses becomes pretty routine</h3>
<p>The previous section made git sound really bad, like &ldquo;how can anyone possibly
use this thing?&rdquo;.</p>
<p>But my experience is that after I learned what git actually means by all of its
weird error messages, dealing with it became pretty routine! I&rsquo;ll see an
<code>error: failed to push some refs to 'github.com:jvns/wizard-zines-site'</code>,
realize &ldquo;oh right, probably a coworker made some changes to <code>main</code> since I last
ran <code>git pull</code>&rdquo;, run <code>git pull --rebase</code> to incorporate their changes, and move
on with my day. The whole thing takes about 10 seconds.</p>
<p>Or if I see a <code>You are in 'detached HEAD' state</code> warning, I&rsquo;ll just make sure
to run <code>git checkout mybranch</code> before continuing to write code. No big deal.</p>
<p>For me (and for a lot of folks I talk to about git!), dealing with git&rsquo;s weird
language can become so normal that you totally forget why anybody would even
find it weird.</p>
<h3 id="a-little-bit-of-internals">a little bit of internals</h3>
<p>One of my biggest questions when writing this zine was how much to focus on
what&rsquo;s in the <code>.git</code> directory. We ended up deciding to include a couple of
pages about internals (&ldquo;inside .git&rdquo;, pages 14-15), but otherwise focus more on
git&rsquo;s <em>behaviour</em> when you use it and why sometimes git behaves in unexpected
ways.</p>
<p>This is partly because there are lots of great guides to git&rsquo;s internals
out there already (<a href="https://maryrosecook.com/blog/post/git-from-the-inside-out">1</a>, <a href="https://shop.jcoglan.com/building-git/">2</a>), and partly because I think even if you <em>have</em> read one
of these guides to git&rsquo;s internals, it isn&rsquo;t totally obvious how to connect
that information to what you actually see in git&rsquo;s user interface.</p>
<p>For example: it&rsquo;s easy to find documentation about remotes in git &ndash;
for example <a href="https://git-scm.com/book/en/v2/Git-Branching-Remote-Branches">this page</a> says:</p>
<blockquote>
<p>Remote-tracking branches [&hellip;] remind you where the branches in your remote
repositories were the last time you connected to them.</p>
</blockquote>
<p>But even if you&rsquo;ve read that, you might not realize that the statement <code>Your branch is up to date with 'origin/main'</code> in <code>git status</code> doesn&rsquo;t necessarily
mean that you&rsquo;re actually up to date with the remote <code>main</code> branch.</p>
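<p>Here&rsquo;s a small sketch you can run to see this for yourself. It builds two throwaway repositories in a temp directory: <code>upstream</code> (standing in for the remote) and <code>mine</code> (a clone of it). Those names, the variable names, and the demo identity passed with <code>-c</code> are all made up for the example.</p>
<pre><code>work=$(mktemp -d)

# "upstream" stands in for the remote, with one commit on main.
git init -q "$work/upstream"
git -C "$work/upstream" symbolic-ref HEAD refs/heads/main
git -C "$work/upstream" -c user.name=demo -c user.email=demo@example.com commit -q --allow-empty -m "first"

# "mine" is my local clone of it.
git clone -q "$work/upstream" "$work/mine"

# Someone else adds a commit to the remote after I cloned.
git -C "$work/upstream" -c user.name=demo -c user.email=demo@example.com commit -q --allow-empty -m "second"

# My remote-tracking branch origin/main has not moved yet,
# so git status still reports my branch as up to date.
git -C "$work/mine" status
behind_before=$(git -C "$work/mine" rev-list --count HEAD..origin/main)

# Only after a fetch does origin/main reflect what is on the remote.
git -C "$work/mine" fetch -q origin
git -C "$work/mine" status
behind_after=$(git -C "$work/mine" rev-list --count HEAD..origin/main)

echo "commits behind origin/main before fetch: $behind_before, after fetch: $behind_after"
</code></pre>
<p>The point is that <code>git status</code> only compares against the local <code>origin/main</code> remote-tracking branch, so it can only be as fresh as your last fetch.</p>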
<p>So in general in the zine we focus on the behaviour you see in Git&rsquo;s UI, and
then explain how that relates to what&rsquo;s happening internally in Git.</p>
<h3 id="the-cheat-sheet">the cheat sheet</h3>
<p>The zine also comes with a free printable cheat sheet: (click to get a PDF version)</p>
<a href="https://wizardzines.com/git-cheat-sheet.pdf">
  <img width="600px" src="https://wizardzines.com/images/cheat-sheet-smaller.png">
</a>
<h3 id="it-comes-with-an-html-transcript">it comes with an HTML transcript!</h3>
<p>The zine also comes with an HTML transcript, to (hopefully) make it easier to
read on a screen reader! Our Operations Manager, Lee, transcribed all of the
pages and wrote image descriptions. I&rsquo;d love feedback about the experience of
reading the zine on a screen reader if you try it.</p>
<h3 id="i-really-do-love-git">I really do love git</h3>
<p>I&rsquo;ve been pretty critical about git in this post, but I only write zines about
technologies I love, and git is no exception.</p>
<p>Some reasons I love git:</p>
<ul>
<li>it&rsquo;s fast!</li>
<li>it&rsquo;s backwards compatible! I learned how to use it 10 years ago and
everything I learned then is still true</li>
<li>there&rsquo;s tons of great free Git hosting available out there (GitHub! Gitlab! a
million more!), so I can easily back up all my code</li>
<li>simple workflows are REALLY simple (if I&rsquo;m working on a project on my own, I
can just run <code>git commit -am 'whatever'</code> and <code>git push</code> over and over again and it
works perfectly)</li>
<li>Almost every internal file in git is a pretty simple text file (or has a
version which is a text file), which makes me feel like I can always
understand exactly what&rsquo;s going on under the hood if I want to.</li>
</ul>
<p>I hope this zine helps some of you love it too.</p>
<h3 id="people-who-helped-with-this-zine">people who helped with this zine</h3>
<p>I don&rsquo;t make these zines by myself!</p>
<p>I worked with <a href="https://marieflanagan.com/">Marie Claire LeBlanc Flanagan</a> every
morning for 8 months to write clear explanations of git.</p>
<p>The cover is by Vladimir Kašiković,
Gersande La Flèche did copy editing,
James Coglan (of the great <a href="https://shop.jcoglan.com/building-git/">Building
Git</a>) did technical review, our
Operations Manager Lee did the transcription as well as a million other
things, my partner Kamal read the zine and told me which parts were off (as he
always does), and I had a million great conversations with Marco Rogers about
git.</p>
<p>And finally, I want to thank all the beta readers! There were 66 this time
which is a record! They left hundreds of comments about what was confusing,
what they learned, and which of my jokes were funny. It&rsquo;s always hard to hear
from beta readers that a page I thought made sense is actually extremely
confusing, and fixing those problems before the final version makes the zine so
much better.</p>
<h3 id="get-the-zine">get the zine</h3>
<p>Here are some links to get the zine again:</p>
<ul>
<li>get <a href="https://wizardzines.com/zines/git">How Git Works</a></li>
<li>get a <a href="https://wizardzines.com/zines/all-the-zines/">14-pack of all my zines here</a>.</li>
</ul>
<p>As always, you can get either a PDF version to print at home or a print version
shipped to your house. The only caveat is print orders will ship in <strong>July</strong> &ndash; I
need to wait for orders to come in to get an idea of how many I should print
before sending it to the printer.</p>
<h3 id="thank-you">thank you</h3>
<p>As always: if you&rsquo;ve bought zines in the past, thank you for all your support
over the years. And thanks to all of you (1000+ people!!!) who have already
bought the zine in the first 3 days. It&rsquo;s already set a record for most zines
sold in a single day and I&rsquo;ve been really blown away.</p>
]]></content>
    </entry>
    <entry>
        <title type="html"><![CDATA[Notes on git's error messages]]></title>
        <link href="https://jvns.ca/blog/2024/04/10/notes-on-git-error-messages/"/>
        <updated>2024-04-10T12:43:14+00:00</updated>
        <id>https://jvns.ca/blog/2024/04/10/notes-on-git-error-messages/</id>
        <content type="html"><![CDATA[<p>While writing about Git, I&rsquo;ve noticed that a lot of folks struggle with Git&rsquo;s
error messages. I&rsquo;ve had many years to get used to these error messages so it
took me a really long time to understand <em>why</em> folks were confused, but having
thought about it much more, I&rsquo;ve realized that:</p>
<ol>
<li>sometimes I actually <em>am</em> confused by the error messages, I&rsquo;m just used to
being confused</li>
<li>I have a bunch of strategies for getting more information when the error
message git gives me isn&rsquo;t very informative</li>
</ol>
<p>So in this post, I&rsquo;m going to go through a bunch of Git&rsquo;s error messages,
list a few things that I think are confusing about each one, and talk
about what I do when I&rsquo;m confused by the message.</p>
<h3 id="improving-error-messages-isn-t-easy">improving error messages isn&rsquo;t easy</h3>
<p>Before we start, I want to say that trying to think about why these error
messages are confusing has given me a lot of respect for how difficult
maintaining Git is. I&rsquo;ve been thinking about Git for months, and for some of
these messages I really have no idea how to improve them.</p>
<p>Some things that seem hard to me about improving error messages:</p>
<ul>
<li>if you come up with an idea for a new message, it&rsquo;s hard to tell if it&rsquo;s actually better!</li>
<li>work like improving error messages often <a href="https://lwn.net/Articles/959768/">isn&rsquo;t funded</a></li>
<li>the error messages have to be translated (git&rsquo;s error messages are translated into <a href="https://github.com/git/git/tree/master/po">19 languages</a>!)</li>
</ul>
<p>That said, if you find these messages confusing, hopefully some of these notes
will help clarify them a bit.</p>
<style>
.error {
  color: #db322e;
}
.warning {
  color: #765900;
}
.bg {
  color: #fdf6e3
}
pre {
  background-color: #fdf6e3;
  padding: 10px;
  border-radius: 5px;
  /* wrap long lines */
  white-space: pre-wrap;
}

h2 a {
  color: black;
  text-decoration: none;
}

article span {
  padding: 0;
}

article a:hover {
  text-decoration: underline;
}
</style>
<h2 id="git-push-on-a-diverged-branch">
  <a href="#git-push-on-a-diverged-branch">
  error: <code>git push</code> on a diverged branch
  </a>
</h2>
<pre>
$ git push
To github.com:jvns/int-exposed
<span class="error">! [rejected]        main -> main (non-fast-forward)</span>
<span class="warning">error: failed to push some refs to 'github.com:jvns/int-exposed'
hint: Updates were rejected because the tip of your current branch is behind
hint: its remote counterpart. Integrate the remote changes (e.g.
hint: 'git pull ...') before pushing again.
hint: See the 'Note about fast-forwards' in 'git push --help' for details.</span>

$ git status
On branch main
Your branch and 'origin/main' have diverged,
and have 2 and 1 different commits each, respectively.
</pre>
<p>Some things I find confusing about this:</p>
<ol>
<li>You get the exact same error message whether the branch is just <strong>behind</strong>
or the branch has <strong>diverged</strong>. There&rsquo;s no way to tell which it is from this
message: you need to run <code>git status</code> or <code>git pull</code> to find out.</li>
<li>It says <code>failed to push some refs</code>, but it&rsquo;s not totally clear <em>which</em> references it
failed to push. I believe everything that failed to push is listed with <code>! [rejected]</code> on the previous line&ndash; in this case just the <code>main</code> branch.</li>
</ol>
<p><strong>What I like to do if I&rsquo;m confused:</strong></p>
<ul>
<li>I&rsquo;ll run <code>git status</code> to figure out what the state of my current branch is.</li>
<li>I think I almost never try to push more than one branch at a time, so I
usually totally ignore git&rsquo;s notes about which specific branch failed to push
&ndash; I just assume that it&rsquo;s my current branch</li>
</ul>
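<p>Here&rsquo;s a rough sketch of one way to tell &ldquo;behind&rdquo; apart from &ldquo;diverged&rdquo; with a single command. It assumes the remote branch is <code>origin/main</code> and that you&rsquo;ve fetched recently; the function name <code>branch_state</code> is made up for this example.</p>
<pre><code># usage: branch_state [upstream]   (upstream defaults to origin/main, an assumption)
branch_state() {
  upstream="${1:-origin/main}"
  # prints two numbers: commits only on HEAD, then commits only on the upstream
  counts=$(git rev-list --left-right --count "HEAD...$upstream") || return 1
  set -- $counts
  case "$1,$2" in
    0,0) echo "up to date" ;;
    0,*) echo "behind by $2" ;;
    *,0) echo "ahead by $1" ;;
    *)   echo "diverged: $1 local, $2 remote" ;;
  esac
}
</code></pre>
<p>In the example above, where <code>git status</code> reports &ldquo;2 and 1 different commits&rdquo;, this would print <code>diverged: 2 local, 1 remote</code>.</p>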
<h2 id="git-pull-on-a-diverged-branch">
  <a href="#git-pull-on-a-diverged-branch">
  error: <code>git pull</code> on a diverged branch
  </a>
</h2>
<pre>
$ git pull
<span class="warning">hint: You have divergent branches and need to specify how to reconcile them.
hint: You can do so by running one of the following commands sometime before
hint: your next pull:
hint:
hint:   git config pull.rebase false  # merge
hint:   git config pull.rebase true   # rebase
hint:   git config pull.ff only       # fast-forward only
hint:
hint: You can replace "git config" with "git config --global" to set a default
hint: preference for all repositories. You can also pass --rebase, --no-rebase,
hint: or --ff-only on the command line to override the configured default per
hint: invocation.</span>
fatal: Need to specify how to reconcile divergent branches.
</pre>
<p>The main thing I think is confusing here is that git is presenting you with a
kind of overwhelming number of options: it&rsquo;s saying that you can either:</p>
<ol>
<li>configure <code>pull.rebase false</code>, <code>pull.rebase true</code>, or <code>pull.ff only</code> locally</li>
<li>or configure them globally</li>
<li>or run <code>git pull --rebase</code> or <code>git pull --no-rebase</code></li>
</ol>
<p>It&rsquo;s very hard to imagine how a beginner to git could easily use this hint to
sort through all these options on their own.</p>
<p>If I were explaining this to a friend, I&rsquo;d say something like &ldquo;you can use <code>git pull --rebase</code>
or <code>git pull --no-rebase</code> to resolve this with a rebase or merge
<em>right now</em>, and if you want to set a permanent preference, you can do that
with <code>git config pull.rebase false</code> or <code>git config pull.rebase true</code>.&rdquo;</p>
<p><code>git config pull.ff only</code> feels a little redundant to me because that&rsquo;s git&rsquo;s
default behaviour anyway (though it wasn&rsquo;t always).</p>
<p><strong>What I like to do here:</strong></p>
<ul>
<li>run <code>git status</code> to see the state of my current branch</li>
<li>maybe run <code>git log origin/main</code> or <code>git log</code> to see what the diverged commits are</li>
<li>usually run <code>git pull --rebase</code> to resolve it</li>
<li>sometimes I&rsquo;ll run <code>git push --force</code> or <code>git reset --hard origin/main</code> if I
want to throw away my local work or remote work (for example because I
accidentally committed to the wrong branch, or because I ran <code>git commit --amend</code> on a personal branch that only I&rsquo;m using and want to force push)</li>
</ul>
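<p>To see exactly which commits have diverged, git&rsquo;s <code>A..B</code> range syntax (&ldquo;commits in B that aren&rsquo;t in A&rdquo;) is handy. This is a small sketch that assumes the upstream is <code>origin/main</code>; <code>diverged_commits</code> is a hypothetical helper name.</p>
<pre><code># usage: diverged_commits [upstream]   (upstream defaults to origin/main, an assumption)
diverged_commits() {
  upstream="${1:-origin/main}"
  echo "only in your local branch:"
  git log --oneline "$upstream..HEAD"
  echo "only in $upstream:"
  git log --oneline "HEAD..$upstream"
}
</code></pre>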
<h2 id="git-checkout-asdf">
  <a href="#git-checkout-asdf">
  error: <code>git checkout asdf</code> (a branch that doesn't exist)
  </a>
</h2>
<pre>
$ git checkout asdf
error: pathspec 'asdf' did not match any file(s) known to git
</pre>
<p>This is a little weird because my intention was to check out a <strong>branch</strong>,
but <code>git checkout</code> is complaining about a <strong>path</strong> that doesn&rsquo;t exist.</p>
<p>This is happening because <code>git checkout</code>&rsquo;s first argument can be either a
branch or a path, and git has no way of knowing which one you intended. This
seems tricky to improve, but I might expect something like &ldquo;No such branch,
commit, or path: asdf&rdquo;.</p>
<p><strong>What I like to do here:</strong></p>
<ul>
<li>in theory it would be good to use <code>git switch</code> instead, but I keep using <code>git checkout</code> anyway</li>
<li>generally I just remember that I need to decode this as &ldquo;branch <code>asdf</code> doesn&rsquo;t exist&rdquo;</li>
</ul>
<h2 id="git-switch-asdf">
  <a href="#git-switch-asdf">
  error: <code>git switch asdf</code> (a branch that doesn't exist)
  </a>
</h2>
<pre>
$ git switch asdf
fatal: invalid reference: asdf
</pre>
<p><code>git switch</code> only accepts a branch as an argument (unless you pass <code>-d</code>), so why is it saying <code>invalid reference: asdf</code> instead of <code>invalid branch: asdf</code>?</p>
<p>I think the reason is that internally, <code>git switch</code> is trying to be helpful in its error messages: if you run <code>git switch v0.1</code> to switch to a tag, it&rsquo;ll say:</p>
<pre><code>$ git switch v0.1
fatal: a branch is expected, got tag 'v0.1'
</code></pre>
<p>So what git is trying to communicate with <code>fatal: invalid reference: asdf</code> is
&ldquo;<code>asdf</code> isn&rsquo;t a branch, but it&rsquo;s not a tag either, or any other reference&rdquo;. From my various <a href="https://jvns.ca/blog/2024/03/28/git-poll-results/">git polls</a> my impression is that
a lot of git users have literally no idea what a &ldquo;reference&rdquo; is in git, so I&rsquo;m not sure if that&rsquo;s coming across.</p>
<p><strong>What I like to do here:</strong></p>
<p>90% of the time when a git error message says <code>reference</code> I just mentally
replace it with <code>branch</code> in my head.</p>
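<p>If it helps to make &ldquo;reference&rdquo; concrete: branches live under <code>refs/heads/</code> and tags under <code>refs/tags/</code>, and <code>git show-ref --verify</code> can check whether a given one exists. Here&rsquo;s a minimal sketch; the function name <code>what_is</code> is just for illustration.</p>
<pre><code># usage: what_is NAME
what_is() {
  if git show-ref --verify --quiet "refs/heads/$1"; then
    echo "$1 is a branch"
  elif git show-ref --verify --quiet "refs/tags/$1"; then
    echo "$1 is a tag"
  else
    echo "$1 is not a branch or a tag"
  fi
}
</code></pre>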
<h2 id="detached-head">
  error: <a href="#detached-head"><code>git checkout HEAD^</code></a>
</h2>
<pre>$ git checkout HEAD^
Note: switching to 'HEAD^'.

You are in 'detached HEAD' state. You can look around, make experimental
changes and commit them, and you can discard any commits you make in this
state without impacting any branches by switching back to a branch.

If you want to create a new branch to retain commits you create, you may
do so (now or later) by using -c with the switch command. Example:

  git switch -c <new-branch-name>

Or undo this operation with:

  git switch -

Turn off this advice by setting config variable advice.detachedHead to false

HEAD is now at 182cd3f add "swap byte order" button
</pre>
<p>
This is a tough one. Definitely a lot of people are confused about this
message, but obviously there's been a lot of effort to improve it too. I don't
have anything smart to say about this one.
</p>
<p><strong>What I like to do here:</strong></p>
<ul>
<li>my shell prompt tells me if I&rsquo;m in detached HEAD state, and generally I can remember not to make new commits while in that state</li>
<li>when I&rsquo;m done looking at whatever old commits I wanted to look at, I&rsquo;ll run <code>git checkout main</code> or something to go back to a branch</li>
</ul>
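<p>One way a prompt (or a script) could detect this state is with <code>git symbolic-ref</code>, which only succeeds when <code>HEAD</code> points at a branch. This is a sketch of the idea, not how any particular prompt does it, and <code>head_state</code> is a made-up name.</p>
<pre><code>head_state() {
  # symbolic-ref fails (quietly, with --quiet) when HEAD is detached
  if branch=$(git symbolic-ref --quiet --short HEAD); then
    echo "on branch $branch"
  else
    echo "detached at $(git rev-parse --short HEAD)"
  fi
}
</code></pre>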
<h2 id="rebase-in-progress">
  <a href="#rebase-in-progress">
  message: <code>git status</code> when a rebase is in progress
  </a>  
</h2>
<p>This isn&rsquo;t an error message, but I still find it a little confusing on its own:</p>
<pre>
$ git status
<span class="error">interactive rebase in progress;</span> onto c694cf8
Last command done (1 command done):
   pick 0a9964d wip
No commands remaining.
You are currently rebasing branch 'main' on 'c694cf8'.
  (fix conflicts and then run "git rebase --continue")
  (use "git rebase --skip" to skip this patch)
  (use "git rebase --abort" to check out the original branch)

Unmerged paths:
  (use "git restore --staged <file>..." to unstage)
  (use "git add <file>..." to mark resolution)
  <span class="error">both modified:   index.html</span>

no changes added to commit (use "git add" and/or "git commit -a")
</pre>
<p>Two things I think could be clearer here:</p>
<ol>
<li>I think it would be nice if <code>You are currently rebasing branch 'main' on 'c694cf8'.</code> were on the first line instead of the 5th line &ndash; right now the first line doesn&rsquo;t say which branch you&rsquo;re rebasing.</li>
<li>In this case, <code>c694cf8</code> is actually <code>origin/main</code>, so I feel like <code>You are currently rebasing branch 'main' on 'origin/main'</code> might be even clearer.</li>
</ol>
<p><strong>What I like to do here:</strong></p>
<p>My shell prompt includes the branch that I&rsquo;m currently rebasing, so I rely on that instead of the output of <code>git status</code>.</p>
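<p>For the curious, here&rsquo;s a sketch of how a script could find the branch being rebased. It relies on git storing that name in a <code>head-name</code> file inside <code>.git/rebase-merge</code> (or <code>.git/rebase-apply</code>, depending on the rebase backend). That&rsquo;s an internal detail rather than a documented interface, so treat the file layout as an assumption; <code>rebasing_branch</code> is a hypothetical name.</p>
<pre><code>rebasing_branch() {
  for dir in rebase-merge rebase-apply; do
    f=$(git rev-parse --git-path "$dir/head-name")
    if [ -f "$f" ]; then
      # the file contains something like refs/heads/main
      sed 's|^refs/heads/||' "$f"
      return 0
    fi
  done
  # no rebase in progress
  return 1
}
</code></pre>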
<h2 id="merge-deleted">
  <a href="#merge-deleted">
  error: <code>git rebase</code> when a file has been deleted
  </a>
</h2>
<pre>
$ git rebase main
CONFLICT (modify/delete): index.html deleted in 0ce151e (wip) and modified in HEAD.  Version HEAD of index.html left in tree.
error: could not apply 0ce151e... wip
</pre>
<p>The thing I still find confusing about this is &ndash; <code>index.html</code> was modified in
<code>HEAD</code>. But what is <code>HEAD</code>? Is it the commit I was working on when I started
the merge/rebase, or is it the commit from the other branch? (the answer is
&ldquo;<code>HEAD</code> is your branch if you&rsquo;re doing a merge, and it&rsquo;s the &ldquo;other branch&rdquo; if
you&rsquo;re doing a rebase&rdquo;, but I always find that hard to remember)</p>
<p>I think I would personally find it easier to understand if the message listed the branch names if possible, something like this:</p>
<pre><code>CONFLICT (modify/delete): index.html deleted on `main` and modified on `mybranch`
</code></pre>
<h2 id="merge-ours">
  <a href="#merge-ours">
  error: <code>git status</code> during a merge or rebase (who is "them"?)
  </a>
</h2>
<pre>
$ git status 
On branch master
You have unmerged paths.
  (fix conflicts and run "git commit")
  (use "git merge --abort" to abort the merge)

Unmerged paths:
  (use "git add/rm &lt;file&gt;..." as appropriate to mark resolution)
  deleted by them: the_file

no changes added to commit (use "git add" and/or "git commit -a")
</pre>
<p>I find this one confusing in exactly the same way as the previous message: it
says <code>deleted by them:</code>, but what &ldquo;them&rdquo; refers to depends on whether you did a merge or rebase or cherry-pick.</p>
<ul>
<li>for a merge, <code>them</code> is the other branch you merged in</li>
<li>for a rebase, <code>them</code> is the branch that you were on when you ran <code>git rebase</code></li>
<li>for a cherry-pick, I guess it&rsquo;s the commit you cherry-picked</li>
</ul>
<p><strong>What I like to do if I&rsquo;m confused:</strong></p>
<ul>
<li>try to remember what I did</li>
<li>run <code>git show main --stat</code> or something to see what I did on the <code>main</code> branch if I can&rsquo;t remember</li>
</ul>
<h2 id="git-clean">
  <a href="#git-clean">
  error: <code>git clean</code>
  </a>
</h2>
<pre>
$ git clean
fatal: clean.requireForce defaults to true and neither -i, -n, nor -f given; refusing to clean
</pre>
<p>I just find it a bit confusing that you need to look up what <code>-i</code>, <code>-n</code> and
<code>-f</code> are to be able to understand this error message. I&rsquo;m personally way too
lazy to do that so even though I&rsquo;ve probably been using <code>git clean</code> for 10
years I still had no idea what <code>-i</code> stood for (<code>interactive</code>) until I was
writing this down.</p>
<p><strong>What I like to do if I&rsquo;m confused:</strong></p>
<p>Usually I just chaotically run <code>git clean -f</code> to delete all my untracked files
and hope for the best, though I might actually switch to <code>git clean -i</code>  now
that I know what <code>-i</code> stands for. Seems a lot safer.</p>
<h3 id="that-s-all">that&rsquo;s all!</h3>
<p>Hopefully some of this is helpful!</p>
]]></content>
    </entry>
</feed>
Raw text
<feed xmlns="http://www.w3.org/2005/Atom">
  <title><![CDATA[Julia Evans]]></title>
  <link href="http://jvns.ca/atom.xml" rel="self"/>
  <link href="http://jvns.ca"/>
  <updated>2025-03-07T13:18:31+00:00</updated>
  <id>http://jvns.ca</id>
  <author>
      <name>Julia Evans</name>
  </author>
  <generator uri="http://gohugo.io/">Hugo</generator>

  
  <entry>
    <title type="html"><![CDATA[Standards for ANSI escape codes]]></title>
    <link href="https://jvns.ca/blog/2025/03/07/escape-code-standards/"/>
    <updated>2025-03-07T00:00:00+00:00</updated>
    <id>https://jvns.ca/blog/2025/03/07/escape-code-standards/</id>
    <content type="html"><![CDATA[<p>Hello! Today I want to talk about ANSI escape codes.</p>
<p>For a long time I was vaguely aware of ANSI escape codes (&ldquo;that&rsquo;s how you make
text red in the terminal and stuff&rdquo;) but I had no real understanding of where they were
supposed to be defined or whether or not there were standards for them. I just
had a kind of vague &ldquo;there be dragons&rdquo; feeling around them. While learning
about the terminal this year, I&rsquo;ve learned that:</p>
<ol>
<li>ANSI escape codes are responsible for a lot of usability improvements
in the terminal (did you know there&rsquo;s a way to copy to your system clipboard
when SSHed into a remote machine?? It&rsquo;s an escape code called <a href="https://jvns.ca/til/vim-osc52/">OSC 52</a>!)</li>
<li>They aren&rsquo;t completely standardized, and because of that they don&rsquo;t always
work reliably. And because they&rsquo;re also invisible, it&rsquo;s extremely
frustrating to troubleshoot escape code issues.</li>
</ol>
<p>So I wanted to put together a list for myself of some standards that exist
around escape codes, because I want to know if they <em>have</em> to feel unreliable
and frustrating, or if there&rsquo;s a future where we could all rely on them with
more confidence.</p>
<ul>
<li><a href="#what-s-an-escape-code">what&rsquo;s an escape code?</a></li>
<li><a href="#ecma-48">ECMA-48</a></li>
<li><a href="#xterm-control-sequences">xterm control sequences</a></li>
<li><a href="#terminfo">terminfo</a></li>
<li><a href="#should-programs-use-terminfo">should programs use terminfo?</a></li>
<li><a href="#is-there-a-single-common-set-of-escape-codes">is there a &ldquo;single common set&rdquo; of escape codes?</a></li>
<li><a href="#some-reasons-to-use-terminfo">some reasons to use terminfo</a></li>
<li><a href="#some-more-documents-standards">some more documents/standards</a></li>
<li><a href="#why-i-think-this-is-interesting">why I think this is interesting</a></li>
</ul>
<h3 id="what-s-an-escape-code">what&rsquo;s an escape code?</h3>
<p>Have you ever pressed the left arrow key in your terminal and seen <code>^[[D</code>?
That&rsquo;s an escape code! It&rsquo;s called an &ldquo;escape code&rdquo; because the first character
is the &ldquo;escape&rdquo; character, which is usually written as <code>ESC</code>, <code>\x1b</code>, <code>\E</code>,
<code>\033</code>, or <code>^[</code>.</p>
<p>Escape codes are how your terminal emulator communicates various kinds of
information (colours, mouse movement, etc) with programs running in the
terminal. There are two kinds of escape codes:</p>
<ol>
<li><strong>input codes</strong> which your terminal emulator sends for keypresses or mouse
movements that don&rsquo;t fit into Unicode. For example &ldquo;left arrow key&rdquo; is
<code>ESC[D</code>, &ldquo;Ctrl+left arrow&rdquo; might be <code>ESC[1;5D</code>, and clicking the mouse might
be something like <code>ESC[M :3</code>.</li>
<li><strong>output codes</strong> which programs can print out to colour text, move the
cursor around, clear the screen, hide the cursor, copy text to the
clipboard, enable mouse reporting, set the window title, etc.</li>
</ol>
<p>Now let&rsquo;s talk about standards!</p>
<h3 id="ecma-48">ECMA-48</h3>
<p>The first standard I found relating to escape codes was
<a href="https://ecma-international.org/wp-content/uploads/ECMA-48_5th_edition_june_1991.pdf">ECMA-48</a>,
which was originally published in 1976.</p>
<p>ECMA-48 does two things:</p>
<ol>
<li>Define some general <em>formats</em> for escape codes (like &ldquo;CSI&rdquo; codes, which are
<code>ESC[</code> + something and &ldquo;OSC&rdquo; codes, which are <code>ESC]</code> + something)</li>
<li>Define some specific escape codes, like how &ldquo;move the cursor to the left&rdquo; is
<code>ESC[D</code>, or &ldquo;turn text red&rdquo; is  <code>ESC[31m</code>. In the spec, the &ldquo;cursor left&rdquo;
one is called <code>CURSOR LEFT</code> and the one for changing colours is called
<code>SELECT GRAPHIC RENDITION</code>.</li>
</ol>
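<p>To make those two codes concrete, here&rsquo;s a tiny sketch using <code>printf</code>, where <code>\033</code> is the escape character. The function names are made up for this example, and <code>ESC[0m</code> (reset to the default rendition) is added so the red doesn&rsquo;t leak into the rest of your terminal.</p>
<pre><code>red() {
  # ESC[31m turns text red (SELECT GRAPHIC RENDITION), ESC[0m resets it
  printf '\033[31m%s\033[0m\n' "$1"
}

cursor_left() {
  # ESC[D moves the cursor one column to the left (CURSOR LEFT)
  printf '\033[D'
}
</code></pre>
<p>Running <code>red hello</code> in most terminal emulators should print &ldquo;hello&rdquo; in red.</p>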
<p>The formats are extensible, so there&rsquo;s room for others to define more escape
codes in the future. Lots of escape codes that are popular today aren&rsquo;t defined
in ECMA-48: for example it&rsquo;s pretty common for terminal applications (like vim,
htop, or tmux) to support using the mouse, but ECMA-48 doesn&rsquo;t define escape
codes for the mouse.</p>
<h3 id="xterm-control-sequences">xterm control sequences</h3>
<p>There are a bunch of escape codes that aren&rsquo;t defined in ECMA-48, for example:</p>
<ul>
<li>enabling mouse reporting (where did you click in your terminal?)</li>
<li>bracketed paste (did you paste that text or type it in?)</li>
<li>OSC 52 (which terminal applications can use to copy text to your system clipboard)</li>
</ul>
<p>I believe (correct me if I&rsquo;m wrong!) that these and some others came from
xterm, are documented in <a href="https://invisible-island.net/xterm/ctlseqs/ctlseqs.html">XTerm Control Sequences</a>, and have
been widely implemented by other terminal emulators.</p>
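<p>As an example of what one of these looks like, here&rsquo;s a sketch of an OSC 52 copy: <code>ESC]52;c;</code>, then the text encoded as base64, then a BEL character. It assumes a <code>base64</code> command is available, and whether it actually works depends on your terminal emulator (some disable it by default). <code>osc52_copy</code> is a hypothetical name.</p>
<pre><code># usage: osc52_copy TEXT
osc52_copy() {
  # "c" selects the clipboard; tr strips any line wrapping that base64 adds
  printf '\033]52;c;%s\a' "$(printf '%s' "$1" | base64 | tr -d '\n')"
}
</code></pre>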
<p>This list of &ldquo;what xterm supports&rdquo; is not a standard exactly, but xterm is
extremely influential and so it seems like an important document.</p>
<h3 id="terminfo">terminfo</h3>
<p>In the 80s (and to some extent today, but my understanding is that it was MUCH
more dramatic in the 80s) there was a huge amount of variation in what escape
codes terminals actually supported.</p>
<p>To deal with this, there&rsquo;s a database of escape codes for various terminals
called &ldquo;terminfo&rdquo;.</p>
<p>It looks like the standard for terminfo is called <a href="https://publications.opengroup.org/c243-1">X/Open Curses</a>, though you need to create
an account to view that standard for some reason. It defines the database format as well
as a C library interface (&ldquo;curses&rdquo;) for accessing the database.</p>
<p>For example you can run this bash snippet to see every possible escape code for
&ldquo;clear screen&rdquo; for all of the different terminals your system knows about:</p>
<pre><code>for term in $(toe -a | awk '{print $1}')
do
  echo $term
  infocmp -1 -T &quot;$term&quot; 2&gt;/dev/null | grep 'clear=' | sed 's/clear=//g;s/,//g'
done
</code></pre>
<p>On my system (and probably every system I&rsquo;ve ever used?), the terminfo database is managed by ncurses.</p>
<h3 id="should-programs-use-terminfo">should programs use terminfo?</h3>
<p>I think it&rsquo;s interesting that there are two main approaches that applications
take to handling ANSI escape codes:</p>
<ol>
<li>Use the terminfo database to figure out which escape codes to use, depending
on what&rsquo;s in the <code>TERM</code> environment variable. Fish does this, for example.</li>
<li>Identify a &ldquo;single common set&rdquo; of escape codes which works in &ldquo;enough&rdquo;
terminal emulators and just hardcode those.</li>
</ol>
<p>Some examples of programs/libraries that take approach #2 (&ldquo;don&rsquo;t use terminfo&rdquo;) include:</p>
<ul>
<li><a href="https://github.com/mawww/kakoune/commit/c12699d2e9c2806d6ed184032078d0b84a3370bb">kakoune</a></li>
<li><a href="https://github.com/prompt-toolkit/python-prompt-toolkit/blob/165258d2f3ae594b50f16c7b50ffb06627476269/src/prompt_toolkit/input/ansi_escape_sequences.py#L5-L8">python-prompt-toolkit</a></li>
<li><a href="https://github.com/antirez/linenoise">linenoise</a></li>
<li><a href="https://github.com/rockorager/libvaxis">libvaxis</a></li>
<li><a href="https://github.com/chalk/chalk">chalk</a></li>
</ul>
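<p>In shell terms, the two approaches look roughly like this. <code>tput</code> is a command-line front end to terminfo and <code>setaf</code> is the terminfo capability for setting the foreground colour; the function names are made up, and this is only a sketch of the idea.</p>
<pre><code># approach 1: look up the escape code in terminfo, based on $TERM
red_terminfo() {
  tput setaf 1
}

# approach 2: hardcode the escape code and hope the terminal supports it
red_hardcoded() {
  printf '\033[31m'
}
</code></pre>
<p>On a terminal where <code>TERM</code> is something like <code>xterm</code>, both should produce the same bytes.</p>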
<p>I got curious about why folks might be moving away from terminfo and I found
this very interesting and extremely detailed
<a href="https://twoot.site/@bean/113056942625234032">rant about terminfo from one of the fish maintainers</a>, which argues that:</p>
<blockquote>
<p>[the terminfo authors] have done a lot of work that, at the time, was
extremely important and helpful. My point is that it no longer is.</p>
</blockquote>
<p>I&rsquo;m not going to do it justice so I&rsquo;m not going to summarize it, I think it&rsquo;s
worth reading.</p>
<h3 id="is-there-a-single-common-set-of-escape-codes">is there a &ldquo;single common set&rdquo; of escape codes?</h3>
<p>I was just talking about the idea that you can use a &ldquo;common set&rdquo; of escape
codes that will work for most people. But what is that set? Is there any agreement?</p>
<p>I really do not know the answer to this at all, but from doing some reading it
seems like it&rsquo;s some combination of:</p>
<ul>
<li>The codes that the VT100 supported (though some aren&rsquo;t relevant on modern terminals)</li>
<li>what&rsquo;s in ECMA-48 (which I think also has some things that are no longer relevant)</li>
<li>What xterm supports (though I&rsquo;d guess that not everything in there is actually widely supported enough)</li>
</ul>
<p>and maybe ultimately &ldquo;identify the terminal emulators you think your users are
going to use most frequently and test in those&rdquo;, the same way web developers do
when deciding which CSS features are okay to use.</p>
<p>I don&rsquo;t think there are any resources like <a href="https://caniuse.com/">Can I use&hellip;?</a> or
<a href="https://web-platform-dx.github.io/web-features/">Baseline</a> for the terminal
though. (in theory terminfo is supposed to be the &ldquo;caniuse&rdquo; for the terminal
but it seems like it often takes 10+ years to add new terminal features when
people invent them which makes it very limited)</p>
<h3 id="some-reasons-to-use-terminfo">some reasons to use terminfo</h3>
<p>I also asked on Mastodon why people found terminfo valuable in 2025 and got a
few reasons that made sense to me:</p>
<ul>
<li>some people expect to be able to use the <code>TERM</code> environment variable to
control how programs behave (for example with <code>TERM=dumb</code>), and there&rsquo;s
no standard for how that should work in a post-terminfo world</li>
<li>even though there&rsquo;s <em>less</em> variation between terminal emulators than
there was in the 80s, there&rsquo;s far from zero variation: there are graphical
terminals, the Linux framebuffer console, the situation you&rsquo;re in when
connecting to a server via its serial console, Emacs shell mode, and probably
more that I&rsquo;m missing</li>
<li>there is no one standard for what the &ldquo;single common set&rdquo; of escape codes
is, and sometimes programs use escape codes which aren&rsquo;t actually widely
supported enough</li>
</ul>
<h3 id="terminfo-user-agent-detection">terminfo &amp; user agent detection</h3>
<p>The way that ncurses uses the <code>TERM</code> environment variable to decide which
escape codes to use reminds me of how webservers used to sometimes use the
browser user agent to decide which version of a website to serve.</p>
<p>It also seems like it&rsquo;s had some of the same results &ndash; the way iTerm2 reports
itself as being &ldquo;xterm-256color&rdquo; feels similar to how Safari&rsquo;s user agent is
&ldquo;Mozilla/5.0 (Macintosh; Intel Mac OS X 14_7_4) AppleWebKit/605.1.15 (KHTML,
like Gecko) Version/18.3 Safari/605.1.15&rdquo;. In both cases the terminal emulator
/ browser ends up changing its user agent to get around user agent detection
that isn&rsquo;t working well.</p>
<p>On the web we ended up deciding that user agent detection was not a good
practice and to instead focus on standardization so we can serve the same
HTML/CSS to all browsers. I don&rsquo;t know if the same approach is the future in
the terminal though &ndash; I think the terminal landscape today is much more
fragmented than the web ever was as well as being much less well funded.</p>
<h3 id="some-more-documents-standards">some more documents/standards</h3>
<p>A few more documents and standards related to escape codes, in no particular order:</p>
<ul>
<li>the <a href="https://man7.org/linux/man-pages/man4/console_codes.4.html">Linux console_codes man page</a> documents
escape codes that Linux supports</li>
<li>how the <a href="https://vt100.net/docs/vt100-ug/chapter3.html">VT 100</a> handles escape codes &amp; control sequences</li>
<li>the <a href="https://sw.kovidgoyal.net/kitty/keyboard-protocol/">kitty keyboard protocol</a></li>
<li><a href="https://gist.github.com/egmontkob/eb114294efbcd5adb1944c9f3cb5feda">OSC 8</a> for links in the terminal (and notes on <a href="https://github.com/Alhadis/OSC8-Adoption?tab=readme-ov-file">adoption</a>)</li>
<li>A <a href="https://github.com/tmux/tmux/blob/882fb4d295deb3e4b803eb444915763305114e4f/tools/ansicode.txt">summary of ANSI standards from tmux</a></li>
<li>this <a href="https://iterm2.com/feature-reporting/">terminal features reporting specification from iTerm</a></li>
<li>sixel graphics</li>
</ul>
<h3 id="why-i-think-this-is-interesting">why I think this is interesting</h3>
<p>I sometimes see people saying that the unix terminal is &ldquo;outdated&rdquo;, and since I
love the terminal so much I&rsquo;m always curious about what incremental changes
might make it feel less &ldquo;outdated&rdquo;.</p>
<p>Maybe if we had a clearer standards landscape (like we do on the web!) it would
be easier for terminal emulator developers to build new features and for
authors of terminal applications to more confidently adopt those features so
that we can all benefit from them and have a richer experience in the terminal.</p>
<p>Obviously standardizing ANSI escape codes is not easy (ECMA-48 was first
published almost 50 years ago and we&rsquo;re still not there!). I don&rsquo;t even know
what all of the challenges are. But the situation with HTML/CSS/JS used to be
extremely bad too and now it&rsquo;s MUCH better, so maybe there&rsquo;s hope.</p>
]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[How to add a directory to your PATH]]></title>
    <link href="https://jvns.ca/blog/2025/02/13/how-to-add-a-directory-to-your-path/"/>
    <updated>2025-02-13T12:27:56+00:00</updated>
    <id>https://jvns.ca/blog/2025/02/13/how-to-add-a-directory-to-your-path/</id>
    <content type="html"><![CDATA[<p>I was talking to a friend about how to add a directory to your PATH today. It&rsquo;s
something that feels &ldquo;obvious&rdquo; to me since I&rsquo;ve been using the terminal for a
long time, but when I searched for instructions for how to do it, I actually
couldn&rsquo;t find something that explained all of the steps &ndash; a lot of them just
said &ldquo;add this to <code>~/.bashrc</code>&rdquo;, but what if you&rsquo;re not using bash? What if your
bash config is actually in a different file? And how are you supposed to figure
out which directory to add anyway?</p>
<p>So I wanted to try to write down some more complete directions and mention some
of the gotchas I&rsquo;ve run into over the years.</p>
<p>Here&rsquo;s a table of contents:</p>
<ul>
<li><a href="#step-1-what-shell-are-you-using">step 1: what shell are you using?</a></li>
<li><a href="#step-2-find-your-shell-s-config-file">step 2: find your shell&rsquo;s config file</a>
<ul>
<li><a href="#a-note-on-bash-s-config-file">a note on bash&rsquo;s config file</a></li>
</ul>
</li>
<li><a href="#step-3-figure-out-which-directory-to-add">step 3: figure out which directory to add</a>
<ul>
<li><a href="#step-3-1-double-check-it-s-the-right-directory">step 3.1: double check it&rsquo;s the right directory</a></li>
</ul>
</li>
<li><a href="#step-4-edit-your-shell-config">step 4: edit your shell config</a></li>
<li><a href="#step-5-restart-your-shell">step 5: restart your shell</a></li>
<li>problems:
<ul>
<li><a href="#problem-1-it-ran-the-wrong-program">problem 1: it ran the wrong program</a></li>
<li><a href="#problem-2-the-program-isn-t-being-run-from-your-shell">problem 2: the program isn&rsquo;t being run from your shell</a></li>
<li><a href="#problem-3-duplicate-path-entries-making-it-harder-to-debug">problem 3: duplicate PATH entries making it harder to debug</a></li>
<li><a href="#problem-4-losing-your-history-after-updating-your-path">problem 4: losing your history after updating your PATH</a></li>
</ul>
</li>
<li>notes:
<ul>
<li><a href="#a-note-on-source">a note on source</a></li>
<li><a href="#a-note-on-fish-add-path">a note on fish_add_path</a></li>
</ul>
</li>
</ul>
<h3 id="step-1-what-shell-are-you-using">step 1: what shell are you using?</h3>
<p>If you&rsquo;re not sure what shell you&rsquo;re using, here&rsquo;s a way to find out. Run this:</p>
<pre><code>ps -p $$ -o pid,comm=
</code></pre>
<ul>
<li>if you&rsquo;re using <strong>bash</strong>, it&rsquo;ll print out <code>97295 bash</code></li>
<li>if you&rsquo;re using <strong>zsh</strong>, it&rsquo;ll print out <code>97295 zsh</code></li>
<li>if you&rsquo;re using <strong>fish</strong>, it&rsquo;ll print out an error like &ldquo;In fish, please use
$fish_pid&rdquo; (<code>$$</code> isn&rsquo;t valid syntax in fish, but in any case the error
message tells you that you&rsquo;re using fish, which you probably already knew)</li>
</ul>
<p>Also bash is the default on Linux and zsh is the default on Mac OS (as of
2024). I&rsquo;ll only cover bash, zsh, and fish in these directions.</p>
<h3 id="step-2-find-your-shell-s-config-file">step 2: find your shell&rsquo;s config file</h3>
<ul>
<li>in zsh, it&rsquo;s probably <code>~/.zshrc</code></li>
<li>in bash, it might be <code>~/.bashrc</code>, but it&rsquo;s complicated, see the note in the next section</li>
<li>in fish, it&rsquo;s probably <code>~/.config/fish/config.fish</code> (you can run <code>echo $__fish_config_dir</code> if you want to be 100% sure)</li>
</ul>
<h3 id="a-note-on-bash-s-config-file">a note on bash&rsquo;s config file</h3>
<p>Bash has three possible config files: <code>~/.bashrc</code>, <code>~/.bash_profile</code>, and <code>~/.profile</code>.</p>
<p>If you&rsquo;re not sure which one your system is set up to use, I&rsquo;d recommend
testing this way:</p>
<ol>
<li>add <code>echo hi there</code> to your <code>~/.bashrc</code></li>
<li>Restart your terminal</li>
<li>If you see &ldquo;hi there&rdquo;, that means <code>~/.bashrc</code> is being used! Hooray!</li>
<li>Otherwise remove it and try the same thing with <code>~/.bash_profile</code></li>
<li>You can also try <code>~/.profile</code> if the first two options don&rsquo;t work.</li>
</ol>
<p>(there are a lot of <a href="https://blog.flowblok.id.au/2013-02/shell-startup-scripts.html">elaborate flow charts</a> out there that explain how bash
decides which config file to use but IMO it&rsquo;s not worth it to internalize them
and just testing is the fastest way to be sure)</p>
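<p>Before running that test, it can help to see which of the three files exist at all. Here&rsquo;s a rough sketch of that check. <code>list_bash_configs</code> is a made-up helper name, not a bash builtin, and a file existing doesn&rsquo;t prove bash is actually reading it, so the <code>echo hi there</code> test is still the way to be sure.</p>

```shell
# list_bash_configs is a hypothetical helper, not a bash builtin.
# It reports which of bash's three possible config files exist in a
# directory (your home directory by default).
list_bash_configs() {
  dir="${1:-$HOME}"
  for name in .bashrc .bash_profile .profile; do
    if [ -f "$dir/$name" ]; then
      echo "found: $dir/$name"
    else
      echo "missing: $dir/$name"
    fi
  done
}

list_bash_configs
```

<p>If only one of the three shows up as &ldquo;found&rdquo;, that&rsquo;s probably the one to try first.</p>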
<h3 id="step-3-figure-out-which-directory-to-add">step 3: figure out which directory to add</h3>
<p>Let&rsquo;s say that you&rsquo;re trying to install and run a program called <code>http-server</code>
and it doesn&rsquo;t work, like this:</p>
<pre><code>$ npm install -g http-server
$ http-server
bash: http-server: command not found
</code></pre>
<p>How do you find what directory <code>http-server</code> is in? Honestly in general this is
not that easy &ndash; often the answer is something like &ldquo;it depends on how npm is
configured&rdquo;. A few ideas:</p>
<ul>
<li>Often when you first set up a new installer (like <code>cargo</code>, <code>npm</code>, <code>homebrew</code>, etc),
it&rsquo;ll print out some directions about how to update
your PATH. So if you&rsquo;re paying attention you can get the directions then.</li>
<li>Sometimes installers will automatically update your shell&rsquo;s config file
to update your <code>PATH</code> for you</li>
<li>Sometimes just Googling &ldquo;where does npm install things?&rdquo; will turn up the
answer</li>
<li>Some tools have a subcommand that tells you where they&rsquo;re configured to
install things, like:
<ul>
<li>Node/npm: <code>npm config get prefix</code> (then append <code>/bin/</code>)</li>
<li>Go: <code>go env GOPATH</code> (then append <code>/bin/</code>)</li>
<li>asdf: <code>asdf info | grep ASDF_DIR</code> (then append <code>/bin/</code> and <code>/shims/</code>)</li>
</ul>
</li>
</ul>
<h3 id="step-3-1-double-check-it-s-the-right-directory">step 3.1: double check it&rsquo;s the right directory</h3>
<p>Once you&rsquo;ve found a directory you think might be the right one, make sure it&rsquo;s
actually correct! For example, I found out that on my machine, <code>http-server</code> is
in <code>~/.npm-global/bin</code>. I can make sure that it&rsquo;s the right directory by trying to
run the program <code>http-server</code> in that directory like this:</p>
<pre><code>$ ~/.npm-global/bin/http-server
Starting up http-server, serving ./public
</code></pre>
<p>It worked! Now that you know what directory you need to add to your <code>PATH</code>,
let&rsquo;s move to the next step!</p>
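<p>If you have a few candidate directories and don&rsquo;t want to actually run the program in each one, a check like this is one option. It&rsquo;s only a sketch: <code>check_dir_has</code> is a hypothetical helper name, and <code>~/.npm-global/bin</code> is just the example directory from above.</p>

```shell
# check_dir_has is a hypothetical helper: it succeeds if DIR contains an
# executable regular file called NAME.
check_dir_has() {
  dir="$1"
  name="$2"
  [ -f "$dir/$name" ] && [ -x "$dir/$name" ]
}

# Example: is http-server really in ~/.npm-global/bin?
if check_dir_has "$HOME/.npm-global/bin" http-server; then
  echo "yes, http-server is in there"
else
  echo "nope, keep looking"
fi
```

<p>This only tells you the file is there and executable. Running it directly, like in the example above, is still the best confirmation.</p>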
<h3 id="step-4-edit-your-shell-config">step 4: edit your shell config</h3>
<p>Now we have the 2 critical pieces of information we need:</p>
<ol>
<li>Which directory you&rsquo;re trying to add to your PATH (like  <code>~/.npm-global/bin/</code>)</li>
<li>Where your shell&rsquo;s config is (like <code>~/.bashrc</code>, <code>~/.zshrc</code>, or <code>~/.config/fish/config.fish</code>)</li>
</ol>
<p>Now what you need to add depends on your shell:</p>
<p><strong>bash instructions:</strong></p>
<p>Open your shell&rsquo;s config file, and add a line like this:</p>
<pre><code>export PATH=$PATH:~/.npm-global/bin/
</code></pre>
<p>(obviously replace <code>~/.npm-global/bin</code> with the actual directory you&rsquo;re trying to add)</p>
<p><strong>zsh instructions:</strong></p>
<p>You can do the same thing as in bash, but zsh also has some slightly fancier
syntax you can use if you prefer:</p>
<pre><code>path=(
  $path
  ~/.npm-global/bin
)
</code></pre>
<p><strong>fish instructions:</strong></p>
<p>In fish, the syntax is different:</p>
<pre><code>set PATH $PATH ~/.npm-global/bin
</code></pre>
<p>(in fish you can also use <code>fish_add_path</code>, some notes on that <a href="#a-note-on-fish-add-path">further down</a>)</p>
<h3 id="step-5-restart-your-shell">step 5: restart your shell</h3>
<p>Now, an extremely important step: updating your shell&rsquo;s config won&rsquo;t take
effect if you don&rsquo;t restart it!</p>
<p>Two ways to do this:</p>
<ol>
<li>open a new terminal (or terminal tab), and maybe close the old one so you don&rsquo;t get confused</li>
<li>Run <code>bash</code> to start a new shell (or <code>zsh</code> if you&rsquo;re using zsh, or <code>fish</code> if you&rsquo;re using fish)</li>
</ol>
<p>I&rsquo;ve found that both of these usually work fine.</p>
<p>And you should be done! Try running the program you were trying to run and
hopefully it works now.</p>
<p>If not, here are a few problems that you might run into:</p>
<h3 id="problem-1-it-ran-the-wrong-program">problem 1: it ran the wrong program</h3>
<p>If the wrong <strong>version</strong> of a program is running, you might need to add the
directory to the <em>beginning</em> of your PATH instead of the end.</p>
<p>For example, on my system I have two versions of <code>python3</code> installed, which I
can see by running <code>which -a</code>:</p>
<pre><code>$ which -a python3
/usr/bin/python3
/opt/homebrew/bin/python3
</code></pre>
<p>The one your shell will use is the <strong>first one listed</strong>.</p>
<p>If you want to use the Homebrew version, you need to add that directory
(<code>/opt/homebrew/bin</code>) to the <strong>beginning</strong> of your PATH instead, by putting this in
your shell&rsquo;s config file (it&rsquo;s <code>/opt/homebrew/bin/:$PATH</code> instead of the usual <code>$PATH:/opt/homebrew/bin/</code>)</p>
<pre><code>export PATH=/opt/homebrew/bin/:$PATH
</code></pre>
<p>or in fish:</p>
<pre><code>set PATH /opt/homebrew/bin $PATH
</code></pre>
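<p>To make the &ldquo;first one listed wins&rdquo; rule more concrete, here&rsquo;s a rough imitation of what <code>which -a</code> does, written for sh/bash. <code>find_all</code> is a hypothetical helper name, and the sketch ignores edge cases like empty <code>PATH</code> entries.</p>

```shell
# find_all is a hypothetical helper that roughly imitates `which -a`:
# it walks the PATH entries in order and prints every match.
# The function body runs in a subshell so changing IFS doesn't leak out.
find_all() (
  name="$1"
  IFS=:
  # Unquoted on purpose: this splits the PATH-style string on ':'
  for dir in ${2:-$PATH}; do
    if [ -x "$dir/$name" ] && [ ! -d "$dir/$name" ]; then
      echo "$dir/$name"
    fi
  done
)

find_all python3
```

<p>The shell runs whichever match comes first, which is why putting a directory at the beginning of your PATH changes which version you get.</p>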
<h3 id="problem-2-the-program-isn-t-being-run-from-your-shell">problem 2: the program isn&rsquo;t being run from your shell</h3>
<p>All of these directions only work if you&rsquo;re running the program <strong>from your
shell</strong>. If you&rsquo;re running the program from an IDE, from a GUI, in a cron job,
or some other way, you&rsquo;ll need to add the directory to your PATH in a different
way, and the exact details might depend on the situation.</p>
<p><strong>in a cron job</strong></p>
<p>Some options:</p>
<ul>
<li>use the full path to the program you&rsquo;re running, like <code>/home/bork/bin/my-program</code></li>
<li>put the full PATH you want as the first line of your crontab (something like
PATH=/bin:/usr/bin:/usr/local/bin:&hellip;.). You can get the full PATH you&rsquo;re
using in your shell by running <code>echo &quot;PATH=$PATH&quot;</code>.</li>
</ul>
<p>I&rsquo;m honestly not sure how to handle it in an IDE/GUI because I haven&rsquo;t run into
that in a long time, will add directions here if someone points me in the right
direction.</p>
<h3 id="problem-3-duplicate-path-entries-making-it-harder-to-debug">problem 3: duplicate <code>PATH</code> entries making it harder to debug</h3>
<p>If you edit your path and start a new shell by running <code>bash</code> (or <code>zsh</code>, or
<code>fish</code>), you&rsquo;ll often end up with duplicate <code>PATH</code> entries, because the shell
keeps adding new things to your <code>PATH</code> every time you start your shell.</p>
<p>Personally I don&rsquo;t think I&rsquo;ve run into a situation where this kind of
duplication breaks anything, but the duplicates can make it harder to debug
what&rsquo;s going on with your <code>PATH</code> if you&rsquo;re trying to understand its contents.</p>
<p>Some ways you could deal with this:</p>
<ol>
<li>If you&rsquo;re debugging your <code>PATH</code>, open a new terminal to do it in so you get
a &ldquo;fresh&rdquo; state. This should avoid the duplication.</li>
<li>Deduplicate your <code>PATH</code> at the end of your shell&rsquo;s config  (for example in
zsh apparently you can do this with <code>typeset -U path</code>)</li>
<li>Check that the directory isn&rsquo;t already in your <code>PATH</code> when adding it (for
example in fish I believe you can do this with <code>fish_add_path --path /some/directory</code>)</li>
</ol>
<p>How to deduplicate your <code>PATH</code> is shell-specific and there isn&rsquo;t always a
built in way to do it so you&rsquo;ll need to look up how to accomplish it in your
shell.</p>
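<p>For bash (or any POSIX-style shell), one common way to do the &ldquo;check before adding&rdquo; approach is a <code>case</code> statement. This is a minimal sketch: <code>path_append</code> is a made-up helper name, and it treats <code>/some/dir</code> and <code>/some/dir/</code> as different entries.</p>

```shell
# path_append is a hypothetical helper, not something built into bash or zsh.
# It adds a directory to the end of PATH only if it isn't already there.
path_append() {
  case ":$PATH:" in
    *":$1:"*) ;;              # already present, do nothing
    *) PATH="$PATH:$1" ;;     # not present, add it at the end
  esac
}

path_append "$HOME/.npm-global/bin"
export PATH
```

<p>Wrapping <code>$PATH</code> in colons means the first and last entries get matched the same way as the ones in the middle.</p>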
<h3 id="problem-4-losing-your-history-after-updating-your-path">problem 4: losing your history after updating your <code>PATH</code></h3>
<p>Here&rsquo;s a situation that&rsquo;s easy to get into in bash or zsh:</p>
<ol>
<li>Run a command (it fails)</li>
<li>Update your <code>PATH</code></li>
<li>Run <code>bash</code> to reload your config</li>
<li>Press the up arrow a couple of times to rerun the failed command (or open a new terminal)</li>
<li>The failed command isn&rsquo;t in your history! Why not?</li>
</ol>
<p>This happens because in bash, by default, history is not saved until you exit
the shell.</p>
<p>Some options for fixing this:</p>
<ul>
<li>Instead of running <code>bash</code> to reload your config, run <code>source ~/.bashrc</code> (or
<code>source ~/.zshrc</code> in zsh). This will reload the config inside your current
session.</li>
<li>Configure your shell to continuously save your history instead of only saving
the history when the shell exits. (How to do this depends on whether you&rsquo;re
using bash or zsh, the history options in zsh are a bit complicated and I&rsquo;m
not exactly sure what the best way is)</li>
</ul>
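<p>For bash specifically, here&rsquo;s a sketch of the second option, meant to go in <code>~/.bashrc</code>. It uses bash&rsquo;s <code>histappend</code> option and <code>history -a</code>. Treat it as one possible setup rather than the best one, and note that it doesn&rsquo;t cover zsh.</p>

```shell
# For ~/.bashrc. The guard just keeps this harmless if another shell reads it.
if [ -n "${BASH_VERSION:-}" ]; then
  # Append to the history file instead of overwriting it when the shell exits
  shopt -s histappend
  # Write new commands to the history file before each prompt is shown,
  # keeping anything that was already in PROMPT_COMMAND
  PROMPT_COMMAND="history -a${PROMPT_COMMAND:+; $PROMPT_COMMAND}"
fi
```

<p>With this in place, a command should already be in the history file by the time you start a new shell.</p>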
<h3 id="a-note-on-source">a note on <code>source</code></h3>
<p>When you install <code>cargo</code> (Rust&rsquo;s installer) for the first time, it gives you
these instructions for how to set up your PATH, which don&rsquo;t mention a specific
directory at all.</p>
<pre><code>This is usually done by running one of the following (note the leading DOT):

. &quot;$HOME/.cargo/env&quot;        	# For sh/bash/zsh/ash/dash/pdksh
source &quot;$HOME/.cargo/env.fish&quot;  # For fish
</code></pre>
<p>The idea is that you add that line to your shell&rsquo;s config, and their script
automatically sets up your <code>PATH</code> (and potentially other things) for you.</p>
<p>This is pretty common (for example <a href="https://github.com/Homebrew/install/blob/deacfa6a6e62e5f4002baf9e1fac7a96e9aa5d41/install.sh#L1072-L1087">Homebrew</a> suggests you eval <code>brew shellenv</code>), and there are
two ways to approach this:</p>
<ol>
<li>Just do what the tool suggests (like adding <code>. &quot;$HOME/.cargo/env&quot;</code> to your shell&rsquo;s config)</li>
<li>Figure out which directories the script they&rsquo;re telling you to run would add
to your PATH, and then add those manually. Here&rsquo;s how I&rsquo;d do that:
<ul>
<li>Run <code>. &quot;$HOME/.cargo/env&quot;</code> in my shell (or the fish version if using fish)</li>
<li>Run <code>echo &quot;$PATH&quot; | tr ':' '\n' | grep cargo</code> to figure out which directories it added</li>
<li>See that it says <code>/Users/bork/.cargo/bin</code> and shorten that to <code>~/.cargo/bin</code></li>
<li>Add the directory <code>~/.cargo/bin</code> to PATH (with the directions in this post)</li>
</ul>
</li>
</ol>
<p>I don&rsquo;t think there&rsquo;s anything wrong with doing what the tool suggests (it
might be the &ldquo;best way&rdquo;!), but personally I usually use the second approach
because I prefer knowing exactly what configuration I&rsquo;m changing.</p>
<h3 id="a-note-on-fish-add-path">a note on <code>fish_add_path</code></h3>
<p>fish has a handy function called <code>fish_add_path</code> that you can run to add a directory to your <code>PATH</code> like this:</p>
<pre><code>fish_add_path /some/directory
</code></pre>
<p>This is cool (it&rsquo;s such a simple command!) but I&rsquo;ve stopped using it for a couple of reasons:</p>
<ol>
<li>Sometimes <code>fish_add_path</code> will update the <code>PATH</code> for every session in the
future (with a &ldquo;universal variable&rdquo;) and sometimes it will update the <code>PATH</code>
just for the current session and it&rsquo;s hard for me to tell which one it will
do. In theory the docs explain this but I could not understand them.</li>
<li>If you ever need to <em>remove</em> the directory from your <code>PATH</code> a few weeks or
months later because maybe you made a mistake, it&rsquo;s kind of hard to do
(there are <a href="https://github.com/fish-shell/fish-shell/issues/8604">instructions in the comments of this GitHub issue though</a>).</li>
</ol>
<h3 id="that-s-all">that&rsquo;s all</h3>
<p>Hopefully this will help some people. Let me know (on Mastodon or Bluesky) if
there are other major gotchas that have tripped you up when adding a
directory to your PATH, or if you have questions about this post!</p>
]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[Some terminal frustrations]]></title>
    <link href="https://jvns.ca/blog/2025/02/05/some-terminal-frustrations/"/>
    <updated>2025-02-05T16:57:00+00:00</updated>
    <id>https://jvns.ca/blog/2025/02/05/some-terminal-frustrations/</id>
    <content type="html"><![CDATA[<p>A few weeks ago I ran a terminal survey (you can <a href="https://jvns.ca/terminal-survey/results-bsky.html">read the results here</a>) and at the end I asked:</p>
<blockquote>
<p>What’s the most frustrating thing about using the terminal for you?</p>
</blockquote>
<p>1600 people answered, and I decided to spend a few days categorizing all the
responses. Along the way I learned that classifying qualitative data is not
easy but I gave it my best shot. I ended up building a custom
<a href="https://github.com/jvns/classificator">tool</a> to make it faster to categorize
everything.</p>
<p>As with all of my surveys the methodology isn&rsquo;t particularly scientific. I just
posted the survey to Mastodon and Twitter, ran it for a couple of days, and got
answers from whoever happened to see it and felt like responding.</p>
<p>Here are the top categories of frustrations!</p>
<p>I think it&rsquo;s worth keeping in mind while reading these comments that</p>
<ul>
<li>40% of people answering this survey have been using the terminal for <strong>21+ years</strong></li>
<li>95% of people answering the survey have been using the terminal for at least 4 years</li>
</ul>
<p>These comments aren&rsquo;t coming from total beginners.</p>
<p>Here are the categories of frustrations! The number in brackets is the number
of people with that frustration. I&rsquo;m mostly writing this up for myself because
I&rsquo;m trying to write a zine about the terminal and I wanted to get a sense for
what people are having trouble with.</p>
<h3 id="remembering-syntax-115">remembering syntax (115)</h3>
<p>People talked about struggles remembering:</p>
<ul>
<li>the syntax for CLI tools like awk, jq, sed, etc</li>
<li>the syntax for redirects</li>
<li>keyboard shortcuts for tmux, text editing, etc</li>
</ul>
<p>One example comment:</p>
<blockquote>
<p>There are just so many little &ldquo;trivia&rdquo; details to remember for full
functionality. Even after all these years I&rsquo;ll sometimes forget where it&rsquo;s 2
or 1 for stderr, or forget which is which for <code>&gt;</code> and <code>&gt;&gt;</code>.</p>
</blockquote>
<h3 id="switching-terminals-is-hard-91">switching terminals is hard (91)</h3>
<p>People talked about struggling with switching systems (for example home/work
computer or when SSHing) and running into:</p>
<ul>
<li>OS differences in keyboard shortcuts (like Linux vs Mac)</li>
<li>systems which don&rsquo;t have their preferred text editor (&ldquo;no vim&rdquo; or &ldquo;only vim&rdquo;)</li>
<li>different versions of the same command (like Mac OS grep vs GNU grep)</li>
<li>no tab completion</li>
<li>a shell they aren&rsquo;t used to (&ldquo;the subtle differences between zsh and bash&rdquo;)</li>
</ul>
<p>as well as differences inside the same system like pagers not being consistent
with each other (git diff pagers, other pagers).</p>
<p>One example comment:</p>
<blockquote>
<p>I got used to fish and vi mode which are not available when I ssh into
servers, containers.</p>
</blockquote>
<h3 id="color-85">color (85)</h3>
<p>Lots of problems with color, like:</p>
<ul>
<li>programs setting colors that are unreadable with a light background color</li>
<li>finding a colorscheme they like (and getting it to work consistently across different apps)</li>
<li>color not working inside several layers of SSH/tmux/etc</li>
<li>not liking the defaults</li>
<li>not wanting color at all and struggling to turn it off</li>
</ul>
<p>This comment felt relatable to me:</p>
<blockquote>
<p>Getting my terminal theme configured in a reasonable way between the terminal
emulator and fish (I did this years ago and remember it being tedious and
fiddly and now feel like I&rsquo;m locked into my current theme because it works
and I dread touching any of that configuration ever again).</p>
</blockquote>
<h3 id="keyboard-shortcuts-84">keyboard shortcuts (84)</h3>
<p>Half of the comments on keyboard shortcuts were about how on Linux/Windows, the
keyboard shortcut to copy/paste in the terminal is different from in the rest
of the OS.</p>
<p>Some other issues with keyboard shortcuts other than copy/paste:</p>
<ul>
<li>using <code>Ctrl-W</code> in a browser-based terminal and closing the window</li>
<li>the terminal only supports a limited set of keyboard shortcuts (no
<code>Ctrl-Shift-</code>, no <code>Super</code>, no <code>Hyper</code>, lots of <code>ctrl-</code> shortcuts aren&rsquo;t
possible like <code>Ctrl-,</code>)</li>
<li>the OS stopping you from using a terminal keyboard shortcut (like by default
Mac OS uses <code>Ctrl+left arrow</code> for something else)</li>
<li>issues using emacs in the terminal</li>
<li>backspace not working (2)</li>
</ul>
<h3 id="other-copy-and-paste-issues-75">other copy and paste issues (75)</h3>
<p>Aside from &ldquo;the keyboard shortcut for copy and paste is different&rdquo;, there were
a lot of OTHER issues with copy and paste, like:</p>
<ul>
<li>copying over SSH</li>
<li>how tmux and the terminal emulator both do copy/paste in different ways</li>
<li>dealing with many different clipboards (system clipboard, vim clipboard, the
&ldquo;middle click&rdquo; clipboard on Linux, tmux&rsquo;s clipboard, etc) and potentially
synchronizing them</li>
<li>random spaces added when copying from the terminal</li>
<li>pasting multiline commands which automatically get run in a terrifying way</li>
<li>wanting a way to copy text without using the mouse</li>
</ul>
<h3 id="discoverability-55">discoverability (55)</h3>
<p>There were lots of comments about this, which all came down to the same basic
complaint &ndash; it&rsquo;s hard to discover useful tools or features! This comment kind of
summed it all up:</p>
<blockquote>
<p>How difficult it is to learn independently. Most of what I know is an
assorted collection of stuff I&rsquo;ve been told by random people over the years.</p>
</blockquote>
<h3 id="steep-learning-curve-44">steep learning curve (44)</h3>
<p>A lot of comments about it generally having a steep learning curve. A couple of
example comments:</p>
<blockquote>
<p>After 15 years of using it, I’m not much faster than using it than I was 5 or
maybe even 10 years ago.</p>
</blockquote>
<p>and</p>
<blockquote>
<p>That I know I could make my life easier by learning more about the shortcuts
and commands and configuring the terminal but I don&rsquo;t spend the time because it
feels overwhelming.</p>
</blockquote>
<h3 id="history-42">history  (42)</h3>
<p>Some issues with shell history:</p>
<ul>
<li>history not being shared between terminal tabs (16)</li>
<li>limits that are too short (4)</li>
<li>history not being restored when terminal tabs are restored</li>
<li>losing history because the terminal crashed</li>
<li>not knowing how to search history</li>
</ul>
<p>One example comment:</p>
<blockquote>
<p>It wasted a lot of time until I figured it out and still annoys me that
&ldquo;history&rdquo; on zsh has such a small buffer;  I have to type &ldquo;history 0&rdquo; to get
any useful length of history.</p>
</blockquote>
<h3 id="bad-documentation-37">bad documentation (37)</h3>
<p>People talked about:</p>
<ul>
<li>documentation being generally opaque</li>
<li>lack of examples in man pages</li>
<li>programs which don&rsquo;t have man pages</li>
</ul>
<p>Here&rsquo;s a representative comment:</p>
<blockquote>
<p>Finding good examples and docs. Man pages often not enough, have to wade
through stack overflow</p>
</blockquote>
<h3 id="scrollback-36">scrollback (36)</h3>
<p>A few issues with scrollback:</p>
<ul>
<li>programs printing out too much data making you lose scrollback history</li>
<li>resizing the terminal messes up the scrollback</li>
<li>lack of timestamps</li>
<li>GUI programs that you start in the background printing stuff out that gets in
the way of other programs&rsquo; outputs</li>
</ul>
<p>One example comment:</p>
<blockquote>
<p>When resizing the terminal (in particular: making it narrower) leads to
broken rewrapping of the scrollback content because the commands formatted
their output based on the terminal window width.</p>
</blockquote>
<h3 id="it-feels-outdated-33">&ldquo;it feels outdated&rdquo; (33)</h3>
<p>Lots of comments about how the terminal feels hampered by legacy decisions and
how users often end up needing to learn implementation details that feel very
esoteric. One example comment:</p>
<blockquote>
<p>Most of the legacy cruft, it would be great to have a green field
implementation of the CLI interface.</p>
</blockquote>
<h3 id="shell-scripting-32">shell scripting (32)</h3>
<p>Lots of complaints about POSIX shell scripting. There&rsquo;s a general feeling that
shell scripting is difficult but also that switching to a different less
standard scripting language (fish, nushell, etc) brings its own problems.</p>
<blockquote>
<p>Shell scripting. My tolerance to ditch a shell script and go to a scripting
language is pretty low. It’s just too messy and powerful. Screwing up can be
costly so I don’t even bother.</p>
</blockquote>
<h3 id="more-issues">more issues</h3>
<p>Some more issues that were mentioned at least 10 times:</p>
<ul>
<li>(31) inconsistent command line arguments: is it -h or help or --help?</li>
<li>(24) keeping dotfiles in sync across different systems</li>
<li>(23) performance (e.g. &ldquo;my shell takes too long to start&rdquo;)</li>
<li>(20) window management (potentially with some combination of tmux tabs, terminal tabs, and multiple terminal windows. Where did that shell session go?)</li>
<li>(17) generally feeling scared/uneasy (&ldquo;The debilitating fear that I’m going
to do some mysterious Bad Thing with a command and I will have absolutely no
idea how to fix or undo it or even really figure out what happened&rdquo;)</li>
<li>(16) terminfo issues (&ldquo;Having to learn about terminfo if/when I try a new terminal emulator and ssh elsewhere.&rdquo;)</li>
<li>(16) lack of image support (sixel etc)</li>
<li>(15) SSH issues (like having to start over when you lose the SSH connection)</li>
<li>(15) various tmux/screen issues (for example lack of integration between tmux and the terminal emulator)</li>
<li>(15) typos &amp; slow typing</li>
<li>(13) the terminal getting messed up for various reasons (pressing <code>Ctrl-S</code>, <code>cat</code>ing a binary, etc)</li>
<li>(12) quoting/escaping in the shell</li>
<li>(11) various Windows/PowerShell issues</li>
</ul>
<h3 id="n-a-122">n/a (122)</h3>
<p>There were also 122 answers to the effect of &ldquo;nothing really&rdquo; or &ldquo;only that I
can&rsquo;t do EVERYTHING in the terminal&rdquo;</p>
<p>One example comment:</p>
<blockquote>
<p>Think I&rsquo;ve found work arounds for most/all frustrations</p>
</blockquote>
<h3 id="that-s-all">that&rsquo;s all!</h3>
<p>I&rsquo;m not going to make a lot of commentary on these results, but here are a
couple of categories that feel related to me:</p>
<ul>
<li>remembering syntax &amp; history (often the thing you need to remember is something you&rsquo;ve run before!)</li>
<li>discoverability &amp; the learning curve (the lack of discoverability is definitely a big part of what makes it hard to learn)</li>
<li>&ldquo;switching systems is hard&rdquo; &amp; &ldquo;it feels outdated&rdquo; (tools that haven&rsquo;t really
changed in 30 or 40 years have many problems but they do tend to be always
<em>there</em> no matter what system you&rsquo;re on, which is very useful and makes them
hard to stop using)</li>
</ul>
<p>Trying to categorize all these results in a reasonable way really gave me an
appreciation for social science researchers&rsquo; skills.</p>
]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[What's involved in getting a "modern" terminal setup?]]></title>
    <link href="https://jvns.ca/blog/2025/01/11/getting-a-modern-terminal-setup/"/>
    <updated>2025-01-11T09:46:01+00:00</updated>
    <id>https://jvns.ca/blog/2025/01/11/getting-a-modern-terminal-setup/</id>
    <content type="html"><![CDATA[<p>Hello! Recently I ran a terminal survey and I asked people what frustrated
them. One person commented:</p>
<blockquote>
<p>There are so many pieces to having a modern terminal experience. I wish it
all came out of the box.</p>
</blockquote>
<p>My immediate reaction was &ldquo;oh, getting a modern terminal experience isn&rsquo;t that
hard, you just need to&hellip;.&rdquo;, but the more I thought about it, the longer the
&ldquo;you just need to&hellip;&rdquo; list got, and I kept thinking about more and more
caveats.</p>
<p>So I thought I would write down some notes about what it means to me personally
to have a &ldquo;modern&rdquo; terminal experience and what I think can make it hard for
people to get there.</p>
<h3 id="what-is-a-modern-terminal-experience">what is a &ldquo;modern terminal experience&rdquo;?</h3>
<p>Here are a few things that are important to me, with which part of the system
is responsible for them:</p>
<ul>
<li><strong>multiline support for copy and paste</strong>: if you paste 3 commands in your shell, it should not immediately run them all! That&rsquo;s scary! (<strong>shell</strong>, <strong>terminal emulator</strong>)</li>
<li><strong>infinite shell history</strong>: if I run a command in my shell, it should be saved forever, not deleted after 500 history entries or whatever. Also I want commands to be saved to the history immediately when I run them, not only when I exit the shell session (<strong>shell</strong>)</li>
<li><strong>a useful prompt</strong>: I can&rsquo;t live without having my <strong>current directory</strong> and <strong>current git branch</strong> in my prompt (<strong>shell</strong>)</li>
<li><strong>24-bit colour</strong>: this is important to me because I find it MUCH easier to theme neovim with 24-bit colour support than in a terminal with only 256 colours (<strong>terminal emulator</strong>)</li>
<li><strong>clipboard integration</strong> between vim and my operating system so that when I copy in Firefox, I can just press <code>p</code> in vim to paste (<strong>text editor</strong>, maybe the OS/terminal emulator too)</li>
<li><strong>good autocomplete</strong>: for example commands like git should have command-specific autocomplete (<strong>shell</strong>)</li>
<li><strong>having colours in <code>ls</code></strong> (<strong>shell config</strong>)</li>
<li><strong>a terminal theme I like</strong>: I spend a lot of time in my terminal, I want it to look nice and I want its theme to match my terminal editor&rsquo;s theme. (<strong>terminal emulator</strong>, <strong>text editor</strong>)</li>
<li><strong>automatic terminal fixing</strong>: If a program prints out some weird escape
codes that mess up my terminal, I want that to automatically get reset so
that my terminal doesn&rsquo;t get messed up (<strong>shell</strong>)</li>
<li><strong>keybindings</strong>: I want <code>Ctrl+left arrow</code> to work (<strong>shell</strong> or <strong>application</strong>)</li>
<li><strong>being able to use the scroll wheel in programs like <code>less</code></strong>: (<strong>terminal emulator</strong> and <strong>applications</strong>)</li>
</ul>
<p>There are a million other terminal conveniences out there and different people
value different things, but those are the ones that I would be really unhappy
without.</p>
<h3 id="how-i-achieve-a-modern-experience">how I achieve a &ldquo;modern experience&rdquo;</h3>
<p>My basic approach is:</p>
<ol>
<li>use the <code>fish</code> shell. Mostly don&rsquo;t configure it, except to:
<ul>
<li>set the <code>EDITOR</code> environment variable to my favourite terminal editor</li>
<li>alias <code>ls</code> to <code>ls --color=auto</code></li>
</ul>
</li>
<li>use any terminal emulator with 24-bit colour support. In the past I&rsquo;ve used
GNOME Terminal, Terminator, and iTerm, but I&rsquo;m not picky about this. I don&rsquo;t really
configure it other than to choose a font.</li>
<li>use <code>neovim</code>, with a configuration that I&rsquo;ve been very slowly building over the last 9 years or so (the last time I deleted my vim config and started from scratch was 9 years ago)</li>
<li>use the <a href="https://github.com/chriskempson/base16">base16 framework</a> to theme everything</li>
</ol>
<p>A few things that affect my approach:</p>
<ul>
<li>I don&rsquo;t spend a lot of time SSHed into other machines</li>
<li>I&rsquo;d rather use the mouse a little than come up with keyboard-based ways to do everything</li>
<li>I work on a lot of small projects, not one big project</li>
</ul>
<h3 id="some-out-of-the-box-options-for-a-modern-experience">some &ldquo;out of the box&rdquo; options for a &ldquo;modern&rdquo; experience</h3>
<p>What if you want a nice experience, but don&rsquo;t want to spend a lot of time on
configuration? Figuring out how to configure vim in a way that I was satisfied
with really did take me like ten years, which is a long time!</p>
<p>My best ideas for how to get a reasonable terminal experience with minimal
config are:</p>
<ul>
<li>shell: either <code>fish</code> or <code>zsh</code> with <a href="https://ohmyz.sh/">oh-my-zsh</a></li>
<li>terminal emulator: almost anything with 24-bit colour support, for example all of these are popular:
<ul>
<li>linux: GNOME Terminal, Konsole, Terminator, xfce4-terminal</li>
<li>mac: iTerm (Terminal.app doesn&rsquo;t have 24-bit colour support)</li>
<li>cross-platform: kitty, alacritty, wezterm, or ghostty</li>
</ul>
</li>
<li>shell config:
<ul>
<li>set the <code>EDITOR</code> environment variable to your favourite terminal text
editor</li>
<li>maybe alias <code>ls</code> to <code>ls --color=auto</code></li>
</ul>
</li>
<li>text editor: this is a tough one, maybe <a href="https://micro-editor.github.io/">micro</a> or <a href="https://helix-editor.com/">helix</a>? I haven&rsquo;t used
either of them seriously but they both seem like very cool projects and I
think it&rsquo;s amazing that you can just use all the usual GUI editor commands
(<code>Ctrl-C</code> to copy, <code>Ctrl-V</code> to paste, <code>Ctrl-A</code> to select all) in micro and
they do what you&rsquo;d expect. I would probably try switching to helix except
that retraining my vim muscle memory seems way too hard. Also helix doesn&rsquo;t
have a GUI or plugin system yet.</li>
</ul>
<p>Personally I <strong>wouldn&rsquo;t</strong> use xterm, rxvt, or Terminal.app as a terminal emulator,
because I&rsquo;ve found in the past that they&rsquo;re missing core features (like 24-bit
colour in Terminal.app&rsquo;s case) that make the terminal harder to use for me.</p>
<p>I don&rsquo;t want to pretend that getting a &ldquo;modern&rdquo; terminal experience is easier
than it is though &ndash; I think there are a few issues that make it hard. Let&rsquo;s talk
about them!</p>
<h3 id="issue-1-with-getting-to-a-modern-experience-the-shell">issue 1 with getting to a &ldquo;modern&rdquo; experience: the shell</h3>
<p>bash and zsh are by far the two most popular shells, and neither of them
provides a default experience that I would be happy using out of the box, for
example:</p>
<ul>
<li>you need to customize your prompt</li>
<li>they don&rsquo;t come with git completions by default, you have to set them up</li>
<li>by default, bash only stores 500 (!) lines of history and (at least on Mac OS)
zsh is only configured to store 2000 lines, which is still not a lot</li>
<li>I find bash&rsquo;s tab completion very frustrating, if there&rsquo;s more than
one match then you can&rsquo;t tab through them</li>
</ul>
<p>And even though <a href="https://jvns.ca/blog/2024/09/12/reasons-i--still--love-fish/">I love fish</a>, the fact
that it isn&rsquo;t POSIX does make it hard for a lot of folks to make the switch.</p>
<p>Of course it&rsquo;s totally possible to learn how to customize your prompt in bash
or whatever, and it doesn&rsquo;t even need to be that complicated (in bash I&rsquo;d
probably start with something like <code>export PS1='[\u@\h \W$(__git_ps1 &quot; (%s)&quot;)]\$ '</code>, or maybe use <a href="https://starship.rs/">starship</a>).
But each of these &ldquo;not complicated&rdquo; things really does add up and it&rsquo;s
especially tough if you need to keep your config in sync across several
systems.</p>
<p>An extremely popular solution to getting a &ldquo;modern&rdquo; shell experience is
<a href="https://ohmyz.sh/">oh-my-zsh</a>. It seems like a great project and I know a lot
of people use it very happily, but I&rsquo;ve struggled with configuration systems
like that in the past &ndash; it looks like right now the base oh-my-zsh adds about
3000 lines of config, and often I find that having an extra configuration
system makes it harder to debug what&rsquo;s happening when things go wrong. I
personally have a tendency to use the system to add a lot of extra plugins,
make my system slow, get frustrated that it&rsquo;s slow, and then delete it
completely and write a new config from scratch.</p>
<h3 id="issue-2-with-getting-to-a-modern-experience-the-text-editor">issue 2 with getting to a &ldquo;modern&rdquo; experience: the text editor</h3>
<p>In the terminal survey I ran recently, the most popular terminal text editors
by far were <code>vim</code>, <code>emacs</code>, and <code>nano</code>.</p>
<p>I think the main options for terminal text editors are:</p>
<ul>
<li>use vim or emacs and configure it to your liking, you can probably have any
feature you want if you put in the work</li>
<li>use nano and accept that you&rsquo;re going to have a pretty limited experience
(for example I don&rsquo;t think you can select text with the mouse and then &ldquo;cut&rdquo;
it in nano)</li>
<li>use <code>micro</code> or <code>helix</code> which seem to offer a pretty good out-of-the-box
experience, potentially occasionally run into issues with using a less
mainstream text editor</li>
<li>just avoid using a terminal text editor as much as possible, maybe use VSCode, use
VSCode&rsquo;s terminal for all your terminal needs, and mostly never edit files in
the terminal. Or I know a lot of people use <code>code</code> as their <code>EDITOR</code> in the terminal.</li>
</ul>
<h3 id="issue-3-individual-applications">issue 3: individual applications</h3>
<p>The last issue is that sometimes individual programs that I use are kind of
annoying. For example on my Mac OS machine, <code>/usr/bin/sqlite3</code> doesn&rsquo;t support
the <code>Ctrl+Left Arrow</code> keyboard shortcut. Fixing this to get a reasonable
terminal experience in SQLite was a little complicated, I had to:</p>
<ul>
<li>realize why this is happening (Mac OS won&rsquo;t ship GNU tools, and &ldquo;Ctrl-Left arrow&rdquo; support comes from GNU readline)</li>
<li>find a workaround (install sqlite from homebrew, which does have readline support)</li>
<li>adjust my environment (put Homebrew&rsquo;s sqlite3 in my PATH)</li>
</ul>
<p>I find that debugging application-specific issues like this is really not easy
and often it doesn&rsquo;t feel &ldquo;worth it&rdquo; &ndash; often I&rsquo;ll end up just dealing with
various minor inconveniences because I don&rsquo;t want to spend hours investigating
them. The only reason I was even able to figure this one out at all is that
I&rsquo;ve been spending a huge amount of time thinking about the terminal recently.</p>
<p>A big part of having a &ldquo;modern&rdquo; experience using terminal programs is just
using newer terminal programs, for example I can&rsquo;t be bothered to learn a
keyboard shortcut to sort the columns in <code>top</code>, but in <code>htop</code>  I can just click
on a column heading with my mouse to sort it. So I use htop instead! But discovering new more &ldquo;modern&rdquo; command line tools isn&rsquo;t easy (though
I made <a href="https://jvns.ca/blog/2022/04/12/a-list-of-new-ish--command-line-tools/">a list here</a>),
finding ones that I actually like using in practice takes time, and if you&rsquo;re
SSHed into another machine, they won&rsquo;t always be there.</p>
<h3 id="everything-affects-everything-else">everything affects everything else</h3>
<p>Something I find tricky about configuring my terminal to make everything &ldquo;nice&rdquo;
is that changing one seemingly small thing about my workflow can really affect
everything else. For example right now I don&rsquo;t use tmux. But if I needed to use
tmux again (for example because I was doing a lot of work SSHed into another
machine), I&rsquo;d need to think about a few things, like:</p>
<ul>
<li>if I wanted tmux&rsquo;s copy to synchronize with my system clipboard over
SSH, I&rsquo;d need to make sure that my terminal emulator has <a href="https://old.reddit.com/r/vim/comments/k1ydpn/a_guide_on_how_to_copy_text_from_anywhere/">OSC 52 support</a></li>
<li>if I wanted to use iTerm&rsquo;s tmux integration (which makes tmux tabs into iTerm
tabs), I&rsquo;d need to change how I configure colours &ndash; right now I set them
with a <a href="https://github.com/chriskempson/base16-shell/blob/588691ba71b47e75793ed9edfcfaa058326a6f41/scripts/base16-solarized-light.sh">shell script</a> that I run when my shell starts, but that means the
colours get lost when restoring a tmux session.</li>
</ul>
<p>and probably more things I haven&rsquo;t thought of. &ldquo;Using tmux means that I have to
change how I manage my colours&rdquo; sounds unlikely, but that really did happen to
me and I decided &ldquo;well, I don&rsquo;t want to change how I manage colours right now,
so I guess I&rsquo;m not using that feature!&rdquo;.</p>
<p>It&rsquo;s also hard to remember which features I&rsquo;m relying on &ndash; for example maybe
my current terminal <em>does</em> have OSC 52 support and because copying from tmux over SSH
has always Just Worked I don&rsquo;t even realize that that&rsquo;s something I need, and
then it mysteriously stops working when I switch terminals.</p>
<h3 id="change-things-slowly">change things slowly</h3>
<p>Personally even though I think my setup is not <em>that</em> complicated, it&rsquo;s taken
me 20 years to get to this point! Because terminal config changes are so likely
to have unexpected and hard-to-understand consequences, I&rsquo;ve found that if I
change a lot of terminal configuration all at once it makes it much harder to
understand what went wrong if there&rsquo;s a problem, which can be really
disorienting.</p>
<p>So I usually prefer to make pretty small changes, and accept that changes
might take me a REALLY long time to get used to. For example I switched from
using <code>ls</code> to <a href="https://github.com/eza-community/eza">eza</a> a year or two ago and
while I like it (because <code>eza -l</code> prints human-readable file sizes by default)
I&rsquo;m still not quite sure about it. But also sometimes it&rsquo;s worth it to make a
big change, like I made the switch to fish (from bash) 10 years ago and I&rsquo;m
very happy I did.</p>
<h3 id="getting-a-modern-terminal-is-not-that-easy">getting a &ldquo;modern&rdquo; terminal is not that easy</h3>
<p>Trying to explain how &ldquo;easy&rdquo; it is to configure your terminal really just made
me think that it&rsquo;s kind of hard and that I still sometimes get confused.</p>
<p>I&rsquo;ve found that there&rsquo;s never one perfect way to configure things in the
terminal that will be compatible with every single other thing. I just need to
try stuff, figure out some kind of locally stable state that works for me, and
accept that if I start using a new tool it might disrupt the system and I might
need to rethink things.</p>
]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA["Rules" that terminal programs follow]]></title>
    <link href="https://jvns.ca/blog/2024/11/26/terminal-rules/"/>
    <updated>2024-12-12T09:28:22+00:00</updated>
    <id>https://jvns.ca/blog/2024/11/26/terminal-rules/</id>
    <content type="html"><![CDATA[<p>Recently I&rsquo;ve been thinking about how everything that happens in the terminal
is some combination of:</p>
<ol>
<li>Your <strong>operating system</strong>&rsquo;s job</li>
<li>Your <strong>shell</strong>&rsquo;s job</li>
<li>Your <strong>terminal emulator</strong>&rsquo;s job</li>
<li>The job of <strong>whatever program you happen to be running</strong> (like <code>top</code> or <code>vim</code> or <code>cat</code>)</li>
</ol>
<p>The first three (your operating system, shell, and terminal emulator) are all kind of
known quantities &ndash; if you&rsquo;re using bash in GNOME Terminal on Linux, you can
more or less reason about how all of those things interact, and some of
their behaviour is standardized by POSIX.</p>
<p>But the fourth one (&ldquo;whatever program you happen to be running&rdquo;) feels like it
could do ANYTHING. How are you supposed to know how a program is going to
behave?</p>
<p>This post is kind of long so here&rsquo;s a quick table of contents:</p>
<ul>
<li><a href="#programs-behave-surprisingly-consistently">programs behave surprisingly consistently</a></li>
<li><a href="#these-are-meant-to-be-descriptive-not-prescriptive">these are meant to be descriptive, not prescriptive</a></li>
<li><a href="#it-s-not-always-obvious-which-rules-are-the-program-s-responsibility-to-implement">it&rsquo;s not always obvious which &ldquo;rules&rdquo; are the program&rsquo;s responsibility to implement</a></li>
<li><a href="#rule-1-noninteractive-programs-should-quit-when-you-press-ctrl-c">rule 1: noninteractive programs should quit when you press <code>Ctrl-C</code></a></li>
<li><a href="#rule-2-tuis-should-quit-when-you-press-q">rule 2: TUIs should quit when you press <code>q</code></a></li>
<li><a href="#rule-3-repls-should-quit-when-you-press-ctrl-d-on-an-empty-line">rule 3: REPLs should quit when you press <code>Ctrl-D</code> on an empty line</a></li>
<li><a href="#rule-4-don-t-use-more-than-16-colours">rule 4: don&rsquo;t use more than 16 colours</a></li>
<li><a href="#rule-5-vaguely-support-readline-keybindings">rule 5: vaguely support readline keybindings</a></li>
<li><a href="#rule-5-1-ctrl-w-should-delete-the-last-word">rule 5.1: <code>Ctrl-W</code> should delete the last word</a></li>
<li><a href="#rule-6-disable-colours-when-writing-to-a-pipe">rule 6: disable colours when writing to a pipe</a></li>
<li><a href="#rule-7-means-stdin-stdout">rule 7: <code>-</code> means stdin/stdout</a></li>
<li><a href="#these-rules-take-a-long-time-to-learn">these &ldquo;rules&rdquo; take a long time to learn</a></li>
</ul>
<h3 id="programs-behave-surprisingly-consistently">programs behave surprisingly consistently</h3>
<p>As far as I know, there are no real standards for how programs in the terminal
should behave &ndash; the closest things I know of are:</p>
<ul>
<li>POSIX, which mostly dictates how your terminal emulator / OS / shell should
work together. I think it does specify a few things about how core utilities like
<code>cp</code> should work but AFAIK it doesn&rsquo;t have anything to say about how for
example <code>htop</code> should behave.</li>
<li>these <a href="https://clig.dev/">command line interface guidelines</a></li>
</ul>
<p>But even though there are no standards, in my experience programs in the
terminal behave in a pretty consistent way. So I wanted to write down a list of
&ldquo;rules&rdquo; that in my experience programs mostly follow.</p>
<h3 id="these-are-meant-to-be-descriptive-not-prescriptive">these are meant to be descriptive, not prescriptive</h3>
<p>My goal here isn&rsquo;t to convince authors of terminal programs that they <em>should</em>
follow any of these rules. There are lots of exceptions to these and often
there&rsquo;s a good reason for those exceptions.</p>
<p>But it&rsquo;s very useful for me to know what behaviour to expect from a random new
terminal program that I&rsquo;m using. Instead of &ldquo;uh, programs could do literally
anything&rdquo;, it&rsquo;s &ldquo;ok, here are the basic rules I expect, and then I can keep a
short mental list of exceptions&rdquo;.</p>
<p>So I&rsquo;m just writing down what I&rsquo;ve observed about how programs behave in my 20
years of using the terminal, why I think they behave that way, and some
examples of cases where that rule is &ldquo;broken&rdquo;.</p>
<h3 id="it-s-not-always-obvious-which-rules-are-the-program-s-responsibility-to-implement">it&rsquo;s not always obvious which &ldquo;rules&rdquo; are the program&rsquo;s responsibility to implement</h3>
<p>There are a bunch of common conventions that I think are pretty clearly the
program&rsquo;s responsibility to implement, like:</p>
<ul>
<li>config files should go in <code>~/.BLAHrc</code> or <code>~/.config/BLAH/FILE</code> or <code>/etc/BLAH/</code> or something</li>
<li><code>--help</code> should print help text</li>
<li>programs should print &ldquo;regular&rdquo; output to stdout and errors to stderr</li>
</ul>
<p>But in this post I&rsquo;m going to focus on things that it&rsquo;s not 100% obvious are
the program&rsquo;s responsibility. For example it feels to me like a &ldquo;law of nature&rdquo;
that pressing <code>Ctrl-D</code> should quit a REPL, but programs often
need to explicitly implement support for it &ndash; even though <code>cat</code> doesn&rsquo;t need
to implement <code>Ctrl-D</code> support, <code>ipython</code> <a href="https://github.com/prompt-toolkit/python-prompt-toolkit/blob/a2a12300c635ab3c051566e363ed27d853af4b21/src/prompt_toolkit/shortcuts/prompt.py#L824-L837">does</a>. (more about that in &ldquo;rule 3&rdquo; below)</p>
<p>Understanding which things are the program&rsquo;s responsibility makes it much less
surprising when different programs&rsquo; implementations are slightly different.</p>
<h3 id="rule-1-noninteractive-programs-should-quit-when-you-press-ctrl-c">rule 1: noninteractive programs should quit when you press <code>Ctrl-C</code></h3>
<p>The main reason for this rule is that noninteractive programs will quit by
default on <code>Ctrl-C</code> if they don&rsquo;t set up a <code>SIGINT</code> signal handler, so this is
kind of a &ldquo;you should act like the default&rdquo; rule.</p>
<p>Something that trips a lot of people up is that this doesn&rsquo;t apply to
<strong>interactive</strong> programs like <code>python3</code> or <code>bc</code> or <code>less</code>. This is because in
an interactive program, <code>Ctrl-C</code> has a different job &ndash; if the program is
running an operation (like for example a search in <code>less</code> or some Python code
in <code>python3</code>), then <code>Ctrl-C</code> will interrupt that operation but not stop the
program.</p>
<p>As an example of how this works in an interactive program: here&rsquo;s the code <a href="https://github.com/prompt-toolkit/python-prompt-toolkit/blob/a2a12300c635ab3c051566e363ed27d853af4b21/src/prompt_toolkit/key_binding/bindings/vi.py#L2225">in prompt-toolkit</a> (the library that IPython uses for handling input)
that aborts a search when you press <code>Ctrl-C</code>.</p>
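<p>Here&rsquo;s a minimal Python sketch of that difference. The names <code>run_interruptible</code> and <code>slow_search</code> are made up for illustration, and this isn&rsquo;t how any particular program does it. Python&rsquo;s default <code>SIGINT</code> handler raises a <code>KeyboardInterrupt</code> exception, so an interactive program can catch it around one operation and keep going:</p>

```python
import time


def run_interruptible(operation):
    """Run one operation; Ctrl-C cancels the operation, not the program."""
    try:
        return operation()
    except KeyboardInterrupt:
        # Python's default SIGINT handler raises KeyboardInterrupt.
        # A noninteractive script that never catches it just exits.
        return "interrupted"


def slow_search():
    # hypothetical stand-in for a long-running operation, like a search
    time.sleep(60)
    return "done"


# an operation that finishes normally just returns its result
print(run_interruptible(lambda: "done"))
```

<p>Calling <code>run_interruptible(slow_search)</code> and pressing <code>Ctrl-C</code> while it sleeps should return <code>"interrupted"</code> and let the program carry on, which is roughly the behaviour you see in <code>less</code> or <code>python3</code>.</p>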
<h3 id="rule-2-tuis-should-quit-when-you-press-q">rule 2: TUIs should quit when you press <code>q</code></h3>
<p>TUI programs (like <code>less</code> or <code>htop</code>) will usually quit when you press <code>q</code>.</p>
<p>This rule doesn&rsquo;t apply to any program where pressing <code>q</code> to quit wouldn&rsquo;t make
sense, like <code>tmux</code> or text editors.</p>
<h3 id="rule-3-repls-should-quit-when-you-press-ctrl-d-on-an-empty-line">rule 3: REPLs should quit when you press <code>Ctrl-D</code> on an empty line</h3>
<p>REPLs (like <code>python3</code> or <code>ed</code>) will usually quit when you press <code>Ctrl-D</code> on an
empty line. This rule is similar to the <code>Ctrl-C</code> rule &ndash; the reason for this is
that by default if you&rsquo;re running a program (like <code>cat</code>) in &ldquo;cooked mode&rdquo;, then
the operating system will return an <code>EOF</code> when you press <code>Ctrl-D</code> on an empty
line.</p>
<p>Most of the REPLs I use (sqlite3, python3, fish, bash, etc) don&rsquo;t actually use
cooked mode, but they all implement this keyboard shortcut anyway to mimic the
default behaviour.</p>
<p>For example, here&rsquo;s <a href="https://github.com/prompt-toolkit/python-prompt-toolkit/blob/a2a12300c635ab3c051566e363ed27d853af4b21/src/prompt_toolkit/shortcuts/prompt.py#L824-L837">the code in prompt-toolkit</a>
that quits when you press Ctrl-D, and here&rsquo;s <a href="https://github.com/bminor/bash/blob/6794b5478f660256a1023712b5fc169196ed0a22/lib/readline/readline.c#L658-L672">the same code in readline</a>.</p>
<p>I actually thought that this one was a &ldquo;Law of Terminal Physics&rdquo; until very
recently because I&rsquo;ve basically never seen it broken, but you can see that it&rsquo;s
just something that each individual input library has to implement in the links
above.</p>
<p>Someone pointed out that the Erlang REPL does not quit when you press <code>Ctrl-D</code>,
so I guess not every REPL follows this &ldquo;rule&rdquo;.</p>
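<p>To make &ldquo;each program has to implement it&rdquo; concrete, here&rsquo;s a minimal sketch of a toy REPL in Python. The <code>repl</code> function and its <code>read_line</code> / <code>write</code> parameters are hypothetical names for this example. Python&rsquo;s <code>input()</code> reports end-of-file by raising <code>EOFError</code>, and it&rsquo;s still up to the program to decide that this means &ldquo;quit&rdquo;:</p>

```python
def repl(read_line=input, write=print):
    """Tiny REPL that upper-cases each line and quits on end-of-file."""
    while True:
        try:
            line = read_line("> ")
        except EOFError:
            # input() raises EOFError at end-of-file, which is what
            # pressing Ctrl-D on an empty line produces
            write("bye")
            return
        write(line.upper())
```

<p>If the <code>except EOFError</code> branch were missing, the program would crash with a traceback instead of exiting cleanly, so the &ldquo;rule&rdquo; only holds because the program opts in.</p>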
<h3 id="rule-4-don-t-use-more-than-16-colours">rule 4: don&rsquo;t use more than 16 colours</h3>
<p>Terminal programs rarely use colours other than the base 16 ANSI colours. This
is because if you specify colours with a hex code, it&rsquo;s very likely to clash
with some users&rsquo; background colour. For example if I print out some text as
<code>#EEEEEE</code>, it would be almost invisible on a white background, though it would
look fine on a dark background.</p>
<p>But if you stick to the default 16 base colours, you have a much better chance
that the user has configured those colours in their terminal emulator so that
they work reasonably well with their background colour. Another reason to stick
to the default base 16 colours is that it makes fewer assumptions about what
colours the terminal emulator supports.</p>
<p>The only programs I usually see breaking this &ldquo;rule&rdquo; are text editors, for
example Helix by default will use a purple background which is not a default
ANSI colour. It seems fine for Helix to break this rule since Helix isn&rsquo;t a
&ldquo;core&rdquo; program and I assume any Helix user who doesn&rsquo;t like that colorscheme
will just change the theme.</p>
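<p>Here&rsquo;s a small Python sketch of the two ways of asking for a colour. The helper names <code>base16</code> and <code>truecolour</code> are made up for this example, and the bright colours (codes 90&ndash;97) are a widely supported extension rather than something every terminal is guaranteed to handle:</p>

```python
RESET = "\x1b[0m"


def base16(text, colour):
    """Colour text using one of the 16 base ANSI colours (0-15)."""
    if not 0 <= colour <= 15:
        raise ValueError("base colours are numbered 0-15")
    # 30-37 are the normal foreground colours, 90-97 the bright ones
    code = 30 + colour if colour < 8 else 90 + (colour - 8)
    return f"\x1b[{code}m{text}{RESET}"


def truecolour(text, r, g, b):
    """Colour text with an exact 24-bit RGB value."""
    return f"\x1b[38;2;{r};{g};{b}m{text}{RESET}"


# colour 1 is "red", but the terminal's theme decides what red looks like
print(base16("error", 1))
# always asks for exactly #EEEEEE, whatever the background colour is
print(truecolour("error", 0xEE, 0xEE, 0xEE))
```

<p>With <code>base16</code> the program only names a slot in the user&rsquo;s palette, so the user&rsquo;s theme gets the final say. With <code>truecolour</code> the program picks the exact colour and has to hope it works with the background.</p>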
<h3 id="rule-5-vaguely-support-readline-keybindings">rule 5: vaguely support readline keybindings</h3>
<p>Almost every program I use supports <code>readline</code> keybindings if it would make
sense to do so. For example, here are a bunch of different programs and a link
to where they define <code>Ctrl-E</code> to go to the end of the line:</p>
<ul>
<li>ipython (<a href="https://github.com/prompt-toolkit/python-prompt-toolkit/blob/a2a12300c635ab3c051566e363ed27d853af4b21/src/prompt_toolkit/key_binding/bindings/emacs.py#L72">Ctrl-E defined here</a>)</li>
<li>atuin (<a href="https://github.com/atuinsh/atuin/blob/a67cfc82fe0dc907a01f07a0fd625701e062a33b/crates/atuin/src/command/client/search/interactive.rs#L407">Ctrl-E defined here</a>)</li>
<li>fzf (<a href="https://github.com/junegunn/fzf/blob/bb55045596d6d08445f3c6d320c3ec2b457462d1/src/terminal.go#L611">Ctrl-E defined here</a>)</li>
<li>zsh (<a href="https://github.com/zsh-users/zsh/blob/86d5f24a3d28541f242eb3807379301ea976de87/Src/Zle/zle_bindings.c#L94">Ctrl-E defined here</a>)</li>
<li>fish (<a href="https://github.com/fish-shell/fish-shell/blob/99fa8aaaa7956178973150a03ce4954ab17a197b/share/functions/fish_default_key_bindings.fish#L43">Ctrl-E defined here</a>)</li>
<li>tmux&rsquo;s command prompt (<a href="https://github.com/tmux/tmux/blob/ae8f2208c98e3c2d6e3fe4cad2281dce8fd11f31/key-bindings.c#L490">Ctrl-E defined here</a>)</li>
</ul>
<p>None of those programs actually uses <code>readline</code> directly, they just sort of
mimic emacs/readline keybindings. They don&rsquo;t always mimic them <em>exactly</em>: for
example atuin seems to use <code>Ctrl-A</code> as a prefix, so <code>Ctrl-A</code> doesn&rsquo;t go to the
beginning of the line.</p>
<p>Also all of these programs seem to implement their own internal cut and paste
buffers so you can delete a line with <code>Ctrl-U</code> and then paste it with <code>Ctrl-Y</code>.</p>
<p>The exceptions to this are:</p>
<ul>
<li>some programs (like <code>git</code>, <code>cat</code>, and <code>nc</code>) don&rsquo;t have any line editing support at all (except for backspace, <code>Ctrl-W</code>, and <code>Ctrl-U</code>)</li>
<li>as usual text editors are an exception, every text editor has its own
approach to editing text</li>
</ul>
<p>I wrote more about this &ldquo;what keybindings does a program support?&rdquo; question in
<a href="https://jvns.ca/blog/2024/07/08/readline/">entering text in the terminal is complicated</a>.</p>
<h3 id="rule-5-1-ctrl-w-should-delete-the-last-word">rule 5.1: <code>Ctrl-W</code> should delete the last word</h3>
<p>I&rsquo;ve never seen a program (other than a text editor) where <code>Ctrl-W</code> <em>doesn&rsquo;t</em>
delete the last word. This is similar to the <code>Ctrl-C</code> rule &ndash; by default if a
program is in &ldquo;cooked mode&rdquo;, the OS will delete the last word if you press
<code>Ctrl-W</code>, and delete the whole line if you press <code>Ctrl-U</code>. So usually programs
will imitate that behaviour.</p>
<p>I can&rsquo;t think of any exceptions to this other than text editors but if there
are I&rsquo;d love to hear about them!</p>
<h3 id="rule-6-disable-colours-when-writing-to-a-pipe">rule 6: disable colours when writing to a pipe</h3>
<p>Most programs will disable colours when writing to a pipe. For example:</p>
<ul>
<li><code>rg blah</code> will highlight all occurrences of <code>blah</code> in the output, but if the
output is to a pipe or a file, it&rsquo;ll turn off the highlighting.</li>
<li><code>ls --color=auto</code> will use colour when writing to a terminal, but not when
writing to a pipe</li>
</ul>
<p>Both of those programs will also format their output differently when writing
to the terminal: <code>ls</code> will organize files into columns, and ripgrep will group
matches with headings.</p>
<p>If you want to force the program to use colour (for example because you want to
look at the colour), you can use <code>unbuffer</code> to force the program&rsquo;s output to be
a tty like this:</p>
<pre><code>unbuffer rg blah | less -R
</code></pre>
<p>I&rsquo;m sure that there are some programs that &ldquo;break&rdquo; this rule but I can&rsquo;t think
of any examples right now. Some programs have an <code>--color</code> flag that you can
use to force colour to be on, in the example above you could also do <code>rg --color=always blah | less -R</code>.</p>
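<p>Inside a program, this rule usually comes down to one <code>isatty</code> check. Here&rsquo;s a minimal Python sketch (the <code>highlight</code> function is a made-up example, not how <code>rg</code> or <code>ls</code> actually implement it):</p>

```python
import io
import sys


def highlight(text, stream=sys.stdout):
    """Wrap text in bold red, but only if stream is a terminal."""
    if stream.isatty():
        return f"\x1b[1;31m{text}\x1b[0m"
    return text


# coloured in a terminal, plain when stdout is a pipe or a file
print(highlight("blah"))
# an in-memory StringIO is never a terminal, so this is always plain
print(highlight("blah", io.StringIO()))
```

<p>A <code>--color=always</code> style flag is then just a way to skip the <code>isatty</code> check.</p>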
<h3 id="rule-7-means-stdin-stdout">rule 7: <code>-</code> means stdin/stdout</h3>
<p>Usually if you pass <code>-</code> to a program instead of a filename, it&rsquo;ll read from
stdin or write to stdout (whichever is appropriate). For example, if you want
to format the Python code that&rsquo;s on your clipboard with <code>black</code> and then copy
it, you could run:</p>
<pre><code>pbpaste | black - | pbcopy
</code></pre>
<p>(<code>pbpaste</code> is a Mac program, you can do something similar on Linux with <code>xclip</code>)</p>
<p>My impression is that most programs implement this if it would make sense and I
can&rsquo;t think of any exceptions right now, but I&rsquo;m sure there are many
exceptions.</p>
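<p>This is another convention that each program implements for itself. A minimal sketch of what that can look like in Python (<code>open_input</code> and <code>open_output</code> are hypothetical helper names):</p>

```python
import sys


def open_input(path):
    """Return a readable text file; "-" means standard input."""
    if path == "-":
        return sys.stdin
    return open(path, encoding="utf-8")


def open_output(path):
    """Return a writable text file; "-" means standard output."""
    if path == "-":
        return sys.stdout
    return open(path, "w", encoding="utf-8")
```

<p>One consequence of doing it this way is that a file that&rsquo;s literally named <code>-</code> has to be passed as <code>./-</code>.</p>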
<h3 id="these-rules-take-a-long-time-to-learn">these &ldquo;rules&rdquo; take a long time to learn</h3>
<p>These rules took me a long time to learn because I had to:</p>
<ol>
<li>learn that the rule applied anywhere at all (&quot;<code>Ctrl-C</code> will exit programs&quot;)</li>
<li>notice some exceptions (&ldquo;okay, <code>Ctrl-C</code> will exit <code>find</code> but not <code>less</code>&rdquo;)</li>
<li>subconsciously figure out what the pattern is (&quot;<code>Ctrl-C</code> will generally quit
noninteractive programs, but in interactive programs it might interrupt the
current operation instead of quitting the program&quot;)</li>
<li>eventually maybe formulate it into an explicit rule that I know</li>
</ol>
<p>A lot of my understanding of the terminal is honestly still in the
&ldquo;subconscious pattern recognition&rdquo; stage. The only reason I&rsquo;ve been taking the
time to make things explicit at all is because I&rsquo;ve been trying to explain how
it works to others. Hopefully writing down these &ldquo;rules&rdquo; explicitly will make
learning some of this stuff a little bit faster for others.</p>
]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[Why pipes sometimes get "stuck": buffering]]></title>
    <link href="https://jvns.ca/blog/2024/11/29/why-pipes-get-stuck-buffering/"/>
    <updated>2024-11-29T08:23:31+00:00</updated>
    <id>https://jvns.ca/blog/2024/11/29/why-pipes-get-stuck-buffering/</id>
    <content type="html"><![CDATA[<p>Here&rsquo;s a niche terminal problem that has bothered me for years but that I never
really understood until a few weeks ago. Let&rsquo;s say you&rsquo;re running this command
to watch for some specific output in a log file:</p>
<pre><code>tail -f /some/log/file | grep thing1 | grep thing2
</code></pre>
<p>If log lines are being added to the file relatively slowly, the result I&rsquo;d see
is&hellip; nothing! It doesn&rsquo;t matter if there were matches in the log file or not,
there just wouldn&rsquo;t be any output.</p>
<p>I internalized this as &ldquo;uh, I guess pipes just get stuck sometimes and don&rsquo;t
show me the output, that&rsquo;s weird&rdquo;, and I&rsquo;d handle it by just
running <code>grep thing1 /some/log/file | grep thing2</code> instead, which would work.</p>
<p>So as I&rsquo;ve been doing a terminal deep dive over the last few months I was
really excited to finally learn exactly why this happens.</p>
<h3 id="why-this-happens-buffering">why this happens: buffering</h3>
<p>The reason why &ldquo;pipes get stuck&rdquo; sometimes is that it&rsquo;s VERY common for
programs to buffer their output before writing it to a pipe or file. So the
pipe is working fine, the problem is that the program never even wrote the data
to the pipe!</p>
<p>This is for performance reasons: writing all output immediately as soon as you
can uses more system calls, so it&rsquo;s more efficient to save up data until you
have 8KB or so of data to write (or until the program exits) and THEN write it
to the pipe.</p>
<p>In this example:</p>
<pre><code>tail -f /some/log/file | grep thing1 | grep thing2
</code></pre>
<p>the problem is that <code>grep thing1</code> is saving up all of its matches until it has
8KB of data to write, which might literally never happen.</p>
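<p>You can watch this happen with a real pipe. Here&rsquo;s a minimal Python sketch, assuming a Unix-like system (it uses <code>select</code> on a pipe, and <code>pipe_has_data</code> is a helper made up for this example). The written line sits in the program&rsquo;s buffer and only reaches the pipe after a flush:</p>

```python
import os
import select


def pipe_has_data(fd):
    """Return True if there is something to read from fd right now."""
    readable, _, _ = select.select([fd], [], [], 0)
    return bool(readable)


read_fd, write_fd = os.pipe()
# a pipe isn't a terminal, so Python block-buffers this file object
out = os.fdopen(write_fd, "w", newline="\n")

out.write("thing1 matched\n")
stuck = not pipe_has_data(read_fd)   # the line is still in the buffer

out.flush()
arrived = pipe_has_data(read_fd)     # now it has been written to the pipe
data = os.read(read_fd, 1024)

print(stuck, arrived, data)
out.close()
os.close(read_fd)
```

<p>From the reader&rsquo;s side of the pipe, a buffered line and a line that was never produced look exactly the same: there is just nothing to read.</p>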
<h3 id="programs-don-t-buffer-when-writing-to-a-terminal">programs don&rsquo;t buffer when writing to a terminal</h3>
<p>Part of why I found this so disorienting is that <code>tail -f file | grep thing</code>
will work totally fine, but then when you add the second <code>grep</code>, it stops
working!! The reason for this is that the way <code>grep</code> handles buffering depends
on whether it&rsquo;s writing to a terminal or not.</p>
<p>Here&rsquo;s how <code>grep</code> (and many other programs) decides to buffer its output:</p>
<ul>
<li>Check if stdout is a terminal or not using the <code>isatty</code> function
<ul>
<li>If it&rsquo;s a terminal, use line buffering (print every line immediately as soon as you have it)</li>
<li>Otherwise, use &ldquo;block buffering&rdquo; &ndash; only print data if you have at least 8KB or so of data to print</li>
</ul>
</li>
</ul>
<p>So if <code>grep</code> is writing directly to your terminal then you&rsquo;ll see the line as
soon as it&rsquo;s printed, but if it&rsquo;s writing to a pipe, you won&rsquo;t.</p>
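<p>The decision described above can be sketched in a few lines of Python. <code>buffering_mode</code> is a made-up name, and real programs usually get this behaviour from libc rather than writing it themselves:</p>

```python
import os


def buffering_mode(fd):
    """Mimic the usual rule: line-buffer terminals, block-buffer the rest."""
    return "line" if os.isatty(fd) else "block"


read_fd, write_fd = os.pipe()
print(buffering_mode(write_fd))   # a pipe is not a terminal
print(buffering_mode(1))          # depends on where stdout is going
os.close(read_fd)
os.close(write_fd)
```

<p>Running this directly in a terminal and then again with <code>| cat</code> on the end should give a different answer for file descriptor 1, which is the same thing that changes for the first <code>grep</code> when you add a second one.</p>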
<p>Of course the buffer size isn&rsquo;t always 8KB for every program, it depends on the implementation. For <code>grep</code> the buffering is handled by libc, and libc&rsquo;s buffer size is
defined by the <code>BUFSIZ</code> constant. <a href="https://github.com/bminor/glibc/blob/c69e8cccaff8f2d89cee43202623b33e6ef5d24a/libio/stdio.h#L100">Here&rsquo;s where that&rsquo;s defined in glibc</a>.</p>
<p>(as an aside: &ldquo;programs do not use 8KB output buffers when writing to a
terminal&rdquo; isn&rsquo;t, like, a law of terminal physics, a program COULD use an 8KB
buffer when writing output to a terminal if it wanted, it would just be
extremely weird if it did that, I can&rsquo;t think of any program that behaves that
way)</p>
<h3 id="commands-that-buffer-commands-that-don-t">commands that buffer &amp; commands that don&rsquo;t</h3>
<p>One annoying thing about this buffering behaviour is that you kind of need to
remember which commands buffer their output when writing to a pipe.</p>
<p>Some commands that <strong>don&rsquo;t</strong> buffer their output:</p>
<ul>
<li>tail</li>
<li>cat</li>
<li>tee</li>
</ul>
<p>I think almost everything else will buffer output, especially if it&rsquo;s a command
where you&rsquo;re likely to be using it for batch processing. Here&rsquo;s a list of some
common commands that buffer their output when writing to a pipe, along with the
flag that disables block buffering.</p>
<ul>
<li>grep (<code>--line-buffered</code>)</li>
<li>sed (<code>-u</code>)</li>
<li>awk (there&rsquo;s a <code>fflush()</code> function)</li>
<li>tcpdump (<code>-l</code>)</li>
<li>jq (<code>--unbuffered</code>)</li>
<li>tr (<code>-u</code>)</li>
<li>cut (can&rsquo;t disable buffering)</li>
</ul>
<p>Those are all the ones I can think of, lots of unix commands (like <code>sort</code>) may
or may not buffer their output but it doesn&rsquo;t matter because <code>sort</code> can&rsquo;t do
anything until it finishes receiving input anyway.</p>
<p>Also I did my best to test both the Mac OS and GNU versions of these but there
are a lot of variations and I might have made some mistakes.</p>
<h3 id="programming-languages-where-the-default-print-statement-buffers">programming languages where the default &ldquo;print&rdquo; statement buffers</h3>
<p>Also, here are a few programming languages where the default print statement
will buffer output when writing to a pipe, and some ways to disable buffering
if you want:</p>
<ul>
<li>C (disable with <code>setvbuf</code>)</li>
<li>Python (disable with <code>python -u</code>, or <code>PYTHONUNBUFFERED=1</code>, or <code>sys.stdout.reconfigure(line_buffering=True)</code>, or <code>print(x, flush=True)</code>)</li>
<li>Ruby (disable with <code>STDOUT.sync = true</code>)</li>
<li>Perl (disable with <code>$| = 1</code>)</li>
</ul>
<p>I assume that these languages are designed this way so that the default print
function will be fast when you&rsquo;re doing batch processing.</p>
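<p>To make the Python case concrete, here&rsquo;s a small sketch that uses an in-memory <code>io.BytesIO</code> as a stand-in for the pipe (that stand-in is an assumption for the demo, a real pipe has an extra layer of byte buffering in front of it). It shows that a short line stays in the buffer until something flushes it:</p>
<pre><code>import io

pipe = io.BytesIO()  # stands in for the pipe we are writing to
out = io.TextIOWrapper(pipe, encoding='utf-8', line_buffering=False)

out.write('hello\n')
before_flush = pipe.getvalue()   # b'': the line is still in the buffer
out.flush()
after_flush = pipe.getvalue()    # b'hello\n'

# With line buffering turned on, a newline triggers the flush for us.
out.reconfigure(line_buffering=True)
out.write('world\n')
after_newline = pipe.getvalue()  # b'hello\nworld\n'
</code></pre>
<p>Calling <code>reconfigure(line_buffering=True)</code> on <code>sys.stdout</code> should have the same effect on a real script&rsquo;s output.</p>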
<p>Also whether output is buffered or not might depend on how you print, for
example in C++ <code>cout &lt;&lt; &quot;hello\n&quot;</code> buffers when writing to a pipe but <code>cout &lt;&lt; &quot;hello&quot; &lt;&lt; endl</code> will flush its output.</p>
<h3 id="when-you-press-ctrl-c-on-a-pipe-the-contents-of-the-buffer-are-lost">when you press <code>Ctrl-C</code> on a pipe, the contents of the buffer are lost</h3>
<p>Let&rsquo;s say you&rsquo;re running this command as a hacky way to watch for DNS requests
to <code>example.com</code>, and you forgot to pass <code>-l</code> to tcpdump:</p>
<pre><code>sudo tcpdump -ni any port 53 | grep example.com
</code></pre>
<p>When you press <code>Ctrl-C</code>, what happens? In a magical perfect world, what I would
<em>want</em> to happen is for <code>tcpdump</code> to flush its buffer, <code>grep</code> would search for
<code>example.com</code>, and I would see all the output I missed.</p>
<p>But in the real world, what happens is that all the programs get killed and the
output in <code>tcpdump</code>&rsquo;s buffer is lost.</p>
<p>I think this problem is probably unavoidable &ndash; I spent a little time with
<code>strace</code> to see how this works and <code>grep</code> receives the <code>SIGINT</code> before
<code>tcpdump</code> anyway so even if <code>tcpdump</code> tried to flush its buffer <code>grep</code> would
already be dead.</p>
<small>
<p>After a little more investigation, there is a workaround: if you find
<code>tcpdump</code>&rsquo;s PID and <code>kill -TERM $PID</code>, then tcpdump will flush the buffer so
you can see the output. That&rsquo;s kind of a pain but I tested it and it seems to
work.</p>
</small>
<h3 id="redirecting-to-a-file-also-buffers">redirecting to a file also buffers</h3>
<p>It&rsquo;s not just pipes, this will also buffer:</p>
<pre><code>sudo tcpdump -ni any port 53 &gt; output.txt
</code></pre>
<p>Redirecting to a file doesn&rsquo;t have the same &ldquo;<code>Ctrl-C</code> will totally destroy the
contents of the buffer&rdquo; problem though &ndash; in my experience it usually behaves
more like you&rsquo;d want, where the contents of the buffer get written to the file
before the program exits. I&rsquo;m not 100% sure whether this is something you can
always rely on or not.</p>
<h3 id="a-bunch-of-potential-ways-to-avoid-buffering">a bunch of potential ways to avoid buffering</h3>
<p>Okay, let&rsquo;s talk solutions. Let&rsquo;s say you&rsquo;ve run this command:</p>
<pre><code>tail -f /some/log/file | grep thing1 | grep thing2
</code></pre>
<p>I asked people on Mastodon how they would solve this in practice and there were
5 basic approaches. Here they are:</p>
<h4 id="solution-1-run-a-program-that-finishes-quickly">solution 1: run a program that finishes quickly</h4>
<p>Historically my solution to this has been to just avoid the &ldquo;command writing to
pipe slowly&rdquo; situation completely and instead run a program that will finish quickly
like this:</p>
<pre><code>cat /some/log/file | grep thing1 | grep thing2 | tail
</code></pre>
<p>This doesn&rsquo;t do the same thing as the original command but it does mean that
you get to avoid thinking about these weird buffering issues.</p>
<p>(you could also do <code>grep thing1 /some/log/file</code> but I often prefer to use an
&ldquo;unnecessary&rdquo; <code>cat</code>)</p>
<h4 id="solution-2-remember-the-line-buffer-flag-to-grep">solution 2: remember the &ldquo;line buffer&rdquo; flag to grep</h4>
<p>You could remember that grep has a flag to avoid buffering and pass it like this:</p>
<pre><code>tail -f /some/log/file | grep --line-buffered thing1 | grep thing2
</code></pre>
<h4 id="solution-3-use-awk">solution 3: use awk</h4>
<p>Some people said that if they&rsquo;re specifically dealing with a multiple greps
situation, they&rsquo;ll rewrite it to use a single <code>awk</code> instead, like this:</p>
<pre><code>tail -f /some/log/file |  awk '/thing1/ &amp;&amp; /thing2/'
</code></pre>
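<p>Or you could write a more complicated <code>grep</code>, like this (note that this version only matches lines where <code>thing1</code> comes before <code>thing2</code>):</p>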
<pre><code>tail -f /some/log/file |  grep -E 'thing1.*thing2'
</code></pre>
<p>(<code>awk</code> also buffers, so for this to work you&rsquo;ll want <code>awk</code> to be the last command in the pipeline)</p>
<h4 id="solution-4-use-stdbuf">solution 4: use <code>stdbuf</code></h4>
<p><code>stdbuf</code> uses LD_PRELOAD to turn off libc&rsquo;s buffering, and you can use it to turn off output buffering like this:</p>
<pre><code>tail -f /some/log/file | stdbuf -o0 grep thing1 | grep thing2
</code></pre>
<p>Like any <code>LD_PRELOAD</code> solution it&rsquo;s a bit unreliable &ndash; it doesn&rsquo;t work on
static binaries, I think it won&rsquo;t work if the program isn&rsquo;t using libc&rsquo;s
buffering, and it doesn&rsquo;t always work on Mac OS. Harry Marr has a really nice <a href="https://hmarr.com/blog/how-stdbuf-works/">How stdbuf works</a> post.</p>
<h4 id="solution-5-use-unbuffer">solution 5: use <code>unbuffer</code></h4>
<p><code>unbuffer program</code> will force the program&rsquo;s output to be a TTY, which means
that it&rsquo;ll behave the way it normally would on a TTY (less buffering, colour
output, etc). You could use it in this example like this:</p>
<pre><code>tail -f /some/log/file | unbuffer grep thing1 | grep thing2
</code></pre>
<p>Unlike <code>stdbuf</code> it will always work, though it might have unwanted side
effects, for example <code>grep thing1</code> will also colour matches.</p>
<p>If you want to install unbuffer, it&rsquo;s in the <code>expect</code> package.</p>
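<p>My understanding is that <code>unbuffer</code> does this by running the program on a pseudo-terminal. Here&rsquo;s a minimal Python sketch (not how <code>unbuffer</code> itself is written, and it assumes a Unix-like system where <code>pty.openpty()</code> is available) showing why that changes what <code>isatty</code> reports:</p>
<pre><code>import os
import pty

# An ordinary pipe, like the one the shell creates for the | operator.
read_end, write_end = os.pipe()
# A pseudo-terminal pair: the program would write to slave_fd.
master_fd, slave_fd = pty.openpty()

pipe_is_tty = os.isatty(write_end)  # False, so programs block buffer
pty_is_tty = os.isatty(slave_fd)    # True, so programs line buffer

for fd in (read_end, write_end, master_fd, slave_fd):
    os.close(fd)

print(pipe_is_tty, pty_is_tty)
</code></pre>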
<h3 id="that-s-all-the-solutions-i-know-about">that&rsquo;s all the solutions I know about!</h3>
<p>It&rsquo;s a bit hard for me to say which one is &ldquo;best&rdquo;, I think personally I&rsquo;m
most likely to use <code>unbuffer</code> because I know it&rsquo;s always going to work.</p>
<p>If I learn about more solutions I&rsquo;ll try to add them to this post.</p>
<h3 id="i-m-not-really-sure-how-often-this-comes-up">I&rsquo;m not really sure how often this comes up</h3>
<p>I think it&rsquo;s not very common for me to have a program that slowly trickles data
into a pipe like this, normally if I&rsquo;m using a pipe a bunch of data gets
written very quickly, processed by everything in the pipeline, and then
everything exits. The only examples I can come up with right now are:</p>
<ul>
<li>tcpdump</li>
<li><code>tail -f</code></li>
<li>watching log files in a different way like with <code>kubectl logs</code></li>
<li>the output of a slow computation</li>
</ul>
<h3 id="what-if-there-were-an-environment-variable-to-disable-buffering">what if there were an environment variable to disable buffering?</h3>
<p>I think it would be cool if there were a standard environment variable to turn
off buffering, like <code>PYTHONUNBUFFERED</code> in Python. I got this idea from a
<a href="https://blog.plover.com/Unix/stdio-buffering.html">couple</a> of <a href="https://blog.plover.com/Unix/stdio-buffering-2.html">blog posts</a> by Mark Dominus
in 2018. Maybe <code>NO_BUFFER</code> like <a href="https://no-color.org/">NO_COLOR</a>?</p>
<p>The design seems tricky to get right; Mark points out that NetBSD has <a href="https://man.netbsd.org/setbuf.3">environment variables called <code>STDBUF</code>, <code>STDBUF1</code>, etc</a> which give you a
ton of control over buffering but I imagine most developers don&rsquo;t want to
implement many different environment variables to handle a relatively minor
edge case.</p>
<p>I&rsquo;m also curious about whether there are any programs that just automatically
flush their output buffers after some period of time (like 1 second). It feels
like it would be nice in theory but I can&rsquo;t think of any program that does that
so I imagine there are some downsides.</p>
<h3 id="stuff-i-left-out">stuff I left out</h3>
<p>Some things I didn&rsquo;t talk about in this post since these posts have been
getting pretty long recently and seriously does anyone REALLY want to read 3000
words about buffering?</p>
<ul>
<li>the difference between line buffering and having totally unbuffered output</li>
<li>how buffering to stderr is different from buffering to stdout</li>
<li>this post is only about buffering that happens <strong>inside the program</strong>, your
operating system&rsquo;s TTY driver also does a little bit of buffering sometimes</li>
<li>other reasons you might need to flush your output other than &ldquo;you&rsquo;re writing
to a pipe&rdquo;</li>
</ul>
]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[Importing a frontend Javascript library without a build system]]></title>
    <link href="https://jvns.ca/blog/2024/11/18/how-to-import-a-javascript-library/"/>
    <updated>2024-11-18T09:35:42+00:00</updated>
    <id>https://jvns.ca/blog/2024/11/18/how-to-import-a-javascript-library/</id>
    <content type="html"><![CDATA[<p>I like writing Javascript <a href="https://jvns.ca/blog/2023/02/16/writing-javascript-without-a-build-system/">without a build system</a>
and for the millionth time yesterday I ran into a problem where I needed to
figure out how to import a Javascript library in my code without using a build
system, and it took FOREVER to figure out how to import it because the
library&rsquo;s setup instructions assume that you&rsquo;re using a build system.</p>
<p>Luckily at this point I&rsquo;ve mostly learned how to navigate this situation and
either successfully use the library or decide it&rsquo;s too difficult and switch to
a different library, so here&rsquo;s the guide I wish I had to importing Javascript
libraries years ago.</p>
<p>I&rsquo;m only going to talk about using Javascript libraries on the frontend, and
only about how to use them in a no-build-system setup.</p>
<p>In this post I&rsquo;m going to talk about:</p>
<ol>
<li>the three main types of Javascript files a library might provide (ES Modules, the &ldquo;classic&rdquo; global variable kind, and CommonJS)</li>
<li>how to figure out which types of files a Javascript library includes in its build</li>
<li>ways to import each type of file in your code</li>
</ol>
<h3 id="the-three-kinds-of-javascript-files">the three kinds of Javascript files</h3>
<p>There are 3 basic types of Javascript files a library can provide:</p>
<ol>
<li>the &ldquo;classic&rdquo; type of file that defines a global variable. This is the kind
of file that you can just <code>&lt;script src&gt;</code> and it&rsquo;ll Just Work. Great if you
can get it but not always available</li>
<li>an ES module (which may or may not depend on other files, we&rsquo;ll get to that)</li>
<li>a &ldquo;CommonJS&rdquo; module. This is for Node, you can&rsquo;t use it in a browser at all
without using a build system.</li>
</ol>
<p>I&rsquo;m not sure if there&rsquo;s a better name for the &ldquo;classic&rdquo; type but I&rsquo;m just going
to call it &ldquo;classic&rdquo;. Also there&rsquo;s a type called &ldquo;AMD&rdquo; but I&rsquo;m not sure how
relevant it is in 2024.</p>
<p>Now that we know the 3 types of files, let&rsquo;s talk about how to figure out which
of these the library actually provides!</p>
<h3 id="where-to-find-the-files-the-npm-build">where to find the files: the NPM build</h3>
<p>Every Javascript library has a <strong>build</strong> which it uploads to NPM. You might be
thinking (like I did originally) &ndash; Julia! The whole POINT is that we&rsquo;re not
using Node to build our library! Why are we talking about NPM?</p>
<p>But if you&rsquo;re using a link from a CDN like <a href="https://cdnjs.cloudflare.com/ajax/libs/Chart.js/4.4.1/chart.umd.min.js">https://cdnjs.cloudflare.com/ajax/libs/Chart.js/4.4.1/chart.umd.min.js</a>,
you&rsquo;re still using the NPM build! All the files on the CDNs originally come
from NPM.</p>
<p>Because of this, I sometimes like to <code>npm install</code> the library even if I&rsquo;m not
planning to use Node to build my library at all &ndash; I&rsquo;ll just create a new temp
folder, <code>npm install</code> there, and then delete it when I&rsquo;m done. I like being able to poke
around in the files in the NPM build on my filesystem, because then I can be
100% sure that I&rsquo;m seeing everything that the library is making available in
its build and that the CDN isn&rsquo;t hiding something from me.</p>
<p>So let&rsquo;s <code>npm install</code> a few libraries and try to figure out what types of
Javascript files they provide in their builds!</p>
<h3 id="example-library-1-chart-js">example library 1: chart.js</h3>
<p>First let&rsquo;s look inside <a href="https://www.chartjs.org">Chart.js</a>, a plotting library.</p>
<pre><code>$ cd /tmp/whatever
$ npm install chart.js
$ cd node_modules/chart.js/dist
$ ls *.*js
chart.cjs  chart.js  chart.umd.js  helpers.cjs  helpers.js
</code></pre>
<p>This library seems to have 3 basic options:</p>
<p><strong>option 1:</strong> <code>chart.cjs</code>. The <code>.cjs</code> suffix tells me that this is a <strong>CommonJS
file</strong>, for using in Node. This means it&rsquo;s impossible to use it directly in the
browser without some kind of build step.</p>
<p><strong>option 2: <code>chart.js</code></strong>. The <code>.js</code> suffix by itself doesn&rsquo;t tell us what kind of
file it is, but if I open it up, I see <code>import '@kurkle/color';</code> which is an
immediate sign that this is an ES module &ndash; the <code>import ...</code> syntax is ES
module syntax.</p>
<p><strong>option 3: <code>chart.umd.js</code></strong>. &ldquo;UMD&rdquo; stands for &ldquo;Universal Module Definition&rdquo;,
which I think means that you can use this file either with a basic <code>&lt;script src&gt;</code>, CommonJS,
or some third thing called AMD that I don&rsquo;t understand.</p>
<h3 id="how-to-use-a-umd-file">how to use a UMD file</h3>
<p>When I was using Chart.js I picked Option 3. I just needed to add this to my
code:</p>
<pre><code>&lt;script src=&quot;./chart.umd.js&quot;&gt; &lt;/script&gt;
</code></pre>
<p>and then I could use the library with the global <code>Chart</code> variable.
Couldn&rsquo;t be easier. I just copied <code>chart.umd.js</code> into my Git repository so that
I didn&rsquo;t have to worry about using NPM or the CDNs going down or anything.</p>
<h3 id="the-build-files-aren-t-always-in-the-dist-directory">the build files aren&rsquo;t always in the <code>dist</code> directory</h3>
<p>A lot of libraries will put their build in the <code>dist</code> directory, but not
always! The build files&rsquo; location is specified in the library&rsquo;s <code>package.json</code>.</p>
<p>For example here&rsquo;s an excerpt from Chart.js&rsquo;s <code>package.json</code>.</p>
<pre><code>  &quot;jsdelivr&quot;: &quot;./dist/chart.umd.js&quot;,
  &quot;unpkg&quot;: &quot;./dist/chart.umd.js&quot;,
  &quot;main&quot;: &quot;./dist/chart.cjs&quot;,
  &quot;module&quot;: &quot;./dist/chart.js&quot;,
</code></pre>
<p>I think this is saying that if you want to use an ES Module (<code>module</code>) you
should use <code>dist/chart.js</code>, but the jsDelivr and unpkg CDNs should use
<code>./dist/chart.umd.js</code>. I guess <code>main</code> is for Node.</p>
<p><code>chart.js</code>&rsquo;s <code>package.json</code> also says <code>&quot;type&quot;: &quot;module&quot;</code>, which <a href="https://nodejs.org/api/packages.html#modules-packages">according to this documentation</a>
tells Node to treat files as ES modules by default. I think it doesn&rsquo;t tell us
specifically which files are ES modules and which ones aren&rsquo;t but it does tell
us that <em>something</em> in there is an ES module.</p>
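<p>If you want to check these fields without opening <code>package.json</code> by hand, here&rsquo;s a small Python sketch. The <code>entry_points</code> helper is a made-up name, and the choice of fields to print is just the ones mentioned above, not an exhaustive list:</p>
<pre><code>import json

# Excerpt of Chart.js's package.json, copied from above.
CHART_JS_PACKAGE = '''
{
  "jsdelivr": "./dist/chart.umd.js",
  "unpkg": "./dist/chart.umd.js",
  "main": "./dist/chart.cjs",
  "module": "./dist/chart.js",
  "type": "module"
}
'''

def entry_points(package_json_text):
    # Keep only the fields that hint at which build file to use.
    pkg = json.loads(package_json_text)
    fields = ('type', 'main', 'module', 'jsdelivr', 'unpkg')
    return {name: pkg[name] for name in fields if name in pkg}

for field, value in entry_points(CHART_JS_PACKAGE).items():
    print(field, value)
</code></pre>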
<h3 id="example-library-2-atcute-oauth-browser-client">example library 2: <code>@atcute/oauth-browser-client</code></h3>
<p><a href="https://github.com/mary-ext/atcute/tree/trunk/packages/oauth/browser-client"><code>@atcute/oauth-browser-client</code></a>
is a library for logging into Bluesky with OAuth in the browser.</p>
<p>Let&rsquo;s see what kinds of Javascript files it provides in its build!</p>
<pre><code>$ npm install @atcute/oauth-browser-client
$ cd node_modules/@atcute/oauth-browser-client/dist
$ ls *js
constants.js  dpop.js  environment.js  errors.js  index.js  resolvers.js
</code></pre>
<p>It seems like the only plausible root file in here is <code>index.js</code>, which looks
something like this:</p>
<pre><code>export { configureOAuth } from './environment.js';
export * from './errors.js';
export * from './resolvers.js';
</code></pre>
<p>This <code>export</code> syntax means it&rsquo;s an <strong>ES module</strong>. That means we can use it in
the browser without a build step! Let&rsquo;s see how to do that.</p>
<h3 id="how-to-use-an-es-module-with-importmaps">how to use an ES module with importmaps</h3>
<p>Using an ES module isn&rsquo;t as easy as just adding a <code>&lt;script src=&quot;whatever.js&quot;&gt;</code>. Instead, if
the ES module has dependencies (like <code>@atcute/oauth-browser-client</code> does) the
steps are:</p>
<ol>
<li>Set up an import map in your HTML</li>
<li>Put import statements like <code>import { configureOAuth } from '@atcute/oauth-browser-client';</code> in your JS code</li>
<li>Include your JS code in your HTML like this: <code>&lt;script type=&quot;module&quot; src=&quot;YOURSCRIPT.js&quot;&gt;&lt;/script&gt;</code></li>
</ol>
<p>The reason we need an import map instead of just doing something like <code>import { configureOAuth } from &quot;./oauth-browser-client.js&quot;</code> is that internally the module has more import statements like <code>import {something} from &quot;@atcute/client&quot;</code>, and we need to tell the browser where to get the code for <code>@atcute/client</code> and all of its other dependencies.</p>
<p>Here&rsquo;s what the importmap I used looks like for <code>@atcute/oauth-browser-client</code>:</p>
<pre><code>&lt;script type=&quot;importmap&quot;&gt;
{
  &quot;imports&quot;: {
    &quot;nanoid&quot;: &quot;./node_modules/nanoid/bin/dist/index.js&quot;,
    &quot;nanoid/non-secure&quot;: &quot;./node_modules/nanoid/non-secure/index.js&quot;,
    &quot;nanoid/url-alphabet&quot;: &quot;./node_modules/nanoid/url-alphabet/dist/index.js&quot;,
    &quot;@atcute/oauth-browser-client&quot;: &quot;./node_modules/@atcute/oauth-browser-client/dist/index.js&quot;,
    &quot;@atcute/client&quot;: &quot;./node_modules/@atcute/client/dist/index.js&quot;,
    &quot;@atcute/client/utils/did&quot;: &quot;./node_modules/@atcute/client/dist/utils/did.js&quot;
  }
}
&lt;/script&gt;
</code></pre>
<p>Getting these import maps to work is pretty fiddly, I feel like there must be a
tool to generate them automatically but I haven&rsquo;t found one yet. It&rsquo;s definitely possible to
write a script that automatically generates the importmaps using <a href="https://esbuild.github.io/api/#metafile">esbuild&rsquo;s metafile</a> but I haven&rsquo;t done that and
maybe there&rsquo;s a better way.</p>
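<p>For the simplest case, here&rsquo;s a rough Python sketch of what generating an importmap could look like. It&rsquo;s a sketch under big assumptions: <code>build_import_map</code> is a hypothetical helper, it only looks at the <code>module</code> and <code>main</code> fields, and it ignores subpath imports like <code>nanoid/non-secure</code> as well as the <code>exports</code> field, so a real tool would need to do more:</p>
<pre><code>import json

def build_import_map(packages, root='./node_modules'):
    # packages maps a package name to its parsed package.json.
    imports = {}
    for name, pkg in packages.items():
        # Prefer the ES module entry point, then fall back to main.
        entry = pkg.get('module') or pkg.get('main') or 'index.js'
        if entry.startswith('./'):
            entry = entry[2:]
        imports[name] = root + '/' + name + '/' + entry
    return json.dumps({'imports': imports}, indent=2)

# Hypothetical input: these field values are made up for the example.
example = {
    '@atcute/client': {'module': './dist/index.js'},
    'nanoid': {'main': 'index.js'},
}
print(build_import_map(example))
</code></pre>
<p>The output is the JSON that goes inside the <code>&lt;script type=&quot;importmap&quot;&gt;</code> tag.</p>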
<p>I decided to set up importmaps yesterday to get
<a href="https://github.com/jvns/bsky-oauth-example">github.com/jvns/bsky-oauth-example</a>
to work, so there&rsquo;s some example code in that repo.</p>
<p>Also someone pointed me to Simon Willison&rsquo;s
<a href="https://simonwillison.net/2023/May/2/download-esm/">download-esm</a>, which will
download an ES module and rewrite the imports to point to the JS files directly
so that you don&rsquo;t need importmaps. I haven&rsquo;t tried it yet but it seems like a
great idea.</p>
<h3 id="problems-with-importmaps-too-many-files">problems with importmaps: too many files</h3>
<p>I did run into some problems with using importmaps in the browser though &ndash; it
needed to download dozens of Javascript files to load my site, and my webserver
in development couldn&rsquo;t keep up for some reason. I kept seeing files fail to
load randomly and then had to reload the page and hope that they would succeed
this time.</p>
<p>It wasn&rsquo;t an issue anymore when I deployed my site to production, so I guess it
was a problem with my local dev environment.</p>
<p>Also one slightly annoying thing about ES modules in general is that you need to
be running a webserver to use them, I&rsquo;m sure this is for a good reason but it&rsquo;s
easier when you can just open your <code>index.html</code> file without starting a
webserver.</p>
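<p>If you just need some local webserver for this, Python&rsquo;s standard library has one. Here&rsquo;s a minimal sketch (the <code>make_dev_server</code> name and port 8000 are arbitrary choices for the example, and I haven&rsquo;t checked whether it avoids the failed loads described above):</p>
<pre><code>from functools import partial
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer

def make_dev_server(directory='.', port=8000):
    # Serve static files from the given directory, only on localhost.
    handler = partial(SimpleHTTPRequestHandler, directory=directory)
    return ThreadingHTTPServer(('127.0.0.1', port), handler)

# To actually serve the current directory, run:
#   make_dev_server().serve_forever()
</code></pre>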
<p>Because of the &ldquo;too many files&rdquo; thing I think using ES modules with
importmaps in this way isn&rsquo;t actually that appealing to me, but it&rsquo;s good to
know it&rsquo;s possible.</p>
<h3 id="how-to-use-an-es-module-without-importmaps">how to use an ES module without importmaps</h3>
<p>If the ES module doesn&rsquo;t have dependencies then it&rsquo;s even easier &ndash; you don&rsquo;t
need the importmaps! You can just:</p>
<ul>
<li>put <code>&lt;script type=&quot;module&quot; src=&quot;YOURCODE.js&quot;&gt;&lt;/script&gt;</code> in your HTML. The <code>type=&quot;module&quot;</code> is important.</li>
<li>put <code>import {whatever} from &quot;https://example.com/whatever.js&quot;</code> in <code>YOURCODE.js</code></li>
</ul>
<h3 id="alternative-use-esbuild">alternative: use esbuild</h3>
<p>If you don&rsquo;t want to use importmaps, you can also use a build system like <a href="https://esbuild.github.io/">esbuild</a>. I talked about how to do
that in <a href="https://jvns.ca/blog/2021/11/15/esbuild-vue/">Some notes on using esbuild</a>, but this blog post is
about ways to avoid build systems completely so I&rsquo;m not going to talk about
that option here. I do still like esbuild though and I think it&rsquo;s a good option
in this case.</p>
<h3 id="what-s-the-browser-support-for-importmaps">what&rsquo;s the browser support for importmaps?</h3>
<p><a href="https://caniuse.com/import-maps">CanIUse</a> says that importmaps are in
&ldquo;Baseline 2023: newly available across major browsers&rdquo; so my sense is that in
2024 that&rsquo;s still maybe a little bit too new? I think I would use importmaps
for some fun experimental code that I only wanted like myself and 12 people to
use, but if I wanted my code to be more widely usable I&rsquo;d use <code>esbuild</code> instead.</p>
<h3 id="example-library-3-atproto-oauth-client-browser">example library 3: <code>@atproto/oauth-client-browser</code></h3>
<p>Let&rsquo;s look at one final example library! This is a different Bluesky auth
library than <code>@atcute/oauth-browser-client</code>.</p>
<pre><code>$ npm install @atproto/oauth-client-browser
$ cd node_modules/@atproto/oauth-client-browser/dist
$ ls *js
browser-oauth-client.js  browser-oauth-database.js  browser-runtime-implementation.js  errors.js  index.js  indexed-db-store.js  util.js
</code></pre>
<p>Again, it seems like the only real candidate file here is <code>index.js</code>. But this is a
different situation from the previous example library! There&rsquo;s a bunch of stuff
like this in <code>index.js</code>:</p>
<pre><code>__exportStar(require(&quot;@atproto/oauth-client&quot;), exports);
__exportStar(require(&quot;./browser-oauth-client.js&quot;), exports);
__exportStar(require(&quot;./errors.js&quot;), exports);
var util_js_1 = require(&quot;./util.js&quot;);
</code></pre>
<p>This <code>require()</code> syntax is CommonJS syntax, which means that we can&rsquo;t use this
file in the browser at all, we need to use some kind of build step, and
ESBuild won&rsquo;t work either.</p>
<p>Also in this library&rsquo;s <code>package.json</code> it says <code>&quot;type&quot;: &quot;commonjs&quot;</code> which is
another way to tell it&rsquo;s CommonJS.</p>
<h3 id="how-to-use-a-commonjs-module-with-esm-sh-https-esm-sh">how to use a CommonJS module with <a href="https://esm.sh">esm.sh</a></h3>
<p>Originally I thought it was impossible to use CommonJS modules without learning
a build system, but then someone on Bluesky told me about
<a href="https://esm.sh">esm.sh</a>! It&rsquo;s a CDN that will translate anything into an ES
Module. <a href="https://www.skypack.dev/">skypack.dev</a> does something similar, I&rsquo;m not
sure what the difference is but one person mentioned that if one doesn&rsquo;t work
sometimes they&rsquo;ll try the other one.</p>
<p>For <code>@atproto/oauth-client-browser</code> using it seems pretty simple, I just need to put this in my HTML:</p>
<pre><code>&lt;script type=&quot;module&quot; src=&quot;script.js&quot;&gt; &lt;/script&gt;
</code></pre>
<p>and then put this in <code>script.js</code>.</p>
<pre><code>import { BrowserOAuthClient } from &quot;https://esm.sh/@atproto/oauth-client-browser&quot;
</code></pre>
<p>It seems to Just Work, which is cool! Of course this is still sort of using a
build system &ndash; it&rsquo;s just that esm.sh is running the build instead of me. My
main concerns with this approach are:</p>
<ul>
<li>I don&rsquo;t really trust CDNs to keep working forever &ndash; usually I like to copy dependencies into my repository so that they don&rsquo;t go away for some reason in the future.</li>
<li>I&rsquo;ve heard of some issues with CDNs having security compromises which scares me.</li>
<li>I don&rsquo;t really understand what esm.sh is doing.</li>
</ul>
<h3 id="esbuild-can-also-convert-commonjs-modules-into-es-modules">esbuild can also convert CommonJS modules into ES modules</h3>
<p>I also learned that you can also use <code>esbuild</code> to convert a CommonJS module
into an ES module, though there are some limitations &ndash; the <code>import { BrowserOAuthClient } from</code> syntax doesn&rsquo;t work. Here&rsquo;s a <a href="https://github.com/evanw/esbuild/issues/442">github issue about that</a>.</p>
<p>I think the <code>esbuild</code> approach is probably more appealing to me than the
<code>esm.sh</code> approach because it&rsquo;s a tool that I already have on my computer so I
trust it more. I haven&rsquo;t experimented with this much yet though.</p>
<h3 id="summary-of-the-three-types-of-files">summary of the three types of files</h3>
<p>Here&rsquo;s a summary of the three types of JS files you might encounter, options
for how to use them, and how to identify them.</p>
<p>Unhelpfully a <code>.js</code> or <code>.min.js</code> file extension could be any of these 3
options, so if the file is <code>something.js</code> you need to do more detective work to
figure out what you&rsquo;re dealing with.</p>
<ol>
<li><strong>&ldquo;classic&rdquo; JS files</strong>
<ul>
<li><strong>How to use it:</strong> <code>&lt;script src=&quot;whatever.js&quot;&gt;&lt;/script&gt;</code></li>
<li><strong>Ways to identify it:</strong>
<ul>
<li>The website has a big friendly banner in its setup instructions saying &ldquo;Use this with a CDN!&rdquo;  or something</li>
<li>A <code>.umd.js</code> extension</li>
<li>Just try to put it in a <code>&lt;script src=...</code> tag and see if it works</li>
</ul>
</li>
</ul>
</li>
<li><strong>ES Modules</strong>
<ul>
<li><strong>Ways to use it:</strong>
<ul>
<li>If there are no dependencies, just <code>import {whatever} from &quot;./my-module.js&quot;</code> directly in your code</li>
<li>If there are dependencies, create an importmap and <code>import {whatever} from &quot;my-module&quot;</code>
<ul>
<li>or use <a href="https://simonwillison.net/2023/May/2/download-esm/">download-esm</a> to remove the need for an importmap</li>
</ul>
</li>
<li>Use <a href="https://esbuild.github.io/">esbuild</a> or any ES Module bundler</li>
</ul>
</li>
<li><strong>Ways to identify it:</strong>
<ul>
<li>Look for an <code>import </code> or <code>export </code> statement. (not <code>module.exports = ...</code>, that&rsquo;s CommonJS)</li>
<li>An <code>.mjs</code> extension</li>
<li>maybe <code>&quot;type&quot;: &quot;module&quot;</code> in <code>package.json</code> (though it&rsquo;s not clear to me which file exactly this refers to)</li>
</ul>
</li>
</ul>
</li>
<li><strong>CommonJS Modules</strong>
<ul>
<li><strong>Ways to use it:</strong>
<ul>
<li>Use <a href="https://esm.sh/#docs">https://esm.sh</a> to convert it into an ES module, like <code>https://esm.sh/@atproto/oauth-client-browser</code></li>
<li>Use a build somehow (??)</li>
</ul>
</li>
<li><strong>Ways to identify it:</strong>
<ul>
<li>Look for <code>require()</code> or <code>module.exports = ...</code> in the code</li>
<li>A <code>.cjs</code> extension</li>
<li>maybe <code>&quot;type&quot;: &quot;commonjs&quot;</code> in <code>package.json</code> (though it&rsquo;s not clear to me which file exactly this refers to)</li>
</ul>
</li>
</ul>
</li>
</ol>
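<p>The &ldquo;ways to identify it&rdquo; checks above can be sketched as a small Python function. This is only a heuristic with a made-up name (<code>guess_js_file_type</code>): it will get minified files wrong if their <code>import</code> statements aren&rsquo;t at the start of a line, and it falls back to &ldquo;classic&rdquo; when it doesn&rsquo;t recognize anything:</p>
<pre><code>import re

def guess_js_file_type(filename, source):
    # File extensions are the strongest hint, so check them first.
    if filename.endswith('.cjs'):
        return 'commonjs'
    if filename.endswith('.mjs'):
        return 'es module'
    if '.umd.' in filename:
        return 'classic'
    # A line starting with import or export means ES module syntax.
    if re.search(r'^\s*(import|export)\s', source, re.MULTILINE):
        return 'es module'
    if re.search(r'\brequire\(|\bmodule\.exports\b', source):
        return 'commonjs'
    # Nothing recognisable: guess that it defines a global variable.
    return 'classic'

print(guess_js_file_type('chart.js', "import '@kurkle/color';"))
print(guess_js_file_type('index.js', 'var u = require("./util.js");'))
</code></pre>
<p>For the two example lines, the first is classified as an ES module and the second as CommonJS.</p>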
<h3 id="it-s-really-nice-to-have-es-modules-standardized">it&rsquo;s really nice to have ES modules standardized</h3>
<p>The main difference between CommonJS modules and ES modules from my perspective
is that ES modules are actually a standard. This makes me feel a lot more
confident using them, because browsers commit to backwards compatibility for
web standards forever &ndash; if I write some code using ES modules today, I can
feel sure that it&rsquo;ll still work the same way in 15 years.</p>
<p>It also makes me feel better about using tooling like <code>esbuild</code> because even if
the esbuild project dies, because it&rsquo;s implementing a standard it feels likely
that there will be another similar tool in the future that I can replace it
with.</p>
<h3 id="the-js-community-has-built-a-lot-of-very-cool-tools">the JS community has built a lot of very cool tools</h3>
<p>A lot of the time when I talk about this stuff I get responses like &ldquo;I hate
javascript!!! it&rsquo;s the worst!!!&rdquo;. But my experience is that there are a lot of great tools for Javascript
(I just learned about <a href="https://esm.sh">https://esm.sh</a> yesterday which seems great! I love
esbuild!), and that if I take the time to learn how things work I can take
advantage of some of those tools and make my life a lot easier.</p>
<p>So the goal of this post is definitely not to complain about Javascript, it&rsquo;s
to understand the landscape so I can use the tooling in a way that feels good
to me.</p>
<h3 id="questions-i-still-have">questions I still have</h3>
<p>Here are some questions I still have, I&rsquo;ll add the answers into the post if I
learn the answer.</p>
<ul>
<li>Is there a tool that automatically generates importmaps for an ES Module that
I have set up locally? (apparently yes: <a href="https://jspm.org/getting-started">jspm</a>)</li>
<li>How can I convert a CommonJS module into an ES module on my computer, the way
<a href="https://esm.sh">https://esm.sh</a> does? (apparently esbuild can sort of do this, though <a href="https://github.com/evanw/esbuild/issues/442">named exports don&rsquo;t work</a>)</li>
<li>When people normally build CommonJS modules into regular JS code, what code is
doing that? Obviously there are tools like webpack, rollup, esbuild, etc, but
do those tools all implement their own JS parsers/static analysis? How many
JS parsers are there out there?</li>
<li>Is there any way to bundle an ES module into a single file (like
<code>atcute-client.js</code>), but so that in the browser I can still import multiple
different paths from that file (like both <code>@atcute/client/lexicons</code> and
<code>@atcute/client</code>)?</li>
</ul>
<h3 id="all-the-tools">all the tools</h3>
<p>Here&rsquo;s a list of every tool we talked about in this post:</p>
<ul>
<li>Simon Willison&rsquo;s
<a href="https://simonwillison.net/2023/May/2/download-esm/">download-esm</a> which will
download an ES module and convert the imports to point at JS files so you
don&rsquo;t need an importmap</li>
<li><a href="https://esm.sh/">https://esm.sh/</a> and <a href="https://www.skypack.dev/">skypack.dev</a></li>
<li><a href="https://esbuild.github.io/">esbuild</a></li>
<li><a href="https://jspm.org/getting-started">JSPM</a> can generate importmaps</li>
</ul>
<p>Writing this post has made me think that even though I usually don&rsquo;t want to
have a build that I run every time I update the project, I might be willing to
have a build step (using <code>download-esm</code> or something) that I run <strong>only once</strong>
when setting up the project and never run again except maybe if I&rsquo;m updating my
dependency versions.</p>
<h3 id="that-s-all">that&rsquo;s all!</h3>
<p>Thanks to <a href="https://polotek.net/">Marco Rogers</a> who taught me a lot of the things
in this post. I&rsquo;ve probably made some mistakes in this post and I&rsquo;d love to
know what they are &ndash; let me know on Bluesky or Mastodon!</p>
]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[New microblog with TILs]]></title>
    <link href="https://jvns.ca/blog/2024/11/09/new-microblog/"/>
    <updated>2024-11-09T09:24:29+00:00</updated>
    <id>https://jvns.ca/blog/2024/11/09/new-microblog/</id>
    <content type="html"><![CDATA[<p>I added a new section to this site a couple weeks ago called
<a href="https://jvns.ca/til/">TIL</a> (&ldquo;today I learned&rdquo;).</p>
<h3 id="the-goal-save-interesting-tools-facts-i-posted-on-social-media">the goal: save interesting tools &amp; facts I posted on social media</h3>
<p>One kind of thing I like to post on Mastodon/Bluesky is &ldquo;hey, here&rsquo;s a cool
thing&rdquo;, like <a href="https://github.com/dbcli/litecli">the great SQLite repl litecli</a>, or
the fact that cross compiling in Go Just Works and it&rsquo;s amazing, or
<a href="https://www.latacora.com/blog/2018/04/03/cryptographic-right-answers/">cryptographic right answers</a>,
or <a href="https://diffdiff.net/">this great diff tool</a>. Usually I don&rsquo;t want to write
a whole blog post about those things because I really don&rsquo;t have much more to
say than &ldquo;hey this is useful!&rdquo;</p>
<p>It started to bother me that I didn&rsquo;t have anywhere to put those things: for
example recently I wanted to use <a href="https://diffdiff.net/">diffdiff</a> and I just
could not remember what it was called.</p>
<h3 id="the-solution-make-a-new-section-of-this-blog">the solution: make a new section of this blog</h3>
<p>So I quickly made a new folder called <a href="https://jvns.ca/til/">/til/</a>, added some
custom styling (I wanted to style the posts to look a little bit like a tweet),
made a little Rake task to help me create new posts quickly (<code>rake new_til</code>), and
set up a separate RSS Feed for it.</p>
<p>I think this new section of the blog might be more for myself than anything,
now when I forget the link to Cryptographic Right Answers I can hopefully look
it up on the TIL page. (you might think &ldquo;julia, why not use bookmarks??&rdquo; but I
have been failing to use bookmarks for my whole life and I don&rsquo;t see that
changing ever, putting things in public is for whatever reason much easier for
me)</p>
<p>So far it&rsquo;s been working, often I can actually just make a quick post in 2
minutes which was the goal.</p>
<h3 id="inspired-by-simon-willison-s-til-blog">inspired by Simon Willison&rsquo;s TIL blog</h3>
<p>My page is inspired by <a href="https://til.simonwillison.net/">Simon Willison&rsquo;s great TIL blog</a>, though my TIL posts are a lot shorter.</p>
<h3 id="i-don-t-necessarily-want-everything-to-be-archived">I don&rsquo;t necessarily want everything to be archived</h3>
<p>This came about because I spent a lot of time on Twitter, so I&rsquo;ve been thinking
about what I want to do about all of my tweets.</p>
<p>I keep reading the advice to &ldquo;POSSE&rdquo; (&ldquo;post on your own site, syndicate
elsewhere&rdquo;), and while I find the idea appealing in principle, for me part of
the appeal of social media is that it&rsquo;s a little bit ephemeral. I can
post polls or questions or observations or jokes and then they can just kind of
fade away as they become less relevant.</p>
<p>I find it a lot easier to identify specific categories of things that I actually
want to have on a Real Website That I Own:</p>
<ul>
<li>blog posts here!</li>
<li>comics at <a href="https://wizardzines.com/comics/">https://wizardzines.com/comics/</a>!</li>
<li>now TILs at <a href="https://jvns.ca/til/">https://jvns.ca/til/</a></li>
</ul>
<p>and then let everything else be kind of ephemeral.</p>
<p>I really believe in the advice to make email lists though &ndash; the first two
(blog posts &amp; comics) both have email lists and RSS feeds that people can
subscribe to if they want. I might add a quick summary of any TIL posts from
that week to the &ldquo;blog posts from this week&rdquo; mailing list.</p>
]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[ASCII control characters in my terminal]]></title>
    <link href="https://jvns.ca/blog/2024/10/31/ascii-control-characters/"/>
    <updated>2024-10-31T08:00:10+00:00</updated>
    <id>https://jvns.ca/blog/2024/10/31/ascii-control-characters/</id>
    <content type="html"><![CDATA[<p>Hello! I&rsquo;ve been thinking about the terminal a lot and yesterday I got curious
about all these &ldquo;control codes&rdquo;, like <code>Ctrl-A</code>, <code>Ctrl-C</code>, <code>Ctrl-W</code>, etc. What&rsquo;s
the deal with all of them?</p>
<h3 id="a-table-of-ascii-control-characters">a table of ASCII control characters</h3>
<p>Here&rsquo;s a table of all 33 ASCII control characters, and what they do on my
machine (on Mac OS), more or less. There are about a million caveats, but I&rsquo;ll talk about
what it means and all the problems with this diagram that I know about.</p>
<p><a href="https://jvns.ca/ascii.html"><img src="https://jvns.ca/images/ascii-control.png"></a></p>
<p>You can also view it <a href="https://jvns.ca/ascii.html">as an HTML page</a> (I just made it an image so
it would show up in RSS).</p>
<h3 id="different-kinds-of-codes-are-mixed-together">different kinds of codes are mixed together</h3>
<p>The first surprising thing about this diagram to me is that there are 33
control codes, split into (very roughly speaking) these categories:</p>
<ol>
<li>Codes that are handled by the operating system&rsquo;s terminal driver, for
example when the OS sees a <code>3</code> (<code>Ctrl-C</code>), it&rsquo;ll send a <code>SIGINT</code> signal to
the current program</li>
<li>Everything else is passed through to the application as-is and the
application can do whatever it wants with them. Some subcategories of
those:
<ul>
<li>Codes that correspond to a literal keypress of a key on your keyboard
(<code>Enter</code>, <code>Tab</code>, <code>Backspace</code>). For example when you press <code>Enter</code>, your
terminal gets sent <code>13</code>.</li>
<li>Codes used by <code>readline</code>: &ldquo;the application can do whatever it wants&rdquo;
often means &ldquo;it&rsquo;ll do more or less what the <code>readline</code> library does,
whether the application actually uses <code>readline</code> or not&rdquo;, so I&rsquo;ve
labelled a bunch of the codes that <code>readline</code> uses</li>
<li>Other codes, for example I think <code>Ctrl-X</code> has no standard meaning in the
terminal in general but emacs uses it very heavily</li>
</ul>
</li>
</ol>
<p>There&rsquo;s no real structure to which codes are in which categories, they&rsquo;re all
just kind of randomly scattered because this evolved organically.</p>
<p>(If you&rsquo;re curious about readline, I wrote more about readline in <a href="https://jvns.ca/blog/2024/07/08/readline/">entering text in the terminal is complicated</a>, and there are a lot of
<a href="https://github.com/chzyer/readline/blob/master/doc/shortcut.md">cheat sheets out there</a>)</p>
<h3 id="there-are-only-33-control-codes">there are only 33 control codes</h3>
<p>Something else that I find a little surprising is that there are only 33 control codes &ndash;
A to Z, plus 7 more (<code>@, [, \, ], ^, _, ?</code>). This means that if you want to
have for example <code>Ctrl-1</code> as a keyboard shortcut in a terminal application,
that&rsquo;s not really meaningful &ndash; on my machine at least <code>Ctrl-1</code> is exactly the
same thing as just pressing <code>1</code>, <code>Ctrl-3</code> is the same as <code>Ctrl-[</code>, etc.</p>
<p>Also <code>Ctrl+Shift+C</code> isn&rsquo;t a control code &ndash; what it does depends on your
terminal emulator. On Linux <code>Ctrl-Shift-X</code> is often used by the terminal
emulator to copy or open a new tab or paste for example, it&rsquo;s not sent to the
TTY at all.</p>
<p>Also I use <code>Ctrl+Left Arrow</code> all the time, but that isn&rsquo;t a control code,
instead it sends an ANSI escape sequence (<code>ctrl-[[1;5D</code>) which is a different
thing which we absolutely do not have space for in this post.</p>
<p>This &ldquo;there are only 33 codes&rdquo; thing is totally different from how keyboard
shortcuts work in a GUI where you can have <code>Ctrl+KEY</code> for any key you want.</p>
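<p>Here&rsquo;s a small Python sketch of where the number 33 comes from. It uses the usual caret-notation rule (flip bit <code>0x40</code> of the key&rsquo;s ASCII code), which is an assumption about the ASCII mapping only: what a given terminal emulator actually sends for a key combination can differ. The names <code>ctrl_code</code>, <code>KEYS</code> and <code>CODES</code> are made up for this example.</p>

```python
import string


def ctrl_code(key: str) -> int:
    """Byte for Ctrl+key: the key's ASCII code with bit 0x40 flipped."""
    return ord(key.upper()) ^ 0x40


# The 33 keys: @, A to Z, and the six other punctuation keys listed above.
KEYS = "@" + string.ascii_uppercase + "[\\]^_?"
CODES = {key: ctrl_code(key) for key in KEYS}

for key in "CIM[?":
    print(f"Ctrl-{key} sends byte {CODES[key]}")
```

<p>The resulting codes are exactly 0 through 31 plus 127, which is why there&rsquo;s no room left for something like <code>Ctrl-1</code>.</p>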
<h3 id="the-official-ascii-names-aren-t-very-meaningful-to-me">the official ASCII names aren&rsquo;t very meaningful to me</h3>
<p>Each of these 33 control codes has a name in ASCII (for example <code>3</code> is <code>ETX</code>).
When all of these control codes were originally defined, they weren&rsquo;t being
used for computers or terminals at all, they were used for <a href="https://falsedoor.com/doc/ascii_evolution-of-character-codes.pdf">the telegraph machine</a>.
Telegraph machines aren&rsquo;t the same as UNIX terminals so a lot of the codes were repurposed to mean something else.</p>
<p>Personally I don&rsquo;t find these ASCII names very useful, because 50% of the time
the name in ASCII has no actual relationship to what that code does on UNIX
systems today. So it feels easier to just ignore the ASCII names completely
instead of trying to figure out which ones still match their original meaning.</p>
<h3 id="it-s-hard-to-use-ctrl-m-as-a-keyboard-shortcut">It&rsquo;s hard to use Ctrl-M as a keyboard shortcut</h3>
<p>Another thing that&rsquo;s a bit weird is that <code>Ctrl-M</code> is literally the same as
<code>Enter</code>, and <code>Ctrl-I</code> is the same as <code>Tab</code>, which makes it hard to use those two as keyboard shortcuts.</p>
<p>From some quick research, it seems like some folks do still use <code>Ctrl-I</code> and
<code>Ctrl-M</code> as keyboard shortcuts (<a href="https://github.com/tmux/tmux/issues/2705">here&rsquo;s an example</a>), but to do that
you need to configure your terminal emulator to treat them differently than the
default.</p>
<p>For me the main takeaway is that if I ever write a terminal application I
should avoid <code>Ctrl-I</code> and <code>Ctrl-M</code> as keyboard shortcuts in it.</p>
<h3 id="how-to-identify-what-control-codes-get-sent">how to identify what control codes get sent</h3>
<p>While writing this I needed to do a bunch of experimenting to figure out what
various key combinations did, so I wrote this Python script
<a href="https://gist.github.com/jvns/a2ea09dbfbe03cc75b7bfb381941c742">echo-key.py</a>
that will print them out.</p>
<p>There&rsquo;s probably a more official way but I appreciated having a script I could
customize.</p>
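<p>If you want to try the same kind of experiment, here&rsquo;s a minimal sketch of a key-echoing script. This is not the linked gist, just one way to do it under some assumptions: a Unix-like system, Python&rsquo;s standard <code>termios</code> and <code>tty</code> modules, and made-up helper names (<code>describe</code>, <code>main</code>). It puts the terminal in raw mode, prints each byte it receives, and quits when you press <code>q</code>.</p>

```python
import os
import sys
import termios
import tty


def describe(byte: int) -> str:
    """Describe one byte: caret notation for control codes, else the character."""
    if byte == 127 or byte in range(32):
        return f"{byte:3d}  ^{chr(byte ^ 0x40)}"
    if byte in range(32, 127):
        return f"{byte:3d}  {chr(byte)}"
    return f"{byte:3d}  (non-ASCII byte)"


def main() -> None:
    fd = 0  # standard input
    saved = termios.tcgetattr(fd)
    try:
        tty.setraw(fd)  # turn off ICANON, ISIG and ECHO so every key reaches us
        while True:
            data = os.read(fd, 32)  # one keypress can arrive as several bytes
            for byte in data:
                sys.stdout.write(describe(byte) + "\r\n")
            sys.stdout.flush()
            if not data or data == b"q":
                break
    finally:
        termios.tcsetattr(fd, termios.TCSADRAIN, saved)  # restore the terminal


# Only start reading keys when run directly from a real terminal.
if __name__ == "__main__" and os.isatty(0):
    main()
```

<p>Because the terminal is in raw mode while this runs, <code>Ctrl-C</code> shows up as byte 3 instead of killing the script, which is why the sketch needs its own quit key.</p>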
<h3 id="caveat-on-canonical-vs-noncanonical-mode">caveat: on canonical vs noncanonical mode</h3>
<p>Two of these codes (<code>Ctrl-W</code> and <code>Ctrl-U</code>) are labelled in the table as
&ldquo;handled by the OS&rdquo;, but actually they&rsquo;re not <strong>always</strong> handled by the OS, it
depends on whether the terminal is in &ldquo;canonical&rdquo; mode or in &ldquo;noncanonical mode&rdquo;.</p>
<p>In <a href="https://www.man7.org/linux/man-pages/man3/termios.3.html">canonical mode</a>,
programs only get input when you press <code>Enter</code> (and the OS is in charge of deleting characters when you press <code>Backspace</code> or <code>Ctrl-W</code>). But in noncanonical mode the program gets
input immediately when you press a key, and the <code>Ctrl-W</code> and <code>Ctrl-U</code> codes are passed through to the program to handle any way it wants.</p>
<p>Generally in noncanonical mode the program will handle <code>Ctrl-W</code> and <code>Ctrl-U</code>
similarly to how the OS does, but there are some small differences.</p>
<p>Some examples of programs that use canonical mode:</p>
<ul>
<li>probably pretty much any noninteractive program, like <code>grep</code> or <code>cat</code></li>
<li><code>git</code>, I think</li>
</ul>
<p>Examples of programs that use noncanonical mode:</p>
<ul>
<li><code>python3</code>, <code>irb</code> and other REPLs</li>
<li>your shell</li>
<li>any full screen TUI like <code>less</code> or <code>vim</code></li>
</ul>
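<p>One way to see the difference between the two modes without touching your real terminal is to open a scratch pseudoterminal from Python. This is a sketch under assumptions: a Unix-like system where a new pty starts in canonical mode with <code>Ctrl-W</code> mapped to <code>werase</code> (the default on Linux as far as I know), and a made-up helper called <code>readable</code>. The master side plays the role of the keyboard and the slave side is what a program would read from.</p>

```python
import os
import pty
import select
import termios
import tty


def readable(fd: int) -> bool:
    """True if a read on fd would return data right now."""
    ready, _, _ = select.select([fd], [], [], 0.2)
    return bool(ready)


# master plays the keyboard, slave is what a program would read from.
master, slave = pty.openpty()

# Canonical mode: the OS holds the line (and applies Ctrl-W) until Enter.
os.write(master, b"foo bar\x17baz")  # \x17 is Ctrl-W
before_enter = readable(slave)
os.write(master, b"\n")
line = os.read(slave, 100)

# Noncanonical mode: bytes arrive immediately and Ctrl-W is passed through.
tty.setcbreak(slave, termios.TCSANOW)
os.write(master, b"\x17")
passed_through = os.read(slave, 100)

print("readable before Enter:", before_enter)
print("line the program sees:", line)
print("noncanonical read:", passed_through)
```

<p>In canonical mode the program should only ever see the already-edited line (with <code>bar</code> erased), while in noncanonical mode it gets the raw <code>Ctrl-W</code> byte and has to decide what to do with it.</p>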
<h3 id="caveat-all-of-the-os-terminal-driver-codes-are-configurable-with-stty">caveat: all of the &ldquo;OS terminal driver&rdquo; codes are configurable with <code>stty</code></h3>
<p>I said that <code>Ctrl-C</code> sends <code>SIGINT</code> but technically this is not necessarily
true, if you really want to you can remap all of the codes labelled &ldquo;OS
terminal driver&rdquo;, plus Backspace, using a tool called <code>stty</code>, and you can view
the mappings with <code>stty -a</code>.</p>
<p>Here are the mappings on my machine right now:</p>
<pre><code>$ stty -a
cchars: discard = ^O; dsusp = ^Y; eof = ^D; eol = &lt;undef&gt;;
	eol2 = &lt;undef&gt;; erase = ^?; intr = ^C; kill = ^U; lnext = ^V;
	min = 1; quit = ^\; reprint = ^R; start = ^Q; status = ^T;
	stop = ^S; susp = ^Z; time = 0; werase = ^W;
</code></pre>
<p>I have personally never remapped any of these and I cannot imagine a reason I
would (I think it would be a recipe for confusion and disaster for me), but I
asked on Mastodon and people said the most common reasons they used
<code>stty</code> were:</p>
<ul>
<li>fix a broken terminal with <code>stty sane</code></li>
<li>set <code>stty erase ^H</code> to change how Backspace works</li>
<li>set <code>stty ixoff</code></li>
<li>some people even map <code>SIGINT</code> to a different key, like their <code>DELETE</code> key</li>
</ul>
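<p>The same mappings that <code>stty -a</code> prints are also available programmatically. Here&rsquo;s a hedged Python sketch using the standard <code>termios</code> module on a scratch pseudoterminal (so your real terminal is untouched). The helper names <code>control_chars</code> and <code>remap</code> are made up for this example, and it assumes a Unix-like system.</p>

```python
import pty
import termios


def control_chars(fd: int) -> dict[str, bytes]:
    """Map a few termios special-character names to the byte assigned on fd."""
    cc = termios.tcgetattr(fd)[6]  # index 6 is the c_cc array
    names = ("VINTR", "VQUIT", "VERASE", "VKILL", "VEOF", "VSUSP", "VWERASE")
    return {name: cc[getattr(termios, name)] for name in names}


def remap(fd: int, name: str, byte: bytes) -> None:
    """Assign a new byte to one special character, like `stty intr ^G`."""
    attrs = termios.tcgetattr(fd)
    attrs[6][getattr(termios, name)] = byte
    termios.tcsetattr(fd, termios.TCSANOW, attrs)


_master, slave = pty.openpty()  # a scratch terminal for experimenting
print(control_chars(slave))
remap(slave, "VINTR", b"\x07")  # Ctrl-G now plays the role Ctrl-C usually has
print(control_chars(slave)["VINTR"])
```

<p>Here <code>VINTR</code> corresponds to <code>intr</code> in the <code>stty -a</code> output, <code>VERASE</code> to <code>erase</code>, and so on.</p>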
<h3 id="caveat-on-signals">caveat: on signals</h3>
<p>Two signals caveats:</p>
<ol>
<li>If the <code>ISIG</code> terminal mode is turned off, then the OS won&rsquo;t send signals. For example <code>vim</code> turns off <code>ISIG</code></li>
<li>Apparently on BSDs, there&rsquo;s an extra control code (<code>Ctrl-T</code>) which sends <code>SIGINFO</code></li>
</ol>
<p>You can see which terminal modes a program is setting using <code>strace</code> like this,
terminal modes are set with the <code>ioctl</code> system call:</p>
<pre><code>$ strace -tt -o out  vim
$ grep ioctl out | grep SET
</code></pre>
<p>here are the modes <code>vim</code> sets when it starts (<code>ISIG</code> and <code>ICANON</code> are
missing!):</p>
<pre><code>17:43:36.670636 ioctl(0, TCSETS, {c_iflag=IXANY|IMAXBEL|IUTF8,
c_oflag=NL0|CR0|TAB0|BS0|VT0|FF0|OPOST, c_cflag=B38400|CS8|CREAD,
c_lflag=ECHOK|ECHOCTL|ECHOKE|PENDIN, ...}) = 0
</code></pre>
<p>and it resets the modes when it exits:</p>
<pre><code>17:43:38.027284 ioctl(0, TCSETS, {c_iflag=ICRNL|IXANY|IMAXBEL|IUTF8,
c_oflag=NL0|CR0|TAB0|BS0|VT0|FF0|OPOST|ONLCR, c_cflag=B38400|CS8|CREAD,
c_lflag=ISIG|ICANON|ECHO|ECHOE|ECHOK|IEXTEN|ECHOCTL|ECHOKE|PENDIN, ...}) = 0
</code></pre>
<p>I think the specific combination of modes vim is using here might be called
&ldquo;raw mode&rdquo;, <a href="https://linux.die.net/man/3/cfmakeraw">man cfmakeraw</a> talks about
that.</p>
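<p>If you don&rsquo;t have <code>strace</code> handy, you can look at the same flags from Python. This sketch assumes a Unix-like system and uses <code>tty.setraw</code> from the standard library, which does roughly what <code>cfmakeraw</code> does (it is not necessarily the exact set of modes vim picks). <code>LFLAGS</code> and <code>local_modes</code> are names made up for the example.</p>

```python
import pty
import termios
import tty

LFLAGS = ("ISIG", "ICANON", "ECHO", "IEXTEN")


def local_modes(fd: int) -> set[str]:
    """Names of the c_lflag bits from LFLAGS that are currently set on fd."""
    lflag = termios.tcgetattr(fd)[3]  # index 3 is c_lflag
    return {name for name in LFLAGS if lflag & getattr(termios, name)}


_master, slave = pty.openpty()  # a scratch terminal, not your real one
before = local_modes(slave)
tty.setraw(slave, termios.TCSANOW)
after = local_modes(slave)

print("before setraw:", sorted(before))
print("after setraw: ", sorted(after))
```

<p>After <code>setraw</code>, <code>ISIG</code> and <code>ICANON</code> should both be gone, which matches what&rsquo;s missing from the first <code>strace</code> line above.</p>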
<h3 id="there-are-a-lot-of-conflicts">there are a lot of conflicts</h3>
<p>Related to &ldquo;there are only 33 codes&rdquo;, there are a lot of conflicts where
different parts of the system want to use the same code for different things,
for example by default <code>Ctrl-S</code> will freeze your screen, but if you turn that
off then <code>readline</code> will use <code>Ctrl-S</code> to do a forward search.</p>
<p>Another example is that on my machine sometimes <code>Ctrl-T</code> will send <code>SIGINFO</code>
and sometimes it&rsquo;ll transpose 2 characters and sometimes it&rsquo;ll do something
completely different depending on:</p>
<ul>
<li>whether the program has <code>ISIG</code> set</li>
<li>whether the program uses <code>readline</code> / imitates readline&rsquo;s behaviour</li>
</ul>
<h3 id="caveat-on-backspace-and-other-backspace">caveat: on &ldquo;backspace&rdquo; and &ldquo;other backspace&rdquo;</h3>
<p>In this diagram I&rsquo;ve labelled code 127 as &ldquo;backspace&rdquo; and 8 as &ldquo;other
backspace&rdquo;. Uh, what?</p>
<p>I think this was the single biggest topic of discussion in the replies on Mastodon &ndash; apparently there&rsquo;s a LOT of history to this and I&rsquo;d never heard of any of it before.</p>
<p>First, here&rsquo;s how it works on my machine:</p>
<ol>
<li>I press the <code>Backspace</code> key</li>
<li>The TTY gets sent the byte <code>127</code>, which is called <code>DEL</code> in ASCII</li>
<li>the OS terminal driver and readline both have <code>127</code> mapped to &ldquo;backspace&rdquo; (so it works both in canonical mode and noncanonical mode)</li>
<li>The previous character gets deleted</li>
</ol>
<p>If I press <code>Ctrl+H</code>, it has the same effect as <code>Backspace</code> if I&rsquo;m using
readline, but in a program without readline support (like <code>cat</code> for instance),
it just prints out <code>^H</code>.</p>
<p>Apparently Step 2 above is different for some folks &ndash; their <code>Backspace</code> key sends
the byte <code>8</code> instead of <code>127</code>, and so if they want Backspace to work then they
need to configure the OS (using <code>stty</code>) to set <code>erase = ^H</code>.</p>
<p>There&rsquo;s an incredible <a href="https://www.debian.org/doc/debian-policy/ch-opersys.html#keyboard-configuration">section of the Debian Policy Manual on keyboard configuration</a>
that describes how <code>Delete</code> and <code>Backspace</code> should work according to Debian
policy, which seems very similar to how it works on my Mac today.  My
understanding (via <a href="https://tech.lgbt/@Diziet/113396035847619715">this mastodon post</a>)
is that this policy was written in the 90s because there was a lot of confusion
about what <code>Backspace</code> should do and there needed to be a standard
to get everything to work.</p>
<p>There&rsquo;s a bunch more historical terminal stuff here but that&rsquo;s all I&rsquo;ll say for
now.</p>
<h3 id="there-s-probably-a-lot-more-diversity-in-how-this-works">there&rsquo;s probably a lot more diversity in how this works</h3>
<p>I&rsquo;ve probably missed a bunch more ways that &ldquo;how it works on my machine&rdquo; might
be different from how it works on other people&rsquo;s machines, and I&rsquo;ve probably
made some mistakes about how it works on my machine too. But that&rsquo;s all I&rsquo;ve
got for today.</p>
<p>Some more stuff I know that I&rsquo;ve left out: according to <code>stty -a</code> <code>Ctrl-O</code> is
&ldquo;discard&rdquo;, <code>Ctrl-R</code> is &ldquo;reprint&rdquo;, and <code>Ctrl-Y</code> is &ldquo;dsusp&rdquo;. I have no idea how
to make those actually do anything (pressing them does not do anything
obvious, and some people have told me what they used to do historically but
it&rsquo;s not clear to me if they have a use in 2024), and a lot of the time in practice
they seem to just be passed through to the application anyway so I just
labelled <code>Ctrl-R</code> and <code>Ctrl-Y</code> as
<code>readline</code>.</p>
<h3 id="not-all-of-this-is-that-useful-to-know">not all of this is that useful to know</h3>
<p>Also I want to say that I think the contents of this post are kind of interesting
but I don&rsquo;t think they&rsquo;re necessarily that <em>useful</em>. I&rsquo;ve used the terminal
pretty successfully every day for the last 20 years without knowing literally
any of this &ndash; I just knew what <code>Ctrl-C</code>, <code>Ctrl-D</code>, <code>Ctrl-Z</code>, <code>Ctrl-R</code>,
<code>Ctrl-L</code> did in practice (plus maybe <code>Ctrl-A</code>, <code>Ctrl-E</code> and <code>Ctrl-W</code>) and did
not worry about the details for the most part, and that was
almost always totally fine except when I was <a href="https://jvns.ca/blog/2022/07/20/pseudoterminals/">trying to use xterm.js</a>.</p>
<p>But I had fun learning about it so maybe it&rsquo;ll be interesting to you too.</p>
]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[Using less memory to look up IP addresses in Mess With DNS]]></title>
    <link href="https://jvns.ca/blog/2024/10/27/asn-ip-address-memory/"/>
    <updated>2024-10-27T07:47:04+00:00</updated>
    <id>https://jvns.ca/blog/2024/10/27/asn-ip-address-memory/</id>
    <content type="html"><![CDATA[<p>I&rsquo;ve been having problems for the last 3 years or so where <a href="https://messwithdns.net/">Mess With DNS</a>
periodically runs out of memory and gets OOM killed.</p>
<p>This hasn&rsquo;t been a big priority for me: usually it just goes down for a few
minutes while it restarts, and it only happens once a day at most, so I&rsquo;ve just
been ignoring it. But last week it started actually causing a problem so I decided
to look into it.</p>
<p>This was kind of a winding road where I learned a lot so here&rsquo;s a table of contents:</p>
<ul>
<li><a href="#there-s-about-100mb-of-memory-available">there&rsquo;s about 100MB of memory available</a></li>
<li><a href="#the-problem-oom-killing-the-backup-script">the problem: OOM killing the backup script</a></li>
<li><a href="#attempt-1-use-sqlite">attempt 1: use SQLite</a>
<ul>
<li><a href="#problem-how-to-store-ipv6-addresses">problem: how to store IPv6 addresses</a></li>
<li><a href="#problem-it-s-500x-slower">problem: it&rsquo;s 500x slower</a></li>
<li><a href="#time-for-explain-query-plan">time for EXPLAIN QUERY PLAN</a></li>
</ul>
</li>
<li><a href="#attempt-2-use-a-trie">attempt 2: use a trie</a>
<ul>
<li><a href="#some-notes-on-memory-profiling">some notes on memory profiling</a></li>
</ul>
</li>
<li><a href="#attempt-3-make-my-array-use-less-memory">attempt 3: make my array use less memory</a>
<ul>
<li><a href="#idea-3-1-deduplicate-the-name-and-country">idea 3.1: deduplicate the Name and Country</a></li>
<li><a href="#how-big-are-asns">how big are ASNs?</a></li>
<li><a href="#idea-3-2-use-netip-addr-instead-of-net-ip">idea 3.2: use netip.Addr instead of net.IP</a></li>
<li><a href="#the-result-saved-70mb-of-memory">the result: saved 70MB of memory!</a></li>
</ul>
</li>
</ul>
<h3 id="there-s-about-100mb-of-memory-available">there&rsquo;s about 100MB of memory available</h3>
<p>I run Mess With DNS on a VM with about 465MB of RAM, which according to
<code>ps aux</code> (the <code>RSS</code> column) is split up something like:</p>
<ul>
<li>100MB for PowerDNS</li>
<li>200MB for Mess With DNS</li>
<li>40MB for <a href="https://fly.io/blog/ssh-and-user-mode-ip-wireguard/">hallpass</a></li>
</ul>
<p>That leaves about 110MB of memory free.</p>
<p>A while back I set <a href="https://tip.golang.org/doc/gc-guide">GOMEMLIMIT</a> to 250MB
to try to make sure the garbage collector ran if Mess With DNS used more than
250MB of memory, and I think this helped but it didn&rsquo;t solve everything.</p>
<h3 id="the-problem-oom-killing-the-backup-script">the problem: OOM killing the backup script</h3>
<p>A few weeks ago I started backing up Mess With DNS&rsquo;s database for the first time <a href="https://jvns.ca/til/restic-for-backing-up-sqlite-dbs/">using restic</a>.</p>
<p>This has been working okay, but since Mess With DNS operates without much extra
memory I think <code>restic</code> sometimes needed more memory than was available on the
system, and so the backup script sometimes got OOM killed.</p>
<p>This was a problem because</p>
<ol>
<li>backups might be corrupted sometimes</li>
<li>more importantly, restic takes out a lock when it runs, and so I&rsquo;d have to manually do an
unlock if I wanted the backups to continue working. Doing manual work like
this is the #1 thing I try to avoid with all my web services (who has time
for that!) so I really wanted to do something about it.</li>
</ol>
<p>There&rsquo;s probably more than one solution to this, but I decided to try to make
Mess With DNS use less memory so that there was more available memory on the
system, mostly because it seemed like a fun problem to try to solve.</p>
<h3 id="what-s-using-memory-ip-addresses">what&rsquo;s using memory: IP addresses</h3>
<p>I&rsquo;d run a memory profile of Mess With DNS a bunch of times in the past, so I
knew exactly what was using most of Mess With DNS&rsquo;s memory: IP addresses.</p>
<p>When it starts, Mess With DNS loads this <a href="https://iptoasn.com/">database where you can look up the
ASN of every IP address</a> into memory, so that when it
receives a DNS query it can take the source IP address like <code>74.125.16.248</code> and
tell you that IP address belongs to <code>GOOGLE</code>.</p>
<p>This database by itself used about 117MB of memory, and a simple <code>du</code> told me
that was too much &ndash; the original text files were only 37MB!</p>
<pre><code>$ du -sh *.tsv
26M	ip2asn-v4.tsv
11M	ip2asn-v6.tsv
</code></pre>
<p>The way it worked originally is that I had an array of these:</p>
<pre><code>type IPRange struct {
	StartIP net.IP
	EndIP   net.IP
	Num     int
	Name    string
	Country string
}
</code></pre>
<p>and I searched through it with a binary search to figure out if any of the
ranges contained the IP I was looking for. Basically the simplest possible
thing and it&rsquo;s super fast, my machine can do about 9 million lookups per
second.</p>
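<p>To make the &ldquo;array plus binary search&rdquo; idea concrete, here&rsquo;s a small sketch of the same approach in Python. This is not the Go code Mess With DNS runs: the <code>find_asn</code> and <code>to_int</code> helpers are made up, the two rows are invented (they use documentation-only IP ranges and AS numbers), and it assumes the ranges are sorted by start address and don&rsquo;t overlap.</p>

```python
import bisect
import ipaddress
from typing import NamedTuple, Optional


class IPRange(NamedTuple):
    start: int    # first address in the range, as an integer
    end: int      # last address in the range, inclusive
    num: int      # AS number
    name: str
    country: str


def to_int(ip: str) -> int:
    return int(ipaddress.ip_address(ip))


def find_asn(ranges: list[IPRange], starts: list[int], ip: str) -> Optional[IPRange]:
    """Binary search sorted, non-overlapping ranges for the one containing ip."""
    addr = to_int(ip)
    i = bisect.bisect_right(starts, addr) - 1  # last range starting at or before addr
    if i >= 0 and ranges[i].end >= addr:
        return ranges[i]
    return None


# Two made-up rows for illustration; real data would come from the TSV files.
ranges = [
    IPRange(to_int("192.0.2.0"), to_int("192.0.2.255"), 64500, "EXAMPLE-A", "US"),
    IPRange(to_int("198.51.100.0"), to_int("198.51.100.255"), 64501, "EXAMPLE-B", "CA"),
]
starts = [r.start for r in ranges]

print(find_asn(ranges, starts, "192.0.2.17"))
print(find_asn(ranges, starts, "192.0.3.1"))
```

<p>The search only needs the sorted start addresses. The end address is checked once at the end to handle IPs that fall in a gap between two ranges.</p>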
<h3 id="attempt-1-use-sqlite">attempt 1: use SQLite</h3>
<p>I&rsquo;ve been using SQLite recently, so my first thought was &ndash; maybe I can store
all of this data on disk in an SQLite database, give the tables an index, and
that&rsquo;ll use less memory.</p>
<p>So I:</p>
<ul>
<li>wrote a quick Python script using <a href="https://sqlite-utils.datasette.io/en/stable/">sqlite-utils</a> to import the TSV files into an SQLite database</li>
<li>adjusted my code to select from the database instead</li>
</ul>
<p>This did solve the initial memory goal (after a GC it now hardly used any
memory at all because the table was on disk!), though I&rsquo;m not sure how much GC
churn this solution would cause if we needed to do a lot of queries at once. I
did a quick memory profile and it seemed to allocate about 1KB of memory per
lookup.</p>
<p>Let&rsquo;s talk about the issues I ran into with using SQLite though.</p>
<h3 id="problem-how-to-store-ipv6-addresses">problem: how to store IPv6 addresses</h3>
<p>SQLite doesn&rsquo;t have support for big integers and IPv6 addresses are 128 bits,
so I decided to store them as text. I think <code>BLOB</code> might have been better, I
originally thought <code>BLOB</code>s couldn&rsquo;t be compared but the <a href="https://www.sqlite.org/datatype3.html#sort_order">sqlite docs</a> say they can.</p>
<p>I ended up with this schema:</p>
<pre><code>CREATE TABLE ipv4_ranges (
   start_ip INTEGER NOT NULL,
   end_ip INTEGER NOT NULL,
   asn INTEGER NOT NULL,
   country TEXT NOT NULL,
   name TEXT NOT NULL
);
CREATE TABLE ipv6_ranges (
   start_ip TEXT NOT NULL,
   end_ip TEXT NOT NULL,
   asn INTEGER,
   country TEXT,
   name TEXT
);
CREATE INDEX idx_ipv4_ranges_start_ip ON ipv4_ranges (start_ip);
CREATE INDEX idx_ipv6_ranges_start_ip ON ipv6_ranges (start_ip);
CREATE INDEX idx_ipv4_ranges_end_ip ON ipv4_ranges (end_ip);
CREATE INDEX idx_ipv6_ranges_end_ip ON ipv6_ranges (end_ip);
</code></pre>
<p>Also I learned that Python has an <code>ipaddress</code> module, so I could use
<code>ipaddress.ip_address(s).exploded</code> to make sure that the IPv6 addresses were
expanded so that a string comparison would compare them properly.</p>
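<p>Here&rsquo;s a quick illustration of why the expansion matters. The two <code>2001:db8::</code> addresses are documentation examples I picked, not rows from the real database.</p>

```python
import ipaddress

# The exploded form always has 8 groups of 4 hex digits.
full = ipaddress.ip_address("2607:f8b0:4006:824::200e").exploded
print(full)

# ::9 is numerically smaller than ::10 (0x10 is 16), but as plain
# strings the character "1" sorts before "9".
low, high = "2001:db8::9", "2001:db8::10"
compressed_order = sorted([low, high])
exploded_order = sorted(ipaddress.ip_address(ip).exploded for ip in [low, high])

print(compressed_order)
print(exploded_order)
```

<p>With the compressed strings the sort order is wrong (<code>::10</code> comes first), but once every group is padded to 4 digits the string order and the numeric order agree.</p>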
<h3 id="problem-it-s-500x-slower">problem: it&rsquo;s 500x slower</h3>
<p>I ran a quick microbenchmark, something like this. It printed out that it could
look up 17,000 IPv6 addresses per second, and similarly for IPv4 addresses.</p>
<p>This was pretty discouraging &ndash; being able to look up 17k addresses per second
is kind of fine (Mess With DNS does not get a lot of traffic), but I compared it to
the original binary search code and the original code could do 9 million per second.</p>
<pre><code>	ips := []net.IP{}
	count := 20000
	for i := 0; i &lt; count; i++ {
		// create a random IPv6 address
		bytes := randomBytes()
		ip := net.IP(bytes[:])
		ips = append(ips, ip)
	}
	now := time.Now()
	success := 0
	for _, ip := range ips {
		_, err := ranges.FindASN(ip)
		if err == nil {
			success++
		}
	}
	fmt.Println(success)
	elapsed := time.Since(now)
	fmt.Println(&quot;number per second&quot;, float64(count)/elapsed.Seconds())
</code></pre>
<h3 id="time-for-explain-query-plan">time for EXPLAIN QUERY PLAN</h3>
<p>I&rsquo;d never really done an EXPLAIN in sqlite, so I thought it would be a fun
opportunity to see what the query plan was doing.</p>
<pre><code>sqlite&gt; explain query plan select * from ipv6_ranges where '2607:f8b0:4006:0824:0000:0000:0000:200e' BETWEEN start_ip and end_ip;
QUERY PLAN
`--SEARCH ipv6_ranges USING INDEX idx_ipv6_ranges_end_ip (end_ip&gt;?)
</code></pre>
<p>It looks like it&rsquo;s just using the <code>end_ip</code> index and not the <code>start_ip</code> index,
so maybe it makes sense that it&rsquo;s slower than the binary search.</p>
<p>I tried to figure out if there was a way to make SQLite use both indexes, but I
couldn&rsquo;t find one and maybe it knows best anyway.</p>
<p>At this point I gave up on the SQLite solution, I didn&rsquo;t love that it was
slower and also it&rsquo;s a lot more complex than just doing a binary search. I felt
like I&rsquo;d rather keep something much more similar to the binary search.</p>
<p>A few things I tried to get SQLite to use both indexes:</p>
<ul>
<li>using a compound index instead of two separate indexes</li>
<li>running <code>ANALYZE</code></li>
<li>using <code>INTERSECT</code> to intersect the results of <code>start_ip &lt; ?</code> and <code>? &lt; end_ip</code>. This did make it use both indexes, but it also seemed to make the
query literally 1000x slower, probably because it needed to create the
results of both subqueries in memory and intersect them.</li>
</ul>
<h3 id="attempt-2-use-a-trie">attempt 2: use a trie</h3>
<p>My next idea was to use a
<a href="https://medium.com/basecs/trying-to-understand-tries-3ec6bede0014">trie</a>,
because I had some vague idea that maybe a trie would use less memory, and
I found this library called
<a href="https://github.com/seancfoley/ipaddress-go">ipaddress-go</a> that lets you look up IP addresses using a trie.</p>
<p>I tried using it (<a href="https://gist.github.com/jvns/3ce617796b22127017590ac62c57fddd">here&rsquo;s the code</a>), but I
think I was doing something wildly wrong because, compared to my naive array + binary search:</p>
<ul>
<li>it used WAY more memory (800MB to store just the IPv4 addresses)</li>
<li>it was a lot slower to do the lookups (it could do only 100K/second instead of 9 million/second)</li>
</ul>
<p>I&rsquo;m not really sure what went wrong here but I gave up on this approach and
decided to just try to make my array use less memory and stick to a simple
binary search.</p>
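<p>For reference, here&rsquo;s a minimal sketch of the &ldquo;sorted array + binary search&rdquo; idea. It&rsquo;s in Python rather than Go to keep it short, the three ranges are sample data chosen for illustration, and <code>find_asn</code> is a hypothetical name (this is not the real Mess With DNS code):</p>
<pre><code class="language-python">import bisect
import ipaddress

# sample data (an assumption for this sketch): (start, end, ASN), sorted by start
RANGES = [
    ('1.0.0.0', '1.0.0.255', 13335),
    ('8.8.8.0', '8.8.8.255', 15169),
    ('59.101.179.0', '59.101.179.255', 4294901931),
]

# store the addresses as plain integers so they can be compared directly
STARTS = [int(ipaddress.ip_address(start)) for start, _, _ in RANGES]
ENDS = [int(ipaddress.ip_address(end)) for _, end, _ in RANGES]
ASNS = [asn for _, _, asn in RANGES]

def find_asn(ip):
    n = int(ipaddress.ip_address(ip))
    # binary search for the last range that starts at or before n
    i = bisect.bisect_right(STARTS, n) - 1
    if i == -1:
        return None  # n is before the first range
    # the candidate only matches if n is also at or before its end
    if n in range(STARTS[i], ENDS[i] + 1):
        return ASNS[i]
    return None

print(find_asn('8.8.8.8'))
print(find_asn('9.9.9.9'))
</code></pre>
<p>The lookup only ever searches the start addresses and then checks one end address, which is part of why this approach is so simple.</p>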
<h3 id="some-notes-on-memory-profiling">some notes on memory profiling</h3>
<p>One thing I learned about memory profiling is that you can use the <code>runtime</code>
package to see how much memory is currently allocated in the program. That&rsquo;s
how I got all the memory numbers in this post. Here&rsquo;s the code:</p>
<pre><code>func memusage() {
	runtime.GC()
	var m runtime.MemStats
	runtime.ReadMemStats(&amp;m)
	fmt.Printf(&quot;Alloc = %v MiB\n&quot;, m.Alloc/1024/1024)
	// write the heap profile to mem.prof
	f, err := os.Create(&quot;mem.prof&quot;)
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()
	if err := pprof.WriteHeapProfile(f); err != nil {
		log.Fatal(err)
	}
}
</code></pre>
<p>Also I learned that if you use <code>pprof</code> to analyze a heap profile there are two
ways to analyze it: you can pass either <code>--alloc_space</code> or <code>--inuse_space</code> to
<code>go tool pprof</code>. I don&rsquo;t know how I didn&rsquo;t realize this before but
<code>alloc_space</code> will tell you about everything that was allocated, and
<code>inuse_space</code> will just include memory that&rsquo;s currently in use.</p>
<p>Anyway I ran <code>go tool pprof -pdf --inuse_space mem.prof &gt; mem.pdf</code> a lot. Also
every time I use pprof I find myself referring to <a href="https://jvns.ca/blog/2017/09/24/profiling-go-with-pprof/">my own intro to pprof</a>, it&rsquo;s probably
the blog post I wrote that I use the most often. I should add <code>--alloc_space</code>
and <code>--inuse_space</code> to it.</p>
<h3 id="attempt-3-make-my-array-use-less-memory">attempt 3: make my array use less memory</h3>
<p>I was storing my ip2asn entries like this:</p>
<pre><code>type IPRange struct {
	StartIP net.IP
	EndIP   net.IP
	Num     int
	Name    string
	Country string
}
</code></pre>
<p>I had 3 ideas for ways to improve this:</p>
<ol>
<li>There was a lot of repetition of <code>Name</code> and the <code>Country</code>, because a lot of IP ranges belong to the same ASN</li>
<li><code>net.IP</code> is an <code>[]byte</code> under the hood, which felt like it involved an unnecessary pointer, was there a way to inline it into the struct?</li>
<li>Maybe I didn&rsquo;t need both the start IP and the end IP, often the ranges were consecutive so maybe I could rearrange things so that I only had the start IP</li>
</ol>
<h3 id="idea-3-1-deduplicate-the-name-and-country">idea 3.1: deduplicate the Name and Country</h3>
<p>I figured I could store the ASN info in an array, and then just store the index
into the array in my <code>IPRange</code> struct. Here are the structs so you can see what
I mean:</p>
<pre><code>type IPRange struct {
	StartIP netip.Addr
	EndIP   netip.Addr
	ASN     uint32
	Idx     uint32
}

type ASNInfo struct {
	Country string
	Name    string
}

type ASNPool struct {
	asns   []ASNInfo
	lookup map[ASNInfo]uint32
}
</code></pre>
<p>This worked! It brought memory usage from 117MB to 65MB &ndash; a 50MB savings. I felt good about this.</p>
<p><a href="https://github.com/jvns/mess-with-dns/blob/94f77b4bb1597b5e2a6768e33bd6c285919aa1bf/api/streamer/ip2asn/ip2asn.go#L18-L54">Here&rsquo;s all of the code for that part</a>.</p>
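<p>To make the deduplication idea concrete, here&rsquo;s a rough Python sketch of the same pattern as the structs above: each distinct (country, name) pair is stored once, and everything else just holds a small integer index. The <code>add</code> and <code>get</code> methods and the sample values are assumptions for illustration; the linked Go code is the real implementation.</p>
<pre><code class="language-python">class ASNPool:
    def __init__(self):
        self.asns = []    # index -&#45; not used here; see lookup below
        self.lookup = {}  # (country, name) pair to its index in self.asns

    def add(self, country, name):
        # return the index for this pair, storing it only the first time
        key = (country, name)
        if key not in self.lookup:
            self.lookup[key] = len(self.asns)
            self.asns.append(key)
        return self.lookup[key]

    def get(self, idx):
        return self.asns[idx]

pool = ASNPool()
a = pool.add('US', 'GOOGLE')
b = pool.add('US', 'CLOUDFLARENET')
c = pool.add('US', 'GOOGLE')  # duplicate: reuses the first index
print(a, b, c, len(pool.asns))
</code></pre>
<p>Every IP range that belongs to the same ASN then shares one entry in the pool instead of carrying its own copy of the strings.</p>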
<h3 id="how-big-are-asns">how big are ASNs?</h3>
<p>As an aside &ndash; I&rsquo;m storing the ASN in a <code>uint32</code>, is that right? I looked in the ip2asn
file and the biggest one seems to be 401307, though there are a few lines that
say <code>4294901931</code>, which is much bigger but still just inside the range of a
<code>uint32</code> (the maximum is 4294967295). So I can definitely use a <code>uint32</code>.</p>
<pre><code>59.101.179.0	59.101.179.255	4294901931	Unknown	AS4294901931
</code></pre>
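<p>A quick way to double check that arithmetic (a small Python sketch, nothing to do with the Go code itself): compare the biggest value against the largest 32-bit unsigned integer, and try packing it into 4 bytes.</p>
<pre><code class="language-python">import struct

UINT32_MAX = 2**32 - 1
biggest = 4294901931

# how much room is left before the value would overflow a uint32
print(UINT32_MAX - biggest)

# struct.pack raises struct.error if the value does not fit in 4 unsigned bytes
packed = struct.pack('!I', biggest)
print(len(packed))
</code></pre>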
<h3 id="idea-3-2-use-netip-addr-instead-of-net-ip">idea 3.2: use <code>netip.Addr</code> instead of <code>net.IP</code></h3>
<p>It turns out that I&rsquo;m not the only one who felt that <code>net.IP</code> was using an
unnecessary amount of memory &ndash; in 2021 the folks at Tailscale released a new
IP address library for Go which solves this and many other issues. <a href="https://tailscale.com/blog/netaddr-new-ip-type-for-go">They wrote a great blog post about it</a>.</p>
<p>I discovered (to my delight) that not only does this new IP address library exist and do exactly what I want, it&rsquo;s also now in the Go
standard library as <a href="https://pkg.go.dev/net/netip#Addr">netip.Addr</a>. Switching to <code>netip.Addr</code> was
very easy and saved another 20MB of memory, bringing us to 46MB.</p>
<p>I didn&rsquo;t try my third idea (remove the end IP from the struct) because I&rsquo;d
already been programming for long enough on a Saturday morning and I was happy
with my progress.</p>
<p>It&rsquo;s always such a great feeling when I think &ldquo;hey, I don&rsquo;t like this, there
must be a better way&rdquo; and then immediately discover that someone has already
made the exact thing I want, thought about it a lot more than me, and
implemented it much better than I would have.</p>
<h3 id="all-of-this-was-messier-in-real-life">all of this was messier in real life</h3>
<p>Even though I tried to explain this in a simple linear way &ldquo;I tried X, then I
tried Y, then I tried Z&rdquo;, that&rsquo;s kind of a lie &ndash; I always try to take my
actual debugging process (total chaos) and make it seem more linear and
understandable because the reality is just too annoying to write down. It&rsquo;s
more like:</p>
<ul>
<li>try sqlite</li>
<li>try a trie</li>
<li>second guess everything that I concluded about sqlite, go back and look at
the results again</li>
<li>wait what about indexes</li>
<li>very very belatedly realize that I can use <code>runtime</code> to check how much
memory everything is using, start doing that</li>
<li>look at the trie again, maybe I misunderstood everything</li>
<li>give up and go back to binary search</li>
<li>look at all of the numbers for tries/sqlite again to make sure I didn&rsquo;t misunderstand</li>
</ul>
<h3 id="a-note-on-using-512mb-of-memory">a note on using 512MB of memory</h3>
<p>Someone asked why I don&rsquo;t just give the VM more memory. I could very easily
afford to pay for a VM with 1GB of memory, but I feel like 512MB really
<em>should</em> be enough (and really that 256MB should be enough!) so I&rsquo;d rather stay
inside that constraint. It&rsquo;s kind of a fun puzzle.</p>
<h3 id="a-few-ideas-from-the-replies">a few ideas from the replies</h3>
<p>Folks had a lot of good ideas I hadn&rsquo;t thought of. Recording them as
inspiration if I feel like having another Fun Performance Day at some point.</p>
<ul>
<li>Try Go&rsquo;s <a href="https://pkg.go.dev/unique">unique</a> package for the <code>ASNPool</code>. Someone tried this and it uses more memory, probably because Go&rsquo;s pointers are 64 bits</li>
<li>Try compiling with <code>GOARCH=386</code> to use 32-bit pointers to save space (maybe in combination with using <code>unique</code>!)</li>
<li>It should be possible to store all of the IPv6 addresses in just 64 bits, because only the first 64 bits of the address are public</li>
<li><a href="https://en.m.wikipedia.org/wiki/Interpolation_search">Interpolation search</a> might be faster than binary search since IP addresses are numeric</li>
<li>Try the MaxMind db format with <a href="https://github.com/maxmind/mmdbwriter">mmdbwriter</a> or <a href="https://github.com/ipinfo/mmdbctl">mmdbctl</a></li>
<li>Tailscale&rsquo;s <a href="https://github.com/tailscale/art">art</a> routing table package</li>
</ul>
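<p>As a sketch of the &ldquo;store IPv6 addresses in 64 bits&rdquo; idea from the list above, here&rsquo;s what keeping only the top half of an address could look like in Python. This assumes every range boundary in the data really does fall within the first 64 bits, which I haven&rsquo;t verified, and <code>prefix64</code> is a hypothetical helper name:</p>
<pre><code class="language-python">import ipaddress

def prefix64(addr):
    # an IPv6 address is 128 bits; dividing by 2**64 keeps only the top 64
    return int(ipaddress.IPv6Address(addr)) // 2**64

print(hex(prefix64('2607:f8b0:4006:0824:0000:0000:0000:200e')))
</code></pre>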
<h3 id="the-result-saved-70mb-of-memory">the result: saved 70MB of memory!</h3>
<p>I deployed the new version and now Mess With DNS is using less memory! Hooray!</p>
<p>A few other notes:</p>
<ul>
<li>lookups are a little slower &ndash; in my microbenchmark they went from 9 million
lookups/second to 6 million, maybe because I added a little indirection.
Using less memory and a little more CPU seemed like a good tradeoff though.</li>
<li>it&rsquo;s still using more memory than the raw text files do (46MB vs 37MB), I
guess pointers take up space and that&rsquo;s okay.</li>
</ul>
<p>I&rsquo;m honestly not sure if this will solve all my memory problems, probably not!
But I had fun, I learned a few things about SQLite, I still don&rsquo;t know what to
think about tries, and it made me love binary search even more than I already
did.</p>
]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[Some notes on upgrading Hugo]]></title>
    <link href="https://jvns.ca/blog/2024/10/07/some-notes-on-upgrading-hugo/"/>
    <updated>2024-10-07T09:19:57+00:00</updated>
    <id>https://jvns.ca/blog/2024/10/07/some-notes-on-upgrading-hugo/</id>
    <content type="html"><![CDATA[<p>Warning: this is a post about very boring yakshaving, probably only of interest
to people who are trying to upgrade Hugo from a very old version to a new
version. But what are blogs for if not documenting one&rsquo;s very boring yakshaves
from time to time?</p>
<p>So yesterday I decided to try to upgrade Hugo. There&rsquo;s no real reason to do
this &ndash; I&rsquo;ve been using Hugo version 0.40 to generate this blog since 2018, it
works fine, and I don&rsquo;t have any problems with it. But I thought &ndash; maybe it
won&rsquo;t be as hard as I think, and I kind of like a tedious computer task sometimes!</p>
<p>I thought I&rsquo;d document what I learned along the way in case it&rsquo;s useful to
anyone else doing this very specific migration. I upgraded from Hugo v0.40
(from 2018) to v0.135 (from 2024).</p>
<p>Here are most of the changes I had to make:</p>
<h3 id="change-1-template-theme-partials-thing-html-is-now-partial-thing-html">change 1: <code>template &quot;theme/partials/thing.html&quot;</code> is now <code>partial &quot;thing.html&quot;</code></h3>
<p>I had to replace a bunch of instances of <code>{{ template &quot;theme/partials/header.html&quot; . }}</code> with <code>{{ partial &quot;header.html&quot; . }}</code>.</p>
<p>This happened in <a href="https://github.com/gohugoio/hugo/releases/tag/v0.42">v0.42</a>:</p>
<blockquote>
<p>We have now virtualized the filesystems for project and theme files. This
makes everything simpler, faster and more powerful. But it also means that
template lookups on the form {{ template “theme/partials/pagination.html” .
}} will not work anymore. That syntax has never been documented, so it&rsquo;s not
expected to be in wide use.</p>
</blockquote>
<h3 id="change-2-data-pages-is-now-site-regularpages">change 2: <code>.Data.Pages</code> is now <code>site.RegularPages</code></h3>
<p>This seems to be discussed in the <a href="https://github.com/gohugoio/hugo/releases/tag/v0.57.2">release notes for 0.57.2</a>.</p>
<p>I just needed to replace <code>.Data.Pages</code> with <code>site.RegularPages</code> in the template on the homepage as well as in my RSS feed template.</p>
<h3 id="change-3-next-and-prev-got-flipped">change 3:  <code>.Next</code> and <code>.Prev</code> got flipped</h3>
<p>I had this comment in the part of my theme where I link to the next/previous blog post:</p>
<blockquote>
<p>&ldquo;next&rdquo; and &ldquo;previous&rdquo; in hugo apparently mean the opposite of what I&rsquo;d think
they&rsquo;d mean intuitively. I&rsquo;d expect &ldquo;next&rdquo; to mean &ldquo;in the future&rdquo; and
&ldquo;previous&rdquo; to mean &ldquo;in the past&rdquo; but it&rsquo;s the opposite</p>
</blockquote>
<p>It looks like they changed this in
<a href="https://github.com/gohugoio/hugo/commit/ad705aac0649fa3102f7639bc4db65d45e108ee2">ad705aac064</a>
so that &ldquo;next&rdquo; actually is in the future and &ldquo;prev&rdquo; actually is in the past. I
definitely find the new behaviour more intuitive.</p>
<h3 id="downloading-the-hugo-changelogs-with-a-script">downloading the Hugo changelogs with a script</h3>
<p>Figuring out why/when all of these changes happened was a little difficult. I
ended up hacking together a bash script to <a href="https://gist.github.com/jvns/dbe4bd9271a56f1f8562bfe329c2aa9e">download all of the changelogs from github as text files</a>, which I
could then grep to try to figure out what happened. It turns out it&rsquo;s pretty
easy to get all of the changelogs from the GitHub API.</p>
<p>So far everything was not so bad &ndash; there was also a change around taxonomies
that I can&rsquo;t quite explain, but it was all pretty manageable. But then we got
to the really tough one: the markdown renderer.</p>
<h3 id="change-4-the-markdown-renderer-blackfriday-goldmark">change 4: the markdown renderer (blackfriday -&gt; goldmark)</h3>
<p>The blackfriday markdown renderer (which was previously the default) was removed in <a href="https://github.com/gohugoio/hugo/releases/tag/v0.100.0">v0.100.0</a>. This seems pretty reasonable:</p>
<blockquote>
<p>It has been deprecated for a long time, its v1 version is not maintained
anymore, and there are many known issues. Goldmark should be a mature
replacement by now.</p>
</blockquote>
<p>Fixing all my Markdown changes was a huge pain &ndash; I ended up having to update
80 different Markdown files (out of 700) so that they would render properly, and I&rsquo;m not totally sure</p>
<h3 id="why-bother-switching-renderers">why bother switching renderers?</h3>
<p>The obvious question here is &ndash; why bother even trying to upgrade Hugo at all
if I have to switch Markdown renderers?
My old site was running totally fine and I think it wasn&rsquo;t necessarily a <em>good</em>
use of time, but the one reason I think it might be useful in the future is
that the new renderer (goldmark) uses the <a href="https://commonmark.org/">CommonMark markdown standard</a>, which I&rsquo;m hoping will be somewhat
more futureproof. So maybe I won&rsquo;t have to go through this again? We&rsquo;ll see.</p>
<p>Also it turned out that the new Goldmark renderer does fix some problems I had
(but didn&rsquo;t know that I had) with smart quotes and how lists/blockquotes
interact.</p>
<h3 id="finding-all-the-markdown-problems-the-process">finding all the Markdown problems: the process</h3>
<p>The hard part of this Markdown change was even figuring out what changed.
Almost all of the problems (including #2 and #3 above) just silently broke the
site, they didn&rsquo;t cause any errors or anything. So I had to diff the HTML to
hunt them down.</p>
<p>Here&rsquo;s what I ended up doing:</p>
<ol>
<li>Generate the site with the old version, put it in <code>public_old</code></li>
<li>Generate the new version, put it in <code>public</code></li>
<li>Diff every single HTML file in <code>public/</code> and <code>public_old</code> with <a href="https://gist.github.com/jvns/c7272cfb906e3ed0a3e9f8d361c5b5fc">this diff.sh script</a> and put the results in a <code>diffs/</code> folder</li>
<li>Run variations on <code>find diffs -type f | xargs cat | grep -E -C 5 '(31m|32m)' | less -r</code> over and over again to look at every single change until I found something that seemed wrong</li>
<li>Update the Markdown to fix the problem</li>
<li>Repeat until everything seemed okay</li>
</ol>
<p>(the <code>grep 31m|32m</code> thing is searching for red/green text in the diff)</p>
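<p>If you&rsquo;d rather not use a shell script, here&rsquo;s a rough Python sketch of the &ldquo;diff every HTML file in the two builds&rdquo; step. This is not what the linked <code>diff.sh</code> does internally (I&rsquo;m only assuming the same <code>public_old</code> / <code>public</code> layout), and <code>diff_sites</code> is a hypothetical name:</p>
<pre><code class="language-python">import difflib
from pathlib import Path

def diff_sites(old_dir, new_dir):
    # yield (relative path, list of diff lines) for every HTML file that changed
    old_dir, new_dir = Path(old_dir), Path(new_dir)
    for new_file in sorted(new_dir.rglob('*.html')):
        rel = new_file.relative_to(new_dir)
        old_file = old_dir / rel
        if not old_file.exists():
            continue  # this page only exists in the new build
        old_lines = old_file.read_text(encoding='utf-8', errors='replace').splitlines()
        new_lines = new_file.read_text(encoding='utf-8', errors='replace').splitlines()
        diff = list(difflib.unified_diff(
            old_lines, new_lines,
            fromfile=str(old_file), tofile=str(new_file), lineterm='',
        ))
        if diff:
            yield str(rel), diff
</code></pre>
<p>Calling <code>diff_sites('public_old', 'public')</code> and printing each diff gives you something to page through, one changed file at a time.</p>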
<p>This was very time consuming but it was a little bit fun for some reason so I
kept doing it until it seemed like nothing too horrible was left.</p>
<h3 id="the-new-markdown-rules">the new markdown rules</h3>
<p>Here&rsquo;s a list of every type of Markdown change I had to make. It&rsquo;s very
possible these are all extremely specific to me but it took me a long time to
figure them all out so maybe this will be helpful to one other person who finds
this in the future.</p>
<h4 id="4-1-mixing-html-and-markdown">4.1: mixing HTML and markdown</h4>
<p>This doesn&rsquo;t work anymore (it doesn&rsquo;t expand the link):</p>
<pre><code>&lt;small&gt;
[a link](https://example.com)
&lt;/small&gt;
</code></pre>
<p>I need to do this instead:</p>
<pre><code>&lt;small&gt;

[a link](https://example.com)

&lt;/small&gt;
</code></pre>
<p>This works too:</p>
<pre><code>&lt;small&gt; [a link](https://example.com) &lt;/small&gt;
</code></pre>
<h4 id="4-2-is-changed-into">4.2: <code>&lt;&lt;</code> is changed into «</h4>
<p>I didn&rsquo;t want this so I needed to configure:</p>
<pre><code>markup:
  goldmark:
    extensions:
      typographer:
        leftAngleQuote: '&amp;lt;&amp;lt;'
        rightAngleQuote: '&amp;gt;&amp;gt;'
</code></pre>
<h4 id="4-3-nested-lists-sometimes-need-4-space-indents">4.3: nested lists sometimes need 4 space indents</h4>
<p>This doesn&rsquo;t render as a nested list anymore if I only indent by 2 spaces, I need to put 4 spaces.</p>
<pre><code>1. a
  * b
  * c
2. b
</code></pre>
<p>The problem is that the amount of indent needed depends on the size of the list
markers. <a href="https://spec.commonmark.org/0.29/#example-263">Here&rsquo;s a reference in CommonMark for this</a>.</p>
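<p>Here&rsquo;s a tiny Python sketch of that rule as I understand it: in CommonMark, nested content has to be indented at least as far as the text of the parent list item, which is the width of the list marker plus the space after it. The sketch assumes exactly one space after the marker and no leading indentation, and <code>required_indent</code> is a hypothetical helper:</p>
<pre><code class="language-python">def required_indent(list_line):
    # the marker is everything before the first space, e.g. '1.', '10.' or '*'
    marker = list_line.split(' ', 1)[0]
    # nested content must start at the column where the item text starts
    return len(marker) + 1

for line in ['* a', '1. a', '10. a']:
    print(repr(line), required_indent(line))
</code></pre>
<p>So under <code>1.</code> a 2-space indent is too small (it needs at least 3), while under <code>*</code> 2 spaces is enough. Indenting by 4 spaces works in all of these cases.</p>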
<h4 id="4-4-blockquotes-inside-lists-work-better">4.4: blockquotes inside lists work better</h4>
<p>Previously the <code>&gt; quote</code> here didn&rsquo;t render as a blockquote, and with the new renderer it does.</p>
<pre><code>* something
&gt; quote
* something else
</code></pre>
<p>I found a bunch of Markdown that had been kind of broken (which I hadn&rsquo;t
noticed) that works better with the new renderer, and this is an example of
that.</p>
<p>Lists inside blockquotes also seem to work better.</p>
<h4 id="4-5-headings-inside-lists">4.5: headings inside lists</h4>
<p>Previously this didn&rsquo;t render as a heading, but now it does. So I needed to
replace the <code>#</code> with <code>&amp;num;</code>.</p>
<pre><code>* # passengers: 20
</code></pre>
<h4 id="4-6-or-1-at-the-beginning-of-the-line-makes-it-a-list">4.6:  <code>+</code> or <code>1)</code> at the beginning of the line makes it a list</h4>
<p>I had something which looked like this:</p>
<pre><code>`1 / (1
+ exp(-1)) = 0.73`
</code></pre>
<p>With Blackfriday it rendered like this:</p>
<pre><code>&lt;p&gt;&lt;code&gt;1 / (1
+ exp(-1)) = 0.73&lt;/code&gt;&lt;/p&gt;
</code></pre>
<p>and with Goldmark it rendered like this:</p>
<pre><code>&lt;p&gt;`1 / (1&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;exp(-1)) = 0.73`&lt;/li&gt;
&lt;/ul&gt;
</code></pre>
<p>Same thing if there was an accidental <code>1)</code> at the beginning of a line, like in this Markdown snippet</p>
<pre><code>I set up a small Hadoop cluster (1 master, 2 workers, replication set to 
1) on 
</code></pre>
<p>To fix this I just had to rewrap the line so that the <code>+</code> wasn&rsquo;t the first character.</p>
<p>The Markdown is formatted this way because I wrap my Markdown to 80 characters
a lot and the wrapping isn&rsquo;t very context sensitive.</p>
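<p>A rough Python sketch of how you could hunt for this kind of accidental list: flag any line that starts with <code>+</code> or <code>1)</code> right after a non-blank line. This is only a heuristic (it doesn&rsquo;t parse Markdown, and it will also flag some legitimate lists), and <code>suspicious_lines</code> is a hypothetical name:</p>
<pre><code class="language-python">import re

# a line starting with '+' or '1)' followed by whitespace
SUSPECT = re.compile(r'^ {0,3}([+]|1[)])\s')

def suspicious_lines(markdown):
    # return 1-based line numbers of wrapped lines that look like list items
    lines = markdown.splitlines()
    hits = []
    for i in range(1, len(lines)):
        prev = lines[i - 1]
        if prev.strip() and not SUSPECT.match(prev) and SUSPECT.match(lines[i]):
            hits.append(i + 1)
    return hits

print(suspicious_lines('`1 / (1\n+ exp(-1)) = 0.73`'))
</code></pre>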
<h4 id="4-7-no-more-smart-quotes-in-code-blocks">4.7: no more smart quotes in code blocks</h4>
<p>There were a bunch of places where the old renderer (Blackfriday) was doing
unwanted things in code blocks like replacing <code>...</code> with <code>…</code> or replacing
quotes with smart quotes. I hadn&rsquo;t realized this was happening and I was very
happy to have it fixed.</p>
<h4 id="4-8-better-quote-management">4.8: better quote management</h4>
<p>The way this gets rendered got better:</p>
<pre><code>&quot;Oh, *interesting*!&quot;
</code></pre>
<ul>
<li>old: “Oh, <em>interesting</em>!“</li>
<li>new: “Oh, <em>interesting</em>!”</li>
</ul>
<p>Before there were two left smart quotes, now the quotes match.</p>
<h4 id="4-9-images-are-no-longer-wrapped-in-a-p-tag">4.9: images are no longer wrapped in a <code>p</code> tag</h4>
<p>Previously if I had an image like this:</p>
<pre><code>&lt;img src=&quot;https://jvns.ca/images/rustboot1.png&quot;&gt;
</code></pre>
<p>it would get wrapped in a <code>&lt;p&gt;</code> tag, now it doesn&rsquo;t anymore. I dealt with this
just by adding a <code>margin-bottom: 0.75em</code> to images in the CSS, hopefully
that&rsquo;ll make them display well enough.</p>
<h4 id="4-10-br-is-now-wrapped-in-a-p-tag">4.10: <code>&lt;br&gt;</code> is now wrapped in a <code>p</code> tag</h4>
<p>Previously this wouldn&rsquo;t get wrapped in a <code>p</code> tag, but now it seems to:</p>
<pre><code>&lt;br&gt;&lt;br&gt;
</code></pre>
<p>I just gave up on fixing this though and resigned myself to maybe having some
extra space in some cases. Maybe I&rsquo;ll try to fix it later if I feel like
another yakshave.</p>
<h4 id="4-11-some-more-goldmark-settings">4.11: some more goldmark settings</h4>
<p>I also needed to</p>
<ul>
<li>turn off code highlighting (because it wasn&rsquo;t working properly and I didn&rsquo;t have it before anyway)</li>
<li>use the old &ldquo;blackfriday&rdquo; method to generate heading IDs so they didn&rsquo;t change</li>
<li>allow raw HTML in my markdown</li>
</ul>
<p>Here&rsquo;s what I needed to add to my <code>config.yaml</code> to do all that:</p>
<pre><code>markup:
  highlight:
    codeFences: false
  goldmark:
    renderer:
      unsafe: true
    parser:
      autoHeadingIDType: blackfriday
</code></pre>
<p>Maybe I&rsquo;ll try to get syntax highlighting working one day, who knows. I might
prefer having it off though.</p>
<h3 id="a-little-script-to-compare-blackfriday-and-goldmark">a little script to compare blackfriday and goldmark</h3>
<p>I also wrote a little program to compare the Blackfriday and Goldmark output
for various markdown snippets, <a href="https://gist.github.com/jvns/9cc3024ff98433ced5e3a2304c5fc5e4">here it is in a gist</a>.</p>
<p>It&rsquo;s not really configured the exact same way Blackfriday and Goldmark were in
my Hugo versions, but it was still helpful to have to help me understand what
was going on.</p>
<h3 id="a-quick-note-on-maintaining-themes">a quick note on maintaining themes</h3>
<p>My approach to themes in Hugo has been:</p>
<ol>
<li>pay someone to make a nice design for the site (for example wizardzines.com was designed by <a href="https://melody.dev/">Melody Starling</a>)</li>
<li>use a totally custom theme</li>
<li>commit that theme to the same GitHub repo as the site</li>
</ol>
<p>So I just need to edit the theme files to fix any problems. Also I wrote a lot
of the theme myself so I&rsquo;m pretty familiar with how it works.</p>
<p>Relying on someone else to keep a theme updated feels kind of scary to me, I
think if I were using a third-party theme I&rsquo;d just copy the code into my site&rsquo;s
GitHub repo and then maintain it myself.</p>
<h3 id="which-static-site-generators-have-better-backwards-compatibility">which static site generators have better backwards compatibility?</h3>
<p>I <a href="https://social.jvns.ca/@b0rk/113260718682453232">asked on Mastodon</a> if
anyone had used a static site generator with good backwards compatibility.</p>
<p>The main answers seemed to be Jekyll and 11ty. Several people said they&rsquo;d been
using Jekyll for 10 years without any issues, and 11ty says it has
<a href="https://www.11ty.dev/blog/stability/">stability as a core goal</a>.</p>
<p>I think a big factor in how appealing Jekyll/11ty are is how easy it is for you
to maintain a working Ruby / Node environment on your computer: part of the
reason I stopped using Jekyll was that I got tired of having to maintain a
working Ruby installation. But I imagine this wouldn&rsquo;t be a problem for a Ruby
or Node developer.</p>
<p>Several people said that they don&rsquo;t build their Jekyll site locally at all &ndash;
they just use GitHub Pages to build it.</p>
<h3 id="that-s-it">that&rsquo;s it!</h3>
<p>Overall I&rsquo;ve been happy with Hugo &ndash; I <a href="https://jvns.ca/blog/2016/10/09/switching-to-hugo/">started using it</a> because it had fast
build times and it was a static binary, and both of those things are still
extremely useful to me. I might have spent 10 hours on this upgrade, but I&rsquo;ve
probably spent 1000+ hours writing blog posts without thinking about Hugo at
all so that seems like an extremely reasonable ratio.</p>
<p>I find it hard to be too mad about the backwards incompatible changes, most of
them were quite a long time ago, Hugo does a great job of making their old
releases available so you can use the old release if you want, and the most
difficult one is removing support for the <code>blackfriday</code> Markdown renderer in
favour of using something CommonMark-compliant which seems pretty reasonable to
me even if it is a huge pain.</p>
<p>But it did take a long time and I don&rsquo;t think I&rsquo;d particularly recommend moving
700 blog posts to a new Markdown renderer unless you&rsquo;re really in the mood for
a lot of computer suffering for some reason.</p>
<p>The new renderer did fix a bunch of problems so I think overall it might be a
good thing, even if I&rsquo;ll have to remember to make 2 changes to how I write
Markdown (4.1 and 4.3).</p>
<p>Also I&rsquo;m still using Hugo 0.54 for <a href="https://wizardzines.com">https://wizardzines.com</a> so maybe these notes
will be useful to Future Me if I ever feel like upgrading Hugo for that site.</p>
<p>Hopefully I didn&rsquo;t break too many things on the blog by doing this, let me know
if you see anything broken!</p>
]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[Terminal colours are tricky]]></title>
    <link href="https://jvns.ca/blog/2024/10/01/terminal-colours/"/>
    <updated>2024-10-01T10:01:44+00:00</updated>
    <id>https://jvns.ca/blog/2024/10/01/terminal-colours/</id>
    <content type="html"><![CDATA[<p>Yesterday I was thinking about how long it took me to get a colorscheme in my
terminal that I was mostly happy with (SO MANY YEARS), and it made me wonder
what about terminal colours made it so hard.</p>
<p>So I <a href="https://social.jvns.ca/@b0rk/113226972156366201">asked people on Mastodon</a> what problems
they&rsquo;ve run into with colours in the terminal, and I got a ton of interesting
responses! Let&rsquo;s talk about some of the problems and a few possible ways to fix
them.</p>
<h3 id="problem-1-blue-on-black">problem 1: blue on black</h3>
<p>One of the top complaints was &ldquo;blue on black is hard to read&rdquo;. Here&rsquo;s an
example of that: if I open Terminal.app, set the background to black, and run
<code>ls</code>, the directories are displayed in a blue that isn&rsquo;t that easy to read:</p>
<img src="https://jvns.ca/images/terminal-blue.png" style="max-width: 400px">
<p>To understand why we&rsquo;re seeing this blue, let&rsquo;s talk about ANSI colours!</p>
<h3 id="the-16-ansi-colours">the 16 ANSI colours</h3>
<p>Your terminal has 16 numbered colours &ndash; black, red, green, yellow, blue,
magenta, cyan, white, and a &ldquo;bright&rdquo; version of each of those.</p>
<p>Programs can use them by printing out an &ldquo;ANSI escape code&rdquo; &ndash; for example if
you want to see each of the 16 colours in your terminal, you can run this
Python program:</p>
<pre><code class="language-python">def color(num, text):
    return f&quot;\033[38;5;{num}m{text}\033[0m&quot;

for i in range(16):
    print(color(i, f&quot;number {i:02}&quot;))
</code></pre>
<h3 id="what-are-the-ansi-colours">what are the ANSI colours?</h3>
<p>This made me wonder &ndash; if blue is colour number 4, who decides what hex color
that should correspond to?</p>
<p>The answer seems to be &ldquo;there&rsquo;s no standard, terminal emulators just choose
colours and it&rsquo;s not very consistent&rdquo;. Here&rsquo;s a <a href="https://en.m.wikipedia.org/wiki/ANSI_escape_code#Colors">screenshot of a table from Wikipedia</a>, where you
can see that there&rsquo;s a lot of variation:</p>
<img src="https://jvns.ca/images/wikipedia.png"> 
<h3 id="problem-1-5-bright-yellow-on-white">problem 1.5: bright yellow on white</h3>
<p>Bright yellow on white is even worse than blue on black, here&rsquo;s what I get in
a terminal with the default settings:</p>
<img src="https://jvns.ca/images/terminal-yellow.png" style="max-height: 40px">
<p>That&rsquo;s almost impossible to read (and some other colours like light green cause
similar issues), so let&rsquo;s talk about solutions!</p>
<h3 id="two-ways-to-reconfigure-your-colours">two ways to reconfigure your colours</h3>
<p>If you&rsquo;re annoyed by these colour contrast issues (or maybe you just think the
default ANSI colours are ugly), you might think &ndash; well, I&rsquo;ll just choose a
different &ldquo;blue&rdquo; and pick something I like better!</p>
<p>There are two ways you can do this:</p>
<p><strong>Way 1: Configure your terminal emulator</strong>: I think most modern terminal emulators
have a way to reconfigure the colours, and some of them even come with some
preinstalled themes that you might like better than the defaults.</p>
<p><strong>Way 2: Run a shell script</strong>: There are ANSI escape codes that you can print
out to tell your terminal emulator to reconfigure its colours. <a href="https://github.com/chriskempson/base16-shell/blob/master/scripts/base16-solarized-light.sh">Here&rsquo;s a shell script that does that</a>,
from the <a href="https://github.com/chriskempson/base16-shell">base16-shell</a> project.
You can see that it has a few different conventions for changing the colours &ndash;
I guess different terminal emulators have different escape codes for changing
their colour palette, and so the script is trying to pick the right style of
escape code based on the <code>TERM</code> environment variable.</p>
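<p>As a rough illustration of what one of those escape codes looks like, here&rsquo;s a Python sketch that builds the xterm-style &ldquo;OSC 4&rdquo; sequence for redefining one palette entry. Not every terminal emulator supports this form, <code>set_palette_colour</code> is a hypothetical helper, and the hex value in the example is just a sample colour:</p>
<pre><code class="language-python">def set_palette_colour(index, hex_color):
    # build ESC ] 4 ; index ; rgb:RR/GG/BB ESC \ for one palette entry
    h = hex_color.lstrip('#')
    spec = f'rgb:{h[0:2]}/{h[2:4]}/{h[4:6]}'
    return f'\033]4;{index};{spec}\033\\'

# repr() shows the sequence without sending it to the terminal
print(repr(set_palette_colour(4, '#268bd2')))
</code></pre>
<p>Printing the returned string itself (instead of its <code>repr</code>) is what would actually ask the terminal to change colour 4.</p>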
<h3 id="what-are-the-pros-and-cons-of-the-2-ways-of-configuring-your-colours">what are the pros and cons of the 2 ways of configuring your colours?</h3>
<p>I prefer to use the &ldquo;shell script&rdquo; method, because:</p>
<ul>
<li>if I switch terminal emulators for some reason, I don&rsquo;t need to learn a different configuration system, my colours still Just Work</li>
<li>I use <a href="https://github.com/chriskempson/base16-shell">base16-shell</a> with base16-vim to make my vim colours match my terminal colours, which is convenient</li>
</ul>
<p>some advantages of configuring colours in your terminal emulator:</p>
<ul>
<li>if you use a popular terminal emulator, there are probably a lot more nice terminal themes out there that you can choose from</li>
<li>not all terminal emulators support the &ldquo;shell script method&rdquo;, and even if
they do, the results can be a little inconsistent</li>
</ul>
<p>This is what my shell has looked like for probably the last 5 years (using the
solarized light base16 theme), and I&rsquo;m pretty happy with it. Here&rsquo;s <code>htop</code>:</p>
<img src="https://jvns.ca/images/terminal-my-colours.png" style="max-width: 400px">
<p>Okay, so let&rsquo;s say you&rsquo;ve found a terminal colorscheme that you like. What else
can go wrong?</p>
<h3 id="problem-2-programs-using-256-colours">problem 2: programs using 256 colours</h3>
<p>Here&rsquo;s what some output of <code>fd</code>, a <code>find</code> alternative, looks like in my
colorscheme:</p>
<img src="https://jvns.ca/images/terminal-problem-fd.png" style="max-width: 400px">
<p>The contrast is pretty bad here, and I definitely don&rsquo;t have that lime green in
my normal colorscheme. What&rsquo;s going on?</p>
<p>We can see what color codes <code>fd</code> is using by running it under the <code>unbuffer</code> program to
capture its output including the color codes:</p>
<pre><code>$ unbuffer fd . &gt; out
$ vim out
^[[38;5;48mbad-again.sh^[[0m
^[[38;5;48mbad.sh^[[0m
^[[38;5;48mbetter.sh^[[0m
out
</code></pre>
<p><code>^[[38;5;48m</code> means &ldquo;set the foreground color to color <code>48</code>&rdquo;. Terminals don&rsquo;t
only have 16 colours &ndash; many terminals these days actually have 3 ways of
specifying colours:</p>
<ol>
<li>the 16 ANSI colours we already talked about</li>
<li>an extended set of 256 colours</li>
<li>a further extended set of 24-bit hex colours, like <code>#ffea03</code></li>
</ol>
<p>So <code>fd</code> is using one of the colours from the extended 256-color set. <code>bat</code> (a
<code>cat</code> alternative) does something similar &ndash; here&rsquo;s what it looks like by
default in my terminal.</p>
<img src="https://jvns.ca/images/terminal-bat.png" style="max-width: 400px">
<p>This looks fine though and it really seems like it&rsquo;s trying to work well with a
variety of terminal themes.</p>
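<p>To make the three ways of specifying colours concrete, here&rsquo;s a minimal Go sketch that builds a foreground escape code for each one. The function names (<code>ansi16</code>, <code>ansi256</code>, <code>truecolor</code>) are made up for this example; the escape sequences are the usual <code>30-37</code>/<code>90-97</code>, <code>38;5;N</code> and <code>38;2;R;G;B</code> forms.</p>

```go
package main

import "fmt"

// reset turns off all colour attributes again.
const reset = "\x1b[0m"

// ansi16 returns the escape code for one of the 16 ANSI foreground colours.
// Colours 0-7 use codes 30-37, the "bright" colours 8-15 use codes 90-97.
func ansi16(n int) string {
	if n < 8 {
		return fmt.Sprintf("\x1b[%dm", 30+n)
	}
	return fmt.Sprintf("\x1b[%dm", 90+(n-8))
}

// ansi256 returns the escape code for a colour from the extended 256-colour set.
func ansi256(n int) string {
	return fmt.Sprintf("\x1b[38;5;%dm", n)
}

// truecolor returns the escape code for a 24-bit colour given as red, green, blue.
func truecolor(r, g, b int) string {
	return fmt.Sprintf("\x1b[38;2;%d;%d;%dm", r, g, b)
}

func main() {
	fmt.Println(ansi16(1) + "red from the 16 ANSI colours" + reset)
	fmt.Println(ansi256(48) + "colour 48 from the 256-colour set" + reset)
	fmt.Println(truecolor(0xff, 0xea, 0x03) + "#ffea03 as a 24-bit colour" + reset)
}
```

<p>The first line is the only one whose colour your terminal colorscheme normally controls. How the other two look depends on the terminal&rsquo;s 256-colour and 24-bit support.</p>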
<h3 id="some-newer-tools-seem-to-have-theme-support">some newer tools seem to have theme support</h3>
<p>I think it&rsquo;s interesting that some of these newer terminal tools (<code>fd</code>, <code>bat</code>,
<code>delta</code>, and probably more) have support for arbitrary custom themes. I guess
the downside of this approach is that the default theme might clash with your
terminal&rsquo;s background, but the upside is that it gives you a lot more control
over theming the tool&rsquo;s output than just choosing 16 ANSI colours.</p>
<p>I don&rsquo;t really use <code>bat</code>, but if I did I&rsquo;d probably use <code>bat --theme ansi</code> to
just use the ANSI colours that I have set in my normal terminal colorscheme.</p>
<h3 id="problem-3-the-grays-in-solarized">problem 3: the grays in Solarized</h3>
<p>A bunch of people on Mastodon mentioned a specific issue with grays in the
Solarized theme: when I list a directory, the base16 Solarized Light theme
looks like this:</p>
<img src="https://jvns.ca/images/terminal-solarized-base16.png" style="max-width: 400px">
<p>but iTerm&rsquo;s default Solarized Light theme looks like this:</p>
<img src="https://jvns.ca/images/terminal-solarized-iterm.png" style="max-width: 400px">
<p>This is because in the iTerm theme (which is the <a href="https://ethanschoonover.com/solarized/#the-values">original Solarized design</a>), colors 9-14 (the &ldquo;bright blue&rdquo;, &ldquo;bright
red&rdquo;, etc) are mapped to a series of grays, and when I run <code>ls</code>, it&rsquo;s trying to
use those &ldquo;bright&rdquo; colours to color my directories and executables.</p>
<p>My best guess for why the original Solarized theme is designed this way is to
make the grays available to the <a href="https://github.com/altercation/vim-colors-solarized/blob/528a59f26d12278698bb946f8fb82a63711eec21/colors/solarized.vim">vim Solarized colorscheme</a>.</p>
<p>I&rsquo;m pretty sure I prefer the modified base16 version I use where the &ldquo;bright&rdquo;
colours are actually colours instead of all being shades of gray though. (I
didn&rsquo;t actually realize the version I was using wasn&rsquo;t the &ldquo;original&rdquo; Solarized
theme until I wrote this post)</p>
<p>In any case I really love Solarized and I&rsquo;m very happy it exists so that I can
use a modified version of it.</p>
<h3 id="problem-4-a-vim-theme-that-doesn-t-match-the-terminal-background">problem 4: a vim theme that doesn&rsquo;t match the terminal background</h3>
<p>If my vim theme has a different background colour than my terminal theme, I
get this ugly border, like this:</p>
<img src="https://jvns.ca/images/terminal-vim-black-bg.png" style="max-width: 400px">
<p>This one is a pretty minor issue though and I think making your terminal
background match your vim background is pretty straightforward.</p>
<h3 id="problem-5-programs-setting-a-background-color">problem 5: programs setting a background color</h3>
<p>A few people mentioned problems with terminal applications setting an
unwanted background colour, so let&rsquo;s look at an example of that.</p>
<p>Here <code>ngrok</code> has set the background to color #16 (&ldquo;black&rdquo;), but the
<code>base16-shell</code> script I use sets color 16 to be bright orange, so I get this,
which is pretty bad:</p>
<img src="https://jvns.ca/images/terminal-ngrok-solarized.png" style="max-width: 400px">
<p>I think the intention is for ngrok to look something like this:</p>
<img src="https://jvns.ca/images/terminal-ngrok-regular.png" style="max-width: 400px">
<p>I think <code>base16-shell</code> sets color #16 to orange (instead of black)
so that it can provide extra colours for use by <a href="https://github.com/chriskempson/base16-vim/blob/3be3cd82cd31acfcab9a41bad853d9c68d30478d/colors/base16-solarized-light.vim">base16-vim</a>.
This feels reasonable to me &ndash; I use <code>base16-vim</code> in the terminal, so I guess I&rsquo;m
using that feature and it&rsquo;s probably more important to me than <code>ngrok</code> (which I
rarely use) behaving a bit weirdly.</p>
<p>This particular issue is a maybe obscure clash between ngrok and my colorscheme,
but I think this kind of clash is pretty common when a program sets an ANSI
background color that the user has remapped for some reason.</p>
<h3 id="a-nice-solution-to-contrast-issues-minimum-contrast">a nice solution to contrast issues: &ldquo;minimum contrast&rdquo;</h3>
<p>A bunch of terminals (iTerm2, <a href="https://github.com/Eugeny/tabby">tabby</a>, kitty&rsquo;s <a href="https://sw.kovidgoyal.net/kitty/conf/#opt-kitty.text_fg_override_threshold">text_fg_override_threshold</a>, and
folks tell me also Ghostty and Windows Terminal) have a &ldquo;minimum
contrast&rdquo; feature that will automatically adjust colours to make sure they have enough contrast.</p>
<p>Here&rsquo;s an example from iTerm. This ngrok accident from before has pretty bad
contrast, I find it pretty difficult to read:</p>
<img src="https://jvns.ca/images/terminal-ngrok-solarized.png" style="max-width: 400px">
<p>With &ldquo;minimum contrast&rdquo; set to 40 in iTerm, it looks like this instead:</p>
<img src="https://jvns.ca/images/terminal-ngrok-solarized-contrast.png" style="max-width: 400px">
<p>I didn&rsquo;t have minimum contrast turned on before but I just turned it on today
because it makes such a big difference when something goes wrong with colours
in the terminal.</p>
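<p>The exact formula each terminal uses for this setting isn&rsquo;t covered here and probably differs between terminals. As one illustration of what &ldquo;enough contrast&rdquo; can mean, here&rsquo;s a Go sketch of the WCAG 2 contrast ratio between two 24-bit colours. The function names are made up for this example, and this is an assumption about the general idea, not any terminal&rsquo;s actual implementation.</p>

```go
package main

import (
	"fmt"
	"math"
)

// linearize converts one 8-bit sRGB channel to linear light, as defined by WCAG 2.
func linearize(c uint8) float64 {
	v := float64(c) / 255
	if v <= 0.03928 {
		return v / 12.92
	}
	return math.Pow((v+0.055)/1.055, 2.4)
}

// luminance returns the WCAG relative luminance of a colour (0 is black, 1 is white).
func luminance(r, g, b uint8) float64 {
	return 0.2126*linearize(r) + 0.7152*linearize(g) + 0.0722*linearize(b)
}

// contrastRatio returns a value from 1 (identical colours) to 21 (black on white).
func contrastRatio(fg, bg [3]uint8) float64 {
	l1 := luminance(fg[0], fg[1], fg[2])
	l2 := luminance(bg[0], bg[1], bg[2])
	if l1 < l2 {
		l1, l2 = l2, l1 // the lighter colour goes on top of the fraction
	}
	return (l1 + 0.05) / (l2 + 0.05)
}

func main() {
	black := [3]uint8{0, 0, 0}
	white := [3]uint8{255, 255, 255}
	fmt.Printf("black on white: %.1f\n", contrastRatio(black, white))
	fmt.Printf("white on white: %.1f\n", contrastRatio(white, white))
}
```

<p>A minimum contrast feature could then nudge the foreground colour lighter or darker until the ratio passes some threshold. WCAG suggests 4.5 for body text, for example.</p>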
<h3 id="problem-6-term-being-set-to-the-wrong-thing">problem 6: <code>TERM</code> being set to the wrong thing</h3>
<p>A few people mentioned that they&rsquo;ll SSH into a system that doesn&rsquo;t support the
<code>TERM</code> environment variable that they have set locally, and then the colours
won&rsquo;t work.</p>
<p>I think the way <code>TERM</code> works is that systems have a <code>terminfo</code> database, so if
the value of the <code>TERM</code> environment variable isn&rsquo;t in the system&rsquo;s terminfo
database, then it won&rsquo;t know how to output colours for that terminal. I don&rsquo;t
know too much about terminfo, but someone linked me to this <a href="https://twoot.site/@bean/113056942625234032">terminfo rant</a> that talks about a few other
issues with terminfo.</p>
<p>I don&rsquo;t have a system on hand to reproduce this one so I can&rsquo;t say for sure how
to fix it, but <a href="https://unix.stackexchange.com/questions/67537/prevent-ssh-client-passing-term-environment-variable-to-server">this Stack Exchange question</a>
suggests running something like <code>TERM=xterm ssh</code> instead of <code>ssh</code>.</p>
<h3 id="problem-7-picking-good-colours-is-hard">problem 7: picking &ldquo;good&rdquo; colours is hard</h3>
<p>A couple of problems people mentioned with designing / finding terminal colorschemes:</p>
<ul>
<li>some folks are colorblind and have trouble finding an appropriate colorscheme</li>
<li>accidentally making the background color too close to the cursor or selection color, so they&rsquo;re hard to find</li>
<li>generally finding colours that work with every program is a struggle (for example you can see me having a problem with this with ngrok above!)</li>
</ul>
<h3 id="problem-8-making-nethack-mc-look-right">problem 8: making nethack/mc look right</h3>
<p>Another problem people mentioned is using a program like nethack or midnight
commander which you might expect to have a specific colourscheme based on the
default ANSI terminal colours.</p>
<p>For example, midnight commander has a really specific classic look:</p>
<img src="https://jvns.ca/images/terminal-mc-normal.png" style="max-width: 200px">
<p>But in my Solarized theme, midnight commander looks like this:</p>
<img src="https://jvns.ca/images/terminal-mc-solarized.png" style="max-width: 200px">
<p>The Solarized version feels like it could be disorienting if you&rsquo;re
very used to the &ldquo;classic&rdquo; look.</p>
<p>One solution Simon Tatham mentioned to this is using some palette customization
ANSI codes (like the ones base16 uses that I talked about earlier) to change
the color palette right before starting the program, for example remapping
yellow to a brighter yellow before starting Nethack so that the yellow
characters look better.</p>
<h3 id="problem-9-commands-disabling-colours-when-writing-to-a-pipe">problem 9: commands disabling colours when writing to a pipe</h3>
<p>If I run <code>fd | less</code>, I see something like this, with the colours disabled.</p>
<img src="https://jvns.ca/images/terminal-fd-bw.png" style="max-width: 300px">
<p>In general I find this useful &ndash; if I pipe a command to <code>grep</code>, I don&rsquo;t want it
to print out all those color escape codes, I just want the plain text. But what if you want to see the colours?</p>
<p>To see the colours, you can run <code>unbuffer fd | less -r</code>! I just learned about
<code>unbuffer</code> recently and I think it&rsquo;s really cool, <code>unbuffer</code> opens a tty for the
command to write to so that it thinks it&rsquo;s writing to a TTY. It also fixes
issues with programs buffering their output when writing to a pipe, which is
why it&rsquo;s called <code>unbuffer</code>.</p>
<p>Here&rsquo;s what the output of <code>unbuffer fd | less -r</code> looks like for me:</p>
<img src="https://jvns.ca/images/terminal-fd-color.png" style="max-width: 300px">
<p>Also some commands (including <code>fd</code>) support a <code>--color=always</code> flag which will
force them to always print out the colours.</p>
<h3 id="problem-10-unwanted-colour-in-ls-and-other-commands">problem 10: unwanted colour in <code>ls</code> and other commands</h3>
<p>Some people mentioned that they don&rsquo;t want <code>ls</code> to use colour at all, perhaps
because <code>ls</code> uses blue, it&rsquo;s hard to read on black, and maybe they don&rsquo;t feel like
customizing their terminal&rsquo;s colourscheme to make the blue more readable or
just don&rsquo;t find the use of colour helpful.</p>
<p>Some possible solutions to this one:</p>
<ul>
<li>you can run <code>ls --color=never</code>, which is probably easiest</li>
<li>you can also set <code>LS_COLORS</code> to customize the colours used by <code>ls</code>. I think some programs other than <code>ls</code> support the <code>LS_COLORS</code> environment variable too.</li>
<li>also some programs support setting <code>NO_COLOR=true</code> (there&rsquo;s a <a href="https://no-color.org/">list here</a>)</li>
</ul>
<p>Here&rsquo;s an example of running <code>LS_COLORS=&quot;fi=0:di=0:ln=0:pi=0:so=0:bd=0:cd=0:or=0:ex=0&quot; ls</code>:</p>
<img src="https://jvns.ca/images/terminal-ls-colors.png" style="max-width: 500px">
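<p>Here&rsquo;s a rough Go sketch of how a command-line program might decide whether to print colours, combining a <code>--color</code> style setting, <code>NO_COLOR</code>, and a check for whether stdout is a terminal. The names <code>useColor</code> and <code>isTerminal</code> are made up for this example, and the character-device check is only an approximation of a real <code>isatty</code> check (for example <code>/dev/null</code> is also a character device).</p>

```go
package main

import (
	"fmt"
	"os"
)

// isTerminal guesses whether f is a terminal by checking for a character device.
// This is an approximation: real programs usually call isatty instead.
func isTerminal(f *os.File) bool {
	info, err := f.Stat()
	if err != nil {
		return false
	}
	return info.Mode()&os.ModeCharDevice != 0
}

// useColor decides whether to emit colour escape codes.
// mode is the value of a hypothetical --color flag: "always", "never" or "auto".
// noColor is the value of the NO_COLOR environment variable.
func useColor(mode, noColor string, tty bool) bool {
	switch mode {
	case "always":
		return true
	case "never":
		return false
	}
	if noColor != "" {
		return false // any non-empty NO_COLOR value disables colour
	}
	return tty // in "auto" mode, only use colour when writing to a terminal
}

func main() {
	if useColor("auto", os.Getenv("NO_COLOR"), isTerminal(os.Stdout)) {
		fmt.Println("\x1b[31mstdout looks like a terminal, so this is red\x1b[0m")
	} else {
		fmt.Println("stdout is a pipe or file (or NO_COLOR is set), so this is plain")
	}
}
```

<p>Piping this program into <code>less</code> should take the plain branch, which is the same behaviour as <code>fd | less</code> above.</p>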
<h3 id="problem-11-the-colours-in-vim">problem 11: the colours in vim</h3>
<p>I used to have a lot of problems with configuring my colours in vim &ndash; I&rsquo;d set
up my terminal colours in a way that I thought was okay, and then I&rsquo;d start vim
and it would just be a disaster.</p>
<p>I think what was going on here is that today, there are two ways to set up a vim colorscheme in the terminal:</p>
<ol>
<li>using your ANSI terminal colours &ndash; you tell vim which ANSI colour number to use for the background, for functions, etc.</li>
<li>using 24-bit hex colours &ndash; instead of ANSI terminal colours, the vim colorscheme can use hex codes like #faea99 directly</li>
</ol>
<p>20 years ago when I started using vim, terminals with 24-bit hex color support
were a lot less common (or maybe they didn&rsquo;t exist at all), and vim certainly
didn&rsquo;t have support for using 24-bit colour in the terminal. From some quick
searching through git, it looks like <a href="https://github.com/vim/vim/commit/8a633e3427b47286869aa4b96f2bfc1fe65b25cd">vim added support for 24-bit colour in 2016</a>
&ndash; just 8 years ago!</p>
<p>So to get colours to work properly in vim before 2016, you needed to synchronize
your terminal colorscheme and your vim colorscheme. <a href="https://github.com/chriskempson/base16-vim/blob/3be3cd82cd31acfcab9a41bad853d9c68d30478d/colors/base16-solarized-light.vim#L52-L71">Here&rsquo;s what that looked like</a>,
the colorscheme needed to map the vim color classes like <code>cterm05</code> to ANSI colour numbers.</p>
<p>But in 2024, the story is really different! Vim (and Neovim, which I use now)
support 24-bit colours, and as of Neovim 0.10 (released in May 2024), the
<code>termguicolors</code> setting (which tells Vim to use 24-bit hex colours for
colorschemes) is <a href="https://neovim.io/doc/user/news-0.10.html">turned on by default</a> in any terminal with 24-bit
color support.</p>
<p>So this &ldquo;you need to synchronize your terminal colorscheme and your vim
colorscheme&rdquo; problem is not an issue anymore for me in 2024, since I
don&rsquo;t plan to use terminals without 24-bit color support in the future.</p>
<p>The biggest consequence for me of this whole thing is that I don&rsquo;t need base16
to set colors 16-21 to weird stuff anymore to integrate with vim &ndash; I can just
use a terminal theme and a vim theme, and as long as the two themes use similar
colours (so it&rsquo;s not jarring for me to switch between them) there&rsquo;s no problem.
I think I can just remove those parts from my <code>base16</code> shell script and totally
avoid the problem with ngrok and the weird orange background I talked about
above.</p>
<h3 id="some-more-problems-i-left-out">some more problems I left out</h3>
<p>I think there are a lot of issues around the intersection of multiple programs,
like using some combination of tmux/ssh/vim that I couldn&rsquo;t figure out how to
reproduce well enough to talk about them. Also I&rsquo;m sure I missed a lot of other
things too.</p>
<h3 id="base16-has-really-worked-for-me">base16 has really worked for me</h3>
<p>I&rsquo;ve personally had a lot of success with using
<a href="https://github.com/chriskempson/base16-shell">base16-shell</a> with
<a href="https://github.com/chriskempson/base16-vim">base16-vim</a> &ndash; I just need to add <a href="https://github.com/chriskempson/base16-shell?tab=readme-ov-file#fish">a couple of lines</a> to my
fish config to set it up (+ a few <code>.vimrc</code> lines) and then I can move on and
accept any remaining problems that that doesn&rsquo;t solve.</p>
<p>I don&rsquo;t think base16 is for everyone though, some limitations I&rsquo;m aware
of with base16 that might make it not work for you:</p>
<ul>
<li>it comes with a limited set of builtin themes and you might not like any of them</li>
<li>the Solarized base16 theme (and maybe all of the themes?) sets the &ldquo;bright&rdquo;
ANSI colours to be exactly the same as the normal colours, which might cause
a problem if you&rsquo;re relying on the &ldquo;bright&rdquo; colours to be different from the
regular ones</li>
<li>it sets colours 16-21 in order to give the vim colorschemes from <code>base16-vim</code>
access to more colours, which might not be relevant if you always use a
terminal with 24-bit color support, and can cause problems like the ngrok
issue above</li>
<li>also the way it sets colours 16-21 could be a problem in terminals that don&rsquo;t
have 256-color support, like the linux framebuffer terminal</li>
</ul>
<p>Apparently there&rsquo;s a community fork of base16 called
<a href="https://github.com/tinted-theming/home">tinted-theming</a>, which I haven&rsquo;t
looked into much yet.</p>
<h3 id="some-other-colorscheme-tools">some other colorscheme tools</h3>
<p>Just one so far but I&rsquo;ll link more if people tell me about them:</p>
<ul>
<li><a href="https://rootloops.sh/">rootloops.sh</a> for generating colorschemes (and <a href="https://hamvocke.com/blog/lets-create-a-terminal-color-scheme/">&ldquo;let&rsquo;s create a terminal color scheme&rdquo;</a>)</li>
<li>Some popular colorschemes (according to people I asked on Mastodon): <a href="https://catppuccin.com/">Catppuccin</a>, Monokai, Gruvbox, <a href="https://github.com/dracula">Dracula</a>, <a href="https://protesilaos.com/emacs/modus-themes">Modus (a high contrast theme)</a>, <a href="https://github.com/folke/tokyonight.nvim">Tokyo Night</a>, <a href="https://www.nordtheme.com/">Nord</a>, <a href="https://rosepinetheme.com/">Rosé Pine</a></li>
</ul>
<h3 id="okay-that-was-a-lot">okay, that was a lot</h3>
<p>We talked about a lot in this post, and while I think learning about all these
details is kind of fun if I&rsquo;m in the mood to do a deep dive, I find it SO
FRUSTRATING to deal with it when I just want my colours to work! Being
surprised by unreadable text and having to find a workaround is just not my
idea of a good day.</p>
<p>Personally I&rsquo;m a zero-configuration kind of person and it&rsquo;s not that appealing
to me to have to put together a lot of custom configuration just to make my
colours in the terminal look acceptable. I&rsquo;d much rather just have some
reasonable defaults that I don&rsquo;t have to change.</p>
<h3 id="minimum-contrast-seems-like-an-amazing-feature">minimum contrast seems like an amazing feature</h3>
<p>My one big takeaway from writing this was to turn on &ldquo;minimum contrast&rdquo; in my
terminal, I think it&rsquo;s going to fix most of the occasional accidental
unreadable text issues I run into and I&rsquo;m pretty excited about it.</p>
]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[Some Go web dev notes]]></title>
    <link href="https://jvns.ca/blog/2024/09/27/some-go-web-dev-notes/"/>
    <updated>2024-09-27T11:16:00+00:00</updated>
    <id>https://jvns.ca/blog/2024/09/27/some-go-web-dev-notes/</id>
    <content type="html"><![CDATA[<p>I spent a lot of time in the past couple of weeks working on a website in Go
that may or may not ever see the light of day, but I learned a couple of things
along the way I wanted to write down. Here they are:</p>
<h3 id="go-1-22-now-has-better-routing">go 1.22 now has better routing</h3>
<p>I&rsquo;ve never felt motivated to learn any of the Go routing libraries
(gorilla/mux, chi, etc), so I&rsquo;ve been doing all my routing by hand, like this.</p>
<pre><code>	// DELETE /records:
	case r.Method == &quot;DELETE&quot; &amp;&amp; n == 1 &amp;&amp; p[0] == &quot;records&quot;:
		if !requireLogin(username, r.URL.Path, r, w) {
			return
		}
		deleteAllRecords(ctx, username, rs, w, r)
	// POST /records/&lt;ID&gt;
	case r.Method == &quot;POST&quot; &amp;&amp; n == 2 &amp;&amp; p[0] == &quot;records&quot; &amp;&amp; len(p[1]) &gt; 0:
		if !requireLogin(username, r.URL.Path, r, w) {
			return
		}
		updateRecord(ctx, username, p[1], rs, w, r)

</code></pre>
<p>But apparently <a href="https://go.dev/blog/routing-enhancements">as of Go 1.22</a>, Go
now has better support for routing in the standard library, so that code can be
rewritten something like this:</p>
<pre><code>	mux.HandleFunc(&quot;DELETE /records/&quot;, app.deleteAllRecords)
	mux.HandleFunc(&quot;POST /records/{record_id}&quot;, app.updateRecord)
</code></pre>
<p>Though it would also need a login middleware, so maybe something more like
this, with a <code>requireLogin</code> middleware.</p>
<pre><code>	mux.Handle(&quot;DELETE /records/&quot;, requireLogin(http.HandlerFunc(app.deleteAllRecords)))
</code></pre>
<h3 id="a-gotcha-with-the-built-in-router-redirects-with-trailing-slashes">a gotcha with the built-in router: redirects with trailing slashes</h3>
<p>One annoying gotcha I ran into was: if I make a route for <code>/records/</code>, then a
request for <code>/records</code> <a href="https://pkg.go.dev/net/http#hdr-Trailing_slash_redirection-ServeMux">will be redirected</a> to <code>/records/</code>.</p>
<p>I ran into an issue with this where sending a POST request to <code>/records</code>
redirected to a GET request for <code>/records/</code>, which broke the POST request
because it removed the request body. Thankfully <a href="https://xeiaso.net/blog/go-servemux-slash-2021-11-04/">Xe Iaso wrote a blog post about the exact same issue</a> which made it
easier to debug.</p>
<p>I think the solution to this is just to use API endpoints like <code>POST /records</code>
instead of <code>POST /records/</code>, which seems like a more normal design anyway.</p>
<h3 id="sqlc-automatically-generates-code-for-my-db-queries">sqlc automatically generates code for my db queries</h3>
<p>I got a little bit tired of writing so much boilerplate for my SQL queries, but
I didn&rsquo;t really feel like learning an ORM, because I know what SQL queries I
want to write, and I didn&rsquo;t feel like learning the ORM&rsquo;s conventions for
translating things into SQL queries.</p>
<p>But then I found <a href="https://sqlc.dev/">sqlc</a>, which will compile a query like this:</p>
<pre><code>
-- name: GetVariant :one
SELECT *
FROM variants
WHERE id = ?;

</code></pre>
<p>into Go code like this:</p>
<pre><code>const getVariant = `-- name: GetVariant :one
SELECT id, created_at, updated_at, disabled, product_name, variant_name
FROM variants
WHERE id = ?
`

func (q *Queries) GetVariant(ctx context.Context, id int64) (Variant, error) {
	row := q.db.QueryRowContext(ctx, getVariant, id)
	var i Variant
	err := row.Scan(
		&amp;i.ID,
		&amp;i.CreatedAt,
		&amp;i.UpdatedAt,
		&amp;i.Disabled,
		&amp;i.ProductName,
		&amp;i.VariantName,
	)
	return i, err
}
</code></pre>
<p>What I like about this is that if I&rsquo;m ever unsure about what Go code to write
for a given SQL query, I can just write the query I want, read the generated
function and it&rsquo;ll tell me exactly what to do to call it. It feels much easier
to me than trying to dig through the ORM&rsquo;s documentation to figure out how to
construct the SQL query I want.</p>
<p>Reading <a href="https://brandur.org/fragments/sqlc-2024">Brandur&rsquo;s sqlc notes from 2024</a> also gave me some confidence
that this is a workable path for my tiny programs. That post gives a really
helpful example of how to conditionally update fields in a table using CASE
statements (for example if you have a table with 20 columns and you only want
to update 3 of them).</p>
<h3 id="sqlite-tips">sqlite tips</h3>
<p>Someone on Mastodon linked me to this post called <a href="https://kerkour.com/sqlite-for-servers">Optimizing sqlite for servers</a>. My projects are small and I&rsquo;m
not so concerned about performance, but my main takeaways were:</p>
<ul>
<li>have a dedicated object for <strong>writing</strong> to the database, and run
<code>db.SetMaxOpenConns(1)</code> on it. I learned the hard way that if I don&rsquo;t do this
then I&rsquo;ll get <code>SQLITE_BUSY</code> errors from two threads trying to write to the db
at the same time.</li>
<li>if I want to make reads faster, I could have 2 separate db objects, one for writing and one for reading</li>
</ul>
<p>There are more tips in that post that seem useful (like &ldquo;COUNT queries are
slow&rdquo; and &ldquo;Use STRICT tables&rdquo;), but I haven&rsquo;t done those yet.</p>
<p>Also sometimes if I have two tables where I know I&rsquo;ll never need to do a <code>JOIN</code>
between them, I&rsquo;ll just put them in separate databases so that I can connect
to them independently.</p>
<h3 id="go-1-19-introduced-a-way-to-set-a-gc-memory-limit">Go 1.19 introduced a way to set a GC memory limit</h3>
<p>I run all of my Go projects in VMs with relatively little memory, like 256MB or
512MB. I ran into an issue where my application kept getting OOM killed and it
was confusing &ndash; did I have a memory leak? What?</p>
<p>After some Googling, I realized that maybe I didn&rsquo;t have a memory leak, maybe I
just needed to reconfigure the garbage collector! It turns out that by default (according to <a href="https://tip.golang.org/doc/gc-guide">A Guide to the Go Garbage Collector</a>), Go&rsquo;s garbage collector will
let the application allocate memory up to <strong>2x</strong> the current heap size.</p>
<p><a href="https://messwithdns.net">Mess With DNS</a>&rsquo;s base heap size is around 170MB and
the amount of memory free on the VM is around 160MB right now, so if its memory
use doubles, it&rsquo;ll get OOM killed.</p>
<p>In Go 1.19, they added a way to tell Go &ldquo;hey, if the application starts using
this much memory, run a GC&rdquo;. So I set the GC memory limit to 250MB and it seems
to have resulted in the application getting OOM killed less often:</p>
<pre><code>export GOMEMLIMIT=250MiB
</code></pre>
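<p>The same limit can also be set from inside the program with <code>debug.SetMemoryLimit</code> from <code>runtime/debug</code>, which was added in Go 1.19 alongside the environment variable. A minimal sketch, where the 250MiB figure just mirrors the value above and <code>setLimit</code> is a made-up helper name:</p>

```go
package main

import (
	"fmt"
	"runtime/debug"
)

// limitBytes is 250MiB, the same value as GOMEMLIMIT=250MiB.
const limitBytes = 250 * 1024 * 1024

// setLimit applies the soft memory limit and returns the limit now in effect.
func setLimit() int64 {
	debug.SetMemoryLimit(limitBytes)
	// A negative argument changes nothing and just reports the current limit.
	return debug.SetMemoryLimit(-1)
}

func main() {
	fmt.Println("memory limit in bytes:", setLimit())
}
```

<p>This is a soft limit: the garbage collector works harder as the program approaches it, but Go can still go over it if the live heap really is that big.</p>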
<h3 id="some-reasons-i-like-making-websites-in-go">some reasons I like making websites in Go</h3>
<p>I&rsquo;ve been making tiny websites (like the <a href="https://nginx-playground.wizardzines.com/">nginx playground</a>) in Go on and off for the last 4 years or so and it&rsquo;s really been working for me. I think I like it because:</p>
<ul>
<li>there&rsquo;s just 1 static binary, all I need to do to deploy it is copy the binary. If there are static files I can just embed them in the binary with <a href="https://pkg.go.dev/embed">embed</a>.</li>
<li>there&rsquo;s a built-in webserver that&rsquo;s okay to use in production, so I don&rsquo;t need to configure WSGI or whatever to get it to work. I can just put it behind <a href="https://caddyserver.com/">Caddy</a> or run it on fly.io or whatever.</li>
<li>Go&rsquo;s toolchain is very easy to install, I can just do <code>apt-get install golang-go</code> or whatever and then a <code>go build</code> will build my project</li>
<li>it feels like there&rsquo;s very little to remember to start sending HTTP responses
&ndash; basically all there is are functions like <code>Serve(w http.ResponseWriter, r *http.Request)</code> which read the request and send a response. If I need to
remember some detail of how exactly that&rsquo;s accomplished, I just have to read
the function!</li>
<li>also <code>net/http</code> is in the standard library, so you can start making websites
without installing any libraries at all. I really appreciate this one.</li>
<li>Go is a pretty systems-y language, so if I need to run an <code>ioctl</code> or
something that&rsquo;s easy to do</li>
</ul>
<p>In general everything about it feels like it makes projects easy to work on for
5 days, abandon for 2 years, and then get back into writing code without a lot
of problems.</p>
<p>For contrast, I&rsquo;ve tried to learn Rails a couple of times and I really <em>want</em>
to love Rails &ndash; I&rsquo;ve made a couple of toy websites in Rails and it&rsquo;s always
felt like a really magical experience. But ultimately when I come back to those
projects I can&rsquo;t remember how anything works and I just end up giving up. It
feels easier to me to come back to my Go projects that are full of a lot of
repetitive boilerplate, because at least I can read the code and figure out how
it works.</p>
<h3 id="things-i-haven-t-figured-out-yet">things I haven&rsquo;t figured out yet</h3>
<p>some things I haven&rsquo;t done much of yet in Go:</p>
<ul>
<li>rendering HTML templates: usually my Go servers are just APIs and I make the
frontend a single-page app with Vue. I&rsquo;ve used <code>html/template</code> a lot in Hugo (which I&rsquo;ve used for this blog for the last 8 years)
but I&rsquo;m still not sure how I feel about it.</li>
<li>I&rsquo;ve never made a real login system, usually my servers don&rsquo;t have users at all.</li>
<li>I&rsquo;ve never tried to implement CSRF</li>
</ul>
<p>In general I&rsquo;m not sure how to implement security-sensitive features so I don&rsquo;t
start projects which need login/CSRF/etc. I imagine this is where a framework
would help.</p>
<h3 id="it-s-cool-to-see-the-new-features-go-has-been-adding">it&rsquo;s cool to see the new features Go has been adding</h3>
<p>Both of the Go features I mentioned in this post (<code>GOMEMLIMIT</code> and the routing)
are new in the last couple of years and I didn&rsquo;t notice when they came out. It
makes me think I should pay closer attention to the release notes for new Go
versions.</p>
]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[Reasons I still love the fish shell]]></title>
    <link href="https://jvns.ca/blog/2024/09/12/reasons-i--still--love-fish/"/>
    <updated>2024-09-12T15:09:12+00:00</updated>
    <id>https://jvns.ca/blog/2024/09/12/reasons-i--still--love-fish/</id>
    <content type="html"><![CDATA[<p>I wrote about how much I love <a href="https://fishshell.com/">fish</a> in <a href="https://jvns.ca/blog/2017/04/23/the-fish-shell-is-awesome/">this blog post from 2017</a> and, 7 years
of using it every day later, I&rsquo;ve found even more reasons to love it. So I
thought I&rsquo;d write a new post with both the old reasons I loved it and some
new reasons.</p>
<p>This came up today because I was trying to figure out why my terminal doesn&rsquo;t
break anymore when I cat a binary to my terminal, the answer was &ldquo;fish fixes
the terminal!&rdquo;, and I just thought that was really nice.</p>
<h3 id="1-no-configuration">1. no configuration</h3>
<p>In 10 years of using fish I have never found a single thing I wanted to configure. It just works the way I want. My fish config file just has:</p>
<ul>
<li>environment variables</li>
<li>aliases (<code>alias ls eza</code>, <code>alias vim nvim</code>, etc)</li>
<li>the occasional <code>direnv hook fish | source</code> to integrate a tool like direnv</li>
<li>a script I run to set up my <a href="https://github.com/chriskempson/base16-shell/blob/588691ba71b47e75793ed9edfcfaa058326a6f41/scripts/base16-solarized-light.sh">terminal colours</a></li>
</ul>
<p>I&rsquo;ve been told that configuring things in fish is really easy if you ever do
want to configure something though.</p>
<h3 id="2-autosuggestions-from-my-shell-history">2. autosuggestions from my shell history</h3>
<p>My absolute favourite thing about fish is that as I type, it’ll automatically
suggest (in light grey) a matching command that I ran recently. I can press the
right arrow key to accept the completion, or keep typing to ignore it.</p>
<p>Here’s what that looks like. In this example I just typed the “v” key and it
guessed that I want to run the previous vim command again.</p>
<img src="https://jvns.ca/images/fish-2024.png">
<h3 id="2-5-smart-shell-autosuggestions">2.5 &ldquo;smart&rdquo; shell autosuggestions</h3>
<p>One of my favourite subtle autocomplete features is how fish handles autocompleting commands that contain paths in them. For example, if I run:</p>
<pre><code>$ ls blah.txt
</code></pre>
<p>that command will only be autocompleted in directories that contain <code>blah.txt</code> &ndash; it won&rsquo;t show up in a different directory. (here&rsquo;s <a href="https://github.com/fish-shell/fish-shell/issues/120#issuecomment-6376019">a short comment about how it works</a>)</p>
<p>As an example, if in this directory I type <code>bash scripts/</code>, it&rsquo;ll only suggest
history commands including files that <em>actually exist</em> in my blog&rsquo;s scripts
folder, and not the dozens of other irrelevant <code>scripts/</code> commands I&rsquo;ve run in
other folders.</p>
<p>I didn&rsquo;t understand exactly how this worked until last week, it just felt like fish was
magically able to suggest the right commands. It still feels a little like magic and I love it.</p>
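<p>Here&rsquo;s a toy model of that behaviour in Go. It is not fish&rsquo;s implementation: the <code>entry</code> type, the <code>suggest</code> function, and the idea that each history entry remembers which of its arguments were paths are all assumptions made for illustration. The sketch only shows the general shape: walk history from newest to oldest and skip any entry whose paths don&rsquo;t exist in the current directory.</p>

```go
package main

import (
	"fmt"
	"strings"
)

// entry is a hypothetical history record: the command line plus the
// arguments that were file paths when the command ran.
type entry struct {
	cmd   string
	paths []string
}

// suggest walks history from newest to oldest and returns the first command
// that extends what was typed and whose recorded paths all still exist.
// history is ordered oldest first. This is a toy model, not fish's code.
func suggest(history []entry, typed string, exists func(path string) bool) (string, bool) {
	for i := len(history) - 1; i >= 0; i-- {
		e := history[i]
		if e.cmd == typed || !strings.HasPrefix(e.cmd, typed) {
			continue
		}
		if allExist(e.paths, exists) {
			return e.cmd, true
		}
	}
	return "", false
}

// allExist reports whether every path passes the exists check.
func allExist(paths []string, exists func(string) bool) bool {
	for _, p := range paths {
		if !exists(p) {
			return false
		}
	}
	return true
}

func main() {
	history := []entry{
		{cmd: "bash scripts/deploy.sh", paths: []string{"scripts/deploy.sh"}},
		{cmd: "bash scripts/other-project.sh", paths: []string{"scripts/other-project.sh"}},
	}
	// Pretend only scripts/deploy.sh exists in the current directory.
	here := map[string]bool{"scripts/deploy.sh": true}
	exists := func(p string) bool { return here[p] }

	got, ok := suggest(history, "bash scripts/", exists)
	fmt.Println(got, ok) // bash scripts/deploy.sh true
}
```

<p>Passing <code>exists</code> in as a function keeps the sketch easy to test; a real version would presumably check the filesystem instead.</p>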
<h3 id="3-pasting-multiline-commands">3. pasting multiline commands</h3>
<p>If I copy and paste multiple lines, bash will run them all, like this:</p>
<pre><code>[bork@grapefruit linux-playground (main)]$ echo hi
hi
[bork@grapefruit linux-playground (main)]$ touch blah
[bork@grapefruit linux-playground (main)]$ echo hi
hi
</code></pre>
<p>This is a bit alarming &ndash; what if I didn&rsquo;t actually <em>want</em> to run all those
commands?</p>
<p>Fish will paste them all at a single prompt, so that I can press Enter if I
actually want to run them. Much less scary.</p>
<pre><code>bork@grapefruit ~/work/&gt; echo hi

                         touch blah
                         echo hi
</code></pre>
<h3 id="4-nice-tab-completion">4. nice tab completion</h3>
<p>If I run <code>ls</code> and press tab, it&rsquo;ll display all the filenames in a nice grid. I can use either Tab, Shift+Tab, or the arrow keys to navigate the grid.</p>
<p>Also, I can tab complete from the <strong>middle</strong> of a filename &ndash; if the filename
starts with a weird character (or if it&rsquo;s just not very unique), I can type
some characters from the middle and press tab.</p>
<p>Here&rsquo;s what the tab completion looks like:</p>
<pre><code>bork@grapefruit ~/work/&gt; ls 
api/  blah.py     fly.toml   README.md
blah  Dockerfile  frontend/  test_websocket.sh
</code></pre>
<p>I honestly don&rsquo;t complete things other than filenames very much so I can&rsquo;t
speak to that, but I&rsquo;ve found the experience of tab completing filenames to be
very good.</p>
<h3 id="5-nice-default-prompt-including-git-integration">5. nice default prompt (including git integration)</h3>
<p>Fish&rsquo;s default prompt includes everything I want:</p>
<ul>
<li>username</li>
<li>hostname</li>
<li>current folder</li>
<li>git integration</li>
<li>status of last command exit (if the last command failed)</li>
</ul>
<p>Here&rsquo;s a screenshot with a few different variations on the default prompt,
including if the last command was interrupted (the <code>SIGINT</code>) or failed.</p>
<img src="https://jvns.ca/images/fish-prompt-2024.png">
<h3 id="6-nice-history-defaults">6. nice history defaults</h3>
<p>In bash, the maximum history size is 500 by default, presumably because
computers used to be slow and not have a lot of disk space. Also, by default,
commands don&rsquo;t get added to your history until you end your session. So if your
computer crashes, you lose some history.</p>
<p>In fish:</p>
<ol>
<li>the default history size is 256,000 commands. I don&rsquo;t see any reason I&rsquo;d ever need more.</li>
<li>if you open a new tab, everything you&rsquo;ve ever run (including commands in
open sessions) is immediately available to you</li>
<li>in an existing session, the history search will only include commands from
the current session, plus everything that was in history at the time that
you started the shell</li>
</ol>
<p>I&rsquo;m not sure how clearly I&rsquo;m explaining how fish&rsquo;s history system works here,
but it feels really good to me in practice. My impression is that the way it&rsquo;s
implemented is the commands are continually added to the history file, but fish
only loads the history file once, on startup.</p>
<p>I&rsquo;ll mention here that if you want to have a fancier history system in another
shell it might be worth checking out <a href="https://github.com/atuinsh/atuin">atuin</a> or <a href="https://github.com/junegunn/fzf">fzf</a>.</p>
<h3 id="7-press-up-arrow-to-search-history">7. press up arrow to search history</h3>
<p>I also like fish&rsquo;s interface for searching history: for example if I want to
edit my fish config file, I can just type:</p>
<pre><code>$ config.fish
</code></pre>
<p>and then press the up arrow to go back to the last command that included <code>config.fish</code>. That&rsquo;ll complete to:</p>
<pre><code>$ vim ~/.config/fish/config.fish
</code></pre>
<p>and I&rsquo;m done. This isn&rsquo;t <em>so</em> different from using <code>Ctrl+R</code> in bash to search
your history but I think I like it a little better overall, maybe because
<code>Ctrl+R</code> has some behaviours that I find confusing (for example you can
end up accidentally editing your history which I don&rsquo;t like).</p>
<h3 id="8-the-terminal-doesn-t-break">8. the terminal doesn&rsquo;t break</h3>
<p>I used to run into issues with bash where I&rsquo;d accidentally <code>cat</code> a binary to
the terminal, and it would break the terminal.</p>
<p>Every time fish displays a prompt, it&rsquo;ll try to fix up your terminal so that
you don&rsquo;t end up in weird situations like this. I think <a href="https://github.com/fish-shell/fish-shell/blob/a979b6341d7fc4c466b3992f25da3209e0808aaa/src/reader.rs#L3601-L3623">this is some of the
code in fish to prevent broken terminals</a>.</p>
<p>Some things that it does are:</p>
<ul>
<li>turn on <code>echo</code> so that you can see the characters you type</li>
<li>make sure that newlines work properly so that you don&rsquo;t get that weird staircase effect</li>
<li>reset your terminal background colour, etc</li>
</ul>
<p>I don&rsquo;t think I&rsquo;ve run into any of these &ldquo;my terminal is broken&rdquo; issues in a
very long time, and I actually didn&rsquo;t even realize that this was because of
fish &ndash; I thought that things somehow magically just got better, or maybe I
wasn&rsquo;t making as many mistakes. But I think it was mostly fish saving me from
myself, and I really appreciate that.</p>
<h3 id="9-ctrl-s-is-disabled">9. Ctrl+S is disabled</h3>
<p>Also related to terminals breaking: fish disables Ctrl+S (which freezes your
terminal and then you need to remember to press Ctrl+Q to unfreeze it). It&rsquo;s a
feature that I&rsquo;ve never wanted and I&rsquo;m happy to not have it.</p>
<p>Apparently you can disable <code>Ctrl+S</code> in other shells with <code>stty -ixon</code>.</p>
<h3 id="10-nice-syntax-highlighting">10. nice syntax highlighting</h3>
<p>By default commands that don&rsquo;t exist are highlighted in red, like this.</p>
<img src="https://jvns.ca/images/fish-syntax-2024.png">
<h3 id="11-easier-loops">11. easier loops</h3>
<p>I find the loop syntax in fish a lot easier to type than the bash syntax. It looks like this:</p>
<pre><code>for i in *.yaml
  echo $i
end
</code></pre>
<p>Also it&rsquo;ll add indentation in your loops which is nice.</p>
<h3 id="12-easier-multiline-editing">12. easier multiline editing</h3>
<p>Related to loops: you can edit multiline commands much more easily than in bash
(just use the arrow keys to navigate the multiline command!). Also when you use
the up arrow to get a multiline command from your history, it&rsquo;ll show you the
whole command the exact same way you typed it instead of squishing it all onto
one line like bash does:</p>
<pre><code>$ bash
$ for i in *.png
&gt; do
&gt; echo $i
&gt; done
$ # press up arrow
$ for i in *.png; do echo $i; done
</code></pre>
<h3 id="13-ctrl-left-arrow">13. Ctrl+left arrow</h3>
<p>This might just be me, but I really appreciate that fish has the <code>Ctrl+left arrow</code> / <code>Ctrl+right arrow</code> keyboard shortcut for moving between
words when writing a command.</p>
<p>I&rsquo;m honestly a bit confused about where this keyboard shortcut is coming from
(the only documented keyboard shortcut for this I can find in fish is <code>Alt+left arrow</code> / <code>Alt+right arrow</code> which seems to do the same thing), but I&rsquo;m pretty
sure this is a fish shortcut.</p>
<p>A couple of notes about getting this shortcut to work / where it comes from:</p>
<ul>
<li>one person said they needed to switch their terminal emulator from the &ldquo;Linux
console&rdquo; keybindings to &ldquo;Default (XFree 4)&rdquo; to get it to work in fish</li>
<li>on Mac OS, <code>Ctrl+left arrow</code> switches workspaces by default, so I had to turn
that off.</li>
<li>Also apparently Ubuntu configures libreadline in <code>/etc/inputrc</code> to make
<code>Ctrl+left/right arrow</code> go back/forward a word, so it&rsquo;ll work in bash on
Ubuntu and maybe other Linux distros too. Here&rsquo;s a <a href="https://stackoverflow.com/questions/5029118/bash-ctrl-to-move-cursor-between-words-strings">stack overflow question talking about that</a></li>
</ul>
<h3 id="a-downside-not-everything-has-a-fish-integration">a downside: not everything has a fish integration</h3>
<p>Sometimes tools don&rsquo;t have instructions for integrating them with fish. That&rsquo;s annoying, but:</p>
<ul>
<li>I&rsquo;ve found this has gotten better over the last 10 years as fish has gotten
more popular. For example Python&rsquo;s virtualenv has had a fish integration for
a long time now.</li>
<li>If I need to run a POSIX shell command real quick, I can always just run <code>bash</code> or <code>zsh</code></li>
<li>I&rsquo;ve gotten much better over the years at translating simple commands to fish syntax when I need to</li>
</ul>
<p>My biggest day-to-day annoyance is probably that for whatever reason I&rsquo;m
still not used to fish&rsquo;s syntax for setting environment variables: I get confused
about <code>set</code> vs <code>set -x</code>.</p>
<h3 id="another-downside-fish-add-path">another downside: <code>fish_add_path</code></h3>
<p>fish has a function called <code>fish_add_path</code> that you can run to add a directory
to your <code>PATH</code> like this:</p>
<pre><code>fish_add_path /some/directory
</code></pre>
<p>I love the idea of it and I used to use it all the time, but I&rsquo;ve stopped using
it for two reasons:</p>
<ol>
<li>Sometimes <code>fish_add_path</code> will update the <code>PATH</code> for every session in the
future (with a &ldquo;universal variable&rdquo;) and sometimes it will update the <code>PATH</code>
just for the current session. It&rsquo;s hard for me to tell which one it will
do: in theory the docs explain this but I could not understand them.</li>
<li>If you ever need to <em>remove</em> the directory from your <code>PATH</code> a few weeks or
months later because maybe you made a mistake, that&rsquo;s also kind of hard to do
(there are <a href="https://github.com/fish-shell/fish-shell/issues/8604">instructions in the comments of this GitHub issue though</a>).</li>
</ol>
<p>Instead I just update my PATH like this, similarly to how I&rsquo;d do it in bash:</p>
<pre><code>set PATH $PATH /some/directory/bin
</code></pre>
<h3 id="on-posix-compatibility">on POSIX compatibility</h3>
<p>When I started using fish, you couldn&rsquo;t do things like <code>cmd1 &amp;&amp; cmd2</code> &ndash; it
would complain &ldquo;no, you need to run <code>cmd1; and cmd2</code>&rdquo; instead.</p>
<p>It seems like over the years fish has started accepting a little more POSIX-style syntax than it used to, like:</p>
<ul>
<li><code>cmd1 &amp;&amp; cmd2</code></li>
<li><code>export a=b</code> to set an environment variable (though this seems a bit limited, you can&rsquo;t do <code>export PATH=$PATH:/whatever</code> so I think it&rsquo;s probably better to learn <code>set</code> instead)</li>
</ul>
<h3 id="on-fish-as-a-default-shell">on fish as a default shell</h3>
<p>Changing my default shell to fish is always a little annoying, I occasionally get myself into a situation where</p>
<ol>
<li>I install fish somewhere like maybe <code>/home/bork/.nix-stuff/bin/fish</code></li>
<li>I add the new fish location to <code>/etc/shells</code> as an allowed shell</li>
<li>I change my shell with <code>chsh</code></li>
<li>at some point months/years later I reinstall fish in a different location for some reason and remove the old one</li>
<li>oh no!!! I have no valid shell! I can&rsquo;t open a new terminal tab anymore!</li>
</ol>
<p>This has never been a major issue because I always have a terminal open
somewhere where I can fix the problem and rescue myself, but it&rsquo;s a bit
alarming.</p>
<p>If you don&rsquo;t want to use <code>chsh</code> to change your shell to fish (which is very reasonable,
maybe I shouldn&rsquo;t be doing that), the <a href="https://wiki.archlinux.org/title/Fish">Arch wiki page</a> has a couple of good suggestions &ndash;
either configure your terminal emulator to run fish or add an <code>exec fish</code> to
your <code>.bashrc</code>.</p>
<h3 id="i-ve-never-really-learned-the-scripting-language">I&rsquo;ve never really learned the scripting language</h3>
<p>Other than occasionally writing a for loop interactively on the command line,
I&rsquo;ve never really learned the fish scripting language. I still do all of my
shell scripting in bash.</p>
<p>I don&rsquo;t think I&rsquo;ve ever written a fish function or <code>if</code> statement.</p>
<h3 id="it-seems-like-fish-is-getting-pretty-popular">it seems like fish is getting pretty popular</h3>
<p>I ran a highly unscientific poll on Mastodon asking people what shell they <a href="https://social.jvns.ca/@b0rk/112722850642874842">use interactively</a>. The results were (of 2600 responses):</p>
<ul>
<li>46% bash</li>
<li>49% zsh</li>
<li>16% fish</li>
<li>5% other</li>
</ul>
<p>I think 16% for fish is pretty remarkable, since (as far as I know) there isn&rsquo;t
any system where fish is the default shell, and my sense is that it&rsquo;s very
common to just stick to whatever your system&rsquo;s default shell is.</p>
<p>It feels like a big achievement for the fish project, even if maybe my Mastodon
followers are more likely than the average shell user to use fish for some
reason.</p>
<h3 id="who-might-fish-be-right-for">who might fish be right for?</h3>
<p>Fish definitely isn&rsquo;t for everyone. I think I like it because:</p>
<ol>
<li>I really dislike configuring my shell (and honestly my dev environment in general), I want things to &ldquo;just work&rdquo; with the default settings</li>
<li>fish&rsquo;s defaults feel good to me</li>
<li>I don&rsquo;t spend that much time logged into random servers using other shells
so there&rsquo;s not too much context switching</li>
<li>I liked its features so much that I was willing to relearn how to do a few
&ldquo;basic&rdquo; shell things, like using parentheses <code>(seq 1 10)</code> to run a command
instead of backticks or using <code>set</code> instead of <code>export</code></li>
</ol>
<p>Maybe you&rsquo;re also a person who would like fish! I hope a few more of the people
who fish is for can find it, because I spend so much of my time in the terminal
and it&rsquo;s made that time much more pleasant.</p>
]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[Migrating Mess With DNS to use PowerDNS]]></title>
    <link href="https://jvns.ca/blog/2024/08/19/migrating-mess-with-dns-to-use-powerdns/"/>
    <updated>2024-08-19T08:15:28+00:00</updated>
    <id>https://jvns.ca/blog/2024/08/19/migrating-mess-with-dns-to-use-powerdns/</id>
    <content type="html"><![CDATA[<p>About 3 years ago, I announced <a href="https://messwithdns.net/">Mess With DNS</a> in
<a href="https://jvns.ca/blog/2021/12/15/mess-with-dns/">this blog post</a>, a playground
where you can learn how DNS works by messing around and creating records.</p>
<p>I wasn&rsquo;t very careful with the DNS implementation though (to quote the release blog
post: &ldquo;following the DNS RFCs? not exactly&rdquo;), and people started reporting
problems that eventually I decided that I wanted to fix.</p>
<h3 id="the-problems">the problems</h3>
<p>Some of the problems people have reported were:</p>
<ul>
<li>domain names with underscores weren&rsquo;t allowed, even though they should be</li>
<li>If there was a CNAME record for a domain name, it allowed you to create other records for that domain name, even if it shouldn&rsquo;t</li>
<li>you could create 2 different CNAME records for the same domain name, which shouldn&rsquo;t be allowed</li>
<li>no support for the SVCB or HTTPS record types, which seemed a little complex to implement</li>
<li>no support for upgrading from UDP to TCP for big responses</li>
</ul>
<p>And there are certainly more issues that nobody got around to reporting, for
example that if you added an NS record for a subdomain to delegate it, Mess
With DNS wouldn&rsquo;t handle the delegation properly.</p>
<h3 id="the-solution-powerdns">the solution: PowerDNS</h3>
<p>I wasn&rsquo;t sure how to fix these problems for a long time &ndash; technically I
<em>could</em> have started addressing them individually, but it felt like there were
a million edge cases and I&rsquo;d never get there.</p>
<p>But then one day I was chatting with someone else who was working on a DNS
server and they said they were using <a href="https://github.com/PowerDNS/pdns/">PowerDNS</a>: an open
source DNS server with an HTTP API!</p>
<p>This seemed like an obvious solution to my problems &ndash; I could just swap out my
own crappy DNS implementation for PowerDNS.</p>
<p>There were a couple of challenges I ran into when setting up PowerDNS that I&rsquo;ll
talk about here. I really don&rsquo;t do a lot of web development and I think I&rsquo;ve never
built a website that depends on a relatively complex API before, so it was a
bit of a learning experience.</p>
<h3 id="challenge-1-getting-every-query-made-to-the-dns-server">challenge 1: getting every query made to the DNS server</h3>
<p>One of the main things Mess With DNS does is give you a live view of every DNS
query it receives for your subdomain, using a websocket. To make this work, it
needs to intercept every DNS query before it gets sent to the PowerDNS DNS
server.</p>
<p>There were 2 options I could think of for how to intercept the DNS queries:</p>
<ol>
<li>dnstap: <code>dnsdist</code> (a DNS load balancer from the PowerDNS project) has
support for logging all DNS queries it receives using
<a href="https://dnstap.info/">dnstap</a>, so I could put dnsdist in front of PowerDNS
and then log queries that way</li>
<li>Have my Go server listen on port 53 and proxy the queries myself</li>
</ol>
<p>I originally implemented option #1, but for some reason there was a 1 second
delay before every query got logged. I couldn&rsquo;t figure out why, so I
implemented my own <a href="https://github.com/jvns/mess-with-dns/blob/3423c9496dd772f7157a56f9e068fd926e89c331/api/main.go#L265-L310">very simple proxy</a> instead.</p>
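<p>To give a rough idea of what option #2 involves, here&rsquo;s a minimal sketch of a UDP proxy in Go. It is not the linked implementation: the names <code>proxyOne</code> and <code>roundTrip</code> are made up, it handles a single packet instead of looping, it ignores TCP, and the &ldquo;upstream&rdquo; is a stand-in that just echoes a reply so the example can run on localhost without a real DNS server.</p>

```go
package main

import (
	"fmt"
	"net"
	"time"
)

// proxyOne reads one UDP packet from conn, hands it to onQuery (for example
// to log it or push it to a websocket), forwards it to upstream, and relays
// the upstream's reply back to the original client.
func proxyOne(conn net.PacketConn, upstream string, onQuery func(query []byte)) error {
	buf := make([]byte, 4096)
	n, client, err := conn.ReadFrom(buf)
	if err != nil {
		return err
	}
	query := buf[:n]
	onQuery(query)

	up, err := net.Dial("udp", upstream)
	if err != nil {
		return err
	}
	defer up.Close()
	up.SetDeadline(time.Now().Add(2 * time.Second))
	if _, err := up.Write(query); err != nil {
		return err
	}
	reply := make([]byte, 4096)
	m, err := up.Read(reply)
	if err != nil {
		return err
	}
	_, err = conn.WriteTo(reply[:m], client)
	return err
}

// roundTrip wires a stand-in upstream, the proxy, and a client together on
// localhost. It returns what the proxy saw and what the client received.
func roundTrip(msg string) (string, string, error) {
	upstream, err := net.ListenPacket("udp", "127.0.0.1:0")
	if err != nil {
		return "", "", err
	}
	defer upstream.Close()
	go func() { // fake server: answers one packet with "reply:" + packet
		buf := make([]byte, 4096)
		n, addr, err := upstream.ReadFrom(buf)
		if err == nil {
			upstream.WriteTo(append([]byte("reply:"), buf[:n]...), addr)
		}
	}()

	proxy, err := net.ListenPacket("udp", "127.0.0.1:0")
	if err != nil {
		return "", "", err
	}
	defer proxy.Close()
	seen := make(chan string, 1)
	go proxyOne(proxy, upstream.LocalAddr().String(), func(q []byte) { seen <- string(q) })

	client, err := net.Dial("udp", proxy.LocalAddr().String())
	if err != nil {
		return "", "", err
	}
	defer client.Close()
	client.SetDeadline(time.Now().Add(2 * time.Second))
	if _, err := client.Write([]byte(msg)); err != nil {
		return "", "", err
	}
	out := make([]byte, 4096)
	n, err := client.Read(out)
	if err != nil {
		return "", "", err
	}
	return <-seen, string(out[:n]), nil
}

func main() {
	logged, reply, err := roundTrip("hello")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("proxy saw %q, client got %q\n", logged, reply)
}
```

<p>The useful property is that <code>onQuery</code> runs before the packet is forwarded, so the proxy sees every query whether or not the upstream answers.</p>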
<h3 id="challenge-2-should-the-frontend-have-direct-access-to-the-powerdns-api">challenge 2: should the frontend have direct access to the PowerDNS API?</h3>
<p>The frontend used to have a lot of DNS logic in it &ndash; it converted emoji domain
names to ASCII using punycode, had a lookup table to convert numeric DNS query
types (like <code>1</code>) to their human-readable names (like <code>A</code>), did a little bit of
validation, and more.</p>
<p>Originally I considered keeping this pattern and just giving the frontend (more
or less) direct access to the PowerDNS API to create and delete, but writing
even more complex code in Javascript didn&rsquo;t feel that appealing to me &ndash; I
don&rsquo;t really know how to write tests in Javascript and it seemed like it
wouldn&rsquo;t end well.</p>
<p>So I decided to take all of the DNS logic out of the frontend and write a new
DNS API for managing records, shaped something like this:</p>
<ul>
<li><code>GET /records</code></li>
<li><code>DELETE /records/&lt;ID&gt;</code></li>
<li><code>DELETE /records/</code> (delete all records for a user)</li>
<li><code>POST /records/</code> (create record)</li>
<li><code>POST /records/&lt;ID&gt;</code> (update record)</li>
</ul>
<p>This meant that I could actually write tests for my code, since the backend is
in Go and I do know how to write tests in Go.</p>
<h3 id="what-i-learned-it-s-okay-for-an-api-to-duplicate-information">what I learned: it&rsquo;s okay for an API to duplicate information</h3>
<p>I had this idea that APIs shouldn&rsquo;t return duplicate information &ndash; for example
if I get a DNS record, it should only include a given piece of information
once.</p>
<p>But I ran into a problem with that idea when displaying MX records: an MX
record has 2 fields, &ldquo;preference&rdquo;, and &ldquo;mail server&rdquo;. And I needed to display
that information in 2 different ways on the frontend:</p>
<ol>
<li>In a form, where &ldquo;Preference&rdquo; and &ldquo;Mail Server&rdquo; are 2 different form fields (like <code>10</code> and <code>mail.example.com</code>)</li>
<li>In a summary view, where I wanted to just show the record (<code>10 mail.example.com</code>)</li>
</ol>
<p>This is kind of a small problem, but it came up in a few different places.</p>
<p>I talked to my friend Marco Rogers about this, and based on some advice from
him I realized that I could return the same information in the API in 2
different ways! Then the frontend just has to display it. So I started just
returning duplicate information in the API, something like this:</p>
<pre><code>{
  values: {'Preference': 10, 'Server': 'mail.example.com'},
  content: '10 mail.example.com',
  ...
}
</code></pre>
<p>I ended up using this pattern in a couple of other places where I needed to
display the same information in 2 different ways and it was SO much easier.</p>
<p>I think what I learned from this is that if I&rsquo;m making an API that isn&rsquo;t
intended for external use (there are no users of this API other than the
frontend!), I can tailor it very specifically to the frontend&rsquo;s needs and
that&rsquo;s okay.</p>
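<p>As a rough sketch of that pattern on the Go side: the type and function names below (<code>mxValues</code>, <code>mxResponse</code>, <code>newMXResponse</code>) are hypothetical and not taken from the Mess With DNS code. The point is only that the summary string is derived from the structured fields in one place on the backend, so the frontend never has to format or parse it.</p>

```go
package main

import (
	"encoding/json"
	"fmt"
)

// mxValues holds the MX record as separate form fields (hypothetical type).
type mxValues struct {
	Preference int    `json:"Preference"`
	Server     string `json:"Server"`
}

// mxResponse deliberately carries the same information twice: once as
// structured fields for a form, once as a ready-to-display string.
type mxResponse struct {
	Values  mxValues `json:"values"`
	Content string   `json:"content"`
}

// newMXResponse builds both representations from the same inputs so they
// cannot drift apart.
func newMXResponse(preference int, server string) mxResponse {
	return mxResponse{
		Values:  mxValues{Preference: preference, Server: server},
		Content: fmt.Sprintf("%d %s", preference, server),
	}
}

func main() {
	out, err := json.Marshal(newMXResponse(10, "mail.example.com"))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(string(out))
}
```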
<h3 id="challenge-3-what-s-a-record-s-id">challenge 3: what&rsquo;s a record&rsquo;s ID?</h3>
<p>In Mess With DNS (and I think in most DNS user interfaces!), you create, update, and delete <strong>records</strong>.</p>
<p>But that&rsquo;s not how the PowerDNS API works. In PowerDNS, you create a <strong>zone</strong>,
which is made of <strong>record sets</strong>. Records don&rsquo;t have any ID in the API at all.</p>
<p>I ended up solving this by generating a fake ID for each record, which is made of:</p>
<ul>
<li>its <strong>name</strong></li>
<li>its <strong>type</strong></li>
<li>and its <strong>content</strong> (base64-encoded)</li>
</ul>
<p>For example one record&rsquo;s ID is <code>brooch225.messwithdns.com.|NS|bnMxLm1lc3N3aXRoZG5zLmNvbS4=</code></p>
<p>Then I can search through the zone and find the appropriate record to update
it.</p>
<p>This means that if you update a record then its ID will change which isn&rsquo;t
usually what I want in an ID, but that seems fine.</p>
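<p>Here&rsquo;s a minimal sketch of building and parsing an ID in that format. The function names <code>recordID</code> and <code>parseRecordID</code> are made up, and the use of Go&rsquo;s standard (padded) base64 alphabet is an assumption based on the example ID above, which it reproduces.</p>

```go
package main

import (
	"encoding/base64"
	"fmt"
	"strings"
)

// recordID joins name, type, and base64-encoded content with "|".
// Encoding the content keeps any "|" or spaces in it out of the ID.
func recordID(name, rtype, content string) string {
	return name + "|" + rtype + "|" + base64.StdEncoding.EncodeToString([]byte(content))
}

// parseRecordID splits an ID back into its three parts.
func parseRecordID(id string) (name, rtype, content string, err error) {
	parts := strings.Split(id, "|")
	if len(parts) != 3 {
		return "", "", "", fmt.Errorf("malformed record ID %q", id)
	}
	decoded, err := base64.StdEncoding.DecodeString(parts[2])
	if err != nil {
		return "", "", "", fmt.Errorf("bad content in record ID %q: %w", id, err)
	}
	return parts[0], parts[1], string(decoded), nil
}

func main() {
	id := recordID("brooch225.messwithdns.com.", "NS", "ns1.messwithdns.com.")
	fmt.Println(id) // brooch225.messwithdns.com.|NS|bnMxLm1lc3N3aXRoZG5zLmNvbS4=

	name, rtype, content, err := parseRecordID(id)
	fmt.Println(name, rtype, content, err)
}
```

<p>Because the content is part of the ID, updating a record&rsquo;s content produces a different ID, which matches the behaviour described above.</p>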
<h3 id="challenge-4-making-clear-error-messages">challenge 4: making clear error messages</h3>
<p>I think the error messages that the PowerDNS API returns aren&rsquo;t really intended to be shown to end users, for example:</p>
<ul>
<li><code>Name 'new\032site.island358.messwithdns.com.' contains unsupported characters</code> (this error encodes the space as <code>\032</code>, which is a bit disorienting if you don&rsquo;t know that the space character is 32 in ASCII)</li>
<li><code>RRset test.pear5.messwithdns.com. IN CNAME: Conflicts with pre-existing RRset</code> (this talks about RRsets, which aren&rsquo;t a concept that the Mess With DNS UI has at all)</li>
<li><code>Record orange.beryl5.messwithdns.com./A '1.2.3.4$': Parsing record content (try 'pdnsutil check-zone'): unable to parse IP address, strange character: $</code> (mentions &ldquo;pdnsutil&rdquo;, a utility which Mess With DNS&rsquo;s users don&rsquo;t have
access to in this context)</li>
</ul>
<p>I ended up handling this in two ways:</p>
<ol>
<li>Do some initial basic validation of values that users enter (like IP addresses), so I can just return errors like <code>Invalid IPv4 address: &quot;1.2.3.4$&quot;</code></li>
<li>If that goes well, send the request to PowerDNS and if we get an error back, then do some <a href="https://github.com/jvns/mess-with-dns/blob/c02579190e103218b2c8dfc6dceb19f863752f15/api/records/pdns_errors.go">hacky translation</a> of those messages to make them clearer.</li>
</ol>
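<p>The two steps could be sketched in Go roughly like this. The translation rules here are hypothetical examples and are not the contents of the linked <code>pdns_errors.go</code>; the function names and the friendlier wording are also made up. The only facts taken from above are the PowerDNS messages themselves and the <code>Invalid IPv4 address</code> error format.</p>

```go
package main

import (
	"fmt"
	"net/netip"
	"strings"
)

// validateIPv4 is the "basic validation first" step: reject bad input before
// it ever reaches PowerDNS, with a message written for end users.
func validateIPv4(s string) error {
	addr, err := netip.ParseAddr(s)
	if err != nil || !addr.Is4() {
		return fmt.Errorf("Invalid IPv4 address: %q", s)
	}
	return nil
}

// translatePDNSError rewrites a PowerDNS error message into something
// clearer. Both rules are illustrative assumptions, not the real list.
func translatePDNSError(msg string) string {
	if strings.Contains(msg, "Conflicts with pre-existing RRset") {
		return "This record conflicts with a record that already exists for this name"
	}
	// PowerDNS shows a space inside a name as the decimal escape \032.
	return strings.ReplaceAll(msg, `\032`, " ")
}

func main() {
	fmt.Println(validateIPv4("1.2.3.4"))
	fmt.Println(validateIPv4("1.2.3.4$"))
	fmt.Println(translatePDNSError(`Name 'new\032site.island358.messwithdns.com.' contains unsupported characters`))
	fmt.Println(translatePDNSError("RRset test.pear5.messwithdns.com. IN CNAME: Conflicts with pre-existing RRset"))
}
```

<p>Messages that match no rule pass through unchanged, which fits the approach of logging what users see and adding translations later.</p>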
<p>Sometimes users will still get errors from PowerDNS directly, but I added some
logging of all the errors that users see, so hopefully I can review them and
add extra translations if there are other common errors that come up.</p>
<p>I think what I learned from this is that if I&rsquo;m building a user-facing
application on top of an API, I need to be pretty thoughtful about how I
resurface those errors to users.</p>
<h3 id="challenge-5-setting-up-sqlite">challenge 5: setting up SQLite</h3>
<p>Previously Mess With DNS was using a Postgres database. This was problematic
because I only gave the Postgres machine 256MB of RAM, which meant that the
database got OOM killed almost every single day. I never really worked out
exactly why it got OOM killed every day, but that&rsquo;s how it was. I spent some
time trying to tune Postgres&rsquo; memory usage by setting the max connections /
<code>work_mem</code> / <code>maintenance_work_mem</code> and it helped a bit but didn&rsquo;t solve the
problem.</p>
<p>So for this refactor I decided to use SQLite instead, because the website
doesn&rsquo;t really get that much traffic. There are some choices involved with
using SQLite, and I decided to:</p>
<ol>
<li>Run <code>db.SetMaxOpenConns(1)</code> to make sure that we only open 1 connection to
the database at a time, to prevent <code>SQLITE_BUSY</code> errors from two threads
trying to access the database at the same time (just setting WAL mode didn&rsquo;t
work)</li>
<li>Use separate databases for each of the 3 tables (users, records, and
requests) to reduce contention. This maybe isn&rsquo;t really necessary, but there
was no reason I needed the tables to be in the same database so I figured I&rsquo;d set
up separate databases to be safe.</li>
<li>Use the cgo-free <a href="https://pkg.go.dev/modernc.org/sqlite?utm_source=godoc">modernc.org/sqlite</a>, which <a href="https://datastation.multiprocess.io/blog/2022-05-12-sqlite-in-go-with-and-without-cgo.html">translates SQLite&rsquo;s source code to Go</a>.
I might switch to a more &ldquo;normal&rdquo; sqlite implementation instead at some point and use cgo though.
I think the main reason I prefer to avoid cgo is that cgo has landed me with <a href="https://jvns.ca/blog/2021/11/17/debugging-a-weird--file-not-found--error/">difficult-to-debug errors in the past</a>.</li>
<li>use WAL mode</li>
</ol>
<p>I still haven&rsquo;t set up backups, though I don&rsquo;t think my Postgres database had
backups either. I think I&rsquo;m unlikely to use
<a href="https://litestream.io/">litestream</a> for backups &ndash; Mess With DNS is very far
from a critical application, and I think daily backups that I could recover
from in case of a disaster are more than good enough.</p>
<h3 id="challenge-6-upgrading-vue-managing-forms">challenge 6: upgrading Vue &amp; managing forms</h3>
<p>This has nothing to do with PowerDNS but I decided to upgrade Vue.js from
version 2 to 3 as part of this refresh. The main problem with that is that the
form validation library I was using (FormKit) completely changed its API
between Vue 2 and Vue 3, so I decided to just stop using it instead of learning
the new API.</p>
<p>I ended up switching to some form validation tools that are built into the
browser like <code>required</code> and <code>oninvalid</code> (<a href="https://github.com/jvns/mess-with-dns/blob/90f7a2d2982c8151a3ddcab532bc1db07a043f84/frontend/components/NewRecord.html#L5-L8">here&rsquo;s the code</a>).
I think it could use some improvement, I still don&rsquo;t understand forms very well.</p>
<h3 id="challenge-7-managing-state-in-the-frontend">challenge 7: managing state in the frontend</h3>
<p>This also has nothing to do with PowerDNS, but when modifying the frontend I
realized that my state management in the frontend was a mess &ndash; every time I
made an API request to the backend that modified the state, I had to remember
to add a &ldquo;refresh records&rdquo; call afterwards, and I wasn&rsquo;t always consistent
about it.</p>
<p>With some more advice from Marco, I ended up implementing a single global
<a href="https://github.com/jvns/mess-with-dns/blob/90f7a2d2982c8151a3ddcab532bc1db07a043f84/frontend/store.ts#L32-L44">state management store</a>
which stores all the state for the application, and which lets me
create/update/delete records.</p>
<p>Then my components can just call <code>store.createRecord(record)</code>, and the store
will automatically resynchronize all of the state as needed.</p>
<h3 id="challenge-8-sequencing-the-project">challenge 8: sequencing the project</h3>
<p>This project ended up having several steps because I reworked the whole
integration between the frontend and the backend. I ended up splitting it into
a few different phases:</p>
<ol>
<li>Upgrade Vue from v2 to v3</li>
<li>Make the state management store</li>
<li>Implement a different backend API, move a lot of DNS logic out of the frontend, and add tests for the backend</li>
<li>Integrate PowerDNS</li>
</ol>
<p>I made sure that the website was (more or less) 100% working and then deployed
it in between phases, so that the amount of changes I was managing at a time
stayed somewhat under control.</p>
<h3 id="the-new-website-is-up-now">the new website is up now!</h3>
<p>I released the upgraded website a few days ago and it seems to work!
The PowerDNS API has been great to work on top of, and I&rsquo;m relieved that
there&rsquo;s a whole class of problems that I now don&rsquo;t have to think about at all,
other than potentially trying to make the error messages from PowerDNS a little
clearer. Using PowerDNS has fixed a lot of the DNS issues that folks have
reported in the last few years and it feels great.</p>
<p>If you run into problems with the new Mess With DNS I&rsquo;d love to <a href="https://github.com/jvns/mess-with-dns/issues/">hear about them here</a>.</p>
]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[Go structs are copied on assignment (and other things about Go I'd missed)]]></title>
    <link href="https://jvns.ca/blog/2024/08/06/go-structs-copied-on-assignment/"/>
    <updated>2024-08-06T08:38:35+00:00</updated>
    <id>https://jvns.ca/blog/2024/08/06/go-structs-copied-on-assignment/</id>
    <content type="html"><![CDATA[<p>I&rsquo;ve been writing Go pretty casually for years &ndash; the backends for all of my
playgrounds (<a href="https://nginx-playground.wizardzines.com/">nginx</a>, <a href="https://messwithdns.net/">dns</a>, <a href="https://memory-spy.wizardzines.com/">memory</a>, <a href="https://dns-lookup.jvns.ca/">more DNS</a>) are written in Go, but many of those projects are just a few hundred lines and I don&rsquo;t come back to those codebases much.</p>
<p>I thought I more or less understood the basics of the language, but this week
I&rsquo;ve been writing a lot more Go than usual while working on some upgrades to
<a href="https://messwithdns.net">Mess with DNS</a>, and ran into a bug that revealed I
was missing a very basic concept!</p>
<p>Then I posted about this on Mastodon and someone linked me to this very cool
site (and book) called <a href="https://100go.co">100 Go Mistakes and How To Avoid Them</a> by <a href="https://teivah.dev/">Teiva Harsanyi</a>. It just came out in 2022 so it&rsquo;s relatively new.</p>
<p>I decided to read through the site to see what <em>else</em> I was missing, and found
a couple of other misconceptions I had about Go. I&rsquo;ll talk about some of the
mistakes that jumped out to me the most, but really the whole
<a href="https://100go.co/">100 Go Mistakes</a> site is great and I&rsquo;d recommend reading it.</p>
<p>Here&rsquo;s the initial mistake that started me on this journey:</p>
<h3 id="mistake-1-not-understanding-that-structs-are-copied-on-assignment">mistake 1: not understanding that structs are copied on assignment</h3>
<p>Let&rsquo;s say we have a struct:</p>
<pre><code>type Thing struct {
    Name string
}
</code></pre>
<p>and this code:</p>
<pre><code>thing := Thing{&quot;record&quot;}
other_thing := thing
other_thing.Name = &quot;banana&quot;
fmt.Println(thing)
</code></pre>
<p>This prints <code>{record}</code> and not <code>{banana}</code> (<a href="https://go.dev/play/p/kUeP2ocFtXw">play.go.dev link</a>), because <code>thing</code> is copied when you
assign it to <code>other_thing</code>.</p>
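<p>Here&rsquo;s a minimal sketch of the difference between assigning the struct itself and assigning a pointer to it. The function names <code>renameCopy</code> and <code>renameViaPointer</code> are made up for this illustration:</p>

```go
package main

import "fmt"

// Thing mirrors the struct from the example above.
type Thing struct {
	Name string
}

// renameCopy assigns the struct to a new variable (which copies it),
// renames the copy, and returns the name of the original.
func renameCopy(thing Thing) string {
	other := thing // copies the whole struct
	other.Name = "banana"
	return thing.Name
}

// renameViaPointer assigns a pointer instead, so both variables
// refer to the same struct.
func renameViaPointer(thing Thing) string {
	other := &thing // no copy: other points at thing
	other.Name = "banana"
	return thing.Name
}

func main() {
	fmt.Println(renameCopy(Thing{"record"}))       // record
	fmt.Println(renameViaPointer(Thing{"record"})) // banana
}
```

<p>So if you want two variables to refer to the same struct, one of them has to be a pointer.</p>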
<h3 id="the-problem-this-caused-me-ranges">the problem this caused me: ranges</h3>
<p>The bug I spent 2 hours of my life debugging last week was effectively this code (<a href="https://go.dev/play/p/85FnGG86UBP">play.go.dev link</a>):</p>
<pre><code>package main

import &quot;fmt&quot;

type Thing struct {
  Name string
}

func findThing(things []Thing, name string) *Thing {
  for _, thing := range things {
    if thing.Name == name {
      // BUG: thing is a copy of the slice element, so this is a pointer to the copy
      return &amp;thing
    }
  }
  return nil
}

func main() {
  things := []Thing{Thing{&quot;record&quot;}, Thing{&quot;banana&quot;}}
  thing := findThing(things, &quot;record&quot;)
  thing.Name = &quot;gramophone&quot;
  fmt.Println(things)
}
</code></pre>
<p>This prints out <code>[{record} {banana}]</code> &ndash; because <code>findThing</code> returned a pointer to a copy (the loop variable <code>thing</code>), we didn&rsquo;t change the name in the original slice.</p>
<p>This mistake is <a href="https://100go.co/#ignoring-that-elements-are-copied-in-range-loops-30">#30 in 100 Go Mistakes</a>.</p>
<p>I fixed the bug by changing it to something like this (<a href="https://go.dev/play/p/CKZCRUwv_nG">play.go.dev link</a>), which returns a
pointer to the item in the slice we&rsquo;re looking for instead of a pointer to a copy.</p>
<pre><code>func findThing(things []Thing, name string) *Thing {
  for i := range things {
    if things[i].Name == name {
      return &amp;things[i]
    }
  }
  return nil
}
</code></pre>
<h3 id="why-didn-t-i-realize-this">why didn&rsquo;t I realize this?</h3>
<p>When I learned that I was mistaken about how assignment worked in Go I was
really taken aback, like &ndash; it&rsquo;s such a basic fact about how the language works!
If I was wrong about that then what ELSE am I wrong about in Go????</p>
<p>My best guess for what happened is:</p>
<ol>
<li>I&rsquo;ve heard for my whole life that when you define a function,
you need to think about whether its arguments are passed by <strong>reference</strong> or
by <strong>value</strong></li>
<li>So I&rsquo;d thought about this in Go, and I knew that if you pass a struct as a
value to a function, it gets copied &ndash; if you want to pass a reference then
you have to pass a pointer</li>
<li>But somehow it never occurred to me that you need to think about the same
thing for <strong>assignments</strong>, perhaps because in most of the other languages I
use (Python, JS, Java) I think everything is a reference anyway. The exception is
Rust, where you do have values that you make copies of, but I think most of the time I had to run <code>.clone()</code> explicitly
(though apparently structs will be automatically copied on assignment if the struct implements the <code>Copy</code> trait).</li>
<li>Also obviously I just don&rsquo;t write that much Go so I guess it&rsquo;s never come
up.</li>
</ol>
<h3 id="mistake-2-side-effects-appending-slices-25-https-100go-co-unexpected-side-effects-using-slice-append-25">mistake 2: side effects appending slices (<a href="https://100go.co/#unexpected-side-effects-using-slice-append-25">#25</a>)</h3>
<p>When you subset a slice with <code>x[2:3]</code>, the original slice and the sub-slice
share the same backing array, so if you append to the new slice, it can
unintentionally change the old slice.</p>
<p>For example, this code prints <code>[1 2 3 555 5]</code> (<a href="https://go.dev/play/p/qssfM_NSXJD">code on play.go.dev</a>):</p>
<pre><code>x := []int{1, 2, 3, 4, 5}
y := x[2:3]
y = append(y, 555)
fmt.Println(x)
</code></pre>
<p>I don&rsquo;t think this has ever actually happened to me, but it&rsquo;s alarming and I&rsquo;m
very happy to know about it.</p>
<p>Apparently you can avoid this problem by changing <code>y := x[2:3]</code> to <code>y := x[2:3:3]</code>, which restricts the new slice&rsquo;s capacity so that appending to it
will allocate a new backing array instead of writing into the old one. Here&rsquo;s some <a href="https://go.dev/play/p/aE78JUL4-Iv">code on play.go.dev</a> that does that.</p>
<h3 id="mistake-3-not-understanding-the-different-types-of-method-receivers-42">mistake 3: not understanding the different types of method receivers (#42)</h3>
<p>This one isn&rsquo;t a &ldquo;mistake&rdquo; exactly, but it&rsquo;s been a source of confusion for me
and it&rsquo;s pretty simple so I&rsquo;m glad to have it cleared up.</p>
<p>In Go you can declare methods in 2 different ways:</p>
<ol>
<li><code>func (t Thing) Function()</code> (a &ldquo;value receiver&rdquo;)</li>
<li><code>func (t *Thing) Function()</code> (a &ldquo;pointer receiver&rdquo;)</li>
</ol>
<p>My understanding now is that basically:</p>
<ul>
<li>If you want the method to mutate the struct <code>t</code>, you need a pointer receiver.</li>
<li>If you want to make sure the method <strong>doesn&rsquo;t</strong> mutate the struct <code>t</code>, use a value receiver.</li>
</ul>
<p><a href="https://100go.co/#not-knowing-which-type-of-receiver-to-use-42">Explanation #42</a> has a
bunch of other interesting details though. There&rsquo;s definitely still something
I&rsquo;m missing about value vs pointer receivers (I got a compile error related to
them a couple of times in the last week that I still don&rsquo;t understand), but
hopefully I&rsquo;ll run into that error again soon and I can figure it out.</p>
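<p>The basic value vs pointer receiver difference can be sketched like this. The method names <code>RenameValue</code> and <code>RenamePointer</code> are made up for this illustration, and it doesn&rsquo;t cover the other details in #42:</p>

```go
package main

import "fmt"

type Thing struct {
	Name string
}

// RenameValue has a value receiver: t is a copy of the caller's struct,
// so the assignment is lost when the method returns.
func (t Thing) RenameValue(name string) {
	t.Name = name
}

// RenamePointer has a pointer receiver: t points at the caller's struct,
// so the assignment is visible to the caller.
func (t *Thing) RenamePointer(name string) {
	t.Name = name
}

func main() {
	thing := Thing{"record"}

	thing.RenameValue("banana")
	fmt.Println(thing.Name) // record

	// thing is addressable, so Go calls this as (&thing).RenamePointer(...)
	thing.RenamePointer("banana")
	fmt.Println(thing.Name) // banana
}
```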
<h3 id="more-interesting-things-i-noticed">more interesting things I noticed</h3>
<p>Some more notes from 100 Go Mistakes:</p>
<ul>
<li>apparently you can <a href="https://100go.co/#never-using-named-result-parameters-43">name the outputs of your function (#43)</a>, though that can have <a href="https://100go.co/#unintended-side-effects-with-named-result-parameters-44">issues (#44)</a> and I&rsquo;m not sure I want to</li>
<li><a href="https://100go.co/#not-exploring-all-the-go-testing-features-90">apparently you can put tests in a different package (#90)</a> to
ensure that you only use the package&rsquo;s public interfaces, which seems really
useful</li>
<li>there are lots of notes about how to use contexts, channels, goroutines,
mutexes, sync.WaitGroup, etc. I&rsquo;m sure I have something to learn about all of
those but today is not the day I&rsquo;m going to learn them.</li>
</ul>
<p>Also there are some things that have tripped me up in the past, like:</p>
<ul>
<li><a href="https://100go.co/#forgetting-the-return-statement-after-replying-to-an-http-request-80">forgetting the return statement after replying to an HTTP request (#80)</a></li>
<li><a href="https://100go.co/#not-using-testing-utility-packages-httptest-and-iotest-88">not realizing the httptest package exists (#88)</a></li>
</ul>
<h3 id="this-100-common-mistakes-format-is-great">this &ldquo;100 common mistakes&rdquo; format is great</h3>
<p>I really appreciated this &ldquo;100 common mistakes&rdquo; format &ndash; it made it really
easy for me to skim through the mistakes and very quickly mentally classify
them into:</p>
<ol>
<li>yep, I know that</li>
<li>not interested in that one right now</li>
<li>WOW WAIT I DID NOT KNOW THAT, THAT IS VERY USEFUL!!!!</li>
</ol>
<p>It looks like &ldquo;100 Common Mistakes&rdquo; is a series of books from Manning and they
also have &ldquo;100 Java Mistakes&rdquo; and an upcoming &ldquo;100 SQL Server Mistakes&rdquo;.</p>
<p>Also I enjoyed what I&rsquo;ve read of <a href="https://effectivepython.com/">Effective Python</a> by Brett Slatkin, which has a similar &ldquo;here are a bunch of
short Python style tips&rdquo; structure where you can quickly skim it and take
what&rsquo;s useful to you. There&rsquo;s also Effective C++, Effective Java, and probably
more.</p>
<h3 id="some-other-go-resources">some other Go resources</h3>
<p>other resources I&rsquo;ve appreciated:</p>
<ul>
<li><a href="https://gobyexample.com/">Go by example</a> for basic syntax</li>
<li><a href="https://go.dev/play/">go.dev/play</a></li>
<li>obviously <a href="https://pkg.go.dev">https://pkg.go.dev</a> for documentation about literally everything</li>
<li><a href="https://staticcheck.dev/">staticcheck</a> seems like a useful linter &ndash; for
example I just started using it to tell me when I&rsquo;ve forgotten to handle an
error</li>
<li>apparently <a href="https://golangci-lint.run/">golangci-lint</a> includes a bunch of different linters</li>
</ul>
]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[Entering text in the terminal is complicated]]></title>
    <link href="https://jvns.ca/blog/2024/07/08/readline/"/>
    <updated>2024-07-08T13:00:15+00:00</updated>
    <id>https://jvns.ca/blog/2024/07/08/readline/</id>
    <content type="html"><![CDATA[<p>The other day I asked what folks on Mastodon find confusing about working in
the terminal, and one thing that stood out to me was &ldquo;editing a command you
already typed in&rdquo;.</p>
<p>This really resonated with me: even though entering some text and editing it is
a very &ldquo;basic&rdquo; task, it took me maybe 15 years of using the terminal every
single day to get used to using <code>Ctrl+A</code> to go to the beginning of the line (or
<code>Ctrl+E</code> for the end &ndash; I think I used <code>Home</code>/<code>End</code> instead).</p>
<p>So let&rsquo;s talk about why entering text might be hard! I&rsquo;ll also share a few tips
that I wish I&rsquo;d learned earlier.</p>
<h3 id="it-s-very-inconsistent-between-programs">it&rsquo;s very inconsistent between programs</h3>
<p>A big part of what makes entering text in the terminal hard is the
inconsistency between how different programs handle entering text. For example:</p>
<ol>
<li>some programs (<code>cat</code>, <code>nc</code>, <code>git commit --interactive</code>, etc) don&rsquo;t support using arrow keys at all: if you press arrow keys, you&rsquo;ll just see <code>^[[D^[[D^[[C^[[C^</code></li>
<li>many programs (like <code>irb</code>, <code>python3</code> on a Linux machine and many many more) use the <code>readline</code> library, which gives you a lot of basic functionality (history, arrow keys, etc)</li>
<li>some programs (like <code>/usr/bin/python3</code> on my Mac) do support very basic features like arrow keys, but not other features like <code>Ctrl+left</code> or reverse searching with <code>Ctrl+R</code></li>
<li>some programs (like the <code>fish</code> shell or <code>ipython3</code> or <code>micro</code> or <code>vim</code>) have their own fancy system for accepting input which is totally custom</li>
</ol>
<p>So there&rsquo;s a lot of variation! Let&rsquo;s talk about each of those a little more.</p>
<h3 id="mode-1-the-baseline">mode 1: the baseline</h3>
<p>First, there&rsquo;s &ldquo;the baseline&rdquo; &ndash; what happens if a program just accepts text by
calling <code>fgets()</code> or whatever and doing absolutely nothing else to provide a
nicer experience. Here&rsquo;s what using these tools typically looks like for me: if I
start the version of <a href="https://wiki.archlinux.org/title/Dash">dash</a> installed on
my machine (a pretty minimal shell) and press the left arrow key, it just prints
<code>^[[D</code> to the terminal.</p>
<pre><code>$ ls l-^[[D^[[D^[[D
</code></pre>
<p>At first it doesn&rsquo;t seem like all of these &ldquo;baseline&rdquo; tools have much in
common, but there are actually a few features that you get for free just from
your terminal, without the program needing to do anything special at all.</p>
<p>The things you get for free are:</p>
<ol>
<li>typing in text, obviously</li>
<li>backspace</li>
<li><code>Ctrl+W</code>, to delete the previous word</li>
<li><code>Ctrl+U</code>, to delete the whole line</li>
<li>a few other things unrelated to text editing (like <code>Ctrl+C</code> to interrupt the process, <code>Ctrl+Z</code> to suspend, etc)</li>
</ol>
<p>This is not <em>great</em>, but it means that if you want to delete a word you
generally can do it with <code>Ctrl+W</code> instead of pressing backspace 15 times, even
if you&rsquo;re in an environment which is offering you absolutely zero features.</p>
<p>You can get a list of all the ctrl codes that your terminal supports with <code>stty -a</code>.</p>
<h3 id="mode-2-tools-that-use-readline">mode 2: tools that use <code>readline</code></h3>
<p>The next group is tools that use readline! Readline is a GNU library to make
entering text more pleasant, and it&rsquo;s very widely used.</p>
<p>My favourite readline keyboard shortcuts are:</p>
<ol>
<li><code>Ctrl+E</code> (or <code>End</code>) to go to the end of the line</li>
<li><code>Ctrl+A</code> (or <code>Home</code>) to go to the beginning of the line</li>
<li><code>Ctrl+left/right arrow</code> to go back/forward 1 word</li>
<li>up arrow to go back to the previous command</li>
<li><code>Ctrl+R</code> to search your history</li>
</ol>
<p>And you can use <code>Ctrl+W</code> / <code>Ctrl+U</code> from the &ldquo;baseline&rdquo; list, though <code>Ctrl+U</code>
deletes from the cursor to the beginning of the line instead of deleting the
whole line. I think <code>Ctrl+W</code> might also have a slightly different definition of
what a &ldquo;word&rdquo; is.</p>
<p>There are a lot more (<a href="https://www.man7.org/linux/man-pages/man3/readline.3.html#EDITING_COMMANDS">here&rsquo;s a full list</a>), but those are the only ones that I personally use.</p>
<p>The <code>bash</code> shell is probably the most famous readline user (when you use
<code>Ctrl+R</code> to search your history in bash, that feature actually comes from
readline), but there are TONS of programs that use it &ndash; for example <code>psql</code>,
<code>irb</code>, <code>python3</code>, etc.</p>
<h3 id="tip-you-can-make-anything-use-readline-with-rlwrap">tip: you can make ANYTHING use readline with <code>rlwrap</code></h3>
<p>One of my absolute favourite things is that if you have a program like <code>nc</code>
without readline support, you can just run <code>rlwrap nc</code> to turn it into a
program with readline support!</p>
<p>This is incredible and makes a lot of tools that are borderline unusable MUCH
more pleasant to use. You can even apparently set up <a href="https://github.com/hanslub42/rlwrap">rlwrap</a> to include your own
custom autocompletions, though I&rsquo;ve never tried that.</p>
<h3 id="some-reasons-tools-might-not-use-readline">some reasons tools might not use readline</h3>
<p>I think the reasons a tool might not use readline include:</p>
<ul>
<li>the program is very simple (like <code>cat</code> or <code>nc</code>) and maybe the maintainers don&rsquo;t want to bring in a relatively large dependency</li>
<li>license reasons, if the program&rsquo;s license is not GPL-compatible &ndash; readline is GPL-licensed, not LGPL</li>
<li>only a very small part of the program is interactive, and maybe readline
support isn&rsquo;t seen as important. For example <code>git</code> has a few interactive
features (like <code>git add -p</code>), but not very many, and usually you&rsquo;re just
typing a single character like <code>y</code> or <code>n</code> &ndash; most of the time when you really need to
type something significant in git, it&rsquo;ll drop you into a text editor instead.</li>
</ul>
<p>For example idris2 says <a href="https://idris2.readthedocs.io/en/latest/tutorial/interactive.html#editing-at-the-repl">they don&rsquo;t use readline</a>
to keep dependencies minimal and suggest using <code>rlwrap</code> to get better
interactive features.</p>
<h3 id="how-to-know-if-you-re-using-readline">how to know if you&rsquo;re using readline</h3>
<p>The simplest test I can think of is to press <code>Ctrl+R</code>, and if you see:</p>
<pre><code>(reverse-i-search)`':
</code></pre>
<p>then you&rsquo;re probably using readline. This obviously isn&rsquo;t a guarantee (some
other library could use the term <code>reverse-i-search</code> too!), but I don&rsquo;t know of
another system that uses that specific term to refer to searching history.</p>
<h3 id="the-readline-keybindings-come-from-emacs">the readline keybindings come from Emacs</h3>
<p>Because I&rsquo;m a vim user, it took me a very long time to understand where these
keybindings come from (why <code>Ctrl+A</code> to go to the beginning of a line??? so
weird!)</p>
<p>My understanding is these keybindings actually come from Emacs &ndash; <code>Ctrl+A</code> and
<code>Ctrl+E</code> do the same thing in Emacs as they do in Readline and I assume the
other keyboard shortcuts mostly do as well, though I tried out <code>Ctrl+W</code> and
<code>Ctrl+U</code> in Emacs and they don&rsquo;t do the same thing as they do in the terminal
so I guess there are some differences.</p>
<p>There&rsquo;s some more <a href="https://twobithistory.org/2019/08/22/readline.html">history of the Readline project here</a>.</p>
<h3 id="mode-3-another-input-library-like-libedit">mode 3: another input library (like <code>libedit</code>)</h3>
<p>On my Mac laptop, <code>/usr/bin/python3</code> is in a weird middle ground where it
supports <em>some</em> readline features (for example the arrow keys), but not the
other ones. For example when I press <code>Ctrl+left arrow</code>, it prints out <code>;5D</code>,
like this:</p>
<pre><code>$ python3
&gt;&gt;&gt; importt subprocess;5D
</code></pre>
<p>Folks on Mastodon helped me figure out that this is because in the default
Python install on Mac OS, the Python <code>readline</code> module is actually backed by
<code>libedit</code>, which is a similar library which has fewer features, presumably
because Readline is <a href="https://en.wikipedia.org/wiki/GNU_Readline#Choice_of_the_GPL_as_GNU_Readline's_license">GPL licensed</a>.</p>
<p>Here&rsquo;s how I was eventually able to figure out that Python was using libedit on
my system:</p>
<pre><code>$ python3 -c &quot;import readline; print(readline.__doc__)&quot;
Importing this module enables command line editing using libedit readline.
</code></pre>
<p>Generally, though, Python uses readline if you install it on Linux or through
Homebrew. It&rsquo;s just that the specific version that Apple includes on their
systems doesn&rsquo;t have readline. Also <a href="https://docs.python.org/3.13/whatsnew/3.13.html#a-better-interactive-interpreter">Python 3.13 is going to remove the readline dependency</a>
in favour of a custom library, so &ldquo;Python uses readline&rdquo; won&rsquo;t be true in the
future.</p>
<p>I assume that there are more programs on my Mac that use libedit but I haven&rsquo;t
looked into it.</p>
<h3 id="mode-4-something-custom">mode 4: something custom</h3>
<p>The last group of programs is programs that have their own custom (and sometimes
much fancier!) system for editing text. This includes:</p>
<ul>
<li>most terminal text editors (nano, micro, vim, emacs, etc)</li>
<li>some shells (like fish), for example it seems like fish supports <code>Ctrl+Z</code> for undo when typing in a command. Zsh&rsquo;s line editor is called <a href="https://zsh.sourceforge.io/Guide/zshguide04.html">zle</a>.</li>
<li>some REPLs (like <code>ipython</code>), for example IPython uses the <a href="https://python-prompt-toolkit.readthedocs.io/">prompt_toolkit</a> library instead of readline</li>
<li>lots of other programs (like <code>atuin</code>)</li>
</ul>
<p>Some features you might see are:</p>
<ul>
<li>better autocomplete which is more customized to the tool</li>
<li>nicer history management (for example with syntax highlighting) than the default you get from readline</li>
<li>more keyboard shortcuts</li>
</ul>
<h3 id="custom-input-systems-are-often-readline-inspired">custom input systems are often readline-inspired</h3>
<p>I went looking at how <a href="https://atuin.sh/">Atuin</a> (a wonderful tool for
searching your shell history that I started using recently) handles text input.
Looking at <a href="https://github.com/atuinsh/atuin/blob/a67cfc82fe0dc907a01f07a0fd625701e062a33b/crates/atuin/src/command/client/search/interactive.rs#L382-L430">the code</a>
and some of the discussion around it, their implementation is custom but it&rsquo;s
inspired by readline, which makes sense to me &ndash; a lot of users are used to
those keybindings, and it&rsquo;s convenient for them to work even though atuin
doesn&rsquo;t use readline.</p>
<p><a href="https://python-prompt-toolkit.readthedocs.io/">prompt_toolkit</a> (the library
IPython uses) is similar &ndash; it actually supports a lot of options (including
vi-like keybindings), but the default is to support the readline-style
keybindings.</p>
<p>This is like how you see a lot of programs which support very basic vim
keybindings (like <code>j</code> for down and <code>k</code> for up). For example Fastmail supports
<code>j</code> and <code>k</code> even though most of its other keybindings don&rsquo;t have much
relationship to vim.</p>
<p>I assume that most &ldquo;readline-inspired&rdquo; custom input systems have various subtle
incompatibilities with readline, but this doesn&rsquo;t really bother me at all
personally because I&rsquo;m extremely ignorant of most of readline&rsquo;s features. I only use
maybe 5 keyboard shortcuts, so as long as they support the 5 basic commands I
know (which they always do!) I feel pretty comfortable. And usually these
custom systems have much better autocomplete than you&rsquo;d get from just using
readline, so generally I prefer them over readline.</p>
<h3 id="lots-of-shells-support-vi-keybindings">lots of shells support vi keybindings</h3>
<p>Bash, zsh, and fish all have a &ldquo;vi mode&rdquo; for entering text. In a
<a href="https://social.jvns.ca/@b0rk/112723846172173621">very unscientific poll</a> I ran on
Mastodon, 12% of people said they use it, so it seems pretty popular.</p>
<p>Readline also has a &ldquo;vi mode&rdquo; (which is how Bash&rsquo;s support for it works), so by
extension lots of other programs have it too.</p>
<p>I&rsquo;ve always thought that vi mode seems really cool, but for some reason even
though I&rsquo;m a vim user it&rsquo;s never stuck for me.</p>
<h3 id="understanding-what-situation-you-re-in-really-helps">understanding what situation you&rsquo;re in really helps</h3>
<p>I&rsquo;ve spent a lot of my life being confused about why a command line application
I was using wasn&rsquo;t behaving the way I wanted, and it feels good to be able to
more or less understand what&rsquo;s going on.</p>
<p>I think this is roughly my mental flowchart when I&rsquo;m entering text at a command
line prompt:</p>
<ol>
<li>Do the arrow keys not work? Probably there&rsquo;s no input system at all, but at
least I can use <code>Ctrl+W</code> and <code>Ctrl+U</code>, and I can <code>rlwrap</code> the tool if I
want more features.</li>
<li>Does <code>Ctrl+R</code> print <code>reverse-i-search</code>? Probably it&rsquo;s readline, so I can use
all of the readline shortcuts I&rsquo;m used to, and I know I can get some basic
history and press up arrow to get the previous command.</li>
<li>Does <code>Ctrl+R</code> do something else? This is probably some custom input library:
it&rsquo;ll probably act more or less like readline, and I can check the
documentation if I really want to know how it works.</li>
</ol>
<p>Being able to diagnose what&rsquo;s going on like this makes the command line feel
more predictable and less chaotic.</p>
<h3 id="some-things-this-post-left-out">some things this post left out</h3>
<p>There are lots more complications related to entering text that we didn&rsquo;t talk
about at all here, like:</p>
<ul>
<li>issues related to ssh / tmux / etc</li>
<li>the <code>TERM</code> environment variable</li>
<li>how different terminals (gnome terminal, iTerm, xterm, etc) have different kinds of support for copying/pasting text</li>
<li>unicode</li>
<li>probably a lot more</li>
</ul>
]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[Reasons to use your shell's job control]]></title>
    <link href="https://jvns.ca/blog/2024/07/03/reasons-to-use-job-control/"/>
    <updated>2024-07-03T08:00:20+00:00</updated>
    <id>https://jvns.ca/blog/2024/07/03/reasons-to-use-job-control/</id>
    <content type="html"><![CDATA[<p>Hello! Today someone on Mastodon asked about job control (<code>fg</code>, <code>bg</code>, <code>Ctrl+z</code>,
<code>wait</code>, etc). It made me think about how I don&rsquo;t use my shell&rsquo;s job
control interactively very often: usually I prefer to just open a new terminal
tab if I want to run multiple terminal programs, or use tmux if it&rsquo;s over ssh.
But I was curious about whether other people used job control more often than me.</p>
<p>So I <a href="https://social.jvns.ca/@b0rk/112716835387523648">asked on Mastodon</a> for
reasons people use job control. There were a lot of great responses, and it
even made me want to consider using job control a little more!</p>
<p>In this post I&rsquo;m only going to talk about using job control interactively (not
in scripts) &ndash; the post is already long enough just talking about interactive
use.</p>
<h3 id="what-s-job-control">what&rsquo;s job control?</h3>
<p>First: what&rsquo;s job control? Well &ndash; in a terminal, your processes can be in one of 3 states:</p>
<ol>
<li>in the <strong>foreground</strong>. This is the normal state when you start a process.</li>
<li>in the <strong>background</strong>. This is what happens when you run <code>some_process &amp;</code>: the process is still running, but you can&rsquo;t interact with it anymore unless you bring it back to the foreground.</li>
<li><strong>stopped</strong>. This is what happens when you start a process and then press <code>Ctrl+Z</code>. This pauses the process: it won&rsquo;t keep using the CPU, but you can restart it if you want.</li>
</ol>
<p>&ldquo;Job control&rdquo; is a set of commands for seeing which processes are running in a terminal and moving processes between these 3 states.</p>
<h3 id="how-to-use-job-control">how to use job control</h3>
<ul>
<li><code>fg</code> brings a process to the foreground. It works on both stopped processes and background processes. For example, if you start a background process with <code>cat &lt; /dev/zero &amp;</code>, you can bring it back to the foreground by running <code>fg</code></li>
<li><code>bg</code> restarts a stopped process and puts it in the background.</li>
<li>Pressing <code>Ctrl+z</code> stops the current foreground process.</li>
<li><code>jobs</code> lists all processes that are active in your terminal</li>
<li><code>kill</code> sends a signal (like <code>SIGKILL</code>) to a job (this is the shell builtin <code>kill</code>, not <code>/bin/kill</code>)</li>
<li><code>disown</code> removes the job from the list of running jobs, so that it doesn&rsquo;t get killed when you close the terminal</li>
<li><code>wait</code> waits for all background processes to complete. I only use this in scripts though.</li>
<li>apparently in bash/zsh you can also just type <code>%2</code> instead of <code>fg %2</code></li>
</ul>
<p>I might have forgotten some other job control commands but I think those are all the ones I&rsquo;ve ever used.</p>
<p>You can also give <code>fg</code> or <code>bg</code> a specific job to foreground/background. For example if I see this in the output of <code>jobs</code>:</p>
<pre><code>$ jobs
Job Group State   Command
1   3161  running cat &lt; /dev/zero &amp;
2   3264  stopped nvim -w ~/.vimkeys $argv
</code></pre>
<p>then I can foreground <code>nvim</code> with <code>fg %2</code>. You can also kill it with <code>kill -9 %2</code>, or just <code>kill %2</code> if you want to be more gentle.</p>
<h3 id="how-is-kill-2-implemented">how is <code>kill %2</code> implemented?</h3>
<p>I was curious about how <code>kill %2</code> works &ndash; does <code>%2</code> just get replaced with the
PID of the relevant process when you run the command, the way environment
variables are? Some quick experimentation shows that it isn&rsquo;t:</p>
<pre><code>$ echo kill %2
kill %2
$ type kill
kill is a function with definition
# Defined in /nix/store/vicfrai6lhnl8xw6azq5dzaizx56gw4m-fish-3.7.0/share/fish/config.fish
</code></pre>
<p>So in fish, <code>kill</code> is a shell function that knows how to interpret <code>%2</code>. Looking at
the source code (which is very easy in fish!), it uses <code>jobs -p %2</code> to expand <code>%2</code>
into a PID, and then runs the regular <code>kill</code> command.</p>
<h3 id="on-differences-between-shells">on differences between shells</h3>
<p>Job control is implemented by your shell. I use fish, but my sense is that the
basics of job control work pretty similarly in bash, fish, and zsh.</p>
<p>There are definitely some shells which don&rsquo;t have job control at all, but I&rsquo;ve
only used bash/fish/zsh so I don&rsquo;t know much about that.</p>
<p>Now let&rsquo;s get into a few reasons people use job control!</p>
<h3 id="reason-1-kill-a-command-that-s-not-responding-to-ctrl-c">reason 1: kill a command that&rsquo;s not responding to Ctrl+C</h3>
<p>I run into processes that don&rsquo;t respond to <code>Ctrl+C</code> pretty regularly, and it&rsquo;s
always a little annoying &ndash; I usually switch terminal tabs to find and kill
the process. A bunch of people pointed out that you can do this in a faster way
using job control!</p>
<p>How to do this: Press <code>Ctrl+Z</code>, then <code>kill %1</code> (or the appropriate job number
if there&rsquo;s more than one stopped/background job, which you can get from
<code>jobs</code>). You can also <code>kill -9</code> if it&rsquo;s really not responding.</p>
<h3 id="reason-2-background-a-gui-app-so-it-s-not-using-up-a-terminal-tab">reason 2: background a GUI app so it&rsquo;s not using up a terminal tab</h3>
<p>Sometimes I start a GUI program from the command line (for example with
<code>wireshark some_file.pcap</code>), forget to start it in the background, and don&rsquo;t want it eating up my terminal tab.</p>
<p>How to do this:</p>
<ul>
<li>move the GUI program to the background by pressing <code>Ctrl+Z</code> and then running <code>bg</code>.</li>
<li>you can also run <code>disown</code> to remove it from the list of jobs, to make sure that
the GUI program won&rsquo;t get closed when you close your terminal tab.</li>
</ul>
<p>Personally I try to avoid starting GUI programs from the terminal if possible
because I don&rsquo;t like how their stdout pollutes my terminal (on a Mac I use
<code>open -a Wireshark</code> instead because I find it works better), but sometimes you
don&rsquo;t have another choice.</p>
<h3 id="reason-2-5-accidentally-started-a-long-running-job-without-tmux">reason 2.5: accidentally started a long-running job without <code>tmux</code></h3>
<p>This is basically the same as the GUI app thing &ndash; you can move the job to the
background and disown it.</p>
<p>I was also curious about whether there are ways to redirect a process&rsquo;s output to a
file after it&rsquo;s already started. A quick search turned up <a href="https://github.com/jerome-pouiller/reredirect/">this Linux-only tool</a> which is based on
<a href="https://blog.nelhage.com/">nelhage</a>&rsquo;s <a href="https://github.com/nelhage/reptyr">reptyr</a> (which lets you for example move a
process that you started outside of tmux to tmux) but I haven&rsquo;t tried either of
those.</p>
<h3 id="reason-3-running-a-command-while-using-vim">reason 3: running a command while using <code>vim</code></h3>
<p>A lot of people mentioned that if they want to quickly test something while
editing code in <code>vim</code> or another terminal editor, they like to use <code>Ctrl+Z</code>
to stop vim, run the command, and then run <code>fg</code> to go back to their editor.</p>
<p>You can also use this to check the output of a command that you ran before
starting <code>vim</code>.</p>
<p>I&rsquo;ve never gotten in the habit of this, probably because I mostly use a GUI
version of vim. I feel like I&rsquo;d also be likely to switch terminal tabs and end
up wondering &ldquo;wait&hellip; where did I put my editor???&rdquo; and have to go searching
for it.</p>
<h3 id="reason-4-preferring-interleaved-output">reason 4: preferring interleaved output</h3>
<p>A few people said that they prefer the output of all of their commands to be
interleaved in the terminal. This really surprised me because I usually think
of having the output of lots of different commands interleaved as being a <em>bad</em>
thing, but one person said that they like to do this with tcpdump specifically
and I think that actually sounds extremely useful. Here&rsquo;s what it looks like:</p>
<pre><code># start tcpdump
$ sudo tcpdump -ni any port 1234 &amp;
tcpdump: data link type PKTAP
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on any, link-type PKTAP (Apple DLT_PKTAP), snapshot length 524288 bytes

# run curl
$ curl google.com:1234
13:13:29.881018 IP 192.168.1.173.49626 &gt; 142.251.41.78.1234: Flags [S], seq 613574185, win 65535, options [mss 1460,nop,wscale 6,nop,nop,TS val 2730440518 ecr 0,sackOK,eol], length 0
13:13:30.881963 IP 192.168.1.173.49626 &gt; 142.251.41.78.1234: Flags [S], seq 613574185, win 65535, options [mss 1460,nop,wscale 6,nop,nop,TS val 2730441519 ecr 0,sackOK,eol], length 0
13:13:31.882587 IP 192.168.1.173.49626 &gt; 142.251.41.78.1234: Flags [S], seq 613574185, win 65535, options [mss 1460,nop,wscale 6,nop,nop,TS val 2730442520 ecr 0,sackOK,eol], length 0
 
# when you're done, kill the tcpdump in the background
$ kill %1 
</code></pre>
<p>I think it&rsquo;s really nice here that you can see the output of tcpdump inline in
your terminal &ndash; when I&rsquo;m using tcpdump I&rsquo;m always switching back and forth and
I always get confused trying to match up the timestamps, so keeping everything
in one terminal seems like it might be a lot clearer. I&rsquo;m going to try it.</p>
<h3 id="reason-5-suspend-a-cpu-hungry-program">reason 5: suspend a CPU-hungry program</h3>
<p>One person said that sometimes they&rsquo;re running a very CPU-intensive program,
for example converting a video with <code>ffmpeg</code>, and they need to use the CPU for
something else, but don&rsquo;t want to lose the work that ffmpeg already did.</p>
<p>You can do this by pressing <code>Ctrl+Z</code> to pause the process, and then running <code>fg</code>
when you want to start it again.</p>
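<p>Something like this, as a sketch (the file names and the PID are made up). The last two lines are an alternative that assumes you&rsquo;re in a different terminal from the one where the program is running: <code>kill -STOP</code> and <code>kill -CONT</code> pause and resume a process by PID.</p>
<pre><code>ffmpeg -i input.mov output.mp4   # hypothetical CPU-hungry job
# press Ctrl+Z to stop it
jobs                             # the job should show up as stopped
fg                               # resume it in the foreground when you're ready

# from another terminal, using the process's PID (12345 is made up):
kill -STOP 12345                 # pause
kill -CONT 12345                 # resume
</code></pre>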
<h3 id="reason-6-you-accidentally-ran-ctrl-z">reason 6: you accidentally ran Ctrl+Z</h3>
<p>Many people replied that they didn&rsquo;t use job control <em>intentionally</em>, but
that they sometimes accidentally ran Ctrl+Z, which stopped whatever program was
running, so they needed to learn how to use <code>fg</code> to bring it back to the
foreground.</p>
<p>There were also some mentions of accidentally running <code>Ctrl+S</code> too (which stops
your terminal and I think can be undone with <code>Ctrl+Q</code>). My terminal totally
ignores <code>Ctrl+S</code> so I guess I&rsquo;m safe from that one though.</p>
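<p>If your terminal does freeze on <code>Ctrl+S</code>, that&rsquo;s usually XON/XOFF flow control. A small sketch of how to check for it and turn it off, assuming a Unix-like system with <code>stty</code>:</p>
<pre><code>stty -a      # look for "ixon" (flow control on) or "-ixon" (off) in the output
stty -ixon   # turn flow control off for this terminal, so Ctrl+S stops freezing output
</code></pre>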
<h3 id="reason-7-already-set-up-a-bunch-of-environment-variables">reason 7: already set up a bunch of environment variables</h3>
<p>Some folks mentioned that they already set up a bunch of environment variables
that they need to run various commands, so it&rsquo;s easier to use job control to
run multiple commands in the same terminal than to redo that work in another
tab.</p>
<h3 id="reason-8-it-s-your-only-option">reason 8: it&rsquo;s your only option</h3>
<p>Probably the most obvious reason to use job control to manage multiple
processes is &ldquo;because you have to&rdquo; &ndash; maybe you&rsquo;re in single-user mode, or on a
very restricted computer, or SSH&rsquo;d into a machine that doesn&rsquo;t have tmux or
screen and you don&rsquo;t want to create multiple SSH sessions.</p>
<h3 id="reason-9-some-people-just-like-it-better">reason 9: some people just like it better</h3>
<p>Some people also said that they just don&rsquo;t like using terminal tabs: for
instance a few folks mentioned that they prefer to be able to see all of their
terminals on the screen at the same time, so they&rsquo;d rather have 4 terminals on
the screen and then use job control if they need to run more than 4 programs.</p>
<h3 id="i-learned-a-few-new-tricks">I learned a few new tricks!</h3>
<p>I think my two main takeaways from this post are that I&rsquo;ll probably try out job control a little more for:</p>
<ol>
<li>killing processes that don&rsquo;t respond to Ctrl+C</li>
<li>running <code>tcpdump</code> in the background with whatever network command I&rsquo;m running, so I can see both of their output in the same place</li>
</ol>
]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[New zine: How Git Works!]]></title>
    <link href="https://jvns.ca/blog/2024/04/25/new-zine--how-git-works-/"/>
    <updated>2024-06-03T09:45:11+00:00</updated>
    <id>https://jvns.ca/blog/2024/04/25/new-zine--how-git-works-/</id>
    <content type="html"><![CDATA[<p>Hello! I&rsquo;ve been writing about git on here nonstop for months, and the git zine
is FINALLY done! It came out on Friday!</p>
<p>You can get it for $12 here:
<a href="https://wizardzines.com/zines/git">https://wizardzines.com/zines/git</a>, or get
a <a href="https://wizardzines.com/zines/all-the-zines/">14-pack of all my zines here</a>.</p>
<p>Here&rsquo;s the cover:</p>
<div align="center">
<a href="https://wizardzines.com/zines/git">
  <img width="600px" src="https://wizardzines.com/zines/git/cover-small.jpg">
  </a>
</div>
<h3 id="the-table-of-contents">the table of contents</h3>
<p>Here&rsquo;s the table of contents:</p>
<a href="https://wizardzines.com/zines/git/toc.png">
  <img width="600px" src="https://wizardzines.com/zines/git/toc.png">
</a>
<h3 id="who-is-this-zine-for">who is this zine for?</h3>
<p>I wrote this zine for people who have been using git for years and are still
afraid of it. As always &ndash; I think it sucks to be afraid of the tools that you
use in your work every day! I want folks to feel confident using git.</p>
<p>My goals are:</p>
<ul>
<li>To explain how some parts of git that initially seem scary (like &ldquo;detached
HEAD state&rdquo;) are pretty straightforward to deal with once you understand
what&rsquo;s going on</li>
<li>To show some parts of git you probably <em>should</em> be careful around.  For
example, the stash is one of the places in git where it&rsquo;s easiest to lose
your work in a way that&rsquo;s incredibly annoying to recover from, and I avoid
using it heavily because of that.</li>
<li>To clear up a few common misconceptions about how the core parts of git (like
commits, branches, and merging) work</li>
</ul>
<h3 id="what-s-the-difference-between-this-and-oh-shit-git">what&rsquo;s the difference between this and Oh Shit, Git!</h3>
<p>You might be wondering – Julia! You already have a zine about git! What’s going
on? <a href="https://wizardzines.com/zines/oh-shit-git">Oh Shit, Git!</a> is a set of tricks for fixing git messes. <a href="https://wizardzines.com/zines/git/">&ldquo;How Git Works&rdquo;</a>
explains how Git <strong>actually</strong> works.</p>
<p>Also, Oh Shit, Git! is the amazing <a href="https://sylormiller.com/">Katie Sylor Miller</a>&rsquo;s <a href="https://ohshitgit.com/">concept</a>: we made it
into a zine because I was such a huge fan of her work on it.</p>
<p>I think they go really well together.</p>
<h3 id="what-s-so-confusing-about-git-anyway">what&rsquo;s so confusing about git, anyway?</h3>
<p>This zine was really hard for me to write because when I started writing it,
I&rsquo;d been using git pretty confidently for 10 years. I had no real memory of
what it was <em>like</em> to struggle with git.</p>
<p>But thanks to a huge amount of help from <a href="https://marieflanagan.com/">Marie</a> as
well as everyone who talked to me about git on Mastodon, eventually I was able
to see that there are a lot of things about git that are counterintuitive,
misleading, or just plain confusing. These include:</p>
<ul>
<li><a href="https://jvns.ca/blog/2023/11/01/confusing-git-terminology/">confusing terminology</a> (for example &ldquo;fast-forward&rdquo;, &ldquo;reference&rdquo;, or &ldquo;remote-tracking branch&rdquo;)</li>
<li>misleading messages (for example how <code>Your branch is up to date with 'origin/main'</code> doesn&rsquo;t necessarily mean that your branch is up to date with the <code>main</code> branch on the origin)</li>
<li>uninformative output (for example how I <em>STILL</em> can&rsquo;t reliably figure out which code comes from which branch when I&rsquo;m looking at a merge conflict)</li>
<li>a lack of guidance around handling diverged branches (for example how when you run <code>git pull</code> and your branch has diverged from the origin, it doesn&rsquo;t give you great guidance on how to handle the situation)</li>
<li>inconsistent behaviour (for example how git&rsquo;s reflogs are almost always append-only, EXCEPT for the stash, where git will delete entries when you run <code>git stash drop</code>)</li>
</ul>
<p>The more I heard from people about how confusing they find git, the more it
became clear that git really does not make it easy to figure out what its
internal logic is just by using it.</p>
<h3 id="handling-git-s-weirdnesses-becomes-pretty-routine">handling git&rsquo;s weirdnesses becomes pretty routine</h3>
<p>The previous section made git sound really bad, like &ldquo;how can anyone possibly
use this thing?&rdquo;.</p>
<p>But my experience is that after I learned what git actually means by all of its
weird error messages, dealing with it became pretty routine! I&rsquo;ll see an
<code>error: failed to push some refs to 'github.com:jvns/wizard-zines-site'</code>,
realize &ldquo;oh right, probably a coworker made some changes to <code>main</code> since I last
ran <code>git pull</code>&rdquo;, run <code>git pull --rebase</code> to incorporate their changes, and move
on with my day. The whole thing takes about 10 seconds.</p>
<p>Or if I see a <code>You are in 'detached HEAD' state</code> warning, I&rsquo;ll just make sure
to run <code>git checkout mybranch</code> before continuing to write code. No big deal.</p>
<p>For me (and for a lot of folks I talk to about git!), dealing with git&rsquo;s weird
language can become so normal that you totally forget why anybody would even
find it weird.</p>
<h3 id="a-little-bit-of-internals">a little bit of internals</h3>
<p>One of my biggest questions when writing this zine was how much to focus on
what&rsquo;s in the <code>.git</code> directory. We ended up deciding to include a couple of
pages about internals (&ldquo;inside .git&rdquo;, pages 14-15), but otherwise focus more on
git&rsquo;s <em>behaviour</em> when you use it and why sometimes git behaves in unexpected
ways.</p>
<p>This is partly because there are lots of great guides to git&rsquo;s internals
out there already (<a href="https://maryrosecook.com/blog/post/git-from-the-inside-out">1</a>, <a href="https://shop.jcoglan.com/building-git/">2</a>), and partly because I think even if you <em>have</em> read one
of these guides to git&rsquo;s internals, it isn&rsquo;t totally obvious how to connect
that information to what you actually see in git&rsquo;s user interface.</p>
<p>For example: it&rsquo;s easy to find documentation about remotes in git &ndash;
for example <a href="https://git-scm.com/book/en/v2/Git-Branching-Remote-Branches">this page</a> says:</p>
<blockquote>
<p>Remote-tracking branches [&hellip;] remind you where the branches in your remote
repositories were the last time you connected to them.</p>
</blockquote>
<p>But even if you&rsquo;ve read that, you might not realize that the statement <code>Your branch is up to date with 'origin/main'</code> in <code>git status</code> doesn&rsquo;t necessarily
mean that you&rsquo;re actually up to date with the remote <code>main</code> branch.</p>
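<p>A sketch of why, assuming the remote is called <code>origin</code>: <code>git status</code> compares against the local remote-tracking branch, which only changes when you fetch.</p>
<pre><code>git status         # compares your branch with origin/main as of your last fetch; no network access
git fetch origin   # ask the remote what its branches look like now
git status         # the same comparison, against a freshly updated origin/main
</code></pre>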
<p>So in general in the zine we focus on the behaviour you see in Git&rsquo;s UI, and
then explain how that relates to what&rsquo;s happening internally in Git.</p>
<h3 id="the-cheat-sheet">the cheat sheet</h3>
<p>The zine also comes with a free printable cheat sheet: (click to get a PDF version)</p>
<a href="https://wizardzines.com/git-cheat-sheet.pdf">
  <img width="600px" src="https://wizardzines.com/images/cheat-sheet-smaller.png">
</a>
<h3 id="it-comes-with-an-html-transcript">it comes with an HTML transcript!</h3>
<p>The zine also comes with an HTML transcript, to (hopefully) make it easier to
read on a screen reader! Our Operations Manager, Lee, transcribed all of the
pages and wrote image descriptions. I&rsquo;d love feedback about the experience of
reading the zine on a screen reader if you try it.</p>
<h3 id="i-really-do-love-git">I really do love git</h3>
<p>I&rsquo;ve been pretty critical about git in this post, but I only write zines about
technologies I love, and git is no exception.</p>
<p>Some reasons I love git:</p>
<ul>
<li>it&rsquo;s fast!</li>
<li>it&rsquo;s backwards compatible! I learned how to use it 10 years ago and
everything I learned then is still true</li>
<li>there&rsquo;s tons of great free Git hosting available out there (GitHub! GitLab! a
million more!), so I can easily back up all my code</li>
<li>simple workflows are REALLY simple (if I&rsquo;m working on a project on my own, I
can just run <code>git commit -am 'whatever'</code> and <code>git push</code> over and over again and it
works perfectly)</li>
<li>Almost every internal file in git is a pretty simple text file (or has a
version which is a text file), which makes me feel like I can always
understand exactly what&rsquo;s going on under the hood if I want to.</li>
</ul>
<p>I hope this zine helps some of you love it too.</p>
<h3 id="people-who-helped-with-this-zine">people who helped with this zine</h3>
<p>I don&rsquo;t make these zines by myself!</p>
<p>I worked with <a href="https://marieflanagan.com/">Marie Claire LeBlanc Flanagan</a> every
morning for 8 months to write clear explanations of git.</p>
<p>The cover is by Vladimir Kašiković,
Gersande La Flèche did copy editing,
James Coglan (of the great <a href="https://shop.jcoglan.com/building-git/">Building
Git</a>) did technical review, our
Operations Manager Lee did the transcription as well as a million other
things, my partner Kamal read the zine and told me which parts were off (as he
always does), and I had a million great conversations with Marco Rogers about
git.</p>
<p>And finally, I want to thank all the beta readers! There were 66 this time
which is a record! They left hundreds of comments about what was confusing,
what they learned, and which of my jokes were funny. It&rsquo;s always hard to hear
from beta readers that a page I thought made sense is actually extremely
confusing, and fixing those problems before the final version makes the zine so
much better.</p>
<h3 id="get-the-zine">get the zine</h3>
<p>Here are some links to get the zine again:</p>
<ul>
<li>get <a href="https://wizardzines.com/zines/git">How Git Works</a></li>
<li>get a <a href="https://wizardzines.com/zines/all-the-zines/">14-pack of all my zines here</a>.</li>
</ul>
<p>As always, you can get either a PDF version to print at home or a print version
shipped to your house. The only caveat is print orders will ship in <strong>July</strong> &ndash; I
need to wait for orders to come in to get an idea of how many I should print
before sending it to the printer.</p>
<h3 id="thank-you">thank you</h3>
<p>As always: if you&rsquo;ve bought zines in the past, thank you for all your support
over the years. And thanks to all of you (1000+ people!!!) who have already
bought the zine in the first 3 days. It&rsquo;s already set a record for most zines
sold in a single day and I&rsquo;ve been really blown away.</p>
]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[Notes on git's error messages]]></title>
    <link href="https://jvns.ca/blog/2024/04/10/notes-on-git-error-messages/"/>
    <updated>2024-04-10T12:43:14+00:00</updated>
    <id>https://jvns.ca/blog/2024/04/10/notes-on-git-error-messages/</id>
    <content type="html"><![CDATA[<p>While writing about Git, I&rsquo;ve noticed that a lot of folks struggle with Git&rsquo;s
error messages. I&rsquo;ve had many years to get used to these error messages so it
took me a really long time to understand <em>why</em> folks were confused, but having
thought about it much more, I&rsquo;ve realized that:</p>
<ol>
<li>sometimes I actually <em>am</em> confused by the error messages, I&rsquo;m just used to
being confused</li>
<li>I have a bunch of strategies for getting more information when the error
message git gives me isn&rsquo;t very informative</li>
</ol>
<p>So in this post, I&rsquo;m going to go through a bunch of Git&rsquo;s error messages,
list a few things that I think are confusing about them for each one, and talk
about what I do when I&rsquo;m confused by the message.</p>
<h3 id="improving-error-messages-isn-t-easy">improving error messages isn&rsquo;t easy</h3>
<p>Before we start, I want to say that trying to think about why these error
messages are confusing has given me a lot of respect for how difficult
maintaining Git is. I&rsquo;ve been thinking about Git for months, and for some of
these messages I really have no idea how to improve them.</p>
<p>Some things that seem hard to me about improving error messages:</p>
<ul>
<li>if you come up with an idea for a new message, it&rsquo;s hard to tell if it&rsquo;s actually better!</li>
<li>work like improving error messages often <a href="https://lwn.net/Articles/959768/">isn&rsquo;t funded</a></li>
<li>the error messages have to be translated (git&rsquo;s error messages are translated into <a href="https://github.com/git/git/tree/master/po">19 languages</a>!)</li>
</ul>
<p>That said, if you find these messages confusing, hopefully some of these notes
will help clarify them a bit.</p>
<style>
.error {
  color: #db322e;
}
.warning {
  color: #765900;
}
.bg {
  color: #fdf6e3
}
pre {
  background-color: #fdf6e3;
  padding: 10px;
  border-radius: 5px;
  /* wrap long lines */
  white-space: pre-wrap;
}

h2 a {
  color: black;
  text-decoration: none;
}

article span {
  padding: 0;
}

article a:hover {
  text-decoration: underline;
}
</style>
<h2 id="git-push-on-a-diverged-branch">
  <a href="#git-push-on-a-diverged-branch">
  error: <code>git push</code> on a diverged branch
  </a>
</h2>
<pre>
$ git push
To github.com:jvns/int-exposed
<span class="error">! [rejected]        main -> main (non-fast-forward)</span>
<span class="warning">error: failed to push some refs to 'github.com:jvns/int-exposed'
hint: Updates were rejected because the tip of your current branch is behind
hint: its remote counterpart. Integrate the remote changes (e.g.
hint: 'git pull ...') before pushing again.
hint: See the 'Note about fast-forwards' in 'git push --help' for details.</span>

$ git status
On branch main
Your branch and 'origin/main' have diverged,
and have 2 and 1 different commits each, respectively.
</pre>
<p>Some things I find confusing about this:</p>
<ol>
<li>You get the exact same error message whether the branch is just <strong>behind</strong>
or the branch has <strong>diverged</strong>. There&rsquo;s no way to tell which it is from this
message: you need to run <code>git status</code> or <code>git pull</code> to find out.</li>
<li>It says <code>failed to push some refs</code>, but it&rsquo;s not totally clear <em>which</em> references it
failed to push. I believe everything that failed to push is listed with <code>! [rejected]</code> on the previous line&ndash; in this case just the <code>main</code> branch.</li>
</ol>
<p><strong>What I like to do if I&rsquo;m confused:</strong></p>
<ul>
<li>I&rsquo;ll run <code>git status</code> to figure out what the state of my current branch is.</li>
<li>I think I almost never try to push more than one branch at a time, so I
usually totally ignore git&rsquo;s notes about which specific branch failed to push
&ndash; I just assume that it&rsquo;s my current branch</li>
</ul>
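<p>One possible way to tell &ldquo;behind&rdquo; apart from &ldquo;diverged&rdquo; without reading through all of <code>git status</code>, sketched here assuming the remote is called <code>origin</code> and the branch is <code>main</code>:</p>
<pre><code>git fetch origin   # update origin/main first
# prints two numbers: commits only on main, then commits only on origin/main
git rev-list --left-right --count main...origin/main
# "0" then a nonzero number means main is just behind;
# two nonzero numbers mean the branches have diverged
</code></pre>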
<h2 id="git-pull-on-a-diverged-branch">
  <a href="#git-pull-on-a-diverged-branch">
  error: <code>git pull</code> on a diverged branch
  </a>
</h2>
<pre>
$ git pull
<span class="warning">hint: You have divergent branches and need to specify how to reconcile them.
hint: You can do so by running one of the following commands sometime before
hint: your next pull:
hint:
hint:   git config pull.rebase false  # merge
hint:   git config pull.rebase true   # rebase
hint:   git config pull.ff only       # fast-forward only
hint:
hint: You can replace "git config" with "git config --global" to set a default
hint: preference for all repositories. You can also pass --rebase, --no-rebase,
hint: or --ff-only on the command line to override the configured default per
hint: invocation.</span>
fatal: Need to specify how to reconcile divergent branches.
</pre>
<p>The main thing I think is confusing here is that git is presenting you with a
kind of overwhelming number of options: it&rsquo;s saying that you can either:</p>
<ol>
<li>configure <code>pull.rebase false</code>, <code>pull.rebase true</code>, or <code>pull.ff only</code> locally</li>
<li>or configure them globally</li>
<li>or run <code>git pull --rebase</code> or <code>git pull --no-rebase</code></li>
</ol>
<p>It&rsquo;s very hard to imagine how a beginner to git could easily use this hint to
sort through all these options on their own.</p>
<p>If I were explaining this to a friend, I&rsquo;d say something like &ldquo;you can use <code>git pull --rebase</code>
or <code>git pull --no-rebase</code> to resolve this with a rebase or merge
<em>right now</em>, and if you want to set a permanent preference, you can do that
with <code>git config pull.rebase false</code> or <code>git config pull.rebase true</code>.&rdquo;</p>
<p><code>git config pull.ff only</code> feels a little redundant to me because that&rsquo;s git&rsquo;s
default behaviour anyway (though it wasn&rsquo;t always).</p>
<p><strong>What I like to do here:</strong></p>
<ul>
<li>run <code>git status</code> to see the state of my current branch</li>
<li>maybe run <code>git log origin/main</code> or <code>git log</code> to see what the diverged commits are</li>
<li>usually run <code>git pull --rebase</code> to resolve it</li>
<li>sometimes I&rsquo;ll run <code>git push --force</code> or <code>git reset --hard origin/main</code> if I
want to throw away my local work or remote work (for example because I
accidentally committed to the wrong branch, or because I ran <code>git commit --amend</code> on a personal branch that only I&rsquo;m using and want to force push)</li>
</ul>
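<p>Put together, that routine might look roughly like this (a sketch that assumes the remote-tracking branch is <code>origin/main</code>):</p>
<pre><code>git status                            # confirm that the branches have diverged
git log --oneline origin/main..HEAD   # commits that only exist locally
git log --oneline HEAD..origin/main   # commits that only exist on origin/main
git pull --rebase                     # replay the local commits on top of origin/main
</code></pre>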
<h2 id="git-checkout-asdf">
  <a href="#git-checkout-asdf">
  error: <code>git checkout asdf</code> (a branch that doesn't exist)
  </a>
</h2>
<pre>
$ git checkout asdf
error: pathspec 'asdf' did not match any file(s) known to git
</pre>
<p>This is a little weird because my intention was to check out a <strong>branch</strong>,
but <code>git checkout</code> is complaining about a <strong>path</strong> that doesn&rsquo;t exist.</p>
<p>This is happening because <code>git checkout</code>&rsquo;s first argument can be either a
branch or a path, and git has no way of knowing which one you intended. This
seems tricky to improve, but I might expect something like &ldquo;No such branch,
commit, or path: asdf&rdquo;.</p>
<p><strong>What I like to do here:</strong></p>
<ul>
<li>in theory it would be good to use <code>git switch</code> instead, but I keep using <code>git checkout</code> anyway</li>
<li>generally I just remember that I need to decode this as &ldquo;branch <code>asdf</code> doesn&rsquo;t exist&rdquo;</li>
</ul>
<h2 id="git-switch-asdf">
  <a href="#git-switch-asdf">
  error: <code>git switch asdf</code> (a branch that doesn't exist)
  </a>
</h2>
<pre>
$ git switch asdf
fatal: invalid reference: asdf
</pre>
<p><code>git switch</code> only accepts a branch as an argument (unless you pass <code>-d</code>), so why is it saying <code>invalid reference: asdf</code> instead of <code>invalid branch: asdf</code>?</p>
<p>I think the reason is that internally, <code>git switch</code> is trying to be helpful in its error messages: if you run <code>git switch v0.1</code> to switch to a tag, it&rsquo;ll say:</p>
<pre><code>$ git switch v0.1
fatal: a branch is expected, got tag 'v0.1'
</code></pre>
<p>So what git is trying to communicate with <code>fatal: invalid reference: asdf</code> is
&ldquo;<code>asdf</code> isn&rsquo;t a branch, but it&rsquo;s not a tag either, or any other reference&rdquo;. From my various <a href="https://jvns.ca/blog/2024/03/28/git-poll-results/">git polls</a> my impression is that
a lot of git users have literally no idea what a &ldquo;reference&rdquo; is in git, so I&rsquo;m not sure if that&rsquo;s coming across.</p>
<p><strong>What I like to do here:</strong></p>
<p>90% of the time when a git error message says <code>reference</code> I just mentally
replace it with <code>branch</code> in my head.</p>
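<p>For a concrete picture of what counts as a &ldquo;reference&rdquo;, here&rsquo;s a small sketch that builds a throwaway repository (the <code>demo</code> identity and the <code>v0.1</code> tag are made up for the example) and lists every reference in it:</p>
<pre><code>cd "$(mktemp -d)"   # throwaway directory
git init -q
git -c user.name=demo -c user.email=demo@example.com commit -q --allow-empty -m first
git tag v0.1
# one line per reference: the branch under refs/heads/, the tag under refs/tags/
git for-each-ref --format='%(refname)'
</code></pre>
<p>In a real repository, remote-tracking branches show up under <code>refs/remotes/</code> as well.</p>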
<h2 id="detached-head">
  error: <a href="#detached-head"><code>git checkout HEAD^</code></a>
</h2>
<pre>$ git checkout HEAD^
Note: switching to 'HEAD^'.

You are in 'detached HEAD' state. You can look around, make experimental
changes and commit them, and you can discard any commits you make in this
state without impacting any branches by switching back to a branch.

If you want to create a new branch to retain commits you create, you may
do so (now or later) by using -c with the switch command. Example:

  git switch -c <new-branch-name>

Or undo this operation with:

  git switch -

Turn off this advice by setting config variable advice.detachedHead to false

HEAD is now at 182cd3f add "swap byte order" button
</pre>
<p>
This is a tough one. Definitely a lot of people are confused about this
message, but obviously there's been a lot of effort to improve it too. I don't
have anything smart to say about this one.
</p>
<p><strong>What I like to do here:</strong></p>
<ul>
<li>my shell prompt tells me if I&rsquo;m in detached HEAD state, and generally I can remember not to make new commits while in that state</li>
<li>when I&rsquo;m done looking at whatever old commits I wanted to look at, I&rsquo;ll run <code>git checkout main</code> or something to go back to a branch</li>
</ul>
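<p>A prompt can detect this with <code>git symbolic-ref</code>, which only succeeds when <code>HEAD</code> points at a branch. This is a sketch in a throwaway repository (the <code>demo</code> identity is made up), not necessarily how any particular prompt does it:</p>
<pre><code>cd "$(mktemp -d)"   # throwaway directory
git init -q
git -c user.name=demo -c user.email=demo@example.com commit -q --allow-empty -m first
git checkout -q --detach   # detach HEAD at the current commit
# prints the branch name if HEAD is on a branch, otherwise falls back to "detached"
state=$(git symbolic-ref -q --short HEAD || echo detached)
echo "$state"
</code></pre>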
<h2 id="rebase-in-progress">
  <a href="#rebase-in-progress">
  message: <code>git status</code> when a rebase is in progress
  </a>  
</h2>
<p>This isn&rsquo;t an error message, but I still find it a little confusing on its own:</p>
<pre>
$ git status
<span class="error">interactive rebase in progress;</span> onto c694cf8
Last command done (1 command done):
   pick 0a9964d wip
No commands remaining.
You are currently rebasing branch 'main' on 'c694cf8'.
  (fix conflicts and then run "git rebase --continue")
  (use "git rebase --skip" to skip this patch)
  (use "git rebase --abort" to check out the original branch)

Unmerged paths:
  (use "git restore --staged <file>..." to unstage)
  (use "git add <file>..." to mark resolution)
  <span class="error">both modified:   index.html</span>

no changes added to commit (use "git add" and/or "git commit -a")
</pre>
<p>Two things I think could be clearer here:</p>
<ol>
<li>I think it would be nice if <code>You are currently rebasing branch 'main' on 'c694cf8'.</code> were on the first line instead of the 5th line &ndash; right now the first line doesn&rsquo;t say which branch you&rsquo;re rebasing.</li>
<li>In this case, <code>c694cf8</code> is actually <code>origin/main</code>, so I feel like <code>You are currently rebasing branch 'main' on 'origin/main'</code> might be even clearer.</li>
</ol>
<p><strong>What I like to do here:</strong></p>
<p>My shell prompt includes the branch that I&rsquo;m currently rebasing, so I rely on that instead of the output of <code>git status</code>.</p>
<h2 id="merge-deleted">
  <a href="#merge-deleted">
  error: <code>git rebase</code> when a file has been deleted
  </a>
</h2>
<pre>
$ git rebase main
CONFLICT (modify/delete): index.html deleted in 0ce151e (wip) and modified in HEAD.  Version HEAD of index.html left in tree.
error: could not apply 0ce151e... wip
</pre>
<p>The thing I still find confusing about this is &ndash; <code>index.html</code> was modified in
<code>HEAD</code>. But what is <code>HEAD</code>? Is it the commit I was working on when I started
the merge/rebase, or is it the commit from the other branch? (the answer is
&ldquo;<code>HEAD</code> is your branch if you&rsquo;re doing a merge, and it&rsquo;s the &ldquo;other branch&rdquo; if
you&rsquo;re doing a rebase&rdquo;, but I always find that hard to remember)</p>
<p>I think I would personally find it easier to understand if the message listed the branch names if possible, something like this:</p>
<pre><code>CONFLICT (modify/delete): index.html deleted on `main` and modified on `mybranch`
</code></pre>
<h2 id="merge-ours">
  <a href="#merge-ours">
  error: <code>git status</code> during a merge or rebase (who is "them"?)
  </a>
</h2>
<pre>
$ git status 
On branch master
You have unmerged paths.
  (fix conflicts and run "git commit")
  (use "git merge --abort" to abort the merge)

Unmerged paths:
  (use "git add/rm &lt;file&gt;..." as appropriate to mark resolution)
        deleted by them: the_file

no changes added to commit (use "git add" and/or "git commit -a")
</pre>
<p>I find this one confusing in exactly the same way as the previous message: it
says <code>deleted by them:</code>, but what &ldquo;them&rdquo; refers to depends on whether you did a merge or rebase or cherry-pick.</p>
<ul>
<li>for a merge, <code>them</code> is the other branch you merged in</li>
<li>for a rebase, <code>them</code> is the branch that you were on when you ran <code>git rebase</code></li>
<li>for a cherry-pick, I guess it&rsquo;s the commit you cherry-picked</li>
</ul>
<p><strong>What I like to do if I&rsquo;m confused:</strong></p>
<ul>
<li>try to remember what I did</li>
<li>run <code>git show main --stat</code> or something to see what I did on the <code>main</code> branch if I can&rsquo;t remember</li>
</ul>
<h2 id="git-clean">
  <a href="#git-clean">
  error: <code>git clean</code>
  </a>
</h2>
<pre>
$ git clean
fatal: clean.requireForce defaults to true and neither -i, -n, nor -f given; refusing to clean
</pre>
<p>I just find it a bit confusing that you need to look up what <code>-i</code>, <code>-n</code> and
<code>-f</code> are to be able to understand this error message. I&rsquo;m personally way too
lazy to do that so even though I&rsquo;ve probably been using <code>git clean</code> for 10
years I still had no idea what <code>-i</code> stood for (<code>interactive</code>) until I was
writing this down.</p>
<p><strong>What I like to do if I&rsquo;m confused:</strong></p>
<p>Usually I just chaotically run <code>git clean -f</code> to delete all my untracked files
and hope for the best, though I might actually switch to <code>git clean -i</code>  now
that I know what <code>-i</code> stands for. Seems a lot safer.</p>
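<p>A sketch of the three flags in a throwaway repository (<code>scratch.txt</code> is a made-up file name):</p>
<pre><code>cd "$(mktemp -d)"   # throwaway directory
git init -q
touch scratch.txt   # an untracked file
git clean -n        # dry run: lists what would be removed, deletes nothing
git clean -f        # force: actually deletes scratch.txt
# git clean -i would instead open an interactive menu to choose what to delete
</code></pre>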
<h3 id="that-s-all">that&rsquo;s all!</h3>
<p>Hopefully some of this is helpful!</p>
]]></content>
  </entry>
  
</feed>
Raw headers
{
  "age": "23657",
  "cache-control": "public,max-age=0,must-revalidate",
  "cache-status": "\"Netlify Edge\"; hit",
  "cf-cache-status": "DYNAMIC",
  "cf-ray": "929b6725d5e4e1e3-ORD",
  "connection": "keep-alive",
  "content-type": "application/xml",
  "date": "Tue, 01 Apr 2025 22:08:03 GMT",
  "etag": "W/\"0c063f078053a15687b2faaae11f146b-ssl-df\"",
  "nel": "{\"report_to\":\"default\",\"max_age\":31536000,\"include_subdomains\":true}",
  "report-to": "{\"group\":\"default\",\"max_age\":31536000,\"endpoints\":[{\"url\":\"https://jvns.report-uri.com/a/d/g\"}],\"include_subdomains\":true}",
  "server": "cloudflare",
  "strict-transport-security": "max-age=31536000",
  "transfer-encoding": "chunked",
  "vary": "Accept-Encoding",
  "x-nf-request-id": "01JQSPBD1NDVYVHND0V46Q3A8Y"
}
Parsed with @rowanmanning/feed-parser
{
  "meta": {
    "type": "atom",
    "version": "1.0"
  },
  "language": null,
  "title": "Julia Evans",
  "description": null,
  "copyright": null,
  "url": "http://jvns.ca",
  "self": "http://jvns.ca/atom.xml",
  "published": null,
  "updated": "2025-03-07T13:18:31.000Z",
  "generator": {
    "label": "Hugo",
    "version": null,
    "url": "http://gohugo.io/"
  },
  "image": null,
  "authors": [
    {
      "name": "Julia Evans",
      "email": null,
      "url": null
    }
  ],
  "categories": [],
  "items": [
    {
      "id": "https://jvns.ca/blog/2025/03/07/escape-code-standards/",
      "title": "Standards for ANSI escape codes",
      "description": null,
      "url": "https://jvns.ca/blog/2025/03/07/escape-code-standards/",
      "published": null,
      "updated": "2025-03-07T00:00:00.000Z",
      "content": "<p>Hello! Today I want to talk about ANSI escape codes.</p>\n<p>For a long time I was vaguely aware of ANSI escape codes (“that’s how you make\ntext red in the terminal and stuff”) but I had no real understanding of where they were\nsupposed to be defined or whether or not there were standards for them. I just\nhad a kind of vague “there be dragons” feeling around them. While learning\nabout the terminal this year, I’ve learned that:</p>\n<ol>\n<li>ANSI escape codes are responsible for a lot of usability improvements\nin the terminal (did you know there’s a way to copy to your system clipboard\nwhen SSHed into a remote machine?? It’s an escape code called <a href=\"https://jvns.ca/til/vim-osc52/\">OSC 52</a>!)</li>\n<li>They aren’t completely standardized, and because of that they don’t always\nwork reliably. And because they’re also invisible, it’s extremely\nfrustrating to troubleshoot escape code issues.</li>\n</ol>\n<p>So I wanted to put together a list for myself of some standards that exist\naround escape codes, because I want to know if they <em>have</em> to feel unreliable\nand frustrating, or if there’s a future where we could all rely on them with\nmore confidence.</p>\n<ul>\n<li><a href=\"#what-s-an-escape-code\">what’s an escape code?</a></li>\n<li><a href=\"#ecma-48\">ECMA-48</a></li>\n<li><a href=\"#xterm-control-sequences\">xterm control sequences</a></li>\n<li><a href=\"#terminfo\">terminfo</a></li>\n<li><a href=\"#should-programs-use-terminfo\">should programs use terminfo?</a></li>\n<li><a href=\"#is-there-a-single-common-set-of-escape-codes\">is there a “single common set” of escape codes?</a></li>\n<li><a href=\"#some-reasons-to-use-terminfo\">some reasons to use terminfo</a></li>\n<li><a href=\"#some-more-documents-standards\">some more documents/standards</a></li>\n<li><a href=\"#why-i-think-this-is-interesting\">why I think this is interesting</a></li>\n</ul>\n<h3 id=\"what-s-an-escape-code\">what’s an escape 
code?</h3>\n<p>Have you ever pressed the left arrow key in your terminal and seen <code>^[[D</code>?\nThat’s an escape code! It’s called an “escape code” because the first character\nis the “escape” character, which is usually written as <code>ESC</code>, <code>\\x1b</code>, <code>\\E</code>,\n<code>\\033</code>, or <code>^[</code>.</p>\n<p>Escape codes are how your terminal emulator communicates various kinds of\ninformation (colours, mouse movement, etc) with programs running in the\nterminal. There are two kind of escape codes:</p>\n<ol>\n<li><strong>input codes</strong> which your terminal emulator sends for keypresses or mouse\nmovements that don’t fit into Unicode. For example “left arrow key” is\n<code>ESC[D</code>, “Ctrl+left arrow” might be <code>ESC[1;5D</code>, and clicking the mouse might\nbe something like <code>ESC[M :3</code>.</li>\n<li><strong>output codes</strong> which programs can print out to colour text, move the\ncursor around, clear the screen, hide the cursor, copy text to the\nclipboard, enable mouse reporting, set the window title, etc.</li>\n</ol>\n<p>Now let’s talk about standards!</p>\n<h3 id=\"ecma-48\">ECMA-48</h3>\n<p>The first standard I found relating to escape codes was\n<a href=\"https://ecma-international.org/wp-content/uploads/ECMA-48_5th_edition_june_1991.pdf\">ECMA-48</a>,\nwhich was originally published in 1976.</p>\n<p>ECMA-48 does two things:</p>\n<ol>\n<li>Define some general <em>formats</em> for escape codes (like “CSI” codes, which are\n<code>ESC[</code> + something and “OSC” codes, which are <code>ESC]</code> + something)</li>\n<li>Define some specific escape codes, like how “move the cursor to the left” is\n<code>ESC[D</code>, or “turn text red” is  <code>ESC[31m</code>. 
In the spec, the “cursor left”\none is called <code>CURSOR LEFT</code> and the one for changing colours is called\n<code>SELECT GRAPHIC RENDITION</code>.</li>\n</ol>\n<p>The formats are extensible, so there’s room for others to define more escape\ncodes in the future. Lots of escape codes that are popular today aren’t defined\nin ECMA-48: for example it’s pretty common for terminal applications (like vim,\nhtop, or tmux) to support using the mouse, but ECMA-48 doesn’t define escape\ncodes for the mouse.</p>\n<h3 id=\"xterm-control-sequences\">xterm control sequences</h3>\n<p>There are a bunch of escape codes that aren’t defined in ECMA-48, for example:</p>\n<ul>\n<li>enabling mouse reporting (where did you click in your terminal?)</li>\n<li>bracketed paste (did you paste that text or type it in?)</li>\n<li>OSC 52 (which terminal applications can use to copy text to your system clipboard)</li>\n</ul>\n<p>I believe (correct me if I’m wrong!) that these and some others came from\nxterm, are documented in <a href=\"https://invisible-island.net/xterm/ctlseqs/ctlseqs.html\">XTerm Control Sequences</a>, and have\nbeen widely implemented by other terminal emulators.</p>\n<p>This list of “what xterm supports” is not a standard exactly, but xterm is\nextremely influential and so it seems like an important document.</p>\n<h3 id=\"terminfo\">terminfo</h3>\n<p>In the 80s (and to some extent today, but my understanding is that it was MUCH\nmore dramatic in the 80s) there was a huge amount of variation in what escape\ncodes terminals actually supported.</p>\n<p>To deal with this, there’s a database of escape codes for various terminals\ncalled “terminfo”.</p>\n<p>It looks like the standard for terminfo is called <a href=\"https://publications.opengroup.org/c243-1\">X/Open Curses</a>, though you need to create\nan account to view that standard for some reason. 
It defines the database format as well\nas a C library interface (“curses”) for accessing the database.</p>\n<p>For example you can run this bash snippet to see every possible escape code for\n“clear screen” for all of the different terminals your system knows about:</p>\n<pre><code>for term in $(toe -a | awk '{print $1}')\ndo\n  echo $term\n  infocmp -1 -T \"$term\" 2>/dev/null | grep 'clear=' | sed 's/clear=//g;s/,//g'\ndone\n</code></pre>\n<p>On my system (and probably every system I’ve ever used?), the terminfo database is managed by ncurses.</p>\n<h3 id=\"should-programs-use-terminfo\">should programs use terminfo?</h3>\n<p>I think it’s interesting that there are two main approaches that applications\ntake to handling ANSI escape codes:</p>\n<ol>\n<li>Use the terminfo database to figure out which escape codes to use, depending\non what’s in the <code>TERM</code> environment variable. Fish does this, for example.</li>\n<li>Identify a “single common set” of escape codes which works in “enough”\nterminal emulators and just hardcode those.</li>\n</ol>\n<p>Some examples of programs/libraries that take approach #2 (“don’t use terminfo”) include:</p>\n<ul>\n<li><a href=\"https://github.com/mawww/kakoune/commit/c12699d2e9c2806d6ed184032078d0b84a3370bb\">kakoune</a></li>\n<li><a href=\"https://github.com/prompt-toolkit/python-prompt-toolkit/blob/165258d2f3ae594b50f16c7b50ffb06627476269/src/prompt_toolkit/input/ansi_escape_sequences.py#L5-L8\">python-prompt-toolkit</a></li>\n<li><a href=\"https://github.com/antirez/linenoise\">linenoise</a></li>\n<li><a href=\"https://github.com/rockorager/libvaxis\">libvaxis</a></li>\n<li><a href=\"https://github.com/chalk/chalk\">chalk</a></li>\n</ul>\n<p>I got curious about why folks might be moving away from terminfo and I found\nthis very interesting and extremely detailed\n<a href=\"https://twoot.site/@bean/113056942625234032\">rant about terminfo from one of the fish maintainers</a>, which argues that:</p>\n<blockquote>\n<p>[the 
terminfo authors] have done a lot of work that, at the time, was\nextremely important and helpful. My point is that it no longer is.</p>\n</blockquote>\n<p>I’m not going to do it justice so I’m not going to summarize it, I think it’s\nworth reading.</p>\n<h3 id=\"is-there-a-single-common-set-of-escape-codes\">is there a “single common set” of escape codes?</h3>\n<p>I was just talking about the idea that you can use a “common set” of escape\ncodes that will work for most people. But what is that set? Is there any agreement?</p>\n<p>I really do not know the answer to this at all, but from doing some reading it\nseems like it’s some combination of:</p>\n<ul>\n<li>The codes that the VT100 supported (though some aren’t relevant on modern terminals)</li>\n<li>what’s in ECMA-48 (which I think also has some things that are no longer relevant)</li>\n<li>What xterm supports (though I’d guess that not everything in there is actually widely supported enough)</li>\n</ul>\n<p>and maybe ultimately “identify the terminal emulators you think your users are\ngoing to use most frequently and test in those”, the same way web developers do\nwhen deciding which CSS features are okay to use</p>\n<p>I don’t think there are any resources like <a href=\"https://caniuse.com/\">Can I use…?</a> or\n<a href=\"https://web-platform-dx.github.io/web-features/\">Baseline</a> for the terminal\nthough. 
(in theory terminfo is supposed to be the “caniuse” for the terminal\nbut it seems like it often takes 10+ years to add new terminal features when\npeople invent them which makes it very limited)</p>\n<h3 id=\"some-reasons-to-use-terminfo\">some reasons to use terminfo</h3>\n<p>I also asked on Mastodon why people found terminfo valuable in 2025 and got a\nfew reasons that made sense to me:</p>\n<ul>\n<li>some people expect to be able to use the <code>TERM</code> environment variable to\ncontrol how programs behave (for example with <code>TERM=dumb</code>), and there’s\nno standard for how that should work in a post-terminfo world</li>\n<li>even though there’s <em>less</em> variation between terminal emulators than\nthere was in the 80s, there’s far from zero variation: there are graphical\nterminals, the Linux framebuffer console, the situation you’re in when\nconnecting to a server via its serial console, Emacs shell mode, and probably\nmore that I’m missing</li>\n<li>there is no one standard for what the “single common set” of escape codes\nis, and sometimes programs use escape codes which aren’t actually widely\nsupported enough</li>\n</ul>\n<h3 id=\"terminfo-user-agent-detection\">terminfo & user agent detection</h3>\n<p>The way that ncurses uses the <code>TERM</code> environment variable to decide which\nescape codes to use reminds me of how webservers used to sometimes use the\nbrowser user agent to decide which version of a website to serve.</p>\n<p>It also seems like it’s had some of the same results – the way iTerm2 reports\nitself as being “xterm-256color” feels similar to how Safari’s user agent is\n“Mozilla/5.0 (Macintosh; Intel Mac OS X 14_7_4) AppleWebKit/605.1.15 (KHTML,\nlike Gecko) Version/18.3 Safari/605.1.15”. 
In both cases the terminal emulator\n/ browser ends up changing its user agent to get around user agent detection\nthat isn’t working well.</p>\n<p>On the web we ended up deciding that user agent detection was not a good\npractice and to instead focus on standardization so we can serve the same\nHTML/CSS to all browsers. I don’t know if the same approach is the future in\nthe terminal though – I think the terminal landscape today is much more\nfragmented than the web ever was as well as being much less well funded.</p>\n<h3 id=\"some-more-documents-standards\">some more documents/standards</h3>\n<p>A few more documents and standards related to escape codes, in no particular order:</p>\n<ul>\n<li>the <a href=\"https://man7.org/linux/man-pages/man4/console_codes.4.html\">Linux console_codes man page</a> documents\nescape codes that Linux supports</li>\n<li>how the <a href=\"https://vt100.net/docs/vt100-ug/chapter3.html\">VT 100</a> handles escape codes & control sequences</li>\n<li>the <a href=\"https://sw.kovidgoyal.net/kitty/keyboard-protocol/\">kitty keyboard protocol</a></li>\n<li><a href=\"https://gist.github.com/egmontkob/eb114294efbcd5adb1944c9f3cb5feda\">OSC 8</a> for links in the terminal (and notes on <a href=\"https://github.com/Alhadis/OSC8-Adoption?tab=readme-ov-file\">adoption</a>)</li>\n<li>A <a href=\"https://github.com/tmux/tmux/blob/882fb4d295deb3e4b803eb444915763305114e4f/tools/ansicode.txt\">summary of ANSI standards from tmux</a></li>\n<li>this <a href=\"https://iterm2.com/feature-reporting/\">terminal features reporting specification from iTerm</a></li>\n<li>sixel graphics</li>\n</ul>\n<h3 id=\"why-i-think-this-is-interesting\">why I think this is interesting</h3>\n<p>I sometimes see people saying that the unix terminal is “outdated”, and since I\nlove the terminal so much I’m always curious about what incremental changes\nmight make it feel less “outdated”.</p>\n<p>Maybe if we had a clearer standards landscape (like we do on the web!) 
it would\nbe easier for terminal emulator developers to build new features and for\nauthors of terminal applications to more confidently adopt those features so\nthat we can all benefit from them and have a richer experience in the terminal.</p>\n<p>Obviously standardizing ANSI escape codes is not easy (ECMA-48 was first\npublished almost 50 years ago and we’re still not there!). I don’t even know\nwhat all of the challenges are. But the situation with HTML/CSS/JS used to be\nextremely bad too and now it’s MUCH better, so maybe there’s hope.</p>",
      "image": null,
      "media": [],
      "authors": [
        {
          "name": "Julia Evans",
          "email": null,
          "url": null
        }
      ],
      "categories": []
    },
    {
      "id": "https://jvns.ca/blog/2025/02/13/how-to-add-a-directory-to-your-path/",
      "title": "How to add a directory to your PATH",
      "description": null,
      "url": "https://jvns.ca/blog/2025/02/13/how-to-add-a-directory-to-your-path/",
      "published": null,
      "updated": "2025-02-13T12:27:56.000Z",
      "content": "<p>I was talking to a friend about how to add a directory to your PATH today. It’s\nsomething that feels “obvious” to me since I’ve been using the terminal for a\nlong time, but when I searched for instructions for how to do it, I actually\ncouldn’t find something that explained all of the steps – a lot of them just\nsaid “add this to <code>~/.bashrc</code>”, but what if you’re not using bash? What if your\nbash config is actually in a different file? And how are you supposed to figure\nout which directory to add anyway?</p>\n<p>So I wanted to try to write down some more complete directions and mention some\nof the gotchas I’ve run into over the years.</p>\n<p>Here’s a table of contents:</p>\n<ul>\n<li><a href=\"#step-1-what-shell-are-you-using\">step 1: what shell are you using?</a></li>\n<li><a href=\"#step-2-find-your-shell-s-config-file\">step 2: find your shell’s config file</a>\n<ul>\n<li><a href=\"#a-note-on-bash-s-config-file\">a note on bash’s config file</a></li>\n</ul>\n</li>\n<li><a href=\"#step-3-figure-out-which-directory-to-add\">step 3: figure out which directory to add</a>\n<ul>\n<li><a href=\"#step-3-1-double-check-it-s-the-right-directory\">step 3.1: double check it’s the right directory</a></li>\n</ul>\n</li>\n<li><a href=\"#step-4-edit-your-shell-config\">step 4: edit your shell config</a></li>\n<li><a href=\"#step-5-restart-your-shell\">step 5: restart your shell</a></li>\n<li>problems:\n<ul>\n<li><a href=\"#problem-1-it-ran-the-wrong-program\">problem 1: it ran the wrong program</a></li>\n<li><a href=\"#problem-2-the-program-isn-t-being-run-from-your-shell\">problem 2: the program isn’t being run from your shell</a></li>\n<li><a href=\"#problem-3-duplicate-path-entries-making-it-harder-to-debug\">problem 3: duplicate PATH entries making it harder to debug</a></li>\n<li><a href=\"#problem-4-losing-your-history-after-updating-your-path\">problem 4: losing your history after updating your 
PATH</a></li>\n</ul>\n</li>\n<li>notes:\n<ul>\n<li><a href=\"#a-note-on-source\">a note on source</a></li>\n<li><a href=\"#a-note-on-fish-add-path\">a note on fish_add_path</a></li>\n</ul>\n</li>\n</ul>\n<h3 id=\"step-1-what-shell-are-you-using\">step 1: what shell are you using?</h3>\n<p>If you’re not sure what shell you’re using, here’s a way to find out. Run this:</p>\n<pre><code>ps -p $$ -o pid,comm=\n</code></pre>\n<ul>\n<li>if you’re using <strong>bash</strong>, it’ll print out <code>97295 bash</code></li>\n<li>if you’re using <strong>zsh</strong>, it’ll print out <code>97295 zsh</code></li>\n<li>if you’re using <strong>fish</strong>, it’ll print out an error like “In fish, please use\n$fish_pid” (<code>$$</code> isn’t valid syntax in fish, but in any case the error\nmessage tells you that you’re using fish, which you probably already knew)</li>\n</ul>\n<p>Also bash is the default on Linux and zsh is the default on Mac OS (as of\n2024). I’ll only cover bash, zsh, and fish in these directions.</p>\n<h3 id=\"step-2-find-your-shell-s-config-file\">step 2: find your shell’s config file</h3>\n<ul>\n<li>in zsh, it’s probably <code>~/.zshrc</code></li>\n<li>in bash, it might be <code>~/.bashrc</code>, but it’s complicated, see the note in the next section</li>\n<li>in fish, it’s probably <code>~/.config/fish/config.fish</code> (you can run <code>echo $__fish_config_dir</code> if you want to be 100% sure)</li>\n</ul>\n<h3 id=\"a-note-on-bash-s-config-file\">a note on bash’s config file</h3>\n<p>Bash has three possible config files: <code>~/.bashrc</code>, <code>~/.bash_profile</code>, and <code>~/.profile</code>.</p>\n<p>If you’re not sure which one your system is set up to use, I’d recommend\ntesting this way:</p>\n<ol>\n<li>add <code>echo hi there</code> to your <code>~/.bashrc</code></li>\n<li>Restart your terminal</li>\n<li>If you see “hi there”, that means <code>~/.bashrc</code> is being used! 
Hooray!</li>\n<li>Otherwise remove it and try the same thing with <code>~/.bash_profile</code></li>\n<li>You can also try <code>~/.profile</code> if the first two options don’t work.</li>\n</ol>\n<p>(there are a lot of <a href=\"https://blog.flowblok.id.au/2013-02/shell-startup-scripts.html\">elaborate flow charts</a> out there that explain how bash\ndecides which config file to use but IMO it’s not worth it to internalize them\nand just testing is the fastest way to be sure)</p>\n<h3 id=\"step-3-figure-out-which-directory-to-add\">step 3: figure out which directory to add</h3>\n<p>Let’s say that you’re trying to install and run a program called <code>http-server</code>\nand it doesn’t work, like this:</p>\n<pre><code>$ npm install -g http-server\n$ http-server\nbash: http-server: command not found\n</code></pre>\n<p>How do you find what directory <code>http-server</code> is in? Honestly in general this is\nnot that easy – often the answer is something like “it depends on how npm is\nconfigured”. A few ideas:</p>\n<ul>\n<li>Often when setting up a new installer (like <code>cargo</code>, <code>npm</code>, <code>homebrew</code>, etc),\nwhen you first set it up it’ll print out some directions about how to update\nyour PATH. 
So if you’re paying attention you can get the directions then.</li>\n<li>Sometimes installers will automatically update your shell’s config file\nto update your <code>PATH</code> for you</li>\n<li>Sometimes just Googling “where does npm install things?” will turn up the\nanswer</li>\n<li>Some tools have a subcommand that tells you where they’re configured to\ninstall things, like:\n<ul>\n<li>Node/npm: <code>npm config get prefix</code> (then append <code>/bin/</code>)</li>\n<li>Go: <code>go env GOPATH</code> (then append <code>/bin/</code>)</li>\n<li>asdf: <code>asdf info | grep ASDF_DIR</code> (then append <code>/bin/</code> and <code>/shims/</code>)</li>\n</ul>\n</li>\n</ul>\n<h3 id=\"step-3-1-double-check-it-s-the-right-directory\">step 3.1: double check it’s the right directory</h3>\n<p>Once you’ve found a directory you think might be the right one, make sure it’s\nactually correct! For example, I found out that on my machine, <code>http-server</code> is\nin <code>~/.npm-global/bin</code>. I can make sure that it’s the right directory by trying to\nrun the program <code>http-server</code> in that directory like this:</p>\n<pre><code>$ ~/.npm-global/bin/http-server\nStarting up http-server, serving ./public\n</code></pre>\n<p>It worked! 
Now that you know what directory you need to add to your <code>PATH</code>,\nlet’s move to the next step!</p>\n<h3 id=\"step-4-edit-your-shell-config\">step 4: edit your shell config</h3>\n<p>Now we have the 2 critical pieces of information we need:</p>\n<ol>\n<li>Which directory you’re trying to add to your PATH (like  <code>~/.npm-global/bin/</code>)</li>\n<li>Where your shell’s config is (like <code>~/.bashrc</code>, <code>~/.zshrc</code>, or <code>~/.config/fish/config.fish</code>)</li>\n</ol>\n<p>Now what you need to add depends on your shell:</p>\n<p><strong>bash instructions:</strong></p>\n<p>Open your shell’s config file, and add a line like this:</p>\n<pre><code>export PATH=$PATH:~/.npm-global/bin/\n</code></pre>\n<p>(obviously replace <code>~/.npm-global/bin</code> with the actual directory you’re trying to add)</p>\n<p><strong>zsh instructions:</strong></p>\n<p>You can do the same thing as in bash, but zsh also has some slightly fancier\nsyntax you can use if you prefer:</p>\n<pre><code>path=(\n  $path\n  ~/.npm-global/bin\n)\n</code></pre>\n<p><strong>fish instructions:</strong></p>\n<p>In fish, the syntax is different:</p>\n<pre><code>set PATH $PATH ~/.npm-global/bin\n</code></pre>\n<p>(in fish you can also use <code>fish_add_path</code>, some notes on that <a href=\"#a-note-on-fish-add-path\">further down</a>)</p>\n<h3 id=\"step-5-restart-your-shell\">step 5: restart your shell</h3>\n<p>Now, an extremely important step: updating your shell’s config won’t take\neffect if you don’t restart it!</p>\n<p>Two ways to do this:</p>\n<ol>\n<li>open a new terminal (or terminal tab), and maybe close the old one so you don’t get confused</li>\n<li>Run <code>bash</code> to start a new shell (or <code>zsh</code> if you’re using zsh, or <code>fish</code> if you’re using fish)</li>\n</ol>\n<p>I’ve found that both of these usually work fine.</p>\n<p>And you should be done! 
Try running the program you were trying to run and\nhopefully it works now.</p>\n<p>If not, here are a couple of problems that you might run into:</p>\n<h3 id=\"problem-1-it-ran-the-wrong-program\">problem 1: it ran the wrong program</h3>\n<p>If the wrong <strong>version</strong> of a program is running, you might need to add the\ndirectory to the <em>beginning</em> of your PATH instead of the end.</p>\n<p>For example, on my system I have two versions of <code>python3</code> installed, which I\ncan see by running <code>which -a</code>:</p>\n<pre><code>$ which -a python3\n/usr/bin/python3\n/opt/homebrew/bin/python3\n</code></pre>\n<p>The one your shell will use is the <strong>first one listed</strong>.</p>\n<p>If you want to use the Homebrew version, you need to add that directory\n(<code>/opt/homebrew/bin</code>) to the <strong>beginning</strong> of your PATH instead, by putting this in\nyour shell’s config file (it’s <code>/opt/homebrew/bin/:$PATH</code> instead of the usual <code>$PATH:/opt/homebrew/bin/</code>)</p>\n<pre><code>export PATH=/opt/homebrew/bin/:$PATH\n</code></pre>\n<p>or in fish:</p>\n<pre><code>set PATH ~/.cargo/bin $PATH\n</code></pre>\n<h3 id=\"problem-2-the-program-isn-t-being-run-from-your-shell\">problem 2: the program isn’t being run from your shell</h3>\n<p>All of these directions only work if you’re running the program <strong>from your\nshell</strong>. If you’re running the program from an IDE, from a GUI, in a cron job,\nor some other way, you’ll need to add the directory to your PATH in a different\nway, and the exact details might depend on the situation.</p>\n<p><strong>in a cron job</strong></p>\n<p>Some options:</p>\n<ul>\n<li>use the full path to the program you’re running, like <code>/home/bork/bin/my-program</code></li>\n<li>put the full PATH you want as the first line of your crontab (something like\nPATH=/bin:/usr/bin:/usr/local/bin:….). 
You can get the full PATH you’re\nusing in your shell by running <code>echo \"PATH=$PATH\"</code>.</li>\n</ul>\n<p>I’m honestly not sure how to handle it in an IDE/GUI because I haven’t run into\nthat in a long time, will add directions here if someone points me in the right\ndirection.</p>\n<h3 id=\"problem-3-duplicate-path-entries-making-it-harder-to-debug\">problem 3: duplicate <code>PATH</code> entries making it harder to debug</h3>\n<p>If you edit your path and start a new shell by running <code>bash</code> (or <code>zsh</code>, or\n<code>fish</code>), you’ll often end up with duplicate <code>PATH</code> entries, because the shell\nkeeps adding new things to your <code>PATH</code> every time you start your shell.</p>\n<p>Personally I don’t think I’ve run into a situation where this kind of\nduplication breaks anything, but the duplicates can make it harder to debug\nwhat’s going on with your <code>PATH</code> if you’re trying to understand its contents.</p>\n<p>Some ways you could deal with this:</p>\n<ol>\n<li>If you’re debugging your <code>PATH</code>, open a new terminal to do it in so you get\na “fresh” state. 
This should avoid the duplication.</li>\n<li>Deduplicate your <code>PATH</code> at the end of your shell’s config  (for example in\nzsh apparently you can do this with <code>typeset -U path</code>)</li>\n<li>Check that the directory isn’t already in your <code>PATH</code> when adding it (for\nexample in fish I believe you can do this with <code>fish_add_path --path /some/directory</code>)</li>\n</ol>\n<p>How to deduplicate your <code>PATH</code> is shell-specific and there isn’t always a\nbuilt in way to do it so you’ll need to look up how to accomplish it in your\nshell.</p>\n<h3 id=\"problem-4-losing-your-history-after-updating-your-path\">problem 4: losing your history after updating your <code>PATH</code></h3>\n<p>Here’s a situation that’s easy to get into in bash or zsh:</p>\n<ol>\n<li>Run a command (it fails)</li>\n<li>Update your <code>PATH</code></li>\n<li>Run <code>bash</code> to reload your config</li>\n<li>Press the up arrow a couple of times to rerun the failed command (or open a new terminal)</li>\n<li>The failed command isn’t in your history! Why not?</li>\n</ol>\n<p>This happens because in bash, by default, history is not saved until you exit\nthe shell.</p>\n<p>Some options for fixing this:</p>\n<ul>\n<li>Instead of running <code>bash</code> to reload your config, run <code>source ~/.bashrc</code> (or\n<code>source ~/.zshrc</code> in zsh). This will reload the config inside your current\nsession.</li>\n<li>Configure your shell to continuously save your history instead of only saving\nthe history when the shell exits. 
(How to do this depends on whether you’re\nusing bash or zsh, the history options in zsh are a bit complicated and I’m\nnot exactly sure what the best way is)</li>\n</ul>\n<h3 id=\"a-note-on-source\">a note on <code>source</code></h3>\n<p>When you install <code>cargo</code> (Rust’s installer) for the first time, it gives you\nthese instructions for how to set up your PATH, which don’t mention a specific\ndirectory at all.</p>\n<pre><code>This is usually done by running one of the following (note the leading DOT):\n\n. \"$HOME/.cargo/env\"        \t# For sh/bash/zsh/ash/dash/pdksh\nsource \"$HOME/.cargo/env.fish\"  # For fish\n</code></pre>\n<p>The idea is that you add that line to your shell’s config, and their script\nautomatically sets up your <code>PATH</code> (and potentially other things) for you.</p>\n<p>This is pretty common (for example <a href=\"https://github.com/Homebrew/install/blob/deacfa6a6e62e5f4002baf9e1fac7a96e9aa5d41/install.sh#L1072-L1087\">Homebrew</a> suggests you eval <code>brew shellenv</code>), and there are\ntwo ways to approach this:</p>\n<ol>\n<li>Just do what the tool suggests (like adding <code>. \"$HOME/.cargo/env\"</code> to your shell’s config)</li>\n<li>Figure out which directories the script they’re telling you to run would add\nto your PATH, and then add those manually. Here’s how I’d do that:\n<ul>\n<li>Run <code>. 
\"$HOME/.cargo/env\"</code> in my shell (or the fish version if using fish)</li>\n<li>Run <code>echo \"$PATH\" | tr ':' '\\n' | grep cargo</code> to figure out which directories it added</li>\n<li>See that it says <code>/Users/bork/.cargo/bin</code> and shorten that to <code>~/.cargo/bin</code></li>\n<li>Add the directory <code>~/.cargo/bin</code> to PATH (with the directions in this post)</li>\n</ul>\n</li>\n</ol>\n<p>I don’t think there’s anything wrong with doing what the tool suggests (it\nmight be the “best way”!), but personally I usually use the second approach\nbecause I prefer knowing exactly what configuration I’m changing.</p>\n<h3 id=\"a-note-on-fish-add-path\">a note on <code>fish_add_path</code></h3>\n<p>fish has a handy function called <code>fish_add_path</code> that you can run to add a directory to your <code>PATH</code> like this:</p>\n<pre><code>fish_add_path /some/directory\n</code></pre>\n<p>This is cool (it’s such a simple command!) but I’ve stopped using it for a couple of reasons:</p>\n<ol>\n<li>Sometimes <code>fish_add_path</code> will update the <code>PATH</code> for every session in the\nfuture (with a “universal variable”) and sometimes it will update the <code>PATH</code>\njust for the current session and it’s hard for me to tell which one it will\ndo. In theory the docs explain this but I could not understand them.</li>\n<li>If you ever need to <em>remove</em> the directory from your <code>PATH</code> a few weeks or\nmonths later because maybe you made a mistake, it’s kind of hard to do\n(there are <a href=\"https://github.com/fish-shell/fish-shell/issues/8604\">instructions in this comments of this github issue though</a>).</li>\n</ol>\n<h3 id=\"that-s-all\">that’s all</h3>\n<p>Hopefully this will help some people. Let me know (on Mastodon or Bluesky) if\nyou there are other major gotchas that have tripped you up when adding a\ndirectory to your PATH, or if you have questions about this post!</p>",
      "image": null,
      "media": [],
      "authors": [
        {
          "name": "Julia Evans",
          "email": null,
          "url": null
        }
      ],
      "categories": []
    },
    {
      "id": "https://jvns.ca/blog/2025/02/05/some-terminal-frustrations/",
      "title": "Some terminal frustrations",
      "description": null,
      "url": "https://jvns.ca/blog/2025/02/05/some-terminal-frustrations/",
      "published": null,
      "updated": "2025-02-05T16:57:00.000Z",
      "content": "<p>A few weeks ago I ran a terminal survey (you can <a href=\"https://jvns.ca/terminal-survey/results-bsky.html\">read the results here</a>) and at the end I asked:</p>\n<blockquote>\n<p>What’s the most frustrating thing about using the terminal for you?</p>\n</blockquote>\n<p>1600 people answered, and I decided to spend a few days categorizing all the\nresponses. Along the way I learned that classifying qualitative data is not\neasy but I gave it my best shot. I ended up building a custom\n<a href=\"https://github.com/jvns/classificator\">tool</a> to make it faster to categorize\neverything.</p>\n<p>As with all of my surveys the methodology isn’t particularly scientific. I just\nposted the survey to Mastodon and Twitter, ran it for a couple of days, and got\nanswers from whoever happened to see it and felt like responding.</p>\n<p>Here are the top categories of frustrations!</p>\n<p>I think it’s worth keeping in mind while reading these comments that</p>\n<ul>\n<li>40% of people answering this survey have been using the terminal for <strong>21+ years</strong></li>\n<li>95% of people answering the survey have been using the terminal for at least 4 years</li>\n</ul>\n<p>These comments aren’t coming from total beginners.</p>\n<p>Here are the categories of frustrations! The number in brackets is the number\nof people with that frustration. I’m mostly writing this up for myself because\nI’m trying to write a zine about the terminal and I wanted to get a sense for\nwhat people are having trouble with.</p>\n<h3 id=\"remembering-syntax-115\">remembering syntax (115)</h3>\n<p>People talked about struggles remembering:</p>\n<ul>\n<li>the syntax for CLI tools like awk, jq, sed, etc</li>\n<li>the syntax for redirects</li>\n<li>keyboard shortcuts for tmux, text editing, etc</li>\n</ul>\n<p>One example comment:</p>\n<blockquote>\n<p>There are just so many little “trivia” details to remember for full\nfunctionality. 
Even after all these years I’ll sometimes forget where it’s 2\nor 1 for stderr, or forget which is which for <code>></code> and <code>>></code>.</p>\n</blockquote>\n<h3 id=\"switching-terminals-is-hard-91\">switching terminals is hard (91)</h3>\n<p>People talked about struggling with switching systems (for example home/work\ncomputer or when SSHing) and running into:</p>\n<ul>\n<li>OS differences in keyboard shortcuts (like Linux vs Mac)</li>\n<li>systems which don’t have their preferred text editor (“no vim” or “only vim”)</li>\n<li>different versions of the same command (like Mac OS grep vs GNU grep)</li>\n<li>no tab completion</li>\n<li>a shell they aren’t used to (“the subtle differences between zsh and bash”)</li>\n</ul>\n<p>as well as differences inside the same system like pagers being not consistent\nwith each other (git diff pagers, other pagers).</p>\n<p>One example comment:</p>\n<blockquote>\n<p>I got used to fish and vi mode which are not available when I ssh into\nservers, containers.</p>\n</blockquote>\n<h3 id=\"color-85\">color (85)</h3>\n<p>Lots of problems with color, like:</p>\n<ul>\n<li>programs setting colors that are unreadable with a light background color</li>\n<li>finding a colorscheme they like (and getting it to work consistently across different apps)</li>\n<li>color not working inside several layers of SSH/tmux/etc</li>\n<li>not liking the defaults</li>\n<li>not wanting color at all and struggling to turn it off</li>\n</ul>\n<p>This comment felt relatable to me:</p>\n<blockquote>\n<p>Getting my terminal theme configured in a reasonable way between the terminal\nemulator and fish (I did this years ago and remember it being tedious and\nfiddly and now feel like I’m locked into my current theme because it works\nand I dread touching any of that configuration ever again).</p>\n</blockquote>\n<h3 id=\"keyboard-shortcuts-84\">keyboard shortcuts (84)</h3>\n<p>Half of the comments on keyboard shortcuts were about how on Linux/Windows, 
the\nkeyboard shortcut to copy/paste in the terminal is different from in the rest\nof the OS.</p>\n<p>Some other issues with keyboard shortcuts other than copy/paste:</p>\n<ul>\n<li>using <code>Ctrl-W</code> in a browser-based terminal and closing the window</li>\n<li>the terminal only supports a limited set of keyboard shortcuts (no\n<code>Ctrl-Shift-</code>, no <code>Super</code>, no <code>Hyper</code>, lots of <code>ctrl-</code> shortcuts aren’t\npossible like <code>Ctrl-,</code>)</li>\n<li>the OS stopping you from using a terminal keyboard shortcut (like by default\nMac OS uses <code>Ctrl+left arrow</code> for something else)</li>\n<li>issues using emacs in the terminal</li>\n<li>backspace not working (2)</li>\n</ul>\n<h3 id=\"other-copy-and-paste-issues-75\">other copy and paste issues (75)</h3>\n<p>Aside from “the keyboard shortcut for copy and paste is different”, there were\na lot of OTHER issues with copy and paste, like:</p>\n<ul>\n<li>copying over SSH</li>\n<li>how tmux and the terminal emulator both do copy/paste in different ways</li>\n<li>dealing with many different clipboards (system clipboard, vim clipboard, the\n“middle click” clipboard on Linux, tmux’s clipboard, etc) and potentially\nsynchronizing them</li>\n<li>random spaces added when copying from the terminal</li>\n<li>pasting multiline commands which automatically get run in a terrifying way</li>\n<li>wanting a way to copy text without using the mouse</li>\n</ul>\n<h3 id=\"discoverability-55\">discoverability (55)</h3>\n<p>There were lots of comments about this, which all came down to the same basic\ncomplaint – it’s hard to discover useful tools or features! This comment kind of\nsummed it all up:</p>\n<blockquote>\n<p>How difficult it is to learn independently. 
Most of what I know is an\nassorted collection of stuff I’ve been told by random people over the years.</p>\n</blockquote>\n<h3 id=\"steep-learning-curve-44\">steep learning curve (44)</h3>\n<p>A lot of comments about it generally having a steep learning curve. A couple of\nexample comments:</p>\n<blockquote>\n<p>After 15 years of using it, I’m not much faster than using it than I was 5 or\nmaybe even 10 years ago.</p>\n</blockquote>\n<p>and</p>\n<blockquote>\n<p>That I know I could make my life easier by learning more about the shortcuts\nand commands and configuring the terminal but I don’t spend the time because it\nfeels overwhelming.</p>\n</blockquote>\n<h3 id=\"history-42\">history  (42)</h3>\n<p>Some issues with shell history:</p>\n<ul>\n<li>history not being shared between terminal tabs (16)</li>\n<li>limits that are too short (4)</li>\n<li>history not being restored when terminal tabs are restored</li>\n<li>losing history because the terminal crashed</li>\n<li>not knowing how to search history</li>\n</ul>\n<p>One example comment:</p>\n<blockquote>\n<p>It wasted a lot of time until I figured it out and still annoys me that\n“history” on zsh has such a small buffer;  I have to type “history 0” to get\nany useful length of history.</p>\n</blockquote>\n<h3 id=\"bad-documentation-37\">bad documentation (37)</h3>\n<p>People talked about:</p>\n<ul>\n<li>documentation being generally opaque</li>\n<li>lack of examples in man pages</li>\n<li>programs which don’t have man pages</li>\n</ul>\n<p>Here’s a representative comment:</p>\n<blockquote>\n<p>Finding good examples and docs. 
Man pages often not enough, have to wade\nthrough stack overflow</p>\n</blockquote>\n<h3 id=\"scrollback-36\">scrollback (36)</h3>\n<p>A few issues with scrollback:</p>\n<ul>\n<li>programs printing out too much data making you lose scrollback history</li>\n<li>resizing the terminal messes up the scrollback</li>\n<li>lack of timestamps</li>\n<li>GUI programs that you start in the background printing stuff out that gets in\nthe way of other programs’ outputs</li>\n</ul>\n<p>One example comment:</p>\n<blockquote>\n<p>When resizing the terminal (in particular: making it narrower) leads to\nbroken rewrapping of the scrollback content because the commands formatted\ntheir output based on the terminal window width.</p>\n</blockquote>\n<h3 id=\"it-feels-outdated-33\">“it feels outdated” (33)</h3>\n<p>Lots of comments about how the terminal feels hampered by legacy decisions and\nhow users often end up needing to learn implementation details that feel very\nesoteric. One example comment:</p>\n<blockquote>\n<p>Most of the legacy cruft, it would be great to have a green field\nimplementation of the CLI interface.</p>\n</blockquote>\n<h3 id=\"shell-scripting-32\">shell scripting (32)</h3>\n<p>Lots of complaints about POSIX shell scripting. There’s a general feeling that\nshell scripting is difficult but also that switching to a different less\nstandard scripting language (fish, nushell, etc) brings its own problems.</p>\n<blockquote>\n<p>Shell scripting. My tolerance to ditch a shell script and go to a scripting\nlanguage is pretty low. It’s just too messy and powerful. Screwing up can be\ncostly so I don’t even bother.</p>\n</blockquote>\n<h3 id=\"more-issues\">more issues</h3>\n<p>Some more issues that were mentioned at least 10 times:</p>\n<ul>\n<li>(31) inconsistent command line arguments: is it -h or help or --help?</li>\n<li>(24) keeping dotfiles in sync across different systems</li>\n<li>(23) performance (e.g. 
“my shell takes too long to start”)</li>\n<li>(20) window management (potentially with some combination of tmux tabs, terminal tabs, and multiple terminal windows. Where did that shell session go?)</li>\n<li>(17) generally feeling scared/uneasy (“The debilitating fear that I’m going\nto do some mysterious Bad Thing with a command and I will have absolutely no\nidea how to fix or undo it or even really figure out what happened”)</li>\n<li>(16) terminfo issues (“Having to learn about terminfo if/when I try a new terminal emulator and ssh elsewhere.”)</li>\n<li>(16) lack of image support (sixel etc)</li>\n<li>(15) SSH issues (like having to start over when you lose the SSH connection)</li>\n<li>(15) various tmux/screen issues (for example lack of integration between tmux and the terminal emulator)</li>\n<li>(15) typos & slow typing</li>\n<li>(13) the terminal getting messed up for various reasons (pressing <code>Ctrl-S</code>, <code>cat</code>ing a binary, etc)</li>\n<li>(12) quoting/escaping in the shell</li>\n<li>(11) various Windows/PowerShell issues</li>\n</ul>\n<h3 id=\"n-a-122\">n/a (122)</h3>\n<p>There were also 122 answers to the effect of “nothing really” or “only that I\ncan’t do EVERYTHING in the terminal”</p>\n<p>One example comment:</p>\n<blockquote>\n<p>Think I’ve found work arounds for most/all frustrations</p>\n</blockquote>\n<h3 id=\"that-s-all\">that’s all!</h3>\n<p>I’m not going to make a lot of commentary on these results, but here are a\ncouple of categories that feel related to me:</p>\n<ul>\n<li>remembering syntax & history (often the thing you need to remember is something you’ve run before!)</li>\n<li>discoverability & the learning curve (the lack of discoverability is definitely a big part of what makes it hard to learn)</li>\n<li>“switching systems is hard” & “it feels outdated” (tools that haven’t really\nchanged in 30 or 40 years have many problems but they do tend to be always\n<em>there</em> no matter what system you’re on, which is very 
useful and makes them\nhard to stop using)</li>\n</ul>\n<p>Trying to categorize all these results in a reasonable way really gave me an\nappreciation for social science researchers’ skills.</p>",
      "image": null,
      "media": [],
      "authors": [
        {
          "name": "Julia Evans",
          "email": null,
          "url": null
        }
      ],
      "categories": []
    },
    {
      "id": "https://jvns.ca/blog/2025/01/11/getting-a-modern-terminal-setup/",
      "title": "What's involved in getting a \"modern\" terminal setup?",
      "description": null,
      "url": "https://jvns.ca/blog/2025/01/11/getting-a-modern-terminal-setup/",
      "published": null,
      "updated": "2025-01-11T09:46:01.000Z",
      "content": "<p>Hello! Recently I ran a terminal survey and I asked people what frustrated\nthem. One person commented:</p>\n<blockquote>\n<p>There are so many pieces to having a modern terminal experience. I wish it\nall came out of the box.</p>\n</blockquote>\n<p>My immediate reaction was “oh, getting a modern terminal experience isn’t that\nhard, you just need to….”, but the more I thought about it, the longer the\n“you just need to…” list got, and I kept thinking about more and more\ncaveats.</p>\n<p>So I thought I would write down some notes about what it means to me personally\nto have a “modern” terminal experience and what I think can make it hard for\npeople to get there.</p>\n<h3 id=\"what-is-a-modern-terminal-experience\">what is a “modern terminal experience”?</h3>\n<p>Here are a few things that are important to me, with which part of the system\nis responsible for them:</p>\n<ul>\n<li><strong>multiline support for copy and paste</strong>: if you paste 3 commands in your shell, it should not immediately run them all! That’s scary! (<strong>shell</strong>, <strong>terminal emulator</strong>)</li>\n<li><strong>infinite shell history</strong>: if I run a command in my shell, it should be saved forever, not deleted after 500 history entries or whatever. 
Also I want commands to be saved to the history immediately when I run them, not only when I exit the shell session (<strong>shell</strong>)</li>\n<li><strong>a useful prompt</strong>: I can’t live without having my <strong>current directory</strong> and <strong>current git branch</strong> in my prompt (<strong>shell</strong>)</li>\n<li><strong>24-bit colour</strong>: this is important to me because I find it MUCH easier to theme neovim with 24-bit colour support than in a terminal with only 256 colours (<strong>terminal emulator</strong>)</li>\n<li><strong>clipboard integration</strong> between vim and my operating system so that when I copy in Firefox, I can just press <code>p</code> in vim to paste (<strong>text editor</strong>, maybe the OS/terminal emulator too)</li>\n<li><strong>good autocomplete</strong>: for example commands like git should have command-specific autocomplete (<strong>shell</strong>)</li>\n<li><strong>having colours in <code>ls</code></strong> (<strong>shell config</strong>)</li>\n<li><strong>a terminal theme I like</strong>: I spend a lot of time in my terminal, I want it to look nice and I want its theme to match my terminal editor’s theme. 
(<strong>terminal emulator</strong>, <strong>text editor</strong>)</li>\n<li><strong>automatic terminal fixing</strong>: If a program prints out some weird escape\ncodes that mess up my terminal, I want that to automatically get reset so\nthat my terminal doesn’t get messed up (<strong>shell</strong>)</li>\n<li><strong>keybindings</strong>: I want <code>Ctrl+left arrow</code> to work (<strong>shell</strong> or <strong>application</strong>)</li>\n<li><strong>being able to use the scroll wheel in programs like <code>less</code></strong>: (<strong>terminal emulator</strong> and <strong>applications</strong>)</li>\n</ul>\n<p>There are a million other terminal conveniences out there and different people\nvalue different things, but those are the ones that I would be really unhappy\nwithout.</p>\n<h3 id=\"how-i-achieve-a-modern-experience\">how I achieve a “modern experience”</h3>\n<p>My basic approach is:</p>\n<ol>\n<li>use the <code>fish</code> shell. Mostly don’t configure it, except to:\n<ul>\n<li>set the <code>EDITOR</code> environment variable to my favourite terminal editor</li>\n<li>alias <code>ls</code> to <code>ls --color=auto</code></li>\n</ul>\n</li>\n<li>use any terminal emulator with 24-bit colour support. In the past I’ve used\nGNOME Terminal, Terminator, and iTerm, but I’m not picky about this. 
I don’t really\nconfigure it other than to choose a font.</li>\n<li>use <code>neovim</code>, with a configuration that I’ve been very slowly building over the last 9 years or so (the last time I deleted my vim config and started from scratch was 9 years ago)</li>\n<li>use the <a href=\"https://github.com/chriskempson/base16\">base16 framework</a> to theme everything</li>\n</ol>\n<p>A few things that affect my approach:</p>\n<ul>\n<li>I don’t spend a lot of time SSHed into other machines</li>\n<li>I’d rather use the mouse a little than come up with keyboard-based ways to do everything</li>\n<li>I work on a lot of small projects, not one big project</li>\n</ul>\n<h3 id=\"some-out-of-the-box-options-for-a-modern-experience\">some “out of the box” options for a “modern” experience</h3>\n<p>What if you want a nice experience, but don’t want to spend a lot of time on\nconfiguration? Figuring out how to configure vim in a way that I was satisfied\nwith really did take me like ten years, which is a long time!</p>\n<p>My best ideas for how to get a reasonable terminal experience with minimal\nconfig are:</p>\n<ul>\n<li>shell: either <code>fish</code> or <code>zsh</code> with <a href=\"https://ohmyz.sh/\">oh-my-zsh</a></li>\n<li>terminal emulator: almost anything with 24-bit colour support, for example all of these are popular:\n<ul>\n<li>linux: GNOME Terminal, Konsole, Terminator, xfce4-terminal</li>\n<li>mac: iTerm (Terminal.app doesn’t have 24-bit colour support)</li>\n<li>cross-platform: kitty, alacritty, wezterm, or ghostty</li>\n</ul>\n</li>\n<li>shell config:\n<ul>\n<li>set the <code>EDITOR</code> environment variable to your favourite terminal text\neditor</li>\n<li>maybe alias <code>ls</code> to <code>ls --color=auto</code></li>\n</ul>\n</li>\n<li>text editor: this is a tough one, maybe <a href=\"https://micro-editor.github.io/\">micro</a> or <a href=\"https://helix-editor.com/\">helix</a>? 
I haven’t used\neither of them seriously but they both seem like very cool projects and I\nthink it’s amazing that you can just use all the usual GUI editor commands\n(<code>Ctrl-C</code> to copy, <code>Ctrl-V</code> to paste, <code>Ctrl-A</code> to select all) in micro and\nthey do what you’d expect. I would probably try switching to helix except\nthat retraining my vim muscle memory seems way too hard. Also helix doesn’t\nhave a GUI or plugin system yet.</li>\n</ul>\n<p>Personally I <strong>wouldn’t</strong> use xterm, rxvt, or Terminal.app as a terminal emulator,\nbecause I’ve found in the past that they’re missing core features (like 24-bit\ncolour in Terminal.app’s case) that make the terminal harder to use for me.</p>\n<p>I don’t want to pretend that getting a “modern” terminal experience is easier\nthan it is though – I think there are two issues that make it hard. Let’s talk\nabout them!</p>\n<h3 id=\"issue-1-with-getting-to-a-modern-experience-the-shell\">issue 1 with getting to a “modern” experience: the shell</h3>\n<p>bash and zsh are by far the two most popular shells, and neither of them\nprovide a default experience that I would be happy using out of the box, for\nexample:</p>\n<ul>\n<li>you need to customize your prompt</li>\n<li>they don’t come with git completions by default, you have to set them up</li>\n<li>by default, bash only stores 500 (!) 
lines of history and (at least on Mac OS)\nzsh is only configured to store 2000 lines, which is still not a lot</li>\n<li>I find bash’s tab completion very frustrating, if there’s more than\none match then you can’t tab through them</li>\n</ul>\n<p>And even though <a href=\"https://jvns.ca/blog/2024/09/12/reasons-i--still--love-fish/\">I love fish</a>, the fact\nthat it isn’t POSIX does make it hard for a lot of folks to make the switch.</p>\n<p>Of course it’s totally possible to learn how to customize your prompt in bash\nor whatever, and it doesn’t even need to be that complicated (in bash I’d\nprobably start with something like <code>export PS1='[\\u@\\h \\W$(__git_ps1 \" (%s)\")]\\$ '</code>, or maybe use <a href=\"https://starship.rs/\">starship</a>).\nBut each of these “not complicated” things really does add up and it’s\nespecially tough if you need to keep your config in sync across several\nsystems.</p>\n<p>An extremely popular solution to getting a “modern” shell experience is\n<a href=\"https://ohmyz.sh/\">oh-my-zsh</a>. It seems like a great project and I know a lot\nof people use it very happily, but I’ve struggled with configuration systems\nlike that in the past – it looks like right now the base oh-my-zsh adds about\n3000 lines of config, and often I find that having an extra configuration\nsystem makes it harder to debug what’s happening when things go wrong. 
I\npersonally have a tendency to use the system to add a lot of extra plugins,\nmake my system slow, get frustrated that it’s slow, and then delete it\ncompletely and write a new config from scratch.</p>\n<h3 id=\"issue-2-with-getting-to-a-modern-experience-the-text-editor\">issue 2 with getting to a “modern” experience: the text editor</h3>\n<p>In the terminal survey I ran recently, the most popular terminal text editors\nby far were <code>vim</code>, <code>emacs</code>, and <code>nano</code>.</p>\n<p>I think the main options for terminal text editors are:</p>\n<ul>\n<li>use vim or emacs and configure it to your liking, you can probably have any\nfeature you want if you put in the work</li>\n<li>use nano and accept that you’re going to have a pretty limited experience\n(for example I don’t think you can select text with the mouse and then “cut”\nit in nano)</li>\n<li>use <code>micro</code> or <code>helix</code> which seem to offer a pretty good out-of-the-box\nexperience, potentially occasionally run into issues with using a less\nmainstream text editor</li>\n<li>just avoid using a terminal text editor as much as possible, maybe use VSCode, use\nVSCode’s terminal for all your terminal needs, and mostly never edit files in\nthe terminal. Or I know a lot of people use <code>code</code> as their <code>EDITOR</code> in the terminal.</li>\n</ul>\n<h3 id=\"issue-3-individual-applications\">issue 3: individual applications</h3>\n<p>The last issue is that sometimes individual programs that I use are kind of\nannoying. For example on my Mac OS machine, <code>/usr/bin/sqlite3</code> doesn’t support\nthe <code>Ctrl+Left Arrow</code> keyboard shortcut. 
Fixing this to get a reasonable\nterminal experience in SQLite was a little complicated, I had to:</p>\n<ul>\n<li>realize why this is happening (Mac OS won’t ship GNU tools, and “Ctrl-Left arrow” support comes from GNU readline)</li>\n<li>find a workaround (install sqlite from homebrew, which does have readline support)</li>\n<li>adjust my environment (put Homebrew’s sqlite3 in my PATH)</li>\n</ul>\n<p>I find that debugging application-specific issues like this is really not easy\nand often it doesn’t feel “worth it” – often I’ll end up just dealing with\nvarious minor inconveniences because I don’t want to spend hours investigating\nthem. The only reason I was even able to figure this one out at all is that\nI’ve been spending a huge amount of time thinking about the terminal recently.</p>\n<p>A big part of having a “modern” experience using terminal programs is just\nusing newer terminal programs, for example I can’t be bothered to learn a\nkeyboard shortcut to sort the columns in <code>top</code>, but in <code>htop</code>  I can just click\non a column heading with my mouse to sort it. So I use htop instead! But discovering new more “modern” command line tools isn’t easy (though\nI made <a href=\"https://jvns.ca/blog/2022/04/12/a-list-of-new-ish--command-line-tools/\">a list here</a>),\nfinding ones that I actually like using in practice takes time, and if you’re\nSSHed into another machine, they won’t always be there.</p>\n<h3 id=\"everything-affects-everything-else\">everything affects everything else</h3>\n<p>Something I find tricky about configuring my terminal to make everything “nice”\nis that changing one seemingly small thing about my workflow can really affect\neverything else. For example right now I don’t use tmux. 
But if I needed to use\ntmux again (for example because I was doing a lot of work SSHed into another\nmachine), I’d need to think about a few things, like:</p>\n<ul>\n<li>if I wanted tmux’s copy to synchronize with my system clipboard over\nSSH, I’d need to make sure that my terminal emulator has <a href=\"https://old.reddit.com/r/vim/comments/k1ydpn/a_guide_on_how_to_copy_text_from_anywhere/\">OSC 52 support</a></li>\n<li>if I wanted to use iTerm’s tmux integration (which makes tmux tabs into iTerm\ntabs), I’d need to change how I configure colours – right now I set them\nwith a <a href=\"https://github.com/chriskempson/base16-shell/blob/588691ba71b47e75793ed9edfcfaa058326a6f41/scripts/base16-solarized-light.sh\">shell script</a> that I run when my shell starts, but that means the\ncolours get lost when restoring a tmux session.</li>\n</ul>\n<p>and probably more things I haven’t thought of. “Using tmux means that I have to\nchange how I manage my colours” sounds unlikely, but that really did happen to\nme and I decided “well, I don’t want to change how I manage colours right now,\nso I guess I’m not using that feature!”.</p>\n<p>It’s also hard to remember which features I’m relying on – for example maybe\nmy current terminal <em>does</em> have OSC 52 support and because copying from tmux over SSH\nhas always Just Worked I don’t even realize that that’s something I need, and\nthen it mysteriously stops working when I switch terminals.</p>\n<h3 id=\"change-things-slowly\">change things slowly</h3>\n<p>Personally even though I think my setup is not <em>that</em> complicated, it’s taken\nme 20 years to get to this point! 
Because terminal config changes are so likely\nto have unexpected and hard-to-understand consequences, I’ve found that if I\nchange a lot of terminal configuration all at once it makes it much harder to\nunderstand what went wrong if there’s a problem, which can be really\ndisorienting.</p>\n<p>So I usually prefer to make pretty small changes, and accept that changes\nmight take me a REALLY long time to get used to. For example I switched from\nusing <code>ls</code> to <a href=\"https://github.com/eza-community/eza\">eza</a> a year or two ago and\nwhile I like it (because <code>eza -l</code> prints human-readable file sizes by default)\nI’m still not quite sure about it. But also sometimes it’s worth it to make a\nbig change, like I made the switch to fish (from bash) 10 years ago and I’m\nvery happy I did.</p>\n<h3 id=\"getting-a-modern-terminal-is-not-that-easy\">getting a “modern” terminal is not that easy</h3>\n<p>Trying to explain how “easy” it is to configure your terminal really just made\nme think that it’s kind of hard and that I still sometimes get confused.</p>\n<p>I’ve found that there’s never one perfect way to configure things in the\nterminal that will be compatible with every single other thing. I just need to\ntry stuff, figure out some kind of locally stable state that works for me, and\naccept that if I start using a new tool it might disrupt the system and I might\nneed to rethink things.</p>",
      "image": null,
      "media": [],
      "authors": [
        {
          "name": "Julia Evans",
          "email": null,
          "url": null
        }
      ],
      "categories": []
    },
    {
      "id": "https://jvns.ca/blog/2024/11/26/terminal-rules/",
      "title": "\"Rules\" that terminal programs follow",
      "description": null,
      "url": "https://jvns.ca/blog/2024/11/26/terminal-rules/",
      "published": null,
      "updated": "2024-12-12T09:28:22.000Z",
      "content": "<p>Recently I’ve been thinking about how everything that happens in the terminal\nis some combination of:</p>\n<ol>\n<li>Your <strong>operating system</strong>’s job</li>\n<li>Your <strong>shell</strong>’s job</li>\n<li>Your <strong>terminal emulator</strong>’s job</li>\n<li>The job of <strong>whatever program you happen to be running</strong> (like <code>top</code> or <code>vim</code> or <code>cat</code>)</li>\n</ol>\n<p>The first three (your operating system, shell, and terminal emulator) are all kind of\nknown quantities – if you’re using bash in GNOME Terminal on Linux, you can\nmore or less reason about how how all of those things interact, and some of\ntheir behaviour is standardized by POSIX.</p>\n<p>But the fourth one (“whatever program you happen to be running”) feels like it\ncould do ANYTHING. How are you supposed to know how a program is going to\nbehave?</p>\n<p>This post is kind of long so here’s a quick table of contents:</p>\n<ul>\n<li><a href=\"#programs-behave-surprisingly-consistently\">programs behave surprisingly consistently</a></li>\n<li><a href=\"#these-are-meant-to-be-descriptive-not-prescriptive\">these are meant to be descriptive, not prescriptive</a></li>\n<li><a href=\"#it-s-not-always-obvious-which-rules-are-the-program-s-responsibility-to-implement\">it’s not always obvious which “rules” are the program’s responsibility to implement</a></li>\n<li><a href=\"#rule-1-noninteractive-programs-should-quit-when-you-press-ctrl-c\">rule 1: noninteractive programs should quit when you press <code>Ctrl-C</code></a></li>\n<li><a href=\"#rule-2-tuis-should-quit-when-you-press-q\">rule 2: TUIs should quit when you press <code>q</code></a></li>\n<li><a href=\"#rule-3-repls-should-quit-when-you-press-ctrl-d-on-an-empty-line\">rule 3: REPLs should quit when you press <code>Ctrl-D</code> on an empty line</a></li>\n<li><a href=\"#rule-4-don-t-use-more-than-16-colours\">rule 4: don’t use more than 16 colours</a></li>\n<li><a 
href=\"#rule-5-vaguely-support-readline-keybindings\">rule 5: vaguely support readline keybindings</a></li>\n<li><a href=\"#rule-5-1-ctrl-w-should-delete-the-last-word\">rule 5.1: <code>Ctrl-W</code> should delete the last word</a></li>\n<li><a href=\"#rule-6-disable-colours-when-writing-to-a-pipe\">rule 6: disable colours when writing to a pipe</a></li>\n<li><a href=\"#rule-7-means-stdin-stdout\">rule 7: <code>-</code> means stdin/stdout</a></li>\n<li><a href=\"#these-rules-take-a-long-time-to-learn\">these “rules” take a long time to learn</a></li>\n</ul>\n<h3 id=\"programs-behave-surprisingly-consistently\">programs behave surprisingly consistently</h3>\n<p>As far as I know, there are no real standards for how programs in the terminal\nshould behave – the closest things I know of are:</p>\n<ul>\n<li>POSIX, which mostly dictates how your terminal emulator / OS / shell should\nwork together. I think it does specify a few things about how core utilities like\n<code>cp</code> should work but AFAIK it doesn’t have anything to say about how for\nexample <code>htop</code> should behave.</li>\n<li>these <a href=\"https://clig.dev/\">command line interface guidelines</a></li>\n</ul>\n<p>But even though there are no standards, in my experience programs in the\nterminal behave in a pretty consistent way. So I wanted to write down a list of\n“rules” that in my experience programs mostly follow.</p>\n<h3 id=\"these-are-meant-to-be-descriptive-not-prescriptive\">these are meant to be descriptive, not prescriptive</h3>\n<p>My goal here isn’t to convince authors of terminal programs that they <em>should</em>\nfollow any of these rules. There are lots of exceptions to these and often\nthere’s a good reason for those exceptions.</p>\n<p>But it’s very useful for me to know what behaviour to expect from a random new\nterminal program that I’m using. 
Instead of “uh, programs could do literally\nanything”, it’s “ok, here are the basic rules I expect, and then I can keep a\nshort mental list of exceptions”.</p>\n<p>So I’m just writing down what I’ve observed about how programs behave in my 20\nyears of using the terminal, why I think they behave that way, and some\nexamples of cases where that rule is “broken”.</p>\n<h3 id=\"it-s-not-always-obvious-which-rules-are-the-program-s-responsibility-to-implement\">it’s not always obvious which “rules” are the program’s responsibility to implement</h3>\n<p>There are a bunch of common conventions that I think are pretty clearly the\nprogram’s responsibility to implement, like:</p>\n<ul>\n<li>config files should go in <code>~/.BLAHrc</code> or <code>~/.config/BLAH/FILE</code> or <code>/etc/BLAH/</code> or something</li>\n<li><code>--help</code> should print help text</li>\n<li>programs should print “regular” output to stdout and errors to stderr</li>\n</ul>\n<p>But in this post I’m going to focus on things that it’s not 100% obvious are\nthe program’s responsibility. For example it feels to me like a “law of nature”\nthat pressing <code>Ctrl-D</code> should quit a REPL, but programs often\nneed to explicitly implement support for it – even though <code>cat</code> doesn’t need\nto implement <code>Ctrl-D</code> support, <code>ipython</code> <a href=\"https://github.com/prompt-toolkit/python-prompt-toolkit/blob/a2a12300c635ab3c051566e363ed27d853af4b21/src/prompt_toolkit/shortcuts/prompt.py#L824-L837\">does</a>. 
(more about that in “rule 3” below)</p>\n<p>Understanding which things are the program’s responsibility makes it much less\nsurprising when different programs’ implementations are slightly different.</p>\n<h3 id=\"rule-1-noninteractive-programs-should-quit-when-you-press-ctrl-c\">rule 1: noninteractive programs should quit when you press <code>Ctrl-C</code></h3>\n<p>The main reason for this rule is that noninteractive programs will quit by\ndefault on <code>Ctrl-C</code> if they don’t set up a <code>SIGINT</code> signal handler, so this is\nkind of a “you should act like the default” rule.</p>\n<p>Something that trips a lot of people up is that this doesn’t apply to\n<strong>interactive</strong> programs like <code>python3</code> or <code>bc</code> or <code>less</code>. This is because in\nan interactive program, <code>Ctrl-C</code> has a different job – if the program is\nrunning an operation (like for example a search in <code>less</code> or some Python code\nin <code>python3</code>), then <code>Ctrl-C</code> will interrupt that operation but not stop the\nprogram.</p>\n<p>As an example of how this works in an interactive program: here’s the code <a href=\"https://github.com/prompt-toolkit/python-prompt-toolkit/blob/a2a12300c635ab3c051566e363ed27d853af4b21/src/prompt_toolkit/key_binding/bindings/vi.py#L2225\">in prompt-toolkit</a> (the library that iPython uses for handling input)\nthat aborts a search when you press <code>Ctrl-C</code>.</p>\n<h3 id=\"rule-2-tuis-should-quit-when-you-press-q\">rule 2: TUIs should quit when you press <code>q</code></h3>\n<p>TUI programs (like <code>less</code> or <code>htop</code>) will usually quit when you press <code>q</code>.</p>\n<p>This rule doesn’t apply to any program where pressing <code>q</code> to quit wouldn’t make\nsense, like <code>tmux</code> or text editors.</p>\n<h3 id=\"rule-3-repls-should-quit-when-you-press-ctrl-d-on-an-empty-line\">rule 3: REPLs should quit when you press <code>Ctrl-D</code> on an empty 
line</h3>\n<p>REPLs (like <code>python3</code> or <code>ed</code>) will usually quit when you press <code>Ctrl-D</code> on an\nempty line. This rule is similar to the <code>Ctrl-C</code> rule – the reason for this is\nthat by default if you’re running a program (like <code>cat</code>) in “cooked mode”, then\nthe operating system will return an <code>EOF</code> when you press <code>Ctrl-D</code> on an empty\nline.</p>\n<p>Most of the REPLs I use (sqlite3, python3, fish, bash, etc) don’t actually use\ncooked mode, but they all implement this keyboard shortcut anyway to mimic the\ndefault behaviour.</p>\n<p>For example, here’s <a href=\"https://github.com/prompt-toolkit/python-prompt-toolkit/blob/a2a12300c635ab3c051566e363ed27d853af4b21/src/prompt_toolkit/shortcuts/prompt.py#L824-L837\">the code in prompt-toolkit</a>\nthat quits when you press Ctrl-D, and here’s <a href=\"https://github.com/bminor/bash/blob/6794b5478f660256a1023712b5fc169196ed0a22/lib/readline/readline.c#L658-L672\">the same code in readline</a>.</p>\n<p>I actually thought that this one was a “Law of Terminal Physics” until very\nrecently because I’ve basically never seen it broken, but you can see that it’s\njust something that each individual input library has to implement in the links\nabove.</p>\n<p>Someone pointed out that the Erlang REPL does not quit when you press <code>Ctrl-D</code>,\nso I guess not every REPL follows this “rule”.</p>\n<h3 id=\"rule-4-don-t-use-more-than-16-colours\">rule 4: don’t use more than 16 colours</h3>\n<p>Terminal programs rarely use colours other than the base 16 ANSI colours. This\nis because if you specify colours with a hex code, it’s very likely to clash\nwith some users’ background colour. 
For example if I print out some text as\n<code>#EEEEEE</code>, it would be almost invisible on a white background, though it would\nlook fine on a dark background.</p>\n<p>But if you stick to the default 16 base colours, you have a much better chance\nthat the user has configured those colours in their terminal emulator so that\nthey work reasonably well with their background colour. Another reason to stick\nto the default base 16 colours is that it makes fewer assumptions about what\ncolours the terminal emulator supports.</p>\n<p>The only programs I usually see breaking this “rule” are text editors, for\nexample Helix by default will use a purple background which is not a default\nANSI colour. It seems fine for Helix to break this rule since Helix isn’t a\n“core” program and I assume any Helix user who doesn’t like that colorscheme\nwill just change the theme.</p>\n<h3 id=\"rule-5-vaguely-support-readline-keybindings\">rule 5: vaguely support readline keybindings</h3>\n<p>Almost every program I use supports <code>readline</code> keybindings if it would make\nsense to do so. 
For example, here are a bunch of different programs and a link\nto where they define <code>Ctrl-E</code> to go to the end of the line:</p>\n<ul>\n<li>ipython (<a href=\"https://github.com/prompt-toolkit/python-prompt-toolkit/blob/a2a12300c635ab3c051566e363ed27d853af4b21/src/prompt_toolkit/key_binding/bindings/emacs.py#L72\">Ctrl-E defined here</a>)</li>\n<li>atuin (<a href=\"https://github.com/atuinsh/atuin/blob/a67cfc82fe0dc907a01f07a0fd625701e062a33b/crates/atuin/src/command/client/search/interactive.rs#L407\">Ctrl-E defined here</a>)</li>\n<li>fzf (<a href=\"https://github.com/junegunn/fzf/blob/bb55045596d6d08445f3c6d320c3ec2b457462d1/src/terminal.go#L611\">Ctrl-E defined here</a>)</li>\n<li>zsh (<a href=\"https://github.com/zsh-users/zsh/blob/86d5f24a3d28541f242eb3807379301ea976de87/Src/Zle/zle_bindings.c#L94\">Ctrl-E defined here</a>)</li>\n<li>fish (<a href=\"https://github.com/fish-shell/fish-shell/blob/99fa8aaaa7956178973150a03ce4954ab17a197b/share/functions/fish_default_key_bindings.fish#L43\">Ctrl-E defined here</a>)</li>\n<li>tmux’s command prompt (<a href=\"https://github.com/tmux/tmux/blob/ae8f2208c98e3c2d6e3fe4cad2281dce8fd11f31/key-bindings.c#L490\">Ctrl-E defined here</a>)</li>\n</ul>\n<p>None of those programs actually uses <code>readline</code> directly, they just sort of\nmimic emacs/readline keybindings. 
They don’t always mimic them <em>exactly</em>: for\nexample atuin seems to use <code>Ctrl-A</code> as a prefix, so <code>Ctrl-A</code> doesn’t go to the\nbeginning of the line.</p>\n<p>Also all of these programs seem to implement their own internal cut and paste\nbuffers so you can delete a line with <code>Ctrl-U</code> and then paste it with <code>Ctrl-Y</code>.</p>\n<p>The exceptions to this are:</p>\n<ul>\n<li>some programs (like <code>git</code>, <code>cat</code>, and <code>nc</code>) don’t have any line editing support at all (except for backspace, <code>Ctrl-W</code>, and <code>Ctrl-U</code>)</li>\n<li>as usual text editors are an exception, every text editor has its own\napproach to editing text</li>\n</ul>\n<p>I wrote more about this “what keybindings does a program support?” question in\n<a href=\"https://jvns.ca/blog/2024/07/08/readline/\">entering text in the terminal is complicated</a>.</p>\n<h3 id=\"rule-5-1-ctrl-w-should-delete-the-last-word\">rule 5.1: Ctrl-W should delete the last word</h3>\n<p>I’ve never seen a program (other than a text editor) where <code>Ctrl-W</code> <em>doesn’t</em>\ndelete the last word. This is similar to the <code>Ctrl-C</code> rule – by default if a\nprogram is in “cooked mode”, the OS will delete the last word if you press\n<code>Ctrl-W</code>, and delete the whole line if you press <code>Ctrl-U</code>. So usually programs\nwill imitate that behaviour.</p>\n<p>I can’t think of any exceptions to this other than text editors but if there\nare I’d love to hear about them!</p>\n<h3 id=\"rule-6-disable-colours-when-writing-to-a-pipe\">rule 6: disable colours when writing to a pipe</h3>\n<p>Most programs will disable colours when writing to a pipe. 
For example:</p>\n<ul>\n<li><code>rg blah</code> will highlight all occurrences of <code>blah</code> in the output, but if the\noutput is to a pipe or a file, it’ll turn off the highlighting.</li>\n<li><code>ls --color=auto</code> will use colour when writing to a terminal, but not when\nwriting to a pipe</li>\n</ul>\n<p>Both of those programs will also format their output differently when writing\nto the terminal: <code>ls</code> will organize files into columns, and ripgrep will group\nmatches with headings.</p>\n<p>If you want to force the program to use colour (for example because you want to\nlook at the colour), you can use <code>unbuffer</code> to force the program’s output to be\na tty like this:</p>\n<pre><code>unbuffer rg blah | less -R\n</code></pre>\n<p>I’m sure that there are some programs that “break” this rule but I can’t think\nof any examples right now. Some programs have a <code>--color</code> flag that you can\nuse to force colour to be on; in the example above you could also do <code>rg --color=always blah | less -R</code>.</p>\n<h3 id=\"rule-7-means-stdin-stdout\">rule 7: <code>-</code> means stdin/stdout</h3>\n<p>Usually if you pass <code>-</code> to a program instead of a filename, it’ll read from\nstdin or write to stdout (whichever is appropriate). 
For example, if you want\nto format the Python code that’s on your clipboard with <code>black</code> and then copy\nit, you could run:</p>\n<pre><code>pbpaste | black - | pbcopy\n</code></pre>\n<p>(<code>pbpaste</code> is a Mac program, you can do something similar on Linux with <code>xclip</code>)</p>\n<p>My impression is that most programs implement this if it would make sense and I\ncan’t think of any exceptions right now, but I’m sure there are many\nexceptions.</p>\n<h3 id=\"these-rules-take-a-long-time-to-learn\">these “rules” take a long time to learn</h3>\n<p>These rules took me a long time to learn because I had to:</p>\n<ol>\n<li>learn that the rule applied anywhere at all (\"<code>Ctrl-C</code> will exit programs\")</li>\n<li>notice some exceptions (“okay, <code>Ctrl-C</code> will exit <code>find</code> but not <code>less</code>”)</li>\n<li>subconsciously figure out what the pattern is (\"<code>Ctrl-C</code> will generally quit\nnoninteractive programs, but in interactive programs it might interrupt the\ncurrent operation instead of quitting the program\")</li>\n<li>eventually maybe formulate it into an explicit rule that I know</li>\n</ol>\n<p>A lot of my understanding of the terminal is honestly still in the\n“subconscious pattern recognition” stage. The only reason I’ve been taking the\ntime to make things explicit at all is because I’ve been trying to explain how\nit works to others. Hopefully writing down these “rules” explicitly will make\nlearning some of this stuff a little bit faster for others.</p>",
      "image": null,
      "media": [],
      "authors": [
        {
          "name": "Julia Evans",
          "email": null,
          "url": null
        }
      ],
      "categories": []
    },
    {
      "id": "https://jvns.ca/blog/2024/11/29/why-pipes-get-stuck-buffering/",
      "title": "Why pipes sometimes get \"stuck\": buffering",
      "description": null,
      "url": "https://jvns.ca/blog/2024/11/29/why-pipes-get-stuck-buffering/",
      "published": null,
      "updated": "2024-11-29T08:23:31.000Z",
      "content": "<p>Here’s a niche terminal problem that has bothered me for years but that I never\nreally understood until a few weeks ago. Let’s say you’re running this command\nto watch for some specific output in a log file:</p>\n<pre><code>tail -f /some/log/file | grep thing1 | grep thing2\n</code></pre>\n<p>If log lines are being added to the file relatively slowly, the result I’d see\nis… nothing! It doesn’t matter if there were matches in the log file or not,\nthere just wouldn’t be any output.</p>\n<p>I internalized this as “uh, I guess pipes just get stuck sometimes and don’t\nshow me the output, that’s weird”, and I’d handle it by just\nrunning <code>grep thing1 /some/log/file | grep thing2</code> instead, which would work.</p>\n<p>So as I’ve been doing a terminal deep dive over the last few months I was\nreally excited to finally learn exactly why this happens.</p>\n<h3 id=\"why-this-happens-buffering\">why this happens: buffering</h3>\n<p>The reason why “pipes get stuck” sometimes is that it’s VERY common for\nprograms to buffer their output before writing it to a pipe or file. 
So the\npipe is working fine, the problem is that the program never even wrote the data\nto the pipe!</p>\n<p>This is for performance reasons: writing all output immediately as soon as you\ncan uses more system calls, so it’s more efficient to save up data until you\nhave 8KB or so of data to write (or until the program exits) and THEN write it\nto the pipe.</p>\n<p>In this example:</p>\n<pre><code>tail -f /some/log/file | grep thing1 | grep thing2\n</code></pre>\n<p>the problem is that <code>grep thing1</code> is saving up all of its matches until it has\n8KB of data to write, which might literally never happen.</p>\n<h3 id=\"programs-don-t-buffer-when-writing-to-a-terminal\">programs don’t buffer when writing to a terminal</h3>\n<p>Part of why I found this so disorienting is that <code>tail -f file | grep thing</code>\nwill work totally fine, but then when you add the second <code>grep</code>, it stops\nworking!! The reason for this is that the way <code>grep</code> handles buffering depends\non whether it’s writing to a terminal or not.</p>\n<p>Here’s how <code>grep</code> (and many other programs) decides to buffer its output:</p>\n<ul>\n<li>Check if stdout is a terminal or not using the <code>isatty</code> function\n<ul>\n<li>If it’s a terminal, use line buffering (print every line immediately as soon as you have it)</li>\n<li>Otherwise, use “block buffering” – only print data if you have at least 8KB or so of data to print</li>\n</ul>\n</li>\n</ul>\n<p>So if <code>grep</code> is writing directly to your terminal then you’ll see the line as\nsoon as it’s printed, but if it’s writing to a pipe, you won’t.</p>\n<p>Of course the buffer size isn’t always 8KB for every program, it depends on the implementation. For <code>grep</code> the buffering is handled by libc, and libc’s buffer size is\ndefined in the <code>BUFSIZ</code> variable. 
<a href=\"https://github.com/bminor/glibc/blob/c69e8cccaff8f2d89cee43202623b33e6ef5d24a/libio/stdio.h#L100\">Here’s where that’s defined in glibc</a>.</p>\n<p>(as an aside: “programs do not use 8KB output buffers when writing to a\nterminal” isn’t, like, a law of terminal physics, a program COULD use an 8KB\nbuffer when writing output to a terminal if it wanted, it would just be\nextremely weird if it did that, I can’t think of any program that behaves that\nway)</p>\n<h3 id=\"commands-that-buffer-commands-that-don-t\">commands that buffer & commands that don’t</h3>\n<p>One annoying thing about this buffering behaviour is that you kind of need to\nremember which commands buffer their output when writing to a pipe.</p>\n<p>Some commands that <strong>don’t</strong> buffer their output:</p>\n<ul>\n<li>tail</li>\n<li>cat</li>\n<li>tee</li>\n</ul>\n<p>I think almost everything else will buffer output, especially if it’s a command\nwhere you’re likely to be using it for batch processing. Here’s a list of some\ncommon commands that buffer their output when writing to a pipe, along with the\nflag that disables block buffering.</p>\n<ul>\n<li>grep (<code>--line-buffered</code>)</li>\n<li>sed (<code>-u</code>)</li>\n<li>awk (there’s a <code>fflush()</code> function)</li>\n<li>tcpdump (<code>-l</code>)</li>\n<li>jq (<code>-u</code>)</li>\n<li>tr (<code>-u</code>)</li>\n<li>cut (can’t disable buffering)</li>\n</ul>\n<p>Those are all the ones I can think of, lots of unix commands (like <code>sort</code>) may\nor may not buffer their output but it doesn’t matter because <code>sort</code> can’t do\nanything until it finishes receiving input anyway.</p>\n<p>Also I did my best to test both the Mac OS and GNU versions of these but there\nare a lot of variations and I might have made some mistakes.</p>\n<h3 id=\"programming-languages-where-the-default-print-statement-buffers\">programming languages where the default “print” statement buffers</h3>\n<p>Also, here are a few programming 
languages where the default print statement\nwill buffer output when writing to a pipe, and some ways to disable buffering\nif you want:</p>\n<ul>\n<li>C (disable with <code>setvbuf</code>)</li>\n<li>Python (disable with <code>python -u</code>, or <code>PYTHONUNBUFFERED=1</code>, or <code>sys.stdout.reconfigure(line_buffering=True)</code>, or <code>print(x, flush=True)</code>)</li>\n<li>Ruby (disable with <code>STDOUT.sync = true</code>)</li>\n<li>Perl (disable with <code>$| = 1</code>)</li>\n</ul>\n<p>I assume that these languages are designed this way so that the default print\nfunction will be fast when you’re doing batch processing.</p>\n<p>Also whether output is buffered or not might depend on how you print, for\nexample in C++ <code>cout << \"hello\\n\"</code> buffers when writing to a pipe but <code>cout << \"hello\" << endl</code> will flush its output.</p>\n<h3 id=\"when-you-press-ctrl-c-on-a-pipe-the-contents-of-the-buffer-are-lost\">when you press <code>Ctrl-C</code> on a pipe, the contents of the buffer are lost</h3>\n<p>Let’s say you’re running this command as a hacky way to watch for DNS requests\nto <code>example.com</code>, and you forgot to pass <code>-l</code> to tcpdump:</p>\n<pre><code>sudo tcpdump -ni any port 53 | grep example.com\n</code></pre>\n<p>When you press <code>Ctrl-C</code>, what happens? 
In a magical perfect world, what I would\n<em>want</em> to happen is for <code>tcpdump</code> to flush its buffer, <code>grep</code> would search for\n<code>example.com</code>, and I would see all the output I missed.</p>\n<p>But in the real world, what happens is that all the programs get killed and the\noutput in <code>tcpdump</code>’s buffer is lost.</p>\n<p>I think this problem is probably unavoidable – I spent a little time with\n<code>strace</code> to see how this works and <code>grep</code> receives the <code>SIGINT</code> before\n<code>tcpdump</code> anyway so even if <code>tcpdump</code> tried to flush its buffer <code>grep</code> would\nalready be dead.</p>\n<small>\n<p>After a little more investigation, there is a workaround: if you find\n<code>tcpdump</code>’s PID and <code>kill -TERM $PID</code>, then tcpdump will flush the buffer so\nyou can see the output. That’s kind of a pain but I tested it and it seems to\nwork.</p>\n</small>\n<h3 id=\"redirecting-to-a-file-also-buffers\">redirecting to a file also buffers</h3>\n<p>It’s not just pipes, this will also buffer:</p>\n<pre><code>sudo tcpdump -ni any port 53 > output.txt\n</code></pre>\n<p>Redirecting to a file doesn’t have the same “<code>Ctrl-C</code> will totally destroy the\ncontents of the buffer” problem though – in my experience it usually behaves\nmore like you’d want, where the contents of the buffer get written to the file\nbefore the program exits. I’m not 100% sure whether this is something you can\nalways rely on or not.</p>\n<h3 id=\"a-bunch-of-potential-ways-to-avoid-buffering\">a bunch of potential ways to avoid buffering</h3>\n<p>Okay, let’s talk solutions. Let’s say you’ve run this command:</p>\n<pre><code>tail -f /some/log/file | grep thing1 | grep thing2\n</code></pre>\n<p>I asked people on Mastodon how they would solve this in practice and there were\n5 basic approaches. 
Here they are:</p>\n<h4 id=\"solution-1-run-a-program-that-finishes-quickly\">solution 1: run a program that finishes quickly</h4>\n<p>Historically my solution to this has been to just avoid the “command writing to\npipe slowly” situation completely and instead run a program that will finish quickly\nlike this:</p>\n<pre><code>cat /some/log/file | grep thing1 | grep thing2 | tail\n</code></pre>\n<p>This doesn’t do the same thing as the original command but it does mean that\nyou get to avoid thinking about these weird buffering issues.</p>\n<p>(you could also do <code>grep thing1 /some/log/file</code> but I often prefer to use an\n“unnecessary” <code>cat</code>)</p>\n<h4 id=\"solution-2-remember-the-line-buffer-flag-to-grep\">solution 2: remember the “line buffer” flag to grep</h4>\n<p>You could remember that grep has a flag to avoid buffering and pass it like this:</p>\n<pre><code>tail -f /some/log/file | grep --line-buffered thing1 | grep thing2\n</code></pre>\n<h4 id=\"solution-3-use-awk\">solution 3: use awk</h4>\n<p>Some people said that if they’re specifically dealing with a multiple greps\nsituation, they’ll rewrite it to use a single <code>awk</code> instead, like this:</p>\n<pre><code>tail -f /some/log/file | awk '/thing1/ && /thing2/'\n</code></pre>\n<p>Or you could write a more complicated <code>grep</code>, like this (note that this one only matches lines where <code>thing1</code> comes before <code>thing2</code>):</p>\n<pre><code>tail -f /some/log/file | grep -E 'thing1.*thing2'\n</code></pre>\n<p>(<code>awk</code> also buffers, so for this to work you’ll want <code>awk</code> to be the last command in the pipeline)</p>\n<h4 id=\"solution-4-use-stdbuf\">solution 4: use <code>stdbuf</code></h4>\n<p><code>stdbuf</code> uses LD_PRELOAD to turn off libc’s buffering, and you can use it to turn off output buffering like this:</p>\n<pre><code>tail -f /some/log/file | stdbuf -o0 grep thing1 | grep thing2\n</code></pre>\n<p>Like any <code>LD_PRELOAD</code> solution it’s a bit unreliable – it doesn’t work on\nstatic binaries, I think it won’t work if the 
program isn’t using libc’s\nbuffering, and doesn’t always work on Mac OS. Harry Marr has a really nice <a href=\"https://hmarr.com/blog/how-stdbuf-works/\">How stdbuf works</a> post.</p>\n<h4 id=\"solution-5-use-unbuffer\">solution 5: use <code>unbuffer</code></h4>\n<p><code>unbuffer program</code> will force the program’s output to be a TTY, which means\nthat it’ll behave the way it normally would on a TTY (less buffering, colour\noutput, etc). You could use it in this example like this:</p>\n<pre><code>tail -f /some/log/file | unbuffer grep thing1 | grep thing2\n</code></pre>\n<p>Unlike <code>stdbuf</code> it will always work, though it might have unwanted side\neffects, for example <code>grep thing1</code> may also colour its matches.</p>\n<p>If you want to install unbuffer, it’s in the <code>expect</code> package.</p>\n<h3 id=\"that-s-all-the-solutions-i-know-about\">that’s all the solutions I know about!</h3>\n<p>It’s a bit hard for me to say which one is “best”, I think personally I’m\nmost likely to use <code>unbuffer</code> because I know it’s always going to work.</p>\n<p>If I learn about more solutions I’ll try to add them to this post.</p>\n<h3 id=\"i-m-not-really-sure-how-often-this-comes-up\">I’m not really sure how often this comes up</h3>\n<p>I think it’s not very common for me to have a program that slowly trickles data\ninto a pipe like this, normally if I’m using a pipe a bunch of data gets\nwritten very quickly, processed by everything in the pipeline, and then\neverything exits. 
The only examples I can come up with right now are:</p>\n<ul>\n<li>tcpdump</li>\n<li><code>tail -f</code></li>\n<li>watching log files in a different way like with <code>kubectl logs</code></li>\n<li>the output of a slow computation</li>\n</ul>\n<h3 id=\"what-if-there-were-an-environment-variable-to-disable-buffering\">what if there were an environment variable to disable buffering?</h3>\n<p>I think it would be cool if there were a standard environment variable to turn\noff buffering, like <code>PYTHONUNBUFFERED</code> in Python. I got this idea from a\n<a href=\"https://blog.plover.com/Unix/stdio-buffering.html\">couple</a> of <a href=\"https://blog.plover.com/Unix/stdio-buffering-2.html\">blog posts</a> by Mark Dominus\nin 2018. Maybe <code>NO_BUFFER</code> like <a href=\"https://no-color.org/\">NO_COLOR</a>?</p>\n<p>The design seems tricky to get right; Mark points out that NetBSD has <a href=\"https://man.netbsd.org/setbuf.3\">environment variables called <code>STDBUF</code>, <code>STDBUF1</code>, etc</a> which give you a\nton of control over buffering but I imagine most developers don’t want to\nimplement many different environment variables to handle a relatively minor\nedge case.</p>\n<p>I’m also curious about whether there are any programs that just automatically\nflush their output buffers after some period of time (like 1 second). 
It feels\nlike it would be nice in theory but I can’t think of any program that does that\nso I imagine there are some downsides.</p>\n<h3 id=\"stuff-i-left-out\">stuff I left out</h3>\n<p>Some things I didn’t talk about in this post since these posts have been\ngetting pretty long recently and seriously does anyone REALLY want to read 3000\nwords about buffering?</p>\n<ul>\n<li>the difference between line buffering and having totally unbuffered output</li>\n<li>how buffering to stderr is different from buffering to stdout</li>\n<li>this post is only about buffering that happens <strong>inside the program</strong>, your\noperating system’s TTY driver also does a little bit of buffering sometimes</li>\n<li>other reasons you might need to flush your output other than “you’re writing\nto a pipe”</li>\n</ul>",
      "image": null,
      "media": [],
      "authors": [
        {
          "name": "Julia Evans",
          "email": null,
          "url": null
        }
      ],
      "categories": []
    },
    {
      "id": "https://jvns.ca/blog/2024/11/18/how-to-import-a-javascript-library/",
      "title": "Importing a frontend Javascript library without a build system",
      "description": null,
      "url": "https://jvns.ca/blog/2024/11/18/how-to-import-a-javascript-library/",
      "published": null,
      "updated": "2024-11-18T09:35:42.000Z",
      "content": "<p>I like writing Javascript <a href=\"https://jvns.ca/blog/2023/02/16/writing-javascript-without-a-build-system/\">without a build system</a>\nand for the millionth time yesterday I ran into a problem where I needed to\nfigure out how to import a Javascript library in my code without using a build\nsystem, and it took FOREVER to figure out how to import it because the\nlibrary’s setup instructions assume that you’re using a build system.</p>\n<p>Luckily at this point I’ve mostly learned how to navigate this situation and\neither successfully use the library or decide it’s too difficult and switch to\na different library, so here’s the guide I wish I had to importing Javascript\nlibraries years ago.</p>\n<p>I’m only going to talk about using Javacript libraries on the frontend, and\nonly about how to use them in a no-build-system setup.</p>\n<p>In this post I’m going to talk about:</p>\n<ol>\n<li>the three main types of Javascript files a library might provide (ES Modules, the “classic” global variable kind, and CommonJS)</li>\n<li>how to figure out which types of files a Javascript library includes in its build</li>\n<li>ways to import each type of file in your code</li>\n</ol>\n<h3 id=\"the-three-kinds-of-javascript-files\">the three kinds of Javascript files</h3>\n<p>There are 3 basic types of Javascript files a library can provide:</p>\n<ol>\n<li>the “classic” type of file that defines a global variable. This is the kind\nof file that you can just <code><script src></code> and it’ll Just Work. Great if you\ncan get it but not always available</li>\n<li>an ES module (which may or may not depend on other files, we’ll get to that)</li>\n<li>a “CommonJS” module. This is for Node, you can’t use it in a browser at all\nwithout using a build system.</li>\n</ol>\n<p>I’m not sure if there’s a better name for the “classic” type but I’m just going\nto call it “classic”. 
Also there’s a type called “AMD” but I’m not sure how\nrelevant it is in 2024.</p>\n<p>Now that we know the 3 types of files, let’s talk about how to figure out which\nof these the library actually provides!</p>\n<h3 id=\"where-to-find-the-files-the-npm-build\">where to find the files: the NPM build</h3>\n<p>Every Javascript library has a <strong>build</strong> which it uploads to NPM. You might be\nthinking (like I did originally) – Julia! The whole POINT is that we’re not\nusing Node to build our library! Why are we talking about NPM?</p>\n<p>But if you’re using a link from a CDN like <a href=\"https://cdnjs.cloudflare.com/ajax/libs/Chart.js/4.4.1/chart.umd.min.js\">https://cdnjs.cloudflare.com/ajax/libs/Chart.js/4.4.1/chart.umd.min.js</a>,\nyou’re still using the NPM build! All the files on the CDNs originally come\nfrom NPM.</p>\n<p>Because of this, I sometimes like to <code>npm install</code> the library even if I’m not\nplanning to use Node to build my library at all – I’ll just create a new temp\nfolder, <code>npm install</code> there, and then delete it when I’m done. I like being able to poke\naround in the files in the NPM build on my filesystem, because then I can be\n100% sure that I’m seeing everything that the library is making available in\nits build and that the CDN isn’t hiding something from me.</p>\n<p>So let’s <code>npm install</code> a few libraries and try to figure out what types of\nJavascript files they provide in their builds!</p>\n<h3 id=\"example-library-1-chart-js\">example library 1: chart.js</h3>\n<p>First let’s look inside <a href=\"https://www.chartjs.org\">Chart.js</a>, a plotting library.</p>\n<pre><code>$ cd /tmp/whatever\n$ npm install chart.js\n$ cd node_modules/chart.js/dist\n$ ls *.*js\nchart.cjs  chart.js  chart.umd.js  helpers.cjs  helpers.js\n</code></pre>\n<p>This library seems to have 3 basic options:</p>\n<p><strong>option 1:</strong> <code>chart.cjs</code>. 
The <code>.cjs</code> suffix tells me that this is a <strong>CommonJS\nfile</strong>, for use in Node. This means it’s impossible to use it directly in the\nbrowser without some kind of build step.</p>\n<p><strong>option 2: <code>chart.js</code></strong>. The <code>.js</code> suffix by itself doesn’t tell us what kind of\nfile it is, but if I open it up, I see <code>import '@kurkle/color';</code> which is an\nimmediate sign that this is an ES module – the <code>import ...</code> syntax is ES\nmodule syntax.</p>\n<p><strong>option 3: <code>chart.umd.js</code></strong>. “UMD” stands for “Universal Module Definition”,\nwhich I think means that you can use this file either with a basic <code><script src></code>, CommonJS,\nor some third thing called AMD that I don’t understand.</p>\n<h3 id=\"how-to-use-a-umd-file\">how to use a UMD file</h3>\n<p>When I was using Chart.js I picked Option 3. I just needed to add this to my\ncode:</p>\n<pre><code><script src=\"./chart.umd.js\"></script>\n</code></pre>\n<p>and then I could use the library with the global <code>Chart</code> variable.\nCouldn’t be easier. I just copied <code>chart.umd.js</code> into my Git repository so that\nI didn’t have to worry about using NPM or the CDNs going down or anything.</p>\n<h3 id=\"the-build-files-aren-t-always-in-the-dist-directory\">the build files aren’t always in the <code>dist</code> directory</h3>\n<p>A lot of libraries will put their build in the <code>dist</code> directory, but not\nalways! 
The build files’ location is specified in the library’s <code>package.json</code>.</p>\n<p>For example here’s an excerpt from Chart.js’s <code>package.json</code>.</p>\n<pre><code>  \"jsdelivr\": \"./dist/chart.umd.js\",\n  \"unpkg\": \"./dist/chart.umd.js\",\n  \"main\": \"./dist/chart.cjs\",\n  \"module\": \"./dist/chart.js\",\n</code></pre>\n<p>I think this is saying that if you want to use an ES Module (<code>module</code>) you\nshould use <code>dist/chart.js</code>, but the jsDelivr and unpkg CDNs should use\n<code>./dist/chart.umd.js</code>. I guess <code>main</code> is for Node.</p>\n<p><code>chart.js</code>’s <code>package.json</code> also says <code>\"type\": \"module\"</code>, which <a href=\"https://nodejs.org/api/packages.html#modules-packages\">according to this documentation</a>\ntells Node to treat files as ES modules by default. I think it doesn’t tell us\nspecifically which files are ES modules and which ones aren’t but it does tell\nus that <em>something</em> in there is an ES module.</p>\n<h3 id=\"example-library-2-atcute-oauth-browser-client\">example library 2: <code>@atcute/oauth-browser-client</code></h3>\n<p><a href=\"https://github.com/mary-ext/atcute/tree/trunk/packages/oauth/browser-client\"><code>@atcute/oauth-browser-client</code></a>\nis a library for logging into Bluesky with OAuth in the browser.</p>\n<p>Let’s see what kinds of Javascript files it provides in its build!</p>\n<pre><code>$ npm install @atcute/oauth-browser-client\n$ cd node_modules/@atcute/oauth-browser-client/dist\n$ ls *js\nconstants.js  dpop.js  environment.js  errors.js  index.js  resolvers.js\n</code></pre>\n<p>It seems like the only plausible root file in here is <code>index.js</code>, which looks\nsomething like this:</p>\n<pre><code>export { configureOAuth } from './environment.js';\nexport * from './errors.js';\nexport * from './resolvers.js';\n</code></pre>\n<p>This <code>export</code> syntax means it’s an <strong>ES module</strong>. 
That means we can use it in\nthe browser without a build step! Let’s see how to do that.</p>\n<h3 id=\"how-to-use-an-es-module-with-importmaps\">how to use an ES module with importmaps</h3>\n<p>Using an ES module isn’t as easy as just adding a <code><script src=\"whatever.js\"></code>. Instead, if\nthe ES module has dependencies (like <code>@atcute/oauth-browser-client</code> does) the\nsteps are:</p>\n<ol>\n<li>Set up an import map in your HTML</li>\n<li>Put import statements like <code>import { configureOAuth } from '@atcute/oauth-browser-client';</code> in your JS code</li>\n<li>Include your JS code in your HTML like this: <code><script type=\"module\" src=\"YOURSCRIPT.js\"></script></code></li>\n</ol>\n<p>The reason we need an import map instead of just doing something like <code>import { BrowserOAuthClient } from \"./oauth-client-browser.js\"</code> is that internally the module has more import statements like <code>import {something} from \"@atcute/client\"</code>, and we need to tell the browser where to get the code for <code>@atcute/client</code> and all of its other dependencies.</p>\n<p>Here’s what the importmap I used looks like for <code>@atcute/oauth-browser-client</code>:</p>\n<pre><code><script type=\"importmap\">\n{\n  \"imports\": {\n    \"nanoid\": \"./node_modules/nanoid/bin/dist/index.js\",\n    \"nanoid/non-secure\": \"./node_modules/nanoid/non-secure/index.js\",\n    \"nanoid/url-alphabet\": \"./node_modules/nanoid/url-alphabet/dist/index.js\",\n    \"@atcute/oauth-browser-client\": \"./node_modules/@atcute/oauth-browser-client/dist/index.js\",\n    \"@atcute/client\": \"./node_modules/@atcute/client/dist/index.js\",\n    \"@atcute/client/utils/did\": \"./node_modules/@atcute/client/dist/utils/did.js\"\n  }\n}\n</script>\n</code></pre>\n<p>Getting these import maps to work is pretty fiddly, I feel like there must be a\ntool to generate them automatically but I haven’t found one yet. 
It’s definitely possible to\nwrite a script that automatically generates the importmaps using <a href=\"https://esbuild.github.io/api/#metafile\">esbuild’s metafile</a> but I haven’t done that and\nmaybe there’s a better way.</p>\n<p>I decided to set up importmaps yesterday to get\n<a href=\"https://github.com/jvns/bsky-oauth-example\">github.com/jvns/bsky-oauth-example</a>\nto work, so there’s some example code in that repo.</p>\n<p>Also someone pointed me to Simon Willison’s\n<a href=\"https://simonwillison.net/2023/May/2/download-esm/\">download-esm</a>, which will\ndownload an ES module and rewrite the imports to point to the JS files directly\nso that you don’t need importmaps. I haven’t tried it yet but it seems like a\ngreat idea.</p>\n<h3 id=\"problems-with-importmaps-too-many-files\">problems with importmaps: too many files</h3>\n<p>I did run into some problems with using importmaps in the browser though – it\nneeded to download dozens of Javascript files to load my site, and my webserver\nin development couldn’t keep up for some reason. I kept seeing files fail to\nload randomly and then had to reload the page and hope that they would succeed\nthis time.</p>\n<p>It wasn’t an issue anymore when I deployed my site to production, so I guess it\nwas a problem with my local dev environment.</p>\n<p>Also one slightly annoying thing about ES modules in general is that you need to\nbe running a webserver to use them, I’m sure this is for a good reason but it’s\neasier when you can just open your <code>index.html</code> file without starting a\nwebserver.</p>\n<p>Because of the “too many files” thing I think actually using ES modules with\nimportmaps in this way isn’t actually that appealing to me, but it’s good to\nknow it’s possible.</p>\n<h3 id=\"how-to-use-an-es-module-without-importmaps\">how to use an ES module without importmaps</h3>\n<p>If the ES module doesn’t have dependencies then it’s even easier – you don’t\nneed the importmaps! 
You can just:</p>\n<ul>\n<li>put <code><script type=\"module\" src=\"YOURCODE.js\"></script></code> in your HTML. The <code>type=\"module\"</code> is important.</li>\n<li>put <code>import {whatever} from \"https://example.com/whatever.js\"</code> in <code>YOURCODE.js</code></li>\n</ul>\n<h3 id=\"alternative-use-esbuild\">alternative: use esbuild</h3>\n<p>If you don’t want to use importmaps, you can also use a build system like <a href=\"https://esbuild.github.io/\">esbuild</a>. I talked about how to do\nthat in <a href=\"https://jvns.ca/blog/2021/11/15/esbuild-vue/\">Some notes on using esbuild</a>, but this blog post is\nabout ways to avoid build systems completely so I’m not going to talk about\nthat option here. I do still like esbuild though and I think it’s a good option\nin this case.</p>\n<h3 id=\"what-s-the-browser-support-for-importmaps\">what’s the browser support for importmaps?</h3>\n<p><a href=\"https://caniuse.com/import-maps\">CanIUse</a> says that importmaps are in\n“Baseline 2023: newly available across major browsers” so my sense is that in\n2024 that’s still maybe a little bit too new? I think I would use importmaps\nfor some fun experimental code that I only wanted like myself and 12 people to\nuse, but if I wanted my code to be more widely usable I’d use <code>esbuild</code> instead.</p>\n<h3 id=\"example-library-3-atproto-oauth-client-browser\">example library 3: <code>@atproto/oauth-client-browser</code></h3>\n<p>Let’s look at one final example library! This is a different Bluesky auth\nlibrary than <code>@atcute/oauth-browser-client</code>.</p>\n<pre><code>$ npm install @atproto/oauth-client-browser\n$ cd node_modules/@atproto/oauth-client-browser/dist\n$ ls *js\nbrowser-oauth-client.js  browser-oauth-database.js  browser-runtime-implementation.js  errors.js  index.js  indexed-db-store.js  util.js\n</code></pre>\n<p>Again, it seems like the only real candidate file here is <code>index.js</code>. 
But this is a\ndifferent situation from the previous example library! Let’s take a look at\n<code>index.js</code>:</p>\n<p>There’s a bunch of stuff like this in <code>index.js</code>:</p>\n<pre><code>__exportStar(require(\"@atproto/oauth-client\"), exports);\n__exportStar(require(\"./browser-oauth-client.js\"), exports);\n__exportStar(require(\"./errors.js\"), exports);\nvar util_js_1 = require(\"./util.js\");\n</code></pre>\n<p>This <code>require()</code> syntax is CommonJS syntax, which means that we can’t use this\nfile in the browser at all, we need to use some kind of build step, and\nESBuild won’t work either.</p>\n<p>Also in this library’s <code>package.json</code> it says <code>\"type\": \"commonjs\"</code> which is\nanother way to tell it’s CommonJS.</p>\n<h3 id=\"how-to-use-a-commonjs-module-with-esm-sh-https-esm-sh\">how to use a CommonJS module with <a href=\"https://esm.sh\">esm.sh</a></h3>\n<p>Originally I thought it was impossible to use CommonJS modules without learning\na build system, but then someone on Bluesky told me about\n<a href=\"https://esm.sh\">esm.sh</a>! It’s a CDN that will translate anything into an ES\nModule. <a href=\"https://www.skypack.dev/\">skypack.dev</a> does something similar, I’m not\nsure what the difference is but one person mentioned that if one doesn’t work\nsometimes they’ll try the other one.</p>\n<p>For <code>@atproto/oauth-client-browser</code> using it seems pretty simple, I just need to put this in my HTML:</p>\n<pre><code><script type=\"module\" src=\"script.js\"> </script>\n</code></pre>\n<p>and then put this in <code>script.js</code>.</p>\n<pre><code>import { BrowserOAuthClient } from \"https://esm.sh/@atproto/[email protected]\"\n</code></pre>\n<p>It seems to Just Work, which is cool! Of course this is still sort of using a\nbuild system – it’s just that esm.sh is running the build instead of me. 
My\nmain concerns with this approach are:</p>\n<ul>\n<li>I don’t really trust CDNs to keep working forever – usually I like to copy dependencies into my repository so that they don’t go away for some reason in the future.</li>\n<li>I’ve heard of some issues with CDNs having security compromises which scares me.</li>\n<li>I don’t really understand what esm.sh is doing.</li>\n</ul>\n<h3 id=\"esbuild-can-also-convert-commonjs-modules-into-es-modules\">esbuild can also convert CommonJS modules into ES modules</h3>\n<p>I also learned that you can also use <code>esbuild</code> to convert a CommonJS module\ninto an ES module, though there are some limitations – the <code>import { BrowserOAuthClient } from</code> syntax doesn’t work. Here’s a <a href=\"https://github.com/evanw/esbuild/issues/442\">github issue about that</a>.</p>\n<p>I think the <code>esbuild</code> approach is probably more appealing to me than the\n<code>esm.sh</code> approach because it’s a tool that I already have on my computer so I\ntrust it more. 
I haven’t experimented with this much yet though.</p>\n<h3 id=\"summary-of-the-three-types-of-files\">summary of the three types of files</h3>\n<p>Here’s a summary of the three types of JS files you might encounter, options\nfor how to use them, and how to identify them.</p>\n<p>Unhelpfully a <code>.js</code> or <code>.min.js</code> file extension could be any of these 3\noptions, so if the file is <code>something.js</code> you need to do more detective work to\nfigure out what you’re dealing with.</p>\n<ol>\n<li><strong>“classic” JS files</strong>\n<ul>\n<li><strong>How to use it:</strong> <code><script src=\"whatever.js\"></script></code></li>\n<li><strong>Ways to identify it:</strong>\n<ul>\n<li>The website has a big friendly banner in its setup instructions saying “Use this with a CDN!”  or something</li>\n<li>A <code>.umd.js</code> extension</li>\n<li>Just try to put it in a <code><script src=...</code> tag and see if it works</li>\n</ul>\n</li>\n</ul>\n</li>\n<li><strong>ES Modules</strong>\n<ul>\n<li><strong>Ways to use it:</strong>\n<ul>\n<li>If there are no dependencies, just <code>import {whatever} from \"./my-module.js\"</code> directly in your code</li>\n<li>If there are dependencies, create an importmap and <code>import {whatever} from \"my-module\"</code>\n<ul>\n<li>or use <a href=\"https://simonwillison.net/2023/May/2/download-esm/\">download-esm</a> to remove the need for an importmap</li>\n</ul>\n</li>\n<li>Use <a href=\"https://esbuild.github.io/\">esbuild</a> or any ES Module bundler</li>\n</ul>\n</li>\n<li><strong>Ways to identify it:</strong>\n<ul>\n<li>Look for an <code>import </code> or <code>export </code> statement. 
(not <code>module.exports = ...</code>, that’s CommonJS)</li>\n<li>An <code>.mjs</code> extension</li>\n<li>maybe <code>\"type\": \"module\"</code> in <code>package.json</code> (though it’s not clear to me which file exactly this refers to)</li>\n</ul>\n</li>\n</ul>\n</li>\n<li><strong>CommonJS Modules</strong>\n<ul>\n<li><strong>Ways to use it:</strong>\n<ul>\n<li>Use <a href=\"https://esm.sh/#docs\">https://esm.sh</a> to convert it into an ES module, like <code>https://esm.sh/@atproto/[email protected]</code></li>\n<li>Use a build somehow (??)</li>\n</ul>\n</li>\n<li><strong>Ways to identify it:</strong>\n<ul>\n<li>Look for <code>require()</code> or <code>module.exports = ...</code> in the code</li>\n<li>A <code>.cjs</code> extension</li>\n<li>maybe <code>\"type\": \"commonjs\"</code> in <code>package.json</code> (though it’s not clear to me which file exactly this refers to)</li>\n</ul>\n</li>\n</ul>\n</li>\n</ol>\n<h3 id=\"it-s-really-nice-to-have-es-modules-standardized\">it’s really nice to have ES modules standardized</h3>\n<p>The main difference between CommonJS modules and ES modules from my perspective\nis that ES modules are actually a standard. This makes me feel a lot more\nconfident using them, because browsers commit to backwards compatibility for\nweb standards forever – if I write some code using ES modules today, I can\nfeel sure that it’ll still work the same way in 15 years.</p>\n<p>It also makes me feel better about using tooling like <code>esbuild</code> because even if\nthe esbuild project dies, because it’s implementing a standard it feels likely\nthat there will be another similar tool in the future that I can replace it\nwith.</p>\n<h3 id=\"the-js-community-has-built-a-lot-of-very-cool-tools\">the JS community has built a lot of very cool tools</h3>\n<p>A lot of the time when I talk about this stuff I get responses like “I hate\njavascript!!! it’s the worst!!!”. 
But my experience is that there are a lot of great tools for Javascript\n(I just learned about <a href=\"https://esm.sh\">https://esm.sh</a> yesterday which seems great! I love\nesbuild!), and that if I take the time to learn how things work I can take\nadvantage of some of those tools and make my life a lot easier.</p>\n<p>So the goal of this post is definitely not to complain about Javascript, it’s\nto understand the landscape so I can use the tooling in a way that feels good\nto me.</p>\n<h3 id=\"questions-i-still-have\">questions I still have</h3>\n<p>Here are some questions I still have, I’ll add the answers into the post if I\nlearn the answer.</p>\n<ul>\n<li>Is there a tool that automatically generates importmaps for an ES Module that\nI have set up locally? (apparently yes: <a href=\"https://jspm.org/getting-started\">jspm</a>)</li>\n<li>How can I convert a CommonJS module into an ES module on my computer, the way\n<a href=\"https://esm.sh\">https://esm.sh</a> does? (apparently esbuild can sort of do this, though <a href=\"https://github.com/evanw/esbuild/issues/442\">named exports don’t work</a>)</li>\n<li>When people normally build CommonJS modules into regular JS code, what code is\ndoing that? Obviously there are tools like webpack, rollup, esbuild, etc, but\ndo those tools all implement their own JS parsers/static analysis? 
How many\nJS parsers are there out there?</li>\n<li>Is there any way to bundle an ES module into a single file (like\n<code>atcute-client.js</code>), but so that in the browser I can still import multiple\ndifferent paths from that file (like both <code>@atcute/client/lexicons</code> and\n<code>@atcute/client</code>)?</li>\n</ul>\n<h3 id=\"all-the-tools\">all the tools</h3>\n<p>Here’s a list of every tool we talked about in this post:</p>\n<ul>\n<li>Simon Willison’s\n<a href=\"https://simonwillison.net/2023/May/2/download-esm/\">download-esm</a> which will\ndownload an ES module and convert the imports to point at JS files so you\ndon’t need an importmap</li>\n<li><a href=\"https://esm.sh\">https://esm.sh/</a> and <a href=\"https://www.skypack.dev/\">skypack.dev</a></li>\n<li><a href=\"https://esbuild.github.io/\">esbuild</a></li>\n<li><a href=\"https://jspm.org/getting-started\">JSPM</a> can generate importmaps</li>\n</ul>\n<p>Writing this post has made me think that even though I usually don’t want to\nhave a build that I run every time I update the project, I might be willing to\nhave a build step (using <code>download-esm</code> or something) that I run <strong>only once</strong>\nwhen setting up the project and never run again except maybe if I’m updating my\ndependency versions.</p>\n<h3 id=\"that-s-all\">that’s all!</h3>\n<p>Thanks to <a href=\"https://polotek.net/\">Marco Rogers</a> who taught me a lot of the things\nin this post. I’ve probably made some mistakes in this post and I’d love to\nknow what they are – let me know on Bluesky or Mastodon!</p>",
      "image": null,
      "media": [],
      "authors": [
        {
          "name": "Julia Evans",
          "email": null,
          "url": null
        }
      ],
      "categories": []
    },
    {
      "id": "https://jvns.ca/blog/2024/11/09/new-microblog/",
      "title": "New microblog with TILs",
      "description": null,
      "url": "https://jvns.ca/blog/2024/11/09/new-microblog/",
      "published": null,
      "updated": "2024-11-09T09:24:29.000Z",
      "content": "<p>I added a new section to this site a couple weeks ago called\n<a href=\"https://jvns.ca/til/\">TIL</a> (“today I learned”).</p>\n<h3 id=\"the-goal-save-interesting-tools-facts-i-posted-on-social-media\">the goal: save interesting tools & facts I posted on social media</h3>\n<p>One kind of thing I like to post on Mastodon/Bluesky is “hey, here’s a cool\nthing”, like <a href=\"https://github.com/dbcli/litecli\">the great SQLite repl litecli</a>, or\nthe fact that cross compiling in Go Just Works and it’s amazing, or\n<a href=\"https://www.latacora.com/blog/2018/04/03/cryptographic-right-answers/\">cryptographic right answers</a>,\nor <a href=\"https://diffdiff.net/\">this great diff tool</a>. Usually I don’t want to write\na whole blog post about those things because I really don’t have much more to\nsay than “hey this is useful!”</p>\n<p>It started to bother me that I didn’t have anywhere to put those things: for\nexample recently I wanted to use <a href=\"https://diffdiff.net/\">diffdiff</a> and I just\ncould not remember what it was called.</p>\n<h3 id=\"the-solution-make-a-new-section-of-this-blog\">the solution: make a new section of this blog</h3>\n<p>So I quickly made a new folder called <a href=\"https://jvns.ca/til/\">/til/</a>, added some\ncustom styling (I wanted to style the posts to look a little bit like a tweet),\nmade a little Rake task to help me create new posts quickly (<code>rake new_til</code>), and\nset up a separate RSS Feed for it.</p>\n<p>I think this new section of the blog might be more for myself than anything,\nnow when I forget the link to Cryptographic Right Answers I can hopefully look\nit up on the TIL page. 
(you might think “julia, why not use bookmarks??” but I\nhave been failing to use bookmarks for my whole life and I don’t see that\nchanging ever, putting things in public is for whatever reason much easier for\nme)</p>\n<p>So far it’s been working, often I can actually just make a quick post in 2\nminutes which was the goal.</p>\n<h3 id=\"inspired-by-simon-willison-s-til-blog\">inspired by Simon Willison’s TIL blog</h3>\n<p>My page is inspired by <a href=\"https://til.simonwillison.net/\">Simon Willison’s great TIL blog</a>, though my TIL posts are a lot shorter.</p>\n<h3 id=\"i-don-t-necessarily-want-everything-to-be-archived\">I don’t necessarily want everything to be archived</h3>\n<p>This came about because I spent a lot of time on Twitter, so I’ve been thinking\nabout what I want to do about all of my tweets.</p>\n<p>I keep reading the advice to “POSSE” (“post on your own site, syndicate\nelsewhere”), and while I find the idea appealing in principle, for me part of\nthe appeal of social media is that it’s a little bit ephemeral. I can\npost polls or questions or observations or jokes and then they can just kind of\nfade away as they become less relevant.</p>\n<p>I find it a lot easier to identify specific categories of things that I actually\nwant to have on a Real Website That I Own:</p>\n<ul>\n<li>blog posts here!</li>\n<li>comics at <a href=\"https://wizardzines.com/comics/\">https://wizardzines.com/comics/</a>!</li>\n<li>now TILs at <a href=\"https://jvns.ca/til/\">https://jvns.ca/til/</a></li>\n</ul>\n<p>and then let everything else be kind of ephemeral.</p>\n<p>I really believe in the advice to make email lists though – the first two\n(blog posts & comics) both have email lists and RSS feeds that people can\nsubscribe to if they want. I might add a quick summary of any TIL posts from\nthat week to the “blog posts from this week” mailing list.</p>",
      "image": null,
      "media": [],
      "authors": [
        {
          "name": "Julia Evans",
          "email": null,
          "url": null
        }
      ],
      "categories": []
    },
    {
      "id": "https://jvns.ca/blog/2024/10/31/ascii-control-characters/",
      "title": "ASCII control characters in my terminal",
      "description": null,
      "url": "https://jvns.ca/blog/2024/10/31/ascii-control-characters/",
      "published": null,
      "updated": "2024-10-31T08:00:10.000Z",
      "content": "<p>Hello! I’ve been thinking about the terminal a lot and yesterday I got curious\nabout all these “control codes”, like <code>Ctrl-A</code>, <code>Ctrl-C</code>, <code>Ctrl-W</code>, etc. What’s\nthe deal with all of them?</p>\n<h3 id=\"a-table-of-ascii-control-characters\">a table of ASCII control characters</h3>\n<p>Here’s a table of all 33 ASCII control characters, and what they do on my\nmachine (on Mac OS), more or less. There are about a million caveats, but I’ll talk about\nwhat it means and all the problems with this diagram that I know about.</p>\n<p><a href=\"https://jvns.ca/ascii.html\"><img src=\"https://jvns.ca/images/ascii-control.png\"></a></p>\n<p>You can also view it <a href=\"https://jvns.ca/ascii.html\">as an HTML page</a> (I just made it an image so\nit would show up in RSS).</p>\n<h3 id=\"different-kinds-of-codes-are-mixed-together\">different kinds of codes are mixed together</h3>\n<p>The first surprising thing about this diagram to me is that there are 33\ncontrol codes, split into (very roughly speaking) these categories:</p>\n<ol>\n<li>Codes that are handled by the operating system’s terminal driver, for\nexample when the OS sees a <code>3</code> (<code>Ctrl-C</code>), it’ll send a <code>SIGINT</code> signal to\nthe current program</li>\n<li>Everything else is passed through to the application as-is and the\napplication can do whatever it wants with them. Some subcategories of\nthose:\n<ul>\n<li>Codes that correspond to a literal keypress of a key on your keyboard\n(<code>Enter</code>, <code>Tab</code>, <code>Backspace</code>). 
For example when you press <code>Enter</code>, your\nterminal gets sent <code>13</code>.</li>\n<li>Codes used by <code>readline</code>: “the application can do whatever it wants”\noften means “it’ll do more or less what the <code>readline</code> library does,\nwhether the application actually uses <code>readline</code> or not”, so I’ve\nlabelled a bunch of the codes that <code>readline</code> uses</li>\n<li>Other codes, for example I think <code>Ctrl-X</code> has no standard meaning in the\nterminal in general but emacs uses it very heavily</li>\n</ul>\n</li>\n</ol>\n<p>There’s no real structure to which codes are in which categories, they’re all\njust kind of randomly scattered because this evolved organically.</p>\n<p>(If you’re curious about readline, I wrote more about readline in <a href=\"https://jvns.ca/blog/2024/07/08/readline/\">entering text in the terminal is complicated</a>, and there are a lot of\n<a href=\"https://github.com/chzyer/readline/blob/master/doc/shortcut.md\">cheat sheets out there</a>)</p>\n<h3 id=\"there-are-only-33-control-codes\">there are only 33 control codes</h3>\n<p>Something else that I find a little surprising is that there are only 33 control codes –\nA to Z, plus 7 more (<code>@, [, \\, ], ^, _, ?</code>). This means that if you want to\nhave for example <code>Ctrl-1</code> as a keyboard shortcut in a terminal application,\nthat’s not really meaningful – on my machine at least <code>Ctrl-1</code> is exactly the\nsame thing as just pressing <code>1</code>, <code>Ctrl-3</code> is the same as <code>Ctrl-[</code>, etc.</p>\n<p>Also <code>Ctrl+Shift+C</code> isn’t a control code – what it does depends on your\nterminal emulator. 
On Linux <code>Ctrl-Shift-X</code> is often used by the terminal\nemulator to copy or open a new tab or paste for example, it’s not sent to the\nTTY at all.</p>\n<p>Also I use <code>Ctrl+Left Arrow</code> all the time, but that isn’t a control code,\ninstead it sends an ANSI escape sequence (<code>ctrl-[[1;5D</code>) which is a different\nthing which we absolutely do not have space for in this post.</p>\n<p>This “there are only 33 codes” thing is totally different from how keyboard\nshortcuts work in a GUI where you can have <code>Ctrl+KEY</code> for any key you want.</p>\n<h3 id=\"the-official-ascii-names-aren-t-very-meaningful-to-me\">the official ASCII names aren’t very meaningful to me</h3>\n<p>Each of these 33 control codes has a name in ASCII (for example <code>3</code> is <code>ETX</code>).\nWhen all of these control codes were originally defined, they weren’t being\nused for computers or terminals at all, they were used for <a href=\"https://falsedoor.com/doc/ascii_evolution-of-character-codes.pdf\">the telegraph machine</a>.\nTelegraph machines aren’t the same as UNIX terminals so a lot of the codes were repurposed to mean something else.</p>\n<p>Personally I don’t find these ASCII names very useful, because 50% of the time\nthe name in ASCII has no actual relationship to what that code does on UNIX\nsystems today. 
So it feels easier to just ignore the ASCII names completely\ninstead of trying to figure out which ones still match their original meaning.</p>\n<h3 id=\"it-s-hard-to-use-ctrl-m-as-a-keyboard-shortcut\">It’s hard to use Ctrl-M  as a keyboard shortcut</h3>\n<p>Another thing that’s a bit weird is that <code>Ctrl-M</code> is literally the same as\n<code>Enter</code>, and <code>Ctrl-I</code> is the same as <code>Tab</code>, which makes it hard to use those two as keyboard shortcuts.</p>\n<p>From some quick research, it seems like some folks do still use <code>Ctrl-I</code> and\n<code>Ctrl-M</code> as keyboard shortcuts (<a href=\"https://github.com/tmux/tmux/issues/2705\">here’s an example</a>), but to do that\nyou need to configure your terminal emulator to treat them differently than the\ndefault.</p>\n<p>For me the main takeaway is that if I ever write a terminal application I\nshould avoid <code>Ctrl-I</code> and <code>Ctrl-M</code> as keyboard shortcuts in it.</p>\n<h3 id=\"how-to-identify-what-control-codes-get-sent\">how to identify what control codes get sent</h3>\n<p>While writing this I needed to do a bunch of experimenting to figure out what\nvarious key combinations did, so I wrote this Python script\n<a href=\"https://gist.github.com/jvns/a2ea09dbfbe03cc75b7bfb381941c742\">echo-key.py</a>\nthat will print them out.</p>\n<p>There’s probably a more official way but I appreciated having a script I could\ncustomize.</p>\n<h3 id=\"caveat-on-canonical-vs-noncanonical-mode\">caveat: on canonical vs noncanonical mode</h3>\n<p>Two of these codes (<code>Ctrl-W</code> and <code>Ctrl-U</code>) are labelled in the table as\n“handled by the OS”, but actually they’re not <strong>always</strong> handled by the OS, it\ndepends on whether the terminal is in “canonical” mode or in “noncanonical mode”.</p>\n<p>In <a href=\"https://www.man7.org/linux/man-pages/man3/termios.3.html\">canonical mode</a>,\nprograms only get input when you press <code>Enter</code> (and the OS is in 
charge of deleting characters when you press <code>Backspace</code> or <code>Ctrl-W</code>). But in noncanonical mode the program gets\ninput immediately when you press a key, and the <code>Ctrl-W</code> and <code>Ctrl-U</code> codes are passed through to the program to handle any way it wants.</p>\n<p>Generally in noncanonical mode the program will handle <code>Ctrl-W</code> and <code>Ctrl-U</code>\nsimilarly to how the OS does, but there are some small differences.</p>\n<p>Some examples of programs that use canonical mode:</p>\n<ul>\n<li>probably pretty much any noninteractive program, like <code>grep</code> or <code>cat</code></li>\n<li><code>git</code>, I think</li>\n</ul>\n<p>Examples of programs that use noncanonical mode:</p>\n<ul>\n<li><code>python3</code>, <code>irb</code> and other REPLs</li>\n<li>your shell</li>\n<li>any full screen TUI like <code>less</code> or <code>vim</code></li>\n</ul>\n<h3 id=\"caveat-all-of-the-os-terminal-driver-codes-are-configurable-with-stty\">caveat: all of the “OS terminal driver” codes are configurable with <code>stty</code></h3>\n<p>I said that <code>Ctrl-C</code> sends <code>SIGINT</code> but technically this is not necessarily\ntrue, if you really want to you can remap all of the codes labelled “OS\nterminal driver”, plus Backspace, using a tool called <code>stty</code>, and you can view\nthe mappings with <code>stty -a</code>.</p>\n<p>Here are the mappings on my machine right now:</p>\n<pre><code>$ stty -a\ncchars: discard = ^O; dsusp = ^Y; eof = ^D; eol = <undef>;\n\teol2 = <undef>; erase = ^?; intr = ^C; kill = ^U; lnext = ^V;\n\tmin = 1; quit = ^\\; reprint = ^R; start = ^Q; status = ^T;\n\tstop = ^S; susp = ^Z; time = 0; werase = ^W;\n</code></pre>\n<p>I have personally never remapped any of these and I cannot imagine a reason I\nwould (I think it would be a recipe for confusion and disaster for me), but I\n<a href=\"TODO\">asked on Mastodon</a> and people said the most common reasons they used\n<code>stty</code> 
were:</p>\n<ul>\n<li>fix a broken terminal with <code>stty sane</code></li>\n<li>set <code>stty erase ^H</code> to change how Backspace works</li>\n<li>set <code>stty ixoff</code></li>\n<li>some people even map <code>SIGINT</code> to a different key, like their <code>DELETE</code> key</li>\n</ul>\n<h3 id=\"caveat-on-signals\">caveat: on signals</h3>\n<p>Two signals caveats:</p>\n<ol>\n<li>If the <code>ISIG</code> terminal mode is turned off, then the OS won’t send signals. For example <code>vim</code> turns off <code>ISIG</code></li>\n<li>Apparently on BSDs, there’s an extra control code (<code>Ctrl-T</code>) which sends <code>SIGINFO</code></li>\n</ol>\n<p>You can see which terminal modes a program is setting using <code>strace</code> like this,\nterminal modes are set with the <code>ioctl</code> system call:</p>\n<pre><code>$ strace -tt -o out  vim\n$ grep ioctl out | grep SET\n</code></pre>\n<p>here are the modes <code>vim</code> sets when it starts (<code>ISIG</code> and <code>ICANON</code> are\nmissing!):</p>\n<pre><code>17:43:36.670636 ioctl(0, TCSETS, {c_iflag=IXANY|IMAXBEL|IUTF8,\nc_oflag=NL0|CR0|TAB0|BS0|VT0|FF0|OPOST, c_cflag=B38400|CS8|CREAD,\nc_lflag=ECHOK|ECHOCTL|ECHOKE|PENDIN, ...}) = 0\n</code></pre>\n<p>and it resets the modes when it exits:</p>\n<pre><code>17:43:38.027284 ioctl(0, TCSETS, {c_iflag=ICRNL|IXANY|IMAXBEL|IUTF8,\nc_oflag=NL0|CR0|TAB0|BS0|VT0|FF0|OPOST|ONLCR, c_cflag=B38400|CS8|CREAD,\nc_lflag=ISIG|ICANON|ECHO|ECHOE|ECHOK|IEXTEN|ECHOCTL|ECHOKE|PENDIN, ...}) = 0\n</code></pre>\n<p>I think the specific combination of modes vim is using here might be called\n“raw mode”, <a href=\"https://linux.die.net/man/3/cfmakeraw\">man cfmakeraw</a> talks about\nthat.</p>\n<h3 id=\"there-are-a-lot-of-conflicts\">there are a lot of conflicts</h3>\n<p>Related to “there are only 33 codes”, there are a lot of conflicts where\ndifferent parts of the system want to use the same code for different things,\nfor example by default <code>Ctrl-S</code> will freeze 
your screen, but if you turn that\noff then <code>readline</code> will use <code>Ctrl-S</code> to do a forward search.</p>\n<p>Another example is that on my machine sometimes <code>Ctrl-T</code> will send <code>SIGINFO</code>\nand sometimes it’ll transpose 2 characters and sometimes it’ll do something\ncompletely different depending on:</p>\n<ul>\n<li>whether the program has <code>ISIG</code> set</li>\n<li>whether the program uses <code>readline</code> / imitates readline’s behaviour</li>\n</ul>\n<h3 id=\"caveat-on-backspace-and-other-backspace\">caveat: on “backspace” and “other backspace”</h3>\n<p>In this diagram I’ve labelled code 127 as “backspace” and 8 as “other\nbackspace”. Uh, what?</p>\n<p>I think this was the single biggest topic of discussion in the replies on Mastodon – apparently there’s a LOT of history to this and I’d never heard of any of it before.</p>\n<p>First, here’s how it works on my machine:</p>\n<ol>\n<li>I press the <code>Backspace</code> key</li>\n<li>The TTY gets sent the byte <code>127</code>, which is called <code>DEL</code> in ASCII</li>\n<li>the OS terminal driver and readline both have <code>127</code> mapped to “backspace” (so it works both in canonical mode and noncanonical mode)</li>\n<li>The previous character gets deleted</li>\n</ol>\n<p>If I press <code>Ctrl+H</code>, it has the same effect as <code>Backspace</code> if I’m using\nreadline, but in a program without readline support (like <code>cat</code> for instance),\nit just prints out <code>^H</code>.</p>\n<p>Apparently Step 2 above is different for some folks – their <code>Backspace</code> key sends\nthe byte <code>8</code> instead of <code>127</code>, and so if they want Backspace to work then they\nneed to configure the OS (using <code>stty</code>) to set <code>erase = ^H</code>.</p>\n<p>There’s an incredible <a href=\"https://www.debian.org/doc/debian-policy/ch-opersys.html#keyboard-configuration\">section of the Debian Policy Manual on keyboard configuration</a>\nthat 
describes how <code>Delete</code> and <code>Backspace</code> should work according to Debian\npolicy, which seems very similar to how it works on my Mac today.  My\nunderstanding (via <a href=\"https://tech.lgbt/@Diziet/113396035847619715\">this mastodon post</a>)\nis that this policy was written in the 90s because there was a lot of confusion\nabout what <code>Backspace</code> should do in the 90s and there needed to be a standard\nto get everything to work.</p>\n<p>There’s a bunch more historical terminal stuff here but that’s all I’ll say for\nnow.</p>\n<h3 id=\"there-s-probably-a-lot-more-diversity-in-how-this-works\">there’s probably a lot more diversity in how this works</h3>\n<p>I’ve probably missed a bunch more ways that “how it works on my machine” might\nbe different from how it works on other people’s machines, and I’ve probably\nmade some mistakes about how it works on my machine too. But that’s all I’ve\ngot for today.</p>\n<p>Some more stuff I know that I’ve left out: according to <code>stty -a</code> <code>Ctrl-O</code> is\n“discard”, <code>Ctrl-R</code> is “reprint”, and <code>Ctrl-Y</code> is “dsusp”. I have no idea how\nto make those actually do anything (pressing them does not do anything\nobvious, and some people have told me what they used to do historically but\nit’s not clear to me if they have a use in 2024), and a lot of the time in practice\nthey seem to just be passed through to the application anyway so I just\nlabelled <code>Ctrl-R</code> and <code>Ctrl-Y</code> as\n<code>readline</code>.</p>\n<h3 id=\"not-all-of-this-is-that-useful-to-know\">not all of this is that useful to know</h3>\n<p>Also I want to say that I think the contents of this post are kind of interesting\nbut I don’t think they’re necessarily that <em>useful</em>. 
I’ve used the terminal\npretty successfully every day for the last 20 years without knowing literally\nany of this – I just knew what <code>Ctrl-C</code>, <code>Ctrl-D</code>, <code>Ctrl-Z</code>, <code>Ctrl-R</code>,\n<code>Ctrl-L</code> did in practice (plus maybe <code>Ctrl-A</code>, <code>Ctrl-E</code> and <code>Ctrl-W</code>) and did\nnot worry about the details for the most part, and that was\nalmost always totally fine except when I was <a href=\"https://jvns.ca/blog/2022/07/20/pseudoterminals/\">trying to use xterm.js</a>.</p>\n<p>But I had fun learning about it so maybe it’ll be interesting to you too.</p>",
      "image": null,
      "media": [],
      "authors": [
        {
          "name": "Julia Evans",
          "email": null,
          "url": null
        }
      ],
      "categories": []
    },
    {
      "id": "https://jvns.ca/blog/2024/10/27/asn-ip-address-memory/",
      "title": "Using less memory to look up IP addresses in Mess With DNS",
      "description": null,
      "url": "https://jvns.ca/blog/2024/10/27/asn-ip-address-memory/",
      "published": null,
      "updated": "2024-10-27T07:47:04.000Z",
      "content": "<p>I’ve been having problems for the last 3 years or so where <a href=\"https://messwithdns.net/\">Mess With DNS</a>\nperiodically runs out of memory and gets OOM killed.</p>\n<p>This hasn’t been a big priority for me: usually it just goes down for a few\nminutes while it restarts, and it only happens once a day at most, so I’ve just\nbeen ignoring. But last week it started actually causing a problem so I decided\nto look into it.</p>\n<p>This was kind of winding road where I learned a lot so here’s a table of contents:</p>\n<ul>\n<li><a href=\"#there-s-about-100mb-of-memory-available\">there’s about 100MB of memory available</a></li>\n<li><a href=\"#the-problem-oom-killing-the-backup-script\">the problem: OOM killing the backup script</a></li>\n<li><a href=\"#attempt-1-use-sqlite\">attempt 1: use SQLite</a>\n<ul>\n<li><a href=\"#problem-how-to-store-ipv6-addresses\">problem: how to store IPv6 addresses</a></li>\n<li><a href=\"#problem-it-s-500x-slower\">problem: it’s 500x slower</a></li>\n<li><a href=\"#time-for-explain-query-plan\">time for EXPLAIN QUERY PLAN</a></li>\n</ul>\n</li>\n<li><a href=\"#attempt-2-use-a-trie\">attempt 2: use a trie</a>\n<ul>\n<li><a href=\"#some-notes-on-memory-profiling\">some notes on memory profiling</a></li>\n</ul>\n</li>\n<li><a href=\"#attempt-3-make-my-array-use-less-memory\">attempt 3: make my array use less memory</a>\n<ul>\n<li><a href=\"#idea-3-1-deduplicate-the-name-and-country\">idea 3.1: deduplicate the Name and Country</a></li>\n<li><a href=\"#how-big-are-asns\">how big are ASNs?</a></li>\n<li><a href=\"#idea-3-2-use-netip-addr-instead-of-net-ip\">idea 3.2: use netip.Addr instead of net.IP</a></li>\n<li><a href=\"#the-result-saved-70mb-of-memory\">the result: saved 70MB of memory!</a></li>\n</ul>\n</li>\n</ul>\n<h3 id=\"there-s-about-100mb-of-memory-available\">there’s about 100MB of memory available</h3>\n<p>I run Mess With DNS on a VM without about 465MB of RAM, which according to\n<code>ps aux</code> 
(the <code>RSS</code> column) is split up something like:</p>\n<ul>\n<li>100MB for PowerDNS</li>\n<li>200MB for Mess With DNS</li>\n<li>40MB for <a href=\"https://fly.io/blog/ssh-and-user-mode-ip-wireguard/\">hallpass</a></li>\n</ul>\n<p>That leaves about 110MB of memory free.</p>\n<p>A while back I set <a href=\"https://tip.golang.org/doc/gc-guide\">GOMEMLIMIT</a> to 250MB\nto try to make sure the garbage collector ran if Mess With DNS used more than\n250MB of memory, and I think this helped but it didn’t solve everything.</p>\n<h3 id=\"the-problem-oom-killing-the-backup-script\">the problem: OOM killing the backup script</h3>\n<p>A few weeks ago I started backing up Mess With DNS’s database for the first time <a href=\"https://jvns.ca/til/restic-for-backing-up-sqlite-dbs/\">using restic</a>.</p>\n<p>This has been working okay, but since Mess With DNS operates without much extra\nmemory I think <code>restic</code> sometimes needed more memory than was available on the\nsystem, and so the backup script sometimes got OOM killed.</p>\n<p>This was a problem because</p>\n<ol>\n<li>backups might be corrupted sometimes</li>\n<li>more importantly, restic takes out a lock when it runs, and so I’d have to manually do an\nunlock if I wanted the backups to continue working. Doing manual work like\nthis is the #1 thing I try to avoid with all my web services (who has time\nfor that!) 
so I really wanted to do something about it.</li>\n</ol>\n<p>There’s probably more than one solution to this, but I decided to try to make\nMess With DNS use less memory so that there was more available memory on the\nsystem, mostly because it seemed like a fun problem to try to solve.</p>\n<h3 id=\"what-s-using-memory-ip-addresses\">what’s using memory: IP addresses</h3>\n<p>I’d run a memory profile of Mess With DNS a bunch of times in the past, so I\nknew exactly what was using most of Mess With DNS’s memory: IP addresses.</p>\n<p>When it starts, Mess With DNS loads this <a href=\"https://iptoasn.com/\">database where you can look up the\nASN of every IP address</a> into memory, so that when it\nreceives a DNS query it can take the source IP address like <code>74.125.16.248</code> and\ntell you that IP address belongs to <code>GOOGLE</code>.</p>\n<p>This database by itself used about 117MB of memory, and a simple <code>du</code> told me\nthat was too much – the original text files were only 37MB!</p>\n<pre><code>$ du -sh *.tsv\n26M\tip2asn-v4.tsv\n11M\tip2asn-v6.tsv\n</code></pre>\n<p>The way it worked originally is that I had an array of these:</p>\n<pre><code>type IPRange struct {\n\tStartIP net.IP\n\tEndIP   net.IP\n\tNum     int\n\tName    string\n\tCountry string\n}\n</code></pre>\n<p>and I searched through it with a binary search to figure out if any of the\nranges contained the IP I was looking for. 
Basically the simplest possible\nthing and it’s super fast, my machine can do about 9 million lookups per\nsecond.</p>\n<h3 id=\"attempt-1-use-sqlite\">attempt 1: use SQLite</h3>\n<p>I’ve been using SQLite recently, so my first thought was – maybe I can store\nall of this data on disk in an SQLite database, give the tables an index, and\nthat’ll use less memory.</p>\n<p>So I:</p>\n<ul>\n<li>wrote a quick Python script using <a href=\"https://sqlite-utils.datasette.io/en/stable/\">sqlite-utils</a> to import the TSV files into an SQLite database</li>\n<li>adjusted my code to select from the database instead</li>\n</ul>\n<p>This did solve the initial memory goal (after a GC it now hardly used any\nmemory at all because the table was on disk!), though I’m not sure how much GC\nchurn this solution would cause if we needed to do a lot of queries at once. I\ndid a quick memory profile and it seemed to allocate about 1KB of memory per\nlookup.</p>\n<p>Let’s talk about the issues I ran into with using SQLite though.</p>\n<h3 id=\"problem-how-to-store-ipv6-addresses\">problem: how to store IPv6 addresses</h3>\n<p>SQLite doesn’t have support for big integers and IPv6 addresses are 128 bits,\nso I decided to store them as text. 
I think <code>BLOB</code> might have been better, I\noriginally thought <code>BLOB</code>s couldn’t be compared but the <a href=\"https://www.sqlite.org/datatype3.html#sort_order\">sqlite docs</a> say they can.</p>\n<p>I ended up with this schema:</p>\n<pre><code>CREATE TABLE ipv4_ranges (\n   start_ip INTEGER NOT NULL,\n   end_ip INTEGER NOT NULL,\n   asn INTEGER NOT NULL,\n   country TEXT NOT NULL,\n   name TEXT NOT NULL\n);\nCREATE TABLE ipv6_ranges (\n   start_ip TEXT NOT NULL,\n   end_ip TEXT NOT NULL,\n   asn INTEGER,\n   country TEXT,\n   name TEXT\n);\nCREATE INDEX idx_ipv4_ranges_start_ip ON ipv4_ranges (start_ip);\nCREATE INDEX idx_ipv6_ranges_start_ip ON ipv6_ranges (start_ip);\nCREATE INDEX idx_ipv4_ranges_end_ip ON ipv4_ranges (end_ip);\nCREATE INDEX idx_ipv6_ranges_end_ip ON ipv6_ranges (end_ip);\n</code></pre>\n<p>Also I learned that Python has an <code>ipaddress</code> module, so I could use\n<code>ipaddress.ip_address(s).exploded</code> to make sure that the IPv6 addresses were\nexpanded so that a string comparison would compare them properly.</p>\n<h3 id=\"problem-it-s-500x-slower\">problem: it’s 500x slower</h3>\n<p>I ran a quick microbenchmark, something like this. 
It printed out that it could\nlook up 17,000 IPv6 addresses per second, and similarly for IPv4 addresses.</p>\n<p>This was pretty discouraging – being able to look up 17k addresses per second\nis kind of fine (Mess With DNS does not get a lot of traffic), but I compared it to\nthe original binary search code and the original code could do 9 million per second.</p>\n<pre><code>\tips := []net.IP{}\n\tcount := 20000\n\tfor i := 0; i < count; i++ {\n\t\t// create a random IPv6 address\n\t\tbytes := randomBytes()\n\t\tip := net.IP(bytes[:])\n\t\tips = append(ips, ip)\n\t}\n\tnow := time.Now()\n\tsuccess := 0\n\tfor _, ip := range ips {\n\t\t_, err := ranges.FindASN(ip)\n\t\tif err == nil {\n\t\t\tsuccess++\n\t\t}\n\t}\n\tfmt.Println(success)\n\telapsed := time.Since(now)\n\tfmt.Println(\"number per second\", float64(count)/elapsed.Seconds())\n</code></pre>\n<h3 id=\"time-for-explain-query-plan\">time for EXPLAIN QUERY PLAN</h3>\n<p>I’d never really done an EXPLAIN in sqlite, so I thought it would be a fun\nopportunity to see what the query plan was doing.</p>\n<pre><code>sqlite> explain query plan select * from ipv6_ranges where '2607:f8b0:4006:0824:0000:0000:0000:200e' BETWEEN start_ip and end_ip;\nQUERY PLAN\n`--SEARCH ipv6_ranges USING INDEX idx_ipv6_ranges_end_ip (end_ip>?)\n</code></pre>\n<p>It looks like it’s just using the <code>end_ip</code> index and not the <code>start_ip</code> index,\nso maybe it makes sense that it’s slower than the binary search.</p>\n<p>I tried to figure out if there was a way to make SQLite use both indexes, but I\ncouldn’t find one and maybe it knows best anyway.</p>\n<p>At this point I gave up on the SQLite solution, I didn’t love that it was\nslower and also it’s a lot more complex than just doing a binary search. 
I felt\nlike I’d rather keep something much more similar to the binary search.</p>\n<p>A few things I tried with SQLite that did not cause it to use both indexes:</p>\n<ul>\n<li>using a compound index instead of two separate indexes</li>\n<li>running <code>ANALYZE</code></li>\n<li>using <code>INTERSECT</code> to intersect the results of <code>start_ip < ?</code> and <code>? < end_ip</code>. This did make it use both indexes, but it also seemed to make the\nquery literally 1000x slower, probably because it needed to create the\nresults of both subqueries in memory and intersect them.</li>\n</ul>\n<h3 id=\"attempt-2-use-a-trie\">attempt 2: use a trie</h3>\n<p>My next idea was to use a\n<a href=\"https://medium.com/basecs/trying-to-understand-tries-3ec6bede0014\">trie</a>,\nbecause I had some vague idea that maybe a trie would use less memory, and\nI found this library called\n<a href=\"https://github.com/seancfoley/ipaddress-go\">ipaddress-go</a> that lets you look up IP addresses using a trie.</p>\n<p>I tried using it (<a href=\"https://gist.github.com/jvns/3ce617796b22127017590ac62c57fddd\">here’s the code</a>), but I\nthink I was doing something wildly wrong because, compared to my naive array + binary search:</p>\n<ul>\n<li>it used WAY more memory (800MB to store just the IPv4 addresses)</li>\n<li>it was a lot slower to do the lookups (it could do only 100K/second instead of 9 million/second)</li>\n</ul>\n<p>I’m not really sure what went wrong here but I gave up on this approach and\ndecided to just try to make my array use less memory and stick to a simple\nbinary search.</p>\n<h3 id=\"some-notes-on-memory-profiling\">some notes on memory profiling</h3>\n<p>One thing I learned about memory profiling is that you can use the <code>runtime</code>\npackage to see how much memory is currently allocated in the program. That’s\nhow I got all the memory numbers in this post. 
Here’s the code:</p>\n<pre><code>func memusage() {\n\truntime.GC()\n\tvar m runtime.MemStats\n\truntime.ReadMemStats(&m)\n\tfmt.Printf(\"Alloc = %v MiB\\n\", m.Alloc/1024/1024)\n\t// write mem.prof\n\tf, err := os.Create(\"mem.prof\")\n\tif err != nil {\n\t\tlog.Fatal(err)\n\t}\n\tpprof.WriteHeapProfile(f)\n\tf.Close()\n}\n</code></pre>\n<p>Also I learned that if you use <code>pprof</code> to analyze a heap profile there are two\nways to analyze it: you can pass either <code>--alloc_space</code> or <code>--inuse_space</code> to\n<code>go tool pprof</code>. I don’t know how I didn’t realize this before but\n<code>alloc_space</code> will tell you about everything that was allocated, and\n<code>inuse_space</code> will just include memory that’s currently in use.</p>\n<p>Anyway I ran <code>go tool pprof -pdf --inuse_space mem.prof > mem.pdf</code> a lot. Also\nevery time I use pprof I find myself referring to <a href=\"https://jvns.ca/blog/2017/09/24/profiling-go-with-pprof/\">my own intro to pprof</a>, it’s probably\nthe blog post I wrote that I use the most often. 
I should add <code>--alloc_space</code>\nand <code>--inuse_space</code> to it.</p>\n<h3 id=\"attempt-3-make-my-array-use-less-memory\">attempt 3: make my array use less memory</h3>\n<p>I was storing my ip2asn entries like this:</p>\n<pre><code>type IPRange struct {\n\tStartIP net.IP\n\tEndIP   net.IP\n\tNum     int\n\tName    string\n\tCountry string\n}\n</code></pre>\n<p>I had 3 ideas for ways to improve this:</p>\n<ol>\n<li>There was a lot of repetition of <code>Name</code> and the <code>Country</code>, because a lot of IP ranges belong to the same ASN</li>\n<li><code>net.IP</code> is an <code>[]byte</code> under the hood, which felt like it involved an unnecessary pointer, was there a way to inline it into the struct?</li>\n<li>Maybe I didn’t need both the start IP and the end IP, often the ranges were consecutive so maybe I could rearrange things so that I only had the start IP</li>\n</ol>\n<h3 id=\"idea-3-1-deduplicate-the-name-and-country\">idea 3.1: deduplicate the Name and Country</h3>\n<p>I figured I could store the ASN info in an array, and then just store the index\ninto the array in my <code>IPRange</code> struct. Here are the structs so you can see what\nI mean:</p>\n<pre><code>type IPRange struct {\n\tStartIP netip.Addr\n\tEndIP   netip.Addr\n\tASN     uint32\n\tIdx     uint32\n}\n\ntype ASNInfo struct {\n\tCountry string\n\tName    string\n}\n\ntype ASNPool struct {\n\tasns   []ASNInfo\n\tlookup map[ASNInfo]uint32\n}\n</code></pre>\n<p>This worked! It brought memory usage from 117MB to 65MB – a 50MB savings. I felt good about this.</p>\n<p><a href=\"https://github.com/jvns/mess-with-dns/blob/94f77b4bb1597b5e2a6768e33bd6c285919aa1bf/api/streamer/ip2asn/ip2asn.go#L18-L54\">Here’s all of the code for that part</a>.</p>\n<h3 id=\"how-big-are-asns\">how big are ASNs?</h3>\n<p>As an aside – I’m storing the ASN in a <code>uint32</code>, is that right? 
I looked in the ip2asn\nfile and the biggest one seems to be 401307, though there are a few lines that\nsay <code>4294901931</code> which is much bigger, but also are just inside the range of a\nuint32. So I can definitely use a <code>uint32</code>.</p>\n<pre><code>59.101.179.0\t59.101.179.255\t4294901931\tUnknown\tAS4294901931\n</code></pre>\n<h3 id=\"idea-3-2-use-netip-addr-instead-of-net-ip\">idea 3.2: use <code>netip.Addr</code> instead of <code>net.IP</code></h3>\n<p>It turns out that I’m not the only one who felt that <code>net.IP</code> was using an\nunnecessary amount of memory – in 2021 the folks at Tailscale released a new\nIP address library for Go which solves this and many other issues. <a href=\"https://tailscale.com/blog/netaddr-new-ip-type-for-go\">They wrote a great blog post about it</a>.</p>\n<p>I discovered (to my delight) that not only does this new IP address library exist and do exactly what I want, it’s also now in the Go\nstandard library as <a href=\"https://pkg.go.dev/net/netip#Addr\">netip.Addr</a>. 
Switching to <code>netip.Addr</code> was\nvery easy and saved another 20MB of memory, bringing us to 46MB.</p>\n<p>I didn’t try my third idea (remove the end IP from the struct) because I’d\nalready been programming for long enough on a Saturday morning and I was happy\nwith my progress.</p>\n<p>It’s always such a great feeling when I think “hey, I don’t like this, there\nmust be a better way” and then immediately discover that someone has already\nmade the exact thing I want, thought about it a lot more than me, and\nimplemented it much better than I would have.</p>\n<h3 id=\"all-of-this-was-messier-in-real-life\">all of this was messier in real life</h3>\n<p>Even though I tried to explain this in a simple linear way “I tried X, then I\ntried Y, then I tried Z”, that’s kind of a lie – I always try to take my\nactual debugging process (total chaos) and make it seem more linear and\nunderstandable because the reality is just too annoying to write down. It’s\nmore like:</p>\n<ul>\n<li>try sqlite</li>\n<li>try a trie</li>\n<li>second guess everything that I concluded about sqlite, go back and look at\nthe results again</li>\n<li>wait what about indexes</li>\n<li>very very belatedly realize that I can use <code>runtime</code> to check how much\nmemory everything is using, start doing that</li>\n<li>look at the trie again, maybe I misunderstood everything</li>\n<li>give up and go back to binary search</li>\n<li>look at all of the numbers for tries/sqlite again to make sure I didn’t misunderstand</li>\n</ul>\n<h3 id=\"a-note-on-using-512mb-of-memory\">A note on using 512MB of memory</h3>\n<p>Someone asked why I don’t just give the VM more memory. I could very easily\nafford to pay for a VM with 1GB of memory, but I feel like 512MB really\n<em>should</em> be enough (and really that 256MB should be enough!) so I’d rather stay\ninside that constraint. 
It’s kind of a fun puzzle.</p>\n<h3 id=\"a-few-ideas-from-the-replies\">a few ideas from the replies</h3>\n<p>Folks had a lot of good ideas I hadn’t thought of. Recording them as\ninspiration if I feel like having another Fun Performance Day at some point.</p>\n<ul>\n<li>Try Go’s <a href=\"https://pkg.go.dev/unique\">unique</a> package for the <code>ASNPool</code>. Someone tried this and it uses more memory, probably because Go’s pointers are 64 bits</li>\n<li>Try compiling with <code>GOARCH=386</code> to use 32-bit pointers to save space (maybe in combination with using <code>unique</code>!)</li>\n<li>It should be possible to store all of the IPv6 addresses in just 64 bits, because only the first 64 bits of the address are public</li>\n<li><a href=\"https://en.m.wikipedia.org/wiki/Interpolation_search\">Interpolation search</a> might be faster than binary search since IP addresses are numeric</li>\n<li>Try the MaxMind db format with <a href=\"https://github.com/maxmind/mmdbwriter\">mmdbwriter</a> or <a href=\"https://github.com/ipinfo/mmdbctl\">mmdbctl</a></li>\n<li>Tailscale’s <a href=\"https://github.com/tailscale/art\">art</a> routing table package</li>\n</ul>\n<h3 id=\"the-result-saved-70mb-of-memory\">the result: saved 70MB of memory!</h3>\n<p>I deployed the new version and now Mess With DNS is using less memory! 
Hooray!</p>\n<p>A few other notes:</p>\n<ul>\n<li>lookups are a little slower – in my microbenchmark they went from 9 million\nlookups/second to 6 million, maybe because I added a little indirection.\nUsing less memory and a little more CPU seemed like a good tradeoff though.</li>\n<li>it’s still using more memory than the raw text files do (46MB vs 37MB), I\nguess pointers take up space and that’s okay.</li>\n</ul>\n<p>I’m honestly not sure if this will solve all my memory problems, probably not!\nBut I had fun, I learned a few things about SQLite, I still don’t know what to\nthink about tries, and it made me love binary search even more than I already\ndid.</p>",
      "image": null,
      "media": [],
      "authors": [
        {
          "name": "Julia Evans",
          "email": null,
          "url": null
        }
      ],
      "categories": []
    },
    {
      "id": "https://jvns.ca/blog/2024/10/07/some-notes-on-upgrading-hugo/",
      "title": "Some notes on upgrading Hugo",
      "description": null,
      "url": "https://jvns.ca/blog/2024/10/07/some-notes-on-upgrading-hugo/",
      "published": null,
      "updated": "2024-10-07T09:19:57.000Z",
      "content": "<p>Warning: this is a post about very boring yakshaving, probably only of interest\nto people who are trying to upgrade Hugo from a very old version to a new\nversion. But what are blogs for if not documenting one’s very boring yakshaves\nfrom time to time?</p>\n<p>So yesterday I decided to try to upgrade Hugo. There’s no real reason to do\nthis – I’ve been using Hugo version 0.40 to generate this blog since 2018, it\nworks fine, and I don’t have any problems with it. But I thought – maybe it\nwon’t be as hard as I think, and I kind of like a tedious computer task sometimes!</p>\n<p>I thought I’d document what I learned along the way in case it’s useful to\nanyone else doing this very specific migration. I upgraded from Hugo v0.40\n(from 2018) to v0.135 (from 2024).</p>\n<p>Here are most of the changes I had to make:</p>\n<h3 id=\"change-1-template-theme-partials-thing-html-is-now-partial-thing-html\">change 1: <code>template \"theme/partials/thing.html</code> is now <code>partial thing.html</code></h3>\n<p>I had to replace a bunch of instances of <code>{{ template \"theme/partials/header.html\" . }}</code> with <code>{{ partial \"header.html\" . }}</code>.</p>\n<p>This happened in <a href=\"https://github.com/gohugoio/hugo/releases/tag/v0.42\">v0.42</a>:</p>\n<blockquote>\n<p>We have now virtualized the filesystems for project and theme files. This\nmakes everything simpler, faster and more powerful. But it also means that\ntemplate lookups on the form {{ template “theme/partials/pagination.html” .\n}} will not work anymore. 
That syntax has never been documented, so it’s not\nexpected to be in wide use.</p>\n</blockquote>\n<h3 id=\"change-2-data-pages-is-now-site-regularpages\">change 2: <code>.Data.Pages</code> is now <code>site.RegularPages</code></h3>\n<p>This seems to be discussed in the <a href=\"https://github.com/gohugoio/hugo/releases/tag/v0.57.2\">release notes for 0.57.2</a>.</p>\n<p>I just needed to replace <code>.Data.Pages</code> with <code>site.RegularPages</code> in the template on the homepage as well as in my RSS feed template.</p>\n<h3 id=\"change-3-next-and-prev-got-flipped\">change 3:  <code>.Next</code> and <code>.Prev</code> got flipped</h3>\n<p>I had this comment in the part of my theme where I link to the next/previous blog post:</p>\n<blockquote>\n<p>“next” and “previous” in hugo apparently mean the opposite of what I’d think\nthey’d mean intuitively. I’d expect “next” to mean “in the future” and\n“previous” to mean “in the past” but it’s the opposite</p>\n</blockquote>\n<p>It looks like they changed this in\n<a href=\"https://github.com/gohugoio/hugo/commit/ad705aac0649fa3102f7639bc4db65d45e108ee2\">ad705aac064</a>\nso that “next” actually is in the future and “prev” actually is in the past. I\ndefinitely find the new behaviour more intuitive.</p>\n<h3 id=\"downloading-the-hugo-changelogs-with-a-script\">downloading the Hugo changelogs with a script</h3>\n<p>Figuring out why/when all of these changes happened was a little difficult. I\nended up hacking together a bash script to <a href=\"https://gist.github.com/jvns/dbe4bd9271a56f1f8562bfe329c2aa9e\">download all of the changelogs from github as text files</a>, which I\ncould then grep to try to figure out what happened. 
It turns out it’s pretty\neasy to get all of the changelogs from the GitHub API.</p>\n<p>So far everything was not so bad – there was also a change around taxonomies\nthat I can’t quite explain, but it was all pretty manageable, but then we got\nto the really tough one: the markdown renderer.</p>\n<h3 id=\"change-4-the-markdown-renderer-blackfriday-goldmark\">change 4: the markdown renderer (blackfriday -> goldmark)</h3>\n<p>The blackfriday markdown renderer (which was previously the default) was removed in <a href=\"https://github.com/gohugoio/hugo/releases/tag/v0.100.0\">v0.100.0</a>. This seems pretty reasonable:</p>\n<blockquote>\n<p>It has been deprecated for a long time, its v1 version is not maintained\nanymore, and there are many known issues. Goldmark should be a mature\nreplacement by now.</p>\n</blockquote>\n<p>Fixing all my Markdown changes was a huge pain – I ended up having to update\n80 different Markdown files (out of 700) so that they would render properly, and I’m not totally sure</p>\n<h3 id=\"why-bother-switching-renderers\">why bother switching renderers?</h3>\n<p>The obvious question here is – why bother even trying to upgrade Hugo at all\nif I have to switch Markdown renderers?\nMy old site was running totally fine and I think it wasn’t necessarily a <em>good</em>\nuse of time, but the one reason I think it might be useful in the future is\nthat the new renderer (goldmark) uses the <a href=\"https://commonmark.org/\">CommonMark markdown standard</a>, which I’m hoping will be somewhat\nmore futureproof. So maybe I won’t have to go through this again? 
We’ll see.</p>\n<p>Also it turned out that the new Goldmark renderer does fix some problems I had\n(but didn’t know that I had) with smart quotes and how lists/blockquotes\ninteract.</p>\n<h3 id=\"finding-all-the-markdown-problems-the-process\">finding all the Markdown problems: the process</h3>\n<p>The hard part of this Markdown change was even figuring out what changed.\nAlmost all of the problems (including #2 and #3 above) just silently broke the\nsite, they didn’t cause any errors or anything. So I had to diff the HTML to\nhunt them down.</p>\n<p>Here’s what I ended up doing:</p>\n<ol>\n<li>Generate the site with the old version, put it in <code>public_old</code></li>\n<li>Generate the new version, put it in <code>public</code></li>\n<li>Diff every single HTML file in <code>public/</code> and <code>public_old</code> with <a href=\"https://gist.github.com/jvns/c7272cfb906e3ed0a3e9f8d361c5b5fc\">this diff.sh script</a> and put the results in a <code>diffs/</code> folder</li>\n<li>Run variations on <code>find diffs -type f | xargs cat | grep -C 5 '(31m|32m)' | less -r</code> over and over again to look at every single change until I found something that seemed wrong</li>\n<li>Update the Markdown to fix the problem</li>\n<li>Repeat until everything seemed okay</li>\n</ol>\n<p>(the <code>grep 31m|32m</code> thing is searching for red/green text in the diff)</p>\n<p>This was very time consuming but it was a little bit fun for some reason so I\nkept doing it until it seemed like nothing too horrible was left.</p>\n<h3 id=\"the-new-markdown-rules\">the new markdown rules</h3>\n<p>Here’s a list of every type of Markdown change I had to make. 
It’s very\npossible these are all extremely specific to me but it took me a long time to\nfigure them all out so maybe this will be helpful to one other person who finds\nthis in the future.</p>\n<h4 id=\"4-1-mixing-html-and-markdown\">4.1: mixing HTML and markdown</h4>\n<p>This doesn’t work anymore (it doesn’t expand the link):</p>\n<pre><code><small>\n[a link](https://example.com)\n</small>\n</code></pre>\n<p>I need to do this instead:</p>\n<pre><code><small>\n\n[a link](https://example.com)\n\n</small>\n</code></pre>\n<p>This works too:</p>\n<pre><code><small> [a link](https://example.com) </small>\n</code></pre>\n<h4 id=\"4-2-is-changed-into\">4.2: <code><<</code> is changed into «</h4>\n<p>I didn’t want this so I needed to configure:</p>\n<pre><code>markup:\n  goldmark:\n    extensions:\n      typographer:\n        leftAngleQuote: '&lt;&lt;'\n        rightAngleQuote: '&gt;&gt;'\n</code></pre>\n<h4 id=\"4-3-nested-lists-sometimes-need-4-space-indents\">4.3: nested lists sometimes need 4 space indents</h4>\n<p>This doesn’t render as a nested list anymore if I only indent by 2 spaces, I need to put 4 spaces.</p>\n<pre><code>1. a\n  * b\n  * c\n2. b\n</code></pre>\n<p>The problem is that the amount of indent needed depends on the size of the list\nmarkers. 
<a href=\"https://spec.commonmark.org/0.29/#example-263\">Here’s a reference in CommonMark for this</a>.</p>\n<h4 id=\"4-4-blockquotes-inside-lists-work-better\">4.4: blockquotes inside lists work better</h4>\n<p>Previously the <code>> quote</code> here didn’t render as a blockquote, and with the new renderer it does.</p>\n<pre><code>* something\n> quote\n* something else\n</code></pre>\n<p>I found a bunch of Markdown that had been kind of broken (which I hadn’t\nnoticed) that works better with the new renderer, and this is an example of\nthat.</p>\n<p>Lists inside blockquotes also seem to work better.</p>\n<h4 id=\"4-5-headings-inside-lists\">4.5: headings inside lists</h4>\n<p>Previously this didn’t render as a heading, but now it does. So I needed to\nreplace the <code>#</code> with <code>&num;</code>.</p>\n<pre><code>* # passengers: 20\n</code></pre>\n<h4 id=\"4-6-or-1-at-the-beginning-of-the-line-makes-it-a-list\">4.6:  <code>+</code> or <code>1)</code> at the beginning of the line makes it a list</h4>\n<p>I had something which looked like this:</p>\n<pre><code>`1 / (1\n+ exp(-1)) = 0.73`\n</code></pre>\n<p>With Blackfriday it rendered like this:</p>\n<pre><code><p><code>1 / (1\n+ exp(-1)) = 0.73</code></p>\n</code></pre>\n<p>and with Goldmark it rendered like this:</p>\n<pre><code><p>`1 / (1</p>\n<ul>\n<li>exp(-1)) = 0.73`</li>\n</ul>\n</code></pre>\n<p>Same thing if there was an accidental <code>1)</code> at the beginning of a line, like in this Markdown snippet</p>\n<pre><code>I set up a small Hadoop cluster (1 master, 2 workers, replication set to \n1) on \n</code></pre>\n<p>To fix this I just had to rewrap the line so that the <code>+</code> wasn’t the first character.</p>\n<p>The Markdown is formatted this way because I wrap my Markdown to 80 characters\na lot and the wrapping isn’t very context sensitive.</p>\n<h4 id=\"4-7-no-more-smart-quotes-in-code-blocks\">4.7: no more smart quotes in code blocks</h4>\n<p>There were a bunch of places where the old 
renderer (Blackfriday) was doing\nunwanted things in code blocks like replacing <code>...</code> with <code>…</code> or replacing\nquotes with smart quotes. I hadn’t realized this was happening and I was very\nhappy to have it fixed.</p>\n<h4 id=\"4-8-better-quote-management\">4.8: better quote management</h4>\n<p>The way this gets rendered got better:</p>\n<pre><code>\"Oh, *interesting*!\"\n</code></pre>\n<ul>\n<li>old: “Oh, <em>interesting</em>!“</li>\n<li>new: “Oh, <em>interesting</em>!”</li>\n</ul>\n<p>Before there were two left smart quotes, now the quotes match.</p>\n<h4 id=\"4-9-images-are-no-longer-wrapped-in-a-p-tag\">4.9: images are no longer wrapped in a <code>p</code> tag</h4>\n<p>Previously if I had an image like this:</p>\n<pre><code><img src=\"https://jvns.ca/images/rustboot1.png\">\n</code></pre>\n<p>it would get wrapped in a <code><p></code> tag, now it doesn’t anymore. I dealt with this\njust by adding a <code>margin-bottom: 0.75em</code> to images in the CSS, hopefully\nthat’ll make them display well enough.</p>\n<h4 id=\"4-10-br-is-now-wrapped-in-a-p-tag\">4.10: <code><br></code> is now wrapped in a <code>p</code> tag</h4>\n<p>Previously this wouldn’t get wrapped in a <code>p</code> tag, but now it seems to:</p>\n<pre><code><br><br>\n</code></pre>\n<p>I just gave up on fixing this though and resigned myself to maybe having some\nextra space in some cases. 
Maybe I’ll try to fix it later if I feel like\nanother yakshave.</p>\n<h4 id=\"4-11-some-more-goldmark-settings\">4.11: some more goldmark settings</h4>\n<p>I also needed to</p>\n<ul>\n<li>turn off code highlighting (because it wasn’t working properly and I didn’t have it before anyway)</li>\n<li>use the old “blackfriday” method to generate heading IDs so they didn’t change</li>\n<li>allow raw HTML in my markdown</li>\n</ul>\n<p>Here’s what I needed to add to my <code>config.yaml</code> to do all that:</p>\n<pre><code>markup:\n  highlight:\n    codeFences: false\n  goldmark:\n    renderer:\n      unsafe: true\n    parser:\n      autoHeadingIDType: blackfriday\n</code></pre>\n<p>Maybe I’ll try to get syntax highlighting working one day, who knows. I might\nprefer having it off though.</p>\n<h3 id=\"a-little-script-to-compare-blackfriday-and-goldmark\">a little script to compare blackfriday and goldmark</h3>\n<p>I also wrote a little program to compare the Blackfriday and Goldmark output\nfor various markdown snippets, <a href=\"https://gist.github.com/jvns/9cc3024ff98433ced5e3a2304c5fc5e4\">here it is in a gist</a>.</p>\n<p>It’s not really configured the exact same way Blackfriday and Goldmark were in\nmy Hugo versions, but it was still helpful to have to help me understand what\nwas going on.</p>\n<h3 id=\"a-quick-note-on-maintaining-themes\">a quick note on maintaining themes</h3>\n<p>My approach to themes in Hugo has been:</p>\n<ol>\n<li>pay someone to make a nice design for the site (for example wizardzines.com was designed by <a href=\"https://melody.dev/\">Melody Starling</a>)</li>\n<li>use a totally custom theme</li>\n<li>commit that theme to the same Github repo as the site</li>\n</ol>\n<p>So I just need to edit the theme files to fix any problems. 
Also I wrote a lot\nof the theme myself so I’m pretty familiar with how it works.</p>\n<p>Relying on someone else to keep a theme updated feels kind of scary to me, I\nthink if I were using a third-party theme I’d just copy the code into my site’s\ngithub repo and then maintain it myself.</p>\n<h3 id=\"which-static-site-generators-have-better-backwards-compatibility\">which static site generators have better backwards compatibility?</h3>\n<p>I <a href=\"https://social.jvns.ca/@b0rk/113260718682453232\">asked on Mastodon</a> if\nanyone had used a static site generator with good backwards compatibility.</p>\n<p>The main answers seemed to be Jekyll and 11ty. Several people said they’d been\nusing Jekyll for 10 years without any issues, and 11ty says it has\n<a href=\"https://www.11ty.dev/blog/stability/\">stability as a core goal</a>.</p>\n<p>I think a big factor in how appealing Jekyll/11ty are is how easy it is for you\nto maintain a working Ruby / Node environment on your computer: part of the\nreason I stopped using Jekyll was that I got tired of having to maintain a\nworking Ruby installation. But I imagine this wouldn’t be a problem for a Ruby\nor Node developer.</p>\n<p>Several people said that they don’t build their Jekyll site locally at all –\nthey just use GitHub Pages to build it.</p>\n<h3 id=\"that-s-it\">that’s it!</h3>\n<p>Overall I’ve been happy with Hugo – I <a href=\"https://jvns.ca/blog/2016/10/09/switching-to-hugo/\">started using it</a> because it had fast\nbuild times and it was a static binary, and both of those things are still\nextremely useful to me. 
I might have spent 10 hours on this upgrade, but I’ve\nprobably spent 1000+ hours writing blog posts without thinking about Hugo at\nall so that seems like an extremely reasonable ratio.</p>\n<p>I find it hard to be too mad about the backwards incompatible changes, most of\nthem were quite a long time ago, Hugo does a great job of making their old\nreleases available so you can use the old release if you want, and the most\ndifficult one is removing support for the <code>blackfriday</code> Markdown renderer in\nfavour of using something CommonMark-compliant which seems pretty reasonable to\nme even if it is a huge pain.</p>\n<p>But it did take a long time and I don’t think I’d particularly recommend moving\n700 blog posts to a new Markdown renderer unless you’re really in the mood for\na lot of computer suffering for some reason.</p>\n<p>The new renderer did fix a bunch of problems so I think overall it might be a\ngood thing, even if I’ll have to remember to make 2 changes to how I write\nMarkdown (4.1 and 4.3).</p>\n<p>Also I’m still using Hugo 0.54 for <a href=\"https://wizardzines.com\">https://wizardzines.com</a> so maybe these notes\nwill be useful to Future Me if I ever feel like upgrading Hugo for that site.</p>\n<p>Hopefully I didn’t break too many things on the blog by doing this, let me know\nif you see anything broken!</p>",
      "image": null,
      "media": [],
      "authors": [
        {
          "name": "Julia Evans",
          "email": null,
          "url": null
        }
      ],
      "categories": []
    },
    {
      "id": "https://jvns.ca/blog/2024/10/01/terminal-colours/",
      "title": "Terminal colours are tricky",
      "description": null,
      "url": "https://jvns.ca/blog/2024/10/01/terminal-colours/",
      "published": null,
      "updated": "2024-10-01T10:01:44.000Z",
      "content": "<p>Yesterday I was thinking about how long it took me to get a colorscheme in my\nterminal that I was mostly happy with (SO MANY YEARS), and it made me wonder\nwhat about terminal colours made it so hard.</p>\n<p>So I <a href=\"https://social.jvns.ca/@b0rk/113226972156366201\">asked people on Mastodon</a> what problems\nthey’ve run into with colours in the terminal, and I got a ton of interesting\nresponses! Let’s talk about some of the problems and a few possible ways to fix\nthem.</p>\n<h3 id=\"problem-1-blue-on-black\">problem 1: blue on black</h3>\n<p>One of the top complaints was “blue on black is hard to read”. Here’s an\nexample of that: if I open Terminal.app, set the background to black, and run\n<code>ls</code>, the directories are displayed in a blue that isn’t that easy to read:</p>\n<img src=\"https://jvns.ca/images/terminal-blue.png\" style=\"max-width: 400px\">\n<p>To understand why we’re seeing this blue, let’s talk about ANSI colours!</p>\n<h3 id=\"the-16-ansi-colours\">the 16 ANSI colours</h3>\n<p>Your terminal has 16 numbered colours – black, red, green, yellow, blue,\nmagenta, cyan, white, and “bright” version of each of those.</p>\n<p>Programs can use them by printing out an “ANSI escape code” – for example if\nyou want to see each of the 16 colours in your terminal, you can run this\nPython program:</p>\n<pre><code class=\"language-python\">def color(num, text):\n    return f\"\\033[38;5;{num}m{text}\\033[0m\"\n\nfor i in range(16):\n    print(color(i, f\"number {i:02}\"))\n</code></pre>\n<h3 id=\"what-are-the-ansi-colours\">what are the ANSI colours?</h3>\n<p>This made me wonder – if blue is colour number 5, who decides what hex color\nthat should correspond to?</p>\n<p>The answer seems to be “there’s no standard, terminal emulators just choose\ncolours and it’s not very consistent”. 
Here’s a <a href=\"https://en.m.wikipedia.org/wiki/ANSI_escape_code#Colors\">screenshot of a table from Wikipedia</a>, where you\ncan see that there’s a lot of variation:</p>\n<img src=\"https://jvns.ca/images/wikipedia.png\"> \n<h3 id=\"problem-1-5-bright-yellow-on-white\">problem 1.5: bright yellow on white</h3>\n<p>Bright yellow on white is even worse than blue on black, here’s what I get in\na terminal with the default settings:</p>\n<img src=\"https://jvns.ca/images/terminal-yellow.png\" style=\"max-height: 40px\">\n<p>That’s almost impossible to read (and some other colours like light green cause\nsimilar issues), so let’s talk about solutions!</p>\n<h3 id=\"two-ways-to-reconfigure-your-colours\">two ways to reconfigure your colours</h3>\n<p>If you’re annoyed by these colour contrast issues (or maybe you just think the\ndefault ANSI colours are ugly), you might think – well, I’ll just choose a\ndifferent “blue” and pick something I like better!</p>\n<p>There are two ways you can do this:</p>\n<p><strong>Way 1: Configure your terminal emulator</strong>: I think most modern terminal emulators\nhave a way to reconfigure the colours, and some of them even come with some\npreinstalled themes that you might like better than the defaults.</p>\n<p><strong>Way 2: Run a shell script</strong>: There are ANSI escape codes that you can print\nout to tell your terminal emulator to reconfigure its colours. 
<a href=\"https://github.com/chriskempson/base16-shell/blob/master/scripts/base16-solarized-light.sh\">Here’s a shell script that does that</a>,\nfrom the <a href=\"https://github.com/chriskempson/base16-shell\">base16-shell</a> project.\nYou can see that it has a few different conventions for changing the colours –\nI guess different terminal emulators have different escape codes for changing\ntheir colour palette, and so the script is trying to pick the right style of\nescape code based on the <code>TERM</code> environment variable.</p>\n<h3 id=\"what-are-the-pros-and-cons-of-the-2-ways-of-configuring-your-colours\">what are the pros and cons of the 2 ways of configuring your colours?</h3>\n<p>I prefer to use the “shell script” method, because:</p>\n<ul>\n<li>if I switch terminal emulators for some reason, I don’t need to use a different configuration system, my colours still Just Work</li>\n<li>I use <a href=\"https://github.com/chriskempson/base16-shell\">base16-shell</a> with base16-vim to make my vim colours match my terminal colours, which is convenient</li>\n</ul>\n<p>some advantages of configuring colours in your terminal emulator:</p>\n<ul>\n<li>if you use a popular terminal emulator, there are probably a lot more nice terminal themes out there that you can choose from</li>\n<li>not all terminal emulators support the “shell script method”, and even if\nthey do, the results can be a little inconsistent</li>\n</ul>\n<p>This is what my shell has looked like for probably the last 5 years (using the\nsolarized light base16 theme), and I’m pretty happy with it. Here’s <code>htop</code>:</p>\n<img src=\"https://jvns.ca/images/terminal-my-colours.png\" style=\"max-width: 400px\">\n<p>Okay, so let’s say you’ve found a terminal colorscheme that you like. 
What else\ncan go wrong?</p>\n<h3 id=\"problem-2-programs-using-256-colours\">problem 2: programs using 256 colours</h3>\n<p>Here’s what some output of <code>fd</code>, a <code>find</code> alternative, looks like in my\ncolorscheme:</p>\n<img src=\"https://jvns.ca/images/terminal-problem-fd.png\" style=\"max-width: 400px\">\n<p>The contrast is pretty bad here, and I definitely don’t have that lime green in\nmy normal colorscheme. What’s going on?</p>\n<p>We can see which color codes <code>fd</code> is using by running it with the <code>unbuffer</code> program to\ncapture its output, including the color codes:</p>\n<pre><code>$ unbuffer fd . > out\n$ vim out\n^[[38;5;48mbad-again.sh^[[0m\n^[[38;5;48mbad.sh^[[0m\n^[[38;5;48mbetter.sh^[[0m\nout\n</code></pre>\n<p><code>^[[38;5;48m</code> means “set the foreground color to color <code>48</code>”. Terminals don’t\nonly have 16 colours – many terminals these days actually have 3 ways of\nspecifying colours:</p>\n<ol>\n<li>the 16 ANSI colours we already talked about</li>\n<li>an extended set of 256 colours</li>\n<li>a further extended set of 24-bit hex colours, like <code>#ffea03</code></li>\n</ol>\n<p>So <code>fd</code> is using one of the colours from the extended 256-color set. <code>bat</code> (a\n<code>cat</code> alternative) does something similar – here’s what it looks like by\ndefault in my terminal.</p>\n<img src=\"https://jvns.ca/images/terminal-bat.png\" style=\"max-width: 400px\">\n<p>This looks fine though and it really seems like it’s trying to work well with a\nvariety of terminal themes.</p>\n<h3 id=\"some-newer-tools-seem-to-have-theme-support\">some newer tools seem to have theme support</h3>\n<p>I think it’s interesting that some of these newer terminal tools (<code>fd</code>, <code>bat</code>,\n<code>delta</code>, and probably more) have support for arbitrary custom themes. 
I guess\nthe downside of this approach is that the default theme might clash with your\nterminal’s background, but the upside is that it gives you a lot more control\nover theming the tool’s output than just choosing 16 ANSI colours.</p>\n<p>I don’t really use <code>bat</code>, but if I did I’d probably use <code>bat --theme ansi</code> to\njust use the ANSI colours that I have set in my normal terminal colorscheme.</p>\n<h3 id=\"problem-3-the-grays-in-solarized\">problem 3: the grays in Solarized</h3>\n<p>A bunch of people on Mastodon mentioned a specific issue with grays in the\nSolarized theme: when I list a directory, the base16 Solarized Light theme\nlooks like this:</p>\n<img src=\"https://jvns.ca/images/terminal-solarized-base16.png\" style=\"max-width: 400px\">\n<p>but iTerm’s default Solarized Light theme looks like this:</p>\n<img src=\"https://jvns.ca/images/terminal-solarized-iterm.png\" style=\"max-width: 400px\">\n<p>This is because in the iTerm theme (which is the <a href=\"https://ethanschoonover.com/solarized/#the-values\">original Solarized design</a>), colors 9-14 (the “bright blue”, “bright\nred”, etc) are mapped to a series of grays, and when I run <code>ls</code>, it’s trying to\nuse those “bright” colours to color my directories and executables.</p>\n<p>My best guess for why the original Solarized theme is designed this way is to\nmake the grays available to the <a href=\"https://github.com/altercation/vim-colors-solarized/blob/528a59f26d12278698bb946f8fb82a63711eec21/colors/solarized.vim\">vim Solarized colorscheme</a>.</p>\n<p>I’m pretty sure I prefer the modified base16 version I use where the “bright”\ncolours are actually colours instead of all being shades of gray though. 
(I\ndidn’t actually realize the version I was using wasn’t the “original” Solarized\ntheme until I wrote this post)</p>\n<p>In any case I really love Solarized and I’m very happy it exists so that I can\nuse a modified version of it.</p>\n<h3 id=\"problem-4-a-vim-theme-that-doesn-t-match-the-terminal-background\">problem 4: a vim theme that doesn’t match the terminal background</h3>\n<p>If my vim theme has a different background colour than my terminal theme, I\nget this ugly border, like this:</p>\n<img src=\"https://jvns.ca/images/terminal-vim-black-bg.png\" style=\"max-width: 400px\">\n<p>This one is a pretty minor issue though and I think making your terminal\nbackground match your vim background is pretty straightforward.</p>\n<h3 id=\"problem-5-programs-setting-a-background-color\">problem 5: programs setting a background color</h3>\n<p>A few people mentioned problems with terminal applications setting an\nunwanted background colour, so let’s look at an example of that.</p>\n<p>Here <code>ngrok</code> has set the background to color #16 (“black”), but the\n<code>base16-shell</code> script I use sets color 16 to be bright orange, so I get this,\nwhich is pretty bad:</p>\n<img src=\"https://jvns.ca/images/terminal-ngrok-solarized.png\" style=\"max-width: 400px\">\n<p>I think the intention is for ngrok to look something like this:</p>\n<img src=\"https://jvns.ca/images/terminal-ngrok-regular.png\" style=\"max-width: 400px\">\n<p>I think <code>base16-shell</code> sets color #16 to orange (instead of black)\nso that it can provide extra colours for use by <a href=\"https://github.com/chriskempson/base16-vim/blob/3be3cd82cd31acfcab9a41bad853d9c68d30478d/colors/base16-solarized-light.vim\">base16-vim</a>.\nThis feels reasonable to me – I use <code>base16-vim</code> in the terminal, so I guess I’m\nusing that feature and it’s probably more important to me than <code>ngrok</code> (which I\nrarely use) behaving a bit weirdly.</p>\n<p>This particular issue is a maybe 
obscure clash between ngrok and my colorscheme,\nbut I think this kind of clash is pretty common when a program sets an ANSI\nbackground color that the user has remapped for some reason.</p>\n<h3 id=\"a-nice-solution-to-contrast-issues-minimum-contrast\">a nice solution to contrast issues: “minimum contrast”</h3>\n<p>A bunch of terminals (iTerm2, <a href=\"https://github.com/Eugeny/tabby\">tabby</a>, kitty’s <a href=\"https://sw.kovidgoyal.net/kitty/conf/#opt-kitty.text_fg_override_threshold\">text_fg_override_threshold</a>, and\nfolks tell me also Ghostty and Windows Terminal) have a “minimum\ncontrast” feature that will automatically adjust colours to make sure they have enough contrast.</p>\n<p>Here’s an example from iTerm. This ngrok accident from before has pretty bad\ncontrast, I find it pretty difficult to read:</p>\n<img src=\"https://jvns.ca/images/terminal-ngrok-solarized.png\" style=\"max-width: 400px\">\n<p>With “minimum contrast” set to 40 in iTerm, it looks like this instead:</p>\n<img src=\"https://jvns.ca/images/terminal-ngrok-solarized-contrast.png\" style=\"max-width: 400px\">\n<p>I didn’t have minimum contrast turned on before but I just turned it on today\nbecause it makes such a big difference when something goes wrong with colours\nin the terminal.</p>\n<h3 id=\"problem-6-term-being-set-to-the-wrong-thing\">problem 6: <code>TERM</code> being set to the wrong thing</h3>\n<p>A few people mentioned that they’ll SSH into a system that doesn’t support the\n<code>TERM</code> environment variable that they have set locally, and then the colours\nwon’t work.</p>\n<p>I think the way <code>TERM</code> works is that systems have a <code>terminfo</code> database, so if\nthe value of the <code>TERM</code> environment variable isn’t in the system’s terminfo\ndatabase, then it won’t know how to output colours for that terminal. 
I don’t\nknow too much about terminfo, but someone linked me to this <a href=\"https://twoot.site/@bean/113056942625234032\">terminfo rant</a> that talks about a few other\nissues with terminfo.</p>\n<p>I don’t have a system on hand to reproduce this one so I can’t say for sure how\nto fix it, but <a href=\"https://unix.stackexchange.com/questions/67537/prevent-ssh-client-passing-term-environment-variable-to-server\">this Unix Stack Exchange question</a>\nsuggests running something like <code>TERM=xterm ssh</code> instead of <code>ssh</code>.</p>\n<h3 id=\"problem-7-picking-good-colours-is-hard\">problem 7: picking “good” colours is hard</h3>\n<p>A couple of problems people mentioned with designing / finding terminal colorschemes:</p>\n<ul>\n<li>some folks are colorblind and have trouble finding an appropriate colorscheme</li>\n<li>accidentally making the background color too close to the cursor or selection color, so they’re hard to find</li>\n<li>generally finding colours that work with every program is a struggle (for example you can see me having a problem with this with ngrok above!)</li>\n</ul>\n<h3 id=\"problem-8-making-nethack-mc-look-right\">problem 8: making nethack/mc look right</h3>\n<p>Another problem people mentioned is using a program like nethack or midnight\ncommander which you might expect to have a specific colourscheme based on the\ndefault ANSI terminal colours.</p>\n<p>For example, midnight commander has a really specific classic look:</p>\n<img src=\"https://jvns.ca/images/terminal-mc-normal.png\" style=\"max-width: 200px\">\n<p>But in my Solarized theme, midnight commander looks like this:</p>\n<img src=\"https://jvns.ca/images/terminal-mc-solarized.png\" style=\"max-width: 200px\">\n<p>The Solarized version feels like it could be disorienting if you’re\nvery used to the “classic” look.</p>\n<p>One solution Simon Tatham mentioned to this is using some palette customization\nANSI codes (like the ones base16 uses that I talked about earlier) to 
change\nthe color palette right before starting the program, for example remapping\nyellow to a brighter yellow before starting Nethack so that the yellow\ncharacters look better.</p>\n<h3 id=\"problem-9-commands-disabling-colours-when-writing-to-a-pipe\">problem 9: commands disabling colours when writing to a pipe</h3>\n<p>If I run <code>fd | less</code>, I see something like this, with the colours disabled.</p>\n<img src=\"https://jvns.ca/images/terminal-fd-bw.png\" style=\"max-width: 300px\">\n<p>In general I find this useful – if I pipe a command to <code>grep</code>, I don’t want it\nto print out all those color escape codes, I just want the plain text. But what if you want to see the colours?</p>\n<p>To see the colours, you can run <code>unbuffer fd | less -r</code>! I just learned about\n<code>unbuffer</code> recently and I think it’s really cool, <code>unbuffer</code> opens a tty for the\ncommand to write to so that it thinks it’s writing to a TTY. It also fixes\nissues with programs buffering their output when writing to a pipe, which is\nwhy it’s called <code>unbuffer</code>.</p>\n<p>Here’s what the output of <code>unbuffer fd | less -r</code> looks like for me:</p>\n<img src=\"https://jvns.ca/images/terminal-fd-color.png\" style=\"max-width: 300px\">\n<p>Also some commands (including <code>fd</code>) support a <code>--color=always</code> flag which will\nforce them to always print out the colours.</p>\n<h3 id=\"problem-10-unwanted-colour-in-ls-and-other-commands\">problem 10: unwanted colour in <code>ls</code> and other commands</h3>\n<p>Some people mentioned that they don’t want <code>ls</code> to use colour at all, perhaps\nbecause <code>ls</code> uses blue, it’s hard to read on black, and maybe they don’t feel like\ncustomizing their terminal’s colourscheme to make the blue more readable or\njust don’t find the use of colour helpful.</p>\n<p>Some possible solutions to this one:</p>\n<ul>\n<li>you can run <code>ls --color=never</code>, which is 
probably easiest</li>\n<li>you can also set <code>LS_COLORS</code> to customize the colours used by <code>ls</code>. I think some other programs other than <code>ls</code> support the <code>LS_COLORS</code> environment variable too.</li>\n<li>also some programs support setting <code>NO_COLOR=true</code> (there’s a <a href=\"https://no-color.org/\">list here</a>)</li>\n</ul>\n<p>Here’s an example of running <code>LS_COLORS=\"fi=0:di=0:ln=0:pi=0:so=0:bd=0:cd=0:or=0:ex=0\" ls</code>:</p>\n<img src=\"https://jvns.ca/images/terminal-ls-colors.png\" style=\"max-width: 500px\">\n<h3 id=\"problem-11-the-colours-in-vim\">problem 11: the colours in vim</h3>\n<p>I used to have a lot of problems with configuring my colours in vim – I’d set\nup my terminal colours in a way that I thought was okay, and then I’d start vim\nand it would just be a disaster.</p>\n<p>I think what was going on here is that today, there are two ways to set up a vim colorscheme in the terminal:</p>\n<ol>\n<li>using your ANSI terminal colours – you tell vim which ANSI colour number to use for the background, for functions, etc.</li>\n<li>using 24-bit hex colours – instead of ANSI terminal colours, the vim colorscheme can use hex codes like #faea99 directly</li>\n</ol>\n<p>20 years ago when I started using vim, terminals with 24-bit hex color support\nwere a lot less common (or maybe they didn’t exist at all), and vim certainly\ndidn’t have support for using 24-bit colour in the terminal. From some quick\nsearching through git, it looks like <a href=\"https://github.com/vim/vim/commit/8a633e3427b47286869aa4b96f2bfc1fe65b25cd\">vim added support for 24-bit colour in 2016</a>\n– just 8 years ago!</p>\n<p>So to get colours to work properly in vim before 2016, you needed to synchronize\nyour terminal colorscheme and your vim colorscheme. 
<a href=\"https://github.com/chriskempson/base16-vim/blob/3be3cd82cd31acfcab9a41bad853d9c68d30478d/colors/base16-solarized-light.vim#L52-L71\">Here’s what that looked like</a>,\nthe colorscheme needed to map the vim color classes like <code>cterm05</code> to ANSI colour numbers.</p>\n<p>But in 2024, the story is really different! Vim (and Neovim, which I use now)\nsupport 24-bit colours, and as of Neovim 0.10 (released in May 2024), the\n<code>termguicolors</code> setting (which tells Vim to use 24-bit hex colours for\ncolorschemes) is <a href=\"https://neovim.io/doc/user/news-0.10.html\">turned on by default</a> in any terminal with 24-bit\ncolor support.</p>\n<p>So this “you need to synchronize your terminal colorscheme and your vim\ncolorscheme” problem is not an issue anymore for me in 2024, since I\ndon’t plan to use terminals without 24-bit color support in the future.</p>\n<p>The biggest consequence for me of this whole thing is that I don’t need base16\nto set colors 16-21 to weird stuff anymore to integrate with vim – I can just\nuse a terminal theme and a vim theme, and as long as the two themes use similar\ncolours (so it’s not jarring for me to switch between them) there’s no problem.\nI think I can just remove those parts from my <code>base16</code> shell script and totally\navoid the problem with ngrok and the weird orange background I talked about\nabove.</p>\n<h3 id=\"some-more-problems-i-left-out\">some more problems I left out</h3>\n<p>I think there are a lot of issues around the intersection of multiple programs,\nlike using some combination of tmux/ssh/vim that I couldn’t figure out how to\nreproduce well enough to talk about them. 
Also I’m sure I missed a lot of other\nthings too.</p>\n<h3 id=\"base16-has-really-worked-for-me\">base16 has really worked for me</h3>\n<p>I’ve personally had a lot of success with using\n<a href=\"https://github.com/chriskempson/base16-shell\">base16-shell</a> with\n<a href=\"https://github.com/chriskempson/base16-vim\">base16-vim</a> – I just need to add <a href=\"https://github.com/chriskempson/base16-shell?tab=readme-ov-file#fish\">a couple of lines</a> to my\nfish config to set it up (+ a few <code>.vimrc</code> lines) and then I can move on and\naccept any remaining problems that that doesn’t solve.</p>\n<p>I don’t think base16 is for everyone though, some limitations I’m aware\nof with base16 that might make it not work for you:</p>\n<ul>\n<li>it comes with a limited set of builtin themes and you might not like any of them</li>\n<li>the Solarized base16 theme (and maybe all of the themes?) sets the “bright”\nANSI colours to be exactly the same as the normal colours, which might cause\na problem if you’re relying on the “bright” colours to be different from the\nregular ones</li>\n<li>it sets colours 16-21 in order to give the vim colorschemes from <code>base16-vim</code>\naccess to more colours, which might not be relevant if you always use a\nterminal with 24-bit color support, and can cause problems like the ngrok\nissue above</li>\n<li>also the way it sets colours 16-21 could be a problem in terminals that don’t\nhave 256-color support, like the linux framebuffer terminal</li>\n</ul>\n<p>Apparently there’s a community fork of base16 called\n<a href=\"https://github.com/tinted-theming/home\">tinted-theming</a>, which I haven’t\nlooked into much yet.</p>\n<h3 id=\"some-other-colorscheme-tools\">some other colorscheme tools</h3>\n<p>Just one so far but I’ll link more if people tell me about them:</p>\n<ul>\n<li><a href=\"https://rootloops.sh/\">rootloops.sh</a> for generating colorschemes (and <a 
href=\"https://hamvocke.com/blog/lets-create-a-terminal-color-scheme/\">“let’s create a terminal color scheme”</a>)</li>\n<li>Some popular colorschemes (according to people I asked on Mastodon): <a href=\"https://catppuccin.com/\">Catppuccin</a>, Monokai, Gruvbox, <a href=\"https://github.com/dracula\">Dracula</a>, <a href=\"https://protesilaos.com/emacs/modus-themes\">Modus (a high contrast theme)</a>, <a href=\"https://github.com/folke/tokyonight.nvim\">Tokyo Night</a>, <a href=\"https://www.nordtheme.com/\">Nord</a>, <a href=\"https://rosepinetheme.com/\">Rosé Pine</a></li>\n</ul>\n<h3 id=\"okay-that-was-a-lot\">okay, that was a lot</h3>\n<p>We talked about a lot in this post and while I think learning about all these\ndetails is kind of fun if I’m in the mood to do a deep dive, I find it SO\nFRUSTRATING to deal with it when I just want my colours to work! Being\nsurprised by unreadable text and having to find a workaround is just not my\nidea of a good day.</p>\n<p>Personally I’m a zero-configuration kind of person and it’s not that appealing\nto me to have to put together a lot of custom configuration just to make my\ncolours in the terminal look acceptable. I’d much rather just have some\nreasonable defaults that I don’t have to change.</p>\n<h3 id=\"minimum-contrast-seems-like-an-amazing-feature\">minimum contrast seems like an amazing feature</h3>\n<p>My one big takeaway from writing this was to turn on “minimum contrast” in my\nterminal, I think it’s going to fix most of the occasional accidental\nunreadable text issues I run into and I’m pretty excited about it.</p>",
      "image": null,
      "media": [],
      "authors": [
        {
          "name": "Julia Evans",
          "email": null,
          "url": null
        }
      ],
      "categories": []
    },
    {
      "id": "https://jvns.ca/blog/2024/09/27/some-go-web-dev-notes/",
      "title": "Some Go web dev notes",
      "description": null,
      "url": "https://jvns.ca/blog/2024/09/27/some-go-web-dev-notes/",
      "published": null,
      "updated": "2024-09-27T11:16:00.000Z",
      "content": "<p>I spent a lot of time in the past couple of weeks working on a website in Go\nthat may or may not ever see the light of day, but I learned a couple of things\nalong the way I wanted to write down. Here they are:</p>\n<h3 id=\"go-1-22-now-has-better-routing\">go 1.22 now has better routing</h3>\n<p>I’ve never felt motivated to learn any of the Go routing libraries\n(gorilla/mux, chi, etc), so I’ve been doing all my routing by hand, like this.</p>\n<pre><code>\t// DELETE /records:\n\tcase r.Method == \"DELETE\" && n == 1 && p[0] == \"records\":\n\t\tif !requireLogin(username, r.URL.Path, r, w) {\n\t\t\treturn\n\t\t}\n\t\tdeleteAllRecords(ctx, username, rs, w, r)\n\t// POST /records/<ID>\n\tcase r.Method == \"POST\" && n == 2 && p[0] == \"records\" && len(p[1]) > 0:\n\t\tif !requireLogin(username, r.URL.Path, r, w) {\n\t\t\treturn\n\t\t}\n\t\tupdateRecord(ctx, username, p[1], rs, w, r)\n\n</code></pre>\n<p>But apparently <a href=\"https://go.dev/blog/routing-enhancements\">as of Go 1.22</a>, Go\nnow has better support for routing in the standard library, so that code can be\nrewritten something like this:</p>\n<pre><code>\tmux.HandleFunc(\"DELETE /records/\", app.deleteAllRecords)\n\tmux.HandleFunc(\"POST /records/{record_id}\", app.updateRecord)\n</code></pre>\n<p>Though it would also need a login middleware, so maybe something more like\nthis, with a <code>requireLogin</code> middleware.</p>\n<pre><code>\tmux.Handle(\"DELETE /records/\", requireLogin(http.HandlerFunc(app.deleteAllRecords)))\n</code></pre>\n<h3 id=\"a-gotcha-with-the-built-in-router-redirects-with-trailing-slashes\">a gotcha with the built-in router: redirects with trailing slashes</h3>\n<p>One annoying gotcha I ran into was: if I make a route for <code>/records/</code>, then a\nrequest for <code>/records</code> <a href=\"https://pkg.go.dev/net/http#hdr-Trailing_slash_redirection-ServeMux\">will be redirected</a> to <code>/records/</code>.</p>\n<p>I ran into an issue with this 
where sending a POST request to <code>/records</code>\nredirected to a GET request for <code>/records/</code>, which broke the POST request\nbecause it removed the request body. Thankfully <a href=\"https://xeiaso.net/blog/go-servemux-slash-2021-11-04/\">Xe Iaso wrote a blog post about the exact same issue</a> which made it\neasier to debug.</p>\n<p>I think the solution to this is just to use API endpoints like <code>POST /records</code>\ninstead of <code>POST /records/</code>, which seems like a more normal design anyway.</p>\n<h3 id=\"sqlc-automatically-generates-code-for-my-db-queries\">sqlc automatically generates code for my db queries</h3>\n<p>I got a little bit tired of writing so much boilerplate for my SQL queries, but\nI didn’t really feel like learning an ORM, because I know what SQL queries I\nwant to write, and I didn’t feel like learning the ORM’s conventions for\ntranslating things into SQL queries.</p>\n<p>But then I found <a href=\"https://sqlc.dev/\">sqlc</a>, which will compile a query like this:</p>\n<pre><code>\n-- name: GetVariant :one\nSELECT *\nFROM variants\nWHERE id = ?;\n\n</code></pre>\n<p>into Go code like this:</p>\n<pre><code>const getVariant = `-- name: GetVariant :one\nSELECT id, created_at, updated_at, disabled, product_name, variant_name\nFROM variants\nWHERE id = ?\n`\n\nfunc (q *Queries) GetVariant(ctx context.Context, id int64) (Variant, error) {\n\trow := q.db.QueryRowContext(ctx, getVariant, id)\n\tvar i Variant\n\terr := row.Scan(\n\t\t&i.ID,\n\t\t&i.CreatedAt,\n\t\t&i.UpdatedAt,\n\t\t&i.Disabled,\n\t\t&i.ProductName,\n\t\t&i.VariantName,\n\t)\n\treturn i, err\n}\n</code></pre>\n<p>What I like about this is that if I’m ever unsure about what Go code to write\nfor a given SQL query, I can just write the query I want, read the generated\nfunction and it’ll tell me exactly what to do to call it. 
It feels much easier\nto me than trying to dig through the ORM’s documentation to figure out how to\nconstruct the SQL query I want.</p>\n<p>Reading <a href=\"https://brandur.org/fragments/sqlc-2024\">Brandur’s sqlc notes from 2024</a> also gave me some confidence\nthat this is a workable path for my tiny programs. That post gives a really\nhelpful example of how to conditionally update fields in a table using CASE\nstatements (for example if you have a table with 20 columns and you only want\nto update 3 of them).</p>\n<h3 id=\"sqlite-tips\">sqlite tips</h3>\n<p>Someone on Mastodon linked me to this post called <a href=\"https://kerkour.com/sqlite-for-servers\">Optimizing sqlite for servers</a>. My projects are small and I’m\nnot so concerned about performance, but my main takeaways were:</p>\n<ul>\n<li>have a dedicated object for <strong>writing</strong> to the database, and run\n<code>db.SetMaxOpenConns(1)</code> on it. I learned the hard way that if I don’t do this\nthen I’ll get <code>SQLITE_BUSY</code> errors from two threads trying to write to the db\nat the same time.</li>\n<li>if I want to make reads faster, I could have 2 separate db objects, one for writing and one for reading</li>\n</ul>\n<p>There are more tips in that post that seem useful (like “COUNT queries are\nslow” and “Use STRICT tables”), but I haven’t done those yet.</p>\n<p>Also sometimes if I have two tables where I know I’ll never need to do a <code>JOIN</code>\nbetween them, I’ll just put them in separate databases so that I can connect\nto them independently.</p>\n<h3 id=\"go-1-19-introduced-a-way-to-set-a-gc-memory-limit\">Go 1.19 introduced a way to set a GC memory limit</h3>\n<p>I run all of my Go projects in VMs with relatively little memory, like 256MB or\n512MB. I ran into an issue where my application kept getting OOM killed and it\nwas confusing – did I have a memory leak? 
What?</p>\n<p>After some Googling, I realized that maybe I didn’t have a memory leak, maybe I\njust needed to reconfigure the garbage collector! It turns out that by default (according to <a href=\"https://tip.golang.org/doc/gc-guide\">A Guide to the Go Garbage Collector</a>), Go’s garbage collector will\nlet the application allocate memory up to <strong>2x</strong> the current heap size.</p>\n<p><a href=\"https://messwithdns.net\">Mess With DNS</a>’s base heap size is around 170MB and\nthe amount of memory free on the VM is around 160MB right now, so if its memory\ndoubled, it’ll get OOM killed.</p>\n<p>In Go 1.19, they added a way to tell Go “hey, if the application starts using\nthis much memory, run a GC”. So I set the GC memory limit to 250MB and it seems\nto have resulted in the application getting OOM killed less often:</p>\n<pre><code>export GOMEMLIMIT=250MiB\n</code></pre>\n<h3 id=\"some-reasons-i-like-making-websites-in-go\">some reasons I like making websites in Go</h3>\n<p>I’ve been making tiny websites (like the <a href=\"https://nginx-playground.wizardzines.com/\">nginx playground</a>) in Go on and off for the last 4 years or so and it’s really been working for me. I think I like it because:</p>\n<ul>\n<li>there’s just 1 static binary, all I need to do to deploy it is copy the binary. If there are static files I can just embed them in the binary with <a href=\"https://pkg.go.dev/embed\">embed</a>.</li>\n<li>there’s a built-in webserver that’s okay to use in production, so I don’t need to configure WSGI or whatever to get it to work. 
I can just put it behind <a href=\"https://caddyserver.com/\">Caddy</a> or run it on fly.io or whatever.</li>\n<li>Go’s toolchain is very easy to install, I can just do <code>apt-get install golang-go</code> or whatever and then a <code>go build</code> will build my project</li>\n<li>it feels like there’s very little to remember to start sending HTTP responses\n– basically all there is are functions like <code>Serve(w http.ResponseWriter, r *http.Request)</code> which read the request and send a response. If I need to\nremember some detail of how exactly that’s accomplished, I just have to read\nthe function!</li>\n<li>also <code>net/http</code> is in the standard library, so you can start making websites\nwithout installing any libraries at all. I really appreciate this one.</li>\n<li>Go is a pretty systems-y language, so if I need to run an <code>ioctl</code> or\nsomething that’s easy to do</li>\n</ul>\n<p>In general everything about it feels like it makes projects easy to work on for\n5 days, abandon for 2 years, and then get back into writing code without a lot\nof problems.</p>\n<p>For contrast, I’ve tried to learn Rails a couple of times and I really <em>want</em>\nto love Rails – I’ve made a couple of toy websites in Rails and it’s always\nfelt like a really magical experience. But ultimately when I come back to those\nprojects I can’t remember how anything works and I just end up giving up. It\nfeels easier to me to come back to my Go projects that are full of a lot of\nrepetitive boilerplate, because at least I can read the code and figure out how\nit works.</p>\n<h3 id=\"things-i-haven-t-figured-out-yet\">things I haven’t figured out yet</h3>\n<p>some things I haven’t done much of yet in Go:</p>\n<ul>\n<li>rendering HTML templates: usually my Go servers are just APIs and I make the\nfrontend a single-page app with Vue. 
I’ve used <code>html/template</code> a lot in Hugo (which I’ve used for this blog for the last 8 years)\nbut I’m still not sure how I feel about it.</li>\n<li>I’ve never made a real login system, usually my servers don’t have users at all.</li>\n<li>I’ve never tried to implement CSRF</li>\n</ul>\n<p>In general I’m not sure how to implement security-sensitive features so I don’t\nstart projects which need login/CSRF/etc. I imagine this is where a framework\nwould help.</p>\n<h3 id=\"it-s-cool-to-see-the-new-features-go-has-been-adding\">it’s cool to see the new features Go has been adding</h3>\n<p>Both of the Go features I mentioned in this post (<code>GOMEMLIMIT</code> and the routing)\nare new in the last couple of years and I didn’t notice when they came out. It\nmakes me think I should pay closer attention to the release notes for new Go\nversions.</p>",
      "image": null,
      "media": [],
      "authors": [
        {
          "name": "Julia Evans",
          "email": null,
          "url": null
        }
      ],
      "categories": []
    },
    {
      "id": "https://jvns.ca/blog/2024/09/12/reasons-i--still--love-fish/",
      "title": "Reasons I still love the fish shell",
      "description": null,
      "url": "https://jvns.ca/blog/2024/09/12/reasons-i--still--love-fish/",
      "published": null,
      "updated": "2024-09-12T15:09:12.000Z",
      "content": "<p>I wrote about how much I love <a href=\"https://fishshell.com/\">fish</a> in <a href=\"https://jvns.ca/blog/2017/04/23/the-fish-shell-is-awesome/\">this blog post from 2017</a> and, 7 years\nof using it every day later, I’ve found even more reasons to love it. So I\nthought I’d write a new post with both the old reasons I loved it and some\nreasons.</p>\n<p>This came up today because I was trying to figure out why my terminal doesn’t\nbreak anymore when I cat a binary to my terminal, the answer was “fish fixes\nthe terminal!”, and I just thought that was really nice.</p>\n<h3 id=\"1-no-configuration\">1. no configuration</h3>\n<p>In 10 years of using fish I have never found a single thing I wanted to configure. It just works the way I want. My fish config file just has:</p>\n<ul>\n<li>environment variables</li>\n<li>aliases (<code>alias ls eza</code>, <code>alias vim nvim</code>, etc)</li>\n<li>the occasional <code>direnv hook fish | source</code> to integrate a tool like direnv</li>\n<li>a script I run to set up my <a href=\"https://github.com/chriskempson/base16-shell/blob/588691ba71b47e75793ed9edfcfaa058326a6f41/scripts/base16-solarized-light.sh\">terminal colours</a></li>\n</ul>\n<p>I’ve been told that configuring things in fish is really easy if you ever do\nwant to configure something though.</p>\n<h3 id=\"2-autosuggestions-from-my-shell-history\">2. autosuggestions from my shell history</h3>\n<p>My absolute favourite thing about fish is that I type, it’ll automatically\nsuggest (in light grey) a matching command that I ran recently. I can press the\nright arrow key to accept the completion, or keep typing to ignore it.</p>\n<p>Here’s what that looks like. 
In this example I just typed the “v” key and it\nguessed that I want to run the previous vim command again.</p>\n<img src=\"https://jvns.ca/images/fish-2024.png\">\n<h3 id=\"2-5-smart-shell-autosuggestions\">2.5 “smart” shell autosuggestions</h3>\n<p>One of my favourite subtle autocomplete features is how fish handles autocompleting commands that contain paths in them. For example, if I run:</p>\n<pre><code>$ ls blah.txt\n</code></pre>\n<p>that command will only be autocompleted in directories that contain <code>blah.txt</code> – it won’t show up in a different directory. (here’s <a href=\"https://github.com/fish-shell/fish-shell/issues/120#issuecomment-6376019\">a short comment about how it works</a>)</p>\n<p>As an example, if in this directory I type <code>bash scripts/</code>, it’ll only suggest\nhistory commands including files that <em>actually exist</em> in my blog’s scripts\nfolder, and not the dozens of other irrelevant <code>scripts/</code> commands I’ve run in\nother folders.</p>\n<p>I didn’t understand exactly how this worked until last week, it just felt like fish was\nmagically able to suggest the right commands. It still feels a little like magic and I love it.</p>\n<h3 id=\"3-pasting-multiline-commands\">3. pasting multiline commands</h3>\n<p>If I copy and paste multiple lines, bash will run them all, like this:</p>\n<pre><code>[bork@grapefruit linux-playground (main)]$ echo hi\nhi\n[bork@grapefruit linux-playground (main)]$ touch blah\n[bork@grapefruit linux-playground (main)]$ echo hi\nhi\n</code></pre>\n<p>This is a bit alarming – what if I didn’t actually <em>want</em> to run all those\ncommands?</p>\n<p>Fish will paste them all at a single prompt, so that I can press Enter if I\nactually want to run them. Much less scary.</p>\n<pre><code>bork@grapefruit ~/work/> echo hi\n\n                         touch blah\n                         echo hi\n</code></pre>\n<h3 id=\"4-nice-tab-completion\">4. 
nice tab completion</h3>\n<p>If I run <code>ls</code> and press tab, it’ll display all the filenames in a nice grid. I can use either Tab, Shift+Tab, or the arrow keys to navigate the grid.</p>\n<p>Also, I can tab complete from the <strong>middle</strong> of a filename – if the filename\nstarts with a weird character (or if it’s just not very unique), I can type\nsome characters from the middle and press tab.</p>\n<p>Here’s what the tab completion looks like:</p>\n<pre><code>bork@grapefruit ~/work/> ls \napi/  blah.py     fly.toml   README.md\nblah  Dockerfile  frontend/  test_websocket.sh\n</code></pre>\n<p>I honestly don’t complete things other than filenames very much so I can’t\nspeak to that, but I’ve found the experience of tab completing filenames to be\nvery good.</p>\n<h3 id=\"5-nice-default-prompt-including-git-integration\">5. nice default prompt (including git integration)</h3>\n<p>Fish’s default prompt includes everything I want:</p>\n<ul>\n<li>username</li>\n<li>hostname</li>\n<li>current folder</li>\n<li>git integration</li>\n<li>status of last command exit (if the last command failed)</li>\n</ul>\n<p>Here’s a screenshot with a few different variations on the default prompt,\nincluding if the last command was interrupted (the <code>SIGINT</code>) or failed.</p>\n<img src=\"https://jvns.ca/images/fish-prompt-2024.png\">\n<h3 id=\"6-nice-history-defaults\">6. nice history defaults</h3>\n<p>In bash, the maximum history size is 500 by default, presumably because\ncomputers used to be slow and not have a lot of disk space. Also, by default,\ncommands don’t get added to your history until you end your session. So if your\ncomputer crashes, you lose some history.</p>\n<p>In fish:</p>\n<ol>\n<li>the default history size is 256,000 commands. 
I don’t see any reason I’d ever need more.</li>\n<li>if you open a new tab, everything you’ve ever run (including commands in\nopen sessions) is immediately available to you</li>\n<li>in an existing session, the history search will only include commands from\nthe current session, plus everything that was in history at the time that\nyou started the shell</li>\n</ol>\n<p>I’m not sure how clearly I’m explaining how fish’s history system works here,\nbut it feels really good to me in practice. My impression is that the way it’s\nimplemented is that commands are continually added to the history file, but fish\nonly loads the history file once, on startup.</p>\n<p>I’ll mention here that if you want to have a fancier history system in another\nshell it might be worth checking out <a href=\"https://github.com/atuinsh/atuin\">atuin</a> or <a href=\"https://github.com/junegunn/fzf\">fzf</a>.</p>\n<h3 id=\"7-press-up-arrow-to-search-history\">7. press up arrow to search history</h3>\n<p>I also like fish’s interface for searching history: for example if I want to\nedit my fish config file, I can just type:</p>\n<pre><code>$ config.fish\n</code></pre>\n<p>and then press the up arrow to go back to the last command that included <code>config.fish</code>. That’ll complete to:</p>\n<pre><code>$ vim ~/.config/fish/config.fish\n</code></pre>\n<p>and I’m done. This isn’t <em>so</em> different from using <code>Ctrl+R</code> in bash to search\nyour history but I think I like it a little better overall, maybe because\n<code>Ctrl+R</code> has some behaviours that I find confusing (for example you can\nend up accidentally editing your history which I don’t like).</p>\n<h3 id=\"8-the-terminal-doesn-t-break\">8. 
the terminal doesn’t break</h3>\n<p>I used to run into issues with bash where I’d accidentally <code>cat</code> a binary to\nthe terminal, and it would break the terminal.</p>\n<p>Every time fish displays a prompt, it’ll try to fix up your terminal so that\nyou don’t end up in weird situations like this. I think <a href=\"https://github.com/fish-shell/fish-shell/blob/a979b6341d7fc4c466b3992f25da3209e0808aaa/src/reader.rs#L3601-L3623\">this is some of the\ncode in fish to prevent broken terminals</a>.</p>\n<p>Some things that it does are:</p>\n<ul>\n<li>turn on <code>echo</code> so that you can see the characters you type</li>\n<li>make sure that newlines work properly so that you don’t get that weird staircase effect</li>\n<li>reset your terminal background colour, etc</li>\n</ul>\n<p>I don’t think I’ve run into any of these “my terminal is broken” issues in a\nvery long time, and I actually didn’t even realize that this was because of\nfish – I thought that things somehow magically just got better, or maybe I\nwasn’t making as many mistakes. But I think it was mostly fish saving me from\nmyself, and I really appreciate that.</p>\n<h3 id=\"9-ctrl-s-is-disabled\">9. Ctrl+S is disabled</h3>\n<p>Also related to terminals breaking: fish disables Ctrl+S (which freezes your\nterminal and then you need to remember to press Ctrl+Q to unfreeze it). It’s a\nfeature that I’ve never wanted and I’m happy to not have it.</p>\n<p>Apparently you can disable <code>Ctrl+S</code> in other shells with <code>stty -ixon</code>.</p>\n<h3 id=\"10-nice-syntax-highlighting\">10. nice syntax highlighting</h3>\n<p>By default commands that don’t exist are highlighted in red, like this.</p>\n<img src=\"https://jvns.ca/images/fish-syntax-2024.png\">\n<h3 id=\"11-easier-loops\">11. easier loops</h3>\n<p>I find the loop syntax in fish a lot easier to type than the bash syntax. 
It looks like this:</p>\n<pre><code>for i in *.yaml\n  echo $i\nend\n</code></pre>\n<p>Also it’ll add indentation in your loops which is nice.</p>\n<h3 id=\"12-easier-multiline-editing\">12. easier multiline editing</h3>\n<p>Related to loops: you can edit multiline commands much more easily than in bash\n(just use the arrow keys to navigate the multiline command!). Also when you use\nthe up arrow to get a multiline command from your history, it’ll show you the\nwhole command the exact same way you typed it instead of squishing it all onto\none line like bash does:</p>\n<pre><code>$ bash\n$ for i in *.png\n> do\n> echo $i\n> done\n$ # press up arrow\n$ for i in *.png; do echo $i; done ink\n</code></pre>\n<h3 id=\"13-ctrl-left-arrow\">13. Ctrl+left arrow</h3>\n<p>This might just be me, but I really appreciate that fish has the <code>Ctrl+left arrow</code> / <code>Ctrl+right arrow</code> keyboard shortcut for moving between\nwords when writing a command.</p>\n<p>I’m honestly a bit confused about where this keyboard shortcut is coming from\n(the only documented keyboard shortcut for this I can find in fish is <code>Alt+left arrow</code> / <code>Alt + right arrow</code> which seems to do the same thing), but I’m pretty\nsure this is a fish shortcut.</p>\n<p>A couple of notes about getting this shortcut to work / where it comes from:</p>\n<ul>\n<li>one person said they needed to switch their terminal emulator from the “Linux\nconsole” keybindings to “Default (XFree 4)” to get it to work in fish</li>\n<li>on Mac OS, <code>Ctrl+left arrow</code> switches workspaces by default, so I had to turn\nthat off.</li>\n<li>Also apparently Ubuntu configures libreadline in <code>/etc/inputrc</code> to make\n<code>Ctrl+left/right arrow</code> go back/forward a word, so it’ll work in bash on\nUbuntu and maybe other Linux distros too. 
Here’s a <a href=\"https://stackoverflow.com/questions/5029118/bash-ctrl-to-move-cursor-between-words-strings\">stack overflow question talking about that</a></li>\n</ul>\n<h3 id=\"a-downside-not-everything-has-a-fish-integration\">a downside: not everything has a fish integration</h3>\n<p>Sometimes tools don’t have instructions for integrating them with fish. That’s annoying, but:</p>\n<ul>\n<li>I’ve found this has gotten better over the last 10 years as fish has gotten\nmore popular. For example Python’s virtualenv has had a fish integration for\na long time now.</li>\n<li>If I need to run a POSIX shell command real quick, I can always just run <code>bash</code> or <code>zsh</code></li>\n<li>I’ve gotten much better over the years at translating simple commands to fish syntax when I need to</li>\n</ul>\n<p>My biggest day-to-day annoyance is probably that for whatever reason I’m\nstill not used to fish’s syntax for setting environment variables, I get confused\nabout <code>set</code> vs <code>set -x</code>.</p>\n<h3 id=\"another-downside-fish-add-path\">another downside: <code>fish_add_path</code></h3>\n<p>fish has a function called <code>fish_add_path</code> that you can run to add a directory\nto your <code>PATH</code> like this:</p>\n<pre><code>fish_add_path /some/directory\n</code></pre>\n<p>I love the idea of it and I used to use it all the time, but I’ve stopped using\nit for two reasons:</p>\n<ol>\n<li>Sometimes <code>fish_add_path</code> will update the <code>PATH</code> for every session in the\nfuture (with a “universal variable”) and sometimes it will update the <code>PATH</code>\njust for the current session. 
It’s hard for me to tell which one it will\ndo: in theory the docs explain this but I could not understand them.</li>\n<li>If you ever need to <em>remove</em> the directory from your <code>PATH</code> a few weeks or\nmonths later because maybe you made a mistake, that’s also kind of hard to do\n(there are <a href=\"https://github.com/fish-shell/fish-shell/issues/8604\">instructions in the comments of this GitHub issue though</a>).</li>\n</ol>\n<p>Instead I just update my PATH like this, similarly to how I’d do it in bash:</p>\n<pre><code>set PATH $PATH /some/directory/bin\n</code></pre>\n<h3 id=\"on-posix-compatibility\">on POSIX compatibility</h3>\n<p>When I started using fish, you couldn’t do things like <code>cmd1 && cmd2</code> – it\nwould complain “no, you need to run <code>cmd1; and cmd2</code>” instead.</p>\n<p>It seems like over the years fish has started accepting a little more POSIX-style syntax than it used to, like:</p>\n<ul>\n<li><code>cmd1 && cmd2</code></li>\n<li><code>export a=b</code> to set an environment variable (though this seems a bit limited, you can’t do <code>export PATH=$PATH:/whatever</code> so I think it’s probably better to learn <code>set</code> instead)</li>\n</ul>\n<h3 id=\"on-fish-as-a-default-shell\">on fish as a default shell</h3>\n<p>Changing my default shell to fish is always a little annoying, I occasionally get myself into a situation where</p>\n<ol>\n<li>I install fish somewhere like maybe <code>/home/bork/.nix-stuff/bin/fish</code></li>\n<li>I add the new fish location to <code>/etc/shells</code> as an allowed shell</li>\n<li>I change my shell with <code>chsh</code></li>\n<li>at some point months/years later I reinstall fish in a different location for some reason and remove the old one</li>\n<li>oh no!!! I have no valid shell! 
I can’t open a new terminal tab anymore!</li>\n</ol>\n<p>This has never been a major issue because I always have a terminal open\nsomewhere where I can fix the problem and rescue myself, but it’s a bit\nalarming.</p>\n<p>If you don’t want to use <code>chsh</code> to change your shell to fish (which is very reasonable,\nmaybe I shouldn’t be doing that), the <a href=\"https://wiki.archlinux.org/title/Fish\">Arch wiki page</a> has a couple of good suggestions –\neither configure your terminal emulator to run fish or add an <code>exec fish</code> to\nyour <code>.bashrc</code>.</p>\n<h3 id=\"i-ve-never-really-learned-the-scripting-language\">I’ve never really learned the scripting language</h3>\n<p>Other than occasionally writing a for loop interactively on the command line,\nI’ve never really learned the fish scripting language. I still do all of my\nshell scripting in bash.</p>\n<p>I don’t think I’ve ever written a fish function or <code>if</code> statement.</p>\n<h3 id=\"it-seems-like-fish-is-getting-pretty-popular\">it seems like fish is getting pretty popular</h3>\n<p>I ran a highly unscientific poll on Mastodon asking people what shell they <a href=\"https://social.jvns.ca/@b0rk/112722850642874842\">use interactively</a>. The results were (of 2600 responses):</p>\n<ul>\n<li>46% bash</li>\n<li>49% zsh</li>\n<li>16% fish</li>\n<li>5% other</li>\n</ul>\n<p>I think 16% for fish is pretty remarkable, since (as far as I know) there isn’t\nany system where fish is the default shell, and my sense is that it’s very\ncommon to just stick to whatever your system’s default shell is.</p>\n<p>It feels like a big achievement for the fish project, even if maybe my Mastodon\nfollowers are more likely than the average shell user to use fish for some\nreason.</p>\n<h3 id=\"who-might-fish-be-right-for\">who might fish be right for?</h3>\n<p>Fish definitely isn’t for everyone. 
I think I like it because:</p>\n<ol>\n<li>I really dislike configuring my shell (and honestly my dev environment in general), I want things to “just work” with the default settings</li>\n<li>fish’s defaults feel good to me</li>\n<li>I don’t spend that much time logged into random servers using other shells\nso there’s not too much context switching</li>\n<li>I liked its features so much that I was willing to relearn how to do a few\n“basic” shell things, like using parentheses <code>(seq 1 10)</code> to run a command\ninstead of backticks or using <code>set</code> instead of <code>export</code></li>\n</ol>\n<p>Maybe you’re also a person who would like fish! I hope a few more of the people\nwho fish is for can find it, because I spend so much of my time in the terminal\nand it’s made that time much more pleasant.</p>",
      "image": null,
      "media": [],
      "authors": [
        {
          "name": "Julia Evans",
          "email": null,
          "url": null
        }
      ],
      "categories": []
    },
    {
      "id": "https://jvns.ca/blog/2024/08/19/migrating-mess-with-dns-to-use-powerdns/",
      "title": "Migrating Mess With DNS to use PowerDNS",
      "description": null,
      "url": "https://jvns.ca/blog/2024/08/19/migrating-mess-with-dns-to-use-powerdns/",
      "published": null,
      "updated": "2024-08-19T08:15:28.000Z",
      "content": "<p>About 3 years ago, I announced <a href=\"https://messwithdns.net/\">Mess With DNS</a> in\n<a href=\"https://jvns.ca/blog/2021/12/15/mess-with-dns/\">this blog post</a>, a playground\nwhere you can learn how DNS works by messing around and creating records.</p>\n<p>I wasn’t very careful with the DNS implementation though (to quote the release blog\npost: “following the DNS RFCs? not exactly”), and people started reporting\nproblems that eventually I decided that I wanted to fix.</p>\n<h3 id=\"the-problems\">the problems</h3>\n<p>Some of the problems people have reported were:</p>\n<ul>\n<li>domain names with underscores weren’t allowed, even though they should be</li>\n<li>If there was a CNAME record for a domain name, it allowed you to create other records for that domain name, even if it shouldn’t</li>\n<li>you could create 2 different CNAME records for the same domain name, which shouldn’t be allowed</li>\n<li>no support for the SVCB or HTTPS record types, which seemed a little complex to implement</li>\n<li>no support for upgrading from UDP to TCP for big responses</li>\n</ul>\n<p>And there are certainly more issues that nobody got around to reporting, for\nexample that if you added an NS record for a subdomain to delegate it, Mess\nWith DNS wouldn’t handle the delegation properly.</p>\n<h3 id=\"the-solution-powerdns\">the solution: PowerDNS</h3>\n<p>I wasn’t sure how to fix these problems for a long time – technically I\n<em>could</em> have started addressing them individually, but it felt like there were\na million edge cases and I’d never get there.</p>\n<p>But then one day I was chatting with someone else who was working on a DNS\nserver and they said they were using <a href=\"https://github.com/PowerDNS/pdns/\">PowerDNS</a>: an open\nsource DNS server with an HTTP API!</p>\n<p>This seemed like an obvious solution to my problems – I could just swap out my\nown crappy DNS implementation for PowerDNS.</p>\n<p>There were a couple of 
challenges I ran into when setting up PowerDNS that I’ll\ntalk about here. I really don’t do a lot of web development and I think I’ve never\nbuilt a website that depends on a relatively complex API before, so it was a\nbit of a learning experience.</p>\n<h3 id=\"challenge-1-getting-every-query-made-to-the-dns-server\">challenge 1: getting every query made to the DNS server</h3>\n<p>One of the main things Mess With DNS does is give you a live view of every DNS\nquery it receives for your subdomain, using a websocket. To make this work, it\nneeds to intercept every DNS query before it gets sent to the PowerDNS DNS\nserver.</p>\n<p>There were 2 options I could think of for how to intercept the DNS queries:</p>\n<ol>\n<li>dnstap: <code>dnsdist</code> (a DNS load balancer from the PowerDNS project) has\nsupport for logging all DNS queries it receives using\n<a href=\"https://dnstap.info/\">dnstap</a>, so I could put dnsdist in front of PowerDNS\nand then log queries that way</li>\n<li>Have my Go server listen on port 53 and proxy the queries myself</li>\n</ol>\n<p>I originally implemented option #1, but for some reason there was a 1 second\ndelay before every query got logged. 
I couldn’t figure out why, so I\nimplemented my own <a href=\"https://github.com/jvns/mess-with-dns/blob/3423c9496dd772f7157a56f9e068fd926e89c331/api/main.go#L265-L310\">very simple proxy</a> instead.</p>\n<h3 id=\"challenge-2-should-the-frontend-have-direct-access-to-the-powerdns-api\">challenge 2: should the frontend have direct access to the PowerDNS API?</h3>\n<p>The frontend used to have a lot of DNS logic in it – it converted emoji domain\nnames to ASCII using punycode, had a lookup table to convert numeric DNS query\ntypes (like <code>1</code>) to their human-readable names (like <code>A</code>), did a little bit of\nvalidation, and more.</p>\n<p>Originally I considered keeping this pattern and just giving the frontend (more\nor less) direct access to the PowerDNS API to create and delete, but writing\neven more complex code in Javascript didn’t feel that appealing to me – I\ndon’t really know how to write tests in Javascript and it seemed like it\nwouldn’t end well.</p>\n<p>So I decided to take all of the DNS logic out of the frontend and write a new\nDNS API for managing records, shaped something like this:</p>\n<ul>\n<li><code>GET /records</code></li>\n<li><code>DELETE /records/<ID></code></li>\n<li><code>DELETE /records/</code> (delete all records for a user)</li>\n<li><code>POST /records/</code> (create record)</li>\n<li><code>POST /records/<ID></code> (update record)</li>\n</ul>\n<p>This meant that I could actually write tests for my code, since the backend is\nin Go and I do know how to write tests in Go.</p>\n<h3 id=\"what-i-learned-it-s-okay-for-an-api-to-duplicate-information\">what I learned: it’s okay for an API to duplicate information</h3>\n<p>I had this idea that APIs shouldn’t return duplicate information – for example\nif I get a DNS record, it should only include a given piece of information\nonce.</p>\n<p>But I ran into a problem with that idea when displaying MX records: an MX\nrecord has 2 fields, “preference”, and “mail server”. 
And I needed to display\nthat information in 2 different ways on the frontend:</p>\n<ol>\n<li>In a form, where “Preference” and “Mail Server” are 2 different form fields (like <code>10</code> and <code>mail.example.com</code>)</li>\n<li>In a summary view, where I wanted to just show the record (<code>10 mail.example.com</code>)</li>\n</ol>\n<p>This is kind of a small problem, but it came up in a few different places.</p>\n<p>I talked to my friend Marco Rogers about this, and based on some advice from\nhim I realized that I could return the same information in the API in 2\ndifferent ways! Then the frontend just has to display it. So I started just\nreturning duplicate information in the API, something like this:</p>\n<pre><code>{\n  values: {'Preference': 10, 'Server': 'mail.example.com'},\n  content: '10 mail.example.com',\n  ...\n}\n</code></pre>\n<p>I ended up using this pattern in a couple of other places where I needed to\ndisplay the same information in 2 different ways and it was SO much easier.</p>\n<p>I think what I learned from this is that if I’m making an API that isn’t\nintended for external use (there are no users of this API other than the\nfrontend!), I can tailor it very specifically to the frontend’s needs and\nthat’s okay.</p>\n<h3 id=\"challenge-3-what-s-a-record-s-id\">challenge 3: what’s a record’s ID?</h3>\n<p>In Mess With DNS (and I think in most DNS user interfaces!), you create, add, and delete <strong>records</strong>.</p>\n<p>But that’s not how the PowerDNS API works. In PowerDNS, you create a <strong>zone</strong>,\nwhich is made of <strong>record sets</strong>. 
Records don’t have any ID in the API at all.</p>\n<p>I ended up solving this by generating a fake ID for each record, which is made of:</p>\n<ul>\n<li>its <strong>name</strong></li>\n<li>its <strong>type</strong></li>\n<li>and its <strong>content</strong> (base64-encoded)</li>\n</ul>\n<p>For example one record’s ID is <code>brooch225.messwithdns.com.|NS|bnMxLm1lc3N3aXRoZG5zLmNvbS4=</code></p>\n<p>Then I can search through the zone and find the appropriate record to update\nit.</p>\n<p>This means that if you update a record then its ID will change which isn’t\nusually what I want in an ID, but that seems fine.</p>\n<h3 id=\"challenge-4-making-clear-error-messages\">challenge 4: making clear error messages</h3>\n<p>I think the error messages that the PowerDNS API returns aren’t really intended to be shown to end users, for example:</p>\n<ul>\n<li><code>Name 'new\\032site.island358.messwithdns.com.' contains unsupported characters</code> (this error encodes the space as <code>\\032</code>, which is a bit disorienting if you don’t know that the space character is 32 in ASCII)</li>\n<li><code>RRset test.pear5.messwithdns.com. 
IN CNAME: Conflicts with pre-existing RRset</code> (this talks about RRsets, which aren’t a concept that the Mess With DNS UI has at all)</li>\n<li><code>Record orange.beryl5.messwithdns.com./A '1.2.3.4$': Parsing record content (try 'pdnsutil check-zone'): unable to parse IP address, strange character: $</code> (mentions “pdnsutil”, a utility which Mess With DNS’s users don’t have\naccess to in this context)</li>\n</ul>\n<p>I ended up handling this in two ways:</p>\n<ol>\n<li>Do some initial basic validation of values that users enter (like IP addresses), so I can just return errors like <code>Invalid IPv4 address: \"1.2.3.4$</code></li>\n<li>If that goes well, send the request to PowerDNS and if we get an error back, then do some <a href=\"https://github.com/jvns/mess-with-dns/blob/c02579190e103218b2c8dfc6dceb19f863752f15/api/records/pdns_errors.go\">hacky translation</a> of those messages to make them clearer.</li>\n</ol>\n<p>Sometimes users will still get errors from PowerDNS directly, but I added some\nlogging of all the errors that users see, so hopefully I can review them and\nadd extra translations if there are other common errors that come up.</p>\n<p>I think what I learned from this is that if I’m building a user-facing\napplication on top of an API, I need to be pretty thoughtful about how I\nresurface those errors to users.</p>\n<h3 id=\"challenge-5-setting-up-sqlite\">challenge 5: setting up SQLite</h3>\n<p>Previously Mess With DNS was using a Postgres database. This was problematic\nbecause I only gave the Postgres machine 256MB of RAM, which meant that the\ndatabase got OOM killed almost every single day. I never really worked out\nexactly why it got OOM killed every day, but that’s how it was. 
I spent some\ntime trying to tune Postgres’ memory usage by setting the max connections /\n<code>work-mem</code> / <code>maintenance-work-mem</code> and it helped a bit but didn’t solve the\nproblem.</p>\n<p>So for this refactor I decided to use SQLite instead, because the website\ndoesn’t really get that much traffic. There are some choices involved with\nusing SQLite, and I decided to:</p>\n<ol>\n<li>Run <code>db.SetMaxOpenConns(1)</code> to make sure that we only open 1 connection to\nthe database at a time, to prevent <code>SQLITE_BUSY</code> errors from two threads\ntrying to access the database at the same time (just setting WAL mode didn’t\nwork)</li>\n<li>Use separate databases for each of the 3 tables (users, records, and\nrequests) to reduce contention. This maybe isn’t really necessary, but there\nwas no reason I needed the tables to be in the same database so I figured I’d set\nup separate databases to be safe.</li>\n<li>Use the cgo-free <a href=\"https://pkg.go.dev/modernc.org/sqlite?utm_source=godoc\">modernc.org/sqlite</a>, which <a href=\"https://datastation.multiprocess.io/blog/2022-05-12-sqlite-in-go-with-and-without-cgo.html\">translates SQLite’s source code to Go</a>.\nI might switch to a more “normal” sqlite implementation instead at some point and use cgo though.\nI think the main reason I prefer to avoid cgo is that cgo has landed me with <a href=\"https://jvns.ca/blog/2021/11/17/debugging-a-weird--file-not-found--error/\">difficult-to-debug errors in the past</a>.</li>\n<li>use WAL mode</li>\n</ol>\n<p>I still haven’t set up backups, though I don’t think my Postgres database had\nbackups either. 
I think I’m unlikely to use\n<a href=\"https://litestream.io/\">litestream</a> for backups – Mess With DNS is very far\nfrom a critical application, and I think daily backups that I could recover\nfrom in case of a disaster are more than good enough.</p>\n<h3 id=\"challenge-6-upgrading-vue-managing-forms\">challenge 6: upgrading Vue & managing forms</h3>\n<p>This has nothing to do with PowerDNS but I decided to upgrade Vue.js from\nversion 2 to 3 as part of this refresh. The main problem with that is that the\nform validation library I was using (FormKit) completely changed its API\nbetween Vue 2 and Vue 3, so I decided to just stop using it instead of learning\nthe new API.</p>\n<p>I ended up switching to some form validation tools that are built into the\nbrowser like <code>required</code> and <code>oninvalid</code> (<a href=\"https://github.com/jvns/mess-with-dns/blob/90f7a2d2982c8151a3ddcab532bc1db07a043f84/frontend/components/NewRecord.html#L5-L8\">here’s the code</a>).\nI think it could use some improvement; I still don’t understand forms very well.</p>\n<h3 id=\"challenge-7-managing-state-in-the-frontend\">challenge 7: managing state in the frontend</h3>\n<p>This also has nothing to do with PowerDNS, but when modifying the frontend I\nrealized that my state management in the frontend was a mess – in every place\nwhere I made an API request to the backend, I had to try to remember to add a\n“refresh records” call after that in every place that I’d modified the state\nand I wasn’t always consistent about it.</p>\n<p>With some more advice from Marco, I ended up implementing a single global\n<a href=\"https://github.com/jvns/mess-with-dns/blob/90f7a2d2982c8151a3ddcab532bc1db07a043f84/frontend/store.ts#L32-L44\">state management store</a>\nwhich stores all the state for the application, and which lets me\ncreate/update/delete records.</p>\n<p>Then my components can just call <code>store.createRecord(record)</code>, and the store\nwill automatically 
resynchronize all of the state as needed.</p>\n<h3 id=\"challenge-8-sequencing-the-project\">challenge 8: sequencing the project</h3>\n<p>This project ended up having several steps because I reworked the whole\nintegration between the frontend and the backend. I ended up splitting it into\na few different phases:</p>\n<ol>\n<li>Upgrade Vue from v2 to v3</li>\n<li>Make the state management store</li>\n<li>Implement a different backend API, move a lot of DNS logic out of the frontend, and add tests for the backend</li>\n<li>Integrate PowerDNS</li>\n</ol>\n<p>I made sure that the website was (more or less) 100% working and then deployed\nit in between phases, so that the amount of changes I was managing at a time\nstayed somewhat under control.</p>\n<h3 id=\"the-new-website-is-up-now\">the new website is up now!</h3>\n<p>I released the upgraded website a few days ago and it seems to work!\nThe PowerDNS API has been great to work on top of, and I’m relieved that\nthere’s a whole class of problems that I now don’t have to think about at all,\nother than potentially trying to make the error messages from PowerDNS a little\nclearer. Using PowerDNS has fixed a lot of the DNS issues that folks have\nreported in the last few years and it feels great.</p>\n<p>If you run into problems with the new Mess With DNS I’d love to <a href=\"https://github.com/jvns/mess-with-dns/issues/\">hear about them here</a>.</p>",
      "image": null,
      "media": [],
      "authors": [
        {
          "name": "Julia Evans",
          "email": null,
          "url": null
        }
      ],
      "categories": []
    },
    {
      "id": "https://jvns.ca/blog/2024/08/06/go-structs-copied-on-assignment/",
      "title": "Go structs are copied on assignment (and other things about Go I'd missed)",
      "description": null,
      "url": "https://jvns.ca/blog/2024/08/06/go-structs-copied-on-assignment/",
      "published": null,
      "updated": "2024-08-06T08:38:35.000Z",
      "content": "<p>I’ve been writing Go pretty casually for years – the backends for all of my\nplaygrounds (<a href=\"https://nginx-playground.wizardzines.com/\">nginx</a>, <a href=\"https://messwithdns.net/\">dns</a>, <a href=\"https://memory-spy.wizardzines.com/\">memory</a>, <a href=\"https://dns-lookup.jvns.ca/\">more DNS</a>) are written in Go, but many of those projects are just a few hundred lines and I don’t come back to those codebases much.</p>\n<p>I thought I more or less understood the basics of the language, but this week\nI’ve been writing a lot more Go than usual while working on some upgrades to\n<a href=\"https://messwithdns.net\">Mess with DNS</a>, and ran into a bug that revealed I\nwas missing a very basic concept!</p>\n<p>Then I posted about this on Mastodon and someone linked me to this very cool\nsite (and book) called <a href=\"https://100go.co\">100 Go Mistakes and How To Avoid Them</a> by <a href=\"https://teivah.dev/\">Teiva Harsanyi</a>. It just came out in 2022 so it’s relatively new.</p>\n<p>I decided to read through the site to see what <em>else</em> I was missing, and found\na couple of other misconceptions I had about Go. 
I’ll talk about some of the\nmistakes that jumped out to me the most, but really the whole\n<a href=\"https://100go.co/\">100 Go Mistakes</a> site is great and I’d recommend reading it.</p>\n<p>Here’s the initial mistake that started me on this journey:</p>\n<h3 id=\"mistake-1-not-understanding-that-structs-are-copied-on-assignment\">mistake 1: not understanding that structs are copied on assignment</h3>\n<p>Let’s say we have a struct:</p>\n<pre><code>type Thing struct {\n    Name string\n}\n</code></pre>\n<p>and this code:</p>\n<pre><code>thing := Thing{\"record\"}\nother_thing := thing\nother_thing.Name = \"banana\"\nfmt.Println(thing)\n</code></pre>\n<p>This prints “record” and not “banana” (<a href=\"https://go.dev/play/p/kUeP2ocFtXw\">play.go.dev link</a>), because <code>thing</code> is copied when you\nassign it to <code>other_thing</code>.</p>\n<h3 id=\"the-problem-this-caused-me-ranges\">the problem this caused me: ranges</h3>\n<p>The bug I spent 2 hours of my life debugging last week was effectively this code (<a href=\"https://go.dev/play/p/85FnGG86UBP\">play.go.dev link</a>):</p>\n<pre><code>type Thing struct {\n  Name string\n}\nfunc findThing(things []Thing, name string) *Thing {\n  for _, thing := range things {\n    if thing.Name == name {\n      return &thing\n    }\n  }\n  return nil\n}\n\nfunc main() {\n  things := []Thing{Thing{\"record\"}, Thing{\"banana\"}}\n  thing := findThing(things, \"record\")\n  thing.Name = \"gramaphone\"\n  fmt.Println(things)\n}\n</code></pre>\n<p>This prints out <code>[{record} {banana}]</code> – because <code>findThing</code> returned a copy, we didn’t change the name in the original array.</p>\n<p>This mistake is <a href=\"https://100go.co/#ignoring-that-elements-are-copied-in-range-loops-30\">#30 in 100 Go Mistakes</a>.</p>\n<p>I fixed the bug by changing it to something like this (<a href=\"https://go.dev/play/p/CKZCRUwv_nG\">play.go.dev link</a>), which returns a\nreference to the item in the array we’re looking 
for instead of a copy.</p>\n<pre><code>func findThing(things []Thing, name string) *Thing {\n  for i := range things {\n    if things[i].Name == name {\n      return &things[i]\n    }\n  }\n  return nil\n}\n</code></pre>\n<h3 id=\"why-didn-t-i-realize-this\">why didn’t I realize this?</h3>\n<p>When I learned that I was mistaken about how assignment worked in Go I was\nreally taken aback, like – it’s such a basic fact about how the language works!\nIf I was wrong about that then what ELSE am I wrong about in Go????</p>\n<p>My best guess for what happened is:</p>\n<ol>\n<li>I’ve heard for my whole life that when you define a function,\nyou need to think about whether its arguments are passed by <strong>reference</strong> or\nby <strong>value</strong></li>\n<li>So I’d thought about this in Go, and I knew that if you pass a struct as a\nvalue to a function, it gets copied – if you want to pass a reference then\nyou have to pass a pointer</li>\n<li>But somehow it never occurred to me that you need to think about the same\nthing for <strong>assignments</strong>, perhaps because in most of the other languages I\nuse (Python, JS, Java) I think everything is a reference anyway. 
Except for\nin Rust, where you do have values that you make copies of but I think most of the time I had to run <code>.clone()</code> explicitly.\n(though apparently structs will be automatically copied on assignment if the struct implements the <code>Copy</code> trait)</li>\n<li>Also obviously I just don’t write that much Go so I guess it’s never come\nup.</li>\n</ol>\n<h3 id=\"mistake-2-side-effects-appending-slices-25-https-100go-co-unexpected-side-effects-using-slice-append-25\">mistake 2: side effects appending slices (<a href=\"https://100go.co/#unexpected-side-effects-using-slice-append-25\">#25</a>)</h3>\n<p>When you subset a slice with <code>x[2:3]</code>, the original slice and the sub-slice\nshare the same backing array, so if you append to the new slice, it can\nunintentionally change the old slice:</p>\n<p>For example, this code prints <code>[1 2 3 555 5]</code> (<a href=\"https://go.dev/play/p/qssfM_NSXJD\">code on play.go.dev</a>)</p>\n<pre><code>x := []int{1, 2, 3, 4, 5}\ny := x[2:3]\ny = append(y, 555)\nfmt.Println(x)\n</code></pre>\n<p>I don’t think this has ever actually happened to me, but it’s alarming and I’m\nvery happy to know about it.</p>\n<p>Apparently you can avoid this problem by changing <code>y := x[2:3]</code> to <code>y := x[2:3:3]</code>, which restricts the new slice’s capacity so that appending to it\nwill re-allocate a new slice. 
Here’s some <a href=\"https://go.dev/play/p/aE78JUL4-Iv\">code on play.go.dev</a> that does that.</p>\n<h3 id=\"mistake-3-not-understanding-the-different-types-of-method-receivers-42\">mistake 3: not understanding the different types of method receivers (#42)</h3>\n<p>This one isn’t a “mistake” exactly, but it’s been a source of confusion for me\nand it’s pretty simple so I’m glad to have it cleared up.</p>\n<p>In Go you can declare methods in 2 different ways:</p>\n<ol>\n<li><code>func (t Thing) Function()</code> (a “value receiver”)</li>\n<li><code>func (t *Thing) Function()</code> (a “pointer receiver”)</li>\n</ol>\n<p>My understanding now is that basically:</p>\n<ul>\n<li>If you want the method to mutate the struct <code>t</code>, you need a pointer receiver.</li>\n<li>If you want to make sure the method <strong>doesn’t</strong> mutate the struct <code>t</code>, use a value receiver.</li>\n</ul>\n<p><a href=\"https://100go.co/#not-knowing-which-type-of-receiver-to-use-42\">Explanation #42</a> has a\nbunch of other interesting details though. 
There’s definitely still something\nI’m missing about value vs pointer receivers (I got a compile error related to\nthem a couple of times in the last week that I still don’t understand), but\nhopefully I’ll run into that error again soon and I can figure it out.</p>\n<h3 id=\"more-interesting-things-i-noticed\">more interesting things I noticed</h3>\n<p>Some more notes from 100 Go Mistakes:</p>\n<ul>\n<li>apparently you can <a href=\"https://100go.co/#never-using-named-result-parameters-43\">name the outputs of your function (#43)</a>, though that can have <a href=\"https://100go.co/#unintended-side-effects-with-named-result-parameters-44\">issues (#44)</a> and I’m not sure I want to</li>\n<li><a href=\"https://100go.co/#not-exploring-all-the-go-testing-features-90\">apparently you can put tests in a different package (#90)</a> to\nensure that you only use the package’s public interfaces, which seems really\nuseful</li>\n<li>there are a lot of notes about how to use contexts, channels, goroutines,\nmutexes, sync.WaitGroup, etc. 
I’m sure I have something to learn about all of\nthose but today is not the day I’m going to learn them.</li>\n</ul>\n<p>Also there are some things that have tripped me up in the past, like:</p>\n<ul>\n<li><a href=\"https://100go.co/#forgetting-the-return-statement-after-replying-to-an-http-request-80\">forgetting the return statement after replying to an HTTP request (#80)</a></li>\n<li><a href=\"https://100go.co/#not-using-testing-utility-packages-httptest-and-iotest-88\">not realizing the httptest package exists (#88)</a></li>\n</ul>\n<h3 id=\"this-100-common-mistakes-format-is-great\">this “100 common mistakes” format is great</h3>\n<p>I really appreciated this “100 common mistakes” format – it made it really\neasy for me to skim through the mistakes and very quickly mentally classify\nthem into:</p>\n<ol>\n<li>yep, I know that</li>\n<li>not interested in that one right now</li>\n<li>WOW WAIT I DID NOT KNOW THAT, THAT IS VERY USEFUL!!!!</li>\n</ol>\n<p>It looks like “100 Common Mistakes” is a series of books from Manning and they\nalso have “100 Java Mistakes” and an upcoming “100 SQL Server Mistakes”.</p>\n<p>Also I enjoyed what I’ve read of <a href=\"https://effectivepython.com/\">Effective Python</a> by Brett Slatkin, which has a similar “here are a bunch of\nshort Python style tips” structure where you can quickly skim it and take\nwhat’s useful to you. 
There’s also Effective C++, Effective Java, and probably\nmore.</p>\n<h3 id=\"some-other-go-resources\">some other Go resources</h3>\n<p>other resources I’ve appreciated:</p>\n<ul>\n<li><a href=\"https://gobyexample.com/\">Go by example</a> for basic syntax</li>\n<li><a href=\"https://go.dev/play/\">go.dev/play</a></li>\n<li>obviously <a href=\"https://pkg.go.dev\">https://pkg.go.dev</a> for documentation about literally everything</li>\n<li><a href=\"https://staticcheck.dev/\">staticcheck</a> seems like a useful linter – for\nexample I just started using it to tell me when I’ve forgotten to handle an\nerror</li>\n<li>apparently <a href=\"https://golangci-lint.run/\">golangci-lint</a> includes a bunch of different linters</li>\n</ul>",
      "image": null,
      "media": [],
      "authors": [
        {
          "name": "Julia Evans",
          "email": null,
          "url": null
        }
      ],
      "categories": []
    },
    {
      "id": "https://jvns.ca/blog/2024/07/08/readline/",
      "title": "Entering text in the terminal is complicated",
      "description": null,
      "url": "https://jvns.ca/blog/2024/07/08/readline/",
      "published": null,
      "updated": "2024-07-08T13:00:15.000Z",
      "content": "<p>The other day I asked what folks on Mastodon find confusing about working in\nthe terminal, and one thing that stood out to me was “editing a command you\nalready typed in”.</p>\n<p>This really resonated with me: even though entering some text and editing it is\na very “basic” task, it took me maybe 15 years of using the terminal every\nsingle day to get used to using <code>Ctrl+A</code> to go to the beginning of the line (or\n<code>Ctrl+E</code> for the end – I think I used <code>Home</code>/<code>End</code> instead).</p>\n<p>So let’s talk about why entering text might be hard! I’ll also share a few tips\nthat I wish I’d learned earlier.</p>\n<h3 id=\"it-s-very-inconsistent-between-programs\">it’s very inconsistent between programs</h3>\n<p>A big part of what makes entering text in the terminal hard is the\ninconsistency between how different programs handle entering text. For example:</p>\n<ol>\n<li>some programs (<code>cat</code>, <code>nc</code>, <code>git commit --interactive</code>, etc) don’t support using arrow keys at all: if you press arrow keys, you’ll just see <code>^[[D^[[D^[[C^[[C^</code></li>\n<li>many programs (like <code>irb</code>, <code>python3</code> on a Linux machine and many many more) use the <code>readline</code> library, which gives you a lot of basic functionality (history, arrow keys, etc)</li>\n<li>some programs (like <code>/usr/bin/python3</code> on my Mac) do support very basic features like arrow keys, but not other features like <code>Ctrl+left</code> or reverse searching with <code>Ctrl+R</code></li>\n<li>some programs (like the <code>fish</code> shell or <code>ipython3</code> or <code>micro</code> or <code>vim</code>) have their own fancy system for accepting input which is totally custom</li>\n</ol>\n<p>So there’s a lot of variation! 
Let’s talk about each of those a little more.</p>\n<h3 id=\"mode-1-the-baseline\">mode 1: the baseline</h3>\n<p>First, there’s “the baseline” – what happens if a program just accepts text by\ncalling <code>fgets()</code> or whatever and doing absolutely nothing else to provide a\nnicer experience. Here’s what using these tools typically looks like for me – if I\nstart the version of <a href=\"https://wiki.archlinux.org/title/Dash\">dash</a> installed on\nmy machine (a pretty minimal shell) and press the left arrow key, it just prints\n<code>^[[D</code> to the terminal.</p>\n<pre><code>$ ls l-^[[D^[[D^[[D\n</code></pre>\n<p>At first it doesn’t seem like all of these “baseline” tools have much in\ncommon, but there are actually a few features that you get for free just from\nyour terminal, without the program needing to do anything special at all.</p>\n<p>The things you get for free are:</p>\n<ol>\n<li>typing in text, obviously</li>\n<li>backspace</li>\n<li><code>Ctrl+W</code>, to delete the previous word</li>\n<li><code>Ctrl+U</code>, to delete the whole line</li>\n<li>a few other things unrelated to text editing (like <code>Ctrl+C</code> to interrupt the process, <code>Ctrl+Z</code> to suspend, etc)</li>\n</ol>\n<p>This is not <em>great</em>, but it means that if you want to delete a word you\ngenerally can do it with <code>Ctrl+W</code> instead of pressing backspace 15 times, even\nif you’re in an environment which is offering you absolutely zero features.</p>\n<p>You can get a list of all the ctrl codes that your terminal supports with <code>stty -a</code>.</p>\n<h3 id=\"mode-2-tools-that-use-readline\">mode 2: tools that use <code>readline</code></h3>\n<p>The next group is tools that use readline! 
Readline is a GNU library to make\nentering text more pleasant, and it’s very widely used.</p>\n<p>My favourite readline keyboard shortcuts are:</p>\n<ol>\n<li><code>Ctrl+E</code> (or <code>End</code>) to go to the end of the line</li>\n<li><code>Ctrl+A</code> (or <code>Home</code>) to go to the beginning of the line</li>\n<li><code>Ctrl+left/right arrow</code> to go back/forward 1 word</li>\n<li>up arrow to go back to the previous command</li>\n<li><code>Ctrl+R</code> to search your history</li>\n</ol>\n<p>And you can use <code>Ctrl+W</code> / <code>Ctrl+U</code> from the “baseline” list, though <code>Ctrl+U</code>\ndeletes from the cursor to the beginning of the line instead of deleting the\nwhole line. I think <code>Ctrl+W</code> might also have a slightly different definition of\nwhat a “word” is.</p>\n<p>There are a lot more (<a href=\"https://www.man7.org/linux/man-pages/man3/readline.3.html#EDITING_COMMANDS\">here’s a full list</a>), but those are the only ones that I personally use.</p>\n<p>The <code>bash</code> shell is probably the most famous readline user (when you use\n<code>Ctrl+R</code> to search your history in bash, that feature actually comes from\nreadline), but there are TONS of programs that use it – for example <code>psql</code>,\n<code>irb</code>, <code>python3</code>, etc.</p>\n<h3 id=\"tip-you-can-make-anything-use-readline-with-rlwrap\">tip: you can make ANYTHING use readline with <code>rlwrap</code></h3>\n<p>One of my absolute favourite things is that if you have a program like <code>nc</code>\nwithout readline support, you can just run <code>rlwrap nc</code> to turn it into a\nprogram with readline support!</p>\n<p>This is incredible and makes a lot of tools that are borderline unusable MUCH\nmore pleasant to use. 
You can even apparently set up <a href=\"https://github.com/hanslub42/rlwrap\">rlwrap</a> to include your own\ncustom autocompletions, though I’ve never tried that.</p>\n<h3 id=\"some-reasons-tools-might-not-use-readline\">some reasons tools might not use readline</h3>\n<p>I think reasons tools might not use readline might include:</p>\n<ul>\n<li>the program is very simple (like <code>cat</code> or <code>nc</code>) and maybe the maintainers don’t want to bring in a relatively large dependency</li>\n<li>license reasons, if the program’s license is not GPL-compatible – readline is GPL-licensed, not LGPL</li>\n<li>only a very small part of the program is interactive, and maybe readline\nsupport isn’t seen as important. For example <code>git</code> has a few interactive\nfeatures (like <code>git add -p</code>), but not very many, and usually you’re just\ntyping a single character like <code>y</code> or <code>n</code> – most of the time you need to really\ntype something significant in git, it’ll drop you into a text editor instead.</li>\n</ul>\n<p>For example idris2 says <a href=\"https://idris2.readthedocs.io/en/latest/tutorial/interactive.html#editing-at-the-repl\">they don’t use readline</a>\nto keep dependencies minimal and suggest using <code>rlwrap</code> to get better\ninteractive features.</p>\n<h3 id=\"how-to-know-if-you-re-using-readline\">how to know if you’re using readline</h3>\n<p>The simplest test I can think of is to press <code>Ctrl+R</code>, and if you see:</p>\n<pre><code>(reverse-i-search)`':\n</code></pre>\n<p>then you’re probably using readline. 
This obviously isn’t a guarantee (some\nother library could use the term <code>reverse-i-search</code> too!), but I don’t know of\nanother system that uses that specific term to refer to searching history.</p>\n<h3 id=\"the-readline-keybindings-come-from-emacs\">the readline keybindings come from Emacs</h3>\n<p>Because I’m a vim user, it took me a very long time to understand where these\nkeybindings come from (why <code>Ctrl+A</code> to go to the beginning of a line??? so\nweird!)</p>\n<p>My understanding is these keybindings actually come from Emacs – <code>Ctrl+A</code> and\n<code>Ctrl+E</code> do the same thing in Emacs as they do in Readline and I assume the\nother keyboard shortcuts mostly do as well, though I tried out <code>Ctrl+W</code> and\n<code>Ctrl+U</code> in Emacs and they don’t do the same thing as they do in the terminal\nso I guess there are some differences.</p>\n<p>There’s some more <a href=\"https://twobithistory.org/2019/08/22/readline.html\">history of the Readline project here</a>.</p>\n<h3 id=\"mode-3-another-input-library-like-libedit\">mode 3: another input library (like <code>libedit</code>)</h3>\n<p>On my Mac laptop, <code>/usr/bin/python3</code> is in a weird middle ground where it\nsupports <em>some</em> readline features (for example the arrow keys), but not the\nother ones. 
For example when I press <code>Ctrl+left arrow</code>, it prints out <code>;5D</code>,\nlike this:</p>\n<pre><code>$ python3\n>>> importt subprocess;5D\n</code></pre>\n<p>Folks on Mastodon helped me figure out that this is because in the default\nPython install on Mac OS, the Python <code>readline</code> module is actually backed by\n<code>libedit</code>, which is a similar library which has fewer features, presumably\nbecause Readline is <a href=\"https://en.wikipedia.org/wiki/GNU_Readline#Choice_of_the_GPL_as_GNU_Readline's_license\">GPL licensed</a>.</p>\n<p>Here’s how I was eventually able to figure out that Python was using libedit on\nmy system:</p>\n<pre><code>$ python3 -c \"import readline; print(readline.__doc__)\"\nImporting this module enables command line editing using libedit readline.\n</code></pre>\n<p>Generally Python uses readline though if you install it on Linux or through\nHomebrew. It’s just that the specific version that Apple includes on their\nsystems doesn’t have readline. Also <a href=\"https://docs.python.org/3.13/whatsnew/3.13.html#a-better-interactive-interpreter\">Python 3.13 is going to remove the readline dependency</a>\nin favour of a custom library, so “Python uses readline” won’t be true in the\nfuture.</p>\n<p>I assume that there are more programs on my Mac that use libedit but I haven’t\nlooked into it.</p>\n<h3 id=\"mode-4-something-custom\">mode 4: something custom</h3>\n<p>The last group of programs is programs that have their own custom (and sometimes\nmuch fancier!) system for editing text. This includes:</p>\n<ul>\n<li>most terminal text editors (nano, micro, vim, emacs, etc)</li>\n<li>some shells (like fish), for example it seems like fish supports <code>Ctrl+Z</code> for undo when typing in a command. 
Zsh’s line editor is called <a href=\"https://zsh.sourceforge.io/Guide/zshguide04.html\">zle</a>.</li>\n<li>some REPLs (like <code>ipython</code>), for example IPython uses the <a href=\"https://python-prompt-toolkit.readthedocs.io/\">prompt_toolkit</a> library instead of readline</li>\n<li>lots of other programs (like <code>atuin</code>)</li>\n</ul>\n<p>Some features you might see are:</p>\n<ul>\n<li>better autocomplete which is more customized to the tool</li>\n<li>nicer history management (for example with syntax highlighting) than the default you get from readline</li>\n<li>more keyboard shortcuts</li>\n</ul>\n<h3 id=\"custom-input-systems-are-often-readline-inspired\">custom input systems are often readline-inspired</h3>\n<p>I went looking at how <a href=\"https://atuin.sh/\">Atuin</a> (a wonderful tool for\nsearching your shell history that I started using recently) handles text input.\nLooking at <a href=\"https://github.com/atuinsh/atuin/blob/a67cfc82fe0dc907a01f07a0fd625701e062a33b/crates/atuin/src/command/client/search/interactive.rs#L382-L430\">the code</a>\nand some of the discussion around it, their implementation is custom but it’s\ninspired by readline, which makes sense to me – a lot of users are used to\nthose keybindings, and it’s convenient for them to work even though atuin\ndoesn’t use readline.</p>\n<p><a href=\"https://python-prompt-toolkit.readthedocs.io/\">prompt_toolkit</a> (the library\nIPython uses) is similar – it actually supports a lot of options (including\nvi-like keybindings), but the default is to support the readline-style\nkeybindings.</p>\n<p>This is like how you see a lot of programs which support very basic vim\nkeybindings (like <code>j</code> for down and <code>k</code> for up). 
For example Fastmail supports\n<code>j</code> and <code>k</code> even though most of its other keybindings don’t have much\nrelationship to vim.</p>\n<p>I assume that most “readline-inspired” custom input systems have various subtle\nincompatibilities with readline, but this doesn’t really bother me at all\npersonally because I’m extremely ignorant of most of readline’s features. I only use\nmaybe 5 keyboard shortcuts, so as long as they support the 5 basic commands I\nknow (which they always do!) I feel pretty comfortable. And usually these\ncustom systems have much better autocomplete than you’d get from just using\nreadline, so generally I prefer them over readline.</p>\n<h3 id=\"lots-of-shells-support-vi-keybindings\">lots of shells support vi keybindings</h3>\n<p>Bash, zsh, and fish all have a “vi mode” for entering text. In a\n<a href=\"https://social.jvns.ca/@b0rk/112723846172173621\">very unscientific poll</a> I ran on\nMastodon, 12% of people said they use it, so it seems pretty popular.</p>\n<p>Readline also has a “vi mode” (which is how Bash’s support for it works), so by\nextension lots of other programs have it too.</p>\n<p>I’ve always thought that vi mode seems really cool, but for some reason even\nthough I’m a vim user it’s never stuck for me.</p>\n<h3 id=\"understanding-what-situation-you-re-in-really-helps\">understanding what situation you’re in really helps</h3>\n<p>I’ve spent a lot of my life being confused about why a command line application\nI was using wasn’t behaving the way I wanted, and it feels good to be able to\nmore or less understand what’s going on.</p>\n<p>I think this is roughly my mental flowchart when I’m entering text at a command\nline prompt:</p>\n<ol>\n<li>Do the arrow keys not work? 
Probably there’s no input system at all, but at\nleast I can use <code>Ctrl+W</code> and <code>Ctrl+U</code>, and I can <code>rlwrap</code> the tool if I\nwant more features.</li>\n<li>Does <code>Ctrl+R</code> print <code>reverse-i-search</code>? Probably it’s readline, so I can use\nall of the readline shortcuts I’m used to, and I know I can get some basic\nhistory and press up arrow to get the previous command.</li>\n<li>Does <code>Ctrl+R</code> do something else? This is probably some custom input library:\nit’ll probably act more or less like readline, and I can check the\ndocumentation if I really want to know how it works.</li>\n</ol>\n<p>Being able to diagnose what’s going on like this makes the command line feel\nmore predictable and less chaotic.</p>\n<h3 id=\"some-things-this-post-left-out\">some things this post left out</h3>\n<p>There are lots more complications related to entering text that we didn’t talk\nabout at all here, like:</p>\n<ul>\n<li>issues related to ssh / tmux / etc</li>\n<li>the <code>TERM</code> environment variable</li>\n<li>how different terminals (gnome terminal, iTerm, xterm, etc) have different kinds of support for copying/pasting text</li>\n<li>unicode</li>\n<li>probably a lot more</li>\n</ul>",
      "image": null,
      "media": [],
      "authors": [
        {
          "name": "Julia Evans",
          "email": null,
          "url": null
        }
      ],
      "categories": []
    },
    {
      "id": "https://jvns.ca/blog/2024/07/03/reasons-to-use-job-control/",
      "title": "Reasons to use your shell's job control",
      "description": null,
      "url": "https://jvns.ca/blog/2024/07/03/reasons-to-use-job-control/",
      "published": null,
      "updated": "2024-07-03T08:00:20.000Z",
      "content": "<p>Hello! Today someone on Mastodon asked about job control (<code>fg</code>, <code>bg</code>, <code>Ctrl+z</code>,\n<code>wait</code>, etc). It made me think about how I don’t use my shell’s job\ncontrol interactively very often: usually I prefer to just open a new terminal\ntab if I want to run multiple terminal programs, or use tmux if it’s over ssh.\nBut I was curious about whether other people used job control more often than me.</p>\n<p>So I <a href=\"https://social.jvns.ca/@b0rk/112716835387523648\">asked on Mastodon</a> for\nreasons people use job control. There were a lot of great responses, and it\neven made me want to consider using job control a little more!</p>\n<p>In this post I’m only going to talk about using job control interactively (not\nin scripts) – the post is already long enough just talking about interactive\nuse.</p>\n<h3 id=\"what-s-job-control\">what’s job control?</h3>\n<p>First: what’s job control? Well – in a terminal, your processes can be in one of 3 states:</p>\n<ol>\n<li>in the <strong>foreground</strong>. This is the normal state when you start a process.</li>\n<li>in the <strong>background</strong>. This is what happens when you run <code>some_process &</code>: the process is still running, but you can’t interact with it anymore unless you bring it back to the foreground.</li>\n<li><strong>stopped</strong>. This is what happens when you start a process and then press <code>Ctrl+Z</code>. This pauses the process: it won’t keep using the CPU, but you can restart it if you want.</li>\n</ol>\n<p>“Job control” is a set of commands for seeing which processes are running in a terminal and moving processes between these 3 states</p>\n<h3 id=\"how-to-use-job-control\">how to use job control</h3>\n<ul>\n<li><code>fg</code> brings a process to the foreground. It works on both stopped processes and background processes. 
For example, if you start a background process with <code>cat < /dev/zero &</code>, you can bring it back to the foreground by running <code>fg</code></li>\n<li><code>bg</code> restarts a stopped process and puts it in the background.</li>\n<li>Pressing <code>Ctrl+z</code> stops the current foreground process.</li>\n<li><code>jobs</code> lists all processes that are active in your terminal</li>\n<li><code>kill</code> sends a signal (like <code>SIGKILL</code>) to a job (this is the shell builtin <code>kill</code>, not <code>/bin/kill</code>)</li>\n<li><code>disown</code> removes the job from the list of running jobs, so that it doesn’t get killed when you close the terminal</li>\n<li><code>wait</code> waits for all background processes to complete. I only use this in scripts though.</li>\n<li>apparently in bash/zsh you can also just type <code>%2</code> instead of <code>fg %2</code></li>\n</ul>\n<p>I might have forgotten some other job control commands but I think those are all the ones I’ve ever used.</p>\n<p>You can also give <code>fg</code> or <code>bg</code> a specific job to foreground/background. For example if I see this in the output of <code>jobs</code>:</p>\n<pre><code>$ jobs\nJob Group State   Command\n1   3161  running cat < /dev/zero &\n2   3264  stopped nvim -w ~/.vimkeys $argv\n</code></pre>\n<p>then I can foreground <code>nvim</code> with <code>fg %2</code>. You can also kill it with <code>kill -9 %2</code>, or just <code>kill %2</code> if you want to be more gentle.</p>\n<h3 id=\"how-is-kill-2-implemented\">how is <code>kill %2</code> implemented?</h3>\n<p>I was curious about how <code>kill %2</code> works – does <code>%2</code> just get replaced with the\nPID of the relevant process when you run the command, the way environment\nvariables are? 
Some quick experimentation shows that it isn’t:</p>\n<pre><code>$ echo kill %2\nkill %2\n$ type kill\nkill is a function with definition\n# Defined in /nix/store/vicfrai6lhnl8xw6azq5dzaizx56gw4m-fish-3.7.0/share/fish/config.fish\n</code></pre>\n<p>So <code>kill</code> is a fish builtin that knows how to interpret <code>%2</code>. Looking at\nthe source code (which is very easy in fish!), it uses <code>jobs -p %2</code> to expand <code>%2</code>\ninto a PID, and then runs the regular <code>kill</code> command.</p>\n<h3 id=\"on-differences-between-shells\">on differences between shells</h3>\n<p>Job control is implemented by your shell. I use fish, but my sense is that the\nbasics of job control work pretty similarly in bash, fish, and zsh.</p>\n<p>There are definitely some shells which don’t have job control at all, but I’ve\nonly used bash/fish/zsh so I don’t know much about that.</p>\n<p>Now let’s get into a few reasons people use job control!</p>\n<h3 id=\"reason-1-kill-a-command-that-s-not-responding-to-ctrl-c\">reason 1: kill a command that’s not responding to Ctrl+C</h3>\n<p>I run into processes that don’t respond to <code>Ctrl+C</code> pretty regularly, and it’s\nalways a little annoying – I usually switch terminal tabs to find and kill\nthe process. A bunch of people pointed out that you can do this in a faster way\nusing job control!</p>\n<p>How to do this: Press <code>Ctrl+Z</code>, then <code>kill %1</code> (or the appropriate job number\nif there’s more than one stopped/background job, which you can get from\n<code>jobs</code>). 
You can also <code>kill -9</code> if it’s really not responding.</p>\n<h3 id=\"reason-2-background-a-gui-app-so-it-s-not-using-up-a-terminal-tab\">reason 2: background a GUI app so it’s not using up a terminal tab</h3>\n<p>Sometimes I start a GUI program from the command line (for example with\n<code>wireshark some_file.pcap</code>), forget to start it in the background, and don’t want it eating up my terminal tab.</p>\n<p>How to do this:</p>\n<ul>\n<li>move the GUI program to the background by pressing <code>Ctrl+Z</code> and then running <code>bg</code>.</li>\n<li>you can also run <code>disown</code> to remove it from the list of jobs, to make sure that\nthe GUI program won’t get closed when you close your terminal tab.</li>\n</ul>\n<p>Personally I try to avoid starting GUI programs from the terminal if possible\nbecause I don’t like how their stdout pollutes my terminal (on a Mac I use\n<code>open -a Wireshark</code> instead because I find it works better) but sometimes you\ndon’t have another choice.</p>\n<h3 id=\"reason-2-5-accidentally-started-a-long-running-job-without-tmux\">reason 2.5: accidentally started a long-running job without <code>tmux</code></h3>\n<p>This is basically the same as the GUI app thing – you can move the job to the\nbackground and disown it.</p>\n<p>I was also curious about whether there are ways to redirect a process’s output to a\nfile after it’s already started. 
A quick search turned up <a href=\"https://github.com/jerome-pouiller/reredirect/\">this Linux-only tool</a> which is based on\n<a href=\"https://blog.nelhage.com/\">nelhage</a>’s <a href=\"https://github.com/nelhage/reptyr\">reptyr</a> (which lets you for example move a\nprocess that you started outside of tmux to tmux) but I haven’t tried either of\nthose.</p>\n<h3 id=\"reason-3-running-a-command-while-using-vim\">reason 3: running a command while using <code>vim</code></h3>\n<p>A lot of people mentioned that if they want to quickly test something while\nediting code in <code>vim</code> or another terminal editor, they like to use <code>Ctrl+Z</code>\nto stop vim, run the command, and then run <code>fg</code> to go back to their editor.</p>\n<p>You can also use this to check the output of a command that you ran before\nstarting <code>vim</code>.</p>\n<p>I’ve never gotten in the habit of this, probably because I mostly use a GUI\nversion of vim. I feel like I’d also be likely to switch terminal tabs and end\nup wondering “wait… where did I put my editor???” and have to go searching\nfor it.</p>\n<h3 id=\"reason-4-preferring-interleaved-output\">reason 4: preferring interleaved output</h3>\n<p>A few people said that they prefer the output of all of their commands being\ninterleaved in the terminal. This really surprised me because I usually think\nof having the output of lots of different commands interleaved as being a <em>bad</em>\nthing, but one person said that they like to do this with tcpdump specifically\nand I think that actually sounds extremely useful. Here’s what it looks like:</p>\n<pre><code># start tcpdump\n$ sudo tcpdump -ni any port 1234 &\ntcpdump: data link type PKTAP\ntcpdump: verbose output suppressed, use -v[v]... 
for full protocol decode\nlistening on any, link-type PKTAP (Apple DLT_PKTAP), snapshot length 524288 bytes\n\n# run curl\n$ curl google.com:1234\n13:13:29.881018 IP 192.168.1.173.49626 > 142.251.41.78.1234: Flags [S], seq 613574185, win 65535, options [mss 1460,nop,wscale 6,nop,nop,TS val 2730440518 ecr 0,sackOK,eol], length 0\n13:13:30.881963 IP 192.168.1.173.49626 > 142.251.41.78.1234: Flags [S], seq 613574185, win 65535, options [mss 1460,nop,wscale 6,nop,nop,TS val 2730441519 ecr 0,sackOK,eol], length 0\n13:13:31.882587 IP 192.168.1.173.49626 > 142.251.41.78.1234: Flags [S], seq 613574185, win 65535, options [mss 1460,nop,wscale 6,nop,nop,TS val 2730442520 ecr 0,sackOK,eol], length 0\n \n# when you're done, kill the tcpdump in the background\n$ kill %1 \n</code></pre>\n<p>I think it’s really nice here that you can see the output of tcpdump inline in\nyour terminal – when I’m using tcpdump I’m always switching back and forth and\nI always get confused trying to match up the timestamps, so keeping everything\nin one terminal seems like it might be a lot clearer. 
I’m going to try it.</p>\n<h3 id=\"reason-5-suspend-a-cpu-hungry-program\">reason 5: suspend a CPU-hungry program</h3>\n<p>One person said that sometimes they’re running a very CPU-intensive program,\nfor example converting a video with <code>ffmpeg</code>, and they need to use the CPU for\nsomething else, but don’t want to lose the work that ffmpeg already did.</p>\n<p>You can do this by pressing <code>Ctrl+Z</code> to pause the process, and then running <code>fg</code>\nwhen you want to start it again.</p>\n<h3 id=\"reason-6-you-accidentally-ran-ctrl-z\">reason 6: you accidentally ran Ctrl+Z</h3>\n<p>Many people replied that they didn’t use job control <em>intentionally</em>, but\nthat they sometimes accidentally ran Ctrl+Z, which stopped whatever program was\nrunning, so they needed to learn how to use <code>fg</code> to bring it back to the\nforeground.</p>\n<p>There were also some mentions of accidentally running <code>Ctrl+S</code> too (which stops\nyour terminal and I think can be undone with <code>Ctrl+Q</code>). 
My terminal totally\nignores <code>Ctrl+S</code> so I guess I’m safe from that one though.</p>\n<h3 id=\"reason-7-already-set-up-a-bunch-of-environment-variables\">reason 7: already set up a bunch of environment variables</h3>\n<p>Some folks mentioned that they already set up a bunch of environment variables\nthat they need to run various commands, so it’s easier to use job control to\nrun multiple commands in the same terminal than to redo that work in another\ntab.</p>\n<h3 id=\"reason-8-it-s-your-only-option\">reason 8: it’s your only option</h3>\n<p>Probably the most obvious reason to use job control to manage multiple\nprocesses is “because you have to” – maybe you’re in single-user mode, or on a\nvery restricted computer, or SSH’d into a machine that doesn’t have tmux or\nscreen and you don’t want to create multiple SSH sessions.</p>\n<h3 id=\"reason-9-some-people-just-like-it-better\">reason 9: some people just like it better</h3>\n<p>Some people also said that they just don’t like using terminal tabs: for\ninstance a few folks mentioned that they prefer to be able to see all of their\nterminals on the screen at the same time, so they’d rather have 4 terminals on\nthe screen and then use job control if they need to run more than 4 programs.</p>\n<h3 id=\"i-learned-a-few-new-tricks\">I learned a few new tricks!</h3>\n<p>I think my two main takeaways from this post are that I’ll probably try out job control a little more for:</p>\n<ol>\n<li>killing processes that don’t respond to Ctrl+C</li>\n<li>running <code>tcpdump</code> in the background with whatever network command I’m running, so I can see both of their output in the same place</li>\n</ol>",
      "image": null,
      "media": [],
      "authors": [
        {
          "name": "Julia Evans",
          "email": null,
          "url": null
        }
      ],
      "categories": []
    },
    {
      "id": "https://jvns.ca/blog/2024/04/25/new-zine--how-git-works-/",
      "title": "New zine: How Git Works!",
      "description": null,
      "url": "https://jvns.ca/blog/2024/04/25/new-zine--how-git-works-/",
      "published": null,
      "updated": "2024-06-03T09:45:11.000Z",
      "content": "<p>Hello! I’ve been writing about git on here nonstop for months, and the git zine\nis FINALLY done! It came out on Friday!</p>\n<p>You can get it for $12 here:\n<a href=\"https://wizardzines.com/zines/git\">https://wizardzines.com/zines/git</a>, or get\nan <a href=\"https://wizardzines.com/zines/all-the-zines/\">14-pack of all my zines here</a>.</p>\n<p>Here’s the cover:</p>\n<div align=\"center\">\n<a href=\"https://wizardzines.com/zines/git\">\n  <img width=\"600px\" src=\"https://wizardzines.com/zines/git/cover-small.jpg\">\n  </a>\n</div>\n<h3 id=\"the-table-of-contents\">the table of contents</h3>\n<p>Here’s the table of contents:</p>\n<a href=\"https://wizardzines.com/zines/git/toc.png\">\n  <img width=\"600px\" src=\"https://wizardzines.com/zines/git/toc.png\">\n</a>\n<h3 id=\"who-is-this-zine-for\">who is this zine for?</h3>\n<p>I wrote this zine for people who have been using git for years and are still\nafraid of it. As always – I think it sucks to be afraid of the tools that you\nuse in your work every day! I want folks to feel confident using git.</p>\n<p>My goals are:</p>\n<ul>\n<li>To explain how some parts of git that initially seem scary (like “detached\nHEAD state”) are pretty straightforward to deal with once you understand\nwhat’s going on</li>\n<li>To show some parts of git you probably <em>should</em> be careful around.  For\nexample, the stash is one of the places in git where it’s easiest to lose\nyour work in a way that’s incredibly annoying to recover form, and I avoid\nusing it heavily because of that.</li>\n<li>To clear up a few common misconceptions about how the core parts of git (like\ncommits, branches, and merging) work</li>\n</ul>\n<h3 id=\"what-s-the-difference-between-this-and-oh-shit-git\">what’s the difference between this and Oh Shit, Git!</h3>\n<p>You might be wondering – Julia! You already have a zine about git! What’s going\non? 
<a href=\"https://wizardzines.com/zines/oh-shit-git\">Oh Shit, Git!</a> is a set of tricks for fixing git messes. <a href=\"https://wizardzines.com/zines/git/\">“How Git Works”</a>\nexplains how Git <strong>actually</strong> works.</p>\n<p>Also, Oh Shit, Git! is the amazing <a href=\"https://sylormiller.com/\">Katie Sylor Miller</a>’s <a href=\"https://ohshitgit.com/\">concept</a>: we made it\ninto a zine because I was such a huge fan of her work on it.</p>\n<p>I think they go really well together.</p>\n<h3 id=\"what-s-so-confusing-about-git-anyway\">what’s so confusing about git, anyway?</h3>\n<p>This zine was really hard for me to write because when I started writing it,\nI’d been using git pretty confidently for 10 years. I had no real memory of\nwhat it was <em>like</em> to struggle with git.</p>\n<p>But thanks to a huge amount of help from <a href=\"https://marieflanagan.com/\">Marie</a> as\nwell as everyone who talked to me about git on Mastodon, eventually I was able\nto see that there are a lot of things about git that are counterintuitive,\nmisleading, or just plain confusing. 
These include:</p>\n<ul>\n<li><a href=\"https://jvns.ca/blog/2023/11/01/confusing-git-terminology/\">confusing terminology</a> (for example “fast-forward”, “reference”, or “remote-tracking branch”)</li>\n<li>misleading messages (for example how <code>Your branch is up to date with 'origin/main'</code> doesn’t necessarily mean that your branch is up to date with the <code>main</code> branch on the origin)</li>\n<li>uninformative output (for example how I <em>STILL</em> can’t reliably figure out which code comes from which branch when I’m looking at a merge conflict)</li>\n<li>a lack of guidance around handling diverged branches (for example how when you run <code>git pull</code> and your branch has diverged from the origin, it doesn’t give you great guidance on how to handle the situation)</li>\n<li>inconsistent behaviour (for example how git’s reflogs are almost always append-only, EXCEPT for the stash, where git will delete entries when you run <code>git stash drop</code>)</li>\n</ul>\n<p>The more I heard from people about how confusing they find git, the more it\nbecame clear that git really does not make it easy to figure out what its\ninternal logic is just by using it.</p>\n<h3 id=\"handling-git-s-weirdnesses-becomes-pretty-routine\">handling git’s weirdnesses becomes pretty routine</h3>\n<p>The previous section made git sound really bad, like “how can anyone possibly\nuse this thing?”.</p>\n<p>But my experience is that after I learned what git actually means by all of its\nweird error messages, dealing with it became pretty routine! I’ll see an\n<code>error: failed to push some refs to 'github.com:jvns/wizard-zines-site'</code>,\nrealize “oh right, probably a coworker made some changes to <code>main</code> since I last\nran <code>git pull</code>”, run <code>git pull --rebase</code> to incorporate their changes, and move\non with my day. 
The whole thing takes about 10 seconds.</p>\n<p>Or if I see a <code>You are in 'detached HEAD' state</code> warning, I’ll just make sure\nto run <code>git checkout mybranch</code> before continuing to write code. No big deal.</p>\n<p>For me (and for a lot of folks I talk to about git!), dealing with git’s weird\nlanguage can become so normal that you totally forget why anybody would even\nfind it weird.</p>\n<h3 id=\"a-little-bit-of-internals\">a little bit of internals</h3>\n<p>One of my biggest questions when writing this zine was how much to focus on\nwhat’s in the <code>.git</code> directory. We ended up deciding to include a couple of\npages about internals (“inside .git”, pages 14-15), but otherwise focus more on\ngit’s <em>behaviour</em> when you use it and why sometimes git behaves in unexpected\nways.</p>\n<p>This is partly because there are lots of great guides to git’s internals\nout there already (<a href=\"https://maryrosecook.com/blog/post/git-from-the-inside-out\">1</a>, <a href=\"https://shop.jcoglan.com/building-git/\">2</a>), and partly because I think even if you <em>have</em> read one\nof these guides to git’s internals, it isn’t totally obvious how to connect\nthat information to what you actually see in git’s user interface.</p>\n<p>For example: it’s easy to find documentation about remotes in git –\nfor example <a href=\"https://git-scm.com/book/en/v2/Git-Branching-Remote-Branches\">this page</a> says:</p>\n<blockquote>\n<p>Remote-tracking branches […] remind you where the branches in your remote\nrepositories were the last time you connected to them.</p>\n</blockquote>\n<p>But even if you’ve read that, you might not realize that the statement <code>Your branch is up to date with 'origin/main'</code> in <code>git status</code> doesn’t necessarily\nmean that you’re actually up to date with the remote <code>main</code> branch.</p>\n<p>So in general in the zine we focus on the behaviour you see in Git’s UI, and\nthen explain how that relates 
to what’s happening internally in Git.</p>\n<h3 id=\"the-cheat-sheet\">the cheat sheet</h3>\n<p>The zine also comes with a free printable cheat sheet: (click to get a PDF version)</p>\n<a href=\"https://wizardzines.com/git-cheat-sheet.pdf\">\n  <img width=\"600px\" src=\"https://wizardzines.com/images/cheat-sheet-smaller.png\">\n</a>\n<h3 id=\"it-comes-with-an-html-transcript\">it comes with an HTML transcript!</h3>\n<p>The zine also comes with an HTML transcript, to (hopefully) make it easier to\nread on a screen reader! Our Operations Manager, Lee, transcribed all of the\npages and wrote image descriptions. I’d love feedback about the experience of\nreading the zine on a screen reader if you try it.</p>\n<h3 id=\"i-really-do-love-git\">I really do love git</h3>\n<p>I’ve been pretty critical about git in this post, but I only write zines about\ntechnologies I love, and git is no exception.</p>\n<p>Some reasons I love git:</p>\n<ul>\n<li>it’s fast!</li>\n<li>it’s backwards compatible! I learned how to use it 10 years ago and\neverything I learned then is still true</li>\n<li>there’s tons of great free Git hosting available out there (GitHub! Gitlab! 
a\nmillion more!), so I can easily back up all my code</li>\n<li>simple workflows are REALLY simple (if I’m working on a project on my own, I\ncan just run <code>git commit -am 'whatever'</code> and <code>git push</code> over and over again and it\nworks perfectly)</li>\n<li>Almost every internal file in git is a pretty simple text file (or has a\nversion which is a text file), which makes me feel like I can always\nunderstand exactly what’s going on under the hood if I want to.</li>\n</ul>\n<p>I hope this zine helps some of you love it too.</p>\n<h3 id=\"people-who-helped-with-this-zine\">people who helped with this zine</h3>\n<p>I don’t make these zines by myself!</p>\n<p>I worked with <a href=\"https://marieflanagan.com/\">Marie Claire LeBlanc Flanagan</a> every\nmorning for 8 months to write clear explanations of git.</p>\n<p>The cover is by Vladimir Kašiković,\nGersande La Flèche did copy editing,\nJames Coglan (of the great <a href=\"https://shop.jcoglan.com/building-git/\">Building\nGit</a>) did technical review, our\nOperations Manager Lee did the transcription as well as a million other\nthings, my partner Kamal read the zine and told me which parts were off (as he\nalways does), and I had a million great conversations with Marco Rogers about\ngit.</p>\n<p>And finally, I want to thank all the beta readers! There were 66 this time\nwhich is a record! They left hundreds of comments about what was confusing,\nwhat they learned, and which of my jokes were funny. 
It’s always hard to hear\nfrom beta readers that a page I thought made sense is actually extremely\nconfusing, and fixing those problems before the final version makes the zine so\nmuch better.</p>\n<h3 id=\"get-the-zine\">get the zine</h3>\n<p>Here are some links to get the zine again:</p>\n<ul>\n<li>get <a href=\"https://wizardzines.com/zines/git\">How Git Works</a></li>\n<li>get a <a href=\"https://wizardzines.com/zines/all-the-zines/\">14-pack of all my zines here</a>.</li>\n</ul>\n<p>As always, you can get either a PDF version to print at home or a print version\nshipped to your house. The only caveat is print orders will ship in <strong>July</strong> – I\nneed to wait for orders to come in to get an idea of how many I should print\nbefore sending it to the printer.</p>\n<h3 id=\"thank-you\">thank you</h3>\n<p>As always: if you’ve bought zines in the past, thank you for all your support\nover the years. And thanks to all of you (1000+ people!!!) who have already\nbought the zine in the first 3 days. It’s already set a record for most zines\nsold in a single day and I’ve been really blown away.</p>",
      "image": null,
      "media": [],
      "authors": [
        {
          "name": "Julia Evans",
          "email": null,
          "url": null
        }
      ],
      "categories": []
    },
    {
      "id": "https://jvns.ca/blog/2024/04/10/notes-on-git-error-messages/",
      "title": "Notes on git's error messages",
      "description": null,
      "url": "https://jvns.ca/blog/2024/04/10/notes-on-git-error-messages/",
      "published": null,
      "updated": "2024-04-10T12:43:14.000Z",
      "content": "<p>While writing about Git, I’ve noticed that a lot of folks struggle with Git’s\nerror messages. I’ve had many years to get used to these error messages so it\ntook me a really long time to understand <em>why</em> folks were confused, but having\nthought about it much more, I’ve realized that:</p>\n<ol>\n<li>sometimes I actually <em>am</em> confused by the error messages, I’m just used to\nbeing confused</li>\n<li>I have a bunch of strategies for getting more information when the error\nmessage git gives me isn’t very informative</li>\n</ol>\n<p>So in this post, I’m going to go through a bunch of Git’s error messages,\nlist a few things that I think are confusing about them for each one, and talk\nabout what I do when I’m confused by the message.</p>\n<h3 id=\"improving-error-messages-isn-t-easy\">improving error messages isn’t easy</h3>\n<p>Before we start, I want to say that trying to think about why these error\nmessages are confusing has given me a lot of respect for how difficult\nmaintaining Git is. 
I’ve been thinking about Git for months, and for some of\nthese messages I really have no idea how to improve them.</p>\n<p>Some things that seem hard to me about improving error messages:</p>\n<ul>\n<li>if you come up with an idea for a new message, it’s hard to tell if it’s actually better!</li>\n<li>work like improving error messages often <a href=\"https://lwn.net/Articles/959768/\">isn’t funded</a></li>\n<li>the error messages have to be translated (git’s error messages are translated into <a href=\"https://github.com/git/git/tree/master/po\">19 languages</a>!)</li>\n</ul>\n<p>That said, if you find these messages confusing, hopefully some of these notes\nwill help clarify them a bit.</p>\n<style>\n.error {\n  color: #db322e;\n}\n.warning {\n  color: #765900;\n}\n.bg {\n  color: #fdf6e3\n}\npre {\n  background-color: #fdf6e3;\n  padding: 10px;\n  border-radius: 5px;\n  /* wrap long lines */\n  white-space: pre-wrap;\n}\n\nh2 a {\n  color: black;\n  text-decoration: none;\n}\n\narticle span {\n  padding: 0;\n}\n\narticle a:hover {\n  text-decoration: underline;\n}\n</style>\n<h2 id=\"git-push-on-a-diverged-branch\">\n  <a href=\"#git-push-on-a-diverged-branch\">\n  error: <code>git push</code> on a diverged branch\n  </a>\n</h2>\n<pre>\n$ git push\nTo github.com:jvns/int-exposed\n<span class=\"error\">! [rejected]        main -> main (non-fast-forward)</span>\n<span class=\"warning\">error: failed to push some refs to 'github.com:jvns/int-exposed'\nhint: Updates were rejected because the tip of your current branch is behind\nhint: its remote counterpart. 
Integrate the remote changes (e.g.\nhint: 'git pull ...') before pushing again.\nhint: See the 'Note about fast-forwards' in 'git push --help' for details.</span>\n\n$ git status\nOn branch main\nYour branch and 'origin/main' have diverged,\nand have 2 and 1 different commits each, respectively.\n</pre>\n<p>Some things I find confusing about this:</p>\n<ol>\n<li>You get the exact same error message whether the branch is just <strong>behind</strong>\nor the branch has <strong>diverged</strong>. There’s no way to tell which it is from this\nmessage: you need to run <code>git status</code> or <code>git pull</code> to find out.</li>\n<li>It says <code>failed to push some refs</code>, but it’s not totally clear <em>which</em> references it\nfailed to push. I believe everything that failed to push is listed with <code>! [rejected]</code> on the previous line– in this case just the <code>main</code> branch.</li>\n</ol>\n<p><strong>What I like to do if I’m confused:</strong></p>\n<ul>\n<li>I’ll run <code>git status</code> to figure out what the state of my current branch is.</li>\n<li>I think I almost never try to push more than one branch at a time, so I\nusually totally ignore git’s notes about which specific branch failed to push\n– I just assume that it’s my current branch</li>\n</ul>\n<h2 id=\"git-pull-on-a-diverged-branch\">\n  <a href=\"#git-pull-on-a-diverged-branch\">\n  error: <code>git pull</code> on a diverged branch\n  </a>\n</h2>\n<pre>\n$ git pull\n<span class=\"warning\">hint: You have divergent branches and need to specify how to reconcile them.\nhint: You can do so by running one of the following commands sometime before\nhint: your next pull:\nhint:\nhint:   git config pull.rebase false  # merge\nhint:   git config pull.rebase true   # rebase\nhint:   git config pull.ff only       # fast-forward only\nhint:\nhint: You can replace \"git config\" with \"git config --global\" to set a default\nhint: preference for all repositories. 
You can also pass --rebase, --no-rebase,\nhint: or --ff-only on the command line to override the configured default per\nhint: invocation.</span>\nfatal: Need to specify how to reconcile divergent branches.\n</pre>\n<p>The main thing I think is confusing here is that git is presenting you with a\nkind of overwhelming number of options: it’s saying that you can either:</p>\n<ol>\n<li>configure <code>pull.rebase false</code>, <code>pull.rebase true</code>, or <code>pull.ff only</code> locally</li>\n<li>or configure them globally</li>\n<li>or run <code>git pull --rebase</code> or <code>git pull --no-rebase</code></li>\n</ol>\n<p>It’s very hard to imagine how a beginner to git could easily use this hint to\nsort through all these options on their own.</p>\n<p>If I were explaining this to a friend, I’d say something like “you can use <code>git pull --rebase</code>\nor <code>git pull --no-rebase</code> to resolve this with a rebase or merge\n<em>right now</em>, and if you want to set a permanent preference, you can do that\nwith <code>git config pull.rebase false</code> or <code>git config pull.rebase true</code>”.</p>\n<p><code>git config pull.ff only</code> feels a little redundant to me because that’s git’s\ndefault behaviour anyway (though it wasn’t always).</p>\n<p><strong>What I like to do here:</strong></p>\n<ul>\n<li>run <code>git status</code> to see the state of my current branch</li>\n<li>maybe run <code>git log origin/main</code> or <code>git log</code> to see what the diverged commits are</li>\n<li>usually run <code>git pull --rebase</code> to resolve it</li>\n<li>sometimes I’ll run <code>git push --force</code> or <code>git reset --hard origin/main</code> if I\nwant to throw away my local work or remote work (for example because I\naccidentally committed to the wrong branch, or because I ran <code>git commit --amend</code> on a personal branch that only I’m using and want to force push)</li>\n</ul>\n<h2 id=\"git-checkout-asdf\">\n  <a 
href=\"#git-checkout-asdf\">\n  error: <code>git checkout asdf</code> (a branch that doesn't exist)\n  </a>\n</h2>\n<pre>\n$ git checkout asdf\nerror: pathspec 'asdf' did not match any file(s) known to git\n</pre>\n<p>This is a little weird because my intention was to check out a <strong>branch</strong>,\nbut <code>git checkout</code> is complaining about a <strong>path</strong> that doesn’t exist.</p>\n<p>This is happening because <code>git checkout</code>’s first argument can be either a\nbranch or a path, and git has no way of knowing which one you intended. This\nseems tricky to improve, but I might expect something like “No such branch,\ncommit, or path: asdf”.</p>\n<p><strong>What I like to do here:</strong></p>\n<ul>\n<li>in theory it would be good to use <code>git switch</code> instead, but I keep using <code>git checkout</code> anyway</li>\n<li>generally I just remember that I need to decode this as “branch <code>asdf</code> doesn’t exist”</li>\n</ul>\n<h2 id=\"git-switch-asdf\">\n  <a href=\"#git-switch-asdf\">\n  error: <code>git switch asdf</code> (a branch that doesn't exist)\n  </a>\n</h2>\n<pre>\n$ git switch asdf\nfatal: invalid reference: asdf\n</pre>\n<p><code>git switch</code> only accepts a branch as an argument (unless you pass <code>-d</code>), so why is it saying <code>invalid reference: asdf</code> instead of <code>invalid branch: asdf</code>?</p>\n<p>I think the reason is that internally, <code>git switch</code> is trying to be helpful in its error messages: if you run <code>git switch v0.1</code> to switch to a tag, it’ll say:</p>\n<pre><code>$ git switch v0.1\nfatal: a branch is expected, got tag 'v0.1'\n</code></pre>\n<p>So what git is trying to communicate with <code>fatal: invalid reference: asdf</code> is\n“<code>asdf</code> isn’t a branch, but it’s not a tag either, or any other reference”. 
From my various <a href=\"https://jvns.ca/blog/2024/03/28/git-poll-results/\">git polls</a> my impression is that\na lot of git users have literally no idea what a “reference” is in git, so I’m not sure if that’s coming across.</p>\n<p><strong>What I like to do here:</strong></p>\n<p>90% of the time when a git error message says <code>reference</code> I just mentally\nreplace it with <code>branch</code> in my head.</p>\n<h2 id=\"detached-head\">\n  error: <a href=\"#detached-head\"><code>git checkout HEAD^</code></a>\n</h2>\n<pre>$ git checkout HEAD^\nNote: switching to 'HEAD^'.\n\nYou are in 'detached HEAD' state. You can look around, make experimental\nchanges and commit them, and you can discard any commits you make in this\nstate without impacting any branches by switching back to a branch.\n\nIf you want to create a new branch to retain commits you create, you may\ndo so (now or later) by using -c with the switch command. Example:\n\n  git switch -c <new-branch-name>\n\nOr undo this operation with:\n\n  git switch -\n\nTurn off this advice by setting config variable advice.detachedHead to false\n\nHEAD is now at 182cd3f add \"swap byte order\" button\n</pre>\n<p>\nThis is a tough one. Definitely a lot of people are confused about this\nmessage, but obviously there's been a lot of effort to improve it too. 
I don't\nhave anything smart to say about this one.\n</p>\n<p><strong>What I like to do here:</strong></p>\n<ul>\n<li>my shell prompt tells me if I’m in detached HEAD state, and generally I can remember not to make new commits while in that state</li>\n<li>when I’m done looking at whatever old commits I wanted to look at, I’ll run <code>git checkout main</code> or something to go back to a branch</li>\n</ul>\n<h2 id=\"rebase-in-progress\">\n  <a href=\"#rebase-in-progress\">\n  message: <code>git status</code> when a rebase is in progress\n  </a>  \n</h2>\n<p>This isn’t an error message, but I still find it a little confusing on its own:</p>\n<pre>\n$ git status\n<span class=\"error\">interactive rebase in progress;</span> onto c694cf8\nLast command done (1 command done):\n   pick 0a9964d wip\nNo commands remaining.\nYou are currently rebasing branch 'main' on 'c694cf8'.\n  (fix conflicts and then run \"git rebase --continue\")\n  (use \"git rebase --skip\" to skip this patch)\n  (use \"git rebase --abort\" to check out the original branch)\n\nUnmerged paths:\n  (use \"git restore --staged <file>...\" to unstage)\n  (use \"git add <file>...\" to mark resolution)\n  <span class=\"error\">both modified:   index.html</span>\n\nno changes added to commit (use \"git add\" and/or \"git commit -a\")\n</pre>\n<p>Two things I think could be clearer here:</p>\n<ol>\n<li>I think it would be nice if <code>You are currently rebasing branch 'main' on 'c694cf8'.</code> were on the first line instead of the 5th line – right now the first line doesn’t say which branch you’re rebasing.</li>\n<li>In this case, <code>c694cf8</code> is actually <code>origin/main</code>, so I feel like <code>You are currently rebasing branch 'main' on 'origin/main'</code> might be even clearer.</li>\n</ol>\n<p><strong>What I like to do here:</strong></p>\n<p>My shell prompt includes the branch that I’m currently rebasing, so I rely on that instead of the output of <code>git status</code>.</p>\n<h2 
id=\"merge-deleted\">\n  <a href=\"#merge-deleted\">\n  error: <code>git rebase</code> when a file has been deleted\n  </a>\n</h2>\n<pre>\n$ git rebase main\nCONFLICT (modify/delete): index.html deleted in 0ce151e (wip) and modified in HEAD.  Version HEAD of index.html left in tree.\nerror: could not apply 0ce151e... wip\n</pre>\n<p>The thing I still find confusing about this is – <code>index.html</code> was modified in\n<code>HEAD</code>. But what is <code>HEAD</code>? Is it the commit I was working on when I started\nthe merge/rebase, or is it the commit from the other branch? (the answer is\n“<code>HEAD</code> is your branch if you’re doing a merge, and it’s the “other branch” if\nyou’re doing a rebase”, but I always find that hard to remember)</p>\n<p>I think I would personally find it easier to understand if the message listed the branch names if possible, something like this:</p>\n<pre><code>CONFLICT (modify/delete): index.html deleted on `main` and modified on `mybranch`\n</code></pre>\n<h2 id=\"merge-ours\">\n  <a href=\"#merge-ours\">\n  error: <code>git status</code> during a merge or rebase (who is \"them\"?)\n  </a>\n</h2>\n<pre>\n$ git status \nOn branch master\nYou have unmerged paths.\n  (fix conflicts and run \"git commit\")\n  (use \"git merge --abort\" to abort the merge)\n\nUnmerged paths:\n  (use \"git add/rm <file>...\" as appropriate to mark resolution)\n  deleted by them: the_file\n\nno changes added to commit (use \"git add\" and/or \"git commit -a\")\n</pre>\n<p>I find this one confusing in exactly the same way as the previous message: it\nsays <code>deleted by them:</code>, but what “them” refers to depends on whether you did a merge or rebase or cherry-pick.</p>\n<ul>\n<li>for a merge, <code>them</code> is the other branch you merged in</li>\n<li>for a rebase, <code>them</code> is the branch that you were on when you ran <code>git rebase</code></li>\n<li>for a cherry-pick, I guess it’s the commit you 
cherry-picked</li>\n</ul>\n<p><strong>What I like to do if I’m confused:</strong></p>\n<ul>\n<li>try to remember what I did</li>\n<li>run <code>git show main --stat</code> or something to see what I did on the <code>main</code> branch if I can’t remember</li>\n</ul>\n<h2 id=\"git clean\">\n  <a href=\"#git-clean\">\n  error: <code>git clean</code>\n  </a>\n</h2>\n<pre>\n$ git clean\nfatal: clean.requireForce defaults to true and neither -i, -n, nor -f given; refusing to clean\n</pre>\n<p>I just find it a bit confusing that you need to look up what <code>-i</code>, <code>-n</code> and\n<code>-f</code> are to be able to understand this error message. I’m personally way too\nlazy to do that so even though I’ve probably been using <code>git clean</code> for 10\nyears I still had no idea what <code>-i</code> stood for (<code>interactive</code>) until I was\nwriting this down.</p>\n<p><strong>What I like to do if I’m confused:</strong></p>\n<p>Usually I just chaotically run <code>git clean -f</code> to delete all my untracked files\nand hope for the best, though I might actually switch to <code>git clean -i</code>  now\nthat I know what <code>-i</code> stands for. Seems a lot safer.</p>\n<h3 id=\"that-s-all\">that’s all!</h3>\n<p>Hopefully some of this is helpful!</p>",
      "image": null,
      "media": [],
      "authors": [
        {
          "name": "Julia Evans",
          "email": null,
          "url": null
        }
      ],
      "categories": []
    }
  ]
}