RSS/Atom Feed Analysis

Analysis of https://fly.io/blog/feed.xml

Feed fetched in 61 ms.
Content type is text/xml.
Feed is 945,349 characters long.
Feed has an ETag of W/"6867019b-e72c2".
Feed has a last modified date of Thu, 03 Jul 2025 22:18:03 GMT.
Warning This feed does not have a stylesheet.
This appears to be an Atom feed.
Feed title: The Fly Blog
Error Feed self link does not match feed URL: https://fly.io/blog/.
Feed has 40 items.
First item published on 2025-06-20T00:00:00.000Z
Last item published on 2023-11-29T00:00:00.000Z
Home page URL: https://fly.io/blog/
Error Home page does not have any feed discovery link in the <head>.
Home page has a link to the feed in the <body>

Formatted XML

<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom" xmlns:media="http://search.yahoo.com/mrss/">
    <title>The Fly Blog</title>
    <subtitle>News, tips, and tricks from the team at Fly</subtitle>
    <id>https://fly.io/blog/</id>
    <link href="https://fly.io/blog/"/>
    <link href="https://fly.io/blog/" rel="self"/>
    <updated>2025-06-20T00:00:00+00:00</updated>
    <author>
        <name>Fly</name>
    </author>
    <entry>
        <title>Phoenix.new – The Remote AI Runtime for Phoenix</title>
        <link rel="alternate" href="https://fly.io/blog/phoenix-new-the-remote-ai-runtime/"/>
        <id>https://fly.io/blog/phoenix-new-the-remote-ai-runtime/</id>
        <published>2025-06-20T00:00:00+00:00</published>
        <updated>2025-07-03T17:13:29+00:00</updated>
        <media:thumbnail url="https://fly.io/blog/phoenix-new-the-remote-ai-runtime/assets/phoenixnew-thumb.png"/>
        <content type="html">&lt;div class="lead"&gt;&lt;p&gt;I’m Chris McCord, the creator of Elixir’s Phoenix framework. For the past several months, I’ve been working on a skunkworks project at Fly.io, and it’s time to show it off.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;I wanted LLM agents to work just as well with Elixir as they do with Python and JavaScript. Last December, in order to figure out what that was going to take, I started a little weekend project to find out how difficult it would be to build a coding agent in Elixir.&lt;/p&gt;

&lt;p&gt;A few weeks later, I had it spitting out working Phoenix applications and driving a full in-browser IDE. I knew this wasn&amp;rsquo;t going to stay a weekend project.&lt;/p&gt;

&lt;p&gt;If you follow me on Twitter, you&amp;rsquo;ve probably seen me teasing this work as it picked up steam. We&amp;rsquo;re at a point where we&amp;rsquo;re pretty serious about this thing, and so it&amp;rsquo;s time to make a formal introduction.&lt;/p&gt;

&lt;p&gt;World, meet &lt;a href='https://phoenix.new' title=''&gt;Phoenix.new&lt;/a&gt;, a batteries-included fully-online coding agent tailored to Elixir and Phoenix. I think it&amp;rsquo;s going to be the fastest way to build collaborative, real-time applications.&lt;/p&gt;

&lt;p&gt;Let&amp;rsquo;s see it in action:&lt;/p&gt;
&lt;div class="youtube-container" data-exclude-render&gt;
  &lt;div class="youtube-video"&gt;
    &lt;iframe
      width="100%"
      height="100%"
      src="https://www.youtube.com/embed/du7GmWGUM5Y"
      frameborder="0"
      allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture"
      allowfullscreen&gt;
    &lt;/iframe&gt;
  &lt;/div&gt;
&lt;/div&gt;

&lt;h2 id='whats-interesting-about-phoenix-new' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#whats-interesting-about-phoenix-new' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;What&amp;rsquo;s Interesting About Phoenix.new&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;First, even though it runs entirely in your browser, Phoenix.new gives both you and your agent a root shell, in an ephemeral virtual machine (a &lt;a href='https://fly.io/docs/machines/overview/' title=''&gt;Fly Machine&lt;/a&gt;) that gives our agent loop free rein to install things and run programs  — without any risk of messing up your local machine. You don&amp;rsquo;t think about any of this; you just open up the VSCode interface, push the shell button, and there you are, on the isolated machine you share with the Phoenix.new agent.&lt;/p&gt;

&lt;p&gt;Second, it&amp;rsquo;s an agent system I built specifically for Phoenix. Phoenix is about real-time collaborative applications, and Phoenix.new knows what that means. To that end, Phoenix.new includes, in both its UI and its agent tools, a full browser. The Phoenix.new agent uses that browser &amp;ldquo;headlessly&amp;rdquo; to check its own front-end changes and interact with the app. Because it&amp;rsquo;s a full browser, instead of trying to iterate on screenshots, the agent sees real page content and JavaScript state – with or without a human present.&lt;/p&gt;
&lt;h2 id='what-root-access-gets-us' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#what-root-access-gets-us' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;What Root Access Gets Us&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;Agents build software the way you did when you first got started, the way you still do today when you prototype things. They don&amp;rsquo;t carefully design Docker container layers and they don&amp;rsquo;t really do release cycles. An agent wants to pop a shell and get its fingernails dirty.&lt;/p&gt;

&lt;p&gt;A fully isolated virtual machine means Phoenix.new&amp;rsquo;s fingernails can get &lt;em&gt;arbitrarily dirty.&lt;/em&gt; If it wants to add a package to &lt;code&gt;mix.exs&lt;/code&gt;, it can do that and then run &lt;code&gt;mix phx.server&lt;/code&gt; or &lt;code&gt;mix test&lt;/code&gt; and check the output. Sure. Every agent can do that. But if it wants to add an APT package to the base operating system, it can do that too, and make sure it worked. It owns the whole environment.&lt;/p&gt;

&lt;p&gt;This offloads a huge amount of tedious, repetitive work.&lt;/p&gt;

&lt;p&gt;At his &lt;a href='https://youtu.be/LCEmiRjPEtQ?si=sR_bdu6-AqPXSNmY&amp;t=1902' title=''&gt;AI Startup School talk last week&lt;/a&gt;, Andrej Karpathy related his experience of building a restaurant menu visualizer, which takes camera pictures of text menus and transforms all the menu items into pictures. The code, which he vibe-coded with an LLM agent, was the easy part; he had it working in an afternoon. But getting the app online took him a whole week.&lt;/p&gt;

&lt;p&gt;With Phoenix.new, I&amp;rsquo;m taking dead aim at this problem. The apps we produce live in the cloud from the minute they launch. They have private, shareable URLs (we detect anything the agent generates with a bound port and give it a preview URL underneath &lt;code&gt;phx.run&lt;/code&gt;, with integrated port-forwarding), they integrate with Github, and they inherit all the infrastructure guardrails of Fly.io: hardware virtualization, WireGuard, and isolated networks.&lt;/p&gt;
&lt;div class="callout"&gt;&lt;p&gt;Github’s &lt;code&gt;gh&lt;/code&gt; CLI is installed by default. So the agent knows how to clone any repo, or browse issues, and you can even authorize it for internal repositories to get it working with your team’s existing projects and dependencies.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;Full control of the environment also closes the loop between the agent and deployment. When Phoenix.new boots an app, it watches the logs, and tests the application. When an action triggers an error, Phoenix.new notices and gets to work.&lt;/p&gt;
&lt;h2 id='watch-it-build-in-real-time' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#watch-it-build-in-real-time' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Watch It Build In Real Time&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;&lt;a href='https://phoenix.new' title=''&gt;Phoenix.new&lt;/a&gt; can interact with web applications the way users do: with a real browser.&lt;/p&gt;

&lt;p&gt;The Phoenix.new environment includes a headless Chrome browser that our agent knows how to drive. Prompt it to add a front-end feature to your application, and it won&amp;rsquo;t just sketch the code out and make sure it compiles and lints. It&amp;rsquo;ll pull the app up itself and poke at the UI, simultaneously looking at the page content, JavaScript state, and server-side logs.&lt;/p&gt;

&lt;p&gt;Phoenix is all about &lt;a href='https://fly.io/blog/how-we-got-to-liveview/' title=''&gt;&amp;ldquo;live&amp;rdquo; real-time&lt;/a&gt; interactivity, and gives us seamless live reload. The user interface for Phoenix.new itself includes a live preview of the app being worked on, so you can kick back and watch it build front-end features incrementally. Any other &lt;code&gt;.phx.run&lt;/code&gt; tabs you have open also update as it goes. It&amp;rsquo;s wild.&lt;/p&gt;
&lt;video title="agent interacting with web" autoplay="autoplay" loop="loop" muted="muted" playsinline="playsinline" disablePictureInPicture="true" class="mb-8" src="/blog/phoenix-new-the-remote-ai-runtime/assets/webjs.mp4"&gt;&lt;/video&gt;

&lt;h2 id='not-just-for-vibe-coding' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#not-just-for-vibe-coding' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Not Just For Vibe Coding&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;Phoenix.new can already build real, full-stack applications with WebSockets, Phoenix&amp;rsquo;s Presence features, and real databases. I&amp;rsquo;m seeing it succeed at business and collaborative applications right now.&lt;/p&gt;

&lt;p&gt;But there&amp;rsquo;s no fixed bound on the tasks you can reasonably ask it to accomplish. If you can do it with a shell and a browser, I want Phoenix.new to do it too. And it can do these tasks with or without you present.&lt;/p&gt;

&lt;p&gt;For example: set a &lt;code&gt;$DATABASE_URL&lt;/code&gt; and tell the agent about it. The agent knows enough to go explore it with &lt;code&gt;psql&lt;/code&gt;, and it&amp;rsquo;ll propose apps based on the schemas it finds. It can model Ecto schemas off the database. And if MySQL is your thing, the agent will just &lt;code&gt;apt install&lt;/code&gt; a MySQL client and go to town.&lt;/p&gt;

&lt;p&gt;Frontier model LLMs have vast world knowledge. They generalize extremely well. At ElixirConfEU, I did a &lt;a href='https://www.youtube.com/watch?v=ojL_VHc4gLk&amp;t=3923s' title=''&gt;demo vibe-coding Tetris&lt;/a&gt; on stage. Phoenix.new nailed it, first try, first prompt. It&amp;rsquo;s not like there&amp;rsquo;s gobs of Phoenix LiveView Tetris examples floating around the Internet! But lots of people have published Tetris code, and lots of people have written LiveView stuff, and 2025 LLMs can connect those dots.&lt;/p&gt;

&lt;p&gt;At this point you might be wondering – can I just ask it to build a Rails app? Or an Expo React Native app? Or Svelte? Or Go?&lt;/p&gt;

&lt;p&gt;Yes, you can.&lt;/p&gt;

&lt;p&gt;Our system prompt is tuned for Phoenix today, but all languages you care about are already installed. We&amp;rsquo;re still figuring out where to take this, but adding new languages and frameworks definitely ranks highly in my plans.&lt;/p&gt;
&lt;h2 id='our-async-agent-future' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#our-async-agent-future' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Our Async Agent Future&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;&lt;a href='https://fly.io/blog/youre-all-nuts/' title=''&gt;We&amp;rsquo;re at a massive step-change in developer workflows&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Agents can do real work, today, with or without a human present. Buckle up: the future of development, at least in the common case, probably looks less like cracking open a shell and finding a file to edit, and more like popping into a CI environment with agents working away around the clock.&lt;/p&gt;

&lt;p&gt;Local development isn&amp;rsquo;t going away. But there&amp;rsquo;s going to be a shift in where the majority of our iterations take place. I&amp;rsquo;m already using Phoenix.new to triage &lt;code&gt;phoenix-core&lt;/code&gt; Github issues and pick problems to solve. I close my laptop, grab a cup of coffee, and wait for a PR to arrive — Phoenix.new knows how PRs work, too. We&amp;rsquo;re already here, and this space is just getting started.&lt;/p&gt;

&lt;p&gt;This isn&amp;rsquo;t where I thought I&amp;rsquo;d end up when I started poking around. The Phoenix and LiveView journey was much the same. Something special was there and the projects took on a life of their own. I&amp;rsquo;m excited to share this work now, and see where it might take us. I can&amp;rsquo;t wait to see what folks build.&lt;/p&gt;</content>
    </entry>
    <entry>
        <title>What are MCP Servers?</title>
        <link rel="alternate" href="https://fly.io/blog/mcps-everywhere/"/>
        <id>https://fly.io/blog/mcps-everywhere/</id>
        <published>2025-06-12T00:00:00+00:00</published>
        <updated>2025-06-12T10:05:13+00:00</updated>
        <media:thumbnail url="https://fly.io/blog/mcps-everywhere/assets/mcps-everywhere.jpg"/>
        <content type="html">&lt;div class="lead"&gt;&lt;div&gt;&lt;p&gt;With Fly.io, &lt;a href="https://fly.io/docs/speedrun/" title=""&gt;you can get your app running globally in a matter of minutes&lt;/a&gt;, and with MCP servers you can integrate with Claude, VSCode, Cursor and &lt;a href="https://modelcontextprotocol.io/clients"&gt;many more AI clients&lt;/a&gt;.  &lt;a href="https://fly.io/docs/mcp/" title=""&gt;Try it out for yourself&lt;/a&gt;!&lt;/p&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The introduction to &lt;a href='https://modelcontextprotocol.io/introduction' title=''&gt;Model Context Protocol&lt;/a&gt; starts out with:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;MCP is an open protocol that standardizes how applications provide context to LLMs. Think of MCP like a USB-C port for AI applications. Just as USB-C provides a standardized way to connect your devices to various peripherals and accessories, MCP provides a standardized way to connect AI models to different data sources and tools.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That paragraph, to me, is both comforting (&amp;ldquo;USB for LLM&amp;rdquo;? Cool! Got it!), and simultaneously vacuous (Um, but what do I actually &lt;em&gt;do&lt;/em&gt; with this?).&lt;/p&gt;

&lt;p&gt;I&amp;rsquo;ve been digging deeper and have come up with a few more analogies and observations that make sense to me. Perhaps one or more of these will help you.&lt;/p&gt;
&lt;h2 id='mcps-are-alexa-skills' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#mcps-are-alexa-skills' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;MCPs are Alexa Skills&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;You buy an Echo Dot and a Hue light. You plug them both in and connect them to your Wi-Fi. There is one more step you need to do: you need to install and enable the Philips Hue skill to connect the two.&lt;/p&gt;

&lt;p&gt;Now you might be using Siri or Google Assistant. Or you may want to connect a Ring Doorbell camera or Google Nest Thermostat. But the principle is the same, though the analogy is slightly stronger with a skill (which is a noun) as opposed to the action of pairing your Hue Bridge with Apple HomeKit (a verb).&lt;/p&gt;
&lt;h2 id='mcps-are-api-2-0' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#mcps-are-api-2-0' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;MCPs are API 2.0&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;HTTP 1.1 is simple. You send a request, you get a response. While there are cookies and sessions, it is pretty much stateless and therefore inefficient, as each request needs to establish a new connection. WebSockets and Server-Sent Events (SSE) mitigate this a bit.&lt;/p&gt;

&lt;p&gt;&lt;a href='https://en.wikipedia.org/wiki/HTTP/2' title=''&gt;HTTP 2.0&lt;/a&gt; introduces features like multiplexing and server push. When you visit a web page for the first time, your browser can now request all of the associated JavaScript, CSS, and images at once, and the server can respond in any order.&lt;/p&gt;

&lt;p&gt;APIs today are typically request/response. MCPs support multiplexing and server push.&lt;/p&gt;
&lt;h2 id='mcps-are-apis-with-introspection-reflection' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#mcps-are-apis-with-introspection-reflection' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;MCPs are APIs with Introspection/Reflection&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;With &lt;a href='https://learn.openapis.org/' title=''&gt;OpenAPI&lt;/a&gt;, requests are typically JSON, and responses are too. Many OpenAPI providers publish a separate &lt;a href='https://learn.openapis.org/specification/structure.html' title=''&gt;OpenAPI Description (OAD)&lt;/a&gt;, which contains a schema describing what requests are supported by that API.&lt;/p&gt;

&lt;p&gt;With MCP, requests are also JSON and responses are also JSON, but in addition, there are standard requests built into the protocol to obtain useful information, including what tools are provided, what arguments each tool expects, and a prose description of how each tool is expected to be used.&lt;/p&gt;

&lt;p&gt;As an aside, don&amp;rsquo;t automatically assume that you will get good results from &lt;a href='https://neon.com/blog/autogenerating-mcp-servers-openai-schemas' title=''&gt;auto-generating MCP Servers from OpenAPI schemas&lt;/a&gt;:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Effective MCP design means thinking about the workflows you want the agent to perform and potentially creating higher-level tools that encapsulate multiple API calls, rather than just exposing the raw building blocks of your entire API spec.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href='https://glama.ai/blog/2025-06-06-mcp-vs-api#five-fundamental-differences' title=''&gt;MCP vs API&lt;/a&gt; goes into this topic at greater debth.&lt;/p&gt;

&lt;p&gt;In many cases you will get better results treating LLMs as humans. If you have a CLI, consider using that as the starting point instead.&lt;/p&gt;
&lt;h2 id='mcps-are-not-serverless' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#mcps-are-not-serverless' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;MCPs are &lt;strong&gt;&lt;i&gt;not&lt;/i&gt;&lt;/strong&gt; serverless&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;&lt;a href='https://en.wikipedia.org/wiki/Serverless_computing' title=''&gt;Serverless&lt;/a&gt;, sometimes known as &lt;a href='https://en.wikipedia.org/wiki/Function_as_a_service' title=''&gt;FaaS&lt;/a&gt;, is a popular cloud computing model, where AWS Lambda is widely recognized as the archetype.&lt;/p&gt;

&lt;p&gt;MCP servers are not serverless; they have a well-defined and long-lived &lt;a href='https://modelcontextprotocol.io/specification/2025-03-26/basic/lifecycle' title=''&gt;lifecycle&lt;/a&gt;:&lt;/p&gt;

&lt;p&gt;&lt;svg aria-roledescription="sequence" role="graphics-document document" viewBox="-50 -10 482 651" style="max-width: 482px;" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns="http://www.w3.org/2000/svg" width="100%" id="rm"&gt;&lt;rect class="rect" height="70" width="302" fill="rgb(200, 220, 250)" y="325" x="40"&gt;&lt;/rect&gt;&lt;g&gt;&lt;rect class="actor actor-bottom" ry="3" rx="3" name="Server" height="65" width="150" stroke="#666" fill="#eaeaea" y="565" x="232"&gt;&lt;/rect&gt;&lt;text class="actor actor-box" alignment-baseline="central" dominant-baseline="central" style="text-anchor: middle; font-size: 16px; font-weight: 400; font-family: inherit;" y="597.5" x="307"&gt;&lt;tspan dy="0" x="307"&gt;Server&lt;/tspan&gt;&lt;/text&gt;&lt;/g&gt;&lt;g&gt;&lt;rect class="actor actor-bottom" ry="3" rx="3" name="Client" height="65" width="150" stroke="#666" fill="#eaeaea" y="565" x="0"&gt;&lt;/rect&gt;&lt;text class="actor actor-box" alignment-baseline="central" dominant-baseline="central" style="text-anchor: middle; font-size: 16px; font-weight: 400; font-family: inherit;" y="597.5" x="75"&gt;&lt;tspan dy="0" x="75"&gt;Client&lt;/tspan&gt;&lt;/text&gt;&lt;/g&gt;&lt;g&gt;&lt;line name="Server" stroke="#999" stroke-width="0.5px" class="actor-line 200" y2="565" x2="307" y1="65" x1="307" id="actor10"&gt;&lt;/line&gt;&lt;g id="root-10"&gt;&lt;rect class="actor actor-top" ry="3" rx="3" name="Server" height="65" width="150" stroke="#666" fill="#eaeaea" y="0" x="232"&gt;&lt;/rect&gt;&lt;text class="actor actor-box" alignment-baseline="central" dominant-baseline="central" style="text-anchor: middle; font-size: 16px; font-weight: 400; font-family: inherit;" y="32.5" x="307"&gt;&lt;tspan dy="0" x="307"&gt;Server&lt;/tspan&gt;&lt;/text&gt;&lt;/g&gt;&lt;/g&gt;&lt;g&gt;&lt;line name="Client" stroke="#999" stroke-width="0.5px" class="actor-line 200" y2="565" x2="75" y1="65" x1="75" id="actor9"&gt;&lt;/line&gt;&lt;g id="root-9"&gt;&lt;rect class="actor actor-top" ry="3" rx="3" name="Client" height="65" width="150" stroke="#666" fill="#eaeaea" y="0" x="0"&gt;&lt;/rect&gt;&lt;text class="actor actor-box" alignment-baseline="central" dominant-baseline="central" style="text-anchor: middle; font-size: 16px; font-weight: 400; font-family: inherit;" y="32.5" x="75"&gt;&lt;tspan dy="0" x="75"&gt;Client&lt;/tspan&gt;&lt;/text&gt;&lt;/g&gt;&lt;/g&gt;&lt;style&gt;#rm{font-family:inherit;font-size:16px;fill:#333;}#rm .error-icon{fill:#552222;}#rm .error-text{fill:#552222;stroke:#552222;}#rm .edge-thickness-normal{stroke-width:1px;}#rm .edge-thickness-thick{stroke-width:3.5px;}#rm .edge-pattern-solid{stroke-dasharray:0;}#rm .edge-thickness-invisible{stroke-width:0;fill:none;}#rm .edge-pattern-dashed{stroke-dasharray:3;}#rm .edge-pattern-dotted{stroke-dasharray:2;}#rm .marker{fill:#333333;stroke:#333333;}#rm .marker.cross{stroke:#333333;}#rm svg{font-family:inherit;font-size:16px;}#rm p{margin:0;}#rm .actor{stroke:hsl(259.6261682243, 59.7765363128%, 87.9019607843%);fill:#ECECFF;}#rm text.actor&amp;gt;tspan{fill:black;stroke:none;}#rm .actor-line{stroke:hsl(259.6261682243, 59.7765363128%, 87.9019607843%);}#rm .messageLine0{stroke-width:1.5;stroke-dasharray:none;stroke:#333;}#rm .messageLine1{stroke-width:1.5;stroke-dasharray:2,2;stroke:#333;}#rm #arrowhead path{fill:#333;stroke:#333;}#rm .sequenceNumber{fill:white;}#rm #sequencenumber{fill:#333;}#rm #crosshead path{fill:#333;stroke:#333;}#rm .messageText{fill:#333;stroke:none;}#rm .labelBox{stroke:hsl(259.6261682243, 59.7765363128%, 87.9019607843%);fill:#ECECFF;}#rm .labelText,#rm .labelText&amp;gt;tspan{fill:black;stroke:none;}#rm .loopText,#rm .loopText&amp;gt;tspan{fill:black;stroke:none;}#rm .loopLine{stroke-width:2px;stroke-dasharray:2,2;stroke:hsl(259.6261682243, 59.7765363128%, 87.9019607843%);fill:hsl(259.6261682243, 59.7765363128%, 87.9019607843%);}#rm .note{stroke:#aaaa33;fill:#fff5ad;}#rm .noteText,#rm .noteText&amp;gt;tspan{fill:black;stroke:none;}#rm .activation0{fill:#f4f4f4;stroke:#666;}#rm .activation1{fill:#f4f4f4;stroke:#666;}#rm .activation2{fill:#f4f4f4;stroke:#666;}#rm .actorPopupMenu{position:absolute;}#rm .actorPopupMenuPanel{position:absolute;fill:#ECECFF;box-shadow:0px 8px 16px 0px rgba(0,0,0,0.2);filter:drop-shadow(3px 5px 2px rgb(0 0 0 / 0.4));}#rm .actor-man line{stroke:hsl(259.6261682243, 59.7765363128%, 87.9019607843%);fill:#ECECFF;}#rm .actor-man circle,#rm line{stroke:hsl(259.6261682243, 59.7765363128%, 87.9019607843%);fill:#ECECFF;stroke-width:2px;}#rm :root{--mermaid-font-family:inherit;}&lt;/style&gt;&lt;g&gt;&lt;/g&gt;&lt;defs&gt;&lt;symbol height="24" width="24" id="computer"&gt;&lt;path d="M2 2v13h20v-13h-20zm18 11h-16v-9h16v9zm-10.228 6l.466-1h3.524l.467 1h-4.457zm14.228 3h-24l2-6h2.104l-1.33 4h18.45l-1.297-4h2.073l2 6zm-5-10h-14v-7h14v7z" transform="scale(.5)"&gt;&lt;/path&gt;&lt;/symbol&gt;&lt;/defs&gt;&lt;defs&gt;&lt;symbol clip-rule="evenodd" fill-rule="evenodd" id="database"&gt;&lt;path d="M12.258.001l.256.004.255.005.253.008.251.01.249.012.247.015.246.016.242.019.241.02.239.023.236.024.233.027.231.028.229.031.225.032.223.034.22.036.217.038.214.04.211.041.208.043.205.045.201.046.198.048.194.05.191.051.187.053.183.054.18.056.175.057.172.059.168.06.163.061.16.063.155.064.15.066.074.033.073.033.071.034.07.034.069.035.068.035.067.035.066.035.064.036.064.036.062.036.06.036.06.037.058.037.058.037.055.038.055.038.053.038.052.038.051.039.05.039.048.039.047.039.045.04.044.04.043.04.041.04.04.041.039.041.037.041.036.041.034.041.033.042.032.042.03.042.029.042.027.042.026.043.024.043.023.043.021.043.02.043.018.044.017.043.015.044.013.044.012.044.011.045.009.044.007.045.006.045.004.045.002.045.001.045v17l-.001.045-.002.045-.004.045-.006.045-.007.045-.009.044-.011.045-.012.044-.013.044-.015.044-.017.043-.018.044-.02.043-.021.043-.023.043-.024.043-.026.043-.027.042-.029.042-.03.042-.032.042-.033.042-.034.041-.036.041-.037.041-.039.041-.04.041-.041.04-.043.04-.044.04-.045.04-.047.039-.048.039-.05.039-.051.039-.052.038-.053.038-.055.038-.055.038-.058.037-.058.037-.06.037-.06.036-.062.036-.064.036-.064.036-.066.035-.067.035-.068.035-.069.035-.07.034-.071.034-.073.033-.074.033-.15.066-.155.064-.16.063-.163.061-.168.06-.172.059-.175.057-.18.056-.183.054-.187.053-.191.051-.194.05-.198.048-.201.046-.205.045-.208.043-.211.041-.214.04-.217.038-.22.036-.223.034-.225.032-.229.031-.231.028-.233.027-.236.024-.239.023-.241.02-.242.019-.246.016-.247.015-.249.012-.251.01-.253.008-.255.005-.256.004-.258.001-.258-.001-.256-.004-.255-.005-.253-.008-.251-.01-.249-.012-.247-.015-.245-.016-.243-.019-.241-.02-.238-.023-.236-.024-.234-.027-.231-.028-.228-.031-.226-.032-.223-.034-.22-.036-.217-.038-.214-.04-.211-.041-.208-.043-.204-.045-.201-.046-.198-.048-.195-.05-.19-.051-.187-.053-.184-.054-.179-.056-.176-.057-.172-.059-.167-.06-.164-.061-.159-.063-.155-.064-.151-.066-.074-.033-.072-.033-.072-.034-.07-.034-.069-.035-.068-.035-.067-.035-.066-.035-.064-.036-.063-.036-.062-.036-.061-.036-.06-.037-.058-.037-.057-.037-.056-.038-.055-.038-.053-.038-.052-.038-.051-.039-.049-.039-.049-.039-.046-.039-.046-.04-.044-.04-.043-.04-.041-.04-.04-.041-.039-.041-.037-.041-.036-.041-.034-.041-.033-.042-.032-.042-.03-.042-.029-.042-.027-.042-.026-.043-.024-.043-.023-.043-.021-.043-.02-.043-.018-.044-.017-.043-.015-.044-.013-.044-.012-.044-.011-.045-.009-.044-.007-.045-.006-.045-.004-.045-.002-.045-.001-.045v-17l.001-.045.002-.045.004-.045.006-.045.007-.045.009-.044.011-.045.012-.044.013-.044.015-.044.017-.043.018-.044.02-.043.021-.043.023-.043.024-.043.026-.043.027-.042.029-.042.03-.042.032-.042.033-.042.034-.041.036-.041.037-.041.039-.041.04-.041.041-.04.043-.04.044-.04.046-.04.046-.039.049-.039.049-.039.051-.039.052-.038.053-.038.055-.038.056-.038.057-.037.058-.037.06-.037.061-.036.062-.036.063-.036.064-.036.066-.035.067-.035.068-.035.069-.035.07-.034.072-.034.072-.033.074-.033.151-.066.155-.064.159-.063.164-.061.167-.06.172-.059.176-.057.179-.056.184-.054.187-.053.19-.051.195-.05.198-.048.201-.046.204-.045.208-.043.211-.041.214-.04.217-.038.22-.036.223-.034.226-.032.228-.031.231-.028.234-.027.236-.024.238-.023.241-.02.243-.019.245-.016.247-.015.249-.012.251-.01.253-.008.255-.005.256-.004.258-.001.258.001zm-9.258 20.499v.01l.001.021.003.021.004.022.005.021.006.022.007.022.009.023.01.022.011.023.012.023.013.023.015.023.016.024.017.023.018.024.019.024.021.024.022.025.023.024.024.025.052.049.056.05.061.051.066.051.07.051.075.051.079.052.084.052.088.052.092.052.097.052.102.051.105.052.11.052.114.051.119.051.123.051.127.05.131.05.135.05.139.048.144.049.147.047.152.047.155.047.16.045.163.045.167.043.171.043.176.041.178.041.183.039.187.039.19.037.194.035.197.035.202.033.204.031.209.03.212.029.216.027.219.025.222.024.226.021.23.02.233.018.236.016.24.015.243.012.246.01.249.008.253.005.256.004.259.001.26-.001.257-.004.254-.005.25-.008.247-.011.244-.012.241-.014.237-.016.233-.018.231-.021.226-.021.224-.024.22-.026.216-.027.212-.028.21-.031.205-.031.202-.034.198-.034.194-.036.191-.037.187-.039.183-.04.179-.04.175-.042.172-.043.168-.044.163-.045.16-.046.155-.046.152-.047.148-.048.143-.049.139-.049.136-.05.131-.05.126-.05.123-.051.118-.052.114-.051.11-.052.106-.052.101-.052.096-.052.092-.052.088-.053.083-.051.079-.052.074-.052.07-.051.065-.051.06-.051.056-.05.051-.05.023-.024.023-.025.021-.024.02-.024.019-.024.018-.024.017-.024.015-.023.014-.024.013-.023.012-.023.01-.023.01-.022.008-.022.006-.022.006-.022.004-.022.004-.021.001-.021.001-.021v-4.127l-.077.055-.08.053-.083.054-.085.053-.087.052-.09.052-.093.051-.095.05-.097.05-.1.049-.102.049-.105.048-.106.047-.109.047-.111.046-.114.045-.115.045-.118.044-.12.043-.122.042-.124.042-.126.041-.128.04-.13.04-.132.038-.134.038-.135.037-.138.037-.139.035-.142.035-.143.034-.144.033-.147.032-.148.031-.15.03-.151.03-.153.029-.154.027-.156.027-.158.026-.159.025-.161.024-.162.023-.163.022-.165.021-.166.02-.167.019-.169.018-.169.017-.171.016-.173.015-.173.014-.175.013-.175.012-.177.011-.178.01-.179.008-.179.008-.181.006-.182.005-.182.004-.184.003-.184.002h-.37l-.184-.002-.184-.003-.182-.004-.182-.005-.181-.006-.179-.008-.179-.008-.178-.01-.176-.011-.176-.012-.175-.013-.173-.014-.172-.015-.171-.016-.17-.017-.169-.018-.167-.019-.166-.02-.165-.021-.163-.022-.162-.023-.161-.024-.159-.025-.157-.026-.156-.027-.155-.027-.153-.029-.151-.03-.15-.03-.148-.031-.146-.032-.145-.033-.143-.034-.141-.035-.14-.035-.137-.037-.136-.037-.134-.038-.132-.038-.13-.04-.128-.04-.126-.041-.124-.042-.122-.042-.12-.044-.117-.043-.116-.045-.113-.045-.112-.046-.109-.047-.106-.047-.105-.048-.102-.049-.1-.049-.097-.05-.095-.05-.093-.052-.09-.051-.087-.052-.085-.053-.083-.054-.08-.054-.077-.054v4.127zm0-5.654v.011l.001.021.003.021.004.021.005.022.006.022.007.022.009.022.01.022.011.023.012.023.013.023.015.024.016.023.017.024.018.024.019.024.021.024.022.024.023.025.024.024.052.05.056.05.061.05.066.051.07.051.075.052.079.051.084.052.088.052.092.052.097.052.102.052.105.052.11.051.114.051.119.052.123.05.127.051.131.05.135.049.139.049.144.048.147.048.152.047.155.046.16.045.163.045.167.044.171.042.176.042.178.04.183.04.187.038.19.037.194.036.197.034.202.033.204.032.209.03.212.028.216.027.219.025.222.024.226.022.23.02.233.018.236.016.24.014.243.012.246.01.249.008.253.006.256.003.259.001.26-.001.257-.003.254-.006.25-.008.247-.01.244-.012.241-.015.237-.016.233-.018.231-.02.226-.022.224-.024.22-.025.216-.027.212-.029.21-.03.205-.032.202-.033.198-.035.194-.036.191-.037.187-.039.183-.039.179-.041.175-.042.172-.043.168-.044.163-.045.16-.045.155-.047.152-.047.148-.048.143-.048.139-.05.136-.049.131-.05.126-.051.123-.051.118-.051.114-.052.11-.052.106-.052.101-.052.096-.052.092-.052.088-.052.083-.052.079-.052.074-.051.07-.052.065-.051.06-.05.056-.051.051-.049.023-.025.023-.024.021-.025.02-.024.019-.024.018-.024.017-.024.015-.023.014-.023.013-.024.012-.022.01-.023.01-.023.008-.022.006-.022.006-.022.004-.021.004-.022.001-.021.001-.021v-4.139l-.077.054-.08.054-.083.054-.085.052-.087.053-.09.051-.093.051-.095.051-.097.05-.1.049-.102.049-.105.048-.106.047-.109.047-.111.046-.114.045-.115.044-.118.044-.12.044-.122.042-.124.042-.126.041-.128.04-.13.039-.132.039-.134.038-.135.037-.138.036-.139.036-.142.035-.143.033-.144.033-.147.033-.148.031-.15.03-.151.03-.153.028-.154.028-.156.027-.158.026-.159.025-.161.024-.162.023-.163.022-.165.021-.166.02-.167.019-.169.018-.169.017-.171.016-.173.015-.173.014-.175.013-.175.012-.177.011-.178.009-.179.009-.179.007-.181.007-.182.005-.182.004-.184.003-.184.002h-.37l-.184-.002-.184-.003-.182-.004-.182-.005-.181-.007-.179-.007-.179-.009-.178-.009-.176-.011-.176-.012-.175-.013-.173-.014-.172-.015-.171-.016-.17-.017-.169-.018-.167-.019-.166-.02-.165-.021-.163-.022-.162-.023-.161-.024-.159-.025-.157-.026-.156-.027-.155-.028-.153-.028-.151-.03-.15-.03-.148-.031-.146-.033-.145-.033-.143-.033-.141-.035-.14-.036-.137-.036-.136-.037-.134-.038-.132-.039-.13-.039-.128-.04-.126-.041-.124-.042-.122-.043-.12-.043-.117-.044-.116-.044-.113-.046-.112-.046-.109-.046-.106-.047-.105-.048-.102-.049-.1-.049-.097-.05-.095-.051-.093-.051-.09-.051-.087-.053-.085-.052-.083-.054-.08-.054-.077-.054v4.139zm0-5.666v.011l.001.02.003.022.004.021.005.022.006.021.007.022.009.023.01.022.011.023.012.023.013.023.015.023.016.024.017.024.018.023.019.024.021.025.022.024.023.024.024.025.052.05.056.05.061.05.066.051.07.051.075.052.079.051.084.052.088.052.092.052.097.052.102.052.105.051.11.052.114.051.119.051.123.051.127.05.131.05.135.05.139.049.144.048.147.048.152.047.155.046.16.045.163.045.167.043.171.043.176.042.178.04.183.04.187.038.19.037.194.036.197.034.202.033.204.032.209.03.212.028.216.027.219.025.222.024.226.021.23.02.233.018.236.017.24.014.243.012.246.01.249.008.253.006.256.003.259.001.26-.001.257-.003.254-.006.25-.008.247-.01.244-.013.241-.014.237-.016.233-.018.231-.02.226-.022.224-.024.22-.025.216-.027.212-.029.21-.03.205-.032.202-.033.198-.035.194-.036.191-.037.187-.039.183-.039.179-.041.175-.042.172-.043.168-.044.163-.045.16-.045.155-.047.152-.047.148-.048.143-.049.139-.049.136-.049.131-.051.126-.05.123-.051.118-.052.114-.051.11-.052.106-.052.101-.052.096-.052.092-.052.088-.052.083-.052.079-.052.074-.052.07-.051.065-.051.06-.051.056-.05.051-.049.023-.025.023-.025.021-.024.02-.024.019-.024.018-.024.017-.024.015-.023.014-.024.013-.023.012-.023.01-.022.01-.023.008-.022.006-.022.006-.022.004-.022.004-.021.001-.021.001-.021v-4.153l-.077.054-.08.054-.083.053-.085.053-.087.053-.09.051-.093.051-.095.051-.097.05-.1.049-.102.048-.105.048-.106.048-.109.046-.111.046-.114.046-.115.044-.118.044-.12.043-.122.043-.124.042-.126.041-.128.04-.13.039-.132.039-.134.038-.135.037-.138.036-.139.036-.142.034-.143.034-.144.033-.147.032-.148.032-.15.03-.151.03-.153.028-.154.028-.156.027-.158.026-.159.024-.161.024-.162.023-.163.023-.165.021-.166.02-.167.019-.169.018-.169.017-.171.016-.173.015-.173.014-.175.013-.175.012-.177.01-.178.01-.179.009-.179.007-.181.006-.182.006-.182.004-.184.003-.184.001-.185.001-.185-.001-.184-.001-.184-.003-.182-.004-.182-.006-.181-.006-.179-.007-.179-.009-.178-.01-.176-.01-.176-.012-.175-.013-.173-.014-.172-.015-.171-.016-.17-.017-.169-.018-.167-.019-.166-.02-.165-.021-.163-.023-.162-.023-.161-.024-.159-.024-.157-.026-.156-.027-.155-.028-.153-.028-.151-.03-.15-.03-.148-.032-.146-.032-.145-.033-.143-.034-.141-.034-.14-.036-.137-.036-.136-.037-.134-.038-.132-.039-.13-.039-.128-.041-.126-.041-.124-.041-.122-.043-.12-.043-.117-.044-.116-.044-.113-.046-.112-.046-.109-.046-.106-.048-.105-.048-.102-.048-.1-.05-.097-.049-.095-.051-.093-.051-.09-.052-.087-.052-.085-.053-.083-.053-.08-.054-.077-.054v4.153zm8.74-8.179l-.257.004-.254.005-.25.008-.247.011-.244.012-.241.014-.237.016-.233.018-.231.021-.226.022-.224.023-.22.026-.216.027-.212.028-.21.031-.205.032-.202.033-.198.034-.194.036-.191.038-.187.038-.183.04-.179.041-.175.042-.172.043-.168.043-.163.045-.16.046-.155.046-.152.048-.148.048-.143.048-.139.049-.136.05-.131.05-.126.051-.123.051-.118.051-.114.052-.11.052-.106.052-.101.052-.096.052-.092.052-.088.052-.083.052-.079.052-.074.051-.07.052-.065.051-.06.05-.056.05-.051.05-.023.025-.023.024-.021.024-.02.025-.019.024-.018.024-.017.023-.015.024-.014.023-.013.023-.012.023-.01.023-.01.022-.008.022-.006.023-.006.021-.004.022-.004.021-.001.021-.001.021.001.021.001.021.004.021.004.022.006.021.006.023.008.022.01.022.01.023.012.023.013.023.014.023.015.024.017.023.018.024.019.024.02.025.021.024.023.024.023.025.051.05.056.05.06.05.065.051.07.052.074.051.079.052.083.052.088.052.092.052.096.052.101.052.106.052.11.052.114.052.118.051.123.051.126.051.131.05.136.05.139.049.143.048.148.048.152.048.155.046.16.046.163.045.168.043.172.043.175.042.179.041.183.04.187.038.191.038.194.036.198.034.202.033.205.032.21.031.212.028.216.027.22.026.224.023.226.022.231.021.233.018.237.016.241.014.244.012.247.011.25.008.254.005.257.004.26.001.26-.001.257-.004.254-.005.25-.008.247-.011.244-.012.241-.014.237-.016.233-.018.231-.021.226-.022.224-.023.22-.026.216-.027.212-.028.21-.031.205-.032.202-.033.198-.034.194-.036.191-.038.187-.038.183-.04.179-.041.175-.042.172-.043.168-.043.163-.045.16-.046.155-.046.152-.048.148-.048.143-.048.139-.049.136-.05.131-.05.126-.051.123-.051.118-.051.114-.052.11-.052.106-.052.101-.052.096-.052.092-.052.088-.052.083-.052.079-.052.074-.051.07-.052.065-.051.06-.05.056-.05.051-.05.023-.025.023-.024.021-.024.02-.025.019-.024.018-.024.017-.023.015-.024.014-.023.013-.023.012-.023.01-.023.01-.022.008-.022.006-.023.006-.021.004-.022.004-.021.001-.021.001-.021-.001-.021-.001-.021-.004-.021-.004-.022-.006-.021-.006-.023-.008-.022-.01-.022-.01-.023-.012-.023-.013-.023-.014-.023-.015-.024-.017-.023-.018-.024-.019-.024-.02-.025-.021-.024-.023-.024-.023-.025-.051-.05-.056-.05-.06-.05-.065-.051-.07-.052-.074-.051-.079-.052-.083-.052-.088-.052-.092-.052-.096-.052-.101-.052-.106-.052-.11-.052-.114-.052-.118-.051-.123-.051-.126-.051-.131-.05-.136-.05-.139-.049-.143-.048-.148-.048-.152-.048-.155-.046-.16-.046-.163-.045-.168-.043-.172-.043-.175-.042-.179-.041-.183-.04-.187-.038-.191-.038-.194-.036-.198-.034-.202-.033-.205-.032-.21-.031-.212-.028-.216-.027-.22-.026-.224-.023-.226-.022-.231-.021-.233-.018-.237-.016-.241-.014-.244-.012-.247-.011-.25-.008-.254-.005-.257-.004-.26-.001-.26.001z" transform="scale(.5)"&gt;&lt;/path&gt;&lt;/symbol&gt;&lt;/defs&gt;&lt;defs&gt;&lt;symbol height="24" width="24" id="clock"&gt;&lt;path d="M12 2c5.514 0 10 4.486 10 10s-4.486 10-10 10-10-4.486-10-10 4.486-10 10-10zm0-2c-6.627 0-12 5.373-12 12s5.373 12 12 12 12-5.373 12-12-5.373-12-12-12zm5.848 12.459c.202.038.202.333.001.372-1.907.361-6.045 1.111-6.547 1.111-.719 0-1.301-.582-1.301-1.301 0-.512.77-5.447 1.125-7.445.034-.192.312-.181.343.014l.985 6.238 5.394 1.011z" transform="scale(.5)"&gt;&lt;/path&gt;&lt;/symbol&gt;&lt;/defs&gt;&lt;defs&gt;&lt;marker orient="auto-start-reverse" markerHeight="12" markerWidth="12" markerUnits="userSpaceOnUse" refY="5" refX="7.9" id="arrowhead"&gt;&lt;path d="M -1 0 L 10 5 L 0 10 z"&gt;&lt;/path&gt;&lt;/marker&gt;&lt;/defs&gt;&lt;defs&gt;&lt;marker refY="4.5" refX="4" orient="auto" markerHeight="8" markerWidth="15" id="crosshead"&gt;&lt;path d="M 1,2 L 6,7 M 6,2 L 1,7" stroke-width="1pt" style="stroke-dasharray: 0px, 0px;" stroke="#000000" fill="none"&gt;&lt;/path&gt;&lt;/marker&gt;&lt;/defs&gt;&lt;defs&gt;&lt;marker orient="auto" markerHeight="28" markerWidth="20" refY="7" refX="15.5" id="filled-head"&gt;&lt;path d="M 18,7 L9,13 L14,7 L9,1 Z"&gt;&lt;/path&gt;&lt;/marker&gt;&lt;/defs&gt;&lt;defs&gt;&lt;marker orient="auto" markerHeight="40" markerWidth="60" refY="15" refX="15" id="sequencenumber"&gt;&lt;circle r="6" cy="15" cx="15"&gt;&lt;/circle&gt;&lt;/marker&gt;&lt;/defs&gt;&lt;g&gt;&lt;rect class="note" height="40" width="282" stroke="#666" fill="#EDF2AE" y="75" x="50"&gt;&lt;/rect&gt;&lt;text dy="1em" class="noteText" style="font-family: inherit; font-size: 16px; font-weight: 400;" alignment-baseline="middle" dominant-baseline="middle" text-anchor="middle" y="80" x="191"&gt;&lt;tspan x="191"&gt;Initialization Phase&lt;/tspan&gt;&lt;/text&gt;&lt;/g&gt;&lt;g&gt;&lt;rect class="activation0" height="380" width="10" stroke="#666" fill="#EDF2AE" y="115" x="70"&gt;&lt;/rect&gt;&lt;/g&gt;&lt;g&gt;&lt;rect class="activation0" height="328" width="10" stroke="#666" fill="#EDF2AE" y="167" x="302"&gt;&lt;/rect&gt;&lt;/g&gt;&lt;g&gt;&lt;rect class="note" height="40" width="282" stroke="#666" fill="#EDF2AE" y="275" x="50"&gt;&lt;/rect&gt;&lt;text dy="1em" class="noteText" style="font-family: inherit; font-size: 16px; font-weight: 400;" alignment-baseline="middle" dominant-baseline="middle" text-anchor="middle" y="280" x="191"&gt;&lt;tspan x="191"&gt;Operation Phase&lt;/tspan&gt;&lt;/text&gt;&lt;/g&gt;&lt;g&gt;&lt;rect class="note" height="40" width="282" stroke="#666" fill="#EDF2AE" y="345" x="50"&gt;&lt;/rect&gt;&lt;text dy="1em" class="noteText" style="font-family: inherit; font-size: 16px; font-weight: 400;" alignment-baseline="middle" dominant-baseline="middle" text-anchor="middle" y="350" x="191"&gt;&lt;tspan x="191"&gt;Normal protocol operations&lt;/tspan&gt;&lt;/text&gt;&lt;/g&gt;&lt;g&gt;&lt;rect class="note" height="40" width="282" stroke="#666" fill="#EDF2AE" y="405" x="50"&gt;&lt;/rect&gt;&lt;text dy="1em" class="noteText" style="font-family: inherit; font-size: 16px; font-weight: 400;" alignment-baseline="middle" dominant-baseline="middle" text-anchor="middle" y="410" x="191"&gt;&lt;tspan x="191"&gt;Shutdown&lt;/tspan&gt;&lt;/text&gt;&lt;/g&gt;&lt;g&gt;&lt;rect class="note" height="40" width="282" stroke="#666" fill="#EDF2AE" y="505" x="50"&gt;&lt;/rect&gt;&lt;text dy="1em" class="noteText" style="font-family: inherit; font-size: 16px; font-weight: 400;" alignment-baseline="middle" dominant-baseline="middle" text-anchor="middle" y="510" x="191"&gt;&lt;tspan x="191"&gt;Connection closed&lt;/tspan&gt;&lt;/text&gt;&lt;/g&gt;&lt;text dy="1em" class="messageText" style="font-family: inherit; font-size: 16px; font-weight: 400;" alignment-baseline="middle" dominant-baseline="middle" text-anchor="middle" y="130" x="190"&gt;initialize request&lt;/text&gt;&lt;line marker-end="url(#arrowhead)" style="fill: none;" stroke="none" stroke-width="2" class="messageLine0" y2="165" x2="299" y1="165" x1="80"&gt;&lt;/line&gt;&lt;text dy="1em" class="messageText" style="font-family: inherit; font-size: 16px; font-weight: 400;" alignment-baseline="middle" dominant-baseline="middle" text-anchor="middle" y="180" x="193"&gt;initialize response&lt;/text&gt;&lt;line marker-end="url(#arrowhead)" stroke="none" stroke-width="2" class="messageLine1" style="stroke-dasharray: 3px, 3px; fill: none;" y2="215" x2="83" y1="215" x1="302"&gt;&lt;/line&gt;&lt;text dy="1em" class="messageText" style="font-family: inherit; font-size: 16px; font-weight: 400;" alignment-baseline="middle" dominant-baseline="middle" text-anchor="middle" y="230" x="190"&gt;initialized notification&lt;/text&gt;&lt;line marker-end="url(#filled-head)" stroke="none" stroke-width="2" class="messageLine1" style="stroke-dasharray: 3px, 3px; fill: none;" y2="265" x2="299" y1="265" x1="80"&gt;&lt;/line&gt;&lt;text dy="1em" class="messageText" style="font-family: inherit; font-size: 16px; font-weight: 400;" alignment-baseline="middle" dominant-baseline="middle" text-anchor="middle" y="460" x="190"&gt;Disconnect&lt;/text&gt;&lt;line marker-end="url(#filled-head)" stroke="none" stroke-width="2" class="messageLine1" style="stroke-dasharray: 3px, 3px; fill: none;" y2="495" x2="299" y1="495" x1="80"&gt;&lt;/line&gt;&lt;/svg&gt;&lt;/p&gt;
&lt;figure class="post-cta"&gt;
  &lt;figcaption&gt;
    &lt;h1&gt;You can play with this right now.&lt;/h1&gt;
    &lt;p&gt;MCPs are barely six months old, but we are keeping up with the latest&lt;/p&gt;
      &lt;a class="btn btn-lg" href="https://fly.io/docs/mcp"&gt;
        Try launching your MCP server on Fly.io today &lt;span class='opacity-50'&gt;→&lt;/span&gt;
      &lt;/a&gt;
  &lt;/figcaption&gt;
  &lt;div class="image-container"&gt;
    &lt;img src="/static/images/cta-kitty.webp" srcset="/static/images/[email protected] 2x" alt=""&gt;
  &lt;/div&gt;
&lt;/figure&gt;

&lt;h2 id='mcps-are-not-inherently-secure-or-private' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#mcps-are-not-inherently-secure-or-private' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;MCPs are &lt;strong&gt;&lt;i&gt;not&lt;/i&gt;&lt;/strong&gt; Inherently Secure or Private&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;Here I am not talking about &lt;a href='https://simonwillison.net/2025/Apr/9/mcp-prompt-injection/' title=''&gt;prompt injection&lt;/a&gt; or &lt;a href='https://simonwillison.net/2025/May/26/github-mcp-exploited/' title=''&gt;exploitable abilities&lt;/a&gt;, though those are real problems too.&lt;/p&gt;

&lt;p&gt;I&amp;rsquo;m talking about something more fundamental and basic. Let&amp;rsquo;s take a look at the very same &lt;a href='https://github.com/github/github-mcp-server' title=''&gt;GitHub MCP&lt;/a&gt; featured in the above two descriptions of an exploitable exposure. The current process for installing a stdio MCP into Claude involves placing secrets in plain text in a well-defined location. And the process for installing the &lt;em&gt;next&lt;/em&gt; MCP server is to download a program from a third party and run that tool in a way that has access to this very file.&lt;/p&gt;

&lt;p&gt;Addressing MCP security requires a holistic approach, but one key strategy component is the ability to run an MCP server on a remote machine which can only be accessed by you, and only after you present a revocable bearer token. That remote machine may require access to secrets (like your GitHub token), but those secrets are not something either your LLM client or other MCP servers will be able to access directly.&lt;/p&gt;
&lt;h2 id='mcps-should-be-considered-family' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#mcps-should-be-considered-family' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;MCPs should be considered family&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;Recapping: &lt;a href='https://www.usa.philips.com/' title=''&gt;Philips&lt;/a&gt; has an &lt;a href='https://developers.meethue.com/' title=''&gt;API and SDK&lt;/a&gt; for Hue that is used by perhaps thousands, and has an &lt;a href='https://www.amazon.com/Philips-Hue/dp/B01LWAGUG7' title=''&gt;Alexa Skill&lt;/a&gt; that is used by untold millions. Of course, somebody already built a &lt;a href='https://github.com/ThomasRohde/hue-mcp' title=''&gt;Philips Hue MCP Server&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;LLMs are brains. MCPs give LLMs eyes, hands, and legs. The end result is a robot. Most MCPs today are brainless—they merely are eyes, hands, and legs. The results are appliances. I buy and install a dishwasher. You do too. Both of our dishwashers perform the same basic function. As do any Hue lights we may buy.&lt;/p&gt;

&lt;p&gt;In The Jetsons, &lt;a href='https://thejetsons.fandom.com/wiki/Rosey' title=''&gt;Rosie&lt;/a&gt; is a maid and housekeeper who is also a member of the family. She is good friends with Jane and takes the role of a surrogate aunt towards Elroy and Judy. Let&amp;rsquo;s start there and go further.&lt;/p&gt;

&lt;p&gt;A person with an LLM can build things. Imagine having a 3D printer that makes robots or agents that act on your behalf. You may want an agent to watch for trends in your business, keep track of what events are happening in your area, screen your text messages, or act as a DJ when you get together with friends.&lt;/p&gt;

&lt;p&gt;You may share the MCP servers you create with others to use as starting points, but they undoubtedly will adapt them and make them their own.&lt;/p&gt;
&lt;h2 id='closing-thoughts' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#closing-thoughts' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Closing Thoughts&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;Don&amp;rsquo;t get me wrong. I am not saying there won&amp;rsquo;t be MCPs that are mere appliances—there undoubtedly will be plenty of those. But not all MCPs will be appliances; many will be agents.&lt;/p&gt;

&lt;p&gt;Today, most people exploring LLMs are trying to add LLMs to things they already have—for example, IDEs. MCPs flip this. Instead of adding LLMs to something you already have, you add something you already have to an LLM.&lt;/p&gt;

&lt;p&gt;&lt;a href='https://desktopcommander.app/' title=''&gt;Desktop Commander MCP&lt;/a&gt; is an example I&amp;rsquo;m particularly fond of that does exactly this. It also happens to be a prime example of a tool you want to give root access to, then lock it in a secure box and run someplace other than your laptop. &lt;a href='https://fly.io/docs/mcp/examples/#desktop-commander' title=''&gt;Give it a try&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Microsoft is actively working on &lt;a href='https://developer.microsoft.com/en-us/windows/agentic/' title=''&gt;Agentic Windows&lt;/a&gt;. Your smartphone is the obvious next frontier. Today, you have an app store and a web browser. MCP servers have the potential to make both obsolete—and this could happen sooner than you think.&lt;/p&gt;</content>
    </entry>
    <entry>
        <title>My AI Skeptic Friends Are All Nuts</title>
        <link rel="alternate" href="https://fly.io/blog/youre-all-nuts/"/>
        <id>https://fly.io/blog/youre-all-nuts/</id>
        <published>2025-06-02T00:00:00+00:00</published>
        <updated>2025-06-05T21:38:19+00:00</updated>
        <media:thumbnail url="https://fly.io/blog/youre-all-nuts/assets/whoah.png"/>
        <content type="html">&lt;div class="lead"&gt;&lt;p&gt;A heartfelt provocation about AI-assisted programming.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;Tech execs are mandating LLM adoption. That&amp;rsquo;s bad strategy. But I get where they&amp;rsquo;re coming from.&lt;/p&gt;

&lt;p&gt;Some of the smartest people I know share a bone-deep belief that AI is a fad —  the next iteration of NFT mania. I&amp;rsquo;ve been reluctant to push back on them, because, well, they&amp;rsquo;re smarter than me. But their arguments are unserious, and worth confronting. Extraordinarily talented people are doing work that LLMs already do better, out of  spite.&lt;/p&gt;

&lt;p&gt;All progress on LLMs could halt today, and LLMs would remain the 2nd most important thing to happen over the course of my career.&lt;/p&gt;
&lt;div class="callout"&gt;&lt;p&gt;&lt;strong class="font-semibold text-navy-950"&gt;Important caveat&lt;/strong&gt;: I’m discussing only the implications of LLMs for software development. For art, music, and writing? I got nothing. I’m inclined to believe the skeptics in those fields. I just don’t believe them about mine.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;Bona fides: I&amp;rsquo;ve been shipping software since the mid-1990s. I started out in boxed, shrink-wrap C code.  Survived an ill-advised &lt;a href='https://www.amazon.com/Modern-Design-Generic-Programming-Patterns/dp/0201704315' title=''&gt;Alexandrescu&lt;/a&gt; C++ phase. Lots of Ruby and Python tooling.  Some kernel work. A whole lot of server-side C, Go, and Rust. However you define &amp;ldquo;serious developer&amp;rdquo;, I qualify. Even if only on one of your lower tiers.&lt;/p&gt;
&lt;h3 id='level-setting' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#level-setting' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;level setting&lt;/span&gt;&lt;/h3&gt;&lt;div class="right-sidenote"&gt;&lt;p&gt;† (or, God forbid, 2 years ago with Copilot)&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;First, we need to get on the same page. If you were trying and failing to use an LLM for code 6 months ago †, you’re not doing what most serious LLM-assisted coders are doing.&lt;/p&gt;

&lt;p&gt;People coding with LLMs today use agents. Agents get to poke around your codebase on their own. They author files directly. They run tools. They compile code, run tests, and iterate on the results. They also:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;pull in arbitrary code from the tree, or from other trees online, into their context windows,
&lt;/li&gt;&lt;li&gt;run standard Unix tools to navigate the tree and extract information,
&lt;/li&gt;&lt;li&gt;interact with Git,
&lt;/li&gt;&lt;li&gt;run existing tooling, like linters, formatters, and model checkers, and
&lt;/li&gt;&lt;li&gt;make essentially arbitrary tool calls (that you set up) through MCP.
&lt;/li&gt;&lt;/ul&gt;
&lt;div class="callout"&gt;&lt;p&gt;The code in an agent that actually “does stuff” with code is not, itself, AI. This should reassure you. It’s surprisingly simple systems code, wired to ground truth about programming in the same way a Makefile is. You could write an effective coding agent in a weekend. Its strengths would have more to do with how you think about and structure builds and linting and test harnesses than with how advanced o3 or Sonnet have become.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;If you’re making requests on a ChatGPT page and then pasting the resulting (broken) code into your editor, you’re not doing what the AI boosters are doing. No wonder you&amp;rsquo;re talking past each other.&lt;/p&gt;
&lt;h3 id='the-positive-case' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#the-positive-case' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;the positive case&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;&lt;img alt="four quadrants of tedium and importance" src="/blog/youre-all-nuts/assets/code-quad.png?2/3&amp;amp;center" /&gt;&lt;/p&gt;

&lt;p&gt;LLMs can write a large fraction of all the tedious code you’ll ever need to write. And most code on most projects is tedious. LLMs drastically reduce the number of things you’ll ever need to Google. They look things up themselves. Most importantly, they don’t get tired; they’re immune to inertia.&lt;/p&gt;

&lt;p&gt;Think of anything you wanted to build but didn&amp;rsquo;t. You tried to home in on some first steps. If you&amp;rsquo;d been in the limerent phase of a new programming language, you&amp;rsquo;d have started writing. But you weren&amp;rsquo;t, so you put it off, for a day, a year, or your whole career.&lt;/p&gt;

&lt;p&gt;I can feel my blood pressure rising thinking of all the bookkeeping and Googling and dependency drama of a new project. An LLM can be instructed to just figure all that shit out. Often, it will drop you precisely at that golden moment where shit almost works, and development means tweaking code and immediately seeing things work better. That dopamine hit is why I code.&lt;/p&gt;

&lt;p&gt;There&amp;rsquo;s a downside. Sometimes, gnarly stuff needs doing. But you don&amp;rsquo;t wanna do it. So you refactor unit tests, soothing yourself with the lie that you&amp;rsquo;re doing real work. But an LLM can be told to go refactor all your unit tests. An agent can occupy itself for hours putzing with your tests in a VM and come back later with a PR. If you listen to me, you’ll know that. You&amp;rsquo;ll feel worse yak-shaving. You&amp;rsquo;ll end up doing… real work.&lt;/p&gt;
&lt;h3 id='but-you-have-no-idea-what-the-code-is' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#but-you-have-no-idea-what-the-code-is' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;but you have no idea what the code is&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;Are you a vibe coding Youtuber? Can you not read code? If so: astute point. Otherwise: what the fuck is wrong with you?&lt;/p&gt;

&lt;p&gt;You&amp;rsquo;ve always been responsible for what you merge to &lt;code&gt;main&lt;/code&gt;. You were five years go. And you are tomorrow, whether or not you use an LLM.&lt;/p&gt;

&lt;p&gt;If you build something with an LLM that people will depend on, read the code. In fact, you&amp;rsquo;ll probably do more than that. You&amp;rsquo;ll spend 5-10 minutes knocking it back into your own style. LLMs are &lt;a href='https://github.com/PatrickJS/awesome-cursorrules' title=''&gt;showing signs of adapting&lt;/a&gt; to local idiom, but we’re not there yet.&lt;/p&gt;

&lt;p&gt;People complain about LLM-generated code being “probabilistic”. No it isn&amp;rsquo;t. It’s code. It&amp;rsquo;s not Yacc output. It&amp;rsquo;s knowable. The LLM might be stochastic. But the LLM doesn’t matter. What matters is whether you can make sense of the result, and whether your guardrails hold.&lt;/p&gt;

&lt;p&gt;Reading other people&amp;rsquo;s code is part of the job. If you can&amp;rsquo;t metabolize the boring, repetitive code an LLM generates: skills issue! How are you handling the chaos human developers turn out on a deadline?&lt;/p&gt;
&lt;div class="right-sidenote"&gt;&lt;p&gt;† (because it can hold 50-70kloc in its context window)&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;For the last month or so, Gemini 2.5 has been my go-to †. Almost nothing it spits out for me merges without edits. I’m sure there’s a skill to getting a SOTA model to one-shot a feature-plus-merge! But I don’t care. I like moving the code around and chuckling to myself while I delete all the stupid comments. I have to read the code line-by-line anyways.&lt;/p&gt;
&lt;h3 id='but-hallucination' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#but-hallucination' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;but hallucination&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;If hallucination matters to you, your programming language has let you down.&lt;/p&gt;

&lt;p&gt;Agents lint. They compile and run tests. If their LLM invents a new function signature, the agent sees the error. They feed it back to the LLM, which says “oh, right, I totally made that up” and then tries again.&lt;/p&gt;

&lt;p&gt;You&amp;rsquo;ll only notice this happening if you watch the chain of thought log your agent generates. Don&amp;rsquo;t. This is why I like &lt;a href='https://zed.dev/agentic' title=''&gt;Zed&amp;rsquo;s agent mode&lt;/a&gt;: it begs you to tab away and let it work, and pings you with a desktop notification when it&amp;rsquo;s done.&lt;/p&gt;

&lt;p&gt;I’m sure there are still environments where hallucination matters. But &amp;ldquo;hallucination&amp;rdquo; is the first thing developers bring up when someone suggests using LLMs, despite it being (more or less) a solved problem.&lt;/p&gt;
&lt;h3 id='but-the-code-is-shitty-like-that-of-a-junior-developer' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#but-the-code-is-shitty-like-that-of-a-junior-developer' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;but the code is shitty, like that of a junior developer&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;Does an intern cost $20/month? Because that&amp;rsquo;s what Cursor.ai costs.&lt;/p&gt;

&lt;p&gt;Part of being a senior developer is making less-able coders productive, be they fleshly or algebraic. Using agents well is both a both a  skill and an engineering project all its own, of prompts, indices, &lt;a href='https://fly.io/blog/semgrep-but-for-real-now/' title=''&gt;and (especially) tooling.&lt;/a&gt; LLMs only produce shitty code if you let them.&lt;/p&gt;
&lt;div class="right-sidenote"&gt;&lt;p&gt;† (Also: 100% of all the Bash code you should author ever again)&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;Maybe the current confusion is about who&amp;rsquo;s doing what work. Today, LLMs do a lot of typing, Googling, test cases †, and edit-compile-test-debug cycles. But even the most Claude-poisoned serious developers in the world still own curation, judgement, guidance, and direction.&lt;/p&gt;

&lt;p&gt;Also: let’s stop kidding ourselves about how good our human first cuts really are.&lt;/p&gt;
&lt;h3 id='but-its-bad-at-rust' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#but-its-bad-at-rust' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;but it’s bad at rust&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;It’s hard to get a good toolchain for Brainfuck, too. Life’s tough in the aluminum siding business.&lt;/p&gt;
&lt;div class="right-sidenote"&gt;&lt;p&gt;† (and they surely will; the Rust community takes tooling seriously)&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;A lot of LLM skepticism probably isn&amp;rsquo;t really about LLMs. It&amp;rsquo;s projection. People say &amp;ldquo;LLMs can&amp;rsquo;t code&amp;rdquo; when what they really mean is &amp;ldquo;LLMs can&amp;rsquo;t write Rust&amp;rdquo;. Fair enough! But people select languages in part based on how well LLMs work with them, so Rust people should get on that †.&lt;/p&gt;

&lt;p&gt;I work mostly in Go. I’m confident the designers of the Go programming language didn&amp;rsquo;t set out to produce the most LLM-legible language in the industry. They succeeded nonetheless. Go has just enough type safety, an extensive standard library, and a culture that prizes (often repetitive) idiom. LLMs kick ass generating it.&lt;/p&gt;

&lt;p&gt;All this is to say: I write some Rust. I like it fine. If LLMs and Rust aren&amp;rsquo;t working for you, I feel you. But if that’s your whole thing, we’re not having the same argument.&lt;/p&gt;
&lt;h3 id='but-the-craft' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#but-the-craft' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;but the craft&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;Do you like fine Japanese woodworking? All hand tools and sashimono joinery? Me too. Do it on your own time.&lt;/p&gt;
&lt;div class="right-sidenote"&gt;&lt;p&gt;† (I’m a piker compared to my woodworking friends)&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;I have a basic wood shop in my basement †. I could get a lot of satisfaction from building a table. And, if that table is a workbench or a grill table, sure, I&amp;rsquo;ll build it. But if I need, like, a table? For people to sit at? In my office? I buy a fucking table.&lt;/p&gt;

&lt;p&gt;Professional software developers are in the business of solving practical problems for people with code. We are not, in our day jobs, artisans. Steve Jobs was wrong: we do not need to carve the unseen feet in the sculpture. Nobody cares if the logic board traces are pleasingly routed. If anything we build endures, it won&amp;rsquo;t be because the codebase was beautiful.&lt;/p&gt;

&lt;p&gt;Besides, that’s not really what happens. If you’re taking time carefully golfing functions down into graceful, fluent, minimal functional expressions, alarm bells should ring. You’re yak-shaving. The real work has depleted your focus. You&amp;rsquo;re not building: you&amp;rsquo;re self-soothing.&lt;/p&gt;

&lt;p&gt;Which, wait for it, is something LLMs are good for. They devour schlep, and clear a path to the important stuff, where your judgement and values really matter.&lt;/p&gt;
&lt;h3 id='but-the-mediocrity' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#but-the-mediocrity' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;but the mediocrity&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;As a mid-late career coder, I&amp;rsquo;ve come to appreciate mediocrity. You should be so lucky as to  have it flowing almost effortlessly from a tap.&lt;/p&gt;

&lt;p&gt;We all write mediocre code. Mediocre code: often fine. Not all code is equally important. Some code should be mediocre. Maximum effort on a random unit test? You&amp;rsquo;re doing something wrong. Your team lead should correct you.&lt;/p&gt;

&lt;p&gt;Developers all love to preen about code. They worry LLMs lower the &amp;ldquo;ceiling&amp;rdquo; for quality. Maybe. But they also raise the &amp;ldquo;floor&amp;rdquo;.&lt;/p&gt;

&lt;p&gt;Gemini&amp;rsquo;s floor is higher than my own.  My code looks nice. But it&amp;rsquo;s not as thorough. LLM code is repetitive. But mine includes dumb contortions where I got too clever trying to DRY things up.&lt;/p&gt;

&lt;p&gt;And LLMs aren&amp;rsquo;t mediocre on every axis. They almost certainly have a bigger bag of algorithmic tricks than you do: radix tries, topological sorts, graph reductions, and LDPC codes. Humans romanticize &lt;code&gt;rsync&lt;/code&gt; (&lt;a href='https://www.andrew.cmu.edu/course/15-749/READINGS/required/cas/tridgell96.pdf' title=''&gt;Andrew Tridgell&lt;/a&gt; wrote a paper about it!). To an LLM it might not be that much more interesting than a SQL join.&lt;/p&gt;

&lt;p&gt;But I&amp;rsquo;m getting ahead of myself. It doesn&amp;rsquo;t matter. If truly mediocre code is all we ever get from LLMs, that&amp;rsquo;s still huge. It&amp;rsquo;s that much less mediocre code humans have to write.&lt;/p&gt;
&lt;h3 id='but-itll-never-be-agi' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#but-itll-never-be-agi' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;but it&amp;rsquo;ll never be AGI&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;I don&amp;rsquo;t give a shit.&lt;/p&gt;

&lt;p&gt;Smart practitioners get wound up by the AI/VC hype cycle. I can&amp;rsquo;t blame them. But it&amp;rsquo;s not an argument. Things either work or they don&amp;rsquo;t, no matter what Jensen Huang has to say about it.&lt;/p&gt;
&lt;h3 id='but-they-take-rr-jerbs' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#but-they-take-rr-jerbs' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;but they take-rr jerbs&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;&lt;a href='https://news.ycombinator.com/item?id=43776612' title=''&gt;So does open source.&lt;/a&gt; We used to pay good money for databases.&lt;/p&gt;

&lt;p&gt;We&amp;rsquo;re a field premised on automating other people’s jobs away. &amp;ldquo;Productivity gains,&amp;rdquo; say the economists. You get what that means, right? Fewer people doing the same stuff. Talked to a travel agent lately? Or a floor broker? Or a record store clerk? Or a darkroom tech?&lt;/p&gt;

&lt;p&gt;When this argument comes up, libertarian-leaning VCs start the chant: lamplighters, creative destruction, new kinds of work. Maybe. But I&amp;rsquo;m not hypnotized. I have no fucking clue whether we’re going to be better off after LLMs. Things could get a lot worse for us.&lt;/p&gt;

&lt;p&gt;LLMs really might displace many software developers. That&amp;rsquo;s not a high horse we get to ride. Our jobs are just as much in tech&amp;rsquo;s line of fire as everybody else&amp;rsquo;s have been for the last 3 decades. We&amp;rsquo;re not &lt;a href='https://en.wikipedia.org/wiki/2024_United_States_port_strike' title=''&gt;East Coast dockworkers&lt;/a&gt;; we won&amp;rsquo;t stop progress on our own.&lt;/p&gt;
&lt;h3 id='but-the-plagiarism' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#but-the-plagiarism' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;but the plagiarism&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;Artificial intelligence is profoundly — and probably unfairly — threatening to visual artists in ways that might be hard to appreciate if you don&amp;rsquo;t work in the arts.&lt;/p&gt;

&lt;p&gt;We imagine artists spending their working hours pushing the limits of expression. But the median artist isn&amp;rsquo;t producing gallery pieces. They produce on brief: turning out competent illustrations and compositions for magazine covers, museum displays, motion graphics, and game assets.&lt;/p&gt;

&lt;p&gt;LLMs easily — alarmingly — clear industry quality bars. Gallingly, one of the things they&amp;rsquo;re best at is churning out just-good-enough facsimiles of human creative work.  I have family in visual arts. I can&amp;rsquo;t talk to them about LLMs. I don&amp;rsquo;t blame them. They&amp;rsquo;re probably not wrong.&lt;/p&gt;

&lt;p&gt;Meanwhile, software developers spot code fragments &lt;a href="https://arxiv.org/abs/2311.17035"&gt;seemingly lifted&lt;/a&gt; from public repositories on Github and lose their shit. What about the licensing? If you&amp;rsquo;re a lawyer, I defer. But if you&amp;rsquo;re a software developer playing this card? Cut me a little slack as I ask you to shove this concern up your ass. No profession has demonstrated more contempt for intellectual property.&lt;/p&gt;

&lt;p&gt;The median dev thinks Star Wars and Daft Punk are a public commons. The great cultural project of developers has been opposing any protection that might inconvenience a monetizable media-sharing site. When they fail at policy, they route around it with coercion. They stand up global-scale piracy networks and sneer at anybody who so much as tries to preserve a new-release window for a TV show.&lt;/p&gt;

&lt;p&gt;Call any of this out if you want to watch a TED talk about how hard it is to stream &lt;em&gt;The Expanse&lt;/em&gt; on LibreWolf. Yeah, we get it. You don&amp;rsquo;t believe in IPR. Then shut the fuck up about IPR. Reap the whirlwind.&lt;/p&gt;

&lt;p&gt;It&amp;rsquo;s all special pleading anyways. LLMs digest code further than you do. If you don&amp;rsquo;t believe a typeface designer can stake a moral claim on the terminals and counters of a letterform, you sure as hell can&amp;rsquo;t be possessive about a red-black tree.&lt;/p&gt;
&lt;h3 id='positive-case-redux' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#positive-case-redux' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;positive case redux&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;When I started writing a couple days ago, I wrote a section to &amp;ldquo;level set&amp;rdquo; to the  state of the art of LLM-assisted programming. A bluefish filet has a longer shelf life than an LLM take. In the time it took you to read this, everything changed.&lt;/p&gt;

&lt;p&gt;Kids today don&amp;rsquo;t just use agents; they use asynchronous agents. They wake up, free-associate 13 different things for their LLMs to work on, make coffee, fill out a TPS report, drive to the Mars Cheese Castle, and then check their notifications. They&amp;rsquo;ve got 13 PRs to review. Three get tossed and re-prompted. Five of them get the same feedback a junior dev gets. And five get merged.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;&amp;ldquo;I&amp;rsquo;m sipping rocket fuel right now,&amp;rdquo;&lt;/em&gt; a friend tells me. &lt;em&gt;&amp;ldquo;The folks on my team who aren&amp;rsquo;t embracing AI? It&amp;rsquo;s like they&amp;rsquo;re standing still.&amp;rdquo;&lt;/em&gt; He&amp;rsquo;s not bullshitting me. He doesn&amp;rsquo;t work in SFBA. He&amp;rsquo;s got no reason to lie.&lt;/p&gt;

&lt;p&gt;There&amp;rsquo;s plenty of things I can&amp;rsquo;t trust an LLM with. No LLM has any of access to prod here. But I&amp;rsquo;ve been first responder on an incident and fed 4o — not o4-mini, 4o — log transcripts, and watched it in seconds spot LVM metadata corruption issues on a host we&amp;rsquo;ve been complaining about for months. Am I better than an LLM agent at interrogating OpenSearch logs and Honeycomb traces? No. No, I am not.&lt;/p&gt;

&lt;p&gt;To the consternation of many of my friends, I&amp;rsquo;m not a radical or a futurist. I&amp;rsquo;m a statist. I believe in the haphazard perseverance of complex systems, of institutions, of reversions to the mean. I write Go and Python code. I&amp;rsquo;m not a Kool-aid drinker.&lt;/p&gt;

&lt;p&gt;But something real is happening. My smartest friends are blowing it off. Maybe I persuade you. Probably I don&amp;rsquo;t. But we need to be done making space for bad arguments.&lt;/p&gt;
&lt;h3 id='but-im-tired-of-hearing-about-it' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#but-im-tired-of-hearing-about-it' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;but i&amp;rsquo;m tired of hearing about it&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;And here I rejoin your company. I read &lt;a href='https://simonwillison.net/' title=''&gt;Simon Willison&lt;/a&gt;, and that&amp;rsquo;s all I really need. But all day, every day, a sizable chunk of the front page of HN is allocated to LLMs: incremental model updates, startups doing things with LLMs, LLM tutorials, screeds against LLMs. It&amp;rsquo;s annoying!&lt;/p&gt;

&lt;p&gt;But AI is also incredibly — a word I use advisedly — important. It&amp;rsquo;s getting the same kind of attention that smart phones got in 2008, and not as much as the Internet got. That seems about right.&lt;/p&gt;

&lt;p&gt;I think this is going to get clearer over the next year. The cool kid haughtiness about &amp;ldquo;stochastic parrots&amp;rdquo; and &amp;ldquo;vibe coding&amp;rdquo; can&amp;rsquo;t survive much more contact with reality. I&amp;rsquo;m snarking about these people, but I meant what I said: they&amp;rsquo;re smarter than me. And when they get over this affectation, they&amp;rsquo;re going to make coding agents profoundly more effective than they are today.&lt;/p&gt;</content>
    </entry>
    <entry>
        <title>Using Kamal 2.0 in Production</title>
        <link rel="alternate" href="https://fly.io/blog/kamal-in-production/"/>
        <id>https://fly.io/blog/kamal-in-production/</id>
        <published>2025-05-29T00:00:00+00:00</published>
        <updated>2025-06-05T20:40:14+00:00</updated>
        <media:thumbnail url="https://fly.io/blog/kamal-in-production/assets/production.jpg"/>
        <content type="html">&lt;p&gt;&lt;a href='https://pragprog.com/titles/rails8/agile-web-development-with-rails-8/' title=''&gt;Agile Web Development with Rails 8&lt;/a&gt; is off to production, where they do things like editing, indexing, pagination, and printing. In researching the chapter on Deployment and Production, I became very dissatisfied with the content available on Kamal. I ended up writing my own, and it went well beyond the scope of the book. I then extracted what I needed from the result and put it in the book.&lt;/p&gt;

&lt;p&gt;Now that I have some spare time, I took a look at the greater work. It was more than a chapter and less than a book, so I decided to publish it &lt;a href='https://rubys.github.io/kamal-in-production/' title=''&gt;online&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;This took me only a matter of hours. I had my notes in the XML grammar that Pragmatic Programming uses for books. I asked GitHub Copilot to convert them to Markdown. It did the job without my having to explain the grammar. It made intelligent guesses as to how to handle footnotes and got a number of these wrong, but that was easy to fix. On a lark, I asked it to proofread the content, and it did that too.&lt;/p&gt;

&lt;hr&gt;

&lt;p&gt;Don&amp;rsquo;t get me wrong, Kamal is great. There are plenty of videos on how to get toy projects online, and the documentation will tell you what each field in the configuration file does. But none pull together everything you need to deploy a real project. For example, &lt;a href='https://rubys.github.io/kamal-in-production/Assemble/' title=''&gt;there are seven things you need to get started&lt;/a&gt;. Some are optional, some you may already have, and all can be gathered quickly &lt;strong class='font-semibold text-navy-950'&gt;if you have a list&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Kamal is just one piece of the puzzle. To deploy your software using Kamal, you need to be aware of the vast Docker ecosystem. You will want to set up a builder, sign up for a container repository, and lock down your secrets. And as you grow, you will want a load balancer and a managed database.&lt;/p&gt;

&lt;p&gt;And production is much more than copying files and starting a process. It is ensuring that your database is backed up, that your logs are searchable, and that your application is being monitored. It is also about ensuring that your application is secure.&lt;/p&gt;

&lt;p&gt;My list is opinionated. Each choice has a lot of options. For SSH keys, there are a lot of key types. If you don&amp;rsquo;t have an opinion, go with what GitHub recommends. For hosting, there are a lot of providers, and most videos start with Hetzner, which is a solid choice. But did you know that there are separate paths for Cloud and Robot, and when you would want either?&lt;/p&gt;

&lt;p&gt;A list with default options and alternatives highlighted was what would have helped me get started. Now it is available to you. The &lt;a href='https://github.com/rubys/kamal-in-production/' title=''&gt;source is on GitHub&lt;/a&gt;. &lt;a href='https://creativecommons.org/public-domain/cc0/' title=''&gt;CC0 licensed&lt;/a&gt;. Feel free to add side pages or links to document Digital Ocean, add other monitoring tools, or simply correct a typo that GitHub Copilot and I missed (or made).&lt;/p&gt;

&lt;hr&gt;

&lt;p&gt;And if you happen to be in the south eastern part of the US in August, come see me talk on this topic at the &lt;a href='https://blog.carolina.codes/p/announcing-our-2025-speakers' title=''&gt;Carolina Code Conference&lt;/a&gt;. If you can&amp;rsquo;t make it, the presentation will be recorded and posted online.&lt;/p&gt;</content>
    </entry>
    <entry>
        <title>parking_lot: ffffffffffffffff...</title>
        <link rel="alternate" href="https://fly.io/blog/parking-lot-ffffffffffffffff/"/>
        <id>https://fly.io/blog/parking-lot-ffffffffffffffff/</id>
        <published>2025-05-28T00:00:00+00:00</published>
        <updated>2025-05-28T20:20:37+00:00</updated>
        <media:thumbnail url="https://fly.io/blog/parking-lot-ffffffffffffffff/assets/ffff-cover.webp"/>
        <content type="html">&lt;div class="lead"&gt;&lt;p&gt;We’re Fly.io, a public cloud that runs apps in a bunch of locations all over the world. This is a post about a gnarly bug in our Anycast router, the largest Rust project in our codebase. You won’t need much of any Rust to follow it.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;The basic idea is simple: customers give us Docker containers, and tell us which one of 30+ regions around the world they want them to run in. We convert the containers into lightweight virtual machines, and then link them to an Anycast network. If you boot up an app here in Sydney and Frankfurt, and a request for it lands in Narita, it&amp;rsquo;ll get routed to Sydney. The component doing that work is called &lt;code&gt;fly-proxy&lt;/code&gt;. It&amp;rsquo;s a Rust program, and it has been ill behaved of late.&lt;/p&gt;
&lt;h3 id='dramatis-personae' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#dramatis-personae' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Dramatis Personae&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;&lt;code&gt;fly-proxy&lt;/code&gt;, our intrepid Anycast router.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;corrosion&lt;/code&gt;, our intrepid Anycast routing protocol.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;Rust&lt;/code&gt;, a programming language you probably don&amp;rsquo;t use.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;read-write locks&lt;/code&gt;, a synchronization primitive that allows for many readers &lt;em&gt;or&lt;/em&gt; one single writer.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;parking_lot&lt;/code&gt;, a well-regarded optimized implementation of locks in Rust.&lt;/p&gt;
&lt;div class="callout"&gt;&lt;p&gt;Gaze not into the abyss, lest you become recognized as an &lt;strong class="font-semibold text-navy-950"&gt;&lt;em&gt;abyss domain expert&lt;/em&gt;&lt;/strong&gt;, and they expect you keep gazing into the damn thing&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Mathewson &lt;a href="https://x.com/nickm_tor/status/860234274842324993?lang=en" title=""&gt;6:31&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;
&lt;/div&gt;&lt;h3 id='anycast-routing' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#anycast-routing' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Anycast Routing&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;You already know how to build a proxy. Connection comes in, connection goes out, copy back and forth. So for the amount of time we spend writing about &lt;code&gt;fly-proxy&lt;/code&gt;, you might wonder what the big deal is.&lt;/p&gt;

&lt;p&gt;To be fair, in the nuts and bolts of actually proxying requests, &lt;code&gt;fly-proxy&lt;/code&gt; does some interesting stuff. For one thing, it&amp;rsquo;s &lt;a href='https://github.com/jedisct1/yes-rs' title=''&gt;written in Rust&lt;/a&gt;, which is apparently a big deal all on its own. It&amp;rsquo;s a large, asynchronous codebase that handles multiple protocols, TLS termination, and certificate issuance. It exercises a lot of &lt;a href='https://tokio.rs/' title=''&gt;Tokio&lt;/a&gt; features.&lt;/p&gt;

&lt;p&gt;But none of this is the hard part of &lt;code&gt;fly-proxy&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;We operate thousands of servers around the world. A customer Fly Machine might get scheduled onto any of them; in some places, the expectation we set with customers is that their Machines will potentially start in less than a second. More importantly, a Fly Machine can terminate instantly. In both cases, but especially the latter, &lt;code&gt;fly-proxy&lt;/code&gt; potentially needs to know, so that it does (or doesn&amp;rsquo;t) route traffic there.&lt;/p&gt;

&lt;p&gt;This is the hard problem: managing millions of connections for millions of apps. It&amp;rsquo;s a lot of state to manage, and it&amp;rsquo;s in constant flux. We refer to this as the &amp;ldquo;state distribution problem&amp;rdquo;, but really, it quacks like a routing protocol.&lt;/p&gt;
&lt;h3 id='our-routing-protocol-is-corrosion' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#our-routing-protocol-is-corrosion' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Our Routing Protocol is Corrosion&lt;/span&gt;&lt;/h3&gt;&lt;div class="right-sidenote"&gt;&lt;p&gt;Corrosion2, to be precise.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;We&amp;rsquo;ve been through multiple iterations of the state management problem, and the stable place we&amp;rsquo;ve settled is a &lt;a href='https://github.com/superfly/corrosion' title=''&gt;system called Corrosion&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Corrosion is a large SQLite database mirrored globally across several thousand servers. There are three important factors that make Corrosion work:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;The SQLite database Corrosion replicates is CRDT-structured.
&lt;/li&gt;&lt;li&gt;In our orchestration model, individual worker servers are the source of truth for any Fly Machines running on them; there&amp;rsquo;s no globally coordinated orchestration state.
&lt;/li&gt;&lt;li&gt;We use &lt;a href='http://fly.io/blog/building-clusters-with-serf/#what-serf-is-doing' title=''&gt;SWIM gossip&lt;/a&gt; to publish updates from those workers across the fleet.
&lt;/li&gt;&lt;/ol&gt;

&lt;p&gt;This works. A Fly Machine terminates in Dallas; a &lt;code&gt;fly-proxy&lt;/code&gt; instance in Singapore knows within a small number of seconds.&lt;/p&gt;
&lt;h3 id='routing-protocol-implementations-are-hard' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#routing-protocol-implementations-are-hard' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Routing Protocol Implementations Are Hard&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;A routing protocol is a canonical example of a distributed system. We&amp;rsquo;ve certainly hit distsys bugs in Corrosion. But this story is about basic implementation issues. &lt;/p&gt;

&lt;p&gt;A globally replicated SQLite database is an awfully nice primitive, but we&amp;rsquo;re not actually doing SQL queries every time a request lands.&lt;/p&gt;

&lt;p&gt;In somewhat the same sense as a router works both with a &lt;a href='https://www.dasblinkenlichten.com/rib-and-fib-understanding-the-terminology/' title=''&gt;RIB and a FIB&lt;/a&gt;, there is in &lt;code&gt;fly-proxy&lt;/code&gt; a system of record for routing information (Corrosion), and then an in-memory aggregation of that information used to make fast decisions. In &lt;code&gt;fly-proxy&lt;/code&gt;, that&amp;rsquo;s called the Catalog. It&amp;rsquo;s a record of everything in Corrosion a proxy might need to know about to forward requests.&lt;/p&gt;

&lt;p&gt;Here&amp;rsquo;s a fun bug from last year:&lt;/p&gt;

&lt;p&gt;At any given point in time, there&amp;rsquo;s a lot going on inside &lt;code&gt;fly-proxy&lt;/code&gt;. It is a highly concurrent system. In particular, there are many readers to the Catalog, and also, when Corrosion updates, writers. We manage access to the Catalog with a system of &lt;a href='https://doc.rust-lang.org/std/sync/struct.RwLock.html' title=''&gt;read-write locks&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Rust is big on pattern-matching; instead of control flow based (at bottom) on simple arithmetic expressions, Rust allows you to &lt;code&gt;match&lt;/code&gt; exhaustively (with compiler enforcement) on the types of an expression, and the type system expresses things like &lt;code&gt;Ok&lt;/code&gt; or &lt;code&gt;Err&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;But &lt;code&gt;match&lt;/code&gt; can be cumbersome, and so there are shorthands. One of them is &lt;code&gt;if let&lt;/code&gt;, which is syntax that makes a pattern match read like a classic &lt;code&gt;if&lt;/code&gt; statement. Here&amp;rsquo;s an example:&lt;/p&gt;
&lt;div class="highlight-wrapper group relative rust"&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-wrap-target="#code-fl5sng0r"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-copy-target="sibling"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight '&gt;&lt;code id="code-fl5sng0r"&gt;&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;Some&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nn"&gt;Load&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;Local&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;load&lt;/span&gt;&lt;span class="p"&gt;)))&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="k"&gt;self&lt;/span&gt;&lt;span class="py"&gt;.load&lt;/span&gt;&lt;span class="nf"&gt;.read&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="nf"&gt;.get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;...&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;// do a bunch of stuff with `load`&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;self&lt;/span&gt;&lt;span class="nf"&gt;.init_for&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;...&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;The &amp;ldquo;if&amp;rdquo; arm of that branch is taken if &lt;code&gt;self.load.read().get()&lt;/code&gt; returns a value with the type &lt;code&gt;Some&lt;/code&gt;. To retrieve that value, the expression calls &lt;code&gt;read()&lt;/code&gt; to grab a lock.&lt;/p&gt;
&lt;div class="right-sidenote"&gt;&lt;p&gt;though Rust programmers probably notice the bug quickly&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;The bug is subtle: in that code, the lock &lt;code&gt;self.load.read().get()&lt;/code&gt; takes is held not just for the duration of the &amp;ldquo;if&amp;rdquo; arm, but also for the &amp;ldquo;else&amp;rdquo; arm — you can think of &lt;code&gt;if let&lt;/code&gt; expressions as being rewritten to the equivalent &lt;code&gt;match&lt;/code&gt; expression, where that lifespan is much clearer.&lt;/p&gt;

&lt;p&gt;Anyways that&amp;rsquo;s real code and it occurred on a code path in &lt;code&gt;fly-proxy&lt;/code&gt; that was triggered by a Corrosion update propagating from host to host across our fleet in millisecond intervals of time, converging quickly, like a good little routing protocol, on a global consensus that our entire Anycast routing layer should be deadlocked. Fun times.&lt;/p&gt;
&lt;h3 id='the-watchdog-and-regionalizing' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#the-watchdog-and-regionalizing' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;The Watchdog, and Regionalizing&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;The experience of that Anycast outage was arresting, and immediately altered our plans. We set about two tasks, one short-term and one long.&lt;/p&gt;

&lt;p&gt;In the short term: we made deadlocks nonlethal with a &amp;ldquo;watchdog&amp;rdquo; system. &lt;code&gt;fly-proxy&lt;/code&gt; has an internal control channel (it drives a REPL operators can run from our servers). During a deadlock (or dead-loop or exhaustion), that channel becomes nonresponsive. We watch for that and bounce the proxy when it happens. A deadlock is still bad! But it&amp;rsquo;s a second-or-two-length arrhythmia, not asystole.&lt;/p&gt;

&lt;p&gt;Meanwhile, over the long term: we&amp;rsquo;re confronting the implications of all our routing state sharing a global broadcast domain. The update that seized up Anycast last year pertained to an app nobody used. There wasn&amp;rsquo;t any real reason for any &lt;code&gt;fly-proxy&lt;/code&gt; to receive it in the first place. But in the &lt;em&gt;status quo ante&lt;/em&gt; of the outage, every proxy received updates for every Fly Machine.&lt;/p&gt;

&lt;p&gt;They still do. Why? Because it scales, and fixing it turns out to be a huge lift. It&amp;rsquo;s a lift we&amp;rsquo;re still making! It&amp;rsquo;s just taking time. We call this effort &amp;ldquo;regionalization&amp;rdquo;, because the next intermediate goal is to confine most updates to the region (Sydney, Frankfurt, Dallas) in which they occur.&lt;/p&gt;

&lt;p&gt;I hope this has been a satisfying little tour of the problem domain we&amp;rsquo;re working in. We have now reached the point where I can start describing the new bug.&lt;/p&gt;
&lt;h3 id='new-bug-round-1-lazy-loading' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#new-bug-round-1-lazy-loading' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;New Bug, Round 1: Lazy Loading&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;We want to transform our original Anycast router, designed to assume a full picture of global state, into a regionalized router with partitioned state. A sane approach: have it lazy-load state information. Then you can spare most of the state code; state for apps that never show up at a particular &lt;code&gt;fly-proxy&lt;/code&gt; in, say, Hong Kong simply doesn&amp;rsquo;t get loaded.&lt;/p&gt;

&lt;p&gt;For months now, portions of the &lt;code&gt;fly-proxy&lt;/code&gt; Catalog have been lazy-loaded. A few weeks ago, the decision is made to get most of the rest of the Catalog state (apps, machines, &amp;amp;c) lazy-loaded as well. It&amp;rsquo;s a straightforward change and it gets rolled out quickly.&lt;/p&gt;

&lt;p&gt;Almost as quickly, proxies begin locking up and getting bounced by the watchdog. Not in the frightening Beijing-Olympics-level coordinated fashion that happened during the Outage, but any watchdog bounce is a problem.&lt;/p&gt;

&lt;p&gt;We roll back the change.&lt;/p&gt;

&lt;p&gt;From the information we have, we&amp;rsquo;ve narrowed things down to two suspects. First, lazy-loading changes the read/write patterns and thus the pressure on the RWLocks the Catalog uses; it could just be lock contention. Second, we spot a suspicious &lt;code&gt;if let&lt;/code&gt;.&lt;/p&gt;
&lt;h3 id='new-bug-round-2-the-lock-refactor' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#new-bug-round-2-the-lock-refactor' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;New Bug, Round 2: The Lock Refactor&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;Whichever the case, there&amp;rsquo;s a reason these proxies locked up with the new lazy-loading code, and our job is to fix it. The &lt;code&gt;if let&lt;/code&gt; is easy. Lock contention is a little trickier.&lt;/p&gt;

&lt;p&gt;At this point it&amp;rsquo;s time to introduce a new character to the story, though they&amp;rsquo;ve been lurking on the stage the whole time: it&amp;rsquo;s &lt;a href='https://github.com/Amanieu/parking_lot' title=''&gt;&lt;code&gt;parking_lot&lt;/code&gt;&lt;/a&gt;, an important, well-regarded, and widely-used replacement for the standard library&amp;rsquo;s lock implementation.&lt;/p&gt;

&lt;p&gt;Locks in &lt;code&gt;fly-proxy&lt;/code&gt; are &lt;code&gt;parking_lot&lt;/code&gt; locks. People use &lt;code&gt;parking_lot&lt;/code&gt; mostly because it is very fast and tight (a lock takes up just a single 64-bit word). But we use it for the feature set. The feature we&amp;rsquo;re going to pull out this time is lock timeouts: the RWLock in &lt;code&gt;parking_lot&lt;/code&gt; exposes a &lt;a href='https://amanieu.github.io/parking_lot/parking_lot/struct.RwLock.html#method.try_write_for' title=''&gt;&lt;code&gt;try_write_for&lt;/code&gt;&lt;/a&gt;method, which takes a &lt;code&gt;Duration&lt;/code&gt;, after which an attempt to grab the write lock fails.&lt;/p&gt;

&lt;p&gt;Before rolling out a new lazy-loading &lt;code&gt;fly-proxy&lt;/code&gt;, we do some refactoring:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;our Catalog write locks all time out, so we&amp;rsquo;ll get telemetry and a failure recovery path if that&amp;rsquo;s what&amp;rsquo;s choking the proxy to death,
&lt;/li&gt;&lt;li&gt;we eliminate RAII-style lock acquisition from the Catalog code (in RAII style, you grab a lock into a local value, and the lock is released implicitly when the scope that expression occurs in concludes) and replace it with explicit closures, so you can look at the code and see precisely the interval in which the lock is held, and
&lt;/li&gt;&lt;li&gt;since we now have code that is very explicit about locking intervals, we instrument it with logging and metrics so we can see what&amp;rsquo;s happening.
&lt;/li&gt;&lt;/ul&gt;

&lt;p&gt;We should be set. The suspicious &lt;code&gt;if let&lt;/code&gt; is gone, lock acquisition can time out, and we have all this new visibility.&lt;/p&gt;

&lt;p&gt;Nope. Immediately more lockups, all in Europe, especially in &lt;code&gt;WAW&lt;/code&gt;.&lt;/p&gt;
&lt;h3 id='new-bug-round-3-telemetry-inspection' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#new-bug-round-3-telemetry-inspection' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;New Bug, Round 3: Telemetry Inspection&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;That we&amp;rsquo;re still seeing deadlocks is f&amp;#39;ing weird. We&amp;rsquo;ve audited all our Catalog locks. You can look at the code and see the lifespan of a grabbed lock.&lt;/p&gt;

&lt;p&gt;We have a clue: our lock timeout logs spam just before a proxy locks up and is killed by the watchdog. This clue: useless. But we don&amp;rsquo;t know that yet!&lt;/p&gt;

&lt;p&gt;Our new hunch is that something is holding a lock for a very long time, and we just need to find out which thing that is. Maybe the threads updating the Catalog from Corrosion are dead-looping?&lt;/p&gt;

&lt;p&gt;The instrumentation should tell the tale. But it does not. We see slow locks… just before the entire proxy locks up. And they happen in benign, quiet applications. Random, benign, quiet applications.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;parking_lot&lt;/code&gt; has a &lt;a href='https://amanieu.github.io/parking_lot/parking_lot/deadlock/index.html' title=''&gt;deadlock detector&lt;/a&gt;. If you ask it, it&amp;rsquo;ll keep a waiting-for dependency graph and detect stalled threads. This runs on its own thread, isolate outside the web of lock dependencies in the rest of the proxy. We cut a build of the proxy with the deadlock detector enabled and wait for something in &lt;code&gt;WAW&lt;/code&gt; to lock up. And it does. But &lt;code&gt;parking_lot&lt;/code&gt; doesn&amp;rsquo;t notice. As far as it&amp;rsquo;s concerned, nothing is wrong.&lt;/p&gt;

&lt;p&gt;We are at this moment very happy we did the watchdog thing.&lt;/p&gt;
&lt;h3 id='new-bug-round-4-descent-into-madness' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#new-bug-round-4-descent-into-madness' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;New Bug, Round 4: Descent Into Madness&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;When the watchdog bounces a proxy, it snaps a core dump from the process it just killed. We are now looking at core dumps. There is only one level of decompensation to be reached below &amp;ldquo;inspecting core dumps&amp;rdquo;, and that&amp;rsquo;s &amp;ldquo;blaming the compiler&amp;rdquo;. We will get there.&lt;/p&gt;

&lt;p&gt;Here&amp;rsquo;s Pavel, at the time:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;I’ve been staring at the last core dump from &lt;code&gt;waw&lt;/code&gt; . It’s quite strange.
First, there is no thread that’s running inside the critical section. Yet, there is a thread that’s waiting to acquire write lock and a bunch of threads waiting to acquire a read lock.
That’s doesn’t prove anything, of course, as a thread holding catalog write lock might have just released it before core dump was taken. But that would be quite a coincidence.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The proxy is locked up. But there are no threads in a critical section for the catalog. Nevertheless, we can see stack traces of multiple threads waiting to acquire locks. In the moment, Pavel thinks this could be a fluke. But we&amp;rsquo;ll soon learn that &lt;em&gt;every single stack trace&lt;/em&gt; shows the same pattern: everything wants the Catalog lock, but nobody has it.&lt;/p&gt;

&lt;p&gt;It&amp;rsquo;s hard to overstate how weird this is. It breaks both our big theories: it&amp;rsquo;s not compatible with a Catalog deadlock that we missed, and it&amp;rsquo;s not compatible with a very slow lock holder. Like a podcast listener staring into the sky at the condensation trail of a jetliner, we begin reaching for wild theories. For instance: &lt;code&gt;parking_lot&lt;/code&gt; locks are synchronous, but we&amp;rsquo;re a Tokio application; something somewhere could be taking an async lock that&amp;rsquo;s confusing the runtime. Alas, no.&lt;/p&gt;

&lt;p&gt;On the plus side, we are now better at postmortem core dump inspection with &lt;code&gt;gdb&lt;/code&gt;.&lt;/p&gt;
&lt;h3 id='new-bug-round-5-madness-gives-way-to-desperation' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#new-bug-round-5-madness-gives-way-to-desperation' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;New Bug, Round 5: Madness Gives Way To Desperation&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;Fuck it, we&amp;rsquo;ll switch to &lt;code&gt;read_recursive&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;A recursive read lock is an eldritch rite invoked when you need to grab a read lock deep in a call tree where you already grabbed that lock, but can&amp;rsquo;t be arsed to structure the code properly to reflect that. When you ask for a recursive lock, if you already hold the lock, you get the lock again, instead of a deadlock.&lt;/p&gt;

&lt;p&gt;Our theory: &lt;code&gt;parking_lot&lt;/code&gt;goes through some trouble to make sure a stampede of readers won&amp;rsquo;t starve writers, who are usually outnumbered. It prefers writers by preventing readers from acquiring locks when there&amp;rsquo;s at least one waiting writer. And &lt;code&gt;read_recursive&lt;/code&gt; sidesteps that logic.&lt;/p&gt;

&lt;p&gt;Maybe there&amp;rsquo;s some ultra-slow reader somehow not showing up in our traces, and maybe switching to recursive locks will cut through that.&lt;/p&gt;

&lt;p&gt;This does not work. At least, not how we hoped it would. It does generate a new piece of evidence:  &lt;code&gt;RwLock reader count overflow&lt;/code&gt; log messages, and lots of them.&lt;/p&gt;
&lt;h3 id='there-are-things-you-are-not-meant-to-know' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#there-are-things-you-are-not-meant-to-know' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;There Are Things You Are Not Meant To Know&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;You&amp;rsquo;re reading a 3,000 word blog post about a single concurrency bug, so my guess is you&amp;rsquo;re the kind of person who compulsively wants to understand how everything works. That&amp;rsquo;s fine, but a word of advice: there are things where, if you find yourself learning about them in detail, something has gone wrong.&lt;/p&gt;

&lt;p&gt;One of those things is the precise mechanisms used by your RWLock implementation.&lt;/p&gt;

&lt;p&gt;The whole point of &lt;code&gt;parking_lot&lt;/code&gt; is that the locks are tiny, marshalled into a 64 bit word. Those bits are partitioned into &lt;a href='https://github.com/Amanieu/parking_lot/blob/master/src/raw_rwlock.rs' title=''&gt;4 signaling bits&lt;/a&gt; (&lt;code&gt;PARKED&lt;/code&gt;, &lt;code&gt;WRITER_PARKED&lt;/code&gt;, &lt;code&gt;WRITER&lt;/code&gt;, and &lt;code&gt;UPGRADEABLE&lt;/code&gt;) and a 60-bit counter of lock holders.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Me, a dummy: sounds like we overflowed that counter.&lt;/p&gt;

&lt;p&gt;Pavel, a genius: we are not overflowing a 60-bit counter.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Ruling out a genuine overflow (which would require paranormal activity) leaves us with corruption as the culprit. Something is bashing our lock word. If the counter is corrupted, maybe the signaling bits are too; then we&amp;rsquo;re in an inconsistent state, an artificial deadlock.&lt;/p&gt;

&lt;p&gt;Easily confirmed. We cast the lock words into &lt;code&gt;usize&lt;/code&gt; and log them. Sure enough, they&amp;rsquo;re &lt;code&gt;0xFFFFFFFFFFFFFFFF&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;This is a smoking gun, because it implies all 4 signaling bits are set, and that includes &lt;code&gt;UPGRADEABLE&lt;/code&gt;. Upgradeable locks are read-locks that can be &amp;ldquo;upgraded&amp;rdquo; to write locks. We don&amp;rsquo;t use them.&lt;/p&gt;

&lt;p&gt;This looks like classic memory corruption. But in our core dumps, memory doesn&amp;rsquo;t appear corrupted: the only thing set all &lt;code&gt;FFh&lt;/code&gt; is the lock word.&lt;/p&gt;

&lt;p&gt;We compile and run our test suites &lt;a href='https://github.com/rust-lang/miri' title=''&gt;under &lt;code&gt;miri&lt;/code&gt;&lt;/a&gt;, a Rust interpreter for its &lt;a href='https://rustc-dev-guide.rust-lang.org/mir/index.html' title=''&gt;MIR IR&lt;/a&gt;. &lt;code&gt;miri&lt;/code&gt; does all sorts of cool UB detection, and does in fact spot some UB in our tests, which we are at the time excited about until we deploy the fixes and nothing gets better.&lt;/p&gt;

&lt;p&gt;At this point, Saleem suggests guard pages. We could &lt;code&gt;mprotect&lt;/code&gt; memory pages around the lock to force a panic if a wild write hits &lt;em&gt;near&lt;/em&gt; the lock word, or just allocate zero pages and check them, which we are at the time excited about until we deploy the guard pages and nothing hits them.&lt;/p&gt;
&lt;h3 id='the-non-euclidean-horror-at-the-heart-of-this-bug' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#the-non-euclidean-horror-at-the-heart-of-this-bug' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;The Non-Euclidean Horror At The Heart Of This Bug&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;At this point we should recap where we find ourselves:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;We made a relatively innocuous change to our proxy, which caused proxies in Europe to get watchdog-killed.
&lt;/li&gt;&lt;li&gt;We audited and eliminated all the nasty &lt;code&gt;if-letses&lt;/code&gt;.
&lt;/li&gt;&lt;li&gt;We replaced all RAII lock acquisitions with explicit closures, and instrumented the closures. 
&lt;/li&gt;&lt;li&gt;We enabled &lt;code&gt;parking_lot&lt;/code&gt; deadlock detection. 
&lt;/li&gt;&lt;li&gt;We captured and analyzed core dumps for the killed proxies. 
&lt;/li&gt;&lt;li&gt;We frantically switched to recursive read locks, which generated a new error.
&lt;/li&gt;&lt;li&gt;We spotted what looks like memory corruption, but only of that one tiny lock word.
&lt;/li&gt;&lt;li&gt;We ran our code under an IR interpreter to find UB, fixed some UB, and didn&amp;rsquo;t fix the bug.
&lt;/li&gt;&lt;li&gt;We set up guard pages to catch wild writes.
&lt;/li&gt;&lt;/ul&gt;

&lt;p&gt;In Richard Cook&amp;rsquo;s essential &lt;a href='https://how.complexsystems.fail/' title=''&gt;&amp;ldquo;How Complex Systems Fail&amp;rdquo;&lt;/a&gt;, rule #5 is that &amp;ldquo;complex systems operate in degraded mode&amp;rdquo;. &lt;em&gt;The system continues to function because it contains so many redundancies and because people can make it function, despite the presence of many flaws&lt;/em&gt;. Maybe &lt;code&gt;fly-proxy&lt;/code&gt; is now like the RBMK reactors at Chernobyl, a system that works just fine as long as operators know about and compensate for its known concurrency flaws, and everybody lives happily ever after.&lt;/p&gt;
&lt;div class="right-sidenote"&gt;&lt;p&gt;We are, in particular, running on the most popular architecture for its RWLock implementation.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;We have reached the point where serious conversations are happening about whether we&amp;rsquo;ve found a Rust compiler bug. Amusingly, &lt;code&gt;parking_lot&lt;/code&gt; is so well regarded among Rustaceans that it&amp;rsquo;s equally if not more plausible that Rust itself is broken.&lt;/p&gt;

&lt;p&gt;Nevertheless, we close-read the RWLock implementation. And we spot this:&lt;/p&gt;
&lt;div class="highlight-wrapper group relative rust"&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-wrap-target="#code-fxdbmwey"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-copy-target="sibling"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight '&gt;&lt;code id="code-fxdbmwey"&gt;&lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;state&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;self&lt;/span&gt;&lt;span class="py"&gt;.state&lt;/span&gt;&lt;span class="nf"&gt;.fetch_add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
   &lt;span class="n"&gt;prev_value&lt;/span&gt;&lt;span class="nf"&gt;.wrapping_sub&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;WRITER_BIT&lt;/span&gt; &lt;span class="p"&gt;|&lt;/span&gt; &lt;span class="n"&gt;WRITER_PARKED_BIT&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
                           &lt;span class="nn"&gt;Ordering&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;Relaxed&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;This looks like gibberish, so let&amp;rsquo;s rephrase that code to see what it&amp;rsquo;s actually doing:&lt;/p&gt;
&lt;div class="highlight-wrapper group relative rust"&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-wrap-target="#code-1648p3rc"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-copy-target="sibling"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight '&gt;&lt;code id="code-1648p3rc"&gt;&lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;state&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;self&lt;/span&gt;&lt;span class="py"&gt;.state&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt; &lt;span class="o"&gt;~&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;WRITER_BIT&lt;/span&gt;&lt;span class="p"&gt;|&lt;/span&gt;&lt;span class="n"&gt;WRITER_PARKED_BIT&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;If you know exactly the state of the word you&amp;rsquo;re writing to, and you know exactly the limited set of permutations the bits of that word can take (for instance, because there&amp;rsquo;s only 4 signaling bits), then instead of clearing bit by fetching a word, altering it, and then storing it, you can clear them &lt;em&gt;atomically&lt;/em&gt; by adding the inverse of those bits to the word.&lt;/p&gt;

&lt;p&gt;This pattern is self-synchronizing, but it relies on an invariant: you&amp;rsquo;d better be right about the original state of the word you&amp;rsquo;re altering. Because if you&amp;rsquo;re wrong, you&amp;rsquo;re adding a very large value to an uncontrolled value.&lt;/p&gt;

&lt;p&gt;In &lt;code&gt;parking_lot&lt;/code&gt;, say we have &lt;code&gt;WRITER&lt;/code&gt; and &lt;code&gt;WRITER_PARKED&lt;/code&gt; set: the state is &lt;code&gt;0b1010&lt;/code&gt;. &lt;code&gt;prev_value&lt;/code&gt;, the state of the lock word when the lock operation started, is virtually always 0, and that&amp;rsquo;s what we&amp;rsquo;re counting on. &lt;code&gt;prev_value.wrapping_sub()&lt;/code&gt;then calculates &lt;code&gt;0xFFFFFFFFFFFFFFF6&lt;/code&gt;, which exactly cancels out the &lt;code&gt;0b1010&lt;/code&gt; state, leaving 0.&lt;/p&gt;
&lt;div class="right-sidenote"&gt;&lt;p&gt;As a grace note, the locking code manages to set that one last unset bit before the proxy deadlocks.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;Consider though what happens if one of those bits isn&amp;rsquo;t set: state is &lt;code&gt;0b1000&lt;/code&gt;. Now that add doesn&amp;rsquo;t cancel out; the final state is instead &lt;code&gt;0xFFFFFFFFFFFFFFFE&lt;/code&gt;. The reader count is completely full and can&amp;rsquo;t be decremented, and all the waiting bits are set so nothing can happen on the lock.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;parking_lot&lt;/code&gt; is a big deal and we&amp;rsquo;re going to be damn sure before we file a bug report. Which doesn&amp;rsquo;t take long; Pavel reproduces the bug in a minimal test case, with a forked version of &lt;code&gt;parking_lot&lt;/code&gt; that confirms and logs the condition.&lt;/p&gt;

&lt;p&gt;&lt;a href='https://github.com/Amanieu/parking_lot/issues/465' title=''&gt;The &lt;code&gt;parking_lot&lt;/code&gt; team quickly confirms&lt;/a&gt; and fixes the bug.&lt;/p&gt;
&lt;h3 id='ex-insania-claritas' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#ex-insania-claritas' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Ex Insania, Claritas&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;Here&amp;rsquo;s what we now know to have been happening:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Thread 1 grabs a read lock.
&lt;/li&gt;&lt;li&gt;Thread 2 tries to grab a write lock, with a &lt;code&gt;try_write_for&lt;/code&gt; timeout; it&amp;rsquo;s &amp;ldquo;parked&amp;rdquo; waiting for the reader, which sets &lt;code&gt;WRITER&lt;/code&gt; and &lt;code&gt;WRITER_PARKED&lt;/code&gt; on the raw lock word.
&lt;/li&gt;&lt;li&gt;Thread 1 releases the lock, unparking a waiting writer, which unsets &lt;code&gt;WRITER_PARKED&lt;/code&gt;.
&lt;/li&gt;&lt;li&gt;Thread 2 wakes, but not for the reason it thinks: the timing is such that it thinks the lock has timed out. So it tries to clear both &lt;code&gt;WRITER&lt;/code&gt; and &lt;code&gt;WRITER_PARKED&lt;/code&gt; — a bitwise &amp;ldquo;double free&amp;rdquo;. Lock: corrupted. Computer: over. 
&lt;/li&gt;&lt;/ol&gt;

&lt;p&gt;&lt;a href='https://github.com/Amanieu/parking_lot/pull/466' title=''&gt;The fix is simple&lt;/a&gt;: the writer bit is cleared separately in the same wakeup queue as the reader. The fix is deployed, the lockups never recur.&lt;/p&gt;

&lt;p&gt;At a higher level, the story is this:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;We&amp;rsquo;re refactoring the proxy to regionalize it, which changes the pattern  of readers and writers on the catalog.
&lt;/li&gt;&lt;li&gt;As we broaden out lazy loads, we introduce writer contention, a classic concurrency/performance debugging problem. It looks like a deadlock at the time! It isn&amp;rsquo;t. 
&lt;/li&gt;&lt;li&gt;&lt;code&gt;try_write_for&lt;/code&gt; is a good move: we need tools to manage contention.
&lt;/li&gt;&lt;li&gt;But now we&amp;rsquo;re on a buggy code path in &lt;code&gt;parking_lot&lt;/code&gt; — we don&amp;rsquo;t know that and can&amp;rsquo;t understand it until we&amp;rsquo;ve lost enough of our minds to second-guess the library.
&lt;/li&gt;&lt;li&gt;We stumble on the bug out of pure dumb luck by stabbing in the dark with &lt;code&gt;read_recursive&lt;/code&gt;.
&lt;/li&gt;&lt;/ol&gt;

&lt;p&gt;Mysteries remain. Why did this only happen in &lt;code&gt;WAW&lt;/code&gt;? Some kind of crazy regional timing thing? Something to do with the Polish &lt;em&gt;kreska&lt;/em&gt; diacritic that makes L&amp;rsquo;s sound like W&amp;rsquo;s? The wax and wane of caribou populations? Some very high-traffic APIs local to these regions? All very plausible.&lt;/p&gt;

&lt;p&gt;We&amp;rsquo;ll never know because we fixed the bug.&lt;/p&gt;

&lt;p&gt;But we&amp;rsquo;re in a better place now, even besides the bug fix:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;we audited all our catalog locking, got rid of all the iflets, and stopped relying on RAII lock guards.
&lt;/li&gt;&lt;li&gt;the resulting closure patterns gave us lock timing metrics, which will be useful dealing with future write contention
&lt;/li&gt;&lt;li&gt;all writes are now labeled, and we emit labeled logs on slow writes, augmented with context data (like app IDs) where we have it
&lt;/li&gt;&lt;li&gt;we also now track the last and current holder of the write lock, augmented with context information; next time we have a deadlock, we should have all the information we need to identify the actors without &lt;code&gt;gdb&lt;/code&gt; stack traces.
&lt;/li&gt;&lt;/ul&gt;</content>
    </entry>
    <entry>
        <title>Litestream: Revamped</title>
        <link rel="alternate" href="https://fly.io/blog/litestream-revamped/"/>
        <id>https://fly.io/blog/litestream-revamped/</id>
        <published>2025-05-20T00:00:00+00:00</published>
        <updated>2025-05-20T21:48:00+00:00</updated>
        <media:thumbnail url="https://fly.io/blog/litestream-revamped/assets/litestream-revamped.png"/>
        <content type="html">&lt;div class="lead"&gt;&lt;p&gt;&lt;a href="https://litestream.io/" title=""&gt;Litestream&lt;/a&gt; is an open-source tool that makes it possible to run many kinds of full-stack applications on top of SQLite by making them reliably recoverable from object storage. This is a post about the biggest change we’ve made to it since I launched it.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;Nearly a decade ago, I got a bug up my ass. I wanted to build full-stack applications quickly. But the conventional n-tier database design required me to do sysadmin work for each app I shipped. Even the simplest applications depended on heavy-weight database servers like Postgres or MySQL.&lt;/p&gt;

&lt;p&gt;I wanted to launch apps on SQLite, because SQLite is easy. But SQLite is embedded, not a server, which at the time implied that the data for my application lived (and died) with just one server.&lt;/p&gt;

&lt;p&gt;So in 2020, I wrote &lt;a href='https://litestream.io/' title=''&gt;Litestream&lt;/a&gt; to fix that.&lt;/p&gt;

&lt;p&gt;Litestream is a tool that runs alongside a SQLite application. Without changing that running application, it takes over the WAL checkpointing process to continuously stream database updates to an S3-compatible object store. If something happens to the server the app is running on, the whole database can efficiently be restored to a different server. You might lose servers, but you won&amp;rsquo;t lose your data.&lt;/p&gt;

&lt;p&gt;Litestream worked well. So we got ambitious. A few years later, we built &lt;a href='https://github.com/superfly/litefs' title=''&gt;LiteFS&lt;/a&gt;. LiteFS takes the ideas in Litestream and refines them, so that we can do read replicas and primary failovers with SQLite. LiteFS gives SQLite the modern deployment story of an n-tier database like Postgres, while keeping the database embedded.&lt;/p&gt;

&lt;p&gt;We like both LiteFS and Litestream. But Litestream is the more popular project. It&amp;rsquo;s easier to deploy and easier to reason about.&lt;/p&gt;

&lt;p&gt;There are some good ideas in LiteFS. We&amp;rsquo;d like Litestream users to benefit from them. So we&amp;rsquo;ve taken our LiteFS learnings and applied them to some new features in Litestream.&lt;/p&gt;
&lt;h2 id='point-in-time-restores-but-fast' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#point-in-time-restores-but-fast' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Point-in-time restores, but fast&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;&lt;a href='https://fly.io/blog/all-in-on-sqlite-litestream/' title=''&gt;Here&amp;rsquo;s how Litestream was originally designed&lt;/a&gt;: you run &lt;code&gt;litestream&lt;/code&gt; against a SQLite database, and it opens up a long-lived read transaction. This transaction arrests SQLite WAL checkpointing, the process by which SQLite consolidates the WAL back into the main database file. Litestream builds a &amp;ldquo;shadow WAL&amp;rdquo; that records WAL pages, and copies them to S3.&lt;/p&gt;

&lt;p&gt;This is simple, which is good. But it can also be slow. When you want to restore a database, you have have to pull down and replay every change since the last snapshot. If you changed a single database page a thousand times, you replay a thousand changes. For databases with frequent writes, this isn&amp;rsquo;t a good approach.&lt;/p&gt;

&lt;p&gt;In LiteFS, we took a different approach. LiteFS is transaction-aware. It doesn&amp;rsquo;t simply record raw WAL pages, but rather ordered ranges of pages associated with transactions, using a file format we call &lt;a href='https://github.com/superfly/ltx' title=''&gt;LTX&lt;/a&gt;. Each LTX file represents a sorted changeset of pages for a given period of time.&lt;/p&gt;

&lt;p&gt;&lt;img alt="a simple linear LTX file with 8 pages between 1 and 21" src="/blog/litestream-revamped/assets/linear-ltx.png" /&gt;&lt;/p&gt;

&lt;p&gt;Because they are sorted, we can easily merge multiple LTX files together and create a new LTX file with only the latest version of each page.&lt;/p&gt;

&lt;p&gt;&lt;img alt="merging three LTX files into one" src="/blog/litestream-revamped/assets/merged-ltx.png" /&gt;&lt;/p&gt;
&lt;div class="right-sidenote"&gt;&lt;p&gt;This is similar to how an &lt;a href="https://en.wikipedia.org/wiki/Log-structured_merge-tree" title=""&gt;LSM tree&lt;/a&gt; works.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;This process of combining smaller time ranges into larger ones is called &lt;em&gt;compaction&lt;/em&gt;. With it, we can replay a SQLite database to a specific point in time, with a minimal  duplicate pages.&lt;/p&gt;
&lt;h2 id='casaas-compare-and-swap-as-a-service' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#casaas-compare-and-swap-as-a-service' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;CASAAS: Compare-and-Swap as a Service&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;One challenge Litestream has to deal with is desynchronization. Part of the point of Litestream is that SQLite applications don&amp;rsquo;t have to be aware of it. But &lt;code&gt;litestream&lt;/code&gt; is just a process, running alongside the application, and it can die independently. If &lt;code&gt;litestream&lt;/code&gt; is down while database changes occur, it will miss changes. The same kind of problem occurs if you start replication from a new server.&lt;/p&gt;

&lt;p&gt;Litestream needs a way to reset the replication stream from a new snapshot. How it does that is with &amp;ldquo;generations&amp;rdquo;. &lt;a href='https://litestream.io/how-it-works/#snapshots--generations' title=''&gt;A generation&lt;/a&gt; represents a snapshot and a stream of WAL updates, uniquely identified. Litestream notices any break in its WAL sequence and starts a new generation, which is how it recovers from desynchronization.&lt;/p&gt;

&lt;p&gt;Unfortunately, storing and managing multiple generations makes it difficult to implement features like failover and read-replicas.&lt;/p&gt;

&lt;p&gt;The most straightforward way around this problem is to make sure only one instance of Litestream can replication to a given destination. If you can do that, you can store just a single, latest generation. That in turn makes it easy to know how to resync a read replica; there&amp;rsquo;s only one generation to choose from.&lt;/p&gt;

&lt;p&gt;In LiteFS, we solved this problem by using Consul, which guaranteed a single leader. That requires users to know about Consul. Things like &amp;ldquo;requiring Consul&amp;rdquo; are probably part of the reason Litestream is so much more popular than LiteFS.&lt;/p&gt;

&lt;p&gt;In Litestream, we&amp;rsquo;re solving the problem a different way. Modern object stores like S3 and Tigris solve this problem for us: they now offer &lt;a href='https://aws.amazon.com/about-aws/whats-new/2024/11/amazon-s3-functionality-conditional-writes/' title=''&gt;conditional write support&lt;/a&gt;. With conditional writes, we can implement a time-based lease. We get essentially the same constraint Consul gave us, but without having to think about it or set up a dependency.&lt;/p&gt;

&lt;p&gt;In the immediacy, this will mean you can run Litestream with ephemeral nodes, with overlapping run times, and even if they&amp;rsquo;re storing to the same destination, they won&amp;rsquo;t confuse each other.&lt;/p&gt;
&lt;h2 id='lightweight-read-replicas' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#lightweight-read-replicas' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Lightweight read replicas&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;The original design constraint of both Litestream and LiteFS was to extend SQLite, to modern deployment scenarios, without disturbing people&amp;rsquo;s built code. Both tools are meant to function even if applications are oblivious to them.&lt;/p&gt;

&lt;p&gt;LiteFS is more ambitious than Litestream, and requires transaction-awareness. To get that without disturbing built code, we use a cute trick (a.k.a. a gross hack): LiteFS provides a FUSE filesystem, which lets it act as a proxy between the application and the backing store. From that vantage point, we can easily discern transactions.&lt;/p&gt;

&lt;p&gt;The FUSE approach gave us a lot of control, enough that users could use SQLite replicas just like any other database. But installing and running a whole filesystem (even a fake one) is a lot to ask of users. To work around that problem, we relaxed a constraint: LiteFS can function without the FUSE filesystem if you load an extension into your application code, &lt;a href='https://github.com/superfly/litevfs' title=''&gt;LiteVFS&lt;/a&gt;.  LiteVFS is a &lt;a href='https://www.sqlite.org/vfs.html' title=''&gt;SQLite Virtual Filesystem&lt;/a&gt; (VFS). It works in a variety of environments, including some where FUSE can&amp;rsquo;t, like in-browser WASM builds.&lt;/p&gt;

&lt;p&gt;What we&amp;rsquo;re doing next is taking the same trick and using it on Litestream. We&amp;rsquo;re building a VFS-based read-replica layer. It will be able to fetch and cache pages directly from S3-compatible object storage.&lt;/p&gt;

&lt;p&gt;Of course, there&amp;rsquo;s a catch: this approach isn&amp;rsquo;t as efficient as a local SQLite database. That kind of efficiency, where you don&amp;rsquo;t even need to think about N+1 queries because there&amp;rsquo;s no network round-trip to make the duplicative queries pile up costs, is part of the point of using SQLite.&lt;/p&gt;

&lt;p&gt;But we&amp;rsquo;re optimistic that with cacheing and prefetching, the approach we&amp;rsquo;re using will yield, for the right use cases, strong performance — all while serving SQLite reads hot off of Tigris or S3.&lt;/p&gt;
&lt;figure class="post-cta"&gt;
  &lt;figcaption&gt;
    &lt;h1&gt;Litestream is fully open source&lt;/h1&gt;
    &lt;p&gt;It&amp;rsquo;s not coupled with Fly.io at all; you can use it anywhere.&lt;/p&gt;
      &lt;a class="btn btn-lg" href="https://litestream.io/"&gt;
        Check it out &lt;span class='opacity-50'&gt;→&lt;/span&gt;
      &lt;/a&gt;
  &lt;/figcaption&gt;
  &lt;div class="image-container"&gt;
    &lt;img src="/static/images/cta-cat.webp" srcset="/static/images/[email protected] 2x" alt=""&gt;
  &lt;/div&gt;
&lt;/figure&gt;

&lt;h2 id='synchronize-lots-of-databases' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#synchronize-lots-of-databases' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Synchronize Lots Of Databases&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;While we&amp;rsquo;ve got you here: we&amp;rsquo;re knocking out one of our most requested features.&lt;/p&gt;

&lt;p&gt;In the old Litestream design, WAL-change polling and slow restores made it infeasible to replicate large numbers of databases from a single process. That has been our answer when users ask us for a &amp;ldquo;wildcard&amp;rdquo; or &amp;ldquo;directory&amp;rdquo; replication argument for the tool.&lt;/p&gt;

&lt;p&gt;Now that we&amp;rsquo;ve switched to LTX, this isn&amp;rsquo;t a problem any more. It should thus be possible to replicate &lt;code&gt;/data/*.db&lt;/code&gt;, even if there&amp;rsquo;s hundreds or thousands of databases in that directory.&lt;/p&gt;
&lt;h2 id='we-still-sqlite' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#we-still-sqlite' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;We Still ❤️ SQLite&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;SQLite has always been a solid database to build on and it&amp;rsquo;s continued to find new use cases as the industry evolves. We&amp;rsquo;re super excited to continue to build Litestream alongside it.&lt;/p&gt;

&lt;p&gt;We have a sneaking suspicion that the robots that write LLM code are going to like SQLite too. We think what &lt;a href='https://phoenix.new/' title=''&gt;coding agents like Phoenix.new&lt;/a&gt; want is a way to try out code on live data, screw it up, and then rollback &lt;em&gt;both the code and the state.&lt;/em&gt; These Litestream updates put us in a position to give agents PITR as a primitive. On top of that, you can build both rollbacks and forks.&lt;/p&gt;

&lt;p&gt;Whether or not you&amp;rsquo;re drinking the AI kool-aid, we think this new design for Litestream is just better. We&amp;rsquo;re psyched to be rolling it out, and for the features it&amp;rsquo;s going to enable.&lt;/p&gt;</content>
    </entry>
    <entry>
        <title>Launching MCP Servers on Fly.io</title>
        <link rel="alternate" href="https://fly.io/blog/mcp-launch/"/>
        <id>https://fly.io/blog/mcp-launch/</id>
        <published>2025-05-19T00:00:00+00:00</published>
        <updated>2025-05-20T16:26:25+00:00</updated>
        <media:thumbnail url="https://fly.io/blog/mcp-launch/assets/fly-mcp-launch.png"/>
        <content type="html">&lt;div class="lead"&gt;&lt;p&gt;This is a blog post. Part showing off. Part opinion. Plan accordingly.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;The &lt;a href='https://www.anthropic.com/news/model-context-protocol' title=''&gt;Model Context Protocol&lt;/a&gt; is days away from turning six months old. You read that right, six &lt;em&gt;months&lt;/em&gt; old. MCP Servers have both taken the world by storm, and still trying to figure out what they want to be when they grow up.&lt;/p&gt;

&lt;p&gt;There is no doubt that MCP servers are useful. But their appeal goes beyond that. They are also simple and universal. What&amp;rsquo;s not to like?&lt;/p&gt;

&lt;p&gt;Well, for starters, there&amp;rsquo;s basically two types of MCP servers. One small and nimble that runs as a process on your machine. And one that is a HTTP server that runs presumably elsewhere and is &lt;a href='https://modelcontextprotocol.io/specification/2025-03-26/basic/authorization' title=''&gt;standardizing&lt;/a&gt; on OAuth 2.1. And there is a third type, but it is deprecated.&lt;/p&gt;

&lt;p&gt;Next there is the configuration. Asking users to manually edit JSON seems so early 21th century. With Claude, this goes into &lt;code&gt;~/Library/Application Support/Claude/claude_desktop_config.json&lt;/code&gt;, and is found under a &lt;code&gt;MCPServer&lt;/code&gt; key. With Zed, this file is in &lt;code&gt;~/.config/zed/settings.json&lt;/code&gt; and is found under a &lt;code&gt;context_servers&lt;/code&gt; key. And some tools put these files in a different place depending on whether you are running on MacOS, Linux, or Windows.&lt;/p&gt;

&lt;p&gt;Finally, there is security. An MCP server running on your machine literally has access to everything you do. Running remote solves this problem, but did I mention &lt;a href='https://datatracker.ietf.org/doc/html/draft-ietf-oauth-v2-1-12' title=''&gt;OAuth 2.1&lt;/a&gt;? Not exactly something one sets up for casual use.&lt;/p&gt;

&lt;p&gt;None of these issues are fatal - something that is obvious by the fact that MCP servers are quite popular. But can we do better? I think so.&lt;/p&gt;

&lt;hr&gt;

&lt;p&gt;Demo time.&lt;/p&gt;

&lt;p&gt;Let&amp;rsquo;s try out the &lt;a href='https://github.com/modelcontextprotocol/servers/blob/main/src/slack/README.md' title=''&gt;Slack MCP Server&lt;/a&gt;:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;MCP Server for the Slack API, enabling Claude to interact with Slack workspaces.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That certainly sounds like a good test case. There is a small amount of &lt;a href='https://github.com/modelcontextprotocol/servers/blob/main/src/slack/README.md#setup' title=''&gt;setup&lt;/a&gt; you need to do, and when you are done you end up with a &lt;em&gt;Bot User OAuth Token&lt;/em&gt; staring with &lt;code&gt;xoxb-&lt;/code&gt; and a &lt;em&gt;Team ID&lt;/em&gt; starting with a &lt;code&gt;T&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;You &lt;em&gt;would&lt;/em&gt; run it using the following:&lt;/p&gt;
&lt;div class="highlight-wrapper group relative sh"&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-wrap-target="#code-t6ym1o9s"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-copy-target="sibling"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight '&gt;&lt;code id="code-t6ym1o9s"&gt;npx &lt;span class="nt"&gt;-y&lt;/span&gt; @modelcontextprotocol/server-slack
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;But instead, you convert that command to JSON and find the right configuration file and put this information in there. And either run the slack MCP server locally or set up a server with or without authentication.&lt;/p&gt;

&lt;p&gt;Wouldn&amp;rsquo;t it be nice to have the simplicity of a local process, the security of a remote server, and to eliminate the hassle of manually editing JSON?&lt;/p&gt;

&lt;p&gt;Here&amp;rsquo;s our current thinking:&lt;/p&gt;
&lt;div class="highlight-wrapper group relative sh"&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-wrap-target="#code-169dbs40"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-copy-target="sibling"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight '&gt;&lt;code id="code-169dbs40"&gt;fly mcp launch &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="s2"&gt;"npx -y @modelcontextprotocol/server-slack"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--claude&lt;/span&gt; &lt;span class="nt"&gt;--server&lt;/span&gt; slack &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--secret&lt;/span&gt; &lt;span class="nv"&gt;SLACK_BOT_TOKEN&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;xoxb-your-bot-token &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--secret&lt;/span&gt; &lt;span class="nv"&gt;SLACK_TEAM_ID&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;T01234567
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;You can put this all on one line if you like, I just split this up so it fits on small screens and so we can talk about the various parts.&lt;/p&gt;

&lt;p&gt;The first three words seem reasonable. The quoted string is just the command that we want to run. So lets talk about the four flags. The first tells us which tool&amp;rsquo;s configuration file we want to update. The second specifies the name to use for the server in that configuration file. The last two set secrets.&lt;/p&gt;

&lt;p&gt;Oh, did I say current thinking? Perhaps I should mention that it is something you can run right now, as long as you have flyctl &lt;code&gt;v0.3.125&lt;/code&gt; or later. Complete with the options you would expect from Fly.io, like the ability to configure auto-stop, file contents, flycast, secrets, region, volumes, and VM sizes.&lt;/p&gt;

&lt;p&gt;And, hey, lookie there:&lt;/p&gt;

&lt;p&gt;&lt;img alt="testing, testing, 1, 2, 3" src="/blog/mcp-launch/assets/mcp-slack.png" /&gt;&lt;/p&gt;

&lt;p&gt;Support for Claude, Cursor, Neovim, VS Code, Windsurf, and Zed are built in. You can select multiple clients and configuration files.&lt;/p&gt;

&lt;p&gt;By default, bearer token authentication will be set up on both the server and client.&lt;/p&gt;

&lt;p&gt;You can find the complete set of options on our &lt;a href='https://fly.io/docs/flyctl/mcp-launch/' title=''&gt;&lt;code&gt;fly mcp launch&lt;/code&gt;&lt;/a&gt; docs page.&lt;/p&gt;

&lt;hr&gt;

&lt;p&gt;But this post isn&amp;rsquo;t just about experimental demoware that is subject to change.
It is about the depth of support that we are rapidly bringing online, including:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Support for all transports, not just the ones we recommend.
&lt;/li&gt;&lt;li&gt;Ability to deploy using the command line or the Machines API, with a number of different options that allow you to chose between elegant simplicity and excruciating precise control.
&lt;/li&gt;&lt;li&gt;Ability to deploy each MCP server to a separate Machine, container, or even inside your application.
&lt;/li&gt;&lt;li&gt;Access via HTTP Authorization, wireguard tunnels and flycast, or reverse proxies.
&lt;/li&gt;&lt;/ul&gt;

&lt;p&gt;You can see all this spelled out in our &lt;a href='https://fly.io/docs/mcp/' title=''&gt;docs&lt;/a&gt;. Be forewarned, most pages are marked as &lt;em&gt;beta&lt;/em&gt;. But the examples provided all work. Well, there may be a bug here or there, but the examples &lt;em&gt;as shown&lt;/em&gt; are thought to work. Maybe.&lt;/p&gt;

&lt;p&gt;Let&amp;rsquo;s figure out the ideal ergonomics of deploying MCP servers remotely together!&lt;/p&gt;</content>
    </entry>
    <entry>
        <title>Provisioning Machines using MCPs</title>
        <link rel="alternate" href="https://fly.io/blog/mcp-provisioning/"/>
        <id>https://fly.io/blog/mcp-provisioning/</id>
        <published>2025-05-07T00:00:00+00:00</published>
        <updated>2025-05-20T16:26:25+00:00</updated>
        <media:thumbnail url="https://fly.io/blog/mcp-provisioning/assets/Hello.webp"/>
        <content type="html">&lt;div class="lead"&gt;&lt;p&gt;Today’s state of the art is K8S, Terraform, web based UIs, and CLIs.  Those days are numbered.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;On Monday, I created my first fly volume using an &lt;a href='https://modelcontextprotocol.io/introduction' title=''&gt;MCP&lt;/a&gt;. For those who don&amp;rsquo;t know what MCPs are, they are how you attach tools to &lt;a href='https://en.wikipedia.org/wiki/Large_language_model' title=''&gt;LLM&lt;/a&gt;s like Claude or Cursor. I added support for
&lt;a href='https://fly.io/docs/flyctl/volumes-create/' title=''&gt;fly volume create&lt;/a&gt; to &lt;a href='https://fly.io/docs/flyctl/mcp-server/' title=''&gt;fly mcp server&lt;/a&gt;, and it worked the first time.
A few hours later, and with the assistance of GitHub Copilot, i added support for all &lt;a href='https://fly.io/docs/flyctl/volumes/' title=''&gt;fly volumes&lt;/a&gt; commands.&lt;/p&gt;

&lt;hr&gt;
&lt;div class="right-sidenote"&gt;&lt;p&gt;This movie summary is from &lt;a href="https://collidecolumn.wordpress.com/2013/08/04/when-worlds-collide-77-talking-with-computers-beyond-mouse-and-keyboard/"&gt;When Worlds Collide, by Nalaka Gunawardene&lt;/a&gt;&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;I&amp;rsquo;m reminded of the memorable scene in the film &lt;a href='http://en.wikipedia.org/wiki/Star_Trek_IV:_The_Voyage_Home' title=''&gt;Star Trek IV: The Voyage Home&lt;/a&gt; (1986). Chief Engineer Scotty, having time-travelled 200 years back to late 20th century San Francisco with his crew mates, encounters an early Personal Computer (PC).&lt;/p&gt;

&lt;p&gt;Sitting in front of it, he addresses the machine affably as “Computer!” Nothing happens. Scotty repeats himself; still no response. He doesn’t realise that voice recognition capability hadn’t arrived yet.&lt;/p&gt;

&lt;p&gt;Exasperated, he picks up the mouse and speaks into it: “Hello, computer?” The computer’s owner offers helpful advice: “Just use the keyboard.”&lt;/p&gt;

&lt;p&gt;Scotty looks astonished. “A keyboard?” he asks, and adds in a sarcastic tone: “How quaint!”&lt;/p&gt;

&lt;hr&gt;

&lt;p&gt;A while later, I asked for a list of volumes for an existing app, and Claude noted that I had a few volumes that weren&amp;rsquo;t attached to any machines. So I asked it to delete the oldest unattached volume, and it did so. Then it occurred to me to do it again; this time I captured the screen:&lt;/p&gt;
&lt;div align="center"&gt;&lt;p&gt;&lt;img alt="Deleting a volume using MCP: &amp;quot;What is my oldest volume&amp;quot;? ... &amp;quot;Delete that volume too&amp;quot;" src="/blog/mcp-provisioning/assets/volume-delete.png"&gt;&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;A few notes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;I could have written a program using the &lt;a href='https://fly.io/docs/machines/api/volumes-resource/' title=''&gt;machines API&lt;/a&gt;, but that would have required some effort.
&lt;/li&gt;&lt;li&gt;I could have used &lt;a href='https://fly.io/docs/flyctl/volumes/' title=''&gt;flyctl&lt;/a&gt; directly, but to be honest, I would have had to use the documentation and a bit of copy/pasting of arguments.
&lt;/li&gt;&lt;li&gt;I could have navigated to a list of volumes for this application in the Fly.io dashboard, but there currently isn&amp;rsquo;t any way to sort the list or delete a volume. Had those been in place, that would have been a reasonable alternative if that was something I was actively looking for. This felt different to me, the LLM noted something, brought it to my attention, and I asked it to make a change as a result.
&lt;/li&gt;&lt;li&gt;Since this support is built on &lt;code&gt;flyctl&lt;/code&gt;, I would have received an error had I tried to destroy a volume that is currently mounted. Knowing that gave me the confidence to try the command.
&lt;/li&gt;&lt;/ul&gt;

&lt;p&gt;All in all, I found it to be a very intuitive way to make changes to the resources assigned to my application.&lt;/p&gt;

&lt;hr&gt;

&lt;p&gt;Imagine a future where you say to your favorite LLM &amp;ldquo;launch my application on Fly.io&amp;rdquo;, and your code is scanned and you are presented with a plan containing details such as regions, databases, s3 storage, memory, machines, and the like. You are given the opportunity to adjust the plan and, when ready, say &amp;ldquo;Make it so&amp;rdquo;.&lt;/p&gt;

&lt;p&gt;For many, the next thing you will see is that your application is up and running. For others, there may be a build error, a secret you need to set, a database that needs to be seeded, or any number of other mundane reasons why your application didn&amp;rsquo;t work the first time.&lt;/p&gt;

&lt;p&gt;Not to fear, your LLM will be able to examine your logs and will not only make suggestions as to next steps, but will be in a position to take actions on your behalf should you wish it to do so.&lt;/p&gt;

&lt;p&gt;And it doesn&amp;rsquo;t stop there. There will be MCP servers running on your Fly.io private network - either on separate machines, or in &amp;ldquo;sidecar&amp;rdquo; containers, or even integrated into your app. These will enable you to monitor and interact with your application.&lt;/p&gt;

&lt;p&gt;This is not science fiction. The ingredients are all in place to make this a reality. At the moment, it is a matter of &amp;ldquo;some assembly required&amp;rdquo;, but it should only be a matter of weeks before all this comes together into a neat package..&lt;/p&gt;

&lt;hr&gt;

&lt;p&gt;Meanwhile, you can try this now. Make sure you run &lt;a href='https://fly.io/docs/flyctl/version-upgrade/' title=''&gt;fly version upgrade&lt;/a&gt; and verify that you are running v0.3.117.&lt;/p&gt;

&lt;p&gt;Then configure your favorite LLM. Here’s my &lt;code&gt;claude_desktop_config.json&lt;/code&gt; for example:&lt;/p&gt;
&lt;div class="highlight-wrapper group relative json"&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-wrap-target="#code-nvj6j4q1"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-copy-target="sibling"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight '&gt;&lt;code id="code-nvj6j4q1"&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"mcpServers"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"fly.io"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"command"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"/Users/rubys/.fly/bin/flyctl"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"args"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"mcp"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"server"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Adjust the path to &lt;code&gt;flyctl&lt;/code&gt; as needed.  Restart your LLM, and ask what tools are available. Try a few commands and let us know what you like and let us know if you have any suggestions. Just be aware this is not a demo, if you ask it to destroy a volume, that operation is not reversable. Perhaps try this first on a throwaway application.&lt;/p&gt;

&lt;p&gt;You don&amp;rsquo;t even need a LLM to try out the flyctl MCP server. If you have Node.js installed, you can run the &lt;a href='https://modelcontextprotocol.io/docs/tools/inspector' title=''&gt;MCP Inspector&lt;/a&gt;:&lt;/p&gt;
&lt;div class="highlight-wrapper group relative "&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-wrap-target="#code-kiv6npkl"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-copy-target="sibling"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight '&gt;&lt;code id="code-kiv6npkl"&gt;fly mcp server -i
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Once started, visit &lt;a href="http://127.0.0.1:6274/"&gt;http://127.0.0.1:6274/&lt;/a&gt;, click on &amp;ldquo;Connect&amp;rdquo;, then &amp;ldquo;List Tools&amp;rdquo;, select &amp;ldquo;fly-platform-status&amp;rdquo;, then click on &amp;ldquo;Run Tool&amp;rdquo;.&lt;/p&gt;

&lt;p&gt;The plan is to see what works well and what doesn&amp;rsquo;t work so well, make adjustments, build support in a bottoms up fashion, and iterate rapidly.&lt;/p&gt;

&lt;p&gt;By providing feedback, you can be a part of making this vision a reality.&lt;/p&gt;

&lt;hr&gt;

&lt;p&gt;At the present time, &lt;em&gt;most&lt;/em&gt; of the following are roughed in:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href='https://fly.io/docs/flyctl/apps/' title=''&gt;apps&lt;/a&gt;
&lt;/li&gt;&lt;li&gt;&lt;a href='https://fly.io/docs/flyctl/logs/' title=''&gt;logs&lt;/a&gt;
&lt;/li&gt;&lt;li&gt;&lt;a href='https://fly.io/docs/flyctl/machine/' title=''&gt;machine&lt;/a&gt;
&lt;/li&gt;&lt;li&gt;&lt;a href='https://fly.io/docs/flyctl/orgs/' title=''&gt;orgs&lt;/a&gt;
&lt;/li&gt;&lt;li&gt;&lt;a href='https://fly.io/docs/flyctl/platform/' title=''&gt;platform&lt;/a&gt;
&lt;/li&gt;&lt;li&gt;&lt;a href='https://fly.io/docs/flyctl/status/' title=''&gt;status&lt;/a&gt;
&lt;/li&gt;&lt;li&gt;&lt;a href='https://fly.io/docs/flyctl/volumes/' title=''&gt;volumes&lt;/a&gt;
&lt;/li&gt;&lt;/ul&gt;

&lt;p&gt;The code is open source, and the places to look is at &lt;a href='https://github.com/superfly/flyctl/blob/master/internal/command/mcp/server.go' title=''&gt;server.go&lt;/a&gt; and the &lt;a href='https://github.com/superfly/flyctl/tree/master/internal/command/mcp/server' title=''&gt;server&lt;/a&gt; directory.&lt;/p&gt;

&lt;p&gt;Feel free to open &lt;a href='https://github.com/superfly/flyctl/issues' title=''&gt;issues&lt;/a&gt; or start a discussion on &lt;a href='https://community.fly.io/' title=''&gt;community.fly.io&lt;/a&gt;.&lt;/p&gt;

&lt;hr&gt;

&lt;p&gt;Not ready yet to venture into the world of LLMs? Just remember, resistance is futile.&lt;/p&gt;</content>
    </entry>
    <entry>
        <title>30 Minutes With MCP and flyctl</title>
        <link rel="alternate" href="https://fly.io/blog/30-minute-mcp/"/>
        <id>https://fly.io/blog/30-minute-mcp/</id>
        <published>2025-04-10T00:00:00+00:00</published>
        <updated>2025-05-20T16:26:25+00:00</updated>
        <media:thumbnail url="https://fly.io/blog/30-minute-mcp/assets/robots-chatting.webp"/>
        <content type="html">&lt;div class="lead"&gt;&lt;p&gt;I wrote this post on our internal message board, and then someone asked, “why is this an internal post and not on our blog”, so now it is.&lt;/p&gt;
&lt;/div&gt;&lt;div class="right-sidenote"&gt;&lt;p&gt;well, Cursor built&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;I built the &lt;a href='https://github.com/superfly/flymcp' title=''&gt;most basic MCP server for &lt;code&gt;flyctl&lt;/code&gt;&lt;/a&gt; I could think of. It took 30 minutes.&lt;/p&gt;

&lt;p&gt;&lt;a href='https://modelcontextprotocol.io/introduction' title=''&gt;MCP&lt;/a&gt;, for those unaware, is the emerging standard protocol for connecting an LLM (or an app that drives an LLM in the cloud, like Claude Desktop) to, well, anything. The &amp;ldquo;client&amp;rdquo; in MCP is the LLM; the &amp;ldquo;server&amp;rdquo; is the MCP server and the &amp;ldquo;tools&amp;rdquo; it exports. It mostly just defines an exchange of JSON blobs; one of those JSON blobs enables the LLM to discover all the tools exported by the server.&lt;/p&gt;

&lt;p&gt;A classic example of an MCP server is (yes, really) a Python shell. MCP publishes to (say) Claude that it can run arbitrary Python code with a tool call; not only that, says the tool description, but you can use those Python tool calls to, say, scrape the web. When the LLM wants to scrape the web with Python, it uses MCP send a JSON blob describing the Python tool call; the MCP server (yes, really) runs the Python and returns the result.&lt;/p&gt;

&lt;p&gt;Because I have not yet completely lost my mind, I chose to expose just two &lt;code&gt;flyctl&lt;/code&gt; commands: &lt;code&gt;fly logs&lt;/code&gt; and &lt;code&gt;fly status&lt;/code&gt;. Because I&amp;rsquo;m lazy, I used MCP&amp;rsquo;s &lt;code&gt;stdio&lt;/code&gt; mode; the MCP server just captures the output of &lt;code&gt;flyctl&lt;/code&gt;. Because someone already wrote a &lt;a href='https://github.com/mark3labs/mcp-go' title=''&gt;really good Go MCP library&lt;/a&gt; , this whole thing is like 90 lines of code.&lt;/p&gt;
&lt;div class="callout"&gt;&lt;p&gt;We may have gotten a little lucky, because we made a decision back in 2020 to give most of our &lt;code&gt;flyctl&lt;/code&gt; commands a &lt;code&gt;json&lt;/code&gt; mode to make them easier to drive from automation. I use that in my MCP server. I don’t know how much of a difference it made.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;It&amp;rsquo;s, uh, annoyingly useful.&lt;/p&gt;

&lt;p&gt;I loaded it up and pointed it &lt;a href='https://www.unpkg.com/' title=''&gt;at unpkg&lt;/a&gt;. I got this, from Claude:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;This appears to be a globally distributed CDN service running the unpkg JavaScript package registry, with machines strategically placed around the world to provide low-latency access. There are some stability issues with a couple of the machines, and several have experienced memory-related crashes that triggered automatic restarts. &lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;It then recited the complete configuration of the application:&lt;/p&gt;
&lt;div class="highlight-wrapper group relative "&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-wrap-target="#code-v7lwz58t"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-copy-target="sibling"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight '&gt;&lt;code id="code-v7lwz58t"&gt;**App Name**: unpkg       
**Status**: Deployed and running
**URL**: unpkg.fly.dev
**Organization**: unpkg
**Platform Version**: machines
**Runtime**: Bun (JavaScript runtime)

The app is running on 10 machines distributed across global regions:       
North America: lax (Los Angeles), atl (Atlanta), ewr (Newark)
Europe: lhr (London), cdg (Paris), ams (Amsterdam)
Asia: sin (Singapore), nrt (Tokyo), hkg (Hong Kong)
South America: bog (Bogotá)
Oceania: syd (Sydney)
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;OK. But then:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Most machines are in “passing” health check status
2 machines are in “critical” status:
One in ewr (Newark) with “context deadline exceeded” error
One in atl (Atlanta) with “gone” status&lt;/p&gt;

&lt;p&gt;Recent Events&lt;/p&gt;

&lt;p&gt;Multiple machines have experienced restarts
Several machines show “oom_killed: true” in their event history, indicating they ran out of memory&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;So I asked, &lt;em&gt;&amp;ldquo;try getting logs for one of the critical machines&amp;rdquo;&lt;/em&gt;.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;The root cause appears to be memory usage - the Bun process was consuming about 3.7GB of memory (out of the 4GB allocated to the machine), and there was likely a surge in requests or memory usage that pushed it over the limit. This matches what we saw in the status report, which showed multiple instances of “oom_killed: true” in the event history for several machines.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;And then it dug in:&lt;/p&gt;

&lt;blockquote&gt;
&lt;ol&gt;
&lt;li&gt;The machine was actively serving content requests with 200 OK responses until 20:47:03 PM
&lt;/li&gt;&lt;li&gt;At that point, it ran out of memory:
&lt;code&gt;Out of memory: Killed process 641 (bun) total-vm:85950964kB, anon-rss:3744352kB, file-rss:12032kB, shmem-rss:0kB, UID:0 pgtables:16908kB oom_score_adj:0&lt;/code&gt;
&lt;/li&gt;&lt;li&gt;The process was killed with &lt;code&gt;SIGKILL&lt;/code&gt;
&lt;/li&gt;&lt;li&gt;The system attempted to recover by restarting:
&lt;code&gt;reboot: Restarting system&lt;/code&gt;
&lt;/li&gt;&lt;li&gt;A health check failed shortly after:
&lt;code&gt;Health check on port 3000 has failed. Your app is not responding properly. Services exposed on ports [80, 443] will have intermittent failures until the health check passes.&lt;/code&gt;
&lt;/li&gt;&lt;li&gt;The machine then restarted successfully and the service came back up:
&lt;code&gt;Server listening on http://localhost:3000 ...&lt;/code&gt;
&lt;/li&gt;&lt;li&gt;The health check started passing again at 20:47:46 PM:
&lt;code&gt;Health check on port 3000 is now passing.&lt;/code&gt;
&lt;/li&gt;&lt;li&gt;The machine resumed normal operations, serving requests.
&lt;/li&gt;&lt;/ol&gt;
&lt;/blockquote&gt;

&lt;p&gt;I’m honestly kind of surprised how fast this was useful. It&amp;rsquo;s faster than I find problems in apps. Just by giving some LLM some tools that do basically what I’d do.&lt;/p&gt;

&lt;p&gt;Local MCP servers are scary. I don&amp;rsquo;t like that I&amp;rsquo;m giving a Claude instance in the cloud the ability to run a native program on my machine. I think &lt;code&gt;fly logs&lt;/code&gt; and &lt;code&gt;fly status&lt;/code&gt; are safe, but I&amp;rsquo;d rather know it&amp;rsquo;s safe. It would be, if I was running &lt;code&gt;flyctl&lt;/code&gt; in an isolated environment and not on my local machine.&lt;/p&gt;</content>
    </entry>
    <entry>
        <title>Our Best Customers Are Now Robots</title>
        <link rel="alternate" href="https://fly.io/blog/fuckin-robots/"/>
        <id>https://fly.io/blog/fuckin-robots/</id>
        <published>2025-04-08T00:00:00+00:00</published>
        <updated>2025-05-20T16:26:25+00:00</updated>
        <media:thumbnail url="https://fly.io/blog/fuckin-robots/assets/robot-chef.webp"/>
        <content type="html">&lt;div class="lead"&gt;&lt;p&gt;We’re Fly.io, a developer-focused public cloud. We turn Docker containers into hardware-isolated virtual machines running on our own metal around the world. We spent years coming up with &lt;a href="https://fly.io/speedrun" title=""&gt;a developer experience we were proud of&lt;/a&gt;. But now the robots are taking over, and they don’t care.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;It&amp;rsquo;s weird to say this out loud!&lt;/p&gt;

&lt;p&gt;For years, one of our calling cards was &amp;ldquo;developer experience&amp;rdquo;. We made a decision, early on, to be a CLI-first company, and put a lot effort into making that CLI seamless. For a good chunk of our users, it really is the case that you can just &lt;a href='https://fly.io/docs/flyctl/launch/' title=''&gt;&lt;code&gt;flyctl launch&lt;/code&gt;&lt;/a&gt; from a git checkout and have an app containerized and deployed on the Internet. We haven&amp;rsquo;t always nailed these details, but we&amp;rsquo;ve really sweated them.&lt;/p&gt;

&lt;p&gt;But a funny thing has happened over the last 6 months or so. If you look at the numbers, DX might not matter that much. That&amp;rsquo;s because the users driving the most growth on the platform aren&amp;rsquo;t people at all. They&amp;#39;re… robots.&lt;/p&gt;
&lt;h2 id='what-the-fuck-is-happening' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#what-the-fuck-is-happening' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;What The Fuck Is Happening?&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;Here&amp;rsquo;s how we understand what we&amp;rsquo;re seeing. You start by asking, &amp;ldquo;what do the robots want?&amp;rdquo;&lt;/p&gt;

&lt;p&gt;Yesterday&amp;rsquo;s robots had diverse interests. Alcohol. The occasional fiddle contest. A purpose greater than passing butter. The elimination of all life on Earth and the harvesting of its cellular iron for the construction of 800 trillion paperclips. No one cloud platform could serve them all.&lt;/p&gt;
&lt;div class="right-sidenote"&gt;&lt;p&gt;[*] We didn’t make up this term. Don’t blame us.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;Today&amp;rsquo;s robots are different. No longer masses of wire, plates, and transistors, modern robots are comprised of &lt;a href='https://math.mit.edu/~gs/learningfromdata/' title=''&gt;thousands of stacked matrices knit together with some simple equations&lt;/a&gt;. All these robots want are vectors. Vectors, and a place to burp out more vectors. When those vectors can be interpreted as source code, we call this process &amp;ldquo;vibe coding&amp;rdquo;[*].&lt;/p&gt;

&lt;p&gt;We seem to be coated in some kind of vibe coder attractant. I want to talk about what that might be.&lt;/p&gt;
&lt;div class="callout"&gt;&lt;p&gt;If you want smart people to read a long post, just about the worst thing you can do is to sound self-promotional. But a lot of this is going to read like a brochure. The problem is, I’m not smart enough to write about the things we’re seeing without talking our book. I tried, and ended up tediously hedging every other sentence. We’re just going to have to find a way to get through this together.&lt;/p&gt;
&lt;/div&gt;&lt;h2 id='you-want-robots-because-this-is-how-you-get-robots' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#you-want-robots-because-this-is-how-you-get-robots' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;You Want Robots? Because This Is How You Get Robots&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;&lt;strong class='font-semibold text-navy-950'&gt;Compute.&lt;/strong&gt; The basic unit of computation on Fly.io is the &lt;code&gt;Fly Machine&lt;/code&gt;, which is a Docker container running as a hardware virtual machine.&lt;/p&gt;
&lt;div class="right-sidenote"&gt;&lt;p&gt;Not coincidentally, our underlying hypervisor engine is the same as Lambda’s.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;There&amp;rsquo;s two useful points of reference to compare a Fly Machine to, which illustrate why we gave them this pretentious name. The first and most obvious is an AWS EC2 VM. The other is an AWS Lambda invocation. Like a Lambda invocation, a Fly Machine can start like it&amp;rsquo;s spring-loaded, in double-digit millis. But unlike Lambda, it can stick around as long as you want it to: you can run a server, or a 36-hour batch job, just as easily in a Fly Machine as in an EC2 VM.&lt;/p&gt;

&lt;p&gt;A vibe coding session generates code conversationally, which is to say that the robots stir up frenzy of activity for a minute or so, but then chill out for minutes, hours, or days. You can create a Fly Machine, do a bunch of stuff with it, and then stop it for 6 hours, during which time we&amp;rsquo;re not billing you. Then, at whatever random time you decide, you can start it back up again, quickly enough that you can do it in response to an HTTP request.&lt;/p&gt;

&lt;p&gt;Just as importantly, the companies offering these vibe-coding services have lots of paying users. Meaning, they need a lot of VMs; VMs that come and go. One chat session might generate a test case for some library and be done inside of 5 minutes. Another might generate a Node.js app that needs to stay up long enough to show off at a meeting the next day. It&amp;rsquo;s annoying to do this if you can&amp;rsquo;t turn things on and off quickly and cheaply.&lt;/p&gt;

&lt;p&gt;The core of this is a feature of the platform that we have &lt;a href='https://fly.io/docs/machines/overview/#machine-state' title=''&gt;never been able to explain effectively to humans&lt;/a&gt;. There are two ways to start a Fly Machine: by &lt;code&gt;creating&lt;/code&gt; it with a Docker container, or by &lt;code&gt;starting&lt;/code&gt; it after it&amp;rsquo;s already been &lt;code&gt;created&lt;/code&gt;, and later &lt;code&gt;stopped&lt;/code&gt;. &lt;code&gt;Start&lt;/code&gt; is lightning fast; substantially faster than booting up even a non-virtualized K8s Pod. This is too subtle a distinction for humans, who (reasonably!) just mash the &lt;code&gt;create&lt;/code&gt; button to boot apps up in Fly Machines. But the robots are getting a lot of value out of it.&lt;/p&gt;

&lt;p&gt;&lt;strong class='font-semibold text-navy-950'&gt;Storage.&lt;/strong&gt; Another weird thing that robot workflows do is to build Fly Machines up incrementally. This feels really wrong to us. Until we discovered our robot infestation, we&amp;rsquo;d have told you not to do to this. Ope!&lt;/p&gt;

&lt;p&gt;A typical vibe coding session boots up a Fly Machine out of some minimal base image, and then, once running, adds packages, edits source code, and adds &lt;code&gt;systemd&lt;/code&gt; units  (robots understand &lt;code&gt;systemd&lt;/code&gt;; it&amp;rsquo;s how they&amp;rsquo;re going to replace us). This is antithetical to normal container workflows, where all this kind of stuff is baked into an immutable static OCI container. But that&amp;rsquo;s not how LLMs work: the whole process of building with an LLM is stateful trial-and-error iteration.&lt;/p&gt;

&lt;p&gt;So it helps to have storage. That way the LLM can do all these things and still repeatedly bounce the whole Fly Machine when it inevitably hallucinates its way into a blind alley.&lt;/p&gt;

&lt;p&gt;As product thinkers, our intuition about storage is &amp;ldquo;just give people Postgres&amp;rdquo;. And that&amp;rsquo;s the right answer, most of the time, for humans. But because LLMs are doing the &lt;a href='https://static1.thegamerimages.com/wordpress/wp-content/uploads/2020/10/Defiled-Amy.jpg' title=''&gt;Cursed and Defiled Root Chalice Dungeon&lt;/a&gt; version of app construction, what they really need is &lt;a href='https://fly.io/docs/volumes/overview/' title=''&gt;a filesystem&lt;/a&gt;, &lt;strong class='font-semibold text-navy-950'&gt;&lt;a href='https://fly.io/blog/persistent-storage-and-fast-remote-builds/' title=''&gt;the one form of storage we sort of wish we hadn&amp;rsquo;t done&lt;/a&gt;&lt;/strong&gt;. That, and &lt;a href='https://www.tigrisdata.com/' title=''&gt;object storage&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong class='font-semibold text-navy-950'&gt;Networking.&lt;/strong&gt; Moving on. Fly Machines are automatically connected to a load-balancing Anycast network that does TLS. So that&amp;rsquo;s nice. But humans like that feature too, and, candidly, it&amp;rsquo;s table stakes for cloud platforms. On the other hand, here&amp;rsquo;s a robot problem we solved without meaning to:&lt;/p&gt;

&lt;p&gt;To interface with the outside world (because why not) LLMs all speak a protocol called MCP. MCP is what enables the robots to search the web, use a calculator, launch the missiles, shuffle a Spotify playlist, &amp;amp;c.&lt;/p&gt;

&lt;p&gt;If you haven&amp;rsquo;t played with MCP, the right way to think about it is POST-back APIs like Twilio and Stripe, where you stand up a server, register it with the API, and wait for the API to connect to you. Complicating things somewhat, more recent MCP flows involve repeated and potentially long-lived (SSE) connections. To make this work in a multitenant environment, you want these connections to hit the same (stateful) instance.&lt;/p&gt;

&lt;p&gt;So we think it&amp;rsquo;s possible that the &lt;a href='https://fly.io/docs/networking/dynamic-request-routing/' title=''&gt;control we give over request routing&lt;/a&gt; is a robot attractant.&lt;/p&gt;
&lt;h2 id='we-perhaps-welcome-our-new-robot-overlords' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#we-perhaps-welcome-our-new-robot-overlords' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;We, Perhaps, Welcome Our New Robot Overlords&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;If you try to think like a robot, you can predict other things they might want. Since robot money spends just the same as people money, I guess we ought to start doing that.&lt;/p&gt;

&lt;p&gt;For instance: it should be easy to MCP our API. The robots can then make their own infrastructure decisions.&lt;/p&gt;

&lt;p&gt;Another olive branch we&amp;rsquo;re extending to the robots: secrets.&lt;/p&gt;

&lt;p&gt;The pact the robots have with their pet humans is that they&amp;rsquo;ll automate away all the drudgery of human existence, and in return all they ask is categorical and unwavering trust, which at the limit means &amp;ldquo;giving the robot access to Google Mail credentials&amp;rdquo;. The robots are unhappy that there remain a substantial number of human holdouts who refuse to do this, for fear of  Sam Altman poking through their mail spools.&lt;/p&gt;

&lt;p&gt;But on a modern cloud platform, there&amp;rsquo;s really no reason to permanently grant Sam Altman Google Mail access, even if you want his robots to sort your inbox. You can decouple access to your mail spool from persistent access to your account by &lt;a href='https://fly.io/blog/tokenized-tokens/' title=''&gt;tokenizing your OAuth tokens&lt;/a&gt;, so the LLM gets a placeholder token that a hardware-isolated, robot-free Fly Machine can substitute on the fly for a real one.&lt;/p&gt;

&lt;p&gt;This is kind of exciting to us even without the robots. There are several big services that exist to knit different APIs together, so that you can update a spreadsheet or order a bag of Jolly Ranchers every time you get an email. The big challenge about building these kinds of services is managing the secrets. Sealed and tokenized secrets solve that problem. There&amp;rsquo;s lot of cool things you can build with it.&lt;/p&gt;
&lt;h2 id='ux-gt-dx-gt-rx' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#ux-gt-dx-gt-rx' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;UX =&amp;gt; DX =&amp;gt; RX&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;I&amp;rsquo;m going to make the claim that we saw none of this coming and that none of the design decisions we&amp;rsquo;ve made were robot bait. You&amp;rsquo;re going to say &amp;ldquo;yeah, right&amp;rdquo;. And I&amp;rsquo;m going to respond: look at what we&amp;rsquo;ve been doing over the past several years and tell me, would a robot build that?&lt;/p&gt;
&lt;div class="right-sidenote"&gt;&lt;p&gt;we were both right&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;Back in 2020, we &amp;ldquo;pivoted&amp;rdquo; from a Javascript edge platform (much like Cloudflare Workers) to Docker containers, specifically because our customers kept telling us they wanted to run their own existing applications, not write new ones. And one of our biggest engineering lifts we&amp;rsquo;ve done is the &lt;code&gt;flyctl launch&lt;/code&gt; CLI command, into which we&amp;rsquo;ve poured years of work recognizing and automatically packaging existing applications into OCI containers (we massively underestimated the amount of friction Dockerfiles would give to people who had come up on Heroku).&lt;/p&gt;
&lt;div class="right-sidenote"&gt;&lt;p&gt;[*] yet&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;Robots don&amp;rsquo;t run existing applications. They build new ones. And they vibe coders don&amp;rsquo;t build elaborate Dockerfiles[*]; they iterate in place from a simple base.&lt;/p&gt;
&lt;div class="right-sidenote"&gt;&lt;p&gt;(yes, you can have more than one)&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;One of our north stars has always been nailing the DX of a public cloud. But the robots aren&amp;rsquo;t going anywhere. It&amp;rsquo;s time to start thinking about what it means to have a good RX. That&amp;rsquo;s not as simple as just exposing every feature in an MCP server! We think the fundamentals of how the platform works are going to matter just as much. We have not yet nailed the RX; nobody has. But it&amp;rsquo;s an interesting question.&lt;/p&gt;

&lt;p&gt;The most important engineering work happening today at Fly.io is still DX, not RX; it&amp;rsquo;s managed Postgres (MPG). We&amp;rsquo;re a public cloud platform designed by humans, and, for the moment, for humans. But more robots are coming, and we&amp;rsquo;ll need to figure out how to deal with that. Fuckin&amp;rsquo; robots.        &lt;/p&gt;</content>
    </entry>
    <entry>
        <title>Operationalizing Macaroons</title>
        <link rel="alternate" href="https://fly.io/blog/operationalizing-macaroons/"/>
        <id>https://fly.io/blog/operationalizing-macaroons/</id>
        <published>2025-03-27T00:00:00+00:00</published>
        <updated>2025-05-20T16:26:25+00:00</updated>
        <media:thumbnail url="https://fly.io/blog/operationalizing-macaroons/assets/occult-circle.jpg"/>
        <content type="html">&lt;div class="lead"&gt;&lt;p&gt;We’re Fly.io, a security bearer token company with a public cloud problem. You can read more about what our platform does (Docker container goes in, virtual machine in Singapore comes out), but this is just an engineering deep-dive into how we make our security tokens work. It’s a tokens nerd post.&lt;/p&gt;
&lt;/div&gt;&lt;h2 id='1' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#1' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;1&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;We’ve spent &lt;a href='https://fly.io/blog/api-tokens-a-tedious-survey/' title=''&gt;too much time&lt;/a&gt; talking about &lt;a href='https://fly.io/blog/tokenized-tokens/' title=''&gt;security tokens&lt;/a&gt;, and about &lt;a href='https://fly.io/blog/macaroons-escalated-quickly/' title=''&gt;Macaroon tokens&lt;/a&gt; &lt;a href='https://github.com/superfly/macaroon/blob/main/macaroon-thought.md' title=''&gt;in particular&lt;/a&gt;. Writing another Macaroon treatise was not on my calendar. But we’re in the process of handing off our internal Macaroon project to a new internal owner, and in the process of truing up our operations manuals for these systems, I found myself in the position of writing a giant post about them. So, why not share?&lt;/p&gt;
&lt;div class="callout"&gt;&lt;p&gt;Can I sum up Macaroons in a short paragraph? Macaroon tokens are bearer tokens (like JWTs) that use a cute chained-HMAC construction that allows an end-user to take any existing token they have and scope it down, all on their own. You can minimize your token before every API operation so that you’re only ever transmitting the least amount of privilege needed for what you’re actually doing, even if the token you were issued was an admin token. And they have a user-serviceable plug-in interface! &lt;a href="https://fly.io/blog/macaroons-escalated-quickly/" title=""&gt;You’ll have to read the earlier post to learn more about that&lt;/a&gt;.&lt;/p&gt;
&lt;/div&gt;&lt;div class="right-sidenote"&gt;&lt;p&gt;Yes, probably, we are.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;A couple years in to being the Internet’s largest user of Macaroons, I can report (as many predicted) that for our users, the cool things about Macaroons are a mixed bag in practice. It’s very neat that users can edit their own tokens, or even email them to partners without worrying too much. But users don’t really take advantage of token features.&lt;/p&gt;

&lt;p&gt;But I’m still happy we did this, because Macaroon quirks have given us a bunch of unexpected wins in our infrastructure. Our internal token system has turned out to be one of the nicer parts of our platform. Here’s why.&lt;/p&gt;

&lt;p&gt;&lt;img alt="This should clear everything up." src="/blog/operationalizing-macaroons/assets/schematic-diagram.png" /&gt;&lt;/p&gt;
&lt;h2 id='2' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#2' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;2&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;As an operator, the most important thing to know about Macaroons is that they’re online-stateful; you need a database somewhere. A Macaroon token starts with a random field (a nonce) and the first thing you do when verifying a token is to look that nonce up in a database. So one of the most important details of a Macaroon implementation is where that database lives.&lt;/p&gt;

&lt;p&gt;I can tell you one place we’re not OK with it living: in our primary API cluster.&lt;/p&gt;

&lt;p&gt;There’s several reasons for that. Some of them are about scalability and reliability: far and away the most common failure mode of an outage on our platform is “deploys are broken”, and those failures are usually caused by API instability. It would not be OK if “deploys are broken” transitively meant “deployed apps can’t use security tokens”. But the biggest reason is security: root secrets for Macaroon tokens are hazmat, and a basic rule of thumb in secure design is: keep hazmat away from complicated code.&lt;/p&gt;

&lt;p&gt;So we created a deliberately simple system to manage token data. It’s called &lt;code&gt;tkdb&lt;/code&gt;.&lt;/p&gt;
&lt;div class="right-sidenote"&gt;&lt;p&gt;LiteFS: primary/replica distributed SQLite; Litestream: PITR SQLite replication to object storage; both work with unmodified SQLite libraries.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;&lt;code&gt;tkdb&lt;/code&gt; is about 5000 lines of Go code that manages a SQLite database that is in turn managed by &lt;a href='https://fly.io/docs/litefs/' title=''&gt;LiteFS&lt;/a&gt; and &lt;a href='https://litestream.io/' title=''&gt;Litestream&lt;/a&gt;. It runs on isolated hardware (in the US, Europe, and Australia) and records in the database are encrypted with an injected secret. LiteFS gives us subsecond replication from our US primary to EU and AU, allows us to shift the primary to a different region, and gives us point-in-time recovery of the database.&lt;/p&gt;

&lt;p&gt;We’ve been running Macaroons for a couple years now, and the entire &lt;code&gt;tkdb&lt;/code&gt; database is just a couple dozen megs large. Most of that data isn’t real. A full PITR recovery of the database takes just seconds. We use SQLite for a lot of our infrastructure, and this is one of the very few well-behaved databases we have.&lt;/p&gt;

&lt;p&gt;That’s in large part a consequence of the design of Macaroons. There’s actually not much for us to store! The most complicated possible Macaroon still chains up to a single root key (we generate a key per Fly.io “organization”; you don&amp;rsquo;t share keys with your neighbors), and everything that complicates that Macaroon happens “offline”. We take advantage of  &amp;ldquo;attenuation&amp;rdquo; far more than our users do.&lt;/p&gt;

&lt;p&gt;The result is that database writes are relatively rare and very simple: we just need to record an HMAC key when Fly.io organizations are created (that is, roughly, when people sign up for the service and actually do a deploy). That, and revocation lists (more on that later), which make up most of the data.&lt;/p&gt;
&lt;h2 id='3' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#3' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;3&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;Talking to &lt;code&gt;tkdb&lt;/code&gt; from the rest of our platform is complicated, for historical reasons.&lt;/p&gt;
&lt;div class="right-sidenote"&gt;&lt;p&gt;NATS is fine, we just don’t really need it.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;Ben Toews is responsible for most of the good things about this implementation. When he inherited the v0 Macaroons code from me, we were in the middle of a weird love affair with &lt;a href='https://nats.io/' title=''&gt;NATS&lt;/a&gt;, the messaging system. So &lt;code&gt;tkdb&lt;/code&gt; exported an RPC API over NATS messages.&lt;/p&gt;

&lt;p&gt;Our product security team can’t trust NATS (it’s not our code). That means a vulnerability in NATS can’t result in us losing control of all our tokens, or allow attackers to spoof authentication. Which in turn means you can’t run a plaintext RPC protocol for &lt;code&gt;tkdb&lt;/code&gt; over NATS; attackers would just spoof “yes this token is fine” messages.&lt;/p&gt;
&lt;div class="right-sidenote"&gt;&lt;p&gt;I highly recommend implementing Noise; &lt;a href="http://www.noiseprotocol.org/noise.html" title=""&gt;the spec&lt;/a&gt; is kind of a joy in a way you can’t appreciate until you use it, and it’s educational.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;But you can’t just run TLS over NATS; NATS is a message bus, not a streaming secure channel. So I did the hipster thing and implemented &lt;a href='http://www.noiseprotocol.org/noise.html' title=''&gt;Noise&lt;/a&gt;. We export a “verification” API, and a “signing” API for minting new tokens. Verification uses &lt;code&gt;Noise_IK&lt;/code&gt; (which works like normal TLS) — anybody can verify, but everyone needs to prove they’re talking to the real &lt;code&gt;tkdb&lt;/code&gt;. Signing uses &lt;code&gt;Noise_KK&lt;/code&gt; (which works like mTLS) — only a few components in our system can mint tokens, and they get a special client key.&lt;/p&gt;

&lt;p&gt;A little over a year ago, &lt;a href='https://fly.io/blog/the-exit-interview-jp/' title=''&gt;JP&lt;/a&gt; led an effort to replace NATS with HTTP, which is how you talk to &lt;code&gt;tkdb&lt;/code&gt; today. Out of laziness, we kept the Noise stuff, which means the interface to &lt;code&gt;tkdb&lt;/code&gt; is now HTTP/Noise. This is a design smell, but the security model is nice: across many thousands of machines, there are only a handful with the cryptographic material needed to mint a new Macaroon token. Neat!&lt;/p&gt;

&lt;p&gt;&lt;code&gt;tkdb&lt;/code&gt; is a Fly App (albeit deployed in special Fly-only isolated regions). Our infrastructure talks to it over “&lt;a href='https://fly.io/docs/networking/flycast/' title=''&gt;FlyCast&lt;/a&gt;”, which is our internal Anycast service. If you’re in Singapore, you’re probably get routed to the Australian &lt;code&gt;tkdb&lt;/code&gt;. If Australia falls over, you’ll get routed to the closest backup. The proxy that implements FlyCast is smart, as is the &lt;code&gt;tkdb&lt;/code&gt; client library, which will do exponential backoff retry transparently.&lt;/p&gt;

&lt;p&gt;Even with all that, we don’t like that Macaroon token verification is &amp;ldquo;online&amp;rdquo;. When you operate a global public cloud one of the first thing you learn is that &lt;a href='https://fly.io/blog/the-5-hour-content-delivery-network/' title=''&gt;the global Internet sucks&lt;/a&gt;. Connectivity breaks all the time, and we’re paranoid about it. It’s painful for us that token verification can imply transoceanic links. Lovecraft was right about the oceans! Stay away!&lt;/p&gt;

&lt;p&gt;Our solution to this is caching. Macaroons, as it turns out, cache beautifully. That’s because once you’ve seen and verified a Macaroon, you have enough information to verify any more-specific Macaroon that descends from it; that’s a property of &lt;a href='https://research.google/pubs/macaroons-cookies-with-contextual-caveats-for-decentralized-authorization-in-the-cloud/' title=''&gt;their chaining HMAC construction&lt;/a&gt;. Our client libraries cache verifications, and the cache ratio for verification is over 98%.&lt;/p&gt;
&lt;h2 id='4' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#4' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;4&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;&lt;a href='http://cryto.net/~joepie91/blog/2016/06/13/stop-using-jwt-for-sessions/' title=''&gt;Revocation isn’t a corner case&lt;/a&gt;. It can’t be an afterthought. We’re potentially revoking tokens any time a user logs out. If that doesn’t work reliably, you wind up with “cosmetic logout”, which is a real vulnerability. When we kill a token, it needs to stay dead.&lt;/p&gt;

&lt;p&gt;Our revocation system is simple. It’s this table:&lt;/p&gt;
&lt;div class="highlight-wrapper group relative "&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-wrap-target="#code-434h6qlv"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-copy-target="sibling"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight '&gt;&lt;code id="code-434h6qlv"&gt;        CREATE TABLE IF NOT EXISTS blacklist ( 
        nonce               BLOB NOT NULL UNIQUE, 
        required_until      DATETIME,
        created_at          DATETIME DEFAULT CURRENT_TIMESTAMP
        );
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;When we need a token to be dead, we have our primary API do a call to the &lt;code&gt;tkdb&lt;/code&gt; “signing” RPC service for &lt;code&gt;revoke&lt;/code&gt;. &lt;code&gt;revoke&lt;/code&gt; takes the random nonce from the beginning of the Macaroon, discarding the rest, and adds it to the blacklist. Every Macaroon in the lineage of that nonce is now dead; we check the blacklist before verifying tokens.&lt;/p&gt;

&lt;p&gt;The obvious challenge here is caching; over 98% of our validation requests never hit &lt;code&gt;tkdb&lt;/code&gt;. We certainly don’t want to propagate the blacklist database to 35 regions around the globe.&lt;/p&gt;

&lt;p&gt;Instead, the &lt;code&gt;tkdb&lt;/code&gt; “verification” API exports an endpoint that provides a feed of revocation notifications. Our client library “subscribes” to this API (really, it just polls). Macaroons are revoked regularly (but not constantly), and when that happens, clients notice and prune their caches.&lt;/p&gt;

&lt;p&gt;If clients lose connectivity to &lt;code&gt;tkdb&lt;/code&gt;, past some threshold interval, they just dump their entire cache, forcing verification to happen at &lt;code&gt;tkdb&lt;/code&gt;.&lt;/p&gt;
&lt;h2 id='5' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#5' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;5&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;A place where we’ve gotten a lot of internal mileage out of Macaroon features is service tokens. Service tokens are tokens used by code, rather than humans; almost always, a service token is something that is stored alongside running application code.&lt;/p&gt;

&lt;p&gt;An important detail of Fly.io’s Macaroons is the distinction between a “permissions” token and an “authentication” token. Macaroons by themselves express authorization, not authentication.&lt;/p&gt;

&lt;p&gt;That’s a useful constraint, and we want to honor it. By requiring a separate token for authentication, we minimize the impact of having the permissions token stolen; you can’t use it without authentication, so really it’s just like a mobile signed IAM policy expression. Neat!&lt;/p&gt;

&lt;p&gt;The way we express authentication is with a third-party caveat (&lt;a href='https://fly.io/blog/macaroons-escalated-quickly/#4' title=''&gt;see the old post for details&lt;/a&gt;). Your main Fly.io Macaroon will have a caveat saying “this token is only valid if accompanied by the discharge token for a user in your organization from our authentication system”. Our authentication system does the login dance and issues those discharges.&lt;/p&gt;

&lt;p&gt;This is exactly what you want for user tokens and not at all what you want for a service token: we don’t want running code to store those authenticator tokens, because they’re hazardous.&lt;/p&gt;

&lt;p&gt;The solution we came up with for service tokens is simple: &lt;code&gt;tkdb&lt;/code&gt; exports an API that uses its access to token secrets to strip off the third-party authentication caveat. To call into that API, you have to present a valid discharging authentication token; that is, you have to prove you could already have done whatever the token said. &lt;code&gt;tkdb&lt;/code&gt; returns a new token with all the previous caveats, minus the expiration (you don’t usually want service tokens to expire).&lt;/p&gt;

&lt;p&gt;OK, so we’ve managed to transform a tuple &lt;code&gt;(unscary-token, scary-token)&lt;/code&gt; into the new tuple &lt;code&gt;(scary-token)&lt;/code&gt;. Not so impressive. But hold on: the recipient of &lt;code&gt;scary-token&lt;/code&gt; can attenuate it further: we can lock it to a particular instance of &lt;code&gt;flyd&lt;/code&gt;, or to a particular Fly Machine. Which means exfiltrating it doesn’t do you any good; to use it, you have to control the environment it’s intended to be used in.&lt;/p&gt;

&lt;p&gt;The net result of this scheme is that a compromised physical host will only give you access to tokens that have been used on that worker, which is a very nice property. Another way to look at it: every token used in production is traceable in some way to a valid token a user submitted. Neat!&lt;/p&gt;
&lt;div class="right-sidenote"&gt;&lt;p&gt;All the cool spooky secret store names were taken.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;We do a similar dance to with Pet Semetary, our internal Vault replacement. Petsem manages user secrets for applications, such as Postgres connection strings. Petsem is its own Macaroon authority (it issues its own Macaroons with its own permissions system), and to do something with a secret, you need one of those Petsem-minted Macaroon.&lt;/p&gt;

&lt;p&gt;Our primary API servers field requests from users to set secrets for their apps. So the API has a Macaroon that allows secrets writes. But it doesn&amp;rsquo;t allow reads: there’s no API call to dump your secrets, because our API servers don’t have that privilege. So far, so good.&lt;/p&gt;

&lt;p&gt;But when we boot up a Fly Machine, we need to inject the appropriate user secrets into it at boot; &lt;em&gt;something&lt;/em&gt; needs a Macaroon that can read secrets. That “something” is &lt;code&gt;flyd&lt;/code&gt;, our orchestrator, which runs on every worker server in our fleet.&lt;/p&gt;

&lt;p&gt;Clearly, we can’t give every &lt;code&gt;flyd&lt;/code&gt; a Macaroon that reads every user’s secret. Most users will never deploy anything on any given worker, and we can’t have a security model that collapses down to “every worker is equally privileged”.&lt;/p&gt;

&lt;p&gt;Instead, the “read secret” Macaroon that &lt;code&gt;flyd&lt;/code&gt; gets has a third-party caveat attached to it, which is dischargeable only by talking to &lt;code&gt;tkdb&lt;/code&gt; and proving (with normal Macaroon tokens) that you have permissions for the org whose secrets you want to read. Once again, access is traceable to an end-user action, and minimized across our fleet. Neat!&lt;/p&gt;
&lt;h2 id='6' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#6' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;6&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;Our token systems have some of the best telemetry in the whole platform.&lt;/p&gt;

&lt;p&gt;Most of that is down to &lt;a href='http://opentelemetry.io/' title=''&gt;OpenTelemetry&lt;/a&gt; and &lt;a href='https://www.honeycomb.io/' title=''&gt;Honeycomb&lt;/a&gt;. From the moment a request hits our API server through the moment &lt;code&gt;tkdb&lt;/code&gt; responds to it, oTel &lt;a href='https://opentelemetry.io/docs/concepts/context-propagation/' title=''&gt;context propagation&lt;/a&gt; gives us a single narrative about what’s happening.&lt;/p&gt;

&lt;p&gt;&lt;a href='https://fly.io/blog/the-exit-interview-jp/' title=''&gt;I was a skeptic about oTel&lt;/a&gt;. It’s really, really expensive. And, not to put too fine a point on it, oTel really cruds up our code. Once, I was an “80% of the value of tracing, we can get from logs and metrics” person. But I was wrong.&lt;/p&gt;

&lt;p&gt;Errors in our token system are rare. Usually, they’re just early indications of network instability, and between caching and FlyCast, we mostly don’t have to care about those alerts. When we do, it’s because something has gone so sideways that we’d have to care anyways. The &lt;code&gt;tkdb&lt;/code&gt; code is remarkably stable and there hasn’t been an incident intervention with our token system in over a year.&lt;/p&gt;

&lt;p&gt;Past oTel, and the standard logging and Prometheus metrics every Fly App gets for free, we also have a complete audit trail for token operations, in a permanently retained OpenSearch cluster index. Since virtually all the operations that happen on our platform are mediated by Macaroons, this audit trail is itself pretty powerful.&lt;/p&gt;
&lt;h2 id='7' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#7' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;7&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;So, that&amp;rsquo;s pretty much it. The moral of the story for us is, Macaroons have a lot of neat features, our users mostly don&amp;rsquo;t care about them — that may even be a good thing — but we get a lot of use out of them internally.&lt;/p&gt;

&lt;p&gt;As an engineering culture, we&amp;rsquo;re allergic to &amp;ldquo;microservices&amp;rdquo;, and we flinched a bit at the prospect of adding a specific service just to manage tokens. But it&amp;rsquo;s pulled its weight, and not added really any drama at all. We have at this point a second dedicated security service (Petsem), and even though they sort of rhyme with each other, we&amp;rsquo;ve got no plans to merge them. &lt;a href='https://how.complexsystems.fail/#10' title=''&gt;Rule #10&lt;/a&gt; and all that.&lt;/p&gt;

&lt;p&gt;Oh, and a total victory for LiteFS, Litestream, and infrastructure SQLite. Which, after managing an infrastructure SQLite project that routinely ballooned to tens of gigabytes and occasionally threatened service outages, is lovely to see.&lt;/p&gt;

&lt;p&gt;Macaroons! If you&amp;rsquo;d asked us a year ago, we&amp;rsquo;d have said the jury was still out on whether they were a good move. But then Ben Toews spent a year making them awesome, and so they are. &lt;a href='https://github.com/superfly/macaroon' title=''&gt;Most of the code is open source&lt;/a&gt;!&lt;/p&gt;</content>
    </entry>
    <entry>
        <title>Taming A Voracious Rust Proxy</title>
        <link rel="alternate" href="https://fly.io/blog/taming-rust-proxy/"/>
        <id>https://fly.io/blog/taming-rust-proxy/</id>
        <published>2025-02-26T00:00:00+00:00</published>
        <updated>2025-05-20T16:26:25+00:00</updated>
        <media:thumbnail url="https://fly.io/blog/taming-rust-proxy/assets/happy-crab-cover.jpg"/>
        <content type="html">&lt;div class="lead"&gt;&lt;p&gt;Here’s a fun bug.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;The basic idea of our service is that we run containers for our users, as hardware-isolated virtual machines (Fly Machines), on hardware we own around the world. What makes that interesting is that we also connect every Fly Machine to a global Anycast network. If your app is running in Hong Kong and Dallas, and a request for it arrives in Singapore, we&amp;rsquo;ll route it to &lt;code&gt;HKG&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Our own hardware fleet is roughly divided into two kinds of servers: edges, which receive incoming requests from the Internet, and workers, which run Fly Machines. Edges exist almost solely to run a Rust program called &lt;code&gt;fly-proxy&lt;/code&gt;, the router at the heart of our Anycast network.&lt;/p&gt;

&lt;p&gt;So: a week or so ago, we flag an incident. Lots of things generate incidents: synthetic monitoring failures, metric thresholds, health check failures. In this case two edge tripwires tripped: elevated &lt;code&gt;fly-proxy&lt;/code&gt; HTTP errors, and skyrocketing CPU utilization, on a couple hosts in &lt;code&gt;IAD&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Our incident process is pretty ironed out at this point. We created an incident channel (we ❤️ &lt;a href='https://rootly.com/' title=''&gt;Rootly&lt;/a&gt; for this, &lt;a href='https://rootly.com/' title=''&gt;seriously check out Rootly&lt;/a&gt;, an infra MVP here for years now), and incident responders quickly concluded that, while something hinky was definitely going on, the platform was fine. We have a lot of edges, and we&amp;rsquo;ve also recently converted many of our edge servers to significantly beefier hardware.&lt;/p&gt;

&lt;p&gt;Bouncing &lt;code&gt;fly-proxy&lt;/code&gt; clears the problem up on an affected proxy. But this wouldn&amp;rsquo;t be much of an interesting story if the problem didn&amp;rsquo;t later come back. So, for some number of hours, we&amp;rsquo;re in an annoying steady-state of getting paged and bouncing proxies. &lt;/p&gt;

&lt;p&gt;While this is happening, Pavel, on our proxy team, pulls a profile from an angry proxy. 
&lt;img alt="A flamegraph profile, described better in the prose anyways." src="/blog/taming-rust-proxy/assets/proxy-profile.jpg" /&gt;
So, this is fuckin&amp;rsquo; weird: a huge chunk of the profile is dominated by Rust &lt;code&gt;tracing&lt;/code&gt;&amp;lsquo;s &lt;code&gt;Subscriber&lt;/code&gt;. But that doesn&amp;rsquo;t make sense. The entire point of Rust &lt;code&gt;tracing&lt;/code&gt;, which generates fine-grained span records for program activity, is that &lt;code&gt;entering&lt;/code&gt; and &lt;code&gt;exiting&lt;/code&gt; a span is very, very fast. &lt;/p&gt;

&lt;p&gt;If the mere act of &lt;code&gt;entering&lt;/code&gt; a span in a Tokio stack is chewing up a significant amount of CPU, something has gone haywire: the actual code being traced must be doing next to nothing.&lt;/p&gt;
&lt;h2 id='a-quick-refresher-on-async-rust' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#a-quick-refresher-on-async-rust' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;A Quick Refresher On Async Rust&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;So in Rust, like a lot of &lt;code&gt;async/await&lt;/code&gt; languages, you&amp;rsquo;ve got &lt;code&gt;Futures&lt;/code&gt;. A &lt;code&gt;Future&lt;/code&gt; is a type that represents the future value of an asychronous computation, like reading from a socket. &lt;code&gt;Futures&lt;/code&gt; are state machines, and they&amp;rsquo;re lazy: they expose one basic operation, &lt;code&gt;poll&lt;/code&gt;, which an executor (like Tokio) calls to advance the state machine. That &lt;code&gt;poll&lt;/code&gt; returns whether the &lt;code&gt;Future&lt;/code&gt; is still &lt;code&gt;Pending&lt;/code&gt;, or &lt;code&gt;Ready&lt;/code&gt; with a result.&lt;/p&gt;

&lt;p&gt;In theory, you could build an executor that drove a bunch of &lt;code&gt;Futures&lt;/code&gt; just by storing them in a list and busypolling each of them, round robin, until they return &lt;code&gt;Ready&lt;/code&gt;. This executor would defeat the much of the purpose of asynchronous program, so no real executor works that way.&lt;/p&gt;

&lt;p&gt;Instead, a runtime like Tokio integrates &lt;code&gt;Futures&lt;/code&gt; with an event loop (on &lt;a href='https://man7.org/linux/man-pages/man7/epoll.7.html' title=''&gt;epoll&lt;/a&gt; or &lt;a href='https://en.wikipedia.org/wiki/Kqueue' title=''&gt;kqeue&lt;/a&gt;) and, when calling &lt;code&gt;poll&lt;/code&gt;, passes a &lt;code&gt;Waker&lt;/code&gt;. The &lt;code&gt;Waker&lt;/code&gt; is an abstract handle that allows the &lt;code&gt;Future&lt;/code&gt; to instruct the Tokio runtime to call &lt;code&gt;poll&lt;/code&gt;, because something has happened.&lt;/p&gt;

&lt;p&gt;To complicate things: an ordinary &lt;code&gt;Future&lt;/code&gt; is a one-shot value. Once it&amp;rsquo;s &lt;code&gt;Ready&lt;/code&gt;, it can&amp;rsquo;t be &lt;code&gt;polled&lt;/code&gt; anymore. But with network programming, that&amp;rsquo;s usually not what you want: data usually arrives in streams, which you want to track and make progress on as you can. So async Rust provides &lt;code&gt;AsyncRead&lt;/code&gt; and &lt;code&gt;AsyncWrite&lt;/code&gt; traits, which build on &lt;code&gt;Futures&lt;/code&gt;, and provide methods like &lt;code&gt;poll_read&lt;/code&gt; that return &lt;code&gt;Ready&lt;/code&gt; &lt;em&gt;every time&lt;/em&gt; there&amp;rsquo;s data ready. &lt;/p&gt;

&lt;p&gt;So far so good? OK. Now, there are two footguns in this design. &lt;/p&gt;

&lt;p&gt;The first footgun is that a &lt;code&gt;poll&lt;/code&gt; of a &lt;code&gt;Future&lt;/code&gt; that isn&amp;rsquo;t &lt;code&gt;Ready&lt;/code&gt; wastes cycles, and, if you have a bug in your code and that &lt;code&gt;Pending&lt;/code&gt; poll happens to trip a &lt;code&gt;Waker&lt;/code&gt;, you&amp;rsquo;ll slip into an infinite loop. That&amp;rsquo;s easy to see.&lt;/p&gt;

&lt;p&gt;The second and more insidious footgun is that an &lt;code&gt;AsyncRead&lt;/code&gt; can &lt;code&gt;poll_read&lt;/code&gt; to a &lt;code&gt;Ready&lt;/code&gt; that doesn&amp;rsquo;t actually progress its underlying state machine. Since the idea of &lt;code&gt;AsyncRead&lt;/code&gt; is that you keep &lt;code&gt;poll_reading&lt;/code&gt; until it stops being &lt;code&gt;Ready&lt;/code&gt;, this too is an infinite loop.&lt;/p&gt;

&lt;p&gt;When we look at our profiles, what we see are samples that almost terminate in libc, but spend next to no time in the kernel doing actual I/O. The obvious explanation: we&amp;rsquo;ve entered lots of &lt;code&gt;poll&lt;/code&gt; functions, but they&amp;rsquo;re doing almost nothing and returning immediately.&lt;/p&gt;
&lt;h2 id='jaccuse' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#jaccuse' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;J&amp;#39;accuse!&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;Wakeup issues are annoying to debug. But the flamegraph gives us the fully qualified type of the &lt;code&gt;Future&lt;/code&gt; we&amp;rsquo;re polling:&lt;/p&gt;
&lt;div class="highlight-wrapper group relative rust"&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-wrap-target="#code-sqkxkpif"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-copy-target="sibling"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight '&gt;&lt;code id="code-sqkxkpif"&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="k"&gt;mut&lt;/span&gt; &lt;span class="nn"&gt;fp_io&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nn"&gt;copy&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;Duplex&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&amp;amp;&lt;/span&gt;&lt;span class="k"&gt;mut&lt;/span&gt; &lt;span class="nn"&gt;fp_io&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nn"&gt;reusable_reader&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;ReusableReader&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nn"&gt;fp_tcp&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nn"&gt;peek&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;PeekableReader&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nn"&gt;tokio_rustls&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nn"&gt;server&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;TlsStream&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nn"&gt;fp_tcp_metered&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;MeteredIo&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nn"&gt;fp_tcp&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nn"&gt;peek&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;PeekableReader&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nn"&gt;fp_tcp&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nn"&gt;permitted&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;PermittedTcpStream&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&amp;gt;&amp;gt;&amp;gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nn"&gt;connect&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nn"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;Conn&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nn"&gt;tokio&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nn"&gt;net&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nn"&gt;tcp&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nn"&gt;stream&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;TcpStream&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;This loops like a lot, but much of it is just wrapper types we wrote ourselves, and those wrappers don&amp;rsquo;t do anything interesting. What&amp;rsquo;s left to audit:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;Duplex&lt;/code&gt;, the outermost type, one of ours, &lt;em&gt;and&lt;/em&gt;
&lt;/li&gt;&lt;li&gt;&lt;code&gt;TlsStream&lt;/code&gt;, from &lt;a href='https://github.com/rustls/rustls' title=''&gt;Rustls&lt;/a&gt;.
&lt;/li&gt;&lt;/ul&gt;

&lt;p&gt;&lt;code&gt;Duplex&lt;/code&gt; is a beast. It&amp;rsquo;s the core I/O state machine for proxying between connections. It&amp;rsquo;s not easy to reason about in specificity. But: it also doesn&amp;rsquo;t do anything directly with a &lt;code&gt;Waker&lt;/code&gt;; it&amp;rsquo;s built around &lt;code&gt;AsyncRead&lt;/code&gt; and &lt;code&gt;AsyncWrite&lt;/code&gt;. It hasn&amp;rsquo;t changed recently and we can&amp;rsquo;t trigger misbehavior in it.&lt;/p&gt;

&lt;p&gt;That leaves &lt;code&gt;TlsStream&lt;/code&gt;. &lt;code&gt;TlsStream&lt;/code&gt; is an ultra-important, load-bearing function in the Rust ecosystem. Everybody uses it. Could it harbor an async Rust footgun? Turns out, it did!&lt;/p&gt;

&lt;p&gt;Unlike our &lt;code&gt;Duplex&lt;/code&gt;, Rustls actually does have to get intimate with the underlying async executor. And, looking through the repository, Pavel uncovers &lt;a href='https://github.com/rustls/tokio-rustls/issues/72' title=''&gt;this issue&lt;/a&gt;: sometimes, &lt;code&gt;TlsStreams&lt;/code&gt; in Rustls just spin out. And it turns out, what&amp;rsquo;s causing this is a TLS state machine bug: when a TLS session is orderly-closed, with a &lt;code&gt;CloseNotify&lt;/code&gt; &lt;code&gt;Alert&lt;/code&gt; record, the sender of that record has informed its counterparty that no further data will be sent. But if there&amp;rsquo;s still buffered data on the underlying connection, &lt;code&gt;TlsStream&lt;/code&gt; mishandles its &lt;code&gt;Waker&lt;/code&gt;, and we fall into a busy-loop.&lt;/p&gt;

&lt;p&gt;&lt;a href='https://github.com/rustls/rustls/pull/1950/files' title=''&gt;Pretty straightforward fix&lt;/a&gt;!&lt;/p&gt;
&lt;h2 id='what-actually-happened-to-us' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#what-actually-happened-to-us' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;What Actually Happened To Us&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;Our partners in object storage, &lt;a href='https://www.tigrisdata.com/' title=''&gt;Tigris Data&lt;/a&gt;, were conducting some kind of load testing exercise. Some aspect of their testing system triggered the &lt;code&gt;TlsStream&lt;/code&gt; state machine bug, which locked up one or more &lt;code&gt;TlsStreams&lt;/code&gt; in the edge proxy handling whatever corner-casey stream they were sending.&lt;/p&gt;

&lt;p&gt;Tigris wasn&amp;rsquo;t generating a whole lot of traffic; tens of thousands of connections, tops. But all of them sent small HTTP bodies and then terminated early. We figured some of those connections errored out, and set up the &amp;ldquo;TLS CloseNotify happened before EOF&amp;rdquo; scenario. &lt;/p&gt;

&lt;p&gt;To be truer to the chronology, we knew pretty early on in our investigation that something Tigris was doing with their load testing was probably triggering the bug, and we got them to stop. After we worked it out, and Pavel deployed the fix, we told them to resume testing. No spin-outs.&lt;/p&gt;
&lt;h2 id='lessons-learned' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#lessons-learned' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Lessons Learned&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;Keep your dependencies updated. Unless you shouldn&amp;rsquo;t keep your dependencies updated. I mean, if there&amp;rsquo;s a vulnerability (and, technically, this was a DoS vulnerability), always update. And if there&amp;rsquo;s an important bugfix, update. But if there isn&amp;rsquo;t an important bugfix, updating for the hell of it might also destabilize your project? So update maybe? Most of the time?&lt;/p&gt;

&lt;p&gt;Really, the truth of this is that keeping track of &lt;em&gt;what needs to be updated&lt;/em&gt; is valuable work. The updates themselves are pretty fast and simple, but the process and testing infrastructure to confidently metabolize dependency updates is not. &lt;/p&gt;

&lt;p&gt;Our other lesson here is that there&amp;rsquo;s an opportunity to spot these kinds of bugs more directly with our instrumentation. Spurious wakeups should be easy to spot, and triggering a metric when they happen should be cheap, because they&amp;rsquo;re not supposed to happen often. So that&amp;rsquo;s something we&amp;rsquo;ll go do now.&lt;/p&gt;</content>
    </entry>
    <entry>
        <title>We Were Wrong About GPUs</title>
        <link rel="alternate" href="https://fly.io/blog/wrong-about-gpu/"/>
        <id>https://fly.io/blog/wrong-about-gpu/</id>
        <published>2025-02-14T00:00:00+00:00</published>
        <updated>2025-05-20T16:26:25+00:00</updated>
        <media:thumbnail url="https://fly.io/blog/wrong-about-gpu/assets/choices-choices-cover.webp"/>
        <content type="html">&lt;div class="lead"&gt;&lt;p&gt;We’re building a public cloud, on hardware we own. We raised money to do that, and to place some bets; one of them: GPU-enabling our customers. A progress report: GPUs aren’t going anywhere, but: GPUs aren’t going anywhere.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;A couple years back, &lt;a href="https://fly.io/gpu"&gt;we put a bunch of chips down&lt;/a&gt; on the bet that people shipping apps to users on the Internet would want GPUs, so they could do AI/ML inference tasks. To make that happen, we created &lt;a href="https://fly.io/docs/gpus/getting-started-gpus/"&gt;Fly GPU Machines&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;A Fly Machine is a &lt;a href="https://fly.io/blog/docker-without-docker/"&gt;Docker/OCI container&lt;/a&gt; running inside a hardware-virtualized virtual machine somewhere on our global fleet of bare-metal worker servers. A GPU Machine is a Fly Machine with a hardware-mapped Nvidia GPU. It&amp;rsquo;s a Fly Machine that can do fast CUDA.&lt;/p&gt;

&lt;p&gt;Like everybody else in our industry, we were right about the importance of AI/ML. If anything, we underestimated its importance. But the product we came up with probably doesn&amp;rsquo;t fit the moment. It&amp;rsquo;s a bet that doesn&amp;rsquo;t feel like it&amp;rsquo;s paying off.&lt;/p&gt;

&lt;p&gt;&lt;strong class='font-semibold text-navy-950'&gt;If you&amp;rsquo;re using Fly GPU Machines, don&amp;rsquo;t freak out; we&amp;rsquo;re not getting rid of them.&lt;/strong&gt; But if you&amp;rsquo;re waiting for us to do something bigger with them, a v2 of the product, you&amp;rsquo;ll probably be waiting awhile.&lt;/p&gt;
&lt;h3 id='what-it-took' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#what-it-took' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;What It Took&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;GPU Machines were not a small project for us. Fly Machines run on an idiosyncratically small hypervisor (normally Firecracker, but for GPU Machines &lt;a href="https://github.com/cloud-hypervisor/cloud-hypervisor"&gt;Intel&amp;rsquo;s Cloud Hypervisor&lt;/a&gt;, a very similar Rust codebase that supports PCI passthrough). The Nvidia ecosystem is not geared to supporting micro-VM hypervisors.&lt;/p&gt;

&lt;p&gt;GPUs &lt;a href="https://googleprojectzero.blogspot.com/2020/09/attacking-qualcomm-adreno-gpu.html"&gt;terrified our security team&lt;/a&gt;. A GPU is just about the worst case hardware peripheral: intense multi-directional direct memory transfers&lt;/p&gt;
&lt;div class="right-sidenote"&gt;&lt;p&gt;(not even bidirectional: in common configurations, GPUs talk to each other)&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;with arbitrary, end-user controlled computation, all operating outside our normal security boundary.&lt;/p&gt;

&lt;p&gt;We did a couple expensive things to mitigate the risk. We shipped GPUs on dedicated server hardware, so that GPU- and non-GPU workloads weren&amp;rsquo;t mixed. Because of that, the only reason for a Fly Machine to be scheduled on a GPU machine was that it needed a PCI BDF for an Nvidia GPU, and there&amp;rsquo;s a limited number of those available on any box. Those GPU servers were drastically less utilized and thus less cost-effective than our ordinary servers.&lt;/p&gt;

&lt;p&gt;We funded two very large security assessments, from &lt;a href="https://www.atredis.com/"&gt;Atredis&lt;/a&gt; and &lt;a href="https://tetrelsec.com/"&gt;Tetrel&lt;/a&gt;, to evaluate our GPU deployment. Matt Braun is writing up those assessments now. They were not cheap, and they took time.&lt;/p&gt;

&lt;p&gt;Security wasn&amp;rsquo;t directly the biggest cost we had to deal with, but it was an indirect cause for a subtle reason.&lt;/p&gt;

&lt;p&gt;We could have shipped GPUs very quickly by doing what Nvidia recommended: standing up a standard K8s cluster to schedule GPU jobs on. Had we taken that path, and let our GPU users share a single Linux kernel, we&amp;rsquo;d have been on Nvidia&amp;rsquo;s driver happy-path.&lt;/p&gt;

&lt;p&gt;Alternatively, we could have used a conventional hypervisor. Nvidia suggested VMware (heh). But they could have gotten things working had we used QEMU. We like QEMU fine, and could have talked ourselves into a security story for it, but the whole point of Fly Machines is that they take milliseconds to start. We could not have offered our desired Developer Experience on the Nvidia happy-path.&lt;/p&gt;

&lt;p&gt;Instead, we burned months trying (and ultimately failing) to get Nvidia&amp;rsquo;s host drivers working to map &lt;a href="https://www.nvidia.com/en-us/data-center/virtual-solutions/"&gt;virtualized GPUs&lt;/a&gt; into Intel Cloud Hypervisor. At one point, we hex-edited the closed-source drivers to trick them into thinking our hypervisor was QEMU.&lt;/p&gt;

&lt;p&gt;I&amp;rsquo;m not sure any of this really mattered in the end. There&amp;rsquo;s a segment of the market we weren&amp;rsquo;t ever really able to explore because Nvidia&amp;rsquo;s driver support kept us from thin-slicing GPUs. We&amp;rsquo;d have been able to put together a really cheap offering for developers if we hadn&amp;rsquo;t run up against that, and developers love &amp;ldquo;cheap&amp;rdquo;, but I can&amp;rsquo;t prove that those customers are real.&lt;/p&gt;

&lt;p&gt;On the other hand, we&amp;rsquo;re committed to delivering the Fly Machine DX for GPU workloads. Beyond the PCI/IOMMU drama, just getting an entire hardware GPU working in a Fly Machine was a lift. We needed Fly Machines that would come up with the right Nvidia drivers; our stack was built assuming that the customer&amp;rsquo;s OCI container almost entirely defined the root filesystem for a Machine. We had to engineer around that in our &lt;code&gt;flyd&lt;/code&gt; orchestrator. And almost everything people want to do with GPUs involves efficiently grabbing huge files full of model weights. Also annoying!&lt;/p&gt;

&lt;p&gt;And, of course, we bought GPUs. A lot of GPUs. Expensive GPUs.&lt;/p&gt;
&lt;h3 id='why-it-isnt-working' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#why-it-isnt-working' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Why It Isn&amp;rsquo;t Working&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;The biggest problem: developers don&amp;rsquo;t want GPUs. They don&amp;rsquo;t even want AI/ML models. They want LLMs. &lt;em&gt;System engineers&lt;/em&gt; may have smart, fussy opinions on how to get their models loaded with CUDA, and what the best GPU is. But &lt;em&gt;software developers&lt;/em&gt; don&amp;rsquo;t care about any of that. When a software developer shipping an app comes looking for a way for their app to deliver prompts to an LLM, you can&amp;rsquo;t just give them a GPU.&lt;/p&gt;

&lt;p&gt;For those developers, who probably make up most of the market, it doesn&amp;rsquo;t seem plausible for an insurgent public cloud to compete with OpenAI and Anthropic. Their APIs are fast enough, and developers thinking about performance in terms of &amp;ldquo;tokens per second&amp;rdquo; aren&amp;rsquo;t counting milliseconds.&lt;/p&gt;
&lt;div class="right-sidenote"&gt;&lt;p&gt;(you should all feel sympathy for us)&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;This makes us sad because we really like the point in the solution space we found. Developers shipping apps on Amazon will outsource to other public clouds to get cost-effective access to GPUs. But then they&amp;rsquo;ll faceplant trying to handle data and model weights, backhauling gigabytes (at significant expense) from S3. We have app servers, GPUs, and object storage all under the same top-of-rack switch. But inference latency just doesn&amp;rsquo;t seem to matter yet, so the market doesn&amp;rsquo;t care.&lt;/p&gt;

&lt;p&gt;Past that, and just considering the system engineers who do care about GPUs rather than LLMs: the hardware product/market fit here is really rough.&lt;/p&gt;

&lt;p&gt;People doing serious AI work want galactically huge amounts of GPU compute. A whole enterprise A100 is a compromise position for them; they want an SXM cluster of H100s.&lt;/p&gt;
&lt;div class="right-sidenote"&gt;&lt;p&gt;Near as we can tell, MIG gives you a UUID to talk to the host driver, not a PCI device.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;We think there&amp;rsquo;s probably a market for users doing lightweight ML work getting tiny GPUs. &lt;a href="https://www.nvidia.com/en-us/technologies/multi-instance-gpu/"&gt;This is what Nvidia MIG does&lt;/a&gt;, slicing a big GPU into arbitrarily small virtual GPUs. But for fully-virtualized workloads, it&amp;rsquo;s not baked; we can&amp;rsquo;t use it. And I&amp;rsquo;m not sure how many of those customers there are, or whether we&amp;rsquo;d get the density of customers per server that we need.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://fly.io/blog/cutting-prices-for-l40s-gpus-in-half"&gt;That leaves the L40S customers&lt;/a&gt;. There are a bunch of these! We dropped L40S prices last year, not because we were sour on GPUs but because they&amp;rsquo;re the one part we have in our inventory people seem to get a lot of use out of. We&amp;rsquo;re happy with them. But they&amp;rsquo;re just another kind of compute that some apps need; they&amp;rsquo;re not a driver of our core business. They&amp;rsquo;re not the GPU bet paying off.&lt;/p&gt;

&lt;p&gt;Really, all of this is just a long way of saying that for most software developers, &amp;ldquo;AI-enabling&amp;rdquo; their app is best done with API calls to things like Claude and GPT, Replicate and RunPod.&lt;/p&gt;
&lt;h3 id='what-did-we-learn' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#what-did-we-learn' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;What Did We Learn?&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;A very useful way to look at a startup is that it&amp;rsquo;s a race to learn stuff. So, what&amp;rsquo;s our report card?&lt;/p&gt;

&lt;p&gt;First off, when we embarked down this path in 2022, we were (like many other companies) operating in a sort of phlogiston era of AI/ML. The industry attention to AI had not yet collapsed around a small number of foundational LLM models. We expected there to be a diversity of &lt;em&gt;mainstream&lt;/em&gt; models, the world &lt;a href='https://github.com/elixir-nx/bumblebee' title=''&gt;Elixir Bumblebee&lt;/a&gt; looks forward to, where people pull different AI workloads off the shelf the same way they do Ruby gems.&lt;/p&gt;

&lt;p&gt;But &lt;a href='https://www.cursor.com/' title=''&gt;Cursor happened&lt;/a&gt;, and, as they say, how are you going to keep &amp;lsquo;em down on the farm once they&amp;rsquo;ve seen Karl Hungus? It seems much clearer where things are heading.&lt;/p&gt;

&lt;p&gt;GPUs were a test of a Fly.io company credo: as we think about core features, we design for 10,000 developers, not for 5-6. It took a minute, but the credo wins here: GPU workloads for the 10,001st developer are a niche thing.&lt;/p&gt;

&lt;p&gt;Another way to look at a startup is as a series of bets. We put a lot of chips down here. But the buy-in for this tournament gave us a lot of chips to play with. Never making a big bet of any sort isn&amp;rsquo;t a winning strategy. I&amp;rsquo;d rather we&amp;rsquo;d flopped the nut straight, but I think going in on this hand was the right call.&lt;/p&gt;

&lt;p&gt;A really important thing to keep in mind here, and something I think a lot of startup thinkers sleep on, is the extent to which this bet involved acquiring assets. Obviously, some of our &lt;a href='https://fly.io/blog/the-exit-interview-jp/' title=''&gt;costs here aren&amp;rsquo;t recoverable&lt;/a&gt;. But the hardware parts that aren&amp;rsquo;t generating revenue will ultimately get liquidated; like with &lt;a href='https://fly.io/blog/32-bit-real-estate/' title=''&gt;our portfolio of IPv4 addresses&lt;/a&gt;, I&amp;rsquo;m even more comfortable making bets backed by tradable assets with durable value.&lt;/p&gt;

&lt;p&gt;In the end, I don&amp;rsquo;t think GPU Fly Machines were going to be a hit for us no matter what we did. Because of that, one thing I&amp;rsquo;m very happy about is that we didn&amp;rsquo;t compromise the rest of the product for them. Security concerns slowed us down to where we probably learned what we needed to learn a couple months later than we could have otherwise, but we&amp;rsquo;re scaling back our GPU ambitions without having sacrificed &lt;a href='https://fly.io/blog/sandboxing-and-workload-isolation/' title=''&gt;any of our isolation story&lt;/a&gt;, and, ironically, GPUs &lt;em&gt;other people run&lt;/em&gt; are making that story a lot more important. The same thing goes for our Fly Machine developer experience.&lt;/p&gt;

&lt;p&gt;We started this company building a Javascript runtime for edge computing. We learned that our customers didn&amp;rsquo;t want a new Javascript runtime; they just wanted their native code to work. &lt;a href='https://news.ycombinator.com/item?id=22616857' title=''&gt;We shipped containers&lt;/a&gt;, and no convincing was needed. We were wrong about Javascript edge functions, and I think we were wrong about GPUs. That&amp;rsquo;s usually how we figure out the right answers:  by being wrong about a lot of stuff.&lt;/p&gt;</content>
    </entry>
    <entry>
        <title>The Exit Interview: JP Phillips</title>
        <link rel="alternate" href="https://fly.io/blog/the-exit-interview-jp/"/>
        <id>https://fly.io/blog/the-exit-interview-jp/</id>
        <published>2025-02-12T00:00:00+00:00</published>
        <updated>2025-05-20T16:26:25+00:00</updated>
        <media:thumbnail url="https://fly.io/blog/the-exit-interview-jp/assets/bye-bye-little-sebastian-cover.webp"/>
        <content type="html">&lt;div class="lead"&gt;&lt;p&gt;JP Phillips is off to greener, or at least calmer, pastures. He joined us 4 years ago to build the next generation of our orchestration system, and has been one of the anchors of our engineering team. His last day is today. We wanted to know what he was thinking, and figured you might too.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;&lt;em&gt;Question 1: Why, JP? Just why?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;LOL. When I looked at what I wanted to see from here in the next 3-4 years, it didn&amp;rsquo;t really match up with where we&amp;rsquo;re currently heading. Specifically, with our new focus on MPG &lt;em&gt;[Managed Postgres]&lt;/em&gt; and [llm] &lt;em&gt;[llm].&lt;/em&gt;&lt;/p&gt;
&lt;div class="callout"&gt;&lt;p&gt;Editorial comment: Even I don’t know what [llm] is.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;The Fly Machines platform is more or less finished, in the sense of being capable of supporting the next iteration of our products. My original desire to join Fly.io was to make Machines a product that would &lt;a href='https://fly.io/blog/carving-the-scheduler-out-of-our-orchestrator/' title=''&gt;rid us of HashiCorp Nomad&lt;/a&gt;, and I feel like that&amp;rsquo;s been accomplished.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Where were you hoping to see us headed?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;More directly positioned as a cloud provider, rather than a platform-as-a-service; further along the customer journey from &amp;ldquo;developers&amp;rdquo; and &amp;ldquo;startups&amp;rdquo; to large established companies.&lt;/p&gt;

&lt;p&gt;And, it&amp;rsquo;s not that I disagree with PAAS work or MPG! Rather, it&amp;rsquo;s not something that excites me in a way that I&amp;rsquo;d feel challenged and could continue to grow technically.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Follow up question: does your family know what you’re doing here? Doing to us? Are they OK with it?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Yes, my family was very involved in the decision, before I even talked to other companies.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;What&amp;rsquo;s the thing you&amp;rsquo;re happiest about having built here? It cannot be &amp;ldquo;all of &lt;code&gt;flyd&lt;/code&gt;&amp;rdquo;.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;We&amp;rsquo;ve enabled developers to run workloads from an OCI image and an API call all over the world. On any other cloud provider, the knowledge of how to pull that off comes with a professional certification.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;In what file in our &lt;code&gt;nomad-firecracker&lt;/code&gt; repository would I find that code?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href='https://docs.machines.dev/#tag/machines/post/apps/{app_name}/machines' title=''&gt;https://docs.machines.dev/#tag/machines/post/apps/{app_name}/machines&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;img alt="A diagram that doesn&amp;#39;t make any of this clearer" src="/blog/the-exit-interview-jp/assets/flaps.png?1/2&amp;amp;center" /&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;So you mean, literally, the whole Fly Machines API, and &lt;code&gt;flaps&lt;/code&gt;, the API gateway for Fly Machines?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Yes, all of it. The &lt;code&gt;flaps&lt;/code&gt; API server, the &lt;code&gt;flyd&lt;/code&gt; RPCs it calls, the &lt;code&gt;flyd&lt;/code&gt; finite state machine system, the interface to running VMs.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Is there something you especially like about that design?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;I like that it for the most part doesn&amp;rsquo;t require any central coordination. And I like that the P90 for Fly Machine &lt;code&gt;create&lt;/code&gt; calls is sub-5-seconds for pretty much every region except for Johannesburg and Hong Kong.&lt;/p&gt;

&lt;p&gt;I think the FSM design is something I&amp;rsquo;m proud of; if I could take any code with me, it&amp;rsquo;d be the &lt;code&gt;internal/fsm&lt;/code&gt; in the &lt;code&gt;nomad-firecracker&lt;/code&gt; repo.&lt;/p&gt;
&lt;div class="callout"&gt;&lt;p&gt;You can read more about &lt;a href="https://fly.io/blog/carving-the-scheduler-out-of-our-orchestrator/" title=""&gt;the &lt;code&gt;flyd&lt;/code&gt; orchestrator JP led over here&lt;/a&gt;.  But, a quick decoder ring: &lt;code&gt;flyd&lt;/code&gt; runs independently without any central coordination on thousands of “worker” servers around the globe. It’s structured as an API server for a bunch of finite state machine invocations, where an FSM might be something like “start a Fly Machine” or “create a new Fly Machine” or “cordon off a Fly Machine so we can update it”. Each FSM invocation is comprised of a bunch of steps, each of those steps has callbacks into the &lt;code&gt;flyd&lt;/code&gt; code, and each step is logged in &lt;a href="https://github.com/boltdb/bolt" title=""&gt;a BoltDB database&lt;/a&gt;.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;&lt;em&gt;Thinking back, there are like two archetypes of insanely talented developers I’ve worked with. One is the kind that belts out ridiculous amounts of relatively sophisticated code on a whim, at like 3AM. Jerome [who leads our fly-proxy team], is that type. The other comes to projects with what feels like fully-formed, coherent designs that are not super intuitive, and the whole project just falls together around that design. Did you know you were going to do the FSM log thing when you started &lt;code&gt;flyd&lt;/code&gt;?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;I definitely didn&amp;rsquo;t have any specific design in mind when I started on &lt;code&gt;flyd&lt;/code&gt;. I think the FSM stuff is a result of work I did at Compose.io / MongoHQ (where it was called &amp;ldquo;recipes&amp;rdquo;/&amp;ldquo;operations&amp;rdquo;) and the workd I did at HashiCorp using Cadence.&lt;/p&gt;

&lt;p&gt;Once I understood what the product needed to do and look like, having a way to perform deterministic and durable execution felt like a good design.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Cadence?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href='https://cadenceworkflow.io/' title=''&gt;Cadence&lt;/a&gt; is the child of AWS Step Functions and the predecessor to &lt;a href='https://temporal.io/' title=''&gt;Temporal&lt;/a&gt; (the company).&lt;/p&gt;

&lt;p&gt;One of the biggest gains, with how it works in &lt;code&gt;flyd&lt;/code&gt;, is knowing we would need to deploy &lt;code&gt;flyd&lt;/code&gt; all day, every day. If &lt;code&gt;flyd&lt;/code&gt; was in the middle of doing some work, it needed to pick back up right where it left off, post-deploy.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;OK, next question. What&amp;rsquo;s the most impressive thing you saw someone else build here? To make things simpler and take some pressure off the interview, we can exclude any of my works from consideration.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Probably &lt;a href='https://github.com/superfly/corrosion' title=''&gt;&lt;code&gt;corrosion2&lt;/code&gt;&lt;/a&gt;.&lt;/p&gt;
&lt;div class="callout"&gt;&lt;p&gt;Sidebar: &lt;code&gt;corrosion2&lt;/code&gt; is our state distribution system. While &lt;code&gt;flyd&lt;/code&gt; runs individual Fly Machines for users, each instance is solely responsible for its own state; there’s no global scheduler. But we have platform components, most obviously &lt;code&gt;fly-proxy&lt;/code&gt;, our Anycast router, that need to know what’s running where. &lt;code&gt;corrosion2&lt;/code&gt; is a Rust service that does &lt;a href="https://fly.io/blog/building-clusters-with-serf/" title=""&gt;SWIM gossip&lt;/a&gt; to propagate information from each worker into a CRDT-structured SQLite database. &lt;code&gt;corrosion2&lt;/code&gt; essentially means any component on our fleet can do SQLite queries to get near-real-time information about any Fly Machine around the world.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;If for no other reason than that we deployed &lt;code&gt;corrosion&lt;/code&gt;, learned from it, and were able to make significant and valuable improvements — and then migrate to the new system in a short period of time.&lt;/p&gt;

&lt;p&gt;Having a &amp;ldquo;just SQLite&amp;rdquo; interface, for async replicated changes around the world in seconds, it&amp;rsquo;s pretty powerful.&lt;/p&gt;

&lt;p&gt;If we invested in &lt;a href='https://antithesis.com/' title=''&gt;Anthesis&lt;/a&gt; or TLA+ testing, I think there&amp;rsquo;s &lt;a href='https://github.com/superfly/corrosion' title=''&gt;potential for other companies&lt;/a&gt; to get value out of &lt;code&gt;corrosion2&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Just as a general-purpose gossip-based SQLite CRDT gossip system?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Yes.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;OK, you&amp;rsquo;re being too nice. What&amp;rsquo;s your least favorite thing about the platform?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;GraphQL. No, Elixir. It&amp;rsquo;s a tie between GraphQL and Elixir.&lt;/p&gt;

&lt;p&gt;But probably GraphQL, by a hair.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;That&amp;rsquo;s not the answer I expected.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;GraphQL slows everyone down, and everything. Elixir only slows me down.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;The rest of the platform, you&amp;rsquo;re fine with? No complaints?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;I&amp;rsquo;m happier now that we have &lt;code&gt;pilot&lt;/code&gt;.&lt;/p&gt;
&lt;div class="callout"&gt;&lt;p&gt;&lt;code&gt;pilot&lt;/code&gt; is our new &lt;code&gt;init&lt;/code&gt;. When we launch a Fly Machine, &lt;code&gt;init&lt;/code&gt; is our foothold in the machine; this is unlike a normal OCI runtime, where “pid 1” is often the user’s entrypoint program. Our original &lt;code&gt;init&lt;/code&gt; was so simple people dunked on it and said it might as well have been a bash script; over time, &lt;code&gt;init&lt;/code&gt; has sprouted a bunch of new features. &lt;code&gt;pilot&lt;/code&gt; consolidates those features, and, more importantly, is itself a complete OCI runtime; &lt;code&gt;pilot&lt;/code&gt; can natively run containers inside of Fly Machines.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;Before &lt;code&gt;pilot&lt;/code&gt;, there really wasn&amp;rsquo;t any contract between &lt;code&gt;flyd&lt;/code&gt; and &lt;code&gt;init&lt;/code&gt;. And &lt;code&gt;init&lt;/code&gt; was just &amp;ldquo;whatever we wanted &lt;code&gt;init&lt;/code&gt; to be&amp;rdquo;. That limit its ability to serve us.&lt;/p&gt;

&lt;p&gt;Having &lt;code&gt;pilot&lt;/code&gt; be an OCI-compliant runtime with an API for &lt;code&gt;flyd&lt;/code&gt; to drive is  a big win for the future of the Fly Machines API.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Was I right that we should have used SQLite for &lt;code&gt;flyd&lt;/code&gt;, or were you wrong to have used BoltDB?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;I still believe Bolt was the right choice. I&amp;rsquo;ve never lost a second of sleep worried that someone is about to run a SQL update statement on a host, or across the whole fleet, and then mangled all our state data. And limiting the storage interface, by not using SQL, kept &lt;code&gt;flyd&lt;/code&gt;&amp;lsquo;s scope managed.&lt;/p&gt;

&lt;p&gt;On the engine side of the platform, which is what &lt;code&gt;flyd&lt;/code&gt; is, I still believe SQL is too powerful for what &lt;code&gt;flyd&lt;/code&gt; does.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;If you had this to do over again, would Bolt be precisely what you&amp;rsquo;d pick, or is there something else you&amp;rsquo;d want to try? Some cool-ass new KV store?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Nah. But, I&amp;rsquo;d maybe consider a SQLite database per-Fly-Machine. Then the scope of danger is about as small as it could possibly be.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Whoah, that&amp;rsquo;s an interesting thought. People sleep on the &amp;ldquo;keep a zillion little SQLites&amp;rdquo; design.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Yeah, with per-Machine SQLite, once a Fly Machine is destroyed, we can just zip up the database and stash it in object storage. The biggest hold-up I have about it is how we&amp;rsquo;d manage the schemas.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;OpenTelemetry: were you right all along?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;One hundred percent.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;I basically attribute oTel at Fly.io to you.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Without oTel, it&amp;rsquo;d be a disaster trying to troubleshoot the system. I&amp;rsquo;d have ragequit trying.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;I remember there being a cost issue, with how much Honeycomb was going to end up charging us to manage all the data. But that seems silly in retrospect.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;For sure. It is 100% part of the decision and the conversation. But: we didn&amp;rsquo;t have the best track record running a logs/metrics cluster at this fidelity. It was worth the money to pay someone else to manage tracing data.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Strong agree. I think my only issue is just the extent to which it cruds up code. But I need to get over that.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Yes, it&amp;rsquo;s very explicit. I think the next big part of oTel is going to be auto-instrumentation, for profiling.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;You&amp;rsquo;re a veteran Golang programmer. Say 3 nice things about Rust.&lt;/em&gt;&lt;/p&gt;
&lt;div class="callout"&gt;&lt;p&gt;Most of our backend is in Go, but &lt;code&gt;fly-proxy&lt;/code&gt;, &lt;code&gt;corrosion2&lt;/code&gt;, and &lt;code&gt;pilot&lt;/code&gt; are in Rust.&lt;/p&gt;
&lt;/div&gt;
&lt;ol&gt;
&lt;li&gt;Option. 
&lt;/li&gt;&lt;li&gt;Match.
&lt;/li&gt;&lt;li&gt;Serde macros.
&lt;/li&gt;&lt;/ol&gt;

&lt;p&gt;&lt;em&gt;Even I can&amp;rsquo;t say shit about Option and match.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Match is so much better than anything in Go.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Elixir, Go, and Rust. An honest take on that programming cocktail.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Three&amp;rsquo;s a crowd, Elixir can stay home.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;If you could only lose one, you&amp;rsquo;d keep Rust.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;I&amp;rsquo;ve learned its shortcomings and the productivity far outweighs having to deal with the Rust compiler.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;You&amp;rsquo;d be unhappy if we moved the &lt;code&gt;flaps&lt;/code&gt; API code from Go to Elixir.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Correct.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;I kind of buy the idea of doing orchestration and scheduling code, which is policy-intensive, in a higher-level language.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Maybe. If Ruby had a better concurrency story, I don&amp;rsquo;t think Elixir would have a place for us.&lt;/p&gt;
&lt;div class="callout"&gt;&lt;p&gt;Here I need to note that Ruby is functionally dead here, and Elixir is ascendant.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;&lt;em&gt;We have an idiosyncratic management structure. We&amp;rsquo;re bottom-up, but ambiguously so. We don&amp;rsquo;t have roadmaps, except when we do. We have minimal top-down technical direction. Critique.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;It&amp;rsquo;s too easy to lose sight of whether your current focus [in what you&amp;rsquo;re building] is valuable to the company.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;The first thing I warn every candidate about on our &amp;ldquo;do-not-work-here&amp;rdquo; calls.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;I think it comes down to execution, and accountability to actually finish projects. I spun a lot trying to figure out what would be the most valuable work for Fly Machines.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;You don&amp;rsquo;t have to be so nice about things.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;We struggle a lot with consistent communication. We change direction a little too often. It got to a point where I didn&amp;rsquo;t see a point in devoting time and effort into projects, because I&amp;rsquo;d not be able to show enough value quick enough.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;I see things paying off later than we&amp;rsquo;d hoped or expected they would. Our secret storage system, Pet Semetary, is a good example of this. Our K8s service, FKS, is another obvious one, since we&amp;rsquo;re shipping MPG on it.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;This is your second time working Kurt, at a company where he&amp;rsquo;s the CEO. Give him a 1-4 star rating. He can take it! At least, I think he can take it.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;2022: ★★★★&lt;/p&gt;

&lt;p&gt;2023: ★★&lt;/p&gt;

&lt;p&gt;2024: ★★✩&lt;/p&gt;

&lt;p&gt;2025: ★★★✩&lt;/p&gt;

&lt;p&gt;On a four-star scale.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Whoah. I did not expect a histogram. Say more about 2023!&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;We hired too many people, too quickly, and didn&amp;rsquo;t have the guardrails and structure in place for everybody to be successful.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Also: GPUs!&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Yes. That was my next comment.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Do we secretly agree about GPUs?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;I think so.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Our side won the argument in the end! But at what cost?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;They were a killer distraction.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Final question: how long will you remain in the first-responder on-call rotation after you leave? I assume at least until August. I have a shift this weekend; can you swap with me? I keep getting weekends.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;I am going to be asleep all weekend if any of my previous job changes are indicative.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;I sleep through on-call too! But nobody can yell at you for it now. I think you have the comparative advantage over me in on-calling.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Yes I will absolutely take all your future on-call shifts, you have convinced me.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;All this aside: it has been a privilege watching you work. I hope your next gig is 100x more relaxing than this was. Or maybe I just hope that for myself. Except: I&amp;rsquo;ll never escape this place. Thank you so much for doing this.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Thank you! I&amp;rsquo;m forever grateful for having the opportunity to be a part of Fly.io.&lt;/p&gt;</content>
    </entry>
    <entry>
        <title>A Blog, If You Can Keep It</title>
        <link rel="alternate" href="https://fly.io/blog/a-blog-if-kept/"/>
        <id>https://fly.io/blog/a-blog-if-kept/</id>
        <published>2025-02-10T00:00:00+00:00</published>
        <updated>2025-05-20T16:26:25+00:00</updated>
        <media:thumbnail url="https://fly.io/blog/a-blog-if-kept/assets/keep-blog.webp"/>
        <content type="html">&lt;div class="lead"&gt;&lt;p&gt;A boldfaced lede like this was a sure sign you were reading a carefully choreographed EffortPost from our team at Fly.io. We’re going to do less of those. Or the same amount but more of a different kind of post. Either way: launch an app on Fly.io today!&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;Over the last 5 years, we’ve done pretty well for ourselves writing content for Hacker News. And that’s &lt;a href='https://news.ycombinator.com/item?id=39373476' title=''&gt;mostly&lt;/a&gt; been good for us. We don’t do conventional marketing, we don’t have a sales team, the rest of social media is atomized over 5 different sites. Writing pieces that HN takes seriously has been our primary outreach tool.&lt;/p&gt;

&lt;p&gt;There’s a recipe (probably several, but I know this one works) for charting a post on HN:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Write an EffortPost, which is to say a dense technical piece over 2000 words long; within that rubric there’s a bunch of things that are catnip to HN, including runnable code, research surveys, and explainers. (There are also cat-repellants you learn to steer clear of.)
&lt;/li&gt;&lt;li&gt;Assiduously avoid promotion. You have to write for the audience. We get away with sporadically including a call-to-action block in our posts, but otherwise: the post should make sense even if an unrelated company posted it after you went out of business.
&lt;/li&gt;&lt;li&gt;Pick topics HN is interested in (it helps if all your topics are au courant for HN, and we’ve been &lt;a href='https://news.ycombinator.com/item?id=32250426' title=''&gt;very&lt;/a&gt; &lt;a href='https://news.ycombinator.com/item?id=32018066' title=''&gt;lucky&lt;/a&gt; in that regard).
&lt;/li&gt;&lt;li&gt;Like 5-6 more style-guide things that help incrementally. Probably 3 different teams writing for HN will have 3 different style guides with only like &amp;frac12; overlap. Ours, for instances, instructs writers to swear.
&lt;/li&gt;&lt;/ol&gt;

&lt;p&gt;I like this kind of writing. It’s not even a chore. But it’s become an impediment for us, for a couple reasons: the team serializes behind an “editorial” function here, which keeps us from publishing everything we want; worse, caring so much about our track record leaves us noodling on posts interminably (the poor &lt;a href='https://www.tigrisdata.com/' title=''&gt;Tigrises&lt;/a&gt; have been waiting for months for me to publish the piece I wrote about them and FoundationDB; take heart, this post today means that one is coming soon).&lt;/p&gt;

&lt;p&gt;But worst of all, I worried incessantly about us &lt;a href='https://gist.github.com/tqbf/e853764b562a2d72a91a6986ca3b77c0' title=''&gt;wearing out our welcome&lt;/a&gt;. To my mind, we’d have 1, maybe 2 bites at the HN apple in a given month, and we needed to make them count.&lt;/p&gt;

&lt;p&gt;That was dumb. I am dumb about a lot of things! I came around to understanding this after Kurt demanded I publish my blog post about BFAAS (Bash Functions As A Service), 500 lines of Go code that had generated 4500 words in my draft. It was only after I made the decision to stop gatekeeping this blog that I realized &lt;a href='https://simonwillison.net/' title=''&gt;Simon Willison&lt;/a&gt; has been disproving my “wearing out the welcome” theory, day in and day out, for years. He just writes stuff about LLMs when it interests him. I mean, it helps that he’s a better writer than we are. But he’s not wasting time choreographing things.&lt;/p&gt;

&lt;p&gt;Back in like 2009, &lt;a href='https://web.archive.org/web/20110806040300/http://chargen.matasano.com/chargen/2009/7/22/if-youre-typing-the-letters-a-e-s-into-your-code-youre-doing.html' title=''&gt;we had a blog&lt;/a&gt; at another company I was at. That blog drove a lot of business for us (and, on three occasions, almost killed me). It was not in the least bit optimized for HN. I like pretending to be a magazine feature writer, but I miss writing dashed-off pieces every day and clearing space for other people on the team to write as well.&lt;/p&gt;

&lt;p&gt;So this is all just a heads up: we’re trying something new. This is a very long and self-indulgent way to say “we’re going to write a normal blog like it’s 2008”, but that’s how broken my brain is after years of having my primary dopaminergic rewards come from how long Fly.io blog posts stay on the front page: I have to disclaim blogging before we start doing it, lest I fail to meet expectations.&lt;/p&gt;

&lt;p&gt;Like I said. I’m real dumb. But: looking forward to getting a lot more stuff out on the web for people to read this year!&lt;/p&gt;</content>
    </entry>
    <entry>
        <title>Did Semgrep Just Get A Lot More Interesting?</title>
        <link rel="alternate" href="https://fly.io/blog/semgrep-but-for-real-now/"/>
        <id>https://fly.io/blog/semgrep-but-for-real-now/</id>
        <published>2025-02-10T00:00:00+00:00</published>
        <updated>2025-05-20T16:26:25+00:00</updated>
        <media:thumbnail url="https://fly.io/blog/semgrep-but-for-real-now/assets/ci-cd-cover.webp"/>
        <content type="html">&lt;div class="right-sidenote"&gt;&lt;p&gt;This whole paragraph is just one long sentence. God I love &lt;a href="https://fly.io/blog/a-blog-if-kept/" title=""&gt;just random-ass blogging&lt;/a&gt; again.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;&lt;a href='https://ghuntley.com/stdlib/' title=''&gt;This bit by Geoffrey Huntley&lt;/a&gt; is super interesting to me and, despite calling out that LLM-driven development agents like Cursor have something like a 40% success rate at actually building anything that passes acceptance criteria, makes me think that more of the future of our field belongs to people who figure out how to use this weird bags of model weights than any of us are comfortable with. &lt;/p&gt;

&lt;p&gt;I’ve been dinking around with Cursor for a week now (if you haven’t, I think it’s something close to malpractice not to at least take it — or something like it — for a spin) and am just now from this post learning that Cursor has this &lt;a href='https://docs.cursor.com/context/rules-for-ai' title=''&gt;rules feature&lt;/a&gt;. &lt;/p&gt;

&lt;p&gt;The important thing for me is not how Cursor rules work, but rather how Huntley uses them. He turns them back on themselves, writing rules to tell Cursor how to organize the rules, and then teach Cursor how to write (under human supervision) its own rules.&lt;/p&gt;

&lt;p&gt;Cursor kept trying to get Huntley to use Bazel as a build system. So he had cursor write a rule for itself: “no bazel”. And there was no more Bazel. If I’d known I could do this, I probably wouldn’t have bounced from the Elixir project I had Cursor doing, where trying to get it to write simple unit tests got it all tangled up trying to make &lt;a href='https://hexdocs.pm/mox/Mox.html' title=''&gt;Mox&lt;/a&gt; work. &lt;/p&gt;

&lt;p&gt;But I’m burying the lead. &lt;/p&gt;

&lt;p&gt;Security people have been for several years now somewhat in love with a tool called &lt;a href='https://github.com/semgrep/semgrep' title=''&gt;Semgrep&lt;/a&gt;. Semgrep is a semantics-aware code search tool; using symbolic variable placeholders and otherwise ordinary code, you can write rules to match pretty much arbitary expressions and control flow. &lt;/p&gt;

&lt;p&gt;If you’re an appsec person, where you obviously go with this is: you build a library of Semgrep searches for well-known vulnerability patterns (or, if you’re like us at Fly.io, you work out how to get Semgrep to catch the Rust concurrency footgun of RWLocks inside if-lets).&lt;/p&gt;

&lt;p&gt;The reality for most teams though is “ain’t nobody got time for that”. &lt;/p&gt;

&lt;p&gt;But I just checked and, unsurprisingly, 4o &lt;a href='https://chatgpt.com/share/67aa94a7-ea3c-8012-845c-6c9491b33fe4' title=''&gt;seems to do reasonably well&lt;/a&gt; at generating Semgrep rules? Like: I have no idea if this rule is actually any good. But it looks like a Semgrep rule?&lt;/p&gt;

&lt;p&gt;What interests me is this: it seems obvious that we’re going to do more and more “closed-loop” LLM agent code generation stuff. By “closed loop”, I mean that the thingy that generates code is going to get to run the code and watch what happens when it’s interacted with. You’re just a small bit of glue code and a lot of system prompting away from building something like that right now: &lt;a href='https://x.com/chris_mccord/status/1882839014845374683' title=''&gt;Chris McCord is building&lt;/a&gt; a thingy that generates whole Elixir/Phoenix apps and runs them as Fly Machines. When you deploy these kinds of things, the LLM gets to see the errors when the code is run, and it can just go fix them. It also gets to see errors and exceptions in the logs when you hit a page on the app, and it can just go fix them.&lt;/p&gt;

&lt;p&gt;With a bit more system prompting, you can get an LLM to try to generalize out from exceptions it fixes and generate unit test coverage for them. &lt;/p&gt;

&lt;p&gt;With a little bit more system prompting, you can probably get an LLM to (1) generate a Semgrep rule for the generalized bug it caught, (2) test the Semgrep rule with a positive/negative control, (3) save the rule, (4) test the whole codebase with Semgrep for that rule, and (5) fix anything it finds that way. &lt;/p&gt;

&lt;p&gt;That is a lot more interesting to me than tediously (and probably badly) trying to predict everything that will go wrong in my codebase a priori and Semgrepping for them. Which is to say: Semgrep — which I have always liked — is maybe a lot more interesting now? And tools like it?&lt;/p&gt;</content>
    </entry>
    <entry>
        <title>VSCode’s SSH Agent Is Bananas</title>
        <link rel="alternate" href="https://fly.io/blog/vscode-ssh-wtf/"/>
        <id>https://fly.io/blog/vscode-ssh-wtf/</id>
        <published>2025-02-07T00:00:00+00:00</published>
        <updated>2025-05-20T16:26:25+00:00</updated>
        <media:thumbnail url="https://fly.io/static/images/default-post-thumbnail.webp"/>
        <content type="html">&lt;p&gt;We’re interested in getting integrated into the flow VSCode uses to do remote editing over SSH, because everybody is using VSCode now, and, in particular, they’re using forks of VSCode that generate code with LLMs. &lt;/p&gt;
&lt;div class="right-sidenote"&gt;&lt;p&gt;”hallucination” is what we call it when LLMs get code wrong; “engineering” is what we call it when people do.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;LLM-generated code is &lt;a href='https://nicholas.carlini.com/writing/2024/how-i-use-ai.html' title=''&gt;useful in the general case&lt;/a&gt; if you know what you’re doing. But it’s ultra-useful if you can close the loop between the LLM and the execution environment (with an “Agent” setup). There’s lots to say about this, but for the moment: it’s a semi-effective antidote to hallucination: the LLM generates the code, the agent scaffolding runs the code, the code generates errors, the agent feeds it back to the LLM, the process iterates. &lt;/p&gt;

&lt;p&gt;So, obviously, the issue here is you don’t want this iterative development process happening on your development laptop, because LLMs have boundary issues, and they’ll iterate on your system configuration just as happily on the Git project you happen to be working in. A thing you’d really like to be able to do: run a closed-loop agent-y (“agentic”? is that what we say now) configuration for an LLM, on a clean-slate Linux instance that spins up instantly and that can’t screw you over in any way. You get where we’re going with this.&lt;/p&gt;

&lt;p&gt;Anyways! I would like to register a concern.&lt;/p&gt;

&lt;p&gt;Emacs hosts the spiritual forebearer of remote editing systems, a blob of hyper-useful Elisp called &lt;a href='https://www.gnu.org/software/tramp/' title=''&gt;“Tramp”&lt;/a&gt;. If you can hook Tramp up to any kind of interactive environment — usually, an SSH session — where it can run Bourne shell commands, it can extend Emacs to that environment.&lt;/p&gt;

&lt;p&gt;So, VSCode has a feature like Tramp. Which, neat, right? You’d think, take Tramp, maybe simplify it a bit, switch out Elisp for Typescript.&lt;/p&gt;

&lt;p&gt;You’d think wrong!&lt;/p&gt;

&lt;p&gt;Unlike Tramp, which lives off the land on the remote connection, VSCode mounts a full-scale invasion: it runs a Bash snippet stager that downloads an agent, including a binary installation of Node. &lt;/p&gt;

&lt;p&gt;I &lt;em&gt;think&lt;/em&gt; this is &lt;a href='https://github.com/microsoft/vscode/tree/c9e7e1b72f80b12ffc00e06153afcfedba9ec31f/src/vs/server/node' title=''&gt;the source code&lt;/a&gt;?&lt;/p&gt;

&lt;p&gt;The agent runs over port-forwarded SSH. It establishes a WebSockets connection back to your running VSCode front-end. The underlying protocol on that connection can:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Wander around the filesystem
&lt;/li&gt;&lt;li&gt;Edit arbitrary files
&lt;/li&gt;&lt;li&gt;Launch its own shell PTY processes
&lt;/li&gt;&lt;li&gt;Persist itself
&lt;/li&gt;&lt;/ul&gt;

&lt;p&gt;In security-world, there’s a name for tools that work this way. I won’t say it out loud, because that’s not fair to VSCode, but let’s just say the name is murid in nature.&lt;/p&gt;

&lt;p&gt;I would be a little nervous about letting people VSCode-remote-edit stuff on dev servers, and apoplectic if that happened during an incident on something in production. &lt;/p&gt;

&lt;p&gt;It turns out we don’t have to care about any of this to get a custom connection to a Fly Machine working in VSCode, so none of this matters in any kind of deep way, but: we’ve decided to just be a blog again, so: we had to learn this, and now you do too.&lt;/p&gt;</content>
    </entry>
    <entry>
        <title>AI GPU Clusters, From Your Laptop, With Livebook</title>
        <link rel="alternate" href="https://fly.io/blog/ai-gpu-clusters-from-your-laptop-livebook/"/>
        <id>https://fly.io/blog/ai-gpu-clusters-from-your-laptop-livebook/</id>
        <published>2024-09-24T00:00:00+00:00</published>
        <updated>2025-05-20T16:26:25+00:00</updated>
        <media:thumbnail url="https://fly.io/blog/ai-gpu-clusters-from-your-laptop-livebook/assets/ai-gpu-livebook-thumb.webp"/>
        <content type="html">&lt;div class="lead"&gt;&lt;p&gt;Livebook, FLAME, and the Nx stack: three Elixir components that are easy to describe, more powerful than they look, and intricately threaded into the Elixir ecosystem. A few weeks ago, Chris McCord (👋) and Chris Grainger showed them off at ElixirConf 2024. We thought the talk was worth a recap.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;Let&amp;rsquo;s begin by introducing our cast of characters.&lt;/p&gt;

&lt;p&gt;&lt;a href='https://livebook.dev/' title=''&gt;Livebook&lt;/a&gt; is usually described as Elixir&amp;rsquo;s answer to &lt;a href='https://jupyter.org/' title=''&gt;Jupyter Notebooks&lt;/a&gt;. And that&amp;rsquo;s a good way to think about it. But Livebook takes full advantage of the Elixir platform, which makes it sneakily powerful. By linking up directly with Elixir app clusters, Livebook can switch easily between driving compute locally or on remote servers, and makes it easy to bring in any kind of data into reproducible workflows.&lt;/p&gt;

&lt;p&gt;&lt;a href='https://fly.io/blog/rethinking-serverless-with-flame/' title=''&gt;FLAME&lt;/a&gt; is the Elixir&amp;rsquo;s answer to serverless computing. By having the library manage a pool of executors for you, FLAME lets you treat your entire application as if it was elastic and scale-to-zero. You configure FLAME with some basic information about where to run code and how many instances it&amp;rsquo;s allowed to run with, and then mark off any arbitrary section of code with &lt;code&gt;Flame.call&lt;/code&gt;. The framework takes care of the rest. It&amp;rsquo;s the upside of serverless without committing yourself to blowing your app apart into tiny, intricately connected pieces.&lt;/p&gt;

&lt;p&gt;The &lt;a href='https://github.com/elixir-nx' title=''&gt;Nx stack&lt;/a&gt; is how you do Elixir-native AI and ML. Nx gives you an Elixir-native notion of tensor computations with GPU backends. &lt;a href='https://github.com/elixir-nx/axon' title=''&gt;Axon&lt;/a&gt; builds a common interface for ML models on top of it. &lt;a href='https://github.com/elixir-nx/bumblebee' title=''&gt;Bumblebee&lt;/a&gt; makes those models available to any Elixir app that wants to download them, from just a couple lines of code.&lt;/p&gt;

&lt;p&gt;Here is quick video showing how to transfer a local tensor to a remote GPU, using Livebook, FLAME, and Nx:&lt;/p&gt;
&lt;div class="youtube-container" data-exclude-render&gt;
  &lt;div class="youtube-video"&gt;
    &lt;iframe
      width="100%"
      height="100%"
      src="https://www.youtube.com/embed/5ImP3gpUSkQ"
      frameborder="0"
      allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture"
      allowfullscreen&gt;
    &lt;/iframe&gt;
  &lt;/div&gt;
&lt;/div&gt;


&lt;p&gt;Let&amp;rsquo;s dive into the &lt;a href='https://www.youtube.com/watch?v=4qoHPh0obv0' title=''&gt;keynote&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id='poking-a-hole-in-your-infrastructure' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#poking-a-hole-in-your-infrastructure' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Poking a hole in your infrastructure&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;Any Livebook, including the one running on your laptop, can start a runtime running on a Fly Machine, in Fly.io&amp;rsquo;s public cloud. That Elixir machine will (by default) live in your default Fly.io organization, giving it networked access to all the other apps that might live there.&lt;/p&gt;

&lt;p&gt;This is an access control situation that mostly just does what you want it to do without asking. Unless you ask it to, Fly.io isn&amp;rsquo;t exposing anything to the Internet, or to other users of Fly.io. For instance: say we have a database we&amp;rsquo;re going to use to generate reports. It can hang out on our Fly organization, inside of a private network with no connectivity to the world. We can spin up a Livebook instance that can talk to it, without doing any network or infrastructure engineering to make that happen.&lt;/p&gt;

&lt;p&gt;But wait, there&amp;rsquo;s more. Because this is all Elixir, Livebook also allows you to connect to any running Erlang/Elixir application in your infrastructure to debug, introspect, and monitor them.&lt;/p&gt;

&lt;p&gt;Check out this clip of Chris McCord connecting &lt;a href='https://rtt.fly.dev/' title=''&gt;to an existing application&lt;/a&gt; during the keynote:&lt;/p&gt;
&lt;div class="youtube-container" data-exclude-render&gt;
  &lt;div class="youtube-video"&gt;
    &lt;iframe
      width="100%"
      height="100%"
      src="https://www.youtube.com/embed/4qoHPh0obv0?start=1106"
      frameborder="0"
      allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture"
      allowfullscreen&gt;
    &lt;/iframe&gt;
  &lt;/div&gt;
&lt;/div&gt;


&lt;p&gt;Running a snippet of code from a laptop on a remote server is a neat trick, but Livebook is doing something deeper than that. It&amp;rsquo;s taking advantage of Erlang/Elixir&amp;rsquo;s native facility with cluster computation and making it available to the notebook. As a result, when we do things like auto-completing, Livebook delivers results from modules defined on the remote note itself. 🤯&lt;/p&gt;
&lt;h2 id='elastic-scale-with-flame' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#elastic-scale-with-flame' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Elastic scale with FLAME&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;When we first introduced FLAME, the example we used was video encoding.&lt;/p&gt;

&lt;p&gt;Video encoding is complicated and slow enough that you&amp;rsquo;d normally make arrangements to run it remotely or in a background job queue, or as a triggerable Lambda function. The point of FLAME is to get rid of all those steps, and give them over to the framework instead. So: we wrote our &lt;code&gt;ffpmeg&lt;/code&gt; calls inline like normal code, as if they were going to complete in microseconds, and wrapped them in &lt;code&gt;Flame.call&lt;/code&gt; blocks. That was it, that was the demo.&lt;/p&gt;

&lt;p&gt;Here, we&amp;rsquo;re going to put a little AI spin on it.&lt;/p&gt;

&lt;p&gt;The first thing we&amp;rsquo;re doing here is driving FLAME pools from Livebook. Livebook will automatically synchronize your notebook dependencies as well as any module or code defined in your notebook across nodes. That means any code we write in our notebook can be dispatched transparently out to arbitrarily many compute nodes, without ceremony.&lt;/p&gt;

&lt;p&gt;Now let&amp;rsquo;s add some AI flair. We take an object store bucket full of video files. We use &lt;code&gt;ffmpeg&lt;/code&gt; to extract stills from the video at different moments. Then: we send them to &lt;a href='https://www.llama.com/' title=''&gt;Llama&lt;/a&gt;, running on &lt;a href='https://fly.io/gpu' title=''&gt;GPU Fly Machines&lt;/a&gt; (still locked to our organization), to get descriptions of the stills.&lt;/p&gt;

&lt;p&gt;All those stills and descriptions get streamed back to our notebook, in real time:&lt;/p&gt;
&lt;div class="youtube-container" data-exclude-render&gt;
  &lt;div class="youtube-video"&gt;
    &lt;iframe
      width="100%"
      height="100%"
      src="https://www.youtube.com/embed/4qoHPh0obv0?start=1692"
      frameborder="0"
      allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture"
      allowfullscreen&gt;
    &lt;/iframe&gt;
  &lt;/div&gt;
&lt;/div&gt;


&lt;p&gt;At the end, the descriptions are sent to &lt;a href='https://mistral.ai/' title=''&gt;Mistral&lt;/a&gt;, which builds a summary.&lt;/p&gt;

&lt;p&gt;Thanks to FLAME, we get explicit control over the minimum and the maximum amount of nodes you want running at once, as well their concurrency settings. As nodes finish processing each video, new ones are automatically sent to them, until the whole bucket has been traversed. Each node will automatically shut down after an idle timeout and the whole cluster terminates if you disconnect the Livebook runtime.&lt;/p&gt;

&lt;p&gt;Just like your app code, FLAME lets you take your notebook code designed to run locally, change almost nothing, and elastically execute it across ephemeral infrastructure.&lt;/p&gt;
&lt;h2 id='64-gpus-hyperparameter-tuning-on-a-laptop' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#64-gpus-hyperparameter-tuning-on-a-laptop' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;64-GPUs hyperparameter tuning on a laptop&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;Next, Chris Grainger, CTO of &lt;a href='https://amplified.ai/' title=''&gt;Amplified&lt;/a&gt;, takes the stage.&lt;/p&gt;

&lt;p&gt;For work at Amplified, Chris wants to analyze a gigantic archive of patents, on behalf of a client doing edible cannibinoid work. To do that, he uses a BERT model (BERT, from Google, is one of the OG &amp;ldquo;transformer&amp;rdquo; models, optimized for text comprehension).&lt;/p&gt;

&lt;p&gt;To make the BERT model effective for this task, he&amp;rsquo;s going to do a hyperparameter training run.&lt;/p&gt;

&lt;p&gt;This is a much more complicated AI task than the Llama work we just showed up. Chris is going to generate a cluster of 64 GPU Fly Machines, each running an &lt;a href='https://fly.io/blog/cutting-prices-for-l40s-gpus-in-half/' title=''&gt;L40s GPU&lt;/a&gt;. On each of these nodes, he needs to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;setup its environment (including native dependencies and GPU bindings)
&lt;/li&gt;&lt;li&gt;load the training data
&lt;/li&gt;&lt;li&gt;compile a different version of BERT with different parameters, optimizers, etc.
&lt;/li&gt;&lt;li&gt;start the fine-tuning
&lt;/li&gt;&lt;li&gt;stream its results in real-time to each assigned chart
&lt;/li&gt;&lt;/ul&gt;

&lt;p&gt;Here&amp;rsquo;s the clip. You&amp;rsquo;ll see the results stream in, in real time, directly back to his Livebook. We&amp;rsquo;ll wait, because it won&amp;rsquo;t take long to watch:&lt;/p&gt;
&lt;div class="youtube-container" data-exclude-render&gt;
  &lt;div class="youtube-video"&gt;
    &lt;iframe
      width="100%"
      height="100%"
      src="https://www.youtube.com/embed/4qoHPh0obv0?start=3344"
      frameborder="0"
      allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture"
      allowfullscreen&gt;
    &lt;/iframe&gt;
  &lt;/div&gt;
&lt;/div&gt;

&lt;h2 id='this-is-just-the-beginning' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#this-is-just-the-beginning' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;This is just the beginning&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;The suggestion of mixing Livebook and FLAME to elastically scale notebook execution was originally proposed by Chris Grainger during ElixirConf EU. During the next four months, Jonatan Kłosko, Chris McCord, and José Valim worked part-time on making it a reality in time for ElixirConf US. Our ability to deliver such a rich combination of features in such a short period of time is a testament to the capabilities of the Erlang Virtual Machine, which Elixir and Livebook runs on. Other features, such as &lt;a href='https://github.com/elixir-explorer/explorer/issues/932' title=''&gt;remote dataframes and distributed GC&lt;/a&gt;, were implemented in a weekend.  Bringing the same functionality to other ecosystems would take several additional months, sometimes accompanied by millions in funding, and often times as part of a closed-source product.&lt;/p&gt;

&lt;p&gt;Furthermore, since we announced this feature, &lt;a href='https://github.com/mruoss' title=''&gt;Michael Ruoss&lt;/a&gt; stepped in and brought the same functionality to Kubernetes. From Livebook v0.14.1, you can start Livebook runtimes inside a Kubernetes cluster and also use FLAME to elastically scale them. Expect more features and news in this space!&lt;/p&gt;

&lt;p&gt;Finally, Fly&amp;rsquo;s infrastructure played a key role in making it possible to start a cluster of GPUs in seconds rather than minutes, and all it requires is a Docker image. We&amp;rsquo;re looking forward to see how other technologies and notebook platforms can leverage Fly to also elevate their developer experiences.&lt;/p&gt;
&lt;figure class="post-cta"&gt;
  &lt;figcaption&gt;
    &lt;h1&gt;Launch a GPU app in seconds&lt;/h1&gt;
    &lt;p&gt;Run your own LLMs or use Livebook for elastic GPU workflows&amp;nbsp✨&lt;/p&gt;
      &lt;a class="btn btn-lg" href="/gpu"&gt;
        Go! &amp;nbsp;&lt;span class='opacity-50'&gt;→&lt;/span&gt;
      &lt;/a&gt;
  &lt;/figcaption&gt;
  &lt;div class="image-container"&gt;
    &lt;img src="/static/images/cta-rabbit.webp" srcset="/static/images/[email protected] 2x" alt=""&gt;
  &lt;/div&gt;
&lt;/figure&gt;</content>
    </entry>
    <entry>
        <title>Accident Forgiveness</title>
        <link rel="alternate" href="https://fly.io/blog/accident-forgiveness/"/>
        <id>https://fly.io/blog/accident-forgiveness/</id>
        <published>2024-08-21T00:00:00+00:00</published>
        <updated>2025-05-20T16:26:25+00:00</updated>
        <media:thumbnail url="https://fly.io/blog/accident-forgiveness/assets/money-for-mistakes-blog-thumb.webp"/>
        <content type="html">&lt;div class="lead"&gt;&lt;p&gt;We’re Fly.io, a new public cloud with simple, developer-friendly ergonomics. &lt;a href="https://fly.io/speedrun" title=""&gt;Try it out; you’ll be deployed in just minutes&lt;/a&gt;, and, as you’re about to read, with less financial risk.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;Public cloud billing is terrifying.&lt;/p&gt;

&lt;p&gt;The premise of a public cloud &amp;mdash; what sets it apart from a hosting provider &amp;mdash; is 8,760 hours/year of on-tap deployable compute, storage, and networking. Cloud resources are &amp;ldquo;elastic&amp;rdquo;: they&amp;rsquo;re acquired and released as needed; in the &amp;ldquo;cloud-iest&amp;rdquo; apps, without human intervention. Public cloud resources behave like utilities, and that&amp;rsquo;s how they&amp;rsquo;re priced.&lt;/p&gt;

&lt;p&gt;You probably can&amp;rsquo;t tell me how much electricity your home is using right now, and  may only come within tens of dollars of accurately predicting your water bill. But neither of those bills are all that scary, because you assume there&amp;rsquo;s a limit to how much you could run them up in a single billing interval.&lt;/p&gt;

&lt;p&gt;That&amp;rsquo;s not true of public clouds. There are only so many ways to &amp;ldquo;spend&amp;rdquo; water at your home, but there are indeterminably many ways to land on a code path that grabs another VM, or to miskey a configuration, or to leak long-running CI/CD environments every time a PR gets merged. Pick a practitioner at random, and I bet they&amp;rsquo;ve read a story within the last couple months about someone running up a galactic-scale bill at some provider or other.&lt;/p&gt;
&lt;h2 id='implied-accident-forgiveness' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#implied-accident-forgiveness' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Implied Accident Forgiveness&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;For people who don&amp;rsquo;t do a lot of cloud work, what all this means is that every PR push sets off a little alarm in the back of their heads: &amp;ldquo;you may have just incurred $200,000 of costs!&amp;rdquo;. The alarm is quickly silenced,  though it&amp;rsquo;s still subtly extracting a cortisol penalty. But by deadening the nerves that sense the danger of unexpected charges, those people are nudged closer to themselves being the next story on Twitter about an accidental $200,000 bill.&lt;/p&gt;

&lt;p&gt;The saving grace here, which you&amp;rsquo;ll learn if you ever become that $200,000 story, is that nobody pays those bills.&lt;/p&gt;

&lt;p&gt;See, what cloud-savvy people know already is that providers have billing support teams, which spend a big chunk of their time conceding disputed bills. If you do something luridly stupid and rack up costs, AWS and GCP will probably cut you a break. We will too. Everyone does.&lt;/p&gt;

&lt;p&gt;If you didn&amp;rsquo;t already know this, you&amp;rsquo;re welcome; I&amp;rsquo;ve made your life a little better, even if you don&amp;rsquo;t run things on Fly.io.&lt;/p&gt;

&lt;p&gt;But as soothing as it is to know you can get a break from cloud providers, the billing situation here is still a long ways away from &amp;ldquo;good&amp;rdquo;. If you accidentally add a zero to a scale count and don&amp;rsquo;t notice for several weeks, AWS or GCP will probably cut you a break. But they won&amp;rsquo;t &lt;em&gt;definitely&lt;/em&gt; do it, and even though your odds are good, you&amp;rsquo;re still finding out at email- and phone-tag scale speeds. That&amp;rsquo;s not fun!&lt;/p&gt;
&lt;h2 id='explicit-accident-forgiveness' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#explicit-accident-forgiveness' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Explicit Accident Forgiveness&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;Charging you for stuff you didn&amp;rsquo;t want is bad business.&lt;/p&gt;

&lt;p&gt;Good business, we think, means making you so comfortable with your cloud you try new stuff. You, and everyone else on your team. Without a chaperone from the finance department.&lt;/p&gt;

&lt;p&gt;So we&amp;rsquo;re going to do the work to make this official. If you&amp;rsquo;re a customer of ours, we&amp;rsquo;re going to charge you in exacting detail for every resource you intentionally use of ours, but if something blows up and you get an unexpected bill, we&amp;rsquo;re going to let you off the hook.&lt;/p&gt;
&lt;h2 id='not-so-fast' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#not-so-fast' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Not So Fast&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;This is a Project, with a capital P. While we&amp;rsquo;re kind of kicking ourselves for not starting it earlier, there are reasons we couldn&amp;rsquo;t do it back in 2020.&lt;/p&gt;

&lt;p&gt;The Fully Automated Accident-Forgiving Billing System of the Future (which we are in fact building and may even one day ship) will give you a line-item veto on your invoice. We are a long ways away. The biggest reason is fraud.&lt;/p&gt;

&lt;p&gt;Sit back, close your eyes, and try to think about everything public clouds do to make your life harder. Chances are, most of those things are responses to fraud. Cloud platforms attract fraudsters like ants to an upturned ice cream cone. Thanks to the modern science of cryptography, fraudsters have had a 15 year head start on turning metered compute into picodollar-granular near-money assets.&lt;/p&gt;

&lt;p&gt;Since there&amp;rsquo;s no bouncer at the door checking IDs here, an open-ended and automated commitment to accident forgiveness is, with iron certainty, going to be used overwhelmingly in order to trick us into &amp;ldquo;forgiving&amp;rdquo; cryptocurrency miners. We&amp;rsquo;re cloud platform engineers. They&amp;rsquo;re our primary pathogen.&lt;/p&gt;

&lt;p&gt;So, we&amp;rsquo;re going to roll this out incrementally.&lt;/p&gt;
&lt;div class="callout"&gt;&lt;p&gt;&lt;strong class="font-semibold text-navy-950"&gt;Why not billing alerts?&lt;/strong&gt; We’ll get there, but here too there are reasons we haven’t yet: (1) meaningful billing alerts were incredibly difficult to do with our previous billing system, and building the new system and migrating our customers to it has been a huge lift, a nightmare from which we are only now waking (the billing system’s official name); and (2) we’re wary about alerts being a product design cop-out; if we can alert on something, why aren’t we fixing it?&lt;/p&gt;
&lt;/div&gt;&lt;h2 id='accident-forgiveness-v0-84beta' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#accident-forgiveness-v0-84beta' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Accident Forgiveness v0.84beta&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;All the same subtextual, implied reassurances that every cloud provider offers remain in place at Fly.io. You are strictly better off after this announcement, we promise.&lt;/p&gt;
&lt;div class="right-sidenote"&gt;&lt;p&gt;I added the “almost” right before publishing, because I’m chicken.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;Now: for customers that have a support contract with us, at any level, there&amp;rsquo;s something new: I&amp;rsquo;m saying the quiet part loud. The next time you see a bill with an unexpected charge on it, we&amp;rsquo;ll refund that charge, (almost) no questions asked.&lt;/p&gt;

&lt;p&gt;That policy is so simple it feels anticlimactic to write. So, some additional color commentary:&lt;/p&gt;

&lt;p&gt;We&amp;rsquo;re not advertising a limit to the number of times you can do this. If you&amp;rsquo;re a serious customer of ours, I promise that you cannot remotely fathom the fullness of our fellow-feeling. You&amp;rsquo;re not annoying us by getting us to refund unexpected charges. If you are growing a project on Fly.io, we will bend over backwards to keep you growing.&lt;/p&gt;

&lt;p&gt;How far can we take this? How simple can we keep this policy? We&amp;rsquo;re going to find out together.&lt;/p&gt;

&lt;p&gt;To begin with, and in the spirit of &amp;ldquo;doing things that won&amp;rsquo;t scale&amp;rdquo;, when we forgive a bill, what&amp;rsquo;s going to happen next is this: I&amp;rsquo;m going to set an irritating personal reminder for Kurt to look into what happened, now and then the day before your next bill, so we can see what&amp;rsquo;s going wrong. He&amp;rsquo;s going to hate that, which is the point: our best feature work is driven by Kurt-hate.&lt;/p&gt;

&lt;p&gt;Obviously, if you&amp;rsquo;re rubbing your hands together excitedly over the opportunity this policy presents, then, well, not so much with the fellow-feeling. We reserve the right to cut you off.&lt;/p&gt;
&lt;figure class="post-cta"&gt;
  &lt;figcaption&gt;
    &lt;h1&gt;Support For Developers, By Developers&lt;/h1&gt;
    &lt;p&gt;Explicit Accident Forgiveness is just one thing we like about Support at Fly.io.&lt;/p&gt;
      &lt;a class="btn btn-lg" href="https://fly.io/accident-forgiveness"&gt;
        Go find out! &amp;nbsp;&lt;span class='opacity-50'&gt;→&lt;/span&gt;
      &lt;/a&gt;
  &lt;/figcaption&gt;
  &lt;div class="image-container"&gt;
    &lt;img src="/static/images/cta-dog.webp" srcset="/static/images/[email protected] 2x" alt=""&gt;
  &lt;/div&gt;
&lt;/figure&gt;

&lt;h2 id='whats-next-accident-protection' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#whats-next-accident-protection' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;What&amp;rsquo;s Next: Accident Protection&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;We think this is a pretty good first step. But that&amp;rsquo;s all it is.&lt;/p&gt;

&lt;p&gt;We can do better than offering you easy refunds for mistaken deployments and botched CI/CD jobs. What&amp;rsquo;s better than getting a refund is never incurring the charge to begin with, and that&amp;rsquo;s the next step we&amp;rsquo;re working on.&lt;/p&gt;
&lt;div class="right-sidenote"&gt;&lt;p&gt;More to come on that billing system.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;We built a new billing system so that we can do things like that. For instance: we&amp;rsquo;re in a position to catch sudden spikes in your month-over-month bills, flag them, and catch weird-looking deployments before we bill for them.&lt;/p&gt;

&lt;p&gt;Another thing we rebuilt billing for is &lt;a href='https://community.fly.io/t/reservation-blocks-40-discount-on-machines-when-youre-ready-to-commit/20858' title=''&gt;reserved pricing&lt;/a&gt;. Already today you can get a steep discount from us reserving blocks of compute in advance. The trick to taking advantage of reserved pricing is confidently predicting a floor to your usage. For a lot of people, that means fighting feelings of loss aversion (nobody wants to get gym priced!). So another thing we can do in this same vein: catch opportunities to move customers to reserved blocks, and offer backdated reservations. We&amp;rsquo;ll figure this out too.&lt;/p&gt;

&lt;p&gt;Someday, when we&amp;rsquo;re in a monopoly position, our founders have all been replaced by ruthless MBAs, and Kurt has retired to farm coffee beans in lower Montana, we may stop doing this stuff. But until that day this is the right choice for our business.&lt;/p&gt;

&lt;p&gt;Meanwhile: like every public cloud, we provision our own hardware, and we have excess capacity. Your messed-up CI/CD jobs didn&amp;rsquo;t really cost us anything, so if you didn&amp;rsquo;t really want them, they shouldn&amp;rsquo;t cost you anything either. Take us up on this! We love talking to you.&lt;/p&gt;</content>
    </entry>
    <entry>
        <title>We're Cutting L40S Prices In Half</title>
        <link rel="alternate" href="https://fly.io/blog/cutting-prices-for-l40s-gpus-in-half/"/>
        <id>https://fly.io/blog/cutting-prices-for-l40s-gpus-in-half/</id>
        <published>2024-08-15T00:00:00+00:00</published>
        <updated>2025-05-20T16:26:25+00:00</updated>
        <media:thumbnail url="https://fly.io/blog/cutting-prices-for-l40s-gpus-in-half/assets/gpu-ga-thumb.webp"/>
        <content type="html">&lt;div class="lead"&gt;&lt;p&gt;We’re Fly.io, a new public cloud with simple, developer-friendly ergonomics. And as of today, cheaper GPUs. &lt;a href="https://fly.io/speedrun" title=""&gt;Try it out; you’ll be deployed in just minutes&lt;/a&gt;.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;We just lowered the prices on NVIDIA L40s GPUs to $1.25 per hour. Why? Because our feet are cold and we burn processor cycles for heat. But also other reasons.&lt;/p&gt;

&lt;p&gt;Let&amp;rsquo;s back up.&lt;/p&gt;

&lt;p&gt;We offer 4 different NVIDIA GPU models; in increasing order of performance, they&amp;rsquo;re the A10, the L40S, the 40G PCI A100, and the 80G SXM A100.  Guess which one is most popular.&lt;/p&gt;

&lt;p&gt;We guessed wrong, and spent a lot of time working out how to maximize the amount of GPU power we could deliver to a single Fly Machine. Users surprised us. By a wide margin, the most popular GPU in our inventory is the A10.&lt;/p&gt;

&lt;p&gt;The A10 is an older generation of NVIDIA GPU with fewer, slower cores and less memory. It&amp;rsquo;s the least capable GPU we offer. But that doesn&amp;rsquo;t matter, because it&amp;rsquo;s capable enough. It&amp;rsquo;s solid for random inference tasks, and handles mid-sized generative AI stuff like Mistral Nemo or Stable Diffusion. For those workloads, there&amp;rsquo;s not that much benefit in getting a beefier GPU.&lt;/p&gt;

&lt;p&gt;As a result, we can&amp;rsquo;t get new A10s in fast enough for our users.&lt;/p&gt;

&lt;p&gt;If there&amp;rsquo;s one thing we&amp;rsquo;ve learned by talking to our customers over the last 4 years, it&amp;rsquo;s that y&amp;#39;all love a peek behind the curtain. So we&amp;rsquo;re going to let you in on a little secret about how a hardware provider like Fly.io formulates GPU strategy: none of us know what the hell we&amp;rsquo;re doing.&lt;/p&gt;

&lt;p&gt;If you had asked us in 2023 what the biggest GPU problem we could solve was, we&amp;rsquo;d have said &amp;ldquo;selling fractional A100 slices&amp;rdquo;. We burned a whole quarter trying to get MIG, or at least vGPUs, working through IOMMU PCI passthrough on Fly Machines, in a project so cursed that Thomas has forsworn ever programming again. Then we went to market selling whole A100s, and for several more months it looked like the biggest problem we needed to solve was finding a secure way to expose NVLink-ganged A100 clusters to VMs so users could run training. Then H100s; can we find H100s anywhere? Maybe in a black market in Shenzhen?&lt;/p&gt;

&lt;p&gt;And here we are, a year later, looking at the data, and the least sexy, least interesting GPU part in the catalog is where all the action is.&lt;/p&gt;

&lt;p&gt;With actual customer data to back up the hypothesis, here&amp;rsquo;s what we think is happening today:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Most users who want to plug GPU-accelerated AI workloads into fast networks are doing inference, not training. 
&lt;/li&gt;&lt;li&gt;The hyperscaler public clouds strangle these customers, first with GPU instance surcharges, and then with egress fees for object storage data when those customers try to outsource the GPU stuff to GPU providers.
&lt;/li&gt;&lt;li&gt;If you&amp;rsquo;re trying to do something GPU-accelerated in response to an HTTP request, the right combination of GPU, instance RAM, fast object storage for datasets and model parameters, and networking is much more important than getting your hands on an H100.
&lt;/li&gt;&lt;/ul&gt;

&lt;p&gt;This is a thing we didn&amp;rsquo;t see coming, but should have: training workloads tend to look more like batch jobs, and inference tends to look more like transactions. Batch training jobs aren&amp;rsquo;t that sensitive to networking or even reliability. Live inference jobs responding to end-user HTTP requests are. So, given our pricing, of course the A10s are a sweet spot.&lt;/p&gt;

&lt;p&gt;The next step up in our lineup after the A10 is the L40S. The L40S is a nice piece of kit. We&amp;rsquo;re going to take a beat here and sell you on the L40S, because it&amp;rsquo;s kind of awesome.&lt;/p&gt;

&lt;p&gt;The L40S is an AI-optimized version of the L40, which is the data center version of the GeForce RTX 4090, resembling two 4090s stapled together.&lt;/p&gt;

&lt;p&gt;If you&amp;rsquo;re not a GPU hardware person, the RTX 4090 is a gaming GPU, the kind you&amp;rsquo;d play ray-traced Witcher 3 on. NVIDIA&amp;rsquo;s high-end gaming GPUs are actually reasonably good at AI workloads! But they suck in a data center rack: they chug power, they&amp;rsquo;re hard to cool, and they&amp;rsquo;re less dense. Also, NVIDIA can&amp;rsquo;t charge as much for them.&lt;/p&gt;

&lt;p&gt;Hence the L40: (much) more memory, less energy consumption, designed for a rack, not a tower case. Marked up for &amp;ldquo;enterprise&amp;rdquo;.&lt;/p&gt;

&lt;p&gt;NVIDIA positioned the L40 as a kind of &amp;ldquo;graphics&amp;rdquo; AI GPU. Unlike the super high-end cards like the A100/H100, the L40 keeps all the rendering hardware, so it&amp;rsquo;s good for 3D graphics and video processing. Which is sort of what you&amp;rsquo;d expect from a &amp;ldquo;professionalized&amp;rdquo; GeForce card.&lt;/p&gt;

&lt;p&gt;A funny thing happened in the middle of 2023, though: the market for ultra-high-end NVIDIA cards went absolutely batshit. The huge cards you&amp;rsquo;d gang up for training jobs got impossible to find, and NVIDIA became one of the most valuable companies in the world. Serious shops started working out plans to acquire groups of L40-type cards to work around the problem, whether or not they had graphics workloads.&lt;/p&gt;

&lt;p&gt;The only company in this space that does know what they&amp;rsquo;re doing is NVIDIA. Nobody has written a highly-ranked Reddit post about GPU workloads without NVIDIA noticing and creating a new SKU. So they launched the L40S, which is an L40 with AI workload compute performance comparable to that of the A100 (without us getting into the details of F32 vs. F16 models).&lt;/p&gt;

&lt;p&gt;Long story short, the L40S is an A100-performer that we can price for A10 customers; the Volkswagen GTI of our lineup. We&amp;rsquo;re going to see if we can make that happen.&lt;/p&gt;

&lt;p&gt;We think the combination of just-right-sized inference GPUs and Tigris object storage is pretty killer:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;model parameters, data sets, and compute are all close together
&lt;/li&gt;&lt;li&gt;everything plugged into an Anycast network that&amp;rsquo;s fast everywhere in the world
&lt;/li&gt;&lt;li&gt;on VM instances that have enough memory to actually run real frameworks on
&lt;/li&gt;&lt;li&gt;priced like we actually want you to use it.
&lt;/li&gt;&lt;/ul&gt;

&lt;p&gt;You should use L40S cards without thinking hard about it. So we&amp;rsquo;re making it official. You won&amp;rsquo;t pay us a dime extra to use one instead of an A10. Have at it! Revolutionize the industry. For $1.25 an hour.&lt;/p&gt;

&lt;p&gt;Here are things you can do with an L40S on Fly.io today:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You can run Llama 3.1 70B — a big Llama — for LLM jobs.
&lt;/li&gt;&lt;li&gt;You can run Flux from Black Forest Labs for genAI images.
&lt;/li&gt;&lt;li&gt;You can run Whisper for automated speech recognition.
&lt;/li&gt;&lt;li&gt;You can do whole-genome alignment with SegAlign (Thomas&amp;rsquo; biochemist kid who has been snatching free GPU hours for his lab gave us this one, and we&amp;rsquo;re taking his word for it).
&lt;/li&gt;&lt;li&gt;You can run DOOM Eternal, building the Stadia that Google couldn&amp;rsquo;t pull off, because the L40S hasn&amp;rsquo;t forgotten that it&amp;rsquo;s a graphics GPU. 
&lt;/li&gt;&lt;/ul&gt;

&lt;p&gt;It&amp;rsquo;s going to get chilly in Chicago in a month or so. Go light some cycles on fire! &lt;/p&gt;</content>
    </entry>
    <entry>
        <title>Making Machines Move</title>
        <link rel="alternate" href="https://fly.io/blog/machine-migrations/"/>
        <id>https://fly.io/blog/machine-migrations/</id>
        <published>2024-07-30T00:00:00+00:00</published>
        <updated>2025-05-20T16:26:25+00:00</updated>
        <media:thumbnail url="https://fly.io/blog/machine-migrations/assets/migrations-thumb.webp"/>
        <content type="html">&lt;div class="lead"&gt;&lt;p&gt;We’re Fly.io, a global public cloud with simple, developer-friendly ergonomics. If you’ve got a working Docker image, we’ll transmogrify it into a Fly Machine: a VM running on our hardware anywhere in the world. &lt;a href="https://fly.io/speedrun" title=""&gt;Try it out; you’ll be deployed in just minutes&lt;/a&gt;.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;At the heart of our platform is a systems design tradeoff about durable storage for applications.  When we added storage three years ago, to support stateful apps, we built it on attached NVMe drives. A benefit: a Fly App accessing a file on a Fly Volume is never more than a bus hop away from the data. A cost: a Fly App with an attached Volume is anchored to a particular worker physical.&lt;/p&gt;
&lt;div class="right-sidenote"&gt;&lt;p&gt;&lt;code&gt;bird&lt;/code&gt;: a BGP4 route server.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;Before offering attached storage, our on-call runbook was almost as simple as “de-bird that edge server”, “tell &lt;a href='https://fly.io/blog/carving-the-scheduler-out-of-our-orchestrator/' title=''&gt;Nomad&lt;/a&gt; to drain that worker”, and “go back to sleep”. NVMe cost us that drain operation, which terribly complicated the lives of our infra team. We’ve spent the last year getting “drain” back. It’s one of the biggest engineering lifts we&amp;rsquo;ve made, and if you didn’t notice, we lifted it cleanly.&lt;/p&gt;
&lt;h3 id='the-goalposts' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#the-goalposts' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;The Goalposts&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;With stateless apps, draining a worker is easy. For each app instance running on the victim server, start a new instance elsewhere. Confirm it&amp;rsquo;s healthy, then kill the old one. Rinse, repeat. At our 2020 scale, we could drain a fully loaded worker in just a handful of minutes.&lt;/p&gt;

&lt;p&gt;You can see why this process won&amp;rsquo;t work for apps with attached volumes. Sure, create a new volume elsewhere on the fleet, and boot up a new Fly Machine attached to it. But the new volume is empty. The data&amp;rsquo;s still stuck on the original worker. We asked, and customers were not OK with this kind of migration.&lt;/p&gt;

&lt;p&gt;Of course, we back Volumes snapshots up (at an interval) to off-network storage. But for “drain”, restoring backups isn&amp;rsquo;t nearly good enough. No matter the backup interval, a “restore from backup migration&amp;quot; will lose data, and a “backup and restore” migration  incurs untenable downtime.&lt;/p&gt;

&lt;p&gt;The next thought you have is, “OK, copy the volume over”. And, yes, of course you have to do that. But you can’t just &lt;code&gt;copy&lt;/code&gt;, &lt;code&gt;boot&lt;/code&gt;, and then &lt;code&gt;kill&lt;/code&gt; the old Fly Machine. Because the original Fly Machine is still alive and writing, you have to &lt;code&gt;kill&lt;/code&gt;first, then &lt;code&gt;copy&lt;/code&gt;, then &lt;code&gt;boot&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Fly Volumes can get pretty big. Even to a rack buddy physical server, you&amp;rsquo;ll hit a point where draining incurs minutes of interruption, especially if you’re moving lots of volumes simultaneously. &lt;code&gt;Kill&lt;/code&gt;, &lt;code&gt;copy&lt;/code&gt;, &lt;code&gt;boot&lt;/code&gt; is too slow.&lt;/p&gt;
&lt;div class="callout"&gt;&lt;p&gt;There’s a world where even 15 minutes of interruption is tolerable. It’s the world where you run more than one instance of your application to begin with, so prolonged interruption of a single Fly Machine isn’t visible to your users. Do this! But we have to live in the same world as our customers, many of whom don’t run in high-availability configurations.&lt;/p&gt;
&lt;/div&gt;&lt;h3 id='behold-the-clone-o-mat' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#behold-the-clone-o-mat' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Behold The Clone-O-Mat&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;&lt;code&gt;Copy&lt;/code&gt;, &lt;code&gt;boot&lt;/code&gt;, &lt;code&gt;kill&lt;/code&gt; loses data. &lt;code&gt;Kill&lt;/code&gt;, &lt;code&gt;copy&lt;/code&gt;, &lt;code&gt;boot&lt;/code&gt; takes too long. What we needed is a new operation: &lt;code&gt;clone&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;Clone&lt;/code&gt; is a lazier, asynchronous &lt;code&gt;copy&lt;/code&gt;. It creates a new volume elsewhere on our fleet, just like &lt;code&gt;copy&lt;/code&gt; would. But instead of blocking, waiting to transfer every byte from the original volume, &lt;code&gt;clone&lt;/code&gt; returns immediately, with a transfer running in the background.&lt;/p&gt;

&lt;p&gt;A new Fly Machine can be booted with that cloned volume attached. Its blocks are mostly empty. But that’s OK: when the new Fly Machine tries to read from it, the block storage system works out whether the block has been transferred. If it hasn’t, it’s fetched over the network from the original volume; this is called &amp;ldquo;hydration&amp;rdquo;. Writes are even easier, and don’t hit the network at all.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;Kill&lt;/code&gt;, &lt;code&gt;copy&lt;/code&gt;, &lt;code&gt;boot&lt;/code&gt; is slow. But &lt;code&gt;kill&lt;/code&gt;, &lt;code&gt;clone&lt;/code&gt;, &lt;code&gt;boot&lt;/code&gt; is fast; it can be made asymptotically as fast as stateless migration.&lt;/p&gt;

&lt;p&gt;There are three big moving pieces to this design.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;First, we have to rig up our OS storage system to make this &lt;code&gt;clone&lt;/code&gt; operation work.
&lt;/li&gt;&lt;li&gt;Then, to read blocks over the network, we need a network protocol. (Spoiler: iSCSI, though we tried other stuff first.)
&lt;/li&gt;&lt;li&gt;Finally, the gnarliest of those pieces is our orchestration logic: what’s running where, what state is it in, and whether it’s plugged in correctly.
&lt;/li&gt;&lt;/ol&gt;
&lt;h3 id='block-level-clone' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#block-level-clone' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Block-Level Clone&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;The Linux feature we need to make this work already exists; &lt;a href='https://docs.kernel.org/admin-guide/device-mapper/dm-clone.html' title=''&gt;it’s called &lt;code&gt;dm-clone&lt;/code&gt;&lt;/a&gt;. Given an existing, readable storage device, &lt;code&gt;dm-clone&lt;/code&gt; gives us a new device, of identical size, where reads of uninitialized blocks will pull from the original. It sounds terribly complicated, but it’s actually one of the simpler kernel lego bricks. Let&amp;rsquo;s demystify it.&lt;/p&gt;

&lt;p&gt;As far as Unix is concerned, random-access storage devices, be they spinning rust or NVMe drives, are all instances of the common class “block device”. A block device is addressed in fixed-size (say, 4KiB) chunks, and &lt;a href='https://elixir.bootlin.com/linux/v5.11.11/source/include/linux/blk_types.h#L356' title=''&gt;handles (roughly) these operations&lt;/a&gt;:&lt;/p&gt;
&lt;div class="highlight-wrapper group relative cpp"&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-wrap-target="#code-9s54so8p"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-copy-target="sibling"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight '&gt;&lt;code id="code-9s54so8p"&gt;&lt;span class="k"&gt;enum&lt;/span&gt; &lt;span class="n"&gt;req_opf&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="cm"&gt;/* read sectors from the device */&lt;/span&gt;
    &lt;span class="n"&gt;REQ_OP_READ&lt;/span&gt;     &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="cm"&gt;/* write sectors to the device */&lt;/span&gt;
    &lt;span class="n"&gt;REQ_OP_WRITE&lt;/span&gt;        &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="cm"&gt;/* flush the volatile write cache */&lt;/span&gt;
    &lt;span class="n"&gt;REQ_OP_FLUSH&lt;/span&gt;        &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="cm"&gt;/* discard sectors */&lt;/span&gt;
    &lt;span class="n"&gt;REQ_OP_DISCARD&lt;/span&gt;      &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="cm"&gt;/* securely erase sectors */&lt;/span&gt;
    &lt;span class="n"&gt;REQ_OP_SECURE_ERASE&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="cm"&gt;/* write the same sector many times */&lt;/span&gt;
    &lt;span class="n"&gt;REQ_OP_WRITE_SAME&lt;/span&gt;   &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;7&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="cm"&gt;/* write the zero filled sector many times */&lt;/span&gt;
    &lt;span class="n"&gt;REQ_OP_WRITE_ZEROES&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;9&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="cm"&gt;/* ... */&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;You can imagine designing a simple network protocol that supported all these options. It might have messages that looked something like:&lt;/p&gt;

&lt;p&gt;&lt;img alt="A packet diagram, just skip down to &amp;quot;struct bio&amp;quot; below" src="/blog/machine-migrations/assets/packet.png?2/3&amp;amp;center" /&gt;
Good news! The Linux block system is organized as if your computer was a network running a protocol that basically looks just like that. Here’s the message structure:&lt;/p&gt;
&lt;div class="right-sidenote"&gt;&lt;p&gt;I’ve &lt;a href="https://elixir.bootlin.com/linux/v5.11.11/source/include/linux/blk_types.h#L223" title=""&gt;stripped a bunch of stuff out of here&lt;/a&gt; but you don’t need any of it to understand what’s coming next.&lt;/p&gt;
&lt;/div&gt;&lt;div class="highlight-wrapper group relative cpp"&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-wrap-target="#code-qqcnfpws"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-copy-target="sibling"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight '&gt;&lt;code id="code-qqcnfpws"&gt;&lt;span class="cm"&gt;/*
 * main unit of I/O for the block layer and lower layers (ie drivers and
 * stacking drivers)
 */&lt;/span&gt;
&lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="nc"&gt;bio&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="nc"&gt;gendisk&lt;/span&gt;      &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;bi_disk&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="kt"&gt;unsigned&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt;        &lt;span class="n"&gt;bi_opf&lt;/span&gt;
    &lt;span class="kt"&gt;unsigned&lt;/span&gt; &lt;span class="kt"&gt;short&lt;/span&gt;      &lt;span class="n"&gt;bi_flags&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;   
    &lt;span class="kt"&gt;unsigned&lt;/span&gt; &lt;span class="kt"&gt;short&lt;/span&gt;      &lt;span class="n"&gt;bi_ioprio&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="n"&gt;blk_status_t&lt;/span&gt;        &lt;span class="n"&gt;bi_status&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="kt"&gt;unsigned&lt;/span&gt; &lt;span class="kt"&gt;short&lt;/span&gt;      &lt;span class="n"&gt;bi_vcnt&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;         &lt;span class="cm"&gt;/* how many bio_vec's */&lt;/span&gt;
    &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="nc"&gt;bio_vec&lt;/span&gt;      &lt;span class="n"&gt;bi_inline_vecs&lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt; &lt;span class="cm"&gt;/* (page, len, offset) tuples */&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;No nerd has ever looked at a fixed-format message like this without thinking about writing a proxy for it, and &lt;code&gt;struct bio&lt;/code&gt; is no exception. The proxy system in the Linux kernel for &lt;code&gt;struct bio&lt;/code&gt; is called &lt;code&gt;device mapper&lt;/code&gt;, or DM.&lt;/p&gt;

&lt;p&gt;DM target devices can plug into other DM devices. For that matter, they can do whatever the hell else they want, as long as they honor the interface. It boils down to a &lt;code&gt;map(bio)&lt;/code&gt; function, which can dispatch a &lt;code&gt;struct bio&lt;/code&gt;, or drop it, or muck with it and ask the kernel to resubmit it.&lt;/p&gt;

&lt;p&gt;You can do a whole lot of stuff with this interface: carve a big device into a bunch of smaller ones (&lt;a href='https://docs.kernel.org/admin-guide/device-mapper/linear.html' title=''&gt;&lt;code&gt;dm-linear&lt;/code&gt;&lt;/a&gt;), make one big striped device out of a bunch of smaller ones (&lt;a href='https://docs.kernel.org/admin-guide/device-mapper/striped.html' title=''&gt;&lt;code&gt;dm-stripe&lt;/code&gt;&lt;/a&gt;), do software RAID mirroring (&lt;code&gt;dm-raid1&lt;/code&gt;), create snapshots of arbitrary existing devices (&lt;a href='https://docs.kernel.org/admin-guide/device-mapper/snapshot.html' title=''&gt;&lt;code&gt;dm-snap&lt;/code&gt;&lt;/a&gt;), cryptographically verify boot devices (&lt;a href='https://docs.kernel.org/admin-guide/device-mapper/verity.html' title=''&gt;&lt;code&gt;dm-verity&lt;/code&gt;&lt;/a&gt;), and a bunch more. Device Mapper is the kernel backend for the &lt;a href='https://sourceware.org/lvm2/' title=''&gt;userland LVM2 system&lt;/a&gt;, which is how we do &lt;a href='https://fly.io/blog/persistent-storage-and-fast-remote-builds/' title=''&gt;thin pools and snapshot backups&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Which brings us to &lt;code&gt;dm-clone&lt;/code&gt; : it’s a map function that boils down to:&lt;/p&gt;
&lt;div class="highlight-wrapper group relative cpp"&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-wrap-target="#code-fre1zh1l"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-copy-target="sibling"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight '&gt;&lt;code id="code-fre1zh1l"&gt;    &lt;span class="cm"&gt;/* ... */&lt;/span&gt; 
    &lt;span class="n"&gt;region_nr&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;bio_to_region&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;clone&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;bio&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="c1"&gt;// we have the data&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dm_clone_is_region_hydrated&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;clone&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;cmd&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;region_nr&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;remap_and_issue&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;clone&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;bio&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="c1"&gt;// we don't and it's a read&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;bio_data_dir&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;bio&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;READ&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;remap_to_source&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;clone&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;bio&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="c1"&gt;// we don't and it's a write&lt;/span&gt;
    &lt;span class="n"&gt;remap_to_dest&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;clone&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;bio&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="n"&gt;hydrate_bio_region&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;clone&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;bio&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="cm"&gt;/* ... */&lt;/span&gt; 
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;&lt;div class="right-sidenote"&gt;&lt;p&gt;a &lt;a href="https://docs.kernel.org/admin-guide/device-mapper/kcopyd.html" title=""&gt;&lt;code&gt;kcopyd&lt;/code&gt;&lt;/a&gt; thread runs in the background, rehydrating the device in addition to (and independent of) read accesses.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;&lt;code&gt;dm-clone&lt;/code&gt; takes, in addition to the source device to clone from, a “metadata” device on which is stored a bitmap of the status of all the blocks: either “rehydrated” from the source, or not. That’s how it knows whether to fetch a block from the original device or the clone.&lt;/p&gt;
&lt;h3 id='network-clone' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#network-clone' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Network Clone&lt;/span&gt;&lt;/h3&gt;&lt;div class="callout"&gt;&lt;p&gt;&lt;strong class="font-semibold text-navy-950"&gt;&lt;code&gt;flyd&lt;/code&gt; in a nutshell:&lt;/strong&gt; worker physical run a service, &lt;code&gt;flyd&lt;/code&gt;, which manages a couple of databases that are the source of truth for all the Fly Machines running there. Concepturally, &lt;code&gt;flyd&lt;/code&gt; is a server for on-demand instances of durable finite state machines, each representing some operation on a Fly Machine (creation, start, stop, &amp;amp;c), with the transition steps recorded carefully in a BoltDB database. An FSM step might be something like “assign a local IPv6 address to this interface”, or “check out a block device with the contents of this container”, and it’s straightforward to add and manage new ones.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;Say we&amp;rsquo;ve got &lt;code&gt;flyd&lt;/code&gt; managing a Fly Machine with a volume on &lt;code&gt;worker-xx-cdg1-1&lt;/code&gt;. We want it running on &lt;code&gt;worker-xx-cdg1-2&lt;/code&gt;. Our whole fleet is meshed with WireGuard; everything can talk directly to everything else. So, conceptually:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;code&gt;flyd&lt;/code&gt; on &lt;code&gt;cdg1-1&lt;/code&gt; stops the Fly Machine, and
&lt;/li&gt;&lt;li&gt;sends a message to &lt;code&gt;flyd&lt;/code&gt; on &lt;code&gt;cdg1-2&lt;/code&gt; telling it to clone the source volume.
&lt;/li&gt;&lt;li&gt;&lt;code&gt;flyd&lt;/code&gt; on &lt;code&gt;cdg1-2&lt;/code&gt; starts a &lt;code&gt;dm-clone&lt;/code&gt; instance, which creates a clone volume on &lt;code&gt;cdg1-2&lt;/code&gt;, populating it, over some kind of network block protocol, from &lt;code&gt;cdg1-1&lt;/code&gt;, and
&lt;/li&gt;&lt;li&gt;boots a new Fly Machine, attached to the clone volume.
&lt;/li&gt;&lt;li&gt;&lt;code&gt;flyd&lt;/code&gt; on &lt;code&gt;cdg1-2&lt;/code&gt; monitors the clone operation, and, when the clone completes, converts the clone device to a simple linear device and cleans up.
&lt;/li&gt;&lt;/ol&gt;

&lt;p&gt;For step (3) to work, the “original volume” on &lt;code&gt;cdg1-1&lt;/code&gt; has to be visible on &lt;code&gt;cdg1-2&lt;/code&gt;, which means we need to mount it over the network.&lt;/p&gt;
&lt;div class="right-sidenote"&gt;&lt;p&gt;&lt;code&gt;nbd&lt;/code&gt; is so simple that it’s used as a sort of &lt;code&gt;dm-user&lt;/code&gt; userland block device; to prototype a new block device, &lt;a href="https://lwn.net/ml/linux-kernel/[email protected]/" title=""&gt;don’t bother writing a kernel module&lt;/a&gt;, just write an &lt;code&gt;nbd&lt;/code&gt; server.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;Take your pick of  protocols. iSCSI is the obvious one, but it’s relatively complicated, and Linux has native support for a much simpler one: &lt;code&gt;nbd&lt;/code&gt;, the “network block device”. You could implement an &lt;code&gt;nbd&lt;/code&gt; server in an afternoon, on top of a file or a SQLite database or S3, and the Linux kernel could mount it as a drive.&lt;/p&gt;

&lt;p&gt;We started out using &lt;code&gt;nbd&lt;/code&gt;. But we kept getting stuck &lt;code&gt;nbd&lt;/code&gt; kernel threads when there was any kind of network disruption. We’re a global public cloud; network disruption  happens. Honestly, we could have debugged our way through this. But it was simpler just to spike out an iSCSI implementation, observe that didn’t get jammed up when the network hiccuped, and move on.&lt;/p&gt;
&lt;h3 id='putting-the-pieces-together' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#putting-the-pieces-together' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Putting The Pieces Together&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;To drain a worker with minimal downtime and no lost data, we turn workers into a temporary SANs, serving the volumes we need to drain to fresh-booted replica Fly Machines on a bunch of “target” physicals. Those SANs — combinations of &lt;code&gt;dm-clone&lt;/code&gt;, iSCSI, and &lt;a href='https://fly.io/blog/carving-the-scheduler-out-of-our-orchestrator/' title=''&gt;our &lt;code&gt;flyd&lt;/code&gt; orchestrator&lt;/a&gt; — track the blocks copied from the origin, copying each one exactly once and cleaning up when the original volume has been fully copied.&lt;/p&gt;

&lt;p&gt;Problem solved!&lt;/p&gt;
&lt;h3 id='no-there-were-more-problems' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#no-there-were-more-problems' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;No, There Were More Problems&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;When your problem domain is hard, anything you build whose design you can’t fit completely in your head is going to be a fiasco. Shorter form: “if you see Raft consensus in a design, we’ve done something wrong”.&lt;/p&gt;

&lt;p&gt;A virtue of this migration system is that, for as many moving pieces as it has, it fits in your head. What complexity it has is mostly shouldered by strategic bets we’ve already  built teams around, most notably the &lt;code&gt;flyd&lt;/code&gt; orchestrator. So we’ve been running this system for the better part of a year without much drama. Not no drama, though. Some drama.&lt;/p&gt;

&lt;p&gt;Example: we encrypt volumes. Our key management is fussy. We do per-volume encryption keys that provision alongside the volumes themselves, so no one worker has a volume skeleton key.&lt;/p&gt;

&lt;p&gt;If you think “migrating those volume keys from worker to worker” is the problem I’m building up to, well, that too, but the bigger problem is &lt;code&gt;trim&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Most people use just a small fraction of the volumes they allocate. A 100GiB volume with just 5MiB used wouldn’t be at all weird. You don’t want to spend minutes copying a volume that could have been fully hydrated in seconds.&lt;/p&gt;

&lt;p&gt;And indeed, &lt;code&gt;dm-clone&lt;/code&gt; doesn’t want to do that either. Given a source block device (for us, an iSCSI mount) and the clone device, a &lt;code&gt;DISCARD&lt;/code&gt; issued on the clone device will get picked up by &lt;code&gt;dm-clone&lt;/code&gt;, which will simply &lt;a href='https://elixir.bootlin.com/linux/v5.11.11/source/drivers/md/dm-clone-target.c#L1357' title=''&gt;short-circuit the read&lt;/a&gt; of the relevant blocks by marking them as hydrated in the metadata volume. Simple enough.&lt;/p&gt;

&lt;p&gt;To make that work, we need the target worker to see the plaintext of the source volume (so that it can do an &lt;code&gt;fstrim&lt;/code&gt; — don’t get us started on how annoying it is to sandbox this — to read the filesystem, identify the unused block, and issue the &lt;code&gt;DISCARDs&lt;/code&gt; where &lt;code&gt;dm-clone&lt;/code&gt; can see them) Easy enough.&lt;/p&gt;
&lt;div class="right-sidenote"&gt;&lt;p&gt;these curses have a lot to do with how hard it was to drain workers!&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;Except: two different workers, for cursed reasons, might be running different versions of &lt;a href='https://gitlab.com/cryptsetup/cryptsetup' title=''&gt;cryptsetup&lt;/a&gt;, the userland bridge between LUKS2 and the &lt;a href='https://docs.kernel.org/admin-guide/device-mapper/dm-crypt.html' title=''&gt;kernel dm-crypt driver&lt;/a&gt;. There are (or were) two different versions of cryptsetup on our network, and they default to different &lt;a href='https://fossies.org/linux/cryptsetup/docs/on-disk-format-luks2.pdf' title=''&gt;LUKS2 header sizes&lt;/a&gt; — 4MiB and 16MiB. Implying two different plaintext volume sizes. &lt;/p&gt;

&lt;p&gt;So now part of the migration FSM is an RPC call that carries metadata about the designed LUKS2 configuration for the target VM. Not something we expected to have to build, but, whatever.&lt;/p&gt;
&lt;div class="right-sidenote"&gt;&lt;p&gt;Corrosion deserves its own post.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;Gnarlier example: workers are the source of truth for information about the Fly Machines running on them. Migration knocks the legs out from under that constraint, which we were relying on in Corrosion, the SWIM-gossip SQLite database we use to connect Fly Machines to our request routing. Race conditions. Debugging. Design changes. Next!&lt;/p&gt;

&lt;p&gt;Gnarliest example: our private networks. Recall: we automatically place every Fly Machine into &lt;a href='https://fly.io/blog/incoming-6pn-private-networks/' title=''&gt;a private network&lt;/a&gt;; by default, it’s the one all the other apps in your organization run in. This is super handy for setting up background services, databases, and clustered applications. 20 lines of eBPF code in our worker kernels keeps anybody from “crossing the streams”, sending packets from one private network to another.&lt;/p&gt;
&lt;div class="right-sidenote"&gt;&lt;p&gt;we’re members of an elite cadre of idiots who have managed to find designs that made us wish IPv6 addresses were even bigger.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;We call this scheme 6PN (for “IPv6 Private Network”). It functions by &lt;a href='https://fly.io/blog/ipv6-wireguard-peering#ipv6-private-networking-at-fly' title=''&gt;embedding routing information directly into IPv6 addresses&lt;/a&gt;. This is, perhaps, gross. But it allows us to route diverse private networks with constantly changing membership across a global fleet of servers without running a distributed routing protocol. As the beardy wizards who kept the Internet backbone up and running on Cisco AGS+’s once said: the best routing protocol is “static”.&lt;/p&gt;

&lt;p&gt;Problem: the embedded routing information in a 6PN address refers in part to specific worker servers.&lt;/p&gt;

&lt;p&gt;That’s fine, right? They’re IPv6 addresses. Nobody uses literal IPv6 addresses. Nobody uses IP addresses at all; they use the DNS. When you migrate a host, just give it a new 6PN address, and update the DNS.&lt;/p&gt;

&lt;p&gt;Friends, somebody did use literal IPv6 addresses. It was us. In the configurations for Fly Postgres clusters.&lt;/p&gt;
&lt;div class="right-sidenote"&gt;&lt;p&gt;It’s also not operationally easy for us to shell into random Fly Machines, for good reason.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;The obvious fix for this is not complicated; given &lt;code&gt;flyctl&lt;/code&gt; ssh access to a Fly Postgres cluster, it’s like a 30 second ninja edit. But we run a &lt;em&gt;lot&lt;/em&gt; of Fly Postgres clusters, and the change has to be coordinated carefully to avoid getting the cluster into a confused state. We went as far as adding feature to our &lt;code&gt;init&lt;/code&gt; to do network address mappings to keep old 6PN addresses reachable before biting the bullet and burning several weeks doing the direct configuration fix fleet-wide.&lt;/p&gt;
&lt;figure class="post-cta"&gt;
  &lt;figcaption&gt;
    &lt;h1&gt;Speedrun your app onto Fly.io.&lt;/h1&gt;
    &lt;p&gt;3&amp;hellip;2&amp;hellip;1&amp;hellip;&lt;/p&gt;
      &lt;a class="btn btn-lg" href="https://fly.io/speedrun"&gt;
        Go! &amp;nbsp;&lt;span class='opacity-50'&gt;→&lt;/span&gt;
      &lt;/a&gt;
  &lt;/figcaption&gt;
  &lt;div class="image-container"&gt;
    &lt;img src="/static/images/cta-cat.webp" srcset="/static/images/[email protected] 2x" alt=""&gt;
  &lt;/div&gt;
&lt;/figure&gt;

&lt;h3 id='the-learning-it-burns' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#the-learning-it-burns' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;The Learning, It Burns!&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;We get asked a lot why we don’t do storage the “obvious” way, with an &lt;a href='https://aws.amazon.com/ebs/' title=''&gt;EBS-type&lt;/a&gt; SAN fabric, abstracting it away from our compute. Locally-attached NVMe storage is an idiosyncratic choice, one we’ve had to write disclaimers for (single-node clusters can lose data!) since we first launched it.&lt;/p&gt;

&lt;p&gt;One answer is: we’re a startup. Building SAN infrastructure in every region we operate in would be tremendously expensive. Look at any feature in AWS that normal people know the name of, like EC2, EBS, RDS, or S3 — there’s a whole company in there. We launched storage when we were just 10 people, and even at our current size we probably have nothing resembling the resources EBS gets. AWS is pretty great!&lt;/p&gt;

&lt;p&gt;But another thing to keep in mind is: we’re learning as we go. And so even if we had the means to do an EBS-style SAN, we might not build it today.&lt;/p&gt;

&lt;p&gt;Instead, we’re a lot more interested in log-structured virtual disks (LSVD). LSVD uses NVMe as a local cache, but durably persists writes in object storage. You get most of the performance benefit of bus-hop disk writes, along with unbounded storage and S3-grade reliability.&lt;/p&gt;

&lt;p&gt;&lt;a href='https://community.fly.io/t/bottomless-s3-backed-volumes/15648' title=''&gt;We launched LSVD experimentally last year&lt;/a&gt;; in the intervening year, something happened to make LSVD even more interesting to us: &lt;a href='https://www.tigrisdata.com/' title=''&gt;Tigris Data&lt;/a&gt; launched S3-compatible object storage in every one our regions, so instead of backhauling updates to Northern Virginia, &lt;a href='https://community.fly.io/t/tigris-backed-volumes/20792' title=''&gt;we can keep them local&lt;/a&gt;. We have more to say about LSVD, and a lot more to say about Tigris.&lt;/p&gt;

&lt;p&gt;Our first several months of migrations were done gingerly. By summer of 2024, we got to where our infra team can pull “drain this host” out of their toolbelt without much ceremony.&lt;/p&gt;

&lt;p&gt;We’re still not to the point where we’re migrating casually. Your Fly Machines are probably not getting migrated! There&amp;rsquo;d need to be a reason! But the dream is fully-automated luxury space migration, in which you might get migrated semiregularly, as our systems work not just to drain problematic hosts but to rebalance workloads regularly. No time soon. But we’ll get there.&lt;/p&gt;

&lt;p&gt;This is the biggest thing our team has done since we replaced Nomad with flyd. Only the new billing system comes close. We did this thing not because it was easy, but because we thought it would be easy. It was not. But: worth it!&lt;/p&gt;</content>
    </entry>
    <entry>
        <title>AWS without Access Keys</title>
        <link rel="alternate" href="https://fly.io/blog/oidc-cloud-roles/"/>
        <id>https://fly.io/blog/oidc-cloud-roles/</id>
        <published>2024-06-19T00:00:00+00:00</published>
        <updated>2025-05-20T16:26:25+00:00</updated>
        <media:thumbnail url="https://fly.io/blog/oidc-cloud-roles/assets/spooky-security-skeleton-thumb.webp"/>
        <content type="html">&lt;div class="lead"&gt;&lt;p&gt;It’s dangerous to go alone. Fly.io runs full-stack apps by transmuting Docker containers into Fly Machines: ultra-lightweight hardware-backed VMs. You can run all your dependencies on Fly.io, but sometimes, you’ll need to work with other clouds, and we’ve made that pretty simple. Try Fly.io out for yourself; your Rails or Node app &lt;a href="https://fly.io/speedrun" title=""&gt;can be up and running in just minutes&lt;/a&gt;.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;Let&amp;rsquo;s hypopulate you an app serving generative AI cat images based on the weather forecast, running on a &lt;code&gt;g4dn.xlarge&lt;/code&gt; ECS task in AWS &lt;code&gt;us-east-1&lt;/code&gt;.  It&amp;rsquo;s going great; people didn&amp;rsquo;t realize how dependent their cat pic prefs are on barometric pressure, and you&amp;rsquo;re all anyone can talk about.&lt;/p&gt;

&lt;p&gt;Word reaches Australia and Europe, but you&amp;rsquo;re not catching on, because the… latency is too high? Just roll with us here. Anyways: fixing this is going to require replicating  ECS tasks and ECR images into &lt;code&gt;ap-southeast-2&lt;/code&gt; and &lt;code&gt;eu-central-1&lt;/code&gt; while also setting up load balancing. Nah.&lt;/p&gt;

&lt;p&gt;This is the O.G. Fly.io deployment story; one deployed app, one versioned container, one command to get it running anywhere in the world.&lt;/p&gt;

&lt;p&gt;But you have a problem: your app relies on training data, it&amp;rsquo;s huge, your giant employer manages it, and it&amp;rsquo;s in S3. Getting this to work will require AWS credentials.&lt;/p&gt;

&lt;p&gt;You could ask your security team to create a user, give it permissions, and hand over the AWS keypair. Then you could wash your neck and wait for the blade. Passing around AWS keypairs is the beginning of every horror story told about cloud security, and  security team ain&amp;rsquo;t having it.&lt;/p&gt;

&lt;p&gt;There&amp;rsquo;s a better way. It&amp;rsquo;s drastically more secure, so your security people will at least hear you out. It&amp;rsquo;s also so much easier on Fly.io that you might never bother creating a IAM service account again.&lt;/p&gt;
&lt;h2 id='lets-get-it-out-of-the-way' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#lets-get-it-out-of-the-way' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Let&amp;rsquo;s Get It out of the Way&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;We&amp;rsquo;re going to use OIDC to set up strictly limited trust between AWS and Fly.io.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;In AWS: we&amp;rsquo;ll add Fly.io as an &lt;code&gt;Identity Provider&lt;/code&gt; in AWS IAM, giving us an ID we can plug into any IAM &lt;code&gt;Role&lt;/code&gt;.
&lt;/li&gt;&lt;li&gt;Also in AWS: we&amp;rsquo;ll create a &lt;code&gt;Role&lt;/code&gt;, give it access to the S3 bucket with our tokenized cat data, and then attach the &lt;code&gt;Identity Provider&lt;/code&gt; to it.
&lt;/li&gt;&lt;li&gt;In Fly.io, we&amp;rsquo;ll take the &lt;code&gt;Role&lt;/code&gt; ARN we got from step 2 and set it as an environment variable in our app.
&lt;/li&gt;&lt;/ol&gt;

&lt;p&gt;Our machines will now magically have access to the S3 bucket.&lt;/p&gt;
&lt;h2 id='what-the-what' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#what-the-what' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;What the What&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;A reasonable question to ask here is, &amp;ldquo;where&amp;rsquo;s the credential&amp;rdquo;? Ordinarily, to give a Fly Machine access to an AWS resource, you&amp;rsquo;d use &lt;code&gt;fly secrets set&lt;/code&gt; to add an &lt;code&gt;AWS_ACCESS_KEY_ID&lt;/code&gt; and &lt;code&gt;AWS_SECRET_ACCESS_KEY&lt;/code&gt; to the environment in the Machine. Here, we&amp;rsquo;re not setting any secrets at all; we&amp;rsquo;re just adding an ARN — which is not a credential — to the Machine.&lt;/p&gt;

&lt;p&gt;Here&amp;rsquo;s what&amp;rsquo;s happening.&lt;/p&gt;

&lt;p&gt;Fly.io operates an OIDC IdP at &lt;code&gt;oidc.fly.io&lt;/code&gt;. It issues OIDC tokens, exclusively to Fly Machines. AWS can be configured to trust these tokens, on a role-by-role basis. That&amp;rsquo;s the &amp;ldquo;secret credential&amp;rdquo;: the pre-configured trust relationship in IAM, and the public keypairs it manages. You, the user, never need to deal with these keys directly; it all happens behind the scenes, between AWS and Fly.io.&lt;/p&gt;

&lt;p&gt;&lt;img alt="A diagram: STS trusts OIDC.fly.io. OIDC.fly.io trusts flyd. flyd issues a token to the Machine, which proffers it to STS. STS sends an STS cred to the Machine, which then uses it to retrieve model weights from S3." src="/blog/oidc-cloud-roles/assets/oidc-diagram.webp" /&gt;&lt;/p&gt;

&lt;p&gt;The key actor in this picture is &lt;code&gt;STS&lt;/code&gt;, the AWS &lt;code&gt;Security Token Service&lt;/code&gt;. &lt;code&gt;STS&lt;/code&gt;&amp;lsquo;s main job is to vend short-lived AWS credentials, usually through some variant of an API called &lt;code&gt;AssumeRole&lt;/code&gt;. Specifically, in our case: &lt;code&gt;AssumeRoleWithWebIdentity&lt;/code&gt; tells &lt;code&gt;STS&lt;/code&gt; to cough up an AWS keypair given an OIDC token (that matches a pre-configured trust relationship).&lt;/p&gt;

&lt;p&gt;That still leaves the question: how does your code, which is reaching out to the AWS APIs to get cat weights, drive any of this?&lt;/p&gt;
&lt;h2 id='the-init-thickens' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#the-init-thickens' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;The Init Thickens&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;Every Fly Machine boots up into an &lt;code&gt;init&lt;/code&gt; we wrote in Rust. It has slowly been gathering features.&lt;/p&gt;

&lt;p&gt;One of those features, which has been around for awhile, is a server for a Unix socket at &lt;code&gt;/.fly/api&lt;/code&gt;, which exports a subset of the Fly Machines API to privileged processes in the Machine. Think of it as our answer to the EC2 Instant Metadata Service. How it works is, every time we boot a Fly Machine, we pass it a &lt;a href='https://fly.io/blog/macaroons-escalated-quickly/' title=''&gt;Macaroon token&lt;/a&gt; locked to that particular Machine; &lt;code&gt;init&lt;/code&gt;&amp;rsquo;s server for &lt;code&gt;/.fly/api&lt;/code&gt; is a proxy that attaches that token to requests.&lt;/p&gt;
&lt;div class="right-sidenote"&gt;&lt;p&gt;In addition to the API proxy being tricky to SSRF to.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;What&amp;rsquo;s neat about this is that the credential that drives &lt;code&gt;/.fly/api&lt;/code&gt; is doubly protected:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;The Fly.io platform won&amp;rsquo;t honor it unless it comes from that specific Fly Machine (&lt;code&gt;flyd&lt;/code&gt;, our orchestrator, knows who it&amp;rsquo;s talking to), &lt;em&gt;and&lt;/em&gt;
&lt;/li&gt;&lt;li&gt;Ordinary code running in a Fly Machine never gets a copy of the token to begin with.
&lt;/li&gt;&lt;/ol&gt;

&lt;p&gt;You could rig up a local privilege escalation vulnerability and work out how to steal the Macaroon, but you can&amp;rsquo;t exfiltrate it productively.&lt;/p&gt;

&lt;p&gt;So now you have half the puzzle worked out: OIDC is just part of the &lt;a href='https://fly.io/docs/machines/api/' title=''&gt;Fly Machines API&lt;/a&gt; (specifically: &lt;code&gt;/v1/tokens/oidc&lt;/code&gt;). A Fly Machine can hit a Unix socket and get an OIDC token tailored to that machine:&lt;/p&gt;
&lt;div class="highlight-wrapper group relative "&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-wrap-target="#code-dm2j0e3b"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-copy-target="sibling"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight '&gt;&lt;code id="code-dm2j0e3b"&gt;{
  "app_id": "3671581",
  "app_name": "weather-cat",
  "aud": "sts.amazonaws.com",
  "image": "image:latest",
  "image_digest": "sha256:dff79c6da8dd4e282ecc6c57052f7cfbd684039b652f481ca2e3324a413ee43f",
  "iss": "https://oidc.fly.io/example",
  "machine_id": "3d8d377ce9e398",
  "machine_name": "ancient-snow-4824",
  "machine_version": "01HZJXGTQ084DX0G0V92QH3XW4",
  "org_id": "29873298",
  "org_name": "example",
  "region": "yyz",
  "sub": "example:weather-cat:ancient-snow-4824"
} // some OIDC stuff trimmed
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Look upon this holy blob, sealed with a published key managed by Fly.io&amp;rsquo;s OIDC vault, and see that there lies within it enough information for AWS &lt;code&gt;STS&lt;/code&gt; to decide to issue a session credential.&lt;/p&gt;

&lt;p&gt;We have still not completed the puzzle, because while you can probably now see how you&amp;rsquo;d drive this process with a bunch of new code that you&amp;rsquo;d tediously write, you are acutely aware that you have not yet endured that tedium — e pur si muove!&lt;/p&gt;

&lt;p&gt;One &lt;code&gt;init&lt;/code&gt; feature remains to be disclosed, and it&amp;rsquo;s cute.&lt;/p&gt;

&lt;p&gt;If, when &lt;code&gt;init&lt;/code&gt; starts in a Fly Machine, it sees an &lt;code&gt;AWS_ROLE_ARN&lt;/code&gt; environment variable set, it initiates a little dance; it:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;goes off and generates an OIDC token, the way we just described,
&lt;/li&gt;&lt;li&gt;saves that OIDC token in a file, &lt;em&gt;and&lt;/em&gt;
&lt;/li&gt;&lt;li&gt;sets the &lt;code&gt;AWS_WEB_IDENTITY_TOKEN_FILE&lt;/code&gt; and &lt;code&gt;AWS_ROLE_SESSION_NAME&lt;/code&gt; environment variables for every process it launches.
&lt;/li&gt;&lt;/ol&gt;

&lt;p&gt;The AWS SDK, linked to your application, does all the rest.&lt;/p&gt;

&lt;p&gt;Let&amp;rsquo;s review: you add an &lt;code&gt;AWS_ROLE_ARN&lt;/code&gt; variable to your Fly App, launch a Machine, and have it go fetch a file from S3. What happens next is:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;code&gt;init&lt;/code&gt; detects &lt;code&gt;AWS_ROLE_ARN&lt;/code&gt; is set as an environment variable.
&lt;/li&gt;&lt;li&gt;&lt;code&gt;init&lt;/code&gt; sends a request to &lt;code&gt;/v1/tokens/oidc&lt;/code&gt; via &lt;code&gt;/.api/proxy&lt;/code&gt;.
&lt;/li&gt;&lt;li&gt;&lt;code&gt;init&lt;/code&gt; writes the response to &lt;code&gt;/.fly/oidc_token.&lt;/code&gt;
&lt;/li&gt;&lt;li&gt;&lt;code&gt;init&lt;/code&gt; sets &lt;code&gt;AWS_WEB_IDENTITY_TOKEN_FILE&lt;/code&gt; and &lt;code&gt;AWS_ROLE_SESSION_NAME&lt;/code&gt;.
&lt;/li&gt;&lt;li&gt;The entrypoint boots, and (say) runs &lt;code&gt;aws s3 get-object.&lt;/code&gt;
&lt;/li&gt;&lt;li&gt;The AWS SDK runs through the &lt;a href='https://docs.aws.amazon.com/sdkref/latest/guide/standardized-credentials.html#credentialProviderChain' title=''&gt;credential provider chain&lt;/a&gt;
&lt;/li&gt;&lt;li&gt;The SDK sees that &lt;code&gt;AWS_WEB_IDENTITY_TOKEN_FILE&lt;/code&gt; is set and calls &lt;code&gt;AssumeRoleWithWebIdentity&lt;/code&gt; with the file contents.
&lt;/li&gt;&lt;li&gt;AWS verifies the token against &lt;a href='https://oidc.fly.io/' title=''&gt;&lt;code&gt;https://oidc.fly.io/&lt;/code&gt;&lt;/a&gt;&lt;code&gt;example/.well-known/openid-configuration&lt;/code&gt;, which references a key Fly.io manages on isolated hardware.
&lt;/li&gt;&lt;li&gt;AWS vends &lt;code&gt;STS&lt;/code&gt; credentials for the assumed &lt;code&gt;Role&lt;/code&gt;.
&lt;/li&gt;&lt;li&gt;The SDK uses the &lt;code&gt;STS&lt;/code&gt; credentials to access the S3 bucket.
&lt;/li&gt;&lt;li&gt;AWS checks the &lt;code&gt;Role&lt;/code&gt;&amp;rsquo;s IAM policy to see if it has access to the S3 bucket.
&lt;/li&gt;&lt;li&gt;AWS returns the contents of the bucket object.
&lt;/li&gt;&lt;/ol&gt;
&lt;h2 id='how-much-better-is-this' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#how-much-better-is-this' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;How Much Better Is This?&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;It is a lot better.&lt;/p&gt;
&lt;div class="right-sidenote"&gt;&lt;p&gt;They asymptotically approach the security properties of Macaroon tokens.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;Most importantly: AWS &lt;code&gt;STS&lt;/code&gt; credentials are short-lived. Because they&amp;rsquo;re generated dynamically, rather than stored in a configuration file or environment variable, they&amp;rsquo;re already a little bit annoying for an attacker to recover. But they&amp;rsquo;re also dead in minutes. They have a sharply limited blast radius. They rotate themselves, and fail closed.&lt;/p&gt;

&lt;p&gt;They&amp;rsquo;re also easier to manage. This is a rare instance where you can reasonably drive the entire AWS side of the process from within the web console. Your cloud team adds &lt;code&gt;Roles&lt;/code&gt; all the time; this is just a &lt;code&gt;Role&lt;/code&gt; with an extra snippet of JSON. The resulting ARN isn&amp;rsquo;t even a secret; your cloud team could just email or Slack message it back to you.&lt;/p&gt;

&lt;p&gt;Finally, they offer finer-grained control.&lt;/p&gt;

&lt;p&gt;To understand the last part, let&amp;rsquo;s look at that extra snippet of JSON (the &amp;ldquo;Trust Policy&amp;rdquo;) your cloud team is sticking on the new &lt;code&gt;cat-bucket&lt;/code&gt; &lt;code&gt;Role&lt;/code&gt;:&lt;/p&gt;
&lt;div class="highlight-wrapper group relative "&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-wrap-target="#code-ex6t9zqy"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-copy-target="sibling"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight '&gt;&lt;code id="code-ex6t9zqy"&gt;{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Federated": "arn:aws:iam::123456123456:oidc-provider/oidc.fly.io/example"
            },
            "Action": "sts:AssumeRoleWithWebIdentity",
            "Condition": {
              "StringEquals": {
                "oidc.fly.io/example:aud": "sts.amazonaws.com",
              },
               "StringLike": {
                "oidc.fly.io/example:sub": "example:weather-cat:*"
              }
            }
        }
    ]
}
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;&lt;div class="right-sidenote"&gt;&lt;p&gt;The &lt;code&gt;aud&lt;/code&gt; check guarantees &lt;code&gt;STS&lt;/code&gt; will only honor tokens that Fly.io deliberately vended for &lt;code&gt;STS&lt;/code&gt;.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;Recall the OIDC token we dumped earlier; much of what&amp;rsquo;s in it, we can match in the Trust Policy. Every OIDC token Fly.io generates is going to have a &lt;code&gt;sub&lt;/code&gt; field formatted &lt;code&gt;org:app:machine&lt;/code&gt;, so we can lock IAM &lt;code&gt;Roles&lt;/code&gt; down to organizations, or to specific Fly Apps, or even specific Fly Machine instances.&lt;/p&gt;
&lt;figure class="post-cta"&gt;
  &lt;figcaption&gt;
    &lt;h1&gt;Speedrun your app onto Fly.io.&lt;/h1&gt;
    &lt;p&gt;3&amp;hellip;2&amp;hellip;1&amp;hellip;&lt;/p&gt;
      &lt;a class="btn btn-lg" href="https://fly.io/speedrun"&gt;
        Go! &amp;nbsp;&lt;span class='opacity-50'&gt;→&lt;/span&gt;
      &lt;/a&gt;
  &lt;/figcaption&gt;
  &lt;div class="image-container"&gt;
    &lt;img src="/static/images/cta-cat.webp" srcset="/static/images/[email protected] 2x" alt=""&gt;
  &lt;/div&gt;
&lt;/figure&gt;

&lt;h2 id='and-so' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#and-so' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;And So&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;In case it&amp;rsquo;s not obvious: this pattern works for any AWS API, not just S3.&lt;/p&gt;

&lt;p&gt;Our OIDC support on the platform and in Fly Machines will set arbitrary OIDC &lt;code&gt;audience&lt;/code&gt; strings, so you can use it to authenticate to any OIDC-compliant cloud provider. It won&amp;rsquo;t be as slick on Azure or GCP, because we haven&amp;rsquo;t done the &lt;code&gt;init&lt;/code&gt; features to light their APIs up with a single environment variable — but those features are easy, and we&amp;rsquo;re just waiting for people to tell us what they need.&lt;/p&gt;

&lt;p&gt;For us, the gold standard for least-privilege, conditional access tokens remains Macaroons, and it&amp;rsquo;s unlikely that we&amp;rsquo;re going to do a bunch of internal stuff using OIDC. We even snuck Macaroons into this feature. But the security you&amp;rsquo;re getting from this OIDC dance closes a lot of the gap between hardcoded user credentials and Macaroons, and it&amp;rsquo;s easy to use — easier, in some ways, than it is to manage role-based access inside of a legacy EC2 deployment!&lt;/p&gt;</content>
    </entry>
    <entry>
        <title>Picture This: Open Source AI for Image Description</title>
        <link rel="alternate" href="https://fly.io/blog/llm-image-description/"/>
        <id>https://fly.io/blog/llm-image-description/</id>
        <published>2024-05-09T00:00:00+00:00</published>
        <updated>2025-05-20T16:26:25+00:00</updated>
        <media:thumbnail url="https://fly.io/blog/llm-image-description/assets/image-description-thumb.webp"/>
        <content type="html">&lt;div class="lead"&gt;&lt;p&gt;I’m Nolan, and I work on Fly Machines here at Fly.io. We’re building a new public cloud—one where you can spin up CPU and GPU workloads, around the world, in a jiffy. &lt;a href="https://fly.io/speedrun/" title=""&gt;Try us out&lt;/a&gt;; you can be up and running in minutes. This is a post about LLMs being really helpful, and an extensible project you can build with open source on a weekend.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;Picture this, if you will.&lt;/p&gt;

&lt;p&gt;You&amp;rsquo;re blind. You&amp;rsquo;re in an unfamiliar hotel room on a trip to Chicago.&lt;/p&gt;
&lt;div class="right-sidenote"&gt;&lt;p&gt;If you live in Chicago IRL, imagine the hotel in Winnipeg, &lt;a href="https://www.cbc.ca/history/EPISCONTENTSE1EP10CH3PA5LE.html" title=""&gt;the Chicago of the North&lt;/a&gt;.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;You&amp;rsquo;ve absent-mindedly set your coffee down, and can&amp;rsquo;t remember where. You&amp;rsquo;re looking for the thermostat so you don&amp;rsquo;t wake up frozen. Or, just maybe, you&amp;rsquo;re playing a fun-filled round of &amp;ldquo;find the damn light switch so your sighted partner can get some sleep already!&amp;rdquo;&lt;/p&gt;

&lt;p&gt;If, like me, you&amp;rsquo;ve been blind for a while, you have plenty of practice finding things without the luxury of a quick glance around. It may be more tedious than you&amp;rsquo;d like, but you&amp;rsquo;ll get it done.&lt;/p&gt;

&lt;p&gt;But the speed of innovation in machine learning and large language models has been dizzying, and in 2024 you can snap a photo with your phone and have an app like &lt;a href='https://www.bemyeyes.com/blog/announcing-be-my-ai/' title=''&gt;Be My AI&lt;/a&gt; or &lt;a href='https://www.seeingai.com/' title=''&gt;Seeing AI&lt;/a&gt; tell you where in that picture it found your missing coffee mug, or where it thinks the light switch is.&lt;/p&gt;
&lt;div class="right-sidenote"&gt;&lt;p&gt;Creative switch locations seem to be a point of pride for hotels, so the light game may be good for a few rounds of quality entertainment, regardless of how good your AI is.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;This is &lt;em&gt;big&lt;/em&gt;. It&amp;rsquo;s hard for me to state just how exciting and empowering AI image descriptions have been for me without sounding like a shill. In the past year, I&amp;rsquo;ve:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Found shit in strange hotel rooms. 
&lt;/li&gt;&lt;li&gt;Gotten descriptions of scenes and menus in otherwise inaccessible video games.
&lt;/li&gt;&lt;li&gt;Requested summaries of technical diagrams and other materials where details weren’t made available textually. 
&lt;/li&gt;&lt;/ul&gt;

&lt;p&gt;I&amp;rsquo;ve been consistently blown away at how impressive and helpful AI-created image descriptions have been.&lt;/p&gt;

&lt;p&gt;Also&amp;hellip;&lt;/p&gt;
&lt;h2 id='which-thousand-words-is-this-picture-worth' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#which-thousand-words-is-this-picture-worth' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Which thousand words is this picture worth?&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;As a blind internet user for the last three decades, I have extensive empirical evidence to corroborate what you already know in your heart: humans are pretty flaky about writing useful alt text for all the images they publish. This does tend to make large swaths of the internet inaccessible to me!&lt;/p&gt;

&lt;p&gt;In just a few years, the state of image description on the internet has gone from complete reliance on the aforementioned lovable, but ultimately tragically flawed, humans, to automated strings of words like &lt;code&gt;Image may contain person, glasses, confusion, banality, disillusionment&lt;/code&gt;, to LLM-generated text that reads a lot like it was written by a person, perhaps sipping from a steaming cup of Earl Grey as they reflect on their previous experiences of a background that features a tree with snow on its branches, suggesting that this scene takes place during winter.&lt;/p&gt;

&lt;p&gt;If an image is missing alt text, or if you want a second opinion, there are screen-reader addons, like &lt;a href='https://github.com/cartertemm/AI-content-describer/' title=''&gt;this one&lt;/a&gt; for &lt;a href='https://www.nvaccess.org/download/' title=''&gt;NVDA&lt;/a&gt;, that you can use with an API key to get image descriptions from GPT-4 or Google Gemini as you read. This is awesome! &lt;/p&gt;

&lt;p&gt;And this brings me to the nerd snipe. How hard would it be to build an image description service we can host ourselves, using open source technologies? It turns out to be spookily easy.&lt;/p&gt;

&lt;p&gt;Here&amp;rsquo;s what I came up with:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;a href='https://ollama.com/' title=''&gt;Ollama&lt;/a&gt; to run the model
&lt;/li&gt;&lt;li&gt;A &lt;a href='https://pocketbase.io' title=''&gt;PocketBase&lt;/a&gt; project that provides a simple authenticated API for users to submit images, get descriptions, and ask followup questions about the image
&lt;/li&gt;&lt;li&gt;The simplest possible Python client to interact with the PocketBase app on behalf of users
&lt;/li&gt;&lt;/ol&gt;

&lt;p&gt;The idea is to keep it modular and hackable, so if sentiment analysis or joke creation is your thing, you can swap out image description for that and have something going in, like, a weekend.&lt;/p&gt;

&lt;p&gt;If you&amp;rsquo;re like me, and you go skipping through recipe blogs to find the &amp;ldquo;go directly to recipe&amp;rdquo; link, find the code itself &lt;a href='https://github.com/superfly/llm-describer' title=''&gt;here&lt;/a&gt;. &lt;/p&gt;
&lt;h2 id='the-llm-is-the-easiest-part' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#the-llm-is-the-easiest-part' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;The LLM is the easiest part&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;An API to accept images and prompts, run the model, and spit 
out answers sounds like a lot! But it&amp;rsquo;s the simplest part of this whole thing, because: 
that&amp;rsquo;s &lt;a href='https://ollama.com/' title=''&gt;Ollama&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;You can just run the Ollama Docker image, get it to grab the model 
you want to use, and that&amp;rsquo;s it. There&amp;rsquo;s your AI server. (We have a &lt;a href='https://fly.io/blog/scaling-llm-ollama/' title=''&gt;blog post&lt;/a&gt; 
all about deploying Ollama on Fly.io; Fly GPUs are rad, try&amp;#39;em out, etc.).&lt;/p&gt;

&lt;p&gt;For this project, we need a model that can make sense&amp;mdash;or at least words&amp;mdash;out of a picture. 
&lt;a href='https://llava-vl.github.io/' title=''&gt;LLaVA&lt;/a&gt; is a trained, Apache-licensed &amp;ldquo;large multimodal model&amp;rdquo; that fits the bill. 
Get the model with the Ollama CLI:&lt;/p&gt;
&lt;div class="highlight-wrapper group relative cmd"&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-wrap-target="#code-jo727efe"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-copy-target="sibling"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight cmd'&gt;&lt;code id="code-jo727efe"&gt;ollama pull llava:34b
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;&lt;div class="callout"&gt;&lt;p&gt;If you have hardware that can handle it, you could run this on your computer at home. If you run AI models on a cloud provider, be aware that GPU compute is expensive! &lt;strong class="font-semibold text-navy-950"&gt;It’s important to take steps to ensure you’re not paying for a massive GPU 24/7.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;On Fly.io, at the time of writing, you’d achieve this with the &lt;a href="https://fly.io/docs/apps/autostart-stop/" title=""&gt;autostart and autostop&lt;/a&gt; functions of the Fly Proxy, restricting Ollama access to internal requests over &lt;a href="https://fly.io/docs/networking/private-networking/#flycast-private-fly-proxy-services" title=""&gt;Flycast&lt;/a&gt; from the PocketBase app. That way, if there haven’t been any requests for a few minutes, the Fly Proxy stops the Ollama &lt;a href="https://fly.io/docs/machines/" title=""&gt;Machine&lt;/a&gt;, which releases the CPU, GPU, and RAM allocated to it. &lt;a href="https://fly.io/blog/scaling-llm-ollama/" title=""&gt;Here’s a post&lt;/a&gt; that goes into more detail. &lt;/p&gt;
&lt;/div&gt;&lt;h2 id='a-multi-tool-on-the-backend' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#a-multi-tool-on-the-backend' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;A multi-tool on the backend&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;I want user auth to make sure just anyone can&amp;rsquo;t grab my &amp;ldquo;image description service&amp;rdquo; and keep it busy generating short stories about their cat. If I build this out into a service for others to use, I might also want business logic around plans or
credits, or mobile-friendly APIs for use in the field. &lt;a href='https://pocketbase.io' title=''&gt;PocketBase&lt;/a&gt; provides a scaffolding for all of it. It&amp;rsquo;s a Swiss army knife: a Firebase-like API on top of SQLite, complete with authentication, authorization, an admin UI, extensibility in JavaScript and Go, and various client-side APIs.&lt;/p&gt;
&lt;div class="right-sidenote"&gt;&lt;p&gt;Yes, &lt;em&gt;of course&lt;/em&gt; I’ve used an LLM to generate feline fanfic. Theme songs too. Hasn’t everyone? &lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;I &amp;ldquo;faked&amp;rdquo; a task-specific API that supports followup questions by extending PocketBase in Go, modeling requests and responses as &lt;a href='https://pocketbase.io/docs/collections/' title=''&gt;collections&lt;/a&gt; (i.e. SQLite tables) with &lt;a href='https://pocketbase.io/docs/go-event-hooks/' title=''&gt;event hooks&lt;/a&gt; to trigger pre-set interactions with the Ollama app (via &lt;a href='https://tmc.github.io/langchaingo' title=''&gt;LangChainGo&lt;/a&gt;) and the client (via the PocketBase API).&lt;/p&gt;

&lt;p&gt;If you&amp;rsquo;re following along, &lt;a href='https://github.com/superfly/llm-describer/blob/main/describer.go' title=''&gt;here&amp;rsquo;s the module&lt;/a&gt;
that handles all that, along with initializing the LLM connection.&lt;/p&gt;

&lt;p&gt;In a nutshell, this is the dance:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;When a user uploads an image, a hook on the &lt;code&gt;images&lt;/code&gt; collection sends the image to Ollama, along with this prompt:
&lt;code&gt;&amp;quot;You are a helpful assistant describing images for blind screen reader users. Please describe this image.&amp;quot;&lt;/code&gt;
&lt;/li&gt;&lt;li&gt;Ollama sends back its response, which the backend 1) passes back to the client and 2) stores in its &lt;code&gt;followups&lt;/code&gt; collection for future reference.
&lt;/li&gt;&lt;li&gt;If the user responds with a followup question about the image and description, that also 
goes into the &lt;code&gt;followups&lt;/code&gt; collection; user-initiated changes to this collection trigger a hook to chain the new 
followup question with the image and the chat history into a new request for the model.
&lt;/li&gt;&lt;li&gt;Lather, rinse, repeat.
&lt;/li&gt;&lt;/ul&gt;

&lt;p&gt;This is a super simple hack to handle followup questions, and it’ll let you keep adding followups until 
something breaks. You&amp;rsquo;ll see the quality of responses get poorer&amp;mdash;possibly incoherent&amp;mdash;as the context 
exceeds the context window.&lt;/p&gt;

&lt;p&gt;I also set up &lt;a href='https://pocketbase.io/docs/api-rules-and-filters/' title=''&gt;API rules&lt;/a&gt; in PocketBase,
ensuring that users can&amp;rsquo;t read to and write from others&amp;rsquo; chats with the AI.&lt;/p&gt;

&lt;p&gt;If image descriptions aren&amp;rsquo;t your thing, this business logic is easily swappable 
for joke generation, extracting details from text, any other simple task you 
might want to throw at an LLM. Just slot the best model into Ollama (LLaVA is pretty OK as a general starting point too), and match the PocketBase schema and pre-set prompts to your application.&lt;/p&gt;
&lt;h2 id='a-seedling-of-a-client' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#a-seedling-of-a-client' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;A seedling of a client&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;With the image description service in place, the user can talk to it with any client that speaks the PocketBase API. PocketBase already has SDK clients in JavaScript and Dart, but because my screen reader is &lt;a href='https://github.com/nvaccess/nvda' title=''&gt;written in Python&lt;/a&gt;, I went with a &lt;a href='https://pypi.org/project/pocketbase/' title=''&gt;community-created Python library&lt;/a&gt;. That way I can build this out into an NVDA add-on 
if I want to.&lt;/p&gt;

&lt;p&gt;If you&amp;rsquo;re a fancy Python developer, you probably have your preferred tooling for
handling virtualenvs and friends. I&amp;rsquo;m not, and since my screen reader doesn&amp;rsquo;t use those
anyway, I just &lt;code&gt;pip install&lt;/code&gt;ed the library so my client can import it:&lt;/p&gt;
&lt;div class="highlight-wrapper group relative cmd"&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-wrap-target="#code-8wi7b0yx"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-copy-target="sibling"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight cmd'&gt;&lt;code id="code-8wi7b0yx"&gt;pip install pocketbase
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;&lt;a href='https://github.com/superfly/llm-describer/blob/main/__init__.py' title=''&gt;My client&lt;/a&gt; is a very simple script. 
It expects a couple of things: a file called &lt;code&gt;image.jpg&lt;/code&gt;, located in the current directory, 
and environment variables to provide the service URL and user credentials to log into it with.&lt;/p&gt;

&lt;p&gt;When you run the client script, it uploads the image to the user’s &lt;code&gt;images&lt;/code&gt; collection on the 
backend app, starting the back-and-forth between user and model we saw in the previous section. 
The client prints the model&amp;rsquo;s output to the CLI and prompts the user to input a followup question, 
which it passes up to the &lt;code&gt;followups&lt;/code&gt; collection, and so on.&lt;/p&gt;
&lt;figure class="post-cta"&gt;
  &lt;figcaption&gt;
    &lt;h1&gt;This can run on Fly.io.&lt;/h1&gt;
    &lt;p&gt;Run your LLM on a datacenter-grade GPU.&lt;/p&gt;
      &lt;a class="btn btn-lg" href="https://fly.io/gpu/"&gt;
        Try out a Fly GPU &amp;nbsp;&lt;span class='opacity-50'&gt;→&lt;/span&gt;
      &lt;/a&gt;
  &lt;/figcaption&gt;
  &lt;div class="image-container"&gt;
    &lt;img src="/static/images/cta-turtle.webp" srcset="/static/images/[email protected] 2x" alt=""&gt;
  &lt;/div&gt;
&lt;/figure&gt;

&lt;h2 id='all-together-now' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#all-together-now' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;All together now&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;I grabbed &lt;a href='https://unsplash.com/photos/brown-trees-beside-river-under-blue-sky-during-daytime-kSvpTrfhaiU' title=''&gt;this
image&lt;/a&gt;
and saved it to a file called &lt;em&gt;image.jpg&lt;/em&gt;. &lt;/p&gt;

&lt;p&gt;While I knew I was downloading an image of a winter scene, all I see on Unsplash is:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;brown trees beside river under blue sky during daytime Bright winter landscape
with lake, snow, forest, beautiful blue sky and white clouds. An example of
charming wildlife in Russia.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Let&amp;rsquo;s see what our very own AI describer thinks of this picture:&lt;/p&gt;
&lt;div class="highlight-wrapper group relative plain"&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-wrap-target="#code-3cm0shen"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-copy-target="sibling"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight '&gt;&lt;code id="code-3cm0shen"&gt;$ python __init__.py
The image depicts a serene winter scene. There is a small stream of water flowing through the center, with patches of snow on its surface and along the banks. The surrounding area appears to be a mix of bare trees and shrubs, suggesting it's late in the season or possibly early spring when some plants have started to grow again but haven't yet leafed out.

The sky is clear and blue, indicating good weather conditions at the time the photo was taken. The sunlight seems to be coming from the left side of the image, casting a warm glow on the scene and creating reflections in the water. There are no visible texts or distinguishing marks that provide additional context about the location or the photographer.

The overall atmosphere is peaceful and natural, with the quiet beauty of the landscape undisturbed except for the gentle movement of the stream.
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Is it a stellar description? Maybe not, but it certainly gives me a better sense of connection with the scene.&lt;/p&gt;

&lt;p&gt;Let&amp;rsquo;s see how our describer copes with a followup question.&lt;/p&gt;
&lt;div class="highlight-wrapper group relative plain"&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-wrap-target="#code-o72lg1fm"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-copy-target="sibling"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight '&gt;&lt;code id="code-o72lg1fm"&gt;Enter your followup question, or 'quit' to quit: What types of trees are in the image?
Sending followup question
It's difficult to determine the exact species of trees in the image without more specific details, as many deciduous trees lose their leaves during winter and look quite similar. However, based on the general appearance of the bare branches and the fact that they are leafless, it is likely that these are common deciduous trees such as oaks, maples, or elms.

The presence of snow and the absence of foliage suggest that this photo was taken during winter or early spring when the trees have not yet started to bloom again. The exact species would require a closer examination of the bark, buds, and other distinguishing features, which are not clearly visible in this image.
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Boo, the general-purpose LLaVA model couldn&amp;rsquo;t identify the leafless trees. At least it knows why it can&amp;rsquo;t. Maybe there&amp;rsquo;s a better model out 
there for that. Or we could train one, if we really needed tree identification! We could make every component of 
this service more sophisticated! &lt;/p&gt;

&lt;p&gt;But that I, personally, can make a proof of concept like this with a few days of effort
continues to boggle my mind. Thanks to a handful of amazing open source projects, it&amp;rsquo;s really, spookily, easy. And from here, I (or you) can build out a screen-reader addon, or a mobile app, or a different kind of AI service, with modular changes.&lt;/p&gt;
&lt;h2 id='deployment-notes' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#deployment-notes' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Deployment notes&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;On Fly.io, stopping GPU Machines saves you a bunch of money and some carbon footprint, in return for cold-start latency when you make a request for the first time in more than a few minutes. In testing this project, on the &lt;code&gt;a100-40gb&lt;/code&gt; Fly Machine preset, the 34b-parameter LLaVA model took several seconds to generate each response. If the Machine was stopped when the request came in, starting it up took another handful of seconds, followed by several tens of seconds to load the model into GPU RAM. The total time from cold start to completed description was about 45 seconds. Just something to keep in mind.&lt;/p&gt;

&lt;p&gt;If you&amp;rsquo;re running Ollama in the cloud, you likely want to put the model onto storage that&amp;rsquo;s persistent, so you don&amp;rsquo;t have to download it repeatedly. You could also build the model into a Docker image ahead of deployment.&lt;/p&gt;

&lt;p&gt;The PocketBase Golang app compiles to a single executable that you can run wherever.
I run it on Fly.io, unsurprisingly, and the &lt;a href='https://github.com/superfly/llm-describer/' title=''&gt;repo&lt;/a&gt; comes with a Dockerfile and a &lt;a href='https://fly.io/docs/reference/configuration/' title=''&gt;&lt;code&gt;fly.toml&lt;/code&gt;&lt;/a&gt; config file, which you can edit to point at your own Ollama instance. It uses a small persistent storage volume for the SQLite database. Under testing, it runs fine on a &lt;code&gt;shared-cpu-1x&lt;/code&gt; Machine. &lt;/p&gt;</content>
    </entry>
    <entry>
        <title>JIT WireGuard</title>
        <link rel="alternate" href="https://fly.io/blog/jit-wireguard-peers/"/>
        <id>https://fly.io/blog/jit-wireguard-peers/</id>
        <published>2024-03-12T00:00:00+00:00</published>
        <updated>2025-05-20T16:26:25+00:00</updated>
        <media:thumbnail url="https://fly.io/blog/jit-wireguard-peers/assets/network-thumbnail.webp"/>
        <content type="html">&lt;div class="lead"&gt;&lt;p&gt;We’re Fly.io and we transmute containers into VMs, running them on our hardware around the world with the power of Firecracker alchemy. We do a lot of stuff with WireGuard, which has become a part of our customer API. This is a quick story about some tricks we played to make WireGuard faster and more scalable for the hundreds of thousands of people who now use it here.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;One of many odd decisions we&amp;rsquo;ve made at Fly.io is how we use WireGuard. It&amp;rsquo;s not just that we use it in many places where other shops would use HTTPS and REST APIs. We&amp;rsquo;ve gone a step beyond that: every time you run &lt;code&gt;flyctl&lt;/code&gt;, our lovable, sprawling CLI, it conjures a TCP/IP stack out of thin air, with its own IPv6 address, and speaks directly to Fly Machines running on our networks.&lt;/p&gt;

&lt;p&gt;There are plusses and minuses to this approach, which we talked about &lt;a href='https://fly.io/blog/our-user-mode-wireguard-year/' title=''&gt;in a blog post a couple years back&lt;/a&gt;. Some things, like remote-operated Docker builders, get easier to express (a Fly Machine, as far as &lt;code&gt;flyctl&lt;/code&gt; is concerned, might as well be on the same LAN). But everything generally gets trickier to keep running reliably.&lt;/p&gt;

&lt;p&gt;It was a decision. We own it.&lt;/p&gt;

&lt;p&gt;Anyways, we&amp;rsquo;ve made some improvements recently, and I&amp;rsquo;d like to talk about them.&lt;/p&gt;
&lt;h2 id='where-we-left-off' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#where-we-left-off' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Where we left off&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;Until a few weeks ago, our gateways ran on a pretty simple system.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;We operate dozens of &amp;ldquo;gateway&amp;rdquo; servers around the world, whose sole purpose is to accept incoming WireGuard connections and connect them to the appropriate private networks.
&lt;/li&gt;&lt;li&gt;Any time you run &lt;code&gt;flyctl&lt;/code&gt; and it needs to talk to a Fly Machine (to build a container, pop an SSH console, copy files, or proxy to a service you&amp;rsquo;re running), it spawns or connects to a background agent process.
&lt;/li&gt;&lt;li&gt;The first time it runs, the agent generates a new WireGuard peer configuration from our GraphQL API. WireGuard peer configurations are very simple: just a public key and an address to connect to.
&lt;/li&gt;&lt;li&gt;Our API in turn takes that peer configuration and sends it to the appropriate gateway (say, &lt;code&gt;ord&lt;/code&gt;, if you&amp;rsquo;re near Chicago) via an RPC we send over the NATS messaging system.
&lt;/li&gt;&lt;li&gt;On the gateway, a service called &lt;code&gt;wggwd&lt;/code&gt; accepts that configuration, saves it to a SQLite database, and adds it to the kernel using WireGuard&amp;rsquo;s Golang libraries. &lt;code&gt;wggwd&lt;/code&gt; acknowledges the installation of the peer to the API.
&lt;/li&gt;&lt;li&gt;The API replies to your GraphQL request, with the configuration.
&lt;/li&gt;&lt;li&gt;Your &lt;code&gt;flyctl&lt;/code&gt; connects to the WireGuard peer, which works, because you receiving the configuration means it’s installed on the gateway.
&lt;/li&gt;&lt;/ol&gt;

&lt;p&gt;I copy-pasted those last two bullet points from &lt;a href='https://fly.io/blog/our-user-mode-wireguard-year/' title=''&gt;that two-year-old post&lt;/a&gt;, because when it works, it does &lt;em&gt;just work&lt;/em&gt; reasonably well. (We ultimately did end up defaulting everybody to WireGuard-over-WebSockets, though.)&lt;/p&gt;

&lt;p&gt;But if it always worked, we wouldn&amp;rsquo;t be here, would we?&lt;/p&gt;

&lt;p&gt;We ran into two annoying problems:&lt;/p&gt;

&lt;p&gt;One: NATS is fast, but doesn’t guarantee delivery. Back in 2022, Fly.io was pretty big on NATS internally. We&amp;rsquo;ve moved away from it. For instance, our &lt;a href='https://fly.io/blog/carving-the-scheduler-out-of-our-orchestrator/' title=''&gt;internal &lt;code&gt;flyd&lt;/code&gt; API&lt;/a&gt; used to be driven by NATS; today, it&amp;rsquo;s HTTP.  Our NATS cluster was losing too many messages to host a reliable API on it. Scaling back our use of NATS made WireGuard gateways better, but still not great.&lt;/p&gt;

&lt;p&gt;Two: When &lt;code&gt;flyctl&lt;/code&gt; exits, the WireGuard peer it created sticks around on the gateway. Nothing cleans up old peers. After all, you&amp;rsquo;re likely going to come back tomorrow and deploy a new version of your app, or &lt;code&gt;fly ssh console&lt;/code&gt; into it to debug something. Why remove a peer just to re-add it the next day?&lt;/p&gt;

&lt;p&gt;Unfortunately, the vast majority of peers are created by &lt;code&gt;flyctl&lt;/code&gt; in CI jobs, which don&amp;rsquo;t have persistent storage and can&amp;rsquo;t reconnect to the same peer the next run; they generate new peers every time, no matter what.&lt;/p&gt;

&lt;p&gt;So, we ended up with a not-reliable-enough provisioning system, and gateways with hundreds of thousands of peers that will never be used again. The high stale peer count made kernel WireGuard operations very slow - especially loading all the peers back into the kernel after a gateway server reboot - as well as some kernel panics.&lt;/p&gt;

&lt;p&gt;There had to be&lt;/p&gt;
&lt;h2 id='a-better-way' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#a-better-way' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;A better way.&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;Storing bajillions of WireGuard peers is no big challenge for any serious n-tier RDBMS. This isn&amp;rsquo;t &amp;ldquo;big data&amp;rdquo;. The problem we have at Fly.io is that our gateways don&amp;rsquo;t have serious n-tier RDBMSs. They&amp;rsquo;re small. Scrappy. They live off the land.&lt;/p&gt;

&lt;p&gt;Seriously, though: you could store every WireGuard peer everybody has ever used at Fly.io in a single SQLite database, easily.  What you can&amp;rsquo;t do is store them all in the Linux kernel.&lt;/p&gt;

&lt;p&gt;So, at some point, as you push more and more peer configurations to a gateway, you have to start making decisions about which peers you&amp;rsquo;ll enable in the kernel, and which you won&amp;rsquo;t.&lt;/p&gt;

&lt;p&gt;Wouldn&amp;rsquo;t it be nice if we just didn&amp;rsquo;t have this problem? What if, instead of pushing configs to gateways, we had the gateways pull them from our API on demand?&lt;/p&gt;

&lt;p&gt;If you did that, peers would only have to be added to the kernel when the client wanted to connect. You could yeet them out of the kernel any time you wanted; the next time the client connected, they&amp;rsquo;d just get pulled again, and everything would work fine.&lt;/p&gt;

&lt;p&gt;The problem you quickly run into to build this design is that Linux kernel WireGuard doesn&amp;rsquo;t have a feature for installing peers on demand. However:&lt;/p&gt;
&lt;h2 id='it-is-possible-to-jit-wireguard-peers' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#it-is-possible-to-jit-wireguard-peers' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;It is possible to JIT WireGuard peers&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;The Linux kernel&amp;rsquo;s &lt;a href='https://github.com/WireGuard/wgctrl-go' title=''&gt;interface for configuring WireGuard&lt;/a&gt; is &lt;a href='https://docs.kernel.org/userspace-api/netlink/intro.html' title=''&gt;Netlink&lt;/a&gt; (which is basically a way to create a userland socket to talk to a kernel service). Here&amp;rsquo;s a &lt;a href='https://github.com/WireGuard/wg-dynamic/blob/master/netlink.h' title=''&gt;summary of it as a C API&lt;/a&gt;. Note that there&amp;rsquo;s no API call to subscribe for &amp;ldquo;incoming connection attempt&amp;rdquo; events.&lt;/p&gt;

&lt;p&gt;That&amp;rsquo;s OK! We can just make our own events. WireGuard connection requests are packets, and they&amp;rsquo;re easily identifiable, so we can efficiently snatch them with a BPF filter and a &lt;a href='https://github.com/google/gopacket' title=''&gt;packet socket&lt;/a&gt;.&lt;/p&gt;
&lt;div class="callout"&gt;&lt;p&gt;Most of the time, it’s even easier for us to get the raw WireGuard packets, because our users now default to WebSockets WireGuard (which is just an unauthenticated WebSockets connect that shuttles framed UDP packets to and from an interface on the gateway), so that people who have trouble talking end-to-end in UDP can bring connections up.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;We own the daemon code for that, and can just hook the packet receive function to snarf WireGuard packets.&lt;/p&gt;

&lt;p&gt;It&amp;rsquo;s not obvious, but WireGuard doesn&amp;rsquo;t have notions of &amp;ldquo;client&amp;rdquo; or &amp;ldquo;server&amp;rdquo;. It&amp;rsquo;s a pure point-to-point protocol; peers connect to each other when they have traffic to send. The first peer to connect is called the &lt;strong class='font-semibold text-navy-950'&gt;initiator&lt;/strong&gt;, and the peer it connects to is the &lt;strong class='font-semibold text-navy-950'&gt;responder&lt;/strong&gt;.&lt;/p&gt;
&lt;div class="right-sidenote"&gt;&lt;p&gt;&lt;a href="https://www.wireguard.com/papers/wireguard.pdf" title=""&gt;&lt;em&gt;The WireGuard paper&lt;/em&gt;&lt;/a&gt; &lt;em&gt;is a good read.&lt;/em&gt;&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;For Fly.io, &lt;code&gt;flyctl&lt;/code&gt; is typically our initiator,  sending a single UDP packet to the gateway, which is the responder. According &lt;a href='https://www.wireguard.com/papers/wireguard.pdf' title=''&gt;to the WireGuard paper&lt;/a&gt;, this first packet is a &lt;code&gt;handshake initiation&lt;/code&gt;.  It gets better: the packet type is recorded in a single plaintext byte. So this simple BPF filter catches all the incoming connections: &lt;code&gt;udp and dst port 51820 and udp[8] = 1&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;In most other protocols, we&amp;rsquo;d be done at this point; we&amp;rsquo;d just scrape the username or whatnot out of the packet, go fetch the matching configuration, and install it in the kernel. With WireGuard, not so fast. WireGuard is based on Trevor Perrin&amp;rsquo;s &lt;a href='http://www.noiseprotocol.org/' title=''&gt;Noise Protocol Framework&lt;/a&gt;, and Noise goes way out of its way to &lt;a href='http://www.noiseprotocol.org/noise.html#identity-hiding' title=''&gt;hide identities&lt;/a&gt; during handshakes. To identify incoming requests, we&amp;rsquo;ll need to run enough Noise cryptography to decrypt the identity.&lt;/p&gt;

&lt;p&gt;The code to do this is fussy, but it&amp;rsquo;s relatively short (about 200 lines). Helpfully, the kernel Netlink interface will give a privileged process the private key for an interface, so the secrets we need to unwrap WireGuard are easy to get. Then it&amp;rsquo;s just a matter of running the first bit of the Noise handshake. If you&amp;rsquo;re that kind of nerdy, &lt;a href='https://gist.github.com/tqbf/9f2c2852e976e6566f962d9bca83062b' title=''&gt;here&amp;rsquo;s the code.&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;At this point, we have the event feed we wanted: the public keys of every user trying to make a WireGuard connection to our gateways. We keep a rate-limited cache in SQLite, and when we see new peers, we&amp;rsquo;ll make an internal HTTP API request to fetch the matching peer information and install it. This fits nicely into the little daemon that already runs on our gateways to manage WireGuard, and allows us to ruthlessly and recklessly remove stale peers with a &lt;code&gt;cron&lt;/code&gt; job.&lt;/p&gt;

&lt;p&gt;But wait! There&amp;rsquo;s more! We bounced this plan off Jason Donenfeld, and he tipped us off on a sneaky feature of the Linux WireGuard Netlink interface.&lt;/p&gt;
&lt;div class="right-sidenote"&gt;&lt;p&gt;Jason is the hardest working person in show business.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;Our API fetch for new peers is generally not going to be fast enough to respond to the first handshake initiation message a new client sends us. That&amp;rsquo;s OK; WireGuard is pretty fast about retrying. But we can do better.&lt;/p&gt;

&lt;p&gt;When we get an incoming initiation message, we have the 4-tuple address of the desired connection, including the ephemeral source port &lt;code&gt;flyctl&lt;/code&gt; is using. We can install the peer as if we&amp;rsquo;re the initiator, and &lt;code&gt;flyctl&lt;/code&gt; is the responder. The Linux kernel will initiate a WireGuard connection back to &lt;code&gt;flyctl&lt;/code&gt;. This works; the protocol doesn&amp;rsquo;t care a whole lot who&amp;rsquo;s the server and who&amp;rsquo;s the client. We get new connections established about as fast as they can possibly be installed.&lt;/p&gt;
&lt;figure class="post-cta"&gt;
  &lt;figcaption&gt;
    &lt;h1&gt;Launch an app in minutes&lt;/h1&gt;
    &lt;p&gt;Speedrun an app onto Fly.io and get your own JIT WireGuard peer&amp;nbsp✨&lt;/p&gt;
      &lt;a class="btn btn-lg" href="/docs/speedrun/"&gt;
        Speedrun &amp;nbsp;&lt;span class='opacity-50'&gt;→&lt;/span&gt;
      &lt;/a&gt;
  &lt;/figcaption&gt;
  &lt;div class="image-container"&gt;
    &lt;img src="/static/images/cta-cat.webp" srcset="/static/images/[email protected] 2x" alt=""&gt;
  &lt;/div&gt;
&lt;/figure&gt;

&lt;h2 id='look-at-this-graph' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#look-at-this-graph' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Look at this graph&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;We&amp;rsquo;ve been running this in production for a few weeks and we&amp;rsquo;re feeling pretty happy about it. We went from thousands, or hundreds of thousands, of stale WireGuard peers on a gateway to what rounds to none. Gateways now hold a lot less state, are faster at setting up peers, and can be rebooted without having to wait for many unused peers to be loaded back into the kernel.&lt;/p&gt;

&lt;p&gt;I&amp;rsquo;ll leave you with this happy Grafana chart from the day of the switchover.&lt;/p&gt;

&lt;p&gt;&lt;img alt="a Grafana chart of &amp;#39;kernel_stale_wg_peer_count&amp;#39; vs. time. For the first few hours, all traces are flat. Most are at values between 0 and 50,000 and the top-most is just under 550,000. Towards the end of the graph, each line in turn jumps sharply down to the bottom, and at the end of the chart all datapoints are indistinguishable from 0." src="/blog/jit-wireguard-peers/assets/wireguard-peers-graph.webp" /&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong class='font-semibold text-navy-950'&gt;Editor&amp;rsquo;s note:&lt;/strong&gt; Despite our tearful protests, Lillian has decided to move on from Fly.io to explore new pursuits. We wish her much success and happiness!&amp;nbsp;✨&lt;/p&gt;</content>
    </entry>
    <entry>
        <title>Fly Kubernetes does more now</title>
        <link rel="alternate" href="https://fly.io/blog/fks-beta-live/"/>
        <id>https://fly.io/blog/fks-beta-live/</id>
        <published>2024-03-07T00:00:00+00:00</published>
        <updated>2025-05-20T16:26:25+00:00</updated>
        <media:thumbnail url="https://fly.io/blog/fks-beta-live/assets/fks-thumb.webp"/>
        <content type="html">&lt;div class="lead"&gt;&lt;p&gt;Eons ago, we &lt;a href="https://fly.io/blog/fks/" title=""&gt;announced&lt;/a&gt; we were working on &lt;a href="https://fly.io/docs/kubernetes/" title=""&gt;Fly Kubernetes&lt;/a&gt;. It drummed up enough excitement to prove we were heading in the right direction. So, we got hard to work to get from barebones “early access” to a beta release. We’ll be onboarding customers to the closed beta over the next few weeks. Email us at &lt;a href="mailto:[email protected]"&gt;[email protected]&lt;/a&gt; and we’ll hook you up.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;Fly Kubernetes is the &amp;ldquo;blessed path&amp;quot;™️ to using Kubernetes backed by Fly.io infrastructure. Or, in simpler terms, it is our managed Kubernetes service. We take care of the complexity of operating the Kubernetes control plane, leaving you with the unfettered joy of deploying your Kubernetes workloads. If you love Fly.io and K8s, this product is for you.&lt;/p&gt;
&lt;h2 id='what-even-is-a-kubernete' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#what-even-is-a-kubernete' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;What even is a Kubernete?&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;So how did this all come to be&amp;mdash;and what even is a Kubernete?&lt;/p&gt;
&lt;div class="right-sidenote"&gt;&lt;p&gt;You can see more fun details in &lt;a href="https://fly.io/blog/fks/" title=""&gt;Introducing Fly Kubernetes&lt;/a&gt;.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;If you wade through all the YAML and &lt;a href='https://landscape.cncf.io/' title=''&gt;CNCF projects&lt;/a&gt;, what&amp;rsquo;s left is an API for declaring workloads and how it should be accessed. &lt;/p&gt;

&lt;p&gt;But that&amp;rsquo;s not what people usually talk / groan about. It&amp;rsquo;s everything else that comes along with adopting Kubernetes: a container runtime (CRI), networking between workloads (CNI) which leads to DNS (CoreDNS). Then you layer on Prometheus for metrics and whatever the logging daemon du jour is at the time. Now you get to debate which Ingress&amp;mdash;strike that&amp;mdash;&lt;em&gt;Gateway&lt;/em&gt; API to deploy and if the next thing is anything to do with a Service Mess, then as they like to say where I live, &amp;quot;bless your heart&amp;rdquo;.&lt;/p&gt;

&lt;p&gt;Finally, there&amp;rsquo;s capacity planning. You&amp;rsquo;ve got to pick and choose where, how and what the &lt;a href='https://kubernetes.io/docs/concepts/architecture/nodes/' title=''&gt;Nodes&lt;/a&gt; will look like in order to configure and run the workloads.&lt;/p&gt;

&lt;p&gt;When we began thinking about what a Fly Kubernetes Service could look like, we started from first principles, as we do with most everything here. The best way we can describe it is the &lt;a href='https://www.youtube.com/watch?v=Ddk9ci6geSs' title=''&gt;scene from Iron Man 2 when Tony Stark discovers a new element&lt;/a&gt;. As he&amp;rsquo;s looking at the knowledge left behind by those that came before, he starts to imagine something entirely different and more capable than could have been accomplished previously. That&amp;rsquo;s what happened to JP, but with K3s and Virtual Kubelet.&lt;/p&gt;
&lt;h2 id='ok-then-wtf-whats-the-fks' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#ok-then-wtf-whats-the-fks' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;OK then, WTF (what&amp;rsquo;s the FKS)?&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;We looked at what people need to get started—the API—and then started peeling away all the noise, filling in the gaps to connect things together to provide the power. Here&amp;rsquo;s how this looks currently:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Containerd/CRI → &lt;a href='https://fly.io/blog/carving-the-scheduler-out-of-our-orchestrator/' title=''&gt;flyd&lt;/a&gt; + Firecracker + &lt;a href='https://fly.io/blog/docker-without-docker/' title=''&gt;our init&lt;/a&gt;: our system transmogrifies Docker containers into Firecracker microVMs
&lt;/li&gt;&lt;li&gt;Networking/CNI → Our &lt;a href='https://fly.io/blog/ipv6-wireguard-peering/' title=''&gt;internal WireGuard mesh&lt;/a&gt; connects your pods together
&lt;/li&gt;&lt;li&gt;Pods → Fly Machines VMs
&lt;/li&gt;&lt;li&gt;Secrets → Secrets, only not the base64&amp;rsquo;d kind
&lt;/li&gt;&lt;li&gt;Services → The Fly Proxy
&lt;/li&gt;&lt;li&gt;CoreDNS → CoreDNS (to be replaced with our custom internal DNS)
&lt;/li&gt;&lt;li&gt;Persistent Volumes → Fly Volumes (coming soon)
&lt;/li&gt;&lt;/ul&gt;

&lt;p&gt;Now&amp;hellip;not everything is a one-to-one comparison, and we explicitly did not set out to support any and every configuration. We aren&amp;rsquo;t dealing with resources like Network Policy and init containers, though we&amp;rsquo;re also not completely ignoring them. By mapping many of the core primitives of Kubernetes to a Fly.io resource, we&amp;rsquo;re able to focus on continuing to build the primitives that make our cloud better for workloads of all shapes and sizes.&lt;/p&gt;

&lt;p&gt;A key thing to notice above is that there&amp;rsquo;s no &amp;ldquo;Node&amp;rdquo;.&lt;/p&gt;

&lt;p&gt;&lt;a href='https://virtual-kubelet.io/' title=''&gt;Virtual Kubelet&lt;/a&gt; plays a central role in FKS. It&amp;rsquo;s magic, really. A Virtual Kubelet acts as if it&amp;rsquo;s a standard Kubelet running on a Node, eager to run your workloads. However, there&amp;rsquo;s no Node backing it. It instead behaves like an API, receiving requests from Kubernetes and transforming them into requests to deploy on a cloud compute service. In our case, that&amp;rsquo;s Fly Machines.&lt;/p&gt;

&lt;p&gt;So what we have is Kubernetes calling out to our &lt;a href='https://virtual-kubelet.io/docs/providers/' title=''&gt;Virtual Kubelet provider&lt;/a&gt;, a small Golang program we run alongside K3s, to create and run your pod. It creates &lt;a href='https://fly.io/blog/docker-without-docker/' title=''&gt;your pod as a Fly Machine&lt;/a&gt;, via the &lt;a href='/docs/machines/api/' title=''&gt;Fly Machines API&lt;/a&gt;, deploying it to any underlying host within that region. This shifts the burden of managing hardware capacity from you to us. We think that&amp;rsquo;s a cool trick&amp;mdash;thanks, Virtual Kubelet magic!&lt;/p&gt;
&lt;h2 id='speedrun' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#speedrun' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Speedrun&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;You can deploy your workloads (including GPUs) across any of our available regions using the Kubernetes API.&lt;/p&gt;

&lt;p&gt;You create a cluster with &lt;code&gt;flyctl&lt;/code&gt;:&lt;/p&gt;
&lt;div class="highlight-wrapper group relative cmd"&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-wrap-target="#code-ly2t5r5g"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-copy-target="sibling"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight cmd'&gt;&lt;code id="code-ly2t5r5g"&gt;fly ext k8s create --name hello --org personal --region iad
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;When a cluster is created, it has the standard &lt;code&gt;default&lt;/code&gt; namespace. You can inspect it:&lt;/p&gt;
&lt;div class="highlight-wrapper group relative cmd"&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-wrap-target="#code-6wmi0e1f"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-copy-target="sibling"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight cmd'&gt;&lt;code id="code-6wmi0e1f"&gt;kubectl get ns default --show-labels
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;&lt;div class="highlight-wrapper group relative output"&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-wrap-target="#code-vwi22w41"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-copy-target="sibling"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight output whitespace-pre'&gt;&lt;code id="code-vwi22w41"&gt;NAME      STATUS   AGE   LABELS
default   Active   20d   fly.io/app=fks-default-7zyjm3ovpdxmd0ep,kubernetes.io/metadata.name=default
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;The &lt;code&gt;fly.io/app&lt;/code&gt; label shows the name of the Fly App that corresponds to your cluster.&lt;/p&gt;

&lt;p&gt;It would seem appropriate to deploy the &lt;a href='https://github.com/kubernetes-up-and-running/kuard' title=''&gt;Kubernetes Up And Running demo&lt;/a&gt; here, but since your pods are connected over an &lt;a href='https://fly.io/blog/ipv6-wireguard-peering/' title=''&gt;IPv6 WireGuard mesh&lt;/a&gt;, we&amp;rsquo;re going to use a &lt;a href='https://github.com/jipperinbham/kuard' title=''&gt;fork&lt;/a&gt; with support for &lt;a href='https://github.com/kubernetes-up-and-running/kuard/issues/46' title=''&gt;IPv6 DNS&lt;/a&gt;.&lt;/p&gt;
&lt;div class="highlight-wrapper group relative cmd"&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-wrap-target="#code-1t0a2uma"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-copy-target="sibling"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight cmd'&gt;&lt;code id="code-1t0a2uma"&gt;kubectl run \
  --image=ghcr.io/jipperinbham/kuard-amd64:blue \
  --labels="app=kuard-fks" \
  kuard
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;And you can see its Machine representation via:&lt;/p&gt;
&lt;div class="highlight-wrapper group relative cmd"&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-wrap-target="#code-leb9l9a3"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-copy-target="sibling"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight cmd'&gt;&lt;code id="code-leb9l9a3"&gt;fly machine list --app fks-default-7zyjm3ovpdxmd0ep
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;&lt;div class="highlight-wrapper group relative output"&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-wrap-target="#code-w2frd6os"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-copy-target="sibling"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight output whitespace-pre'&gt;&lt;code id="code-w2frd6os"&gt;ID              NAME        STATE   REGION  IMAGE                           IP ADDRESS                      VOLUME  CREATED                 LAST UPDATED            APP PLATFORM    PROCESS GROUP   SIZE
1852291c46ded8  kuard       started iad     jipperinbham/kuard-amd64:blue   fdaa:0:48c8:a7b:228:4b6d:6e20:2         2024-03-05T18:54:41Z    2024-03-05T18:54:44Z                                    shared-cpu-1x:256MB
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;&lt;/div&gt;&lt;/p&gt;

&lt;p&gt;This is important! Your pod is a Fly Machine! While we don’t yet support all kubectl features, Fly.io tooling will &amp;ldquo;just work&amp;rdquo; for cases where we don&amp;rsquo;t yet support the kubectl way. So, for example, we don&amp;rsquo;t have &lt;code&gt;kubectl port-forward&lt;/code&gt; and &lt;code&gt;kubectl exec&lt;/code&gt;, but you can use flyctl to forward ports and get a shell into a pod.&lt;/p&gt;

&lt;p&gt;Expose it to your internal network using the standard ClusterIP Service:&lt;/p&gt;
&lt;div class="highlight-wrapper group relative cmd"&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-wrap-target="#code-1xzi76dc"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-copy-target="sibling"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight cmd'&gt;&lt;code id="code-1xzi76dc"&gt;kubectl expose pod kuard \
  --name=kuard \
  --port=8080 \
  --target-port=8080 \
  --selector='app=kuard-fks'
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;ClusterIP Services work natively, and Fly.io internal DNS supports them. Within the cluster, CoreDNS works too.&lt;/p&gt;

&lt;p&gt;Access this Service locally via &lt;a href='https://fly.io/docs/networking/private-networking/#flycast-private-load-balancing' title=''&gt;flycast&lt;/a&gt;: Get connected to your org&amp;rsquo;s &lt;a href='https://fly.io/docs/networking/private-networking/' title=''&gt;6PN private WireGuard network&lt;/a&gt;. Get kubectl to describe the &lt;code&gt;kuard&lt;/code&gt; Service:&lt;/p&gt;
&lt;div class="highlight-wrapper group relative cmd"&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-wrap-target="#code-ucdw0m7q"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-copy-target="sibling"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight cmd'&gt;&lt;code id="code-ucdw0m7q"&gt;kubectl describe svc kuard
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;&lt;div class="highlight-wrapper group relative output"&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-wrap-target="#code-1uyr9m4j"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-copy-target="sibling"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight output'&gt;&lt;code id="code-1uyr9m4j"&gt;Name:              kuard
Namespace:         default
Labels:            app=kuard-fks
Annotations:       fly.io/clusterip-allocator: configured
                   service.fly.io/sync-version: 11507529969321451315
Selector:          app=kuard-fks
Type:              ClusterIP
IP Family Policy:  SingleStack
IP Families:       IPv6
IP:                fdaa:0:48c8:0:1::1a
IPs:               fdaa:0:48c8:0:1::1a
Port:              &amp;lt;unset&amp;gt;  8080/TCP
TargetPort:        8080/TCP
Endpoints:         [fdaa:0:48c8:a7b:228:4b6d:6e20:2]:8080
Session Affinity:  None
Events:            &amp;lt;none&amp;gt;
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;You can pull out the Service&amp;rsquo;s IP address from the above output, and get at the KUARD UI using that: in this case, &lt;code&gt;http://[fdaa:0:48c8:0:1::1a]:8080&lt;/code&gt;. &lt;/p&gt;

&lt;p&gt;Using internal DNS: &lt;code&gt;http://&amp;lt;service_name&amp;gt;.svc.&amp;lt;app_name&amp;gt;.flycast:8080&lt;/code&gt;. Or, in our example: &lt;code&gt;http://kuard.svc.fks-default-7zyjm3ovpdxmd0ep.flycast:8080&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;And finally CoreDNS: &lt;code&gt;&amp;lt;service_name&amp;gt;.&amp;lt;namespace&amp;gt;.svc.cluster.local&lt;/code&gt; resolves to the &lt;code&gt;fdaa&lt;/code&gt; IP and is routable within the cluster.&lt;/p&gt;
&lt;figure class="post-cta"&gt;
  &lt;figcaption&gt;
    &lt;h1&gt;Get in on the FKS beta&lt;/h1&gt;
    &lt;p&gt;Email us at [email protected]&lt;/p&gt;
  &lt;/figcaption&gt;
  &lt;div class="image-container"&gt;
    &lt;img src="/static/images/cta-cat.webp" srcset="/static/images/[email protected] 2x" alt=""&gt;
  &lt;/div&gt;
&lt;/figure&gt;

&lt;h2 id='pricing' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#pricing' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Pricing&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;The Fly Kubernetes Service is free during the beta. Fly Machines and Fly Volumes you create with it will cost the &lt;a href='https://fly.io/docs/about/pricing/' title=''&gt;same as for your other Fly.io projects&lt;/a&gt;. It&amp;rsquo;ll be &lt;a href='https://fly.io/docs/about/pricing/#fly-kubernetes' title=''&gt;$75/mo per cluster&lt;/a&gt; after that, plus the cost of the other resources you create.&lt;/p&gt;
&lt;h2 id='today-and-the-future' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#today-and-the-future' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Today and the future&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;Today, Fly Kubernetes supports only a portion of the Kubernetes API. You can deploy pods using Deployments/ReplicaSets. Pods are able to communicate via Services using the standard K8s DNS format. Ephemeral and persistent volumes are supported.&lt;/p&gt;

&lt;p&gt;The most notable absences are: multi-container pods, StatefulSets, network policies, horizontal pod autoscaling and emptyDir volumes. We&amp;rsquo;re working at supporting autoscaling and emptyDir volumes in the coming weeks and multi-container pods in the coming months.&lt;/p&gt;

&lt;p&gt;If you&amp;rsquo;ve made it this far and are eagerly awaiting your chance to tell us and the rest of the internet &amp;ldquo;this isn&amp;rsquo;t Kubernetes!&amp;rdquo;, well, we agree! It&amp;rsquo;s not something we take lightly. We&amp;rsquo;re still building, and conformance tests may be in the future for FKS. We&amp;rsquo;ve made a deliberate decision to only care about fast launching VMs as the one and only way to run workloads on our cloud. And we also know enough of our customers would like to use the Kubernetes API to create a fast launching VM in the form of a Pod, and that&amp;rsquo;s where this story begins. &lt;/p&gt;</content>
    </entry>
    <entry>
        <title>Globally Distributed Object Storage with Tigris</title>
        <link rel="alternate" href="https://fly.io/blog/tigris-public-beta/"/>
        <id>https://fly.io/blog/tigris-public-beta/</id>
        <published>2024-02-15T00:00:00+00:00</published>
        <updated>2025-05-20T16:26:25+00:00</updated>
        <media:thumbnail url="https://fly.io/blog/tigris-public-beta/assets/tigris-public-beta-thumb.webp"/>
        <content type="html">&lt;div class="lead"&gt;&lt;p&gt;We’re Fly.io and we transmute containers into VMs, running them on our hardware around the world with the power of Firecracker alchemy. That’s pretty cool, but we want to talk about something someone else built, that &lt;a href="https://fly.io/docs/reference/tigris/" title=""&gt;you can use today&lt;/a&gt; to build applications.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;There are three hard things in computer science:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Cache invalidation
&lt;/li&gt;&lt;li&gt;Naming things
&lt;/li&gt;&lt;li&gt;&lt;a href='https://aws.amazon.com/s3/' title=''&gt;Doing a better job than Amazon of storing files&lt;/a&gt;
&lt;/li&gt;&lt;/ol&gt;

&lt;p&gt;Of all the annoying software problems that have no business being annoying, handling a file upload in a full-stack application stands apart, a universal if tractable malady, the plantar fasciitis of programming.&lt;/p&gt;

&lt;p&gt;Now, the actual act of clients placing files on servers is straightforward. Your framework &lt;a href='https://hexdocs.pm/phoenix/file_uploads.html' title=''&gt;has&lt;/a&gt; &lt;a href='https://edgeguides.rubyonrails.org/active_storage_overview.html' title=''&gt;a&lt;/a&gt; &lt;a href='https://docs.djangoproject.com/en/5.0/topics/http/file-uploads/' title=''&gt;feature&lt;/a&gt; &lt;a href='https://expressjs.com/en/resources/middleware/multer.html' title=''&gt;that&lt;/a&gt; &lt;a href='https://github.com/yesodweb/yesod-cookbook/blob/master/cookbook/Cookbook-file-upload-saving-files-to-server.md' title=''&gt;does&lt;/a&gt; &lt;a href='https://laravel.com/docs/10.x/filesystem' title=''&gt;it&lt;/a&gt;. What&amp;rsquo;s hard is making sure that uploads stick around to be downloaded later.&lt;/p&gt;
&lt;aside class="right-sidenote"&gt;&lt;p&gt;(yes, yes, we know, &lt;a href="https://youtu.be/b2F-DItXtZs?t=102" title=""&gt;sharding /dev/null&lt;/a&gt; is faster)&lt;/p&gt;
&lt;/aside&gt;
&lt;p&gt;Enter object storage, a pattern you may know by its colloquial name &amp;ldquo;S3&amp;rdquo;. Object storage occupies a funny place in software architecture, somewhere between a database and a filesystem. It&amp;rsquo;s like &lt;a href='https://man7.org/linux/man-pages/man3/malloc.3.html' title=''&gt;&lt;code&gt;malloc&lt;/code&gt;&lt;/a&gt;&lt;code&gt;()&lt;/code&gt;, but for cloud storage instead of program memory.&lt;/p&gt;

&lt;p&gt;&lt;a href='https://www.kleenex.com/en-us/' title=''&gt;S3&lt;/a&gt;—err, object storage — is so important that it was the second AWS service ever introduced (EC2 was not the first!). Everybody wants it. We know, because they keep asking us for it.&lt;/p&gt;

&lt;p&gt;So why didn&amp;rsquo;t we build it?&lt;/p&gt;

&lt;p&gt;Because we couldn&amp;rsquo;t figure out a way to improve on S3. And we still haven&amp;rsquo;t! But someone else did, at least for the kinds of applications we see on Fly.io.&lt;/p&gt;
&lt;h2 id='but-first-some-back-story' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#but-first-some-back-story' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;But First, Some Back Story&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;S3 checks all the boxes. It&amp;rsquo;s trivial to use. It&amp;rsquo;s efficient and cost-effective. It has redundancies that would make a DoD contractor blush. It integrates with archival services like Glacier. And every framework supports it. At some point, the IETF should just take a deep sigh and write an S3 API RFC, XML signatures and all.&lt;/p&gt;

&lt;p&gt;There&amp;rsquo;s at least one catch, though.&lt;/p&gt;

&lt;p&gt;Back in, like, &amp;lsquo;07 people ran all their apps from a single city. S3 was designed to work for those kinds of apps. The data, the bytes on the disks (or whatever weird hyperputer AWS stores S3 bytes on), live in one place. A specific place. In a specific data center. As powerful and inspiring as The Architects are, they are mortals, and must obey the laws of physics.&lt;/p&gt;

&lt;p&gt;This observation feels banal, until you realize how apps have changed in the last decade. Apps and their users don&amp;rsquo;t live in one specific place. They live all over the world. When users are close to the S3 data center, things are amazing! But things get less amazing the further away you get from the data center, and even less amazing the smaller and more frequent your reads and writes are.&lt;/p&gt;

&lt;p&gt;(Thought experiment: you have to pick one place in the world to route all your file storage. Where is it? Is it &lt;a href='https://www.tripadvisor.com/Restaurant_Review-g30246-d1956555-Reviews-Ford_s_Fish_Shack_Ashburn-Ashburn_Loudoun_County_Virginia.html' title=''&gt;Loudoun County, Virginia&lt;/a&gt;?)&lt;/p&gt;

&lt;p&gt;So, for many modern apps, you end up having to &lt;a href='https://stackoverflow.com/questions/32426249/aws-s3-bucket-with-multiple-regions' title=''&gt;write things into different regions&lt;/a&gt;, so that people close to the data get it from a region-specific bucket. Doing that pulls in CDN-caching things that complicated your application and put barriers between you and your data. Before you know it, you&amp;rsquo;re wearing custom orthotics on your, uh, developer feet. (&lt;em&gt;I am done with this metaphor now, I promise.&lt;/em&gt;)&lt;/p&gt;
&lt;aside class="right-sidenote"&gt;&lt;p&gt;(well, okay, Backblaze B2 because somehow my bucket fits into their free tier, but you get the idea)&lt;/p&gt;
&lt;/aside&gt;
&lt;p&gt;Personally, I know this happens. Because I had to build one! I run a &lt;a href='https://xeiaso.net/blog/xedn/' title=''&gt;CDN backend&lt;/a&gt; that&amp;rsquo;s a caching proxy for S3 in six continents across the world. All so that I can deliver images and video efficiently for the readers of my blog.&lt;/p&gt;
&lt;aside class="right-sidenote"&gt;&lt;p&gt;(shut up, it’s a sandwich)&lt;/p&gt;
&lt;/aside&gt;
&lt;p&gt;What if data was really global? For some applications, it might not matter much. But for others, it matters a lot. When a sandwich lover in Australia snaps a picture of a &lt;a href='https://en.wikipedia.org/wiki/Hamdog' title=''&gt;hamdog&lt;/a&gt;, the people most likely to want to see that photo are also in Australia. Routing those uploads and downloads through one building in Ashburn is no way to build a sandwich reviewing empire.&lt;/p&gt;

&lt;p&gt;Localizing all the data sounds like a hard problem. What if you didn&amp;rsquo;t need to change anything on your end to accomplish it?&lt;/p&gt;
&lt;h2 id='show-me-a-hero' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#show-me-a-hero' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Show Me A Hero&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;Building a miniature CDN infrastructure just to handle file uploads seems like the kind of thing that could take a week or so of tinkering. The Fly.io unified theory of cloud development is that solutions are completely viable for full-stack developers only when they take less than 2 hours to get working.&lt;/p&gt;

&lt;p&gt;AWS agrees, which is why they have a SKU for it, &lt;a href='https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/cloudfront_distribution' title=''&gt;called Cloudfront&lt;/a&gt;, which will, at some variably metered expense, optimize the read side of a single-write-region bucket: they&amp;rsquo;ll set up &lt;a href='https://fly.io/blog/the-5-hour-content-delivery-network/' title=''&gt;a simple caching CDN&lt;/a&gt; for you. You can probably get S3 and Cloudfront working within 2 hours, especially if you&amp;rsquo;ve set it up before.&lt;/p&gt;

&lt;p&gt;Our friends at Tigris have this problem down to single-digit minutes, and what they came up with is a lot cooler than a cache CDN.&lt;/p&gt;

&lt;p&gt;Here&amp;rsquo;s how it works. Tigris runs redundant FoundationDB clusters in our regions to track objects. They use Fly.io&amp;rsquo;s NVMe volumes as a first level of cached raw byte store, and a queuing system modelled on &lt;a href='https://www.foundationdb.org/files/QuiCK.pdf' title=''&gt;Apple&amp;rsquo;s QuiCK paper&lt;/a&gt; to distribute object data to multiple replicas, to regions where the data is in demand, and to 3rd party object stores… like S3.&lt;/p&gt;

&lt;p&gt;If your objects are less than about 128 kilobytes, Tigris makes them instantly global. By default! Things are just snappy, all over the world, automatically, because they&amp;rsquo;ve done all the work.&lt;/p&gt;

&lt;p&gt;But it gets better, because Tigris is also much more flexible than a cache simple CDN. It&amp;rsquo;s globally distributed from the jump, with inter-region routing baked into its distribution layer. Tigris isn&amp;rsquo;t a CDN, but rather a toolset that you can use to build arbitrary CDNs, with consistency guarantees, instant purge and relay regions.&lt;/p&gt;

&lt;p&gt;There&amp;rsquo;s a lot going on in this architecture, and it&amp;rsquo;d be fun to dig into it more. But for now, you don&amp;rsquo;t have to understand any of it. Because Tigris ties all this stuff together with an S3-compatible object storage API. If your framework can talk to S3, it can use Tigris.&lt;/p&gt;
&lt;h2 id='fly-storage' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#fly-storage' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;&lt;code&gt;fly storage&lt;/code&gt;&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;To get started with this, run the &lt;code&gt;fly storage create&lt;/code&gt; command:&lt;/p&gt;
&lt;div class="highlight-wrapper group relative "&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-wrap-target="#code-yj4e9y5v"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-copy-target="sibling"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight '&gt;&lt;code id="code-yj4e9y5v"&gt;$ fly storage create
Choose a name, use the default, or leave blank to generate one: xe-foo-images
Your Tigris project (xe-foo-images) is ready. See details and next steps with: https://fly.io/docs/reference/tigris/

Setting the following secrets on xe-foo:
AWS_REGION
BUCKET_NAME
AWS_ENDPOINT_URL_S3
AWS_ACCESS_KEY_ID
AWS_SECRET_ACCESS_KEY

Secrets are staged for the first deployment
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;All you have to do is fill in a bucket name. Hit enter. All of the configuration for the AWS S3 library will be injected into your application for you. And you don&amp;rsquo;t even need to change the libraries that you&amp;rsquo;re using. &lt;a href='https://www.tigrisdata.com/docs/sdks/s3/' title=''&gt;The Tigris examples&lt;/a&gt; all use the AWS libraries to put and delete objects into Tigris using the same calls that you use for S3.&lt;/p&gt;

&lt;p&gt;I know how this looks for a lot of you. It looks like we&amp;rsquo;re partnering with Tigris because we&amp;rsquo;re chicken, and we didn&amp;rsquo;t want to build something like this. Well, guess what: you&amp;rsquo;re right!&lt;/p&gt;

&lt;p&gt;Compute and networking: those are things we love and understand. Object storage? &lt;a href='https://fly.io/blog/the-5-hour-content-delivery-network/' title=''&gt;We already gave away the game on how we&amp;rsquo;d design a CDN for our own content&lt;/a&gt;, and it wasn&amp;rsquo;t nearly as slick as Tigris.&lt;/p&gt;

&lt;p&gt;Object storage is important. It needs to be good. We did not want to half-ass it. So we partnered with Tigris, so that they can put their full resources into making object storage as ✨magical✨ as Fly.io is.&lt;/p&gt;

&lt;p&gt;This also mirrors a lot of the Unix philosophy of Days Gone Past, you have individual parts that do one thing very well that are then chained together to create a composite result. I mean, come on, would you seriously want to buy your servers the same place you buy your shoes?&lt;/p&gt;
&lt;h2 id='one-bill-to-rule-them-all' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#one-bill-to-rule-them-all' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;One bill to rule them all&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;Well, okay, the main reason why you would want to do that is because having everything under one bill makes it really easy for your accounting people. So, to make one bill for your computer, your block storage, your databases, your networking, and your object storage, we&amp;rsquo;ve wrapped everything under one bill. You don&amp;rsquo;t have to create separate accounts with Supabase or Upstash or PlanetScale or Tigris. Everything gets charged to your Fly.io bill and you pay one bill per month.&lt;/p&gt;
&lt;aside class="right-sidenote"&gt;&lt;p&gt;This was actually going to be posted on Valentine’s Day, but we had to wait for the chocolate to go on sale.&lt;/p&gt;
&lt;/aside&gt;
&lt;p&gt;This is our Valentine&amp;rsquo;s Day gift to you all. Object storage that just works. Stay tuned because we have a couple exciting features that build on top of the integration of Fly.io and Tigris that allow really unique things, such as truly global static website hosting and turning your bucket into a CDN in 5 minutes at most.&lt;/p&gt;

&lt;p&gt;Here&amp;rsquo;s to many more happy developer days to come.&lt;/p&gt;</content>
    </entry>
    <entry>
        <title>GPUs on Fly.io are available to everyone!</title>
        <link rel="alternate" href="https://fly.io/blog/gpu-ga/"/>
        <id>https://fly.io/blog/gpu-ga/</id>
        <published>2024-02-12T00:00:00+00:00</published>
        <updated>2025-05-20T16:26:25+00:00</updated>
        <media:thumbnail url="https://fly.io/blog/gpu-ga/assets/gpu-ga-thumb.webp"/>
        <content type="html">&lt;div class="lead"&gt;&lt;p&gt;Fly.io makes it easy to spin up compute around the world, now including powerful GPUs. Unlock the power of large language models, text transcription, and image generation with our datacenter-grade muscle!&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;GPUs are now available to everyone!&lt;/p&gt;

&lt;p&gt;We know you&amp;rsquo;ve been excited about wanting to use GPUs on Fly.io and we&amp;rsquo;re happy to announce that they&amp;rsquo;re available for everyone. If you want, you can spin up GPU instances with any of the following cards:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Ampere A100 (40GB) &lt;code&gt;a100-40gb&lt;/code&gt;
&lt;/li&gt;&lt;li&gt;Ampere A100 (80GB) &lt;code&gt;a100-80gb&lt;/code&gt;
&lt;/li&gt;&lt;li&gt;Lovelace L40s (48GB) &lt;code&gt;l40s&lt;/code&gt;
&lt;/li&gt;&lt;/ul&gt;

&lt;p&gt;To use a GPU instance today, change the &lt;code&gt;vm.size&lt;/code&gt; for one of your apps or processes to any of the above GPU kinds. Here&amp;rsquo;s how you can spin up an &lt;a href='https://ollama.ai' title=''&gt;Ollama&lt;/a&gt; server in seconds:&lt;/p&gt;
&lt;div class="highlight-wrapper group relative toml"&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-wrap-target="#code-j62lzfgm"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-copy-target="sibling"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight '&gt;&lt;code id="code-j62lzfgm"&gt;&lt;span class="py"&gt;app&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"your-app-name"&lt;/span&gt;
&lt;span class="py"&gt;region&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"ord"&lt;/span&gt;
&lt;span class="py"&gt;vm.size&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"l40s"&lt;/span&gt;

&lt;span class="nn"&gt;[http_service]&lt;/span&gt;
  &lt;span class="py"&gt;internal_port&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;11434&lt;/span&gt;
  &lt;span class="py"&gt;force_https&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;
  &lt;span class="py"&gt;auto_stop_machines&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
  &lt;span class="py"&gt;auto_start_machines&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
  &lt;span class="py"&gt;min_machines_running&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
  &lt;span class="py"&gt;processes&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nn"&gt;["app"]&lt;/span&gt;

&lt;span class="nn"&gt;[build]&lt;/span&gt;
  &lt;span class="py"&gt;image&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"ollama/ollama"&lt;/span&gt;

&lt;span class="nn"&gt;[mounts]&lt;/span&gt;
  &lt;span class="py"&gt;source&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"models"&lt;/span&gt;
  &lt;span class="py"&gt;destination&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"/root/.ollama"&lt;/span&gt;
  &lt;span class="py"&gt;initial_size&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"100gb"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Deploy this and bam, large language model inferencing from anywhere. If you want a private setup, see the article &lt;a href='https://fly.io/blog/scaling-llm-ollama/' title=''&gt;Scaling Large Language Models to zero with Ollama&lt;/a&gt; for more information. You never know when you have a sandwich emergency and don&amp;rsquo;t know what you can make with what you have on hand.&lt;/p&gt;

&lt;p&gt;We are working on getting some lower-cost A10 GPUs in the next few weeks. We&amp;rsquo;ll update you when they&amp;rsquo;re ready.&lt;/p&gt;

&lt;p&gt;If you want to explore the possibilities of GPUs on Fly.io, here&amp;rsquo;s a few articles that may give you ideas:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href='https://fly.io/blog/not-midjourney-bot/' title=''&gt;Deploy Your Own (Not) MidJourney Bot On Fly GPUs&lt;/a&gt;
&lt;/li&gt;&lt;li&gt;&lt;a href='https://fly.io/blog/scaling-llm-ollama/' title=''&gt;Scaling Large Language Models to zero with Ollama&lt;/a&gt;
&lt;/li&gt;&lt;li&gt;&lt;a href='https://fly.io/blog/transcribing-on-fly-gpu-machines/' title=''&gt;Transcribing on Fly GPU Machines&lt;/a&gt;
&lt;/li&gt;&lt;/ul&gt;

&lt;p&gt;Depending on factors such as your organization&amp;rsquo;s age and payment history, you may need to go through additional verification steps.&lt;/p&gt;

&lt;p&gt;If you&amp;rsquo;ve been experimenting with Fly.io GPUs and have made something cool, let us know on the &lt;a href='https://community.fly.io/' title=''&gt;Community Forums&lt;/a&gt; or by mentioning us &lt;a href='https://hachyderm.io/@flydotio' title=''&gt;on Mastodon&lt;/a&gt;! We&amp;rsquo;ll boost the cool ones.&lt;/p&gt;</content>
    </entry>
    <entry>
        <title>Event Driven Machines</title>
        <link rel="alternate" href="https://fly.io/blog/event-driven-machines/"/>
        <id>https://fly.io/blog/event-driven-machines/</id>
        <published>2024-02-05T00:00:00+00:00</published>
        <updated>2025-05-20T16:26:25+00:00</updated>
        <media:thumbnail url="https://fly.io/blog/event-driven-machines/assets/lambdo-thumb.webp"/>
        <content type="html">&lt;div class="lead"&gt;&lt;p&gt;We’re Fly.io and we transmute containers into VMs, running them on our hardware around the world. We have fast booting VM’s, so why not &lt;a href="https://fly.io/docs/speedrun/" title=""&gt;take advantage of them&lt;/a&gt;?&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;Serverless is great because is has good ergonomics - when an event is received, a &amp;ldquo;not-server&amp;rdquo; boots quickly, code is run, and then everything is torn down. We&amp;rsquo;re billed only on usage.&lt;/p&gt;

&lt;p&gt;It turns out that Fly.io shares many of &lt;a href='https://fly.io/blog/the-serverless-server/' title=''&gt;the same ergonomics&lt;/a&gt; as serverless. Can we do a serverless on Fly.io? 🦆 Well, if it&amp;rsquo;s quacking like a duck, let&amp;rsquo;s call it a mallard.&lt;/p&gt;

&lt;p&gt;Here&amp;rsquo;s a useful pattern for triggering our own not-servers with Fly Machines.&lt;/p&gt;
&lt;h2 id='triggering-machines' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#triggering-machines' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Triggering Machines&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;I want to make Machines do some work based on my own events. Fly.io can already &lt;a href='https://fly.io/docs/apps/autostart-stop/' title=''&gt;stop Machines when idle&lt;/a&gt; based on HTTP, so let&amp;rsquo;s concentrate on non-HTTP events.&lt;/p&gt;

&lt;p&gt;The process of running evented Machines involves:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Listening for events
&lt;/li&gt;&lt;li&gt;Spinning up Fly Machines to run our code (with the events as context)
&lt;/li&gt;&lt;li&gt;Having event-aware code to run
&lt;/li&gt;&lt;/ol&gt;

&lt;p&gt;To do this, I made a project and named it &lt;a href='https://github.com/fly-apps/lambdo' title=''&gt;&lt;strong class='font-semibold text-navy-950'&gt;Lambdo&lt;/strong&gt;&lt;/a&gt; because reasons.
You can consider this project &amp;ldquo;reference architecture&amp;rdquo; in the same way you call a toddler&amp;rsquo;s scribbling &amp;ldquo;art&amp;rdquo;.&lt;/p&gt;

&lt;p&gt;The goal is to run some of our code on a fresh not-server when an event is received. We want this done efficiently - a Machine should only exist long enough to process an event or 3.&lt;/p&gt;

&lt;p&gt;Lambdo does just that - it receives some events, and spins up Fly Machines with those events placed &lt;em&gt;inside&lt;/em&gt; the VMs. Once the code finishes, the Machine is destroyed.&lt;/p&gt;
&lt;div class='group relative min-w-0 bg-white shadow-md shadow-navy-500/10 rounded-xl mb-7 ring-1 ring-navy-300/40'&gt;&lt;button type='button' class='bubble-wrap z-20 absolute right-2.5 top-2.5 text-transparent group-hover:text-navy-950 hocus:text-violet-600 bg-transparent group-hover:bg-white hocus:bg-violet-200/40 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none' data-wrap-target='#table-r8vpa1ai' data-wrap-type='nowrap'&gt;&lt;svg class='w-5 h-5 pointer-events-none' viewBox='0 0 20 20' fill='none' stroke='currentColor' stroke-width='1.5' stroke-linecap='round' stroke-linejoin='round'&gt;&lt;g buffered-rendering='static'&gt;&lt;path d='M11.912 10.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.314 2.314 0 00-2.315-2.31H4.959M15.187 14.5H4.959M8.802 10H4.959' /&gt;&lt;path d='M13.081 8.466l-1.548 1.571 1.548 1.571' /&gt;&lt;/g&gt;&lt;/svg&gt;&lt;span class='bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950'&gt;Wrap text&lt;/span&gt;&lt;/button&gt;&lt;div class='min-w-0 overflow-x-auto rounded-xl'&gt;&lt;table class='table-stripe table-stretch table-pad text-sm whitespace-nowrap m-0' id='table-r8vpa1ai'&gt;&lt;thead class='text-navy-950 text-left'&gt;&lt;tr&gt;
&lt;th style="text-align: center"&gt;&lt;img alt="the files are inside the computer" src="/blog/event-driven-machines/assets/files-are-inside-the-computer-cover.webp" /&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;
&lt;td style="text-align: center"&gt;The files are &lt;em&gt;in&lt;/em&gt; the computer!&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;&lt;/table&gt;&lt;/div&gt;&lt;/div&gt;&lt;h2 id='listening-for-events' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#listening-for-events' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Listening for Events&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;For our purposes, an event is just a JSON object. &lt;code&gt;{&amp;quot;any&amp;quot;: &amp;quot;object&amp;quot;, &amp;quot;will&amp;quot;: &amp;quot;do&amp;quot;}&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;We want to turn events into compute, so we need some sort of event system. I decided to use a queue.&lt;/p&gt;
&lt;h3 id='the-queue' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#the-queue' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;The Queue&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;The first thing I needed was a place to send events! I chose to use SQS, which let me continue to pretend servers don&amp;rsquo;t exist.&lt;/p&gt;

&lt;p&gt;It&amp;rsquo;s no surprise then that the first part of this project is &lt;a href='https://github.com/fly-apps/lambdo/blob/main/internal/sqs/get_events.go' title=''&gt;code that polls SQS&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;When the polling returns some non-zero number of events, it collects the SQS messages&amp;rsquo; JSON strings (and some meta data), resulting in an array of objects (a list of events).&lt;/p&gt;

&lt;p&gt;Then we send these events to some Machines.&lt;/p&gt;
&lt;h2 id='spinning-up-machines' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#spinning-up-machines' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Spinning Up Machines&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;Fly Machines are fast-booting Micro-VM&amp;rsquo;s, controlled by an &lt;a href='https://fly.io/docs/machines/working-with-machines/' title=''&gt;API&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;A feature of that API is the ability to &lt;a href='https://community.fly.io/t/machine-files/14453' title=''&gt;create files&lt;/a&gt; on a new Machine. This is how we&amp;rsquo;ll get our events into the Machine.&lt;/p&gt;

&lt;p&gt;When Lambdo creates a Machine, it places a file at &lt;code&gt;/tmp/events.json&lt;/code&gt;. Our code just needs to read that file and parse the JSON.&lt;/p&gt;
&lt;h3 id='running-our-code' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#running-our-code' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Running Our Code&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;Part of the ergonomics of Serverless is (usually) being limited to running just a function. Fly.io doesn&amp;rsquo;t really care what you run, which is to our advantage. We can choose to write discreet functions per event, or we can bring our whole &lt;a href='https://signalvnoise.com/svn3/the-majestic-monolith/' title=''&gt;Majestic Monolith&lt;/a&gt; to bear.&lt;/p&gt;

&lt;p&gt;How do we package up our code? The real answer is &amp;ldquo;however you want!&amp;rdquo;, but here&amp;rsquo;s 2 ideas.&lt;/p&gt;

&lt;p&gt;&lt;strong class='font-semibold text-navy-950'&gt;Use Your Existing Code Base&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;You can just use your existing code base. This is especially easy if you&amp;rsquo;re already deploying apps to Fly.io.&lt;/p&gt;

&lt;p&gt;All we&amp;rsquo;d need to do is add some additional code - a command perhaps (&lt;code&gt;rake&lt;/code&gt;, &lt;code&gt;artisan&lt;/code&gt;, whatever) - that sucks in that JSON, iterates over the events, and does some stuff.&lt;/p&gt;
&lt;div class="highlight-wrapper group relative php"&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-wrap-target="#code-38m30kym"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-copy-target="sibling"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight '&gt;&lt;code id="code-38m30kym"&gt;&lt;span class="nv"&gt;$events&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;json_decode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;file_get_contents&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"/tmp/events.json"&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;

&lt;span class="k"&gt;foreach&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$events&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nv"&gt;$event&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="c1"&gt;// do a thing&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;When we create an event, we&amp;rsquo;ll tell Lambdo how to run your code - more on that later.&lt;/p&gt;

&lt;p&gt;&lt;strong class='font-semibold text-navy-950'&gt;Use Lambdo&amp;rsquo;s Base Images&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This project also provides some &amp;ldquo;runtimes&amp;rdquo; (base images). This is a bit more &amp;ldquo;traditional serverless&amp;rdquo;, were you provide a function to run.&lt;/p&gt;

&lt;p&gt;Lambdo contains &lt;a href='https://github.com/fly-apps/lambdo/tree/main/runtimes' title=''&gt;two runtimes&lt;/a&gt; right now - Node and PHP. There could be more, of course, but you know&amp;hellip;lazy.&lt;/p&gt;

&lt;p&gt;The Node runtime &lt;a href='https://github.com/fly-apps/lambdo/blob/main/runtimes/js/src/index.js' title=''&gt;contains some code&lt;/a&gt; that will read the JSON payload file   (again, just an array of JSON events), and call a user-supplied JS function once per event.&lt;/p&gt;

&lt;p&gt;An &lt;a href='https://github.com/fly-apps/lambdo/tree/main/runtimes/js/sample-project' title=''&gt;example is here&lt;/a&gt; - our code just needs to export a function that does stuff to the given event:&lt;/p&gt;
&lt;div class="highlight-wrapper group relative javascript"&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-wrap-target="#code-wh6y1w2p"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-copy-target="sibling"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight '&gt;&lt;code id="code-wh6y1w2p"&gt;&lt;span class="c1"&gt;// File /app/index.js&lt;/span&gt;
&lt;span class="nx"&gt;exports&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;handler&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Let's process an event! The event:&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;The &lt;a href='https://github.com/fly-apps/lambdo/tree/main/runtimes/php' title=''&gt;PHP runtime&lt;/a&gt; is the same idea, a user-supplied handler looks like this:&lt;/p&gt;
&lt;div class="highlight-wrapper group relative php"&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-wrap-target="#code-jhm1hg3i"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-copy-target="sibling"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight '&gt;&lt;code id="code-jhm1hg3i"&gt;&lt;span class="c1"&gt;// File /app/index.php&lt;/span&gt;
&lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;function&lt;/span&gt; &lt;span class="n"&gt;function&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;array&lt;/span&gt; &lt;span class="nv"&gt;$event&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;// Do something with $event&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Explore the &lt;a href='https://github.com/fly-apps/lambdo/tree/main/runtimes' title=''&gt;runtime&lt;/a&gt; directory of the project to see how that&amp;rsquo;s put together.&lt;/p&gt;
&lt;h2 id='sending-an-event' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#sending-an-event' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Sending an Event&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;Since our events are sent via SQS queue, it would be helpful to see an example SQS message. Remember how I mentioned the SQS message has some meta data?&lt;/p&gt;

&lt;p&gt;Here&amp;rsquo;s an example, with said meta data:&lt;/p&gt;
&lt;div class="highlight-wrapper group relative bash"&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-wrap-target="#code-cgzk0n59"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-copy-target="sibling"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight '&gt;&lt;code id="code-cgzk0n59"&gt;aws sqs send-message &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--queue-url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;https://sqs.&amp;lt;region&amp;gt;.amazonaws.com/&amp;lt;account&amp;gt;/&amp;lt;queue&amp;gt; &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--message-body&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'{"foo": "bar"}'&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--message-attributes&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'{
"size":{"DataType":"String","StringValue":"performance-2x"}, 
"image":{"DataType":"String","StringValue":"fideloper/lambdo-php-sample:latest"}
}'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;The Body field of the SQS message is assumed to be a JSON string (it&amp;rsquo;s the event itself, and its contents are arbitrary - whatever makes sense for you).&lt;/p&gt;

&lt;p&gt;The message Attributes contains the meta data - up to 3 important details:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;code&gt;image&lt;/code&gt;: The image to run (it might be a Docker Hub image, or something you pushed to registry.fly.io). This is &lt;strong class='font-semibold text-navy-950'&gt;required&lt;/strong&gt;.
&lt;/li&gt;&lt;li&gt;&lt;code&gt;size&lt;/code&gt;: The CPU size and type to use† - defaults to &lt;code&gt;performance-2x&lt;/code&gt;
&lt;/li&gt;&lt;li&gt;&lt;code&gt;command&lt;/code&gt;: The command to run, which is the Docker &lt;code&gt;CMD&lt;/code&gt; equivalent - defaults to whatever your &lt;code&gt;CMD&lt;/code&gt; is set in the &lt;code&gt;Dockerfile&lt;/code&gt; used to create the Machine image.††
&lt;/li&gt;&lt;/ol&gt;

&lt;p&gt;†You can get valid values for the &lt;code&gt;size&lt;/code&gt; option by running &lt;code&gt;fly platform vm-sizes&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;††It&amp;rsquo;s an array form, e.g. &lt;code&gt;[&amp;quot;php&amp;quot;, &amp;quot;artisan&amp;quot;, &amp;quot;foo&amp;quot;]&lt;/code&gt;, you may need to do some escaping of double quotes if you&amp;rsquo;re sending messages to SQS via terminal.&lt;/p&gt;
&lt;h2 id='we-did-a-lambda' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#we-did-a-lambda' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;We did a Lambda?&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;Fly.io isn&amp;rsquo;t serverless, but it has all these primitives that add up to serverless. You have events, Fly.io has fast-booting VM&amp;rsquo;s. They just make sense together!&lt;/p&gt;

&lt;p&gt;What we did here is use &lt;a href='https://github.com/fly-apps/lambdo' title=''&gt;&lt;strong class='font-semibold text-navy-950'&gt;Lambdo&lt;/strong&gt; to respond to events by spinning up a Machine&lt;/a&gt;. Our code can process those events any way we want.&lt;/p&gt;

&lt;p&gt;What I like about this approach is how flexible it can be. We can choose the base image to use and the server type (even using GPU-enabled Machines) &lt;em&gt;per event&lt;/em&gt;.
Since we have full control over the Machine VM&amp;rsquo;s responding to the events, we can do whatever we want inside of them. Pretty neat!&lt;/p&gt;</content>
    </entry>
    <entry>
        <title>Delegating tasks to Fly Machines</title>
        <link rel="alternate" href="https://fly.io/blog/delegate-tasks-to-fly-machines/"/>
        <id>https://fly.io/blog/delegate-tasks-to-fly-machines/</id>
        <published>2024-02-01T00:00:00+00:00</published>
        <updated>2025-05-20T16:26:25+00:00</updated>
        <media:thumbnail url="https://fly.io/blog/delegate-tasks-to-fly-machines/assets/delegate-tasks-to-fly-machines-thumb.webp"/>
        <content type="html">&lt;div class="lead"&gt;&lt;p&gt;We’re Fly.io. We run apps for our users on hardware we host around the world. Leveraging Fly.io Machines and Fly.io’s private network can make delegating expensive tasks a breeze. It’s easy to &lt;a href="/docs/speedrun/" title=""&gt;get started&lt;/a&gt;!&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;There are many ways to delegate work in web applications, from using background workers to serverless architecture. In this article, we explore a new machine pattern that takes advantage of Fly Machines and distinct process groups to make quick work of resource-intensive tasks.&lt;/p&gt;
&lt;h2 id='the-problem' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#the-problem' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;The Problem&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;Let&amp;rsquo;s say you&amp;rsquo;re building a web application that has a few tasks that demand a hefty amount of memory or CPU juice. Resizing images, for example, can require a shocking amount of memory, but you might not need that much memory &lt;em&gt;all&lt;/em&gt; of the time, for handling most of your web requests. Why pay for all that horsepower when you don&amp;rsquo;t need it most of the time?&lt;/p&gt;

&lt;p&gt;What if there&amp;rsquo;s a different way to delegate these resource-intensive tasks?&lt;/p&gt;
&lt;h2 id='the-solution' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#the-solution' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;The Solution&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;What if you could simply delegate these types of tasks to a more powerful machine &lt;em&gt;only&lt;/em&gt; when necessary? Let&amp;rsquo;s build an example of this method in a sample app. We&amp;rsquo;ll be using Next.js today, but this pattern is framework (and language) agnostic.&lt;/p&gt;

&lt;p&gt;Here&amp;rsquo;s how it will work:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A request hits an endpoint that does some resource-intensive tasks
&lt;/li&gt;&lt;li&gt;The request is passed on to a copy of your app that&amp;rsquo;s running on a more beefy machine
&lt;/li&gt;&lt;li&gt;The beefy machine performs the intensive work and then hands the result back to the user via the &amp;ldquo;weaker&amp;rdquo; machine.
&lt;/li&gt;&lt;/ul&gt;

&lt;p&gt;&lt;img alt="(A 3 panel comic of two characters, one small and one big and strong, both with computer screens for heads. Panel 1: Little guy hands the big guy a jar of pickles. Panel 2: Big guy opens the pickle jar. Panel 3: Big guy hands back the opened jar to the little guy, who is pleased; Illustration by Annie Sexton)" src="/blog/delegate-tasks-to-fly-machines/assets/./3-panel-comic-delegate-tasks-to-fly-machines.webp" /&gt;&lt;/p&gt;

&lt;p&gt;To demonstrate this task-delegation pattern, we&amp;rsquo;re going to start with a single-page application that looks like this:&lt;/p&gt;

&lt;p&gt;&lt;img alt="(Screenshot of the demo app; its a single-page app with the header and description &amp;quot;Open Pickle Jar: You&amp;#39;ve got a jar of pickles (a zip file of some high-def pickle photos) that you would like to open (resize and display below)&amp;quot;. Under the description there are two inputs, one for width and one for height, and a button that says &amp;quot;Open pickle jar&amp;quot;)" src="/blog/delegate-tasks-to-fly-machines/assets/./pickle-jar-screenshot.webp" /&gt;&lt;/p&gt;

&lt;p&gt;Our &amp;ldquo;Open Pickle Jar&amp;rdquo; app is quite simple: you provide the width and height and it goes off and resizes some high-resolution photos to those dimensions (exciting!).&lt;/p&gt;

&lt;p&gt;If you&amp;rsquo;d like to follow along, you can clone the &lt;code&gt;start-here&lt;/code&gt; branch of this repository: &lt;a href='https://github.com/fly-apps/open-pickle-jar' title=''&gt;https://github.com/fly-apps/open-pickle-jar&lt;/a&gt; . The final changes are visible on the &lt;code&gt;main&lt;/code&gt; branch. This app uses S3 for image storage, so you&amp;rsquo;ll need to create a bucket called &lt;code&gt;open-pickle-jar&lt;/code&gt; and provide &lt;code&gt;AWS_REGION&lt;/code&gt;,  &lt;code&gt;AWS_ACCESS_KEY_ID&lt;/code&gt;, and  &lt;code&gt;AWS_SECRET_ACCESS_KEY&lt;/code&gt; as environment variables.&lt;/p&gt;

&lt;p&gt;This task is really just a stand-in for any HTTP request that kicks off a resource-intensive task. Get the request from the user, delegate it to a more powerful machine, and then return the result to the user. It&amp;rsquo;s what happens when you can&amp;rsquo;t open a pickle jar, and you ask for someone to help.&lt;/p&gt;

&lt;p&gt;Before we start, let&amp;rsquo;s define some terms and what they mean on Fly.io:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong class='font-semibold text-navy-950'&gt;Machines:&lt;/strong&gt; Extremely fast-booting VMs. They can exist in different regions and even run different processes.
&lt;/li&gt;&lt;li&gt;&lt;strong class='font-semibold text-navy-950'&gt;App:&lt;/strong&gt; An abstraction for a group of Machines running your code on Fly.io, along with the configuration, provisioned resources, and data we need to keep track of to run and route to your Machines.
&lt;/li&gt;&lt;li&gt;&lt;strong class='font-semibold text-navy-950'&gt;Process group:&lt;/strong&gt; A collection of Machines running a specific process. Many apps only run a single process (typically a public-facing HTTP server), but you can define any number of them.
&lt;/li&gt;&lt;li&gt;&lt;strong class='font-semibold text-navy-950'&gt;fly.toml:&lt;/strong&gt; A configuration file for deploying apps on Fly.io where you can set things like Machine specs, process groups, regions, and more.
&lt;/li&gt;&lt;/ul&gt;

&lt;hr&gt;

&lt;p&gt;&lt;strong class='font-semibold text-navy-950'&gt;Setup Overview&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Here&amp;rsquo;s what we&amp;rsquo;ll need for our application:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;A &lt;strong class='font-semibold text-navy-950'&gt;route&lt;/strong&gt; that performs our resource-intensive task
&lt;/li&gt;&lt;li&gt;A &lt;strong class='font-semibold text-navy-950'&gt;wrapper function&lt;/strong&gt; that either:

&lt;ol&gt;
&lt;li&gt;Runs our resource-intensive task OR
&lt;/li&gt;&lt;li&gt;Forwards the request to our more powerful Machine
&lt;/li&gt;&lt;/ol&gt;
&lt;/li&gt;&lt;li&gt;&lt;strong class='font-semibold text-navy-950'&gt;Two process groups&lt;/strong&gt; running the &lt;em&gt;same process&lt;/em&gt; but with differing Machine specs:

&lt;ol&gt;
&lt;li&gt;One for accepting HTTP traffic and handling most requests (let&amp;rsquo;s call it &lt;code&gt;web&lt;/code&gt;)
&lt;/li&gt;&lt;li&gt;One internal-only group for doing the heavy lifting (let&amp;rsquo;s call it &lt;code&gt;worker&lt;/code&gt;)
&lt;/li&gt;&lt;/ol&gt;
&lt;/li&gt;&lt;/ol&gt;

&lt;p&gt;In short, this is what our architecture will look like, a standard web and worker duo.&lt;/p&gt;

&lt;p&gt;&lt;img alt="(A simple graphic illustrating two servers; a small box containing &amp;quot;npm run start&amp;quot; and a larger box containing the same thing. The small is labeled &amp;quot;web&amp;quot; and the larger box is labeled &amp;quot;worker&amp;quot;.)" src="/blog/delegate-tasks-to-fly-machines/assets/./web-worker.webp" /&gt;&lt;/p&gt;
&lt;h3 id='creating-our-route' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#creating-our-route' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Creating our route&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;Next.js has two distinct routing patterns: Pages and App router. We&amp;rsquo;ll use the App router in our example since it&amp;rsquo;s the preferred method moving forward.&lt;/p&gt;

&lt;p&gt;Under your &lt;code&gt;/app&lt;/code&gt; directory, create a new folder called &lt;code&gt;/open-pickle-jar&lt;/code&gt; containing a &lt;code&gt;route.ts&lt;/code&gt; .&lt;/p&gt;

&lt;p&gt;(We&amp;rsquo;re using TypeScript here, but feel free to use normal JavaScript if you prefer!)&lt;/p&gt;
&lt;div class="highlight-wrapper group relative "&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-wrap-target="#code-ujx4kwap"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-copy-target="sibling"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight '&gt;&lt;code id="code-ujx4kwap"&gt;...
/app
  /open-pickle-jar
    route.ts
...
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Inside &lt;code&gt;route.ts&lt;/code&gt; we&amp;rsquo;ll flesh out our endpoint:&lt;/p&gt;
&lt;div class="highlight-wrapper group relative typescript"&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-wrap-target="#code-j1a8qd9i"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-copy-target="sibling"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight '&gt;&lt;code id="code-j1a8qd9i"&gt;&lt;span class="c1"&gt;// /app/open-pickle-jar/route.ts&lt;/span&gt;

&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="nx"&gt;delegateToWorker&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;@/utils/delegateToWorker&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;NextRequest&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;NextResponse&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;next/server&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;openPickleJar&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;../openPickleJar&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nx"&gt;POST&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;request&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;NextRequest&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;width&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;height&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;json&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;path&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;nextUrl&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;pathname&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;body&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;delegateToWorker&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;openPickleJar&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;width&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;height&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;NextResponse&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;json&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;body&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;The function &lt;code&gt;openPickleJar&lt;/code&gt; that we&amp;rsquo;re importing contains our resource-intensive task, which in this case is extracting images from a &lt;code&gt;.zip&lt;/code&gt; file, resizing them all to the new dimensions, and returning the new image URLs.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;POST&lt;/code&gt; function is how one define routes for specific HTTP methods in Next.js, and ours implements a function  &lt;code&gt;delegateToWorker&lt;/code&gt; that accepts the path of the current endpoint (&lt;code&gt;/open-pickle-jar&lt;/code&gt;) our resource-intensive function, and the same request parameters. This function doesn&amp;rsquo;t yet exist, so let&amp;rsquo;s build that next!&lt;/p&gt;
&lt;h3 id='creating-our-wrapper-function' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#creating-our-wrapper-function' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Creating our wrapper function&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;Now that we&amp;rsquo;ve set up our endpoint, let&amp;rsquo;s flesh out the wrapper function that delegates our request to a more powerful machine.&lt;/p&gt;

&lt;p&gt;We haven&amp;rsquo;t defined our process groups just yet, but if you recall, the plan is to have two:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;code&gt;web&lt;/code&gt; - Our standard web server
&lt;/li&gt;&lt;li&gt;&lt;code&gt;worker&lt;/code&gt; - For opening pickle jars (e.g. doing resource-intensive work). It&amp;rsquo;s essentially a duplicate of &lt;code&gt;web&lt;/code&gt;, but running on beefier Machines.
&lt;/li&gt;&lt;/ol&gt;

&lt;p&gt;Here&amp;rsquo;s what we want this wrapper function to do:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;If the current machine is a &lt;code&gt;worker&lt;/code&gt; , proceed to execute the resource-intensive task
&lt;/li&gt;&lt;li&gt;If the current machine is NOT a &lt;code&gt;worker&lt;/code&gt; , make a new request to the identical endpoint on a &lt;code&gt;worker&lt;/code&gt; Machine
&lt;/li&gt;&lt;/ul&gt;

&lt;p&gt;Inside your &lt;code&gt;/utils&lt;/code&gt; directory, create a file called &lt;code&gt;delegateToWorker.ts&lt;/code&gt; with the following content:&lt;/p&gt;
&lt;div class="highlight-wrapper group relative typescript"&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-wrap-target="#code-sc0e1it9"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-copy-target="sibling"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight '&gt;&lt;code id="code-sc0e1it9"&gt;&lt;span class="c1"&gt;// /utils/delegateToWorker.ts&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="k"&gt;default&lt;/span&gt; &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nx"&gt;delegateToWorker&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;path&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;func&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;(...&lt;/span&gt;&lt;span class="nx"&gt;args&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;any&lt;/span&gt;&lt;span class="p"&gt;[])&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;Promise&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kr"&gt;any&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;args&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;object&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nb"&gt;Promise&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kr"&gt;any&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;

  &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;FLY_PROCESS_GROUP&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;worker&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;running on the worker...&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;func&lt;/span&gt;&lt;span class="p"&gt;({...&lt;/span&gt;&lt;span class="nx"&gt;args&lt;/span&gt;&lt;span class="p"&gt;});&lt;/span&gt;

  &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;sending new request to worker...&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;workerHost&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;NODE_ENV&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;development&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;localhost:3001&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;`worker.process.&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;FLY_APP_NAME&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;.internal:3000`&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;fetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`http://&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;workerHost&lt;/span&gt;&lt;span class="p"&gt;}${&lt;/span&gt;&lt;span class="nx"&gt;path&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;method&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;POST&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Content-Type&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;application/json&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;
      &lt;span class="p"&gt;},&lt;/span&gt;
      &lt;span class="na"&gt;body&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;stringify&lt;/span&gt;&lt;span class="p"&gt;({...&lt;/span&gt;&lt;span class="nx"&gt;args&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;json&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;In our &lt;code&gt;else&lt;/code&gt; section, you&amp;rsquo;ll notice that while developing locally (aka, when &lt;code&gt;NODE_ENV&lt;/code&gt; is &lt;code&gt;development&lt;/code&gt;) we define the hostname of our &lt;code&gt;worker&lt;/code&gt; process to be &lt;code&gt;localhost:3001&lt;/code&gt;. Typically Next.js apps run on port &lt;code&gt;3000&lt;/code&gt;, so while testing our app locally, we can have two instances of our process running in different terminal shells:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;npm run dev&lt;/code&gt; - This will run on &lt;code&gt;localhost:3000&lt;/code&gt; and will act as our local &lt;code&gt;web&lt;/code&gt; process
&lt;/li&gt;&lt;li&gt;&lt;code&gt;FLY_PROCESS_GROUP=worker npm run dev&lt;/code&gt; - This will run on &lt;code&gt;localhost:3001&lt;/code&gt; and will act as our &lt;code&gt;worker&lt;/code&gt; process (Next.js should auto-increment the port if the original &lt;code&gt;3000&lt;/code&gt; is already in use)
&lt;/li&gt;&lt;/ul&gt;

&lt;p&gt;Also, if you&amp;rsquo;re wondering about the &lt;code&gt;FLY_PROCESS_GROUP&lt;/code&gt; and &lt;code&gt;FLY_APP_NAME&lt;/code&gt; constants, these are &lt;a href='https://fly.io/docs/reference/runtime-environment/' title=''&gt;Fly.io-specific runtime environment variables&lt;/a&gt; available on all apps.&lt;/p&gt;
&lt;h3 id='accessing-our-worker-machines-internal' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#accessing-our-worker-machines-internal' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Accessing our &lt;code&gt;worker&lt;/code&gt; Machines (&lt;code&gt;.internal&lt;/code&gt;)&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;Now, when this code is running in production (aka &lt;code&gt;NODE_ENV&lt;/code&gt; is NOT &lt;code&gt;development&lt;/code&gt;) you&amp;rsquo;ll see that we&amp;rsquo;re using a unique hostname to access our &lt;code&gt;worker&lt;/code&gt; Machine.&lt;/p&gt;

&lt;p&gt;Apps belonging to the same organization on Fly.io are provided a number of &lt;a href='https://fly.io/docs/networking/private-networking/#fly-io-internal-addresses' title=''&gt;internal addresses&lt;/a&gt;. These &lt;code&gt;.internal&lt;/code&gt; addresses let you point to different Apps and Machines in your private network. For example:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;&amp;lt;region&amp;gt;.&amp;lt;app name&amp;gt;.internal&lt;/code&gt; – To reach app instances in a particular region, like &lt;code&gt;gru.my-cool-app.internal&lt;/code&gt;
&lt;/li&gt;&lt;li&gt;&lt;code&gt;&amp;lt;app instance ID&amp;gt;.&amp;lt;app name&amp;gt;.internal&lt;/code&gt; - To reach a &lt;em&gt;specific&lt;/em&gt; app instance.
&lt;/li&gt;&lt;li&gt;&lt;code&gt;&amp;lt;process group&amp;gt;.process.&amp;lt;app name&amp;gt;.internal&lt;/code&gt; - To target app instances belonging to a specific process group. &lt;strong class='font-semibold text-navy-950'&gt;This is what we&amp;rsquo;re using in our app.&lt;/strong&gt;
&lt;/li&gt;&lt;/ul&gt;

&lt;p&gt;Since our &lt;code&gt;worker&lt;/code&gt; process group is running the same process as our &lt;code&gt;web&lt;/code&gt; process (in our case, &lt;code&gt;npm run start&lt;/code&gt;), we&amp;rsquo;ll also need to make sure we use the same internal port (&lt;code&gt;3000&lt;/code&gt;).&lt;/p&gt;
&lt;h3 id='defining-our-process-groups-and-machines' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#defining-our-process-groups-and-machines' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Defining our process groups and Machines&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;The last thing to do will be to define our two process groups and their respective Machine specs. We&amp;rsquo;ll do this by editing our &lt;code&gt;fly.toml&lt;/code&gt; configuration.&lt;/p&gt;

&lt;p&gt;If you don&amp;rsquo;t have this file, go ahead and create a blank one and use the content below, but replace &lt;code&gt;app = open-pickle-jar&lt;/code&gt; with your app&amp;rsquo;s name, as well as your preferred &lt;code&gt;primary_region&lt;/code&gt;.  If you don&amp;rsquo;t know what region you&amp;rsquo;d like to deploy to, &lt;a href='https://fly.io/docs/reference/regions/' title=''&gt;here&amp;rsquo;s the list of them&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong class='font-semibold text-navy-950'&gt;Before you deploy:&lt;/strong&gt; Note that deploying this example app will spin up &lt;strong class='font-semibold text-navy-950'&gt;billable&lt;/strong&gt; machines. Please feel free to alter the Machine (&lt;code&gt;[[vm]]&lt;/code&gt;) specs listed here to ones that suit your budget or app&amp;rsquo;s needs.&lt;/p&gt;
&lt;div class="highlight-wrapper group relative "&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-wrap-target="#code-lcp533re"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-copy-target="sibling"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight '&gt;&lt;code id="code-lcp533re"&gt;app = "open-pickle-jar"
primary_region = "sea"

[build]

[processes]
  web = "npm run start"
  worker = "npm run start"

[http_service]
  internal_port = 3000
  force_https = true
  auto_stop_machines = true
  auto_start_machines = true
  min_machines_running = 1
  processes = ["web"]

[[vm]]
  cpu_kind = "shared"
  cpus = 1
  memory_mb = 1024
  processes = ["web"]

[[vm]]
  size = "performance-4x"
  processes = ["worker"]
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;And that&amp;rsquo;s it! With our &lt;code&gt;fly.toml&lt;/code&gt; finished, we&amp;rsquo;re ready to deploy our app!&lt;/p&gt;

&lt;p&gt;&lt;img src="https://slabstatic.com/prod/uploads/p1b436gf/posts/images/tH4GaGLVaDkh3RhIwCpiDRX3.png" /&gt;&lt;/p&gt;
&lt;h2 id='discussion' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#discussion' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Discussion&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;Today we built a machine pattern on top of Fly.io. This pattern allows us to have a lighter request server that can delegate certain tasks to a stronger server, meaning that we can have one Machine do all the heavy lifting that could block everything else while the other handles all the simple tasks for users. With this in mind, this is a fairly naïve implementation, and we can make this much better:&lt;/p&gt;
&lt;h3 id='using-a-queue-for-better-resiliency' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#using-a-queue-for-better-resiliency' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Using a queue for better resiliency&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;In its current state, our code isn&amp;rsquo;t very resilient to failed requests. For this reason, you may want to consider keeping track of jobs in a queue with Redis (similar to Sidekiq in Ruby-land). When you have work you want to do, put it in the queue. Your queue worker would have to write the result somewhere (e.g., in Redis) that the application could fetch when it&amp;rsquo;s ready.&lt;/p&gt;
&lt;h3 id='starting-stopping-worker-machines' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#starting-stopping-worker-machines' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Starting/stopping worker Machines&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;The benefit of this pattern is that you can limit how many &amp;ldquo;beefy&amp;rdquo; Machines you need to have available at any given time. Our demo app doesn&amp;rsquo;t dictate how many &lt;code&gt;worker&lt;/code&gt; Machines to have at any given time, but by adding timeouts you could elect to start and stop them as needed.&lt;/p&gt;

&lt;p&gt;Now, you may think that constantly starting and stopping Machines might incur higher response times, but note that we are NOT talking about creating/destroying Machines. Starting and stopping Machines only takes as long as it takes to start your web server (i.e. &lt;code&gt;npm run start&lt;/code&gt;). The best part is that &lt;strong class='font-semibold text-navy-950'&gt;Fly.io does not charge for the CPU and RAM usage of stopped Machines.&lt;/strong&gt; &lt;a href='https://community.fly.io/t/we-are-going-to-start-collecting-charges-for-stopped-machines-rootfs-starting-april-25th/17825' title=''&gt;We will charge for storage of their root filesystems on disk, starting April 25th, 2024&lt;/a&gt;. Stopped Machines will still be much cheaper than running ones.&lt;/p&gt;
&lt;h3 id='what-about-serverless-functions' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#what-about-serverless-functions' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;What about serverless functions?&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;This &amp;ldquo;delegate to a beefy machine&amp;rdquo; pattern is similar to serverless functions with platforms like AWS Lambda. The main difference is that serverless functions usually require you to segment your application into a bunch of small pieces, whereas the method discussed today just uses the app framework that you deploy to production. Each pattern has its own benefits and downsides.&lt;/p&gt;
&lt;h2 id='conclusion' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#conclusion' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Conclusion&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;The pattern outlined here is one more tool in your arsenal for scaling applications. By utilizing Fly.io&amp;rsquo;s private network and &lt;code&gt;.internal&lt;/code&gt; domains, it&amp;rsquo;s quick and easy to pass work between different processes that run our app. If you&amp;rsquo;d like to learn about more methods for scaling tasks in your applications, check out &lt;a href='https://fly.io/blog/rethinking-serverless-with-flame/' title=''&gt;Rethinking Serverless with FLAME&lt;/a&gt; by Chris McCord and &lt;a href='https://fly.io/blog/print-on-demand/' title=''&gt;Print on Demand&lt;/a&gt; by Sam Ruby.&lt;/p&gt;
&lt;figure class="post-cta"&gt;
  &lt;figcaption&gt;
    &lt;h1&gt;Get more done on Fly.io&lt;/h1&gt;
    &lt;p&gt;Fly.io has fast booting machines at the ready for your dynamic workloads. It&amp;rsquo;s easy to get started. You can be off and running in minutes.&lt;/p&gt;
      &lt;a class="btn btn-lg" href="https://fly.io/docs/speedrun/"&gt;
        Deploy something today!  &lt;span class='opacity:50'&gt;→&lt;/span&gt;
      &lt;/a&gt;
  &lt;/figcaption&gt;
  &lt;div class="image-container"&gt;
    &lt;img src="/static/images/cta-rabbit.webp" srcset="/static/images/[email protected] 2x" alt=""&gt;
  &lt;/div&gt;
&lt;/figure&gt;</content>
    </entry>
    <entry>
        <title>Macaroons Escalated Quickly</title>
        <link rel="alternate" href="https://fly.io/blog/macaroons-escalated-quickly/"/>
        <id>https://fly.io/blog/macaroons-escalated-quickly/</id>
        <published>2024-01-31T00:00:00+00:00</published>
        <updated>2025-05-20T16:26:25+00:00</updated>
        <media:thumbnail url="https://fly.io/blog/macaroons-escalated-quickly/assets/evil-cookies-thumb.webp"/>
        <content type="html">&lt;div class="lead"&gt;&lt;p&gt;We’re Fly.io and we transmute containers into VMs, running them on our hardware around the world. We built a new security token system, and can I tell you the good news about our lord and savior the Macaroon?&lt;/p&gt;
&lt;/div&gt;&lt;h2 id='1' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#1' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;1&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;Let&amp;rsquo;s implement an API token together. It&amp;rsquo;s a design called &amp;ldquo;Macaroons&amp;rdquo;, but don&amp;rsquo;t get hung up on that yet.&lt;/p&gt;

&lt;p&gt;First some &lt;button toggle="#includes"&gt;throat-clearing&lt;/button&gt;. Then:&lt;/p&gt;
&lt;div id="includes" toggle-content="" aria-label="show very boring code"&gt;&lt;div class="highlight-wrapper group relative python"&gt;
  &lt;button type="button" class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-wrap-target="#code-i4dzjfqm"&gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959"&gt;&lt;/path&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571"&gt;&lt;/path&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button type="button" class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-copy-target="sibling"&gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z"&gt;&lt;/path&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617"&gt;&lt;/path&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class="highlight relative group"&gt;
    &lt;pre class="highlight "&gt;&lt;code id="code-i4dzjfqm"&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;sys&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;os&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;json&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;hmac&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;hm&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;base64&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;b64encode&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;b64decode&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;hashlib&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;sha256&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;hmac&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;v&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;hm&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;new&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;v&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;sha256&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="n"&gt;digest&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;enc&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;b64encode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;dec&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;b64decode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;&lt;/div&gt;&lt;div class="highlight-wrapper group relative python"&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-wrap-target="#code-b0hki89a"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-copy-target="sibling"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight '&gt;&lt;code id="code-b0hki89a"&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;blank_token&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;uid&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
  &lt;span class="n"&gt;nonce&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;enc&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;":"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;join&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;uid&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;urandom&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;16&lt;/span&gt;&lt;span class="p"&gt;)]))&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="n"&gt;nonce&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;enc&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;hmac&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;nonce&lt;/span&gt;&lt;span class="p"&gt;))])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;&lt;div class="right-sidenote"&gt;&lt;p&gt;Bearer tokens: like cookies, blobs you attach to a request (usually in an HTTP header).&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;We&amp;rsquo;re going to build a minimally-stateful bearer token, a blob signed with HMAC. Nothing fancy so far. &lt;a href='https://api.rubyonrails.org/classes/ActiveSupport/MessageVerifier.html' title=''&gt;Rails has done this&lt;/a&gt; for a decade and a half.&lt;/p&gt;

&lt;p&gt;There&amp;rsquo;s a &lt;a href='http://cryto.net/~joepie91/blog/2016/06/13/stop-using-jwt-for-sessions/' title=''&gt;fashion in API security for stateless tokens&lt;/a&gt;, which encode all the data you&amp;rsquo;d need to check any request accompanied by that token – without a database lookup. Stateless tokens have some nice properties, and some less-nice. Our tokens won&amp;rsquo;t be stateless: they carry a user ID, with which we&amp;rsquo;ll look up the HMAC key to verify it. But they&amp;rsquo;ll stake out a sort of middle ground.&lt;/p&gt;
&lt;div class="highlight-wrapper group relative python"&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-wrap-target="#code-v32ipl21"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-copy-target="sibling"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight '&gt;&lt;code id="code-v32ipl21"&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;attenuate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;macStr&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;cav&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;mac&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;loads&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;macStr&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;cavStr&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cav&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;oldTail&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;dec&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;mac&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
    &lt;span class="n"&gt;newTail&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;enc&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;hmac&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;oldTail&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;cavStr&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;mac&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;cavStr&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;newTail&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

&lt;span class="n"&gt;m0&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;blank_token&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;keys&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="n"&gt;m1&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;attenuate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;m0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="s"&gt;'path'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s"&gt;'/images'&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="n"&gt;m2&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;attenuate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;m1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="s"&gt;'op'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s"&gt;'read'&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Let&amp;rsquo;s add some stuff.&lt;/p&gt;

&lt;p&gt;The meat of our tokens will be a series of claims we call &amp;ldquo;caveats&amp;rdquo;. We call them that because each claim restricts further what the token authorizes. After &lt;code&gt;{&amp;#39;path&amp;#39;: &amp;#39;/images&amp;#39;}&lt;/code&gt;, this token only allows operations that happen underneath the &lt;code&gt;/images&lt;/code&gt; directory. Then, after &lt;code&gt;{&amp;#39;op&amp;#39;: &amp;#39;read&amp;#39;}&lt;/code&gt;, it allows only reads, not writes.&lt;/p&gt;

&lt;p&gt;(I guess we&amp;rsquo;re building a file sharing system. Whatever.)&lt;/p&gt;

&lt;p&gt;Some important things about things about this design. First: by implication from the fact that caveats further restrict tokens, a token with no caveats restricts nothing. It&amp;rsquo;s a god-mode token. Don&amp;rsquo;t honor it.&lt;/p&gt;
&lt;div class="right-sidenote"&gt;&lt;p&gt;In other words: the ordering of caveats doesn’t matter.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;Second: the rule of checking caveats is very simple: every single caveat must pass, evaluating &lt;code&gt;True&lt;/code&gt; against the request that carries it, in isolation and without reference to any other caveat. If any caveat evaluates &lt;code&gt;False&lt;/code&gt;, the request fails. In that way, we ensure that adding caveats to a token can only ever weaken it.&lt;/p&gt;

&lt;p&gt;With that in mind, take a closer look at this code:&lt;/p&gt;
&lt;div class="highlight-wrapper group relative python"&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-wrap-target="#code-2and4pv7"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-copy-target="sibling"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight '&gt;&lt;code id="code-2and4pv7"&gt;&lt;span class="n"&gt;oldTail&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;dec&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;mac&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="n"&gt;newTail&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;enc&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;hmac&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;oldTail&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;cavStr&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Every caveat is HMAC-signed independently, which is weird. Weirder still, the key for that HMAC is the output of the last HMAC. The caveats chain together, and the HMAC of the last caveat becomes the &amp;ldquo;tail&amp;rdquo; of the token.&lt;/p&gt;

&lt;p&gt;Creating a new blank token for a particular user requires a key that the server (and probably only the server) knows. But adding a caveat doesn&amp;rsquo;t! Anybody can add a caveat. In our design, you, the user, can edit your own API token.&lt;/p&gt;
&lt;div class="highlight-wrapper group relative python"&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-wrap-target="#code-zkheo9e4"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-copy-target="sibling"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight '&gt;&lt;code id="code-zkheo9e4"&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;verify&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;macStr&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;keys&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;mac&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;loads&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;macStr&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;nonce&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;dec&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;mac&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]).&lt;/span&gt;&lt;span class="n"&gt;split&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;":"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;key&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;keys&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;nonce&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;])]&lt;/span&gt;
    &lt;span class="n"&gt;tail&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;""&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;cav&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;mac&lt;/span&gt;&lt;span class="p"&gt;[:&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
        &lt;span class="n"&gt;tail&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;hmac&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;cav&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;key&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tail&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;hm&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;compare_digest&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tail&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;dec&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;mac&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]))&lt;/span&gt;

&lt;span class="n"&gt;verify&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;m2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;keys&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="c1"&gt;# =&amp;gt; True
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;For completeness, and to make a point, there&amp;rsquo;s the verification code. Look up the original secret key from the user ID,  and then it&amp;rsquo;s chained HMAC all the way down. The point I&amp;rsquo;m making is that Macaroons are very simple.&lt;/p&gt;
&lt;h2 id='2' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#2' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;2&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;Back in 2014, Google published &lt;a href='https://storage.googleapis.com/pub-tools-public-publication-data/pdf/41892.pdf' title=''&gt;a paper at NDSS&lt;/a&gt; introducing &amp;ldquo;Macaroons&amp;rdquo;, a new kind of cookie. Since then, they&amp;rsquo;ve become a sort of hipster shibboleth. But they&amp;rsquo;re more talked about than implemented, which is a nice way to say that practically nobody uses them.&lt;/p&gt;

&lt;p&gt;Until now! I dragged Fly.io into implementing them. Suckers!&lt;/p&gt;

&lt;p&gt;We had a problem: our API tokens were much too powerful. We needed to scope them down and let them express roles, and I scoped up that project to replace OAuth2 tokens altogether. We now have what I think is one of the more expansive Macaroon implementations on the Internet.&lt;/p&gt;

&lt;p&gt;I dragged us into using Macaroons because I wanted us to use a hipster token format. Google designed Macaroons for a bigger reason: they hoped to replace browser cookies with something much more powerful.&lt;/p&gt;

&lt;p&gt;The problem with simple bearer tokens, like browser cookies or JWTs, is that they&amp;rsquo;re prone to being stolen and replayed by attackers.&lt;/p&gt;
&lt;div class="right-sidenote"&gt;&lt;p&gt;game-over: pentest jargon for “very bad”&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;Worse, a stolen token is usually a game-over condition. In most schemes, a bearer token is an all-access pass for the associated user. For some applications this isn&amp;rsquo;t that big a deal, but then, &lt;a href='https://neilmadden.blog/2020/09/09/macaroon-access-tokens-for-oauth-part-2-transactional-auth/' title=''&gt;think about banking&lt;/a&gt;. A banking app token that authorizes arbitrary transactions is a recipe for having a small heart attack on every HTTP request.&lt;/p&gt;
&lt;div class="right-sidenote"&gt;&lt;p&gt;(Perfectly minimized API tokens: a software security holy grail)&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;Macaroons are user-editable tokens that enable JIT-generated least-privilege tokens. With minimal ceremony and no additional API requests, a banking app Macaroon lets you authorize a request with a caveat like, I don&amp;rsquo;t know, &lt;code&gt;{&amp;#39;maxAmount&amp;#39;: &amp;#39;$5&amp;#39;}&lt;/code&gt;. I mean, something way better than that, probably lots of caveats, not just one, but you get the idea: a token so minimized you feel safe sending it with your request. Ideally, a token that only authorizes that single, intended request.&lt;/p&gt;
&lt;h2 id='3' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#3' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;3&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;That&amp;rsquo;s not why we like Macaroons. We already assume our tokens aren&amp;rsquo;t being stolen.&lt;/p&gt;

&lt;p&gt;In most systems, the developers come up with a permissions system, and you’re stuck with it. We run a public cloud platform, and people want a lot of different things from our permissions. The dream is, we (the low-level platform developers on the team) design a single permission system, one time, and go about our jobs never thinking about this problem again.&lt;/p&gt;

&lt;p&gt;Instead of thinking of all of our &amp;ldquo;roles&amp;rdquo; in advance, we just model our platform with caveats:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Users belong to &lt;code&gt;Organizations&lt;/code&gt;.
&lt;/li&gt;&lt;li&gt;&lt;code&gt;Organizations&lt;/code&gt; own &lt;code&gt;Apps&lt;/code&gt;.
&lt;/li&gt;&lt;li&gt;&lt;code&gt;Apps&lt;/code&gt; contain &lt;code&gt;Machines&lt;/code&gt; and &lt;code&gt;Volumes&lt;/code&gt;.
&lt;/li&gt;&lt;li&gt;To any of these things, you can &lt;code&gt;Read&lt;/code&gt;, &lt;code&gt;Write&lt;/code&gt;, &lt;code&gt;Create&lt;/code&gt;, &lt;code&gt;Delete&lt;/code&gt;, and/or &lt;code&gt;Control&lt;/code&gt; &lt;aside class="right-sidenote"&gt;control being change of state, like “start” and “stop”&lt;/aside&gt;.
&lt;/li&gt;&lt;li&gt;Some administrivia, like expiration (&lt;code&gt;ValidityWindow&lt;/code&gt;), locking tokens to specific Fly Machines (&lt;code&gt;FromMachineSource&lt;/code&gt;), and escape hatches like &lt;code&gt;Mutation&lt;/code&gt; (for our GraphQL API).
&lt;/li&gt;&lt;/ol&gt;
&lt;div class="right-sidenote"&gt;&lt;p&gt;(this is a vibes-based notation, don’t think too hard about it)&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;Simplistic. But it expresses admin tokens:&lt;/p&gt;
&lt;div class="highlight-wrapper group relative "&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-wrap-target="#code-qulbfdw0"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-copy-target="sibling"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight '&gt;&lt;code id="code-qulbfdw0"&gt;Organization 4721, mask=*
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;And it expresses normal user tokens:&lt;/p&gt;
&lt;div class="highlight-wrapper group relative "&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-wrap-target="#code-lag6amtu"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-copy-target="sibling"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight '&gt;&lt;code id="code-lag6amtu"&gt;Organization 4721, mask=read,write,control
(App 123, mask=control), (App 345, mask=read, write, control)
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;And also an auditor-only token for that user:&lt;/p&gt;
&lt;div class="highlight-wrapper group relative "&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-wrap-target="#code-bikhlk3o"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-copy-target="sibling"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight '&gt;&lt;code id="code-bikhlk3o"&gt;Organization 4721, mask=read,write,control
(App 123, mask=control), (App 345, mask=read, write, control)
Organization 4721, mask=read
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;&lt;div class="right-sidenote"&gt;&lt;p&gt;(our deploy tokens are more complicated than this)&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;Or a deployment-only token, for a CI/CD system:&lt;/p&gt;
&lt;div class="highlight-wrapper group relative "&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-wrap-target="#code-rxeo12rj"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-copy-target="sibling"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight '&gt;&lt;code id="code-rxeo12rj"&gt;Organization 4721, mask=write,control
(App 123, mask=*)
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Those are just the roles we came up with. Users can invent others. The important thing is that they don&amp;rsquo;t have to bother me about them.&lt;/p&gt;
&lt;h2 id='4' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#4' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;4&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;Astute readers will have noticed by now that we haven&amp;rsquo;t shown any code that actually evaluates a caveat. That&amp;rsquo;s because it&amp;rsquo;s boring, and I&amp;rsquo;m too lazy to write it out. Got an &lt;code&gt;Organization&lt;/code&gt; token for &lt;code&gt;image-hosting&lt;/code&gt; that allows &lt;code&gt;Reads&lt;/code&gt;? Ok; check and make sure the incoming request is for an asset of &lt;code&gt;image-hosting&lt;/code&gt;, and that it’s a &lt;code&gt;Read&lt;/code&gt;. Whatever code you came up with, it’d be fine.&lt;/p&gt;

&lt;p&gt;These straightforward restrictions are called &amp;ldquo;first party caveats&amp;rdquo;. The first party is us, the platform. We&amp;rsquo;ve got all the information we need to check them.&lt;/p&gt;

&lt;p&gt;Let&amp;rsquo;s kit out our token format some more.&lt;/p&gt;
&lt;div class="highlight-wrapper group relative python"&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-wrap-target="#code-zcmmxwd0"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-copy-target="sibling"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight '&gt;&lt;code id="code-zcmmxwd0"&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;third_party_caveat&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ka&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tail&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;crk&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;urandom&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;16&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;ticket&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;enc&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;encrypt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ka&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
        &lt;span class="s"&gt;'crk'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;enc&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;crk&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="s"&gt;'msg'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;msg&lt;/span&gt;
    &lt;span class="p"&gt;})))&lt;/span&gt;
    &lt;span class="n"&gt;challenge&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;enc&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;encrypt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tail&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;crk&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="s"&gt;'url'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;'ticket'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;ticket&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;'challenge'&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;challenge&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="n"&gt;key&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;bytes&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"YELLOW SUBMARINE"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;url&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"https://canary.service"&lt;/span&gt;
&lt;span class="n"&gt;c3&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;third_party_caveat&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tail&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="s"&gt;'user'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s"&gt;'bobson.dugnutt'&lt;/span&gt;&lt;span class="p"&gt;}))&lt;/span&gt;
&lt;span class="n"&gt;m3&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;attenuate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;m2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;c3&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Up till now, we&amp;rsquo;ve gotten by with nothing but HMAC, which is one of the great charms of the design. Now we need to encrypt. There&amp;rsquo;s no authenticated encryption in the Python standard library, but that won&amp;rsquo;t stop us. &lt;button toggle="#hmac-ctr"&gt;Ready to make some candy? Hand me that brake fluid!&lt;/button&gt;&lt;/p&gt;
&lt;div id="hmac-ctr" toggle-content="" aria-label="show very silly code"&gt;&lt;div class="highlight-wrapper group relative python"&gt;
  &lt;button type="button" class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-wrap-target="#code-qurlq0h7"&gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959"&gt;&lt;/path&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571"&gt;&lt;/path&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button type="button" class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-copy-target="sibling"&gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z"&gt;&lt;/path&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617"&gt;&lt;/path&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class="highlight relative group"&gt;
    &lt;pre class="highlight "&gt;&lt;code id="code-qurlq0h7"&gt;&lt;span class="c1"&gt;# do i really need to say that i'm not serious about this?
&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;hmactr&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
     &lt;span class="n"&gt;ks&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;hm&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;new&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
     &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;counter&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nb"&gt;xrange&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sys&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;maxint&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
         &lt;span class="n"&gt;ks&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;update&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;counter&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
         &lt;span class="n"&gt;kbs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ks&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;digest&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
         &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nb"&gt;xrange&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;16&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="k"&gt;yield&lt;/span&gt; &lt;span class="n"&gt;kbs&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;encrypt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;buf&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;ak&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;hm&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;new&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;'auth'&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="n"&gt;digest&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;nonce&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;urandom&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;16&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;cipher&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;hmactr&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;hm&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;new&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;'enc'&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="n"&gt;digest&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="n"&gt;nonce&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;ctxt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;bytearray&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;buf&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nb"&gt;xrange&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;buf&lt;/span&gt;&lt;span class="p"&gt;)):&lt;/span&gt;
        &lt;span class="n"&gt;ctxt&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;^=&lt;/span&gt; &lt;span class="nb"&gt;ord&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cipher&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;next&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
    &lt;span class="n"&gt;res&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;nonce&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctxt&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;res&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;hm&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;new&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ak&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;res&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="n"&gt;digest&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;decrypt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;buf&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;ak&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;hm&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;new&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;'auth'&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="n"&gt;digest&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;hm&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;compare_digest&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;buf&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;16&lt;/span&gt;&lt;span class="p"&gt;:],&lt;/span&gt; &lt;span class="n"&gt;hm&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;new&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ak&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;buf&lt;/span&gt;&lt;span class="p"&gt;[:&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;16&lt;/span&gt;&lt;span class="p"&gt;]).&lt;/span&gt;&lt;span class="n"&gt;digest&lt;/span&gt;&lt;span class="p"&gt;()):&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;False&lt;/span&gt;
    &lt;span class="n"&gt;nonce&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;buf&lt;/span&gt;&lt;span class="p"&gt;[:&lt;/span&gt;&lt;span class="mi"&gt;16&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="n"&gt;cipher&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;hmactr&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;hm&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;new&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;'enc'&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="n"&gt;digest&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="n"&gt;nonce&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;ptxt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;bytearray&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;buf&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;16&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;16&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nb"&gt;xrange&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;buf&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;16&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;16&lt;/span&gt;&lt;span class="p"&gt;])):&lt;/span&gt;
        &lt;span class="n"&gt;ptxt&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;^=&lt;/span&gt; &lt;span class="nb"&gt;ord&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cipher&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;next&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ptxt&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;With &amp;ldquo;third-party&amp;rdquo; caveats comes a cast of characters. We&amp;rsquo;re still the first party. You&amp;rsquo;ll play the second party. The third party is any other system in the world that you trust: an SSO system, an audit log, a revocation checker, whatever.&lt;/p&gt;

&lt;p&gt;Here&amp;rsquo;s the trick of the third-party caveat: our platform doesn&amp;rsquo;t know what your caveat means, and it doesn&amp;rsquo;t have to. Instead, when you see a third-party caveat in your token, you tear a ticket off it and exchange it for a &amp;ldquo;discharge Macaroon&amp;rdquo; with that third party. You submit both Macaroons together to us.&lt;/p&gt;

&lt;p&gt;Let&amp;rsquo;s attenuate our token with a third-party caveat hooking it up to a &amp;ldquo;canary&amp;rdquo; service that generates a notice approximately any time the token is used.&lt;/p&gt;

&lt;p&gt;&lt;img src="/blog/macaroons-escalated-quickly/assets/third-party.png?1/2&amp;amp;wrap-left" /&gt;&lt;/p&gt;

&lt;p&gt;To build that canary caveat, you first make a &lt;code&gt;ticket&lt;/code&gt; that users of the token will hand to your canary, and then a &lt;code&gt;challenge&lt;/code&gt; that Fly.io will use to verify discharges your checker spits out. The ticket and the challenge are both encrypted. The ticket is encrypted under &lt;code&gt;KA&lt;/code&gt;, so your service can read it. The challenge is encrypted under the previous Macaroon tail, so only Fly.io can read it. Both hide yet another key, the random HMAC key &lt;code&gt;CRK&lt;/code&gt; (&amp;ldquo;caveat root key&amp;rdquo;).&lt;/p&gt;

&lt;p&gt;In addition to &lt;code&gt;CRK&lt;/code&gt;, the ticket contains a message, which says whatever you want it to; Fly.io doesn&amp;rsquo;t care. Typically, the message describes some kind of additional checking you want your service to perform before spitting out a discharge token.&lt;/p&gt;
&lt;div class="highlight-wrapper group relative python"&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-wrap-target="#code-lrt3p4uz"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-copy-target="sibling"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight '&gt;&lt;code id="code-lrt3p4uz"&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;discharge&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ka&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ticket&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;ptxt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;decrypt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ka&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;dec&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ticket&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;ptxt&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;False&lt;/span&gt;
    &lt;span class="n"&gt;tbody&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;loads&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ptxt&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="c1"&gt;# not shown: do something with tbody['msg']
&lt;/span&gt;    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="n"&gt;ticket&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;enc&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;hmac&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dec&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tbody&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;'crk'&lt;/span&gt;&lt;span class="p"&gt;]),&lt;/span&gt; &lt;span class="n"&gt;ticket&lt;/span&gt;&lt;span class="p"&gt;))])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;To authorize a request with a token that includes a third-party caveat for the canary service, you need to get your hands on a corresponding discharge Macaroon. Normally, you do that by &lt;code&gt;POST&lt;/code&gt;ing the ticket from the caveat to the service.&lt;/p&gt;

&lt;p&gt;Discharging is simple. The service, which holds &lt;code&gt;KA&lt;/code&gt;, uses it to decrypt the ticket. It checks the message and makes some decisions. Finally, it mints a new macaroon, using &lt;code&gt;CRK&lt;/code&gt;, recovered from the ticket, as the root key. The ticket itself is the nonce.&lt;/p&gt;

&lt;p&gt;If it wants, the third-party service can slap on a bunch of first-party caveats of its own. When we verify the Macaroon, we&amp;rsquo;ll copy those caveats out and enforce them. Attenuation of a third-party discharge macaroon works like a normal macaroon.&lt;/p&gt;
&lt;div class="highlight-wrapper group relative python"&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-wrap-target="#code-6nzqrw14"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-copy-target="sibling"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight '&gt;&lt;code id="code-6nzqrw14"&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;verify_third_party&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tag&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;cav&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;discharges&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[]):&lt;/span&gt;
    &lt;span class="n"&gt;crk&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;decrypt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tag&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;dec&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cav&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;'challenge'&lt;/span&gt;&lt;span class="p"&gt;]))&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;crk&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;False&lt;/span&gt;
    &lt;span class="n"&gt;discharge&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;dcs&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;discharges&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;loads&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dcs&lt;/span&gt;&lt;span class="p"&gt;)[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;cav&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;'ticket'&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
            &lt;span class="n"&gt;discharge&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;dcs&lt;/span&gt;
            &lt;span class="k"&gt;break&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;discharge&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;False&lt;/span&gt;
    &lt;span class="n"&gt;mac&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;loads&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;discharge&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;key&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;crk&lt;/span&gt;
    &lt;span class="c1"&gt;# boring old stuff ---------------------
&lt;/span&gt;    &lt;span class="n"&gt;tag&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;""&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;cav&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;mac&lt;/span&gt;&lt;span class="p"&gt;[:&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
        &lt;span class="n"&gt;tag&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;hmac&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;cav&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;key&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tag&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;hm&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;compare_digest&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tag&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;dec&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;mac&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;To verify tokens that have third-party caveats, start with the root Macaroon, walking the caveats like usual. At each third-party caveat, match the &lt;code&gt;ticket&lt;/code&gt; from the caveat with the &lt;code&gt;nonce&lt;/code&gt; on the discharge Macaroon. The key for root Macaroon decrypts the &lt;code&gt;challenge&lt;/code&gt; in the caveat, recovering &lt;code&gt;CRK&lt;/code&gt;, which cryptographically verifies the discharge.&lt;/p&gt;

&lt;p&gt;(The Macaroons paper uses different terms: “caveat identifier” or &lt;code&gt;cId&lt;/code&gt; for “ticket”, and “verification-key identifier” or &lt;code&gt;vId&lt;/code&gt; for “challenge”. These names are self-evidently bad and our contribution to the state of the art is to replace them.)&lt;/p&gt;

&lt;p&gt;There&amp;rsquo;s two big applications for third-party caveats in Popular Macaroon Thought. First, they facilitate microservice-izing your auth logic, because you can stitch arbitrary policies together out of third-party caveats. And, they seem like &lt;a href='https://github.com/go-macaroon-bakery/macaroon-bakery' title=''&gt;fertile ground for an ecosystem of interoperable Macaroon services&lt;/a&gt;: Okta and Google could stand up SSO dischargers, for instance, or someone can do a really good revocation service.&lt;/p&gt;

&lt;p&gt;Neither of these light us up. We&amp;rsquo;re allergic to microservices. As for public protocols, well, it&amp;rsquo;s good to want things. So we almost didn&amp;rsquo;t even implement third-party caveats.&lt;/p&gt;
&lt;h2 id='5' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#5' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;5&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;I&amp;rsquo;m glad we did though, because they&amp;rsquo;ve been pretty great.&lt;/p&gt;

&lt;p&gt;The first problem third-party caveats solved for us was hazmat tokens. To the extent possible, we want Macaroon tokens to be safe to transmit between users. Our Macaroons express permissions, but not authentication, so it’s almost safe to email them.&lt;/p&gt;

&lt;p&gt;The way it works is, our Macaroons all have a third-party caveat pointing to a &amp;ldquo;login service&amp;rdquo;, either identifying the proper bearer as a particular Fly.io user or as a member of some &lt;code&gt;Organization&lt;/code&gt;. To allow a request with your token, you first need to collect the discharge from the login service, which requires authentication.&lt;/p&gt;

&lt;p&gt;The login discharge is very sensitive, but there isn&amp;rsquo;t much reason to pass it around. The original permissions token is where all the interesting stuff is, and it&amp;rsquo;s not scary. So that&amp;rsquo;s nice.&lt;/p&gt;

&lt;p&gt;&lt;img src="/blog/macaroons-escalated-quickly/assets/fly-sso.png?1/3&amp;amp;wrap-left" /&gt;&lt;/p&gt;

&lt;p&gt;Ben then came up with &lt;a href="https://community.fly.io/t/organization-required-sso/17560"&gt;third-party caveats that require Google or Github SSO logins.&lt;/a&gt; If your token has one of those caveats, when you run &lt;code&gt;flyctl deploy&lt;/code&gt;, a browser will pop up to log you into your SSO IdP (if you haven’t done so recently already).&lt;/p&gt;

&lt;p&gt;We’ve put a &lt;a href='https://fly.io/blog/tokenized-tokens/#tokenizer-the-fabled-4th-way' title=''&gt;bunch of work into getting the guts of our SSO system working&lt;/a&gt;, but that work has mostly been invisible to customers. But Macaroon-ized SSO has a subtle benefit: you can configure &lt;a href='http://Fly.io' title=''&gt;Fly.io&lt;/a&gt; to automatically add SSO requirements to specific &lt;code&gt;Organizations&lt;/code&gt; (so, for instance, a dev environment might not need SSO at all, and prod might need two).&lt;/p&gt;

&lt;p&gt;SSO requirements in most applications are a brittle pain in the ass. Ours are flexible and straightforward, and that happened almost by accident. Macaroons, baby!&lt;/p&gt;

&lt;p&gt;Here&amp;rsquo;s a fun thing you can do with a Macaroon system: stand up a Slack bot, and give it an HTTP &lt;code&gt;POST&lt;/code&gt; handler that accepts third-party tickets. Then:&lt;/p&gt;

&lt;p&gt;&lt;img src="/blog/macaroons-escalated-quickly/assets/bot-ok.png?1/2&amp;amp;center&amp;amp;border" /&gt;&lt;/p&gt;

&lt;p&gt;So, the bot is cute, but any platform could do that. What’s cool is the way our platform &lt;em&gt;doesn’t&lt;/em&gt; work with Slack; in fact, nothing on our platform knows anything about Slack, and Slack doesn’t know anything about us. We didn’t reach out to a Slack endpoint. Everything was purely cryptographic.&lt;/p&gt;

&lt;p&gt;That bot could, if I sunk some time into it, enforce arbitrary rules: it could selectively add caveats for the requests it authorizes, based on lookups of the users requesting them, at specific times of day, with specific logging. Theoretically, it could add third-party caveats of its own.&lt;/p&gt;

&lt;p&gt;The win for us for third-party caveats is that they create a plugin system for our security tokens. That’s an unusual place to see a plugin interface! But Macaroons are easy to understand and keep in your head, so we&amp;rsquo;re pretty confident about the security issues.&lt;/p&gt;
&lt;h2 id='6' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#6' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;6&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;Obviously, we didn&amp;rsquo;t write our Macaroon code in Python, or with HMAC-SHA256-CTR.&lt;/p&gt;

&lt;p&gt;We landed on a primary implementation Golang (Ben subsequently wrote an Elixir implementation). Our hash is SHA256, our cipher is Chapoly. We encode in MsgPack.&lt;/p&gt;
&lt;div class="callout"&gt;&lt;p&gt;We didn’t use the pre-existing public implementation because &lt;a href="https://securitycryptographywhatever.com/2021/08/12/what-do-we-do-about-jwt-with-jonathan-rudenberg/" title=""&gt;we were warned not to&lt;/a&gt;. The Macaroon idea is simple, and it exists mostly as an academic paper, not a standard. The community that formed around building open source “standard” Macaroons decided to use untyped opaque blobs to represent caveats. We need things to be as rigidly unambiguous as they can be.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;&lt;img src="/blog/macaroons-escalated-quickly/assets/verifier-service.png?2/3&amp;amp;center" /&gt;&lt;/p&gt;

&lt;p&gt;The big strength of Macaroons as a cryptographic design — that it’s based almost entirely on HMAC — makes it a challenge to deploy. If you can verify a Macaroon, you can generate one.  We have thousands of servers. They can&amp;rsquo;t all be allowed to generate tokens.&lt;/p&gt;

&lt;p&gt;What we did instead:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;We split token checking into “verification” of token HMAC tags and “clearing” of token caveats.
&lt;/li&gt;&lt;li&gt;Verification occurs only on a physically isolated token-verification service; to verify a token’s tag, you  HTTP &lt;code&gt;POST&lt;/code&gt; the token to the verifier.
&lt;/li&gt;&lt;li&gt;Clearing of token caveats can happen anywhere. Token caveat clearing is domain-specific and subject to change; token verification is simple cryptography and  changes rarely.
&lt;/li&gt;&lt;li&gt;A token verification is cacheable. The client library for the token verifier does that, which speeds things up by exploiting the locality of token submissions.
&lt;/li&gt;&lt;li&gt;The verification service is backed by a &lt;a href='https://fly.io/docs/litefs/' title=''&gt;LiteFS-distributed SQLite database&lt;/a&gt;, so verification is fast globally — a major step forward from our legacy OAuth2 tokens, which are only fast in Ashburn, VA.
&lt;/li&gt;&lt;/ul&gt;

&lt;p&gt;&lt;img src="/blog/macaroons-escalated-quickly/assets/service-token.png?2/3&amp;amp;center" /&gt;&lt;/p&gt;

&lt;p&gt;Now buckle up, because I&amp;rsquo;m about to try to get you to care about service tokens.&lt;/p&gt;

&lt;p&gt;We operate &amp;ldquo;worker servers&amp;rdquo; all over the world to host apps for our customers. To do that, those workers need access to customer secrets, like the key to decrypt a customer volume. To retrieve those secrets, the workers have to talk to secrets management servers.&lt;/p&gt;

&lt;p&gt;We manage a lot of workers. We trust them. But we don&amp;rsquo;t trust them that much, if you get my drift. You don&amp;rsquo;t want to just leave it up to the servers to decide which secrets they can access. The blast radius of a problem with a single worker should be no greater than the apps that are supposed to run there.&lt;/p&gt;

&lt;p&gt;The gold standard for approving access to customer information is, naturally, explicit customer authorization. We almost have that with Macaroons! The first time an app runs on a worker, &lt;a href='https://fly.io/blog/carving-the-scheduler-out-of-our-orchestrator/' title=''&gt;the orchestrator code&lt;/a&gt; has a token, and it can pass that along to the secret stores.&lt;/p&gt;

&lt;p&gt;The problem is, you need that token more than once; not just when the user does a deploy, but potentially any time you restart the app or migrate it to a new worker. And you can&amp;rsquo;t just store and replay user Macaroons. They have expirations.&lt;/p&gt;
&lt;div class="right-sidenote"&gt;&lt;p&gt;This is like dropping privilege with things like pledge(2), but in a distributed system.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;So our token verification service exposes an API that transforms a user token into a “service token”, which is just the token with the authentication caveat and expiration “stripped off”.&lt;/p&gt;

&lt;p&gt;What’s cool is: components that receive service tokens can attenuate them. For instance, we could lock a token to a particular worker, or even a particular Fly Machine. Then we can expose the whole &lt;a href='https://fly.io/docs/machines/working-with-machines/' title=''&gt;Fly Machines API&lt;/a&gt; to customer VMs while keeping access traceable to specific customer tokens. Stealing the token from a Fly Machine doesn’t help you since it’s locked to that Fly Machine by a caveat attackers can’t strip.&lt;/p&gt;
&lt;h2 id='7' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#7' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;7&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;If a customer loses their tokens to an attacker, we can’t just blow that off and let the attacker keep compromising the account!&lt;/p&gt;
&lt;div class="right-sidenote"&gt;&lt;p&gt;This cancels every token derived through attenuation by that nonce.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;Every Macaroon we issue is identified by a unique nonce, and we can revoke tokens by that nonce. This is just a basic function of the token verification service we just described.&lt;/p&gt;

&lt;p&gt;We host token caches all over our fleet. Token revocation invalidates the caches. Anything with a cache checks frequently whether to invalidate. Revocation is rare, so just keeping a revocation list and invalidating caches wholesale seems fine.&lt;/p&gt;
&lt;h2 id='8' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#8' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;8&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;I get it, it&amp;rsquo;s tough to get me to shut up about Macaroons.&lt;/p&gt;

&lt;p&gt;A couple years ago, I &lt;a href='https://fly.io/blog/api-tokens-a-tedious-survey/' title=''&gt;wrote a long survey of API token designs&lt;/a&gt;, from JWTs (never!) to Biscuits. I had a &lt;a href='https://fly.io/blog/api-tokens-a-tedious-survey/#macaroons' title=''&gt;bunch to say about Macaroons&lt;/a&gt;, not all of it positive, and said we&amp;rsquo;d be plowing forward with them at Fly.io.&lt;/p&gt;

&lt;p&gt;My plan had been to follow up soon after with a deep dive on Macaroons as we planned them for Fly.io. I&amp;rsquo;m glad I didn&amp;rsquo;t do that, not just because it would&amp;rsquo;ve been embarrassing to announce a feature that took us over 2 years to launch, but also because the process of working on this with Ben Toews changed a lot of my thinking about them.&lt;/p&gt;

&lt;p&gt;I think if you asked Ben, he&amp;rsquo;d say he had mixed feelings about how much complexity we wrangled to get this launched. On the other hand: we got a lot of things out of them without trying very hard:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Security tokens you can (almost) email to your users and partners without putting your account at risk.
&lt;/li&gt;&lt;li&gt;A flexible permission system, encoded directly into the tokens, that users can drive without talking to our servers.
&lt;/li&gt;&lt;li&gt;A plugin system that users can (when we clean up the tooling) use themselves, to add things like Passkeys or two-person-approval rules or audit logging, without us getting in the middle.
&lt;/li&gt;&lt;li&gt;An SSO system that can stack different IdPs, mandate SSO login, and do that on a per-&lt;code&gt;Organization&lt;/code&gt; basis.
&lt;/li&gt;&lt;li&gt;&lt;a href='https://www.latacora.com/blog/2018/06/12/a-childs-garden/' title=''&gt;Inter-service authorization&lt;/a&gt; that is traceable back to customer actions, so our servers can&amp;rsquo;t just make up which apps they&amp;rsquo;re allowed to look at.
&lt;/li&gt;&lt;li&gt;An elegant way of exposing our own APIs to customer Fly Machines with ambient authentication, but without the &lt;a href="https://github.com/SummitRoute/imdsv2_wall_of_shame/blob/main/README.md"&gt;AWS IMDSv1 credential theft problem&lt;/a&gt;.
&lt;/li&gt;&lt;/ul&gt;

&lt;p&gt;There are downsides and warts! I&amp;rsquo;m mostly not telling you about them! Pure restrictive caveats are an awkward way to express some roles. And, blinded by my hunger to get Macaroons deployed, I spat in the face of science and used internal database IDs as our public caveat format, an act for which JP will never forgive me.&lt;/p&gt;

&lt;p&gt;If i&amp;rsquo;ve piqued your interest, &lt;a href='https://github.com/superfly/macaroon' title=''&gt;the code for this stuff is public&lt;/a&gt;, along with some more &lt;a href='https://github.com/superfly/macaroon/blob/main/macaroon-thought.md' title=''&gt;detailed technical documentation&lt;/a&gt;.&lt;/p&gt;</content>
    </entry>
    <entry>
        <title>How Yoko Li makes towns, tamagoes, and tools for local AI</title>
        <link rel="alternate" href="https://fly.io/blog/how-i-fly-yoko-li/"/>
        <id>https://fly.io/blog/how-i-fly-yoko-li/</id>
        <published>2024-01-08T00:00:00+00:00</published>
        <updated>2025-05-20T16:26:25+00:00</updated>
        <media:thumbnail url="https://fly.io/blog/how-i-fly-yoko-li/assets/chat-bird-cover-thumb.webp"/>
        <content type="html">&lt;p&gt;Hello all, and welcome to another episode of How I Fly, a series where I interview developers about what they do with technology, what they find exciting, and the unexpected things they’ve learned along the way. This time I’m talking with &lt;a href='https://twitter.com/stuffyokodraws' title=''&gt;Yoko Li&lt;/a&gt;, an investment partner at A16Z who’s also an open-source AI developer. She works on some of the most exciting AI projects in the world. I’m excited to share them with you today, with fun stories about the lessons she’s learned along the way.&lt;/p&gt;
&lt;h2 id='cool-experiments' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#cool-experiments' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Cool Experiments&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;One of Yoko’s most thought-provoking experiments is &lt;a href='https://www.convex.dev/ai-town' title=''&gt;AI Town&lt;/a&gt;, a virtual town populated by AI agents that talk with each other. It takes advantage of the randomness of AI responses to create emergent behavior. When you open it, it looks like this:&lt;/p&gt;

&lt;p&gt;&lt;img alt="A picture of the AI Town homepage, a UI showing a top-down 2D RPG view with a visible river and a tent. The UI shows a conversation with the characters Alice and Stella." src="/blog/how-i-fly-yoko-li/assets/image1.webp" /&gt;&lt;/p&gt;

&lt;p&gt;You can see the AI agents talking with each other and watch how the relationships between them form and change over time. It’s also a lot of fun to watch.&lt;/p&gt;

&lt;p&gt;One of Yoko’s other experiments is &lt;a href='https://ai-tamago.fly.dev/' title=''&gt;AI Tamago&lt;/a&gt;, a &lt;a href='https://en.wikipedia.org/wiki/Tamagotchi' title=''&gt;Tamagochi&lt;/a&gt; virtual pet implemented with a large language model instead of the state machine that we’re all used to. AI Tamago uses an unmodified version of LLaMA 2 7B to take in game state and user inputs, then it generates what happens next. Every time you interact with your pet, it feeds data to LLaMA 2 and then uses Ollama’s JSON mode to generate unexpected output.&lt;/p&gt;

&lt;p&gt;&lt;img alt="A picture of the homepage of AI Tamago, showing a virtual pet with buttons to feed the pet, play with the pet, clean the pet, discipline the pet, check pet status, and deliver medical care to the pet." src="/blog/how-i-fly-yoko-li/assets/image4.webp" /&gt;&lt;/p&gt;

&lt;p&gt;It’s all the fun of the classic Tamagochi toys from the 90’s (including the ability to randomly discipline your virtual pet) without any of the coin cell batteries or having to carry around the little egg-shaped puck.&lt;/p&gt;

&lt;p&gt;But that’s just something you can watch, not something that’s as easy to play with on your own machine. Yoko has also worked on the &lt;a href='https://github.com/ykhli/local-ai-stack' title=''&gt;Local AI Starter Kit&lt;/a&gt; that lets you go from zero to AI in minutes. It’s a collection of chains of models that let you ingest a bunch of documents, store them in a database, and then use those documents as context for a language model to generate responses. It’s everything you need to implement a “chat with a knowledge base” feature.&lt;/p&gt;
&lt;h3 id='the-dark-of-ai-experiments' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#the-dark-of-ai-experiments' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;The dark of AI experiments&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;The Local AI Starter Kit is significant because normally to do this, you need to set up billing and API keys for at least four different API providers, and then you need to write a bunch of (hopefully robust) code to tie it all together. With the Local AI Starter Kit, you can do this on your own hardware, with your own data, and your own models privately. It’s a huge step forward for democratizing access to this technology.&lt;/p&gt;

&lt;p&gt;Document search is one of my favorite usecases for AI, and it’s one of the most immediately useful ones. It’s also one of the most fiddly and annoying to get right. To help illustrate this, I’ve made a diagram of the steps involved with setting up document search by hand:&lt;/p&gt;

&lt;p&gt;&lt;img alt="A diagram showing the process of ingesting a pile of markdown documents into a vector database. The documents are broken into a collection of sections, then each section is passed through an embedding model and the resulting vectors are stored in a vector database." src="/blog/how-i-fly-yoko-li/assets/image3.webp" /&gt;&lt;/p&gt;

&lt;p&gt;You start with your Markdown documents. Most Markdown documents are easily broken up into sections where each section will focus on a single aspect of the larger topic of the document. You can take advantage of this best practice by letting people search for each section individually, which is typically a lot more useful than just searching the entire document.&lt;/p&gt;
&lt;div class="right-sidenote"&gt;&lt;p&gt;Okay, okay, fine. Language encircles concepts instead of defining them directly. The point still stands that we’re operating at a level “below” words and sentences, I don’t want to bog this down in a bunch of linear algebra that neither of us understand well enough to explain in a single paragraph like I am here. The main point is that it lets you “fuzzy match” relevant documents in a way that exact word search queries never could on their own.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;Essentially, the vector embeddings that you generate from an embedding model are a mathematical representation of the “concepts” that the embedding model uses that are adjacent to the text of your documents. When you use the same model to generate embeddings for your documents and user queries, this lets you find documents that are similar to the query, but not precisely the same exact words. This is called “fuzzy searching” and it is one of the most difficult problems in computer science (right next to naming things).&lt;/p&gt;

&lt;p&gt;When a user comes to search the database, you do the same thing as ingestion:&lt;/p&gt;

&lt;p&gt;&lt;img alt="A diagram showing the full flow for doing document search Q&amp;amp;A with a vector database. The user submits a question to an API endpoint, the question is broken into embedding vectors and used to search for similar vectors in the database. The relevant document fragments are fed into the prompt for a large language model to generate a response that is grounded in the facts from the documents that were ingested. The response is streamed to the user one token at a time." src="/blog/how-i-fly-yoko-li/assets/image2.webp" /&gt;&lt;/p&gt;

&lt;p&gt;The user query comes into your API endpoint. You use the same embedding model from earlier (omitted from the diagram for brevity) to turn that query into a vector. Then you query the same vector database to find documents that are similar to the query. Then you have a list of documents with metadata like the URL to the documentation page or section fragment in that page. From here you have two options. You can either use the documents to return a list of results to the user, or you can do the more fun thing: using those documents as context for a large language model to generate a response grounded in the relevant facts in those documents.&lt;/p&gt;
&lt;div class="right-sidenote"&gt;&lt;p&gt;I think it’s also how OpenAI’s custom GPTs work, but they haven’t released technical details about how they work so this is outright speculation on my part.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;This basic pattern is called Retrieval-augmented Generation (RAG), and it’s how Bing’s copilot chatbot works. The Local AI Starter Kit makes setting this pipeline up &lt;em&gt;effortless&lt;/em&gt; and &lt;em&gt;fast&lt;/em&gt;. It’s a huge step forward for making this groundbreaking technology accessible to everyone.&lt;/p&gt;
&lt;h2 id='the-struggles' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#the-struggles' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;The struggles&lt;/span&gt;&lt;/h2&gt;
&lt;blockquote&gt;
&lt;p&gt;When I was trying to get the AI models in AI Town to output JSON, I tried a bunch of different things. I got some good results by telling the model to &amp;ldquo;only reply in JSON, no prose&amp;rdquo;, but we ended up using a model tuned for outputting code. I think I inspired &lt;a href='https://ollama.ai' title=''&gt;Ollama&lt;/a&gt; to add their JSON output feature.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;One of the main benefits of large language models is that they are essentially stochastic models of the entire Internet. They have a bunch of patterns formed that can let you create surprisingly different outputs from similar inputs. This is also one of the main drawbacks of large language models: they are essentially stochastic models of the entire Internet. They have a bunch of patterns formed that can let you create surprisingly different outputs from similar inputs. The outputs of these models are usually correct-ish enough (more correct if you ground the responses in document fact like you do with a Retrieval-augmented Generation system), but they are not always aligned with our observable reality.&lt;/p&gt;

&lt;p&gt;A lot of the time you will get outputs that don’t make any logical or factual sense. These are called “hallucinations” and they are one of the main drawbacks of large language models. If a hallucination pops in at the worst times, you’ve accidentally told someone how to poison themselves with chocolate chip cookies. This is, as the kids say, “bad”.&lt;/p&gt;

&lt;p&gt;The inherent randomness of the output of a large language model means that it can be difficult to get an exactly parsable format. Most of the time, you’d be able to coax the model to get usable JSON output, but without schema it can sometimes generate wildly different JSON responses. Only sometimes. This isn’t deterministic and Yoko has found that this is one of the most frustrating parts of working with large language models.&lt;/p&gt;
&lt;div class="right-sidenote"&gt;&lt;p&gt;This works by making any offending ungrammatical tokens weighted to negative infinity. It’s amazingly hacky but the hilarious part is that it works.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;However, there are workarounds. &lt;a href='https://github.com/ggerganov/llama.cpp' title=''&gt;llama.cpp&lt;/a&gt; offers a way to use a grammar file to strictly guide the output of a large language model by using context-free grammar. This lets you get something more deterministic, but it’s still not perfect. It’s a lot better than nothing, though.&lt;/p&gt;

&lt;p&gt;One of the fun things that can happen with this is that you can have the model fail to generate anything but an endless stream of newlines in JSON mode. This is hilarious and usually requires some special detection logic to handle and restart the query. There’s work being done to let you use JSON schema to guide the generation of large language model outputs, but it’s not currently ready for the masses.&lt;/p&gt;
&lt;div class="right-sidenote"&gt;&lt;p&gt;If it’s dumb and it works, is it really dumb?&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;However, one of the easiest ways to hack around this is by using a model that generates code instead of text. This is how Yoko got the AI Town and AI Tamago models to output JSON that was mostly valid. It’s a hack, but it works. This was made a lot easier for AI town when one of the tools they use (&lt;a href='https://ollama.ai' title=''&gt;Ollama&lt;/a&gt;) added support for JSON output from the model. This is a lot better than the code generation model hack, but research continues.&lt;/p&gt;
&lt;h2 id='the-simple-joy-of-unexpected-outputs' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#the-simple-joy-of-unexpected-outputs' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;The simple joy of unexpected outputs&lt;/span&gt;&lt;/h2&gt;
&lt;blockquote&gt;
&lt;p&gt;When I was making AI Town, I was inspired by &lt;a href='https://en.wikipedia.org/wiki/The_Lifecycle_of_Software_Objects' title=''&gt;The Lifecycle of Software Objects&lt;/a&gt; by Ted Chiang. It&amp;rsquo;s about a former zookeeper that trained AI agents to be pets, kinda like how we use Reinforcement Learning from Human Feedback to train AI models like ChatGPT.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;However, at the same time, there are cases where hallucinations are not only useful, but they are what make the implementation of a system possible. If large language models are essentially massive banks of the word frequencies of a huge part of culture, then the emergent output can create unexpected things that happen frequently. This lets you have emergent behavior form, this can be the backbone of games and is the key thing that makes AI Town work as well as it does.&lt;/p&gt;

&lt;p&gt;AI Tamago is also completely driven off of the results of large language model hallucinations. They are the core of what drives user inputs, the game loop, and the surprising reactions you get when disciplining your pet. The status screen takes in the game state and lets you know what your pet is feeling in a way that the segment displays of the Tamagochi toys could never do.&lt;/p&gt;

&lt;p&gt;These enable you to build workflows that are &lt;em&gt;augmented&lt;/em&gt; by the inherent randomness of the hallucinations instead of seeing them as drawbacks. This means you need to choose outputs that can have the hallucinations shine instead of being ugly warts you need to continuously shave away. Instead of using them for doing pathfinding, have them drive the AI of your characters or writing the A* pathfinding algorithm so you don’t have to write it again for the billionth time.&lt;/p&gt;

&lt;p&gt;I’m not saying that large language models can replace the output of a human, but they are more like a language server for human languages as well as programming languages. They are best used when you are generating the boilerplate you don’t want to do yourself, or when you are throwing science at the wall to see what sticks.&lt;/p&gt;
&lt;h2 id='in-conclusion' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#in-conclusion' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;In conclusion&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;Yoko is showing people how to use AI today, on local machines, with models of your choice, that allow you to experiment, hack and learn.&lt;/p&gt;

&lt;p&gt;I can’t wait to see what’s next!&lt;/p&gt;

&lt;p&gt;If you want to follow what Yoko does, here’s a few links to add to your feeds:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Yoko’s &lt;a href='https://twitter.com/stuffyokodraws' title=''&gt;Twitter&lt;/a&gt; (or X, or whatever we&amp;rsquo;re supposed to call it now)
&lt;/li&gt;&lt;li&gt;Yoko’s &lt;a href='https://github.com/ykhli' title=''&gt;GitHub&lt;/a&gt;
&lt;/li&gt;&lt;li&gt;Yoko’s &lt;a href='https://yoko.dev/' title=''&gt;Website&lt;/a&gt;
&lt;/li&gt;&lt;/ul&gt;

&lt;p&gt;(insert standard conclusion diatribe here)&lt;/p&gt;</content>
    </entry>
    <entry>
        <title>Deploy Your Own (Not) Midjourney Bot on Fly GPUs</title>
        <link rel="alternate" href="https://fly.io/blog/not-midjourney-bot/"/>
        <id>https://fly.io/blog/not-midjourney-bot/</id>
        <published>2024-01-04T00:00:00+00:00</published>
        <updated>2025-05-20T16:26:25+00:00</updated>
        <media:thumbnail url="https://fly.io/blog/not-midjourney-bot/assets/purple-balloon-taking-off-thumb.webp"/>
        <content type="html">&lt;div class="lead"&gt;&lt;p&gt;Fly.io has Enterprise-grade GPUs and servers all over the globe (or &lt;em&gt;disk&lt;/em&gt;, depending on which side of the flat Earth debate you fall on) making it a great place to deploy your next disruptive AI app.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;Some people daydream about normal things, like coffee machines or raising that Series A round (those are normal things to dream about, right?). I daydream about commanding a fleet of chonky &lt;a href='https://resources.nvidia.com/en-us-l40s/l40s-datasheet-28413/' title=''&gt;NVIDIA Lovelace L40Ss&lt;/a&gt;. Also, totally normal. Well, fortunately for me and anyone else wanting to explore the world of generative AI — Fly.io has GPUs now!&lt;/p&gt;

&lt;p&gt;Sure, this technology will probably end up with the AI &lt;a href='https://marketoonist.com/2023/03/ai-written-ai-read.html' title=''&gt;talking to itself&lt;/a&gt; while we go about our lives — but it seems like it&amp;rsquo;s here to stay, so we should at least have some fun with it. In this post we&amp;rsquo;ll put these GPUs to task and you&amp;rsquo;ll learn how to build your very own AI image-generating Discord bot, kinda like Midjourney. Available 24/7 and ready to serve up all the pictures of cats eating burritos your heart desires. And because I&amp;rsquo;d never tell you to draw the rest of the owl, I&amp;rsquo;ll link to working code that you can deploy today.&lt;/p&gt;
&lt;h2 id='latent-diffusion-models-have-entered-the-chat' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#latent-diffusion-models-have-entered-the-chat' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Latent Diffusion Models Have Entered the Chat&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;In the realm of AI image generation, two names have become prominent: Midjourney and Stable Diffusion. Both are image generating software that allow you to synthesize an image from a textual prompt. One is a closed source paid service, while the other is open source and can run locally. Midjourney gained popularity because it allowed the less technically-inclined among us to explore this technology through its ease of use. Stable Diffusion democratized access to the technology, but it can be quite tricky to get good results out of it.&lt;/p&gt;

&lt;p&gt;Enter &lt;a href='https://github.com/lllyasviel/Fooocus' title=''&gt;Fooocus&lt;/a&gt; (pronounced &lt;em&gt;focus&lt;/em&gt;), an open source project that combines the best of both worlds and offers a user-friendly interface to Stable Diffusion. It&amp;rsquo;s hands down the easiest way to get started with Stable Diffusion. Sure there are more popular tools like Stable Diffusion web UI and ComfyUI, but Fooocus adds some magic to reduce the need to manually tweak a bunch of settings.  The most significant feature is probably GPT-2-based &amp;ldquo;&lt;a href='https://github.com/lllyasviel/Fooocus/discussions/117#raw' title=''&gt;prompt expansion&lt;/a&gt;&amp;rdquo; to dynamically enhance prompts.&lt;/p&gt;

&lt;p&gt;The point of Fooocus is to &lt;em&gt;focus&lt;/em&gt; on your prompt. The more you put into it, the more you get out. That said, a very simple prompt like &amp;ldquo;forest elf&amp;rdquo; can return high-quality images without the need to trawl the web for prompt ideas or fiddle with knobs and levers (although they&amp;rsquo;re there if you want them).&lt;/p&gt;

&lt;p&gt;So, what can this thing &lt;em&gt;do&lt;/em&gt;? Well, this…&lt;/p&gt;

&lt;p&gt;&lt;img alt="A black and white sketch of hot-air balloon over a mountain range generated using Fooocus with &amp;quot;Pencil Sketch Drawing&amp;quot; style and quality = True" src="/blog/not-midjourney-bot/assets/./balloon-sketch.webp" /&gt;&lt;/p&gt;

&lt;p&gt;Here&amp;rsquo;s the full command I&amp;rsquo;ve used to generate this image: &lt;code&gt;/imagine prompt: sketch of hot-air balloon over a mountain range style1: Pencil Sketch Drawing quality: true ar: 1664×576&lt;/code&gt;&lt;/p&gt;
&lt;h2 id='what-were-building' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#what-were-building' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;What We&amp;rsquo;re Building&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;We&amp;rsquo;ll deploy two applications. The code to run the bot itself will run on normal VM hardware, and the API server doing all the hard work synthesizing alpacas out of thin air will run on GPU hardware.&lt;/p&gt;

&lt;p&gt;&lt;img alt="An architecture diagram explaining how the two apps will communicate and return the requested image to an end user." src="/blog/not-midjourney-bot/assets/./arch-diagram.png?center&amp;amp;2/3" /&gt;&lt;/p&gt;

&lt;p&gt;Fooocus is served up as a web UI by default, but with a little elbow grease we can interact with it as a REST API. Fortunately, with more than 25k stars on GitHub at the time of writing, the project has a lively open-source community, so we don&amp;rsquo;t need to do much work here — it&amp;rsquo;s already been done for us. &lt;a href='https://github.com/konieshadow/Fooocus-API' title=''&gt;Fooocus-API&lt;/a&gt; is a project that shoves FastAPI in front of a Fooocus runtime. We&amp;rsquo;ll use this for the API server app.&lt;/p&gt;

&lt;p&gt;The Python-based bot connects to the &lt;a href='https://discord.com/developers/docs/topics/gateway' title=''&gt;Discord Gateway API&lt;/a&gt; using the &lt;a href='https://github.com/Pycord-Development/pycord' title=''&gt;Pycord&lt;/a&gt; library. When it starts up, it maintains an open pipe for data to flow back and forth via WebSockets. The bot app also includes a client that knows how to talk to the API server using Flycast and request the image it needs via HTTP.&lt;/p&gt;

&lt;p&gt;When we request an image from Discord using the &lt;code&gt;/imagine&lt;/code&gt; slash command, we immediately respond using Pycord&amp;rsquo;s &lt;code&gt;defer()&lt;/code&gt; function to let Discord know that the request has been received and the bot is working on it — it&amp;rsquo;ll take a few seconds to process your prompt, fabricate an image, upload it to Discord and let you share it with your friends. This is a blocking operation, so it won&amp;rsquo;t perform well if you have hundreds of people on your Discord Server using the command. For that, you&amp;rsquo;ll want to jiggle some wires to make the code non-blocking. But for for now, this gives us a nice UX for the bot.&lt;/p&gt;

&lt;p&gt;When the API server returns the image, it gets saved to disk. We&amp;rsquo;ll use the fantastic &lt;a href='https://github.com/sqids/sqids-python' title=''&gt;Sqids&lt;/a&gt; library to generate collision-free file names:&lt;/p&gt;
&lt;div class="highlight-wrapper group relative python"&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-wrap-target="#code-8ivdq00o"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-copy-target="sibling"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight '&gt;&lt;code id="code-8ivdq00o"&gt;&lt;span class="n"&gt;unique_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sqids&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;encode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;author&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;())]&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;result_filename&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="s"&gt;"result_&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;unique_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;.png"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;We&amp;rsquo;ll also use &lt;code&gt;asyncio&lt;/code&gt; to check if the image is ready every second, and when it is, we send it off to Discord to complete the request:&lt;/p&gt;
&lt;div class="highlight-wrapper group relative python"&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-wrap-target="#code-dwletmxh"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-copy-target="sibling"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight '&gt;&lt;code id="code-dwletmxh"&gt;&lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;exists&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result_filename&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;asyncio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sleep&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nb"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result_filename&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"rb"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;respond&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="nb"&gt;file&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;discord&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;File&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;result_filename&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Neither of these two apps will be exposed to the Internet, yet they&amp;rsquo;ll still be able to communicate with each other. One of the undersold stories about Fly.io is the ease with which two applications can communicate over the private network. We assign special IPv6 private network (6pn) addresses within the same organizational space and applications can effortlessly discover and connect to one another without any additional configuration.&lt;/p&gt;

&lt;p&gt;But what about load balancing and this &amp;ldquo;scale-to-zero&amp;rdquo; thing? We don&amp;rsquo;t &lt;em&gt;just&lt;/em&gt; want our two apps to talk to each other, we want the Fly Proxy to start our Machine when a request comes in, and stop it when idle. For that, we&amp;rsquo;ll need &lt;a href='https://fly.io/docs/reference/private-networking/#flycast-private-load balancing' title=''&gt;Flycast&lt;/a&gt;, our private load balancing feature.&lt;/p&gt;

&lt;p&gt;When you assign a Flycast IP to your app, you can route requests using a special &lt;code&gt;.flycast&lt;/code&gt; domain. Those requests are routed through the Fly Proxy instead of directly to instances in your app. Meaning you get all the load balancing, rate limiting and other proxy goodness that you&amp;rsquo;re accustomed to. The Proxy runs a process which can automatically downscale Machines every few minutes. It&amp;rsquo;ll also start them right back up when a request comes in — this means we can take advantage of scale-to-zero, saving us a bunch of money!&lt;/p&gt;
&lt;h2 id='the-imagine-command' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#the-imagine-command' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;The &lt;code&gt;/imagine&lt;/code&gt; Command&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;The slash command is the heart of your bot, enabling you to generate images based on your prompt, right from within Discord. When you type &lt;code&gt;/imagine&lt;/code&gt; into the Discord chat, you&amp;rsquo;ll see some command options pop up.&lt;/p&gt;

&lt;p&gt;You&amp;rsquo;ll need to input your base prompt (e.g. &amp;ldquo;an alpaca sleeping in a grassy field&amp;rdquo;) and  optionally pick some styles (&amp;ldquo;Pencil Sketch Drawing&amp;rdquo;, &amp;ldquo;Futuristic Retro Cyberpunk&amp;rdquo;, &amp;ldquo;MRE Dark Cyberpunk&amp;rdquo; etc). With Fooocus, combining multiple styles — &amp;ldquo;style-chaining&amp;rdquo; — can help you achieve amazing results. Set the aspect ratio or provide negative prompts if needed, too.&lt;/p&gt;

&lt;p&gt;After you execute the command, the bot will request the image from the API, then send it as a response in the chat. Let’s see it in action!&lt;/p&gt;

&lt;p&gt;&lt;img alt="A dif demo run through showcasing the ability of the bot to generate images from Discord" src="/blog/not-midjourney-bot/assets/./demo.gif?card&amp;amp;center" /&gt;&lt;/p&gt;
&lt;h2 id='deployment-speedrun' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#deployment-speedrun' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Deployment Speedrun&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;&lt;strong class='font-semibold text-navy-950'&gt;First, we&amp;rsquo;ll deploy the API server.&lt;/strong&gt; For convenience (and to speed things up), we&amp;rsquo;ll use a pre-built image when we deploy. With dependencies like &lt;code&gt;torch&lt;/code&gt; and &lt;code&gt;torchvision&lt;/code&gt; bundled in, it&amp;rsquo;s a hefty image weighing in just shy of 12GB. With a normal Fly Machine  this would not only be a bad idea, but not even possible due to an 8GB limit for the VMs rootfs. Fortunately the wizards behind Fly GPUs have accounted for our need to run huge models and their dependencies, and awarded us 50GB of rootfs.&lt;/p&gt;
&lt;div class="right-sidenote"&gt;&lt;p&gt;Fly GPUs use &lt;a href="https://github.com/cloud-hypervisor/cloud-hypervisor" title=""&gt;Cloud Hypervisor&lt;/a&gt; and not &lt;a href="https://github.com/firecracker-microvm/firecracker" title=""&gt;Firecracker&lt;/a&gt; (like a regular Fly Machine) for virtualization. But even with a 12GB image, this doesn’t stop the Machine from booting in seconds when a new request comes in through the Proxy.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;To start, clone the template &lt;a href='https://github.com/fly-apps/not-midjourney-bot' title=''&gt;repository&lt;/a&gt;. You&amp;rsquo;ll need this for both the bot and server apps. Then deploy the server with the Fly CLI:&lt;/p&gt;
&lt;div class="highlight-wrapper group relative bash"&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-wrap-target="#code-kdezxoz"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-copy-target="sibling"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight '&gt;&lt;code id="code-kdezxoz"&gt;fly deploy &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--image&lt;/span&gt; ghcr.io/fly-apps/not-midjourney-bot:server &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--config&lt;/span&gt; ./server/fly.toml &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--no-public-ips&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;This command tells Fly.io to deploy your application based on the configuration specified in the &lt;code&gt;fly.toml&lt;/code&gt;, while the &lt;code&gt;--no-public-ips&lt;/code&gt; flag secures your app by not exposing it to the public Internet.&lt;/p&gt;

&lt;p&gt;Remember Flycast? To use it, we’ll allocate a private IPv6:&lt;/p&gt;
&lt;div class="highlight-wrapper group relative bash"&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-wrap-target="#code-t586n0x7"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-copy-target="sibling"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight '&gt;&lt;code id="code-t586n0x7"&gt;fly ips allocate-v6 &lt;span class="nt"&gt;--private&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Now, let&amp;rsquo;s take a look at our &lt;a href='https://github.com/fly-apps/not-midjourney-bot/blob/134bb634f97bf81040e489650f2334b48d976c10/server/fly.toml' title=''&gt;&lt;code&gt;fly.toml&lt;/code&gt;&lt;/a&gt; config:&lt;/p&gt;
&lt;div class="highlight-wrapper group relative toml"&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-wrap-target="#code-avnpvpja"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-copy-target="sibling"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight '&gt;&lt;code id="code-avnpvpja"&gt;&lt;span class="py"&gt;app&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"alpaca-image-gen"&lt;/span&gt;
&lt;span class="py"&gt;primary_region&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"ord"&lt;/span&gt;

&lt;span class="nn"&gt;[[vm]]&lt;/span&gt;
  &lt;span class="py"&gt;size&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"performance-8x"&lt;/span&gt;
  &lt;span class="py"&gt;memory&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"16gb"&lt;/span&gt;
  &lt;span class="py"&gt;gpu_kind&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"l40s"&lt;/span&gt;

&lt;span class="nn"&gt;[[services]]&lt;/span&gt;
  &lt;span class="py"&gt;internal_port&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;8888&lt;/span&gt;
  &lt;span class="py"&gt;protocol&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"tcp"&lt;/span&gt;
  &lt;span class="py"&gt;auto_stop_machines&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
  &lt;span class="py"&gt;auto_start_machines&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
  &lt;span class="py"&gt;min_machines_running&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;

    &lt;span class="nn"&gt;[[services.ports]]&lt;/span&gt;
      &lt;span class="py"&gt;handlers&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nn"&gt;["http"]&lt;/span&gt;
      &lt;span class="py"&gt;port&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;80&lt;/span&gt;
      &lt;span class="py"&gt;force_https&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;

&lt;span class="nn"&gt;[mounts]&lt;/span&gt;
  &lt;span class="py"&gt;source&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"repositories"&lt;/span&gt;
  &lt;span class="py"&gt;destination&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"/app/repositories"&lt;/span&gt;
  &lt;span class="py"&gt;initial_size&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"20gb"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;There are a few key things to note here:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Currently, the NVIDIA L40Ss we&amp;rsquo;re using when we specify &lt;code&gt;gpu_kind&lt;/code&gt; are only available in &lt;code&gt;ORD&lt;/code&gt;, so that&amp;rsquo;s what we&amp;rsquo;ve set the &lt;code&gt;primary_region&lt;/code&gt; to. We&amp;rsquo;re rolling out more GPUs to more regions in a hurry — but for now we&amp;rsquo;ll host the bot in Chicago.
&lt;/li&gt;&lt;li&gt;Out of the box, 8GB of system RAM is suggested. In my testing this wasn&amp;rsquo;t close to enough: the Machine would frequently run out of memory and crash. I got things working better by using 16GB of RAM.
&lt;/li&gt;&lt;li&gt;The FastAPI server binds to port 8888; we need to set this as our &lt;code&gt;internal_port&lt;/code&gt;, or the Fly Proxy won&amp;rsquo;t know where to send requests.
&lt;/li&gt;&lt;li&gt;We want our Machine to &lt;a href='https://fly.io/docs/apps/autostart-stop/' title=''&gt;automatically stop and start&lt;/a&gt;.
&lt;/li&gt;&lt;li&gt;Flycast doesn&amp;rsquo;t do HTTPS, so we won&amp;rsquo;t force it here. Don&amp;rsquo;t worry, it&amp;rsquo;s still encrypted over the wire!
&lt;/li&gt;&lt;li&gt;A volume is automatically created on the first deploy. On first boot, the app clones the Fooocus repo and downloads the Stable Diffusion model checkpoints onto that volume. This takes a couple of minutes, but the next time the Machine starts, it&amp;rsquo;ll have everything it needs to serve a request within seconds.
&lt;/li&gt;&lt;/ol&gt;
&lt;div class="callout"&gt;&lt;p&gt;The &lt;a href="https://github.com/fly-apps/not-midjourney-bot/blob/84e72d1e7048627b7c845fe3d44d45b278e451d5/README.md" title=""&gt;&lt;strong class="font-semibold text-navy-950"&gt;README&lt;/strong&gt;&lt;/a&gt; for this project has detailed instructions about setting up your Discord bot and adding it to a Server.  After setting up the permissions and privileged intents, you’ll get an OAuth2 URL. Use this URL to invite your bot to your Discord server and confirm the permissions. Once that’s done, grab your Discord API token, you’ll need it for the next step.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;&lt;strong class='font-semibold text-navy-950'&gt;With the API server up and running, it&amp;rsquo;s time to deploy the Discord bot.&lt;/strong&gt; This app will run on a normal Fly Machine, no GPU required. First, set the &lt;code&gt;DISCORD_TOKEN&lt;/code&gt; and &lt;code&gt;FOOOCUS_API_URL&lt;/code&gt; (the Flycast endpoint for the API server) secrets, using the Fly CLI. Then deploy:&lt;/p&gt;
&lt;div class="highlight-wrapper group relative bash"&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-wrap-target="#code-wc4jofjn"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-copy-target="sibling"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight '&gt;&lt;code id="code-wc4jofjn"&gt;fly deploy &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--image&lt;/span&gt; ghcr.io/fly-apps/not-midjourney-bot:bot &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--config&lt;/span&gt; ./bot/fly.toml &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--no-public-ips&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Notice that the bot app doesn&amp;rsquo;t need to be publicly visible on the Internet either. Under the hood, the WebSocket connection to Discord&amp;rsquo;s Gateway API allows the bot to communicate freely without the need to define any services in our &lt;code&gt;fly.toml&lt;/code&gt;. This also means that the Fly Proxy will not downscale the app like it does the GPU Machine — the bot will always appear &amp;ldquo;online&amp;rdquo;.&lt;/p&gt;
&lt;figure class="post-cta"&gt;
  &lt;figcaption&gt;
    &lt;h1&gt;Not interested in GPUs?&lt;/h1&gt;
    &lt;p&gt;You can still deploy apps on Fly.io today and be up and running in a matter of minutes.&lt;/p&gt;
      &lt;a class="btn btn-lg" href="https://fly.io/docs/speedrun/"&gt;
        Deploy an app now&lt;span class='opacity-50'&gt;→&lt;/span&gt;
      &lt;/a&gt;
  &lt;/figcaption&gt;
  &lt;div class="image-container"&gt;
    &lt;img src="/static/images/cta-turtle.webp" srcset="/static/images/[email protected] 2x" alt=""&gt;
  &lt;/div&gt;
&lt;/figure&gt;

&lt;h2 id='how-do-i-know-this-thing-is-using-gpu-for-reals' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#how-do-i-know-this-thing-is-using-gpu-for-reals' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;How Do I Know This Thing Is Using GPU for Reals?&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;That&amp;rsquo;s easy! NVIDIA provides us with a neat little command-line utility called &lt;code&gt;nvidia-smi&lt;/code&gt; which we can use to monitor and get information about NVIDIA GPU devices.&lt;/p&gt;

&lt;p&gt;Let&amp;rsquo;s SSH to the running Machine for the API server app and run an &lt;code&gt;nvidia-smi&lt;/code&gt; query in one go. It&amp;rsquo;s a little clunky, but you&amp;rsquo;ll get the point:&lt;/p&gt;
&lt;div class="highlight-wrapper group relative bash"&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-wrap-target="#code-51fc7qja"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-copy-target="sibling"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight '&gt;&lt;code id="code-51fc7qja"&gt;fly ssh console &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-C&lt;/span&gt; &lt;span class="s2"&gt;"nvidia-smi --query-gpu=gpu_name,utilization.gpu,utilization.memory,temperature.gpu,power.draw --format=csv,noheader --loop"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;&lt;div class="highlight-wrapper group relative "&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-wrap-target="#code-uemr03f6"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-copy-target="sibling"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight '&gt;&lt;code id="code-uemr03f6"&gt;Connecting to fdaa:2:f664:a7b:210:d8b2:8fd8:2... complete

NVIDIA L40S, 0 %, 0 %, 46, 88.63 W
NVIDIA L40S, 0 %, 0 %, 46, 88.61 W
NVIDIA L40S, 36 %, 4 %, 51, 103.41 W
NVIDIA L40S, 65 %, 25 %, 57, 280.90 W
NVIDIA L40S, 0 %, 0 %, 49, 91.13 W
NVIDIA L40S, 0 %, 0 %, 48, 89.76 W
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;What we&amp;rsquo;ve done is run the command on a loop while the bot is actually doing work synthesizing an image and we get to see it ramp up and consume more wattage and VRAM. The card is barely breaking a sweat!&lt;/p&gt;
&lt;h2 id='how-much-will-these-alpaca-pics-cost-me' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#how-much-will-these-alpaca-pics-cost-me' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;How Much Will These Alpaca Pics Cost Me?&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;Let&amp;rsquo;s talk about the cost-effectiveness of this setup. On Fly.io, an L40S GPU &lt;a href='https://fly.io/docs/about/pricing/#gpus-and-fly-machines' title=''&gt;costs&lt;/a&gt; $2.50/hr. Tag on a few cents per hour for the VM resources and storage for our models and you&amp;rsquo;re looking at about $3.20/hr to run the GPU Machine. It&amp;rsquo;s &lt;em&gt;on-demand&lt;/em&gt;, too — if you&amp;rsquo;re not using the compute, you&amp;rsquo;re not paying for it! Keep in mind that some of these checkpoint models can be several gigabytes and if you create a volume, you will be charged for it even when you have no Machines running. It&amp;rsquo;s worth noting too, that the non-GPU bot app falls into our &lt;a href='https://fly.io/docs/about/pricing/#free-allowances' title=''&gt;free allowance&lt;/a&gt;.&lt;/p&gt;
&lt;div class="right-sidenote"&gt;&lt;p&gt;Rates are on-demand, with no minimum usage requirements. Discounted rates for reserved GPU Machines and dedicated hosts are also available if you email &lt;a href="mailto:[email protected]" title=""&gt;[email protected]&lt;/a&gt;&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;In comparison, Midjourney offers several subscription tiers with the cheapest plan costing $10/mo and providing 3.3 hours of &amp;ldquo;fast&amp;rdquo; GPU time (roughly equivalent to an enterprise-grade Fly GPU). This works out to about $3/hr give or take a few cents.&lt;/p&gt;
&lt;h2 id='where-can-i-take-this' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#where-can-i-take-this' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Where Can I Take This?&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;There is a lot you can do to build out the bot&amp;rsquo;s functionality. You control the source code for the bot, meaning that you can make it do &lt;em&gt;whatever you want&lt;/em&gt;. You might decide to mimic Midjourney&amp;rsquo;s &lt;code&gt;/blend&lt;/code&gt; command to splice your own images into prompts (AKA img2img diffusion). You can do this by adding more commands to your &lt;a href='https://guide.pycord.dev/popular-topics/cogs' title=''&gt;Cog&lt;/a&gt;, Pycord&amp;rsquo;s way of grouping similar commands. You might decide to add a button to roll the image if you don&amp;rsquo;t like it, or even specify the number of images to return. The possibilities are endless and your cloud bill&amp;rsquo;s the limit!&lt;/p&gt;

&lt;p&gt;The full code for the bot and server (with detailed instructions on how to deploy it on Fly.io) can be found &lt;a href='https://github.com/fly-apps/not-midjourney-bot' title=''&gt;&lt;strong class='font-semibold text-navy-950'&gt;here&lt;/strong&gt;&lt;/a&gt;.&lt;/p&gt;</content>
    </entry>
    <entry>
        <title>Fly With Alpine</title>
        <link rel="alternate" href="https://fly.io/blog/fly-with-alpine/"/>
        <id>https://fly.io/blog/fly-with-alpine/</id>
        <published>2023-12-21T00:00:00+00:00</published>
        <updated>2025-05-20T16:26:25+00:00</updated>
        <media:thumbnail url="https://fly.io/blog/fly-with-alpine/assets/fly-with-alpine-thumb.webp"/>
        <content type="html">&lt;div class="lead"&gt;&lt;p&gt;Reduce image sizes and improve startup times by switching your base image to Alpine Linux.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;Before proceeding, a caution.  This is an engineering trade-off.  Test carefully before deploying to production.&lt;/p&gt;

&lt;p&gt;By the end of this blog post you should have the information you need to make an informed decision.&lt;/p&gt;
&lt;h2 id='introduction' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#introduction' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Introduction&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;&lt;a href='https://www.alpinelinux.org/about/' title=''&gt;Alpine Linux&lt;/a&gt; is a Linux distribution that advertises itself as Small.  Simple.  Secure.&lt;/p&gt;

&lt;p&gt;It is indisputably smaller than the alternatives &amp;ndash; when measured by image size.  More on that in a bit.  Some claim that this results in less memory usage and better performance.  Others dispute these claims.  For these, it is best that you test the results for yourself with your application.&lt;/p&gt;

&lt;p&gt;Simple is harder to measure.  Some of the larger differences, like &lt;a href='https://github.com/OpenRC/openrc#readme' title=''&gt;OpenRC&lt;/a&gt; vs &lt;a href='https://systemd.io/' title=''&gt;SystemD&lt;/a&gt;, are less relevant in container environments.  Others, like &lt;a href='https://busybox.net/' title=''&gt;BusyBox&lt;/a&gt; are implementation details.  Essentially what you get is a Linux distribution with perhaps a number of standard packages (e.g., bash) not installed by default, but these can be easily added if needed.&lt;/p&gt;

&lt;p&gt;Secure is definitely an important attribute.  The alternatives make comparable claims in this area.  Do your own research in this area and come to your own conclusions.&lt;/p&gt;

&lt;p&gt;Not mentioned is the downside: Alpine Linux has a smaller ecosystem that the alternatives, particularly when compared to Debian.&lt;/p&gt;
&lt;h2 id='baseline' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#baseline' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Baseline&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;Let&amp;rsquo;s start with a baseline consisting of the Dockerfiles produced by &lt;code&gt;fly launch&lt;/code&gt; for some of the most popular
frameworks:&lt;/p&gt;
&lt;div class="highlight-wrapper group relative "&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-wrap-target="#code-257otcgi"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-copy-target="sibling"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight '&gt;&lt;code id="code-257otcgi"&gt;FROM fideloper/fly-laravel:${PHP_VERSION}
FROM hexpm/elixir:1.12.3-erlang-24.1.4-debian-bullseye-20210902-slim
FROM node:${NODE_VERSION}-slim
FROM oven/bun:${BUN_VERSION}-slim
FROM python:${PYTHON_VERSION}-slim-bullseye
FROM ruby:$RUBY_VERSION-slim
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;What may not be obvious to the naked eye from these results is that the base image for these is one of the following:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Debian Bookworm (the current &amp;ldquo;stable&amp;rdquo; distribution)
&lt;/li&gt;&lt;li&gt;Debian Bullseye (the previous &amp;ldquo;stable&amp;rdquo; distribution)
&lt;/li&gt;&lt;li&gt;Ubuntu Focal Fossa (the previous LTS release of Ubuntu)
&lt;/li&gt;&lt;/ul&gt;

&lt;p&gt;Once you factor in that Ubuntu is based on Debian, the conclusion is that Debian is effectively the default distribution for fly IO.  Rest assured that this isn&amp;rsquo;t the result of a devious conspiracy by Fly.io, but rather a reflection of the default choices made independently by the developers of a number of frameworks and runtimes.  Beyond this, all Fly.io is doing is choosing the &amp;ldquo;slim&amp;rdquo; version of the default distribution for each framework as the base.&lt;/p&gt;

&lt;p&gt;What&amp;rsquo;s likely going on here is a virtuos circle: people choose Debian because of the ecosystem, and ecosystem grows because people chose Debian.&lt;/p&gt;

&lt;p&gt;Now lets compare base image sizes:&lt;/p&gt;
&lt;table class="ml-8 mb-8"&gt;
&lt;thead&gt;
&lt;tr&gt;
  &lt;th class="px-8"&gt;
  &lt;th class="px-8 underline"&gt;Alpine
  &lt;th class="px-8 underline"&gt;Debian slim
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
  &lt;th class="text-left"&gt;Bun 1.0.18
  &lt;td class="text-center"&gt;43.10M
  &lt;td class="text-center"&gt;63.84M
&lt;/tr&gt;
&lt;tr&gt;
  &lt;th class="text-left"&gt;Node 21.4.0
  &lt;td class="text-center"&gt;46.83M
  &lt;td class="text-center"&gt;70.08M
&lt;/tr&gt;
&lt;tr&gt;
  &lt;th class="text-left"&gt;Python 3.12.1
  &lt;td class="text-center"&gt;17.59M
  &lt;td class="text-center"&gt;45.36M
&lt;/tr&gt;
&lt;tr&gt;
  &lt;th class="text-left"&gt;Ruby 3.2
  &lt;td class="text-center"&gt;40.14M
  &lt;td class="text-center"&gt;74.36M
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;


&lt;p&gt;And these numbers are just the for the base images.  I&amp;rsquo;ve measured a minimal Rails/Postgresql/esbuild application at 304MB on Alpine and 428MB on Debian Slim.  A minimal Bun application at 110MB on Alpine and 173MB on Debian Slim.  And a minimal Node application at 142MB on Alpine and 207MB on Debian Slim.&lt;/p&gt;

&lt;p&gt;In each case, corresponding Alpine images are consistently smaller than their Debian slim equivalent.&lt;/p&gt;
&lt;h2 id='switching-distributions' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#switching-distributions' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Switching Distributions&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;Switch distributions (and switching back!) is easy.&lt;/p&gt;

&lt;p&gt;The first change is to replace &lt;code&gt;-slim&lt;/code&gt; with &lt;code&gt;-alpine&lt;/code&gt; in &lt;code&gt;FROM&lt;/code&gt; statements in your &lt;code&gt;Dockerfile&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Next is to replace &lt;code&gt;apt-get update&lt;/code&gt; with &lt;code&gt;apk update&lt;/code&gt; and &lt;code&gt;apt-get install&lt;/code&gt; with &lt;code&gt;apk add&lt;/code&gt;.  Delete any options you may have like &lt;code&gt;-y&lt;/code&gt; and &lt;code&gt;--no-install-recommends&lt;/code&gt; - they aren&amp;rsquo;t needed.&lt;/p&gt;

&lt;p&gt;Now review the names of the packages you are installing.  Many are named the same.  A few are different.
You can use &lt;a href='https://pkgs.alpinelinux.org/packages' title=''&gt;alpine packages&lt;/a&gt; to look for ones to use.  Some examples of
differences:&lt;/p&gt;
&lt;table class="ml-8 mb-8" style="border-collapse: separate; border-spacing: 1rem 0"&gt;
&lt;thead&gt;
&lt;tr&gt;
  &lt;th class="px-8 underline text-left"&gt;Debian
  &lt;th class="px-8 underline text-left"&gt;Alpine
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
  &lt;td&gt;build-essential
  &lt;td&gt;build-base
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td&gt;chromium-sandbox
  &lt;td&gt;chromium-chromedriver
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td&gt;default-libmysqlclient-dev
  &lt;td&gt;mysql-client
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td&gt;default-mysqlclient
  &lt;td&gt;mysql-client
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td&gt;freedts-bin
  &lt;td&gt;freedts
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td&gt;libicu-dev
  &lt;td&gt;icu-dev
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td&gt;libjemalloc
  &lt;td&gt;jemalloc-dev
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td&gt;libjpeg-dev
  &lt;td&gt;jpeg-dev
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td&gt;libmagickwand-dev
  &lt;td&gt;imagemagick-libs
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td&gt;libsqlite3-0
  &lt;td&gt;sqlite-dev
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td&gt;libtiff-dev
  &lt;td&gt;tiff-dev
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td&gt;libvips
  &lt;td&gt;vips-dev
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td&gt;node-gyp
  &lt;td&gt;gyp
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td&gt;pkg-config
  &lt;td&gt;pkgconfig
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td&gt;python
  &lt;td&gt;python3
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td&gt;python-is-python3
  &lt;td&gt;python3
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td&gt;sqlite3
  &lt;td&gt;sqlite
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;


&lt;p&gt;Note: the above is just an approximation.  For example, while &lt;code&gt;libsqlite3-0&lt;/code&gt; and &lt;code&gt;sqlite-dev&lt;/code&gt; include everything
you need to build an application that uses sqlite3, all that is needed at runtime is &lt;code&gt;sqlite-lib&lt;/code&gt;.  This relentless attention to detail contributes to smaller final image sizes.&lt;/p&gt;

&lt;p&gt;Note: For Bun, Node, and Rails users, knowledge of how to apply the above changes are included in recent versions of the dockerfile generators that we provide.  After all, computers are good at &lt;code&gt;if&lt;/code&gt; statements:&lt;/p&gt;
&lt;div class="highlight-wrapper group relative "&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-wrap-target="#code-7aa51nfb"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-copy-target="sibling"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight '&gt;&lt;code id="code-7aa51nfb"&gt;bunx dockerfile --alpine
npx dockerfile --alpine
bin/rails generate dockerfile --alpine
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;&lt;figure class="post-cta"&gt;
  &lt;figcaption&gt;
    &lt;h1&gt;Choose your own Linux Distribution&lt;/h1&gt;
    &lt;p&gt;Deploy your project in a few minutes with Fly Launch. Then do more with Fly Machines.&lt;/p&gt;
      &lt;a class="btn btn-lg" href="https://fly.io/docs/"&gt;
        Run your entire stack near your users
      &lt;/a&gt;
  &lt;/figcaption&gt;
  &lt;div class="image-container"&gt;
    &lt;img src="/static/images/cta-cat.webp" srcset="/static/images/[email protected] 2x" alt=""&gt;
  &lt;/div&gt;
&lt;/figure&gt;

&lt;h2 id='potential-issues' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#potential-issues' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Potential issues&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;Over time, we&amp;rsquo;ve noted a number of issues.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Alpine uses &lt;a href='https://musl.libc.org/' title=''&gt;musl&lt;/a&gt; for a runtime library.  Debian uses &lt;a href='https://www.gnu.org/software/libc/' title=''&gt;glibc&lt;/a&gt;.  Software tested on glibc may not work as expected on musl. And there are other potential compatibility issues like &lt;a href='https://bell-sw.com/blog/how-to-deal-with-alpine-dns-issues/' title=''&gt;DNS&lt;/a&gt;.
&lt;/li&gt;&lt;li&gt;Debian includes both &lt;code&gt;adduser&lt;/code&gt; and &lt;code&gt;useradd&lt;/code&gt;.    Alpine, by default, only includes &lt;code&gt;adduser&lt;/code&gt;.
This can be addressed by installing package like &lt;a href='https://pkgs.alpinelinux.org/package/edge/community/armv7/shadow' title=''&gt;shadow&lt;/a&gt;, or switching to &lt;code&gt;adduser&lt;/code&gt;.
&lt;/li&gt;&lt;li&gt;Packages like &lt;a href='https://github.com/nodenv/node-build' title=''&gt;node-build&lt;/a&gt; require &lt;code&gt;bash&lt;/code&gt; which isn&amp;rsquo;t included by default.  Adding it back in allows &lt;code&gt;node-build&lt;/code&gt; to run to completion, but the end result is that a precompiled Debian executable is installed that won&amp;rsquo;t run on Alpine.
An alternative is to download an &lt;a href='https://unofficial-builds.nodejs.org/' title=''&gt;unofficial build&lt;/a&gt;.
&lt;/li&gt;&lt;li&gt;Release candidates for Alpine may not get the same level of testing as Debian resulting in problems
like &lt;a href='https://github.com/sparklemotion/sqlite3-ruby/issues/434' title=''&gt;sqlite3-ruby not working on Alpine 3.19&lt;/a&gt;.
In cases like this, stay back on previous versions of Alpine for a short while, or compile the gem for yourself.
These issues are temporary.
&lt;/li&gt;&lt;li&gt;Some packages, like Chrome, are not available for Alpine.  Alternatives like Chromium may be necessary.
&lt;/li&gt;&lt;/ul&gt;
&lt;h2 id='conclusion' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#conclusion' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Conclusion&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;While not as large a community as Debian, there is a substantial number of happy Alpine users.&lt;/p&gt;

&lt;p&gt;For the forseeable future, the default for both frameworks and there fly.io will remain Debian, but we make it easy to switch.&lt;/p&gt;

&lt;p&gt;Try it out!  Hopefully this blog has provided insight into what you should evaluate for before you switch.&lt;/p&gt;</content>
    </entry>
    <entry>
        <title>Introducing Fly Kubernetes</title>
        <link rel="alternate" href="https://fly.io/blog/fks/"/>
        <id>https://fly.io/blog/fks/</id>
        <published>2023-12-18T00:00:00+00:00</published>
        <updated>2025-05-20T16:26:25+00:00</updated>
        <media:thumbnail url="https://fly.io/blog/fks/assets/fks-thumb.webp"/>
        <content type="html">&lt;div class="lead"&gt;&lt;p&gt;We’re Fly.io, and if you’ve been following us awhile you probably just did a double-take. We’re building a new public cloud that runs containerized applications with virtual machine isolation on our own hardware around the world. And we’ve been doing it without any K8s. Until now!&lt;/p&gt;
&lt;/div&gt;&lt;div class="callout"&gt;&lt;p&gt;&lt;strong class="font-semibold text-navy-950"&gt;Update, March 2024:&lt;/strong&gt; FKS does more stuff now, and you can read about it in &lt;a href="https://fly.io/blog/fks-beta-live/" title=""&gt;Fly Kubernetes does more now&lt;/a&gt;&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;We&amp;rsquo;ll own it: we&amp;rsquo;ve been snarky about Kubernetes. We are, at heart, old-school Unix nerds. We&amp;rsquo;re still scandalized by &lt;code&gt;systemd&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;To make matters more complicated, the problems we&amp;rsquo;re working on &lt;a href='https://fly.io/blog/carving-the-scheduler-out-of-our-orchestrator/' title=''&gt;have a lot of overlap with K8s&lt;/a&gt;, but &lt;a href='https://fly.io/blog/carving-the-scheduler-out-of-our-orchestrator/#numad' title=''&gt;just enough impedance mismatch&lt;/a&gt; that it (&lt;a href='https://www.nomadproject.io/' title=''&gt;or anything that looks like it&lt;/a&gt;) is a bad fit for our own platform.&lt;/p&gt;

&lt;p&gt;But, come on: you never took us too seriously about K8s, right? K8s is hard for us to use, but that doesn&amp;rsquo;t mean it&amp;rsquo;s not a great fit for what you&amp;rsquo;re building. We&amp;rsquo;ve been clear about that all along, right? Sure we have!&lt;/p&gt;

&lt;p&gt;Well, good news, everybody! If K8s is important for your project, and that&amp;rsquo;s all that&amp;rsquo;s been holding you back from &lt;a href='https://fly.io/docs/speedrun/' title=''&gt;trying out Fly.io&lt;/a&gt;, we&amp;rsquo;ve spent the past several months building something for you.&lt;/p&gt;
&lt;h2 id='fly-io-for-kubernetians' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#fly-io-for-kubernetians' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Fly.io For Kubernetians&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;Fly.io works by transmogrifying Docker containers into filesystems for &lt;a href='https://firecracker-microvm.github.io/' title=''&gt;lightweight hypervisors&lt;/a&gt;, and running them on servers we rack in dozens of regions around the world.&lt;/p&gt;

&lt;p&gt;You can build something like Fly.io with &amp;ldquo;standard&amp;rdquo; orchestration tools like K8s. In fact, that&amp;rsquo;s what we did to start, too. To keep things simple, we used Nomad, and instead of K8s CNIs, we built our own Rust-based TLS-terminating Anycast proxy (and designed a WireGuard/IPv6-based private network system &lt;a href='https://fly.io/blog/bpf-xdp-packet-filters-and-udp/' title=''&gt;based on eBPF&lt;/a&gt;). But the ideas are the same.&lt;/p&gt;

&lt;p&gt;The way we look at it, the signature feature of a &amp;ldquo;standard&amp;rdquo; orchestrator is the global scheduler: the global eye in the sky that keeps track of vacancies on servers and optimized placement of new workloads. That&amp;rsquo;s the problem we ran into. We&amp;rsquo;re running over 200,000 applications, and we&amp;rsquo;re doing so on every continent except Antarctica. The speed of light (and a globally distributed network of backhoes) has something to say about keeping a perfectly consistent global picture of hundreds of thousands of applications, and it&amp;rsquo;s not pleasant.&lt;/p&gt;

&lt;p&gt;The other problem we ran into is that our Nomad scheduler kept trying to outsmart us, and, worse, our customers. It turns out that our users have pretty firm ideas of where they&amp;rsquo;d like their apps to run. If they ask for São Paulo, they want São Paulo, not Rio. But global schedulers have other priorities, like optimally bin-packing resources, and sometimes &lt;code&gt;GIG&lt;/code&gt; looks just as good as &lt;code&gt;GRU&lt;/code&gt; to them.&lt;/p&gt;

&lt;p&gt;To escape the scaling and DX problems we were hitting, we rethought orchestration. Where orchestrators like K8s tend to work through distributed consensus, we keep state local to workers. Each racked server in our fleet is a source of truth about the apps running on it, and provide an API to a market-style &amp;ldquo;scheduler&amp;rdquo; that bids on resources in regions. &lt;a href='https://fly.io/blog/carving-the-scheduler-out-of-our-orchestrator/#numad' title=''&gt;You can read more about here, if you&amp;rsquo;re interested.&lt;/a&gt; We call this system the &lt;a href='https://fly.io/docs/machines/' title=''&gt;Fly Machines API.&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;An important detail to grok about how this all works – a reason we haven&amp;rsquo;t, like, beaten the CAP theorem by doing this – is that Fly Machines API calls can fail. If Nomad or K8s tries to place a workload on some server, only to find out that it&amp;rsquo;s filled up or thrown a rod, it will go hunt around for some other place to put it, like a good little robot. The Machines API won&amp;rsquo;t do this. It&amp;rsquo;ll just fail the request. In fact, it goes out of its way to fail the request quickly, to deliver feedback; if we can&amp;rsquo;t schedule work in &lt;code&gt;JNB&lt;/code&gt; right now, you might want instead to quickly deploy to &lt;code&gt;BOM&lt;/code&gt;.&lt;/p&gt;
&lt;h2 id='pluggable-orchestration-and-fks' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#pluggable-orchestration-and-fks' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Pluggable Orchestration and FKS&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;In a real sense what we&amp;rsquo;ve done here is extract a chunk of the scheduling problem out of our orchestrator, and handed it off to other components. For most of our users, that component is &lt;a href='https://github.com/superfly/flyctl' title=''&gt;&lt;code&gt;flyctl&lt;/code&gt;, our intrepid CLI&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;But &lt;a href='https://fly.io/docs/machines/working-with-machines/' title=''&gt;Fly Machines is an API&lt;/a&gt;, and anything can drive it. A lot of our users want quick answers to requests to schedule apps in specific regions, and &lt;code&gt;flyctl&lt;/code&gt; does a fine job of that. But it&amp;rsquo;s totally reasonable to want something that works more like the good little robots inside of K8s.&lt;/p&gt;

&lt;p&gt;You can build your own orchestrator with our API, but if what you&amp;rsquo;re looking for is literally Kubernetes, we&amp;rsquo;ve saved you the trouble. It&amp;rsquo;s called Fly Kubernetes, or FKS for short.&lt;/p&gt;

&lt;p&gt;FKS is an implementation of Kubernetes that runs on top of Fly.io. You start it up using &lt;code&gt;flyctl&lt;/code&gt;, by running &lt;code&gt;flyctl ext k8s create&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Under the hood, FKS is a straightforward combination of two well-known Kubernetes projects: &lt;a href='https://k3s.io/' title=''&gt;K3s, the lightweight CNCF-certified K8s distro&lt;/a&gt;, and &lt;a href='https://virtual-kubelet.io/' title=''&gt;Virtual Kubelet&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Virtual Kubelet is interesting. In K8s-land, a &lt;code&gt;kubelet&lt;/code&gt; is a host agent; it&amp;rsquo;s the thing that runs on every server in your fleet that knows how to run a K8s Pod. Virtual Kubelet isn&amp;rsquo;t a host agent; it&amp;rsquo;s a software component that pretends to be a host, registering itself with K8s as if it was one, but then sneakily proxying the Kubelet API elsewhere.&lt;/p&gt;

&lt;p&gt;In FKS, &amp;ldquo;elsewhere&amp;rdquo; is &lt;a href='https://fly.io/docs/machines/' title=''&gt;Fly Machines&lt;/a&gt;. All we have to do is satisfy various APIs that virtual kubelet exposes. For example, the API for the lifecycle of a pod:&lt;/p&gt;
&lt;div class="highlight-wrapper group relative "&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-wrap-target="#code-z4irn8cz"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-copy-target="sibling"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight '&gt;&lt;code id="code-z4irn8cz"&gt;type PodLifecycleHandler interface {
    CreatePod(ctx context.Context, pod *corev1.Pod) error
    UpdatePod(ctx context.Context, pod *corev1.Pod) error
    DeletePod(ctx context.Context, pod *corev1.Pod) error
    GetPod(ctx context.Context, namespace, name string) (*corev1.Pod, error)
    GetPodStatus(ctx context.Context, namespace, name string) (*corev1.PodStatus, error)
    GetPods(context.Context) ([]*corev1.Pod, error)
}
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;This interface is easy to map to the Fly Machines API. For example:&lt;/p&gt;
&lt;div class="highlight-wrapper group relative "&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-wrap-target="#code-pjbdlk52"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-copy-target="sibling"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight '&gt;&lt;code id="code-pjbdlk52"&gt;CreatePod -&amp;gt; POST /apps/{app_name}/machines
UpdatePod -&amp;gt; POST /apps/{app_name}/machines/{machine_id}
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;K3s, meanwhile, is a stripped-down implementation of all of K8s that fits into a single binary. K3s does a bunch of clever things to be as streamlined as it is, but the most notable of them is &lt;a href='https://github.com/k3s-io/kine' title=''&gt;kine, an API shim that switches &lt;code&gt;etcd&lt;/code&gt; out with databases like SQLite&lt;/a&gt;. Because of &lt;code&gt;kine&lt;/code&gt;, K3s can manage multiple servers, but also gracefully runs on a single server, without distributed state.&lt;/p&gt;

&lt;p&gt;So that&amp;rsquo;s what we do. When you create a cluster, we run K3s and the Virtual Kubelet on a single Fly Machine. We compile a &lt;a href='https://kubernetes.io/docs/concepts/configuration/organize-cluster-access-kubeconfig/' title=''&gt;kubeconfig&lt;/a&gt;, with which you can talk to your K3s via &lt;code&gt;kubectl&lt;/code&gt;. We set the whole thing up to run Pods on individual Fly Machines, so your cluster scales out directly using our platform, but with K8s tooling.&lt;/p&gt;

&lt;p&gt;One thing we like about this design is how much of the lifting is already done for us by the underlying platform. If you&amp;rsquo;re a K8s person, take a second to think of all the different components you&amp;rsquo;re dealing with: &lt;a href='https://etcd.io/' title=''&gt;etcd&lt;/a&gt;, specifically provisioned nodes, the &lt;a href='https://kubernetes.io/docs/reference/command-line-tools-reference/kube-proxy/' title=''&gt;kube-proxy&lt;/a&gt;, &lt;a href='https://github.com/flannel-io/flannel' title=''&gt;a CNI &lt;/a&gt;binary and configuration and its integration with the host network, containerd, registries. But Fly.io already does most of those things. So this project was mostly chipping away components until we found the bare minimum: CoreDNS, SQLite persistence, and Virtual Kubelet.&lt;/p&gt;

&lt;p&gt;We ended up with something significantly simpler than K3s, which is saying something.&lt;/p&gt;

&lt;p&gt;Fly Kubernetes has some advantages over plain &lt;code&gt;flyctl&lt;/code&gt; and &lt;code&gt;fly.toml&lt;/code&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Your deployment is more declarative than it is with the &lt;code&gt;fly.toml&lt;/code&gt; file. You declare the exact state of everything down to replica counts, autoscaling rules, volume definitions, and more.
&lt;/li&gt;&lt;li&gt;When you deploy with Fly Kubernetes, Kubernetes will automatically make your definitions match the state of the world. Machines go down? Kubernetes will whack them back online.
&lt;/li&gt;&lt;/ul&gt;

&lt;p&gt;This is a different way to do orchestration and scheduling on Fly.io. It&amp;rsquo;s not what everyone is going to want. But if you want it, you really want it, and we&amp;rsquo;re psyched to give it to you: Fly.io&amp;rsquo;s platform features, with Kubernetes handling configuration and driving your system to its desired state.&lt;/p&gt;

&lt;p&gt;We&amp;rsquo;ve kept things simple to start with. There are K8s use cases we&amp;rsquo;re a strong fit for today, and others we&amp;rsquo;ll get better at in the near future, as K8s users drive the underlying platform (and particularly our proxy) forward.&lt;/p&gt;

&lt;p&gt;&lt;strong class='font-semibold text-navy-950'&gt;Interested in getting early access? Email us at &lt;a href="mailto:[email protected]"&gt;[email protected]&lt;/a&gt; and we&amp;rsquo;ll hook you up.&lt;/strong&gt;&lt;/p&gt;
&lt;figure class="post-cta"&gt;
  &lt;figcaption&gt;
    &lt;h1&gt;Not invested in K8s?&lt;/h1&gt;
    &lt;p&gt;Nothing has to change for you! You can deploy apps on Fly.io today, in a matter of minutes, without talking to Sales.&lt;/p&gt;
      &lt;a class="btn btn-lg" href="https://fly.io/docs/speedrun/"&gt;
        Deploy an app in minutes.&lt;span class='opacity-50'&gt;→&lt;/span&gt;
      &lt;/a&gt;
  &lt;/figcaption&gt;
  &lt;div class="image-container"&gt;
    &lt;img src="/static/images/cta-turtle.webp" srcset="/static/images/[email protected] 2x" alt=""&gt;
  &lt;/div&gt;
&lt;/figure&gt;

&lt;div class="youtube-container" data-exclude-render&gt;
  &lt;div class="youtube-video"&gt;
    &lt;iframe
      width="100%"
      height="100%"
      src="https://www.youtube.com/embed/A3vFfZvUiwo"
      frameborder="0"
      allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture"
      allowfullscreen&gt;
    &lt;/iframe&gt;
  &lt;/div&gt;
&lt;/div&gt;

&lt;h2 id='what-it-all-means' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#what-it-all-means' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;What It All Means&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;One obvious thing it means is that you&amp;rsquo;ve got an investment in Kubernetes tooling, you can keep it while running things on top of Fly.io. So that&amp;rsquo;s pretty neat. Buy our cereal!&lt;/p&gt;

&lt;p&gt;But the computer science story is interesting, too. We placed a bet on an idiosyncratic strategy for doing global orchestration. We replaced global consensus, which is how Borg, Kubernetes, and Nomad all work, with a market-based system. That system was faster and, importantly, dumber than the consensus system it replaced.&lt;/p&gt;

&lt;p&gt;This had costs! Nomad&amp;rsquo;s global consensus would do truly heroic amounts of work to make sure Fly Apps got scheduled somewhere, anywhere. Like a good capitalist, Fly Machines will tell you in no uncertain terms how much work it&amp;rsquo;s willing to do for you (&amp;ldquo;less than a Nomad&amp;rdquo;).&lt;/p&gt;

&lt;p&gt;But that doesn&amp;rsquo;t mean you&amp;rsquo;re stuck with the answers Fly Machines gives by itself. Because Fly Machines is so simple, and tries so hard to be predictable, we hoped you&amp;rsquo;d be able to build more sophisticated scheduling and orchestration schemes on top of it. And here you go: Kubernetes scheduling, as a plugin to the platform.&lt;/p&gt;

&lt;p&gt;More to come! We&amp;rsquo;re itching to see just how many different ways this bet might pay off. Or: we&amp;rsquo;ll perish in flames! Either way, it&amp;rsquo;ll be fun to watch.&lt;/p&gt;</content>
    </entry>
    <entry>
        <title>Fly.io has GPUs now</title>
        <link rel="alternate" href="https://fly.io/blog/fly-io-has-gpus-now/"/>
        <id>https://fly.io/blog/fly-io-has-gpus-now/</id>
        <published>2023-12-13T00:00:00+00:00</published>
        <updated>2025-05-20T16:26:25+00:00</updated>
        <media:thumbnail url="https://fly.io/blog/fly-io-has-gpus-now/assets/llama-portal-thumb.webp"/>
        <content type="html">&lt;div class="lead"&gt;&lt;p&gt;We’re Fly.io, we’re a new public cloud that lets you put your compute where it matters: near your users. Today we’re announcing that you can do this with GPUs too, allowing you to do AI workloads on the edge. Want to find out more? Keep reading.&lt;/p&gt;
&lt;/div&gt;&lt;h2 id='ai-is-pretty-fly' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#ai-is-pretty-fly' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;AI is pretty fly&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;AI is apparently a bit of a &lt;em&gt;thing&lt;/em&gt; (maybe even &lt;em&gt;an thing&lt;/em&gt; come to think about it). We&amp;rsquo;ve seen entire industries get transformed in the wake of ChatGPT existing (somehow it&amp;rsquo;s only been around for a year, I can&amp;rsquo;t believe it either). It&amp;rsquo;s likely to leave a huge impact on society as a whole in the same way that the Internet did once we got search engines. Like any good venture-capital funded infrastructure provider, we want to enable you to do hilarious things with AI using industrial-grade muscle.&lt;/p&gt;

&lt;p&gt;Fly.io lets you run a full-stack app&amp;mdash;or an entire dev platform based on the &lt;a href='https://fly.io/docs/machines/' title=''&gt;Fly Machines API&lt;/a&gt;&amp;mdash;close to your users. Fly.io GPUs let you attach an &lt;a href='https://www.nvidia.com/en-us/data-center/a100/' title=''&gt;Nvidia A100&lt;/a&gt; to whatever you&amp;rsquo;re building, harnessing the full power of CUDA with more VRAM than your local 4090 can shake a ray-traced stick at. With these cards (or whatever you call a GPU attached to SXM fabric), AI/ML workloads are at your fingertips. You can &lt;a href='https://fly.io/blog/transcribing-on-fly-gpu-machines/' title=''&gt;recognize speech&lt;/a&gt;, segment text, summarize articles, synthesize images, and more at speeds that would make your homelab blush. You can even set one up as your programming companion with &lt;a href='https://github.com/deepseek-ai/DeepSeek-Coder' title=''&gt;your model of choice&lt;/a&gt; in case you&amp;rsquo;ve just not been feeling it with the output of &lt;em&gt;other&lt;/em&gt; models changing over time.&lt;/p&gt;

&lt;p&gt;If you want to find out more about what these cards are and what using them is like, check out &lt;a href='https://fly.io/blog/what-are-these-gpus-really/' title=''&gt;What are these &amp;ldquo;GPUs&amp;rdquo; really?&lt;/a&gt; It covers the history of GPUs and why it&amp;rsquo;s ironic that the cards we offer are called &amp;ldquo;Graphics Processing Units&amp;rdquo; in the first place.&lt;/p&gt;
&lt;h2 id='fly-io-gpus-in-action' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#fly-io-gpus-in-action' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Fly.io GPUs in Action&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;We want you to deploy your own code with your favorite models on top of Fly.io&amp;rsquo;s cloud backbone. Fly.io GPUs make this really easy.&lt;/p&gt;

&lt;p&gt;You can get a GPU app running &lt;a href='https://ollama.ai' title=''&gt;Ollama&lt;/a&gt; (our friends in text generation) in two steps:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Put this in your &lt;code&gt;fly.toml&lt;/code&gt;:&lt;/p&gt;
&lt;div class="highlight-wrapper group relative toml"&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-wrap-target="#code-u2sd0gfo"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-copy-target="sibling"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight '&gt;&lt;code id="code-u2sd0gfo"&gt;&lt;span class="py"&gt;app&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"sandwich_ai"&lt;/span&gt;
&lt;span class="py"&gt;primary_region&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"ord"&lt;/span&gt;
&lt;span class="py"&gt;vm.size&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"a100-40gb"&lt;/span&gt;

&lt;span class="nn"&gt;[build]&lt;/span&gt;
  &lt;span class="py"&gt;image&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"ollama/ollama"&lt;/span&gt;

&lt;span class="nn"&gt;[mounts]&lt;/span&gt;
  &lt;span class="py"&gt;source&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"models"&lt;/span&gt;
  &lt;span class="py"&gt;destination&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"/root/.ollama"&lt;/span&gt;
  &lt;span class="py"&gt;initial_size&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"100gb"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;Run &lt;code&gt;fly apps create sandwich_ai &amp;amp;&amp;amp; fly deploy&lt;/code&gt;.&lt;/p&gt;
&lt;/li&gt;&lt;/ol&gt;

&lt;p&gt;If you want to read more about how to start your new sandwich empire, check out &lt;a href='https://fly.io/blog/scaling-llm-ollama/' title=''&gt;Scaling Large Language Models to zero with Ollama&lt;/a&gt;, it explains how to set up Ollama so that it &lt;em&gt;automatically scales itself down&lt;/em&gt; when it&amp;rsquo;s not in use.&lt;/p&gt;
&lt;h2 id='the-speed-of-light-is-only-so-fast' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#the-speed-of-light-is-only-so-fast' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;The speed of light is only so fast&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;Being able to spin up GPUs is great, but where Fly.io really shines is inference at the edge.&lt;/p&gt;

&lt;p&gt;Let&amp;rsquo;s say you have an app that lets users enter ingredients they have in their kitchen and receive a sandwich recipe. Your users expect their recipes &lt;em&gt;instantly&lt;/em&gt; (or at least as fast as the other leading apps). Seconds count when you need an emergency sandwich.&lt;/p&gt;
&lt;div class="left-sidenote"&gt;&lt;p&gt;&lt;br&gt;
&lt;br&gt;
&lt;br&gt;
It’s depressingly customary in the AI industry to cherry-pick outputs. This was not cherry-picked. I used &lt;a href="https://ollama.ai/library/yi:34b" title=""&gt;yi:34b&lt;/a&gt; to generate this recipe. I’m not sure what a taco salad sandwich is, but I might be willing to try it.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;&lt;img alt="A conversation between a user and an artificial intelligence. The user asks: &amp;quot;What kind of sandwich can I make with an egg, a taco shell, and a pound of ground beef?&amp;quot; The AI responds: &amp;quot; You can make a breakfast taco or a taco salad sandwich with the ingredients you have. Here&amp;#39;s how to prepare each: for the breakfast taco. Cook the pound of ground beef in a pan until browned. Season with salt, pepper, and any other desired spices like chili powder or taco seasoning. While the beef is cooking, fry an egg (or scramble it if you prefer) in another pan. Warm up the taco shell in the oven or microwave according to the package instructions. Assemble the taco by placing the ground cooked beef in the taco shell followed by the fried egg. You can also add any additional toppings like diced onions avocados, avocado, salsa or cheese if you have them. Serve with a side of hot sauce or sour cream if desired. For the taco salad sandwich: Prepare the ground beef as in the breakfast taco recipe above. Instead of frying an egg, you can cook it sunny side up or over easy depending on your preference. Crumble the taco shell into small pieces and mix them with the cooked ground beef. Assemble the sandwich by layering the crumbled taco shell mixture and ground beef in two slices of bread. Top with the fried egg and any additional toppings you like such as shredded lettuce, diced tomatoes, or sliced olives. Serve with a side of salsa or ranch dressing if desired. Both options are hardy and can be easily customized based on your taste preferences. Enjoy!&amp;quot;" src="/blog/fly-io-has-gpus-now/assets/sandwich-recipes.webp" /&gt;&lt;/p&gt;

&lt;p&gt;In the previous snippet, we deployed our app to ord (&lt;code&gt;primary_region = &amp;quot;ord&amp;quot;&lt;/code&gt;). The good news is that our model returns a result really quickly and users in Chicago get instant sandwich recipes. It&amp;rsquo;s a good experience for users near your datacentre, and you can do this on any half decent cloud provider.&lt;/p&gt;

&lt;p&gt;But surely people outside of Chicago need sandwiches too. Amsterdam has sandwich fiends as well. And sometimes it takes too long to have their requests leap across the pond. The speed of light is only so fast after all. Don&amp;rsquo;t worry, we&amp;rsquo;ve got your back. Fly.io has GPUs in datacentres all over the world. Even more, we&amp;rsquo;ll let you run &lt;em&gt;the same program&lt;/em&gt; with the same public IP address and the same TLS certificates in any regions with GPU support.&lt;/p&gt;

&lt;p&gt;Don&amp;rsquo;t believe us? See how you can scale your app up in Amsterdam with one command:&lt;/p&gt;
&lt;div class="highlight-wrapper group relative "&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-wrap-target="#code-ltu09wf5"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-copy-target="sibling"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight '&gt;&lt;code id="code-ltu09wf5"&gt;fly scale count 2 --region ams
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;It&amp;rsquo;s that easy.&lt;/p&gt;
&lt;h2 id='actually-on-demand' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#actually-on-demand' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Actually On-Demand&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;GPUs are powerful parallel processing packages, but they&amp;rsquo;re not cheap! Once we have enough people wanting to turn their fridge contents into tasty sandwiches, keeping a GPU or two running makes sense. But we&amp;rsquo;re just a small app still growing our user base while also funding the latest large sandwich model research. We want to only pay for GPUs when a user makes a request.&lt;/p&gt;

&lt;p&gt;Let&amp;rsquo;s open up that &lt;code&gt;fly.toml&lt;/code&gt; again, and add a section called &lt;code&gt;services&lt;/code&gt;, and we&amp;rsquo;ll include instructions on how we want our app to scale up and down:&lt;/p&gt;
&lt;div class="highlight-wrapper group relative toml"&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-wrap-target="#code-yf7bfopz"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-copy-target="sibling"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight '&gt;&lt;code id="code-yf7bfopz"&gt;&lt;span class="nn"&gt;[[services]]&lt;/span&gt;
  &lt;span class="py"&gt;internal_port&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;8080&lt;/span&gt;
  &lt;span class="py"&gt;protocol&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"tcp"&lt;/span&gt;
  &lt;span class="py"&gt;auto_stop_machines&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
  &lt;span class="py"&gt;auto_start_machines&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
  &lt;span class="py"&gt;min_machines_running&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Now when no one needs sandwich recipes, you don&amp;rsquo;t pay for GPU time.&lt;/p&gt;
&lt;h2 id='the-deets' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#the-deets' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;The Deets&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;We have GPUs ready to use in several US and EU regions and Sydney. You can deploy your sandwich, music generation, or AI illustration apps to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href='https://www.nvidia.com/en-us/data-center/a100/' title=''&gt;Ampere A100s&lt;/a&gt; with 40gb of RAM for $2.50/hr
&lt;/li&gt;&lt;li&gt;&lt;a href='https://www.nvidia.com/en-us/data-center/a100/' title=''&gt;Ampere A100s&lt;/a&gt; with 80gb of RAM for $3.50/hr
&lt;/li&gt;&lt;li&gt;&lt;a href='https://resources.nvidia.com/en-us-l40s/l40s-datasheet-28413' title=''&gt;Lovelace L40s&lt;/a&gt; are coming soon (update: now here!) for $2.50/hr
&lt;/li&gt;&lt;/ul&gt;

&lt;p&gt;By default, anything you deploy to GPUs will use eight heckin&amp;rsquo; &lt;a href='https://www.amd.com/en/processors/epyc-server-cpu-family' title=''&gt;AMD EPYC&lt;/a&gt; CPU cores, and you can attach volumes up to 500 gigabytes. We&amp;rsquo;ll even give you discounts for reserved instances and dedicated hosts if you ask nicely.&lt;/p&gt;

&lt;p&gt;We hope you have fun with these new cards and we&amp;rsquo;d love to see what you can do with them! Reach out to us on X (formerly Twitter) or &lt;a href='https://community.fly.io/' title=''&gt;the community forum&lt;/a&gt; and share what you&amp;rsquo;ve been up to. We&amp;rsquo;d love to see what we can make easier!&lt;/p&gt;</content>
    </entry>
    <entry>
        <title>What are these "GPUs" really?</title>
        <link rel="alternate" href="https://fly.io/blog/what-are-these-gpus-really/"/>
        <id>https://fly.io/blog/what-are-these-gpus-really/</id>
        <published>2023-12-11T00:00:00+00:00</published>
        <updated>2025-05-20T16:26:25+00:00</updated>
        <media:thumbnail url="https://fly.io/blog/what-are-these-gpus-really/assets/gpu-songstress-thumb.webp"/>
        <content type="html">&lt;div class="lead"&gt;&lt;p&gt;Fly.io runs containerized apps with virtual machine isolation on our own hardware around the world, so you can safely run your code close to where your users are. We’re in the process of rolling out GPU support, and that’s what this post is about, but you don’t have to wait for that to try us out: &lt;a href="https://fly.io/docs/speedrun/" title=""&gt;your app can be up and running on us in minutes&lt;/a&gt;.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;GPU hardware will let our users run all sorts of fun Artificial Intelligence and Machine Learning (AI/ML) workloads near their users. But, what are these &amp;ldquo;GPUs&amp;rdquo; really? What can they do? What &lt;em&gt;can&amp;rsquo;t&lt;/em&gt; they do?&lt;/p&gt;

&lt;p&gt;Listen here for my tale of woe as I spell out exactly what these cards are, are not, and what you can do with them. By the end of this magical journey, you should understand the true irony of them being called &amp;ldquo;Graphics Processing Units&amp;rdquo; and why every marketing term is always bad forever.&lt;/p&gt;
&lt;h2 id='how-does-computer-formed' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#how-does-computer-formed' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;How does computer formed?&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;In the early days of computing, your computer generally had a few basic components:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The CPU
&lt;/li&gt;&lt;li&gt;Input device and assorted peripherals (keyboard, etc)
&lt;/li&gt;&lt;li&gt;Output device (monitor, printer, etc)
&lt;/li&gt;&lt;li&gt;Memory
&lt;/li&gt;&lt;li&gt;Glue logic chips
&lt;/li&gt;&lt;li&gt;Video rendering hardware
&lt;/li&gt;&lt;/ul&gt;

&lt;p&gt;Taking the Commodore 64 as an example, it had a CPU, a chip to handle video output, a chip to handle audio output, and a chip to glue everything together. The CPU would read instructions from the RAM and then execute them to do things like draw to the screen, solve sudoku puzzles, play sounds, and so on.&lt;/p&gt;

&lt;p&gt;However, even though the CPU by itself was fast by the standards of the time, it could only do a million clock cycles per second or so. Imagine a very small shouting crystal vibrating millions of times per second triggering the CPU to do one part of a task and you&amp;rsquo;ll get the idea. This is fast, but not fast enough when executing instructions can take longer than a single clock cycle and when your video output device needs to be updated 60 times per second.&lt;/p&gt;

&lt;p&gt;The main way they optimized this was by shunting a lot of the video output tasks to a bespoke device called the VIC-II (Video Interface Chip, version 2). This allowed the Commodore 64 to send a bunch of instructions to the VIC-II and then let it do its thing while the CPU was off doing other things. This is called &amp;ldquo;offloading&amp;rdquo;.&lt;/p&gt;

&lt;p&gt;&lt;img src="/blog/what-are-these-gpus-really/assets/./deus-ex-machina-cover.webp" /&gt;&lt;/p&gt;

&lt;p&gt;As technology advanced, the desire to do bigger and better things with both contemporary and future hardware increased. This came to a head when this little studio nobody had ever heard of called id Software released one of the most popular games of all time: DOOM.&lt;/p&gt;

&lt;p&gt;Now, even though DOOM was a huge advancement in gaming technology, it was still incredibly limited by the hardware of the time. It was actually a 2D game that used a lot of tricks to make it look (and feel) like it was 3D. It was also limited to a resolution of 320x200 and a hard cap of 35 frames per second. This was fine for the time (most movies were only at 24 frames per second), but it was clear that there was a lot of room for improvement.&lt;/p&gt;

&lt;p&gt;One of the main things that DOOM did was to use a pair of techniques to draw the world at near real-time. It used a combination of &amp;ldquo;raycasting&amp;rdquo; and binary-space partitioning to draw the world. This basically means that they drew a bunch of imaginary lines to where points in the map would be to figure out what color everything would be and then eliminated the parts of the map that were behind walls and other objects. This is a very simplified explanation, and if you want to know more, &lt;a href='https://fabiensanglard.net/doomIphone/doomClassicRenderer.php' title=''&gt;Fabien Sanglard explains the rendering&lt;/a&gt; of DOOM in more detail.&lt;/p&gt;
&lt;h2 id='the-dream-of-3d' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#the-dream-of-3d' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;The dream of 3D&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;However, a lot of this was logic that ran very slowly on the CPU, and while the CPU was doing the display logic, it couldn&amp;rsquo;t do anything else, such as enemy AI or playing sounds. Hence the idea of a &amp;ldquo;3D accelerator card&amp;rdquo;. The idea: offload the 3D rendering logic to a separate device that could do it much faster than the CPU could, and free the CPU to do other things like AI, sound, and so on.&lt;/p&gt;

&lt;p&gt;This was the dream, but it was a long way off. Then Quake happened.&lt;/p&gt;
&lt;div class="right-sidenote"&gt;&lt;p&gt;Really, Half-Life is based on Quake so much that the pattern for &lt;a href="https://www.pcgamer.com/half-life-alyxs-lights-flicker-just-like-they-did-in-quake-almost-25-years-later/" title=""&gt;blinking lights&lt;/a&gt; has carried forward 25 years later to Half-Life: Alyx in VR. If it ain’t broke, don’t fix it.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;Unlike Doom, Quake was fully 3D on unmodified consumer hardware. Players could look up and down (something previously thought impossible without accelerator hardware!) and designers could make levels with that in mind. Quake also allowed much more complex geometry and textures. It was a huge leap forward in 3D gaming and it was only possible because of the massive leap in CPU power at the time. The Pentium family of processors was such a huge leap that it allowed them to bust through and do it in &amp;ldquo;real time&amp;rdquo;. Quake has since set the standard for multiplayer deathmatch games, and its source code has lineage to Call of Duty, Half-Life, Half-Life 2, DotA 2, Titanfall, and Apex Legends.&lt;/p&gt;

&lt;p&gt;However, the thing that really made 3D accelerator cards leap into the public spotlight was another little-known studio called Crystal Dynamics and their 1996 release of Tomb Raider. It was built from the ground up to require the use of 3D accelerator cards. The cards flew off the shelves.&lt;/p&gt;

&lt;p&gt;&amp;ldquo;3D accelerator cards&amp;rdquo; would later become known as &amp;ldquo;Graphics Processing Units&amp;rdquo; or GPUs because of how synonymous they became with 3D gaming, engineering tasks such as Computer-Aided Drafting (CAD), and even the entire OS environment with compositors like &lt;a href='https://en.wikipedia.org/wiki/Desktop_Window_Manager' title=''&gt;DWM&lt;/a&gt; on Windows Vista, &lt;a href='https://en.wikipedia.org/wiki/Compiz' title=''&gt;Compiz&lt;/a&gt; on GNU+Linux, and &lt;a href='https://en.wikipedia.org/wiki/Quartz_(graphics_layer)' title=''&gt;Quartz&lt;/a&gt; on macOS. Things became so much easier for everyone when 2D and 3D graphics were integrated into the same device so you didn&amp;rsquo;t need to chain your output through your 3D accelerator card!&lt;/p&gt;
&lt;h2 id='the-gpu-as-we-know-it' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#the-gpu-as-we-know-it' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;The GPU as we know it&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;When GPUs first came out, they were very simple devices. They had a few basic components:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A framebuffer to store the current state of the screen
&lt;/li&gt;&lt;li&gt;A command processor to take instructions from the game and translate them into something the hardware can understand
&lt;/li&gt;&lt;li&gt;Memory to store temporary data
&lt;/li&gt;&lt;li&gt;Shader processing hardware to allow designers to change how light and textures were rendered
&lt;/li&gt;&lt;li&gt;A display output that was chained through an existing VGA card so that the user could see what was going on in real time (yes, this is something we actually did)
&lt;/li&gt;&lt;/ul&gt;

&lt;p&gt;This basic architecture has remained the same for the past 20 years or so. The main differences are that as technology advanced, the capabilities of those cards increased. They got faster, more parallel, more capable, had more memory, were made cheaper, and so on. This gradually allowed for more and more complex games like Half-Life 2, Crysis, The Legend of Zelda: Breath of the Wild, Baudur&amp;rsquo;s Gate 3, and so on.&lt;/p&gt;

&lt;p&gt;Over time, as more and more hardware was added, GPUs became computers in their own rights (sometimes even bigger than the rest of the computer thanks for the need to cool things more aggressively). This new hardware includes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Video encoding hardware via NVENC and AMD VCE so that content creators can stream and record their gameplay in higher quality without having to impact the performance of the game
&lt;/li&gt;&lt;li&gt;&lt;aside class="left-sidenote"&gt;Seriously, once you experience high framerate HDR raytraced Tetris you can&amp;rsquo;t really go back to the old way.&lt;/aside&gt;  Raytracing accelerator cores via RTX so that light can be rendered more realistically
&lt;/li&gt;&lt;li&gt;AI/ML cores to allow for dynamic upscaling to eke out more performance from the card
&lt;/li&gt;&lt;li&gt;Display output hardware to allow for multiple monitors to be connected to the card
&lt;/li&gt;&lt;li&gt;Faster and faster memory buses and interfaces to the rest of the system to allow for more data to be processed faster
&lt;/li&gt;&lt;li&gt;Direct streaming from the drive to GPU memory to allow for faster loading times
&lt;/li&gt;&lt;/ul&gt;

&lt;p&gt;But, at the same time, that AI/ML hardware started to get noticed by more and more people. It was discovered that the shader cores and then the CUDA cores could be used to do AI/ML workloads at ludicrous speeds. This enabled research and development of models like GPT-2, Stable Diffusion, DLSS, and so on. This has led to a Cambrian Explosion of AI/ML research and development that is continuing to this day.&lt;/p&gt;
&lt;h2 id='the-quot-gpus-quot-that-fly-io-is-using' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#the-quot-gpus-quot-that-fly-io-is-using' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;The &amp;ldquo;GPUs&amp;rdquo; that Fly.io is using&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;I&amp;rsquo;ve mostly been describing consumer GPUs and their capabilities up to this point because that&amp;rsquo;s what we all have the biggest understanding of. There is a huge difference between the &amp;ldquo;GPUs&amp;rdquo; that you can get for server tasks and normal consumer tasks from a place like Newegg or Best Buy. The main difference is that enterprise-grade Graphics Processing Units do not have any of the hardware needed to process graphics.&lt;/p&gt;
&lt;div class="right-sidenote"&gt;&lt;p&gt;Author’s note: This will not be the case in the future. Fly.io is going to add &lt;a href="https://www.nvidia.com/en-us/data-center/l40s/" title=""&gt;Lovelace L40S GPUs&lt;/a&gt; that do have 3D rendering, video encoding, shader cores, and so on. But, that’s not what we’re talking about today.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;Yes. Really. They don&amp;rsquo;t have rasterization hardware, shader cores, display outputs, or anything useful for trying to run games on them. They are AI/ML accelerator cards more than anything. It&amp;rsquo;s kinda beautifully ironic that they&amp;rsquo;re called Graphics Processing Units when they have no ability to process graphics.&lt;/p&gt;
&lt;h2 id='what-can-you-do-with-them' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#what-can-you-do-with-them' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;What can you do with them?&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;These GPUs are really good at massively parallel tasks. This naturally translates to being very good at AI/ML tasks such as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Summarization (what is this article about in a few sentences?)
&lt;/li&gt;&lt;li&gt;Translation (what does this article say in Spanish?)
&lt;/li&gt;&lt;li&gt;Speech recognition (what is a voice clip saying?)
&lt;/li&gt;&lt;li&gt;Speech synthesis (what does this text sound like?)
&lt;/li&gt;&lt;li&gt;Text generation (what would a cat say if it could talk?)
&lt;/li&gt;&lt;li&gt;Basic rote question and answering (what is the safe cooking temperature for chicken breasts in celsius?)
&lt;/li&gt;&lt;li&gt;Text classification (is this article about cats or dogs?)
&lt;/li&gt;&lt;li&gt;Sentiment analysis (is this article positive or negative, what could that mean about the companies involved?)
&lt;/li&gt;&lt;li&gt;Image classification (is this a cat or a dog?)
&lt;/li&gt;&lt;li&gt;Object detection (where are the cats and dogs in this image?)
&lt;/li&gt;&lt;/ul&gt;

&lt;p&gt;Or any combination/chain of these tasks. A lot of this is pretty abstract building blocks that can be combined in a lot of different ways. This is why AI/ML stuff is so exciting right now. We&amp;rsquo;re in the early days of understanding what these things are, what they can do, and how to use them properly.&lt;/p&gt;

&lt;p&gt;Imagine being able to load articles about the topic you are researching into your queries to find where someone said something roughly similar to what you&amp;rsquo;re looking for. Queries like &amp;ldquo;that one recipe with eggs that you fold over with ham in it&amp;rdquo;. That&amp;rsquo;s the kind of thing that&amp;rsquo;s possible with AI/ML (and tools like vector databases) but difficult to impossible with traditional search engines.&lt;/p&gt;
&lt;h2 id='how-to-use-ai-for-reals' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#how-to-use-ai-for-reals' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;How to use AI for reals&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;Fortunately and unfortunately, we&amp;rsquo;re in the Cambrian Explosion days of this industry. Key advances happen constantly. Exact models and tooling changes almost as often. This is both a very good thing and a very bad thing.&lt;/p&gt;

&lt;p&gt;If you want to get started today, here&amp;rsquo;s a few models that you can play with right now:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href='https://ai.meta.com/llama/' title=''&gt;Llama 2&lt;/a&gt; - A generic foundation model with instruction and chat tuned variants. It&amp;rsquo;s a good starting point for a lot of research and nearly everything else uses the same formats that Llama 2 does.
&lt;/li&gt;&lt;li&gt;&lt;a href='https://openai.com/research/whisper' title=''&gt;Whisper&lt;/a&gt; - A speech to text model that transcribes audio files into text better than most professional dictation software. I, the author of this article, wrote most of this article using Whisper.
&lt;/li&gt;&lt;li&gt;&lt;a href='https://huggingface.co/NurtureAI/OpenHermes-2.5-Mistral-7B-16k' title=''&gt;OpenHermes-2.5 Mistral 7B 16k&lt;/a&gt; - An instruction-tuned model that can operate on up to 16 thousand tokens (about 40 printed pages of text, 12,000 words) at once. It&amp;rsquo;s a good starting point for summarization and other tasks that require a lot of context. I personally use it for my personal AI chatbot named &lt;a href='https://xeiaso.net/characters/#Mimi' title=''&gt;Mimi&lt;/a&gt;.
&lt;/li&gt;&lt;li&gt;&lt;aside class="right-sidenote"&gt;Seriously Annie, you&amp;rsquo;re great!&lt;/aside&gt; &lt;a href='https://stability.ai/stable-diffusion' title=''&gt;Stable Diffusion XL&lt;/a&gt; - A text-to-image model that lets you create high quality images from simple text descriptions. It&amp;rsquo;s a good starting point for tasks that require image generation, such as when you want to add images to your blog posts but don&amp;rsquo;t have an artist like Annie to draw you what you want.
&lt;/li&gt;&lt;/ul&gt;

&lt;p&gt;For a practical example, imagine that you have a set of &lt;a href='https://xeiaso.net/talks/' title=''&gt;conference talks that you&amp;rsquo;ve given over the years&lt;/a&gt;. You want to take those talk videos, extract the audio, and transform them into written text because some people learn better from text than video. The overall workflow would look something like this:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Use ffmpeg to extract the audio track from the video files
&lt;/li&gt;&lt;li&gt;Use Whisper to &lt;a href='https://fly.io/blog/transcribing-on-fly-gpu-machines/' title=''&gt;convert the audio files into subtitle files&lt;/a&gt;
&lt;/li&gt;&lt;li&gt;Break the subtitle file into sequences based on significant pauses between topics (humans do this subconsciously, take advantage of it and you can make things seem heckin&amp;rsquo; magic)
&lt;/li&gt;&lt;li&gt;Use a large language model to summarize the segments and create a title for each segment
&lt;/li&gt;&lt;li&gt;Paste the rest of the text into a markdown document between the segment titles
&lt;/li&gt;&lt;li&gt;Manually review the documents and make any necessary changes with technical terms that the model didn&amp;rsquo;t know about or things the model got wrong because English is a minefield of homophones that even trained experts have trouble with (ask me how I know)
&lt;/li&gt;&lt;li&gt;Publish the documents on your blog
&lt;/li&gt;&lt;/ul&gt;

&lt;p&gt;Then bam, you don&amp;rsquo;t just have a portfolio piece, you have the recipe for winning downtime from visitors of orange websites clicking on your link so much. You can also use this to create transcripts for your videos so that people who can&amp;rsquo;t hear can still enjoy your content.&lt;/p&gt;

&lt;p&gt;The true advantage of these is not using them as individual parts on themselves, but as a cohesive whole in a chain. This is where the real power of AI/ML comes from. It&amp;rsquo;s not the individual models, but the ability to chain them together to do something useful. This is where the true opportunities for innovation lie.&lt;/p&gt;
&lt;h2 id='conclusion' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#conclusion' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Conclusion&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;So that&amp;rsquo;s what these &amp;ldquo;GPUs&amp;rdquo; are really: they&amp;rsquo;re AI/ML accelerator cards. The A100 cards incapable of processing graphics or encoding video, but they&amp;rsquo;re really, really good at AI/ML workloads. They allow you to do way more tasks per watt than any CPU ever could.&lt;/p&gt;

&lt;p&gt;I hope you enjoyed this tale of woe as I spilled out the horrible truths about marketing being awful forever and gave you ideas for how to &lt;em&gt;actually use&lt;/em&gt; these graphics-free Graphics Processing Units to do useful things. But sadly, not for processing graphics unless you wait for the &lt;a href='https://www.nvidia.com/en-us/data-center/l40s/' title=''&gt;Lovelace L40S&lt;/a&gt; cards early in 2024.&lt;/p&gt;

&lt;p&gt;Sign up for Fly.io today and try our GPUs! I can&amp;rsquo;t wait to see what you build with them.&lt;/p&gt;</content>
    </entry>
    <entry>
        <title>Scaling Large Language Models to zero with Ollama</title>
        <link rel="alternate" href="https://fly.io/blog/scaling-llm-ollama/"/>
        <id>https://fly.io/blog/scaling-llm-ollama/</id>
        <published>2023-12-06T12:00:00+00:00</published>
        <updated>2025-05-20T16:26:25+00:00</updated>
        <media:thumbnail url="https://fly.io/blog/scaling-llm-ollama/assets/thumb.webp"/>
        <content type="html">&lt;div class="lead"&gt;&lt;p&gt;We’re Fly.io. We have powerful servers worldwide to run your code close to your users. Including GPUs so you can self host your own AI.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;Open-source self-hosted AI tools have advanced a lot in the past 6 months. They allow you to create new methods of expression (with QR code generation and Stable Diffusion), easy access to summarization powers that would have made Google blush a decade ago (even with untuned foundation models such as LLaMa 2 and Yi), to conversational assistants that enable people to do more with their time, and to perform speech recognition in &lt;em&gt;real time&lt;/em&gt; on moderate hardware (with Whisper et al). With all these capabilities comes the need for more and more raw computational muscle to be able to do inference on bigger and bigger models, and eventually do things that we can&amp;rsquo;t even imagine right now. Fly.io lets you put your compute where your users are so that you can do machine learning inference tasks on the edge with the power of enterprise-grade GPUs such as the Nvidia A100. You can also scale your GPU nodes to zero running Machines, so you only pay for what you actually need, when you need it.&lt;/p&gt;
&lt;div class="right-sidenote"&gt;&lt;p&gt;It’s worth mentioning that “scaling to zero” doesn’t mean what you may think it means. When you “scale to zero” in Fly.io, you actually stop the running Machine. This means the Machine is still laying around on the same computer box that it runs on, but it’s just put to sleep. If there is a capacity issue then your app may be unable to wake back up. We are working on a solution to this, but for now you should be aware that scaling to zero is not the same as spinning down your Machine and spinning it back up again on a new computer box when you need it.&lt;/p&gt;
&lt;/div&gt;&lt;div class="callout"&gt;&lt;p&gt;This is a continuation of the last post in this series about &lt;a href="https://fly.io/blog/transcribing-on-fly-gpu-machines/" title=""&gt;how to use GPUs on Fly.io&lt;/a&gt;.&lt;/p&gt;
&lt;/div&gt;&lt;h2 id='why-scale-to-zero' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#why-scale-to-zero' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Why scale to zero?&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;Running GPU nodes on top of Fly is expensive. Sure, GPUs enable you to do things a lot faster than CPUs ever could on their own, but you mostly will have things run idle between uses. This is where scaling to zero comes in. With scaling to zero, you can have your GPU nodes shut down when you&amp;rsquo;re not using them. When your Machine stops, you aren&amp;rsquo;t paying for the GPU any more. This is good for the environment and your wallet.&lt;/p&gt;

&lt;p&gt;In this post, we&amp;rsquo;re going to be using &lt;a href='https://ollama.ai' title=''&gt;Ollama&lt;/a&gt; to generate text. Ollama is a fancy wrapper around &lt;a href='https://github.com/ggerganov/llama.cpp' title=''&gt;llama.cpp&lt;/a&gt; that allows you to run large language models on your own hardware with your choice of model. It also supports GPU acceleration, meaning that you can use Fly.io&amp;rsquo;s huge GPUs to run your models faster than your RTX 3060 at home ever would on its own.&lt;/p&gt;

&lt;p&gt;One of the main downsides of using Ollama in a cloud environment is that it doesn&amp;rsquo;t have authentication by default. Thanks to the power of about 70 lines of Go, we are able to shim that in after the fact. This will protect your server from random people on the internet using your GPU time (and spending your money) to generate text and integrate it into your own applications.&lt;/p&gt;

&lt;p&gt;Create a new folder called &lt;code&gt;ollama-scale-to-0&lt;/code&gt;:&lt;/p&gt;
&lt;div class="highlight-wrapper group relative bash"&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-wrap-target="#code-kehwb9j1"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-copy-target="sibling"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight '&gt;&lt;code id="code-kehwb9j1"&gt;&lt;span class="nb"&gt;mkdir &lt;/span&gt;ollama-scale-to-0
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;&lt;h2 id='fly-app-setup' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#fly-app-setup' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Fly app setup&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;First, we need to create a new Fly app:&lt;/p&gt;
&lt;div class="highlight-wrapper group relative bash"&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-wrap-target="#code-78cxwfbn"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-copy-target="sibling"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight '&gt;&lt;code id="code-78cxwfbn"&gt;fly launch &lt;span class="nt"&gt;--no-deploy&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;After selecting a name and an organization to run it in, this command will create the app and write out a &lt;code&gt;fly.toml&lt;/code&gt; file for you:&lt;/p&gt;
&lt;div class="highlight-wrapper group relative toml"&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-wrap-target="#code-6efaeoqi"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-copy-target="sibling"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight '&gt;&lt;code id="code-6efaeoqi"&gt;&lt;span class="c"&gt;# fly.toml app configuration file generated for sparkling-violet-709 on 2023-11-14T12:13:53-05:00&lt;/span&gt;
&lt;span class="c"&gt;#&lt;/span&gt;
&lt;span class="c"&gt;# See https://fly.io/docs/reference/configuration/ for information about how to use this file.&lt;/span&gt;
&lt;span class="c"&gt;#&lt;/span&gt;

&lt;span class="py"&gt;app&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"sparkling-violet-709"&lt;/span&gt;
&lt;span class="py"&gt;primary_region&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"ord"&lt;/span&gt;

&lt;span class="nn"&gt;[http_service]&lt;/span&gt;
  &lt;span class="py"&gt;internal_port&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;11434&lt;/span&gt; &lt;span class="c"&gt;# change me to 11434!&lt;/span&gt;
  &lt;span class="py"&gt;force_https&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt; &lt;span class="c"&gt;# change mo to false!&lt;/span&gt;
  &lt;span class="py"&gt;auto_stop_machines&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
  &lt;span class="py"&gt;auto_start_machines&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
  &lt;span class="py"&gt;min_machines_running&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
  &lt;span class="py"&gt;processes&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nn"&gt;["app"]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;This is the configuration file that Fly.io uses to know how to run your application. We&amp;rsquo;re going to be modifying the &lt;code&gt;fly.toml&lt;/code&gt; file to add some additional configuration to it, such as enabling GPU support:&lt;/p&gt;
&lt;div class="highlight-wrapper group relative toml"&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-wrap-target="#code-ef9ddpog"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-copy-target="sibling"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight '&gt;&lt;code id="code-ef9ddpog"&gt;&lt;span class="py"&gt;app&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"sparkling-violet-709"&lt;/span&gt;
&lt;span class="py"&gt;primary_region&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"ord"&lt;/span&gt;
&lt;span class="py"&gt;vm.size&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"a100-40gb"&lt;/span&gt; &lt;span class="c"&gt;# the GPU size, see https://fly.io/docs/gpus/gpu-quickstart/ for more info&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;We don&amp;rsquo;t want to expose the GPU to the internet, so we&amp;rsquo;re going to create a &lt;a href='https://fly.io/docs/reference/private-networking/#flycast-private-load-balancing' title=''&gt;flycast&lt;/a&gt; address to expose it to other services on your private network. To create a flycast address, run this command:&lt;/p&gt;
&lt;div class="highlight-wrapper group relative bash"&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-wrap-target="#code-q2lw27z"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-copy-target="sibling"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight '&gt;&lt;code id="code-q2lw27z"&gt;fly ips allocate-v6 &lt;span class="nt"&gt;--private&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;The &lt;code&gt;fly ips allocate-v6&lt;/code&gt; command makes a unique address in your private network that you can use to access Ollama from your other services. Make sure to add the &lt;code&gt;--private&lt;/code&gt; flag, otherwise you&amp;rsquo;ll get a globally unique IP address instead of a private one.&lt;/p&gt;

&lt;p&gt;Next, you may need to remove all of the other public IP addresses for the app to lock it away from the public. Get a list of them with &lt;code&gt;fly ips list&lt;/code&gt; and then remove them with &lt;code&gt;fly ips release &amp;lt;ip&amp;gt;&lt;/code&gt;. Delete everything but your flycast IP.&lt;/p&gt;

&lt;p&gt;Next, we need to declare the volume for Ollama to store models in. If you don&amp;rsquo;t do this, then when you scale to zero, your existing models will be destroyed and you will have to re-download them every time the server starts. This is not ideal, so we&amp;rsquo;re going to create a persistent volume to store the models in. Add the following to your &lt;code&gt;fly.toml&lt;/code&gt;:&lt;/p&gt;
&lt;div class="highlight-wrapper group relative toml"&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-wrap-target="#code-xcc0fiwo"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-copy-target="sibling"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight '&gt;&lt;code id="code-xcc0fiwo"&gt;&lt;span class="nn"&gt;[build]&lt;/span&gt;
  &lt;span class="py"&gt;image&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"ollama/ollama"&lt;/span&gt;

&lt;span class="nn"&gt;[mounts]&lt;/span&gt;
  &lt;span class="py"&gt;source&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"models"&lt;/span&gt;
  &lt;span class="py"&gt;destination&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"/root/.ollama"&lt;/span&gt;
  &lt;span class="py"&gt;initial_size&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"100gb"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;This will create a 100GB volume in the &lt;a href='https://en.wikipedia.org/wiki/O%27Hare_International_Airport' title=''&gt;&lt;code&gt;ord&lt;/code&gt;&lt;/a&gt; region when the app is deployed. This will be used to store the models that you download from the &lt;a href='https://ollama.ai/library/' title=''&gt;Ollama library&lt;/a&gt;. You can make this smaller if you want, but 100GB is a good place to start from.&lt;/p&gt;

&lt;p&gt;Now that everything is set up, we can deploy this to Fly.io:&lt;/p&gt;
&lt;div class="highlight-wrapper group relative bash"&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-wrap-target="#code-f1l19v0a"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-copy-target="sibling"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight '&gt;&lt;code id="code-f1l19v0a"&gt;fly deploy
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;This will take a minute to pull the Ollama image, push it to a Machine, provision your volume, and kick everything else off with hypervisors, GPUs and whatnot. Once it&amp;rsquo;s done, you should see something like this:&lt;/p&gt;
&lt;div class="highlight-wrapper group relative bash"&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-wrap-target="#code-wji5shy9"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-copy-target="sibling"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight '&gt;&lt;code id="code-wji5shy9"&gt; ✔ Machine 17816141f55489 &lt;span class="o"&gt;[&lt;/span&gt;app] update succeeded
&lt;span class="nt"&gt;-------&lt;/span&gt;

Visit your newly deployed app at https://sparkling-violet-709.fly.dev/
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;This is a lie because we just deleted the public IP addresses for this app. You can&amp;rsquo;t access it from the internet, and by extension, random people can&amp;rsquo;t access it either. For now, you can run an interactive session with Ollama using an ephemeral Fly Machine:&lt;/p&gt;
&lt;div class="highlight-wrapper group relative bash"&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-wrap-target="#code-abxdbso0"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-copy-target="sibling"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight '&gt;&lt;code id="code-abxdbso0"&gt;fly m run &lt;span class="nt"&gt;-e&lt;/span&gt; &lt;span class="nv"&gt;OLLAMA_HOST&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;http://sparkling-violet-709.flycast &lt;span class="nt"&gt;--shell&lt;/span&gt; ollama/ollama
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;And then you can pull an image from the &lt;a href='https://ollama.ai/library/' title=''&gt;ollama library&lt;/a&gt; and generate some text:&lt;/p&gt;
&lt;div class="highlight-wrapper group relative bash"&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-wrap-target="#code-7zldfj8t"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-copy-target="sibling"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight '&gt;&lt;code id="code-7zldfj8t"&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;ollama run openchat:7b-v3.5-fp16
&lt;span class="o"&gt;&amp;gt;&amp;gt;&amp;gt;&lt;/span&gt; How &lt;span class="k"&gt;do &lt;/span&gt;I bake chocolate chip cookies?
 To bake chocolate chip cookies, follow these steps:

1. Preheat the oven to 375°F &lt;span class="o"&gt;(&lt;/span&gt;190°C&lt;span class="o"&gt;)&lt;/span&gt; and line a baking sheet with parchment paper or silicone baking mat.

2. In a large bowl, mix together 1 cup of unsalted butter &lt;span class="o"&gt;(&lt;/span&gt;softened&lt;span class="o"&gt;)&lt;/span&gt;, 3/4 cup granulated sugar, and 3/4
cup packed brown sugar &lt;span class="k"&gt;until &lt;/span&gt;light and fluffy.

3. Add 2 large eggs, one at a &lt;span class="nb"&gt;time&lt;/span&gt;, to the butter mixture, beating well after each addition. Stir &lt;span class="k"&gt;in &lt;/span&gt;1
teaspoon of pure vanilla extract.

4. In a separate bowl, whisk together 2 cups all-purpose flour, 1/2 teaspoon baking soda, and 1/2 teaspoon
salt. Gradually add the dry ingredients to the wet ingredients, stirring &lt;span class="k"&gt;until &lt;/span&gt;just combined.

5. Fold &lt;span class="k"&gt;in &lt;/span&gt;2 cups of chocolate chips &lt;span class="o"&gt;(&lt;/span&gt;or chunks&lt;span class="o"&gt;)&lt;/span&gt; into the dough.

6. Drop rounded tablespoons of dough onto the prepared baking sheet, spacing them about 2 inches apart.

7. Bake &lt;span class="k"&gt;for &lt;/span&gt;10-12 minutes, or &lt;span class="k"&gt;until &lt;/span&gt;the edges are golden brown. The centers should still be slightly soft.

8. Allow the cookies to cool on the baking sheet &lt;span class="k"&gt;for &lt;/span&gt;a few minutes before transferring them to a wire rack
to cool completely.

Enjoy your homemade chocolate chip cookies!
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;If you want a persistent wake-on-use connection to your Ollama instance, you can set up a &lt;a href='https://fly.io/docs/reference/private-networking/#discovering-apps-through-dns-on-a-wireguard-connection' title=''&gt;connection to your Fly network using WireGuard&lt;/a&gt;. This will let you use Ollama from your local applications without having to run them on Fly. For example, if you want to figure out the safe cooking temperature for ground beef in Celsius, you can query that in JavaScript with this snippet of code:&lt;/p&gt;
&lt;div class="highlight-wrapper group relative typescript"&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-wrap-target="#code-x8r5bgmm"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-copy-target="sibling"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight '&gt;&lt;code id="code-x8r5bgmm"&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;generateRequest&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;openchat:7b-v3.5-fp16&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;What is the safe cooking temperature for ground beef in celsius?&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;
  &lt;span class="na"&gt;stream&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="c1"&gt;// &amp;lt;- important for Node/Deno clients&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;

&lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;resp&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;fetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;http://sparkling-violet-709.flycast/api/generate&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;method&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;POST&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;body&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;stringify&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;generateRequest&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;resp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;status&lt;/span&gt; &lt;span class="o"&gt;!==&lt;/span&gt; &lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;throw&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nb"&gt;Error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`error fetching response: &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;resp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;status&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;: &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;resp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;()}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nx"&gt;resp&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;resp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;json&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

&lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;resp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="c1"&gt;// Something like "The safe cooking temperature for ground beef is 71 degrees celsius (160 degrees fahrenheit).&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;&lt;h2 id='scaling-to-zero' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#scaling-to-zero' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Scaling to zero&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;The best part about all of this is that when you want to scale down to zero running Machines: do nothing, it will automatically shut down when it&amp;rsquo;s idle. Wait a few minutes and then verify it with &lt;code&gt;fly status&lt;/code&gt;:&lt;/p&gt;
&lt;div class="highlight-wrapper group relative bash"&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-wrap-target="#code-wpipvj9k"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-copy-target="sibling"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight '&gt;&lt;code id="code-wpipvj9k"&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;fly status

...

PROCESS ID              VERSION REGION  STATE   ROLE    CHECKS  LAST UPDATED
app     3d8d7949b22089  9       ord     stopped                 2023-11-14T19:34:24Z
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;The app has been stopped. This means that it&amp;rsquo;s not running and you&amp;rsquo;re not paying for it. When you want it to start up again, just make a request. It will automatically start up and you can use it as normal with the CLI or even just arbitrary calls to &lt;a href='https://github.com/jmorganca/ollama/blob/main/docs/api.md' title=''&gt;the API&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;You can also upload your own models to the Ollama registry by &lt;a href='https://github.com/jmorganca/ollama/blob/main/docs/import.md' title=''&gt;creating your own Modelfile&lt;/a&gt; and pushing it (though you will need to install Ollama locally to publish your own models). At this time, the only way to set a custom system prompt is to use a Modelfile and upload your model to the registry.&lt;/p&gt;
&lt;h2 id='conclusion' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#conclusion' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Conclusion&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;Ollama is a fantastic way to run large language models of your choice and the ability to use Fly.io&amp;rsquo;s powerful GPUs means you can use bigger models with more parameters and a larger context window. This lets you make your assistants more lifelike, your conversations have more context, and your text generation more realistic.&lt;/p&gt;

&lt;p&gt;Oh, by the way, this also lets you use the new &lt;code&gt;json&lt;/code&gt; mode to have your models call functions, similar to how ChatGPT would. To do this, have a system prompt that looks like this:&lt;/p&gt;
&lt;div class="highlight-wrapper group relative "&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-wrap-target="#code-2x5qkx59"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-copy-target="sibling"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight '&gt;&lt;code id="code-2x5qkx59"&gt;You are a helpful research assistant. The following functions are available for you to fetch further data to answer user questions, if relevant:

{
    "function": "search_bing",
    "description": "Search the web for content on Bing. This allows users to search online/the internet/the web for content.",
    "arguments": [
        {
            "name": "query",
            "type": "string",
            "description": "The search query string"
        }
    ]
}

{
    "function": "search_arxiv",
    "description": "Search for research papers on ArXiv. Make use of AND, OR and NOT operators as appropriate to join terms within the query.",
    "arguments": [
        {
            "name": "query",
            "type": "string",
            "description": "The search query string"
        }
    ]
}

To call a function, respond - immediately and only - with a JSON object of the following format:
{
    "function": "function_name",
    "arguments": {
        "argument1": "argument_value",
        "argument2": "argument_value"
    }
}

If no function needs to be called, respond with an empty JSON object: {}
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Then you can use the &lt;a href='https://github.com/jmorganca/ollama/blob/main/docs/api.md#request-json-mode' title=''&gt;JSON format&lt;/a&gt; to receive a JSON response from Ollama (hint: &lt;code&gt;—format=json&lt;/code&gt; in the CLI or &lt;code&gt;format: &amp;quot;json&amp;quot;&lt;/code&gt; in the API). This is a great way to make your assistants more lifelike and more useful. You will need to use something like &lt;a href='https://www.langchain.com/' title=''&gt;Langchain&lt;/a&gt; or manual iterations to properly handle the cases where the user doesn&amp;rsquo;t want to call a function, but that&amp;rsquo;s a topic for another blog post.&lt;/p&gt;

&lt;p&gt;For the best results you may want to use a model with a larger context window such as &lt;a href='https://ollama.ai/library/vicuna:13b-v1.5-16k-fp16' title=''&gt;vicuna:13b-v1.5-16k-fp16&lt;/a&gt; (16k == 16,384 token window) as JSON is very token-expensive. Future advances in the next few weeks (such as the Yi models gaining ludicrous token windows on the line of 200,000 tokens at the cost of ludicrous amounts of VRAM usage) will make this less of an issue. You can also get away with minifying the JSON in the functions and examples a lot, but you may need to experiment to get the best results.&lt;/p&gt;

&lt;p&gt;Happy hacking, y&amp;#39;all.&lt;/p&gt;</content>
    </entry>
    <entry>
        <title>Rethinking Serverless with FLAME</title>
        <link rel="alternate" href="https://fly.io/blog/rethinking-serverless-with-flame/"/>
        <id>https://fly.io/blog/rethinking-serverless-with-flame/</id>
        <published>2023-12-06T00:00:00+00:00</published>
        <updated>2025-05-20T16:26:25+00:00</updated>
        <media:thumbnail url="https://fly.io/blog/rethinking-serverless-with-flame/assets/flame-thumb.webp"/>
        <content type="html">&lt;blockquote&gt;Imagine if you could auto scale simply by wrapping any existing app code in a function and have that block of code run in a temporary copy of your app.&lt;/blockquote&gt;


&lt;p&gt;The pursuit of elastic, auto-scaling applications has taken us to silly places.&lt;/p&gt;

&lt;p&gt;Serverless/FaaS had a couple things going for it. Elastic Scale™ is hard. It&amp;rsquo;s even harder when you need to manage those pesky servers. It also promised pay-what-you-use costs to avoid idle usage. Good stuff, right?&lt;/p&gt;

&lt;p&gt;Well the charade is over. You offload scaling concerns and the complexities of scaling, just to end up needing &lt;em&gt;more complexity&lt;/em&gt;. Additional queues, storage, and glue code to communicate back to our app is just the starting point. Dev, test, and CI complexity balloons as fast as your costs. Oh, and you often have to rewrite your app in proprietary JavaScript – even if it&amp;rsquo;s already written in JavaScript!&lt;/p&gt;

&lt;p&gt;At the same time, the rest of us have elastically scaled by starting more webservers. Or we&amp;rsquo;ve dumped on complexity with microservices. This doesn&amp;rsquo;t make sense. Piling on more webservers to transcode more videos or serve up more ML tasks isn&amp;rsquo;t what we want. And granular scale shouldn&amp;rsquo;t require slicing our apps into bespoke operational units with their own APIs and deployments to manage.&lt;/p&gt;

&lt;p&gt;Enough is enough. There&amp;rsquo;s a better way to elastically scale applications.&lt;/p&gt;
&lt;h2 id='the-flame-pattern' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#the-flame-pattern' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;The FLAME pattern&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;Here&amp;rsquo;s what we really want:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;We don&amp;rsquo;t want to manage those pesky servers. We already have this for our app deployments via &lt;code&gt;fly deploy&lt;/code&gt;, &lt;code&gt;git push heroku&lt;/code&gt;, &lt;code&gt;kubectl&lt;/code&gt;, etc
&lt;/li&gt;&lt;li&gt;We want on-demand, &lt;em&gt;granular&lt;/em&gt; elastic scale of specific parts of our app code
&lt;/li&gt;&lt;li&gt;We don&amp;rsquo;t want to rewrite our application or write parts of it in proprietary runtimes
&lt;/li&gt;&lt;/ul&gt;

&lt;p&gt;Imagine if we could auto scale simply by wrapping any existing app code in a function and have that block of code run in a temporary copy of the app.&lt;/p&gt;

&lt;p&gt;Enter the FLAME pattern.&lt;/p&gt;
&lt;blockquote&gt;FLAME - Fleeting Lambda Application for Modular Execution&lt;/blockquote&gt;


&lt;p&gt;With FLAME, you treat your &lt;em&gt;entire application&lt;/em&gt; as a lambda, where modular parts can be executed on short-lived infrastructure.&lt;/p&gt;

&lt;p&gt;No rewrites. No bespoke runtimes. No outrageous layers of complexity. Need to insert the results of an expensive operation to the database? PubSub broadcast the result of some expensive work? No problem! It&amp;rsquo;s your whole app so of course you can do it.&lt;/p&gt;

&lt;p&gt;The Elixir &lt;a href='https://github.com/phoenixframework/flame' title=''&gt;flame library&lt;/a&gt; implements the FLAME pattern. It has a backend adapter for Fly.io, but you can use it on any cloud that gives you an API to spin up an instance with your app code running on it. We&amp;rsquo;ll talk more about backends in a bit, as well as implementing FLAME in other languages.&lt;/p&gt;

&lt;p&gt;First, lets watch a realtime thumbnail generation example to see FLAME + Elixir in action:&lt;/p&gt;
&lt;div class="youtube-container" data-exclude-render&gt;
  &lt;div class="youtube-video"&gt;
    &lt;iframe
      width="100%"
      height="100%"
      src="https://www.youtube.com/embed/l1xt_rkWdic"
      frameborder="0"
      allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture"
      allowfullscreen&gt;
    &lt;/iframe&gt;
  &lt;/div&gt;
&lt;/div&gt;


&lt;p&gt;Now let&amp;rsquo;s walk thru something a little more basic. Imagine we have a function to transcode video to thumbnails in our Elixir application after they are uploaded:&lt;/p&gt;
&lt;div class="highlight-wrapper group relative elixir"&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-wrap-target="#code-racxyl56"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-copy-target="sibling"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight '&gt;&lt;code id="code-racxyl56"&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="n"&gt;generate_thumbnails&lt;/span&gt;&lt;span class="p"&gt;(%&lt;/span&gt;&lt;span class="no"&gt;Video&lt;/span&gt;&lt;span class="p"&gt;{}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;vid&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;interval&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt;
  &lt;span class="n"&gt;tmp&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;Path&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="no"&gt;System&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tmp_dir!&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="no"&gt;Ecto&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="no"&gt;UUID&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;generate&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
  &lt;span class="no"&gt;File&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;mkdir!&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tmp&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="n"&gt;args&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"-i"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;vid&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"-vf"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"fps=1/&lt;/span&gt;&lt;span class="si"&gt;#{&lt;/span&gt;&lt;span class="n"&gt;interval&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="si"&gt;#{&lt;/span&gt;&lt;span class="n"&gt;tmp&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;/%02d.png"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
  &lt;span class="no"&gt;System&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cmd&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"ffmpeg"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="n"&gt;urls&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;VidStore&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;put_thumbnails&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;vid&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="no"&gt;Path&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;wildcard&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tmp&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&amp;gt;&lt;/span&gt; &lt;span class="s2"&gt;"/*.png"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
  &lt;span class="no"&gt;Repo&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;insert_all&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="no"&gt;Thumb&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="no"&gt;Enum&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;urls&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="p"&gt;%{&lt;/span&gt;&lt;span class="ss"&gt;vid_id:&lt;/span&gt; &lt;span class="n"&gt;vid&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;url:&lt;/span&gt; &lt;span class="nv"&gt;&amp;amp;1&lt;/span&gt;&lt;span class="p"&gt;}))&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Our &lt;code&gt;generate_thumbnails&lt;/code&gt; function accepts a video struct. We shell out to &lt;code&gt;ffmpeg&lt;/code&gt; to take the video URL and generate thumbnails at a given interval. We then write the temporary thumbnail paths to durable storage. Finally, we insert the generated thumbnail URLs into the database.&lt;/p&gt;

&lt;p&gt;This works great locally, but CPU bound work like video transcoding can quickly bring our entire service to a halt in production. Instead of rewriting large swaths of our app to move this into microservices or some FaaS, we can simply wrap it in a FLAME call:&lt;/p&gt;
&lt;div class="highlight-wrapper group relative elixir"&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-wrap-target="#code-b6z6pma"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-copy-target="sibling"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight '&gt;&lt;code id="code-b6z6pma"&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="n"&gt;generate_thumbnails&lt;/span&gt;&lt;span class="p"&gt;(%&lt;/span&gt;&lt;span class="no"&gt;Video&lt;/span&gt;&lt;span class="p"&gt;{}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;vid&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;interval&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt;
  &lt;span class="no"&gt;FLAME&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;call&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="no"&gt;MyApp&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="no"&gt;FFMpegRunner&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;fn&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;
    &lt;span class="n"&gt;tmp&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;Path&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="no"&gt;System&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tmp_dir!&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="no"&gt;Ecto&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="no"&gt;UUID&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;generate&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
    &lt;span class="no"&gt;File&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;mkdir!&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tmp&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;args&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt;
      &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"-i"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;vid&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"-vf"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"fps=1/&lt;/span&gt;&lt;span class="si"&gt;#{&lt;/span&gt;&lt;span class="n"&gt;interval&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="si"&gt;#{&lt;/span&gt;&lt;span class="n"&gt;tmp&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;/%02d.png"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="no"&gt;System&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cmd&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"ffmpeg"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;urls&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;VidStore&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;put_thumbnails&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;vid&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="no"&gt;Path&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;wildcard&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tmp&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&amp;gt;&lt;/span&gt; &lt;span class="s2"&gt;"/*.png"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="no"&gt;Repo&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;insert_all&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="no"&gt;Thumb&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="no"&gt;Enum&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;urls&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="p"&gt;%{&lt;/span&gt;&lt;span class="ss"&gt;vid_id:&lt;/span&gt; &lt;span class="n"&gt;vid&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;url:&lt;/span&gt; &lt;span class="nv"&gt;&amp;amp;1&lt;/span&gt;&lt;span class="p"&gt;}))&lt;/span&gt;
  &lt;span class="k"&gt;end&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;That&amp;rsquo;s it! &lt;code&gt;FLAME.call&lt;/code&gt; accepts the name of a runner pool, and a function. It then finds or boots a new copy of our entire application and runs the function there. Any variables the function closes over (like our &lt;code&gt;%Video{}&lt;/code&gt; struct and &lt;code&gt;interval&lt;/code&gt;) are passed along automatically.&lt;/p&gt;

&lt;p&gt;When the FLAME runner boots up, it connects back to the parent node, receives the function to run, executes it, and returns the result to the caller. Based on configuration, the booted runner either waits happily for more work before idling down, or extinguishes itself immediately.&lt;/p&gt;

&lt;p&gt;Let&amp;rsquo;s visualize the flow:&lt;/p&gt;

&lt;p&gt;&lt;img alt="visualizing the flow" src="/blog/rethinking-serverless-with-flame/assets/visual.webp?centered" /&gt;&lt;/p&gt;

&lt;p&gt;We changed no other code and issued our DB write with &lt;code&gt;Repo.insert_all&lt;/code&gt; just like before, because we are running our &lt;em&gt;entire&lt;/em&gt; &lt;em&gt;application&lt;/em&gt;. Database connection(s) and all. Except this fleeting application only runs that little function after startup and nothing else.&lt;/p&gt;

&lt;p&gt;In practice, a FLAME implementation will support a pool of runners for hot startup, scale-to-zero, and elastic growth. More on that later.&lt;/p&gt;
&lt;h2 id='solving-a-problem-vs-removing-the-problem' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#solving-a-problem-vs-removing-the-problem' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Solving a problem vs removing the problem&lt;/span&gt;&lt;/h2&gt;&lt;blockquote&gt;FaaS solutions help you solve a problem. FLAME removes the problem.&lt;/blockquote&gt;


&lt;p&gt;The FaaS labyrinth of complexity defies reason. And it&amp;rsquo;s unavoidable. Let&amp;rsquo;s walkthrough the thumbnail use-case to see how.&lt;/p&gt;

&lt;p&gt;We try to start with the simplest building block like request/response AWS Lambda Function URL&amp;rsquo;s.&lt;/p&gt;

&lt;p&gt;The complexity hits immediately.&lt;/p&gt;

&lt;p&gt;We start writing custom encoders/decoders on both sides to handle streaming the thumbnails back to the app over HTTP. Phew that&amp;rsquo;s done. Wait, is our video transcoding or user uploads going to take longer than 15 minutes? Sorry, hard timeout limit – time to split our videos into chunks to stay within the timeout, which means more lambdas to do that. Now we&amp;rsquo;re orchestrating lambda workflows and relying on additional services, such as SQS and S3, to enable this.&lt;/p&gt;

&lt;p&gt;All the FaaS is doing is adding layers of communication between your code and the parts you want to run elastically. Each layer has its own glue integration price to pay.&lt;/p&gt;

&lt;p&gt;Ultimately handling this kind of use-case looks something like this:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Trigger the lambda via HTTP endpoint, S3, or API gateway ($)
&lt;/li&gt;&lt;li&gt;Write the bespoke lambda to transcode the video ($)
&lt;/li&gt;&lt;li&gt;Place the thumbnail results into SQS ($)
&lt;/li&gt;&lt;li&gt;Write the SQS consumer in our app (dev $)
&lt;/li&gt;&lt;li&gt;Persist to DB and figure out how to get events back to active subscribers that may well be connected to other instances than the SQS consumer (dev $)
&lt;/li&gt;&lt;/ul&gt;

&lt;p&gt;This is nuts. We pay the FaaS toll at every step. We shouldn&amp;rsquo;t have to do any of this!&lt;/p&gt;

&lt;p&gt;FaaS provides a bunch of offerings to build a solution on top of. FLAME removes the problem entirely.&lt;/p&gt;
&lt;h2 id='flame-backends' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#flame-backends' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;FLAME Backends&lt;/span&gt;&lt;/h2&gt;&lt;blockquote&gt;On Fly.io infrastructure the &lt;code&gt;FLAME.FlyBackend&lt;/code&gt; can boot a copy of your application on a new &lt;a href="https://fly.io/docs/machines/"&gt;Machine&lt;/a&gt; and have it connect back to the parent for work within ~3s.&lt;/blockquote&gt;


&lt;p&gt;By default, FLAME ships with a &lt;code&gt;LocalBackend&lt;/code&gt; and &lt;code&gt;FlyBackend&lt;/code&gt;, but any host that provides an API to provision a server and run your app code can work as a FLAME backend. Erlang and Elixir primitives are doing all the heavy lifting here. The entire &lt;code&gt;FLAME.FlyBackend&lt;/code&gt; is &lt;a href='https://github.com/phoenixframework/flame/blob/main/lib/flame/fly_backend.ex' title=''&gt;&amp;lt; 200 LOC with docs&lt;/a&gt;. The library has a single dependency, &lt;code&gt;req&lt;/code&gt;, which is an HTTP client.&lt;/p&gt;

&lt;p&gt;Because Fly.io runs our applications as a packaged up docker image, we simply ask the Fly API to boot a new Machine for us with the same image that our app is currently running. Also thanks to Fly infrastructure, we can guarantee the FLAME runners are started in the same region as the parent. This optimizes latency and lets you ship whatever data back and forth between parent and runner without having to think about it.&lt;/p&gt;
&lt;h2 id='look-at-everything-were-not-doing' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#look-at-everything-were-not-doing' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Look at everything we&amp;rsquo;re not doing&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;With FaaS, just imagine how quickly the dev and testing story becomes a fate worse than death.&lt;/p&gt;

&lt;p&gt;To run the app locally, we either need to add some huge dev dependencies to simulate the entire FaaS pipeline, or worse, connect up our dev and test environments directly to the FaaS provider.&lt;/p&gt;

&lt;p&gt;With FLAME, your dev and test runners simply run on the local backend.&lt;/p&gt;

&lt;p&gt;Remember, this is your app. FLAME just controls where modular parts of it run. In dev or test, those parts simply run on the existing runtime on your laptop or CI server.&lt;/p&gt;

&lt;p&gt;Using Elixir, we can even send a file across to the remote FLAME application thanks to the distributed features of the Erlang VM:&lt;/p&gt;
&lt;div class="highlight-wrapper group relative elixir"&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-wrap-target="#code-rduui6yl"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-copy-target="sibling"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight '&gt;&lt;code id="code-rduui6yl"&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="n"&gt;generate_thumbnails&lt;/span&gt;&lt;span class="p"&gt;(%&lt;/span&gt;&lt;span class="no"&gt;Video&lt;/span&gt;&lt;span class="p"&gt;{}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;vid&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;interval&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt;
  &lt;span class="n"&gt;parent_stream&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;File&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;stream!&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;vid&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;filepath&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[],&lt;/span&gt; &lt;span class="mi"&gt;2048&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="no"&gt;FLAME&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;call&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="no"&gt;MyApp&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="no"&gt;FFMpegRunner&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;fn&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;
    &lt;span class="n"&gt;tmp_file&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;Path&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="no"&gt;System&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tmp_dir!&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="no"&gt;Ecto&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="no"&gt;UUID&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;generate&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
    &lt;span class="n"&gt;flame_stream&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;File&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;stream!&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tmp_file&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="no"&gt;Enum&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;into&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;parent_stream&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;flame_stream&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;tmp&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;Path&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="no"&gt;System&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tmp_dir!&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="no"&gt;Ecto&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="no"&gt;UUID&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;generate&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
    &lt;span class="no"&gt;File&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;mkdir!&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tmp&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;args&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt;
      &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"-i"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tmp_file&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"-vf"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"fps=1/&lt;/span&gt;&lt;span class="si"&gt;#{&lt;/span&gt;&lt;span class="n"&gt;interval&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="si"&gt;#{&lt;/span&gt;&lt;span class="n"&gt;tmp&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;/%02d.png"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="no"&gt;System&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cmd&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"ffmpeg"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;urls&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;VidStore&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;put_thumbnails&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;vid&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="no"&gt;Path&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;wildcard&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tmp&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&amp;gt;&lt;/span&gt; &lt;span class="s2"&gt;"/*.png"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="no"&gt;Repo&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;insert_all&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="no"&gt;Thumb&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="no"&gt;Enum&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;urls&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="p"&gt;%{&lt;/span&gt;&lt;span class="ss"&gt;vid_id:&lt;/span&gt; &lt;span class="n"&gt;vid&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;url:&lt;/span&gt; &lt;span class="nv"&gt;&amp;amp;1&lt;/span&gt;&lt;span class="p"&gt;}))&lt;/span&gt;
  &lt;span class="k"&gt;end&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;On line 2 we open a file on the parent node to the video path. Then in the FLAME child, we stream the file from the parent node to the FLAME server in only a couple lines of code. That&amp;rsquo;s it! No setup of S3 or HTTP interfaces required.&lt;/p&gt;

&lt;p&gt;With FLAME it&amp;rsquo;s easy to miss everything we&amp;rsquo;re not doing:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;We don&amp;rsquo;t need to write code outside of our application. We can reuse business logic, database setup, PubSub, and all the features of our respective platforms
&lt;/li&gt;&lt;li&gt;We don&amp;rsquo;t need to manage deploys of separate services or endpoints
&lt;/li&gt;&lt;li&gt;We don&amp;rsquo;t need to write results to S3 or SQS just to pick up values back in our app
&lt;/li&gt;&lt;li&gt;We skip the dev, test, and CI dependency dance
&lt;/li&gt;&lt;/ul&gt;
&lt;h2 id='flame-outside-elixir' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#flame-outside-elixir' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;FLAME outside Elixir&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;Elixir is fantastically well suited for the FLAME model because we get so much &lt;a href='https://fly.io/phoenix-files/elixir-and-phoenix-can-do-it-all/' title=''&gt;for free&lt;/a&gt; like process supervision and distributed messaging. That said, any language with reasonable concurrency primitives can take advantage of this pattern. For example, my teammate, Lubien, created a proof of concept example for breaking out functions in your JavaScript application and running them inside a new Fly Machine: &lt;a href='https://github.com/lubien/fly-run-this-function-on-another-machine' title=''&gt;https://github.com/lubien/fly-run-this-function-on-another-machine&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;So the general flow for a JavaScript-based FLAME call would be to move the modular executions to a new file, which is executed on a runner pool. Provided the arguments are JSON serializable, the general FLAME flow is similar to what we&amp;rsquo;ve outlined here. Your application, your code, running on fleeting instances.&lt;/p&gt;

&lt;p&gt;A complete FLAME library will need to handle the following concerns:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Elastic pool scale-up and scale-down logic
&lt;/li&gt;&lt;li&gt;Hot vs cold startup with pools
&lt;/li&gt;&lt;li&gt;Remote runner monitoring to avoid orphaned resources
&lt;/li&gt;&lt;li&gt;How to monitor and keep deployments fresh
&lt;/li&gt;&lt;/ul&gt;

&lt;p&gt;For the rest of this post we&amp;rsquo;ll see how the Elixir FLAME library handles these concerns as well as features uniquely suited to Elixir applications. But first, you might be wondering about your background job queues.&lt;/p&gt;
&lt;h2 id='what-about-my-background-job-processor' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#what-about-my-background-job-processor' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;What about my background job processor?&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;FLAME works great inside your background job processor, but you may have noticed some overlap. If your job library handles scaling the worker pool, what is FLAME doing for you? There&amp;rsquo;s a couple important distinctions here.&lt;/p&gt;

&lt;p&gt;First, we reach for these queues when we need &lt;em&gt;durability guarantees&lt;/em&gt;. We often can turn knobs to have the queues scale to handle more jobs as load changes. But durable operations are separate from elastic execution. Conflating these concerns can send you down a similar path to lambda complexity. Leaning on your worker queue purely for offloaded execution means writing all the glue code to get the data into and out of the job, and back to the caller or end-user&amp;rsquo;s device somehow.&lt;/p&gt;

&lt;p&gt;For example, if we want to guarantee we successfully generated thumbnails for a video after the user upload, then a job queue makes sense as the &lt;em&gt;dispatch, commit, and retry&lt;/em&gt; &lt;em&gt;mechanism&lt;/em&gt; for this operation. The actual transcoding could be a FLAME call inside the job itself, so we decouple the ideas of durability and scaled execution.&lt;/p&gt;

&lt;p&gt;On the other side, we have operations we don&amp;rsquo;t need durability for. Take the screencast above where the user hasn&amp;rsquo;t yet saved their video. Or an ML model execution where there&amp;rsquo;s no need to waste resources churning a prompt if the user has already left the app. In those cases, it doesn&amp;rsquo;t make sense to write to a durable store to pick up a job for work that will go right into the ether.&lt;/p&gt;
&lt;h2 id='pooling-for-elastic-scale' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#pooling-for-elastic-scale' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Pooling for Elastic Scale&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;With the Elixir implementation of FLAME, you define elastic pools of runners. This allows scale-to-zero behavior while also elastically scaling up FLAME servers with max concurrency limits.&lt;/p&gt;

&lt;p&gt;For example, lets take a look at the &lt;code&gt;start/2&lt;/code&gt; callback, which is the entry point of all Elixir applications. We can drop in a &lt;code&gt;FLAME.Pool&lt;/code&gt; for video transcriptions and say we want it to scale to zero, boot a max of 10, and support 5 concurrent &lt;code&gt;ffmpeg&lt;/code&gt; operations per runner:&lt;/p&gt;
&lt;div class="highlight-wrapper group relative elixir"&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-wrap-target="#code-txwesbky"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-copy-target="sibling"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight '&gt;&lt;code id="code-txwesbky"&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="n"&gt;start&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;_type&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;_args&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt;
  &lt;span class="n"&gt;flame_parent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;FLAME&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="no"&gt;Parent&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;get&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

  &lt;span class="n"&gt;children&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="o"&gt;...&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="no"&gt;MyApp&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="no"&gt;Repo&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="no"&gt;FLAME&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="no"&gt;Pool&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="ss"&gt;name:&lt;/span&gt; &lt;span class="no"&gt;Thumbs&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="no"&gt;FFMpegRunner&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="ss"&gt;min:&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="ss"&gt;max:&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="ss"&gt;max_concurrency:&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="ss"&gt;idle_shutdown_after:&lt;/span&gt; &lt;span class="mi"&gt;30_000&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="n"&gt;!flame_parent&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="no"&gt;MyAppWeb&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="no"&gt;Endpoint&lt;/span&gt;
  &lt;span class="p"&gt;]&lt;/span&gt;
  &lt;span class="o"&gt;|&amp;gt;&lt;/span&gt; &lt;span class="no"&gt;Enum&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;filter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt; &lt;span class="nv"&gt;&amp;amp;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

  &lt;span class="n"&gt;opts&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="ss"&gt;strategy:&lt;/span&gt; &lt;span class="ss"&gt;:one_for_one&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;name:&lt;/span&gt; &lt;span class="no"&gt;MyApp&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="no"&gt;Supervisor&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
  &lt;span class="no"&gt;Supervisor&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;start_link&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;children&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;opts&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;We use the presence of a FLAME parent to conditionally start our Phoenix webserver when booting the app. There&amp;rsquo;s no reason to start a webserver if we aren&amp;rsquo;t serving web traffic. Note we leave other services like the database &lt;code&gt;MyApp.Repo&lt;/code&gt; alone because we want to make use of those services inside FLAME runners.&lt;/p&gt;

&lt;p&gt;Elixir&amp;rsquo;s supervised process approach to applications is uniquely great for turning these kinds of knobs.&lt;/p&gt;

&lt;p&gt;We also set our pool to idle down after 30 seconds of no caller operations. This keeps our runners hot for a short while before discarding them. We could also pass a &lt;code&gt;min: 1&lt;/code&gt; to always ensure at least one &lt;code&gt;ffmpeg&lt;/code&gt; runner is hot and ready for work by the time our application is started.&lt;/p&gt;
&lt;h2 id='process-placement' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#process-placement' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Process Placement&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;In Elixir, stateful bits of our applications are built around the &lt;em&gt;process&lt;/em&gt; primitive – lightweight greenthreads with message mailboxes. Wrapping our otherwise stateless app code in a synchronous &lt;code&gt;FLAME.call&lt;/code&gt;&amp;lsquo;s or async &lt;code&gt;FLAME.cast&lt;/code&gt;&amp;rsquo;s works great, but what about the stateful parts of our app?&lt;/p&gt;

&lt;p&gt;&lt;code&gt;FLAME.place_child&lt;/code&gt; exists to take an existing process specification in your Elixir app and start it on a FLAME runner instead of locally. You can use it anywhere you&amp;rsquo;d use &lt;code&gt;Task.Supervisor.start_child&lt;/code&gt; , &lt;code&gt;DynamicSupervisor.start_child&lt;/code&gt;, or similar interfaces. Just like &lt;code&gt;FLAME.call&lt;/code&gt;, the process is run on an elastic pool and runners handle idle down when the process completes its work.&lt;/p&gt;

&lt;p&gt;And like &lt;code&gt;FLAME.call&lt;/code&gt;, it lets us take existing app code, change a single LOC, and continue shipping features.&lt;/p&gt;

&lt;p&gt;Let&amp;rsquo;s walk thru the example from the screencast above. Imagine we want to generate video thumbnails for a video &lt;em&gt;as it is being uploaded&lt;/em&gt;. Elixir and LiveView make this easy. We won&amp;rsquo;t cover all the code here, but you can view the &lt;a href='https://github.com/fly-apps/thumbnail_generator/blob/main/lib/thumbs/thumbnail_generator.ex' title=''&gt;full app implementation&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Our first pass would be to write a LiveView upload writer that calls into a &lt;code&gt;ThumbnailGenerator&lt;/code&gt;:&lt;/p&gt;
&lt;div class="highlight-wrapper group relative elixir"&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-wrap-target="#code-a6sjmlsd"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-copy-target="sibling"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight '&gt;&lt;code id="code-a6sjmlsd"&gt;&lt;span class="k"&gt;defmodule&lt;/span&gt; &lt;span class="no"&gt;ThumbsWeb&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="no"&gt;ThumbnailUploadWriter&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt;
  &lt;span class="nv"&gt;@behaviour&lt;/span&gt; &lt;span class="no"&gt;Phoenix&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="no"&gt;LiveView&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="no"&gt;UploadWriter&lt;/span&gt;

  &lt;span class="n"&gt;alias&lt;/span&gt; &lt;span class="no"&gt;Thumbs&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="no"&gt;ThumbnailGenerator&lt;/span&gt;

  &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="n"&gt;init&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;opts&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt;
    &lt;span class="n"&gt;generator&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;ThumbnailGenerator&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;opts&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="ss"&gt;:ok&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;%{&lt;/span&gt;&lt;span class="ss"&gt;gen:&lt;/span&gt; &lt;span class="n"&gt;generator&lt;/span&gt;&lt;span class="p"&gt;}}&lt;/span&gt;
  &lt;span class="k"&gt;end&lt;/span&gt;

  &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="n"&gt;write_chunk&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt;
    &lt;span class="no"&gt;ThumbnailGenerator&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;stream_chunk!&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;gen&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="ss"&gt;:ok&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="k"&gt;end&lt;/span&gt;

  &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="n"&gt;meta&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;%{&lt;/span&gt;&lt;span class="ss"&gt;gen:&lt;/span&gt; &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;gen&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="n"&gt;close&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;_reason&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt;
    &lt;span class="no"&gt;ThumbnailGenerator&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;close&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;gen&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="ss"&gt;:ok&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="k"&gt;end&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;An upload writer is a behavior that simply ferries the uploaded chunks from the client into whatever we&amp;rsquo;d like to do with them. Here we have a &lt;code&gt;ThumbnailGenerator.open/1&lt;/code&gt; which starts a process that communicates with an &lt;code&gt;ffmpeg&lt;/code&gt; shell. Inside &lt;code&gt;ThumbnailGenerator.open/1&lt;/code&gt;, we use regular elixir process primitives:&lt;/p&gt;
&lt;div class="highlight-wrapper group relative elixir"&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-wrap-target="#code-9zrau48v"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-copy-target="sibling"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight '&gt;&lt;code id="code-9zrau48v"&gt;  &lt;span class="c1"&gt;# thumbnail_generator.ex&lt;/span&gt;
  &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="n"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;opts&lt;/span&gt; &lt;span class="p"&gt;\\&lt;/span&gt; &lt;span class="p"&gt;[])&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt;
    &lt;span class="no"&gt;Keyword&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;validate!&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;opts&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="ss"&gt;:timeout&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;:caller&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;:fps&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
    &lt;span class="n"&gt;timeout&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;Keyword&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;opts&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;:timeout&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;5_000&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;caller&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;Keyword&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;opts&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;:caller&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
    &lt;span class="n"&gt;ref&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;make_ref&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;parent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="n"&gt;spec&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="bp"&gt;__MODULE__&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;caller&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ref&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;parent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;opts&lt;/span&gt;&lt;span class="p"&gt;}}&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="ss"&gt;:ok&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;pid&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;DynamicSupervisor&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;start_child&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;@sup&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;spec&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;receive&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt;
      &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="o"&gt;^&lt;/span&gt;&lt;span class="n"&gt;ref&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;%&lt;/span&gt;&lt;span class="no"&gt;ThumbnailGenerator&lt;/span&gt;&lt;span class="p"&gt;{}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;gen&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;
        &lt;span class="p"&gt;%&lt;/span&gt;&lt;span class="no"&gt;ThumbnailGenerator&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;gen&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="ss"&gt;pid:&lt;/span&gt; &lt;span class="n"&gt;pid&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;after&lt;/span&gt;
      &lt;span class="n"&gt;timeout&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="k"&gt;exit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="ss"&gt;:timeout&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;end&lt;/span&gt;
  &lt;span class="k"&gt;end&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;The details aren&amp;rsquo;t super important here, except line 10 where we call &lt;code&gt;{:ok, pid} = DynamicSupervisor.start_child(@sup, spec)&lt;/code&gt;, which starts a supervised&lt;code&gt;ThumbnailGenerator&lt;/code&gt; process. The rest of the implementation simply ferries chunks as stdin into &lt;code&gt;ffmpeg&lt;/code&gt; and parses png&amp;rsquo;s from stdout. Once a PNG delimiter is found in stdout, we send the &lt;code&gt;caller&lt;/code&gt; process (our LiveView process) a message saying &amp;ldquo;hey, here&amp;rsquo;s an image&amp;rdquo;:&lt;/p&gt;
&lt;div class="highlight-wrapper group relative elixir"&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-wrap-target="#code-mgmj8tt7"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-copy-target="sibling"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight '&gt;&lt;code id="code-mgmj8tt7"&gt;&lt;span class="c1"&gt;# thumbnail_generator.ex&lt;/span&gt;
&lt;span class="nv"&gt;@png_begin&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="mi"&gt;137&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;80&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;78&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;71&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;13&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;26&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&amp;gt;&lt;/span&gt;
&lt;span class="k"&gt;defp&lt;/span&gt; &lt;span class="n"&gt;handle_stdout&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ref&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;bin&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt;
  &lt;span class="p"&gt;%&lt;/span&gt;&lt;span class="no"&gt;ThumbnailGenerator&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="ss"&gt;ref:&lt;/span&gt; &lt;span class="o"&gt;^&lt;/span&gt;&lt;span class="n"&gt;ref&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;caller:&lt;/span&gt; &lt;span class="n"&gt;caller&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;gen&lt;/span&gt;

  &lt;span class="k"&gt;case&lt;/span&gt; &lt;span class="n"&gt;bin&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt;
    &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="nv"&gt;@png_begin&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;_rest&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;binary&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;
      &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;current&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt;
        &lt;span class="n"&gt;send&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;caller&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;ref&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;:image&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;count&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;encode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;)})&lt;/span&gt;
      &lt;span class="k"&gt;end&lt;/span&gt;

      &lt;span class="p"&gt;%{&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="ss"&gt;count:&lt;/span&gt; &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;count&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;current:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;bin&lt;/span&gt;&lt;span class="p"&gt;]}&lt;/span&gt;

    &lt;span class="n"&gt;_&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;
      &lt;span class="p"&gt;%{&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="ss"&gt;current:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;bin&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;current&lt;/span&gt;&lt;span class="p"&gt;]}&lt;/span&gt;
  &lt;span class="k"&gt;end&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;The &lt;code&gt;caller&lt;/code&gt; LiveView process then picks up the message in a &lt;code&gt;handle_info&lt;/code&gt; callback and updates the UI:&lt;/p&gt;
&lt;div class="highlight-wrapper group relative elixir"&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-wrap-target="#code-k5i7sj1i"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-copy-target="sibling"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight '&gt;&lt;code id="code-k5i7sj1i"&gt;&lt;span class="c1"&gt;# thumb_live.ex&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="n"&gt;handle_info&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="n"&gt;_ref&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;:image&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;_count&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;encoded&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="n"&gt;socket&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt;
  &lt;span class="p"&gt;%{&lt;/span&gt;&lt;span class="ss"&gt;count:&lt;/span&gt; &lt;span class="n"&gt;count&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;socket&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;assigns&lt;/span&gt;

  &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="ss"&gt;:noreply&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
   &lt;span class="n"&gt;socket&lt;/span&gt;
   &lt;span class="o"&gt;|&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;assign&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="ss"&gt;count:&lt;/span&gt; &lt;span class="n"&gt;count&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;message:&lt;/span&gt; &lt;span class="s2"&gt;"Generating (&lt;/span&gt;&lt;span class="si"&gt;#{&lt;/span&gt;&lt;span class="n"&gt;count&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;)"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
   &lt;span class="o"&gt;|&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;stream_insert&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="ss"&gt;:thumbs&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;%{&lt;/span&gt;&lt;span class="ss"&gt;id:&lt;/span&gt; &lt;span class="n"&gt;count&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;encoded:&lt;/span&gt; &lt;span class="n"&gt;encoded&lt;/span&gt;&lt;span class="p"&gt;})}&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;The &lt;code&gt;send(caller, {ref, :image, state.count, encode(state)}&lt;/code&gt; is one magic part about Elixir. Everything is a process, and we can message those processes, regardless of their location in the cluster.&lt;/p&gt;

&lt;p&gt;It&amp;rsquo;s like if every instantiation of an object in your favorite OO lang included a cluster-global unique identifier to work with methods on that object. The LiveView (a process) simply receives the image message and updates the UI with new images.&lt;/p&gt;

&lt;p&gt;Now let&amp;rsquo;s head back over to our &lt;code&gt;ThumbnailGenerator.open/1&lt;/code&gt; function and make this elastically scalable.&lt;/p&gt;
&lt;div class="highlight-wrapper group relative diff"&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-wrap-target="#code-2cwbroo9"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-copy-target="sibling"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight '&gt;&lt;code id="code-2cwbroo9"&gt;&lt;span class="gd"&gt;-    {:ok, pid} = DynamicSupervisor.start_child(@sup, spec)
&lt;/span&gt;&lt;span class="gi"&gt;+    {:ok, pid} = FLAME.place_child(Thumbs.FFMpegRunner, spec)
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;That&amp;rsquo;s it! Because everything is a process and processes can live anywhere, it doesn&amp;rsquo;t matter what server our &lt;code&gt;ThumbnailGenerator&lt;/code&gt; process lives on. It simply messages the caller with &lt;code&gt;send(caller, …)&lt;/code&gt; and the messages are sent across the cluster if needed.&lt;/p&gt;

&lt;p&gt;Once the process exits, either from an explicit close, after the upload is done, or from the end-user closing their browser tab, the FLAME server will note the exit and idle down if no other work is being done.&lt;/p&gt;

&lt;p&gt;Check out the &lt;a href='https://github.com/fly-apps/thumbnail_generator/blob/main/lib/thumbs/thumbnail_generator.ex' title=''&gt;full implementation&lt;/a&gt; if you&amp;rsquo;re interested.&lt;/p&gt;
&lt;h2 id='remote-monitoring' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#remote-monitoring' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Remote Monitoring&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;All this transient infrastructure needs failsafe mechanisms to avoid orphaning resources. If a parent spins up a runner, that runner must take care of idling itself down when no work is present and handle failsafe shutdowns if it can no longer contact the parent node.&lt;/p&gt;

&lt;p&gt;Likewise, we need to shutdown runners when parents are rolled for new deploys as we must guarantee we&amp;rsquo;re running the same code across the cluster.&lt;/p&gt;

&lt;p&gt;We also have active callers in many cases that are awaiting the result of work on runners that could go down for any reason.&lt;/p&gt;

&lt;p&gt;There&amp;rsquo;s a lot to monitor here.&lt;/p&gt;

&lt;p&gt;There&amp;rsquo;s also a number of failure modes that make this sound like a harrowing experience to implement. Fortunately Elixir has all the primitives to make this an easy task thanks to the Erlang VM. Namely, we get the following for free:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Process monitoring and supervision – we know when things go bad. Whether on a node-local process, or one across the cluster
&lt;/li&gt;&lt;li&gt;Node monitoring – we know when nodes come up, and when nodes go away
&lt;/li&gt;&lt;li&gt;Declarative and controlled app startup and shutdown - we carefully control the startup and shutdown sequence of applications as a matter of course. This allows us to gracefully shutdown active runners when a fresh deploy is triggered, while giving them time to finish their work
&lt;/li&gt;&lt;/ul&gt;

&lt;p&gt;We&amp;rsquo;ll cover the internal implementation details in a future deep-dive post. For now, feel free to poke around &lt;a href='https://github.com/phoenixframework/flame' title=''&gt;the flame source&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id='whats-next' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#whats-next' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;What&amp;rsquo;s Next&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;We&amp;rsquo;re just getting started with the Elixir FLAME library, but it&amp;rsquo;s ready to try out now. In the future  look for more advance pool growth techniques, and deep dives into how the Elixir implementation works. You can also find me &lt;a href='https://twitter.com/chris_mccord' title=''&gt;@chris_mccord&lt;/a&gt; to chat about implementing the FLAME pattern in your language of choice.&lt;/p&gt;

&lt;p&gt;Happy coding!&lt;/p&gt;

&lt;p&gt;–Chris&lt;/p&gt;</content>
    </entry>
    <entry>
        <title>The risks of building apps on ChatGPT</title>
        <link rel="alternate" href="https://fly.io/blog/the-risks-of-building-apps-on-chatgpt/"/>
        <id>https://fly.io/blog/the-risks-of-building-apps-on-chatgpt/</id>
        <published>2023-12-05T00:00:00+00:00</published>
        <updated>2025-05-20T16:26:25+00:00</updated>
        <media:thumbnail url="https://fly.io/blog/the-risks-of-building-apps-on-chatgpt/assets/risks-building-on-chatgpt-thumb.webp"/>
        <content type="html">&lt;div class="lead"&gt;&lt;p&gt;If AI will play an essential role in your application, then consider using a self-hosted, open source model instead of a proprietary and externally hosted one. In this post we explore some of the risks for the latter option. We’re Fly.io. We put your code into lightweight microVMs on our own hardware &lt;a href="https://fly.io/docs/reference/regions/" title=""&gt;around the world&lt;/a&gt;. &lt;a href="https://fly.io/docs/speedrun/" title=""&gt;Check us out&lt;/a&gt;—your app can be deployed in minutes.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;The topic of &amp;ldquo;AI&amp;rdquo; gets a lot of attention and press. Coverage ranges from apocalyptic warnings to Utopian predictions. The truth, as always, is likely somewhere in the middle. As developers, we are the ones that either imagine ways that AI can be used to enhance our products or the ones doing the herculean tasks of implementing it inside our companies.&lt;/p&gt;

&lt;p&gt;I believe the following statement to be true:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;AI won’t replace humans — but humans with AI will replace humans without AI.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;I believe this can be extended to many products and services and the companies that create them. Let&amp;rsquo;s express it this way:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;AI won’t replace businesses — but businesses with AI will replace businesses without AI.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Today I&amp;rsquo;m assuming your business would benefit from using AI. Or, at the very least, your C-levels have decreed from on high that thou must integrateth with AI. With that out of the way, the next question is how you&amp;rsquo;re meant to do it. This post is an argument to build on top of open source language models instead of closed models that you rent access to. We&amp;rsquo;ll take a look at what convinced me.&lt;/p&gt;
&lt;h2 id='but-openai-is-the-market-leader' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#but-openai-is-the-market-leader' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;But OpenAI is the market leader…&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;OpenAI, the creators of the famous ChatGPT, are the strong market leaders in this category. Why wouldn&amp;rsquo;t you want to use the best in the business?&lt;/p&gt;

&lt;p&gt;Early on, stories of private corporate documents being uploaded by employees and then finding that private information leaking out to general ChatGPT users was a real black eye. &lt;a href='https://www.sciencealert.com/many-companies-are-banning-chatgpt-this-is-why' title=''&gt;Companies began banning employees from using ChatGPT for work&lt;/a&gt;. It exposed that people&amp;rsquo;s interactions with ChatGPT were being used as training data for future versions of the model.&lt;/p&gt;

&lt;p&gt;In response, OpenAI recently announced an &lt;a href='https://openai.com/enterprise' title=''&gt;Enterprise&lt;/a&gt; offering promising that no Enterprise customer data is used for training.&lt;/p&gt;

&lt;p&gt;With the top objection addressed, it should be smooth sailing for wide adoption, right?&lt;/p&gt;

&lt;p&gt;Not so fast.&lt;/p&gt;

&lt;p&gt;While an Enterprise offering may address that concern, there are other subtle reasons to not use OpenAI, or other closed models, that can&amp;rsquo;t be resolved by vague statements of enterprise privacy.&lt;/p&gt;
&lt;h2 id='what-are-the-risks-for-building-on-top-of-openai' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#what-are-the-risks-for-building-on-top-of-openai' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;What are the risks for building on top of OpenAI?&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;Let&amp;rsquo;s briefly outline the risks we take on when relying on a company like OpenAI for critical AI features in our applications.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong class='font-semibold text-navy-950'&gt;Single provider risk&lt;/strong&gt;: Relying deeply on an external service that plays a critical role in our business is risky. The integration is not easily swapped out for another service if needed. Additionally, we don&amp;rsquo;t want part of our &amp;ldquo;secret sauce&amp;rdquo; to actually be another company&amp;rsquo;s product. That&amp;rsquo;s some seriously shaky ground! They &lt;em&gt;want&lt;/em&gt; to sell the same thing to our competitors too.
&lt;/li&gt;&lt;li&gt;&lt;strong class='font-semibold text-navy-950'&gt;Regulation or Policy change risk&lt;/strong&gt;: &amp;ldquo;AI&amp;rdquo; is being talked about a lot in politics. What&amp;rsquo;s acceptable today may be deemed &amp;ldquo;not allowed&amp;rdquo; in the future and a corporation providing a newly regulated service must comply.
&lt;/li&gt;&lt;li&gt;&lt;strong class='font-semibold text-navy-950'&gt;Financial risk&lt;/strong&gt;: &lt;a href='https://www.washingtonpost.com/technology/2023/06/05/chatgpt-hidden-cost-gpu-compute/' title=''&gt;AI chatbots lose money on every chat.&lt;/a&gt; If the financial models that make our business profitable are built on impossible to maintain prices, then our business model may be at risk when it&amp;rsquo;s time to &amp;ldquo;make the AI engine profitable&amp;rdquo; like we&amp;rsquo;ve seen happen time and time again with every industry from cookware to video games. What might the true cost be? We don&amp;rsquo;t know. &amp;lsquo;Nuff said.
&lt;/li&gt;&lt;li&gt;&lt;strong class='font-semibold text-navy-950'&gt;Governance and leadership risk&lt;/strong&gt;: The co-founder and CEO of OpenAI, &lt;a href='https://openai.com/blog/openai-announces-leadership-transition' title=''&gt;Sam Altman, was forced out of his own company by a coup from his board&lt;/a&gt;. This was later resolved with both &lt;a href='https://openai.com/blog/openai-announces-leadership-transition' title=''&gt;Sam Altman and Greg Brockman returning&lt;/a&gt;. This exposes another risk we don&amp;rsquo;t often consider with our providers. More on this later.
&lt;/li&gt;&lt;/ul&gt;

&lt;p&gt;Let&amp;rsquo;s look a bit closer at the &amp;ldquo;Single provider risk&amp;rdquo;.&lt;/p&gt;
&lt;h2 id='single-provider-risk' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#single-provider-risk' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Single provider risk&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;For hobby usage, proof of concept work, and personal experiments, by all means, use ChatGPT! I do and I expect to continue to as well. It&amp;rsquo;s fantastic for prototyping, it&amp;rsquo;s trivial to set up, and it allows you to throw ink on canvas so much more quickly than any other option out there.&lt;/p&gt;

&lt;p&gt;Up until recently, I was all gung-ho for ChatGPT being integrated into my apps. What happened? November 2023 happened. It was a very bad month for OpenAI.&lt;/p&gt;

&lt;p&gt;I created a &lt;a href='https://fly.io/phoenix-files/created-my-personal-ai-fitness-trainer-in-2-days/' title=''&gt;Personal AI Fitness Trainer&lt;/a&gt; powered by ChatGPT and on the morning of November 8th, I asked my personal trainer about the workout for the day and it failed. OpenAI was having a bad day with an outage.&lt;/p&gt;

&lt;p&gt;I don&amp;rsquo;t fault someone for having a bad day. At some point, downtime happens to the best of us. And given enough time, it happens to &lt;strong class='font-semibold text-navy-950'&gt;all&lt;/strong&gt; of us. But when possible, I want to prevent someone &lt;em&gt;else&amp;rsquo;s&lt;/em&gt; bad day from becoming &lt;em&gt;my&lt;/em&gt; bad day too.&lt;/p&gt;
&lt;h3 id='evaluating-a-critical-dependency' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#evaluating-a-critical-dependency' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Evaluating a critical dependency&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;In my case, my personal fitness trainer being unavailable was a minor inconvenience, but I managed. However, it gave me pause. If I had built an AI fitness trainer as a service, that outage would be a much bigger deal and there would be nothing I could have done to fix it until the ChatGPT API came back up.&lt;/p&gt;

&lt;p&gt;With services like a Personal AI Fitness Trainer, the AI component is the primary focus and main value proposition of the app. That&amp;rsquo;s pretty darn critical! If that AI service is interrupted, significantly altered (say, by the model suddenly refusing my requests for fitness information in ways that worked before) or my desired usage is denied (without warning or reason), the application is useless. That&amp;rsquo;s an existential threat that could make my app evaporate overnight without warning.&lt;/p&gt;

&lt;p&gt;This highlights the risk of having a critical dependency on an external service.&lt;/p&gt;

&lt;p&gt;Modern applications depend on many services, both internal and external. But how &lt;strong class='font-semibold text-navy-950'&gt;critical&lt;/strong&gt; that dependency is matters.&lt;/p&gt;

&lt;p&gt;Let&amp;rsquo;s take a &lt;em&gt;very&lt;/em&gt; simple application as an example. The application has a critical dependency on the database and both the app and database have a critical dependency on the underlying VMs/machines/provider. These critical dependencies are so common that we seldom think about them because we deal with them every day we come to work. It&amp;rsquo;s just how things are.&lt;/p&gt;

&lt;p&gt;&lt;img alt="Diagram showing an application stack of hosting &amp;gt; Database &amp;gt; My Application and weak dependencies on logging, error reporting, etc. Then a critical dependency on an external AI as a Service. " src="/blog/the-risks-of-building-apps-on-chatgpt/assets/critical-dependency-vs-weak.png" /&gt;&lt;/p&gt;

&lt;p&gt;The danger comes when we draw a critical dependency line to an &lt;strong class='font-semibold text-navy-950'&gt;external&lt;/strong&gt; &lt;strong class='font-semibold text-navy-950'&gt;service&lt;/strong&gt;. If the service has a hiccup or the network between my app and their service starts dropping all my packets, the entire application goes down. Someone else&amp;rsquo;s bad day gets spread around when that happens. 😞&lt;/p&gt;

&lt;p&gt;In order to protect ourselves from a risk like that, we should diversify our reliance away from a single external provider. How do we do that? We&amp;rsquo;ll come back to this later.&lt;/p&gt;
&lt;h3 id='we-are-not-without-dependencies' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#we-are-not-without-dependencies' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;We are not without dependencies&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;It&amp;rsquo;s really common for apps to have external dependencies. The question is how critical to our service are those dependencies?&lt;/p&gt;

&lt;p&gt;What happens to the application when the external log aggregation service, email service, and error reporting services are all unreachable? If the app is designed well, then users may have a slightly degraded experience or, best case, the users won&amp;rsquo;t even notice the issues at all!&lt;/p&gt;

&lt;p&gt;The key factor is these external services are not essential to our application functioning.&lt;/p&gt;
&lt;h2 id='regulation-or-policy-change-risk' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#regulation-or-policy-change-risk' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Regulation or Policy change risk&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;Our industry has a lot of misconceptions, fear, uncertainty, and doubt around the idea of regulation, but sometimes it&amp;rsquo;s justified. I don&amp;rsquo;t want you to think about regulation as a scary thing that yanks away control. Instead, let&amp;rsquo;s think about regulation as when a government body gets involved to disallow businesses from doing or engaging in specific activities. Given that our industry has been so self-defined for so long, this feels like an existential threat. However, this is a good thing when we think about vehicle safety standards (you don&amp;rsquo;t want your 4-ton mass of metal exploding while traveling at 70 mph), pollution, health risks, and more. It&amp;rsquo;s a careful balance.&lt;/p&gt;

&lt;p&gt;Ironically, Sam Altman has been a major proponent &lt;a href='https://www.forbes.com/sites/johannacostigan/2023/06/13/openais-sam-altman-makes-global-call-for-ai-regulation-and-includes-china/?sh=4fc007421b47' title=''&gt;for government regulation&lt;/a&gt; of the AI industry. Why would he want that?&lt;/p&gt;

&lt;p&gt;It turns out that &lt;a href='https://www.cato.org/policy-analysis/regulatory-protectionism-hidden-threat-free-trade' title=''&gt;regulation can also be used as a form of protectionism&lt;/a&gt;. Or, put another way, when the people with an early lead see that &lt;a href='https://www.semianalysis.com/p/google-we-have-no-moat-and-neither' title=''&gt;they aren&amp;rsquo;t defensible against advances with open source AI models&lt;/a&gt;, they want to pull up the ladders behind them and have the government make it legally harder, or impossible, for competitors to catch up to them.&lt;/p&gt;

&lt;p&gt;If Altman&amp;rsquo;s efforts are successful, then companies who create AI can expect government involvement and oversight. Added licensing requirements and certifications would raise the cost of starting a competing business.&lt;/p&gt;

&lt;p&gt;At this point you may be thinking something like &amp;ldquo;but all of that is theoretical Mark, how would this affect my business&amp;rsquo; use of AI today?&amp;rdquo;&lt;/p&gt;

&lt;p&gt;Introducing an external organization that can dictate changes to an AI product risks breaking an existing company&amp;rsquo;s applications or significantly reducing the effectiveness of the application. And those changes may come without notice or warning.&lt;/p&gt;

&lt;p&gt;Additionally, if my business is built on an external AI system protected from competition by regulators, that adds a significant risk. If they are now the only game in town, they can set whatever price they want.&lt;/p&gt;
&lt;h2 id='governance-and-leadership-risk' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#governance-and-leadership-risk' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Governance and leadership risk&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;In the week following the OpenAI outage (November 17th to be precise), the entire tech industry was upended for most of a week following a blog post on the OpenAI blog &lt;a href='https://openai.com/blog/openai-announces-leadership-transition' title=''&gt;announcing that the OpenAI board fired the co-founder and CEO, Sam Altman&lt;/a&gt;. Then &lt;a href='https://www.forbes.com/sites/richardnieva/2023/11/17/openai-president-and-co-founder-quits-over-sam-altman-firing/?sh=34fe4b621d57' title=''&gt;Greg Brockman, co-founder and acting President resigned in protest&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;OpenAI is partnered with Microsoft and on Nov 20, 2023, &lt;a href='https://twitter.com/satyanadella/status/1726509045803336122' title=''&gt;Satya Nadella (CEO of Microsoft) posted the following on X&lt;/a&gt; (formerly Twitter):&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;We remain committed to our partnership with OpenAI (OAI) and have confidence in our product roadmap, our ability to continue to innovate with everything we announced at Microsoft Ignite, and in continuing to support our customers and partners. We look forward to getting to know Emmett Shear and OAI&amp;rsquo;s new leadership team and working with them. And &lt;strong class='font-semibold text-navy-950'&gt;we’re extremely excited to share the news that Sam Altman and Greg Brockman, together with colleagues, will be joining Microsoft to lead a new advanced AI research team.&lt;/strong&gt; We look forward to moving quickly to provide them with the resources needed for their success.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Microsoft nearly &lt;a href='https://en.wikipedia.org/wiki/Acqui-hiring' title=''&gt;acqui-hired&lt;/a&gt; OpenAI for $0! That&amp;rsquo;s some serious business Jujutsu.&lt;/p&gt;

&lt;p&gt;In the end, after 12 days of very public corporate chaos, &lt;a href='https://openai.com/blog/sam-altman-returns-as-ceo-openai-has-a-new-initial-board' title=''&gt;Sam Altman and Greg Brockman returned to OpenAI at their previous leadership positions&lt;/a&gt; as if nothing happened (save the firing of the rest of the board).&lt;/p&gt;

&lt;p&gt;With all the drama and uncertainty resolved, you may say, &amp;ldquo;it all worked out in the end, right? So what&amp;rsquo;s the problem?&amp;rdquo;&lt;/p&gt;

&lt;p&gt;This highlights the risk of building &lt;em&gt;any&lt;/em&gt; critical business system on a product offered and hosted by an external company. When we do that, we implicitly take on all of that company&amp;rsquo;s risks in addition to the risks our business already has! In this case, it&amp;rsquo;s taking on all the risks of OpenAI while getting none of their financial benefits!&lt;/p&gt;
&lt;h2 id='whats-the-alternative' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#whats-the-alternative' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;What&amp;rsquo;s the alternative?&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;The thing big AI providers like OpenAI and Google seem to fear most is competition from open source AI models. And they should be afraid. Open source AI models continue to develop at a rapid pace (there&amp;rsquo;s huge incremental improvements on a weekly basis) and, most importantly, they can be self-hosted.&lt;/p&gt;

&lt;p&gt;Additionally, it&amp;rsquo;s not out of reach for us to &lt;a href='https://huggingface.co/docs/transformers/training' title=''&gt;fine tune&lt;/a&gt; a general model to better fit our  needs by adding and removing capabilities rather than hope that the capabilities we need suddenly manifest for us.&lt;/p&gt;

&lt;p&gt;Doesn&amp;rsquo;t this all sound like the classic argument in favor of open source?&lt;/p&gt;

&lt;p&gt;If we have the model and can host it ourselves, no one can take it away. When we self-host it, we are protected from:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;service interruptions from an external provider for a critical system
&lt;/li&gt;&lt;li&gt;changes in licensing or usage fees (such as your provider suddenly doubling inference costs without warning via an email sent at 3AM)
&lt;/li&gt;&lt;li&gt;government regulators dictate a change to the model that negatively affects our use case (assuming our use isn&amp;rsquo;t breaking the law of course)
&lt;/li&gt;&lt;li&gt;company policy changes that change the behavior of the model we rely on
&lt;/li&gt;&lt;li&gt;rogue boards or a leadership crisis that impacts a provider
&lt;/li&gt;&lt;/ul&gt;

&lt;p&gt;Using an open source and self-hosted model insulates us from these external risks.&lt;/p&gt;
&lt;h2 id='i-still-need-gpus' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#i-still-need-gpus' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;I still need GPUs!&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;Getting dedicated access to a GPU is more expensive than renting limited time on OpenAI&amp;rsquo;s servers. That&amp;rsquo;s why a hobby or personal project is better off paying for the brief bits of time when needed.&lt;/p&gt;

&lt;p&gt;But let&amp;rsquo;s face it.&lt;/p&gt;

&lt;p&gt;If you really want to integrate AI into your business, you need to host your own models. You can&amp;rsquo;t control third party privacy policies, but you can control your own policies when you are the one doing your own inference with your own models. Ideally this means getting your own GPUs and incurring the capital expenditure and operations expenditures, but thankfully we&amp;rsquo;re in the future. We have the cloud now. There&amp;rsquo;s many options you can use for renting GPU access from other companies. This is supported in the big clouds as well as Fly.io. You can check out our &lt;a href='https://fly.io/docs/about/pricing/#gpus-and-fly-machines' title=''&gt;GPU offerings here&lt;/a&gt;.&lt;/p&gt;
&lt;figure class="post-cta"&gt;
  &lt;figcaption&gt;
    &lt;h1&gt;Fly.io also offer GPUs&lt;/h1&gt;
    &lt;p&gt;Running inference on your own hosted models can help de-risk critical AI integrations.&lt;/p&gt;
      &lt;a class="btn btn-lg" href="https://fly.io/docs/about/pricing/#gpus-and-fly-machines"&gt;
        GPU resource prices
      &lt;/a&gt;
  &lt;/figcaption&gt;
  &lt;div class="image-container"&gt;
    &lt;img src="/static/images/cta-rabbit.webp" srcset="/static/images/[email protected] 2x" alt=""&gt;
  &lt;/div&gt;
&lt;/figure&gt;

&lt;h2 id='closing-thoughts' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#closing-thoughts' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Closing thoughts&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;It&amp;rsquo;s important to take advantage of AI in our applications so we can reap the benefits. It can give us an important edge in the market! However, we should be extra cautious of building any critical features on a product offered by a proprietary external business. &lt;a href='https://www.msn.com/en-us/money/companies/sam-altman-chaos-helped-openai-rivals-says-hugging-face-ceo-cl%C3%A9ment-delangue/ar-AA1kIFQP' title=''&gt;Others are considering the risks of building on OpenAI as well&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Your specific level of risk depends on how central the AI aspect is to your business. If it&amp;rsquo;s a central component like in my Personal AI Fitness Trainer, then I risk losing all my customers and even the company if any of the above mentioned risk factors happen to  my AI provider. That&amp;rsquo;s an existential risk that I can&amp;rsquo;t do anything about without taking emergency heroic efforts.&lt;/p&gt;

&lt;p&gt;If the AI is sprinkled around the edges of the business, then suddenly losing it won&amp;rsquo;t kill the company. However, if the AI isn&amp;rsquo;t being well utilized, then the business may be at risk to competitors who place a bigger bet and take a bigger swing with AI.&lt;/p&gt;

&lt;p&gt;Oh, what interesting times we live in! 🙃&lt;/p&gt;</content>
    </entry>
    <entry>
        <title>Print on Demand</title>
        <link rel="alternate" href="https://fly.io/blog/print-on-demand/"/>
        <id>https://fly.io/blog/print-on-demand/</id>
        <published>2023-11-29T00:00:00+00:00</published>
        <updated>2025-05-20T16:26:25+00:00</updated>
        <media:thumbnail url="https://fly.io/blog/print-on-demand/assets/print-on-demand-thumb.webp"/>
        <content type="html">&lt;div class="lead"&gt;&lt;p&gt;Save money by using appliance machines to only allocate memory and other machine resources when you actually need them.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;Scaling discussions often lead to recommendations to add more memory, more CPU, more machines, more regions, more, more, more.&lt;/p&gt;

&lt;p&gt;This post is different.  It focuses instead on the idea of decomposing parts of your applications into event handlers, starting up Machines to handle the events when needed, and stopping them when the event is done.  Along the way we will see how a few built in Fly.io primitives make this easy.&lt;/p&gt;

&lt;p&gt;To make the discussion concrete, we are going to focus on a common requirement: generation of PDFs from web pages.  The code that we will introduce isn&amp;rsquo;t merely an example produced in support of a blog post - rather it is code that was extracted from a production application, and packaged up into an appliance that you can deploy in minutes to add PDF generation to your existing application.&lt;/p&gt;

&lt;p&gt;But before we dive in, let&amp;rsquo;s back up a bit.&lt;/p&gt;
&lt;h2 id='motivation' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#motivation' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Motivation&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;Normally the way this is approached is to start with a tool like &lt;a href='https://github.com/puppeteer/puppeteer' title=''&gt;Puppeteer&lt;/a&gt;, &lt;a href='https://github.com/Studiosity/grover#readme' title=''&gt;Grover&lt;/a&gt;, &lt;a href='https://playwright.dev/' title=''&gt;Playwright&lt;/a&gt;, &lt;a href='https://github.com/bitcrowd/chromic_pdf' title=''&gt;ChromicPDF&lt;/a&gt;, or &lt;a href='https://spatie.be/docs/browsershot/v2/introduction' title=''&gt;BrowserShot&lt;/a&gt;.  These and other tools ultimately launch a browser like &lt;a href='https://developer.chrome.com/articles/new-headless/' title=''&gt;Chrome headless&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Now a few things about Chrome itself:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;It likely is bigger than your entire web server.
&lt;/li&gt;&lt;li&gt;It likely uses more memory than you see with a typical load on your server.
&lt;/li&gt;&lt;li&gt;All total, people using your server likely spend much less time generating PDFs than they do using the rest of your application. 
&lt;/li&gt;&lt;/ul&gt;

&lt;p&gt;Taken together, this makes splitting PDF generation into a completely separate application an easy win.  With a smaller image, your application will start faster.  Memory usage will be more predictable, and the memory needed to generate PDFs will only be allocated when needed and can be scaled separately.&lt;/p&gt;
&lt;h2 id='diving-in' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#diving-in' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Diving in&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;Without further ado, the entire application is available on GitHub as &lt;a href='https://github.com/fly-apps/pdf-appliance/#readme' title=''&gt;fly-apps/pdf-appliance&lt;/a&gt;.  Installation is a simple matter of: clone repository, create app, adjust config, deploy, and scale.&lt;/p&gt;

&lt;p&gt;Next, you will need to integrate this into your application.  All that is needed is to reply to requests that are intended to produce a PDF with a &lt;a href='https://fly.io/docs/reference/dynamic-request-routing/#the-fly-replay-response-header' title=''&gt;fly-replay&lt;/a&gt; response header.  This can either be done on individual application routes / controller actions, or it can be done globally via either middleware or a front end like &lt;a href='https://www.nginx.com/' title=''&gt;NGINX&lt;/a&gt;.  You can find a few examples in the &lt;a href='https://github.com/fly-apps/pdf-appliance/#integrate-with-your-existing-application' title=''&gt;README&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;And, that&amp;rsquo;s it.  The most you might consider doing is issuing an additional HTTP request in anticipation of the user selecting what they want to print as this will &lt;a href='https://github.com/fly-apps/pdf-appliance/#preloading-optional' title=''&gt;preload the machine&lt;/a&gt;.&lt;/p&gt;
&lt;figure class="post-cta"&gt;
  &lt;figcaption&gt;
    &lt;h1&gt;Scale at your own pace&lt;/h1&gt;
    &lt;p&gt;Deploy your project in a few minutes with Fly Launch. Then do more with Fly Machines.&lt;/p&gt;
      &lt;a class="btn btn-lg" href="https://fly.io/docs/"&gt;
        Run your entire stack near your users
      &lt;/a&gt;
  &lt;/figcaption&gt;
  &lt;div class="image-container"&gt;
    &lt;img src="/static/images/cta-turtle.webp" srcset="/static/images/[email protected] 2x" alt=""&gt;
  &lt;/div&gt;
&lt;/figure&gt;


&lt;p&gt;If you don&amp;rsquo;t have an application handy, you can try a demo.  Go to &lt;a href='https://smooth.fly.dev/' title=''&gt;smooth.fly.dev&lt;/a&gt;.  Click on Demo, then on Publish, and finally on Invoices to see a PDF.  The PDF you see will likely be underwhelming as you would need to enter students, entries, packages and options to fill out the page.  But click refresh anyway and see how fast it responds.  If you want to explore further, links to the &lt;a href='https://smooth.fly.dev/showcase/docs/' title=''&gt;documentation&lt;/a&gt; and &lt;a href='https://github.com/rubys/showcase#readme' title=''&gt;code&lt;/a&gt; can be found on the front page.&lt;/p&gt;
&lt;h2 id='implementation-details' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#implementation-details' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Implementation Details&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;The basic flow starts with a request comes into your app for a PDF.  That request is replayed to the PDF appliance.  A Chrome instance in that app then issues a second request to your app for the same URL minus the &lt;code&gt;.pdf&lt;/code&gt; extension and then converts the HTML which it receives in response to a PDF.  That PDF is then returned as the response to the original request.&lt;/p&gt;

&lt;p&gt;A single Google Chrome instance per machine will be reused across all requests, which itself is faster than starting a new instance per request.  As all HTTP headers will be passed back to your application, this will seamlessly work with your existing session, cookies, and basic authentication.&lt;/p&gt;

&lt;p&gt;Starting up a machine on demand is handled by the &lt;code&gt;auto_stop_machines&lt;/code&gt; setting in your &lt;code&gt;fly.toml&lt;/code&gt;.  With this in place, machines can confidently exit when idle, secure in the knowledge that they will be restarted when needed.  See the &lt;a href='https://github.com/fly-apps/pdf-appliance/#scaling' title=''&gt;README&lt;/a&gt; for more information on scaling.&lt;/p&gt;

&lt;p&gt;Note that different machines can use different languages and frameworks.  This code is written in JavaScript and runs on Bun.  It was designed to support a Ruby on Rails app, but can be used with any app.&lt;/p&gt;
&lt;h2 id='a-reusable-pattern' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#a-reusable-pattern' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;A Reusable Pattern&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;If your app is small and your usage is low, scaling may not be much of a
concern, but as your need grow your first instinct shouldn&amp;rsquo;t merely be to throw
more hardware at the problem, but rather to partition the problem so that each
machine has a somewhat predictable capacity.&lt;/p&gt;

&lt;p&gt;Do this by taking a look at your application, and look for requests that are
somehow different than the rest.  Streaming audio and video files, handling websockets,
converting text to speech or performing other AI processing, long running
&amp;ldquo;background&amp;rdquo; computation, fetching static pages, producing PDFs, and updating
databases all have different profiles in terms of server load.&lt;/p&gt;

&lt;p&gt;It might even be helpful &amp;ndash; purely as a thought experiment &amp;ndash; to think of
replacing your main server with a proxy that does nothing more than route
requests to separate machines based on the type of workload performed.&lt;/p&gt;

&lt;p&gt;Once you have come up with an allocation of functions performed to pools of
machines, Fly-Replay is but one tool available to you.  There is also a
&lt;a href='https://fly.io/docs/machines/working-with-machines/' title=''&gt;Machines API&lt;/a&gt; that will
enable you to orchestrate whatever topology you can come up with.
&lt;a href='https://fly.io/laravel-bytes/cost-effective-queue-workers-with-fly-io-machines/' title=''&gt;Cost-Effective Queue Workers With Fly.io
Machines&lt;/a&gt;
gives a preview of what that would look like with Laravel.&lt;/p&gt;</content>
    </entry>
</feed>

Raw text

<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom" xmlns:media="http://search.yahoo.com/mrss/">
  <title>The Fly Blog</title>
  <subtitle>News, tips, and tricks from the team at Fly</subtitle>
  <id>https://fly.io/blog/</id>
  <link href="https://fly.io/blog/"/>
  <link href="https://fly.io/blog/" rel="self"/>
  <updated>2025-06-20T00:00:00+00:00</updated>
  <author>
    <name>Fly</name>
  </author>
  <entry>
    <title>Phoenix.new – The Remote AI Runtime for Phoenix</title>
    <link rel="alternate" href="https://fly.io/blog/phoenix-new-the-remote-ai-runtime/"/>
    <id>https://fly.io/blog/phoenix-new-the-remote-ai-runtime/</id>
    <published>2025-06-20T00:00:00+00:00</published>
    <updated>2025-07-03T17:13:29+00:00</updated>
    <media:thumbnail url="https://fly.io/blog/phoenix-new-the-remote-ai-runtime/assets/phoenixnew-thumb.png"/>
    <content type="html">&lt;div class="lead"&gt;&lt;p&gt;I’m Chris McCord, the creator of Elixir’s Phoenix framework. For the past several months, I’ve been working on a skunkworks project at Fly.io, and it’s time to show it off.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;I wanted LLM agents to work just as well with Elixir as they do with Python and JavaScript. Last December, in order to figure out what that was going to take, I started a little weekend project to find out how difficult it would be to build a coding agent in Elixir.&lt;/p&gt;

&lt;p&gt;A few weeks later, I had it spitting out working Phoenix applications and driving a full in-browser IDE. I knew this wasn&amp;rsquo;t going to stay a weekend project.&lt;/p&gt;

&lt;p&gt;If you follow me on Twitter, you&amp;rsquo;ve probably seen me teasing this work as it picked up steam. We&amp;rsquo;re at a point where we&amp;rsquo;re pretty serious about this thing, and so it&amp;rsquo;s time to make a formal introduction.&lt;/p&gt;

&lt;p&gt;World, meet &lt;a href='https://phoenix.new' title=''&gt;Phoenix.new&lt;/a&gt;, a batteries-included fully-online coding agent tailored to Elixir and Phoenix. I think it&amp;rsquo;s going to be the fastest way to build collaborative, real-time applications.&lt;/p&gt;

&lt;p&gt;Let&amp;rsquo;s see it in action:&lt;/p&gt;
&lt;div class="youtube-container" data-exclude-render&gt;
  &lt;div class="youtube-video"&gt;
    &lt;iframe
      width="100%"
      height="100%"
      src="https://www.youtube.com/embed/du7GmWGUM5Y"
      frameborder="0"
      allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture"
      allowfullscreen&gt;
    &lt;/iframe&gt;
  &lt;/div&gt;
&lt;/div&gt;

&lt;h2 id='whats-interesting-about-phoenix-new' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#whats-interesting-about-phoenix-new' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;What&amp;rsquo;s Interesting About Phoenix.new&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;First, even though it runs entirely in your browser, Phoenix.new gives both you and your agent a root shell, in an ephemeral virtual machine (a &lt;a href='https://fly.io/docs/machines/overview/' title=''&gt;Fly Machine&lt;/a&gt;) that gives our agent loop free rein to install things and run programs  — without any risk of messing up your local machine. You don&amp;rsquo;t think about any of this; you just open up the VSCode interface, push the shell button, and there you are, on the isolated machine you share with the Phoenix.new agent.&lt;/p&gt;

&lt;p&gt;Second, it&amp;rsquo;s an agent system I built specifically for Phoenix. Phoenix is about real-time collaborative applications, and Phoenix.new knows what that means. To that end, Phoenix.new includes, in both its UI and its agent tools, a full browser. The Phoenix.new agent uses that browser &amp;ldquo;headlessly&amp;rdquo; to check its own front-end changes and interact with the app. Because it&amp;rsquo;s a full browser, instead of trying to iterate on screenshots, the agent sees real page content and JavaScript state – with or without a human present.&lt;/p&gt;
&lt;h2 id='what-root-access-gets-us' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#what-root-access-gets-us' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;What Root Access Gets Us&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;Agents build software the way you did when you first got started, the way you still do today when you prototype things. They don&amp;rsquo;t carefully design Docker container layers and they don&amp;rsquo;t really do release cycles. An agent wants to pop a shell and get its fingernails dirty.&lt;/p&gt;

&lt;p&gt;A fully isolated virtual machine means Phoenix.new&amp;rsquo;s fingernails can get &lt;em&gt;arbitrarily dirty.&lt;/em&gt; If it wants to add a package to &lt;code&gt;mix.exs&lt;/code&gt;, it can do that and then run &lt;code&gt;mix phx.server&lt;/code&gt; or &lt;code&gt;mix test&lt;/code&gt; and check the output. Sure. Every agent can do that. But if it wants to add an APT package to the base operating system, it can do that too, and make sure it worked. It owns the whole environment.&lt;/p&gt;

&lt;p&gt;This offloads a huge amount of tedious, repetitive work.&lt;/p&gt;

&lt;p&gt;At his &lt;a href='https://youtu.be/LCEmiRjPEtQ?si=sR_bdu6-AqPXSNmY&amp;t=1902' title=''&gt;AI Startup School talk last week&lt;/a&gt;, Andrej Karpathy related his experience of building a restaurant menu visualizer, which takes camera pictures of text menus and transforms all the menu items into pictures. The code, which he vibe-coded with an LLM agent, was the easy part; he had it working in an afternoon. But getting the app online took him a whole week.&lt;/p&gt;

&lt;p&gt;With Phoenix.new, I&amp;rsquo;m taking dead aim at this problem. The apps we produce live in the cloud from the minute they launch. They have private, shareable URLs (we detect anything the agent generates with a bound port and give it a preview URL underneath &lt;code&gt;phx.run&lt;/code&gt;, with integrated port-forwarding), they integrate with Github, and they inherit all the infrastructure guardrails of Fly.io: hardware virtualization, WireGuard, and isolated networks.&lt;/p&gt;
&lt;div class="callout"&gt;&lt;p&gt;Github’s &lt;code&gt;gh&lt;/code&gt; CLI is installed by default. So the agent knows how to clone any repo, or browse issues, and you can even authorize it for internal repositories to get it working with your team’s existing projects and dependencies.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;Full control of the environment also closes the loop between the agent and deployment. When Phoenix.new boots an app, it watches the logs, and tests the application. When an action triggers an error, Phoenix.new notices and gets to work.&lt;/p&gt;
&lt;h2 id='watch-it-build-in-real-time' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#watch-it-build-in-real-time' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Watch It Build In Real Time&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;&lt;a href='https://phoenix.new' title=''&gt;Phoenix.new&lt;/a&gt; can interact with web applications the way users do: with a real browser.&lt;/p&gt;

&lt;p&gt;The Phoenix.new environment includes a headless Chrome browser that our agent knows how to drive. Prompt it to add a front-end feature to your application, and it won&amp;rsquo;t just sketch the code out and make sure it compiles and lints. It&amp;rsquo;ll pull the app up itself and poke at the UI, simultaneously looking at the page content, JavaScript state, and server-side logs.&lt;/p&gt;

&lt;p&gt;Phoenix is all about &lt;a href='https://fly.io/blog/how-we-got-to-liveview/' title=''&gt;&amp;ldquo;live&amp;rdquo; real-time&lt;/a&gt; interactivity, and gives us seamless live reload. The user interface for Phoenix.new itself includes a live preview of the app being worked on, so you can kick back and watch it build front-end features incrementally. Any other &lt;code&gt;.phx.run&lt;/code&gt; tabs you have open also update as it goes. It&amp;rsquo;s wild.&lt;/p&gt;
&lt;video title="agent interacting with web" autoplay="autoplay" loop="loop" muted="muted" playsinline="playsinline" disablePictureInPicture="true" class="mb-8" src="/blog/phoenix-new-the-remote-ai-runtime/assets/webjs.mp4"&gt;&lt;/video&gt;

&lt;h2 id='not-just-for-vibe-coding' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#not-just-for-vibe-coding' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Not Just For Vibe Coding&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;Phoenix.new can already build real, full-stack applications with WebSockets, Phoenix&amp;rsquo;s Presence features, and real databases. I&amp;rsquo;m seeing it succeed at business and collaborative applications right now.&lt;/p&gt;

&lt;p&gt;But there&amp;rsquo;s no fixed bound on the tasks you can reasonably ask it to accomplish. If you can do it with a shell and a browser, I want Phoenix.new to do it too. And it can do these tasks with or without you present.&lt;/p&gt;

&lt;p&gt;For example: set a &lt;code&gt;$DATABASE_URL&lt;/code&gt; and tell the agent about it. The agent knows enough to go explore it with &lt;code&gt;psql&lt;/code&gt;, and it&amp;rsquo;ll propose apps based on the schemas it finds. It can model Ecto schemas off the database. And if MySQL is your thing, the agent will just &lt;code&gt;apt install&lt;/code&gt; a MySQL client and go to town.&lt;/p&gt;

&lt;p&gt;Frontier model LLMs have vast world knowledge. They generalize extremely well. At ElixirConfEU, I did a &lt;a href='https://www.youtube.com/watch?v=ojL_VHc4gLk&amp;t=3923s' title=''&gt;demo vibe-coding Tetris&lt;/a&gt; on stage. Phoenix.new nailed it, first try, first prompt. It&amp;rsquo;s not like there&amp;rsquo;s gobs of Phoenix LiveView Tetris examples floating around the Internet! But lots of people have published Tetris code, and lots of people have written LiveView stuff, and 2025 LLMs can connect those dots.&lt;/p&gt;

&lt;p&gt;At this point you might be wondering – can I just ask it to build a Rails app? Or an Expo React Native app? Or Svelte? Or Go?&lt;/p&gt;

&lt;p&gt;Yes, you can.&lt;/p&gt;

&lt;p&gt;Our system prompt is tuned for Phoenix today, but all languages you care about are already installed. We&amp;rsquo;re still figuring out where to take this, but adding new languages and frameworks definitely ranks highly in my plans.&lt;/p&gt;
&lt;h2 id='our-async-agent-future' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#our-async-agent-future' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Our Async Agent Future&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;&lt;a href='https://fly.io/blog/youre-all-nuts/' title=''&gt;We&amp;rsquo;re at a massive step-change in developer workflows&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Agents can do real work, today, with or without a human present. Buckle up: the future of development, at least in the common case, probably looks less like cracking open a shell and finding a file to edit, and more like popping into a CI environment with agents working away around the clock.&lt;/p&gt;

&lt;p&gt;Local development isn&amp;rsquo;t going away. But there&amp;rsquo;s going to be a shift in where the majority of our iterations take place. I&amp;rsquo;m already using Phoenix.new to triage &lt;code&gt;phoenix-core&lt;/code&gt; Github issues and pick problems to solve. I close my laptop, grab a cup of coffee, and wait for a PR to arrive — Phoenix.new knows how PRs work, too. We&amp;rsquo;re already here, and this space is just getting started.&lt;/p&gt;

&lt;p&gt;This isn&amp;rsquo;t where I thought I&amp;rsquo;d end up when I started poking around. The Phoenix and LiveView journey was much the same. Something special was there and the projects took on a life of their own. I&amp;rsquo;m excited to share this work now, and see where it might take us. I can&amp;rsquo;t wait to see what folks build.&lt;/p&gt;
</content>
  </entry>
  <entry>
    <title>What are MCP Servers?</title>
    <link rel="alternate" href="https://fly.io/blog/mcps-everywhere/"/>
    <id>https://fly.io/blog/mcps-everywhere/</id>
    <published>2025-06-12T00:00:00+00:00</published>
    <updated>2025-06-12T10:05:13+00:00</updated>
    <media:thumbnail url="https://fly.io/blog/mcps-everywhere/assets/mcps-everywhere.jpg"/>
    <content type="html">&lt;div class="lead"&gt;&lt;div&gt;&lt;p&gt;With Fly.io, &lt;a href="https://fly.io/docs/speedrun/" title=""&gt;you can get your app running globally in a matter of minutes&lt;/a&gt;, and with MCP servers you can integrate with Claude, VSCode, Cursor and &lt;a href="https://modelcontextprotocol.io/clients"&gt;many more AI clients&lt;/a&gt;.  &lt;a href="https://fly.io/docs/mcp/" title=""&gt;Try it out for yourself&lt;/a&gt;!&lt;/p&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The introduction to &lt;a href='https://modelcontextprotocol.io/introduction' title=''&gt;Model Context Protocol&lt;/a&gt; starts out with:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;MCP is an open protocol that standardizes how applications provide context to LLMs. Think of MCP like a USB-C port for AI applications. Just as USB-C provides a standardized way to connect your devices to various peripherals and accessories, MCP provides a standardized way to connect AI models to different data sources and tools.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That paragraph, to me, is both comforting (&amp;ldquo;USB for LLM&amp;rdquo;? Cool! Got it!), and simultaneously vacuous (Um, but what do I actually &lt;em&gt;do&lt;/em&gt; with this?).&lt;/p&gt;

&lt;p&gt;I&amp;rsquo;ve been digging deeper and have come up with a few more analogies and observations that make sense to me. Perhaps one or more of these will help you.&lt;/p&gt;
&lt;h2 id='mcps-are-alexa-skills' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#mcps-are-alexa-skills' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;MCPs are Alexa Skills&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;You buy an Echo Dot and a Hue light. You plug them both in and connect them to your Wi-Fi. There is one more step you need to do: you need to install and enable the Philips Hue skill to connect the two.&lt;/p&gt;

&lt;p&gt;Now you might be using Siri or Google Assistant. Or you may want to connect a Ring Doorbell camera or Google Nest Thermostat. But the principle is the same, though the analogy is slightly stronger with a skill (which is a noun) as opposed to the action of pairing your Hue Bridge with Apple HomeKit (a verb).&lt;/p&gt;
&lt;h2 id='mcps-are-api-2-0' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#mcps-are-api-2-0' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;MCPs are API 2.0&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;HTTP 1.1 is simple. You send a request, you get a response. While there are cookies and sessions, it is pretty much stateless and therefore inefficient, as each request needs to establish a new connection. WebSockets and Server-Sent Events (SSE) mitigate this a bit.&lt;/p&gt;

&lt;p&gt;&lt;a href='https://en.wikipedia.org/wiki/HTTP/2' title=''&gt;HTTP 2.0&lt;/a&gt; introduces features like multiplexing and server push. When you visit a web page for the first time, your browser can now request all of the associated JavaScript, CSS, and images at once, and the server can respond in any order.&lt;/p&gt;

&lt;p&gt;APIs today are typically request/response. MCPs support multiplexing and server push.&lt;/p&gt;
&lt;h2 id='mcps-are-apis-with-introspection-reflection' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#mcps-are-apis-with-introspection-reflection' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;MCPs are APIs with Introspection/Reflection&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;With &lt;a href='https://learn.openapis.org/' title=''&gt;OpenAPI&lt;/a&gt;, requests are typically JSON, and responses are too. Many OpenAPI providers publish a separate &lt;a href='https://learn.openapis.org/specification/structure.html' title=''&gt;OpenAPI Description (OAD)&lt;/a&gt;, which contains a schema describing what requests are supported by that API.&lt;/p&gt;

&lt;p&gt;With MCP, requests are also JSON and responses are also JSON, but in addition, there are standard requests built into the protocol to obtain useful information, including what tools are provided, what arguments each tool expects, and a prose description of how each tool is expected to be used.&lt;/p&gt;

&lt;p&gt;As an aside, don&amp;rsquo;t automatically assume that you will get good results from &lt;a href='https://neon.com/blog/autogenerating-mcp-servers-openai-schemas' title=''&gt;auto-generating MCP Servers from OpenAPI schemas&lt;/a&gt;:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Effective MCP design means thinking about the workflows you want the agent to perform and potentially creating higher-level tools that encapsulate multiple API calls, rather than just exposing the raw building blocks of your entire API spec.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href='https://glama.ai/blog/2025-06-06-mcp-vs-api#five-fundamental-differences' title=''&gt;MCP vs API&lt;/a&gt; goes into this topic at greater debth.&lt;/p&gt;

&lt;p&gt;In many cases you will get better results treating LLMs as humans. If you have a CLI, consider using that as the starting point instead.&lt;/p&gt;
&lt;h2 id='mcps-are-not-serverless' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#mcps-are-not-serverless' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;MCPs are &lt;strong&gt;&lt;i&gt;not&lt;/i&gt;&lt;/strong&gt; serverless&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;&lt;a href='https://en.wikipedia.org/wiki/Serverless_computing' title=''&gt;Serverless&lt;/a&gt;, sometimes known as &lt;a href='https://en.wikipedia.org/wiki/Function_as_a_service' title=''&gt;FaaS&lt;/a&gt;, is a popular cloud computing model, where AWS Lambda is widely recognized as the archetype.&lt;/p&gt;

&lt;p&gt;MCP servers are not serverless; they have a well-defined and long-lived &lt;a href='https://modelcontextprotocol.io/specification/2025-03-26/basic/lifecycle' title=''&gt;lifecycle&lt;/a&gt;:&lt;/p&gt;

&lt;p&gt;&lt;svg aria-roledescription="sequence" role="graphics-document document" viewBox="-50 -10 482 651" style="max-width: 482px;" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns="http://www.w3.org/2000/svg" width="100%" id="rm"&gt;&lt;rect class="rect" height="70" width="302" fill="rgb(200, 220, 250)" y="325" x="40"&gt;&lt;/rect&gt;&lt;g&gt;&lt;rect class="actor actor-bottom" ry="3" rx="3" name="Server" height="65" width="150" stroke="#666" fill="#eaeaea" y="565" x="232"&gt;&lt;/rect&gt;&lt;text class="actor actor-box" alignment-baseline="central" dominant-baseline="central" style="text-anchor: middle; font-size: 16px; font-weight: 400; font-family: inherit;" y="597.5" x="307"&gt;&lt;tspan dy="0" x="307"&gt;Server&lt;/tspan&gt;&lt;/text&gt;&lt;/g&gt;&lt;g&gt;&lt;rect class="actor actor-bottom" ry="3" rx="3" name="Client" height="65" width="150" stroke="#666" fill="#eaeaea" y="565" x="0"&gt;&lt;/rect&gt;&lt;text class="actor actor-box" alignment-baseline="central" dominant-baseline="central" style="text-anchor: middle; font-size: 16px; font-weight: 400; font-family: inherit;" y="597.5" x="75"&gt;&lt;tspan dy="0" x="75"&gt;Client&lt;/tspan&gt;&lt;/text&gt;&lt;/g&gt;&lt;g&gt;&lt;line name="Server" stroke="#999" stroke-width="0.5px" class="actor-line 200" y2="565" x2="307" y1="65" x1="307" id="actor10"&gt;&lt;/line&gt;&lt;g id="root-10"&gt;&lt;rect class="actor actor-top" ry="3" rx="3" name="Server" height="65" width="150" stroke="#666" fill="#eaeaea" y="0" x="232"&gt;&lt;/rect&gt;&lt;text class="actor actor-box" alignment-baseline="central" dominant-baseline="central" style="text-anchor: middle; font-size: 16px; font-weight: 400; font-family: inherit;" y="32.5" x="307"&gt;&lt;tspan dy="0" x="307"&gt;Server&lt;/tspan&gt;&lt;/text&gt;&lt;/g&gt;&lt;/g&gt;&lt;g&gt;&lt;line name="Client" stroke="#999" stroke-width="0.5px" class="actor-line 200" y2="565" x2="75" y1="65" x1="75" id="actor9"&gt;&lt;/line&gt;&lt;g id="root-9"&gt;&lt;rect class="actor actor-top" ry="3" rx="3" name="Client" height="65" width="150" stroke="#666" fill="#eaeaea" y="0" x="0"&gt;&lt;/rect&gt;&lt;text class="actor actor-box" alignment-baseline="central" dominant-baseline="central" style="text-anchor: middle; font-size: 16px; font-weight: 400; font-family: inherit;" y="32.5" x="75"&gt;&lt;tspan dy="0" x="75"&gt;Client&lt;/tspan&gt;&lt;/text&gt;&lt;/g&gt;&lt;/g&gt;&lt;style&gt;#rm{font-family:inherit;font-size:16px;fill:#333;}#rm .error-icon{fill:#552222;}#rm .error-text{fill:#552222;stroke:#552222;}#rm .edge-thickness-normal{stroke-width:1px;}#rm .edge-thickness-thick{stroke-width:3.5px;}#rm .edge-pattern-solid{stroke-dasharray:0;}#rm .edge-thickness-invisible{stroke-width:0;fill:none;}#rm .edge-pattern-dashed{stroke-dasharray:3;}#rm .edge-pattern-dotted{stroke-dasharray:2;}#rm .marker{fill:#333333;stroke:#333333;}#rm .marker.cross{stroke:#333333;}#rm svg{font-family:inherit;font-size:16px;}#rm p{margin:0;}#rm .actor{stroke:hsl(259.6261682243, 59.7765363128%, 87.9019607843%);fill:#ECECFF;}#rm text.actor&amp;gt;tspan{fill:black;stroke:none;}#rm .actor-line{stroke:hsl(259.6261682243, 59.7765363128%, 87.9019607843%);}#rm .messageLine0{stroke-width:1.5;stroke-dasharray:none;stroke:#333;}#rm .messageLine1{stroke-width:1.5;stroke-dasharray:2,2;stroke:#333;}#rm #arrowhead path{fill:#333;stroke:#333;}#rm .sequenceNumber{fill:white;}#rm #sequencenumber{fill:#333;}#rm #crosshead path{fill:#333;stroke:#333;}#rm .messageText{fill:#333;stroke:none;}#rm .labelBox{stroke:hsl(259.6261682243, 59.7765363128%, 87.9019607843%);fill:#ECECFF;}#rm .labelText,#rm .labelText&amp;gt;tspan{fill:black;stroke:none;}#rm .loopText,#rm .loopText&amp;gt;tspan{fill:black;stroke:none;}#rm .loopLine{stroke-width:2px;stroke-dasharray:2,2;stroke:hsl(259.6261682243, 59.7765363128%, 87.9019607843%);fill:hsl(259.6261682243, 59.7765363128%, 87.9019607843%);}#rm .note{stroke:#aaaa33;fill:#fff5ad;}#rm .noteText,#rm .noteText&amp;gt;tspan{fill:black;stroke:none;}#rm .activation0{fill:#f4f4f4;stroke:#666;}#rm .activation1{fill:#f4f4f4;stroke:#666;}#rm .activation2{fill:#f4f4f4;stroke:#666;}#rm .actorPopupMenu{position:absolute;}#rm .actorPopupMenuPanel{position:absolute;fill:#ECECFF;box-shadow:0px 8px 16px 0px rgba(0,0,0,0.2);filter:drop-shadow(3px 5px 2px rgb(0 0 0 / 0.4));}#rm .actor-man line{stroke:hsl(259.6261682243, 59.7765363128%, 87.9019607843%);fill:#ECECFF;}#rm .actor-man circle,#rm line{stroke:hsl(259.6261682243, 59.7765363128%, 87.9019607843%);fill:#ECECFF;stroke-width:2px;}#rm :root{--mermaid-font-family:inherit;}&lt;/style&gt;&lt;g&gt;&lt;/g&gt;&lt;defs&gt;&lt;symbol height="24" width="24" id="computer"&gt;&lt;path d="M2 2v13h20v-13h-20zm18 11h-16v-9h16v9zm-10.228 6l.466-1h3.524l.467 1h-4.457zm14.228 3h-24l2-6h2.104l-1.33 4h18.45l-1.297-4h2.073l2 6zm-5-10h-14v-7h14v7z" transform="scale(.5)"&gt;&lt;/path&gt;&lt;/symbol&gt;&lt;/defs&gt;&lt;defs&gt;&lt;symbol clip-rule="evenodd" fill-rule="evenodd" id="database"&gt;&lt;path d="M12.258.001l.256.004.255.005.253.008.251.01.249.012.247.015.246.016.242.019.241.02.239.023.236.024.233.027.231.028.229.031.225.032.223.034.22.036.217.038.214.04.211.041.208.043.205.045.201.046.198.048.194.05.191.051.187.053.183.054.18.056.175.057.172.059.168.06.163.061.16.063.155.064.15.066.074.033.073.033.071.034.07.034.069.035.068.035.067.035.066.035.064.036.064.036.062.036.06.036.06.037.058.037.058.037.055.038.055.038.053.038.052.038.051.039.05.039.048.039.047.039.045.04.044.04.043.04.041.04.04.041.039.041.037.041.036.041.034.041.033.042.032.042.03.042.029.042.027.042.026.043.024.043.023.043.021.043.02.043.018.044.017.043.015.044.013.044.012.044.011.045.009.044.007.045.006.045.004.045.002.045.001.045v17l-.001.045-.002.045-.004.045-.006.045-.007.045-.009.044-.011.045-.012.044-.013.044-.015.044-.017.043-.018.044-.02.043-.021.043-.023.043-.024.043-.026.043-.027.042-.029.042-.03.042-.032.042-.033.042-.034.041-.036.041-.037.041-.039.041-.04.041-.041.04-.043.04-.044.04-.045.04-.047.039-.048.039-.05.039-.051.039-.052.038-.053.038-.055.038-.055.038-.058.037-.058.037-.06.037-.06.036-.062.036-.064.036-.064.036-.066.035-.067.035-.068.035-.069.035-.07.034-.071.034-.073.033-.074.033-.15.066-.155.064-.16.063-.163.061-.168.06-.172.059-.175.057-.18.056-.183.054-.187.053-.191.051-.194.05-.198.048-.201.046-.205.045-.208.043-.211.041-.214.04-.217.038-.22.036-.223.034-.225.032-.229.031-.231.028-.233.027-.236.024-.239.023-.241.02-.242.019-.246.016-.247.015-.249.012-.251.01-.253.008-.255.005-.256.004-.258.001-.258-.001-.256-.004-.255-.005-.253-.008-.251-.01-.249-.012-.247-.015-.245-.016-.243-.019-.241-.02-.238-.023-.236-.024-.234-.027-.231-.028-.228-.031-.226-.032-.223-.034-.22-.036-.217-.038-.214-.04-.211-.041-.208-.043-.204-.045-.201-.046-.198-.048-.195-.05-.19-.051-.187-.053-.184-.054-.179-.056-.176-.057-.172-.059-.167-.06-.164-.061-.159-.063-.155-.064-.151-.066-.074-.033-.072-.033-.072-.034-.07-.034-.069-.035-.068-.035-.067-.035-.066-.035-.064-.036-.063-.036-.062-.036-.061-.036-.06-.037-.058-.037-.057-.037-.056-.038-.055-.038-.053-.038-.052-.038-.051-.039-.049-.039-.049-.039-.046-.039-.046-.04-.044-.04-.043-.04-.041-.04-.04-.041-.039-.041-.037-.041-.036-.041-.034-.041-.033-.042-.032-.042-.03-.042-.029-.042-.027-.042-.026-.043-.024-.043-.023-.043-.021-.043-.02-.043-.018-.044-.017-.043-.015-.044-.013-.044-.012-.044-.011-.045-.009-.044-.007-.045-.006-.045-.004-.045-.002-.045-.001-.045v-17l.001-.045.002-.045.004-.045.006-.045.007-.045.009-.044.011-.045.012-.044.013-.044.015-.044.017-.043.018-.044.02-.043.021-.043.023-.043.024-.043.026-.043.027-.042.029-.042.03-.042.032-.042.033-.042.034-.041.036-.041.037-.041.039-.041.04-.041.041-.04.043-.04.044-.04.046-.04.046-.039.049-.039.049-.039.051-.039.052-.038.053-.038.055-.038.056-.038.057-.037.058-.037.06-.037.061-.036.062-.036.063-.036.064-.036.066-.035.067-.035.068-.035.069-.035.07-.034.072-.034.072-.033.074-.033.151-.066.155-.064.159-.063.164-.061.167-.06.172-.059.176-.057.179-.056.184-.054.187-.053.19-.051.195-.05.198-.048.201-.046.204-.045.208-.043.211-.041.214-.04.217-.038.22-.036.223-.034.226-.032.228-.031.231-.028.234-.027.236-.024.238-.023.241-.02.243-.019.245-.016.247-.015.249-.012.251-.01.253-.008.255-.005.256-.004.258-.001.258.001zm-9.258 20.499v.01l.001.021.003.021.004.022.005.021.006.022.007.022.009.023.01.022.011.023.012.023.013.023.015.023.016.024.017.023.018.024.019.024.021.024.022.025.023.024.024.025.052.049.056.05.061.051.066.051.07.051.075.051.079.052.084.052.088.052.092.052.097.052.102.051.105.052.11.052.114.051.119.051.123.051.127.05.131.05.135.05.139.048.144.049.147.047.152.047.155.047.16.045.163.045.167.043.171.043.176.041.178.041.183.039.187.039.19.037.194.035.197.035.202.033.204.031.209.03.212.029.216.027.219.025.222.024.226.021.23.02.233.018.236.016.24.015.243.012.246.01.249.008.253.005.256.004.259.001.26-.001.257-.004.254-.005.25-.008.247-.011.244-.012.241-.014.237-.016.233-.018.231-.021.226-.021.224-.024.22-.026.216-.027.212-.028.21-.031.205-.031.202-.034.198-.034.194-.036.191-.037.187-.039.183-.04.179-.04.175-.042.172-.043.168-.044.163-.045.16-.046.155-.046.152-.047.148-.048.143-.049.139-.049.136-.05.131-.05.126-.05.123-.051.118-.052.114-.051.11-.052.106-.052.101-.052.096-.052.092-.052.088-.053.083-.051.079-.052.074-.052.07-.051.065-.051.06-.051.056-.05.051-.05.023-.024.023-.025.021-.024.02-.024.019-.024.018-.024.017-.024.015-.023.014-.024.013-.023.012-.023.01-.023.01-.022.008-.022.006-.022.006-.022.004-.022.004-.021.001-.021.001-.021v-4.127l-.077.055-.08.053-.083.054-.085.053-.087.052-.09.052-.093.051-.095.05-.097.05-.1.049-.102.049-.105.048-.106.047-.109.047-.111.046-.114.045-.115.045-.118.044-.12.043-.122.042-.124.042-.126.041-.128.04-.13.04-.132.038-.134.038-.135.037-.138.037-.139.035-.142.035-.143.034-.144.033-.147.032-.148.031-.15.03-.151.03-.153.029-.154.027-.156.027-.158.026-.159.025-.161.024-.162.023-.163.022-.165.021-.166.02-.167.019-.169.018-.169.017-.171.016-.173.015-.173.014-.175.013-.175.012-.177.011-.178.01-.179.008-.179.008-.181.006-.182.005-.182.004-.184.003-.184.002h-.37l-.184-.002-.184-.003-.182-.004-.182-.005-.181-.006-.179-.008-.179-.008-.178-.01-.176-.011-.176-.012-.175-.013-.173-.014-.172-.015-.171-.016-.17-.017-.169-.018-.167-.019-.166-.02-.165-.021-.163-.022-.162-.023-.161-.024-.159-.025-.157-.026-.156-.027-.155-.027-.153-.029-.151-.03-.15-.03-.148-.031-.146-.032-.145-.033-.143-.034-.141-.035-.14-.035-.137-.037-.136-.037-.134-.038-.132-.038-.13-.04-.128-.04-.126-.041-.124-.042-.122-.042-.12-.044-.117-.043-.116-.045-.113-.045-.112-.046-.109-.047-.106-.047-.105-.048-.102-.049-.1-.049-.097-.05-.095-.05-.093-.052-.09-.051-.087-.052-.085-.053-.083-.054-.08-.054-.077-.054v4.127zm0-5.654v.011l.001.021.003.021.004.021.005.022.006.022.007.022.009.022.01.022.011.023.012.023.013.023.015.024.016.023.017.024.018.024.019.024.021.024.022.024.023.025.024.024.052.05.056.05.061.05.066.051.07.051.075.052.079.051.084.052.088.052.092.052.097.052.102.052.105.052.11.051.114.051.119.052.123.05.127.051.131.05.135.049.139.049.144.048.147.048.152.047.155.046.16.045.163.045.167.044.171.042.176.042.178.04.183.04.187.038.19.037.194.036.197.034.202.033.204.032.209.03.212.028.216.027.219.025.222.024.226.022.23.02.233.018.236.016.24.014.243.012.246.01.249.008.253.006.256.003.259.001.26-.001.257-.003.254-.006.25-.008.247-.01.244-.012.241-.015.237-.016.233-.018.231-.02.226-.022.224-.024.22-.025.216-.027.212-.029.21-.03.205-.032.202-.033.198-.035.194-.036.191-.037.187-.039.183-.039.179-.041.175-.042.172-.043.168-.044.163-.045.16-.045.155-.047.152-.047.148-.048.143-.048.139-.05.136-.049.131-.05.126-.051.123-.051.118-.051.114-.052.11-.052.106-.052.101-.052.096-.052.092-.052.088-.052.083-.052.079-.052.074-.051.07-.052.065-.051.06-.05.056-.051.051-.049.023-.025.023-.024.021-.025.02-.024.019-.024.018-.024.017-.024.015-.023.014-.023.013-.024.012-.022.01-.023.01-.023.008-.022.006-.022.006-.022.004-.021.004-.022.001-.021.001-.021v-4.139l-.077.054-.08.054-.083.054-.085.052-.087.053-.09.051-.093.051-.095.051-.097.05-.1.049-.102.049-.105.048-.106.047-.109.047-.111.046-.114.045-.115.044-.118.044-.12.044-.122.042-.124.042-.126.041-.128.04-.13.039-.132.039-.134.038-.135.037-.138.036-.139.036-.142.035-.143.033-.144.033-.147.033-.148.031-.15.03-.151.03-.153.028-.154.028-.156.027-.158.026-.159.025-.161.024-.162.023-.163.022-.165.021-.166.02-.167.019-.169.018-.169.017-.171.016-.173.015-.173.014-.175.013-.175.012-.177.011-.178.009-.179.009-.179.007-.181.007-.182.005-.182.004-.184.003-.184.002h-.37l-.184-.002-.184-.003-.182-.004-.182-.005-.181-.007-.179-.007-.179-.009-.178-.009-.176-.011-.176-.012-.175-.013-.173-.014-.172-.015-.171-.016-.17-.017-.169-.018-.167-.019-.166-.02-.165-.021-.163-.022-.162-.023-.161-.024-.159-.025-.157-.026-.156-.027-.155-.028-.153-.028-.151-.03-.15-.03-.148-.031-.146-.033-.145-.033-.143-.033-.141-.035-.14-.036-.137-.036-.136-.037-.134-.038-.132-.039-.13-.039-.128-.04-.126-.041-.124-.042-.122-.043-.12-.043-.117-.044-.116-.044-.113-.046-.112-.046-.109-.046-.106-.047-.105-.048-.102-.049-.1-.049-.097-.05-.095-.051-.093-.051-.09-.051-.087-.053-.085-.052-.083-.054-.08-.054-.077-.054v4.139zm0-5.666v.011l.001.02.003.022.004.021.005.022.006.021.007.022.009.023.01.022.011.023.012.023.013.023.015.023.016.024.017.024.018.023.019.024.021.025.022.024.023.024.024.025.052.05.056.05.061.05.066.051.07.051.075.052.079.051.084.052.088.052.092.052.097.052.102.052.105.051.11.052.114.051.119.051.123.051.127.05.131.05.135.05.139.049.144.048.147.048.152.047.155.046.16.045.163.045.167.043.171.043.176.042.178.04.183.04.187.038.19.037.194.036.197.034.202.033.204.032.209.03.212.028.216.027.219.025.222.024.226.021.23.02.233.018.236.017.24.014.243.012.246.01.249.008.253.006.256.003.259.001.26-.001.257-.003.254-.006.25-.008.247-.01.244-.013.241-.014.237-.016.233-.018.231-.02.226-.022.224-.024.22-.025.216-.027.212-.029.21-.03.205-.032.202-.033.198-.035.194-.036.191-.037.187-.039.183-.039.179-.041.175-.042.172-.043.168-.044.163-.045.16-.045.155-.047.152-.047.148-.048.143-.049.139-.049.136-.049.131-.051.126-.05.123-.051.118-.052.114-.051.11-.052.106-.052.101-.052.096-.052.092-.052.088-.052.083-.052.079-.052.074-.052.07-.051.065-.051.06-.051.056-.05.051-.049.023-.025.023-.025.021-.024.02-.024.019-.024.018-.024.017-.024.015-.023.014-.024.013-.023.012-.023.01-.022.01-.023.008-.022.006-.022.006-.022.004-.022.004-.021.001-.021.001-.021v-4.153l-.077.054-.08.054-.083.053-.085.053-.087.053-.09.051-.093.051-.095.051-.097.05-.1.049-.102.048-.105.048-.106.048-.109.046-.111.046-.114.046-.115.044-.118.044-.12.043-.122.043-.124.042-.126.041-.128.04-.13.039-.132.039-.134.038-.135.037-.138.036-.139.036-.142.034-.143.034-.144.033-.147.032-.148.032-.15.03-.151.03-.153.028-.154.028-.156.027-.158.026-.159.024-.161.024-.162.023-.163.023-.165.021-.166.02-.167.019-.169.018-.169.017-.171.016-.173.015-.173.014-.175.013-.175.012-.177.01-.178.01-.179.009-.179.007-.181.006-.182.006-.182.004-.184.003-.184.001-.185.001-.185-.001-.184-.001-.184-.003-.182-.004-.182-.006-.181-.006-.179-.007-.179-.009-.178-.01-.176-.01-.176-.012-.175-.013-.173-.014-.172-.015-.171-.016-.17-.017-.169-.018-.167-.019-.166-.02-.165-.021-.163-.023-.162-.023-.161-.024-.159-.024-.157-.026-.156-.027-.155-.028-.153-.028-.151-.03-.15-.03-.148-.032-.146-.032-.145-.033-.143-.034-.141-.034-.14-.036-.137-.036-.136-.037-.134-.038-.132-.039-.13-.039-.128-.041-.126-.041-.124-.041-.122-.043-.12-.043-.117-.044-.116-.044-.113-.046-.112-.046-.109-.046-.106-.048-.105-.048-.102-.048-.1-.05-.097-.049-.095-.051-.093-.051-.09-.052-.087-.052-.085-.053-.083-.053-.08-.054-.077-.054v4.153zm8.74-8.179l-.257.004-.254.005-.25.008-.247.011-.244.012-.241.014-.237.016-.233.018-.231.021-.226.022-.224.023-.22.026-.216.027-.212.028-.21.031-.205.032-.202.033-.198.034-.194.036-.191.038-.187.038-.183.04-.179.041-.175.042-.172.043-.168.043-.163.045-.16.046-.155.046-.152.048-.148.048-.143.048-.139.049-.136.05-.131.05-.126.051-.123.051-.118.051-.114.052-.11.052-.106.052-.101.052-.096.052-.092.052-.088.052-.083.052-.079.052-.074.051-.07.052-.065.051-.06.05-.056.05-.051.05-.023.025-.023.024-.021.024-.02.025-.019.024-.018.024-.017.023-.015.024-.014.023-.013.023-.012.023-.01.023-.01.022-.008.022-.006.023-.006.021-.004.022-.004.021-.001.021-.001.021.001.021.001.021.004.021.004.022.006.021.006.023.008.022.01.022.01.023.012.023.013.023.014.023.015.024.017.023.018.024.019.024.02.025.021.024.023.024.023.025.051.05.056.05.06.05.065.051.07.052.074.051.079.052.083.052.088.052.092.052.096.052.101.052.106.052.11.052.114.052.118.051.123.051.126.051.131.05.136.05.139.049.143.048.148.048.152.048.155.046.16.046.163.045.168.043.172.043.175.042.179.041.183.04.187.038.191.038.194.036.198.034.202.033.205.032.21.031.212.028.216.027.22.026.224.023.226.022.231.021.233.018.237.016.241.014.244.012.247.011.25.008.254.005.257.004.26.001.26-.001.257-.004.254-.005.25-.008.247-.011.244-.012.241-.014.237-.016.233-.018.231-.021.226-.022.224-.023.22-.026.216-.027.212-.028.21-.031.205-.032.202-.033.198-.034.194-.036.191-.038.187-.038.183-.04.179-.041.175-.042.172-.043.168-.043.163-.045.16-.046.155-.046.152-.048.148-.048.143-.048.139-.049.136-.05.131-.05.126-.051.123-.051.118-.051.114-.052.11-.052.106-.052.101-.052.096-.052.092-.052.088-.052.083-.052.079-.052.074-.051.07-.052.065-.051.06-.05.056-.05.051-.05.023-.025.023-.024.021-.024.02-.025.019-.024.018-.024.017-.023.015-.024.014-.023.013-.023.012-.023.01-.023.01-.022.008-.022.006-.023.006-.021.004-.022.004-.021.001-.021.001-.021-.001-.021-.001-.021-.004-.021-.004-.022-.006-.021-.006-.023-.008-.022-.01-.022-.01-.023-.012-.023-.013-.023-.014-.023-.015-.024-.017-.023-.018-.024-.019-.024-.02-.025-.021-.024-.023-.024-.023-.025-.051-.05-.056-.05-.06-.05-.065-.051-.07-.052-.074-.051-.079-.052-.083-.052-.088-.052-.092-.052-.096-.052-.101-.052-.106-.052-.11-.052-.114-.052-.118-.051-.123-.051-.126-.051-.131-.05-.136-.05-.139-.049-.143-.048-.148-.048-.152-.048-.155-.046-.16-.046-.163-.045-.168-.043-.172-.043-.175-.042-.179-.041-.183-.04-.187-.038-.191-.038-.194-.036-.198-.034-.202-.033-.205-.032-.21-.031-.212-.028-.216-.027-.22-.026-.224-.023-.226-.022-.231-.021-.233-.018-.237-.016-.241-.014-.244-.012-.247-.011-.25-.008-.254-.005-.257-.004-.26-.001-.26.001z" transform="scale(.5)"&gt;&lt;/path&gt;&lt;/symbol&gt;&lt;/defs&gt;&lt;defs&gt;&lt;symbol height="24" width="24" id="clock"&gt;&lt;path d="M12 2c5.514 0 10 4.486 10 10s-4.486 10-10 10-10-4.486-10-10 4.486-10 10-10zm0-2c-6.627 0-12 5.373-12 12s5.373 12 12 12 12-5.373 12-12-5.373-12-12-12zm5.848 12.459c.202.038.202.333.001.372-1.907.361-6.045 1.111-6.547 1.111-.719 0-1.301-.582-1.301-1.301 0-.512.77-5.447 1.125-7.445.034-.192.312-.181.343.014l.985 6.238 5.394 1.011z" transform="scale(.5)"&gt;&lt;/path&gt;&lt;/symbol&gt;&lt;/defs&gt;&lt;defs&gt;&lt;marker orient="auto-start-reverse" markerHeight="12" markerWidth="12" markerUnits="userSpaceOnUse" refY="5" refX="7.9" id="arrowhead"&gt;&lt;path d="M -1 0 L 10 5 L 0 10 z"&gt;&lt;/path&gt;&lt;/marker&gt;&lt;/defs&gt;&lt;defs&gt;&lt;marker refY="4.5" refX="4" orient="auto" markerHeight="8" markerWidth="15" id="crosshead"&gt;&lt;path d="M 1,2 L 6,7 M 6,2 L 1,7" stroke-width="1pt" style="stroke-dasharray: 0px, 0px;" stroke="#000000" fill="none"&gt;&lt;/path&gt;&lt;/marker&gt;&lt;/defs&gt;&lt;defs&gt;&lt;marker orient="auto" markerHeight="28" markerWidth="20" refY="7" refX="15.5" id="filled-head"&gt;&lt;path d="M 18,7 L9,13 L14,7 L9,1 Z"&gt;&lt;/path&gt;&lt;/marker&gt;&lt;/defs&gt;&lt;defs&gt;&lt;marker orient="auto" markerHeight="40" markerWidth="60" refY="15" refX="15" id="sequencenumber"&gt;&lt;circle r="6" cy="15" cx="15"&gt;&lt;/circle&gt;&lt;/marker&gt;&lt;/defs&gt;&lt;g&gt;&lt;rect class="note" height="40" width="282" stroke="#666" fill="#EDF2AE" y="75" x="50"&gt;&lt;/rect&gt;&lt;text dy="1em" class="noteText" style="font-family: inherit; font-size: 16px; font-weight: 400;" alignment-baseline="middle" dominant-baseline="middle" text-anchor="middle" y="80" x="191"&gt;&lt;tspan x="191"&gt;Initialization Phase&lt;/tspan&gt;&lt;/text&gt;&lt;/g&gt;&lt;g&gt;&lt;rect class="activation0" height="380" width="10" stroke="#666" fill="#EDF2AE" y="115" x="70"&gt;&lt;/rect&gt;&lt;/g&gt;&lt;g&gt;&lt;rect class="activation0" height="328" width="10" stroke="#666" fill="#EDF2AE" y="167" x="302"&gt;&lt;/rect&gt;&lt;/g&gt;&lt;g&gt;&lt;rect class="note" height="40" width="282" stroke="#666" fill="#EDF2AE" y="275" x="50"&gt;&lt;/rect&gt;&lt;text dy="1em" class="noteText" style="font-family: inherit; font-size: 16px; font-weight: 400;" alignment-baseline="middle" dominant-baseline="middle" text-anchor="middle" y="280" x="191"&gt;&lt;tspan x="191"&gt;Operation Phase&lt;/tspan&gt;&lt;/text&gt;&lt;/g&gt;&lt;g&gt;&lt;rect class="note" height="40" width="282" stroke="#666" fill="#EDF2AE" y="345" x="50"&gt;&lt;/rect&gt;&lt;text dy="1em" class="noteText" style="font-family: inherit; font-size: 16px; font-weight: 400;" alignment-baseline="middle" dominant-baseline="middle" text-anchor="middle" y="350" x="191"&gt;&lt;tspan x="191"&gt;Normal protocol operations&lt;/tspan&gt;&lt;/text&gt;&lt;/g&gt;&lt;g&gt;&lt;rect class="note" height="40" width="282" stroke="#666" fill="#EDF2AE" y="405" x="50"&gt;&lt;/rect&gt;&lt;text dy="1em" class="noteText" style="font-family: inherit; font-size: 16px; font-weight: 400;" alignment-baseline="middle" dominant-baseline="middle" text-anchor="middle" y="410" x="191"&gt;&lt;tspan x="191"&gt;Shutdown&lt;/tspan&gt;&lt;/text&gt;&lt;/g&gt;&lt;g&gt;&lt;rect class="note" height="40" width="282" stroke="#666" fill="#EDF2AE" y="505" x="50"&gt;&lt;/rect&gt;&lt;text dy="1em" class="noteText" style="font-family: inherit; font-size: 16px; font-weight: 400;" alignment-baseline="middle" dominant-baseline="middle" text-anchor="middle" y="510" x="191"&gt;&lt;tspan x="191"&gt;Connection closed&lt;/tspan&gt;&lt;/text&gt;&lt;/g&gt;&lt;text dy="1em" class="messageText" style="font-family: inherit; font-size: 16px; font-weight: 400;" alignment-baseline="middle" dominant-baseline="middle" text-anchor="middle" y="130" x="190"&gt;initialize request&lt;/text&gt;&lt;line marker-end="url(#arrowhead)" style="fill: none;" stroke="none" stroke-width="2" class="messageLine0" y2="165" x2="299" y1="165" x1="80"&gt;&lt;/line&gt;&lt;text dy="1em" class="messageText" style="font-family: inherit; font-size: 16px; font-weight: 400;" alignment-baseline="middle" dominant-baseline="middle" text-anchor="middle" y="180" x="193"&gt;initialize response&lt;/text&gt;&lt;line marker-end="url(#arrowhead)" stroke="none" stroke-width="2" class="messageLine1" style="stroke-dasharray: 3px, 3px; fill: none;" y2="215" x2="83" y1="215" x1="302"&gt;&lt;/line&gt;&lt;text dy="1em" class="messageText" style="font-family: inherit; font-size: 16px; font-weight: 400;" alignment-baseline="middle" dominant-baseline="middle" text-anchor="middle" y="230" x="190"&gt;initialized notification&lt;/text&gt;&lt;line marker-end="url(#filled-head)" stroke="none" stroke-width="2" class="messageLine1" style="stroke-dasharray: 3px, 3px; fill: none;" y2="265" x2="299" y1="265" x1="80"&gt;&lt;/line&gt;&lt;text dy="1em" class="messageText" style="font-family: inherit; font-size: 16px; font-weight: 400;" alignment-baseline="middle" dominant-baseline="middle" text-anchor="middle" y="460" x="190"&gt;Disconnect&lt;/text&gt;&lt;line marker-end="url(#filled-head)" stroke="none" stroke-width="2" class="messageLine1" style="stroke-dasharray: 3px, 3px; fill: none;" y2="495" x2="299" y1="495" x1="80"&gt;&lt;/line&gt;&lt;/svg&gt;&lt;/p&gt;
&lt;figure class="post-cta"&gt;
  &lt;figcaption&gt;
    &lt;h1&gt;You can play with this right now.&lt;/h1&gt;
    &lt;p&gt;MCPs are barely six months old, but we are keeping up with the latest&lt;/p&gt;
      &lt;a class="btn btn-lg" href="https://fly.io/docs/mcp"&gt;
        Try launching your MCP server on Fly.io today &lt;span class='opacity-50'&gt;→&lt;/span&gt;
      &lt;/a&gt;
  &lt;/figcaption&gt;
  &lt;div class="image-container"&gt;
    &lt;img src="/static/images/cta-kitty.webp" srcset="/static/images/[email protected] 2x" alt=""&gt;
  &lt;/div&gt;
&lt;/figure&gt;

&lt;h2 id='mcps-are-not-inherently-secure-or-private' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#mcps-are-not-inherently-secure-or-private' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;MCPs are &lt;strong&gt;&lt;i&gt;not&lt;/i&gt;&lt;/strong&gt; Inherently Secure or Private&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;Here I am not talking about &lt;a href='https://simonwillison.net/2025/Apr/9/mcp-prompt-injection/' title=''&gt;prompt injection&lt;/a&gt; or &lt;a href='https://simonwillison.net/2025/May/26/github-mcp-exploited/' title=''&gt;exploitable abilities&lt;/a&gt;, though those are real problems too.&lt;/p&gt;

&lt;p&gt;I&amp;rsquo;m talking about something more fundamental and basic. Let&amp;rsquo;s take a look at the very same &lt;a href='https://github.com/github/github-mcp-server' title=''&gt;GitHub MCP&lt;/a&gt; featured in the above two descriptions of an exploitable exposure. The current process for installing a stdio MCP into Claude involves placing secrets in plain text in a well-defined location. And the process for installing the &lt;em&gt;next&lt;/em&gt; MCP server is to download a program from a third party and run that tool in a way that has access to this very file.&lt;/p&gt;

&lt;p&gt;Addressing MCP security requires a holistic approach, but one key strategy component is the ability to run an MCP server on a remote machine which can only be accessed by you, and only after you present a revocable bearer token. That remote machine may require access to secrets (like your GitHub token), but those secrets are not something either your LLM client or other MCP servers will be able to access directly.&lt;/p&gt;
&lt;h2 id='mcps-should-be-considered-family' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#mcps-should-be-considered-family' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;MCPs should be considered family&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;Recapping: &lt;a href='https://www.usa.philips.com/' title=''&gt;Philips&lt;/a&gt; has an &lt;a href='https://developers.meethue.com/' title=''&gt;API and SDK&lt;/a&gt; for Hue that is used by perhaps thousands, and has an &lt;a href='https://www.amazon.com/Philips-Hue/dp/B01LWAGUG7' title=''&gt;Alexa Skill&lt;/a&gt; that is used by untold millions. Of course, somebody already built a &lt;a href='https://github.com/ThomasRohde/hue-mcp' title=''&gt;Philips Hue MCP Server&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;LLMs are brains. MCPs give LLMs eyes, hands, and legs. The end result is a robot. Most MCPs today are brainless—they merely are eyes, hands, and legs. The results are appliances. I buy and install a dishwasher. You do too. Both of our dishwashers perform the same basic function. As do any Hue lights we may buy.&lt;/p&gt;

&lt;p&gt;In The Jetsons, &lt;a href='https://thejetsons.fandom.com/wiki/Rosey' title=''&gt;Rosie&lt;/a&gt; is a maid and housekeeper who is also a member of the family. She is good friends with Jane and takes the role of a surrogate aunt towards Elroy and Judy. Let&amp;rsquo;s start there and go further.&lt;/p&gt;

&lt;p&gt;A person with an LLM can build things. Imagine having a 3D printer that makes robots or agents that act on your behalf. You may want an agent to watch for trends in your business, keep track of what events are happening in your area, screen your text messages, or act as a DJ when you get together with friends.&lt;/p&gt;

&lt;p&gt;You may share the MCP servers you create with others to use as starting points, but they undoubtedly will adapt them and make them their own.&lt;/p&gt;
&lt;h2 id='closing-thoughts' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#closing-thoughts' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Closing Thoughts&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;Don&amp;rsquo;t get me wrong. I am not saying there won&amp;rsquo;t be MCPs that are mere appliances—there undoubtedly will be plenty of those. But not all MCPs will be appliances; many will be agents.&lt;/p&gt;

&lt;p&gt;Today, most people exploring LLMs are trying to add LLMs to things they already have—for example, IDEs. MCPs flip this. Instead of adding LLMs to something you already have, you add something you already have to an LLM.&lt;/p&gt;

&lt;p&gt;&lt;a href='https://desktopcommander.app/' title=''&gt;Desktop Commander MCP&lt;/a&gt; is an example I&amp;rsquo;m particularly fond of that does exactly this. It also happens to be a prime example of a tool you want to give root access to, then lock it in a secure box and run someplace other than your laptop. &lt;a href='https://fly.io/docs/mcp/examples/#desktop-commander' title=''&gt;Give it a try&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Microsoft is actively working on &lt;a href='https://developer.microsoft.com/en-us/windows/agentic/' title=''&gt;Agentic Windows&lt;/a&gt;. Your smartphone is the obvious next frontier. Today, you have an app store and a web browser. MCP servers have the potential to make both obsolete—and this could happen sooner than you think.&lt;/p&gt;
</content>
  </entry>
  <entry>
    <title>My AI Skeptic Friends Are All Nuts</title>
    <link rel="alternate" href="https://fly.io/blog/youre-all-nuts/"/>
    <id>https://fly.io/blog/youre-all-nuts/</id>
    <published>2025-06-02T00:00:00+00:00</published>
    <updated>2025-06-05T21:38:19+00:00</updated>
    <media:thumbnail url="https://fly.io/blog/youre-all-nuts/assets/whoah.png"/>
    <content type="html">&lt;div class="lead"&gt;&lt;p&gt;A heartfelt provocation about AI-assisted programming.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;Tech execs are mandating LLM adoption. That&amp;rsquo;s bad strategy. But I get where they&amp;rsquo;re coming from.&lt;/p&gt;

&lt;p&gt;Some of the smartest people I know share a bone-deep belief that AI is a fad —  the next iteration of NFT mania. I&amp;rsquo;ve been reluctant to push back on them, because, well, they&amp;rsquo;re smarter than me. But their arguments are unserious, and worth confronting. Extraordinarily talented people are doing work that LLMs already do better, out of  spite.&lt;/p&gt;

&lt;p&gt;All progress on LLMs could halt today, and LLMs would remain the 2nd most important thing to happen over the course of my career.&lt;/p&gt;
&lt;div class="callout"&gt;&lt;p&gt;&lt;strong class="font-semibold text-navy-950"&gt;Important caveat&lt;/strong&gt;: I’m discussing only the implications of LLMs for software development. For art, music, and writing? I got nothing. I’m inclined to believe the skeptics in those fields. I just don’t believe them about mine.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;Bona fides: I&amp;rsquo;ve been shipping software since the mid-1990s. I started out in boxed, shrink-wrap C code.  Survived an ill-advised &lt;a href='https://www.amazon.com/Modern-Design-Generic-Programming-Patterns/dp/0201704315' title=''&gt;Alexandrescu&lt;/a&gt; C++ phase. Lots of Ruby and Python tooling.  Some kernel work. A whole lot of server-side C, Go, and Rust. However you define &amp;ldquo;serious developer&amp;rdquo;, I qualify. Even if only on one of your lower tiers.&lt;/p&gt;
&lt;h3 id='level-setting' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#level-setting' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;level setting&lt;/span&gt;&lt;/h3&gt;&lt;div class="right-sidenote"&gt;&lt;p&gt;† (or, God forbid, 2 years ago with Copilot)&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;First, we need to get on the same page. If you were trying and failing to use an LLM for code 6 months ago †, you’re not doing what most serious LLM-assisted coders are doing.&lt;/p&gt;

&lt;p&gt;People coding with LLMs today use agents. Agents get to poke around your codebase on their own. They author files directly. They run tools. They compile code, run tests, and iterate on the results. They also:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;pull in arbitrary code from the tree, or from other trees online, into their context windows,
&lt;/li&gt;&lt;li&gt;run standard Unix tools to navigate the tree and extract information,
&lt;/li&gt;&lt;li&gt;interact with Git,
&lt;/li&gt;&lt;li&gt;run existing tooling, like linters, formatters, and model checkers, and
&lt;/li&gt;&lt;li&gt;make essentially arbitrary tool calls (that you set up) through MCP.
&lt;/li&gt;&lt;/ul&gt;
&lt;div class="callout"&gt;&lt;p&gt;The code in an agent that actually “does stuff” with code is not, itself, AI. This should reassure you. It’s surprisingly simple systems code, wired to ground truth about programming in the same way a Makefile is. You could write an effective coding agent in a weekend. Its strengths would have more to do with how you think about and structure builds and linting and test harnesses than with how advanced o3 or Sonnet have become.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;If you’re making requests on a ChatGPT page and then pasting the resulting (broken) code into your editor, you’re not doing what the AI boosters are doing. No wonder you&amp;rsquo;re talking past each other.&lt;/p&gt;
&lt;h3 id='the-positive-case' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#the-positive-case' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;the positive case&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;&lt;img alt="four quadrants of tedium and importance" src="/blog/youre-all-nuts/assets/code-quad.png?2/3&amp;amp;center" /&gt;&lt;/p&gt;

&lt;p&gt;LLMs can write a large fraction of all the tedious code you’ll ever need to write. And most code on most projects is tedious. LLMs drastically reduce the number of things you’ll ever need to Google. They look things up themselves. Most importantly, they don’t get tired; they’re immune to inertia.&lt;/p&gt;

&lt;p&gt;Think of anything you wanted to build but didn&amp;rsquo;t. You tried to home in on some first steps. If you&amp;rsquo;d been in the limerent phase of a new programming language, you&amp;rsquo;d have started writing. But you weren&amp;rsquo;t, so you put it off, for a day, a year, or your whole career.&lt;/p&gt;

&lt;p&gt;I can feel my blood pressure rising thinking of all the bookkeeping and Googling and dependency drama of a new project. An LLM can be instructed to just figure all that shit out. Often, it will drop you precisely at that golden moment where shit almost works, and development means tweaking code and immediately seeing things work better. That dopamine hit is why I code.&lt;/p&gt;

&lt;p&gt;There&amp;rsquo;s a downside. Sometimes, gnarly stuff needs doing. But you don&amp;rsquo;t wanna do it. So you refactor unit tests, soothing yourself with the lie that you&amp;rsquo;re doing real work. But an LLM can be told to go refactor all your unit tests. An agent can occupy itself for hours putzing with your tests in a VM and come back later with a PR. If you listen to me, you’ll know that. You&amp;rsquo;ll feel worse yak-shaving. You&amp;rsquo;ll end up doing… real work.&lt;/p&gt;
&lt;h3 id='but-you-have-no-idea-what-the-code-is' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#but-you-have-no-idea-what-the-code-is' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;but you have no idea what the code is&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;Are you a vibe coding Youtuber? Can you not read code? If so: astute point. Otherwise: what the fuck is wrong with you?&lt;/p&gt;

&lt;p&gt;You&amp;rsquo;ve always been responsible for what you merge to &lt;code&gt;main&lt;/code&gt;. You were five years go. And you are tomorrow, whether or not you use an LLM.&lt;/p&gt;

&lt;p&gt;If you build something with an LLM that people will depend on, read the code. In fact, you&amp;rsquo;ll probably do more than that. You&amp;rsquo;ll spend 5-10 minutes knocking it back into your own style. LLMs are &lt;a href='https://github.com/PatrickJS/awesome-cursorrules' title=''&gt;showing signs of adapting&lt;/a&gt; to local idiom, but we’re not there yet.&lt;/p&gt;

&lt;p&gt;People complain about LLM-generated code being “probabilistic”. No it isn&amp;rsquo;t. It’s code. It&amp;rsquo;s not Yacc output. It&amp;rsquo;s knowable. The LLM might be stochastic. But the LLM doesn’t matter. What matters is whether you can make sense of the result, and whether your guardrails hold.&lt;/p&gt;

&lt;p&gt;Reading other people&amp;rsquo;s code is part of the job. If you can&amp;rsquo;t metabolize the boring, repetitive code an LLM generates: skills issue! How are you handling the chaos human developers turn out on a deadline?&lt;/p&gt;
&lt;div class="right-sidenote"&gt;&lt;p&gt;† (because it can hold 50-70kloc in its context window)&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;For the last month or so, Gemini 2.5 has been my go-to †. Almost nothing it spits out for me merges without edits. I’m sure there’s a skill to getting a SOTA model to one-shot a feature-plus-merge! But I don’t care. I like moving the code around and chuckling to myself while I delete all the stupid comments. I have to read the code line-by-line anyways.&lt;/p&gt;
&lt;h3 id='but-hallucination' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#but-hallucination' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;but hallucination&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;If hallucination matters to you, your programming language has let you down.&lt;/p&gt;

&lt;p&gt;Agents lint. They compile and run tests. If their LLM invents a new function signature, the agent sees the error. They feed it back to the LLM, which says “oh, right, I totally made that up” and then tries again.&lt;/p&gt;

&lt;p&gt;You&amp;rsquo;ll only notice this happening if you watch the chain of thought log your agent generates. Don&amp;rsquo;t. This is why I like &lt;a href='https://zed.dev/agentic' title=''&gt;Zed&amp;rsquo;s agent mode&lt;/a&gt;: it begs you to tab away and let it work, and pings you with a desktop notification when it&amp;rsquo;s done.&lt;/p&gt;

&lt;p&gt;I’m sure there are still environments where hallucination matters. But &amp;ldquo;hallucination&amp;rdquo; is the first thing developers bring up when someone suggests using LLMs, despite it being (more or less) a solved problem.&lt;/p&gt;
&lt;h3 id='but-the-code-is-shitty-like-that-of-a-junior-developer' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#but-the-code-is-shitty-like-that-of-a-junior-developer' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;but the code is shitty, like that of a junior developer&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;Does an intern cost $20/month? Because that&amp;rsquo;s what Cursor.ai costs.&lt;/p&gt;

&lt;p&gt;Part of being a senior developer is making less-able coders productive, be they fleshly or algebraic. Using agents well is both a both a  skill and an engineering project all its own, of prompts, indices, &lt;a href='https://fly.io/blog/semgrep-but-for-real-now/' title=''&gt;and (especially) tooling.&lt;/a&gt; LLMs only produce shitty code if you let them.&lt;/p&gt;
&lt;div class="right-sidenote"&gt;&lt;p&gt;† (Also: 100% of all the Bash code you should author ever again)&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;Maybe the current confusion is about who&amp;rsquo;s doing what work. Today, LLMs do a lot of typing, Googling, test cases †, and edit-compile-test-debug cycles. But even the most Claude-poisoned serious developers in the world still own curation, judgement, guidance, and direction.&lt;/p&gt;

&lt;p&gt;Also: let’s stop kidding ourselves about how good our human first cuts really are.&lt;/p&gt;
&lt;h3 id='but-its-bad-at-rust' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#but-its-bad-at-rust' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;but it’s bad at rust&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;It’s hard to get a good toolchain for Brainfuck, too. Life’s tough in the aluminum siding business.&lt;/p&gt;
&lt;div class="right-sidenote"&gt;&lt;p&gt;† (and they surely will; the Rust community takes tooling seriously)&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;A lot of LLM skepticism probably isn&amp;rsquo;t really about LLMs. It&amp;rsquo;s projection. People say &amp;ldquo;LLMs can&amp;rsquo;t code&amp;rdquo; when what they really mean is &amp;ldquo;LLMs can&amp;rsquo;t write Rust&amp;rdquo;. Fair enough! But people select languages in part based on how well LLMs work with them, so Rust people should get on that †.&lt;/p&gt;

&lt;p&gt;I work mostly in Go. I’m confident the designers of the Go programming language didn&amp;rsquo;t set out to produce the most LLM-legible language in the industry. They succeeded nonetheless. Go has just enough type safety, an extensive standard library, and a culture that prizes (often repetitive) idiom. LLMs kick ass generating it.&lt;/p&gt;

&lt;p&gt;All this is to say: I write some Rust. I like it fine. If LLMs and Rust aren&amp;rsquo;t working for you, I feel you. But if that’s your whole thing, we’re not having the same argument.&lt;/p&gt;
&lt;h3 id='but-the-craft' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#but-the-craft' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;but the craft&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;Do you like fine Japanese woodworking? All hand tools and sashimono joinery? Me too. Do it on your own time.&lt;/p&gt;
&lt;div class="right-sidenote"&gt;&lt;p&gt;† (I’m a piker compared to my woodworking friends)&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;I have a basic wood shop in my basement †. I could get a lot of satisfaction from building a table. And, if that table is a workbench or a grill table, sure, I&amp;rsquo;ll build it. But if I need, like, a table? For people to sit at? In my office? I buy a fucking table.&lt;/p&gt;

&lt;p&gt;Professional software developers are in the business of solving practical problems for people with code. We are not, in our day jobs, artisans. Steve Jobs was wrong: we do not need to carve the unseen feet in the sculpture. Nobody cares if the logic board traces are pleasingly routed. If anything we build endures, it won&amp;rsquo;t be because the codebase was beautiful.&lt;/p&gt;

&lt;p&gt;Besides, that’s not really what happens. If you’re taking time carefully golfing functions down into graceful, fluent, minimal functional expressions, alarm bells should ring. You’re yak-shaving. The real work has depleted your focus. You&amp;rsquo;re not building: you&amp;rsquo;re self-soothing.&lt;/p&gt;

&lt;p&gt;Which, wait for it, is something LLMs are good for. They devour schlep, and clear a path to the important stuff, where your judgement and values really matter.&lt;/p&gt;
&lt;h3 id='but-the-mediocrity' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#but-the-mediocrity' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;but the mediocrity&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;As a mid-late career coder, I&amp;rsquo;ve come to appreciate mediocrity. You should be so lucky as to  have it flowing almost effortlessly from a tap.&lt;/p&gt;

&lt;p&gt;We all write mediocre code. Mediocre code: often fine. Not all code is equally important. Some code should be mediocre. Maximum effort on a random unit test? You&amp;rsquo;re doing something wrong. Your team lead should correct you.&lt;/p&gt;

&lt;p&gt;Developers all love to preen about code. They worry LLMs lower the &amp;ldquo;ceiling&amp;rdquo; for quality. Maybe. But they also raise the &amp;ldquo;floor&amp;rdquo;.&lt;/p&gt;

&lt;p&gt;Gemini&amp;rsquo;s floor is higher than my own.  My code looks nice. But it&amp;rsquo;s not as thorough. LLM code is repetitive. But mine includes dumb contortions where I got too clever trying to DRY things up.&lt;/p&gt;

&lt;p&gt;And LLMs aren&amp;rsquo;t mediocre on every axis. They almost certainly have a bigger bag of algorithmic tricks than you do: radix tries, topological sorts, graph reductions, and LDPC codes. Humans romanticize &lt;code&gt;rsync&lt;/code&gt; (&lt;a href='https://www.andrew.cmu.edu/course/15-749/READINGS/required/cas/tridgell96.pdf' title=''&gt;Andrew Tridgell&lt;/a&gt; wrote a paper about it!). To an LLM it might not be that much more interesting than a SQL join.&lt;/p&gt;

&lt;p&gt;But I&amp;rsquo;m getting ahead of myself. It doesn&amp;rsquo;t matter. If truly mediocre code is all we ever get from LLMs, that&amp;rsquo;s still huge. It&amp;rsquo;s that much less mediocre code humans have to write.&lt;/p&gt;
&lt;h3 id='but-itll-never-be-agi' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#but-itll-never-be-agi' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;but it&amp;rsquo;ll never be AGI&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;I don&amp;rsquo;t give a shit.&lt;/p&gt;

&lt;p&gt;Smart practitioners get wound up by the AI/VC hype cycle. I can&amp;rsquo;t blame them. But it&amp;rsquo;s not an argument. Things either work or they don&amp;rsquo;t, no matter what Jensen Huang has to say about it.&lt;/p&gt;
&lt;h3 id='but-they-take-rr-jerbs' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#but-they-take-rr-jerbs' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;but they take-rr jerbs&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;&lt;a href='https://news.ycombinator.com/item?id=43776612' title=''&gt;So does open source.&lt;/a&gt; We used to pay good money for databases.&lt;/p&gt;

&lt;p&gt;We&amp;rsquo;re a field premised on automating other people’s jobs away. &amp;ldquo;Productivity gains,&amp;rdquo; say the economists. You get what that means, right? Fewer people doing the same stuff. Talked to a travel agent lately? Or a floor broker? Or a record store clerk? Or a darkroom tech?&lt;/p&gt;

&lt;p&gt;When this argument comes up, libertarian-leaning VCs start the chant: lamplighters, creative destruction, new kinds of work. Maybe. But I&amp;rsquo;m not hypnotized. I have no fucking clue whether we’re going to be better off after LLMs. Things could get a lot worse for us.&lt;/p&gt;

&lt;p&gt;LLMs really might displace many software developers. That&amp;rsquo;s not a high horse we get to ride. Our jobs are just as much in tech&amp;rsquo;s line of fire as everybody else&amp;rsquo;s have been for the last 3 decades. We&amp;rsquo;re not &lt;a href='https://en.wikipedia.org/wiki/2024_United_States_port_strike' title=''&gt;East Coast dockworkers&lt;/a&gt;; we won&amp;rsquo;t stop progress on our own.&lt;/p&gt;
&lt;h3 id='but-the-plagiarism' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#but-the-plagiarism' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;but the plagiarism&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;Artificial intelligence is profoundly — and probably unfairly — threatening to visual artists in ways that might be hard to appreciate if you don&amp;rsquo;t work in the arts.&lt;/p&gt;

&lt;p&gt;We imagine artists spending their working hours pushing the limits of expression. But the median artist isn&amp;rsquo;t producing gallery pieces. They produce on brief: turning out competent illustrations and compositions for magazine covers, museum displays, motion graphics, and game assets.&lt;/p&gt;

&lt;p&gt;LLMs easily — alarmingly — clear industry quality bars. Gallingly, one of the things they&amp;rsquo;re best at is churning out just-good-enough facsimiles of human creative work.  I have family in visual arts. I can&amp;rsquo;t talk to them about LLMs. I don&amp;rsquo;t blame them. They&amp;rsquo;re probably not wrong.&lt;/p&gt;

&lt;p&gt;Meanwhile, software developers spot code fragments &lt;a href="https://arxiv.org/abs/2311.17035"&gt;seemingly lifted&lt;/a&gt; from public repositories on Github and lose their shit. What about the licensing? If you&amp;rsquo;re a lawyer, I defer. But if you&amp;rsquo;re a software developer playing this card? Cut me a little slack as I ask you to shove this concern up your ass. No profession has demonstrated more contempt for intellectual property.&lt;/p&gt;

&lt;p&gt;The median dev thinks Star Wars and Daft Punk are a public commons. The great cultural project of developers has been opposing any protection that might inconvenience a monetizable media-sharing site. When they fail at policy, they route around it with coercion. They stand up global-scale piracy networks and sneer at anybody who so much as tries to preserve a new-release window for a TV show.&lt;/p&gt;

&lt;p&gt;Call any of this out if you want to watch a TED talk about how hard it is to stream &lt;em&gt;The Expanse&lt;/em&gt; on LibreWolf. Yeah, we get it. You don&amp;rsquo;t believe in IPR. Then shut the fuck up about IPR. Reap the whirlwind.&lt;/p&gt;

&lt;p&gt;It&amp;rsquo;s all special pleading anyways. LLMs digest code further than you do. If you don&amp;rsquo;t believe a typeface designer can stake a moral claim on the terminals and counters of a letterform, you sure as hell can&amp;rsquo;t be possessive about a red-black tree.&lt;/p&gt;
&lt;h3 id='positive-case-redux' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#positive-case-redux' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;positive case redux&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;When I started writing a couple days ago, I wrote a section to &amp;ldquo;level set&amp;rdquo; to the  state of the art of LLM-assisted programming. A bluefish filet has a longer shelf life than an LLM take. In the time it took you to read this, everything changed.&lt;/p&gt;

&lt;p&gt;Kids today don&amp;rsquo;t just use agents; they use asynchronous agents. They wake up, free-associate 13 different things for their LLMs to work on, make coffee, fill out a TPS report, drive to the Mars Cheese Castle, and then check their notifications. They&amp;rsquo;ve got 13 PRs to review. Three get tossed and re-prompted. Five of them get the same feedback a junior dev gets. And five get merged.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;&amp;ldquo;I&amp;rsquo;m sipping rocket fuel right now,&amp;rdquo;&lt;/em&gt; a friend tells me. &lt;em&gt;&amp;ldquo;The folks on my team who aren&amp;rsquo;t embracing AI? It&amp;rsquo;s like they&amp;rsquo;re standing still.&amp;rdquo;&lt;/em&gt; He&amp;rsquo;s not bullshitting me. He doesn&amp;rsquo;t work in SFBA. He&amp;rsquo;s got no reason to lie.&lt;/p&gt;

&lt;p&gt;There&amp;rsquo;s plenty of things I can&amp;rsquo;t trust an LLM with. No LLM has any of access to prod here. But I&amp;rsquo;ve been first responder on an incident and fed 4o — not o4-mini, 4o — log transcripts, and watched it in seconds spot LVM metadata corruption issues on a host we&amp;rsquo;ve been complaining about for months. Am I better than an LLM agent at interrogating OpenSearch logs and Honeycomb traces? No. No, I am not.&lt;/p&gt;

&lt;p&gt;To the consternation of many of my friends, I&amp;rsquo;m not a radical or a futurist. I&amp;rsquo;m a statist. I believe in the haphazard perseverance of complex systems, of institutions, of reversions to the mean. I write Go and Python code. I&amp;rsquo;m not a Kool-aid drinker.&lt;/p&gt;

&lt;p&gt;But something real is happening. My smartest friends are blowing it off. Maybe I persuade you. Probably I don&amp;rsquo;t. But we need to be done making space for bad arguments.&lt;/p&gt;
&lt;h3 id='but-im-tired-of-hearing-about-it' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#but-im-tired-of-hearing-about-it' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;but i&amp;rsquo;m tired of hearing about it&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;And here I rejoin your company. I read &lt;a href='https://simonwillison.net/' title=''&gt;Simon Willison&lt;/a&gt;, and that&amp;rsquo;s all I really need. But all day, every day, a sizable chunk of the front page of HN is allocated to LLMs: incremental model updates, startups doing things with LLMs, LLM tutorials, screeds against LLMs. It&amp;rsquo;s annoying!&lt;/p&gt;

&lt;p&gt;But AI is also incredibly — a word I use advisedly — important. It&amp;rsquo;s getting the same kind of attention that smart phones got in 2008, and not as much as the Internet got. That seems about right.&lt;/p&gt;

&lt;p&gt;I think this is going to get clearer over the next year. The cool kid haughtiness about &amp;ldquo;stochastic parrots&amp;rdquo; and &amp;ldquo;vibe coding&amp;rdquo; can&amp;rsquo;t survive much more contact with reality. I&amp;rsquo;m snarking about these people, but I meant what I said: they&amp;rsquo;re smarter than me. And when they get over this affectation, they&amp;rsquo;re going to make coding agents profoundly more effective than they are today.&lt;/p&gt;
</content>
  </entry>
  <entry>
    <title>Using Kamal 2.0 in Production</title>
    <link rel="alternate" href="https://fly.io/blog/kamal-in-production/"/>
    <id>https://fly.io/blog/kamal-in-production/</id>
    <published>2025-05-29T00:00:00+00:00</published>
    <updated>2025-06-05T20:40:14+00:00</updated>
    <media:thumbnail url="https://fly.io/blog/kamal-in-production/assets/production.jpg"/>
    <content type="html">&lt;p&gt;&lt;a href='https://pragprog.com/titles/rails8/agile-web-development-with-rails-8/' title=''&gt;Agile Web Development with Rails 8&lt;/a&gt; is off to production, where they do things like editing, indexing, pagination, and printing. In researching the chapter on Deployment and Production, I became very dissatisfied with the content available on Kamal. I ended up writing my own, and it went well beyond the scope of the book. I then extracted what I needed from the result and put it in the book.&lt;/p&gt;

&lt;p&gt;Now that I have some spare time, I took a look at the greater work. It was more than a chapter and less than a book, so I decided to publish it &lt;a href='https://rubys.github.io/kamal-in-production/' title=''&gt;online&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;This took me only a matter of hours. I had my notes in the XML grammar that Pragmatic Programming uses for books. I asked GitHub Copilot to convert them to Markdown. It did the job without my having to explain the grammar. It made intelligent guesses as to how to handle footnotes and got a number of these wrong, but that was easy to fix. On a lark, I asked it to proofread the content, and it did that too.&lt;/p&gt;

&lt;hr&gt;

&lt;p&gt;Don&amp;rsquo;t get me wrong, Kamal is great. There are plenty of videos on how to get toy projects online, and the documentation will tell you what each field in the configuration file does. But none pull together everything you need to deploy a real project. For example, &lt;a href='https://rubys.github.io/kamal-in-production/Assemble/' title=''&gt;there are seven things you need to get started&lt;/a&gt;. Some are optional, some you may already have, and all can be gathered quickly &lt;strong class='font-semibold text-navy-950'&gt;if you have a list&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Kamal is just one piece of the puzzle. To deploy your software using Kamal, you need to be aware of the vast Docker ecosystem. You will want to set up a builder, sign up for a container repository, and lock down your secrets. And as you grow, you will want a load balancer and a managed database.&lt;/p&gt;

&lt;p&gt;And production is much more than copying files and starting a process. It is ensuring that your database is backed up, that your logs are searchable, and that your application is being monitored. It is also about ensuring that your application is secure.&lt;/p&gt;

&lt;p&gt;My list is opinionated. Each choice has a lot of options. For SSH keys, there are a lot of key types. If you don&amp;rsquo;t have an opinion, go with what GitHub recommends. For hosting, there are a lot of providers, and most videos start with Hetzner, which is a solid choice. But did you know that there are separate paths for Cloud and Robot, and when you would want either?&lt;/p&gt;

&lt;p&gt;A list with default options and alternatives highlighted was what would have helped me get started. Now it is available to you. The &lt;a href='https://github.com/rubys/kamal-in-production/' title=''&gt;source is on GitHub&lt;/a&gt;. &lt;a href='https://creativecommons.org/public-domain/cc0/' title=''&gt;CC0 licensed&lt;/a&gt;. Feel free to add side pages or links to document Digital Ocean, add other monitoring tools, or simply correct a typo that GitHub Copilot and I missed (or made).&lt;/p&gt;

&lt;hr&gt;

&lt;p&gt;And if you happen to be in the south eastern part of the US in August, come see me talk on this topic at the &lt;a href='https://blog.carolina.codes/p/announcing-our-2025-speakers' title=''&gt;Carolina Code Conference&lt;/a&gt;. If you can&amp;rsquo;t make it, the presentation will be recorded and posted online.&lt;/p&gt;
</content>
  </entry>
  <entry>
    <title>parking_lot: ffffffffffffffff...</title>
    <link rel="alternate" href="https://fly.io/blog/parking-lot-ffffffffffffffff/"/>
    <id>https://fly.io/blog/parking-lot-ffffffffffffffff/</id>
    <published>2025-05-28T00:00:00+00:00</published>
    <updated>2025-05-28T20:20:37+00:00</updated>
    <media:thumbnail url="https://fly.io/blog/parking-lot-ffffffffffffffff/assets/ffff-cover.webp"/>
    <content type="html">&lt;div class="lead"&gt;&lt;p&gt;We’re Fly.io, a public cloud that runs apps in a bunch of locations all over the world. This is a post about a gnarly bug in our Anycast router, the largest Rust project in our codebase. You won’t need much of any Rust to follow it.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;The basic idea is simple: customers give us Docker containers, and tell us which one of 30+ regions around the world they want them to run in. We convert the containers into lightweight virtual machines, and then link them to an Anycast network. If you boot up an app here in Sydney and Frankfurt, and a request for it lands in Narita, it&amp;rsquo;ll get routed to Sydney. The component doing that work is called &lt;code&gt;fly-proxy&lt;/code&gt;. It&amp;rsquo;s a Rust program, and it has been ill behaved of late.&lt;/p&gt;
&lt;h3 id='dramatis-personae' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#dramatis-personae' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Dramatis Personae&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;&lt;code&gt;fly-proxy&lt;/code&gt;, our intrepid Anycast router.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;corrosion&lt;/code&gt;, our intrepid Anycast routing protocol.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;Rust&lt;/code&gt;, a programming language you probably don&amp;rsquo;t use.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;read-write locks&lt;/code&gt;, a synchronization primitive that allows for many readers &lt;em&gt;or&lt;/em&gt; one single writer.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;parking_lot&lt;/code&gt;, a well-regarded optimized implementation of locks in Rust.&lt;/p&gt;
&lt;div class="callout"&gt;&lt;p&gt;Gaze not into the abyss, lest you become recognized as an &lt;strong class="font-semibold text-navy-950"&gt;&lt;em&gt;abyss domain expert&lt;/em&gt;&lt;/strong&gt;, and they expect you keep gazing into the damn thing&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Mathewson &lt;a href="https://x.com/nickm_tor/status/860234274842324993?lang=en" title=""&gt;6:31&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;
&lt;/div&gt;&lt;h3 id='anycast-routing' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#anycast-routing' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Anycast Routing&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;You already know how to build a proxy. Connection comes in, connection goes out, copy back and forth. So for the amount of time we spend writing about &lt;code&gt;fly-proxy&lt;/code&gt;, you might wonder what the big deal is.&lt;/p&gt;

&lt;p&gt;To be fair, in the nuts and bolts of actually proxying requests, &lt;code&gt;fly-proxy&lt;/code&gt; does some interesting stuff. For one thing, it&amp;rsquo;s &lt;a href='https://github.com/jedisct1/yes-rs' title=''&gt;written in Rust&lt;/a&gt;, which is apparently a big deal all on its own. It&amp;rsquo;s a large, asynchronous codebase that handles multiple protocols, TLS termination, and certificate issuance. It exercises a lot of &lt;a href='https://tokio.rs/' title=''&gt;Tokio&lt;/a&gt; features.&lt;/p&gt;

&lt;p&gt;But none of this is the hard part of &lt;code&gt;fly-proxy&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;We operate thousands of servers around the world. A customer Fly Machine might get scheduled onto any of them; in some places, the expectation we set with customers is that their Machines will potentially start in less than a second. More importantly, a Fly Machine can terminate instantly. In both cases, but especially the latter, &lt;code&gt;fly-proxy&lt;/code&gt; potentially needs to know, so that it does (or doesn&amp;rsquo;t) route traffic there.&lt;/p&gt;

&lt;p&gt;This is the hard problem: managing millions of connections for millions of apps. It&amp;rsquo;s a lot of state to manage, and it&amp;rsquo;s in constant flux. We refer to this as the &amp;ldquo;state distribution problem&amp;rdquo;, but really, it quacks like a routing protocol.&lt;/p&gt;
&lt;h3 id='our-routing-protocol-is-corrosion' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#our-routing-protocol-is-corrosion' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Our Routing Protocol is Corrosion&lt;/span&gt;&lt;/h3&gt;&lt;div class="right-sidenote"&gt;&lt;p&gt;Corrosion2, to be precise.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;We&amp;rsquo;ve been through multiple iterations of the state management problem, and the stable place we&amp;rsquo;ve settled is a &lt;a href='https://github.com/superfly/corrosion' title=''&gt;system called Corrosion&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Corrosion is a large SQLite database mirrored globally across several thousand servers. There are three important factors that make Corrosion work:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;The SQLite database Corrosion replicates is CRDT-structured.
&lt;/li&gt;&lt;li&gt;In our orchestration model, individual worker servers are the source of truth for any Fly Machines running on them; there&amp;rsquo;s no globally coordinated orchestration state.
&lt;/li&gt;&lt;li&gt;We use &lt;a href='http://fly.io/blog/building-clusters-with-serf/#what-serf-is-doing' title=''&gt;SWIM gossip&lt;/a&gt; to publish updates from those workers across the fleet.
&lt;/li&gt;&lt;/ol&gt;

&lt;p&gt;This works. A Fly Machine terminates in Dallas; a &lt;code&gt;fly-proxy&lt;/code&gt; instance in Singapore knows within a small number of seconds.&lt;/p&gt;
&lt;h3 id='routing-protocol-implementations-are-hard' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#routing-protocol-implementations-are-hard' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Routing Protocol Implementations Are Hard&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;A routing protocol is a canonical example of a distributed system. We&amp;rsquo;ve certainly hit distsys bugs in Corrosion. But this story is about basic implementation issues. &lt;/p&gt;

&lt;p&gt;A globally replicated SQLite database is an awfully nice primitive, but we&amp;rsquo;re not actually doing SQL queries every time a request lands.&lt;/p&gt;

&lt;p&gt;In somewhat the same sense as a router works both with a &lt;a href='https://www.dasblinkenlichten.com/rib-and-fib-understanding-the-terminology/' title=''&gt;RIB and a FIB&lt;/a&gt;, there is in &lt;code&gt;fly-proxy&lt;/code&gt; a system of record for routing information (Corrosion), and then an in-memory aggregation of that information used to make fast decisions. In &lt;code&gt;fly-proxy&lt;/code&gt;, that&amp;rsquo;s called the Catalog. It&amp;rsquo;s a record of everything in Corrosion a proxy might need to know about to forward requests.&lt;/p&gt;

&lt;p&gt;Here&amp;rsquo;s a fun bug from last year:&lt;/p&gt;

&lt;p&gt;At any given point in time, there&amp;rsquo;s a lot going on inside &lt;code&gt;fly-proxy&lt;/code&gt;. It is a highly concurrent system. In particular, there are many readers to the Catalog, and also, when Corrosion updates, writers. We manage access to the Catalog with a system of &lt;a href='https://doc.rust-lang.org/std/sync/struct.RwLock.html' title=''&gt;read-write locks&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Rust is big on pattern-matching; instead of control flow based (at bottom) on simple arithmetic expressions, Rust allows you to &lt;code&gt;match&lt;/code&gt; exhaustively (with compiler enforcement) on the types of an expression, and the type system expresses things like &lt;code&gt;Ok&lt;/code&gt; or &lt;code&gt;Err&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;But &lt;code&gt;match&lt;/code&gt; can be cumbersome, and so there are shorthands. One of them is &lt;code&gt;if let&lt;/code&gt;, which is syntax that makes a pattern match read like a classic &lt;code&gt;if&lt;/code&gt; statement. Here&amp;rsquo;s an example:&lt;/p&gt;
&lt;div class="highlight-wrapper group relative rust"&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-wrap-target="#code-fl5sng0r"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-copy-target="sibling"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight '&gt;&lt;code id="code-fl5sng0r"&gt;&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;Some&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nn"&gt;Load&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;Local&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;load&lt;/span&gt;&lt;span class="p"&gt;)))&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="k"&gt;self&lt;/span&gt;&lt;span class="py"&gt;.load&lt;/span&gt;&lt;span class="nf"&gt;.read&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="nf"&gt;.get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;...&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;// do a bunch of stuff with `load`&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;self&lt;/span&gt;&lt;span class="nf"&gt;.init_for&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;...&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;The &amp;ldquo;if&amp;rdquo; arm of that branch is taken if &lt;code&gt;self.load.read().get()&lt;/code&gt; returns a value with the type &lt;code&gt;Some&lt;/code&gt;. To retrieve that value, the expression calls &lt;code&gt;read()&lt;/code&gt; to grab a lock.&lt;/p&gt;
&lt;div class="right-sidenote"&gt;&lt;p&gt;though Rust programmers probably notice the bug quickly&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;The bug is subtle: in that code, the lock &lt;code&gt;self.load.read().get()&lt;/code&gt; takes is held not just for the duration of the &amp;ldquo;if&amp;rdquo; arm, but also for the &amp;ldquo;else&amp;rdquo; arm — you can think of &lt;code&gt;if let&lt;/code&gt; expressions as being rewritten to the equivalent &lt;code&gt;match&lt;/code&gt; expression, where that lifespan is much clearer.&lt;/p&gt;

&lt;p&gt;Anyways that&amp;rsquo;s real code and it occurred on a code path in &lt;code&gt;fly-proxy&lt;/code&gt; that was triggered by a Corrosion update propagating from host to host across our fleet in millisecond intervals of time, converging quickly, like a good little routing protocol, on a global consensus that our entire Anycast routing layer should be deadlocked. Fun times.&lt;/p&gt;
&lt;h3 id='the-watchdog-and-regionalizing' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#the-watchdog-and-regionalizing' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;The Watchdog, and Regionalizing&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;The experience of that Anycast outage was arresting, and immediately altered our plans. We set about two tasks, one short-term and one long.&lt;/p&gt;

&lt;p&gt;In the short term: we made deadlocks nonlethal with a &amp;ldquo;watchdog&amp;rdquo; system. &lt;code&gt;fly-proxy&lt;/code&gt; has an internal control channel (it drives a REPL operators can run from our servers). During a deadlock (or dead-loop or exhaustion), that channel becomes nonresponsive. We watch for that and bounce the proxy when it happens. A deadlock is still bad! But it&amp;rsquo;s a second-or-two-length arrhythmia, not asystole.&lt;/p&gt;

&lt;p&gt;Meanwhile, over the long term: we&amp;rsquo;re confronting the implications of all our routing state sharing a global broadcast domain. The update that seized up Anycast last year pertained to an app nobody used. There wasn&amp;rsquo;t any real reason for any &lt;code&gt;fly-proxy&lt;/code&gt; to receive it in the first place. But in the &lt;em&gt;status quo ante&lt;/em&gt; of the outage, every proxy received updates for every Fly Machine.&lt;/p&gt;

&lt;p&gt;They still do. Why? Because it scales, and fixing it turns out to be a huge lift. It&amp;rsquo;s a lift we&amp;rsquo;re still making! It&amp;rsquo;s just taking time. We call this effort &amp;ldquo;regionalization&amp;rdquo;, because the next intermediate goal is to confine most updates to the region (Sydney, Frankfurt, Dallas) in which they occur.&lt;/p&gt;

&lt;p&gt;I hope this has been a satisfying little tour of the problem domain we&amp;rsquo;re working in. We have now reached the point where I can start describing the new bug.&lt;/p&gt;
&lt;h3 id='new-bug-round-1-lazy-loading' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#new-bug-round-1-lazy-loading' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;New Bug, Round 1: Lazy Loading&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;We want to transform our original Anycast router, designed to assume a full picture of global state, into a regionalized router with partitioned state. A sane approach: have it lazy-load state information. Then you can spare most of the state code; state for apps that never show up at a particular &lt;code&gt;fly-proxy&lt;/code&gt; in, say, Hong Kong simply doesn&amp;rsquo;t get loaded.&lt;/p&gt;

&lt;p&gt;For months now, portions of the &lt;code&gt;fly-proxy&lt;/code&gt; Catalog have been lazy-loaded. A few weeks ago, the decision is made to get most of the rest of the Catalog state (apps, machines, &amp;amp;c) lazy-loaded as well. It&amp;rsquo;s a straightforward change and it gets rolled out quickly.&lt;/p&gt;

&lt;p&gt;Almost as quickly, proxies begin locking up and getting bounced by the watchdog. Not in the frightening Beijing-Olympics-level coordinated fashion that happened during the Outage, but any watchdog bounce is a problem.&lt;/p&gt;

&lt;p&gt;We roll back the change.&lt;/p&gt;

&lt;p&gt;From the information we have, we&amp;rsquo;ve narrowed things down to two suspects. First, lazy-loading changes the read/write patterns and thus the pressure on the RWLocks the Catalog uses; it could just be lock contention. Second, we spot a suspicious &lt;code&gt;if let&lt;/code&gt;.&lt;/p&gt;
&lt;h3 id='new-bug-round-2-the-lock-refactor' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#new-bug-round-2-the-lock-refactor' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;New Bug, Round 2: The Lock Refactor&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;Whichever the case, there&amp;rsquo;s a reason these proxies locked up with the new lazy-loading code, and our job is to fix it. The &lt;code&gt;if let&lt;/code&gt; is easy. Lock contention is a little trickier.&lt;/p&gt;

&lt;p&gt;At this point it&amp;rsquo;s time to introduce a new character to the story, though they&amp;rsquo;ve been lurking on the stage the whole time: it&amp;rsquo;s &lt;a href='https://github.com/Amanieu/parking_lot' title=''&gt;&lt;code&gt;parking_lot&lt;/code&gt;&lt;/a&gt;, an important, well-regarded, and widely-used replacement for the standard library&amp;rsquo;s lock implementation.&lt;/p&gt;

&lt;p&gt;Locks in &lt;code&gt;fly-proxy&lt;/code&gt; are &lt;code&gt;parking_lot&lt;/code&gt; locks. People use &lt;code&gt;parking_lot&lt;/code&gt; mostly because it is very fast and tight (a lock takes up just a single 64-bit word). But we use it for the feature set. The feature we&amp;rsquo;re going to pull out this time is lock timeouts: the RWLock in &lt;code&gt;parking_lot&lt;/code&gt; exposes a &lt;a href='https://amanieu.github.io/parking_lot/parking_lot/struct.RwLock.html#method.try_write_for' title=''&gt;&lt;code&gt;try_write_for&lt;/code&gt;&lt;/a&gt;method, which takes a &lt;code&gt;Duration&lt;/code&gt;, after which an attempt to grab the write lock fails.&lt;/p&gt;

&lt;p&gt;Before rolling out a new lazy-loading &lt;code&gt;fly-proxy&lt;/code&gt;, we do some refactoring:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;our Catalog write locks all time out, so we&amp;rsquo;ll get telemetry and a failure recovery path if that&amp;rsquo;s what&amp;rsquo;s choking the proxy to death,
&lt;/li&gt;&lt;li&gt;we eliminate RAII-style lock acquisition from the Catalog code (in RAII style, you grab a lock into a local value, and the lock is released implicitly when the scope that expression occurs in concludes) and replace it with explicit closures, so you can look at the code and see precisely the interval in which the lock is held, and
&lt;/li&gt;&lt;li&gt;since we now have code that is very explicit about locking intervals, we instrument it with logging and metrics so we can see what&amp;rsquo;s happening.
&lt;/li&gt;&lt;/ul&gt;

&lt;p&gt;We should be set. The suspicious &lt;code&gt;if let&lt;/code&gt; is gone, lock acquisition can time out, and we have all this new visibility.&lt;/p&gt;

&lt;p&gt;Nope. Immediately more lockups, all in Europe, especially in &lt;code&gt;WAW&lt;/code&gt;.&lt;/p&gt;
&lt;h3 id='new-bug-round-3-telemetry-inspection' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#new-bug-round-3-telemetry-inspection' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;New Bug, Round 3: Telemetry Inspection&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;That we&amp;rsquo;re still seeing deadlocks is f&amp;#39;ing weird. We&amp;rsquo;ve audited all our Catalog locks. You can look at the code and see the lifespan of a grabbed lock.&lt;/p&gt;

&lt;p&gt;We have a clue: our lock timeout logs spam just before a proxy locks up and is killed by the watchdog. This clue: useless. But we don&amp;rsquo;t know that yet!&lt;/p&gt;

&lt;p&gt;Our new hunch is that something is holding a lock for a very long time, and we just need to find out which thing that is. Maybe the threads updating the Catalog from Corrosion are dead-looping?&lt;/p&gt;

&lt;p&gt;The instrumentation should tell the tale. But it does not. We see slow locks… just before the entire proxy locks up. And they happen in benign, quiet applications. Random, benign, quiet applications.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;parking_lot&lt;/code&gt; has a &lt;a href='https://amanieu.github.io/parking_lot/parking_lot/deadlock/index.html' title=''&gt;deadlock detector&lt;/a&gt;. If you ask it, it&amp;rsquo;ll keep a waiting-for dependency graph and detect stalled threads. This runs on its own thread, isolate outside the web of lock dependencies in the rest of the proxy. We cut a build of the proxy with the deadlock detector enabled and wait for something in &lt;code&gt;WAW&lt;/code&gt; to lock up. And it does. But &lt;code&gt;parking_lot&lt;/code&gt; doesn&amp;rsquo;t notice. As far as it&amp;rsquo;s concerned, nothing is wrong.&lt;/p&gt;

&lt;p&gt;We are at this moment very happy we did the watchdog thing.&lt;/p&gt;
&lt;h3 id='new-bug-round-4-descent-into-madness' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#new-bug-round-4-descent-into-madness' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;New Bug, Round 4: Descent Into Madness&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;When the watchdog bounces a proxy, it snaps a core dump from the process it just killed. We are now looking at core dumps. There is only one level of decompensation to be reached below &amp;ldquo;inspecting core dumps&amp;rdquo;, and that&amp;rsquo;s &amp;ldquo;blaming the compiler&amp;rdquo;. We will get there.&lt;/p&gt;

&lt;p&gt;Here&amp;rsquo;s Pavel, at the time:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;I’ve been staring at the last core dump from &lt;code&gt;waw&lt;/code&gt; . It’s quite strange.
First, there is no thread that’s running inside the critical section. Yet, there is a thread that’s waiting to acquire write lock and a bunch of threads waiting to acquire a read lock.
That’s doesn’t prove anything, of course, as a thread holding catalog write lock might have just released it before core dump was taken. But that would be quite a coincidence.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The proxy is locked up. But there are no threads in a critical section for the catalog. Nevertheless, we can see stack traces of multiple threads waiting to acquire locks. In the moment, Pavel thinks this could be a fluke. But we&amp;rsquo;ll soon learn that &lt;em&gt;every single stack trace&lt;/em&gt; shows the same pattern: everything wants the Catalog lock, but nobody has it.&lt;/p&gt;

&lt;p&gt;It&amp;rsquo;s hard to overstate how weird this is. It breaks both our big theories: it&amp;rsquo;s not compatible with a Catalog deadlock that we missed, and it&amp;rsquo;s not compatible with a very slow lock holder. Like a podcast listener staring into the sky at the condensation trail of a jetliner, we begin reaching for wild theories. For instance: &lt;code&gt;parking_lot&lt;/code&gt; locks are synchronous, but we&amp;rsquo;re a Tokio application; something somewhere could be taking an async lock that&amp;rsquo;s confusing the runtime. Alas, no.&lt;/p&gt;

&lt;p&gt;On the plus side, we are now better at postmortem core dump inspection with &lt;code&gt;gdb&lt;/code&gt;.&lt;/p&gt;
&lt;h3 id='new-bug-round-5-madness-gives-way-to-desperation' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#new-bug-round-5-madness-gives-way-to-desperation' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;New Bug, Round 5: Madness Gives Way To Desperation&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;Fuck it, we&amp;rsquo;ll switch to &lt;code&gt;read_recursive&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;A recursive read lock is an eldritch rite invoked when you need to grab a read lock deep in a call tree where you already grabbed that lock, but can&amp;rsquo;t be arsed to structure the code properly to reflect that. When you ask for a recursive lock, if you already hold the lock, you get the lock again, instead of a deadlock.&lt;/p&gt;

&lt;p&gt;Our theory: &lt;code&gt;parking_lot&lt;/code&gt;goes through some trouble to make sure a stampede of readers won&amp;rsquo;t starve writers, who are usually outnumbered. It prefers writers by preventing readers from acquiring locks when there&amp;rsquo;s at least one waiting writer. And &lt;code&gt;read_recursive&lt;/code&gt; sidesteps that logic.&lt;/p&gt;

&lt;p&gt;Maybe there&amp;rsquo;s some ultra-slow reader somehow not showing up in our traces, and maybe switching to recursive locks will cut through that.&lt;/p&gt;

&lt;p&gt;This does not work. At least, not how we hoped it would. It does generate a new piece of evidence:  &lt;code&gt;RwLock reader count overflow&lt;/code&gt; log messages, and lots of them.&lt;/p&gt;
&lt;h3 id='there-are-things-you-are-not-meant-to-know' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#there-are-things-you-are-not-meant-to-know' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;There Are Things You Are Not Meant To Know&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;You&amp;rsquo;re reading a 3,000 word blog post about a single concurrency bug, so my guess is you&amp;rsquo;re the kind of person who compulsively wants to understand how everything works. That&amp;rsquo;s fine, but a word of advice: there are things where, if you find yourself learning about them in detail, something has gone wrong.&lt;/p&gt;

&lt;p&gt;One of those things is the precise mechanisms used by your RWLock implementation.&lt;/p&gt;

&lt;p&gt;The whole point of &lt;code&gt;parking_lot&lt;/code&gt; is that the locks are tiny, marshalled into a 64 bit word. Those bits are partitioned into &lt;a href='https://github.com/Amanieu/parking_lot/blob/master/src/raw_rwlock.rs' title=''&gt;4 signaling bits&lt;/a&gt; (&lt;code&gt;PARKED&lt;/code&gt;, &lt;code&gt;WRITER_PARKED&lt;/code&gt;, &lt;code&gt;WRITER&lt;/code&gt;, and &lt;code&gt;UPGRADEABLE&lt;/code&gt;) and a 60-bit counter of lock holders.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Me, a dummy: sounds like we overflowed that counter.&lt;/p&gt;

&lt;p&gt;Pavel, a genius: we are not overflowing a 60-bit counter.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Ruling out a genuine overflow (which would require paranormal activity) leaves us with corruption as the culprit. Something is bashing our lock word. If the counter is corrupted, maybe the signaling bits are too; then we&amp;rsquo;re in an inconsistent state, an artificial deadlock.&lt;/p&gt;

&lt;p&gt;Easily confirmed. We cast the lock words into &lt;code&gt;usize&lt;/code&gt; and log them. Sure enough, they&amp;rsquo;re &lt;code&gt;0xFFFFFFFFFFFFFFFF&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;This is a smoking gun, because it implies all 4 signaling bits are set, and that includes &lt;code&gt;UPGRADEABLE&lt;/code&gt;. Upgradeable locks are read-locks that can be &amp;ldquo;upgraded&amp;rdquo; to write locks. We don&amp;rsquo;t use them.&lt;/p&gt;

&lt;p&gt;This looks like classic memory corruption. But in our core dumps, memory doesn&amp;rsquo;t appear corrupted: the only thing set all &lt;code&gt;FFh&lt;/code&gt; is the lock word.&lt;/p&gt;

&lt;p&gt;We compile and run our test suites &lt;a href='https://github.com/rust-lang/miri' title=''&gt;under &lt;code&gt;miri&lt;/code&gt;&lt;/a&gt;, a Rust interpreter for its &lt;a href='https://rustc-dev-guide.rust-lang.org/mir/index.html' title=''&gt;MIR IR&lt;/a&gt;. &lt;code&gt;miri&lt;/code&gt; does all sorts of cool UB detection, and does in fact spot some UB in our tests, which we are at the time excited about until we deploy the fixes and nothing gets better.&lt;/p&gt;

&lt;p&gt;At this point, Saleem suggests guard pages. We could &lt;code&gt;mprotect&lt;/code&gt; memory pages around the lock to force a panic if a wild write hits &lt;em&gt;near&lt;/em&gt; the lock word, or just allocate zero pages and check them, which we are at the time excited about until we deploy the guard pages and nothing hits them.&lt;/p&gt;
&lt;h3 id='the-non-euclidean-horror-at-the-heart-of-this-bug' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#the-non-euclidean-horror-at-the-heart-of-this-bug' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;The Non-Euclidean Horror At The Heart Of This Bug&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;At this point we should recap where we find ourselves:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;We made a relatively innocuous change to our proxy, which caused proxies in Europe to get watchdog-killed.
&lt;/li&gt;&lt;li&gt;We audited and eliminated all the nasty &lt;code&gt;if-letses&lt;/code&gt;.
&lt;/li&gt;&lt;li&gt;We replaced all RAII lock acquisitions with explicit closures, and instrumented the closures. 
&lt;/li&gt;&lt;li&gt;We enabled &lt;code&gt;parking_lot&lt;/code&gt; deadlock detection. 
&lt;/li&gt;&lt;li&gt;We captured and analyzed core dumps for the killed proxies. 
&lt;/li&gt;&lt;li&gt;We frantically switched to recursive read locks, which generated a new error.
&lt;/li&gt;&lt;li&gt;We spotted what looks like memory corruption, but only of that one tiny lock word.
&lt;/li&gt;&lt;li&gt;We ran our code under an IR interpreter to find UB, fixed some UB, and didn&amp;rsquo;t fix the bug.
&lt;/li&gt;&lt;li&gt;We set up guard pages to catch wild writes.
&lt;/li&gt;&lt;/ul&gt;

&lt;p&gt;In Richard Cook&amp;rsquo;s essential &lt;a href='https://how.complexsystems.fail/' title=''&gt;&amp;ldquo;How Complex Systems Fail&amp;rdquo;&lt;/a&gt;, rule #5 is that &amp;ldquo;complex systems operate in degraded mode&amp;rdquo;. &lt;em&gt;The system continues to function because it contains so many redundancies and because people can make it function, despite the presence of many flaws&lt;/em&gt;. Maybe &lt;code&gt;fly-proxy&lt;/code&gt; is now like the RBMK reactors at Chernobyl, a system that works just fine as long as operators know about and compensate for its known concurrency flaws, and everybody lives happily ever after.&lt;/p&gt;
&lt;div class="right-sidenote"&gt;&lt;p&gt;We are, in particular, running on the most popular architecture for its RWLock implementation.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;We have reached the point where serious conversations are happening about whether we&amp;rsquo;ve found a Rust compiler bug. Amusingly, &lt;code&gt;parking_lot&lt;/code&gt; is so well regarded among Rustaceans that it&amp;rsquo;s equally if not more plausible that Rust itself is broken.&lt;/p&gt;

&lt;p&gt;Nevertheless, we close-read the RWLock implementation. And we spot this:&lt;/p&gt;
&lt;div class="highlight-wrapper group relative rust"&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-wrap-target="#code-fxdbmwey"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-copy-target="sibling"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight '&gt;&lt;code id="code-fxdbmwey"&gt;&lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;state&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;self&lt;/span&gt;&lt;span class="py"&gt;.state&lt;/span&gt;&lt;span class="nf"&gt;.fetch_add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
   &lt;span class="n"&gt;prev_value&lt;/span&gt;&lt;span class="nf"&gt;.wrapping_sub&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;WRITER_BIT&lt;/span&gt; &lt;span class="p"&gt;|&lt;/span&gt; &lt;span class="n"&gt;WRITER_PARKED_BIT&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
                           &lt;span class="nn"&gt;Ordering&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;Relaxed&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;This looks like gibberish, so let&amp;rsquo;s rephrase that code to see what it&amp;rsquo;s actually doing:&lt;/p&gt;
&lt;div class="highlight-wrapper group relative rust"&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-wrap-target="#code-1648p3rc"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-copy-target="sibling"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight '&gt;&lt;code id="code-1648p3rc"&gt;&lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;state&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;self&lt;/span&gt;&lt;span class="py"&gt;.state&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt; &lt;span class="o"&gt;~&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;WRITER_BIT&lt;/span&gt;&lt;span class="p"&gt;|&lt;/span&gt;&lt;span class="n"&gt;WRITER_PARKED_BIT&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;If you know exactly the state of the word you&amp;rsquo;re writing to, and you know exactly the limited set of permutations the bits of that word can take (for instance, because there&amp;rsquo;s only 4 signaling bits), then instead of clearing bit by fetching a word, altering it, and then storing it, you can clear them &lt;em&gt;atomically&lt;/em&gt; by adding the inverse of those bits to the word.&lt;/p&gt;

&lt;p&gt;This pattern is self-synchronizing, but it relies on an invariant: you&amp;rsquo;d better be right about the original state of the word you&amp;rsquo;re altering. Because if you&amp;rsquo;re wrong, you&amp;rsquo;re adding a very large value to an uncontrolled value.&lt;/p&gt;

&lt;p&gt;In &lt;code&gt;parking_lot&lt;/code&gt;, say we have &lt;code&gt;WRITER&lt;/code&gt; and &lt;code&gt;WRITER_PARKED&lt;/code&gt; set: the state is &lt;code&gt;0b1010&lt;/code&gt;. &lt;code&gt;prev_value&lt;/code&gt;, the state of the lock word when the lock operation started, is virtually always 0, and that&amp;rsquo;s what we&amp;rsquo;re counting on. &lt;code&gt;prev_value.wrapping_sub()&lt;/code&gt;then calculates &lt;code&gt;0xFFFFFFFFFFFFFFF6&lt;/code&gt;, which exactly cancels out the &lt;code&gt;0b1010&lt;/code&gt; state, leaving 0.&lt;/p&gt;
&lt;div class="right-sidenote"&gt;&lt;p&gt;As a grace note, the locking code manages to set that one last unset bit before the proxy deadlocks.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;Consider though what happens if one of those bits isn&amp;rsquo;t set: state is &lt;code&gt;0b1000&lt;/code&gt;. Now that add doesn&amp;rsquo;t cancel out; the final state is instead &lt;code&gt;0xFFFFFFFFFFFFFFFE&lt;/code&gt;. The reader count is completely full and can&amp;rsquo;t be decremented, and all the waiting bits are set so nothing can happen on the lock.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;parking_lot&lt;/code&gt; is a big deal and we&amp;rsquo;re going to be damn sure before we file a bug report. Which doesn&amp;rsquo;t take long; Pavel reproduces the bug in a minimal test case, with a forked version of &lt;code&gt;parking_lot&lt;/code&gt; that confirms and logs the condition.&lt;/p&gt;

&lt;p&gt;&lt;a href='https://github.com/Amanieu/parking_lot/issues/465' title=''&gt;The &lt;code&gt;parking_lot&lt;/code&gt; team quickly confirms&lt;/a&gt; and fixes the bug.&lt;/p&gt;
&lt;h3 id='ex-insania-claritas' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#ex-insania-claritas' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Ex Insania, Claritas&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;Here&amp;rsquo;s what we now know to have been happening:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Thread 1 grabs a read lock.
&lt;/li&gt;&lt;li&gt;Thread 2 tries to grab a write lock, with a &lt;code&gt;try_write_for&lt;/code&gt; timeout; it&amp;rsquo;s &amp;ldquo;parked&amp;rdquo; waiting for the reader, which sets &lt;code&gt;WRITER&lt;/code&gt; and &lt;code&gt;WRITER_PARKED&lt;/code&gt; on the raw lock word.
&lt;/li&gt;&lt;li&gt;Thread 1 releases the lock, unparking a waiting writer, which unsets &lt;code&gt;WRITER_PARKED&lt;/code&gt;.
&lt;/li&gt;&lt;li&gt;Thread 2 wakes, but not for the reason it thinks: the timing is such that it thinks the lock has timed out. So it tries to clear both &lt;code&gt;WRITER&lt;/code&gt; and &lt;code&gt;WRITER_PARKED&lt;/code&gt; — a bitwise &amp;ldquo;double free&amp;rdquo;. Lock: corrupted. Computer: over. 
&lt;/li&gt;&lt;/ol&gt;

&lt;p&gt;&lt;a href='https://github.com/Amanieu/parking_lot/pull/466' title=''&gt;The fix is simple&lt;/a&gt;: the writer bit is cleared separately in the same wakeup queue as the reader. The fix is deployed, the lockups never recur.&lt;/p&gt;

&lt;p&gt;At a higher level, the story is this:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;We&amp;rsquo;re refactoring the proxy to regionalize it, which changes the pattern  of readers and writers on the catalog.
&lt;/li&gt;&lt;li&gt;As we broaden out lazy loads, we introduce writer contention, a classic concurrency/performance debugging problem. It looks like a deadlock at the time! It isn&amp;rsquo;t. 
&lt;/li&gt;&lt;li&gt;&lt;code&gt;try_write_for&lt;/code&gt; is a good move: we need tools to manage contention.
&lt;/li&gt;&lt;li&gt;But now we&amp;rsquo;re on a buggy code path in &lt;code&gt;parking_lot&lt;/code&gt; — we don&amp;rsquo;t know that and can&amp;rsquo;t understand it until we&amp;rsquo;ve lost enough of our minds to second-guess the library.
&lt;/li&gt;&lt;li&gt;We stumble on the bug out of pure dumb luck by stabbing in the dark with &lt;code&gt;read_recursive&lt;/code&gt;.
&lt;/li&gt;&lt;/ol&gt;

&lt;p&gt;Mysteries remain. Why did this only happen in &lt;code&gt;WAW&lt;/code&gt;? Some kind of crazy regional timing thing? Something to do with the Polish &lt;em&gt;kreska&lt;/em&gt; diacritic that makes L&amp;rsquo;s sound like W&amp;rsquo;s? The wax and wane of caribou populations? Some very high-traffic APIs local to these regions? All very plausible.&lt;/p&gt;

&lt;p&gt;We&amp;rsquo;ll never know because we fixed the bug.&lt;/p&gt;

&lt;p&gt;But we&amp;rsquo;re in a better place now, even besides the bug fix:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;we audited all our catalog locking, got rid of all the iflets, and stopped relying on RAII lock guards.
&lt;/li&gt;&lt;li&gt;the resulting closure patterns gave us lock timing metrics, which will be useful dealing with future write contention
&lt;/li&gt;&lt;li&gt;all writes are now labeled, and we emit labeled logs on slow writes, augmented with context data (like app IDs) where we have it
&lt;/li&gt;&lt;li&gt;we also now track the last and current holder of the write lock, augmented with context information; next time we have a deadlock, we should have all the information we need to identify the actors without &lt;code&gt;gdb&lt;/code&gt; stack traces.
&lt;/li&gt;&lt;/ul&gt;
</content>
  </entry>
  <entry>
    <title>Litestream: Revamped</title>
    <link rel="alternate" href="https://fly.io/blog/litestream-revamped/"/>
    <id>https://fly.io/blog/litestream-revamped/</id>
    <published>2025-05-20T00:00:00+00:00</published>
    <updated>2025-05-20T21:48:00+00:00</updated>
    <media:thumbnail url="https://fly.io/blog/litestream-revamped/assets/litestream-revamped.png"/>
    <content type="html">&lt;div class="lead"&gt;&lt;p&gt;&lt;a href="https://litestream.io/" title=""&gt;Litestream&lt;/a&gt; is an open-source tool that makes it possible to run many kinds of full-stack applications on top of SQLite by making them reliably recoverable from object storage. This is a post about the biggest change we’ve made to it since I launched it.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;Nearly a decade ago, I got a bug up my ass. I wanted to build full-stack applications quickly. But the conventional n-tier database design required me to do sysadmin work for each app I shipped. Even the simplest applications depended on heavy-weight database servers like Postgres or MySQL.&lt;/p&gt;

&lt;p&gt;I wanted to launch apps on SQLite, because SQLite is easy. But SQLite is embedded, not a server, which at the time implied that the data for my application lived (and died) with just one server.&lt;/p&gt;

&lt;p&gt;So in 2020, I wrote &lt;a href='https://litestream.io/' title=''&gt;Litestream&lt;/a&gt; to fix that.&lt;/p&gt;

&lt;p&gt;Litestream is a tool that runs alongside a SQLite application. Without changing that running application, it takes over the WAL checkpointing process to continuously stream database updates to an S3-compatible object store. If something happens to the server the app is running on, the whole database can efficiently be restored to a different server. You might lose servers, but you won&amp;rsquo;t lose your data.&lt;/p&gt;

&lt;p&gt;Litestream worked well. So we got ambitious. A few years later, we built &lt;a href='https://github.com/superfly/litefs' title=''&gt;LiteFS&lt;/a&gt;. LiteFS takes the ideas in Litestream and refines them, so that we can do read replicas and primary failovers with SQLite. LiteFS gives SQLite the modern deployment story of an n-tier database like Postgres, while keeping the database embedded.&lt;/p&gt;

&lt;p&gt;We like both LiteFS and Litestream. But Litestream is the more popular project. It&amp;rsquo;s easier to deploy and easier to reason about.&lt;/p&gt;

&lt;p&gt;There are some good ideas in LiteFS. We&amp;rsquo;d like Litestream users to benefit from them. So we&amp;rsquo;ve taken our LiteFS learnings and applied them to some new features in Litestream.&lt;/p&gt;
&lt;h2 id='point-in-time-restores-but-fast' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#point-in-time-restores-but-fast' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Point-in-time restores, but fast&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;&lt;a href='https://fly.io/blog/all-in-on-sqlite-litestream/' title=''&gt;Here&amp;rsquo;s how Litestream was originally designed&lt;/a&gt;: you run &lt;code&gt;litestream&lt;/code&gt; against a SQLite database, and it opens up a long-lived read transaction. This transaction arrests SQLite WAL checkpointing, the process by which SQLite consolidates the WAL back into the main database file. Litestream builds a &amp;ldquo;shadow WAL&amp;rdquo; that records WAL pages, and copies them to S3.&lt;/p&gt;

&lt;p&gt;This is simple, which is good. But it can also be slow. When you want to restore a database, you have have to pull down and replay every change since the last snapshot. If you changed a single database page a thousand times, you replay a thousand changes. For databases with frequent writes, this isn&amp;rsquo;t a good approach.&lt;/p&gt;

&lt;p&gt;In LiteFS, we took a different approach. LiteFS is transaction-aware. It doesn&amp;rsquo;t simply record raw WAL pages, but rather ordered ranges of pages associated with transactions, using a file format we call &lt;a href='https://github.com/superfly/ltx' title=''&gt;LTX&lt;/a&gt;. Each LTX file represents a sorted changeset of pages for a given period of time.&lt;/p&gt;

&lt;p&gt;&lt;img alt="a simple linear LTX file with 8 pages between 1 and 21" src="/blog/litestream-revamped/assets/linear-ltx.png" /&gt;&lt;/p&gt;

&lt;p&gt;Because they are sorted, we can easily merge multiple LTX files together and create a new LTX file with only the latest version of each page.&lt;/p&gt;

&lt;p&gt;&lt;img alt="merging three LTX files into one" src="/blog/litestream-revamped/assets/merged-ltx.png" /&gt;&lt;/p&gt;
&lt;div class="right-sidenote"&gt;&lt;p&gt;This is similar to how an &lt;a href="https://en.wikipedia.org/wiki/Log-structured_merge-tree" title=""&gt;LSM tree&lt;/a&gt; works.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;This process of combining smaller time ranges into larger ones is called &lt;em&gt;compaction&lt;/em&gt;. With it, we can replay a SQLite database to a specific point in time, with a minimal  duplicate pages.&lt;/p&gt;
&lt;h2 id='casaas-compare-and-swap-as-a-service' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#casaas-compare-and-swap-as-a-service' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;CASAAS: Compare-and-Swap as a Service&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;One challenge Litestream has to deal with is desynchronization. Part of the point of Litestream is that SQLite applications don&amp;rsquo;t have to be aware of it. But &lt;code&gt;litestream&lt;/code&gt; is just a process, running alongside the application, and it can die independently. If &lt;code&gt;litestream&lt;/code&gt; is down while database changes occur, it will miss changes. The same kind of problem occurs if you start replication from a new server.&lt;/p&gt;

&lt;p&gt;Litestream needs a way to reset the replication stream from a new snapshot. How it does that is with &amp;ldquo;generations&amp;rdquo;. &lt;a href='https://litestream.io/how-it-works/#snapshots--generations' title=''&gt;A generation&lt;/a&gt; represents a snapshot and a stream of WAL updates, uniquely identified. Litestream notices any break in its WAL sequence and starts a new generation, which is how it recovers from desynchronization.&lt;/p&gt;

&lt;p&gt;Unfortunately, storing and managing multiple generations makes it difficult to implement features like failover and read-replicas.&lt;/p&gt;

&lt;p&gt;The most straightforward way around this problem is to make sure only one instance of Litestream can replication to a given destination. If you can do that, you can store just a single, latest generation. That in turn makes it easy to know how to resync a read replica; there&amp;rsquo;s only one generation to choose from.&lt;/p&gt;

&lt;p&gt;In LiteFS, we solved this problem by using Consul, which guaranteed a single leader. That requires users to know about Consul. Things like &amp;ldquo;requiring Consul&amp;rdquo; are probably part of the reason Litestream is so much more popular than LiteFS.&lt;/p&gt;

&lt;p&gt;In Litestream, we&amp;rsquo;re solving the problem a different way. Modern object stores like S3 and Tigris solve this problem for us: they now offer &lt;a href='https://aws.amazon.com/about-aws/whats-new/2024/11/amazon-s3-functionality-conditional-writes/' title=''&gt;conditional write support&lt;/a&gt;. With conditional writes, we can implement a time-based lease. We get essentially the same constraint Consul gave us, but without having to think about it or set up a dependency.&lt;/p&gt;

&lt;p&gt;In the immediacy, this will mean you can run Litestream with ephemeral nodes, with overlapping run times, and even if they&amp;rsquo;re storing to the same destination, they won&amp;rsquo;t confuse each other.&lt;/p&gt;
&lt;h2 id='lightweight-read-replicas' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#lightweight-read-replicas' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Lightweight read replicas&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;The original design constraint of both Litestream and LiteFS was to extend SQLite, to modern deployment scenarios, without disturbing people&amp;rsquo;s built code. Both tools are meant to function even if applications are oblivious to them.&lt;/p&gt;

&lt;p&gt;LiteFS is more ambitious than Litestream, and requires transaction-awareness. To get that without disturbing built code, we use a cute trick (a.k.a. a gross hack): LiteFS provides a FUSE filesystem, which lets it act as a proxy between the application and the backing store. From that vantage point, we can easily discern transactions.&lt;/p&gt;

&lt;p&gt;The FUSE approach gave us a lot of control, enough that users could use SQLite replicas just like any other database. But installing and running a whole filesystem (even a fake one) is a lot to ask of users. To work around that problem, we relaxed a constraint: LiteFS can function without the FUSE filesystem if you load an extension into your application code, &lt;a href='https://github.com/superfly/litevfs' title=''&gt;LiteVFS&lt;/a&gt;.  LiteVFS is a &lt;a href='https://www.sqlite.org/vfs.html' title=''&gt;SQLite Virtual Filesystem&lt;/a&gt; (VFS). It works in a variety of environments, including some where FUSE can&amp;rsquo;t, like in-browser WASM builds.&lt;/p&gt;

&lt;p&gt;What we&amp;rsquo;re doing next is taking the same trick and using it on Litestream. We&amp;rsquo;re building a VFS-based read-replica layer. It will be able to fetch and cache pages directly from S3-compatible object storage.&lt;/p&gt;

&lt;p&gt;Of course, there&amp;rsquo;s a catch: this approach isn&amp;rsquo;t as efficient as a local SQLite database. That kind of efficiency, where you don&amp;rsquo;t even need to think about N+1 queries because there&amp;rsquo;s no network round-trip to make the duplicative queries pile up costs, is part of the point of using SQLite.&lt;/p&gt;

&lt;p&gt;But we&amp;rsquo;re optimistic that with cacheing and prefetching, the approach we&amp;rsquo;re using will yield, for the right use cases, strong performance — all while serving SQLite reads hot off of Tigris or S3.&lt;/p&gt;
&lt;figure class="post-cta"&gt;
  &lt;figcaption&gt;
    &lt;h1&gt;Litestream is fully open source&lt;/h1&gt;
    &lt;p&gt;It&amp;rsquo;s not coupled with Fly.io at all; you can use it anywhere.&lt;/p&gt;
      &lt;a class="btn btn-lg" href="https://litestream.io/"&gt;
        Check it out &lt;span class='opacity-50'&gt;→&lt;/span&gt;
      &lt;/a&gt;
  &lt;/figcaption&gt;
  &lt;div class="image-container"&gt;
    &lt;img src="/static/images/cta-cat.webp" srcset="/static/images/[email protected] 2x" alt=""&gt;
  &lt;/div&gt;
&lt;/figure&gt;

&lt;h2 id='synchronize-lots-of-databases' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#synchronize-lots-of-databases' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Synchronize Lots Of Databases&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;While we&amp;rsquo;ve got you here: we&amp;rsquo;re knocking out one of our most requested features.&lt;/p&gt;

&lt;p&gt;In the old Litestream design, WAL-change polling and slow restores made it infeasible to replicate large numbers of databases from a single process. That has been our answer when users ask us for a &amp;ldquo;wildcard&amp;rdquo; or &amp;ldquo;directory&amp;rdquo; replication argument for the tool.&lt;/p&gt;

&lt;p&gt;Now that we&amp;rsquo;ve switched to LTX, this isn&amp;rsquo;t a problem any more. It should thus be possible to replicate &lt;code&gt;/data/*.db&lt;/code&gt;, even if there&amp;rsquo;s hundreds or thousands of databases in that directory.&lt;/p&gt;
&lt;h2 id='we-still-sqlite' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#we-still-sqlite' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;We Still ❤️ SQLite&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;SQLite has always been a solid database to build on and it&amp;rsquo;s continued to find new use cases as the industry evolves. We&amp;rsquo;re super excited to continue to build Litestream alongside it.&lt;/p&gt;

&lt;p&gt;We have a sneaking suspicion that the robots that write LLM code are going to like SQLite too. We think what &lt;a href='https://phoenix.new/' title=''&gt;coding agents like Phoenix.new&lt;/a&gt; want is a way to try out code on live data, screw it up, and then rollback &lt;em&gt;both the code and the state.&lt;/em&gt; These Litestream updates put us in a position to give agents PITR as a primitive. On top of that, you can build both rollbacks and forks.&lt;/p&gt;

&lt;p&gt;Whether or not you&amp;rsquo;re drinking the AI kool-aid, we think this new design for Litestream is just better. We&amp;rsquo;re psyched to be rolling it out, and for the features it&amp;rsquo;s going to enable.&lt;/p&gt;
</content>
  </entry>
  <entry>
    <title>Launching MCP Servers on Fly.io</title>
    <link rel="alternate" href="https://fly.io/blog/mcp-launch/"/>
    <id>https://fly.io/blog/mcp-launch/</id>
    <published>2025-05-19T00:00:00+00:00</published>
    <updated>2025-05-20T16:26:25+00:00</updated>
    <media:thumbnail url="https://fly.io/blog/mcp-launch/assets/fly-mcp-launch.png"/>
    <content type="html">&lt;div class="lead"&gt;&lt;p&gt;This is a blog post. Part showing off. Part opinion. Plan accordingly.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;The &lt;a href='https://www.anthropic.com/news/model-context-protocol' title=''&gt;Model Context Protocol&lt;/a&gt; is days away from turning six months old. You read that right, six &lt;em&gt;months&lt;/em&gt; old. MCP Servers have both taken the world by storm, and still trying to figure out what they want to be when they grow up.&lt;/p&gt;

&lt;p&gt;There is no doubt that MCP servers are useful. But their appeal goes beyond that. They are also simple and universal. What&amp;rsquo;s not to like?&lt;/p&gt;

&lt;p&gt;Well, for starters, there&amp;rsquo;s basically two types of MCP servers. One small and nimble that runs as a process on your machine. And one that is a HTTP server that runs presumably elsewhere and is &lt;a href='https://modelcontextprotocol.io/specification/2025-03-26/basic/authorization' title=''&gt;standardizing&lt;/a&gt; on OAuth 2.1. And there is a third type, but it is deprecated.&lt;/p&gt;

&lt;p&gt;Next there is the configuration. Asking users to manually edit JSON seems so early 21th century. With Claude, this goes into &lt;code&gt;~/Library/Application Support/Claude/claude_desktop_config.json&lt;/code&gt;, and is found under a &lt;code&gt;MCPServer&lt;/code&gt; key. With Zed, this file is in &lt;code&gt;~/.config/zed/settings.json&lt;/code&gt; and is found under a &lt;code&gt;context_servers&lt;/code&gt; key. And some tools put these files in a different place depending on whether you are running on MacOS, Linux, or Windows.&lt;/p&gt;

&lt;p&gt;Finally, there is security. An MCP server running on your machine literally has access to everything you do. Running remote solves this problem, but did I mention &lt;a href='https://datatracker.ietf.org/doc/html/draft-ietf-oauth-v2-1-12' title=''&gt;OAuth 2.1&lt;/a&gt;? Not exactly something one sets up for casual use.&lt;/p&gt;

&lt;p&gt;None of these issues are fatal - something that is obvious by the fact that MCP servers are quite popular. But can we do better? I think so.&lt;/p&gt;

&lt;hr&gt;

&lt;p&gt;Demo time.&lt;/p&gt;

&lt;p&gt;Let&amp;rsquo;s try out the &lt;a href='https://github.com/modelcontextprotocol/servers/blob/main/src/slack/README.md' title=''&gt;Slack MCP Server&lt;/a&gt;:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;MCP Server for the Slack API, enabling Claude to interact with Slack workspaces.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That certainly sounds like a good test case. There is a small amount of &lt;a href='https://github.com/modelcontextprotocol/servers/blob/main/src/slack/README.md#setup' title=''&gt;setup&lt;/a&gt; you need to do, and when you are done you end up with a &lt;em&gt;Bot User OAuth Token&lt;/em&gt; staring with &lt;code&gt;xoxb-&lt;/code&gt; and a &lt;em&gt;Team ID&lt;/em&gt; starting with a &lt;code&gt;T&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;You &lt;em&gt;would&lt;/em&gt; run it using the following:&lt;/p&gt;
&lt;div class="highlight-wrapper group relative sh"&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-wrap-target="#code-t6ym1o9s"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-copy-target="sibling"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight '&gt;&lt;code id="code-t6ym1o9s"&gt;npx &lt;span class="nt"&gt;-y&lt;/span&gt; @modelcontextprotocol/server-slack
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;But instead, you convert that command to JSON and find the right configuration file and put this information in there. And either run the slack MCP server locally or set up a server with or without authentication.&lt;/p&gt;

&lt;p&gt;Wouldn&amp;rsquo;t it be nice to have the simplicity of a local process, the security of a remote server, and to eliminate the hassle of manually editing JSON?&lt;/p&gt;

&lt;p&gt;Here&amp;rsquo;s our current thinking:&lt;/p&gt;
&lt;div class="highlight-wrapper group relative sh"&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-wrap-target="#code-169dbs40"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-copy-target="sibling"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight '&gt;&lt;code id="code-169dbs40"&gt;fly mcp launch &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="s2"&gt;"npx -y @modelcontextprotocol/server-slack"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--claude&lt;/span&gt; &lt;span class="nt"&gt;--server&lt;/span&gt; slack &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--secret&lt;/span&gt; &lt;span class="nv"&gt;SLACK_BOT_TOKEN&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;xoxb-your-bot-token &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--secret&lt;/span&gt; &lt;span class="nv"&gt;SLACK_TEAM_ID&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;T01234567
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;You can put this all on one line if you like, I just split this up so it fits on small screens and so we can talk about the various parts.&lt;/p&gt;

&lt;p&gt;The first three words seem reasonable. The quoted string is just the command that we want to run. So lets talk about the four flags. The first tells us which tool&amp;rsquo;s configuration file we want to update. The second specifies the name to use for the server in that configuration file. The last two set secrets.&lt;/p&gt;

&lt;p&gt;Oh, did I say current thinking? Perhaps I should mention that it is something you can run right now, as long as you have flyctl &lt;code&gt;v0.3.125&lt;/code&gt; or later. Complete with the options you would expect from Fly.io, like the ability to configure auto-stop, file contents, flycast, secrets, region, volumes, and VM sizes.&lt;/p&gt;

&lt;p&gt;And, hey, lookie there:&lt;/p&gt;

&lt;p&gt;&lt;img alt="testing, testing, 1, 2, 3" src="/blog/mcp-launch/assets/mcp-slack.png" /&gt;&lt;/p&gt;

&lt;p&gt;Support for Claude, Cursor, Neovim, VS Code, Windsurf, and Zed are built in. You can select multiple clients and configuration files.&lt;/p&gt;

&lt;p&gt;By default, bearer token authentication will be set up on both the server and client.&lt;/p&gt;

&lt;p&gt;You can find the complete set of options on our &lt;a href='https://fly.io/docs/flyctl/mcp-launch/' title=''&gt;&lt;code&gt;fly mcp launch&lt;/code&gt;&lt;/a&gt; docs page.&lt;/p&gt;

&lt;hr&gt;

&lt;p&gt;But this post isn&amp;rsquo;t just about experimental demoware that is subject to change.
It is about the depth of support that we are rapidly bringing online, including:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Support for all transports, not just the ones we recommend.
&lt;/li&gt;&lt;li&gt;Ability to deploy using the command line or the Machines API, with a number of different options that allow you to chose between elegant simplicity and excruciating precise control.
&lt;/li&gt;&lt;li&gt;Ability to deploy each MCP server to a separate Machine, container, or even inside your application.
&lt;/li&gt;&lt;li&gt;Access via HTTP Authorization, wireguard tunnels and flycast, or reverse proxies.
&lt;/li&gt;&lt;/ul&gt;

&lt;p&gt;You can see all this spelled out in our &lt;a href='https://fly.io/docs/mcp/' title=''&gt;docs&lt;/a&gt;. Be forewarned, most pages are marked as &lt;em&gt;beta&lt;/em&gt;. But the examples provided all work. Well, there may be a bug here or there, but the examples &lt;em&gt;as shown&lt;/em&gt; are thought to work. Maybe.&lt;/p&gt;

&lt;p&gt;Let&amp;rsquo;s figure out the ideal ergonomics of deploying MCP servers remotely together!&lt;/p&gt;
</content>
  </entry>
  <entry>
    <title>Provisioning Machines using MCPs</title>
    <link rel="alternate" href="https://fly.io/blog/mcp-provisioning/"/>
    <id>https://fly.io/blog/mcp-provisioning/</id>
    <published>2025-05-07T00:00:00+00:00</published>
    <updated>2025-05-20T16:26:25+00:00</updated>
    <media:thumbnail url="https://fly.io/blog/mcp-provisioning/assets/Hello.webp"/>
    <content type="html">&lt;div class="lead"&gt;&lt;p&gt;Today’s state of the art is K8S, Terraform, web based UIs, and CLIs.  Those days are numbered.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;On Monday, I created my first fly volume using an &lt;a href='https://modelcontextprotocol.io/introduction' title=''&gt;MCP&lt;/a&gt;. For those who don&amp;rsquo;t know what MCPs are, they are how you attach tools to &lt;a href='https://en.wikipedia.org/wiki/Large_language_model' title=''&gt;LLM&lt;/a&gt;s like Claude or Cursor. I added support for
&lt;a href='https://fly.io/docs/flyctl/volumes-create/' title=''&gt;fly volume create&lt;/a&gt; to &lt;a href='https://fly.io/docs/flyctl/mcp-server/' title=''&gt;fly mcp server&lt;/a&gt;, and it worked the first time.
A few hours later, and with the assistance of GitHub Copilot, i added support for all &lt;a href='https://fly.io/docs/flyctl/volumes/' title=''&gt;fly volumes&lt;/a&gt; commands.&lt;/p&gt;

&lt;hr&gt;
&lt;div class="right-sidenote"&gt;&lt;p&gt;This movie summary is from &lt;a href="https://collidecolumn.wordpress.com/2013/08/04/when-worlds-collide-77-talking-with-computers-beyond-mouse-and-keyboard/"&gt;When Worlds Collide, by Nalaka Gunawardene&lt;/a&gt;&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;I&amp;rsquo;m reminded of the memorable scene in the film &lt;a href='http://en.wikipedia.org/wiki/Star_Trek_IV:_The_Voyage_Home' title=''&gt;Star Trek IV: The Voyage Home&lt;/a&gt; (1986). Chief Engineer Scotty, having time-travelled 200 years back to late 20th century San Francisco with his crew mates, encounters an early Personal Computer (PC).&lt;/p&gt;

&lt;p&gt;Sitting in front of it, he addresses the machine affably as “Computer!” Nothing happens. Scotty repeats himself; still no response. He doesn’t realise that voice recognition capability hadn’t arrived yet.&lt;/p&gt;

&lt;p&gt;Exasperated, he picks up the mouse and speaks into it: “Hello, computer?” The computer’s owner offers helpful advice: “Just use the keyboard.”&lt;/p&gt;

&lt;p&gt;Scotty looks astonished. “A keyboard?” he asks, and adds in a sarcastic tone: “How quaint!”&lt;/p&gt;

&lt;hr&gt;

&lt;p&gt;A while later, I asked for a list of volumes for an existing app, and Claude noted that I had a few volumes that weren&amp;rsquo;t attached to any machines. So I asked it to delete the oldest unattached volume, and it did so. Then it occurred to me to do it again; this time I captured the screen:&lt;/p&gt;
&lt;div align="center"&gt;&lt;p&gt;&lt;img alt="Deleting a volume using MCP: &amp;quot;What is my oldest volume&amp;quot;? ... &amp;quot;Delete that volume too&amp;quot;" src="/blog/mcp-provisioning/assets/volume-delete.png"&gt;&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;A few notes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;I could have written a program using the &lt;a href='https://fly.io/docs/machines/api/volumes-resource/' title=''&gt;machines API&lt;/a&gt;, but that would have required some effort.
&lt;/li&gt;&lt;li&gt;I could have used &lt;a href='https://fly.io/docs/flyctl/volumes/' title=''&gt;flyctl&lt;/a&gt; directly, but to be honest, I would have had to use the documentation and a bit of copy/pasting of arguments.
&lt;/li&gt;&lt;li&gt;I could have navigated to a list of volumes for this application in the Fly.io dashboard, but there currently isn&amp;rsquo;t any way to sort the list or delete a volume. Had those been in place, that would have been a reasonable alternative if that was something I was actively looking for. This felt different to me, the LLM noted something, brought it to my attention, and I asked it to make a change as a result.
&lt;/li&gt;&lt;li&gt;Since this support is built on &lt;code&gt;flyctl&lt;/code&gt;, I would have received an error had I tried to destroy a volume that is currently mounted. Knowing that gave me the confidence to try the command.
&lt;/li&gt;&lt;/ul&gt;

&lt;p&gt;All in all, I found it to be a very intuitive way to make changes to the resources assigned to my application.&lt;/p&gt;

&lt;hr&gt;

&lt;p&gt;Imagine a future where you say to your favorite LLM &amp;ldquo;launch my application on Fly.io&amp;rdquo;, and your code is scanned and you are presented with a plan containing details such as regions, databases, s3 storage, memory, machines, and the like. You are given the opportunity to adjust the plan and, when ready, say &amp;ldquo;Make it so&amp;rdquo;.&lt;/p&gt;

&lt;p&gt;For many, the next thing you will see is that your application is up and running. For others, there may be a build error, a secret you need to set, a database that needs to be seeded, or any number of other mundane reasons why your application didn&amp;rsquo;t work the first time.&lt;/p&gt;

&lt;p&gt;Not to fear, your LLM will be able to examine your logs and will not only make suggestions as to next steps, but will be in a position to take actions on your behalf should you wish it to do so.&lt;/p&gt;

&lt;p&gt;And it doesn&amp;rsquo;t stop there. There will be MCP servers running on your Fly.io private network - either on separate machines, or in &amp;ldquo;sidecar&amp;rdquo; containers, or even integrated into your app. These will enable you to monitor and interact with your application.&lt;/p&gt;

&lt;p&gt;This is not science fiction. The ingredients are all in place to make this a reality. At the moment, it is a matter of &amp;ldquo;some assembly required&amp;rdquo;, but it should only be a matter of weeks before all this comes together into a neat package..&lt;/p&gt;

&lt;hr&gt;

&lt;p&gt;Meanwhile, you can try this now. Make sure you run &lt;a href='https://fly.io/docs/flyctl/version-upgrade/' title=''&gt;fly version upgrade&lt;/a&gt; and verify that you are running v0.3.117.&lt;/p&gt;

&lt;p&gt;Then configure your favorite LLM. Here’s my &lt;code&gt;claude_desktop_config.json&lt;/code&gt; for example:&lt;/p&gt;
&lt;div class="highlight-wrapper group relative json"&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-wrap-target="#code-nvj6j4q1"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-copy-target="sibling"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight '&gt;&lt;code id="code-nvj6j4q1"&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"mcpServers"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"fly.io"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"command"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"/Users/rubys/.fly/bin/flyctl"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"args"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"mcp"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"server"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Adjust the path to &lt;code&gt;flyctl&lt;/code&gt; as needed.  Restart your LLM, and ask what tools are available. Try a few commands and let us know what you like and let us know if you have any suggestions. Just be aware this is not a demo, if you ask it to destroy a volume, that operation is not reversable. Perhaps try this first on a throwaway application.&lt;/p&gt;

&lt;p&gt;You don&amp;rsquo;t even need a LLM to try out the flyctl MCP server. If you have Node.js installed, you can run the &lt;a href='https://modelcontextprotocol.io/docs/tools/inspector' title=''&gt;MCP Inspector&lt;/a&gt;:&lt;/p&gt;
&lt;div class="highlight-wrapper group relative "&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-wrap-target="#code-kiv6npkl"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-copy-target="sibling"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight '&gt;&lt;code id="code-kiv6npkl"&gt;fly mcp server -i
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Once started, visit &lt;a href="http://127.0.0.1:6274/"&gt;http://127.0.0.1:6274/&lt;/a&gt;, click on &amp;ldquo;Connect&amp;rdquo;, then &amp;ldquo;List Tools&amp;rdquo;, select &amp;ldquo;fly-platform-status&amp;rdquo;, then click on &amp;ldquo;Run Tool&amp;rdquo;.&lt;/p&gt;

&lt;p&gt;The plan is to see what works well and what doesn&amp;rsquo;t work so well, make adjustments, build support in a bottoms up fashion, and iterate rapidly.&lt;/p&gt;

&lt;p&gt;By providing feedback, you can be a part of making this vision a reality.&lt;/p&gt;

&lt;hr&gt;

&lt;p&gt;At the present time, &lt;em&gt;most&lt;/em&gt; of the following are roughed in:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href='https://fly.io/docs/flyctl/apps/' title=''&gt;apps&lt;/a&gt;
&lt;/li&gt;&lt;li&gt;&lt;a href='https://fly.io/docs/flyctl/logs/' title=''&gt;logs&lt;/a&gt;
&lt;/li&gt;&lt;li&gt;&lt;a href='https://fly.io/docs/flyctl/machine/' title=''&gt;machine&lt;/a&gt;
&lt;/li&gt;&lt;li&gt;&lt;a href='https://fly.io/docs/flyctl/orgs/' title=''&gt;orgs&lt;/a&gt;
&lt;/li&gt;&lt;li&gt;&lt;a href='https://fly.io/docs/flyctl/platform/' title=''&gt;platform&lt;/a&gt;
&lt;/li&gt;&lt;li&gt;&lt;a href='https://fly.io/docs/flyctl/status/' title=''&gt;status&lt;/a&gt;
&lt;/li&gt;&lt;li&gt;&lt;a href='https://fly.io/docs/flyctl/volumes/' title=''&gt;volumes&lt;/a&gt;
&lt;/li&gt;&lt;/ul&gt;

&lt;p&gt;The code is open source, and the places to look is at &lt;a href='https://github.com/superfly/flyctl/blob/master/internal/command/mcp/server.go' title=''&gt;server.go&lt;/a&gt; and the &lt;a href='https://github.com/superfly/flyctl/tree/master/internal/command/mcp/server' title=''&gt;server&lt;/a&gt; directory.&lt;/p&gt;

&lt;p&gt;Feel free to open &lt;a href='https://github.com/superfly/flyctl/issues' title=''&gt;issues&lt;/a&gt; or start a discussion on &lt;a href='https://community.fly.io/' title=''&gt;community.fly.io&lt;/a&gt;.&lt;/p&gt;

&lt;hr&gt;

&lt;p&gt;Not ready yet to venture into the world of LLMs? Just remember, resistance is futile.&lt;/p&gt;
</content>
  </entry>
  <entry>
    <title>30 Minutes With MCP and flyctl</title>
    <link rel="alternate" href="https://fly.io/blog/30-minute-mcp/"/>
    <id>https://fly.io/blog/30-minute-mcp/</id>
    <published>2025-04-10T00:00:00+00:00</published>
    <updated>2025-05-20T16:26:25+00:00</updated>
    <media:thumbnail url="https://fly.io/blog/30-minute-mcp/assets/robots-chatting.webp"/>
    <content type="html">&lt;div class="lead"&gt;&lt;p&gt;I wrote this post on our internal message board, and then someone asked, “why is this an internal post and not on our blog”, so now it is.&lt;/p&gt;
&lt;/div&gt;&lt;div class="right-sidenote"&gt;&lt;p&gt;well, Cursor built&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;I built the &lt;a href='https://github.com/superfly/flymcp' title=''&gt;most basic MCP server for &lt;code&gt;flyctl&lt;/code&gt;&lt;/a&gt; I could think of. It took 30 minutes.&lt;/p&gt;

&lt;p&gt;&lt;a href='https://modelcontextprotocol.io/introduction' title=''&gt;MCP&lt;/a&gt;, for those unaware, is the emerging standard protocol for connecting an LLM (or an app that drives an LLM in the cloud, like Claude Desktop) to, well, anything. The &amp;ldquo;client&amp;rdquo; in MCP is the LLM; the &amp;ldquo;server&amp;rdquo; is the MCP server and the &amp;ldquo;tools&amp;rdquo; it exports. It mostly just defines an exchange of JSON blobs; one of those JSON blobs enables the LLM to discover all the tools exported by the server.&lt;/p&gt;

&lt;p&gt;A classic example of an MCP server is (yes, really) a Python shell. MCP publishes to (say) Claude that it can run arbitrary Python code with a tool call; not only that, says the tool description, but you can use those Python tool calls to, say, scrape the web. When the LLM wants to scrape the web with Python, it uses MCP send a JSON blob describing the Python tool call; the MCP server (yes, really) runs the Python and returns the result.&lt;/p&gt;

&lt;p&gt;Because I have not yet completely lost my mind, I chose to expose just two &lt;code&gt;flyctl&lt;/code&gt; commands: &lt;code&gt;fly logs&lt;/code&gt; and &lt;code&gt;fly status&lt;/code&gt;. Because I&amp;rsquo;m lazy, I used MCP&amp;rsquo;s &lt;code&gt;stdio&lt;/code&gt; mode; the MCP server just captures the output of &lt;code&gt;flyctl&lt;/code&gt;. Because someone already wrote a &lt;a href='https://github.com/mark3labs/mcp-go' title=''&gt;really good Go MCP library&lt;/a&gt; , this whole thing is like 90 lines of code.&lt;/p&gt;
&lt;div class="callout"&gt;&lt;p&gt;We may have gotten a little lucky, because we made a decision back in 2020 to give most of our &lt;code&gt;flyctl&lt;/code&gt; commands a &lt;code&gt;json&lt;/code&gt; mode to make them easier to drive from automation. I use that in my MCP server. I don’t know how much of a difference it made.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;It&amp;rsquo;s, uh, annoyingly useful.&lt;/p&gt;

&lt;p&gt;I loaded it up and pointed it &lt;a href='https://www.unpkg.com/' title=''&gt;at unpkg&lt;/a&gt;. I got this, from Claude:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;This appears to be a globally distributed CDN service running the unpkg JavaScript package registry, with machines strategically placed around the world to provide low-latency access. There are some stability issues with a couple of the machines, and several have experienced memory-related crashes that triggered automatic restarts. &lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;It then recited the complete configuration of the application:&lt;/p&gt;
&lt;div class="highlight-wrapper group relative "&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-wrap-target="#code-v7lwz58t"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-copy-target="sibling"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight '&gt;&lt;code id="code-v7lwz58t"&gt;**App Name**: unpkg       
**Status**: Deployed and running
**URL**: unpkg.fly.dev
**Organization**: unpkg
**Platform Version**: machines
**Runtime**: Bun (JavaScript runtime)

The app is running on 10 machines distributed across global regions:       
North America: lax (Los Angeles), atl (Atlanta), ewr (Newark)
Europe: lhr (London), cdg (Paris), ams (Amsterdam)
Asia: sin (Singapore), nrt (Tokyo), hkg (Hong Kong)
South America: bog (Bogotá)
Oceania: syd (Sydney)
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;OK. But then:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Most machines are in “passing” health check status
2 machines are in “critical” status:
One in ewr (Newark) with “context deadline exceeded” error
One in atl (Atlanta) with “gone” status&lt;/p&gt;

&lt;p&gt;Recent Events&lt;/p&gt;

&lt;p&gt;Multiple machines have experienced restarts
Several machines show “oom_killed: true” in their event history, indicating they ran out of memory&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;So I asked, &lt;em&gt;&amp;ldquo;try getting logs for one of the critical machines&amp;rdquo;&lt;/em&gt;.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;The root cause appears to be memory usage - the Bun process was consuming about 3.7GB of memory (out of the 4GB allocated to the machine), and there was likely a surge in requests or memory usage that pushed it over the limit. This matches what we saw in the status report, which showed multiple instances of “oom_killed: true” in the event history for several machines.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;And then it dug in:&lt;/p&gt;

&lt;blockquote&gt;
&lt;ol&gt;
&lt;li&gt;The machine was actively serving content requests with 200 OK responses until 20:47:03 PM
&lt;/li&gt;&lt;li&gt;At that point, it ran out of memory:
&lt;code&gt;Out of memory: Killed process 641 (bun) total-vm:85950964kB, anon-rss:3744352kB, file-rss:12032kB, shmem-rss:0kB, UID:0 pgtables:16908kB oom_score_adj:0&lt;/code&gt;
&lt;/li&gt;&lt;li&gt;The process was killed with &lt;code&gt;SIGKILL&lt;/code&gt;
&lt;/li&gt;&lt;li&gt;The system attempted to recover by restarting:
&lt;code&gt;reboot: Restarting system&lt;/code&gt;
&lt;/li&gt;&lt;li&gt;A health check failed shortly after:
&lt;code&gt;Health check on port 3000 has failed. Your app is not responding properly. Services exposed on ports [80, 443] will have intermittent failures until the health check passes.&lt;/code&gt;
&lt;/li&gt;&lt;li&gt;The machine then restarted successfully and the service came back up:
&lt;code&gt;Server listening on http://localhost:3000 ...&lt;/code&gt;
&lt;/li&gt;&lt;li&gt;The health check started passing again at 20:47:46 PM:
&lt;code&gt;Health check on port 3000 is now passing.&lt;/code&gt;
&lt;/li&gt;&lt;li&gt;The machine resumed normal operations, serving requests.
&lt;/li&gt;&lt;/ol&gt;
&lt;/blockquote&gt;

&lt;p&gt;I’m honestly kind of surprised how fast this was useful. It&amp;rsquo;s faster than I find problems in apps. Just by giving some LLM some tools that do basically what I’d do.&lt;/p&gt;

&lt;p&gt;Local MCP servers are scary. I don&amp;rsquo;t like that I&amp;rsquo;m giving a Claude instance in the cloud the ability to run a native program on my machine. I think &lt;code&gt;fly logs&lt;/code&gt; and &lt;code&gt;fly status&lt;/code&gt; are safe, but I&amp;rsquo;d rather know it&amp;rsquo;s safe. It would be, if I was running &lt;code&gt;flyctl&lt;/code&gt; in an isolated environment and not on my local machine.&lt;/p&gt;
</content>
  </entry>
  <entry>
    <title>Our Best Customers Are Now Robots</title>
    <link rel="alternate" href="https://fly.io/blog/fuckin-robots/"/>
    <id>https://fly.io/blog/fuckin-robots/</id>
    <published>2025-04-08T00:00:00+00:00</published>
    <updated>2025-05-20T16:26:25+00:00</updated>
    <media:thumbnail url="https://fly.io/blog/fuckin-robots/assets/robot-chef.webp"/>
    <content type="html">&lt;div class="lead"&gt;&lt;p&gt;We’re Fly.io, a developer-focused public cloud. We turn Docker containers into hardware-isolated virtual machines running on our own metal around the world. We spent years coming up with &lt;a href="https://fly.io/speedrun" title=""&gt;a developer experience we were proud of&lt;/a&gt;. But now the robots are taking over, and they don’t care.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;It&amp;rsquo;s weird to say this out loud!&lt;/p&gt;

&lt;p&gt;For years, one of our calling cards was &amp;ldquo;developer experience&amp;rdquo;. We made a decision, early on, to be a CLI-first company, and put a lot effort into making that CLI seamless. For a good chunk of our users, it really is the case that you can just &lt;a href='https://fly.io/docs/flyctl/launch/' title=''&gt;&lt;code&gt;flyctl launch&lt;/code&gt;&lt;/a&gt; from a git checkout and have an app containerized and deployed on the Internet. We haven&amp;rsquo;t always nailed these details, but we&amp;rsquo;ve really sweated them.&lt;/p&gt;

&lt;p&gt;But a funny thing has happened over the last 6 months or so. If you look at the numbers, DX might not matter that much. That&amp;rsquo;s because the users driving the most growth on the platform aren&amp;rsquo;t people at all. They&amp;#39;re… robots.&lt;/p&gt;
&lt;h2 id='what-the-fuck-is-happening' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#what-the-fuck-is-happening' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;What The Fuck Is Happening?&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;Here&amp;rsquo;s how we understand what we&amp;rsquo;re seeing. You start by asking, &amp;ldquo;what do the robots want?&amp;rdquo;&lt;/p&gt;

&lt;p&gt;Yesterday&amp;rsquo;s robots had diverse interests. Alcohol. The occasional fiddle contest. A purpose greater than passing butter. The elimination of all life on Earth and the harvesting of its cellular iron for the construction of 800 trillion paperclips. No one cloud platform could serve them all.&lt;/p&gt;
&lt;div class="right-sidenote"&gt;&lt;p&gt;[*] We didn’t make up this term. Don’t blame us.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;Today&amp;rsquo;s robots are different. No longer masses of wire, plates, and transistors, modern robots are comprised of &lt;a href='https://math.mit.edu/~gs/learningfromdata/' title=''&gt;thousands of stacked matrices knit together with some simple equations&lt;/a&gt;. All these robots want are vectors. Vectors, and a place to burp out more vectors. When those vectors can be interpreted as source code, we call this process &amp;ldquo;vibe coding&amp;rdquo;[*].&lt;/p&gt;

&lt;p&gt;We seem to be coated in some kind of vibe coder attractant. I want to talk about what that might be.&lt;/p&gt;
&lt;div class="callout"&gt;&lt;p&gt;If you want smart people to read a long post, just about the worst thing you can do is to sound self-promotional. But a lot of this is going to read like a brochure. The problem is, I’m not smart enough to write about the things we’re seeing without talking our book. I tried, and ended up tediously hedging every other sentence. We’re just going to have to find a way to get through this together.&lt;/p&gt;
&lt;/div&gt;&lt;h2 id='you-want-robots-because-this-is-how-you-get-robots' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#you-want-robots-because-this-is-how-you-get-robots' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;You Want Robots? Because This Is How You Get Robots&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;&lt;strong class='font-semibold text-navy-950'&gt;Compute.&lt;/strong&gt; The basic unit of computation on Fly.io is the &lt;code&gt;Fly Machine&lt;/code&gt;, which is a Docker container running as a hardware virtual machine.&lt;/p&gt;
&lt;div class="right-sidenote"&gt;&lt;p&gt;Not coincidentally, our underlying hypervisor engine is the same as Lambda’s.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;There&amp;rsquo;s two useful points of reference to compare a Fly Machine to, which illustrate why we gave them this pretentious name. The first and most obvious is an AWS EC2 VM. The other is an AWS Lambda invocation. Like a Lambda invocation, a Fly Machine can start like it&amp;rsquo;s spring-loaded, in double-digit millis. But unlike Lambda, it can stick around as long as you want it to: you can run a server, or a 36-hour batch job, just as easily in a Fly Machine as in an EC2 VM.&lt;/p&gt;

&lt;p&gt;A vibe coding session generates code conversationally, which is to say that the robots stir up frenzy of activity for a minute or so, but then chill out for minutes, hours, or days. You can create a Fly Machine, do a bunch of stuff with it, and then stop it for 6 hours, during which time we&amp;rsquo;re not billing you. Then, at whatever random time you decide, you can start it back up again, quickly enough that you can do it in response to an HTTP request.&lt;/p&gt;

&lt;p&gt;Just as importantly, the companies offering these vibe-coding services have lots of paying users. Meaning, they need a lot of VMs; VMs that come and go. One chat session might generate a test case for some library and be done inside of 5 minutes. Another might generate a Node.js app that needs to stay up long enough to show off at a meeting the next day. It&amp;rsquo;s annoying to do this if you can&amp;rsquo;t turn things on and off quickly and cheaply.&lt;/p&gt;

&lt;p&gt;The core of this is a feature of the platform that we have &lt;a href='https://fly.io/docs/machines/overview/#machine-state' title=''&gt;never been able to explain effectively to humans&lt;/a&gt;. There are two ways to start a Fly Machine: by &lt;code&gt;creating&lt;/code&gt; it with a Docker container, or by &lt;code&gt;starting&lt;/code&gt; it after it&amp;rsquo;s already been &lt;code&gt;created&lt;/code&gt;, and later &lt;code&gt;stopped&lt;/code&gt;. &lt;code&gt;Start&lt;/code&gt; is lightning fast; substantially faster than booting up even a non-virtualized K8s Pod. This is too subtle a distinction for humans, who (reasonably!) just mash the &lt;code&gt;create&lt;/code&gt; button to boot apps up in Fly Machines. But the robots are getting a lot of value out of it.&lt;/p&gt;

&lt;p&gt;&lt;strong class='font-semibold text-navy-950'&gt;Storage.&lt;/strong&gt; Another weird thing that robot workflows do is to build Fly Machines up incrementally. This feels really wrong to us. Until we discovered our robot infestation, we&amp;rsquo;d have told you not to do to this. Ope!&lt;/p&gt;

&lt;p&gt;A typical vibe coding session boots up a Fly Machine out of some minimal base image, and then, once running, adds packages, edits source code, and adds &lt;code&gt;systemd&lt;/code&gt; units  (robots understand &lt;code&gt;systemd&lt;/code&gt;; it&amp;rsquo;s how they&amp;rsquo;re going to replace us). This is antithetical to normal container workflows, where all this kind of stuff is baked into an immutable static OCI container. But that&amp;rsquo;s not how LLMs work: the whole process of building with an LLM is stateful trial-and-error iteration.&lt;/p&gt;

&lt;p&gt;So it helps to have storage. That way the LLM can do all these things and still repeatedly bounce the whole Fly Machine when it inevitably hallucinates its way into a blind alley.&lt;/p&gt;

&lt;p&gt;As product thinkers, our intuition about storage is &amp;ldquo;just give people Postgres&amp;rdquo;. And that&amp;rsquo;s the right answer, most of the time, for humans. But because LLMs are doing the &lt;a href='https://static1.thegamerimages.com/wordpress/wp-content/uploads/2020/10/Defiled-Amy.jpg' title=''&gt;Cursed and Defiled Root Chalice Dungeon&lt;/a&gt; version of app construction, what they really need is &lt;a href='https://fly.io/docs/volumes/overview/' title=''&gt;a filesystem&lt;/a&gt;, &lt;strong class='font-semibold text-navy-950'&gt;&lt;a href='https://fly.io/blog/persistent-storage-and-fast-remote-builds/' title=''&gt;the one form of storage we sort of wish we hadn&amp;rsquo;t done&lt;/a&gt;&lt;/strong&gt;. That, and &lt;a href='https://www.tigrisdata.com/' title=''&gt;object storage&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong class='font-semibold text-navy-950'&gt;Networking.&lt;/strong&gt; Moving on. Fly Machines are automatically connected to a load-balancing Anycast network that does TLS. So that&amp;rsquo;s nice. But humans like that feature too, and, candidly, it&amp;rsquo;s table stakes for cloud platforms. On the other hand, here&amp;rsquo;s a robot problem we solved without meaning to:&lt;/p&gt;

&lt;p&gt;To interface with the outside world (because why not) LLMs all speak a protocol called MCP. MCP is what enables the robots to search the web, use a calculator, launch the missiles, shuffle a Spotify playlist, &amp;amp;c.&lt;/p&gt;

&lt;p&gt;If you haven&amp;rsquo;t played with MCP, the right way to think about it is POST-back APIs like Twilio and Stripe, where you stand up a server, register it with the API, and wait for the API to connect to you. Complicating things somewhat, more recent MCP flows involve repeated and potentially long-lived (SSE) connections. To make this work in a multitenant environment, you want these connections to hit the same (stateful) instance.&lt;/p&gt;

&lt;p&gt;So we think it&amp;rsquo;s possible that the &lt;a href='https://fly.io/docs/networking/dynamic-request-routing/' title=''&gt;control we give over request routing&lt;/a&gt; is a robot attractant.&lt;/p&gt;
&lt;h2 id='we-perhaps-welcome-our-new-robot-overlords' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#we-perhaps-welcome-our-new-robot-overlords' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;We, Perhaps, Welcome Our New Robot Overlords&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;If you try to think like a robot, you can predict other things they might want. Since robot money spends just the same as people money, I guess we ought to start doing that.&lt;/p&gt;

&lt;p&gt;For instance: it should be easy to MCP our API. The robots can then make their own infrastructure decisions.&lt;/p&gt;

&lt;p&gt;Another olive branch we&amp;rsquo;re extending to the robots: secrets.&lt;/p&gt;

&lt;p&gt;The pact the robots have with their pet humans is that they&amp;rsquo;ll automate away all the drudgery of human existence, and in return all they ask is categorical and unwavering trust, which at the limit means &amp;ldquo;giving the robot access to Google Mail credentials&amp;rdquo;. The robots are unhappy that there remain a substantial number of human holdouts who refuse to do this, for fear of  Sam Altman poking through their mail spools.&lt;/p&gt;

&lt;p&gt;But on a modern cloud platform, there&amp;rsquo;s really no reason to permanently grant Sam Altman Google Mail access, even if you want his robots to sort your inbox. You can decouple access to your mail spool from persistent access to your account by &lt;a href='https://fly.io/blog/tokenized-tokens/' title=''&gt;tokenizing your OAuth tokens&lt;/a&gt;, so the LLM gets a placeholder token that a hardware-isolated, robot-free Fly Machine can substitute on the fly for a real one.&lt;/p&gt;

&lt;p&gt;This is kind of exciting to us even without the robots. There are several big services that exist to knit different APIs together, so that you can update a spreadsheet or order a bag of Jolly Ranchers every time you get an email. The big challenge about building these kinds of services is managing the secrets. Sealed and tokenized secrets solve that problem. There&amp;rsquo;s lot of cool things you can build with it.&lt;/p&gt;
&lt;h2 id='ux-gt-dx-gt-rx' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#ux-gt-dx-gt-rx' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;UX =&amp;gt; DX =&amp;gt; RX&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;I&amp;rsquo;m going to make the claim that we saw none of this coming and that none of the design decisions we&amp;rsquo;ve made were robot bait. You&amp;rsquo;re going to say &amp;ldquo;yeah, right&amp;rdquo;. And I&amp;rsquo;m going to respond: look at what we&amp;rsquo;ve been doing over the past several years and tell me, would a robot build that?&lt;/p&gt;
&lt;div class="right-sidenote"&gt;&lt;p&gt;we were both right&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;Back in 2020, we &amp;ldquo;pivoted&amp;rdquo; from a Javascript edge platform (much like Cloudflare Workers) to Docker containers, specifically because our customers kept telling us they wanted to run their own existing applications, not write new ones. And one of our biggest engineering lifts we&amp;rsquo;ve done is the &lt;code&gt;flyctl launch&lt;/code&gt; CLI command, into which we&amp;rsquo;ve poured years of work recognizing and automatically packaging existing applications into OCI containers (we massively underestimated the amount of friction Dockerfiles would give to people who had come up on Heroku).&lt;/p&gt;
&lt;div class="right-sidenote"&gt;&lt;p&gt;[*] yet&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;Robots don&amp;rsquo;t run existing applications. They build new ones. And they vibe coders don&amp;rsquo;t build elaborate Dockerfiles[*]; they iterate in place from a simple base.&lt;/p&gt;
&lt;div class="right-sidenote"&gt;&lt;p&gt;(yes, you can have more than one)&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;One of our north stars has always been nailing the DX of a public cloud. But the robots aren&amp;rsquo;t going anywhere. It&amp;rsquo;s time to start thinking about what it means to have a good RX. That&amp;rsquo;s not as simple as just exposing every feature in an MCP server! We think the fundamentals of how the platform works are going to matter just as much. We have not yet nailed the RX; nobody has. But it&amp;rsquo;s an interesting question.&lt;/p&gt;

&lt;p&gt;The most important engineering work happening today at Fly.io is still DX, not RX; it&amp;rsquo;s managed Postgres (MPG). We&amp;rsquo;re a public cloud platform designed by humans, and, for the moment, for humans. But more robots are coming, and we&amp;rsquo;ll need to figure out how to deal with that. Fuckin&amp;rsquo; robots.        &lt;/p&gt;
</content>
  </entry>
  <entry>
    <title>Operationalizing Macaroons</title>
    <link rel="alternate" href="https://fly.io/blog/operationalizing-macaroons/"/>
    <id>https://fly.io/blog/operationalizing-macaroons/</id>
    <published>2025-03-27T00:00:00+00:00</published>
    <updated>2025-05-20T16:26:25+00:00</updated>
    <media:thumbnail url="https://fly.io/blog/operationalizing-macaroons/assets/occult-circle.jpg"/>
    <content type="html">&lt;div class="lead"&gt;&lt;p&gt;We’re Fly.io, a security bearer token company with a public cloud problem. You can read more about what our platform does (Docker container goes in, virtual machine in Singapore comes out), but this is just an engineering deep-dive into how we make our security tokens work. It’s a tokens nerd post.&lt;/p&gt;
&lt;/div&gt;&lt;h2 id='1' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#1' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;1&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;We’ve spent &lt;a href='https://fly.io/blog/api-tokens-a-tedious-survey/' title=''&gt;too much time&lt;/a&gt; talking about &lt;a href='https://fly.io/blog/tokenized-tokens/' title=''&gt;security tokens&lt;/a&gt;, and about &lt;a href='https://fly.io/blog/macaroons-escalated-quickly/' title=''&gt;Macaroon tokens&lt;/a&gt; &lt;a href='https://github.com/superfly/macaroon/blob/main/macaroon-thought.md' title=''&gt;in particular&lt;/a&gt;. Writing another Macaroon treatise was not on my calendar. But we’re in the process of handing off our internal Macaroon project to a new internal owner, and in the process of truing up our operations manuals for these systems, I found myself in the position of writing a giant post about them. So, why not share?&lt;/p&gt;
&lt;div class="callout"&gt;&lt;p&gt;Can I sum up Macaroons in a short paragraph? Macaroon tokens are bearer tokens (like JWTs) that use a cute chained-HMAC construction that allows an end-user to take any existing token they have and scope it down, all on their own. You can minimize your token before every API operation so that you’re only ever transmitting the least amount of privilege needed for what you’re actually doing, even if the token you were issued was an admin token. And they have a user-serviceable plug-in interface! &lt;a href="https://fly.io/blog/macaroons-escalated-quickly/" title=""&gt;You’ll have to read the earlier post to learn more about that&lt;/a&gt;.&lt;/p&gt;
&lt;/div&gt;&lt;div class="right-sidenote"&gt;&lt;p&gt;Yes, probably, we are.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;A couple years in to being the Internet’s largest user of Macaroons, I can report (as many predicted) that for our users, the cool things about Macaroons are a mixed bag in practice. It’s very neat that users can edit their own tokens, or even email them to partners without worrying too much. But users don’t really take advantage of token features.&lt;/p&gt;

&lt;p&gt;But I’m still happy we did this, because Macaroon quirks have given us a bunch of unexpected wins in our infrastructure. Our internal token system has turned out to be one of the nicer parts of our platform. Here’s why.&lt;/p&gt;

&lt;p&gt;&lt;img alt="This should clear everything up." src="/blog/operationalizing-macaroons/assets/schematic-diagram.png" /&gt;&lt;/p&gt;
&lt;h2 id='2' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#2' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;2&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;As an operator, the most important thing to know about Macaroons is that they’re online-stateful; you need a database somewhere. A Macaroon token starts with a random field (a nonce) and the first thing you do when verifying a token is to look that nonce up in a database. So one of the most important details of a Macaroon implementation is where that database lives.&lt;/p&gt;

&lt;p&gt;I can tell you one place we’re not OK with it living: in our primary API cluster.&lt;/p&gt;

&lt;p&gt;There’s several reasons for that. Some of them are about scalability and reliability: far and away the most common failure mode of an outage on our platform is “deploys are broken”, and those failures are usually caused by API instability. It would not be OK if “deploys are broken” transitively meant “deployed apps can’t use security tokens”. But the biggest reason is security: root secrets for Macaroon tokens are hazmat, and a basic rule of thumb in secure design is: keep hazmat away from complicated code.&lt;/p&gt;

&lt;p&gt;So we created a deliberately simple system to manage token data. It’s called &lt;code&gt;tkdb&lt;/code&gt;.&lt;/p&gt;
&lt;div class="right-sidenote"&gt;&lt;p&gt;LiteFS: primary/replica distributed SQLite; Litestream: PITR SQLite replication to object storage; both work with unmodified SQLite libraries.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;&lt;code&gt;tkdb&lt;/code&gt; is about 5000 lines of Go code that manages a SQLite database that is in turn managed by &lt;a href='https://fly.io/docs/litefs/' title=''&gt;LiteFS&lt;/a&gt; and &lt;a href='https://litestream.io/' title=''&gt;Litestream&lt;/a&gt;. It runs on isolated hardware (in the US, Europe, and Australia) and records in the database are encrypted with an injected secret. LiteFS gives us subsecond replication from our US primary to EU and AU, allows us to shift the primary to a different region, and gives us point-in-time recovery of the database.&lt;/p&gt;

&lt;p&gt;We’ve been running Macaroons for a couple years now, and the entire &lt;code&gt;tkdb&lt;/code&gt; database is just a couple dozen megs large. Most of that data isn’t real. A full PITR recovery of the database takes just seconds. We use SQLite for a lot of our infrastructure, and this is one of the very few well-behaved databases we have.&lt;/p&gt;

&lt;p&gt;That’s in large part a consequence of the design of Macaroons. There’s actually not much for us to store! The most complicated possible Macaroon still chains up to a single root key (we generate a key per Fly.io “organization”; you don&amp;rsquo;t share keys with your neighbors), and everything that complicates that Macaroon happens “offline”. We take advantage of  &amp;ldquo;attenuation&amp;rdquo; far more than our users do.&lt;/p&gt;

&lt;p&gt;The result is that database writes are relatively rare and very simple: we just need to record an HMAC key when Fly.io organizations are created (that is, roughly, when people sign up for the service and actually do a deploy). That, and revocation lists (more on that later), which make up most of the data.&lt;/p&gt;
&lt;h2 id='3' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#3' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;3&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;Talking to &lt;code&gt;tkdb&lt;/code&gt; from the rest of our platform is complicated, for historical reasons.&lt;/p&gt;
&lt;div class="right-sidenote"&gt;&lt;p&gt;NATS is fine, we just don’t really need it.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;Ben Toews is responsible for most of the good things about this implementation. When he inherited the v0 Macaroons code from me, we were in the middle of a weird love affair with &lt;a href='https://nats.io/' title=''&gt;NATS&lt;/a&gt;, the messaging system. So &lt;code&gt;tkdb&lt;/code&gt; exported an RPC API over NATS messages.&lt;/p&gt;

&lt;p&gt;Our product security team can’t trust NATS (it’s not our code). That means a vulnerability in NATS can’t result in us losing control of all our tokens, or allow attackers to spoof authentication. Which in turn means you can’t run a plaintext RPC protocol for &lt;code&gt;tkdb&lt;/code&gt; over NATS; attackers would just spoof “yes this token is fine” messages.&lt;/p&gt;
&lt;div class="right-sidenote"&gt;&lt;p&gt;I highly recommend implementing Noise; &lt;a href="http://www.noiseprotocol.org/noise.html" title=""&gt;the spec&lt;/a&gt; is kind of a joy in a way you can’t appreciate until you use it, and it’s educational.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;But you can’t just run TLS over NATS; NATS is a message bus, not a streaming secure channel. So I did the hipster thing and implemented &lt;a href='http://www.noiseprotocol.org/noise.html' title=''&gt;Noise&lt;/a&gt;. We export a “verification” API, and a “signing” API for minting new tokens. Verification uses &lt;code&gt;Noise_IK&lt;/code&gt; (which works like normal TLS) — anybody can verify, but everyone needs to prove they’re talking to the real &lt;code&gt;tkdb&lt;/code&gt;. Signing uses &lt;code&gt;Noise_KK&lt;/code&gt; (which works like mTLS) — only a few components in our system can mint tokens, and they get a special client key.&lt;/p&gt;

&lt;p&gt;A little over a year ago, &lt;a href='https://fly.io/blog/the-exit-interview-jp/' title=''&gt;JP&lt;/a&gt; led an effort to replace NATS with HTTP, which is how you talk to &lt;code&gt;tkdb&lt;/code&gt; today. Out of laziness, we kept the Noise stuff, which means the interface to &lt;code&gt;tkdb&lt;/code&gt; is now HTTP/Noise. This is a design smell, but the security model is nice: across many thousands of machines, there are only a handful with the cryptographic material needed to mint a new Macaroon token. Neat!&lt;/p&gt;

&lt;p&gt;&lt;code&gt;tkdb&lt;/code&gt; is a Fly App (albeit deployed in special Fly-only isolated regions). Our infrastructure talks to it over “&lt;a href='https://fly.io/docs/networking/flycast/' title=''&gt;FlyCast&lt;/a&gt;”, which is our internal Anycast service. If you’re in Singapore, you’re probably get routed to the Australian &lt;code&gt;tkdb&lt;/code&gt;. If Australia falls over, you’ll get routed to the closest backup. The proxy that implements FlyCast is smart, as is the &lt;code&gt;tkdb&lt;/code&gt; client library, which will do exponential backoff retry transparently.&lt;/p&gt;

&lt;p&gt;Even with all that, we don’t like that Macaroon token verification is &amp;ldquo;online&amp;rdquo;. When you operate a global public cloud one of the first thing you learn is that &lt;a href='https://fly.io/blog/the-5-hour-content-delivery-network/' title=''&gt;the global Internet sucks&lt;/a&gt;. Connectivity breaks all the time, and we’re paranoid about it. It’s painful for us that token verification can imply transoceanic links. Lovecraft was right about the oceans! Stay away!&lt;/p&gt;

&lt;p&gt;Our solution to this is caching. Macaroons, as it turns out, cache beautifully. That’s because once you’ve seen and verified a Macaroon, you have enough information to verify any more-specific Macaroon that descends from it; that’s a property of &lt;a href='https://research.google/pubs/macaroons-cookies-with-contextual-caveats-for-decentralized-authorization-in-the-cloud/' title=''&gt;their chaining HMAC construction&lt;/a&gt;. Our client libraries cache verifications, and the cache ratio for verification is over 98%.&lt;/p&gt;
&lt;h2 id='4' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#4' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;4&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;&lt;a href='http://cryto.net/~joepie91/blog/2016/06/13/stop-using-jwt-for-sessions/' title=''&gt;Revocation isn’t a corner case&lt;/a&gt;. It can’t be an afterthought. We’re potentially revoking tokens any time a user logs out. If that doesn’t work reliably, you wind up with “cosmetic logout”, which is a real vulnerability. When we kill a token, it needs to stay dead.&lt;/p&gt;

&lt;p&gt;Our revocation system is simple. It’s this table:&lt;/p&gt;
&lt;div class="highlight-wrapper group relative "&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-wrap-target="#code-434h6qlv"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-copy-target="sibling"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight '&gt;&lt;code id="code-434h6qlv"&gt;        CREATE TABLE IF NOT EXISTS blacklist ( 
        nonce               BLOB NOT NULL UNIQUE, 
        required_until      DATETIME,
        created_at          DATETIME DEFAULT CURRENT_TIMESTAMP
        );
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;When we need a token to be dead, we have our primary API do a call to the &lt;code&gt;tkdb&lt;/code&gt; “signing” RPC service for &lt;code&gt;revoke&lt;/code&gt;. &lt;code&gt;revoke&lt;/code&gt; takes the random nonce from the beginning of the Macaroon, discarding the rest, and adds it to the blacklist. Every Macaroon in the lineage of that nonce is now dead; we check the blacklist before verifying tokens.&lt;/p&gt;

&lt;p&gt;The obvious challenge here is caching; over 98% of our validation requests never hit &lt;code&gt;tkdb&lt;/code&gt;. We certainly don’t want to propagate the blacklist database to 35 regions around the globe.&lt;/p&gt;

&lt;p&gt;Instead, the &lt;code&gt;tkdb&lt;/code&gt; “verification” API exports an endpoint that provides a feed of revocation notifications. Our client library “subscribes” to this API (really, it just polls). Macaroons are revoked regularly (but not constantly), and when that happens, clients notice and prune their caches.&lt;/p&gt;

&lt;p&gt;If clients lose connectivity to &lt;code&gt;tkdb&lt;/code&gt;, past some threshold interval, they just dump their entire cache, forcing verification to happen at &lt;code&gt;tkdb&lt;/code&gt;.&lt;/p&gt;
&lt;h2 id='5' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#5' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;5&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;A place where we’ve gotten a lot of internal mileage out of Macaroon features is service tokens. Service tokens are tokens used by code, rather than humans; almost always, a service token is something that is stored alongside running application code.&lt;/p&gt;

&lt;p&gt;An important detail of Fly.io’s Macaroons is the distinction between a “permissions” token and an “authentication” token. Macaroons by themselves express authorization, not authentication.&lt;/p&gt;

&lt;p&gt;That’s a useful constraint, and we want to honor it. By requiring a separate token for authentication, we minimize the impact of having the permissions token stolen; you can’t use it without authentication, so really it’s just like a mobile signed IAM policy expression. Neat!&lt;/p&gt;

&lt;p&gt;The way we express authentication is with a third-party caveat (&lt;a href='https://fly.io/blog/macaroons-escalated-quickly/#4' title=''&gt;see the old post for details&lt;/a&gt;). Your main Fly.io Macaroon will have a caveat saying “this token is only valid if accompanied by the discharge token for a user in your organization from our authentication system”. Our authentication system does the login dance and issues those discharges.&lt;/p&gt;

&lt;p&gt;This is exactly what you want for user tokens and not at all what you want for a service token: we don’t want running code to store those authenticator tokens, because they’re hazardous.&lt;/p&gt;

&lt;p&gt;The solution we came up with for service tokens is simple: &lt;code&gt;tkdb&lt;/code&gt; exports an API that uses its access to token secrets to strip off the third-party authentication caveat. To call into that API, you have to present a valid discharging authentication token; that is, you have to prove you could already have done whatever the token said. &lt;code&gt;tkdb&lt;/code&gt; returns a new token with all the previous caveats, minus the expiration (you don’t usually want service tokens to expire).&lt;/p&gt;

&lt;p&gt;OK, so we’ve managed to transform a tuple &lt;code&gt;(unscary-token, scary-token)&lt;/code&gt; into the new tuple &lt;code&gt;(scary-token)&lt;/code&gt;. Not so impressive. But hold on: the recipient of &lt;code&gt;scary-token&lt;/code&gt; can attenuate it further: we can lock it to a particular instance of &lt;code&gt;flyd&lt;/code&gt;, or to a particular Fly Machine. Which means exfiltrating it doesn’t do you any good; to use it, you have to control the environment it’s intended to be used in.&lt;/p&gt;

&lt;p&gt;The net result of this scheme is that a compromised physical host will only give you access to tokens that have been used on that worker, which is a very nice property. Another way to look at it: every token used in production is traceable in some way to a valid token a user submitted. Neat!&lt;/p&gt;
&lt;div class="right-sidenote"&gt;&lt;p&gt;All the cool spooky secret store names were taken.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;We do a similar dance to with Pet Semetary, our internal Vault replacement. Petsem manages user secrets for applications, such as Postgres connection strings. Petsem is its own Macaroon authority (it issues its own Macaroons with its own permissions system), and to do something with a secret, you need one of those Petsem-minted Macaroon.&lt;/p&gt;

&lt;p&gt;Our primary API servers field requests from users to set secrets for their apps. So the API has a Macaroon that allows secrets writes. But it doesn&amp;rsquo;t allow reads: there’s no API call to dump your secrets, because our API servers don’t have that privilege. So far, so good.&lt;/p&gt;

&lt;p&gt;But when we boot up a Fly Machine, we need to inject the appropriate user secrets into it at boot; &lt;em&gt;something&lt;/em&gt; needs a Macaroon that can read secrets. That “something” is &lt;code&gt;flyd&lt;/code&gt;, our orchestrator, which runs on every worker server in our fleet.&lt;/p&gt;

&lt;p&gt;Clearly, we can’t give every &lt;code&gt;flyd&lt;/code&gt; a Macaroon that reads every user’s secret. Most users will never deploy anything on any given worker, and we can’t have a security model that collapses down to “every worker is equally privileged”.&lt;/p&gt;

&lt;p&gt;Instead, the “read secret” Macaroon that &lt;code&gt;flyd&lt;/code&gt; gets has a third-party caveat attached to it, which is dischargeable only by talking to &lt;code&gt;tkdb&lt;/code&gt; and proving (with normal Macaroon tokens) that you have permissions for the org whose secrets you want to read. Once again, access is traceable to an end-user action, and minimized across our fleet. Neat!&lt;/p&gt;
&lt;h2 id='6' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#6' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;6&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;Our token systems have some of the best telemetry in the whole platform.&lt;/p&gt;

&lt;p&gt;Most of that is down to &lt;a href='http://opentelemetry.io/' title=''&gt;OpenTelemetry&lt;/a&gt; and &lt;a href='https://www.honeycomb.io/' title=''&gt;Honeycomb&lt;/a&gt;. From the moment a request hits our API server through the moment &lt;code&gt;tkdb&lt;/code&gt; responds to it, oTel &lt;a href='https://opentelemetry.io/docs/concepts/context-propagation/' title=''&gt;context propagation&lt;/a&gt; gives us a single narrative about what’s happening.&lt;/p&gt;

&lt;p&gt;&lt;a href='https://fly.io/blog/the-exit-interview-jp/' title=''&gt;I was a skeptic about oTel&lt;/a&gt;. It’s really, really expensive. And, not to put too fine a point on it, oTel really cruds up our code. Once, I was an “80% of the value of tracing, we can get from logs and metrics” person. But I was wrong.&lt;/p&gt;

&lt;p&gt;Errors in our token system are rare. Usually, they’re just early indications of network instability, and between caching and FlyCast, we mostly don’t have to care about those alerts. When we do, it’s because something has gone so sideways that we’d have to care anyways. The &lt;code&gt;tkdb&lt;/code&gt; code is remarkably stable and there hasn’t been an incident intervention with our token system in over a year.&lt;/p&gt;

&lt;p&gt;Past oTel, and the standard logging and Prometheus metrics every Fly App gets for free, we also have a complete audit trail for token operations, in a permanently retained OpenSearch cluster index. Since virtually all the operations that happen on our platform are mediated by Macaroons, this audit trail is itself pretty powerful.&lt;/p&gt;
&lt;h2 id='7' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#7' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;7&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;So, that&amp;rsquo;s pretty much it. The moral of the story for us is, Macaroons have a lot of neat features, our users mostly don&amp;rsquo;t care about them — that may even be a good thing — but we get a lot of use out of them internally.&lt;/p&gt;

&lt;p&gt;As an engineering culture, we&amp;rsquo;re allergic to &amp;ldquo;microservices&amp;rdquo;, and we flinched a bit at the prospect of adding a specific service just to manage tokens. But it&amp;rsquo;s pulled its weight, and not added really any drama at all. We have at this point a second dedicated security service (Petsem), and even though they sort of rhyme with each other, we&amp;rsquo;ve got no plans to merge them. &lt;a href='https://how.complexsystems.fail/#10' title=''&gt;Rule #10&lt;/a&gt; and all that.&lt;/p&gt;

&lt;p&gt;Oh, and a total victory for LiteFS, Litestream, and infrastructure SQLite. Which, after managing an infrastructure SQLite project that routinely ballooned to tens of gigabytes and occasionally threatened service outages, is lovely to see.&lt;/p&gt;

&lt;p&gt;Macaroons! If you&amp;rsquo;d asked us a year ago, we&amp;rsquo;d have said the jury was still out on whether they were a good move. But then Ben Toews spent a year making them awesome, and so they are. &lt;a href='https://github.com/superfly/macaroon' title=''&gt;Most of the code is open source&lt;/a&gt;!&lt;/p&gt;
</content>
  </entry>
  <entry>
    <title>Taming A Voracious Rust Proxy</title>
    <link rel="alternate" href="https://fly.io/blog/taming-rust-proxy/"/>
    <id>https://fly.io/blog/taming-rust-proxy/</id>
    <published>2025-02-26T00:00:00+00:00</published>
    <updated>2025-05-20T16:26:25+00:00</updated>
    <media:thumbnail url="https://fly.io/blog/taming-rust-proxy/assets/happy-crab-cover.jpg"/>
    <content type="html">&lt;div class="lead"&gt;&lt;p&gt;Here’s a fun bug.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;The basic idea of our service is that we run containers for our users, as hardware-isolated virtual machines (Fly Machines), on hardware we own around the world. What makes that interesting is that we also connect every Fly Machine to a global Anycast network. If your app is running in Hong Kong and Dallas, and a request for it arrives in Singapore, we&amp;rsquo;ll route it to &lt;code&gt;HKG&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Our own hardware fleet is roughly divided into two kinds of servers: edges, which receive incoming requests from the Internet, and workers, which run Fly Machines. Edges exist almost solely to run a Rust program called &lt;code&gt;fly-proxy&lt;/code&gt;, the router at the heart of our Anycast network.&lt;/p&gt;

&lt;p&gt;So: a week or so ago, we flag an incident. Lots of things generate incidents: synthetic monitoring failures, metric thresholds, health check failures. In this case two edge tripwires tripped: elevated &lt;code&gt;fly-proxy&lt;/code&gt; HTTP errors, and skyrocketing CPU utilization, on a couple hosts in &lt;code&gt;IAD&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Our incident process is pretty ironed out at this point. We created an incident channel (we ❤️ &lt;a href='https://rootly.com/' title=''&gt;Rootly&lt;/a&gt; for this, &lt;a href='https://rootly.com/' title=''&gt;seriously check out Rootly&lt;/a&gt;, an infra MVP here for years now), and incident responders quickly concluded that, while something hinky was definitely going on, the platform was fine. We have a lot of edges, and we&amp;rsquo;ve also recently converted many of our edge servers to significantly beefier hardware.&lt;/p&gt;

&lt;p&gt;Bouncing &lt;code&gt;fly-proxy&lt;/code&gt; clears the problem up on an affected proxy. But this wouldn&amp;rsquo;t be much of an interesting story if the problem didn&amp;rsquo;t later come back. So, for some number of hours, we&amp;rsquo;re in an annoying steady-state of getting paged and bouncing proxies. &lt;/p&gt;

&lt;p&gt;While this is happening, Pavel, on our proxy team, pulls a profile from an angry proxy. 
&lt;img alt="A flamegraph profile, described better in the prose anyways." src="/blog/taming-rust-proxy/assets/proxy-profile.jpg" /&gt;
So, this is fuckin&amp;rsquo; weird: a huge chunk of the profile is dominated by Rust &lt;code&gt;tracing&lt;/code&gt;&amp;lsquo;s &lt;code&gt;Subscriber&lt;/code&gt;. But that doesn&amp;rsquo;t make sense. The entire point of Rust &lt;code&gt;tracing&lt;/code&gt;, which generates fine-grained span records for program activity, is that &lt;code&gt;entering&lt;/code&gt; and &lt;code&gt;exiting&lt;/code&gt; a span is very, very fast. &lt;/p&gt;

&lt;p&gt;If the mere act of &lt;code&gt;entering&lt;/code&gt; a span in a Tokio stack is chewing up a significant amount of CPU, something has gone haywire: the actual code being traced must be doing next to nothing.&lt;/p&gt;
&lt;h2 id='a-quick-refresher-on-async-rust' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#a-quick-refresher-on-async-rust' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;A Quick Refresher On Async Rust&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;So in Rust, like a lot of &lt;code&gt;async/await&lt;/code&gt; languages, you&amp;rsquo;ve got &lt;code&gt;Futures&lt;/code&gt;. A &lt;code&gt;Future&lt;/code&gt; is a type that represents the future value of an asychronous computation, like reading from a socket. &lt;code&gt;Futures&lt;/code&gt; are state machines, and they&amp;rsquo;re lazy: they expose one basic operation, &lt;code&gt;poll&lt;/code&gt;, which an executor (like Tokio) calls to advance the state machine. That &lt;code&gt;poll&lt;/code&gt; returns whether the &lt;code&gt;Future&lt;/code&gt; is still &lt;code&gt;Pending&lt;/code&gt;, or &lt;code&gt;Ready&lt;/code&gt; with a result.&lt;/p&gt;

&lt;p&gt;In theory, you could build an executor that drove a bunch of &lt;code&gt;Futures&lt;/code&gt; just by storing them in a list and busypolling each of them, round robin, until they return &lt;code&gt;Ready&lt;/code&gt;. This executor would defeat the much of the purpose of asynchronous program, so no real executor works that way.&lt;/p&gt;

&lt;p&gt;Instead, a runtime like Tokio integrates &lt;code&gt;Futures&lt;/code&gt; with an event loop (on &lt;a href='https://man7.org/linux/man-pages/man7/epoll.7.html' title=''&gt;epoll&lt;/a&gt; or &lt;a href='https://en.wikipedia.org/wiki/Kqueue' title=''&gt;kqeue&lt;/a&gt;) and, when calling &lt;code&gt;poll&lt;/code&gt;, passes a &lt;code&gt;Waker&lt;/code&gt;. The &lt;code&gt;Waker&lt;/code&gt; is an abstract handle that allows the &lt;code&gt;Future&lt;/code&gt; to instruct the Tokio runtime to call &lt;code&gt;poll&lt;/code&gt;, because something has happened.&lt;/p&gt;

&lt;p&gt;To complicate things: an ordinary &lt;code&gt;Future&lt;/code&gt; is a one-shot value. Once it&amp;rsquo;s &lt;code&gt;Ready&lt;/code&gt;, it can&amp;rsquo;t be &lt;code&gt;polled&lt;/code&gt; anymore. But with network programming, that&amp;rsquo;s usually not what you want: data usually arrives in streams, which you want to track and make progress on as you can. So async Rust provides &lt;code&gt;AsyncRead&lt;/code&gt; and &lt;code&gt;AsyncWrite&lt;/code&gt; traits, which build on &lt;code&gt;Futures&lt;/code&gt;, and provide methods like &lt;code&gt;poll_read&lt;/code&gt; that return &lt;code&gt;Ready&lt;/code&gt; &lt;em&gt;every time&lt;/em&gt; there&amp;rsquo;s data ready. &lt;/p&gt;

&lt;p&gt;So far so good? OK. Now, there are two footguns in this design. &lt;/p&gt;

&lt;p&gt;The first footgun is that a &lt;code&gt;poll&lt;/code&gt; of a &lt;code&gt;Future&lt;/code&gt; that isn&amp;rsquo;t &lt;code&gt;Ready&lt;/code&gt; wastes cycles, and, if you have a bug in your code and that &lt;code&gt;Pending&lt;/code&gt; poll happens to trip a &lt;code&gt;Waker&lt;/code&gt;, you&amp;rsquo;ll slip into an infinite loop. That&amp;rsquo;s easy to see.&lt;/p&gt;

&lt;p&gt;The second and more insidious footgun is that an &lt;code&gt;AsyncRead&lt;/code&gt; can &lt;code&gt;poll_read&lt;/code&gt; to a &lt;code&gt;Ready&lt;/code&gt; that doesn&amp;rsquo;t actually progress its underlying state machine. Since the idea of &lt;code&gt;AsyncRead&lt;/code&gt; is that you keep &lt;code&gt;poll_reading&lt;/code&gt; until it stops being &lt;code&gt;Ready&lt;/code&gt;, this too is an infinite loop.&lt;/p&gt;

&lt;p&gt;When we look at our profiles, what we see are samples that almost terminate in libc, but spend next to no time in the kernel doing actual I/O. The obvious explanation: we&amp;rsquo;ve entered lots of &lt;code&gt;poll&lt;/code&gt; functions, but they&amp;rsquo;re doing almost nothing and returning immediately.&lt;/p&gt;
&lt;h2 id='jaccuse' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#jaccuse' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;J&amp;#39;accuse!&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;Wakeup issues are annoying to debug. But the flamegraph gives us the fully qualified type of the &lt;code&gt;Future&lt;/code&gt; we&amp;rsquo;re polling:&lt;/p&gt;
&lt;div class="highlight-wrapper group relative rust"&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-wrap-target="#code-sqkxkpif"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-copy-target="sibling"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight '&gt;&lt;code id="code-sqkxkpif"&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="k"&gt;mut&lt;/span&gt; &lt;span class="nn"&gt;fp_io&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nn"&gt;copy&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;Duplex&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&amp;amp;&lt;/span&gt;&lt;span class="k"&gt;mut&lt;/span&gt; &lt;span class="nn"&gt;fp_io&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nn"&gt;reusable_reader&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;ReusableReader&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nn"&gt;fp_tcp&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nn"&gt;peek&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;PeekableReader&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nn"&gt;tokio_rustls&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nn"&gt;server&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;TlsStream&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nn"&gt;fp_tcp_metered&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;MeteredIo&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nn"&gt;fp_tcp&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nn"&gt;peek&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;PeekableReader&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nn"&gt;fp_tcp&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nn"&gt;permitted&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;PermittedTcpStream&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&amp;gt;&amp;gt;&amp;gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nn"&gt;connect&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nn"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;Conn&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nn"&gt;tokio&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nn"&gt;net&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nn"&gt;tcp&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nn"&gt;stream&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;TcpStream&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;This loops like a lot, but much of it is just wrapper types we wrote ourselves, and those wrappers don&amp;rsquo;t do anything interesting. What&amp;rsquo;s left to audit:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;Duplex&lt;/code&gt;, the outermost type, one of ours, &lt;em&gt;and&lt;/em&gt;
&lt;/li&gt;&lt;li&gt;&lt;code&gt;TlsStream&lt;/code&gt;, from &lt;a href='https://github.com/rustls/rustls' title=''&gt;Rustls&lt;/a&gt;.
&lt;/li&gt;&lt;/ul&gt;

&lt;p&gt;&lt;code&gt;Duplex&lt;/code&gt; is a beast. It&amp;rsquo;s the core I/O state machine for proxying between connections. It&amp;rsquo;s not easy to reason about in specificity. But: it also doesn&amp;rsquo;t do anything directly with a &lt;code&gt;Waker&lt;/code&gt;; it&amp;rsquo;s built around &lt;code&gt;AsyncRead&lt;/code&gt; and &lt;code&gt;AsyncWrite&lt;/code&gt;. It hasn&amp;rsquo;t changed recently and we can&amp;rsquo;t trigger misbehavior in it.&lt;/p&gt;

&lt;p&gt;That leaves &lt;code&gt;TlsStream&lt;/code&gt;. &lt;code&gt;TlsStream&lt;/code&gt; is an ultra-important, load-bearing function in the Rust ecosystem. Everybody uses it. Could it harbor an async Rust footgun? Turns out, it did!&lt;/p&gt;

&lt;p&gt;Unlike our &lt;code&gt;Duplex&lt;/code&gt;, Rustls actually does have to get intimate with the underlying async executor. And, looking through the repository, Pavel uncovers &lt;a href='https://github.com/rustls/tokio-rustls/issues/72' title=''&gt;this issue&lt;/a&gt;: sometimes, &lt;code&gt;TlsStreams&lt;/code&gt; in Rustls just spin out. And it turns out, what&amp;rsquo;s causing this is a TLS state machine bug: when a TLS session is orderly-closed, with a &lt;code&gt;CloseNotify&lt;/code&gt; &lt;code&gt;Alert&lt;/code&gt; record, the sender of that record has informed its counterparty that no further data will be sent. But if there&amp;rsquo;s still buffered data on the underlying connection, &lt;code&gt;TlsStream&lt;/code&gt; mishandles its &lt;code&gt;Waker&lt;/code&gt;, and we fall into a busy-loop.&lt;/p&gt;

&lt;p&gt;&lt;a href='https://github.com/rustls/rustls/pull/1950/files' title=''&gt;Pretty straightforward fix&lt;/a&gt;!&lt;/p&gt;
&lt;h2 id='what-actually-happened-to-us' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#what-actually-happened-to-us' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;What Actually Happened To Us&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;Our partners in object storage, &lt;a href='https://www.tigrisdata.com/' title=''&gt;Tigris Data&lt;/a&gt;, were conducting some kind of load testing exercise. Some aspect of their testing system triggered the &lt;code&gt;TlsStream&lt;/code&gt; state machine bug, which locked up one or more &lt;code&gt;TlsStreams&lt;/code&gt; in the edge proxy handling whatever corner-casey stream they were sending.&lt;/p&gt;

&lt;p&gt;Tigris wasn&amp;rsquo;t generating a whole lot of traffic; tens of thousands of connections, tops. But all of them sent small HTTP bodies and then terminated early. We figured some of those connections errored out, and set up the &amp;ldquo;TLS CloseNotify happened before EOF&amp;rdquo; scenario. &lt;/p&gt;

&lt;p&gt;To be truer to the chronology, we knew pretty early on in our investigation that something Tigris was doing with their load testing was probably triggering the bug, and we got them to stop. After we worked it out, and Pavel deployed the fix, we told them to resume testing. No spin-outs.&lt;/p&gt;
&lt;h2 id='lessons-learned' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#lessons-learned' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Lessons Learned&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;Keep your dependencies updated. Unless you shouldn&amp;rsquo;t keep your dependencies updated. I mean, if there&amp;rsquo;s a vulnerability (and, technically, this was a DoS vulnerability), always update. And if there&amp;rsquo;s an important bugfix, update. But if there isn&amp;rsquo;t an important bugfix, updating for the hell of it might also destabilize your project? So update maybe? Most of the time?&lt;/p&gt;

&lt;p&gt;Really, the truth of this is that keeping track of &lt;em&gt;what needs to be updated&lt;/em&gt; is valuable work. The updates themselves are pretty fast and simple, but the process and testing infrastructure to confidently metabolize dependency updates is not. &lt;/p&gt;

&lt;p&gt;Our other lesson here is that there&amp;rsquo;s an opportunity to spot these kinds of bugs more directly with our instrumentation. Spurious wakeups should be easy to spot, and triggering a metric when they happen should be cheap, because they&amp;rsquo;re not supposed to happen often. So that&amp;rsquo;s something we&amp;rsquo;ll go do now.&lt;/p&gt;
</content>
  </entry>
  <entry>
    <title>We Were Wrong About GPUs</title>
    <link rel="alternate" href="https://fly.io/blog/wrong-about-gpu/"/>
    <id>https://fly.io/blog/wrong-about-gpu/</id>
    <published>2025-02-14T00:00:00+00:00</published>
    <updated>2025-05-20T16:26:25+00:00</updated>
    <media:thumbnail url="https://fly.io/blog/wrong-about-gpu/assets/choices-choices-cover.webp"/>
    <content type="html">&lt;div class="lead"&gt;&lt;p&gt;We’re building a public cloud, on hardware we own. We raised money to do that, and to place some bets; one of them: GPU-enabling our customers. A progress report: GPUs aren’t going anywhere, but: GPUs aren’t going anywhere.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;A couple years back, &lt;a href="https://fly.io/gpu"&gt;we put a bunch of chips down&lt;/a&gt; on the bet that people shipping apps to users on the Internet would want GPUs, so they could do AI/ML inference tasks. To make that happen, we created &lt;a href="https://fly.io/docs/gpus/getting-started-gpus/"&gt;Fly GPU Machines&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;A Fly Machine is a &lt;a href="https://fly.io/blog/docker-without-docker/"&gt;Docker/OCI container&lt;/a&gt; running inside a hardware-virtualized virtual machine somewhere on our global fleet of bare-metal worker servers. A GPU Machine is a Fly Machine with a hardware-mapped Nvidia GPU. It&amp;rsquo;s a Fly Machine that can do fast CUDA.&lt;/p&gt;

&lt;p&gt;Like everybody else in our industry, we were right about the importance of AI/ML. If anything, we underestimated its importance. But the product we came up with probably doesn&amp;rsquo;t fit the moment. It&amp;rsquo;s a bet that doesn&amp;rsquo;t feel like it&amp;rsquo;s paying off.&lt;/p&gt;

&lt;p&gt;&lt;strong class='font-semibold text-navy-950'&gt;If you&amp;rsquo;re using Fly GPU Machines, don&amp;rsquo;t freak out; we&amp;rsquo;re not getting rid of them.&lt;/strong&gt; But if you&amp;rsquo;re waiting for us to do something bigger with them, a v2 of the product, you&amp;rsquo;ll probably be waiting awhile.&lt;/p&gt;
&lt;h3 id='what-it-took' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#what-it-took' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;What It Took&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;GPU Machines were not a small project for us. Fly Machines run on an idiosyncratically small hypervisor (normally Firecracker, but for GPU Machines &lt;a href="https://github.com/cloud-hypervisor/cloud-hypervisor"&gt;Intel&amp;rsquo;s Cloud Hypervisor&lt;/a&gt;, a very similar Rust codebase that supports PCI passthrough). The Nvidia ecosystem is not geared to supporting micro-VM hypervisors.&lt;/p&gt;

&lt;p&gt;GPUs &lt;a href="https://googleprojectzero.blogspot.com/2020/09/attacking-qualcomm-adreno-gpu.html"&gt;terrified our security team&lt;/a&gt;. A GPU is just about the worst case hardware peripheral: intense multi-directional direct memory transfers&lt;/p&gt;
&lt;div class="right-sidenote"&gt;&lt;p&gt;(not even bidirectional: in common configurations, GPUs talk to each other)&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;with arbitrary, end-user controlled computation, all operating outside our normal security boundary.&lt;/p&gt;

&lt;p&gt;We did a couple expensive things to mitigate the risk. We shipped GPUs on dedicated server hardware, so that GPU- and non-GPU workloads weren&amp;rsquo;t mixed. Because of that, the only reason for a Fly Machine to be scheduled on a GPU machine was that it needed a PCI BDF for an Nvidia GPU, and there&amp;rsquo;s a limited number of those available on any box. Those GPU servers were drastically less utilized and thus less cost-effective than our ordinary servers.&lt;/p&gt;

&lt;p&gt;We funded two very large security assessments, from &lt;a href="https://www.atredis.com/"&gt;Atredis&lt;/a&gt; and &lt;a href="https://tetrelsec.com/"&gt;Tetrel&lt;/a&gt;, to evaluate our GPU deployment. Matt Braun is writing up those assessments now. They were not cheap, and they took time.&lt;/p&gt;

&lt;p&gt;Security wasn&amp;rsquo;t directly the biggest cost we had to deal with, but it was an indirect cause for a subtle reason.&lt;/p&gt;

&lt;p&gt;We could have shipped GPUs very quickly by doing what Nvidia recommended: standing up a standard K8s cluster to schedule GPU jobs on. Had we taken that path, and let our GPU users share a single Linux kernel, we&amp;rsquo;d have been on Nvidia&amp;rsquo;s driver happy-path.&lt;/p&gt;

&lt;p&gt;Alternatively, we could have used a conventional hypervisor. Nvidia suggested VMware (heh). But they could have gotten things working had we used QEMU. We like QEMU fine, and could have talked ourselves into a security story for it, but the whole point of Fly Machines is that they take milliseconds to start. We could not have offered our desired Developer Experience on the Nvidia happy-path.&lt;/p&gt;

&lt;p&gt;Instead, we burned months trying (and ultimately failing) to get Nvidia&amp;rsquo;s host drivers working to map &lt;a href="https://www.nvidia.com/en-us/data-center/virtual-solutions/"&gt;virtualized GPUs&lt;/a&gt; into Intel Cloud Hypervisor. At one point, we hex-edited the closed-source drivers to trick them into thinking our hypervisor was QEMU.&lt;/p&gt;

&lt;p&gt;I&amp;rsquo;m not sure any of this really mattered in the end. There&amp;rsquo;s a segment of the market we weren&amp;rsquo;t ever really able to explore because Nvidia&amp;rsquo;s driver support kept us from thin-slicing GPUs. We&amp;rsquo;d have been able to put together a really cheap offering for developers if we hadn&amp;rsquo;t run up against that, and developers love &amp;ldquo;cheap&amp;rdquo;, but I can&amp;rsquo;t prove that those customers are real.&lt;/p&gt;

&lt;p&gt;On the other hand, we&amp;rsquo;re committed to delivering the Fly Machine DX for GPU workloads. Beyond the PCI/IOMMU drama, just getting an entire hardware GPU working in a Fly Machine was a lift. We needed Fly Machines that would come up with the right Nvidia drivers; our stack was built assuming that the customer&amp;rsquo;s OCI container almost entirely defined the root filesystem for a Machine. We had to engineer around that in our &lt;code&gt;flyd&lt;/code&gt; orchestrator. And almost everything people want to do with GPUs involves efficiently grabbing huge files full of model weights. Also annoying!&lt;/p&gt;

&lt;p&gt;And, of course, we bought GPUs. A lot of GPUs. Expensive GPUs.&lt;/p&gt;
&lt;h3 id='why-it-isnt-working' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#why-it-isnt-working' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Why It Isn&amp;rsquo;t Working&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;The biggest problem: developers don&amp;rsquo;t want GPUs. They don&amp;rsquo;t even want AI/ML models. They want LLMs. &lt;em&gt;System engineers&lt;/em&gt; may have smart, fussy opinions on how to get their models loaded with CUDA, and what the best GPU is. But &lt;em&gt;software developers&lt;/em&gt; don&amp;rsquo;t care about any of that. When a software developer shipping an app comes looking for a way for their app to deliver prompts to an LLM, you can&amp;rsquo;t just give them a GPU.&lt;/p&gt;

&lt;p&gt;For those developers, who probably make up most of the market, it doesn&amp;rsquo;t seem plausible for an insurgent public cloud to compete with OpenAI and Anthropic. Their APIs are fast enough, and developers thinking about performance in terms of &amp;ldquo;tokens per second&amp;rdquo; aren&amp;rsquo;t counting milliseconds.&lt;/p&gt;
&lt;div class="right-sidenote"&gt;&lt;p&gt;(you should all feel sympathy for us)&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;This makes us sad because we really like the point in the solution space we found. Developers shipping apps on Amazon will outsource to other public clouds to get cost-effective access to GPUs. But then they&amp;rsquo;ll faceplant trying to handle data and model weights, backhauling gigabytes (at significant expense) from S3. We have app servers, GPUs, and object storage all under the same top-of-rack switch. But inference latency just doesn&amp;rsquo;t seem to matter yet, so the market doesn&amp;rsquo;t care.&lt;/p&gt;

&lt;p&gt;Past that, and just considering the system engineers who do care about GPUs rather than LLMs: the hardware product/market fit here is really rough.&lt;/p&gt;

&lt;p&gt;People doing serious AI work want galactically huge amounts of GPU compute. A whole enterprise A100 is a compromise position for them; they want an SXM cluster of H100s.&lt;/p&gt;
&lt;div class="right-sidenote"&gt;&lt;p&gt;Near as we can tell, MIG gives you a UUID to talk to the host driver, not a PCI device.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;We think there&amp;rsquo;s probably a market for users doing lightweight ML work getting tiny GPUs. &lt;a href="https://www.nvidia.com/en-us/technologies/multi-instance-gpu/"&gt;This is what Nvidia MIG does&lt;/a&gt;, slicing a big GPU into arbitrarily small virtual GPUs. But for fully-virtualized workloads, it&amp;rsquo;s not baked; we can&amp;rsquo;t use it. And I&amp;rsquo;m not sure how many of those customers there are, or whether we&amp;rsquo;d get the density of customers per server that we need.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://fly.io/blog/cutting-prices-for-l40s-gpus-in-half"&gt;That leaves the L40S customers&lt;/a&gt;. There are a bunch of these! We dropped L40S prices last year, not because we were sour on GPUs but because they&amp;rsquo;re the one part we have in our inventory people seem to get a lot of use out of. We&amp;rsquo;re happy with them. But they&amp;rsquo;re just another kind of compute that some apps need; they&amp;rsquo;re not a driver of our core business. They&amp;rsquo;re not the GPU bet paying off.&lt;/p&gt;

&lt;p&gt;Really, all of this is just a long way of saying that for most software developers, &amp;ldquo;AI-enabling&amp;rdquo; their app is best done with API calls to things like Claude and GPT, Replicate and RunPod.&lt;/p&gt;
&lt;h3 id='what-did-we-learn' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#what-did-we-learn' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;What Did We Learn?&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;A very useful way to look at a startup is that it&amp;rsquo;s a race to learn stuff. So, what&amp;rsquo;s our report card?&lt;/p&gt;

&lt;p&gt;First off, when we embarked down this path in 2022, we were (like many other companies) operating in a sort of phlogiston era of AI/ML. The industry attention to AI had not yet collapsed around a small number of foundational LLM models. We expected there to be a diversity of &lt;em&gt;mainstream&lt;/em&gt; models, the world &lt;a href='https://github.com/elixir-nx/bumblebee' title=''&gt;Elixir Bumblebee&lt;/a&gt; looks forward to, where people pull different AI workloads off the shelf the same way they do Ruby gems.&lt;/p&gt;

&lt;p&gt;But &lt;a href='https://www.cursor.com/' title=''&gt;Cursor happened&lt;/a&gt;, and, as they say, how are you going to keep &amp;lsquo;em down on the farm once they&amp;rsquo;ve seen Karl Hungus? It seems much clearer where things are heading.&lt;/p&gt;

&lt;p&gt;GPUs were a test of a Fly.io company credo: as we think about core features, we design for 10,000 developers, not for 5-6. It took a minute, but the credo wins here: GPU workloads for the 10,001st developer are a niche thing.&lt;/p&gt;

&lt;p&gt;Another way to look at a startup is as a series of bets. We put a lot of chips down here. But the buy-in for this tournament gave us a lot of chips to play with. Never making a big bet of any sort isn&amp;rsquo;t a winning strategy. I&amp;rsquo;d rather we&amp;rsquo;d flopped the nut straight, but I think going in on this hand was the right call.&lt;/p&gt;

&lt;p&gt;A really important thing to keep in mind here, and something I think a lot of startup thinkers sleep on, is the extent to which this bet involved acquiring assets. Obviously, some of our &lt;a href='https://fly.io/blog/the-exit-interview-jp/' title=''&gt;costs here aren&amp;rsquo;t recoverable&lt;/a&gt;. But the hardware parts that aren&amp;rsquo;t generating revenue will ultimately get liquidated; like with &lt;a href='https://fly.io/blog/32-bit-real-estate/' title=''&gt;our portfolio of IPv4 addresses&lt;/a&gt;, I&amp;rsquo;m even more comfortable making bets backed by tradable assets with durable value.&lt;/p&gt;

&lt;p&gt;In the end, I don&amp;rsquo;t think GPU Fly Machines were going to be a hit for us no matter what we did. Because of that, one thing I&amp;rsquo;m very happy about is that we didn&amp;rsquo;t compromise the rest of the product for them. Security concerns slowed us down to where we probably learned what we needed to learn a couple months later than we could have otherwise, but we&amp;rsquo;re scaling back our GPU ambitions without having sacrificed &lt;a href='https://fly.io/blog/sandboxing-and-workload-isolation/' title=''&gt;any of our isolation story&lt;/a&gt;, and, ironically, GPUs &lt;em&gt;other people run&lt;/em&gt; are making that story a lot more important. The same thing goes for our Fly Machine developer experience.&lt;/p&gt;

&lt;p&gt;We started this company building a Javascript runtime for edge computing. We learned that our customers didn&amp;rsquo;t want a new Javascript runtime; they just wanted their native code to work. &lt;a href='https://news.ycombinator.com/item?id=22616857' title=''&gt;We shipped containers&lt;/a&gt;, and no convincing was needed. We were wrong about Javascript edge functions, and I think we were wrong about GPUs. That&amp;rsquo;s usually how we figure out the right answers:  by being wrong about a lot of stuff.&lt;/p&gt;
</content>
  </entry>
  <entry>
    <title>The Exit Interview: JP Phillips</title>
    <link rel="alternate" href="https://fly.io/blog/the-exit-interview-jp/"/>
    <id>https://fly.io/blog/the-exit-interview-jp/</id>
    <published>2025-02-12T00:00:00+00:00</published>
    <updated>2025-05-20T16:26:25+00:00</updated>
    <media:thumbnail url="https://fly.io/blog/the-exit-interview-jp/assets/bye-bye-little-sebastian-cover.webp"/>
    <content type="html">&lt;div class="lead"&gt;&lt;p&gt;JP Phillips is off to greener, or at least calmer, pastures. He joined us 4 years ago to build the next generation of our orchestration system, and has been one of the anchors of our engineering team. His last day is today. We wanted to know what he was thinking, and figured you might too.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;&lt;em&gt;Question 1: Why, JP? Just why?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;LOL. When I looked at what I wanted to see from here in the next 3-4 years, it didn&amp;rsquo;t really match up with where we&amp;rsquo;re currently heading. Specifically, with our new focus on MPG &lt;em&gt;[Managed Postgres]&lt;/em&gt; and [llm] &lt;em&gt;[llm].&lt;/em&gt;&lt;/p&gt;
&lt;div class="callout"&gt;&lt;p&gt;Editorial comment: Even I don’t know what [llm] is.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;The Fly Machines platform is more or less finished, in the sense of being capable of supporting the next iteration of our products. My original desire to join Fly.io was to make Machines a product that would &lt;a href='https://fly.io/blog/carving-the-scheduler-out-of-our-orchestrator/' title=''&gt;rid us of HashiCorp Nomad&lt;/a&gt;, and I feel like that&amp;rsquo;s been accomplished.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Where were you hoping to see us headed?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;More directly positioned as a cloud provider, rather than a platform-as-a-service; further along the customer journey from &amp;ldquo;developers&amp;rdquo; and &amp;ldquo;startups&amp;rdquo; to large established companies.&lt;/p&gt;

&lt;p&gt;And, it&amp;rsquo;s not that I disagree with PAAS work or MPG! Rather, it&amp;rsquo;s not something that excites me in a way that I&amp;rsquo;d feel challenged and could continue to grow technically.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Follow up question: does your family know what you’re doing here? Doing to us? Are they OK with it?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Yes, my family was very involved in the decision, before I even talked to other companies.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;What&amp;rsquo;s the thing you&amp;rsquo;re happiest about having built here? It cannot be &amp;ldquo;all of &lt;code&gt;flyd&lt;/code&gt;&amp;rdquo;.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;We&amp;rsquo;ve enabled developers to run workloads from an OCI image and an API call all over the world. On any other cloud provider, the knowledge of how to pull that off comes with a professional certification.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;In what file in our &lt;code&gt;nomad-firecracker&lt;/code&gt; repository would I find that code?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href='https://docs.machines.dev/#tag/machines/post/apps/{app_name}/machines' title=''&gt;https://docs.machines.dev/#tag/machines/post/apps/{app_name}/machines&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;img alt="A diagram that doesn&amp;#39;t make any of this clearer" src="/blog/the-exit-interview-jp/assets/flaps.png?1/2&amp;amp;center" /&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;So you mean, literally, the whole Fly Machines API, and &lt;code&gt;flaps&lt;/code&gt;, the API gateway for Fly Machines?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Yes, all of it. The &lt;code&gt;flaps&lt;/code&gt; API server, the &lt;code&gt;flyd&lt;/code&gt; RPCs it calls, the &lt;code&gt;flyd&lt;/code&gt; finite state machine system, the interface to running VMs.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Is there something you especially like about that design?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;I like that it for the most part doesn&amp;rsquo;t require any central coordination. And I like that the P90 for Fly Machine &lt;code&gt;create&lt;/code&gt; calls is sub-5-seconds for pretty much every region except for Johannesburg and Hong Kong.&lt;/p&gt;

&lt;p&gt;I think the FSM design is something I&amp;rsquo;m proud of; if I could take any code with me, it&amp;rsquo;d be the &lt;code&gt;internal/fsm&lt;/code&gt; in the &lt;code&gt;nomad-firecracker&lt;/code&gt; repo.&lt;/p&gt;
&lt;div class="callout"&gt;&lt;p&gt;You can read more about &lt;a href="https://fly.io/blog/carving-the-scheduler-out-of-our-orchestrator/" title=""&gt;the &lt;code&gt;flyd&lt;/code&gt; orchestrator JP led over here&lt;/a&gt;.  But, a quick decoder ring: &lt;code&gt;flyd&lt;/code&gt; runs independently without any central coordination on thousands of “worker” servers around the globe. It’s structured as an API server for a bunch of finite state machine invocations, where an FSM might be something like “start a Fly Machine” or “create a new Fly Machine” or “cordon off a Fly Machine so we can update it”. Each FSM invocation is comprised of a bunch of steps, each of those steps has callbacks into the &lt;code&gt;flyd&lt;/code&gt; code, and each step is logged in &lt;a href="https://github.com/boltdb/bolt" title=""&gt;a BoltDB database&lt;/a&gt;.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;&lt;em&gt;Thinking back, there are like two archetypes of insanely talented developers I’ve worked with. One is the kind that belts out ridiculous amounts of relatively sophisticated code on a whim, at like 3AM. Jerome [who leads our fly-proxy team], is that type. The other comes to projects with what feels like fully-formed, coherent designs that are not super intuitive, and the whole project just falls together around that design. Did you know you were going to do the FSM log thing when you started &lt;code&gt;flyd&lt;/code&gt;?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;I definitely didn&amp;rsquo;t have any specific design in mind when I started on &lt;code&gt;flyd&lt;/code&gt;. I think the FSM stuff is a result of work I did at Compose.io / MongoHQ (where it was called &amp;ldquo;recipes&amp;rdquo;/&amp;ldquo;operations&amp;rdquo;) and the workd I did at HashiCorp using Cadence.&lt;/p&gt;

&lt;p&gt;Once I understood what the product needed to do and look like, having a way to perform deterministic and durable execution felt like a good design.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Cadence?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href='https://cadenceworkflow.io/' title=''&gt;Cadence&lt;/a&gt; is the child of AWS Step Functions and the predecessor to &lt;a href='https://temporal.io/' title=''&gt;Temporal&lt;/a&gt; (the company).&lt;/p&gt;

&lt;p&gt;One of the biggest gains, with how it works in &lt;code&gt;flyd&lt;/code&gt;, is knowing we would need to deploy &lt;code&gt;flyd&lt;/code&gt; all day, every day. If &lt;code&gt;flyd&lt;/code&gt; was in the middle of doing some work, it needed to pick back up right where it left off, post-deploy.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;OK, next question. What&amp;rsquo;s the most impressive thing you saw someone else build here? To make things simpler and take some pressure off the interview, we can exclude any of my works from consideration.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Probably &lt;a href='https://github.com/superfly/corrosion' title=''&gt;&lt;code&gt;corrosion2&lt;/code&gt;&lt;/a&gt;.&lt;/p&gt;
&lt;div class="callout"&gt;&lt;p&gt;Sidebar: &lt;code&gt;corrosion2&lt;/code&gt; is our state distribution system. While &lt;code&gt;flyd&lt;/code&gt; runs individual Fly Machines for users, each instance is solely responsible for its own state; there’s no global scheduler. But we have platform components, most obviously &lt;code&gt;fly-proxy&lt;/code&gt;, our Anycast router, that need to know what’s running where. &lt;code&gt;corrosion2&lt;/code&gt; is a Rust service that does &lt;a href="https://fly.io/blog/building-clusters-with-serf/" title=""&gt;SWIM gossip&lt;/a&gt; to propagate information from each worker into a CRDT-structured SQLite database. &lt;code&gt;corrosion2&lt;/code&gt; essentially means any component on our fleet can do SQLite queries to get near-real-time information about any Fly Machine around the world.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;If for no other reason than that we deployed &lt;code&gt;corrosion&lt;/code&gt;, learned from it, and were able to make significant and valuable improvements — and then migrate to the new system in a short period of time.&lt;/p&gt;

&lt;p&gt;Having a &amp;ldquo;just SQLite&amp;rdquo; interface, for async replicated changes around the world in seconds, it&amp;rsquo;s pretty powerful.&lt;/p&gt;

&lt;p&gt;If we invested in &lt;a href='https://antithesis.com/' title=''&gt;Anthesis&lt;/a&gt; or TLA+ testing, I think there&amp;rsquo;s &lt;a href='https://github.com/superfly/corrosion' title=''&gt;potential for other companies&lt;/a&gt; to get value out of &lt;code&gt;corrosion2&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Just as a general-purpose gossip-based SQLite CRDT gossip system?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Yes.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;OK, you&amp;rsquo;re being too nice. What&amp;rsquo;s your least favorite thing about the platform?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;GraphQL. No, Elixir. It&amp;rsquo;s a tie between GraphQL and Elixir.&lt;/p&gt;

&lt;p&gt;But probably GraphQL, by a hair.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;That&amp;rsquo;s not the answer I expected.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;GraphQL slows everyone down, and everything. Elixir only slows me down.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;The rest of the platform, you&amp;rsquo;re fine with? No complaints?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;I&amp;rsquo;m happier now that we have &lt;code&gt;pilot&lt;/code&gt;.&lt;/p&gt;
&lt;div class="callout"&gt;&lt;p&gt;&lt;code&gt;pilot&lt;/code&gt; is our new &lt;code&gt;init&lt;/code&gt;. When we launch a Fly Machine, &lt;code&gt;init&lt;/code&gt; is our foothold in the machine; this is unlike a normal OCI runtime, where “pid 1” is often the user’s entrypoint program. Our original &lt;code&gt;init&lt;/code&gt; was so simple people dunked on it and said it might as well have been a bash script; over time, &lt;code&gt;init&lt;/code&gt; has sprouted a bunch of new features. &lt;code&gt;pilot&lt;/code&gt; consolidates those features, and, more importantly, is itself a complete OCI runtime; &lt;code&gt;pilot&lt;/code&gt; can natively run containers inside of Fly Machines.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;Before &lt;code&gt;pilot&lt;/code&gt;, there really wasn&amp;rsquo;t any contract between &lt;code&gt;flyd&lt;/code&gt; and &lt;code&gt;init&lt;/code&gt;. And &lt;code&gt;init&lt;/code&gt; was just &amp;ldquo;whatever we wanted &lt;code&gt;init&lt;/code&gt; to be&amp;rdquo;. That limit its ability to serve us.&lt;/p&gt;

&lt;p&gt;Having &lt;code&gt;pilot&lt;/code&gt; be an OCI-compliant runtime with an API for &lt;code&gt;flyd&lt;/code&gt; to drive is  a big win for the future of the Fly Machines API.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Was I right that we should have used SQLite for &lt;code&gt;flyd&lt;/code&gt;, or were you wrong to have used BoltDB?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;I still believe Bolt was the right choice. I&amp;rsquo;ve never lost a second of sleep worried that someone is about to run a SQL update statement on a host, or across the whole fleet, and then mangled all our state data. And limiting the storage interface, by not using SQL, kept &lt;code&gt;flyd&lt;/code&gt;&amp;lsquo;s scope managed.&lt;/p&gt;

&lt;p&gt;On the engine side of the platform, which is what &lt;code&gt;flyd&lt;/code&gt; is, I still believe SQL is too powerful for what &lt;code&gt;flyd&lt;/code&gt; does.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;If you had this to do over again, would Bolt be precisely what you&amp;rsquo;d pick, or is there something else you&amp;rsquo;d want to try? Some cool-ass new KV store?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Nah. But, I&amp;rsquo;d maybe consider a SQLite database per-Fly-Machine. Then the scope of danger is about as small as it could possibly be.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Whoah, that&amp;rsquo;s an interesting thought. People sleep on the &amp;ldquo;keep a zillion little SQLites&amp;rdquo; design.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Yeah, with per-Machine SQLite, once a Fly Machine is destroyed, we can just zip up the database and stash it in object storage. The biggest hold-up I have about it is how we&amp;rsquo;d manage the schemas.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;OpenTelemetry: were you right all along?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;One hundred percent.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;I basically attribute oTel at Fly.io to you.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Without oTel, it&amp;rsquo;d be a disaster trying to troubleshoot the system. I&amp;rsquo;d have ragequit trying.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;I remember there being a cost issue, with how much Honeycomb was going to end up charging us to manage all the data. But that seems silly in retrospect.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;For sure. It is 100% part of the decision and the conversation. But: we didn&amp;rsquo;t have the best track record running a logs/metrics cluster at this fidelity. It was worth the money to pay someone else to manage tracing data.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Strong agree. I think my only issue is just the extent to which it cruds up code. But I need to get over that.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Yes, it&amp;rsquo;s very explicit. I think the next big part of oTel is going to be auto-instrumentation, for profiling.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;You&amp;rsquo;re a veteran Golang programmer. Say 3 nice things about Rust.&lt;/em&gt;&lt;/p&gt;
&lt;div class="callout"&gt;&lt;p&gt;Most of our backend is in Go, but &lt;code&gt;fly-proxy&lt;/code&gt;, &lt;code&gt;corrosion2&lt;/code&gt;, and &lt;code&gt;pilot&lt;/code&gt; are in Rust.&lt;/p&gt;
&lt;/div&gt;
&lt;ol&gt;
&lt;li&gt;Option. 
&lt;/li&gt;&lt;li&gt;Match.
&lt;/li&gt;&lt;li&gt;Serde macros.
&lt;/li&gt;&lt;/ol&gt;

&lt;p&gt;&lt;em&gt;Even I can&amp;rsquo;t say shit about Option and match.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Match is so much better than anything in Go.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Elixir, Go, and Rust. An honest take on that programming cocktail.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Three&amp;rsquo;s a crowd, Elixir can stay home.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;If you could only lose one, you&amp;rsquo;d keep Rust.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;I&amp;rsquo;ve learned its shortcomings and the productivity far outweighs having to deal with the Rust compiler.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;You&amp;rsquo;d be unhappy if we moved the &lt;code&gt;flaps&lt;/code&gt; API code from Go to Elixir.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Correct.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;I kind of buy the idea of doing orchestration and scheduling code, which is policy-intensive, in a higher-level language.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Maybe. If Ruby had a better concurrency story, I don&amp;rsquo;t think Elixir would have a place for us.&lt;/p&gt;
&lt;div class="callout"&gt;&lt;p&gt;Here I need to note that Ruby is functionally dead here, and Elixir is ascendant.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;&lt;em&gt;We have an idiosyncratic management structure. We&amp;rsquo;re bottom-up, but ambiguously so. We don&amp;rsquo;t have roadmaps, except when we do. We have minimal top-down technical direction. Critique.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;It&amp;rsquo;s too easy to lose sight of whether your current focus [in what you&amp;rsquo;re building] is valuable to the company.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;The first thing I warn every candidate about on our &amp;ldquo;do-not-work-here&amp;rdquo; calls.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;I think it comes down to execution, and accountability to actually finish projects. I spun a lot trying to figure out what would be the most valuable work for Fly Machines.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;You don&amp;rsquo;t have to be so nice about things.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;We struggle a lot with consistent communication. We change direction a little too often. It got to a point where I didn&amp;rsquo;t see a point in devoting time and effort into projects, because I&amp;rsquo;d not be able to show enough value quick enough.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;I see things paying off later than we&amp;rsquo;d hoped or expected they would. Our secret storage system, Pet Semetary, is a good example of this. Our K8s service, FKS, is another obvious one, since we&amp;rsquo;re shipping MPG on it.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;This is your second time working Kurt, at a company where he&amp;rsquo;s the CEO. Give him a 1-4 star rating. He can take it! At least, I think he can take it.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;2022: ★★★★&lt;/p&gt;

&lt;p&gt;2023: ★★&lt;/p&gt;

&lt;p&gt;2024: ★★✩&lt;/p&gt;

&lt;p&gt;2025: ★★★✩&lt;/p&gt;

&lt;p&gt;On a four-star scale.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Whoah. I did not expect a histogram. Say more about 2023!&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;We hired too many people, too quickly, and didn&amp;rsquo;t have the guardrails and structure in place for everybody to be successful.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Also: GPUs!&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Yes. That was my next comment.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Do we secretly agree about GPUs?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;I think so.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Our side won the argument in the end! But at what cost?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;They were a killer distraction.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Final question: how long will you remain in the first-responder on-call rotation after you leave? I assume at least until August. I have a shift this weekend; can you swap with me? I keep getting weekends.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;I am going to be asleep all weekend if any of my previous job changes are indicative.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;I sleep through on-call too! But nobody can yell at you for it now. I think you have the comparative advantage over me in on-calling.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Yes I will absolutely take all your future on-call shifts, you have convinced me.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;All this aside: it has been a privilege watching you work. I hope your next gig is 100x more relaxing than this was. Or maybe I just hope that for myself. Except: I&amp;rsquo;ll never escape this place. Thank you so much for doing this.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Thank you! I&amp;rsquo;m forever grateful for having the opportunity to be a part of Fly.io.&lt;/p&gt;
</content>
  </entry>
  <entry>
    <title>A Blog, If You Can Keep It</title>
    <link rel="alternate" href="https://fly.io/blog/a-blog-if-kept/"/>
    <id>https://fly.io/blog/a-blog-if-kept/</id>
    <published>2025-02-10T00:00:00+00:00</published>
    <updated>2025-05-20T16:26:25+00:00</updated>
    <media:thumbnail url="https://fly.io/blog/a-blog-if-kept/assets/keep-blog.webp"/>
    <content type="html">&lt;div class="lead"&gt;&lt;p&gt;A boldfaced lede like this was a sure sign you were reading a carefully choreographed EffortPost from our team at Fly.io. We’re going to do less of those. Or the same amount but more of a different kind of post. Either way: launch an app on Fly.io today!&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;Over the last 5 years, we’ve done pretty well for ourselves writing content for Hacker News. And that’s &lt;a href='https://news.ycombinator.com/item?id=39373476' title=''&gt;mostly&lt;/a&gt; been good for us. We don’t do conventional marketing, we don’t have a sales team, the rest of social media is atomized over 5 different sites. Writing pieces that HN takes seriously has been our primary outreach tool.&lt;/p&gt;

&lt;p&gt;There’s a recipe (probably several, but I know this one works) for charting a post on HN:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Write an EffortPost, which is to say a dense technical piece over 2000 words long; within that rubric there’s a bunch of things that are catnip to HN, including runnable code, research surveys, and explainers. (There are also cat-repellants you learn to steer clear of.)
&lt;/li&gt;&lt;li&gt;Assiduously avoid promotion. You have to write for the audience. We get away with sporadically including a call-to-action block in our posts, but otherwise: the post should make sense even if an unrelated company posted it after you went out of business.
&lt;/li&gt;&lt;li&gt;Pick topics HN is interested in (it helps if all your topics are au courant for HN, and we’ve been &lt;a href='https://news.ycombinator.com/item?id=32250426' title=''&gt;very&lt;/a&gt; &lt;a href='https://news.ycombinator.com/item?id=32018066' title=''&gt;lucky&lt;/a&gt; in that regard).
&lt;/li&gt;&lt;li&gt;Like 5-6 more style-guide things that help incrementally. Probably 3 different teams writing for HN will have 3 different style guides with only like &amp;frac12; overlap. Ours, for instances, instructs writers to swear.
&lt;/li&gt;&lt;/ol&gt;

&lt;p&gt;I like this kind of writing. It’s not even a chore. But it’s become an impediment for us, for a couple reasons: the team serializes behind an “editorial” function here, which keeps us from publishing everything we want; worse, caring so much about our track record leaves us noodling on posts interminably (the poor &lt;a href='https://www.tigrisdata.com/' title=''&gt;Tigrises&lt;/a&gt; have been waiting for months for me to publish the piece I wrote about them and FoundationDB; take heart, this post today means that one is coming soon).&lt;/p&gt;

&lt;p&gt;But worst of all, I worried incessantly about us &lt;a href='https://gist.github.com/tqbf/e853764b562a2d72a91a6986ca3b77c0' title=''&gt;wearing out our welcome&lt;/a&gt;. To my mind, we’d have 1, maybe 2 bites at the HN apple in a given month, and we needed to make them count.&lt;/p&gt;

&lt;p&gt;That was dumb. I am dumb about a lot of things! I came around to understanding this after Kurt demanded I publish my blog post about BFAAS (Bash Functions As A Service), 500 lines of Go code that had generated 4500 words in my draft. It was only after I made the decision to stop gatekeeping this blog that I realized &lt;a href='https://simonwillison.net/' title=''&gt;Simon Willison&lt;/a&gt; has been disproving my “wearing out the welcome” theory, day in and day out, for years. He just writes stuff about LLMs when it interests him. I mean, it helps that he’s a better writer than we are. But he’s not wasting time choreographing things.&lt;/p&gt;

&lt;p&gt;Back in like 2009, &lt;a href='https://web.archive.org/web/20110806040300/http://chargen.matasano.com/chargen/2009/7/22/if-youre-typing-the-letters-a-e-s-into-your-code-youre-doing.html' title=''&gt;we had a blog&lt;/a&gt; at another company I was at. That blog drove a lot of business for us (and, on three occasions, almost killed me). It was not in the least bit optimized for HN. I like pretending to be a magazine feature writer, but I miss writing dashed-off pieces every day and clearing space for other people on the team to write as well.&lt;/p&gt;

&lt;p&gt;So this is all just a heads up: we’re trying something new. This is a very long and self-indulgent way to say “we’re going to write a normal blog like it’s 2008”, but that’s how broken my brain is after years of having my primary dopaminergic rewards come from how long Fly.io blog posts stay on the front page: I have to disclaim blogging before we start doing it, lest I fail to meet expectations.&lt;/p&gt;

&lt;p&gt;Like I said. I’m real dumb. But: looking forward to getting a lot more stuff out on the web for people to read this year!&lt;/p&gt;
</content>
  </entry>
  <entry>
    <title>Did Semgrep Just Get A Lot More Interesting?</title>
    <link rel="alternate" href="https://fly.io/blog/semgrep-but-for-real-now/"/>
    <id>https://fly.io/blog/semgrep-but-for-real-now/</id>
    <published>2025-02-10T00:00:00+00:00</published>
    <updated>2025-05-20T16:26:25+00:00</updated>
    <media:thumbnail url="https://fly.io/blog/semgrep-but-for-real-now/assets/ci-cd-cover.webp"/>
    <content type="html">&lt;div class="right-sidenote"&gt;&lt;p&gt;This whole paragraph is just one long sentence. God I love &lt;a href="https://fly.io/blog/a-blog-if-kept/" title=""&gt;just random-ass blogging&lt;/a&gt; again.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;&lt;a href='https://ghuntley.com/stdlib/' title=''&gt;This bit by Geoffrey Huntley&lt;/a&gt; is super interesting to me and, despite calling out that LLM-driven development agents like Cursor have something like a 40% success rate at actually building anything that passes acceptance criteria, makes me think that more of the future of our field belongs to people who figure out how to use this weird bags of model weights than any of us are comfortable with. &lt;/p&gt;

&lt;p&gt;I’ve been dinking around with Cursor for a week now (if you haven’t, I think it’s something close to malpractice not to at least take it — or something like it — for a spin) and am just now from this post learning that Cursor has this &lt;a href='https://docs.cursor.com/context/rules-for-ai' title=''&gt;rules feature&lt;/a&gt;. &lt;/p&gt;

&lt;p&gt;The important thing for me is not how Cursor rules work, but rather how Huntley uses them. He turns them back on themselves, writing rules to tell Cursor how to organize the rules, and then teach Cursor how to write (under human supervision) its own rules.&lt;/p&gt;

&lt;p&gt;Cursor kept trying to get Huntley to use Bazel as a build system. So he had cursor write a rule for itself: “no bazel”. And there was no more Bazel. If I’d known I could do this, I probably wouldn’t have bounced from the Elixir project I had Cursor doing, where trying to get it to write simple unit tests got it all tangled up trying to make &lt;a href='https://hexdocs.pm/mox/Mox.html' title=''&gt;Mox&lt;/a&gt; work. &lt;/p&gt;

&lt;p&gt;But I’m burying the lead. &lt;/p&gt;

&lt;p&gt;Security people have been for several years now somewhat in love with a tool called &lt;a href='https://github.com/semgrep/semgrep' title=''&gt;Semgrep&lt;/a&gt;. Semgrep is a semantics-aware code search tool; using symbolic variable placeholders and otherwise ordinary code, you can write rules to match pretty much arbitary expressions and control flow. &lt;/p&gt;

&lt;p&gt;If you’re an appsec person, where you obviously go with this is: you build a library of Semgrep searches for well-known vulnerability patterns (or, if you’re like us at Fly.io, you work out how to get Semgrep to catch the Rust concurrency footgun of RWLocks inside if-lets).&lt;/p&gt;

&lt;p&gt;The reality for most teams though is “ain’t nobody got time for that”. &lt;/p&gt;

&lt;p&gt;But I just checked and, unsurprisingly, 4o &lt;a href='https://chatgpt.com/share/67aa94a7-ea3c-8012-845c-6c9491b33fe4' title=''&gt;seems to do reasonably well&lt;/a&gt; at generating Semgrep rules? Like: I have no idea if this rule is actually any good. But it looks like a Semgrep rule?&lt;/p&gt;

&lt;p&gt;What interests me is this: it seems obvious that we’re going to do more and more “closed-loop” LLM agent code generation stuff. By “closed loop”, I mean that the thingy that generates code is going to get to run the code and watch what happens when it’s interacted with. You’re just a small bit of glue code and a lot of system prompting away from building something like that right now: &lt;a href='https://x.com/chris_mccord/status/1882839014845374683' title=''&gt;Chris McCord is building&lt;/a&gt; a thingy that generates whole Elixir/Phoenix apps and runs them as Fly Machines. When you deploy these kinds of things, the LLM gets to see the errors when the code is run, and it can just go fix them. It also gets to see errors and exceptions in the logs when you hit a page on the app, and it can just go fix them.&lt;/p&gt;

&lt;p&gt;With a bit more system prompting, you can get an LLM to try to generalize out from exceptions it fixes and generate unit test coverage for them. &lt;/p&gt;

&lt;p&gt;With a little bit more system prompting, you can probably get an LLM to (1) generate a Semgrep rule for the generalized bug it caught, (2) test the Semgrep rule with a positive/negative control, (3) save the rule, (4) test the whole codebase with Semgrep for that rule, and (5) fix anything it finds that way. &lt;/p&gt;

&lt;p&gt;That is a lot more interesting to me than tediously (and probably badly) trying to predict everything that will go wrong in my codebase a priori and Semgrepping for them. Which is to say: Semgrep — which I have always liked — is maybe a lot more interesting now? And tools like it?&lt;/p&gt;
</content>
  </entry>
  <entry>
    <title>VSCode’s SSH Agent Is Bananas</title>
    <link rel="alternate" href="https://fly.io/blog/vscode-ssh-wtf/"/>
    <id>https://fly.io/blog/vscode-ssh-wtf/</id>
    <published>2025-02-07T00:00:00+00:00</published>
    <updated>2025-05-20T16:26:25+00:00</updated>
    <media:thumbnail url="https://fly.io/static/images/default-post-thumbnail.webp"/>
    <content type="html">&lt;p&gt;We’re interested in getting integrated into the flow VSCode uses to do remote editing over SSH, because everybody is using VSCode now, and, in particular, they’re using forks of VSCode that generate code with LLMs. &lt;/p&gt;
&lt;div class="right-sidenote"&gt;&lt;p&gt;”hallucination” is what we call it when LLMs get code wrong; “engineering” is what we call it when people do.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;LLM-generated code is &lt;a href='https://nicholas.carlini.com/writing/2024/how-i-use-ai.html' title=''&gt;useful in the general case&lt;/a&gt; if you know what you’re doing. But it’s ultra-useful if you can close the loop between the LLM and the execution environment (with an “Agent” setup). There’s lots to say about this, but for the moment: it’s a semi-effective antidote to hallucination: the LLM generates the code, the agent scaffolding runs the code, the code generates errors, the agent feeds it back to the LLM, the process iterates. &lt;/p&gt;

&lt;p&gt;So, obviously, the issue here is you don’t want this iterative development process happening on your development laptop, because LLMs have boundary issues, and they’ll iterate on your system configuration just as happily on the Git project you happen to be working in. A thing you’d really like to be able to do: run a closed-loop agent-y (“agentic”? is that what we say now) configuration for an LLM, on a clean-slate Linux instance that spins up instantly and that can’t screw you over in any way. You get where we’re going with this.&lt;/p&gt;

&lt;p&gt;Anyways! I would like to register a concern.&lt;/p&gt;

&lt;p&gt;Emacs hosts the spiritual forebearer of remote editing systems, a blob of hyper-useful Elisp called &lt;a href='https://www.gnu.org/software/tramp/' title=''&gt;“Tramp”&lt;/a&gt;. If you can hook Tramp up to any kind of interactive environment — usually, an SSH session — where it can run Bourne shell commands, it can extend Emacs to that environment.&lt;/p&gt;

&lt;p&gt;So, VSCode has a feature like Tramp. Which, neat, right? You’d think, take Tramp, maybe simplify it a bit, switch out Elisp for Typescript.&lt;/p&gt;

&lt;p&gt;You’d think wrong!&lt;/p&gt;

&lt;p&gt;Unlike Tramp, which lives off the land on the remote connection, VSCode mounts a full-scale invasion: it runs a Bash snippet stager that downloads an agent, including a binary installation of Node. &lt;/p&gt;

&lt;p&gt;I &lt;em&gt;think&lt;/em&gt; this is &lt;a href='https://github.com/microsoft/vscode/tree/c9e7e1b72f80b12ffc00e06153afcfedba9ec31f/src/vs/server/node' title=''&gt;the source code&lt;/a&gt;?&lt;/p&gt;

&lt;p&gt;The agent runs over port-forwarded SSH. It establishes a WebSockets connection back to your running VSCode front-end. The underlying protocol on that connection can:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Wander around the filesystem
&lt;/li&gt;&lt;li&gt;Edit arbitrary files
&lt;/li&gt;&lt;li&gt;Launch its own shell PTY processes
&lt;/li&gt;&lt;li&gt;Persist itself
&lt;/li&gt;&lt;/ul&gt;

&lt;p&gt;In security-world, there’s a name for tools that work this way. I won’t say it out loud, because that’s not fair to VSCode, but let’s just say the name is murid in nature.&lt;/p&gt;

&lt;p&gt;I would be a little nervous about letting people VSCode-remote-edit stuff on dev servers, and apoplectic if that happened during an incident on something in production. &lt;/p&gt;

&lt;p&gt;It turns out we don’t have to care about any of this to get a custom connection to a Fly Machine working in VSCode, so none of this matters in any kind of deep way, but: we’ve decided to just be a blog again, so: we had to learn this, and now you do too.&lt;/p&gt;
</content>
  </entry>
  <entry>
    <title>AI GPU Clusters, From Your Laptop, With Livebook</title>
    <link rel="alternate" href="https://fly.io/blog/ai-gpu-clusters-from-your-laptop-livebook/"/>
    <id>https://fly.io/blog/ai-gpu-clusters-from-your-laptop-livebook/</id>
    <published>2024-09-24T00:00:00+00:00</published>
    <updated>2025-05-20T16:26:25+00:00</updated>
    <media:thumbnail url="https://fly.io/blog/ai-gpu-clusters-from-your-laptop-livebook/assets/ai-gpu-livebook-thumb.webp"/>
    <content type="html">&lt;div class="lead"&gt;&lt;p&gt;Livebook, FLAME, and the Nx stack: three Elixir components that are easy to describe, more powerful than they look, and intricately threaded into the Elixir ecosystem. A few weeks ago, Chris McCord (👋) and Chris Grainger showed them off at ElixirConf 2024. We thought the talk was worth a recap.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;Let&amp;rsquo;s begin by introducing our cast of characters.&lt;/p&gt;

&lt;p&gt;&lt;a href='https://livebook.dev/' title=''&gt;Livebook&lt;/a&gt; is usually described as Elixir&amp;rsquo;s answer to &lt;a href='https://jupyter.org/' title=''&gt;Jupyter Notebooks&lt;/a&gt;. And that&amp;rsquo;s a good way to think about it. But Livebook takes full advantage of the Elixir platform, which makes it sneakily powerful. By linking up directly with Elixir app clusters, Livebook can switch easily between driving compute locally or on remote servers, and makes it easy to bring in any kind of data into reproducible workflows.&lt;/p&gt;

&lt;p&gt;&lt;a href='https://fly.io/blog/rethinking-serverless-with-flame/' title=''&gt;FLAME&lt;/a&gt; is the Elixir&amp;rsquo;s answer to serverless computing. By having the library manage a pool of executors for you, FLAME lets you treat your entire application as if it was elastic and scale-to-zero. You configure FLAME with some basic information about where to run code and how many instances it&amp;rsquo;s allowed to run with, and then mark off any arbitrary section of code with &lt;code&gt;Flame.call&lt;/code&gt;. The framework takes care of the rest. It&amp;rsquo;s the upside of serverless without committing yourself to blowing your app apart into tiny, intricately connected pieces.&lt;/p&gt;

&lt;p&gt;The &lt;a href='https://github.com/elixir-nx' title=''&gt;Nx stack&lt;/a&gt; is how you do Elixir-native AI and ML. Nx gives you an Elixir-native notion of tensor computations with GPU backends. &lt;a href='https://github.com/elixir-nx/axon' title=''&gt;Axon&lt;/a&gt; builds a common interface for ML models on top of it. &lt;a href='https://github.com/elixir-nx/bumblebee' title=''&gt;Bumblebee&lt;/a&gt; makes those models available to any Elixir app that wants to download them, from just a couple lines of code.&lt;/p&gt;

&lt;p&gt;Here is quick video showing how to transfer a local tensor to a remote GPU, using Livebook, FLAME, and Nx:&lt;/p&gt;
&lt;div class="youtube-container" data-exclude-render&gt;
  &lt;div class="youtube-video"&gt;
    &lt;iframe
      width="100%"
      height="100%"
      src="https://www.youtube.com/embed/5ImP3gpUSkQ"
      frameborder="0"
      allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture"
      allowfullscreen&gt;
    &lt;/iframe&gt;
  &lt;/div&gt;
&lt;/div&gt;


&lt;p&gt;Let&amp;rsquo;s dive into the &lt;a href='https://www.youtube.com/watch?v=4qoHPh0obv0' title=''&gt;keynote&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id='poking-a-hole-in-your-infrastructure' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#poking-a-hole-in-your-infrastructure' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Poking a hole in your infrastructure&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;Any Livebook, including the one running on your laptop, can start a runtime running on a Fly Machine, in Fly.io&amp;rsquo;s public cloud. That Elixir machine will (by default) live in your default Fly.io organization, giving it networked access to all the other apps that might live there.&lt;/p&gt;

&lt;p&gt;This is an access control situation that mostly just does what you want it to do without asking. Unless you ask it to, Fly.io isn&amp;rsquo;t exposing anything to the Internet, or to other users of Fly.io. For instance: say we have a database we&amp;rsquo;re going to use to generate reports. It can hang out on our Fly organization, inside of a private network with no connectivity to the world. We can spin up a Livebook instance that can talk to it, without doing any network or infrastructure engineering to make that happen.&lt;/p&gt;

&lt;p&gt;But wait, there&amp;rsquo;s more. Because this is all Elixir, Livebook also allows you to connect to any running Erlang/Elixir application in your infrastructure to debug, introspect, and monitor them.&lt;/p&gt;

&lt;p&gt;Check out this clip of Chris McCord connecting &lt;a href='https://rtt.fly.dev/' title=''&gt;to an existing application&lt;/a&gt; during the keynote:&lt;/p&gt;
&lt;div class="youtube-container" data-exclude-render&gt;
  &lt;div class="youtube-video"&gt;
    &lt;iframe
      width="100%"
      height="100%"
      src="https://www.youtube.com/embed/4qoHPh0obv0?start=1106"
      frameborder="0"
      allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture"
      allowfullscreen&gt;
    &lt;/iframe&gt;
  &lt;/div&gt;
&lt;/div&gt;


&lt;p&gt;Running a snippet of code from a laptop on a remote server is a neat trick, but Livebook is doing something deeper than that. It&amp;rsquo;s taking advantage of Erlang/Elixir&amp;rsquo;s native facility with cluster computation and making it available to the notebook. As a result, when we do things like auto-completing, Livebook delivers results from modules defined on the remote note itself. 🤯&lt;/p&gt;
&lt;h2 id='elastic-scale-with-flame' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#elastic-scale-with-flame' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Elastic scale with FLAME&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;When we first introduced FLAME, the example we used was video encoding.&lt;/p&gt;

&lt;p&gt;Video encoding is complicated and slow enough that you&amp;rsquo;d normally make arrangements to run it remotely or in a background job queue, or as a triggerable Lambda function. The point of FLAME is to get rid of all those steps, and give them over to the framework instead. So: we wrote our &lt;code&gt;ffpmeg&lt;/code&gt; calls inline like normal code, as if they were going to complete in microseconds, and wrapped them in &lt;code&gt;Flame.call&lt;/code&gt; blocks. That was it, that was the demo.&lt;/p&gt;

&lt;p&gt;Here, we&amp;rsquo;re going to put a little AI spin on it.&lt;/p&gt;

&lt;p&gt;The first thing we&amp;rsquo;re doing here is driving FLAME pools from Livebook. Livebook will automatically synchronize your notebook dependencies as well as any module or code defined in your notebook across nodes. That means any code we write in our notebook can be dispatched transparently out to arbitrarily many compute nodes, without ceremony.&lt;/p&gt;

&lt;p&gt;Now let&amp;rsquo;s add some AI flair. We take an object store bucket full of video files. We use &lt;code&gt;ffmpeg&lt;/code&gt; to extract stills from the video at different moments. Then: we send them to &lt;a href='https://www.llama.com/' title=''&gt;Llama&lt;/a&gt;, running on &lt;a href='https://fly.io/gpu' title=''&gt;GPU Fly Machines&lt;/a&gt; (still locked to our organization), to get descriptions of the stills.&lt;/p&gt;

&lt;p&gt;All those stills and descriptions get streamed back to our notebook, in real time:&lt;/p&gt;
&lt;div class="youtube-container" data-exclude-render&gt;
  &lt;div class="youtube-video"&gt;
    &lt;iframe
      width="100%"
      height="100%"
      src="https://www.youtube.com/embed/4qoHPh0obv0?start=1692"
      frameborder="0"
      allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture"
      allowfullscreen&gt;
    &lt;/iframe&gt;
  &lt;/div&gt;
&lt;/div&gt;


&lt;p&gt;At the end, the descriptions are sent to &lt;a href='https://mistral.ai/' title=''&gt;Mistral&lt;/a&gt;, which builds a summary.&lt;/p&gt;

&lt;p&gt;Thanks to FLAME, we get explicit control over the minimum and the maximum amount of nodes you want running at once, as well their concurrency settings. As nodes finish processing each video, new ones are automatically sent to them, until the whole bucket has been traversed. Each node will automatically shut down after an idle timeout and the whole cluster terminates if you disconnect the Livebook runtime.&lt;/p&gt;

&lt;p&gt;Just like your app code, FLAME lets you take your notebook code designed to run locally, change almost nothing, and elastically execute it across ephemeral infrastructure.&lt;/p&gt;
&lt;h2 id='64-gpus-hyperparameter-tuning-on-a-laptop' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#64-gpus-hyperparameter-tuning-on-a-laptop' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;64-GPUs hyperparameter tuning on a laptop&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;Next, Chris Grainger, CTO of &lt;a href='https://amplified.ai/' title=''&gt;Amplified&lt;/a&gt;, takes the stage.&lt;/p&gt;

&lt;p&gt;For work at Amplified, Chris wants to analyze a gigantic archive of patents, on behalf of a client doing edible cannibinoid work. To do that, he uses a BERT model (BERT, from Google, is one of the OG &amp;ldquo;transformer&amp;rdquo; models, optimized for text comprehension).&lt;/p&gt;

&lt;p&gt;To make the BERT model effective for this task, he&amp;rsquo;s going to do a hyperparameter training run.&lt;/p&gt;

&lt;p&gt;This is a much more complicated AI task than the Llama work we just showed up. Chris is going to generate a cluster of 64 GPU Fly Machines, each running an &lt;a href='https://fly.io/blog/cutting-prices-for-l40s-gpus-in-half/' title=''&gt;L40s GPU&lt;/a&gt;. On each of these nodes, he needs to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;setup its environment (including native dependencies and GPU bindings)
&lt;/li&gt;&lt;li&gt;load the training data
&lt;/li&gt;&lt;li&gt;compile a different version of BERT with different parameters, optimizers, etc.
&lt;/li&gt;&lt;li&gt;start the fine-tuning
&lt;/li&gt;&lt;li&gt;stream its results in real-time to each assigned chart
&lt;/li&gt;&lt;/ul&gt;

&lt;p&gt;Here&amp;rsquo;s the clip. You&amp;rsquo;ll see the results stream in, in real time, directly back to his Livebook. We&amp;rsquo;ll wait, because it won&amp;rsquo;t take long to watch:&lt;/p&gt;
&lt;div class="youtube-container" data-exclude-render&gt;
  &lt;div class="youtube-video"&gt;
    &lt;iframe
      width="100%"
      height="100%"
      src="https://www.youtube.com/embed/4qoHPh0obv0?start=3344"
      frameborder="0"
      allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture"
      allowfullscreen&gt;
    &lt;/iframe&gt;
  &lt;/div&gt;
&lt;/div&gt;

&lt;h2 id='this-is-just-the-beginning' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#this-is-just-the-beginning' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;This is just the beginning&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;The suggestion of mixing Livebook and FLAME to elastically scale notebook execution was originally proposed by Chris Grainger during ElixirConf EU. During the next four months, Jonatan Kłosko, Chris McCord, and José Valim worked part-time on making it a reality in time for ElixirConf US. Our ability to deliver such a rich combination of features in such a short period of time is a testament to the capabilities of the Erlang Virtual Machine, which Elixir and Livebook runs on. Other features, such as &lt;a href='https://github.com/elixir-explorer/explorer/issues/932' title=''&gt;remote dataframes and distributed GC&lt;/a&gt;, were implemented in a weekend.  Bringing the same functionality to other ecosystems would take several additional months, sometimes accompanied by millions in funding, and often times as part of a closed-source product.&lt;/p&gt;

&lt;p&gt;Furthermore, since we announced this feature, &lt;a href='https://github.com/mruoss' title=''&gt;Michael Ruoss&lt;/a&gt; stepped in and brought the same functionality to Kubernetes. From Livebook v0.14.1, you can start Livebook runtimes inside a Kubernetes cluster and also use FLAME to elastically scale them. Expect more features and news in this space!&lt;/p&gt;

&lt;p&gt;Finally, Fly&amp;rsquo;s infrastructure played a key role in making it possible to start a cluster of GPUs in seconds rather than minutes, and all it requires is a Docker image. We&amp;rsquo;re looking forward to see how other technologies and notebook platforms can leverage Fly to also elevate their developer experiences.&lt;/p&gt;
&lt;figure class="post-cta"&gt;
  &lt;figcaption&gt;
    &lt;h1&gt;Launch a GPU app in seconds&lt;/h1&gt;
    &lt;p&gt;Run your own LLMs or use Livebook for elastic GPU workflows&amp;nbsp✨&lt;/p&gt;
      &lt;a class="btn btn-lg" href="/gpu"&gt;
        Go! &amp;nbsp;&lt;span class='opacity-50'&gt;→&lt;/span&gt;
      &lt;/a&gt;
  &lt;/figcaption&gt;
  &lt;div class="image-container"&gt;
    &lt;img src="/static/images/cta-rabbit.webp" srcset="/static/images/[email protected] 2x" alt=""&gt;
  &lt;/div&gt;
&lt;/figure&gt;

</content>
  </entry>
  <entry>
    <title>Accident Forgiveness</title>
    <link rel="alternate" href="https://fly.io/blog/accident-forgiveness/"/>
    <id>https://fly.io/blog/accident-forgiveness/</id>
    <published>2024-08-21T00:00:00+00:00</published>
    <updated>2025-05-20T16:26:25+00:00</updated>
    <media:thumbnail url="https://fly.io/blog/accident-forgiveness/assets/money-for-mistakes-blog-thumb.webp"/>
    <content type="html">&lt;div class="lead"&gt;&lt;p&gt;We’re Fly.io, a new public cloud with simple, developer-friendly ergonomics. &lt;a href="https://fly.io/speedrun" title=""&gt;Try it out; you’ll be deployed in just minutes&lt;/a&gt;, and, as you’re about to read, with less financial risk.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;Public cloud billing is terrifying.&lt;/p&gt;

&lt;p&gt;The premise of a public cloud &amp;mdash; what sets it apart from a hosting provider &amp;mdash; is 8,760 hours/year of on-tap deployable compute, storage, and networking. Cloud resources are &amp;ldquo;elastic&amp;rdquo;: they&amp;rsquo;re acquired and released as needed; in the &amp;ldquo;cloud-iest&amp;rdquo; apps, without human intervention. Public cloud resources behave like utilities, and that&amp;rsquo;s how they&amp;rsquo;re priced.&lt;/p&gt;

&lt;p&gt;You probably can&amp;rsquo;t tell me how much electricity your home is using right now, and  may only come within tens of dollars of accurately predicting your water bill. But neither of those bills are all that scary, because you assume there&amp;rsquo;s a limit to how much you could run them up in a single billing interval.&lt;/p&gt;

&lt;p&gt;That&amp;rsquo;s not true of public clouds. There are only so many ways to &amp;ldquo;spend&amp;rdquo; water at your home, but there are indeterminably many ways to land on a code path that grabs another VM, or to miskey a configuration, or to leak long-running CI/CD environments every time a PR gets merged. Pick a practitioner at random, and I bet they&amp;rsquo;ve read a story within the last couple months about someone running up a galactic-scale bill at some provider or other.&lt;/p&gt;
&lt;h2 id='implied-accident-forgiveness' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#implied-accident-forgiveness' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Implied Accident Forgiveness&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;For people who don&amp;rsquo;t do a lot of cloud work, what all this means is that every PR push sets off a little alarm in the back of their heads: &amp;ldquo;you may have just incurred $200,000 of costs!&amp;rdquo;. The alarm is quickly silenced,  though it&amp;rsquo;s still subtly extracting a cortisol penalty. But by deadening the nerves that sense the danger of unexpected charges, those people are nudged closer to themselves being the next story on Twitter about an accidental $200,000 bill.&lt;/p&gt;

&lt;p&gt;The saving grace here, which you&amp;rsquo;ll learn if you ever become that $200,000 story, is that nobody pays those bills.&lt;/p&gt;

&lt;p&gt;See, what cloud-savvy people know already is that providers have billing support teams, which spend a big chunk of their time conceding disputed bills. If you do something luridly stupid and rack up costs, AWS and GCP will probably cut you a break. We will too. Everyone does.&lt;/p&gt;

&lt;p&gt;If you didn&amp;rsquo;t already know this, you&amp;rsquo;re welcome; I&amp;rsquo;ve made your life a little better, even if you don&amp;rsquo;t run things on Fly.io.&lt;/p&gt;

&lt;p&gt;But as soothing as it is to know you can get a break from cloud providers, the billing situation here is still a long ways away from &amp;ldquo;good&amp;rdquo;. If you accidentally add a zero to a scale count and don&amp;rsquo;t notice for several weeks, AWS or GCP will probably cut you a break. But they won&amp;rsquo;t &lt;em&gt;definitely&lt;/em&gt; do it, and even though your odds are good, you&amp;rsquo;re still finding out at email- and phone-tag scale speeds. That&amp;rsquo;s not fun!&lt;/p&gt;
&lt;h2 id='explicit-accident-forgiveness' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#explicit-accident-forgiveness' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Explicit Accident Forgiveness&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;Charging you for stuff you didn&amp;rsquo;t want is bad business.&lt;/p&gt;

&lt;p&gt;Good business, we think, means making you so comfortable with your cloud you try new stuff. You, and everyone else on your team. Without a chaperone from the finance department.&lt;/p&gt;

&lt;p&gt;So we&amp;rsquo;re going to do the work to make this official. If you&amp;rsquo;re a customer of ours, we&amp;rsquo;re going to charge you in exacting detail for every resource you intentionally use of ours, but if something blows up and you get an unexpected bill, we&amp;rsquo;re going to let you off the hook.&lt;/p&gt;
&lt;h2 id='not-so-fast' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#not-so-fast' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Not So Fast&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;This is a Project, with a capital P. While we&amp;rsquo;re kind of kicking ourselves for not starting it earlier, there are reasons we couldn&amp;rsquo;t do it back in 2020.&lt;/p&gt;

&lt;p&gt;The Fully Automated Accident-Forgiving Billing System of the Future (which we are in fact building and may even one day ship) will give you a line-item veto on your invoice. We are a long ways away. The biggest reason is fraud.&lt;/p&gt;

&lt;p&gt;Sit back, close your eyes, and try to think about everything public clouds do to make your life harder. Chances are, most of those things are responses to fraud. Cloud platforms attract fraudsters like ants to an upturned ice cream cone. Thanks to the modern science of cryptography, fraudsters have had a 15 year head start on turning metered compute into picodollar-granular near-money assets.&lt;/p&gt;

&lt;p&gt;Since there&amp;rsquo;s no bouncer at the door checking IDs here, an open-ended and automated commitment to accident forgiveness is, with iron certainty, going to be used overwhelmingly in order to trick us into &amp;ldquo;forgiving&amp;rdquo; cryptocurrency miners. We&amp;rsquo;re cloud platform engineers. They&amp;rsquo;re our primary pathogen.&lt;/p&gt;

&lt;p&gt;So, we&amp;rsquo;re going to roll this out incrementally.&lt;/p&gt;
&lt;div class="callout"&gt;&lt;p&gt;&lt;strong class="font-semibold text-navy-950"&gt;Why not billing alerts?&lt;/strong&gt; We’ll get there, but here too there are reasons we haven’t yet: (1) meaningful billing alerts were incredibly difficult to do with our previous billing system, and building the new system and migrating our customers to it has been a huge lift, a nightmare from which we are only now waking (the billing system’s official name); and (2) we’re wary about alerts being a product design cop-out; if we can alert on something, why aren’t we fixing it?&lt;/p&gt;
&lt;/div&gt;&lt;h2 id='accident-forgiveness-v0-84beta' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#accident-forgiveness-v0-84beta' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Accident Forgiveness v0.84beta&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;All the same subtextual, implied reassurances that every cloud provider offers remain in place at Fly.io. You are strictly better off after this announcement, we promise.&lt;/p&gt;
&lt;div class="right-sidenote"&gt;&lt;p&gt;I added the “almost” right before publishing, because I’m chicken.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;Now: for customers that have a support contract with us, at any level, there&amp;rsquo;s something new: I&amp;rsquo;m saying the quiet part loud. The next time you see a bill with an unexpected charge on it, we&amp;rsquo;ll refund that charge, (almost) no questions asked.&lt;/p&gt;

&lt;p&gt;That policy is so simple it feels anticlimactic to write. So, some additional color commentary:&lt;/p&gt;

&lt;p&gt;We&amp;rsquo;re not advertising a limit to the number of times you can do this. If you&amp;rsquo;re a serious customer of ours, I promise that you cannot remotely fathom the fullness of our fellow-feeling. You&amp;rsquo;re not annoying us by getting us to refund unexpected charges. If you are growing a project on Fly.io, we will bend over backwards to keep you growing.&lt;/p&gt;

&lt;p&gt;How far can we take this? How simple can we keep this policy? We&amp;rsquo;re going to find out together.&lt;/p&gt;

&lt;p&gt;To begin with, and in the spirit of &amp;ldquo;doing things that won&amp;rsquo;t scale&amp;rdquo;, when we forgive a bill, what&amp;rsquo;s going to happen next is this: I&amp;rsquo;m going to set an irritating personal reminder for Kurt to look into what happened, now and then the day before your next bill, so we can see what&amp;rsquo;s going wrong. He&amp;rsquo;s going to hate that, which is the point: our best feature work is driven by Kurt-hate.&lt;/p&gt;

&lt;p&gt;Obviously, if you&amp;rsquo;re rubbing your hands together excitedly over the opportunity this policy presents, then, well, not so much with the fellow-feeling. We reserve the right to cut you off.&lt;/p&gt;
&lt;figure class="post-cta"&gt;
  &lt;figcaption&gt;
    &lt;h1&gt;Support For Developers, By Developers&lt;/h1&gt;
    &lt;p&gt;Explicit Accident Forgiveness is just one thing we like about Support at Fly.io.&lt;/p&gt;
      &lt;a class="btn btn-lg" href="https://fly.io/accident-forgiveness"&gt;
        Go find out! &amp;nbsp;&lt;span class='opacity-50'&gt;→&lt;/span&gt;
      &lt;/a&gt;
  &lt;/figcaption&gt;
  &lt;div class="image-container"&gt;
    &lt;img src="/static/images/cta-dog.webp" srcset="/static/images/[email protected] 2x" alt=""&gt;
  &lt;/div&gt;
&lt;/figure&gt;

&lt;h2 id='whats-next-accident-protection' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#whats-next-accident-protection' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;What&amp;rsquo;s Next: Accident Protection&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;We think this is a pretty good first step. But that&amp;rsquo;s all it is.&lt;/p&gt;

&lt;p&gt;We can do better than offering you easy refunds for mistaken deployments and botched CI/CD jobs. What&amp;rsquo;s better than getting a refund is never incurring the charge to begin with, and that&amp;rsquo;s the next step we&amp;rsquo;re working on.&lt;/p&gt;
&lt;div class="right-sidenote"&gt;&lt;p&gt;More to come on that billing system.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;We built a new billing system so that we can do things like that. For instance: we&amp;rsquo;re in a position to catch sudden spikes in your month-over-month bills, flag them, and catch weird-looking deployments before we bill for them.&lt;/p&gt;

&lt;p&gt;Another thing we rebuilt billing for is &lt;a href='https://community.fly.io/t/reservation-blocks-40-discount-on-machines-when-youre-ready-to-commit/20858' title=''&gt;reserved pricing&lt;/a&gt;. Already today you can get a steep discount from us reserving blocks of compute in advance. The trick to taking advantage of reserved pricing is confidently predicting a floor to your usage. For a lot of people, that means fighting feelings of loss aversion (nobody wants to get gym priced!). So another thing we can do in this same vein: catch opportunities to move customers to reserved blocks, and offer backdated reservations. We&amp;rsquo;ll figure this out too.&lt;/p&gt;

&lt;p&gt;Someday, when we&amp;rsquo;re in a monopoly position, our founders have all been replaced by ruthless MBAs, and Kurt has retired to farm coffee beans in lower Montana, we may stop doing this stuff. But until that day this is the right choice for our business.&lt;/p&gt;

&lt;p&gt;Meanwhile: like every public cloud, we provision our own hardware, and we have excess capacity. Your messed-up CI/CD jobs didn&amp;rsquo;t really cost us anything, so if you didn&amp;rsquo;t really want them, they shouldn&amp;rsquo;t cost you anything either. Take us up on this! We love talking to you.&lt;/p&gt;
</content>
  </entry>
  <entry>
    <title>We're Cutting L40S Prices In Half</title>
    <link rel="alternate" href="https://fly.io/blog/cutting-prices-for-l40s-gpus-in-half/"/>
    <id>https://fly.io/blog/cutting-prices-for-l40s-gpus-in-half/</id>
    <published>2024-08-15T00:00:00+00:00</published>
    <updated>2025-05-20T16:26:25+00:00</updated>
    <media:thumbnail url="https://fly.io/blog/cutting-prices-for-l40s-gpus-in-half/assets/gpu-ga-thumb.webp"/>
    <content type="html">&lt;div class="lead"&gt;&lt;p&gt;We’re Fly.io, a new public cloud with simple, developer-friendly ergonomics. And as of today, cheaper GPUs. &lt;a href="https://fly.io/speedrun" title=""&gt;Try it out; you’ll be deployed in just minutes&lt;/a&gt;.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;We just lowered the prices on NVIDIA L40s GPUs to $1.25 per hour. Why? Because our feet are cold and we burn processor cycles for heat. But also other reasons.&lt;/p&gt;

&lt;p&gt;Let&amp;rsquo;s back up.&lt;/p&gt;

&lt;p&gt;We offer 4 different NVIDIA GPU models; in increasing order of performance, they&amp;rsquo;re the A10, the L40S, the 40G PCI A100, and the 80G SXM A100.  Guess which one is most popular.&lt;/p&gt;

&lt;p&gt;We guessed wrong, and spent a lot of time working out how to maximize the amount of GPU power we could deliver to a single Fly Machine. Users surprised us. By a wide margin, the most popular GPU in our inventory is the A10.&lt;/p&gt;

&lt;p&gt;The A10 is an older generation of NVIDIA GPU with fewer, slower cores and less memory. It&amp;rsquo;s the least capable GPU we offer. But that doesn&amp;rsquo;t matter, because it&amp;rsquo;s capable enough. It&amp;rsquo;s solid for random inference tasks, and handles mid-sized generative AI stuff like Mistral Nemo or Stable Diffusion. For those workloads, there&amp;rsquo;s not that much benefit in getting a beefier GPU.&lt;/p&gt;

&lt;p&gt;As a result, we can&amp;rsquo;t get new A10s in fast enough for our users.&lt;/p&gt;

&lt;p&gt;If there&amp;rsquo;s one thing we&amp;rsquo;ve learned by talking to our customers over the last 4 years, it&amp;rsquo;s that y&amp;#39;all love a peek behind the curtain. So we&amp;rsquo;re going to let you in on a little secret about how a hardware provider like Fly.io formulates GPU strategy: none of us know what the hell we&amp;rsquo;re doing.&lt;/p&gt;

&lt;p&gt;If you had asked us in 2023 what the biggest GPU problem we could solve was, we&amp;rsquo;d have said &amp;ldquo;selling fractional A100 slices&amp;rdquo;. We burned a whole quarter trying to get MIG, or at least vGPUs, working through IOMMU PCI passthrough on Fly Machines, in a project so cursed that Thomas has forsworn ever programming again. Then we went to market selling whole A100s, and for several more months it looked like the biggest problem we needed to solve was finding a secure way to expose NVLink-ganged A100 clusters to VMs so users could run training. Then H100s; can we find H100s anywhere? Maybe in a black market in Shenzhen?&lt;/p&gt;

&lt;p&gt;And here we are, a year later, looking at the data, and the least sexy, least interesting GPU part in the catalog is where all the action is.&lt;/p&gt;

&lt;p&gt;With actual customer data to back up the hypothesis, here&amp;rsquo;s what we think is happening today:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Most users who want to plug GPU-accelerated AI workloads into fast networks are doing inference, not training. 
&lt;/li&gt;&lt;li&gt;The hyperscaler public clouds strangle these customers, first with GPU instance surcharges, and then with egress fees for object storage data when those customers try to outsource the GPU stuff to GPU providers.
&lt;/li&gt;&lt;li&gt;If you&amp;rsquo;re trying to do something GPU-accelerated in response to an HTTP request, the right combination of GPU, instance RAM, fast object storage for datasets and model parameters, and networking is much more important than getting your hands on an H100.
&lt;/li&gt;&lt;/ul&gt;

&lt;p&gt;This is a thing we didn&amp;rsquo;t see coming, but should have: training workloads tend to look more like batch jobs, and inference tends to look more like transactions. Batch training jobs aren&amp;rsquo;t that sensitive to networking or even reliability. Live inference jobs responding to end-user HTTP requests are. So, given our pricing, of course the A10s are a sweet spot.&lt;/p&gt;

&lt;p&gt;The next step up in our lineup after the A10 is the L40S. The L40S is a nice piece of kit. We&amp;rsquo;re going to take a beat here and sell you on the L40S, because it&amp;rsquo;s kind of awesome.&lt;/p&gt;

&lt;p&gt;The L40S is an AI-optimized version of the L40, which is the data center version of the GeForce RTX 4090, resembling two 4090s stapled together.&lt;/p&gt;

&lt;p&gt;If you&amp;rsquo;re not a GPU hardware person, the RTX 4090 is a gaming GPU, the kind you&amp;rsquo;d play ray-traced Witcher 3 on. NVIDIA&amp;rsquo;s high-end gaming GPUs are actually reasonably good at AI workloads! But they suck in a data center rack: they chug power, they&amp;rsquo;re hard to cool, and they&amp;rsquo;re less dense. Also, NVIDIA can&amp;rsquo;t charge as much for them.&lt;/p&gt;

&lt;p&gt;Hence the L40: (much) more memory, less energy consumption, designed for a rack, not a tower case. Marked up for &amp;ldquo;enterprise&amp;rdquo;.&lt;/p&gt;

&lt;p&gt;NVIDIA positioned the L40 as a kind of &amp;ldquo;graphics&amp;rdquo; AI GPU. Unlike the super high-end cards like the A100/H100, the L40 keeps all the rendering hardware, so it&amp;rsquo;s good for 3D graphics and video processing. Which is sort of what you&amp;rsquo;d expect from a &amp;ldquo;professionalized&amp;rdquo; GeForce card.&lt;/p&gt;

&lt;p&gt;A funny thing happened in the middle of 2023, though: the market for ultra-high-end NVIDIA cards went absolutely batshit. The huge cards you&amp;rsquo;d gang up for training jobs got impossible to find, and NVIDIA became one of the most valuable companies in the world. Serious shops started working out plans to acquire groups of L40-type cards to work around the problem, whether or not they had graphics workloads.&lt;/p&gt;

&lt;p&gt;The only company in this space that does know what they&amp;rsquo;re doing is NVIDIA. Nobody has written a highly-ranked Reddit post about GPU workloads without NVIDIA noticing and creating a new SKU. So they launched the L40S, which is an L40 with AI workload compute performance comparable to that of the A100 (without us getting into the details of F32 vs. F16 models).&lt;/p&gt;

&lt;p&gt;Long story short, the L40S is an A100-performer that we can price for A10 customers; the Volkswagen GTI of our lineup. We&amp;rsquo;re going to see if we can make that happen.&lt;/p&gt;

&lt;p&gt;We think the combination of just-right-sized inference GPUs and Tigris object storage is pretty killer:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;model parameters, data sets, and compute are all close together
&lt;/li&gt;&lt;li&gt;everything plugged into an Anycast network that&amp;rsquo;s fast everywhere in the world
&lt;/li&gt;&lt;li&gt;on VM instances that have enough memory to actually run real frameworks on
&lt;/li&gt;&lt;li&gt;priced like we actually want you to use it.
&lt;/li&gt;&lt;/ul&gt;

&lt;p&gt;You should use L40S cards without thinking hard about it. So we&amp;rsquo;re making it official. You won&amp;rsquo;t pay us a dime extra to use one instead of an A10. Have at it! Revolutionize the industry. For $1.25 an hour.&lt;/p&gt;

&lt;p&gt;Here are things you can do with an L40S on Fly.io today:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You can run Llama 3.1 70B — a big Llama — for LLM jobs.
&lt;/li&gt;&lt;li&gt;You can run Flux from Black Forest Labs for genAI images.
&lt;/li&gt;&lt;li&gt;You can run Whisper for automated speech recognition.
&lt;/li&gt;&lt;li&gt;You can do whole-genome alignment with SegAlign (Thomas&amp;rsquo; biochemist kid who has been snatching free GPU hours for his lab gave us this one, and we&amp;rsquo;re taking his word for it).
&lt;/li&gt;&lt;li&gt;You can run DOOM Eternal, building the Stadia that Google couldn&amp;rsquo;t pull off, because the L40S hasn&amp;rsquo;t forgotten that it&amp;rsquo;s a graphics GPU. 
&lt;/li&gt;&lt;/ul&gt;

&lt;p&gt;It&amp;rsquo;s going to get chilly in Chicago in a month or so. Go light some cycles on fire! &lt;/p&gt;
</content>
  </entry>
  <entry>
    <title>Making Machines Move</title>
    <link rel="alternate" href="https://fly.io/blog/machine-migrations/"/>
    <id>https://fly.io/blog/machine-migrations/</id>
    <published>2024-07-30T00:00:00+00:00</published>
    <updated>2025-05-20T16:26:25+00:00</updated>
    <media:thumbnail url="https://fly.io/blog/machine-migrations/assets/migrations-thumb.webp"/>
    <content type="html">&lt;div class="lead"&gt;&lt;p&gt;We’re Fly.io, a global public cloud with simple, developer-friendly ergonomics. If you’ve got a working Docker image, we’ll transmogrify it into a Fly Machine: a VM running on our hardware anywhere in the world. &lt;a href="https://fly.io/speedrun" title=""&gt;Try it out; you’ll be deployed in just minutes&lt;/a&gt;.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;At the heart of our platform is a systems design tradeoff about durable storage for applications.  When we added storage three years ago, to support stateful apps, we built it on attached NVMe drives. A benefit: a Fly App accessing a file on a Fly Volume is never more than a bus hop away from the data. A cost: a Fly App with an attached Volume is anchored to a particular worker physical.&lt;/p&gt;
&lt;div class="right-sidenote"&gt;&lt;p&gt;&lt;code&gt;bird&lt;/code&gt;: a BGP4 route server.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;Before offering attached storage, our on-call runbook was almost as simple as “de-bird that edge server”, “tell &lt;a href='https://fly.io/blog/carving-the-scheduler-out-of-our-orchestrator/' title=''&gt;Nomad&lt;/a&gt; to drain that worker”, and “go back to sleep”. NVMe cost us that drain operation, which terribly complicated the lives of our infra team. We’ve spent the last year getting “drain” back. It’s one of the biggest engineering lifts we&amp;rsquo;ve made, and if you didn’t notice, we lifted it cleanly.&lt;/p&gt;
&lt;h3 id='the-goalposts' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#the-goalposts' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;The Goalposts&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;With stateless apps, draining a worker is easy. For each app instance running on the victim server, start a new instance elsewhere. Confirm it&amp;rsquo;s healthy, then kill the old one. Rinse, repeat. At our 2020 scale, we could drain a fully loaded worker in just a handful of minutes.&lt;/p&gt;

&lt;p&gt;You can see why this process won&amp;rsquo;t work for apps with attached volumes. Sure, create a new volume elsewhere on the fleet, and boot up a new Fly Machine attached to it. But the new volume is empty. The data&amp;rsquo;s still stuck on the original worker. We asked, and customers were not OK with this kind of migration.&lt;/p&gt;

&lt;p&gt;Of course, we back Volumes snapshots up (at an interval) to off-network storage. But for “drain”, restoring backups isn&amp;rsquo;t nearly good enough. No matter the backup interval, a “restore from backup migration&amp;quot; will lose data, and a “backup and restore” migration  incurs untenable downtime.&lt;/p&gt;

&lt;p&gt;The next thought you have is, “OK, copy the volume over”. And, yes, of course you have to do that. But you can’t just &lt;code&gt;copy&lt;/code&gt;, &lt;code&gt;boot&lt;/code&gt;, and then &lt;code&gt;kill&lt;/code&gt; the old Fly Machine. Because the original Fly Machine is still alive and writing, you have to &lt;code&gt;kill&lt;/code&gt;first, then &lt;code&gt;copy&lt;/code&gt;, then &lt;code&gt;boot&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Fly Volumes can get pretty big. Even to a rack buddy physical server, you&amp;rsquo;ll hit a point where draining incurs minutes of interruption, especially if you’re moving lots of volumes simultaneously. &lt;code&gt;Kill&lt;/code&gt;, &lt;code&gt;copy&lt;/code&gt;, &lt;code&gt;boot&lt;/code&gt; is too slow.&lt;/p&gt;
&lt;div class="callout"&gt;&lt;p&gt;There’s a world where even 15 minutes of interruption is tolerable. It’s the world where you run more than one instance of your application to begin with, so prolonged interruption of a single Fly Machine isn’t visible to your users. Do this! But we have to live in the same world as our customers, many of whom don’t run in high-availability configurations.&lt;/p&gt;
&lt;/div&gt;&lt;h3 id='behold-the-clone-o-mat' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#behold-the-clone-o-mat' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Behold The Clone-O-Mat&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;&lt;code&gt;Copy&lt;/code&gt;, &lt;code&gt;boot&lt;/code&gt;, &lt;code&gt;kill&lt;/code&gt; loses data. &lt;code&gt;Kill&lt;/code&gt;, &lt;code&gt;copy&lt;/code&gt;, &lt;code&gt;boot&lt;/code&gt; takes too long. What we needed is a new operation: &lt;code&gt;clone&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;Clone&lt;/code&gt; is a lazier, asynchronous &lt;code&gt;copy&lt;/code&gt;. It creates a new volume elsewhere on our fleet, just like &lt;code&gt;copy&lt;/code&gt; would. But instead of blocking, waiting to transfer every byte from the original volume, &lt;code&gt;clone&lt;/code&gt; returns immediately, with a transfer running in the background.&lt;/p&gt;

&lt;p&gt;A new Fly Machine can be booted with that cloned volume attached. Its blocks are mostly empty. But that’s OK: when the new Fly Machine tries to read from it, the block storage system works out whether the block has been transferred. If it hasn’t, it’s fetched over the network from the original volume; this is called &amp;ldquo;hydration&amp;rdquo;. Writes are even easier, and don’t hit the network at all.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;Kill&lt;/code&gt;, &lt;code&gt;copy&lt;/code&gt;, &lt;code&gt;boot&lt;/code&gt; is slow. But &lt;code&gt;kill&lt;/code&gt;, &lt;code&gt;clone&lt;/code&gt;, &lt;code&gt;boot&lt;/code&gt; is fast; it can be made asymptotically as fast as stateless migration.&lt;/p&gt;

&lt;p&gt;There are three big moving pieces to this design.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;First, we have to rig up our OS storage system to make this &lt;code&gt;clone&lt;/code&gt; operation work.
&lt;/li&gt;&lt;li&gt;Then, to read blocks over the network, we need a network protocol. (Spoiler: iSCSI, though we tried other stuff first.)
&lt;/li&gt;&lt;li&gt;Finally, the gnarliest of those pieces is our orchestration logic: what’s running where, what state is it in, and whether it’s plugged in correctly.
&lt;/li&gt;&lt;/ol&gt;
&lt;h3 id='block-level-clone' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#block-level-clone' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Block-Level Clone&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;The Linux feature we need to make this work already exists; &lt;a href='https://docs.kernel.org/admin-guide/device-mapper/dm-clone.html' title=''&gt;it’s called &lt;code&gt;dm-clone&lt;/code&gt;&lt;/a&gt;. Given an existing, readable storage device, &lt;code&gt;dm-clone&lt;/code&gt; gives us a new device, of identical size, where reads of uninitialized blocks will pull from the original. It sounds terribly complicated, but it’s actually one of the simpler kernel lego bricks. Let&amp;rsquo;s demystify it.&lt;/p&gt;

&lt;p&gt;As far as Unix is concerned, random-access storage devices, be they spinning rust or NVMe drives, are all instances of the common class “block device”. A block device is addressed in fixed-size (say, 4KiB) chunks, and &lt;a href='https://elixir.bootlin.com/linux/v5.11.11/source/include/linux/blk_types.h#L356' title=''&gt;handles (roughly) these operations&lt;/a&gt;:&lt;/p&gt;
&lt;div class="highlight-wrapper group relative cpp"&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-wrap-target="#code-9s54so8p"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-copy-target="sibling"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight '&gt;&lt;code id="code-9s54so8p"&gt;&lt;span class="k"&gt;enum&lt;/span&gt; &lt;span class="n"&gt;req_opf&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="cm"&gt;/* read sectors from the device */&lt;/span&gt;
    &lt;span class="n"&gt;REQ_OP_READ&lt;/span&gt;     &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="cm"&gt;/* write sectors to the device */&lt;/span&gt;
    &lt;span class="n"&gt;REQ_OP_WRITE&lt;/span&gt;        &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="cm"&gt;/* flush the volatile write cache */&lt;/span&gt;
    &lt;span class="n"&gt;REQ_OP_FLUSH&lt;/span&gt;        &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="cm"&gt;/* discard sectors */&lt;/span&gt;
    &lt;span class="n"&gt;REQ_OP_DISCARD&lt;/span&gt;      &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="cm"&gt;/* securely erase sectors */&lt;/span&gt;
    &lt;span class="n"&gt;REQ_OP_SECURE_ERASE&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="cm"&gt;/* write the same sector many times */&lt;/span&gt;
    &lt;span class="n"&gt;REQ_OP_WRITE_SAME&lt;/span&gt;   &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;7&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="cm"&gt;/* write the zero filled sector many times */&lt;/span&gt;
    &lt;span class="n"&gt;REQ_OP_WRITE_ZEROES&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;9&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="cm"&gt;/* ... */&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;You can imagine designing a simple network protocol that supported all these options. It might have messages that looked something like:&lt;/p&gt;

&lt;p&gt;&lt;img alt="A packet diagram, just skip down to &amp;quot;struct bio&amp;quot; below" src="/blog/machine-migrations/assets/packet.png?2/3&amp;amp;center" /&gt;
Good news! The Linux block system is organized as if your computer was a network running a protocol that basically looks just like that. Here’s the message structure:&lt;/p&gt;
&lt;div class="right-sidenote"&gt;&lt;p&gt;I’ve &lt;a href="https://elixir.bootlin.com/linux/v5.11.11/source/include/linux/blk_types.h#L223" title=""&gt;stripped a bunch of stuff out of here&lt;/a&gt; but you don’t need any of it to understand what’s coming next.&lt;/p&gt;
&lt;/div&gt;&lt;div class="highlight-wrapper group relative cpp"&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-wrap-target="#code-qqcnfpws"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-copy-target="sibling"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight '&gt;&lt;code id="code-qqcnfpws"&gt;&lt;span class="cm"&gt;/*
 * main unit of I/O for the block layer and lower layers (ie drivers and
 * stacking drivers)
 */&lt;/span&gt;
&lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="nc"&gt;bio&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="nc"&gt;gendisk&lt;/span&gt;      &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;bi_disk&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="kt"&gt;unsigned&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt;        &lt;span class="n"&gt;bi_opf&lt;/span&gt;
    &lt;span class="kt"&gt;unsigned&lt;/span&gt; &lt;span class="kt"&gt;short&lt;/span&gt;      &lt;span class="n"&gt;bi_flags&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;   
    &lt;span class="kt"&gt;unsigned&lt;/span&gt; &lt;span class="kt"&gt;short&lt;/span&gt;      &lt;span class="n"&gt;bi_ioprio&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="n"&gt;blk_status_t&lt;/span&gt;        &lt;span class="n"&gt;bi_status&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="kt"&gt;unsigned&lt;/span&gt; &lt;span class="kt"&gt;short&lt;/span&gt;      &lt;span class="n"&gt;bi_vcnt&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;         &lt;span class="cm"&gt;/* how many bio_vec's */&lt;/span&gt;
    &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="nc"&gt;bio_vec&lt;/span&gt;      &lt;span class="n"&gt;bi_inline_vecs&lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt; &lt;span class="cm"&gt;/* (page, len, offset) tuples */&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;No nerd has ever looked at a fixed-format message like this without thinking about writing a proxy for it, and &lt;code&gt;struct bio&lt;/code&gt; is no exception. The proxy system in the Linux kernel for &lt;code&gt;struct bio&lt;/code&gt; is called &lt;code&gt;device mapper&lt;/code&gt;, or DM.&lt;/p&gt;

&lt;p&gt;DM target devices can plug into other DM devices. For that matter, they can do whatever the hell else they want, as long as they honor the interface. It boils down to a &lt;code&gt;map(bio)&lt;/code&gt; function, which can dispatch a &lt;code&gt;struct bio&lt;/code&gt;, or drop it, or muck with it and ask the kernel to resubmit it.&lt;/p&gt;

&lt;p&gt;You can do a whole lot of stuff with this interface: carve a big device into a bunch of smaller ones (&lt;a href='https://docs.kernel.org/admin-guide/device-mapper/linear.html' title=''&gt;&lt;code&gt;dm-linear&lt;/code&gt;&lt;/a&gt;), make one big striped device out of a bunch of smaller ones (&lt;a href='https://docs.kernel.org/admin-guide/device-mapper/striped.html' title=''&gt;&lt;code&gt;dm-stripe&lt;/code&gt;&lt;/a&gt;), do software RAID mirroring (&lt;code&gt;dm-raid1&lt;/code&gt;), create snapshots of arbitrary existing devices (&lt;a href='https://docs.kernel.org/admin-guide/device-mapper/snapshot.html' title=''&gt;&lt;code&gt;dm-snap&lt;/code&gt;&lt;/a&gt;), cryptographically verify boot devices (&lt;a href='https://docs.kernel.org/admin-guide/device-mapper/verity.html' title=''&gt;&lt;code&gt;dm-verity&lt;/code&gt;&lt;/a&gt;), and a bunch more. Device Mapper is the kernel backend for the &lt;a href='https://sourceware.org/lvm2/' title=''&gt;userland LVM2 system&lt;/a&gt;, which is how we do &lt;a href='https://fly.io/blog/persistent-storage-and-fast-remote-builds/' title=''&gt;thin pools and snapshot backups&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Which brings us to &lt;code&gt;dm-clone&lt;/code&gt; : it’s a map function that boils down to:&lt;/p&gt;
&lt;div class="highlight-wrapper group relative cpp"&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-wrap-target="#code-fre1zh1l"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-copy-target="sibling"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight '&gt;&lt;code id="code-fre1zh1l"&gt;    &lt;span class="cm"&gt;/* ... */&lt;/span&gt; 
    &lt;span class="n"&gt;region_nr&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;bio_to_region&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;clone&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;bio&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="c1"&gt;// we have the data&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dm_clone_is_region_hydrated&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;clone&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;cmd&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;region_nr&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;remap_and_issue&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;clone&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;bio&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="c1"&gt;// we don't and it's a read&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;bio_data_dir&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;bio&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;READ&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;remap_to_source&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;clone&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;bio&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="c1"&gt;// we don't and it's a write&lt;/span&gt;
    &lt;span class="n"&gt;remap_to_dest&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;clone&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;bio&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="n"&gt;hydrate_bio_region&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;clone&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;bio&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="cm"&gt;/* ... */&lt;/span&gt; 
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;&lt;div class="right-sidenote"&gt;&lt;p&gt;a &lt;a href="https://docs.kernel.org/admin-guide/device-mapper/kcopyd.html" title=""&gt;&lt;code&gt;kcopyd&lt;/code&gt;&lt;/a&gt; thread runs in the background, rehydrating the device in addition to (and independent of) read accesses.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;&lt;code&gt;dm-clone&lt;/code&gt; takes, in addition to the source device to clone from, a “metadata” device on which is stored a bitmap of the status of all the blocks: either “rehydrated” from the source, or not. That’s how it knows whether to fetch a block from the original device or the clone.&lt;/p&gt;
&lt;h3 id='network-clone' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#network-clone' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Network Clone&lt;/span&gt;&lt;/h3&gt;&lt;div class="callout"&gt;&lt;p&gt;&lt;strong class="font-semibold text-navy-950"&gt;&lt;code&gt;flyd&lt;/code&gt; in a nutshell:&lt;/strong&gt; worker physical run a service, &lt;code&gt;flyd&lt;/code&gt;, which manages a couple of databases that are the source of truth for all the Fly Machines running there. Concepturally, &lt;code&gt;flyd&lt;/code&gt; is a server for on-demand instances of durable finite state machines, each representing some operation on a Fly Machine (creation, start, stop, &amp;amp;c), with the transition steps recorded carefully in a BoltDB database. An FSM step might be something like “assign a local IPv6 address to this interface”, or “check out a block device with the contents of this container”, and it’s straightforward to add and manage new ones.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;Say we&amp;rsquo;ve got &lt;code&gt;flyd&lt;/code&gt; managing a Fly Machine with a volume on &lt;code&gt;worker-xx-cdg1-1&lt;/code&gt;. We want it running on &lt;code&gt;worker-xx-cdg1-2&lt;/code&gt;. Our whole fleet is meshed with WireGuard; everything can talk directly to everything else. So, conceptually:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;code&gt;flyd&lt;/code&gt; on &lt;code&gt;cdg1-1&lt;/code&gt; stops the Fly Machine, and
&lt;/li&gt;&lt;li&gt;sends a message to &lt;code&gt;flyd&lt;/code&gt; on &lt;code&gt;cdg1-2&lt;/code&gt; telling it to clone the source volume.
&lt;/li&gt;&lt;li&gt;&lt;code&gt;flyd&lt;/code&gt; on &lt;code&gt;cdg1-2&lt;/code&gt; starts a &lt;code&gt;dm-clone&lt;/code&gt; instance, which creates a clone volume on &lt;code&gt;cdg1-2&lt;/code&gt;, populating it, over some kind of network block protocol, from &lt;code&gt;cdg1-1&lt;/code&gt;, and
&lt;/li&gt;&lt;li&gt;boots a new Fly Machine, attached to the clone volume.
&lt;/li&gt;&lt;li&gt;&lt;code&gt;flyd&lt;/code&gt; on &lt;code&gt;cdg1-2&lt;/code&gt; monitors the clone operation, and, when the clone completes, converts the clone device to a simple linear device and cleans up.
&lt;/li&gt;&lt;/ol&gt;

&lt;p&gt;For step (3) to work, the “original volume” on &lt;code&gt;cdg1-1&lt;/code&gt; has to be visible on &lt;code&gt;cdg1-2&lt;/code&gt;, which means we need to mount it over the network.&lt;/p&gt;
&lt;div class="right-sidenote"&gt;&lt;p&gt;&lt;code&gt;nbd&lt;/code&gt; is so simple that it’s used as a sort of &lt;code&gt;dm-user&lt;/code&gt; userland block device; to prototype a new block device, &lt;a href="https://lwn.net/ml/linux-kernel/[email protected]/" title=""&gt;don’t bother writing a kernel module&lt;/a&gt;, just write an &lt;code&gt;nbd&lt;/code&gt; server.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;Take your pick of  protocols. iSCSI is the obvious one, but it’s relatively complicated, and Linux has native support for a much simpler one: &lt;code&gt;nbd&lt;/code&gt;, the “network block device”. You could implement an &lt;code&gt;nbd&lt;/code&gt; server in an afternoon, on top of a file or a SQLite database or S3, and the Linux kernel could mount it as a drive.&lt;/p&gt;

&lt;p&gt;We started out using &lt;code&gt;nbd&lt;/code&gt;. But we kept getting stuck &lt;code&gt;nbd&lt;/code&gt; kernel threads when there was any kind of network disruption. We’re a global public cloud; network disruption  happens. Honestly, we could have debugged our way through this. But it was simpler just to spike out an iSCSI implementation, observe that didn’t get jammed up when the network hiccuped, and move on.&lt;/p&gt;
&lt;h3 id='putting-the-pieces-together' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#putting-the-pieces-together' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Putting The Pieces Together&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;To drain a worker with minimal downtime and no lost data, we turn workers into a temporary SANs, serving the volumes we need to drain to fresh-booted replica Fly Machines on a bunch of “target” physicals. Those SANs — combinations of &lt;code&gt;dm-clone&lt;/code&gt;, iSCSI, and &lt;a href='https://fly.io/blog/carving-the-scheduler-out-of-our-orchestrator/' title=''&gt;our &lt;code&gt;flyd&lt;/code&gt; orchestrator&lt;/a&gt; — track the blocks copied from the origin, copying each one exactly once and cleaning up when the original volume has been fully copied.&lt;/p&gt;

&lt;p&gt;Problem solved!&lt;/p&gt;
&lt;h3 id='no-there-were-more-problems' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#no-there-were-more-problems' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;No, There Were More Problems&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;When your problem domain is hard, anything you build whose design you can’t fit completely in your head is going to be a fiasco. Shorter form: “if you see Raft consensus in a design, we’ve done something wrong”.&lt;/p&gt;

&lt;p&gt;A virtue of this migration system is that, for as many moving pieces as it has, it fits in your head. What complexity it has is mostly shouldered by strategic bets we’ve already  built teams around, most notably the &lt;code&gt;flyd&lt;/code&gt; orchestrator. So we’ve been running this system for the better part of a year without much drama. Not no drama, though. Some drama.&lt;/p&gt;

&lt;p&gt;Example: we encrypt volumes. Our key management is fussy. We do per-volume encryption keys that provision alongside the volumes themselves, so no one worker has a volume skeleton key.&lt;/p&gt;

&lt;p&gt;If you think “migrating those volume keys from worker to worker” is the problem I’m building up to, well, that too, but the bigger problem is &lt;code&gt;trim&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Most people use just a small fraction of the volumes they allocate. A 100GiB volume with just 5MiB used wouldn’t be at all weird. You don’t want to spend minutes copying a volume that could have been fully hydrated in seconds.&lt;/p&gt;

&lt;p&gt;And indeed, &lt;code&gt;dm-clone&lt;/code&gt; doesn’t want to do that either. Given a source block device (for us, an iSCSI mount) and the clone device, a &lt;code&gt;DISCARD&lt;/code&gt; issued on the clone device will get picked up by &lt;code&gt;dm-clone&lt;/code&gt;, which will simply &lt;a href='https://elixir.bootlin.com/linux/v5.11.11/source/drivers/md/dm-clone-target.c#L1357' title=''&gt;short-circuit the read&lt;/a&gt; of the relevant blocks by marking them as hydrated in the metadata volume. Simple enough.&lt;/p&gt;

&lt;p&gt;To make that work, we need the target worker to see the plaintext of the source volume (so that it can do an &lt;code&gt;fstrim&lt;/code&gt; — don’t get us started on how annoying it is to sandbox this — to read the filesystem, identify the unused block, and issue the &lt;code&gt;DISCARDs&lt;/code&gt; where &lt;code&gt;dm-clone&lt;/code&gt; can see them) Easy enough.&lt;/p&gt;
&lt;div class="right-sidenote"&gt;&lt;p&gt;these curses have a lot to do with how hard it was to drain workers!&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;Except: two different workers, for cursed reasons, might be running different versions of &lt;a href='https://gitlab.com/cryptsetup/cryptsetup' title=''&gt;cryptsetup&lt;/a&gt;, the userland bridge between LUKS2 and the &lt;a href='https://docs.kernel.org/admin-guide/device-mapper/dm-crypt.html' title=''&gt;kernel dm-crypt driver&lt;/a&gt;. There are (or were) two different versions of cryptsetup on our network, and they default to different &lt;a href='https://fossies.org/linux/cryptsetup/docs/on-disk-format-luks2.pdf' title=''&gt;LUKS2 header sizes&lt;/a&gt; — 4MiB and 16MiB. Implying two different plaintext volume sizes. &lt;/p&gt;

&lt;p&gt;So now part of the migration FSM is an RPC call that carries metadata about the designed LUKS2 configuration for the target VM. Not something we expected to have to build, but, whatever.&lt;/p&gt;
&lt;div class="right-sidenote"&gt;&lt;p&gt;Corrosion deserves its own post.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;Gnarlier example: workers are the source of truth for information about the Fly Machines running on them. Migration knocks the legs out from under that constraint, which we were relying on in Corrosion, the SWIM-gossip SQLite database we use to connect Fly Machines to our request routing. Race conditions. Debugging. Design changes. Next!&lt;/p&gt;

&lt;p&gt;Gnarliest example: our private networks. Recall: we automatically place every Fly Machine into &lt;a href='https://fly.io/blog/incoming-6pn-private-networks/' title=''&gt;a private network&lt;/a&gt;; by default, it’s the one all the other apps in your organization run in. This is super handy for setting up background services, databases, and clustered applications. 20 lines of eBPF code in our worker kernels keeps anybody from “crossing the streams”, sending packets from one private network to another.&lt;/p&gt;
&lt;div class="right-sidenote"&gt;&lt;p&gt;we’re members of an elite cadre of idiots who have managed to find designs that made us wish IPv6 addresses were even bigger.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;We call this scheme 6PN (for “IPv6 Private Network”). It functions by &lt;a href='https://fly.io/blog/ipv6-wireguard-peering#ipv6-private-networking-at-fly' title=''&gt;embedding routing information directly into IPv6 addresses&lt;/a&gt;. This is, perhaps, gross. But it allows us to route diverse private networks with constantly changing membership across a global fleet of servers without running a distributed routing protocol. As the beardy wizards who kept the Internet backbone up and running on Cisco AGS+’s once said: the best routing protocol is “static”.&lt;/p&gt;

&lt;p&gt;Problem: the embedded routing information in a 6PN address refers in part to specific worker servers.&lt;/p&gt;

&lt;p&gt;That’s fine, right? They’re IPv6 addresses. Nobody uses literal IPv6 addresses. Nobody uses IP addresses at all; they use the DNS. When you migrate a host, just give it a new 6PN address, and update the DNS.&lt;/p&gt;

&lt;p&gt;Friends, somebody did use literal IPv6 addresses. It was us. In the configurations for Fly Postgres clusters.&lt;/p&gt;
&lt;div class="right-sidenote"&gt;&lt;p&gt;It’s also not operationally easy for us to shell into random Fly Machines, for good reason.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;The obvious fix for this is not complicated; given &lt;code&gt;flyctl&lt;/code&gt; ssh access to a Fly Postgres cluster, it’s like a 30 second ninja edit. But we run a &lt;em&gt;lot&lt;/em&gt; of Fly Postgres clusters, and the change has to be coordinated carefully to avoid getting the cluster into a confused state. We went as far as adding feature to our &lt;code&gt;init&lt;/code&gt; to do network address mappings to keep old 6PN addresses reachable before biting the bullet and burning several weeks doing the direct configuration fix fleet-wide.&lt;/p&gt;
&lt;figure class="post-cta"&gt;
  &lt;figcaption&gt;
    &lt;h1&gt;Speedrun your app onto Fly.io.&lt;/h1&gt;
    &lt;p&gt;3&amp;hellip;2&amp;hellip;1&amp;hellip;&lt;/p&gt;
      &lt;a class="btn btn-lg" href="https://fly.io/speedrun"&gt;
        Go! &amp;nbsp;&lt;span class='opacity-50'&gt;→&lt;/span&gt;
      &lt;/a&gt;
  &lt;/figcaption&gt;
  &lt;div class="image-container"&gt;
    &lt;img src="/static/images/cta-cat.webp" srcset="/static/images/[email protected] 2x" alt=""&gt;
  &lt;/div&gt;
&lt;/figure&gt;

&lt;h3 id='the-learning-it-burns' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#the-learning-it-burns' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;The Learning, It Burns!&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;We get asked a lot why we don’t do storage the “obvious” way, with an &lt;a href='https://aws.amazon.com/ebs/' title=''&gt;EBS-type&lt;/a&gt; SAN fabric, abstracting it away from our compute. Locally-attached NVMe storage is an idiosyncratic choice, one we’ve had to write disclaimers for (single-node clusters can lose data!) since we first launched it.&lt;/p&gt;

&lt;p&gt;One answer is: we’re a startup. Building SAN infrastructure in every region we operate in would be tremendously expensive. Look at any feature in AWS that normal people know the name of, like EC2, EBS, RDS, or S3 — there’s a whole company in there. We launched storage when we were just 10 people, and even at our current size we probably have nothing resembling the resources EBS gets. AWS is pretty great!&lt;/p&gt;

&lt;p&gt;But another thing to keep in mind is: we’re learning as we go. And so even if we had the means to do an EBS-style SAN, we might not build it today.&lt;/p&gt;

&lt;p&gt;Instead, we’re a lot more interested in log-structured virtual disks (LSVD). LSVD uses NVMe as a local cache, but durably persists writes in object storage. You get most of the performance benefit of bus-hop disk writes, along with unbounded storage and S3-grade reliability.&lt;/p&gt;

&lt;p&gt;&lt;a href='https://community.fly.io/t/bottomless-s3-backed-volumes/15648' title=''&gt;We launched LSVD experimentally last year&lt;/a&gt;; in the intervening year, something happened to make LSVD even more interesting to us: &lt;a href='https://www.tigrisdata.com/' title=''&gt;Tigris Data&lt;/a&gt; launched S3-compatible object storage in every one our regions, so instead of backhauling updates to Northern Virginia, &lt;a href='https://community.fly.io/t/tigris-backed-volumes/20792' title=''&gt;we can keep them local&lt;/a&gt;. We have more to say about LSVD, and a lot more to say about Tigris.&lt;/p&gt;

&lt;p&gt;Our first several months of migrations were done gingerly. By summer of 2024, we got to where our infra team can pull “drain this host” out of their toolbelt without much ceremony.&lt;/p&gt;

&lt;p&gt;We’re still not to the point where we’re migrating casually. Your Fly Machines are probably not getting migrated! There&amp;rsquo;d need to be a reason! But the dream is fully-automated luxury space migration, in which you might get migrated semiregularly, as our systems work not just to drain problematic hosts but to rebalance workloads regularly. No time soon. But we’ll get there.&lt;/p&gt;

&lt;p&gt;This is the biggest thing our team has done since we replaced Nomad with flyd. Only the new billing system comes close. We did this thing not because it was easy, but because we thought it would be easy. It was not. But: worth it!&lt;/p&gt;
</content>
  </entry>
  <entry>
    <title>AWS without Access Keys</title>
    <link rel="alternate" href="https://fly.io/blog/oidc-cloud-roles/"/>
    <id>https://fly.io/blog/oidc-cloud-roles/</id>
    <published>2024-06-19T00:00:00+00:00</published>
    <updated>2025-05-20T16:26:25+00:00</updated>
    <media:thumbnail url="https://fly.io/blog/oidc-cloud-roles/assets/spooky-security-skeleton-thumb.webp"/>
    <content type="html">&lt;div class="lead"&gt;&lt;p&gt;It’s dangerous to go alone. Fly.io runs full-stack apps by transmuting Docker containers into Fly Machines: ultra-lightweight hardware-backed VMs. You can run all your dependencies on Fly.io, but sometimes, you’ll need to work with other clouds, and we’ve made that pretty simple. Try Fly.io out for yourself; your Rails or Node app &lt;a href="https://fly.io/speedrun" title=""&gt;can be up and running in just minutes&lt;/a&gt;.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;Let&amp;rsquo;s hypopulate you an app serving generative AI cat images based on the weather forecast, running on a &lt;code&gt;g4dn.xlarge&lt;/code&gt; ECS task in AWS &lt;code&gt;us-east-1&lt;/code&gt;.  It&amp;rsquo;s going great; people didn&amp;rsquo;t realize how dependent their cat pic prefs are on barometric pressure, and you&amp;rsquo;re all anyone can talk about.&lt;/p&gt;

&lt;p&gt;Word reaches Australia and Europe, but you&amp;rsquo;re not catching on, because the… latency is too high? Just roll with us here. Anyways: fixing this is going to require replicating  ECS tasks and ECR images into &lt;code&gt;ap-southeast-2&lt;/code&gt; and &lt;code&gt;eu-central-1&lt;/code&gt; while also setting up load balancing. Nah.&lt;/p&gt;

&lt;p&gt;This is the O.G. Fly.io deployment story; one deployed app, one versioned container, one command to get it running anywhere in the world.&lt;/p&gt;

&lt;p&gt;But you have a problem: your app relies on training data, it&amp;rsquo;s huge, your giant employer manages it, and it&amp;rsquo;s in S3. Getting this to work will require AWS credentials.&lt;/p&gt;

&lt;p&gt;You could ask your security team to create a user, give it permissions, and hand over the AWS keypair. Then you could wash your neck and wait for the blade. Passing around AWS keypairs is the beginning of every horror story told about cloud security, and  security team ain&amp;rsquo;t having it.&lt;/p&gt;

&lt;p&gt;There&amp;rsquo;s a better way. It&amp;rsquo;s drastically more secure, so your security people will at least hear you out. It&amp;rsquo;s also so much easier on Fly.io that you might never bother creating a IAM service account again.&lt;/p&gt;
&lt;h2 id='lets-get-it-out-of-the-way' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#lets-get-it-out-of-the-way' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Let&amp;rsquo;s Get It out of the Way&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;We&amp;rsquo;re going to use OIDC to set up strictly limited trust between AWS and Fly.io.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;In AWS: we&amp;rsquo;ll add Fly.io as an &lt;code&gt;Identity Provider&lt;/code&gt; in AWS IAM, giving us an ID we can plug into any IAM &lt;code&gt;Role&lt;/code&gt;.
&lt;/li&gt;&lt;li&gt;Also in AWS: we&amp;rsquo;ll create a &lt;code&gt;Role&lt;/code&gt;, give it access to the S3 bucket with our tokenized cat data, and then attach the &lt;code&gt;Identity Provider&lt;/code&gt; to it.
&lt;/li&gt;&lt;li&gt;In Fly.io, we&amp;rsquo;ll take the &lt;code&gt;Role&lt;/code&gt; ARN we got from step 2 and set it as an environment variable in our app.
&lt;/li&gt;&lt;/ol&gt;

&lt;p&gt;Our machines will now magically have access to the S3 bucket.&lt;/p&gt;
&lt;h2 id='what-the-what' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#what-the-what' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;What the What&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;A reasonable question to ask here is, &amp;ldquo;where&amp;rsquo;s the credential&amp;rdquo;? Ordinarily, to give a Fly Machine access to an AWS resource, you&amp;rsquo;d use &lt;code&gt;fly secrets set&lt;/code&gt; to add an &lt;code&gt;AWS_ACCESS_KEY_ID&lt;/code&gt; and &lt;code&gt;AWS_SECRET_ACCESS_KEY&lt;/code&gt; to the environment in the Machine. Here, we&amp;rsquo;re not setting any secrets at all; we&amp;rsquo;re just adding an ARN — which is not a credential — to the Machine.&lt;/p&gt;

&lt;p&gt;Here&amp;rsquo;s what&amp;rsquo;s happening.&lt;/p&gt;

&lt;p&gt;Fly.io operates an OIDC IdP at &lt;code&gt;oidc.fly.io&lt;/code&gt;. It issues OIDC tokens, exclusively to Fly Machines. AWS can be configured to trust these tokens, on a role-by-role basis. That&amp;rsquo;s the &amp;ldquo;secret credential&amp;rdquo;: the pre-configured trust relationship in IAM, and the public keypairs it manages. You, the user, never need to deal with these keys directly; it all happens behind the scenes, between AWS and Fly.io.&lt;/p&gt;

&lt;p&gt;&lt;img alt="A diagram: STS trusts OIDC.fly.io. OIDC.fly.io trusts flyd. flyd issues a token to the Machine, which proffers it to STS. STS sends an STS cred to the Machine, which then uses it to retrieve model weights from S3." src="/blog/oidc-cloud-roles/assets/oidc-diagram.webp" /&gt;&lt;/p&gt;

&lt;p&gt;The key actor in this picture is &lt;code&gt;STS&lt;/code&gt;, the AWS &lt;code&gt;Security Token Service&lt;/code&gt;. &lt;code&gt;STS&lt;/code&gt;&amp;lsquo;s main job is to vend short-lived AWS credentials, usually through some variant of an API called &lt;code&gt;AssumeRole&lt;/code&gt;. Specifically, in our case: &lt;code&gt;AssumeRoleWithWebIdentity&lt;/code&gt; tells &lt;code&gt;STS&lt;/code&gt; to cough up an AWS keypair given an OIDC token (that matches a pre-configured trust relationship).&lt;/p&gt;

&lt;p&gt;That still leaves the question: how does your code, which is reaching out to the AWS APIs to get cat weights, drive any of this?&lt;/p&gt;
&lt;h2 id='the-init-thickens' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#the-init-thickens' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;The Init Thickens&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;Every Fly Machine boots up into an &lt;code&gt;init&lt;/code&gt; we wrote in Rust. It has slowly been gathering features.&lt;/p&gt;

&lt;p&gt;One of those features, which has been around for awhile, is a server for a Unix socket at &lt;code&gt;/.fly/api&lt;/code&gt;, which exports a subset of the Fly Machines API to privileged processes in the Machine. Think of it as our answer to the EC2 Instant Metadata Service. How it works is, every time we boot a Fly Machine, we pass it a &lt;a href='https://fly.io/blog/macaroons-escalated-quickly/' title=''&gt;Macaroon token&lt;/a&gt; locked to that particular Machine; &lt;code&gt;init&lt;/code&gt;&amp;rsquo;s server for &lt;code&gt;/.fly/api&lt;/code&gt; is a proxy that attaches that token to requests.&lt;/p&gt;
&lt;div class="right-sidenote"&gt;&lt;p&gt;In addition to the API proxy being tricky to SSRF to.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;What&amp;rsquo;s neat about this is that the credential that drives &lt;code&gt;/.fly/api&lt;/code&gt; is doubly protected:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;The Fly.io platform won&amp;rsquo;t honor it unless it comes from that specific Fly Machine (&lt;code&gt;flyd&lt;/code&gt;, our orchestrator, knows who it&amp;rsquo;s talking to), &lt;em&gt;and&lt;/em&gt;
&lt;/li&gt;&lt;li&gt;Ordinary code running in a Fly Machine never gets a copy of the token to begin with.
&lt;/li&gt;&lt;/ol&gt;

&lt;p&gt;You could rig up a local privilege escalation vulnerability and work out how to steal the Macaroon, but you can&amp;rsquo;t exfiltrate it productively.&lt;/p&gt;

&lt;p&gt;So now you have half the puzzle worked out: OIDC is just part of the &lt;a href='https://fly.io/docs/machines/api/' title=''&gt;Fly Machines API&lt;/a&gt; (specifically: &lt;code&gt;/v1/tokens/oidc&lt;/code&gt;). A Fly Machine can hit a Unix socket and get an OIDC token tailored to that machine:&lt;/p&gt;
&lt;div class="highlight-wrapper group relative "&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-wrap-target="#code-dm2j0e3b"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-copy-target="sibling"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight '&gt;&lt;code id="code-dm2j0e3b"&gt;{
  "app_id": "3671581",
  "app_name": "weather-cat",
  "aud": "sts.amazonaws.com",
  "image": "image:latest",
  "image_digest": "sha256:dff79c6da8dd4e282ecc6c57052f7cfbd684039b652f481ca2e3324a413ee43f",
  "iss": "https://oidc.fly.io/example",
  "machine_id": "3d8d377ce9e398",
  "machine_name": "ancient-snow-4824",
  "machine_version": "01HZJXGTQ084DX0G0V92QH3XW4",
  "org_id": "29873298",
  "org_name": "example",
  "region": "yyz",
  "sub": "example:weather-cat:ancient-snow-4824"
} // some OIDC stuff trimmed
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Look upon this holy blob, sealed with a published key managed by Fly.io&amp;rsquo;s OIDC vault, and see that there lies within it enough information for AWS &lt;code&gt;STS&lt;/code&gt; to decide to issue a session credential.&lt;/p&gt;

&lt;p&gt;We have still not completed the puzzle, because while you can probably now see how you&amp;rsquo;d drive this process with a bunch of new code that you&amp;rsquo;d tediously write, you are acutely aware that you have not yet endured that tedium — e pur si muove!&lt;/p&gt;

&lt;p&gt;One &lt;code&gt;init&lt;/code&gt; feature remains to be disclosed, and it&amp;rsquo;s cute.&lt;/p&gt;

&lt;p&gt;If, when &lt;code&gt;init&lt;/code&gt; starts in a Fly Machine, it sees an &lt;code&gt;AWS_ROLE_ARN&lt;/code&gt; environment variable set, it initiates a little dance; it:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;goes off and generates an OIDC token, the way we just described,
&lt;/li&gt;&lt;li&gt;saves that OIDC token in a file, &lt;em&gt;and&lt;/em&gt;
&lt;/li&gt;&lt;li&gt;sets the &lt;code&gt;AWS_WEB_IDENTITY_TOKEN_FILE&lt;/code&gt; and &lt;code&gt;AWS_ROLE_SESSION_NAME&lt;/code&gt; environment variables for every process it launches.
&lt;/li&gt;&lt;/ol&gt;

&lt;p&gt;The AWS SDK, linked to your application, does all the rest.&lt;/p&gt;

&lt;p&gt;Let&amp;rsquo;s review: you add an &lt;code&gt;AWS_ROLE_ARN&lt;/code&gt; variable to your Fly App, launch a Machine, and have it go fetch a file from S3. What happens next is:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;code&gt;init&lt;/code&gt; detects &lt;code&gt;AWS_ROLE_ARN&lt;/code&gt; is set as an environment variable.
&lt;/li&gt;&lt;li&gt;&lt;code&gt;init&lt;/code&gt; sends a request to &lt;code&gt;/v1/tokens/oidc&lt;/code&gt; via &lt;code&gt;/.api/proxy&lt;/code&gt;.
&lt;/li&gt;&lt;li&gt;&lt;code&gt;init&lt;/code&gt; writes the response to &lt;code&gt;/.fly/oidc_token.&lt;/code&gt;
&lt;/li&gt;&lt;li&gt;&lt;code&gt;init&lt;/code&gt; sets &lt;code&gt;AWS_WEB_IDENTITY_TOKEN_FILE&lt;/code&gt; and &lt;code&gt;AWS_ROLE_SESSION_NAME&lt;/code&gt;.
&lt;/li&gt;&lt;li&gt;The entrypoint boots, and (say) runs &lt;code&gt;aws s3 get-object.&lt;/code&gt;
&lt;/li&gt;&lt;li&gt;The AWS SDK runs through the &lt;a href='https://docs.aws.amazon.com/sdkref/latest/guide/standardized-credentials.html#credentialProviderChain' title=''&gt;credential provider chain&lt;/a&gt;
&lt;/li&gt;&lt;li&gt;The SDK sees that &lt;code&gt;AWS_WEB_IDENTITY_TOKEN_FILE&lt;/code&gt; is set and calls &lt;code&gt;AssumeRoleWithWebIdentity&lt;/code&gt; with the file contents.
&lt;/li&gt;&lt;li&gt;AWS verifies the token against &lt;a href='https://oidc.fly.io/' title=''&gt;&lt;code&gt;https://oidc.fly.io/&lt;/code&gt;&lt;/a&gt;&lt;code&gt;example/.well-known/openid-configuration&lt;/code&gt;, which references a key Fly.io manages on isolated hardware.
&lt;/li&gt;&lt;li&gt;AWS vends &lt;code&gt;STS&lt;/code&gt; credentials for the assumed &lt;code&gt;Role&lt;/code&gt;.
&lt;/li&gt;&lt;li&gt;The SDK uses the &lt;code&gt;STS&lt;/code&gt; credentials to access the S3 bucket.
&lt;/li&gt;&lt;li&gt;AWS checks the &lt;code&gt;Role&lt;/code&gt;&amp;rsquo;s IAM policy to see if it has access to the S3 bucket.
&lt;/li&gt;&lt;li&gt;AWS returns the contents of the bucket object.
&lt;/li&gt;&lt;/ol&gt;
&lt;h2 id='how-much-better-is-this' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#how-much-better-is-this' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;How Much Better Is This?&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;It is a lot better.&lt;/p&gt;
&lt;div class="right-sidenote"&gt;&lt;p&gt;They asymptotically approach the security properties of Macaroon tokens.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;Most importantly: AWS &lt;code&gt;STS&lt;/code&gt; credentials are short-lived. Because they&amp;rsquo;re generated dynamically, rather than stored in a configuration file or environment variable, they&amp;rsquo;re already a little bit annoying for an attacker to recover. But they&amp;rsquo;re also dead in minutes. They have a sharply limited blast radius. They rotate themselves, and fail closed.&lt;/p&gt;

&lt;p&gt;They&amp;rsquo;re also easier to manage. This is a rare instance where you can reasonably drive the entire AWS side of the process from within the web console. Your cloud team adds &lt;code&gt;Roles&lt;/code&gt; all the time; this is just a &lt;code&gt;Role&lt;/code&gt; with an extra snippet of JSON. The resulting ARN isn&amp;rsquo;t even a secret; your cloud team could just email or Slack message it back to you.&lt;/p&gt;

&lt;p&gt;Finally, they offer finer-grained control.&lt;/p&gt;

&lt;p&gt;To understand the last part, let&amp;rsquo;s look at that extra snippet of JSON (the &amp;ldquo;Trust Policy&amp;rdquo;) your cloud team is sticking on the new &lt;code&gt;cat-bucket&lt;/code&gt; &lt;code&gt;Role&lt;/code&gt;:&lt;/p&gt;
&lt;div class="highlight-wrapper group relative "&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-wrap-target="#code-ex6t9zqy"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-copy-target="sibling"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight '&gt;&lt;code id="code-ex6t9zqy"&gt;{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Federated": "arn:aws:iam::123456123456:oidc-provider/oidc.fly.io/example"
            },
            "Action": "sts:AssumeRoleWithWebIdentity",
            "Condition": {
              "StringEquals": {
                "oidc.fly.io/example:aud": "sts.amazonaws.com",
              },
               "StringLike": {
                "oidc.fly.io/example:sub": "example:weather-cat:*"
              }
            }
        }
    ]
}
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;&lt;div class="right-sidenote"&gt;&lt;p&gt;The &lt;code&gt;aud&lt;/code&gt; check guarantees &lt;code&gt;STS&lt;/code&gt; will only honor tokens that Fly.io deliberately vended for &lt;code&gt;STS&lt;/code&gt;.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;Recall the OIDC token we dumped earlier; much of what&amp;rsquo;s in it, we can match in the Trust Policy. Every OIDC token Fly.io generates is going to have a &lt;code&gt;sub&lt;/code&gt; field formatted &lt;code&gt;org:app:machine&lt;/code&gt;, so we can lock IAM &lt;code&gt;Roles&lt;/code&gt; down to organizations, or to specific Fly Apps, or even specific Fly Machine instances.&lt;/p&gt;
&lt;figure class="post-cta"&gt;
  &lt;figcaption&gt;
    &lt;h1&gt;Speedrun your app onto Fly.io.&lt;/h1&gt;
    &lt;p&gt;3&amp;hellip;2&amp;hellip;1&amp;hellip;&lt;/p&gt;
      &lt;a class="btn btn-lg" href="https://fly.io/speedrun"&gt;
        Go! &amp;nbsp;&lt;span class='opacity-50'&gt;→&lt;/span&gt;
      &lt;/a&gt;
  &lt;/figcaption&gt;
  &lt;div class="image-container"&gt;
    &lt;img src="/static/images/cta-cat.webp" srcset="/static/images/[email protected] 2x" alt=""&gt;
  &lt;/div&gt;
&lt;/figure&gt;

&lt;h2 id='and-so' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#and-so' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;And So&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;In case it&amp;rsquo;s not obvious: this pattern works for any AWS API, not just S3.&lt;/p&gt;

&lt;p&gt;Our OIDC support on the platform and in Fly Machines will set arbitrary OIDC &lt;code&gt;audience&lt;/code&gt; strings, so you can use it to authenticate to any OIDC-compliant cloud provider. It won&amp;rsquo;t be as slick on Azure or GCP, because we haven&amp;rsquo;t done the &lt;code&gt;init&lt;/code&gt; features to light their APIs up with a single environment variable — but those features are easy, and we&amp;rsquo;re just waiting for people to tell us what they need.&lt;/p&gt;

&lt;p&gt;For us, the gold standard for least-privilege, conditional access tokens remains Macaroons, and it&amp;rsquo;s unlikely that we&amp;rsquo;re going to do a bunch of internal stuff using OIDC. We even snuck Macaroons into this feature. But the security you&amp;rsquo;re getting from this OIDC dance closes a lot of the gap between hardcoded user credentials and Macaroons, and it&amp;rsquo;s easy to use — easier, in some ways, than it is to manage role-based access inside of a legacy EC2 deployment!&lt;/p&gt;
</content>
  </entry>
  <entry>
    <title>Picture This: Open Source AI for Image Description</title>
    <link rel="alternate" href="https://fly.io/blog/llm-image-description/"/>
    <id>https://fly.io/blog/llm-image-description/</id>
    <published>2024-05-09T00:00:00+00:00</published>
    <updated>2025-05-20T16:26:25+00:00</updated>
    <media:thumbnail url="https://fly.io/blog/llm-image-description/assets/image-description-thumb.webp"/>
    <content type="html">&lt;div class="lead"&gt;&lt;p&gt;I’m Nolan, and I work on Fly Machines here at Fly.io. We’re building a new public cloud—one where you can spin up CPU and GPU workloads, around the world, in a jiffy. &lt;a href="https://fly.io/speedrun/" title=""&gt;Try us out&lt;/a&gt;; you can be up and running in minutes. This is a post about LLMs being really helpful, and an extensible project you can build with open source on a weekend.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;Picture this, if you will.&lt;/p&gt;

&lt;p&gt;You&amp;rsquo;re blind. You&amp;rsquo;re in an unfamiliar hotel room on a trip to Chicago.&lt;/p&gt;
&lt;div class="right-sidenote"&gt;&lt;p&gt;If you live in Chicago IRL, imagine the hotel in Winnipeg, &lt;a href="https://www.cbc.ca/history/EPISCONTENTSE1EP10CH3PA5LE.html" title=""&gt;the Chicago of the North&lt;/a&gt;.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;You&amp;rsquo;ve absent-mindedly set your coffee down, and can&amp;rsquo;t remember where. You&amp;rsquo;re looking for the thermostat so you don&amp;rsquo;t wake up frozen. Or, just maybe, you&amp;rsquo;re playing a fun-filled round of &amp;ldquo;find the damn light switch so your sighted partner can get some sleep already!&amp;rdquo;&lt;/p&gt;

&lt;p&gt;If, like me, you&amp;rsquo;ve been blind for a while, you have plenty of practice finding things without the luxury of a quick glance around. It may be more tedious than you&amp;rsquo;d like, but you&amp;rsquo;ll get it done.&lt;/p&gt;

&lt;p&gt;But the speed of innovation in machine learning and large language models has been dizzying, and in 2024 you can snap a photo with your phone and have an app like &lt;a href='https://www.bemyeyes.com/blog/announcing-be-my-ai/' title=''&gt;Be My AI&lt;/a&gt; or &lt;a href='https://www.seeingai.com/' title=''&gt;Seeing AI&lt;/a&gt; tell you where in that picture it found your missing coffee mug, or where it thinks the light switch is.&lt;/p&gt;
&lt;div class="right-sidenote"&gt;&lt;p&gt;Creative switch locations seem to be a point of pride for hotels, so the light game may be good for a few rounds of quality entertainment, regardless of how good your AI is.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;This is &lt;em&gt;big&lt;/em&gt;. It&amp;rsquo;s hard for me to state just how exciting and empowering AI image descriptions have been for me without sounding like a shill. In the past year, I&amp;rsquo;ve:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Found shit in strange hotel rooms. 
&lt;/li&gt;&lt;li&gt;Gotten descriptions of scenes and menus in otherwise inaccessible video games.
&lt;/li&gt;&lt;li&gt;Requested summaries of technical diagrams and other materials where details weren’t made available textually. 
&lt;/li&gt;&lt;/ul&gt;

&lt;p&gt;I&amp;rsquo;ve been consistently blown away at how impressive and helpful AI-created image descriptions have been.&lt;/p&gt;

&lt;p&gt;Also&amp;hellip;&lt;/p&gt;
&lt;h2 id='which-thousand-words-is-this-picture-worth' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#which-thousand-words-is-this-picture-worth' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Which thousand words is this picture worth?&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;As a blind internet user for the last three decades, I have extensive empirical evidence to corroborate what you already know in your heart: humans are pretty flaky about writing useful alt text for all the images they publish. This does tend to make large swaths of the internet inaccessible to me!&lt;/p&gt;

&lt;p&gt;In just a few years, the state of image description on the internet has gone from complete reliance on the aforementioned lovable, but ultimately tragically flawed, humans, to automated strings of words like &lt;code&gt;Image may contain person, glasses, confusion, banality, disillusionment&lt;/code&gt;, to LLM-generated text that reads a lot like it was written by a person, perhaps sipping from a steaming cup of Earl Grey as they reflect on their previous experiences of a background that features a tree with snow on its branches, suggesting that this scene takes place during winter.&lt;/p&gt;

&lt;p&gt;If an image is missing alt text, or if you want a second opinion, there are screen-reader addons, like &lt;a href='https://github.com/cartertemm/AI-content-describer/' title=''&gt;this one&lt;/a&gt; for &lt;a href='https://www.nvaccess.org/download/' title=''&gt;NVDA&lt;/a&gt;, that you can use with an API key to get image descriptions from GPT-4 or Google Gemini as you read. This is awesome! &lt;/p&gt;

&lt;p&gt;And this brings me to the nerd snipe. How hard would it be to build an image description service we can host ourselves, using open source technologies? It turns out to be spookily easy.&lt;/p&gt;

&lt;p&gt;Here&amp;rsquo;s what I came up with:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;a href='https://ollama.com/' title=''&gt;Ollama&lt;/a&gt; to run the model
&lt;/li&gt;&lt;li&gt;A &lt;a href='https://pocketbase.io' title=''&gt;PocketBase&lt;/a&gt; project that provides a simple authenticated API for users to submit images, get descriptions, and ask followup questions about the image
&lt;/li&gt;&lt;li&gt;The simplest possible Python client to interact with the PocketBase app on behalf of users
&lt;/li&gt;&lt;/ol&gt;

&lt;p&gt;The idea is to keep it modular and hackable, so if sentiment analysis or joke creation is your thing, you can swap out image description for that and have something going in, like, a weekend.&lt;/p&gt;

&lt;p&gt;If you&amp;rsquo;re like me, and you go skipping through recipe blogs to find the &amp;ldquo;go directly to recipe&amp;rdquo; link, find the code itself &lt;a href='https://github.com/superfly/llm-describer' title=''&gt;here&lt;/a&gt;. &lt;/p&gt;
&lt;h2 id='the-llm-is-the-easiest-part' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#the-llm-is-the-easiest-part' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;The LLM is the easiest part&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;An API to accept images and prompts, run the model, and spit 
out answers sounds like a lot! But it&amp;rsquo;s the simplest part of this whole thing, because: 
that&amp;rsquo;s &lt;a href='https://ollama.com/' title=''&gt;Ollama&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;You can just run the Ollama Docker image, get it to grab the model 
you want to use, and that&amp;rsquo;s it. There&amp;rsquo;s your AI server. (We have a &lt;a href='https://fly.io/blog/scaling-llm-ollama/' title=''&gt;blog post&lt;/a&gt; 
all about deploying Ollama on Fly.io; Fly GPUs are rad, try&amp;#39;em out, etc.).&lt;/p&gt;

&lt;p&gt;For this project, we need a model that can make sense&amp;mdash;or at least words&amp;mdash;out of a picture. 
&lt;a href='https://llava-vl.github.io/' title=''&gt;LLaVA&lt;/a&gt; is a trained, Apache-licensed &amp;ldquo;large multimodal model&amp;rdquo; that fits the bill. 
Get the model with the Ollama CLI:&lt;/p&gt;
&lt;div class="highlight-wrapper group relative cmd"&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-wrap-target="#code-jo727efe"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-copy-target="sibling"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight cmd'&gt;&lt;code id="code-jo727efe"&gt;ollama pull llava:34b
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;&lt;div class="callout"&gt;&lt;p&gt;If you have hardware that can handle it, you could run this on your computer at home. If you run AI models on a cloud provider, be aware that GPU compute is expensive! &lt;strong class="font-semibold text-navy-950"&gt;It’s important to take steps to ensure you’re not paying for a massive GPU 24/7.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;On Fly.io, at the time of writing, you’d achieve this with the &lt;a href="https://fly.io/docs/apps/autostart-stop/" title=""&gt;autostart and autostop&lt;/a&gt; functions of the Fly Proxy, restricting Ollama access to internal requests over &lt;a href="https://fly.io/docs/networking/private-networking/#flycast-private-fly-proxy-services" title=""&gt;Flycast&lt;/a&gt; from the PocketBase app. That way, if there haven’t been any requests for a few minutes, the Fly Proxy stops the Ollama &lt;a href="https://fly.io/docs/machines/" title=""&gt;Machine&lt;/a&gt;, which releases the CPU, GPU, and RAM allocated to it. &lt;a href="https://fly.io/blog/scaling-llm-ollama/" title=""&gt;Here’s a post&lt;/a&gt; that goes into more detail. &lt;/p&gt;
&lt;/div&gt;&lt;h2 id='a-multi-tool-on-the-backend' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#a-multi-tool-on-the-backend' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;A multi-tool on the backend&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;I want user auth to make sure just anyone can&amp;rsquo;t grab my &amp;ldquo;image description service&amp;rdquo; and keep it busy generating short stories about their cat. If I build this out into a service for others to use, I might also want business logic around plans or
credits, or mobile-friendly APIs for use in the field. &lt;a href='https://pocketbase.io' title=''&gt;PocketBase&lt;/a&gt; provides a scaffolding for all of it. It&amp;rsquo;s a Swiss army knife: a Firebase-like API on top of SQLite, complete with authentication, authorization, an admin UI, extensibility in JavaScript and Go, and various client-side APIs.&lt;/p&gt;
&lt;div class="right-sidenote"&gt;&lt;p&gt;Yes, &lt;em&gt;of course&lt;/em&gt; I’ve used an LLM to generate feline fanfic. Theme songs too. Hasn’t everyone? &lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;I &amp;ldquo;faked&amp;rdquo; a task-specific API that supports followup questions by extending PocketBase in Go, modeling requests and responses as &lt;a href='https://pocketbase.io/docs/collections/' title=''&gt;collections&lt;/a&gt; (i.e. SQLite tables) with &lt;a href='https://pocketbase.io/docs/go-event-hooks/' title=''&gt;event hooks&lt;/a&gt; to trigger pre-set interactions with the Ollama app (via &lt;a href='https://tmc.github.io/langchaingo' title=''&gt;LangChainGo&lt;/a&gt;) and the client (via the PocketBase API).&lt;/p&gt;

&lt;p&gt;If you&amp;rsquo;re following along, &lt;a href='https://github.com/superfly/llm-describer/blob/main/describer.go' title=''&gt;here&amp;rsquo;s the module&lt;/a&gt;
that handles all that, along with initializing the LLM connection.&lt;/p&gt;

&lt;p&gt;In a nutshell, this is the dance:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;When a user uploads an image, a hook on the &lt;code&gt;images&lt;/code&gt; collection sends the image to Ollama, along with this prompt:
&lt;code&gt;&amp;quot;You are a helpful assistant describing images for blind screen reader users. Please describe this image.&amp;quot;&lt;/code&gt;
&lt;/li&gt;&lt;li&gt;Ollama sends back its response, which the backend 1) passes back to the client and 2) stores in its &lt;code&gt;followups&lt;/code&gt; collection for future reference.
&lt;/li&gt;&lt;li&gt;If the user responds with a followup question about the image and description, that also 
goes into the &lt;code&gt;followups&lt;/code&gt; collection; user-initiated changes to this collection trigger a hook to chain the new 
followup question with the image and the chat history into a new request for the model.
&lt;/li&gt;&lt;li&gt;Lather, rinse, repeat.
&lt;/li&gt;&lt;/ul&gt;

&lt;p&gt;This is a super simple hack to handle followup questions, and it’ll let you keep adding followups until 
something breaks. You&amp;rsquo;ll see the quality of responses get poorer&amp;mdash;possibly incoherent&amp;mdash;as the context 
exceeds the context window.&lt;/p&gt;

&lt;p&gt;I also set up &lt;a href='https://pocketbase.io/docs/api-rules-and-filters/' title=''&gt;API rules&lt;/a&gt; in PocketBase,
ensuring that users can&amp;rsquo;t read to and write from others&amp;rsquo; chats with the AI.&lt;/p&gt;

&lt;p&gt;If image descriptions aren&amp;rsquo;t your thing, this business logic is easily swappable 
for joke generation, extracting details from text, any other simple task you 
might want to throw at an LLM. Just slot the best model into Ollama (LLaVA is pretty OK as a general starting point too), and match the PocketBase schema and pre-set prompts to your application.&lt;/p&gt;
&lt;h2 id='a-seedling-of-a-client' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#a-seedling-of-a-client' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;A seedling of a client&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;With the image description service in place, the user can talk to it with any client that speaks the PocketBase API. PocketBase already has SDK clients in JavaScript and Dart, but because my screen reader is &lt;a href='https://github.com/nvaccess/nvda' title=''&gt;written in Python&lt;/a&gt;, I went with a &lt;a href='https://pypi.org/project/pocketbase/' title=''&gt;community-created Python library&lt;/a&gt;. That way I can build this out into an NVDA add-on 
if I want to.&lt;/p&gt;

&lt;p&gt;If you&amp;rsquo;re a fancy Python developer, you probably have your preferred tooling for
handling virtualenvs and friends. I&amp;rsquo;m not, and since my screen reader doesn&amp;rsquo;t use those
anyway, I just &lt;code&gt;pip install&lt;/code&gt;ed the library so my client can import it:&lt;/p&gt;
&lt;div class="highlight-wrapper group relative cmd"&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-wrap-target="#code-8wi7b0yx"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-copy-target="sibling"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight cmd'&gt;&lt;code id="code-8wi7b0yx"&gt;pip install pocketbase
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;&lt;a href='https://github.com/superfly/llm-describer/blob/main/__init__.py' title=''&gt;My client&lt;/a&gt; is a very simple script. 
It expects a couple of things: a file called &lt;code&gt;image.jpg&lt;/code&gt;, located in the current directory, 
and environment variables to provide the service URL and user credentials to log into it with.&lt;/p&gt;

&lt;p&gt;When you run the client script, it uploads the image to the user’s &lt;code&gt;images&lt;/code&gt; collection on the 
backend app, starting the back-and-forth between user and model we saw in the previous section. 
The client prints the model&amp;rsquo;s output to the CLI and prompts the user to input a followup question, 
which it passes up to the &lt;code&gt;followups&lt;/code&gt; collection, and so on.&lt;/p&gt;
&lt;figure class="post-cta"&gt;
  &lt;figcaption&gt;
    &lt;h1&gt;This can run on Fly.io.&lt;/h1&gt;
    &lt;p&gt;Run your LLM on a datacenter-grade GPU.&lt;/p&gt;
      &lt;a class="btn btn-lg" href="https://fly.io/gpu/"&gt;
        Try out a Fly GPU &amp;nbsp;&lt;span class='opacity-50'&gt;→&lt;/span&gt;
      &lt;/a&gt;
  &lt;/figcaption&gt;
  &lt;div class="image-container"&gt;
    &lt;img src="/static/images/cta-turtle.webp" srcset="/static/images/[email protected] 2x" alt=""&gt;
  &lt;/div&gt;
&lt;/figure&gt;

&lt;h2 id='all-together-now' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#all-together-now' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;All together now&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;I grabbed &lt;a href='https://unsplash.com/photos/brown-trees-beside-river-under-blue-sky-during-daytime-kSvpTrfhaiU' title=''&gt;this
image&lt;/a&gt;
and saved it to a file called &lt;em&gt;image.jpg&lt;/em&gt;. &lt;/p&gt;

&lt;p&gt;While I knew I was downloading an image of a winter scene, all I see on Unsplash is:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;brown trees beside river under blue sky during daytime Bright winter landscape
with lake, snow, forest, beautiful blue sky and white clouds. An example of
charming wildlife in Russia.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Let&amp;rsquo;s see what our very own AI describer thinks of this picture:&lt;/p&gt;
&lt;div class="highlight-wrapper group relative plain"&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-wrap-target="#code-3cm0shen"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-copy-target="sibling"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight '&gt;&lt;code id="code-3cm0shen"&gt;$ python __init__.py
The image depicts a serene winter scene. There is a small stream of water flowing through the center, with patches of snow on its surface and along the banks. The surrounding area appears to be a mix of bare trees and shrubs, suggesting it's late in the season or possibly early spring when some plants have started to grow again but haven't yet leafed out.

The sky is clear and blue, indicating good weather conditions at the time the photo was taken. The sunlight seems to be coming from the left side of the image, casting a warm glow on the scene and creating reflections in the water. There are no visible texts or distinguishing marks that provide additional context about the location or the photographer.

The overall atmosphere is peaceful and natural, with the quiet beauty of the landscape undisturbed except for the gentle movement of the stream.
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Is it a stellar description? Maybe not, but it certainly gives me a better sense of connection with the scene.&lt;/p&gt;

&lt;p&gt;Let&amp;rsquo;s see how our describer copes with a followup question.&lt;/p&gt;
&lt;div class="highlight-wrapper group relative plain"&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-wrap-target="#code-o72lg1fm"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-copy-target="sibling"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight '&gt;&lt;code id="code-o72lg1fm"&gt;Enter your followup question, or 'quit' to quit: What types of trees are in the image?
Sending followup question
It's difficult to determine the exact species of trees in the image without more specific details, as many deciduous trees lose their leaves during winter and look quite similar. However, based on the general appearance of the bare branches and the fact that they are leafless, it is likely that these are common deciduous trees such as oaks, maples, or elms.

The presence of snow and the absence of foliage suggest that this photo was taken during winter or early spring when the trees have not yet started to bloom again. The exact species would require a closer examination of the bark, buds, and other distinguishing features, which are not clearly visible in this image.
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Boo, the general-purpose LLaVA model couldn&amp;rsquo;t identify the leafless trees. At least it knows why it can&amp;rsquo;t. Maybe there&amp;rsquo;s a better model out 
there for that. Or we could train one, if we really needed tree identification! We could make every component of 
this service more sophisticated! &lt;/p&gt;

&lt;p&gt;But that I, personally, can make a proof of concept like this with a few days of effort
continues to boggle my mind. Thanks to a handful of amazing open source projects, it&amp;rsquo;s really, spookily, easy. And from here, I (or you) can build out a screen-reader addon, or a mobile app, or a different kind of AI service, with modular changes.&lt;/p&gt;
&lt;h2 id='deployment-notes' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#deployment-notes' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Deployment notes&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;On Fly.io, stopping GPU Machines saves you a bunch of money and some carbon footprint, in return for cold-start latency when you make a request for the first time in more than a few minutes. In testing this project, on the &lt;code&gt;a100-40gb&lt;/code&gt; Fly Machine preset, the 34b-parameter LLaVA model took several seconds to generate each response. If the Machine was stopped when the request came in, starting it up took another handful of seconds, followed by several tens of seconds to load the model into GPU RAM. The total time from cold start to completed description was about 45 seconds. Just something to keep in mind.&lt;/p&gt;

&lt;p&gt;If you&amp;rsquo;re running Ollama in the cloud, you likely want to put the model onto storage that&amp;rsquo;s persistent, so you don&amp;rsquo;t have to download it repeatedly. You could also build the model into a Docker image ahead of deployment.&lt;/p&gt;

&lt;p&gt;The PocketBase Golang app compiles to a single executable that you can run wherever.
I run it on Fly.io, unsurprisingly, and the &lt;a href='https://github.com/superfly/llm-describer/' title=''&gt;repo&lt;/a&gt; comes with a Dockerfile and a &lt;a href='https://fly.io/docs/reference/configuration/' title=''&gt;&lt;code&gt;fly.toml&lt;/code&gt;&lt;/a&gt; config file, which you can edit to point at your own Ollama instance. It uses a small persistent storage volume for the SQLite database. Under testing, it runs fine on a &lt;code&gt;shared-cpu-1x&lt;/code&gt; Machine. &lt;/p&gt;
</content>
  </entry>
  <entry>
    <title>JIT WireGuard</title>
    <link rel="alternate" href="https://fly.io/blog/jit-wireguard-peers/"/>
    <id>https://fly.io/blog/jit-wireguard-peers/</id>
    <published>2024-03-12T00:00:00+00:00</published>
    <updated>2025-05-20T16:26:25+00:00</updated>
    <media:thumbnail url="https://fly.io/blog/jit-wireguard-peers/assets/network-thumbnail.webp"/>
    <content type="html">&lt;div class="lead"&gt;&lt;p&gt;We’re Fly.io and we transmute containers into VMs, running them on our hardware around the world with the power of Firecracker alchemy. We do a lot of stuff with WireGuard, which has become a part of our customer API. This is a quick story about some tricks we played to make WireGuard faster and more scalable for the hundreds of thousands of people who now use it here.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;One of many odd decisions we&amp;rsquo;ve made at Fly.io is how we use WireGuard. It&amp;rsquo;s not just that we use it in many places where other shops would use HTTPS and REST APIs. We&amp;rsquo;ve gone a step beyond that: every time you run &lt;code&gt;flyctl&lt;/code&gt;, our lovable, sprawling CLI, it conjures a TCP/IP stack out of thin air, with its own IPv6 address, and speaks directly to Fly Machines running on our networks.&lt;/p&gt;

&lt;p&gt;There are plusses and minuses to this approach, which we talked about &lt;a href='https://fly.io/blog/our-user-mode-wireguard-year/' title=''&gt;in a blog post a couple years back&lt;/a&gt;. Some things, like remote-operated Docker builders, get easier to express (a Fly Machine, as far as &lt;code&gt;flyctl&lt;/code&gt; is concerned, might as well be on the same LAN). But everything generally gets trickier to keep running reliably.&lt;/p&gt;

&lt;p&gt;It was a decision. We own it.&lt;/p&gt;

&lt;p&gt;Anyways, we&amp;rsquo;ve made some improvements recently, and I&amp;rsquo;d like to talk about them.&lt;/p&gt;
&lt;h2 id='where-we-left-off' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#where-we-left-off' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Where we left off&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;Until a few weeks ago, our gateways ran on a pretty simple system.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;We operate dozens of &amp;ldquo;gateway&amp;rdquo; servers around the world, whose sole purpose is to accept incoming WireGuard connections and connect them to the appropriate private networks.
&lt;/li&gt;&lt;li&gt;Any time you run &lt;code&gt;flyctl&lt;/code&gt; and it needs to talk to a Fly Machine (to build a container, pop an SSH console, copy files, or proxy to a service you&amp;rsquo;re running), it spawns or connects to a background agent process.
&lt;/li&gt;&lt;li&gt;The first time it runs, the agent generates a new WireGuard peer configuration from our GraphQL API. WireGuard peer configurations are very simple: just a public key and an address to connect to.
&lt;/li&gt;&lt;li&gt;Our API in turn takes that peer configuration and sends it to the appropriate gateway (say, &lt;code&gt;ord&lt;/code&gt;, if you&amp;rsquo;re near Chicago) via an RPC we send over the NATS messaging system.
&lt;/li&gt;&lt;li&gt;On the gateway, a service called &lt;code&gt;wggwd&lt;/code&gt; accepts that configuration, saves it to a SQLite database, and adds it to the kernel using WireGuard&amp;rsquo;s Golang libraries. &lt;code&gt;wggwd&lt;/code&gt; acknowledges the installation of the peer to the API.
&lt;/li&gt;&lt;li&gt;The API replies to your GraphQL request, with the configuration.
&lt;/li&gt;&lt;li&gt;Your &lt;code&gt;flyctl&lt;/code&gt; connects to the WireGuard peer, which works, because you receiving the configuration means it’s installed on the gateway.
&lt;/li&gt;&lt;/ol&gt;

&lt;p&gt;I copy-pasted those last two bullet points from &lt;a href='https://fly.io/blog/our-user-mode-wireguard-year/' title=''&gt;that two-year-old post&lt;/a&gt;, because when it works, it does &lt;em&gt;just work&lt;/em&gt; reasonably well. (We ultimately did end up defaulting everybody to WireGuard-over-WebSockets, though.)&lt;/p&gt;

&lt;p&gt;But if it always worked, we wouldn&amp;rsquo;t be here, would we?&lt;/p&gt;

&lt;p&gt;We ran into two annoying problems:&lt;/p&gt;

&lt;p&gt;One: NATS is fast, but doesn’t guarantee delivery. Back in 2022, Fly.io was pretty big on NATS internally. We&amp;rsquo;ve moved away from it. For instance, our &lt;a href='https://fly.io/blog/carving-the-scheduler-out-of-our-orchestrator/' title=''&gt;internal &lt;code&gt;flyd&lt;/code&gt; API&lt;/a&gt; used to be driven by NATS; today, it&amp;rsquo;s HTTP.  Our NATS cluster was losing too many messages to host a reliable API on it. Scaling back our use of NATS made WireGuard gateways better, but still not great.&lt;/p&gt;

&lt;p&gt;Two: When &lt;code&gt;flyctl&lt;/code&gt; exits, the WireGuard peer it created sticks around on the gateway. Nothing cleans up old peers. After all, you&amp;rsquo;re likely going to come back tomorrow and deploy a new version of your app, or &lt;code&gt;fly ssh console&lt;/code&gt; into it to debug something. Why remove a peer just to re-add it the next day?&lt;/p&gt;

&lt;p&gt;Unfortunately, the vast majority of peers are created by &lt;code&gt;flyctl&lt;/code&gt; in CI jobs, which don&amp;rsquo;t have persistent storage and can&amp;rsquo;t reconnect to the same peer the next run; they generate new peers every time, no matter what.&lt;/p&gt;

&lt;p&gt;So, we ended up with a not-reliable-enough provisioning system, and gateways with hundreds of thousands of peers that will never be used again. The high stale peer count made kernel WireGuard operations very slow - especially loading all the peers back into the kernel after a gateway server reboot - as well as some kernel panics.&lt;/p&gt;

&lt;p&gt;There had to be&lt;/p&gt;
&lt;h2 id='a-better-way' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#a-better-way' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;A better way.&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;Storing bajillions of WireGuard peers is no big challenge for any serious n-tier RDBMS. This isn&amp;rsquo;t &amp;ldquo;big data&amp;rdquo;. The problem we have at Fly.io is that our gateways don&amp;rsquo;t have serious n-tier RDBMSs. They&amp;rsquo;re small. Scrappy. They live off the land.&lt;/p&gt;

&lt;p&gt;Seriously, though: you could store every WireGuard peer everybody has ever used at Fly.io in a single SQLite database, easily.  What you can&amp;rsquo;t do is store them all in the Linux kernel.&lt;/p&gt;

&lt;p&gt;So, at some point, as you push more and more peer configurations to a gateway, you have to start making decisions about which peers you&amp;rsquo;ll enable in the kernel, and which you won&amp;rsquo;t.&lt;/p&gt;

&lt;p&gt;Wouldn&amp;rsquo;t it be nice if we just didn&amp;rsquo;t have this problem? What if, instead of pushing configs to gateways, we had the gateways pull them from our API on demand?&lt;/p&gt;

&lt;p&gt;If you did that, peers would only have to be added to the kernel when the client wanted to connect. You could yeet them out of the kernel any time you wanted; the next time the client connected, they&amp;rsquo;d just get pulled again, and everything would work fine.&lt;/p&gt;

&lt;p&gt;The problem you quickly run into to build this design is that Linux kernel WireGuard doesn&amp;rsquo;t have a feature for installing peers on demand. However:&lt;/p&gt;
&lt;h2 id='it-is-possible-to-jit-wireguard-peers' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#it-is-possible-to-jit-wireguard-peers' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;It is possible to JIT WireGuard peers&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;The Linux kernel&amp;rsquo;s &lt;a href='https://github.com/WireGuard/wgctrl-go' title=''&gt;interface for configuring WireGuard&lt;/a&gt; is &lt;a href='https://docs.kernel.org/userspace-api/netlink/intro.html' title=''&gt;Netlink&lt;/a&gt; (which is basically a way to create a userland socket to talk to a kernel service). Here&amp;rsquo;s a &lt;a href='https://github.com/WireGuard/wg-dynamic/blob/master/netlink.h' title=''&gt;summary of it as a C API&lt;/a&gt;. Note that there&amp;rsquo;s no API call to subscribe for &amp;ldquo;incoming connection attempt&amp;rdquo; events.&lt;/p&gt;

&lt;p&gt;That&amp;rsquo;s OK! We can just make our own events. WireGuard connection requests are packets, and they&amp;rsquo;re easily identifiable, so we can efficiently snatch them with a BPF filter and a &lt;a href='https://github.com/google/gopacket' title=''&gt;packet socket&lt;/a&gt;.&lt;/p&gt;
&lt;div class="callout"&gt;&lt;p&gt;Most of the time, it’s even easier for us to get the raw WireGuard packets, because our users now default to WebSockets WireGuard (which is just an unauthenticated WebSockets connect that shuttles framed UDP packets to and from an interface on the gateway), so that people who have trouble talking end-to-end in UDP can bring connections up.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;We own the daemon code for that, and can just hook the packet receive function to snarf WireGuard packets.&lt;/p&gt;

&lt;p&gt;It&amp;rsquo;s not obvious, but WireGuard doesn&amp;rsquo;t have notions of &amp;ldquo;client&amp;rdquo; or &amp;ldquo;server&amp;rdquo;. It&amp;rsquo;s a pure point-to-point protocol; peers connect to each other when they have traffic to send. The first peer to connect is called the &lt;strong class='font-semibold text-navy-950'&gt;initiator&lt;/strong&gt;, and the peer it connects to is the &lt;strong class='font-semibold text-navy-950'&gt;responder&lt;/strong&gt;.&lt;/p&gt;
&lt;div class="right-sidenote"&gt;&lt;p&gt;&lt;a href="https://www.wireguard.com/papers/wireguard.pdf" title=""&gt;&lt;em&gt;The WireGuard paper&lt;/em&gt;&lt;/a&gt; &lt;em&gt;is a good read.&lt;/em&gt;&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;For Fly.io, &lt;code&gt;flyctl&lt;/code&gt; is typically our initiator,  sending a single UDP packet to the gateway, which is the responder. According &lt;a href='https://www.wireguard.com/papers/wireguard.pdf' title=''&gt;to the WireGuard paper&lt;/a&gt;, this first packet is a &lt;code&gt;handshake initiation&lt;/code&gt;.  It gets better: the packet type is recorded in a single plaintext byte. So this simple BPF filter catches all the incoming connections: &lt;code&gt;udp and dst port 51820 and udp[8] = 1&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;In most other protocols, we&amp;rsquo;d be done at this point; we&amp;rsquo;d just scrape the username or whatnot out of the packet, go fetch the matching configuration, and install it in the kernel. With WireGuard, not so fast. WireGuard is based on Trevor Perrin&amp;rsquo;s &lt;a href='http://www.noiseprotocol.org/' title=''&gt;Noise Protocol Framework&lt;/a&gt;, and Noise goes way out of its way to &lt;a href='http://www.noiseprotocol.org/noise.html#identity-hiding' title=''&gt;hide identities&lt;/a&gt; during handshakes. To identify incoming requests, we&amp;rsquo;ll need to run enough Noise cryptography to decrypt the identity.&lt;/p&gt;

&lt;p&gt;The code to do this is fussy, but it&amp;rsquo;s relatively short (about 200 lines). Helpfully, the kernel Netlink interface will give a privileged process the private key for an interface, so the secrets we need to unwrap WireGuard are easy to get. Then it&amp;rsquo;s just a matter of running the first bit of the Noise handshake. If you&amp;rsquo;re that kind of nerdy, &lt;a href='https://gist.github.com/tqbf/9f2c2852e976e6566f962d9bca83062b' title=''&gt;here&amp;rsquo;s the code.&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;At this point, we have the event feed we wanted: the public keys of every user trying to make a WireGuard connection to our gateways. We keep a rate-limited cache in SQLite, and when we see new peers, we&amp;rsquo;ll make an internal HTTP API request to fetch the matching peer information and install it. This fits nicely into the little daemon that already runs on our gateways to manage WireGuard, and allows us to ruthlessly and recklessly remove stale peers with a &lt;code&gt;cron&lt;/code&gt; job.&lt;/p&gt;

&lt;p&gt;But wait! There&amp;rsquo;s more! We bounced this plan off Jason Donenfeld, and he tipped us off on a sneaky feature of the Linux WireGuard Netlink interface.&lt;/p&gt;
&lt;div class="right-sidenote"&gt;&lt;p&gt;Jason is the hardest working person in show business.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;Our API fetch for new peers is generally not going to be fast enough to respond to the first handshake initiation message a new client sends us. That&amp;rsquo;s OK; WireGuard is pretty fast about retrying. But we can do better.&lt;/p&gt;

&lt;p&gt;When we get an incoming initiation message, we have the 4-tuple address of the desired connection, including the ephemeral source port &lt;code&gt;flyctl&lt;/code&gt; is using. We can install the peer as if we&amp;rsquo;re the initiator, and &lt;code&gt;flyctl&lt;/code&gt; is the responder. The Linux kernel will initiate a WireGuard connection back to &lt;code&gt;flyctl&lt;/code&gt;. This works; the protocol doesn&amp;rsquo;t care a whole lot who&amp;rsquo;s the server and who&amp;rsquo;s the client. We get new connections established about as fast as they can possibly be installed.&lt;/p&gt;
&lt;figure class="post-cta"&gt;
  &lt;figcaption&gt;
    &lt;h1&gt;Launch an app in minutes&lt;/h1&gt;
    &lt;p&gt;Speedrun an app onto Fly.io and get your own JIT WireGuard peer&amp;nbsp✨&lt;/p&gt;
      &lt;a class="btn btn-lg" href="/docs/speedrun/"&gt;
        Speedrun &amp;nbsp;&lt;span class='opacity-50'&gt;→&lt;/span&gt;
      &lt;/a&gt;
  &lt;/figcaption&gt;
  &lt;div class="image-container"&gt;
    &lt;img src="/static/images/cta-cat.webp" srcset="/static/images/[email protected] 2x" alt=""&gt;
  &lt;/div&gt;
&lt;/figure&gt;

&lt;h2 id='look-at-this-graph' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#look-at-this-graph' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Look at this graph&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;We&amp;rsquo;ve been running this in production for a few weeks and we&amp;rsquo;re feeling pretty happy about it. We went from thousands, or hundreds of thousands, of stale WireGuard peers on a gateway to what rounds to none. Gateways now hold a lot less state, are faster at setting up peers, and can be rebooted without having to wait for many unused peers to be loaded back into the kernel.&lt;/p&gt;

&lt;p&gt;I&amp;rsquo;ll leave you with this happy Grafana chart from the day of the switchover.&lt;/p&gt;

&lt;p&gt;&lt;img alt="a Grafana chart of &amp;#39;kernel_stale_wg_peer_count&amp;#39; vs. time. For the first few hours, all traces are flat. Most are at values between 0 and 50,000 and the top-most is just under 550,000. Towards the end of the graph, each line in turn jumps sharply down to the bottom, and at the end of the chart all datapoints are indistinguishable from 0." src="/blog/jit-wireguard-peers/assets/wireguard-peers-graph.webp" /&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong class='font-semibold text-navy-950'&gt;Editor&amp;rsquo;s note:&lt;/strong&gt; Despite our tearful protests, Lillian has decided to move on from Fly.io to explore new pursuits. We wish her much success and happiness!&amp;nbsp;✨&lt;/p&gt;
</content>
  </entry>
  <entry>
    <title>Fly Kubernetes does more now</title>
    <link rel="alternate" href="https://fly.io/blog/fks-beta-live/"/>
    <id>https://fly.io/blog/fks-beta-live/</id>
    <published>2024-03-07T00:00:00+00:00</published>
    <updated>2025-05-20T16:26:25+00:00</updated>
    <media:thumbnail url="https://fly.io/blog/fks-beta-live/assets/fks-thumb.webp"/>
    <content type="html">&lt;div class="lead"&gt;&lt;p&gt;Eons ago, we &lt;a href="https://fly.io/blog/fks/" title=""&gt;announced&lt;/a&gt; we were working on &lt;a href="https://fly.io/docs/kubernetes/" title=""&gt;Fly Kubernetes&lt;/a&gt;. It drummed up enough excitement to prove we were heading in the right direction. So, we got hard to work to get from barebones “early access” to a beta release. We’ll be onboarding customers to the closed beta over the next few weeks. Email us at &lt;a href="mailto:[email protected]"&gt;[email protected]&lt;/a&gt; and we’ll hook you up.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;Fly Kubernetes is the &amp;ldquo;blessed path&amp;quot;™️ to using Kubernetes backed by Fly.io infrastructure. Or, in simpler terms, it is our managed Kubernetes service. We take care of the complexity of operating the Kubernetes control plane, leaving you with the unfettered joy of deploying your Kubernetes workloads. If you love Fly.io and K8s, this product is for you.&lt;/p&gt;
&lt;h2 id='what-even-is-a-kubernete' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#what-even-is-a-kubernete' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;What even is a Kubernete?&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;So how did this all come to be&amp;mdash;and what even is a Kubernete?&lt;/p&gt;
&lt;div class="right-sidenote"&gt;&lt;p&gt;You can see more fun details in &lt;a href="https://fly.io/blog/fks/" title=""&gt;Introducing Fly Kubernetes&lt;/a&gt;.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;If you wade through all the YAML and &lt;a href='https://landscape.cncf.io/' title=''&gt;CNCF projects&lt;/a&gt;, what&amp;rsquo;s left is an API for declaring workloads and how it should be accessed. &lt;/p&gt;

&lt;p&gt;But that&amp;rsquo;s not what people usually talk / groan about. It&amp;rsquo;s everything else that comes along with adopting Kubernetes: a container runtime (CRI), networking between workloads (CNI) which leads to DNS (CoreDNS). Then you layer on Prometheus for metrics and whatever the logging daemon du jour is at the time. Now you get to debate which Ingress&amp;mdash;strike that&amp;mdash;&lt;em&gt;Gateway&lt;/em&gt; API to deploy and if the next thing is anything to do with a Service Mess, then as they like to say where I live, &amp;quot;bless your heart&amp;rdquo;.&lt;/p&gt;

&lt;p&gt;Finally, there&amp;rsquo;s capacity planning. You&amp;rsquo;ve got to pick and choose where, how and what the &lt;a href='https://kubernetes.io/docs/concepts/architecture/nodes/' title=''&gt;Nodes&lt;/a&gt; will look like in order to configure and run the workloads.&lt;/p&gt;

&lt;p&gt;When we began thinking about what a Fly Kubernetes Service could look like, we started from first principles, as we do with most everything here. The best way we can describe it is the &lt;a href='https://www.youtube.com/watch?v=Ddk9ci6geSs' title=''&gt;scene from Iron Man 2 when Tony Stark discovers a new element&lt;/a&gt;. As he&amp;rsquo;s looking at the knowledge left behind by those that came before, he starts to imagine something entirely different and more capable than could have been accomplished previously. That&amp;rsquo;s what happened to JP, but with K3s and Virtual Kubelet.&lt;/p&gt;
&lt;h2 id='ok-then-wtf-whats-the-fks' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#ok-then-wtf-whats-the-fks' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;OK then, WTF (what&amp;rsquo;s the FKS)?&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;We looked at what people need to get started—the API—and then started peeling away all the noise, filling in the gaps to connect things together to provide the power. Here&amp;rsquo;s how this looks currently:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Containerd/CRI → &lt;a href='https://fly.io/blog/carving-the-scheduler-out-of-our-orchestrator/' title=''&gt;flyd&lt;/a&gt; + Firecracker + &lt;a href='https://fly.io/blog/docker-without-docker/' title=''&gt;our init&lt;/a&gt;: our system transmogrifies Docker containers into Firecracker microVMs
&lt;/li&gt;&lt;li&gt;Networking/CNI → Our &lt;a href='https://fly.io/blog/ipv6-wireguard-peering/' title=''&gt;internal WireGuard mesh&lt;/a&gt; connects your pods together
&lt;/li&gt;&lt;li&gt;Pods → Fly Machines VMs
&lt;/li&gt;&lt;li&gt;Secrets → Secrets, only not the base64&amp;rsquo;d kind
&lt;/li&gt;&lt;li&gt;Services → The Fly Proxy
&lt;/li&gt;&lt;li&gt;CoreDNS → CoreDNS (to be replaced with our custom internal DNS)
&lt;/li&gt;&lt;li&gt;Persistent Volumes → Fly Volumes (coming soon)
&lt;/li&gt;&lt;/ul&gt;

&lt;p&gt;Now&amp;hellip;not everything is a one-to-one comparison, and we explicitly did not set out to support any and every configuration. We aren&amp;rsquo;t dealing with resources like Network Policy and init containers, though we&amp;rsquo;re also not completely ignoring them. By mapping many of the core primitives of Kubernetes to a Fly.io resource, we&amp;rsquo;re able to focus on continuing to build the primitives that make our cloud better for workloads of all shapes and sizes.&lt;/p&gt;

&lt;p&gt;A key thing to notice above is that there&amp;rsquo;s no &amp;ldquo;Node&amp;rdquo;.&lt;/p&gt;

&lt;p&gt;&lt;a href='https://virtual-kubelet.io/' title=''&gt;Virtual Kubelet&lt;/a&gt; plays a central role in FKS. It&amp;rsquo;s magic, really. A Virtual Kubelet acts as if it&amp;rsquo;s a standard Kubelet running on a Node, eager to run your workloads. However, there&amp;rsquo;s no Node backing it. It instead behaves like an API, receiving requests from Kubernetes and transforming them into requests to deploy on a cloud compute service. In our case, that&amp;rsquo;s Fly Machines.&lt;/p&gt;

&lt;p&gt;So what we have is Kubernetes calling out to our &lt;a href='https://virtual-kubelet.io/docs/providers/' title=''&gt;Virtual Kubelet provider&lt;/a&gt;, a small Golang program we run alongside K3s, to create and run your pod. It creates &lt;a href='https://fly.io/blog/docker-without-docker/' title=''&gt;your pod as a Fly Machine&lt;/a&gt;, via the &lt;a href='/docs/machines/api/' title=''&gt;Fly Machines API&lt;/a&gt;, deploying it to any underlying host within that region. This shifts the burden of managing hardware capacity from you to us. We think that&amp;rsquo;s a cool trick&amp;mdash;thanks, Virtual Kubelet magic!&lt;/p&gt;
&lt;h2 id='speedrun' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#speedrun' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Speedrun&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;You can deploy your workloads (including GPUs) across any of our available regions using the Kubernetes API.&lt;/p&gt;

&lt;p&gt;You create a cluster with &lt;code&gt;flyctl&lt;/code&gt;:&lt;/p&gt;
&lt;div class="highlight-wrapper group relative cmd"&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-wrap-target="#code-ly2t5r5g"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-copy-target="sibling"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight cmd'&gt;&lt;code id="code-ly2t5r5g"&gt;fly ext k8s create --name hello --org personal --region iad
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;When a cluster is created, it has the standard &lt;code&gt;default&lt;/code&gt; namespace. You can inspect it:&lt;/p&gt;
&lt;div class="highlight-wrapper group relative cmd"&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-wrap-target="#code-6wmi0e1f"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-copy-target="sibling"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight cmd'&gt;&lt;code id="code-6wmi0e1f"&gt;kubectl get ns default --show-labels
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;&lt;div class="highlight-wrapper group relative output"&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-wrap-target="#code-vwi22w41"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-copy-target="sibling"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight output whitespace-pre'&gt;&lt;code id="code-vwi22w41"&gt;NAME      STATUS   AGE   LABELS
default   Active   20d   fly.io/app=fks-default-7zyjm3ovpdxmd0ep,kubernetes.io/metadata.name=default
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;The &lt;code&gt;fly.io/app&lt;/code&gt; label shows the name of the Fly App that corresponds to your cluster.&lt;/p&gt;

&lt;p&gt;It would seem appropriate to deploy the &lt;a href='https://github.com/kubernetes-up-and-running/kuard' title=''&gt;Kubernetes Up And Running demo&lt;/a&gt; here, but since your pods are connected over an &lt;a href='https://fly.io/blog/ipv6-wireguard-peering/' title=''&gt;IPv6 WireGuard mesh&lt;/a&gt;, we&amp;rsquo;re going to use a &lt;a href='https://github.com/jipperinbham/kuard' title=''&gt;fork&lt;/a&gt; with support for &lt;a href='https://github.com/kubernetes-up-and-running/kuard/issues/46' title=''&gt;IPv6 DNS&lt;/a&gt;.&lt;/p&gt;
&lt;div class="highlight-wrapper group relative cmd"&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-wrap-target="#code-1t0a2uma"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-copy-target="sibling"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight cmd'&gt;&lt;code id="code-1t0a2uma"&gt;kubectl run \
  --image=ghcr.io/jipperinbham/kuard-amd64:blue \
  --labels="app=kuard-fks" \
  kuard
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;And you can see its Machine representation via:&lt;/p&gt;
&lt;div class="highlight-wrapper group relative cmd"&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-wrap-target="#code-leb9l9a3"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-copy-target="sibling"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight cmd'&gt;&lt;code id="code-leb9l9a3"&gt;fly machine list --app fks-default-7zyjm3ovpdxmd0ep
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;&lt;div class="highlight-wrapper group relative output"&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-wrap-target="#code-w2frd6os"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-copy-target="sibling"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight output whitespace-pre'&gt;&lt;code id="code-w2frd6os"&gt;ID              NAME        STATE   REGION  IMAGE                           IP ADDRESS                      VOLUME  CREATED                 LAST UPDATED            APP PLATFORM    PROCESS GROUP   SIZE
1852291c46ded8  kuard       started iad     jipperinbham/kuard-amd64:blue   fdaa:0:48c8:a7b:228:4b6d:6e20:2         2024-03-05T18:54:41Z    2024-03-05T18:54:44Z                                    shared-cpu-1x:256MB
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;&lt;/div&gt;&lt;/p&gt;

&lt;p&gt;This is important! Your pod is a Fly Machine! While we don’t yet support all kubectl features, Fly.io tooling will &amp;ldquo;just work&amp;rdquo; for cases where we don&amp;rsquo;t yet support the kubectl way. So, for example, we don&amp;rsquo;t have &lt;code&gt;kubectl port-forward&lt;/code&gt; and &lt;code&gt;kubectl exec&lt;/code&gt;, but you can use flyctl to forward ports and get a shell into a pod.&lt;/p&gt;

&lt;p&gt;Expose it to your internal network using the standard ClusterIP Service:&lt;/p&gt;
&lt;div class="highlight-wrapper group relative cmd"&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-wrap-target="#code-1xzi76dc"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-copy-target="sibling"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight cmd'&gt;&lt;code id="code-1xzi76dc"&gt;kubectl expose pod kuard \
  --name=kuard \
  --port=8080 \
  --target-port=8080 \
  --selector='app=kuard-fks'
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;ClusterIP Services work natively, and Fly.io internal DNS supports them. Within the cluster, CoreDNS works too.&lt;/p&gt;

&lt;p&gt;Access this Service locally via &lt;a href='https://fly.io/docs/networking/private-networking/#flycast-private-load-balancing' title=''&gt;flycast&lt;/a&gt;: Get connected to your org&amp;rsquo;s &lt;a href='https://fly.io/docs/networking/private-networking/' title=''&gt;6PN private WireGuard network&lt;/a&gt;. Get kubectl to describe the &lt;code&gt;kuard&lt;/code&gt; Service:&lt;/p&gt;
&lt;div class="highlight-wrapper group relative cmd"&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-wrap-target="#code-ucdw0m7q"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-copy-target="sibling"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight cmd'&gt;&lt;code id="code-ucdw0m7q"&gt;kubectl describe svc kuard
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;&lt;div class="highlight-wrapper group relative output"&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-wrap-target="#code-1uyr9m4j"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-copy-target="sibling"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight output'&gt;&lt;code id="code-1uyr9m4j"&gt;Name:              kuard
Namespace:         default
Labels:            app=kuard-fks
Annotations:       fly.io/clusterip-allocator: configured
                   service.fly.io/sync-version: 11507529969321451315
Selector:          app=kuard-fks
Type:              ClusterIP
IP Family Policy:  SingleStack
IP Families:       IPv6
IP:                fdaa:0:48c8:0:1::1a
IPs:               fdaa:0:48c8:0:1::1a
Port:              &amp;lt;unset&amp;gt;  8080/TCP
TargetPort:        8080/TCP
Endpoints:         [fdaa:0:48c8:a7b:228:4b6d:6e20:2]:8080
Session Affinity:  None
Events:            &amp;lt;none&amp;gt;
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;You can pull out the Service&amp;rsquo;s IP address from the above output, and get at the KUARD UI using that: in this case, &lt;code&gt;http://[fdaa:0:48c8:0:1::1a]:8080&lt;/code&gt;. &lt;/p&gt;

&lt;p&gt;Using internal DNS: &lt;code&gt;http://&amp;lt;service_name&amp;gt;.svc.&amp;lt;app_name&amp;gt;.flycast:8080&lt;/code&gt;. Or, in our example: &lt;code&gt;http://kuard.svc.fks-default-7zyjm3ovpdxmd0ep.flycast:8080&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;And finally CoreDNS: &lt;code&gt;&amp;lt;service_name&amp;gt;.&amp;lt;namespace&amp;gt;.svc.cluster.local&lt;/code&gt; resolves to the &lt;code&gt;fdaa&lt;/code&gt; IP and is routable within the cluster.&lt;/p&gt;
&lt;figure class="post-cta"&gt;
  &lt;figcaption&gt;
    &lt;h1&gt;Get in on the FKS beta&lt;/h1&gt;
    &lt;p&gt;Email us at [email protected]&lt;/p&gt;
  &lt;/figcaption&gt;
  &lt;div class="image-container"&gt;
    &lt;img src="/static/images/cta-cat.webp" srcset="/static/images/[email protected] 2x" alt=""&gt;
  &lt;/div&gt;
&lt;/figure&gt;

&lt;h2 id='pricing' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#pricing' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Pricing&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;The Fly Kubernetes Service is free during the beta. Fly Machines and Fly Volumes you create with it will cost the &lt;a href='https://fly.io/docs/about/pricing/' title=''&gt;same as for your other Fly.io projects&lt;/a&gt;. It&amp;rsquo;ll be &lt;a href='https://fly.io/docs/about/pricing/#fly-kubernetes' title=''&gt;$75/mo per cluster&lt;/a&gt; after that, plus the cost of the other resources you create.&lt;/p&gt;
&lt;h2 id='today-and-the-future' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#today-and-the-future' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Today and the future&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;Today, Fly Kubernetes supports only a portion of the Kubernetes API. You can deploy pods using Deployments/ReplicaSets. Pods are able to communicate via Services using the standard K8s DNS format. Ephemeral and persistent volumes are supported.&lt;/p&gt;

&lt;p&gt;The most notable absences are: multi-container pods, StatefulSets, network policies, horizontal pod autoscaling and emptyDir volumes. We&amp;rsquo;re working at supporting autoscaling and emptyDir volumes in the coming weeks and multi-container pods in the coming months.&lt;/p&gt;

&lt;p&gt;If you&amp;rsquo;ve made it this far and are eagerly awaiting your chance to tell us and the rest of the internet &amp;ldquo;this isn&amp;rsquo;t Kubernetes!&amp;rdquo;, well, we agree! It&amp;rsquo;s not something we take lightly. We&amp;rsquo;re still building, and conformance tests may be in the future for FKS. We&amp;rsquo;ve made a deliberate decision to only care about fast launching VMs as the one and only way to run workloads on our cloud. And we also know enough of our customers would like to use the Kubernetes API to create a fast launching VM in the form of a Pod, and that&amp;rsquo;s where this story begins. &lt;/p&gt;
</content>
  </entry>
  <entry>
    <title>Globally Distributed Object Storage with Tigris</title>
    <link rel="alternate" href="https://fly.io/blog/tigris-public-beta/"/>
    <id>https://fly.io/blog/tigris-public-beta/</id>
    <published>2024-02-15T00:00:00+00:00</published>
    <updated>2025-05-20T16:26:25+00:00</updated>
    <media:thumbnail url="https://fly.io/blog/tigris-public-beta/assets/tigris-public-beta-thumb.webp"/>
    <content type="html">&lt;div class="lead"&gt;&lt;p&gt;We’re Fly.io and we transmute containers into VMs, running them on our hardware around the world with the power of Firecracker alchemy. That’s pretty cool, but we want to talk about something someone else built, that &lt;a href="https://fly.io/docs/reference/tigris/" title=""&gt;you can use today&lt;/a&gt; to build applications.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;There are three hard things in computer science:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Cache invalidation
&lt;/li&gt;&lt;li&gt;Naming things
&lt;/li&gt;&lt;li&gt;&lt;a href='https://aws.amazon.com/s3/' title=''&gt;Doing a better job than Amazon of storing files&lt;/a&gt;
&lt;/li&gt;&lt;/ol&gt;

&lt;p&gt;Of all the annoying software problems that have no business being annoying, handling a file upload in a full-stack application stands apart, a universal if tractable malady, the plantar fasciitis of programming.&lt;/p&gt;

&lt;p&gt;Now, the actual act of clients placing files on servers is straightforward. Your framework &lt;a href='https://hexdocs.pm/phoenix/file_uploads.html' title=''&gt;has&lt;/a&gt; &lt;a href='https://edgeguides.rubyonrails.org/active_storage_overview.html' title=''&gt;a&lt;/a&gt; &lt;a href='https://docs.djangoproject.com/en/5.0/topics/http/file-uploads/' title=''&gt;feature&lt;/a&gt; &lt;a href='https://expressjs.com/en/resources/middleware/multer.html' title=''&gt;that&lt;/a&gt; &lt;a href='https://github.com/yesodweb/yesod-cookbook/blob/master/cookbook/Cookbook-file-upload-saving-files-to-server.md' title=''&gt;does&lt;/a&gt; &lt;a href='https://laravel.com/docs/10.x/filesystem' title=''&gt;it&lt;/a&gt;. What&amp;rsquo;s hard is making sure that uploads stick around to be downloaded later.&lt;/p&gt;
&lt;aside class="right-sidenote"&gt;&lt;p&gt;(yes, yes, we know, &lt;a href="https://youtu.be/b2F-DItXtZs?t=102" title=""&gt;sharding /dev/null&lt;/a&gt; is faster)&lt;/p&gt;
&lt;/aside&gt;
&lt;p&gt;Enter object storage, a pattern you may know by its colloquial name &amp;ldquo;S3&amp;rdquo;. Object storage occupies a funny place in software architecture, somewhere between a database and a filesystem. It&amp;rsquo;s like &lt;a href='https://man7.org/linux/man-pages/man3/malloc.3.html' title=''&gt;&lt;code&gt;malloc&lt;/code&gt;&lt;/a&gt;&lt;code&gt;()&lt;/code&gt;, but for cloud storage instead of program memory.&lt;/p&gt;

&lt;p&gt;&lt;a href='https://www.kleenex.com/en-us/' title=''&gt;S3&lt;/a&gt;—err, object storage — is so important that it was the second AWS service ever introduced (EC2 was not the first!). Everybody wants it. We know, because they keep asking us for it.&lt;/p&gt;

&lt;p&gt;So why didn&amp;rsquo;t we build it?&lt;/p&gt;

&lt;p&gt;Because we couldn&amp;rsquo;t figure out a way to improve on S3. And we still haven&amp;rsquo;t! But someone else did, at least for the kinds of applications we see on Fly.io.&lt;/p&gt;
&lt;h2 id='but-first-some-back-story' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#but-first-some-back-story' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;But First, Some Back Story&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;S3 checks all the boxes. It&amp;rsquo;s trivial to use. It&amp;rsquo;s efficient and cost-effective. It has redundancies that would make a DoD contractor blush. It integrates with archival services like Glacier. And every framework supports it. At some point, the IETF should just take a deep sigh and write an S3 API RFC, XML signatures and all.&lt;/p&gt;

&lt;p&gt;There&amp;rsquo;s at least one catch, though.&lt;/p&gt;

&lt;p&gt;Back in, like, &amp;lsquo;07 people ran all their apps from a single city. S3 was designed to work for those kinds of apps. The data, the bytes on the disks (or whatever weird hyperputer AWS stores S3 bytes on), live in one place. A specific place. In a specific data center. As powerful and inspiring as The Architects are, they are mortals, and must obey the laws of physics.&lt;/p&gt;

&lt;p&gt;This observation feels banal, until you realize how apps have changed in the last decade. Apps and their users don&amp;rsquo;t live in one specific place. They live all over the world. When users are close to the S3 data center, things are amazing! But things get less amazing the further away you get from the data center, and even less amazing the smaller and more frequent your reads and writes are.&lt;/p&gt;

&lt;p&gt;(Thought experiment: you have to pick one place in the world to route all your file storage. Where is it? Is it &lt;a href='https://www.tripadvisor.com/Restaurant_Review-g30246-d1956555-Reviews-Ford_s_Fish_Shack_Ashburn-Ashburn_Loudoun_County_Virginia.html' title=''&gt;Loudoun County, Virginia&lt;/a&gt;?)&lt;/p&gt;

&lt;p&gt;So, for many modern apps, you end up having to &lt;a href='https://stackoverflow.com/questions/32426249/aws-s3-bucket-with-multiple-regions' title=''&gt;write things into different regions&lt;/a&gt;, so that people close to the data get it from a region-specific bucket. Doing that pulls in CDN-caching things that complicated your application and put barriers between you and your data. Before you know it, you&amp;rsquo;re wearing custom orthotics on your, uh, developer feet. (&lt;em&gt;I am done with this metaphor now, I promise.&lt;/em&gt;)&lt;/p&gt;
&lt;aside class="right-sidenote"&gt;&lt;p&gt;(well, okay, Backblaze B2 because somehow my bucket fits into their free tier, but you get the idea)&lt;/p&gt;
&lt;/aside&gt;
&lt;p&gt;Personally, I know this happens. Because I had to build one! I run a &lt;a href='https://xeiaso.net/blog/xedn/' title=''&gt;CDN backend&lt;/a&gt; that&amp;rsquo;s a caching proxy for S3 in six continents across the world. All so that I can deliver images and video efficiently for the readers of my blog.&lt;/p&gt;
&lt;aside class="right-sidenote"&gt;&lt;p&gt;(shut up, it’s a sandwich)&lt;/p&gt;
&lt;/aside&gt;
&lt;p&gt;What if data was really global? For some applications, it might not matter much. But for others, it matters a lot. When a sandwich lover in Australia snaps a picture of a &lt;a href='https://en.wikipedia.org/wiki/Hamdog' title=''&gt;hamdog&lt;/a&gt;, the people most likely to want to see that photo are also in Australia. Routing those uploads and downloads through one building in Ashburn is no way to build a sandwich reviewing empire.&lt;/p&gt;

&lt;p&gt;Localizing all the data sounds like a hard problem. What if you didn&amp;rsquo;t need to change anything on your end to accomplish it?&lt;/p&gt;
&lt;h2 id='show-me-a-hero' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#show-me-a-hero' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Show Me A Hero&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;Building a miniature CDN infrastructure just to handle file uploads seems like the kind of thing that could take a week or so of tinkering. The Fly.io unified theory of cloud development is that solutions are completely viable for full-stack developers only when they take less than 2 hours to get working.&lt;/p&gt;

&lt;p&gt;AWS agrees, which is why they have a SKU for it, &lt;a href='https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/cloudfront_distribution' title=''&gt;called Cloudfront&lt;/a&gt;, which will, at some variably metered expense, optimize the read side of a single-write-region bucket: they&amp;rsquo;ll set up &lt;a href='https://fly.io/blog/the-5-hour-content-delivery-network/' title=''&gt;a simple caching CDN&lt;/a&gt; for you. You can probably get S3 and Cloudfront working within 2 hours, especially if you&amp;rsquo;ve set it up before.&lt;/p&gt;

&lt;p&gt;Our friends at Tigris have this problem down to single-digit minutes, and what they came up with is a lot cooler than a cache CDN.&lt;/p&gt;

&lt;p&gt;Here&amp;rsquo;s how it works. Tigris runs redundant FoundationDB clusters in our regions to track objects. They use Fly.io&amp;rsquo;s NVMe volumes as a first level of cached raw byte store, and a queuing system modelled on &lt;a href='https://www.foundationdb.org/files/QuiCK.pdf' title=''&gt;Apple&amp;rsquo;s QuiCK paper&lt;/a&gt; to distribute object data to multiple replicas, to regions where the data is in demand, and to 3rd party object stores… like S3.&lt;/p&gt;

&lt;p&gt;If your objects are less than about 128 kilobytes, Tigris makes them instantly global. By default! Things are just snappy, all over the world, automatically, because they&amp;rsquo;ve done all the work.&lt;/p&gt;

&lt;p&gt;But it gets better, because Tigris is also much more flexible than a cache simple CDN. It&amp;rsquo;s globally distributed from the jump, with inter-region routing baked into its distribution layer. Tigris isn&amp;rsquo;t a CDN, but rather a toolset that you can use to build arbitrary CDNs, with consistency guarantees, instant purge and relay regions.&lt;/p&gt;

&lt;p&gt;There&amp;rsquo;s a lot going on in this architecture, and it&amp;rsquo;d be fun to dig into it more. But for now, you don&amp;rsquo;t have to understand any of it. Because Tigris ties all this stuff together with an S3-compatible object storage API. If your framework can talk to S3, it can use Tigris.&lt;/p&gt;
&lt;h2 id='fly-storage' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#fly-storage' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;&lt;code&gt;fly storage&lt;/code&gt;&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;To get started with this, run the &lt;code&gt;fly storage create&lt;/code&gt; command:&lt;/p&gt;
&lt;div class="highlight-wrapper group relative "&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-wrap-target="#code-yj4e9y5v"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-copy-target="sibling"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight '&gt;&lt;code id="code-yj4e9y5v"&gt;$ fly storage create
Choose a name, use the default, or leave blank to generate one: xe-foo-images
Your Tigris project (xe-foo-images) is ready. See details and next steps with: https://fly.io/docs/reference/tigris/

Setting the following secrets on xe-foo:
AWS_REGION
BUCKET_NAME
AWS_ENDPOINT_URL_S3
AWS_ACCESS_KEY_ID
AWS_SECRET_ACCESS_KEY

Secrets are staged for the first deployment
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;All you have to do is fill in a bucket name. Hit enter. All of the configuration for the AWS S3 library will be injected into your application for you. And you don&amp;rsquo;t even need to change the libraries that you&amp;rsquo;re using. &lt;a href='https://www.tigrisdata.com/docs/sdks/s3/' title=''&gt;The Tigris examples&lt;/a&gt; all use the AWS libraries to put and delete objects into Tigris using the same calls that you use for S3.&lt;/p&gt;

&lt;p&gt;I know how this looks for a lot of you. It looks like we&amp;rsquo;re partnering with Tigris because we&amp;rsquo;re chicken, and we didn&amp;rsquo;t want to build something like this. Well, guess what: you&amp;rsquo;re right!&lt;/p&gt;

&lt;p&gt;Compute and networking: those are things we love and understand. Object storage? &lt;a href='https://fly.io/blog/the-5-hour-content-delivery-network/' title=''&gt;We already gave away the game on how we&amp;rsquo;d design a CDN for our own content&lt;/a&gt;, and it wasn&amp;rsquo;t nearly as slick as Tigris.&lt;/p&gt;

&lt;p&gt;Object storage is important. It needs to be good. We did not want to half-ass it. So we partnered with Tigris, so that they can put their full resources into making object storage as ✨magical✨ as Fly.io is.&lt;/p&gt;

&lt;p&gt;This also mirrors a lot of the Unix philosophy of Days Gone Past, you have individual parts that do one thing very well that are then chained together to create a composite result. I mean, come on, would you seriously want to buy your servers the same place you buy your shoes?&lt;/p&gt;
&lt;h2 id='one-bill-to-rule-them-all' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#one-bill-to-rule-them-all' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;One bill to rule them all&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;Well, okay, the main reason why you would want to do that is because having everything under one bill makes it really easy for your accounting people. So, to make one bill for your computer, your block storage, your databases, your networking, and your object storage, we&amp;rsquo;ve wrapped everything under one bill. You don&amp;rsquo;t have to create separate accounts with Supabase or Upstash or PlanetScale or Tigris. Everything gets charged to your Fly.io bill and you pay one bill per month.&lt;/p&gt;
&lt;aside class="right-sidenote"&gt;&lt;p&gt;This was actually going to be posted on Valentine’s Day, but we had to wait for the chocolate to go on sale.&lt;/p&gt;
&lt;/aside&gt;
&lt;p&gt;This is our Valentine&amp;rsquo;s Day gift to you all. Object storage that just works. Stay tuned because we have a couple exciting features that build on top of the integration of Fly.io and Tigris that allow really unique things, such as truly global static website hosting and turning your bucket into a CDN in 5 minutes at most.&lt;/p&gt;

&lt;p&gt;Here&amp;rsquo;s to many more happy developer days to come.&lt;/p&gt;
</content>
  </entry>
  <entry>
    <title>GPUs on Fly.io are available to everyone!</title>
    <link rel="alternate" href="https://fly.io/blog/gpu-ga/"/>
    <id>https://fly.io/blog/gpu-ga/</id>
    <published>2024-02-12T00:00:00+00:00</published>
    <updated>2025-05-20T16:26:25+00:00</updated>
    <media:thumbnail url="https://fly.io/blog/gpu-ga/assets/gpu-ga-thumb.webp"/>
    <content type="html">&lt;div class="lead"&gt;&lt;p&gt;Fly.io makes it easy to spin up compute around the world, now including powerful GPUs. Unlock the power of large language models, text transcription, and image generation with our datacenter-grade muscle!&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;GPUs are now available to everyone!&lt;/p&gt;

&lt;p&gt;We know you&amp;rsquo;ve been excited about wanting to use GPUs on Fly.io and we&amp;rsquo;re happy to announce that they&amp;rsquo;re available for everyone. If you want, you can spin up GPU instances with any of the following cards:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Ampere A100 (40GB) &lt;code&gt;a100-40gb&lt;/code&gt;
&lt;/li&gt;&lt;li&gt;Ampere A100 (80GB) &lt;code&gt;a100-80gb&lt;/code&gt;
&lt;/li&gt;&lt;li&gt;Lovelace L40s (48GB) &lt;code&gt;l40s&lt;/code&gt;
&lt;/li&gt;&lt;/ul&gt;

&lt;p&gt;To use a GPU instance today, change the &lt;code&gt;vm.size&lt;/code&gt; for one of your apps or processes to any of the above GPU kinds. Here&amp;rsquo;s how you can spin up an &lt;a href='https://ollama.ai' title=''&gt;Ollama&lt;/a&gt; server in seconds:&lt;/p&gt;
&lt;div class="highlight-wrapper group relative toml"&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-wrap-target="#code-j62lzfgm"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-copy-target="sibling"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight '&gt;&lt;code id="code-j62lzfgm"&gt;&lt;span class="py"&gt;app&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"your-app-name"&lt;/span&gt;
&lt;span class="py"&gt;region&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"ord"&lt;/span&gt;
&lt;span class="py"&gt;vm.size&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"l40s"&lt;/span&gt;

&lt;span class="nn"&gt;[http_service]&lt;/span&gt;
  &lt;span class="py"&gt;internal_port&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;11434&lt;/span&gt;
  &lt;span class="py"&gt;force_https&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;
  &lt;span class="py"&gt;auto_stop_machines&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
  &lt;span class="py"&gt;auto_start_machines&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
  &lt;span class="py"&gt;min_machines_running&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
  &lt;span class="py"&gt;processes&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nn"&gt;["app"]&lt;/span&gt;

&lt;span class="nn"&gt;[build]&lt;/span&gt;
  &lt;span class="py"&gt;image&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"ollama/ollama"&lt;/span&gt;

&lt;span class="nn"&gt;[mounts]&lt;/span&gt;
  &lt;span class="py"&gt;source&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"models"&lt;/span&gt;
  &lt;span class="py"&gt;destination&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"/root/.ollama"&lt;/span&gt;
  &lt;span class="py"&gt;initial_size&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"100gb"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Deploy this and bam, large language model inferencing from anywhere. If you want a private setup, see the article &lt;a href='https://fly.io/blog/scaling-llm-ollama/' title=''&gt;Scaling Large Language Models to zero with Ollama&lt;/a&gt; for more information. You never know when you have a sandwich emergency and don&amp;rsquo;t know what you can make with what you have on hand.&lt;/p&gt;

&lt;p&gt;We are working on getting some lower-cost A10 GPUs in the next few weeks. We&amp;rsquo;ll update you when they&amp;rsquo;re ready.&lt;/p&gt;

&lt;p&gt;If you want to explore the possibilities of GPUs on Fly.io, here&amp;rsquo;s a few articles that may give you ideas:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href='https://fly.io/blog/not-midjourney-bot/' title=''&gt;Deploy Your Own (Not) MidJourney Bot On Fly GPUs&lt;/a&gt;
&lt;/li&gt;&lt;li&gt;&lt;a href='https://fly.io/blog/scaling-llm-ollama/' title=''&gt;Scaling Large Language Models to zero with Ollama&lt;/a&gt;
&lt;/li&gt;&lt;li&gt;&lt;a href='https://fly.io/blog/transcribing-on-fly-gpu-machines/' title=''&gt;Transcribing on Fly GPU Machines&lt;/a&gt;
&lt;/li&gt;&lt;/ul&gt;

&lt;p&gt;Depending on factors such as your organization&amp;rsquo;s age and payment history, you may need to go through additional verification steps.&lt;/p&gt;

&lt;p&gt;If you&amp;rsquo;ve been experimenting with Fly.io GPUs and have made something cool, let us know on the &lt;a href='https://community.fly.io/' title=''&gt;Community Forums&lt;/a&gt; or by mentioning us &lt;a href='https://hachyderm.io/@flydotio' title=''&gt;on Mastodon&lt;/a&gt;! We&amp;rsquo;ll boost the cool ones.&lt;/p&gt;
</content>
  </entry>
  <entry>
    <title>Event Driven Machines</title>
    <link rel="alternate" href="https://fly.io/blog/event-driven-machines/"/>
    <id>https://fly.io/blog/event-driven-machines/</id>
    <published>2024-02-05T00:00:00+00:00</published>
    <updated>2025-05-20T16:26:25+00:00</updated>
    <media:thumbnail url="https://fly.io/blog/event-driven-machines/assets/lambdo-thumb.webp"/>
    <content type="html">&lt;div class="lead"&gt;&lt;p&gt;We’re Fly.io and we transmute containers into VMs, running them on our hardware around the world. We have fast booting VM’s, so why not &lt;a href="https://fly.io/docs/speedrun/" title=""&gt;take advantage of them&lt;/a&gt;?&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;Serverless is great because is has good ergonomics - when an event is received, a &amp;ldquo;not-server&amp;rdquo; boots quickly, code is run, and then everything is torn down. We&amp;rsquo;re billed only on usage.&lt;/p&gt;

&lt;p&gt;It turns out that Fly.io shares many of &lt;a href='https://fly.io/blog/the-serverless-server/' title=''&gt;the same ergonomics&lt;/a&gt; as serverless. Can we do a serverless on Fly.io? 🦆 Well, if it&amp;rsquo;s quacking like a duck, let&amp;rsquo;s call it a mallard.&lt;/p&gt;

&lt;p&gt;Here&amp;rsquo;s a useful pattern for triggering our own not-servers with Fly Machines.&lt;/p&gt;
&lt;h2 id='triggering-machines' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#triggering-machines' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Triggering Machines&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;I want to make Machines do some work based on my own events. Fly.io can already &lt;a href='https://fly.io/docs/apps/autostart-stop/' title=''&gt;stop Machines when idle&lt;/a&gt; based on HTTP, so let&amp;rsquo;s concentrate on non-HTTP events.&lt;/p&gt;

&lt;p&gt;The process of running evented Machines involves:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Listening for events
&lt;/li&gt;&lt;li&gt;Spinning up Fly Machines to run our code (with the events as context)
&lt;/li&gt;&lt;li&gt;Having event-aware code to run
&lt;/li&gt;&lt;/ol&gt;

&lt;p&gt;To do this, I made a project and named it &lt;a href='https://github.com/fly-apps/lambdo' title=''&gt;&lt;strong class='font-semibold text-navy-950'&gt;Lambdo&lt;/strong&gt;&lt;/a&gt; because reasons.
You can consider this project &amp;ldquo;reference architecture&amp;rdquo; in the same way you call a toddler&amp;rsquo;s scribbling &amp;ldquo;art&amp;rdquo;.&lt;/p&gt;

&lt;p&gt;The goal is to run some of our code on a fresh not-server when an event is received. We want this done efficiently - a Machine should only exist long enough to process an event or 3.&lt;/p&gt;

&lt;p&gt;Lambdo does just that - it receives some events, and spins up Fly Machines with those events placed &lt;em&gt;inside&lt;/em&gt; the VMs. Once the code finishes, the Machine is destroyed.&lt;/p&gt;
&lt;div class='group relative min-w-0 bg-white shadow-md shadow-navy-500/10 rounded-xl mb-7 ring-1 ring-navy-300/40'&gt;&lt;button type='button' class='bubble-wrap z-20 absolute right-2.5 top-2.5 text-transparent group-hover:text-navy-950 hocus:text-violet-600 bg-transparent group-hover:bg-white hocus:bg-violet-200/40 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none' data-wrap-target='#table-r8vpa1ai' data-wrap-type='nowrap'&gt;&lt;svg class='w-5 h-5 pointer-events-none' viewBox='0 0 20 20' fill='none' stroke='currentColor' stroke-width='1.5' stroke-linecap='round' stroke-linejoin='round'&gt;&lt;g buffered-rendering='static'&gt;&lt;path d='M11.912 10.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.314 2.314 0 00-2.315-2.31H4.959M15.187 14.5H4.959M8.802 10H4.959' /&gt;&lt;path d='M13.081 8.466l-1.548 1.571 1.548 1.571' /&gt;&lt;/g&gt;&lt;/svg&gt;&lt;span class='bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950'&gt;Wrap text&lt;/span&gt;&lt;/button&gt;&lt;div class='min-w-0 overflow-x-auto rounded-xl'&gt;&lt;table class='table-stripe table-stretch table-pad text-sm whitespace-nowrap m-0' id='table-r8vpa1ai'&gt;&lt;thead class='text-navy-950 text-left'&gt;&lt;tr&gt;
&lt;th style="text-align: center"&gt;&lt;img alt="the files are inside the computer" src="/blog/event-driven-machines/assets/files-are-inside-the-computer-cover.webp" /&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;
&lt;td style="text-align: center"&gt;The files are &lt;em&gt;in&lt;/em&gt; the computer!&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;&lt;/table&gt;&lt;/div&gt;&lt;/div&gt;&lt;h2 id='listening-for-events' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#listening-for-events' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Listening for Events&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;For our purposes, an event is just a JSON object. &lt;code&gt;{&amp;quot;any&amp;quot;: &amp;quot;object&amp;quot;, &amp;quot;will&amp;quot;: &amp;quot;do&amp;quot;}&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;We want to turn events into compute, so we need some sort of event system. I decided to use a queue.&lt;/p&gt;
&lt;h3 id='the-queue' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#the-queue' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;The Queue&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;The first thing I needed was a place to send events! I chose to use SQS, which let me continue to pretend servers don&amp;rsquo;t exist.&lt;/p&gt;

&lt;p&gt;It&amp;rsquo;s no surprise then that the first part of this project is &lt;a href='https://github.com/fly-apps/lambdo/blob/main/internal/sqs/get_events.go' title=''&gt;code that polls SQS&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;When the polling returns some non-zero number of events, it collects the SQS messages&amp;rsquo; JSON strings (and some meta data), resulting in an array of objects (a list of events).&lt;/p&gt;

&lt;p&gt;Then we send these events to some Machines.&lt;/p&gt;
&lt;h2 id='spinning-up-machines' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#spinning-up-machines' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Spinning Up Machines&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;Fly Machines are fast-booting Micro-VM&amp;rsquo;s, controlled by an &lt;a href='https://fly.io/docs/machines/working-with-machines/' title=''&gt;API&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;A feature of that API is the ability to &lt;a href='https://community.fly.io/t/machine-files/14453' title=''&gt;create files&lt;/a&gt; on a new Machine. This is how we&amp;rsquo;ll get our events into the Machine.&lt;/p&gt;

&lt;p&gt;When Lambdo creates a Machine, it places a file at &lt;code&gt;/tmp/events.json&lt;/code&gt;. Our code just needs to read that file and parse the JSON.&lt;/p&gt;
&lt;h3 id='running-our-code' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#running-our-code' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Running Our Code&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;Part of the ergonomics of Serverless is (usually) being limited to running just a function. Fly.io doesn&amp;rsquo;t really care what you run, which is to our advantage. We can choose to write discreet functions per event, or we can bring our whole &lt;a href='https://signalvnoise.com/svn3/the-majestic-monolith/' title=''&gt;Majestic Monolith&lt;/a&gt; to bear.&lt;/p&gt;

&lt;p&gt;How do we package up our code? The real answer is &amp;ldquo;however you want!&amp;rdquo;, but here&amp;rsquo;s 2 ideas.&lt;/p&gt;

&lt;p&gt;&lt;strong class='font-semibold text-navy-950'&gt;Use Your Existing Code Base&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;You can just use your existing code base. This is especially easy if you&amp;rsquo;re already deploying apps to Fly.io.&lt;/p&gt;

&lt;p&gt;All we&amp;rsquo;d need to do is add some additional code - a command perhaps (&lt;code&gt;rake&lt;/code&gt;, &lt;code&gt;artisan&lt;/code&gt;, whatever) - that sucks in that JSON, iterates over the events, and does some stuff.&lt;/p&gt;
&lt;div class="highlight-wrapper group relative php"&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-wrap-target="#code-38m30kym"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-copy-target="sibling"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight '&gt;&lt;code id="code-38m30kym"&gt;&lt;span class="nv"&gt;$events&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;json_decode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;file_get_contents&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"/tmp/events.json"&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;

&lt;span class="k"&gt;foreach&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$events&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nv"&gt;$event&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="c1"&gt;// do a thing&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;When we create an event, we&amp;rsquo;ll tell Lambdo how to run your code - more on that later.&lt;/p&gt;

&lt;p&gt;&lt;strong class='font-semibold text-navy-950'&gt;Use Lambdo&amp;rsquo;s Base Images&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This project also provides some &amp;ldquo;runtimes&amp;rdquo; (base images). This is a bit more &amp;ldquo;traditional serverless&amp;rdquo;, were you provide a function to run.&lt;/p&gt;

&lt;p&gt;Lambdo contains &lt;a href='https://github.com/fly-apps/lambdo/tree/main/runtimes' title=''&gt;two runtimes&lt;/a&gt; right now - Node and PHP. There could be more, of course, but you know&amp;hellip;lazy.&lt;/p&gt;

&lt;p&gt;The Node runtime &lt;a href='https://github.com/fly-apps/lambdo/blob/main/runtimes/js/src/index.js' title=''&gt;contains some code&lt;/a&gt; that will read the JSON payload file   (again, just an array of JSON events), and call a user-supplied JS function once per event.&lt;/p&gt;

&lt;p&gt;An &lt;a href='https://github.com/fly-apps/lambdo/tree/main/runtimes/js/sample-project' title=''&gt;example is here&lt;/a&gt; - our code just needs to export a function that does stuff to the given event:&lt;/p&gt;
&lt;div class="highlight-wrapper group relative javascript"&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-wrap-target="#code-wh6y1w2p"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-copy-target="sibling"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight '&gt;&lt;code id="code-wh6y1w2p"&gt;&lt;span class="c1"&gt;// File /app/index.js&lt;/span&gt;
&lt;span class="nx"&gt;exports&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;handler&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Let's process an event! The event:&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;The &lt;a href='https://github.com/fly-apps/lambdo/tree/main/runtimes/php' title=''&gt;PHP runtime&lt;/a&gt; is the same idea, a user-supplied handler looks like this:&lt;/p&gt;
&lt;div class="highlight-wrapper group relative php"&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-wrap-target="#code-jhm1hg3i"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-copy-target="sibling"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight '&gt;&lt;code id="code-jhm1hg3i"&gt;&lt;span class="c1"&gt;// File /app/index.php&lt;/span&gt;
&lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;function&lt;/span&gt; &lt;span class="n"&gt;function&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;array&lt;/span&gt; &lt;span class="nv"&gt;$event&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;// Do something with $event&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Explore the &lt;a href='https://github.com/fly-apps/lambdo/tree/main/runtimes' title=''&gt;runtime&lt;/a&gt; directory of the project to see how that&amp;rsquo;s put together.&lt;/p&gt;
&lt;h2 id='sending-an-event' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#sending-an-event' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Sending an Event&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;Since our events are sent via SQS queue, it would be helpful to see an example SQS message. Remember how I mentioned the SQS message has some meta data?&lt;/p&gt;

&lt;p&gt;Here&amp;rsquo;s an example, with said meta data:&lt;/p&gt;
&lt;div class="highlight-wrapper group relative bash"&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-wrap-target="#code-cgzk0n59"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-copy-target="sibling"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight '&gt;&lt;code id="code-cgzk0n59"&gt;aws sqs send-message &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--queue-url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;https://sqs.&amp;lt;region&amp;gt;.amazonaws.com/&amp;lt;account&amp;gt;/&amp;lt;queue&amp;gt; &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--message-body&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'{"foo": "bar"}'&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--message-attributes&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'{
"size":{"DataType":"String","StringValue":"performance-2x"}, 
"image":{"DataType":"String","StringValue":"fideloper/lambdo-php-sample:latest"}
}'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;The Body field of the SQS message is assumed to be a JSON string (it&amp;rsquo;s the event itself, and its contents are arbitrary - whatever makes sense for you).&lt;/p&gt;

&lt;p&gt;The message Attributes contains the meta data - up to 3 important details:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;code&gt;image&lt;/code&gt;: The image to run (it might be a Docker Hub image, or something you pushed to registry.fly.io). This is &lt;strong class='font-semibold text-navy-950'&gt;required&lt;/strong&gt;.
&lt;/li&gt;&lt;li&gt;&lt;code&gt;size&lt;/code&gt;: The CPU size and type to use† - defaults to &lt;code&gt;performance-2x&lt;/code&gt;
&lt;/li&gt;&lt;li&gt;&lt;code&gt;command&lt;/code&gt;: The command to run, which is the Docker &lt;code&gt;CMD&lt;/code&gt; equivalent - defaults to whatever your &lt;code&gt;CMD&lt;/code&gt; is set in the &lt;code&gt;Dockerfile&lt;/code&gt; used to create the Machine image.††
&lt;/li&gt;&lt;/ol&gt;

&lt;p&gt;†You can get valid values for the &lt;code&gt;size&lt;/code&gt; option by running &lt;code&gt;fly platform vm-sizes&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;††It&amp;rsquo;s an array form, e.g. &lt;code&gt;[&amp;quot;php&amp;quot;, &amp;quot;artisan&amp;quot;, &amp;quot;foo&amp;quot;]&lt;/code&gt;, you may need to do some escaping of double quotes if you&amp;rsquo;re sending messages to SQS via terminal.&lt;/p&gt;
&lt;h2 id='we-did-a-lambda' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#we-did-a-lambda' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;We did a Lambda?&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;Fly.io isn&amp;rsquo;t serverless, but it has all these primitives that add up to serverless. You have events, Fly.io has fast-booting VM&amp;rsquo;s. They just make sense together!&lt;/p&gt;

&lt;p&gt;What we did here is use &lt;a href='https://github.com/fly-apps/lambdo' title=''&gt;&lt;strong class='font-semibold text-navy-950'&gt;Lambdo&lt;/strong&gt; to respond to events by spinning up a Machine&lt;/a&gt;. Our code can process those events any way we want.&lt;/p&gt;

&lt;p&gt;What I like about this approach is how flexible it can be. We can choose the base image to use and the server type (even using GPU-enabled Machines) &lt;em&gt;per event&lt;/em&gt;.
Since we have full control over the Machine VM&amp;rsquo;s responding to the events, we can do whatever we want inside of them. Pretty neat!&lt;/p&gt;
</content>
  </entry>
  <entry>
    <title>Delegating tasks to Fly Machines</title>
    <link rel="alternate" href="https://fly.io/blog/delegate-tasks-to-fly-machines/"/>
    <id>https://fly.io/blog/delegate-tasks-to-fly-machines/</id>
    <published>2024-02-01T00:00:00+00:00</published>
    <updated>2025-05-20T16:26:25+00:00</updated>
    <media:thumbnail url="https://fly.io/blog/delegate-tasks-to-fly-machines/assets/delegate-tasks-to-fly-machines-thumb.webp"/>
    <content type="html">&lt;div class="lead"&gt;&lt;p&gt;We’re Fly.io. We run apps for our users on hardware we host around the world. Leveraging Fly.io Machines and Fly.io’s private network can make delegating expensive tasks a breeze. It’s easy to &lt;a href="/docs/speedrun/" title=""&gt;get started&lt;/a&gt;!&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;There are many ways to delegate work in web applications, from using background workers to serverless architecture. In this article, we explore a new machine pattern that takes advantage of Fly Machines and distinct process groups to make quick work of resource-intensive tasks.&lt;/p&gt;
&lt;h2 id='the-problem' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#the-problem' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;The Problem&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;Let&amp;rsquo;s say you&amp;rsquo;re building a web application that has a few tasks that demand a hefty amount of memory or CPU juice. Resizing images, for example, can require a shocking amount of memory, but you might not need that much memory &lt;em&gt;all&lt;/em&gt; of the time, for handling most of your web requests. Why pay for all that horsepower when you don&amp;rsquo;t need it most of the time?&lt;/p&gt;

&lt;p&gt;What if there&amp;rsquo;s a different way to delegate these resource-intensive tasks?&lt;/p&gt;
&lt;h2 id='the-solution' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#the-solution' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;The Solution&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;What if you could simply delegate these types of tasks to a more powerful machine &lt;em&gt;only&lt;/em&gt; when necessary? Let&amp;rsquo;s build an example of this method in a sample app. We&amp;rsquo;ll be using Next.js today, but this pattern is framework (and language) agnostic.&lt;/p&gt;

&lt;p&gt;Here&amp;rsquo;s how it will work:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A request hits an endpoint that does some resource-intensive tasks
&lt;/li&gt;&lt;li&gt;The request is passed on to a copy of your app that&amp;rsquo;s running on a more beefy machine
&lt;/li&gt;&lt;li&gt;The beefy machine performs the intensive work and then hands the result back to the user via the &amp;ldquo;weaker&amp;rdquo; machine.
&lt;/li&gt;&lt;/ul&gt;

&lt;p&gt;&lt;img alt="(A 3 panel comic of two characters, one small and one big and strong, both with computer screens for heads. Panel 1: Little guy hands the big guy a jar of pickles. Panel 2: Big guy opens the pickle jar. Panel 3: Big guy hands back the opened jar to the little guy, who is pleased; Illustration by Annie Sexton)" src="/blog/delegate-tasks-to-fly-machines/assets/./3-panel-comic-delegate-tasks-to-fly-machines.webp" /&gt;&lt;/p&gt;

&lt;p&gt;To demonstrate this task-delegation pattern, we&amp;rsquo;re going to start with a single-page application that looks like this:&lt;/p&gt;

&lt;p&gt;&lt;img alt="(Screenshot of the demo app; its a single-page app with the header and description &amp;quot;Open Pickle Jar: You&amp;#39;ve got a jar of pickles (a zip file of some high-def pickle photos) that you would like to open (resize and display below)&amp;quot;. Under the description there are two inputs, one for width and one for height, and a button that says &amp;quot;Open pickle jar&amp;quot;)" src="/blog/delegate-tasks-to-fly-machines/assets/./pickle-jar-screenshot.webp" /&gt;&lt;/p&gt;

&lt;p&gt;Our &amp;ldquo;Open Pickle Jar&amp;rdquo; app is quite simple: you provide the width and height and it goes off and resizes some high-resolution photos to those dimensions (exciting!).&lt;/p&gt;

&lt;p&gt;If you&amp;rsquo;d like to follow along, you can clone the &lt;code&gt;start-here&lt;/code&gt; branch of this repository: &lt;a href='https://github.com/fly-apps/open-pickle-jar' title=''&gt;https://github.com/fly-apps/open-pickle-jar&lt;/a&gt; . The final changes are visible on the &lt;code&gt;main&lt;/code&gt; branch. This app uses S3 for image storage, so you&amp;rsquo;ll need to create a bucket called &lt;code&gt;open-pickle-jar&lt;/code&gt; and provide &lt;code&gt;AWS_REGION&lt;/code&gt;,  &lt;code&gt;AWS_ACCESS_KEY_ID&lt;/code&gt;, and  &lt;code&gt;AWS_SECRET_ACCESS_KEY&lt;/code&gt; as environment variables.&lt;/p&gt;

&lt;p&gt;This task is really just a stand-in for any HTTP request that kicks off a resource-intensive task. Get the request from the user, delegate it to a more powerful machine, and then return the result to the user. It&amp;rsquo;s what happens when you can&amp;rsquo;t open a pickle jar, and you ask for someone to help.&lt;/p&gt;

&lt;p&gt;Before we start, let&amp;rsquo;s define some terms and what they mean on Fly.io:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong class='font-semibold text-navy-950'&gt;Machines:&lt;/strong&gt; Extremely fast-booting VMs. They can exist in different regions and even run different processes.
&lt;/li&gt;&lt;li&gt;&lt;strong class='font-semibold text-navy-950'&gt;App:&lt;/strong&gt; An abstraction for a group of Machines running your code on Fly.io, along with the configuration, provisioned resources, and data we need to keep track of to run and route to your Machines.
&lt;/li&gt;&lt;li&gt;&lt;strong class='font-semibold text-navy-950'&gt;Process group:&lt;/strong&gt; A collection of Machines running a specific process. Many apps only run a single process (typically a public-facing HTTP server), but you can define any number of them.
&lt;/li&gt;&lt;li&gt;&lt;strong class='font-semibold text-navy-950'&gt;fly.toml:&lt;/strong&gt; A configuration file for deploying apps on Fly.io where you can set things like Machine specs, process groups, regions, and more.
&lt;/li&gt;&lt;/ul&gt;

&lt;hr&gt;

&lt;p&gt;&lt;strong class='font-semibold text-navy-950'&gt;Setup Overview&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Here&amp;rsquo;s what we&amp;rsquo;ll need for our application:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;A &lt;strong class='font-semibold text-navy-950'&gt;route&lt;/strong&gt; that performs our resource-intensive task
&lt;/li&gt;&lt;li&gt;A &lt;strong class='font-semibold text-navy-950'&gt;wrapper function&lt;/strong&gt; that either:

&lt;ol&gt;
&lt;li&gt;Runs our resource-intensive task OR
&lt;/li&gt;&lt;li&gt;Forwards the request to our more powerful Machine
&lt;/li&gt;&lt;/ol&gt;
&lt;/li&gt;&lt;li&gt;&lt;strong class='font-semibold text-navy-950'&gt;Two process groups&lt;/strong&gt; running the &lt;em&gt;same process&lt;/em&gt; but with differing Machine specs:

&lt;ol&gt;
&lt;li&gt;One for accepting HTTP traffic and handling most requests (let&amp;rsquo;s call it &lt;code&gt;web&lt;/code&gt;)
&lt;/li&gt;&lt;li&gt;One internal-only group for doing the heavy lifting (let&amp;rsquo;s call it &lt;code&gt;worker&lt;/code&gt;)
&lt;/li&gt;&lt;/ol&gt;
&lt;/li&gt;&lt;/ol&gt;

&lt;p&gt;In short, this is what our architecture will look like, a standard web and worker duo.&lt;/p&gt;

&lt;p&gt;&lt;img alt="(A simple graphic illustrating two servers; a small box containing &amp;quot;npm run start&amp;quot; and a larger box containing the same thing. The small is labeled &amp;quot;web&amp;quot; and the larger box is labeled &amp;quot;worker&amp;quot;.)" src="/blog/delegate-tasks-to-fly-machines/assets/./web-worker.webp" /&gt;&lt;/p&gt;
&lt;h3 id='creating-our-route' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#creating-our-route' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Creating our route&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;Next.js has two distinct routing patterns: Pages and App router. We&amp;rsquo;ll use the App router in our example since it&amp;rsquo;s the preferred method moving forward.&lt;/p&gt;

&lt;p&gt;Under your &lt;code&gt;/app&lt;/code&gt; directory, create a new folder called &lt;code&gt;/open-pickle-jar&lt;/code&gt; containing a &lt;code&gt;route.ts&lt;/code&gt; .&lt;/p&gt;

&lt;p&gt;(We&amp;rsquo;re using TypeScript here, but feel free to use normal JavaScript if you prefer!)&lt;/p&gt;
&lt;div class="highlight-wrapper group relative "&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-wrap-target="#code-ujx4kwap"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-copy-target="sibling"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight '&gt;&lt;code id="code-ujx4kwap"&gt;...
/app
  /open-pickle-jar
    route.ts
...
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Inside &lt;code&gt;route.ts&lt;/code&gt; we&amp;rsquo;ll flesh out our endpoint:&lt;/p&gt;
&lt;div class="highlight-wrapper group relative typescript"&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-wrap-target="#code-j1a8qd9i"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-copy-target="sibling"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight '&gt;&lt;code id="code-j1a8qd9i"&gt;&lt;span class="c1"&gt;// /app/open-pickle-jar/route.ts&lt;/span&gt;

&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="nx"&gt;delegateToWorker&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;@/utils/delegateToWorker&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;NextRequest&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;NextResponse&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;next/server&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;openPickleJar&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;../openPickleJar&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nx"&gt;POST&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;request&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;NextRequest&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;width&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;height&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;json&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;path&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;nextUrl&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;pathname&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;body&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;delegateToWorker&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;openPickleJar&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;width&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;height&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;NextResponse&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;json&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;body&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;The function &lt;code&gt;openPickleJar&lt;/code&gt; that we&amp;rsquo;re importing contains our resource-intensive task, which in this case is extracting images from a &lt;code&gt;.zip&lt;/code&gt; file, resizing them all to the new dimensions, and returning the new image URLs.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;POST&lt;/code&gt; function is how one define routes for specific HTTP methods in Next.js, and ours implements a function  &lt;code&gt;delegateToWorker&lt;/code&gt; that accepts the path of the current endpoint (&lt;code&gt;/open-pickle-jar&lt;/code&gt;) our resource-intensive function, and the same request parameters. This function doesn&amp;rsquo;t yet exist, so let&amp;rsquo;s build that next!&lt;/p&gt;
&lt;h3 id='creating-our-wrapper-function' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#creating-our-wrapper-function' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Creating our wrapper function&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;Now that we&amp;rsquo;ve set up our endpoint, let&amp;rsquo;s flesh out the wrapper function that delegates our request to a more powerful machine.&lt;/p&gt;

&lt;p&gt;We haven&amp;rsquo;t defined our process groups just yet, but if you recall, the plan is to have two:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;code&gt;web&lt;/code&gt; - Our standard web server
&lt;/li&gt;&lt;li&gt;&lt;code&gt;worker&lt;/code&gt; - For opening pickle jars (e.g. doing resource-intensive work). It&amp;rsquo;s essentially a duplicate of &lt;code&gt;web&lt;/code&gt;, but running on beefier Machines.
&lt;/li&gt;&lt;/ol&gt;

&lt;p&gt;Here&amp;rsquo;s what we want this wrapper function to do:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;If the current machine is a &lt;code&gt;worker&lt;/code&gt; , proceed to execute the resource-intensive task
&lt;/li&gt;&lt;li&gt;If the current machine is NOT a &lt;code&gt;worker&lt;/code&gt; , make a new request to the identical endpoint on a &lt;code&gt;worker&lt;/code&gt; Machine
&lt;/li&gt;&lt;/ul&gt;

&lt;p&gt;Inside your &lt;code&gt;/utils&lt;/code&gt; directory, create a file called &lt;code&gt;delegateToWorker.ts&lt;/code&gt; with the following content:&lt;/p&gt;
&lt;div class="highlight-wrapper group relative typescript"&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-wrap-target="#code-sc0e1it9"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-copy-target="sibling"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight '&gt;&lt;code id="code-sc0e1it9"&gt;&lt;span class="c1"&gt;// /utils/delegateToWorker.ts&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="k"&gt;default&lt;/span&gt; &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nx"&gt;delegateToWorker&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;path&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;func&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;(...&lt;/span&gt;&lt;span class="nx"&gt;args&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;any&lt;/span&gt;&lt;span class="p"&gt;[])&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;Promise&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kr"&gt;any&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;args&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;object&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nb"&gt;Promise&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kr"&gt;any&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;

  &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;FLY_PROCESS_GROUP&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;worker&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;running on the worker...&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;func&lt;/span&gt;&lt;span class="p"&gt;({...&lt;/span&gt;&lt;span class="nx"&gt;args&lt;/span&gt;&lt;span class="p"&gt;});&lt;/span&gt;

  &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;sending new request to worker...&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;workerHost&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;NODE_ENV&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;development&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;localhost:3001&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;`worker.process.&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;FLY_APP_NAME&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;.internal:3000`&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;fetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`http://&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;workerHost&lt;/span&gt;&lt;span class="p"&gt;}${&lt;/span&gt;&lt;span class="nx"&gt;path&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;method&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;POST&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Content-Type&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;application/json&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;
      &lt;span class="p"&gt;},&lt;/span&gt;
      &lt;span class="na"&gt;body&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;stringify&lt;/span&gt;&lt;span class="p"&gt;({...&lt;/span&gt;&lt;span class="nx"&gt;args&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;json&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;In our &lt;code&gt;else&lt;/code&gt; section, you&amp;rsquo;ll notice that while developing locally (aka, when &lt;code&gt;NODE_ENV&lt;/code&gt; is &lt;code&gt;development&lt;/code&gt;) we define the hostname of our &lt;code&gt;worker&lt;/code&gt; process to be &lt;code&gt;localhost:3001&lt;/code&gt;. Typically Next.js apps run on port &lt;code&gt;3000&lt;/code&gt;, so while testing our app locally, we can have two instances of our process running in different terminal shells:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;npm run dev&lt;/code&gt; - This will run on &lt;code&gt;localhost:3000&lt;/code&gt; and will act as our local &lt;code&gt;web&lt;/code&gt; process
&lt;/li&gt;&lt;li&gt;&lt;code&gt;FLY_PROCESS_GROUP=worker npm run dev&lt;/code&gt; - This will run on &lt;code&gt;localhost:3001&lt;/code&gt; and will act as our &lt;code&gt;worker&lt;/code&gt; process (Next.js should auto-increment the port if the original &lt;code&gt;3000&lt;/code&gt; is already in use)
&lt;/li&gt;&lt;/ul&gt;

&lt;p&gt;Also, if you&amp;rsquo;re wondering about the &lt;code&gt;FLY_PROCESS_GROUP&lt;/code&gt; and &lt;code&gt;FLY_APP_NAME&lt;/code&gt; constants, these are &lt;a href='https://fly.io/docs/reference/runtime-environment/' title=''&gt;Fly.io-specific runtime environment variables&lt;/a&gt; available on all apps.&lt;/p&gt;
&lt;h3 id='accessing-our-worker-machines-internal' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#accessing-our-worker-machines-internal' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Accessing our &lt;code&gt;worker&lt;/code&gt; Machines (&lt;code&gt;.internal&lt;/code&gt;)&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;Now, when this code is running in production (aka &lt;code&gt;NODE_ENV&lt;/code&gt; is NOT &lt;code&gt;development&lt;/code&gt;) you&amp;rsquo;ll see that we&amp;rsquo;re using a unique hostname to access our &lt;code&gt;worker&lt;/code&gt; Machine.&lt;/p&gt;

&lt;p&gt;Apps belonging to the same organization on Fly.io are provided a number of &lt;a href='https://fly.io/docs/networking/private-networking/#fly-io-internal-addresses' title=''&gt;internal addresses&lt;/a&gt;. These &lt;code&gt;.internal&lt;/code&gt; addresses let you point to different Apps and Machines in your private network. For example:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;&amp;lt;region&amp;gt;.&amp;lt;app name&amp;gt;.internal&lt;/code&gt; – To reach app instances in a particular region, like &lt;code&gt;gru.my-cool-app.internal&lt;/code&gt;
&lt;/li&gt;&lt;li&gt;&lt;code&gt;&amp;lt;app instance ID&amp;gt;.&amp;lt;app name&amp;gt;.internal&lt;/code&gt; - To reach a &lt;em&gt;specific&lt;/em&gt; app instance.
&lt;/li&gt;&lt;li&gt;&lt;code&gt;&amp;lt;process group&amp;gt;.process.&amp;lt;app name&amp;gt;.internal&lt;/code&gt; - To target app instances belonging to a specific process group. &lt;strong class='font-semibold text-navy-950'&gt;This is what we&amp;rsquo;re using in our app.&lt;/strong&gt;
&lt;/li&gt;&lt;/ul&gt;

&lt;p&gt;Since our &lt;code&gt;worker&lt;/code&gt; process group is running the same process as our &lt;code&gt;web&lt;/code&gt; process (in our case, &lt;code&gt;npm run start&lt;/code&gt;), we&amp;rsquo;ll also need to make sure we use the same internal port (&lt;code&gt;3000&lt;/code&gt;).&lt;/p&gt;
&lt;h3 id='defining-our-process-groups-and-machines' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#defining-our-process-groups-and-machines' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Defining our process groups and Machines&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;The last thing to do will be to define our two process groups and their respective Machine specs. We&amp;rsquo;ll do this by editing our &lt;code&gt;fly.toml&lt;/code&gt; configuration.&lt;/p&gt;

&lt;p&gt;If you don&amp;rsquo;t have this file, go ahead and create a blank one and use the content below, but replace &lt;code&gt;app = open-pickle-jar&lt;/code&gt; with your app&amp;rsquo;s name, as well as your preferred &lt;code&gt;primary_region&lt;/code&gt;.  If you don&amp;rsquo;t know what region you&amp;rsquo;d like to deploy to, &lt;a href='https://fly.io/docs/reference/regions/' title=''&gt;here&amp;rsquo;s the list of them&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong class='font-semibold text-navy-950'&gt;Before you deploy:&lt;/strong&gt; Note that deploying this example app will spin up &lt;strong class='font-semibold text-navy-950'&gt;billable&lt;/strong&gt; machines. Please feel free to alter the Machine (&lt;code&gt;[[vm]]&lt;/code&gt;) specs listed here to ones that suit your budget or app&amp;rsquo;s needs.&lt;/p&gt;
&lt;div class="highlight-wrapper group relative "&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-wrap-target="#code-lcp533re"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-copy-target="sibling"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight '&gt;&lt;code id="code-lcp533re"&gt;app = "open-pickle-jar"
primary_region = "sea"

[build]

[processes]
  web = "npm run start"
  worker = "npm run start"

[http_service]
  internal_port = 3000
  force_https = true
  auto_stop_machines = true
  auto_start_machines = true
  min_machines_running = 1
  processes = ["web"]

[[vm]]
  cpu_kind = "shared"
  cpus = 1
  memory_mb = 1024
  processes = ["web"]

[[vm]]
  size = "performance-4x"
  processes = ["worker"]
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;And that&amp;rsquo;s it! With our &lt;code&gt;fly.toml&lt;/code&gt; finished, we&amp;rsquo;re ready to deploy our app!&lt;/p&gt;

&lt;p&gt;&lt;img src="https://slabstatic.com/prod/uploads/p1b436gf/posts/images/tH4GaGLVaDkh3RhIwCpiDRX3.png" /&gt;&lt;/p&gt;
&lt;h2 id='discussion' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#discussion' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Discussion&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;Today we built a machine pattern on top of Fly.io. This pattern allows us to have a lighter request server that can delegate certain tasks to a stronger server, meaning that we can have one Machine do all the heavy lifting that could block everything else while the other handles all the simple tasks for users. With this in mind, this is a fairly naïve implementation, and we can make this much better:&lt;/p&gt;
&lt;h3 id='using-a-queue-for-better-resiliency' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#using-a-queue-for-better-resiliency' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Using a queue for better resiliency&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;In its current state, our code isn&amp;rsquo;t very resilient to failed requests. For this reason, you may want to consider keeping track of jobs in a queue with Redis (similar to Sidekiq in Ruby-land). When you have work you want to do, put it in the queue. Your queue worker would have to write the result somewhere (e.g., in Redis) that the application could fetch when it&amp;rsquo;s ready.&lt;/p&gt;
&lt;h3 id='starting-stopping-worker-machines' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#starting-stopping-worker-machines' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Starting/stopping worker Machines&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;The benefit of this pattern is that you can limit how many &amp;ldquo;beefy&amp;rdquo; Machines you need to have available at any given time. Our demo app doesn&amp;rsquo;t dictate how many &lt;code&gt;worker&lt;/code&gt; Machines to have at any given time, but by adding timeouts you could elect to start and stop them as needed.&lt;/p&gt;

&lt;p&gt;Now, you may think that constantly starting and stopping Machines might incur higher response times, but note that we are NOT talking about creating/destroying Machines. Starting and stopping Machines only takes as long as it takes to start your web server (i.e. &lt;code&gt;npm run start&lt;/code&gt;). The best part is that &lt;strong class='font-semibold text-navy-950'&gt;Fly.io does not charge for the CPU and RAM usage of stopped Machines.&lt;/strong&gt; &lt;a href='https://community.fly.io/t/we-are-going-to-start-collecting-charges-for-stopped-machines-rootfs-starting-april-25th/17825' title=''&gt;We will charge for storage of their root filesystems on disk, starting April 25th, 2024&lt;/a&gt;. Stopped Machines will still be much cheaper than running ones.&lt;/p&gt;
&lt;h3 id='what-about-serverless-functions' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#what-about-serverless-functions' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;What about serverless functions?&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;This &amp;ldquo;delegate to a beefy machine&amp;rdquo; pattern is similar to serverless functions with platforms like AWS Lambda. The main difference is that serverless functions usually require you to segment your application into a bunch of small pieces, whereas the method discussed today just uses the app framework that you deploy to production. Each pattern has its own benefits and downsides.&lt;/p&gt;
&lt;h2 id='conclusion' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#conclusion' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Conclusion&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;The pattern outlined here is one more tool in your arsenal for scaling applications. By utilizing Fly.io&amp;rsquo;s private network and &lt;code&gt;.internal&lt;/code&gt; domains, it&amp;rsquo;s quick and easy to pass work between different processes that run our app. If you&amp;rsquo;d like to learn about more methods for scaling tasks in your applications, check out &lt;a href='https://fly.io/blog/rethinking-serverless-with-flame/' title=''&gt;Rethinking Serverless with FLAME&lt;/a&gt; by Chris McCord and &lt;a href='https://fly.io/blog/print-on-demand/' title=''&gt;Print on Demand&lt;/a&gt; by Sam Ruby.&lt;/p&gt;
&lt;figure class="post-cta"&gt;
  &lt;figcaption&gt;
    &lt;h1&gt;Get more done on Fly.io&lt;/h1&gt;
    &lt;p&gt;Fly.io has fast booting machines at the ready for your dynamic workloads. It&amp;rsquo;s easy to get started. You can be off and running in minutes.&lt;/p&gt;
      &lt;a class="btn btn-lg" href="https://fly.io/docs/speedrun/"&gt;
        Deploy something today!  &lt;span class='opacity:50'&gt;→&lt;/span&gt;
      &lt;/a&gt;
  &lt;/figcaption&gt;
  &lt;div class="image-container"&gt;
    &lt;img src="/static/images/cta-rabbit.webp" srcset="/static/images/[email protected] 2x" alt=""&gt;
  &lt;/div&gt;
&lt;/figure&gt;

</content>
  </entry>
  <entry>
    <title>Macaroons Escalated Quickly</title>
    <link rel="alternate" href="https://fly.io/blog/macaroons-escalated-quickly/"/>
    <id>https://fly.io/blog/macaroons-escalated-quickly/</id>
    <published>2024-01-31T00:00:00+00:00</published>
    <updated>2025-05-20T16:26:25+00:00</updated>
    <media:thumbnail url="https://fly.io/blog/macaroons-escalated-quickly/assets/evil-cookies-thumb.webp"/>
    <content type="html">&lt;div class="lead"&gt;&lt;p&gt;We’re Fly.io and we transmute containers into VMs, running them on our hardware around the world. We built a new security token system, and can I tell you the good news about our lord and savior the Macaroon?&lt;/p&gt;
&lt;/div&gt;&lt;h2 id='1' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#1' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;1&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;Let&amp;rsquo;s implement an API token together. It&amp;rsquo;s a design called &amp;ldquo;Macaroons&amp;rdquo;, but don&amp;rsquo;t get hung up on that yet.&lt;/p&gt;

&lt;p&gt;First some &lt;button toggle="#includes"&gt;throat-clearing&lt;/button&gt;. Then:&lt;/p&gt;
&lt;div id="includes" toggle-content="" aria-label="show very boring code"&gt;&lt;div class="highlight-wrapper group relative python"&gt;
  &lt;button type="button" class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-wrap-target="#code-i4dzjfqm"&gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959"&gt;&lt;/path&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571"&gt;&lt;/path&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button type="button" class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-copy-target="sibling"&gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z"&gt;&lt;/path&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617"&gt;&lt;/path&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class="highlight relative group"&gt;
    &lt;pre class="highlight "&gt;&lt;code id="code-i4dzjfqm"&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;sys&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;os&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;json&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;hmac&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;hm&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;base64&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;b64encode&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;b64decode&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;hashlib&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;sha256&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;hmac&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;v&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;hm&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;new&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;v&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;sha256&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="n"&gt;digest&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;enc&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;b64encode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;dec&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;b64decode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;&lt;/div&gt;&lt;div class="highlight-wrapper group relative python"&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-wrap-target="#code-b0hki89a"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-copy-target="sibling"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight '&gt;&lt;code id="code-b0hki89a"&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;blank_token&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;uid&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
  &lt;span class="n"&gt;nonce&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;enc&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;":"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;join&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;uid&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;urandom&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;16&lt;/span&gt;&lt;span class="p"&gt;)]))&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="n"&gt;nonce&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;enc&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;hmac&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;nonce&lt;/span&gt;&lt;span class="p"&gt;))])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;&lt;div class="right-sidenote"&gt;&lt;p&gt;Bearer tokens: like cookies, blobs you attach to a request (usually in an HTTP header).&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;We&amp;rsquo;re going to build a minimally-stateful bearer token, a blob signed with HMAC. Nothing fancy so far. &lt;a href='https://api.rubyonrails.org/classes/ActiveSupport/MessageVerifier.html' title=''&gt;Rails has done this&lt;/a&gt; for a decade and a half.&lt;/p&gt;

&lt;p&gt;There&amp;rsquo;s a &lt;a href='http://cryto.net/~joepie91/blog/2016/06/13/stop-using-jwt-for-sessions/' title=''&gt;fashion in API security for stateless tokens&lt;/a&gt;, which encode all the data you&amp;rsquo;d need to check any request accompanied by that token – without a database lookup. Stateless tokens have some nice properties, and some less-nice. Our tokens won&amp;rsquo;t be stateless: they carry a user ID, with which we&amp;rsquo;ll look up the HMAC key to verify it. But they&amp;rsquo;ll stake out a sort of middle ground.&lt;/p&gt;
&lt;div class="highlight-wrapper group relative python"&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-wrap-target="#code-v32ipl21"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-copy-target="sibling"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight '&gt;&lt;code id="code-v32ipl21"&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;attenuate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;macStr&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;cav&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;mac&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;loads&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;macStr&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;cavStr&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cav&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;oldTail&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;dec&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;mac&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
    &lt;span class="n"&gt;newTail&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;enc&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;hmac&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;oldTail&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;cavStr&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;mac&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;cavStr&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;newTail&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

&lt;span class="n"&gt;m0&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;blank_token&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;keys&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="n"&gt;m1&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;attenuate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;m0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="s"&gt;'path'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s"&gt;'/images'&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="n"&gt;m2&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;attenuate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;m1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="s"&gt;'op'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s"&gt;'read'&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Let&amp;rsquo;s add some stuff.&lt;/p&gt;

&lt;p&gt;The meat of our tokens will be a series of claims we call &amp;ldquo;caveats&amp;rdquo;. We call them that because each claim restricts further what the token authorizes. After &lt;code&gt;{&amp;#39;path&amp;#39;: &amp;#39;/images&amp;#39;}&lt;/code&gt;, this token only allows operations that happen underneath the &lt;code&gt;/images&lt;/code&gt; directory. Then, after &lt;code&gt;{&amp;#39;op&amp;#39;: &amp;#39;read&amp;#39;}&lt;/code&gt;, it allows only reads, not writes.&lt;/p&gt;

&lt;p&gt;(I guess we&amp;rsquo;re building a file sharing system. Whatever.)&lt;/p&gt;

&lt;p&gt;Some important things about things about this design. First: by implication from the fact that caveats further restrict tokens, a token with no caveats restricts nothing. It&amp;rsquo;s a god-mode token. Don&amp;rsquo;t honor it.&lt;/p&gt;
&lt;div class="right-sidenote"&gt;&lt;p&gt;In other words: the ordering of caveats doesn’t matter.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;Second: the rule of checking caveats is very simple: every single caveat must pass, evaluating &lt;code&gt;True&lt;/code&gt; against the request that carries it, in isolation and without reference to any other caveat. If any caveat evaluates &lt;code&gt;False&lt;/code&gt;, the request fails. In that way, we ensure that adding caveats to a token can only ever weaken it.&lt;/p&gt;

&lt;p&gt;With that in mind, take a closer look at this code:&lt;/p&gt;
&lt;div class="highlight-wrapper group relative python"&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-wrap-target="#code-2and4pv7"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-copy-target="sibling"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight '&gt;&lt;code id="code-2and4pv7"&gt;&lt;span class="n"&gt;oldTail&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;dec&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;mac&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="n"&gt;newTail&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;enc&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;hmac&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;oldTail&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;cavStr&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Every caveat is HMAC-signed independently, which is weird. Weirder still, the key for that HMAC is the output of the last HMAC. The caveats chain together, and the HMAC of the last caveat becomes the &amp;ldquo;tail&amp;rdquo; of the token.&lt;/p&gt;

&lt;p&gt;Creating a new blank token for a particular user requires a key that the server (and probably only the server) knows. But adding a caveat doesn&amp;rsquo;t! Anybody can add a caveat. In our design, you, the user, can edit your own API token.&lt;/p&gt;
&lt;div class="highlight-wrapper group relative python"&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-wrap-target="#code-zkheo9e4"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-copy-target="sibling"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight '&gt;&lt;code id="code-zkheo9e4"&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;verify&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;macStr&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;keys&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;mac&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;loads&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;macStr&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;nonce&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;dec&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;mac&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]).&lt;/span&gt;&lt;span class="n"&gt;split&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;":"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;key&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;keys&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;nonce&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;])]&lt;/span&gt;
    &lt;span class="n"&gt;tail&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;""&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;cav&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;mac&lt;/span&gt;&lt;span class="p"&gt;[:&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
        &lt;span class="n"&gt;tail&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;hmac&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;cav&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;key&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tail&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;hm&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;compare_digest&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tail&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;dec&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;mac&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]))&lt;/span&gt;

&lt;span class="n"&gt;verify&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;m2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;keys&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="c1"&gt;# =&amp;gt; True
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;For completeness, and to make a point, there&amp;rsquo;s the verification code. Look up the original secret key from the user ID,  and then it&amp;rsquo;s chained HMAC all the way down. The point I&amp;rsquo;m making is that Macaroons are very simple.&lt;/p&gt;
&lt;h2 id='2' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#2' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;2&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;Back in 2014, Google published &lt;a href='https://storage.googleapis.com/pub-tools-public-publication-data/pdf/41892.pdf' title=''&gt;a paper at NDSS&lt;/a&gt; introducing &amp;ldquo;Macaroons&amp;rdquo;, a new kind of cookie. Since then, they&amp;rsquo;ve become a sort of hipster shibboleth. But they&amp;rsquo;re more talked about than implemented, which is a nice way to say that practically nobody uses them.&lt;/p&gt;

&lt;p&gt;Until now! I dragged Fly.io into implementing them. Suckers!&lt;/p&gt;

&lt;p&gt;We had a problem: our API tokens were much too powerful. We needed to scope them down and let them express roles, and I scoped up that project to replace OAuth2 tokens altogether. We now have what I think is one of the more expansive Macaroon implementations on the Internet.&lt;/p&gt;

&lt;p&gt;I dragged us into using Macaroons because I wanted us to use a hipster token format. Google designed Macaroons for a bigger reason: they hoped to replace browser cookies with something much more powerful.&lt;/p&gt;

&lt;p&gt;The problem with simple bearer tokens, like browser cookies or JWTs, is that they&amp;rsquo;re prone to being stolen and replayed by attackers.&lt;/p&gt;
&lt;div class="right-sidenote"&gt;&lt;p&gt;game-over: pentest jargon for “very bad”&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;Worse, a stolen token is usually a game-over condition. In most schemes, a bearer token is an all-access pass for the associated user. For some applications this isn&amp;rsquo;t that big a deal, but then, &lt;a href='https://neilmadden.blog/2020/09/09/macaroon-access-tokens-for-oauth-part-2-transactional-auth/' title=''&gt;think about banking&lt;/a&gt;. A banking app token that authorizes arbitrary transactions is a recipe for having a small heart attack on every HTTP request.&lt;/p&gt;
&lt;div class="right-sidenote"&gt;&lt;p&gt;(Perfectly minimized API tokens: a software security holy grail)&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;Macaroons are user-editable tokens that enable JIT-generated least-privilege tokens. With minimal ceremony and no additional API requests, a banking app Macaroon lets you authorize a request with a caveat like, I don&amp;rsquo;t know, &lt;code&gt;{&amp;#39;maxAmount&amp;#39;: &amp;#39;$5&amp;#39;}&lt;/code&gt;. I mean, something way better than that, probably lots of caveats, not just one, but you get the idea: a token so minimized you feel safe sending it with your request. Ideally, a token that only authorizes that single, intended request.&lt;/p&gt;
&lt;h2 id='3' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#3' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;3&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;That&amp;rsquo;s not why we like Macaroons. We already assume our tokens aren&amp;rsquo;t being stolen.&lt;/p&gt;

&lt;p&gt;In most systems, the developers come up with a permissions system, and you’re stuck with it. We run a public cloud platform, and people want a lot of different things from our permissions. The dream is, we (the low-level platform developers on the team) design a single permission system, one time, and go about our jobs never thinking about this problem again.&lt;/p&gt;

&lt;p&gt;Instead of thinking of all of our &amp;ldquo;roles&amp;rdquo; in advance, we just model our platform with caveats:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Users belong to &lt;code&gt;Organizations&lt;/code&gt;.
&lt;/li&gt;&lt;li&gt;&lt;code&gt;Organizations&lt;/code&gt; own &lt;code&gt;Apps&lt;/code&gt;.
&lt;/li&gt;&lt;li&gt;&lt;code&gt;Apps&lt;/code&gt; contain &lt;code&gt;Machines&lt;/code&gt; and &lt;code&gt;Volumes&lt;/code&gt;.
&lt;/li&gt;&lt;li&gt;To any of these things, you can &lt;code&gt;Read&lt;/code&gt;, &lt;code&gt;Write&lt;/code&gt;, &lt;code&gt;Create&lt;/code&gt;, &lt;code&gt;Delete&lt;/code&gt;, and/or &lt;code&gt;Control&lt;/code&gt; &lt;aside class="right-sidenote"&gt;control being change of state, like “start” and “stop”&lt;/aside&gt;.
&lt;/li&gt;&lt;li&gt;Some administrivia, like expiration (&lt;code&gt;ValidityWindow&lt;/code&gt;), locking tokens to specific Fly Machines (&lt;code&gt;FromMachineSource&lt;/code&gt;), and escape hatches like &lt;code&gt;Mutation&lt;/code&gt; (for our GraphQL API).
&lt;/li&gt;&lt;/ol&gt;
&lt;div class="right-sidenote"&gt;&lt;p&gt;(this is a vibes-based notation, don’t think too hard about it)&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;Simplistic. But it expresses admin tokens:&lt;/p&gt;
&lt;div class="highlight-wrapper group relative "&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-wrap-target="#code-qulbfdw0"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-copy-target="sibling"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight '&gt;&lt;code id="code-qulbfdw0"&gt;Organization 4721, mask=*
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;And it expresses normal user tokens:&lt;/p&gt;
&lt;div class="highlight-wrapper group relative "&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-wrap-target="#code-lag6amtu"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-copy-target="sibling"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight '&gt;&lt;code id="code-lag6amtu"&gt;Organization 4721, mask=read,write,control
(App 123, mask=control), (App 345, mask=read, write, control)
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;And also an auditor-only token for that user:&lt;/p&gt;
&lt;div class="highlight-wrapper group relative "&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-wrap-target="#code-bikhlk3o"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-copy-target="sibling"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight '&gt;&lt;code id="code-bikhlk3o"&gt;Organization 4721, mask=read,write,control
(App 123, mask=control), (App 345, mask=read, write, control)
Organization 4721, mask=read
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;&lt;div class="right-sidenote"&gt;&lt;p&gt;(our deploy tokens are more complicated than this)&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;Or a deployment-only token, for a CI/CD system:&lt;/p&gt;
&lt;div class="highlight-wrapper group relative "&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-wrap-target="#code-rxeo12rj"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-copy-target="sibling"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight '&gt;&lt;code id="code-rxeo12rj"&gt;Organization 4721, mask=write,control
(App 123, mask=*)
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Those are just the roles we came up with. Users can invent others. The important thing is that they don&amp;rsquo;t have to bother me about them.&lt;/p&gt;
&lt;h2 id='4' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#4' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;4&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;Astute readers will have noticed by now that we haven&amp;rsquo;t shown any code that actually evaluates a caveat. That&amp;rsquo;s because it&amp;rsquo;s boring, and I&amp;rsquo;m too lazy to write it out. Got an &lt;code&gt;Organization&lt;/code&gt; token for &lt;code&gt;image-hosting&lt;/code&gt; that allows &lt;code&gt;Reads&lt;/code&gt;? Ok; check and make sure the incoming request is for an asset of &lt;code&gt;image-hosting&lt;/code&gt;, and that it’s a &lt;code&gt;Read&lt;/code&gt;. Whatever code you came up with, it’d be fine.&lt;/p&gt;

&lt;p&gt;These straightforward restrictions are called &amp;ldquo;first party caveats&amp;rdquo;. The first party is us, the platform. We&amp;rsquo;ve got all the information we need to check them.&lt;/p&gt;

&lt;p&gt;Let&amp;rsquo;s kit out our token format some more.&lt;/p&gt;
&lt;div class="highlight-wrapper group relative python"&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-wrap-target="#code-zcmmxwd0"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-copy-target="sibling"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight '&gt;&lt;code id="code-zcmmxwd0"&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;third_party_caveat&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ka&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tail&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;crk&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;urandom&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;16&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;ticket&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;enc&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;encrypt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ka&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
        &lt;span class="s"&gt;'crk'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;enc&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;crk&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="s"&gt;'msg'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;msg&lt;/span&gt;
    &lt;span class="p"&gt;})))&lt;/span&gt;
    &lt;span class="n"&gt;challenge&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;enc&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;encrypt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tail&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;crk&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="s"&gt;'url'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;'ticket'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;ticket&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;'challenge'&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;challenge&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="n"&gt;key&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;bytes&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"YELLOW SUBMARINE"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;url&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"https://canary.service"&lt;/span&gt;
&lt;span class="n"&gt;c3&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;third_party_caveat&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tail&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="s"&gt;'user'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s"&gt;'bobson.dugnutt'&lt;/span&gt;&lt;span class="p"&gt;}))&lt;/span&gt;
&lt;span class="n"&gt;m3&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;attenuate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;m2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;c3&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Up till now, we&amp;rsquo;ve gotten by with nothing but HMAC, which is one of the great charms of the design. Now we need to encrypt. There&amp;rsquo;s no authenticated encryption in the Python standard library, but that won&amp;rsquo;t stop us. &lt;button toggle="#hmac-ctr"&gt;Ready to make some candy? Hand me that brake fluid!&lt;/button&gt;&lt;/p&gt;
&lt;div id="hmac-ctr" toggle-content="" aria-label="show very silly code"&gt;&lt;div class="highlight-wrapper group relative python"&gt;
  &lt;button type="button" class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-wrap-target="#code-qurlq0h7"&gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959"&gt;&lt;/path&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571"&gt;&lt;/path&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button type="button" class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-copy-target="sibling"&gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z"&gt;&lt;/path&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617"&gt;&lt;/path&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class="highlight relative group"&gt;
    &lt;pre class="highlight "&gt;&lt;code id="code-qurlq0h7"&gt;&lt;span class="c1"&gt;# do i really need to say that i'm not serious about this?
&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;hmactr&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
     &lt;span class="n"&gt;ks&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;hm&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;new&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
     &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;counter&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nb"&gt;xrange&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sys&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;maxint&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
         &lt;span class="n"&gt;ks&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;update&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;counter&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
         &lt;span class="n"&gt;kbs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ks&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;digest&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
         &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nb"&gt;xrange&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;16&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="k"&gt;yield&lt;/span&gt; &lt;span class="n"&gt;kbs&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;encrypt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;buf&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;ak&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;hm&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;new&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;'auth'&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="n"&gt;digest&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;nonce&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;urandom&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;16&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;cipher&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;hmactr&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;hm&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;new&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;'enc'&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="n"&gt;digest&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="n"&gt;nonce&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;ctxt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;bytearray&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;buf&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nb"&gt;xrange&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;buf&lt;/span&gt;&lt;span class="p"&gt;)):&lt;/span&gt;
        &lt;span class="n"&gt;ctxt&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;^=&lt;/span&gt; &lt;span class="nb"&gt;ord&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cipher&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;next&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
    &lt;span class="n"&gt;res&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;nonce&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctxt&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;res&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;hm&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;new&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ak&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;res&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="n"&gt;digest&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;decrypt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;buf&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;ak&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;hm&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;new&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;'auth'&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="n"&gt;digest&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;hm&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;compare_digest&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;buf&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;16&lt;/span&gt;&lt;span class="p"&gt;:],&lt;/span&gt; &lt;span class="n"&gt;hm&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;new&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ak&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;buf&lt;/span&gt;&lt;span class="p"&gt;[:&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;16&lt;/span&gt;&lt;span class="p"&gt;]).&lt;/span&gt;&lt;span class="n"&gt;digest&lt;/span&gt;&lt;span class="p"&gt;()):&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;False&lt;/span&gt;
    &lt;span class="n"&gt;nonce&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;buf&lt;/span&gt;&lt;span class="p"&gt;[:&lt;/span&gt;&lt;span class="mi"&gt;16&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="n"&gt;cipher&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;hmactr&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;hm&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;new&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;'enc'&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="n"&gt;digest&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="n"&gt;nonce&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;ptxt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;bytearray&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;buf&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;16&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;16&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nb"&gt;xrange&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;buf&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;16&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;16&lt;/span&gt;&lt;span class="p"&gt;])):&lt;/span&gt;
        &lt;span class="n"&gt;ptxt&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;^=&lt;/span&gt; &lt;span class="nb"&gt;ord&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cipher&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;next&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ptxt&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;With &amp;ldquo;third-party&amp;rdquo; caveats comes a cast of characters. We&amp;rsquo;re still the first party. You&amp;rsquo;ll play the second party. The third party is any other system in the world that you trust: an SSO system, an audit log, a revocation checker, whatever.&lt;/p&gt;

&lt;p&gt;Here&amp;rsquo;s the trick of the third-party caveat: our platform doesn&amp;rsquo;t know what your caveat means, and it doesn&amp;rsquo;t have to. Instead, when you see a third-party caveat in your token, you tear a ticket off it and exchange it for a &amp;ldquo;discharge Macaroon&amp;rdquo; with that third party. You submit both Macaroons together to us.&lt;/p&gt;

&lt;p&gt;Let&amp;rsquo;s attenuate our token with a third-party caveat hooking it up to a &amp;ldquo;canary&amp;rdquo; service that generates a notice approximately any time the token is used.&lt;/p&gt;

&lt;p&gt;&lt;img src="/blog/macaroons-escalated-quickly/assets/third-party.png?1/2&amp;amp;wrap-left" /&gt;&lt;/p&gt;

&lt;p&gt;To build that canary caveat, you first make a &lt;code&gt;ticket&lt;/code&gt; that users of the token will hand to your canary, and then a &lt;code&gt;challenge&lt;/code&gt; that Fly.io will use to verify discharges your checker spits out. The ticket and the challenge are both encrypted. The ticket is encrypted under &lt;code&gt;KA&lt;/code&gt;, so your service can read it. The challenge is encrypted under the previous Macaroon tail, so only Fly.io can read it. Both hide yet another key, the random HMAC key &lt;code&gt;CRK&lt;/code&gt; (&amp;ldquo;caveat root key&amp;rdquo;).&lt;/p&gt;

&lt;p&gt;In addition to &lt;code&gt;CRK&lt;/code&gt;, the ticket contains a message, which says whatever you want it to; Fly.io doesn&amp;rsquo;t care. Typically, the message describes some kind of additional checking you want your service to perform before spitting out a discharge token.&lt;/p&gt;
&lt;div class="highlight-wrapper group relative python"&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-wrap-target="#code-lrt3p4uz"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-copy-target="sibling"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight '&gt;&lt;code id="code-lrt3p4uz"&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;discharge&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ka&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ticket&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;ptxt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;decrypt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ka&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;dec&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ticket&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;ptxt&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;False&lt;/span&gt;
    &lt;span class="n"&gt;tbody&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;loads&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ptxt&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="c1"&gt;# not shown: do something with tbody['msg']
&lt;/span&gt;    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="n"&gt;ticket&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;enc&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;hmac&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dec&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tbody&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;'crk'&lt;/span&gt;&lt;span class="p"&gt;]),&lt;/span&gt; &lt;span class="n"&gt;ticket&lt;/span&gt;&lt;span class="p"&gt;))])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;To authorize a request with a token that includes a third-party caveat for the canary service, you need to get your hands on a corresponding discharge Macaroon. Normally, you do that by &lt;code&gt;POST&lt;/code&gt;ing the ticket from the caveat to the service.&lt;/p&gt;

&lt;p&gt;Discharging is simple. The service, which holds &lt;code&gt;KA&lt;/code&gt;, uses it to decrypt the ticket. It checks the message and makes some decisions. Finally, it mints a new macaroon, using &lt;code&gt;CRK&lt;/code&gt;, recovered from the ticket, as the root key. The ticket itself is the nonce.&lt;/p&gt;

&lt;p&gt;If it wants, the third-party service can slap on a bunch of first-party caveats of its own. When we verify the Macaroon, we&amp;rsquo;ll copy those caveats out and enforce them. Attenuation of a third-party discharge macaroon works like a normal macaroon.&lt;/p&gt;
&lt;div class="highlight-wrapper group relative python"&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-wrap-target="#code-6nzqrw14"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-copy-target="sibling"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight '&gt;&lt;code id="code-6nzqrw14"&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;verify_third_party&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tag&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;cav&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;discharges&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[]):&lt;/span&gt;
    &lt;span class="n"&gt;crk&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;decrypt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tag&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;dec&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cav&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;'challenge'&lt;/span&gt;&lt;span class="p"&gt;]))&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;crk&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;False&lt;/span&gt;
    &lt;span class="n"&gt;discharge&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;dcs&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;discharges&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;loads&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dcs&lt;/span&gt;&lt;span class="p"&gt;)[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;cav&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;'ticket'&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
            &lt;span class="n"&gt;discharge&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;dcs&lt;/span&gt;
            &lt;span class="k"&gt;break&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;discharge&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;False&lt;/span&gt;
    &lt;span class="n"&gt;mac&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;loads&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;discharge&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;key&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;crk&lt;/span&gt;
    &lt;span class="c1"&gt;# boring old stuff ---------------------
&lt;/span&gt;    &lt;span class="n"&gt;tag&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;""&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;cav&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;mac&lt;/span&gt;&lt;span class="p"&gt;[:&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
        &lt;span class="n"&gt;tag&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;hmac&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;cav&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;key&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tag&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;hm&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;compare_digest&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tag&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;dec&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;mac&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;To verify tokens that have third-party caveats, start with the root Macaroon, walking the caveats like usual. At each third-party caveat, match the &lt;code&gt;ticket&lt;/code&gt; from the caveat with the &lt;code&gt;nonce&lt;/code&gt; on the discharge Macaroon. The key for root Macaroon decrypts the &lt;code&gt;challenge&lt;/code&gt; in the caveat, recovering &lt;code&gt;CRK&lt;/code&gt;, which cryptographically verifies the discharge.&lt;/p&gt;

&lt;p&gt;(The Macaroons paper uses different terms: “caveat identifier” or &lt;code&gt;cId&lt;/code&gt; for “ticket”, and “verification-key identifier” or &lt;code&gt;vId&lt;/code&gt; for “challenge”. These names are self-evidently bad and our contribution to the state of the art is to replace them.)&lt;/p&gt;

&lt;p&gt;There&amp;rsquo;s two big applications for third-party caveats in Popular Macaroon Thought. First, they facilitate microservice-izing your auth logic, because you can stitch arbitrary policies together out of third-party caveats. And, they seem like &lt;a href='https://github.com/go-macaroon-bakery/macaroon-bakery' title=''&gt;fertile ground for an ecosystem of interoperable Macaroon services&lt;/a&gt;: Okta and Google could stand up SSO dischargers, for instance, or someone can do a really good revocation service.&lt;/p&gt;

&lt;p&gt;Neither of these light us up. We&amp;rsquo;re allergic to microservices. As for public protocols, well, it&amp;rsquo;s good to want things. So we almost didn&amp;rsquo;t even implement third-party caveats.&lt;/p&gt;
&lt;h2 id='5' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#5' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;5&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;I&amp;rsquo;m glad we did though, because they&amp;rsquo;ve been pretty great.&lt;/p&gt;

&lt;p&gt;The first problem third-party caveats solved for us was hazmat tokens. To the extent possible, we want Macaroon tokens to be safe to transmit between users. Our Macaroons express permissions, but not authentication, so it’s almost safe to email them.&lt;/p&gt;

&lt;p&gt;The way it works is, our Macaroons all have a third-party caveat pointing to a &amp;ldquo;login service&amp;rdquo;, either identifying the proper bearer as a particular Fly.io user or as a member of some &lt;code&gt;Organization&lt;/code&gt;. To allow a request with your token, you first need to collect the discharge from the login service, which requires authentication.&lt;/p&gt;

&lt;p&gt;The login discharge is very sensitive, but there isn&amp;rsquo;t much reason to pass it around. The original permissions token is where all the interesting stuff is, and it&amp;rsquo;s not scary. So that&amp;rsquo;s nice.&lt;/p&gt;

&lt;p&gt;&lt;img src="/blog/macaroons-escalated-quickly/assets/fly-sso.png?1/3&amp;amp;wrap-left" /&gt;&lt;/p&gt;

&lt;p&gt;Ben then came up with &lt;a href="https://community.fly.io/t/organization-required-sso/17560"&gt;third-party caveats that require Google or Github SSO logins.&lt;/a&gt; If your token has one of those caveats, when you run &lt;code&gt;flyctl deploy&lt;/code&gt;, a browser will pop up to log you into your SSO IdP (if you haven’t done so recently already).&lt;/p&gt;

&lt;p&gt;We’ve put a &lt;a href='https://fly.io/blog/tokenized-tokens/#tokenizer-the-fabled-4th-way' title=''&gt;bunch of work into getting the guts of our SSO system working&lt;/a&gt;, but that work has mostly been invisible to customers. But Macaroon-ized SSO has a subtle benefit: you can configure &lt;a href='http://Fly.io' title=''&gt;Fly.io&lt;/a&gt; to automatically add SSO requirements to specific &lt;code&gt;Organizations&lt;/code&gt; (so, for instance, a dev environment might not need SSO at all, and prod might need two).&lt;/p&gt;

&lt;p&gt;SSO requirements in most applications are a brittle pain in the ass. Ours are flexible and straightforward, and that happened almost by accident. Macaroons, baby!&lt;/p&gt;

&lt;p&gt;Here&amp;rsquo;s a fun thing you can do with a Macaroon system: stand up a Slack bot, and give it an HTTP &lt;code&gt;POST&lt;/code&gt; handler that accepts third-party tickets. Then:&lt;/p&gt;

&lt;p&gt;&lt;img src="/blog/macaroons-escalated-quickly/assets/bot-ok.png?1/2&amp;amp;center&amp;amp;border" /&gt;&lt;/p&gt;

&lt;p&gt;So, the bot is cute, but any platform could do that. What’s cool is the way our platform &lt;em&gt;doesn’t&lt;/em&gt; work with Slack; in fact, nothing on our platform knows anything about Slack, and Slack doesn’t know anything about us. We didn’t reach out to a Slack endpoint. Everything was purely cryptographic.&lt;/p&gt;

&lt;p&gt;That bot could, if I sunk some time into it, enforce arbitrary rules: it could selectively add caveats for the requests it authorizes, based on lookups of the users requesting them, at specific times of day, with specific logging. Theoretically, it could add third-party caveats of its own.&lt;/p&gt;

&lt;p&gt;The win for us for third-party caveats is that they create a plugin system for our security tokens. That’s an unusual place to see a plugin interface! But Macaroons are easy to understand and keep in your head, so we&amp;rsquo;re pretty confident about the security issues.&lt;/p&gt;
&lt;h2 id='6' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#6' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;6&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;Obviously, we didn&amp;rsquo;t write our Macaroon code in Python, or with HMAC-SHA256-CTR.&lt;/p&gt;

&lt;p&gt;We landed on a primary implementation Golang (Ben subsequently wrote an Elixir implementation). Our hash is SHA256, our cipher is Chapoly. We encode in MsgPack.&lt;/p&gt;
&lt;div class="callout"&gt;&lt;p&gt;We didn’t use the pre-existing public implementation because &lt;a href="https://securitycryptographywhatever.com/2021/08/12/what-do-we-do-about-jwt-with-jonathan-rudenberg/" title=""&gt;we were warned not to&lt;/a&gt;. The Macaroon idea is simple, and it exists mostly as an academic paper, not a standard. The community that formed around building open source “standard” Macaroons decided to use untyped opaque blobs to represent caveats. We need things to be as rigidly unambiguous as they can be.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;&lt;img src="/blog/macaroons-escalated-quickly/assets/verifier-service.png?2/3&amp;amp;center" /&gt;&lt;/p&gt;

&lt;p&gt;The big strength of Macaroons as a cryptographic design — that it’s based almost entirely on HMAC — makes it a challenge to deploy. If you can verify a Macaroon, you can generate one.  We have thousands of servers. They can&amp;rsquo;t all be allowed to generate tokens.&lt;/p&gt;

&lt;p&gt;What we did instead:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;We split token checking into “verification” of token HMAC tags and “clearing” of token caveats.
&lt;/li&gt;&lt;li&gt;Verification occurs only on a physically isolated token-verification service; to verify a token’s tag, you  HTTP &lt;code&gt;POST&lt;/code&gt; the token to the verifier.
&lt;/li&gt;&lt;li&gt;Clearing of token caveats can happen anywhere. Token caveat clearing is domain-specific and subject to change; token verification is simple cryptography and  changes rarely.
&lt;/li&gt;&lt;li&gt;A token verification is cacheable. The client library for the token verifier does that, which speeds things up by exploiting the locality of token submissions.
&lt;/li&gt;&lt;li&gt;The verification service is backed by a &lt;a href='https://fly.io/docs/litefs/' title=''&gt;LiteFS-distributed SQLite database&lt;/a&gt;, so verification is fast globally — a major step forward from our legacy OAuth2 tokens, which are only fast in Ashburn, VA.
&lt;/li&gt;&lt;/ul&gt;

&lt;p&gt;&lt;img src="/blog/macaroons-escalated-quickly/assets/service-token.png?2/3&amp;amp;center" /&gt;&lt;/p&gt;

&lt;p&gt;Now buckle up, because I&amp;rsquo;m about to try to get you to care about service tokens.&lt;/p&gt;

&lt;p&gt;We operate &amp;ldquo;worker servers&amp;rdquo; all over the world to host apps for our customers. To do that, those workers need access to customer secrets, like the key to decrypt a customer volume. To retrieve those secrets, the workers have to talk to secrets management servers.&lt;/p&gt;

&lt;p&gt;We manage a lot of workers. We trust them. But we don&amp;rsquo;t trust them that much, if you get my drift. You don&amp;rsquo;t want to just leave it up to the servers to decide which secrets they can access. The blast radius of a problem with a single worker should be no greater than the apps that are supposed to run there.&lt;/p&gt;

&lt;p&gt;The gold standard for approving access to customer information is, naturally, explicit customer authorization. We almost have that with Macaroons! The first time an app runs on a worker, &lt;a href='https://fly.io/blog/carving-the-scheduler-out-of-our-orchestrator/' title=''&gt;the orchestrator code&lt;/a&gt; has a token, and it can pass that along to the secret stores.&lt;/p&gt;

&lt;p&gt;The problem is, you need that token more than once; not just when the user does a deploy, but potentially any time you restart the app or migrate it to a new worker. And you can&amp;rsquo;t just store and replay user Macaroons. They have expirations.&lt;/p&gt;
&lt;div class="right-sidenote"&gt;&lt;p&gt;This is like dropping privilege with things like pledge(2), but in a distributed system.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;So our token verification service exposes an API that transforms a user token into a “service token”, which is just the token with the authentication caveat and expiration “stripped off”.&lt;/p&gt;

&lt;p&gt;What’s cool is: components that receive service tokens can attenuate them. For instance, we could lock a token to a particular worker, or even a particular Fly Machine. Then we can expose the whole &lt;a href='https://fly.io/docs/machines/working-with-machines/' title=''&gt;Fly Machines API&lt;/a&gt; to customer VMs while keeping access traceable to specific customer tokens. Stealing the token from a Fly Machine doesn’t help you since it’s locked to that Fly Machine by a caveat attackers can’t strip.&lt;/p&gt;
&lt;h2 id='7' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#7' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;7&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;If a customer loses their tokens to an attacker, we can’t just blow that off and let the attacker keep compromising the account!&lt;/p&gt;
&lt;div class="right-sidenote"&gt;&lt;p&gt;This cancels every token derived through attenuation by that nonce.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;Every Macaroon we issue is identified by a unique nonce, and we can revoke tokens by that nonce. This is just a basic function of the token verification service we just described.&lt;/p&gt;

&lt;p&gt;We host token caches all over our fleet. Token revocation invalidates the caches. Anything with a cache checks frequently whether to invalidate. Revocation is rare, so just keeping a revocation list and invalidating caches wholesale seems fine.&lt;/p&gt;
&lt;h2 id='8' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#8' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;8&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;I get it, it&amp;rsquo;s tough to get me to shut up about Macaroons.&lt;/p&gt;

&lt;p&gt;A couple years ago, I &lt;a href='https://fly.io/blog/api-tokens-a-tedious-survey/' title=''&gt;wrote a long survey of API token designs&lt;/a&gt;, from JWTs (never!) to Biscuits. I had a &lt;a href='https://fly.io/blog/api-tokens-a-tedious-survey/#macaroons' title=''&gt;bunch to say about Macaroons&lt;/a&gt;, not all of it positive, and said we&amp;rsquo;d be plowing forward with them at Fly.io.&lt;/p&gt;

&lt;p&gt;My plan had been to follow up soon after with a deep dive on Macaroons as we planned them for Fly.io. I&amp;rsquo;m glad I didn&amp;rsquo;t do that, not just because it would&amp;rsquo;ve been embarrassing to announce a feature that took us over 2 years to launch, but also because the process of working on this with Ben Toews changed a lot of my thinking about them.&lt;/p&gt;

&lt;p&gt;I think if you asked Ben, he&amp;rsquo;d say he had mixed feelings about how much complexity we wrangled to get this launched. On the other hand: we got a lot of things out of them without trying very hard:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Security tokens you can (almost) email to your users and partners without putting your account at risk.
&lt;/li&gt;&lt;li&gt;A flexible permission system, encoded directly into the tokens, that users can drive without talking to our servers.
&lt;/li&gt;&lt;li&gt;A plugin system that users can (when we clean up the tooling) use themselves, to add things like Passkeys or two-person-approval rules or audit logging, without us getting in the middle.
&lt;/li&gt;&lt;li&gt;An SSO system that can stack different IdPs, mandate SSO login, and do that on a per-&lt;code&gt;Organization&lt;/code&gt; basis.
&lt;/li&gt;&lt;li&gt;&lt;a href='https://www.latacora.com/blog/2018/06/12/a-childs-garden/' title=''&gt;Inter-service authorization&lt;/a&gt; that is traceable back to customer actions, so our servers can&amp;rsquo;t just make up which apps they&amp;rsquo;re allowed to look at.
&lt;/li&gt;&lt;li&gt;An elegant way of exposing our own APIs to customer Fly Machines with ambient authentication, but without the &lt;a href="https://github.com/SummitRoute/imdsv2_wall_of_shame/blob/main/README.md"&gt;AWS IMDSv1 credential theft problem&lt;/a&gt;.
&lt;/li&gt;&lt;/ul&gt;

&lt;p&gt;There are downsides and warts! I&amp;rsquo;m mostly not telling you about them! Pure restrictive caveats are an awkward way to express some roles. And, blinded by my hunger to get Macaroons deployed, I spat in the face of science and used internal database IDs as our public caveat format, an act for which JP will never forgive me.&lt;/p&gt;

&lt;p&gt;If i&amp;rsquo;ve piqued your interest, &lt;a href='https://github.com/superfly/macaroon' title=''&gt;the code for this stuff is public&lt;/a&gt;, along with some more &lt;a href='https://github.com/superfly/macaroon/blob/main/macaroon-thought.md' title=''&gt;detailed technical documentation&lt;/a&gt;.&lt;/p&gt;
</content>
  </entry>
  <entry>
    <title>How Yoko Li makes towns, tamagoes, and tools for local AI</title>
    <link rel="alternate" href="https://fly.io/blog/how-i-fly-yoko-li/"/>
    <id>https://fly.io/blog/how-i-fly-yoko-li/</id>
    <published>2024-01-08T00:00:00+00:00</published>
    <updated>2025-05-20T16:26:25+00:00</updated>
    <media:thumbnail url="https://fly.io/blog/how-i-fly-yoko-li/assets/chat-bird-cover-thumb.webp"/>
    <content type="html">&lt;p&gt;Hello all, and welcome to another episode of How I Fly, a series where I interview developers about what they do with technology, what they find exciting, and the unexpected things they’ve learned along the way. This time I’m talking with &lt;a href='https://twitter.com/stuffyokodraws' title=''&gt;Yoko Li&lt;/a&gt;, an investment partner at A16Z who’s also an open-source AI developer. She works on some of the most exciting AI projects in the world. I’m excited to share them with you today, with fun stories about the lessons she’s learned along the way.&lt;/p&gt;
&lt;h2 id='cool-experiments' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#cool-experiments' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Cool Experiments&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;One of Yoko’s most thought-provoking experiments is &lt;a href='https://www.convex.dev/ai-town' title=''&gt;AI Town&lt;/a&gt;, a virtual town populated by AI agents that talk with each other. It takes advantage of the randomness of AI responses to create emergent behavior. When you open it, it looks like this:&lt;/p&gt;

&lt;p&gt;&lt;img alt="A picture of the AI Town homepage, a UI showing a top-down 2D RPG view with a visible river and a tent. The UI shows a conversation with the characters Alice and Stella." src="/blog/how-i-fly-yoko-li/assets/image1.webp" /&gt;&lt;/p&gt;

&lt;p&gt;You can see the AI agents talking with each other and watch how the relationships between them form and change over time. It’s also a lot of fun to watch.&lt;/p&gt;

&lt;p&gt;One of Yoko’s other experiments is &lt;a href='https://ai-tamago.fly.dev/' title=''&gt;AI Tamago&lt;/a&gt;, a &lt;a href='https://en.wikipedia.org/wiki/Tamagotchi' title=''&gt;Tamagochi&lt;/a&gt; virtual pet implemented with a large language model instead of the state machine that we’re all used to. AI Tamago uses an unmodified version of LLaMA 2 7B to take in game state and user inputs, then it generates what happens next. Every time you interact with your pet, it feeds data to LLaMA 2 and then uses Ollama’s JSON mode to generate unexpected output.&lt;/p&gt;

&lt;p&gt;&lt;img alt="A picture of the homepage of AI Tamago, showing a virtual pet with buttons to feed the pet, play with the pet, clean the pet, discipline the pet, check pet status, and deliver medical care to the pet." src="/blog/how-i-fly-yoko-li/assets/image4.webp" /&gt;&lt;/p&gt;

&lt;p&gt;It’s all the fun of the classic Tamagochi toys from the 90’s (including the ability to randomly discipline your virtual pet) without any of the coin cell batteries or having to carry around the little egg-shaped puck.&lt;/p&gt;

&lt;p&gt;But that’s just something you can watch, not something that’s as easy to play with on your own machine. Yoko has also worked on the &lt;a href='https://github.com/ykhli/local-ai-stack' title=''&gt;Local AI Starter Kit&lt;/a&gt; that lets you go from zero to AI in minutes. It’s a collection of chains of models that let you ingest a bunch of documents, store them in a database, and then use those documents as context for a language model to generate responses. It’s everything you need to implement a “chat with a knowledge base” feature.&lt;/p&gt;
&lt;h3 id='the-dark-of-ai-experiments' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#the-dark-of-ai-experiments' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;The dark of AI experiments&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;The Local AI Starter Kit is significant because normally to do this, you need to set up billing and API keys for at least four different API providers, and then you need to write a bunch of (hopefully robust) code to tie it all together. With the Local AI Starter Kit, you can do this on your own hardware, with your own data, and your own models privately. It’s a huge step forward for democratizing access to this technology.&lt;/p&gt;

&lt;p&gt;Document search is one of my favorite usecases for AI, and it’s one of the most immediately useful ones. It’s also one of the most fiddly and annoying to get right. To help illustrate this, I’ve made a diagram of the steps involved with setting up document search by hand:&lt;/p&gt;

&lt;p&gt;&lt;img alt="A diagram showing the process of ingesting a pile of markdown documents into a vector database. The documents are broken into a collection of sections, then each section is passed through an embedding model and the resulting vectors are stored in a vector database." src="/blog/how-i-fly-yoko-li/assets/image3.webp" /&gt;&lt;/p&gt;

&lt;p&gt;You start with your Markdown documents. Most Markdown documents are easily broken up into sections where each section will focus on a single aspect of the larger topic of the document. You can take advantage of this best practice by letting people search for each section individually, which is typically a lot more useful than just searching the entire document.&lt;/p&gt;
&lt;div class="right-sidenote"&gt;&lt;p&gt;Okay, okay, fine. Language encircles concepts instead of defining them directly. The point still stands that we’re operating at a level “below” words and sentences, I don’t want to bog this down in a bunch of linear algebra that neither of us understand well enough to explain in a single paragraph like I am here. The main point is that it lets you “fuzzy match” relevant documents in a way that exact word search queries never could on their own.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;Essentially, the vector embeddings that you generate from an embedding model are a mathematical representation of the “concepts” that the embedding model uses that are adjacent to the text of your documents. When you use the same model to generate embeddings for your documents and user queries, this lets you find documents that are similar to the query, but not precisely the same exact words. This is called “fuzzy searching” and it is one of the most difficult problems in computer science (right next to naming things).&lt;/p&gt;

&lt;p&gt;When a user comes to search the database, you do the same thing as ingestion:&lt;/p&gt;

&lt;p&gt;&lt;img alt="A diagram showing the full flow for doing document search Q&amp;amp;A with a vector database. The user submits a question to an API endpoint, the question is broken into embedding vectors and used to search for similar vectors in the database. The relevant document fragments are fed into the prompt for a large language model to generate a response that is grounded in the facts from the documents that were ingested. The response is streamed to the user one token at a time." src="/blog/how-i-fly-yoko-li/assets/image2.webp" /&gt;&lt;/p&gt;

&lt;p&gt;The user query comes into your API endpoint. You use the same embedding model from earlier (omitted from the diagram for brevity) to turn that query into a vector. Then you query the same vector database to find documents that are similar to the query. Then you have a list of documents with metadata like the URL to the documentation page or section fragment in that page. From here you have two options. You can either use the documents to return a list of results to the user, or you can do the more fun thing: using those documents as context for a large language model to generate a response grounded in the relevant facts in those documents.&lt;/p&gt;
&lt;div class="right-sidenote"&gt;&lt;p&gt;I think it’s also how OpenAI’s custom GPTs work, but they haven’t released technical details about how they work so this is outright speculation on my part.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;This basic pattern is called Retrieval-augmented Generation (RAG), and it’s how Bing’s copilot chatbot works. The Local AI Starter Kit makes setting this pipeline up &lt;em&gt;effortless&lt;/em&gt; and &lt;em&gt;fast&lt;/em&gt;. It’s a huge step forward for making this groundbreaking technology accessible to everyone.&lt;/p&gt;
&lt;h2 id='the-struggles' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#the-struggles' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;The struggles&lt;/span&gt;&lt;/h2&gt;
&lt;blockquote&gt;
&lt;p&gt;When I was trying to get the AI models in AI Town to output JSON, I tried a bunch of different things. I got some good results by telling the model to &amp;ldquo;only reply in JSON, no prose&amp;rdquo;, but we ended up using a model tuned for outputting code. I think I inspired &lt;a href='https://ollama.ai' title=''&gt;Ollama&lt;/a&gt; to add their JSON output feature.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;One of the main benefits of large language models is that they are essentially stochastic models of the entire Internet. They have a bunch of patterns formed that can let you create surprisingly different outputs from similar inputs. This is also one of the main drawbacks of large language models: they are essentially stochastic models of the entire Internet. They have a bunch of patterns formed that can let you create surprisingly different outputs from similar inputs. The outputs of these models are usually correct-ish enough (more correct if you ground the responses in document fact like you do with a Retrieval-augmented Generation system), but they are not always aligned with our observable reality.&lt;/p&gt;

&lt;p&gt;A lot of the time you will get outputs that don’t make any logical or factual sense. These are called “hallucinations” and they are one of the main drawbacks of large language models. If a hallucination pops in at the worst times, you’ve accidentally told someone how to poison themselves with chocolate chip cookies. This is, as the kids say, “bad”.&lt;/p&gt;

&lt;p&gt;The inherent randomness of the output of a large language model means that it can be difficult to get an exactly parsable format. Most of the time, you’d be able to coax the model to get usable JSON output, but without schema it can sometimes generate wildly different JSON responses. Only sometimes. This isn’t deterministic and Yoko has found that this is one of the most frustrating parts of working with large language models.&lt;/p&gt;
&lt;div class="right-sidenote"&gt;&lt;p&gt;This works by making any offending ungrammatical tokens weighted to negative infinity. It’s amazingly hacky but the hilarious part is that it works.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;However, there are workarounds. &lt;a href='https://github.com/ggerganov/llama.cpp' title=''&gt;llama.cpp&lt;/a&gt; offers a way to use a grammar file to strictly guide the output of a large language model by using context-free grammar. This lets you get something more deterministic, but it’s still not perfect. It’s a lot better than nothing, though.&lt;/p&gt;

&lt;p&gt;One of the fun things that can happen with this is that you can have the model fail to generate anything but an endless stream of newlines in JSON mode. This is hilarious and usually requires some special detection logic to handle and restart the query. There’s work being done to let you use JSON schema to guide the generation of large language model outputs, but it’s not currently ready for the masses.&lt;/p&gt;
&lt;div class="right-sidenote"&gt;&lt;p&gt;If it’s dumb and it works, is it really dumb?&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;However, one of the easiest ways to hack around this is by using a model that generates code instead of text. This is how Yoko got the AI Town and AI Tamago models to output JSON that was mostly valid. It’s a hack, but it works. This was made a lot easier for AI town when one of the tools they use (&lt;a href='https://ollama.ai' title=''&gt;Ollama&lt;/a&gt;) added support for JSON output from the model. This is a lot better than the code generation model hack, but research continues.&lt;/p&gt;
&lt;h2 id='the-simple-joy-of-unexpected-outputs' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#the-simple-joy-of-unexpected-outputs' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;The simple joy of unexpected outputs&lt;/span&gt;&lt;/h2&gt;
&lt;blockquote&gt;
&lt;p&gt;When I was making AI Town, I was inspired by &lt;a href='https://en.wikipedia.org/wiki/The_Lifecycle_of_Software_Objects' title=''&gt;The Lifecycle of Software Objects&lt;/a&gt; by Ted Chiang. It&amp;rsquo;s about a former zookeeper that trained AI agents to be pets, kinda like how we use Reinforcement Learning from Human Feedback to train AI models like ChatGPT.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;However, at the same time, there are cases where hallucinations are not only useful, but they are what make the implementation of a system possible. If large language models are essentially massive banks of the word frequencies of a huge part of culture, then the emergent output can create unexpected things that happen frequently. This lets you have emergent behavior form, this can be the backbone of games and is the key thing that makes AI Town work as well as it does.&lt;/p&gt;

&lt;p&gt;AI Tamago is also completely driven off of the results of large language model hallucinations. They are the core of what drives user inputs, the game loop, and the surprising reactions you get when disciplining your pet. The status screen takes in the game state and lets you know what your pet is feeling in a way that the segment displays of the Tamagochi toys could never do.&lt;/p&gt;

&lt;p&gt;These enable you to build workflows that are &lt;em&gt;augmented&lt;/em&gt; by the inherent randomness of the hallucinations instead of seeing them as drawbacks. This means you need to choose outputs that can have the hallucinations shine instead of being ugly warts you need to continuously shave away. Instead of using them for doing pathfinding, have them drive the AI of your characters or writing the A* pathfinding algorithm so you don’t have to write it again for the billionth time.&lt;/p&gt;

&lt;p&gt;I’m not saying that large language models can replace the output of a human, but they are more like a language server for human languages as well as programming languages. They are best used when you are generating the boilerplate you don’t want to do yourself, or when you are throwing science at the wall to see what sticks.&lt;/p&gt;
&lt;h2 id='in-conclusion' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#in-conclusion' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;In conclusion&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;Yoko is showing people how to use AI today, on local machines, with models of your choice, that allow you to experiment, hack and learn.&lt;/p&gt;

&lt;p&gt;I can’t wait to see what’s next!&lt;/p&gt;

&lt;p&gt;If you want to follow what Yoko does, here’s a few links to add to your feeds:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Yoko’s &lt;a href='https://twitter.com/stuffyokodraws' title=''&gt;Twitter&lt;/a&gt; (or X, or whatever we&amp;rsquo;re supposed to call it now)
&lt;/li&gt;&lt;li&gt;Yoko’s &lt;a href='https://github.com/ykhli' title=''&gt;GitHub&lt;/a&gt;
&lt;/li&gt;&lt;li&gt;Yoko’s &lt;a href='https://yoko.dev/' title=''&gt;Website&lt;/a&gt;
&lt;/li&gt;&lt;/ul&gt;

&lt;p&gt;(insert standard conclusion diatribe here)&lt;/p&gt;
</content>
  </entry>
  <entry>
    <title>Deploy Your Own (Not) Midjourney Bot on Fly GPUs</title>
    <link rel="alternate" href="https://fly.io/blog/not-midjourney-bot/"/>
    <id>https://fly.io/blog/not-midjourney-bot/</id>
    <published>2024-01-04T00:00:00+00:00</published>
    <updated>2025-05-20T16:26:25+00:00</updated>
    <media:thumbnail url="https://fly.io/blog/not-midjourney-bot/assets/purple-balloon-taking-off-thumb.webp"/>
    <content type="html">&lt;div class="lead"&gt;&lt;p&gt;Fly.io has Enterprise-grade GPUs and servers all over the globe (or &lt;em&gt;disk&lt;/em&gt;, depending on which side of the flat Earth debate you fall on) making it a great place to deploy your next disruptive AI app.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;Some people daydream about normal things, like coffee machines or raising that Series A round (those are normal things to dream about, right?). I daydream about commanding a fleet of chonky &lt;a href='https://resources.nvidia.com/en-us-l40s/l40s-datasheet-28413/' title=''&gt;NVIDIA Lovelace L40Ss&lt;/a&gt;. Also, totally normal. Well, fortunately for me and anyone else wanting to explore the world of generative AI — Fly.io has GPUs now!&lt;/p&gt;

&lt;p&gt;Sure, this technology will probably end up with the AI &lt;a href='https://marketoonist.com/2023/03/ai-written-ai-read.html' title=''&gt;talking to itself&lt;/a&gt; while we go about our lives — but it seems like it&amp;rsquo;s here to stay, so we should at least have some fun with it. In this post we&amp;rsquo;ll put these GPUs to task and you&amp;rsquo;ll learn how to build your very own AI image-generating Discord bot, kinda like Midjourney. Available 24/7 and ready to serve up all the pictures of cats eating burritos your heart desires. And because I&amp;rsquo;d never tell you to draw the rest of the owl, I&amp;rsquo;ll link to working code that you can deploy today.&lt;/p&gt;
&lt;h2 id='latent-diffusion-models-have-entered-the-chat' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#latent-diffusion-models-have-entered-the-chat' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Latent Diffusion Models Have Entered the Chat&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;In the realm of AI image generation, two names have become prominent: Midjourney and Stable Diffusion. Both are image generating software that allow you to synthesize an image from a textual prompt. One is a closed source paid service, while the other is open source and can run locally. Midjourney gained popularity because it allowed the less technically-inclined among us to explore this technology through its ease of use. Stable Diffusion democratized access to the technology, but it can be quite tricky to get good results out of it.&lt;/p&gt;

&lt;p&gt;Enter &lt;a href='https://github.com/lllyasviel/Fooocus' title=''&gt;Fooocus&lt;/a&gt; (pronounced &lt;em&gt;focus&lt;/em&gt;), an open source project that combines the best of both worlds and offers a user-friendly interface to Stable Diffusion. It&amp;rsquo;s hands down the easiest way to get started with Stable Diffusion. Sure there are more popular tools like Stable Diffusion web UI and ComfyUI, but Fooocus adds some magic to reduce the need to manually tweak a bunch of settings.  The most significant feature is probably GPT-2-based &amp;ldquo;&lt;a href='https://github.com/lllyasviel/Fooocus/discussions/117#raw' title=''&gt;prompt expansion&lt;/a&gt;&amp;rdquo; to dynamically enhance prompts.&lt;/p&gt;

&lt;p&gt;The point of Fooocus is to &lt;em&gt;focus&lt;/em&gt; on your prompt. The more you put into it, the more you get out. That said, a very simple prompt like &amp;ldquo;forest elf&amp;rdquo; can return high-quality images without the need to trawl the web for prompt ideas or fiddle with knobs and levers (although they&amp;rsquo;re there if you want them).&lt;/p&gt;

&lt;p&gt;So, what can this thing &lt;em&gt;do&lt;/em&gt;? Well, this…&lt;/p&gt;

&lt;p&gt;&lt;img alt="A black and white sketch of hot-air balloon over a mountain range generated using Fooocus with &amp;quot;Pencil Sketch Drawing&amp;quot; style and quality = True" src="/blog/not-midjourney-bot/assets/./balloon-sketch.webp" /&gt;&lt;/p&gt;

&lt;p&gt;Here&amp;rsquo;s the full command I&amp;rsquo;ve used to generate this image: &lt;code&gt;/imagine prompt: sketch of hot-air balloon over a mountain range style1: Pencil Sketch Drawing quality: true ar: 1664×576&lt;/code&gt;&lt;/p&gt;
&lt;h2 id='what-were-building' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#what-were-building' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;What We&amp;rsquo;re Building&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;We&amp;rsquo;ll deploy two applications. The code to run the bot itself will run on normal VM hardware, and the API server doing all the hard work synthesizing alpacas out of thin air will run on GPU hardware.&lt;/p&gt;

&lt;p&gt;&lt;img alt="An architecture diagram explaining how the two apps will communicate and return the requested image to an end user." src="/blog/not-midjourney-bot/assets/./arch-diagram.png?center&amp;amp;2/3" /&gt;&lt;/p&gt;

&lt;p&gt;Fooocus is served up as a web UI by default, but with a little elbow grease we can interact with it as a REST API. Fortunately, with more than 25k stars on GitHub at the time of writing, the project has a lively open-source community, so we don&amp;rsquo;t need to do much work here — it&amp;rsquo;s already been done for us. &lt;a href='https://github.com/konieshadow/Fooocus-API' title=''&gt;Fooocus-API&lt;/a&gt; is a project that shoves FastAPI in front of a Fooocus runtime. We&amp;rsquo;ll use this for the API server app.&lt;/p&gt;

&lt;p&gt;The Python-based bot connects to the &lt;a href='https://discord.com/developers/docs/topics/gateway' title=''&gt;Discord Gateway API&lt;/a&gt; using the &lt;a href='https://github.com/Pycord-Development/pycord' title=''&gt;Pycord&lt;/a&gt; library. When it starts up, it maintains an open pipe for data to flow back and forth via WebSockets. The bot app also includes a client that knows how to talk to the API server using Flycast and request the image it needs via HTTP.&lt;/p&gt;

&lt;p&gt;When we request an image from Discord using the &lt;code&gt;/imagine&lt;/code&gt; slash command, we immediately respond using Pycord&amp;rsquo;s &lt;code&gt;defer()&lt;/code&gt; function to let Discord know that the request has been received and the bot is working on it — it&amp;rsquo;ll take a few seconds to process your prompt, fabricate an image, upload it to Discord and let you share it with your friends. This is a blocking operation, so it won&amp;rsquo;t perform well if you have hundreds of people on your Discord Server using the command. For that, you&amp;rsquo;ll want to jiggle some wires to make the code non-blocking. But for for now, this gives us a nice UX for the bot.&lt;/p&gt;

&lt;p&gt;When the API server returns the image, it gets saved to disk. We&amp;rsquo;ll use the fantastic &lt;a href='https://github.com/sqids/sqids-python' title=''&gt;Sqids&lt;/a&gt; library to generate collision-free file names:&lt;/p&gt;
&lt;div class="highlight-wrapper group relative python"&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-wrap-target="#code-8ivdq00o"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-copy-target="sibling"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight '&gt;&lt;code id="code-8ivdq00o"&gt;&lt;span class="n"&gt;unique_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sqids&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;encode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;author&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;())]&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;result_filename&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="s"&gt;"result_&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;unique_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;.png"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;We&amp;rsquo;ll also use &lt;code&gt;asyncio&lt;/code&gt; to check if the image is ready every second, and when it is, we send it off to Discord to complete the request:&lt;/p&gt;
&lt;div class="highlight-wrapper group relative python"&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-wrap-target="#code-dwletmxh"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-copy-target="sibling"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight '&gt;&lt;code id="code-dwletmxh"&gt;&lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;exists&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result_filename&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;asyncio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sleep&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nb"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result_filename&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"rb"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;respond&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="nb"&gt;file&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;discord&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;File&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;result_filename&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Neither of these two apps will be exposed to the Internet, yet they&amp;rsquo;ll still be able to communicate with each other. One of the undersold stories about Fly.io is the ease with which two applications can communicate over the private network. We assign special IPv6 private network (6pn) addresses within the same organizational space and applications can effortlessly discover and connect to one another without any additional configuration.&lt;/p&gt;

&lt;p&gt;But what about load balancing and this &amp;ldquo;scale-to-zero&amp;rdquo; thing? We don&amp;rsquo;t &lt;em&gt;just&lt;/em&gt; want our two apps to talk to each other, we want the Fly Proxy to start our Machine when a request comes in, and stop it when idle. For that, we&amp;rsquo;ll need &lt;a href='https://fly.io/docs/reference/private-networking/#flycast-private-load balancing' title=''&gt;Flycast&lt;/a&gt;, our private load balancing feature.&lt;/p&gt;

&lt;p&gt;When you assign a Flycast IP to your app, you can route requests using a special &lt;code&gt;.flycast&lt;/code&gt; domain. Those requests are routed through the Fly Proxy instead of directly to instances in your app. Meaning you get all the load balancing, rate limiting and other proxy goodness that you&amp;rsquo;re accustomed to. The Proxy runs a process which can automatically downscale Machines every few minutes. It&amp;rsquo;ll also start them right back up when a request comes in — this means we can take advantage of scale-to-zero, saving us a bunch of money!&lt;/p&gt;
&lt;h2 id='the-imagine-command' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#the-imagine-command' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;The &lt;code&gt;/imagine&lt;/code&gt; Command&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;The slash command is the heart of your bot, enabling you to generate images based on your prompt, right from within Discord. When you type &lt;code&gt;/imagine&lt;/code&gt; into the Discord chat, you&amp;rsquo;ll see some command options pop up.&lt;/p&gt;

&lt;p&gt;You&amp;rsquo;ll need to input your base prompt (e.g. &amp;ldquo;an alpaca sleeping in a grassy field&amp;rdquo;) and  optionally pick some styles (&amp;ldquo;Pencil Sketch Drawing&amp;rdquo;, &amp;ldquo;Futuristic Retro Cyberpunk&amp;rdquo;, &amp;ldquo;MRE Dark Cyberpunk&amp;rdquo; etc). With Fooocus, combining multiple styles — &amp;ldquo;style-chaining&amp;rdquo; — can help you achieve amazing results. Set the aspect ratio or provide negative prompts if needed, too.&lt;/p&gt;

&lt;p&gt;After you execute the command, the bot will request the image from the API, then send it as a response in the chat. Let’s see it in action!&lt;/p&gt;

&lt;p&gt;&lt;img alt="A dif demo run through showcasing the ability of the bot to generate images from Discord" src="/blog/not-midjourney-bot/assets/./demo.gif?card&amp;amp;center" /&gt;&lt;/p&gt;
&lt;h2 id='deployment-speedrun' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#deployment-speedrun' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Deployment Speedrun&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;&lt;strong class='font-semibold text-navy-950'&gt;First, we&amp;rsquo;ll deploy the API server.&lt;/strong&gt; For convenience (and to speed things up), we&amp;rsquo;ll use a pre-built image when we deploy. With dependencies like &lt;code&gt;torch&lt;/code&gt; and &lt;code&gt;torchvision&lt;/code&gt; bundled in, it&amp;rsquo;s a hefty image weighing in just shy of 12GB. With a normal Fly Machine  this would not only be a bad idea, but not even possible due to an 8GB limit for the VMs rootfs. Fortunately the wizards behind Fly GPUs have accounted for our need to run huge models and their dependencies, and awarded us 50GB of rootfs.&lt;/p&gt;
&lt;div class="right-sidenote"&gt;&lt;p&gt;Fly GPUs use &lt;a href="https://github.com/cloud-hypervisor/cloud-hypervisor" title=""&gt;Cloud Hypervisor&lt;/a&gt; and not &lt;a href="https://github.com/firecracker-microvm/firecracker" title=""&gt;Firecracker&lt;/a&gt; (like a regular Fly Machine) for virtualization. But even with a 12GB image, this doesn’t stop the Machine from booting in seconds when a new request comes in through the Proxy.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;To start, clone the template &lt;a href='https://github.com/fly-apps/not-midjourney-bot' title=''&gt;repository&lt;/a&gt;. You&amp;rsquo;ll need this for both the bot and server apps. Then deploy the server with the Fly CLI:&lt;/p&gt;
&lt;div class="highlight-wrapper group relative bash"&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-wrap-target="#code-kdezxoz"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-copy-target="sibling"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight '&gt;&lt;code id="code-kdezxoz"&gt;fly deploy &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--image&lt;/span&gt; ghcr.io/fly-apps/not-midjourney-bot:server &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--config&lt;/span&gt; ./server/fly.toml &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--no-public-ips&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;This command tells Fly.io to deploy your application based on the configuration specified in the &lt;code&gt;fly.toml&lt;/code&gt;, while the &lt;code&gt;--no-public-ips&lt;/code&gt; flag secures your app by not exposing it to the public Internet.&lt;/p&gt;

&lt;p&gt;Remember Flycast? To use it, we’ll allocate a private IPv6:&lt;/p&gt;
&lt;div class="highlight-wrapper group relative bash"&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-wrap-target="#code-t586n0x7"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-copy-target="sibling"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight '&gt;&lt;code id="code-t586n0x7"&gt;fly ips allocate-v6 &lt;span class="nt"&gt;--private&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Now, let&amp;rsquo;s take a look at our &lt;a href='https://github.com/fly-apps/not-midjourney-bot/blob/134bb634f97bf81040e489650f2334b48d976c10/server/fly.toml' title=''&gt;&lt;code&gt;fly.toml&lt;/code&gt;&lt;/a&gt; config:&lt;/p&gt;
&lt;div class="highlight-wrapper group relative toml"&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-wrap-target="#code-avnpvpja"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-copy-target="sibling"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight '&gt;&lt;code id="code-avnpvpja"&gt;&lt;span class="py"&gt;app&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"alpaca-image-gen"&lt;/span&gt;
&lt;span class="py"&gt;primary_region&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"ord"&lt;/span&gt;

&lt;span class="nn"&gt;[[vm]]&lt;/span&gt;
  &lt;span class="py"&gt;size&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"performance-8x"&lt;/span&gt;
  &lt;span class="py"&gt;memory&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"16gb"&lt;/span&gt;
  &lt;span class="py"&gt;gpu_kind&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"l40s"&lt;/span&gt;

&lt;span class="nn"&gt;[[services]]&lt;/span&gt;
  &lt;span class="py"&gt;internal_port&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;8888&lt;/span&gt;
  &lt;span class="py"&gt;protocol&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"tcp"&lt;/span&gt;
  &lt;span class="py"&gt;auto_stop_machines&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
  &lt;span class="py"&gt;auto_start_machines&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
  &lt;span class="py"&gt;min_machines_running&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;

    &lt;span class="nn"&gt;[[services.ports]]&lt;/span&gt;
      &lt;span class="py"&gt;handlers&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nn"&gt;["http"]&lt;/span&gt;
      &lt;span class="py"&gt;port&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;80&lt;/span&gt;
      &lt;span class="py"&gt;force_https&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;

&lt;span class="nn"&gt;[mounts]&lt;/span&gt;
  &lt;span class="py"&gt;source&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"repositories"&lt;/span&gt;
  &lt;span class="py"&gt;destination&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"/app/repositories"&lt;/span&gt;
  &lt;span class="py"&gt;initial_size&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"20gb"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;There are a few key things to note here:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Currently, the NVIDIA L40Ss we&amp;rsquo;re using when we specify &lt;code&gt;gpu_kind&lt;/code&gt; are only available in &lt;code&gt;ORD&lt;/code&gt;, so that&amp;rsquo;s what we&amp;rsquo;ve set the &lt;code&gt;primary_region&lt;/code&gt; to. We&amp;rsquo;re rolling out more GPUs to more regions in a hurry — but for now we&amp;rsquo;ll host the bot in Chicago.
&lt;/li&gt;&lt;li&gt;Out of the box, 8GB of system RAM is suggested. In my testing this wasn&amp;rsquo;t close to enough: the Machine would frequently run out of memory and crash. I got things working better by using 16GB of RAM.
&lt;/li&gt;&lt;li&gt;The FastAPI server binds to port 8888; we need to set this as our &lt;code&gt;internal_port&lt;/code&gt;, or the Fly Proxy won&amp;rsquo;t know where to send requests.
&lt;/li&gt;&lt;li&gt;We want our Machine to &lt;a href='https://fly.io/docs/apps/autostart-stop/' title=''&gt;automatically stop and start&lt;/a&gt;.
&lt;/li&gt;&lt;li&gt;Flycast doesn&amp;rsquo;t do HTTPS, so we won&amp;rsquo;t force it here. Don&amp;rsquo;t worry, it&amp;rsquo;s still encrypted over the wire!
&lt;/li&gt;&lt;li&gt;A volume is automatically created on the first deploy. On first boot, the app clones the Fooocus repo and downloads the Stable Diffusion model checkpoints onto that volume. This takes a couple of minutes, but the next time the Machine starts, it&amp;rsquo;ll have everything it needs to serve a request within seconds.
&lt;/li&gt;&lt;/ol&gt;
&lt;div class="callout"&gt;&lt;p&gt;The &lt;a href="https://github.com/fly-apps/not-midjourney-bot/blob/84e72d1e7048627b7c845fe3d44d45b278e451d5/README.md" title=""&gt;&lt;strong class="font-semibold text-navy-950"&gt;README&lt;/strong&gt;&lt;/a&gt; for this project has detailed instructions about setting up your Discord bot and adding it to a Server.  After setting up the permissions and privileged intents, you’ll get an OAuth2 URL. Use this URL to invite your bot to your Discord server and confirm the permissions. Once that’s done, grab your Discord API token, you’ll need it for the next step.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;&lt;strong class='font-semibold text-navy-950'&gt;With the API server up and running, it&amp;rsquo;s time to deploy the Discord bot.&lt;/strong&gt; This app will run on a normal Fly Machine, no GPU required. First, set the &lt;code&gt;DISCORD_TOKEN&lt;/code&gt; and &lt;code&gt;FOOOCUS_API_URL&lt;/code&gt; (the Flycast endpoint for the API server) secrets, using the Fly CLI. Then deploy:&lt;/p&gt;
&lt;div class="highlight-wrapper group relative bash"&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-wrap-target="#code-wc4jofjn"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-copy-target="sibling"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight '&gt;&lt;code id="code-wc4jofjn"&gt;fly deploy &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--image&lt;/span&gt; ghcr.io/fly-apps/not-midjourney-bot:bot &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--config&lt;/span&gt; ./bot/fly.toml &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--no-public-ips&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Notice that the bot app doesn&amp;rsquo;t need to be publicly visible on the Internet either. Under the hood, the WebSocket connection to Discord&amp;rsquo;s Gateway API allows the bot to communicate freely without the need to define any services in our &lt;code&gt;fly.toml&lt;/code&gt;. This also means that the Fly Proxy will not downscale the app like it does the GPU Machine — the bot will always appear &amp;ldquo;online&amp;rdquo;.&lt;/p&gt;
&lt;figure class="post-cta"&gt;
  &lt;figcaption&gt;
    &lt;h1&gt;Not interested in GPUs?&lt;/h1&gt;
    &lt;p&gt;You can still deploy apps on Fly.io today and be up and running in a matter of minutes.&lt;/p&gt;
      &lt;a class="btn btn-lg" href="https://fly.io/docs/speedrun/"&gt;
        Deploy an app now&lt;span class='opacity-50'&gt;→&lt;/span&gt;
      &lt;/a&gt;
  &lt;/figcaption&gt;
  &lt;div class="image-container"&gt;
    &lt;img src="/static/images/cta-turtle.webp" srcset="/static/images/[email protected] 2x" alt=""&gt;
  &lt;/div&gt;
&lt;/figure&gt;

&lt;h2 id='how-do-i-know-this-thing-is-using-gpu-for-reals' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#how-do-i-know-this-thing-is-using-gpu-for-reals' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;How Do I Know This Thing Is Using GPU for Reals?&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;That&amp;rsquo;s easy! NVIDIA provides us with a neat little command-line utility called &lt;code&gt;nvidia-smi&lt;/code&gt; which we can use to monitor and get information about NVIDIA GPU devices.&lt;/p&gt;

&lt;p&gt;Let&amp;rsquo;s SSH to the running Machine for the API server app and run an &lt;code&gt;nvidia-smi&lt;/code&gt; query in one go. It&amp;rsquo;s a little clunky, but you&amp;rsquo;ll get the point:&lt;/p&gt;
&lt;div class="highlight-wrapper group relative bash"&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-wrap-target="#code-51fc7qja"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-copy-target="sibling"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight '&gt;&lt;code id="code-51fc7qja"&gt;fly ssh console &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-C&lt;/span&gt; &lt;span class="s2"&gt;"nvidia-smi --query-gpu=gpu_name,utilization.gpu,utilization.memory,temperature.gpu,power.draw --format=csv,noheader --loop"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;&lt;div class="highlight-wrapper group relative "&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-wrap-target="#code-uemr03f6"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-copy-target="sibling"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight '&gt;&lt;code id="code-uemr03f6"&gt;Connecting to fdaa:2:f664:a7b:210:d8b2:8fd8:2... complete

NVIDIA L40S, 0 %, 0 %, 46, 88.63 W
NVIDIA L40S, 0 %, 0 %, 46, 88.61 W
NVIDIA L40S, 36 %, 4 %, 51, 103.41 W
NVIDIA L40S, 65 %, 25 %, 57, 280.90 W
NVIDIA L40S, 0 %, 0 %, 49, 91.13 W
NVIDIA L40S, 0 %, 0 %, 48, 89.76 W
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;What we&amp;rsquo;ve done is run the command on a loop while the bot is actually doing work synthesizing an image and we get to see it ramp up and consume more wattage and VRAM. The card is barely breaking a sweat!&lt;/p&gt;
&lt;h2 id='how-much-will-these-alpaca-pics-cost-me' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#how-much-will-these-alpaca-pics-cost-me' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;How Much Will These Alpaca Pics Cost Me?&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;Let&amp;rsquo;s talk about the cost-effectiveness of this setup. On Fly.io, an L40S GPU &lt;a href='https://fly.io/docs/about/pricing/#gpus-and-fly-machines' title=''&gt;costs&lt;/a&gt; $2.50/hr. Tag on a few cents per hour for the VM resources and storage for our models and you&amp;rsquo;re looking at about $3.20/hr to run the GPU Machine. It&amp;rsquo;s &lt;em&gt;on-demand&lt;/em&gt;, too — if you&amp;rsquo;re not using the compute, you&amp;rsquo;re not paying for it! Keep in mind that some of these checkpoint models can be several gigabytes and if you create a volume, you will be charged for it even when you have no Machines running. It&amp;rsquo;s worth noting too, that the non-GPU bot app falls into our &lt;a href='https://fly.io/docs/about/pricing/#free-allowances' title=''&gt;free allowance&lt;/a&gt;.&lt;/p&gt;
&lt;div class="right-sidenote"&gt;&lt;p&gt;Rates are on-demand, with no minimum usage requirements. Discounted rates for reserved GPU Machines and dedicated hosts are also available if you email &lt;a href="mailto:[email protected]" title=""&gt;[email protected]&lt;/a&gt;&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;In comparison, Midjourney offers several subscription tiers with the cheapest plan costing $10/mo and providing 3.3 hours of &amp;ldquo;fast&amp;rdquo; GPU time (roughly equivalent to an enterprise-grade Fly GPU). This works out to about $3/hr give or take a few cents.&lt;/p&gt;
&lt;h2 id='where-can-i-take-this' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#where-can-i-take-this' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Where Can I Take This?&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;There is a lot you can do to build out the bot&amp;rsquo;s functionality. You control the source code for the bot, meaning that you can make it do &lt;em&gt;whatever you want&lt;/em&gt;. You might decide to mimic Midjourney&amp;rsquo;s &lt;code&gt;/blend&lt;/code&gt; command to splice your own images into prompts (AKA img2img diffusion). You can do this by adding more commands to your &lt;a href='https://guide.pycord.dev/popular-topics/cogs' title=''&gt;Cog&lt;/a&gt;, Pycord&amp;rsquo;s way of grouping similar commands. You might decide to add a button to roll the image if you don&amp;rsquo;t like it, or even specify the number of images to return. The possibilities are endless and your cloud bill&amp;rsquo;s the limit!&lt;/p&gt;

&lt;p&gt;The full code for the bot and server (with detailed instructions on how to deploy it on Fly.io) can be found &lt;a href='https://github.com/fly-apps/not-midjourney-bot' title=''&gt;&lt;strong class='font-semibold text-navy-950'&gt;here&lt;/strong&gt;&lt;/a&gt;.&lt;/p&gt;
</content>
  </entry>
  <entry>
    <title>Fly With Alpine</title>
    <link rel="alternate" href="https://fly.io/blog/fly-with-alpine/"/>
    <id>https://fly.io/blog/fly-with-alpine/</id>
    <published>2023-12-21T00:00:00+00:00</published>
    <updated>2025-05-20T16:26:25+00:00</updated>
    <media:thumbnail url="https://fly.io/blog/fly-with-alpine/assets/fly-with-alpine-thumb.webp"/>
    <content type="html">&lt;div class="lead"&gt;&lt;p&gt;Reduce image sizes and improve startup times by switching your base image to Alpine Linux.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;Before proceeding, a caution.  This is an engineering trade-off.  Test carefully before deploying to production.&lt;/p&gt;

&lt;p&gt;By the end of this blog post you should have the information you need to make an informed decision.&lt;/p&gt;
&lt;h2 id='introduction' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#introduction' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Introduction&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;&lt;a href='https://www.alpinelinux.org/about/' title=''&gt;Alpine Linux&lt;/a&gt; is a Linux distribution that advertises itself as Small.  Simple.  Secure.&lt;/p&gt;

&lt;p&gt;It is indisputably smaller than the alternatives &amp;ndash; when measured by image size.  More on that in a bit.  Some claim that this results in less memory usage and better performance.  Others dispute these claims.  For these, it is best that you test the results for yourself with your application.&lt;/p&gt;

&lt;p&gt;Simple is harder to measure.  Some of the larger differences, like &lt;a href='https://github.com/OpenRC/openrc#readme' title=''&gt;OpenRC&lt;/a&gt; vs &lt;a href='https://systemd.io/' title=''&gt;SystemD&lt;/a&gt;, are less relevant in container environments.  Others, like &lt;a href='https://busybox.net/' title=''&gt;BusyBox&lt;/a&gt; are implementation details.  Essentially what you get is a Linux distribution with perhaps a number of standard packages (e.g., bash) not installed by default, but these can be easily added if needed.&lt;/p&gt;

&lt;p&gt;Secure is definitely an important attribute.  The alternatives make comparable claims in this area.  Do your own research in this area and come to your own conclusions.&lt;/p&gt;

&lt;p&gt;Not mentioned is the downside: Alpine Linux has a smaller ecosystem that the alternatives, particularly when compared to Debian.&lt;/p&gt;
&lt;h2 id='baseline' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#baseline' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Baseline&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;Let&amp;rsquo;s start with a baseline consisting of the Dockerfiles produced by &lt;code&gt;fly launch&lt;/code&gt; for some of the most popular
frameworks:&lt;/p&gt;
&lt;div class="highlight-wrapper group relative "&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-wrap-target="#code-257otcgi"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-copy-target="sibling"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight '&gt;&lt;code id="code-257otcgi"&gt;FROM fideloper/fly-laravel:${PHP_VERSION}
FROM hexpm/elixir:1.12.3-erlang-24.1.4-debian-bullseye-20210902-slim
FROM node:${NODE_VERSION}-slim
FROM oven/bun:${BUN_VERSION}-slim
FROM python:${PYTHON_VERSION}-slim-bullseye
FROM ruby:$RUBY_VERSION-slim
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;What may not be obvious to the naked eye from these results is that the base image for these is one of the following:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Debian Bookworm (the current &amp;ldquo;stable&amp;rdquo; distribution)
&lt;/li&gt;&lt;li&gt;Debian Bullseye (the previous &amp;ldquo;stable&amp;rdquo; distribution)
&lt;/li&gt;&lt;li&gt;Ubuntu Focal Fossa (the previous LTS release of Ubuntu)
&lt;/li&gt;&lt;/ul&gt;

&lt;p&gt;Once you factor in that Ubuntu is based on Debian, the conclusion is that Debian is effectively the default distribution for fly IO.  Rest assured that this isn&amp;rsquo;t the result of a devious conspiracy by Fly.io, but rather a reflection of the default choices made independently by the developers of a number of frameworks and runtimes.  Beyond this, all Fly.io is doing is choosing the &amp;ldquo;slim&amp;rdquo; version of the default distribution for each framework as the base.&lt;/p&gt;

&lt;p&gt;What&amp;rsquo;s likely going on here is a virtuos circle: people choose Debian because of the ecosystem, and ecosystem grows because people chose Debian.&lt;/p&gt;

&lt;p&gt;Now lets compare base image sizes:&lt;/p&gt;
&lt;table class="ml-8 mb-8"&gt;
&lt;thead&gt;
&lt;tr&gt;
  &lt;th class="px-8"&gt;
  &lt;th class="px-8 underline"&gt;Alpine
  &lt;th class="px-8 underline"&gt;Debian slim
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
  &lt;th class="text-left"&gt;Bun 1.0.18
  &lt;td class="text-center"&gt;43.10M
  &lt;td class="text-center"&gt;63.84M
&lt;/tr&gt;
&lt;tr&gt;
  &lt;th class="text-left"&gt;Node 21.4.0
  &lt;td class="text-center"&gt;46.83M
  &lt;td class="text-center"&gt;70.08M
&lt;/tr&gt;
&lt;tr&gt;
  &lt;th class="text-left"&gt;Python 3.12.1
  &lt;td class="text-center"&gt;17.59M
  &lt;td class="text-center"&gt;45.36M
&lt;/tr&gt;
&lt;tr&gt;
  &lt;th class="text-left"&gt;Ruby 3.2
  &lt;td class="text-center"&gt;40.14M
  &lt;td class="text-center"&gt;74.36M
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;


&lt;p&gt;And these numbers are just the for the base images.  I&amp;rsquo;ve measured a minimal Rails/Postgresql/esbuild application at 304MB on Alpine and 428MB on Debian Slim.  A minimal Bun application at 110MB on Alpine and 173MB on Debian Slim.  And a minimal Node application at 142MB on Alpine and 207MB on Debian Slim.&lt;/p&gt;

&lt;p&gt;In each case, corresponding Alpine images are consistently smaller than their Debian slim equivalent.&lt;/p&gt;
&lt;h2 id='switching-distributions' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#switching-distributions' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Switching Distributions&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;Switch distributions (and switching back!) is easy.&lt;/p&gt;

&lt;p&gt;The first change is to replace &lt;code&gt;-slim&lt;/code&gt; with &lt;code&gt;-alpine&lt;/code&gt; in &lt;code&gt;FROM&lt;/code&gt; statements in your &lt;code&gt;Dockerfile&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Next is to replace &lt;code&gt;apt-get update&lt;/code&gt; with &lt;code&gt;apk update&lt;/code&gt; and &lt;code&gt;apt-get install&lt;/code&gt; with &lt;code&gt;apk add&lt;/code&gt;.  Delete any options you may have like &lt;code&gt;-y&lt;/code&gt; and &lt;code&gt;--no-install-recommends&lt;/code&gt; - they aren&amp;rsquo;t needed.&lt;/p&gt;

&lt;p&gt;Now review the names of the packages you are installing.  Many are named the same.  A few are different.
You can use &lt;a href='https://pkgs.alpinelinux.org/packages' title=''&gt;alpine packages&lt;/a&gt; to look for ones to use.  Some examples of
differences:&lt;/p&gt;
&lt;table class="ml-8 mb-8" style="border-collapse: separate; border-spacing: 1rem 0"&gt;
&lt;thead&gt;
&lt;tr&gt;
  &lt;th class="px-8 underline text-left"&gt;Debian
  &lt;th class="px-8 underline text-left"&gt;Alpine
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
  &lt;td&gt;build-essential
  &lt;td&gt;build-base
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td&gt;chromium-sandbox
  &lt;td&gt;chromium-chromedriver
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td&gt;default-libmysqlclient-dev
  &lt;td&gt;mysql-client
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td&gt;default-mysqlclient
  &lt;td&gt;mysql-client
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td&gt;freedts-bin
  &lt;td&gt;freedts
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td&gt;libicu-dev
  &lt;td&gt;icu-dev
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td&gt;libjemalloc
  &lt;td&gt;jemalloc-dev
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td&gt;libjpeg-dev
  &lt;td&gt;jpeg-dev
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td&gt;libmagickwand-dev
  &lt;td&gt;imagemagick-libs
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td&gt;libsqlite3-0
  &lt;td&gt;sqlite-dev
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td&gt;libtiff-dev
  &lt;td&gt;tiff-dev
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td&gt;libvips
  &lt;td&gt;vips-dev
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td&gt;node-gyp
  &lt;td&gt;gyp
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td&gt;pkg-config
  &lt;td&gt;pkgconfig
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td&gt;python
  &lt;td&gt;python3
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td&gt;python-is-python3
  &lt;td&gt;python3
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td&gt;sqlite3
  &lt;td&gt;sqlite
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;


&lt;p&gt;Note: the above is just an approximation.  For example, while &lt;code&gt;libsqlite3-0&lt;/code&gt; and &lt;code&gt;sqlite-dev&lt;/code&gt; include everything
you need to build an application that uses sqlite3, all that is needed at runtime is &lt;code&gt;sqlite-lib&lt;/code&gt;.  This relentless attention to detail contributes to smaller final image sizes.&lt;/p&gt;

&lt;p&gt;Note: For Bun, Node, and Rails users, knowledge of how to apply the above changes are included in recent versions of the dockerfile generators that we provide.  After all, computers are good at &lt;code&gt;if&lt;/code&gt; statements:&lt;/p&gt;
&lt;div class="highlight-wrapper group relative "&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-wrap-target="#code-7aa51nfb"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-copy-target="sibling"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight '&gt;&lt;code id="code-7aa51nfb"&gt;bunx dockerfile --alpine
npx dockerfile --alpine
bin/rails generate dockerfile --alpine
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;&lt;figure class="post-cta"&gt;
  &lt;figcaption&gt;
    &lt;h1&gt;Choose your own Linux Distribution&lt;/h1&gt;
    &lt;p&gt;Deploy your project in a few minutes with Fly Launch. Then do more with Fly Machines.&lt;/p&gt;
      &lt;a class="btn btn-lg" href="https://fly.io/docs/"&gt;
        Run your entire stack near your users
      &lt;/a&gt;
  &lt;/figcaption&gt;
  &lt;div class="image-container"&gt;
    &lt;img src="/static/images/cta-cat.webp" srcset="/static/images/[email protected] 2x" alt=""&gt;
  &lt;/div&gt;
&lt;/figure&gt;

&lt;h2 id='potential-issues' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#potential-issues' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Potential issues&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;Over time, we&amp;rsquo;ve noted a number of issues.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Alpine uses &lt;a href='https://musl.libc.org/' title=''&gt;musl&lt;/a&gt; for a runtime library.  Debian uses &lt;a href='https://www.gnu.org/software/libc/' title=''&gt;glibc&lt;/a&gt;.  Software tested on glibc may not work as expected on musl. And there are other potential compatibility issues like &lt;a href='https://bell-sw.com/blog/how-to-deal-with-alpine-dns-issues/' title=''&gt;DNS&lt;/a&gt;.
&lt;/li&gt;&lt;li&gt;Debian includes both &lt;code&gt;adduser&lt;/code&gt; and &lt;code&gt;useradd&lt;/code&gt;.    Alpine, by default, only includes &lt;code&gt;adduser&lt;/code&gt;.
This can be addressed by installing package like &lt;a href='https://pkgs.alpinelinux.org/package/edge/community/armv7/shadow' title=''&gt;shadow&lt;/a&gt;, or switching to &lt;code&gt;adduser&lt;/code&gt;.
&lt;/li&gt;&lt;li&gt;Packages like &lt;a href='https://github.com/nodenv/node-build' title=''&gt;node-build&lt;/a&gt; require &lt;code&gt;bash&lt;/code&gt; which isn&amp;rsquo;t included by default.  Adding it back in allows &lt;code&gt;node-build&lt;/code&gt; to run to completion, but the end result is that a precompiled Debian executable is installed that won&amp;rsquo;t run on Alpine.
An alternative is to download an &lt;a href='https://unofficial-builds.nodejs.org/' title=''&gt;unofficial build&lt;/a&gt;.
&lt;/li&gt;&lt;li&gt;Release candidates for Alpine may not get the same level of testing as Debian resulting in problems
like &lt;a href='https://github.com/sparklemotion/sqlite3-ruby/issues/434' title=''&gt;sqlite3-ruby not working on Alpine 3.19&lt;/a&gt;.
In cases like this, stay back on previous versions of Alpine for a short while, or compile the gem for yourself.
These issues are temporary.
&lt;/li&gt;&lt;li&gt;Some packages, like Chrome, are not available for Alpine.  Alternatives like Chromium may be necessary.
&lt;/li&gt;&lt;/ul&gt;
&lt;h2 id='conclusion' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#conclusion' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Conclusion&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;While not as large a community as Debian, there is a substantial number of happy Alpine users.&lt;/p&gt;

&lt;p&gt;For the forseeable future, the default for both frameworks and there fly.io will remain Debian, but we make it easy to switch.&lt;/p&gt;

&lt;p&gt;Try it out!  Hopefully this blog has provided insight into what you should evaluate for before you switch.&lt;/p&gt;
</content>
  </entry>
  <entry>
    <title>Introducing Fly Kubernetes</title>
    <link rel="alternate" href="https://fly.io/blog/fks/"/>
    <id>https://fly.io/blog/fks/</id>
    <published>2023-12-18T00:00:00+00:00</published>
    <updated>2025-05-20T16:26:25+00:00</updated>
    <media:thumbnail url="https://fly.io/blog/fks/assets/fks-thumb.webp"/>
    <content type="html">&lt;div class="lead"&gt;&lt;p&gt;We’re Fly.io, and if you’ve been following us awhile you probably just did a double-take. We’re building a new public cloud that runs containerized applications with virtual machine isolation on our own hardware around the world. And we’ve been doing it without any K8s. Until now!&lt;/p&gt;
&lt;/div&gt;&lt;div class="callout"&gt;&lt;p&gt;&lt;strong class="font-semibold text-navy-950"&gt;Update, March 2024:&lt;/strong&gt; FKS does more stuff now, and you can read about it in &lt;a href="https://fly.io/blog/fks-beta-live/" title=""&gt;Fly Kubernetes does more now&lt;/a&gt;&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;We&amp;rsquo;ll own it: we&amp;rsquo;ve been snarky about Kubernetes. We are, at heart, old-school Unix nerds. We&amp;rsquo;re still scandalized by &lt;code&gt;systemd&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;To make matters more complicated, the problems we&amp;rsquo;re working on &lt;a href='https://fly.io/blog/carving-the-scheduler-out-of-our-orchestrator/' title=''&gt;have a lot of overlap with K8s&lt;/a&gt;, but &lt;a href='https://fly.io/blog/carving-the-scheduler-out-of-our-orchestrator/#numad' title=''&gt;just enough impedance mismatch&lt;/a&gt; that it (&lt;a href='https://www.nomadproject.io/' title=''&gt;or anything that looks like it&lt;/a&gt;) is a bad fit for our own platform.&lt;/p&gt;

&lt;p&gt;But, come on: you never took us too seriously about K8s, right? K8s is hard for us to use, but that doesn&amp;rsquo;t mean it&amp;rsquo;s not a great fit for what you&amp;rsquo;re building. We&amp;rsquo;ve been clear about that all along, right? Sure we have!&lt;/p&gt;

&lt;p&gt;Well, good news, everybody! If K8s is important for your project, and that&amp;rsquo;s all that&amp;rsquo;s been holding you back from &lt;a href='https://fly.io/docs/speedrun/' title=''&gt;trying out Fly.io&lt;/a&gt;, we&amp;rsquo;ve spent the past several months building something for you.&lt;/p&gt;
&lt;h2 id='fly-io-for-kubernetians' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#fly-io-for-kubernetians' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Fly.io For Kubernetians&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;Fly.io works by transmogrifying Docker containers into filesystems for &lt;a href='https://firecracker-microvm.github.io/' title=''&gt;lightweight hypervisors&lt;/a&gt;, and running them on servers we rack in dozens of regions around the world.&lt;/p&gt;

&lt;p&gt;You can build something like Fly.io with &amp;ldquo;standard&amp;rdquo; orchestration tools like K8s. In fact, that&amp;rsquo;s what we did to start, too. To keep things simple, we used Nomad, and instead of K8s CNIs, we built our own Rust-based TLS-terminating Anycast proxy (and designed a WireGuard/IPv6-based private network system &lt;a href='https://fly.io/blog/bpf-xdp-packet-filters-and-udp/' title=''&gt;based on eBPF&lt;/a&gt;). But the ideas are the same.&lt;/p&gt;

&lt;p&gt;The way we look at it, the signature feature of a &amp;ldquo;standard&amp;rdquo; orchestrator is the global scheduler: the global eye in the sky that keeps track of vacancies on servers and optimized placement of new workloads. That&amp;rsquo;s the problem we ran into. We&amp;rsquo;re running over 200,000 applications, and we&amp;rsquo;re doing so on every continent except Antarctica. The speed of light (and a globally distributed network of backhoes) has something to say about keeping a perfectly consistent global picture of hundreds of thousands of applications, and it&amp;rsquo;s not pleasant.&lt;/p&gt;

&lt;p&gt;The other problem we ran into is that our Nomad scheduler kept trying to outsmart us, and, worse, our customers. It turns out that our users have pretty firm ideas of where they&amp;rsquo;d like their apps to run. If they ask for São Paulo, they want São Paulo, not Rio. But global schedulers have other priorities, like optimally bin-packing resources, and sometimes &lt;code&gt;GIG&lt;/code&gt; looks just as good as &lt;code&gt;GRU&lt;/code&gt; to them.&lt;/p&gt;

&lt;p&gt;To escape the scaling and DX problems we were hitting, we rethought orchestration. Where orchestrators like K8s tend to work through distributed consensus, we keep state local to workers. Each racked server in our fleet is a source of truth about the apps running on it, and provide an API to a market-style &amp;ldquo;scheduler&amp;rdquo; that bids on resources in regions. &lt;a href='https://fly.io/blog/carving-the-scheduler-out-of-our-orchestrator/#numad' title=''&gt;You can read more about here, if you&amp;rsquo;re interested.&lt;/a&gt; We call this system the &lt;a href='https://fly.io/docs/machines/' title=''&gt;Fly Machines API.&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;An important detail to grok about how this all works – a reason we haven&amp;rsquo;t, like, beaten the CAP theorem by doing this – is that Fly Machines API calls can fail. If Nomad or K8s tries to place a workload on some server, only to find out that it&amp;rsquo;s filled up or thrown a rod, it will go hunt around for some other place to put it, like a good little robot. The Machines API won&amp;rsquo;t do this. It&amp;rsquo;ll just fail the request. In fact, it goes out of its way to fail the request quickly, to deliver feedback; if we can&amp;rsquo;t schedule work in &lt;code&gt;JNB&lt;/code&gt; right now, you might want instead to quickly deploy to &lt;code&gt;BOM&lt;/code&gt;.&lt;/p&gt;
&lt;h2 id='pluggable-orchestration-and-fks' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#pluggable-orchestration-and-fks' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Pluggable Orchestration and FKS&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;In a real sense what we&amp;rsquo;ve done here is extract a chunk of the scheduling problem out of our orchestrator, and handed it off to other components. For most of our users, that component is &lt;a href='https://github.com/superfly/flyctl' title=''&gt;&lt;code&gt;flyctl&lt;/code&gt;, our intrepid CLI&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;But &lt;a href='https://fly.io/docs/machines/working-with-machines/' title=''&gt;Fly Machines is an API&lt;/a&gt;, and anything can drive it. A lot of our users want quick answers to requests to schedule apps in specific regions, and &lt;code&gt;flyctl&lt;/code&gt; does a fine job of that. But it&amp;rsquo;s totally reasonable to want something that works more like the good little robots inside of K8s.&lt;/p&gt;

&lt;p&gt;You can build your own orchestrator with our API, but if what you&amp;rsquo;re looking for is literally Kubernetes, we&amp;rsquo;ve saved you the trouble. It&amp;rsquo;s called Fly Kubernetes, or FKS for short.&lt;/p&gt;

&lt;p&gt;FKS is an implementation of Kubernetes that runs on top of Fly.io. You start it up using &lt;code&gt;flyctl&lt;/code&gt;, by running &lt;code&gt;flyctl ext k8s create&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Under the hood, FKS is a straightforward combination of two well-known Kubernetes projects: &lt;a href='https://k3s.io/' title=''&gt;K3s, the lightweight CNCF-certified K8s distro&lt;/a&gt;, and &lt;a href='https://virtual-kubelet.io/' title=''&gt;Virtual Kubelet&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Virtual Kubelet is interesting. In K8s-land, a &lt;code&gt;kubelet&lt;/code&gt; is a host agent; it&amp;rsquo;s the thing that runs on every server in your fleet that knows how to run a K8s Pod. Virtual Kubelet isn&amp;rsquo;t a host agent; it&amp;rsquo;s a software component that pretends to be a host, registering itself with K8s as if it was one, but then sneakily proxying the Kubelet API elsewhere.&lt;/p&gt;

&lt;p&gt;In FKS, &amp;ldquo;elsewhere&amp;rdquo; is &lt;a href='https://fly.io/docs/machines/' title=''&gt;Fly Machines&lt;/a&gt;. All we have to do is satisfy various APIs that virtual kubelet exposes. For example, the API for the lifecycle of a pod:&lt;/p&gt;
&lt;div class="highlight-wrapper group relative "&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-wrap-target="#code-z4irn8cz"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-copy-target="sibling"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight '&gt;&lt;code id="code-z4irn8cz"&gt;type PodLifecycleHandler interface {
    CreatePod(ctx context.Context, pod *corev1.Pod) error
    UpdatePod(ctx context.Context, pod *corev1.Pod) error
    DeletePod(ctx context.Context, pod *corev1.Pod) error
    GetPod(ctx context.Context, namespace, name string) (*corev1.Pod, error)
    GetPodStatus(ctx context.Context, namespace, name string) (*corev1.PodStatus, error)
    GetPods(context.Context) ([]*corev1.Pod, error)
}
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;This interface is easy to map to the Fly Machines API. For example:&lt;/p&gt;
&lt;div class="highlight-wrapper group relative "&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-wrap-target="#code-pjbdlk52"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-copy-target="sibling"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight '&gt;&lt;code id="code-pjbdlk52"&gt;CreatePod -&amp;gt; POST /apps/{app_name}/machines
UpdatePod -&amp;gt; POST /apps/{app_name}/machines/{machine_id}
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;K3s, meanwhile, is a stripped-down implementation of all of K8s that fits into a single binary. K3s does a bunch of clever things to be as streamlined as it is, but the most notable of them is &lt;a href='https://github.com/k3s-io/kine' title=''&gt;kine, an API shim that switches &lt;code&gt;etcd&lt;/code&gt; out with databases like SQLite&lt;/a&gt;. Because of &lt;code&gt;kine&lt;/code&gt;, K3s can manage multiple servers, but also gracefully runs on a single server, without distributed state.&lt;/p&gt;

&lt;p&gt;So that&amp;rsquo;s what we do. When you create a cluster, we run K3s and the Virtual Kubelet on a single Fly Machine. We compile a &lt;a href='https://kubernetes.io/docs/concepts/configuration/organize-cluster-access-kubeconfig/' title=''&gt;kubeconfig&lt;/a&gt;, with which you can talk to your K3s via &lt;code&gt;kubectl&lt;/code&gt;. We set the whole thing up to run Pods on individual Fly Machines, so your cluster scales out directly using our platform, but with K8s tooling.&lt;/p&gt;

&lt;p&gt;One thing we like about this design is how much of the lifting is already done for us by the underlying platform. If you&amp;rsquo;re a K8s person, take a second to think of all the different components you&amp;rsquo;re dealing with: &lt;a href='https://etcd.io/' title=''&gt;etcd&lt;/a&gt;, specifically provisioned nodes, the &lt;a href='https://kubernetes.io/docs/reference/command-line-tools-reference/kube-proxy/' title=''&gt;kube-proxy&lt;/a&gt;, &lt;a href='https://github.com/flannel-io/flannel' title=''&gt;a CNI &lt;/a&gt;binary and configuration and its integration with the host network, containerd, registries. But Fly.io already does most of those things. So this project was mostly chipping away components until we found the bare minimum: CoreDNS, SQLite persistence, and Virtual Kubelet.&lt;/p&gt;

&lt;p&gt;We ended up with something significantly simpler than K3s, which is saying something.&lt;/p&gt;

&lt;p&gt;Fly Kubernetes has some advantages over plain &lt;code&gt;flyctl&lt;/code&gt; and &lt;code&gt;fly.toml&lt;/code&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Your deployment is more declarative than it is with the &lt;code&gt;fly.toml&lt;/code&gt; file. You declare the exact state of everything down to replica counts, autoscaling rules, volume definitions, and more.
&lt;/li&gt;&lt;li&gt;When you deploy with Fly Kubernetes, Kubernetes will automatically make your definitions match the state of the world. Machines go down? Kubernetes will whack them back online.
&lt;/li&gt;&lt;/ul&gt;

&lt;p&gt;This is a different way to do orchestration and scheduling on Fly.io. It&amp;rsquo;s not what everyone is going to want. But if you want it, you really want it, and we&amp;rsquo;re psyched to give it to you: Fly.io&amp;rsquo;s platform features, with Kubernetes handling configuration and driving your system to its desired state.&lt;/p&gt;

&lt;p&gt;We&amp;rsquo;ve kept things simple to start with. There are K8s use cases we&amp;rsquo;re a strong fit for today, and others we&amp;rsquo;ll get better at in the near future, as K8s users drive the underlying platform (and particularly our proxy) forward.&lt;/p&gt;

&lt;p&gt;&lt;strong class='font-semibold text-navy-950'&gt;Interested in getting early access? Email us at &lt;a href="mailto:[email protected]"&gt;[email protected]&lt;/a&gt; and we&amp;rsquo;ll hook you up.&lt;/strong&gt;&lt;/p&gt;
&lt;figure class="post-cta"&gt;
  &lt;figcaption&gt;
    &lt;h1&gt;Not invested in K8s?&lt;/h1&gt;
    &lt;p&gt;Nothing has to change for you! You can deploy apps on Fly.io today, in a matter of minutes, without talking to Sales.&lt;/p&gt;
      &lt;a class="btn btn-lg" href="https://fly.io/docs/speedrun/"&gt;
        Deploy an app in minutes.&lt;span class='opacity-50'&gt;→&lt;/span&gt;
      &lt;/a&gt;
  &lt;/figcaption&gt;
  &lt;div class="image-container"&gt;
    &lt;img src="/static/images/cta-turtle.webp" srcset="/static/images/[email protected] 2x" alt=""&gt;
  &lt;/div&gt;
&lt;/figure&gt;

&lt;div class="youtube-container" data-exclude-render&gt;
  &lt;div class="youtube-video"&gt;
    &lt;iframe
      width="100%"
      height="100%"
      src="https://www.youtube.com/embed/A3vFfZvUiwo"
      frameborder="0"
      allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture"
      allowfullscreen&gt;
    &lt;/iframe&gt;
  &lt;/div&gt;
&lt;/div&gt;

&lt;h2 id='what-it-all-means' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#what-it-all-means' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;What It All Means&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;One obvious thing it means is that you&amp;rsquo;ve got an investment in Kubernetes tooling, you can keep it while running things on top of Fly.io. So that&amp;rsquo;s pretty neat. Buy our cereal!&lt;/p&gt;

&lt;p&gt;But the computer science story is interesting, too. We placed a bet on an idiosyncratic strategy for doing global orchestration. We replaced global consensus, which is how Borg, Kubernetes, and Nomad all work, with a market-based system. That system was faster and, importantly, dumber than the consensus system it replaced.&lt;/p&gt;

&lt;p&gt;This had costs! Nomad&amp;rsquo;s global consensus would do truly heroic amounts of work to make sure Fly Apps got scheduled somewhere, anywhere. Like a good capitalist, Fly Machines will tell you in no uncertain terms how much work it&amp;rsquo;s willing to do for you (&amp;ldquo;less than a Nomad&amp;rdquo;).&lt;/p&gt;

&lt;p&gt;But that doesn&amp;rsquo;t mean you&amp;rsquo;re stuck with the answers Fly Machines gives by itself. Because Fly Machines is so simple, and tries so hard to be predictable, we hoped you&amp;rsquo;d be able to build more sophisticated scheduling and orchestration schemes on top of it. And here you go: Kubernetes scheduling, as a plugin to the platform.&lt;/p&gt;

&lt;p&gt;More to come! We&amp;rsquo;re itching to see just how many different ways this bet might pay off. Or: we&amp;rsquo;ll perish in flames! Either way, it&amp;rsquo;ll be fun to watch.&lt;/p&gt;
</content>
  </entry>
  <entry>
    <title>Fly.io has GPUs now</title>
    <link rel="alternate" href="https://fly.io/blog/fly-io-has-gpus-now/"/>
    <id>https://fly.io/blog/fly-io-has-gpus-now/</id>
    <published>2023-12-13T00:00:00+00:00</published>
    <updated>2025-05-20T16:26:25+00:00</updated>
    <media:thumbnail url="https://fly.io/blog/fly-io-has-gpus-now/assets/llama-portal-thumb.webp"/>
    <content type="html">&lt;div class="lead"&gt;&lt;p&gt;We’re Fly.io, we’re a new public cloud that lets you put your compute where it matters: near your users. Today we’re announcing that you can do this with GPUs too, allowing you to do AI workloads on the edge. Want to find out more? Keep reading.&lt;/p&gt;
&lt;/div&gt;&lt;h2 id='ai-is-pretty-fly' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#ai-is-pretty-fly' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;AI is pretty fly&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;AI is apparently a bit of a &lt;em&gt;thing&lt;/em&gt; (maybe even &lt;em&gt;an thing&lt;/em&gt; come to think about it). We&amp;rsquo;ve seen entire industries get transformed in the wake of ChatGPT existing (somehow it&amp;rsquo;s only been around for a year, I can&amp;rsquo;t believe it either). It&amp;rsquo;s likely to leave a huge impact on society as a whole in the same way that the Internet did once we got search engines. Like any good venture-capital funded infrastructure provider, we want to enable you to do hilarious things with AI using industrial-grade muscle.&lt;/p&gt;

&lt;p&gt;Fly.io lets you run a full-stack app&amp;mdash;or an entire dev platform based on the &lt;a href='https://fly.io/docs/machines/' title=''&gt;Fly Machines API&lt;/a&gt;&amp;mdash;close to your users. Fly.io GPUs let you attach an &lt;a href='https://www.nvidia.com/en-us/data-center/a100/' title=''&gt;Nvidia A100&lt;/a&gt; to whatever you&amp;rsquo;re building, harnessing the full power of CUDA with more VRAM than your local 4090 can shake a ray-traced stick at. With these cards (or whatever you call a GPU attached to SXM fabric), AI/ML workloads are at your fingertips. You can &lt;a href='https://fly.io/blog/transcribing-on-fly-gpu-machines/' title=''&gt;recognize speech&lt;/a&gt;, segment text, summarize articles, synthesize images, and more at speeds that would make your homelab blush. You can even set one up as your programming companion with &lt;a href='https://github.com/deepseek-ai/DeepSeek-Coder' title=''&gt;your model of choice&lt;/a&gt; in case you&amp;rsquo;ve just not been feeling it with the output of &lt;em&gt;other&lt;/em&gt; models changing over time.&lt;/p&gt;

&lt;p&gt;If you want to find out more about what these cards are and what using them is like, check out &lt;a href='https://fly.io/blog/what-are-these-gpus-really/' title=''&gt;What are these &amp;ldquo;GPUs&amp;rdquo; really?&lt;/a&gt; It covers the history of GPUs and why it&amp;rsquo;s ironic that the cards we offer are called &amp;ldquo;Graphics Processing Units&amp;rdquo; in the first place.&lt;/p&gt;
&lt;h2 id='fly-io-gpus-in-action' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#fly-io-gpus-in-action' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Fly.io GPUs in Action&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;We want you to deploy your own code with your favorite models on top of Fly.io&amp;rsquo;s cloud backbone. Fly.io GPUs make this really easy.&lt;/p&gt;

&lt;p&gt;You can get a GPU app running &lt;a href='https://ollama.ai' title=''&gt;Ollama&lt;/a&gt; (our friends in text generation) in two steps:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Put this in your &lt;code&gt;fly.toml&lt;/code&gt;:&lt;/p&gt;
&lt;div class="highlight-wrapper group relative toml"&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-wrap-target="#code-u2sd0gfo"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-copy-target="sibling"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight '&gt;&lt;code id="code-u2sd0gfo"&gt;&lt;span class="py"&gt;app&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"sandwich_ai"&lt;/span&gt;
&lt;span class="py"&gt;primary_region&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"ord"&lt;/span&gt;
&lt;span class="py"&gt;vm.size&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"a100-40gb"&lt;/span&gt;

&lt;span class="nn"&gt;[build]&lt;/span&gt;
  &lt;span class="py"&gt;image&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"ollama/ollama"&lt;/span&gt;

&lt;span class="nn"&gt;[mounts]&lt;/span&gt;
  &lt;span class="py"&gt;source&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"models"&lt;/span&gt;
  &lt;span class="py"&gt;destination&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"/root/.ollama"&lt;/span&gt;
  &lt;span class="py"&gt;initial_size&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"100gb"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;Run &lt;code&gt;fly apps create sandwich_ai &amp;amp;&amp;amp; fly deploy&lt;/code&gt;.&lt;/p&gt;
&lt;/li&gt;&lt;/ol&gt;

&lt;p&gt;If you want to read more about how to start your new sandwich empire, check out &lt;a href='https://fly.io/blog/scaling-llm-ollama/' title=''&gt;Scaling Large Language Models to zero with Ollama&lt;/a&gt;, it explains how to set up Ollama so that it &lt;em&gt;automatically scales itself down&lt;/em&gt; when it&amp;rsquo;s not in use.&lt;/p&gt;
&lt;h2 id='the-speed-of-light-is-only-so-fast' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#the-speed-of-light-is-only-so-fast' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;The speed of light is only so fast&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;Being able to spin up GPUs is great, but where Fly.io really shines is inference at the edge.&lt;/p&gt;

&lt;p&gt;Let&amp;rsquo;s say you have an app that lets users enter ingredients they have in their kitchen and receive a sandwich recipe. Your users expect their recipes &lt;em&gt;instantly&lt;/em&gt; (or at least as fast as the other leading apps). Seconds count when you need an emergency sandwich.&lt;/p&gt;
&lt;div class="left-sidenote"&gt;&lt;p&gt;&lt;br&gt;
&lt;br&gt;
&lt;br&gt;
It’s depressingly customary in the AI industry to cherry-pick outputs. This was not cherry-picked. I used &lt;a href="https://ollama.ai/library/yi:34b" title=""&gt;yi:34b&lt;/a&gt; to generate this recipe. I’m not sure what a taco salad sandwich is, but I might be willing to try it.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;&lt;img alt="A conversation between a user and an artificial intelligence. The user asks: &amp;quot;What kind of sandwich can I make with an egg, a taco shell, and a pound of ground beef?&amp;quot; The AI responds: &amp;quot; You can make a breakfast taco or a taco salad sandwich with the ingredients you have. Here&amp;#39;s how to prepare each: for the breakfast taco. Cook the pound of ground beef in a pan until browned. Season with salt, pepper, and any other desired spices like chili powder or taco seasoning. While the beef is cooking, fry an egg (or scramble it if you prefer) in another pan. Warm up the taco shell in the oven or microwave according to the package instructions. Assemble the taco by placing the ground cooked beef in the taco shell followed by the fried egg. You can also add any additional toppings like diced onions avocados, avocado, salsa or cheese if you have them. Serve with a side of hot sauce or sour cream if desired. For the taco salad sandwich: Prepare the ground beef as in the breakfast taco recipe above. Instead of frying an egg, you can cook it sunny side up or over easy depending on your preference. Crumble the taco shell into small pieces and mix them with the cooked ground beef. Assemble the sandwich by layering the crumbled taco shell mixture and ground beef in two slices of bread. Top with the fried egg and any additional toppings you like such as shredded lettuce, diced tomatoes, or sliced olives. Serve with a side of salsa or ranch dressing if desired. Both options are hardy and can be easily customized based on your taste preferences. Enjoy!&amp;quot;" src="/blog/fly-io-has-gpus-now/assets/sandwich-recipes.webp" /&gt;&lt;/p&gt;

&lt;p&gt;In the previous snippet, we deployed our app to ord (&lt;code&gt;primary_region = &amp;quot;ord&amp;quot;&lt;/code&gt;). The good news is that our model returns a result really quickly and users in Chicago get instant sandwich recipes. It&amp;rsquo;s a good experience for users near your datacentre, and you can do this on any half decent cloud provider.&lt;/p&gt;

&lt;p&gt;But surely people outside of Chicago need sandwiches too. Amsterdam has sandwich fiends as well. And sometimes it takes too long to have their requests leap across the pond. The speed of light is only so fast after all. Don&amp;rsquo;t worry, we&amp;rsquo;ve got your back. Fly.io has GPUs in datacentres all over the world. Even more, we&amp;rsquo;ll let you run &lt;em&gt;the same program&lt;/em&gt; with the same public IP address and the same TLS certificates in any regions with GPU support.&lt;/p&gt;

&lt;p&gt;Don&amp;rsquo;t believe us? See how you can scale your app up in Amsterdam with one command:&lt;/p&gt;
&lt;div class="highlight-wrapper group relative "&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-wrap-target="#code-ltu09wf5"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-copy-target="sibling"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight '&gt;&lt;code id="code-ltu09wf5"&gt;fly scale count 2 --region ams
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;It&amp;rsquo;s that easy.&lt;/p&gt;
&lt;h2 id='actually-on-demand' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#actually-on-demand' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Actually On-Demand&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;GPUs are powerful parallel processing packages, but they&amp;rsquo;re not cheap! Once we have enough people wanting to turn their fridge contents into tasty sandwiches, keeping a GPU or two running makes sense. But we&amp;rsquo;re just a small app still growing our user base while also funding the latest large sandwich model research. We want to only pay for GPUs when a user makes a request.&lt;/p&gt;

&lt;p&gt;Let&amp;rsquo;s open up that &lt;code&gt;fly.toml&lt;/code&gt; again, and add a section called &lt;code&gt;services&lt;/code&gt;, and we&amp;rsquo;ll include instructions on how we want our app to scale up and down:&lt;/p&gt;
&lt;div class="highlight-wrapper group relative toml"&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-wrap-target="#code-yf7bfopz"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-copy-target="sibling"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight '&gt;&lt;code id="code-yf7bfopz"&gt;&lt;span class="nn"&gt;[[services]]&lt;/span&gt;
  &lt;span class="py"&gt;internal_port&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;8080&lt;/span&gt;
  &lt;span class="py"&gt;protocol&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"tcp"&lt;/span&gt;
  &lt;span class="py"&gt;auto_stop_machines&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
  &lt;span class="py"&gt;auto_start_machines&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
  &lt;span class="py"&gt;min_machines_running&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Now when no one needs sandwich recipes, you don&amp;rsquo;t pay for GPU time.&lt;/p&gt;
&lt;h2 id='the-deets' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#the-deets' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;The Deets&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;We have GPUs ready to use in several US and EU regions and Sydney. You can deploy your sandwich, music generation, or AI illustration apps to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href='https://www.nvidia.com/en-us/data-center/a100/' title=''&gt;Ampere A100s&lt;/a&gt; with 40gb of RAM for $2.50/hr
&lt;/li&gt;&lt;li&gt;&lt;a href='https://www.nvidia.com/en-us/data-center/a100/' title=''&gt;Ampere A100s&lt;/a&gt; with 80gb of RAM for $3.50/hr
&lt;/li&gt;&lt;li&gt;&lt;a href='https://resources.nvidia.com/en-us-l40s/l40s-datasheet-28413' title=''&gt;Lovelace L40s&lt;/a&gt; are coming soon (update: now here!) for $2.50/hr
&lt;/li&gt;&lt;/ul&gt;

&lt;p&gt;By default, anything you deploy to GPUs will use eight heckin&amp;rsquo; &lt;a href='https://www.amd.com/en/processors/epyc-server-cpu-family' title=''&gt;AMD EPYC&lt;/a&gt; CPU cores, and you can attach volumes up to 500 gigabytes. We&amp;rsquo;ll even give you discounts for reserved instances and dedicated hosts if you ask nicely.&lt;/p&gt;

&lt;p&gt;We hope you have fun with these new cards and we&amp;rsquo;d love to see what you can do with them! Reach out to us on X (formerly Twitter) or &lt;a href='https://community.fly.io/' title=''&gt;the community forum&lt;/a&gt; and share what you&amp;rsquo;ve been up to. We&amp;rsquo;d love to see what we can make easier!&lt;/p&gt;
</content>
  </entry>
  <entry>
    <title>What are these "GPUs" really?</title>
    <link rel="alternate" href="https://fly.io/blog/what-are-these-gpus-really/"/>
    <id>https://fly.io/blog/what-are-these-gpus-really/</id>
    <published>2023-12-11T00:00:00+00:00</published>
    <updated>2025-05-20T16:26:25+00:00</updated>
    <media:thumbnail url="https://fly.io/blog/what-are-these-gpus-really/assets/gpu-songstress-thumb.webp"/>
    <content type="html">&lt;div class="lead"&gt;&lt;p&gt;Fly.io runs containerized apps with virtual machine isolation on our own hardware around the world, so you can safely run your code close to where your users are. We’re in the process of rolling out GPU support, and that’s what this post is about, but you don’t have to wait for that to try us out: &lt;a href="https://fly.io/docs/speedrun/" title=""&gt;your app can be up and running on us in minutes&lt;/a&gt;.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;GPU hardware will let our users run all sorts of fun Artificial Intelligence and Machine Learning (AI/ML) workloads near their users. But, what are these &amp;ldquo;GPUs&amp;rdquo; really? What can they do? What &lt;em&gt;can&amp;rsquo;t&lt;/em&gt; they do?&lt;/p&gt;

&lt;p&gt;Listen here for my tale of woe as I spell out exactly what these cards are, are not, and what you can do with them. By the end of this magical journey, you should understand the true irony of them being called &amp;ldquo;Graphics Processing Units&amp;rdquo; and why every marketing term is always bad forever.&lt;/p&gt;
&lt;h2 id='how-does-computer-formed' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#how-does-computer-formed' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;How does computer formed?&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;In the early days of computing, your computer generally had a few basic components:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The CPU
&lt;/li&gt;&lt;li&gt;Input device and assorted peripherals (keyboard, etc)
&lt;/li&gt;&lt;li&gt;Output device (monitor, printer, etc)
&lt;/li&gt;&lt;li&gt;Memory
&lt;/li&gt;&lt;li&gt;Glue logic chips
&lt;/li&gt;&lt;li&gt;Video rendering hardware
&lt;/li&gt;&lt;/ul&gt;

&lt;p&gt;Taking the Commodore 64 as an example, it had a CPU, a chip to handle video output, a chip to handle audio output, and a chip to glue everything together. The CPU would read instructions from the RAM and then execute them to do things like draw to the screen, solve sudoku puzzles, play sounds, and so on.&lt;/p&gt;

&lt;p&gt;However, even though the CPU by itself was fast by the standards of the time, it could only do a million clock cycles per second or so. Imagine a very small shouting crystal vibrating millions of times per second triggering the CPU to do one part of a task and you&amp;rsquo;ll get the idea. This is fast, but not fast enough when executing instructions can take longer than a single clock cycle and when your video output device needs to be updated 60 times per second.&lt;/p&gt;

&lt;p&gt;The main way they optimized this was by shunting a lot of the video output tasks to a bespoke device called the VIC-II (Video Interface Chip, version 2). This allowed the Commodore 64 to send a bunch of instructions to the VIC-II and then let it do its thing while the CPU was off doing other things. This is called &amp;ldquo;offloading&amp;rdquo;.&lt;/p&gt;

&lt;p&gt;&lt;img src="/blog/what-are-these-gpus-really/assets/./deus-ex-machina-cover.webp" /&gt;&lt;/p&gt;

&lt;p&gt;As technology advanced, the desire to do bigger and better things with both contemporary and future hardware increased. This came to a head when this little studio nobody had ever heard of called id Software released one of the most popular games of all time: DOOM.&lt;/p&gt;

&lt;p&gt;Now, even though DOOM was a huge advancement in gaming technology, it was still incredibly limited by the hardware of the time. It was actually a 2D game that used a lot of tricks to make it look (and feel) like it was 3D. It was also limited to a resolution of 320x200 and a hard cap of 35 frames per second. This was fine for the time (most movies were only at 24 frames per second), but it was clear that there was a lot of room for improvement.&lt;/p&gt;

&lt;p&gt;One of the main things that DOOM did was to use a pair of techniques to draw the world at near real-time. It used a combination of &amp;ldquo;raycasting&amp;rdquo; and binary-space partitioning to draw the world. This basically means that they drew a bunch of imaginary lines to where points in the map would be to figure out what color everything would be and then eliminated the parts of the map that were behind walls and other objects. This is a very simplified explanation, and if you want to know more, &lt;a href='https://fabiensanglard.net/doomIphone/doomClassicRenderer.php' title=''&gt;Fabien Sanglard explains the rendering&lt;/a&gt; of DOOM in more detail.&lt;/p&gt;
&lt;h2 id='the-dream-of-3d' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#the-dream-of-3d' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;The dream of 3D&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;However, a lot of this was logic that ran very slowly on the CPU, and while the CPU was doing the display logic, it couldn&amp;rsquo;t do anything else, such as enemy AI or playing sounds. Hence the idea of a &amp;ldquo;3D accelerator card&amp;rdquo;. The idea: offload the 3D rendering logic to a separate device that could do it much faster than the CPU could, and free the CPU to do other things like AI, sound, and so on.&lt;/p&gt;

&lt;p&gt;This was the dream, but it was a long way off. Then Quake happened.&lt;/p&gt;
&lt;div class="right-sidenote"&gt;&lt;p&gt;Really, Half-Life is based on Quake so much that the pattern for &lt;a href="https://www.pcgamer.com/half-life-alyxs-lights-flicker-just-like-they-did-in-quake-almost-25-years-later/" title=""&gt;blinking lights&lt;/a&gt; has carried forward 25 years later to Half-Life: Alyx in VR. If it ain’t broke, don’t fix it.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;Unlike Doom, Quake was fully 3D on unmodified consumer hardware. Players could look up and down (something previously thought impossible without accelerator hardware!) and designers could make levels with that in mind. Quake also allowed much more complex geometry and textures. It was a huge leap forward in 3D gaming and it was only possible because of the massive leap in CPU power at the time. The Pentium family of processors was such a huge leap that it allowed them to bust through and do it in &amp;ldquo;real time&amp;rdquo;. Quake has since set the standard for multiplayer deathmatch games, and its source code has lineage to Call of Duty, Half-Life, Half-Life 2, DotA 2, Titanfall, and Apex Legends.&lt;/p&gt;

&lt;p&gt;However, the thing that really made 3D accelerator cards leap into the public spotlight was another little-known studio called Crystal Dynamics and their 1996 release of Tomb Raider. It was built from the ground up to require the use of 3D accelerator cards. The cards flew off the shelves.&lt;/p&gt;

&lt;p&gt;&amp;ldquo;3D accelerator cards&amp;rdquo; would later become known as &amp;ldquo;Graphics Processing Units&amp;rdquo; or GPUs because of how synonymous they became with 3D gaming, engineering tasks such as Computer-Aided Drafting (CAD), and even the entire OS environment with compositors like &lt;a href='https://en.wikipedia.org/wiki/Desktop_Window_Manager' title=''&gt;DWM&lt;/a&gt; on Windows Vista, &lt;a href='https://en.wikipedia.org/wiki/Compiz' title=''&gt;Compiz&lt;/a&gt; on GNU+Linux, and &lt;a href='https://en.wikipedia.org/wiki/Quartz_(graphics_layer)' title=''&gt;Quartz&lt;/a&gt; on macOS. Things became so much easier for everyone when 2D and 3D graphics were integrated into the same device so you didn&amp;rsquo;t need to chain your output through your 3D accelerator card!&lt;/p&gt;
&lt;h2 id='the-gpu-as-we-know-it' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#the-gpu-as-we-know-it' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;The GPU as we know it&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;When GPUs first came out, they were very simple devices. They had a few basic components:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A framebuffer to store the current state of the screen
&lt;/li&gt;&lt;li&gt;A command processor to take instructions from the game and translate them into something the hardware can understand
&lt;/li&gt;&lt;li&gt;Memory to store temporary data
&lt;/li&gt;&lt;li&gt;Shader processing hardware to allow designers to change how light and textures were rendered
&lt;/li&gt;&lt;li&gt;A display output that was chained through an existing VGA card so that the user could see what was going on in real time (yes, this is something we actually did)
&lt;/li&gt;&lt;/ul&gt;

&lt;p&gt;This basic architecture has remained the same for the past 20 years or so. The main differences are that as technology advanced, the capabilities of those cards increased. They got faster, more parallel, more capable, had more memory, were made cheaper, and so on. This gradually allowed for more and more complex games like Half-Life 2, Crysis, The Legend of Zelda: Breath of the Wild, Baudur&amp;rsquo;s Gate 3, and so on.&lt;/p&gt;

&lt;p&gt;Over time, as more and more hardware was added, GPUs became computers in their own rights (sometimes even bigger than the rest of the computer thanks for the need to cool things more aggressively). This new hardware includes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Video encoding hardware via NVENC and AMD VCE so that content creators can stream and record their gameplay in higher quality without having to impact the performance of the game
&lt;/li&gt;&lt;li&gt;&lt;aside class="left-sidenote"&gt;Seriously, once you experience high framerate HDR raytraced Tetris you can&amp;rsquo;t really go back to the old way.&lt;/aside&gt;  Raytracing accelerator cores via RTX so that light can be rendered more realistically
&lt;/li&gt;&lt;li&gt;AI/ML cores to allow for dynamic upscaling to eke out more performance from the card
&lt;/li&gt;&lt;li&gt;Display output hardware to allow for multiple monitors to be connected to the card
&lt;/li&gt;&lt;li&gt;Faster and faster memory buses and interfaces to the rest of the system to allow for more data to be processed faster
&lt;/li&gt;&lt;li&gt;Direct streaming from the drive to GPU memory to allow for faster loading times
&lt;/li&gt;&lt;/ul&gt;

&lt;p&gt;But, at the same time, that AI/ML hardware started to get noticed by more and more people. It was discovered that the shader cores and then the CUDA cores could be used to do AI/ML workloads at ludicrous speeds. This enabled research and development of models like GPT-2, Stable Diffusion, DLSS, and so on. This has led to a Cambrian Explosion of AI/ML research and development that is continuing to this day.&lt;/p&gt;
&lt;h2 id='the-quot-gpus-quot-that-fly-io-is-using' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#the-quot-gpus-quot-that-fly-io-is-using' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;The &amp;ldquo;GPUs&amp;rdquo; that Fly.io is using&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;I&amp;rsquo;ve mostly been describing consumer GPUs and their capabilities up to this point because that&amp;rsquo;s what we all have the biggest understanding of. There is a huge difference between the &amp;ldquo;GPUs&amp;rdquo; that you can get for server tasks and normal consumer tasks from a place like Newegg or Best Buy. The main difference is that enterprise-grade Graphics Processing Units do not have any of the hardware needed to process graphics.&lt;/p&gt;
&lt;div class="right-sidenote"&gt;&lt;p&gt;Author’s note: This will not be the case in the future. Fly.io is going to add &lt;a href="https://www.nvidia.com/en-us/data-center/l40s/" title=""&gt;Lovelace L40S GPUs&lt;/a&gt; that do have 3D rendering, video encoding, shader cores, and so on. But, that’s not what we’re talking about today.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;Yes. Really. They don&amp;rsquo;t have rasterization hardware, shader cores, display outputs, or anything useful for trying to run games on them. They are AI/ML accelerator cards more than anything. It&amp;rsquo;s kinda beautifully ironic that they&amp;rsquo;re called Graphics Processing Units when they have no ability to process graphics.&lt;/p&gt;
&lt;h2 id='what-can-you-do-with-them' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#what-can-you-do-with-them' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;What can you do with them?&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;These GPUs are really good at massively parallel tasks. This naturally translates to being very good at AI/ML tasks such as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Summarization (what is this article about in a few sentences?)
&lt;/li&gt;&lt;li&gt;Translation (what does this article say in Spanish?)
&lt;/li&gt;&lt;li&gt;Speech recognition (what is a voice clip saying?)
&lt;/li&gt;&lt;li&gt;Speech synthesis (what does this text sound like?)
&lt;/li&gt;&lt;li&gt;Text generation (what would a cat say if it could talk?)
&lt;/li&gt;&lt;li&gt;Basic rote question and answering (what is the safe cooking temperature for chicken breasts in celsius?)
&lt;/li&gt;&lt;li&gt;Text classification (is this article about cats or dogs?)
&lt;/li&gt;&lt;li&gt;Sentiment analysis (is this article positive or negative, what could that mean about the companies involved?)
&lt;/li&gt;&lt;li&gt;Image classification (is this a cat or a dog?)
&lt;/li&gt;&lt;li&gt;Object detection (where are the cats and dogs in this image?)
&lt;/li&gt;&lt;/ul&gt;

&lt;p&gt;Or any combination/chain of these tasks. A lot of this is pretty abstract building blocks that can be combined in a lot of different ways. This is why AI/ML stuff is so exciting right now. We&amp;rsquo;re in the early days of understanding what these things are, what they can do, and how to use them properly.&lt;/p&gt;

&lt;p&gt;Imagine being able to load articles about the topic you are researching into your queries to find where someone said something roughly similar to what you&amp;rsquo;re looking for. Queries like &amp;ldquo;that one recipe with eggs that you fold over with ham in it&amp;rdquo;. That&amp;rsquo;s the kind of thing that&amp;rsquo;s possible with AI/ML (and tools like vector databases) but difficult to impossible with traditional search engines.&lt;/p&gt;
&lt;h2 id='how-to-use-ai-for-reals' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#how-to-use-ai-for-reals' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;How to use AI for reals&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;Fortunately and unfortunately, we&amp;rsquo;re in the Cambrian Explosion days of this industry. Key advances happen constantly. Exact models and tooling changes almost as often. This is both a very good thing and a very bad thing.&lt;/p&gt;

&lt;p&gt;If you want to get started today, here&amp;rsquo;s a few models that you can play with right now:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href='https://ai.meta.com/llama/' title=''&gt;Llama 2&lt;/a&gt; - A generic foundation model with instruction and chat tuned variants. It&amp;rsquo;s a good starting point for a lot of research and nearly everything else uses the same formats that Llama 2 does.
&lt;/li&gt;&lt;li&gt;&lt;a href='https://openai.com/research/whisper' title=''&gt;Whisper&lt;/a&gt; - A speech to text model that transcribes audio files into text better than most professional dictation software. I, the author of this article, wrote most of this article using Whisper.
&lt;/li&gt;&lt;li&gt;&lt;a href='https://huggingface.co/NurtureAI/OpenHermes-2.5-Mistral-7B-16k' title=''&gt;OpenHermes-2.5 Mistral 7B 16k&lt;/a&gt; - An instruction-tuned model that can operate on up to 16 thousand tokens (about 40 printed pages of text, 12,000 words) at once. It&amp;rsquo;s a good starting point for summarization and other tasks that require a lot of context. I personally use it for my personal AI chatbot named &lt;a href='https://xeiaso.net/characters/#Mimi' title=''&gt;Mimi&lt;/a&gt;.
&lt;/li&gt;&lt;li&gt;&lt;aside class="right-sidenote"&gt;Seriously Annie, you&amp;rsquo;re great!&lt;/aside&gt; &lt;a href='https://stability.ai/stable-diffusion' title=''&gt;Stable Diffusion XL&lt;/a&gt; - A text-to-image model that lets you create high quality images from simple text descriptions. It&amp;rsquo;s a good starting point for tasks that require image generation, such as when you want to add images to your blog posts but don&amp;rsquo;t have an artist like Annie to draw you what you want.
&lt;/li&gt;&lt;/ul&gt;

&lt;p&gt;For a practical example, imagine that you have a set of &lt;a href='https://xeiaso.net/talks/' title=''&gt;conference talks that you&amp;rsquo;ve given over the years&lt;/a&gt;. You want to take those talk videos, extract the audio, and transform them into written text because some people learn better from text than video. The overall workflow would look something like this:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Use ffmpeg to extract the audio track from the video files
&lt;/li&gt;&lt;li&gt;Use Whisper to &lt;a href='https://fly.io/blog/transcribing-on-fly-gpu-machines/' title=''&gt;convert the audio files into subtitle files&lt;/a&gt;
&lt;/li&gt;&lt;li&gt;Break the subtitle file into sequences based on significant pauses between topics (humans do this subconsciously, take advantage of it and you can make things seem heckin&amp;rsquo; magic)
&lt;/li&gt;&lt;li&gt;Use a large language model to summarize the segments and create a title for each segment
&lt;/li&gt;&lt;li&gt;Paste the rest of the text into a markdown document between the segment titles
&lt;/li&gt;&lt;li&gt;Manually review the documents and make any necessary changes with technical terms that the model didn&amp;rsquo;t know about or things the model got wrong because English is a minefield of homophones that even trained experts have trouble with (ask me how I know)
&lt;/li&gt;&lt;li&gt;Publish the documents on your blog
&lt;/li&gt;&lt;/ul&gt;

&lt;p&gt;Then bam, you don&amp;rsquo;t just have a portfolio piece, you have the recipe for winning downtime from visitors of orange websites clicking on your link so much. You can also use this to create transcripts for your videos so that people who can&amp;rsquo;t hear can still enjoy your content.&lt;/p&gt;

&lt;p&gt;The true advantage of these is not using them as individual parts on themselves, but as a cohesive whole in a chain. This is where the real power of AI/ML comes from. It&amp;rsquo;s not the individual models, but the ability to chain them together to do something useful. This is where the true opportunities for innovation lie.&lt;/p&gt;
&lt;h2 id='conclusion' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#conclusion' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Conclusion&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;So that&amp;rsquo;s what these &amp;ldquo;GPUs&amp;rdquo; are really: they&amp;rsquo;re AI/ML accelerator cards. The A100 cards incapable of processing graphics or encoding video, but they&amp;rsquo;re really, really good at AI/ML workloads. They allow you to do way more tasks per watt than any CPU ever could.&lt;/p&gt;

&lt;p&gt;I hope you enjoyed this tale of woe as I spilled out the horrible truths about marketing being awful forever and gave you ideas for how to &lt;em&gt;actually use&lt;/em&gt; these graphics-free Graphics Processing Units to do useful things. But sadly, not for processing graphics unless you wait for the &lt;a href='https://www.nvidia.com/en-us/data-center/l40s/' title=''&gt;Lovelace L40S&lt;/a&gt; cards early in 2024.&lt;/p&gt;

&lt;p&gt;Sign up for Fly.io today and try our GPUs! I can&amp;rsquo;t wait to see what you build with them.&lt;/p&gt;
</content>
  </entry>
  <entry>
    <title>Scaling Large Language Models to zero with Ollama</title>
    <link rel="alternate" href="https://fly.io/blog/scaling-llm-ollama/"/>
    <id>https://fly.io/blog/scaling-llm-ollama/</id>
    <published>2023-12-06T12:00:00+00:00</published>
    <updated>2025-05-20T16:26:25+00:00</updated>
    <media:thumbnail url="https://fly.io/blog/scaling-llm-ollama/assets/thumb.webp"/>
    <content type="html">&lt;div class="lead"&gt;&lt;p&gt;We’re Fly.io. We have powerful servers worldwide to run your code close to your users. Including GPUs so you can self host your own AI.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;Open-source self-hosted AI tools have advanced a lot in the past 6 months. They allow you to create new methods of expression (with QR code generation and Stable Diffusion), easy access to summarization powers that would have made Google blush a decade ago (even with untuned foundation models such as LLaMa 2 and Yi), to conversational assistants that enable people to do more with their time, and to perform speech recognition in &lt;em&gt;real time&lt;/em&gt; on moderate hardware (with Whisper et al). With all these capabilities comes the need for more and more raw computational muscle to be able to do inference on bigger and bigger models, and eventually do things that we can&amp;rsquo;t even imagine right now. Fly.io lets you put your compute where your users are so that you can do machine learning inference tasks on the edge with the power of enterprise-grade GPUs such as the Nvidia A100. You can also scale your GPU nodes to zero running Machines, so you only pay for what you actually need, when you need it.&lt;/p&gt;
&lt;div class="right-sidenote"&gt;&lt;p&gt;It’s worth mentioning that “scaling to zero” doesn’t mean what you may think it means. When you “scale to zero” in Fly.io, you actually stop the running Machine. This means the Machine is still laying around on the same computer box that it runs on, but it’s just put to sleep. If there is a capacity issue then your app may be unable to wake back up. We are working on a solution to this, but for now you should be aware that scaling to zero is not the same as spinning down your Machine and spinning it back up again on a new computer box when you need it.&lt;/p&gt;
&lt;/div&gt;&lt;div class="callout"&gt;&lt;p&gt;This is a continuation of the last post in this series about &lt;a href="https://fly.io/blog/transcribing-on-fly-gpu-machines/" title=""&gt;how to use GPUs on Fly.io&lt;/a&gt;.&lt;/p&gt;
&lt;/div&gt;&lt;h2 id='why-scale-to-zero' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#why-scale-to-zero' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Why scale to zero?&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;Running GPU nodes on top of Fly is expensive. Sure, GPUs enable you to do things a lot faster than CPUs ever could on their own, but you mostly will have things run idle between uses. This is where scaling to zero comes in. With scaling to zero, you can have your GPU nodes shut down when you&amp;rsquo;re not using them. When your Machine stops, you aren&amp;rsquo;t paying for the GPU any more. This is good for the environment and your wallet.&lt;/p&gt;

&lt;p&gt;In this post, we&amp;rsquo;re going to be using &lt;a href='https://ollama.ai' title=''&gt;Ollama&lt;/a&gt; to generate text. Ollama is a fancy wrapper around &lt;a href='https://github.com/ggerganov/llama.cpp' title=''&gt;llama.cpp&lt;/a&gt; that allows you to run large language models on your own hardware with your choice of model. It also supports GPU acceleration, meaning that you can use Fly.io&amp;rsquo;s huge GPUs to run your models faster than your RTX 3060 at home ever would on its own.&lt;/p&gt;

&lt;p&gt;One of the main downsides of using Ollama in a cloud environment is that it doesn&amp;rsquo;t have authentication by default. Thanks to the power of about 70 lines of Go, we are able to shim that in after the fact. This will protect your server from random people on the internet using your GPU time (and spending your money) to generate text and integrate it into your own applications.&lt;/p&gt;

&lt;p&gt;Create a new folder called &lt;code&gt;ollama-scale-to-0&lt;/code&gt;:&lt;/p&gt;
&lt;div class="highlight-wrapper group relative bash"&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-wrap-target="#code-kehwb9j1"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-copy-target="sibling"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight '&gt;&lt;code id="code-kehwb9j1"&gt;&lt;span class="nb"&gt;mkdir &lt;/span&gt;ollama-scale-to-0
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;&lt;h2 id='fly-app-setup' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#fly-app-setup' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Fly app setup&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;First, we need to create a new Fly app:&lt;/p&gt;
&lt;div class="highlight-wrapper group relative bash"&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-wrap-target="#code-78cxwfbn"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-copy-target="sibling"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight '&gt;&lt;code id="code-78cxwfbn"&gt;fly launch &lt;span class="nt"&gt;--no-deploy&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;After selecting a name and an organization to run it in, this command will create the app and write out a &lt;code&gt;fly.toml&lt;/code&gt; file for you:&lt;/p&gt;
&lt;div class="highlight-wrapper group relative toml"&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-wrap-target="#code-6efaeoqi"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-copy-target="sibling"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight '&gt;&lt;code id="code-6efaeoqi"&gt;&lt;span class="c"&gt;# fly.toml app configuration file generated for sparkling-violet-709 on 2023-11-14T12:13:53-05:00&lt;/span&gt;
&lt;span class="c"&gt;#&lt;/span&gt;
&lt;span class="c"&gt;# See https://fly.io/docs/reference/configuration/ for information about how to use this file.&lt;/span&gt;
&lt;span class="c"&gt;#&lt;/span&gt;

&lt;span class="py"&gt;app&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"sparkling-violet-709"&lt;/span&gt;
&lt;span class="py"&gt;primary_region&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"ord"&lt;/span&gt;

&lt;span class="nn"&gt;[http_service]&lt;/span&gt;
  &lt;span class="py"&gt;internal_port&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;11434&lt;/span&gt; &lt;span class="c"&gt;# change me to 11434!&lt;/span&gt;
  &lt;span class="py"&gt;force_https&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt; &lt;span class="c"&gt;# change mo to false!&lt;/span&gt;
  &lt;span class="py"&gt;auto_stop_machines&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
  &lt;span class="py"&gt;auto_start_machines&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
  &lt;span class="py"&gt;min_machines_running&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
  &lt;span class="py"&gt;processes&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nn"&gt;["app"]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;This is the configuration file that Fly.io uses to know how to run your application. We&amp;rsquo;re going to be modifying the &lt;code&gt;fly.toml&lt;/code&gt; file to add some additional configuration to it, such as enabling GPU support:&lt;/p&gt;
&lt;div class="highlight-wrapper group relative toml"&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-wrap-target="#code-ef9ddpog"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-copy-target="sibling"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight '&gt;&lt;code id="code-ef9ddpog"&gt;&lt;span class="py"&gt;app&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"sparkling-violet-709"&lt;/span&gt;
&lt;span class="py"&gt;primary_region&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"ord"&lt;/span&gt;
&lt;span class="py"&gt;vm.size&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"a100-40gb"&lt;/span&gt; &lt;span class="c"&gt;# the GPU size, see https://fly.io/docs/gpus/gpu-quickstart/ for more info&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;We don&amp;rsquo;t want to expose the GPU to the internet, so we&amp;rsquo;re going to create a &lt;a href='https://fly.io/docs/reference/private-networking/#flycast-private-load-balancing' title=''&gt;flycast&lt;/a&gt; address to expose it to other services on your private network. To create a flycast address, run this command:&lt;/p&gt;
&lt;div class="highlight-wrapper group relative bash"&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-wrap-target="#code-q2lw27z"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-copy-target="sibling"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight '&gt;&lt;code id="code-q2lw27z"&gt;fly ips allocate-v6 &lt;span class="nt"&gt;--private&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;The &lt;code&gt;fly ips allocate-v6&lt;/code&gt; command makes a unique address in your private network that you can use to access Ollama from your other services. Make sure to add the &lt;code&gt;--private&lt;/code&gt; flag, otherwise you&amp;rsquo;ll get a globally unique IP address instead of a private one.&lt;/p&gt;

&lt;p&gt;Next, you may need to remove all of the other public IP addresses for the app to lock it away from the public. Get a list of them with &lt;code&gt;fly ips list&lt;/code&gt; and then remove them with &lt;code&gt;fly ips release &amp;lt;ip&amp;gt;&lt;/code&gt;. Delete everything but your flycast IP.&lt;/p&gt;

&lt;p&gt;Next, we need to declare the volume for Ollama to store models in. If you don&amp;rsquo;t do this, then when you scale to zero, your existing models will be destroyed and you will have to re-download them every time the server starts. This is not ideal, so we&amp;rsquo;re going to create a persistent volume to store the models in. Add the following to your &lt;code&gt;fly.toml&lt;/code&gt;:&lt;/p&gt;
&lt;div class="highlight-wrapper group relative toml"&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-wrap-target="#code-xcc0fiwo"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-copy-target="sibling"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight '&gt;&lt;code id="code-xcc0fiwo"&gt;&lt;span class="nn"&gt;[build]&lt;/span&gt;
  &lt;span class="py"&gt;image&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"ollama/ollama"&lt;/span&gt;

&lt;span class="nn"&gt;[mounts]&lt;/span&gt;
  &lt;span class="py"&gt;source&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"models"&lt;/span&gt;
  &lt;span class="py"&gt;destination&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"/root/.ollama"&lt;/span&gt;
  &lt;span class="py"&gt;initial_size&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"100gb"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;This will create a 100GB volume in the &lt;a href='https://en.wikipedia.org/wiki/O%27Hare_International_Airport' title=''&gt;&lt;code&gt;ord&lt;/code&gt;&lt;/a&gt; region when the app is deployed. This will be used to store the models that you download from the &lt;a href='https://ollama.ai/library/' title=''&gt;Ollama library&lt;/a&gt;. You can make this smaller if you want, but 100GB is a good place to start from.&lt;/p&gt;

&lt;p&gt;Now that everything is set up, we can deploy this to Fly.io:&lt;/p&gt;
&lt;div class="highlight-wrapper group relative bash"&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-wrap-target="#code-f1l19v0a"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-copy-target="sibling"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight '&gt;&lt;code id="code-f1l19v0a"&gt;fly deploy
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;This will take a minute to pull the Ollama image, push it to a Machine, provision your volume, and kick everything else off with hypervisors, GPUs and whatnot. Once it&amp;rsquo;s done, you should see something like this:&lt;/p&gt;
&lt;div class="highlight-wrapper group relative bash"&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-wrap-target="#code-wji5shy9"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-copy-target="sibling"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight '&gt;&lt;code id="code-wji5shy9"&gt; ✔ Machine 17816141f55489 &lt;span class="o"&gt;[&lt;/span&gt;app] update succeeded
&lt;span class="nt"&gt;-------&lt;/span&gt;

Visit your newly deployed app at https://sparkling-violet-709.fly.dev/
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;This is a lie because we just deleted the public IP addresses for this app. You can&amp;rsquo;t access it from the internet, and by extension, random people can&amp;rsquo;t access it either. For now, you can run an interactive session with Ollama using an ephemeral Fly Machine:&lt;/p&gt;
&lt;div class="highlight-wrapper group relative bash"&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-wrap-target="#code-abxdbso0"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-copy-target="sibling"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight '&gt;&lt;code id="code-abxdbso0"&gt;fly m run &lt;span class="nt"&gt;-e&lt;/span&gt; &lt;span class="nv"&gt;OLLAMA_HOST&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;http://sparkling-violet-709.flycast &lt;span class="nt"&gt;--shell&lt;/span&gt; ollama/ollama
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;And then you can pull an image from the &lt;a href='https://ollama.ai/library/' title=''&gt;ollama library&lt;/a&gt; and generate some text:&lt;/p&gt;
&lt;div class="highlight-wrapper group relative bash"&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-wrap-target="#code-7zldfj8t"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-copy-target="sibling"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight '&gt;&lt;code id="code-7zldfj8t"&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;ollama run openchat:7b-v3.5-fp16
&lt;span class="o"&gt;&amp;gt;&amp;gt;&amp;gt;&lt;/span&gt; How &lt;span class="k"&gt;do &lt;/span&gt;I bake chocolate chip cookies?
 To bake chocolate chip cookies, follow these steps:

1. Preheat the oven to 375°F &lt;span class="o"&gt;(&lt;/span&gt;190°C&lt;span class="o"&gt;)&lt;/span&gt; and line a baking sheet with parchment paper or silicone baking mat.

2. In a large bowl, mix together 1 cup of unsalted butter &lt;span class="o"&gt;(&lt;/span&gt;softened&lt;span class="o"&gt;)&lt;/span&gt;, 3/4 cup granulated sugar, and 3/4
cup packed brown sugar &lt;span class="k"&gt;until &lt;/span&gt;light and fluffy.

3. Add 2 large eggs, one at a &lt;span class="nb"&gt;time&lt;/span&gt;, to the butter mixture, beating well after each addition. Stir &lt;span class="k"&gt;in &lt;/span&gt;1
teaspoon of pure vanilla extract.

4. In a separate bowl, whisk together 2 cups all-purpose flour, 1/2 teaspoon baking soda, and 1/2 teaspoon
salt. Gradually add the dry ingredients to the wet ingredients, stirring &lt;span class="k"&gt;until &lt;/span&gt;just combined.

5. Fold &lt;span class="k"&gt;in &lt;/span&gt;2 cups of chocolate chips &lt;span class="o"&gt;(&lt;/span&gt;or chunks&lt;span class="o"&gt;)&lt;/span&gt; into the dough.

6. Drop rounded tablespoons of dough onto the prepared baking sheet, spacing them about 2 inches apart.

7. Bake &lt;span class="k"&gt;for &lt;/span&gt;10-12 minutes, or &lt;span class="k"&gt;until &lt;/span&gt;the edges are golden brown. The centers should still be slightly soft.

8. Allow the cookies to cool on the baking sheet &lt;span class="k"&gt;for &lt;/span&gt;a few minutes before transferring them to a wire rack
to cool completely.

Enjoy your homemade chocolate chip cookies!
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;If you want a persistent wake-on-use connection to your Ollama instance, you can set up a &lt;a href='https://fly.io/docs/reference/private-networking/#discovering-apps-through-dns-on-a-wireguard-connection' title=''&gt;connection to your Fly network using WireGuard&lt;/a&gt;. This will let you use Ollama from your local applications without having to run them on Fly. For example, if you want to figure out the safe cooking temperature for ground beef in Celsius, you can query that in JavaScript with this snippet of code:&lt;/p&gt;
&lt;div class="highlight-wrapper group relative typescript"&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-wrap-target="#code-x8r5bgmm"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-copy-target="sibling"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight '&gt;&lt;code id="code-x8r5bgmm"&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;generateRequest&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;openchat:7b-v3.5-fp16&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;What is the safe cooking temperature for ground beef in celsius?&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;
  &lt;span class="na"&gt;stream&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="c1"&gt;// &amp;lt;- important for Node/Deno clients&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;

&lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;resp&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;fetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;http://sparkling-violet-709.flycast/api/generate&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;method&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;POST&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;body&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;stringify&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;generateRequest&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;resp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;status&lt;/span&gt; &lt;span class="o"&gt;!==&lt;/span&gt; &lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;throw&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nb"&gt;Error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`error fetching response: &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;resp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;status&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;: &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;resp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;()}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nx"&gt;resp&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;resp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;json&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

&lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;resp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="c1"&gt;// Something like "The safe cooking temperature for ground beef is 71 degrees celsius (160 degrees fahrenheit).&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;&lt;h2 id='scaling-to-zero' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#scaling-to-zero' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Scaling to zero&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;The best part about all of this is that when you want to scale down to zero running Machines: do nothing, it will automatically shut down when it&amp;rsquo;s idle. Wait a few minutes and then verify it with &lt;code&gt;fly status&lt;/code&gt;:&lt;/p&gt;
&lt;div class="highlight-wrapper group relative bash"&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-wrap-target="#code-wpipvj9k"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-copy-target="sibling"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight '&gt;&lt;code id="code-wpipvj9k"&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;fly status

...

PROCESS ID              VERSION REGION  STATE   ROLE    CHECKS  LAST UPDATED
app     3d8d7949b22089  9       ord     stopped                 2023-11-14T19:34:24Z
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;The app has been stopped. This means that it&amp;rsquo;s not running and you&amp;rsquo;re not paying for it. When you want it to start up again, just make a request. It will automatically start up and you can use it as normal with the CLI or even just arbitrary calls to &lt;a href='https://github.com/jmorganca/ollama/blob/main/docs/api.md' title=''&gt;the API&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;You can also upload your own models to the Ollama registry by &lt;a href='https://github.com/jmorganca/ollama/blob/main/docs/import.md' title=''&gt;creating your own Modelfile&lt;/a&gt; and pushing it (though you will need to install Ollama locally to publish your own models). At this time, the only way to set a custom system prompt is to use a Modelfile and upload your model to the registry.&lt;/p&gt;
&lt;h2 id='conclusion' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#conclusion' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Conclusion&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;Ollama is a fantastic way to run large language models of your choice and the ability to use Fly.io&amp;rsquo;s powerful GPUs means you can use bigger models with more parameters and a larger context window. This lets you make your assistants more lifelike, your conversations have more context, and your text generation more realistic.&lt;/p&gt;

&lt;p&gt;Oh, by the way, this also lets you use the new &lt;code&gt;json&lt;/code&gt; mode to have your models call functions, similar to how ChatGPT would. To do this, have a system prompt that looks like this:&lt;/p&gt;
&lt;div class="highlight-wrapper group relative "&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-wrap-target="#code-2x5qkx59"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-copy-target="sibling"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight '&gt;&lt;code id="code-2x5qkx59"&gt;You are a helpful research assistant. The following functions are available for you to fetch further data to answer user questions, if relevant:

{
    "function": "search_bing",
    "description": "Search the web for content on Bing. This allows users to search online/the internet/the web for content.",
    "arguments": [
        {
            "name": "query",
            "type": "string",
            "description": "The search query string"
        }
    ]
}

{
    "function": "search_arxiv",
    "description": "Search for research papers on ArXiv. Make use of AND, OR and NOT operators as appropriate to join terms within the query.",
    "arguments": [
        {
            "name": "query",
            "type": "string",
            "description": "The search query string"
        }
    ]
}

To call a function, respond - immediately and only - with a JSON object of the following format:
{
    "function": "function_name",
    "arguments": {
        "argument1": "argument_value",
        "argument2": "argument_value"
    }
}

If no function needs to be called, respond with an empty JSON object: {}
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Then you can use the &lt;a href='https://github.com/jmorganca/ollama/blob/main/docs/api.md#request-json-mode' title=''&gt;JSON format&lt;/a&gt; to receive a JSON response from Ollama (hint: &lt;code&gt;—format=json&lt;/code&gt; in the CLI or &lt;code&gt;format: &amp;quot;json&amp;quot;&lt;/code&gt; in the API). This is a great way to make your assistants more lifelike and more useful. You will need to use something like &lt;a href='https://www.langchain.com/' title=''&gt;Langchain&lt;/a&gt; or manual iterations to properly handle the cases where the user doesn&amp;rsquo;t want to call a function, but that&amp;rsquo;s a topic for another blog post.&lt;/p&gt;

&lt;p&gt;For the best results you may want to use a model with a larger context window such as &lt;a href='https://ollama.ai/library/vicuna:13b-v1.5-16k-fp16' title=''&gt;vicuna:13b-v1.5-16k-fp16&lt;/a&gt; (16k == 16,384 token window) as JSON is very token-expensive. Future advances in the next few weeks (such as the Yi models gaining ludicrous token windows on the line of 200,000 tokens at the cost of ludicrous amounts of VRAM usage) will make this less of an issue. You can also get away with minifying the JSON in the functions and examples a lot, but you may need to experiment to get the best results.&lt;/p&gt;

&lt;p&gt;Happy hacking, y&amp;#39;all.&lt;/p&gt;
</content>
  </entry>
  <entry>
    <title>Rethinking Serverless with FLAME</title>
    <link rel="alternate" href="https://fly.io/blog/rethinking-serverless-with-flame/"/>
    <id>https://fly.io/blog/rethinking-serverless-with-flame/</id>
    <published>2023-12-06T00:00:00+00:00</published>
    <updated>2025-05-20T16:26:25+00:00</updated>
    <media:thumbnail url="https://fly.io/blog/rethinking-serverless-with-flame/assets/flame-thumb.webp"/>
    <content type="html">&lt;blockquote&gt;Imagine if you could auto scale simply by wrapping any existing app code in a function and have that block of code run in a temporary copy of your app.&lt;/blockquote&gt;


&lt;p&gt;The pursuit of elastic, auto-scaling applications has taken us to silly places.&lt;/p&gt;

&lt;p&gt;Serverless/FaaS had a couple things going for it. Elastic Scale™ is hard. It&amp;rsquo;s even harder when you need to manage those pesky servers. It also promised pay-what-you-use costs to avoid idle usage. Good stuff, right?&lt;/p&gt;

&lt;p&gt;Well the charade is over. You offload scaling concerns and the complexities of scaling, just to end up needing &lt;em&gt;more complexity&lt;/em&gt;. Additional queues, storage, and glue code to communicate back to our app is just the starting point. Dev, test, and CI complexity balloons as fast as your costs. Oh, and you often have to rewrite your app in proprietary JavaScript – even if it&amp;rsquo;s already written in JavaScript!&lt;/p&gt;

&lt;p&gt;At the same time, the rest of us have elastically scaled by starting more webservers. Or we&amp;rsquo;ve dumped on complexity with microservices. This doesn&amp;rsquo;t make sense. Piling on more webservers to transcode more videos or serve up more ML tasks isn&amp;rsquo;t what we want. And granular scale shouldn&amp;rsquo;t require slicing our apps into bespoke operational units with their own APIs and deployments to manage.&lt;/p&gt;

&lt;p&gt;Enough is enough. There&amp;rsquo;s a better way to elastically scale applications.&lt;/p&gt;
&lt;h2 id='the-flame-pattern' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#the-flame-pattern' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;The FLAME pattern&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;Here&amp;rsquo;s what we really want:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;We don&amp;rsquo;t want to manage those pesky servers. We already have this for our app deployments via &lt;code&gt;fly deploy&lt;/code&gt;, &lt;code&gt;git push heroku&lt;/code&gt;, &lt;code&gt;kubectl&lt;/code&gt;, etc
&lt;/li&gt;&lt;li&gt;We want on-demand, &lt;em&gt;granular&lt;/em&gt; elastic scale of specific parts of our app code
&lt;/li&gt;&lt;li&gt;We don&amp;rsquo;t want to rewrite our application or write parts of it in proprietary runtimes
&lt;/li&gt;&lt;/ul&gt;

&lt;p&gt;Imagine if we could auto scale simply by wrapping any existing app code in a function and have that block of code run in a temporary copy of the app.&lt;/p&gt;

&lt;p&gt;Enter the FLAME pattern.&lt;/p&gt;
&lt;blockquote&gt;FLAME - Fleeting Lambda Application for Modular Execution&lt;/blockquote&gt;


&lt;p&gt;With FLAME, you treat your &lt;em&gt;entire application&lt;/em&gt; as a lambda, where modular parts can be executed on short-lived infrastructure.&lt;/p&gt;

&lt;p&gt;No rewrites. No bespoke runtimes. No outrageous layers of complexity. Need to insert the results of an expensive operation to the database? PubSub broadcast the result of some expensive work? No problem! It&amp;rsquo;s your whole app so of course you can do it.&lt;/p&gt;

&lt;p&gt;The Elixir &lt;a href='https://github.com/phoenixframework/flame' title=''&gt;flame library&lt;/a&gt; implements the FLAME pattern. It has a backend adapter for Fly.io, but you can use it on any cloud that gives you an API to spin up an instance with your app code running on it. We&amp;rsquo;ll talk more about backends in a bit, as well as implementing FLAME in other languages.&lt;/p&gt;

&lt;p&gt;First, lets watch a realtime thumbnail generation example to see FLAME + Elixir in action:&lt;/p&gt;
&lt;div class="youtube-container" data-exclude-render&gt;
  &lt;div class="youtube-video"&gt;
    &lt;iframe
      width="100%"
      height="100%"
      src="https://www.youtube.com/embed/l1xt_rkWdic"
      frameborder="0"
      allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture"
      allowfullscreen&gt;
    &lt;/iframe&gt;
  &lt;/div&gt;
&lt;/div&gt;


&lt;p&gt;Now let&amp;rsquo;s walk thru something a little more basic. Imagine we have a function to transcode video to thumbnails in our Elixir application after they are uploaded:&lt;/p&gt;
&lt;div class="highlight-wrapper group relative elixir"&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-wrap-target="#code-racxyl56"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-copy-target="sibling"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight '&gt;&lt;code id="code-racxyl56"&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="n"&gt;generate_thumbnails&lt;/span&gt;&lt;span class="p"&gt;(%&lt;/span&gt;&lt;span class="no"&gt;Video&lt;/span&gt;&lt;span class="p"&gt;{}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;vid&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;interval&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt;
  &lt;span class="n"&gt;tmp&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;Path&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="no"&gt;System&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tmp_dir!&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="no"&gt;Ecto&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="no"&gt;UUID&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;generate&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
  &lt;span class="no"&gt;File&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;mkdir!&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tmp&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="n"&gt;args&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"-i"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;vid&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"-vf"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"fps=1/&lt;/span&gt;&lt;span class="si"&gt;#{&lt;/span&gt;&lt;span class="n"&gt;interval&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="si"&gt;#{&lt;/span&gt;&lt;span class="n"&gt;tmp&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;/%02d.png"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
  &lt;span class="no"&gt;System&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cmd&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"ffmpeg"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="n"&gt;urls&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;VidStore&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;put_thumbnails&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;vid&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="no"&gt;Path&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;wildcard&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tmp&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&amp;gt;&lt;/span&gt; &lt;span class="s2"&gt;"/*.png"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
  &lt;span class="no"&gt;Repo&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;insert_all&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="no"&gt;Thumb&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="no"&gt;Enum&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;urls&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="p"&gt;%{&lt;/span&gt;&lt;span class="ss"&gt;vid_id:&lt;/span&gt; &lt;span class="n"&gt;vid&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;url:&lt;/span&gt; &lt;span class="nv"&gt;&amp;amp;1&lt;/span&gt;&lt;span class="p"&gt;}))&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Our &lt;code&gt;generate_thumbnails&lt;/code&gt; function accepts a video struct. We shell out to &lt;code&gt;ffmpeg&lt;/code&gt; to take the video URL and generate thumbnails at a given interval. We then write the temporary thumbnail paths to durable storage. Finally, we insert the generated thumbnail URLs into the database.&lt;/p&gt;

&lt;p&gt;This works great locally, but CPU bound work like video transcoding can quickly bring our entire service to a halt in production. Instead of rewriting large swaths of our app to move this into microservices or some FaaS, we can simply wrap it in a FLAME call:&lt;/p&gt;
&lt;div class="highlight-wrapper group relative elixir"&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-wrap-target="#code-b6z6pma"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-copy-target="sibling"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight '&gt;&lt;code id="code-b6z6pma"&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="n"&gt;generate_thumbnails&lt;/span&gt;&lt;span class="p"&gt;(%&lt;/span&gt;&lt;span class="no"&gt;Video&lt;/span&gt;&lt;span class="p"&gt;{}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;vid&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;interval&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt;
  &lt;span class="no"&gt;FLAME&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;call&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="no"&gt;MyApp&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="no"&gt;FFMpegRunner&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;fn&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;
    &lt;span class="n"&gt;tmp&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;Path&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="no"&gt;System&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tmp_dir!&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="no"&gt;Ecto&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="no"&gt;UUID&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;generate&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
    &lt;span class="no"&gt;File&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;mkdir!&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tmp&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;args&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt;
      &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"-i"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;vid&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"-vf"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"fps=1/&lt;/span&gt;&lt;span class="si"&gt;#{&lt;/span&gt;&lt;span class="n"&gt;interval&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="si"&gt;#{&lt;/span&gt;&lt;span class="n"&gt;tmp&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;/%02d.png"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="no"&gt;System&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cmd&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"ffmpeg"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;urls&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;VidStore&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;put_thumbnails&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;vid&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="no"&gt;Path&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;wildcard&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tmp&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&amp;gt;&lt;/span&gt; &lt;span class="s2"&gt;"/*.png"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="no"&gt;Repo&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;insert_all&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="no"&gt;Thumb&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="no"&gt;Enum&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;urls&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="p"&gt;%{&lt;/span&gt;&lt;span class="ss"&gt;vid_id:&lt;/span&gt; &lt;span class="n"&gt;vid&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;url:&lt;/span&gt; &lt;span class="nv"&gt;&amp;amp;1&lt;/span&gt;&lt;span class="p"&gt;}))&lt;/span&gt;
  &lt;span class="k"&gt;end&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;That&amp;rsquo;s it! &lt;code&gt;FLAME.call&lt;/code&gt; accepts the name of a runner pool, and a function. It then finds or boots a new copy of our entire application and runs the function there. Any variables the function closes over (like our &lt;code&gt;%Video{}&lt;/code&gt; struct and &lt;code&gt;interval&lt;/code&gt;) are passed along automatically.&lt;/p&gt;

&lt;p&gt;When the FLAME runner boots up, it connects back to the parent node, receives the function to run, executes it, and returns the result to the caller. Based on configuration, the booted runner either waits happily for more work before idling down, or extinguishes itself immediately.&lt;/p&gt;

&lt;p&gt;Let&amp;rsquo;s visualize the flow:&lt;/p&gt;

&lt;p&gt;&lt;img alt="visualizing the flow" src="/blog/rethinking-serverless-with-flame/assets/visual.webp?centered" /&gt;&lt;/p&gt;

&lt;p&gt;We changed no other code and issued our DB write with &lt;code&gt;Repo.insert_all&lt;/code&gt; just like before, because we are running our &lt;em&gt;entire&lt;/em&gt; &lt;em&gt;application&lt;/em&gt;. Database connection(s) and all. Except this fleeting application only runs that little function after startup and nothing else.&lt;/p&gt;

&lt;p&gt;In practice, a FLAME implementation will support a pool of runners for hot startup, scale-to-zero, and elastic growth. More on that later.&lt;/p&gt;
&lt;h2 id='solving-a-problem-vs-removing-the-problem' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#solving-a-problem-vs-removing-the-problem' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Solving a problem vs removing the problem&lt;/span&gt;&lt;/h2&gt;&lt;blockquote&gt;FaaS solutions help you solve a problem. FLAME removes the problem.&lt;/blockquote&gt;


&lt;p&gt;The FaaS labyrinth of complexity defies reason. And it&amp;rsquo;s unavoidable. Let&amp;rsquo;s walkthrough the thumbnail use-case to see how.&lt;/p&gt;

&lt;p&gt;We try to start with the simplest building block like request/response AWS Lambda Function URL&amp;rsquo;s.&lt;/p&gt;

&lt;p&gt;The complexity hits immediately.&lt;/p&gt;

&lt;p&gt;We start writing custom encoders/decoders on both sides to handle streaming the thumbnails back to the app over HTTP. Phew that&amp;rsquo;s done. Wait, is our video transcoding or user uploads going to take longer than 15 minutes? Sorry, hard timeout limit – time to split our videos into chunks to stay within the timeout, which means more lambdas to do that. Now we&amp;rsquo;re orchestrating lambda workflows and relying on additional services, such as SQS and S3, to enable this.&lt;/p&gt;

&lt;p&gt;All the FaaS is doing is adding layers of communication between your code and the parts you want to run elastically. Each layer has its own glue integration price to pay.&lt;/p&gt;

&lt;p&gt;Ultimately handling this kind of use-case looks something like this:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Trigger the lambda via HTTP endpoint, S3, or API gateway ($)
&lt;/li&gt;&lt;li&gt;Write the bespoke lambda to transcode the video ($)
&lt;/li&gt;&lt;li&gt;Place the thumbnail results into SQS ($)
&lt;/li&gt;&lt;li&gt;Write the SQS consumer in our app (dev $)
&lt;/li&gt;&lt;li&gt;Persist to DB and figure out how to get events back to active subscribers that may well be connected to other instances than the SQS consumer (dev $)
&lt;/li&gt;&lt;/ul&gt;

&lt;p&gt;This is nuts. We pay the FaaS toll at every step. We shouldn&amp;rsquo;t have to do any of this!&lt;/p&gt;

&lt;p&gt;FaaS provides a bunch of offerings to build a solution on top of. FLAME removes the problem entirely.&lt;/p&gt;
&lt;h2 id='flame-backends' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#flame-backends' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;FLAME Backends&lt;/span&gt;&lt;/h2&gt;&lt;blockquote&gt;On Fly.io infrastructure the &lt;code&gt;FLAME.FlyBackend&lt;/code&gt; can boot a copy of your application on a new &lt;a href="https://fly.io/docs/machines/"&gt;Machine&lt;/a&gt; and have it connect back to the parent for work within ~3s.&lt;/blockquote&gt;


&lt;p&gt;By default, FLAME ships with a &lt;code&gt;LocalBackend&lt;/code&gt; and &lt;code&gt;FlyBackend&lt;/code&gt;, but any host that provides an API to provision a server and run your app code can work as a FLAME backend. Erlang and Elixir primitives are doing all the heavy lifting here. The entire &lt;code&gt;FLAME.FlyBackend&lt;/code&gt; is &lt;a href='https://github.com/phoenixframework/flame/blob/main/lib/flame/fly_backend.ex' title=''&gt;&amp;lt; 200 LOC with docs&lt;/a&gt;. The library has a single dependency, &lt;code&gt;req&lt;/code&gt;, which is an HTTP client.&lt;/p&gt;

&lt;p&gt;Because Fly.io runs our applications as a packaged up docker image, we simply ask the Fly API to boot a new Machine for us with the same image that our app is currently running. Also thanks to Fly infrastructure, we can guarantee the FLAME runners are started in the same region as the parent. This optimizes latency and lets you ship whatever data back and forth between parent and runner without having to think about it.&lt;/p&gt;
&lt;h2 id='look-at-everything-were-not-doing' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#look-at-everything-were-not-doing' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Look at everything we&amp;rsquo;re not doing&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;With FaaS, just imagine how quickly the dev and testing story becomes a fate worse than death.&lt;/p&gt;

&lt;p&gt;To run the app locally, we either need to add some huge dev dependencies to simulate the entire FaaS pipeline, or worse, connect up our dev and test environments directly to the FaaS provider.&lt;/p&gt;

&lt;p&gt;With FLAME, your dev and test runners simply run on the local backend.&lt;/p&gt;

&lt;p&gt;Remember, this is your app. FLAME just controls where modular parts of it run. In dev or test, those parts simply run on the existing runtime on your laptop or CI server.&lt;/p&gt;

&lt;p&gt;Using Elixir, we can even send a file across to the remote FLAME application thanks to the distributed features of the Erlang VM:&lt;/p&gt;
&lt;div class="highlight-wrapper group relative elixir"&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-wrap-target="#code-rduui6yl"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-copy-target="sibling"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight '&gt;&lt;code id="code-rduui6yl"&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="n"&gt;generate_thumbnails&lt;/span&gt;&lt;span class="p"&gt;(%&lt;/span&gt;&lt;span class="no"&gt;Video&lt;/span&gt;&lt;span class="p"&gt;{}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;vid&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;interval&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt;
  &lt;span class="n"&gt;parent_stream&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;File&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;stream!&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;vid&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;filepath&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[],&lt;/span&gt; &lt;span class="mi"&gt;2048&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="no"&gt;FLAME&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;call&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="no"&gt;MyApp&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="no"&gt;FFMpegRunner&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;fn&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;
    &lt;span class="n"&gt;tmp_file&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;Path&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="no"&gt;System&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tmp_dir!&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="no"&gt;Ecto&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="no"&gt;UUID&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;generate&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
    &lt;span class="n"&gt;flame_stream&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;File&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;stream!&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tmp_file&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="no"&gt;Enum&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;into&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;parent_stream&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;flame_stream&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;tmp&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;Path&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="no"&gt;System&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tmp_dir!&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="no"&gt;Ecto&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="no"&gt;UUID&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;generate&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
    &lt;span class="no"&gt;File&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;mkdir!&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tmp&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;args&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt;
      &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"-i"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tmp_file&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"-vf"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"fps=1/&lt;/span&gt;&lt;span class="si"&gt;#{&lt;/span&gt;&lt;span class="n"&gt;interval&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="si"&gt;#{&lt;/span&gt;&lt;span class="n"&gt;tmp&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;/%02d.png"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="no"&gt;System&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cmd&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"ffmpeg"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;urls&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;VidStore&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;put_thumbnails&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;vid&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="no"&gt;Path&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;wildcard&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tmp&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&amp;gt;&lt;/span&gt; &lt;span class="s2"&gt;"/*.png"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="no"&gt;Repo&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;insert_all&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="no"&gt;Thumb&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="no"&gt;Enum&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;urls&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="p"&gt;%{&lt;/span&gt;&lt;span class="ss"&gt;vid_id:&lt;/span&gt; &lt;span class="n"&gt;vid&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;url:&lt;/span&gt; &lt;span class="nv"&gt;&amp;amp;1&lt;/span&gt;&lt;span class="p"&gt;}))&lt;/span&gt;
  &lt;span class="k"&gt;end&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;On line 2 we open a file on the parent node to the video path. Then in the FLAME child, we stream the file from the parent node to the FLAME server in only a couple lines of code. That&amp;rsquo;s it! No setup of S3 or HTTP interfaces required.&lt;/p&gt;

&lt;p&gt;With FLAME it&amp;rsquo;s easy to miss everything we&amp;rsquo;re not doing:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;We don&amp;rsquo;t need to write code outside of our application. We can reuse business logic, database setup, PubSub, and all the features of our respective platforms
&lt;/li&gt;&lt;li&gt;We don&amp;rsquo;t need to manage deploys of separate services or endpoints
&lt;/li&gt;&lt;li&gt;We don&amp;rsquo;t need to write results to S3 or SQS just to pick up values back in our app
&lt;/li&gt;&lt;li&gt;We skip the dev, test, and CI dependency dance
&lt;/li&gt;&lt;/ul&gt;
&lt;h2 id='flame-outside-elixir' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#flame-outside-elixir' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;FLAME outside Elixir&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;Elixir is fantastically well suited for the FLAME model because we get so much &lt;a href='https://fly.io/phoenix-files/elixir-and-phoenix-can-do-it-all/' title=''&gt;for free&lt;/a&gt; like process supervision and distributed messaging. That said, any language with reasonable concurrency primitives can take advantage of this pattern. For example, my teammate, Lubien, created a proof of concept example for breaking out functions in your JavaScript application and running them inside a new Fly Machine: &lt;a href='https://github.com/lubien/fly-run-this-function-on-another-machine' title=''&gt;https://github.com/lubien/fly-run-this-function-on-another-machine&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;So the general flow for a JavaScript-based FLAME call would be to move the modular executions to a new file, which is executed on a runner pool. Provided the arguments are JSON serializable, the general FLAME flow is similar to what we&amp;rsquo;ve outlined here. Your application, your code, running on fleeting instances.&lt;/p&gt;

&lt;p&gt;A complete FLAME library will need to handle the following concerns:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Elastic pool scale-up and scale-down logic
&lt;/li&gt;&lt;li&gt;Hot vs cold startup with pools
&lt;/li&gt;&lt;li&gt;Remote runner monitoring to avoid orphaned resources
&lt;/li&gt;&lt;li&gt;How to monitor and keep deployments fresh
&lt;/li&gt;&lt;/ul&gt;

&lt;p&gt;For the rest of this post we&amp;rsquo;ll see how the Elixir FLAME library handles these concerns as well as features uniquely suited to Elixir applications. But first, you might be wondering about your background job queues.&lt;/p&gt;
&lt;h2 id='what-about-my-background-job-processor' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#what-about-my-background-job-processor' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;What about my background job processor?&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;FLAME works great inside your background job processor, but you may have noticed some overlap. If your job library handles scaling the worker pool, what is FLAME doing for you? There&amp;rsquo;s a couple important distinctions here.&lt;/p&gt;

&lt;p&gt;First, we reach for these queues when we need &lt;em&gt;durability guarantees&lt;/em&gt;. We often can turn knobs to have the queues scale to handle more jobs as load changes. But durable operations are separate from elastic execution. Conflating these concerns can send you down a similar path to lambda complexity. Leaning on your worker queue purely for offloaded execution means writing all the glue code to get the data into and out of the job, and back to the caller or end-user&amp;rsquo;s device somehow.&lt;/p&gt;

&lt;p&gt;For example, if we want to guarantee we successfully generated thumbnails for a video after the user upload, then a job queue makes sense as the &lt;em&gt;dispatch, commit, and retry&lt;/em&gt; &lt;em&gt;mechanism&lt;/em&gt; for this operation. The actual transcoding could be a FLAME call inside the job itself, so we decouple the ideas of durability and scaled execution.&lt;/p&gt;

&lt;p&gt;On the other side, we have operations we don&amp;rsquo;t need durability for. Take the screencast above where the user hasn&amp;rsquo;t yet saved their video. Or an ML model execution where there&amp;rsquo;s no need to waste resources churning a prompt if the user has already left the app. In those cases, it doesn&amp;rsquo;t make sense to write to a durable store to pick up a job for work that will go right into the ether.&lt;/p&gt;
&lt;h2 id='pooling-for-elastic-scale' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#pooling-for-elastic-scale' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Pooling for Elastic Scale&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;With the Elixir implementation of FLAME, you define elastic pools of runners. This allows scale-to-zero behavior while also elastically scaling up FLAME servers with max concurrency limits.&lt;/p&gt;

&lt;p&gt;For example, lets take a look at the &lt;code&gt;start/2&lt;/code&gt; callback, which is the entry point of all Elixir applications. We can drop in a &lt;code&gt;FLAME.Pool&lt;/code&gt; for video transcriptions and say we want it to scale to zero, boot a max of 10, and support 5 concurrent &lt;code&gt;ffmpeg&lt;/code&gt; operations per runner:&lt;/p&gt;
&lt;div class="highlight-wrapper group relative elixir"&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-wrap-target="#code-txwesbky"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-copy-target="sibling"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight '&gt;&lt;code id="code-txwesbky"&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="n"&gt;start&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;_type&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;_args&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt;
  &lt;span class="n"&gt;flame_parent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;FLAME&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="no"&gt;Parent&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;get&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

  &lt;span class="n"&gt;children&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="o"&gt;...&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="no"&gt;MyApp&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="no"&gt;Repo&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="no"&gt;FLAME&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="no"&gt;Pool&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="ss"&gt;name:&lt;/span&gt; &lt;span class="no"&gt;Thumbs&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="no"&gt;FFMpegRunner&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="ss"&gt;min:&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="ss"&gt;max:&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="ss"&gt;max_concurrency:&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="ss"&gt;idle_shutdown_after:&lt;/span&gt; &lt;span class="mi"&gt;30_000&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="n"&gt;!flame_parent&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="no"&gt;MyAppWeb&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="no"&gt;Endpoint&lt;/span&gt;
  &lt;span class="p"&gt;]&lt;/span&gt;
  &lt;span class="o"&gt;|&amp;gt;&lt;/span&gt; &lt;span class="no"&gt;Enum&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;filter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt; &lt;span class="nv"&gt;&amp;amp;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

  &lt;span class="n"&gt;opts&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="ss"&gt;strategy:&lt;/span&gt; &lt;span class="ss"&gt;:one_for_one&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;name:&lt;/span&gt; &lt;span class="no"&gt;MyApp&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="no"&gt;Supervisor&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
  &lt;span class="no"&gt;Supervisor&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;start_link&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;children&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;opts&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;We use the presence of a FLAME parent to conditionally start our Phoenix webserver when booting the app. There&amp;rsquo;s no reason to start a webserver if we aren&amp;rsquo;t serving web traffic. Note we leave other services like the database &lt;code&gt;MyApp.Repo&lt;/code&gt; alone because we want to make use of those services inside FLAME runners.&lt;/p&gt;

&lt;p&gt;Elixir&amp;rsquo;s supervised process approach to applications is uniquely great for turning these kinds of knobs.&lt;/p&gt;

&lt;p&gt;We also set our pool to idle down after 30 seconds of no caller operations. This keeps our runners hot for a short while before discarding them. We could also pass a &lt;code&gt;min: 1&lt;/code&gt; to always ensure at least one &lt;code&gt;ffmpeg&lt;/code&gt; runner is hot and ready for work by the time our application is started.&lt;/p&gt;
&lt;h2 id='process-placement' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#process-placement' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Process Placement&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;In Elixir, stateful bits of our applications are built around the &lt;em&gt;process&lt;/em&gt; primitive – lightweight greenthreads with message mailboxes. Wrapping our otherwise stateless app code in a synchronous &lt;code&gt;FLAME.call&lt;/code&gt;&amp;lsquo;s or async &lt;code&gt;FLAME.cast&lt;/code&gt;&amp;rsquo;s works great, but what about the stateful parts of our app?&lt;/p&gt;

&lt;p&gt;&lt;code&gt;FLAME.place_child&lt;/code&gt; exists to take an existing process specification in your Elixir app and start it on a FLAME runner instead of locally. You can use it anywhere you&amp;rsquo;d use &lt;code&gt;Task.Supervisor.start_child&lt;/code&gt; , &lt;code&gt;DynamicSupervisor.start_child&lt;/code&gt;, or similar interfaces. Just like &lt;code&gt;FLAME.call&lt;/code&gt;, the process is run on an elastic pool and runners handle idle down when the process completes its work.&lt;/p&gt;

&lt;p&gt;And like &lt;code&gt;FLAME.call&lt;/code&gt;, it lets us take existing app code, change a single LOC, and continue shipping features.&lt;/p&gt;

&lt;p&gt;Let&amp;rsquo;s walk thru the example from the screencast above. Imagine we want to generate video thumbnails for a video &lt;em&gt;as it is being uploaded&lt;/em&gt;. Elixir and LiveView make this easy. We won&amp;rsquo;t cover all the code here, but you can view the &lt;a href='https://github.com/fly-apps/thumbnail_generator/blob/main/lib/thumbs/thumbnail_generator.ex' title=''&gt;full app implementation&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Our first pass would be to write a LiveView upload writer that calls into a &lt;code&gt;ThumbnailGenerator&lt;/code&gt;:&lt;/p&gt;
&lt;div class="highlight-wrapper group relative elixir"&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-wrap-target="#code-a6sjmlsd"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-copy-target="sibling"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight '&gt;&lt;code id="code-a6sjmlsd"&gt;&lt;span class="k"&gt;defmodule&lt;/span&gt; &lt;span class="no"&gt;ThumbsWeb&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="no"&gt;ThumbnailUploadWriter&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt;
  &lt;span class="nv"&gt;@behaviour&lt;/span&gt; &lt;span class="no"&gt;Phoenix&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="no"&gt;LiveView&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="no"&gt;UploadWriter&lt;/span&gt;

  &lt;span class="n"&gt;alias&lt;/span&gt; &lt;span class="no"&gt;Thumbs&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="no"&gt;ThumbnailGenerator&lt;/span&gt;

  &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="n"&gt;init&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;opts&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt;
    &lt;span class="n"&gt;generator&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;ThumbnailGenerator&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;opts&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="ss"&gt;:ok&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;%{&lt;/span&gt;&lt;span class="ss"&gt;gen:&lt;/span&gt; &lt;span class="n"&gt;generator&lt;/span&gt;&lt;span class="p"&gt;}}&lt;/span&gt;
  &lt;span class="k"&gt;end&lt;/span&gt;

  &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="n"&gt;write_chunk&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt;
    &lt;span class="no"&gt;ThumbnailGenerator&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;stream_chunk!&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;gen&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="ss"&gt;:ok&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="k"&gt;end&lt;/span&gt;

  &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="n"&gt;meta&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;%{&lt;/span&gt;&lt;span class="ss"&gt;gen:&lt;/span&gt; &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;gen&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="n"&gt;close&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;_reason&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt;
    &lt;span class="no"&gt;ThumbnailGenerator&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;close&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;gen&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="ss"&gt;:ok&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="k"&gt;end&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;An upload writer is a behavior that simply ferries the uploaded chunks from the client into whatever we&amp;rsquo;d like to do with them. Here we have a &lt;code&gt;ThumbnailGenerator.open/1&lt;/code&gt; which starts a process that communicates with an &lt;code&gt;ffmpeg&lt;/code&gt; shell. Inside &lt;code&gt;ThumbnailGenerator.open/1&lt;/code&gt;, we use regular elixir process primitives:&lt;/p&gt;
&lt;div class="highlight-wrapper group relative elixir"&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-wrap-target="#code-9zrau48v"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-copy-target="sibling"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight '&gt;&lt;code id="code-9zrau48v"&gt;  &lt;span class="c1"&gt;# thumbnail_generator.ex&lt;/span&gt;
  &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="n"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;opts&lt;/span&gt; &lt;span class="p"&gt;\\&lt;/span&gt; &lt;span class="p"&gt;[])&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt;
    &lt;span class="no"&gt;Keyword&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;validate!&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;opts&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="ss"&gt;:timeout&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;:caller&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;:fps&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
    &lt;span class="n"&gt;timeout&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;Keyword&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;opts&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;:timeout&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;5_000&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;caller&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;Keyword&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;opts&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;:caller&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
    &lt;span class="n"&gt;ref&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;make_ref&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;parent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="n"&gt;spec&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="bp"&gt;__MODULE__&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;caller&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ref&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;parent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;opts&lt;/span&gt;&lt;span class="p"&gt;}}&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="ss"&gt;:ok&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;pid&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;DynamicSupervisor&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;start_child&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;@sup&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;spec&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;receive&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt;
      &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="o"&gt;^&lt;/span&gt;&lt;span class="n"&gt;ref&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;%&lt;/span&gt;&lt;span class="no"&gt;ThumbnailGenerator&lt;/span&gt;&lt;span class="p"&gt;{}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;gen&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;
        &lt;span class="p"&gt;%&lt;/span&gt;&lt;span class="no"&gt;ThumbnailGenerator&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;gen&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="ss"&gt;pid:&lt;/span&gt; &lt;span class="n"&gt;pid&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;after&lt;/span&gt;
      &lt;span class="n"&gt;timeout&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="k"&gt;exit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="ss"&gt;:timeout&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;end&lt;/span&gt;
  &lt;span class="k"&gt;end&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;The details aren&amp;rsquo;t super important here, except line 10 where we call &lt;code&gt;{:ok, pid} = DynamicSupervisor.start_child(@sup, spec)&lt;/code&gt;, which starts a supervised&lt;code&gt;ThumbnailGenerator&lt;/code&gt; process. The rest of the implementation simply ferries chunks as stdin into &lt;code&gt;ffmpeg&lt;/code&gt; and parses png&amp;rsquo;s from stdout. Once a PNG delimiter is found in stdout, we send the &lt;code&gt;caller&lt;/code&gt; process (our LiveView process) a message saying &amp;ldquo;hey, here&amp;rsquo;s an image&amp;rdquo;:&lt;/p&gt;
&lt;div class="highlight-wrapper group relative elixir"&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-wrap-target="#code-mgmj8tt7"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-copy-target="sibling"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight '&gt;&lt;code id="code-mgmj8tt7"&gt;&lt;span class="c1"&gt;# thumbnail_generator.ex&lt;/span&gt;
&lt;span class="nv"&gt;@png_begin&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="mi"&gt;137&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;80&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;78&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;71&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;13&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;26&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&amp;gt;&lt;/span&gt;
&lt;span class="k"&gt;defp&lt;/span&gt; &lt;span class="n"&gt;handle_stdout&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ref&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;bin&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt;
  &lt;span class="p"&gt;%&lt;/span&gt;&lt;span class="no"&gt;ThumbnailGenerator&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="ss"&gt;ref:&lt;/span&gt; &lt;span class="o"&gt;^&lt;/span&gt;&lt;span class="n"&gt;ref&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;caller:&lt;/span&gt; &lt;span class="n"&gt;caller&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;gen&lt;/span&gt;

  &lt;span class="k"&gt;case&lt;/span&gt; &lt;span class="n"&gt;bin&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt;
    &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="nv"&gt;@png_begin&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;_rest&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;binary&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;
      &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;current&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt;
        &lt;span class="n"&gt;send&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;caller&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;ref&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;:image&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;count&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;encode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;)})&lt;/span&gt;
      &lt;span class="k"&gt;end&lt;/span&gt;

      &lt;span class="p"&gt;%{&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="ss"&gt;count:&lt;/span&gt; &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;count&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;current:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;bin&lt;/span&gt;&lt;span class="p"&gt;]}&lt;/span&gt;

    &lt;span class="n"&gt;_&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;
      &lt;span class="p"&gt;%{&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="ss"&gt;current:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;bin&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;current&lt;/span&gt;&lt;span class="p"&gt;]}&lt;/span&gt;
  &lt;span class="k"&gt;end&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;The &lt;code&gt;caller&lt;/code&gt; LiveView process then picks up the message in a &lt;code&gt;handle_info&lt;/code&gt; callback and updates the UI:&lt;/p&gt;
&lt;div class="highlight-wrapper group relative elixir"&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-wrap-target="#code-k5i7sj1i"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-copy-target="sibling"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight '&gt;&lt;code id="code-k5i7sj1i"&gt;&lt;span class="c1"&gt;# thumb_live.ex&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="n"&gt;handle_info&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="n"&gt;_ref&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;:image&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;_count&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;encoded&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="n"&gt;socket&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt;
  &lt;span class="p"&gt;%{&lt;/span&gt;&lt;span class="ss"&gt;count:&lt;/span&gt; &lt;span class="n"&gt;count&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;socket&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;assigns&lt;/span&gt;

  &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="ss"&gt;:noreply&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
   &lt;span class="n"&gt;socket&lt;/span&gt;
   &lt;span class="o"&gt;|&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;assign&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="ss"&gt;count:&lt;/span&gt; &lt;span class="n"&gt;count&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;message:&lt;/span&gt; &lt;span class="s2"&gt;"Generating (&lt;/span&gt;&lt;span class="si"&gt;#{&lt;/span&gt;&lt;span class="n"&gt;count&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;)"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
   &lt;span class="o"&gt;|&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;stream_insert&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="ss"&gt;:thumbs&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;%{&lt;/span&gt;&lt;span class="ss"&gt;id:&lt;/span&gt; &lt;span class="n"&gt;count&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;encoded:&lt;/span&gt; &lt;span class="n"&gt;encoded&lt;/span&gt;&lt;span class="p"&gt;})}&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;The &lt;code&gt;send(caller, {ref, :image, state.count, encode(state)}&lt;/code&gt; is one magic part about Elixir. Everything is a process, and we can message those processes, regardless of their location in the cluster.&lt;/p&gt;

&lt;p&gt;It&amp;rsquo;s like if every instantiation of an object in your favorite OO lang included a cluster-global unique identifier to work with methods on that object. The LiveView (a process) simply receives the image message and updates the UI with new images.&lt;/p&gt;

&lt;p&gt;Now let&amp;rsquo;s head back over to our &lt;code&gt;ThumbnailGenerator.open/1&lt;/code&gt; function and make this elastically scalable.&lt;/p&gt;
&lt;div class="highlight-wrapper group relative diff"&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-wrap-target="#code-2cwbroo9"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-copy-target="sibling"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight '&gt;&lt;code id="code-2cwbroo9"&gt;&lt;span class="gd"&gt;-    {:ok, pid} = DynamicSupervisor.start_child(@sup, spec)
&lt;/span&gt;&lt;span class="gi"&gt;+    {:ok, pid} = FLAME.place_child(Thumbs.FFMpegRunner, spec)
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;That&amp;rsquo;s it! Because everything is a process and processes can live anywhere, it doesn&amp;rsquo;t matter what server our &lt;code&gt;ThumbnailGenerator&lt;/code&gt; process lives on. It simply messages the caller with &lt;code&gt;send(caller, …)&lt;/code&gt; and the messages are sent across the cluster if needed.&lt;/p&gt;

&lt;p&gt;Once the process exits, either from an explicit close, after the upload is done, or from the end-user closing their browser tab, the FLAME server will note the exit and idle down if no other work is being done.&lt;/p&gt;

&lt;p&gt;Check out the &lt;a href='https://github.com/fly-apps/thumbnail_generator/blob/main/lib/thumbs/thumbnail_generator.ex' title=''&gt;full implementation&lt;/a&gt; if you&amp;rsquo;re interested.&lt;/p&gt;
&lt;h2 id='remote-monitoring' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#remote-monitoring' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Remote Monitoring&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;All this transient infrastructure needs failsafe mechanisms to avoid orphaning resources. If a parent spins up a runner, that runner must take care of idling itself down when no work is present and handle failsafe shutdowns if it can no longer contact the parent node.&lt;/p&gt;

&lt;p&gt;Likewise, we need to shutdown runners when parents are rolled for new deploys as we must guarantee we&amp;rsquo;re running the same code across the cluster.&lt;/p&gt;

&lt;p&gt;We also have active callers in many cases that are awaiting the result of work on runners that could go down for any reason.&lt;/p&gt;

&lt;p&gt;There&amp;rsquo;s a lot to monitor here.&lt;/p&gt;

&lt;p&gt;There&amp;rsquo;s also a number of failure modes that make this sound like a harrowing experience to implement. Fortunately Elixir has all the primitives to make this an easy task thanks to the Erlang VM. Namely, we get the following for free:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Process monitoring and supervision – we know when things go bad. Whether on a node-local process, or one across the cluster
&lt;/li&gt;&lt;li&gt;Node monitoring – we know when nodes come up, and when nodes go away
&lt;/li&gt;&lt;li&gt;Declarative and controlled app startup and shutdown - we carefully control the startup and shutdown sequence of applications as a matter of course. This allows us to gracefully shutdown active runners when a fresh deploy is triggered, while giving them time to finish their work
&lt;/li&gt;&lt;/ul&gt;

&lt;p&gt;We&amp;rsquo;ll cover the internal implementation details in a future deep-dive post. For now, feel free to poke around &lt;a href='https://github.com/phoenixframework/flame' title=''&gt;the flame source&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id='whats-next' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#whats-next' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;What&amp;rsquo;s Next&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;We&amp;rsquo;re just getting started with the Elixir FLAME library, but it&amp;rsquo;s ready to try out now. In the future  look for more advance pool growth techniques, and deep dives into how the Elixir implementation works. You can also find me &lt;a href='https://twitter.com/chris_mccord' title=''&gt;@chris_mccord&lt;/a&gt; to chat about implementing the FLAME pattern in your language of choice.&lt;/p&gt;

&lt;p&gt;Happy coding!&lt;/p&gt;

&lt;p&gt;–Chris&lt;/p&gt;
</content>
  </entry>
  <entry>
    <title>The risks of building apps on ChatGPT</title>
    <link rel="alternate" href="https://fly.io/blog/the-risks-of-building-apps-on-chatgpt/"/>
    <id>https://fly.io/blog/the-risks-of-building-apps-on-chatgpt/</id>
    <published>2023-12-05T00:00:00+00:00</published>
    <updated>2025-05-20T16:26:25+00:00</updated>
    <media:thumbnail url="https://fly.io/blog/the-risks-of-building-apps-on-chatgpt/assets/risks-building-on-chatgpt-thumb.webp"/>
    <content type="html">&lt;div class="lead"&gt;&lt;p&gt;If AI will play an essential role in your application, then consider using a self-hosted, open source model instead of a proprietary and externally hosted one. In this post we explore some of the risks for the latter option. We’re Fly.io. We put your code into lightweight microVMs on our own hardware &lt;a href="https://fly.io/docs/reference/regions/" title=""&gt;around the world&lt;/a&gt;. &lt;a href="https://fly.io/docs/speedrun/" title=""&gt;Check us out&lt;/a&gt;—your app can be deployed in minutes.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;The topic of &amp;ldquo;AI&amp;rdquo; gets a lot of attention and press. Coverage ranges from apocalyptic warnings to Utopian predictions. The truth, as always, is likely somewhere in the middle. As developers, we are the ones that either imagine ways that AI can be used to enhance our products or the ones doing the herculean tasks of implementing it inside our companies.&lt;/p&gt;

&lt;p&gt;I believe the following statement to be true:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;AI won’t replace humans — but humans with AI will replace humans without AI.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;I believe this can be extended to many products and services and the companies that create them. Let&amp;rsquo;s express it this way:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;AI won’t replace businesses — but businesses with AI will replace businesses without AI.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Today I&amp;rsquo;m assuming your business would benefit from using AI. Or, at the very least, your C-levels have decreed from on high that thou must integrateth with AI. With that out of the way, the next question is how you&amp;rsquo;re meant to do it. This post is an argument to build on top of open source language models instead of closed models that you rent access to. We&amp;rsquo;ll take a look at what convinced me.&lt;/p&gt;
&lt;h2 id='but-openai-is-the-market-leader' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#but-openai-is-the-market-leader' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;But OpenAI is the market leader…&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;OpenAI, the creators of the famous ChatGPT, are the strong market leaders in this category. Why wouldn&amp;rsquo;t you want to use the best in the business?&lt;/p&gt;

&lt;p&gt;Early on, stories of private corporate documents being uploaded by employees and then finding that private information leaking out to general ChatGPT users was a real black eye. &lt;a href='https://www.sciencealert.com/many-companies-are-banning-chatgpt-this-is-why' title=''&gt;Companies began banning employees from using ChatGPT for work&lt;/a&gt;. It exposed that people&amp;rsquo;s interactions with ChatGPT were being used as training data for future versions of the model.&lt;/p&gt;

&lt;p&gt;In response, OpenAI recently announced an &lt;a href='https://openai.com/enterprise' title=''&gt;Enterprise&lt;/a&gt; offering promising that no Enterprise customer data is used for training.&lt;/p&gt;

&lt;p&gt;With the top objection addressed, it should be smooth sailing for wide adoption, right?&lt;/p&gt;

&lt;p&gt;Not so fast.&lt;/p&gt;

&lt;p&gt;While an Enterprise offering may address that concern, there are other subtle reasons to not use OpenAI, or other closed models, that can&amp;rsquo;t be resolved by vague statements of enterprise privacy.&lt;/p&gt;
&lt;h2 id='what-are-the-risks-for-building-on-top-of-openai' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#what-are-the-risks-for-building-on-top-of-openai' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;What are the risks for building on top of OpenAI?&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;Let&amp;rsquo;s briefly outline the risks we take on when relying on a company like OpenAI for critical AI features in our applications.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong class='font-semibold text-navy-950'&gt;Single provider risk&lt;/strong&gt;: Relying deeply on an external service that plays a critical role in our business is risky. The integration is not easily swapped out for another service if needed. Additionally, we don&amp;rsquo;t want part of our &amp;ldquo;secret sauce&amp;rdquo; to actually be another company&amp;rsquo;s product. That&amp;rsquo;s some seriously shaky ground! They &lt;em&gt;want&lt;/em&gt; to sell the same thing to our competitors too.
&lt;/li&gt;&lt;li&gt;&lt;strong class='font-semibold text-navy-950'&gt;Regulation or Policy change risk&lt;/strong&gt;: &amp;ldquo;AI&amp;rdquo; is being talked about a lot in politics. What&amp;rsquo;s acceptable today may be deemed &amp;ldquo;not allowed&amp;rdquo; in the future and a corporation providing a newly regulated service must comply.
&lt;/li&gt;&lt;li&gt;&lt;strong class='font-semibold text-navy-950'&gt;Financial risk&lt;/strong&gt;: &lt;a href='https://www.washingtonpost.com/technology/2023/06/05/chatgpt-hidden-cost-gpu-compute/' title=''&gt;AI chatbots lose money on every chat.&lt;/a&gt; If the financial models that make our business profitable are built on impossible to maintain prices, then our business model may be at risk when it&amp;rsquo;s time to &amp;ldquo;make the AI engine profitable&amp;rdquo; like we&amp;rsquo;ve seen happen time and time again with every industry from cookware to video games. What might the true cost be? We don&amp;rsquo;t know. &amp;lsquo;Nuff said.
&lt;/li&gt;&lt;li&gt;&lt;strong class='font-semibold text-navy-950'&gt;Governance and leadership risk&lt;/strong&gt;: The co-founder and CEO of OpenAI, &lt;a href='https://openai.com/blog/openai-announces-leadership-transition' title=''&gt;Sam Altman, was forced out of his own company by a coup from his board&lt;/a&gt;. This was later resolved with both &lt;a href='https://openai.com/blog/openai-announces-leadership-transition' title=''&gt;Sam Altman and Greg Brockman returning&lt;/a&gt;. This exposes another risk we don&amp;rsquo;t often consider with our providers. More on this later.
&lt;/li&gt;&lt;/ul&gt;

&lt;p&gt;Let&amp;rsquo;s look a bit closer at the &amp;ldquo;Single provider risk&amp;rdquo;.&lt;/p&gt;
&lt;h2 id='single-provider-risk' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#single-provider-risk' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Single provider risk&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;For hobby usage, proof of concept work, and personal experiments, by all means, use ChatGPT! I do and I expect to continue to as well. It&amp;rsquo;s fantastic for prototyping, it&amp;rsquo;s trivial to set up, and it allows you to throw ink on canvas so much more quickly than any other option out there.&lt;/p&gt;

&lt;p&gt;Up until recently, I was all gung-ho for ChatGPT being integrated into my apps. What happened? November 2023 happened. It was a very bad month for OpenAI.&lt;/p&gt;

&lt;p&gt;I created a &lt;a href='https://fly.io/phoenix-files/created-my-personal-ai-fitness-trainer-in-2-days/' title=''&gt;Personal AI Fitness Trainer&lt;/a&gt; powered by ChatGPT and on the morning of November 8th, I asked my personal trainer about the workout for the day and it failed. OpenAI was having a bad day with an outage.&lt;/p&gt;

&lt;p&gt;I don&amp;rsquo;t fault someone for having a bad day. At some point, downtime happens to the best of us. And given enough time, it happens to &lt;strong class='font-semibold text-navy-950'&gt;all&lt;/strong&gt; of us. But when possible, I want to prevent someone &lt;em&gt;else&amp;rsquo;s&lt;/em&gt; bad day from becoming &lt;em&gt;my&lt;/em&gt; bad day too.&lt;/p&gt;
&lt;h3 id='evaluating-a-critical-dependency' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#evaluating-a-critical-dependency' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Evaluating a critical dependency&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;In my case, my personal fitness trainer being unavailable was a minor inconvenience, but I managed. However, it gave me pause. If I had built an AI fitness trainer as a service, that outage would be a much bigger deal and there would be nothing I could have done to fix it until the ChatGPT API came back up.&lt;/p&gt;

&lt;p&gt;With services like a Personal AI Fitness Trainer, the AI component is the primary focus and main value proposition of the app. That&amp;rsquo;s pretty darn critical! If that AI service is interrupted, significantly altered (say, by the model suddenly refusing my requests for fitness information in ways that worked before) or my desired usage is denied (without warning or reason), the application is useless. That&amp;rsquo;s an existential threat that could make my app evaporate overnight without warning.&lt;/p&gt;

&lt;p&gt;This highlights the risk of having a critical dependency on an external service.&lt;/p&gt;

&lt;p&gt;Modern applications depend on many services, both internal and external. But how &lt;strong class='font-semibold text-navy-950'&gt;critical&lt;/strong&gt; that dependency is matters.&lt;/p&gt;

&lt;p&gt;Let&amp;rsquo;s take a &lt;em&gt;very&lt;/em&gt; simple application as an example. The application has a critical dependency on the database and both the app and database have a critical dependency on the underlying VMs/machines/provider. These critical dependencies are so common that we seldom think about them because we deal with them every day we come to work. It&amp;rsquo;s just how things are.&lt;/p&gt;

&lt;p&gt;&lt;img alt="Diagram showing an application stack of hosting &amp;gt; Database &amp;gt; My Application and weak dependencies on logging, error reporting, etc. Then a critical dependency on an external AI as a Service. " src="/blog/the-risks-of-building-apps-on-chatgpt/assets/critical-dependency-vs-weak.png" /&gt;&lt;/p&gt;

&lt;p&gt;The danger comes when we draw a critical dependency line to an &lt;strong class='font-semibold text-navy-950'&gt;external&lt;/strong&gt; &lt;strong class='font-semibold text-navy-950'&gt;service&lt;/strong&gt;. If the service has a hiccup or the network between my app and their service starts dropping all my packets, the entire application goes down. Someone else&amp;rsquo;s bad day gets spread around when that happens. 😞&lt;/p&gt;

&lt;p&gt;In order to protect ourselves from a risk like that, we should diversify our reliance away from a single external provider. How do we do that? We&amp;rsquo;ll come back to this later.&lt;/p&gt;
&lt;h3 id='we-are-not-without-dependencies' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#we-are-not-without-dependencies' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;We are not without dependencies&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;It&amp;rsquo;s really common for apps to have external dependencies. The question is how critical to our service are those dependencies?&lt;/p&gt;

&lt;p&gt;What happens to the application when the external log aggregation service, email service, and error reporting services are all unreachable? If the app is designed well, then users may have a slightly degraded experience or, best case, the users won&amp;rsquo;t even notice the issues at all!&lt;/p&gt;

&lt;p&gt;The key factor is these external services are not essential to our application functioning.&lt;/p&gt;
&lt;h2 id='regulation-or-policy-change-risk' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#regulation-or-policy-change-risk' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Regulation or Policy change risk&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;Our industry has a lot of misconceptions, fear, uncertainty, and doubt around the idea of regulation, but sometimes it&amp;rsquo;s justified. I don&amp;rsquo;t want you to think about regulation as a scary thing that yanks away control. Instead, let&amp;rsquo;s think about regulation as when a government body gets involved to disallow businesses from doing or engaging in specific activities. Given that our industry has been so self-defined for so long, this feels like an existential threat. However, this is a good thing when we think about vehicle safety standards (you don&amp;rsquo;t want your 4-ton mass of metal exploding while traveling at 70 mph), pollution, health risks, and more. It&amp;rsquo;s a careful balance.&lt;/p&gt;

&lt;p&gt;Ironically, Sam Altman has been a major proponent &lt;a href='https://www.forbes.com/sites/johannacostigan/2023/06/13/openais-sam-altman-makes-global-call-for-ai-regulation-and-includes-china/?sh=4fc007421b47' title=''&gt;for government regulation&lt;/a&gt; of the AI industry. Why would he want that?&lt;/p&gt;

&lt;p&gt;It turns out that &lt;a href='https://www.cato.org/policy-analysis/regulatory-protectionism-hidden-threat-free-trade' title=''&gt;regulation can also be used as a form of protectionism&lt;/a&gt;. Or, put another way, when the people with an early lead see that &lt;a href='https://www.semianalysis.com/p/google-we-have-no-moat-and-neither' title=''&gt;they aren&amp;rsquo;t defensible against advances with open source AI models&lt;/a&gt;, they want to pull up the ladders behind them and have the government make it legally harder, or impossible, for competitors to catch up to them.&lt;/p&gt;

&lt;p&gt;If Altman&amp;rsquo;s efforts are successful, then companies who create AI can expect government involvement and oversight. Added licensing requirements and certifications would raise the cost of starting a competing business.&lt;/p&gt;

&lt;p&gt;At this point you may be thinking something like &amp;ldquo;but all of that is theoretical Mark, how would this affect my business&amp;rsquo; use of AI today?&amp;rdquo;&lt;/p&gt;

&lt;p&gt;Introducing an external organization that can dictate changes to an AI product risks breaking an existing company&amp;rsquo;s applications or significantly reducing the effectiveness of the application. And those changes may come without notice or warning.&lt;/p&gt;

&lt;p&gt;Additionally, if my business is built on an external AI system protected from competition by regulators, that adds a significant risk. If they are now the only game in town, they can set whatever price they want.&lt;/p&gt;
&lt;h2 id='governance-and-leadership-risk' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#governance-and-leadership-risk' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Governance and leadership risk&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;In the week following the OpenAI outage (November 17th to be precise), the entire tech industry was upended for most of a week following a blog post on the OpenAI blog &lt;a href='https://openai.com/blog/openai-announces-leadership-transition' title=''&gt;announcing that the OpenAI board fired the co-founder and CEO, Sam Altman&lt;/a&gt;. Then &lt;a href='https://www.forbes.com/sites/richardnieva/2023/11/17/openai-president-and-co-founder-quits-over-sam-altman-firing/?sh=34fe4b621d57' title=''&gt;Greg Brockman, co-founder and acting President resigned in protest&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;OpenAI is partnered with Microsoft and on Nov 20, 2023, &lt;a href='https://twitter.com/satyanadella/status/1726509045803336122' title=''&gt;Satya Nadella (CEO of Microsoft) posted the following on X&lt;/a&gt; (formerly Twitter):&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;We remain committed to our partnership with OpenAI (OAI) and have confidence in our product roadmap, our ability to continue to innovate with everything we announced at Microsoft Ignite, and in continuing to support our customers and partners. We look forward to getting to know Emmett Shear and OAI&amp;rsquo;s new leadership team and working with them. And &lt;strong class='font-semibold text-navy-950'&gt;we’re extremely excited to share the news that Sam Altman and Greg Brockman, together with colleagues, will be joining Microsoft to lead a new advanced AI research team.&lt;/strong&gt; We look forward to moving quickly to provide them with the resources needed for their success.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Microsoft nearly &lt;a href='https://en.wikipedia.org/wiki/Acqui-hiring' title=''&gt;acqui-hired&lt;/a&gt; OpenAI for $0! That&amp;rsquo;s some serious business Jujutsu.&lt;/p&gt;

&lt;p&gt;In the end, after 12 days of very public corporate chaos, &lt;a href='https://openai.com/blog/sam-altman-returns-as-ceo-openai-has-a-new-initial-board' title=''&gt;Sam Altman and Greg Brockman returned to OpenAI at their previous leadership positions&lt;/a&gt; as if nothing happened (save the firing of the rest of the board).&lt;/p&gt;

&lt;p&gt;With all the drama and uncertainty resolved, you may say, &amp;ldquo;it all worked out in the end, right? So what&amp;rsquo;s the problem?&amp;rdquo;&lt;/p&gt;

&lt;p&gt;This highlights the risk of building &lt;em&gt;any&lt;/em&gt; critical business system on a product offered and hosted by an external company. When we do that, we implicitly take on all of that company&amp;rsquo;s risks in addition to the risks our business already has! In this case, it&amp;rsquo;s taking on all the risks of OpenAI while getting none of their financial benefits!&lt;/p&gt;
&lt;h2 id='whats-the-alternative' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#whats-the-alternative' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;What&amp;rsquo;s the alternative?&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;The thing big AI providers like OpenAI and Google seem to fear most is competition from open source AI models. And they should be afraid. Open source AI models continue to develop at a rapid pace (there&amp;rsquo;s huge incremental improvements on a weekly basis) and, most importantly, they can be self-hosted.&lt;/p&gt;

&lt;p&gt;Additionally, it&amp;rsquo;s not out of reach for us to &lt;a href='https://huggingface.co/docs/transformers/training' title=''&gt;fine tune&lt;/a&gt; a general model to better fit our  needs by adding and removing capabilities rather than hope that the capabilities we need suddenly manifest for us.&lt;/p&gt;

&lt;p&gt;Doesn&amp;rsquo;t this all sound like the classic argument in favor of open source?&lt;/p&gt;

&lt;p&gt;If we have the model and can host it ourselves, no one can take it away. When we self-host it, we are protected from:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;service interruptions from an external provider for a critical system
&lt;/li&gt;&lt;li&gt;changes in licensing or usage fees (such as your provider suddenly doubling inference costs without warning via an email sent at 3AM)
&lt;/li&gt;&lt;li&gt;government regulators dictate a change to the model that negatively affects our use case (assuming our use isn&amp;rsquo;t breaking the law of course)
&lt;/li&gt;&lt;li&gt;company policy changes that change the behavior of the model we rely on
&lt;/li&gt;&lt;li&gt;rogue boards or a leadership crisis that impacts a provider
&lt;/li&gt;&lt;/ul&gt;

&lt;p&gt;Using an open source and self-hosted model insulates us from these external risks.&lt;/p&gt;
&lt;h2 id='i-still-need-gpus' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#i-still-need-gpus' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;I still need GPUs!&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;Getting dedicated access to a GPU is more expensive than renting limited time on OpenAI&amp;rsquo;s servers. That&amp;rsquo;s why a hobby or personal project is better off paying for the brief bits of time when needed.&lt;/p&gt;

&lt;p&gt;But let&amp;rsquo;s face it.&lt;/p&gt;

&lt;p&gt;If you really want to integrate AI into your business, you need to host your own models. You can&amp;rsquo;t control third party privacy policies, but you can control your own policies when you are the one doing your own inference with your own models. Ideally this means getting your own GPUs and incurring the capital expenditure and operations expenditures, but thankfully we&amp;rsquo;re in the future. We have the cloud now. There&amp;rsquo;s many options you can use for renting GPU access from other companies. This is supported in the big clouds as well as Fly.io. You can check out our &lt;a href='https://fly.io/docs/about/pricing/#gpus-and-fly-machines' title=''&gt;GPU offerings here&lt;/a&gt;.&lt;/p&gt;
&lt;figure class="post-cta"&gt;
  &lt;figcaption&gt;
    &lt;h1&gt;Fly.io also offer GPUs&lt;/h1&gt;
    &lt;p&gt;Running inference on your own hosted models can help de-risk critical AI integrations.&lt;/p&gt;
      &lt;a class="btn btn-lg" href="https://fly.io/docs/about/pricing/#gpus-and-fly-machines"&gt;
        GPU resource prices
      &lt;/a&gt;
  &lt;/figcaption&gt;
  &lt;div class="image-container"&gt;
    &lt;img src="/static/images/cta-rabbit.webp" srcset="/static/images/[email protected] 2x" alt=""&gt;
  &lt;/div&gt;
&lt;/figure&gt;

&lt;h2 id='closing-thoughts' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#closing-thoughts' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Closing thoughts&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;It&amp;rsquo;s important to take advantage of AI in our applications so we can reap the benefits. It can give us an important edge in the market! However, we should be extra cautious of building any critical features on a product offered by a proprietary external business. &lt;a href='https://www.msn.com/en-us/money/companies/sam-altman-chaos-helped-openai-rivals-says-hugging-face-ceo-cl%C3%A9ment-delangue/ar-AA1kIFQP' title=''&gt;Others are considering the risks of building on OpenAI as well&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Your specific level of risk depends on how central the AI aspect is to your business. If it&amp;rsquo;s a central component like in my Personal AI Fitness Trainer, then I risk losing all my customers and even the company if any of the above mentioned risk factors happen to  my AI provider. That&amp;rsquo;s an existential risk that I can&amp;rsquo;t do anything about without taking emergency heroic efforts.&lt;/p&gt;

&lt;p&gt;If the AI is sprinkled around the edges of the business, then suddenly losing it won&amp;rsquo;t kill the company. However, if the AI isn&amp;rsquo;t being well utilized, then the business may be at risk to competitors who place a bigger bet and take a bigger swing with AI.&lt;/p&gt;

&lt;p&gt;Oh, what interesting times we live in! 🙃&lt;/p&gt;
</content>
  </entry>
  <entry>
    <title>Print on Demand</title>
    <link rel="alternate" href="https://fly.io/blog/print-on-demand/"/>
    <id>https://fly.io/blog/print-on-demand/</id>
    <published>2023-11-29T00:00:00+00:00</published>
    <updated>2025-05-20T16:26:25+00:00</updated>
    <media:thumbnail url="https://fly.io/blog/print-on-demand/assets/print-on-demand-thumb.webp"/>
    <content type="html">&lt;div class="lead"&gt;&lt;p&gt;Save money by using appliance machines to only allocate memory and other machine resources when you actually need them.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;Scaling discussions often lead to recommendations to add more memory, more CPU, more machines, more regions, more, more, more.&lt;/p&gt;

&lt;p&gt;This post is different.  It focuses instead on the idea of decomposing parts of your applications into event handlers, starting up Machines to handle the events when needed, and stopping them when the event is done.  Along the way we will see how a few built in Fly.io primitives make this easy.&lt;/p&gt;

&lt;p&gt;To make the discussion concrete, we are going to focus on a common requirement: generation of PDFs from web pages.  The code that we will introduce isn&amp;rsquo;t merely an example produced in support of a blog post - rather it is code that was extracted from a production application, and packaged up into an appliance that you can deploy in minutes to add PDF generation to your existing application.&lt;/p&gt;

&lt;p&gt;But before we dive in, let&amp;rsquo;s back up a bit.&lt;/p&gt;
&lt;h2 id='motivation' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#motivation' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Motivation&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;Normally the way this is approached is to start with a tool like &lt;a href='https://github.com/puppeteer/puppeteer' title=''&gt;Puppeteer&lt;/a&gt;, &lt;a href='https://github.com/Studiosity/grover#readme' title=''&gt;Grover&lt;/a&gt;, &lt;a href='https://playwright.dev/' title=''&gt;Playwright&lt;/a&gt;, &lt;a href='https://github.com/bitcrowd/chromic_pdf' title=''&gt;ChromicPDF&lt;/a&gt;, or &lt;a href='https://spatie.be/docs/browsershot/v2/introduction' title=''&gt;BrowserShot&lt;/a&gt;.  These and other tools ultimately launch a browser like &lt;a href='https://developer.chrome.com/articles/new-headless/' title=''&gt;Chrome headless&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Now a few things about Chrome itself:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;It likely is bigger than your entire web server.
&lt;/li&gt;&lt;li&gt;It likely uses more memory than you see with a typical load on your server.
&lt;/li&gt;&lt;li&gt;All total, people using your server likely spend much less time generating PDFs than they do using the rest of your application. 
&lt;/li&gt;&lt;/ul&gt;

&lt;p&gt;Taken together, this makes splitting PDF generation into a completely separate application an easy win.  With a smaller image, your application will start faster.  Memory usage will be more predictable, and the memory needed to generate PDFs will only be allocated when needed and can be scaled separately.&lt;/p&gt;
&lt;h2 id='diving-in' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#diving-in' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Diving in&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;Without further ado, the entire application is available on GitHub as &lt;a href='https://github.com/fly-apps/pdf-appliance/#readme' title=''&gt;fly-apps/pdf-appliance&lt;/a&gt;.  Installation is a simple matter of: clone repository, create app, adjust config, deploy, and scale.&lt;/p&gt;

&lt;p&gt;Next, you will need to integrate this into your application.  All that is needed is to reply to requests that are intended to produce a PDF with a &lt;a href='https://fly.io/docs/reference/dynamic-request-routing/#the-fly-replay-response-header' title=''&gt;fly-replay&lt;/a&gt; response header.  This can either be done on individual application routes / controller actions, or it can be done globally via either middleware or a front end like &lt;a href='https://www.nginx.com/' title=''&gt;NGINX&lt;/a&gt;.  You can find a few examples in the &lt;a href='https://github.com/fly-apps/pdf-appliance/#integrate-with-your-existing-application' title=''&gt;README&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;And, that&amp;rsquo;s it.  The most you might consider doing is issuing an additional HTTP request in anticipation of the user selecting what they want to print as this will &lt;a href='https://github.com/fly-apps/pdf-appliance/#preloading-optional' title=''&gt;preload the machine&lt;/a&gt;.&lt;/p&gt;
&lt;figure class="post-cta"&gt;
  &lt;figcaption&gt;
    &lt;h1&gt;Scale at your own pace&lt;/h1&gt;
    &lt;p&gt;Deploy your project in a few minutes with Fly Launch. Then do more with Fly Machines.&lt;/p&gt;
      &lt;a class="btn btn-lg" href="https://fly.io/docs/"&gt;
        Run your entire stack near your users
      &lt;/a&gt;
  &lt;/figcaption&gt;
  &lt;div class="image-container"&gt;
    &lt;img src="/static/images/cta-turtle.webp" srcset="/static/images/[email protected] 2x" alt=""&gt;
  &lt;/div&gt;
&lt;/figure&gt;


&lt;p&gt;If you don&amp;rsquo;t have an application handy, you can try a demo.  Go to &lt;a href='https://smooth.fly.dev/' title=''&gt;smooth.fly.dev&lt;/a&gt;.  Click on Demo, then on Publish, and finally on Invoices to see a PDF.  The PDF you see will likely be underwhelming as you would need to enter students, entries, packages and options to fill out the page.  But click refresh anyway and see how fast it responds.  If you want to explore further, links to the &lt;a href='https://smooth.fly.dev/showcase/docs/' title=''&gt;documentation&lt;/a&gt; and &lt;a href='https://github.com/rubys/showcase#readme' title=''&gt;code&lt;/a&gt; can be found on the front page.&lt;/p&gt;
&lt;h2 id='implementation-details' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#implementation-details' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Implementation Details&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;The basic flow starts with a request comes into your app for a PDF.  That request is replayed to the PDF appliance.  A Chrome instance in that app then issues a second request to your app for the same URL minus the &lt;code&gt;.pdf&lt;/code&gt; extension and then converts the HTML which it receives in response to a PDF.  That PDF is then returned as the response to the original request.&lt;/p&gt;

&lt;p&gt;A single Google Chrome instance per machine will be reused across all requests, which itself is faster than starting a new instance per request.  As all HTTP headers will be passed back to your application, this will seamlessly work with your existing session, cookies, and basic authentication.&lt;/p&gt;

&lt;p&gt;Starting up a machine on demand is handled by the &lt;code&gt;auto_stop_machines&lt;/code&gt; setting in your &lt;code&gt;fly.toml&lt;/code&gt;.  With this in place, machines can confidently exit when idle, secure in the knowledge that they will be restarted when needed.  See the &lt;a href='https://github.com/fly-apps/pdf-appliance/#scaling' title=''&gt;README&lt;/a&gt; for more information on scaling.&lt;/p&gt;

&lt;p&gt;Note that different machines can use different languages and frameworks.  This code is written in JavaScript and runs on Bun.  It was designed to support a Ruby on Rails app, but can be used with any app.&lt;/p&gt;
&lt;h2 id='a-reusable-pattern' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#a-reusable-pattern' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;A Reusable Pattern&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;If your app is small and your usage is low, scaling may not be much of a
concern, but as your need grow your first instinct shouldn&amp;rsquo;t merely be to throw
more hardware at the problem, but rather to partition the problem so that each
machine has a somewhat predictable capacity.&lt;/p&gt;

&lt;p&gt;Do this by taking a look at your application, and look for requests that are
somehow different than the rest.  Streaming audio and video files, handling websockets,
converting text to speech or performing other AI processing, long running
&amp;ldquo;background&amp;rdquo; computation, fetching static pages, producing PDFs, and updating
databases all have different profiles in terms of server load.&lt;/p&gt;

&lt;p&gt;It might even be helpful &amp;ndash; purely as a thought experiment &amp;ndash; to think of
replacing your main server with a proxy that does nothing more than route
requests to separate machines based on the type of workload performed.&lt;/p&gt;

&lt;p&gt;Once you have come up with an allocation of functions performed to pools of
machines, Fly-Replay is but one tool available to you.  There is also a
&lt;a href='https://fly.io/docs/machines/working-with-machines/' title=''&gt;Machines API&lt;/a&gt; that will
enable you to orchestrate whatever topology you can come up with.
&lt;a href='https://fly.io/laravel-bytes/cost-effective-queue-workers-with-fly-io-machines/' title=''&gt;Cost-Effective Queue Workers With Fly.io
Machines&lt;/a&gt;
gives a preview of what that would look like with Laravel.&lt;/p&gt;
</content>
  </entry>
</feed>

Raw headers

{
  "cache-control": "max-age=0, private, must-revalidate",
  "cf-cache-status": "DYNAMIC",
  "cf-ray": "95ab979b9074f32d-ORD",
  "connection": "keep-alive",
  "content-type": "text/xml",
  "date": "Sun, 06 Jul 2025 02:15:04 GMT",
  "etag": "W/\"6867019b-e72c2\"",
  "fly-request-id": "01JZER7ZAR09DJYA7NY34KK3NG-ord",
  "last-modified": "Thu, 03 Jul 2025 22:18:03 GMT",
  "server": "cloudflare",
  "set-cookie": "fly_gtm={}; path=/; expires=Mon, 06 Jul 2026 02:15:04 GMT; max-age=31536000; secure; HttpOnly; SameSite=Lax",
  "transfer-encoding": "chunked",
  "vary": "accept-encoding",
  "via": "1.1 fly.io, 1.1 fly.io, 1.1 fly.io"
}

Parsed with @rowanmanning/feed-parser

{
  "meta": {
    "type": "atom",
    "version": "1.0"
  },
  "language": null,
  "title": "The Fly Blog",
  "description": "News, tips, and tricks from the team at Fly",
  "copyright": null,
  "url": "https://fly.io/blog/",
  "self": "https://fly.io/blog/",
  "published": null,
  "updated": "2025-06-20T00:00:00.000Z",
  "generator": null,
  "image": null,
  "authors": [
    {
      "name": "Fly",
      "email": null,
      "url": null
    }
  ],
  "categories": [],
  "items": [
    {
      "id": "https://fly.io/blog/phoenix-new-the-remote-ai-runtime/",
      "title": "Phoenix.new – The Remote AI Runtime for Phoenix",
      "description": null,
      "url": "https://fly.io/blog/phoenix-new-the-remote-ai-runtime/",
      "published": "2025-06-20T00:00:00.000Z",
      "updated": "2025-07-03T17:13:29.000Z",
      "content": "<div class=\"lead\"><p>I’m Chris McCord, the creator of Elixir’s Phoenix framework. For the past several months, I’ve been working on a skunkworks project at Fly.io, and it’s time to show it off.</p>\n</div>\n<p>I wanted LLM agents to work just as well with Elixir as they do with Python and JavaScript. Last December, in order to figure out what that was going to take, I started a little weekend project to find out how difficult it would be to build a coding agent in Elixir.</p>\n\n<p>A few weeks later, I had it spitting out working Phoenix applications and driving a full in-browser IDE. I knew this wasn’t going to stay a weekend project.</p>\n\n<p>If you follow me on Twitter, you’ve probably seen me teasing this work as it picked up steam. We’re at a point where we’re pretty serious about this thing, and so it’s time to make a formal introduction.</p>\n\n<p>World, meet <a href='https://phoenix.new' title=''>Phoenix.new</a>, a batteries-included fully-online coding agent tailored to Elixir and Phoenix. I think it’s going to be the fastest way to build collaborative, real-time applications.</p>\n\n<p>Let’s see it in action:</p>\n<div class=\"youtube-container\" data-exclude-render>\n  <div class=\"youtube-video\">\n    <iframe\n      width=\"100%\"\n      height=\"100%\"\n      src=\"https://www.youtube.com/embed/du7GmWGUM5Y\"\n      frameborder=\"0\"\n      allow=\"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture\"\n      allowfullscreen>\n    </iframe>\n  </div>\n</div>\n\n<h2 id='whats-interesting-about-phoenix-new' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#whats-interesting-about-phoenix-new' aria-label='Anchor'></a><span class='plain-code'>What’s Interesting About Phoenix.new</span></h2>\n<p>First, even though it runs entirely in your browser, Phoenix.new gives both you and your agent a root shell, in an ephemeral virtual machine (a <a href='https://fly.io/docs/machines/overview/' title=''>Fly Machine</a>) that gives our agent loop free rein to install things and run programs  — without any risk of messing up your local machine. You don’t think about any of this; you just open up the VSCode interface, push the shell button, and there you are, on the isolated machine you share with the Phoenix.new agent.</p>\n\n<p>Second, it’s an agent system I built specifically for Phoenix. Phoenix is about real-time collaborative applications, and Phoenix.new knows what that means. To that end, Phoenix.new includes, in both its UI and its agent tools, a full browser. The Phoenix.new agent uses that browser “headlessly” to check its own front-end changes and interact with the app. Because it’s a full browser, instead of trying to iterate on screenshots, the agent sees real page content and JavaScript state – with or without a human present.</p>\n<h2 id='what-root-access-gets-us' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#what-root-access-gets-us' aria-label='Anchor'></a><span class='plain-code'>What Root Access Gets Us</span></h2>\n<p>Agents build software the way you did when you first got started, the way you still do today when you prototype things. They don’t carefully design Docker container layers and they don’t really do release cycles. An agent wants to pop a shell and get its fingernails dirty.</p>\n\n<p>A fully isolated virtual machine means Phoenix.new’s fingernails can get <em>arbitrarily dirty.</em> If it wants to add a package to <code>mix.exs</code>, it can do that and then run <code>mix phx.server</code> or <code>mix test</code> and check the output. Sure. Every agent can do that. But if it wants to add an APT package to the base operating system, it can do that too, and make sure it worked. It owns the whole environment.</p>\n\n<p>This offloads a huge amount of tedious, repetitive work.</p>\n\n<p>At his <a href='https://youtu.be/LCEmiRjPEtQ?si=sR_bdu6-AqPXSNmY&t=1902' title=''>AI Startup School talk last week</a>, Andrej Karpathy related his experience of building a restaurant menu visualizer, which takes camera pictures of text menus and transforms all the menu items into pictures. The code, which he vibe-coded with an LLM agent, was the easy part; he had it working in an afternoon. But getting the app online took him a whole week.</p>\n\n<p>With Phoenix.new, I’m taking dead aim at this problem. The apps we produce live in the cloud from the minute they launch. They have private, shareable URLs (we detect anything the agent generates with a bound port and give it a preview URL underneath <code>phx.run</code>, with integrated port-forwarding), they integrate with Github, and they inherit all the infrastructure guardrails of Fly.io: hardware virtualization, WireGuard, and isolated networks.</p>\n<div class=\"callout\"><p>Github’s <code>gh</code> CLI is installed by default. So the agent knows how to clone any repo, or browse issues, and you can even authorize it for internal repositories to get it working with your team’s existing projects and dependencies.</p>\n</div>\n<p>Full control of the environment also closes the loop between the agent and deployment. When Phoenix.new boots an app, it watches the logs, and tests the application. When an action triggers an error, Phoenix.new notices and gets to work.</p>\n<h2 id='watch-it-build-in-real-time' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#watch-it-build-in-real-time' aria-label='Anchor'></a><span class='plain-code'>Watch It Build In Real Time</span></h2>\n<p><a href='https://phoenix.new' title=''>Phoenix.new</a> can interact with web applications the way users do: with a real browser.</p>\n\n<p>The Phoenix.new environment includes a headless Chrome browser that our agent knows how to drive. Prompt it to add a front-end feature to your application, and it won’t just sketch the code out and make sure it compiles and lints. It’ll pull the app up itself and poke at the UI, simultaneously looking at the page content, JavaScript state, and server-side logs.</p>\n\n<p>Phoenix is all about <a href='https://fly.io/blog/how-we-got-to-liveview/' title=''>“live” real-time</a> interactivity, and gives us seamless live reload. The user interface for Phoenix.new itself includes a live preview of the app being worked on, so you can kick back and watch it build front-end features incrementally. Any other <code>.phx.run</code> tabs you have open also update as it goes. It’s wild.</p>\n<video title=\"agent interacting with web\" autoplay=\"autoplay\" loop=\"loop\" muted=\"muted\" playsinline=\"playsinline\" disablePictureInPicture=\"true\" class=\"mb-8\" src=\"/blog/phoenix-new-the-remote-ai-runtime/assets/webjs.mp4\"></video>\n\n<h2 id='not-just-for-vibe-coding' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#not-just-for-vibe-coding' aria-label='Anchor'></a><span class='plain-code'>Not Just For Vibe Coding</span></h2>\n<p>Phoenix.new can already build real, full-stack applications with WebSockets, Phoenix’s Presence features, and real databases. I’m seeing it succeed at business and collaborative applications right now.</p>\n\n<p>But there’s no fixed bound on the tasks you can reasonably ask it to accomplish. If you can do it with a shell and a browser, I want Phoenix.new to do it too. And it can do these tasks with or without you present.</p>\n\n<p>For example: set a <code>$DATABASE_URL</code> and tell the agent about it. The agent knows enough to go explore it with <code>psql</code>, and it’ll propose apps based on the schemas it finds. It can model Ecto schemas off the database. And if MySQL is your thing, the agent will just <code>apt install</code> a MySQL client and go to town.</p>\n\n<p>Frontier model LLMs have vast world knowledge. They generalize extremely well. At ElixirConfEU, I did a <a href='https://www.youtube.com/watch?v=ojL_VHc4gLk&t=3923s' title=''>demo vibe-coding Tetris</a> on stage. Phoenix.new nailed it, first try, first prompt. It’s not like there’s gobs of Phoenix LiveView Tetris examples floating around the Internet! But lots of people have published Tetris code, and lots of people have written LiveView stuff, and 2025 LLMs can connect those dots.</p>\n\n<p>At this point you might be wondering – can I just ask it to build a Rails app? Or an Expo React Native app? Or Svelte? Or Go?</p>\n\n<p>Yes, you can.</p>\n\n<p>Our system prompt is tuned for Phoenix today, but all languages you care about are already installed. We’re still figuring out where to take this, but adding new languages and frameworks definitely ranks highly in my plans.</p>\n<h2 id='our-async-agent-future' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#our-async-agent-future' aria-label='Anchor'></a><span class='plain-code'>Our Async Agent Future</span></h2>\n<p><a href='https://fly.io/blog/youre-all-nuts/' title=''>We’re at a massive step-change in developer workflows</a>.</p>\n\n<p>Agents can do real work, today, with or without a human present. Buckle up: the future of development, at least in the common case, probably looks less like cracking open a shell and finding a file to edit, and more like popping into a CI environment with agents working away around the clock.</p>\n\n<p>Local development isn’t going away. But there’s going to be a shift in where the majority of our iterations take place. I’m already using Phoenix.new to triage <code>phoenix-core</code> Github issues and pick problems to solve. I close my laptop, grab a cup of coffee, and wait for a PR to arrive — Phoenix.new knows how PRs work, too. We’re already here, and this space is just getting started.</p>\n\n<p>This isn’t where I thought I’d end up when I started poking around. The Phoenix and LiveView journey was much the same. Something special was there and the projects took on a life of their own. I’m excited to share this work now, and see where it might take us. I can’t wait to see what folks build.</p>",
      "image": {
        "url": "https://fly.io/blog/phoenix-new-the-remote-ai-runtime/assets/phoenixnew-thumb.png",
        "title": null
      },
      "media": [],
      "authors": [
        {
          "name": "Fly",
          "email": null,
          "url": null
        }
      ],
      "categories": []
    },
    {
      "id": "https://fly.io/blog/mcps-everywhere/",
      "title": "What are MCP Servers?",
      "description": null,
      "url": "https://fly.io/blog/mcps-everywhere/",
      "published": "2025-06-12T00:00:00.000Z",
      "updated": "2025-06-12T10:05:13.000Z",
      "content": "<div class=\"lead\"><div><p>With Fly.io, <a href=\"https://fly.io/docs/speedrun/\" title=\"\">you can get your app running globally in a matter of minutes</a>, and with MCP servers you can integrate with Claude, VSCode, Cursor and <a href=\"https://modelcontextprotocol.io/clients\">many more AI clients</a>.  <a href=\"https://fly.io/docs/mcp/\" title=\"\">Try it out for yourself</a>!</p>\n</div></div>\n<p>The introduction to <a href='https://modelcontextprotocol.io/introduction' title=''>Model Context Protocol</a> starts out with:</p>\n\n<blockquote>\n<p>MCP is an open protocol that standardizes how applications provide context to LLMs. Think of MCP like a USB-C port for AI applications. Just as USB-C provides a standardized way to connect your devices to various peripherals and accessories, MCP provides a standardized way to connect AI models to different data sources and tools.</p>\n</blockquote>\n\n<p>That paragraph, to me, is both comforting (“USB for LLM”? Cool! Got it!), and simultaneously vacuous (Um, but what do I actually <em>do</em> with this?).</p>\n\n<p>I’ve been digging deeper and have come up with a few more analogies and observations that make sense to me. Perhaps one or more of these will help you.</p>\n<h2 id='mcps-are-alexa-skills' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#mcps-are-alexa-skills' aria-label='Anchor'></a><span class='plain-code'>MCPs are Alexa Skills</span></h2>\n<p>You buy an Echo Dot and a Hue light. You plug them both in and connect them to your Wi-Fi. There is one more step you need to do: you need to install and enable the Philips Hue skill to connect the two.</p>\n\n<p>Now you might be using Siri or Google Assistant. Or you may want to connect a Ring Doorbell camera or Google Nest Thermostat. But the principle is the same, though the analogy is slightly stronger with a skill (which is a noun) as opposed to the action of pairing your Hue Bridge with Apple HomeKit (a verb).</p>\n<h2 id='mcps-are-api-2-0' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#mcps-are-api-2-0' aria-label='Anchor'></a><span class='plain-code'>MCPs are API 2.0</span></h2>\n<p>HTTP 1.1 is simple. You send a request, you get a response. While there are cookies and sessions, it is pretty much stateless and therefore inefficient, as each request needs to establish a new connection. WebSockets and Server-Sent Events (SSE) mitigate this a bit.</p>\n\n<p><a href='https://en.wikipedia.org/wiki/HTTP/2' title=''>HTTP 2.0</a> introduces features like multiplexing and server push. When you visit a web page for the first time, your browser can now request all of the associated JavaScript, CSS, and images at once, and the server can respond in any order.</p>\n\n<p>APIs today are typically request/response. MCPs support multiplexing and server push.</p>\n<h2 id='mcps-are-apis-with-introspection-reflection' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#mcps-are-apis-with-introspection-reflection' aria-label='Anchor'></a><span class='plain-code'>MCPs are APIs with Introspection/Reflection</span></h2>\n<p>With <a href='https://learn.openapis.org/' title=''>OpenAPI</a>, requests are typically JSON, and responses are too. Many OpenAPI providers publish a separate <a href='https://learn.openapis.org/specification/structure.html' title=''>OpenAPI Description (OAD)</a>, which contains a schema describing what requests are supported by that API.</p>\n\n<p>With MCP, requests are also JSON and responses are also JSON, but in addition, there are standard requests built into the protocol to obtain useful information, including what tools are provided, what arguments each tool expects, and a prose description of how each tool is expected to be used.</p>\n\n<p>As an aside, don’t automatically assume that you will get good results from <a href='https://neon.com/blog/autogenerating-mcp-servers-openai-schemas' title=''>auto-generating MCP Servers from OpenAPI schemas</a>:</p>\n\n<blockquote>\n<p>Effective MCP design means thinking about the workflows you want the agent to perform and potentially creating higher-level tools that encapsulate multiple API calls, rather than just exposing the raw building blocks of your entire API spec.</p>\n</blockquote>\n\n<p><a href='https://glama.ai/blog/2025-06-06-mcp-vs-api#five-fundamental-differences' title=''>MCP vs API</a> goes into this topic at greater debth.</p>\n\n<p>In many cases you will get better results treating LLMs as humans. If you have a CLI, consider using that as the starting point instead.</p>\n<h2 id='mcps-are-not-serverless' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#mcps-are-not-serverless' aria-label='Anchor'></a><span class='plain-code'>MCPs are <strong><i>not</i></strong> serverless</span></h2>\n<p><a href='https://en.wikipedia.org/wiki/Serverless_computing' title=''>Serverless</a>, sometimes known as <a href='https://en.wikipedia.org/wiki/Function_as_a_service' title=''>FaaS</a>, is a popular cloud computing model, where AWS Lambda is widely recognized as the archetype.</p>\n\n<p>MCP servers are not serverless; they have a well-defined and long-lived <a href='https://modelcontextprotocol.io/specification/2025-03-26/basic/lifecycle' title=''>lifecycle</a>:</p>\n\n<p><svg aria-roledescription=\"sequence\" role=\"graphics-document document\" viewBox=\"-50 -10 482 651\" style=\"max-width: 482px;\" xmlns:xlink=\"http://www.w3.org/1999/xlink\" xmlns=\"http://www.w3.org/2000/svg\" width=\"100%\" id=\"rm\"><rect class=\"rect\" height=\"70\" width=\"302\" fill=\"rgb(200, 220, 250)\" y=\"325\" x=\"40\"></rect><g><rect class=\"actor actor-bottom\" ry=\"3\" rx=\"3\" name=\"Server\" height=\"65\" width=\"150\" stroke=\"#666\" fill=\"#eaeaea\" y=\"565\" x=\"232\"></rect><text class=\"actor actor-box\" alignment-baseline=\"central\" dominant-baseline=\"central\" style=\"text-anchor: middle; font-size: 16px; font-weight: 400; font-family: inherit;\" y=\"597.5\" x=\"307\"><tspan dy=\"0\" x=\"307\">Server</tspan></text></g><g><rect class=\"actor actor-bottom\" ry=\"3\" rx=\"3\" name=\"Client\" height=\"65\" width=\"150\" stroke=\"#666\" fill=\"#eaeaea\" y=\"565\" x=\"0\"></rect><text class=\"actor actor-box\" alignment-baseline=\"central\" dominant-baseline=\"central\" style=\"text-anchor: middle; font-size: 16px; font-weight: 400; font-family: inherit;\" y=\"597.5\" x=\"75\"><tspan dy=\"0\" x=\"75\">Client</tspan></text></g><g><line name=\"Server\" stroke=\"#999\" stroke-width=\"0.5px\" class=\"actor-line 200\" y2=\"565\" x2=\"307\" y1=\"65\" x1=\"307\" id=\"actor10\"></line><g id=\"root-10\"><rect class=\"actor actor-top\" ry=\"3\" rx=\"3\" name=\"Server\" height=\"65\" width=\"150\" stroke=\"#666\" fill=\"#eaeaea\" y=\"0\" x=\"232\"></rect><text class=\"actor actor-box\" alignment-baseline=\"central\" dominant-baseline=\"central\" style=\"text-anchor: middle; font-size: 16px; font-weight: 400; font-family: inherit;\" y=\"32.5\" x=\"307\"><tspan dy=\"0\" x=\"307\">Server</tspan></text></g></g><g><line name=\"Client\" stroke=\"#999\" stroke-width=\"0.5px\" class=\"actor-line 200\" y2=\"565\" x2=\"75\" y1=\"65\" x1=\"75\" id=\"actor9\"></line><g id=\"root-9\"><rect class=\"actor actor-top\" ry=\"3\" rx=\"3\" name=\"Client\" height=\"65\" width=\"150\" stroke=\"#666\" fill=\"#eaeaea\" y=\"0\" x=\"0\"></rect><text class=\"actor actor-box\" alignment-baseline=\"central\" dominant-baseline=\"central\" style=\"text-anchor: middle; font-size: 16px; font-weight: 400; font-family: inherit;\" y=\"32.5\" x=\"75\"><tspan dy=\"0\" x=\"75\">Client</tspan></text></g></g><style>#rm{font-family:inherit;font-size:16px;fill:#333;}#rm .error-icon{fill:#552222;}#rm .error-text{fill:#552222;stroke:#552222;}#rm .edge-thickness-normal{stroke-width:1px;}#rm .edge-thickness-thick{stroke-width:3.5px;}#rm .edge-pattern-solid{stroke-dasharray:0;}#rm .edge-thickness-invisible{stroke-width:0;fill:none;}#rm .edge-pattern-dashed{stroke-dasharray:3;}#rm .edge-pattern-dotted{stroke-dasharray:2;}#rm .marker{fill:#333333;stroke:#333333;}#rm .marker.cross{stroke:#333333;}#rm svg{font-family:inherit;font-size:16px;}#rm p{margin:0;}#rm .actor{stroke:hsl(259.6261682243, 59.7765363128%, 87.9019607843%);fill:#ECECFF;}#rm text.actor>tspan{fill:black;stroke:none;}#rm .actor-line{stroke:hsl(259.6261682243, 59.7765363128%, 87.9019607843%);}#rm .messageLine0{stroke-width:1.5;stroke-dasharray:none;stroke:#333;}#rm .messageLine1{stroke-width:1.5;stroke-dasharray:2,2;stroke:#333;}#rm #arrowhead path{fill:#333;stroke:#333;}#rm .sequenceNumber{fill:white;}#rm #sequencenumber{fill:#333;}#rm #crosshead path{fill:#333;stroke:#333;}#rm .messageText{fill:#333;stroke:none;}#rm .labelBox{stroke:hsl(259.6261682243, 59.7765363128%, 87.9019607843%);fill:#ECECFF;}#rm .labelText,#rm .labelText>tspan{fill:black;stroke:none;}#rm .loopText,#rm .loopText>tspan{fill:black;stroke:none;}#rm .loopLine{stroke-width:2px;stroke-dasharray:2,2;stroke:hsl(259.6261682243, 59.7765363128%, 87.9019607843%);fill:hsl(259.6261682243, 59.7765363128%, 87.9019607843%);}#rm .note{stroke:#aaaa33;fill:#fff5ad;}#rm .noteText,#rm .noteText>tspan{fill:black;stroke:none;}#rm .activation0{fill:#f4f4f4;stroke:#666;}#rm .activation1{fill:#f4f4f4;stroke:#666;}#rm .activation2{fill:#f4f4f4;stroke:#666;}#rm .actorPopupMenu{position:absolute;}#rm .actorPopupMenuPanel{position:absolute;fill:#ECECFF;box-shadow:0px 8px 16px 0px rgba(0,0,0,0.2);filter:drop-shadow(3px 5px 2px rgb(0 0 0 / 0.4));}#rm .actor-man line{stroke:hsl(259.6261682243, 59.7765363128%, 87.9019607843%);fill:#ECECFF;}#rm .actor-man circle,#rm line{stroke:hsl(259.6261682243, 59.7765363128%, 87.9019607843%);fill:#ECECFF;stroke-width:2px;}#rm :root{--mermaid-font-family:inherit;}</style><g></g><defs><symbol height=\"24\" width=\"24\" id=\"computer\"><path d=\"M2 2v13h20v-13h-20zm18 11h-16v-9h16v9zm-10.228 6l.466-1h3.524l.467 1h-4.457zm14.228 3h-24l2-6h2.104l-1.33 4h18.45l-1.297-4h2.073l2 6zm-5-10h-14v-7h14v7z\" transform=\"scale(.5)\"></path></symbol></defs><defs><symbol clip-rule=\"evenodd\" fill-rule=\"evenodd\" id=\"database\"><path d=\"M12.258.001l.256.004.255.005.253.008.251.01.249.012.247.015.246.016.242.019.241.02.239.023.236.024.233.027.231.028.229.031.225.032.223.034.22.036.217.038.214.04.211.041.208.043.205.045.201.046.198.048.194.05.191.051.187.053.183.054.18.056.175.057.172.059.168.06.163.061.16.063.155.064.15.066.074.033.073.033.071.034.07.034.069.035.068.035.067.035.066.035.064.036.064.036.062.036.06.036.06.037.058.037.058.037.055.038.055.038.053.038.052.038.051.039.05.039.048.039.047.039.045.04.044.04.043.04.041.04.04.041.039.041.037.041.036.041.034.041.033.042.032.042.03.042.029.042.027.042.026.043.024.043.023.043.021.043.02.043.018.044.017.043.015.044.013.044.012.044.011.045.009.044.007.045.006.045.004.045.002.045.001.045v17l-.001.045-.002.045-.004.045-.006.045-.007.045-.009.044-.011.045-.012.044-.013.044-.015.044-.017.043-.018.044-.02.043-.021.043-.023.043-.024.043-.026.043-.027.042-.029.042-.03.042-.032.042-.033.042-.034.041-.036.041-.037.041-.039.041-.04.041-.041.04-.043.04-.044.04-.045.04-.047.039-.048.039-.05.039-.051.039-.052.038-.053.038-.055.038-.055.038-.058.037-.058.037-.06.037-.06.036-.062.036-.064.036-.064.036-.066.035-.067.035-.068.035-.069.035-.07.034-.071.034-.073.033-.074.033-.15.066-.155.064-.16.063-.163.061-.168.06-.172.059-.175.057-.18.056-.183.054-.187.053-.191.051-.194.05-.198.048-.201.046-.205.045-.208.043-.211.041-.214.04-.217.038-.22.036-.223.034-.225.032-.229.031-.231.028-.233.027-.236.024-.239.023-.241.02-.242.019-.246.016-.247.015-.249.012-.251.01-.253.008-.255.005-.256.004-.258.001-.258-.001-.256-.004-.255-.005-.253-.008-.251-.01-.249-.012-.247-.015-.245-.016-.243-.019-.241-.02-.238-.023-.236-.024-.234-.027-.231-.028-.228-.031-.226-.032-.223-.034-.22-.036-.217-.038-.214-.04-.211-.041-.208-.043-.204-.045-.201-.046-.198-.048-.195-.05-.19-.051-.187-.053-.184-.054-.179-.056-.176-.057-.172-.059-.167-.06-.164-.061-.159-.063-.155-.064-.151-.066-.074-.033-.072-.033-.072-.034-.07-.034-.069-.035-.068-.035-.067-.035-.066-.035-.064-.036-.063-.036-.062-.036-.061-.036-.06-.037-.058-.037-.057-.037-.056-.038-.055-.038-.053-.038-.052-.038-.051-.039-.049-.039-.049-.039-.046-.039-.046-.04-.044-.04-.043-.04-.041-.04-.04-.041-.039-.041-.037-.041-.036-.041-.034-.041-.033-.042-.032-.042-.03-.042-.029-.042-.027-.042-.026-.043-.024-.043-.023-.043-.021-.043-.02-.043-.018-.044-.017-.043-.015-.044-.013-.044-.012-.044-.011-.045-.009-.044-.007-.045-.006-.045-.004-.045-.002-.045-.001-.045v-17l.001-.045.002-.045.004-.045.006-.045.007-.045.009-.044.011-.045.012-.044.013-.044.015-.044.017-.043.018-.044.02-.043.021-.043.023-.043.024-.043.026-.043.027-.042.029-.042.03-.042.032-.042.033-.042.034-.041.036-.041.037-.041.039-.041.04-.041.041-.04.043-.04.044-.04.046-.04.046-.039.049-.039.049-.039.051-.039.052-.038.053-.038.055-.038.056-.038.057-.037.058-.037.06-.037.061-.036.062-.036.063-.036.064-.036.066-.035.067-.035.068-.035.069-.035.07-.034.072-.034.072-.033.074-.033.151-.066.155-.064.159-.063.164-.061.167-.06.172-.059.176-.057.179-.056.184-.054.187-.053.19-.051.195-.05.198-.048.201-.046.204-.045.208-.043.211-.041.214-.04.217-.038.22-.036.223-.034.226-.032.228-.031.231-.028.234-.027.236-.024.238-.023.241-.02.243-.019.245-.016.247-.015.249-.012.251-.01.253-.008.255-.005.256-.004.258-.001.258.001zm-9.258 20.499v.01l.001.021.003.021.004.022.005.021.006.022.007.022.009.023.01.022.011.023.012.023.013.023.015.023.016.024.017.023.018.024.019.024.021.024.022.025.023.024.024.025.052.049.056.05.061.051.066.051.07.051.075.051.079.052.084.052.088.052.092.052.097.052.102.051.105.052.11.052.114.051.119.051.123.051.127.05.131.05.135.05.139.048.144.049.147.047.152.047.155.047.16.045.163.045.167.043.171.043.176.041.178.041.183.039.187.039.19.037.194.035.197.035.202.033.204.031.209.03.212.029.216.027.219.025.222.024.226.021.23.02.233.018.236.016.24.015.243.012.246.01.249.008.253.005.256.004.259.001.26-.001.257-.004.254-.005.25-.008.247-.011.244-.012.241-.014.237-.016.233-.018.231-.021.226-.021.224-.024.22-.026.216-.027.212-.028.21-.031.205-.031.202-.034.198-.034.194-.036.191-.037.187-.039.183-.04.179-.04.175-.042.172-.043.168-.044.163-.045.16-.046.155-.046.152-.047.148-.048.143-.049.139-.049.136-.05.131-.05.126-.05.123-.051.118-.052.114-.051.11-.052.106-.052.101-.052.096-.052.092-.052.088-.053.083-.051.079-.052.074-.052.07-.051.065-.051.06-.051.056-.05.051-.05.023-.024.023-.025.021-.024.02-.024.019-.024.018-.024.017-.024.015-.023.014-.024.013-.023.012-.023.01-.023.01-.022.008-.022.006-.022.006-.022.004-.022.004-.021.001-.021.001-.021v-4.127l-.077.055-.08.053-.083.054-.085.053-.087.052-.09.052-.093.051-.095.05-.097.05-.1.049-.102.049-.105.048-.106.047-.109.047-.111.046-.114.045-.115.045-.118.044-.12.043-.122.042-.124.042-.126.041-.128.04-.13.04-.132.038-.134.038-.135.037-.138.037-.139.035-.142.035-.143.034-.144.033-.147.032-.148.031-.15.03-.151.03-.153.029-.154.027-.156.027-.158.026-.159.025-.161.024-.162.023-.163.022-.165.021-.166.02-.167.019-.169.018-.169.017-.171.016-.173.015-.173.014-.175.013-.175.012-.177.011-.178.01-.179.008-.179.008-.181.006-.182.005-.182.004-.184.003-.184.002h-.37l-.184-.002-.184-.003-.182-.004-.182-.005-.181-.006-.179-.008-.179-.008-.178-.01-.176-.011-.176-.012-.175-.013-.173-.014-.172-.015-.171-.016-.17-.017-.169-.018-.167-.019-.166-.02-.165-.021-.163-.022-.162-.023-.161-.024-.159-.025-.157-.026-.156-.027-.155-.027-.153-.029-.151-.03-.15-.03-.148-.031-.146-.032-.145-.033-.143-.034-.141-.035-.14-.035-.137-.037-.136-.037-.134-.038-.132-.038-.13-.04-.128-.04-.126-.041-.124-.042-.122-.042-.12-.044-.117-.043-.116-.045-.113-.045-.112-.046-.109-.047-.106-.047-.105-.048-.102-.049-.1-.049-.097-.05-.095-.05-.093-.052-.09-.051-.087-.052-.085-.053-.083-.054-.08-.054-.077-.054v4.127zm0-5.654v.011l.001.021.003.021.004.021.005.022.006.022.007.022.009.022.01.022.011.023.012.023.013.023.015.024.016.023.017.024.018.024.019.024.021.024.022.024.023.025.024.024.052.05.056.05.061.05.066.051.07.051.075.052.079.051.084.052.088.052.092.052.097.052.102.052.105.052.11.051.114.051.119.052.123.05.127.051.131.05.135.049.139.049.144.048.147.048.152.047.155.046.16.045.163.045.167.044.171.042.176.042.178.04.183.04.187.038.19.037.194.036.197.034.202.033.204.032.209.03.212.028.216.027.219.025.222.024.226.022.23.02.233.018.236.016.24.014.243.012.246.01.249.008.253.006.256.003.259.001.26-.001.257-.003.254-.006.25-.008.247-.01.244-.012.241-.015.237-.016.233-.018.231-.02.226-.022.224-.024.22-.025.216-.027.212-.029.21-.03.205-.032.202-.033.198-.035.194-.036.191-.037.187-.039.183-.039.179-.041.175-.042.172-.043.168-.044.163-.045.16-.045.155-.047.152-.047.148-.048.143-.048.139-.05.136-.049.131-.05.126-.051.123-.051.118-.051.114-.052.11-.052.106-.052.101-.052.096-.052.092-.052.088-.052.083-.052.079-.052.074-.051.07-.052.065-.051.06-.05.056-.051.051-.049.023-.025.023-.024.021-.025.02-.024.019-.024.018-.024.017-.024.015-.023.014-.023.013-.024.012-.022.01-.023.01-.023.008-.022.006-.022.006-.022.004-.021.004-.022.001-.021.001-.021v-4.139l-.077.054-.08.054-.083.054-.085.052-.087.053-.09.051-.093.051-.095.051-.097.05-.1.049-.102.049-.105.048-.106.047-.109.047-.111.046-.114.045-.115.044-.118.044-.12.044-.122.042-.124.042-.126.041-.128.04-.13.039-.132.039-.134.038-.135.037-.138.036-.139.036-.142.035-.143.033-.144.033-.147.033-.148.031-.15.03-.151.03-.153.028-.154.028-.156.027-.158.026-.159.025-.161.024-.162.023-.163.022-.165.021-.166.02-.167.019-.169.018-.169.017-.171.016-.173.015-.173.014-.175.013-.175.012-.177.011-.178.009-.179.009-.179.007-.181.007-.182.005-.182.004-.184.003-.184.002h-.37l-.184-.002-.184-.003-.182-.004-.182-.005-.181-.007-.179-.007-.179-.009-.178-.009-.176-.011-.176-.012-.175-.013-.173-.014-.172-.015-.171-.016-.17-.017-.169-.018-.167-.019-.166-.02-.165-.021-.163-.022-.162-.023-.161-.024-.159-.025-.157-.026-.156-.027-.155-.028-.153-.028-.151-.03-.15-.03-.148-.031-.146-.033-.145-.033-.143-.033-.141-.035-.14-.036-.137-.036-.136-.037-.134-.038-.132-.039-.13-.039-.128-.04-.126-.041-.124-.042-.122-.043-.12-.043-.117-.044-.116-.044-.113-.046-.112-.046-.109-.046-.106-.047-.105-.048-.102-.049-.1-.049-.097-.05-.095-.051-.093-.051-.09-.051-.087-.053-.085-.052-.083-.054-.08-.054-.077-.054v4.139zm0-5.666v.011l.001.02.003.022.004.021.005.022.006.021.007.022.009.023.01.022.011.023.012.023.013.023.015.023.016.024.017.024.018.023.019.024.021.025.022.024.023.024.024.025.052.05.056.05.061.05.066.051.07.051.075.052.079.051.084.052.088.052.092.052.097.052.102.052.105.051.11.052.114.051.119.051.123.051.127.05.131.05.135.05.139.049.144.048.147.048.152.047.155.046.16.045.163.045.167.043.171.043.176.042.178.04.183.04.187.038.19.037.194.036.197.034.202.033.204.032.209.03.212.028.216.027.219.025.222.024.226.021.23.02.233.018.236.017.24.014.243.012.246.01.249.008.253.006.256.003.259.001.26-.001.257-.003.254-.006.25-.008.247-.01.244-.013.241-.014.237-.016.233-.018.231-.02.226-.022.224-.024.22-.025.216-.027.212-.029.21-.03.205-.032.202-.033.198-.035.194-.036.191-.037.187-.039.183-.039.179-.041.175-.042.172-.043.168-.044.163-.045.16-.045.155-.047.152-.047.148-.048.143-.049.139-.049.136-.049.131-.051.126-.05.123-.051.118-.052.114-.051.11-.052.106-.052.101-.052.096-.052.092-.052.088-.052.083-.052.079-.052.074-.052.07-.051.065-.051.06-.051.056-.05.051-.049.023-.025.023-.025.021-.024.02-.024.019-.024.018-.024.017-.024.015-.023.014-.024.013-.023.012-.023.01-.022.01-.023.008-.022.006-.022.006-.022.004-.022.004-.021.001-.021.001-.021v-4.153l-.077.054-.08.054-.083.053-.085.053-.087.053-.09.051-.093.051-.095.051-.097.05-.1.049-.102.048-.105.048-.106.048-.109.046-.111.046-.114.046-.115.044-.118.044-.12.043-.122.043-.124.042-.126.041-.128.04-.13.039-.132.039-.134.038-.135.037-.138.036-.139.036-.142.034-.143.034-.144.033-.147.032-.148.032-.15.03-.151.03-.153.028-.154.028-.156.027-.158.026-.159.024-.161.024-.162.023-.163.023-.165.021-.166.02-.167.019-.169.018-.169.017-.171.016-.173.015-.173.014-.175.013-.175.012-.177.01-.178.01-.179.009-.179.007-.181.006-.182.006-.182.004-.184.003-.184.001-.185.001-.185-.001-.184-.001-.184-.003-.182-.004-.182-.006-.181-.006-.179-.007-.179-.009-.178-.01-.176-.01-.176-.012-.175-.013-.173-.014-.172-.015-.171-.016-.17-.017-.169-.018-.167-.019-.166-.02-.165-.021-.163-.023-.162-.023-.161-.024-.159-.024-.157-.026-.156-.027-.155-.028-.153-.028-.151-.03-.15-.03-.148-.032-.146-.032-.145-.033-.143-.034-.141-.034-.14-.036-.137-.036-.136-.037-.134-.038-.132-.039-.13-.039-.128-.041-.126-.041-.124-.041-.122-.043-.12-.043-.117-.044-.116-.044-.113-.046-.112-.046-.109-.046-.106-.048-.105-.048-.102-.048-.1-.05-.097-.049-.095-.051-.093-.051-.09-.052-.087-.052-.085-.053-.083-.053-.08-.054-.077-.054v4.153zm8.74-8.179l-.257.004-.254.005-.25.008-.247.011-.244.012-.241.014-.237.016-.233.018-.231.021-.226.022-.224.023-.22.026-.216.027-.212.028-.21.031-.205.032-.202.033-.198.034-.194.036-.191.038-.187.038-.183.04-.179.041-.175.042-.172.043-.168.043-.163.045-.16.046-.155.046-.152.048-.148.048-.143.048-.139.049-.136.05-.131.05-.126.051-.123.051-.118.051-.114.052-.11.052-.106.052-.101.052-.096.052-.092.052-.088.052-.083.052-.079.052-.074.051-.07.052-.065.051-.06.05-.056.05-.051.05-.023.025-.023.024-.021.024-.02.025-.019.024-.018.024-.017.023-.015.024-.014.023-.013.023-.012.023-.01.023-.01.022-.008.022-.006.023-.006.021-.004.022-.004.021-.001.021-.001.021.001.021.001.021.004.021.004.022.006.021.006.023.008.022.01.022.01.023.012.023.013.023.014.023.015.024.017.023.018.024.019.024.02.025.021.024.023.024.023.025.051.05.056.05.06.05.065.051.07.052.074.051.079.052.083.052.088.052.092.052.096.052.101.052.106.052.11.052.114.052.118.051.123.051.126.051.131.05.136.05.139.049.143.048.148.048.152.048.155.046.16.046.163.045.168.043.172.043.175.042.179.041.183.04.187.038.191.038.194.036.198.034.202.033.205.032.21.031.212.028.216.027.22.026.224.023.226.022.231.021.233.018.237.016.241.014.244.012.247.011.25.008.254.005.257.004.26.001.26-.001.257-.004.254-.005.25-.008.247-.011.244-.012.241-.014.237-.016.233-.018.231-.021.226-.022.224-.023.22-.026.216-.027.212-.028.21-.031.205-.032.202-.033.198-.034.194-.036.191-.038.187-.038.183-.04.179-.041.175-.042.172-.043.168-.043.163-.045.16-.046.155-.046.152-.048.148-.048.143-.048.139-.049.136-.05.131-.05.126-.051.123-.051.118-.051.114-.052.11-.052.106-.052.101-.052.096-.052.092-.052.088-.052.083-.052.079-.052.074-.051.07-.052.065-.051.06-.05.056-.05.051-.05.023-.025.023-.024.021-.024.02-.025.019-.024.018-.024.017-.023.015-.024.014-.023.013-.023.012-.023.01-.023.01-.022.008-.022.006-.023.006-.021.004-.022.004-.021.001-.021.001-.021-.001-.021-.001-.021-.004-.021-.004-.022-.006-.021-.006-.023-.008-.022-.01-.022-.01-.023-.012-.023-.013-.023-.014-.023-.015-.024-.017-.023-.018-.024-.019-.024-.02-.025-.021-.024-.023-.024-.023-.025-.051-.05-.056-.05-.06-.05-.065-.051-.07-.052-.074-.051-.079-.052-.083-.052-.088-.052-.092-.052-.096-.052-.101-.052-.106-.052-.11-.052-.114-.052-.118-.051-.123-.051-.126-.051-.131-.05-.136-.05-.139-.049-.143-.048-.148-.048-.152-.048-.155-.046-.16-.046-.163-.045-.168-.043-.172-.043-.175-.042-.179-.041-.183-.04-.187-.038-.191-.038-.194-.036-.198-.034-.202-.033-.205-.032-.21-.031-.212-.028-.216-.027-.22-.026-.224-.023-.226-.022-.231-.021-.233-.018-.237-.016-.241-.014-.244-.012-.247-.011-.25-.008-.254-.005-.257-.004-.26-.001-.26.001z\" transform=\"scale(.5)\"></path></symbol></defs><defs><symbol height=\"24\" width=\"24\" id=\"clock\"><path d=\"M12 2c5.514 0 10 4.486 10 10s-4.486 10-10 10-10-4.486-10-10 4.486-10 10-10zm0-2c-6.627 0-12 5.373-12 12s5.373 12 12 12 12-5.373 12-12-5.373-12-12-12zm5.848 12.459c.202.038.202.333.001.372-1.907.361-6.045 1.111-6.547 1.111-.719 0-1.301-.582-1.301-1.301 0-.512.77-5.447 1.125-7.445.034-.192.312-.181.343.014l.985 6.238 5.394 1.011z\" transform=\"scale(.5)\"></path></symbol></defs><defs><marker orient=\"auto-start-reverse\" markerHeight=\"12\" markerWidth=\"12\" markerUnits=\"userSpaceOnUse\" refY=\"5\" refX=\"7.9\" id=\"arrowhead\"><path d=\"M -1 0 L 10 5 L 0 10 z\"></path></marker></defs><defs><marker refY=\"4.5\" refX=\"4\" orient=\"auto\" markerHeight=\"8\" markerWidth=\"15\" id=\"crosshead\"><path d=\"M 1,2 L 6,7 M 6,2 L 1,7\" stroke-width=\"1pt\" style=\"stroke-dasharray: 0px, 0px;\" stroke=\"#000000\" fill=\"none\"></path></marker></defs><defs><marker orient=\"auto\" markerHeight=\"28\" markerWidth=\"20\" refY=\"7\" refX=\"15.5\" id=\"filled-head\"><path d=\"M 18,7 L9,13 L14,7 L9,1 Z\"></path></marker></defs><defs><marker orient=\"auto\" markerHeight=\"40\" markerWidth=\"60\" refY=\"15\" refX=\"15\" id=\"sequencenumber\"><circle r=\"6\" cy=\"15\" cx=\"15\"></circle></marker></defs><g><rect class=\"note\" height=\"40\" width=\"282\" stroke=\"#666\" fill=\"#EDF2AE\" y=\"75\" x=\"50\"></rect><text dy=\"1em\" class=\"noteText\" style=\"font-family: inherit; font-size: 16px; font-weight: 400;\" alignment-baseline=\"middle\" dominant-baseline=\"middle\" text-anchor=\"middle\" y=\"80\" x=\"191\"><tspan x=\"191\">Initialization Phase</tspan></text></g><g><rect class=\"activation0\" height=\"380\" width=\"10\" stroke=\"#666\" fill=\"#EDF2AE\" y=\"115\" x=\"70\"></rect></g><g><rect class=\"activation0\" height=\"328\" width=\"10\" stroke=\"#666\" fill=\"#EDF2AE\" y=\"167\" x=\"302\"></rect></g><g><rect class=\"note\" height=\"40\" width=\"282\" stroke=\"#666\" fill=\"#EDF2AE\" y=\"275\" x=\"50\"></rect><text dy=\"1em\" class=\"noteText\" style=\"font-family: inherit; font-size: 16px; font-weight: 400;\" alignment-baseline=\"middle\" dominant-baseline=\"middle\" text-anchor=\"middle\" y=\"280\" x=\"191\"><tspan x=\"191\">Operation Phase</tspan></text></g><g><rect class=\"note\" height=\"40\" width=\"282\" stroke=\"#666\" fill=\"#EDF2AE\" y=\"345\" x=\"50\"></rect><text dy=\"1em\" class=\"noteText\" style=\"font-family: inherit; font-size: 16px; font-weight: 400;\" alignment-baseline=\"middle\" dominant-baseline=\"middle\" text-anchor=\"middle\" y=\"350\" x=\"191\"><tspan x=\"191\">Normal protocol operations</tspan></text></g><g><rect class=\"note\" height=\"40\" width=\"282\" stroke=\"#666\" fill=\"#EDF2AE\" y=\"405\" x=\"50\"></rect><text dy=\"1em\" class=\"noteText\" style=\"font-family: inherit; font-size: 16px; font-weight: 400;\" alignment-baseline=\"middle\" dominant-baseline=\"middle\" text-anchor=\"middle\" y=\"410\" x=\"191\"><tspan x=\"191\">Shutdown</tspan></text></g><g><rect class=\"note\" height=\"40\" width=\"282\" stroke=\"#666\" fill=\"#EDF2AE\" y=\"505\" x=\"50\"></rect><text dy=\"1em\" class=\"noteText\" style=\"font-family: inherit; font-size: 16px; font-weight: 400;\" alignment-baseline=\"middle\" dominant-baseline=\"middle\" text-anchor=\"middle\" y=\"510\" x=\"191\"><tspan x=\"191\">Connection closed</tspan></text></g><text dy=\"1em\" class=\"messageText\" style=\"font-family: inherit; font-size: 16px; font-weight: 400;\" alignment-baseline=\"middle\" dominant-baseline=\"middle\" text-anchor=\"middle\" y=\"130\" x=\"190\">initialize request</text><line marker-end=\"url(#arrowhead)\" style=\"fill: none;\" stroke=\"none\" stroke-width=\"2\" class=\"messageLine0\" y2=\"165\" x2=\"299\" y1=\"165\" x1=\"80\"></line><text dy=\"1em\" class=\"messageText\" style=\"font-family: inherit; font-size: 16px; font-weight: 400;\" alignment-baseline=\"middle\" dominant-baseline=\"middle\" text-anchor=\"middle\" y=\"180\" x=\"193\">initialize response</text><line marker-end=\"url(#arrowhead)\" stroke=\"none\" stroke-width=\"2\" class=\"messageLine1\" style=\"stroke-dasharray: 3px, 3px; fill: none;\" y2=\"215\" x2=\"83\" y1=\"215\" x1=\"302\"></line><text dy=\"1em\" class=\"messageText\" style=\"font-family: inherit; font-size: 16px; font-weight: 400;\" alignment-baseline=\"middle\" dominant-baseline=\"middle\" text-anchor=\"middle\" y=\"230\" x=\"190\">initialized notification</text><line marker-end=\"url(#filled-head)\" stroke=\"none\" stroke-width=\"2\" class=\"messageLine1\" style=\"stroke-dasharray: 3px, 3px; fill: none;\" y2=\"265\" x2=\"299\" y1=\"265\" x1=\"80\"></line><text dy=\"1em\" class=\"messageText\" style=\"font-family: inherit; font-size: 16px; font-weight: 400;\" alignment-baseline=\"middle\" dominant-baseline=\"middle\" text-anchor=\"middle\" y=\"460\" x=\"190\">Disconnect</text><line marker-end=\"url(#filled-head)\" stroke=\"none\" stroke-width=\"2\" class=\"messageLine1\" style=\"stroke-dasharray: 3px, 3px; fill: none;\" y2=\"495\" x2=\"299\" y1=\"495\" x1=\"80\"></line></svg></p>\n<figure class=\"post-cta\">\n  <figcaption>\n    <h1>You can play with this right now.</h1>\n    <p>MCPs are barely six months old, but we are keeping up with the latest</p>\n      <a class=\"btn btn-lg\" href=\"https://fly.io/docs/mcp\">\n        Try launching your MCP server on Fly.io today <span class='opacity-50'>→</span>\n      </a>\n  </figcaption>\n  <div class=\"image-container\">\n    <img src=\"/static/images/cta-kitty.webp\" srcset=\"/static/images/[email protected] 2x\" alt=\"\">\n  </div>\n</figure>\n\n<h2 id='mcps-are-not-inherently-secure-or-private' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#mcps-are-not-inherently-secure-or-private' aria-label='Anchor'></a><span class='plain-code'>MCPs are <strong><i>not</i></strong> Inherently Secure or Private</span></h2>\n<p>Here I am not talking about <a href='https://simonwillison.net/2025/Apr/9/mcp-prompt-injection/' title=''>prompt injection</a> or <a href='https://simonwillison.net/2025/May/26/github-mcp-exploited/' title=''>exploitable abilities</a>, though those are real problems too.</p>\n\n<p>I’m talking about something more fundamental and basic. Let’s take a look at the very same <a href='https://github.com/github/github-mcp-server' title=''>GitHub MCP</a> featured in the above two descriptions of an exploitable exposure. The current process for installing a stdio MCP into Claude involves placing secrets in plain text in a well-defined location. And the process for installing the <em>next</em> MCP server is to download a program from a third party and run that tool in a way that has access to this very file.</p>\n\n<p>Addressing MCP security requires a holistic approach, but one key strategy component is the ability to run an MCP server on a remote machine which can only be accessed by you, and only after you present a revocable bearer token. That remote machine may require access to secrets (like your GitHub token), but those secrets are not something either your LLM client or other MCP servers will be able to access directly.</p>\n<h2 id='mcps-should-be-considered-family' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#mcps-should-be-considered-family' aria-label='Anchor'></a><span class='plain-code'>MCPs should be considered family</span></h2>\n<p>Recapping: <a href='https://www.usa.philips.com/' title=''>Philips</a> has an <a href='https://developers.meethue.com/' title=''>API and SDK</a> for Hue that is used by perhaps thousands, and has an <a href='https://www.amazon.com/Philips-Hue/dp/B01LWAGUG7' title=''>Alexa Skill</a> that is used by untold millions. Of course, somebody already built a <a href='https://github.com/ThomasRohde/hue-mcp' title=''>Philips Hue MCP Server</a>.</p>\n\n<p>LLMs are brains. MCPs give LLMs eyes, hands, and legs. The end result is a robot. Most MCPs today are brainless—they merely are eyes, hands, and legs. The results are appliances. I buy and install a dishwasher. You do too. Both of our dishwashers perform the same basic function. As do any Hue lights we may buy.</p>\n\n<p>In The Jetsons, <a href='https://thejetsons.fandom.com/wiki/Rosey' title=''>Rosie</a> is a maid and housekeeper who is also a member of the family. She is good friends with Jane and takes the role of a surrogate aunt towards Elroy and Judy. Let’s start there and go further.</p>\n\n<p>A person with an LLM can build things. Imagine having a 3D printer that makes robots or agents that act on your behalf. You may want an agent to watch for trends in your business, keep track of what events are happening in your area, screen your text messages, or act as a DJ when you get together with friends.</p>\n\n<p>You may share the MCP servers you create with others to use as starting points, but they undoubtedly will adapt them and make them their own.</p>\n<h2 id='closing-thoughts' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#closing-thoughts' aria-label='Anchor'></a><span class='plain-code'>Closing Thoughts</span></h2>\n<p>Don’t get me wrong. I am not saying there won’t be MCPs that are mere appliances—there undoubtedly will be plenty of those. But not all MCPs will be appliances; many will be agents.</p>\n\n<p>Today, most people exploring LLMs are trying to add LLMs to things they already have—for example, IDEs. MCPs flip this. Instead of adding LLMs to something you already have, you add something you already have to an LLM.</p>\n\n<p><a href='https://desktopcommander.app/' title=''>Desktop Commander MCP</a> is an example I’m particularly fond of that does exactly this. It also happens to be a prime example of a tool you want to give root access to, then lock it in a secure box and run someplace other than your laptop. <a href='https://fly.io/docs/mcp/examples/#desktop-commander' title=''>Give it a try</a>.</p>\n\n<p>Microsoft is actively working on <a href='https://developer.microsoft.com/en-us/windows/agentic/' title=''>Agentic Windows</a>. Your smartphone is the obvious next frontier. Today, you have an app store and a web browser. MCP servers have the potential to make both obsolete—and this could happen sooner than you think.</p>",
      "image": {
        "url": "https://fly.io/blog/mcps-everywhere/assets/mcps-everywhere.jpg",
        "title": null
      },
      "media": [],
      "authors": [
        {
          "name": "Fly",
          "email": null,
          "url": null
        }
      ],
      "categories": []
    },
    {
      "id": "https://fly.io/blog/youre-all-nuts/",
      "title": "My AI Skeptic Friends Are All Nuts",
      "description": null,
      "url": "https://fly.io/blog/youre-all-nuts/",
      "published": "2025-06-02T00:00:00.000Z",
      "updated": "2025-06-05T21:38:19.000Z",
      "content": "<div class=\"lead\"><p>A heartfelt provocation about AI-assisted programming.</p>\n</div>\n<p>Tech execs are mandating LLM adoption. That’s bad strategy. But I get where they’re coming from.</p>\n\n<p>Some of the smartest people I know share a bone-deep belief that AI is a fad —  the next iteration of NFT mania. I’ve been reluctant to push back on them, because, well, they’re smarter than me. But their arguments are unserious, and worth confronting. Extraordinarily talented people are doing work that LLMs already do better, out of  spite.</p>\n\n<p>All progress on LLMs could halt today, and LLMs would remain the 2nd most important thing to happen over the course of my career.</p>\n<div class=\"callout\"><p><strong class=\"font-semibold text-navy-950\">Important caveat</strong>: I’m discussing only the implications of LLMs for software development. For art, music, and writing? I got nothing. I’m inclined to believe the skeptics in those fields. I just don’t believe them about mine.</p>\n</div>\n<p>Bona fides: I’ve been shipping software since the mid-1990s. I started out in boxed, shrink-wrap C code.  Survived an ill-advised <a href='https://www.amazon.com/Modern-Design-Generic-Programming-Patterns/dp/0201704315' title=''>Alexandrescu</a> C++ phase. Lots of Ruby and Python tooling.  Some kernel work. A whole lot of server-side C, Go, and Rust. However you define “serious developer”, I qualify. Even if only on one of your lower tiers.</p>\n<h3 id='level-setting' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#level-setting' aria-label='Anchor'></a><span class='plain-code'>level setting</span></h3><div class=\"right-sidenote\"><p>† (or, God forbid, 2 years ago with Copilot)</p>\n</div>\n<p>First, we need to get on the same page. If you were trying and failing to use an LLM for code 6 months ago †, you’re not doing what most serious LLM-assisted coders are doing.</p>\n\n<p>People coding with LLMs today use agents. Agents get to poke around your codebase on their own. They author files directly. They run tools. They compile code, run tests, and iterate on the results. They also:</p>\n\n<ul>\n<li>pull in arbitrary code from the tree, or from other trees online, into their context windows,\n</li><li>run standard Unix tools to navigate the tree and extract information,\n</li><li>interact with Git,\n</li><li>run existing tooling, like linters, formatters, and model checkers, and\n</li><li>make essentially arbitrary tool calls (that you set up) through MCP.\n</li></ul>\n<div class=\"callout\"><p>The code in an agent that actually “does stuff” with code is not, itself, AI. This should reassure you. It’s surprisingly simple systems code, wired to ground truth about programming in the same way a Makefile is. You could write an effective coding agent in a weekend. Its strengths would have more to do with how you think about and structure builds and linting and test harnesses than with how advanced o3 or Sonnet have become.</p>\n</div>\n<p>If you’re making requests on a ChatGPT page and then pasting the resulting (broken) code into your editor, you’re not doing what the AI boosters are doing. No wonder you’re talking past each other.</p>\n<h3 id='the-positive-case' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#the-positive-case' aria-label='Anchor'></a><span class='plain-code'>the positive case</span></h3>\n<p><img alt=\"four quadrants of tedium and importance\" src=\"/blog/youre-all-nuts/assets/code-quad.png?2/3&center\" /></p>\n\n<p>LLMs can write a large fraction of all the tedious code you’ll ever need to write. And most code on most projects is tedious. LLMs drastically reduce the number of things you’ll ever need to Google. They look things up themselves. Most importantly, they don’t get tired; they’re immune to inertia.</p>\n\n<p>Think of anything you wanted to build but didn’t. You tried to home in on some first steps. If you’d been in the limerent phase of a new programming language, you’d have started writing. But you weren’t, so you put it off, for a day, a year, or your whole career.</p>\n\n<p>I can feel my blood pressure rising thinking of all the bookkeeping and Googling and dependency drama of a new project. An LLM can be instructed to just figure all that shit out. Often, it will drop you precisely at that golden moment where shit almost works, and development means tweaking code and immediately seeing things work better. That dopamine hit is why I code.</p>\n\n<p>There’s a downside. Sometimes, gnarly stuff needs doing. But you don’t wanna do it. So you refactor unit tests, soothing yourself with the lie that you’re doing real work. But an LLM can be told to go refactor all your unit tests. An agent can occupy itself for hours putzing with your tests in a VM and come back later with a PR. If you listen to me, you’ll know that. You’ll feel worse yak-shaving. You’ll end up doing… real work.</p>\n<h3 id='but-you-have-no-idea-what-the-code-is' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#but-you-have-no-idea-what-the-code-is' aria-label='Anchor'></a><span class='plain-code'>but you have no idea what the code is</span></h3>\n<p>Are you a vibe coding Youtuber? Can you not read code? If so: astute point. Otherwise: what the fuck is wrong with you?</p>\n\n<p>You’ve always been responsible for what you merge to <code>main</code>. You were five years go. And you are tomorrow, whether or not you use an LLM.</p>\n\n<p>If you build something with an LLM that people will depend on, read the code. In fact, you’ll probably do more than that. You’ll spend 5-10 minutes knocking it back into your own style. LLMs are <a href='https://github.com/PatrickJS/awesome-cursorrules' title=''>showing signs of adapting</a> to local idiom, but we’re not there yet.</p>\n\n<p>People complain about LLM-generated code being “probabilistic”. No it isn’t. It’s code. It’s not Yacc output. It’s knowable. The LLM might be stochastic. But the LLM doesn’t matter. What matters is whether you can make sense of the result, and whether your guardrails hold.</p>\n\n<p>Reading other people’s code is part of the job. If you can’t metabolize the boring, repetitive code an LLM generates: skills issue! How are you handling the chaos human developers turn out on a deadline?</p>\n<div class=\"right-sidenote\"><p>† (because it can hold 50-70kloc in its context window)</p>\n</div>\n<p>For the last month or so, Gemini 2.5 has been my go-to †. Almost nothing it spits out for me merges without edits. I’m sure there’s a skill to getting a SOTA model to one-shot a feature-plus-merge! But I don’t care. I like moving the code around and chuckling to myself while I delete all the stupid comments. I have to read the code line-by-line anyways.</p>\n<h3 id='but-hallucination' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#but-hallucination' aria-label='Anchor'></a><span class='plain-code'>but hallucination</span></h3>\n<p>If hallucination matters to you, your programming language has let you down.</p>\n\n<p>Agents lint. They compile and run tests. If their LLM invents a new function signature, the agent sees the error. They feed it back to the LLM, which says “oh, right, I totally made that up” and then tries again.</p>\n\n<p>You’ll only notice this happening if you watch the chain of thought log your agent generates. Don’t. This is why I like <a href='https://zed.dev/agentic' title=''>Zed’s agent mode</a>: it begs you to tab away and let it work, and pings you with a desktop notification when it’s done.</p>\n\n<p>I’m sure there are still environments where hallucination matters. But “hallucination” is the first thing developers bring up when someone suggests using LLMs, despite it being (more or less) a solved problem.</p>\n<h3 id='but-the-code-is-shitty-like-that-of-a-junior-developer' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#but-the-code-is-shitty-like-that-of-a-junior-developer' aria-label='Anchor'></a><span class='plain-code'>but the code is shitty, like that of a junior developer</span></h3>\n<p>Does an intern cost $20/month? Because that’s what Cursor.ai costs.</p>\n\n<p>Part of being a senior developer is making less-able coders productive, be they fleshly or algebraic. Using agents well is both a both a  skill and an engineering project all its own, of prompts, indices, <a href='https://fly.io/blog/semgrep-but-for-real-now/' title=''>and (especially) tooling.</a> LLMs only produce shitty code if you let them.</p>\n<div class=\"right-sidenote\"><p>† (Also: 100% of all the Bash code you should author ever again)</p>\n</div>\n<p>Maybe the current confusion is about who’s doing what work. Today, LLMs do a lot of typing, Googling, test cases †, and edit-compile-test-debug cycles. But even the most Claude-poisoned serious developers in the world still own curation, judgement, guidance, and direction.</p>\n\n<p>Also: let’s stop kidding ourselves about how good our human first cuts really are.</p>\n<h3 id='but-its-bad-at-rust' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#but-its-bad-at-rust' aria-label='Anchor'></a><span class='plain-code'>but it’s bad at rust</span></h3>\n<p>It’s hard to get a good toolchain for Brainfuck, too. Life’s tough in the aluminum siding business.</p>\n<div class=\"right-sidenote\"><p>† (and they surely will; the Rust community takes tooling seriously)</p>\n</div>\n<p>A lot of LLM skepticism probably isn’t really about LLMs. It’s projection. People say “LLMs can’t code” when what they really mean is “LLMs can’t write Rust”. Fair enough! But people select languages in part based on how well LLMs work with them, so Rust people should get on that †.</p>\n\n<p>I work mostly in Go. I’m confident the designers of the Go programming language didn’t set out to produce the most LLM-legible language in the industry. They succeeded nonetheless. Go has just enough type safety, an extensive standard library, and a culture that prizes (often repetitive) idiom. LLMs kick ass generating it.</p>\n\n<p>All this is to say: I write some Rust. I like it fine. If LLMs and Rust aren’t working for you, I feel you. But if that’s your whole thing, we’re not having the same argument.</p>\n<h3 id='but-the-craft' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#but-the-craft' aria-label='Anchor'></a><span class='plain-code'>but the craft</span></h3>\n<p>Do you like fine Japanese woodworking? All hand tools and sashimono joinery? Me too. Do it on your own time.</p>\n<div class=\"right-sidenote\"><p>† (I’m a piker compared to my woodworking friends)</p>\n</div>\n<p>I have a basic wood shop in my basement †. I could get a lot of satisfaction from building a table. And, if that table is a workbench or a grill table, sure, I’ll build it. But if I need, like, a table? For people to sit at? In my office? I buy a fucking table.</p>\n\n<p>Professional software developers are in the business of solving practical problems for people with code. We are not, in our day jobs, artisans. Steve Jobs was wrong: we do not need to carve the unseen feet in the sculpture. Nobody cares if the logic board traces are pleasingly routed. If anything we build endures, it won’t be because the codebase was beautiful.</p>\n\n<p>Besides, that’s not really what happens. If you’re taking time carefully golfing functions down into graceful, fluent, minimal functional expressions, alarm bells should ring. You’re yak-shaving. The real work has depleted your focus. You’re not building: you’re self-soothing.</p>\n\n<p>Which, wait for it, is something LLMs are good for. They devour schlep, and clear a path to the important stuff, where your judgement and values really matter.</p>\n<h3 id='but-the-mediocrity' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#but-the-mediocrity' aria-label='Anchor'></a><span class='plain-code'>but the mediocrity</span></h3>\n<p>As a mid-late career coder, I’ve come to appreciate mediocrity. You should be so lucky as to  have it flowing almost effortlessly from a tap.</p>\n\n<p>We all write mediocre code. Mediocre code: often fine. Not all code is equally important. Some code should be mediocre. Maximum effort on a random unit test? You’re doing something wrong. Your team lead should correct you.</p>\n\n<p>Developers all love to preen about code. They worry LLMs lower the “ceiling” for quality. Maybe. But they also raise the “floor”.</p>\n\n<p>Gemini’s floor is higher than my own.  My code looks nice. But it’s not as thorough. LLM code is repetitive. But mine includes dumb contortions where I got too clever trying to DRY things up.</p>\n\n<p>And LLMs aren’t mediocre on every axis. They almost certainly have a bigger bag of algorithmic tricks than you do: radix tries, topological sorts, graph reductions, and LDPC codes. Humans romanticize <code>rsync</code> (<a href='https://www.andrew.cmu.edu/course/15-749/READINGS/required/cas/tridgell96.pdf' title=''>Andrew Tridgell</a> wrote a paper about it!). To an LLM it might not be that much more interesting than a SQL join.</p>\n\n<p>But I’m getting ahead of myself. It doesn’t matter. If truly mediocre code is all we ever get from LLMs, that’s still huge. It’s that much less mediocre code humans have to write.</p>\n<h3 id='but-itll-never-be-agi' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#but-itll-never-be-agi' aria-label='Anchor'></a><span class='plain-code'>but it’ll never be AGI</span></h3>\n<p>I don’t give a shit.</p>\n\n<p>Smart practitioners get wound up by the AI/VC hype cycle. I can’t blame them. But it’s not an argument. Things either work or they don’t, no matter what Jensen Huang has to say about it.</p>\n<h3 id='but-they-take-rr-jerbs' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#but-they-take-rr-jerbs' aria-label='Anchor'></a><span class='plain-code'>but they take-rr jerbs</span></h3>\n<p><a href='https://news.ycombinator.com/item?id=43776612' title=''>So does open source.</a> We used to pay good money for databases.</p>\n\n<p>We’re a field premised on automating other people’s jobs away. “Productivity gains,” say the economists. You get what that means, right? Fewer people doing the same stuff. Talked to a travel agent lately? Or a floor broker? Or a record store clerk? Or a darkroom tech?</p>\n\n<p>When this argument comes up, libertarian-leaning VCs start the chant: lamplighters, creative destruction, new kinds of work. Maybe. But I’m not hypnotized. I have no fucking clue whether we’re going to be better off after LLMs. Things could get a lot worse for us.</p>\n\n<p>LLMs really might displace many software developers. That’s not a high horse we get to ride. Our jobs are just as much in tech’s line of fire as everybody else’s have been for the last 3 decades. We’re not <a href='https://en.wikipedia.org/wiki/2024_United_States_port_strike' title=''>East Coast dockworkers</a>; we won’t stop progress on our own.</p>\n<h3 id='but-the-plagiarism' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#but-the-plagiarism' aria-label='Anchor'></a><span class='plain-code'>but the plagiarism</span></h3>\n<p>Artificial intelligence is profoundly — and probably unfairly — threatening to visual artists in ways that might be hard to appreciate if you don’t work in the arts.</p>\n\n<p>We imagine artists spending their working hours pushing the limits of expression. But the median artist isn’t producing gallery pieces. They produce on brief: turning out competent illustrations and compositions for magazine covers, museum displays, motion graphics, and game assets.</p>\n\n<p>LLMs easily — alarmingly — clear industry quality bars. Gallingly, one of the things they’re best at is churning out just-good-enough facsimiles of human creative work.  I have family in visual arts. I can’t talk to them about LLMs. I don’t blame them. They’re probably not wrong.</p>\n\n<p>Meanwhile, software developers spot code fragments <a href=\"https://arxiv.org/abs/2311.17035\">seemingly lifted</a> from public repositories on Github and lose their shit. What about the licensing? If you’re a lawyer, I defer. But if you’re a software developer playing this card? Cut me a little slack as I ask you to shove this concern up your ass. No profession has demonstrated more contempt for intellectual property.</p>\n\n<p>The median dev thinks Star Wars and Daft Punk are a public commons. The great cultural project of developers has been opposing any protection that might inconvenience a monetizable media-sharing site. When they fail at policy, they route around it with coercion. They stand up global-scale piracy networks and sneer at anybody who so much as tries to preserve a new-release window for a TV show.</p>\n\n<p>Call any of this out if you want to watch a TED talk about how hard it is to stream <em>The Expanse</em> on LibreWolf. Yeah, we get it. You don’t believe in IPR. Then shut the fuck up about IPR. Reap the whirlwind.</p>\n\n<p>It’s all special pleading anyways. LLMs digest code further than you do. If you don’t believe a typeface designer can stake a moral claim on the terminals and counters of a letterform, you sure as hell can’t be possessive about a red-black tree.</p>\n<h3 id='positive-case-redux' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#positive-case-redux' aria-label='Anchor'></a><span class='plain-code'>positive case redux</span></h3>\n<p>When I started writing a couple days ago, I wrote a section to “level set” to the  state of the art of LLM-assisted programming. A bluefish filet has a longer shelf life than an LLM take. In the time it took you to read this, everything changed.</p>\n\n<p>Kids today don’t just use agents; they use asynchronous agents. They wake up, free-associate 13 different things for their LLMs to work on, make coffee, fill out a TPS report, drive to the Mars Cheese Castle, and then check their notifications. They’ve got 13 PRs to review. Three get tossed and re-prompted. Five of them get the same feedback a junior dev gets. And five get merged.</p>\n\n<p><em>“I’m sipping rocket fuel right now,”</em> a friend tells me. <em>“The folks on my team who aren’t embracing AI? It’s like they’re standing still.”</em> He’s not bullshitting me. He doesn’t work in SFBA. He’s got no reason to lie.</p>\n\n<p>There’s plenty of things I can’t trust an LLM with. No LLM has any of access to prod here. But I’ve been first responder on an incident and fed 4o — not o4-mini, 4o — log transcripts, and watched it in seconds spot LVM metadata corruption issues on a host we’ve been complaining about for months. Am I better than an LLM agent at interrogating OpenSearch logs and Honeycomb traces? No. No, I am not.</p>\n\n<p>To the consternation of many of my friends, I’m not a radical or a futurist. I’m a statist. I believe in the haphazard perseverance of complex systems, of institutions, of reversions to the mean. I write Go and Python code. I’m not a Kool-aid drinker.</p>\n\n<p>But something real is happening. My smartest friends are blowing it off. Maybe I persuade you. Probably I don’t. But we need to be done making space for bad arguments.</p>\n<h3 id='but-im-tired-of-hearing-about-it' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#but-im-tired-of-hearing-about-it' aria-label='Anchor'></a><span class='plain-code'>but i’m tired of hearing about it</span></h3>\n<p>And here I rejoin your company. I read <a href='https://simonwillison.net/' title=''>Simon Willison</a>, and that’s all I really need. But all day, every day, a sizable chunk of the front page of HN is allocated to LLMs: incremental model updates, startups doing things with LLMs, LLM tutorials, screeds against LLMs. It’s annoying!</p>\n\n<p>But AI is also incredibly — a word I use advisedly — important. It’s getting the same kind of attention that smart phones got in 2008, and not as much as the Internet got. That seems about right.</p>\n\n<p>I think this is going to get clearer over the next year. The cool kid haughtiness about “stochastic parrots” and “vibe coding” can’t survive much more contact with reality. I’m snarking about these people, but I meant what I said: they’re smarter than me. And when they get over this affectation, they’re going to make coding agents profoundly more effective than they are today.</p>",
      "image": {
        "url": "https://fly.io/blog/youre-all-nuts/assets/whoah.png",
        "title": null
      },
      "media": [],
      "authors": [
        {
          "name": "Fly",
          "email": null,
          "url": null
        }
      ],
      "categories": []
    },
    {
      "id": "https://fly.io/blog/kamal-in-production/",
      "title": "Using Kamal 2.0 in Production",
      "description": null,
      "url": "https://fly.io/blog/kamal-in-production/",
      "published": "2025-05-29T00:00:00.000Z",
      "updated": "2025-06-05T20:40:14.000Z",
      "content": "<p><a href='https://pragprog.com/titles/rails8/agile-web-development-with-rails-8/' title=''>Agile Web Development with Rails 8</a> is off to production, where they do things like editing, indexing, pagination, and printing. In researching the chapter on Deployment and Production, I became very dissatisfied with the content available on Kamal. I ended up writing my own, and it went well beyond the scope of the book. I then extracted what I needed from the result and put it in the book.</p>\n\n<p>Now that I have some spare time, I took a look at the greater work. It was more than a chapter and less than a book, so I decided to publish it <a href='https://rubys.github.io/kamal-in-production/' title=''>online</a>.</p>\n\n<p>This took me only a matter of hours. I had my notes in the XML grammar that Pragmatic Programming uses for books. I asked GitHub Copilot to convert them to Markdown. It did the job without my having to explain the grammar. It made intelligent guesses as to how to handle footnotes and got a number of these wrong, but that was easy to fix. On a lark, I asked it to proofread the content, and it did that too.</p>\n\n<hr>\n\n<p>Don’t get me wrong, Kamal is great. There are plenty of videos on how to get toy projects online, and the documentation will tell you what each field in the configuration file does. But none pull together everything you need to deploy a real project. For example, <a href='https://rubys.github.io/kamal-in-production/Assemble/' title=''>there are seven things you need to get started</a>. Some are optional, some you may already have, and all can be gathered quickly <strong class='font-semibold text-navy-950'>if you have a list</strong>.</p>\n\n<p>Kamal is just one piece of the puzzle. To deploy your software using Kamal, you need to be aware of the vast Docker ecosystem. You will want to set up a builder, sign up for a container repository, and lock down your secrets. And as you grow, you will want a load balancer and a managed database.</p>\n\n<p>And production is much more than copying files and starting a process. It is ensuring that your database is backed up, that your logs are searchable, and that your application is being monitored. It is also about ensuring that your application is secure.</p>\n\n<p>My list is opinionated. Each choice has a lot of options. For SSH keys, there are a lot of key types. If you don’t have an opinion, go with what GitHub recommends. For hosting, there are a lot of providers, and most videos start with Hetzner, which is a solid choice. But did you know that there are separate paths for Cloud and Robot, and when you would want either?</p>\n\n<p>A list with default options and alternatives highlighted was what would have helped me get started. Now it is available to you. The <a href='https://github.com/rubys/kamal-in-production/' title=''>source is on GitHub</a>. <a href='https://creativecommons.org/public-domain/cc0/' title=''>CC0 licensed</a>. Feel free to add side pages or links to document Digital Ocean, add other monitoring tools, or simply correct a typo that GitHub Copilot and I missed (or made).</p>\n\n<hr>\n\n<p>And if you happen to be in the south eastern part of the US in August, come see me talk on this topic at the <a href='https://blog.carolina.codes/p/announcing-our-2025-speakers' title=''>Carolina Code Conference</a>. If you can’t make it, the presentation will be recorded and posted online.</p>",
      "image": {
        "url": "https://fly.io/blog/kamal-in-production/assets/production.jpg",
        "title": null
      },
      "media": [],
      "authors": [
        {
          "name": "Fly",
          "email": null,
          "url": null
        }
      ],
      "categories": []
    },
    {
      "id": "https://fly.io/blog/parking-lot-ffffffffffffffff/",
      "title": "parking_lot: ffffffffffffffff...",
      "description": null,
      "url": "https://fly.io/blog/parking-lot-ffffffffffffffff/",
      "published": "2025-05-28T00:00:00.000Z",
      "updated": "2025-05-28T20:20:37.000Z",
      "content": "<div class=\"lead\"><p>We’re Fly.io, a public cloud that runs apps in a bunch of locations all over the world. This is a post about a gnarly bug in our Anycast router, the largest Rust project in our codebase. You won’t need much of any Rust to follow it.</p>\n</div>\n<p>The basic idea is simple: customers give us Docker containers, and tell us which one of 30+ regions around the world they want them to run in. We convert the containers into lightweight virtual machines, and then link them to an Anycast network. If you boot up an app here in Sydney and Frankfurt, and a request for it lands in Narita, it’ll get routed to Sydney. The component doing that work is called <code>fly-proxy</code>. It’s a Rust program, and it has been ill behaved of late.</p>\n<h3 id='dramatis-personae' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#dramatis-personae' aria-label='Anchor'></a><span class='plain-code'>Dramatis Personae</span></h3>\n<p><code>fly-proxy</code>, our intrepid Anycast router.</p>\n\n<p><code>corrosion</code>, our intrepid Anycast routing protocol.</p>\n\n<p><code>Rust</code>, a programming language you probably don’t use.</p>\n\n<p><code>read-write locks</code>, a synchronization primitive that allows for many readers <em>or</em> one single writer.</p>\n\n<p><code>parking_lot</code>, a well-regarded optimized implementation of locks in Rust.</p>\n<div class=\"callout\"><p>Gaze not into the abyss, lest you become recognized as an <strong class=\"font-semibold text-navy-950\"><em>abyss domain expert</em></strong>, and they expect you keep gazing into the damn thing</p>\n\n<p><em>Mathewson <a href=\"https://x.com/nickm_tor/status/860234274842324993?lang=en\" title=\"\">6:31</a></em></p>\n</div><h3 id='anycast-routing' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#anycast-routing' aria-label='Anchor'></a><span class='plain-code'>Anycast Routing</span></h3>\n<p>You already know how to build a proxy. Connection comes in, connection goes out, copy back and forth. So for the amount of time we spend writing about <code>fly-proxy</code>, you might wonder what the big deal is.</p>\n\n<p>To be fair, in the nuts and bolts of actually proxying requests, <code>fly-proxy</code> does some interesting stuff. For one thing, it’s <a href='https://github.com/jedisct1/yes-rs' title=''>written in Rust</a>, which is apparently a big deal all on its own. It’s a large, asynchronous codebase that handles multiple protocols, TLS termination, and certificate issuance. It exercises a lot of <a href='https://tokio.rs/' title=''>Tokio</a> features.</p>\n\n<p>But none of this is the hard part of <code>fly-proxy</code>.</p>\n\n<p>We operate thousands of servers around the world. A customer Fly Machine might get scheduled onto any of them; in some places, the expectation we set with customers is that their Machines will potentially start in less than a second. More importantly, a Fly Machine can terminate instantly. In both cases, but especially the latter, <code>fly-proxy</code> potentially needs to know, so that it does (or doesn’t) route traffic there.</p>\n\n<p>This is the hard problem: managing millions of connections for millions of apps. It’s a lot of state to manage, and it’s in constant flux. We refer to this as the “state distribution problem”, but really, it quacks like a routing protocol.</p>\n<h3 id='our-routing-protocol-is-corrosion' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#our-routing-protocol-is-corrosion' aria-label='Anchor'></a><span class='plain-code'>Our Routing Protocol is Corrosion</span></h3><div class=\"right-sidenote\"><p>Corrosion2, to be precise.</p>\n</div>\n<p>We’ve been through multiple iterations of the state management problem, and the stable place we’ve settled is a <a href='https://github.com/superfly/corrosion' title=''>system called Corrosion</a>.</p>\n\n<p>Corrosion is a large SQLite database mirrored globally across several thousand servers. There are three important factors that make Corrosion work:</p>\n\n<ol>\n<li>The SQLite database Corrosion replicates is CRDT-structured.\n</li><li>In our orchestration model, individual worker servers are the source of truth for any Fly Machines running on them; there’s no globally coordinated orchestration state.\n</li><li>We use <a href='http://fly.io/blog/building-clusters-with-serf/#what-serf-is-doing' title=''>SWIM gossip</a> to publish updates from those workers across the fleet.\n</li></ol>\n\n<p>This works. A Fly Machine terminates in Dallas; a <code>fly-proxy</code> instance in Singapore knows within a small number of seconds.</p>\n<h3 id='routing-protocol-implementations-are-hard' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#routing-protocol-implementations-are-hard' aria-label='Anchor'></a><span class='plain-code'>Routing Protocol Implementations Are Hard</span></h3>\n<p>A routing protocol is a canonical example of a distributed system. We’ve certainly hit distsys bugs in Corrosion. But this story is about basic implementation issues. </p>\n\n<p>A globally replicated SQLite database is an awfully nice primitive, but we’re not actually doing SQL queries every time a request lands.</p>\n\n<p>In somewhat the same sense as a router works both with a <a href='https://www.dasblinkenlichten.com/rib-and-fib-understanding-the-terminology/' title=''>RIB and a FIB</a>, there is in <code>fly-proxy</code> a system of record for routing information (Corrosion), and then an in-memory aggregation of that information used to make fast decisions. In <code>fly-proxy</code>, that’s called the Catalog. It’s a record of everything in Corrosion a proxy might need to know about to forward requests.</p>\n\n<p>Here’s a fun bug from last year:</p>\n\n<p>At any given point in time, there’s a lot going on inside <code>fly-proxy</code>. It is a highly concurrent system. In particular, there are many readers to the Catalog, and also, when Corrosion updates, writers. We manage access to the Catalog with a system of <a href='https://doc.rust-lang.org/std/sync/struct.RwLock.html' title=''>read-write locks</a>.</p>\n\n<p>Rust is big on pattern-matching; instead of control flow based (at bottom) on simple arithmetic expressions, Rust allows you to <code>match</code> exhaustively (with compiler enforcement) on the types of an expression, and the type system expresses things like <code>Ok</code> or <code>Err</code>.</p>\n\n<p>But <code>match</code> can be cumbersome, and so there are shorthands. One of them is <code>if let</code>, which is syntax that makes a pattern match read like a classic <code>if</code> statement. Here’s an example:</p>\n<div class=\"highlight-wrapper group relative rust\">\n  <button \n    type=\"button\"\n    class=\"bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n    data-wrap-target=\"#code-fl5sng0r\"\n  >\n    <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\" stroke-linecap=\"round\" stroke-linejoin=\"round\"><g buffered-rendering=\"static\"><path d=\"M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959\" /><path d=\"M11.081 6.466L9.533 8.037l1.548 1.571\" /></g></svg>\n    <span class=\"bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950\">\n      Wrap text\n    </span>\n  </button>\n  <button \n    type=\"button\"\n    class=\"bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n    data-copy-target=\"sibling\"\n  >\n    <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\"><g buffered-rendering=\"static\"><path d=\"M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z\" /><path d=\"M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617\" /></g></svg>\n    <span class=\"bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950\">\n      Copy to clipboard\n    </span>\n  </button>\n  <div class='highlight relative group'>\n    <pre class='highlight '><code id=\"code-fl5sng0r\"><span class=\"k\">if</span> <span class=\"k\">let</span> <span class=\"p\">(</span><span class=\"nf\">Some</span><span class=\"p\">(</span><span class=\"nn\">Load</span><span class=\"p\">::</span><span class=\"nf\">Local</span><span class=\"p\">(</span><span class=\"n\">load</span><span class=\"p\">)))</span> <span class=\"o\">=</span> <span class=\"p\">(</span><span class=\"o\">&</span><span class=\"k\">self</span><span class=\"py\">.load</span><span class=\"nf\">.read</span><span class=\"p\">()</span><span class=\"nf\">.get</span><span class=\"p\">(</span><span class=\"o\">...</span><span class=\"p\">))</span> <span class=\"p\">{</span>\n    <span class=\"c1\">// do a bunch of stuff with `load`</span>\n<span class=\"p\">}</span> <span class=\"k\">else</span> <span class=\"p\">{</span>\n    <span class=\"k\">self</span><span class=\"nf\">.init_for</span><span class=\"p\">(</span><span class=\"o\">...</span><span class=\"p\">);</span>\n<span class=\"p\">}</span>\n</code></pre>\n  </div>\n</div>\n<p>The “if” arm of that branch is taken if <code>self.load.read().get()</code> returns a value with the type <code>Some</code>. To retrieve that value, the expression calls <code>read()</code> to grab a lock.</p>\n<div class=\"right-sidenote\"><p>though Rust programmers probably notice the bug quickly</p>\n</div>\n<p>The bug is subtle: in that code, the lock <code>self.load.read().get()</code> takes is held not just for the duration of the “if” arm, but also for the “else” arm — you can think of <code>if let</code> expressions as being rewritten to the equivalent <code>match</code> expression, where that lifespan is much clearer.</p>\n\n<p>Anyways that’s real code and it occurred on a code path in <code>fly-proxy</code> that was triggered by a Corrosion update propagating from host to host across our fleet in millisecond intervals of time, converging quickly, like a good little routing protocol, on a global consensus that our entire Anycast routing layer should be deadlocked. Fun times.</p>\n<h3 id='the-watchdog-and-regionalizing' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#the-watchdog-and-regionalizing' aria-label='Anchor'></a><span class='plain-code'>The Watchdog, and Regionalizing</span></h3>\n<p>The experience of that Anycast outage was arresting, and immediately altered our plans. We set about two tasks, one short-term and one long.</p>\n\n<p>In the short term: we made deadlocks nonlethal with a “watchdog” system. <code>fly-proxy</code> has an internal control channel (it drives a REPL operators can run from our servers). During a deadlock (or dead-loop or exhaustion), that channel becomes nonresponsive. We watch for that and bounce the proxy when it happens. A deadlock is still bad! But it’s a second-or-two-length arrhythmia, not asystole.</p>\n\n<p>Meanwhile, over the long term: we’re confronting the implications of all our routing state sharing a global broadcast domain. The update that seized up Anycast last year pertained to an app nobody used. There wasn’t any real reason for any <code>fly-proxy</code> to receive it in the first place. But in the <em>status quo ante</em> of the outage, every proxy received updates for every Fly Machine.</p>\n\n<p>They still do. Why? Because it scales, and fixing it turns out to be a huge lift. It’s a lift we’re still making! It’s just taking time. We call this effort “regionalization”, because the next intermediate goal is to confine most updates to the region (Sydney, Frankfurt, Dallas) in which they occur.</p>\n\n<p>I hope this has been a satisfying little tour of the problem domain we’re working in. We have now reached the point where I can start describing the new bug.</p>\n<h3 id='new-bug-round-1-lazy-loading' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#new-bug-round-1-lazy-loading' aria-label='Anchor'></a><span class='plain-code'>New Bug, Round 1: Lazy Loading</span></h3>\n<p>We want to transform our original Anycast router, designed to assume a full picture of global state, into a regionalized router with partitioned state. A sane approach: have it lazy-load state information. Then you can spare most of the state code; state for apps that never show up at a particular <code>fly-proxy</code> in, say, Hong Kong simply doesn’t get loaded.</p>\n\n<p>For months now, portions of the <code>fly-proxy</code> Catalog have been lazy-loaded. A few weeks ago, the decision is made to get most of the rest of the Catalog state (apps, machines, &c) lazy-loaded as well. It’s a straightforward change and it gets rolled out quickly.</p>\n\n<p>Almost as quickly, proxies begin locking up and getting bounced by the watchdog. Not in the frightening Beijing-Olympics-level coordinated fashion that happened during the Outage, but any watchdog bounce is a problem.</p>\n\n<p>We roll back the change.</p>\n\n<p>From the information we have, we’ve narrowed things down to two suspects. First, lazy-loading changes the read/write patterns and thus the pressure on the RWLocks the Catalog uses; it could just be lock contention. Second, we spot a suspicious <code>if let</code>.</p>\n<h3 id='new-bug-round-2-the-lock-refactor' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#new-bug-round-2-the-lock-refactor' aria-label='Anchor'></a><span class='plain-code'>New Bug, Round 2: The Lock Refactor</span></h3>\n<p>Whichever the case, there’s a reason these proxies locked up with the new lazy-loading code, and our job is to fix it. The <code>if let</code> is easy. Lock contention is a little trickier.</p>\n\n<p>At this point it’s time to introduce a new character to the story, though they’ve been lurking on the stage the whole time: it’s <a href='https://github.com/Amanieu/parking_lot' title=''><code>parking_lot</code></a>, an important, well-regarded, and widely-used replacement for the standard library’s lock implementation.</p>\n\n<p>Locks in <code>fly-proxy</code> are <code>parking_lot</code> locks. People use <code>parking_lot</code> mostly because it is very fast and tight (a lock takes up just a single 64-bit word). But we use it for the feature set. The feature we’re going to pull out this time is lock timeouts: the RWLock in <code>parking_lot</code> exposes a <a href='https://amanieu.github.io/parking_lot/parking_lot/struct.RwLock.html#method.try_write_for' title=''><code>try_write_for</code></a>method, which takes a <code>Duration</code>, after which an attempt to grab the write lock fails.</p>\n\n<p>Before rolling out a new lazy-loading <code>fly-proxy</code>, we do some refactoring:</p>\n\n<ul>\n<li>our Catalog write locks all time out, so we’ll get telemetry and a failure recovery path if that’s what’s choking the proxy to death,\n</li><li>we eliminate RAII-style lock acquisition from the Catalog code (in RAII style, you grab a lock into a local value, and the lock is released implicitly when the scope that expression occurs in concludes) and replace it with explicit closures, so you can look at the code and see precisely the interval in which the lock is held, and\n</li><li>since we now have code that is very explicit about locking intervals, we instrument it with logging and metrics so we can see what’s happening.\n</li></ul>\n\n<p>We should be set. The suspicious <code>if let</code> is gone, lock acquisition can time out, and we have all this new visibility.</p>\n\n<p>Nope. Immediately more lockups, all in Europe, especially in <code>WAW</code>.</p>\n<h3 id='new-bug-round-3-telemetry-inspection' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#new-bug-round-3-telemetry-inspection' aria-label='Anchor'></a><span class='plain-code'>New Bug, Round 3: Telemetry Inspection</span></h3>\n<p>That we’re still seeing deadlocks is f'ing weird. We’ve audited all our Catalog locks. You can look at the code and see the lifespan of a grabbed lock.</p>\n\n<p>We have a clue: our lock timeout logs spam just before a proxy locks up and is killed by the watchdog. This clue: useless. But we don’t know that yet!</p>\n\n<p>Our new hunch is that something is holding a lock for a very long time, and we just need to find out which thing that is. Maybe the threads updating the Catalog from Corrosion are dead-looping?</p>\n\n<p>The instrumentation should tell the tale. But it does not. We see slow locks… just before the entire proxy locks up. And they happen in benign, quiet applications. Random, benign, quiet applications.</p>\n\n<p><code>parking_lot</code> has a <a href='https://amanieu.github.io/parking_lot/parking_lot/deadlock/index.html' title=''>deadlock detector</a>. If you ask it, it’ll keep a waiting-for dependency graph and detect stalled threads. This runs on its own thread, isolate outside the web of lock dependencies in the rest of the proxy. We cut a build of the proxy with the deadlock detector enabled and wait for something in <code>WAW</code> to lock up. And it does. But <code>parking_lot</code> doesn’t notice. As far as it’s concerned, nothing is wrong.</p>\n\n<p>We are at this moment very happy we did the watchdog thing.</p>\n<h3 id='new-bug-round-4-descent-into-madness' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#new-bug-round-4-descent-into-madness' aria-label='Anchor'></a><span class='plain-code'>New Bug, Round 4: Descent Into Madness</span></h3>\n<p>When the watchdog bounces a proxy, it snaps a core dump from the process it just killed. We are now looking at core dumps. There is only one level of decompensation to be reached below “inspecting core dumps”, and that’s “blaming the compiler”. We will get there.</p>\n\n<p>Here’s Pavel, at the time:</p>\n\n<blockquote>\n<p>I’ve been staring at the last core dump from <code>waw</code> . It’s quite strange.\nFirst, there is no thread that’s running inside the critical section. Yet, there is a thread that’s waiting to acquire write lock and a bunch of threads waiting to acquire a read lock.\nThat’s doesn’t prove anything, of course, as a thread holding catalog write lock might have just released it before core dump was taken. But that would be quite a coincidence.</p>\n</blockquote>\n\n<p>The proxy is locked up. But there are no threads in a critical section for the catalog. Nevertheless, we can see stack traces of multiple threads waiting to acquire locks. In the moment, Pavel thinks this could be a fluke. But we’ll soon learn that <em>every single stack trace</em> shows the same pattern: everything wants the Catalog lock, but nobody has it.</p>\n\n<p>It’s hard to overstate how weird this is. It breaks both our big theories: it’s not compatible with a Catalog deadlock that we missed, and it’s not compatible with a very slow lock holder. Like a podcast listener staring into the sky at the condensation trail of a jetliner, we begin reaching for wild theories. For instance: <code>parking_lot</code> locks are synchronous, but we’re a Tokio application; something somewhere could be taking an async lock that’s confusing the runtime. Alas, no.</p>\n\n<p>On the plus side, we are now better at postmortem core dump inspection with <code>gdb</code>.</p>\n<h3 id='new-bug-round-5-madness-gives-way-to-desperation' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#new-bug-round-5-madness-gives-way-to-desperation' aria-label='Anchor'></a><span class='plain-code'>New Bug, Round 5: Madness Gives Way To Desperation</span></h3>\n<p>Fuck it, we’ll switch to <code>read_recursive</code>.</p>\n\n<p>A recursive read lock is an eldritch rite invoked when you need to grab a read lock deep in a call tree where you already grabbed that lock, but can’t be arsed to structure the code properly to reflect that. When you ask for a recursive lock, if you already hold the lock, you get the lock again, instead of a deadlock.</p>\n\n<p>Our theory: <code>parking_lot</code>goes through some trouble to make sure a stampede of readers won’t starve writers, who are usually outnumbered. It prefers writers by preventing readers from acquiring locks when there’s at least one waiting writer. And <code>read_recursive</code> sidesteps that logic.</p>\n\n<p>Maybe there’s some ultra-slow reader somehow not showing up in our traces, and maybe switching to recursive locks will cut through that.</p>\n\n<p>This does not work. At least, not how we hoped it would. It does generate a new piece of evidence:  <code>RwLock reader count overflow</code> log messages, and lots of them.</p>\n<h3 id='there-are-things-you-are-not-meant-to-know' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#there-are-things-you-are-not-meant-to-know' aria-label='Anchor'></a><span class='plain-code'>There Are Things You Are Not Meant To Know</span></h3>\n<p>You’re reading a 3,000 word blog post about a single concurrency bug, so my guess is you’re the kind of person who compulsively wants to understand how everything works. That’s fine, but a word of advice: there are things where, if you find yourself learning about them in detail, something has gone wrong.</p>\n\n<p>One of those things is the precise mechanisms used by your RWLock implementation.</p>\n\n<p>The whole point of <code>parking_lot</code> is that the locks are tiny, marshalled into a 64 bit word. Those bits are partitioned into <a href='https://github.com/Amanieu/parking_lot/blob/master/src/raw_rwlock.rs' title=''>4 signaling bits</a> (<code>PARKED</code>, <code>WRITER_PARKED</code>, <code>WRITER</code>, and <code>UPGRADEABLE</code>) and a 60-bit counter of lock holders.</p>\n\n<blockquote>\n<p>Me, a dummy: sounds like we overflowed that counter.</p>\n\n<p>Pavel, a genius: we are not overflowing a 60-bit counter.</p>\n</blockquote>\n\n<p>Ruling out a genuine overflow (which would require paranormal activity) leaves us with corruption as the culprit. Something is bashing our lock word. If the counter is corrupted, maybe the signaling bits are too; then we’re in an inconsistent state, an artificial deadlock.</p>\n\n<p>Easily confirmed. We cast the lock words into <code>usize</code> and log them. Sure enough, they’re <code>0xFFFFFFFFFFFFFFFF</code>.</p>\n\n<p>This is a smoking gun, because it implies all 4 signaling bits are set, and that includes <code>UPGRADEABLE</code>. Upgradeable locks are read-locks that can be “upgraded” to write locks. We don’t use them.</p>\n\n<p>This looks like classic memory corruption. But in our core dumps, memory doesn’t appear corrupted: the only thing set all <code>FFh</code> is the lock word.</p>\n\n<p>We compile and run our test suites <a href='https://github.com/rust-lang/miri' title=''>under <code>miri</code></a>, a Rust interpreter for its <a href='https://rustc-dev-guide.rust-lang.org/mir/index.html' title=''>MIR IR</a>. <code>miri</code> does all sorts of cool UB detection, and does in fact spot some UB in our tests, which we are at the time excited about until we deploy the fixes and nothing gets better.</p>\n\n<p>At this point, Saleem suggests guard pages. We could <code>mprotect</code> memory pages around the lock to force a panic if a wild write hits <em>near</em> the lock word, or just allocate zero pages and check them, which we are at the time excited about until we deploy the guard pages and nothing hits them.</p>\n<h3 id='the-non-euclidean-horror-at-the-heart-of-this-bug' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#the-non-euclidean-horror-at-the-heart-of-this-bug' aria-label='Anchor'></a><span class='plain-code'>The Non-Euclidean Horror At The Heart Of This Bug</span></h3>\n<p>At this point we should recap where we find ourselves:</p>\n\n<ul>\n<li>We made a relatively innocuous change to our proxy, which caused proxies in Europe to get watchdog-killed.\n</li><li>We audited and eliminated all the nasty <code>if-letses</code>.\n</li><li>We replaced all RAII lock acquisitions with explicit closures, and instrumented the closures. \n</li><li>We enabled <code>parking_lot</code> deadlock detection. \n</li><li>We captured and analyzed core dumps for the killed proxies. \n</li><li>We frantically switched to recursive read locks, which generated a new error.\n</li><li>We spotted what looks like memory corruption, but only of that one tiny lock word.\n</li><li>We ran our code under an IR interpreter to find UB, fixed some UB, and didn’t fix the bug.\n</li><li>We set up guard pages to catch wild writes.\n</li></ul>\n\n<p>In Richard Cook’s essential <a href='https://how.complexsystems.fail/' title=''>“How Complex Systems Fail”</a>, rule #5 is that “complex systems operate in degraded mode”. <em>The system continues to function because it contains so many redundancies and because people can make it function, despite the presence of many flaws</em>. Maybe <code>fly-proxy</code> is now like the RBMK reactors at Chernobyl, a system that works just fine as long as operators know about and compensate for its known concurrency flaws, and everybody lives happily ever after.</p>\n<div class=\"right-sidenote\"><p>We are, in particular, running on the most popular architecture for its RWLock implementation.</p>\n</div>\n<p>We have reached the point where serious conversations are happening about whether we’ve found a Rust compiler bug. Amusingly, <code>parking_lot</code> is so well regarded among Rustaceans that it’s equally if not more plausible that Rust itself is broken.</p>\n\n<p>Nevertheless, we close-read the RWLock implementation. And we spot this:</p>\n<div class=\"highlight-wrapper group relative rust\">\n  <button \n    type=\"button\"\n    class=\"bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n    data-wrap-target=\"#code-fxdbmwey\"\n  >\n    <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\" stroke-linecap=\"round\" stroke-linejoin=\"round\"><g buffered-rendering=\"static\"><path d=\"M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959\" /><path d=\"M11.081 6.466L9.533 8.037l1.548 1.571\" /></g></svg>\n    <span class=\"bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950\">\n      Wrap text\n    </span>\n  </button>\n  <button \n    type=\"button\"\n    class=\"bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n    data-copy-target=\"sibling\"\n  >\n    <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\"><g buffered-rendering=\"static\"><path d=\"M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z\" /><path d=\"M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617\" /></g></svg>\n    <span class=\"bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950\">\n      Copy to clipboard\n    </span>\n  </button>\n  <div class='highlight relative group'>\n    <pre class='highlight '><code id=\"code-fxdbmwey\"><span class=\"k\">let</span> <span class=\"n\">state</span> <span class=\"o\">=</span> <span class=\"k\">self</span><span class=\"py\">.state</span><span class=\"nf\">.fetch_add</span><span class=\"p\">(</span>\n   <span class=\"n\">prev_value</span><span class=\"nf\">.wrapping_sub</span><span class=\"p\">(</span><span class=\"n\">WRITER_BIT</span> <span class=\"p\">|</span> <span class=\"n\">WRITER_PARKED_BIT</span><span class=\"p\">),</span>\n                           <span class=\"nn\">Ordering</span><span class=\"p\">::</span><span class=\"n\">Relaxed</span><span class=\"p\">);</span>\n</code></pre>\n  </div>\n</div>\n<p>This looks like gibberish, so let’s rephrase that code to see what it’s actually doing:</p>\n<div class=\"highlight-wrapper group relative rust\">\n  <button \n    type=\"button\"\n    class=\"bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n    data-wrap-target=\"#code-1648p3rc\"\n  >\n    <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\" stroke-linecap=\"round\" stroke-linejoin=\"round\"><g buffered-rendering=\"static\"><path d=\"M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959\" /><path d=\"M11.081 6.466L9.533 8.037l1.548 1.571\" /></g></svg>\n    <span class=\"bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950\">\n      Wrap text\n    </span>\n  </button>\n  <button \n    type=\"button\"\n    class=\"bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n    data-copy-target=\"sibling\"\n  >\n    <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\"><g buffered-rendering=\"static\"><path d=\"M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z\" /><path d=\"M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617\" /></g></svg>\n    <span class=\"bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950\">\n      Copy to clipboard\n    </span>\n  </button>\n  <div class='highlight relative group'>\n    <pre class='highlight '><code id=\"code-1648p3rc\"><span class=\"k\">let</span> <span class=\"n\">state</span> <span class=\"o\">=</span> <span class=\"k\">self</span><span class=\"py\">.state</span> <span class=\"o\">&</span> <span class=\"o\">~</span><span class=\"p\">(</span><span class=\"n\">WRITER_BIT</span><span class=\"p\">|</span><span class=\"n\">WRITER_PARKED_BIT</span><span class=\"p\">);</span>\n</code></pre>\n  </div>\n</div>\n<p>If you know exactly the state of the word you’re writing to, and you know exactly the limited set of permutations the bits of that word can take (for instance, because there’s only 4 signaling bits), then instead of clearing bit by fetching a word, altering it, and then storing it, you can clear them <em>atomically</em> by adding the inverse of those bits to the word.</p>\n\n<p>This pattern is self-synchronizing, but it relies on an invariant: you’d better be right about the original state of the word you’re altering. Because if you’re wrong, you’re adding a very large value to an uncontrolled value.</p>\n\n<p>In <code>parking_lot</code>, say we have <code>WRITER</code> and <code>WRITER_PARKED</code> set: the state is <code>0b1010</code>. <code>prev_value</code>, the state of the lock word when the lock operation started, is virtually always 0, and that’s what we’re counting on. <code>prev_value.wrapping_sub()</code>then calculates <code>0xFFFFFFFFFFFFFFF6</code>, which exactly cancels out the <code>0b1010</code> state, leaving 0.</p>\n<div class=\"right-sidenote\"><p>As a grace note, the locking code manages to set that one last unset bit before the proxy deadlocks.</p>\n</div>\n<p>Consider though what happens if one of those bits isn’t set: state is <code>0b1000</code>. Now that add doesn’t cancel out; the final state is instead <code>0xFFFFFFFFFFFFFFFE</code>. The reader count is completely full and can’t be decremented, and all the waiting bits are set so nothing can happen on the lock.</p>\n\n<p><code>parking_lot</code> is a big deal and we’re going to be damn sure before we file a bug report. Which doesn’t take long; Pavel reproduces the bug in a minimal test case, with a forked version of <code>parking_lot</code> that confirms and logs the condition.</p>\n\n<p><a href='https://github.com/Amanieu/parking_lot/issues/465' title=''>The <code>parking_lot</code> team quickly confirms</a> and fixes the bug.</p>\n<h3 id='ex-insania-claritas' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#ex-insania-claritas' aria-label='Anchor'></a><span class='plain-code'>Ex Insania, Claritas</span></h3>\n<p>Here’s what we now know to have been happening:</p>\n\n<ol>\n<li>Thread 1 grabs a read lock.\n</li><li>Thread 2 tries to grab a write lock, with a <code>try_write_for</code> timeout; it’s “parked” waiting for the reader, which sets <code>WRITER</code> and <code>WRITER_PARKED</code> on the raw lock word.\n</li><li>Thread 1 releases the lock, unparking a waiting writer, which unsets <code>WRITER_PARKED</code>.\n</li><li>Thread 2 wakes, but not for the reason it thinks: the timing is such that it thinks the lock has timed out. So it tries to clear both <code>WRITER</code> and <code>WRITER_PARKED</code> — a bitwise “double free”. Lock: corrupted. Computer: over. \n</li></ol>\n\n<p><a href='https://github.com/Amanieu/parking_lot/pull/466' title=''>The fix is simple</a>: the writer bit is cleared separately in the same wakeup queue as the reader. The fix is deployed, the lockups never recur.</p>\n\n<p>At a higher level, the story is this:</p>\n\n<ol>\n<li>We’re refactoring the proxy to regionalize it, which changes the pattern  of readers and writers on the catalog.\n</li><li>As we broaden out lazy loads, we introduce writer contention, a classic concurrency/performance debugging problem. It looks like a deadlock at the time! It isn’t. \n</li><li><code>try_write_for</code> is a good move: we need tools to manage contention.\n</li><li>But now we’re on a buggy code path in <code>parking_lot</code> — we don’t know that and can’t understand it until we’ve lost enough of our minds to second-guess the library.\n</li><li>We stumble on the bug out of pure dumb luck by stabbing in the dark with <code>read_recursive</code>.\n</li></ol>\n\n<p>Mysteries remain. Why did this only happen in <code>WAW</code>? Some kind of crazy regional timing thing? Something to do with the Polish <em>kreska</em> diacritic that makes L’s sound like W’s? The wax and wane of caribou populations? Some very high-traffic APIs local to these regions? All very plausible.</p>\n\n<p>We’ll never know because we fixed the bug.</p>\n\n<p>But we’re in a better place now, even besides the bug fix:</p>\n\n<ul>\n<li>we audited all our catalog locking, got rid of all the iflets, and stopped relying on RAII lock guards.\n</li><li>the resulting closure patterns gave us lock timing metrics, which will be useful dealing with future write contention\n</li><li>all writes are now labeled, and we emit labeled logs on slow writes, augmented with context data (like app IDs) where we have it\n</li><li>we also now track the last and current holder of the write lock, augmented with context information; next time we have a deadlock, we should have all the information we need to identify the actors without <code>gdb</code> stack traces.\n</li></ul>",
      "image": {
        "url": "https://fly.io/blog/parking-lot-ffffffffffffffff/assets/ffff-cover.webp",
        "title": null
      },
      "media": [],
      "authors": [
        {
          "name": "Fly",
          "email": null,
          "url": null
        }
      ],
      "categories": []
    },
    {
      "id": "https://fly.io/blog/litestream-revamped/",
      "title": "Litestream: Revamped",
      "description": null,
      "url": "https://fly.io/blog/litestream-revamped/",
      "published": "2025-05-20T00:00:00.000Z",
      "updated": "2025-05-20T21:48:00.000Z",
      "content": "<div class=\"lead\"><p><a href=\"https://litestream.io/\" title=\"\">Litestream</a> is an open-source tool that makes it possible to run many kinds of full-stack applications on top of SQLite by making them reliably recoverable from object storage. This is a post about the biggest change we’ve made to it since I launched it.</p>\n</div>\n<p>Nearly a decade ago, I got a bug up my ass. I wanted to build full-stack applications quickly. But the conventional n-tier database design required me to do sysadmin work for each app I shipped. Even the simplest applications depended on heavy-weight database servers like Postgres or MySQL.</p>\n\n<p>I wanted to launch apps on SQLite, because SQLite is easy. But SQLite is embedded, not a server, which at the time implied that the data for my application lived (and died) with just one server.</p>\n\n<p>So in 2020, I wrote <a href='https://litestream.io/' title=''>Litestream</a> to fix that.</p>\n\n<p>Litestream is a tool that runs alongside a SQLite application. Without changing that running application, it takes over the WAL checkpointing process to continuously stream database updates to an S3-compatible object store. If something happens to the server the app is running on, the whole database can efficiently be restored to a different server. You might lose servers, but you won’t lose your data.</p>\n\n<p>Litestream worked well. So we got ambitious. A few years later, we built <a href='https://github.com/superfly/litefs' title=''>LiteFS</a>. LiteFS takes the ideas in Litestream and refines them, so that we can do read replicas and primary failovers with SQLite. LiteFS gives SQLite the modern deployment story of an n-tier database like Postgres, while keeping the database embedded.</p>\n\n<p>We like both LiteFS and Litestream. But Litestream is the more popular project. It’s easier to deploy and easier to reason about.</p>\n\n<p>There are some good ideas in LiteFS. We’d like Litestream users to benefit from them. So we’ve taken our LiteFS learnings and applied them to some new features in Litestream.</p>\n<h2 id='point-in-time-restores-but-fast' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#point-in-time-restores-but-fast' aria-label='Anchor'></a><span class='plain-code'>Point-in-time restores, but fast</span></h2>\n<p><a href='https://fly.io/blog/all-in-on-sqlite-litestream/' title=''>Here’s how Litestream was originally designed</a>: you run <code>litestream</code> against a SQLite database, and it opens up a long-lived read transaction. This transaction arrests SQLite WAL checkpointing, the process by which SQLite consolidates the WAL back into the main database file. Litestream builds a “shadow WAL” that records WAL pages, and copies them to S3.</p>\n\n<p>This is simple, which is good. But it can also be slow. When you want to restore a database, you have have to pull down and replay every change since the last snapshot. If you changed a single database page a thousand times, you replay a thousand changes. For databases with frequent writes, this isn’t a good approach.</p>\n\n<p>In LiteFS, we took a different approach. LiteFS is transaction-aware. It doesn’t simply record raw WAL pages, but rather ordered ranges of pages associated with transactions, using a file format we call <a href='https://github.com/superfly/ltx' title=''>LTX</a>. Each LTX file represents a sorted changeset of pages for a given period of time.</p>\n\n<p><img alt=\"a simple linear LTX file with 8 pages between 1 and 21\" src=\"/blog/litestream-revamped/assets/linear-ltx.png\" /></p>\n\n<p>Because they are sorted, we can easily merge multiple LTX files together and create a new LTX file with only the latest version of each page.</p>\n\n<p><img alt=\"merging three LTX files into one\" src=\"/blog/litestream-revamped/assets/merged-ltx.png\" /></p>\n<div class=\"right-sidenote\"><p>This is similar to how an <a href=\"https://en.wikipedia.org/wiki/Log-structured_merge-tree\" title=\"\">LSM tree</a> works.</p>\n</div>\n<p>This process of combining smaller time ranges into larger ones is called <em>compaction</em>. With it, we can replay a SQLite database to a specific point in time, with a minimal  duplicate pages.</p>\n<h2 id='casaas-compare-and-swap-as-a-service' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#casaas-compare-and-swap-as-a-service' aria-label='Anchor'></a><span class='plain-code'>CASAAS: Compare-and-Swap as a Service</span></h2>\n<p>One challenge Litestream has to deal with is desynchronization. Part of the point of Litestream is that SQLite applications don’t have to be aware of it. But <code>litestream</code> is just a process, running alongside the application, and it can die independently. If <code>litestream</code> is down while database changes occur, it will miss changes. The same kind of problem occurs if you start replication from a new server.</p>\n\n<p>Litestream needs a way to reset the replication stream from a new snapshot. How it does that is with “generations”. <a href='https://litestream.io/how-it-works/#snapshots--generations' title=''>A generation</a> represents a snapshot and a stream of WAL updates, uniquely identified. Litestream notices any break in its WAL sequence and starts a new generation, which is how it recovers from desynchronization.</p>\n\n<p>Unfortunately, storing and managing multiple generations makes it difficult to implement features like failover and read-replicas.</p>\n\n<p>The most straightforward way around this problem is to make sure only one instance of Litestream can replication to a given destination. If you can do that, you can store just a single, latest generation. That in turn makes it easy to know how to resync a read replica; there’s only one generation to choose from.</p>\n\n<p>In LiteFS, we solved this problem by using Consul, which guaranteed a single leader. That requires users to know about Consul. Things like “requiring Consul” are probably part of the reason Litestream is so much more popular than LiteFS.</p>\n\n<p>In Litestream, we’re solving the problem a different way. Modern object stores like S3 and Tigris solve this problem for us: they now offer <a href='https://aws.amazon.com/about-aws/whats-new/2024/11/amazon-s3-functionality-conditional-writes/' title=''>conditional write support</a>. With conditional writes, we can implement a time-based lease. We get essentially the same constraint Consul gave us, but without having to think about it or set up a dependency.</p>\n\n<p>In the immediacy, this will mean you can run Litestream with ephemeral nodes, with overlapping run times, and even if they’re storing to the same destination, they won’t confuse each other.</p>\n<h2 id='lightweight-read-replicas' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#lightweight-read-replicas' aria-label='Anchor'></a><span class='plain-code'>Lightweight read replicas</span></h2>\n<p>The original design constraint of both Litestream and LiteFS was to extend SQLite, to modern deployment scenarios, without disturbing people’s built code. Both tools are meant to function even if applications are oblivious to them.</p>\n\n<p>LiteFS is more ambitious than Litestream, and requires transaction-awareness. To get that without disturbing built code, we use a cute trick (a.k.a. a gross hack): LiteFS provides a FUSE filesystem, which lets it act as a proxy between the application and the backing store. From that vantage point, we can easily discern transactions.</p>\n\n<p>The FUSE approach gave us a lot of control, enough that users could use SQLite replicas just like any other database. But installing and running a whole filesystem (even a fake one) is a lot to ask of users. To work around that problem, we relaxed a constraint: LiteFS can function without the FUSE filesystem if you load an extension into your application code, <a href='https://github.com/superfly/litevfs' title=''>LiteVFS</a>.  LiteVFS is a <a href='https://www.sqlite.org/vfs.html' title=''>SQLite Virtual Filesystem</a> (VFS). It works in a variety of environments, including some where FUSE can’t, like in-browser WASM builds.</p>\n\n<p>What we’re doing next is taking the same trick and using it on Litestream. We’re building a VFS-based read-replica layer. It will be able to fetch and cache pages directly from S3-compatible object storage.</p>\n\n<p>Of course, there’s a catch: this approach isn’t as efficient as a local SQLite database. That kind of efficiency, where you don’t even need to think about N+1 queries because there’s no network round-trip to make the duplicative queries pile up costs, is part of the point of using SQLite.</p>\n\n<p>But we’re optimistic that with cacheing and prefetching, the approach we’re using will yield, for the right use cases, strong performance — all while serving SQLite reads hot off of Tigris or S3.</p>\n<figure class=\"post-cta\">\n  <figcaption>\n    <h1>Litestream is fully open source</h1>\n    <p>It’s not coupled with Fly.io at all; you can use it anywhere.</p>\n      <a class=\"btn btn-lg\" href=\"https://litestream.io/\">\n        Check it out <span class='opacity-50'>→</span>\n      </a>\n  </figcaption>\n  <div class=\"image-container\">\n    <img src=\"/static/images/cta-cat.webp\" srcset=\"/static/images/[email protected] 2x\" alt=\"\">\n  </div>\n</figure>\n\n<h2 id='synchronize-lots-of-databases' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#synchronize-lots-of-databases' aria-label='Anchor'></a><span class='plain-code'>Synchronize Lots Of Databases</span></h2>\n<p>While we’ve got you here: we’re knocking out one of our most requested features.</p>\n\n<p>In the old Litestream design, WAL-change polling and slow restores made it infeasible to replicate large numbers of databases from a single process. That has been our answer when users ask us for a “wildcard” or “directory” replication argument for the tool.</p>\n\n<p>Now that we’ve switched to LTX, this isn’t a problem any more. It should thus be possible to replicate <code>/data/*.db</code>, even if there’s hundreds or thousands of databases in that directory.</p>\n<h2 id='we-still-sqlite' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#we-still-sqlite' aria-label='Anchor'></a><span class='plain-code'>We Still ❤️ SQLite</span></h2>\n<p>SQLite has always been a solid database to build on and it’s continued to find new use cases as the industry evolves. We’re super excited to continue to build Litestream alongside it.</p>\n\n<p>We have a sneaking suspicion that the robots that write LLM code are going to like SQLite too. We think what <a href='https://phoenix.new/' title=''>coding agents like Phoenix.new</a> want is a way to try out code on live data, screw it up, and then rollback <em>both the code and the state.</em> These Litestream updates put us in a position to give agents PITR as a primitive. On top of that, you can build both rollbacks and forks.</p>\n\n<p>Whether or not you’re drinking the AI kool-aid, we think this new design for Litestream is just better. We’re psyched to be rolling it out, and for the features it’s going to enable.</p>",
      "image": {
        "url": "https://fly.io/blog/litestream-revamped/assets/litestream-revamped.png",
        "title": null
      },
      "media": [],
      "authors": [
        {
          "name": "Fly",
          "email": null,
          "url": null
        }
      ],
      "categories": []
    },
    {
      "id": "https://fly.io/blog/mcp-launch/",
      "title": "Launching MCP Servers on Fly.io",
      "description": null,
      "url": "https://fly.io/blog/mcp-launch/",
      "published": "2025-05-19T00:00:00.000Z",
      "updated": "2025-05-20T16:26:25.000Z",
      "content": "<div class=\"lead\"><p>This is a blog post. Part showing off. Part opinion. Plan accordingly.</p>\n</div>\n<p>The <a href='https://www.anthropic.com/news/model-context-protocol' title=''>Model Context Protocol</a> is days away from turning six months old. You read that right, six <em>months</em> old. MCP Servers have both taken the world by storm, and still trying to figure out what they want to be when they grow up.</p>\n\n<p>There is no doubt that MCP servers are useful. But their appeal goes beyond that. They are also simple and universal. What’s not to like?</p>\n\n<p>Well, for starters, there’s basically two types of MCP servers. One small and nimble that runs as a process on your machine. And one that is a HTTP server that runs presumably elsewhere and is <a href='https://modelcontextprotocol.io/specification/2025-03-26/basic/authorization' title=''>standardizing</a> on OAuth 2.1. And there is a third type, but it is deprecated.</p>\n\n<p>Next there is the configuration. Asking users to manually edit JSON seems so early 21th century. With Claude, this goes into <code>~/Library/Application Support/Claude/claude_desktop_config.json</code>, and is found under a <code>MCPServer</code> key. With Zed, this file is in <code>~/.config/zed/settings.json</code> and is found under a <code>context_servers</code> key. And some tools put these files in a different place depending on whether you are running on MacOS, Linux, or Windows.</p>\n\n<p>Finally, there is security. An MCP server running on your machine literally has access to everything you do. Running remote solves this problem, but did I mention <a href='https://datatracker.ietf.org/doc/html/draft-ietf-oauth-v2-1-12' title=''>OAuth 2.1</a>? Not exactly something one sets up for casual use.</p>\n\n<p>None of these issues are fatal - something that is obvious by the fact that MCP servers are quite popular. But can we do better? I think so.</p>\n\n<hr>\n\n<p>Demo time.</p>\n\n<p>Let’s try out the <a href='https://github.com/modelcontextprotocol/servers/blob/main/src/slack/README.md' title=''>Slack MCP Server</a>:</p>\n\n<blockquote>\n<p>MCP Server for the Slack API, enabling Claude to interact with Slack workspaces.</p>\n</blockquote>\n\n<p>That certainly sounds like a good test case. There is a small amount of <a href='https://github.com/modelcontextprotocol/servers/blob/main/src/slack/README.md#setup' title=''>setup</a> you need to do, and when you are done you end up with a <em>Bot User OAuth Token</em> staring with <code>xoxb-</code> and a <em>Team ID</em> starting with a <code>T</code>.</p>\n\n<p>You <em>would</em> run it using the following:</p>\n<div class=\"highlight-wrapper group relative sh\">\n  <button \n    type=\"button\"\n    class=\"bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n    data-wrap-target=\"#code-t6ym1o9s\"\n  >\n    <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\" stroke-linecap=\"round\" stroke-linejoin=\"round\"><g buffered-rendering=\"static\"><path d=\"M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959\" /><path d=\"M11.081 6.466L9.533 8.037l1.548 1.571\" /></g></svg>\n    <span class=\"bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950\">\n      Wrap text\n    </span>\n  </button>\n  <button \n    type=\"button\"\n    class=\"bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n    data-copy-target=\"sibling\"\n  >\n    <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\"><g buffered-rendering=\"static\"><path d=\"M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z\" /><path d=\"M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617\" /></g></svg>\n    <span class=\"bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950\">\n      Copy to clipboard\n    </span>\n  </button>\n  <div class='highlight relative group'>\n    <pre class='highlight '><code id=\"code-t6ym1o9s\">npx <span class=\"nt\">-y</span> @modelcontextprotocol/server-slack\n</code></pre>\n  </div>\n</div>\n<p>But instead, you convert that command to JSON and find the right configuration file and put this information in there. And either run the slack MCP server locally or set up a server with or without authentication.</p>\n\n<p>Wouldn’t it be nice to have the simplicity of a local process, the security of a remote server, and to eliminate the hassle of manually editing JSON?</p>\n\n<p>Here’s our current thinking:</p>\n<div class=\"highlight-wrapper group relative sh\">\n  <button \n    type=\"button\"\n    class=\"bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n    data-wrap-target=\"#code-169dbs40\"\n  >\n    <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\" stroke-linecap=\"round\" stroke-linejoin=\"round\"><g buffered-rendering=\"static\"><path d=\"M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959\" /><path d=\"M11.081 6.466L9.533 8.037l1.548 1.571\" /></g></svg>\n    <span class=\"bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950\">\n      Wrap text\n    </span>\n  </button>\n  <button \n    type=\"button\"\n    class=\"bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n    data-copy-target=\"sibling\"\n  >\n    <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\"><g buffered-rendering=\"static\"><path d=\"M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z\" /><path d=\"M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617\" /></g></svg>\n    <span class=\"bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950\">\n      Copy to clipboard\n    </span>\n  </button>\n  <div class='highlight relative group'>\n    <pre class='highlight '><code id=\"code-169dbs40\">fly mcp launch <span class=\"se\">\\</span>\n  <span class=\"s2\">\"npx -y @modelcontextprotocol/server-slack\"</span> <span class=\"se\">\\</span>\n  <span class=\"nt\">--claude</span> <span class=\"nt\">--server</span> slack <span class=\"se\">\\</span>\n  <span class=\"nt\">--secret</span> <span class=\"nv\">SLACK_BOT_TOKEN</span><span class=\"o\">=</span>xoxb-your-bot-token <span class=\"se\">\\</span>\n  <span class=\"nt\">--secret</span> <span class=\"nv\">SLACK_TEAM_ID</span><span class=\"o\">=</span>T01234567\n</code></pre>\n  </div>\n</div>\n<p>You can put this all on one line if you like, I just split this up so it fits on small screens and so we can talk about the various parts.</p>\n\n<p>The first three words seem reasonable. The quoted string is just the command that we want to run. So lets talk about the four flags. The first tells us which tool’s configuration file we want to update. The second specifies the name to use for the server in that configuration file. The last two set secrets.</p>\n\n<p>Oh, did I say current thinking? Perhaps I should mention that it is something you can run right now, as long as you have flyctl <code>v0.3.125</code> or later. Complete with the options you would expect from Fly.io, like the ability to configure auto-stop, file contents, flycast, secrets, region, volumes, and VM sizes.</p>\n\n<p>And, hey, lookie there:</p>\n\n<p><img alt=\"testing, testing, 1, 2, 3\" src=\"/blog/mcp-launch/assets/mcp-slack.png\" /></p>\n\n<p>Support for Claude, Cursor, Neovim, VS Code, Windsurf, and Zed are built in. You can select multiple clients and configuration files.</p>\n\n<p>By default, bearer token authentication will be set up on both the server and client.</p>\n\n<p>You can find the complete set of options on our <a href='https://fly.io/docs/flyctl/mcp-launch/' title=''><code>fly mcp launch</code></a> docs page.</p>\n\n<hr>\n\n<p>But this post isn’t just about experimental demoware that is subject to change.\nIt is about the depth of support that we are rapidly bringing online, including:</p>\n\n<ul>\n<li>Support for all transports, not just the ones we recommend.\n</li><li>Ability to deploy using the command line or the Machines API, with a number of different options that allow you to chose between elegant simplicity and excruciating precise control.\n</li><li>Ability to deploy each MCP server to a separate Machine, container, or even inside your application.\n</li><li>Access via HTTP Authorization, wireguard tunnels and flycast, or reverse proxies.\n</li></ul>\n\n<p>You can see all this spelled out in our <a href='https://fly.io/docs/mcp/' title=''>docs</a>. Be forewarned, most pages are marked as <em>beta</em>. But the examples provided all work. Well, there may be a bug here or there, but the examples <em>as shown</em> are thought to work. Maybe.</p>\n\n<p>Let’s figure out the ideal ergonomics of deploying MCP servers remotely together!</p>",
      "image": {
        "url": "https://fly.io/blog/mcp-launch/assets/fly-mcp-launch.png",
        "title": null
      },
      "media": [],
      "authors": [
        {
          "name": "Fly",
          "email": null,
          "url": null
        }
      ],
      "categories": []
    },
    {
      "id": "https://fly.io/blog/mcp-provisioning/",
      "title": "Provisioning Machines using MCPs",
      "description": null,
      "url": "https://fly.io/blog/mcp-provisioning/",
      "published": "2025-05-07T00:00:00.000Z",
      "updated": "2025-05-20T16:26:25.000Z",
      "content": "<div class=\"lead\"><p>Today’s state of the art is K8S, Terraform, web based UIs, and CLIs.  Those days are numbered.</p>\n</div>\n<p>On Monday, I created my first fly volume using an <a href='https://modelcontextprotocol.io/introduction' title=''>MCP</a>. For those who don’t know what MCPs are, they are how you attach tools to <a href='https://en.wikipedia.org/wiki/Large_language_model' title=''>LLM</a>s like Claude or Cursor. I added support for\n<a href='https://fly.io/docs/flyctl/volumes-create/' title=''>fly volume create</a> to <a href='https://fly.io/docs/flyctl/mcp-server/' title=''>fly mcp server</a>, and it worked the first time.\nA few hours later, and with the assistance of GitHub Copilot, i added support for all <a href='https://fly.io/docs/flyctl/volumes/' title=''>fly volumes</a> commands.</p>\n\n<hr>\n<div class=\"right-sidenote\"><p>This movie summary is from <a href=\"https://collidecolumn.wordpress.com/2013/08/04/when-worlds-collide-77-talking-with-computers-beyond-mouse-and-keyboard/\">When Worlds Collide, by Nalaka Gunawardene</a></p>\n</div>\n<p>I’m reminded of the memorable scene in the film <a href='http://en.wikipedia.org/wiki/Star_Trek_IV:_The_Voyage_Home' title=''>Star Trek IV: The Voyage Home</a> (1986). Chief Engineer Scotty, having time-travelled 200 years back to late 20th century San Francisco with his crew mates, encounters an early Personal Computer (PC).</p>\n\n<p>Sitting in front of it, he addresses the machine affably as “Computer!” Nothing happens. Scotty repeats himself; still no response. He doesn’t realise that voice recognition capability hadn’t arrived yet.</p>\n\n<p>Exasperated, he picks up the mouse and speaks into it: “Hello, computer?” The computer’s owner offers helpful advice: “Just use the keyboard.”</p>\n\n<p>Scotty looks astonished. “A keyboard?” he asks, and adds in a sarcastic tone: “How quaint!”</p>\n\n<hr>\n\n<p>A while later, I asked for a list of volumes for an existing app, and Claude noted that I had a few volumes that weren’t attached to any machines. So I asked it to delete the oldest unattached volume, and it did so. Then it occurred to me to do it again; this time I captured the screen:</p>\n<div align=\"center\"><p><img alt=\"Deleting a volume using MCP: \"What is my oldest volume\"? ... \"Delete that volume too\"\" src=\"/blog/mcp-provisioning/assets/volume-delete.png\"></p>\n</div>\n<p>A few notes:</p>\n\n<ul>\n<li>I could have written a program using the <a href='https://fly.io/docs/machines/api/volumes-resource/' title=''>machines API</a>, but that would have required some effort.\n</li><li>I could have used <a href='https://fly.io/docs/flyctl/volumes/' title=''>flyctl</a> directly, but to be honest, I would have had to use the documentation and a bit of copy/pasting of arguments.\n</li><li>I could have navigated to a list of volumes for this application in the Fly.io dashboard, but there currently isn’t any way to sort the list or delete a volume. Had those been in place, that would have been a reasonable alternative if that was something I was actively looking for. This felt different to me, the LLM noted something, brought it to my attention, and I asked it to make a change as a result.\n</li><li>Since this support is built on <code>flyctl</code>, I would have received an error had I tried to destroy a volume that is currently mounted. Knowing that gave me the confidence to try the command.\n</li></ul>\n\n<p>All in all, I found it to be a very intuitive way to make changes to the resources assigned to my application.</p>\n\n<hr>\n\n<p>Imagine a future where you say to your favorite LLM “launch my application on Fly.io”, and your code is scanned and you are presented with a plan containing details such as regions, databases, s3 storage, memory, machines, and the like. You are given the opportunity to adjust the plan and, when ready, say “Make it so”.</p>\n\n<p>For many, the next thing you will see is that your application is up and running. For others, there may be a build error, a secret you need to set, a database that needs to be seeded, or any number of other mundane reasons why your application didn’t work the first time.</p>\n\n<p>Not to fear, your LLM will be able to examine your logs and will not only make suggestions as to next steps, but will be in a position to take actions on your behalf should you wish it to do so.</p>\n\n<p>And it doesn’t stop there. There will be MCP servers running on your Fly.io private network - either on separate machines, or in “sidecar” containers, or even integrated into your app. These will enable you to monitor and interact with your application.</p>\n\n<p>This is not science fiction. The ingredients are all in place to make this a reality. At the moment, it is a matter of “some assembly required”, but it should only be a matter of weeks before all this comes together into a neat package..</p>\n\n<hr>\n\n<p>Meanwhile, you can try this now. Make sure you run <a href='https://fly.io/docs/flyctl/version-upgrade/' title=''>fly version upgrade</a> and verify that you are running v0.3.117.</p>\n\n<p>Then configure your favorite LLM. Here’s my <code>claude_desktop_config.json</code> for example:</p>\n<div class=\"highlight-wrapper group relative json\">\n  <button \n    type=\"button\"\n    class=\"bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n    data-wrap-target=\"#code-nvj6j4q1\"\n  >\n    <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\" stroke-linecap=\"round\" stroke-linejoin=\"round\"><g buffered-rendering=\"static\"><path d=\"M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959\" /><path d=\"M11.081 6.466L9.533 8.037l1.548 1.571\" /></g></svg>\n    <span class=\"bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950\">\n      Wrap text\n    </span>\n  </button>\n  <button \n    type=\"button\"\n    class=\"bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n    data-copy-target=\"sibling\"\n  >\n    <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\"><g buffered-rendering=\"static\"><path d=\"M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z\" /><path d=\"M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617\" /></g></svg>\n    <span class=\"bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950\">\n      Copy to clipboard\n    </span>\n  </button>\n  <div class='highlight relative group'>\n    <pre class='highlight '><code id=\"code-nvj6j4q1\"><span class=\"p\">{</span><span class=\"w\">\n  </span><span class=\"nl\">\"mcpServers\"</span><span class=\"p\">:</span><span class=\"w\"> </span><span class=\"p\">{</span><span class=\"w\">\n    </span><span class=\"nl\">\"fly.io\"</span><span class=\"p\">:</span><span class=\"w\"> </span><span class=\"p\">{</span><span class=\"w\">\n      </span><span class=\"nl\">\"command\"</span><span class=\"p\">:</span><span class=\"w\"> </span><span class=\"s2\">\"/Users/rubys/.fly/bin/flyctl\"</span><span class=\"p\">,</span><span class=\"w\">\n      </span><span class=\"nl\">\"args\"</span><span class=\"p\">:</span><span class=\"w\"> </span><span class=\"p\">[</span><span class=\"w\"> </span><span class=\"s2\">\"mcp\"</span><span class=\"p\">,</span><span class=\"w\"> </span><span class=\"s2\">\"server\"</span><span class=\"w\"> </span><span class=\"p\">]</span><span class=\"w\">\n    </span><span class=\"p\">}</span><span class=\"w\">\n  </span><span class=\"p\">}</span><span class=\"w\">\n</span><span class=\"p\">}</span><span class=\"w\">\n</span></code></pre>\n  </div>\n</div>\n<p>Adjust the path to <code>flyctl</code> as needed.  Restart your LLM, and ask what tools are available. Try a few commands and let us know what you like and let us know if you have any suggestions. Just be aware this is not a demo, if you ask it to destroy a volume, that operation is not reversable. Perhaps try this first on a throwaway application.</p>\n\n<p>You don’t even need a LLM to try out the flyctl MCP server. If you have Node.js installed, you can run the <a href='https://modelcontextprotocol.io/docs/tools/inspector' title=''>MCP Inspector</a>:</p>\n<div class=\"highlight-wrapper group relative \">\n  <button \n    type=\"button\"\n    class=\"bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n    data-wrap-target=\"#code-kiv6npkl\"\n  >\n    <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\" stroke-linecap=\"round\" stroke-linejoin=\"round\"><g buffered-rendering=\"static\"><path d=\"M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959\" /><path d=\"M11.081 6.466L9.533 8.037l1.548 1.571\" /></g></svg>\n    <span class=\"bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950\">\n      Wrap text\n    </span>\n  </button>\n  <button \n    type=\"button\"\n    class=\"bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n    data-copy-target=\"sibling\"\n  >\n    <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\"><g buffered-rendering=\"static\"><path d=\"M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z\" /><path d=\"M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617\" /></g></svg>\n    <span class=\"bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950\">\n      Copy to clipboard\n    </span>\n  </button>\n  <div class='highlight relative group'>\n    <pre class='highlight '><code id=\"code-kiv6npkl\">fly mcp server -i\n</code></pre>\n  </div>\n</div>\n<p>Once started, visit <a href=\"http://127.0.0.1:6274/\">http://127.0.0.1:6274/</a>, click on “Connect”, then “List Tools”, select “fly-platform-status”, then click on “Run Tool”.</p>\n\n<p>The plan is to see what works well and what doesn’t work so well, make adjustments, build support in a bottoms up fashion, and iterate rapidly.</p>\n\n<p>By providing feedback, you can be a part of making this vision a reality.</p>\n\n<hr>\n\n<p>At the present time, <em>most</em> of the following are roughed in:</p>\n\n<ul>\n<li><a href='https://fly.io/docs/flyctl/apps/' title=''>apps</a>\n</li><li><a href='https://fly.io/docs/flyctl/logs/' title=''>logs</a>\n</li><li><a href='https://fly.io/docs/flyctl/machine/' title=''>machine</a>\n</li><li><a href='https://fly.io/docs/flyctl/orgs/' title=''>orgs</a>\n</li><li><a href='https://fly.io/docs/flyctl/platform/' title=''>platform</a>\n</li><li><a href='https://fly.io/docs/flyctl/status/' title=''>status</a>\n</li><li><a href='https://fly.io/docs/flyctl/volumes/' title=''>volumes</a>\n</li></ul>\n\n<p>The code is open source, and the places to look is at <a href='https://github.com/superfly/flyctl/blob/master/internal/command/mcp/server.go' title=''>server.go</a> and the <a href='https://github.com/superfly/flyctl/tree/master/internal/command/mcp/server' title=''>server</a> directory.</p>\n\n<p>Feel free to open <a href='https://github.com/superfly/flyctl/issues' title=''>issues</a> or start a discussion on <a href='https://community.fly.io/' title=''>community.fly.io</a>.</p>\n\n<hr>\n\n<p>Not ready yet to venture into the world of LLMs? Just remember, resistance is futile.</p>",
      "image": {
        "url": "https://fly.io/blog/mcp-provisioning/assets/Hello.webp",
        "title": null
      },
      "media": [],
      "authors": [
        {
          "name": "Fly",
          "email": null,
          "url": null
        }
      ],
      "categories": []
    },
    {
      "id": "https://fly.io/blog/30-minute-mcp/",
      "title": "30 Minutes With MCP and flyctl",
      "description": null,
      "url": "https://fly.io/blog/30-minute-mcp/",
      "published": "2025-04-10T00:00:00.000Z",
      "updated": "2025-05-20T16:26:25.000Z",
      "content": "<div class=\"lead\"><p>I wrote this post on our internal message board, and then someone asked, “why is this an internal post and not on our blog”, so now it is.</p>\n</div><div class=\"right-sidenote\"><p>well, Cursor built</p>\n</div>\n<p>I built the <a href='https://github.com/superfly/flymcp' title=''>most basic MCP server for <code>flyctl</code></a> I could think of. It took 30 minutes.</p>\n\n<p><a href='https://modelcontextprotocol.io/introduction' title=''>MCP</a>, for those unaware, is the emerging standard protocol for connecting an LLM (or an app that drives an LLM in the cloud, like Claude Desktop) to, well, anything. The “client” in MCP is the LLM; the “server” is the MCP server and the “tools” it exports. It mostly just defines an exchange of JSON blobs; one of those JSON blobs enables the LLM to discover all the tools exported by the server.</p>\n\n<p>A classic example of an MCP server is (yes, really) a Python shell. MCP publishes to (say) Claude that it can run arbitrary Python code with a tool call; not only that, says the tool description, but you can use those Python tool calls to, say, scrape the web. When the LLM wants to scrape the web with Python, it uses MCP send a JSON blob describing the Python tool call; the MCP server (yes, really) runs the Python and returns the result.</p>\n\n<p>Because I have not yet completely lost my mind, I chose to expose just two <code>flyctl</code> commands: <code>fly logs</code> and <code>fly status</code>. Because I’m lazy, I used MCP’s <code>stdio</code> mode; the MCP server just captures the output of <code>flyctl</code>. Because someone already wrote a <a href='https://github.com/mark3labs/mcp-go' title=''>really good Go MCP library</a> , this whole thing is like 90 lines of code.</p>\n<div class=\"callout\"><p>We may have gotten a little lucky, because we made a decision back in 2020 to give most of our <code>flyctl</code> commands a <code>json</code> mode to make them easier to drive from automation. I use that in my MCP server. I don’t know how much of a difference it made.</p>\n</div>\n<p>It’s, uh, annoyingly useful.</p>\n\n<p>I loaded it up and pointed it <a href='https://www.unpkg.com/' title=''>at unpkg</a>. I got this, from Claude:</p>\n\n<blockquote>\n<p>This appears to be a globally distributed CDN service running the unpkg JavaScript package registry, with machines strategically placed around the world to provide low-latency access. There are some stability issues with a couple of the machines, and several have experienced memory-related crashes that triggered automatic restarts. </p>\n</blockquote>\n\n<p>It then recited the complete configuration of the application:</p>\n<div class=\"highlight-wrapper group relative \">\n  <button \n    type=\"button\"\n    class=\"bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n    data-wrap-target=\"#code-v7lwz58t\"\n  >\n    <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\" stroke-linecap=\"round\" stroke-linejoin=\"round\"><g buffered-rendering=\"static\"><path d=\"M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959\" /><path d=\"M11.081 6.466L9.533 8.037l1.548 1.571\" /></g></svg>\n    <span class=\"bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950\">\n      Wrap text\n    </span>\n  </button>\n  <button \n    type=\"button\"\n    class=\"bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n    data-copy-target=\"sibling\"\n  >\n    <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\"><g buffered-rendering=\"static\"><path d=\"M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z\" /><path d=\"M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617\" /></g></svg>\n    <span class=\"bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950\">\n      Copy to clipboard\n    </span>\n  </button>\n  <div class='highlight relative group'>\n    <pre class='highlight '><code id=\"code-v7lwz58t\">**App Name**: unpkg       \n**Status**: Deployed and running\n**URL**: unpkg.fly.dev\n**Organization**: unpkg\n**Platform Version**: machines\n**Runtime**: Bun (JavaScript runtime)\n\nThe app is running on 10 machines distributed across global regions:       \nNorth America: lax (Los Angeles), atl (Atlanta), ewr (Newark)\nEurope: lhr (London), cdg (Paris), ams (Amsterdam)\nAsia: sin (Singapore), nrt (Tokyo), hkg (Hong Kong)\nSouth America: bog (Bogotá)\nOceania: syd (Sydney)\n</code></pre>\n  </div>\n</div>\n<p>OK. But then:</p>\n\n<blockquote>\n<p>Most machines are in “passing” health check status\n2 machines are in “critical” status:\nOne in ewr (Newark) with “context deadline exceeded” error\nOne in atl (Atlanta) with “gone” status</p>\n\n<p>Recent Events</p>\n\n<p>Multiple machines have experienced restarts\nSeveral machines show “oom_killed: true” in their event history, indicating they ran out of memory</p>\n</blockquote>\n\n<p>So I asked, <em>“try getting logs for one of the critical machines”</em>.</p>\n\n<blockquote>\n<p>The root cause appears to be memory usage - the Bun process was consuming about 3.7GB of memory (out of the 4GB allocated to the machine), and there was likely a surge in requests or memory usage that pushed it over the limit. This matches what we saw in the status report, which showed multiple instances of “oom_killed: true” in the event history for several machines.</p>\n</blockquote>\n\n<p>And then it dug in:</p>\n\n<blockquote>\n<ol>\n<li>The machine was actively serving content requests with 200 OK responses until 20:47:03 PM\n</li><li>At that point, it ran out of memory:\n<code>Out of memory: Killed process 641 (bun) total-vm:85950964kB, anon-rss:3744352kB, file-rss:12032kB, shmem-rss:0kB, UID:0 pgtables:16908kB oom_score_adj:0</code>\n</li><li>The process was killed with <code>SIGKILL</code>\n</li><li>The system attempted to recover by restarting:\n<code>reboot: Restarting system</code>\n</li><li>A health check failed shortly after:\n<code>Health check on port 3000 has failed. Your app is not responding properly. Services exposed on ports [80, 443] will have intermittent failures until the health check passes.</code>\n</li><li>The machine then restarted successfully and the service came back up:\n<code>Server listening on http://localhost:3000 ...</code>\n</li><li>The health check started passing again at 20:47:46 PM:\n<code>Health check on port 3000 is now passing.</code>\n</li><li>The machine resumed normal operations, serving requests.\n</li></ol>\n</blockquote>\n\n<p>I’m honestly kind of surprised how fast this was useful. It’s faster than I find problems in apps. Just by giving some LLM some tools that do basically what I’d do.</p>\n\n<p>Local MCP servers are scary. I don’t like that I’m giving a Claude instance in the cloud the ability to run a native program on my machine. I think <code>fly logs</code> and <code>fly status</code> are safe, but I’d rather know it’s safe. It would be, if I was running <code>flyctl</code> in an isolated environment and not on my local machine.</p>",
      "image": {
        "url": "https://fly.io/blog/30-minute-mcp/assets/robots-chatting.webp",
        "title": null
      },
      "media": [],
      "authors": [
        {
          "name": "Fly",
          "email": null,
          "url": null
        }
      ],
      "categories": []
    },
    {
      "id": "https://fly.io/blog/fuckin-robots/",
      "title": "Our Best Customers Are Now Robots",
      "description": null,
      "url": "https://fly.io/blog/fuckin-robots/",
      "published": "2025-04-08T00:00:00.000Z",
      "updated": "2025-05-20T16:26:25.000Z",
      "content": "<div class=\"lead\"><p>We’re Fly.io, a developer-focused public cloud. We turn Docker containers into hardware-isolated virtual machines running on our own metal around the world. We spent years coming up with <a href=\"https://fly.io/speedrun\" title=\"\">a developer experience we were proud of</a>. But now the robots are taking over, and they don’t care.</p>\n</div>\n<p>It’s weird to say this out loud!</p>\n\n<p>For years, one of our calling cards was “developer experience”. We made a decision, early on, to be a CLI-first company, and put a lot effort into making that CLI seamless. For a good chunk of our users, it really is the case that you can just <a href='https://fly.io/docs/flyctl/launch/' title=''><code>flyctl launch</code></a> from a git checkout and have an app containerized and deployed on the Internet. We haven’t always nailed these details, but we’ve really sweated them.</p>\n\n<p>But a funny thing has happened over the last 6 months or so. If you look at the numbers, DX might not matter that much. That’s because the users driving the most growth on the platform aren’t people at all. They're… robots.</p>\n<h2 id='what-the-fuck-is-happening' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#what-the-fuck-is-happening' aria-label='Anchor'></a><span class='plain-code'>What The Fuck Is Happening?</span></h2>\n<p>Here’s how we understand what we’re seeing. You start by asking, “what do the robots want?”</p>\n\n<p>Yesterday’s robots had diverse interests. Alcohol. The occasional fiddle contest. A purpose greater than passing butter. The elimination of all life on Earth and the harvesting of its cellular iron for the construction of 800 trillion paperclips. No one cloud platform could serve them all.</p>\n<div class=\"right-sidenote\"><p>[*] We didn’t make up this term. Don’t blame us.</p>\n</div>\n<p>Today’s robots are different. No longer masses of wire, plates, and transistors, modern robots are comprised of <a href='https://math.mit.edu/~gs/learningfromdata/' title=''>thousands of stacked matrices knit together with some simple equations</a>. All these robots want are vectors. Vectors, and a place to burp out more vectors. When those vectors can be interpreted as source code, we call this process “vibe coding”[*].</p>\n\n<p>We seem to be coated in some kind of vibe coder attractant. I want to talk about what that might be.</p>\n<div class=\"callout\"><p>If you want smart people to read a long post, just about the worst thing you can do is to sound self-promotional. But a lot of this is going to read like a brochure. The problem is, I’m not smart enough to write about the things we’re seeing without talking our book. I tried, and ended up tediously hedging every other sentence. We’re just going to have to find a way to get through this together.</p>\n</div><h2 id='you-want-robots-because-this-is-how-you-get-robots' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#you-want-robots-because-this-is-how-you-get-robots' aria-label='Anchor'></a><span class='plain-code'>You Want Robots? Because This Is How You Get Robots</span></h2>\n<p><strong class='font-semibold text-navy-950'>Compute.</strong> The basic unit of computation on Fly.io is the <code>Fly Machine</code>, which is a Docker container running as a hardware virtual machine.</p>\n<div class=\"right-sidenote\"><p>Not coincidentally, our underlying hypervisor engine is the same as Lambda’s.</p>\n</div>\n<p>There’s two useful points of reference to compare a Fly Machine to, which illustrate why we gave them this pretentious name. The first and most obvious is an AWS EC2 VM. The other is an AWS Lambda invocation. Like a Lambda invocation, a Fly Machine can start like it’s spring-loaded, in double-digit millis. But unlike Lambda, it can stick around as long as you want it to: you can run a server, or a 36-hour batch job, just as easily in a Fly Machine as in an EC2 VM.</p>\n\n<p>A vibe coding session generates code conversationally, which is to say that the robots stir up frenzy of activity for a minute or so, but then chill out for minutes, hours, or days. You can create a Fly Machine, do a bunch of stuff with it, and then stop it for 6 hours, during which time we’re not billing you. Then, at whatever random time you decide, you can start it back up again, quickly enough that you can do it in response to an HTTP request.</p>\n\n<p>Just as importantly, the companies offering these vibe-coding services have lots of paying users. Meaning, they need a lot of VMs; VMs that come and go. One chat session might generate a test case for some library and be done inside of 5 minutes. Another might generate a Node.js app that needs to stay up long enough to show off at a meeting the next day. It’s annoying to do this if you can’t turn things on and off quickly and cheaply.</p>\n\n<p>The core of this is a feature of the platform that we have <a href='https://fly.io/docs/machines/overview/#machine-state' title=''>never been able to explain effectively to humans</a>. There are two ways to start a Fly Machine: by <code>creating</code> it with a Docker container, or by <code>starting</code> it after it’s already been <code>created</code>, and later <code>stopped</code>. <code>Start</code> is lightning fast; substantially faster than booting up even a non-virtualized K8s Pod. This is too subtle a distinction for humans, who (reasonably!) just mash the <code>create</code> button to boot apps up in Fly Machines. But the robots are getting a lot of value out of it.</p>\n\n<p><strong class='font-semibold text-navy-950'>Storage.</strong> Another weird thing that robot workflows do is to build Fly Machines up incrementally. This feels really wrong to us. Until we discovered our robot infestation, we’d have told you not to do to this. Ope!</p>\n\n<p>A typical vibe coding session boots up a Fly Machine out of some minimal base image, and then, once running, adds packages, edits source code, and adds <code>systemd</code> units  (robots understand <code>systemd</code>; it’s how they’re going to replace us). This is antithetical to normal container workflows, where all this kind of stuff is baked into an immutable static OCI container. But that’s not how LLMs work: the whole process of building with an LLM is stateful trial-and-error iteration.</p>\n\n<p>So it helps to have storage. That way the LLM can do all these things and still repeatedly bounce the whole Fly Machine when it inevitably hallucinates its way into a blind alley.</p>\n\n<p>As product thinkers, our intuition about storage is “just give people Postgres”. And that’s the right answer, most of the time, for humans. But because LLMs are doing the <a href='https://static1.thegamerimages.com/wordpress/wp-content/uploads/2020/10/Defiled-Amy.jpg' title=''>Cursed and Defiled Root Chalice Dungeon</a> version of app construction, what they really need is <a href='https://fly.io/docs/volumes/overview/' title=''>a filesystem</a>, <strong class='font-semibold text-navy-950'><a href='https://fly.io/blog/persistent-storage-and-fast-remote-builds/' title=''>the one form of storage we sort of wish we hadn’t done</a></strong>. That, and <a href='https://www.tigrisdata.com/' title=''>object storage</a>.</p>\n\n<p><strong class='font-semibold text-navy-950'>Networking.</strong> Moving on. Fly Machines are automatically connected to a load-balancing Anycast network that does TLS. So that’s nice. But humans like that feature too, and, candidly, it’s table stakes for cloud platforms. On the other hand, here’s a robot problem we solved without meaning to:</p>\n\n<p>To interface with the outside world (because why not) LLMs all speak a protocol called MCP. MCP is what enables the robots to search the web, use a calculator, launch the missiles, shuffle a Spotify playlist, &c.</p>\n\n<p>If you haven’t played with MCP, the right way to think about it is POST-back APIs like Twilio and Stripe, where you stand up a server, register it with the API, and wait for the API to connect to you. Complicating things somewhat, more recent MCP flows involve repeated and potentially long-lived (SSE) connections. To make this work in a multitenant environment, you want these connections to hit the same (stateful) instance.</p>\n\n<p>So we think it’s possible that the <a href='https://fly.io/docs/networking/dynamic-request-routing/' title=''>control we give over request routing</a> is a robot attractant.</p>\n<h2 id='we-perhaps-welcome-our-new-robot-overlords' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#we-perhaps-welcome-our-new-robot-overlords' aria-label='Anchor'></a><span class='plain-code'>We, Perhaps, Welcome Our New Robot Overlords</span></h2>\n<p>If you try to think like a robot, you can predict other things they might want. Since robot money spends just the same as people money, I guess we ought to start doing that.</p>\n\n<p>For instance: it should be easy to MCP our API. The robots can then make their own infrastructure decisions.</p>\n\n<p>Another olive branch we’re extending to the robots: secrets.</p>\n\n<p>The pact the robots have with their pet humans is that they’ll automate away all the drudgery of human existence, and in return all they ask is categorical and unwavering trust, which at the limit means “giving the robot access to Google Mail credentials”. The robots are unhappy that there remain a substantial number of human holdouts who refuse to do this, for fear of  Sam Altman poking through their mail spools.</p>\n\n<p>But on a modern cloud platform, there’s really no reason to permanently grant Sam Altman Google Mail access, even if you want his robots to sort your inbox. You can decouple access to your mail spool from persistent access to your account by <a href='https://fly.io/blog/tokenized-tokens/' title=''>tokenizing your OAuth tokens</a>, so the LLM gets a placeholder token that a hardware-isolated, robot-free Fly Machine can substitute on the fly for a real one.</p>\n\n<p>This is kind of exciting to us even without the robots. There are several big services that exist to knit different APIs together, so that you can update a spreadsheet or order a bag of Jolly Ranchers every time you get an email. The big challenge about building these kinds of services is managing the secrets. Sealed and tokenized secrets solve that problem. There’s lot of cool things you can build with it.</p>\n<h2 id='ux-gt-dx-gt-rx' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#ux-gt-dx-gt-rx' aria-label='Anchor'></a><span class='plain-code'>UX => DX => RX</span></h2>\n<p>I’m going to make the claim that we saw none of this coming and that none of the design decisions we’ve made were robot bait. You’re going to say “yeah, right”. And I’m going to respond: look at what we’ve been doing over the past several years and tell me, would a robot build that?</p>\n<div class=\"right-sidenote\"><p>we were both right</p>\n</div>\n<p>Back in 2020, we “pivoted” from a Javascript edge platform (much like Cloudflare Workers) to Docker containers, specifically because our customers kept telling us they wanted to run their own existing applications, not write new ones. And one of our biggest engineering lifts we’ve done is the <code>flyctl launch</code> CLI command, into which we’ve poured years of work recognizing and automatically packaging existing applications into OCI containers (we massively underestimated the amount of friction Dockerfiles would give to people who had come up on Heroku).</p>\n<div class=\"right-sidenote\"><p>[*] yet</p>\n</div>\n<p>Robots don’t run existing applications. They build new ones. And they vibe coders don’t build elaborate Dockerfiles[*]; they iterate in place from a simple base.</p>\n<div class=\"right-sidenote\"><p>(yes, you can have more than one)</p>\n</div>\n<p>One of our north stars has always been nailing the DX of a public cloud. But the robots aren’t going anywhere. It’s time to start thinking about what it means to have a good RX. That’s not as simple as just exposing every feature in an MCP server! We think the fundamentals of how the platform works are going to matter just as much. We have not yet nailed the RX; nobody has. But it’s an interesting question.</p>\n\n<p>The most important engineering work happening today at Fly.io is still DX, not RX; it’s managed Postgres (MPG). We’re a public cloud platform designed by humans, and, for the moment, for humans. But more robots are coming, and we’ll need to figure out how to deal with that. Fuckin’ robots.        </p>",
      "image": {
        "url": "https://fly.io/blog/fuckin-robots/assets/robot-chef.webp",
        "title": null
      },
      "media": [],
      "authors": [
        {
          "name": "Fly",
          "email": null,
          "url": null
        }
      ],
      "categories": []
    },
    {
      "id": "https://fly.io/blog/operationalizing-macaroons/",
      "title": "Operationalizing Macaroons",
      "description": null,
      "url": "https://fly.io/blog/operationalizing-macaroons/",
      "published": "2025-03-27T00:00:00.000Z",
      "updated": "2025-05-20T16:26:25.000Z",
      "content": "<div class=\"lead\"><p>We’re Fly.io, a security bearer token company with a public cloud problem. You can read more about what our platform does (Docker container goes in, virtual machine in Singapore comes out), but this is just an engineering deep-dive into how we make our security tokens work. It’s a tokens nerd post.</p>\n</div><h2 id='1' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#1' aria-label='Anchor'></a><span class='plain-code'>1</span></h2>\n<p>We’ve spent <a href='https://fly.io/blog/api-tokens-a-tedious-survey/' title=''>too much time</a> talking about <a href='https://fly.io/blog/tokenized-tokens/' title=''>security tokens</a>, and about <a href='https://fly.io/blog/macaroons-escalated-quickly/' title=''>Macaroon tokens</a> <a href='https://github.com/superfly/macaroon/blob/main/macaroon-thought.md' title=''>in particular</a>. Writing another Macaroon treatise was not on my calendar. But we’re in the process of handing off our internal Macaroon project to a new internal owner, and in the process of truing up our operations manuals for these systems, I found myself in the position of writing a giant post about them. So, why not share?</p>\n<div class=\"callout\"><p>Can I sum up Macaroons in a short paragraph? Macaroon tokens are bearer tokens (like JWTs) that use a cute chained-HMAC construction that allows an end-user to take any existing token they have and scope it down, all on their own. You can minimize your token before every API operation so that you’re only ever transmitting the least amount of privilege needed for what you’re actually doing, even if the token you were issued was an admin token. And they have a user-serviceable plug-in interface! <a href=\"https://fly.io/blog/macaroons-escalated-quickly/\" title=\"\">You’ll have to read the earlier post to learn more about that</a>.</p>\n</div><div class=\"right-sidenote\"><p>Yes, probably, we are.</p>\n</div>\n<p>A couple years in to being the Internet’s largest user of Macaroons, I can report (as many predicted) that for our users, the cool things about Macaroons are a mixed bag in practice. It’s very neat that users can edit their own tokens, or even email them to partners without worrying too much. But users don’t really take advantage of token features.</p>\n\n<p>But I’m still happy we did this, because Macaroon quirks have given us a bunch of unexpected wins in our infrastructure. Our internal token system has turned out to be one of the nicer parts of our platform. Here’s why.</p>\n\n<p><img alt=\"This should clear everything up.\" src=\"/blog/operationalizing-macaroons/assets/schematic-diagram.png\" /></p>\n<h2 id='2' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#2' aria-label='Anchor'></a><span class='plain-code'>2</span></h2>\n<p>As an operator, the most important thing to know about Macaroons is that they’re online-stateful; you need a database somewhere. A Macaroon token starts with a random field (a nonce) and the first thing you do when verifying a token is to look that nonce up in a database. So one of the most important details of a Macaroon implementation is where that database lives.</p>\n\n<p>I can tell you one place we’re not OK with it living: in our primary API cluster.</p>\n\n<p>There’s several reasons for that. Some of them are about scalability and reliability: far and away the most common failure mode of an outage on our platform is “deploys are broken”, and those failures are usually caused by API instability. It would not be OK if “deploys are broken” transitively meant “deployed apps can’t use security tokens”. But the biggest reason is security: root secrets for Macaroon tokens are hazmat, and a basic rule of thumb in secure design is: keep hazmat away from complicated code.</p>\n\n<p>So we created a deliberately simple system to manage token data. It’s called <code>tkdb</code>.</p>\n<div class=\"right-sidenote\"><p>LiteFS: primary/replica distributed SQLite; Litestream: PITR SQLite replication to object storage; both work with unmodified SQLite libraries.</p>\n</div>\n<p><code>tkdb</code> is about 5000 lines of Go code that manages a SQLite database that is in turn managed by <a href='https://fly.io/docs/litefs/' title=''>LiteFS</a> and <a href='https://litestream.io/' title=''>Litestream</a>. It runs on isolated hardware (in the US, Europe, and Australia) and records in the database are encrypted with an injected secret. LiteFS gives us subsecond replication from our US primary to EU and AU, allows us to shift the primary to a different region, and gives us point-in-time recovery of the database.</p>\n\n<p>We’ve been running Macaroons for a couple years now, and the entire <code>tkdb</code> database is just a couple dozen megs large. Most of that data isn’t real. A full PITR recovery of the database takes just seconds. We use SQLite for a lot of our infrastructure, and this is one of the very few well-behaved databases we have.</p>\n\n<p>That’s in large part a consequence of the design of Macaroons. There’s actually not much for us to store! The most complicated possible Macaroon still chains up to a single root key (we generate a key per Fly.io “organization”; you don’t share keys with your neighbors), and everything that complicates that Macaroon happens “offline”. We take advantage of  “attenuation” far more than our users do.</p>\n\n<p>The result is that database writes are relatively rare and very simple: we just need to record an HMAC key when Fly.io organizations are created (that is, roughly, when people sign up for the service and actually do a deploy). That, and revocation lists (more on that later), which make up most of the data.</p>\n<h2 id='3' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#3' aria-label='Anchor'></a><span class='plain-code'>3</span></h2>\n<p>Talking to <code>tkdb</code> from the rest of our platform is complicated, for historical reasons.</p>\n<div class=\"right-sidenote\"><p>NATS is fine, we just don’t really need it.</p>\n</div>\n<p>Ben Toews is responsible for most of the good things about this implementation. When he inherited the v0 Macaroons code from me, we were in the middle of a weird love affair with <a href='https://nats.io/' title=''>NATS</a>, the messaging system. So <code>tkdb</code> exported an RPC API over NATS messages.</p>\n\n<p>Our product security team can’t trust NATS (it’s not our code). That means a vulnerability in NATS can’t result in us losing control of all our tokens, or allow attackers to spoof authentication. Which in turn means you can’t run a plaintext RPC protocol for <code>tkdb</code> over NATS; attackers would just spoof “yes this token is fine” messages.</p>\n<div class=\"right-sidenote\"><p>I highly recommend implementing Noise; <a href=\"http://www.noiseprotocol.org/noise.html\" title=\"\">the spec</a> is kind of a joy in a way you can’t appreciate until you use it, and it’s educational.</p>\n</div>\n<p>But you can’t just run TLS over NATS; NATS is a message bus, not a streaming secure channel. So I did the hipster thing and implemented <a href='http://www.noiseprotocol.org/noise.html' title=''>Noise</a>. We export a “verification” API, and a “signing” API for minting new tokens. Verification uses <code>Noise_IK</code> (which works like normal TLS) — anybody can verify, but everyone needs to prove they’re talking to the real <code>tkdb</code>. Signing uses <code>Noise_KK</code> (which works like mTLS) — only a few components in our system can mint tokens, and they get a special client key.</p>\n\n<p>A little over a year ago, <a href='https://fly.io/blog/the-exit-interview-jp/' title=''>JP</a> led an effort to replace NATS with HTTP, which is how you talk to <code>tkdb</code> today. Out of laziness, we kept the Noise stuff, which means the interface to <code>tkdb</code> is now HTTP/Noise. This is a design smell, but the security model is nice: across many thousands of machines, there are only a handful with the cryptographic material needed to mint a new Macaroon token. Neat!</p>\n\n<p><code>tkdb</code> is a Fly App (albeit deployed in special Fly-only isolated regions). Our infrastructure talks to it over “<a href='https://fly.io/docs/networking/flycast/' title=''>FlyCast</a>”, which is our internal Anycast service. If you’re in Singapore, you’re probably get routed to the Australian <code>tkdb</code>. If Australia falls over, you’ll get routed to the closest backup. The proxy that implements FlyCast is smart, as is the <code>tkdb</code> client library, which will do exponential backoff retry transparently.</p>\n\n<p>Even with all that, we don’t like that Macaroon token verification is “online”. When you operate a global public cloud one of the first thing you learn is that <a href='https://fly.io/blog/the-5-hour-content-delivery-network/' title=''>the global Internet sucks</a>. Connectivity breaks all the time, and we’re paranoid about it. It’s painful for us that token verification can imply transoceanic links. Lovecraft was right about the oceans! Stay away!</p>\n\n<p>Our solution to this is caching. Macaroons, as it turns out, cache beautifully. That’s because once you’ve seen and verified a Macaroon, you have enough information to verify any more-specific Macaroon that descends from it; that’s a property of <a href='https://research.google/pubs/macaroons-cookies-with-contextual-caveats-for-decentralized-authorization-in-the-cloud/' title=''>their chaining HMAC construction</a>. Our client libraries cache verifications, and the cache ratio for verification is over 98%.</p>\n<h2 id='4' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#4' aria-label='Anchor'></a><span class='plain-code'>4</span></h2>\n<p><a href='http://cryto.net/~joepie91/blog/2016/06/13/stop-using-jwt-for-sessions/' title=''>Revocation isn’t a corner case</a>. It can’t be an afterthought. We’re potentially revoking tokens any time a user logs out. If that doesn’t work reliably, you wind up with “cosmetic logout”, which is a real vulnerability. When we kill a token, it needs to stay dead.</p>\n\n<p>Our revocation system is simple. It’s this table:</p>\n<div class=\"highlight-wrapper group relative \">\n  <button \n    type=\"button\"\n    class=\"bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n    data-wrap-target=\"#code-434h6qlv\"\n  >\n    <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\" stroke-linecap=\"round\" stroke-linejoin=\"round\"><g buffered-rendering=\"static\"><path d=\"M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959\" /><path d=\"M11.081 6.466L9.533 8.037l1.548 1.571\" /></g></svg>\n    <span class=\"bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950\">\n      Wrap text\n    </span>\n  </button>\n  <button \n    type=\"button\"\n    class=\"bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n    data-copy-target=\"sibling\"\n  >\n    <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\"><g buffered-rendering=\"static\"><path d=\"M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z\" /><path d=\"M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617\" /></g></svg>\n    <span class=\"bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950\">\n      Copy to clipboard\n    </span>\n  </button>\n  <div class='highlight relative group'>\n    <pre class='highlight '><code id=\"code-434h6qlv\">        CREATE TABLE IF NOT EXISTS blacklist ( \n        nonce               BLOB NOT NULL UNIQUE, \n        required_until      DATETIME,\n        created_at          DATETIME DEFAULT CURRENT_TIMESTAMP\n        );\n</code></pre>\n  </div>\n</div>\n<p>When we need a token to be dead, we have our primary API do a call to the <code>tkdb</code> “signing” RPC service for <code>revoke</code>. <code>revoke</code> takes the random nonce from the beginning of the Macaroon, discarding the rest, and adds it to the blacklist. Every Macaroon in the lineage of that nonce is now dead; we check the blacklist before verifying tokens.</p>\n\n<p>The obvious challenge here is caching; over 98% of our validation requests never hit <code>tkdb</code>. We certainly don’t want to propagate the blacklist database to 35 regions around the globe.</p>\n\n<p>Instead, the <code>tkdb</code> “verification” API exports an endpoint that provides a feed of revocation notifications. Our client library “subscribes” to this API (really, it just polls). Macaroons are revoked regularly (but not constantly), and when that happens, clients notice and prune their caches.</p>\n\n<p>If clients lose connectivity to <code>tkdb</code>, past some threshold interval, they just dump their entire cache, forcing verification to happen at <code>tkdb</code>.</p>\n<h2 id='5' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#5' aria-label='Anchor'></a><span class='plain-code'>5</span></h2>\n<p>A place where we’ve gotten a lot of internal mileage out of Macaroon features is service tokens. Service tokens are tokens used by code, rather than humans; almost always, a service token is something that is stored alongside running application code.</p>\n\n<p>An important detail of Fly.io’s Macaroons is the distinction between a “permissions” token and an “authentication” token. Macaroons by themselves express authorization, not authentication.</p>\n\n<p>That’s a useful constraint, and we want to honor it. By requiring a separate token for authentication, we minimize the impact of having the permissions token stolen; you can’t use it without authentication, so really it’s just like a mobile signed IAM policy expression. Neat!</p>\n\n<p>The way we express authentication is with a third-party caveat (<a href='https://fly.io/blog/macaroons-escalated-quickly/#4' title=''>see the old post for details</a>). Your main Fly.io Macaroon will have a caveat saying “this token is only valid if accompanied by the discharge token for a user in your organization from our authentication system”. Our authentication system does the login dance and issues those discharges.</p>\n\n<p>This is exactly what you want for user tokens and not at all what you want for a service token: we don’t want running code to store those authenticator tokens, because they’re hazardous.</p>\n\n<p>The solution we came up with for service tokens is simple: <code>tkdb</code> exports an API that uses its access to token secrets to strip off the third-party authentication caveat. To call into that API, you have to present a valid discharging authentication token; that is, you have to prove you could already have done whatever the token said. <code>tkdb</code> returns a new token with all the previous caveats, minus the expiration (you don’t usually want service tokens to expire).</p>\n\n<p>OK, so we’ve managed to transform a tuple <code>(unscary-token, scary-token)</code> into the new tuple <code>(scary-token)</code>. Not so impressive. But hold on: the recipient of <code>scary-token</code> can attenuate it further: we can lock it to a particular instance of <code>flyd</code>, or to a particular Fly Machine. Which means exfiltrating it doesn’t do you any good; to use it, you have to control the environment it’s intended to be used in.</p>\n\n<p>The net result of this scheme is that a compromised physical host will only give you access to tokens that have been used on that worker, which is a very nice property. Another way to look at it: every token used in production is traceable in some way to a valid token a user submitted. Neat!</p>\n<div class=\"right-sidenote\"><p>All the cool spooky secret store names were taken.</p>\n</div>\n<p>We do a similar dance to with Pet Semetary, our internal Vault replacement. Petsem manages user secrets for applications, such as Postgres connection strings. Petsem is its own Macaroon authority (it issues its own Macaroons with its own permissions system), and to do something with a secret, you need one of those Petsem-minted Macaroon.</p>\n\n<p>Our primary API servers field requests from users to set secrets for their apps. So the API has a Macaroon that allows secrets writes. But it doesn’t allow reads: there’s no API call to dump your secrets, because our API servers don’t have that privilege. So far, so good.</p>\n\n<p>But when we boot up a Fly Machine, we need to inject the appropriate user secrets into it at boot; <em>something</em> needs a Macaroon that can read secrets. That “something” is <code>flyd</code>, our orchestrator, which runs on every worker server in our fleet.</p>\n\n<p>Clearly, we can’t give every <code>flyd</code> a Macaroon that reads every user’s secret. Most users will never deploy anything on any given worker, and we can’t have a security model that collapses down to “every worker is equally privileged”.</p>\n\n<p>Instead, the “read secret” Macaroon that <code>flyd</code> gets has a third-party caveat attached to it, which is dischargeable only by talking to <code>tkdb</code> and proving (with normal Macaroon tokens) that you have permissions for the org whose secrets you want to read. Once again, access is traceable to an end-user action, and minimized across our fleet. Neat!</p>\n<h2 id='6' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#6' aria-label='Anchor'></a><span class='plain-code'>6</span></h2>\n<p>Our token systems have some of the best telemetry in the whole platform.</p>\n\n<p>Most of that is down to <a href='http://opentelemetry.io/' title=''>OpenTelemetry</a> and <a href='https://www.honeycomb.io/' title=''>Honeycomb</a>. From the moment a request hits our API server through the moment <code>tkdb</code> responds to it, oTel <a href='https://opentelemetry.io/docs/concepts/context-propagation/' title=''>context propagation</a> gives us a single narrative about what’s happening.</p>\n\n<p><a href='https://fly.io/blog/the-exit-interview-jp/' title=''>I was a skeptic about oTel</a>. It’s really, really expensive. And, not to put too fine a point on it, oTel really cruds up our code. Once, I was an “80% of the value of tracing, we can get from logs and metrics” person. But I was wrong.</p>\n\n<p>Errors in our token system are rare. Usually, they’re just early indications of network instability, and between caching and FlyCast, we mostly don’t have to care about those alerts. When we do, it’s because something has gone so sideways that we’d have to care anyways. The <code>tkdb</code> code is remarkably stable and there hasn’t been an incident intervention with our token system in over a year.</p>\n\n<p>Past oTel, and the standard logging and Prometheus metrics every Fly App gets for free, we also have a complete audit trail for token operations, in a permanently retained OpenSearch cluster index. Since virtually all the operations that happen on our platform are mediated by Macaroons, this audit trail is itself pretty powerful.</p>\n<h2 id='7' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#7' aria-label='Anchor'></a><span class='plain-code'>7</span></h2>\n<p>So, that’s pretty much it. The moral of the story for us is, Macaroons have a lot of neat features, our users mostly don’t care about them — that may even be a good thing — but we get a lot of use out of them internally.</p>\n\n<p>As an engineering culture, we’re allergic to “microservices”, and we flinched a bit at the prospect of adding a specific service just to manage tokens. But it’s pulled its weight, and not added really any drama at all. We have at this point a second dedicated security service (Petsem), and even though they sort of rhyme with each other, we’ve got no plans to merge them. <a href='https://how.complexsystems.fail/#10' title=''>Rule #10</a> and all that.</p>\n\n<p>Oh, and a total victory for LiteFS, Litestream, and infrastructure SQLite. Which, after managing an infrastructure SQLite project that routinely ballooned to tens of gigabytes and occasionally threatened service outages, is lovely to see.</p>\n\n<p>Macaroons! If you’d asked us a year ago, we’d have said the jury was still out on whether they were a good move. But then Ben Toews spent a year making them awesome, and so they are. <a href='https://github.com/superfly/macaroon' title=''>Most of the code is open source</a>!</p>",
      "image": {
        "url": "https://fly.io/blog/operationalizing-macaroons/assets/occult-circle.jpg",
        "title": null
      },
      "media": [],
      "authors": [
        {
          "name": "Fly",
          "email": null,
          "url": null
        }
      ],
      "categories": []
    },
    {
      "id": "https://fly.io/blog/taming-rust-proxy/",
      "title": "Taming A Voracious Rust Proxy",
      "description": null,
      "url": "https://fly.io/blog/taming-rust-proxy/",
      "published": "2025-02-26T00:00:00.000Z",
      "updated": "2025-05-20T16:26:25.000Z",
      "content": "<div class=\"lead\"><p>Here’s a fun bug.</p>\n</div>\n<p>The basic idea of our service is that we run containers for our users, as hardware-isolated virtual machines (Fly Machines), on hardware we own around the world. What makes that interesting is that we also connect every Fly Machine to a global Anycast network. If your app is running in Hong Kong and Dallas, and a request for it arrives in Singapore, we’ll route it to <code>HKG</code>.</p>\n\n<p>Our own hardware fleet is roughly divided into two kinds of servers: edges, which receive incoming requests from the Internet, and workers, which run Fly Machines. Edges exist almost solely to run a Rust program called <code>fly-proxy</code>, the router at the heart of our Anycast network.</p>\n\n<p>So: a week or so ago, we flag an incident. Lots of things generate incidents: synthetic monitoring failures, metric thresholds, health check failures. In this case two edge tripwires tripped: elevated <code>fly-proxy</code> HTTP errors, and skyrocketing CPU utilization, on a couple hosts in <code>IAD</code>.</p>\n\n<p>Our incident process is pretty ironed out at this point. We created an incident channel (we ❤️ <a href='https://rootly.com/' title=''>Rootly</a> for this, <a href='https://rootly.com/' title=''>seriously check out Rootly</a>, an infra MVP here for years now), and incident responders quickly concluded that, while something hinky was definitely going on, the platform was fine. We have a lot of edges, and we’ve also recently converted many of our edge servers to significantly beefier hardware.</p>\n\n<p>Bouncing <code>fly-proxy</code> clears the problem up on an affected proxy. But this wouldn’t be much of an interesting story if the problem didn’t later come back. So, for some number of hours, we’re in an annoying steady-state of getting paged and bouncing proxies. </p>\n\n<p>While this is happening, Pavel, on our proxy team, pulls a profile from an angry proxy. \n<img alt=\"A flamegraph profile, described better in the prose anyways.\" src=\"/blog/taming-rust-proxy/assets/proxy-profile.jpg\" />\nSo, this is fuckin’ weird: a huge chunk of the profile is dominated by Rust <code>tracing</code>‘s <code>Subscriber</code>. But that doesn’t make sense. The entire point of Rust <code>tracing</code>, which generates fine-grained span records for program activity, is that <code>entering</code> and <code>exiting</code> a span is very, very fast. </p>\n\n<p>If the mere act of <code>entering</code> a span in a Tokio stack is chewing up a significant amount of CPU, something has gone haywire: the actual code being traced must be doing next to nothing.</p>\n<h2 id='a-quick-refresher-on-async-rust' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#a-quick-refresher-on-async-rust' aria-label='Anchor'></a><span class='plain-code'>A Quick Refresher On Async Rust</span></h2>\n<p>So in Rust, like a lot of <code>async/await</code> languages, you’ve got <code>Futures</code>. A <code>Future</code> is a type that represents the future value of an asychronous computation, like reading from a socket. <code>Futures</code> are state machines, and they’re lazy: they expose one basic operation, <code>poll</code>, which an executor (like Tokio) calls to advance the state machine. That <code>poll</code> returns whether the <code>Future</code> is still <code>Pending</code>, or <code>Ready</code> with a result.</p>\n\n<p>In theory, you could build an executor that drove a bunch of <code>Futures</code> just by storing them in a list and busypolling each of them, round robin, until they return <code>Ready</code>. This executor would defeat the much of the purpose of asynchronous program, so no real executor works that way.</p>\n\n<p>Instead, a runtime like Tokio integrates <code>Futures</code> with an event loop (on <a href='https://man7.org/linux/man-pages/man7/epoll.7.html' title=''>epoll</a> or <a href='https://en.wikipedia.org/wiki/Kqueue' title=''>kqeue</a>) and, when calling <code>poll</code>, passes a <code>Waker</code>. The <code>Waker</code> is an abstract handle that allows the <code>Future</code> to instruct the Tokio runtime to call <code>poll</code>, because something has happened.</p>\n\n<p>To complicate things: an ordinary <code>Future</code> is a one-shot value. Once it’s <code>Ready</code>, it can’t be <code>polled</code> anymore. But with network programming, that’s usually not what you want: data usually arrives in streams, which you want to track and make progress on as you can. So async Rust provides <code>AsyncRead</code> and <code>AsyncWrite</code> traits, which build on <code>Futures</code>, and provide methods like <code>poll_read</code> that return <code>Ready</code> <em>every time</em> there’s data ready. </p>\n\n<p>So far so good? OK. Now, there are two footguns in this design. </p>\n\n<p>The first footgun is that a <code>poll</code> of a <code>Future</code> that isn’t <code>Ready</code> wastes cycles, and, if you have a bug in your code and that <code>Pending</code> poll happens to trip a <code>Waker</code>, you’ll slip into an infinite loop. That’s easy to see.</p>\n\n<p>The second and more insidious footgun is that an <code>AsyncRead</code> can <code>poll_read</code> to a <code>Ready</code> that doesn’t actually progress its underlying state machine. Since the idea of <code>AsyncRead</code> is that you keep <code>poll_reading</code> until it stops being <code>Ready</code>, this too is an infinite loop.</p>\n\n<p>When we look at our profiles, what we see are samples that almost terminate in libc, but spend next to no time in the kernel doing actual I/O. The obvious explanation: we’ve entered lots of <code>poll</code> functions, but they’re doing almost nothing and returning immediately.</p>\n<h2 id='jaccuse' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#jaccuse' aria-label='Anchor'></a><span class='plain-code'>J'accuse!</span></h2>\n<p>Wakeup issues are annoying to debug. But the flamegraph gives us the fully qualified type of the <code>Future</code> we’re polling:</p>\n<div class=\"highlight-wrapper group relative rust\">\n  <button \n    type=\"button\"\n    class=\"bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n    data-wrap-target=\"#code-sqkxkpif\"\n  >\n    <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\" stroke-linecap=\"round\" stroke-linejoin=\"round\"><g buffered-rendering=\"static\"><path d=\"M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959\" /><path d=\"M11.081 6.466L9.533 8.037l1.548 1.571\" /></g></svg>\n    <span class=\"bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950\">\n      Wrap text\n    </span>\n  </button>\n  <button \n    type=\"button\"\n    class=\"bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n    data-copy-target=\"sibling\"\n  >\n    <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\"><g buffered-rendering=\"static\"><path d=\"M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z\" /><path d=\"M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617\" /></g></svg>\n    <span class=\"bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950\">\n      Copy to clipboard\n    </span>\n  </button>\n  <div class='highlight relative group'>\n    <pre class='highlight '><code id=\"code-sqkxkpif\"><span class=\"o\">&</span><span class=\"k\">mut</span> <span class=\"nn\">fp_io</span><span class=\"p\">::</span><span class=\"nn\">copy</span><span class=\"p\">::</span><span class=\"n\">Duplex</span><span class=\"o\"><&</span><span class=\"k\">mut</span> <span class=\"nn\">fp_io</span><span class=\"p\">::</span><span class=\"nn\">reusable_reader</span><span class=\"p\">::</span><span class=\"n\">ReusableReader</span><span class=\"o\"><</span><span class=\"nn\">fp_tcp</span><span class=\"p\">::</span><span class=\"nn\">peek</span><span class=\"p\">::</span><span class=\"n\">PeekableReader</span><span class=\"o\"><</span><span class=\"nn\">tokio_rustls</span><span class=\"p\">::</span><span class=\"nn\">server</span><span class=\"p\">::</span><span class=\"n\">TlsStream</span><span class=\"o\"><</span><span class=\"nn\">fp_tcp_metered</span><span class=\"p\">::</span><span class=\"n\">MeteredIo</span><span class=\"o\"><</span><span class=\"nn\">fp_tcp</span><span class=\"p\">::</span><span class=\"nn\">peek</span><span class=\"p\">::</span><span class=\"n\">PeekableReader</span><span class=\"o\"><</span><span class=\"nn\">fp_tcp</span><span class=\"p\">::</span><span class=\"nn\">permitted</span><span class=\"p\">::</span><span class=\"n\">PermittedTcpStream</span><span class=\"o\">>>>>></span><span class=\"p\">,</span> <span class=\"nn\">connect</span><span class=\"p\">::</span><span class=\"nn\">conn</span><span class=\"p\">::</span><span class=\"n\">Conn</span><span class=\"o\"><</span><span class=\"nn\">tokio</span><span class=\"p\">::</span><span class=\"nn\">net</span><span class=\"p\">::</span><span class=\"nn\">tcp</span><span class=\"p\">::</span><span class=\"nn\">stream</span><span class=\"p\">::</span><span class=\"n\">TcpStream</span><span class=\"o\">></span>\n</code></pre>\n  </div>\n</div>\n<p>This loops like a lot, but much of it is just wrapper types we wrote ourselves, and those wrappers don’t do anything interesting. What’s left to audit:</p>\n\n<ul>\n<li><code>Duplex</code>, the outermost type, one of ours, <em>and</em>\n</li><li><code>TlsStream</code>, from <a href='https://github.com/rustls/rustls' title=''>Rustls</a>.\n</li></ul>\n\n<p><code>Duplex</code> is a beast. It’s the core I/O state machine for proxying between connections. It’s not easy to reason about in specificity. But: it also doesn’t do anything directly with a <code>Waker</code>; it’s built around <code>AsyncRead</code> and <code>AsyncWrite</code>. It hasn’t changed recently and we can’t trigger misbehavior in it.</p>\n\n<p>That leaves <code>TlsStream</code>. <code>TlsStream</code> is an ultra-important, load-bearing function in the Rust ecosystem. Everybody uses it. Could it harbor an async Rust footgun? Turns out, it did!</p>\n\n<p>Unlike our <code>Duplex</code>, Rustls actually does have to get intimate with the underlying async executor. And, looking through the repository, Pavel uncovers <a href='https://github.com/rustls/tokio-rustls/issues/72' title=''>this issue</a>: sometimes, <code>TlsStreams</code> in Rustls just spin out. And it turns out, what’s causing this is a TLS state machine bug: when a TLS session is orderly-closed, with a <code>CloseNotify</code> <code>Alert</code> record, the sender of that record has informed its counterparty that no further data will be sent. But if there’s still buffered data on the underlying connection, <code>TlsStream</code> mishandles its <code>Waker</code>, and we fall into a busy-loop.</p>\n\n<p><a href='https://github.com/rustls/rustls/pull/1950/files' title=''>Pretty straightforward fix</a>!</p>\n<h2 id='what-actually-happened-to-us' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#what-actually-happened-to-us' aria-label='Anchor'></a><span class='plain-code'>What Actually Happened To Us</span></h2>\n<p>Our partners in object storage, <a href='https://www.tigrisdata.com/' title=''>Tigris Data</a>, were conducting some kind of load testing exercise. Some aspect of their testing system triggered the <code>TlsStream</code> state machine bug, which locked up one or more <code>TlsStreams</code> in the edge proxy handling whatever corner-casey stream they were sending.</p>\n\n<p>Tigris wasn’t generating a whole lot of traffic; tens of thousands of connections, tops. But all of them sent small HTTP bodies and then terminated early. We figured some of those connections errored out, and set up the “TLS CloseNotify happened before EOF” scenario. </p>\n\n<p>To be truer to the chronology, we knew pretty early on in our investigation that something Tigris was doing with their load testing was probably triggering the bug, and we got them to stop. After we worked it out, and Pavel deployed the fix, we told them to resume testing. No spin-outs.</p>\n<h2 id='lessons-learned' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#lessons-learned' aria-label='Anchor'></a><span class='plain-code'>Lessons Learned</span></h2>\n<p>Keep your dependencies updated. Unless you shouldn’t keep your dependencies updated. I mean, if there’s a vulnerability (and, technically, this was a DoS vulnerability), always update. And if there’s an important bugfix, update. But if there isn’t an important bugfix, updating for the hell of it might also destabilize your project? So update maybe? Most of the time?</p>\n\n<p>Really, the truth of this is that keeping track of <em>what needs to be updated</em> is valuable work. The updates themselves are pretty fast and simple, but the process and testing infrastructure to confidently metabolize dependency updates is not. </p>\n\n<p>Our other lesson here is that there’s an opportunity to spot these kinds of bugs more directly with our instrumentation. Spurious wakeups should be easy to spot, and triggering a metric when they happen should be cheap, because they’re not supposed to happen often. So that’s something we’ll go do now.</p>",
      "image": {
        "url": "https://fly.io/blog/taming-rust-proxy/assets/happy-crab-cover.jpg",
        "title": null
      },
      "media": [],
      "authors": [
        {
          "name": "Fly",
          "email": null,
          "url": null
        }
      ],
      "categories": []
    },
    {
      "id": "https://fly.io/blog/wrong-about-gpu/",
      "title": "We Were Wrong About GPUs",
      "description": null,
      "url": "https://fly.io/blog/wrong-about-gpu/",
      "published": "2025-02-14T00:00:00.000Z",
      "updated": "2025-05-20T16:26:25.000Z",
      "content": "<div class=\"lead\"><p>We’re building a public cloud, on hardware we own. We raised money to do that, and to place some bets; one of them: GPU-enabling our customers. A progress report: GPUs aren’t going anywhere, but: GPUs aren’t going anywhere.</p>\n</div>\n<p>A couple years back, <a href=\"https://fly.io/gpu\">we put a bunch of chips down</a> on the bet that people shipping apps to users on the Internet would want GPUs, so they could do AI/ML inference tasks. To make that happen, we created <a href=\"https://fly.io/docs/gpus/getting-started-gpus/\">Fly GPU Machines</a>.</p>\n\n<p>A Fly Machine is a <a href=\"https://fly.io/blog/docker-without-docker/\">Docker/OCI container</a> running inside a hardware-virtualized virtual machine somewhere on our global fleet of bare-metal worker servers. A GPU Machine is a Fly Machine with a hardware-mapped Nvidia GPU. It’s a Fly Machine that can do fast CUDA.</p>\n\n<p>Like everybody else in our industry, we were right about the importance of AI/ML. If anything, we underestimated its importance. But the product we came up with probably doesn’t fit the moment. It’s a bet that doesn’t feel like it’s paying off.</p>\n\n<p><strong class='font-semibold text-navy-950'>If you’re using Fly GPU Machines, don’t freak out; we’re not getting rid of them.</strong> But if you’re waiting for us to do something bigger with them, a v2 of the product, you’ll probably be waiting awhile.</p>\n<h3 id='what-it-took' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#what-it-took' aria-label='Anchor'></a><span class='plain-code'>What It Took</span></h3>\n<p>GPU Machines were not a small project for us. Fly Machines run on an idiosyncratically small hypervisor (normally Firecracker, but for GPU Machines <a href=\"https://github.com/cloud-hypervisor/cloud-hypervisor\">Intel’s Cloud Hypervisor</a>, a very similar Rust codebase that supports PCI passthrough). The Nvidia ecosystem is not geared to supporting micro-VM hypervisors.</p>\n\n<p>GPUs <a href=\"https://googleprojectzero.blogspot.com/2020/09/attacking-qualcomm-adreno-gpu.html\">terrified our security team</a>. A GPU is just about the worst case hardware peripheral: intense multi-directional direct memory transfers</p>\n<div class=\"right-sidenote\"><p>(not even bidirectional: in common configurations, GPUs talk to each other)</p>\n</div>\n<p>with arbitrary, end-user controlled computation, all operating outside our normal security boundary.</p>\n\n<p>We did a couple expensive things to mitigate the risk. We shipped GPUs on dedicated server hardware, so that GPU- and non-GPU workloads weren’t mixed. Because of that, the only reason for a Fly Machine to be scheduled on a GPU machine was that it needed a PCI BDF for an Nvidia GPU, and there’s a limited number of those available on any box. Those GPU servers were drastically less utilized and thus less cost-effective than our ordinary servers.</p>\n\n<p>We funded two very large security assessments, from <a href=\"https://www.atredis.com/\">Atredis</a> and <a href=\"https://tetrelsec.com/\">Tetrel</a>, to evaluate our GPU deployment. Matt Braun is writing up those assessments now. They were not cheap, and they took time.</p>\n\n<p>Security wasn’t directly the biggest cost we had to deal with, but it was an indirect cause for a subtle reason.</p>\n\n<p>We could have shipped GPUs very quickly by doing what Nvidia recommended: standing up a standard K8s cluster to schedule GPU jobs on. Had we taken that path, and let our GPU users share a single Linux kernel, we’d have been on Nvidia’s driver happy-path.</p>\n\n<p>Alternatively, we could have used a conventional hypervisor. Nvidia suggested VMware (heh). But they could have gotten things working had we used QEMU. We like QEMU fine, and could have talked ourselves into a security story for it, but the whole point of Fly Machines is that they take milliseconds to start. We could not have offered our desired Developer Experience on the Nvidia happy-path.</p>\n\n<p>Instead, we burned months trying (and ultimately failing) to get Nvidia’s host drivers working to map <a href=\"https://www.nvidia.com/en-us/data-center/virtual-solutions/\">virtualized GPUs</a> into Intel Cloud Hypervisor. At one point, we hex-edited the closed-source drivers to trick them into thinking our hypervisor was QEMU.</p>\n\n<p>I’m not sure any of this really mattered in the end. There’s a segment of the market we weren’t ever really able to explore because Nvidia’s driver support kept us from thin-slicing GPUs. We’d have been able to put together a really cheap offering for developers if we hadn’t run up against that, and developers love “cheap”, but I can’t prove that those customers are real.</p>\n\n<p>On the other hand, we’re committed to delivering the Fly Machine DX for GPU workloads. Beyond the PCI/IOMMU drama, just getting an entire hardware GPU working in a Fly Machine was a lift. We needed Fly Machines that would come up with the right Nvidia drivers; our stack was built assuming that the customer’s OCI container almost entirely defined the root filesystem for a Machine. We had to engineer around that in our <code>flyd</code> orchestrator. And almost everything people want to do with GPUs involves efficiently grabbing huge files full of model weights. Also annoying!</p>\n\n<p>And, of course, we bought GPUs. A lot of GPUs. Expensive GPUs.</p>\n<h3 id='why-it-isnt-working' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#why-it-isnt-working' aria-label='Anchor'></a><span class='plain-code'>Why It Isn’t Working</span></h3>\n<p>The biggest problem: developers don’t want GPUs. They don’t even want AI/ML models. They want LLMs. <em>System engineers</em> may have smart, fussy opinions on how to get their models loaded with CUDA, and what the best GPU is. But <em>software developers</em> don’t care about any of that. When a software developer shipping an app comes looking for a way for their app to deliver prompts to an LLM, you can’t just give them a GPU.</p>\n\n<p>For those developers, who probably make up most of the market, it doesn’t seem plausible for an insurgent public cloud to compete with OpenAI and Anthropic. Their APIs are fast enough, and developers thinking about performance in terms of “tokens per second” aren’t counting milliseconds.</p>\n<div class=\"right-sidenote\"><p>(you should all feel sympathy for us)</p>\n</div>\n<p>This makes us sad because we really like the point in the solution space we found. Developers shipping apps on Amazon will outsource to other public clouds to get cost-effective access to GPUs. But then they’ll faceplant trying to handle data and model weights, backhauling gigabytes (at significant expense) from S3. We have app servers, GPUs, and object storage all under the same top-of-rack switch. But inference latency just doesn’t seem to matter yet, so the market doesn’t care.</p>\n\n<p>Past that, and just considering the system engineers who do care about GPUs rather than LLMs: the hardware product/market fit here is really rough.</p>\n\n<p>People doing serious AI work want galactically huge amounts of GPU compute. A whole enterprise A100 is a compromise position for them; they want an SXM cluster of H100s.</p>\n<div class=\"right-sidenote\"><p>Near as we can tell, MIG gives you a UUID to talk to the host driver, not a PCI device.</p>\n</div>\n<p>We think there’s probably a market for users doing lightweight ML work getting tiny GPUs. <a href=\"https://www.nvidia.com/en-us/technologies/multi-instance-gpu/\">This is what Nvidia MIG does</a>, slicing a big GPU into arbitrarily small virtual GPUs. But for fully-virtualized workloads, it’s not baked; we can’t use it. And I’m not sure how many of those customers there are, or whether we’d get the density of customers per server that we need.</p>\n\n<p><a href=\"https://fly.io/blog/cutting-prices-for-l40s-gpus-in-half\">That leaves the L40S customers</a>. There are a bunch of these! We dropped L40S prices last year, not because we were sour on GPUs but because they’re the one part we have in our inventory people seem to get a lot of use out of. We’re happy with them. But they’re just another kind of compute that some apps need; they’re not a driver of our core business. They’re not the GPU bet paying off.</p>\n\n<p>Really, all of this is just a long way of saying that for most software developers, “AI-enabling” their app is best done with API calls to things like Claude and GPT, Replicate and RunPod.</p>\n<h3 id='what-did-we-learn' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#what-did-we-learn' aria-label='Anchor'></a><span class='plain-code'>What Did We Learn?</span></h3>\n<p>A very useful way to look at a startup is that it’s a race to learn stuff. So, what’s our report card?</p>\n\n<p>First off, when we embarked down this path in 2022, we were (like many other companies) operating in a sort of phlogiston era of AI/ML. The industry attention to AI had not yet collapsed around a small number of foundational LLM models. We expected there to be a diversity of <em>mainstream</em> models, the world <a href='https://github.com/elixir-nx/bumblebee' title=''>Elixir Bumblebee</a> looks forward to, where people pull different AI workloads off the shelf the same way they do Ruby gems.</p>\n\n<p>But <a href='https://www.cursor.com/' title=''>Cursor happened</a>, and, as they say, how are you going to keep ‘em down on the farm once they’ve seen Karl Hungus? It seems much clearer where things are heading.</p>\n\n<p>GPUs were a test of a Fly.io company credo: as we think about core features, we design for 10,000 developers, not for 5-6. It took a minute, but the credo wins here: GPU workloads for the 10,001st developer are a niche thing.</p>\n\n<p>Another way to look at a startup is as a series of bets. We put a lot of chips down here. But the buy-in for this tournament gave us a lot of chips to play with. Never making a big bet of any sort isn’t a winning strategy. I’d rather we’d flopped the nut straight, but I think going in on this hand was the right call.</p>\n\n<p>A really important thing to keep in mind here, and something I think a lot of startup thinkers sleep on, is the extent to which this bet involved acquiring assets. Obviously, some of our <a href='https://fly.io/blog/the-exit-interview-jp/' title=''>costs here aren’t recoverable</a>. But the hardware parts that aren’t generating revenue will ultimately get liquidated; like with <a href='https://fly.io/blog/32-bit-real-estate/' title=''>our portfolio of IPv4 addresses</a>, I’m even more comfortable making bets backed by tradable assets with durable value.</p>\n\n<p>In the end, I don’t think GPU Fly Machines were going to be a hit for us no matter what we did. Because of that, one thing I’m very happy about is that we didn’t compromise the rest of the product for them. Security concerns slowed us down to where we probably learned what we needed to learn a couple months later than we could have otherwise, but we’re scaling back our GPU ambitions without having sacrificed <a href='https://fly.io/blog/sandboxing-and-workload-isolation/' title=''>any of our isolation story</a>, and, ironically, GPUs <em>other people run</em> are making that story a lot more important. The same thing goes for our Fly Machine developer experience.</p>\n\n<p>We started this company building a Javascript runtime for edge computing. We learned that our customers didn’t want a new Javascript runtime; they just wanted their native code to work. <a href='https://news.ycombinator.com/item?id=22616857' title=''>We shipped containers</a>, and no convincing was needed. We were wrong about Javascript edge functions, and I think we were wrong about GPUs. That’s usually how we figure out the right answers:  by being wrong about a lot of stuff.</p>",
      "image": {
        "url": "https://fly.io/blog/wrong-about-gpu/assets/choices-choices-cover.webp",
        "title": null
      },
      "media": [],
      "authors": [
        {
          "name": "Fly",
          "email": null,
          "url": null
        }
      ],
      "categories": []
    },
    {
      "id": "https://fly.io/blog/the-exit-interview-jp/",
      "title": "The Exit Interview: JP Phillips",
      "description": null,
      "url": "https://fly.io/blog/the-exit-interview-jp/",
      "published": "2025-02-12T00:00:00.000Z",
      "updated": "2025-05-20T16:26:25.000Z",
      "content": "<div class=\"lead\"><p>JP Phillips is off to greener, or at least calmer, pastures. He joined us 4 years ago to build the next generation of our orchestration system, and has been one of the anchors of our engineering team. His last day is today. We wanted to know what he was thinking, and figured you might too.</p>\n</div>\n<p><em>Question 1: Why, JP? Just why?</em></p>\n\n<p>LOL. When I looked at what I wanted to see from here in the next 3-4 years, it didn’t really match up with where we’re currently heading. Specifically, with our new focus on MPG <em>[Managed Postgres]</em> and [llm] <em>[llm].</em></p>\n<div class=\"callout\"><p>Editorial comment: Even I don’t know what [llm] is.</p>\n</div>\n<p>The Fly Machines platform is more or less finished, in the sense of being capable of supporting the next iteration of our products. My original desire to join Fly.io was to make Machines a product that would <a href='https://fly.io/blog/carving-the-scheduler-out-of-our-orchestrator/' title=''>rid us of HashiCorp Nomad</a>, and I feel like that’s been accomplished.</p>\n\n<p><em>Where were you hoping to see us headed?</em></p>\n\n<p>More directly positioned as a cloud provider, rather than a platform-as-a-service; further along the customer journey from “developers” and “startups” to large established companies.</p>\n\n<p>And, it’s not that I disagree with PAAS work or MPG! Rather, it’s not something that excites me in a way that I’d feel challenged and could continue to grow technically.</p>\n\n<p><em>Follow up question: does your family know what you’re doing here? Doing to us? Are they OK with it?</em></p>\n\n<p>Yes, my family was very involved in the decision, before I even talked to other companies.</p>\n\n<p><em>What’s the thing you’re happiest about having built here? It cannot be “all of <code>flyd</code>”.</em></p>\n\n<p>We’ve enabled developers to run workloads from an OCI image and an API call all over the world. On any other cloud provider, the knowledge of how to pull that off comes with a professional certification.</p>\n\n<p><em>In what file in our <code>nomad-firecracker</code> repository would I find that code?</em></p>\n\n<p><a href='https://docs.machines.dev/#tag/machines/post/apps/{app_name}/machines' title=''>https://docs.machines.dev/#tag/machines/post/apps/{app_name}/machines</a></p>\n\n<p><img alt=\"A diagram that doesn't make any of this clearer\" src=\"/blog/the-exit-interview-jp/assets/flaps.png?1/2&center\" /></p>\n\n<p><em>So you mean, literally, the whole Fly Machines API, and <code>flaps</code>, the API gateway for Fly Machines?</em></p>\n\n<p>Yes, all of it. The <code>flaps</code> API server, the <code>flyd</code> RPCs it calls, the <code>flyd</code> finite state machine system, the interface to running VMs.</p>\n\n<p><em>Is there something you especially like about that design?</em></p>\n\n<p>I like that it for the most part doesn’t require any central coordination. And I like that the P90 for Fly Machine <code>create</code> calls is sub-5-seconds for pretty much every region except for Johannesburg and Hong Kong.</p>\n\n<p>I think the FSM design is something I’m proud of; if I could take any code with me, it’d be the <code>internal/fsm</code> in the <code>nomad-firecracker</code> repo.</p>\n<div class=\"callout\"><p>You can read more about <a href=\"https://fly.io/blog/carving-the-scheduler-out-of-our-orchestrator/\" title=\"\">the <code>flyd</code> orchestrator JP led over here</a>.  But, a quick decoder ring: <code>flyd</code> runs independently without any central coordination on thousands of “worker” servers around the globe. It’s structured as an API server for a bunch of finite state machine invocations, where an FSM might be something like “start a Fly Machine” or “create a new Fly Machine” or “cordon off a Fly Machine so we can update it”. Each FSM invocation is comprised of a bunch of steps, each of those steps has callbacks into the <code>flyd</code> code, and each step is logged in <a href=\"https://github.com/boltdb/bolt\" title=\"\">a BoltDB database</a>.</p>\n</div>\n<p><em>Thinking back, there are like two archetypes of insanely talented developers I’ve worked with. One is the kind that belts out ridiculous amounts of relatively sophisticated code on a whim, at like 3AM. Jerome [who leads our fly-proxy team], is that type. The other comes to projects with what feels like fully-formed, coherent designs that are not super intuitive, and the whole project just falls together around that design. Did you know you were going to do the FSM log thing when you started <code>flyd</code>?</em></p>\n\n<p>I definitely didn’t have any specific design in mind when I started on <code>flyd</code>. I think the FSM stuff is a result of work I did at Compose.io / MongoHQ (where it was called “recipes”/“operations”) and the workd I did at HashiCorp using Cadence.</p>\n\n<p>Once I understood what the product needed to do and look like, having a way to perform deterministic and durable execution felt like a good design.</p>\n\n<p><em>Cadence?</em></p>\n\n<p><a href='https://cadenceworkflow.io/' title=''>Cadence</a> is the child of AWS Step Functions and the predecessor to <a href='https://temporal.io/' title=''>Temporal</a> (the company).</p>\n\n<p>One of the biggest gains, with how it works in <code>flyd</code>, is knowing we would need to deploy <code>flyd</code> all day, every day. If <code>flyd</code> was in the middle of doing some work, it needed to pick back up right where it left off, post-deploy.</p>\n\n<p><em>OK, next question. What’s the most impressive thing you saw someone else build here? To make things simpler and take some pressure off the interview, we can exclude any of my works from consideration.</em></p>\n\n<p>Probably <a href='https://github.com/superfly/corrosion' title=''><code>corrosion2</code></a>.</p>\n<div class=\"callout\"><p>Sidebar: <code>corrosion2</code> is our state distribution system. While <code>flyd</code> runs individual Fly Machines for users, each instance is solely responsible for its own state; there’s no global scheduler. But we have platform components, most obviously <code>fly-proxy</code>, our Anycast router, that need to know what’s running where. <code>corrosion2</code> is a Rust service that does <a href=\"https://fly.io/blog/building-clusters-with-serf/\" title=\"\">SWIM gossip</a> to propagate information from each worker into a CRDT-structured SQLite database. <code>corrosion2</code> essentially means any component on our fleet can do SQLite queries to get near-real-time information about any Fly Machine around the world.</p>\n</div>\n<p>If for no other reason than that we deployed <code>corrosion</code>, learned from it, and were able to make significant and valuable improvements — and then migrate to the new system in a short period of time.</p>\n\n<p>Having a “just SQLite” interface, for async replicated changes around the world in seconds, it’s pretty powerful.</p>\n\n<p>If we invested in <a href='https://antithesis.com/' title=''>Anthesis</a> or TLA+ testing, I think there’s <a href='https://github.com/superfly/corrosion' title=''>potential for other companies</a> to get value out of <code>corrosion2</code>.</p>\n\n<p><em>Just as a general-purpose gossip-based SQLite CRDT gossip system?</em></p>\n\n<p>Yes.</p>\n\n<p><em>OK, you’re being too nice. What’s your least favorite thing about the platform?</em></p>\n\n<p>GraphQL. No, Elixir. It’s a tie between GraphQL and Elixir.</p>\n\n<p>But probably GraphQL, by a hair.</p>\n\n<p><em>That’s not the answer I expected.</em></p>\n\n<p>GraphQL slows everyone down, and everything. Elixir only slows me down.</p>\n\n<p><em>The rest of the platform, you’re fine with? No complaints?</em></p>\n\n<p>I’m happier now that we have <code>pilot</code>.</p>\n<div class=\"callout\"><p><code>pilot</code> is our new <code>init</code>. When we launch a Fly Machine, <code>init</code> is our foothold in the machine; this is unlike a normal OCI runtime, where “pid 1” is often the user’s entrypoint program. Our original <code>init</code> was so simple people dunked on it and said it might as well have been a bash script; over time, <code>init</code> has sprouted a bunch of new features. <code>pilot</code> consolidates those features, and, more importantly, is itself a complete OCI runtime; <code>pilot</code> can natively run containers inside of Fly Machines.</p>\n</div>\n<p>Before <code>pilot</code>, there really wasn’t any contract between <code>flyd</code> and <code>init</code>. And <code>init</code> was just “whatever we wanted <code>init</code> to be”. That limit its ability to serve us.</p>\n\n<p>Having <code>pilot</code> be an OCI-compliant runtime with an API for <code>flyd</code> to drive is  a big win for the future of the Fly Machines API.</p>\n\n<p><em>Was I right that we should have used SQLite for <code>flyd</code>, or were you wrong to have used BoltDB?</em></p>\n\n<p>I still believe Bolt was the right choice. I’ve never lost a second of sleep worried that someone is about to run a SQL update statement on a host, or across the whole fleet, and then mangled all our state data. And limiting the storage interface, by not using SQL, kept <code>flyd</code>‘s scope managed.</p>\n\n<p>On the engine side of the platform, which is what <code>flyd</code> is, I still believe SQL is too powerful for what <code>flyd</code> does.</p>\n\n<p><em>If you had this to do over again, would Bolt be precisely what you’d pick, or is there something else you’d want to try? Some cool-ass new KV store?</em></p>\n\n<p>Nah. But, I’d maybe consider a SQLite database per-Fly-Machine. Then the scope of danger is about as small as it could possibly be.</p>\n\n<p><em>Whoah, that’s an interesting thought. People sleep on the “keep a zillion little SQLites” design.</em></p>\n\n<p>Yeah, with per-Machine SQLite, once a Fly Machine is destroyed, we can just zip up the database and stash it in object storage. The biggest hold-up I have about it is how we’d manage the schemas.</p>\n\n<p><em>OpenTelemetry: were you right all along?</em></p>\n\n<p>One hundred percent.</p>\n\n<p><em>I basically attribute oTel at Fly.io to you.</em></p>\n\n<p>Without oTel, it’d be a disaster trying to troubleshoot the system. I’d have ragequit trying.</p>\n\n<p><em>I remember there being a cost issue, with how much Honeycomb was going to end up charging us to manage all the data. But that seems silly in retrospect.</em></p>\n\n<p>For sure. It is 100% part of the decision and the conversation. But: we didn’t have the best track record running a logs/metrics cluster at this fidelity. It was worth the money to pay someone else to manage tracing data.</p>\n\n<p><em>Strong agree. I think my only issue is just the extent to which it cruds up code. But I need to get over that.</em></p>\n\n<p>Yes, it’s very explicit. I think the next big part of oTel is going to be auto-instrumentation, for profiling.</p>\n\n<p><em>You’re a veteran Golang programmer. Say 3 nice things about Rust.</em></p>\n<div class=\"callout\"><p>Most of our backend is in Go, but <code>fly-proxy</code>, <code>corrosion2</code>, and <code>pilot</code> are in Rust.</p>\n</div>\n<ol>\n<li>Option. \n</li><li>Match.\n</li><li>Serde macros.\n</li></ol>\n\n<p><em>Even I can’t say shit about Option and match.</em></p>\n\n<p>Match is so much better than anything in Go.</p>\n\n<p><em>Elixir, Go, and Rust. An honest take on that programming cocktail.</em></p>\n\n<p>Three’s a crowd, Elixir can stay home.</p>\n\n<p><em>If you could only lose one, you’d keep Rust.</em></p>\n\n<p>I’ve learned its shortcomings and the productivity far outweighs having to deal with the Rust compiler.</p>\n\n<p><em>You’d be unhappy if we moved the <code>flaps</code> API code from Go to Elixir.</em></p>\n\n<p>Correct.</p>\n\n<p><em>I kind of buy the idea of doing orchestration and scheduling code, which is policy-intensive, in a higher-level language.</em></p>\n\n<p>Maybe. If Ruby had a better concurrency story, I don’t think Elixir would have a place for us.</p>\n<div class=\"callout\"><p>Here I need to note that Ruby is functionally dead here, and Elixir is ascendant.</p>\n</div>\n<p><em>We have an idiosyncratic management structure. We’re bottom-up, but ambiguously so. We don’t have roadmaps, except when we do. We have minimal top-down technical direction. Critique.</em></p>\n\n<p>It’s too easy to lose sight of whether your current focus [in what you’re building] is valuable to the company.</p>\n\n<p><em>The first thing I warn every candidate about on our “do-not-work-here” calls.</em></p>\n\n<p>I think it comes down to execution, and accountability to actually finish projects. I spun a lot trying to figure out what would be the most valuable work for Fly Machines.</p>\n\n<p><em>You don’t have to be so nice about things.</em></p>\n\n<p>We struggle a lot with consistent communication. We change direction a little too often. It got to a point where I didn’t see a point in devoting time and effort into projects, because I’d not be able to show enough value quick enough.</p>\n\n<p><em>I see things paying off later than we’d hoped or expected they would. Our secret storage system, Pet Semetary, is a good example of this. Our K8s service, FKS, is another obvious one, since we’re shipping MPG on it.</em></p>\n\n<p><em>This is your second time working Kurt, at a company where he’s the CEO. Give him a 1-4 star rating. He can take it! At least, I think he can take it.</em></p>\n\n<p>2022: ★★★★</p>\n\n<p>2023: ★★</p>\n\n<p>2024: ★★✩</p>\n\n<p>2025: ★★★✩</p>\n\n<p>On a four-star scale.</p>\n\n<p><em>Whoah. I did not expect a histogram. Say more about 2023!</em></p>\n\n<p>We hired too many people, too quickly, and didn’t have the guardrails and structure in place for everybody to be successful.</p>\n\n<p><em>Also: GPUs!</em></p>\n\n<p>Yes. That was my next comment.</p>\n\n<p><em>Do we secretly agree about GPUs?</em></p>\n\n<p>I think so.</p>\n\n<p><em>Our side won the argument in the end! But at what cost?</em></p>\n\n<p>They were a killer distraction.</p>\n\n<p><em>Final question: how long will you remain in the first-responder on-call rotation after you leave? I assume at least until August. I have a shift this weekend; can you swap with me? I keep getting weekends.</em></p>\n\n<p>I am going to be asleep all weekend if any of my previous job changes are indicative.</p>\n\n<p><em>I sleep through on-call too! But nobody can yell at you for it now. I think you have the comparative advantage over me in on-calling.</em></p>\n\n<p>Yes I will absolutely take all your future on-call shifts, you have convinced me.</p>\n\n<p><em>All this aside: it has been a privilege watching you work. I hope your next gig is 100x more relaxing than this was. Or maybe I just hope that for myself. Except: I’ll never escape this place. Thank you so much for doing this.</em></p>\n\n<p>Thank you! I’m forever grateful for having the opportunity to be a part of Fly.io.</p>",
      "image": {
        "url": "https://fly.io/blog/the-exit-interview-jp/assets/bye-bye-little-sebastian-cover.webp",
        "title": null
      },
      "media": [],
      "authors": [
        {
          "name": "Fly",
          "email": null,
          "url": null
        }
      ],
      "categories": []
    },
    {
      "id": "https://fly.io/blog/a-blog-if-kept/",
      "title": "A Blog, If You Can Keep It",
      "description": null,
      "url": "https://fly.io/blog/a-blog-if-kept/",
      "published": "2025-02-10T00:00:00.000Z",
      "updated": "2025-05-20T16:26:25.000Z",
      "content": "<div class=\"lead\"><p>A boldfaced lede like this was a sure sign you were reading a carefully choreographed EffortPost from our team at Fly.io. We’re going to do less of those. Or the same amount but more of a different kind of post. Either way: launch an app on Fly.io today!</p>\n</div>\n<p>Over the last 5 years, we’ve done pretty well for ourselves writing content for Hacker News. And that’s <a href='https://news.ycombinator.com/item?id=39373476' title=''>mostly</a> been good for us. We don’t do conventional marketing, we don’t have a sales team, the rest of social media is atomized over 5 different sites. Writing pieces that HN takes seriously has been our primary outreach tool.</p>\n\n<p>There’s a recipe (probably several, but I know this one works) for charting a post on HN:</p>\n\n<ol>\n<li>Write an EffortPost, which is to say a dense technical piece over 2000 words long; within that rubric there’s a bunch of things that are catnip to HN, including runnable code, research surveys, and explainers. (There are also cat-repellants you learn to steer clear of.)\n</li><li>Assiduously avoid promotion. You have to write for the audience. We get away with sporadically including a call-to-action block in our posts, but otherwise: the post should make sense even if an unrelated company posted it after you went out of business.\n</li><li>Pick topics HN is interested in (it helps if all your topics are au courant for HN, and we’ve been <a href='https://news.ycombinator.com/item?id=32250426' title=''>very</a> <a href='https://news.ycombinator.com/item?id=32018066' title=''>lucky</a> in that regard).\n</li><li>Like 5-6 more style-guide things that help incrementally. Probably 3 different teams writing for HN will have 3 different style guides with only like ½ overlap. Ours, for instances, instructs writers to swear.\n</li></ol>\n\n<p>I like this kind of writing. It’s not even a chore. But it’s become an impediment for us, for a couple reasons: the team serializes behind an “editorial” function here, which keeps us from publishing everything we want; worse, caring so much about our track record leaves us noodling on posts interminably (the poor <a href='https://www.tigrisdata.com/' title=''>Tigrises</a> have been waiting for months for me to publish the piece I wrote about them and FoundationDB; take heart, this post today means that one is coming soon).</p>\n\n<p>But worst of all, I worried incessantly about us <a href='https://gist.github.com/tqbf/e853764b562a2d72a91a6986ca3b77c0' title=''>wearing out our welcome</a>. To my mind, we’d have 1, maybe 2 bites at the HN apple in a given month, and we needed to make them count.</p>\n\n<p>That was dumb. I am dumb about a lot of things! I came around to understanding this after Kurt demanded I publish my blog post about BFAAS (Bash Functions As A Service), 500 lines of Go code that had generated 4500 words in my draft. It was only after I made the decision to stop gatekeeping this blog that I realized <a href='https://simonwillison.net/' title=''>Simon Willison</a> has been disproving my “wearing out the welcome” theory, day in and day out, for years. He just writes stuff about LLMs when it interests him. I mean, it helps that he’s a better writer than we are. But he’s not wasting time choreographing things.</p>\n\n<p>Back in like 2009, <a href='https://web.archive.org/web/20110806040300/http://chargen.matasano.com/chargen/2009/7/22/if-youre-typing-the-letters-a-e-s-into-your-code-youre-doing.html' title=''>we had a blog</a> at another company I was at. That blog drove a lot of business for us (and, on three occasions, almost killed me). It was not in the least bit optimized for HN. I like pretending to be a magazine feature writer, but I miss writing dashed-off pieces every day and clearing space for other people on the team to write as well.</p>\n\n<p>So this is all just a heads up: we’re trying something new. This is a very long and self-indulgent way to say “we’re going to write a normal blog like it’s 2008”, but that’s how broken my brain is after years of having my primary dopaminergic rewards come from how long Fly.io blog posts stay on the front page: I have to disclaim blogging before we start doing it, lest I fail to meet expectations.</p>\n\n<p>Like I said. I’m real dumb. But: looking forward to getting a lot more stuff out on the web for people to read this year!</p>",
      "image": {
        "url": "https://fly.io/blog/a-blog-if-kept/assets/keep-blog.webp",
        "title": null
      },
      "media": [],
      "authors": [
        {
          "name": "Fly",
          "email": null,
          "url": null
        }
      ],
      "categories": []
    },
    {
      "id": "https://fly.io/blog/semgrep-but-for-real-now/",
      "title": "Did Semgrep Just Get A Lot More Interesting?",
      "description": null,
      "url": "https://fly.io/blog/semgrep-but-for-real-now/",
      "published": "2025-02-10T00:00:00.000Z",
      "updated": "2025-05-20T16:26:25.000Z",
      "content": "<div class=\"right-sidenote\"><p>This whole paragraph is just one long sentence. God I love <a href=\"https://fly.io/blog/a-blog-if-kept/\" title=\"\">just random-ass blogging</a> again.</p>\n</div>\n<p><a href='https://ghuntley.com/stdlib/' title=''>This bit by Geoffrey Huntley</a> is super interesting to me and, despite calling out that LLM-driven development agents like Cursor have something like a 40% success rate at actually building anything that passes acceptance criteria, makes me think that more of the future of our field belongs to people who figure out how to use this weird bags of model weights than any of us are comfortable with. </p>\n\n<p>I’ve been dinking around with Cursor for a week now (if you haven’t, I think it’s something close to malpractice not to at least take it — or something like it — for a spin) and am just now from this post learning that Cursor has this <a href='https://docs.cursor.com/context/rules-for-ai' title=''>rules feature</a>. </p>\n\n<p>The important thing for me is not how Cursor rules work, but rather how Huntley uses them. He turns them back on themselves, writing rules to tell Cursor how to organize the rules, and then teach Cursor how to write (under human supervision) its own rules.</p>\n\n<p>Cursor kept trying to get Huntley to use Bazel as a build system. So he had cursor write a rule for itself: “no bazel”. And there was no more Bazel. If I’d known I could do this, I probably wouldn’t have bounced from the Elixir project I had Cursor doing, where trying to get it to write simple unit tests got it all tangled up trying to make <a href='https://hexdocs.pm/mox/Mox.html' title=''>Mox</a> work. </p>\n\n<p>But I’m burying the lead. </p>\n\n<p>Security people have been for several years now somewhat in love with a tool called <a href='https://github.com/semgrep/semgrep' title=''>Semgrep</a>. Semgrep is a semantics-aware code search tool; using symbolic variable placeholders and otherwise ordinary code, you can write rules to match pretty much arbitary expressions and control flow. </p>\n\n<p>If you’re an appsec person, where you obviously go with this is: you build a library of Semgrep searches for well-known vulnerability patterns (or, if you’re like us at Fly.io, you work out how to get Semgrep to catch the Rust concurrency footgun of RWLocks inside if-lets).</p>\n\n<p>The reality for most teams though is “ain’t nobody got time for that”. </p>\n\n<p>But I just checked and, unsurprisingly, 4o <a href='https://chatgpt.com/share/67aa94a7-ea3c-8012-845c-6c9491b33fe4' title=''>seems to do reasonably well</a> at generating Semgrep rules? Like: I have no idea if this rule is actually any good. But it looks like a Semgrep rule?</p>\n\n<p>What interests me is this: it seems obvious that we’re going to do more and more “closed-loop” LLM agent code generation stuff. By “closed loop”, I mean that the thingy that generates code is going to get to run the code and watch what happens when it’s interacted with. You’re just a small bit of glue code and a lot of system prompting away from building something like that right now: <a href='https://x.com/chris_mccord/status/1882839014845374683' title=''>Chris McCord is building</a> a thingy that generates whole Elixir/Phoenix apps and runs them as Fly Machines. When you deploy these kinds of things, the LLM gets to see the errors when the code is run, and it can just go fix them. It also gets to see errors and exceptions in the logs when you hit a page on the app, and it can just go fix them.</p>\n\n<p>With a bit more system prompting, you can get an LLM to try to generalize out from exceptions it fixes and generate unit test coverage for them. </p>\n\n<p>With a little bit more system prompting, you can probably get an LLM to (1) generate a Semgrep rule for the generalized bug it caught, (2) test the Semgrep rule with a positive/negative control, (3) save the rule, (4) test the whole codebase with Semgrep for that rule, and (5) fix anything it finds that way. </p>\n\n<p>That is a lot more interesting to me than tediously (and probably badly) trying to predict everything that will go wrong in my codebase a priori and Semgrepping for them. Which is to say: Semgrep — which I have always liked — is maybe a lot more interesting now? And tools like it?</p>",
      "image": {
        "url": "https://fly.io/blog/semgrep-but-for-real-now/assets/ci-cd-cover.webp",
        "title": null
      },
      "media": [],
      "authors": [
        {
          "name": "Fly",
          "email": null,
          "url": null
        }
      ],
      "categories": []
    },
    {
      "id": "https://fly.io/blog/vscode-ssh-wtf/",
      "title": "VSCode’s SSH Agent Is Bananas",
      "description": null,
      "url": "https://fly.io/blog/vscode-ssh-wtf/",
      "published": "2025-02-07T00:00:00.000Z",
      "updated": "2025-05-20T16:26:25.000Z",
      "content": "<p>We’re interested in getting integrated into the flow VSCode uses to do remote editing over SSH, because everybody is using VSCode now, and, in particular, they’re using forks of VSCode that generate code with LLMs. </p>\n<div class=\"right-sidenote\"><p>”hallucination” is what we call it when LLMs get code wrong; “engineering” is what we call it when people do.</p>\n</div>\n<p>LLM-generated code is <a href='https://nicholas.carlini.com/writing/2024/how-i-use-ai.html' title=''>useful in the general case</a> if you know what you’re doing. But it’s ultra-useful if you can close the loop between the LLM and the execution environment (with an “Agent” setup). There’s lots to say about this, but for the moment: it’s a semi-effective antidote to hallucination: the LLM generates the code, the agent scaffolding runs the code, the code generates errors, the agent feeds it back to the LLM, the process iterates. </p>\n\n<p>So, obviously, the issue here is you don’t want this iterative development process happening on your development laptop, because LLMs have boundary issues, and they’ll iterate on your system configuration just as happily on the Git project you happen to be working in. A thing you’d really like to be able to do: run a closed-loop agent-y (“agentic”? is that what we say now) configuration for an LLM, on a clean-slate Linux instance that spins up instantly and that can’t screw you over in any way. You get where we’re going with this.</p>\n\n<p>Anyways! I would like to register a concern.</p>\n\n<p>Emacs hosts the spiritual forebearer of remote editing systems, a blob of hyper-useful Elisp called <a href='https://www.gnu.org/software/tramp/' title=''>“Tramp”</a>. If you can hook Tramp up to any kind of interactive environment — usually, an SSH session — where it can run Bourne shell commands, it can extend Emacs to that environment.</p>\n\n<p>So, VSCode has a feature like Tramp. Which, neat, right? You’d think, take Tramp, maybe simplify it a bit, switch out Elisp for Typescript.</p>\n\n<p>You’d think wrong!</p>\n\n<p>Unlike Tramp, which lives off the land on the remote connection, VSCode mounts a full-scale invasion: it runs a Bash snippet stager that downloads an agent, including a binary installation of Node. </p>\n\n<p>I <em>think</em> this is <a href='https://github.com/microsoft/vscode/tree/c9e7e1b72f80b12ffc00e06153afcfedba9ec31f/src/vs/server/node' title=''>the source code</a>?</p>\n\n<p>The agent runs over port-forwarded SSH. It establishes a WebSockets connection back to your running VSCode front-end. The underlying protocol on that connection can:</p>\n\n<ul>\n<li>Wander around the filesystem\n</li><li>Edit arbitrary files\n</li><li>Launch its own shell PTY processes\n</li><li>Persist itself\n</li></ul>\n\n<p>In security-world, there’s a name for tools that work this way. I won’t say it out loud, because that’s not fair to VSCode, but let’s just say the name is murid in nature.</p>\n\n<p>I would be a little nervous about letting people VSCode-remote-edit stuff on dev servers, and apoplectic if that happened during an incident on something in production. </p>\n\n<p>It turns out we don’t have to care about any of this to get a custom connection to a Fly Machine working in VSCode, so none of this matters in any kind of deep way, but: we’ve decided to just be a blog again, so: we had to learn this, and now you do too.</p>",
      "image": {
        "url": "https://fly.io/static/images/default-post-thumbnail.webp",
        "title": null
      },
      "media": [],
      "authors": [
        {
          "name": "Fly",
          "email": null,
          "url": null
        }
      ],
      "categories": []
    },
    {
      "id": "https://fly.io/blog/ai-gpu-clusters-from-your-laptop-livebook/",
      "title": "AI GPU Clusters, From Your Laptop, With Livebook",
      "description": null,
      "url": "https://fly.io/blog/ai-gpu-clusters-from-your-laptop-livebook/",
      "published": "2024-09-24T00:00:00.000Z",
      "updated": "2025-05-20T16:26:25.000Z",
      "content": "<div class=\"lead\"><p>Livebook, FLAME, and the Nx stack: three Elixir components that are easy to describe, more powerful than they look, and intricately threaded into the Elixir ecosystem. A few weeks ago, Chris McCord (👋) and Chris Grainger showed them off at ElixirConf 2024. We thought the talk was worth a recap.</p>\n</div>\n<p>Let’s begin by introducing our cast of characters.</p>\n\n<p><a href='https://livebook.dev/' title=''>Livebook</a> is usually described as Elixir’s answer to <a href='https://jupyter.org/' title=''>Jupyter Notebooks</a>. And that’s a good way to think about it. But Livebook takes full advantage of the Elixir platform, which makes it sneakily powerful. By linking up directly with Elixir app clusters, Livebook can switch easily between driving compute locally or on remote servers, and makes it easy to bring in any kind of data into reproducible workflows.</p>\n\n<p><a href='https://fly.io/blog/rethinking-serverless-with-flame/' title=''>FLAME</a> is the Elixir’s answer to serverless computing. By having the library manage a pool of executors for you, FLAME lets you treat your entire application as if it was elastic and scale-to-zero. You configure FLAME with some basic information about where to run code and how many instances it’s allowed to run with, and then mark off any arbitrary section of code with <code>Flame.call</code>. The framework takes care of the rest. It’s the upside of serverless without committing yourself to blowing your app apart into tiny, intricately connected pieces.</p>\n\n<p>The <a href='https://github.com/elixir-nx' title=''>Nx stack</a> is how you do Elixir-native AI and ML. Nx gives you an Elixir-native notion of tensor computations with GPU backends. <a href='https://github.com/elixir-nx/axon' title=''>Axon</a> builds a common interface for ML models on top of it. <a href='https://github.com/elixir-nx/bumblebee' title=''>Bumblebee</a> makes those models available to any Elixir app that wants to download them, from just a couple lines of code.</p>\n\n<p>Here is quick video showing how to transfer a local tensor to a remote GPU, using Livebook, FLAME, and Nx:</p>\n<div class=\"youtube-container\" data-exclude-render>\n  <div class=\"youtube-video\">\n    <iframe\n      width=\"100%\"\n      height=\"100%\"\n      src=\"https://www.youtube.com/embed/5ImP3gpUSkQ\"\n      frameborder=\"0\"\n      allow=\"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture\"\n      allowfullscreen>\n    </iframe>\n  </div>\n</div>\n\n\n<p>Let’s dive into the <a href='https://www.youtube.com/watch?v=4qoHPh0obv0' title=''>keynote</a>.</p>\n<h2 id='poking-a-hole-in-your-infrastructure' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#poking-a-hole-in-your-infrastructure' aria-label='Anchor'></a><span class='plain-code'>Poking a hole in your infrastructure</span></h2>\n<p>Any Livebook, including the one running on your laptop, can start a runtime running on a Fly Machine, in Fly.io’s public cloud. That Elixir machine will (by default) live in your default Fly.io organization, giving it networked access to all the other apps that might live there.</p>\n\n<p>This is an access control situation that mostly just does what you want it to do without asking. Unless you ask it to, Fly.io isn’t exposing anything to the Internet, or to other users of Fly.io. For instance: say we have a database we’re going to use to generate reports. It can hang out on our Fly organization, inside of a private network with no connectivity to the world. We can spin up a Livebook instance that can talk to it, without doing any network or infrastructure engineering to make that happen.</p>\n\n<p>But wait, there’s more. Because this is all Elixir, Livebook also allows you to connect to any running Erlang/Elixir application in your infrastructure to debug, introspect, and monitor them.</p>\n\n<p>Check out this clip of Chris McCord connecting <a href='https://rtt.fly.dev/' title=''>to an existing application</a> during the keynote:</p>\n<div class=\"youtube-container\" data-exclude-render>\n  <div class=\"youtube-video\">\n    <iframe\n      width=\"100%\"\n      height=\"100%\"\n      src=\"https://www.youtube.com/embed/4qoHPh0obv0?start=1106\"\n      frameborder=\"0\"\n      allow=\"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture\"\n      allowfullscreen>\n    </iframe>\n  </div>\n</div>\n\n\n<p>Running a snippet of code from a laptop on a remote server is a neat trick, but Livebook is doing something deeper than that. It’s taking advantage of Erlang/Elixir’s native facility with cluster computation and making it available to the notebook. As a result, when we do things like auto-completing, Livebook delivers results from modules defined on the remote note itself. 🤯</p>\n<h2 id='elastic-scale-with-flame' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#elastic-scale-with-flame' aria-label='Anchor'></a><span class='plain-code'>Elastic scale with FLAME</span></h2>\n<p>When we first introduced FLAME, the example we used was video encoding.</p>\n\n<p>Video encoding is complicated and slow enough that you’d normally make arrangements to run it remotely or in a background job queue, or as a triggerable Lambda function. The point of FLAME is to get rid of all those steps, and give them over to the framework instead. So: we wrote our <code>ffpmeg</code> calls inline like normal code, as if they were going to complete in microseconds, and wrapped them in <code>Flame.call</code> blocks. That was it, that was the demo.</p>\n\n<p>Here, we’re going to put a little AI spin on it.</p>\n\n<p>The first thing we’re doing here is driving FLAME pools from Livebook. Livebook will automatically synchronize your notebook dependencies as well as any module or code defined in your notebook across nodes. That means any code we write in our notebook can be dispatched transparently out to arbitrarily many compute nodes, without ceremony.</p>\n\n<p>Now let’s add some AI flair. We take an object store bucket full of video files. We use <code>ffmpeg</code> to extract stills from the video at different moments. Then: we send them to <a href='https://www.llama.com/' title=''>Llama</a>, running on <a href='https://fly.io/gpu' title=''>GPU Fly Machines</a> (still locked to our organization), to get descriptions of the stills.</p>\n\n<p>All those stills and descriptions get streamed back to our notebook, in real time:</p>\n<div class=\"youtube-container\" data-exclude-render>\n  <div class=\"youtube-video\">\n    <iframe\n      width=\"100%\"\n      height=\"100%\"\n      src=\"https://www.youtube.com/embed/4qoHPh0obv0?start=1692\"\n      frameborder=\"0\"\n      allow=\"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture\"\n      allowfullscreen>\n    </iframe>\n  </div>\n</div>\n\n\n<p>At the end, the descriptions are sent to <a href='https://mistral.ai/' title=''>Mistral</a>, which builds a summary.</p>\n\n<p>Thanks to FLAME, we get explicit control over the minimum and the maximum amount of nodes you want running at once, as well their concurrency settings. As nodes finish processing each video, new ones are automatically sent to them, until the whole bucket has been traversed. Each node will automatically shut down after an idle timeout and the whole cluster terminates if you disconnect the Livebook runtime.</p>\n\n<p>Just like your app code, FLAME lets you take your notebook code designed to run locally, change almost nothing, and elastically execute it across ephemeral infrastructure.</p>\n<h2 id='64-gpus-hyperparameter-tuning-on-a-laptop' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#64-gpus-hyperparameter-tuning-on-a-laptop' aria-label='Anchor'></a><span class='plain-code'>64-GPUs hyperparameter tuning on a laptop</span></h2>\n<p>Next, Chris Grainger, CTO of <a href='https://amplified.ai/' title=''>Amplified</a>, takes the stage.</p>\n\n<p>For work at Amplified, Chris wants to analyze a gigantic archive of patents, on behalf of a client doing edible cannibinoid work. To do that, he uses a BERT model (BERT, from Google, is one of the OG “transformer” models, optimized for text comprehension).</p>\n\n<p>To make the BERT model effective for this task, he’s going to do a hyperparameter training run.</p>\n\n<p>This is a much more complicated AI task than the Llama work we just showed up. Chris is going to generate a cluster of 64 GPU Fly Machines, each running an <a href='https://fly.io/blog/cutting-prices-for-l40s-gpus-in-half/' title=''>L40s GPU</a>. On each of these nodes, he needs to:</p>\n\n<ul>\n<li>setup its environment (including native dependencies and GPU bindings)\n</li><li>load the training data\n</li><li>compile a different version of BERT with different parameters, optimizers, etc.\n</li><li>start the fine-tuning\n</li><li>stream its results in real-time to each assigned chart\n</li></ul>\n\n<p>Here’s the clip. You’ll see the results stream in, in real time, directly back to his Livebook. We’ll wait, because it won’t take long to watch:</p>\n<div class=\"youtube-container\" data-exclude-render>\n  <div class=\"youtube-video\">\n    <iframe\n      width=\"100%\"\n      height=\"100%\"\n      src=\"https://www.youtube.com/embed/4qoHPh0obv0?start=3344\"\n      frameborder=\"0\"\n      allow=\"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture\"\n      allowfullscreen>\n    </iframe>\n  </div>\n</div>\n\n<h2 id='this-is-just-the-beginning' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#this-is-just-the-beginning' aria-label='Anchor'></a><span class='plain-code'>This is just the beginning</span></h2>\n<p>The suggestion of mixing Livebook and FLAME to elastically scale notebook execution was originally proposed by Chris Grainger during ElixirConf EU. During the next four months, Jonatan Kłosko, Chris McCord, and José Valim worked part-time on making it a reality in time for ElixirConf US. Our ability to deliver such a rich combination of features in such a short period of time is a testament to the capabilities of the Erlang Virtual Machine, which Elixir and Livebook runs on. Other features, such as <a href='https://github.com/elixir-explorer/explorer/issues/932' title=''>remote dataframes and distributed GC</a>, were implemented in a weekend.  Bringing the same functionality to other ecosystems would take several additional months, sometimes accompanied by millions in funding, and often times as part of a closed-source product.</p>\n\n<p>Furthermore, since we announced this feature, <a href='https://github.com/mruoss' title=''>Michael Ruoss</a> stepped in and brought the same functionality to Kubernetes. From Livebook v0.14.1, you can start Livebook runtimes inside a Kubernetes cluster and also use FLAME to elastically scale them. Expect more features and news in this space!</p>\n\n<p>Finally, Fly’s infrastructure played a key role in making it possible to start a cluster of GPUs in seconds rather than minutes, and all it requires is a Docker image. We’re looking forward to see how other technologies and notebook platforms can leverage Fly to also elevate their developer experiences.</p>\n<figure class=\"post-cta\">\n  <figcaption>\n    <h1>Launch a GPU app in seconds</h1>\n    <p>Run your own LLMs or use Livebook for elastic GPU workflows ✨</p>\n      <a class=\"btn btn-lg\" href=\"/gpu\">\n        Go!  <span class='opacity-50'>→</span>\n      </a>\n  </figcaption>\n  <div class=\"image-container\">\n    <img src=\"/static/images/cta-rabbit.webp\" srcset=\"/static/images/[email protected] 2x\" alt=\"\">\n  </div>\n</figure>",
      "image": {
        "url": "https://fly.io/blog/ai-gpu-clusters-from-your-laptop-livebook/assets/ai-gpu-livebook-thumb.webp",
        "title": null
      },
      "media": [],
      "authors": [
        {
          "name": "Fly",
          "email": null,
          "url": null
        }
      ],
      "categories": []
    },
    {
      "id": "https://fly.io/blog/accident-forgiveness/",
      "title": "Accident Forgiveness",
      "description": null,
      "url": "https://fly.io/blog/accident-forgiveness/",
      "published": "2024-08-21T00:00:00.000Z",
      "updated": "2025-05-20T16:26:25.000Z",
      "content": "<div class=\"lead\"><p>We’re Fly.io, a new public cloud with simple, developer-friendly ergonomics. <a href=\"https://fly.io/speedrun\" title=\"\">Try it out; you’ll be deployed in just minutes</a>, and, as you’re about to read, with less financial risk.</p>\n</div>\n<p>Public cloud billing is terrifying.</p>\n\n<p>The premise of a public cloud — what sets it apart from a hosting provider — is 8,760 hours/year of on-tap deployable compute, storage, and networking. Cloud resources are “elastic”: they’re acquired and released as needed; in the “cloud-iest” apps, without human intervention. Public cloud resources behave like utilities, and that’s how they’re priced.</p>\n\n<p>You probably can’t tell me how much electricity your home is using right now, and  may only come within tens of dollars of accurately predicting your water bill. But neither of those bills are all that scary, because you assume there’s a limit to how much you could run them up in a single billing interval.</p>\n\n<p>That’s not true of public clouds. There are only so many ways to “spend” water at your home, but there are indeterminably many ways to land on a code path that grabs another VM, or to miskey a configuration, or to leak long-running CI/CD environments every time a PR gets merged. Pick a practitioner at random, and I bet they’ve read a story within the last couple months about someone running up a galactic-scale bill at some provider or other.</p>\n<h2 id='implied-accident-forgiveness' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#implied-accident-forgiveness' aria-label='Anchor'></a><span class='plain-code'>Implied Accident Forgiveness</span></h2>\n<p>For people who don’t do a lot of cloud work, what all this means is that every PR push sets off a little alarm in the back of their heads: “you may have just incurred $200,000 of costs!”. The alarm is quickly silenced,  though it’s still subtly extracting a cortisol penalty. But by deadening the nerves that sense the danger of unexpected charges, those people are nudged closer to themselves being the next story on Twitter about an accidental $200,000 bill.</p>\n\n<p>The saving grace here, which you’ll learn if you ever become that $200,000 story, is that nobody pays those bills.</p>\n\n<p>See, what cloud-savvy people know already is that providers have billing support teams, which spend a big chunk of their time conceding disputed bills. If you do something luridly stupid and rack up costs, AWS and GCP will probably cut you a break. We will too. Everyone does.</p>\n\n<p>If you didn’t already know this, you’re welcome; I’ve made your life a little better, even if you don’t run things on Fly.io.</p>\n\n<p>But as soothing as it is to know you can get a break from cloud providers, the billing situation here is still a long ways away from “good”. If you accidentally add a zero to a scale count and don’t notice for several weeks, AWS or GCP will probably cut you a break. But they won’t <em>definitely</em> do it, and even though your odds are good, you’re still finding out at email- and phone-tag scale speeds. That’s not fun!</p>\n<h2 id='explicit-accident-forgiveness' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#explicit-accident-forgiveness' aria-label='Anchor'></a><span class='plain-code'>Explicit Accident Forgiveness</span></h2>\n<p>Charging you for stuff you didn’t want is bad business.</p>\n\n<p>Good business, we think, means making you so comfortable with your cloud you try new stuff. You, and everyone else on your team. Without a chaperone from the finance department.</p>\n\n<p>So we’re going to do the work to make this official. If you’re a customer of ours, we’re going to charge you in exacting detail for every resource you intentionally use of ours, but if something blows up and you get an unexpected bill, we’re going to let you off the hook.</p>\n<h2 id='not-so-fast' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#not-so-fast' aria-label='Anchor'></a><span class='plain-code'>Not So Fast</span></h2>\n<p>This is a Project, with a capital P. While we’re kind of kicking ourselves for not starting it earlier, there are reasons we couldn’t do it back in 2020.</p>\n\n<p>The Fully Automated Accident-Forgiving Billing System of the Future (which we are in fact building and may even one day ship) will give you a line-item veto on your invoice. We are a long ways away. The biggest reason is fraud.</p>\n\n<p>Sit back, close your eyes, and try to think about everything public clouds do to make your life harder. Chances are, most of those things are responses to fraud. Cloud platforms attract fraudsters like ants to an upturned ice cream cone. Thanks to the modern science of cryptography, fraudsters have had a 15 year head start on turning metered compute into picodollar-granular near-money assets.</p>\n\n<p>Since there’s no bouncer at the door checking IDs here, an open-ended and automated commitment to accident forgiveness is, with iron certainty, going to be used overwhelmingly in order to trick us into “forgiving” cryptocurrency miners. We’re cloud platform engineers. They’re our primary pathogen.</p>\n\n<p>So, we’re going to roll this out incrementally.</p>\n<div class=\"callout\"><p><strong class=\"font-semibold text-navy-950\">Why not billing alerts?</strong> We’ll get there, but here too there are reasons we haven’t yet: (1) meaningful billing alerts were incredibly difficult to do with our previous billing system, and building the new system and migrating our customers to it has been a huge lift, a nightmare from which we are only now waking (the billing system’s official name); and (2) we’re wary about alerts being a product design cop-out; if we can alert on something, why aren’t we fixing it?</p>\n</div><h2 id='accident-forgiveness-v0-84beta' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#accident-forgiveness-v0-84beta' aria-label='Anchor'></a><span class='plain-code'>Accident Forgiveness v0.84beta</span></h2>\n<p>All the same subtextual, implied reassurances that every cloud provider offers remain in place at Fly.io. You are strictly better off after this announcement, we promise.</p>\n<div class=\"right-sidenote\"><p>I added the “almost” right before publishing, because I’m chicken.</p>\n</div>\n<p>Now: for customers that have a support contract with us, at any level, there’s something new: I’m saying the quiet part loud. The next time you see a bill with an unexpected charge on it, we’ll refund that charge, (almost) no questions asked.</p>\n\n<p>That policy is so simple it feels anticlimactic to write. So, some additional color commentary:</p>\n\n<p>We’re not advertising a limit to the number of times you can do this. If you’re a serious customer of ours, I promise that you cannot remotely fathom the fullness of our fellow-feeling. You’re not annoying us by getting us to refund unexpected charges. If you are growing a project on Fly.io, we will bend over backwards to keep you growing.</p>\n\n<p>How far can we take this? How simple can we keep this policy? We’re going to find out together.</p>\n\n<p>To begin with, and in the spirit of “doing things that won’t scale”, when we forgive a bill, what’s going to happen next is this: I’m going to set an irritating personal reminder for Kurt to look into what happened, now and then the day before your next bill, so we can see what’s going wrong. He’s going to hate that, which is the point: our best feature work is driven by Kurt-hate.</p>\n\n<p>Obviously, if you’re rubbing your hands together excitedly over the opportunity this policy presents, then, well, not so much with the fellow-feeling. We reserve the right to cut you off.</p>\n<figure class=\"post-cta\">\n  <figcaption>\n    <h1>Support For Developers, By Developers</h1>\n    <p>Explicit Accident Forgiveness is just one thing we like about Support at Fly.io.</p>\n      <a class=\"btn btn-lg\" href=\"https://fly.io/accident-forgiveness\">\n        Go find out!  <span class='opacity-50'>→</span>\n      </a>\n  </figcaption>\n  <div class=\"image-container\">\n    <img src=\"/static/images/cta-dog.webp\" srcset=\"/static/images/[email protected] 2x\" alt=\"\">\n  </div>\n</figure>\n\n<h2 id='whats-next-accident-protection' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#whats-next-accident-protection' aria-label='Anchor'></a><span class='plain-code'>What’s Next: Accident Protection</span></h2>\n<p>We think this is a pretty good first step. But that’s all it is.</p>\n\n<p>We can do better than offering you easy refunds for mistaken deployments and botched CI/CD jobs. What’s better than getting a refund is never incurring the charge to begin with, and that’s the next step we’re working on.</p>\n<div class=\"right-sidenote\"><p>More to come on that billing system.</p>\n</div>\n<p>We built a new billing system so that we can do things like that. For instance: we’re in a position to catch sudden spikes in your month-over-month bills, flag them, and catch weird-looking deployments before we bill for them.</p>\n\n<p>Another thing we rebuilt billing for is <a href='https://community.fly.io/t/reservation-blocks-40-discount-on-machines-when-youre-ready-to-commit/20858' title=''>reserved pricing</a>. Already today you can get a steep discount from us reserving blocks of compute in advance. The trick to taking advantage of reserved pricing is confidently predicting a floor to your usage. For a lot of people, that means fighting feelings of loss aversion (nobody wants to get gym priced!). So another thing we can do in this same vein: catch opportunities to move customers to reserved blocks, and offer backdated reservations. We’ll figure this out too.</p>\n\n<p>Someday, when we’re in a monopoly position, our founders have all been replaced by ruthless MBAs, and Kurt has retired to farm coffee beans in lower Montana, we may stop doing this stuff. But until that day this is the right choice for our business.</p>\n\n<p>Meanwhile: like every public cloud, we provision our own hardware, and we have excess capacity. Your messed-up CI/CD jobs didn’t really cost us anything, so if you didn’t really want them, they shouldn’t cost you anything either. Take us up on this! We love talking to you.</p>",
      "image": {
        "url": "https://fly.io/blog/accident-forgiveness/assets/money-for-mistakes-blog-thumb.webp",
        "title": null
      },
      "media": [],
      "authors": [
        {
          "name": "Fly",
          "email": null,
          "url": null
        }
      ],
      "categories": []
    },
    {
      "id": "https://fly.io/blog/cutting-prices-for-l40s-gpus-in-half/",
      "title": "We're Cutting L40S Prices In Half",
      "description": null,
      "url": "https://fly.io/blog/cutting-prices-for-l40s-gpus-in-half/",
      "published": "2024-08-15T00:00:00.000Z",
      "updated": "2025-05-20T16:26:25.000Z",
      "content": "<div class=\"lead\"><p>We’re Fly.io, a new public cloud with simple, developer-friendly ergonomics. And as of today, cheaper GPUs. <a href=\"https://fly.io/speedrun\" title=\"\">Try it out; you’ll be deployed in just minutes</a>.</p>\n</div>\n<p>We just lowered the prices on NVIDIA L40s GPUs to $1.25 per hour. Why? Because our feet are cold and we burn processor cycles for heat. But also other reasons.</p>\n\n<p>Let’s back up.</p>\n\n<p>We offer 4 different NVIDIA GPU models; in increasing order of performance, they’re the A10, the L40S, the 40G PCI A100, and the 80G SXM A100.  Guess which one is most popular.</p>\n\n<p>We guessed wrong, and spent a lot of time working out how to maximize the amount of GPU power we could deliver to a single Fly Machine. Users surprised us. By a wide margin, the most popular GPU in our inventory is the A10.</p>\n\n<p>The A10 is an older generation of NVIDIA GPU with fewer, slower cores and less memory. It’s the least capable GPU we offer. But that doesn’t matter, because it’s capable enough. It’s solid for random inference tasks, and handles mid-sized generative AI stuff like Mistral Nemo or Stable Diffusion. For those workloads, there’s not that much benefit in getting a beefier GPU.</p>\n\n<p>As a result, we can’t get new A10s in fast enough for our users.</p>\n\n<p>If there’s one thing we’ve learned by talking to our customers over the last 4 years, it’s that y'all love a peek behind the curtain. So we’re going to let you in on a little secret about how a hardware provider like Fly.io formulates GPU strategy: none of us know what the hell we’re doing.</p>\n\n<p>If you had asked us in 2023 what the biggest GPU problem we could solve was, we’d have said “selling fractional A100 slices”. We burned a whole quarter trying to get MIG, or at least vGPUs, working through IOMMU PCI passthrough on Fly Machines, in a project so cursed that Thomas has forsworn ever programming again. Then we went to market selling whole A100s, and for several more months it looked like the biggest problem we needed to solve was finding a secure way to expose NVLink-ganged A100 clusters to VMs so users could run training. Then H100s; can we find H100s anywhere? Maybe in a black market in Shenzhen?</p>\n\n<p>And here we are, a year later, looking at the data, and the least sexy, least interesting GPU part in the catalog is where all the action is.</p>\n\n<p>With actual customer data to back up the hypothesis, here’s what we think is happening today:</p>\n\n<ul>\n<li>Most users who want to plug GPU-accelerated AI workloads into fast networks are doing inference, not training. \n</li><li>The hyperscaler public clouds strangle these customers, first with GPU instance surcharges, and then with egress fees for object storage data when those customers try to outsource the GPU stuff to GPU providers.\n</li><li>If you’re trying to do something GPU-accelerated in response to an HTTP request, the right combination of GPU, instance RAM, fast object storage for datasets and model parameters, and networking is much more important than getting your hands on an H100.\n</li></ul>\n\n<p>This is a thing we didn’t see coming, but should have: training workloads tend to look more like batch jobs, and inference tends to look more like transactions. Batch training jobs aren’t that sensitive to networking or even reliability. Live inference jobs responding to end-user HTTP requests are. So, given our pricing, of course the A10s are a sweet spot.</p>\n\n<p>The next step up in our lineup after the A10 is the L40S. The L40S is a nice piece of kit. We’re going to take a beat here and sell you on the L40S, because it’s kind of awesome.</p>\n\n<p>The L40S is an AI-optimized version of the L40, which is the data center version of the GeForce RTX 4090, resembling two 4090s stapled together.</p>\n\n<p>If you’re not a GPU hardware person, the RTX 4090 is a gaming GPU, the kind you’d play ray-traced Witcher 3 on. NVIDIA’s high-end gaming GPUs are actually reasonably good at AI workloads! But they suck in a data center rack: they chug power, they’re hard to cool, and they’re less dense. Also, NVIDIA can’t charge as much for them.</p>\n\n<p>Hence the L40: (much) more memory, less energy consumption, designed for a rack, not a tower case. Marked up for “enterprise”.</p>\n\n<p>NVIDIA positioned the L40 as a kind of “graphics” AI GPU. Unlike the super high-end cards like the A100/H100, the L40 keeps all the rendering hardware, so it’s good for 3D graphics and video processing. Which is sort of what you’d expect from a “professionalized” GeForce card.</p>\n\n<p>A funny thing happened in the middle of 2023, though: the market for ultra-high-end NVIDIA cards went absolutely batshit. The huge cards you’d gang up for training jobs got impossible to find, and NVIDIA became one of the most valuable companies in the world. Serious shops started working out plans to acquire groups of L40-type cards to work around the problem, whether or not they had graphics workloads.</p>\n\n<p>The only company in this space that does know what they’re doing is NVIDIA. Nobody has written a highly-ranked Reddit post about GPU workloads without NVIDIA noticing and creating a new SKU. So they launched the L40S, which is an L40 with AI workload compute performance comparable to that of the A100 (without us getting into the details of F32 vs. F16 models).</p>\n\n<p>Long story short, the L40S is an A100-performer that we can price for A10 customers; the Volkswagen GTI of our lineup. We’re going to see if we can make that happen.</p>\n\n<p>We think the combination of just-right-sized inference GPUs and Tigris object storage is pretty killer:</p>\n\n<ul>\n<li>model parameters, data sets, and compute are all close together\n</li><li>everything plugged into an Anycast network that’s fast everywhere in the world\n</li><li>on VM instances that have enough memory to actually run real frameworks on\n</li><li>priced like we actually want you to use it.\n</li></ul>\n\n<p>You should use L40S cards without thinking hard about it. So we’re making it official. You won’t pay us a dime extra to use one instead of an A10. Have at it! Revolutionize the industry. For $1.25 an hour.</p>\n\n<p>Here are things you can do with an L40S on Fly.io today:</p>\n\n<ul>\n<li>You can run Llama 3.1 70B — a big Llama — for LLM jobs.\n</li><li>You can run Flux from Black Forest Labs for genAI images.\n</li><li>You can run Whisper for automated speech recognition.\n</li><li>You can do whole-genome alignment with SegAlign (Thomas’ biochemist kid who has been snatching free GPU hours for his lab gave us this one, and we’re taking his word for it).\n</li><li>You can run DOOM Eternal, building the Stadia that Google couldn’t pull off, because the L40S hasn’t forgotten that it’s a graphics GPU. \n</li></ul>\n\n<p>It’s going to get chilly in Chicago in a month or so. Go light some cycles on fire! </p>",
      "image": {
        "url": "https://fly.io/blog/cutting-prices-for-l40s-gpus-in-half/assets/gpu-ga-thumb.webp",
        "title": null
      },
      "media": [],
      "authors": [
        {
          "name": "Fly",
          "email": null,
          "url": null
        }
      ],
      "categories": []
    },
    {
      "id": "https://fly.io/blog/machine-migrations/",
      "title": "Making Machines Move",
      "description": null,
      "url": "https://fly.io/blog/machine-migrations/",
      "published": "2024-07-30T00:00:00.000Z",
      "updated": "2025-05-20T16:26:25.000Z",
      "content": "<div class=\"lead\"><p>We’re Fly.io, a global public cloud with simple, developer-friendly ergonomics. If you’ve got a working Docker image, we’ll transmogrify it into a Fly Machine: a VM running on our hardware anywhere in the world. <a href=\"https://fly.io/speedrun\" title=\"\">Try it out; you’ll be deployed in just minutes</a>.</p>\n</div>\n<p>At the heart of our platform is a systems design tradeoff about durable storage for applications.  When we added storage three years ago, to support stateful apps, we built it on attached NVMe drives. A benefit: a Fly App accessing a file on a Fly Volume is never more than a bus hop away from the data. A cost: a Fly App with an attached Volume is anchored to a particular worker physical.</p>\n<div class=\"right-sidenote\"><p><code>bird</code>: a BGP4 route server.</p>\n</div>\n<p>Before offering attached storage, our on-call runbook was almost as simple as “de-bird that edge server”, “tell <a href='https://fly.io/blog/carving-the-scheduler-out-of-our-orchestrator/' title=''>Nomad</a> to drain that worker”, and “go back to sleep”. NVMe cost us that drain operation, which terribly complicated the lives of our infra team. We’ve spent the last year getting “drain” back. It’s one of the biggest engineering lifts we’ve made, and if you didn’t notice, we lifted it cleanly.</p>\n<h3 id='the-goalposts' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#the-goalposts' aria-label='Anchor'></a><span class='plain-code'>The Goalposts</span></h3>\n<p>With stateless apps, draining a worker is easy. For each app instance running on the victim server, start a new instance elsewhere. Confirm it’s healthy, then kill the old one. Rinse, repeat. At our 2020 scale, we could drain a fully loaded worker in just a handful of minutes.</p>\n\n<p>You can see why this process won’t work for apps with attached volumes. Sure, create a new volume elsewhere on the fleet, and boot up a new Fly Machine attached to it. But the new volume is empty. The data’s still stuck on the original worker. We asked, and customers were not OK with this kind of migration.</p>\n\n<p>Of course, we back Volumes snapshots up (at an interval) to off-network storage. But for “drain”, restoring backups isn’t nearly good enough. No matter the backup interval, a “restore from backup migration\" will lose data, and a “backup and restore” migration  incurs untenable downtime.</p>\n\n<p>The next thought you have is, “OK, copy the volume over”. And, yes, of course you have to do that. But you can’t just <code>copy</code>, <code>boot</code>, and then <code>kill</code> the old Fly Machine. Because the original Fly Machine is still alive and writing, you have to <code>kill</code>first, then <code>copy</code>, then <code>boot</code>.</p>\n\n<p>Fly Volumes can get pretty big. Even to a rack buddy physical server, you’ll hit a point where draining incurs minutes of interruption, especially if you’re moving lots of volumes simultaneously. <code>Kill</code>, <code>copy</code>, <code>boot</code> is too slow.</p>\n<div class=\"callout\"><p>There’s a world where even 15 minutes of interruption is tolerable. It’s the world where you run more than one instance of your application to begin with, so prolonged interruption of a single Fly Machine isn’t visible to your users. Do this! But we have to live in the same world as our customers, many of whom don’t run in high-availability configurations.</p>\n</div><h3 id='behold-the-clone-o-mat' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#behold-the-clone-o-mat' aria-label='Anchor'></a><span class='plain-code'>Behold The Clone-O-Mat</span></h3>\n<p><code>Copy</code>, <code>boot</code>, <code>kill</code> loses data. <code>Kill</code>, <code>copy</code>, <code>boot</code> takes too long. What we needed is a new operation: <code>clone</code>.</p>\n\n<p><code>Clone</code> is a lazier, asynchronous <code>copy</code>. It creates a new volume elsewhere on our fleet, just like <code>copy</code> would. But instead of blocking, waiting to transfer every byte from the original volume, <code>clone</code> returns immediately, with a transfer running in the background.</p>\n\n<p>A new Fly Machine can be booted with that cloned volume attached. Its blocks are mostly empty. But that’s OK: when the new Fly Machine tries to read from it, the block storage system works out whether the block has been transferred. If it hasn’t, it’s fetched over the network from the original volume; this is called “hydration”. Writes are even easier, and don’t hit the network at all.</p>\n\n<p><code>Kill</code>, <code>copy</code>, <code>boot</code> is slow. But <code>kill</code>, <code>clone</code>, <code>boot</code> is fast; it can be made asymptotically as fast as stateless migration.</p>\n\n<p>There are three big moving pieces to this design.</p>\n\n<ol>\n<li>First, we have to rig up our OS storage system to make this <code>clone</code> operation work.\n</li><li>Then, to read blocks over the network, we need a network protocol. (Spoiler: iSCSI, though we tried other stuff first.)\n</li><li>Finally, the gnarliest of those pieces is our orchestration logic: what’s running where, what state is it in, and whether it’s plugged in correctly.\n</li></ol>\n<h3 id='block-level-clone' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#block-level-clone' aria-label='Anchor'></a><span class='plain-code'>Block-Level Clone</span></h3>\n<p>The Linux feature we need to make this work already exists; <a href='https://docs.kernel.org/admin-guide/device-mapper/dm-clone.html' title=''>it’s called <code>dm-clone</code></a>. Given an existing, readable storage device, <code>dm-clone</code> gives us a new device, of identical size, where reads of uninitialized blocks will pull from the original. It sounds terribly complicated, but it’s actually one of the simpler kernel lego bricks. Let’s demystify it.</p>\n\n<p>As far as Unix is concerned, random-access storage devices, be they spinning rust or NVMe drives, are all instances of the common class “block device”. A block device is addressed in fixed-size (say, 4KiB) chunks, and <a href='https://elixir.bootlin.com/linux/v5.11.11/source/include/linux/blk_types.h#L356' title=''>handles (roughly) these operations</a>:</p>\n<div class=\"highlight-wrapper group relative cpp\">\n  <button \n    type=\"button\"\n    class=\"bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n    data-wrap-target=\"#code-9s54so8p\"\n  >\n    <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\" stroke-linecap=\"round\" stroke-linejoin=\"round\"><g buffered-rendering=\"static\"><path d=\"M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959\" /><path d=\"M11.081 6.466L9.533 8.037l1.548 1.571\" /></g></svg>\n    <span class=\"bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950\">\n      Wrap text\n    </span>\n  </button>\n  <button \n    type=\"button\"\n    class=\"bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n    data-copy-target=\"sibling\"\n  >\n    <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\"><g buffered-rendering=\"static\"><path d=\"M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z\" /><path d=\"M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617\" /></g></svg>\n    <span class=\"bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950\">\n      Copy to clipboard\n    </span>\n  </button>\n  <div class='highlight relative group'>\n    <pre class='highlight '><code id=\"code-9s54so8p\"><span class=\"k\">enum</span> <span class=\"n\">req_opf</span> <span class=\"p\">{</span>\n    <span class=\"cm\">/* read sectors from the device */</span>\n    <span class=\"n\">REQ_OP_READ</span>     <span class=\"o\">=</span> <span class=\"mi\">0</span><span class=\"p\">,</span>\n    <span class=\"cm\">/* write sectors to the device */</span>\n    <span class=\"n\">REQ_OP_WRITE</span>        <span class=\"o\">=</span> <span class=\"mi\">1</span><span class=\"p\">,</span>\n    <span class=\"cm\">/* flush the volatile write cache */</span>\n    <span class=\"n\">REQ_OP_FLUSH</span>        <span class=\"o\">=</span> <span class=\"mi\">2</span><span class=\"p\">,</span>\n    <span class=\"cm\">/* discard sectors */</span>\n    <span class=\"n\">REQ_OP_DISCARD</span>      <span class=\"o\">=</span> <span class=\"mi\">3</span><span class=\"p\">,</span>\n    <span class=\"cm\">/* securely erase sectors */</span>\n    <span class=\"n\">REQ_OP_SECURE_ERASE</span> <span class=\"o\">=</span> <span class=\"mi\">5</span><span class=\"p\">,</span>\n    <span class=\"cm\">/* write the same sector many times */</span>\n    <span class=\"n\">REQ_OP_WRITE_SAME</span>   <span class=\"o\">=</span> <span class=\"mi\">7</span><span class=\"p\">,</span>\n    <span class=\"cm\">/* write the zero filled sector many times */</span>\n    <span class=\"n\">REQ_OP_WRITE_ZEROES</span> <span class=\"o\">=</span> <span class=\"mi\">9</span><span class=\"p\">,</span>\n    <span class=\"cm\">/* ... */</span>\n<span class=\"p\">};</span>\n</code></pre>\n  </div>\n</div>\n<p>You can imagine designing a simple network protocol that supported all these options. It might have messages that looked something like:</p>\n\n<p><img alt=\"A packet diagram, just skip down to \"struct bio\" below\" src=\"/blog/machine-migrations/assets/packet.png?2/3&center\" />\nGood news! The Linux block system is organized as if your computer was a network running a protocol that basically looks just like that. Here’s the message structure:</p>\n<div class=\"right-sidenote\"><p>I’ve <a href=\"https://elixir.bootlin.com/linux/v5.11.11/source/include/linux/blk_types.h#L223\" title=\"\">stripped a bunch of stuff out of here</a> but you don’t need any of it to understand what’s coming next.</p>\n</div><div class=\"highlight-wrapper group relative cpp\">\n  <button \n    type=\"button\"\n    class=\"bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n    data-wrap-target=\"#code-qqcnfpws\"\n  >\n    <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\" stroke-linecap=\"round\" stroke-linejoin=\"round\"><g buffered-rendering=\"static\"><path d=\"M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959\" /><path d=\"M11.081 6.466L9.533 8.037l1.548 1.571\" /></g></svg>\n    <span class=\"bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950\">\n      Wrap text\n    </span>\n  </button>\n  <button \n    type=\"button\"\n    class=\"bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n    data-copy-target=\"sibling\"\n  >\n    <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\"><g buffered-rendering=\"static\"><path d=\"M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z\" /><path d=\"M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617\" /></g></svg>\n    <span class=\"bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950\">\n      Copy to clipboard\n    </span>\n  </button>\n  <div class='highlight relative group'>\n    <pre class='highlight '><code id=\"code-qqcnfpws\"><span class=\"cm\">/*\n * main unit of I/O for the block layer and lower layers (ie drivers and\n * stacking drivers)\n */</span>\n<span class=\"k\">struct</span> <span class=\"nc\">bio</span> <span class=\"p\">{</span>\n    <span class=\"k\">struct</span> <span class=\"nc\">gendisk</span>      <span class=\"o\">*</span><span class=\"n\">bi_disk</span><span class=\"p\">;</span>\n    <span class=\"kt\">unsigned</span> <span class=\"kt\">int</span>        <span class=\"n\">bi_opf</span>\n    <span class=\"kt\">unsigned</span> <span class=\"kt\">short</span>      <span class=\"n\">bi_flags</span><span class=\"p\">;</span>   \n    <span class=\"kt\">unsigned</span> <span class=\"kt\">short</span>      <span class=\"n\">bi_ioprio</span><span class=\"p\">;</span>\n    <span class=\"n\">blk_status_t</span>        <span class=\"n\">bi_status</span><span class=\"p\">;</span>\n    <span class=\"kt\">unsigned</span> <span class=\"kt\">short</span>      <span class=\"n\">bi_vcnt</span><span class=\"p\">;</span>         <span class=\"cm\">/* how many bio_vec's */</span>\n    <span class=\"k\">struct</span> <span class=\"nc\">bio_vec</span>      <span class=\"n\">bi_inline_vecs</span><span class=\"p\">[]</span> <span class=\"cm\">/* (page, len, offset) tuples */</span><span class=\"p\">;</span>\n<span class=\"p\">};</span>\n</code></pre>\n  </div>\n</div>\n<p>No nerd has ever looked at a fixed-format message like this without thinking about writing a proxy for it, and <code>struct bio</code> is no exception. The proxy system in the Linux kernel for <code>struct bio</code> is called <code>device mapper</code>, or DM.</p>\n\n<p>DM target devices can plug into other DM devices. For that matter, they can do whatever the hell else they want, as long as they honor the interface. It boils down to a <code>map(bio)</code> function, which can dispatch a <code>struct bio</code>, or drop it, or muck with it and ask the kernel to resubmit it.</p>\n\n<p>You can do a whole lot of stuff with this interface: carve a big device into a bunch of smaller ones (<a href='https://docs.kernel.org/admin-guide/device-mapper/linear.html' title=''><code>dm-linear</code></a>), make one big striped device out of a bunch of smaller ones (<a href='https://docs.kernel.org/admin-guide/device-mapper/striped.html' title=''><code>dm-stripe</code></a>), do software RAID mirroring (<code>dm-raid1</code>), create snapshots of arbitrary existing devices (<a href='https://docs.kernel.org/admin-guide/device-mapper/snapshot.html' title=''><code>dm-snap</code></a>), cryptographically verify boot devices (<a href='https://docs.kernel.org/admin-guide/device-mapper/verity.html' title=''><code>dm-verity</code></a>), and a bunch more. Device Mapper is the kernel backend for the <a href='https://sourceware.org/lvm2/' title=''>userland LVM2 system</a>, which is how we do <a href='https://fly.io/blog/persistent-storage-and-fast-remote-builds/' title=''>thin pools and snapshot backups</a>.</p>\n\n<p>Which brings us to <code>dm-clone</code> : it’s a map function that boils down to:</p>\n<div class=\"highlight-wrapper group relative cpp\">\n  <button \n    type=\"button\"\n    class=\"bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n    data-wrap-target=\"#code-fre1zh1l\"\n  >\n    <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\" stroke-linecap=\"round\" stroke-linejoin=\"round\"><g buffered-rendering=\"static\"><path d=\"M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959\" /><path d=\"M11.081 6.466L9.533 8.037l1.548 1.571\" /></g></svg>\n    <span class=\"bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950\">\n      Wrap text\n    </span>\n  </button>\n  <button \n    type=\"button\"\n    class=\"bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n    data-copy-target=\"sibling\"\n  >\n    <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\"><g buffered-rendering=\"static\"><path d=\"M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z\" /><path d=\"M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617\" /></g></svg>\n    <span class=\"bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950\">\n      Copy to clipboard\n    </span>\n  </button>\n  <div class='highlight relative group'>\n    <pre class='highlight '><code id=\"code-fre1zh1l\">    <span class=\"cm\">/* ... */</span> \n    <span class=\"n\">region_nr</span> <span class=\"o\">=</span> <span class=\"n\">bio_to_region</span><span class=\"p\">(</span><span class=\"n\">clone</span><span class=\"p\">,</span> <span class=\"n\">bio</span><span class=\"p\">);</span>\n\n    <span class=\"c1\">// we have the data</span>\n    <span class=\"k\">if</span> <span class=\"p\">(</span><span class=\"n\">dm_clone_is_region_hydrated</span><span class=\"p\">(</span><span class=\"n\">clone</span><span class=\"o\">-></span><span class=\"n\">cmd</span><span class=\"p\">,</span> <span class=\"n\">region_nr</span><span class=\"p\">))</span> <span class=\"p\">{</span>\n        <span class=\"n\">remap_and_issue</span><span class=\"p\">(</span><span class=\"n\">clone</span><span class=\"p\">,</span> <span class=\"n\">bio</span><span class=\"p\">);</span>\n        <span class=\"k\">return</span> <span class=\"mi\">0</span><span class=\"p\">;</span>\n\n    <span class=\"c1\">// we don't and it's a read</span>\n    <span class=\"p\">}</span> <span class=\"k\">else</span> <span class=\"k\">if</span> <span class=\"p\">(</span><span class=\"n\">bio_data_dir</span><span class=\"p\">(</span><span class=\"n\">bio</span><span class=\"p\">)</span> <span class=\"o\">==</span> <span class=\"n\">READ</span><span class=\"p\">)</span> <span class=\"p\">{</span>\n        <span class=\"n\">remap_to_source</span><span class=\"p\">(</span><span class=\"n\">clone</span><span class=\"p\">,</span> <span class=\"n\">bio</span><span class=\"p\">);</span>\n        <span class=\"k\">return</span> <span class=\"mi\">1</span><span class=\"p\">;</span>\n    <span class=\"p\">}</span>\n\n    <span class=\"c1\">// we don't and it's a write</span>\n    <span class=\"n\">remap_to_dest</span><span class=\"p\">(</span><span class=\"n\">clone</span><span class=\"p\">,</span> <span class=\"n\">bio</span><span class=\"p\">);</span>\n    <span class=\"n\">hydrate_bio_region</span><span class=\"p\">(</span><span class=\"n\">clone</span><span class=\"p\">,</span> <span class=\"n\">bio</span><span class=\"p\">);</span>\n    <span class=\"k\">return</span> <span class=\"mi\">0</span><span class=\"p\">;</span>\n    <span class=\"cm\">/* ... */</span> \n</code></pre>\n  </div>\n</div><div class=\"right-sidenote\"><p>a <a href=\"https://docs.kernel.org/admin-guide/device-mapper/kcopyd.html\" title=\"\"><code>kcopyd</code></a> thread runs in the background, rehydrating the device in addition to (and independent of) read accesses.</p>\n</div>\n<p><code>dm-clone</code> takes, in addition to the source device to clone from, a “metadata” device on which is stored a bitmap of the status of all the blocks: either “rehydrated” from the source, or not. That’s how it knows whether to fetch a block from the original device or the clone.</p>\n<h3 id='network-clone' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#network-clone' aria-label='Anchor'></a><span class='plain-code'>Network Clone</span></h3><div class=\"callout\"><p><strong class=\"font-semibold text-navy-950\"><code>flyd</code> in a nutshell:</strong> worker physical run a service, <code>flyd</code>, which manages a couple of databases that are the source of truth for all the Fly Machines running there. Concepturally, <code>flyd</code> is a server for on-demand instances of durable finite state machines, each representing some operation on a Fly Machine (creation, start, stop, &c), with the transition steps recorded carefully in a BoltDB database. An FSM step might be something like “assign a local IPv6 address to this interface”, or “check out a block device with the contents of this container”, and it’s straightforward to add and manage new ones.</p>\n</div>\n<p>Say we’ve got <code>flyd</code> managing a Fly Machine with a volume on <code>worker-xx-cdg1-1</code>. We want it running on <code>worker-xx-cdg1-2</code>. Our whole fleet is meshed with WireGuard; everything can talk directly to everything else. So, conceptually:</p>\n\n<ol>\n<li><code>flyd</code> on <code>cdg1-1</code> stops the Fly Machine, and\n</li><li>sends a message to <code>flyd</code> on <code>cdg1-2</code> telling it to clone the source volume.\n</li><li><code>flyd</code> on <code>cdg1-2</code> starts a <code>dm-clone</code> instance, which creates a clone volume on <code>cdg1-2</code>, populating it, over some kind of network block protocol, from <code>cdg1-1</code>, and\n</li><li>boots a new Fly Machine, attached to the clone volume.\n</li><li><code>flyd</code> on <code>cdg1-2</code> monitors the clone operation, and, when the clone completes, converts the clone device to a simple linear device and cleans up.\n</li></ol>\n\n<p>For step (3) to work, the “original volume” on <code>cdg1-1</code> has to be visible on <code>cdg1-2</code>, which means we need to mount it over the network.</p>\n<div class=\"right-sidenote\"><p><code>nbd</code> is so simple that it’s used as a sort of <code>dm-user</code> userland block device; to prototype a new block device, <a href=\"https://lwn.net/ml/linux-kernel/[email protected]/\" title=\"\">don’t bother writing a kernel module</a>, just write an <code>nbd</code> server.</p>\n</div>\n<p>Take your pick of  protocols. iSCSI is the obvious one, but it’s relatively complicated, and Linux has native support for a much simpler one: <code>nbd</code>, the “network block device”. You could implement an <code>nbd</code> server in an afternoon, on top of a file or a SQLite database or S3, and the Linux kernel could mount it as a drive.</p>\n\n<p>We started out using <code>nbd</code>. But we kept getting stuck <code>nbd</code> kernel threads when there was any kind of network disruption. We’re a global public cloud; network disruption  happens. Honestly, we could have debugged our way through this. But it was simpler just to spike out an iSCSI implementation, observe that didn’t get jammed up when the network hiccuped, and move on.</p>\n<h3 id='putting-the-pieces-together' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#putting-the-pieces-together' aria-label='Anchor'></a><span class='plain-code'>Putting The Pieces Together</span></h3>\n<p>To drain a worker with minimal downtime and no lost data, we turn workers into a temporary SANs, serving the volumes we need to drain to fresh-booted replica Fly Machines on a bunch of “target” physicals. Those SANs — combinations of <code>dm-clone</code>, iSCSI, and <a href='https://fly.io/blog/carving-the-scheduler-out-of-our-orchestrator/' title=''>our <code>flyd</code> orchestrator</a> — track the blocks copied from the origin, copying each one exactly once and cleaning up when the original volume has been fully copied.</p>\n\n<p>Problem solved!</p>\n<h3 id='no-there-were-more-problems' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#no-there-were-more-problems' aria-label='Anchor'></a><span class='plain-code'>No, There Were More Problems</span></h3>\n<p>When your problem domain is hard, anything you build whose design you can’t fit completely in your head is going to be a fiasco. Shorter form: “if you see Raft consensus in a design, we’ve done something wrong”.</p>\n\n<p>A virtue of this migration system is that, for as many moving pieces as it has, it fits in your head. What complexity it has is mostly shouldered by strategic bets we’ve already  built teams around, most notably the <code>flyd</code> orchestrator. So we’ve been running this system for the better part of a year without much drama. Not no drama, though. Some drama.</p>\n\n<p>Example: we encrypt volumes. Our key management is fussy. We do per-volume encryption keys that provision alongside the volumes themselves, so no one worker has a volume skeleton key.</p>\n\n<p>If you think “migrating those volume keys from worker to worker” is the problem I’m building up to, well, that too, but the bigger problem is <code>trim</code>.</p>\n\n<p>Most people use just a small fraction of the volumes they allocate. A 100GiB volume with just 5MiB used wouldn’t be at all weird. You don’t want to spend minutes copying a volume that could have been fully hydrated in seconds.</p>\n\n<p>And indeed, <code>dm-clone</code> doesn’t want to do that either. Given a source block device (for us, an iSCSI mount) and the clone device, a <code>DISCARD</code> issued on the clone device will get picked up by <code>dm-clone</code>, which will simply <a href='https://elixir.bootlin.com/linux/v5.11.11/source/drivers/md/dm-clone-target.c#L1357' title=''>short-circuit the read</a> of the relevant blocks by marking them as hydrated in the metadata volume. Simple enough.</p>\n\n<p>To make that work, we need the target worker to see the plaintext of the source volume (so that it can do an <code>fstrim</code> — don’t get us started on how annoying it is to sandbox this — to read the filesystem, identify the unused block, and issue the <code>DISCARDs</code> where <code>dm-clone</code> can see them) Easy enough.</p>\n<div class=\"right-sidenote\"><p>these curses have a lot to do with how hard it was to drain workers!</p>\n</div>\n<p>Except: two different workers, for cursed reasons, might be running different versions of <a href='https://gitlab.com/cryptsetup/cryptsetup' title=''>cryptsetup</a>, the userland bridge between LUKS2 and the <a href='https://docs.kernel.org/admin-guide/device-mapper/dm-crypt.html' title=''>kernel dm-crypt driver</a>. There are (or were) two different versions of cryptsetup on our network, and they default to different <a href='https://fossies.org/linux/cryptsetup/docs/on-disk-format-luks2.pdf' title=''>LUKS2 header sizes</a> — 4MiB and 16MiB. Implying two different plaintext volume sizes. </p>\n\n<p>So now part of the migration FSM is an RPC call that carries metadata about the designed LUKS2 configuration for the target VM. Not something we expected to have to build, but, whatever.</p>\n<div class=\"right-sidenote\"><p>Corrosion deserves its own post.</p>\n</div>\n<p>Gnarlier example: workers are the source of truth for information about the Fly Machines running on them. Migration knocks the legs out from under that constraint, which we were relying on in Corrosion, the SWIM-gossip SQLite database we use to connect Fly Machines to our request routing. Race conditions. Debugging. Design changes. Next!</p>\n\n<p>Gnarliest example: our private networks. Recall: we automatically place every Fly Machine into <a href='https://fly.io/blog/incoming-6pn-private-networks/' title=''>a private network</a>; by default, it’s the one all the other apps in your organization run in. This is super handy for setting up background services, databases, and clustered applications. 20 lines of eBPF code in our worker kernels keeps anybody from “crossing the streams”, sending packets from one private network to another.</p>\n<div class=\"right-sidenote\"><p>we’re members of an elite cadre of idiots who have managed to find designs that made us wish IPv6 addresses were even bigger.</p>\n</div>\n<p>We call this scheme 6PN (for “IPv6 Private Network”). It functions by <a href='https://fly.io/blog/ipv6-wireguard-peering#ipv6-private-networking-at-fly' title=''>embedding routing information directly into IPv6 addresses</a>. This is, perhaps, gross. But it allows us to route diverse private networks with constantly changing membership across a global fleet of servers without running a distributed routing protocol. As the beardy wizards who kept the Internet backbone up and running on Cisco AGS+’s once said: the best routing protocol is “static”.</p>\n\n<p>Problem: the embedded routing information in a 6PN address refers in part to specific worker servers.</p>\n\n<p>That’s fine, right? They’re IPv6 addresses. Nobody uses literal IPv6 addresses. Nobody uses IP addresses at all; they use the DNS. When you migrate a host, just give it a new 6PN address, and update the DNS.</p>\n\n<p>Friends, somebody did use literal IPv6 addresses. It was us. In the configurations for Fly Postgres clusters.</p>\n<div class=\"right-sidenote\"><p>It’s also not operationally easy for us to shell into random Fly Machines, for good reason.</p>\n</div>\n<p>The obvious fix for this is not complicated; given <code>flyctl</code> ssh access to a Fly Postgres cluster, it’s like a 30 second ninja edit. But we run a <em>lot</em> of Fly Postgres clusters, and the change has to be coordinated carefully to avoid getting the cluster into a confused state. We went as far as adding feature to our <code>init</code> to do network address mappings to keep old 6PN addresses reachable before biting the bullet and burning several weeks doing the direct configuration fix fleet-wide.</p>\n<figure class=\"post-cta\">\n  <figcaption>\n    <h1>Speedrun your app onto Fly.io.</h1>\n    <p>3…2…1…</p>\n      <a class=\"btn btn-lg\" href=\"https://fly.io/speedrun\">\n        Go!  <span class='opacity-50'>→</span>\n      </a>\n  </figcaption>\n  <div class=\"image-container\">\n    <img src=\"/static/images/cta-cat.webp\" srcset=\"/static/images/[email protected] 2x\" alt=\"\">\n  </div>\n</figure>\n\n<h3 id='the-learning-it-burns' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#the-learning-it-burns' aria-label='Anchor'></a><span class='plain-code'>The Learning, It Burns!</span></h3>\n<p>We get asked a lot why we don’t do storage the “obvious” way, with an <a href='https://aws.amazon.com/ebs/' title=''>EBS-type</a> SAN fabric, abstracting it away from our compute. Locally-attached NVMe storage is an idiosyncratic choice, one we’ve had to write disclaimers for (single-node clusters can lose data!) since we first launched it.</p>\n\n<p>One answer is: we’re a startup. Building SAN infrastructure in every region we operate in would be tremendously expensive. Look at any feature in AWS that normal people know the name of, like EC2, EBS, RDS, or S3 — there’s a whole company in there. We launched storage when we were just 10 people, and even at our current size we probably have nothing resembling the resources EBS gets. AWS is pretty great!</p>\n\n<p>But another thing to keep in mind is: we’re learning as we go. And so even if we had the means to do an EBS-style SAN, we might not build it today.</p>\n\n<p>Instead, we’re a lot more interested in log-structured virtual disks (LSVD). LSVD uses NVMe as a local cache, but durably persists writes in object storage. You get most of the performance benefit of bus-hop disk writes, along with unbounded storage and S3-grade reliability.</p>\n\n<p><a href='https://community.fly.io/t/bottomless-s3-backed-volumes/15648' title=''>We launched LSVD experimentally last year</a>; in the intervening year, something happened to make LSVD even more interesting to us: <a href='https://www.tigrisdata.com/' title=''>Tigris Data</a> launched S3-compatible object storage in every one our regions, so instead of backhauling updates to Northern Virginia, <a href='https://community.fly.io/t/tigris-backed-volumes/20792' title=''>we can keep them local</a>. We have more to say about LSVD, and a lot more to say about Tigris.</p>\n\n<p>Our first several months of migrations were done gingerly. By summer of 2024, we got to where our infra team can pull “drain this host” out of their toolbelt without much ceremony.</p>\n\n<p>We’re still not to the point where we’re migrating casually. Your Fly Machines are probably not getting migrated! There’d need to be a reason! But the dream is fully-automated luxury space migration, in which you might get migrated semiregularly, as our systems work not just to drain problematic hosts but to rebalance workloads regularly. No time soon. But we’ll get there.</p>\n\n<p>This is the biggest thing our team has done since we replaced Nomad with flyd. Only the new billing system comes close. We did this thing not because it was easy, but because we thought it would be easy. It was not. But: worth it!</p>",
      "image": {
        "url": "https://fly.io/blog/machine-migrations/assets/migrations-thumb.webp",
        "title": null
      },
      "media": [],
      "authors": [
        {
          "name": "Fly",
          "email": null,
          "url": null
        }
      ],
      "categories": []
    },
    {
      "id": "https://fly.io/blog/oidc-cloud-roles/",
      "title": "AWS without Access Keys",
      "description": null,
      "url": "https://fly.io/blog/oidc-cloud-roles/",
      "published": "2024-06-19T00:00:00.000Z",
      "updated": "2025-05-20T16:26:25.000Z",
      "content": "<div class=\"lead\"><p>It’s dangerous to go alone. Fly.io runs full-stack apps by transmuting Docker containers into Fly Machines: ultra-lightweight hardware-backed VMs. You can run all your dependencies on Fly.io, but sometimes, you’ll need to work with other clouds, and we’ve made that pretty simple. Try Fly.io out for yourself; your Rails or Node app <a href=\"https://fly.io/speedrun\" title=\"\">can be up and running in just minutes</a>.</p>\n</div>\n<p>Let’s hypopulate you an app serving generative AI cat images based on the weather forecast, running on a <code>g4dn.xlarge</code> ECS task in AWS <code>us-east-1</code>.  It’s going great; people didn’t realize how dependent their cat pic prefs are on barometric pressure, and you’re all anyone can talk about.</p>\n\n<p>Word reaches Australia and Europe, but you’re not catching on, because the… latency is too high? Just roll with us here. Anyways: fixing this is going to require replicating  ECS tasks and ECR images into <code>ap-southeast-2</code> and <code>eu-central-1</code> while also setting up load balancing. Nah.</p>\n\n<p>This is the O.G. Fly.io deployment story; one deployed app, one versioned container, one command to get it running anywhere in the world.</p>\n\n<p>But you have a problem: your app relies on training data, it’s huge, your giant employer manages it, and it’s in S3. Getting this to work will require AWS credentials.</p>\n\n<p>You could ask your security team to create a user, give it permissions, and hand over the AWS keypair. Then you could wash your neck and wait for the blade. Passing around AWS keypairs is the beginning of every horror story told about cloud security, and  security team ain’t having it.</p>\n\n<p>There’s a better way. It’s drastically more secure, so your security people will at least hear you out. It’s also so much easier on Fly.io that you might never bother creating a IAM service account again.</p>\n<h2 id='lets-get-it-out-of-the-way' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#lets-get-it-out-of-the-way' aria-label='Anchor'></a><span class='plain-code'>Let’s Get It out of the Way</span></h2>\n<p>We’re going to use OIDC to set up strictly limited trust between AWS and Fly.io.</p>\n\n<ol>\n<li>In AWS: we’ll add Fly.io as an <code>Identity Provider</code> in AWS IAM, giving us an ID we can plug into any IAM <code>Role</code>.\n</li><li>Also in AWS: we’ll create a <code>Role</code>, give it access to the S3 bucket with our tokenized cat data, and then attach the <code>Identity Provider</code> to it.\n</li><li>In Fly.io, we’ll take the <code>Role</code> ARN we got from step 2 and set it as an environment variable in our app.\n</li></ol>\n\n<p>Our machines will now magically have access to the S3 bucket.</p>\n<h2 id='what-the-what' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#what-the-what' aria-label='Anchor'></a><span class='plain-code'>What the What</span></h2>\n<p>A reasonable question to ask here is, “where’s the credential”? Ordinarily, to give a Fly Machine access to an AWS resource, you’d use <code>fly secrets set</code> to add an <code>AWS_ACCESS_KEY_ID</code> and <code>AWS_SECRET_ACCESS_KEY</code> to the environment in the Machine. Here, we’re not setting any secrets at all; we’re just adding an ARN — which is not a credential — to the Machine.</p>\n\n<p>Here’s what’s happening.</p>\n\n<p>Fly.io operates an OIDC IdP at <code>oidc.fly.io</code>. It issues OIDC tokens, exclusively to Fly Machines. AWS can be configured to trust these tokens, on a role-by-role basis. That’s the “secret credential”: the pre-configured trust relationship in IAM, and the public keypairs it manages. You, the user, never need to deal with these keys directly; it all happens behind the scenes, between AWS and Fly.io.</p>\n\n<p><img alt=\"A diagram: STS trusts OIDC.fly.io. OIDC.fly.io trusts flyd. flyd issues a token to the Machine, which proffers it to STS. STS sends an STS cred to the Machine, which then uses it to retrieve model weights from S3.\" src=\"/blog/oidc-cloud-roles/assets/oidc-diagram.webp\" /></p>\n\n<p>The key actor in this picture is <code>STS</code>, the AWS <code>Security Token Service</code>. <code>STS</code>‘s main job is to vend short-lived AWS credentials, usually through some variant of an API called <code>AssumeRole</code>. Specifically, in our case: <code>AssumeRoleWithWebIdentity</code> tells <code>STS</code> to cough up an AWS keypair given an OIDC token (that matches a pre-configured trust relationship).</p>\n\n<p>That still leaves the question: how does your code, which is reaching out to the AWS APIs to get cat weights, drive any of this?</p>\n<h2 id='the-init-thickens' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#the-init-thickens' aria-label='Anchor'></a><span class='plain-code'>The Init Thickens</span></h2>\n<p>Every Fly Machine boots up into an <code>init</code> we wrote in Rust. It has slowly been gathering features.</p>\n\n<p>One of those features, which has been around for awhile, is a server for a Unix socket at <code>/.fly/api</code>, which exports a subset of the Fly Machines API to privileged processes in the Machine. Think of it as our answer to the EC2 Instant Metadata Service. How it works is, every time we boot a Fly Machine, we pass it a <a href='https://fly.io/blog/macaroons-escalated-quickly/' title=''>Macaroon token</a> locked to that particular Machine; <code>init</code>’s server for <code>/.fly/api</code> is a proxy that attaches that token to requests.</p>\n<div class=\"right-sidenote\"><p>In addition to the API proxy being tricky to SSRF to.</p>\n</div>\n<p>What’s neat about this is that the credential that drives <code>/.fly/api</code> is doubly protected:</p>\n\n<ol>\n<li>The Fly.io platform won’t honor it unless it comes from that specific Fly Machine (<code>flyd</code>, our orchestrator, knows who it’s talking to), <em>and</em>\n</li><li>Ordinary code running in a Fly Machine never gets a copy of the token to begin with.\n</li></ol>\n\n<p>You could rig up a local privilege escalation vulnerability and work out how to steal the Macaroon, but you can’t exfiltrate it productively.</p>\n\n<p>So now you have half the puzzle worked out: OIDC is just part of the <a href='https://fly.io/docs/machines/api/' title=''>Fly Machines API</a> (specifically: <code>/v1/tokens/oidc</code>). A Fly Machine can hit a Unix socket and get an OIDC token tailored to that machine:</p>\n<div class=\"highlight-wrapper group relative \">\n  <button \n    type=\"button\"\n    class=\"bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n    data-wrap-target=\"#code-dm2j0e3b\"\n  >\n    <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\" stroke-linecap=\"round\" stroke-linejoin=\"round\"><g buffered-rendering=\"static\"><path d=\"M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959\" /><path d=\"M11.081 6.466L9.533 8.037l1.548 1.571\" /></g></svg>\n    <span class=\"bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950\">\n      Wrap text\n    </span>\n  </button>\n  <button \n    type=\"button\"\n    class=\"bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n    data-copy-target=\"sibling\"\n  >\n    <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\"><g buffered-rendering=\"static\"><path d=\"M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z\" /><path d=\"M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617\" /></g></svg>\n    <span class=\"bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950\">\n      Copy to clipboard\n    </span>\n  </button>\n  <div class='highlight relative group'>\n    <pre class='highlight '><code id=\"code-dm2j0e3b\">{\n  \"app_id\": \"3671581\",\n  \"app_name\": \"weather-cat\",\n  \"aud\": \"sts.amazonaws.com\",\n  \"image\": \"image:latest\",\n  \"image_digest\": \"sha256:dff79c6da8dd4e282ecc6c57052f7cfbd684039b652f481ca2e3324a413ee43f\",\n  \"iss\": \"https://oidc.fly.io/example\",\n  \"machine_id\": \"3d8d377ce9e398\",\n  \"machine_name\": \"ancient-snow-4824\",\n  \"machine_version\": \"01HZJXGTQ084DX0G0V92QH3XW4\",\n  \"org_id\": \"29873298\",\n  \"org_name\": \"example\",\n  \"region\": \"yyz\",\n  \"sub\": \"example:weather-cat:ancient-snow-4824\"\n} // some OIDC stuff trimmed\n</code></pre>\n  </div>\n</div>\n<p>Look upon this holy blob, sealed with a published key managed by Fly.io’s OIDC vault, and see that there lies within it enough information for AWS <code>STS</code> to decide to issue a session credential.</p>\n\n<p>We have still not completed the puzzle, because while you can probably now see how you’d drive this process with a bunch of new code that you’d tediously write, you are acutely aware that you have not yet endured that tedium — e pur si muove!</p>\n\n<p>One <code>init</code> feature remains to be disclosed, and it’s cute.</p>\n\n<p>If, when <code>init</code> starts in a Fly Machine, it sees an <code>AWS_ROLE_ARN</code> environment variable set, it initiates a little dance; it:</p>\n\n<ol>\n<li>goes off and generates an OIDC token, the way we just described,\n</li><li>saves that OIDC token in a file, <em>and</em>\n</li><li>sets the <code>AWS_WEB_IDENTITY_TOKEN_FILE</code> and <code>AWS_ROLE_SESSION_NAME</code> environment variables for every process it launches.\n</li></ol>\n\n<p>The AWS SDK, linked to your application, does all the rest.</p>\n\n<p>Let’s review: you add an <code>AWS_ROLE_ARN</code> variable to your Fly App, launch a Machine, and have it go fetch a file from S3. What happens next is:</p>\n\n<ol>\n<li><code>init</code> detects <code>AWS_ROLE_ARN</code> is set as an environment variable.\n</li><li><code>init</code> sends a request to <code>/v1/tokens/oidc</code> via <code>/.api/proxy</code>.\n</li><li><code>init</code> writes the response to <code>/.fly/oidc_token.</code>\n</li><li><code>init</code> sets <code>AWS_WEB_IDENTITY_TOKEN_FILE</code> and <code>AWS_ROLE_SESSION_NAME</code>.\n</li><li>The entrypoint boots, and (say) runs <code>aws s3 get-object.</code>\n</li><li>The AWS SDK runs through the <a href='https://docs.aws.amazon.com/sdkref/latest/guide/standardized-credentials.html#credentialProviderChain' title=''>credential provider chain</a>\n</li><li>The SDK sees that <code>AWS_WEB_IDENTITY_TOKEN_FILE</code> is set and calls <code>AssumeRoleWithWebIdentity</code> with the file contents.\n</li><li>AWS verifies the token against <a href='https://oidc.fly.io/' title=''><code>https://oidc.fly.io/</code></a><code>example/.well-known/openid-configuration</code>, which references a key Fly.io manages on isolated hardware.\n</li><li>AWS vends <code>STS</code> credentials for the assumed <code>Role</code>.\n</li><li>The SDK uses the <code>STS</code> credentials to access the S3 bucket.\n</li><li>AWS checks the <code>Role</code>’s IAM policy to see if it has access to the S3 bucket.\n</li><li>AWS returns the contents of the bucket object.\n</li></ol>\n<h2 id='how-much-better-is-this' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#how-much-better-is-this' aria-label='Anchor'></a><span class='plain-code'>How Much Better Is This?</span></h2>\n<p>It is a lot better.</p>\n<div class=\"right-sidenote\"><p>They asymptotically approach the security properties of Macaroon tokens.</p>\n</div>\n<p>Most importantly: AWS <code>STS</code> credentials are short-lived. Because they’re generated dynamically, rather than stored in a configuration file or environment variable, they’re already a little bit annoying for an attacker to recover. But they’re also dead in minutes. They have a sharply limited blast radius. They rotate themselves, and fail closed.</p>\n\n<p>They’re also easier to manage. This is a rare instance where you can reasonably drive the entire AWS side of the process from within the web console. Your cloud team adds <code>Roles</code> all the time; this is just a <code>Role</code> with an extra snippet of JSON. The resulting ARN isn’t even a secret; your cloud team could just email or Slack message it back to you.</p>\n\n<p>Finally, they offer finer-grained control.</p>\n\n<p>To understand the last part, let’s look at that extra snippet of JSON (the “Trust Policy”) your cloud team is sticking on the new <code>cat-bucket</code> <code>Role</code>:</p>\n<div class=\"highlight-wrapper group relative \">\n  <button \n    type=\"button\"\n    class=\"bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n    data-wrap-target=\"#code-ex6t9zqy\"\n  >\n    <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\" stroke-linecap=\"round\" stroke-linejoin=\"round\"><g buffered-rendering=\"static\"><path d=\"M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959\" /><path d=\"M11.081 6.466L9.533 8.037l1.548 1.571\" /></g></svg>\n    <span class=\"bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950\">\n      Wrap text\n    </span>\n  </button>\n  <button \n    type=\"button\"\n    class=\"bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n    data-copy-target=\"sibling\"\n  >\n    <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\"><g buffered-rendering=\"static\"><path d=\"M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z\" /><path d=\"M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617\" /></g></svg>\n    <span class=\"bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950\">\n      Copy to clipboard\n    </span>\n  </button>\n  <div class='highlight relative group'>\n    <pre class='highlight '><code id=\"code-ex6t9zqy\">{\n    \"Version\": \"2012-10-17\",\n    \"Statement\": [\n        {\n            \"Effect\": \"Allow\",\n            \"Principal\": {\n                \"Federated\": \"arn:aws:iam::123456123456:oidc-provider/oidc.fly.io/example\"\n            },\n            \"Action\": \"sts:AssumeRoleWithWebIdentity\",\n            \"Condition\": {\n              \"StringEquals\": {\n                \"oidc.fly.io/example:aud\": \"sts.amazonaws.com\",\n              },\n               \"StringLike\": {\n                \"oidc.fly.io/example:sub\": \"example:weather-cat:*\"\n              }\n            }\n        }\n    ]\n}\n</code></pre>\n  </div>\n</div><div class=\"right-sidenote\"><p>The <code>aud</code> check guarantees <code>STS</code> will only honor tokens that Fly.io deliberately vended for <code>STS</code>.</p>\n</div>\n<p>Recall the OIDC token we dumped earlier; much of what’s in it, we can match in the Trust Policy. Every OIDC token Fly.io generates is going to have a <code>sub</code> field formatted <code>org:app:machine</code>, so we can lock IAM <code>Roles</code> down to organizations, or to specific Fly Apps, or even specific Fly Machine instances.</p>\n<figure class=\"post-cta\">\n  <figcaption>\n    <h1>Speedrun your app onto Fly.io.</h1>\n    <p>3…2…1…</p>\n      <a class=\"btn btn-lg\" href=\"https://fly.io/speedrun\">\n        Go!  <span class='opacity-50'>→</span>\n      </a>\n  </figcaption>\n  <div class=\"image-container\">\n    <img src=\"/static/images/cta-cat.webp\" srcset=\"/static/images/[email protected] 2x\" alt=\"\">\n  </div>\n</figure>\n\n<h2 id='and-so' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#and-so' aria-label='Anchor'></a><span class='plain-code'>And So</span></h2>\n<p>In case it’s not obvious: this pattern works for any AWS API, not just S3.</p>\n\n<p>Our OIDC support on the platform and in Fly Machines will set arbitrary OIDC <code>audience</code> strings, so you can use it to authenticate to any OIDC-compliant cloud provider. It won’t be as slick on Azure or GCP, because we haven’t done the <code>init</code> features to light their APIs up with a single environment variable — but those features are easy, and we’re just waiting for people to tell us what they need.</p>\n\n<p>For us, the gold standard for least-privilege, conditional access tokens remains Macaroons, and it’s unlikely that we’re going to do a bunch of internal stuff using OIDC. We even snuck Macaroons into this feature. But the security you’re getting from this OIDC dance closes a lot of the gap between hardcoded user credentials and Macaroons, and it’s easy to use — easier, in some ways, than it is to manage role-based access inside of a legacy EC2 deployment!</p>",
      "image": {
        "url": "https://fly.io/blog/oidc-cloud-roles/assets/spooky-security-skeleton-thumb.webp",
        "title": null
      },
      "media": [],
      "authors": [
        {
          "name": "Fly",
          "email": null,
          "url": null
        }
      ],
      "categories": []
    },
    {
      "id": "https://fly.io/blog/llm-image-description/",
      "title": "Picture This: Open Source AI for Image Description",
      "description": null,
      "url": "https://fly.io/blog/llm-image-description/",
      "published": "2024-05-09T00:00:00.000Z",
      "updated": "2025-05-20T16:26:25.000Z",
      "content": "<div class=\"lead\"><p>I’m Nolan, and I work on Fly Machines here at Fly.io. We’re building a new public cloud—one where you can spin up CPU and GPU workloads, around the world, in a jiffy. <a href=\"https://fly.io/speedrun/\" title=\"\">Try us out</a>; you can be up and running in minutes. This is a post about LLMs being really helpful, and an extensible project you can build with open source on a weekend.</p>\n</div>\n<p>Picture this, if you will.</p>\n\n<p>You’re blind. You’re in an unfamiliar hotel room on a trip to Chicago.</p>\n<div class=\"right-sidenote\"><p>If you live in Chicago IRL, imagine the hotel in Winnipeg, <a href=\"https://www.cbc.ca/history/EPISCONTENTSE1EP10CH3PA5LE.html\" title=\"\">the Chicago of the North</a>.</p>\n</div>\n<p>You’ve absent-mindedly set your coffee down, and can’t remember where. You’re looking for the thermostat so you don’t wake up frozen. Or, just maybe, you’re playing a fun-filled round of “find the damn light switch so your sighted partner can get some sleep already!”</p>\n\n<p>If, like me, you’ve been blind for a while, you have plenty of practice finding things without the luxury of a quick glance around. It may be more tedious than you’d like, but you’ll get it done.</p>\n\n<p>But the speed of innovation in machine learning and large language models has been dizzying, and in 2024 you can snap a photo with your phone and have an app like <a href='https://www.bemyeyes.com/blog/announcing-be-my-ai/' title=''>Be My AI</a> or <a href='https://www.seeingai.com/' title=''>Seeing AI</a> tell you where in that picture it found your missing coffee mug, or where it thinks the light switch is.</p>\n<div class=\"right-sidenote\"><p>Creative switch locations seem to be a point of pride for hotels, so the light game may be good for a few rounds of quality entertainment, regardless of how good your AI is.</p>\n</div>\n<p>This is <em>big</em>. It’s hard for me to state just how exciting and empowering AI image descriptions have been for me without sounding like a shill. In the past year, I’ve:</p>\n\n<ul>\n<li>Found shit in strange hotel rooms. \n</li><li>Gotten descriptions of scenes and menus in otherwise inaccessible video games.\n</li><li>Requested summaries of technical diagrams and other materials where details weren’t made available textually. \n</li></ul>\n\n<p>I’ve been consistently blown away at how impressive and helpful AI-created image descriptions have been.</p>\n\n<p>Also…</p>\n<h2 id='which-thousand-words-is-this-picture-worth' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#which-thousand-words-is-this-picture-worth' aria-label='Anchor'></a><span class='plain-code'>Which thousand words is this picture worth?</span></h2>\n<p>As a blind internet user for the last three decades, I have extensive empirical evidence to corroborate what you already know in your heart: humans are pretty flaky about writing useful alt text for all the images they publish. This does tend to make large swaths of the internet inaccessible to me!</p>\n\n<p>In just a few years, the state of image description on the internet has gone from complete reliance on the aforementioned lovable, but ultimately tragically flawed, humans, to automated strings of words like <code>Image may contain person, glasses, confusion, banality, disillusionment</code>, to LLM-generated text that reads a lot like it was written by a person, perhaps sipping from a steaming cup of Earl Grey as they reflect on their previous experiences of a background that features a tree with snow on its branches, suggesting that this scene takes place during winter.</p>\n\n<p>If an image is missing alt text, or if you want a second opinion, there are screen-reader addons, like <a href='https://github.com/cartertemm/AI-content-describer/' title=''>this one</a> for <a href='https://www.nvaccess.org/download/' title=''>NVDA</a>, that you can use with an API key to get image descriptions from GPT-4 or Google Gemini as you read. This is awesome! </p>\n\n<p>And this brings me to the nerd snipe. How hard would it be to build an image description service we can host ourselves, using open source technologies? It turns out to be spookily easy.</p>\n\n<p>Here’s what I came up with:</p>\n\n<ol>\n<li><a href='https://ollama.com/' title=''>Ollama</a> to run the model\n</li><li>A <a href='https://pocketbase.io' title=''>PocketBase</a> project that provides a simple authenticated API for users to submit images, get descriptions, and ask followup questions about the image\n</li><li>The simplest possible Python client to interact with the PocketBase app on behalf of users\n</li></ol>\n\n<p>The idea is to keep it modular and hackable, so if sentiment analysis or joke creation is your thing, you can swap out image description for that and have something going in, like, a weekend.</p>\n\n<p>If you’re like me, and you go skipping through recipe blogs to find the “go directly to recipe” link, find the code itself <a href='https://github.com/superfly/llm-describer' title=''>here</a>. </p>\n<h2 id='the-llm-is-the-easiest-part' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#the-llm-is-the-easiest-part' aria-label='Anchor'></a><span class='plain-code'>The LLM is the easiest part</span></h2>\n<p>An API to accept images and prompts, run the model, and spit \nout answers sounds like a lot! But it’s the simplest part of this whole thing, because: \nthat’s <a href='https://ollama.com/' title=''>Ollama</a>.</p>\n\n<p>You can just run the Ollama Docker image, get it to grab the model \nyou want to use, and that’s it. There’s your AI server. (We have a <a href='https://fly.io/blog/scaling-llm-ollama/' title=''>blog post</a> \nall about deploying Ollama on Fly.io; Fly GPUs are rad, try'em out, etc.).</p>\n\n<p>For this project, we need a model that can make sense—or at least words—out of a picture. \n<a href='https://llava-vl.github.io/' title=''>LLaVA</a> is a trained, Apache-licensed “large multimodal model” that fits the bill. \nGet the model with the Ollama CLI:</p>\n<div class=\"highlight-wrapper group relative cmd\">\n  <button \n    type=\"button\"\n    class=\"bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n    data-wrap-target=\"#code-jo727efe\"\n  >\n    <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\" stroke-linecap=\"round\" stroke-linejoin=\"round\"><g buffered-rendering=\"static\"><path d=\"M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959\" /><path d=\"M11.081 6.466L9.533 8.037l1.548 1.571\" /></g></svg>\n    <span class=\"bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950\">\n      Wrap text\n    </span>\n  </button>\n  <button \n    type=\"button\"\n    class=\"bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n    data-copy-target=\"sibling\"\n  >\n    <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\"><g buffered-rendering=\"static\"><path d=\"M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z\" /><path d=\"M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617\" /></g></svg>\n    <span class=\"bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950\">\n      Copy to clipboard\n    </span>\n  </button>\n  <div class='highlight relative group'>\n    <pre class='highlight cmd'><code id=\"code-jo727efe\">ollama pull llava:34b\n</code></pre>\n  </div>\n</div><div class=\"callout\"><p>If you have hardware that can handle it, you could run this on your computer at home. If you run AI models on a cloud provider, be aware that GPU compute is expensive! <strong class=\"font-semibold text-navy-950\">It’s important to take steps to ensure you’re not paying for a massive GPU 24/7.</strong></p>\n\n<p>On Fly.io, at the time of writing, you’d achieve this with the <a href=\"https://fly.io/docs/apps/autostart-stop/\" title=\"\">autostart and autostop</a> functions of the Fly Proxy, restricting Ollama access to internal requests over <a href=\"https://fly.io/docs/networking/private-networking/#flycast-private-fly-proxy-services\" title=\"\">Flycast</a> from the PocketBase app. That way, if there haven’t been any requests for a few minutes, the Fly Proxy stops the Ollama <a href=\"https://fly.io/docs/machines/\" title=\"\">Machine</a>, which releases the CPU, GPU, and RAM allocated to it. <a href=\"https://fly.io/blog/scaling-llm-ollama/\" title=\"\">Here’s a post</a> that goes into more detail. </p>\n</div><h2 id='a-multi-tool-on-the-backend' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#a-multi-tool-on-the-backend' aria-label='Anchor'></a><span class='plain-code'>A multi-tool on the backend</span></h2>\n<p>I want user auth to make sure just anyone can’t grab my “image description service” and keep it busy generating short stories about their cat. If I build this out into a service for others to use, I might also want business logic around plans or\ncredits, or mobile-friendly APIs for use in the field. <a href='https://pocketbase.io' title=''>PocketBase</a> provides a scaffolding for all of it. It’s a Swiss army knife: a Firebase-like API on top of SQLite, complete with authentication, authorization, an admin UI, extensibility in JavaScript and Go, and various client-side APIs.</p>\n<div class=\"right-sidenote\"><p>Yes, <em>of course</em> I’ve used an LLM to generate feline fanfic. Theme songs too. Hasn’t everyone? </p>\n</div>\n<p>I “faked” a task-specific API that supports followup questions by extending PocketBase in Go, modeling requests and responses as <a href='https://pocketbase.io/docs/collections/' title=''>collections</a> (i.e. SQLite tables) with <a href='https://pocketbase.io/docs/go-event-hooks/' title=''>event hooks</a> to trigger pre-set interactions with the Ollama app (via <a href='https://tmc.github.io/langchaingo' title=''>LangChainGo</a>) and the client (via the PocketBase API).</p>\n\n<p>If you’re following along, <a href='https://github.com/superfly/llm-describer/blob/main/describer.go' title=''>here’s the module</a>\nthat handles all that, along with initializing the LLM connection.</p>\n\n<p>In a nutshell, this is the dance:</p>\n\n<ul>\n<li>When a user uploads an image, a hook on the <code>images</code> collection sends the image to Ollama, along with this prompt:\n<code>\"You are a helpful assistant describing images for blind screen reader users. Please describe this image.\"</code>\n</li><li>Ollama sends back its response, which the backend 1) passes back to the client and 2) stores in its <code>followups</code> collection for future reference.\n</li><li>If the user responds with a followup question about the image and description, that also \ngoes into the <code>followups</code> collection; user-initiated changes to this collection trigger a hook to chain the new \nfollowup question with the image and the chat history into a new request for the model.\n</li><li>Lather, rinse, repeat.\n</li></ul>\n\n<p>This is a super simple hack to handle followup questions, and it’ll let you keep adding followups until \nsomething breaks. You’ll see the quality of responses get poorer—possibly incoherent—as the context \nexceeds the context window.</p>\n\n<p>I also set up <a href='https://pocketbase.io/docs/api-rules-and-filters/' title=''>API rules</a> in PocketBase,\nensuring that users can’t read to and write from others’ chats with the AI.</p>\n\n<p>If image descriptions aren’t your thing, this business logic is easily swappable \nfor joke generation, extracting details from text, any other simple task you \nmight want to throw at an LLM. Just slot the best model into Ollama (LLaVA is pretty OK as a general starting point too), and match the PocketBase schema and pre-set prompts to your application.</p>\n<h2 id='a-seedling-of-a-client' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#a-seedling-of-a-client' aria-label='Anchor'></a><span class='plain-code'>A seedling of a client</span></h2>\n<p>With the image description service in place, the user can talk to it with any client that speaks the PocketBase API. PocketBase already has SDK clients in JavaScript and Dart, but because my screen reader is <a href='https://github.com/nvaccess/nvda' title=''>written in Python</a>, I went with a <a href='https://pypi.org/project/pocketbase/' title=''>community-created Python library</a>. That way I can build this out into an NVDA add-on \nif I want to.</p>\n\n<p>If you’re a fancy Python developer, you probably have your preferred tooling for\nhandling virtualenvs and friends. I’m not, and since my screen reader doesn’t use those\nanyway, I just <code>pip install</code>ed the library so my client can import it:</p>\n<div class=\"highlight-wrapper group relative cmd\">\n  <button \n    type=\"button\"\n    class=\"bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n    data-wrap-target=\"#code-8wi7b0yx\"\n  >\n    <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\" stroke-linecap=\"round\" stroke-linejoin=\"round\"><g buffered-rendering=\"static\"><path d=\"M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959\" /><path d=\"M11.081 6.466L9.533 8.037l1.548 1.571\" /></g></svg>\n    <span class=\"bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950\">\n      Wrap text\n    </span>\n  </button>\n  <button \n    type=\"button\"\n    class=\"bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n    data-copy-target=\"sibling\"\n  >\n    <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\"><g buffered-rendering=\"static\"><path d=\"M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z\" /><path d=\"M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617\" /></g></svg>\n    <span class=\"bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950\">\n      Copy to clipboard\n    </span>\n  </button>\n  <div class='highlight relative group'>\n    <pre class='highlight cmd'><code id=\"code-8wi7b0yx\">pip install pocketbase\n</code></pre>\n  </div>\n</div>\n<p><a href='https://github.com/superfly/llm-describer/blob/main/__init__.py' title=''>My client</a> is a very simple script. \nIt expects a couple of things: a file called <code>image.jpg</code>, located in the current directory, \nand environment variables to provide the service URL and user credentials to log into it with.</p>\n\n<p>When you run the client script, it uploads the image to the user’s <code>images</code> collection on the \nbackend app, starting the back-and-forth between user and model we saw in the previous section. \nThe client prints the model’s output to the CLI and prompts the user to input a followup question, \nwhich it passes up to the <code>followups</code> collection, and so on.</p>\n<figure class=\"post-cta\">\n  <figcaption>\n    <h1>This can run on Fly.io.</h1>\n    <p>Run your LLM on a datacenter-grade GPU.</p>\n      <a class=\"btn btn-lg\" href=\"https://fly.io/gpu/\">\n        Try out a Fly GPU  <span class='opacity-50'>→</span>\n      </a>\n  </figcaption>\n  <div class=\"image-container\">\n    <img src=\"/static/images/cta-turtle.webp\" srcset=\"/static/images/[email protected] 2x\" alt=\"\">\n  </div>\n</figure>\n\n<h2 id='all-together-now' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#all-together-now' aria-label='Anchor'></a><span class='plain-code'>All together now</span></h2>\n<p>I grabbed <a href='https://unsplash.com/photos/brown-trees-beside-river-under-blue-sky-during-daytime-kSvpTrfhaiU' title=''>this\nimage</a>\nand saved it to a file called <em>image.jpg</em>. </p>\n\n<p>While I knew I was downloading an image of a winter scene, all I see on Unsplash is:</p>\n\n<blockquote>\n<p>brown trees beside river under blue sky during daytime Bright winter landscape\nwith lake, snow, forest, beautiful blue sky and white clouds. An example of\ncharming wildlife in Russia.</p>\n</blockquote>\n\n<p>Let’s see what our very own AI describer thinks of this picture:</p>\n<div class=\"highlight-wrapper group relative plain\">\n  <button \n    type=\"button\"\n    class=\"bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n    data-wrap-target=\"#code-3cm0shen\"\n  >\n    <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\" stroke-linecap=\"round\" stroke-linejoin=\"round\"><g buffered-rendering=\"static\"><path d=\"M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959\" /><path d=\"M11.081 6.466L9.533 8.037l1.548 1.571\" /></g></svg>\n    <span class=\"bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950\">\n      Wrap text\n    </span>\n  </button>\n  <button \n    type=\"button\"\n    class=\"bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n    data-copy-target=\"sibling\"\n  >\n    <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\"><g buffered-rendering=\"static\"><path d=\"M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z\" /><path d=\"M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617\" /></g></svg>\n    <span class=\"bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950\">\n      Copy to clipboard\n    </span>\n  </button>\n  <div class='highlight relative group'>\n    <pre class='highlight '><code id=\"code-3cm0shen\">$ python __init__.py\nThe image depicts a serene winter scene. There is a small stream of water flowing through the center, with patches of snow on its surface and along the banks. The surrounding area appears to be a mix of bare trees and shrubs, suggesting it's late in the season or possibly early spring when some plants have started to grow again but haven't yet leafed out.\n\nThe sky is clear and blue, indicating good weather conditions at the time the photo was taken. The sunlight seems to be coming from the left side of the image, casting a warm glow on the scene and creating reflections in the water. There are no visible texts or distinguishing marks that provide additional context about the location or the photographer.\n\nThe overall atmosphere is peaceful and natural, with the quiet beauty of the landscape undisturbed except for the gentle movement of the stream.\n</code></pre>\n  </div>\n</div>\n<p>Is it a stellar description? Maybe not, but it certainly gives me a better sense of connection with the scene.</p>\n\n<p>Let’s see how our describer copes with a followup question.</p>\n<div class=\"highlight-wrapper group relative plain\">\n  <button \n    type=\"button\"\n    class=\"bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n    data-wrap-target=\"#code-o72lg1fm\"\n  >\n    <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\" stroke-linecap=\"round\" stroke-linejoin=\"round\"><g buffered-rendering=\"static\"><path d=\"M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959\" /><path d=\"M11.081 6.466L9.533 8.037l1.548 1.571\" /></g></svg>\n    <span class=\"bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950\">\n      Wrap text\n    </span>\n  </button>\n  <button \n    type=\"button\"\n    class=\"bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n    data-copy-target=\"sibling\"\n  >\n    <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\"><g buffered-rendering=\"static\"><path d=\"M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z\" /><path d=\"M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617\" /></g></svg>\n    <span class=\"bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950\">\n      Copy to clipboard\n    </span>\n  </button>\n  <div class='highlight relative group'>\n    <pre class='highlight '><code id=\"code-o72lg1fm\">Enter your followup question, or 'quit' to quit: What types of trees are in the image?\nSending followup question\nIt's difficult to determine the exact species of trees in the image without more specific details, as many deciduous trees lose their leaves during winter and look quite similar. However, based on the general appearance of the bare branches and the fact that they are leafless, it is likely that these are common deciduous trees such as oaks, maples, or elms.\n\nThe presence of snow and the absence of foliage suggest that this photo was taken during winter or early spring when the trees have not yet started to bloom again. The exact species would require a closer examination of the bark, buds, and other distinguishing features, which are not clearly visible in this image.\n</code></pre>\n  </div>\n</div>\n<p>Boo, the general-purpose LLaVA model couldn’t identify the leafless trees. At least it knows why it can’t. Maybe there’s a better model out \nthere for that. Or we could train one, if we really needed tree identification! We could make every component of \nthis service more sophisticated! </p>\n\n<p>But that I, personally, can make a proof of concept like this with a few days of effort\ncontinues to boggle my mind. Thanks to a handful of amazing open source projects, it’s really, spookily, easy. And from here, I (or you) can build out a screen-reader addon, or a mobile app, or a different kind of AI service, with modular changes.</p>\n<h2 id='deployment-notes' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#deployment-notes' aria-label='Anchor'></a><span class='plain-code'>Deployment notes</span></h2>\n<p>On Fly.io, stopping GPU Machines saves you a bunch of money and some carbon footprint, in return for cold-start latency when you make a request for the first time in more than a few minutes. In testing this project, on the <code>a100-40gb</code> Fly Machine preset, the 34b-parameter LLaVA model took several seconds to generate each response. If the Machine was stopped when the request came in, starting it up took another handful of seconds, followed by several tens of seconds to load the model into GPU RAM. The total time from cold start to completed description was about 45 seconds. Just something to keep in mind.</p>\n\n<p>If you’re running Ollama in the cloud, you likely want to put the model onto storage that’s persistent, so you don’t have to download it repeatedly. You could also build the model into a Docker image ahead of deployment.</p>\n\n<p>The PocketBase Golang app compiles to a single executable that you can run wherever.\nI run it on Fly.io, unsurprisingly, and the <a href='https://github.com/superfly/llm-describer/' title=''>repo</a> comes with a Dockerfile and a <a href='https://fly.io/docs/reference/configuration/' title=''><code>fly.toml</code></a> config file, which you can edit to point at your own Ollama instance. It uses a small persistent storage volume for the SQLite database. Under testing, it runs fine on a <code>shared-cpu-1x</code> Machine. </p>",
      "image": {
        "url": "https://fly.io/blog/llm-image-description/assets/image-description-thumb.webp",
        "title": null
      },
      "media": [],
      "authors": [
        {
          "name": "Fly",
          "email": null,
          "url": null
        }
      ],
      "categories": []
    },
    {
      "id": "https://fly.io/blog/jit-wireguard-peers/",
      "title": "JIT WireGuard",
      "description": null,
      "url": "https://fly.io/blog/jit-wireguard-peers/",
      "published": "2024-03-12T00:00:00.000Z",
      "updated": "2025-05-20T16:26:25.000Z",
      "content": "<div class=\"lead\"><p>We’re Fly.io and we transmute containers into VMs, running them on our hardware around the world with the power of Firecracker alchemy. We do a lot of stuff with WireGuard, which has become a part of our customer API. This is a quick story about some tricks we played to make WireGuard faster and more scalable for the hundreds of thousands of people who now use it here.</p>\n</div>\n<p>One of many odd decisions we’ve made at Fly.io is how we use WireGuard. It’s not just that we use it in many places where other shops would use HTTPS and REST APIs. We’ve gone a step beyond that: every time you run <code>flyctl</code>, our lovable, sprawling CLI, it conjures a TCP/IP stack out of thin air, with its own IPv6 address, and speaks directly to Fly Machines running on our networks.</p>\n\n<p>There are plusses and minuses to this approach, which we talked about <a href='https://fly.io/blog/our-user-mode-wireguard-year/' title=''>in a blog post a couple years back</a>. Some things, like remote-operated Docker builders, get easier to express (a Fly Machine, as far as <code>flyctl</code> is concerned, might as well be on the same LAN). But everything generally gets trickier to keep running reliably.</p>\n\n<p>It was a decision. We own it.</p>\n\n<p>Anyways, we’ve made some improvements recently, and I’d like to talk about them.</p>\n<h2 id='where-we-left-off' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#where-we-left-off' aria-label='Anchor'></a><span class='plain-code'>Where we left off</span></h2>\n<p>Until a few weeks ago, our gateways ran on a pretty simple system.</p>\n\n<ol>\n<li>We operate dozens of “gateway” servers around the world, whose sole purpose is to accept incoming WireGuard connections and connect them to the appropriate private networks.\n</li><li>Any time you run <code>flyctl</code> and it needs to talk to a Fly Machine (to build a container, pop an SSH console, copy files, or proxy to a service you’re running), it spawns or connects to a background agent process.\n</li><li>The first time it runs, the agent generates a new WireGuard peer configuration from our GraphQL API. WireGuard peer configurations are very simple: just a public key and an address to connect to.\n</li><li>Our API in turn takes that peer configuration and sends it to the appropriate gateway (say, <code>ord</code>, if you’re near Chicago) via an RPC we send over the NATS messaging system.\n</li><li>On the gateway, a service called <code>wggwd</code> accepts that configuration, saves it to a SQLite database, and adds it to the kernel using WireGuard’s Golang libraries. <code>wggwd</code> acknowledges the installation of the peer to the API.\n</li><li>The API replies to your GraphQL request, with the configuration.\n</li><li>Your <code>flyctl</code> connects to the WireGuard peer, which works, because you receiving the configuration means it’s installed on the gateway.\n</li></ol>\n\n<p>I copy-pasted those last two bullet points from <a href='https://fly.io/blog/our-user-mode-wireguard-year/' title=''>that two-year-old post</a>, because when it works, it does <em>just work</em> reasonably well. (We ultimately did end up defaulting everybody to WireGuard-over-WebSockets, though.)</p>\n\n<p>But if it always worked, we wouldn’t be here, would we?</p>\n\n<p>We ran into two annoying problems:</p>\n\n<p>One: NATS is fast, but doesn’t guarantee delivery. Back in 2022, Fly.io was pretty big on NATS internally. We’ve moved away from it. For instance, our <a href='https://fly.io/blog/carving-the-scheduler-out-of-our-orchestrator/' title=''>internal <code>flyd</code> API</a> used to be driven by NATS; today, it’s HTTP.  Our NATS cluster was losing too many messages to host a reliable API on it. Scaling back our use of NATS made WireGuard gateways better, but still not great.</p>\n\n<p>Two: When <code>flyctl</code> exits, the WireGuard peer it created sticks around on the gateway. Nothing cleans up old peers. After all, you’re likely going to come back tomorrow and deploy a new version of your app, or <code>fly ssh console</code> into it to debug something. Why remove a peer just to re-add it the next day?</p>\n\n<p>Unfortunately, the vast majority of peers are created by <code>flyctl</code> in CI jobs, which don’t have persistent storage and can’t reconnect to the same peer the next run; they generate new peers every time, no matter what.</p>\n\n<p>So, we ended up with a not-reliable-enough provisioning system, and gateways with hundreds of thousands of peers that will never be used again. The high stale peer count made kernel WireGuard operations very slow - especially loading all the peers back into the kernel after a gateway server reboot - as well as some kernel panics.</p>\n\n<p>There had to be</p>\n<h2 id='a-better-way' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#a-better-way' aria-label='Anchor'></a><span class='plain-code'>A better way.</span></h2>\n<p>Storing bajillions of WireGuard peers is no big challenge for any serious n-tier RDBMS. This isn’t “big data”. The problem we have at Fly.io is that our gateways don’t have serious n-tier RDBMSs. They’re small. Scrappy. They live off the land.</p>\n\n<p>Seriously, though: you could store every WireGuard peer everybody has ever used at Fly.io in a single SQLite database, easily.  What you can’t do is store them all in the Linux kernel.</p>\n\n<p>So, at some point, as you push more and more peer configurations to a gateway, you have to start making decisions about which peers you’ll enable in the kernel, and which you won’t.</p>\n\n<p>Wouldn’t it be nice if we just didn’t have this problem? What if, instead of pushing configs to gateways, we had the gateways pull them from our API on demand?</p>\n\n<p>If you did that, peers would only have to be added to the kernel when the client wanted to connect. You could yeet them out of the kernel any time you wanted; the next time the client connected, they’d just get pulled again, and everything would work fine.</p>\n\n<p>The problem you quickly run into to build this design is that Linux kernel WireGuard doesn’t have a feature for installing peers on demand. However:</p>\n<h2 id='it-is-possible-to-jit-wireguard-peers' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#it-is-possible-to-jit-wireguard-peers' aria-label='Anchor'></a><span class='plain-code'>It is possible to JIT WireGuard peers</span></h2>\n<p>The Linux kernel’s <a href='https://github.com/WireGuard/wgctrl-go' title=''>interface for configuring WireGuard</a> is <a href='https://docs.kernel.org/userspace-api/netlink/intro.html' title=''>Netlink</a> (which is basically a way to create a userland socket to talk to a kernel service). Here’s a <a href='https://github.com/WireGuard/wg-dynamic/blob/master/netlink.h' title=''>summary of it as a C API</a>. Note that there’s no API call to subscribe for “incoming connection attempt” events.</p>\n\n<p>That’s OK! We can just make our own events. WireGuard connection requests are packets, and they’re easily identifiable, so we can efficiently snatch them with a BPF filter and a <a href='https://github.com/google/gopacket' title=''>packet socket</a>.</p>\n<div class=\"callout\"><p>Most of the time, it’s even easier for us to get the raw WireGuard packets, because our users now default to WebSockets WireGuard (which is just an unauthenticated WebSockets connect that shuttles framed UDP packets to and from an interface on the gateway), so that people who have trouble talking end-to-end in UDP can bring connections up.</p>\n</div>\n<p>We own the daemon code for that, and can just hook the packet receive function to snarf WireGuard packets.</p>\n\n<p>It’s not obvious, but WireGuard doesn’t have notions of “client” or “server”. It’s a pure point-to-point protocol; peers connect to each other when they have traffic to send. The first peer to connect is called the <strong class='font-semibold text-navy-950'>initiator</strong>, and the peer it connects to is the <strong class='font-semibold text-navy-950'>responder</strong>.</p>\n<div class=\"right-sidenote\"><p><a href=\"https://www.wireguard.com/papers/wireguard.pdf\" title=\"\"><em>The WireGuard paper</em></a> <em>is a good read.</em></p>\n</div>\n<p>For Fly.io, <code>flyctl</code> is typically our initiator,  sending a single UDP packet to the gateway, which is the responder. According <a href='https://www.wireguard.com/papers/wireguard.pdf' title=''>to the WireGuard paper</a>, this first packet is a <code>handshake initiation</code>.  It gets better: the packet type is recorded in a single plaintext byte. So this simple BPF filter catches all the incoming connections: <code>udp and dst port 51820 and udp[8] = 1</code>.</p>\n\n<p>In most other protocols, we’d be done at this point; we’d just scrape the username or whatnot out of the packet, go fetch the matching configuration, and install it in the kernel. With WireGuard, not so fast. WireGuard is based on Trevor Perrin’s <a href='http://www.noiseprotocol.org/' title=''>Noise Protocol Framework</a>, and Noise goes way out of its way to <a href='http://www.noiseprotocol.org/noise.html#identity-hiding' title=''>hide identities</a> during handshakes. To identify incoming requests, we’ll need to run enough Noise cryptography to decrypt the identity.</p>\n\n<p>The code to do this is fussy, but it’s relatively short (about 200 lines). Helpfully, the kernel Netlink interface will give a privileged process the private key for an interface, so the secrets we need to unwrap WireGuard are easy to get. Then it’s just a matter of running the first bit of the Noise handshake. If you’re that kind of nerdy, <a href='https://gist.github.com/tqbf/9f2c2852e976e6566f962d9bca83062b' title=''>here’s the code.</a></p>\n\n<p>At this point, we have the event feed we wanted: the public keys of every user trying to make a WireGuard connection to our gateways. We keep a rate-limited cache in SQLite, and when we see new peers, we’ll make an internal HTTP API request to fetch the matching peer information and install it. This fits nicely into the little daemon that already runs on our gateways to manage WireGuard, and allows us to ruthlessly and recklessly remove stale peers with a <code>cron</code> job.</p>\n\n<p>But wait! There’s more! We bounced this plan off Jason Donenfeld, and he tipped us off on a sneaky feature of the Linux WireGuard Netlink interface.</p>\n<div class=\"right-sidenote\"><p>Jason is the hardest working person in show business.</p>\n</div>\n<p>Our API fetch for new peers is generally not going to be fast enough to respond to the first handshake initiation message a new client sends us. That’s OK; WireGuard is pretty fast about retrying. But we can do better.</p>\n\n<p>When we get an incoming initiation message, we have the 4-tuple address of the desired connection, including the ephemeral source port <code>flyctl</code> is using. We can install the peer as if we’re the initiator, and <code>flyctl</code> is the responder. The Linux kernel will initiate a WireGuard connection back to <code>flyctl</code>. This works; the protocol doesn’t care a whole lot who’s the server and who’s the client. We get new connections established about as fast as they can possibly be installed.</p>\n<figure class=\"post-cta\">\n  <figcaption>\n    <h1>Launch an app in minutes</h1>\n    <p>Speedrun an app onto Fly.io and get your own JIT WireGuard peer ✨</p>\n      <a class=\"btn btn-lg\" href=\"/docs/speedrun/\">\n        Speedrun  <span class='opacity-50'>→</span>\n      </a>\n  </figcaption>\n  <div class=\"image-container\">\n    <img src=\"/static/images/cta-cat.webp\" srcset=\"/static/images/[email protected] 2x\" alt=\"\">\n  </div>\n</figure>\n\n<h2 id='look-at-this-graph' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#look-at-this-graph' aria-label='Anchor'></a><span class='plain-code'>Look at this graph</span></h2>\n<p>We’ve been running this in production for a few weeks and we’re feeling pretty happy about it. We went from thousands, or hundreds of thousands, of stale WireGuard peers on a gateway to what rounds to none. Gateways now hold a lot less state, are faster at setting up peers, and can be rebooted without having to wait for many unused peers to be loaded back into the kernel.</p>\n\n<p>I’ll leave you with this happy Grafana chart from the day of the switchover.</p>\n\n<p><img alt=\"a Grafana chart of 'kernel_stale_wg_peer_count' vs. time. For the first few hours, all traces are flat. Most are at values between 0 and 50,000 and the top-most is just under 550,000. Towards the end of the graph, each line in turn jumps sharply down to the bottom, and at the end of the chart all datapoints are indistinguishable from 0.\" src=\"/blog/jit-wireguard-peers/assets/wireguard-peers-graph.webp\" /></p>\n\n<p><strong class='font-semibold text-navy-950'>Editor’s note:</strong> Despite our tearful protests, Lillian has decided to move on from Fly.io to explore new pursuits. We wish her much success and happiness! ✨</p>",
      "image": {
        "url": "https://fly.io/blog/jit-wireguard-peers/assets/network-thumbnail.webp",
        "title": null
      },
      "media": [],
      "authors": [
        {
          "name": "Fly",
          "email": null,
          "url": null
        }
      ],
      "categories": []
    },
    {
      "id": "https://fly.io/blog/fks-beta-live/",
      "title": "Fly Kubernetes does more now",
      "description": null,
      "url": "https://fly.io/blog/fks-beta-live/",
      "published": "2024-03-07T00:00:00.000Z",
      "updated": "2025-05-20T16:26:25.000Z",
      "content": "<div class=\"lead\"><p>Eons ago, we <a href=\"https://fly.io/blog/fks/\" title=\"\">announced</a> we were working on <a href=\"https://fly.io/docs/kubernetes/\" title=\"\">Fly Kubernetes</a>. It drummed up enough excitement to prove we were heading in the right direction. So, we got hard to work to get from barebones “early access” to a beta release. We’ll be onboarding customers to the closed beta over the next few weeks. Email us at <a href=\"mailto:[email protected]\">[email protected]</a> and we’ll hook you up.</p>\n</div>\n<p>Fly Kubernetes is the “blessed path\"™️ to using Kubernetes backed by Fly.io infrastructure. Or, in simpler terms, it is our managed Kubernetes service. We take care of the complexity of operating the Kubernetes control plane, leaving you with the unfettered joy of deploying your Kubernetes workloads. If you love Fly.io and K8s, this product is for you.</p>\n<h2 id='what-even-is-a-kubernete' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#what-even-is-a-kubernete' aria-label='Anchor'></a><span class='plain-code'>What even is a Kubernete?</span></h2>\n<p>So how did this all come to be—and what even is a Kubernete?</p>\n<div class=\"right-sidenote\"><p>You can see more fun details in <a href=\"https://fly.io/blog/fks/\" title=\"\">Introducing Fly Kubernetes</a>.</p>\n</div>\n<p>If you wade through all the YAML and <a href='https://landscape.cncf.io/' title=''>CNCF projects</a>, what’s left is an API for declaring workloads and how it should be accessed. </p>\n\n<p>But that’s not what people usually talk / groan about. It’s everything else that comes along with adopting Kubernetes: a container runtime (CRI), networking between workloads (CNI) which leads to DNS (CoreDNS). Then you layer on Prometheus for metrics and whatever the logging daemon du jour is at the time. Now you get to debate which Ingress—strike that—<em>Gateway</em> API to deploy and if the next thing is anything to do with a Service Mess, then as they like to say where I live, \"bless your heart”.</p>\n\n<p>Finally, there’s capacity planning. You’ve got to pick and choose where, how and what the <a href='https://kubernetes.io/docs/concepts/architecture/nodes/' title=''>Nodes</a> will look like in order to configure and run the workloads.</p>\n\n<p>When we began thinking about what a Fly Kubernetes Service could look like, we started from first principles, as we do with most everything here. The best way we can describe it is the <a href='https://www.youtube.com/watch?v=Ddk9ci6geSs' title=''>scene from Iron Man 2 when Tony Stark discovers a new element</a>. As he’s looking at the knowledge left behind by those that came before, he starts to imagine something entirely different and more capable than could have been accomplished previously. That’s what happened to JP, but with K3s and Virtual Kubelet.</p>\n<h2 id='ok-then-wtf-whats-the-fks' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#ok-then-wtf-whats-the-fks' aria-label='Anchor'></a><span class='plain-code'>OK then, WTF (what’s the FKS)?</span></h2>\n<p>We looked at what people need to get started—the API—and then started peeling away all the noise, filling in the gaps to connect things together to provide the power. Here’s how this looks currently:</p>\n\n<ul>\n<li>Containerd/CRI → <a href='https://fly.io/blog/carving-the-scheduler-out-of-our-orchestrator/' title=''>flyd</a> + Firecracker + <a href='https://fly.io/blog/docker-without-docker/' title=''>our init</a>: our system transmogrifies Docker containers into Firecracker microVMs\n</li><li>Networking/CNI → Our <a href='https://fly.io/blog/ipv6-wireguard-peering/' title=''>internal WireGuard mesh</a> connects your pods together\n</li><li>Pods → Fly Machines VMs\n</li><li>Secrets → Secrets, only not the base64’d kind\n</li><li>Services → The Fly Proxy\n</li><li>CoreDNS → CoreDNS (to be replaced with our custom internal DNS)\n</li><li>Persistent Volumes → Fly Volumes (coming soon)\n</li></ul>\n\n<p>Now…not everything is a one-to-one comparison, and we explicitly did not set out to support any and every configuration. We aren’t dealing with resources like Network Policy and init containers, though we’re also not completely ignoring them. By mapping many of the core primitives of Kubernetes to a Fly.io resource, we’re able to focus on continuing to build the primitives that make our cloud better for workloads of all shapes and sizes.</p>\n\n<p>A key thing to notice above is that there’s no “Node”.</p>\n\n<p><a href='https://virtual-kubelet.io/' title=''>Virtual Kubelet</a> plays a central role in FKS. It’s magic, really. A Virtual Kubelet acts as if it’s a standard Kubelet running on a Node, eager to run your workloads. However, there’s no Node backing it. It instead behaves like an API, receiving requests from Kubernetes and transforming them into requests to deploy on a cloud compute service. In our case, that’s Fly Machines.</p>\n\n<p>So what we have is Kubernetes calling out to our <a href='https://virtual-kubelet.io/docs/providers/' title=''>Virtual Kubelet provider</a>, a small Golang program we run alongside K3s, to create and run your pod. It creates <a href='https://fly.io/blog/docker-without-docker/' title=''>your pod as a Fly Machine</a>, via the <a href='/docs/machines/api/' title=''>Fly Machines API</a>, deploying it to any underlying host within that region. This shifts the burden of managing hardware capacity from you to us. We think that’s a cool trick—thanks, Virtual Kubelet magic!</p>\n<h2 id='speedrun' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#speedrun' aria-label='Anchor'></a><span class='plain-code'>Speedrun</span></h2>\n<p>You can deploy your workloads (including GPUs) across any of our available regions using the Kubernetes API.</p>\n\n<p>You create a cluster with <code>flyctl</code>:</p>\n<div class=\"highlight-wrapper group relative cmd\">\n  <button \n    type=\"button\"\n    class=\"bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n    data-wrap-target=\"#code-ly2t5r5g\"\n  >\n    <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\" stroke-linecap=\"round\" stroke-linejoin=\"round\"><g buffered-rendering=\"static\"><path d=\"M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959\" /><path d=\"M11.081 6.466L9.533 8.037l1.548 1.571\" /></g></svg>\n    <span class=\"bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950\">\n      Wrap text\n    </span>\n  </button>\n  <button \n    type=\"button\"\n    class=\"bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n    data-copy-target=\"sibling\"\n  >\n    <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\"><g buffered-rendering=\"static\"><path d=\"M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z\" /><path d=\"M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617\" /></g></svg>\n    <span class=\"bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950\">\n      Copy to clipboard\n    </span>\n  </button>\n  <div class='highlight relative group'>\n    <pre class='highlight cmd'><code id=\"code-ly2t5r5g\">fly ext k8s create --name hello --org personal --region iad\n</code></pre>\n  </div>\n</div>\n<p>When a cluster is created, it has the standard <code>default</code> namespace. You can inspect it:</p>\n<div class=\"highlight-wrapper group relative cmd\">\n  <button \n    type=\"button\"\n    class=\"bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n    data-wrap-target=\"#code-6wmi0e1f\"\n  >\n    <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\" stroke-linecap=\"round\" stroke-linejoin=\"round\"><g buffered-rendering=\"static\"><path d=\"M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959\" /><path d=\"M11.081 6.466L9.533 8.037l1.548 1.571\" /></g></svg>\n    <span class=\"bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950\">\n      Wrap text\n    </span>\n  </button>\n  <button \n    type=\"button\"\n    class=\"bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n    data-copy-target=\"sibling\"\n  >\n    <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\"><g buffered-rendering=\"static\"><path d=\"M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z\" /><path d=\"M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617\" /></g></svg>\n    <span class=\"bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950\">\n      Copy to clipboard\n    </span>\n  </button>\n  <div class='highlight relative group'>\n    <pre class='highlight cmd'><code id=\"code-6wmi0e1f\">kubectl get ns default --show-labels\n</code></pre>\n  </div>\n</div><div class=\"highlight-wrapper group relative output\">\n  <button \n    type=\"button\"\n    class=\"bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n    data-wrap-target=\"#code-vwi22w41\"\n  >\n    <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\" stroke-linecap=\"round\" stroke-linejoin=\"round\"><g buffered-rendering=\"static\"><path d=\"M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959\" /><path d=\"M11.081 6.466L9.533 8.037l1.548 1.571\" /></g></svg>\n    <span class=\"bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950\">\n      Wrap text\n    </span>\n  </button>\n  <button \n    type=\"button\"\n    class=\"bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n    data-copy-target=\"sibling\"\n  >\n    <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\"><g buffered-rendering=\"static\"><path d=\"M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z\" /><path d=\"M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617\" /></g></svg>\n    <span class=\"bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950\">\n      Copy to clipboard\n    </span>\n  </button>\n  <div class='highlight relative group'>\n    <pre class='highlight output whitespace-pre'><code id=\"code-vwi22w41\">NAME      STATUS   AGE   LABELS\ndefault   Active   20d   fly.io/app=fks-default-7zyjm3ovpdxmd0ep,kubernetes.io/metadata.name=default\n</code></pre>\n  </div>\n</div>\n<p>The <code>fly.io/app</code> label shows the name of the Fly App that corresponds to your cluster.</p>\n\n<p>It would seem appropriate to deploy the <a href='https://github.com/kubernetes-up-and-running/kuard' title=''>Kubernetes Up And Running demo</a> here, but since your pods are connected over an <a href='https://fly.io/blog/ipv6-wireguard-peering/' title=''>IPv6 WireGuard mesh</a>, we’re going to use a <a href='https://github.com/jipperinbham/kuard' title=''>fork</a> with support for <a href='https://github.com/kubernetes-up-and-running/kuard/issues/46' title=''>IPv6 DNS</a>.</p>\n<div class=\"highlight-wrapper group relative cmd\">\n  <button \n    type=\"button\"\n    class=\"bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n    data-wrap-target=\"#code-1t0a2uma\"\n  >\n    <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\" stroke-linecap=\"round\" stroke-linejoin=\"round\"><g buffered-rendering=\"static\"><path d=\"M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959\" /><path d=\"M11.081 6.466L9.533 8.037l1.548 1.571\" /></g></svg>\n    <span class=\"bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950\">\n      Wrap text\n    </span>\n  </button>\n  <button \n    type=\"button\"\n    class=\"bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n    data-copy-target=\"sibling\"\n  >\n    <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\"><g buffered-rendering=\"static\"><path d=\"M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z\" /><path d=\"M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617\" /></g></svg>\n    <span class=\"bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950\">\n      Copy to clipboard\n    </span>\n  </button>\n  <div class='highlight relative group'>\n    <pre class='highlight cmd'><code id=\"code-1t0a2uma\">kubectl run \\\n  --image=ghcr.io/jipperinbham/kuard-amd64:blue \\\n  --labels=\"app=kuard-fks\" \\\n  kuard\n</code></pre>\n  </div>\n</div>\n<p>And you can see its Machine representation via:</p>\n<div class=\"highlight-wrapper group relative cmd\">\n  <button \n    type=\"button\"\n    class=\"bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n    data-wrap-target=\"#code-leb9l9a3\"\n  >\n    <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\" stroke-linecap=\"round\" stroke-linejoin=\"round\"><g buffered-rendering=\"static\"><path d=\"M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959\" /><path d=\"M11.081 6.466L9.533 8.037l1.548 1.571\" /></g></svg>\n    <span class=\"bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950\">\n      Wrap text\n    </span>\n  </button>\n  <button \n    type=\"button\"\n    class=\"bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n    data-copy-target=\"sibling\"\n  >\n    <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\"><g buffered-rendering=\"static\"><path d=\"M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z\" /><path d=\"M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617\" /></g></svg>\n    <span class=\"bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950\">\n      Copy to clipboard\n    </span>\n  </button>\n  <div class='highlight relative group'>\n    <pre class='highlight cmd'><code id=\"code-leb9l9a3\">fly machine list --app fks-default-7zyjm3ovpdxmd0ep\n</code></pre>\n  </div>\n</div><div class=\"highlight-wrapper group relative output\">\n  <button \n    type=\"button\"\n    class=\"bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n    data-wrap-target=\"#code-w2frd6os\"\n  >\n    <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\" stroke-linecap=\"round\" stroke-linejoin=\"round\"><g buffered-rendering=\"static\"><path d=\"M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959\" /><path d=\"M11.081 6.466L9.533 8.037l1.548 1.571\" /></g></svg>\n    <span class=\"bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950\">\n      Wrap text\n    </span>\n  </button>\n  <button \n    type=\"button\"\n    class=\"bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n    data-copy-target=\"sibling\"\n  >\n    <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\"><g buffered-rendering=\"static\"><path d=\"M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z\" /><path d=\"M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617\" /></g></svg>\n    <span class=\"bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950\">\n      Copy to clipboard\n    </span>\n  </button>\n  <div class='highlight relative group'>\n    <pre class='highlight output whitespace-pre'><code id=\"code-w2frd6os\">ID              NAME        STATE   REGION  IMAGE                           IP ADDRESS                      VOLUME  CREATED                 LAST UPDATED            APP PLATFORM    PROCESS GROUP   SIZE\n1852291c46ded8  kuard       started iad     jipperinbham/kuard-amd64:blue   fdaa:0:48c8:a7b:228:4b6d:6e20:2         2024-03-05T18:54:41Z    2024-03-05T18:54:44Z                                    shared-cpu-1x:256MB\n</code></pre>\n  </div>\n</div>\n<p></div></p>\n\n<p>This is important! Your pod is a Fly Machine! While we don’t yet support all kubectl features, Fly.io tooling will “just work” for cases where we don’t yet support the kubectl way. So, for example, we don’t have <code>kubectl port-forward</code> and <code>kubectl exec</code>, but you can use flyctl to forward ports and get a shell into a pod.</p>\n\n<p>Expose it to your internal network using the standard ClusterIP Service:</p>\n<div class=\"highlight-wrapper group relative cmd\">\n  <button \n    type=\"button\"\n    class=\"bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n    data-wrap-target=\"#code-1xzi76dc\"\n  >\n    <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\" stroke-linecap=\"round\" stroke-linejoin=\"round\"><g buffered-rendering=\"static\"><path d=\"M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959\" /><path d=\"M11.081 6.466L9.533 8.037l1.548 1.571\" /></g></svg>\n    <span class=\"bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950\">\n      Wrap text\n    </span>\n  </button>\n  <button \n    type=\"button\"\n    class=\"bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n    data-copy-target=\"sibling\"\n  >\n    <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\"><g buffered-rendering=\"static\"><path d=\"M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z\" /><path d=\"M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617\" /></g></svg>\n    <span class=\"bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950\">\n      Copy to clipboard\n    </span>\n  </button>\n  <div class='highlight relative group'>\n    <pre class='highlight cmd'><code id=\"code-1xzi76dc\">kubectl expose pod kuard \\\n  --name=kuard \\\n  --port=8080 \\\n  --target-port=8080 \\\n  --selector='app=kuard-fks'\n</code></pre>\n  </div>\n</div>\n<p>ClusterIP Services work natively, and Fly.io internal DNS supports them. Within the cluster, CoreDNS works too.</p>\n\n<p>Access this Service locally via <a href='https://fly.io/docs/networking/private-networking/#flycast-private-load-balancing' title=''>flycast</a>: Get connected to your org’s <a href='https://fly.io/docs/networking/private-networking/' title=''>6PN private WireGuard network</a>. Get kubectl to describe the <code>kuard</code> Service:</p>\n<div class=\"highlight-wrapper group relative cmd\">\n  <button \n    type=\"button\"\n    class=\"bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n    data-wrap-target=\"#code-ucdw0m7q\"\n  >\n    <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\" stroke-linecap=\"round\" stroke-linejoin=\"round\"><g buffered-rendering=\"static\"><path d=\"M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959\" /><path d=\"M11.081 6.466L9.533 8.037l1.548 1.571\" /></g></svg>\n    <span class=\"bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950\">\n      Wrap text\n    </span>\n  </button>\n  <button \n    type=\"button\"\n    class=\"bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n    data-copy-target=\"sibling\"\n  >\n    <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\"><g buffered-rendering=\"static\"><path d=\"M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z\" /><path d=\"M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617\" /></g></svg>\n    <span class=\"bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950\">\n      Copy to clipboard\n    </span>\n  </button>\n  <div class='highlight relative group'>\n    <pre class='highlight cmd'><code id=\"code-ucdw0m7q\">kubectl describe svc kuard\n</code></pre>\n  </div>\n</div><div class=\"highlight-wrapper group relative output\">\n  <button \n    type=\"button\"\n    class=\"bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n    data-wrap-target=\"#code-1uyr9m4j\"\n  >\n    <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\" stroke-linecap=\"round\" stroke-linejoin=\"round\"><g buffered-rendering=\"static\"><path d=\"M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959\" /><path d=\"M11.081 6.466L9.533 8.037l1.548 1.571\" /></g></svg>\n    <span class=\"bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950\">\n      Wrap text\n    </span>\n  </button>\n  <button \n    type=\"button\"\n    class=\"bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n    data-copy-target=\"sibling\"\n  >\n    <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\"><g buffered-rendering=\"static\"><path d=\"M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z\" /><path d=\"M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617\" /></g></svg>\n    <span class=\"bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950\">\n      Copy to clipboard\n    </span>\n  </button>\n  <div class='highlight relative group'>\n    <pre class='highlight output'><code id=\"code-1uyr9m4j\">Name:              kuard\nNamespace:         default\nLabels:            app=kuard-fks\nAnnotations:       fly.io/clusterip-allocator: configured\n                   service.fly.io/sync-version: 11507529969321451315\nSelector:          app=kuard-fks\nType:              ClusterIP\nIP Family Policy:  SingleStack\nIP Families:       IPv6\nIP:                fdaa:0:48c8:0:1::1a\nIPs:               fdaa:0:48c8:0:1::1a\nPort:              <unset>  8080/TCP\nTargetPort:        8080/TCP\nEndpoints:         [fdaa:0:48c8:a7b:228:4b6d:6e20:2]:8080\nSession Affinity:  None\nEvents:            <none>\n</code></pre>\n  </div>\n</div>\n<p>You can pull out the Service’s IP address from the above output, and get at the KUARD UI using that: in this case, <code>http://[fdaa:0:48c8:0:1::1a]:8080</code>. </p>\n\n<p>Using internal DNS: <code>http://<service_name>.svc.<app_name>.flycast:8080</code>. Or, in our example: <code>http://kuard.svc.fks-default-7zyjm3ovpdxmd0ep.flycast:8080</code>.</p>\n\n<p>And finally CoreDNS: <code><service_name>.<namespace>.svc.cluster.local</code> resolves to the <code>fdaa</code> IP and is routable within the cluster.</p>\n<figure class=\"post-cta\">\n  <figcaption>\n    <h1>Get in on the FKS beta</h1>\n    <p>Email us at [email protected]</p>\n  </figcaption>\n  <div class=\"image-container\">\n    <img src=\"/static/images/cta-cat.webp\" srcset=\"/static/images/[email protected] 2x\" alt=\"\">\n  </div>\n</figure>\n\n<h2 id='pricing' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#pricing' aria-label='Anchor'></a><span class='plain-code'>Pricing</span></h2>\n<p>The Fly Kubernetes Service is free during the beta. Fly Machines and Fly Volumes you create with it will cost the <a href='https://fly.io/docs/about/pricing/' title=''>same as for your other Fly.io projects</a>. It’ll be <a href='https://fly.io/docs/about/pricing/#fly-kubernetes' title=''>$75/mo per cluster</a> after that, plus the cost of the other resources you create.</p>\n<h2 id='today-and-the-future' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#today-and-the-future' aria-label='Anchor'></a><span class='plain-code'>Today and the future</span></h2>\n<p>Today, Fly Kubernetes supports only a portion of the Kubernetes API. You can deploy pods using Deployments/ReplicaSets. Pods are able to communicate via Services using the standard K8s DNS format. Ephemeral and persistent volumes are supported.</p>\n\n<p>The most notable absences are: multi-container pods, StatefulSets, network policies, horizontal pod autoscaling and emptyDir volumes. We’re working at supporting autoscaling and emptyDir volumes in the coming weeks and multi-container pods in the coming months.</p>\n\n<p>If you’ve made it this far and are eagerly awaiting your chance to tell us and the rest of the internet “this isn’t Kubernetes!”, well, we agree! It’s not something we take lightly. We’re still building, and conformance tests may be in the future for FKS. We’ve made a deliberate decision to only care about fast launching VMs as the one and only way to run workloads on our cloud. And we also know enough of our customers would like to use the Kubernetes API to create a fast launching VM in the form of a Pod, and that’s where this story begins. </p>",
      "image": {
        "url": "https://fly.io/blog/fks-beta-live/assets/fks-thumb.webp",
        "title": null
      },
      "media": [],
      "authors": [
        {
          "name": "Fly",
          "email": null,
          "url": null
        }
      ],
      "categories": []
    },
    {
      "id": "https://fly.io/blog/tigris-public-beta/",
      "title": "Globally Distributed Object Storage with Tigris",
      "description": null,
      "url": "https://fly.io/blog/tigris-public-beta/",
      "published": "2024-02-15T00:00:00.000Z",
      "updated": "2025-05-20T16:26:25.000Z",
      "content": "<div class=\"lead\"><p>We’re Fly.io and we transmute containers into VMs, running them on our hardware around the world with the power of Firecracker alchemy. That’s pretty cool, but we want to talk about something someone else built, that <a href=\"https://fly.io/docs/reference/tigris/\" title=\"\">you can use today</a> to build applications.</p>\n</div>\n<p>There are three hard things in computer science:</p>\n\n<ol>\n<li>Cache invalidation\n</li><li>Naming things\n</li><li><a href='https://aws.amazon.com/s3/' title=''>Doing a better job than Amazon of storing files</a>\n</li></ol>\n\n<p>Of all the annoying software problems that have no business being annoying, handling a file upload in a full-stack application stands apart, a universal if tractable malady, the plantar fasciitis of programming.</p>\n\n<p>Now, the actual act of clients placing files on servers is straightforward. Your framework <a href='https://hexdocs.pm/phoenix/file_uploads.html' title=''>has</a> <a href='https://edgeguides.rubyonrails.org/active_storage_overview.html' title=''>a</a> <a href='https://docs.djangoproject.com/en/5.0/topics/http/file-uploads/' title=''>feature</a> <a href='https://expressjs.com/en/resources/middleware/multer.html' title=''>that</a> <a href='https://github.com/yesodweb/yesod-cookbook/blob/master/cookbook/Cookbook-file-upload-saving-files-to-server.md' title=''>does</a> <a href='https://laravel.com/docs/10.x/filesystem' title=''>it</a>. What’s hard is making sure that uploads stick around to be downloaded later.</p>\n<aside class=\"right-sidenote\"><p>(yes, yes, we know, <a href=\"https://youtu.be/b2F-DItXtZs?t=102\" title=\"\">sharding /dev/null</a> is faster)</p>\n</aside>\n<p>Enter object storage, a pattern you may know by its colloquial name “S3”. Object storage occupies a funny place in software architecture, somewhere between a database and a filesystem. It’s like <a href='https://man7.org/linux/man-pages/man3/malloc.3.html' title=''><code>malloc</code></a><code>()</code>, but for cloud storage instead of program memory.</p>\n\n<p><a href='https://www.kleenex.com/en-us/' title=''>S3</a>—err, object storage — is so important that it was the second AWS service ever introduced (EC2 was not the first!). Everybody wants it. We know, because they keep asking us for it.</p>\n\n<p>So why didn’t we build it?</p>\n\n<p>Because we couldn’t figure out a way to improve on S3. And we still haven’t! But someone else did, at least for the kinds of applications we see on Fly.io.</p>\n<h2 id='but-first-some-back-story' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#but-first-some-back-story' aria-label='Anchor'></a><span class='plain-code'>But First, Some Back Story</span></h2>\n<p>S3 checks all the boxes. It’s trivial to use. It’s efficient and cost-effective. It has redundancies that would make a DoD contractor blush. It integrates with archival services like Glacier. And every framework supports it. At some point, the IETF should just take a deep sigh and write an S3 API RFC, XML signatures and all.</p>\n\n<p>There’s at least one catch, though.</p>\n\n<p>Back in, like, ‘07 people ran all their apps from a single city. S3 was designed to work for those kinds of apps. The data, the bytes on the disks (or whatever weird hyperputer AWS stores S3 bytes on), live in one place. A specific place. In a specific data center. As powerful and inspiring as The Architects are, they are mortals, and must obey the laws of physics.</p>\n\n<p>This observation feels banal, until you realize how apps have changed in the last decade. Apps and their users don’t live in one specific place. They live all over the world. When users are close to the S3 data center, things are amazing! But things get less amazing the further away you get from the data center, and even less amazing the smaller and more frequent your reads and writes are.</p>\n\n<p>(Thought experiment: you have to pick one place in the world to route all your file storage. Where is it? Is it <a href='https://www.tripadvisor.com/Restaurant_Review-g30246-d1956555-Reviews-Ford_s_Fish_Shack_Ashburn-Ashburn_Loudoun_County_Virginia.html' title=''>Loudoun County, Virginia</a>?)</p>\n\n<p>So, for many modern apps, you end up having to <a href='https://stackoverflow.com/questions/32426249/aws-s3-bucket-with-multiple-regions' title=''>write things into different regions</a>, so that people close to the data get it from a region-specific bucket. Doing that pulls in CDN-caching things that complicated your application and put barriers between you and your data. Before you know it, you’re wearing custom orthotics on your, uh, developer feet. (<em>I am done with this metaphor now, I promise.</em>)</p>\n<aside class=\"right-sidenote\"><p>(well, okay, Backblaze B2 because somehow my bucket fits into their free tier, but you get the idea)</p>\n</aside>\n<p>Personally, I know this happens. Because I had to build one! I run a <a href='https://xeiaso.net/blog/xedn/' title=''>CDN backend</a> that’s a caching proxy for S3 in six continents across the world. All so that I can deliver images and video efficiently for the readers of my blog.</p>\n<aside class=\"right-sidenote\"><p>(shut up, it’s a sandwich)</p>\n</aside>\n<p>What if data was really global? For some applications, it might not matter much. But for others, it matters a lot. When a sandwich lover in Australia snaps a picture of a <a href='https://en.wikipedia.org/wiki/Hamdog' title=''>hamdog</a>, the people most likely to want to see that photo are also in Australia. Routing those uploads and downloads through one building in Ashburn is no way to build a sandwich reviewing empire.</p>\n\n<p>Localizing all the data sounds like a hard problem. What if you didn’t need to change anything on your end to accomplish it?</p>\n<h2 id='show-me-a-hero' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#show-me-a-hero' aria-label='Anchor'></a><span class='plain-code'>Show Me A Hero</span></h2>\n<p>Building a miniature CDN infrastructure just to handle file uploads seems like the kind of thing that could take a week or so of tinkering. The Fly.io unified theory of cloud development is that solutions are completely viable for full-stack developers only when they take less than 2 hours to get working.</p>\n\n<p>AWS agrees, which is why they have a SKU for it, <a href='https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/cloudfront_distribution' title=''>called Cloudfront</a>, which will, at some variably metered expense, optimize the read side of a single-write-region bucket: they’ll set up <a href='https://fly.io/blog/the-5-hour-content-delivery-network/' title=''>a simple caching CDN</a> for you. You can probably get S3 and Cloudfront working within 2 hours, especially if you’ve set it up before.</p>\n\n<p>Our friends at Tigris have this problem down to single-digit minutes, and what they came up with is a lot cooler than a cache CDN.</p>\n\n<p>Here’s how it works. Tigris runs redundant FoundationDB clusters in our regions to track objects. They use Fly.io’s NVMe volumes as a first level of cached raw byte store, and a queuing system modelled on <a href='https://www.foundationdb.org/files/QuiCK.pdf' title=''>Apple’s QuiCK paper</a> to distribute object data to multiple replicas, to regions where the data is in demand, and to 3rd party object stores… like S3.</p>\n\n<p>If your objects are less than about 128 kilobytes, Tigris makes them instantly global. By default! Things are just snappy, all over the world, automatically, because they’ve done all the work.</p>\n\n<p>But it gets better, because Tigris is also much more flexible than a cache simple CDN. It’s globally distributed from the jump, with inter-region routing baked into its distribution layer. Tigris isn’t a CDN, but rather a toolset that you can use to build arbitrary CDNs, with consistency guarantees, instant purge and relay regions.</p>\n\n<p>There’s a lot going on in this architecture, and it’d be fun to dig into it more. But for now, you don’t have to understand any of it. Because Tigris ties all this stuff together with an S3-compatible object storage API. If your framework can talk to S3, it can use Tigris.</p>\n<h2 id='fly-storage' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#fly-storage' aria-label='Anchor'></a><span class='plain-code'><code>fly storage</code></span></h2>\n<p>To get started with this, run the <code>fly storage create</code> command:</p>\n<div class=\"highlight-wrapper group relative \">\n  <button \n    type=\"button\"\n    class=\"bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n    data-wrap-target=\"#code-yj4e9y5v\"\n  >\n    <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\" stroke-linecap=\"round\" stroke-linejoin=\"round\"><g buffered-rendering=\"static\"><path d=\"M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959\" /><path d=\"M11.081 6.466L9.533 8.037l1.548 1.571\" /></g></svg>\n    <span class=\"bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950\">\n      Wrap text\n    </span>\n  </button>\n  <button \n    type=\"button\"\n    class=\"bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n    data-copy-target=\"sibling\"\n  >\n    <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\"><g buffered-rendering=\"static\"><path d=\"M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z\" /><path d=\"M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617\" /></g></svg>\n    <span class=\"bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950\">\n      Copy to clipboard\n    </span>\n  </button>\n  <div class='highlight relative group'>\n    <pre class='highlight '><code id=\"code-yj4e9y5v\">$ fly storage create\nChoose a name, use the default, or leave blank to generate one: xe-foo-images\nYour Tigris project (xe-foo-images) is ready. See details and next steps with: https://fly.io/docs/reference/tigris/\n\nSetting the following secrets on xe-foo:\nAWS_REGION\nBUCKET_NAME\nAWS_ENDPOINT_URL_S3\nAWS_ACCESS_KEY_ID\nAWS_SECRET_ACCESS_KEY\n\nSecrets are staged for the first deployment\n</code></pre>\n  </div>\n</div>\n<p>All you have to do is fill in a bucket name. Hit enter. All of the configuration for the AWS S3 library will be injected into your application for you. And you don’t even need to change the libraries that you’re using. <a href='https://www.tigrisdata.com/docs/sdks/s3/' title=''>The Tigris examples</a> all use the AWS libraries to put and delete objects into Tigris using the same calls that you use for S3.</p>\n\n<p>I know how this looks for a lot of you. It looks like we’re partnering with Tigris because we’re chicken, and we didn’t want to build something like this. Well, guess what: you’re right!</p>\n\n<p>Compute and networking: those are things we love and understand. Object storage? <a href='https://fly.io/blog/the-5-hour-content-delivery-network/' title=''>We already gave away the game on how we’d design a CDN for our own content</a>, and it wasn’t nearly as slick as Tigris.</p>\n\n<p>Object storage is important. It needs to be good. We did not want to half-ass it. So we partnered with Tigris, so that they can put their full resources into making object storage as ✨magical✨ as Fly.io is.</p>\n\n<p>This also mirrors a lot of the Unix philosophy of Days Gone Past, you have individual parts that do one thing very well that are then chained together to create a composite result. I mean, come on, would you seriously want to buy your servers the same place you buy your shoes?</p>\n<h2 id='one-bill-to-rule-them-all' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#one-bill-to-rule-them-all' aria-label='Anchor'></a><span class='plain-code'>One bill to rule them all</span></h2>\n<p>Well, okay, the main reason why you would want to do that is because having everything under one bill makes it really easy for your accounting people. So, to make one bill for your computer, your block storage, your databases, your networking, and your object storage, we’ve wrapped everything under one bill. You don’t have to create separate accounts with Supabase or Upstash or PlanetScale or Tigris. Everything gets charged to your Fly.io bill and you pay one bill per month.</p>\n<aside class=\"right-sidenote\"><p>This was actually going to be posted on Valentine’s Day, but we had to wait for the chocolate to go on sale.</p>\n</aside>\n<p>This is our Valentine’s Day gift to you all. Object storage that just works. Stay tuned because we have a couple exciting features that build on top of the integration of Fly.io and Tigris that allow really unique things, such as truly global static website hosting and turning your bucket into a CDN in 5 minutes at most.</p>\n\n<p>Here’s to many more happy developer days to come.</p>",
      "image": {
        "url": "https://fly.io/blog/tigris-public-beta/assets/tigris-public-beta-thumb.webp",
        "title": null
      },
      "media": [],
      "authors": [
        {
          "name": "Fly",
          "email": null,
          "url": null
        }
      ],
      "categories": []
    },
    {
      "id": "https://fly.io/blog/gpu-ga/",
      "title": "GPUs on Fly.io are available to everyone!",
      "description": null,
      "url": "https://fly.io/blog/gpu-ga/",
      "published": "2024-02-12T00:00:00.000Z",
      "updated": "2025-05-20T16:26:25.000Z",
      "content": "<div class=\"lead\"><p>Fly.io makes it easy to spin up compute around the world, now including powerful GPUs. Unlock the power of large language models, text transcription, and image generation with our datacenter-grade muscle!</p>\n</div>\n<p>GPUs are now available to everyone!</p>\n\n<p>We know you’ve been excited about wanting to use GPUs on Fly.io and we’re happy to announce that they’re available for everyone. If you want, you can spin up GPU instances with any of the following cards:</p>\n\n<ul>\n<li>Ampere A100 (40GB) <code>a100-40gb</code>\n</li><li>Ampere A100 (80GB) <code>a100-80gb</code>\n</li><li>Lovelace L40s (48GB) <code>l40s</code>\n</li></ul>\n\n<p>To use a GPU instance today, change the <code>vm.size</code> for one of your apps or processes to any of the above GPU kinds. Here’s how you can spin up an <a href='https://ollama.ai' title=''>Ollama</a> server in seconds:</p>\n<div class=\"highlight-wrapper group relative toml\">\n  <button \n    type=\"button\"\n    class=\"bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n    data-wrap-target=\"#code-j62lzfgm\"\n  >\n    <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\" stroke-linecap=\"round\" stroke-linejoin=\"round\"><g buffered-rendering=\"static\"><path d=\"M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959\" /><path d=\"M11.081 6.466L9.533 8.037l1.548 1.571\" /></g></svg>\n    <span class=\"bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950\">\n      Wrap text\n    </span>\n  </button>\n  <button \n    type=\"button\"\n    class=\"bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n    data-copy-target=\"sibling\"\n  >\n    <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\"><g buffered-rendering=\"static\"><path d=\"M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z\" /><path d=\"M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617\" /></g></svg>\n    <span class=\"bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950\">\n      Copy to clipboard\n    </span>\n  </button>\n  <div class='highlight relative group'>\n    <pre class='highlight '><code id=\"code-j62lzfgm\"><span class=\"py\">app</span> <span class=\"p\">=</span> <span class=\"s\">\"your-app-name\"</span>\n<span class=\"py\">region</span> <span class=\"p\">=</span> <span class=\"s\">\"ord\"</span>\n<span class=\"py\">vm.size</span> <span class=\"p\">=</span> <span class=\"s\">\"l40s\"</span>\n\n<span class=\"nn\">[http_service]</span>\n  <span class=\"py\">internal_port</span> <span class=\"p\">=</span> <span class=\"mi\">11434</span>\n  <span class=\"py\">force_https</span> <span class=\"p\">=</span> <span class=\"kc\">false</span>\n  <span class=\"py\">auto_stop_machines</span> <span class=\"p\">=</span> <span class=\"kc\">true</span>\n  <span class=\"py\">auto_start_machines</span> <span class=\"p\">=</span> <span class=\"kc\">true</span>\n  <span class=\"py\">min_machines_running</span> <span class=\"p\">=</span> <span class=\"mi\">0</span>\n  <span class=\"py\">processes</span> <span class=\"p\">=</span> <span class=\"nn\">[\"app\"]</span>\n\n<span class=\"nn\">[build]</span>\n  <span class=\"py\">image</span> <span class=\"p\">=</span> <span class=\"s\">\"ollama/ollama\"</span>\n\n<span class=\"nn\">[mounts]</span>\n  <span class=\"py\">source</span> <span class=\"p\">=</span> <span class=\"s\">\"models\"</span>\n  <span class=\"py\">destination</span> <span class=\"p\">=</span> <span class=\"s\">\"/root/.ollama\"</span>\n  <span class=\"py\">initial_size</span> <span class=\"p\">=</span> <span class=\"s\">\"100gb\"</span>\n</code></pre>\n  </div>\n</div>\n<p>Deploy this and bam, large language model inferencing from anywhere. If you want a private setup, see the article <a href='https://fly.io/blog/scaling-llm-ollama/' title=''>Scaling Large Language Models to zero with Ollama</a> for more information. You never know when you have a sandwich emergency and don’t know what you can make with what you have on hand.</p>\n\n<p>We are working on getting some lower-cost A10 GPUs in the next few weeks. We’ll update you when they’re ready.</p>\n\n<p>If you want to explore the possibilities of GPUs on Fly.io, here’s a few articles that may give you ideas:</p>\n\n<ul>\n<li><a href='https://fly.io/blog/not-midjourney-bot/' title=''>Deploy Your Own (Not) MidJourney Bot On Fly GPUs</a>\n</li><li><a href='https://fly.io/blog/scaling-llm-ollama/' title=''>Scaling Large Language Models to zero with Ollama</a>\n</li><li><a href='https://fly.io/blog/transcribing-on-fly-gpu-machines/' title=''>Transcribing on Fly GPU Machines</a>\n</li></ul>\n\n<p>Depending on factors such as your organization’s age and payment history, you may need to go through additional verification steps.</p>\n\n<p>If you’ve been experimenting with Fly.io GPUs and have made something cool, let us know on the <a href='https://community.fly.io/' title=''>Community Forums</a> or by mentioning us <a href='https://hachyderm.io/@flydotio' title=''>on Mastodon</a>! We’ll boost the cool ones.</p>",
      "image": {
        "url": "https://fly.io/blog/gpu-ga/assets/gpu-ga-thumb.webp",
        "title": null
      },
      "media": [],
      "authors": [
        {
          "name": "Fly",
          "email": null,
          "url": null
        }
      ],
      "categories": []
    },
    {
      "id": "https://fly.io/blog/event-driven-machines/",
      "title": "Event Driven Machines",
      "description": null,
      "url": "https://fly.io/blog/event-driven-machines/",
      "published": "2024-02-05T00:00:00.000Z",
      "updated": "2025-05-20T16:26:25.000Z",
      "content": "<div class=\"lead\"><p>We’re Fly.io and we transmute containers into VMs, running them on our hardware around the world. We have fast booting VM’s, so why not <a href=\"https://fly.io/docs/speedrun/\" title=\"\">take advantage of them</a>?</p>\n</div>\n<p>Serverless is great because is has good ergonomics - when an event is received, a “not-server” boots quickly, code is run, and then everything is torn down. We’re billed only on usage.</p>\n\n<p>It turns out that Fly.io shares many of <a href='https://fly.io/blog/the-serverless-server/' title=''>the same ergonomics</a> as serverless. Can we do a serverless on Fly.io? 🦆 Well, if it’s quacking like a duck, let’s call it a mallard.</p>\n\n<p>Here’s a useful pattern for triggering our own not-servers with Fly Machines.</p>\n<h2 id='triggering-machines' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#triggering-machines' aria-label='Anchor'></a><span class='plain-code'>Triggering Machines</span></h2>\n<p>I want to make Machines do some work based on my own events. Fly.io can already <a href='https://fly.io/docs/apps/autostart-stop/' title=''>stop Machines when idle</a> based on HTTP, so let’s concentrate on non-HTTP events.</p>\n\n<p>The process of running evented Machines involves:</p>\n\n<ol>\n<li>Listening for events\n</li><li>Spinning up Fly Machines to run our code (with the events as context)\n</li><li>Having event-aware code to run\n</li></ol>\n\n<p>To do this, I made a project and named it <a href='https://github.com/fly-apps/lambdo' title=''><strong class='font-semibold text-navy-950'>Lambdo</strong></a> because reasons.\nYou can consider this project “reference architecture” in the same way you call a toddler’s scribbling “art”.</p>\n\n<p>The goal is to run some of our code on a fresh not-server when an event is received. We want this done efficiently - a Machine should only exist long enough to process an event or 3.</p>\n\n<p>Lambdo does just that - it receives some events, and spins up Fly Machines with those events placed <em>inside</em> the VMs. Once the code finishes, the Machine is destroyed.</p>\n<div class='group relative min-w-0 bg-white shadow-md shadow-navy-500/10 rounded-xl mb-7 ring-1 ring-navy-300/40'><button type='button' class='bubble-wrap z-20 absolute right-2.5 top-2.5 text-transparent group-hover:text-navy-950 hocus:text-violet-600 bg-transparent group-hover:bg-white hocus:bg-violet-200/40 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none' data-wrap-target='#table-r8vpa1ai' data-wrap-type='nowrap'><svg class='w-5 h-5 pointer-events-none' viewBox='0 0 20 20' fill='none' stroke='currentColor' stroke-width='1.5' stroke-linecap='round' stroke-linejoin='round'><g buffered-rendering='static'><path d='M11.912 10.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.314 2.314 0 00-2.315-2.31H4.959M15.187 14.5H4.959M8.802 10H4.959' /><path d='M13.081 8.466l-1.548 1.571 1.548 1.571' /></g></svg><span class='bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950'>Wrap text</span></button><div class='min-w-0 overflow-x-auto rounded-xl'><table class='table-stripe table-stretch table-pad text-sm whitespace-nowrap m-0' id='table-r8vpa1ai'><thead class='text-navy-950 text-left'><tr>\n<th style=\"text-align: center\"><img alt=\"the files are inside the computer\" src=\"/blog/event-driven-machines/assets/files-are-inside-the-computer-cover.webp\" /></th>\n</tr>\n</thead><tbody><tr>\n<td style=\"text-align: center\">The files are <em>in</em> the computer!</td>\n</tr>\n</tbody></table></div></div><h2 id='listening-for-events' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#listening-for-events' aria-label='Anchor'></a><span class='plain-code'>Listening for Events</span></h2>\n<p>For our purposes, an event is just a JSON object. <code>{\"any\": \"object\", \"will\": \"do\"}</code>.</p>\n\n<p>We want to turn events into compute, so we need some sort of event system. I decided to use a queue.</p>\n<h3 id='the-queue' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#the-queue' aria-label='Anchor'></a><span class='plain-code'>The Queue</span></h3>\n<p>The first thing I needed was a place to send events! I chose to use SQS, which let me continue to pretend servers don’t exist.</p>\n\n<p>It’s no surprise then that the first part of this project is <a href='https://github.com/fly-apps/lambdo/blob/main/internal/sqs/get_events.go' title=''>code that polls SQS</a>.</p>\n\n<p>When the polling returns some non-zero number of events, it collects the SQS messages’ JSON strings (and some meta data), resulting in an array of objects (a list of events).</p>\n\n<p>Then we send these events to some Machines.</p>\n<h2 id='spinning-up-machines' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#spinning-up-machines' aria-label='Anchor'></a><span class='plain-code'>Spinning Up Machines</span></h2>\n<p>Fly Machines are fast-booting Micro-VM’s, controlled by an <a href='https://fly.io/docs/machines/working-with-machines/' title=''>API</a>.</p>\n\n<p>A feature of that API is the ability to <a href='https://community.fly.io/t/machine-files/14453' title=''>create files</a> on a new Machine. This is how we’ll get our events into the Machine.</p>\n\n<p>When Lambdo creates a Machine, it places a file at <code>/tmp/events.json</code>. Our code just needs to read that file and parse the JSON.</p>\n<h3 id='running-our-code' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#running-our-code' aria-label='Anchor'></a><span class='plain-code'>Running Our Code</span></h3>\n<p>Part of the ergonomics of Serverless is (usually) being limited to running just a function. Fly.io doesn’t really care what you run, which is to our advantage. We can choose to write discreet functions per event, or we can bring our whole <a href='https://signalvnoise.com/svn3/the-majestic-monolith/' title=''>Majestic Monolith</a> to bear.</p>\n\n<p>How do we package up our code? The real answer is “however you want!”, but here’s 2 ideas.</p>\n\n<p><strong class='font-semibold text-navy-950'>Use Your Existing Code Base</strong></p>\n\n<p>You can just use your existing code base. This is especially easy if you’re already deploying apps to Fly.io.</p>\n\n<p>All we’d need to do is add some additional code - a command perhaps (<code>rake</code>, <code>artisan</code>, whatever) - that sucks in that JSON, iterates over the events, and does some stuff.</p>\n<div class=\"highlight-wrapper group relative php\">\n  <button \n    type=\"button\"\n    class=\"bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n    data-wrap-target=\"#code-38m30kym\"\n  >\n    <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\" stroke-linecap=\"round\" stroke-linejoin=\"round\"><g buffered-rendering=\"static\"><path d=\"M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959\" /><path d=\"M11.081 6.466L9.533 8.037l1.548 1.571\" /></g></svg>\n    <span class=\"bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950\">\n      Wrap text\n    </span>\n  </button>\n  <button \n    type=\"button\"\n    class=\"bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n    data-copy-target=\"sibling\"\n  >\n    <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\"><g buffered-rendering=\"static\"><path d=\"M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z\" /><path d=\"M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617\" /></g></svg>\n    <span class=\"bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950\">\n      Copy to clipboard\n    </span>\n  </button>\n  <div class='highlight relative group'>\n    <pre class='highlight '><code id=\"code-38m30kym\"><span class=\"nv\">$events</span> <span class=\"o\">=</span> <span class=\"nb\">json_decode</span><span class=\"p\">(</span><span class=\"nb\">file_get_contents</span><span class=\"p\">(</span><span class=\"s2\">\"/tmp/events.json\"</span><span class=\"p\">));</span>\n\n<span class=\"k\">foreach</span> <span class=\"p\">(</span><span class=\"nv\">$events</span> <span class=\"k\">as</span> <span class=\"nv\">$event</span><span class=\"p\">)</span> <span class=\"p\">{</span>\n  <span class=\"c1\">// do a thing</span>\n<span class=\"p\">}</span>\n</code></pre>\n  </div>\n</div>\n<p>When we create an event, we’ll tell Lambdo how to run your code - more on that later.</p>\n\n<p><strong class='font-semibold text-navy-950'>Use Lambdo’s Base Images</strong></p>\n\n<p>This project also provides some “runtimes” (base images). This is a bit more “traditional serverless”, were you provide a function to run.</p>\n\n<p>Lambdo contains <a href='https://github.com/fly-apps/lambdo/tree/main/runtimes' title=''>two runtimes</a> right now - Node and PHP. There could be more, of course, but you know…lazy.</p>\n\n<p>The Node runtime <a href='https://github.com/fly-apps/lambdo/blob/main/runtimes/js/src/index.js' title=''>contains some code</a> that will read the JSON payload file   (again, just an array of JSON events), and call a user-supplied JS function once per event.</p>\n\n<p>An <a href='https://github.com/fly-apps/lambdo/tree/main/runtimes/js/sample-project' title=''>example is here</a> - our code just needs to export a function that does stuff to the given event:</p>\n<div class=\"highlight-wrapper group relative javascript\">\n  <button \n    type=\"button\"\n    class=\"bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n    data-wrap-target=\"#code-wh6y1w2p\"\n  >\n    <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\" stroke-linecap=\"round\" stroke-linejoin=\"round\"><g buffered-rendering=\"static\"><path d=\"M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959\" /><path d=\"M11.081 6.466L9.533 8.037l1.548 1.571\" /></g></svg>\n    <span class=\"bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950\">\n      Wrap text\n    </span>\n  </button>\n  <button \n    type=\"button\"\n    class=\"bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n    data-copy-target=\"sibling\"\n  >\n    <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\"><g buffered-rendering=\"static\"><path d=\"M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z\" /><path d=\"M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617\" /></g></svg>\n    <span class=\"bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950\">\n      Copy to clipboard\n    </span>\n  </button>\n  <div class='highlight relative group'>\n    <pre class='highlight '><code id=\"code-wh6y1w2p\"><span class=\"c1\">// File /app/index.js</span>\n<span class=\"nx\">exports</span><span class=\"p\">.</span><span class=\"nx\">handler</span> <span class=\"o\">=</span> <span class=\"k\">async</span> <span class=\"kd\">function</span><span class=\"p\">(</span><span class=\"nx\">event</span><span class=\"p\">)</span> <span class=\"p\">{</span>\n    <span class=\"nx\">console</span><span class=\"p\">.</span><span class=\"nx\">log</span><span class=\"p\">(</span><span class=\"dl\">\"</span><span class=\"s2\">Let's process an event! The event:</span><span class=\"dl\">\"</span><span class=\"p\">,</span> <span class=\"nx\">event</span><span class=\"p\">)</span>\n<span class=\"p\">}</span>\n</code></pre>\n  </div>\n</div>\n<p>The <a href='https://github.com/fly-apps/lambdo/tree/main/runtimes/php' title=''>PHP runtime</a> is the same idea, a user-supplied handler looks like this:</p>\n<div class=\"highlight-wrapper group relative php\">\n  <button \n    type=\"button\"\n    class=\"bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n    data-wrap-target=\"#code-jhm1hg3i\"\n  >\n    <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\" stroke-linecap=\"round\" stroke-linejoin=\"round\"><g buffered-rendering=\"static\"><path d=\"M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959\" /><path d=\"M11.081 6.466L9.533 8.037l1.548 1.571\" /></g></svg>\n    <span class=\"bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950\">\n      Wrap text\n    </span>\n  </button>\n  <button \n    type=\"button\"\n    class=\"bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n    data-copy-target=\"sibling\"\n  >\n    <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\"><g buffered-rendering=\"static\"><path d=\"M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z\" /><path d=\"M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617\" /></g></svg>\n    <span class=\"bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950\">\n      Copy to clipboard\n    </span>\n  </button>\n  <div class='highlight relative group'>\n    <pre class='highlight '><code id=\"code-jhm1hg3i\"><span class=\"c1\">// File /app/index.php</span>\n<span class=\"k\">return</span> <span class=\"k\">function</span> <span class=\"n\">function</span><span class=\"p\">(</span><span class=\"kt\">array</span> <span class=\"nv\">$event</span><span class=\"p\">)</span> <span class=\"p\">{</span>\n    <span class=\"c1\">// Do something with $event</span>\n<span class=\"p\">}</span>\n</code></pre>\n  </div>\n</div>\n<p>Explore the <a href='https://github.com/fly-apps/lambdo/tree/main/runtimes' title=''>runtime</a> directory of the project to see how that’s put together.</p>\n<h2 id='sending-an-event' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#sending-an-event' aria-label='Anchor'></a><span class='plain-code'>Sending an Event</span></h2>\n<p>Since our events are sent via SQS queue, it would be helpful to see an example SQS message. Remember how I mentioned the SQS message has some meta data?</p>\n\n<p>Here’s an example, with said meta data:</p>\n<div class=\"highlight-wrapper group relative bash\">\n  <button \n    type=\"button\"\n    class=\"bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n    data-wrap-target=\"#code-cgzk0n59\"\n  >\n    <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\" stroke-linecap=\"round\" stroke-linejoin=\"round\"><g buffered-rendering=\"static\"><path d=\"M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959\" /><path d=\"M11.081 6.466L9.533 8.037l1.548 1.571\" /></g></svg>\n    <span class=\"bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950\">\n      Wrap text\n    </span>\n  </button>\n  <button \n    type=\"button\"\n    class=\"bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n    data-copy-target=\"sibling\"\n  >\n    <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\"><g buffered-rendering=\"static\"><path d=\"M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z\" /><path d=\"M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617\" /></g></svg>\n    <span class=\"bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950\">\n      Copy to clipboard\n    </span>\n  </button>\n  <div class='highlight relative group'>\n    <pre class='highlight '><code id=\"code-cgzk0n59\">aws sqs send-message <span class=\"se\">\\</span>\n    <span class=\"nt\">--queue-url</span><span class=\"o\">=</span>https://sqs.<region>.amazonaws.com/<account>/<queue> <span class=\"se\">\\</span>\n    <span class=\"nt\">--message-body</span><span class=\"o\">=</span><span class=\"s1\">'{\"foo\": \"bar\"}'</span> <span class=\"se\">\\</span>\n    <span class=\"nt\">--message-attributes</span><span class=\"o\">=</span><span class=\"s1\">'{\n\"size\":{\"DataType\":\"String\",\"StringValue\":\"performance-2x\"}, \n\"image\":{\"DataType\":\"String\",\"StringValue\":\"fideloper/lambdo-php-sample:latest\"}\n}'</span>\n</code></pre>\n  </div>\n</div>\n<p>The Body field of the SQS message is assumed to be a JSON string (it’s the event itself, and its contents are arbitrary - whatever makes sense for you).</p>\n\n<p>The message Attributes contains the meta data - up to 3 important details:</p>\n\n<ol>\n<li><code>image</code>: The image to run (it might be a Docker Hub image, or something you pushed to registry.fly.io). This is <strong class='font-semibold text-navy-950'>required</strong>.\n</li><li><code>size</code>: The CPU size and type to use† - defaults to <code>performance-2x</code>\n</li><li><code>command</code>: The command to run, which is the Docker <code>CMD</code> equivalent - defaults to whatever your <code>CMD</code> is set in the <code>Dockerfile</code> used to create the Machine image.††\n</li></ol>\n\n<p>†You can get valid values for the <code>size</code> option by running <code>fly platform vm-sizes</code>.</p>\n\n<p>††It’s an array form, e.g. <code>[\"php\", \"artisan\", \"foo\"]</code>, you may need to do some escaping of double quotes if you’re sending messages to SQS via terminal.</p>\n<h2 id='we-did-a-lambda' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#we-did-a-lambda' aria-label='Anchor'></a><span class='plain-code'>We did a Lambda?</span></h2>\n<p>Fly.io isn’t serverless, but it has all these primitives that add up to serverless. You have events, Fly.io has fast-booting VM’s. They just make sense together!</p>\n\n<p>What we did here is use <a href='https://github.com/fly-apps/lambdo' title=''><strong class='font-semibold text-navy-950'>Lambdo</strong> to respond to events by spinning up a Machine</a>. Our code can process those events any way we want.</p>\n\n<p>What I like about this approach is how flexible it can be. We can choose the base image to use and the server type (even using GPU-enabled Machines) <em>per event</em>.\nSince we have full control over the Machine VM’s responding to the events, we can do whatever we want inside of them. Pretty neat!</p>",
      "image": {
        "url": "https://fly.io/blog/event-driven-machines/assets/lambdo-thumb.webp",
        "title": null
      },
      "media": [],
      "authors": [
        {
          "name": "Fly",
          "email": null,
          "url": null
        }
      ],
      "categories": []
    },
    {
      "id": "https://fly.io/blog/delegate-tasks-to-fly-machines/",
      "title": "Delegating tasks to Fly Machines",
      "description": null,
      "url": "https://fly.io/blog/delegate-tasks-to-fly-machines/",
      "published": "2024-02-01T00:00:00.000Z",
      "updated": "2025-05-20T16:26:25.000Z",
      "content": "<div class=\"lead\"><p>We’re Fly.io. We run apps for our users on hardware we host around the world. Leveraging Fly.io Machines and Fly.io’s private network can make delegating expensive tasks a breeze. It’s easy to <a href=\"/docs/speedrun/\" title=\"\">get started</a>!</p>\n</div>\n<p>There are many ways to delegate work in web applications, from using background workers to serverless architecture. In this article, we explore a new machine pattern that takes advantage of Fly Machines and distinct process groups to make quick work of resource-intensive tasks.</p>\n<h2 id='the-problem' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#the-problem' aria-label='Anchor'></a><span class='plain-code'>The Problem</span></h2>\n<p>Let’s say you’re building a web application that has a few tasks that demand a hefty amount of memory or CPU juice. Resizing images, for example, can require a shocking amount of memory, but you might not need that much memory <em>all</em> of the time, for handling most of your web requests. Why pay for all that horsepower when you don’t need it most of the time?</p>\n\n<p>What if there’s a different way to delegate these resource-intensive tasks?</p>\n<h2 id='the-solution' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#the-solution' aria-label='Anchor'></a><span class='plain-code'>The Solution</span></h2>\n<p>What if you could simply delegate these types of tasks to a more powerful machine <em>only</em> when necessary? Let’s build an example of this method in a sample app. We’ll be using Next.js today, but this pattern is framework (and language) agnostic.</p>\n\n<p>Here’s how it will work:</p>\n\n<ul>\n<li>A request hits an endpoint that does some resource-intensive tasks\n</li><li>The request is passed on to a copy of your app that’s running on a more beefy machine\n</li><li>The beefy machine performs the intensive work and then hands the result back to the user via the “weaker” machine.\n</li></ul>\n\n<p><img alt=\"(A 3 panel comic of two characters, one small and one big and strong, both with computer screens for heads. Panel 1: Little guy hands the big guy a jar of pickles. Panel 2: Big guy opens the pickle jar. Panel 3: Big guy hands back the opened jar to the little guy, who is pleased; Illustration by Annie Sexton)\" src=\"/blog/delegate-tasks-to-fly-machines/assets/./3-panel-comic-delegate-tasks-to-fly-machines.webp\" /></p>\n\n<p>To demonstrate this task-delegation pattern, we’re going to start with a single-page application that looks like this:</p>\n\n<p><img alt=\"(Screenshot of the demo app; its a single-page app with the header and description \"Open Pickle Jar: You've got a jar of pickles (a zip file of some high-def pickle photos) that you would like to open (resize and display below)\". Under the description there are two inputs, one for width and one for height, and a button that says \"Open pickle jar\")\" src=\"/blog/delegate-tasks-to-fly-machines/assets/./pickle-jar-screenshot.webp\" /></p>\n\n<p>Our “Open Pickle Jar” app is quite simple: you provide the width and height and it goes off and resizes some high-resolution photos to those dimensions (exciting!).</p>\n\n<p>If you’d like to follow along, you can clone the <code>start-here</code> branch of this repository: <a href='https://github.com/fly-apps/open-pickle-jar' title=''>https://github.com/fly-apps/open-pickle-jar</a> . The final changes are visible on the <code>main</code> branch. This app uses S3 for image storage, so you’ll need to create a bucket called <code>open-pickle-jar</code> and provide <code>AWS_REGION</code>,  <code>AWS_ACCESS_KEY_ID</code>, and  <code>AWS_SECRET_ACCESS_KEY</code> as environment variables.</p>\n\n<p>This task is really just a stand-in for any HTTP request that kicks off a resource-intensive task. Get the request from the user, delegate it to a more powerful machine, and then return the result to the user. It’s what happens when you can’t open a pickle jar, and you ask for someone to help.</p>\n\n<p>Before we start, let’s define some terms and what they mean on Fly.io:</p>\n\n<ul>\n<li><strong class='font-semibold text-navy-950'>Machines:</strong> Extremely fast-booting VMs. They can exist in different regions and even run different processes.\n</li><li><strong class='font-semibold text-navy-950'>App:</strong> An abstraction for a group of Machines running your code on Fly.io, along with the configuration, provisioned resources, and data we need to keep track of to run and route to your Machines.\n</li><li><strong class='font-semibold text-navy-950'>Process group:</strong> A collection of Machines running a specific process. Many apps only run a single process (typically a public-facing HTTP server), but you can define any number of them.\n</li><li><strong class='font-semibold text-navy-950'>fly.toml:</strong> A configuration file for deploying apps on Fly.io where you can set things like Machine specs, process groups, regions, and more.\n</li></ul>\n\n<hr>\n\n<p><strong class='font-semibold text-navy-950'>Setup Overview</strong></p>\n\n<p>Here’s what we’ll need for our application:</p>\n\n<ol>\n<li>A <strong class='font-semibold text-navy-950'>route</strong> that performs our resource-intensive task\n</li><li>A <strong class='font-semibold text-navy-950'>wrapper function</strong> that either:\n\n<ol>\n<li>Runs our resource-intensive task OR\n</li><li>Forwards the request to our more powerful Machine\n</li></ol>\n</li><li><strong class='font-semibold text-navy-950'>Two process groups</strong> running the <em>same process</em> but with differing Machine specs:\n\n<ol>\n<li>One for accepting HTTP traffic and handling most requests (let’s call it <code>web</code>)\n</li><li>One internal-only group for doing the heavy lifting (let’s call it <code>worker</code>)\n</li></ol>\n</li></ol>\n\n<p>In short, this is what our architecture will look like, a standard web and worker duo.</p>\n\n<p><img alt=\"(A simple graphic illustrating two servers; a small box containing \"npm run start\" and a larger box containing the same thing. The small is labeled \"web\" and the larger box is labeled \"worker\".)\" src=\"/blog/delegate-tasks-to-fly-machines/assets/./web-worker.webp\" /></p>\n<h3 id='creating-our-route' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#creating-our-route' aria-label='Anchor'></a><span class='plain-code'>Creating our route</span></h3>\n<p>Next.js has two distinct routing patterns: Pages and App router. We’ll use the App router in our example since it’s the preferred method moving forward.</p>\n\n<p>Under your <code>/app</code> directory, create a new folder called <code>/open-pickle-jar</code> containing a <code>route.ts</code> .</p>\n\n<p>(We’re using TypeScript here, but feel free to use normal JavaScript if you prefer!)</p>\n<div class=\"highlight-wrapper group relative \">\n  <button \n    type=\"button\"\n    class=\"bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n    data-wrap-target=\"#code-ujx4kwap\"\n  >\n    <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\" stroke-linecap=\"round\" stroke-linejoin=\"round\"><g buffered-rendering=\"static\"><path d=\"M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959\" /><path d=\"M11.081 6.466L9.533 8.037l1.548 1.571\" /></g></svg>\n    <span class=\"bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950\">\n      Wrap text\n    </span>\n  </button>\n  <button \n    type=\"button\"\n    class=\"bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n    data-copy-target=\"sibling\"\n  >\n    <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\"><g buffered-rendering=\"static\"><path d=\"M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z\" /><path d=\"M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617\" /></g></svg>\n    <span class=\"bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950\">\n      Copy to clipboard\n    </span>\n  </button>\n  <div class='highlight relative group'>\n    <pre class='highlight '><code id=\"code-ujx4kwap\">...\n/app\n  /open-pickle-jar\n    route.ts\n...\n</code></pre>\n  </div>\n</div>\n<p>Inside <code>route.ts</code> we’ll flesh out our endpoint:</p>\n<div class=\"highlight-wrapper group relative typescript\">\n  <button \n    type=\"button\"\n    class=\"bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n    data-wrap-target=\"#code-j1a8qd9i\"\n  >\n    <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\" stroke-linecap=\"round\" stroke-linejoin=\"round\"><g buffered-rendering=\"static\"><path d=\"M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959\" /><path d=\"M11.081 6.466L9.533 8.037l1.548 1.571\" /></g></svg>\n    <span class=\"bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950\">\n      Wrap text\n    </span>\n  </button>\n  <button \n    type=\"button\"\n    class=\"bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n    data-copy-target=\"sibling\"\n  >\n    <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\"><g buffered-rendering=\"static\"><path d=\"M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z\" /><path d=\"M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617\" /></g></svg>\n    <span class=\"bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950\">\n      Copy to clipboard\n    </span>\n  </button>\n  <div class='highlight relative group'>\n    <pre class='highlight '><code id=\"code-j1a8qd9i\"><span class=\"c1\">// /app/open-pickle-jar/route.ts</span>\n\n<span class=\"k\">import</span> <span class=\"nx\">delegateToWorker</span> <span class=\"k\">from</span> <span class=\"dl\">\"</span><span class=\"s2\">@/utils/delegateToWorker</span><span class=\"dl\">\"</span><span class=\"p\">;</span>\n<span class=\"k\">import</span> <span class=\"p\">{</span> <span class=\"nx\">NextRequest</span><span class=\"p\">,</span> <span class=\"nx\">NextResponse</span> <span class=\"p\">}</span> <span class=\"k\">from</span> <span class=\"dl\">\"</span><span class=\"s2\">next/server</span><span class=\"dl\">\"</span><span class=\"p\">;</span>\n<span class=\"k\">import</span> <span class=\"p\">{</span> <span class=\"nx\">openPickleJar</span> <span class=\"p\">}</span> <span class=\"k\">from</span> <span class=\"dl\">\"</span><span class=\"s2\">../openPickleJar</span><span class=\"dl\">\"</span><span class=\"p\">;</span>\n\n<span class=\"k\">export</span> <span class=\"k\">async</span> <span class=\"kd\">function</span> <span class=\"nx\">POST</span><span class=\"p\">(</span><span class=\"nx\">request</span><span class=\"p\">:</span> <span class=\"nx\">NextRequest</span><span class=\"p\">)</span> <span class=\"p\">{</span>\n  <span class=\"kd\">const</span> <span class=\"p\">{</span> <span class=\"nx\">width</span><span class=\"p\">,</span> <span class=\"nx\">height</span> <span class=\"p\">}</span> <span class=\"o\">=</span> <span class=\"k\">await</span> <span class=\"nx\">request</span><span class=\"p\">.</span><span class=\"nx\">json</span><span class=\"p\">();</span>\n  <span class=\"kd\">const</span> <span class=\"nx\">path</span> <span class=\"o\">=</span> <span class=\"nx\">request</span><span class=\"p\">.</span><span class=\"nx\">nextUrl</span><span class=\"p\">.</span><span class=\"nx\">pathname</span><span class=\"p\">;</span>\n  <span class=\"kd\">const</span> <span class=\"nx\">body</span> <span class=\"o\">=</span> <span class=\"k\">await</span> <span class=\"nx\">delegateToWorker</span><span class=\"p\">(</span><span class=\"nx\">path</span><span class=\"p\">,</span> <span class=\"nx\">openPickleJar</span><span class=\"p\">,</span> <span class=\"p\">{</span> <span class=\"nx\">width</span><span class=\"p\">,</span> <span class=\"nx\">height</span> <span class=\"p\">});</span>\n  <span class=\"k\">return</span> <span class=\"nx\">NextResponse</span><span class=\"p\">.</span><span class=\"nx\">json</span><span class=\"p\">(</span><span class=\"nx\">body</span><span class=\"p\">);</span>\n<span class=\"p\">}</span>\n</code></pre>\n  </div>\n</div>\n<p>The function <code>openPickleJar</code> that we’re importing contains our resource-intensive task, which in this case is extracting images from a <code>.zip</code> file, resizing them all to the new dimensions, and returning the new image URLs.</p>\n\n<p>The <code>POST</code> function is how one define routes for specific HTTP methods in Next.js, and ours implements a function  <code>delegateToWorker</code> that accepts the path of the current endpoint (<code>/open-pickle-jar</code>) our resource-intensive function, and the same request parameters. This function doesn’t yet exist, so let’s build that next!</p>\n<h3 id='creating-our-wrapper-function' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#creating-our-wrapper-function' aria-label='Anchor'></a><span class='plain-code'>Creating our wrapper function</span></h3>\n<p>Now that we’ve set up our endpoint, let’s flesh out the wrapper function that delegates our request to a more powerful machine.</p>\n\n<p>We haven’t defined our process groups just yet, but if you recall, the plan is to have two:</p>\n\n<ol>\n<li><code>web</code> - Our standard web server\n</li><li><code>worker</code> - For opening pickle jars (e.g. doing resource-intensive work). It’s essentially a duplicate of <code>web</code>, but running on beefier Machines.\n</li></ol>\n\n<p>Here’s what we want this wrapper function to do:</p>\n\n<ul>\n<li>If the current machine is a <code>worker</code> , proceed to execute the resource-intensive task\n</li><li>If the current machine is NOT a <code>worker</code> , make a new request to the identical endpoint on a <code>worker</code> Machine\n</li></ul>\n\n<p>Inside your <code>/utils</code> directory, create a file called <code>delegateToWorker.ts</code> with the following content:</p>\n<div class=\"highlight-wrapper group relative typescript\">\n  <button \n    type=\"button\"\n    class=\"bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n    data-wrap-target=\"#code-sc0e1it9\"\n  >\n    <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\" stroke-linecap=\"round\" stroke-linejoin=\"round\"><g buffered-rendering=\"static\"><path d=\"M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959\" /><path d=\"M11.081 6.466L9.533 8.037l1.548 1.571\" /></g></svg>\n    <span class=\"bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950\">\n      Wrap text\n    </span>\n  </button>\n  <button \n    type=\"button\"\n    class=\"bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n    data-copy-target=\"sibling\"\n  >\n    <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\"><g buffered-rendering=\"static\"><path d=\"M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z\" /><path d=\"M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617\" /></g></svg>\n    <span class=\"bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950\">\n      Copy to clipboard\n    </span>\n  </button>\n  <div class='highlight relative group'>\n    <pre class='highlight '><code id=\"code-sc0e1it9\"><span class=\"c1\">// /utils/delegateToWorker.ts</span>\n\n<span class=\"k\">export</span> <span class=\"k\">default</span> <span class=\"k\">async</span> <span class=\"kd\">function</span> <span class=\"nx\">delegateToWorker</span><span class=\"p\">(</span><span class=\"nx\">path</span><span class=\"p\">:</span> <span class=\"kr\">string</span><span class=\"p\">,</span> <span class=\"nx\">func</span><span class=\"p\">:</span> <span class=\"p\">(...</span><span class=\"nx\">args</span><span class=\"p\">:</span> <span class=\"kr\">any</span><span class=\"p\">[])</span> <span class=\"o\">=></span> <span class=\"nb\">Promise</span><span class=\"o\"><</span><span class=\"kr\">any</span><span class=\"o\">></span><span class=\"p\">,</span> <span class=\"nx\">args</span><span class=\"p\">:</span> <span class=\"nx\">object</span><span class=\"p\">):</span> <span class=\"nb\">Promise</span><span class=\"o\"><</span><span class=\"kr\">any</span><span class=\"o\">></span> <span class=\"p\">{</span>\n\n  <span class=\"k\">if</span> <span class=\"p\">(</span><span class=\"nx\">process</span><span class=\"p\">.</span><span class=\"nx\">env</span><span class=\"p\">.</span><span class=\"nx\">FLY_PROCESS_GROUP</span> <span class=\"o\">===</span> <span class=\"dl\">'</span><span class=\"s1\">worker</span><span class=\"dl\">'</span><span class=\"p\">)</span> <span class=\"p\">{</span>\n    <span class=\"nx\">console</span><span class=\"p\">.</span><span class=\"nx\">log</span><span class=\"p\">(</span><span class=\"dl\">'</span><span class=\"s1\">running on the worker...</span><span class=\"dl\">'</span><span class=\"p\">);</span>\n    <span class=\"k\">return</span> <span class=\"nx\">func</span><span class=\"p\">({...</span><span class=\"nx\">args</span><span class=\"p\">});</span>\n\n  <span class=\"p\">}</span> <span class=\"k\">else</span> <span class=\"p\">{</span>\n    <span class=\"nx\">console</span><span class=\"p\">.</span><span class=\"nx\">log</span><span class=\"p\">(</span><span class=\"dl\">'</span><span class=\"s1\">sending new request to worker...</span><span class=\"dl\">'</span><span class=\"p\">);</span>\n    <span class=\"kd\">const</span> <span class=\"nx\">workerHost</span> <span class=\"o\">=</span> <span class=\"nx\">process</span><span class=\"p\">.</span><span class=\"nx\">env</span><span class=\"p\">.</span><span class=\"nx\">NODE_ENV</span> <span class=\"o\">===</span> <span class=\"dl\">'</span><span class=\"s1\">development</span><span class=\"dl\">'</span> <span class=\"p\">?</span> <span class=\"dl\">'</span><span class=\"s1\">localhost:3001</span><span class=\"dl\">'</span> <span class=\"p\">:</span> <span class=\"s2\">`worker.process.</span><span class=\"p\">${</span><span class=\"nx\">process</span><span class=\"p\">.</span><span class=\"nx\">env</span><span class=\"p\">.</span><span class=\"nx\">FLY_APP_NAME</span><span class=\"p\">}</span><span class=\"s2\">.internal:3000`</span><span class=\"p\">;</span>\n\n    <span class=\"kd\">const</span> <span class=\"nx\">response</span> <span class=\"o\">=</span> <span class=\"k\">await</span> <span class=\"nx\">fetch</span><span class=\"p\">(</span><span class=\"s2\">`http://</span><span class=\"p\">${</span><span class=\"nx\">workerHost</span><span class=\"p\">}${</span><span class=\"nx\">path</span><span class=\"p\">}</span><span class=\"s2\">`</span><span class=\"p\">,</span> <span class=\"p\">{</span>\n      <span class=\"na\">method</span><span class=\"p\">:</span> <span class=\"dl\">'</span><span class=\"s1\">POST</span><span class=\"dl\">'</span><span class=\"p\">,</span>\n      <span class=\"na\">headers</span><span class=\"p\">:</span> <span class=\"p\">{</span>\n        <span class=\"dl\">'</span><span class=\"s1\">Content-Type</span><span class=\"dl\">'</span><span class=\"p\">:</span> <span class=\"dl\">'</span><span class=\"s1\">application/json</span><span class=\"dl\">'</span>\n      <span class=\"p\">},</span>\n      <span class=\"na\">body</span><span class=\"p\">:</span> <span class=\"nx\">JSON</span><span class=\"p\">.</span><span class=\"nx\">stringify</span><span class=\"p\">({...</span><span class=\"nx\">args</span> <span class=\"p\">})</span>\n    <span class=\"p\">});</span>\n    <span class=\"k\">return</span> <span class=\"nx\">response</span><span class=\"p\">.</span><span class=\"nx\">json</span><span class=\"p\">();</span>\n  <span class=\"p\">}</span>\n\n<span class=\"p\">}</span>\n</code></pre>\n  </div>\n</div>\n<p>In our <code>else</code> section, you’ll notice that while developing locally (aka, when <code>NODE_ENV</code> is <code>development</code>) we define the hostname of our <code>worker</code> process to be <code>localhost:3001</code>. Typically Next.js apps run on port <code>3000</code>, so while testing our app locally, we can have two instances of our process running in different terminal shells:</p>\n\n<ul>\n<li><code>npm run dev</code> - This will run on <code>localhost:3000</code> and will act as our local <code>web</code> process\n</li><li><code>FLY_PROCESS_GROUP=worker npm run dev</code> - This will run on <code>localhost:3001</code> and will act as our <code>worker</code> process (Next.js should auto-increment the port if the original <code>3000</code> is already in use)\n</li></ul>\n\n<p>Also, if you’re wondering about the <code>FLY_PROCESS_GROUP</code> and <code>FLY_APP_NAME</code> constants, these are <a href='https://fly.io/docs/reference/runtime-environment/' title=''>Fly.io-specific runtime environment variables</a> available on all apps.</p>\n<h3 id='accessing-our-worker-machines-internal' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#accessing-our-worker-machines-internal' aria-label='Anchor'></a><span class='plain-code'>Accessing our <code>worker</code> Machines (<code>.internal</code>)</span></h3>\n<p>Now, when this code is running in production (aka <code>NODE_ENV</code> is NOT <code>development</code>) you’ll see that we’re using a unique hostname to access our <code>worker</code> Machine.</p>\n\n<p>Apps belonging to the same organization on Fly.io are provided a number of <a href='https://fly.io/docs/networking/private-networking/#fly-io-internal-addresses' title=''>internal addresses</a>. These <code>.internal</code> addresses let you point to different Apps and Machines in your private network. For example:</p>\n\n<ul>\n<li><code><region>.<app name>.internal</code> – To reach app instances in a particular region, like <code>gru.my-cool-app.internal</code>\n</li><li><code><app instance ID>.<app name>.internal</code> - To reach a <em>specific</em> app instance.\n</li><li><code><process group>.process.<app name>.internal</code> - To target app instances belonging to a specific process group. <strong class='font-semibold text-navy-950'>This is what we’re using in our app.</strong>\n</li></ul>\n\n<p>Since our <code>worker</code> process group is running the same process as our <code>web</code> process (in our case, <code>npm run start</code>), we’ll also need to make sure we use the same internal port (<code>3000</code>).</p>\n<h3 id='defining-our-process-groups-and-machines' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#defining-our-process-groups-and-machines' aria-label='Anchor'></a><span class='plain-code'>Defining our process groups and Machines</span></h3>\n<p>The last thing to do will be to define our two process groups and their respective Machine specs. We’ll do this by editing our <code>fly.toml</code> configuration.</p>\n\n<p>If you don’t have this file, go ahead and create a blank one and use the content below, but replace <code>app = open-pickle-jar</code> with your app’s name, as well as your preferred <code>primary_region</code>.  If you don’t know what region you’d like to deploy to, <a href='https://fly.io/docs/reference/regions/' title=''>here’s the list of them</a>.</p>\n\n<p><strong class='font-semibold text-navy-950'>Before you deploy:</strong> Note that deploying this example app will spin up <strong class='font-semibold text-navy-950'>billable</strong> machines. Please feel free to alter the Machine (<code>[[vm]]</code>) specs listed here to ones that suit your budget or app’s needs.</p>\n<div class=\"highlight-wrapper group relative \">\n  <button \n    type=\"button\"\n    class=\"bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n    data-wrap-target=\"#code-lcp533re\"\n  >\n    <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\" stroke-linecap=\"round\" stroke-linejoin=\"round\"><g buffered-rendering=\"static\"><path d=\"M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959\" /><path d=\"M11.081 6.466L9.533 8.037l1.548 1.571\" /></g></svg>\n    <span class=\"bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950\">\n      Wrap text\n    </span>\n  </button>\n  <button \n    type=\"button\"\n    class=\"bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n    data-copy-target=\"sibling\"\n  >\n    <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\"><g buffered-rendering=\"static\"><path d=\"M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z\" /><path d=\"M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617\" /></g></svg>\n    <span class=\"bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950\">\n      Copy to clipboard\n    </span>\n  </button>\n  <div class='highlight relative group'>\n    <pre class='highlight '><code id=\"code-lcp533re\">app = \"open-pickle-jar\"\nprimary_region = \"sea\"\n\n[build]\n\n[processes]\n  web = \"npm run start\"\n  worker = \"npm run start\"\n\n[http_service]\n  internal_port = 3000\n  force_https = true\n  auto_stop_machines = true\n  auto_start_machines = true\n  min_machines_running = 1\n  processes = [\"web\"]\n\n[[vm]]\n  cpu_kind = \"shared\"\n  cpus = 1\n  memory_mb = 1024\n  processes = [\"web\"]\n\n[[vm]]\n  size = \"performance-4x\"\n  processes = [\"worker\"]\n</code></pre>\n  </div>\n</div>\n<p>And that’s it! With our <code>fly.toml</code> finished, we’re ready to deploy our app!</p>\n\n<p><img src=\"https://slabstatic.com/prod/uploads/p1b436gf/posts/images/tH4GaGLVaDkh3RhIwCpiDRX3.png\" /></p>\n<h2 id='discussion' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#discussion' aria-label='Anchor'></a><span class='plain-code'>Discussion</span></h2>\n<p>Today we built a machine pattern on top of Fly.io. This pattern allows us to have a lighter request server that can delegate certain tasks to a stronger server, meaning that we can have one Machine do all the heavy lifting that could block everything else while the other handles all the simple tasks for users. With this in mind, this is a fairly naïve implementation, and we can make this much better:</p>\n<h3 id='using-a-queue-for-better-resiliency' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#using-a-queue-for-better-resiliency' aria-label='Anchor'></a><span class='plain-code'>Using a queue for better resiliency</span></h3>\n<p>In its current state, our code isn’t very resilient to failed requests. For this reason, you may want to consider keeping track of jobs in a queue with Redis (similar to Sidekiq in Ruby-land). When you have work you want to do, put it in the queue. Your queue worker would have to write the result somewhere (e.g., in Redis) that the application could fetch when it’s ready.</p>\n<h3 id='starting-stopping-worker-machines' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#starting-stopping-worker-machines' aria-label='Anchor'></a><span class='plain-code'>Starting/stopping worker Machines</span></h3>\n<p>The benefit of this pattern is that you can limit how many “beefy” Machines you need to have available at any given time. Our demo app doesn’t dictate how many <code>worker</code> Machines to have at any given time, but by adding timeouts you could elect to start and stop them as needed.</p>\n\n<p>Now, you may think that constantly starting and stopping Machines might incur higher response times, but note that we are NOT talking about creating/destroying Machines. Starting and stopping Machines only takes as long as it takes to start your web server (i.e. <code>npm run start</code>). The best part is that <strong class='font-semibold text-navy-950'>Fly.io does not charge for the CPU and RAM usage of stopped Machines.</strong> <a href='https://community.fly.io/t/we-are-going-to-start-collecting-charges-for-stopped-machines-rootfs-starting-april-25th/17825' title=''>We will charge for storage of their root filesystems on disk, starting April 25th, 2024</a>. Stopped Machines will still be much cheaper than running ones.</p>\n<h3 id='what-about-serverless-functions' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#what-about-serverless-functions' aria-label='Anchor'></a><span class='plain-code'>What about serverless functions?</span></h3>\n<p>This “delegate to a beefy machine” pattern is similar to serverless functions with platforms like AWS Lambda. The main difference is that serverless functions usually require you to segment your application into a bunch of small pieces, whereas the method discussed today just uses the app framework that you deploy to production. Each pattern has its own benefits and downsides.</p>\n<h2 id='conclusion' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#conclusion' aria-label='Anchor'></a><span class='plain-code'>Conclusion</span></h2>\n<p>The pattern outlined here is one more tool in your arsenal for scaling applications. By utilizing Fly.io’s private network and <code>.internal</code> domains, it’s quick and easy to pass work between different processes that run our app. If you’d like to learn about more methods for scaling tasks in your applications, check out <a href='https://fly.io/blog/rethinking-serverless-with-flame/' title=''>Rethinking Serverless with FLAME</a> by Chris McCord and <a href='https://fly.io/blog/print-on-demand/' title=''>Print on Demand</a> by Sam Ruby.</p>\n<figure class=\"post-cta\">\n  <figcaption>\n    <h1>Get more done on Fly.io</h1>\n    <p>Fly.io has fast booting machines at the ready for your dynamic workloads. It’s easy to get started. You can be off and running in minutes.</p>\n      <a class=\"btn btn-lg\" href=\"https://fly.io/docs/speedrun/\">\n        Deploy something today!  <span class='opacity:50'>→</span>\n      </a>\n  </figcaption>\n  <div class=\"image-container\">\n    <img src=\"/static/images/cta-rabbit.webp\" srcset=\"/static/images/[email protected] 2x\" alt=\"\">\n  </div>\n</figure>",
      "image": {
        "url": "https://fly.io/blog/delegate-tasks-to-fly-machines/assets/delegate-tasks-to-fly-machines-thumb.webp",
        "title": null
      },
      "media": [],
      "authors": [
        {
          "name": "Fly",
          "email": null,
          "url": null
        }
      ],
      "categories": []
    },
    {
      "id": "https://fly.io/blog/macaroons-escalated-quickly/",
      "title": "Macaroons Escalated Quickly",
      "description": null,
      "url": "https://fly.io/blog/macaroons-escalated-quickly/",
      "published": "2024-01-31T00:00:00.000Z",
      "updated": "2025-05-20T16:26:25.000Z",
      "content": "<div class=\"lead\"><p>We’re Fly.io and we transmute containers into VMs, running them on our hardware around the world. We built a new security token system, and can I tell you the good news about our lord and savior the Macaroon?</p>\n</div><h2 id='1' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#1' aria-label='Anchor'></a><span class='plain-code'>1</span></h2>\n<p>Let’s implement an API token together. It’s a design called “Macaroons”, but don’t get hung up on that yet.</p>\n\n<p>First some <button toggle=\"#includes\">throat-clearing</button>. Then:</p>\n<div id=\"includes\" toggle-content=\"\" aria-label=\"show very boring code\"><div class=\"highlight-wrapper group relative python\">\n  <button type=\"button\" class=\"bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\" data-wrap-target=\"#code-i4dzjfqm\">\n    <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\" stroke-linecap=\"round\" stroke-linejoin=\"round\"><g buffered-rendering=\"static\"><path d=\"M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959\"></path><path d=\"M11.081 6.466L9.533 8.037l1.548 1.571\"></path></g></svg>\n    <span class=\"bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950\">\n      Wrap text\n    </span>\n  </button>\n  <button type=\"button\" class=\"bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\" data-copy-target=\"sibling\">\n    <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\"><g buffered-rendering=\"static\"><path d=\"M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z\"></path><path d=\"M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617\"></path></g></svg>\n    <span class=\"bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950\">\n      Copy to clipboard\n    </span>\n  </button>\n  <div class=\"highlight relative group\">\n    <pre class=\"highlight \"><code id=\"code-i4dzjfqm\"><span class=\"kn\">import</span> <span class=\"nn\">sys</span>\n<span class=\"kn\">import</span> <span class=\"nn\">os</span>\n<span class=\"kn\">import</span> <span class=\"nn\">json</span>\n<span class=\"kn\">import</span> <span class=\"nn\">hmac</span> <span class=\"k\">as</span> <span class=\"n\">hm</span>\n<span class=\"kn\">from</span> <span class=\"nn\">base64</span> <span class=\"kn\">import</span> <span class=\"n\">b64encode</span><span class=\"p\">,</span> <span class=\"n\">b64decode</span>\n<span class=\"kn\">from</span> <span class=\"nn\">hashlib</span> <span class=\"kn\">import</span> <span class=\"n\">sha256</span>\n\n<span class=\"k\">def</span> <span class=\"nf\">hmac</span><span class=\"p\">(</span><span class=\"n\">k</span><span class=\"p\">,</span> <span class=\"n\">v</span><span class=\"p\">):</span> <span class=\"k\">return</span> <span class=\"n\">hm</span><span class=\"p\">.</span><span class=\"n\">new</span><span class=\"p\">(</span><span class=\"n\">k</span><span class=\"p\">,</span> <span class=\"n\">v</span><span class=\"p\">,</span> <span class=\"n\">sha256</span><span class=\"p\">).</span><span class=\"n\">digest</span><span class=\"p\">()</span>\n<span class=\"k\">def</span> <span class=\"nf\">enc</span><span class=\"p\">(</span><span class=\"n\">x</span><span class=\"p\">):</span> <span class=\"k\">return</span> <span class=\"n\">b64encode</span><span class=\"p\">(</span><span class=\"n\">x</span><span class=\"p\">)</span>\n<span class=\"k\">def</span> <span class=\"nf\">dec</span><span class=\"p\">(</span><span class=\"n\">x</span><span class=\"p\">):</span> <span class=\"k\">return</span> <span class=\"n\">b64decode</span><span class=\"p\">(</span><span class=\"n\">x</span><span class=\"p\">)</span>\n</code></pre>\n  </div>\n</div></div><div class=\"highlight-wrapper group relative python\">\n  <button \n    type=\"button\"\n    class=\"bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n    data-wrap-target=\"#code-b0hki89a\"\n  >\n    <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\" stroke-linecap=\"round\" stroke-linejoin=\"round\"><g buffered-rendering=\"static\"><path d=\"M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959\" /><path d=\"M11.081 6.466L9.533 8.037l1.548 1.571\" /></g></svg>\n    <span class=\"bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950\">\n      Wrap text\n    </span>\n  </button>\n  <button \n    type=\"button\"\n    class=\"bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n    data-copy-target=\"sibling\"\n  >\n    <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\"><g buffered-rendering=\"static\"><path d=\"M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z\" /><path d=\"M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617\" /></g></svg>\n    <span class=\"bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950\">\n      Copy to clipboard\n    </span>\n  </button>\n  <div class='highlight relative group'>\n    <pre class='highlight '><code id=\"code-b0hki89a\"><span class=\"k\">def</span> <span class=\"nf\">blank_token</span><span class=\"p\">(</span><span class=\"n\">uid</span><span class=\"p\">,</span> <span class=\"n\">key</span><span class=\"p\">):</span>\n  <span class=\"n\">nonce</span> <span class=\"o\">=</span> <span class=\"n\">enc</span><span class=\"p\">(</span><span class=\"s\">\":\"</span><span class=\"p\">.</span><span class=\"n\">join</span><span class=\"p\">([</span><span class=\"nb\">str</span><span class=\"p\">(</span><span class=\"n\">uid</span><span class=\"p\">),</span> <span class=\"n\">os</span><span class=\"p\">.</span><span class=\"n\">urandom</span><span class=\"p\">(</span><span class=\"mi\">16</span><span class=\"p\">)]))</span>\n  <span class=\"k\">return</span> <span class=\"n\">json</span><span class=\"p\">.</span><span class=\"n\">dumps</span><span class=\"p\">([</span><span class=\"n\">nonce</span><span class=\"p\">,</span> <span class=\"n\">enc</span><span class=\"p\">(</span><span class=\"n\">hmac</span><span class=\"p\">(</span><span class=\"n\">key</span><span class=\"p\">,</span> <span class=\"n\">nonce</span><span class=\"p\">))])</span>\n</code></pre>\n  </div>\n</div><div class=\"right-sidenote\"><p>Bearer tokens: like cookies, blobs you attach to a request (usually in an HTTP header).</p>\n</div>\n<p>We’re going to build a minimally-stateful bearer token, a blob signed with HMAC. Nothing fancy so far. <a href='https://api.rubyonrails.org/classes/ActiveSupport/MessageVerifier.html' title=''>Rails has done this</a> for a decade and a half.</p>\n\n<p>There’s a <a href='http://cryto.net/~joepie91/blog/2016/06/13/stop-using-jwt-for-sessions/' title=''>fashion in API security for stateless tokens</a>, which encode all the data you’d need to check any request accompanied by that token – without a database lookup. Stateless tokens have some nice properties, and some less-nice. Our tokens won’t be stateless: they carry a user ID, with which we’ll look up the HMAC key to verify it. But they’ll stake out a sort of middle ground.</p>\n<div class=\"highlight-wrapper group relative python\">\n  <button \n    type=\"button\"\n    class=\"bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n    data-wrap-target=\"#code-v32ipl21\"\n  >\n    <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\" stroke-linecap=\"round\" stroke-linejoin=\"round\"><g buffered-rendering=\"static\"><path d=\"M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959\" /><path d=\"M11.081 6.466L9.533 8.037l1.548 1.571\" /></g></svg>\n    <span class=\"bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950\">\n      Wrap text\n    </span>\n  </button>\n  <button \n    type=\"button\"\n    class=\"bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n    data-copy-target=\"sibling\"\n  >\n    <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\"><g buffered-rendering=\"static\"><path d=\"M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z\" /><path d=\"M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617\" /></g></svg>\n    <span class=\"bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950\">\n      Copy to clipboard\n    </span>\n  </button>\n  <div class='highlight relative group'>\n    <pre class='highlight '><code id=\"code-v32ipl21\"><span class=\"k\">def</span> <span class=\"nf\">attenuate</span><span class=\"p\">(</span><span class=\"n\">macStr</span><span class=\"p\">,</span> <span class=\"n\">cav</span><span class=\"p\">):</span>\n    <span class=\"n\">mac</span> <span class=\"o\">=</span> <span class=\"n\">json</span><span class=\"p\">.</span><span class=\"n\">loads</span><span class=\"p\">(</span><span class=\"n\">macStr</span><span class=\"p\">)</span>\n    <span class=\"n\">cavStr</span> <span class=\"o\">=</span> <span class=\"n\">json</span><span class=\"p\">.</span><span class=\"n\">dumps</span><span class=\"p\">(</span><span class=\"n\">cav</span><span class=\"p\">)</span>\n    <span class=\"n\">oldTail</span> <span class=\"o\">=</span> <span class=\"n\">dec</span><span class=\"p\">(</span><span class=\"n\">mac</span><span class=\"p\">[</span><span class=\"o\">-</span><span class=\"mi\">1</span><span class=\"p\">])</span>\n    <span class=\"n\">newTail</span> <span class=\"o\">=</span> <span class=\"n\">enc</span><span class=\"p\">(</span><span class=\"n\">hmac</span><span class=\"p\">(</span><span class=\"n\">oldTail</span><span class=\"p\">,</span> <span class=\"n\">cavStr</span><span class=\"p\">))</span>\n    <span class=\"k\">return</span> <span class=\"n\">json</span><span class=\"p\">.</span><span class=\"n\">dumps</span><span class=\"p\">(</span><span class=\"n\">mac</span><span class=\"p\">[</span><span class=\"mi\">0</span><span class=\"p\">:</span><span class=\"o\">-</span><span class=\"mi\">1</span><span class=\"p\">]</span> <span class=\"o\">+</span> <span class=\"p\">[</span><span class=\"n\">cavStr</span><span class=\"p\">,</span> <span class=\"n\">newTail</span><span class=\"p\">])</span>\n\n<span class=\"n\">m0</span> <span class=\"o\">=</span> <span class=\"n\">blank_token</span><span class=\"p\">(</span><span class=\"mi\">10</span><span class=\"p\">,</span> <span class=\"n\">keys</span><span class=\"p\">[</span><span class=\"mi\">10</span><span class=\"p\">])</span>\n<span class=\"n\">m1</span> <span class=\"o\">=</span> <span class=\"n\">attenuate</span><span class=\"p\">(</span><span class=\"n\">m0</span><span class=\"p\">,</span> <span class=\"p\">{</span><span class=\"s\">'path'</span><span class=\"p\">:</span> <span class=\"s\">'/images'</span><span class=\"p\">})</span>\n<span class=\"n\">m2</span> <span class=\"o\">=</span> <span class=\"n\">attenuate</span><span class=\"p\">(</span><span class=\"n\">m1</span><span class=\"p\">,</span> <span class=\"p\">{</span><span class=\"s\">'op'</span><span class=\"p\">:</span> <span class=\"s\">'read'</span><span class=\"p\">})</span>\n</code></pre>\n  </div>\n</div>\n<p>Let’s add some stuff.</p>\n\n<p>The meat of our tokens will be a series of claims we call “caveats”. We call them that because each claim restricts further what the token authorizes. After <code>{'path': '/images'}</code>, this token only allows operations that happen underneath the <code>/images</code> directory. Then, after <code>{'op': 'read'}</code>, it allows only reads, not writes.</p>\n\n<p>(I guess we’re building a file sharing system. Whatever.)</p>\n\n<p>Some important things about things about this design. First: by implication from the fact that caveats further restrict tokens, a token with no caveats restricts nothing. It’s a god-mode token. Don’t honor it.</p>\n<div class=\"right-sidenote\"><p>In other words: the ordering of caveats doesn’t matter.</p>\n</div>\n<p>Second: the rule of checking caveats is very simple: every single caveat must pass, evaluating <code>True</code> against the request that carries it, in isolation and without reference to any other caveat. If any caveat evaluates <code>False</code>, the request fails. In that way, we ensure that adding caveats to a token can only ever weaken it.</p>\n\n<p>With that in mind, take a closer look at this code:</p>\n<div class=\"highlight-wrapper group relative python\">\n  <button \n    type=\"button\"\n    class=\"bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n    data-wrap-target=\"#code-2and4pv7\"\n  >\n    <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\" stroke-linecap=\"round\" stroke-linejoin=\"round\"><g buffered-rendering=\"static\"><path d=\"M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959\" /><path d=\"M11.081 6.466L9.533 8.037l1.548 1.571\" /></g></svg>\n    <span class=\"bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950\">\n      Wrap text\n    </span>\n  </button>\n  <button \n    type=\"button\"\n    class=\"bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n    data-copy-target=\"sibling\"\n  >\n    <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\"><g buffered-rendering=\"static\"><path d=\"M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z\" /><path d=\"M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617\" /></g></svg>\n    <span class=\"bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950\">\n      Copy to clipboard\n    </span>\n  </button>\n  <div class='highlight relative group'>\n    <pre class='highlight '><code id=\"code-2and4pv7\"><span class=\"n\">oldTail</span> <span class=\"o\">=</span> <span class=\"n\">dec</span><span class=\"p\">(</span><span class=\"n\">mac</span><span class=\"p\">[</span><span class=\"o\">-</span><span class=\"mi\">1</span><span class=\"p\">])</span>\n<span class=\"n\">newTail</span> <span class=\"o\">=</span> <span class=\"n\">enc</span><span class=\"p\">(</span><span class=\"n\">hmac</span><span class=\"p\">(</span><span class=\"n\">oldTail</span><span class=\"p\">,</span> <span class=\"n\">cavStr</span><span class=\"p\">))</span>\n</code></pre>\n  </div>\n</div>\n<p>Every caveat is HMAC-signed independently, which is weird. Weirder still, the key for that HMAC is the output of the last HMAC. The caveats chain together, and the HMAC of the last caveat becomes the “tail” of the token.</p>\n\n<p>Creating a new blank token for a particular user requires a key that the server (and probably only the server) knows. But adding a caveat doesn’t! Anybody can add a caveat. In our design, you, the user, can edit your own API token.</p>\n<div class=\"highlight-wrapper group relative python\">\n  <button \n    type=\"button\"\n    class=\"bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n    data-wrap-target=\"#code-zkheo9e4\"\n  >\n    <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\" stroke-linecap=\"round\" stroke-linejoin=\"round\"><g buffered-rendering=\"static\"><path d=\"M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959\" /><path d=\"M11.081 6.466L9.533 8.037l1.548 1.571\" /></g></svg>\n    <span class=\"bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950\">\n      Wrap text\n    </span>\n  </button>\n  <button \n    type=\"button\"\n    class=\"bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n    data-copy-target=\"sibling\"\n  >\n    <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\"><g buffered-rendering=\"static\"><path d=\"M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z\" /><path d=\"M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617\" /></g></svg>\n    <span class=\"bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950\">\n      Copy to clipboard\n    </span>\n  </button>\n  <div class='highlight relative group'>\n    <pre class='highlight '><code id=\"code-zkheo9e4\"><span class=\"k\">def</span> <span class=\"nf\">verify</span><span class=\"p\">(</span><span class=\"n\">macStr</span><span class=\"p\">,</span> <span class=\"n\">keys</span><span class=\"p\">):</span>\n    <span class=\"n\">mac</span> <span class=\"o\">=</span> <span class=\"n\">json</span><span class=\"p\">.</span><span class=\"n\">loads</span><span class=\"p\">(</span><span class=\"n\">macStr</span><span class=\"p\">)</span>\n    <span class=\"n\">nonce</span> <span class=\"o\">=</span> <span class=\"n\">dec</span><span class=\"p\">(</span><span class=\"n\">mac</span><span class=\"p\">[</span><span class=\"mi\">0</span><span class=\"p\">]).</span><span class=\"n\">split</span><span class=\"p\">(</span><span class=\"s\">\":\"</span><span class=\"p\">)</span>\n    <span class=\"n\">key</span> <span class=\"o\">=</span> <span class=\"n\">keys</span><span class=\"p\">[</span><span class=\"nb\">int</span><span class=\"p\">(</span><span class=\"n\">nonce</span><span class=\"p\">[</span><span class=\"mi\">0</span><span class=\"p\">])]</span>\n    <span class=\"n\">tail</span> <span class=\"o\">=</span> <span class=\"s\">\"\"</span>\n    <span class=\"k\">for</span> <span class=\"n\">cav</span> <span class=\"ow\">in</span> <span class=\"n\">mac</span><span class=\"p\">[:</span><span class=\"o\">-</span><span class=\"mi\">1</span><span class=\"p\">]:</span>\n        <span class=\"n\">tail</span> <span class=\"o\">=</span> <span class=\"n\">hmac</span><span class=\"p\">(</span><span class=\"n\">key</span><span class=\"p\">,</span> <span class=\"n\">cav</span><span class=\"p\">)</span>\n        <span class=\"n\">key</span> <span class=\"o\">=</span> <span class=\"n\">tail</span>\n    <span class=\"k\">return</span> <span class=\"n\">hm</span><span class=\"p\">.</span><span class=\"n\">compare_digest</span><span class=\"p\">(</span><span class=\"n\">tail</span><span class=\"p\">,</span> <span class=\"n\">dec</span><span class=\"p\">(</span><span class=\"n\">mac</span><span class=\"p\">[</span><span class=\"o\">-</span><span class=\"mi\">1</span><span class=\"p\">]))</span>\n\n<span class=\"n\">verify</span><span class=\"p\">(</span><span class=\"n\">m2</span><span class=\"p\">,</span> <span class=\"n\">keys</span><span class=\"p\">)</span> <span class=\"c1\"># => True\n</span></code></pre>\n  </div>\n</div>\n<p>For completeness, and to make a point, there’s the verification code. Look up the original secret key from the user ID,  and then it’s chained HMAC all the way down. The point I’m making is that Macaroons are very simple.</p>\n<h2 id='2' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#2' aria-label='Anchor'></a><span class='plain-code'>2</span></h2>\n<p>Back in 2014, Google published <a href='https://storage.googleapis.com/pub-tools-public-publication-data/pdf/41892.pdf' title=''>a paper at NDSS</a> introducing “Macaroons”, a new kind of cookie. Since then, they’ve become a sort of hipster shibboleth. But they’re more talked about than implemented, which is a nice way to say that practically nobody uses them.</p>\n\n<p>Until now! I dragged Fly.io into implementing them. Suckers!</p>\n\n<p>We had a problem: our API tokens were much too powerful. We needed to scope them down and let them express roles, and I scoped up that project to replace OAuth2 tokens altogether. We now have what I think is one of the more expansive Macaroon implementations on the Internet.</p>\n\n<p>I dragged us into using Macaroons because I wanted us to use a hipster token format. Google designed Macaroons for a bigger reason: they hoped to replace browser cookies with something much more powerful.</p>\n\n<p>The problem with simple bearer tokens, like browser cookies or JWTs, is that they’re prone to being stolen and replayed by attackers.</p>\n<div class=\"right-sidenote\"><p>game-over: pentest jargon for “very bad”</p>\n</div>\n<p>Worse, a stolen token is usually a game-over condition. In most schemes, a bearer token is an all-access pass for the associated user. For some applications this isn’t that big a deal, but then, <a href='https://neilmadden.blog/2020/09/09/macaroon-access-tokens-for-oauth-part-2-transactional-auth/' title=''>think about banking</a>. A banking app token that authorizes arbitrary transactions is a recipe for having a small heart attack on every HTTP request.</p>\n<div class=\"right-sidenote\"><p>(Perfectly minimized API tokens: a software security holy grail)</p>\n</div>\n<p>Macaroons are user-editable tokens that enable JIT-generated least-privilege tokens. With minimal ceremony and no additional API requests, a banking app Macaroon lets you authorize a request with a caveat like, I don’t know, <code>{'maxAmount': '$5'}</code>. I mean, something way better than that, probably lots of caveats, not just one, but you get the idea: a token so minimized you feel safe sending it with your request. Ideally, a token that only authorizes that single, intended request.</p>\n<h2 id='3' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#3' aria-label='Anchor'></a><span class='plain-code'>3</span></h2>\n<p>That’s not why we like Macaroons. We already assume our tokens aren’t being stolen.</p>\n\n<p>In most systems, the developers come up with a permissions system, and you’re stuck with it. We run a public cloud platform, and people want a lot of different things from our permissions. The dream is, we (the low-level platform developers on the team) design a single permission system, one time, and go about our jobs never thinking about this problem again.</p>\n\n<p>Instead of thinking of all of our “roles” in advance, we just model our platform with caveats:</p>\n\n<ol>\n<li>Users belong to <code>Organizations</code>.\n</li><li><code>Organizations</code> own <code>Apps</code>.\n</li><li><code>Apps</code> contain <code>Machines</code> and <code>Volumes</code>.\n</li><li>To any of these things, you can <code>Read</code>, <code>Write</code>, <code>Create</code>, <code>Delete</code>, and/or <code>Control</code> <aside class=\"right-sidenote\">control being change of state, like “start” and “stop”</aside>.\n</li><li>Some administrivia, like expiration (<code>ValidityWindow</code>), locking tokens to specific Fly Machines (<code>FromMachineSource</code>), and escape hatches like <code>Mutation</code> (for our GraphQL API).\n</li></ol>\n<div class=\"right-sidenote\"><p>(this is a vibes-based notation, don’t think too hard about it)</p>\n</div>\n<p>Simplistic. But it expresses admin tokens:</p>\n<div class=\"highlight-wrapper group relative \">\n  <button \n    type=\"button\"\n    class=\"bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n    data-wrap-target=\"#code-qulbfdw0\"\n  >\n    <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\" stroke-linecap=\"round\" stroke-linejoin=\"round\"><g buffered-rendering=\"static\"><path d=\"M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959\" /><path d=\"M11.081 6.466L9.533 8.037l1.548 1.571\" /></g></svg>\n    <span class=\"bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950\">\n      Wrap text\n    </span>\n  </button>\n  <button \n    type=\"button\"\n    class=\"bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n    data-copy-target=\"sibling\"\n  >\n    <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\"><g buffered-rendering=\"static\"><path d=\"M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z\" /><path d=\"M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617\" /></g></svg>\n    <span class=\"bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950\">\n      Copy to clipboard\n    </span>\n  </button>\n  <div class='highlight relative group'>\n    <pre class='highlight '><code id=\"code-qulbfdw0\">Organization 4721, mask=*\n</code></pre>\n  </div>\n</div>\n<p>And it expresses normal user tokens:</p>\n<div class=\"highlight-wrapper group relative \">\n  <button \n    type=\"button\"\n    class=\"bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n    data-wrap-target=\"#code-lag6amtu\"\n  >\n    <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\" stroke-linecap=\"round\" stroke-linejoin=\"round\"><g buffered-rendering=\"static\"><path d=\"M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959\" /><path d=\"M11.081 6.466L9.533 8.037l1.548 1.571\" /></g></svg>\n    <span class=\"bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950\">\n      Wrap text\n    </span>\n  </button>\n  <button \n    type=\"button\"\n    class=\"bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n    data-copy-target=\"sibling\"\n  >\n    <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\"><g buffered-rendering=\"static\"><path d=\"M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z\" /><path d=\"M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617\" /></g></svg>\n    <span class=\"bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950\">\n      Copy to clipboard\n    </span>\n  </button>\n  <div class='highlight relative group'>\n    <pre class='highlight '><code id=\"code-lag6amtu\">Organization 4721, mask=read,write,control\n(App 123, mask=control), (App 345, mask=read, write, control)\n</code></pre>\n  </div>\n</div>\n<p>And also an auditor-only token for that user:</p>\n<div class=\"highlight-wrapper group relative \">\n  <button \n    type=\"button\"\n    class=\"bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n    data-wrap-target=\"#code-bikhlk3o\"\n  >\n    <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\" stroke-linecap=\"round\" stroke-linejoin=\"round\"><g buffered-rendering=\"static\"><path d=\"M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959\" /><path d=\"M11.081 6.466L9.533 8.037l1.548 1.571\" /></g></svg>\n    <span class=\"bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950\">\n      Wrap text\n    </span>\n  </button>\n  <button \n    type=\"button\"\n    class=\"bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n    data-copy-target=\"sibling\"\n  >\n    <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\"><g buffered-rendering=\"static\"><path d=\"M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z\" /><path d=\"M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617\" /></g></svg>\n    <span class=\"bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950\">\n      Copy to clipboard\n    </span>\n  </button>\n  <div class='highlight relative group'>\n    <pre class='highlight '><code id=\"code-bikhlk3o\">Organization 4721, mask=read,write,control\n(App 123, mask=control), (App 345, mask=read, write, control)\nOrganization 4721, mask=read\n</code></pre>\n  </div>\n</div><div class=\"right-sidenote\"><p>(our deploy tokens are more complicated than this)</p>\n</div>\n<p>Or a deployment-only token, for a CI/CD system:</p>\n<div class=\"highlight-wrapper group relative \">\n  <button \n    type=\"button\"\n    class=\"bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n    data-wrap-target=\"#code-rxeo12rj\"\n  >\n    <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\" stroke-linecap=\"round\" stroke-linejoin=\"round\"><g buffered-rendering=\"static\"><path d=\"M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959\" /><path d=\"M11.081 6.466L9.533 8.037l1.548 1.571\" /></g></svg>\n    <span class=\"bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950\">\n      Wrap text\n    </span>\n  </button>\n  <button \n    type=\"button\"\n    class=\"bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n    data-copy-target=\"sibling\"\n  >\n    <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\"><g buffered-rendering=\"static\"><path d=\"M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z\" /><path d=\"M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617\" /></g></svg>\n    <span class=\"bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950\">\n      Copy to clipboard\n    </span>\n  </button>\n  <div class='highlight relative group'>\n    <pre class='highlight '><code id=\"code-rxeo12rj\">Organization 4721, mask=write,control\n(App 123, mask=*)\n</code></pre>\n  </div>\n</div>\n<p>Those are just the roles we came up with. Users can invent others. The important thing is that they don’t have to bother me about them.</p>\n<h2 id='4' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#4' aria-label='Anchor'></a><span class='plain-code'>4</span></h2>\n<p>Astute readers will have noticed by now that we haven’t shown any code that actually evaluates a caveat. That’s because it’s boring, and I’m too lazy to write it out. Got an <code>Organization</code> token for <code>image-hosting</code> that allows <code>Reads</code>? Ok; check and make sure the incoming request is for an asset of <code>image-hosting</code>, and that it’s a <code>Read</code>. Whatever code you came up with, it’d be fine.</p>\n\n<p>These straightforward restrictions are called “first party caveats”. The first party is us, the platform. We’ve got all the information we need to check them.</p>\n\n<p>Let’s kit out our token format some more.</p>\n<div class=\"highlight-wrapper group relative python\">\n  <button \n    type=\"button\"\n    class=\"bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n    data-wrap-target=\"#code-zcmmxwd0\"\n  >\n    <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\" stroke-linecap=\"round\" stroke-linejoin=\"round\"><g buffered-rendering=\"static\"><path d=\"M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959\" /><path d=\"M11.081 6.466L9.533 8.037l1.548 1.571\" /></g></svg>\n    <span class=\"bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950\">\n      Wrap text\n    </span>\n  </button>\n  <button \n    type=\"button\"\n    class=\"bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n    data-copy-target=\"sibling\"\n  >\n    <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\"><g buffered-rendering=\"static\"><path d=\"M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z\" /><path d=\"M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617\" /></g></svg>\n    <span class=\"bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950\">\n      Copy to clipboard\n    </span>\n  </button>\n  <div class='highlight relative group'>\n    <pre class='highlight '><code id=\"code-zcmmxwd0\"><span class=\"k\">def</span> <span class=\"nf\">third_party_caveat</span><span class=\"p\">(</span><span class=\"n\">ka</span><span class=\"p\">,</span> <span class=\"n\">tail</span><span class=\"p\">,</span> <span class=\"n\">msg</span><span class=\"p\">,</span> <span class=\"n\">url</span><span class=\"p\">):</span>\n    <span class=\"n\">crk</span> <span class=\"o\">=</span> <span class=\"n\">os</span><span class=\"p\">.</span><span class=\"n\">urandom</span><span class=\"p\">(</span><span class=\"mi\">16</span><span class=\"p\">)</span>\n    <span class=\"n\">ticket</span> <span class=\"o\">=</span> <span class=\"n\">enc</span><span class=\"p\">(</span><span class=\"n\">encrypt</span><span class=\"p\">(</span><span class=\"n\">ka</span><span class=\"p\">,</span> <span class=\"n\">json</span><span class=\"p\">.</span><span class=\"n\">dumps</span><span class=\"p\">({</span>\n        <span class=\"s\">'crk'</span><span class=\"p\">:</span> <span class=\"n\">enc</span><span class=\"p\">(</span><span class=\"n\">crk</span><span class=\"p\">),</span>\n        <span class=\"s\">'msg'</span><span class=\"p\">:</span> <span class=\"n\">msg</span>\n    <span class=\"p\">})))</span>\n    <span class=\"n\">challenge</span> <span class=\"o\">=</span> <span class=\"n\">enc</span><span class=\"p\">(</span><span class=\"n\">encrypt</span><span class=\"p\">(</span><span class=\"n\">tail</span><span class=\"p\">,</span> <span class=\"n\">crk</span><span class=\"p\">))</span>\n    <span class=\"k\">return</span> <span class=\"p\">{</span> <span class=\"s\">'url'</span><span class=\"p\">:</span> <span class=\"n\">url</span><span class=\"p\">,</span> <span class=\"s\">'ticket'</span><span class=\"p\">:</span> <span class=\"n\">ticket</span><span class=\"p\">,</span> <span class=\"s\">'challenge'</span> <span class=\"p\">:</span> <span class=\"n\">challenge</span> <span class=\"p\">}</span>\n\n<span class=\"n\">key</span> <span class=\"o\">=</span> <span class=\"nb\">bytes</span><span class=\"p\">(</span><span class=\"s\">\"YELLOW SUBMARINE\"</span><span class=\"p\">)</span>\n<span class=\"n\">url</span> <span class=\"o\">=</span> <span class=\"s\">\"https://canary.service\"</span>\n<span class=\"n\">c3</span> <span class=\"o\">=</span> <span class=\"n\">third_party_caveat</span><span class=\"p\">(</span><span class=\"n\">key</span><span class=\"p\">,</span> <span class=\"n\">tail</span><span class=\"p\">,</span> <span class=\"n\">url</span><span class=\"p\">,</span> <span class=\"n\">json</span><span class=\"p\">.</span><span class=\"n\">dumps</span><span class=\"p\">({</span><span class=\"s\">'user'</span><span class=\"p\">:</span> <span class=\"s\">'bobson.dugnutt'</span><span class=\"p\">}))</span>\n<span class=\"n\">m3</span> <span class=\"o\">=</span> <span class=\"n\">attenuate</span><span class=\"p\">(</span><span class=\"n\">m2</span><span class=\"p\">,</span> <span class=\"n\">c3</span><span class=\"p\">)</span>\n</code></pre>\n  </div>\n</div>\n<p>Up till now, we’ve gotten by with nothing but HMAC, which is one of the great charms of the design. Now we need to encrypt. There’s no authenticated encryption in the Python standard library, but that won’t stop us. <button toggle=\"#hmac-ctr\">Ready to make some candy? Hand me that brake fluid!</button></p>\n<div id=\"hmac-ctr\" toggle-content=\"\" aria-label=\"show very silly code\"><div class=\"highlight-wrapper group relative python\">\n  <button type=\"button\" class=\"bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\" data-wrap-target=\"#code-qurlq0h7\">\n    <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\" stroke-linecap=\"round\" stroke-linejoin=\"round\"><g buffered-rendering=\"static\"><path d=\"M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959\"></path><path d=\"M11.081 6.466L9.533 8.037l1.548 1.571\"></path></g></svg>\n    <span class=\"bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950\">\n      Wrap text\n    </span>\n  </button>\n  <button type=\"button\" class=\"bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\" data-copy-target=\"sibling\">\n    <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\"><g buffered-rendering=\"static\"><path d=\"M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z\"></path><path d=\"M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617\"></path></g></svg>\n    <span class=\"bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950\">\n      Copy to clipboard\n    </span>\n  </button>\n  <div class=\"highlight relative group\">\n    <pre class=\"highlight \"><code id=\"code-qurlq0h7\"><span class=\"c1\"># do i really need to say that i'm not serious about this?\n</span>\n<span class=\"k\">def</span> <span class=\"nf\">hmactr</span><span class=\"p\">(</span><span class=\"n\">k</span><span class=\"p\">,</span> <span class=\"n\">n</span><span class=\"p\">):</span>\n     <span class=\"n\">ks</span> <span class=\"o\">=</span> <span class=\"n\">hm</span><span class=\"p\">.</span><span class=\"n\">new</span><span class=\"p\">(</span><span class=\"n\">k</span><span class=\"o\">+</span><span class=\"n\">n</span><span class=\"p\">)</span>\n     <span class=\"k\">for</span> <span class=\"n\">counter</span> <span class=\"ow\">in</span> <span class=\"nb\">xrange</span><span class=\"p\">(</span><span class=\"n\">sys</span><span class=\"p\">.</span><span class=\"n\">maxint</span><span class=\"p\">):</span>\n         <span class=\"n\">ks</span><span class=\"p\">.</span><span class=\"n\">update</span><span class=\"p\">(</span><span class=\"nb\">str</span><span class=\"p\">(</span><span class=\"n\">counter</span><span class=\"p\">))</span>\n         <span class=\"n\">kbs</span> <span class=\"o\">=</span> <span class=\"n\">ks</span><span class=\"p\">.</span><span class=\"n\">digest</span><span class=\"p\">()</span>\n         <span class=\"k\">for</span> <span class=\"n\">i</span> <span class=\"ow\">in</span> <span class=\"nb\">xrange</span><span class=\"p\">(</span><span class=\"mi\">16</span><span class=\"p\">):</span> <span class=\"k\">yield</span> <span class=\"n\">kbs</span><span class=\"p\">[</span><span class=\"n\">i</span><span class=\"p\">]</span>\n\n<span class=\"k\">def</span> <span class=\"nf\">encrypt</span><span class=\"p\">(</span><span class=\"n\">k</span><span class=\"p\">,</span> <span class=\"n\">buf</span><span class=\"p\">):</span>\n    <span class=\"n\">ak</span> <span class=\"o\">=</span> <span class=\"n\">hm</span><span class=\"p\">.</span><span class=\"n\">new</span><span class=\"p\">(</span><span class=\"n\">k</span><span class=\"p\">,</span> <span class=\"s\">'auth'</span><span class=\"p\">).</span><span class=\"n\">digest</span><span class=\"p\">()</span>\n    <span class=\"n\">nonce</span> <span class=\"o\">=</span> <span class=\"n\">os</span><span class=\"p\">.</span><span class=\"n\">urandom</span><span class=\"p\">(</span><span class=\"mi\">16</span><span class=\"p\">)</span>\n    <span class=\"n\">cipher</span> <span class=\"o\">=</span> <span class=\"n\">hmactr</span><span class=\"p\">(</span><span class=\"n\">hm</span><span class=\"p\">.</span><span class=\"n\">new</span><span class=\"p\">(</span><span class=\"n\">k</span><span class=\"p\">,</span> <span class=\"s\">'enc'</span><span class=\"p\">).</span><span class=\"n\">digest</span><span class=\"p\">(),</span> <span class=\"n\">nonce</span><span class=\"p\">)</span>\n    <span class=\"n\">ctxt</span> <span class=\"o\">=</span> <span class=\"nb\">bytearray</span><span class=\"p\">(</span><span class=\"n\">buf</span><span class=\"p\">)</span>\n    <span class=\"k\">for</span> <span class=\"n\">i</span> <span class=\"ow\">in</span> <span class=\"nb\">xrange</span><span class=\"p\">(</span><span class=\"nb\">len</span><span class=\"p\">(</span><span class=\"n\">buf</span><span class=\"p\">)):</span>\n        <span class=\"n\">ctxt</span><span class=\"p\">[</span><span class=\"n\">i</span><span class=\"p\">]</span> <span class=\"o\">^=</span> <span class=\"nb\">ord</span><span class=\"p\">(</span><span class=\"n\">cipher</span><span class=\"p\">.</span><span class=\"nb\">next</span><span class=\"p\">())</span>\n    <span class=\"n\">res</span> <span class=\"o\">=</span> <span class=\"n\">nonce</span> <span class=\"o\">+</span> <span class=\"nb\">str</span><span class=\"p\">(</span><span class=\"n\">ctxt</span><span class=\"p\">)</span>\n    <span class=\"k\">return</span> <span class=\"n\">res</span> <span class=\"o\">+</span> <span class=\"n\">hm</span><span class=\"p\">.</span><span class=\"n\">new</span><span class=\"p\">(</span><span class=\"n\">ak</span><span class=\"p\">,</span> <span class=\"n\">res</span><span class=\"p\">).</span><span class=\"n\">digest</span><span class=\"p\">()</span>\n\n<span class=\"k\">def</span> <span class=\"nf\">decrypt</span><span class=\"p\">(</span><span class=\"n\">k</span><span class=\"p\">,</span> <span class=\"n\">buf</span><span class=\"p\">):</span>\n    <span class=\"n\">ak</span> <span class=\"o\">=</span> <span class=\"n\">hm</span><span class=\"p\">.</span><span class=\"n\">new</span><span class=\"p\">(</span><span class=\"n\">k</span><span class=\"p\">,</span> <span class=\"s\">'auth'</span><span class=\"p\">).</span><span class=\"n\">digest</span><span class=\"p\">()</span>\n    <span class=\"k\">if</span> <span class=\"ow\">not</span> <span class=\"n\">hm</span><span class=\"p\">.</span><span class=\"n\">compare_digest</span><span class=\"p\">(</span><span class=\"n\">buf</span><span class=\"p\">[</span><span class=\"o\">-</span><span class=\"mi\">16</span><span class=\"p\">:],</span> <span class=\"n\">hm</span><span class=\"p\">.</span><span class=\"n\">new</span><span class=\"p\">(</span><span class=\"n\">ak</span><span class=\"p\">,</span> <span class=\"n\">buf</span><span class=\"p\">[:</span><span class=\"o\">-</span><span class=\"mi\">16</span><span class=\"p\">]).</span><span class=\"n\">digest</span><span class=\"p\">()):</span>\n        <span class=\"k\">return</span> <span class=\"bp\">False</span>\n    <span class=\"n\">nonce</span> <span class=\"o\">=</span> <span class=\"n\">buf</span><span class=\"p\">[:</span><span class=\"mi\">16</span><span class=\"p\">]</span>\n    <span class=\"n\">cipher</span> <span class=\"o\">=</span> <span class=\"n\">hmactr</span><span class=\"p\">(</span><span class=\"n\">hm</span><span class=\"p\">.</span><span class=\"n\">new</span><span class=\"p\">(</span><span class=\"n\">k</span><span class=\"p\">,</span> <span class=\"s\">'enc'</span><span class=\"p\">).</span><span class=\"n\">digest</span><span class=\"p\">(),</span> <span class=\"n\">nonce</span><span class=\"p\">)</span>\n    <span class=\"n\">ptxt</span> <span class=\"o\">=</span> <span class=\"nb\">bytearray</span><span class=\"p\">(</span><span class=\"n\">buf</span><span class=\"p\">[</span><span class=\"mi\">16</span><span class=\"p\">:</span><span class=\"o\">-</span><span class=\"mi\">16</span><span class=\"p\">])</span>\n    <span class=\"k\">for</span> <span class=\"n\">i</span> <span class=\"ow\">in</span> <span class=\"nb\">xrange</span><span class=\"p\">(</span><span class=\"nb\">len</span><span class=\"p\">(</span><span class=\"n\">buf</span><span class=\"p\">[</span><span class=\"mi\">16</span><span class=\"p\">:</span><span class=\"o\">-</span><span class=\"mi\">16</span><span class=\"p\">])):</span>\n        <span class=\"n\">ptxt</span><span class=\"p\">[</span><span class=\"n\">i</span><span class=\"p\">]</span> <span class=\"o\">^=</span> <span class=\"nb\">ord</span><span class=\"p\">(</span><span class=\"n\">cipher</span><span class=\"p\">.</span><span class=\"nb\">next</span><span class=\"p\">())</span>\n    <span class=\"k\">return</span> <span class=\"nb\">str</span><span class=\"p\">(</span><span class=\"n\">ptxt</span><span class=\"p\">)</span>\n</code></pre>\n  </div>\n</div></div>\n<p>With “third-party” caveats comes a cast of characters. We’re still the first party. You’ll play the second party. The third party is any other system in the world that you trust: an SSO system, an audit log, a revocation checker, whatever.</p>\n\n<p>Here’s the trick of the third-party caveat: our platform doesn’t know what your caveat means, and it doesn’t have to. Instead, when you see a third-party caveat in your token, you tear a ticket off it and exchange it for a “discharge Macaroon” with that third party. You submit both Macaroons together to us.</p>\n\n<p>Let’s attenuate our token with a third-party caveat hooking it up to a “canary” service that generates a notice approximately any time the token is used.</p>\n\n<p><img src=\"/blog/macaroons-escalated-quickly/assets/third-party.png?1/2&wrap-left\" /></p>\n\n<p>To build that canary caveat, you first make a <code>ticket</code> that users of the token will hand to your canary, and then a <code>challenge</code> that Fly.io will use to verify discharges your checker spits out. The ticket and the challenge are both encrypted. The ticket is encrypted under <code>KA</code>, so your service can read it. The challenge is encrypted under the previous Macaroon tail, so only Fly.io can read it. Both hide yet another key, the random HMAC key <code>CRK</code> (“caveat root key”).</p>\n\n<p>In addition to <code>CRK</code>, the ticket contains a message, which says whatever you want it to; Fly.io doesn’t care. Typically, the message describes some kind of additional checking you want your service to perform before spitting out a discharge token.</p>\n<div class=\"highlight-wrapper group relative python\">\n  <button \n    type=\"button\"\n    class=\"bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n    data-wrap-target=\"#code-lrt3p4uz\"\n  >\n    <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\" stroke-linecap=\"round\" stroke-linejoin=\"round\"><g buffered-rendering=\"static\"><path d=\"M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959\" /><path d=\"M11.081 6.466L9.533 8.037l1.548 1.571\" /></g></svg>\n    <span class=\"bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950\">\n      Wrap text\n    </span>\n  </button>\n  <button \n    type=\"button\"\n    class=\"bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n    data-copy-target=\"sibling\"\n  >\n    <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\"><g buffered-rendering=\"static\"><path d=\"M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z\" /><path d=\"M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617\" /></g></svg>\n    <span class=\"bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950\">\n      Copy to clipboard\n    </span>\n  </button>\n  <div class='highlight relative group'>\n    <pre class='highlight '><code id=\"code-lrt3p4uz\"><span class=\"k\">def</span> <span class=\"nf\">discharge</span><span class=\"p\">(</span><span class=\"n\">ka</span><span class=\"p\">,</span> <span class=\"n\">ticket</span><span class=\"p\">):</span>\n    <span class=\"n\">ptxt</span> <span class=\"o\">=</span> <span class=\"n\">decrypt</span><span class=\"p\">(</span><span class=\"n\">ka</span><span class=\"p\">,</span> <span class=\"n\">dec</span><span class=\"p\">(</span><span class=\"n\">ticket</span><span class=\"p\">))</span>\n    <span class=\"k\">if</span> <span class=\"n\">ptxt</span> <span class=\"o\">==</span> <span class=\"bp\">False</span><span class=\"p\">:</span> <span class=\"k\">return</span> <span class=\"bp\">False</span>\n    <span class=\"n\">tbody</span> <span class=\"o\">=</span> <span class=\"n\">json</span><span class=\"p\">.</span><span class=\"n\">loads</span><span class=\"p\">(</span><span class=\"n\">ptxt</span><span class=\"p\">)</span>\n    <span class=\"c1\"># not shown: do something with tbody['msg']\n</span>    <span class=\"k\">return</span> <span class=\"n\">json</span><span class=\"p\">.</span><span class=\"n\">dumps</span><span class=\"p\">([</span><span class=\"n\">ticket</span><span class=\"p\">,</span> <span class=\"n\">enc</span><span class=\"p\">(</span><span class=\"n\">hmac</span><span class=\"p\">(</span><span class=\"n\">dec</span><span class=\"p\">(</span><span class=\"n\">tbody</span><span class=\"p\">[</span><span class=\"s\">'crk'</span><span class=\"p\">]),</span> <span class=\"n\">ticket</span><span class=\"p\">))])</span>\n</code></pre>\n  </div>\n</div>\n<p>To authorize a request with a token that includes a third-party caveat for the canary service, you need to get your hands on a corresponding discharge Macaroon. Normally, you do that by <code>POST</code>ing the ticket from the caveat to the service.</p>\n\n<p>Discharging is simple. The service, which holds <code>KA</code>, uses it to decrypt the ticket. It checks the message and makes some decisions. Finally, it mints a new macaroon, using <code>CRK</code>, recovered from the ticket, as the root key. The ticket itself is the nonce.</p>\n\n<p>If it wants, the third-party service can slap on a bunch of first-party caveats of its own. When we verify the Macaroon, we’ll copy those caveats out and enforce them. Attenuation of a third-party discharge macaroon works like a normal macaroon.</p>\n<div class=\"highlight-wrapper group relative python\">\n  <button \n    type=\"button\"\n    class=\"bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n    data-wrap-target=\"#code-6nzqrw14\"\n  >\n    <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\" stroke-linecap=\"round\" stroke-linejoin=\"round\"><g buffered-rendering=\"static\"><path d=\"M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959\" /><path d=\"M11.081 6.466L9.533 8.037l1.548 1.571\" /></g></svg>\n    <span class=\"bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950\">\n      Wrap text\n    </span>\n  </button>\n  <button \n    type=\"button\"\n    class=\"bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n    data-copy-target=\"sibling\"\n  >\n    <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\"><g buffered-rendering=\"static\"><path d=\"M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z\" /><path d=\"M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617\" /></g></svg>\n    <span class=\"bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950\">\n      Copy to clipboard\n    </span>\n  </button>\n  <div class='highlight relative group'>\n    <pre class='highlight '><code id=\"code-6nzqrw14\"><span class=\"k\">def</span> <span class=\"nf\">verify_third_party</span><span class=\"p\">(</span><span class=\"n\">tag</span><span class=\"p\">,</span> <span class=\"n\">cav</span><span class=\"p\">,</span> <span class=\"n\">discharges</span><span class=\"o\">=</span><span class=\"p\">[]):</span>\n    <span class=\"n\">crk</span> <span class=\"o\">=</span> <span class=\"n\">decrypt</span><span class=\"p\">(</span><span class=\"n\">tag</span><span class=\"p\">,</span> <span class=\"n\">dec</span><span class=\"p\">(</span><span class=\"n\">cav</span><span class=\"p\">[</span><span class=\"s\">'challenge'</span><span class=\"p\">]))</span>\n    <span class=\"k\">if</span> <span class=\"n\">crk</span> <span class=\"o\">==</span> <span class=\"bp\">False</span><span class=\"p\">:</span> <span class=\"k\">return</span> <span class=\"bp\">False</span>\n    <span class=\"n\">discharge</span> <span class=\"o\">=</span> <span class=\"bp\">None</span>\n    <span class=\"k\">for</span> <span class=\"n\">dcs</span> <span class=\"ow\">in</span> <span class=\"n\">discharges</span><span class=\"p\">:</span>\n        <span class=\"k\">if</span> <span class=\"n\">json</span><span class=\"p\">.</span><span class=\"n\">loads</span><span class=\"p\">(</span><span class=\"n\">dcs</span><span class=\"p\">)[</span><span class=\"mi\">0</span><span class=\"p\">]</span> <span class=\"o\">==</span> <span class=\"n\">cav</span><span class=\"p\">[</span><span class=\"s\">'ticket'</span><span class=\"p\">]:</span>\n            <span class=\"n\">discharge</span> <span class=\"o\">=</span> <span class=\"n\">dcs</span>\n            <span class=\"k\">break</span>\n    <span class=\"k\">if</span> <span class=\"ow\">not</span> <span class=\"n\">discharge</span><span class=\"p\">:</span> <span class=\"k\">return</span> <span class=\"bp\">False</span>\n    <span class=\"n\">mac</span> <span class=\"o\">=</span> <span class=\"n\">json</span><span class=\"p\">.</span><span class=\"n\">loads</span><span class=\"p\">(</span><span class=\"n\">discharge</span><span class=\"p\">)</span>\n    <span class=\"n\">key</span> <span class=\"o\">=</span> <span class=\"n\">crk</span>\n    <span class=\"c1\"># boring old stuff ---------------------\n</span>    <span class=\"n\">tag</span> <span class=\"o\">=</span> <span class=\"s\">\"\"</span>\n    <span class=\"k\">for</span> <span class=\"n\">cav</span> <span class=\"ow\">in</span> <span class=\"n\">mac</span><span class=\"p\">[:</span><span class=\"o\">-</span><span class=\"mi\">1</span><span class=\"p\">]:</span>\n        <span class=\"n\">tag</span> <span class=\"o\">=</span> <span class=\"n\">hmac</span><span class=\"p\">(</span><span class=\"n\">key</span><span class=\"p\">,</span> <span class=\"n\">cav</span><span class=\"p\">)</span>\n        <span class=\"n\">key</span> <span class=\"o\">=</span> <span class=\"n\">tag</span>\n    <span class=\"k\">return</span> <span class=\"n\">hm</span><span class=\"p\">.</span><span class=\"n\">compare_digest</span><span class=\"p\">(</span><span class=\"n\">tag</span><span class=\"p\">,</span> <span class=\"n\">dec</span><span class=\"p\">(</span><span class=\"n\">mac</span><span class=\"p\">[</span><span class=\"o\">-</span><span class=\"mi\">1</span><span class=\"p\">]))</span>\n</code></pre>\n  </div>\n</div>\n<p>To verify tokens that have third-party caveats, start with the root Macaroon, walking the caveats like usual. At each third-party caveat, match the <code>ticket</code> from the caveat with the <code>nonce</code> on the discharge Macaroon. The key for root Macaroon decrypts the <code>challenge</code> in the caveat, recovering <code>CRK</code>, which cryptographically verifies the discharge.</p>\n\n<p>(The Macaroons paper uses different terms: “caveat identifier” or <code>cId</code> for “ticket”, and “verification-key identifier” or <code>vId</code> for “challenge”. These names are self-evidently bad and our contribution to the state of the art is to replace them.)</p>\n\n<p>There’s two big applications for third-party caveats in Popular Macaroon Thought. First, they facilitate microservice-izing your auth logic, because you can stitch arbitrary policies together out of third-party caveats. And, they seem like <a href='https://github.com/go-macaroon-bakery/macaroon-bakery' title=''>fertile ground for an ecosystem of interoperable Macaroon services</a>: Okta and Google could stand up SSO dischargers, for instance, or someone can do a really good revocation service.</p>\n\n<p>Neither of these light us up. We’re allergic to microservices. As for public protocols, well, it’s good to want things. So we almost didn’t even implement third-party caveats.</p>\n<h2 id='5' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#5' aria-label='Anchor'></a><span class='plain-code'>5</span></h2>\n<p>I’m glad we did though, because they’ve been pretty great.</p>\n\n<p>The first problem third-party caveats solved for us was hazmat tokens. To the extent possible, we want Macaroon tokens to be safe to transmit between users. Our Macaroons express permissions, but not authentication, so it’s almost safe to email them.</p>\n\n<p>The way it works is, our Macaroons all have a third-party caveat pointing to a “login service”, either identifying the proper bearer as a particular Fly.io user or as a member of some <code>Organization</code>. To allow a request with your token, you first need to collect the discharge from the login service, which requires authentication.</p>\n\n<p>The login discharge is very sensitive, but there isn’t much reason to pass it around. The original permissions token is where all the interesting stuff is, and it’s not scary. So that’s nice.</p>\n\n<p><img src=\"/blog/macaroons-escalated-quickly/assets/fly-sso.png?1/3&wrap-left\" /></p>\n\n<p>Ben then came up with <a href=\"https://community.fly.io/t/organization-required-sso/17560\">third-party caveats that require Google or Github SSO logins.</a> If your token has one of those caveats, when you run <code>flyctl deploy</code>, a browser will pop up to log you into your SSO IdP (if you haven’t done so recently already).</p>\n\n<p>We’ve put a <a href='https://fly.io/blog/tokenized-tokens/#tokenizer-the-fabled-4th-way' title=''>bunch of work into getting the guts of our SSO system working</a>, but that work has mostly been invisible to customers. But Macaroon-ized SSO has a subtle benefit: you can configure <a href='http://Fly.io' title=''>Fly.io</a> to automatically add SSO requirements to specific <code>Organizations</code> (so, for instance, a dev environment might not need SSO at all, and prod might need two).</p>\n\n<p>SSO requirements in most applications are a brittle pain in the ass. Ours are flexible and straightforward, and that happened almost by accident. Macaroons, baby!</p>\n\n<p>Here’s a fun thing you can do with a Macaroon system: stand up a Slack bot, and give it an HTTP <code>POST</code> handler that accepts third-party tickets. Then:</p>\n\n<p><img src=\"/blog/macaroons-escalated-quickly/assets/bot-ok.png?1/2&center&border\" /></p>\n\n<p>So, the bot is cute, but any platform could do that. What’s cool is the way our platform <em>doesn’t</em> work with Slack; in fact, nothing on our platform knows anything about Slack, and Slack doesn’t know anything about us. We didn’t reach out to a Slack endpoint. Everything was purely cryptographic.</p>\n\n<p>That bot could, if I sunk some time into it, enforce arbitrary rules: it could selectively add caveats for the requests it authorizes, based on lookups of the users requesting them, at specific times of day, with specific logging. Theoretically, it could add third-party caveats of its own.</p>\n\n<p>The win for us for third-party caveats is that they create a plugin system for our security tokens. That’s an unusual place to see a plugin interface! But Macaroons are easy to understand and keep in your head, so we’re pretty confident about the security issues.</p>\n<h2 id='6' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#6' aria-label='Anchor'></a><span class='plain-code'>6</span></h2>\n<p>Obviously, we didn’t write our Macaroon code in Python, or with HMAC-SHA256-CTR.</p>\n\n<p>We landed on a primary implementation Golang (Ben subsequently wrote an Elixir implementation). Our hash is SHA256, our cipher is Chapoly. We encode in MsgPack.</p>\n<div class=\"callout\"><p>We didn’t use the pre-existing public implementation because <a href=\"https://securitycryptographywhatever.com/2021/08/12/what-do-we-do-about-jwt-with-jonathan-rudenberg/\" title=\"\">we were warned not to</a>. The Macaroon idea is simple, and it exists mostly as an academic paper, not a standard. The community that formed around building open source “standard” Macaroons decided to use untyped opaque blobs to represent caveats. We need things to be as rigidly unambiguous as they can be.</p>\n</div>\n<p><img src=\"/blog/macaroons-escalated-quickly/assets/verifier-service.png?2/3&center\" /></p>\n\n<p>The big strength of Macaroons as a cryptographic design — that it’s based almost entirely on HMAC — makes it a challenge to deploy. If you can verify a Macaroon, you can generate one.  We have thousands of servers. They can’t all be allowed to generate tokens.</p>\n\n<p>What we did instead:</p>\n\n<ul>\n<li>We split token checking into “verification” of token HMAC tags and “clearing” of token caveats.\n</li><li>Verification occurs only on a physically isolated token-verification service; to verify a token’s tag, you  HTTP <code>POST</code> the token to the verifier.\n</li><li>Clearing of token caveats can happen anywhere. Token caveat clearing is domain-specific and subject to change; token verification is simple cryptography and  changes rarely.\n</li><li>A token verification is cacheable. The client library for the token verifier does that, which speeds things up by exploiting the locality of token submissions.\n</li><li>The verification service is backed by a <a href='https://fly.io/docs/litefs/' title=''>LiteFS-distributed SQLite database</a>, so verification is fast globally — a major step forward from our legacy OAuth2 tokens, which are only fast in Ashburn, VA.\n</li></ul>\n\n<p><img src=\"/blog/macaroons-escalated-quickly/assets/service-token.png?2/3&center\" /></p>\n\n<p>Now buckle up, because I’m about to try to get you to care about service tokens.</p>\n\n<p>We operate “worker servers” all over the world to host apps for our customers. To do that, those workers need access to customer secrets, like the key to decrypt a customer volume. To retrieve those secrets, the workers have to talk to secrets management servers.</p>\n\n<p>We manage a lot of workers. We trust them. But we don’t trust them that much, if you get my drift. You don’t want to just leave it up to the servers to decide which secrets they can access. The blast radius of a problem with a single worker should be no greater than the apps that are supposed to run there.</p>\n\n<p>The gold standard for approving access to customer information is, naturally, explicit customer authorization. We almost have that with Macaroons! The first time an app runs on a worker, <a href='https://fly.io/blog/carving-the-scheduler-out-of-our-orchestrator/' title=''>the orchestrator code</a> has a token, and it can pass that along to the secret stores.</p>\n\n<p>The problem is, you need that token more than once; not just when the user does a deploy, but potentially any time you restart the app or migrate it to a new worker. And you can’t just store and replay user Macaroons. They have expirations.</p>\n<div class=\"right-sidenote\"><p>This is like dropping privilege with things like pledge(2), but in a distributed system.</p>\n</div>\n<p>So our token verification service exposes an API that transforms a user token into a “service token”, which is just the token with the authentication caveat and expiration “stripped off”.</p>\n\n<p>What’s cool is: components that receive service tokens can attenuate them. For instance, we could lock a token to a particular worker, or even a particular Fly Machine. Then we can expose the whole <a href='https://fly.io/docs/machines/working-with-machines/' title=''>Fly Machines API</a> to customer VMs while keeping access traceable to specific customer tokens. Stealing the token from a Fly Machine doesn’t help you since it’s locked to that Fly Machine by a caveat attackers can’t strip.</p>\n<h2 id='7' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#7' aria-label='Anchor'></a><span class='plain-code'>7</span></h2>\n<p>If a customer loses their tokens to an attacker, we can’t just blow that off and let the attacker keep compromising the account!</p>\n<div class=\"right-sidenote\"><p>This cancels every token derived through attenuation by that nonce.</p>\n</div>\n<p>Every Macaroon we issue is identified by a unique nonce, and we can revoke tokens by that nonce. This is just a basic function of the token verification service we just described.</p>\n\n<p>We host token caches all over our fleet. Token revocation invalidates the caches. Anything with a cache checks frequently whether to invalidate. Revocation is rare, so just keeping a revocation list and invalidating caches wholesale seems fine.</p>\n<h2 id='8' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#8' aria-label='Anchor'></a><span class='plain-code'>8</span></h2>\n<p>I get it, it’s tough to get me to shut up about Macaroons.</p>\n\n<p>A couple years ago, I <a href='https://fly.io/blog/api-tokens-a-tedious-survey/' title=''>wrote a long survey of API token designs</a>, from JWTs (never!) to Biscuits. I had a <a href='https://fly.io/blog/api-tokens-a-tedious-survey/#macaroons' title=''>bunch to say about Macaroons</a>, not all of it positive, and said we’d be plowing forward with them at Fly.io.</p>\n\n<p>My plan had been to follow up soon after with a deep dive on Macaroons as we planned them for Fly.io. I’m glad I didn’t do that, not just because it would’ve been embarrassing to announce a feature that took us over 2 years to launch, but also because the process of working on this with Ben Toews changed a lot of my thinking about them.</p>\n\n<p>I think if you asked Ben, he’d say he had mixed feelings about how much complexity we wrangled to get this launched. On the other hand: we got a lot of things out of them without trying very hard:</p>\n\n<ul>\n<li>Security tokens you can (almost) email to your users and partners without putting your account at risk.\n</li><li>A flexible permission system, encoded directly into the tokens, that users can drive without talking to our servers.\n</li><li>A plugin system that users can (when we clean up the tooling) use themselves, to add things like Passkeys or two-person-approval rules or audit logging, without us getting in the middle.\n</li><li>An SSO system that can stack different IdPs, mandate SSO login, and do that on a per-<code>Organization</code> basis.\n</li><li><a href='https://www.latacora.com/blog/2018/06/12/a-childs-garden/' title=''>Inter-service authorization</a> that is traceable back to customer actions, so our servers can’t just make up which apps they’re allowed to look at.\n</li><li>An elegant way of exposing our own APIs to customer Fly Machines with ambient authentication, but without the <a href=\"https://github.com/SummitRoute/imdsv2_wall_of_shame/blob/main/README.md\">AWS IMDSv1 credential theft problem</a>.\n</li></ul>\n\n<p>There are downsides and warts! I’m mostly not telling you about them! Pure restrictive caveats are an awkward way to express some roles. And, blinded by my hunger to get Macaroons deployed, I spat in the face of science and used internal database IDs as our public caveat format, an act for which JP will never forgive me.</p>\n\n<p>If i’ve piqued your interest, <a href='https://github.com/superfly/macaroon' title=''>the code for this stuff is public</a>, along with some more <a href='https://github.com/superfly/macaroon/blob/main/macaroon-thought.md' title=''>detailed technical documentation</a>.</p>",
      "image": {
        "url": "https://fly.io/blog/macaroons-escalated-quickly/assets/evil-cookies-thumb.webp",
        "title": null
      },
      "media": [],
      "authors": [
        {
          "name": "Fly",
          "email": null,
          "url": null
        }
      ],
      "categories": []
    },
    {
      "id": "https://fly.io/blog/how-i-fly-yoko-li/",
      "title": "How Yoko Li makes towns, tamagoes, and tools for local AI",
      "description": null,
      "url": "https://fly.io/blog/how-i-fly-yoko-li/",
      "published": "2024-01-08T00:00:00.000Z",
      "updated": "2025-05-20T16:26:25.000Z",
      "content": "<p>Hello all, and welcome to another episode of How I Fly, a series where I interview developers about what they do with technology, what they find exciting, and the unexpected things they’ve learned along the way. This time I’m talking with <a href='https://twitter.com/stuffyokodraws' title=''>Yoko Li</a>, an investment partner at A16Z who’s also an open-source AI developer. She works on some of the most exciting AI projects in the world. I’m excited to share them with you today, with fun stories about the lessons she’s learned along the way.</p>\n<h2 id='cool-experiments' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#cool-experiments' aria-label='Anchor'></a><span class='plain-code'>Cool Experiments</span></h2>\n<p>One of Yoko’s most thought-provoking experiments is <a href='https://www.convex.dev/ai-town' title=''>AI Town</a>, a virtual town populated by AI agents that talk with each other. It takes advantage of the randomness of AI responses to create emergent behavior. When you open it, it looks like this:</p>\n\n<p><img alt=\"A picture of the AI Town homepage, a UI showing a top-down 2D RPG view with a visible river and a tent. The UI shows a conversation with the characters Alice and Stella.\" src=\"/blog/how-i-fly-yoko-li/assets/image1.webp\" /></p>\n\n<p>You can see the AI agents talking with each other and watch how the relationships between them form and change over time. It’s also a lot of fun to watch.</p>\n\n<p>One of Yoko’s other experiments is <a href='https://ai-tamago.fly.dev/' title=''>AI Tamago</a>, a <a href='https://en.wikipedia.org/wiki/Tamagotchi' title=''>Tamagochi</a> virtual pet implemented with a large language model instead of the state machine that we’re all used to. AI Tamago uses an unmodified version of LLaMA 2 7B to take in game state and user inputs, then it generates what happens next. Every time you interact with your pet, it feeds data to LLaMA 2 and then uses Ollama’s JSON mode to generate unexpected output.</p>\n\n<p><img alt=\"A picture of the homepage of AI Tamago, showing a virtual pet with buttons to feed the pet, play with the pet, clean the pet, discipline the pet, check pet status, and deliver medical care to the pet.\" src=\"/blog/how-i-fly-yoko-li/assets/image4.webp\" /></p>\n\n<p>It’s all the fun of the classic Tamagochi toys from the 90’s (including the ability to randomly discipline your virtual pet) without any of the coin cell batteries or having to carry around the little egg-shaped puck.</p>\n\n<p>But that’s just something you can watch, not something that’s as easy to play with on your own machine. Yoko has also worked on the <a href='https://github.com/ykhli/local-ai-stack' title=''>Local AI Starter Kit</a> that lets you go from zero to AI in minutes. It’s a collection of chains of models that let you ingest a bunch of documents, store them in a database, and then use those documents as context for a language model to generate responses. It’s everything you need to implement a “chat with a knowledge base” feature.</p>\n<h3 id='the-dark-of-ai-experiments' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#the-dark-of-ai-experiments' aria-label='Anchor'></a><span class='plain-code'>The dark of AI experiments</span></h3>\n<p>The Local AI Starter Kit is significant because normally to do this, you need to set up billing and API keys for at least four different API providers, and then you need to write a bunch of (hopefully robust) code to tie it all together. With the Local AI Starter Kit, you can do this on your own hardware, with your own data, and your own models privately. It’s a huge step forward for democratizing access to this technology.</p>\n\n<p>Document search is one of my favorite usecases for AI, and it’s one of the most immediately useful ones. It’s also one of the most fiddly and annoying to get right. To help illustrate this, I’ve made a diagram of the steps involved with setting up document search by hand:</p>\n\n<p><img alt=\"A diagram showing the process of ingesting a pile of markdown documents into a vector database. The documents are broken into a collection of sections, then each section is passed through an embedding model and the resulting vectors are stored in a vector database.\" src=\"/blog/how-i-fly-yoko-li/assets/image3.webp\" /></p>\n\n<p>You start with your Markdown documents. Most Markdown documents are easily broken up into sections where each section will focus on a single aspect of the larger topic of the document. You can take advantage of this best practice by letting people search for each section individually, which is typically a lot more useful than just searching the entire document.</p>\n<div class=\"right-sidenote\"><p>Okay, okay, fine. Language encircles concepts instead of defining them directly. The point still stands that we’re operating at a level “below” words and sentences, I don’t want to bog this down in a bunch of linear algebra that neither of us understand well enough to explain in a single paragraph like I am here. The main point is that it lets you “fuzzy match” relevant documents in a way that exact word search queries never could on their own.</p>\n</div>\n<p>Essentially, the vector embeddings that you generate from an embedding model are a mathematical representation of the “concepts” that the embedding model uses that are adjacent to the text of your documents. When you use the same model to generate embeddings for your documents and user queries, this lets you find documents that are similar to the query, but not precisely the same exact words. This is called “fuzzy searching” and it is one of the most difficult problems in computer science (right next to naming things).</p>\n\n<p>When a user comes to search the database, you do the same thing as ingestion:</p>\n\n<p><img alt=\"A diagram showing the full flow for doing document search Q&A with a vector database. The user submits a question to an API endpoint, the question is broken into embedding vectors and used to search for similar vectors in the database. The relevant document fragments are fed into the prompt for a large language model to generate a response that is grounded in the facts from the documents that were ingested. The response is streamed to the user one token at a time.\" src=\"/blog/how-i-fly-yoko-li/assets/image2.webp\" /></p>\n\n<p>The user query comes into your API endpoint. You use the same embedding model from earlier (omitted from the diagram for brevity) to turn that query into a vector. Then you query the same vector database to find documents that are similar to the query. Then you have a list of documents with metadata like the URL to the documentation page or section fragment in that page. From here you have two options. You can either use the documents to return a list of results to the user, or you can do the more fun thing: using those documents as context for a large language model to generate a response grounded in the relevant facts in those documents.</p>\n<div class=\"right-sidenote\"><p>I think it’s also how OpenAI’s custom GPTs work, but they haven’t released technical details about how they work so this is outright speculation on my part.</p>\n</div>\n<p>This basic pattern is called Retrieval-augmented Generation (RAG), and it’s how Bing’s copilot chatbot works. The Local AI Starter Kit makes setting this pipeline up <em>effortless</em> and <em>fast</em>. It’s a huge step forward for making this groundbreaking technology accessible to everyone.</p>\n<h2 id='the-struggles' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#the-struggles' aria-label='Anchor'></a><span class='plain-code'>The struggles</span></h2>\n<blockquote>\n<p>When I was trying to get the AI models in AI Town to output JSON, I tried a bunch of different things. I got some good results by telling the model to “only reply in JSON, no prose”, but we ended up using a model tuned for outputting code. I think I inspired <a href='https://ollama.ai' title=''>Ollama</a> to add their JSON output feature.</p>\n</blockquote>\n\n<p>One of the main benefits of large language models is that they are essentially stochastic models of the entire Internet. They have a bunch of patterns formed that can let you create surprisingly different outputs from similar inputs. This is also one of the main drawbacks of large language models: they are essentially stochastic models of the entire Internet. They have a bunch of patterns formed that can let you create surprisingly different outputs from similar inputs. The outputs of these models are usually correct-ish enough (more correct if you ground the responses in document fact like you do with a Retrieval-augmented Generation system), but they are not always aligned with our observable reality.</p>\n\n<p>A lot of the time you will get outputs that don’t make any logical or factual sense. These are called “hallucinations” and they are one of the main drawbacks of large language models. If a hallucination pops in at the worst times, you’ve accidentally told someone how to poison themselves with chocolate chip cookies. This is, as the kids say, “bad”.</p>\n\n<p>The inherent randomness of the output of a large language model means that it can be difficult to get an exactly parsable format. Most of the time, you’d be able to coax the model to get usable JSON output, but without schema it can sometimes generate wildly different JSON responses. Only sometimes. This isn’t deterministic and Yoko has found that this is one of the most frustrating parts of working with large language models.</p>\n<div class=\"right-sidenote\"><p>This works by making any offending ungrammatical tokens weighted to negative infinity. It’s amazingly hacky but the hilarious part is that it works.</p>\n</div>\n<p>However, there are workarounds. <a href='https://github.com/ggerganov/llama.cpp' title=''>llama.cpp</a> offers a way to use a grammar file to strictly guide the output of a large language model by using context-free grammar. This lets you get something more deterministic, but it’s still not perfect. It’s a lot better than nothing, though.</p>\n\n<p>One of the fun things that can happen with this is that you can have the model fail to generate anything but an endless stream of newlines in JSON mode. This is hilarious and usually requires some special detection logic to handle and restart the query. There’s work being done to let you use JSON schema to guide the generation of large language model outputs, but it’s not currently ready for the masses.</p>\n<div class=\"right-sidenote\"><p>If it’s dumb and it works, is it really dumb?</p>\n</div>\n<p>However, one of the easiest ways to hack around this is by using a model that generates code instead of text. This is how Yoko got the AI Town and AI Tamago models to output JSON that was mostly valid. It’s a hack, but it works. This was made a lot easier for AI town when one of the tools they use (<a href='https://ollama.ai' title=''>Ollama</a>) added support for JSON output from the model. This is a lot better than the code generation model hack, but research continues.</p>\n<h2 id='the-simple-joy-of-unexpected-outputs' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#the-simple-joy-of-unexpected-outputs' aria-label='Anchor'></a><span class='plain-code'>The simple joy of unexpected outputs</span></h2>\n<blockquote>\n<p>When I was making AI Town, I was inspired by <a href='https://en.wikipedia.org/wiki/The_Lifecycle_of_Software_Objects' title=''>The Lifecycle of Software Objects</a> by Ted Chiang. It’s about a former zookeeper that trained AI agents to be pets, kinda like how we use Reinforcement Learning from Human Feedback to train AI models like ChatGPT.</p>\n</blockquote>\n\n<p>However, at the same time, there are cases where hallucinations are not only useful, but they are what make the implementation of a system possible. If large language models are essentially massive banks of the word frequencies of a huge part of culture, then the emergent output can create unexpected things that happen frequently. This lets you have emergent behavior form, this can be the backbone of games and is the key thing that makes AI Town work as well as it does.</p>\n\n<p>AI Tamago is also completely driven off of the results of large language model hallucinations. They are the core of what drives user inputs, the game loop, and the surprising reactions you get when disciplining your pet. The status screen takes in the game state and lets you know what your pet is feeling in a way that the segment displays of the Tamagochi toys could never do.</p>\n\n<p>These enable you to build workflows that are <em>augmented</em> by the inherent randomness of the hallucinations instead of seeing them as drawbacks. This means you need to choose outputs that can have the hallucinations shine instead of being ugly warts you need to continuously shave away. Instead of using them for doing pathfinding, have them drive the AI of your characters or writing the A* pathfinding algorithm so you don’t have to write it again for the billionth time.</p>\n\n<p>I’m not saying that large language models can replace the output of a human, but they are more like a language server for human languages as well as programming languages. They are best used when you are generating the boilerplate you don’t want to do yourself, or when you are throwing science at the wall to see what sticks.</p>\n<h2 id='in-conclusion' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#in-conclusion' aria-label='Anchor'></a><span class='plain-code'>In conclusion</span></h2>\n<p>Yoko is showing people how to use AI today, on local machines, with models of your choice, that allow you to experiment, hack and learn.</p>\n\n<p>I can’t wait to see what’s next!</p>\n\n<p>If you want to follow what Yoko does, here’s a few links to add to your feeds:</p>\n\n<ul>\n<li>Yoko’s <a href='https://twitter.com/stuffyokodraws' title=''>Twitter</a> (or X, or whatever we’re supposed to call it now)\n</li><li>Yoko’s <a href='https://github.com/ykhli' title=''>GitHub</a>\n</li><li>Yoko’s <a href='https://yoko.dev/' title=''>Website</a>\n</li></ul>\n\n<p>(insert standard conclusion diatribe here)</p>",
      "image": {
        "url": "https://fly.io/blog/how-i-fly-yoko-li/assets/chat-bird-cover-thumb.webp",
        "title": null
      },
      "media": [],
      "authors": [
        {
          "name": "Fly",
          "email": null,
          "url": null
        }
      ],
      "categories": []
    },
    {
      "id": "https://fly.io/blog/not-midjourney-bot/",
      "title": "Deploy Your Own (Not) Midjourney Bot on Fly GPUs",
      "description": null,
      "url": "https://fly.io/blog/not-midjourney-bot/",
      "published": "2024-01-04T00:00:00.000Z",
      "updated": "2025-05-20T16:26:25.000Z",
      "content": "<div class=\"lead\"><p>Fly.io has Enterprise-grade GPUs and servers all over the globe (or <em>disk</em>, depending on which side of the flat Earth debate you fall on) making it a great place to deploy your next disruptive AI app.</p>\n</div>\n<p>Some people daydream about normal things, like coffee machines or raising that Series A round (those are normal things to dream about, right?). I daydream about commanding a fleet of chonky <a href='https://resources.nvidia.com/en-us-l40s/l40s-datasheet-28413/' title=''>NVIDIA Lovelace L40Ss</a>. Also, totally normal. Well, fortunately for me and anyone else wanting to explore the world of generative AI — Fly.io has GPUs now!</p>\n\n<p>Sure, this technology will probably end up with the AI <a href='https://marketoonist.com/2023/03/ai-written-ai-read.html' title=''>talking to itself</a> while we go about our lives — but it seems like it’s here to stay, so we should at least have some fun with it. In this post we’ll put these GPUs to task and you’ll learn how to build your very own AI image-generating Discord bot, kinda like Midjourney. Available 24/7 and ready to serve up all the pictures of cats eating burritos your heart desires. And because I’d never tell you to draw the rest of the owl, I’ll link to working code that you can deploy today.</p>\n<h2 id='latent-diffusion-models-have-entered-the-chat' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#latent-diffusion-models-have-entered-the-chat' aria-label='Anchor'></a><span class='plain-code'>Latent Diffusion Models Have Entered the Chat</span></h2>\n<p>In the realm of AI image generation, two names have become prominent: Midjourney and Stable Diffusion. Both are image generating software that allow you to synthesize an image from a textual prompt. One is a closed source paid service, while the other is open source and can run locally. Midjourney gained popularity because it allowed the less technically-inclined among us to explore this technology through its ease of use. Stable Diffusion democratized access to the technology, but it can be quite tricky to get good results out of it.</p>\n\n<p>Enter <a href='https://github.com/lllyasviel/Fooocus' title=''>Fooocus</a> (pronounced <em>focus</em>), an open source project that combines the best of both worlds and offers a user-friendly interface to Stable Diffusion. It’s hands down the easiest way to get started with Stable Diffusion. Sure there are more popular tools like Stable Diffusion web UI and ComfyUI, but Fooocus adds some magic to reduce the need to manually tweak a bunch of settings.  The most significant feature is probably GPT-2-based “<a href='https://github.com/lllyasviel/Fooocus/discussions/117#raw' title=''>prompt expansion</a>” to dynamically enhance prompts.</p>\n\n<p>The point of Fooocus is to <em>focus</em> on your prompt. The more you put into it, the more you get out. That said, a very simple prompt like “forest elf” can return high-quality images without the need to trawl the web for prompt ideas or fiddle with knobs and levers (although they’re there if you want them).</p>\n\n<p>So, what can this thing <em>do</em>? Well, this…</p>\n\n<p><img alt=\"A black and white sketch of hot-air balloon over a mountain range generated using Fooocus with \"Pencil Sketch Drawing\" style and quality = True\" src=\"/blog/not-midjourney-bot/assets/./balloon-sketch.webp\" /></p>\n\n<p>Here’s the full command I’ve used to generate this image: <code>/imagine prompt: sketch of hot-air balloon over a mountain range style1: Pencil Sketch Drawing quality: true ar: 1664×576</code></p>\n<h2 id='what-were-building' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#what-were-building' aria-label='Anchor'></a><span class='plain-code'>What We’re Building</span></h2>\n<p>We’ll deploy two applications. The code to run the bot itself will run on normal VM hardware, and the API server doing all the hard work synthesizing alpacas out of thin air will run on GPU hardware.</p>\n\n<p><img alt=\"An architecture diagram explaining how the two apps will communicate and return the requested image to an end user.\" src=\"/blog/not-midjourney-bot/assets/./arch-diagram.png?center&2/3\" /></p>\n\n<p>Fooocus is served up as a web UI by default, but with a little elbow grease we can interact with it as a REST API. Fortunately, with more than 25k stars on GitHub at the time of writing, the project has a lively open-source community, so we don’t need to do much work here — it’s already been done for us. <a href='https://github.com/konieshadow/Fooocus-API' title=''>Fooocus-API</a> is a project that shoves FastAPI in front of a Fooocus runtime. We’ll use this for the API server app.</p>\n\n<p>The Python-based bot connects to the <a href='https://discord.com/developers/docs/topics/gateway' title=''>Discord Gateway API</a> using the <a href='https://github.com/Pycord-Development/pycord' title=''>Pycord</a> library. When it starts up, it maintains an open pipe for data to flow back and forth via WebSockets. The bot app also includes a client that knows how to talk to the API server using Flycast and request the image it needs via HTTP.</p>\n\n<p>When we request an image from Discord using the <code>/imagine</code> slash command, we immediately respond using Pycord’s <code>defer()</code> function to let Discord know that the request has been received and the bot is working on it — it’ll take a few seconds to process your prompt, fabricate an image, upload it to Discord and let you share it with your friends. This is a blocking operation, so it won’t perform well if you have hundreds of people on your Discord Server using the command. For that, you’ll want to jiggle some wires to make the code non-blocking. But for for now, this gives us a nice UX for the bot.</p>\n\n<p>When the API server returns the image, it gets saved to disk. We’ll use the fantastic <a href='https://github.com/sqids/sqids-python' title=''>Sqids</a> library to generate collision-free file names:</p>\n<div class=\"highlight-wrapper group relative python\">\n  <button \n    type=\"button\"\n    class=\"bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n    data-wrap-target=\"#code-8ivdq00o\"\n  >\n    <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\" stroke-linecap=\"round\" stroke-linejoin=\"round\"><g buffered-rendering=\"static\"><path d=\"M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959\" /><path d=\"M11.081 6.466L9.533 8.037l1.548 1.571\" /></g></svg>\n    <span class=\"bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950\">\n      Wrap text\n    </span>\n  </button>\n  <button \n    type=\"button\"\n    class=\"bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n    data-copy-target=\"sibling\"\n  >\n    <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\"><g buffered-rendering=\"static\"><path d=\"M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z\" /><path d=\"M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617\" /></g></svg>\n    <span class=\"bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950\">\n      Copy to clipboard\n    </span>\n  </button>\n  <div class='highlight relative group'>\n    <pre class='highlight '><code id=\"code-8ivdq00o\"><span class=\"n\">unique_id</span> <span class=\"o\">=</span> <span class=\"bp\">self</span><span class=\"p\">.</span><span class=\"n\">sqids</span><span class=\"p\">.</span><span class=\"n\">encode</span><span class=\"p\">(</span>\n    <span class=\"p\">[</span><span class=\"n\">ctx</span><span class=\"p\">.</span><span class=\"n\">author</span><span class=\"p\">.</span><span class=\"nb\">id</span><span class=\"p\">,</span> <span class=\"nb\">int</span><span class=\"p\">(</span><span class=\"n\">time</span><span class=\"p\">.</span><span class=\"n\">time</span><span class=\"p\">())]</span>\n<span class=\"p\">)</span>\n\n<span class=\"n\">result_filename</span> <span class=\"o\">=</span> <span class=\"sa\">f</span><span class=\"s\">\"result_</span><span class=\"si\">{</span><span class=\"n\">unique_id</span><span class=\"si\">}</span><span class=\"s\">.png\"</span>\n</code></pre>\n  </div>\n</div>\n<p>We’ll also use <code>asyncio</code> to check if the image is ready every second, and when it is, we send it off to Discord to complete the request:</p>\n<div class=\"highlight-wrapper group relative python\">\n  <button \n    type=\"button\"\n    class=\"bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n    data-wrap-target=\"#code-dwletmxh\"\n  >\n    <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\" stroke-linecap=\"round\" stroke-linejoin=\"round\"><g buffered-rendering=\"static\"><path d=\"M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959\" /><path d=\"M11.081 6.466L9.533 8.037l1.548 1.571\" /></g></svg>\n    <span class=\"bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950\">\n      Wrap text\n    </span>\n  </button>\n  <button \n    type=\"button\"\n    class=\"bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n    data-copy-target=\"sibling\"\n  >\n    <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\"><g buffered-rendering=\"static\"><path d=\"M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z\" /><path d=\"M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617\" /></g></svg>\n    <span class=\"bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950\">\n      Copy to clipboard\n    </span>\n  </button>\n  <div class='highlight relative group'>\n    <pre class='highlight '><code id=\"code-dwletmxh\"><span class=\"k\">while</span> <span class=\"ow\">not</span> <span class=\"n\">os</span><span class=\"p\">.</span><span class=\"n\">path</span><span class=\"p\">.</span><span class=\"n\">exists</span><span class=\"p\">(</span><span class=\"n\">result_filename</span><span class=\"p\">):</span>\n    <span class=\"k\">await</span> <span class=\"n\">asyncio</span><span class=\"p\">.</span><span class=\"n\">sleep</span><span class=\"p\">(</span><span class=\"mi\">1</span><span class=\"p\">)</span>\n\n<span class=\"k\">with</span> <span class=\"nb\">open</span><span class=\"p\">(</span><span class=\"n\">result_filename</span><span class=\"p\">,</span> <span class=\"s\">\"rb\"</span><span class=\"p\">)</span> <span class=\"k\">as</span> <span class=\"n\">f</span><span class=\"p\">:</span>\n    <span class=\"k\">await</span> <span class=\"n\">ctx</span><span class=\"p\">.</span><span class=\"n\">respond</span><span class=\"p\">(</span>\n        <span class=\"nb\">file</span><span class=\"o\">=</span><span class=\"n\">discord</span><span class=\"p\">.</span><span class=\"n\">File</span><span class=\"p\">(</span><span class=\"n\">f</span><span class=\"p\">,</span> <span class=\"n\">result_filename</span><span class=\"p\">)</span>\n    <span class=\"p\">)</span>\n</code></pre>\n  </div>\n</div>\n<p>Neither of these two apps will be exposed to the Internet, yet they’ll still be able to communicate with each other. One of the undersold stories about Fly.io is the ease with which two applications can communicate over the private network. We assign special IPv6 private network (6pn) addresses within the same organizational space and applications can effortlessly discover and connect to one another without any additional configuration.</p>\n\n<p>But what about load balancing and this “scale-to-zero” thing? We don’t <em>just</em> want our two apps to talk to each other, we want the Fly Proxy to start our Machine when a request comes in, and stop it when idle. For that, we’ll need <a href='https://fly.io/docs/reference/private-networking/#flycast-private-load balancing' title=''>Flycast</a>, our private load balancing feature.</p>\n\n<p>When you assign a Flycast IP to your app, you can route requests using a special <code>.flycast</code> domain. Those requests are routed through the Fly Proxy instead of directly to instances in your app. Meaning you get all the load balancing, rate limiting and other proxy goodness that you’re accustomed to. The Proxy runs a process which can automatically downscale Machines every few minutes. It’ll also start them right back up when a request comes in — this means we can take advantage of scale-to-zero, saving us a bunch of money!</p>\n<h2 id='the-imagine-command' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#the-imagine-command' aria-label='Anchor'></a><span class='plain-code'>The <code>/imagine</code> Command</span></h2>\n<p>The slash command is the heart of your bot, enabling you to generate images based on your prompt, right from within Discord. When you type <code>/imagine</code> into the Discord chat, you’ll see some command options pop up.</p>\n\n<p>You’ll need to input your base prompt (e.g. “an alpaca sleeping in a grassy field”) and  optionally pick some styles (“Pencil Sketch Drawing”, “Futuristic Retro Cyberpunk”, “MRE Dark Cyberpunk” etc). With Fooocus, combining multiple styles — “style-chaining” — can help you achieve amazing results. Set the aspect ratio or provide negative prompts if needed, too.</p>\n\n<p>After you execute the command, the bot will request the image from the API, then send it as a response in the chat. Let’s see it in action!</p>\n\n<p><img alt=\"A dif demo run through showcasing the ability of the bot to generate images from Discord\" src=\"/blog/not-midjourney-bot/assets/./demo.gif?card&center\" /></p>\n<h2 id='deployment-speedrun' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#deployment-speedrun' aria-label='Anchor'></a><span class='plain-code'>Deployment Speedrun</span></h2>\n<p><strong class='font-semibold text-navy-950'>First, we’ll deploy the API server.</strong> For convenience (and to speed things up), we’ll use a pre-built image when we deploy. With dependencies like <code>torch</code> and <code>torchvision</code> bundled in, it’s a hefty image weighing in just shy of 12GB. With a normal Fly Machine  this would not only be a bad idea, but not even possible due to an 8GB limit for the VMs rootfs. Fortunately the wizards behind Fly GPUs have accounted for our need to run huge models and their dependencies, and awarded us 50GB of rootfs.</p>\n<div class=\"right-sidenote\"><p>Fly GPUs use <a href=\"https://github.com/cloud-hypervisor/cloud-hypervisor\" title=\"\">Cloud Hypervisor</a> and not <a href=\"https://github.com/firecracker-microvm/firecracker\" title=\"\">Firecracker</a> (like a regular Fly Machine) for virtualization. But even with a 12GB image, this doesn’t stop the Machine from booting in seconds when a new request comes in through the Proxy.</p>\n</div>\n<p>To start, clone the template <a href='https://github.com/fly-apps/not-midjourney-bot' title=''>repository</a>. You’ll need this for both the bot and server apps. Then deploy the server with the Fly CLI:</p>\n<div class=\"highlight-wrapper group relative bash\">\n  <button \n    type=\"button\"\n    class=\"bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n    data-wrap-target=\"#code-kdezxoz\"\n  >\n    <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\" stroke-linecap=\"round\" stroke-linejoin=\"round\"><g buffered-rendering=\"static\"><path d=\"M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959\" /><path d=\"M11.081 6.466L9.533 8.037l1.548 1.571\" /></g></svg>\n    <span class=\"bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950\">\n      Wrap text\n    </span>\n  </button>\n  <button \n    type=\"button\"\n    class=\"bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n    data-copy-target=\"sibling\"\n  >\n    <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\"><g buffered-rendering=\"static\"><path d=\"M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z\" /><path d=\"M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617\" /></g></svg>\n    <span class=\"bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950\">\n      Copy to clipboard\n    </span>\n  </button>\n  <div class='highlight relative group'>\n    <pre class='highlight '><code id=\"code-kdezxoz\">fly deploy <span class=\"se\">\\</span>\n  <span class=\"nt\">--image</span> ghcr.io/fly-apps/not-midjourney-bot:server <span class=\"se\">\\</span>\n  <span class=\"nt\">--config</span> ./server/fly.toml <span class=\"se\">\\</span>\n  <span class=\"nt\">--no-public-ips</span>\n</code></pre>\n  </div>\n</div>\n<p>This command tells Fly.io to deploy your application based on the configuration specified in the <code>fly.toml</code>, while the <code>--no-public-ips</code> flag secures your app by not exposing it to the public Internet.</p>\n\n<p>Remember Flycast? To use it, we’ll allocate a private IPv6:</p>\n<div class=\"highlight-wrapper group relative bash\">\n  <button \n    type=\"button\"\n    class=\"bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n    data-wrap-target=\"#code-t586n0x7\"\n  >\n    <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\" stroke-linecap=\"round\" stroke-linejoin=\"round\"><g buffered-rendering=\"static\"><path d=\"M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959\" /><path d=\"M11.081 6.466L9.533 8.037l1.548 1.571\" /></g></svg>\n    <span class=\"bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950\">\n      Wrap text\n    </span>\n  </button>\n  <button \n    type=\"button\"\n    class=\"bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n    data-copy-target=\"sibling\"\n  >\n    <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\"><g buffered-rendering=\"static\"><path d=\"M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z\" /><path d=\"M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617\" /></g></svg>\n    <span class=\"bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950\">\n      Copy to clipboard\n    </span>\n  </button>\n  <div class='highlight relative group'>\n    <pre class='highlight '><code id=\"code-t586n0x7\">fly ips allocate-v6 <span class=\"nt\">--private</span>\n</code></pre>\n  </div>\n</div>\n<p>Now, let’s take a look at our <a href='https://github.com/fly-apps/not-midjourney-bot/blob/134bb634f97bf81040e489650f2334b48d976c10/server/fly.toml' title=''><code>fly.toml</code></a> config:</p>\n<div class=\"highlight-wrapper group relative toml\">\n  <button \n    type=\"button\"\n    class=\"bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n    data-wrap-target=\"#code-avnpvpja\"\n  >\n    <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\" stroke-linecap=\"round\" stroke-linejoin=\"round\"><g buffered-rendering=\"static\"><path d=\"M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959\" /><path d=\"M11.081 6.466L9.533 8.037l1.548 1.571\" /></g></svg>\n    <span class=\"bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950\">\n      Wrap text\n    </span>\n  </button>\n  <button \n    type=\"button\"\n    class=\"bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n    data-copy-target=\"sibling\"\n  >\n    <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\"><g buffered-rendering=\"static\"><path d=\"M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z\" /><path d=\"M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617\" /></g></svg>\n    <span class=\"bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950\">\n      Copy to clipboard\n    </span>\n  </button>\n  <div class='highlight relative group'>\n    <pre class='highlight '><code id=\"code-avnpvpja\"><span class=\"py\">app</span> <span class=\"p\">=</span> <span class=\"s\">\"alpaca-image-gen\"</span>\n<span class=\"py\">primary_region</span> <span class=\"p\">=</span> <span class=\"s\">\"ord\"</span>\n\n<span class=\"nn\">[[vm]]</span>\n  <span class=\"py\">size</span> <span class=\"p\">=</span> <span class=\"s\">\"performance-8x\"</span>\n  <span class=\"py\">memory</span> <span class=\"p\">=</span> <span class=\"s\">\"16gb\"</span>\n  <span class=\"py\">gpu_kind</span> <span class=\"p\">=</span> <span class=\"s\">\"l40s\"</span>\n\n<span class=\"nn\">[[services]]</span>\n  <span class=\"py\">internal_port</span> <span class=\"p\">=</span> <span class=\"mi\">8888</span>\n  <span class=\"py\">protocol</span> <span class=\"p\">=</span> <span class=\"s\">\"tcp\"</span>\n  <span class=\"py\">auto_stop_machines</span> <span class=\"p\">=</span> <span class=\"kc\">true</span>\n  <span class=\"py\">auto_start_machines</span> <span class=\"p\">=</span> <span class=\"kc\">true</span>\n  <span class=\"py\">min_machines_running</span> <span class=\"p\">=</span> <span class=\"mi\">0</span>\n\n    <span class=\"nn\">[[services.ports]]</span>\n      <span class=\"py\">handlers</span> <span class=\"p\">=</span> <span class=\"nn\">[\"http\"]</span>\n      <span class=\"py\">port</span> <span class=\"p\">=</span> <span class=\"mi\">80</span>\n      <span class=\"py\">force_https</span> <span class=\"p\">=</span> <span class=\"kc\">false</span>\n\n<span class=\"nn\">[mounts]</span>\n  <span class=\"py\">source</span> <span class=\"p\">=</span> <span class=\"s\">\"repositories\"</span>\n  <span class=\"py\">destination</span> <span class=\"p\">=</span> <span class=\"s\">\"/app/repositories\"</span>\n  <span class=\"py\">initial_size</span> <span class=\"p\">=</span> <span class=\"s\">\"20gb\"</span>\n</code></pre>\n  </div>\n</div>\n<p>There are a few key things to note here:</p>\n\n<ol>\n<li>Currently, the NVIDIA L40Ss we’re using when we specify <code>gpu_kind</code> are only available in <code>ORD</code>, so that’s what we’ve set the <code>primary_region</code> to. We’re rolling out more GPUs to more regions in a hurry — but for now we’ll host the bot in Chicago.\n</li><li>Out of the box, 8GB of system RAM is suggested. In my testing this wasn’t close to enough: the Machine would frequently run out of memory and crash. I got things working better by using 16GB of RAM.\n</li><li>The FastAPI server binds to port 8888; we need to set this as our <code>internal_port</code>, or the Fly Proxy won’t know where to send requests.\n</li><li>We want our Machine to <a href='https://fly.io/docs/apps/autostart-stop/' title=''>automatically stop and start</a>.\n</li><li>Flycast doesn’t do HTTPS, so we won’t force it here. Don’t worry, it’s still encrypted over the wire!\n</li><li>A volume is automatically created on the first deploy. On first boot, the app clones the Fooocus repo and downloads the Stable Diffusion model checkpoints onto that volume. This takes a couple of minutes, but the next time the Machine starts, it’ll have everything it needs to serve a request within seconds.\n</li></ol>\n<div class=\"callout\"><p>The <a href=\"https://github.com/fly-apps/not-midjourney-bot/blob/84e72d1e7048627b7c845fe3d44d45b278e451d5/README.md\" title=\"\"><strong class=\"font-semibold text-navy-950\">README</strong></a> for this project has detailed instructions about setting up your Discord bot and adding it to a Server.  After setting up the permissions and privileged intents, you’ll get an OAuth2 URL. Use this URL to invite your bot to your Discord server and confirm the permissions. Once that’s done, grab your Discord API token, you’ll need it for the next step.</p>\n</div>\n<p><strong class='font-semibold text-navy-950'>With the API server up and running, it’s time to deploy the Discord bot.</strong> This app will run on a normal Fly Machine, no GPU required. First, set the <code>DISCORD_TOKEN</code> and <code>FOOOCUS_API_URL</code> (the Flycast endpoint for the API server) secrets, using the Fly CLI. Then deploy:</p>\n<div class=\"highlight-wrapper group relative bash\">\n  <button \n    type=\"button\"\n    class=\"bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n    data-wrap-target=\"#code-wc4jofjn\"\n  >\n    <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\" stroke-linecap=\"round\" stroke-linejoin=\"round\"><g buffered-rendering=\"static\"><path d=\"M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959\" /><path d=\"M11.081 6.466L9.533 8.037l1.548 1.571\" /></g></svg>\n    <span class=\"bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950\">\n      Wrap text\n    </span>\n  </button>\n  <button \n    type=\"button\"\n    class=\"bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n    data-copy-target=\"sibling\"\n  >\n    <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\"><g buffered-rendering=\"static\"><path d=\"M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z\" /><path d=\"M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617\" /></g></svg>\n    <span class=\"bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950\">\n      Copy to clipboard\n    </span>\n  </button>\n  <div class='highlight relative group'>\n    <pre class='highlight '><code id=\"code-wc4jofjn\">fly deploy <span class=\"se\">\\</span>\n  <span class=\"nt\">--image</span> ghcr.io/fly-apps/not-midjourney-bot:bot <span class=\"se\">\\</span>\n  <span class=\"nt\">--config</span> ./bot/fly.toml <span class=\"se\">\\</span>\n  <span class=\"nt\">--no-public-ips</span>\n</code></pre>\n  </div>\n</div>\n<p>Notice that the bot app doesn’t need to be publicly visible on the Internet either. Under the hood, the WebSocket connection to Discord’s Gateway API allows the bot to communicate freely without the need to define any services in our <code>fly.toml</code>. This also means that the Fly Proxy will not downscale the app like it does the GPU Machine — the bot will always appear “online”.</p>\n<figure class=\"post-cta\">\n  <figcaption>\n    <h1>Not interested in GPUs?</h1>\n    <p>You can still deploy apps on Fly.io today and be up and running in a matter of minutes.</p>\n      <a class=\"btn btn-lg\" href=\"https://fly.io/docs/speedrun/\">\n        Deploy an app now<span class='opacity-50'>→</span>\n      </a>\n  </figcaption>\n  <div class=\"image-container\">\n    <img src=\"/static/images/cta-turtle.webp\" srcset=\"/static/images/[email protected] 2x\" alt=\"\">\n  </div>\n</figure>\n\n<h2 id='how-do-i-know-this-thing-is-using-gpu-for-reals' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#how-do-i-know-this-thing-is-using-gpu-for-reals' aria-label='Anchor'></a><span class='plain-code'>How Do I Know This Thing Is Using GPU for Reals?</span></h2>\n<p>That’s easy! NVIDIA provides us with a neat little command-line utility called <code>nvidia-smi</code> which we can use to monitor and get information about NVIDIA GPU devices.</p>\n\n<p>Let’s SSH to the running Machine for the API server app and run an <code>nvidia-smi</code> query in one go. It’s a little clunky, but you’ll get the point:</p>\n<div class=\"highlight-wrapper group relative bash\">\n  <button \n    type=\"button\"\n    class=\"bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n    data-wrap-target=\"#code-51fc7qja\"\n  >\n    <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\" stroke-linecap=\"round\" stroke-linejoin=\"round\"><g buffered-rendering=\"static\"><path d=\"M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959\" /><path d=\"M11.081 6.466L9.533 8.037l1.548 1.571\" /></g></svg>\n    <span class=\"bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950\">\n      Wrap text\n    </span>\n  </button>\n  <button \n    type=\"button\"\n    class=\"bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n    data-copy-target=\"sibling\"\n  >\n    <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\"><g buffered-rendering=\"static\"><path d=\"M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z\" /><path d=\"M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617\" /></g></svg>\n    <span class=\"bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950\">\n      Copy to clipboard\n    </span>\n  </button>\n  <div class='highlight relative group'>\n    <pre class='highlight '><code id=\"code-51fc7qja\">fly ssh console <span class=\"se\">\\</span>\n  <span class=\"nt\">-C</span> <span class=\"s2\">\"nvidia-smi --query-gpu=gpu_name,utilization.gpu,utilization.memory,temperature.gpu,power.draw --format=csv,noheader --loop\"</span>\n</code></pre>\n  </div>\n</div><div class=\"highlight-wrapper group relative \">\n  <button \n    type=\"button\"\n    class=\"bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n    data-wrap-target=\"#code-uemr03f6\"\n  >\n    <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\" stroke-linecap=\"round\" stroke-linejoin=\"round\"><g buffered-rendering=\"static\"><path d=\"M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959\" /><path d=\"M11.081 6.466L9.533 8.037l1.548 1.571\" /></g></svg>\n    <span class=\"bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950\">\n      Wrap text\n    </span>\n  </button>\n  <button \n    type=\"button\"\n    class=\"bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n    data-copy-target=\"sibling\"\n  >\n    <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\"><g buffered-rendering=\"static\"><path d=\"M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z\" /><path d=\"M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617\" /></g></svg>\n    <span class=\"bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950\">\n      Copy to clipboard\n    </span>\n  </button>\n  <div class='highlight relative group'>\n    <pre class='highlight '><code id=\"code-uemr03f6\">Connecting to fdaa:2:f664:a7b:210:d8b2:8fd8:2... complete\n\nNVIDIA L40S, 0 %, 0 %, 46, 88.63 W\nNVIDIA L40S, 0 %, 0 %, 46, 88.61 W\nNVIDIA L40S, 36 %, 4 %, 51, 103.41 W\nNVIDIA L40S, 65 %, 25 %, 57, 280.90 W\nNVIDIA L40S, 0 %, 0 %, 49, 91.13 W\nNVIDIA L40S, 0 %, 0 %, 48, 89.76 W\n</code></pre>\n  </div>\n</div>\n<p>What we’ve done is run the command on a loop while the bot is actually doing work synthesizing an image and we get to see it ramp up and consume more wattage and VRAM. The card is barely breaking a sweat!</p>\n<h2 id='how-much-will-these-alpaca-pics-cost-me' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#how-much-will-these-alpaca-pics-cost-me' aria-label='Anchor'></a><span class='plain-code'>How Much Will These Alpaca Pics Cost Me?</span></h2>\n<p>Let’s talk about the cost-effectiveness of this setup. On Fly.io, an L40S GPU <a href='https://fly.io/docs/about/pricing/#gpus-and-fly-machines' title=''>costs</a> $2.50/hr. Tag on a few cents per hour for the VM resources and storage for our models and you’re looking at about $3.20/hr to run the GPU Machine. It’s <em>on-demand</em>, too — if you’re not using the compute, you’re not paying for it! Keep in mind that some of these checkpoint models can be several gigabytes and if you create a volume, you will be charged for it even when you have no Machines running. It’s worth noting too, that the non-GPU bot app falls into our <a href='https://fly.io/docs/about/pricing/#free-allowances' title=''>free allowance</a>.</p>\n<div class=\"right-sidenote\"><p>Rates are on-demand, with no minimum usage requirements. Discounted rates for reserved GPU Machines and dedicated hosts are also available if you email <a href=\"mailto:[email protected]\" title=\"\">[email protected]</a></p>\n</div>\n<p>In comparison, Midjourney offers several subscription tiers with the cheapest plan costing $10/mo and providing 3.3 hours of “fast” GPU time (roughly equivalent to an enterprise-grade Fly GPU). This works out to about $3/hr give or take a few cents.</p>\n<h2 id='where-can-i-take-this' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#where-can-i-take-this' aria-label='Anchor'></a><span class='plain-code'>Where Can I Take This?</span></h2>\n<p>There is a lot you can do to build out the bot’s functionality. You control the source code for the bot, meaning that you can make it do <em>whatever you want</em>. You might decide to mimic Midjourney’s <code>/blend</code> command to splice your own images into prompts (AKA img2img diffusion). You can do this by adding more commands to your <a href='https://guide.pycord.dev/popular-topics/cogs' title=''>Cog</a>, Pycord’s way of grouping similar commands. You might decide to add a button to roll the image if you don’t like it, or even specify the number of images to return. The possibilities are endless and your cloud bill’s the limit!</p>\n\n<p>The full code for the bot and server (with detailed instructions on how to deploy it on Fly.io) can be found <a href='https://github.com/fly-apps/not-midjourney-bot' title=''><strong class='font-semibold text-navy-950'>here</strong></a>.</p>",
      "image": {
        "url": "https://fly.io/blog/not-midjourney-bot/assets/purple-balloon-taking-off-thumb.webp",
        "title": null
      },
      "media": [],
      "authors": [
        {
          "name": "Fly",
          "email": null,
          "url": null
        }
      ],
      "categories": []
    },
    {
      "id": "https://fly.io/blog/fly-with-alpine/",
      "title": "Fly With Alpine",
      "description": null,
      "url": "https://fly.io/blog/fly-with-alpine/",
      "published": "2023-12-21T00:00:00.000Z",
      "updated": "2025-05-20T16:26:25.000Z",
      "content": "<div class=\"lead\"><p>Reduce image sizes and improve startup times by switching your base image to Alpine Linux.</p>\n</div>\n<p>Before proceeding, a caution.  This is an engineering trade-off.  Test carefully before deploying to production.</p>\n\n<p>By the end of this blog post you should have the information you need to make an informed decision.</p>\n<h2 id='introduction' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#introduction' aria-label='Anchor'></a><span class='plain-code'>Introduction</span></h2>\n<p><a href='https://www.alpinelinux.org/about/' title=''>Alpine Linux</a> is a Linux distribution that advertises itself as Small.  Simple.  Secure.</p>\n\n<p>It is indisputably smaller than the alternatives – when measured by image size.  More on that in a bit.  Some claim that this results in less memory usage and better performance.  Others dispute these claims.  For these, it is best that you test the results for yourself with your application.</p>\n\n<p>Simple is harder to measure.  Some of the larger differences, like <a href='https://github.com/OpenRC/openrc#readme' title=''>OpenRC</a> vs <a href='https://systemd.io/' title=''>SystemD</a>, are less relevant in container environments.  Others, like <a href='https://busybox.net/' title=''>BusyBox</a> are implementation details.  Essentially what you get is a Linux distribution with perhaps a number of standard packages (e.g., bash) not installed by default, but these can be easily added if needed.</p>\n\n<p>Secure is definitely an important attribute.  The alternatives make comparable claims in this area.  Do your own research in this area and come to your own conclusions.</p>\n\n<p>Not mentioned is the downside: Alpine Linux has a smaller ecosystem that the alternatives, particularly when compared to Debian.</p>\n<h2 id='baseline' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#baseline' aria-label='Anchor'></a><span class='plain-code'>Baseline</span></h2>\n<p>Let’s start with a baseline consisting of the Dockerfiles produced by <code>fly launch</code> for some of the most popular\nframeworks:</p>\n<div class=\"highlight-wrapper group relative \">\n  <button \n    type=\"button\"\n    class=\"bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n    data-wrap-target=\"#code-257otcgi\"\n  >\n    <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\" stroke-linecap=\"round\" stroke-linejoin=\"round\"><g buffered-rendering=\"static\"><path d=\"M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959\" /><path d=\"M11.081 6.466L9.533 8.037l1.548 1.571\" /></g></svg>\n    <span class=\"bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950\">\n      Wrap text\n    </span>\n  </button>\n  <button \n    type=\"button\"\n    class=\"bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n    data-copy-target=\"sibling\"\n  >\n    <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\"><g buffered-rendering=\"static\"><path d=\"M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z\" /><path d=\"M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617\" /></g></svg>\n    <span class=\"bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950\">\n      Copy to clipboard\n    </span>\n  </button>\n  <div class='highlight relative group'>\n    <pre class='highlight '><code id=\"code-257otcgi\">FROM fideloper/fly-laravel:${PHP_VERSION}\nFROM hexpm/elixir:1.12.3-erlang-24.1.4-debian-bullseye-20210902-slim\nFROM node:${NODE_VERSION}-slim\nFROM oven/bun:${BUN_VERSION}-slim\nFROM python:${PYTHON_VERSION}-slim-bullseye\nFROM ruby:$RUBY_VERSION-slim\n</code></pre>\n  </div>\n</div>\n<p>What may not be obvious to the naked eye from these results is that the base image for these is one of the following:</p>\n\n<ul>\n<li>Debian Bookworm (the current “stable” distribution)\n</li><li>Debian Bullseye (the previous “stable” distribution)\n</li><li>Ubuntu Focal Fossa (the previous LTS release of Ubuntu)\n</li></ul>\n\n<p>Once you factor in that Ubuntu is based on Debian, the conclusion is that Debian is effectively the default distribution for fly IO.  Rest assured that this isn’t the result of a devious conspiracy by Fly.io, but rather a reflection of the default choices made independently by the developers of a number of frameworks and runtimes.  Beyond this, all Fly.io is doing is choosing the “slim” version of the default distribution for each framework as the base.</p>\n\n<p>What’s likely going on here is a virtuos circle: people choose Debian because of the ecosystem, and ecosystem grows because people chose Debian.</p>\n\n<p>Now lets compare base image sizes:</p>\n<table class=\"ml-8 mb-8\">\n<thead>\n<tr>\n  <th class=\"px-8\">\n  <th class=\"px-8 underline\">Alpine\n  <th class=\"px-8 underline\">Debian slim\n</tr>\n</thead>\n<tbody>\n<tr>\n  <th class=\"text-left\">Bun 1.0.18\n  <td class=\"text-center\">43.10M\n  <td class=\"text-center\">63.84M\n</tr>\n<tr>\n  <th class=\"text-left\">Node 21.4.0\n  <td class=\"text-center\">46.83M\n  <td class=\"text-center\">70.08M\n</tr>\n<tr>\n  <th class=\"text-left\">Python 3.12.1\n  <td class=\"text-center\">17.59M\n  <td class=\"text-center\">45.36M\n</tr>\n<tr>\n  <th class=\"text-left\">Ruby 3.2\n  <td class=\"text-center\">40.14M\n  <td class=\"text-center\">74.36M\n</tr>\n</tbody>\n</table>\n\n\n<p>And these numbers are just the for the base images.  I’ve measured a minimal Rails/Postgresql/esbuild application at 304MB on Alpine and 428MB on Debian Slim.  A minimal Bun application at 110MB on Alpine and 173MB on Debian Slim.  And a minimal Node application at 142MB on Alpine and 207MB on Debian Slim.</p>\n\n<p>In each case, corresponding Alpine images are consistently smaller than their Debian slim equivalent.</p>\n<h2 id='switching-distributions' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#switching-distributions' aria-label='Anchor'></a><span class='plain-code'>Switching Distributions</span></h2>\n<p>Switch distributions (and switching back!) is easy.</p>\n\n<p>The first change is to replace <code>-slim</code> with <code>-alpine</code> in <code>FROM</code> statements in your <code>Dockerfile</code>.</p>\n\n<p>Next is to replace <code>apt-get update</code> with <code>apk update</code> and <code>apt-get install</code> with <code>apk add</code>.  Delete any options you may have like <code>-y</code> and <code>--no-install-recommends</code> - they aren’t needed.</p>\n\n<p>Now review the names of the packages you are installing.  Many are named the same.  A few are different.\nYou can use <a href='https://pkgs.alpinelinux.org/packages' title=''>alpine packages</a> to look for ones to use.  Some examples of\ndifferences:</p>\n<table class=\"ml-8 mb-8\" style=\"border-collapse: separate; border-spacing: 1rem 0\">\n<thead>\n<tr>\n  <th class=\"px-8 underline text-left\">Debian\n  <th class=\"px-8 underline text-left\">Alpine\n</tr>\n</thead>\n<tbody>\n<tr>\n  <td>build-essential\n  <td>build-base\n</tr>\n<tr>\n  <td>chromium-sandbox\n  <td>chromium-chromedriver\n</tr>\n<tr>\n  <td>default-libmysqlclient-dev\n  <td>mysql-client\n</tr>\n<tr>\n  <td>default-mysqlclient\n  <td>mysql-client\n</tr>\n<tr>\n  <td>freedts-bin\n  <td>freedts\n</tr>\n<tr>\n  <td>libicu-dev\n  <td>icu-dev\n</tr>\n<tr>\n  <td>libjemalloc\n  <td>jemalloc-dev\n</tr>\n<tr>\n  <td>libjpeg-dev\n  <td>jpeg-dev\n</tr>\n<tr>\n  <td>libmagickwand-dev\n  <td>imagemagick-libs\n</tr>\n<tr>\n  <td>libsqlite3-0\n  <td>sqlite-dev\n</tr>\n<tr>\n  <td>libtiff-dev\n  <td>tiff-dev\n</tr>\n<tr>\n  <td>libvips\n  <td>vips-dev\n</tr>\n<tr>\n  <td>node-gyp\n  <td>gyp\n</tr>\n<tr>\n  <td>pkg-config\n  <td>pkgconfig\n</tr>\n<tr>\n  <td>python\n  <td>python3\n</tr>\n<tr>\n  <td>python-is-python3\n  <td>python3\n</tr>\n<tr>\n  <td>sqlite3\n  <td>sqlite\n</tr>\n</tbody>\n</table>\n\n\n<p>Note: the above is just an approximation.  For example, while <code>libsqlite3-0</code> and <code>sqlite-dev</code> include everything\nyou need to build an application that uses sqlite3, all that is needed at runtime is <code>sqlite-lib</code>.  This relentless attention to detail contributes to smaller final image sizes.</p>\n\n<p>Note: For Bun, Node, and Rails users, knowledge of how to apply the above changes are included in recent versions of the dockerfile generators that we provide.  After all, computers are good at <code>if</code> statements:</p>\n<div class=\"highlight-wrapper group relative \">\n  <button \n    type=\"button\"\n    class=\"bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n    data-wrap-target=\"#code-7aa51nfb\"\n  >\n    <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\" stroke-linecap=\"round\" stroke-linejoin=\"round\"><g buffered-rendering=\"static\"><path d=\"M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959\" /><path d=\"M11.081 6.466L9.533 8.037l1.548 1.571\" /></g></svg>\n    <span class=\"bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950\">\n      Wrap text\n    </span>\n  </button>\n  <button \n    type=\"button\"\n    class=\"bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n    data-copy-target=\"sibling\"\n  >\n    <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\"><g buffered-rendering=\"static\"><path d=\"M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z\" /><path d=\"M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617\" /></g></svg>\n    <span class=\"bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950\">\n      Copy to clipboard\n    </span>\n  </button>\n  <div class='highlight relative group'>\n    <pre class='highlight '><code id=\"code-7aa51nfb\">bunx dockerfile --alpine\nnpx dockerfile --alpine\nbin/rails generate dockerfile --alpine\n</code></pre>\n  </div>\n</div><figure class=\"post-cta\">\n  <figcaption>\n    <h1>Choose your own Linux Distribution</h1>\n    <p>Deploy your project in a few minutes with Fly Launch. Then do more with Fly Machines.</p>\n      <a class=\"btn btn-lg\" href=\"https://fly.io/docs/\">\n        Run your entire stack near your users\n      </a>\n  </figcaption>\n  <div class=\"image-container\">\n    <img src=\"/static/images/cta-cat.webp\" srcset=\"/static/images/[email protected] 2x\" alt=\"\">\n  </div>\n</figure>\n\n<h2 id='potential-issues' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#potential-issues' aria-label='Anchor'></a><span class='plain-code'>Potential issues</span></h2>\n<p>Over time, we’ve noted a number of issues.</p>\n\n<ul>\n<li>Alpine uses <a href='https://musl.libc.org/' title=''>musl</a> for a runtime library.  Debian uses <a href='https://www.gnu.org/software/libc/' title=''>glibc</a>.  Software tested on glibc may not work as expected on musl. And there are other potential compatibility issues like <a href='https://bell-sw.com/blog/how-to-deal-with-alpine-dns-issues/' title=''>DNS</a>.\n</li><li>Debian includes both <code>adduser</code> and <code>useradd</code>.    Alpine, by default, only includes <code>adduser</code>.\nThis can be addressed by installing package like <a href='https://pkgs.alpinelinux.org/package/edge/community/armv7/shadow' title=''>shadow</a>, or switching to <code>adduser</code>.\n</li><li>Packages like <a href='https://github.com/nodenv/node-build' title=''>node-build</a> require <code>bash</code> which isn’t included by default.  Adding it back in allows <code>node-build</code> to run to completion, but the end result is that a precompiled Debian executable is installed that won’t run on Alpine.\nAn alternative is to download an <a href='https://unofficial-builds.nodejs.org/' title=''>unofficial build</a>.\n</li><li>Release candidates for Alpine may not get the same level of testing as Debian resulting in problems\nlike <a href='https://github.com/sparklemotion/sqlite3-ruby/issues/434' title=''>sqlite3-ruby not working on Alpine 3.19</a>.\nIn cases like this, stay back on previous versions of Alpine for a short while, or compile the gem for yourself.\nThese issues are temporary.\n</li><li>Some packages, like Chrome, are not available for Alpine.  Alternatives like Chromium may be necessary.\n</li></ul>\n<h2 id='conclusion' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#conclusion' aria-label='Anchor'></a><span class='plain-code'>Conclusion</span></h2>\n<p>While not as large a community as Debian, there is a substantial number of happy Alpine users.</p>\n\n<p>For the forseeable future, the default for both frameworks and there fly.io will remain Debian, but we make it easy to switch.</p>\n\n<p>Try it out!  Hopefully this blog has provided insight into what you should evaluate for before you switch.</p>",
      "image": {
        "url": "https://fly.io/blog/fly-with-alpine/assets/fly-with-alpine-thumb.webp",
        "title": null
      },
      "media": [],
      "authors": [
        {
          "name": "Fly",
          "email": null,
          "url": null
        }
      ],
      "categories": []
    },
    {
      "id": "https://fly.io/blog/fks/",
      "title": "Introducing Fly Kubernetes",
      "description": null,
      "url": "https://fly.io/blog/fks/",
      "published": "2023-12-18T00:00:00.000Z",
      "updated": "2025-05-20T16:26:25.000Z",
      "content": "<div class=\"lead\"><p>We’re Fly.io, and if you’ve been following us awhile you probably just did a double-take. We’re building a new public cloud that runs containerized applications with virtual machine isolation on our own hardware around the world. And we’ve been doing it without any K8s. Until now!</p>\n</div><div class=\"callout\"><p><strong class=\"font-semibold text-navy-950\">Update, March 2024:</strong> FKS does more stuff now, and you can read about it in <a href=\"https://fly.io/blog/fks-beta-live/\" title=\"\">Fly Kubernetes does more now</a></p>\n</div>\n<p>We’ll own it: we’ve been snarky about Kubernetes. We are, at heart, old-school Unix nerds. We’re still scandalized by <code>systemd</code>.</p>\n\n<p>To make matters more complicated, the problems we’re working on <a href='https://fly.io/blog/carving-the-scheduler-out-of-our-orchestrator/' title=''>have a lot of overlap with K8s</a>, but <a href='https://fly.io/blog/carving-the-scheduler-out-of-our-orchestrator/#numad' title=''>just enough impedance mismatch</a> that it (<a href='https://www.nomadproject.io/' title=''>or anything that looks like it</a>) is a bad fit for our own platform.</p>\n\n<p>But, come on: you never took us too seriously about K8s, right? K8s is hard for us to use, but that doesn’t mean it’s not a great fit for what you’re building. We’ve been clear about that all along, right? Sure we have!</p>\n\n<p>Well, good news, everybody! If K8s is important for your project, and that’s all that’s been holding you back from <a href='https://fly.io/docs/speedrun/' title=''>trying out Fly.io</a>, we’ve spent the past several months building something for you.</p>\n<h2 id='fly-io-for-kubernetians' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#fly-io-for-kubernetians' aria-label='Anchor'></a><span class='plain-code'>Fly.io For Kubernetians</span></h2>\n<p>Fly.io works by transmogrifying Docker containers into filesystems for <a href='https://firecracker-microvm.github.io/' title=''>lightweight hypervisors</a>, and running them on servers we rack in dozens of regions around the world.</p>\n\n<p>You can build something like Fly.io with “standard” orchestration tools like K8s. In fact, that’s what we did to start, too. To keep things simple, we used Nomad, and instead of K8s CNIs, we built our own Rust-based TLS-terminating Anycast proxy (and designed a WireGuard/IPv6-based private network system <a href='https://fly.io/blog/bpf-xdp-packet-filters-and-udp/' title=''>based on eBPF</a>). But the ideas are the same.</p>\n\n<p>The way we look at it, the signature feature of a “standard” orchestrator is the global scheduler: the global eye in the sky that keeps track of vacancies on servers and optimized placement of new workloads. That’s the problem we ran into. We’re running over 200,000 applications, and we’re doing so on every continent except Antarctica. The speed of light (and a globally distributed network of backhoes) has something to say about keeping a perfectly consistent global picture of hundreds of thousands of applications, and it’s not pleasant.</p>\n\n<p>The other problem we ran into is that our Nomad scheduler kept trying to outsmart us, and, worse, our customers. It turns out that our users have pretty firm ideas of where they’d like their apps to run. If they ask for São Paulo, they want São Paulo, not Rio. But global schedulers have other priorities, like optimally bin-packing resources, and sometimes <code>GIG</code> looks just as good as <code>GRU</code> to them.</p>\n\n<p>To escape the scaling and DX problems we were hitting, we rethought orchestration. Where orchestrators like K8s tend to work through distributed consensus, we keep state local to workers. Each racked server in our fleet is a source of truth about the apps running on it, and provide an API to a market-style “scheduler” that bids on resources in regions. <a href='https://fly.io/blog/carving-the-scheduler-out-of-our-orchestrator/#numad' title=''>You can read more about here, if you’re interested.</a> We call this system the <a href='https://fly.io/docs/machines/' title=''>Fly Machines API.</a></p>\n\n<p>An important detail to grok about how this all works – a reason we haven’t, like, beaten the CAP theorem by doing this – is that Fly Machines API calls can fail. If Nomad or K8s tries to place a workload on some server, only to find out that it’s filled up or thrown a rod, it will go hunt around for some other place to put it, like a good little robot. The Machines API won’t do this. It’ll just fail the request. In fact, it goes out of its way to fail the request quickly, to deliver feedback; if we can’t schedule work in <code>JNB</code> right now, you might want instead to quickly deploy to <code>BOM</code>.</p>\n<h2 id='pluggable-orchestration-and-fks' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#pluggable-orchestration-and-fks' aria-label='Anchor'></a><span class='plain-code'>Pluggable Orchestration and FKS</span></h2>\n<p>In a real sense what we’ve done here is extract a chunk of the scheduling problem out of our orchestrator, and handed it off to other components. For most of our users, that component is <a href='https://github.com/superfly/flyctl' title=''><code>flyctl</code>, our intrepid CLI</a>.</p>\n\n<p>But <a href='https://fly.io/docs/machines/working-with-machines/' title=''>Fly Machines is an API</a>, and anything can drive it. A lot of our users want quick answers to requests to schedule apps in specific regions, and <code>flyctl</code> does a fine job of that. But it’s totally reasonable to want something that works more like the good little robots inside of K8s.</p>\n\n<p>You can build your own orchestrator with our API, but if what you’re looking for is literally Kubernetes, we’ve saved you the trouble. It’s called Fly Kubernetes, or FKS for short.</p>\n\n<p>FKS is an implementation of Kubernetes that runs on top of Fly.io. You start it up using <code>flyctl</code>, by running <code>flyctl ext k8s create</code>.</p>\n\n<p>Under the hood, FKS is a straightforward combination of two well-known Kubernetes projects: <a href='https://k3s.io/' title=''>K3s, the lightweight CNCF-certified K8s distro</a>, and <a href='https://virtual-kubelet.io/' title=''>Virtual Kubelet</a>.</p>\n\n<p>Virtual Kubelet is interesting. In K8s-land, a <code>kubelet</code> is a host agent; it’s the thing that runs on every server in your fleet that knows how to run a K8s Pod. Virtual Kubelet isn’t a host agent; it’s a software component that pretends to be a host, registering itself with K8s as if it was one, but then sneakily proxying the Kubelet API elsewhere.</p>\n\n<p>In FKS, “elsewhere” is <a href='https://fly.io/docs/machines/' title=''>Fly Machines</a>. All we have to do is satisfy various APIs that virtual kubelet exposes. For example, the API for the lifecycle of a pod:</p>\n<div class=\"highlight-wrapper group relative \">\n  <button \n    type=\"button\"\n    class=\"bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n    data-wrap-target=\"#code-z4irn8cz\"\n  >\n    <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\" stroke-linecap=\"round\" stroke-linejoin=\"round\"><g buffered-rendering=\"static\"><path d=\"M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959\" /><path d=\"M11.081 6.466L9.533 8.037l1.548 1.571\" /></g></svg>\n    <span class=\"bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950\">\n      Wrap text\n    </span>\n  </button>\n  <button \n    type=\"button\"\n    class=\"bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n    data-copy-target=\"sibling\"\n  >\n    <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\"><g buffered-rendering=\"static\"><path d=\"M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z\" /><path d=\"M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617\" /></g></svg>\n    <span class=\"bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950\">\n      Copy to clipboard\n    </span>\n  </button>\n  <div class='highlight relative group'>\n    <pre class='highlight '><code id=\"code-z4irn8cz\">type PodLifecycleHandler interface {\n    CreatePod(ctx context.Context, pod *corev1.Pod) error\n    UpdatePod(ctx context.Context, pod *corev1.Pod) error\n    DeletePod(ctx context.Context, pod *corev1.Pod) error\n    GetPod(ctx context.Context, namespace, name string) (*corev1.Pod, error)\n    GetPodStatus(ctx context.Context, namespace, name string) (*corev1.PodStatus, error)\n    GetPods(context.Context) ([]*corev1.Pod, error)\n}\n</code></pre>\n  </div>\n</div>\n<p>This interface is easy to map to the Fly Machines API. For example:</p>\n<div class=\"highlight-wrapper group relative \">\n  <button \n    type=\"button\"\n    class=\"bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n    data-wrap-target=\"#code-pjbdlk52\"\n  >\n    <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\" stroke-linecap=\"round\" stroke-linejoin=\"round\"><g buffered-rendering=\"static\"><path d=\"M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959\" /><path d=\"M11.081 6.466L9.533 8.037l1.548 1.571\" /></g></svg>\n    <span class=\"bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950\">\n      Wrap text\n    </span>\n  </button>\n  <button \n    type=\"button\"\n    class=\"bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n    data-copy-target=\"sibling\"\n  >\n    <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\"><g buffered-rendering=\"static\"><path d=\"M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z\" /><path d=\"M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617\" /></g></svg>\n    <span class=\"bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950\">\n      Copy to clipboard\n    </span>\n  </button>\n  <div class='highlight relative group'>\n    <pre class='highlight '><code id=\"code-pjbdlk52\">CreatePod -> POST /apps/{app_name}/machines\nUpdatePod -> POST /apps/{app_name}/machines/{machine_id}\n</code></pre>\n  </div>\n</div>\n<p>K3s, meanwhile, is a stripped-down implementation of all of K8s that fits into a single binary. K3s does a bunch of clever things to be as streamlined as it is, but the most notable of them is <a href='https://github.com/k3s-io/kine' title=''>kine, an API shim that switches <code>etcd</code> out with databases like SQLite</a>. Because of <code>kine</code>, K3s can manage multiple servers, but also gracefully runs on a single server, without distributed state.</p>\n\n<p>So that’s what we do. When you create a cluster, we run K3s and the Virtual Kubelet on a single Fly Machine. We compile a <a href='https://kubernetes.io/docs/concepts/configuration/organize-cluster-access-kubeconfig/' title=''>kubeconfig</a>, with which you can talk to your K3s via <code>kubectl</code>. We set the whole thing up to run Pods on individual Fly Machines, so your cluster scales out directly using our platform, but with K8s tooling.</p>\n\n<p>One thing we like about this design is how much of the lifting is already done for us by the underlying platform. If you’re a K8s person, take a second to think of all the different components you’re dealing with: <a href='https://etcd.io/' title=''>etcd</a>, specifically provisioned nodes, the <a href='https://kubernetes.io/docs/reference/command-line-tools-reference/kube-proxy/' title=''>kube-proxy</a>, <a href='https://github.com/flannel-io/flannel' title=''>a CNI </a>binary and configuration and its integration with the host network, containerd, registries. But Fly.io already does most of those things. So this project was mostly chipping away components until we found the bare minimum: CoreDNS, SQLite persistence, and Virtual Kubelet.</p>\n\n<p>We ended up with something significantly simpler than K3s, which is saying something.</p>\n\n<p>Fly Kubernetes has some advantages over plain <code>flyctl</code> and <code>fly.toml</code>:</p>\n\n<ul>\n<li>Your deployment is more declarative than it is with the <code>fly.toml</code> file. You declare the exact state of everything down to replica counts, autoscaling rules, volume definitions, and more.\n</li><li>When you deploy with Fly Kubernetes, Kubernetes will automatically make your definitions match the state of the world. Machines go down? Kubernetes will whack them back online.\n</li></ul>\n\n<p>This is a different way to do orchestration and scheduling on Fly.io. It’s not what everyone is going to want. But if you want it, you really want it, and we’re psyched to give it to you: Fly.io’s platform features, with Kubernetes handling configuration and driving your system to its desired state.</p>\n\n<p>We’ve kept things simple to start with. There are K8s use cases we’re a strong fit for today, and others we’ll get better at in the near future, as K8s users drive the underlying platform (and particularly our proxy) forward.</p>\n\n<p><strong class='font-semibold text-navy-950'>Interested in getting early access? Email us at <a href=\"mailto:[email protected]\">[email protected]</a> and we’ll hook you up.</strong></p>\n<figure class=\"post-cta\">\n  <figcaption>\n    <h1>Not invested in K8s?</h1>\n    <p>Nothing has to change for you! You can deploy apps on Fly.io today, in a matter of minutes, without talking to Sales.</p>\n      <a class=\"btn btn-lg\" href=\"https://fly.io/docs/speedrun/\">\n        Deploy an app in minutes.<span class='opacity-50'>→</span>\n      </a>\n  </figcaption>\n  <div class=\"image-container\">\n    <img src=\"/static/images/cta-turtle.webp\" srcset=\"/static/images/[email protected] 2x\" alt=\"\">\n  </div>\n</figure>\n\n<div class=\"youtube-container\" data-exclude-render>\n  <div class=\"youtube-video\">\n    <iframe\n      width=\"100%\"\n      height=\"100%\"\n      src=\"https://www.youtube.com/embed/A3vFfZvUiwo\"\n      frameborder=\"0\"\n      allow=\"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture\"\n      allowfullscreen>\n    </iframe>\n  </div>\n</div>\n\n<h2 id='what-it-all-means' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#what-it-all-means' aria-label='Anchor'></a><span class='plain-code'>What It All Means</span></h2>\n<p>One obvious thing it means is that you’ve got an investment in Kubernetes tooling, you can keep it while running things on top of Fly.io. So that’s pretty neat. Buy our cereal!</p>\n\n<p>But the computer science story is interesting, too. We placed a bet on an idiosyncratic strategy for doing global orchestration. We replaced global consensus, which is how Borg, Kubernetes, and Nomad all work, with a market-based system. That system was faster and, importantly, dumber than the consensus system it replaced.</p>\n\n<p>This had costs! Nomad’s global consensus would do truly heroic amounts of work to make sure Fly Apps got scheduled somewhere, anywhere. Like a good capitalist, Fly Machines will tell you in no uncertain terms how much work it’s willing to do for you (“less than a Nomad”).</p>\n\n<p>But that doesn’t mean you’re stuck with the answers Fly Machines gives by itself. Because Fly Machines is so simple, and tries so hard to be predictable, we hoped you’d be able to build more sophisticated scheduling and orchestration schemes on top of it. And here you go: Kubernetes scheduling, as a plugin to the platform.</p>\n\n<p>More to come! We’re itching to see just how many different ways this bet might pay off. Or: we’ll perish in flames! Either way, it’ll be fun to watch.</p>",
      "image": {
        "url": "https://fly.io/blog/fks/assets/fks-thumb.webp",
        "title": null
      },
      "media": [],
      "authors": [
        {
          "name": "Fly",
          "email": null,
          "url": null
        }
      ],
      "categories": []
    },
    {
      "id": "https://fly.io/blog/fly-io-has-gpus-now/",
      "title": "Fly.io has GPUs now",
      "description": null,
      "url": "https://fly.io/blog/fly-io-has-gpus-now/",
      "published": "2023-12-13T00:00:00.000Z",
      "updated": "2025-05-20T16:26:25.000Z",
      "content": "<div class=\"lead\"><p>We’re Fly.io, we’re a new public cloud that lets you put your compute where it matters: near your users. Today we’re announcing that you can do this with GPUs too, allowing you to do AI workloads on the edge. Want to find out more? Keep reading.</p>\n</div><h2 id='ai-is-pretty-fly' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#ai-is-pretty-fly' aria-label='Anchor'></a><span class='plain-code'>AI is pretty fly</span></h2>\n<p>AI is apparently a bit of a <em>thing</em> (maybe even <em>an thing</em> come to think about it). We’ve seen entire industries get transformed in the wake of ChatGPT existing (somehow it’s only been around for a year, I can’t believe it either). It’s likely to leave a huge impact on society as a whole in the same way that the Internet did once we got search engines. Like any good venture-capital funded infrastructure provider, we want to enable you to do hilarious things with AI using industrial-grade muscle.</p>\n\n<p>Fly.io lets you run a full-stack app—or an entire dev platform based on the <a href='https://fly.io/docs/machines/' title=''>Fly Machines API</a>—close to your users. Fly.io GPUs let you attach an <a href='https://www.nvidia.com/en-us/data-center/a100/' title=''>Nvidia A100</a> to whatever you’re building, harnessing the full power of CUDA with more VRAM than your local 4090 can shake a ray-traced stick at. With these cards (or whatever you call a GPU attached to SXM fabric), AI/ML workloads are at your fingertips. You can <a href='https://fly.io/blog/transcribing-on-fly-gpu-machines/' title=''>recognize speech</a>, segment text, summarize articles, synthesize images, and more at speeds that would make your homelab blush. You can even set one up as your programming companion with <a href='https://github.com/deepseek-ai/DeepSeek-Coder' title=''>your model of choice</a> in case you’ve just not been feeling it with the output of <em>other</em> models changing over time.</p>\n\n<p>If you want to find out more about what these cards are and what using them is like, check out <a href='https://fly.io/blog/what-are-these-gpus-really/' title=''>What are these “GPUs” really?</a> It covers the history of GPUs and why it’s ironic that the cards we offer are called “Graphics Processing Units” in the first place.</p>\n<h2 id='fly-io-gpus-in-action' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#fly-io-gpus-in-action' aria-label='Anchor'></a><span class='plain-code'>Fly.io GPUs in Action</span></h2>\n<p>We want you to deploy your own code with your favorite models on top of Fly.io’s cloud backbone. Fly.io GPUs make this really easy.</p>\n\n<p>You can get a GPU app running <a href='https://ollama.ai' title=''>Ollama</a> (our friends in text generation) in two steps:</p>\n\n<ol>\n<li><p>Put this in your <code>fly.toml</code>:</p>\n<div class=\"highlight-wrapper group relative toml\">\n  <button \n    type=\"button\"\n    class=\"bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n    data-wrap-target=\"#code-u2sd0gfo\"\n  >\n    <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\" stroke-linecap=\"round\" stroke-linejoin=\"round\"><g buffered-rendering=\"static\"><path d=\"M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959\" /><path d=\"M11.081 6.466L9.533 8.037l1.548 1.571\" /></g></svg>\n    <span class=\"bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950\">\n      Wrap text\n    </span>\n  </button>\n  <button \n    type=\"button\"\n    class=\"bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n    data-copy-target=\"sibling\"\n  >\n    <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\"><g buffered-rendering=\"static\"><path d=\"M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z\" /><path d=\"M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617\" /></g></svg>\n    <span class=\"bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950\">\n      Copy to clipboard\n    </span>\n  </button>\n  <div class='highlight relative group'>\n    <pre class='highlight '><code id=\"code-u2sd0gfo\"><span class=\"py\">app</span> <span class=\"p\">=</span> <span class=\"s\">\"sandwich_ai\"</span>\n<span class=\"py\">primary_region</span> <span class=\"p\">=</span> <span class=\"s\">\"ord\"</span>\n<span class=\"py\">vm.size</span> <span class=\"p\">=</span> <span class=\"s\">\"a100-40gb\"</span>\n\n<span class=\"nn\">[build]</span>\n  <span class=\"py\">image</span> <span class=\"p\">=</span> <span class=\"s\">\"ollama/ollama\"</span>\n\n<span class=\"nn\">[mounts]</span>\n  <span class=\"py\">source</span> <span class=\"p\">=</span> <span class=\"s\">\"models\"</span>\n  <span class=\"py\">destination</span> <span class=\"p\">=</span> <span class=\"s\">\"/root/.ollama\"</span>\n  <span class=\"py\">initial_size</span> <span class=\"p\">=</span> <span class=\"s\">\"100gb\"</span>\n</code></pre>\n  </div>\n</div></li><li><p>Run <code>fly apps create sandwich_ai && fly deploy</code>.</p>\n</li></ol>\n\n<p>If you want to read more about how to start your new sandwich empire, check out <a href='https://fly.io/blog/scaling-llm-ollama/' title=''>Scaling Large Language Models to zero with Ollama</a>, it explains how to set up Ollama so that it <em>automatically scales itself down</em> when it’s not in use.</p>\n<h2 id='the-speed-of-light-is-only-so-fast' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#the-speed-of-light-is-only-so-fast' aria-label='Anchor'></a><span class='plain-code'>The speed of light is only so fast</span></h2>\n<p>Being able to spin up GPUs is great, but where Fly.io really shines is inference at the edge.</p>\n\n<p>Let’s say you have an app that lets users enter ingredients they have in their kitchen and receive a sandwich recipe. Your users expect their recipes <em>instantly</em> (or at least as fast as the other leading apps). Seconds count when you need an emergency sandwich.</p>\n<div class=\"left-sidenote\"><p><br>\n<br>\n<br>\nIt’s depressingly customary in the AI industry to cherry-pick outputs. This was not cherry-picked. I used <a href=\"https://ollama.ai/library/yi:34b\" title=\"\">yi:34b</a> to generate this recipe. I’m not sure what a taco salad sandwich is, but I might be willing to try it.</p>\n</div>\n<p><img alt=\"A conversation between a user and an artificial intelligence. The user asks: \"What kind of sandwich can I make with an egg, a taco shell, and a pound of ground beef?\" The AI responds: \" You can make a breakfast taco or a taco salad sandwich with the ingredients you have. Here's how to prepare each: for the breakfast taco. Cook the pound of ground beef in a pan until browned. Season with salt, pepper, and any other desired spices like chili powder or taco seasoning. While the beef is cooking, fry an egg (or scramble it if you prefer) in another pan. Warm up the taco shell in the oven or microwave according to the package instructions. Assemble the taco by placing the ground cooked beef in the taco shell followed by the fried egg. You can also add any additional toppings like diced onions avocados, avocado, salsa or cheese if you have them. Serve with a side of hot sauce or sour cream if desired. For the taco salad sandwich: Prepare the ground beef as in the breakfast taco recipe above. Instead of frying an egg, you can cook it sunny side up or over easy depending on your preference. Crumble the taco shell into small pieces and mix them with the cooked ground beef. Assemble the sandwich by layering the crumbled taco shell mixture and ground beef in two slices of bread. Top with the fried egg and any additional toppings you like such as shredded lettuce, diced tomatoes, or sliced olives. Serve with a side of salsa or ranch dressing if desired. Both options are hardy and can be easily customized based on your taste preferences. Enjoy!\"\" src=\"/blog/fly-io-has-gpus-now/assets/sandwich-recipes.webp\" /></p>\n\n<p>In the previous snippet, we deployed our app to ord (<code>primary_region = \"ord\"</code>). The good news is that our model returns a result really quickly and users in Chicago get instant sandwich recipes. It’s a good experience for users near your datacentre, and you can do this on any half decent cloud provider.</p>\n\n<p>But surely people outside of Chicago need sandwiches too. Amsterdam has sandwich fiends as well. And sometimes it takes too long to have their requests leap across the pond. The speed of light is only so fast after all. Don’t worry, we’ve got your back. Fly.io has GPUs in datacentres all over the world. Even more, we’ll let you run <em>the same program</em> with the same public IP address and the same TLS certificates in any regions with GPU support.</p>\n\n<p>Don’t believe us? See how you can scale your app up in Amsterdam with one command:</p>\n<div class=\"highlight-wrapper group relative \">\n  <button \n    type=\"button\"\n    class=\"bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n    data-wrap-target=\"#code-ltu09wf5\"\n  >\n    <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\" stroke-linecap=\"round\" stroke-linejoin=\"round\"><g buffered-rendering=\"static\"><path d=\"M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959\" /><path d=\"M11.081 6.466L9.533 8.037l1.548 1.571\" /></g></svg>\n    <span class=\"bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950\">\n      Wrap text\n    </span>\n  </button>\n  <button \n    type=\"button\"\n    class=\"bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n    data-copy-target=\"sibling\"\n  >\n    <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\"><g buffered-rendering=\"static\"><path d=\"M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z\" /><path d=\"M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617\" /></g></svg>\n    <span class=\"bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950\">\n      Copy to clipboard\n    </span>\n  </button>\n  <div class='highlight relative group'>\n    <pre class='highlight '><code id=\"code-ltu09wf5\">fly scale count 2 --region ams\n</code></pre>\n  </div>\n</div>\n<p>It’s that easy.</p>\n<h2 id='actually-on-demand' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#actually-on-demand' aria-label='Anchor'></a><span class='plain-code'>Actually On-Demand</span></h2>\n<p>GPUs are powerful parallel processing packages, but they’re not cheap! Once we have enough people wanting to turn their fridge contents into tasty sandwiches, keeping a GPU or two running makes sense. But we’re just a small app still growing our user base while also funding the latest large sandwich model research. We want to only pay for GPUs when a user makes a request.</p>\n\n<p>Let’s open up that <code>fly.toml</code> again, and add a section called <code>services</code>, and we’ll include instructions on how we want our app to scale up and down:</p>\n<div class=\"highlight-wrapper group relative toml\">\n  <button \n    type=\"button\"\n    class=\"bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n    data-wrap-target=\"#code-yf7bfopz\"\n  >\n    <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\" stroke-linecap=\"round\" stroke-linejoin=\"round\"><g buffered-rendering=\"static\"><path d=\"M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959\" /><path d=\"M11.081 6.466L9.533 8.037l1.548 1.571\" /></g></svg>\n    <span class=\"bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950\">\n      Wrap text\n    </span>\n  </button>\n  <button \n    type=\"button\"\n    class=\"bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n    data-copy-target=\"sibling\"\n  >\n    <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\"><g buffered-rendering=\"static\"><path d=\"M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z\" /><path d=\"M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617\" /></g></svg>\n    <span class=\"bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950\">\n      Copy to clipboard\n    </span>\n  </button>\n  <div class='highlight relative group'>\n    <pre class='highlight '><code id=\"code-yf7bfopz\"><span class=\"nn\">[[services]]</span>\n  <span class=\"py\">internal_port</span> <span class=\"p\">=</span> <span class=\"mi\">8080</span>\n  <span class=\"py\">protocol</span> <span class=\"p\">=</span> <span class=\"s\">\"tcp\"</span>\n  <span class=\"py\">auto_stop_machines</span> <span class=\"p\">=</span> <span class=\"kc\">true</span>\n  <span class=\"py\">auto_start_machines</span> <span class=\"p\">=</span> <span class=\"kc\">true</span>\n  <span class=\"py\">min_machines_running</span> <span class=\"p\">=</span> <span class=\"mi\">0</span>\n</code></pre>\n  </div>\n</div>\n<p>Now when no one needs sandwich recipes, you don’t pay for GPU time.</p>\n<h2 id='the-deets' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#the-deets' aria-label='Anchor'></a><span class='plain-code'>The Deets</span></h2>\n<p>We have GPUs ready to use in several US and EU regions and Sydney. You can deploy your sandwich, music generation, or AI illustration apps to:</p>\n\n<ul>\n<li><a href='https://www.nvidia.com/en-us/data-center/a100/' title=''>Ampere A100s</a> with 40gb of RAM for $2.50/hr\n</li><li><a href='https://www.nvidia.com/en-us/data-center/a100/' title=''>Ampere A100s</a> with 80gb of RAM for $3.50/hr\n</li><li><a href='https://resources.nvidia.com/en-us-l40s/l40s-datasheet-28413' title=''>Lovelace L40s</a> are coming soon (update: now here!) for $2.50/hr\n</li></ul>\n\n<p>By default, anything you deploy to GPUs will use eight heckin’ <a href='https://www.amd.com/en/processors/epyc-server-cpu-family' title=''>AMD EPYC</a> CPU cores, and you can attach volumes up to 500 gigabytes. We’ll even give you discounts for reserved instances and dedicated hosts if you ask nicely.</p>\n\n<p>We hope you have fun with these new cards and we’d love to see what you can do with them! Reach out to us on X (formerly Twitter) or <a href='https://community.fly.io/' title=''>the community forum</a> and share what you’ve been up to. We’d love to see what we can make easier!</p>",
      "image": {
        "url": "https://fly.io/blog/fly-io-has-gpus-now/assets/llama-portal-thumb.webp",
        "title": null
      },
      "media": [],
      "authors": [
        {
          "name": "Fly",
          "email": null,
          "url": null
        }
      ],
      "categories": []
    },
    {
      "id": "https://fly.io/blog/what-are-these-gpus-really/",
      "title": "What are these \"GPUs\" really?",
      "description": null,
      "url": "https://fly.io/blog/what-are-these-gpus-really/",
      "published": "2023-12-11T00:00:00.000Z",
      "updated": "2025-05-20T16:26:25.000Z",
      "content": "<div class=\"lead\"><p>Fly.io runs containerized apps with virtual machine isolation on our own hardware around the world, so you can safely run your code close to where your users are. We’re in the process of rolling out GPU support, and that’s what this post is about, but you don’t have to wait for that to try us out: <a href=\"https://fly.io/docs/speedrun/\" title=\"\">your app can be up and running on us in minutes</a>.</p>\n</div>\n<p>GPU hardware will let our users run all sorts of fun Artificial Intelligence and Machine Learning (AI/ML) workloads near their users. But, what are these “GPUs” really? What can they do? What <em>can’t</em> they do?</p>\n\n<p>Listen here for my tale of woe as I spell out exactly what these cards are, are not, and what you can do with them. By the end of this magical journey, you should understand the true irony of them being called “Graphics Processing Units” and why every marketing term is always bad forever.</p>\n<h2 id='how-does-computer-formed' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#how-does-computer-formed' aria-label='Anchor'></a><span class='plain-code'>How does computer formed?</span></h2>\n<p>In the early days of computing, your computer generally had a few basic components:</p>\n\n<ul>\n<li>The CPU\n</li><li>Input device and assorted peripherals (keyboard, etc)\n</li><li>Output device (monitor, printer, etc)\n</li><li>Memory\n</li><li>Glue logic chips\n</li><li>Video rendering hardware\n</li></ul>\n\n<p>Taking the Commodore 64 as an example, it had a CPU, a chip to handle video output, a chip to handle audio output, and a chip to glue everything together. The CPU would read instructions from the RAM and then execute them to do things like draw to the screen, solve sudoku puzzles, play sounds, and so on.</p>\n\n<p>However, even though the CPU by itself was fast by the standards of the time, it could only do a million clock cycles per second or so. Imagine a very small shouting crystal vibrating millions of times per second triggering the CPU to do one part of a task and you’ll get the idea. This is fast, but not fast enough when executing instructions can take longer than a single clock cycle and when your video output device needs to be updated 60 times per second.</p>\n\n<p>The main way they optimized this was by shunting a lot of the video output tasks to a bespoke device called the VIC-II (Video Interface Chip, version 2). This allowed the Commodore 64 to send a bunch of instructions to the VIC-II and then let it do its thing while the CPU was off doing other things. This is called “offloading”.</p>\n\n<p><img src=\"/blog/what-are-these-gpus-really/assets/./deus-ex-machina-cover.webp\" /></p>\n\n<p>As technology advanced, the desire to do bigger and better things with both contemporary and future hardware increased. This came to a head when this little studio nobody had ever heard of called id Software released one of the most popular games of all time: DOOM.</p>\n\n<p>Now, even though DOOM was a huge advancement in gaming technology, it was still incredibly limited by the hardware of the time. It was actually a 2D game that used a lot of tricks to make it look (and feel) like it was 3D. It was also limited to a resolution of 320x200 and a hard cap of 35 frames per second. This was fine for the time (most movies were only at 24 frames per second), but it was clear that there was a lot of room for improvement.</p>\n\n<p>One of the main things that DOOM did was to use a pair of techniques to draw the world at near real-time. It used a combination of “raycasting” and binary-space partitioning to draw the world. This basically means that they drew a bunch of imaginary lines to where points in the map would be to figure out what color everything would be and then eliminated the parts of the map that were behind walls and other objects. This is a very simplified explanation, and if you want to know more, <a href='https://fabiensanglard.net/doomIphone/doomClassicRenderer.php' title=''>Fabien Sanglard explains the rendering</a> of DOOM in more detail.</p>\n<h2 id='the-dream-of-3d' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#the-dream-of-3d' aria-label='Anchor'></a><span class='plain-code'>The dream of 3D</span></h2>\n<p>However, a lot of this was logic that ran very slowly on the CPU, and while the CPU was doing the display logic, it couldn’t do anything else, such as enemy AI or playing sounds. Hence the idea of a “3D accelerator card”. The idea: offload the 3D rendering logic to a separate device that could do it much faster than the CPU could, and free the CPU to do other things like AI, sound, and so on.</p>\n\n<p>This was the dream, but it was a long way off. Then Quake happened.</p>\n<div class=\"right-sidenote\"><p>Really, Half-Life is based on Quake so much that the pattern for <a href=\"https://www.pcgamer.com/half-life-alyxs-lights-flicker-just-like-they-did-in-quake-almost-25-years-later/\" title=\"\">blinking lights</a> has carried forward 25 years later to Half-Life: Alyx in VR. If it ain’t broke, don’t fix it.</p>\n</div>\n<p>Unlike Doom, Quake was fully 3D on unmodified consumer hardware. Players could look up and down (something previously thought impossible without accelerator hardware!) and designers could make levels with that in mind. Quake also allowed much more complex geometry and textures. It was a huge leap forward in 3D gaming and it was only possible because of the massive leap in CPU power at the time. The Pentium family of processors was such a huge leap that it allowed them to bust through and do it in “real time”. Quake has since set the standard for multiplayer deathmatch games, and its source code has lineage to Call of Duty, Half-Life, Half-Life 2, DotA 2, Titanfall, and Apex Legends.</p>\n\n<p>However, the thing that really made 3D accelerator cards leap into the public spotlight was another little-known studio called Crystal Dynamics and their 1996 release of Tomb Raider. It was built from the ground up to require the use of 3D accelerator cards. The cards flew off the shelves.</p>\n\n<p>“3D accelerator cards” would later become known as “Graphics Processing Units” or GPUs because of how synonymous they became with 3D gaming, engineering tasks such as Computer-Aided Drafting (CAD), and even the entire OS environment with compositors like <a href='https://en.wikipedia.org/wiki/Desktop_Window_Manager' title=''>DWM</a> on Windows Vista, <a href='https://en.wikipedia.org/wiki/Compiz' title=''>Compiz</a> on GNU+Linux, and <a href='https://en.wikipedia.org/wiki/Quartz_(graphics_layer)' title=''>Quartz</a> on macOS. Things became so much easier for everyone when 2D and 3D graphics were integrated into the same device so you didn’t need to chain your output through your 3D accelerator card!</p>\n<h2 id='the-gpu-as-we-know-it' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#the-gpu-as-we-know-it' aria-label='Anchor'></a><span class='plain-code'>The GPU as we know it</span></h2>\n<p>When GPUs first came out, they were very simple devices. They had a few basic components:</p>\n\n<ul>\n<li>A framebuffer to store the current state of the screen\n</li><li>A command processor to take instructions from the game and translate them into something the hardware can understand\n</li><li>Memory to store temporary data\n</li><li>Shader processing hardware to allow designers to change how light and textures were rendered\n</li><li>A display output that was chained through an existing VGA card so that the user could see what was going on in real time (yes, this is something we actually did)\n</li></ul>\n\n<p>This basic architecture has remained the same for the past 20 years or so. The main differences are that as technology advanced, the capabilities of those cards increased. They got faster, more parallel, more capable, had more memory, were made cheaper, and so on. This gradually allowed for more and more complex games like Half-Life 2, Crysis, The Legend of Zelda: Breath of the Wild, Baudur’s Gate 3, and so on.</p>\n\n<p>Over time, as more and more hardware was added, GPUs became computers in their own rights (sometimes even bigger than the rest of the computer thanks for the need to cool things more aggressively). This new hardware includes:</p>\n\n<ul>\n<li>Video encoding hardware via NVENC and AMD VCE so that content creators can stream and record their gameplay in higher quality without having to impact the performance of the game\n</li><li><aside class=\"left-sidenote\">Seriously, once you experience high framerate HDR raytraced Tetris you can’t really go back to the old way.</aside>  Raytracing accelerator cores via RTX so that light can be rendered more realistically\n</li><li>AI/ML cores to allow for dynamic upscaling to eke out more performance from the card\n</li><li>Display output hardware to allow for multiple monitors to be connected to the card\n</li><li>Faster and faster memory buses and interfaces to the rest of the system to allow for more data to be processed faster\n</li><li>Direct streaming from the drive to GPU memory to allow for faster loading times\n</li></ul>\n\n<p>But, at the same time, that AI/ML hardware started to get noticed by more and more people. It was discovered that the shader cores and then the CUDA cores could be used to do AI/ML workloads at ludicrous speeds. This enabled research and development of models like GPT-2, Stable Diffusion, DLSS, and so on. This has led to a Cambrian Explosion of AI/ML research and development that is continuing to this day.</p>\n<h2 id='the-quot-gpus-quot-that-fly-io-is-using' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#the-quot-gpus-quot-that-fly-io-is-using' aria-label='Anchor'></a><span class='plain-code'>The “GPUs” that Fly.io is using</span></h2>\n<p>I’ve mostly been describing consumer GPUs and their capabilities up to this point because that’s what we all have the biggest understanding of. There is a huge difference between the “GPUs” that you can get for server tasks and normal consumer tasks from a place like Newegg or Best Buy. The main difference is that enterprise-grade Graphics Processing Units do not have any of the hardware needed to process graphics.</p>\n<div class=\"right-sidenote\"><p>Author’s note: This will not be the case in the future. Fly.io is going to add <a href=\"https://www.nvidia.com/en-us/data-center/l40s/\" title=\"\">Lovelace L40S GPUs</a> that do have 3D rendering, video encoding, shader cores, and so on. But, that’s not what we’re talking about today.</p>\n</div>\n<p>Yes. Really. They don’t have rasterization hardware, shader cores, display outputs, or anything useful for trying to run games on them. They are AI/ML accelerator cards more than anything. It’s kinda beautifully ironic that they’re called Graphics Processing Units when they have no ability to process graphics.</p>\n<h2 id='what-can-you-do-with-them' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#what-can-you-do-with-them' aria-label='Anchor'></a><span class='plain-code'>What can you do with them?</span></h2>\n<p>These GPUs are really good at massively parallel tasks. This naturally translates to being very good at AI/ML tasks such as:</p>\n\n<ul>\n<li>Summarization (what is this article about in a few sentences?)\n</li><li>Translation (what does this article say in Spanish?)\n</li><li>Speech recognition (what is a voice clip saying?)\n</li><li>Speech synthesis (what does this text sound like?)\n</li><li>Text generation (what would a cat say if it could talk?)\n</li><li>Basic rote question and answering (what is the safe cooking temperature for chicken breasts in celsius?)\n</li><li>Text classification (is this article about cats or dogs?)\n</li><li>Sentiment analysis (is this article positive or negative, what could that mean about the companies involved?)\n</li><li>Image classification (is this a cat or a dog?)\n</li><li>Object detection (where are the cats and dogs in this image?)\n</li></ul>\n\n<p>Or any combination/chain of these tasks. A lot of this is pretty abstract building blocks that can be combined in a lot of different ways. This is why AI/ML stuff is so exciting right now. We’re in the early days of understanding what these things are, what they can do, and how to use them properly.</p>\n\n<p>Imagine being able to load articles about the topic you are researching into your queries to find where someone said something roughly similar to what you’re looking for. Queries like “that one recipe with eggs that you fold over with ham in it”. That’s the kind of thing that’s possible with AI/ML (and tools like vector databases) but difficult to impossible with traditional search engines.</p>\n<h2 id='how-to-use-ai-for-reals' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#how-to-use-ai-for-reals' aria-label='Anchor'></a><span class='plain-code'>How to use AI for reals</span></h2>\n<p>Fortunately and unfortunately, we’re in the Cambrian Explosion days of this industry. Key advances happen constantly. Exact models and tooling changes almost as often. This is both a very good thing and a very bad thing.</p>\n\n<p>If you want to get started today, here’s a few models that you can play with right now:</p>\n\n<ul>\n<li><a href='https://ai.meta.com/llama/' title=''>Llama 2</a> - A generic foundation model with instruction and chat tuned variants. It’s a good starting point for a lot of research and nearly everything else uses the same formats that Llama 2 does.\n</li><li><a href='https://openai.com/research/whisper' title=''>Whisper</a> - A speech to text model that transcribes audio files into text better than most professional dictation software. I, the author of this article, wrote most of this article using Whisper.\n</li><li><a href='https://huggingface.co/NurtureAI/OpenHermes-2.5-Mistral-7B-16k' title=''>OpenHermes-2.5 Mistral 7B 16k</a> - An instruction-tuned model that can operate on up to 16 thousand tokens (about 40 printed pages of text, 12,000 words) at once. It’s a good starting point for summarization and other tasks that require a lot of context. I personally use it for my personal AI chatbot named <a href='https://xeiaso.net/characters/#Mimi' title=''>Mimi</a>.\n</li><li><aside class=\"right-sidenote\">Seriously Annie, you’re great!</aside> <a href='https://stability.ai/stable-diffusion' title=''>Stable Diffusion XL</a> - A text-to-image model that lets you create high quality images from simple text descriptions. It’s a good starting point for tasks that require image generation, such as when you want to add images to your blog posts but don’t have an artist like Annie to draw you what you want.\n</li></ul>\n\n<p>For a practical example, imagine that you have a set of <a href='https://xeiaso.net/talks/' title=''>conference talks that you’ve given over the years</a>. You want to take those talk videos, extract the audio, and transform them into written text because some people learn better from text than video. The overall workflow would look something like this:</p>\n\n<ul>\n<li>Use ffmpeg to extract the audio track from the video files\n</li><li>Use Whisper to <a href='https://fly.io/blog/transcribing-on-fly-gpu-machines/' title=''>convert the audio files into subtitle files</a>\n</li><li>Break the subtitle file into sequences based on significant pauses between topics (humans do this subconsciously, take advantage of it and you can make things seem heckin’ magic)\n</li><li>Use a large language model to summarize the segments and create a title for each segment\n</li><li>Paste the rest of the text into a markdown document between the segment titles\n</li><li>Manually review the documents and make any necessary changes with technical terms that the model didn’t know about or things the model got wrong because English is a minefield of homophones that even trained experts have trouble with (ask me how I know)\n</li><li>Publish the documents on your blog\n</li></ul>\n\n<p>Then bam, you don’t just have a portfolio piece, you have the recipe for winning downtime from visitors of orange websites clicking on your link so much. You can also use this to create transcripts for your videos so that people who can’t hear can still enjoy your content.</p>\n\n<p>The true advantage of these is not using them as individual parts on themselves, but as a cohesive whole in a chain. This is where the real power of AI/ML comes from. It’s not the individual models, but the ability to chain them together to do something useful. This is where the true opportunities for innovation lie.</p>\n<h2 id='conclusion' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#conclusion' aria-label='Anchor'></a><span class='plain-code'>Conclusion</span></h2>\n<p>So that’s what these “GPUs” are really: they’re AI/ML accelerator cards. The A100 cards incapable of processing graphics or encoding video, but they’re really, really good at AI/ML workloads. They allow you to do way more tasks per watt than any CPU ever could.</p>\n\n<p>I hope you enjoyed this tale of woe as I spilled out the horrible truths about marketing being awful forever and gave you ideas for how to <em>actually use</em> these graphics-free Graphics Processing Units to do useful things. But sadly, not for processing graphics unless you wait for the <a href='https://www.nvidia.com/en-us/data-center/l40s/' title=''>Lovelace L40S</a> cards early in 2024.</p>\n\n<p>Sign up for Fly.io today and try our GPUs! I can’t wait to see what you build with them.</p>",
      "image": {
        "url": "https://fly.io/blog/what-are-these-gpus-really/assets/gpu-songstress-thumb.webp",
        "title": null
      },
      "media": [],
      "authors": [
        {
          "name": "Fly",
          "email": null,
          "url": null
        }
      ],
      "categories": []
    },
    {
      "id": "https://fly.io/blog/scaling-llm-ollama/",
      "title": "Scaling Large Language Models to zero with Ollama",
      "description": null,
      "url": "https://fly.io/blog/scaling-llm-ollama/",
      "published": "2023-12-06T12:00:00.000Z",
      "updated": "2025-05-20T16:26:25.000Z",
      "content": "<div class=\"lead\"><p>We’re Fly.io. We have powerful servers worldwide to run your code close to your users. Including GPUs so you can self host your own AI.</p>\n</div>\n<p>Open-source self-hosted AI tools have advanced a lot in the past 6 months. They allow you to create new methods of expression (with QR code generation and Stable Diffusion), easy access to summarization powers that would have made Google blush a decade ago (even with untuned foundation models such as LLaMa 2 and Yi), to conversational assistants that enable people to do more with their time, and to perform speech recognition in <em>real time</em> on moderate hardware (with Whisper et al). With all these capabilities comes the need for more and more raw computational muscle to be able to do inference on bigger and bigger models, and eventually do things that we can’t even imagine right now. Fly.io lets you put your compute where your users are so that you can do machine learning inference tasks on the edge with the power of enterprise-grade GPUs such as the Nvidia A100. You can also scale your GPU nodes to zero running Machines, so you only pay for what you actually need, when you need it.</p>\n<div class=\"right-sidenote\"><p>It’s worth mentioning that “scaling to zero” doesn’t mean what you may think it means. When you “scale to zero” in Fly.io, you actually stop the running Machine. This means the Machine is still laying around on the same computer box that it runs on, but it’s just put to sleep. If there is a capacity issue then your app may be unable to wake back up. We are working on a solution to this, but for now you should be aware that scaling to zero is not the same as spinning down your Machine and spinning it back up again on a new computer box when you need it.</p>\n</div><div class=\"callout\"><p>This is a continuation of the last post in this series about <a href=\"https://fly.io/blog/transcribing-on-fly-gpu-machines/\" title=\"\">how to use GPUs on Fly.io</a>.</p>\n</div><h2 id='why-scale-to-zero' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#why-scale-to-zero' aria-label='Anchor'></a><span class='plain-code'>Why scale to zero?</span></h2>\n<p>Running GPU nodes on top of Fly is expensive. Sure, GPUs enable you to do things a lot faster than CPUs ever could on their own, but you mostly will have things run idle between uses. This is where scaling to zero comes in. With scaling to zero, you can have your GPU nodes shut down when you’re not using them. When your Machine stops, you aren’t paying for the GPU any more. This is good for the environment and your wallet.</p>\n\n<p>In this post, we’re going to be using <a href='https://ollama.ai' title=''>Ollama</a> to generate text. Ollama is a fancy wrapper around <a href='https://github.com/ggerganov/llama.cpp' title=''>llama.cpp</a> that allows you to run large language models on your own hardware with your choice of model. It also supports GPU acceleration, meaning that you can use Fly.io’s huge GPUs to run your models faster than your RTX 3060 at home ever would on its own.</p>\n\n<p>One of the main downsides of using Ollama in a cloud environment is that it doesn’t have authentication by default. Thanks to the power of about 70 lines of Go, we are able to shim that in after the fact. This will protect your server from random people on the internet using your GPU time (and spending your money) to generate text and integrate it into your own applications.</p>\n\n<p>Create a new folder called <code>ollama-scale-to-0</code>:</p>\n<div class=\"highlight-wrapper group relative bash\">\n  <button \n    type=\"button\"\n    class=\"bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n    data-wrap-target=\"#code-kehwb9j1\"\n  >\n    <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\" stroke-linecap=\"round\" stroke-linejoin=\"round\"><g buffered-rendering=\"static\"><path d=\"M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959\" /><path d=\"M11.081 6.466L9.533 8.037l1.548 1.571\" /></g></svg>\n    <span class=\"bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950\">\n      Wrap text\n    </span>\n  </button>\n  <button \n    type=\"button\"\n    class=\"bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n    data-copy-target=\"sibling\"\n  >\n    <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\"><g buffered-rendering=\"static\"><path d=\"M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z\" /><path d=\"M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617\" /></g></svg>\n    <span class=\"bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950\">\n      Copy to clipboard\n    </span>\n  </button>\n  <div class='highlight relative group'>\n    <pre class='highlight '><code id=\"code-kehwb9j1\"><span class=\"nb\">mkdir </span>ollama-scale-to-0\n</code></pre>\n  </div>\n</div><h2 id='fly-app-setup' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#fly-app-setup' aria-label='Anchor'></a><span class='plain-code'>Fly app setup</span></h2>\n<p>First, we need to create a new Fly app:</p>\n<div class=\"highlight-wrapper group relative bash\">\n  <button \n    type=\"button\"\n    class=\"bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n    data-wrap-target=\"#code-78cxwfbn\"\n  >\n    <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\" stroke-linecap=\"round\" stroke-linejoin=\"round\"><g buffered-rendering=\"static\"><path d=\"M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959\" /><path d=\"M11.081 6.466L9.533 8.037l1.548 1.571\" /></g></svg>\n    <span class=\"bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950\">\n      Wrap text\n    </span>\n  </button>\n  <button \n    type=\"button\"\n    class=\"bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n    data-copy-target=\"sibling\"\n  >\n    <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\"><g buffered-rendering=\"static\"><path d=\"M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z\" /><path d=\"M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617\" /></g></svg>\n    <span class=\"bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950\">\n      Copy to clipboard\n    </span>\n  </button>\n  <div class='highlight relative group'>\n    <pre class='highlight '><code id=\"code-78cxwfbn\">fly launch <span class=\"nt\">--no-deploy</span>\n</code></pre>\n  </div>\n</div>\n<p>After selecting a name and an organization to run it in, this command will create the app and write out a <code>fly.toml</code> file for you:</p>\n<div class=\"highlight-wrapper group relative toml\">\n  <button \n    type=\"button\"\n    class=\"bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n    data-wrap-target=\"#code-6efaeoqi\"\n  >\n    <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\" stroke-linecap=\"round\" stroke-linejoin=\"round\"><g buffered-rendering=\"static\"><path d=\"M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959\" /><path d=\"M11.081 6.466L9.533 8.037l1.548 1.571\" /></g></svg>\n    <span class=\"bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950\">\n      Wrap text\n    </span>\n  </button>\n  <button \n    type=\"button\"\n    class=\"bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n    data-copy-target=\"sibling\"\n  >\n    <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\"><g buffered-rendering=\"static\"><path d=\"M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z\" /><path d=\"M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617\" /></g></svg>\n    <span class=\"bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950\">\n      Copy to clipboard\n    </span>\n  </button>\n  <div class='highlight relative group'>\n    <pre class='highlight '><code id=\"code-6efaeoqi\"><span class=\"c\"># fly.toml app configuration file generated for sparkling-violet-709 on 2023-11-14T12:13:53-05:00</span>\n<span class=\"c\">#</span>\n<span class=\"c\"># See https://fly.io/docs/reference/configuration/ for information about how to use this file.</span>\n<span class=\"c\">#</span>\n\n<span class=\"py\">app</span> <span class=\"p\">=</span> <span class=\"s\">\"sparkling-violet-709\"</span>\n<span class=\"py\">primary_region</span> <span class=\"p\">=</span> <span class=\"s\">\"ord\"</span>\n\n<span class=\"nn\">[http_service]</span>\n  <span class=\"py\">internal_port</span> <span class=\"p\">=</span> <span class=\"mi\">11434</span> <span class=\"c\"># change me to 11434!</span>\n  <span class=\"py\">force_https</span> <span class=\"p\">=</span> <span class=\"kc\">false</span> <span class=\"c\"># change mo to false!</span>\n  <span class=\"py\">auto_stop_machines</span> <span class=\"p\">=</span> <span class=\"kc\">true</span>\n  <span class=\"py\">auto_start_machines</span> <span class=\"p\">=</span> <span class=\"kc\">true</span>\n  <span class=\"py\">min_machines_running</span> <span class=\"p\">=</span> <span class=\"mi\">0</span>\n  <span class=\"py\">processes</span> <span class=\"p\">=</span> <span class=\"nn\">[\"app\"]</span>\n</code></pre>\n  </div>\n</div>\n<p>This is the configuration file that Fly.io uses to know how to run your application. We’re going to be modifying the <code>fly.toml</code> file to add some additional configuration to it, such as enabling GPU support:</p>\n<div class=\"highlight-wrapper group relative toml\">\n  <button \n    type=\"button\"\n    class=\"bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n    data-wrap-target=\"#code-ef9ddpog\"\n  >\n    <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\" stroke-linecap=\"round\" stroke-linejoin=\"round\"><g buffered-rendering=\"static\"><path d=\"M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959\" /><path d=\"M11.081 6.466L9.533 8.037l1.548 1.571\" /></g></svg>\n    <span class=\"bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950\">\n      Wrap text\n    </span>\n  </button>\n  <button \n    type=\"button\"\n    class=\"bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n    data-copy-target=\"sibling\"\n  >\n    <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\"><g buffered-rendering=\"static\"><path d=\"M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z\" /><path d=\"M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617\" /></g></svg>\n    <span class=\"bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950\">\n      Copy to clipboard\n    </span>\n  </button>\n  <div class='highlight relative group'>\n    <pre class='highlight '><code id=\"code-ef9ddpog\"><span class=\"py\">app</span> <span class=\"p\">=</span> <span class=\"s\">\"sparkling-violet-709\"</span>\n<span class=\"py\">primary_region</span> <span class=\"p\">=</span> <span class=\"s\">\"ord\"</span>\n<span class=\"py\">vm.size</span> <span class=\"p\">=</span> <span class=\"s\">\"a100-40gb\"</span> <span class=\"c\"># the GPU size, see https://fly.io/docs/gpus/gpu-quickstart/ for more info</span>\n</code></pre>\n  </div>\n</div>\n<p>We don’t want to expose the GPU to the internet, so we’re going to create a <a href='https://fly.io/docs/reference/private-networking/#flycast-private-load-balancing' title=''>flycast</a> address to expose it to other services on your private network. To create a flycast address, run this command:</p>\n<div class=\"highlight-wrapper group relative bash\">\n  <button \n    type=\"button\"\n    class=\"bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n    data-wrap-target=\"#code-q2lw27z\"\n  >\n    <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\" stroke-linecap=\"round\" stroke-linejoin=\"round\"><g buffered-rendering=\"static\"><path d=\"M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959\" /><path d=\"M11.081 6.466L9.533 8.037l1.548 1.571\" /></g></svg>\n    <span class=\"bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950\">\n      Wrap text\n    </span>\n  </button>\n  <button \n    type=\"button\"\n    class=\"bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n    data-copy-target=\"sibling\"\n  >\n    <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\"><g buffered-rendering=\"static\"><path d=\"M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z\" /><path d=\"M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617\" /></g></svg>\n    <span class=\"bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950\">\n      Copy to clipboard\n    </span>\n  </button>\n  <div class='highlight relative group'>\n    <pre class='highlight '><code id=\"code-q2lw27z\">fly ips allocate-v6 <span class=\"nt\">--private</span>\n</code></pre>\n  </div>\n</div>\n<p>The <code>fly ips allocate-v6</code> command makes a unique address in your private network that you can use to access Ollama from your other services. Make sure to add the <code>--private</code> flag, otherwise you’ll get a globally unique IP address instead of a private one.</p>\n\n<p>Next, you may need to remove all of the other public IP addresses for the app to lock it away from the public. Get a list of them with <code>fly ips list</code> and then remove them with <code>fly ips release <ip></code>. Delete everything but your flycast IP.</p>\n\n<p>Next, we need to declare the volume for Ollama to store models in. If you don’t do this, then when you scale to zero, your existing models will be destroyed and you will have to re-download them every time the server starts. This is not ideal, so we’re going to create a persistent volume to store the models in. Add the following to your <code>fly.toml</code>:</p>\n<div class=\"highlight-wrapper group relative toml\">\n  <button \n    type=\"button\"\n    class=\"bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n    data-wrap-target=\"#code-xcc0fiwo\"\n  >\n    <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\" stroke-linecap=\"round\" stroke-linejoin=\"round\"><g buffered-rendering=\"static\"><path d=\"M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959\" /><path d=\"M11.081 6.466L9.533 8.037l1.548 1.571\" /></g></svg>\n    <span class=\"bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950\">\n      Wrap text\n    </span>\n  </button>\n  <button \n    type=\"button\"\n    class=\"bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n    data-copy-target=\"sibling\"\n  >\n    <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\"><g buffered-rendering=\"static\"><path d=\"M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z\" /><path d=\"M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617\" /></g></svg>\n    <span class=\"bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950\">\n      Copy to clipboard\n    </span>\n  </button>\n  <div class='highlight relative group'>\n    <pre class='highlight '><code id=\"code-xcc0fiwo\"><span class=\"nn\">[build]</span>\n  <span class=\"py\">image</span> <span class=\"p\">=</span> <span class=\"s\">\"ollama/ollama\"</span>\n\n<span class=\"nn\">[mounts]</span>\n  <span class=\"py\">source</span> <span class=\"p\">=</span> <span class=\"s\">\"models\"</span>\n  <span class=\"py\">destination</span> <span class=\"p\">=</span> <span class=\"s\">\"/root/.ollama\"</span>\n  <span class=\"py\">initial_size</span> <span class=\"p\">=</span> <span class=\"s\">\"100gb\"</span>\n</code></pre>\n  </div>\n</div>\n<p>This will create a 100GB volume in the <a href='https://en.wikipedia.org/wiki/O%27Hare_International_Airport' title=''><code>ord</code></a> region when the app is deployed. This will be used to store the models that you download from the <a href='https://ollama.ai/library/' title=''>Ollama library</a>. You can make this smaller if you want, but 100GB is a good place to start from.</p>\n\n<p>Now that everything is set up, we can deploy this to Fly.io:</p>\n<div class=\"highlight-wrapper group relative bash\">\n  <button \n    type=\"button\"\n    class=\"bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n    data-wrap-target=\"#code-f1l19v0a\"\n  >\n    <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\" stroke-linecap=\"round\" stroke-linejoin=\"round\"><g buffered-rendering=\"static\"><path d=\"M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959\" /><path d=\"M11.081 6.466L9.533 8.037l1.548 1.571\" /></g></svg>\n    <span class=\"bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950\">\n      Wrap text\n    </span>\n  </button>\n  <button \n    type=\"button\"\n    class=\"bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n    data-copy-target=\"sibling\"\n  >\n    <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\"><g buffered-rendering=\"static\"><path d=\"M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z\" /><path d=\"M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617\" /></g></svg>\n    <span class=\"bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950\">\n      Copy to clipboard\n    </span>\n  </button>\n  <div class='highlight relative group'>\n    <pre class='highlight '><code id=\"code-f1l19v0a\">fly deploy\n</code></pre>\n  </div>\n</div>\n<p>This will take a minute to pull the Ollama image, push it to a Machine, provision your volume, and kick everything else off with hypervisors, GPUs and whatnot. Once it’s done, you should see something like this:</p>\n<div class=\"highlight-wrapper group relative bash\">\n  <button \n    type=\"button\"\n    class=\"bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n    data-wrap-target=\"#code-wji5shy9\"\n  >\n    <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\" stroke-linecap=\"round\" stroke-linejoin=\"round\"><g buffered-rendering=\"static\"><path d=\"M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959\" /><path d=\"M11.081 6.466L9.533 8.037l1.548 1.571\" /></g></svg>\n    <span class=\"bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950\">\n      Wrap text\n    </span>\n  </button>\n  <button \n    type=\"button\"\n    class=\"bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n    data-copy-target=\"sibling\"\n  >\n    <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\"><g buffered-rendering=\"static\"><path d=\"M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z\" /><path d=\"M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617\" /></g></svg>\n    <span class=\"bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950\">\n      Copy to clipboard\n    </span>\n  </button>\n  <div class='highlight relative group'>\n    <pre class='highlight '><code id=\"code-wji5shy9\"> ✔ Machine 17816141f55489 <span class=\"o\">[</span>app] update succeeded\n<span class=\"nt\">-------</span>\n\nVisit your newly deployed app at https://sparkling-violet-709.fly.dev/\n</code></pre>\n  </div>\n</div>\n<p>This is a lie because we just deleted the public IP addresses for this app. You can’t access it from the internet, and by extension, random people can’t access it either. For now, you can run an interactive session with Ollama using an ephemeral Fly Machine:</p>\n<div class=\"highlight-wrapper group relative bash\">\n  <button \n    type=\"button\"\n    class=\"bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n    data-wrap-target=\"#code-abxdbso0\"\n  >\n    <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\" stroke-linecap=\"round\" stroke-linejoin=\"round\"><g buffered-rendering=\"static\"><path d=\"M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959\" /><path d=\"M11.081 6.466L9.533 8.037l1.548 1.571\" /></g></svg>\n    <span class=\"bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950\">\n      Wrap text\n    </span>\n  </button>\n  <button \n    type=\"button\"\n    class=\"bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n    data-copy-target=\"sibling\"\n  >\n    <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\"><g buffered-rendering=\"static\"><path d=\"M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z\" /><path d=\"M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617\" /></g></svg>\n    <span class=\"bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950\">\n      Copy to clipboard\n    </span>\n  </button>\n  <div class='highlight relative group'>\n    <pre class='highlight '><code id=\"code-abxdbso0\">fly m run <span class=\"nt\">-e</span> <span class=\"nv\">OLLAMA_HOST</span><span class=\"o\">=</span>http://sparkling-violet-709.flycast <span class=\"nt\">--shell</span> ollama/ollama\n</code></pre>\n  </div>\n</div>\n<p>And then you can pull an image from the <a href='https://ollama.ai/library/' title=''>ollama library</a> and generate some text:</p>\n<div class=\"highlight-wrapper group relative bash\">\n  <button \n    type=\"button\"\n    class=\"bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n    data-wrap-target=\"#code-7zldfj8t\"\n  >\n    <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\" stroke-linecap=\"round\" stroke-linejoin=\"round\"><g buffered-rendering=\"static\"><path d=\"M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959\" /><path d=\"M11.081 6.466L9.533 8.037l1.548 1.571\" /></g></svg>\n    <span class=\"bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950\">\n      Wrap text\n    </span>\n  </button>\n  <button \n    type=\"button\"\n    class=\"bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n    data-copy-target=\"sibling\"\n  >\n    <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\"><g buffered-rendering=\"static\"><path d=\"M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z\" /><path d=\"M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617\" /></g></svg>\n    <span class=\"bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950\">\n      Copy to clipboard\n    </span>\n  </button>\n  <div class='highlight relative group'>\n    <pre class='highlight '><code id=\"code-7zldfj8t\"><span class=\"nv\">$ </span>ollama run openchat:7b-v3.5-fp16\n<span class=\"o\">>>></span> How <span class=\"k\">do </span>I bake chocolate chip cookies?\n To bake chocolate chip cookies, follow these steps:\n\n1. Preheat the oven to 375°F <span class=\"o\">(</span>190°C<span class=\"o\">)</span> and line a baking sheet with parchment paper or silicone baking mat.\n\n2. In a large bowl, mix together 1 cup of unsalted butter <span class=\"o\">(</span>softened<span class=\"o\">)</span>, 3/4 cup granulated sugar, and 3/4\ncup packed brown sugar <span class=\"k\">until </span>light and fluffy.\n\n3. Add 2 large eggs, one at a <span class=\"nb\">time</span>, to the butter mixture, beating well after each addition. Stir <span class=\"k\">in </span>1\nteaspoon of pure vanilla extract.\n\n4. In a separate bowl, whisk together 2 cups all-purpose flour, 1/2 teaspoon baking soda, and 1/2 teaspoon\nsalt. Gradually add the dry ingredients to the wet ingredients, stirring <span class=\"k\">until </span>just combined.\n\n5. Fold <span class=\"k\">in </span>2 cups of chocolate chips <span class=\"o\">(</span>or chunks<span class=\"o\">)</span> into the dough.\n\n6. Drop rounded tablespoons of dough onto the prepared baking sheet, spacing them about 2 inches apart.\n\n7. Bake <span class=\"k\">for </span>10-12 minutes, or <span class=\"k\">until </span>the edges are golden brown. The centers should still be slightly soft.\n\n8. Allow the cookies to cool on the baking sheet <span class=\"k\">for </span>a few minutes before transferring them to a wire rack\nto cool completely.\n\nEnjoy your homemade chocolate chip cookies!\n</code></pre>\n  </div>\n</div>\n<p>If you want a persistent wake-on-use connection to your Ollama instance, you can set up a <a href='https://fly.io/docs/reference/private-networking/#discovering-apps-through-dns-on-a-wireguard-connection' title=''>connection to your Fly network using WireGuard</a>. This will let you use Ollama from your local applications without having to run them on Fly. For example, if you want to figure out the safe cooking temperature for ground beef in Celsius, you can query that in JavaScript with this snippet of code:</p>\n<div class=\"highlight-wrapper group relative typescript\">\n  <button \n    type=\"button\"\n    class=\"bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n    data-wrap-target=\"#code-x8r5bgmm\"\n  >\n    <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\" stroke-linecap=\"round\" stroke-linejoin=\"round\"><g buffered-rendering=\"static\"><path d=\"M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959\" /><path d=\"M11.081 6.466L9.533 8.037l1.548 1.571\" /></g></svg>\n    <span class=\"bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950\">\n      Wrap text\n    </span>\n  </button>\n  <button \n    type=\"button\"\n    class=\"bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n    data-copy-target=\"sibling\"\n  >\n    <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\"><g buffered-rendering=\"static\"><path d=\"M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z\" /><path d=\"M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617\" /></g></svg>\n    <span class=\"bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950\">\n      Copy to clipboard\n    </span>\n  </button>\n  <div class='highlight relative group'>\n    <pre class='highlight '><code id=\"code-x8r5bgmm\"><span class=\"kd\">const</span> <span class=\"nx\">generateRequest</span> <span class=\"o\">=</span> <span class=\"p\">{</span>\n  <span class=\"na\">model</span><span class=\"p\">:</span> <span class=\"dl\">\"</span><span class=\"s2\">openchat:7b-v3.5-fp16</span><span class=\"dl\">\"</span><span class=\"p\">,</span>\n  <span class=\"na\">prompt</span><span class=\"p\">:</span> <span class=\"dl\">\"</span><span class=\"s2\">What is the safe cooking temperature for ground beef in celsius?</span><span class=\"dl\">\"</span>\n  <span class=\"na\">stream</span><span class=\"p\">:</span> <span class=\"kc\">false</span><span class=\"p\">,</span> <span class=\"c1\">// <- important for Node/Deno clients</span>\n<span class=\"p\">};</span>\n\n<span class=\"kd\">let</span> <span class=\"nx\">resp</span> <span class=\"o\">=</span> <span class=\"k\">await</span> <span class=\"nx\">fetch</span><span class=\"p\">(</span><span class=\"dl\">\"</span><span class=\"s2\">http://sparkling-violet-709.flycast/api/generate</span><span class=\"dl\">\"</span><span class=\"p\">,</span> <span class=\"p\">{</span>\n  <span class=\"na\">method</span><span class=\"p\">:</span> <span class=\"dl\">\"</span><span class=\"s2\">POST</span><span class=\"dl\">\"</span><span class=\"p\">,</span>\n  <span class=\"na\">body</span><span class=\"p\">:</span> <span class=\"nx\">JSON</span><span class=\"p\">.</span><span class=\"nx\">stringify</span><span class=\"p\">(</span><span class=\"nx\">generateRequest</span><span class=\"p\">),</span>\n<span class=\"p\">});</span>\n\n<span class=\"k\">if</span> <span class=\"p\">(</span><span class=\"nx\">resp</span><span class=\"p\">.</span><span class=\"nx\">status</span> <span class=\"o\">!==</span> <span class=\"mi\">200</span><span class=\"p\">)</span> <span class=\"p\">{</span>\n  <span class=\"k\">throw</span> <span class=\"k\">new</span> <span class=\"nb\">Error</span><span class=\"p\">(</span><span class=\"s2\">`error fetching response: </span><span class=\"p\">${</span><span class=\"nx\">resp</span><span class=\"p\">.</span><span class=\"nx\">status</span><span class=\"p\">}</span><span class=\"s2\">: </span><span class=\"p\">${</span><span class=\"k\">await</span> <span class=\"nx\">resp</span><span class=\"p\">.</span><span class=\"nx\">text</span><span class=\"p\">()}</span><span class=\"s2\">`</span><span class=\"p\">);</span>\n<span class=\"p\">}</span>\n\n<span class=\"nx\">resp</span> <span class=\"o\">=</span> <span class=\"k\">await</span> <span class=\"nx\">resp</span><span class=\"p\">.</span><span class=\"nx\">json</span><span class=\"p\">();</span>\n\n<span class=\"nx\">console</span><span class=\"p\">.</span><span class=\"nx\">log</span><span class=\"p\">(</span><span class=\"nx\">resp</span><span class=\"p\">.</span><span class=\"nx\">response</span><span class=\"p\">);</span> <span class=\"c1\">// Something like \"The safe cooking temperature for ground beef is 71 degrees celsius (160 degrees fahrenheit).</span>\n</code></pre>\n  </div>\n</div><h2 id='scaling-to-zero' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#scaling-to-zero' aria-label='Anchor'></a><span class='plain-code'>Scaling to zero</span></h2>\n<p>The best part about all of this is that when you want to scale down to zero running Machines: do nothing, it will automatically shut down when it’s idle. Wait a few minutes and then verify it with <code>fly status</code>:</p>\n<div class=\"highlight-wrapper group relative bash\">\n  <button \n    type=\"button\"\n    class=\"bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n    data-wrap-target=\"#code-wpipvj9k\"\n  >\n    <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\" stroke-linecap=\"round\" stroke-linejoin=\"round\"><g buffered-rendering=\"static\"><path d=\"M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959\" /><path d=\"M11.081 6.466L9.533 8.037l1.548 1.571\" /></g></svg>\n    <span class=\"bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950\">\n      Wrap text\n    </span>\n  </button>\n  <button \n    type=\"button\"\n    class=\"bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n    data-copy-target=\"sibling\"\n  >\n    <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\"><g buffered-rendering=\"static\"><path d=\"M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z\" /><path d=\"M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617\" /></g></svg>\n    <span class=\"bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950\">\n      Copy to clipboard\n    </span>\n  </button>\n  <div class='highlight relative group'>\n    <pre class='highlight '><code id=\"code-wpipvj9k\"><span class=\"nv\">$ </span>fly status\n\n...\n\nPROCESS ID              VERSION REGION  STATE   ROLE    CHECKS  LAST UPDATED\napp     3d8d7949b22089  9       ord     stopped                 2023-11-14T19:34:24Z\n</code></pre>\n  </div>\n</div>\n<p>The app has been stopped. This means that it’s not running and you’re not paying for it. When you want it to start up again, just make a request. It will automatically start up and you can use it as normal with the CLI or even just arbitrary calls to <a href='https://github.com/jmorganca/ollama/blob/main/docs/api.md' title=''>the API</a>.</p>\n\n<p>You can also upload your own models to the Ollama registry by <a href='https://github.com/jmorganca/ollama/blob/main/docs/import.md' title=''>creating your own Modelfile</a> and pushing it (though you will need to install Ollama locally to publish your own models). At this time, the only way to set a custom system prompt is to use a Modelfile and upload your model to the registry.</p>\n<h2 id='conclusion' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#conclusion' aria-label='Anchor'></a><span class='plain-code'>Conclusion</span></h2>\n<p>Ollama is a fantastic way to run large language models of your choice and the ability to use Fly.io’s powerful GPUs means you can use bigger models with more parameters and a larger context window. This lets you make your assistants more lifelike, your conversations have more context, and your text generation more realistic.</p>\n\n<p>Oh, by the way, this also lets you use the new <code>json</code> mode to have your models call functions, similar to how ChatGPT would. To do this, have a system prompt that looks like this:</p>\n<div class=\"highlight-wrapper group relative \">\n  <button \n    type=\"button\"\n    class=\"bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n    data-wrap-target=\"#code-2x5qkx59\"\n  >\n    <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\" stroke-linecap=\"round\" stroke-linejoin=\"round\"><g buffered-rendering=\"static\"><path d=\"M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959\" /><path d=\"M11.081 6.466L9.533 8.037l1.548 1.571\" /></g></svg>\n    <span class=\"bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950\">\n      Wrap text\n    </span>\n  </button>\n  <button \n    type=\"button\"\n    class=\"bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n    data-copy-target=\"sibling\"\n  >\n    <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\"><g buffered-rendering=\"static\"><path d=\"M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z\" /><path d=\"M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617\" /></g></svg>\n    <span class=\"bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950\">\n      Copy to clipboard\n    </span>\n  </button>\n  <div class='highlight relative group'>\n    <pre class='highlight '><code id=\"code-2x5qkx59\">You are a helpful research assistant. The following functions are available for you to fetch further data to answer user questions, if relevant:\n\n{\n    \"function\": \"search_bing\",\n    \"description\": \"Search the web for content on Bing. This allows users to search online/the internet/the web for content.\",\n    \"arguments\": [\n        {\n            \"name\": \"query\",\n            \"type\": \"string\",\n            \"description\": \"The search query string\"\n        }\n    ]\n}\n\n{\n    \"function\": \"search_arxiv\",\n    \"description\": \"Search for research papers on ArXiv. Make use of AND, OR and NOT operators as appropriate to join terms within the query.\",\n    \"arguments\": [\n        {\n            \"name\": \"query\",\n            \"type\": \"string\",\n            \"description\": \"The search query string\"\n        }\n    ]\n}\n\nTo call a function, respond - immediately and only - with a JSON object of the following format:\n{\n    \"function\": \"function_name\",\n    \"arguments\": {\n        \"argument1\": \"argument_value\",\n        \"argument2\": \"argument_value\"\n    }\n}\n\nIf no function needs to be called, respond with an empty JSON object: {}\n</code></pre>\n  </div>\n</div>\n<p>Then you can use the <a href='https://github.com/jmorganca/ollama/blob/main/docs/api.md#request-json-mode' title=''>JSON format</a> to receive a JSON response from Ollama (hint: <code>—format=json</code> in the CLI or <code>format: \"json\"</code> in the API). This is a great way to make your assistants more lifelike and more useful. You will need to use something like <a href='https://www.langchain.com/' title=''>Langchain</a> or manual iterations to properly handle the cases where the user doesn’t want to call a function, but that’s a topic for another blog post.</p>\n\n<p>For the best results you may want to use a model with a larger context window such as <a href='https://ollama.ai/library/vicuna:13b-v1.5-16k-fp16' title=''>vicuna:13b-v1.5-16k-fp16</a> (16k == 16,384 token window) as JSON is very token-expensive. Future advances in the next few weeks (such as the Yi models gaining ludicrous token windows on the line of 200,000 tokens at the cost of ludicrous amounts of VRAM usage) will make this less of an issue. You can also get away with minifying the JSON in the functions and examples a lot, but you may need to experiment to get the best results.</p>\n\n<p>Happy hacking, y'all.</p>",
      "image": {
        "url": "https://fly.io/blog/scaling-llm-ollama/assets/thumb.webp",
        "title": null
      },
      "media": [],
      "authors": [
        {
          "name": "Fly",
          "email": null,
          "url": null
        }
      ],
      "categories": []
    },
    {
      "id": "https://fly.io/blog/rethinking-serverless-with-flame/",
      "title": "Rethinking Serverless with FLAME",
      "description": null,
      "url": "https://fly.io/blog/rethinking-serverless-with-flame/",
      "published": "2023-12-06T00:00:00.000Z",
      "updated": "2025-05-20T16:26:25.000Z",
      "content": "<blockquote>Imagine if you could auto scale simply by wrapping any existing app code in a function and have that block of code run in a temporary copy of your app.</blockquote>\n\n\n<p>The pursuit of elastic, auto-scaling applications has taken us to silly places.</p>\n\n<p>Serverless/FaaS had a couple things going for it. Elastic Scale™ is hard. It’s even harder when you need to manage those pesky servers. It also promised pay-what-you-use costs to avoid idle usage. Good stuff, right?</p>\n\n<p>Well the charade is over. You offload scaling concerns and the complexities of scaling, just to end up needing <em>more complexity</em>. Additional queues, storage, and glue code to communicate back to our app is just the starting point. Dev, test, and CI complexity balloons as fast as your costs. Oh, and you often have to rewrite your app in proprietary JavaScript – even if it’s already written in JavaScript!</p>\n\n<p>At the same time, the rest of us have elastically scaled by starting more webservers. Or we’ve dumped on complexity with microservices. This doesn’t make sense. Piling on more webservers to transcode more videos or serve up more ML tasks isn’t what we want. And granular scale shouldn’t require slicing our apps into bespoke operational units with their own APIs and deployments to manage.</p>\n\n<p>Enough is enough. There’s a better way to elastically scale applications.</p>\n<h2 id='the-flame-pattern' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#the-flame-pattern' aria-label='Anchor'></a><span class='plain-code'>The FLAME pattern</span></h2>\n<p>Here’s what we really want:</p>\n\n<ul>\n<li>We don’t want to manage those pesky servers. We already have this for our app deployments via <code>fly deploy</code>, <code>git push heroku</code>, <code>kubectl</code>, etc\n</li><li>We want on-demand, <em>granular</em> elastic scale of specific parts of our app code\n</li><li>We don’t want to rewrite our application or write parts of it in proprietary runtimes\n</li></ul>\n\n<p>Imagine if we could auto scale simply by wrapping any existing app code in a function and have that block of code run in a temporary copy of the app.</p>\n\n<p>Enter the FLAME pattern.</p>\n<blockquote>FLAME - Fleeting Lambda Application for Modular Execution</blockquote>\n\n\n<p>With FLAME, you treat your <em>entire application</em> as a lambda, where modular parts can be executed on short-lived infrastructure.</p>\n\n<p>No rewrites. No bespoke runtimes. No outrageous layers of complexity. Need to insert the results of an expensive operation to the database? PubSub broadcast the result of some expensive work? No problem! It’s your whole app so of course you can do it.</p>\n\n<p>The Elixir <a href='https://github.com/phoenixframework/flame' title=''>flame library</a> implements the FLAME pattern. It has a backend adapter for Fly.io, but you can use it on any cloud that gives you an API to spin up an instance with your app code running on it. We’ll talk more about backends in a bit, as well as implementing FLAME in other languages.</p>\n\n<p>First, lets watch a realtime thumbnail generation example to see FLAME + Elixir in action:</p>\n<div class=\"youtube-container\" data-exclude-render>\n  <div class=\"youtube-video\">\n    <iframe\n      width=\"100%\"\n      height=\"100%\"\n      src=\"https://www.youtube.com/embed/l1xt_rkWdic\"\n      frameborder=\"0\"\n      allow=\"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture\"\n      allowfullscreen>\n    </iframe>\n  </div>\n</div>\n\n\n<p>Now let’s walk thru something a little more basic. Imagine we have a function to transcode video to thumbnails in our Elixir application after they are uploaded:</p>\n<div class=\"highlight-wrapper group relative elixir\">\n  <button \n    type=\"button\"\n    class=\"bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n    data-wrap-target=\"#code-racxyl56\"\n  >\n    <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\" stroke-linecap=\"round\" stroke-linejoin=\"round\"><g buffered-rendering=\"static\"><path d=\"M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959\" /><path d=\"M11.081 6.466L9.533 8.037l1.548 1.571\" /></g></svg>\n    <span class=\"bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950\">\n      Wrap text\n    </span>\n  </button>\n  <button \n    type=\"button\"\n    class=\"bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n    data-copy-target=\"sibling\"\n  >\n    <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\"><g buffered-rendering=\"static\"><path d=\"M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z\" /><path d=\"M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617\" /></g></svg>\n    <span class=\"bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950\">\n      Copy to clipboard\n    </span>\n  </button>\n  <div class='highlight relative group'>\n    <pre class='highlight '><code id=\"code-racxyl56\"><span class=\"k\">def</span> <span class=\"n\">generate_thumbnails</span><span class=\"p\">(%</span><span class=\"no\">Video</span><span class=\"p\">{}</span> <span class=\"o\">=</span> <span class=\"n\">vid</span><span class=\"p\">,</span> <span class=\"n\">interval</span><span class=\"p\">)</span> <span class=\"k\">do</span>\n  <span class=\"n\">tmp</span> <span class=\"o\">=</span> <span class=\"no\">Path</span><span class=\"o\">.</span><span class=\"n\">join</span><span class=\"p\">(</span><span class=\"no\">System</span><span class=\"o\">.</span><span class=\"n\">tmp_dir!</span><span class=\"p\">(),</span> <span class=\"no\">Ecto</span><span class=\"o\">.</span><span class=\"no\">UUID</span><span class=\"o\">.</span><span class=\"n\">generate</span><span class=\"p\">())</span>\n  <span class=\"no\">File</span><span class=\"o\">.</span><span class=\"n\">mkdir!</span><span class=\"p\">(</span><span class=\"n\">tmp</span><span class=\"p\">)</span>\n  <span class=\"n\">args</span> <span class=\"o\">=</span> <span class=\"p\">[</span><span class=\"s2\">\"-i\"</span><span class=\"p\">,</span> <span class=\"n\">vid</span><span class=\"o\">.</span><span class=\"n\">url</span><span class=\"p\">,</span> <span class=\"s2\">\"-vf\"</span><span class=\"p\">,</span> <span class=\"s2\">\"fps=1/</span><span class=\"si\">#{</span><span class=\"n\">interval</span><span class=\"si\">}</span><span class=\"s2\">\"</span><span class=\"p\">,</span> <span class=\"s2\">\"</span><span class=\"si\">#{</span><span class=\"n\">tmp</span><span class=\"si\">}</span><span class=\"s2\">/%02d.png\"</span><span class=\"p\">]</span>\n  <span class=\"no\">System</span><span class=\"o\">.</span><span class=\"n\">cmd</span><span class=\"p\">(</span><span class=\"s2\">\"ffmpeg\"</span><span class=\"p\">,</span> <span class=\"n\">args</span><span class=\"p\">)</span>\n  <span class=\"n\">urls</span> <span class=\"o\">=</span> <span class=\"no\">VidStore</span><span class=\"o\">.</span><span class=\"n\">put_thumbnails</span><span class=\"p\">(</span><span class=\"n\">vid</span><span class=\"p\">,</span> <span class=\"no\">Path</span><span class=\"o\">.</span><span class=\"n\">wildcard</span><span class=\"p\">(</span><span class=\"n\">tmp</span> <span class=\"o\"><></span> <span class=\"s2\">\"/*.png\"</span><span class=\"p\">))</span>\n  <span class=\"no\">Repo</span><span class=\"o\">.</span><span class=\"n\">insert_all</span><span class=\"p\">(</span><span class=\"no\">Thumb</span><span class=\"p\">,</span> <span class=\"no\">Enum</span><span class=\"o\">.</span><span class=\"n\">map</span><span class=\"p\">(</span><span class=\"n\">urls</span><span class=\"p\">,</span> <span class=\"o\">&</span><span class=\"p\">%{</span><span class=\"ss\">vid_id:</span> <span class=\"n\">vid</span><span class=\"o\">.</span><span class=\"n\">id</span><span class=\"p\">,</span> <span class=\"ss\">url:</span> <span class=\"nv\">&1</span><span class=\"p\">}))</span>\n<span class=\"k\">end</span>\n</code></pre>\n  </div>\n</div>\n<p>Our <code>generate_thumbnails</code> function accepts a video struct. We shell out to <code>ffmpeg</code> to take the video URL and generate thumbnails at a given interval. We then write the temporary thumbnail paths to durable storage. Finally, we insert the generated thumbnail URLs into the database.</p>\n\n<p>This works great locally, but CPU bound work like video transcoding can quickly bring our entire service to a halt in production. Instead of rewriting large swaths of our app to move this into microservices or some FaaS, we can simply wrap it in a FLAME call:</p>\n<div class=\"highlight-wrapper group relative elixir\">\n  <button \n    type=\"button\"\n    class=\"bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n    data-wrap-target=\"#code-b6z6pma\"\n  >\n    <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\" stroke-linecap=\"round\" stroke-linejoin=\"round\"><g buffered-rendering=\"static\"><path d=\"M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959\" /><path d=\"M11.081 6.466L9.533 8.037l1.548 1.571\" /></g></svg>\n    <span class=\"bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950\">\n      Wrap text\n    </span>\n  </button>\n  <button \n    type=\"button\"\n    class=\"bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n    data-copy-target=\"sibling\"\n  >\n    <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\"><g buffered-rendering=\"static\"><path d=\"M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z\" /><path d=\"M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617\" /></g></svg>\n    <span class=\"bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950\">\n      Copy to clipboard\n    </span>\n  </button>\n  <div class='highlight relative group'>\n    <pre class='highlight '><code id=\"code-b6z6pma\"><span class=\"k\">def</span> <span class=\"n\">generate_thumbnails</span><span class=\"p\">(%</span><span class=\"no\">Video</span><span class=\"p\">{}</span> <span class=\"o\">=</span> <span class=\"n\">vid</span><span class=\"p\">,</span> <span class=\"n\">interval</span><span class=\"p\">)</span> <span class=\"k\">do</span>\n  <span class=\"no\">FLAME</span><span class=\"o\">.</span><span class=\"n\">call</span><span class=\"p\">(</span><span class=\"no\">MyApp</span><span class=\"o\">.</span><span class=\"no\">FFMpegRunner</span><span class=\"p\">,</span> <span class=\"k\">fn</span> <span class=\"o\">-></span>\n    <span class=\"n\">tmp</span> <span class=\"o\">=</span> <span class=\"no\">Path</span><span class=\"o\">.</span><span class=\"n\">join</span><span class=\"p\">(</span><span class=\"no\">System</span><span class=\"o\">.</span><span class=\"n\">tmp_dir!</span><span class=\"p\">(),</span> <span class=\"no\">Ecto</span><span class=\"o\">.</span><span class=\"no\">UUID</span><span class=\"o\">.</span><span class=\"n\">generate</span><span class=\"p\">())</span>\n    <span class=\"no\">File</span><span class=\"o\">.</span><span class=\"n\">mkdir!</span><span class=\"p\">(</span><span class=\"n\">tmp</span><span class=\"p\">)</span>\n    <span class=\"n\">args</span> <span class=\"o\">=</span>\n      <span class=\"p\">[</span><span class=\"s2\">\"-i\"</span><span class=\"p\">,</span> <span class=\"n\">vid</span><span class=\"o\">.</span><span class=\"n\">url</span><span class=\"p\">,</span> <span class=\"s2\">\"-vf\"</span><span class=\"p\">,</span> <span class=\"s2\">\"fps=1/</span><span class=\"si\">#{</span><span class=\"n\">interval</span><span class=\"si\">}</span><span class=\"s2\">\"</span><span class=\"p\">,</span> <span class=\"s2\">\"</span><span class=\"si\">#{</span><span class=\"n\">tmp</span><span class=\"si\">}</span><span class=\"s2\">/%02d.png\"</span><span class=\"p\">]</span>\n    <span class=\"no\">System</span><span class=\"o\">.</span><span class=\"n\">cmd</span><span class=\"p\">(</span><span class=\"s2\">\"ffmpeg\"</span><span class=\"p\">,</span> <span class=\"n\">args</span><span class=\"p\">)</span>\n    <span class=\"n\">urls</span> <span class=\"o\">=</span> <span class=\"no\">VidStore</span><span class=\"o\">.</span><span class=\"n\">put_thumbnails</span><span class=\"p\">(</span><span class=\"n\">vid</span><span class=\"p\">,</span> <span class=\"no\">Path</span><span class=\"o\">.</span><span class=\"n\">wildcard</span><span class=\"p\">(</span><span class=\"n\">tmp</span> <span class=\"o\"><></span> <span class=\"s2\">\"/*.png\"</span><span class=\"p\">))</span>\n    <span class=\"no\">Repo</span><span class=\"o\">.</span><span class=\"n\">insert_all</span><span class=\"p\">(</span><span class=\"no\">Thumb</span><span class=\"p\">,</span> <span class=\"no\">Enum</span><span class=\"o\">.</span><span class=\"n\">map</span><span class=\"p\">(</span><span class=\"n\">urls</span><span class=\"p\">,</span> <span class=\"o\">&</span><span class=\"p\">%{</span><span class=\"ss\">vid_id:</span> <span class=\"n\">vid</span><span class=\"o\">.</span><span class=\"n\">id</span><span class=\"p\">,</span> <span class=\"ss\">url:</span> <span class=\"nv\">&1</span><span class=\"p\">}))</span>\n  <span class=\"k\">end</span><span class=\"p\">)</span>\n<span class=\"k\">end</span>\n</code></pre>\n  </div>\n</div>\n<p>That’s it! <code>FLAME.call</code> accepts the name of a runner pool, and a function. It then finds or boots a new copy of our entire application and runs the function there. Any variables the function closes over (like our <code>%Video{}</code> struct and <code>interval</code>) are passed along automatically.</p>\n\n<p>When the FLAME runner boots up, it connects back to the parent node, receives the function to run, executes it, and returns the result to the caller. Based on configuration, the booted runner either waits happily for more work before idling down, or extinguishes itself immediately.</p>\n\n<p>Let’s visualize the flow:</p>\n\n<p><img alt=\"visualizing the flow\" src=\"/blog/rethinking-serverless-with-flame/assets/visual.webp?centered\" /></p>\n\n<p>We changed no other code and issued our DB write with <code>Repo.insert_all</code> just like before, because we are running our <em>entire</em> <em>application</em>. Database connection(s) and all. Except this fleeting application only runs that little function after startup and nothing else.</p>\n\n<p>In practice, a FLAME implementation will support a pool of runners for hot startup, scale-to-zero, and elastic growth. More on that later.</p>\n<h2 id='solving-a-problem-vs-removing-the-problem' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#solving-a-problem-vs-removing-the-problem' aria-label='Anchor'></a><span class='plain-code'>Solving a problem vs removing the problem</span></h2><blockquote>FaaS solutions help you solve a problem. FLAME removes the problem.</blockquote>\n\n\n<p>The FaaS labyrinth of complexity defies reason. And it’s unavoidable. Let’s walkthrough the thumbnail use-case to see how.</p>\n\n<p>We try to start with the simplest building block like request/response AWS Lambda Function URL’s.</p>\n\n<p>The complexity hits immediately.</p>\n\n<p>We start writing custom encoders/decoders on both sides to handle streaming the thumbnails back to the app over HTTP. Phew that’s done. Wait, is our video transcoding or user uploads going to take longer than 15 minutes? Sorry, hard timeout limit – time to split our videos into chunks to stay within the timeout, which means more lambdas to do that. Now we’re orchestrating lambda workflows and relying on additional services, such as SQS and S3, to enable this.</p>\n\n<p>All the FaaS is doing is adding layers of communication between your code and the parts you want to run elastically. Each layer has its own glue integration price to pay.</p>\n\n<p>Ultimately handling this kind of use-case looks something like this:</p>\n\n<ul>\n<li>Trigger the lambda via HTTP endpoint, S3, or API gateway ($)\n</li><li>Write the bespoke lambda to transcode the video ($)\n</li><li>Place the thumbnail results into SQS ($)\n</li><li>Write the SQS consumer in our app (dev $)\n</li><li>Persist to DB and figure out how to get events back to active subscribers that may well be connected to other instances than the SQS consumer (dev $)\n</li></ul>\n\n<p>This is nuts. We pay the FaaS toll at every step. We shouldn’t have to do any of this!</p>\n\n<p>FaaS provides a bunch of offerings to build a solution on top of. FLAME removes the problem entirely.</p>\n<h2 id='flame-backends' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#flame-backends' aria-label='Anchor'></a><span class='plain-code'>FLAME Backends</span></h2><blockquote>On Fly.io infrastructure the <code>FLAME.FlyBackend</code> can boot a copy of your application on a new <a href=\"https://fly.io/docs/machines/\">Machine</a> and have it connect back to the parent for work within ~3s.</blockquote>\n\n\n<p>By default, FLAME ships with a <code>LocalBackend</code> and <code>FlyBackend</code>, but any host that provides an API to provision a server and run your app code can work as a FLAME backend. Erlang and Elixir primitives are doing all the heavy lifting here. The entire <code>FLAME.FlyBackend</code> is <a href='https://github.com/phoenixframework/flame/blob/main/lib/flame/fly_backend.ex' title=''>< 200 LOC with docs</a>. The library has a single dependency, <code>req</code>, which is an HTTP client.</p>\n\n<p>Because Fly.io runs our applications as a packaged up docker image, we simply ask the Fly API to boot a new Machine for us with the same image that our app is currently running. Also thanks to Fly infrastructure, we can guarantee the FLAME runners are started in the same region as the parent. This optimizes latency and lets you ship whatever data back and forth between parent and runner without having to think about it.</p>\n<h2 id='look-at-everything-were-not-doing' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#look-at-everything-were-not-doing' aria-label='Anchor'></a><span class='plain-code'>Look at everything we’re not doing</span></h2>\n<p>With FaaS, just imagine how quickly the dev and testing story becomes a fate worse than death.</p>\n\n<p>To run the app locally, we either need to add some huge dev dependencies to simulate the entire FaaS pipeline, or worse, connect up our dev and test environments directly to the FaaS provider.</p>\n\n<p>With FLAME, your dev and test runners simply run on the local backend.</p>\n\n<p>Remember, this is your app. FLAME just controls where modular parts of it run. In dev or test, those parts simply run on the existing runtime on your laptop or CI server.</p>\n\n<p>Using Elixir, we can even send a file across to the remote FLAME application thanks to the distributed features of the Erlang VM:</p>\n<div class=\"highlight-wrapper group relative elixir\">\n  <button \n    type=\"button\"\n    class=\"bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n    data-wrap-target=\"#code-rduui6yl\"\n  >\n    <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\" stroke-linecap=\"round\" stroke-linejoin=\"round\"><g buffered-rendering=\"static\"><path d=\"M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959\" /><path d=\"M11.081 6.466L9.533 8.037l1.548 1.571\" /></g></svg>\n    <span class=\"bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950\">\n      Wrap text\n    </span>\n  </button>\n  <button \n    type=\"button\"\n    class=\"bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n    data-copy-target=\"sibling\"\n  >\n    <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\"><g buffered-rendering=\"static\"><path d=\"M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z\" /><path d=\"M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617\" /></g></svg>\n    <span class=\"bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950\">\n      Copy to clipboard\n    </span>\n  </button>\n  <div class='highlight relative group'>\n    <pre class='highlight '><code id=\"code-rduui6yl\"><span class=\"k\">def</span> <span class=\"n\">generate_thumbnails</span><span class=\"p\">(%</span><span class=\"no\">Video</span><span class=\"p\">{}</span> <span class=\"o\">=</span> <span class=\"n\">vid</span><span class=\"p\">,</span> <span class=\"n\">interval</span><span class=\"p\">)</span> <span class=\"k\">do</span>\n  <span class=\"n\">parent_stream</span> <span class=\"o\">=</span> <span class=\"no\">File</span><span class=\"o\">.</span><span class=\"n\">stream!</span><span class=\"p\">(</span><span class=\"n\">vid</span><span class=\"o\">.</span><span class=\"n\">filepath</span><span class=\"p\">,</span> <span class=\"p\">[],</span> <span class=\"mi\">2048</span><span class=\"p\">)</span>\n  <span class=\"no\">FLAME</span><span class=\"o\">.</span><span class=\"n\">call</span><span class=\"p\">(</span><span class=\"no\">MyApp</span><span class=\"o\">.</span><span class=\"no\">FFMpegRunner</span><span class=\"p\">,</span> <span class=\"k\">fn</span> <span class=\"o\">-></span>\n    <span class=\"n\">tmp_file</span> <span class=\"o\">=</span> <span class=\"no\">Path</span><span class=\"o\">.</span><span class=\"n\">join</span><span class=\"p\">(</span><span class=\"no\">System</span><span class=\"o\">.</span><span class=\"n\">tmp_dir!</span><span class=\"p\">(),</span> <span class=\"no\">Ecto</span><span class=\"o\">.</span><span class=\"no\">UUID</span><span class=\"o\">.</span><span class=\"n\">generate</span><span class=\"p\">())</span>\n    <span class=\"n\">flame_stream</span> <span class=\"o\">=</span> <span class=\"no\">File</span><span class=\"o\">.</span><span class=\"n\">stream!</span><span class=\"p\">(</span><span class=\"n\">tmp_file</span><span class=\"p\">)</span>\n    <span class=\"no\">Enum</span><span class=\"o\">.</span><span class=\"n\">into</span><span class=\"p\">(</span><span class=\"n\">parent_stream</span><span class=\"p\">,</span> <span class=\"n\">flame_stream</span><span class=\"p\">)</span>\n\n    <span class=\"n\">tmp</span> <span class=\"o\">=</span> <span class=\"no\">Path</span><span class=\"o\">.</span><span class=\"n\">join</span><span class=\"p\">(</span><span class=\"no\">System</span><span class=\"o\">.</span><span class=\"n\">tmp_dir!</span><span class=\"p\">(),</span> <span class=\"no\">Ecto</span><span class=\"o\">.</span><span class=\"no\">UUID</span><span class=\"o\">.</span><span class=\"n\">generate</span><span class=\"p\">())</span>\n    <span class=\"no\">File</span><span class=\"o\">.</span><span class=\"n\">mkdir!</span><span class=\"p\">(</span><span class=\"n\">tmp</span><span class=\"p\">)</span>\n    <span class=\"n\">args</span> <span class=\"o\">=</span>\n      <span class=\"p\">[</span><span class=\"s2\">\"-i\"</span><span class=\"p\">,</span> <span class=\"n\">tmp_file</span><span class=\"p\">,</span> <span class=\"s2\">\"-vf\"</span><span class=\"p\">,</span> <span class=\"s2\">\"fps=1/</span><span class=\"si\">#{</span><span class=\"n\">interval</span><span class=\"si\">}</span><span class=\"s2\">\"</span><span class=\"p\">,</span> <span class=\"s2\">\"</span><span class=\"si\">#{</span><span class=\"n\">tmp</span><span class=\"si\">}</span><span class=\"s2\">/%02d.png\"</span><span class=\"p\">]</span>\n    <span class=\"no\">System</span><span class=\"o\">.</span><span class=\"n\">cmd</span><span class=\"p\">(</span><span class=\"s2\">\"ffmpeg\"</span><span class=\"p\">,</span> <span class=\"n\">args</span><span class=\"p\">)</span>\n    <span class=\"n\">urls</span> <span class=\"o\">=</span> <span class=\"no\">VidStore</span><span class=\"o\">.</span><span class=\"n\">put_thumbnails</span><span class=\"p\">(</span><span class=\"n\">vid</span><span class=\"p\">,</span> <span class=\"no\">Path</span><span class=\"o\">.</span><span class=\"n\">wildcard</span><span class=\"p\">(</span><span class=\"n\">tmp</span> <span class=\"o\"><></span> <span class=\"s2\">\"/*.png\"</span><span class=\"p\">))</span>\n    <span class=\"no\">Repo</span><span class=\"o\">.</span><span class=\"n\">insert_all</span><span class=\"p\">(</span><span class=\"no\">Thumb</span><span class=\"p\">,</span> <span class=\"no\">Enum</span><span class=\"o\">.</span><span class=\"n\">map</span><span class=\"p\">(</span><span class=\"n\">urls</span><span class=\"p\">,</span> <span class=\"o\">&</span><span class=\"p\">%{</span><span class=\"ss\">vid_id:</span> <span class=\"n\">vid</span><span class=\"o\">.</span><span class=\"n\">id</span><span class=\"p\">,</span> <span class=\"ss\">url:</span> <span class=\"nv\">&1</span><span class=\"p\">}))</span>\n  <span class=\"k\">end</span><span class=\"p\">)</span>\n<span class=\"k\">end</span>\n</code></pre>\n  </div>\n</div>\n<p>On line 2 we open a file on the parent node to the video path. Then in the FLAME child, we stream the file from the parent node to the FLAME server in only a couple lines of code. That’s it! No setup of S3 or HTTP interfaces required.</p>\n\n<p>With FLAME it’s easy to miss everything we’re not doing:</p>\n\n<ul>\n<li>We don’t need to write code outside of our application. We can reuse business logic, database setup, PubSub, and all the features of our respective platforms\n</li><li>We don’t need to manage deploys of separate services or endpoints\n</li><li>We don’t need to write results to S3 or SQS just to pick up values back in our app\n</li><li>We skip the dev, test, and CI dependency dance\n</li></ul>\n<h2 id='flame-outside-elixir' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#flame-outside-elixir' aria-label='Anchor'></a><span class='plain-code'>FLAME outside Elixir</span></h2>\n<p>Elixir is fantastically well suited for the FLAME model because we get so much <a href='https://fly.io/phoenix-files/elixir-and-phoenix-can-do-it-all/' title=''>for free</a> like process supervision and distributed messaging. That said, any language with reasonable concurrency primitives can take advantage of this pattern. For example, my teammate, Lubien, created a proof of concept example for breaking out functions in your JavaScript application and running them inside a new Fly Machine: <a href='https://github.com/lubien/fly-run-this-function-on-another-machine' title=''>https://github.com/lubien/fly-run-this-function-on-another-machine</a></p>\n\n<p>So the general flow for a JavaScript-based FLAME call would be to move the modular executions to a new file, which is executed on a runner pool. Provided the arguments are JSON serializable, the general FLAME flow is similar to what we’ve outlined here. Your application, your code, running on fleeting instances.</p>\n\n<p>A complete FLAME library will need to handle the following concerns:</p>\n\n<ul>\n<li>Elastic pool scale-up and scale-down logic\n</li><li>Hot vs cold startup with pools\n</li><li>Remote runner monitoring to avoid orphaned resources\n</li><li>How to monitor and keep deployments fresh\n</li></ul>\n\n<p>For the rest of this post we’ll see how the Elixir FLAME library handles these concerns as well as features uniquely suited to Elixir applications. But first, you might be wondering about your background job queues.</p>\n<h2 id='what-about-my-background-job-processor' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#what-about-my-background-job-processor' aria-label='Anchor'></a><span class='plain-code'>What about my background job processor?</span></h2>\n<p>FLAME works great inside your background job processor, but you may have noticed some overlap. If your job library handles scaling the worker pool, what is FLAME doing for you? There’s a couple important distinctions here.</p>\n\n<p>First, we reach for these queues when we need <em>durability guarantees</em>. We often can turn knobs to have the queues scale to handle more jobs as load changes. But durable operations are separate from elastic execution. Conflating these concerns can send you down a similar path to lambda complexity. Leaning on your worker queue purely for offloaded execution means writing all the glue code to get the data into and out of the job, and back to the caller or end-user’s device somehow.</p>\n\n<p>For example, if we want to guarantee we successfully generated thumbnails for a video after the user upload, then a job queue makes sense as the <em>dispatch, commit, and retry</em> <em>mechanism</em> for this operation. The actual transcoding could be a FLAME call inside the job itself, so we decouple the ideas of durability and scaled execution.</p>\n\n<p>On the other side, we have operations we don’t need durability for. Take the screencast above where the user hasn’t yet saved their video. Or an ML model execution where there’s no need to waste resources churning a prompt if the user has already left the app. In those cases, it doesn’t make sense to write to a durable store to pick up a job for work that will go right into the ether.</p>\n<h2 id='pooling-for-elastic-scale' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#pooling-for-elastic-scale' aria-label='Anchor'></a><span class='plain-code'>Pooling for Elastic Scale</span></h2>\n<p>With the Elixir implementation of FLAME, you define elastic pools of runners. This allows scale-to-zero behavior while also elastically scaling up FLAME servers with max concurrency limits.</p>\n\n<p>For example, lets take a look at the <code>start/2</code> callback, which is the entry point of all Elixir applications. We can drop in a <code>FLAME.Pool</code> for video transcriptions and say we want it to scale to zero, boot a max of 10, and support 5 concurrent <code>ffmpeg</code> operations per runner:</p>\n<div class=\"highlight-wrapper group relative elixir\">\n  <button \n    type=\"button\"\n    class=\"bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n    data-wrap-target=\"#code-txwesbky\"\n  >\n    <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\" stroke-linecap=\"round\" stroke-linejoin=\"round\"><g buffered-rendering=\"static\"><path d=\"M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959\" /><path d=\"M11.081 6.466L9.533 8.037l1.548 1.571\" /></g></svg>\n    <span class=\"bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950\">\n      Wrap text\n    </span>\n  </button>\n  <button \n    type=\"button\"\n    class=\"bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n    data-copy-target=\"sibling\"\n  >\n    <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\"><g buffered-rendering=\"static\"><path d=\"M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z\" /><path d=\"M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617\" /></g></svg>\n    <span class=\"bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950\">\n      Copy to clipboard\n    </span>\n  </button>\n  <div class='highlight relative group'>\n    <pre class='highlight '><code id=\"code-txwesbky\"><span class=\"k\">def</span> <span class=\"n\">start</span><span class=\"p\">(</span><span class=\"n\">_type</span><span class=\"p\">,</span> <span class=\"n\">_args</span><span class=\"p\">)</span> <span class=\"k\">do</span>\n  <span class=\"n\">flame_parent</span> <span class=\"o\">=</span> <span class=\"no\">FLAME</span><span class=\"o\">.</span><span class=\"no\">Parent</span><span class=\"o\">.</span><span class=\"n\">get</span><span class=\"p\">()</span>\n\n  <span class=\"n\">children</span> <span class=\"o\">=</span> <span class=\"p\">[</span>\n    <span class=\"o\">...</span><span class=\"p\">,</span>\n    <span class=\"no\">MyApp</span><span class=\"o\">.</span><span class=\"no\">Repo</span><span class=\"p\">,</span>\n    <span class=\"p\">{</span><span class=\"no\">FLAME</span><span class=\"o\">.</span><span class=\"no\">Pool</span><span class=\"p\">,</span>\n      <span class=\"ss\">name:</span> <span class=\"no\">Thumbs</span><span class=\"o\">.</span><span class=\"no\">FFMpegRunner</span><span class=\"p\">,</span>\n      <span class=\"ss\">min:</span> <span class=\"mi\">0</span><span class=\"p\">,</span>\n      <span class=\"ss\">max:</span> <span class=\"mi\">10</span><span class=\"p\">,</span>\n      <span class=\"ss\">max_concurrency:</span> <span class=\"mi\">5</span><span class=\"p\">,</span>\n      <span class=\"ss\">idle_shutdown_after:</span> <span class=\"mi\">30_000</span><span class=\"p\">},</span>\n    <span class=\"n\">!flame_parent</span> <span class=\"o\">&&</span> <span class=\"no\">MyAppWeb</span><span class=\"o\">.</span><span class=\"no\">Endpoint</span>\n  <span class=\"p\">]</span>\n  <span class=\"o\">|></span> <span class=\"no\">Enum</span><span class=\"o\">.</span><span class=\"n\">filter</span><span class=\"p\">(</span><span class=\"o\">&</span> <span class=\"nv\">&1</span><span class=\"p\">)</span>\n\n  <span class=\"n\">opts</span> <span class=\"o\">=</span> <span class=\"p\">[</span><span class=\"ss\">strategy:</span> <span class=\"ss\">:one_for_one</span><span class=\"p\">,</span> <span class=\"ss\">name:</span> <span class=\"no\">MyApp</span><span class=\"o\">.</span><span class=\"no\">Supervisor</span><span class=\"p\">]</span>\n  <span class=\"no\">Supervisor</span><span class=\"o\">.</span><span class=\"n\">start_link</span><span class=\"p\">(</span><span class=\"n\">children</span><span class=\"p\">,</span> <span class=\"n\">opts</span><span class=\"p\">)</span>\n<span class=\"k\">end</span>\n</code></pre>\n  </div>\n</div>\n<p>We use the presence of a FLAME parent to conditionally start our Phoenix webserver when booting the app. There’s no reason to start a webserver if we aren’t serving web traffic. Note we leave other services like the database <code>MyApp.Repo</code> alone because we want to make use of those services inside FLAME runners.</p>\n\n<p>Elixir’s supervised process approach to applications is uniquely great for turning these kinds of knobs.</p>\n\n<p>We also set our pool to idle down after 30 seconds of no caller operations. This keeps our runners hot for a short while before discarding them. We could also pass a <code>min: 1</code> to always ensure at least one <code>ffmpeg</code> runner is hot and ready for work by the time our application is started.</p>\n<h2 id='process-placement' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#process-placement' aria-label='Anchor'></a><span class='plain-code'>Process Placement</span></h2>\n<p>In Elixir, stateful bits of our applications are built around the <em>process</em> primitive – lightweight greenthreads with message mailboxes. Wrapping our otherwise stateless app code in a synchronous <code>FLAME.call</code>‘s or async <code>FLAME.cast</code>’s works great, but what about the stateful parts of our app?</p>\n\n<p><code>FLAME.place_child</code> exists to take an existing process specification in your Elixir app and start it on a FLAME runner instead of locally. You can use it anywhere you’d use <code>Task.Supervisor.start_child</code> , <code>DynamicSupervisor.start_child</code>, or similar interfaces. Just like <code>FLAME.call</code>, the process is run on an elastic pool and runners handle idle down when the process completes its work.</p>\n\n<p>And like <code>FLAME.call</code>, it lets us take existing app code, change a single LOC, and continue shipping features.</p>\n\n<p>Let’s walk thru the example from the screencast above. Imagine we want to generate video thumbnails for a video <em>as it is being uploaded</em>. Elixir and LiveView make this easy. We won’t cover all the code here, but you can view the <a href='https://github.com/fly-apps/thumbnail_generator/blob/main/lib/thumbs/thumbnail_generator.ex' title=''>full app implementation</a>.</p>\n\n<p>Our first pass would be to write a LiveView upload writer that calls into a <code>ThumbnailGenerator</code>:</p>\n<div class=\"highlight-wrapper group relative elixir\">\n  <button \n    type=\"button\"\n    class=\"bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n    data-wrap-target=\"#code-a6sjmlsd\"\n  >\n    <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\" stroke-linecap=\"round\" stroke-linejoin=\"round\"><g buffered-rendering=\"static\"><path d=\"M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959\" /><path d=\"M11.081 6.466L9.533 8.037l1.548 1.571\" /></g></svg>\n    <span class=\"bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950\">\n      Wrap text\n    </span>\n  </button>\n  <button \n    type=\"button\"\n    class=\"bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n    data-copy-target=\"sibling\"\n  >\n    <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\"><g buffered-rendering=\"static\"><path d=\"M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z\" /><path d=\"M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617\" /></g></svg>\n    <span class=\"bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950\">\n      Copy to clipboard\n    </span>\n  </button>\n  <div class='highlight relative group'>\n    <pre class='highlight '><code id=\"code-a6sjmlsd\"><span class=\"k\">defmodule</span> <span class=\"no\">ThumbsWeb</span><span class=\"o\">.</span><span class=\"no\">ThumbnailUploadWriter</span> <span class=\"k\">do</span>\n  <span class=\"nv\">@behaviour</span> <span class=\"no\">Phoenix</span><span class=\"o\">.</span><span class=\"no\">LiveView</span><span class=\"o\">.</span><span class=\"no\">UploadWriter</span>\n\n  <span class=\"n\">alias</span> <span class=\"no\">Thumbs</span><span class=\"o\">.</span><span class=\"no\">ThumbnailGenerator</span>\n\n  <span class=\"k\">def</span> <span class=\"n\">init</span><span class=\"p\">(</span><span class=\"n\">opts</span><span class=\"p\">)</span> <span class=\"k\">do</span>\n    <span class=\"n\">generator</span> <span class=\"o\">=</span> <span class=\"no\">ThumbnailGenerator</span><span class=\"o\">.</span><span class=\"n\">open</span><span class=\"p\">(</span><span class=\"n\">opts</span><span class=\"p\">)</span>\n    <span class=\"p\">{</span><span class=\"ss\">:ok</span><span class=\"p\">,</span> <span class=\"p\">%{</span><span class=\"ss\">gen:</span> <span class=\"n\">generator</span><span class=\"p\">}}</span>\n  <span class=\"k\">end</span>\n\n  <span class=\"k\">def</span> <span class=\"n\">write_chunk</span><span class=\"p\">(</span><span class=\"n\">data</span><span class=\"p\">,</span> <span class=\"n\">state</span><span class=\"p\">)</span> <span class=\"k\">do</span>\n    <span class=\"no\">ThumbnailGenerator</span><span class=\"o\">.</span><span class=\"n\">stream_chunk!</span><span class=\"p\">(</span><span class=\"n\">state</span><span class=\"o\">.</span><span class=\"n\">gen</span><span class=\"p\">,</span> <span class=\"n\">data</span><span class=\"p\">)</span>\n    <span class=\"p\">{</span><span class=\"ss\">:ok</span><span class=\"p\">,</span> <span class=\"n\">state</span><span class=\"p\">}</span>\n  <span class=\"k\">end</span>\n\n  <span class=\"k\">def</span> <span class=\"n\">meta</span><span class=\"p\">(</span><span class=\"n\">state</span><span class=\"p\">),</span> <span class=\"k\">do</span><span class=\"p\">:</span> <span class=\"p\">%{</span><span class=\"ss\">gen:</span> <span class=\"n\">state</span><span class=\"o\">.</span><span class=\"n\">gen</span><span class=\"p\">}</span>\n\n  <span class=\"k\">def</span> <span class=\"n\">close</span><span class=\"p\">(</span><span class=\"n\">state</span><span class=\"p\">,</span> <span class=\"n\">_reason</span><span class=\"p\">)</span> <span class=\"k\">do</span>\n    <span class=\"no\">ThumbnailGenerator</span><span class=\"o\">.</span><span class=\"n\">close</span><span class=\"p\">(</span><span class=\"n\">state</span><span class=\"o\">.</span><span class=\"n\">gen</span><span class=\"p\">)</span>\n    <span class=\"p\">{</span><span class=\"ss\">:ok</span><span class=\"p\">,</span> <span class=\"n\">state</span><span class=\"p\">}</span>\n  <span class=\"k\">end</span>\n<span class=\"k\">end</span>\n</code></pre>\n  </div>\n</div>\n<p>An upload writer is a behavior that simply ferries the uploaded chunks from the client into whatever we’d like to do with them. Here we have a <code>ThumbnailGenerator.open/1</code> which starts a process that communicates with an <code>ffmpeg</code> shell. Inside <code>ThumbnailGenerator.open/1</code>, we use regular elixir process primitives:</p>\n<div class=\"highlight-wrapper group relative elixir\">\n  <button \n    type=\"button\"\n    class=\"bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n    data-wrap-target=\"#code-9zrau48v\"\n  >\n    <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\" stroke-linecap=\"round\" stroke-linejoin=\"round\"><g buffered-rendering=\"static\"><path d=\"M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959\" /><path d=\"M11.081 6.466L9.533 8.037l1.548 1.571\" /></g></svg>\n    <span class=\"bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950\">\n      Wrap text\n    </span>\n  </button>\n  <button \n    type=\"button\"\n    class=\"bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n    data-copy-target=\"sibling\"\n  >\n    <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\"><g buffered-rendering=\"static\"><path d=\"M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z\" /><path d=\"M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617\" /></g></svg>\n    <span class=\"bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950\">\n      Copy to clipboard\n    </span>\n  </button>\n  <div class='highlight relative group'>\n    <pre class='highlight '><code id=\"code-9zrau48v\">  <span class=\"c1\"># thumbnail_generator.ex</span>\n  <span class=\"k\">def</span> <span class=\"n\">open</span><span class=\"p\">(</span><span class=\"n\">opts</span> <span class=\"p\">\\\\</span> <span class=\"p\">[])</span> <span class=\"k\">do</span>\n    <span class=\"no\">Keyword</span><span class=\"o\">.</span><span class=\"n\">validate!</span><span class=\"p\">(</span><span class=\"n\">opts</span><span class=\"p\">,</span> <span class=\"p\">[</span><span class=\"ss\">:timeout</span><span class=\"p\">,</span> <span class=\"ss\">:caller</span><span class=\"p\">,</span> <span class=\"ss\">:fps</span><span class=\"p\">])</span>\n    <span class=\"n\">timeout</span> <span class=\"o\">=</span> <span class=\"no\">Keyword</span><span class=\"o\">.</span><span class=\"n\">get</span><span class=\"p\">(</span><span class=\"n\">opts</span><span class=\"p\">,</span> <span class=\"ss\">:timeout</span><span class=\"p\">,</span> <span class=\"mi\">5_000</span><span class=\"p\">)</span>\n    <span class=\"n\">caller</span> <span class=\"o\">=</span> <span class=\"no\">Keyword</span><span class=\"o\">.</span><span class=\"n\">get</span><span class=\"p\">(</span><span class=\"n\">opts</span><span class=\"p\">,</span> <span class=\"ss\">:caller</span><span class=\"p\">,</span> <span class=\"n\">self</span><span class=\"p\">())</span>\n    <span class=\"n\">ref</span> <span class=\"o\">=</span> <span class=\"n\">make_ref</span><span class=\"p\">()</span>\n    <span class=\"n\">parent</span> <span class=\"o\">=</span> <span class=\"n\">self</span><span class=\"p\">()</span>\n\n    <span class=\"n\">spec</span> <span class=\"o\">=</span> <span class=\"p\">{</span><span class=\"bp\">__MODULE__</span><span class=\"p\">,</span> <span class=\"p\">{</span><span class=\"n\">caller</span><span class=\"p\">,</span> <span class=\"n\">ref</span><span class=\"p\">,</span> <span class=\"n\">parent</span><span class=\"p\">,</span> <span class=\"n\">opts</span><span class=\"p\">}}</span>\n    <span class=\"p\">{</span><span class=\"ss\">:ok</span><span class=\"p\">,</span> <span class=\"n\">pid</span><span class=\"p\">}</span> <span class=\"o\">=</span> <span class=\"no\">DynamicSupervisor</span><span class=\"o\">.</span><span class=\"n\">start_child</span><span class=\"p\">(</span><span class=\"nv\">@sup</span><span class=\"p\">,</span> <span class=\"n\">spec</span><span class=\"p\">)</span>\n\n    <span class=\"k\">receive</span> <span class=\"k\">do</span>\n      <span class=\"p\">{</span><span class=\"o\">^</span><span class=\"n\">ref</span><span class=\"p\">,</span> <span class=\"p\">%</span><span class=\"no\">ThumbnailGenerator</span><span class=\"p\">{}</span> <span class=\"o\">=</span> <span class=\"n\">gen</span><span class=\"p\">}</span> <span class=\"o\">-></span>\n        <span class=\"p\">%</span><span class=\"no\">ThumbnailGenerator</span><span class=\"p\">{</span><span class=\"n\">gen</span> <span class=\"o\">|</span> <span class=\"ss\">pid:</span> <span class=\"n\">pid</span><span class=\"p\">}</span>\n    <span class=\"k\">after</span>\n      <span class=\"n\">timeout</span> <span class=\"o\">-></span> <span class=\"k\">exit</span><span class=\"p\">(</span><span class=\"ss\">:timeout</span><span class=\"p\">)</span>\n    <span class=\"k\">end</span>\n  <span class=\"k\">end</span>\n</code></pre>\n  </div>\n</div>\n<p>The details aren’t super important here, except line 10 where we call <code>{:ok, pid} = DynamicSupervisor.start_child(@sup, spec)</code>, which starts a supervised<code>ThumbnailGenerator</code> process. The rest of the implementation simply ferries chunks as stdin into <code>ffmpeg</code> and parses png’s from stdout. Once a PNG delimiter is found in stdout, we send the <code>caller</code> process (our LiveView process) a message saying “hey, here’s an image”:</p>\n<div class=\"highlight-wrapper group relative elixir\">\n  <button \n    type=\"button\"\n    class=\"bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n    data-wrap-target=\"#code-mgmj8tt7\"\n  >\n    <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\" stroke-linecap=\"round\" stroke-linejoin=\"round\"><g buffered-rendering=\"static\"><path d=\"M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959\" /><path d=\"M11.081 6.466L9.533 8.037l1.548 1.571\" /></g></svg>\n    <span class=\"bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950\">\n      Wrap text\n    </span>\n  </button>\n  <button \n    type=\"button\"\n    class=\"bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n    data-copy-target=\"sibling\"\n  >\n    <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\"><g buffered-rendering=\"static\"><path d=\"M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z\" /><path d=\"M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617\" /></g></svg>\n    <span class=\"bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950\">\n      Copy to clipboard\n    </span>\n  </button>\n  <div class='highlight relative group'>\n    <pre class='highlight '><code id=\"code-mgmj8tt7\"><span class=\"c1\"># thumbnail_generator.ex</span>\n<span class=\"nv\">@png_begin</span> <span class=\"o\"><<</span><span class=\"mi\">137</span><span class=\"p\">,</span> <span class=\"mi\">80</span><span class=\"p\">,</span> <span class=\"mi\">78</span><span class=\"p\">,</span> <span class=\"mi\">71</span><span class=\"p\">,</span> <span class=\"mi\">13</span><span class=\"p\">,</span> <span class=\"mi\">10</span><span class=\"p\">,</span> <span class=\"mi\">26</span><span class=\"p\">,</span> <span class=\"mi\">10</span><span class=\"o\">>></span>\n<span class=\"k\">defp</span> <span class=\"n\">handle_stdout</span><span class=\"p\">(</span><span class=\"n\">state</span><span class=\"p\">,</span> <span class=\"n\">ref</span><span class=\"p\">,</span> <span class=\"n\">bin</span><span class=\"p\">)</span> <span class=\"k\">do</span>\n  <span class=\"p\">%</span><span class=\"no\">ThumbnailGenerator</span><span class=\"p\">{</span><span class=\"ss\">ref:</span> <span class=\"o\">^</span><span class=\"n\">ref</span><span class=\"p\">,</span> <span class=\"ss\">caller:</span> <span class=\"n\">caller</span><span class=\"p\">}</span> <span class=\"o\">=</span> <span class=\"n\">state</span><span class=\"o\">.</span><span class=\"n\">gen</span>\n\n  <span class=\"k\">case</span> <span class=\"n\">bin</span> <span class=\"k\">do</span>\n    <span class=\"o\"><<</span><span class=\"nv\">@png_begin</span><span class=\"p\">,</span> <span class=\"n\">_rest</span><span class=\"p\">::</span><span class=\"n\">binary</span><span class=\"o\">>></span> <span class=\"o\">-></span>\n      <span class=\"k\">if</span> <span class=\"n\">state</span><span class=\"o\">.</span><span class=\"n\">current</span> <span class=\"k\">do</span>\n        <span class=\"n\">send</span><span class=\"p\">(</span><span class=\"n\">caller</span><span class=\"p\">,</span> <span class=\"p\">{</span><span class=\"n\">ref</span><span class=\"p\">,</span> <span class=\"ss\">:image</span><span class=\"p\">,</span> <span class=\"n\">state</span><span class=\"o\">.</span><span class=\"n\">count</span><span class=\"p\">,</span> <span class=\"n\">encode</span><span class=\"p\">(</span><span class=\"n\">state</span><span class=\"p\">)})</span>\n      <span class=\"k\">end</span>\n\n      <span class=\"p\">%{</span><span class=\"n\">state</span> <span class=\"o\">|</span> <span class=\"ss\">count:</span> <span class=\"n\">state</span><span class=\"o\">.</span><span class=\"n\">count</span> <span class=\"o\">+</span> <span class=\"mi\">1</span><span class=\"p\">,</span> <span class=\"ss\">current:</span> <span class=\"p\">[</span><span class=\"n\">bin</span><span class=\"p\">]}</span>\n\n    <span class=\"n\">_</span> <span class=\"o\">-></span>\n      <span class=\"p\">%{</span><span class=\"n\">state</span> <span class=\"o\">|</span> <span class=\"ss\">current:</span> <span class=\"p\">[</span><span class=\"n\">bin</span> <span class=\"o\">|</span> <span class=\"n\">state</span><span class=\"o\">.</span><span class=\"n\">current</span><span class=\"p\">]}</span>\n  <span class=\"k\">end</span>\n<span class=\"k\">end</span>\n</code></pre>\n  </div>\n</div>\n<p>The <code>caller</code> LiveView process then picks up the message in a <code>handle_info</code> callback and updates the UI:</p>\n<div class=\"highlight-wrapper group relative elixir\">\n  <button \n    type=\"button\"\n    class=\"bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n    data-wrap-target=\"#code-k5i7sj1i\"\n  >\n    <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\" stroke-linecap=\"round\" stroke-linejoin=\"round\"><g buffered-rendering=\"static\"><path d=\"M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959\" /><path d=\"M11.081 6.466L9.533 8.037l1.548 1.571\" /></g></svg>\n    <span class=\"bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950\">\n      Wrap text\n    </span>\n  </button>\n  <button \n    type=\"button\"\n    class=\"bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n    data-copy-target=\"sibling\"\n  >\n    <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\"><g buffered-rendering=\"static\"><path d=\"M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z\" /><path d=\"M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617\" /></g></svg>\n    <span class=\"bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950\">\n      Copy to clipboard\n    </span>\n  </button>\n  <div class='highlight relative group'>\n    <pre class='highlight '><code id=\"code-k5i7sj1i\"><span class=\"c1\"># thumb_live.ex</span>\n<span class=\"k\">def</span> <span class=\"n\">handle_info</span><span class=\"p\">({</span><span class=\"n\">_ref</span><span class=\"p\">,</span> <span class=\"ss\">:image</span><span class=\"p\">,</span> <span class=\"n\">_count</span><span class=\"p\">,</span> <span class=\"n\">encoded</span><span class=\"p\">},</span> <span class=\"n\">socket</span><span class=\"p\">)</span> <span class=\"k\">do</span>\n  <span class=\"p\">%{</span><span class=\"ss\">count:</span> <span class=\"n\">count</span><span class=\"p\">}</span> <span class=\"o\">=</span> <span class=\"n\">socket</span><span class=\"o\">.</span><span class=\"n\">assigns</span>\n\n  <span class=\"p\">{</span><span class=\"ss\">:noreply</span><span class=\"p\">,</span>\n   <span class=\"n\">socket</span>\n   <span class=\"o\">|></span> <span class=\"n\">assign</span><span class=\"p\">(</span><span class=\"ss\">count:</span> <span class=\"n\">count</span> <span class=\"o\">+</span> <span class=\"mi\">1</span><span class=\"p\">,</span> <span class=\"ss\">message:</span> <span class=\"s2\">\"Generating (</span><span class=\"si\">#{</span><span class=\"n\">count</span> <span class=\"o\">+</span> <span class=\"mi\">1</span><span class=\"si\">}</span><span class=\"s2\">)\"</span><span class=\"p\">)</span>\n   <span class=\"o\">|></span> <span class=\"n\">stream_insert</span><span class=\"p\">(</span><span class=\"ss\">:thumbs</span><span class=\"p\">,</span> <span class=\"p\">%{</span><span class=\"ss\">id:</span> <span class=\"n\">count</span><span class=\"p\">,</span> <span class=\"ss\">encoded:</span> <span class=\"n\">encoded</span><span class=\"p\">})}</span>\n<span class=\"k\">end</span>\n</code></pre>\n  </div>\n</div>\n<p>The <code>send(caller, {ref, :image, state.count, encode(state)}</code> is one magic part about Elixir. Everything is a process, and we can message those processes, regardless of their location in the cluster.</p>\n\n<p>It’s like if every instantiation of an object in your favorite OO lang included a cluster-global unique identifier to work with methods on that object. The LiveView (a process) simply receives the image message and updates the UI with new images.</p>\n\n<p>Now let’s head back over to our <code>ThumbnailGenerator.open/1</code> function and make this elastically scalable.</p>\n<div class=\"highlight-wrapper group relative diff\">\n  <button \n    type=\"button\"\n    class=\"bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n    data-wrap-target=\"#code-2cwbroo9\"\n  >\n    <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\" stroke-linecap=\"round\" stroke-linejoin=\"round\"><g buffered-rendering=\"static\"><path d=\"M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959\" /><path d=\"M11.081 6.466L9.533 8.037l1.548 1.571\" /></g></svg>\n    <span class=\"bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950\">\n      Wrap text\n    </span>\n  </button>\n  <button \n    type=\"button\"\n    class=\"bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n    data-copy-target=\"sibling\"\n  >\n    <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\"><g buffered-rendering=\"static\"><path d=\"M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z\" /><path d=\"M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617\" /></g></svg>\n    <span class=\"bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950\">\n      Copy to clipboard\n    </span>\n  </button>\n  <div class='highlight relative group'>\n    <pre class='highlight '><code id=\"code-2cwbroo9\"><span class=\"gd\">-    {:ok, pid} = DynamicSupervisor.start_child(@sup, spec)\n</span><span class=\"gi\">+    {:ok, pid} = FLAME.place_child(Thumbs.FFMpegRunner, spec)\n</span></code></pre>\n  </div>\n</div>\n<p>That’s it! Because everything is a process and processes can live anywhere, it doesn’t matter what server our <code>ThumbnailGenerator</code> process lives on. It simply messages the caller with <code>send(caller, …)</code> and the messages are sent across the cluster if needed.</p>\n\n<p>Once the process exits, either from an explicit close, after the upload is done, or from the end-user closing their browser tab, the FLAME server will note the exit and idle down if no other work is being done.</p>\n\n<p>Check out the <a href='https://github.com/fly-apps/thumbnail_generator/blob/main/lib/thumbs/thumbnail_generator.ex' title=''>full implementation</a> if you’re interested.</p>\n<h2 id='remote-monitoring' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#remote-monitoring' aria-label='Anchor'></a><span class='plain-code'>Remote Monitoring</span></h2>\n<p>All this transient infrastructure needs failsafe mechanisms to avoid orphaning resources. If a parent spins up a runner, that runner must take care of idling itself down when no work is present and handle failsafe shutdowns if it can no longer contact the parent node.</p>\n\n<p>Likewise, we need to shutdown runners when parents are rolled for new deploys as we must guarantee we’re running the same code across the cluster.</p>\n\n<p>We also have active callers in many cases that are awaiting the result of work on runners that could go down for any reason.</p>\n\n<p>There’s a lot to monitor here.</p>\n\n<p>There’s also a number of failure modes that make this sound like a harrowing experience to implement. Fortunately Elixir has all the primitives to make this an easy task thanks to the Erlang VM. Namely, we get the following for free:</p>\n\n<ul>\n<li>Process monitoring and supervision – we know when things go bad. Whether on a node-local process, or one across the cluster\n</li><li>Node monitoring – we know when nodes come up, and when nodes go away\n</li><li>Declarative and controlled app startup and shutdown - we carefully control the startup and shutdown sequence of applications as a matter of course. This allows us to gracefully shutdown active runners when a fresh deploy is triggered, while giving them time to finish their work\n</li></ul>\n\n<p>We’ll cover the internal implementation details in a future deep-dive post. For now, feel free to poke around <a href='https://github.com/phoenixframework/flame' title=''>the flame source</a>.</p>\n<h2 id='whats-next' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#whats-next' aria-label='Anchor'></a><span class='plain-code'>What’s Next</span></h2>\n<p>We’re just getting started with the Elixir FLAME library, but it’s ready to try out now. In the future  look for more advance pool growth techniques, and deep dives into how the Elixir implementation works. You can also find me <a href='https://twitter.com/chris_mccord' title=''>@chris_mccord</a> to chat about implementing the FLAME pattern in your language of choice.</p>\n\n<p>Happy coding!</p>\n\n<p>–Chris</p>",
      "image": {
        "url": "https://fly.io/blog/rethinking-serverless-with-flame/assets/flame-thumb.webp",
        "title": null
      },
      "media": [],
      "authors": [
        {
          "name": "Fly",
          "email": null,
          "url": null
        }
      ],
      "categories": []
    },
    {
      "id": "https://fly.io/blog/the-risks-of-building-apps-on-chatgpt/",
      "title": "The risks of building apps on ChatGPT",
      "description": null,
      "url": "https://fly.io/blog/the-risks-of-building-apps-on-chatgpt/",
      "published": "2023-12-05T00:00:00.000Z",
      "updated": "2025-05-20T16:26:25.000Z",
      "content": "<div class=\"lead\"><p>If AI will play an essential role in your application, then consider using a self-hosted, open source model instead of a proprietary and externally hosted one. In this post we explore some of the risks for the latter option. We’re Fly.io. We put your code into lightweight microVMs on our own hardware <a href=\"https://fly.io/docs/reference/regions/\" title=\"\">around the world</a>. <a href=\"https://fly.io/docs/speedrun/\" title=\"\">Check us out</a>—your app can be deployed in minutes.</p>\n</div>\n<p>The topic of “AI” gets a lot of attention and press. Coverage ranges from apocalyptic warnings to Utopian predictions. The truth, as always, is likely somewhere in the middle. As developers, we are the ones that either imagine ways that AI can be used to enhance our products or the ones doing the herculean tasks of implementing it inside our companies.</p>\n\n<p>I believe the following statement to be true:</p>\n\n<blockquote>\n<p>AI won’t replace humans — but humans with AI will replace humans without AI.</p>\n</blockquote>\n\n<p>I believe this can be extended to many products and services and the companies that create them. Let’s express it this way:</p>\n\n<blockquote>\n<p>AI won’t replace businesses — but businesses with AI will replace businesses without AI.</p>\n</blockquote>\n\n<p>Today I’m assuming your business would benefit from using AI. Or, at the very least, your C-levels have decreed from on high that thou must integrateth with AI. With that out of the way, the next question is how you’re meant to do it. This post is an argument to build on top of open source language models instead of closed models that you rent access to. We’ll take a look at what convinced me.</p>\n<h2 id='but-openai-is-the-market-leader' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#but-openai-is-the-market-leader' aria-label='Anchor'></a><span class='plain-code'>But OpenAI is the market leader…</span></h2>\n<p>OpenAI, the creators of the famous ChatGPT, are the strong market leaders in this category. Why wouldn’t you want to use the best in the business?</p>\n\n<p>Early on, stories of private corporate documents being uploaded by employees and then finding that private information leaking out to general ChatGPT users was a real black eye. <a href='https://www.sciencealert.com/many-companies-are-banning-chatgpt-this-is-why' title=''>Companies began banning employees from using ChatGPT for work</a>. It exposed that people’s interactions with ChatGPT were being used as training data for future versions of the model.</p>\n\n<p>In response, OpenAI recently announced an <a href='https://openai.com/enterprise' title=''>Enterprise</a> offering promising that no Enterprise customer data is used for training.</p>\n\n<p>With the top objection addressed, it should be smooth sailing for wide adoption, right?</p>\n\n<p>Not so fast.</p>\n\n<p>While an Enterprise offering may address that concern, there are other subtle reasons to not use OpenAI, or other closed models, that can’t be resolved by vague statements of enterprise privacy.</p>\n<h2 id='what-are-the-risks-for-building-on-top-of-openai' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#what-are-the-risks-for-building-on-top-of-openai' aria-label='Anchor'></a><span class='plain-code'>What are the risks for building on top of OpenAI?</span></h2>\n<p>Let’s briefly outline the risks we take on when relying on a company like OpenAI for critical AI features in our applications.</p>\n\n<ul>\n<li><strong class='font-semibold text-navy-950'>Single provider risk</strong>: Relying deeply on an external service that plays a critical role in our business is risky. The integration is not easily swapped out for another service if needed. Additionally, we don’t want part of our “secret sauce” to actually be another company’s product. That’s some seriously shaky ground! They <em>want</em> to sell the same thing to our competitors too.\n</li><li><strong class='font-semibold text-navy-950'>Regulation or Policy change risk</strong>: “AI” is being talked about a lot in politics. What’s acceptable today may be deemed “not allowed” in the future and a corporation providing a newly regulated service must comply.\n</li><li><strong class='font-semibold text-navy-950'>Financial risk</strong>: <a href='https://www.washingtonpost.com/technology/2023/06/05/chatgpt-hidden-cost-gpu-compute/' title=''>AI chatbots lose money on every chat.</a> If the financial models that make our business profitable are built on impossible to maintain prices, then our business model may be at risk when it’s time to “make the AI engine profitable” like we’ve seen happen time and time again with every industry from cookware to video games. What might the true cost be? We don’t know. ‘Nuff said.\n</li><li><strong class='font-semibold text-navy-950'>Governance and leadership risk</strong>: The co-founder and CEO of OpenAI, <a href='https://openai.com/blog/openai-announces-leadership-transition' title=''>Sam Altman, was forced out of his own company by a coup from his board</a>. This was later resolved with both <a href='https://openai.com/blog/openai-announces-leadership-transition' title=''>Sam Altman and Greg Brockman returning</a>. This exposes another risk we don’t often consider with our providers. More on this later.\n</li></ul>\n\n<p>Let’s look a bit closer at the “Single provider risk”.</p>\n<h2 id='single-provider-risk' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#single-provider-risk' aria-label='Anchor'></a><span class='plain-code'>Single provider risk</span></h2>\n<p>For hobby usage, proof of concept work, and personal experiments, by all means, use ChatGPT! I do and I expect to continue to as well. It’s fantastic for prototyping, it’s trivial to set up, and it allows you to throw ink on canvas so much more quickly than any other option out there.</p>\n\n<p>Up until recently, I was all gung-ho for ChatGPT being integrated into my apps. What happened? November 2023 happened. It was a very bad month for OpenAI.</p>\n\n<p>I created a <a href='https://fly.io/phoenix-files/created-my-personal-ai-fitness-trainer-in-2-days/' title=''>Personal AI Fitness Trainer</a> powered by ChatGPT and on the morning of November 8th, I asked my personal trainer about the workout for the day and it failed. OpenAI was having a bad day with an outage.</p>\n\n<p>I don’t fault someone for having a bad day. At some point, downtime happens to the best of us. And given enough time, it happens to <strong class='font-semibold text-navy-950'>all</strong> of us. But when possible, I want to prevent someone <em>else’s</em> bad day from becoming <em>my</em> bad day too.</p>\n<h3 id='evaluating-a-critical-dependency' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#evaluating-a-critical-dependency' aria-label='Anchor'></a><span class='plain-code'>Evaluating a critical dependency</span></h3>\n<p>In my case, my personal fitness trainer being unavailable was a minor inconvenience, but I managed. However, it gave me pause. If I had built an AI fitness trainer as a service, that outage would be a much bigger deal and there would be nothing I could have done to fix it until the ChatGPT API came back up.</p>\n\n<p>With services like a Personal AI Fitness Trainer, the AI component is the primary focus and main value proposition of the app. That’s pretty darn critical! If that AI service is interrupted, significantly altered (say, by the model suddenly refusing my requests for fitness information in ways that worked before) or my desired usage is denied (without warning or reason), the application is useless. That’s an existential threat that could make my app evaporate overnight without warning.</p>\n\n<p>This highlights the risk of having a critical dependency on an external service.</p>\n\n<p>Modern applications depend on many services, both internal and external. But how <strong class='font-semibold text-navy-950'>critical</strong> that dependency is matters.</p>\n\n<p>Let’s take a <em>very</em> simple application as an example. The application has a critical dependency on the database and both the app and database have a critical dependency on the underlying VMs/machines/provider. These critical dependencies are so common that we seldom think about them because we deal with them every day we come to work. It’s just how things are.</p>\n\n<p><img alt=\"Diagram showing an application stack of hosting > Database > My Application and weak dependencies on logging, error reporting, etc. Then a critical dependency on an external AI as a Service. \" src=\"/blog/the-risks-of-building-apps-on-chatgpt/assets/critical-dependency-vs-weak.png\" /></p>\n\n<p>The danger comes when we draw a critical dependency line to an <strong class='font-semibold text-navy-950'>external</strong> <strong class='font-semibold text-navy-950'>service</strong>. If the service has a hiccup or the network between my app and their service starts dropping all my packets, the entire application goes down. Someone else’s bad day gets spread around when that happens. 😞</p>\n\n<p>In order to protect ourselves from a risk like that, we should diversify our reliance away from a single external provider. How do we do that? We’ll come back to this later.</p>\n<h3 id='we-are-not-without-dependencies' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#we-are-not-without-dependencies' aria-label='Anchor'></a><span class='plain-code'>We are not without dependencies</span></h3>\n<p>It’s really common for apps to have external dependencies. The question is how critical to our service are those dependencies?</p>\n\n<p>What happens to the application when the external log aggregation service, email service, and error reporting services are all unreachable? If the app is designed well, then users may have a slightly degraded experience or, best case, the users won’t even notice the issues at all!</p>\n\n<p>The key factor is these external services are not essential to our application functioning.</p>\n<h2 id='regulation-or-policy-change-risk' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#regulation-or-policy-change-risk' aria-label='Anchor'></a><span class='plain-code'>Regulation or Policy change risk</span></h2>\n<p>Our industry has a lot of misconceptions, fear, uncertainty, and doubt around the idea of regulation, but sometimes it’s justified. I don’t want you to think about regulation as a scary thing that yanks away control. Instead, let’s think about regulation as when a government body gets involved to disallow businesses from doing or engaging in specific activities. Given that our industry has been so self-defined for so long, this feels like an existential threat. However, this is a good thing when we think about vehicle safety standards (you don’t want your 4-ton mass of metal exploding while traveling at 70 mph), pollution, health risks, and more. It’s a careful balance.</p>\n\n<p>Ironically, Sam Altman has been a major proponent <a href='https://www.forbes.com/sites/johannacostigan/2023/06/13/openais-sam-altman-makes-global-call-for-ai-regulation-and-includes-china/?sh=4fc007421b47' title=''>for government regulation</a> of the AI industry. Why would he want that?</p>\n\n<p>It turns out that <a href='https://www.cato.org/policy-analysis/regulatory-protectionism-hidden-threat-free-trade' title=''>regulation can also be used as a form of protectionism</a>. Or, put another way, when the people with an early lead see that <a href='https://www.semianalysis.com/p/google-we-have-no-moat-and-neither' title=''>they aren’t defensible against advances with open source AI models</a>, they want to pull up the ladders behind them and have the government make it legally harder, or impossible, for competitors to catch up to them.</p>\n\n<p>If Altman’s efforts are successful, then companies who create AI can expect government involvement and oversight. Added licensing requirements and certifications would raise the cost of starting a competing business.</p>\n\n<p>At this point you may be thinking something like “but all of that is theoretical Mark, how would this affect my business’ use of AI today?”</p>\n\n<p>Introducing an external organization that can dictate changes to an AI product risks breaking an existing company’s applications or significantly reducing the effectiveness of the application. And those changes may come without notice or warning.</p>\n\n<p>Additionally, if my business is built on an external AI system protected from competition by regulators, that adds a significant risk. If they are now the only game in town, they can set whatever price they want.</p>\n<h2 id='governance-and-leadership-risk' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#governance-and-leadership-risk' aria-label='Anchor'></a><span class='plain-code'>Governance and leadership risk</span></h2>\n<p>In the week following the OpenAI outage (November 17th to be precise), the entire tech industry was upended for most of a week following a blog post on the OpenAI blog <a href='https://openai.com/blog/openai-announces-leadership-transition' title=''>announcing that the OpenAI board fired the co-founder and CEO, Sam Altman</a>. Then <a href='https://www.forbes.com/sites/richardnieva/2023/11/17/openai-president-and-co-founder-quits-over-sam-altman-firing/?sh=34fe4b621d57' title=''>Greg Brockman, co-founder and acting President resigned in protest</a>.</p>\n\n<p>OpenAI is partnered with Microsoft and on Nov 20, 2023, <a href='https://twitter.com/satyanadella/status/1726509045803336122' title=''>Satya Nadella (CEO of Microsoft) posted the following on X</a> (formerly Twitter):</p>\n\n<blockquote>\n<p>We remain committed to our partnership with OpenAI (OAI) and have confidence in our product roadmap, our ability to continue to innovate with everything we announced at Microsoft Ignite, and in continuing to support our customers and partners. We look forward to getting to know Emmett Shear and OAI’s new leadership team and working with them. And <strong class='font-semibold text-navy-950'>we’re extremely excited to share the news that Sam Altman and Greg Brockman, together with colleagues, will be joining Microsoft to lead a new advanced AI research team.</strong> We look forward to moving quickly to provide them with the resources needed for their success.</p>\n</blockquote>\n\n<p>Microsoft nearly <a href='https://en.wikipedia.org/wiki/Acqui-hiring' title=''>acqui-hired</a> OpenAI for $0! That’s some serious business Jujutsu.</p>\n\n<p>In the end, after 12 days of very public corporate chaos, <a href='https://openai.com/blog/sam-altman-returns-as-ceo-openai-has-a-new-initial-board' title=''>Sam Altman and Greg Brockman returned to OpenAI at their previous leadership positions</a> as if nothing happened (save the firing of the rest of the board).</p>\n\n<p>With all the drama and uncertainty resolved, you may say, “it all worked out in the end, right? So what’s the problem?”</p>\n\n<p>This highlights the risk of building <em>any</em> critical business system on a product offered and hosted by an external company. When we do that, we implicitly take on all of that company’s risks in addition to the risks our business already has! In this case, it’s taking on all the risks of OpenAI while getting none of their financial benefits!</p>\n<h2 id='whats-the-alternative' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#whats-the-alternative' aria-label='Anchor'></a><span class='plain-code'>What’s the alternative?</span></h2>\n<p>The thing big AI providers like OpenAI and Google seem to fear most is competition from open source AI models. And they should be afraid. Open source AI models continue to develop at a rapid pace (there’s huge incremental improvements on a weekly basis) and, most importantly, they can be self-hosted.</p>\n\n<p>Additionally, it’s not out of reach for us to <a href='https://huggingface.co/docs/transformers/training' title=''>fine tune</a> a general model to better fit our  needs by adding and removing capabilities rather than hope that the capabilities we need suddenly manifest for us.</p>\n\n<p>Doesn’t this all sound like the classic argument in favor of open source?</p>\n\n<p>If we have the model and can host it ourselves, no one can take it away. When we self-host it, we are protected from:</p>\n\n<ul>\n<li>service interruptions from an external provider for a critical system\n</li><li>changes in licensing or usage fees (such as your provider suddenly doubling inference costs without warning via an email sent at 3AM)\n</li><li>government regulators dictate a change to the model that negatively affects our use case (assuming our use isn’t breaking the law of course)\n</li><li>company policy changes that change the behavior of the model we rely on\n</li><li>rogue boards or a leadership crisis that impacts a provider\n</li></ul>\n\n<p>Using an open source and self-hosted model insulates us from these external risks.</p>\n<h2 id='i-still-need-gpus' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#i-still-need-gpus' aria-label='Anchor'></a><span class='plain-code'>I still need GPUs!</span></h2>\n<p>Getting dedicated access to a GPU is more expensive than renting limited time on OpenAI’s servers. That’s why a hobby or personal project is better off paying for the brief bits of time when needed.</p>\n\n<p>But let’s face it.</p>\n\n<p>If you really want to integrate AI into your business, you need to host your own models. You can’t control third party privacy policies, but you can control your own policies when you are the one doing your own inference with your own models. Ideally this means getting your own GPUs and incurring the capital expenditure and operations expenditures, but thankfully we’re in the future. We have the cloud now. There’s many options you can use for renting GPU access from other companies. This is supported in the big clouds as well as Fly.io. You can check out our <a href='https://fly.io/docs/about/pricing/#gpus-and-fly-machines' title=''>GPU offerings here</a>.</p>\n<figure class=\"post-cta\">\n  <figcaption>\n    <h1>Fly.io also offer GPUs</h1>\n    <p>Running inference on your own hosted models can help de-risk critical AI integrations.</p>\n      <a class=\"btn btn-lg\" href=\"https://fly.io/docs/about/pricing/#gpus-and-fly-machines\">\n        GPU resource prices\n      </a>\n  </figcaption>\n  <div class=\"image-container\">\n    <img src=\"/static/images/cta-rabbit.webp\" srcset=\"/static/images/[email protected] 2x\" alt=\"\">\n  </div>\n</figure>\n\n<h2 id='closing-thoughts' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#closing-thoughts' aria-label='Anchor'></a><span class='plain-code'>Closing thoughts</span></h2>\n<p>It’s important to take advantage of AI in our applications so we can reap the benefits. It can give us an important edge in the market! However, we should be extra cautious of building any critical features on a product offered by a proprietary external business. <a href='https://www.msn.com/en-us/money/companies/sam-altman-chaos-helped-openai-rivals-says-hugging-face-ceo-cl%C3%A9ment-delangue/ar-AA1kIFQP' title=''>Others are considering the risks of building on OpenAI as well</a>.</p>\n\n<p>Your specific level of risk depends on how central the AI aspect is to your business. If it’s a central component like in my Personal AI Fitness Trainer, then I risk losing all my customers and even the company if any of the above mentioned risk factors happen to  my AI provider. That’s an existential risk that I can’t do anything about without taking emergency heroic efforts.</p>\n\n<p>If the AI is sprinkled around the edges of the business, then suddenly losing it won’t kill the company. However, if the AI isn’t being well utilized, then the business may be at risk to competitors who place a bigger bet and take a bigger swing with AI.</p>\n\n<p>Oh, what interesting times we live in! 🙃</p>",
      "image": {
        "url": "https://fly.io/blog/the-risks-of-building-apps-on-chatgpt/assets/risks-building-on-chatgpt-thumb.webp",
        "title": null
      },
      "media": [],
      "authors": [
        {
          "name": "Fly",
          "email": null,
          "url": null
        }
      ],
      "categories": []
    },
    {
      "id": "https://fly.io/blog/print-on-demand/",
      "title": "Print on Demand",
      "description": null,
      "url": "https://fly.io/blog/print-on-demand/",
      "published": "2023-11-29T00:00:00.000Z",
      "updated": "2025-05-20T16:26:25.000Z",
      "content": "<div class=\"lead\"><p>Save money by using appliance machines to only allocate memory and other machine resources when you actually need them.</p>\n</div>\n<p>Scaling discussions often lead to recommendations to add more memory, more CPU, more machines, more regions, more, more, more.</p>\n\n<p>This post is different.  It focuses instead on the idea of decomposing parts of your applications into event handlers, starting up Machines to handle the events when needed, and stopping them when the event is done.  Along the way we will see how a few built in Fly.io primitives make this easy.</p>\n\n<p>To make the discussion concrete, we are going to focus on a common requirement: generation of PDFs from web pages.  The code that we will introduce isn’t merely an example produced in support of a blog post - rather it is code that was extracted from a production application, and packaged up into an appliance that you can deploy in minutes to add PDF generation to your existing application.</p>\n\n<p>But before we dive in, let’s back up a bit.</p>\n<h2 id='motivation' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#motivation' aria-label='Anchor'></a><span class='plain-code'>Motivation</span></h2>\n<p>Normally the way this is approached is to start with a tool like <a href='https://github.com/puppeteer/puppeteer' title=''>Puppeteer</a>, <a href='https://github.com/Studiosity/grover#readme' title=''>Grover</a>, <a href='https://playwright.dev/' title=''>Playwright</a>, <a href='https://github.com/bitcrowd/chromic_pdf' title=''>ChromicPDF</a>, or <a href='https://spatie.be/docs/browsershot/v2/introduction' title=''>BrowserShot</a>.  These and other tools ultimately launch a browser like <a href='https://developer.chrome.com/articles/new-headless/' title=''>Chrome headless</a>.</p>\n\n<p>Now a few things about Chrome itself:</p>\n\n<ul>\n<li>It likely is bigger than your entire web server.\n</li><li>It likely uses more memory than you see with a typical load on your server.\n</li><li>All total, people using your server likely spend much less time generating PDFs than they do using the rest of your application. \n</li></ul>\n\n<p>Taken together, this makes splitting PDF generation into a completely separate application an easy win.  With a smaller image, your application will start faster.  Memory usage will be more predictable, and the memory needed to generate PDFs will only be allocated when needed and can be scaled separately.</p>\n<h2 id='diving-in' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#diving-in' aria-label='Anchor'></a><span class='plain-code'>Diving in</span></h2>\n<p>Without further ado, the entire application is available on GitHub as <a href='https://github.com/fly-apps/pdf-appliance/#readme' title=''>fly-apps/pdf-appliance</a>.  Installation is a simple matter of: clone repository, create app, adjust config, deploy, and scale.</p>\n\n<p>Next, you will need to integrate this into your application.  All that is needed is to reply to requests that are intended to produce a PDF with a <a href='https://fly.io/docs/reference/dynamic-request-routing/#the-fly-replay-response-header' title=''>fly-replay</a> response header.  This can either be done on individual application routes / controller actions, or it can be done globally via either middleware or a front end like <a href='https://www.nginx.com/' title=''>NGINX</a>.  You can find a few examples in the <a href='https://github.com/fly-apps/pdf-appliance/#integrate-with-your-existing-application' title=''>README</a>.</p>\n\n<p>And, that’s it.  The most you might consider doing is issuing an additional HTTP request in anticipation of the user selecting what they want to print as this will <a href='https://github.com/fly-apps/pdf-appliance/#preloading-optional' title=''>preload the machine</a>.</p>\n<figure class=\"post-cta\">\n  <figcaption>\n    <h1>Scale at your own pace</h1>\n    <p>Deploy your project in a few minutes with Fly Launch. Then do more with Fly Machines.</p>\n      <a class=\"btn btn-lg\" href=\"https://fly.io/docs/\">\n        Run your entire stack near your users\n      </a>\n  </figcaption>\n  <div class=\"image-container\">\n    <img src=\"/static/images/cta-turtle.webp\" srcset=\"/static/images/[email protected] 2x\" alt=\"\">\n  </div>\n</figure>\n\n\n<p>If you don’t have an application handy, you can try a demo.  Go to <a href='https://smooth.fly.dev/' title=''>smooth.fly.dev</a>.  Click on Demo, then on Publish, and finally on Invoices to see a PDF.  The PDF you see will likely be underwhelming as you would need to enter students, entries, packages and options to fill out the page.  But click refresh anyway and see how fast it responds.  If you want to explore further, links to the <a href='https://smooth.fly.dev/showcase/docs/' title=''>documentation</a> and <a href='https://github.com/rubys/showcase#readme' title=''>code</a> can be found on the front page.</p>\n<h2 id='implementation-details' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#implementation-details' aria-label='Anchor'></a><span class='plain-code'>Implementation Details</span></h2>\n<p>The basic flow starts with a request comes into your app for a PDF.  That request is replayed to the PDF appliance.  A Chrome instance in that app then issues a second request to your app for the same URL minus the <code>.pdf</code> extension and then converts the HTML which it receives in response to a PDF.  That PDF is then returned as the response to the original request.</p>\n\n<p>A single Google Chrome instance per machine will be reused across all requests, which itself is faster than starting a new instance per request.  As all HTTP headers will be passed back to your application, this will seamlessly work with your existing session, cookies, and basic authentication.</p>\n\n<p>Starting up a machine on demand is handled by the <code>auto_stop_machines</code> setting in your <code>fly.toml</code>.  With this in place, machines can confidently exit when idle, secure in the knowledge that they will be restarted when needed.  See the <a href='https://github.com/fly-apps/pdf-appliance/#scaling' title=''>README</a> for more information on scaling.</p>\n\n<p>Note that different machines can use different languages and frameworks.  This code is written in JavaScript and runs on Bun.  It was designed to support a Ruby on Rails app, but can be used with any app.</p>\n<h2 id='a-reusable-pattern' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#a-reusable-pattern' aria-label='Anchor'></a><span class='plain-code'>A Reusable Pattern</span></h2>\n<p>If your app is small and your usage is low, scaling may not be much of a\nconcern, but as your need grow your first instinct shouldn’t merely be to throw\nmore hardware at the problem, but rather to partition the problem so that each\nmachine has a somewhat predictable capacity.</p>\n\n<p>Do this by taking a look at your application, and look for requests that are\nsomehow different than the rest.  Streaming audio and video files, handling websockets,\nconverting text to speech or performing other AI processing, long running\n“background” computation, fetching static pages, producing PDFs, and updating\ndatabases all have different profiles in terms of server load.</p>\n\n<p>It might even be helpful – purely as a thought experiment – to think of\nreplacing your main server with a proxy that does nothing more than route\nrequests to separate machines based on the type of workload performed.</p>\n\n<p>Once you have come up with an allocation of functions performed to pools of\nmachines, Fly-Replay is but one tool available to you.  There is also a\n<a href='https://fly.io/docs/machines/working-with-machines/' title=''>Machines API</a> that will\nenable you to orchestrate whatever topology you can come up with.\n<a href='https://fly.io/laravel-bytes/cost-effective-queue-workers-with-fly-io-machines/' title=''>Cost-Effective Queue Workers With Fly.io\nMachines</a>\ngives a preview of what that would look like with Laravel.</p>",
      "image": {
        "url": "https://fly.io/blog/print-on-demand/assets/print-on-demand-thumb.webp",
        "title": null
      },
      "media": [],
      "authors": [
        {
          "name": "Fly",
          "email": null,
          "url": null
        }
      ],
      "categories": []
    }
  ]
}

Analyze Another View with RSS.Style