Feed fetched in 126 ms.
Content type is text/xml.
Feed is 905,469 characters long.
Feed has an ETag of W/"67ec39be-dd651".
Feed has a last modified date of Tue, 01 Apr 2025 19:08:46 GMT.
Warning This feed does not have a stylesheet.
This appears to be an Atom feed.
Feed title: The Fly Blog
Error Feed self link does not match feed URL: https://fly.io/blog/.
Feed has 40 items.
First item published on 2025-03-27T00:00:00.000Z
Last item published on 2023-07-05T00:00:00.000Z
Home page URL: https://fly.io/blog/
Error Home page does not have any feed discovery link in the <head>.
Home page has a link to the feed in the <body>
<?xml version="1.0" encoding="UTF-8"?> <feed xmlns="http://www.w3.org/2005/Atom" xmlns:media="http://search.yahoo.com/mrss/"> <title>The Fly Blog</title> <subtitle>News, tips, and tricks from the team at Fly</subtitle> <id>https://fly.io/blog/</id> <link href="https://fly.io/blog/"/> <link href="https://fly.io/blog/" rel="self"/> <updated>2025-03-27T00:00:00+00:00</updated> <author> <name>Fly</name> </author> <entry> <title>Operationalizing Macaroons</title> <link rel="alternate" href="https://fly.io/blog/operationalizing-macaroons/"/> <id>https://fly.io/blog/operationalizing-macaroons/</id> <published>2025-03-27T00:00:00+00:00</published> <updated>2025-04-01T19:05:33+00:00</updated> <media:thumbnail url="https://fly.io/blog/operationalizing-macaroons/assets/occult-circle.jpg"/> <content type="html"><div class="lead"><p>We’re Fly.io, a security bearer token company with a public cloud problem. You can read more about what our platform does (Docker container goes in, virtual machine in Singapore comes out), but this is just an engineering deep-dive into how we make our security tokens work. It’s a tokens nerd post.</p> </div><h2 id='1' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#1' aria-label='Anchor'></a><span class='plain-code'>1</span></h2> <p>We’ve spent <a href='https://fly.io/blog/api-tokens-a-tedious-survey/' title=''>too much time</a> talking about <a href='https://fly.io/blog/tokenized-tokens/' title=''>security tokens</a>, and about <a href='https://fly.io/blog/macaroons-escalated-quickly/' title=''>Macaroon tokens</a> <a href='https://github.com/superfly/macaroon/blob/main/macaroon-thought.md' title=''>in particular</a>. Writing another Macaroon treatise was not on my calendar. 
But we’re in the process of handing off our internal Macaroon project to a new internal owner, and in the process of truing up our operations manuals for these systems, I found myself in the position of writing a giant post about them. So, why not share?</p> <div class="callout"><p>Can I sum up Macaroons in a short paragraph? Macaroon tokens are bearer tokens (like JWTs) that use a cute chained-HMAC construction that allows an end-user to take any existing token they have and scope it down, all on their own. You can minimize your token before every API operation so that you’re only ever transmitting the least amount of privilege needed for what you’re actually doing, even if the token you were issued was an admin token. And they have a user-serviceable plug-in interface! <a href="https://fly.io/blog/macaroons-escalated-quickly/" title="">You’ll have to read the earlier post to learn more about that</a>.</p> </div><div class="right-sidenote"><p>Yes, probably, we are.</p> </div> <p>A couple years in to being the Internet’s largest user of Macaroons, I can report (as many predicted) that for our users, the cool things about Macaroons are a mixed bag in practice. It’s very neat that users can edit their own tokens, or even email them to partners without worrying too much. But users don’t really take advantage of token features.</p> <p>But I’m still happy we did this, because Macaroon quirks have given us a bunch of unexpected wins in our infrastructure. Our internal token system has turned out to be one of the nicer parts of our platform. Here’s why.</p> <p><img alt="This should clear everything up." 
src="/blog/operationalizing-macaroons/assets/schematic-diagram.png" /></p> <h2 id='2' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#2' aria-label='Anchor'></a><span class='plain-code'>2</span></h2> <p>As an operator, the most important thing to know about Macaroons is that they’re online-stateful; you need a database somewhere. A Macaroon token starts with a random field (a nonce) and the first thing you do when verifying a token is to look that nonce up in a database. So one of the most important details of a Macaroon implementation is where that database lives.</p> <p>I can tell you one place we’re not OK with it living: in our primary API cluster.</p> <p>There’s several reasons for that. Some of them are about scalability and reliability: far and away the most common failure mode of an outage on our platform is “deploys are broken”, and those failures are usually caused by API instability. It would not be OK if “deploys are broken” transitively meant “deployed apps can’t use security tokens”. But the biggest reason is security: root secrets for Macaroon tokens are hazmat, and a basic rule of thumb in secure design is: keep hazmat away from complicated code.</p> <p>So we created a deliberately simple system to manage token data. It’s called <code>tkdb</code>.</p> <div class="right-sidenote"><p>LiteFS: primary/replica distributed SQLite; Litestream: PITR SQLite replication to object storage; both work with unmodified SQLite libraries.</p> </div> <p><code>tkdb</code> is about 5000 lines of Go code that manages a SQLite database that is in turn managed by <a href='https://fly.io/docs/litefs/' title=''>LiteFS</a> and <a href='https://litestream.io/' title=''>Litestream</a>. 
It runs on isolated hardware (in the US, Europe, and Australia) and records in the database are encrypted with an injected secret. LiteFS gives us subsecond replication from our US primary to EU and AU, allows us to shift the primary to a different region, and gives us point-in-time recovery of the database.</p> <p>We’ve been running Macaroons for a couple years now, and the entire <code>tkdb</code> database is just a couple dozen megs large. Most of that data isn’t real. A full PITR recovery of the database takes just seconds. We use SQLite for a lot of our infrastructure, and this is one of the very few well-behaved databases we have.</p> <p>That’s in large part a consequence of the design of Macaroons. There’s actually not much for us to store! The most complicated possible Macaroon still chains up to a single root key (we generate a key per Fly.io “organization”; you don&rsquo;t share keys with your neighbors), and everything that complicates that Macaroon happens “offline”. We take advantage of &ldquo;attenuation&rdquo; far more than our users do.</p> <p>The result is that database writes are relatively rare and very simple: we just need to record an HMAC key when Fly.io organizations are created (that is, roughly, when people sign up for the service and actually do a deploy). That, and revocation lists (more on that later), which make up most of the data.</p> <h2 id='3' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#3' aria-label='Anchor'></a><span class='plain-code'>3</span></h2> <p>Talking to <code>tkdb</code> from the rest of our platform is complicated, for historical reasons.</p> <div class="right-sidenote"><p>NATS is fine, we just don’t really need it.</p> </div> <p>Ben Toews is responsible for most of the good things about this implementation. 
When he inherited the v0 Macaroons code from me, we were in the middle of a weird love affair with <a href='https://nats.io/' title=''>NATS</a>, the messaging system. So <code>tkdb</code> exported an RPC API over NATS messages.</p> <p>Our product security team can’t trust NATS (it’s not our code). That means a vulnerability in NATS can’t result in us losing control of all our tokens, or allow attackers to spoof authentication. Which in turn means you can’t run a plaintext RPC protocol for <code>tkdb</code> over NATS; attackers would just spoof “yes this token is fine” messages.</p> <div class="right-sidenote"><p>I highly recommend implementing Noise; <a href="http://www.noiseprotocol.org/noise.html" title="">the spec</a> is kind of a joy in a way you can’t appreciate until you use it, and it’s educational.</p> </div> <p>But you can’t just run TLS over NATS; NATS is a message bus, not a streaming secure channel. So I did the hipster thing and implemented <a href='http://www.noiseprotocol.org/noise.html' title=''>Noise</a>. We export a “verification” API, and a “signing” API for minting new tokens. Verification uses <code>Noise_IK</code> (which works like normal TLS) — anybody can verify, but everyone needs to prove they’re talking to the real <code>tkdb</code>. Signing uses <code>Noise_KK</code> (which works like mTLS) — only a few components in our system can mint tokens, and they get a special client key.</p> <p>A little over a year ago, <a href='https://fly.io/blog/the-exit-interview-jp/' title=''>JP</a> led an effort to replace NATS with HTTP, which is how you talk to <code>tkdb</code> today. Out of laziness, we kept the Noise stuff, which means the interface to <code>tkdb</code> is now HTTP/Noise. This is a design smell, but the security model is nice: across many thousands of machines, there are only a handful with the cryptographic material needed to mint a new Macaroon token. 
Neat!</p> <p><code>tkdb</code> is a Fly App (albeit deployed in special Fly-only isolated regions). Our infrastructure talks to it over “<a href='https://fly.io/docs/networking/flycast/' title=''>FlyCast</a>”, which is our internal Anycast service. If you’re in Singapore, you’ll probably get routed to the Australian <code>tkdb</code>. If Australia falls over, you’ll get routed to the closest backup. The proxy that implements FlyCast is smart, as is the <code>tkdb</code> client library, which will do exponential backoff retry transparently.</p> <p>Even with all that, we don’t like that Macaroon token verification is &ldquo;online&rdquo;. When you operate a global public cloud, one of the first things you learn is that <a href='https://fly.io/blog/the-5-hour-content-delivery-network/' title=''>the global Internet sucks</a>. Connectivity breaks all the time, and we’re paranoid about it. It’s painful for us that token verification can imply transoceanic links. Lovecraft was right about the oceans! Stay away!</p> <p>Our solution to this is caching. Macaroons, as it turns out, cache beautifully. That’s because once you’ve seen and verified a Macaroon, you have enough information to verify any more-specific Macaroon that descends from it; that’s a property of <a href='https://research.google/pubs/macaroons-cookies-with-contextual-caveats-for-decentralized-authorization-in-the-cloud/' title=''>their chaining HMAC construction</a>. Our client libraries cache verifications, and the cache hit ratio for verification is over 98%.</p> <h2 id='4' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#4' aria-label='Anchor'></a><span class='plain-code'>4</span></h2> <p><a href='http://cryto.net/~joepie91/blog/2016/06/13/stop-using-jwt-for-sessions/' title=''>Revocation isn’t a corner case</a>. 
It can’t be an afterthought. We’re potentially revoking tokens any time a user logs out. If that doesn’t work reliably, you wind up with “cosmetic logout”, which is a real vulnerability. When we kill a token, it needs to stay dead.</p> <p>Our revocation system is simple. It’s this table:</p> <div class="highlight-wrapper group relative "> <button type="button" class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-wrap-target="#code-13jllwee" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"><g buffered-rendering="static"><path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /><path d="M11.081 6.466L9.533 8.037l1.548 1.571" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"> Wrap text </span> </button> <button type="button" class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-copy-target="sibling" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"><g buffered-rendering="static"><path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /><path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 
0-1.815.819-1.816 1.815v1.617" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"> Copy to clipboard </span> </button> <div class='highlight relative group'> <pre class='highlight '><code id="code-13jllwee"> CREATE TABLE IF NOT EXISTS blacklist ( nonce BLOB NOT NULL UNIQUE, required_until DATETIME, created_at DATETIME DEFAULT CURRENT_TIMESTAMP ); </code></pre> </div> </div> <p>When we need a token to be dead, we have our primary API do a call to the <code>tkdb</code> “signing” RPC service for <code>revoke</code>. <code>revoke</code> takes the random nonce from the beginning of the Macaroon, discarding the rest, and adds it to the blacklist. Every Macaroon in the lineage of that nonce is now dead; we check the blacklist before verifying tokens.</p> <p>The obvious challenge here is caching; over 98% of our validation requests never hit <code>tkdb</code>. We certainly don’t want to propagate the blacklist database to 35 regions around the globe.</p> <p>Instead, the <code>tkdb</code> “verification” API exports an endpoint that provides a feed of revocation notifications. Our client library “subscribes” to this API (really, it just polls). Macaroons are revoked regularly (but not constantly), and when that happens, clients notice and prune their caches.</p> <p>If clients lose connectivity to <code>tkdb</code>, past some threshold interval, they just dump their entire cache, forcing verification to happen at <code>tkdb</code>.</p> <h2 id='5' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#5' aria-label='Anchor'></a><span class='plain-code'>5</span></h2> <p>A place where we’ve gotten a lot of internal mileage out of Macaroon features is service tokens. 
Service tokens are tokens used by code, rather than humans; almost always, a service token is something that is stored alongside running application code.</p> <p>An important detail of Fly.io’s Macaroons is the distinction between a “permissions” token and an “authentication” token. Macaroons by themselves express authorization, not authentication.</p> <p>That’s a useful constraint, and we want to honor it. By requiring a separate token for authentication, we minimize the impact of having the permissions token stolen; you can’t use it without authentication, so really it’s just like a mobile signed IAM policy expression. Neat!</p> <p>The way we express authentication is with a third-party caveat (<a href='https://fly.io/blog/macaroons-escalated-quickly/#4' title=''>see the old post for details</a>). Your main Fly.io Macaroon will have a caveat saying “this token is only valid if accompanied by the discharge token for a user in your organization from our authentication system”. Our authentication system does the login dance and issues those discharges.</p> <p>This is exactly what you want for user tokens and not at all what you want for a service token: we don’t want running code to store those authenticator tokens, because they’re hazardous.</p> <p>The solution we came up with for service tokens is simple: <code>tkdb</code> exports an API that uses its access to token secrets to strip off the third-party authentication caveat. To call into that API, you have to present a valid discharging authentication token; that is, you have to prove you could already have done whatever the token said. <code>tkdb</code> returns a new token with all the previous caveats, minus the expiration (you don’t usually want service tokens to expire).</p> <p>OK, so we’ve managed to transform a tuple <code>(unscary-token, scary-token)</code> into the new tuple <code>(scary-token)</code>. Not so impressive. 
But hold on: the recipient of <code>scary-token</code> can attenuate it further: we can lock it to a particular instance of <code>flyd</code>, or to a particular Fly Machine. Which means exfiltrating it doesn’t do you any good; to use it, you have to control the environment it’s intended to be used in.</p> <p>The net result of this scheme is that a compromised physical host will only give you access to tokens that have been used on that worker, which is a very nice property. Another way to look at it: every token used in production is traceable in some way to a valid token a user submitted. Neat!</p> <div class="right-sidenote"><p>All the cool spooky secret store names were taken.</p> </div> <p>We do a similar dance with Pet Semetary, our internal Vault replacement. Petsem manages user secrets for applications, such as Postgres connection strings. Petsem is its own Macaroon authority (it issues its own Macaroons with its own permissions system), and to do something with a secret, you need one of those Petsem-minted Macaroons.</p> <p>Our primary API servers field requests from users to set secrets for their apps. So the API has a Macaroon that allows secrets writes. But it doesn&rsquo;t allow reads: there’s no API call to dump your secrets, because our API servers don’t have that privilege. So far, so good.</p> <p>But when we boot up a Fly Machine, we need to inject the appropriate user secrets into it at boot; <em>something</em> needs a Macaroon that can read secrets. That “something” is <code>flyd</code>, our orchestrator, which runs on every worker server in our fleet.</p> <p>Clearly, we can’t give every <code>flyd</code> a Macaroon that reads every user’s secret. 
Most users will never deploy anything on any given worker, and we can’t have a security model that collapses down to “every worker is equally privileged”.</p> <p>Instead, the “read secret” Macaroon that <code>flyd</code> gets has a third-party caveat attached to it, which is dischargeable only by talking to <code>tkdb</code> and proving (with normal Macaroon tokens) that you have permissions for the org whose secrets you want to read. Once again, access is traceable to an end-user action, and minimized across our fleet. Neat!</p> <h2 id='6' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#6' aria-label='Anchor'></a><span class='plain-code'>6</span></h2> <p>Our token systems have some of the best telemetry in the whole platform.</p> <p>Most of that is down to <a href='http://opentelemetry.io/' title=''>OpenTelemetry</a> and <a href='https://www.honeycomb.io/' title=''>Honeycomb</a>. From the moment a request hits our API server through the moment <code>tkdb</code> responds to it, oTel <a href='https://opentelemetry.io/docs/concepts/context-propagation/' title=''>context propagation</a> gives us a single narrative about what’s happening.</p> <p><a href='https://fly.io/blog/the-exit-interview-jp/' title=''>I was a skeptic about oTel</a>. It’s really, really expensive. And, not to put too fine a point on it, oTel really cruds up our code. Once, I was an “80% of the value of tracing, we can get from logs and metrics” person. But I was wrong.</p> <p>Errors in our token system are rare. Usually, they’re just early indications of network instability, and between caching and FlyCast, we mostly don’t have to care about those alerts. When we do, it’s because something has gone so sideways that we’d have to care anyways. 
The <code>tkdb</code> code is remarkably stable and there hasn’t been an incident intervention with our token system in over a year.</p> <p>Past oTel, and the standard logging and Prometheus metrics every Fly App gets for free, we also have a complete audit trail for token operations, in a permanently retained OpenSearch cluster index. Since virtually all the operations that happen on our platform are mediated by Macaroons, this audit trail is itself pretty powerful.</p> <h2 id='7' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#7' aria-label='Anchor'></a><span class='plain-code'>7</span></h2> <p>So, that&rsquo;s pretty much it. The moral of the story for us is, Macaroons have a lot of neat features, our users mostly don&rsquo;t care about them — that may even be a good thing — but we get a lot of use out of them internally.</p> <p>As an engineering culture, we&rsquo;re allergic to &ldquo;microservices&rdquo;, and we flinched a bit at the prospect of adding a specific service just to manage tokens. But it&rsquo;s pulled its weight, and not added really any drama at all. We have at this point a second dedicated security service (Petsem), and even though they sort of rhyme with each other, we&rsquo;ve got no plans to merge them. <a href='https://how.complexsystems.fail/#10' title=''>Rule #10</a> and all that.</p> <p>Oh, and a total victory for LiteFS, Litestream, and infrastructure SQLite. Which, after managing an infrastructure SQLite project that routinely ballooned to tens of gigabytes and occasionally threatened service outages, is lovely to see.</p> <p>Macaroons! If you&rsquo;d asked us a year ago, we&rsquo;d have said the jury was still out on whether they were a good move. But then Ben Toews spent a year making them awesome, and so they are. 
<a href='https://github.com/superfly/macaroon' title=''>Most of the code is open source</a>!</p></content> </entry> <entry> <title>Taming A Voracious Rust Proxy</title> <link rel="alternate" href="https://fly.io/blog/taming-rust-proxy/"/> <id>https://fly.io/blog/taming-rust-proxy/</id> <published>2025-02-26T00:00:00+00:00</published> <updated>2025-03-10T19:59:35+00:00</updated> <media:thumbnail url="https://fly.io/blog/taming-rust-proxy/assets/happy-crab-cover.jpg"/> <content type="html"><div class="lead"><p>Here’s a fun bug.</p> </div> <p>The basic idea of our service is that we run containers for our users, as hardware-isolated virtual machines (Fly Machines), on hardware we own around the world. What makes that interesting is that we also connect every Fly Machine to a global Anycast network. If your app is running in Hong Kong and Dallas, and a request for it arrives in Singapore, we&rsquo;ll route it to <code>HKG</code>.</p> <p>Our own hardware fleet is roughly divided into two kinds of servers: edges, which receive incoming requests from the Internet, and workers, which run Fly Machines. Edges exist almost solely to run a Rust program called <code>fly-proxy</code>, the router at the heart of our Anycast network.</p> <p>So: a week or so ago, we flag an incident. Lots of things generate incidents: synthetic monitoring failures, metric thresholds, health check failures. In this case two edge tripwires tripped: elevated <code>fly-proxy</code> HTTP errors, and skyrocketing CPU utilization, on a couple hosts in <code>IAD</code>.</p> <p>Our incident process is pretty ironed out at this point. We created an incident channel (we ❤️ <a href='https://rootly.com/' title=''>Rootly</a> for this, <a href='https://rootly.com/' title=''>seriously check out Rootly</a>, an infra MVP here for years now), and incident responders quickly concluded that, while something hinky was definitely going on, the platform was fine. 
We have a lot of edges, and we&rsquo;ve also recently converted many of our edge servers to significantly beefier hardware.</p> <p>Bouncing <code>fly-proxy</code> clears the problem up on an affected proxy. But this wouldn&rsquo;t be much of an interesting story if the problem didn&rsquo;t later come back. So, for some number of hours, we&rsquo;re in an annoying steady-state of getting paged and bouncing proxies. </p> <p>While this is happening, Pavel, on our proxy team, pulls a profile from an angry proxy. <img alt="A flamegraph profile, described better in the prose anyways." src="/blog/taming-rust-proxy/assets/proxy-profile.jpg" /> So, this is fuckin&rsquo; weird: a huge chunk of the profile is dominated by Rust <code>tracing</code>&rsquo;s <code>Subscriber</code>. But that doesn&rsquo;t make sense. The entire point of Rust <code>tracing</code>, which generates fine-grained span records for program activity, is that <code>entering</code> and <code>exiting</code> a span is very, very fast. </p> <p>If the mere act of <code>entering</code> a span in a Tokio stack is chewing up a significant amount of CPU, something has gone haywire: the actual code being traced must be doing next to nothing.</p> <h2 id='a-quick-refresher-on-async-rust' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#a-quick-refresher-on-async-rust' aria-label='Anchor'></a><span class='plain-code'>A Quick Refresher On Async Rust</span></h2> <p>So in Rust, like a lot of <code>async/await</code> languages, you&rsquo;ve got <code>Futures</code>. A <code>Future</code> is a type that represents the future value of an asynchronous computation, like reading from a socket. 
<code>Futures</code> are state machines, and they&rsquo;re lazy: they expose one basic operation, <code>poll</code>, which an executor (like Tokio) calls to advance the state machine. That <code>poll</code> returns whether the <code>Future</code> is still <code>Pending</code>, or <code>Ready</code> with a result.</p> <p>In theory, you could build an executor that drove a bunch of <code>Futures</code> just by storing them in a list and busypolling each of them, round robin, until they return <code>Ready</code>. This executor would defeat much of the purpose of asynchronous programming, so no real executor works that way.</p> <p>Instead, a runtime like Tokio integrates <code>Futures</code> with an event loop (on <a href='https://man7.org/linux/man-pages/man7/epoll.7.html' title=''>epoll</a> or <a href='https://en.wikipedia.org/wiki/Kqueue' title=''>kqueue</a>) and, when calling <code>poll</code>, passes a <code>Waker</code>. The <code>Waker</code> is an abstract handle that allows the <code>Future</code> to instruct the Tokio runtime to call <code>poll</code> again, because something has happened.</p> <p>To complicate things: an ordinary <code>Future</code> is a one-shot value. Once it&rsquo;s <code>Ready</code>, it can&rsquo;t be <code>polled</code> anymore. But with network programming, that&rsquo;s usually not what you want: data usually arrives in streams, which you want to track and make progress on as you can. So async Rust provides <code>AsyncRead</code> and <code>AsyncWrite</code> traits, which build on <code>Futures</code>, and provide methods like <code>poll_read</code> that return <code>Ready</code> <em>every time</em> there&rsquo;s data ready. </p> <p>So far so good? OK. Now, there are two footguns in this design. 
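Before getting to them, it may help to see the polling model in miniature. The sketch below is a toy in Python, not Tokio and not <code>fly-proxy</code> code: <code>ReadyAfter</code>, <code>run_busy</code>, and <code>run_with_waker</code> are hypothetical names made up for illustration, and the event loop is simulated. All it does is contrast how often a busy-polling executor and a waker-driven one end up calling <code>poll</code>.</p>

```python
from collections import deque

PENDING = object()  # stand-in for Rust's Poll::Pending


class ReadyAfter:
    """Toy future that becomes ready once `events` simulated I/O events have fired."""

    def __init__(self, events, value):
        self.remaining = events
        self.value = value
        self.waker = None
        self.polls = 0  # how many times an executor called poll()

    def poll(self, waker):
        self.polls += 1
        if self.remaining == 0:
            return self.value      # Ready(value)
        self.waker = waker         # remember how to request another poll
        return PENDING

    def event(self):
        """Simulate the event loop reporting activity for this future."""
        self.remaining -= 1
        if self.remaining == 0 and self.waker is not None:
            self.waker()


def run_busy(fut, turns_per_event=100):
    """Busy-poll: call poll() on every loop turn, whether or not anything happened."""
    turns = 0
    while True:
        result = fut.poll(None)
        if result is not PENDING:
            return result
        turns += 1
        if turns % turns_per_event == 0:
            fut.event()            # I/O only shows up occasionally


def run_with_waker(fut):
    """Waker-driven: poll only futures that asked to be polled."""
    runnable = deque([fut])
    while True:
        if runnable:
            task = runnable.popleft()
            result = task.poll(lambda: runnable.append(task))
            if result is not PENDING:
                return result
        else:
            fut.event()            # nothing runnable: wait for the next event


busy = ReadyAfter(events=3, value="data")
woken = ReadyAfter(events=3, value="data")
print(run_busy(busy), busy.polls)
print(run_with_waker(woken), woken.polls)
```

<p>Under these assumptions, <code>run_busy</code> polls a future that needs three events 301 times, while <code>run_with_waker</code> polls it twice: once to register the waker and once after it fires.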
</p> <p>The first footgun is that a <code>poll</code> of a <code>Future</code> that isn&rsquo;t <code>Ready</code> wastes cycles, and, if you have a bug in your code and that <code>Pending</code> poll happens to trip a <code>Waker</code>, you&rsquo;ll slip into an infinite loop. That&rsquo;s easy to see.</p> <p>The second and more insidious footgun is that an <code>AsyncRead</code> can <code>poll_read</code> to a <code>Ready</code> that doesn&rsquo;t actually progress its underlying state machine. Since the idea of <code>AsyncRead</code> is that you keep <code>poll_reading</code> until it stops being <code>Ready</code>, this too is an infinite loop.</p> <p>When we look at our profiles, what we see are samples that almost terminate in libc, but spend next to no time in the kernel doing actual I/O. The obvious explanation: we&rsquo;ve entered lots of <code>poll</code> functions, but they&rsquo;re doing almost nothing and returning immediately.</p> <h2 id='jaccuse' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#jaccuse' aria-label='Anchor'></a><span class='plain-code'>J&#39;accuse!</span></h2> <p>Wakeup issues are annoying to debug. 
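Part of the reason is that nothing actually fails. As a toy model of the second footgun above (hypothetical Python with made-up names <code>Reader</code>, <code>StuckReader</code>, and <code>pump</code>; this is not the <code>fly-proxy</code> or Rustls code, and it does not model real <code>AsyncRead</code> semantics such as EOF), a reader that reports <code>Ready</code> without advancing simply makes its caller spin:</p>

```python
READY, PENDING = "ready", "pending"


class Reader:
    """Toy stand-in for an AsyncRead: hands out queued chunks, then goes Pending."""

    def __init__(self, chunks):
        self.chunks = list(chunks)

    def poll_read(self, out):
        if not self.chunks:
            return PENDING
        out.append(self.chunks.pop(0))  # real progress: one chunk delivered
        return READY


class StuckReader(Reader):
    """Hypothetical buggy reader: once drained, it says Ready but delivers nothing."""

    def poll_read(self, out):
        if self.chunks:
            return super().poll_read(out)
        return READY  # no data appended, no state advanced


def pump(reader, max_polls):
    """Keep reading while the reader is Ready.

    max_polls exists only so this demo terminates; the loop described in the
    text has no such cap.
    """
    out, polls = [], 0
    while polls != max_polls:
        polls += 1
        if reader.poll_read(out) == PENDING:
            break
    return out, polls


# The well-behaved reader stops after 3 polls; the stuck one burns the whole budget.
print(pump(Reader([b"a", b"b"]), 1000))
print(pump(StuckReader([b"a", b"b"]), 1000))
```

<p>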
But the flamegraph gives us the fully qualified type of the <code>Future</code> we&rsquo;re polling:</p> <div class="highlight-wrapper group relative rust"> <button type="button" class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-wrap-target="#code-hfleqvh4" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"><g buffered-rendering="static"><path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /><path d="M11.081 6.466L9.533 8.037l1.548 1.571" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"> Wrap text </span> </button> <button type="button" class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-copy-target="sibling" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"><g buffered-rendering="static"><path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /><path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"> Copy to clipboard </span> </button> <div 
class='highlight relative group'> <pre class='highlight '><code id="code-hfleqvh4"><span class="o">&amp;</span><span class="k">mut</span> <span class="nn">fp_io</span><span class="p">::</span><span class="nn">copy</span><span class="p">::</span><span class="n">Duplex</span><span class="o">&lt;&amp;</span><span class="k">mut</span> <span class="nn">fp_io</span><span class="p">::</span><span class="nn">reusable_reader</span><span class="p">::</span><span class="n">ReusableReader</span><span class="o">&lt;</span><span class="nn">fp_tcp</span><span class="p">::</span><span class="nn">peek</span><span class="p">::</span><span class="n">PeekableReader</span><span class="o">&lt;</span><span class="nn">tokio_rustls</span><span class="p">::</span><span class="nn">server</span><span class="p">::</span><span class="n">TlsStream</span><span class="o">&lt;</span><span class="nn">fp_tcp_metered</span><span class="p">::</span><span class="n">MeteredIo</span><span class="o">&lt;</span><span class="nn">fp_tcp</span><span class="p">::</span><span class="nn">peek</span><span class="p">::</span><span class="n">PeekableReader</span><span class="o">&lt;</span><span class="nn">fp_tcp</span><span class="p">::</span><span class="nn">permitted</span><span class="p">::</span><span class="n">PermittedTcpStream</span><span class="o">&gt;&gt;&gt;&gt;&gt;</span><span class="p">,</span> <span class="nn">connect</span><span class="p">::</span><span class="nn">conn</span><span class="p">::</span><span class="n">Conn</span><span class="o">&lt;</span><span class="nn">tokio</span><span class="p">::</span><span class="nn">net</span><span class="p">::</span><span class="nn">tcp</span><span class="p">::</span><span class="nn">stream</span><span class="p">::</span><span class="n">TcpStream</span><span class="o">&gt;</span> </code></pre> </div> </div> <p>This looks like a lot, but much of it is just wrapper types we wrote ourselves, and those wrappers don&rsquo;t do anything interesting. 
What&rsquo;s left to audit:</p> <ul> <li><code>Duplex</code>, the outermost type, one of ours, <em>and</em> </li><li><code>TlsStream</code>, from <a href='https://github.com/rustls/rustls' title=''>Rustls</a>. </li></ul> <p><code>Duplex</code> is a beast. It&rsquo;s the core I/O state machine for proxying between connections. It&rsquo;s not easy to reason about in detail. But: it also doesn&rsquo;t do anything directly with a <code>Waker</code>; it&rsquo;s built around <code>AsyncRead</code> and <code>AsyncWrite</code>. It hasn&rsquo;t changed recently and we can&rsquo;t trigger misbehavior in it.</p> <p>That leaves <code>TlsStream</code>. <code>TlsStream</code> is an ultra-important, load-bearing type in the Rust ecosystem. Everybody uses it. Could it harbor an async Rust footgun? Turns out, it did!</p> <p>Unlike our <code>Duplex</code>, Rustls actually does have to get intimate with the underlying async executor. And, looking through the repository, Pavel uncovers <a href='https://github.com/rustls/tokio-rustls/issues/72' title=''>this issue</a>: sometimes, <code>TlsStreams</code> in Rustls just spin out. And it turns out, what&rsquo;s causing this is a TLS state machine bug: when a TLS session is orderly-closed, with a <code>CloseNotify</code> <code>Alert</code> record, the sender of that record has informed its counterparty that no further data will be sent. 
But if there&rsquo;s still buffered data on the underlying connection, <code>TlsStream</code> mishandles its <code>Waker</code>, and we fall into a busy-loop.</p> <p><a href='https://github.com/rustls/rustls/pull/1950/files' title=''>Pretty straightforward fix</a>!</p> <h2 id='what-actually-happened-to-us' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#what-actually-happened-to-us' aria-label='Anchor'></a><span class='plain-code'>What Actually Happened To Us</span></h2> <p>Our partners in object storage, <a href='https://www.tigrisdata.com/' title=''>Tigris Data</a>, were conducting some kind of load testing exercise. Some aspect of their testing system triggered the <code>TlsStream</code> state machine bug, which locked up one or more <code>TlsStreams</code> in the edge proxy handling whatever corner-casey stream they were sending.</p> <p>Tigris wasn&rsquo;t generating a whole lot of traffic; tens of thousands of connections, tops. But all of them sent small HTTP bodies and then terminated early. We figured some of those connections errored out, and set up the &ldquo;TLS CloseNotify happened before EOF&rdquo; scenario. </p> <p>To be truer to the chronology, we knew pretty early on in our investigation that something Tigris was doing with their load testing was probably triggering the bug, and we got them to stop. After we worked it out, and Pavel deployed the fix, we told them to resume testing. 
No spin-outs.</p> <h2 id='lessons-learned' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#lessons-learned' aria-label='Anchor'></a><span class='plain-code'>Lessons Learned</span></h2> <p>Keep your dependencies updated. Unless you shouldn&rsquo;t keep your dependencies updated. I mean, if there&rsquo;s a vulnerability (and, technically, this was a DoS vulnerability), always update. And if there&rsquo;s an important bugfix, update. But if there isn&rsquo;t an important bugfix, updating for the hell of it might also destabilize your project? So update maybe? Most of the time?</p> <p>Really, the truth of this is that keeping track of <em>what needs to be updated</em> is valuable work. The updates themselves are pretty fast and simple, but the process and testing infrastructure to confidently metabolize dependency updates is not. </p> <p>Our other lesson here is that there&rsquo;s an opportunity to spot these kinds of bugs more directly with our instrumentation. Spurious wakeups should be easy to spot, and triggering a metric when they happen should be cheap, because they&rsquo;re not supposed to happen often. So that&rsquo;s something we&rsquo;ll go do now.</p></content> </entry> <entry> <title>We Were Wrong About GPUs</title> <link rel="alternate" href="https://fly.io/blog/wrong-about-gpu/"/> <id>https://fly.io/blog/wrong-about-gpu/</id> <published>2025-02-14T00:00:00+00:00</published> <updated>2025-02-17T10:54:41+00:00</updated> <media:thumbnail url="https://fly.io/blog/wrong-about-gpu/assets/choices-choices-cover.webp"/> <content type="html"><div class="lead"><p>We’re building a public cloud, on hardware we own. We raised money to do that, and to place some bets; one of them: GPU-enabling our customers. 
A progress report: GPUs aren’t going anywhere, but: GPUs aren’t going anywhere.</p> </div> <p>A couple years back, <a href="https://fly.io/gpu">we put a bunch of chips down</a> on the bet that people shipping apps to users on the Internet would want GPUs, so they could do AI/ML inference tasks. To make that happen, we created <a href="https://fly.io/docs/gpus/getting-started-gpus/">Fly GPU Machines</a>.</p> <p>A Fly Machine is a <a href="https://fly.io/blog/docker-without-docker/">Docker/OCI container</a> running inside a hardware-virtualized virtual machine somewhere on our global fleet of bare-metal worker servers. A GPU Machine is a Fly Machine with a hardware-mapped Nvidia GPU. It&rsquo;s a Fly Machine that can do fast CUDA.</p> <p>Like everybody else in our industry, we were right about the importance of AI/ML. If anything, we underestimated its importance. But the product we came up with probably doesn&rsquo;t fit the moment. It&rsquo;s a bet that doesn&rsquo;t feel like it&rsquo;s paying off.</p> <p><strong class='font-semibold text-navy-950'>If you&rsquo;re using Fly GPU Machines, don&rsquo;t freak out; we&rsquo;re not getting rid of them.</strong> But if you&rsquo;re waiting for us to do something bigger with them, a v2 of the product, you&rsquo;ll probably be waiting awhile.</p> <h3 id='what-it-took' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#what-it-took' aria-label='Anchor'></a><span class='plain-code'>What It Took</span></h3> <p>GPU Machines were not a small project for us. Fly Machines run on an idiosyncratically small hypervisor (normally Firecracker, but for GPU Machines <a href="https://github.com/cloud-hypervisor/cloud-hypervisor">Intel&rsquo;s Cloud Hypervisor</a>, a very similar Rust codebase that supports PCI passthrough). 
The Nvidia ecosystem is not geared to supporting micro-VM hypervisors.</p> <p>GPUs <a href="https://googleprojectzero.blogspot.com/2020/09/attacking-qualcomm-adreno-gpu.html">terrified our security team</a>. A GPU is just about the worst case hardware peripheral: intense multi-directional direct memory transfers</p> <div class="right-sidenote"><p>(not even bidirectional: in common configurations, GPUs talk to each other)</p> </div> <p>with arbitrary, end-user controlled computation, all operating outside our normal security boundary.</p> <p>We did a couple expensive things to mitigate the risk. We shipped GPUs on dedicated server hardware, so that GPU- and non-GPU workloads weren&rsquo;t mixed. Because of that, the only reason for a Fly Machine to be scheduled on a GPU machine was that it needed a PCI BDF for an Nvidia GPU, and there&rsquo;s a limited number of those available on any box. Those GPU servers were drastically less utilized and thus less cost-effective than our ordinary servers.</p> <p>We funded two very large security assessments, from <a href="https://www.atredis.com/">Atredis</a> and <a href="https://tetrelsec.com/">Tetrel</a>, to evaluate our GPU deployment. Matt Braun is writing up those assessments now. They were not cheap, and they took time.</p> <p>Security wasn&rsquo;t directly the biggest cost we had to deal with, but it was an indirect cause for a subtle reason.</p> <p>We could have shipped GPUs very quickly by doing what Nvidia recommended: standing up a standard K8s cluster to schedule GPU jobs on. Had we taken that path, and let our GPU users share a single Linux kernel, we&rsquo;d have been on Nvidia&rsquo;s driver happy-path.</p> <p>Alternatively, we could have used a conventional hypervisor. Nvidia suggested VMware (heh). But they could have gotten things working had we used QEMU. We like QEMU fine, and could have talked ourselves into a security story for it, but the whole point of Fly Machines is that they take milliseconds to start. 
We could not have offered our desired Developer Experience on the Nvidia happy-path.</p> <p>Instead, we burned months trying (and ultimately failing) to get Nvidia&rsquo;s host drivers working to map <a href="https://www.nvidia.com/en-us/data-center/virtual-solutions/">virtualized GPUs</a> into Intel Cloud Hypervisor. At one point, we hex-edited the closed-source drivers to trick them into thinking our hypervisor was QEMU.</p> <p>I&rsquo;m not sure any of this really mattered in the end. There&rsquo;s a segment of the market we weren&rsquo;t ever really able to explore because Nvidia&rsquo;s driver support kept us from thin-slicing GPUs. We&rsquo;d have been able to put together a really cheap offering for developers if we hadn&rsquo;t run up against that, and developers love &ldquo;cheap&rdquo;, but I can&rsquo;t prove that those customers are real.</p> <p>On the other hand, we&rsquo;re committed to delivering the Fly Machine DX for GPU workloads. Beyond the PCI/IOMMU drama, just getting an entire hardware GPU working in a Fly Machine was a lift. We needed Fly Machines that would come up with the right Nvidia drivers; our stack was built assuming that the customer&rsquo;s OCI container almost entirely defined the root filesystem for a Machine. We had to engineer around that in our <code>flyd</code> orchestrator. And almost everything people want to do with GPUs involves efficiently grabbing huge files full of model weights. Also annoying!</p> <p>And, of course, we bought GPUs. A lot of GPUs. Expensive GPUs.</p> <h3 id='why-it-isnt-working' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#why-it-isnt-working' aria-label='Anchor'></a><span class='plain-code'>Why It Isn&rsquo;t Working</span></h3> <p>The biggest problem: developers don&rsquo;t want GPUs. 
They don&rsquo;t even want AI/ML models. They want LLMs. <em>System engineers</em> may have smart, fussy opinions on how to get their models loaded with CUDA, and what the best GPU is. But <em>software developers</em> don&rsquo;t care about any of that. When a software developer shipping an app comes looking for a way for their app to deliver prompts to an LLM, you can&rsquo;t just give them a GPU.</p> <p>For those developers, who probably make up most of the market, it doesn&rsquo;t seem plausible for an insurgent public cloud to compete with OpenAI and Anthropic. Their APIs are fast enough, and developers thinking about performance in terms of &ldquo;tokens per second&rdquo; aren&rsquo;t counting milliseconds.</p> <div class="right-sidenote"><p>(you should all feel sympathy for us)</p> </div> <p>This makes us sad because we really like the point in the solution space we found. Developers shipping apps on Amazon will outsource to other public clouds to get cost-effective access to GPUs. But then they&rsquo;ll faceplant trying to handle data and model weights, backhauling gigabytes (at significant expense) from S3. We have app servers, GPUs, and object storage all under the same top-of-rack switch. But inference latency just doesn&rsquo;t seem to matter yet, so the market doesn&rsquo;t care.</p> <p>Past that, and just considering the system engineers who do care about GPUs rather than LLMs: the hardware product/market fit here is really rough.</p> <p>People doing serious AI work want galactically huge amounts of GPU compute. A whole enterprise A100 is a compromise position for them; they want an SXM cluster of H100s.</p> <div class="right-sidenote"><p>Near as we can tell, MIG gives you a UUID to talk to the host driver, not a PCI device.</p> </div> <p>We think there&rsquo;s probably a market for users doing lightweight ML work getting tiny GPUs. 
<a href="https://www.nvidia.com/en-us/technologies/multi-instance-gpu/">This is what Nvidia MIG does</a>, slicing a big GPU into arbitrarily small virtual GPUs. But for fully-virtualized workloads, it&rsquo;s not baked; we can&rsquo;t use it. And I&rsquo;m not sure how many of those customers there are, or whether we&rsquo;d get the density of customers per server that we need.</p> <p><a href="https://fly.io/blog/cutting-prices-for-l40s-gpus-in-half">That leaves the L40S customers</a>. There are a bunch of these! We dropped L40S prices last year, not because we were sour on GPUs but because they&rsquo;re the one part we have in our inventory people seem to get a lot of use out of. We&rsquo;re happy with them. But they&rsquo;re just another kind of compute that some apps need; they&rsquo;re not a driver of our core business. They&rsquo;re not the GPU bet paying off.</p> <p>Really, all of this is just a long way of saying that for most software developers, &ldquo;AI-enabling&rdquo; their app is best done with API calls to things like Claude and GPT, Replicate and RunPod.</p> <h3 id='what-did-we-learn' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#what-did-we-learn' aria-label='Anchor'></a><span class='plain-code'>What Did We Learn?</span></h3> <p>A very useful way to look at a startup is that it&rsquo;s a race to learn stuff. So, what&rsquo;s our report card?</p> <p>First off, when we embarked down this path in 2022, we were (like many other companies) operating in a sort of phlogiston era of AI/ML. The industry attention to AI had not yet collapsed around a small number of foundational LLM models. 
We expected there to be a diversity of <em>mainstream</em> models, the world <a href='https://github.com/elixir-nx/bumblebee' title=''>Elixir Bumblebee</a> looks forward to, where people pull different AI workloads off the shelf the same way they do Ruby gems.</p> <p>But <a href='https://www.cursor.com/' title=''>Cursor happened</a>, and, as they say, how are you going to keep &lsquo;em down on the farm once they&rsquo;ve seen Karl Hungus? It seems much clearer where things are heading.</p> <p>GPUs were a test of a Fly.io company credo: as we think about core features, we design for 10,000 developers, not for 5-6. It took a minute, but the credo wins here: GPU workloads for the 10,001st developer are a niche thing.</p> <p>Another way to look at a startup is as a series of bets. We put a lot of chips down here. But the buy-in for this tournament gave us a lot of chips to play with. Never making a big bet of any sort isn&rsquo;t a winning strategy. I&rsquo;d rather we&rsquo;d flopped the nut straight, but I think going in on this hand was the right call.</p> <p>A really important thing to keep in mind here, and something I think a lot of startup thinkers sleep on, is the extent to which this bet involved acquiring assets. Obviously, some of our <a href='https://fly.io/blog/the-exit-interview-jp/' title=''>costs here aren&rsquo;t recoverable</a>. But the hardware parts that aren&rsquo;t generating revenue will ultimately get liquidated; like with <a href='https://fly.io/blog/32-bit-real-estate/' title=''>our portfolio of IPv4 addresses</a>, I&rsquo;m even more comfortable making bets backed by tradable assets with durable value.</p> <p>In the end, I don&rsquo;t think GPU Fly Machines were going to be a hit for us no matter what we did. Because of that, one thing I&rsquo;m very happy about is that we didn&rsquo;t compromise the rest of the product for them. 
Security concerns slowed us down to where we probably learned what we needed to learn a couple months later than we could have otherwise, but we&rsquo;re scaling back our GPU ambitions without having sacrificed <a href='https://fly.io/blog/sandboxing-and-workload-isolation/' title=''>any of our isolation story</a>, and, ironically, GPUs <em>other people run</em> are making that story a lot more important. The same thing goes for our Fly Machine developer experience.</p> <p>We started this company building a Javascript runtime for edge computing. We learned that our customers didn&rsquo;t want a new Javascript runtime; they just wanted their native code to work. <a href='https://news.ycombinator.com/item?id=22616857' title=''>We shipped containers</a>, and no convincing was needed. We were wrong about Javascript edge functions, and I think we were wrong about GPUs. That&rsquo;s usually how we figure out the right answers: by being wrong about a lot of stuff.</p></content> </entry> <entry> <title>The Exit Interview: JP Phillips</title> <link rel="alternate" href="https://fly.io/blog/the-exit-interview-jp/"/> <id>https://fly.io/blog/the-exit-interview-jp/</id> <published>2025-02-12T00:00:00+00:00</published> <updated>2025-02-12T14:06:21+00:00</updated> <media:thumbnail url="https://fly.io/blog/the-exit-interview-jp/assets/bye-bye-little-sebastian-cover.webp"/> <content type="html"><div class="lead"><p>JP Phillips is off to greener, or at least calmer, pastures. He joined us 4 years ago to build the next generation of our orchestration system, and has been one of the anchors of our engineering team. His last day is today. We wanted to know what he was thinking, and figured you might too.</p> </div> <p><em>Question 1: Why, JP? Just why?</em></p> <p>LOL. When I looked at what I wanted to see from here in the next 3-4 years, it didn&rsquo;t really match up with where we&rsquo;re currently heading. 
Specifically, with our new focus on MPG <em>[Managed Postgres]</em> and [llm] <em>[llm].</em></p> <div class="callout"><p>Editorial comment: Even I don’t know what [llm] is.</p> </div> <p>The Fly Machines platform is more or less finished, in the sense of being capable of supporting the next iteration of our products. My original desire to join Fly.io was to make Machines a product that would <a href='https://fly.io/blog/carving-the-scheduler-out-of-our-orchestrator/' title=''>rid us of HashiCorp Nomad</a>, and I feel like that&rsquo;s been accomplished.</p> <p><em>Where were you hoping to see us headed?</em></p> <p>More directly positioned as a cloud provider, rather than a platform-as-a-service; further along the customer journey from &ldquo;developers&rdquo; and &ldquo;startups&rdquo; to large established companies.</p> <p>And, it&rsquo;s not that I disagree with PAAS work or MPG! Rather, it&rsquo;s not something that excites me in a way that I&rsquo;d feel challenged and could continue to grow technically.</p> <p><em>Follow up question: does your family know what you’re doing here? Doing to us? Are they OK with it?</em></p> <p>Yes, my family was very involved in the decision, before I even talked to other companies.</p> <p><em>What&rsquo;s the thing you&rsquo;re happiest about having built here? It cannot be &ldquo;all of <code>flyd</code>&rdquo;.</em></p> <p>We&rsquo;ve enabled developers to run workloads from an OCI image and an API call all over the world. 
On any other cloud provider, the knowledge of how to pull that off comes with a professional certification.</p> <p><em>In what file in our <code>nomad-firecracker</code> repository would I find that code?</em></p> <p><a href='https://docs.machines.dev/#tag/machines/post/apps/{app_name}/machines' title=''>https://docs.machines.dev/#tag/machines/post/apps/{app_name}/machines</a></p> <p><img alt="A diagram that doesn&#39;t make any of this clearer" src="/blog/the-exit-interview-jp/assets/flaps.png?1/2&amp;center" /></p> <p><em>So you mean, literally, the whole Fly Machines API, and <code>flaps</code>, the API gateway for Fly Machines?</em></p> <p>Yes, all of it. The <code>flaps</code> API server, the <code>flyd</code> RPCs it calls, the <code>flyd</code> finite state machine system, the interface to running VMs.</p> <p><em>Is there something you especially like about that design?</em></p> <p>I like that it for the most part doesn&rsquo;t require any central coordination. And I like that the P90 for Fly Machine <code>create</code> calls is sub-5-seconds for pretty much every region except for Johannesburg and Hong Kong.</p> <p>I think the FSM design is something I&rsquo;m proud of; if I could take any code with me, it&rsquo;d be the <code>internal/fsm</code> in the <code>nomad-firecracker</code> repo.</p> <div class="callout"><p>You can read more about <a href="https://fly.io/blog/carving-the-scheduler-out-of-our-orchestrator/" title="">the <code>flyd</code> orchestrator JP led over here</a>. But, a quick decoder ring: <code>flyd</code> runs independently without any central coordination on thousands of “worker” servers around the globe. It’s structured as an API server for a bunch of finite state machine invocations, where an FSM might be something like “start a Fly Machine” or “create a new Fly Machine” or “cordon off a Fly Machine so we can update it”. 
Each FSM invocation is comprised of a bunch of steps, each of those steps has callbacks into the <code>flyd</code> code, and each step is logged in <a href="https://github.com/boltdb/bolt" title="">a BoltDB database</a>.</p> </div> <p><em>Thinking back, there are like two archetypes of insanely talented developers I’ve worked with. One is the kind that belts out ridiculous amounts of relatively sophisticated code on a whim, at like 3AM. Jerome [who leads our fly-proxy team] is that type. The other comes to projects with what feels like fully-formed, coherent designs that are not super intuitive, and the whole project just falls together around that design. Did you know you were going to do the FSM log thing when you started <code>flyd</code>?</em></p> <p>I definitely didn&rsquo;t have any specific design in mind when I started on <code>flyd</code>. I think the FSM stuff is a result of work I did at Compose.io / MongoHQ (where it was called &ldquo;recipes&rdquo;/&ldquo;operations&rdquo;) and the work I did at HashiCorp using Cadence.</p> <p>Once I understood what the product needed to do and look like, having a way to perform deterministic and durable execution felt like a good design.</p> <p><em>Cadence?</em></p> <p><a href='https://cadenceworkflow.io/' title=''>Cadence</a> is the child of AWS Step Functions and the predecessor to <a href='https://temporal.io/' title=''>Temporal</a> (the company).</p> <p>One of the biggest gains, with how it works in <code>flyd</code>, is knowing we would need to deploy <code>flyd</code> all day, every day. If <code>flyd</code> was in the middle of doing some work, it needed to pick back up right where it left off, post-deploy.</p> <p><em>OK, next question. What&rsquo;s the most impressive thing you saw someone else build here? 
To make things simpler and take some pressure off the interview, we can exclude any of my works from consideration.</em></p> <p>Probably <a href='https://github.com/superfly/corrosion' title=''><code>corrosion2</code></a>.</p> <div class="callout"><p>Sidebar: <code>corrosion2</code> is our state distribution system. While <code>flyd</code> runs individual Fly Machines for users, each instance is solely responsible for its own state; there’s no global scheduler. But we have platform components, most obviously <code>fly-proxy</code>, our Anycast router, that need to know what’s running where. <code>corrosion2</code> is a Rust service that does <a href="https://fly.io/blog/building-clusters-with-serf/" title="">SWIM gossip</a> to propagate information from each worker into a CRDT-structured SQLite database. <code>corrosion2</code> essentially means any component on our fleet can do SQLite queries to get near-real-time information about any Fly Machine around the world.</p> </div> <p>If for no other reason than that we deployed <code>corrosion</code>, learned from it, and were able to make significant and valuable improvements — and then migrate to the new system in a short period of time.</p> <p>Having a &ldquo;just SQLite&rdquo; interface, for async replicated changes around the world in seconds, it&rsquo;s pretty powerful.</p> <p>If we invested in <a href='https://antithesis.com/' title=''>Antithesis</a> or TLA+ testing, I think there&rsquo;s <a href='https://github.com/superfly/corrosion' title=''>potential for other companies</a> to get value out of <code>corrosion2</code>.</p> <p><em>Just as a general-purpose gossip-based SQLite CRDT system?</em></p> <p>Yes.</p> <p><em>OK, you&rsquo;re being too nice. What&rsquo;s your least favorite thing about the platform?</em></p> <p>GraphQL. No, Elixir. 
It&rsquo;s a tie between GraphQL and Elixir.</p> <p>But probably GraphQL, by a hair.</p> <p><em>That&rsquo;s not the answer I expected.</em></p> <p>GraphQL slows everyone down, and everything. Elixir only slows me down.</p> <p><em>The rest of the platform, you&rsquo;re fine with? No complaints?</em></p> <p>I&rsquo;m happier now that we have <code>pilot</code>.</p> <div class="callout"><p><code>pilot</code> is our new <code>init</code>. When we launch a Fly Machine, <code>init</code> is our foothold in the machine; this is unlike a normal OCI runtime, where “pid 1” is often the user’s entrypoint program. Our original <code>init</code> was so simple people dunked on it and said it might as well have been a bash script; over time, <code>init</code> has sprouted a bunch of new features. <code>pilot</code> consolidates those features, and, more importantly, is itself a complete OCI runtime; <code>pilot</code> can natively run containers inside of Fly Machines.</p> </div> <p>Before <code>pilot</code>, there really wasn&rsquo;t any contract between <code>flyd</code> and <code>init</code>. And <code>init</code> was just &ldquo;whatever we wanted <code>init</code> to be&rdquo;. That limited its ability to serve us.</p> <p>Having <code>pilot</code> be an OCI-compliant runtime with an API for <code>flyd</code> to drive is a big win for the future of the Fly Machines API.</p> <p><em>Was I right that we should have used SQLite for <code>flyd</code>, or were you wrong to have used BoltDB?</em></p> <p>I still believe Bolt was the right choice. I&rsquo;ve never lost a second of sleep worried that someone is about to run a SQL update statement on a host, or across the whole fleet, and then mangle all our state data. 
And limiting the storage interface, by not using SQL, kept <code>flyd</code>&rsquo;s scope managed.</p> <p>On the engine side of the platform, which is what <code>flyd</code> is, I still believe SQL is too powerful for what <code>flyd</code> does.</p> <p><em>If you had this to do over again, would Bolt be precisely what you&rsquo;d pick, or is there something else you&rsquo;d want to try? Some cool-ass new KV store?</em></p> <p>Nah. But, I&rsquo;d maybe consider a SQLite database per-Fly-Machine. Then the scope of danger is about as small as it could possibly be.</p> <p><em>Whoah, that&rsquo;s an interesting thought. People sleep on the &ldquo;keep a zillion little SQLites&rdquo; design.</em></p> <p>Yeah, with per-Machine SQLite, once a Fly Machine is destroyed, we can just zip up the database and stash it in object storage. The biggest hold-up I have about it is how we&rsquo;d manage the schemas.</p> <p><em>OpenTelemetry: were you right all along?</em></p> <p>One hundred percent.</p> <p><em>I basically attribute oTel at Fly.io to you.</em></p> <p>Without oTel, it&rsquo;d be a disaster trying to troubleshoot the system. I&rsquo;d have ragequit trying.</p> <p><em>I remember there being a cost issue, with how much Honeycomb was going to end up charging us to manage all the data. But that seems silly in retrospect.</em></p> <p>For sure. It is 100% part of the decision and the conversation. But: we didn&rsquo;t have the best track record running a logs/metrics cluster at this fidelity. It was worth the money to pay someone else to manage tracing data.</p> <p><em>Strong agree. I think my only issue is just the extent to which it cruds up code. But I need to get over that.</em></p> <p>Yes, it&rsquo;s very explicit. I think the next big part of oTel is going to be auto-instrumentation, for profiling.</p> <p><em>You&rsquo;re a veteran Golang programmer. 
Say 3 nice things about Rust.</em></p> <div class="callout"><p>Most of our backend is in Go, but <code>fly-proxy</code>, <code>corrosion2</code>, and <code>pilot</code> are in Rust.</p> </div> <ol> <li>Option. </li><li>Match. </li><li>Serde macros. </li></ol> <p><em>Even I can&rsquo;t say shit about Option and match.</em></p> <p>Match is so much better than anything in Go.</p> <p><em>Elixir, Go, and Rust. An honest take on that programming cocktail.</em></p> <p>Three&rsquo;s a crowd, Elixir can stay home.</p> <p><em>If you could only lose one, you&rsquo;d keep Rust.</em></p> <p>I&rsquo;ve learned its shortcomings and the productivity far outweighs having to deal with the Rust compiler.</p> <p><em>You&rsquo;d be unhappy if we moved the <code>flaps</code> API code from Go to Elixir.</em></p> <p>Correct.</p> <p><em>I kind of buy the idea of doing orchestration and scheduling code, which is policy-intensive, in a higher-level language.</em></p> <p>Maybe. If Ruby had a better concurrency story, I don&rsquo;t think Elixir would have a place for us.</p> <div class="callout"><p>Here I need to note that Ruby is functionally dead here, and Elixir is ascendant.</p> </div> <p><em>We have an idiosyncratic management structure. We&rsquo;re bottom-up, but ambiguously so. We don&rsquo;t have roadmaps, except when we do. We have minimal top-down technical direction. Critique.</em></p> <p>It&rsquo;s too easy to lose sight of whether your current focus [in what you&rsquo;re building] is valuable to the company.</p> <p><em>The first thing I warn every candidate about on our &ldquo;do-not-work-here&rdquo; calls.</em></p> <p>I think it comes down to execution, and accountability to actually finish projects. I spun a lot trying to figure out what would be the most valuable work for Fly Machines.</p> <p><em>You don&rsquo;t have to be so nice about things.</em></p> <p>We struggle a lot with consistent communication. We change direction a little too often. 
It got to a point where I didn&rsquo;t see a point in devoting time and effort into projects, because I&rsquo;d not be able to show enough value quick enough.</p> <p><em>I see things paying off later than we&rsquo;d hoped or expected they would. Our secret storage system, Pet Semetary, is a good example of this. Our K8s service, FKS, is another obvious one, since we&rsquo;re shipping MPG on it.</em></p> <p><em>This is your second time working with Kurt, at a company where he&rsquo;s the CEO. Give him a 1-4 star rating. He can take it! At least, I think he can take it.</em></p> <p>2022: ★★★★</p> <p>2023: ★★</p> <p>2024: ★★✩</p> <p>2025: ★★★✩</p> <p>On a four-star scale.</p> <p><em>Whoah. I did not expect a histogram. Say more about 2023!</em></p> <p>We hired too many people, too quickly, and didn&rsquo;t have the guardrails and structure in place for everybody to be successful.</p> <p><em>Also: GPUs!</em></p> <p>Yes. That was my next comment.</p> <p><em>Do we secretly agree about GPUs?</em></p> <p>I think so.</p> <p><em>Our side won the argument in the end! But at what cost?</em></p> <p>They were a killer distraction.</p> <p><em>Final question: how long will you remain in the first-responder on-call rotation after you leave? I assume at least until August. I have a shift this weekend; can you swap with me? I keep getting weekends.</em></p> <p>I am going to be asleep all weekend if any of my previous job changes are indicative.</p> <p><em>I sleep through on-call too! But nobody can yell at you for it now. I think you have the comparative advantage over me in on-calling.</em></p> <p>Yes I will absolutely take all your future on-call shifts, you have convinced me.</p> <p><em>All this aside: it has been a privilege watching you work. I hope your next gig is 100x more relaxing than this was. Or maybe I just hope that for myself. Except: I&rsquo;ll never escape this place. Thank you so much for doing this.</em></p> <p>Thank you! 
I&rsquo;m forever grateful for having the opportunity to be a part of Fly.io.</p></content> </entry> <entry> <title>A Blog, If You Can Keep It</title> <link rel="alternate" href="https://fly.io/blog/a-blog-if-kept/"/> <id>https://fly.io/blog/a-blog-if-kept/</id> <published>2025-02-10T00:00:00+00:00</published> <updated>2025-02-19T13:16:17+00:00</updated> <media:thumbnail url="https://fly.io/blog/a-blog-if-kept/assets/keep-blog.webp"/> <content type="html"><div class="lead"><p>A boldfaced lede like this was a sure sign you were reading a carefully choreographed EffortPost from our team at Fly.io. We’re going to do less of those. Or the same amount but more of a different kind of post. Either way: launch an app on Fly.io today!</p> </div> <p>Over the last 5 years, we’ve done pretty well for ourselves writing content for Hacker News. And that’s <a href='https://news.ycombinator.com/item?id=39373476' title=''>mostly</a> been good for us. We don’t do conventional marketing, we don’t have a sales team, the rest of social media is atomized over 5 different sites. Writing pieces that HN takes seriously has been our primary outreach tool.</p> <p>There’s a recipe (probably several, but I know this one works) for charting a post on HN:</p> <ol> <li>Write an EffortPost, which is to say a dense technical piece over 2000 words long; within that rubric there’s a bunch of things that are catnip to HN, including runnable code, research surveys, and explainers. (There are also cat-repellants you learn to steer clear of.) </li><li>Assiduously avoid promotion. You have to write for the audience. We get away with sporadically including a call-to-action block in our posts, but otherwise: the post should make sense even if an unrelated company posted it after you went out of business. 
</li><li>Pick topics HN is interested in (it helps if all your topics are au courant for HN, and we’ve been <a href='https://news.ycombinator.com/item?id=32250426' title=''>very</a> <a href='https://news.ycombinator.com/item?id=32018066' title=''>lucky</a> in that regard). </li><li>Like 5-6 more style-guide things that help incrementally. Probably 3 different teams writing for HN will have 3 different style guides with only like &frac12; overlap. Ours, for instance, instructs writers to swear. </li></ol> <p>I like this kind of writing. It’s not even a chore. But it’s become an impediment for us, for a couple reasons: the team serializes behind an “editorial” function here, which keeps us from publishing everything we want; worse, caring so much about our track record leaves us noodling on posts interminably (the poor <a href='https://www.tigrisdata.com/' title=''>Tigrises</a> have been waiting for months for me to publish the piece I wrote about them and FoundationDB; take heart, this post today means that one is coming soon).</p> <p>But worst of all, I worried incessantly about us <a href='https://gist.github.com/tqbf/e853764b562a2d72a91a6986ca3b77c0' title=''>wearing out our welcome</a>. To my mind, we’d have 1, maybe 2 bites at the HN apple in a given month, and we needed to make them count.</p> <p>That was dumb. I am dumb about a lot of things! I came around to understanding this after Kurt demanded I publish my blog post about BFAAS (Bash Functions As A Service), 500 lines of Go code that had generated 4500 words in my draft. It was only after I made the decision to stop gatekeeping this blog that I realized <a href='https://simonwillison.net/' title=''>Simon Willison</a> has been disproving my “wearing out the welcome” theory, day in and day out, for years. He just writes stuff about LLMs when it interests him. I mean, it helps that he’s a better writer than we are. 
But he’s not wasting time choreographing things.</p> <p>Back in like 2009, <a href='https://web.archive.org/web/20110806040300/http://chargen.matasano.com/chargen/2009/7/22/if-youre-typing-the-letters-a-e-s-into-your-code-youre-doing.html' title=''>we had a blog</a> at another company I was at. That blog drove a lot of business for us (and, on three occasions, almost killed me). It was not in the least bit optimized for HN. I like pretending to be a magazine feature writer, but I miss writing dashed-off pieces every day and clearing space for other people on the team to write as well.</p> <p>So this is all just a heads up: we’re trying something new. This is a very long and self-indulgent way to say “we’re going to write a normal blog like it’s 2008”, but that’s how broken my brain is after years of having my primary dopaminergic rewards come from how long Fly.io blog posts stay on the front page: I have to disclaim blogging before we start doing it, lest I fail to meet expectations.</p> <p>Like I said. I’m real dumb. But: looking forward to getting a lot more stuff out on the web for people to read this year!</p></content> </entry> <entry> <title>Did Semgrep Just Get A Lot More Interesting?</title> <link rel="alternate" href="https://fly.io/blog/semgrep-but-for-real-now/"/> <id>https://fly.io/blog/semgrep-but-for-real-now/</id> <published>2025-02-10T00:00:00+00:00</published> <updated>2025-02-11T00:20:14+00:00</updated> <media:thumbnail url="https://fly.io/blog/semgrep-but-for-real-now/assets/ci-cd-cover.webp"/> <content type="html"><div class="right-sidenote"><p>This whole paragraph is just one long sentence. 
God I love <a href="https://fly.io/blog/a-blog-if-kept/" title="">just random-ass blogging</a> again.</p> </div> <p><a href='https://ghuntley.com/stdlib/' title=''>This bit by Geoffrey Huntley</a> is super interesting to me and, despite calling out that LLM-driven development agents like Cursor have something like a 40% success rate at actually building anything that passes acceptance criteria, makes me think that more of the future of our field belongs to people who figure out how to use these weird bags of model weights than any of us are comfortable with. </p> <p>I’ve been dinking around with Cursor for a week now (if you haven’t, I think it’s something close to malpractice not to at least take it — or something like it — for a spin) and am just now from this post learning that Cursor has this <a href='https://docs.cursor.com/context/rules-for-ai' title=''>rules feature</a>. </p> <p>The important thing for me is not how Cursor rules work, but rather how Huntley uses them. He turns them back on themselves, writing rules to tell Cursor how to organize the rules, and then teaching Cursor how to write (under human supervision) its own rules.</p> <p>Cursor kept trying to get Huntley to use Bazel as a build system. So he had Cursor write a rule for itself: “no bazel”. And there was no more Bazel. If I’d known I could do this, I probably wouldn’t have bounced from the Elixir project I had Cursor doing, where trying to get it to write simple unit tests got it all tangled up trying to make <a href='https://hexdocs.pm/mox/Mox.html' title=''>Mox</a> work. </p> <p>But I’m burying the lede. </p> <p>Security people have been for several years now somewhat in love with a tool called <a href='https://github.com/semgrep/semgrep' title=''>Semgrep</a>. Semgrep is a semantics-aware code search tool; using symbolic variable placeholders and otherwise ordinary code, you can write rules to match pretty much arbitrary expressions and control flow. 
</p> <p>If you’re an appsec person, where you obviously go with this is: you build a library of Semgrep searches for well-known vulnerability patterns (or, if you’re like us at Fly.io, you work out how to get Semgrep to catch the Rust concurrency footgun of RWLocks inside if-lets).</p> <p>The reality for most teams though is “ain’t nobody got time for that”. </p> <p>But I just checked and, unsurprisingly, 4o <a href='https://chatgpt.com/share/67aa94a7-ea3c-8012-845c-6c9491b33fe4' title=''>seems to do reasonably well</a> at generating Semgrep rules? Like: I have no idea if this rule is actually any good. But it looks like a Semgrep rule?</p> <p>What interests me is this: it seems obvious that we’re going to do more and more “closed-loop” LLM agent code generation stuff. By “closed loop”, I mean that the thingy that generates code is going to get to run the code and watch what happens when it’s interacted with. You’re just a small bit of glue code and a lot of system prompting away from building something like that right now: <a href='https://x.com/chris_mccord/status/1882839014845374683' title=''>Chris McCord is building</a> a thingy that generates whole Elixir/Phoenix apps and runs them as Fly Machines. When you deploy these kinds of things, the LLM gets to see the errors when the code is run, and it can just go fix them. It also gets to see errors and exceptions in the logs when you hit a page on the app, and it can just go fix them.</p> <p>With a bit more system prompting, you can get an LLM to try to generalize out from exceptions it fixes and generate unit test coverage for them. </p> <p>With a little bit more system prompting, you can probably get an LLM to (1) generate a Semgrep rule for the generalized bug it caught, (2) test the Semgrep rule with a positive/negative control, (3) save the rule, (4) test the whole codebase with Semgrep for that rule, and (5) fix anything it finds that way. 
</p> <p>That is a lot more interesting to me than tediously (and probably badly) trying to predict everything that will go wrong in my codebase a priori and Semgrepping for them. Which is to say: Semgrep — which I have always liked — is maybe a lot more interesting now? And tools like it?</p></content> </entry> <entry> <title>VSCode’s SSH Agent Is Bananas</title> <link rel="alternate" href="https://fly.io/blog/vscode-ssh-wtf/"/> <id>https://fly.io/blog/vscode-ssh-wtf/</id> <published>2025-02-07T00:00:00+00:00</published> <updated>2025-02-07T21:53:40+00:00</updated> <media:thumbnail url="https://fly.io/static/images/default-post-thumbnail.webp"/> <content type="html"><p>We’re interested in getting integrated into the flow VSCode uses to do remote editing over SSH, because everybody is using VSCode now, and, in particular, they’re using forks of VSCode that generate code with LLMs. </p> <div class="right-sidenote"><p>“hallucination” is what we call it when LLMs get code wrong; “engineering” is what we call it when people do.</p> </div> <p>LLM-generated code is <a href='https://nicholas.carlini.com/writing/2024/how-i-use-ai.html' title=''>useful in the general case</a> if you know what you’re doing. But it’s ultra-useful if you can close the loop between the LLM and the execution environment (with an “Agent” setup). There’s lots to say about this, but for the moment: it’s a semi-effective antidote to hallucination. The LLM generates the code, the agent scaffolding runs the code, the code generates errors, the agent feeds them back to the LLM, the process iterates. </p> <p>So, obviously, the issue here is you don’t want this iterative development process happening on your development laptop, because LLMs have boundary issues, and they’ll iterate on your system configuration just as happily as on the Git project you happen to be working in. A thing you’d really like to be able to do: run a closed-loop agent-y (“agentic”? 
is that what we say now) configuration for an LLM, on a clean-slate Linux instance that spins up instantly and that can’t screw you over in any way. You get where we’re going with this.</p> <p>Anyways! I would like to register a concern.</p> <p>Emacs hosts the spiritual forebear of remote editing systems, a blob of hyper-useful Elisp called <a href='https://www.gnu.org/software/tramp/' title=''>“Tramp”</a>. If you can hook Tramp up to any kind of interactive environment — usually, an SSH session — where it can run Bourne shell commands, it can extend Emacs to that environment.</p> <p>So, VSCode has a feature like Tramp. Which, neat, right? You’d think, take Tramp, maybe simplify it a bit, switch out Elisp for TypeScript.</p> <p>You’d think wrong!</p> <p>Unlike Tramp, which lives off the land on the remote connection, VSCode mounts a full-scale invasion: it runs a Bash snippet stager that downloads an agent, including a binary installation of Node. </p> <p>I <em>think</em> this is <a href='https://github.com/microsoft/vscode/tree/c9e7e1b72f80b12ffc00e06153afcfedba9ec31f/src/vs/server/node' title=''>the source code</a>?</p> <p>The agent runs over port-forwarded SSH. It establishes a WebSockets connection back to your running VSCode front-end. The underlying protocol on that connection can:</p> <ul> <li>Wander around the filesystem </li><li>Edit arbitrary files </li><li>Launch its own shell PTY processes </li><li>Persist itself </li></ul> <p>In security-world, there’s a name for tools that work this way. I won’t say it out loud, because that’s not fair to VSCode, but let’s just say the name is murid in nature.</p> <p>I would be a little nervous about letting people VSCode-remote-edit stuff on dev servers, and apoplectic if that happened during an incident on something in production. 
</p> <p>It turns out we don’t have to care about any of this to get a custom connection to a Fly Machine working in VSCode, so none of this matters in any kind of deep way, but: we’ve decided to just be a blog again, so: we had to learn this, and now you do too.</p></content> </entry> <entry> <title>AI GPU Clusters, From Your Laptop, With Livebook</title> <link rel="alternate" href="https://fly.io/blog/ai-gpu-clusters-from-your-laptop-livebook/"/> <id>https://fly.io/blog/ai-gpu-clusters-from-your-laptop-livebook/</id> <published>2024-09-24T00:00:00+00:00</published> <updated>2024-10-03T19:05:54+00:00</updated> <media:thumbnail url="https://fly.io/blog/ai-gpu-clusters-from-your-laptop-livebook/assets/ai-gpu-livebook-thumb.webp"/> <content type="html"><div class="lead"><p>Livebook, FLAME, and the Nx stack: three Elixir components that are easy to describe, more powerful than they look, and intricately threaded into the Elixir ecosystem. A few weeks ago, Chris McCord (👋) and Chris Grainger showed them off at ElixirConf 2024. We thought the talk was worth a recap.</p> </div> <p>Let&rsquo;s begin by introducing our cast of characters.</p> <p><a href='https://livebook.dev/' title=''>Livebook</a> is usually described as Elixir&rsquo;s answer to <a href='https://jupyter.org/' title=''>Jupyter Notebooks</a>. And that&rsquo;s a good way to think about it. But Livebook takes full advantage of the Elixir platform, which makes it sneakily powerful. By linking up directly with Elixir app clusters, Livebook can switch easily between driving compute locally or on remote servers, and makes it easy to bring any kind of data into reproducible workflows.</p> <p><a href='https://fly.io/blog/rethinking-serverless-with-flame/' title=''>FLAME</a> is Elixir&rsquo;s answer to serverless computing. By having the library manage a pool of executors for you, FLAME lets you treat your entire application as if it were elastic and scale-to-zero. 
You configure FLAME with some basic information about where to run code and how many instances it&rsquo;s allowed to run, and then mark off any arbitrary section of code with <code>FLAME.call</code>. The framework takes care of the rest. It&rsquo;s the upside of serverless without committing yourself to blowing your app apart into tiny, intricately connected pieces.</p> <p>The <a href='https://github.com/elixir-nx' title=''>Nx stack</a> is how you do Elixir-native AI and ML. Nx gives you an Elixir-native notion of tensor computations with GPU backends. <a href='https://github.com/elixir-nx/axon' title=''>Axon</a> builds a common interface for ML models on top of it. <a href='https://github.com/elixir-nx/bumblebee' title=''>Bumblebee</a> makes those models available to any Elixir app that wants to download them, from just a couple lines of code.</p> <p>Here is a quick video showing how to transfer a local tensor to a remote GPU, using Livebook, FLAME, and Nx:</p> <div class="youtube-container" data-exclude-render> <div class="youtube-video"> <iframe width="100%" height="100%" src="https://www.youtube.com/embed/5ImP3gpUSkQ" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen> </iframe> </div> </div> <p>Let&rsquo;s dive into the <a href='https://www.youtube.com/watch?v=4qoHPh0obv0' title=''>keynote</a>.</p> <h2 id='poking-a-hole-in-your-infrastructure' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#poking-a-hole-in-your-infrastructure' aria-label='Anchor'></a><span class='plain-code'>Poking a hole in your infrastructure</span></h2> <p>Any Livebook, including the one running on your laptop, can start a runtime running on a Fly Machine, in Fly.io&rsquo;s public cloud. 
That Elixir machine will (by default) live in your default Fly.io organization, giving it networked access to all the other apps that might live there.</p> <p>This is an access control situation that mostly just does what you want it to do without asking. Unless you ask it to, Fly.io isn&rsquo;t exposing anything to the Internet, or to other users of Fly.io. For instance: say we have a database we&rsquo;re going to use to generate reports. It can hang out on our Fly organization, inside of a private network with no connectivity to the world. We can spin up a Livebook instance that can talk to it, without doing any network or infrastructure engineering to make that happen.</p> <p>But wait, there&rsquo;s more. Because this is all Elixir, Livebook also allows you to connect to any running Erlang/Elixir application in your infrastructure to debug, introspect, and monitor it.</p> <p>Check out this clip of Chris McCord connecting <a href='https://rtt.fly.dev/' title=''>to an existing application</a> during the keynote:</p> <div class="youtube-container" data-exclude-render> <div class="youtube-video"> <iframe width="100%" height="100%" src="https://www.youtube.com/embed/4qoHPh0obv0?start=1106" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen> </iframe> </div> </div> <p>Running a snippet of code from a laptop on a remote server is a neat trick, but Livebook is doing something deeper than that. It&rsquo;s taking advantage of Erlang/Elixir&rsquo;s native facility with cluster computation and making it available to the notebook. As a result, when we do things like auto-completing, Livebook delivers results from modules defined on the remote node itself. 
🤯</p> <h2 id='elastic-scale-with-flame' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#elastic-scale-with-flame' aria-label='Anchor'></a><span class='plain-code'>Elastic scale with FLAME</span></h2> <p>When we first introduced FLAME, the example we used was video encoding.</p> <p>Video encoding is complicated and slow enough that you&rsquo;d normally make arrangements to run it remotely or in a background job queue, or as a triggerable Lambda function. The point of FLAME is to get rid of all those steps, and give them over to the framework instead. So: we wrote our <code>ffmpeg</code> calls inline like normal code, as if they were going to complete in microseconds, and wrapped them in <code>FLAME.call</code> blocks. That was it, that was the demo.</p> <p>Here, we&rsquo;re going to put a little AI spin on it.</p> <p>The first thing we&rsquo;re doing here is driving FLAME pools from Livebook. Livebook will automatically synchronize your notebook dependencies as well as any module or code defined in your notebook across nodes. That means any code we write in our notebook can be dispatched transparently out to arbitrarily many compute nodes, without ceremony.</p> <p>Now let&rsquo;s add some AI flair. We take an object store bucket full of video files. We use <code>ffmpeg</code> to extract stills from the video at different moments. 
Then: we send them to <a href='https://www.llama.com/' title=''>Llama</a>, running on <a href='https://fly.io/gpu' title=''>GPU Fly Machines</a> (still locked to our organization), to get descriptions of the stills.</p> <p>All those stills and descriptions get streamed back to our notebook, in real time:</p> <div class="youtube-container" data-exclude-render> <div class="youtube-video"> <iframe width="100%" height="100%" src="https://www.youtube.com/embed/4qoHPh0obv0?start=1692" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen> </iframe> </div> </div> <p>At the end, the descriptions are sent to <a href='https://mistral.ai/' title=''>Mistral</a>, which builds a summary.</p> <p>Thanks to FLAME, we get explicit control over the minimum and maximum number of nodes running at once, as well as their concurrency settings. As nodes finish processing each video, new ones are automatically sent to them, until the whole bucket has been traversed. 
Each node will automatically shut down after an idle timeout and the whole cluster terminates if you disconnect the Livebook runtime.</p> <p>Just like your app code, FLAME lets you take your notebook code designed to run locally, change almost nothing, and elastically execute it across ephemeral infrastructure.</p> <h2 id='64-gpus-hyperparameter-tuning-on-a-laptop' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#64-gpus-hyperparameter-tuning-on-a-laptop' aria-label='Anchor'></a><span class='plain-code'>64-GPUs hyperparameter tuning on a laptop</span></h2> <p>Next, Chris Grainger, CTO of <a href='https://amplified.ai/' title=''>Amplified</a>, takes the stage.</p> <p>For work at Amplified, Chris wants to analyze a gigantic archive of patents, on behalf of a client doing edible cannabinoid work. To do that, he uses a BERT model (BERT, from Google, is one of the OG &ldquo;transformer&rdquo; models, optimized for text comprehension).</p> <p>To make the BERT model effective for this task, he&rsquo;s going to do a hyperparameter training run.</p> <p>This is a much more complicated AI task than the Llama work we just showed. Chris is going to generate a cluster of 64 GPU Fly Machines, each running an <a href='https://fly.io/blog/cutting-prices-for-l40s-gpus-in-half/' title=''>L40S GPU</a>. On each of these nodes, he needs to:</p> <ul> <li>set up its environment (including native dependencies and GPU bindings) </li><li>load the training data </li><li>compile a different version of BERT with different parameters, optimizers, etc. </li><li>start the fine-tuning </li><li>stream its results in real time to its assigned chart </li></ul> <p>Here&rsquo;s the clip. You&rsquo;ll see the results stream in, in real time, directly back to his Livebook. 
We&rsquo;ll wait, because it won&rsquo;t take long to watch:</p> <div class="youtube-container" data-exclude-render> <div class="youtube-video"> <iframe width="100%" height="100%" src="https://www.youtube.com/embed/4qoHPh0obv0?start=3344" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen> </iframe> </div> </div> <h2 id='this-is-just-the-beginning' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#this-is-just-the-beginning' aria-label='Anchor'></a><span class='plain-code'>This is just the beginning</span></h2> <p>The suggestion of mixing Livebook and FLAME to elastically scale notebook execution was originally proposed by Chris Grainger during ElixirConf EU. During the next four months, Jonatan Kłosko, Chris McCord, and José Valim worked part-time on making it a reality in time for ElixirConf US. Our ability to deliver such a rich combination of features in such a short period of time is a testament to the capabilities of the Erlang Virtual Machine, which Elixir and Livebook run on. Other features, such as <a href='https://github.com/elixir-explorer/explorer/issues/932' title=''>remote dataframes and distributed GC</a>, were implemented in a weekend. Bringing the same functionality to other ecosystems would take several additional months, sometimes accompanied by millions in funding, and often as part of a closed-source product.</p> <p>Furthermore, since we announced this feature, <a href='https://github.com/mruoss' title=''>Michael Ruoss</a> stepped in and brought the same functionality to Kubernetes. From Livebook v0.14.1, you can start Livebook runtimes inside a Kubernetes cluster and also use FLAME to elastically scale them. 
Expect more features and news in this space!</p> <p>Finally, Fly&rsquo;s infrastructure played a key role in making it possible to start a cluster of GPUs in seconds rather than minutes, and all it requires is a Docker image. We&rsquo;re looking forward to seeing how other technologies and notebook platforms can leverage Fly to also elevate their developer experiences.</p> <figure class="post-cta"> <figcaption> <h1>Launch a GPU app in seconds</h1> <p>Run your own LLMs or use Livebook for elastic GPU workflows&nbsp;✨</p> <a class="btn btn-lg" href="/gpu"> Go! &nbsp;<span class='opacity-50'>→</span> </a> </figcaption> <div class="image-container"> <img src="/static/images/cta-cat.webp" srcset="/static/images/[email protected] 2x" alt=""> </div> </figure></content> </entry> <entry> <title>Accident Forgiveness</title> <link rel="alternate" href="https://fly.io/blog/accident-forgiveness/"/> <id>https://fly.io/blog/accident-forgiveness/</id> <published>2024-08-21T00:00:00+00:00</published> <updated>2024-08-27T21:13:01+00:00</updated> <media:thumbnail url="https://fly.io/blog/accident-forgiveness/assets/money-for-mistakes-blog-thumb.webp"/> <content type="html"><div class="lead"><p>We’re Fly.io, a new public cloud with simple, developer-friendly ergonomics. <a href="https://fly.io/speedrun" title="">Try it out; you’ll be deployed in just minutes</a>, and, as you’re about to read, with less financial risk.</p> </div> <p>Public cloud billing is terrifying.</p> <p>The premise of a public cloud &mdash; what sets it apart from a hosting provider &mdash; is 8,760 hours/year of on-tap deployable compute, storage, and networking. Cloud resources are &ldquo;elastic&rdquo;: they&rsquo;re acquired and released as needed; in the &ldquo;cloud-iest&rdquo; apps, without human intervention. 
Public cloud resources behave like utilities, and that&rsquo;s how they&rsquo;re priced.</p> <p>You probably can&rsquo;t tell me how much electricity your home is using right now, and may only come within tens of dollars of accurately predicting your water bill. But neither of those bills are all that scary, because you assume there&rsquo;s a limit to how much you could run them up in a single billing interval.</p> <p>That&rsquo;s not true of public clouds. There are only so many ways to &ldquo;spend&rdquo; water at your home, but there are indeterminably many ways to land on a code path that grabs another VM, or to miskey a configuration, or to leak long-running CI/CD environments every time a PR gets merged. Pick a practitioner at random, and I bet they&rsquo;ve read a story within the last couple months about someone running up a galactic-scale bill at some provider or other.</p> <h2 id='implied-accident-forgiveness' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#implied-accident-forgiveness' aria-label='Anchor'></a><span class='plain-code'>Implied Accident Forgiveness</span></h2> <p>For people who don&rsquo;t do a lot of cloud work, what all this means is that every PR push sets off a little alarm in the back of their heads: &ldquo;you may have just incurred $200,000 of costs!&rdquo;. The alarm is quickly silenced, though it&rsquo;s still subtly extracting a cortisol penalty. 
But by deadening the nerves that sense the danger of unexpected charges, those people are nudged closer to themselves being the next story on Twitter about an accidental $200,000 bill.</p> <p>The saving grace here, which you&rsquo;ll learn if you ever become that $200,000 story, is that nobody pays those bills.</p> <p>See, what cloud-savvy people know already is that providers have billing support teams, which spend a big chunk of their time conceding disputed bills. If you do something luridly stupid and rack up costs, AWS and GCP will probably cut you a break. We will too. Everyone does.</p> <p>If you didn&rsquo;t already know this, you&rsquo;re welcome; I&rsquo;ve made your life a little better, even if you don&rsquo;t run things on Fly.io.</p> <p>But as soothing as it is to know you can get a break from cloud providers, the billing situation here is still a long ways away from &ldquo;good&rdquo;. If you accidentally add a zero to a scale count and don&rsquo;t notice for several weeks, AWS or GCP will probably cut you a break. But they won&rsquo;t <em>definitely</em> do it, and even though your odds are good, you&rsquo;re still finding out at email- and phone-tag scale speeds. That&rsquo;s not fun!</p> <h2 id='explicit-accident-forgiveness' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#explicit-accident-forgiveness' aria-label='Anchor'></a><span class='plain-code'>Explicit Accident Forgiveness</span></h2> <p>Charging you for stuff you didn&rsquo;t want is bad business.</p> <p>Good business, we think, means making you so comfortable with your cloud you try new stuff. You, and everyone else on your team. Without a chaperone from the finance department.</p> <p>So we&rsquo;re going to do the work to make this official. 
If you&rsquo;re a customer of ours, we&rsquo;re going to charge you in exacting detail for every resource you intentionally use of ours, but if something blows up and you get an unexpected bill, we&rsquo;re going to let you off the hook.</p> <h2 id='not-so-fast' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#not-so-fast' aria-label='Anchor'></a><span class='plain-code'>Not So Fast</span></h2> <p>This is a Project, with a capital P. While we&rsquo;re kind of kicking ourselves for not starting it earlier, there are reasons we couldn&rsquo;t do it back in 2020.</p> <p>The Fully Automated Accident-Forgiving Billing System of the Future (which we are in fact building and may even one day ship) will give you a line-item veto on your invoice. We are a long ways away. The biggest reason is fraud.</p> <p>Sit back, close your eyes, and try to think about everything public clouds do to make your life harder. Chances are, most of those things are responses to fraud. Cloud platforms attract fraudsters like ants to an upturned ice cream cone. Thanks to the modern science of cryptography, fraudsters have had a 15 year head start on turning metered compute into picodollar-granular near-money assets.</p> <p>Since there&rsquo;s no bouncer at the door checking IDs here, an open-ended and automated commitment to accident forgiveness is, with iron certainty, going to be used overwhelmingly in order to trick us into &ldquo;forgiving&rdquo; cryptocurrency miners. We&rsquo;re cloud platform engineers. 
They&rsquo;re our primary pathogen.</p> <p>So, we&rsquo;re going to roll this out incrementally.</p> <div class="callout"><p><strong class="font-semibold text-navy-950">Why not billing alerts?</strong> We’ll get there, but here too there are reasons we haven’t yet: (1) meaningful billing alerts were incredibly difficult to do with our previous billing system, and building the new system and migrating our customers to it has been a huge lift, a nightmare from which we are only now waking (the billing system’s official name); and (2) we’re wary about alerts being a product design cop-out; if we can alert on something, why aren’t we fixing it?</p> </div><h2 id='accident-forgiveness-v0-84beta' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#accident-forgiveness-v0-84beta' aria-label='Anchor'></a><span class='plain-code'>Accident Forgiveness v0.84beta</span></h2> <p>All the same subtextual, implied reassurances that every cloud provider offers remain in place at Fly.io. You are strictly better off after this announcement, we promise.</p> <div class="right-sidenote"><p>I added the “almost” right before publishing, because I’m chicken.</p> </div> <p>Now: for customers that have a support contract with us, at any level, there&rsquo;s something new: I&rsquo;m saying the quiet part loud. The next time you see a bill with an unexpected charge on it, we&rsquo;ll refund that charge, (almost) no questions asked.</p> <p>That policy is so simple it feels anticlimactic to write. So, some additional color commentary:</p> <p>We&rsquo;re not advertising a limit to the number of times you can do this. If you&rsquo;re a serious customer of ours, I promise that you cannot remotely fathom the fullness of our fellow-feeling. 
You&rsquo;re not annoying us by getting us to refund unexpected charges. If you are growing a project on Fly.io, we will bend over backwards to keep you growing.</p> <p>How far can we take this? How simple can we keep this policy? We&rsquo;re going to find out together.</p> <p>To begin with, and in the spirit of &ldquo;doing things that won&rsquo;t scale&rdquo;, when we forgive a bill, what&rsquo;s going to happen next is this: I&rsquo;m going to set an irritating personal reminder for Kurt to look into what happened, now and then the day before your next bill, so we can see what&rsquo;s going wrong. He&rsquo;s going to hate that, which is the point: our best feature work is driven by Kurt-hate.</p> <p>Obviously, if you&rsquo;re rubbing your hands together excitedly over the opportunity this policy presents, then, well, not so much with the fellow-feeling. We reserve the right to cut you off.</p> <figure class="post-cta"> <figcaption> <h1>Support For Developers, By Developers</h1> <p>Explicit Accident Forgiveness is just one thing we like about Support at Fly.io.</p> <a class="btn btn-lg" href="https://fly.io/accident-forgiveness"> Go find out! &nbsp;<span class='opacity-50'>→</span> </a> </figcaption> <div class="image-container"> <img src="/static/images/cta-kitty.webp" srcset="/static/images/[email protected] 2x" alt=""> </div> </figure> <h2 id='whats-next-accident-protection' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#whats-next-accident-protection' aria-label='Anchor'></a><span class='plain-code'>What&rsquo;s Next: Accident Protection</span></h2> <p>We think this is a pretty good first step. But that&rsquo;s all it is.</p> <p>We can do better than offering you easy refunds for mistaken deployments and botched CI/CD jobs. 
What&rsquo;s better than getting a refund is never incurring the charge to begin with, and that&rsquo;s the next step we&rsquo;re working on.</p> <div class="right-sidenote"><p>More to come on that billing system.</p> </div> <p>We built a new billing system so that we can do things like that. For instance: we&rsquo;re in a position to catch sudden spikes in your month-over-month bills, flag them, and catch weird-looking deployments before we bill for them.</p> <p>Another thing we rebuilt billing for is <a href='https://community.fly.io/t/reservation-blocks-40-discount-on-machines-when-youre-ready-to-commit/20858' title=''>reserved pricing</a>. Already today you can get a steep discount from us reserving blocks of compute in advance. The trick to taking advantage of reserved pricing is confidently predicting a floor to your usage. For a lot of people, that means fighting feelings of loss aversion (nobody wants to get gym priced!). So another thing we can do in this same vein: catch opportunities to move customers to reserved blocks, and offer backdated reservations. We&rsquo;ll figure this out too.</p> <p>Someday, when we&rsquo;re in a monopoly position, our founders have all been replaced by ruthless MBAs, and Kurt has retired to farm coffee beans in lower Montana, we may stop doing this stuff. But until that day this is the right choice for our business.</p> <p>Meanwhile: like every public cloud, we provision our own hardware, and we have excess capacity. Your messed-up CI/CD jobs didn&rsquo;t really cost us anything, so if you didn&rsquo;t really want them, they shouldn&rsquo;t cost you anything either. Take us up on this! 
We love talking to you.</p></content> </entry> <entry> <title>We're Cutting L40S Prices In Half</title> <link rel="alternate" href="https://fly.io/blog/cutting-prices-for-l40s-gpus-in-half/"/> <id>https://fly.io/blog/cutting-prices-for-l40s-gpus-in-half/</id> <published>2024-08-15T00:00:00+00:00</published> <updated>2024-08-16T02:01:46+00:00</updated> <media:thumbnail url="https://fly.io/blog/cutting-prices-for-l40s-gpus-in-half/assets/gpu-ga-thumb.webp"/> <content type="html"><div class="lead"><p>We’re Fly.io, a new public cloud with simple, developer-friendly ergonomics. And as of today, cheaper GPUs. <a href="https://fly.io/speedrun" title="">Try it out; you’ll be deployed in just minutes</a>.</p> </div> <p>We just lowered the prices on NVIDIA L40S GPUs to $1.25 per hour. Why? Because our feet are cold and we burn processor cycles for heat. But also other reasons.</p> <p>Let&rsquo;s back up.</p> <p>We offer 4 different NVIDIA GPU models; in increasing order of performance, they&rsquo;re the A10, the L40S, the 40G PCI A100, and the 80G SXM A100. Guess which one is most popular.</p> <p>We guessed wrong, and spent a lot of time working out how to maximize the amount of GPU power we could deliver to a single Fly Machine. Users surprised us. By a wide margin, the most popular GPU in our inventory is the A10.</p> <p>The A10 is an older generation of NVIDIA GPU with fewer, slower cores and less memory. It&rsquo;s the least capable GPU we offer. But that doesn&rsquo;t matter, because it&rsquo;s capable enough. It&rsquo;s solid for random inference tasks, and handles mid-sized generative AI stuff like Mistral Nemo or Stable Diffusion. For those workloads, there&rsquo;s not that much benefit in getting a beefier GPU.</p> <p>As a result, we can&rsquo;t get new A10s in fast enough for our users.</p> <p>If there&rsquo;s one thing we&rsquo;ve learned by talking to our customers over the last 4 years, it&rsquo;s that y&#39;all love a peek behind the curtain. 
So we&rsquo;re going to let you in on a little secret about how a hardware provider like Fly.io formulates GPU strategy: none of us know what the hell we&rsquo;re doing.</p> <p>If you had asked us in 2023 what the biggest GPU problem we could solve was, we&rsquo;d have said &ldquo;selling fractional A100 slices&rdquo;. We burned a whole quarter trying to get MIG, or at least vGPUs, working through IOMMU PCI passthrough on Fly Machines, in a project so cursed that Thomas has forsworn ever programming again. Then we went to market selling whole A100s, and for several more months it looked like the biggest problem we needed to solve was finding a secure way to expose NVLink-ganged A100 clusters to VMs so users could run training. Then H100s; can we find H100s anywhere? Maybe in a black market in Shenzhen?</p> <p>And here we are, a year later, looking at the data, and the least sexy, least interesting GPU part in the catalog is where all the action is.</p> <p>With actual customer data to back up the hypothesis, here&rsquo;s what we think is happening today:</p> <ul> <li>Most users who want to plug GPU-accelerated AI workloads into fast networks are doing inference, not training. </li><li>The hyperscaler public clouds strangle these customers, first with GPU instance surcharges, and then with egress fees for object storage data when those customers try to outsource the GPU stuff to GPU providers. </li><li>If you&rsquo;re trying to do something GPU-accelerated in response to an HTTP request, the right combination of GPU, instance RAM, fast object storage for datasets and model parameters, and networking is much more important than getting your hands on an H100. </li></ul> <p>This is a thing we didn&rsquo;t see coming, but should have: training workloads tend to look more like batch jobs, and inference tends to look more like transactions. Batch training jobs aren&rsquo;t that sensitive to networking or even reliability. 
Live inference jobs responding to end-user HTTP requests are. So, given our pricing, of course the A10s are a sweet spot.</p> <p>The next step up in our lineup after the A10 is the L40S. The L40S is a nice piece of kit. We&rsquo;re going to take a beat here and sell you on the L40S, because it&rsquo;s kind of awesome.</p> <p>The L40S is an AI-optimized version of the L40, which is the data center version of the GeForce RTX 4090, resembling two 4090s stapled together.</p> <p>If you&rsquo;re not a GPU hardware person, the RTX 4090 is a gaming GPU, the kind you&rsquo;d play ray-traced Witcher 3 on. NVIDIA&rsquo;s high-end gaming GPUs are actually reasonably good at AI workloads! But they suck in a data center rack: they chug power, they&rsquo;re hard to cool, and they&rsquo;re less dense. Also, NVIDIA can&rsquo;t charge as much for them.</p> <p>Hence the L40: (much) more memory, less energy consumption, designed for a rack, not a tower case. Marked up for &ldquo;enterprise&rdquo;.</p> <p>NVIDIA positioned the L40 as a kind of &ldquo;graphics&rdquo; AI GPU. Unlike the super high-end cards like the A100/H100, the L40 keeps all the rendering hardware, so it&rsquo;s good for 3D graphics and video processing. Which is sort of what you&rsquo;d expect from a &ldquo;professionalized&rdquo; GeForce card.</p> <p>A funny thing happened in the middle of 2023, though: the market for ultra-high-end NVIDIA cards went absolutely batshit. The huge cards you&rsquo;d gang up for training jobs got impossible to find, and NVIDIA became one of the most valuable companies in the world. Serious shops started working out plans to acquire groups of L40-type cards to work around the problem, whether or not they had graphics workloads.</p> <p>The only company in this space that does know what they&rsquo;re doing is NVIDIA. Nobody has written a highly-ranked Reddit post about GPU workloads without NVIDIA noticing and creating a new SKU. 
So they launched the L40S, which is an L40 with AI workload compute performance comparable to that of the A100 (without us getting into the details of F32 vs. F16 models).</p> <p>Long story short, the L40S is an A100-performer that we can price for A10 customers; the Volkswagen GTI of our lineup. We&rsquo;re going to see if we can make that happen.</p> <p>We think the combination of just-right-sized inference GPUs and Tigris object storage is pretty killer:</p> <ul> <li>model parameters, data sets, and compute are all close together </li><li>everything plugged into an Anycast network that&rsquo;s fast everywhere in the world </li><li>on VM instances that have enough memory to actually run real frameworks on </li><li>priced like we actually want you to use it. </li></ul> <p>You should use L40S cards without thinking hard about it. So we&rsquo;re making it official. You won&rsquo;t pay us a dime extra to use one instead of an A10. Have at it! Revolutionize the industry. For $1.25 an hour.</p> <p>Here are things you can do with an L40S on Fly.io today:</p> <ul> <li>You can run Llama 3.1 70B — a big Llama — for LLM jobs. </li><li>You can run Flux from Black Forest Labs for genAI images. </li><li>You can run Whisper for automated speech recognition. </li><li>You can do whole-genome alignment with SegAlign (Thomas&rsquo; biochemist kid who has been snatching free GPU hours for his lab gave us this one, and we&rsquo;re taking his word for it). </li><li>You can run DOOM Eternal, building the Stadia that Google couldn&rsquo;t pull off, because the L40S hasn&rsquo;t forgotten that it&rsquo;s a graphics GPU. </li></ul> <p>It&rsquo;s going to get chilly in Chicago in a month or so. Go light some cycles on fire! 
</p></content> </entry> <entry> <title>Making Machines Move</title> <link rel="alternate" href="https://fly.io/blog/machine-migrations/"/> <id>https://fly.io/blog/machine-migrations/</id> <published>2024-07-30T00:00:00+00:00</published> <updated>2024-08-07T00:54:26+00:00</updated> <media:thumbnail url="https://fly.io/blog/machine-migrations/assets/migrations-thumb.webp"/> <content type="html"><div class="lead"><p>We’re Fly.io, a global public cloud with simple, developer-friendly ergonomics. If you’ve got a working Docker image, we’ll transmogrify it into a Fly Machine: a VM running on our hardware anywhere in the world. <a href="https://fly.io/speedrun" title="">Try it out; you’ll be deployed in just minutes</a>.</p> </div> <p>At the heart of our platform is a systems design tradeoff about durable storage for applications. When we added storage three years ago, to support stateful apps, we built it on attached NVMe drives. A benefit: a Fly App accessing a file on a Fly Volume is never more than a bus hop away from the data. A cost: a Fly App with an attached Volume is anchored to a particular worker physical.</p> <div class="right-sidenote"><p><code>bird</code>: a BGP4 route server.</p> </div> <p>Before offering attached storage, our on-call runbook was almost as simple as “de-bird that edge server”, “tell <a href='https://fly.io/blog/carving-the-scheduler-out-of-our-orchestrator/' title=''>Nomad</a> to drain that worker”, and “go back to sleep”. NVMe cost us that drain operation, which terribly complicated the lives of our infra team. We’ve spent the last year getting “drain” back. 
It’s one of the biggest engineering lifts we&rsquo;ve made, and if you didn’t notice, we lifted it cleanly.</p> <h3 id='the-goalposts' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#the-goalposts' aria-label='Anchor'></a><span class='plain-code'>The Goalposts</span></h3> <p>With stateless apps, draining a worker is easy. For each app instance running on the victim server, start a new instance elsewhere. Confirm it&rsquo;s healthy, then kill the old one. Rinse, repeat. At our 2020 scale, we could drain a fully loaded worker in just a handful of minutes.</p> <p>You can see why this process won&rsquo;t work for apps with attached volumes. Sure, create a new volume elsewhere on the fleet, and boot up a new Fly Machine attached to it. But the new volume is empty. The data&rsquo;s still stuck on the original worker. We asked, and customers were not OK with this kind of migration.</p> <p>Of course, we back Volume snapshots up (at an interval) to off-network storage. But for “drain”, restoring backups isn&rsquo;t nearly good enough. No matter the backup interval, a “restore from backup” migration will lose data, and a “backup and restore” migration incurs untenable downtime.</p> <p>The next thought you have is, “OK, copy the volume over”. And, yes, of course you have to do that. But you can’t just <code>copy</code>, <code>boot</code>, and then <code>kill</code> the old Fly Machine. Because the original Fly Machine is still alive and writing, you have to <code>kill</code> first, then <code>copy</code>, then <code>boot</code>.</p> <p>Fly Volumes can get pretty big. Even to a rack buddy physical server, you&rsquo;ll hit a point where draining incurs minutes of interruption, especially if you’re moving lots of volumes simultaneously. 
<code>Kill</code>, <code>copy</code>, <code>boot</code> is too slow.</p> <div class="callout"><p>There’s a world where even 15 minutes of interruption is tolerable. It’s the world where you run more than one instance of your application to begin with, so prolonged interruption of a single Fly Machine isn’t visible to your users. Do this! But we have to live in the same world as our customers, many of whom don’t run in high-availability configurations.</p> </div><h3 id='behold-the-clone-o-mat' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#behold-the-clone-o-mat' aria-label='Anchor'></a><span class='plain-code'>Behold The Clone-O-Mat</span></h3> <p><code>Copy</code>, <code>boot</code>, <code>kill</code> loses data. <code>Kill</code>, <code>copy</code>, <code>boot</code> takes too long. What we needed is a new operation: <code>clone</code>.</p> <p><code>Clone</code> is a lazier, asynchronous <code>copy</code>. It creates a new volume elsewhere on our fleet, just like <code>copy</code> would. But instead of blocking, waiting to transfer every byte from the original volume, <code>clone</code> returns immediately, with a transfer running in the background.</p> <p>A new Fly Machine can be booted with that cloned volume attached. Its blocks are mostly empty. But that’s OK: when the new Fly Machine tries to read from it, the block storage system works out whether the block has been transferred. If it hasn’t, it’s fetched over the network from the original volume; this is called &ldquo;hydration&rdquo;. Writes are even easier, and don’t hit the network at all.</p> <p><code>Kill</code>, <code>copy</code>, <code>boot</code> is slow. 
But <code>kill</code>, <code>clone</code>, <code>boot</code> is fast; it can be made asymptotically as fast as stateless migration.</p> <p>There are three big moving pieces to this design.</p> <ol> <li>First, we have to rig up our OS storage system to make this <code>clone</code> operation work. </li><li>Then, to read blocks over the network, we need a network protocol. (Spoiler: iSCSI, though we tried other stuff first.) </li><li>Finally, the gnarliest of those pieces is our orchestration logic: what’s running where, what state is it in, and whether it’s plugged in correctly. </li></ol> <h3 id='block-level-clone' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#block-level-clone' aria-label='Anchor'></a><span class='plain-code'>Block-Level Clone</span></h3> <p>The Linux feature we need to make this work already exists; <a href='https://docs.kernel.org/admin-guide/device-mapper/dm-clone.html' title=''>it’s called <code>dm-clone</code></a>. Given an existing, readable storage device, <code>dm-clone</code> gives us a new device, of identical size, where reads of uninitialized blocks will pull from the original. It sounds terribly complicated, but it’s actually one of the simpler kernel lego bricks. Let&rsquo;s demystify it.</p> <p>As far as Unix is concerned, random-access storage devices, be they spinning rust or NVMe drives, are all instances of the common class “block device”. 
A block device is addressed in fixed-size (say, 4KiB) chunks, and <a href='https://elixir.bootlin.com/linux/v5.11.11/source/include/linux/blk_types.h#L356' title=''>handles (roughly) these operations</a>:</p> <div class="highlight-wrapper group relative cpp"> <button type="button" class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-wrap-target="#code-aokru06k" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"><g buffered-rendering="static"><path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /><path d="M11.081 6.466L9.533 8.037l1.548 1.571" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"> Wrap text </span> </button> <button type="button" class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-copy-target="sibling" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"><g buffered-rendering="static"><path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /><path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /></g></svg> <span class="bubble-sm bubble-tl 
[--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"> Copy to clipboard </span> </button> <div class='highlight relative group'> <pre class='highlight '><code id="code-aokru06k"><span class="k">enum</span> <span class="n">req_opf</span> <span class="p">{</span> <span class="cm">/* read sectors from the device */</span> <span class="n">REQ_OP_READ</span> <span class="o">=</span> <span class="mi">0</span><span class="p">,</span> <span class="cm">/* write sectors to the device */</span> <span class="n">REQ_OP_WRITE</span> <span class="o">=</span> <span class="mi">1</span><span class="p">,</span> <span class="cm">/* flush the volatile write cache */</span> <span class="n">REQ_OP_FLUSH</span> <span class="o">=</span> <span class="mi">2</span><span class="p">,</span> <span class="cm">/* discard sectors */</span> <span class="n">REQ_OP_DISCARD</span> <span class="o">=</span> <span class="mi">3</span><span class="p">,</span> <span class="cm">/* securely erase sectors */</span> <span class="n">REQ_OP_SECURE_ERASE</span> <span class="o">=</span> <span class="mi">5</span><span class="p">,</span> <span class="cm">/* write the same sector many times */</span> <span class="n">REQ_OP_WRITE_SAME</span> <span class="o">=</span> <span class="mi">7</span><span class="p">,</span> <span class="cm">/* write the zero filled sector many times */</span> <span class="n">REQ_OP_WRITE_ZEROES</span> <span class="o">=</span> <span class="mi">9</span><span class="p">,</span> <span class="cm">/* ... */</span> <span class="p">};</span> </code></pre> </div> </div> <p>You can imagine designing a simple network protocol that supported all these operations. It might have messages that looked something like:</p> <p><img alt="A packet diagram, just skip down to &quot;struct bio&quot; below" src="/blog/machine-migrations/assets/packet.png?2/3&amp;center" /> Good news! The Linux block system is organized as if your computer were a network running a protocol that basically looks just like that. 
Here’s the message structure:</p> <div class="right-sidenote"><p>I’ve <a href="https://elixir.bootlin.com/linux/v5.11.11/source/include/linux/blk_types.h#L223" title="">stripped a bunch of stuff out of here</a> but you don’t need any of it to understand what’s coming next.</p> </div><div class="highlight-wrapper group relative cpp"> <button type="button" class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-wrap-target="#code-6neynwnf" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"><g buffered-rendering="static"><path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /><path d="M11.081 6.466L9.533 8.037l1.548 1.571" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"> Wrap text </span> </button> <button type="button" class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-copy-target="sibling" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"><g buffered-rendering="static"><path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /><path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 
0-1.815.819-1.816 1.815v1.617" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"> Copy to clipboard </span> </button> <div class='highlight relative group'> <pre class='highlight '><code id="code-6neynwnf"><span class="cm">/* * main unit of I/O for the block layer and lower layers (ie drivers and * stacking drivers) */</span> <span class="k">struct</span> <span class="nc">bio</span> <span class="p">{</span> <span class="k">struct</span> <span class="nc">gendisk</span> <span class="o">*</span><span class="n">bi_disk</span><span class="p">;</span> <span class="kt">unsigned</span> <span class="kt">int</span> <span class="n">bi_opf</span><span class="p">;</span> <span class="kt">unsigned</span> <span class="kt">short</span> <span class="n">bi_flags</span><span class="p">;</span> <span class="kt">unsigned</span> <span class="kt">short</span> <span class="n">bi_ioprio</span><span class="p">;</span> <span class="n">blk_status_t</span> <span class="n">bi_status</span><span class="p">;</span> <span class="kt">unsigned</span> <span class="kt">short</span> <span class="n">bi_vcnt</span><span class="p">;</span> <span class="cm">/* how many bio_vec's */</span> <span class="k">struct</span> <span class="nc">bio_vec</span> <span class="n">bi_inline_vecs</span><span class="p">[]</span> <span class="cm">/* (page, len, offset) tuples */</span><span class="p">;</span> <span class="p">};</span> </code></pre> </div> </div> <p>No nerd has ever looked at a fixed-format message like this without thinking about writing a proxy for it, and <code>struct bio</code> is no exception. The proxy system in the Linux kernel for <code>struct bio</code> is called <code>device mapper</code>, or DM.</p> <p>DM target devices can plug into other DM devices. For that matter, they can do whatever the hell else they want, as long as they honor the interface. 
It boils down to a <code>map(bio)</code> function, which can dispatch a <code>struct bio</code>, or drop it, or muck with it and ask the kernel to resubmit it.</p> <p>You can do a whole lot of stuff with this interface: carve a big device into a bunch of smaller ones (<a href='https://docs.kernel.org/admin-guide/device-mapper/linear.html' title=''><code>dm-linear</code></a>), make one big striped device out of a bunch of smaller ones (<a href='https://docs.kernel.org/admin-guide/device-mapper/striped.html' title=''><code>dm-stripe</code></a>), do software RAID mirroring (<code>dm-raid1</code>), create snapshots of arbitrary existing devices (<a href='https://docs.kernel.org/admin-guide/device-mapper/snapshot.html' title=''><code>dm-snap</code></a>), cryptographically verify boot devices (<a href='https://docs.kernel.org/admin-guide/device-mapper/verity.html' title=''><code>dm-verity</code></a>), and a bunch more. Device Mapper is the kernel backend for the <a href='https://sourceware.org/lvm2/' title=''>userland LVM2 system</a>, which is how we do <a href='https://fly.io/blog/persistent-storage-and-fast-remote-builds/' title=''>thin pools and snapshot backups</a>.</p> <p>Which brings us to <code>dm-clone</code> : it’s a map function that boils down to:</p> <div class="highlight-wrapper group relative cpp"> <button type="button" class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-wrap-target="#code-rj5y343v" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"><g buffered-rendering="static"><path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 
00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /><path d="M11.081 6.466L9.533 8.037l1.548 1.571" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"> Wrap text </span> </button> <button type="button" class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-copy-target="sibling" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"><g buffered-rendering="static"><path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /><path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"> Copy to clipboard </span> </button> <div class='highlight relative group'> <pre class='highlight '><code id="code-rj5y343v"> <span class="cm">/* ... 
*/</span> <span class="n">region_nr</span> <span class="o">=</span> <span class="n">bio_to_region</span><span class="p">(</span><span class="n">clone</span><span class="p">,</span> <span class="n">bio</span><span class="p">);</span> <span class="c1">// we have the data</span> <span class="k">if</span> <span class="p">(</span><span class="n">dm_clone_is_region_hydrated</span><span class="p">(</span><span class="n">clone</span><span class="o">-&gt;</span><span class="n">cmd</span><span class="p">,</span> <span class="n">region_nr</span><span class="p">))</span> <span class="p">{</span> <span class="n">remap_and_issue</span><span class="p">(</span><span class="n">clone</span><span class="p">,</span> <span class="n">bio</span><span class="p">);</span> <span class="k">return</span> <span class="mi">0</span><span class="p">;</span> <span class="c1">// we don't and it's a read</span> <span class="p">}</span> <span class="k">else</span> <span class="k">if</span> <span class="p">(</span><span class="n">bio_data_dir</span><span class="p">(</span><span class="n">bio</span><span class="p">)</span> <span class="o">==</span> <span class="n">READ</span><span class="p">)</span> <span class="p">{</span> <span class="n">remap_to_source</span><span class="p">(</span><span class="n">clone</span><span class="p">,</span> <span class="n">bio</span><span class="p">);</span> <span class="k">return</span> <span class="mi">1</span><span class="p">;</span> <span class="p">}</span> <span class="c1">// we don't and it's a write</span> <span class="n">remap_to_dest</span><span class="p">(</span><span class="n">clone</span><span class="p">,</span> <span class="n">bio</span><span class="p">);</span> <span class="n">hydrate_bio_region</span><span class="p">(</span><span class="n">clone</span><span class="p">,</span> <span class="n">bio</span><span class="p">);</span> <span class="k">return</span> <span class="mi">0</span><span class="p">;</span> <span class="cm">/* ... 
*/</span> </code></pre> </div> </div><div class="right-sidenote"><p>a <a href="https://docs.kernel.org/admin-guide/device-mapper/kcopyd.html" title=""><code>kcopyd</code></a> thread runs in the background, rehydrating the device in addition to (and independent of) read accesses.</p> </div> <p><code>dm-clone</code> takes, in addition to the source device to clone from, a “metadata” device on which is stored a bitmap of the status of all the blocks: either “rehydrated” from the source, or not. That’s how it knows whether to fetch a block from the original device or the clone.</p> <h3 id='network-clone' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#network-clone' aria-label='Anchor'></a><span class='plain-code'>Network Clone</span></h3><div class="callout"><p><strong class="font-semibold text-navy-950"><code>flyd</code> in a nutshell:</strong> worker physicals run a service, <code>flyd</code>, which manages a couple of databases that are the source of truth for all the Fly Machines running there. Conceptually, <code>flyd</code> is a server for on-demand instances of durable finite state machines, each representing some operation on a Fly Machine (creation, start, stop, &amp;c), with the transition steps recorded carefully in a BoltDB database. An FSM step might be something like “assign a local IPv6 address to this interface”, or “check out a block device with the contents of this container”, and it’s straightforward to add and manage new ones.</p> </div> <p>Say we&rsquo;ve got <code>flyd</code> managing a Fly Machine with a volume on <code>worker-xx-cdg1-1</code>. We want it running on <code>worker-xx-cdg1-2</code>. Our whole fleet is meshed with WireGuard; everything can talk directly to everything else. 
So, conceptually:</p> <ol> <li><code>flyd</code> on <code>cdg1-1</code> stops the Fly Machine, and </li><li>sends a message to <code>flyd</code> on <code>cdg1-2</code> telling it to clone the source volume. </li><li><code>flyd</code> on <code>cdg1-2</code> starts a <code>dm-clone</code> instance, which creates a clone volume on <code>cdg1-2</code>, populating it, over some kind of network block protocol, from <code>cdg1-1</code>, and </li><li>boots a new Fly Machine, attached to the clone volume. </li><li><code>flyd</code> on <code>cdg1-2</code> monitors the clone operation, and, when the clone completes, converts the clone device to a simple linear device and cleans up. </li></ol> <p>For step (3) to work, the “original volume” on <code>cdg1-1</code> has to be visible on <code>cdg1-2</code>, which means we need to mount it over the network.</p> <div class="right-sidenote"><p><code>nbd</code> is so simple that it’s used as a sort of <code>dm-user</code> userland block device; to prototype a new block device, <a href="https://lwn.net/ml/linux-kernel/[email protected]/" title="">don’t bother writing a kernel module</a>, just write an <code>nbd</code> server.</p> </div> <p>Take your pick of protocols. iSCSI is the obvious one, but it’s relatively complicated, and Linux has native support for a much simpler one: <code>nbd</code>, the “network block device”. You could implement an <code>nbd</code> server in an afternoon, on top of a file or a SQLite database or S3, and the Linux kernel could mount it as a drive.</p> <p>We started out using <code>nbd</code>. But we kept getting stuck <code>nbd</code> kernel threads when there was any kind of network disruption. We’re a global public cloud; network disruption happens. Honestly, we could have debugged our way through this. 
But it was simpler just to spike out an iSCSI implementation, observe that it didn’t get jammed up when the network hiccuped, and move on.</p> <h3 id='putting-the-pieces-together' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#putting-the-pieces-together' aria-label='Anchor'></a><span class='plain-code'>Putting The Pieces Together</span></h3> <p>To drain a worker with minimal downtime and no lost data, we turn workers into temporary SANs, serving the volumes we need to drain to fresh-booted replica Fly Machines on a bunch of “target” physicals. Those SANs — combinations of <code>dm-clone</code>, iSCSI, and <a href='https://fly.io/blog/carving-the-scheduler-out-of-our-orchestrator/' title=''>our <code>flyd</code> orchestrator</a> — track the blocks copied from the origin, copying each one exactly once and cleaning up when the original volume has been fully copied.</p> <p>Problem solved!</p> <h3 id='no-there-were-more-problems' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#no-there-were-more-problems' aria-label='Anchor'></a><span class='plain-code'>No, There Were More Problems</span></h3> <p>When your problem domain is hard, anything you build whose design you can’t fit completely in your head is going to be a fiasco. Shorter form: “if you see Raft consensus in a design, we’ve done something wrong”.</p> <p>A virtue of this migration system is that, for as many moving pieces as it has, it fits in your head. What complexity it has is mostly shouldered by strategic bets we’ve already built teams around, most notably the <code>flyd</code> orchestrator. 
So we’ve been running this system for the better part of a year without much drama. Not no drama, though. Some drama.</p> <p>Example: we encrypt volumes. Our key management is fussy. We do per-volume encryption keys that provision alongside the volumes themselves, so no one worker has a volume skeleton key.</p> <p>If you think “migrating those volume keys from worker to worker” is the problem I’m building up to, well, that too, but the bigger problem is <code>trim</code>.</p> <p>Most people use just a small fraction of the volumes they allocate. A 100GiB volume with just 5MiB used wouldn’t be at all weird. You don’t want to spend minutes copying a volume that could have been fully hydrated in seconds.</p> <p>And indeed, <code>dm-clone</code> doesn’t want to do that either. Given a source block device (for us, an iSCSI mount) and the clone device, a <code>DISCARD</code> issued on the clone device will get picked up by <code>dm-clone</code>, which will simply <a href='https://elixir.bootlin.com/linux/v5.11.11/source/drivers/md/dm-clone-target.c#L1357' title=''>short-circuit the read</a> of the relevant blocks by marking them as hydrated in the metadata volume. Simple enough.</p> <p>To make that work, we need the target worker to see the plaintext of the source volume (so that it can do an <code>fstrim</code> — don’t get us started on how annoying it is to sandbox this — to read the filesystem, identify the unused blocks, and issue the <code>DISCARDs</code> where <code>dm-clone</code> can see them). Easy enough.</p> <div class="right-sidenote"><p>these curses have a lot to do with how hard it was to drain workers!</p> </div> <p>Except: two different workers, for cursed reasons, might be running different versions of <a href='https://gitlab.com/cryptsetup/cryptsetup' title=''>cryptsetup</a>, the userland bridge between LUKS2 and the <a href='https://docs.kernel.org/admin-guide/device-mapper/dm-crypt.html' title=''>kernel dm-crypt driver</a>. 
There are (or were) two different versions of cryptsetup on our network, and they default to different <a href='https://fossies.org/linux/cryptsetup/docs/on-disk-format-luks2.pdf' title=''>LUKS2 header sizes</a> — 4MiB and 16MiB. Implying two different plaintext volume sizes. </p> <p>So now part of the migration FSM is an RPC call that carries metadata about the designed LUKS2 configuration for the target VM. Not something we expected to have to build, but, whatever.</p> <div class="right-sidenote"><p>Corrosion deserves its own post.</p> </div> <p>Gnarlier example: workers are the source of truth for information about the Fly Machines running on them. Migration knocks the legs out from under that constraint, which we were relying on in Corrosion, the SWIM-gossip SQLite database we use to connect Fly Machines to our request routing. Race conditions. Debugging. Design changes. Next!</p> <p>Gnarliest example: our private networks. Recall: we automatically place every Fly Machine into <a href='https://fly.io/blog/incoming-6pn-private-networks/' title=''>a private network</a>; by default, it’s the one all the other apps in your organization run in. This is super handy for setting up background services, databases, and clustered applications. 20 lines of eBPF code in our worker kernels keeps anybody from “crossing the streams”, sending packets from one private network to another.</p> <div class="right-sidenote"><p>we’re members of an elite cadre of idiots who have managed to find designs that made us wish IPv6 addresses were even bigger.</p> </div> <p>We call this scheme 6PN (for “IPv6 Private Network”). It functions by <a href='https://fly.io/blog/ipv6-wireguard-peering#ipv6-private-networking-at-fly' title=''>embedding routing information directly into IPv6 addresses</a>. This is, perhaps, gross. But it allows us to route diverse private networks with constantly changing membership across a global fleet of servers without running a distributed routing protocol. 
As the beardy wizards who kept the Internet backbone up and running on Cisco AGS+’s once said: the best routing protocol is “static”.</p> <p>Problem: the embedded routing information in a 6PN address refers in part to specific worker servers.</p> <p>That’s fine, right? They’re IPv6 addresses. Nobody uses literal IPv6 addresses. Nobody uses IP addresses at all; they use the DNS. When you migrate a host, just give it a new 6PN address, and update the DNS.</p> <p>Friends, somebody did use literal IPv6 addresses. It was us. In the configurations for Fly Postgres clusters.</p> <div class="right-sidenote"><p>It’s also not operationally easy for us to shell into random Fly Machines, for good reason.</p> </div> <p>The obvious fix for this is not complicated; given <code>flyctl</code> ssh access to a Fly Postgres cluster, it’s like a 30 second ninja edit. But we run a <em>lot</em> of Fly Postgres clusters, and the change has to be coordinated carefully to avoid getting the cluster into a confused state. We went as far as adding a feature to our <code>init</code> to do network address mappings to keep old 6PN addresses reachable before biting the bullet and burning several weeks doing the direct configuration fix fleet-wide.</p> <figure class="post-cta"> <figcaption> <h1>Speedrun your app onto Fly.io.</h1> <p>3&hellip;2&hellip;1&hellip;</p> <a class="btn btn-lg" href="https://fly.io/speedrun"> Go! 
&nbsp;<span class='opacity-50'>→</span> </a> </figcaption> <div class="image-container"> <img src="/static/images/cta-kitty.webp" srcset="/static/images/[email protected] 2x" alt=""> </div> </figure> <h3 id='the-learning-it-burns' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#the-learning-it-burns' aria-label='Anchor'></a><span class='plain-code'>The Learning, It Burns!</span></h3> <p>We get asked a lot why we don’t do storage the “obvious” way, with an <a href='https://aws.amazon.com/ebs/' title=''>EBS-type</a> SAN fabric, abstracting it away from our compute. Locally-attached NVMe storage is an idiosyncratic choice, one we’ve had to write disclaimers for (single-node clusters can lose data!) since we first launched it.</p> <p>One answer is: we’re a startup. Building SAN infrastructure in every region we operate in would be tremendously expensive. Look at any feature in AWS that normal people know the name of, like EC2, EBS, RDS, or S3 — there’s a whole company in there. We launched storage when we were just 10 people, and even at our current size we probably have nothing resembling the resources EBS gets. AWS is pretty great!</p> <p>But another thing to keep in mind is: we’re learning as we go. And so even if we had the means to do an EBS-style SAN, we might not build it today.</p> <p>Instead, we’re a lot more interested in log-structured virtual disks (LSVD). LSVD uses NVMe as a local cache, but durably persists writes in object storage. 
You get most of the performance benefit of bus-hop disk writes, along with unbounded storage and S3-grade reliability.</p> <p><a href='https://community.fly.io/t/bottomless-s3-backed-volumes/15648' title=''>We launched LSVD experimentally last year</a>; in the intervening year, something happened to make LSVD even more interesting to us: <a href='https://www.tigrisdata.com/' title=''>Tigris Data</a> launched S3-compatible object storage in every one of our regions, so instead of backhauling updates to Northern Virginia, <a href='https://community.fly.io/t/tigris-backed-volumes/20792' title=''>we can keep them local</a>. We have more to say about LSVD, and a lot more to say about Tigris.</p> <p>Our first several months of migrations were done gingerly. By summer of 2024, we got to where our infra team can pull “drain this host” out of their toolbelt without much ceremony.</p> <p>We’re still not to the point where we’re migrating casually. Your Fly Machines are probably not getting migrated! There&rsquo;d need to be a reason! But the dream is fully-automated luxury space migration, in which you might get migrated semiregularly, as our systems work not just to drain problematic hosts but to rebalance workloads regularly. No time soon. But we’ll get there.</p> <p>This is the biggest thing our team has done since we replaced Nomad with flyd. Only the new billing system comes close. We did this thing not because it was easy, but because we thought it would be easy. It was not. But: worth it!</p></content> </entry> <entry> <title>AWS without Access Keys</title> <link rel="alternate" href="https://fly.io/blog/oidc-cloud-roles/"/> <id>https://fly.io/blog/oidc-cloud-roles/</id> <published>2024-06-19T00:00:00+00:00</published> <updated>2024-06-27T14:03:59+00:00</updated> <media:thumbnail url="https://fly.io/blog/oidc-cloud-roles/assets/spooky-security-skeleton-thumb.webp"/> <content type="html"><div class="lead"><p>It’s dangerous to go alone. 
Fly.io runs full-stack apps by transmuting Docker containers into Fly Machines: ultra-lightweight hardware-backed VMs. You can run all your dependencies on Fly.io, but sometimes, you’ll need to work with other clouds, and we’ve made that pretty simple. Try Fly.io out for yourself; your Rails or Node app <a href="https://fly.io/speedrun" title="">can be up and running in just minutes</a>.</p> </div> <p>Let&rsquo;s hypopulate you an app serving generative AI cat images based on the weather forecast, running on a <code>g4dn.xlarge</code> ECS task in AWS <code>us-east-1</code>. It&rsquo;s going great; people didn&rsquo;t realize how dependent their cat pic prefs are on barometric pressure, and you&rsquo;re all anyone can talk about.</p> <p>Word reaches Australia and Europe, but you&rsquo;re not catching on, because the… latency is too high? Just roll with us here. Anyways: fixing this is going to require replicating ECS tasks and ECR images into <code>ap-southeast-2</code> and <code>eu-central-1</code> while also setting up load balancing. Nah.</p> <p>This is the O.G. Fly.io deployment story; one deployed app, one versioned container, one command to get it running anywhere in the world.</p> <p>But you have a problem: your app relies on training data, it&rsquo;s huge, your giant employer manages it, and it&rsquo;s in S3. Getting this to work will require AWS credentials.</p> <p>You could ask your security team to create a user, give it permissions, and hand over the AWS keypair. Then you could wash your neck and wait for the blade. Passing around AWS keypairs is the beginning of every horror story told about cloud security, and security team ain&rsquo;t having it.</p> <p>There&rsquo;s a better way. It&rsquo;s drastically more secure, so your security people will at least hear you out. 
It&rsquo;s also so much easier on Fly.io that you might never bother creating an IAM service account again.</p> <h2 id='lets-get-it-out-of-the-way' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#lets-get-it-out-of-the-way' aria-label='Anchor'></a><span class='plain-code'>Let&rsquo;s Get It out of the Way</span></h2> <p>We&rsquo;re going to use OIDC to set up strictly limited trust between AWS and Fly.io.</p> <ol> <li>In AWS: we&rsquo;ll add Fly.io as an <code>Identity Provider</code> in AWS IAM, giving us an ID we can plug into any IAM <code>Role</code>. </li><li>Also in AWS: we&rsquo;ll create a <code>Role</code>, give it access to the S3 bucket with our tokenized cat data, and then attach the <code>Identity Provider</code> to it. </li><li>In Fly.io, we&rsquo;ll take the <code>Role</code> ARN we got from step 2 and set it as an environment variable in our app. </li></ol> <p>Our machines will now magically have access to the S3 bucket.</p> <h2 id='what-the-what' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#what-the-what' aria-label='Anchor'></a><span class='plain-code'>What the What</span></h2> <p>A reasonable question to ask here is, &ldquo;where&rsquo;s the credential&rdquo;? Ordinarily, to give a Fly Machine access to an AWS resource, you&rsquo;d use <code>fly secrets set</code> to add an <code>AWS_ACCESS_KEY_ID</code> and <code>AWS_SECRET_ACCESS_KEY</code> to the environment in the Machine. 
Here, we&rsquo;re not setting any secrets at all; we&rsquo;re just adding an ARN — which is not a credential — to the Machine.</p> <p>Here&rsquo;s what&rsquo;s happening.</p> <p>Fly.io operates an OIDC IdP at <code>oidc.fly.io</code>. It issues OIDC tokens, exclusively to Fly Machines. AWS can be configured to trust these tokens, on a role-by-role basis. That&rsquo;s the &ldquo;secret credential&rdquo;: the pre-configured trust relationship in IAM, and the public keypairs it manages. You, the user, never need to deal with these keys directly; it all happens behind the scenes, between AWS and Fly.io.</p> <p><img alt="A diagram: STS trusts OIDC.fly.io. OIDC.fly.io trusts flyd. flyd issues a token to the Machine, which proffers it to STS. STS sends an STS cred to the Machine, which then uses it to retrieve model weights from S3." src="/blog/oidc-cloud-roles/assets/oidc-diagram.webp" /></p> <p>The key actor in this picture is <code>STS</code>, the AWS <code>Security Token Service</code>. <code>STS</code>&lsquo;s main job is to vend short-lived AWS credentials, usually through some variant of an API called <code>AssumeRole</code>. Specifically, in our case: <code>AssumeRoleWithWebIdentity</code> tells <code>STS</code> to cough up an AWS keypair given an OIDC token (that matches a pre-configured trust relationship).</p> <p>That still leaves the question: how does your code, which is reaching out to the AWS APIs to get cat weights, drive any of this?</p> <h2 id='the-init-thickens' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#the-init-thickens' aria-label='Anchor'></a><span class='plain-code'>The Init Thickens</span></h2> <p>Every Fly Machine boots up into an <code>init</code> we wrote in Rust. 
It has slowly been gathering features.</p> <p>One of those features, which has been around for a while, is a server for a Unix socket at <code>/.fly/api</code>, which exports a subset of the Fly Machines API to privileged processes in the Machine. Think of it as our answer to the EC2 Instance Metadata Service. How it works is, every time we boot a Fly Machine, we pass it a <a href='https://fly.io/blog/macaroons-escalated-quickly/' title=''>Macaroon token</a> locked to that particular Machine; <code>init</code>&rsquo;s server for <code>/.fly/api</code> is a proxy that attaches that token to requests.</p> <div class="right-sidenote"><p>In addition to the API proxy being tricky to SSRF to.</p> </div> <p>What&rsquo;s neat about this is that the credential that drives <code>/.fly/api</code> is doubly protected:</p> <ol> <li>The Fly.io platform won&rsquo;t honor it unless it comes from that specific Fly Machine (<code>flyd</code>, our orchestrator, knows who it&rsquo;s talking to), <em>and</em> </li><li>Ordinary code running in a Fly Machine never gets a copy of the token to begin with. </li></ol> <p>You could rig up a local privilege escalation vulnerability and work out how to steal the Macaroon, but you can&rsquo;t exfiltrate it productively.</p> <p>So now you have half the puzzle worked out: OIDC is just part of the <a href='https://fly.io/docs/machines/api/' title=''>Fly Machines API</a> (specifically: <code>/v1/tokens/oidc</code>). 
A Fly Machine can hit a Unix socket and get an OIDC token tailored to that machine:</p> <div class="highlight-wrapper group relative "> <div class='highlight relative 
group'> <pre class='highlight '><code id="code-9o3904mp">{ "app_id": "3671581", "app_name": "weather-cat", "aud": "sts.amazonaws.com", "image": "image:latest", "image_digest": "sha256:dff79c6da8dd4e282ecc6c57052f7cfbd684039b652f481ca2e3324a413ee43f", "iss": "https://oidc.fly.io/example", "machine_id": "3d8d377ce9e398", "machine_name": "ancient-snow-4824", "machine_version": "01HZJXGTQ084DX0G0V92QH3XW4", "org_id": "29873298", "org_name": "example", "region": "yyz", "sub": "example:weather-cat:ancient-snow-4824" } // some OIDC stuff trimmed </code></pre> </div> </div> <p>Look upon this holy blob, sealed with a published key managed by Fly.io&rsquo;s OIDC vault, and see that there lies within it enough information for AWS <code>STS</code> to decide to issue a session credential.</p> <p>We have still not completed the puzzle, because while you can probably now see how you&rsquo;d drive this process with a bunch of new code that you&rsquo;d tediously write, you are acutely aware that you have not yet endured that tedium — e pur si muove!</p> <p>One <code>init</code> feature remains to be disclosed, and it&rsquo;s cute.</p> <p>If, when <code>init</code> starts in a Fly Machine, it sees an <code>AWS_ROLE_ARN</code> environment variable set, it initiates a little dance; it:</p> <ol> <li>goes off and generates an OIDC token, the way we just described, </li><li>saves that OIDC token in a file, <em>and</em> </li><li>sets the <code>AWS_WEB_IDENTITY_TOKEN_FILE</code> and <code>AWS_ROLE_SESSION_NAME</code> environment variables for every process it launches. </li></ol> <p>The AWS SDK, linked to your application, does all the rest.</p> <p>Let&rsquo;s review: you add an <code>AWS_ROLE_ARN</code> variable to your Fly App, launch a Machine, and have it go fetch a file from S3. What happens next is:</p> <ol> <li><code>init</code> detects <code>AWS_ROLE_ARN</code> is set as an environment variable. 
</li><li><code>init</code> sends a request to <code>/v1/tokens/oidc</code> via <code>/.fly/api</code>. </li><li><code>init</code> writes the response to <code>/.fly/oidc_token</code>. </li><li><code>init</code> sets <code>AWS_WEB_IDENTITY_TOKEN_FILE</code> and <code>AWS_ROLE_SESSION_NAME</code>. </li><li>The entrypoint boots, and (say) runs <code>aws s3api get-object</code>. </li><li>The AWS SDK runs through the <a href='https://docs.aws.amazon.com/sdkref/latest/guide/standardized-credentials.html#credentialProviderChain' title=''>credential provider chain</a>. </li><li>The SDK sees that <code>AWS_WEB_IDENTITY_TOKEN_FILE</code> is set and calls <code>AssumeRoleWithWebIdentity</code> with the file contents. </li><li>AWS verifies the token against <a href='https://oidc.fly.io/' title=''><code>https://oidc.fly.io/</code></a><code>example/.well-known/openid-configuration</code>, which references a key Fly.io manages on isolated hardware. </li><li>AWS vends <code>STS</code> credentials for the assumed <code>Role</code>. </li><li>The SDK uses the <code>STS</code> credentials to access the S3 bucket. </li><li>AWS checks the <code>Role</code>&rsquo;s IAM policy to see if it has access to the S3 bucket. </li><li>AWS returns the contents of the bucket object. </li></ol> <h2 id='how-much-better-is-this' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#how-much-better-is-this' aria-label='Anchor'></a><span class='plain-code'>How Much Better Is This?</span></h2> <p>It is a lot better.</p> <div class="right-sidenote"><p>They asymptotically approach the security properties of Macaroon tokens.</p> </div> <p>Most importantly: AWS <code>STS</code> credentials are short-lived. 
Because they&rsquo;re generated dynamically, rather than stored in a configuration file or environment variable, they&rsquo;re already a little bit annoying for an attacker to recover. But they&rsquo;re also dead in minutes. They have a sharply limited blast radius. They rotate themselves, and fail closed.</p> <p>They&rsquo;re also easier to manage. This is a rare instance where you can reasonably drive the entire AWS side of the process from within the web console. Your cloud team adds <code>Roles</code> all the time; this is just a <code>Role</code> with an extra snippet of JSON. The resulting ARN isn&rsquo;t even a secret; your cloud team could just email or Slack message it back to you.</p> <p>Finally, they offer finer-grained control.</p> <p>To understand the last part, let&rsquo;s look at that extra snippet of JSON (the &ldquo;Trust Policy&rdquo;) your cloud team is sticking on the new <code>cat-bucket</code> <code>Role</code>:</p> <div class="highlight-wrapper group relative "> <div class='highlight relative group'> <pre class='highlight '><code id="code-la5jlerc">{ "Version": "2012-10-17", "Statement": [ { "Effect": "Allow", "Principal": { "Federated": "arn:aws:iam::123456123456:oidc-provider/oidc.fly.io/example" }, "Action": "sts:AssumeRoleWithWebIdentity", "Condition": { "StringEquals": { "oidc.fly.io/example:aud": "sts.amazonaws.com" }, "StringLike": { "oidc.fly.io/example:sub": "example:weather-cat:*" } } } ] } </code></pre> </div> </div><div class="right-sidenote"><p>The <code>aud</code> check guarantees <code>STS</code> will only honor tokens that Fly.io deliberately vended for <code>STS</code>.</p> </div> <p>Recall the OIDC token we dumped earlier; much of what&rsquo;s in it, we can match in the Trust Policy. 
Every OIDC token Fly.io generates is going to have a <code>sub</code> field formatted <code>org:app:machine</code>, so we can lock IAM <code>Roles</code> down to organizations, or to specific Fly Apps, or even specific Fly Machine instances.</p> <figure class="post-cta"> <figcaption> <h1>Speedrun your app onto Fly.io.</h1> <p>3&hellip;2&hellip;1&hellip;</p> <a class="btn btn-lg" href="https://fly.io/speedrun"> Go! &nbsp;<span class='opacity-50'>→</span> </a> </figcaption> <div class="image-container"> <img src="/static/images/cta-rabbit.webp" srcset="/static/images/[email protected] 2x" alt=""> </div> </figure> <h2 id='and-so' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#and-so' aria-label='Anchor'></a><span class='plain-code'>And So</span></h2> <p>In case it&rsquo;s not obvious: this pattern works for any AWS API, not just S3.</p> <p>Our OIDC support on the platform and in Fly Machines will set arbitrary OIDC <code>audience</code> strings, so you can use it to authenticate to any OIDC-compliant cloud provider. It won&rsquo;t be as slick on Azure or GCP, because we haven&rsquo;t done the <code>init</code> features to light their APIs up with a single environment variable — but those features are easy, and we&rsquo;re just waiting for people to tell us what they need.</p> <p>For us, the gold standard for least-privilege, conditional access tokens remains Macaroons, and it&rsquo;s unlikely that we&rsquo;re going to do a bunch of internal stuff using OIDC. We even snuck Macaroons into this feature. 
But the security you&rsquo;re getting from this OIDC dance closes a lot of the gap between hardcoded user credentials and Macaroons, and it&rsquo;s easy to use — easier, in some ways, than it is to manage role-based access inside of a legacy EC2 deployment!</p></content> </entry> <entry> <title>Picture This: Open Source AI for Image Description</title> <link rel="alternate" href="https://fly.io/blog/llm-image-description/"/> <id>https://fly.io/blog/llm-image-description/</id> <published>2024-05-09T00:00:00+00:00</published> <updated>2024-05-09T17:35:04+00:00</updated> <media:thumbnail url="https://fly.io/blog/llm-image-description/assets/image-description-thumb.webp"/> <content type="html"><div class="lead"><p>I’m Nolan, and I work on Fly Machines here at Fly.io. We’re building a new public cloud—one where you can spin up CPU and GPU workloads, around the world, in a jiffy. <a href="https://fly.io/speedrun/" title="">Try us out</a>; you can be up and running in minutes. This is a post about LLMs being really helpful, and an extensible project you can build with open source on a weekend.</p> </div> <p>Picture this, if you will.</p> <p>You&rsquo;re blind. You&rsquo;re in an unfamiliar hotel room on a trip to Chicago.</p> <div class="right-sidenote"><p>If you live in Chicago IRL, imagine the hotel in Winnipeg, <a href="https://www.cbc.ca/history/EPISCONTENTSE1EP10CH3PA5LE.html" title="">the Chicago of the North</a>.</p> </div> <p>You&rsquo;ve absent-mindedly set your coffee down, and can&rsquo;t remember where. You&rsquo;re looking for the thermostat so you don&rsquo;t wake up frozen. Or, just maybe, you&rsquo;re playing a fun-filled round of &ldquo;find the damn light switch so your sighted partner can get some sleep already!&rdquo;</p> <p>If, like me, you&rsquo;ve been blind for a while, you have plenty of practice finding things without the luxury of a quick glance around. 
It may be more tedious than you&rsquo;d like, but you&rsquo;ll get it done.</p> <p>But the speed of innovation in machine learning and large language models has been dizzying, and in 2024 you can snap a photo with your phone and have an app like <a href='https://www.bemyeyes.com/blog/announcing-be-my-ai/' title=''>Be My AI</a> or <a href='https://www.seeingai.com/' title=''>Seeing AI</a> tell you where in that picture it found your missing coffee mug, or where it thinks the light switch is.</p> <div class="right-sidenote"><p>Creative switch locations seem to be a point of pride for hotels, so the light game may be good for a few rounds of quality entertainment, regardless of how good your AI is.</p> </div> <p>This is <em>big</em>. It&rsquo;s hard for me to state just how exciting and empowering AI image descriptions have been for me without sounding like a shill. In the past year, I&rsquo;ve:</p> <ul> <li>Found shit in strange hotel rooms. </li><li>Gotten descriptions of scenes and menus in otherwise inaccessible video games. </li><li>Requested summaries of technical diagrams and other materials where details weren’t made available textually. </li></ul> <p>I&rsquo;ve been consistently blown away at how impressive and helpful AI-created image descriptions have been.</p> <p>Also&hellip;</p> <h2 id='which-thousand-words-is-this-picture-worth' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#which-thousand-words-is-this-picture-worth' aria-label='Anchor'></a><span class='plain-code'>Which thousand words is this picture worth?</span></h2> <p>As a blind internet user for the last three decades, I have extensive empirical evidence to corroborate what you already know in your heart: humans are pretty flaky about writing useful alt text for all the images they publish. 
This does tend to make large swaths of the internet inaccessible to me!</p> <p>In just a few years, the state of image description on the internet has gone from complete reliance on the aforementioned lovable, but ultimately tragically flawed, humans, to automated strings of words like <code>Image may contain person, glasses, confusion, banality, disillusionment</code>, to LLM-generated text that reads a lot like it was written by a person, perhaps sipping from a steaming cup of Earl Grey as they reflect on their previous experiences of a background that features a tree with snow on its branches, suggesting that this scene takes place during winter.</p> <p>If an image is missing alt text, or if you want a second opinion, there are screen-reader addons, like <a href='https://github.com/cartertemm/AI-content-describer/' title=''>this one</a> for <a href='https://www.nvaccess.org/download/' title=''>NVDA</a>, that you can use with an API key to get image descriptions from GPT-4 or Google Gemini as you read. This is awesome! </p> <p>And this brings me to the nerd snipe. How hard would it be to build an image description service we can host ourselves, using open source technologies? 
It turns out to be spookily easy.</p> <p>Here&rsquo;s what I came up with:</p> <ol> <li><a href='https://ollama.com/' title=''>Ollama</a> to run the model </li><li>A <a href='https://pocketbase.io' title=''>PocketBase</a> project that provides a simple authenticated API for users to submit images, get descriptions, and ask followup questions about the image </li><li>The simplest possible Python client to interact with the PocketBase app on behalf of users </li></ol> <p>The idea is to keep it modular and hackable, so if sentiment analysis or joke creation is your thing, you can swap out image description for that and have something going in, like, a weekend.</p> <p>If you&rsquo;re like me, and you go skipping through recipe blogs to find the &ldquo;go directly to recipe&rdquo; link, find the code itself <a href='https://github.com/superfly/llm-describer' title=''>here</a>. </p> <h2 id='the-llm-is-the-easiest-part' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#the-llm-is-the-easiest-part' aria-label='Anchor'></a><span class='plain-code'>The LLM is the easiest part</span></h2> <p>An API to accept images and prompts, run the model, and spit out answers sounds like a lot! But it&rsquo;s the simplest part of this whole thing, because: that&rsquo;s <a href='https://ollama.com/' title=''>Ollama</a>.</p> <p>You can just run the Ollama Docker image, get it to grab the model you want to use, and that&rsquo;s it. There&rsquo;s your AI server. (We have a <a href='https://fly.io/blog/scaling-llm-ollama/' title=''>blog post</a> all about deploying Ollama on Fly.io; Fly GPUs are rad, try&#39;em out, etc.).</p> <p>For this project, we need a model that can make sense&mdash;or at least words&mdash;out of a picture. 
<a href='https://llava-vl.github.io/' title=''>LLaVA</a> is a trained, Apache-licensed &ldquo;large multimodal model&rdquo; that fits the bill. Get the model with the Ollama CLI:</p> <div class="highlight-wrapper group relative cmd"> <button type="button" class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-wrap-target="#code-wohvpptj" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"><g buffered-rendering="static"><path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /><path d="M11.081 6.466L9.533 8.037l1.548 1.571" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"> Wrap text </span> </button> <button type="button" class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-copy-target="sibling" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"><g buffered-rendering="static"><path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /><path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-6px] tail 
[--tail-x:calc(100%-30px)] text-navy-950"> Copy to clipboard </span> </button> <div class='highlight relative group'> <pre class='highlight cmd'><code id="code-wohvpptj">ollama pull llava:34b </code></pre> </div> </div><div class="callout"><p>If you have hardware that can handle it, you could run this on your computer at home. If you run AI models on a cloud provider, be aware that GPU compute is expensive! <strong class="font-semibold text-navy-950">It’s important to take steps to ensure you’re not paying for a massive GPU 24/7.</strong></p> <p>On Fly.io, at the time of writing, you’d achieve this with the <a href="https://fly.io/docs/apps/autostart-stop/" title="">autostart and autostop</a> functions of the Fly Proxy, restricting Ollama access to internal requests over <a href="https://fly.io/docs/networking/private-networking/#flycast-private-fly-proxy-services" title="">Flycast</a> from the PocketBase app. That way, if there haven’t been any requests for a few minutes, the Fly Proxy stops the Ollama <a href="https://fly.io/docs/machines/" title="">Machine</a>, which releases the CPU, GPU, and RAM allocated to it. <a href="https://fly.io/blog/scaling-llm-ollama/" title="">Here’s a post</a> that goes into more detail. </p> </div><h2 id='a-multi-tool-on-the-backend' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#a-multi-tool-on-the-backend' aria-label='Anchor'></a><span class='plain-code'>A multi-tool on the backend</span></h2> <p>I want user auth to make sure that not just anyone can grab my &ldquo;image description service&rdquo; and keep it busy generating short stories about their cat. If I build this out into a service for others to use, I might also want business logic around plans or credits, or mobile-friendly APIs for use in the field. 
<a href='https://pocketbase.io' title=''>PocketBase</a> provides a scaffolding for all of it. It&rsquo;s a Swiss army knife: a Firebase-like API on top of SQLite, complete with authentication, authorization, an admin UI, extensibility in JavaScript and Go, and various client-side APIs.</p> <div class="right-sidenote"><p>Yes, <em>of course</em> I’ve used an LLM to generate feline fanfic. Theme songs too. Hasn’t everyone? </p> </div> <p>I &ldquo;faked&rdquo; a task-specific API that supports followup questions by extending PocketBase in Go, modeling requests and responses as <a href='https://pocketbase.io/docs/collections/' title=''>collections</a> (i.e. SQLite tables) with <a href='https://pocketbase.io/docs/go-event-hooks/' title=''>event hooks</a> to trigger pre-set interactions with the Ollama app (via <a href='https://tmc.github.io/langchaingo' title=''>LangChainGo</a>) and the client (via the PocketBase API).</p> <p>If you&rsquo;re following along, <a href='https://github.com/superfly/llm-describer/blob/main/describer.go' title=''>here&rsquo;s the module</a> that handles all that, along with initializing the LLM connection.</p> <p>In a nutshell, this is the dance:</p> <ul> <li>When a user uploads an image, a hook on the <code>images</code> collection sends the image to Ollama, along with this prompt: <code>&quot;You are a helpful assistant describing images for blind screen reader users. Please describe this image.&quot;</code> </li><li>Ollama sends back its response, which the backend 1) passes back to the client and 2) stores in its <code>followups</code> collection for future reference. </li><li>If the user responds with a followup question about the image and description, that also goes into the <code>followups</code> collection; user-initiated changes to this collection trigger a hook to chain the new followup question with the image and the chat history into a new request for the model. </li><li>Lather, rinse, repeat. 
</li></ul> <p>This is a super simple hack to handle followup questions, and it’ll let you keep adding followups until something breaks. You&rsquo;ll see the quality of responses get poorer&mdash;possibly incoherent&mdash;as the context exceeds the context window.</p> <p>I also set up <a href='https://pocketbase.io/docs/api-rules-and-filters/' title=''>API rules</a> in PocketBase, ensuring that users can&rsquo;t read from or write to others&rsquo; chats with the AI.</p> <p>If image descriptions aren&rsquo;t your thing, this business logic is easily swappable for joke generation, extracting details from text, or any other simple task you might want to throw at an LLM. Just slot the best model into Ollama (LLaVA is pretty OK as a general starting point too), and match the PocketBase schema and pre-set prompts to your application.</p> <h2 id='a-seedling-of-a-client' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#a-seedling-of-a-client' aria-label='Anchor'></a><span class='plain-code'>A seedling of a client</span></h2> <p>With the image description service in place, the user can talk to it with any client that speaks the PocketBase API. PocketBase already has SDK clients in JavaScript and Dart, but because my screen reader is <a href='https://github.com/nvaccess/nvda' title=''>written in Python</a>, I went with a <a href='https://pypi.org/project/pocketbase/' title=''>community-created Python library</a>. That way I can build this out into an NVDA add-on if I want to.</p> <p>If you&rsquo;re a fancy Python developer, you probably have your preferred tooling for handling virtualenvs and friends. 
I&rsquo;m not, and since my screen reader doesn&rsquo;t use those anyway, I just <code>pip install</code>ed the library so my client can import it:</p> <div class="highlight-wrapper group relative cmd"> <button type="button" class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-wrap-target="#code-s8xqjyx2" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"><g buffered-rendering="static"><path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /><path d="M11.081 6.466L9.533 8.037l1.548 1.571" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"> Wrap text </span> </button> <button type="button" class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-copy-target="sibling" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"><g buffered-rendering="static"><path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /><path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"> 
Copy to clipboard </span> </button> <div class='highlight relative group'> <pre class='highlight cmd'><code id="code-s8xqjyx2">pip install pocketbase </code></pre> </div> </div> <p><a href='https://github.com/superfly/llm-describer/blob/main/__init__.py' title=''>My client</a> is a very simple script. It expects a couple of things: a file called <code>image.jpg</code>, located in the current directory, and environment variables to provide the service URL and user credentials to log into it with.</p> <p>When you run the client script, it uploads the image to the user’s <code>images</code> collection on the backend app, starting the back-and-forth between user and model we saw in the previous section. The client prints the model&rsquo;s output to the CLI and prompts the user to input a followup question, which it passes up to the <code>followups</code> collection, and so on.</p> <figure class="post-cta"> <figcaption> <h1>This can run on Fly.io.</h1> <p>Run your LLM on a datacenter-grade GPU.</p> <a class="btn btn-lg" href="https://fly.io/gpu/"> Try out a Fly GPU &nbsp;<span class='opacity-50'>→</span> </a> </figcaption> <div class="image-container"> <img src="/static/images/cta-kitty.webp" srcset="/static/images/[email protected] 2x" alt=""> </div> </figure> <h2 id='all-together-now' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#all-together-now' aria-label='Anchor'></a><span class='plain-code'>All together now</span></h2> <p>I grabbed <a href='https://unsplash.com/photos/brown-trees-beside-river-under-blue-sky-during-daytime-kSvpTrfhaiU' title=''>this image</a> and saved it to a file called <em>image.jpg</em>. 
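</p> <p>Before running it, here is a rough sketch of the loop the client follows, as described in the previous section. This is not the code from the repo: <code>DescriberBackend</code>, <code>run_session</code>, and <code>EchoBackend</code> are hypothetical names, and the backend is a stand-in so the sketch runs without a PocketBase server or the <code>pocketbase</code> library.</p>

```python
from typing import Callable, Protocol


class DescriberBackend(Protocol):
    """Hypothetical stand-in for the PocketBase-backed describer service."""

    def upload_image(self, path: str) -> str: ...
    def ask_followup(self, question: str) -> str: ...


def run_session(
    backend: DescriberBackend,
    image_path: str = "image.jpg",
    read_line: Callable[[str], str] = input,
    write_line: Callable[[str], None] = print,
) -> int:
    """Upload one image, print its description, then loop on followups.

    Returns the number of followup questions that were sent.
    """
    # The upload triggers the first description on the backend.
    write_line(backend.upload_image(image_path))
    sent = 0
    while True:
        question = read_line("Enter your followup question, or 'quit' to quit: ").strip()
        if question.lower() == "quit":
            break
        if not question:
            continue  # ignore empty input and prompt again
        write_line("Sending followup question")
        write_line(backend.ask_followup(question))
        sent += 1
    return sent


class EchoBackend:
    """Fake backend for trying the loop without a server (assumption, not the real API)."""

    def upload_image(self, path: str) -> str:
        return f"(description of {path})"

    def ask_followup(self, question: str) -> str:
        return f"(answer to: {question})"
```

<p>In a real client, the two backend methods would presumably wrap authenticated requests that create records in the <code>images</code> and <code>followups</code> collections; swapping <code>EchoBackend</code> for such a wrapper leaves the loop itself unchanged. 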
</p> <p>While I knew I was downloading an image of a winter scene, all I see on Unsplash is:</p> <blockquote> <p>brown trees beside river under blue sky during daytime Bright winter landscape with lake, snow, forest, beautiful blue sky and white clouds. An example of charming wildlife in Russia.</p> </blockquote> <p>Let&rsquo;s see what our very own AI describer thinks of this picture:</p> <div class="highlight-wrapper group relative plain"> <button type="button" class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-wrap-target="#code-lvuwb8nb" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"><g buffered-rendering="static"><path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /><path d="M11.081 6.466L9.533 8.037l1.548 1.571" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"> Wrap text </span> </button> <button type="button" class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-copy-target="sibling" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"><g buffered-rendering="static"><path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" 
/><path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"> Copy to clipboard </span> </button> <div class='highlight relative group'> <pre class='highlight '><code id="code-lvuwb8nb">$ python __init__.py
The image depicts a serene winter scene. There is a small stream of water flowing through the center, with patches of snow on its surface and along the banks. The surrounding area appears to be a mix of bare trees and shrubs, suggesting it's late in the season or possibly early spring when some plants have started to grow again but haven't yet leafed out. The sky is clear and blue, indicating good weather conditions at the time the photo was taken. The sunlight seems to be coming from the left side of the image, casting a warm glow on the scene and creating reflections in the water. There are no visible texts or distinguishing marks that provide additional context about the location or the photographer. The overall atmosphere is peaceful and natural, with the quiet beauty of the landscape undisturbed except for the gentle movement of the stream.
</code></pre> </div> </div> <p>Is it a stellar description? 
Maybe not, but it certainly gives me a better sense of connection with the scene.</p> <p>Let&rsquo;s see how our describer copes with a followup question.</p> <div class="highlight-wrapper group relative plain"> <button type="button" class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-wrap-target="#code-dgfkbrw6" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"><g buffered-rendering="static"><path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /><path d="M11.081 6.466L9.533 8.037l1.548 1.571" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"> Wrap text </span> </button> <button type="button" class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-copy-target="sibling" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"><g buffered-rendering="static"><path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /><path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] 
text-navy-950"> Copy to clipboard </span> </button> <div class='highlight relative group'> <pre class='highlight '><code id="code-dgfkbrw6">Enter your followup question, or 'quit' to quit: What types of trees are in the image?
Sending followup question
It's difficult to determine the exact species of trees in the image without more specific details, as many deciduous trees lose their leaves during winter and look quite similar. However, based on the general appearance of the bare branches and the fact that they are leafless, it is likely that these are common deciduous trees such as oaks, maples, or elms. The presence of snow and the absence of foliage suggest that this photo was taken during winter or early spring when the trees have not yet started to bloom again. The exact species would require a closer examination of the bark, buds, and other distinguishing features, which are not clearly visible in this image.
</code></pre> </div> </div> <p>Boo, the general-purpose LLaVA model couldn&rsquo;t identify the leafless trees. At least it knows why it can&rsquo;t. Maybe there&rsquo;s a better model out there for that. Or we could train one, if we really needed tree identification! We could make every component of this service more sophisticated! </p> <p>But that I, personally, can make a proof of concept like this with a few days of effort continues to boggle my mind. Thanks to a handful of amazing open source projects, it&rsquo;s really, spookily, easy. 
And from here, I (or you) can build out a screen-reader addon, or a mobile app, or a different kind of AI service, with modular changes.</p> <h2 id='deployment-notes' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#deployment-notes' aria-label='Anchor'></a><span class='plain-code'>Deployment notes</span></h2> <p>On Fly.io, stopping GPU Machines saves you a bunch of money and some carbon footprint, in return for cold-start latency when you make a request for the first time in more than a few minutes. In testing this project, on the <code>a100-40gb</code> Fly Machine preset, the 34b-parameter LLaVA model took several seconds to generate each response. If the Machine was stopped when the request came in, starting it up took another handful of seconds, followed by several tens of seconds to load the model into GPU RAM. The total time from cold start to completed description was about 45 seconds. Just something to keep in mind.</p> <p>If you&rsquo;re running Ollama in the cloud, you likely want to put the model onto storage that&rsquo;s persistent, so you don&rsquo;t have to download it repeatedly. You could also build the model into a Docker image ahead of deployment.</p> <p>The PocketBase Golang app compiles to a single executable that you can run wherever. I run it on Fly.io, unsurprisingly, and the <a href='https://github.com/superfly/llm-describer/' title=''>repo</a> comes with a Dockerfile and a <a href='https://fly.io/docs/reference/configuration/' title=''><code>fly.toml</code></a> config file, which you can edit to point at your own Ollama instance. It uses a small persistent storage volume for the SQLite database. Under testing, it runs fine on a <code>shared-cpu-1x</code> Machine. 
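</p> <p>Given the roughly 45-second cold start mentioned above, a client talking to a stopped Machine probably wants a generous time budget rather than a short default timeout. The sketch below is one way to express that, under stated assumptions: <code>call_with_cold_start_budget</code> and its parameters are hypothetical, the default numbers are placeholders and not measurements, and <code>request</code> is any callable that takes a timeout in seconds and raises <code>TimeoutError</code> when it gives up.</p>

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_cold_start_budget(
    request: Callable[[float], T],
    budget_seconds: float = 90.0,   # placeholder: total time we are willing to wait
    per_try_timeout: float = 60.0,  # placeholder: longest single attempt
    pause_seconds: float = 2.0,     # placeholder: pause between attempts
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Retry request(timeout) on TimeoutError until the overall budget runs out."""
    deadline = clock() + budget_seconds
    while True:
        remaining = deadline - clock()
        if remaining <= 0:
            raise TimeoutError("service did not answer within the cold-start budget")
        try:
            # Never wait longer than what is left of the overall budget.
            return request(min(per_try_timeout, remaining))
        except TimeoutError:
            # Short pause before retrying, clipped to the remaining budget.
            sleep(min(pause_seconds, max(0.0, deadline - clock())))
```

<p>The clock and sleep functions are parameters only so the helper can be exercised without actually waiting; in normal use the defaults apply. 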
</p></content> </entry> <entry> <title>JIT WireGuard</title> <link rel="alternate" href="https://fly.io/blog/jit-wireguard-peers/"/> <id>https://fly.io/blog/jit-wireguard-peers/</id> <published>2024-03-12T00:00:00+00:00</published> <updated>2024-05-09T17:35:04+00:00</updated> <media:thumbnail url="https://fly.io/blog/jit-wireguard-peers/assets/network-thumbnail.webp"/> <content type="html"><div class="lead"><p>We’re Fly.io and we transmute containers into VMs, running them on our hardware around the world with the power of Firecracker alchemy. We do a lot of stuff with WireGuard, which has become a part of our customer API. This is a quick story about some tricks we played to make WireGuard faster and more scalable for the hundreds of thousands of people who now use it here.</p> </div> <p>One of many odd decisions we&rsquo;ve made at Fly.io is how we use WireGuard. It&rsquo;s not just that we use it in many places where other shops would use HTTPS and REST APIs. We&rsquo;ve gone a step beyond that: every time you run <code>flyctl</code>, our lovable, sprawling CLI, it conjures a TCP/IP stack out of thin air, with its own IPv6 address, and speaks directly to Fly Machines running on our networks.</p> <p>There are plusses and minuses to this approach, which we talked about <a href='https://fly.io/blog/our-user-mode-wireguard-year/' title=''>in a blog post a couple years back</a>. Some things, like remote-operated Docker builders, get easier to express (a Fly Machine, as far as <code>flyctl</code> is concerned, might as well be on the same LAN). But everything generally gets trickier to keep running reliably.</p> <p>It was a decision. 
We own it.</p> <p>Anyways, we&rsquo;ve made some improvements recently, and I&rsquo;d like to talk about them.</p> <h2 id='where-we-left-off' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#where-we-left-off' aria-label='Anchor'></a><span class='plain-code'>Where we left off</span></h2> <p>Until a few weeks ago, our gateways ran on a pretty simple system.</p> <ol> <li>We operate dozens of &ldquo;gateway&rdquo; servers around the world, whose sole purpose is to accept incoming WireGuard connections and connect them to the appropriate private networks. </li><li>Any time you run <code>flyctl</code> and it needs to talk to a Fly Machine (to build a container, pop an SSH console, copy files, or proxy to a service you&rsquo;re running), it spawns or connects to a background agent process. </li><li>The first time it runs, the agent generates a new WireGuard peer configuration from our GraphQL API. WireGuard peer configurations are very simple: just a public key and an address to connect to. </li><li>Our API in turn takes that peer configuration and sends it to the appropriate gateway (say, <code>ord</code>, if you&rsquo;re near Chicago) via an RPC we send over the NATS messaging system. </li><li>On the gateway, a service called <code>wggwd</code> accepts that configuration, saves it to a SQLite database, and adds it to the kernel using WireGuard&rsquo;s Golang libraries. <code>wggwd</code> acknowledges the installation of the peer to the API. </li><li>The API replies to your GraphQL request, with the configuration. </li><li>Your <code>flyctl</code> connects to the WireGuard peer, which works, because you receiving the configuration means it’s installed on the gateway. 
</li></ol> <p>I copy-pasted those last two bullet points from <a href='https://fly.io/blog/our-user-mode-wireguard-year/' title=''>that two-year-old post</a>, because when it works, it does <em>just work</em> reasonably well. (We ultimately did end up defaulting everybody to WireGuard-over-WebSockets, though.)</p> <p>But if it always worked, we wouldn&rsquo;t be here, would we?</p> <p>We ran into two annoying problems:</p> <p>One: NATS is fast, but doesn’t guarantee delivery. Back in 2022, Fly.io was pretty big on NATS internally. We&rsquo;ve moved away from it. For instance, our <a href='https://fly.io/blog/carving-the-scheduler-out-of-our-orchestrator/' title=''>internal <code>flyd</code> API</a> used to be driven by NATS; today, it&rsquo;s HTTP. Our NATS cluster was losing too many messages to host a reliable API on it. Scaling back our use of NATS made WireGuard gateways better, but still not great.</p> <p>Two: When <code>flyctl</code> exits, the WireGuard peer it created sticks around on the gateway. Nothing cleans up old peers. After all, you&rsquo;re likely going to come back tomorrow and deploy a new version of your app, or <code>fly ssh console</code> into it to debug something. Why remove a peer just to re-add it the next day?</p> <p>Unfortunately, the vast majority of peers are created by <code>flyctl</code> in CI jobs, which don&rsquo;t have persistent storage and can&rsquo;t reconnect to the same peer the next run; they generate new peers every time, no matter what.</p> <p>So, we ended up with a not-reliable-enough provisioning system, and gateways with hundreds of thousands of peers that will never be used again. 
The high stale peer count made kernel WireGuard operations very slow, especially loading all the peers back into the kernel after a gateway server reboot, and it caused some kernel panics.</p> <p>There had to be</p> <h2 id='a-better-way' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#a-better-way' aria-label='Anchor'></a><span class='plain-code'>A better way.</span></h2> <p>Storing bajillions of WireGuard peers is no big challenge for any serious n-tier RDBMS. This isn&rsquo;t &ldquo;big data&rdquo;. The problem we have at Fly.io is that our gateways don&rsquo;t have serious n-tier RDBMSs. They&rsquo;re small. Scrappy. They live off the land.</p> <p>Seriously, though: you could store every WireGuard peer everybody has ever used at Fly.io in a single SQLite database, easily. What you can&rsquo;t do is store them all in the Linux kernel.</p> <p>So, at some point, as you push more and more peer configurations to a gateway, you have to start making decisions about which peers you&rsquo;ll enable in the kernel, and which you won&rsquo;t.</p> <p>Wouldn&rsquo;t it be nice if we just didn&rsquo;t have this problem? What if, instead of pushing configs to gateways, we had the gateways pull them from our API on demand?</p> <p>If you did that, peers would only have to be added to the kernel when the client wanted to connect. You could yeet them out of the kernel any time you wanted; the next time the client connected, they&rsquo;d just get pulled again, and everything would work fine.</p> <p>The problem you quickly run into when building this design is that Linux kernel WireGuard doesn&rsquo;t have a feature for installing peers on demand. 
However:</p> <h2 id='it-is-possible-to-jit-wireguard-peers' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#it-is-possible-to-jit-wireguard-peers' aria-label='Anchor'></a><span class='plain-code'>It is possible to JIT WireGuard peers</span></h2> <p>The Linux kernel&rsquo;s <a href='https://github.com/WireGuard/wgctrl-go' title=''>interface for configuring WireGuard</a> is <a href='https://docs.kernel.org/userspace-api/netlink/intro.html' title=''>Netlink</a> (which is basically a way to create a userland socket to talk to a kernel service). Here&rsquo;s a <a href='https://github.com/WireGuard/wg-dynamic/blob/master/netlink.h' title=''>summary of it as a C API</a>. Note that there&rsquo;s no API call to subscribe for &ldquo;incoming connection attempt&rdquo; events.</p> <p>That&rsquo;s OK! We can just make our own events. WireGuard connection requests are packets, and they&rsquo;re easily identifiable, so we can efficiently snatch them with a BPF filter and a <a href='https://github.com/google/gopacket' title=''>packet socket</a>.</p> <div class="callout"><p>Most of the time, it’s even easier for us to get the raw WireGuard packets, because our users now default to WebSockets WireGuard (which is just an unauthenticated WebSockets connect that shuttles framed UDP packets to and from an interface on the gateway), so that people who have trouble talking end-to-end in UDP can bring connections up.</p> </div> <p>We own the daemon code for that, and can just hook the packet receive function to snarf WireGuard packets.</p> <p>It&rsquo;s not obvious, but WireGuard doesn&rsquo;t have notions of &ldquo;client&rdquo; or &ldquo;server&rdquo;. It&rsquo;s a pure point-to-point protocol; peers connect to each other when they have traffic to send. 
The first peer to connect is called the <strong class='font-semibold text-navy-950'>initiator</strong>, and the peer it connects to is the <strong class='font-semibold text-navy-950'>responder</strong>.</p> <div class="right-sidenote"><p><a href="https://www.wireguard.com/papers/wireguard.pdf" title=""><em>The WireGuard paper</em></a> <em>is a good read.</em></p> </div> <p>For Fly.io, <code>flyctl</code> is typically our initiator, sending a single UDP packet to the gateway, which is the responder. According <a href='https://www.wireguard.com/papers/wireguard.pdf' title=''>to the WireGuard paper</a>, this first packet is a <code>handshake initiation</code>. It gets better: the packet type is recorded in a single plaintext byte. So this simple BPF filter catches all the incoming connections: <code>udp and dst port 51820 and udp[8] = 1</code>.</p> <p>In most other protocols, we&rsquo;d be done at this point; we&rsquo;d just scrape the username or whatnot out of the packet, go fetch the matching configuration, and install it in the kernel. With WireGuard, not so fast. WireGuard is based on Trevor Perrin&rsquo;s <a href='http://www.noiseprotocol.org/' title=''>Noise Protocol Framework</a>, and Noise goes way out of its way to <a href='http://www.noiseprotocol.org/noise.html#identity-hiding' title=''>hide identities</a> during handshakes. To identify incoming requests, we&rsquo;ll need to run enough Noise cryptography to decrypt the identity.</p> <p>The code to do this is fussy, but it&rsquo;s relatively short (about 200 lines). Helpfully, the kernel Netlink interface will give a privileged process the private key for an interface, so the secrets we need to unwrap WireGuard are easy to get. Then it&rsquo;s just a matter of running the first bit of the Noise handshake. 
If you&rsquo;re that kind of nerdy, <a href='https://gist.github.com/tqbf/9f2c2852e976e6566f962d9bca83062b' title=''>here&rsquo;s the code.</a></p> <p>At this point, we have the event feed we wanted: the public keys of every user trying to make a WireGuard connection to our gateways. We keep a rate-limited cache in SQLite, and when we see new peers, we&rsquo;ll make an internal HTTP API request to fetch the matching peer information and install it. This fits nicely into the little daemon that already runs on our gateways to manage WireGuard, and allows us to ruthlessly and recklessly remove stale peers with a <code>cron</code> job.</p> <p>But wait! There&rsquo;s more! We bounced this plan off Jason Donenfeld, and he tipped us off on a sneaky feature of the Linux WireGuard Netlink interface.</p> <div class="right-sidenote"><p>Jason is the hardest working person in show business.</p> </div> <p>Our API fetch for new peers is generally not going to be fast enough to respond to the first handshake initiation message a new client sends us. That&rsquo;s OK; WireGuard is pretty fast about retrying. But we can do better.</p> <p>When we get an incoming initiation message, we have the 4-tuple address of the desired connection, including the ephemeral source port <code>flyctl</code> is using. We can install the peer as if we&rsquo;re the initiator, and <code>flyctl</code> is the responder. The Linux kernel will initiate a WireGuard connection back to <code>flyctl</code>. This works; the protocol doesn&rsquo;t care a whole lot who&rsquo;s the server and who&rsquo;s the client. 
We get new connections established about as fast as they can possibly be installed.</p> <figure class="post-cta"> <figcaption> <h1>Launch an app in minutes</h1> <p>Speedrun an app onto Fly.io and get your own JIT WireGuard peer&nbsp;✨</p> <a class="btn btn-lg" href="/docs/speedrun/"> Speedrun &nbsp;<span class='opacity-50'>→</span> </a> </figcaption> <div class="image-container"> <img src="/static/images/cta-dog.webp" srcset="/static/images/[email protected] 2x" alt=""> </div> </figure> <h2 id='look-at-this-graph' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#look-at-this-graph' aria-label='Anchor'></a><span class='plain-code'>Look at this graph</span></h2> <p>We&rsquo;ve been running this in production for a few weeks and we&rsquo;re feeling pretty happy about it. We went from thousands, or hundreds of thousands, of stale WireGuard peers on a gateway to what rounds to none. Gateways now hold a lot less state, are faster at setting up peers, and can be rebooted without having to wait for many unused peers to be loaded back into the kernel.</p> <p>I&rsquo;ll leave you with this happy Grafana chart from the day of the switchover.</p> <p><img alt="a Grafana chart of &#39;kernel_stale_wg_peer_count&#39; vs. time. For the first few hours, all traces are flat. Most are at values between 0 and 50,000 and the top-most is just under 550,000. Towards the end of the graph, each line in turn jumps sharply down to the bottom, and at the end of the chart all datapoints are indistinguishable from 0." src="/blog/jit-wireguard-peers/assets/wireguard-peers-graph.webp" /></p> <p><strong class='font-semibold text-navy-950'>Editor&rsquo;s note:</strong> Despite our tearful protests, Lillian has decided to move on from Fly.io to explore new pursuits. 
We wish her much success and happiness!&nbsp;✨</p></content> </entry> <entry> <title>Fly Kubernetes does more now</title> <link rel="alternate" href="https://fly.io/blog/fks-beta-live/"/> <id>https://fly.io/blog/fks-beta-live/</id> <published>2024-03-07T00:00:00+00:00</published> <updated>2024-04-22T18:28:43+00:00</updated> <media:thumbnail url="https://fly.io/blog/fks-beta-live/assets/fks-thumb.webp"/> <content type="html"><div class="lead"><p>Eons ago, we <a href="https://fly.io/blog/fks/" title="">announced</a> we were working on <a href="https://fly.io/docs/kubernetes/" title="">Fly Kubernetes</a>. It drummed up enough excitement to prove we were heading in the right direction. So, we got hard at work to get from barebones “early access” to a beta release. We’ll be onboarding customers to the closed beta over the next few weeks. Email us at <a href="mailto:[email protected]">[email protected]</a> and we’ll hook you up.</p> </div> <p>Fly Kubernetes is the &ldquo;blessed path&rdquo;™️ to using Kubernetes backed by Fly.io infrastructure. Or, in simpler terms, it is our managed Kubernetes service. We take care of the complexity of operating the Kubernetes control plane, leaving you with the unfettered joy of deploying your Kubernetes workloads. 
If you love Fly.io and K8s, this product is for you.</p> <h2 id='what-even-is-a-kubernete' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#what-even-is-a-kubernete' aria-label='Anchor'></a><span class='plain-code'>What even is a Kubernete?</span></h2> <p>So how did this all come to be&mdash;and what even is a Kubernete?</p> <div class="right-sidenote"><p>You can see more fun details in <a href="https://fly.io/blog/fks/" title="">Introducing Fly Kubernetes</a>.</p> </div> <p>If you wade through all the YAML and <a href='https://landscape.cncf.io/' title=''>CNCF projects</a>, what&rsquo;s left is an API for declaring workloads and how they should be accessed.</p> <p>But that&rsquo;s not what people usually talk / groan about. It&rsquo;s everything else that comes along with adopting Kubernetes: a container runtime (CRI), networking between workloads (CNI), which leads to DNS (CoreDNS). Then you layer on Prometheus for metrics and whatever the logging daemon du jour is at the time. Now you get to debate which Ingress&mdash;strike that&mdash;<em>Gateway</em> API to deploy and if the next thing is anything to do with a Service Mess, then as they like to say where I live, &ldquo;bless your heart&rdquo;.</p> <p>Finally, there&rsquo;s capacity planning. You&rsquo;ve got to pick and choose where, how and what the <a href='https://kubernetes.io/docs/concepts/architecture/nodes/' title=''>Nodes</a> will look like in order to configure and run the workloads.</p> <p>When we began thinking about what a Fly Kubernetes Service could look like, we started from first principles, as we do with most everything here. The best way we can describe it is the <a href='https://www.youtube.com/watch?v=Ddk9ci6geSs' title=''>scene from Iron Man 2 when Tony Stark discovers a new element</a>. 
As he&rsquo;s looking at the knowledge left behind by those that came before, he starts to imagine something entirely different and more capable than could have been accomplished previously. That&rsquo;s what happened to JP, but with K3s and Virtual Kubelet.</p> <h2 id='ok-then-wtf-whats-the-fks' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#ok-then-wtf-whats-the-fks' aria-label='Anchor'></a><span class='plain-code'>OK then, WTF (what&rsquo;s the FKS)?</span></h2> <p>We looked at what people need to get started—the API—and then started peeling away all the noise, filling in the gaps to connect things together to provide the power. Here&rsquo;s how this looks currently:</p> <ul> <li>Containerd/CRI → <a href='https://fly.io/blog/carving-the-scheduler-out-of-our-orchestrator/' title=''>flyd</a> + Firecracker + <a href='https://fly.io/blog/docker-without-docker/' title=''>our init</a>: our system transmogrifies Docker containers into Firecracker microVMs </li><li>Networking/CNI → Our <a href='https://fly.io/blog/ipv6-wireguard-peering/' title=''>internal WireGuard mesh</a> connects your pods together </li><li>Pods → Fly Machines VMs </li><li>Secrets → Secrets, only not the base64&rsquo;d kind </li><li>Services → The Fly Proxy </li><li>CoreDNS → CoreDNS (to be replaced with our custom internal DNS) </li><li>Persistent Volumes → Fly Volumes (coming soon) </li></ul> <p>Now&hellip;not everything is a one-to-one comparison, and we explicitly did not set out to support any and every configuration. We aren&rsquo;t dealing with resources like Network Policy and init containers, though we&rsquo;re also not completely ignoring them. 
By mapping many of the core primitives of Kubernetes to a Fly.io resource, we&rsquo;re able to focus on continuing to build the primitives that make our cloud better for workloads of all shapes and sizes.</p> <p>A key thing to notice above is that there&rsquo;s no &ldquo;Node&rdquo;.</p> <p><a href='https://virtual-kubelet.io/' title=''>Virtual Kubelet</a> plays a central role in FKS. It&rsquo;s magic, really. A Virtual Kubelet acts as if it&rsquo;s a standard Kubelet running on a Node, eager to run your workloads. However, there&rsquo;s no Node backing it. It instead behaves like an API, receiving requests from Kubernetes and transforming them into requests to deploy on a cloud compute service. In our case, that&rsquo;s Fly Machines.</p> <p>So what we have is Kubernetes calling out to our <a href='https://virtual-kubelet.io/docs/providers/' title=''>Virtual Kubelet provider</a>, a small Golang program we run alongside K3s, to create and run your pod. It creates <a href='https://fly.io/blog/docker-without-docker/' title=''>your pod as a Fly Machine</a>, via the <a href='/docs/machines/api/' title=''>Fly Machines API</a>, deploying it to any underlying host within that region. This shifts the burden of managing hardware capacity from you to us. 
We think that&rsquo;s a cool trick&mdash;thanks, Virtual Kubelet magic!</p> <h2 id='speedrun' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#speedrun' aria-label='Anchor'></a><span class='plain-code'>Speedrun</span></h2> <p>You can deploy your workloads (including GPUs) across any of our available regions using the Kubernetes API.</p> <p>You create a cluster with <code>flyctl</code>:</p> <div class="highlight-wrapper group relative cmd"> <button type="button" class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-wrap-target="#code-vomuctp1" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"><g buffered-rendering="static"><path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /><path d="M11.081 6.466L9.533 8.037l1.548 1.571" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"> Wrap text </span> </button> <button type="button" class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-copy-target="sibling" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"><g 
buffered-rendering="static"><path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /><path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"> Copy to clipboard </span> </button> <div class='highlight relative group'> <pre class='highlight cmd'><code id="code-vomuctp1">fly ext k8s create --name hello --org personal --region iad </code></pre> </div> </div> <p>When a cluster is created, it has the standard <code>default</code> namespace. You can inspect it:</p> <div class="highlight-wrapper group relative cmd"> <button type="button" class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-wrap-target="#code-f85r6bqf" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"><g buffered-rendering="static"><path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /><path d="M11.081 6.466L9.533 8.037l1.548 1.571" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"> Wrap text </span> </button> <button type="button" class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg 
outline-none focus:outline-none" data-copy-target="sibling" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"><g buffered-rendering="static"><path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /><path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"> Copy to clipboard </span> </button> <div class='highlight relative group'> <pre class='highlight cmd'><code id="code-f85r6bqf">kubectl get ns default --show-labels </code></pre> </div> </div><div class="highlight-wrapper group relative output"> <button type="button" class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-wrap-target="#code-6bmj8nmt" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"><g buffered-rendering="static"><path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /><path d="M11.081 6.466L9.533 8.037l1.548 1.571" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"> Wrap text </span> </button> <button type="button" class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors 
grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-copy-target="sibling" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"><g buffered-rendering="static"><path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /><path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"> Copy to clipboard </span> </button> <div class='highlight relative group'> <pre class='highlight output whitespace-pre'><code id="code-6bmj8nmt">NAME      STATUS   AGE   LABELS
default   Active   20d   fly.io/app=fks-default-7zyjm3ovpdxmd0ep,kubernetes.io/metadata.name=default
</code></pre> </div> </div> <p>The <code>fly.io/app</code> label shows the name of the Fly App that corresponds to your cluster.</p> <p>It would seem appropriate to deploy the <a href='https://github.com/kubernetes-up-and-running/kuard' title=''>Kubernetes Up And Running demo</a> here, but since your pods are connected over an <a href='https://fly.io/blog/ipv6-wireguard-peering/' title=''>IPv6 WireGuard mesh</a>, we&rsquo;re going to use a <a href='https://github.com/jipperinbham/kuard' title=''>fork</a> with support for <a href='https://github.com/kubernetes-up-and-running/kuard/issues/46' title=''>IPv6 DNS</a>.</p> <div class="highlight-wrapper group relative cmd"> <button type="button" class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-wrap-target="#code-h0ws84lr" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"><g buffered-rendering="static"><path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /><path d="M11.081 6.466L9.533 8.037l1.548 1.571" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"> Wrap text </span> </button> <button type="button" class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-copy-target="sibling" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"><g buffered-rendering="static"><path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /><path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"> Copy to clipboard </span> </button> <div class='highlight relative group'> <pre class='highlight cmd'><code id="code-h0ws84lr">kubectl run \
  --image=ghcr.io/jipperinbham/kuard-amd64:blue \
  --labels="app=kuard-fks" \
  kuard
</code></pre> </div> </div> <p>And you can see its Machine representation via:</p> <div class="highlight-wrapper group relative cmd"> <button type="button" class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 
focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-wrap-target="#code-ktbm1ey3" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"><g buffered-rendering="static"><path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /><path d="M11.081 6.466L9.533 8.037l1.548 1.571" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"> Wrap text </span> </button> <button type="button" class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-copy-target="sibling" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"><g buffered-rendering="static"><path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /><path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"> Copy to clipboard </span> </button> <div class='highlight relative group'> <pre class='highlight cmd'><code id="code-ktbm1ey3">fly machine list --app fks-default-7zyjm3ovpdxmd0ep </code></pre> </div> </div><div class="highlight-wrapper group relative output"> <button type="button" class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white 
bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-wrap-target="#code-httmdmgs" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"><g buffered-rendering="static"><path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /><path d="M11.081 6.466L9.533 8.037l1.548 1.571" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"> Wrap text </span> </button> <button type="button" class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-copy-target="sibling" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"><g buffered-rendering="static"><path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /><path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"> Copy to clipboard </span> </button> <div class='highlight relative group'> <pre class='highlight output whitespace-pre'><code id="code-httmdmgs">ID              NAME   STATE    REGION  IMAGE                          IP ADDRESS                       VOLUME  CREATED               LAST UPDATED          APP PLATFORM  PROCESS GROUP  SIZE
1852291c46ded8  kuard  started  iad     jipperinbham/kuard-amd64:blue  fdaa:0:48c8:a7b:228:4b6d:6e20:2          2024-03-05T18:54:41Z  2024-03-05T18:54:44Z                               shared-cpu-1x:256MB
</code></pre> </div> </div> <p>This is important! Your pod is a Fly Machine! While we don’t yet support all kubectl features, Fly.io tooling will &ldquo;just work&rdquo; for cases where we don&rsquo;t yet support the kubectl way. So, for example, we don&rsquo;t have <code>kubectl port-forward</code> or <code>kubectl exec</code>, but you can use flyctl to forward ports and get a shell into a pod.</p> <p>Expose it to your internal network using the standard ClusterIP Service:</p> <div class="highlight-wrapper group relative cmd"> <button type="button" class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-wrap-target="#code-9dy6iy1l" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"><g buffered-rendering="static"><path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /><path d="M11.081 6.466L9.533 8.037l1.548 1.571" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"> Wrap text </span> </button> <button type="button" class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-copy-target="sibling" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"><g buffered-rendering="static"><path d="M10.576 
7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /><path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"> Copy to clipboard </span> </button> <div class='highlight relative group'> <pre class='highlight cmd'><code id="code-9dy6iy1l">kubectl expose pod kuard \ --name=kuard \ --port=8080 \ --target-port=8080 \ --selector='app=kuard-fks' </code></pre> </div> </div> <p>ClusterIP Services work natively, and Fly.io internal DNS supports them. Within the cluster, CoreDNS works too.</p> <p>Access this Service locally via <a href='https://fly.io/docs/networking/private-networking/#flycast-private-load-balancing' title=''>flycast</a>: Get connected to your org&rsquo;s <a href='https://fly.io/docs/networking/private-networking/' title=''>6PN private WireGuard network</a>. 
Get kubectl to describe the <code>kuard</code> Service:</p> <div class="highlight-wrapper group relative cmd"> <button type="button" class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-wrap-target="#code-luy1nk1t" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"><g buffered-rendering="static"><path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /><path d="M11.081 6.466L9.533 8.037l1.548 1.571" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"> Wrap text </span> </button> <button type="button" class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-copy-target="sibling" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"><g buffered-rendering="static"><path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /><path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"> Copy to clipboard </span> </button> <div class='highlight relative group'> <pre 
class='highlight cmd'><code id="code-luy1nk1t">kubectl describe svc kuard </code></pre> </div> </div><div class="highlight-wrapper group relative output"> <button type="button" class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-wrap-target="#code-r8ykf5mk" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"><g buffered-rendering="static"><path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /><path d="M11.081 6.466L9.533 8.037l1.548 1.571" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"> Wrap text </span> </button> <button type="button" class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-copy-target="sibling" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"><g buffered-rendering="static"><path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /><path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"> Copy to clipboard </span> </button> <div 
class='highlight relative group'> <pre class='highlight output'><code id="code-r8ykf5mk">Name: kuard Namespace: default Labels: app=kuard-fks Annotations: fly.io/clusterip-allocator: configured service.fly.io/sync-version: 11507529969321451315 Selector: app=kuard-fks Type: ClusterIP IP Family Policy: SingleStack IP Families: IPv6 IP: fdaa:0:48c8:0:1::1a IPs: fdaa:0:48c8:0:1::1a Port: &lt;unset&gt; 8080/TCP TargetPort: 8080/TCP Endpoints: [fdaa:0:48c8:a7b:228:4b6d:6e20:2]:8080 Session Affinity: None Events: &lt;none&gt; </code></pre> </div> </div> <p>You can pull out the Service&rsquo;s IP address from the above output, and get at the KUARD UI using that: in this case, <code>http://[fdaa:0:48c8:0:1::1a]:8080</code>. </p> <p>Using internal DNS: <code>http://&lt;service_name&gt;.svc.&lt;app_name&gt;.flycast:8080</code>. Or, in our example: <code>http://kuard.svc.fks-default-7zyjm3ovpdxmd0ep.flycast:8080</code>.</p> <p>And finally CoreDNS: <code>&lt;service_name&gt;.&lt;namespace&gt;.svc.cluster.local</code> resolves to the <code>fdaa</code> IP and is routable within the cluster.</p> <figure class="post-cta"> <figcaption> <h1>Get in on the FKS beta</h1> <p>Email us at [email protected]</p> </figcaption> <div class="image-container"> <img src="/static/images/cta-turtle.webp" srcset="/static/images/[email protected] 2x" alt=""> </div> </figure> <h2 id='pricing' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#pricing' aria-label='Anchor'></a><span class='plain-code'>Pricing</span></h2> <p>The Fly Kubernetes Service is free during the beta. Fly Machines and Fly Volumes you create with it will cost the <a href='https://fly.io/docs/about/pricing/' title=''>same as for your other Fly.io projects</a>. 
It&rsquo;ll be <a href='https://fly.io/docs/about/pricing/#fly-kubernetes' title=''>$75/mo per cluster</a> after that, plus the cost of the other resources you create.</p> <h2 id='today-and-the-future' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#today-and-the-future' aria-label='Anchor'></a><span class='plain-code'>Today and the future</span></h2> <p>Today, Fly Kubernetes supports only a portion of the Kubernetes API. You can deploy pods using Deployments/ReplicaSets. Pods are able to communicate via Services using the standard K8s DNS format. Ephemeral and persistent volumes are supported.</p> <p>The most notable absences are: multi-container pods, StatefulSets, network policies, horizontal pod autoscaling and emptyDir volumes. We&rsquo;re working on supporting autoscaling and emptyDir volumes in the coming weeks and multi-container pods in the coming months.</p> <p>If you&rsquo;ve made it this far and are eagerly awaiting your chance to tell us and the rest of the internet &ldquo;this isn&rsquo;t Kubernetes!&rdquo;, well, we agree! It&rsquo;s not something we take lightly. We&rsquo;re still building, and conformance tests may be in the future for FKS. We&rsquo;ve made a deliberate decision to only care about fast-launching VMs as the one and only way to run workloads on our cloud. And we also know that enough of our customers would like to use the Kubernetes API to create a fast-launching VM in the form of a Pod, and that&rsquo;s where this story begins. 
</p></content> </entry> <entry> <title>Globally Distributed Object Storage with Tigris</title> <link rel="alternate" href="https://fly.io/blog/tigris-public-beta/"/> <id>https://fly.io/blog/tigris-public-beta/</id> <published>2024-02-15T00:00:00+00:00</published> <updated>2024-04-12T18:23:39+00:00</updated> <media:thumbnail url="https://fly.io/blog/tigris-public-beta/assets/tigris-public-beta-thumb.webp"/> <content type="html"><div class="lead"><p>We’re Fly.io and we transmute containers into VMs, running them on our hardware around the world with the power of Firecracker alchemy. That’s pretty cool, but we want to talk about something someone else built, that <a href="https://fly.io/docs/reference/tigris/" title="">you can use today</a> to build applications.</p> </div> <p>There are three hard things in computer science:</p> <ol> <li>Cache invalidation </li><li>Naming things </li><li><a href='https://aws.amazon.com/s3/' title=''>Doing a better job than Amazon of storing files</a> </li></ol> <p>Of all the annoying software problems that have no business being annoying, handling a file upload in a full-stack application stands apart, a universal if tractable malady, the plantar fasciitis of programming.</p> <p>Now, the actual act of clients placing files on servers is straightforward. Your framework <a href='https://hexdocs.pm/phoenix/file_uploads.html' title=''>has</a> <a href='https://edgeguides.rubyonrails.org/active_storage_overview.html' title=''>a</a> <a href='https://docs.djangoproject.com/en/5.0/topics/http/file-uploads/' title=''>feature</a> <a href='https://expressjs.com/en/resources/middleware/multer.html' title=''>that</a> <a href='https://github.com/yesodweb/yesod-cookbook/blob/master/cookbook/Cookbook-file-upload-saving-files-to-server.md' title=''>does</a> <a href='https://laravel.com/docs/10.x/filesystem' title=''>it</a>. 
What&rsquo;s hard is making sure that uploads stick around to be downloaded later.</p> <aside class="right-sidenote"><p>(yes, yes, we know, <a href="https://youtu.be/b2F-DItXtZs?t=102" title="">sharding /dev/null</a> is faster)</p> </aside> <p>Enter object storage, a pattern you may know by its colloquial name &ldquo;S3&rdquo;. Object storage occupies a funny place in software architecture, somewhere between a database and a filesystem. It&rsquo;s like <a href='https://man7.org/linux/man-pages/man3/malloc.3.html' title=''><code>malloc</code></a><code>()</code>, but for cloud storage instead of program memory.</p> <p><a href='https://www.kleenex.com/en-us/' title=''>S3</a>—err, object storage — is so important that it was the second AWS service ever introduced (EC2 was not the first!). Everybody wants it. We know, because they keep asking us for it.</p> <p>So why didn&rsquo;t we build it?</p> <p>Because we couldn&rsquo;t figure out a way to improve on S3. And we still haven&rsquo;t! But someone else did, at least for the kinds of applications we see on Fly.io.</p> <h2 id='but-first-some-back-story' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#but-first-some-back-story' aria-label='Anchor'></a><span class='plain-code'>But First, Some Back Story</span></h2> <p>S3 checks all the boxes. It&rsquo;s trivial to use. It&rsquo;s efficient and cost-effective. It has redundancies that would make a DoD contractor blush. It integrates with archival services like Glacier. And every framework supports it. At some point, the IETF should just take a deep sigh and write an S3 API RFC, XML signatures and all.</p> <p>There&rsquo;s at least one catch, though.</p> <p>Back in, like, &lsquo;07 people ran all their apps from a single city. S3 was designed to work for those kinds of apps. 
The data, the bytes on the disks (or whatever weird hyperputer AWS stores S3 bytes on), live in one place. A specific place. In a specific data center. As powerful and inspiring as The Architects are, they are mortals, and must obey the laws of physics.</p> <p>This observation feels banal, until you realize how apps have changed in the last decade. Apps and their users don&rsquo;t live in one specific place. They live all over the world. When users are close to the S3 data center, things are amazing! But things get less amazing the further away you get from the data center, and even less amazing the smaller and more frequent your reads and writes are.</p> <p>(Thought experiment: you have to pick one place in the world to route all your file storage. Where is it? Is it <a href='https://www.tripadvisor.com/Restaurant_Review-g30246-d1956555-Reviews-Ford_s_Fish_Shack_Ashburn-Ashburn_Loudoun_County_Virginia.html' title=''>Loudoun County, Virginia</a>?)</p> <p>So, for many modern apps, you end up having to <a href='https://stackoverflow.com/questions/32426249/aws-s3-bucket-with-multiple-regions' title=''>write things into different regions</a>, so that people close to the data get it from a region-specific bucket. Doing that pulls in CDN-caching things that complicate your application and put barriers between you and your data. Before you know it, you&rsquo;re wearing custom orthotics on your, uh, developer feet. (<em>I am done with this metaphor now, I promise.</em>)</p> <aside class="right-sidenote"><p>(well, okay, Backblaze B2 because somehow my bucket fits into their free tier, but you get the idea)</p> </aside> <p>Personally, I know this happens. Because I had to build one! I run a <a href='https://xeiaso.net/blog/xedn/' title=''>CDN backend</a> that&rsquo;s a caching proxy for S3 on six continents. 
All so that I can deliver images and video efficiently for the readers of my blog.</p> <aside class="right-sidenote"><p>(shut up, it’s a sandwich)</p> </aside> <p>What if data was really global? For some applications, it might not matter much. But for others, it matters a lot. When a sandwich lover in Australia snaps a picture of a <a href='https://en.wikipedia.org/wiki/Hamdog' title=''>hamdog</a>, the people most likely to want to see that photo are also in Australia. Routing those uploads and downloads through one building in Ashburn is no way to build a sandwich-reviewing empire.</p> <p>Localizing all the data sounds like a hard problem. What if you didn&rsquo;t need to change anything on your end to accomplish it?</p> <h2 id='show-me-a-hero' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#show-me-a-hero' aria-label='Anchor'></a><span class='plain-code'>Show Me A Hero</span></h2> <p>Building a miniature CDN infrastructure just to handle file uploads seems like the kind of thing that could take a week or so of tinkering. The Fly.io unified theory of cloud development is that solutions are completely viable for full-stack developers only when they take less than 2 hours to get working.</p> <p>AWS agrees, which is why they have a SKU for it, <a href='https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/cloudfront_distribution' title=''>called CloudFront</a>, which will, at some variably metered expense, optimize the read side of a single-write-region bucket: they&rsquo;ll set up <a href='https://fly.io/blog/the-5-hour-content-delivery-network/' title=''>a simple caching CDN</a> for you. 
You can probably get S3 and CloudFront working within 2 hours, especially if you&rsquo;ve set it up before.</p> <p>Our friends at Tigris have this problem down to single-digit minutes, and what they came up with is a lot cooler than a cache CDN.</p> <p>Here&rsquo;s how it works. Tigris runs redundant FoundationDB clusters in our regions to track objects. They use Fly.io&rsquo;s NVMe volumes as a first level of cached raw byte store, and a queuing system modelled on <a href='https://www.foundationdb.org/files/QuiCK.pdf' title=''>Apple&rsquo;s QuiCK paper</a> to distribute object data to multiple replicas, to regions where the data is in demand, and to 3rd party object stores… like S3.</p> <p>If your objects are smaller than about 128 kilobytes, Tigris makes them instantly global. By default! Things are just snappy, all over the world, automatically, because they&rsquo;ve done all the work.</p> <p>But it gets better, because Tigris is also much more flexible than a simple cache CDN. It&rsquo;s globally distributed from the jump, with inter-region routing baked into its distribution layer. Tigris isn&rsquo;t a CDN, but rather a toolset that you can use to build arbitrary CDNs, with consistency guarantees, instant purge and relay regions.</p> <p>There&rsquo;s a lot going on in this architecture, and it&rsquo;d be fun to dig into it more. But for now, you don&rsquo;t have to understand any of it. Because Tigris ties all this stuff together with an S3-compatible object storage API. 
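</p> <p>For a sense of what &ldquo;S3-compatible&rdquo; can look like from the application side, here is a minimal Python sketch. It assumes the secret names that <code>fly storage create</code> reports setting (<code>AWS_ENDPOINT_URL_S3</code>, <code>AWS_REGION</code>, <code>AWS_ACCESS_KEY_ID</code>, <code>AWS_SECRET_ACCESS_KEY</code>, <code>BUCKET_NAME</code>) are present in the environment. <code>s3_settings</code> is a hypothetical helper written for this example, not part of Tigris or of any AWS library.</p>

```python
import os

# Secret names as listed in the `fly storage create` output in this post.
REQUIRED = (
    "AWS_ENDPOINT_URL_S3",
    "AWS_REGION",
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "BUCKET_NAME",
)


def s3_settings(env=None):
    """Return (client_kwargs, bucket_name) built from environment variables.

    Hypothetical helper: it only gathers settings and does not talk to any
    storage service itself.
    """
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise KeyError("missing settings: " + ", ".join(missing))
    client_kwargs = {
        "endpoint_url": env["AWS_ENDPOINT_URL_S3"],
        "region_name": env["AWS_REGION"],
        "aws_access_key_id": env["AWS_ACCESS_KEY_ID"],
        "aws_secret_access_key": env["AWS_SECRET_ACCESS_KEY"],
    }
    return client_kwargs, env["BUCKET_NAME"]
```

<p>The first returned value uses keyword names that <code>boto3.client("s3", ...)</code> accepts, so an existing S3 integration would likely only need its endpoint and credentials pointed at these values.</p> <p>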
If your framework can talk to S3, it can use Tigris.</p> <h2 id='fly-storage' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#fly-storage' aria-label='Anchor'></a><span class='plain-code'><code>fly storage</code></span></h2> <p>To get started with this, run the <code>fly storage create</code> command:</p> <div class="highlight-wrapper group relative "> <button type="button" class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-wrap-target="#code-69koa0wf" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"><g buffered-rendering="static"><path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /><path d="M11.081 6.466L9.533 8.037l1.548 1.571" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"> Wrap text </span> </button> <button type="button" class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-copy-target="sibling" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"><g buffered-rendering="static"><path d="M10.576 
7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /><path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"> Copy to clipboard </span> </button> <div class='highlight relative group'> <pre class='highlight '><code id="code-69koa0wf">$ fly storage create Choose a name, use the default, or leave blank to generate one: xe-foo-images Your Tigris project (xe-foo-images) is ready. See details and next steps with: https://fly.io/docs/reference/tigris/ Setting the following secrets on xe-foo: AWS_REGION BUCKET_NAME AWS_ENDPOINT_URL_S3 AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY Secrets are staged for the first deployment </code></pre> </div> </div> <p>All you have to do is fill in a bucket name. Hit enter. All of the configuration for the AWS S3 library will be injected into your application for you. And you don&rsquo;t even need to change the libraries that you&rsquo;re using. <a href='https://www.tigrisdata.com/docs/sdks/s3/' title=''>The Tigris examples</a> all use the AWS libraries to put and delete objects into Tigris using the same calls that you use for S3.</p> <p>I know how this looks for a lot of you. It looks like we&rsquo;re partnering with Tigris because we&rsquo;re chicken, and we didn&rsquo;t want to build something like this. Well, guess what: you&rsquo;re right!</p> <p>Compute and networking: those are things we love and understand. Object storage? <a href='https://fly.io/blog/the-5-hour-content-delivery-network/' title=''>We already gave away the game on how we&rsquo;d design a CDN for our own content</a>, and it wasn&rsquo;t nearly as slick as Tigris.</p> <p>Object storage is important. It needs to be good. We did not want to half-ass it. 
So we partnered with Tigris, so that they can put their full resources into making object storage as ✨magical✨ as Fly.io is.</p> <p>This also mirrors a lot of the Unix philosophy of Days Gone Past: you have individual parts that each do one thing very well and are then chained together to create a composite result. I mean, come on, would you seriously want to buy your servers the same place you buy your shoes?</p> <h2 id='one-bill-to-rule-them-all' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#one-bill-to-rule-them-all' aria-label='Anchor'></a><span class='plain-code'>One bill to rule them all</span></h2> <p>Well, okay, the main reason you would want to do that is that having everything under one bill makes it really easy for your accounting people. So we&rsquo;ve wrapped your compute, your block storage, your databases, your networking, and your object storage under one bill. You don&rsquo;t have to create separate accounts with Supabase or Upstash or PlanetScale or Tigris. Everything gets charged to your Fly.io bill and you pay one bill per month.</p> <aside class="right-sidenote"><p>This was actually going to be posted on Valentine’s Day, but we had to wait for the chocolate to go on sale.</p> </aside> <p>This is our Valentine&rsquo;s Day gift to you all. Object storage that just works. 
Stay tuned because we have a couple exciting features that build on top of the integration of Fly.io and Tigris that allow really unique things, such as truly global static website hosting and turning your bucket into a CDN in 5 minutes at most.</p> <p>Here&rsquo;s to many more happy developer days to come.</p></content> </entry> <entry> <title>GPUs on Fly.io are available to everyone!</title> <link rel="alternate" href="https://fly.io/blog/gpu-ga/"/> <id>https://fly.io/blog/gpu-ga/</id> <published>2024-02-12T00:00:00+00:00</published> <updated>2024-04-12T18:23:39+00:00</updated> <media:thumbnail url="https://fly.io/blog/gpu-ga/assets/gpu-ga-thumb.webp"/> <content type="html"><div class="lead"><p>Fly.io makes it easy to spin up compute around the world, now including powerful GPUs. Unlock the power of large language models, text transcription, and image generation with our datacenter-grade muscle!</p> </div> <p>GPUs are now available to everyone!</p> <p>We know you&rsquo;ve been excited about wanting to use GPUs on Fly.io and we&rsquo;re happy to announce that they&rsquo;re available for everyone. If you want, you can spin up GPU instances with any of the following cards:</p> <ul> <li>Ampere A100 (40GB) <code>a100-40gb</code> </li><li>Ampere A100 (80GB) <code>a100-80gb</code> </li><li>Lovelace L40s (48GB) <code>l40s</code> </li></ul> <p>To use a GPU instance today, change the <code>vm.size</code> for one of your apps or processes to any of the above GPU kinds. 
Here&rsquo;s how you can spin up an <a href='https://ollama.ai' title=''>Ollama</a> server in seconds:</p> <div class="highlight-wrapper group relative toml"> <button type="button" class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-wrap-target="#code-mgip5vdl" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"><g buffered-rendering="static"><path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /><path d="M11.081 6.466L9.533 8.037l1.548 1.571" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"> Wrap text </span> </button> <button type="button" class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-copy-target="sibling" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"><g buffered-rendering="static"><path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /><path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"> Copy to clipboard </span> </button> <div 
class='highlight relative group'> <pre class='highlight '><code id="code-mgip5vdl"><span class="py">app</span> <span class="p">=</span> <span class="s">"your-app-name"</span> <span class="py">region</span> <span class="p">=</span> <span class="s">"ord"</span> <span class="py">vm.size</span> <span class="p">=</span> <span class="s">"l40s"</span> <span class="nn">[http_service]</span> <span class="py">internal_port</span> <span class="p">=</span> <span class="mi">11434</span> <span class="py">force_https</span> <span class="p">=</span> <span class="kc">false</span> <span class="py">auto_stop_machines</span> <span class="p">=</span> <span class="kc">true</span> <span class="py">auto_start_machines</span> <span class="p">=</span> <span class="kc">true</span> <span class="py">min_machines_running</span> <span class="p">=</span> <span class="mi">0</span> <span class="py">processes</span> <span class="p">=</span> <span class="nn">["app"]</span> <span class="nn">[build]</span> <span class="py">image</span> <span class="p">=</span> <span class="s">"ollama/ollama"</span> <span class="nn">[mounts]</span> <span class="py">source</span> <span class="p">=</span> <span class="s">"models"</span> <span class="py">destination</span> <span class="p">=</span> <span class="s">"/root/.ollama"</span> <span class="py">initial_size</span> <span class="p">=</span> <span class="s">"100gb"</span> </code></pre> </div> </div> <p>Deploy this and bam, large language model inferencing from anywhere. If you want a private setup, see the article <a href='https://fly.io/blog/scaling-llm-ollama/' title=''>Scaling Large Language Models to zero with Ollama</a> for more information. You never know when you have a sandwich emergency and don&rsquo;t know what you can make with what you have on hand.</p> <p>We are working on getting some lower-cost A10 GPUs in the next few weeks. 
We&rsquo;ll update you when they&rsquo;re ready.</p> <p>If you want to explore the possibilities of GPUs on Fly.io, here are a few articles that may give you ideas:</p> <ul> <li><a href='https://fly.io/blog/not-midjourney-bot/' title=''>Deploy Your Own (Not) MidJourney Bot On Fly GPUs</a> </li><li><a href='https://fly.io/blog/scaling-llm-ollama/' title=''>Scaling Large Language Models to zero with Ollama</a> </li><li><a href='https://fly.io/blog/transcribing-on-fly-gpu-machines/' title=''>Transcribing on Fly GPU Machines</a> </li></ul> <p>Depending on factors such as your organization&rsquo;s age and payment history, you may need to go through additional verification steps.</p> <p>If you&rsquo;ve been experimenting with Fly.io GPUs and have made something cool, let us know on the <a href='https://community.fly.io/' title=''>Community Forums</a> or by mentioning us <a href='https://hachyderm.io/@flydotio' title=''>on Mastodon</a>! We&rsquo;ll boost the cool ones.</p></content> </entry> <entry> <title>Event Driven Machines</title> <link rel="alternate" href="https://fly.io/blog/event-driven-machines/"/> <id>https://fly.io/blog/event-driven-machines/</id> <published>2024-02-05T00:00:00+00:00</published> <updated>2024-04-12T18:23:39+00:00</updated> <media:thumbnail url="https://fly.io/blog/event-driven-machines/assets/lambdo-thumb.webp"/> <content type="html"><div class="lead"><p>We’re Fly.io and we transmute containers into VMs, running them on our hardware around the world. We have fast-booting VMs, so why not <a href="https://fly.io/docs/speedrun/" title="">take advantage of them</a>?</p> </div> <p>Serverless is great because it has good ergonomics - when an event is received, a &ldquo;not-server&rdquo; boots quickly, code is run, and then everything is torn down. We&rsquo;re billed only for usage.</p> <p>It turns out that Fly.io shares many of <a href='https://fly.io/blog/the-serverless-server/' title=''>the same ergonomics</a> as serverless. 
Can we do a serverless on Fly.io? 🦆 Well, if it&rsquo;s quacking like a duck, let&rsquo;s call it a mallard.</p> <p>Here&rsquo;s a useful pattern for triggering our own not-servers with Fly Machines.</p> <h2 id='triggering-machines' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#triggering-machines' aria-label='Anchor'></a><span class='plain-code'>Triggering Machines</span></h2> <p>I want to make Machines do some work based on my own events. Fly.io can already <a href='https://fly.io/docs/apps/autostart-stop/' title=''>stop Machines when idle</a> based on HTTP, so let&rsquo;s concentrate on non-HTTP events.</p> <p>The process of running evented Machines involves:</p> <ol> <li>Listening for events </li><li>Spinning up Fly Machines to run our code (with the events as context) </li><li>Having event-aware code to run </li></ol> <p>To do this, I made a project and named it <a href='https://github.com/fly-apps/lambdo' title=''><strong class='font-semibold text-navy-950'>Lambdo</strong></a> because reasons. You can consider this project &ldquo;reference architecture&rdquo; in the same way you call a toddler&rsquo;s scribbling &ldquo;art&rdquo;.</p> <p>The goal is to run some of our code on a fresh not-server when an event is received. We want this done efficiently - a Machine should only exist long enough to process an event or 3.</p> <p>Lambdo does just that - it receives some events, and spins up Fly Machines with those events placed <em>inside</em> the VMs. 
Once the code finishes, the Machine is destroyed.</p> <div class='group relative min-w-0 bg-white shadow-md shadow-navy-500/10 rounded-xl mb-7 ring-1 ring-navy-300/40'><button type='button' class='bubble-wrap z-20 absolute right-2.5 top-2.5 text-transparent group-hover:text-navy-950 hocus:text-violet-600 bg-transparent group-hover:bg-white hocus:bg-violet-200/40 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none' data-wrap-target='#table-hzpri2l5' data-wrap-type='nowrap'><svg class='w-5 h-5 pointer-events-none' viewBox='0 0 20 20' fill='none' stroke='currentColor' stroke-width='1.5' stroke-linecap='round' stroke-linejoin='round'><g buffered-rendering='static'><path d='M11.912 10.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.314 2.314 0 00-2.315-2.31H4.959M15.187 14.5H4.959M8.802 10H4.959' /><path d='M13.081 8.466l-1.548 1.571 1.548 1.571' /></g></svg><span class='bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950'>Wrap text</span></button><div class='min-w-0 overflow-x-auto rounded-xl'><table class='table-stripe table-stretch table-pad text-sm whitespace-nowrap m-0' id='table-hzpri2l5'><thead class='text-navy-950 text-left'><tr> <th style="text-align: center"><img alt="the files are inside the computer" src="/blog/event-driven-machines/assets/files-are-inside-the-computer-cover.webp" /></th> </tr> </thead><tbody><tr> <td style="text-align: center">The files are <em>in</em> the computer!</td> </tr> </tbody></table></div></div><h2 id='listening-for-events' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#listening-for-events' aria-label='Anchor'></a><span class='plain-code'>Listening for Events</span></h2> <p>For our purposes, an event is just a JSON object. 
<code>{&quot;any&quot;: &quot;object&quot;, &quot;will&quot;: &quot;do&quot;}</code>.</p> <p>We want to turn events into compute, so we need some sort of event system. I decided to use a queue.</p> <h3 id='the-queue' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#the-queue' aria-label='Anchor'></a><span class='plain-code'>The Queue</span></h3> <p>The first thing I needed was a place to send events! I chose to use SQS, which let me continue to pretend servers don&rsquo;t exist.</p> <p>It&rsquo;s no surprise then that the first part of this project is <a href='https://github.com/fly-apps/lambdo/blob/main/internal/sqs/get_events.go' title=''>code that polls SQS</a>.</p> <p>When the polling returns some non-zero number of events, it collects the SQS messages&rsquo; JSON strings (and some metadata), resulting in an array of objects (a list of events).</p> <p>Then we send these events to some Machines.</p> <h2 id='spinning-up-machines' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#spinning-up-machines' aria-label='Anchor'></a><span class='plain-code'>Spinning Up Machines</span></h2> <p>Fly Machines are fast-booting Micro-VMs, controlled by an <a href='https://fly.io/docs/machines/working-with-machines/' title=''>API</a>.</p> <p>A feature of that API is the ability to <a href='https://community.fly.io/t/machine-files/14453' title=''>create files</a> on a new Machine. This is how we&rsquo;ll get our events into the Machine.</p> <p>When Lambdo creates a Machine, it places a file at <code>/tmp/events.json</code>. 
Our code just needs to read that file and parse the JSON.</p> <h3 id='running-our-code' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#running-our-code' aria-label='Anchor'></a><span class='plain-code'>Running Our Code</span></h3> <p>Part of the ergonomics of Serverless is (usually) being limited to running just a function. Fly.io doesn&rsquo;t really care what you run, which is to our advantage. We can choose to write discrete functions per event, or we can bring our whole <a href='https://signalvnoise.com/svn3/the-majestic-monolith/' title=''>Majestic Monolith</a> to bear.</p> <p>How do we package up our code? The real answer is &ldquo;however you want!&rdquo;, but here are two ideas.</p> <p><strong class='font-semibold text-navy-950'>Use Your Existing Code Base</strong></p> <p>You can just use your existing code base. 
This is especially easy if you&rsquo;re already deploying apps to Fly.io.</p> <p>All we&rsquo;d need to do is add some additional code - a command perhaps (<code>rake</code>, <code>artisan</code>, whatever) - that sucks in that JSON, iterates over the events, and does some stuff.</p> <div class="highlight-wrapper group relative php"> <button type="button" class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-wrap-target="#code-4juzgucl" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"><g buffered-rendering="static"><path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /><path d="M11.081 6.466L9.533 8.037l1.548 1.571" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"> Wrap text </span> </button> <button type="button" class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-copy-target="sibling" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"><g buffered-rendering="static"><path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /><path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 
0-1.815.819-1.816 1.815v1.617" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"> Copy to clipboard </span> </button> <div class='highlight relative group'> <pre class='highlight '><code id="code-4juzgucl"><span class="nv">$events</span> <span class="o">=</span> <span class="nb">json_decode</span><span class="p">(</span><span class="nb">file_get_contents</span><span class="p">(</span><span class="s2">"/tmp/events.json"</span><span class="p">));</span>
<span class="k">foreach</span> <span class="p">(</span><span class="nv">$events</span> <span class="k">as</span> <span class="nv">$event</span><span class="p">)</span> <span class="p">{</span>
    <span class="c1">// do a thing</span>
<span class="p">}</span>
</code></pre> </div> </div> <p>When we create an event, we&rsquo;ll tell Lambdo how to run your code - more on that later.</p> <p><strong class='font-semibold text-navy-950'>Use Lambdo&rsquo;s Base Images</strong></p> <p>This project also provides some &ldquo;runtimes&rdquo; (base images). This is a bit more &ldquo;traditional serverless&rdquo;, where you provide a function to run.</p> <p>Lambdo contains <a href='https://github.com/fly-apps/lambdo/tree/main/runtimes' title=''>two runtimes</a> right now - Node and PHP. 
There could be more, of course, but you know&hellip;lazy.</p> <p>The Node runtime <a href='https://github.com/fly-apps/lambdo/blob/main/runtimes/js/src/index.js' title=''>contains some code</a> that will read the JSON payload file (again, just an array of JSON events), and call a user-supplied JS function once per event.</p> <p>An <a href='https://github.com/fly-apps/lambdo/tree/main/runtimes/js/sample-project' title=''>example is here</a> - our code just needs to export a function that does stuff to the given event:</p> <div class="highlight-wrapper group relative javascript"> <button type="button" class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-wrap-target="#code-d6ki7m4i" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"><g buffered-rendering="static"><path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /><path d="M11.081 6.466L9.533 8.037l1.548 1.571" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"> Wrap text </span> </button> <button type="button" class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-copy-target="sibling" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"><g buffered-rendering="static"><path d="M10.576 
7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /><path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"> Copy to clipboard </span> </button> <div class='highlight relative group'> <pre class='highlight '><code id="code-d6ki7m4i"><span class="c1">// File /app/index.js</span> <span class="nx">exports</span><span class="p">.</span><span class="nx">handler</span> <span class="o">=</span> <span class="k">async</span> <span class="kd">function</span><span class="p">(</span><span class="nx">event</span><span class="p">)</span> <span class="p">{</span> <span class="nx">console</span><span class="p">.</span><span class="nx">log</span><span class="p">(</span><span class="dl">"</span><span class="s2">Let's process an event! 
The event:</span><span class="dl">"</span><span class="p">,</span> <span class="nx">event</span><span class="p">)</span> <span class="p">}</span> </code></pre> </div> </div> <p>The <a href='https://github.com/fly-apps/lambdo/tree/main/runtimes/php' title=''>PHP runtime</a> is the same idea, a user-supplied handler looks like this:</p> <div class="highlight-wrapper group relative php"> <button type="button" class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-wrap-target="#code-coch74a" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"><g buffered-rendering="static"><path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /><path d="M11.081 6.466L9.533 8.037l1.548 1.571" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"> Wrap text </span> </button> <button type="button" class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-copy-target="sibling" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"><g buffered-rendering="static"><path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /><path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 
8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"> Copy to clipboard </span> </button> <div class='highlight relative group'> <pre class='highlight '><code id="code-coch74a"><span class="c1">// File /app/index.php</span>
<span class="k">return</span> <span class="k">function</span><span class="p">(</span><span class="kt">array</span> <span class="nv">$event</span><span class="p">)</span> <span class="p">{</span>
    <span class="c1">// Do something with $event</span>
<span class="p">};</span>
</code></pre> </div> </div> <p>Explore the <a href='https://github.com/fly-apps/lambdo/tree/main/runtimes' title=''>runtimes</a> directory of the project to see how that&rsquo;s put together.</p> <h2 id='sending-an-event' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#sending-an-event' aria-label='Anchor'></a><span class='plain-code'>Sending an Event</span></h2> <p>Since our events are sent via an SQS queue, it would be helpful to see an example SQS message. 
Remember how I mentioned the SQS message has some metadata?</p> <p>Here&rsquo;s an example, with said metadata:</p> <div class="highlight-wrapper group relative bash"> <button type="button" class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-wrap-target="#code-uwc3p0p" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"><g buffered-rendering="static"><path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /><path d="M11.081 6.466L9.533 8.037l1.548 1.571" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"> Wrap text </span> </button> <button type="button" class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-copy-target="sibling" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"><g buffered-rendering="static"><path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /><path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"> Copy to clipboard </span> 
</button> <div class='highlight relative group'> <pre class='highlight '><code id="code-uwc3p0p">aws sqs send-message <span class="se">\</span>
  <span class="nt">--queue-url</span><span class="o">=</span>https://sqs.&lt;region&gt;.amazonaws.com/&lt;account&gt;/&lt;queue&gt; <span class="se">\</span>
  <span class="nt">--message-body</span><span class="o">=</span><span class="s1">'{"foo": "bar"}'</span> <span class="se">\</span>
  <span class="nt">--message-attributes</span><span class="o">=</span><span class="s1">'{
    "size":{"DataType":"String","StringValue":"performance-2x"},
    "image":{"DataType":"String","StringValue":"fideloper/lambdo-php-sample:latest"}
  }'</span>
</code></pre> </div> </div> <p>The Body field of the SQS message is assumed to be a JSON string (it&rsquo;s the event itself, and its contents are arbitrary - whatever makes sense for you).</p> <p>The message attributes contain the metadata - up to three important details:</p> <ol> <li><code>image</code>: The image to run (it might be a Docker Hub image, or something you pushed to registry.fly.io). This is <strong class='font-semibold text-navy-950'>required</strong>. </li><li><code>size</code>: The CPU size and type to use† - defaults to <code>performance-2x</code> </li><li><code>command</code>: The command to run, which is the Docker <code>CMD</code> equivalent - defaults to whatever <code>CMD</code> is set to in the <code>Dockerfile</code> used to create the Machine image.†† </li></ol> <p>†You can get valid values for the <code>size</code> option by running <code>fly platform vm-sizes</code>.</p> <p>††It&rsquo;s in array form, e.g. 
<code>[&quot;php&quot;, &quot;artisan&quot;, &quot;foo&quot;]</code>, so you may need to escape the double quotes if you&rsquo;re sending messages to SQS from a terminal.</p> <h2 id='we-did-a-lambda' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#we-did-a-lambda' aria-label='Anchor'></a><span class='plain-code'>We did a Lambda?</span></h2> <p>Fly.io isn&rsquo;t serverless, but it has all these primitives that add up to serverless. You have events; Fly.io has fast-booting VMs. They just make sense together!</p> <p>What we did here is use <a href='https://github.com/fly-apps/lambdo' title=''><strong class='font-semibold text-navy-950'>Lambdo</strong> to respond to events by spinning up a Machine</a>. Our code can process those events any way we want.</p> <p>What I like about this approach is how flexible it can be. We can choose the base image to use and the server type (even using GPU-enabled Machines) <em>per event</em>. Since we have full control over the Machine VMs responding to the events, we can do whatever we want inside of them. Pretty neat!</p></content> </entry> <entry> <title>Delegating tasks to Fly Machines</title> <link rel="alternate" href="https://fly.io/blog/delegate-tasks-to-fly-machines/"/> <id>https://fly.io/blog/delegate-tasks-to-fly-machines/</id> <published>2024-02-01T00:00:00+00:00</published> <updated>2024-04-12T18:23:39+00:00</updated> <media:thumbnail url="https://fly.io/blog/delegate-tasks-to-fly-machines/assets/delegate-tasks-to-fly-machines-thumb.webp"/> <content type="html"><div class="lead"><p>We’re Fly.io. We run apps for our users on hardware we host around the world. Leveraging Fly.io Machines and Fly.io’s private network can make delegating expensive tasks a breeze. 
It’s easy to <a href="/docs/speedrun/" title="">get started</a>!</p> </div> <p>There are many ways to delegate work in web applications, from using background workers to serverless architecture. In this article, we explore a new machine pattern that takes advantage of Fly Machines and distinct process groups to make quick work of resource-intensive tasks.</p> <h2 id='the-problem' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#the-problem' aria-label='Anchor'></a><span class='plain-code'>The Problem</span></h2> <p>Let&rsquo;s say you&rsquo;re building a web application that has a few tasks that demand a hefty amount of memory or CPU juice. Resizing images, for example, can require a shocking amount of memory, but you might not need that much memory <em>all</em> of the time, for handling most of your web requests. Why pay for all that horsepower when you don&rsquo;t need it most of the time?</p> <p>What if there&rsquo;s a different way to delegate these resource-intensive tasks?</p> <h2 id='the-solution' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#the-solution' aria-label='Anchor'></a><span class='plain-code'>The Solution</span></h2> <p>What if you could simply delegate these types of tasks to a more powerful machine <em>only</em> when necessary? Let&rsquo;s build an example of this method in a sample app. 
We&rsquo;ll be using Next.js today, but this pattern is framework (and language) agnostic.</p> <p>Here&rsquo;s how it will work:</p> <ul> <li>A request hits an endpoint that does some resource-intensive tasks </li><li>The request is passed on to a copy of your app that&rsquo;s running on a beefier machine </li><li>The beefy machine performs the intensive work and then hands the result back to the user via the &ldquo;weaker&rdquo; machine. </li></ul> <p><img alt="(A 3 panel comic of two characters, one small and one big and strong, both with computer screens for heads. Panel 1: Little guy hands the big guy a jar of pickles. Panel 2: Big guy opens the pickle jar. Panel 3: Big guy hands back the opened jar to the little guy, who is pleased; Illustration by Annie Sexton)" src="/blog/delegate-tasks-to-fly-machines/assets/./3-panel-comic-delegate-tasks-to-fly-machines.webp" /></p> <p>To demonstrate this task-delegation pattern, we&rsquo;re going to start with a single-page application that looks like this:</p> <p><img alt="(Screenshot of the demo app; it&#39;s a single-page app with the header and description &quot;Open Pickle Jar: You&#39;ve got a jar of pickles (a zip file of some high-def pickle photos) that you would like to open (resize and display below)&quot;. Under the description there are two inputs, one for width and one for height, and a button that says &quot;Open pickle jar&quot;)" src="/blog/delegate-tasks-to-fly-machines/assets/./pickle-jar-screenshot.webp" /></p> <p>Our &ldquo;Open Pickle Jar&rdquo; app is quite simple: you provide the width and height and it goes off and resizes some high-resolution photos to those dimensions (exciting!).</p> <p>If you&rsquo;d like to follow along, you can clone the <code>start-here</code> branch of this repository: <a href='https://github.com/fly-apps/open-pickle-jar' title=''>https://github.com/fly-apps/open-pickle-jar</a>. The final changes are visible on the <code>main</code> branch. 
This app uses S3 for image storage, so you&rsquo;ll need to create a bucket called <code>open-pickle-jar</code> and provide <code>AWS_REGION</code>, <code>AWS_ACCESS_KEY_ID</code>, and <code>AWS_SECRET_ACCESS_KEY</code> as environment variables.</p> <p>This task is really just a stand-in for any HTTP request that kicks off a resource-intensive task. Get the request from the user, delegate it to a more powerful machine, and then return the result to the user. It&rsquo;s what happens when you can&rsquo;t open a pickle jar, and you ask for someone to help.</p> <p>Before we start, let&rsquo;s define some terms and what they mean on Fly.io:</p> <ul> <li><strong class='font-semibold text-navy-950'>Machines:</strong> Extremely fast-booting VMs. They can exist in different regions and even run different processes. </li><li><strong class='font-semibold text-navy-950'>App:</strong> An abstraction for a group of Machines running your code on Fly.io, along with the configuration, provisioned resources, and data we need to keep track of to run and route to your Machines. </li><li><strong class='font-semibold text-navy-950'>Process group:</strong> A collection of Machines running a specific process. Many apps only run a single process (typically a public-facing HTTP server), but you can define any number of them. </li><li><strong class='font-semibold text-navy-950'>fly.toml:</strong> A configuration file for deploying apps on Fly.io where you can set things like Machine specs, process groups, regions, and more. 
</li></ul> <hr> <p><strong class='font-semibold text-navy-950'>Setup Overview</strong></p> <p>Here&rsquo;s what we&rsquo;ll need for our application:</p> <ol> <li>A <strong class='font-semibold text-navy-950'>route</strong> that performs our resource-intensive task </li><li>A <strong class='font-semibold text-navy-950'>wrapper function</strong> that either: <ol> <li>Runs our resource-intensive task OR </li><li>Forwards the request to our more powerful Machine </li></ol> </li><li><strong class='font-semibold text-navy-950'>Two process groups</strong> running the <em>same process</em> but with differing Machine specs: <ol> <li>One for accepting HTTP traffic and handling most requests (let&rsquo;s call it <code>web</code>) </li><li>One internal-only group for doing the heavy lifting (let&rsquo;s call it <code>worker</code>) </li></ol> </li></ol> <p>In short, this is what our architecture will look like, a standard web and worker duo.</p> <p><img alt="(A simple graphic illustrating two servers; a small box containing &quot;npm run start&quot; and a larger box containing the same thing. The small is labeled &quot;web&quot; and the larger box is labeled &quot;worker&quot;.)" src="/blog/delegate-tasks-to-fly-machines/assets/./web-worker.webp" /></p> <h3 id='creating-our-route' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#creating-our-route' aria-label='Anchor'></a><span class='plain-code'>Creating our route</span></h3> <p>Next.js has two distinct routing patterns: Pages and App router. 
We&rsquo;ll use the App router in our example since it&rsquo;s the preferred method moving forward.</p> <p>Under your <code>/app</code> directory, create a new folder called <code>/open-pickle-jar</code> containing a <code>route.ts</code> .</p> <p>(We&rsquo;re using TypeScript here, but feel free to use normal JavaScript if you prefer!)</p> <div class="highlight-wrapper group relative "> <button type="button" class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-wrap-target="#code-lg2jvd1h" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"><g buffered-rendering="static"><path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /><path d="M11.081 6.466L9.533 8.037l1.548 1.571" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"> Wrap text </span> </button> <button type="button" class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-copy-target="sibling" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"><g buffered-rendering="static"><path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /><path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 
8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"> Copy to clipboard </span> </button> <div class='highlight relative group'> <pre class='highlight '><code id="code-lg2jvd1h">... /app /open-pickle-jar route.ts ... </code></pre> </div> </div> <p>Inside <code>route.ts</code> we&rsquo;ll flesh out our endpoint:</p> <div class="highlight-wrapper group relative typescript"> <button type="button" class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-wrap-target="#code-x0guz9t5" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"><g buffered-rendering="static"><path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /><path d="M11.081 6.466L9.533 8.037l1.548 1.571" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"> Wrap text </span> </button> <button type="button" class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-copy-target="sibling" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"><g buffered-rendering="static"><path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 
.995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /><path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"> Copy to clipboard </span> </button> <div class='highlight relative group'> <pre class='highlight '><code id="code-x0guz9t5"><span class="c1">// /app/open-pickle-jar/route.ts</span> <span class="k">import</span> <span class="nx">delegateToWorker</span> <span class="k">from</span> <span class="dl">"</span><span class="s2">@/utils/delegateToWorker</span><span class="dl">"</span><span class="p">;</span> <span class="k">import</span> <span class="p">{</span> <span class="nx">NextRequest</span><span class="p">,</span> <span class="nx">NextResponse</span> <span class="p">}</span> <span class="k">from</span> <span class="dl">"</span><span class="s2">next/server</span><span class="dl">"</span><span class="p">;</span> <span class="k">import</span> <span class="p">{</span> <span class="nx">openPickleJar</span> <span class="p">}</span> <span class="k">from</span> <span class="dl">"</span><span class="s2">../openPickleJar</span><span class="dl">"</span><span class="p">;</span> <span class="k">export</span> <span class="k">async</span> <span class="kd">function</span> <span class="nx">POST</span><span class="p">(</span><span class="nx">request</span><span class="p">:</span> <span class="nx">NextRequest</span><span class="p">)</span> <span class="p">{</span> <span class="kd">const</span> <span class="p">{</span> <span class="nx">width</span><span class="p">,</span> <span class="nx">height</span> <span class="p">}</span> <span class="o">=</span> <span class="k">await</span> <span class="nx">request</span><span class="p">.</span><span class="nx">json</span><span class="p">();</span> <span class="kd">const</span> <span class="nx">path</span> <span class="o">=</span> <span 
class="nx">request</span><span class="p">.</span><span class="nx">nextUrl</span><span class="p">.</span><span class="nx">pathname</span><span class="p">;</span> <span class="kd">const</span> <span class="nx">body</span> <span class="o">=</span> <span class="k">await</span> <span class="nx">delegateToWorker</span><span class="p">(</span><span class="nx">path</span><span class="p">,</span> <span class="nx">openPickleJar</span><span class="p">,</span> <span class="p">{</span> <span class="nx">width</span><span class="p">,</span> <span class="nx">height</span> <span class="p">});</span> <span class="k">return</span> <span class="nx">NextResponse</span><span class="p">.</span><span class="nx">json</span><span class="p">(</span><span class="nx">body</span><span class="p">);</span> <span class="p">}</span> </code></pre> </div> </div> <p>The function <code>openPickleJar</code> that we&rsquo;re importing contains our resource-intensive task, which in this case is extracting images from a <code>.zip</code> file, resizing them all to the new dimensions, and returning the new image URLs.</p> <p>The <code>POST</code> function is how you define a route for a specific HTTP method in Next.js, and ours calls a function <code>delegateToWorker</code> that accepts the path of the current endpoint (<code>/open-pickle-jar</code>), our resource-intensive function, and the same request parameters. 
This function doesn&rsquo;t yet exist, so let&rsquo;s build that next!</p> <h3 id='creating-our-wrapper-function' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#creating-our-wrapper-function' aria-label='Anchor'></a><span class='plain-code'>Creating our wrapper function</span></h3> <p>Now that we&rsquo;ve set up our endpoint, let&rsquo;s flesh out the wrapper function that delegates our request to a more powerful machine.</p> <p>We haven&rsquo;t defined our process groups just yet, but if you recall, the plan is to have two:</p> <ol> <li><code>web</code> - Our standard web server </li><li><code>worker</code> - For opening pickle jars (i.e., doing resource-intensive work). It&rsquo;s essentially a duplicate of <code>web</code>, but running on beefier Machines. </li></ol> <p>Here&rsquo;s what we want this wrapper function to do:</p> <ul> <li>If the current Machine is a <code>worker</code>, execute the resource-intensive task </li><li>If the current Machine is NOT a <code>worker</code>, make a new request to the same endpoint on a <code>worker</code> Machine </li></ul> <p>Inside your <code>/utils</code> directory, create a file called <code>delegateToWorker.ts</code> with the following content:</p> <div class="highlight-wrapper group relative typescript"> <button type="button" class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-wrap-target="#code-c07fgdhq" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" 
stroke-linecap="round" stroke-linejoin="round"><g buffered-rendering="static"><path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /><path d="M11.081 6.466L9.533 8.037l1.548 1.571" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"> Wrap text </span> </button> <button type="button" class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-copy-target="sibling" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"><g buffered-rendering="static"><path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /><path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"> Copy to clipboard </span> </button> <div class='highlight relative group'> <pre class='highlight '><code id="code-c07fgdhq"><span class="c1">// /utils/delegateToWorker.ts</span> <span class="k">export</span> <span class="k">default</span> <span class="k">async</span> <span class="kd">function</span> <span class="nx">delegateToWorker</span><span class="p">(</span><span class="nx">path</span><span class="p">:</span> <span class="kr">string</span><span class="p">,</span> <span class="nx">func</span><span class="p">:</span> <span class="p">(...</span><span class="nx">args</span><span class="p">:</span> <span class="kr">any</span><span class="p">[])</span> <span class="o">=&gt;</span> <span 
class="nb">Promise</span><span class="o">&lt;</span><span class="kr">any</span><span class="o">&gt;</span><span class="p">,</span> <span class="nx">args</span><span class="p">:</span> <span class="nx">object</span><span class="p">):</span> <span class="nb">Promise</span><span class="o">&lt;</span><span class="kr">any</span><span class="o">&gt;</span> <span class="p">{</span> <span class="k">if</span> <span class="p">(</span><span class="nx">process</span><span class="p">.</span><span class="nx">env</span><span class="p">.</span><span class="nx">FLY_PROCESS_GROUP</span> <span class="o">===</span> <span class="dl">'</span><span class="s1">worker</span><span class="dl">'</span><span class="p">)</span> <span class="p">{</span> <span class="nx">console</span><span class="p">.</span><span class="nx">log</span><span class="p">(</span><span class="dl">'</span><span class="s1">running on the worker...</span><span class="dl">'</span><span class="p">);</span> <span class="k">return</span> <span class="nx">func</span><span class="p">({...</span><span class="nx">args</span><span class="p">});</span> <span class="p">}</span> <span class="k">else</span> <span class="p">{</span> <span class="nx">console</span><span class="p">.</span><span class="nx">log</span><span class="p">(</span><span class="dl">'</span><span class="s1">sending new request to worker...</span><span class="dl">'</span><span class="p">);</span> <span class="kd">const</span> <span class="nx">workerHost</span> <span class="o">=</span> <span class="nx">process</span><span class="p">.</span><span class="nx">env</span><span class="p">.</span><span class="nx">NODE_ENV</span> <span class="o">===</span> <span class="dl">'</span><span class="s1">development</span><span class="dl">'</span> <span class="p">?</span> <span class="dl">'</span><span class="s1">localhost:3001</span><span class="dl">'</span> <span class="p">:</span> <span class="s2">`worker.process.</span><span class="p">${</span><span 
class="nx">process</span><span class="p">.</span><span class="nx">env</span><span class="p">.</span><span class="nx">FLY_APP_NAME</span><span class="p">}</span><span class="s2">.internal:3000`</span><span class="p">;</span> <span class="kd">const</span> <span class="nx">response</span> <span class="o">=</span> <span class="k">await</span> <span class="nx">fetch</span><span class="p">(</span><span class="s2">`http://</span><span class="p">${</span><span class="nx">workerHost</span><span class="p">}${</span><span class="nx">path</span><span class="p">}</span><span class="s2">`</span><span class="p">,</span> <span class="p">{</span> <span class="na">method</span><span class="p">:</span> <span class="dl">'</span><span class="s1">POST</span><span class="dl">'</span><span class="p">,</span> <span class="na">headers</span><span class="p">:</span> <span class="p">{</span> <span class="dl">'</span><span class="s1">Content-Type</span><span class="dl">'</span><span class="p">:</span> <span class="dl">'</span><span class="s1">application/json</span><span class="dl">'</span> <span class="p">},</span> <span class="na">body</span><span class="p">:</span> <span class="nx">JSON</span><span class="p">.</span><span class="nx">stringify</span><span class="p">({...</span><span class="nx">args</span> <span class="p">})</span> <span class="p">});</span> <span class="k">return</span> <span class="nx">response</span><span class="p">.</span><span class="nx">json</span><span class="p">();</span> <span class="p">}</span> <span class="p">}</span> </code></pre> </div> </div> <p>In our <code>else</code> section, you&rsquo;ll notice that while developing locally (aka, when <code>NODE_ENV</code> is <code>development</code>) we define the hostname of our <code>worker</code> process to be <code>localhost:3001</code>. 
Typically Next.js apps run on port <code>3000</code>, so while testing our app locally, we can have two instances of our process running in different terminal shells:</p> <ul> <li><code>npm run dev</code> - This will run on <code>localhost:3000</code> and will act as our local <code>web</code> process </li><li><code>FLY_PROCESS_GROUP=worker npm run dev</code> - This will run on <code>localhost:3001</code> and will act as our <code>worker</code> process (Next.js should auto-increment the port if the original <code>3000</code> is already in use) </li></ul> <p>Also, if you&rsquo;re wondering about the <code>FLY_PROCESS_GROUP</code> and <code>FLY_APP_NAME</code> constants, these are <a href='https://fly.io/docs/reference/runtime-environment/' title=''>Fly.io-specific runtime environment variables</a> available on all apps.</p> <h3 id='accessing-our-worker-machines-internal' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#accessing-our-worker-machines-internal' aria-label='Anchor'></a><span class='plain-code'>Accessing our <code>worker</code> Machines (<code>.internal</code>)</span></h3> <p>Now, when this code is running in production (aka <code>NODE_ENV</code> is NOT <code>development</code>) you&rsquo;ll see that we&rsquo;re using a unique hostname to access our <code>worker</code> Machine.</p> <p>Apps belonging to the same organization on Fly.io are provided a number of <a href='https://fly.io/docs/networking/private-networking/#fly-io-internal-addresses' title=''>internal addresses</a>. These <code>.internal</code> addresses let you point to different Apps and Machines in your private network. 
For example:</p> <ul> <li><code>&lt;region&gt;.&lt;app name&gt;.internal</code> – To reach app instances in a particular region, like <code>gru.my-cool-app.internal</code> </li><li><code>&lt;app instance ID&gt;.&lt;app name&gt;.internal</code> - To reach a <em>specific</em> app instance. </li><li><code>&lt;process group&gt;.process.&lt;app name&gt;.internal</code> - To target app instances belonging to a specific process group. <strong class='font-semibold text-navy-950'>This is what we&rsquo;re using in our app.</strong> </li></ul> <p>Since our <code>worker</code> process group is running the same process as our <code>web</code> process (in our case, <code>npm run start</code>), we&rsquo;ll also need to make sure we use the same internal port (<code>3000</code>).</p> <h3 id='defining-our-process-groups-and-machines' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#defining-our-process-groups-and-machines' aria-label='Anchor'></a><span class='plain-code'>Defining our process groups and Machines</span></h3> <p>The last thing to do will be to define our two process groups and their respective Machine specs. We&rsquo;ll do this by editing our <code>fly.toml</code> configuration.</p> <p>If you don&rsquo;t have this file, go ahead and create a blank one and use the content below, but replace <code>app = open-pickle-jar</code> with your app&rsquo;s name, as well as your preferred <code>primary_region</code>. If you don&rsquo;t know what region you&rsquo;d like to deploy to, <a href='https://fly.io/docs/reference/regions/' title=''>here&rsquo;s the list of them</a>.</p> <p><strong class='font-semibold text-navy-950'>Before you deploy:</strong> Note that deploying this example app will spin up <strong class='font-semibold text-navy-950'>billable</strong> machines. 
Please feel free to alter the Machine (<code>[[vm]]</code>) specs listed here to ones that suit your budget or app&rsquo;s needs.</p> <div class="highlight-wrapper group relative "> <button type="button" class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-wrap-target="#code-ffgx1pjb" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"><g buffered-rendering="static"><path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /><path d="M11.081 6.466L9.533 8.037l1.548 1.571" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"> Wrap text </span> </button> <button type="button" class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-copy-target="sibling" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"><g buffered-rendering="static"><path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /><path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"> Copy to clipboard 
</span> </button> <div class='highlight relative group'> <pre class='highlight '><code id="code-ffgx1pjb">app = "open-pickle-jar"
primary_region = "sea"

[build]

[processes]
web = "npm run start"
worker = "npm run start"

[http_service]
internal_port = 3000
force_https = true
auto_stop_machines = true
auto_start_machines = true
min_machines_running = 1
processes = ["web"]

[[vm]]
cpu_kind = "shared"
cpus = 1
memory_mb = 1024
processes = ["web"]

[[vm]]
size = "performance-4x"
processes = ["worker"]
</code></pre> </div> </div> <p>And that&rsquo;s it! With our <code>fly.toml</code> finished, we&rsquo;re ready to deploy our app!</p> <p><img src="https://slabstatic.com/prod/uploads/p1b436gf/posts/images/tH4GaGLVaDkh3RhIwCpiDRX3.png" /></p> <h2 id='discussion' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#discussion' aria-label='Anchor'></a><span class='plain-code'>Discussion</span></h2> <p>Today we built a machine pattern on top of Fly.io. This pattern allows us to have a lighter request server that can delegate certain tasks to a stronger server, meaning that we can have one Machine do all the heavy lifting that could block everything else while the other handles all the simple tasks for users. 
With this in mind, this is a fairly naïve implementation, and we can make this much better:</p> <h3 id='using-a-queue-for-better-resiliency' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#using-a-queue-for-better-resiliency' aria-label='Anchor'></a><span class='plain-code'>Using a queue for better resiliency</span></h3> <p>In its current state, our code isn&rsquo;t very resilient to failed requests. For this reason, you may want to consider keeping track of jobs in a queue with Redis (similar to Sidekiq in Ruby-land). When you have work you want to do, put it in the queue. Your queue worker would have to write the result somewhere (e.g., in Redis) that the application could fetch when it&rsquo;s ready.</p> <h3 id='starting-stopping-worker-machines' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#starting-stopping-worker-machines' aria-label='Anchor'></a><span class='plain-code'>Starting/stopping worker Machines</span></h3> <p>The benefit of this pattern is that you can limit how many &ldquo;beefy&rdquo; Machines you need to have available at any given time. Our demo app doesn&rsquo;t dictate how many <code>worker</code> Machines to have at any given time, but by adding timeouts you could elect to start and stop them as needed.</p> <p>Now, you may think that constantly starting and stopping Machines might incur higher response times, but note that we are NOT talking about creating/destroying Machines. Starting and stopping Machines only takes as long as it takes to start your web server (i.e. <code>npm run start</code>). 
The best part is that <strong class='font-semibold text-navy-950'>Fly.io does not charge for the CPU and RAM usage of stopped Machines.</strong> <a href='https://community.fly.io/t/we-are-going-to-start-collecting-charges-for-stopped-machines-rootfs-starting-april-25th/17825' title=''>We will charge for storage of their root filesystems on disk, starting April 25th, 2024</a>. Stopped Machines will still be much cheaper than running ones.</p> <h3 id='what-about-serverless-functions' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#what-about-serverless-functions' aria-label='Anchor'></a><span class='plain-code'>What about serverless functions?</span></h3> <p>This &ldquo;delegate to a beefy machine&rdquo; pattern is similar to serverless functions with platforms like AWS Lambda. The main difference is that serverless functions usually require you to segment your application into a bunch of small pieces, whereas the method discussed today just uses the app framework that you deploy to production. Each pattern has its own benefits and downsides.</p> <h2 id='conclusion' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#conclusion' aria-label='Anchor'></a><span class='plain-code'>Conclusion</span></h2> <p>The pattern outlined here is one more tool in your arsenal for scaling applications. By utilizing Fly.io&rsquo;s private network and <code>.internal</code> domains, it&rsquo;s quick and easy to pass work between different processes that run our app. 
If you&rsquo;d like to learn about more methods for scaling tasks in your applications, check out <a href='https://fly.io/blog/rethinking-serverless-with-flame/' title=''>Rethinking Serverless with FLAME</a> by Chris McCord and <a href='https://fly.io/blog/print-on-demand/' title=''>Print on Demand</a> by Sam Ruby.</p> <figure class="post-cta"> <figcaption> <h1>Get more done on Fly.io</h1> <p>Fly.io has fast booting machines at the ready for your dynamic workloads. It&rsquo;s easy to get started. You can be off and running in minutes.</p> <a class="btn btn-lg" href="https://fly.io/docs/speedrun/"> Deploy something today! <span class='opacity:50'>→</span> </a> </figcaption> <div class="image-container"> <img src="/static/images/cta-rabbit.webp" srcset="/static/images/[email protected] 2x" alt=""> </div> </figure></content> </entry> <entry> <title>Macaroons Escalated Quickly</title> <link rel="alternate" href="https://fly.io/blog/macaroons-escalated-quickly/"/> <id>https://fly.io/blog/macaroons-escalated-quickly/</id> <published>2024-01-31T00:00:00+00:00</published> <updated>2024-04-12T18:23:39+00:00</updated> <media:thumbnail url="https://fly.io/blog/macaroons-escalated-quickly/assets/evil-cookies-thumb.webp"/> <content type="html"><div class="lead"><p>We’re Fly.io and we transmute containers into VMs, running them on our hardware around the world. We built a new security token system, and can I tell you the good news about our lord and savior the Macaroon?</p> </div><h2 id='1' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#1' aria-label='Anchor'></a><span class='plain-code'>1</span></h2> <p>Let&rsquo;s implement an API token together. 
It&rsquo;s a design called &ldquo;Macaroons&rdquo;, but don&rsquo;t get hung up on that yet.</p> <p>First some <button toggle="#includes">throat-clearing</button>. Then:</p> <div id="includes" toggle-content="" aria-label="show very boring code"><div class="highlight-wrapper group relative python"> <button type="button" class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-wrap-target="#code-1c9mit0n"> <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"><g buffered-rendering="static"><path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959"></path><path d="M11.081 6.466L9.533 8.037l1.548 1.571"></path></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"> Wrap text </span> </button> <button type="button" class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-copy-target="sibling"> <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"><g buffered-rendering="static"><path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z"></path><path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 
1.815v1.617"></path></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"> Copy to clipboard </span> </button> <div class="highlight relative group"> <pre class="highlight "><code id="code-1c9mit0n"><span class="kn">import</span> <span class="nn">sys</span> <span class="kn">import</span> <span class="nn">os</span> <span class="kn">import</span> <span class="nn">json</span> <span class="kn">import</span> <span class="nn">hmac</span> <span class="k">as</span> <span class="n">hm</span> <span class="kn">from</span> <span class="nn">base64</span> <span class="kn">import</span> <span class="n">b64encode</span><span class="p">,</span> <span class="n">b64decode</span> <span class="kn">from</span> <span class="nn">hashlib</span> <span class="kn">import</span> <span class="n">sha256</span> <span class="k">def</span> <span class="nf">hmac</span><span class="p">(</span><span class="n">k</span><span class="p">,</span> <span class="n">v</span><span class="p">):</span> <span class="k">return</span> <span class="n">hm</span><span class="p">.</span><span class="n">new</span><span class="p">(</span><span class="n">k</span><span class="p">,</span> <span class="n">v</span><span class="p">,</span> <span class="n">sha256</span><span class="p">).</span><span class="n">digest</span><span class="p">()</span> <span class="k">def</span> <span class="nf">enc</span><span class="p">(</span><span class="n">x</span><span class="p">):</span> <span class="k">return</span> <span class="n">b64encode</span><span class="p">(</span><span class="n">x</span><span class="p">)</span> <span class="k">def</span> <span class="nf">dec</span><span class="p">(</span><span class="n">x</span><span class="p">):</span> <span class="k">return</span> <span class="n">b64decode</span><span class="p">(</span><span class="n">x</span><span class="p">)</span> </code></pre> </div> </div></div><div class="highlight-wrapper group relative python"> <button type="button" 
class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-wrap-target="#code-7t25lxr4" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"><g buffered-rendering="static"><path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /><path d="M11.081 6.466L9.533 8.037l1.548 1.571" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"> Wrap text </span> </button> <button type="button" class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-copy-target="sibling" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"><g buffered-rendering="static"><path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /><path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"> Copy to clipboard </span> </button> <div class='highlight relative group'> <pre class='highlight '><code id="code-7t25lxr4"><span class="k">def</span> <span class="nf">blank_token</span><span class="p">(</span><span 
class="n">uid</span><span class="p">,</span> <span class="n">key</span><span class="p">):</span> <span class="n">nonce</span> <span class="o">=</span> <span class="n">enc</span><span class="p">(</span><span class="s">":"</span><span class="p">.</span><span class="n">join</span><span class="p">([</span><span class="nb">str</span><span class="p">(</span><span class="n">uid</span><span class="p">),</span> <span class="n">os</span><span class="p">.</span><span class="n">urandom</span><span class="p">(</span><span class="mi">16</span><span class="p">)]))</span> <span class="k">return</span> <span class="n">json</span><span class="p">.</span><span class="n">dumps</span><span class="p">([</span><span class="n">nonce</span><span class="p">,</span> <span class="n">enc</span><span class="p">(</span><span class="n">hmac</span><span class="p">(</span><span class="n">key</span><span class="p">,</span> <span class="n">nonce</span><span class="p">))])</span> </code></pre> </div> </div><div class="right-sidenote"><p>Bearer tokens: like cookies, blobs you attach to a request (usually in an HTTP header).</p> </div> <p>We&rsquo;re going to build a minimally-stateful bearer token, a blob signed with HMAC. Nothing fancy so far. <a href='https://api.rubyonrails.org/classes/ActiveSupport/MessageVerifier.html' title=''>Rails has done this</a> for a decade and a half.</p> <p>There&rsquo;s a <a href='http://cryto.net/~joepie91/blog/2016/06/13/stop-using-jwt-for-sessions/' title=''>fashion in API security for stateless tokens</a>, which encode all the data you&rsquo;d need to check any request accompanied by that token – without a database lookup. Stateless tokens have some nice properties, and some less-nice. Our tokens won&rsquo;t be stateless: they carry a user ID, with which we&rsquo;ll look up the HMAC key to verify it. 
But they&rsquo;ll stake out a sort of middle ground.</p> <div class="highlight-wrapper group relative python"> <div class='highlight relative group'> <pre class='highlight '><code id="code-r52d35ga"><span class="k">def</span> <span class="nf">attenuate</span><span class="p">(</span><span class="n">macStr</span><span class="p">,</span> <span class="n">cav</span><span class="p">):</span>
    <span class="n">mac</span> <span class="o">=</span> <span class="n">json</span><span class="p">.</span><span class="n">loads</span><span class="p">(</span><span class="n">macStr</span><span class="p">)</span>
    <span class="n">cavStr</span> <span class="o">=</span> <span class="n">json</span><span class="p">.</span><span class="n">dumps</span><span class="p">(</span><span class="n">cav</span><span class="p">)</span>
    <span class="n">oldTail</span> <span class="o">=</span> <span class="n">dec</span><span class="p">(</span><span class="n">mac</span><span class="p">[</span><span class="o">-</span><span class="mi">1</span><span class="p">])</span>
    <span class="n">newTail</span> <span class="o">=</span> <span class="n">enc</span><span class="p">(</span><span class="n">hmac</span><span class="p">(</span><span class="n">oldTail</span><span class="p">,</span> <span class="n">cavStr</span><span class="p">))</span>
    <span class="k">return</span> <span class="n">json</span><span class="p">.</span><span class="n">dumps</span><span class="p">(</span><span class="n">mac</span><span class="p">[</span><span class="mi">0</span><span class="p">:</span><span class="o">-</span><span class="mi">1</span><span class="p">]</span> <span class="o">+</span> <span class="p">[</span><span class="n">cavStr</span><span class="p">,</span> <span class="n">newTail</span><span class="p">])</span>

<span class="n">m0</span> <span class="o">=</span> <span class="n">blank_token</span><span class="p">(</span><span class="mi">10</span><span class="p">,</span> <span class="n">keys</span><span class="p">[</span><span class="mi">10</span><span class="p">])</span>
<span class="n">m1</span> <span class="o">=</span> <span class="n">attenuate</span><span class="p">(</span><span class="n">m0</span><span class="p">,</span> <span class="p">{</span><span class="s">'path'</span><span class="p">:</span> <span class="s">'/images'</span><span class="p">})</span>
<span class="n">m2</span> <span class="o">=</span> <span class="n">attenuate</span><span class="p">(</span><span class="n">m1</span><span class="p">,</span> <span class="p">{</span><span class="s">'op'</span><span class="p">:</span> <span class="s">'read'</span><span class="p">})</span>
</code></pre> </div> </div> <p>Let&rsquo;s add some stuff.</p> <p>The meat of our tokens will be a series of claims we call &ldquo;caveats&rdquo;. We call them that because each claim further restricts what the token authorizes. After <code>{&#39;path&#39;: &#39;/images&#39;}</code>, this token only allows operations that happen underneath the <code>/images</code> directory. Then, after <code>{&#39;op&#39;: &#39;read&#39;}</code>, it allows only reads, not writes.</p> <p>(I guess we&rsquo;re building a file sharing system. Whatever.)</p> <p>Some important things about this design. First: by implication from the fact that caveats further restrict tokens, a token with no caveats restricts nothing. It&rsquo;s a god-mode token. Don&rsquo;t honor it.</p> <div class="right-sidenote"><p>In other words: the ordering of caveats doesn’t matter.</p> </div> <p>Second: the rule for checking caveats is very simple: every single caveat must pass, evaluating <code>True</code> against the request that carries it, in isolation and without reference to any other caveat. If any caveat evaluates <code>False</code>, the request fails. 
In that way, we ensure that adding caveats to a token can only ever weaken it.</p> <p>With that in mind, take a closer look at this code:</p> <div class="highlight-wrapper group relative python"> <div class='highlight relative group'> <pre class='highlight '><code id="code-n7mgbkwf"><span class="n">oldTail</span> <span class="o">=</span> <span class="n">dec</span><span class="p">(</span><span class="n">mac</span><span class="p">[</span><span class="o">-</span><span class="mi">1</span><span class="p">])</span>
<span class="n">newTail</span> <span class="o">=</span> <span class="n">enc</span><span class="p">(</span><span class="n">hmac</span><span class="p">(</span><span class="n">oldTail</span><span class="p">,</span> <span class="n">cavStr</span><span class="p">))</span>
</code></pre> </div> </div> <p>Every caveat is HMAC-signed independently, which is weird. Weirder still, the key for that HMAC is the output of the previous HMAC. The caveats chain together, and the HMAC of the last caveat becomes the &ldquo;tail&rdquo; of the token.</p> <p>Creating a new blank token for a particular user requires a key that the server (and probably only the server) knows. But adding a caveat doesn&rsquo;t! Anybody can add a caveat. 
In our design, you, the user, can edit your own API token.</p> <div class="highlight-wrapper group relative python"> <div class='highlight relative group'> <pre class='highlight '><code id="code-nx5eitys"><span class="k">def</span> <span class="nf">verify</span><span class="p">(</span><span class="n">macStr</span><span class="p">,</span> <span class="n">keys</span><span class="p">):</span>
    <span class="n">mac</span> <span class="o">=</span> <span class="n">json</span><span class="p">.</span><span class="n">loads</span><span class="p">(</span><span class="n">macStr</span><span class="p">)</span>
    <span class="n">nonce</span> <span class="o">=</span> <span class="n">dec</span><span class="p">(</span><span class="n">mac</span><span class="p">[</span><span class="mi">0</span><span class="p">]).</span><span class="n">split</span><span class="p">(</span><span class="s">":"</span><span class="p">)</span>
    <span class="n">key</span> <span class="o">=</span> <span class="n">keys</span><span class="p">[</span><span class="nb">int</span><span class="p">(</span><span class="n">nonce</span><span class="p">[</span><span class="mi">0</span><span class="p">])]</span>
    <span class="n">tail</span> <span class="o">=</span> <span class="s">""</span>
    <span class="k">for</span> <span class="n">cav</span> <span class="ow">in</span> <span class="n">mac</span><span class="p">[:</span><span class="o">-</span><span class="mi">1</span><span class="p">]:</span>
        <span class="n">tail</span> <span class="o">=</span> <span class="n">hmac</span><span class="p">(</span><span class="n">key</span><span class="p">,</span> <span class="n">cav</span><span class="p">)</span>
        <span class="n">key</span> <span class="o">=</span> <span class="n">tail</span>
    <span class="k">return</span> <span class="n">hm</span><span class="p">.</span><span class="n">compare_digest</span><span class="p">(</span><span class="n">tail</span><span class="p">,</span> <span class="n">dec</span><span class="p">(</span><span class="n">mac</span><span class="p">[</span><span class="o">-</span><span class="mi">1</span><span class="p">]))</span>

<span class="n">verify</span><span class="p">(</span><span class="n">m2</span><span class="p">,</span> <span class="n">keys</span><span class="p">)</span> <span class="c1"># =&gt; True </span></code></pre> </div> </div> <p>For completeness, and to make a point, there&rsquo;s the verification code. Look up the original secret key from the user ID, and then it&rsquo;s chained HMAC all the way down. The point I&rsquo;m making is that Macaroons are very simple.</p> <h2 id='2' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#2' aria-label='Anchor'></a><span class='plain-code'>2</span></h2> <p>Back in 2014, Google published <a href='https://storage.googleapis.com/pub-tools-public-publication-data/pdf/41892.pdf' title=''>a paper at NDSS</a> introducing &ldquo;Macaroons&rdquo;, a new kind of cookie. Since then, they&rsquo;ve become a sort of hipster shibboleth. But they&rsquo;re more talked about than implemented, which is a nice way to say that practically nobody uses them.</p> <p>Until now! I dragged Fly.io into implementing them. Suckers!</p> <p>We had a problem: our API tokens were much too powerful. We needed to scope them down and let them express roles, and I scoped up that project to replace OAuth2 tokens altogether. We now have what I think is one of the more expansive Macaroon implementations on the Internet.</p> <p>I dragged us into using Macaroons because I wanted us to use a hipster token format. 
Google designed Macaroons for a bigger reason: they hoped to replace browser cookies with something much more powerful.</p> <p>The problem with simple bearer tokens, like browser cookies or JWTs, is that they&rsquo;re prone to being stolen and replayed by attackers.</p> <div class="right-sidenote"><p>game-over: pentest jargon for “very bad”</p> </div> <p>Worse, a stolen token is usually a game-over condition. In most schemes, a bearer token is an all-access pass for the associated user. For some applications this isn&rsquo;t that big a deal, but then, <a href='https://neilmadden.blog/2020/09/09/macaroon-access-tokens-for-oauth-part-2-transactional-auth/' title=''>think about banking</a>. A banking app token that authorizes arbitrary transactions is a recipe for having a small heart attack on every HTTP request.</p> <div class="right-sidenote"><p>(Perfectly minimized API tokens: a software security holy grail)</p> </div> <p>Macaroons are user-editable tokens that enable JIT-generated least-privilege tokens. With minimal ceremony and no additional API requests, a banking app Macaroon lets you authorize a request with a caveat like, I don&rsquo;t know, <code>{&#39;maxAmount&#39;: &#39;$5&#39;}</code>. I mean, something way better than that, probably lots of caveats, not just one, but you get the idea: a token so minimized you feel safe sending it with your request. Ideally, a token that only authorizes that single, intended request.</p> <h2 id='3' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#3' aria-label='Anchor'></a><span class='plain-code'>3</span></h2> <p>That&rsquo;s not why we like Macaroons. We already assume our tokens aren&rsquo;t being stolen.</p> <p>In most systems, the developers come up with a permissions system, and you’re stuck with it. 
We run a public cloud platform, and people want a lot of different things from our permissions. The dream is, we (the low-level platform developers on the team) design a single permission system, one time, and go about our jobs never thinking about this problem again.</p> <p>Instead of thinking of all of our &ldquo;roles&rdquo; in advance, we just model our platform with caveats:</p> <ol> <li>Users belong to <code>Organizations</code>. </li><li><code>Organizations</code> own <code>Apps</code>. </li><li><code>Apps</code> contain <code>Machines</code> and <code>Volumes</code>. </li><li>To any of these things, you can <code>Read</code>, <code>Write</code>, <code>Create</code>, <code>Delete</code>, and/or <code>Control</code> <aside class="right-sidenote">control being change of state, like “start” and “stop”</aside>. </li><li>Some administrivia, like expiration (<code>ValidityWindow</code>), locking tokens to specific Fly Machines (<code>FromMachineSource</code>), and escape hatches like <code>Mutation</code> (for our GraphQL API). </li></ol> <div class="right-sidenote"><p>(this is a vibes-based notation, don’t think too hard about it)</p> </div> <p>Simplistic. 
But it expresses admin tokens:</p> <div class="highlight-wrapper group relative "> <div class='highlight relative group'> <pre class='highlight '><code id="code-x5iepn6s">Organization 4721, mask=* </code></pre> </div> </div> <p>And it expresses normal user tokens:</p> <div class="highlight-wrapper group relative "> 
<div class='highlight relative group'> <pre class='highlight '><code id="code-srsndejy">Organization 4721, mask=read,write,control
(App 123, mask=control), (App 345, mask=read,write,control)
</code></pre> </div> </div> <p>And also an auditor-only token for that user:</p> <div class="highlight-wrapper group relative "> <div class='highlight relative group'> <pre class='highlight '><code id="code-jh9ga1bt">Organization 4721, mask=read,write,control
(App 123, mask=control), (App 345, mask=read,write,control)
Organization 4721, mask=read
</code></pre> </div> </div><div class="right-sidenote"><p>(our deploy tokens are more complicated than this)</p> </div> <p>Or a deployment-only token, for a CI/CD system:</p> <div class="highlight-wrapper group relative "> <div class='highlight relative group'> <pre class='highlight '><code id="code-pe18x39a">Organization 4721, mask=write,control
(App 123, mask=*)
</code></pre> </div> </div> <p>Those are just the roles we came up with. Users can invent others. The important thing is that they don&rsquo;t have to bother me about them.</p> <h2 id='4' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#4' aria-label='Anchor'></a><span class='plain-code'>4</span></h2> <p>Astute readers will have noticed by now that we haven&rsquo;t shown any code that actually evaluates a caveat. That&rsquo;s because it&rsquo;s boring, and I&rsquo;m too lazy to write it out. Got an <code>Organization</code> token for <code>image-hosting</code> that allows <code>Reads</code>? Ok; check and make sure the incoming request is for an asset of <code>image-hosting</code>, and that it’s a <code>Read</code>. Whatever code you came up with, it’d be fine.</p> <p>These straightforward restrictions are called &ldquo;first party caveats&rdquo;. The first party is us, the platform. 
We&rsquo;ve got all the information we need to check them.</p> <p>Let&rsquo;s kit out our token format some more.</p> <div class="highlight-wrapper group relative python"> <div class='highlight relative group'> <pre class='highlight '><code id="code-rvmob8wx"><span class="k">def</span> <span class="nf">third_party_caveat</span><span class="p">(</span><span class="n">ka</span><span class="p">,</span> <span class="n">tail</span><span class="p">,</span> <span class="n">msg</span><span class="p">,</span> <span class="n">url</span><span class="p">):</span>
    <span class="n">crk</span> <span class="o">=</span> <span class="n">os</span><span class="p">.</span><span class="n">urandom</span><span class="p">(</span><span class="mi">16</span><span class="p">)</span>
    <span class="n">ticket</span> <span class="o">=</span> <span class="n">enc</span><span class="p">(</span><span class="n">encrypt</span><span class="p">(</span><span class="n">ka</span><span class="p">,</span> <span class="n">json</span><span class="p">.</span><span class="n">dumps</span><span class="p">({</span>
        <span class="s">'crk'</span><span class="p">:</span> <span class="n">enc</span><span class="p">(</span><span class="n">crk</span><span class="p">),</span>
        <span class="s">'msg'</span><span class="p">:</span> <span class="n">msg</span>
    <span class="p">})))</span>
    <span class="n">challenge</span> <span class="o">=</span> <span class="n">enc</span><span class="p">(</span><span class="n">encrypt</span><span class="p">(</span><span class="n">tail</span><span class="p">,</span> <span class="n">crk</span><span class="p">))</span>
    <span class="k">return</span> <span class="p">{</span>
        <span class="s">'url'</span><span class="p">:</span> <span class="n">url</span><span class="p">,</span>
        <span class="s">'ticket'</span><span class="p">:</span> <span class="n">ticket</span><span class="p">,</span>
        <span class="s">'challenge'</span> <span class="p">:</span> <span class="n">challenge</span>
    <span class="p">}</span>

<span class="n">key</span> <span class="o">=</span> <span class="nb">bytes</span><span class="p">(</span><span class="s">"YELLOW SUBMARINE"</span><span class="p">)</span>
<span class="n">url</span> <span class="o">=</span> <span class="s">"https://canary.service"</span>
<span class="n">c3</span> <span class="o">=</span> <span class="n">third_party_caveat</span><span class="p">(</span><span class="n">key</span><span class="p">,</span> <span class="n">tail</span><span class="p">,</span> <span class="n">json</span><span class="p">.</span><span class="n">dumps</span><span class="p">({</span><span class="s">'user'</span><span class="p">:</span> <span class="s">'bobson.dugnutt'</span><span class="p">}),</span> <span class="n">url</span><span class="p">)</span>
<span class="n">m3</span> <span class="o">=</span> <span class="n">attenuate</span><span class="p">(</span><span class="n">m2</span><span class="p">,</span> <span class="n">c3</span><span class="p">)</span>
</code></pre> </div> </div> <p>Up till now, we&rsquo;ve gotten by with nothing but HMAC, which is one of the great charms of the design. Now we need to encrypt. There&rsquo;s no authenticated encryption in the Python standard library, but that won&rsquo;t stop us. <button toggle="#hmac-ctr">Ready to make some candy? 
Hand me that brake fluid!</button></p> <div id="hmac-ctr" toggle-content="" aria-label="show very silly code"><div class="highlight-wrapper group relative python"> <div class="highlight relative group"> <pre class="highlight "><code id="code-brvb3s1v"><span class="c1"># do i really need to say that i'm not serious about this? </span>
<span class="k">def</span> <span class="nf">hmactr</span><span class="p">(</span><span class="n">k</span><span class="p">,</span> <span class="n">n</span><span class="p">):</span>
    <span class="n">ks</span> <span class="o">=</span> <span class="n">hm</span><span class="p">.</span><span class="n">new</span><span class="p">(</span><span class="n">k</span><span class="o">+</span><span class="n">n</span><span class="p">)</span>
    <span class="k">for</span> <span class="n">counter</span> <span class="ow">in</span> <span class="nb">xrange</span><span class="p">(</span><span class="n">sys</span><span class="p">.</span><span class="n">maxint</span><span class="p">):</span>
        <span class="n">ks</span><span class="p">.</span><span class="n">update</span><span class="p">(</span><span class="nb">str</span><span class="p">(</span><span class="n">counter</span><span class="p">))</span>
        <span class="n">kbs</span> <span class="o">=</span> <span class="n">ks</span><span class="p">.</span><span class="n">digest</span><span class="p">()</span>
        <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">xrange</span><span class="p">(</span><span class="mi">16</span><span class="p">):</span>
            <span class="k">yield</span> <span class="n">kbs</span><span class="p">[</span><span class="n">i</span><span class="p">]</span>

<span class="k">def</span> <span class="nf">encrypt</span><span class="p">(</span><span class="n">k</span><span class="p">,</span> <span class="n">buf</span><span class="p">):</span>
    <span class="n">ak</span> <span class="o">=</span> <span class="n">hm</span><span class="p">.</span><span class="n">new</span><span class="p">(</span><span class="n">k</span><span class="p">,</span> <span class="s">'auth'</span><span class="p">).</span><span class="n">digest</span><span class="p">()</span>
    <span class="n">nonce</span> <span class="o">=</span> <span class="n">os</span><span class="p">.</span><span class="n">urandom</span><span class="p">(</span><span class="mi">16</span><span class="p">)</span>
    <span class="n">cipher</span> <span class="o">=</span> <span class="n">hmactr</span><span class="p">(</span><span class="n">hm</span><span class="p">.</span><span class="n">new</span><span class="p">(</span><span class="n">k</span><span class="p">,</span> <span class="s">'enc'</span><span class="p">).</span><span class="n">digest</span><span class="p">(),</span> <span class="n">nonce</span><span class="p">)</span>
    <span class="n">ctxt</span> <span class="o">=</span> <span class="nb">bytearray</span><span class="p">(</span><span class="n">buf</span><span class="p">)</span>
    <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">xrange</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="n">buf</span><span class="p">)):</span>
        <span class="n">ctxt</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">^=</span> <span class="nb">ord</span><span class="p">(</span><span class="n">cipher</span><span class="p">.</span><span class="nb">next</span><span class="p">())</span>
    <span class="n">res</span> <span class="o">=</span> <span class="n">nonce</span> <span class="o">+</span> <span class="nb">str</span><span class="p">(</span><span class="n">ctxt</span><span class="p">)</span>
    <span class="k">return</span> <span class="n">res</span> <span class="o">+</span> <span class="n">hm</span><span class="p">.</span><span class="n">new</span><span class="p">(</span><span class="n">ak</span><span class="p">,</span> <span class="n">res</span><span class="p">).</span><span class="n">digest</span><span class="p">()</span>

<span class="k">def</span> <span class="nf">decrypt</span><span class="p">(</span><span class="n">k</span><span class="p">,</span> <span class="n">buf</span><span class="p">):</span>
    <span class="n">ak</span> <span class="o">=</span> <span class="n">hm</span><span class="p">.</span><span class="n">new</span><span class="p">(</span><span class="n">k</span><span class="p">,</span> <span class="s">'auth'</span><span class="p">).</span><span class="n">digest</span><span class="p">()</span>
    <span class="k">if</span> <span class="ow">not</span> <span class="n">hm</span><span class="p">.</span><span class="n">compare_digest</span><span class="p">(</span><span class="n">buf</span><span class="p">[</span><span class="o">-</span><span class="mi">16</span><span class="p">:],</span> <span class="n">hm</span><span class="p">.</span><span class="n">new</span><span class="p">(</span><span class="n">ak</span><span class="p">,</span> <span class="n">buf</span><span class="p">[:</span><span class="o">-</span><span class="mi">16</span><span class="p">]).</span><span class="n">digest</span><span class="p">()):</span>
        <span class="k">return</span> <span class="bp">False</span>
    <span class="n">nonce</span> <span class="o">=</span> <span class="n">buf</span><span class="p">[:</span><span class="mi">16</span><span class="p">]</span>
    <span class="n">cipher</span> <span class="o">=</span> <span class="n">hmactr</span><span class="p">(</span><span class="n">hm</span><span class="p">.</span><span class="n">new</span><span class="p">(</span><span class="n">k</span><span class="p">,</span> <span class="s">'enc'</span><span class="p">).</span><span class="n">digest</span><span class="p">(),</span> <span class="n">nonce</span><span class="p">)</span>
    <span class="n">ptxt</span> <span class="o">=</span> <span class="nb">bytearray</span><span class="p">(</span><span class="n">buf</span><span class="p">[</span><span class="mi">16</span><span class="p">:</span><span class="o">-</span><span class="mi">16</span><span class="p">])</span>
    <span class="k">for</span> <span class="n">i</span> 
<span class="ow">in</span> <span class="nb">xrange</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="n">buf</span><span class="p">[</span><span class="mi">16</span><span class="p">:</span><span class="o">-</span><span class="mi">16</span><span class="p">])):</span> <span class="n">ptxt</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">^=</span> <span class="nb">ord</span><span class="p">(</span><span class="n">cipher</span><span class="p">.</span><span class="nb">next</span><span class="p">())</span> <span class="k">return</span> <span class="nb">str</span><span class="p">(</span><span class="n">ptxt</span><span class="p">)</span> </code></pre> </div> </div></div> <p>With &ldquo;third-party&rdquo; caveats comes a cast of characters. We&rsquo;re still the first party. You&rsquo;ll play the second party. The third party is any other system in the world that you trust: an SSO system, an audit log, a revocation checker, whatever.</p> <p>Here&rsquo;s the trick of the third-party caveat: our platform doesn&rsquo;t know what your caveat means, and it doesn&rsquo;t have to. Instead, when you see a third-party caveat in your token, you tear a ticket off it and exchange it for a &ldquo;discharge Macaroon&rdquo; with that third party. You submit both Macaroons together to us.</p> <p>Let&rsquo;s attenuate our token with a third-party caveat hooking it up to a &ldquo;canary&rdquo; service that generates a notice approximately any time the token is used.</p> <p><img src="/blog/macaroons-escalated-quickly/assets/third-party.png?1/2&amp;wrap-left" /></p> <p>To build that canary caveat, you first make a <code>ticket</code> that users of the token will hand to your canary, and then a <code>challenge</code> that Fly.io will use to verify discharges your checker spits out. The ticket and the challenge are both encrypted. The ticket is encrypted under <code>KA</code>, so your service can read it. 
The challenge is encrypted under the previous Macaroon tail, so only Fly.io can read it. Both hide yet another key, the random HMAC key <code>CRK</code> (&ldquo;caveat root key&rdquo;).</p> <p>In addition to <code>CRK</code>, the ticket contains a message, which says whatever you want it to; Fly.io doesn&rsquo;t care. Typically, the message describes some kind of additional checking you want your service to perform before spitting out a discharge token.</p> <div class="highlight-wrapper group relative python"> <button type="button" class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-wrap-target="#code-135v2c4d" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"><g buffered-rendering="static"><path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /><path d="M11.081 6.466L9.533 8.037l1.548 1.571" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"> Wrap text </span> </button> <button type="button" class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-copy-target="sibling" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"><g buffered-rendering="static"><path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 
.995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /><path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"> Copy to clipboard </span> </button> <div class='highlight relative group'> <pre class='highlight '><code id="code-135v2c4d"><span class="k">def</span> <span class="nf">discharge</span><span class="p">(</span><span class="n">ka</span><span class="p">,</span> <span class="n">ticket</span><span class="p">):</span>
    <span class="n">ptxt</span> <span class="o">=</span> <span class="n">decrypt</span><span class="p">(</span><span class="n">ka</span><span class="p">,</span> <span class="n">dec</span><span class="p">(</span><span class="n">ticket</span><span class="p">))</span>
    <span class="k">if</span> <span class="n">ptxt</span> <span class="o">==</span> <span class="bp">False</span><span class="p">:</span>
        <span class="k">return</span> <span class="bp">False</span>

    <span class="n">tbody</span> <span class="o">=</span> <span class="n">json</span><span class="p">.</span><span class="n">loads</span><span class="p">(</span><span class="n">ptxt</span><span class="p">)</span>
    <span class="c1"># not shown: do something with tbody['msg']</span>

    <span class="k">return</span> <span class="n">json</span><span class="p">.</span><span class="n">dumps</span><span class="p">([</span><span class="n">ticket</span><span class="p">,</span> <span class="n">enc</span><span class="p">(</span><span class="n">hmac</span><span class="p">(</span><span class="n">dec</span><span class="p">(</span><span class="n">tbody</span><span class="p">[</span><span class="s">'crk'</span><span class="p">]),</span> <span class="n">ticket</span><span class="p">))])</span>
</code></pre> </div> </div> <p>To authorize a request with a token that includes a third-party caveat for the canary service, you
need to get your hands on a corresponding discharge Macaroon. Normally, you do that by <code>POST</code>ing the ticket from the caveat to the service.</p> <p>Discharging is simple. The service, which holds <code>KA</code>, uses it to decrypt the ticket. It checks the message and makes some decisions. Finally, it mints a new macaroon, using <code>CRK</code>, recovered from the ticket, as the root key. The ticket itself is the nonce.</p> <p>If it wants, the third-party service can slap on a bunch of first-party caveats of its own. When we verify the Macaroon, we&rsquo;ll copy those caveats out and enforce them. Attenuation of a third-party discharge macaroon works like a normal macaroon.</p> <div class="highlight-wrapper group relative python"> <button type="button" class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-wrap-target="#code-gjymtoma" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"><g buffered-rendering="static"><path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /><path d="M11.081 6.466L9.533 8.037l1.548 1.571" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"> Wrap text </span> </button> <button type="button" class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-copy-target="sibling" > <svg 
class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"><g buffered-rendering="static"><path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /><path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"> Copy to clipboard </span> </button> <div class='highlight relative group'> <pre class='highlight '><code id="code-gjymtoma"><span class="k">def</span> <span class="nf">verify_third_party</span><span class="p">(</span><span class="n">tag</span><span class="p">,</span> <span class="n">cav</span><span class="p">,</span> <span class="n">discharges</span><span class="o">=</span><span class="p">[]):</span> <span class="n">crk</span> <span class="o">=</span> <span class="n">decrypt</span><span class="p">(</span><span class="n">tag</span><span class="p">,</span> <span class="n">dec</span><span class="p">(</span><span class="n">cav</span><span class="p">[</span><span class="s">'challenge'</span><span class="p">]))</span> <span class="k">if</span> <span class="n">crk</span> <span class="o">==</span> <span class="bp">False</span><span class="p">:</span> <span class="k">return</span> <span class="bp">False</span> <span class="n">discharge</span> <span class="o">=</span> <span class="bp">None</span> <span class="k">for</span> <span class="n">dcs</span> <span class="ow">in</span> <span class="n">discharges</span><span class="p">:</span> <span class="k">if</span> <span class="n">json</span><span class="p">.</span><span class="n">loads</span><span class="p">(</span><span class="n">dcs</span><span class="p">)[</span><span class="mi">0</span><span class="p">]</span> <span class="o">==</span> <span class="n">cav</span><span 
class="p">[</span><span class="s">'ticket'</span><span class="p">]:</span> <span class="n">discharge</span> <span class="o">=</span> <span class="n">dcs</span> <span class="k">break</span> <span class="k">if</span> <span class="ow">not</span> <span class="n">discharge</span><span class="p">:</span> <span class="k">return</span> <span class="bp">False</span> <span class="n">mac</span> <span class="o">=</span> <span class="n">json</span><span class="p">.</span><span class="n">loads</span><span class="p">(</span><span class="n">discharge</span><span class="p">)</span> <span class="n">key</span> <span class="o">=</span> <span class="n">crk</span> <span class="c1"># boring old stuff --------------------- </span> <span class="n">tag</span> <span class="o">=</span> <span class="s">""</span> <span class="k">for</span> <span class="n">cav</span> <span class="ow">in</span> <span class="n">mac</span><span class="p">[:</span><span class="o">-</span><span class="mi">1</span><span class="p">]:</span> <span class="n">tag</span> <span class="o">=</span> <span class="n">hmac</span><span class="p">(</span><span class="n">key</span><span class="p">,</span> <span class="n">cav</span><span class="p">)</span> <span class="n">key</span> <span class="o">=</span> <span class="n">tag</span> <span class="k">return</span> <span class="n">hm</span><span class="p">.</span><span class="n">compare_digest</span><span class="p">(</span><span class="n">tag</span><span class="p">,</span> <span class="n">dec</span><span class="p">(</span><span class="n">mac</span><span class="p">[</span><span class="o">-</span><span class="mi">1</span><span class="p">]))</span> </code></pre> </div> </div> <p>To verify tokens that have third-party caveats, start with the root Macaroon, walking the caveats like usual. At each third-party caveat, match the <code>ticket</code> from the caveat with the <code>nonce</code> on the discharge Macaroon. 
The key for the root Macaroon (its tail at that point in the walk) decrypts the <code>challenge</code> in the caveat, recovering <code>CRK</code>, which cryptographically verifies the discharge.</p> <p>(The Macaroons paper uses different terms: “caveat identifier” or <code>cId</code> for “ticket”, and “verification-key identifier” or <code>vId</code> for “challenge”. These names are self-evidently bad and our contribution to the state of the art is to replace them.)</p> <p>There are two big applications for third-party caveats in Popular Macaroon Thought. First, they facilitate microservice-izing your auth logic, because you can stitch arbitrary policies together out of third-party caveats. And, they seem like <a href='https://github.com/go-macaroon-bakery/macaroon-bakery' title=''>fertile ground for an ecosystem of interoperable Macaroon services</a>: Okta and Google could stand up SSO dischargers, for instance, or someone could do a really good revocation service.</p> <p>Neither of these lights us up. We&rsquo;re allergic to microservices. As for public protocols, well, it&rsquo;s good to want things. So we almost didn&rsquo;t even implement third-party caveats.</p> <h2 id='5' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#5' aria-label='Anchor'></a><span class='plain-code'>5</span></h2> <p>I&rsquo;m glad we did though, because they&rsquo;ve been pretty great.</p> <p>The first problem third-party caveats solved for us was hazmat tokens. To the extent possible, we want Macaroon tokens to be safe to transmit between users.
Our Macaroons express permissions, but not authentication, so it’s almost safe to email them.</p> <p>The way it works is, our Macaroons all have a third-party caveat pointing to a &ldquo;login service&rdquo;, identifying the proper bearer either as a particular Fly.io user or as a member of some <code>Organization</code>. To allow a request with your token, you first need to collect the discharge from the login service, which requires authentication.</p> <p>The login discharge is very sensitive, but there isn&rsquo;t much reason to pass it around. The original permissions token is where all the interesting stuff is, and it&rsquo;s not scary. So that&rsquo;s nice.</p> <p><img src="/blog/macaroons-escalated-quickly/assets/fly-sso.png?1/3&amp;wrap-left" /></p> <p>Ben then came up with <a href="https://community.fly.io/t/organization-required-sso/17560">third-party caveats that require Google or GitHub SSO logins.</a> If your token has one of those caveats, when you run <code>flyctl deploy</code>, a browser will pop up to log you into your SSO IdP (if you haven’t done so recently already).</p> <p>We’ve put a <a href='https://fly.io/blog/tokenized-tokens/#tokenizer-the-fabled-4th-way' title=''>bunch of work into getting the guts of our SSO system working</a>, but that work has mostly been invisible to customers. But Macaroon-ized SSO has a subtle benefit: you can configure <a href='http://Fly.io' title=''>Fly.io</a> to automatically add SSO requirements to specific <code>Organizations</code> (so, for instance, a dev environment might not need SSO at all, and prod might need two).</p> <p>SSO requirements in most applications are a brittle pain in the ass. Ours are flexible and straightforward, and that happened almost by accident. Macaroons, baby!</p> <p>Here&rsquo;s a fun thing you can do with a Macaroon system: stand up a Slack bot, and give it an HTTP <code>POST</code> handler that accepts third-party tickets.
Then:</p> <p><img src="/blog/macaroons-escalated-quickly/assets/bot-ok.png?1/2&amp;center&amp;border" /></p> <p>So, the bot is cute, but any platform could do that. What’s cool is the way our platform <em>doesn’t</em> work with Slack; in fact, nothing on our platform knows anything about Slack, and Slack doesn’t know anything about us. We didn’t reach out to a Slack endpoint. Everything was purely cryptographic.</p> <p>That bot could, if I sank some time into it, enforce arbitrary rules: it could selectively add caveats for the requests it authorizes, based on lookups of the users requesting them, at specific times of day, with specific logging. Theoretically, it could add third-party caveats of its own.</p> <p>The win for us with third-party caveats is that they create a plugin system for our security tokens. That’s an unusual place to see a plugin interface! But Macaroons are easy to understand and keep in your head, so we&rsquo;re pretty confident about the security issues.</p> <h2 id='6' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#6' aria-label='Anchor'></a><span class='plain-code'>6</span></h2> <p>Obviously, we didn&rsquo;t write our Macaroon code in Python, or with HMAC-SHA256-CTR.</p> <p>We landed on a primary implementation in Go (Ben subsequently wrote an Elixir implementation). Our hash is SHA256, our cipher is Chapoly. We encode in MsgPack.</p> <div class="callout"><p>We didn’t use the pre-existing public implementation because <a href="https://securitycryptographywhatever.com/2021/08/12/what-do-we-do-about-jwt-with-jonathan-rudenberg/" title="">we were warned not to</a>. The Macaroon idea is simple, and it exists mostly as an academic paper, not a standard.
The community that formed around building open source “standard” Macaroons decided to use untyped opaque blobs to represent caveats. We need things to be as rigidly unambiguous as they can be.</p> </div> <p><img src="/blog/macaroons-escalated-quickly/assets/verifier-service.png?2/3&amp;center" /></p> <p>The big strength of Macaroons as a cryptographic design — that it’s based almost entirely on HMAC — makes it a challenge to deploy. If you can verify a Macaroon, you can generate one. We have thousands of servers. They can&rsquo;t all be allowed to generate tokens.</p> <p>What we did instead:</p> <ul> <li>We split token checking into “verification” of token HMAC tags and “clearing” of token caveats. </li><li>Verification occurs only on a physically isolated token-verification service; to verify a token’s tag, you HTTP <code>POST</code> the token to the verifier. </li><li>Clearing of token caveats can happen anywhere. Token caveat clearing is domain-specific and subject to change; token verification is simple cryptography and changes rarely. </li><li>A token verification is cacheable. The client library for the token verifier does that, which speeds things up by exploiting the locality of token submissions. </li><li>The verification service is backed by a <a href='https://fly.io/docs/litefs/' title=''>LiteFS-distributed SQLite database</a>, so verification is fast globally — a major step forward from our legacy OAuth2 tokens, which are only fast in Ashburn, VA. </li></ul> <p><img src="/blog/macaroons-escalated-quickly/assets/service-token.png?2/3&amp;center" /></p> <p>Now buckle up, because I&rsquo;m about to try to get you to care about service tokens.</p> <p>We operate &ldquo;worker servers&rdquo; all over the world to host apps for our customers. To do that, those workers need access to customer secrets, like the key to decrypt a customer volume. To retrieve those secrets, the workers have to talk to secrets management servers.</p> <p>We manage a lot of workers. 
We trust them. But we don&rsquo;t trust them that much, if you get my drift. You don&rsquo;t want to just leave it up to the servers to decide which secrets they can access. The blast radius of a problem with a single worker should be no greater than the apps that are supposed to run there.</p> <p>The gold standard for approving access to customer information is, naturally, explicit customer authorization. We almost have that with Macaroons! The first time an app runs on a worker, <a href='https://fly.io/blog/carving-the-scheduler-out-of-our-orchestrator/' title=''>the orchestrator code</a> has a token, and it can pass that along to the secret stores.</p> <p>The problem is, you need that token more than once; not just when the user does a deploy, but potentially any time you restart the app or migrate it to a new worker. And you can&rsquo;t just store and replay user Macaroons. They have expirations.</p> <div class="right-sidenote"><p>This is like dropping privilege with things like pledge(2), but in a distributed system.</p> </div> <p>So our token verification service exposes an API that transforms a user token into a “service token”, which is just the token with the authentication caveat and expiration “stripped off”.</p> <p>What’s cool is: components that receive service tokens can attenuate them. For instance, we could lock a token to a particular worker, or even a particular Fly Machine. Then we can expose the whole <a href='https://fly.io/docs/machines/working-with-machines/' title=''>Fly Machines API</a> to customer VMs while keeping access traceable to specific customer tokens. 
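</p> <p>Here&rsquo;s a minimal sketch of why that locking holds, assuming the same HMAC chaining as the toy Python earlier in this post. The caveat strings (<code>org=123</code>, <code>machine=abc</code>), the root key, and the helper names are all made up for illustration; this is not our production code or our real caveat format.</p>

```python
import hashlib
import hmac


def chain(key, caveats):
    """Fold caveats into a tail tag: each HMAC output keys the next one."""
    tag = key
    for cav in caveats:
        tag = hmac.new(tag, cav.encode(), hashlib.sha256).digest()
    return tag


def attenuate(token, caveat):
    """Append a caveat using only the current tail; no root key required."""
    caveats, tail = token
    new_tail = hmac.new(tail, caveat.encode(), hashlib.sha256).digest()
    return (caveats + [caveat], new_tail)


def verify(root_key, token):
    """Recompute the chain from the root key and compare it to the tail."""
    caveats, tail = token
    return hmac.compare_digest(chain(root_key, caveats), tail)


# Hypothetical values, for illustration only.
root_key = b"hypothetical-root-key"
base = ["nonce=example", "org=123"]
service = (base, chain(root_key, base))

# A component holding the service token locks it to one machine.
locked = attenuate(service, "machine=abc")

# An attacker drops the last caveat but can only reuse the tail they hold.
stripped = (locked[0][:-1], locked[1])

print(verify(root_key, service))
print(verify(root_key, locked))
print(verify(root_key, stripped))
```

<p>In this sketch the first two checks pass and the third fails: recovering the earlier tail from the locked token would mean inverting HMAC.</p> <p>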
Stealing the token from a Fly Machine doesn’t help you since it’s locked to that Fly Machine by a caveat attackers can’t strip.</p> <h2 id='7' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#7' aria-label='Anchor'></a><span class='plain-code'>7</span></h2> <p>If a customer loses their tokens to an attacker, we can’t just blow that off and let the attacker keep compromising the account!</p> <div class="right-sidenote"><p>This cancels every token derived through attenuation by that nonce.</p> </div> <p>Every Macaroon we issue is identified by a unique nonce, and we can revoke tokens by that nonce. This is just a basic function of the token verification service we just described.</p> <p>We host token caches all over our fleet. Token revocation invalidates the caches. Anything with a cache checks frequently whether to invalidate. Revocation is rare, so just keeping a revocation list and invalidating caches wholesale seems fine.</p> <h2 id='8' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#8' aria-label='Anchor'></a><span class='plain-code'>8</span></h2> <p>I get it, it&rsquo;s tough to get me to shut up about Macaroons.</p> <p>A couple years ago, I <a href='https://fly.io/blog/api-tokens-a-tedious-survey/' title=''>wrote a long survey of API token designs</a>, from JWTs (never!) to Biscuits. 
I had a <a href='https://fly.io/blog/api-tokens-a-tedious-survey/#macaroons' title=''>bunch to say about Macaroons</a>, not all of it positive, and said we&rsquo;d be plowing forward with them at Fly.io.</p> <p>My plan had been to follow up soon after with a deep dive on Macaroons as we planned them for Fly.io. I&rsquo;m glad I didn&rsquo;t do that, not just because it would&rsquo;ve been embarrassing to announce a feature that took us over 2 years to launch, but also because the process of working on this with Ben Toews changed a lot of my thinking about them.</p> <p>I think if you asked Ben, he&rsquo;d say he had mixed feelings about how much complexity we wrangled to get this launched. On the other hand: we got a lot of things out of them without trying very hard:</p> <ul> <li>Security tokens you can (almost) email to your users and partners without putting your account at risk. </li><li>A flexible permission system, encoded directly into the tokens, that users can drive without talking to our servers. </li><li>A plugin system that users can (when we clean up the tooling) use themselves, to add things like Passkeys or two-person-approval rules or audit logging, without us getting in the middle. </li><li>An SSO system that can stack different IdPs, mandate SSO login, and do that on a per-<code>Organization</code> basis. </li><li><a href='https://www.latacora.com/blog/2018/06/12/a-childs-garden/' title=''>Inter-service authorization</a> that is traceable back to customer actions, so our servers can&rsquo;t just make up which apps they&rsquo;re allowed to look at. </li><li>An elegant way of exposing our own APIs to customer Fly Machines with ambient authentication, but without the <a href="https://github.com/SummitRoute/imdsv2_wall_of_shame/blob/main/README.md">AWS IMDSv1 credential theft problem</a>. </li></ul> <p>There are downsides and warts! I&rsquo;m mostly not telling you about them! Pure restrictive caveats are an awkward way to express some roles. 
And, blinded by my hunger to get Macaroons deployed, I spat in the face of science and used internal database IDs as our public caveat format, an act for which JP will never forgive me.</p> <p>If I&rsquo;ve piqued your interest, <a href='https://github.com/superfly/macaroon' title=''>the code for this stuff is public</a>, along with some more <a href='https://github.com/superfly/macaroon/blob/main/macaroon-thought.md' title=''>detailed technical documentation</a>.</p></content> </entry> <entry> <title>How Yoko Li makes towns, tamagoes, and tools for local AI</title> <link rel="alternate" href="https://fly.io/blog/how-i-fly-yoko-li/"/> <id>https://fly.io/blog/how-i-fly-yoko-li/</id> <published>2024-01-08T00:00:00+00:00</published> <updated>2024-04-12T18:23:39+00:00</updated> <media:thumbnail url="https://fly.io/blog/how-i-fly-yoko-li/assets/chat-bird-cover-thumb.webp"/> <content type="html"><p>Hello all, and welcome to another episode of How I Fly, a series where I interview developers about what they do with technology, what they find exciting, and the unexpected things they’ve learned along the way. This time I’m talking with <a href='https://twitter.com/stuffyokodraws' title=''>Yoko Li</a>, an investment partner at A16Z who’s also an open-source AI developer. She works on some of the most exciting AI projects in the world.
I’m excited to share them with you today, with fun stories about the lessons she’s learned along the way.</p> <h2 id='cool-experiments' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#cool-experiments' aria-label='Anchor'></a><span class='plain-code'>Cool Experiments</span></h2> <p>One of Yoko’s most thought-provoking experiments is <a href='https://www.convex.dev/ai-town' title=''>AI Town</a>, a virtual town populated by AI agents that talk with each other. It takes advantage of the randomness of AI responses to create emergent behavior. When you open it, it looks like this:</p> <p><img alt="A picture of the AI Town homepage, a UI showing a top-down 2D RPG view with a visible river and a tent. The UI shows a conversation with the characters Alice and Stella." src="/blog/how-i-fly-yoko-li/assets/image1.webp" /></p> <p>You can see the AI agents talking with each other and watch how the relationships between them form and change over time. It’s also a lot of fun to watch.</p> <p>One of Yoko’s other experiments is <a href='https://ai-tamago.fly.dev/' title=''>AI Tamago</a>, a <a href='https://en.wikipedia.org/wiki/Tamagotchi' title=''>Tamagotchi</a> virtual pet implemented with a large language model instead of the state machine that we’re all used to. AI Tamago uses an unmodified version of LLaMA 2 7B to take in game state and user inputs, then it generates what happens next. Every time you interact with your pet, it feeds data to LLaMA 2 and then uses Ollama’s JSON mode to generate unexpected output.</p> <p><img alt="A picture of the homepage of AI Tamago, showing a virtual pet with buttons to feed the pet, play with the pet, clean the pet, discipline the pet, check pet status, and deliver medical care to the pet."
src="/blog/how-i-fly-yoko-li/assets/image4.webp" /></p> <p>It’s all the fun of the classic Tamagotchi toys from the ’90s (including the ability to randomly discipline your virtual pet) without any of the coin cell batteries or having to carry around the little egg-shaped puck.</p> <p>But that’s just something you can watch, not something that’s as easy to play with on your own machine. Yoko has also worked on the <a href='https://github.com/ykhli/local-ai-stack' title=''>Local AI Starter Kit</a> that lets you go from zero to AI in minutes. It’s a collection of chains of models that let you ingest a bunch of documents, store them in a database, and then use those documents as context for a language model to generate responses. It’s everything you need to implement a “chat with a knowledge base” feature.</p> <h3 id='the-dark-of-ai-experiments' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#the-dark-of-ai-experiments' aria-label='Anchor'></a><span class='plain-code'>The dark of AI experiments</span></h3> <p>The Local AI Starter Kit is significant because normally to do this, you need to set up billing and API keys for at least four different API providers, and then you need to write a bunch of (hopefully robust) code to tie it all together. With the Local AI Starter Kit, you can do this privately on your own hardware, with your own data and your own models. It’s a huge step forward for democratizing access to this technology.</p> <p>Document search is one of my favorite use cases for AI, and it’s one of the most immediately useful ones. It’s also one of the most fiddly and annoying to get right. 
To help illustrate this, I’ve made a diagram of the steps involved with setting up document search by hand:</p> <p><img alt="A diagram showing the process of ingesting a pile of markdown documents into a vector database. The documents are broken into a collection of sections, then each section is passed through an embedding model and the resulting vectors are stored in a vector database." src="/blog/how-i-fly-yoko-li/assets/image3.webp" /></p> <p>You start with your Markdown documents. Most Markdown documents are easily broken up into sections where each section will focus on a single aspect of the larger topic of the document. You can take advantage of this best practice by letting people search for each section individually, which is typically a lot more useful than just searching the entire document.</p> <div class="right-sidenote"><p>Okay, okay, fine. Language encircles concepts instead of defining them directly. The point still stands that we’re operating at a level “below” words and sentences. I don’t want to bog this down in a bunch of linear algebra that neither of us understands well enough to explain in a single paragraph like I am here. The main point is that it lets you “fuzzy match” relevant documents in a way that exact word search queries never could on their own.</p> </div> <p>Essentially, the vector embeddings that you generate from an embedding model are a mathematical representation of the “concepts” that the embedding model uses that are adjacent to the text of your documents. When you use the same model to generate embeddings for your documents and user queries, this lets you find documents that are similar to the query without using exactly the same words. 
This is called “fuzzy searching” and it is one of the most difficult problems in computer science (right next to naming things).</p> <p>When a user comes to search the database, you do the same thing as ingestion:</p> <p><img alt="A diagram showing the full flow for doing document search Q&amp;A with a vector database. The user submits a question to an API endpoint, the question is broken into embedding vectors and used to search for similar vectors in the database. The relevant document fragments are fed into the prompt for a large language model to generate a response that is grounded in the facts from the documents that were ingested. The response is streamed to the user one token at a time." src="/blog/how-i-fly-yoko-li/assets/image2.webp" /></p> <p>The user query comes into your API endpoint. You use the same embedding model from earlier (omitted from the diagram for brevity) to turn that query into a vector. Then you query the same vector database to find documents that are similar to the query. Then you have a list of documents with metadata like the URL to the documentation page or section fragment in that page. From here you have two options. You can either use the documents to return a list of results to the user, or you can do the more fun thing: using those documents as context for a large language model to generate a response grounded in the relevant facts in those documents.</p> <div class="right-sidenote"><p>I think it’s also how OpenAI’s custom GPTs work, but they haven’t released technical details about how they work so this is outright speculation on my part.</p> </div> <p>This basic pattern is called Retrieval-augmented Generation (RAG), and it’s how Bing’s copilot chatbot works. The Local AI Starter Kit makes setting this pipeline up <em>effortless</em> and <em>fast</em>. 
It’s a huge step forward for making this groundbreaking technology accessible to everyone.</p> <h2 id='the-struggles' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#the-struggles' aria-label='Anchor'></a><span class='plain-code'>The struggles</span></h2> <blockquote> <p>When I was trying to get the AI models in AI Town to output JSON, I tried a bunch of different things. I got some good results by telling the model to &ldquo;only reply in JSON, no prose&rdquo;, but we ended up using a model tuned for outputting code. I think I inspired <a href='https://ollama.ai' title=''>Ollama</a> to add their JSON output feature.</p> </blockquote> <p>One of the main benefits of large language models is that they are essentially stochastic models of the entire Internet. They have a bunch of patterns formed that can let you create surprisingly different outputs from similar inputs. This is also one of the main drawbacks of large language models: they are essentially stochastic models of the entire Internet. They have a bunch of patterns formed that can let you create surprisingly different outputs from similar inputs. The outputs of these models are usually correct-ish enough (more correct if you ground the responses in facts from your documents like you do with a Retrieval-augmented Generation system), but they are not always aligned with our observable reality.</p> <p>A lot of the time you will get outputs that don’t make any logical or factual sense. These are called “hallucinations” and they are one of the main drawbacks of large language models. If a hallucination pops in at the worst time, you’ve accidentally told someone how to poison themselves with chocolate chip cookies. 
This is, as the kids say, “bad”.</p> <p>The inherent randomness of the output of a large language model means that it can be difficult to get an exactly parsable format. Most of the time, you can coax the model into producing usable JSON output, but without a schema it can sometimes generate wildly different JSON responses. Only sometimes. This isn’t deterministic and Yoko has found that this is one of the most frustrating parts of working with large language models.</p> <div class="right-sidenote"><p>This works by weighting any offending ungrammatical tokens to negative infinity. It’s amazingly hacky but the hilarious part is that it works.</p> </div> <p>However, there are workarounds. <a href='https://github.com/ggerganov/llama.cpp' title=''>llama.cpp</a> offers a way to use a grammar file to strictly guide the output of a large language model by using a context-free grammar. This lets you get something more deterministic, but it’s still not perfect. It’s a lot better than nothing, though.</p> <p>One of the fun things that can happen with this is that you can have the model fail to generate anything but an endless stream of newlines in JSON mode. This is hilarious and usually requires some special detection logic to handle and restart the query. There’s work being done to let you use JSON schema to guide the generation of large language model outputs, but it’s not currently ready for the masses.</p> <div class="right-sidenote"><p>If it’s dumb and it works, is it really dumb?</p> </div> <p>However, one of the easiest ways to hack around this is by using a model that generates code instead of text. This is how Yoko got the AI Town and AI Tamago models to output JSON that was mostly valid. It’s a hack, but it works. This was made a lot easier for AI Town when one of the tools they use (<a href='https://ollama.ai' title=''>Ollama</a>) added support for JSON output from the model. 
This is a lot better than the code generation model hack, but research continues.</p> <h2 id='the-simple-joy-of-unexpected-outputs' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#the-simple-joy-of-unexpected-outputs' aria-label='Anchor'></a><span class='plain-code'>The simple joy of unexpected outputs</span></h2> <blockquote> <p>When I was making AI Town, I was inspired by <a href='https://en.wikipedia.org/wiki/The_Lifecycle_of_Software_Objects' title=''>The Lifecycle of Software Objects</a> by Ted Chiang. It&rsquo;s about a former zookeeper that trained AI agents to be pets, kinda like how we use Reinforcement Learning from Human Feedback to train AI models like ChatGPT.</p> </blockquote> <p>However, at the same time, there are cases where hallucinations are not only useful, but they are what make the implementation of a system possible. If large language models are essentially massive banks of the word frequencies of a huge part of culture, then the emergent output can create unexpected things that happen frequently. This lets emergent behavior form, which can be the backbone of games and is the key thing that makes AI Town work as well as it does.</p> <p>AI Tamago is also completely driven off of the results of large language model hallucinations. They are the core of what drives user inputs, the game loop, and the surprising reactions you get when disciplining your pet. The status screen takes in the game state and lets you know what your pet is feeling in a way that the segment displays of the Tamagotchi toys could never do.</p> <p>These enable you to build workflows that are <em>augmented</em> by the inherent randomness of the hallucinations instead of seeing them as drawbacks. 
This means you need to choose outputs that can have the hallucinations shine instead of being ugly warts you need to continuously shave away. Instead of using them for doing pathfinding, have them drive the AI of your characters or write the A* pathfinding algorithm so you don’t have to write it again for the billionth time.</p> <p>I’m not saying that large language models can replace the output of a human, but they are more like a language server for human languages as well as programming languages. They are best used when you are generating the boilerplate you don’t want to do yourself, or when you are throwing science at the wall to see what sticks.</p> <h2 id='in-conclusion' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#in-conclusion' aria-label='Anchor'></a><span class='plain-code'>In conclusion</span></h2> <p>Yoko is showing people how to use AI today, on local machines, with models of your choice, that allow you to experiment, hack and learn.</p> <p>I can’t wait to see what’s next!</p> <p>If you want to follow what Yoko does, here are a few links to add to your feeds:</p> <ul> <li>Yoko’s <a href='https://twitter.com/stuffyokodraws' title=''>Twitter</a> (or X, or whatever we&rsquo;re supposed to call it now) </li><li>Yoko’s <a href='https://github.com/ykhli' title=''>GitHub</a> </li><li>Yoko’s <a href='https://yoko.dev/' title=''>Website</a> </li></ul> <p>(insert standard conclusion diatribe here)</p></content> </entry> <entry> <title>Deploy Your Own (Not) Midjourney Bot on Fly GPUs</title> <link rel="alternate" href="https://fly.io/blog/not-midjourney-bot/"/> <id>https://fly.io/blog/not-midjourney-bot/</id> <published>2024-01-04T00:00:00+00:00</published> <updated>2024-04-12T18:23:39+00:00</updated> <media:thumbnail 
url="https://fly.io/blog/not-midjourney-bot/assets/purple-balloon-taking-off-thumb.webp"/> <content type="html"><div class="lead"><p>Fly.io has Enterprise-grade GPUs and servers all over the globe (or <em>disk</em>, depending on which side of the flat Earth debate you fall on) making it a great place to deploy your next disruptive AI app.</p> </div> <p>Some people daydream about normal things, like coffee machines or raising that Series A round (those are normal things to dream about, right?). I daydream about commanding a fleet of chonky <a href='https://resources.nvidia.com/en-us-l40s/l40s-datasheet-28413/' title=''>NVIDIA Lovelace L40Ss</a>. Also, totally normal. Well, fortunately for me and anyone else wanting to explore the world of generative AI — Fly.io has GPUs now!</p> <p>Sure, this technology will probably end up with the AI <a href='https://marketoonist.com/2023/03/ai-written-ai-read.html' title=''>talking to itself</a> while we go about our lives — but it seems like it&rsquo;s here to stay, so we should at least have some fun with it. In this post we&rsquo;ll put these GPUs to task and you&rsquo;ll learn how to build your very own AI image-generating Discord bot, kinda like Midjourney. Available 24/7 and ready to serve up all the pictures of cats eating burritos your heart desires. 
And because I&rsquo;d never tell you to draw the rest of the owl, I&rsquo;ll link to working code that you can deploy today.</p> <h2 id='latent-diffusion-models-have-entered-the-chat' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#latent-diffusion-models-have-entered-the-chat' aria-label='Anchor'></a><span class='plain-code'>Latent Diffusion Models Have Entered the Chat</span></h2> <p>In the realm of AI image generation, two names have become prominent: Midjourney and Stable Diffusion. Both are image-generation tools that let you synthesize an image from a textual prompt. One is a closed source paid service, while the other is open source and can run locally. Midjourney gained popularity because it allowed the less technically-inclined among us to explore this technology through its ease of use. Stable Diffusion democratized access to the technology, but it can be quite tricky to get good results out of it.</p> <p>Enter <a href='https://github.com/lllyasviel/Fooocus' title=''>Fooocus</a> (pronounced <em>focus</em>), an open source project that combines the best of both worlds and offers a user-friendly interface to Stable Diffusion. It&rsquo;s hands down the easiest way to get started with Stable Diffusion. Sure, there are more popular tools like Stable Diffusion web UI and ComfyUI, but Fooocus adds some magic to reduce the need to manually tweak a bunch of settings. The most significant feature is probably GPT-2-based &ldquo;<a href='https://github.com/lllyasviel/Fooocus/discussions/117#raw' title=''>prompt expansion</a>&rdquo; to dynamically enhance prompts.</p> <p>The point of Fooocus is to <em>focus</em> on your prompt. The more you put into it, the more you get out. 
That said, a very simple prompt like &ldquo;forest elf&rdquo; can return high-quality images without the need to trawl the web for prompt ideas or fiddle with knobs and levers (although they&rsquo;re there if you want them).</p> <p>So, what can this thing <em>do</em>? Well, this…</p> <p><img alt="A black and white sketch of hot-air balloon over a mountain range generated using Fooocus with &quot;Pencil Sketch Drawing&quot; style and quality = True" src="/blog/not-midjourney-bot/assets/./balloon-sketch.webp" /></p> <p>Here&rsquo;s the full command I&rsquo;ve used to generate this image: <code>/imagine prompt: sketch of hot-air balloon over a mountain range style1: Pencil Sketch Drawing quality: true ar: 1664×576</code></p> <h2 id='what-were-building' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#what-were-building' aria-label='Anchor'></a><span class='plain-code'>What We&rsquo;re Building</span></h2> <p>We&rsquo;ll deploy two applications. The code to run the bot itself will run on normal VM hardware, and the API server doing all the hard work synthesizing alpacas out of thin air will run on GPU hardware.</p> <p><img alt="An architecture diagram explaining how the two apps will communicate and return the requested image to an end user." src="/blog/not-midjourney-bot/assets/./arch-diagram.png?center&amp;2/3" /></p> <p>Fooocus is served up as a web UI by default, but with a little elbow grease we can interact with it as a REST API. Fortunately, with more than 25k stars on GitHub at the time of writing, the project has a lively open-source community, so we don&rsquo;t need to do much work here — it&rsquo;s already been done for us. <a href='https://github.com/konieshadow/Fooocus-API' title=''>Fooocus-API</a> is a project that shoves FastAPI in front of a Fooocus runtime. 
We&rsquo;ll use this for the API server app.</p> <p>The Python-based bot connects to the <a href='https://discord.com/developers/docs/topics/gateway' title=''>Discord Gateway API</a> using the <a href='https://github.com/Pycord-Development/pycord' title=''>Pycord</a> library. When it starts up, it maintains an open pipe for data to flow back and forth via WebSockets. The bot app also includes a client that knows how to talk to the API server using Flycast and request the image it needs via HTTP.</p> <p>When we request an image from Discord using the <code>/imagine</code> slash command, we immediately respond using Pycord&rsquo;s <code>defer()</code> function to let Discord know that the request has been received and the bot is working on it — it&rsquo;ll take a few seconds to process your prompt, fabricate an image, upload it to Discord and let you share it with your friends. This is a blocking operation, so it won&rsquo;t perform well if you have hundreds of people on your Discord Server using the command. For that, you&rsquo;ll want to jiggle some wires to make the code non-blocking. But for now, this gives us a nice UX for the bot.</p> <p>When the API server returns the image, it gets saved to disk. 
We&rsquo;ll use the fantastic <a href='https://github.com/sqids/sqids-python' title=''>Sqids</a> library to generate collision-free file names:</p> <div class="highlight-wrapper group relative python"> <button type="button" class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-wrap-target="#code-75afx6ud" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"><g buffered-rendering="static"><path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /><path d="M11.081 6.466L9.533 8.037l1.548 1.571" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"> Wrap text </span> </button> <button type="button" class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-copy-target="sibling" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"><g buffered-rendering="static"><path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /><path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"> 
Copy to clipboard </span> </button> <div class='highlight relative group'> <pre class='highlight '><code id="code-75afx6ud"><span class="n">unique_id</span> <span class="o">=</span> <span class="bp">self</span><span class="p">.</span><span class="n">sqids</span><span class="p">.</span><span class="n">encode</span><span class="p">(</span> <span class="p">[</span><span class="n">ctx</span><span class="p">.</span><span class="n">author</span><span class="p">.</span><span class="nb">id</span><span class="p">,</span> <span class="nb">int</span><span class="p">(</span><span class="n">time</span><span class="p">.</span><span class="n">time</span><span class="p">())]</span> <span class="p">)</span> <span class="n">result_filename</span> <span class="o">=</span> <span class="sa">f</span><span class="s">"result_</span><span class="si">{</span><span class="n">unique_id</span><span class="si">}</span><span class="s">.png"</span> </code></pre> </div> </div> <p>We&rsquo;ll also use <code>asyncio</code> to check if the image is ready every second, and when it is, we send it off to Discord to complete the request:</p> <div class="highlight-wrapper group relative python"> <button type="button" class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-wrap-target="#code-w1v7557b" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"><g buffered-rendering="static"><path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /><path d="M11.081 6.466L9.533 8.037l1.548 1.571" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-9px] tail 
text-navy-950"> Wrap text </span> </button> <button type="button" class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-copy-target="sibling" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"><g buffered-rendering="static"><path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /><path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"> Copy to clipboard </span> </button> <div class='highlight relative group'> <pre class='highlight '><code id="code-w1v7557b"><span class="k">while</span> <span class="ow">not</span> <span class="n">os</span><span class="p">.</span><span class="n">path</span><span class="p">.</span><span class="n">exists</span><span class="p">(</span><span class="n">result_filename</span><span class="p">):</span> <span class="k">await</span> <span class="n">asyncio</span><span class="p">.</span><span class="n">sleep</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span> <span class="k">with</span> <span class="nb">open</span><span class="p">(</span><span class="n">result_filename</span><span class="p">,</span> <span class="s">"rb"</span><span class="p">)</span> <span class="k">as</span> <span class="n">f</span><span class="p">:</span> <span class="k">await</span> <span class="n">ctx</span><span class="p">.</span><span class="n">respond</span><span class="p">(</span> <span class="nb">file</span><span 
class="o">=</span><span class="n">discord</span><span class="p">.</span><span class="n">File</span><span class="p">(</span><span class="n">f</span><span class="p">,</span> <span class="n">result_filename</span><span class="p">)</span> <span class="p">)</span> </code></pre> </div> </div> <p>Neither of these two apps will be exposed to the Internet, yet they&rsquo;ll still be able to communicate with each other. One of the undersold stories about Fly.io is the ease with which two applications can communicate over the private network. We assign special IPv6 private network (6pn) addresses within the same organizational space and applications can effortlessly discover and connect to one another without any additional configuration.</p> <p>But what about load balancing and this &ldquo;scale-to-zero&rdquo; thing? We don&rsquo;t <em>just</em> want our two apps to talk to each other, we want the Fly Proxy to start our Machine when a request comes in, and stop it when idle. For that, we&rsquo;ll need <a href='https://fly.io/docs/reference/private-networking/#flycast-private-load balancing' title=''>Flycast</a>, our private load balancing feature.</p> <p>When you assign a Flycast IP to your app, you can route requests using a special <code>.flycast</code> domain. Those requests are routed through the Fly Proxy instead of directly to instances in your app, which means you get all the load balancing, rate limiting and other proxy goodness that you&rsquo;re accustomed to. The Proxy runs a process which can automatically downscale Machines every few minutes. 
It&rsquo;ll also start them right back up when a request comes in — this means we can take advantage of scale-to-zero, saving us a bunch of money!</p> <h2 id='the-imagine-command' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#the-imagine-command' aria-label='Anchor'></a><span class='plain-code'>The <code>/imagine</code> Command</span></h2> <p>The slash command is the heart of your bot, enabling you to generate images based on your prompt, right from within Discord. When you type <code>/imagine</code> into the Discord chat, you&rsquo;ll see some command options pop up.</p> <p>You&rsquo;ll need to input your base prompt (e.g. &ldquo;an alpaca sleeping in a grassy field&rdquo;) and optionally pick some styles (&ldquo;Pencil Sketch Drawing&rdquo;, &ldquo;Futuristic Retro Cyberpunk&rdquo;, &ldquo;MRE Dark Cyberpunk&rdquo; etc). With Fooocus, combining multiple styles — &ldquo;style-chaining&rdquo; — can help you achieve amazing results. Set the aspect ratio or provide negative prompts if needed, too.</p> <p>After you execute the command, the bot will request the image from the API, then send it as a response in the chat. 
Let’s see it in action!</p> <p><img alt="A GIF demo run-through showcasing the ability of the bot to generate images from Discord" src="/blog/not-midjourney-bot/assets/./demo.gif?card&amp;center" /></p> <h2 id='deployment-speedrun' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#deployment-speedrun' aria-label='Anchor'></a><span class='plain-code'>Deployment Speedrun</span></h2> <p><strong class='font-semibold text-navy-950'>First, we&rsquo;ll deploy the API server.</strong> For convenience (and to speed things up), we&rsquo;ll use a pre-built image when we deploy. With dependencies like <code>torch</code> and <code>torchvision</code> bundled in, it&rsquo;s a hefty image weighing in just shy of 12GB. With a normal Fly Machine this would not only be a bad idea, but not even possible due to an 8GB limit for the VM&rsquo;s rootfs. Fortunately the wizards behind Fly GPUs have accounted for our need to run huge models and their dependencies, and awarded us 50GB of rootfs.</p> <div class="right-sidenote"><p>Fly GPUs use <a href="https://github.com/cloud-hypervisor/cloud-hypervisor" title="">Cloud Hypervisor</a> and not <a href="https://github.com/firecracker-microvm/firecracker" title="">Firecracker</a> (like a regular Fly Machine) for virtualization. But even with a 12GB image, this doesn’t stop the Machine from booting in seconds when a new request comes in through the Proxy.</p> </div> <p>To start, clone the template <a href='https://github.com/fly-apps/not-midjourney-bot' title=''>repository</a>. You&rsquo;ll need this for both the bot and server apps. 
Then deploy the server with the Fly CLI:</p> <div class="highlight-wrapper group relative bash"> <button type="button" class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-wrap-target="#code-ytt1j7os" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"><g buffered-rendering="static"><path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /><path d="M11.081 6.466L9.533 8.037l1.548 1.571" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"> Wrap text </span> </button> <button type="button" class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-copy-target="sibling" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"><g buffered-rendering="static"><path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /><path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"> Copy to clipboard </span> </button> <div class='highlight relative group'> <pre class='highlight '><code 
id="code-ytt1j7os">fly deploy <span class="se">\</span> <span class="nt">--image</span> ghcr.io/fly-apps/not-midjourney-bot:server <span class="se">\</span> <span class="nt">--config</span> ./server/fly.toml <span class="se">\</span> <span class="nt">--no-public-ips</span> </code></pre> </div> </div> <p>This command tells Fly.io to deploy your application based on the configuration specified in the <code>fly.toml</code>, while the <code>--no-public-ips</code> flag secures your app by not exposing it to the public Internet.</p> <p>Remember Flycast? To use it, we’ll allocate a private IPv6:</p> <div class="highlight-wrapper group relative bash"> <button type="button" class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-wrap-target="#code-g3tqfpkl" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"><g buffered-rendering="static"><path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /><path d="M11.081 6.466L9.533 8.037l1.548 1.571" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"> Wrap text </span> </button> <button type="button" class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-copy-target="sibling" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" 
stroke-width="1.35"><g buffered-rendering="static"><path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /><path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"> Copy to clipboard </span> </button> <div class='highlight relative group'> <pre class='highlight '><code id="code-g3tqfpkl">fly ips allocate-v6 <span class="nt">--private</span> </code></pre> </div> </div> <p>Now, let&rsquo;s take a look at our <a href='https://github.com/fly-apps/not-midjourney-bot/blob/134bb634f97bf81040e489650f2334b48d976c10/server/fly.toml' title=''><code>fly.toml</code></a> config:</p> <div class="highlight-wrapper group relative toml"> <button type="button" class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-wrap-target="#code-a3s9879o" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"><g buffered-rendering="static"><path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /><path d="M11.081 6.466L9.533 8.037l1.548 1.571" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"> Wrap text </span> </button> <button type="button" class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 
group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-copy-target="sibling" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"><g buffered-rendering="static"><path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /><path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"> Copy to clipboard </span> </button> <div class='highlight relative group'> <pre class='highlight '><code id="code-a3s9879o"><span class="py">app</span> <span class="p">=</span> <span class="s">"alpaca-image-gen"</span> <span class="py">primary_region</span> <span class="p">=</span> <span class="s">"ord"</span> <span class="nn">[[vm]]</span> <span class="py">size</span> <span class="p">=</span> <span class="s">"performance-8x"</span> <span class="py">memory</span> <span class="p">=</span> <span class="s">"16gb"</span> <span class="py">gpu_kind</span> <span class="p">=</span> <span class="s">"l40s"</span> <span class="nn">[[services]]</span> <span class="py">internal_port</span> <span class="p">=</span> <span class="mi">8888</span> <span class="py">protocol</span> <span class="p">=</span> <span class="s">"tcp"</span> <span class="py">auto_stop_machines</span> <span class="p">=</span> <span class="kc">true</span> <span class="py">auto_start_machines</span> <span class="p">=</span> <span class="kc">true</span> <span class="py">min_machines_running</span> <span class="p">=</span> <span class="mi">0</span> <span class="nn">[[services.ports]]</span> <span class="py">handlers</span> <span class="p">=</span> <span class="nn">["http"]</span> <span 
class="py">port</span> <span class="p">=</span> <span class="mi">80</span> <span class="py">force_https</span> <span class="p">=</span> <span class="kc">false</span> <span class="nn">[mounts]</span> <span class="py">source</span> <span class="p">=</span> <span class="s">"repositories"</span> <span class="py">destination</span> <span class="p">=</span> <span class="s">"/app/repositories"</span> <span class="py">initial_size</span> <span class="p">=</span> <span class="s">"20gb"</span> </code></pre> </div> </div> <p>There are a few key things to note here:</p> <ol> <li>Currently, the NVIDIA L40Ss we&rsquo;re using when we specify <code>gpu_kind</code> are only available in <code>ORD</code>, so that&rsquo;s what we&rsquo;ve set the <code>primary_region</code> to. We&rsquo;re rolling out more GPUs to more regions in a hurry — but for now we&rsquo;ll host the bot in Chicago. </li><li>Out of the box, 8GB of system RAM is suggested. In my testing this wasn&rsquo;t close to enough: the Machine would frequently run out of memory and crash. I got things working better by using 16GB of RAM. </li><li>The FastAPI server binds to port 8888; we need to set this as our <code>internal_port</code>, or the Fly Proxy won&rsquo;t know where to send requests. </li><li>We want our Machine to <a href='https://fly.io/docs/apps/autostart-stop/' title=''>automatically stop and start</a>. </li><li>Flycast doesn&rsquo;t do HTTPS, so we won&rsquo;t force it here. Don&rsquo;t worry, it&rsquo;s still encrypted over the wire! </li><li>A volume is automatically created on the first deploy. On first boot, the app clones the Fooocus repo and downloads the Stable Diffusion model checkpoints onto that volume. This takes a couple of minutes, but the next time the Machine starts, it&rsquo;ll have everything it needs to serve a request within seconds. 
</li></ol> <div class="callout"><p>The <a href="https://github.com/fly-apps/not-midjourney-bot/blob/84e72d1e7048627b7c845fe3d44d45b278e451d5/README.md" title=""><strong class="font-semibold text-navy-950">README</strong></a> for this project has detailed instructions about setting up your Discord bot and adding it to a server. After setting up the permissions and privileged intents, you’ll get an OAuth2 URL. Use this URL to invite your bot to your Discord server and confirm the permissions. Once that’s done, grab your Discord API token; you’ll need it for the next step.</p> </div> <p><strong class='font-semibold text-navy-950'>With the API server up and running, it&rsquo;s time to deploy the Discord bot.</strong> This app will run on a normal Fly Machine, no GPU required. First, set the <code>DISCORD_TOKEN</code> and <code>FOOOCUS_API_URL</code> (the Flycast endpoint for the API server) secrets, using the Fly CLI. Then deploy:</p> <div class="highlight-wrapper group relative bash"> <button type="button" class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-wrap-target="#code-314htg3w" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"><g buffered-rendering="static"><path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /><path d="M11.081 6.466L9.533 8.037l1.548 1.571" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"> Wrap text </span> </button> <button type="button" class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 
group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-copy-target="sibling" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"><g buffered-rendering="static"><path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /><path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"> Copy to clipboard </span> </button> <div class='highlight relative group'> <pre class='highlight '><code id="code-314htg3w">fly deploy <span class="se">\</span> <span class="nt">--image</span> ghcr.io/fly-apps/not-midjourney-bot:bot <span class="se">\</span> <span class="nt">--config</span> ./bot/fly.toml <span class="se">\</span> <span class="nt">--no-public-ips</span> </code></pre> </div> </div> <p>Notice that the bot app doesn&rsquo;t need to be publicly visible on the Internet either. Under the hood, the WebSocket connection to Discord&rsquo;s Gateway API allows the bot to communicate freely without the need to define any services in our <code>fly.toml</code>. 
This also means that the Fly Proxy will not downscale the app like it does the GPU Machine — the bot will always appear &ldquo;online&rdquo;.</p> <figure class="post-cta"> <figcaption> <h1>Not interested in GPUs?</h1> <p>You can still deploy apps on Fly.io today and be up and running in a matter of minutes.</p> <a class="btn btn-lg" href="https://fly.io/docs/speedrun/"> Deploy an app now<span class='opacity-50'>→</span> </a> </figcaption> <div class="image-container"> <img src="/static/images/cta-kitty.webp" srcset="/static/images/[email protected] 2x" alt=""> </div> </figure> <h2 id='how-do-i-know-this-thing-is-using-gpu-for-reals' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#how-do-i-know-this-thing-is-using-gpu-for-reals' aria-label='Anchor'></a><span class='plain-code'>How Do I Know This Thing Is Using GPU for Reals?</span></h2> <p>That&rsquo;s easy! NVIDIA provides us with a neat little command-line utility called <code>nvidia-smi</code> which we can use to monitor and get information about NVIDIA GPU devices.</p> <p>Let&rsquo;s SSH to the running Machine for the API server app and run an <code>nvidia-smi</code> query in one go. 
It&rsquo;s a little clunky, but you&rsquo;ll get the point:</p> <div class="highlight-wrapper group relative bash"> <button type="button" class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-wrap-target="#code-v0fauj3q" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"><g buffered-rendering="static"><path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /><path d="M11.081 6.466L9.533 8.037l1.548 1.571" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"> Wrap text </span> </button> <button type="button" class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-copy-target="sibling" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"><g buffered-rendering="static"><path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /><path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"> Copy to clipboard </span> </button> <div class='highlight relative group'> <pre 
class='highlight '><code id="code-v0fauj3q">fly ssh console <span class="se">\</span> <span class="nt">-C</span> <span class="s2">"nvidia-smi --query-gpu=gpu_name,utilization.gpu,utilization.memory,temperature.gpu,power.draw --format=csv,noheader --loop"</span> </code></pre> </div> </div><div class="highlight-wrapper group relative "> <button type="button" class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-wrap-target="#code-j86zv5m2" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"><g buffered-rendering="static"><path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /><path d="M11.081 6.466L9.533 8.037l1.548 1.571" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"> Wrap text </span> </button> <button type="button" class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-copy-target="sibling" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"><g buffered-rendering="static"><path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /><path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 
0-1.815.819-1.816 1.815v1.617" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"> Copy to clipboard </span> </button> <div class='highlight relative group'> <pre class='highlight '><code id="code-j86zv5m2">Connecting to fdaa:2:f664:a7b:210:d8b2:8fd8:2... complete NVIDIA L40S, 0 %, 0 %, 46, 88.63 W NVIDIA L40S, 0 %, 0 %, 46, 88.61 W NVIDIA L40S, 36 %, 4 %, 51, 103.41 W NVIDIA L40S, 65 %, 25 %, 57, 280.90 W NVIDIA L40S, 0 %, 0 %, 49, 91.13 W NVIDIA L40S, 0 %, 0 %, 48, 89.76 W </code></pre> </div> </div> <p>What we&rsquo;ve done is run the command on a loop while the bot is actually doing work synthesizing an image and we get to see it ramp up and consume more wattage and VRAM. The card is barely breaking a sweat!</p> <h2 id='how-much-will-these-alpaca-pics-cost-me' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#how-much-will-these-alpaca-pics-cost-me' aria-label='Anchor'></a><span class='plain-code'>How Much Will These Alpaca Pics Cost Me?</span></h2> <p>Let&rsquo;s talk about the cost-effectiveness of this setup. On Fly.io, an L40S GPU <a href='https://fly.io/docs/about/pricing/#gpus-and-fly-machines' title=''>costs</a> $2.50/hr. Tag on a few cents per hour for the VM resources and storage for our models and you&rsquo;re looking at about $3.20/hr to run the GPU Machine. It&rsquo;s <em>on-demand</em>, too — if you&rsquo;re not using the compute, you&rsquo;re not paying for it! Keep in mind that some of these checkpoint models can be several gigabytes and if you create a volume, you will be charged for it even when you have no Machines running. 
It&rsquo;s worth noting too, that the non-GPU bot app falls into our <a href='https://fly.io/docs/about/pricing/#free-allowances' title=''>free allowance</a>.</p> <div class="right-sidenote"><p>Rates are on-demand, with no minimum usage requirements. Discounted rates for reserved GPU Machines and dedicated hosts are also available if you email <a href="mailto:[email protected]" title="">[email protected]</a></p> </div> <p>In comparison, Midjourney offers several subscription tiers with the cheapest plan costing $10/mo and providing 3.3 hours of &ldquo;fast&rdquo; GPU time (roughly equivalent to an enterprise-grade Fly GPU). This works out to about $3/hr give or take a few cents.</p> <h2 id='where-can-i-take-this' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#where-can-i-take-this' aria-label='Anchor'></a><span class='plain-code'>Where Can I Take This?</span></h2> <p>There is a lot you can do to build out the bot&rsquo;s functionality. You control the source code for the bot, meaning that you can make it do <em>whatever you want</em>. You might decide to mimic Midjourney&rsquo;s <code>/blend</code> command to splice your own images into prompts (AKA img2img diffusion). You can do this by adding more commands to your <a href='https://guide.pycord.dev/popular-topics/cogs' title=''>Cog</a>, Pycord&rsquo;s way of grouping similar commands. You might decide to add a button to roll the image if you don&rsquo;t like it, or even specify the number of images to return. 
The possibilities are endless and your cloud bill&rsquo;s the limit!</p> <p>The full code for the bot and server (with detailed instructions on how to deploy it on Fly.io) can be found <a href='https://github.com/fly-apps/not-midjourney-bot' title=''><strong class='font-semibold text-navy-950'>here</strong></a>.</p></content> </entry> <entry> <title>Fly With Alpine</title> <link rel="alternate" href="https://fly.io/blog/fly-with-alpine/"/> <id>https://fly.io/blog/fly-with-alpine/</id> <published>2023-12-21T00:00:00+00:00</published> <updated>2024-04-12T18:23:39+00:00</updated> <media:thumbnail url="https://fly.io/blog/fly-with-alpine/assets/fly-with-alpine-thumb.webp"/> <content type="html"><div class="lead"><p>Reduce image sizes and improve startup times by switching your base image to Alpine Linux.</p> </div> <p>Before proceeding, a caution. This is an engineering trade-off. Test carefully before deploying to production.</p> <p>By the end of this blog post you should have the information you need to make an informed decision.</p> <h2 id='introduction' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#introduction' aria-label='Anchor'></a><span class='plain-code'>Introduction</span></h2> <p><a href='https://www.alpinelinux.org/about/' title=''>Alpine Linux</a> is a Linux distribution that advertises itself as Small. Simple. Secure.</p> <p>It is indisputably smaller than the alternatives &ndash; when measured by image size. More on that in a bit. Some claim that this results in less memory usage and better performance. Others dispute these claims. For these, it is best that you test the results for yourself with your application.</p> <p>Simple is harder to measure. 
Some of the larger differences, like <a href='https://github.com/OpenRC/openrc#readme' title=''>OpenRC</a> vs <a href='https://systemd.io/' title=''>systemd</a>, are less relevant in container environments. Others, like <a href='https://busybox.net/' title=''>BusyBox</a>, are implementation details. Essentially what you get is a Linux distribution with perhaps a number of standard packages (e.g., bash) not installed by default, but these can be easily added if needed.</p> <p>Secure is definitely an important attribute. The alternatives make comparable claims in this area. Do your own research and come to your own conclusions.</p> <p>Not mentioned is the downside: Alpine Linux has a smaller ecosystem than the alternatives, particularly when compared to Debian.</p> <h2 id='baseline' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#baseline' aria-label='Anchor'></a><span class='plain-code'>Baseline</span></h2> <p>Let&rsquo;s start with a baseline consisting of the Dockerfiles produced by <code>fly launch</code> for some of the most popular frameworks:</p> <div class="highlight-wrapper group relative "> <button type="button" class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-wrap-target="#code-ywliy2hv" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"><g buffered-rendering="static"><path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 
00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /><path d="M11.081 6.466L9.533 8.037l1.548 1.571" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"> Wrap text </span> </button> <button type="button" class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-copy-target="sibling" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"><g buffered-rendering="static"><path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /><path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"> Copy to clipboard </span> </button> <div class='highlight relative group'> <pre class='highlight '><code id="code-ywliy2hv">FROM fideloper/fly-laravel:${PHP_VERSION} FROM hexpm/elixir:1.12.3-erlang-24.1.4-debian-bullseye-20210902-slim FROM node:${NODE_VERSION}-slim FROM oven/bun:${BUN_VERSION}-slim FROM python:${PYTHON_VERSION}-slim-bullseye FROM ruby:$RUBY_VERSION-slim </code></pre> </div> </div> <p>What may not be obvious to the naked eye from these results is that the base image for these is one of the following:</p> <ul> <li>Debian Bookworm (the current &ldquo;stable&rdquo; distribution) </li><li>Debian Bullseye (the previous &ldquo;stable&rdquo; distribution) </li><li>Ubuntu Focal Fossa (the previous LTS release of Ubuntu) </li></ul> <p>Once you factor in that Ubuntu is based on Debian, the conclusion is that Debian is effectively the 
default distribution for Fly.io. Rest assured that this isn&rsquo;t the result of a devious conspiracy by Fly.io, but rather a reflection of the default choices made independently by the developers of a number of frameworks and runtimes. Beyond this, all Fly.io is doing is choosing the &ldquo;slim&rdquo; version of the default distribution for each framework as the base.</p> <p>What&rsquo;s likely going on here is a virtuous circle: people choose Debian because of the ecosystem, and the ecosystem grows because people choose Debian.</p> <p>Now let&rsquo;s compare base image sizes:</p> <table class="ml-8 mb-8"> <thead> <tr> <th class="px-8"> <th class="px-8 underline">Alpine <th class="px-8 underline">Debian slim </tr> </thead> <tbody> <tr> <th class="text-left">Bun 1.0.18 <td class="text-center">43.10M <td class="text-center">63.84M </tr> <tr> <th class="text-left">Node 21.4.0 <td class="text-center">46.83M <td class="text-center">70.08M </tr> <tr> <th class="text-left">Python 3.12.1 <td class="text-center">17.59M <td class="text-center">45.36M </tr> <tr> <th class="text-left">Ruby 3.2 <td class="text-center">40.14M <td class="text-center">74.36M </tr> </tbody> </table> <p>And these numbers are just for the base images. I&rsquo;ve measured a minimal Rails/PostgreSQL/esbuild application at 304MB on Alpine and 428MB on Debian Slim. A minimal Bun application at 110MB on Alpine and 173MB on Debian Slim. 
And a minimal Node application at 142MB on Alpine and 207MB on Debian Slim.</p> <p>In each case, the Alpine image is smaller than its Debian slim equivalent.</p> <h2 id='switching-distributions' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#switching-distributions' aria-label='Anchor'></a><span class='plain-code'>Switching Distributions</span></h2> <p>Switching distributions (and switching back!) is easy.</p> <p>The first change is to replace <code>-slim</code> with <code>-alpine</code> in <code>FROM</code> statements in your <code>Dockerfile</code>.</p> <p>Next is to replace <code>apt-get update</code> with <code>apk update</code> and <code>apt-get install</code> with <code>apk add</code>. Delete any options you may have like <code>-y</code> and <code>--no-install-recommends</code> &ndash; they aren&rsquo;t needed.</p> <p>Now review the names of the packages you are installing. Many are named the same. A few are different. You can use <a href='https://pkgs.alpinelinux.org/packages' title=''>Alpine packages</a> to look for ones to use. 
Some examples of differences:</p> <table class="ml-8 mb-8" style="border-collapse: separate; border-spacing: 1rem 0"> <thead> <tr> <th class="px-8 underline text-left">Debian <th class="px-8 underline text-left">Alpine </tr> </thead> <tbody> <tr> <td>build-essential <td>build-base </tr> <tr> <td>chromium-sandbox <td>chromium-chromedriver </tr> <tr> <td>default-libmysqlclient-dev <td>mysql-client </tr> <tr> <td>default-mysqlclient <td>mysql-client </tr> <tr> <td>freetds-bin <td>freetds </tr> <tr> <td>libicu-dev <td>icu-dev </tr> <tr> <td>libjemalloc <td>jemalloc-dev </tr> <tr> <td>libjpeg-dev <td>jpeg-dev </tr> <tr> <td>libmagickwand-dev <td>imagemagick-libs </tr> <tr> <td>libsqlite3-0 <td>sqlite-dev </tr> <tr> <td>libtiff-dev <td>tiff-dev </tr> <tr> <td>libvips <td>vips-dev </tr> <tr> <td>node-gyp <td>gyp </tr> <tr> <td>pkg-config <td>pkgconfig </tr> <tr> <td>python <td>python3 </tr> <tr> <td>python-is-python3 <td>python3 </tr> <tr> <td>sqlite3 <td>sqlite </tr> </tbody> </table> <p>Note: the above is just an approximation. For example, while <code>libsqlite3-0</code> and <code>sqlite-dev</code> include everything you need to build an application that uses sqlite3, all that is needed at runtime is <code>sqlite-lib</code>. This relentless attention to detail contributes to smaller final image sizes.</p> <p>Note: For Bun, Node, and Rails users, knowledge of how to apply the above changes is included in recent versions of the Dockerfile generators that we provide. 
After all, computers are good at <code>if</code> statements:</p> <div class="highlight-wrapper group relative "> <button type="button" class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-wrap-target="#code-q2q9lq4b" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"><g buffered-rendering="static"><path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /><path d="M11.081 6.466L9.533 8.037l1.548 1.571" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"> Wrap text </span> </button> <button type="button" class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-copy-target="sibling" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"><g buffered-rendering="static"><path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /><path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"> Copy to clipboard </span> </button> <div class='highlight relative group'> <pre 
class='highlight '><code id="code-q2q9lq4b">bunx dockerfile --alpine
npx dockerfile --alpine
bin/rails generate dockerfile --alpine
</code></pre> </div> </div><figure class="post-cta"> <figcaption> <h1>Choose your own Linux Distribution</h1> <p>Deploy your project in a few minutes with Fly Launch. Then do more with Fly Machines.</p> <a class="btn btn-lg" href="https://fly.io/docs/"> Run your entire stack near your users </a> </figcaption> <div class="image-container"> <img src="/static/images/cta-rabbit.webp" srcset="/static/images/[email protected] 2x" alt=""> </div> </figure> <h2 id='potential-issues' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#potential-issues' aria-label='Anchor'></a><span class='plain-code'>Potential issues</span></h2> <p>Over time, we&rsquo;ve noted a number of issues.</p> <ul> <li>Alpine uses <a href='https://musl.libc.org/' title=''>musl</a> for a runtime library. Debian uses <a href='https://www.gnu.org/software/libc/' title=''>glibc</a>. Software tested on glibc may not work as expected on musl. And there are other potential compatibility issues like <a href='https://bell-sw.com/blog/how-to-deal-with-alpine-dns-issues/' title=''>DNS</a>. </li><li>Debian includes both <code>adduser</code> and <code>useradd</code>. Alpine, by default, only includes <code>adduser</code>. This can be addressed by installing a package like <a href='https://pkgs.alpinelinux.org/package/edge/community/armv7/shadow' title=''>shadow</a>, or switching to <code>adduser</code>. </li><li>Packages like <a href='https://github.com/nodenv/node-build' title=''>node-build</a> require <code>bash</code> which isn&rsquo;t included by default. 
Adding it back in allows <code>node-build</code> to run to completion, but the end result is that a precompiled Debian executable is installed that won&rsquo;t run on Alpine. An alternative is to download an <a href='https://unofficial-builds.nodejs.org/' title=''>unofficial build</a>. </li><li>Release candidates for Alpine may not get the same level of testing as Debian, resulting in problems like <a href='https://github.com/sparklemotion/sqlite3-ruby/issues/434' title=''>sqlite3-ruby not working on Alpine 3.19</a>. In cases like this, stay back on previous versions of Alpine for a short while, or compile the gem for yourself. These issues are temporary. </li><li>Some packages, like Chrome, are not available for Alpine. Alternatives like Chromium may be necessary. </li></ul> <h2 id='conclusion' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#conclusion' aria-label='Anchor'></a><span class='plain-code'>Conclusion</span></h2> <p>While not as large a community as Debian, there is a substantial number of happy Alpine users.</p> <p>For the foreseeable future, the default for both frameworks, and therefore for fly.io, will remain Debian, but we make it easy to switch.</p> <p>Try it out! 
Hopefully this blog has provided insight into what you should evaluate before you switch.</p></content> </entry> <entry> <title>Introducing Fly Kubernetes</title> <link rel="alternate" href="https://fly.io/blog/fks/"/> <id>https://fly.io/blog/fks/</id> <published>2023-12-18T00:00:00+00:00</published> <updated>2024-04-12T18:23:39+00:00</updated> <media:thumbnail url="https://fly.io/blog/fks/assets/fks-thumb.webp"/> <content type="html"><div class="lead"><p>We’re Fly.io, and if you’ve been following us awhile you probably just did a double-take. 
We’re building a new public cloud that runs containerized applications with virtual machine isolation on our own hardware around the world. And we’ve been doing it without any K8s. Until now!</p> </div><div class="callout"><p><strong class="font-semibold text-navy-950">Update, March 2024:</strong> FKS does more stuff now, and you can read about it in <a href="https://fly.io/blog/fks-beta-live/" title="">Fly Kubernetes does more now</a></p> </div> <p>We&rsquo;ll own it: we&rsquo;ve been snarky about Kubernetes. We are, at heart, old-school Unix nerds. We&rsquo;re still scandalized by <code>systemd</code>.</p> <p>To make matters more complicated, the problems we&rsquo;re working on <a href='https://fly.io/blog/carving-the-scheduler-out-of-our-orchestrator/' title=''>have a lot of overlap with K8s</a>, but <a href='https://fly.io/blog/carving-the-scheduler-out-of-our-orchestrator/#numad' title=''>just enough impedance mismatch</a> that it (<a href='https://www.nomadproject.io/' title=''>or anything that looks like it</a>) is a bad fit for our own platform.</p> <p>But, come on: you never took us too seriously about K8s, right? K8s is hard for us to use, but that doesn&rsquo;t mean it&rsquo;s not a great fit for what you&rsquo;re building. We&rsquo;ve been clear about that all along, right? Sure we have!</p> <p>Well, good news, everybody! 
If K8s is important for your project, and that&rsquo;s all that&rsquo;s been holding you back from <a href='https://fly.io/docs/speedrun/' title=''>trying out Fly.io</a>, we&rsquo;ve spent the past several months building something for you.</p> <h2 id='fly-io-for-kubernetians' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#fly-io-for-kubernetians' aria-label='Anchor'></a><span class='plain-code'>Fly.io For Kubernetians</span></h2> <p>Fly.io works by transmogrifying Docker containers into filesystems for <a href='https://firecracker-microvm.github.io/' title=''>lightweight hypervisors</a>, and running them on servers we rack in dozens of regions around the world.</p> <p>You can build something like Fly.io with &ldquo;standard&rdquo; orchestration tools like K8s. In fact, that&rsquo;s what we did to start, too. To keep things simple, we used Nomad, and instead of K8s CNIs, we built our own Rust-based TLS-terminating Anycast proxy (and designed a WireGuard/IPv6-based private network system <a href='https://fly.io/blog/bpf-xdp-packet-filters-and-udp/' title=''>based on eBPF</a>). But the ideas are the same.</p> <p>The way we look at it, the signature feature of a &ldquo;standard&rdquo; orchestrator is the global scheduler: the global eye in the sky that keeps track of vacancies on servers and optimized placement of new workloads. That&rsquo;s the problem we ran into. We&rsquo;re running over 200,000 applications, and we&rsquo;re doing so on every continent except Antarctica. 
The speed of light (and a globally distributed network of backhoes) has something to say about keeping a perfectly consistent global picture of hundreds of thousands of applications, and it&rsquo;s not pleasant.</p> <p>The other problem we ran into is that our Nomad scheduler kept trying to outsmart us, and, worse, our customers. It turns out that our users have pretty firm ideas of where they&rsquo;d like their apps to run. If they ask for São Paulo, they want São Paulo, not Rio. But global schedulers have other priorities, like optimally bin-packing resources, and sometimes <code>GIG</code> looks just as good as <code>GRU</code> to them.</p> <p>To escape the scaling and DX problems we were hitting, we rethought orchestration. Where orchestrators like K8s tend to work through distributed consensus, we keep state local to workers. Each racked server in our fleet is a source of truth about the apps running on it, and provides an API to a market-style &ldquo;scheduler&rdquo; that bids on resources in regions. <a href='https://fly.io/blog/carving-the-scheduler-out-of-our-orchestrator/#numad' title=''>You can read more about it here, if you&rsquo;re interested.</a> We call this system the <a href='https://fly.io/docs/machines/' title=''>Fly Machines API.</a></p> <p>An important detail to grok about how this all works – a reason we haven&rsquo;t, like, beaten the CAP theorem by doing this – is that Fly Machines API calls can fail. If Nomad or K8s tries to place a workload on some server, only to find out that it&rsquo;s filled up or thrown a rod, it will go hunt around for some other place to put it, like a good little robot. The Machines API won&rsquo;t do this. It&rsquo;ll just fail the request. 
In fact, it goes out of its way to fail the request quickly, to deliver feedback; if we can&rsquo;t schedule work in <code>JNB</code> right now, you might want instead to quickly deploy to <code>BOM</code>.</p> <h2 id='pluggable-orchestration-and-fks' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#pluggable-orchestration-and-fks' aria-label='Anchor'></a><span class='plain-code'>Pluggable Orchestration and FKS</span></h2> <p>In a real sense what we&rsquo;ve done here is extract a chunk of the scheduling problem out of our orchestrator, and hand it off to other components. For most of our users, that component is <a href='https://github.com/superfly/flyctl' title=''><code>flyctl</code>, our intrepid CLI</a>.</p> <p>But <a href='https://fly.io/docs/machines/working-with-machines/' title=''>Fly Machines is an API</a>, and anything can drive it. A lot of our users want quick answers to requests to schedule apps in specific regions, and <code>flyctl</code> does a fine job of that. But it&rsquo;s totally reasonable to want something that works more like the good little robots inside of K8s.</p> <p>You can build your own orchestrator with our API, but if what you&rsquo;re looking for is literally Kubernetes, we&rsquo;ve saved you the trouble. It&rsquo;s called Fly Kubernetes, or FKS for short.</p> <p>FKS is an implementation of Kubernetes that runs on top of Fly.io. 
You start it up using <code>flyctl</code>, by running <code>flyctl ext k8s create</code>.</p> <p>Under the hood, FKS is a straightforward combination of two well-known Kubernetes projects: <a href='https://k3s.io/' title=''>K3s, the lightweight CNCF-certified K8s distro</a>, and <a href='https://virtual-kubelet.io/' title=''>Virtual Kubelet</a>.</p> <p>Virtual Kubelet is interesting. 
In K8s-land, a <code>kubelet</code> is a host agent; it&rsquo;s the thing that runs on every server in your fleet that knows how to run a K8s Pod. Virtual Kubelet isn&rsquo;t a host agent; it&rsquo;s a software component that pretends to be a host, registering itself with K8s as if it was one, but then sneakily proxying the Kubelet API elsewhere.</p> <p>In FKS, &ldquo;elsewhere&rdquo; is <a href='https://fly.io/docs/machines/' title=''>Fly Machines</a>. All we have to do is satisfy various APIs that virtual kubelet exposes. For example, the API for the lifecycle of a pod:</p> <div class="highlight-wrapper group relative "> <button type="button" class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-wrap-target="#code-did7dsc1" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"><g buffered-rendering="static"><path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /><path d="M11.081 6.466L9.533 8.037l1.548 1.571" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"> Wrap text </span> </button> <button type="button" class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-copy-target="sibling" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"><g 
buffered-rendering="static"><path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /><path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"> Copy to clipboard </span> </button> <div class='highlight relative group'> <pre class='highlight '><code id="code-did7dsc1">type PodLifecycleHandler interface { CreatePod(ctx context.Context, pod *corev1.Pod) error UpdatePod(ctx context.Context, pod *corev1.Pod) error DeletePod(ctx context.Context, pod *corev1.Pod) error GetPod(ctx context.Context, namespace, name string) (*corev1.Pod, error) GetPodStatus(ctx context.Context, namespace, name string) (*corev1.PodStatus, error) GetPods(context.Context) ([]*corev1.Pod, error) } </code></pre> </div> </div> <p>This interface is easy to map to the Fly Machines API. 
For example:</p> <div class="highlight-wrapper group relative "> <button type="button" class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-wrap-target="#code-hv82buwy" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"><g buffered-rendering="static"><path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /><path d="M11.081 6.466L9.533 8.037l1.548 1.571" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"> Wrap text </span> </button> <button type="button" class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-copy-target="sibling" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"><g buffered-rendering="static"><path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /><path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"> Copy to clipboard </span> </button> <div class='highlight relative group'> <pre class='highlight '><code id="code-hv82buwy">CreatePod 
-&gt; POST /apps/{app_name}/machines UpdatePod -&gt; POST /apps/{app_name}/machines/{machine_id} </code></pre> </div> </div> <p>K3s, meanwhile, is a stripped-down implementation of all of K8s that fits into a single binary. K3s does a bunch of clever things to be as streamlined as it is, but the most notable of them is <a href='https://github.com/k3s-io/kine' title=''>kine, an API shim that switches <code>etcd</code> out with databases like SQLite</a>. Because of <code>kine</code>, K3s can manage multiple servers, but also gracefully runs on a single server, without distributed state.</p> <p>So that&rsquo;s what we do. When you create a cluster, we run K3s and the Virtual Kubelet on a single Fly Machine. We compile a <a href='https://kubernetes.io/docs/concepts/configuration/organize-cluster-access-kubeconfig/' title=''>kubeconfig</a>, with which you can talk to your K3s via <code>kubectl</code>. We set the whole thing up to run Pods on individual Fly Machines, so your cluster scales out directly using our platform, but with K8s tooling.</p> <p>One thing we like about this design is how much of the lifting is already done for us by the underlying platform. If you&rsquo;re a K8s person, take a second to think of all the different components you&rsquo;re dealing with: <a href='https://etcd.io/' title=''>etcd</a>, specifically provisioned nodes, the <a href='https://kubernetes.io/docs/reference/command-line-tools-reference/kube-proxy/' title=''>kube-proxy</a>, <a href='https://github.com/flannel-io/flannel' title=''>a CNI </a>binary and configuration and its integration with the host network, containerd, registries. But Fly.io already does most of those things. 
So this project was mostly chipping away components until we found the bare minimum: CoreDNS, SQLite persistence, and Virtual Kubelet.</p> <p>We ended up with something significantly simpler than K3s, which is saying something.</p> <p>Fly Kubernetes has some advantages over plain <code>flyctl</code> and <code>fly.toml</code>:</p> <ul> <li>Your deployment is more declarative than it is with the <code>fly.toml</code> file. You declare the exact state of everything down to replica counts, autoscaling rules, volume definitions, and more. </li><li>When you deploy with Fly Kubernetes, Kubernetes will automatically make your definitions match the state of the world. Machines go down? Kubernetes will whack them back online. </li></ul> <p>This is a different way to do orchestration and scheduling on Fly.io. It&rsquo;s not what everyone is going to want. But if you want it, you really want it, and we&rsquo;re psyched to give it to you: Fly.io&rsquo;s platform features, with Kubernetes handling configuration and driving your system to its desired state.</p> <p>We&rsquo;ve kept things simple to start with. There are K8s use cases we&rsquo;re a strong fit for today, and others we&rsquo;ll get better at in the near future, as K8s users drive the underlying platform (and particularly our proxy) forward.</p> <p><strong class='font-semibold text-navy-950'>Interested in getting early access? Email us at <a href="mailto:[email protected]">[email protected]</a> and we&rsquo;ll hook you up.</strong></p> <figure class="post-cta"> <figcaption> <h1>Not invested in K8s?</h1> <p>Nothing has to change for you! 
You can deploy apps on Fly.io today, in a matter of minutes, without talking to Sales.</p> <a class="btn btn-lg" href="https://fly.io/docs/speedrun/"> Deploy an app in minutes.<span class='opacity-50'>→</span> </a> </figcaption> <div class="image-container"> <img src="/static/images/cta-cat.webp" srcset="/static/images/[email protected] 2x" alt=""> </div> </figure> <div class="youtube-container" data-exclude-render> <div class="youtube-video"> <iframe width="100%" height="100%" src="https://www.youtube.com/embed/A3vFfZvUiwo" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen> </iframe> </div> </div> <h2 id='what-it-all-means' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#what-it-all-means' aria-label='Anchor'></a><span class='plain-code'>What It All Means</span></h2> <p>One obvious thing it means is that if you&rsquo;ve got an investment in Kubernetes tooling, you can keep it while running things on top of Fly.io. So that&rsquo;s pretty neat. Buy our cereal!</p> <p>But the computer science story is interesting, too. We placed a bet on an idiosyncratic strategy for doing global orchestration. We replaced global consensus, which is how Borg, Kubernetes, and Nomad all work, with a market-based system. That system was faster and, importantly, dumber than the consensus system it replaced.</p> <p>This had costs! Nomad&rsquo;s global consensus would do truly heroic amounts of work to make sure Fly Apps got scheduled somewhere, anywhere. 
Like a good capitalist, Fly Machines will tell you in no uncertain terms how much work it&rsquo;s willing to do for you (&ldquo;less than a Nomad&rdquo;).</p> <p>But that doesn&rsquo;t mean you&rsquo;re stuck with the answers Fly Machines gives by itself. 
Because Fly Machines is so simple, and tries so hard to be predictable, we hoped you&rsquo;d be able to build more sophisticated scheduling and orchestration schemes on top of it. And here you go: Kubernetes scheduling, as a plugin to the platform.</p> <p>More to come! We&rsquo;re itching to see just how many different ways this bet might pay off. Or: we&rsquo;ll perish in flames! Either way, it&rsquo;ll be fun to watch.</p></content> </entry> <entry> <title>Fly.io has GPUs now</title> <link rel="alternate" href="https://fly.io/blog/fly-io-has-gpus-now/"/> <id>https://fly.io/blog/fly-io-has-gpus-now/</id> <published>2023-12-13T00:00:00+00:00</published> <updated>2024-04-12T18:23:39+00:00</updated> <media:thumbnail url="https://fly.io/blog/fly-io-has-gpus-now/assets/llama-portal-thumb.webp"/> <content type="html"><div class="lead"><p>We’re Fly.io, we’re a new public cloud that lets you put your compute where it matters: near your users. Today we’re announcing that you can do this with GPUs too, allowing you to do AI workloads on the edge. Want to find out more? Keep reading.</p> </div><h2 id='ai-is-pretty-fly' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#ai-is-pretty-fly' aria-label='Anchor'></a><span class='plain-code'>AI is pretty fly</span></h2> <p>AI is apparently a bit of a <em>thing</em> (maybe even <em>an thing</em> come to think about it). We&rsquo;ve seen entire industries get transformed in the wake of ChatGPT existing (somehow it&rsquo;s only been around for a year, I can&rsquo;t believe it either). It&rsquo;s likely to leave a huge impact on society as a whole in the same way that the Internet did once we got search engines. 
Like any good venture-capital funded infrastructure provider, we want to enable you to do hilarious things with AI using industrial-grade muscle.</p> <p>Fly.io lets you run a full-stack app&mdash;or an entire dev platform based on the <a href='https://fly.io/docs/machines/' title=''>Fly Machines API</a>&mdash;close to your users. Fly.io GPUs let you attach an <a href='https://www.nvidia.com/en-us/data-center/a100/' title=''>Nvidia A100</a> to whatever you&rsquo;re building, harnessing the full power of CUDA with more VRAM than your local 4090 can shake a ray-traced stick at. With these cards (or whatever you call a GPU attached to SXM fabric), AI/ML workloads are at your fingertips. You can <a href='https://fly.io/blog/transcribing-on-fly-gpu-machines/' title=''>recognize speech</a>, segment text, summarize articles, synthesize images, and more at speeds that would make your homelab blush. You can even set one up as your programming companion with <a href='https://github.com/deepseek-ai/DeepSeek-Coder' title=''>your model of choice</a> in case you&rsquo;ve just not been feeling it with the output of <em>other</em> models changing over time.</p> <p>If you want to find out more about what these cards are and what using them is like, check out <a href='https://fly.io/blog/what-are-these-gpus-really/' title=''>What are these &ldquo;GPUs&rdquo; really?</a> It covers the history of GPUs and why it&rsquo;s ironic that the cards we offer are called &ldquo;Graphics Processing Units&rdquo; in the first place.</p> <h2 id='fly-io-gpus-in-action' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#fly-io-gpus-in-action' aria-label='Anchor'></a><span class='plain-code'>Fly.io GPUs in Action</span></h2> <p>We want you to deploy your own code with your favorite models on top of 
Fly.io&rsquo;s cloud backbone. Fly.io GPUs make this really easy.</p> <p>You can get a GPU app running <a href='https://ollama.ai' title=''>Ollama</a> (our friends in text generation) in two steps:</p> <ol> <li><p>Put this in your <code>fly.toml</code>:</p> <div class="highlight-wrapper group relative toml"> <button type="button" class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-wrap-target="#code-l8a9wi1z" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"><g buffered-rendering="static"><path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /><path d="M11.081 6.466L9.533 8.037l1.548 1.571" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"> Wrap text </span> </button> <button type="button" class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-copy-target="sibling" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"><g buffered-rendering="static"><path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /><path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" 
/></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"> Copy to clipboard </span> </button> <div class='highlight relative group'> <pre class='highlight '><code id="code-l8a9wi1z"><span class="py">app</span> <span class="p">=</span> <span class="s">"sandwich_ai"</span> <span class="py">primary_region</span> <span class="p">=</span> <span class="s">"ord"</span> <span class="py">vm.size</span> <span class="p">=</span> <span class="s">"a100-40gb"</span> <span class="nn">[build]</span> <span class="py">image</span> <span class="p">=</span> <span class="s">"ollama/ollama"</span> <span class="nn">[mounts]</span> <span class="py">source</span> <span class="p">=</span> <span class="s">"models"</span> <span class="py">destination</span> <span class="p">=</span> <span class="s">"/root/.ollama"</span> <span class="py">initial_size</span> <span class="p">=</span> <span class="s">"100gb"</span> </code></pre> </div> </div></li><li><p>Run <code>fly apps create sandwich_ai &amp;&amp; fly deploy</code>.</p> </li></ol> <p>If you want to read more about how to start your new sandwich empire, check out <a href='https://fly.io/blog/scaling-llm-ollama/' title=''>Scaling Large Language Models to zero with Ollama</a>; it explains how to set up Ollama so that it <em>automatically scales itself down</em> when it&rsquo;s not in use.</p> <h2 id='the-speed-of-light-is-only-so-fast' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#the-speed-of-light-is-only-so-fast' aria-label='Anchor'></a><span class='plain-code'>The speed of light is only so fast</span></h2> <p>Being able to spin up GPUs is great, but where Fly.io really shines is inference at the edge.</p> <p>Let&rsquo;s say you have an app that lets users enter ingredients they have in 
their kitchen and receive a sandwich recipe. Your users expect their recipes <em>instantly</em> (or at least as fast as the other leading apps). Seconds count when you need an emergency sandwich.</p> <div class="left-sidenote"><p><br> <br> <br> It’s depressingly customary in the AI industry to cherry-pick outputs. This was not cherry-picked. I used <a href="https://ollama.ai/library/yi:34b" title="">yi:34b</a> to generate this recipe. I’m not sure what a taco salad sandwich is, but I might be willing to try it.</p> </div> <p><img alt="A conversation between a user and an artificial intelligence. The user asks: &quot;What kind of sandwich can I make with an egg, a taco shell, and a pound of ground beef?&quot; The AI responds: &quot; You can make a breakfast taco or a taco salad sandwich with the ingredients you have. Here&#39;s how to prepare each: for the breakfast taco. Cook the pound of ground beef in a pan until browned. Season with salt, pepper, and any other desired spices like chili powder or taco seasoning. While the beef is cooking, fry an egg (or scramble it if you prefer) in another pan. Warm up the taco shell in the oven or microwave according to the package instructions. Assemble the taco by placing the ground cooked beef in the taco shell followed by the fried egg. You can also add any additional toppings like diced onions avocados, avocado, salsa or cheese if you have them. Serve with a side of hot sauce or sour cream if desired. For the taco salad sandwich: Prepare the ground beef as in the breakfast taco recipe above. Instead of frying an egg, you can cook it sunny side up or over easy depending on your preference. Crumble the taco shell into small pieces and mix them with the cooked ground beef. Assemble the sandwich by layering the crumbled taco shell mixture and ground beef in two slices of bread. Top with the fried egg and any additional toppings you like such as shredded lettuce, diced tomatoes, or sliced olives. 
Serve with a side of salsa or ranch dressing if desired. Both options are hardy and can be easily customized based on your taste preferences. Enjoy!&quot;" src="/blog/fly-io-has-gpus-now/assets/sandwich-recipes.webp" /></p> <p>In the previous snippet, we deployed our app to ord (<code>primary_region = &quot;ord&quot;</code>). The good news is that our model returns a result really quickly and users in Chicago get instant sandwich recipes. It&rsquo;s a good experience for users near your datacentre, and you can do this on any half decent cloud provider.</p> <p>But surely people outside of Chicago need sandwiches too. Amsterdam has sandwich fiends as well. And sometimes it takes too long to have their requests leap across the pond. The speed of light is only so fast after all. Don&rsquo;t worry, we&rsquo;ve got your back. Fly.io has GPUs in datacentres all over the world. Even more, we&rsquo;ll let you run <em>the same program</em> with the same public IP address and the same TLS certificates in any region with GPU support.</p> <p>Don&rsquo;t believe us? 
See how you can scale your app up in Amsterdam with one command:</p> <div class="highlight-wrapper group relative "> <button type="button" class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-wrap-target="#code-404ps1ts" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"><g buffered-rendering="static"><path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /><path d="M11.081 6.466L9.533 8.037l1.548 1.571" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"> Wrap text </span> </button> <button type="button" class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-copy-target="sibling" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"><g buffered-rendering="static"><path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /><path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"> Copy to clipboard </span> </button> <div class='highlight relative group'> <pre 
class='highlight '><code id="code-404ps1ts">fly scale count 2 --region ams </code></pre> </div> </div> <p>It&rsquo;s that easy.</p> <h2 id='actually-on-demand' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#actually-on-demand' aria-label='Anchor'></a><span class='plain-code'>Actually On-Demand</span></h2> <p>GPUs are powerful parallel processing packages, but they&rsquo;re not cheap! Once we have enough people wanting to turn their fridge contents into tasty sandwiches, keeping a GPU or two running makes sense. But we&rsquo;re just a small app still growing our user base while also funding the latest large sandwich model research. We want to only pay for GPUs when a user makes a request.</p> <p>Let&rsquo;s open up that <code>fly.toml</code> again, and add a section called <code>services</code>, and we&rsquo;ll include instructions on how we want our app to scale up and down:</p> <div class="highlight-wrapper group relative toml"> <button type="button" class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-wrap-target="#code-cfo4p0z3" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"><g buffered-rendering="static"><path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /><path d="M11.081 6.466L9.533 8.037l1.548 1.571" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"> Wrap text 
</span> </button> <button type="button" class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-copy-target="sibling" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"><g buffered-rendering="static"><path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /><path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"> Copy to clipboard </span> </button> <div class='highlight relative group'> <pre class='highlight '><code id="code-cfo4p0z3"><span class="nn">[[services]]</span> <span class="py">internal_port</span> <span class="p">=</span> <span class="mi">8080</span> <span class="py">protocol</span> <span class="p">=</span> <span class="s">"tcp"</span> <span class="py">auto_stop_machines</span> <span class="p">=</span> <span class="kc">true</span> <span class="py">auto_start_machines</span> <span class="p">=</span> <span class="kc">true</span> <span class="py">min_machines_running</span> <span class="p">=</span> <span class="mi">0</span> </code></pre> </div> </div> <p>Now when no one needs sandwich recipes, you don&rsquo;t pay for GPU time.</p> <h2 id='the-deets' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#the-deets' aria-label='Anchor'></a><span 
class='plain-code'>The Deets</span></h2> <p>We have GPUs ready to use in several US and EU regions and Sydney. You can deploy your sandwich, music generation, or AI illustration apps to:</p> <ul> <li><a href='https://www.nvidia.com/en-us/data-center/a100/' title=''>Ampere A100s</a> with 40GB of RAM for $2.50/hr </li><li><a href='https://www.nvidia.com/en-us/data-center/a100/' title=''>Ampere A100s</a> with 80GB of RAM for $3.50/hr </li><li><a href='https://resources.nvidia.com/en-us-l40s/l40s-datasheet-28413' title=''>Lovelace L40s</a> are coming soon (update: now here!) for $2.50/hr </li></ul> <p>By default, anything you deploy to GPUs will use eight heckin&rsquo; <a href='https://www.amd.com/en/processors/epyc-server-cpu-family' title=''>AMD EPYC</a> CPU cores, and you can attach volumes up to 500 gigabytes. We&rsquo;ll even give you discounts for reserved instances and dedicated hosts if you ask nicely.</p> <p>We hope you have fun with these new cards and we&rsquo;d love to see what you can do with them! Reach out to us on X (formerly Twitter) or <a href='https://community.fly.io/' title=''>the community forum</a> and share what you&rsquo;ve been up to. We&rsquo;d love to see what we can make easier!</p></content> </entry> <entry> <title>What are these "GPUs" really?</title> <link rel="alternate" href="https://fly.io/blog/what-are-these-gpus-really/"/> <id>https://fly.io/blog/what-are-these-gpus-really/</id> <published>2023-12-11T00:00:00+00:00</published> <updated>2024-04-12T18:23:39+00:00</updated> <media:thumbnail url="https://fly.io/blog/what-are-these-gpus-really/assets/gpu-songstress-thumb.webp"/> <content type="html"><div class="lead"><p>Fly.io runs containerized apps with virtual machine isolation on our own hardware around the world, so you can safely run your code close to where your users are. 
We’re in the process of rolling out GPU support, and that’s what this post is about, but you don’t have to wait for that to try us out: <a href="https://fly.io/docs/speedrun/" title="">your app can be up and running on us in minutes</a>.</p> </div> <p>GPU hardware will let our users run all sorts of fun Artificial Intelligence and Machine Learning (AI/ML) workloads near their users. But, what are these &ldquo;GPUs&rdquo; really? What can they do? What <em>can&rsquo;t</em> they do?</p> <p>Listen here for my tale of woe as I spell out exactly what these cards are, are not, and what you can do with them. By the end of this magical journey, you should understand the true irony of them being called &ldquo;Graphics Processing Units&rdquo; and why every marketing term is always bad forever.</p> <h2 id='how-does-computer-formed' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#how-does-computer-formed' aria-label='Anchor'></a><span class='plain-code'>How does computer formed?</span></h2> <p>In the early days of computing, your computer generally had a few basic components:</p> <ul> <li>The CPU </li><li>Input device and assorted peripherals (keyboard, etc) </li><li>Output device (monitor, printer, etc) </li><li>Memory </li><li>Glue logic chips </li><li>Video rendering hardware </li></ul> <p>Taking the Commodore 64 as an example, it had a CPU, a chip to handle video output, a chip to handle audio output, and a chip to glue everything together. The CPU would read instructions from the RAM and then execute them to do things like draw to the screen, solve sudoku puzzles, play sounds, and so on.</p> <p>However, even though the CPU by itself was fast by the standards of the time, it could only do a million clock cycles per second or so. 
Imagine a very small shouting crystal vibrating millions of times per second triggering the CPU to do one part of a task and you&rsquo;ll get the idea. This is fast, but not fast enough when executing instructions can take longer than a single clock cycle and when your video output device needs to be updated 60 times per second.</p> <p>The main way they optimized this was by shunting a lot of the video output tasks to a bespoke device called the VIC-II (Video Interface Chip, version 2). This allowed the Commodore 64 to send a bunch of instructions to the VIC-II and then let it do its thing while the CPU was off doing other things. This is called &ldquo;offloading&rdquo;.</p> <p><img src="/blog/what-are-these-gpus-really/assets/./deus-ex-machina-cover.webp" /></p> <p>As technology advanced, the desire to do bigger and better things with both contemporary and future hardware increased. This came to a head when this little studio nobody had ever heard of called id Software released one of the most popular games of all time: DOOM.</p> <p>Now, even though DOOM was a huge advancement in gaming technology, it was still incredibly limited by the hardware of the time. It was actually a 2D game that used a lot of tricks to make it look (and feel) like it was 3D. It was also limited to a resolution of 320x200 and a hard cap of 35 frames per second. This was fine for the time (most movies were only at 24 frames per second), but it was clear that there was a lot of room for improvement.</p> <p>One of the main things that DOOM did was to use a pair of techniques to draw the world at near real-time. It used a combination of &ldquo;raycasting&rdquo; and binary-space partitioning to draw the world. This basically means that they drew a bunch of imaginary lines to where points in the map would be to figure out what color everything would be and then eliminated the parts of the map that were behind walls and other objects. 
This is a very simplified explanation, and if you want to know more, <a href='https://fabiensanglard.net/doomIphone/doomClassicRenderer.php' title=''>Fabien Sanglard explains the rendering</a> of DOOM in more detail.</p> <h2 id='the-dream-of-3d' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#the-dream-of-3d' aria-label='Anchor'></a><span class='plain-code'>The dream of 3D</span></h2> <p>However, a lot of this was logic that ran very slowly on the CPU, and while the CPU was doing the display logic, it couldn&rsquo;t do anything else, such as enemy AI or playing sounds. Hence the idea of a &ldquo;3D accelerator card&rdquo;. The idea: offload the 3D rendering logic to a separate device that could do it much faster than the CPU could, and free the CPU to do other things like AI, sound, and so on.</p> <p>This was the dream, but it was a long way off. Then Quake happened.</p> <div class="right-sidenote"><p>Really, Half-Life is based on Quake so much that the pattern for <a href="https://www.pcgamer.com/half-life-alyxs-lights-flicker-just-like-they-did-in-quake-almost-25-years-later/" title="">blinking lights</a> has carried forward 25 years later to Half-Life: Alyx in VR. If it ain’t broke, don’t fix it.</p> </div> <p>Unlike Doom, Quake was fully 3D on unmodified consumer hardware. Players could look up and down (something previously thought impossible without accelerator hardware!) and designers could make levels with that in mind. Quake also allowed much more complex geometry and textures. It was a huge leap forward in 3D gaming and it was only possible because of the massive leap in CPU power at the time. The Pentium family of processors was such a huge leap that it allowed them to bust through and do it in &ldquo;real time&rdquo;. 
Quake has since set the standard for multiplayer deathmatch games, and its source code has lineage to Call of Duty, Half-Life, Half-Life 2, DotA 2, Titanfall, and Apex Legends.</p> <p>However, the thing that really made 3D accelerator cards leap into the public spotlight was another little-known studio called Core Design and their 1996 release of Tomb Raider. It was built from the ground up to require the use of 3D accelerator cards. The cards flew off the shelves.</p> <p>&ldquo;3D accelerator cards&rdquo; would later become known as &ldquo;Graphics Processing Units&rdquo; or GPUs because of how synonymous they became with 3D gaming, engineering tasks such as Computer-Aided Drafting (CAD), and even the entire OS environment with compositors like <a href='https://en.wikipedia.org/wiki/Desktop_Window_Manager' title=''>DWM</a> on Windows Vista, <a href='https://en.wikipedia.org/wiki/Compiz' title=''>Compiz</a> on GNU+Linux, and <a href='https://en.wikipedia.org/wiki/Quartz_(graphics_layer)' title=''>Quartz</a> on macOS. Things became so much easier for everyone when 2D and 3D graphics were integrated into the same device so you didn&rsquo;t need to chain your output through your 3D accelerator card!</p> <h2 id='the-gpu-as-we-know-it' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#the-gpu-as-we-know-it' aria-label='Anchor'></a><span class='plain-code'>The GPU as we know it</span></h2> <p>When GPUs first came out, they were very simple devices. 
They had a few basic components:</p> <ul> <li>A framebuffer to store the current state of the screen </li><li>A command processor to take instructions from the game and translate them into something the hardware can understand </li><li>Memory to store temporary data </li><li>Shader processing hardware to allow designers to change how light and textures were rendered </li><li>A display output that was chained through an existing VGA card so that the user could see what was going on in real time (yes, this is something we actually did) </li></ul> <p>This basic architecture has remained the same for the past 20 years or so. The main difference is that, as technology advanced, the capabilities of those cards increased. They got faster, more parallel, more capable, had more memory, were made cheaper, and so on. This gradually allowed for more and more complex games like Half-Life 2, Crysis, The Legend of Zelda: Breath of the Wild, Baldur&rsquo;s Gate 3, and so on.</p> <p>Over time, as more and more hardware was added, GPUs became computers in their own right (sometimes even bigger than the rest of the computer thanks to the need to cool things more aggressively). 
This new hardware includes:</p> <ul> <li>Video encoding hardware via NVENC and AMD VCE so that content creators can stream and record their gameplay in higher quality without having to impact the performance of the game </li><li><aside class="left-sidenote">Seriously, once you experience high framerate HDR raytraced Tetris you can&rsquo;t really go back to the old way.</aside> Raytracing accelerator cores via RTX so that light can be rendered more realistically </li><li>AI/ML cores to allow for dynamic upscaling to eke out more performance from the card </li><li>Display output hardware to allow for multiple monitors to be connected to the card </li><li>Faster and faster memory buses and interfaces to the rest of the system to allow for more data to be processed faster </li><li>Direct streaming from the drive to GPU memory to allow for faster loading times </li></ul> <p>But, at the same time, that AI/ML hardware started to get noticed by more and more people. It was discovered that the shader cores and then the CUDA cores could be used to do AI/ML workloads at ludicrous speeds. This enabled research and development of models like GPT-2, Stable Diffusion, DLSS, and so on. This has led to a Cambrian Explosion of AI/ML research and development that is continuing to this day.</p> <h2 id='the-quot-gpus-quot-that-fly-io-is-using' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#the-quot-gpus-quot-that-fly-io-is-using' aria-label='Anchor'></a><span class='plain-code'>The &ldquo;GPUs&rdquo; that Fly.io is using</span></h2> <p>I&rsquo;ve mostly been describing consumer GPUs and their capabilities up to this point because that&rsquo;s what we all have the biggest understanding of. 
There is a huge difference between the &ldquo;GPUs&rdquo; that you can get for server tasks and normal consumer tasks from a place like Newegg or Best Buy. The main difference is that enterprise-grade Graphics Processing Units do not have any of the hardware needed to process graphics.</p> <div class="right-sidenote"><p>Author’s note: This will not be the case in the future. Fly.io is going to add <a href="https://www.nvidia.com/en-us/data-center/l40s/" title="">Lovelace L40S GPUs</a> that do have 3D rendering, video encoding, shader cores, and so on. But, that’s not what we’re talking about today.</p> </div> <p>Yes. Really. They don&rsquo;t have rasterization hardware, shader cores, display outputs, or anything useful for trying to run games on them. They are AI/ML accelerator cards more than anything. It&rsquo;s kinda beautifully ironic that they&rsquo;re called Graphics Processing Units when they have no ability to process graphics.</p> <h2 id='what-can-you-do-with-them' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#what-can-you-do-with-them' aria-label='Anchor'></a><span class='plain-code'>What can you do with them?</span></h2> <p>These GPUs are really good at massively parallel tasks. This naturally translates to being very good at AI/ML tasks such as:</p> <ul> <li>Summarization (what is this article about in a few sentences?) </li><li>Translation (what does this article say in Spanish?) </li><li>Speech recognition (what is a voice clip saying?) </li><li>Speech synthesis (what does this text sound like?) </li><li>Text generation (what would a cat say if it could talk?) </li><li>Basic rote question and answering (what is the safe cooking temperature for chicken breasts in celsius?) </li><li>Text classification (is this article about cats or dogs?) 
</li><li>Sentiment analysis (is this article positive or negative, what could that mean about the companies involved?) </li><li>Image classification (is this a cat or a dog?) </li><li>Object detection (where are the cats and dogs in this image?) </li></ul> <p>Or any combination/chain of these tasks. A lot of these are pretty abstract building blocks that can be combined in a lot of different ways. This is why AI/ML stuff is so exciting right now. We&rsquo;re in the early days of understanding what these things are, what they can do, and how to use them properly.</p> <p>Imagine being able to load articles about the topic you are researching into your queries to find where someone said something roughly similar to what you&rsquo;re looking for. Queries like &ldquo;that one recipe with eggs that you fold over with ham in it&rdquo;. That&rsquo;s the kind of thing that&rsquo;s possible with AI/ML (and tools like vector databases) but difficult to impossible with traditional search engines.</p> <h2 id='how-to-use-ai-for-reals' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#how-to-use-ai-for-reals' aria-label='Anchor'></a><span class='plain-code'>How to use AI for reals</span></h2> <p>Fortunately and unfortunately, we&rsquo;re in the Cambrian Explosion days of this industry. Key advances happen constantly. Exact models and tooling change almost as often. This is both a very good thing and a very bad thing.</p> <p>If you want to get started today, here are a few models that you can play with right now:</p> <ul> <li><a href='https://ai.meta.com/llama/' title=''>Llama 2</a> - A generic foundation model with instruction and chat tuned variants. It&rsquo;s a good starting point for a lot of research and nearly everything else uses the same formats that Llama 2 does. 
</li><li><a href='https://openai.com/research/whisper' title=''>Whisper</a> - A speech-to-text model that transcribes audio files into text better than most professional dictation software. I, the author of this article, wrote most of this article using Whisper. </li><li><a href='https://huggingface.co/NurtureAI/OpenHermes-2.5-Mistral-7B-16k' title=''>OpenHermes-2.5 Mistral 7B 16k</a> - An instruction-tuned model that can operate on up to 16 thousand tokens (about 40 printed pages of text, 12,000 words) at once. It&rsquo;s a good starting point for summarization and other tasks that require a lot of context. I use it for my personal AI chatbot named <a href='https://xeiaso.net/characters/#Mimi' title=''>Mimi</a>. </li><li><aside class="right-sidenote">Seriously Annie, you&rsquo;re great!</aside> <a href='https://stability.ai/stable-diffusion' title=''>Stable Diffusion XL</a> - A text-to-image model that lets you create high quality images from simple text descriptions. It&rsquo;s a good starting point for tasks that require image generation, such as when you want to add images to your blog posts but don&rsquo;t have an artist like Annie to draw you what you want. </li></ul> <p>For a practical example, imagine that you have a set of <a href='https://xeiaso.net/talks/' title=''>conference talks that you&rsquo;ve given over the years</a>. You want to take those talk videos, extract the audio, and transform them into written text because some people learn better from text than video. 
The overall workflow would look something like this:</p> <ul> <li>Use ffmpeg to extract the audio track from the video files </li><li>Use Whisper to <a href='https://fly.io/blog/transcribing-on-fly-gpu-machines/' title=''>convert the audio files into subtitle files</a> </li><li>Break the subtitle file into sequences based on significant pauses between topics (humans do this subconsciously, take advantage of it and you can make things seem heckin&rsquo; magic) </li><li>Use a large language model to summarize the segments and create a title for each segment </li><li>Paste the rest of the text into a markdown document between the segment titles </li><li>Manually review the documents and make any necessary changes with technical terms that the model didn&rsquo;t know about or things the model got wrong because English is a minefield of homophones that even trained experts have trouble with (ask me how I know) </li><li>Publish the documents on your blog </li></ul> <p>Then bam, you don&rsquo;t just have a portfolio piece, you have the recipe for winning downtime from visitors of orange websites clicking on your link so much. You can also use this to create transcripts for your videos so that people who can&rsquo;t hear can still enjoy your content.</p> <p>The true advantage of these is not using them as individual parts on their own, but as a cohesive whole in a chain. This is where the real power of AI/ML comes from. It&rsquo;s not the individual models, but the ability to chain them together to do something useful. 
This is where the true opportunities for innovation lie.</p> <h2 id='conclusion' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#conclusion' aria-label='Anchor'></a><span class='plain-code'>Conclusion</span></h2> <p>So that&rsquo;s what these &ldquo;GPUs&rdquo; are really: they&rsquo;re AI/ML accelerator cards. The A100 cards are incapable of processing graphics or encoding video, but they&rsquo;re really, really good at AI/ML workloads. They allow you to do way more tasks per watt than any CPU ever could.</p> <p>I hope you enjoyed this tale of woe as I spelled out the horrible truths about marketing being awful forever and gave you ideas for how to <em>actually use</em> these graphics-free Graphics Processing Units to do useful things. But sadly, not for processing graphics unless you wait for the <a href='https://www.nvidia.com/en-us/data-center/l40s/' title=''>Lovelace L40S</a> cards early in 2024.</p> <p>Sign up for Fly.io today and try our GPUs! I can&rsquo;t wait to see what you build with them.</p></content> </entry> <entry> <title>Scaling Large Language Models to zero with Ollama</title> <link rel="alternate" href="https://fly.io/blog/scaling-llm-ollama/"/> <id>https://fly.io/blog/scaling-llm-ollama/</id> <published>2023-12-06T12:00:00+00:00</published> <updated>2024-04-12T18:23:39+00:00</updated> <media:thumbnail url="https://fly.io/blog/scaling-llm-ollama/assets/thumb.webp"/> <content type="html"><div class="lead"><p>We’re Fly.io. We have powerful servers worldwide to run your code close to your users. Including GPUs so you can self-host your own AI.</p> </div> <p>Open-source self-hosted AI tools have advanced a lot in the past 6 months. 
They give you new methods of expression (with QR code generation and Stable Diffusion), easy access to summarization powers that would have made Google blush a decade ago (even with untuned foundation models such as Llama 2 and Yi), conversational assistants that enable people to do more with their time, and speech recognition in <em>real time</em> on moderate hardware (with Whisper et al). With all these capabilities comes the need for more and more raw computational muscle to be able to do inference on bigger and bigger models, and eventually do things that we can&rsquo;t even imagine right now. Fly.io lets you put your compute where your users are so that you can do machine learning inference tasks on the edge with the power of enterprise-grade GPUs such as the Nvidia A100. You can also scale your GPU nodes to zero running Machines, so you only pay for what you actually need, when you need it.</p> <div class="right-sidenote"><p>It’s worth mentioning that “scaling to zero” doesn’t mean what you may think it means. When you “scale to zero” in Fly.io, you actually stop the running Machine. This means the Machine is still lying around on the same computer box that it runs on, but it’s just put to sleep. If there is a capacity issue then your app may be unable to wake back up. 
We are working on a solution to this, but for now you should be aware that scaling to zero is not the same as spinning down your Machine and spinning it back up again on a new computer box when you need it.</p> </div><div class="callout"><p>This is a continuation of the last post in this series about <a href="https://fly.io/blog/transcribing-on-fly-gpu-machines/" title="">how to use GPUs on Fly.io</a>.</p> </div><h2 id='why-scale-to-zero' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#why-scale-to-zero' aria-label='Anchor'></a><span class='plain-code'>Why scale to zero?</span></h2> <p>Running GPU nodes on top of Fly is expensive. Sure, GPUs enable you to do things a lot faster than CPUs ever could on their own, but they will mostly sit idle between uses. This is where scaling to zero comes in. With scaling to zero, you can have your GPU nodes shut down when you&rsquo;re not using them. When your Machine stops, you aren&rsquo;t paying for the GPU any more. This is good for the environment and your wallet.</p> <p>In this post, we&rsquo;re going to be using <a href='https://ollama.ai' title=''>Ollama</a> to generate text. Ollama is a fancy wrapper around <a href='https://github.com/ggerganov/llama.cpp' title=''>llama.cpp</a> that allows you to run large language models on your own hardware with your choice of model. It also supports GPU acceleration, meaning that you can use Fly.io&rsquo;s huge GPUs to run your models faster than your RTX 3060 at home ever would on its own.</p> <p>One of the main downsides of using Ollama in a cloud environment is that it doesn&rsquo;t have authentication by default. Thanks to the power of about 70 lines of Go, we are able to shim that in after the fact. 
This keeps random people on the internet from using your GPU time (and spending your money), while still letting you generate text and integrate it into your own applications.</p> <p>Create a new folder called <code>ollama-scale-to-0</code>:</p> <div class="highlight-wrapper group relative bash"> <button type="button" class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-wrap-target="#code-hmfd22hk" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"><g buffered-rendering="static"><path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /><path d="M11.081 6.466L9.533 8.037l1.548 1.571" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"> Wrap text </span> </button> <button type="button" class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-copy-target="sibling" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"><g buffered-rendering="static"><path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /><path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /></g></svg> <span 
class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"> Copy to clipboard </span> </button> <div class='highlight relative group'> <pre class='highlight '><code id="code-hmfd22hk"><span class="nb">mkdir </span>ollama-scale-to-0 </code></pre> </div> </div><h2 id='fly-app-setup' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#fly-app-setup' aria-label='Anchor'></a><span class='plain-code'>Fly app setup</span></h2> <p>First, we need to create a new Fly app:</p> <div class="highlight-wrapper group relative bash"> <button type="button" class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-wrap-target="#code-tzghjjx5" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"><g buffered-rendering="static"><path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /><path d="M11.081 6.466L9.533 8.037l1.548 1.571" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"> Wrap text </span> </button> <button type="button" class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-copy-target="sibling" > <svg 
class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"><g buffered-rendering="static"><path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /><path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"> Copy to clipboard </span> </button> <div class='highlight relative group'> <pre class='highlight '><code id="code-tzghjjx5">fly launch <span class="nt">--no-deploy</span> </code></pre> </div> </div> <p>After selecting a name and an organization to run it in, this command will create the app and write out a <code>fly.toml</code> file for you:</p> <div class="highlight-wrapper group relative toml"> <button type="button" class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-wrap-target="#code-bfrjoo6m" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"><g buffered-rendering="static"><path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /><path d="M11.081 6.466L9.533 8.037l1.548 1.571" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"> Wrap text </span> </button> <button type="button" class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent 
group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-copy-target="sibling" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"><g buffered-rendering="static"><path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /><path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"> Copy to clipboard </span> </button> <div class='highlight relative group'> <pre class='highlight '><code id="code-bfrjoo6m"><span class="c"># fly.toml app configuration file generated for sparkling-violet-709 on 2023-11-14T12:13:53-05:00</span>
<span class="c">#</span>
<span class="c"># See https://fly.io/docs/reference/configuration/ for information about how to use this file.</span>
<span class="c">#</span>

<span class="py">app</span> <span class="p">=</span> <span class="s">"sparkling-violet-709"</span>
<span class="py">primary_region</span> <span class="p">=</span> <span class="s">"ord"</span>

<span class="nn">[http_service]</span>
  <span class="py">internal_port</span> <span class="p">=</span> <span class="mi">11434</span> <span class="c"># change me to 11434!</span>
  <span class="py">force_https</span> <span class="p">=</span> <span class="kc">false</span> <span class="c"># change me to false!</span>
  <span class="py">auto_stop_machines</span> <span class="p">=</span> <span class="kc">true</span>
  <span class="py">auto_start_machines</span> <span class="p">=</span> <span class="kc">true</span>
  <span class="py">min_machines_running</span> <span class="p">=</span> <span class="mi">0</span>
  <span class="py">processes</span> <span class="p">=</span> <span class="nn">["app"]</span>
</code></pre> </div> </div> <p>This is the configuration file that Fly.io uses to know how to run your application. We&rsquo;re going to modify <code>fly.toml</code> to add some more configuration, starting with GPU support:</p> <div class="highlight-wrapper group relative toml"> <button type="button" class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-wrap-target="#code-3lhl3358" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"><g buffered-rendering="static"><path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /><path d="M11.081 6.466L9.533 8.037l1.548 1.571" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"> Wrap text </span> </button> <button type="button" class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-copy-target="sibling" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"><g buffered-rendering="static"><path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /><path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0
0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"> Copy to clipboard </span> </button> <div class='highlight relative group'> <pre class='highlight '><code id="code-3lhl3358"><span class="py">app</span> <span class="p">=</span> <span class="s">"sparkling-violet-709"</span>
<span class="py">primary_region</span> <span class="p">=</span> <span class="s">"ord"</span>
<span class="py">vm.size</span> <span class="p">=</span> <span class="s">"a100-40gb"</span> <span class="c"># the GPU size, see https://fly.io/docs/gpus/gpu-quickstart/ for more info</span>
</code></pre> </div> </div> <p>We don&rsquo;t want to expose the GPU to the internet, so we&rsquo;re going to create a <a href='https://fly.io/docs/reference/private-networking/#flycast-private-load-balancing' title=''>flycast</a> address to expose it to other services on your private network. To create a flycast address, run this command:</p> <div class="highlight-wrapper group relative bash"> <button type="button" class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-wrap-target="#code-bthlbecs" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"><g buffered-rendering="static"><path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /><path d="M11.081 6.466L9.533 8.037l1.548 1.571" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"> Wrap text </span> </button> <button type="button" class="bubble-wrap z-20 absolute
right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-copy-target="sibling" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"><g buffered-rendering="static"><path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /><path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"> Copy to clipboard </span> </button> <div class='highlight relative group'> <pre class='highlight '><code id="code-bthlbecs">fly ips allocate-v6 <span class="nt">--private</span> </code></pre> </div> </div> <p>The <code>fly ips allocate-v6</code> command makes a unique address in your private network that you can use to access Ollama from your other services. Make sure to add the <code>--private</code> flag, otherwise you&rsquo;ll get a globally unique IP address instead of a private one.</p> <p>Next, you may need to remove all of the other public IP addresses for the app to lock it away from the public. Get a list of them with <code>fly ips list</code> and then remove them with <code>fly ips release &lt;ip&gt;</code>. Delete everything but your flycast IP.</p> <p>Next, we need to declare the volume for Ollama to store models in. If you don&rsquo;t do this, then when you scale to zero, your existing models will be destroyed and you will have to re-download them every time the server starts. This is not ideal, so we&rsquo;re going to create a persistent volume to store the models in. 
Add the following to your <code>fly.toml</code>:</p> <div class="highlight-wrapper group relative toml"> <button type="button" class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-wrap-target="#code-i9h5kt6l" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"><g buffered-rendering="static"><path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /><path d="M11.081 6.466L9.533 8.037l1.548 1.571" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"> Wrap text </span> </button> <button type="button" class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-copy-target="sibling" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"><g buffered-rendering="static"><path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /><path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"> Copy to clipboard </span> </button> <div class='highlight relative group'> <pre class='highlight 
'><code id="code-i9h5kt6l"><span class="nn">[build]</span>
  <span class="py">image</span> <span class="p">=</span> <span class="s">"ollama/ollama"</span>

<span class="nn">[mounts]</span>
  <span class="py">source</span> <span class="p">=</span> <span class="s">"models"</span>
  <span class="py">destination</span> <span class="p">=</span> <span class="s">"/root/.ollama"</span>
  <span class="py">initial_size</span> <span class="p">=</span> <span class="s">"100gb"</span>
</code></pre> </div> </div> <p>This will create a 100GB volume in the <a href='https://en.wikipedia.org/wiki/O%27Hare_International_Airport' title=''><code>ord</code></a> region when the app is deployed. This will be used to store the models that you download from the <a href='https://ollama.ai/library/' title=''>Ollama library</a>. You can make this smaller if you want, but 100GB is a good place to start from.</p> <p>Now that everything is set up, we can deploy this to Fly.io:</p> <div class="highlight-wrapper group relative bash"> <button type="button" class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-wrap-target="#code-iogi1ir3" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"><g buffered-rendering="static"><path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /><path d="M11.081 6.466L9.533 8.037l1.548 1.571" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"> Wrap text </span> </button> <button type="button" class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400
group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-copy-target="sibling" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"><g buffered-rendering="static"><path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /><path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"> Copy to clipboard </span> </button> <div class='highlight relative group'> <pre class='highlight '><code id="code-iogi1ir3">fly deploy </code></pre> </div> </div> <p>This will take a minute to pull the Ollama image, push it to a Machine, provision your volume, and kick everything else off with hypervisors, GPUs and whatnot. 
Once it&rsquo;s done, you should see something like this:</p> <div class="highlight-wrapper group relative bash"> <button type="button" class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-wrap-target="#code-rgjl7r36" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"><g buffered-rendering="static"><path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /><path d="M11.081 6.466L9.533 8.037l1.548 1.571" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"> Wrap text </span> </button> <button type="button" class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-copy-target="sibling" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"><g buffered-rendering="static"><path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /><path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"> Copy to clipboard </span> </button> <div class='highlight relative group'> <pre 
class='highlight '><code id="code-rgjl7r36"> ✔ Machine 17816141f55489 <span class="o">[</span>app] update succeeded <span class="nt">-------</span> Visit your newly deployed app at https://sparkling-violet-709.fly.dev/ </code></pre> </div> </div> <p>This is a lie because we just deleted the public IP addresses for this app. You can&rsquo;t access it from the internet, and by extension, random people can&rsquo;t access it either. For now, you can run an interactive session with Ollama using an ephemeral Fly Machine:</p> <div class="highlight-wrapper group relative bash"> <button type="button" class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-wrap-target="#code-pjpmi8ic" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"><g buffered-rendering="static"><path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /><path d="M11.081 6.466L9.533 8.037l1.548 1.571" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"> Wrap text </span> </button> <button type="button" class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-copy-target="sibling" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"><g buffered-rendering="static"><path d="M10.576 
7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /><path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"> Copy to clipboard </span> </button> <div class='highlight relative group'> <pre class='highlight '><code id="code-pjpmi8ic">fly m run <span class="nt">-e</span> <span class="nv">OLLAMA_HOST</span><span class="o">=</span>http://sparkling-violet-709.flycast <span class="nt">--shell</span> ollama/ollama </code></pre> </div> </div> <p>And then you can pull a model from the <a href='https://ollama.ai/library/' title=''>Ollama library</a> and generate some text:</p> <div class="highlight-wrapper group relative bash"> <button type="button" class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-wrap-target="#code-ytdqtkck" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"><g buffered-rendering="static"><path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /><path d="M11.081 6.466L9.533 8.037l1.548 1.571" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"> Wrap text </span> </button> <button type="button" class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900
group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-copy-target="sibling" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"><g buffered-rendering="static"><path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /><path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"> Copy to clipboard </span> </button> <div class='highlight relative group'> <pre class='highlight '><code id="code-ytdqtkck"><span class="nv">$ </span>ollama run openchat:7b-v3.5-fp16 <span class="o">&gt;&gt;&gt;</span> How <span class="k">do </span>I bake chocolate chip cookies? To bake chocolate chip cookies, follow these steps: 1. Preheat the oven to 375°F <span class="o">(</span>190°C<span class="o">)</span> and line a baking sheet with parchment paper or silicone baking mat. 2. In a large bowl, mix together 1 cup of unsalted butter <span class="o">(</span>softened<span class="o">)</span>, 3/4 cup granulated sugar, and 3/4 cup packed brown sugar <span class="k">until </span>light and fluffy. 3. Add 2 large eggs, one at a <span class="nb">time</span>, to the butter mixture, beating well after each addition. Stir <span class="k">in </span>1 teaspoon of pure vanilla extract. 4. In a separate bowl, whisk together 2 cups all-purpose flour, 1/2 teaspoon baking soda, and 1/2 teaspoon salt. Gradually add the dry ingredients to the wet ingredients, stirring <span class="k">until </span>just combined. 5. Fold <span class="k">in </span>2 cups of chocolate chips <span class="o">(</span>or chunks<span class="o">)</span> into the dough. 6. 
Drop rounded tablespoons of dough onto the prepared baking sheet, spacing them about 2 inches apart. 7. Bake <span class="k">for </span>10-12 minutes, or <span class="k">until </span>the edges are golden brown. The centers should still be slightly soft. 8. Allow the cookies to cool on the baking sheet <span class="k">for </span>a few minutes before transferring them to a wire rack to cool completely. Enjoy your homemade chocolate chip cookies! </code></pre> </div> </div> <p>If you want a persistent wake-on-use connection to your Ollama instance, you can set up a <a href='https://fly.io/docs/reference/private-networking/#discovering-apps-through-dns-on-a-wireguard-connection' title=''>connection to your Fly network using WireGuard</a>. This will let you use Ollama from your local applications without having to run them on Fly. For example, if you want to figure out the safe cooking temperature for ground beef in Celsius, you can query that in JavaScript with this snippet of code:</p> <div class="highlight-wrapper group relative typescript"> <button type="button" class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-wrap-target="#code-rlnqfarq" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"><g buffered-rendering="static"><path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /><path d="M11.081 6.466L9.533 8.037l1.548 1.571" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"> Wrap text </span> </button> <button type="button" class="bubble-wrap z-20 absolute right-1.5 top-1.5 
text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-copy-target="sibling" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"><g buffered-rendering="static"><path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /><path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"> Copy to clipboard </span> </button> <div class='highlight relative group'> <pre class='highlight '><code id="code-rlnqfarq"><span class="kd">const</span> <span class="nx">generateRequest</span> <span class="o">=</span> <span class="p">{</span>
  <span class="na">model</span><span class="p">:</span> <span class="dl">"</span><span class="s2">openchat:7b-v3.5-fp16</span><span class="dl">"</span><span class="p">,</span>
  <span class="na">prompt</span><span class="p">:</span> <span class="dl">"</span><span class="s2">What is the safe cooking temperature for ground beef in celsius?</span><span class="dl">"</span><span class="p">,</span>
  <span class="na">stream</span><span class="p">:</span> <span class="kc">false</span><span class="p">,</span> <span class="c1">// &lt;- important for Node/Deno clients</span>
<span class="p">};</span>

<span class="kd">const</span> <span class="nx">resp</span> <span class="o">=</span> <span class="k">await</span> <span class="nx">fetch</span><span class="p">(</span><span class="dl">"</span><span class="s2">http://sparkling-violet-709.flycast/api/generate</span><span class="dl">"</span><span class="p">,</span> <span class="p">{</span>
  <span class="na">method</span><span class="p">:</span> <span class="dl">"</span><span class="s2">POST</span><span class="dl">"</span><span class="p">,</span>
  <span class="na">body</span><span class="p">:</span> <span class="nx">JSON</span><span class="p">.</span><span class="nx">stringify</span><span class="p">(</span><span class="nx">generateRequest</span><span class="p">),</span>
<span class="p">});</span>

<span class="k">if</span> <span class="p">(</span><span class="nx">resp</span><span class="p">.</span><span class="nx">status</span> <span class="o">!==</span> <span class="mi">200</span><span class="p">)</span> <span class="p">{</span>
  <span class="k">throw</span> <span class="k">new</span> <span class="nb">Error</span><span class="p">(</span><span class="s2">`error fetching response: </span><span class="p">${</span><span class="nx">resp</span><span class="p">.</span><span class="nx">status</span><span class="p">}</span><span class="s2">: </span><span class="p">${</span><span class="k">await</span> <span class="nx">resp</span><span class="p">.</span><span class="nx">text</span><span class="p">()}</span><span class="s2">`</span><span class="p">);</span>
<span class="p">}</span>

<span class="kd">const</span> <span class="nx">data</span> <span class="o">=</span> <span class="k">await</span> <span class="nx">resp</span><span class="p">.</span><span class="nx">json</span><span class="p">();</span>

<span class="nx">console</span><span class="p">.</span><span class="nx">log</span><span class="p">(</span><span class="nx">data</span><span class="p">.</span><span class="nx">response</span><span class="p">);</span> <span class="c1">// Something like "The safe cooking temperature for ground beef is 71 degrees celsius (160 degrees fahrenheit)."</span>
</code></pre> </div> </div><h2 id='scaling-to-zero' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6
after:hash opacity-0 group-hover:opacity-100 transition-all' href='#scaling-to-zero' aria-label='Anchor'></a><span class='plain-code'>Scaling to zero</span></h2> <p>The best part about all of this is that scaling down to zero running Machines takes no work on your part: the Machine shuts down automatically when it&rsquo;s idle. Wait a few minutes and then verify it with <code>fly status</code>:</p> <div class="highlight-wrapper group relative bash"> <button type="button" class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-wrap-target="#code-u3h45u8u" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"><g buffered-rendering="static"><path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /><path d="M11.081 6.466L9.533 8.037l1.548 1.571" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"> Wrap text </span> </button> <button type="button" class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-copy-target="sibling" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"><g buffered-rendering="static"><path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z"
/><path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"> Copy to clipboard </span> </button> <div class='highlight relative group'> <pre class='highlight '><code id="code-u3h45u8u"><span class="nv">$ </span>fly status
...
PROCESS ID              VERSION REGION  STATE   ROLE    CHECKS  LAST UPDATED
app     3d8d7949b22089  9       ord     stopped                 2023-11-14T19:34:24Z
</code></pre> </div> </div> <p>The app has been stopped. This means that it&rsquo;s not running and you&rsquo;re not paying for it. When you want it to start up again, just make a request. It will start up automatically, and you can use it as normal with the CLI or even just arbitrary calls to <a href='https://github.com/jmorganca/ollama/blob/main/docs/api.md' title=''>the API</a>.</p> <p>You can also upload your own models to the Ollama registry by <a href='https://github.com/jmorganca/ollama/blob/main/docs/import.md' title=''>creating your own Modelfile</a> and pushing it (though you will need to install Ollama locally to publish your own models). At this time, the only way to set a custom system prompt is to use a Modelfile and upload your model to the registry.</p> <h2 id='conclusion' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#conclusion' aria-label='Anchor'></a><span class='plain-code'>Conclusion</span></h2> <p>Ollama is a fantastic way to run large language models of your choice, and the ability to use Fly.io&rsquo;s powerful GPUs means you can use bigger models with more parameters and a larger context window.
This lets you make your assistants more lifelike, your conversations have more context, and your text generation more realistic.</p> <p>Oh, by the way, this also lets you use the new <code>json</code> mode to have your models call functions, similar to how ChatGPT would. To do this, have a system prompt that looks like this:</p> <div class="highlight-wrapper group relative "> <button type="button" class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-wrap-target="#code-p3jklt02" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"><g buffered-rendering="static"><path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /><path d="M11.081 6.466L9.533 8.037l1.548 1.571" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"> Wrap text </span> </button> <button type="button" class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-copy-target="sibling" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"><g buffered-rendering="static"><path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /><path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 
1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"> Copy to clipboard </span> </button> <div class='highlight relative group'> <pre class='highlight '><code id="code-p3jklt02">You are a helpful research assistant. The following functions are available for you to fetch further data to answer user questions, if relevant: { "function": "search_bing", "description": "Search the web for content on Bing. This allows users to search online/the internet/the web for content.", "arguments": [ { "name": "query", "type": "string", "description": "The search query string" } ] } { "function": "search_arxiv", "description": "Search for research papers on ArXiv. Make use of AND, OR and NOT operators as appropriate to join terms within the query.", "arguments": [ { "name": "query", "type": "string", "description": "The search query string" } ] } To call a function, respond - immediately and only - with a JSON object of the following format: { "function": "function_name", "arguments": { "argument1": "argument_value", "argument2": "argument_value" } } If no function needs to be called, respond with an empty JSON object: {} </code></pre> </div> </div> <p>Then you can use the <a href='https://github.com/jmorganca/ollama/blob/main/docs/api.md#request-json-mode' title=''>JSON format</a> to receive a JSON response from Ollama (hint: <code>--format=json</code> in the CLI or <code>format: &quot;json&quot;</code> in the API). This is a great way to make your assistants more lifelike and more useful. 
You will need to use something like <a href='https://www.langchain.com/' title=''>LangChain</a> or manual iterations to properly handle the cases where the user doesn&rsquo;t want to call a function, but that&rsquo;s a topic for another blog post.</p> <p>For the best results, you may want to use a model with a larger context window such as <a href='https://ollama.ai/library/vicuna:13b-v1.5-16k-fp16' title=''>vicuna:13b-v1.5-16k-fp16</a> (16k == a 16,384-token window), as JSON is very token-expensive. Future advances in the next few weeks (such as the Yi models gaining ludicrous token windows on the order of 200,000 tokens at the cost of ludicrous amounts of VRAM usage) will make this less of an issue. You can also get away with minifying the JSON in the functions and examples a lot, but you may need to experiment to get the best results.</p> <p>Happy hacking, y&#39;all.</p></content> </entry> <entry> <title>Rethinking Serverless with FLAME</title> <link rel="alternate" href="https://fly.io/blog/rethinking-serverless-with-flame/"/> <id>https://fly.io/blog/rethinking-serverless-with-flame/</id> <published>2023-12-06T00:00:00+00:00</published> <updated>2024-04-12T18:23:39+00:00</updated> <media:thumbnail url="https://fly.io/blog/rethinking-serverless-with-flame/assets/flame-thumb.webp"/> <content type="html"><blockquote>Imagine if you could auto scale simply by wrapping any existing app code in a function and have that block of code run in a temporary copy of your app.</blockquote> <p>The pursuit of elastic, auto-scaling applications has taken us to silly places.</p> <p>Serverless/FaaS had a couple things going for it. Elastic Scale™ is hard. It&rsquo;s even harder when you need to manage those pesky servers. It also promised pay-what-you-use costs to avoid idle usage. Good stuff, right?</p> <p>Well, the charade is over. You offload scaling concerns and the complexities of scaling, just to end up needing <em>more complexity</em>. 
Additional queues, storage, and glue code to communicate back to our app are just the starting point. Dev, test, and CI complexity balloons as fast as your costs. Oh, and you often have to rewrite your app in proprietary JavaScript – even if it&rsquo;s already written in JavaScript!</p> <p>At the same time, the rest of us have elastically scaled by starting more webservers. Or we&rsquo;ve piled on complexity with microservices. This doesn&rsquo;t make sense. Piling on more webservers to transcode more videos or serve up more ML tasks isn&rsquo;t what we want. And granular scale shouldn&rsquo;t require slicing our apps into bespoke operational units with their own APIs and deployments to manage.</p> <p>Enough is enough. There&rsquo;s a better way to elastically scale applications.</p> <h2 id='the-flame-pattern' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#the-flame-pattern' aria-label='Anchor'></a><span class='plain-code'>The FLAME pattern</span></h2> <p>Here&rsquo;s what we really want:</p> <ul> <li>We don&rsquo;t want to manage those pesky servers. 
We already have this for our app deployments via <code>fly deploy</code>, <code>git push heroku</code>, <code>kubectl</code>, etc. </li><li>We want on-demand, <em>granular</em> elastic scale of specific parts of our app code </li><li>We don&rsquo;t want to rewrite our application or write parts of it in proprietary runtimes </li></ul> <p>Imagine if we could auto scale simply by wrapping any existing app code in a function and have that block of code run in a temporary copy of the app.</p> <p>Enter the FLAME pattern.</p> <blockquote>FLAME - Fleeting Lambda Application for Modular Execution</blockquote> <p>With FLAME, you treat your <em>entire application</em> as a lambda, where modular parts can be executed on short-lived infrastructure.</p> <p>No rewrites. No bespoke runtimes. No outrageous layers of complexity. Need to insert the results of an expensive operation into the database? PubSub broadcast the result of some expensive work? No problem! It&rsquo;s your whole app so of course you can do it.</p> <p>The Elixir <a href='https://github.com/phoenixframework/flame' title=''>flame library</a> implements the FLAME pattern. It has a backend adapter for Fly.io, but you can use it on any cloud that gives you an API to spin up an instance with your app code running on it. We&rsquo;ll talk more about backends in a bit, as well as implementing FLAME in other languages.</p> <p>First, let&rsquo;s watch a realtime thumbnail generation example to see FLAME + Elixir in action:</p> <div class="youtube-container" data-exclude-render> <div class="youtube-video"> <iframe width="100%" height="100%" src="https://www.youtube.com/embed/l1xt_rkWdic" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen> </iframe> </div> </div> <p>Now let&rsquo;s walk thru something a little more basic. 
Imagine we have a function to transcode video to thumbnails in our Elixir application after they are uploaded:</p> <div class="highlight-wrapper group relative elixir"> <button type="button" class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-wrap-target="#code-dcj5640t" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"><g buffered-rendering="static"><path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /><path d="M11.081 6.466L9.533 8.037l1.548 1.571" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"> Wrap text </span> </button> <button type="button" class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-copy-target="sibling" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"><g buffered-rendering="static"><path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /><path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"> Copy to clipboard </span> 
</button> <div class='highlight relative group'> <pre class='highlight '><code id="code-dcj5640t"><span class="k">def</span> <span class="n">generate_thumbnails</span><span class="p">(%</span><span class="no">Video</span><span class="p">{}</span> <span class="o">=</span> <span class="n">vid</span><span class="p">,</span> <span class="n">interval</span><span class="p">)</span> <span class="k">do</span> <span class="n">tmp</span> <span class="o">=</span> <span class="no">Path</span><span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="no">System</span><span class="o">.</span><span class="n">tmp_dir!</span><span class="p">(),</span> <span class="no">Ecto</span><span class="o">.</span><span class="no">UUID</span><span class="o">.</span><span class="n">generate</span><span class="p">())</span> <span class="no">File</span><span class="o">.</span><span class="n">mkdir!</span><span class="p">(</span><span class="n">tmp</span><span class="p">)</span> <span class="n">args</span> <span class="o">=</span> <span class="p">[</span><span class="s2">"-i"</span><span class="p">,</span> <span class="n">vid</span><span class="o">.</span><span class="n">url</span><span class="p">,</span> <span class="s2">"-vf"</span><span class="p">,</span> <span class="s2">"fps=1/</span><span class="si">#{</span><span class="n">interval</span><span class="si">}</span><span class="s2">"</span><span class="p">,</span> <span class="s2">"</span><span class="si">#{</span><span class="n">tmp</span><span class="si">}</span><span class="s2">/%02d.png"</span><span class="p">]</span> <span class="no">System</span><span class="o">.</span><span class="n">cmd</span><span class="p">(</span><span class="s2">"ffmpeg"</span><span class="p">,</span> <span class="n">args</span><span class="p">)</span> <span class="n">urls</span> <span class="o">=</span> <span class="no">VidStore</span><span class="o">.</span><span class="n">put_thumbnails</span><span class="p">(</span><span 
class="n">vid</span><span class="p">,</span> <span class="no">Path</span><span class="o">.</span><span class="n">wildcard</span><span class="p">(</span><span class="n">tmp</span> <span class="o">&lt;&gt;</span> <span class="s2">"/*.png"</span><span class="p">))</span> <span class="no">Repo</span><span class="o">.</span><span class="n">insert_all</span><span class="p">(</span><span class="no">Thumb</span><span class="p">,</span> <span class="no">Enum</span><span class="o">.</span><span class="n">map</span><span class="p">(</span><span class="n">urls</span><span class="p">,</span> <span class="o">&amp;</span><span class="p">%{</span><span class="ss">vid_id:</span> <span class="n">vid</span><span class="o">.</span><span class="n">id</span><span class="p">,</span> <span class="ss">url:</span> <span class="nv">&amp;1</span><span class="p">}))</span> <span class="k">end</span> </code></pre> </div> </div> <p>Our <code>generate_thumbnails</code> function accepts a video struct and an interval. We shell out to <code>ffmpeg</code> to take the video URL and generate thumbnails at the given interval. We then upload the temporary thumbnail files to durable storage. Finally, we insert the generated thumbnail URLs into the database.</p> <p>This works great locally, but CPU-bound work like video transcoding can quickly bring our entire service to a halt in production. 
Instead of rewriting large swaths of our app to move this into microservices or some FaaS, we can simply wrap it in a FLAME call:</p> <div class="highlight-wrapper group relative elixir"> <button type="button" class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-wrap-target="#code-gcihj0ww" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"><g buffered-rendering="static"><path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /><path d="M11.081 6.466L9.533 8.037l1.548 1.571" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"> Wrap text </span> </button> <button type="button" class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-copy-target="sibling" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"><g buffered-rendering="static"><path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /><path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"> Copy to 
clipboard </span> </button> <div class='highlight relative group'> <pre class='highlight '><code id="code-gcihj0ww"><span class="k">def</span> <span class="n">generate_thumbnails</span><span class="p">(%</span><span class="no">Video</span><span class="p">{}</span> <span class="o">=</span> <span class="n">vid</span><span class="p">,</span> <span class="n">interval</span><span class="p">)</span> <span class="k">do</span> <span class="no">FLAME</span><span class="o">.</span><span class="n">call</span><span class="p">(</span><span class="no">MyApp</span><span class="o">.</span><span class="no">FFMpegRunner</span><span class="p">,</span> <span class="k">fn</span> <span class="o">-&gt;</span> <span class="n">tmp</span> <span class="o">=</span> <span class="no">Path</span><span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="no">System</span><span class="o">.</span><span class="n">tmp_dir!</span><span class="p">(),</span> <span class="no">Ecto</span><span class="o">.</span><span class="no">UUID</span><span class="o">.</span><span class="n">generate</span><span class="p">())</span> <span class="no">File</span><span class="o">.</span><span class="n">mkdir!</span><span class="p">(</span><span class="n">tmp</span><span class="p">)</span> <span class="n">args</span> <span class="o">=</span> <span class="p">[</span><span class="s2">"-i"</span><span class="p">,</span> <span class="n">vid</span><span class="o">.</span><span class="n">url</span><span class="p">,</span> <span class="s2">"-vf"</span><span class="p">,</span> <span class="s2">"fps=1/</span><span class="si">#{</span><span class="n">interval</span><span class="si">}</span><span class="s2">"</span><span class="p">,</span> <span class="s2">"</span><span class="si">#{</span><span class="n">tmp</span><span class="si">}</span><span class="s2">/%02d.png"</span><span class="p">]</span> <span class="no">System</span><span class="o">.</span><span class="n">cmd</span><span class="p">(</span><span 
class="s2">"ffmpeg"</span><span class="p">,</span> <span class="n">args</span><span class="p">)</span> <span class="n">urls</span> <span class="o">=</span> <span class="no">VidStore</span><span class="o">.</span><span class="n">put_thumbnails</span><span class="p">(</span><span class="n">vid</span><span class="p">,</span> <span class="no">Path</span><span class="o">.</span><span class="n">wildcard</span><span class="p">(</span><span class="n">tmp</span> <span class="o">&lt;&gt;</span> <span class="s2">"/*.png"</span><span class="p">))</span> <span class="no">Repo</span><span class="o">.</span><span class="n">insert_all</span><span class="p">(</span><span class="no">Thumb</span><span class="p">,</span> <span class="no">Enum</span><span class="o">.</span><span class="n">map</span><span class="p">(</span><span class="n">urls</span><span class="p">,</span> <span class="o">&amp;</span><span class="p">%{</span><span class="ss">vid_id:</span> <span class="n">vid</span><span class="o">.</span><span class="n">id</span><span class="p">,</span> <span class="ss">url:</span> <span class="nv">&amp;1</span><span class="p">}))</span> <span class="k">end</span><span class="p">)</span> <span class="k">end</span> </code></pre> </div> </div> <p>That&rsquo;s it! <code>FLAME.call</code> accepts the name of a runner pool, and a function. It then finds or boots a new copy of our entire application and runs the function there. Any variables the function closes over (like our <code>%Video{}</code> struct and <code>interval</code>) are passed along automatically.</p> <p>When the FLAME runner boots up, it connects back to the parent node, receives the function to run, executes it, and returns the result to the caller. 
Based on configuration, the booted runner either waits happily for more work before idling down, or extinguishes itself immediately.</p> <p>Let&rsquo;s visualize the flow:</p> <p><img alt="visualizing the flow" src="/blog/rethinking-serverless-with-flame/assets/visual.webp?centered" /></p> <p>We changed no other code and issued our DB write with <code>Repo.insert_all</code> just like before, because we are running our <em>entire</em> <em>application</em>. Database connection(s) and all. Except this fleeting application only runs that little function after startup and nothing else.</p> <p>In practice, a FLAME implementation will support a pool of runners for hot startup, scale-to-zero, and elastic growth. More on that later.</p> <h2 id='solving-a-problem-vs-removing-the-problem' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#solving-a-problem-vs-removing-the-problem' aria-label='Anchor'></a><span class='plain-code'>Solving a problem vs removing the problem</span></h2><blockquote>FaaS solutions help you solve a problem. FLAME removes the problem.</blockquote> <p>The FaaS labyrinth of complexity defies reason. And it&rsquo;s unavoidable. Let&rsquo;s walk through the thumbnail use-case to see how.</p> <p>We try to start with the simplest building block, like request/response AWS Lambda Function URLs.</p> <p>The complexity hits immediately.</p> <p>We start writing custom encoders/decoders on both sides to handle streaming the thumbnails back to the app over HTTP. Phew, that&rsquo;s done. Wait, is our video transcoding or user uploads going to take longer than 15 minutes? Sorry, hard timeout limit – time to split our videos into chunks to stay within the timeout, which means more lambdas to do that. 
Now we&rsquo;re orchestrating lambda workflows and relying on additional services, such as SQS and S3, to enable this.</p> <p>All the FaaS is doing is adding layers of communication between your code and the parts you want to run elastically. Each layer has its own glue integration price to pay.</p> <p>Ultimately handling this kind of use-case looks something like this:</p> <ul> <li>Trigger the lambda via HTTP endpoint, S3, or API gateway ($) </li><li>Write the bespoke lambda to transcode the video ($) </li><li>Place the thumbnail results into SQS ($) </li><li>Write the SQS consumer in our app (dev $) </li><li>Persist to DB and figure out how to get events back to active subscribers that may well be connected to other instances than the SQS consumer (dev $) </li></ul> <p>This is nuts. We pay the FaaS toll at every step. We shouldn&rsquo;t have to do any of this!</p> <p>FaaS provides a bunch of offerings to build a solution on top of. FLAME removes the problem entirely.</p> <h2 id='flame-backends' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#flame-backends' aria-label='Anchor'></a><span class='plain-code'>FLAME Backends</span></h2><blockquote>On Fly.io infrastructure the <code>FLAME.FlyBackend</code> can boot a copy of your application on a new <a href="https://fly.io/docs/machines/">Machine</a> and have it connect back to the parent for work within ~3s.</blockquote> <p>By default, FLAME ships with a <code>LocalBackend</code> and <code>FlyBackend</code>, but any host that provides an API to provision a server and run your app code can work as a FLAME backend. Erlang and Elixir primitives are doing all the heavy lifting here. 
The entire <code>FLAME.FlyBackend</code> is <a href='https://github.com/phoenixframework/flame/blob/main/lib/flame/fly_backend.ex' title=''>&lt; 200 LOC with docs</a>. The library has a single dependency, <code>req</code>, which is an HTTP client.</p> <p>Because Fly.io runs our applications as a packaged-up Docker image, we simply ask the Fly API to boot a new Machine for us with the same image that our app is currently running. Also, thanks to Fly infrastructure, we can guarantee the FLAME runners are started in the same region as the parent. This optimizes latency and lets you ship data back and forth between parent and runner without having to think about it.</p> <h2 id='look-at-everything-were-not-doing' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#look-at-everything-were-not-doing' aria-label='Anchor'></a><span class='plain-code'>Look at everything we&rsquo;re not doing</span></h2> <p>With FaaS, just imagine how quickly the dev and testing story becomes a fate worse than death.</p> <p>To run the app locally, we either need to add some huge dev dependencies to simulate the entire FaaS pipeline, or worse, connect up our dev and test environments directly to the FaaS provider.</p> <p>With FLAME, your dev and test runners simply run on the local backend.</p> <p>Remember, this is your app. FLAME just controls where modular parts of it run. 
In dev or test, those parts simply run on the existing runtime on your laptop or CI server.</p> <p>Using Elixir, we can even send a file across to the remote FLAME application thanks to the distributed features of the Erlang VM:</p> <div class="highlight-wrapper group relative elixir"> <button type="button" class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-wrap-target="#code-6icc60nu" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"><g buffered-rendering="static"><path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /><path d="M11.081 6.466L9.533 8.037l1.548 1.571" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"> Wrap text </span> </button> <button type="button" class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-copy-target="sibling" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"><g buffered-rendering="static"><path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /><path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /></g></svg> <span 
class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"> Copy to clipboard </span> </button> <div class='highlight relative group'> <pre class='highlight '><code id="code-6icc60nu"><span class="k">def</span> <span class="n">generate_thumbnails</span><span class="p">(%</span><span class="no">Video</span><span class="p">{}</span> <span class="o">=</span> <span class="n">vid</span><span class="p">,</span> <span class="n">interval</span><span class="p">)</span> <span class="k">do</span> <span class="n">parent_stream</span> <span class="o">=</span> <span class="no">File</span><span class="o">.</span><span class="n">stream!</span><span class="p">(</span><span class="n">vid</span><span class="o">.</span><span class="n">filepath</span><span class="p">,</span> <span class="p">[],</span> <span class="mi">2048</span><span class="p">)</span> <span class="no">FLAME</span><span class="o">.</span><span class="n">call</span><span class="p">(</span><span class="no">MyApp</span><span class="o">.</span><span class="no">FFMpegRunner</span><span class="p">,</span> <span class="k">fn</span> <span class="o">-&gt;</span> <span class="n">tmp_file</span> <span class="o">=</span> <span class="no">Path</span><span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="no">System</span><span class="o">.</span><span class="n">tmp_dir!</span><span class="p">(),</span> <span class="no">Ecto</span><span class="o">.</span><span class="no">UUID</span><span class="o">.</span><span class="n">generate</span><span class="p">())</span> <span class="n">flame_stream</span> <span class="o">=</span> <span class="no">File</span><span class="o">.</span><span class="n">stream!</span><span class="p">(</span><span class="n">tmp_file</span><span class="p">)</span> <span class="no">Enum</span><span class="o">.</span><span class="n">into</span><span class="p">(</span><span class="n">parent_stream</span><span class="p">,</span> <span 
class="n">flame_stream</span><span class="p">)</span> <span class="n">tmp</span> <span class="o">=</span> <span class="no">Path</span><span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="no">System</span><span class="o">.</span><span class="n">tmp_dir!</span><span class="p">(),</span> <span class="no">Ecto</span><span class="o">.</span><span class="no">UUID</span><span class="o">.</span><span class="n">generate</span><span class="p">())</span> <span class="no">File</span><span class="o">.</span><span class="n">mkdir!</span><span class="p">(</span><span class="n">tmp</span><span class="p">)</span> <span class="n">args</span> <span class="o">=</span> <span class="p">[</span><span class="s2">"-i"</span><span class="p">,</span> <span class="n">tmp_file</span><span class="p">,</span> <span class="s2">"-vf"</span><span class="p">,</span> <span class="s2">"fps=1/</span><span class="si">#{</span><span class="n">interval</span><span class="si">}</span><span class="s2">"</span><span class="p">,</span> <span class="s2">"</span><span class="si">#{</span><span class="n">tmp</span><span class="si">}</span><span class="s2">/%02d.png"</span><span class="p">]</span> <span class="no">System</span><span class="o">.</span><span class="n">cmd</span><span class="p">(</span><span class="s2">"ffmpeg"</span><span class="p">,</span> <span class="n">args</span><span class="p">)</span> <span class="n">urls</span> <span class="o">=</span> <span class="no">VidStore</span><span class="o">.</span><span class="n">put_thumbnails</span><span class="p">(</span><span class="n">vid</span><span class="p">,</span> <span class="no">Path</span><span class="o">.</span><span class="n">wildcard</span><span class="p">(</span><span class="n">tmp</span> <span class="o">&lt;&gt;</span> <span class="s2">"/*.png"</span><span class="p">))</span> <span class="no">Repo</span><span class="o">.</span><span class="n">insert_all</span><span class="p">(</span><span 
class="no">Thumb</span><span class="p">,</span> <span class="no">Enum</span><span class="o">.</span><span class="n">map</span><span class="p">(</span><span class="n">urls</span><span class="p">,</span> <span class="o">&amp;</span><span class="p">%{</span><span class="ss">vid_id:</span> <span class="n">vid</span><span class="o">.</span><span class="n">id</span><span class="p">,</span> <span class="ss">url:</span> <span class="nv">&amp;1</span><span class="p">}))</span> <span class="k">end</span><span class="p">)</span> <span class="k">end</span> </code></pre> </div> </div> <p>On line 2 we open a file on the parent node to the video path. Then in the FLAME child, we stream the file from the parent node to the FLAME server in only a couple lines of code. That&rsquo;s it! No setup of S3 or HTTP interfaces required.</p> <p>With FLAME it&rsquo;s easy to miss everything we&rsquo;re not doing:</p> <ul> <li>We don&rsquo;t need to write code outside of our application. We can reuse business logic, database setup, PubSub, and all the features of our respective platforms </li><li>We don&rsquo;t need to manage deploys of separate services or endpoints </li><li>We don&rsquo;t need to write results to S3 or SQS just to pick up values back in our app </li><li>We skip the dev, test, and CI dependency dance </li></ul> <h2 id='flame-outside-elixir' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#flame-outside-elixir' aria-label='Anchor'></a><span class='plain-code'>FLAME outside Elixir</span></h2> <p>Elixir is fantastically well suited for the FLAME model because we get so much <a href='https://fly.io/phoenix-files/elixir-and-phoenix-can-do-it-all/' title=''>for free</a> like process supervision and distributed messaging. 
That said, any language with reasonable concurrency primitives can take advantage of this pattern. For example, my teammate, Lubien, created a proof of concept example for breaking out functions in your JavaScript application and running them inside a new Fly Machine: <a href='https://github.com/lubien/fly-run-this-function-on-another-machine' title=''>https://github.com/lubien/fly-run-this-function-on-another-machine</a></p> <p>So the general flow for a JavaScript-based FLAME call would be to move the modular executions to a new file, which is executed on a runner pool. Provided the arguments are JSON serializable, the general FLAME flow is similar to what we&rsquo;ve outlined here. Your application, your code, running on fleeting instances.</p> <p>A complete FLAME library will need to handle the following concerns:</p> <ul> <li>Elastic pool scale-up and scale-down logic </li><li>Hot vs cold startup with pools </li><li>Remote runner monitoring to avoid orphaned resources </li><li>How to monitor and keep deployments fresh </li></ul> <p>For the rest of this post we&rsquo;ll see how the Elixir FLAME library handles these concerns as well as features uniquely suited to Elixir applications. But first, you might be wondering about your background job queues.</p> <h2 id='what-about-my-background-job-processor' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#what-about-my-background-job-processor' aria-label='Anchor'></a><span class='plain-code'>What about my background job processor?</span></h2> <p>FLAME works great inside your background job processor, but you may have noticed some overlap. If your job library handles scaling the worker pool, what is FLAME doing for you? 
There are a couple of important distinctions here.</p> <p>First, we reach for these queues when we need <em>durability guarantees</em>. We can often turn knobs to have the queues scale to handle more jobs as load changes. But durable operations are separate from elastic execution. Conflating these concerns can send you down a path similar to lambda complexity. Leaning on your worker queue purely for offloaded execution means writing all the glue code to get the data into and out of the job, and back to the caller or end-user&rsquo;s device somehow.</p> <p>For example, if we want to guarantee we successfully generated thumbnails for a video after the user&rsquo;s upload, then a job queue makes sense as the <em>dispatch, commit, and retry mechanism</em> for this operation. The actual transcoding could be a FLAME call inside the job itself, so we decouple the ideas of durability and scaled execution.</p> <p>On the other side, we have operations we don&rsquo;t need durability for. Take the screencast above where the user hasn&rsquo;t yet saved their video. Or an ML model execution where there&rsquo;s no need to waste resources churning through a prompt if the user has already left the app. In those cases, it doesn&rsquo;t make sense to write to a durable store to pick up a job for work that will go right into the ether.</p> <h2 id='pooling-for-elastic-scale' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#pooling-for-elastic-scale' aria-label='Anchor'></a><span class='plain-code'>Pooling for Elastic Scale</span></h2> <p>With the Elixir implementation of FLAME, you define elastic pools of runners. 
This allows scale-to-zero behavior while also elastically scaling up FLAME servers with max concurrency limits.</p> <p>For example, let&rsquo;s take a look at the <code>start/2</code> callback, which is the entry point of all Elixir applications. We can drop in a <code>FLAME.Pool</code> for video transcoding and say we want it to scale to zero, boot a max of 10 runners, and support 5 concurrent <code>ffmpeg</code> operations per runner:</p> <div class="highlight-wrapper group relative elixir"> <button type="button" class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-wrap-target="#code-glp57duz" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"><g buffered-rendering="static"><path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /><path d="M11.081 6.466L9.533 8.037l1.548 1.571" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"> Wrap text </span> </button> <button type="button" class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-copy-target="sibling" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"><g buffered-rendering="static"><path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 
1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /><path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"> Copy to clipboard </span> </button> <div class='highlight relative group'> <pre class='highlight '><code id="code-glp57duz"><span class="k">def</span> <span class="n">start</span><span class="p">(</span><span class="n">_type</span><span class="p">,</span> <span class="n">_args</span><span class="p">)</span> <span class="k">do</span> <span class="n">flame_parent</span> <span class="o">=</span> <span class="no">FLAME</span><span class="o">.</span><span class="no">Parent</span><span class="o">.</span><span class="n">get</span><span class="p">()</span> <span class="n">children</span> <span class="o">=</span> <span class="p">[</span> <span class="o">...</span><span class="p">,</span> <span class="no">MyApp</span><span class="o">.</span><span class="no">Repo</span><span class="p">,</span> <span class="p">{</span><span class="no">FLAME</span><span class="o">.</span><span class="no">Pool</span><span class="p">,</span> <span class="ss">name:</span> <span class="no">Thumbs</span><span class="o">.</span><span class="no">FFMpegRunner</span><span class="p">,</span> <span class="ss">min:</span> <span class="mi">0</span><span class="p">,</span> <span class="ss">max:</span> <span class="mi">10</span><span class="p">,</span> <span class="ss">max_concurrency:</span> <span class="mi">5</span><span class="p">,</span> <span class="ss">idle_shutdown_after:</span> <span class="mi">30_000</span><span class="p">},</span> <span class="n">!flame_parent</span> <span class="o">&amp;&amp;</span> <span class="no">MyAppWeb</span><span class="o">.</span><span class="no">Endpoint</span> <span class="p">]</span> <span class="o">|&gt;</span> <span class="no">Enum</span><span class="o">.</span><span 
class="n">filter</span><span class="p">(</span><span class="o">&amp;</span> <span class="nv">&amp;1</span><span class="p">)</span> <span class="n">opts</span> <span class="o">=</span> <span class="p">[</span><span class="ss">strategy:</span> <span class="ss">:one_for_one</span><span class="p">,</span> <span class="ss">name:</span> <span class="no">MyApp</span><span class="o">.</span><span class="no">Supervisor</span><span class="p">]</span> <span class="no">Supervisor</span><span class="o">.</span><span class="n">start_link</span><span class="p">(</span><span class="n">children</span><span class="p">,</span> <span class="n">opts</span><span class="p">)</span> <span class="k">end</span> </code></pre> </div> </div> <p>We use the presence of a FLAME parent to conditionally start our Phoenix webserver when booting the app. There&rsquo;s no reason to start a webserver if we aren&rsquo;t serving web traffic. Note we leave other services like the database <code>MyApp.Repo</code> alone because we want to make use of those services inside FLAME runners.</p> <p>Elixir&rsquo;s supervised process approach to applications is uniquely great for turning these kinds of knobs.</p> <p>We also set our pool to idle down after 30 seconds of no caller operations. This keeps our runners hot for a short while before discarding them. 
We could also pass <code>min: 1</code> to ensure at least one <code>ffmpeg</code> runner is always hot and ready for work by the time our application has started.</p> <h2 id='process-placement' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#process-placement' aria-label='Anchor'></a><span class='plain-code'>Process Placement</span></h2> <p>In Elixir, stateful bits of our applications are built around the <em>process</em> primitive – lightweight green threads with message mailboxes. Wrapping our otherwise stateless app code in a synchronous <code>FLAME.call</code> or an async <code>FLAME.cast</code> works great, but what about the stateful parts of our app?</p> <p><code>FLAME.place_child</code> exists to take an existing process specification in your Elixir app and start it on a FLAME runner instead of locally. You can use it anywhere you&rsquo;d use <code>Task.Supervisor.start_child</code>, <code>DynamicSupervisor.start_child</code>, or similar interfaces. Just like <code>FLAME.call</code>, the process runs on an elastic pool, and runners idle down when the process completes its work.</p> <p>And like <code>FLAME.call</code>, it lets us take existing app code, change a single line of code, and continue shipping features.</p> <p>Let&rsquo;s walk through the example from the screencast above. Imagine we want to generate thumbnails for a video <em>as it is being uploaded</em>. Elixir and LiveView make this easy. 
We won&rsquo;t cover all the code here, but you can view the <a href='https://github.com/fly-apps/thumbnail_generator/blob/main/lib/thumbs/thumbnail_generator.ex' title=''>full app implementation</a>.</p> <p>Our first pass would be to write a LiveView upload writer that calls into a <code>ThumbnailGenerator</code>:</p> <div class="highlight-wrapper group relative elixir"> <button type="button" class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-wrap-target="#code-e630ykcb" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"><g buffered-rendering="static"><path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /><path d="M11.081 6.466L9.533 8.037l1.548 1.571" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"> Wrap text </span> </button> <button type="button" class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-copy-target="sibling" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"><g buffered-rendering="static"><path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /><path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 
1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"> Copy to clipboard </span> </button> <div class='highlight relative group'> <pre class='highlight '><code id="code-e630ykcb"><span class="k">defmodule</span> <span class="no">ThumbsWeb</span><span class="o">.</span><span class="no">ThumbnailUploadWriter</span> <span class="k">do</span> <span class="nv">@behaviour</span> <span class="no">Phoenix</span><span class="o">.</span><span class="no">LiveView</span><span class="o">.</span><span class="no">UploadWriter</span> <span class="n">alias</span> <span class="no">Thumbs</span><span class="o">.</span><span class="no">ThumbnailGenerator</span> <span class="k">def</span> <span class="n">init</span><span class="p">(</span><span class="n">opts</span><span class="p">)</span> <span class="k">do</span> <span class="n">generator</span> <span class="o">=</span> <span class="no">ThumbnailGenerator</span><span class="o">.</span><span class="n">open</span><span class="p">(</span><span class="n">opts</span><span class="p">)</span> <span class="p">{</span><span class="ss">:ok</span><span class="p">,</span> <span class="p">%{</span><span class="ss">gen:</span> <span class="n">generator</span><span class="p">}}</span> <span class="k">end</span> <span class="k">def</span> <span class="n">write_chunk</span><span class="p">(</span><span class="n">data</span><span class="p">,</span> <span class="n">state</span><span class="p">)</span> <span class="k">do</span> <span class="no">ThumbnailGenerator</span><span class="o">.</span><span class="n">stream_chunk!</span><span class="p">(</span><span class="n">state</span><span class="o">.</span><span class="n">gen</span><span class="p">,</span> <span class="n">data</span><span class="p">)</span> <span class="p">{</span><span class="ss">:ok</span><span class="p">,</span> <span class="n">state</span><span class="p">}</span> 
<span class="k">end</span> <span class="k">def</span> <span class="n">meta</span><span class="p">(</span><span class="n">state</span><span class="p">),</span> <span class="k">do</span><span class="p">:</span> <span class="p">%{</span><span class="ss">gen:</span> <span class="n">state</span><span class="o">.</span><span class="n">gen</span><span class="p">}</span> <span class="k">def</span> <span class="n">close</span><span class="p">(</span><span class="n">state</span><span class="p">,</span> <span class="n">_reason</span><span class="p">)</span> <span class="k">do</span> <span class="no">ThumbnailGenerator</span><span class="o">.</span><span class="n">close</span><span class="p">(</span><span class="n">state</span><span class="o">.</span><span class="n">gen</span><span class="p">)</span> <span class="p">{</span><span class="ss">:ok</span><span class="p">,</span> <span class="n">state</span><span class="p">}</span> <span class="k">end</span> <span class="k">end</span> </code></pre> </div> </div> <p>An upload writer is a behavior that simply ferries the uploaded chunks from the client into whatever we&rsquo;d like to do with them. Here we have a <code>ThumbnailGenerator.open/1</code> which starts a process that communicates with an <code>ffmpeg</code> shell. 
Inside <code>ThumbnailGenerator.open/1</code>, we use regular Elixir process primitives:</p> <div class="highlight-wrapper group relative elixir"> <button type="button" class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-wrap-target="#code-ziskaky4" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"><g buffered-rendering="static"><path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /><path d="M11.081 6.466L9.533 8.037l1.548 1.571" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"> Wrap text </span> </button> <button type="button" class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-copy-target="sibling" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"><g buffered-rendering="static"><path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /><path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"> Copy to clipboard </span> </button> <div 
class='highlight relative group'> <pre class='highlight '><code id="code-ziskaky4"> <span class="c1"># thumbnail_generator.ex</span> <span class="k">def</span> <span class="n">open</span><span class="p">(</span><span class="n">opts</span> <span class="p">\\</span> <span class="p">[])</span> <span class="k">do</span> <span class="no">Keyword</span><span class="o">.</span><span class="n">validate!</span><span class="p">(</span><span class="n">opts</span><span class="p">,</span> <span class="p">[</span><span class="ss">:timeout</span><span class="p">,</span> <span class="ss">:caller</span><span class="p">,</span> <span class="ss">:fps</span><span class="p">])</span> <span class="n">timeout</span> <span class="o">=</span> <span class="no">Keyword</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="n">opts</span><span class="p">,</span> <span class="ss">:timeout</span><span class="p">,</span> <span class="mi">5_000</span><span class="p">)</span> <span class="n">caller</span> <span class="o">=</span> <span class="no">Keyword</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="n">opts</span><span class="p">,</span> <span class="ss">:caller</span><span class="p">,</span> <span class="n">self</span><span class="p">())</span> <span class="n">ref</span> <span class="o">=</span> <span class="n">make_ref</span><span class="p">()</span> <span class="n">parent</span> <span class="o">=</span> <span class="n">self</span><span class="p">()</span> <span class="n">spec</span> <span class="o">=</span> <span class="p">{</span><span class="bp">__MODULE__</span><span class="p">,</span> <span class="p">{</span><span class="n">caller</span><span class="p">,</span> <span class="n">ref</span><span class="p">,</span> <span class="n">parent</span><span class="p">,</span> <span class="n">opts</span><span class="p">}}</span> <span class="p">{</span><span class="ss">:ok</span><span class="p">,</span> <span 
class="n">pid</span><span class="p">}</span> <span class="o">=</span> <span class="no">DynamicSupervisor</span><span class="o">.</span><span class="n">start_child</span><span class="p">(</span><span class="nv">@sup</span><span class="p">,</span> <span class="n">spec</span><span class="p">)</span> <span class="k">receive</span> <span class="k">do</span> <span class="p">{</span><span class="o">^</span><span class="n">ref</span><span class="p">,</span> <span class="p">%</span><span class="no">ThumbnailGenerator</span><span class="p">{}</span> <span class="o">=</span> <span class="n">gen</span><span class="p">}</span> <span class="o">-&gt;</span> <span class="p">%</span><span class="no">ThumbnailGenerator</span><span class="p">{</span><span class="n">gen</span> <span class="o">|</span> <span class="ss">pid:</span> <span class="n">pid</span><span class="p">}</span> <span class="k">after</span> <span class="n">timeout</span> <span class="o">-&gt;</span> <span class="k">exit</span><span class="p">(</span><span class="ss">:timeout</span><span class="p">)</span> <span class="k">end</span> <span class="k">end</span> </code></pre> </div> </div> <p>The details aren&rsquo;t super important here, except line 10 where we call <code>{:ok, pid} = DynamicSupervisor.start_child(@sup, spec)</code>, which starts a supervised <code>ThumbnailGenerator</code> process. The rest of the implementation simply ferries chunks as stdin into <code>ffmpeg</code> and parses PNGs from stdout. 
Once a PNG delimiter is found in stdout, we send the <code>caller</code> process (our LiveView process) a message saying &ldquo;hey, here&rsquo;s an image&rdquo;:</p> <div class="highlight-wrapper group relative elixir"> <button type="button" class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-wrap-target="#code-y166mubi" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"><g buffered-rendering="static"><path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /><path d="M11.081 6.466L9.533 8.037l1.548 1.571" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"> Wrap text </span> </button> <button type="button" class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-copy-target="sibling" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"><g buffered-rendering="static"><path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /><path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-6px] tail 
[--tail-x:calc(100%-30px)] text-navy-950"> Copy to clipboard </span> </button> <div class='highlight relative group'> <pre class='highlight '><code id="code-y166mubi"><span class="c1"># thumbnail_generator.ex</span> <span class="nv">@png_begin</span> <span class="o">&lt;&lt;</span><span class="mi">137</span><span class="p">,</span> <span class="mi">80</span><span class="p">,</span> <span class="mi">78</span><span class="p">,</span> <span class="mi">71</span><span class="p">,</span> <span class="mi">13</span><span class="p">,</span> <span class="mi">10</span><span class="p">,</span> <span class="mi">26</span><span class="p">,</span> <span class="mi">10</span><span class="o">&gt;&gt;</span> <span class="k">defp</span> <span class="n">handle_stdout</span><span class="p">(</span><span class="n">state</span><span class="p">,</span> <span class="n">ref</span><span class="p">,</span> <span class="n">bin</span><span class="p">)</span> <span class="k">do</span> <span class="p">%</span><span class="no">ThumbnailGenerator</span><span class="p">{</span><span class="ss">ref:</span> <span class="o">^</span><span class="n">ref</span><span class="p">,</span> <span class="ss">caller:</span> <span class="n">caller</span><span class="p">}</span> <span class="o">=</span> <span class="n">state</span><span class="o">.</span><span class="n">gen</span> <span class="k">case</span> <span class="n">bin</span> <span class="k">do</span> <span class="o">&lt;&lt;</span><span class="nv">@png_begin</span><span class="p">,</span> <span class="n">_rest</span><span class="p">::</span><span class="n">binary</span><span class="o">&gt;&gt;</span> <span class="o">-&gt;</span> <span class="k">if</span> <span class="n">state</span><span class="o">.</span><span class="n">current</span> <span class="k">do</span> <span class="n">send</span><span class="p">(</span><span class="n">caller</span><span class="p">,</span> <span class="p">{</span><span class="n">ref</span><span class="p">,</span> <span 
class="ss">:image</span><span class="p">,</span> <span class="n">state</span><span class="o">.</span><span class="n">count</span><span class="p">,</span> <span class="n">encode</span><span class="p">(</span><span class="n">state</span><span class="p">)})</span> <span class="k">end</span> <span class="p">%{</span><span class="n">state</span> <span class="o">|</span> <span class="ss">count:</span> <span class="n">state</span><span class="o">.</span><span class="n">count</span> <span class="o">+</span> <span class="mi">1</span><span class="p">,</span> <span class="ss">current:</span> <span class="p">[</span><span class="n">bin</span><span class="p">]}</span> <span class="n">_</span> <span class="o">-&gt;</span> <span class="p">%{</span><span class="n">state</span> <span class="o">|</span> <span class="ss">current:</span> <span class="p">[</span><span class="n">bin</span> <span class="o">|</span> <span class="n">state</span><span class="o">.</span><span class="n">current</span><span class="p">]}</span> <span class="k">end</span> <span class="k">end</span> </code></pre> </div> </div> <p>The <code>caller</code> LiveView process then picks up the message in a <code>handle_info</code> callback and updates the UI:</p> <div class="highlight-wrapper group relative elixir"> <button type="button" class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-wrap-target="#code-3gf1jq5" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"><g buffered-rendering="static"><path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /><path 
d="M11.081 6.466L9.533 8.037l1.548 1.571" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"> Wrap text </span> </button> <button type="button" class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-copy-target="sibling" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"><g buffered-rendering="static"><path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /><path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"> Copy to clipboard </span> </button> <div class='highlight relative group'> <pre class='highlight '><code id="code-3gf1jq5"><span class="c1"># thumb_live.ex</span> <span class="k">def</span> <span class="n">handle_info</span><span class="p">({</span><span class="n">_ref</span><span class="p">,</span> <span class="ss">:image</span><span class="p">,</span> <span class="n">_count</span><span class="p">,</span> <span class="n">encoded</span><span class="p">},</span> <span class="n">socket</span><span class="p">)</span> <span class="k">do</span> <span class="p">%{</span><span class="ss">count:</span> <span class="n">count</span><span class="p">}</span> <span class="o">=</span> <span class="n">socket</span><span class="o">.</span><span class="n">assigns</span> <span class="p">{</span><span class="ss">:noreply</span><span class="p">,</span> <span class="n">socket</span> <span class="o">|&gt;</span> <span 
class="n">assign</span><span class="p">(</span><span class="ss">count:</span> <span class="n">count</span> <span class="o">+</span> <span class="mi">1</span><span class="p">,</span> <span class="ss">message:</span> <span class="s2">"Generating (</span><span class="si">#{</span><span class="n">count</span> <span class="o">+</span> <span class="mi">1</span><span class="si">}</span><span class="s2">)"</span><span class="p">)</span> <span class="o">|&gt;</span> <span class="n">stream_insert</span><span class="p">(</span><span class="ss">:thumbs</span><span class="p">,</span> <span class="p">%{</span><span class="ss">id:</span> <span class="n">count</span><span class="p">,</span> <span class="ss">encoded:</span> <span class="n">encoded</span><span class="p">})}</span> <span class="k">end</span> </code></pre> </div> </div> <p>The <code>send(caller, {ref, :image, state.count, encode(state)})</code> call is one of the magic parts of Elixir. Everything is a process, and we can message those processes, regardless of their location in the cluster.</p> <p>It&rsquo;s as if every instantiation of an object in your favorite OO language included a cluster-global unique identifier for calling methods on that object. 
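</p> <p>To make that concrete, here&rsquo;s a minimal, standalone sketch of the same tagged-message pattern. It isn&rsquo;t code from the thumbnail app: <code>Demo</code>, <code>fake_generator/3</code>, and <code>collect/2</code> are names made up for illustration, and the worker is spawned locally rather than on a FLAME runner. The <code>send/2</code> and <code>receive</code> calls would read the same if the worker&rsquo;s pid lived on another node in the cluster:</p> <div class='highlight relative group'> <pre class='highlight '><code># Hypothetical names throughout; this only illustrates tagged message passing.
defmodule Demo do
  # Spawns a worker process that sends `count` tagged messages back to `caller`,
  # followed by a final :done message.
  def fake_generator(caller, ref, count) do
    spawn(fn -&gt;
      for n &lt;- 1..count do
        send(caller, {ref, :image, n, "png-#{n}"})
      end

      send(caller, {ref, :done})
    end)
  end

  # Collects messages tagged with `ref` until :done arrives,
  # exiting if nothing shows up within one second.
  def collect(ref, acc \\ []) do
    receive do
      {^ref, :image, n, data} -&gt; collect(ref, [{n, data} | acc])
      {^ref, :done} -&gt; Enum.reverse(acc)
    after
      1_000 -&gt; exit(:timeout)
    end
  end
end

ref = make_ref()
_pid = Demo.fake_generator(self(), ref, 3)
images = Demo.collect(ref)
IO.inspect(images)
</code></pre> </div> <p>Because the caller matches on <code>^ref</code>, unrelated messages sitting in its mailbox are left alone, which is the same trick the <code>receive</code> in <code>open/1</code> relies on.</p> <p>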
The LiveView (a process) simply receives the image message and updates the UI with new images.</p> <p>Now let&rsquo;s head back over to our <code>ThumbnailGenerator.open/1</code> function and make this elastically scalable.</p> <div class="highlight-wrapper group relative diff"> <div class='highlight relative group'> <pre class='highlight '><code id="code-5jadq56a"><span class="gd">- {:ok, pid} = DynamicSupervisor.start_child(@sup, spec) </span><span class="gi">+ {:ok, pid} = FLAME.place_child(Thumbs.FFMpegRunner, spec) </span></code></pre> </div> </div> <p>That&rsquo;s it! Because everything is a process and processes can live anywhere, it doesn&rsquo;t matter what server our <code>ThumbnailGenerator</code> process lives on. It simply messages the caller with <code>send(caller, …)</code> and the messages are sent across the cluster if needed.</p> <p>Once the process exits, either from an explicit close, after the upload is done, or from the end-user closing their browser tab, the FLAME server will note the exit and idle down if no other work is being done.</p> <p>Check out the <a href='https://github.com/fly-apps/thumbnail_generator/blob/main/lib/thumbs/thumbnail_generator.ex' title=''>full implementation</a> if you&rsquo;re interested.</p> <h2 id='remote-monitoring' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#remote-monitoring' aria-label='Anchor'></a><span class='plain-code'>Remote Monitoring</span></h2> <p>All this transient infrastructure needs failsafe mechanisms to avoid orphaning resources. 
If a parent spins up a runner, that runner must take care of idling itself down when no work is present and handle failsafe shutdowns if it can no longer contact the parent node.</p> <p>Likewise, we need to shut down runners when parents are rolled for new deploys, as we must guarantee we&rsquo;re running the same code across the cluster.</p> <p>We also have active callers in many cases that are awaiting the result of work on runners that could go down for any reason.</p> <p>There&rsquo;s a lot to monitor here.</p> <p>There are also a number of failure modes that make this sound like a harrowing experience to implement. Fortunately, Elixir has all the primitives to make this an easy task thanks to the Erlang VM. Namely, we get the following for free:</p> <ul> <li>Process monitoring and supervision – we know when things go bad, whether in a node-local process or in one across the cluster </li><li>Node monitoring – we know when nodes come up, and when nodes go away </li><li>Declarative and controlled app startup and shutdown – we carefully control the startup and shutdown sequence of applications as a matter of course. This allows us to gracefully shut down active runners when a fresh deploy is triggered, while giving them time to finish their work </li></ul> <p>We&rsquo;ll cover the internal implementation details in a future deep-dive post. For now, feel free to poke around <a href='https://github.com/phoenixframework/flame' title=''>the flame source</a>.</p> <h2 id='whats-next' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#whats-next' aria-label='Anchor'></a><span class='plain-code'>What&rsquo;s Next</span></h2> <p>We&rsquo;re just getting started with the Elixir FLAME library, but it&rsquo;s ready to try out now. 
In the future, look for more advanced pool growth techniques, and deep dives into how the Elixir implementation works. You can also find me <a href='https://twitter.com/chris_mccord' title=''>@chris_mccord</a> to chat about implementing the FLAME pattern in your language of choice.</p> <p>Happy coding!</p> <p>–Chris</p></content> </entry> <entry> <title>The risks of building apps on ChatGPT</title> <link rel="alternate" href="https://fly.io/blog/the-risks-of-building-apps-on-chatgpt/"/> <id>https://fly.io/blog/the-risks-of-building-apps-on-chatgpt/</id> <published>2023-12-05T00:00:00+00:00</published> <updated>2024-04-12T18:23:39+00:00</updated> <media:thumbnail url="https://fly.io/blog/the-risks-of-building-apps-on-chatgpt/assets/risks-building-on-chatgpt-thumb.webp"/> <content type="html"><div class="lead"><p>If AI will play an essential role in your application, then consider using a self-hosted, open source model instead of a proprietary and externally hosted one. In this post we explore some of the risks for the latter option. We’re Fly.io. We put your code into lightweight microVMs on our own hardware <a href="https://fly.io/docs/reference/regions/" title="">around the world</a>. <a href="https://fly.io/docs/speedrun/" title="">Check us out</a>: your app can be deployed in minutes.</p> </div> <p>The topic of &ldquo;AI&rdquo; gets a lot of attention and press. Coverage ranges from apocalyptic warnings to Utopian predictions. The truth, as always, is likely somewhere in the middle. As developers, we are the ones who either imagine ways that AI can enhance our products or do the herculean work of implementing it inside our companies.</p> <p>I believe the following statement to be true:</p> <blockquote> <p>AI won’t replace humans, but humans with AI will replace humans without AI.</p> </blockquote> <p>I believe this can be extended to many products and services and the companies that create them. 
Let&rsquo;s express it this way:</p> <blockquote> <p>AI won’t replace businesses, but businesses with AI will replace businesses without AI.</p> </blockquote> <p>Today I&rsquo;m assuming your business would benefit from using AI. Or, at the very least, your C-levels have decreed from on high that thou must integrateth with AI. With that out of the way, the next question is how you&rsquo;re meant to do it. This post is an argument to build on top of open source language models instead of closed models that you rent access to. We&rsquo;ll take a look at what convinced me.</p> <h2 id='but-openai-is-the-market-leader' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#but-openai-is-the-market-leader' aria-label='Anchor'></a><span class='plain-code'>But OpenAI is the market leader…</span></h2> <p>OpenAI, the creator of the famous ChatGPT, is the strong market leader in this category. Why wouldn&rsquo;t you want to use the best in the business?</p> <p>Early on, stories of employees uploading private corporate documents, and of that private information then leaking out to general ChatGPT users, were a real black eye. <a href='https://www.sciencealert.com/many-companies-are-banning-chatgpt-this-is-why' title=''>Companies began banning employees from using ChatGPT for work</a>. 
It exposed that people&rsquo;s interactions with ChatGPT were being used as training data for future versions of the model.</p> <p>In response, OpenAI recently announced an <a href='https://openai.com/enterprise' title=''>Enterprise</a> offering promising that no Enterprise customer data is used for training.</p> <p>With the top objection addressed, it should be smooth sailing for wide adoption, right?</p> <p>Not so fast.</p> <p>While an Enterprise offering may address that concern, there are other subtle reasons to not use OpenAI, or other closed models, that can&rsquo;t be resolved by vague statements of enterprise privacy.</p> <h2 id='what-are-the-risks-for-building-on-top-of-openai' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#what-are-the-risks-for-building-on-top-of-openai' aria-label='Anchor'></a><span class='plain-code'>What are the risks for building on top of OpenAI?</span></h2> <p>Let&rsquo;s briefly outline the risks we take on when relying on a company like OpenAI for critical AI features in our applications.</p> <ul> <li><strong class='font-semibold text-navy-950'>Single provider risk</strong>: Relying deeply on an external service that plays a critical role in our business is risky. The integration is not easily swapped out for another service if needed. Additionally, we don&rsquo;t want part of our &ldquo;secret sauce&rdquo; to actually be another company&rsquo;s product. That&rsquo;s some seriously shaky ground! They <em>want</em> to sell the same thing to our competitors too. </li><li><strong class='font-semibold text-navy-950'>Regulation or Policy change risk</strong>: &ldquo;AI&rdquo; is being talked about a lot in politics. 
What&rsquo;s acceptable today may be deemed &ldquo;not allowed&rdquo; in the future and a corporation providing a newly regulated service must comply. </li><li><strong class='font-semibold text-navy-950'>Financial risk</strong>: <a href='https://www.washingtonpost.com/technology/2023/06/05/chatgpt-hidden-cost-gpu-compute/' title=''>AI chatbots lose money on every chat.</a> If the financial models that make our business profitable are built on impossible-to-maintain prices, then our business model may be at risk when it&rsquo;s time to &ldquo;make the AI engine profitable&rdquo;, as we&rsquo;ve seen happen time and time again in every industry from cookware to video games. What might the true cost be? We don&rsquo;t know. &lsquo;Nuff said. </li><li><strong class='font-semibold text-navy-950'>Governance and leadership risk</strong>: The co-founder and CEO of OpenAI, <a href='https://openai.com/blog/openai-announces-leadership-transition' title=''>Sam Altman, was forced out of his own company by a coup from his board</a>. This was later resolved with both <a href='https://openai.com/blog/openai-announces-leadership-transition' title=''>Sam Altman and Greg Brockman returning</a>. This exposes another risk we don&rsquo;t often consider with our providers. More on this later. </li></ul> <p>Let&rsquo;s look a bit closer at the &ldquo;Single provider risk&rdquo;.</p> <h2 id='single-provider-risk' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#single-provider-risk' aria-label='Anchor'></a><span class='plain-code'>Single provider risk</span></h2> <p>For hobby usage, proof of concept work, and personal experiments, by all means, use ChatGPT! I do, and I expect to continue to. 
It&rsquo;s fantastic for prototyping, it&rsquo;s trivial to set up, and it allows you to throw ink on canvas so much more quickly than any other option out there.</p> <p>Up until recently, I was all gung-ho for ChatGPT being integrated into my apps. What happened? November 2023 happened. It was a very bad month for OpenAI.</p> <p>I created a <a href='https://fly.io/phoenix-files/created-my-personal-ai-fitness-trainer-in-2-days/' title=''>Personal AI Fitness Trainer</a> powered by ChatGPT and on the morning of November 8th, I asked my personal trainer about the workout for the day and it failed. OpenAI was having a bad day with an outage.</p> <p>I don&rsquo;t fault someone for having a bad day. At some point, downtime happens to the best of us. And given enough time, it happens to <strong class='font-semibold text-navy-950'>all</strong> of us. But when possible, I want to prevent someone <em>else&rsquo;s</em> bad day from becoming <em>my</em> bad day too.</p> <h3 id='evaluating-a-critical-dependency' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#evaluating-a-critical-dependency' aria-label='Anchor'></a><span class='plain-code'>Evaluating a critical dependency</span></h3> <p>In my case, my personal fitness trainer being unavailable was a minor inconvenience, but I managed. However, it gave me pause. If I had built an AI fitness trainer as a service, that outage would be a much bigger deal and there would be nothing I could have done to fix it until the ChatGPT API came back up.</p> <p>With services like a Personal AI Fitness Trainer, the AI component is the primary focus and main value proposition of the app. That&rsquo;s pretty darn critical! 
If that AI service is interrupted, significantly altered (say, by the model suddenly refusing my requests for fitness information in ways that worked before) or my desired usage is denied (without warning or reason), the application is useless. That&rsquo;s an existential threat that could make my app evaporate overnight without warning.</p> <p>This highlights the risk of having a critical dependency on an external service.</p> <p>Modern applications depend on many services, both internal and external. But how <strong class='font-semibold text-navy-950'>critical</strong> that dependency is matters.</p> <p>Let&rsquo;s take a <em>very</em> simple application as an example. The application has a critical dependency on the database and both the app and database have a critical dependency on the underlying VMs/machines/provider. These critical dependencies are so common that we seldom think about them because we deal with them every day we come to work. It&rsquo;s just how things are.</p> <p><img alt="Diagram showing an application stack of hosting &gt; Database &gt; My Application and weak dependencies on logging, error reporting, etc. Then a critical dependency on an external AI as a Service. " src="/blog/the-risks-of-building-apps-on-chatgpt/assets/critical-dependency-vs-weak.png" /></p> <p>The danger comes when we draw a critical dependency line to an <strong class='font-semibold text-navy-950'>external</strong> <strong class='font-semibold text-navy-950'>service</strong>. If the service has a hiccup or the network between my app and their service starts dropping all my packets, the entire application goes down. Someone else&rsquo;s bad day gets spread around when that happens. 😞</p> <p>In order to protect ourselves from a risk like that, we should diversify our reliance away from a single external provider. How do we do that? 
We&rsquo;ll come back to this later.</p> <h3 id='we-are-not-without-dependencies' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#we-are-not-without-dependencies' aria-label='Anchor'></a><span class='plain-code'>We are not without dependencies</span></h3> <p>It&rsquo;s really common for apps to have external dependencies. The question is: how critical are those dependencies to our service?</p> <p>What happens to the application when the external log aggregation service, email service, and error reporting services are all unreachable? If the app is designed well, then users may have a slightly degraded experience or, best case, the users won&rsquo;t even notice the issues at all!</p> <p>The key factor is that these external services are not essential to our application functioning.</p> <h2 id='regulation-or-policy-change-risk' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#regulation-or-policy-change-risk' aria-label='Anchor'></a><span class='plain-code'>Regulation or Policy change risk</span></h2> <p>Our industry has a lot of misconceptions, fear, uncertainty, and doubt around the idea of regulation, but sometimes it&rsquo;s justified. I don&rsquo;t want you to think about regulation as a scary thing that yanks away control. Instead, let&rsquo;s think about regulation as when a government body gets involved to disallow businesses from doing or engaging in specific activities. Given that our industry has been so self-defined for so long, this feels like an existential threat. 
However, this is a good thing when we think about vehicle safety standards (you don&rsquo;t want your 4-ton mass of metal exploding while traveling at 70 mph), pollution, health risks, and more. It&rsquo;s a careful balance.</p> <p>Ironically, Sam Altman has been a major proponent <a href='https://www.forbes.com/sites/johannacostigan/2023/06/13/openais-sam-altman-makes-global-call-for-ai-regulation-and-includes-china/?sh=4fc007421b47' title=''>for government regulation</a> of the AI industry. Why would he want that?</p> <p>It turns out that <a href='https://www.cato.org/policy-analysis/regulatory-protectionism-hidden-threat-free-trade' title=''>regulation can also be used as a form of protectionism</a>. Or, put another way, when the people with an early lead see that <a href='https://www.semianalysis.com/p/google-we-have-no-moat-and-neither' title=''>they aren&rsquo;t defensible against advances with open source AI models</a>, they want to pull up the ladders behind them and have the government make it legally harder, or impossible, for competitors to catch up to them.</p> <p>If Altman&rsquo;s efforts are successful, then companies that create AI can expect government involvement and oversight. Added licensing requirements and certifications would raise the cost of starting a competing business.</p> <p>At this point you may be thinking something like &ldquo;but all of that is theoretical, Mark. How would this affect my business&rsquo;s use of AI today?&rdquo;</p> <p>Introducing an external organization that can dictate changes to an AI product risks breaking an existing company&rsquo;s applications or significantly reducing the effectiveness of the application. And those changes may come without notice or warning.</p> <p>Additionally, if my business is built on an external AI system protected from competition by regulators, that adds a significant risk. 
If they are now the only game in town, they can set whatever price they want.</p> <h2 id='governance-and-leadership-risk' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#governance-and-leadership-risk' aria-label='Anchor'></a><span class='plain-code'>Governance and leadership risk</span></h2> <p>The week after the OpenAI outage (on November 17th, to be precise), a post on the OpenAI blog <a href='https://openai.com/blog/openai-announces-leadership-transition' title=''>announcing that the OpenAI board had fired the co-founder and CEO, Sam Altman</a>, upended the entire tech industry for most of a week. Then <a href='https://www.forbes.com/sites/richardnieva/2023/11/17/openai-president-and-co-founder-quits-over-sam-altman-firing/?sh=34fe4b621d57' title=''>Greg Brockman, co-founder and acting President, resigned in protest</a>.</p> <p>OpenAI is partnered with Microsoft and on Nov 20, 2023, <a href='https://twitter.com/satyanadella/status/1726509045803336122' title=''>Satya Nadella (CEO of Microsoft) posted the following on X</a> (formerly Twitter):</p> <blockquote> <p>We remain committed to our partnership with OpenAI (OAI) and have confidence in our product roadmap, our ability to continue to innovate with everything we announced at Microsoft Ignite, and in continuing to support our customers and partners. We look forward to getting to know Emmett Shear and OAI&rsquo;s new leadership team and working with them. 
And <strong class='font-semibold text-navy-950'>we’re extremely excited to share the news that Sam Altman and Greg Brockman, together with colleagues, will be joining Microsoft to lead a new advanced AI research team.</strong> We look forward to moving quickly to provide them with the resources needed for their success.</p> </blockquote> <p>Microsoft nearly <a href='https://en.wikipedia.org/wiki/Acqui-hiring' title=''>acqui-hired</a> OpenAI for $0! That&rsquo;s some serious business jujutsu.</p> <p>In the end, after 12 days of very public corporate chaos, <a href='https://openai.com/blog/sam-altman-returns-as-ceo-openai-has-a-new-initial-board' title=''>Sam Altman and Greg Brockman returned to OpenAI in their previous leadership positions</a> as if nothing happened (save the firing of the rest of the board).</p> <p>With all the drama and uncertainty resolved, you may say, &ldquo;it all worked out in the end, right? So what&rsquo;s the problem?&rdquo;</p> <p>This highlights the risk of building <em>any</em> critical business system on a product offered and hosted by an external company. When we do that, we implicitly take on all of that company&rsquo;s risks in addition to the risks our business already has! In this case, it&rsquo;s taking on all the risks of OpenAI while getting none of their financial benefits!</p> <h2 id='whats-the-alternative' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#whats-the-alternative' aria-label='Anchor'></a><span class='plain-code'>What&rsquo;s the alternative?</span></h2> <p>The thing big AI providers like OpenAI and Google seem to fear most is competition from open source AI models. And they should be afraid. 
Open source AI models continue to develop at a rapid pace (there are huge incremental improvements on a weekly basis) and, most importantly, they can be self-hosted.</p> <p>Additionally, it&rsquo;s not out of reach for us to <a href='https://huggingface.co/docs/transformers/training' title=''>fine-tune</a> a general model to better fit our needs by adding and removing capabilities rather than hope that the capabilities we need suddenly manifest for us.</p> <p>Doesn&rsquo;t this all sound like the classic argument in favor of open source?</p> <p>If we have the model and can host it ourselves, no one can take it away. When we self-host it, we are protected from:</p> <ul> <li>service interruptions from an external provider for a critical system </li><li>changes in licensing or usage fees (such as your provider suddenly doubling inference costs without warning via an email sent at 3AM) </li><li>government regulators dictating a change to the model that negatively affects our use case (assuming our use isn&rsquo;t breaking the law of course) </li><li>company policy changes that change the behavior of the model we rely on </li><li>rogue boards or a leadership crisis that impacts a provider </li></ul> <p>Using an open source and self-hosted model insulates us from these external risks.</p> <h2 id='i-still-need-gpus' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#i-still-need-gpus' aria-label='Anchor'></a><span class='plain-code'>I still need GPUs!</span></h2> <p>Getting dedicated access to a GPU is more expensive than renting limited time on OpenAI&rsquo;s servers. 
That&rsquo;s why a hobby or personal project is better off paying for the brief bits of time when needed.</p> <p>But let&rsquo;s face it.</p> <p>If you really want to integrate AI into your business, you need to host your own models. You can&rsquo;t control third-party privacy policies, but you can control your own policies when you are the one doing your own inference with your own models. Ideally this means getting your own GPUs and incurring the capital expenditure and operations expenditures, but thankfully we&rsquo;re in the future. We have the cloud now. There are many options you can use for renting GPU access from other companies. This is supported in the big clouds as well as Fly.io. You can check out our <a href='https://fly.io/docs/about/pricing/#gpus-and-fly-machines' title=''>GPU offerings here</a>.</p> <figure class="post-cta"> <figcaption> <h1>Fly.io also offers GPUs</h1> <p>Running inference on your own hosted models can help de-risk critical AI integrations.</p> <a class="btn btn-lg" href="https://fly.io/docs/about/pricing/#gpus-and-fly-machines"> GPU resource prices </a> </figcaption> <div class="image-container"> <img src="/static/images/cta-turtle.webp" srcset="/static/images/[email protected] 2x" alt=""> </div> </figure> <h2 id='closing-thoughts' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#closing-thoughts' aria-label='Anchor'></a><span class='plain-code'>Closing thoughts</span></h2> <p>It&rsquo;s important to take advantage of AI in our applications so we can reap the benefits. It can give us an important edge in the market! However, we should be extra cautious of building any critical features on a product offered by a proprietary external business. 
<a href='https://www.msn.com/en-us/money/companies/sam-altman-chaos-helped-openai-rivals-says-hugging-face-ceo-cl%C3%A9ment-delangue/ar-AA1kIFQP' title=''>Others are considering the risks of building on OpenAI as well</a>.</p> <p>Your specific level of risk depends on how central the AI aspect is to your business. If it&rsquo;s a central component like in my Personal AI Fitness Trainer, then I risk losing all my customers and even the company if any of the above-mentioned risk factors happen to my AI provider. That&rsquo;s an existential risk that I can&rsquo;t do anything about without taking emergency heroic efforts.</p> <p>If the AI is sprinkled around the edges of the business, then suddenly losing it won&rsquo;t kill the company. However, if the AI isn&rsquo;t being well utilized, then the business may be at risk from competitors who place a bigger bet and take a bigger swing with AI.</p> <p>Oh, what interesting times we live in! 🙃</p></content> </entry> <entry> <title>Print on Demand</title> <link rel="alternate" href="https://fly.io/blog/print-on-demand/"/> <id>https://fly.io/blog/print-on-demand/</id> <published>2023-11-29T00:00:00+00:00</published> <updated>2024-04-12T18:23:39+00:00</updated> <media:thumbnail url="https://fly.io/blog/print-on-demand/assets/print-on-demand-thumb.webp"/> <content type="html"><div class="lead"><p>Save money by using appliance machines to only allocate memory and other machine resources when you actually need them.</p> </div> <p>Scaling discussions often lead to recommendations to add more memory, more CPU, more machines, more regions, more, more, more.</p> <p>This post is different. It focuses instead on the idea of decomposing parts of your applications into event handlers, starting up Machines to handle the events when needed, and stopping them when the event is done. 
Along the way we will see how a few built-in Fly.io primitives make this easy.</p> <p>To make the discussion concrete, we are going to focus on a common requirement: generation of PDFs from web pages. The code that we will introduce isn&rsquo;t merely an example produced in support of a blog post; rather, it is code that was extracted from a production application, and packaged up into an appliance that you can deploy in minutes to add PDF generation to your existing application.</p> <p>But before we dive in, let&rsquo;s back up a bit.</p> <h2 id='motivation' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#motivation' aria-label='Anchor'></a><span class='plain-code'>Motivation</span></h2> <p>Normally the way this is approached is to start with a tool like <a href='https://github.com/puppeteer/puppeteer' title=''>Puppeteer</a>, <a href='https://github.com/Studiosity/grover#readme' title=''>Grover</a>, <a href='https://playwright.dev/' title=''>Playwright</a>, <a href='https://github.com/bitcrowd/chromic_pdf' title=''>ChromicPDF</a>, or <a href='https://spatie.be/docs/browsershot/v2/introduction' title=''>BrowserShot</a>. These and other tools ultimately launch a browser like <a href='https://developer.chrome.com/articles/new-headless/' title=''>Chrome headless</a>.</p> <p>Now a few things about Chrome itself:</p> <ul> <li>It likely is bigger than your entire web server. </li><li>It likely uses more memory than you see with a typical load on your server. </li><li>All told, people using your server likely spend much less time generating PDFs than they do using the rest of your application. </li></ul> <p>Taken together, this makes splitting PDF generation into a completely separate application an easy win. With a smaller image, your application will start faster. 
Memory usage will be more predictable, and the memory needed to generate PDFs will only be allocated when needed and can be scaled separately.</p> <h2 id='diving-in' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#diving-in' aria-label='Anchor'></a><span class='plain-code'>Diving in</span></h2> <p>Without further ado, the entire application is available on GitHub as <a href='https://github.com/fly-apps/pdf-appliance/#readme' title=''>fly-apps/pdf-appliance</a>. Installation is a simple matter of: clone repository, create app, adjust config, deploy, and scale.</p> <p>Next, you will need to integrate this into your application. All that is needed is to reply to requests that are intended to produce a PDF with a <a href='https://fly.io/docs/reference/dynamic-request-routing/#the-fly-replay-response-header' title=''>fly-replay</a> response header. This can either be done on individual application routes / controller actions, or it can be done globally via either middleware or a front end like <a href='https://www.nginx.com/' title=''>NGINX</a>. You can find a few examples in the <a href='https://github.com/fly-apps/pdf-appliance/#integrate-with-your-existing-application' title=''>README</a>.</p> <p>And, that&rsquo;s it. The most you might consider doing is issuing an additional HTTP request in anticipation of the user selecting what they want to print as this will <a href='https://github.com/fly-apps/pdf-appliance/#preloading-optional' title=''>preload the machine</a>.</p> <figure class="post-cta"> <figcaption> <h1>Scale at your own pace</h1> <p>Deploy your project in a few minutes with Fly Launch. 
Then do more with Fly Machines.</p> <a class="btn btn-lg" href="https://fly.io/docs/"> Run your entire stack near your users </a> </figcaption> <div class="image-container"> <img src="/static/images/cta-cat.webp" srcset="/static/images/[email protected] 2x" alt=""> </div> </figure> <p>If you don&rsquo;t have an application handy, you can try a demo. Go to <a href='https://smooth.fly.dev/' title=''>smooth.fly.dev</a>. Click on Demo, then on Publish, and finally on Invoices to see a PDF. The PDF you see will likely be underwhelming as you would need to enter students, entries, packages and options to fill out the page. But click refresh anyway and see how fast it responds. If you want to explore further, links to the <a href='https://smooth.fly.dev/showcase/docs/' title=''>documentation</a> and <a href='https://github.com/rubys/showcase#readme' title=''>code</a> can be found on the front page.</p> <h2 id='implementation-details' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#implementation-details' aria-label='Anchor'></a><span class='plain-code'>Implementation Details</span></h2> <p>The basic flow starts when a request for a PDF comes into your app. That request is replayed to the PDF appliance. A Chrome instance in that app then issues a second request to your app for the same URL minus the <code>.pdf</code> extension and converts the HTML it receives in response into a PDF. That PDF is then returned as the response to the original request.</p> <p>A single Google Chrome instance per machine will be reused across all requests, which itself is faster than starting a new instance per request. 
As all HTTP headers will be passed back to your application, this will seamlessly work with your existing session, cookies, and basic authentication.</p> <p>Starting up a machine on demand is handled by the <code>auto_stop_machines</code> setting in your <code>fly.toml</code>. With this in place, machines can confidently exit when idle, secure in the knowledge that they will be restarted when needed. See the <a href='https://github.com/fly-apps/pdf-appliance/#scaling' title=''>README</a> for more information on scaling.</p> <p>Note that different machines can use different languages and frameworks. This code is written in JavaScript and runs on Bun. It was designed to support a Ruby on Rails app, but can be used with any app.</p> <h2 id='a-reusable-pattern' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#a-reusable-pattern' aria-label='Anchor'></a><span class='plain-code'>A Reusable Pattern</span></h2> <p>If your app is small and your usage is low, scaling may not be much of a concern. But as your needs grow, your first instinct shouldn&rsquo;t be merely to throw more hardware at the problem, but rather to partition the problem so that each machine has a somewhat predictable capacity.</p> <p>Do this by looking at your application for requests that are somehow different from the rest. 
Streaming audio and video files, handling websockets, converting text to speech or performing other AI processing, long running &ldquo;background&rdquo; computation, fetching static pages, producing PDFs, and updating databases all have different profiles in terms of server load.</p> <p>It might even be helpful &ndash; purely as a thought experiment &ndash; to think of replacing your main server with a proxy that does nothing more than route requests to separate machines based on the type of workload performed.</p> <p>Once you have come up with an allocation of functions performed to pools of machines, Fly-Replay is but one tool available to you. There is also a <a href='https://fly.io/docs/machines/working-with-machines/' title=''>Machines API</a> that will enable you to orchestrate whatever topology you can come up with. <a href='https://fly.io/laravel-bytes/cost-effective-queue-workers-with-fly-io-machines/' title=''>Cost-Effective Queue Workers With Fly.io Machines</a> gives a preview of what that would look like with Laravel.</p></content> </entry> <entry> <title>Launching to Victory</title> <link rel="alternate" href="https://fly.io/blog/new-launch/"/> <id>https://fly.io/blog/new-launch/</id> <published>2023-11-28T00:00:00+00:00</published> <updated>2024-04-12T18:23:39+00:00</updated> <media:thumbnail url="https://fly.io/blog/new-launch/assets/thumb.webp"/> <content type="html"><div class="lead"><p>Fly.io is the new public cloud for running your applications near your users so it can be faster than ever. When you create a new application, you use the <code>fly launch</code> command to give the platform all the information it needs to send it out into the sky. We’ve made steps towards making launching a new app <em>even easier</em> because first impressions matter. 
<a href="https://fly.io/docs/speedrun/" title="">Try the new <code>fly launch</code> now</a>; you can have an app up and running in mere minutes.</p> </div> <p>Previously when you ran <code>fly launch</code>, you got asked a bunch of hopefully relevant questions to help you get your app up and running. We&rsquo;ve taken a lot of the guesswork out of the process and made it a lot more streamlined. It turns out that even though Fly.io developers use a variety of frameworks, languages, and toolchains you can fold most of them into a few basic infrastructure shapes.</p> <h2 id='the-new-launch' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#the-new-launch' aria-label='Anchor'></a><span class='plain-code'>The new launch</span></h2> <p>Now when you run <code>fly launch</code>, the CLI will infer what you want based on the source code of your application. For example, if you have a Rails app with SQLite, it&rsquo;ll give you an opinionated set of defaults that you can build from. If you don&rsquo;t, it&rsquo;ll give you other options so you can craft the infrastructure you need. I took one of my older applications named <a href='https://douglas-adams-quotes.fly.dev/' title=''>douglas-adams-quotes</a> and launched it with the new flow. Here&rsquo;s what it looks like:</p> <p><img alt="An animated GIF showing the new fully automated launch process. It starts by guessing what your app is and what needs it has, then presents you with a set of opinionated defaults so that you can confirm or deny. If you confirm it will build your application and deploy it, then give you the URL so you can use it." src="/blog/new-launch/assets/./the-gif-edited.gif" /></p> <p>If the settings it guessed are good enough, you can launch it into the cloud. 
If not, then you&rsquo;ll be taken to a webpage where you can confirm or change the settings it guessed.</p> <p>Once you say yes or confirm on the web, your app will get built and deployed (unless you asked it not to with <code>--no-deploy</code>). You&rsquo;ll get a link to your app so you can go check it out. It&rsquo;s that easy.</p> <h2 id='conclusion' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#conclusion' aria-label='Anchor'></a><span class='plain-code'>Conclusion</span></h2> <p>We hope that this can help you look before you <code>fly launch</code> into the wild unknowns of the cloud.</p> <p>Got any ideas or comments on how we can make this even smoother? Get in touch on our <a href='https://community.fly.io/' title=''>community forum</a>. We&rsquo;d love to hear from you.</p></content> </entry> <entry> <title>How I Fly</title> <link rel="alternate" href="https://fly.io/blog/how-i-fly/"/> <id>https://fly.io/blog/how-i-fly/</id> <published>2023-11-17T00:00:00+00:00</published> <updated>2023-11-28T14:16:01+00:00</updated> <media:thumbnail url="https://fly.io/blog/how-i-fly/assets/thumb.webp"/> <content type="html"><div class="lead"><p>We are Fly.io. We make it easy to run your programs close to your users. We make it easy to update your programs whenever you need to and communicate between your services in an end-to-end encrypted fashion. Today, Xe is going to tell you what they do to use Fly.io effectively. <a href="https://fly.io/docs/speedrun/" title="">Deploy your first app</a> for free and scale it up to production. That’s what Xe did.</p> </div> <p>I&rsquo;m Xe Iaso. I&rsquo;m a writer, technical educator, and philosopher who focuses on making technology easy to understand and scale to your needs. 
I use Fly.io to host my website and in nearly all of my personal projects now. Fly.io allows me to experiment with new ideas quickly and then deploy them to the world with ease.</p> <h2 id='what-is-fly-io' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#what-is-fly-io' aria-label='Anchor'></a><span class='plain-code'>What is Fly.io?</span></h2> <p>Fly.io lets you host your applications in data centers close to your users. Fly.io also lets you have rolling updates of your programs and facilitates easy communication between your services inside and outside of your organization&rsquo;s private network.</p> <p>I use Fly.io to host my blog, its CDN (named XeDN for reasons which are an exercise for the reader), and a bunch of other supporting services that help make it run. It is easily the most fun I&rsquo;ve had deploying things since I worked at Heroku.</p> <h2 id='my-blog' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#my-blog' aria-label='Anchor'></a><span class='plain-code'>My blog</span></h2> <p>My blog is made up of several parts: the backend blog server and the CDN. Both are written in Go, my favorite programming language. The back-end blog server runs in Toronto, but XeDN runs in 35 datacenters worldwide. I plan to eventually move my blog to be served from XeDN, but for right now it&rsquo;s still comfortably running off of a single server in Toronto.</p> <p><img alt="The entire flow for how things run on Xesite." src="/blog/how-i-fly/assets/./rebuild-flow.svg" /></p> <p>Overall, my website&rsquo;s architecture looks like this. 
My website listens for updates from Patreon and GitHub to trigger rebuilds because of its <a href='https://xeiaso.net/blog/xesite-v4/' title=''>dystatic nature</a>. When I am working on new posts or building new assets, I upload them to Backblaze B2. Anytime someone tries to access one of the files on a XeDN node, it will download it from Backblaze B2 if it doesn&rsquo;t have it locally already.</p> <p>With Fly.io, I don&rsquo;t have to worry about the user experience being degraded when servers go down. If any individual XeDN server goes down, I can rely on the other XeDN servers worldwide to pick up the slack, because Fly.io will shunt the traffic to the servers that aren&rsquo;t down. Combine this with some very aggressive caching logic for things like video assets, and I can make sure that my blog is fast for everyone, no matter where they are in the world.</p> <p>Of course, it doesn&rsquo;t end here. My CDN server is the back end that helps make my other projects work too. I spent some time working on a <a href='https://xeiaso.net/blog/iaso-fonts/' title=''>custom font</a> for all of my web properties, and I <a href='https://cdn.xeiaso.net/static/pkg/iosevka/specimen.html' title=''>serve it from my CDN</a> so that I can use it in every project of mine. This allows me to integrate it into other projects like <a href='https://arsene.fly.dev/' title=''>Arsène</a> without having to do anything special.</p> <h2 id='building-on-top-of-projects-with-fly-io' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#building-on-top-of-projects-with-fly-io' aria-label='Anchor'></a><span class='plain-code'>Building on top of projects with Fly.io</span></h2> <p>I like making projects that aren&rsquo;t entirely serious. 
I love using these projects to explore aspects and bits of technology that I would have never gotten to play with before. One of these is <a href='https://arsene.fly.dev' title=''>Arsène</a>, a project I used to explore what a &ldquo;dead internet&rdquo; powered by AI could look like.</p> <p>Every 12 hours, Arsène will have the ChatGPT API generate new posts and then use Stable Diffusion to create a (hopefully relevant) illustration for that post. I run a copy of the <a href='https://github.com/AUTOMATIC1111/stable-diffusion-webui' title=''>Automatic1111</a> Stable Diffusion API in my private network. When Arsène generates an image, it reaches out to that Stable Diffusion API directly over that private network to make the calls it needs. Since XeDN is in the same private network, I can also have Arsène send the images there to be cached and served all over the world.</p> <p>Here&rsquo;s what the total flow looks like:</p> <p><img alt="The flow of data for Arsène, showing how this lets me reuse projects" src="/blog/how-i-fly/assets/./reuse-flow.svg" /></p> <p>This means that when I am creating things, I am not just making one-off things that don&rsquo;t work with each other. I am creating individual building blocks that interoperate with each other. 
I am creating opportunities for me to reuse my infrastructure to create brand new things that are robust and scalable with minimal effort on my end.</p> <h2 id='my-other-projects' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#my-other-projects' aria-label='Anchor'></a><span class='plain-code'>My other projects</span></h2> <p>I have some other projects that I&rsquo;m working on that I don&rsquo;t want to get into too much detail about yet, but they&rsquo;re going to mostly involve transforming the basic ideas of using my CDN for distributing things and a webserver for sending HTML to users in new and interesting ways. I love using Fly.io for this because I am just allowed to create things instead of having to worry about how to implement them, where state is going to be stored, or how I&rsquo;m going to scale them.</p> <div class="callout"><p>Fly.io is the only platform I’ve used where I can spin up 35 copies of a program as easily as one copy of a program.</p> </div><h2 id='conclusion' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#conclusion' aria-label='Anchor'></a><span class='plain-code'>Conclusion</span></h2> <p>If you haven&rsquo;t given Fly.io a try yet, you&rsquo;re really missing out. It is utterly trivial to deploy your application across the globe. Not to mention, when your applications are idle, you can have them scale down to zero copies. This means that you only pay for what you actually use. 
I don&rsquo;t have to worry about overpaying for my blog by having a giant server in Helsinki running 24/7, even though I&rsquo;m only using a small sliver of it.</p> <p>If you want to learn more about Fly.io, you can check out <a href='https://fly.io' title=''>fly.io</a>. My CDN cost me nothing until I started adding cover art per post and the <a href='https://xeiaso.net//blog/how-mara-works-2020-09-30/' title=''>conversation snippets</a> with furry stickers. It definitely went over the bar when I started uploading video. I can see it scaling in the future as my demands scale too.</p> <p>Of course, this is barely even scratching the surface. Stay tuned for secret tricks you can use to dynamically spin up and spin down machines as you need. Imagine uploading an image, automatically creating a machine to handle compressing it, and uploading it to your storage back end. Imagine what you could do if compute was a faucet that you could turn on and off as you needed it.</p> <p>You can do it on Fly.io. Try it today, you can run an app on a 256 MB Machine for free. XeDN ran on three 256 MB Machines for a year. Arsène still runs on a 256 MB Machine to this day. It&rsquo;s more than enough for what you&rsquo;re going to do. And when it isn&rsquo;t, scaling up is <a href='https://fly.io/docs/about/pricing/' title=''>cheaper than you can imagine</a>.</p></content> </entry> <entry> <title>Transcribing on Fly GPU Machines</title> <link rel="alternate" href="https://fly.io/blog/transcribing-on-fly-gpu-machines/"/> <id>https://fly.io/blog/transcribing-on-fly-gpu-machines/</id> <published>2023-11-13T00:00:00+00:00</published> <updated>2024-04-12T18:23:39+00:00</updated> <media:thumbnail url="https://fly.io/blog/transcribing-on-fly-gpu-machines/assets/whispering-thumb.webp"/> <content type="html"><div class="lead"><p>Fly.io has GPUs! 
If you want to run AI (or whatever) workloads, check out how to <a href="https://fly.io/docs/gpus/gpu-quickstart/" title="">get started with GPU Machines</a>!</p> </div> <p>Fly.io has GPU Machines, which means we can finally <del>play games</del> <del>mine bitcoin</del> <del>baghold NFTs</del> run AI workloads with just a few API calls.</p> <p>This is exciting! Running GPU workloads yourself is useful when the community™ builds upon available models to make them faster, more useful, or less restrictive than first-party APIs.</p> <p>One such tool is the <a href='https://github.com/ahmetoner/whisper-asr-webservice' title=''>Whisper Webservice</a>, which is conveniently packaged in a way that makes it a good candidate to use on Fly GPU Machines.</p> <p>Let&rsquo;s see how to use Fly.io GPUs by spinning up Whisper Webservice.</p> <h2 id='whisper-webservice' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#whisper-webservice' aria-label='Anchor'></a><span class='plain-code'>Whisper Webservice</span></h2> <p>Whisper is OpenAI&rsquo;s voice recognition service - it&rsquo;s used for audio transcription. 
To use it anywhere that&rsquo;s not OpenAI&rsquo;s platform, you need <a href='https://github.com/openai/whisper' title=''>some Python</a>, a few GB of storage, and (preferably) a GPU.</p> <p>The aforementioned <a href='https://github.com/ahmetoner/whisper-asr-webservice' title=''>Whisper Webservice</a> packages this up for us, while making Whisper faster, more useful, and less restricted than OpenAI&rsquo;s API:</p> <ol> <li>It provides a web API on top of Whisper&rsquo;s Python library </li><li>It (optionally) integrates <a href='https://github.com/guillaumekln/faster-whisper' title=''>faster-whisper</a> to make it, you know, faster </li><li>It (optionally) uses FFmpeg to process the uploaded audio file, useful for getting audio out of video files or converting audio formats </li></ol> <p>Luckily for us, and totally <strong class='font-semibold text-navy-950'>not</strong> why I chose this as an example - the project provides GPU-friendly Docker images. We&rsquo;ll use those to spin up Fly GPU Machines and process some audio files.</p> <p>(I&rsquo;ll also show examples of making your own Docker image!)</p> <h2 id='running-a-gpu-machine' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#running-a-gpu-machine' aria-label='Anchor'></a><span class='plain-code'>Running a GPU Machine</span></h2> <p>Spinning up a GPU Machine is very similar to any other Machine. 
The main difference is the new &ldquo;GPU kind&rdquo; option (<code>--vm-gpu-kind</code>), which takes 2 possible values:</p> <ol> <li><code>a100-pcie-40gb</code> </li><li><code>a100-sxm4-80gb</code> </li></ol> <p>These are 2 flavors of Nvidia A100 GPUs, the difference worth caring about is <code>40</code> vs <code>80</code> GB of memory (here&rsquo;s <a href='https://fly.io/docs/about/pricing/#gpus-and-fly-machines' title=''>pricing</a>).</p> <p>We&rsquo;ll create machines using <code>a100-pcie-40gb</code> because we don&rsquo;t need 80 freakin&rsquo; GB for what we&rsquo;re doing.</p> <p>Using <code>flyctl</code> is a great way to run a GPU Machine. We&rsquo;ll make an app and run the conveniently created <a href='https://hub.docker.com/r/onerahmet/openai-whisper-asr-webservice' title=''>Whisper Webservice Docker image</a> that supports Nvidia GPUs. The <code>flyctl</code> commands will default us into a <code>performance-8x</code> server size (8 CPUs, 16G ram) unless we specify something different.</p> <p><strong class='font-semibold text-navy-950'>One caveat:</strong> AI model files are big. Docker images ideally aren&rsquo;t big - sending huge layers across the network angers the spiteful networking gods. If you shove models into your Docker images, you <em>might</em> have a bad time.</p> <p>We suggest creating a Fly Volume and making your Docker image download needed models when it first spins up. The Whisper service (and in my experience, OpenAI&rsquo;s Python library) does that for us.</p> <p>So, we&rsquo;ll create a volume to house (and cache) the models. In the case of the Whisper project, the models get placed in <code>/root/.cache/whisper</code> on its first boot, and so we&rsquo;ll mount our disk there.</p> <p>Alright, let&rsquo;s create a GPU Machine. 
Here&rsquo;s what the process looks like:</p> <div class="highlight-wrapper group relative bash"> <button type="button" class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-wrap-target="#code-chwf29f7" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"><g buffered-rendering="static"><path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /><path d="M11.081 6.466L9.533 8.037l1.548 1.571" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"> Wrap text </span> </button> <button type="button" class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-copy-target="sibling" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"><g buffered-rendering="static"><path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /><path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"> Copy to clipboard </span> </button> <div class='highlight relative group'> <pre class='highlight '><code 
id="code-chwf29f7"><span class="nv">APP_NAME</span><span class="o">=</span><span class="s2">"whispering-zines"</span> fly apps create <span class="nv">$APP_NAME</span> <span class="nt">-o</span> personal <span class="c"># We "hint" --vm-gpu-kind so the volume</span> <span class="c"># is provisioned on a GPU host</span> <span class="c"># We choose region ord, where most Fly GPUs</span> <span class="c"># currently live</span> fly volumes create whisper_zine_cache <span class="nt">-s</span> 10 <span class="se">\</span> <span class="nt">-a</span> <span class="nv">$APP_NAME</span> <span class="nt">-r</span> ord <span class="nt">--vm-gpu-kind</span> a100-pcie-40gb <span class="c"># Take note of the volume ID from the output ^</span> <span class="c"># Run a machine that can accept web requests</span> <span class="c"># from the public internet</span> fly machines run onerahmet/openai-whisper-asr-webservice:latest-gpu <span class="se">\</span> <span class="nt">--vm-gpu-kind</span> a100-pcie-40gb <span class="se">\</span> <span class="nt">-p</span> 443:9000/tcp:tls:http <span class="nt">-p</span> 80:9000/tcp:http <span class="se">\</span> <span class="nt">-r</span> ord <span class="se">\</span> <span class="nt">-v</span> &lt;VOLUME_ID&gt;:/root/.cache/whisper <span class="se">\</span> <span class="nt">-e</span> <span class="nv">ASR_MODEL</span><span class="o">=</span>large <span class="nt">-e</span> <span class="nv">ASR_ENGINE</span><span class="o">=</span>faster_whisper <span class="se">\</span> <span class="nt">-a</span> <span class="nv">$APP_NAME</span> <span class="c"># Allocate IPs so we can view it on the web</span> fly ips allocate-v4 <span class="nt">--shared</span> <span class="nt">-a</span> <span class="nv">$APP_NAME</span> fly ips allocate-v6 <span class="nt">-a</span> <span class="nv">$APP_NAME</span> </code></pre> </div> </div> <p>That&rsquo;s all pretty standard for Fly Machines, <strong class='font-semibold text-navy-950'>except</strong> for the 
<code>--vm-gpu-kind</code> flags used both for volume <strong class='font-semibold text-navy-950'>and</strong> Machine creation. Volumes are pinned to specific hosts - using this flag tells Fly.io to create the volume on a GPU host. Assuming we set the same region (<code>-r ord</code>), creating a GPU Machine with the just-created volume will tell Fly.io to place the Machine on the same host as the volume.</p> <p><strong class='font-semibold text-navy-950'>Note:</strong> As my machine started up, I saw a log line <code>WARNING: The NVIDIA Driver was not detected. GPU functionality will not be available.</code>, which ended up being an issue of timing. Once everything was running, I was able to see things were working by using <code>fly ssh console -a $APP_NAME</code> and running <code>nvidia-smi</code> to confirm that the VM had a GPU. It also showed that the web service (Python in this case) was running as a GPU process.</p> <p>Once everything is running, you should be able to head to <code>$APP_NAME.fly.dev</code> and view it in the browser.</p> <p>The Whisper Webservice UI will let you try out individual calls in its API. This will also give you the information you need to make those calls from your code. There&rsquo;s a link to the API specification (e.g. 
<code>$APP_NAME.fly.dev/openapi.json</code>) you can use to, say, have <a href='https://www.blobr.io/post/create-api-specs-chatgpt' title=''>ChatGPT generate a client</a> in your language of choice.</p> <h2 id='automating-gpu-machines' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#automating-gpu-machines' aria-label='Anchor'></a><span class='plain-code'>Automating GPU Machines</span></h2> <p>If you want to automate this, you can use the <a href='https://fly.io/docs/machines/working-with-machines/' title=''>Machines API</a> (spec <a href='https://docs.machines.dev/swagger/index.html' title=''>here</a>).</p> <p>An easy way to get started is to spy on the API requests <code>flyctl</code> is making:</p> <div class="highlight-wrapper group relative bash"> <button type="button" class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-wrap-target="#code-13v3zt2f" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"><g buffered-rendering="static"><path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /><path d="M11.081 6.466L9.533 8.037l1.548 1.571" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"> Wrap text </span> </button> <button type="button" class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white 
bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-copy-target="sibling" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"><g buffered-rendering="static"><path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /><path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"> Copy to clipboard </span> </button> <div class='highlight relative group'> <pre class='highlight '><code id="code-13v3zt2f"><span class="c"># Debug logs will output the API requests / responses</span> <span class="c"># made to Fly.io's API.</span> <span class="nv">LOG_LEVEL</span><span class="o">=</span>debug flyctl machine run ... </code></pre> </div> </div> <p>This helped me figure out why my own initial API attempts failed - it turns out we need some extra parameters in the <code>compute</code> portion of the request JSON for creating a volume, and the <code>guest</code> section for creating a Machine.</p> <p>For both volumes and Machines, we set the <code>gpu_kind</code> the same way we did in our <code>flyctl</code> command. However we <em>also</em> need the <code>cpu_kind</code> to be set. 
Additionally, when creating a Machine, we need to set <code>cpus</code> and <code>memory_mb</code> to <a href='https://fly.io/docs/machines/guides-examples/machine-sizing/' title=''>valid values</a> for <code>performance</code> Machines.</p> <div class="highlight-wrapper group relative bash"> <button type="button" class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-wrap-target="#code-e14p7s3k" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"><g buffered-rendering="static"><path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /><path d="M11.081 6.466L9.533 8.037l1.548 1.571" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"> Wrap text </span> </button> <button type="button" class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-copy-target="sibling" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"><g buffered-rendering="static"><path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /><path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /></g></svg> <span 
class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"> Copy to clipboard </span> </button> <div class='highlight relative group'> <pre class='highlight '><code id="code-e14p7s3k"><span class="nv">APP_NAME</span><span class="o">=</span><span class="s2">"whispering-zines"</span> <span class="c"># Create a volume on a GPU host. Specify both</span> <span class="c"># cpu_kind and gpu_kind</span> curl <span class="nt">-H</span> <span class="s2">"Authorization: Bearer </span><span class="sb">`</span>fly auth token<span class="sb">`</span><span class="s2">"</span> <span class="se">\</span> <span class="nt">-H</span> <span class="s2">"Accept: application/json"</span> <span class="se">\</span> <span class="nt">-H</span> <span class="s2">"Content-Type: application/json"</span> <span class="se">\</span> https://api.machines.dev/v1/apps/<span class="nv">$APP_NAME</span>/volumes <span class="se">\</span> <span class="nt">-d</span> <span class="s1">'{ "name": "whisper_zine_cache", "region": "ord", "size_gb": 10, "compute": { "cpu_kind": "performance", "gpu_kind": "a100-pcie-40gb" } }'</span> <span class="c"># Take note of the volume ID from the response ^</span> <span class="c"># Run a machine that can accept web requests</span> <span class="c"># from the public internet.</span> curl <span class="nt">-H</span> <span class="s2">"Authorization: Bearer </span><span class="sb">`</span>fly auth token<span class="sb">`</span><span class="s2">"</span> <span class="se">\</span> <span class="nt">-H</span> <span class="s2">"Accept: application/json"</span> <span class="se">\</span> <span class="nt">-H</span> <span class="s2">"Content-Type: application/json"</span> <span class="se">\</span> https://api.machines.dev/v1/apps/<span class="nv">$APP_NAME</span>/machines <span class="se">\</span> <span class="nt">-d</span> <span class="s1">'{ "region": "ord", "config": { "env": { "ASR_ENGINE": "faster_whisper", "ASR_MODEL": "large", "FLY_PROCESS_GROUP": 
"app", "PRIMARY_REGION": "ord" }, "mounts": [ { "path": "/root/.cache/whisper", "volume": "&lt;VOLUME_ID&gt;", "name": "data" } ], "services": [ { "protocol": "tcp", "internal_port": 9000, "autostop": false, "ports": [ { "port": 80, "handlers": [ "http" ], "force_https": true }, { "port": 443, "handlers": [ "http", "tls" ] } ] } ], "image": "onerahmet/openai-whisper-asr-webservice:latest-gpu", "guest": { "cpus": 8, "memory_mb": 16384, "cpu_kind": "performance", "gpu_kind": "a100-pcie-40gb" } } }'</span> </code></pre> </div> </div> <p>After that we can assign the app some IPs. You can use <code>flyctl</code> for this, or the <a href='https://api.fly.io/graphql' title=''>graphql API.</a> You can once again use debug mode with <code>flyctl</code> to see what API calls it makes. Side note: Eventually the Machines REST API will include the ability to allocate IP addresses.</p> <div class="highlight-wrapper group relative bash"> <button type="button" class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-wrap-target="#code-v5sntcmu" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"><g buffered-rendering="static"><path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /><path d="M11.081 6.466L9.533 8.037l1.548 1.571" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"> Wrap text </span> </button> <button type="button" class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent 
group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-copy-target="sibling" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"><g buffered-rendering="static"><path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /><path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"> Copy to clipboard </span> </button> <div class='highlight relative group'> <pre class='highlight '><code id="code-v5sntcmu">fly ips allocate-v4 <span class="nt">--shared</span> <span class="nt">-a</span> <span class="nv">$APP_NAME</span>
fly ips allocate-v6 <span class="nt">-a</span> <span class="nv">$APP_NAME</span>
</code></pre> </div> </div> <p>If you&rsquo;re doing this type of work for your business, you may want to keep these Machines inside a private network anyway, in which case you won&rsquo;t be assigning the app any IP addresses.</p> <h2 id='making-your-own-images' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#making-your-own-images' aria-label='Anchor'></a><span class='plain-code'>Making Your Own Images</span></h2> <p>There is, luckily (for me, a hardware ignoramus), less dark magic to making GPU-friendly Docker images than you might think. 
Basically, you just need to install the correct Nvidia drivers.</p> <p>A way to cheat at this is to run <a href='https://github.com/NVIDIA/nvidia-container-toolkit/tree/main' title=''>Nvidia CUDA base images</a>, but if you&rsquo;re made of sterner stuff, you can also start with a base Ubuntu image and install your own.</p> <p>While the Whisper webservice image is based on <code>nvidia/cuda:11.8.0-cudnn8-runtime-ubuntu22.04</code>, I got Whisper (plain, not the webservice) working with <code>ubuntu:22.04</code>:</p> <div class="highlight-wrapper group relative dockerfile"> <button type="button" class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-wrap-target="#code-qjwtp7g3" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"><g buffered-rendering="static"><path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /><path d="M11.081 6.466L9.533 8.037l1.548 1.571" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"> Wrap text </span> </button> <button type="button" class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-copy-target="sibling" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"><g buffered-rendering="static"><path d="M10.576 
7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /><path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"> Copy to clipboard </span> </button> <div class='highlight relative group'> <pre class='highlight '><code id="code-qjwtp7g3"><span class="c"># Base image</span>
<span class="k">FROM</span><span class="s"> ubuntu:22.04</span>

<span class="c"># Add Nvidia's apt repository, then install the CUDA libraries, ffmpeg, and Python</span>
<span class="k">RUN </span>apt update <span class="nt">-q</span> <span class="o">&amp;&amp;</span> apt <span class="nb">install</span> <span class="nt">-y</span> ca-certificates wget <span class="se">\
</span>    <span class="o">&amp;&amp;</span> wget <span class="nt">-qO</span> /cuda-keyring.deb https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-keyring_1.1-1_all.deb <span class="se">\
</span>    <span class="o">&amp;&amp;</span> dpkg <span class="nt">-i</span> /cuda-keyring.deb <span class="o">&amp;&amp;</span> apt update <span class="nt">-q</span> <span class="se">\
</span>    <span class="o">&amp;&amp;</span> apt <span class="nb">install</span> <span class="nt">-y</span> <span class="nt">--no-install-recommends</span> ffmpeg libcudnn8 libcublas-12-2 <span class="se">\
</span>       git python3 python3-pip

<span class="k">WORKDIR</span><span class="s"> /app</span>
<span class="c"># COPY needs a destination; /app/ is assumed here to match run.py</span>
<span class="k">COPY</span><span class="s"> audio.mp3 /app/audio.mp3</span>
<span class="k">COPY</span><span class="s"> run.py /app/run.py</span>
<span class="c"># Exec form is a JSON array, so the elements must be comma-separated</span>
<span class="k">CMD</span><span class="s"> ["python3", "run.py"]</span>
</code></pre> </div> </div> <p>You can find a full, <a href='https://github.com/fly-apps/whisper-example' title=''>working version of this here</a>.</p> <h2 id='this-time-its-different-i-guess' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 
text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#this-time-its-different-i-guess' aria-label='Anchor'></a><span class='plain-code'>This time it&rsquo;s different, I guess</span></h2> <p>AI feels a bit different from previous trends in that its benefits are immediately obvious. No one needs to throw around catchy phrases with a wink-wink nudge-nudge (&ldquo;we like the art&rdquo;) for us to find value.</p> <p>Since AI workloads run most efficiently on GPUs, GPUs remain a hot commodity. Those of us who didn&rsquo;t purchase enough $NVDA to retire can still bring more value to our businesses by adding in AI.</p> <p>Fly Machines have always been a great little piece of tech for running &ldquo;ephemeral compute workloads&rdquo; (wait, do I work at AWS!?), and this is what I like about GPU Machines. You can mix and match all sorts of AI stuff together to make a chain of useful tools!</p></content> </entry> <entry> <title>Skip the API, Ship Your Database</title> <link rel="alternate" href="https://fly.io/blog/skip-the-api/"/> <id>https://fly.io/blog/skip-the-api/</id> <published>2023-09-13T00:00:00+00:00</published> <updated>2023-11-21T21:08:37+00:00</updated> <media:thumbnail url="https://fly.io/blog/skip-the-api/assets/skip-the-api-thumb.webp"/> <content type="html"><div class="lead"><p>With Fly.io, <a href="https://fly.io/docs/speedrun/" title="">you can get your app running globally in a matter of minutes</a>, and with LiteFS, you can run SQLite alongside your app! Now we’re introducing LiteFS Cloud: managed backups and point-in-time restores for LiteFS. <a href="https://fly.io/docs/litefs/speedrun/" title="">Try it out for yourself</a>!</p> </div> <p>My favorite part about building tools is discovering their unintended uses. 
It&rsquo;s like starting to write a murder mystery book but you have no idea who the killer is!</p> <p>History is filled with examples of these accidental discoveries: WD-40 was originally <a href='https://en.wikipedia.org/wiki/WD-40#History' title=''>used to protect ICBMs from rust</a> and now it fixes your squeaky doorknob. Bubble wrap was <a href='https://en.wikipedia.org/wiki/Bubble_Wrap_(brand)#History' title=''>originally sold as wallpaper</a> and now it protects your Amazon packages.</p> <p>When we started writing <a href='https://fly.io/docs/litefs/' title=''>LiteFS</a>, a distributed SQLite database, we thought it would be used to distribute data geographically so users in, say, Bucharest see response times as fast as users in San Jose. And for the most part, that&rsquo;s what LiteFS users are doing.</p> <p>But we discovered another unexpected use: replacing the API layer between services with SQLite databases.</p> <h2 id='how-it-started' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#how-it-started' aria-label='Anchor'></a><span class='plain-code'>How it started</span></h2> <p>In the early days of LiteFS development, we wanted to find a real-world test bed for our tool so we could smoke out any bugs that we didn&rsquo;t find during automated tests. Part of our existing infrastructure is a program called <em>Corrosion</em> that gossips state between all our servers. Corrosion tracks VM statuses, health checks, and a plethora of other information for each server and communicates this info with other servers so they can make intelligent decisions about request routing and VM placement. Corrosion keeps a fast, local copy of all this data in a SQLite database.</p> <p>So we set up a Corrosion instance that also ran on top of LiteFS. 
This helped root out some bugs but we also found another use for it: making Corrosion accessible to our internal services.</p> <p><img src="/blog/skip-the-api/assets/corrosion.png" /></p> <h2 id='shipping-the-kitchen-sink' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#shipping-the-kitchen-sink' aria-label='Anchor'></a><span class='plain-code'>Shipping the kitchen sink</span></h2> <p>The typical approach to making data available between services is to spend weeks designing an API and then building a service around it. Your API design needs to take into account the different use cases of each consuming service so that it can deliver the data it needs efficiently. You don&rsquo;t want your clients making a dozen API calls for every request!</p> <p><img src="/blog/skip-the-api/assets/architecture.png" /></p> <p>A different approach is to skip the API design entirely and just ship the entire database to your client. You don&rsquo;t need to consider the consuming service&rsquo;s access patterns as they can use vanilla SQL to query and join whatever data their heart desires. That&rsquo;s what we did using LiteFS.</p> <p>While we could have set up each downstream service as a Corrosion node, gossip protocols can be chatty and we really just needed a one-way stream of updates. 
Setting up a read-only LiteFS instance for a new service is simple—it just needs the hostname of the upstream primary node to connect to:</p> <div class="highlight-wrapper group relative "> <button type="button" class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-wrap-target="#code-e631uyyz" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"><g buffered-rendering="static"><path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /><path d="M11.081 6.466L9.533 8.037l1.548 1.571" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"> Wrap text </span> </button> <button type="button" class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-copy-target="sibling" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"><g buffered-rendering="static"><path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /><path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"> Copy to 
clipboard </span> </button> <div class='highlight relative group'> <pre class='highlight '><code id="code-e631uyyz">lease:
  type: "static"
  candidate: false
  advertise-url: "http://corrosion-bridge:20202"
</code></pre> </div> </div> <p>And voilà! You have a full, read-only copy of the database on your app.</p> <h2 id='moving-compute-to-the-client' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#moving-compute-to-the-client' aria-label='Anchor'></a><span class='plain-code'>Moving compute to the client</span></h2> <p>API design is notoriously difficult as it&rsquo;s hard to know what your consuming services will need. Query languages such as <a href='https://graphql.org/' title=''>GraphQL</a> have even been invented for this specific problem!</p> <p>However, GraphQL has its own limitations. It&rsquo;s good for fetching raw data but it lacks built-in <a href='https://www.sqlite.org/lang_aggfunc.html' title=''>aggregation</a> &amp; advanced querying capabilities like <a href='https://www.sqlite.org/windowfunctions.html' title=''>windowing</a>. GraphQL is typically layered on top of an existing relational database that uses SQL. So why not just use SQL?</p> <p>Additionally, performing queries on your service means that you need to handle multiple tenants competing for compute resources. Managing these tenants involves rate limiting and query timeouts so that no one client consumes all the resources.</p> <p>Once you push a read-only copy of the database to clients, these restrictions aren&rsquo;t a concern anymore. A tenant can use 100% of its CPU for hours if it wants to. 
It won&rsquo;t adversely affect any other tenant because the query is running on its own hardware.</p> <h2 id='so-whats-the-downside' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#so-whats-the-downside' aria-label='Anchor'></a><span class='plain-code'>So what&rsquo;s the downside?</span></h2> <p>There are always trade-offs with any technology, and shipping read-only replicas is no different. One obvious limitation of read-only replicas is that they&rsquo;re read-only. If your clients need to update data, they&rsquo;ll still need an API for those mutations.</p> <p>A less obvious downside is that the contract for a database can be less strict than that of an API. One benefit of an API layer is that you can change the underlying database structure but still massage data to look the same to clients. When you&rsquo;re shipping the raw database, that becomes more difficult. Fortunately, many database changes, such as adding columns to a table, are backwards compatible so clients don&rsquo;t need to change their code. Database views are also a great way to reshape data so it stays consistent, even when the underlying tables change.</p> <p>Finally, shipping a database limits your ability to restrict access to data. If you have a multi-tenant database, you can&rsquo;t ship that database without the client seeing all the data. One workaround for this is to use a database per tenant. SQLite databases are lightweight since they are just files on disk. 
This also has the added benefit of preventing queries in your application from accidentally fetching data across tenants.</p> <h2 id='where-do-we-take-this-next' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#where-do-we-take-this-next' aria-label='Anchor'></a><span class='plain-code'>Where do we take this next?</span></h2> <p>While this approach has worked well for some internal tooling, how does this look in the broader world of software? APIs are likely to stick around for the foreseeable future, so providing read-only database replicas makes sense for specific use cases where those APIs aren&rsquo;t a great fit.</p> <p>Imagine being able to query all your Stripe data or your GitHub data from a local database. You could join that data on to your own dataset and perform fast queries on your own hardware.</p> <p>While companies such as Stripe or GitHub likely colocate their tenant data into one database, many companies run an event bus using tools like Kafka, which could allow them to generate per-tenant SQLite databases to then stream to customers.</p> <p>Pushing queries out to the end user has huge benefits for both the data provider &amp; the data consumer in terms of flexibility and power.</p></content> </entry> <entry> <title>Automated Sentry Error Tracking</title> <link rel="alternate" href="https://fly.io/blog/sentry-partnership/"/> <id>https://fly.io/blog/sentry-partnership/</id> <published>2023-09-12T00:00:00+00:00</published> <updated>2023-11-21T21:08:37+00:00</updated> <media:thumbnail url="https://fly.io/blog/sentry-partnership/assets/sentry-thumb.webp"/> <content type="html"><div class="lead"><p>We’re Fly.io. 
We put your code into lightweight microVMs on our own hardware <a href="https://fly.io/docs/reference/regions/" title="">around the world</a>, close to your users. We partnered with <a href="https://sentry.io" title="">Sentry</a> to bring error and performance monitoring to your apps. Deploy your first app, and automatically get a year’s worth of credits for Sentry’s <a href="https://sentry.io/pricing/" title="">Team Plan</a>. <a href="https://fly.io/docs/speedrun/" title="">Check us out</a>: your app can be deployed and instrumented in minutes.</p> </div> <p>We&rsquo;ve been using Sentry since the dawn of the internet. Or at least as far back as the <a href='https://home.cern/science/physics/higgs-boson/how' title=''>discovery</a> of the Higgs boson. Project to project, the familiar Sentry issue detail screen has been our faithful debugging companion.</p> <p>Today is no exception: all of our Golang, Elixir, Ruby and Rust services report dutifully to Sentry.</p> <p>So, it felt natural to integrate Sentry as the default error monitoring tool. All new deployments on Fly.io get a Sentry project provisioned automatically. Existing apps can grab theirs with <code>flyctl ext sentry create</code>.</p> <p>Each Fly.io organization receives, for one year, a generous monthly quota:</p> <ul> <li>50,000 Error events </li><li>100,000 Performance units </li><li>500 Session Replays </li><li>1GB of storage for Attachments </li></ul> <p>Once your app is instrumented, you’ll automatically get notified of production errors, latency issues, and crashes as soon as they occur. 
Sentry’s Team plan also gives you access to over 40 integrations, unlimited seats, and custom alerting.</p> <h2 id='auto-instrumenting-rails' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#auto-instrumenting-rails' aria-label='Anchor'></a><span class='plain-code'>Auto-instrumenting Rails</span></h2> <p>To see Sentry in action, let&rsquo;s launch our <a href='https://github.com/fly-apps/boomer' title=''>Boomer Rails App</a>. Yes kids, Rails is old school, and it&rsquo;s the easiest framework to auto-instrument.</p> <p>When <code>flyctl launch</code> detects a Rails app, the app is automatically set up to use a freshly minted Sentry project. Gems are installed, initializers planted, and finally, the <code>SENTRY_DSN</code> secret is set for deployment. We redacted some output for brevity.</p> <div class="highlight-wrapper group relative cmd"> <button type="button" class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-wrap-target="#code-onfm6lp2" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"><g buffered-rendering="static"><path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /><path d="M11.081 6.466L9.533 8.037l1.548 1.571" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"> Wrap text </span> </button> <button type="button" class="bubble-wrap z-20 absolute right-1.5 top-1.5 
text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-copy-target="sibling" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"><g buffered-rendering="static"><path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /><path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"> Copy to clipboard </span> </button> <div class='highlight relative group'> <pre class='highlight cmd'><code id="code-onfm6lp2">fly deploy </code></pre> </div> </div><div class="highlight-wrapper group relative "> <button type="button" class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-wrap-target="#code-93jth4av" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"><g buffered-rendering="static"><path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /><path d="M11.081 6.466L9.533 8.037l1.548 1.571" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"> Wrap text </span> </button> <button type="button" class="bubble-wrap z-20 absolute 
right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-copy-target="sibling" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"><g buffered-rendering="static"><path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /><path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"> Copy to clipboard </span> </button> <div class='highlight relative group'> <pre class='highlight '><code id="code-93jth4av">==&gt; Verifying app config ... Your Sentry project is ready. See details and next steps with: flyctl apps errors Setting the following secrets on boomerang: SENTRY_DSN ... Visit your newly deployed app at https://boomerang.fly.dev/ </code></pre> </div> </div> <p>Now, having Sentry configured at launch time means that deployment errors are captured early. This is useful for situations where apps fail to boot, run out of memory, and so on.</p> <p>Now let&rsquo;s force an application exception. 
We visit the app root, which goes Boom, thanks to some hastily written Ruby code.</p> <div class="highlight-wrapper group relative cmd"> <button type="button" class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-wrap-target="#code-suswx77f" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"><g buffered-rendering="static"><path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /><path d="M11.081 6.466L9.533 8.037l1.548 1.571" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"> Wrap text </span> </button> <button type="button" class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-copy-target="sibling" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"><g buffered-rendering="static"><path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /><path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"> Copy to clipboard </span> </button> <div class='highlight relative 
group'> <pre class='highlight cmd'><code id="code-suswx77f">flyctl open </code></pre> </div> </div> <p><img src="/blog/sentry-partnership/assets/boom-cover.webp?card&amp;center" /></p> <p>Oh shucks. Something went wrong. But, I got an email about this error.</p> <p><img src="/blog/sentry-partnership/assets/email-cover.webp?card&amp;center" /></p> <p>We could click &ldquo;View on Sentry&rdquo;. Instead, let&rsquo;s use <code>flyctl</code> to send us to the Sentry issues dashboard.</p> <div class="highlight-wrapper group relative cmd"> <button type="button" class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-wrap-target="#code-3seig9v4" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"><g buffered-rendering="static"><path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /><path d="M11.081 6.466L9.533 8.037l1.548 1.571" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"> Wrap text </span> </button> <button type="button" class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-copy-target="sibling" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"><g buffered-rendering="static"><path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 
0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /><path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"> Copy to clipboard </span> </button> <div class='highlight relative group'> <pre class='highlight cmd'><code id="code-3seig9v4">flyctl apps errors </code></pre> </div> </div> <p>We click through to this specific issue.</p> <p><img src="/blog/sentry-partnership/assets/dash.webp?card&amp;center" /></p> <p>We successfully debugged our issue. The takeaway: don&rsquo;t raise when you can call.</p> <p>Error tracking on Sentry is just scratching the surface. Check out their <a href='https://docs.sentry.io/product/performance/' title=''>performance monitoring</a>, <a href='https://docs.sentry.io/product/session-replay' title=''>session replay</a>, <a href='https://docs.sentry.io/product/alerts/' title=''>alerting</a> and <a href='https://docs.sentry.io/product/' title=''>much more</a>.</p> <h2 id='next-steps-for-fly-io-and-sentry' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#next-steps-for-fly-io-and-sentry' aria-label='Anchor'></a><span class='plain-code'>Next Steps for Fly.io and Sentry</span></h2> <p>For our next trick, we&rsquo;ll be tracking Fly.io releases in Sentry, so Sentry can link issues to their <a href='https://docs.sentry.io/product/releases/' title=''>release tracking</a> feature. We&rsquo;ll also send events like <a href='https://fly.io/docs/getting-started/troubleshooting/#out-of-memory-oom-or-high-cpu-usage' title=''>out-of-memory errors</a> to Sentry. The possibilities are endless.</p> <p>Got ideas or comments? 
Get in touch on our <a href='https://community.fly.io/' title=''>community forum</a>.</p></content> </entry> <entry> <title>Tracking Application-Level Consistency with LiteFS</title> <link rel="alternate" href="https://fly.io/blog/tracking-consistency-with-litefs/"/> <id>https://fly.io/blog/tracking-consistency-with-litefs/</id> <published>2023-08-30T00:00:00+00:00</published> <updated>2023-11-21T21:08:37+00:00</updated> <media:thumbnail url="https://fly.io/blog/tracking-consistency-with-litefs/assets/tracking-consistency-thumb.webp"/> <content type="html"><div class="lead"><p>With Fly.io, <a href="https://fly.io/docs/speedrun/" title="">you can get your app running globally in a matter of minutes</a>, and with LiteFS, you can run SQLite alongside your app! Now we’re introducing LiteFS Cloud: managed backups and point-in-time restores for LiteFS. <a href="https://fly.io/docs/litefs/speedrun/" title="">Try it out for yourself</a>!</p> </div> <p>When we started the <a href='https://fly.io/docs/litefs/' title=''>LiteFS</a> project a year ago, we started more with an ideal in mind rather than a specific implementation. We wanted to make it possible to not only run distributed SQLite but we also wanted to make it… <em>gasp</em>… easy!</p> <p>There were hurdles that we expected to be hard, such as intercepting SQLite transaction boundaries via syscalls or shipping logs around the world while ensuring data integrity. But there was one hurdle that was unexpectedly hard: maintaining a consistent view from the application&rsquo;s perspective.</p> <p>LiteFS requires write transactions to only be performed at the primary node and then those transactions are shipped back to replicas instantaneously. Well, almost instantaneously. And therein lies the crux of our problem.</p> <p>Let&rsquo;s say your user sends a write request to write to the primary node in Madrid and the user&rsquo;s next read request goes to a local read-only replica in Rio de Janeiro. 
Most of the time LiteFS completes replication quickly and everything is fine. But if your request arrives a few milliseconds before data is replicated, then your user sees the database state from before the write occurred. That&rsquo;s no good.</p> <p>How exactly do we handle that when our database lives outside the user&rsquo;s application?</p> <h2 id='our-initial-series-of-failures-or-how-we-tried-to-teach-distributed-systems-to-users' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#our-initial-series-of-failures-or-how-we-tried-to-teach-distributed-systems-to-users' aria-label='Anchor'></a><span class='plain-code'>Our initial series of failures, or how we tried to teach distributed systems to users</span></h2> <p>Our first plan was to let LiteFS users manage consistency themselves. Every application may have different needs and, honestly, we didn&rsquo;t have a better plan at the time. However, once we started explaining how to track replication state, it became obvious that it was going to be an untenable approach. Let&rsquo;s start with a primer and you&rsquo;ll understand why.</p> <p>Every node in LiteFS maintains a <em>replication position</em> for each database which consists of two values:</p> <ul> <li>Transaction ID (TXID): An identifier that monotonically increases with every successful write transaction. </li><li>Post-Apply Checksum: A checksum of the entire database after the transaction has been written to disk. 
</li></ul> <p>You can read the current position from your LiteFS mount from the <code>-pos</code> file:</p> <div class="highlight-wrapper group relative "> <button type="button" class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-wrap-target="#code-wv6ha7bx" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"><g buffered-rendering="static"><path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /><path d="M11.081 6.466L9.533 8.037l1.548 1.571" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"> Wrap text </span> </button> <button type="button" class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-copy-target="sibling" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"><g buffered-rendering="static"><path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /><path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"> Copy to clipboard </span> </button> <div 
class='highlight relative group'> <pre class='highlight '><code id="code-wv6ha7bx">$ cat /litefs/my.db-pos
000000000042478b/8b73bc1d07d84988
</code></pre> </div> </div> <p>This example shows that we are at TXID <code>0x42478b</code> (or 4,343,691 in decimal) and the checksum of our whole database after the transaction is <code>8b73bc1d07d84988</code>. A replica can detect how far it&rsquo;s lagging behind by comparing its position to the primary&rsquo;s position. Typically, a monotonic transaction ID on its own doesn&rsquo;t work in asynchronous replication systems like LiteFS, but coupling it with a checksum lets us check for divergence, so the pair works surprisingly well.</p> <p>LiteFS handles the replication position internally. However, it would have been up to the application to check it to ensure that its clients saw a consistent view. This meant that the application would have needed its clients to track the TXID from their last write to the primary, and then the application would have had to wait until its local replica caught up to that position before it could serve the request.</p> <p>That would have been a lot to manage. While you may find the nuts and bolts of replication interesting, sometimes you just want to get your app up and running!</p> <h2 id='lets-use-a-library-er-libraries' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#lets-use-a-library-er-libraries' aria-label='Anchor'></a><span class='plain-code'>Let&rsquo;s use a library! Er, libraries.</span></h2> <p>Teaching distributed systems to each and every LiteFS user was not going to work. So instead, we thought we could tuck that complexity away by providing a LiteFS client library. 
Just import a package and you&rsquo;re done!</p> <p>Libraries are a great way to abstract away the tough parts of a system. For example, nobody wants to roll their own cryptography implementation so they use a library. But LiteFS is a database so it needs to work across all languages which means we needed to implement a library for each language.</p> <p>Actually, it&rsquo;s worse than that. We need to act as a traffic cop to redirect incoming client requests to make sure they arrive at the primary node for writes or that they see a consistent view on a replica for reads. We aren&rsquo;t able to redirect writes at the data layer so it&rsquo;s typically handled at the HTTP layer. Within each language ecosystem there can be a variety of web server implementations: Ruby has Rails &amp; Sinatra, Go has net/http, gin, fasthttp, and whatever 12 new routers came out this week.</p> <h2 id='moving-up-the-abstraction-stack' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#moving-up-the-abstraction-stack' aria-label='Anchor'></a><span class='plain-code'>Moving up the abstraction stack</span></h2> <p>Abstraction often feels like a footgun. Generalizing functionality across multiple situations means that you lose flexibility in specific situations. Sometimes that means you shouldn&rsquo;t abstract but sometimes you just haven&rsquo;t found the right abstraction layer yet.</p> <p>For better or for worse, HTTP &amp; REST-like applications have become the norm in our industry and some of the conventions provide a great layer for LiteFS to build upon. 
Specifically, the convention of using <code>GET</code> requests for reading data and the other methods (<code>POST</code>, <code>PUT</code>, <code>DELETE</code>, etc) for writing data.</p> <p>Instead of developers injecting a LiteFS library into their application, we built a thin HTTP proxy that lives in front of the application.</p> <p><img alt="Wrapping the application with a proxy &amp; FUSE mount." src="https://slabstatic.com/prod/uploads/p1b436gf/posts/images/25yuWQlLKyLrkHBDFVcbU8to.png" /></p> <p>This approach has let us manage both the incoming client side via HTTP as well as the backend data plane via our FUSE mount. It lets us isolate the application developer from the low-level details of LiteFS replication while making it feel like they&rsquo;re developing against vanilla SQLite.</p> <h2 id='how-it-works' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#how-it-works' aria-label='Anchor'></a><span class='plain-code'>How it works</span></h2> <p>The LiteFS proxy design is simple but effective. As an example, let&rsquo;s start with a write request. A user creates a new order so they send a <code>POST /orders</code> request to your web app. The LiteFS proxy intercepts the request &amp; parses the HTTP headers to see that it&rsquo;s a <code>POST</code> write request. If the local node is a replica, the proxy forwards the request to the primary node.</p> <p>If the local node is the primary, it&rsquo;ll pass the request through to the application&rsquo;s web server and the request will be processed normally. When the response begins streaming out to the client, the proxy will attach a cookie with the TXID of the newly-written commit.</p> <p>When the client then sends a <code>GET</code> read request, the LiteFS proxy again intercepts it and parses the headers. 
It can see the TXID that was set in the cookie on the previous write and the proxy will check it against the replication position of the local replica. If replication has caught up to the client&rsquo;s last write transaction, it&rsquo;ll pass through the request to the application. Otherwise, it&rsquo;ll wait for the local node to catch up or it will eventually time out. The proxy is built into the <code>litefs</code> binary so communication with the internal replication state is wicked fast.</p> <h2 id='preventing-laggards' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#preventing-laggards' aria-label='Anchor'></a><span class='plain-code'>Preventing laggards</span></h2> <p>The proxy provides another benefit: health checks. Networks and servers don&rsquo;t always play nice when they&rsquo;re communicating across the world and sometimes they get disconnected. The proxy hooks into the LiteFS built-in heartbeat system to detect lag and it can report the node as unhealthy via a health check URL when this lag exceeds a threshold.</p> <p>If you&rsquo;re running on Fly.io, we&rsquo;ll take that node out of rotation when health checks begin reporting issues so users will automatically get routed to a different, healthy replica. 
When the replica reconnects to the primary, the health check will report as healthy and the node will rejoin.</p> <h2 id='the-tradeoffs-theres-always-tradeoffs' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#the-tradeoffs-theres-always-tradeoffs' aria-label='Anchor'></a><span class='plain-code'>The Tradeoffs… there&rsquo;s always tradeoffs!</span></h2> <p>Despite how well the LiteFS proxy works in most situations, there&rsquo;s gonna be times when it doesn&rsquo;t quite fit. For example, if your application cannot rely on cookies to track application state then the proxy won&rsquo;t work for you.</p> <p>There are also frameworks, like <a href='https://www.phoenixframework.org/' title=''>Phoenix</a>, which can rely heavily on websockets for live updates so this circumvents your traditional HTTP request/response approach that LiteFS proxy depends on. Finally, the proxy provides <a href='https://jepsen.io/consistency/models/read-your-writes' title=''>read-your-writes</a> guarantees which may not work for every application out there.</p> <p>In these cases, <a href='https://github.com/superfly/litefs/issues/new' title=''>let us know how we can improve the proxy</a> to make it work for more use cases! We&rsquo;d love to hear your thoughts.</p> <h2 id='diving-in-further' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#diving-in-further' aria-label='Anchor'></a><span class='plain-code'>Diving in further</span></h2> <p>The LiteFS proxy makes it easy to run SQLite applications in multiple regions around the world. 
You can even run many legacy applications with little to no change in the code.</p> <p>If you&rsquo;re interested in setting up LiteFS, check out our <a href='https://fly.io/docs/litefs/getting-started-fly/' title=''>Getting Started</a> guide. You can find additional details about configuring the proxy on our <a href='https://fly.io/docs/litefs/proxy/' title=''>Built-in HTTP Proxy</a> docs page.</p></content> </entry> <entry> <title>Multiple Logs for Resiliency</title> <link rel="alternate" href="https://fly.io/blog/redundant-logs/"/> <id>https://fly.io/blog/redundant-logs/</id> <published>2023-07-21T00:00:00+00:00</published> <updated>2023-11-21T21:08:37+00:00</updated> <media:thumbnail url="https://fly.io/blog/redundant-logs/assets/lergs-thumb.webp"/> <content type="html"><p>You&rsquo;ve done everything right. You are well aware of <a href='https://en.wikipedia.org/wiki/Murphy%27s_law' title=''>Murphy&rsquo;s Law</a>. You have multiple redundant machines. You&rsquo;ve set up a regular backup schedule for your database, and perhaps you are even using <a href='https://fly.io/blog/litefs-cloud/' title=''>LiteFS Cloud</a>. You <a href='https://fly.io/blog/shipping-logs/' title=''>ship your logs</a> to <a href='https://logtail.com/' title=''>LogTail</a> or perhaps some other <a href='https://github.com/superfly/fly-log-shipper#provider-configuration' title=''>provider</a> so you can do forensic analysis should anything go wrong&hellip;</p> <p>Then the unexpected happens. A major network outage causes your application to misbehave. What&rsquo;s worse is that your logs are missing crucial data from that period, perhaps because of the same network outage. 
Maybe this time you are lucky and you can find the data you need by using copies of your logs via <a href='https://fly.io/docs/flyctl/logs/' title=''>flyctl logs</a> or the monitoring tab on the <a href='https://fly.io/docs/flyctl/dashboard/' title=''>flyctl dashboard</a> before they disappear forever.</p> <p>So, what is going on here? Let&rsquo;s look at the steps. Your application writes logs to STDOUT. Fly.io will take that output and send it to <a href='https://nats.io/' title=''>NATS</a>. The <a href='https://github.com/superfly/fly-log-shipper' title=''>Log Shipper</a> will take that data and hand it to <a href='https://vector.dev/docs/about/what-is-vector/' title=''>Vector</a>. From there it is shipped to your third-party logging provider. That&rsquo;s a lot of moving parts.</p> <p>All that is great, but just as you have redundant machines in case of failures, you may want to have redundant logs in addition to the ones Fly.io and the Log Shipper provide. Below are two strategies for doing just that. You can use either or both, and best of all the logs you create will be in addition to your existing logs.</p> <h2 id='logging-to-multiple-places' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#logging-to-multiple-places' aria-label='Anchor'></a><span class='plain-code'>Logging to multiple places</span></h2> <p>The following approach is likely the most failsafe, but often the least convenient: having your primary application on each machine write to a separate log file in addition to standard out. This does mean that when you need this data you will have to fetch it from each machine, and it will likely be rather raw. 
But at least it will be there even in the face of network failures.</p> <p>For best results put these logs on a <a href='https://fly.io/docs/reference/volumes/' title=''>volume</a> so that they survive a restart, and be prepared to rotate logs as they grow in size so that they don&rsquo;t eventually fill up that volume.</p> <p>This approach is necessarily framework-specific, but most frameworks provide some way to do this. A Rails example:</p> <div class="highlight-wrapper group relative ruby"> <button type="button" class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-wrap-target="#code-2yaa45j3" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"><g buffered-rendering="static"><path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /><path d="M11.081 6.466L9.533 8.037l1.548 1.571" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"> Wrap text </span> </button> <button type="button" class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-copy-target="sibling" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"><g buffered-rendering="static"><path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 
1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /><path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"> Copy to clipboard </span> </button> <div class='highlight relative group'> <pre class='highlight '><code id="code-2yaa45j3"><span class="n">logger</span> <span class="o">=</span> <span class="no">ActiveSupport</span><span class="o">::</span><span class="no">Logger</span><span class="p">.</span><span class="nf">new</span><span class="p">(</span><span class="no">STDOUT</span><span class="p">)</span>
<span class="n">logger</span><span class="p">.</span><span class="nf">formatter</span> <span class="o">=</span> <span class="n">config</span><span class="p">.</span><span class="nf">log_formatter</span>
<span class="n">volume_logger</span> <span class="o">=</span> <span class="no">ActiveSupport</span><span class="o">::</span><span class="no">Logger</span><span class="p">.</span><span class="nf">new</span><span class="p">(</span><span class="s2">"/logs/production.log"</span><span class="p">,</span> <span class="mi">3</span><span class="p">)</span>
<span class="n">logger</span> <span class="o">=</span> <span class="n">logger</span><span class="p">.</span><span class="nf">extend</span> <span class="no">ActiveSupport</span><span class="o">::</span><span class="no">Logger</span><span class="p">.</span><span class="nf">broadcast</span><span class="p">(</span><span class="n">volume_logger</span><span class="p">)</span>
</code></pre> </div> </div> <p>You probably already have the first two lines in your <code>config/environments/production.rb</code> file. Adjust and add the last two lines. That&rsquo;s it! 
You now have redundant logs.</p> <p>See the <a href='https://docs.ruby-lang.org/en/master/Logger.html#class-Logger-label-Log+File+Rotation' title=''>Ruby docs for Logger</a> for details on how to handle log rotation.</p> <p>Some pointers for other frameworks:</p> <ul> <li><a href='https://dev.to/darnahsan/elixir-logging-to-multiple-files-using-metadatafilter-3896' title=''>Elixir</a> </li><li><a href='https://laravel.com/docs/10.x/logging' title=''>Laravel</a> </li><li><a href='https://docs.python.org/3/howto/logging-cookbook.html#multiple-handlers-and-formatters' title=''>Python</a> </li><li><a href='https://github.com/winstonjs/winston#multiple-transports-of-the-same-type' title=''>Winston</a> for Node applications </li></ul> <h2 id='custom-log-shipper' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#custom-log-shipper' aria-label='Anchor'></a><span class='plain-code'>Custom log shipper</span></h2> <p>This approach is less bulletproof but may give you more immediately usable results. Instead of using Log Shipper, Vector, and a third party, it is easy to subscribe directly to NATS and process log entries yourself.</p> <p>What you are going to want is a separate app running on a separate machine so that it doesn&rsquo;t go down when there are problems with the machine you are monitoring, or even during the times when you are deploying a new version. 
If the code you write will be writing to disk, you will want a volume.</p> <p>Also like with log shipper, you will want to set the following secret:</p> <div class="highlight-wrapper group relative shell"> <button type="button" class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-wrap-target="#code-zdu3b55g" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"><g buffered-rendering="static"><path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /><path d="M11.081 6.466L9.533 8.037l1.548 1.571" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"> Wrap text </span> </button> <button type="button" class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-copy-target="sibling" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"><g buffered-rendering="static"><path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /><path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] 
text-navy-950"> Copy to clipboard </span> </button> <div class='highlight relative group'> <pre class='highlight '><code id="code-zdu3b55g">fly secrets <span class="nb">set </span><span class="nv">FLY_AUTH_TOKEN</span><span class="o">=</span><span class="si">$(</span>fly auth token<span class="si">)</span> </code></pre> </div> </div> <p>Here&rsquo;s a minimal JavaScript example that can be run using Node or Bun:</p> <div class="highlight-wrapper group relative javascript"> <button type="button" class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-wrap-target="#code-fxjs7ls8" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"><g buffered-rendering="static"><path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /><path d="M11.081 6.466L9.533 8.037l1.548 1.571" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"> Wrap text </span> </button> <button type="button" class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-copy-target="sibling" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"><g buffered-rendering="static"><path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 
1.815-.82 1.815-1.815V7.239z" /><path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"> Copy to clipboard </span> </button> <div class='highlight relative group'> <pre class='highlight '><code id="code-fxjs7ls8"><span class="k">import</span> <span class="p">{</span> <span class="nx">connect</span><span class="p">,</span> <span class="nx">StringCodec</span> <span class="p">}</span> <span class="k">from</span> <span class="dl">"</span><span class="s2">nats</span><span class="dl">"</span><span class="p">;</span>
<span class="k">import</span> <span class="nx">fs</span> <span class="k">from</span> <span class="dl">'</span><span class="s1">node:fs</span><span class="dl">'</span><span class="p">;</span>

<span class="c1">// tailor these two constants for your needs</span>
<span class="kd">const</span> <span class="nx">LOG_FILE</span> <span class="o">=</span> <span class="dl">"</span><span class="s2">/log/production.log</span><span class="dl">"</span><span class="p">;</span>
<span class="kd">const</span> <span class="nx">ORGANIZATION</span> <span class="o">=</span> <span class="dl">"</span><span class="s2">your-organization-name</span><span class="dl">"</span><span class="p">;</span>

<span class="c1">// create a connection to a nats-server, authenticating with</span>
<span class="c1">// the FLY_AUTH_TOKEN secret set above</span>
<span class="kd">const</span> <span class="nx">nc</span> <span class="o">=</span> <span class="k">await</span> <span class="nx">connect</span><span class="p">({</span>
  <span class="na">servers</span><span class="p">:</span> <span class="dl">"</span><span class="s2">[fdaa::3]:4223</span><span class="dl">"</span><span class="p">,</span>
  <span class="na">user</span><span class="p">:</span> <span class="nx">ORGANIZATION</span><span class="p">,</span>
  <span class="na">pass</span><span class="p">:</span> <span class="nx">process</span><span class="p">.</span><span class="nx">env</span><span class="p">.</span><span class="nx">FLY_AUTH_TOKEN</span>
<span class="p">});</span>

<span class="c1">// open log file for appending</span>
<span class="kd">const</span> <span class="nx">file</span> <span class="o">=</span> <span class="nx">fs</span><span class="p">.</span><span class="nx">openSync</span><span class="p">(</span><span class="nx">LOG_FILE</span><span class="p">,</span> <span class="dl">'</span><span class="s1">a+</span><span class="dl">'</span><span class="p">);</span>

<span class="c1">// create a codec</span>
<span class="kd">const</span> <span class="nx">sc</span> <span class="o">=</span> <span class="nx">StringCodec</span><span class="p">();</span>

<span class="c1">// create a simple subscriber and iterate over messages</span>
<span class="c1">// matching the subscription</span>
<span class="kd">const</span> <span class="nx">sub</span> <span class="o">=</span> <span class="nx">nc</span><span class="p">.</span><span class="nx">subscribe</span><span class="p">(</span><span class="dl">"</span><span class="s2">logs.&gt;</span><span class="dl">"</span><span class="p">);</span>
<span class="k">for</span> <span class="k">await</span> <span class="p">(</span><span class="kd">const</span> <span class="nx">msg</span> <span class="k">of</span> <span class="nx">sub</span><span class="p">)</span> <span class="p">{</span>
  <span class="kd">const</span> <span class="nx">data</span> <span class="o">=</span> <span class="nx">JSON</span><span class="p">.</span><span class="nx">parse</span><span class="p">(</span><span class="nx">sc</span><span class="p">.</span><span class="nx">decode</span><span class="p">(</span><span class="nx">msg</span><span class="p">.</span><span class="nx">data</span><span class="p">));</span>

  <span class="c1">// build log file entry</span>
  <span class="kd">const</span> <span class="nx">log</span> <span class="o">=</span> <span class="p">[</span>
    <span class="nx">data</span><span class="p">.</span><span class="nx">timestamp</span><span class="p">.</span><span class="nx">padEnd</span><span class="p">(</span><span class="mi">30</span><span class="p">),</span>
    <span class="s2">`[</span><span class="p">${</span><span class="nx">data</span><span class="p">.</span><span class="nx">fly</span><span class="p">.</span><span class="nx">app</span><span class="p">.</span><span class="nx">instance</span><span class="p">}</span><span class="s2">]`</span><span class="p">,</span>
    <span class="nx">data</span><span class="p">.</span><span class="nx">fly</span><span class="p">.</span><span class="nx">region</span><span class="p">,</span>
    <span class="s2">`[</span><span class="p">${</span><span class="nx">data</span><span class="p">.</span><span class="nx">log</span><span class="p">.</span><span class="nx">level</span><span class="p">}</span><span class="s2">]`</span><span class="p">,</span>
    <span class="nx">data</span><span class="p">.</span><span class="nx">message</span>
  <span class="p">].</span><span class="nx">join</span><span class="p">(</span><span class="dl">'</span><span class="s1"> </span><span class="dl">'</span><span class="p">)</span> <span class="o">+</span> <span class="dl">"</span><span class="se">\n</span><span class="dl">"</span><span class="p">;</span>

  <span class="c1">// write entry to disk</span>
  <span class="nx">fs</span><span class="p">.</span><span class="nx">write</span><span class="p">(</span><span class="nx">file</span><span class="p">,</span> <span class="nx">log</span><span class="p">,</span> <span class="nx">error</span> <span class="o">=&gt;</span> <span class="p">{</span>
    <span class="k">if</span> <span class="p">(</span><span class="nx">error</span><span class="p">)</span> <span class="nx">console</span><span class="p">.</span><span class="nx">error</span><span class="p">(</span><span class="nx">error</span><span class="p">);</span>
  <span class="p">});</span>
<span class="p">}</span>
</code></pre> </div> </div> <p>The above is pretty straightforward. 
It connects to NATS, opens a file, subscribes to logs, parses each message, and writes out selected data to disk. This example is in JavaScript, but feel free to reimplement this basic approach using your favorite language, as NATS supports <a href='https://docs.nats.io/using-nats/developer' title=''>plenty</a>.</p> <p>Things to watch out for: you don&rsquo;t want recursive errors when exceptions occur during a write. You want to capture errors and reconnect to NATS when the connection goes down. You may even want to filter messages. A more complete example implementing a number of these features can be found <a href='https://github.com/rubys/showcase/blob/main/fly/applications/logger/logfiler.ts' title=''>here</a>.</p> <h2 id='conclusion' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#conclusion' aria-label='Anchor'></a><span class='plain-code'>Conclusion</span></h2> <p>Log failures are not common, and perhaps the redundant logs that fly.io already keeps will be sufficient for your needs. But it may be worth reviewing what your exposure is and how to mitigate that exposure should your logs fail at the worst possible time.</p> <p>Hopefully the approaches listed above give you ideas on how to ensure that you will always have the log data you need even in the most hostile environment conditions.</p></content> </entry> <entry> <title>Tokenized Tokens</title> <link rel="alternate" href="https://fly.io/blog/tokenized-tokens/"/> <id>https://fly.io/blog/tokenized-tokens/</id> <published>2023-07-12T00:00:00+00:00</published> <updated>2023-11-21T21:08:37+00:00</updated> <media:thumbnail url="https://fly.io/blog/tokenized-tokens/assets/ghosts.png"/> <content type="html"><div class="lead"><p>We’re Fly.io. We run apps for our users on hardware we host around the world. 
Building security for a platform like this is tricky, and that’s what the post is about. But you don’t have to read any of this to get an app running on here. See how to <a href="https://fly.io/docs/speedrun/" title="">speedrun getting an app running on Fly.io here</a>.</p> </div> <p>We built some little security thingies. We&rsquo;re open sourcing them, and hoping you like them as much as we do. In a nutshell: it&rsquo;s a proxy that injects secrets into arbitrary 3rd-party API calls. We could describe it more completely here, but that wouldn&rsquo;t be as fun as writing a big long essay about how the thingies came to be, so: buckle up.</p> <p>The problem we confront is as old as Rails itself. Our application started simple: some controllers, some models. The only secrets it stored were bcrypt password hashes. But not unlike a pet baby alligator, it grew up. Now it&rsquo;s become more unruly than we&rsquo;d planned.</p> <p>That&rsquo;s because frameworks like Rails make it easy to collect secrets: you just create another model for them, <a href='https://guides.rubyonrails.org/active_record_encryption.html' title=''>roll some kind of secret to encrypt them</a>, jam that secret into the deployment environment, and call it a day.</p> <p>And, at least in less sensitive applications, or even the early days of an app like ours, that can work!</p> <div class="callout"><p>For what it’s worth, and to the annoyance of some of our Heroku refugees, we’ve never stored customer app secrets this way; our Rails API can write customer secrets, but has never been able to read them. We’ll talk more about how this works in a sec.</p> </div> <p>But for us, not anymore. At the stage we&rsquo;re at, all secrets are hazmat. 
And Rails itself is the portion of our attack surface we&rsquo;re least confident about – the rest of it is either outside of our trust boundaries, or written in Rust and Go, strongly-typed memory-safe languages that are easy to reason about, and which have never accidentally treated YAML as an executable file format.</p> <p>So, a few months back, during an integration with a 3rd party API that relied on OAuth2 tokens, we drew a line: ⚡ <em>henceforth, hazmat shall only be removed from Rails, never added</em> ⚡. This is easier said than done, though: despite prominent &ldquo;this is not a place of honor&rdquo; signs all over the codebase, our Rails API is still where much of the action in our system takes place.</p> <h3 id='how-apps-use-secrets-3-different-approaches' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#how-apps-use-secrets-3-different-approaches' aria-label='Anchor'></a><span class='plain-code'>How Apps Use Secrets: 3 Different Approaches</span></h3> <p><img src="/blog/tokenized-tokens/assets/secrets-1.png?2/3&amp;card&amp;center" /></p> <p>We just gave you one way, probably the most common. Stick &lsquo;em in a model, encrypt them with an environment secret, and watch Dependabot religiously for vulnerabilities in transitively-added libraries you&rsquo;ve never heard of before.</p> <p><img src="/blog/tokenized-tokens/assets/secrets-2.png?2/3&amp;card&amp;center" /></p> <p>Here&rsquo;s a second way, probably the second-most popular: use a secrets management system, like <a href='https://aws.amazon.com/kms/' title=''>KMS</a> or <a href='https://www.hashicorp.com/products/vault' title=''>Vault</a>. 
These systems, which are great, keep secrets encrypted and allow access based on an intricate access control language, which is great.</p> <p>That&rsquo;s what we do for customer app secrets, like <code>DATABASE_URL</code> and <code>API_KEY</code>. We use <a href='https://www.hashicorp.com/products/vault' title=''>HashiCorp Vault</a> (for the time being). Our Rails API has an access token for Vault that allows it to set secrets, but not read any of them back, like a kind of diode. A game-over Rails vulnerability might allow an attacker to scramble secrets, but not to easily dump them.</p> <p>In the happiest cases with secrets, systems like Vault can keep secret bits from ever touching the application. Customer app secrets are a happy case: Rails never needs to read them, <a href='https://fly.io/blog/carving-the-scheduler-out-of-our-orchestrator/' title=''>just our orchestrator</a>, to inject them into VM environments. In other happy cases, Vault operates on the app&rsquo;s behalf: signing a time-limited request URL for AWS, or making a direct request to a known 3rd-party service. Vault calls these features &ldquo;<a href='https://developer.hashicorp.com/vault/docs/secrets' title=''>secret engines</a>&rdquo;, and when you can get away with using them, it&rsquo;s hard to do better.</p> <p>The catch is, sometimes you can&rsquo;t get away with them. For most 3rd parties, Vault has no idea how to interact with them. And most secrets are bearer tokens, not request signatures. The only way to use those kinds of secrets is to read them into app memory. If good code can read a secret from Vault, so can a YAML vulnerability.</p> <div class="callout"><p>Still: this is better than nothing: even if apps can read raw secrets, systems like Vault can provide an audit trail of which secrets were pulled when, and make it much easier to rotate secrets, which you’ll want to do with raw secrets to contain their blast radius. 
HashiCorp Vault is great, so is KMS, we recommend them unreservedly.</p> </div> <p><img src="/blog/tokenized-tokens/assets/secrets-3.png?2/3&amp;card&amp;center" /></p> <p>So that&rsquo;s why there&rsquo;s a third way to handle this problem, which is: decompose your application into services so that the parts that have to handle secrets are tiny and well-contained. The bulk of our domain-specific business code can chug along in Rails, and the parts that trade bearer tokens with 3rd parties can be built in a couple hundred lines of Go.</p> <p>This is a good approach, too. It&rsquo;s just cumbersome, because a big application ends up dealing with lots of different kinds of secrets, and making a trusted microservice for each of them is a drag. What you want is to notice some commonality in how 3rd party API secrets are used, and to come up with some possible way of exploiting that.</p> <p>We thought long and hard on this and came up with:</p> <h3 id='tokenizer-the-fabled-4th-way' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#tokenizer-the-fabled-4th-way' aria-label='Anchor'></a><span class='plain-code'>Tokenizer: The Fabled 4th Way</span></h3> <p><img src="/blog/tokenized-tokens/assets/secrets-4.png?2/3&amp;card&amp;center" /></p> <p>We developed a multipurpose secret-using service called the <code>Tokenizer</code>.</p> <p><code>Tokenizer</code> is a stateless HTTP proxy that holds the private key of a <a href='https://pkg.go.dev/golang.org/x/crypto/nacl/box' title=''>Curve25519 keypair.</a></p> <p>When we get a new 3rd party API secret, we encrypt it to <code>Tokenizer&#39;s</code> public key; we &ldquo;tokenize&rdquo; it. Our API server can handle the (encrypted) tokenized secret, but it can&rsquo;t read or use it directly. 
Only <code>Tokenizer</code> can.</p> <p>When it comes time to talk to the 3rd party API, Rails does so via <code>Tokenizer</code>. Here&rsquo;s how that works:</p> <ol> <li>The API request is proxied, as an ordinary HTTP 1.1 request, through <code>Tokenizer</code>. </li><li>The request carries one or more additional <code>Proxy-Tokenizer</code> headers. </li><li>Each <code>Proxy-Tokenizer</code> header carries an encrypted secret and instructions for <code>Tokenizer</code> to rewrite the request in some way, usually by injecting the decrypted plaintext into a header. </li></ol> <p>You can think of <code>Tokenizer</code> as a sort of Vault-style &ldquo;secret engine&rdquo; that happens to capture virtually everything an app needs secrets for. It can even use decrypted secrets to selectively HMAC parts of requests, for APIs that authenticate with signatures instead of bearer tokens.</p> <p>Check it out: <a href='https://github.com/superfly/tokenizer' title=''>it&rsquo;s not super complicated</a>.</p> <p>Now, our goal is to keep Rails from ever touching secret bits. But, hold on: a game-over Rails vulnerability would give attackers an easy way around <code>Tokenizer</code>: you&rsquo;d just proxy requests for a particular secret to a service you ran that collected the plaintext.</p> <p>To mitigate that, we built the obvious feature: you can lock requests for specific secrets down to a list of allowed hosts or host regexp patterns.</p> <p>We think this approach to handling secrets is pretty similar to how payment processors tokenize payment card information, hence the name. The advantages are straightforward:</p> <ul> <li>Secrets are exposed to a much smaller attack surface that doesn&rsquo;t include Rails. </li><li>Virtually every usage of secrets we&rsquo;re likely to run across is captured by HTTP proxying, without us needing to write per-service code. </li><li>The tokenizer is a tiny project that&rsquo;s easy to audit and reason about. 
</li><li>Every language we work in already has first-class support for running requests through a proxy (something we already do for <a href='https://github.com/stripe/smokescreen' title=''>SSRF protection</a>). </li></ul> <h3 id='ssokenizer-tokenizing-oauth-sso' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#ssokenizer-tokenizing-oauth-sso' aria-label='Anchor'></a><span class='plain-code'>SSOkenizer: Tokenizing OAuth SSO</span></h3> <p>When we created <code>Tokenizer</code>, we were motivated by the problem of OAuth2 tokens that other service providers gave us, for partnership features we build for mutual customers.</p> <p>We&rsquo;d also dearly like our customers to use OAuth2/OIDC to log into Fly.io itself; it&rsquo;s more secure for them, and gives them the full complement of Google MFA features, meaning we don&rsquo;t immediately have to implement the full complement of Google MFA features. Letting people log into Fly.io with a Google OAuth token means we have to keep track of people&rsquo;s OAuth tokens. That sounds like a job for the <code>Tokenizer</code>!</p> <p>But there&rsquo;s a catch: acquiring those OAuth tokens in the first place means doing the OAuth2 dance, which means that for a brief window of time, Rails is handling hazmat. 
We&rsquo;d like to close that window.</p> <p><img src="/blog/tokenized-tokens/assets/ssokenizer.png?2/3&amp;card&amp;center" /></p> <p>Enter the <code>SSOkenizer</code>.</p> <p>The job of the <code>SSOkenizer</code> is to perform the OAuth2 dance on behalf of Rails, and then use the output of that process (the OAuth2 bearer token yielded from the OAuth2 code flow, which you can <a href='https://github.com/superfly/ssokenizer#ssokenizer' title=''>see in its cursed majesty here</a>) to drive the <code>Tokenizer</code>.</p> <p>In other words, where we&rsquo;d otherwise explicitly encrypt secrets to be tokenized a-priori, the <code>SSOkenizer</code> does that on the fly, passing tokenized OAuth2 credentials back to Rails. Those… tokenized tokens can only be used through the <code>Tokenizer</code> proxy, which is the only component in our system with the private key that unseals them.</p> <p>We think this is a pretty neat trick. The <code>SSOkenizer</code> itself is tiny, even smaller than the <code>Tokenizer</code> (<a href='https://github.com/superfly/ssokenizer/' title=''>you can read it here</a>), and essentially stateless; in fact, pretty much everything in this system is minimally stateful, except Rails, which is great at being stateful. 
We even keep almost all of OAuth2 out of Rails and confined to Go code (where it&rsquo;s practically the hello-world of Go OAuth2 libraries).</p> <p>A nice side effect-slash-validation of this design: once we got it working for Google, it became a super easy project to get OAuth2 logins working for other providers.</p> <h3 id='feel-free-to-poach-this' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#feel-free-to-poach-this' aria-label='Anchor'></a><span class='plain-code'>Feel Free To Poach This</span></h3> <p>We&rsquo;re psyched for a bunch of reasons:</p> <ul> <li>We&rsquo;ve got a clear path to rolling out SSO logins. </li><li>We can do integrations with third-party services now without infecting Rails with more hazmat secrets. </li><li>We&rsquo;ve honored the rule of &ldquo;only removing hazmat from Rails, not adding it&rdquo;. </li><li>We&rsquo;ve also cleared a path to getting all the rest of the hazmat Rails has access to tokenized. </li></ul> <p>These are standalone tools with no real dependencies on Fly.io, so they&rsquo;re easy for us to open source. 
Which is what we did: if they sound useful to you, check out the <a href='https://github.com/superfly/tokenizer' title=''>tokenizer</a> and <a href='https://github.com/superfly/ssokenizer' title=''>ssokenizer</a> repositories for instructions on deploying and using these services yourself.</p></content> </entry> <entry> <title>Fly.io ❤️ Bun</title> <link rel="alternate" href="https://fly.io/blog/flydotio-heart-bun/"/> <id>https://fly.io/blog/flydotio-heart-bun/</id> <published>2023-07-11T00:00:00+00:00</published> <updated>2024-04-12T18:23:39+00:00</updated> <media:thumbnail url="https://fly.io/blog/flydotio-heart-bun/assets/flydotio-heart-bun-thumb.webp"/> <content type="html"><p><a href='https://lu.ma/cqk31rvl' title=''>Bun 1.0 comes out September 7th</a>. Fly.io is making preparations.</p> <p>Previously, we stated that <a href='https://fly.io/blog/flydotio-heart-js/' title=''>Fly.io ❤️ JS</a>, and we understandably started with Node.js. While that work is ongoing, it makes sense to start expanding to other runtimes.</p> <p>Bun is the obvious next choice given it <a href='https://bun.sh/docs/runtime/nodejs-apis' title=''>aims for complete Node.js API compatibility</a>.</p> <p>Starting with <a href='https://fly.io/docs/hands-on/install-flyctl/' title=''>flyctl</a> version 0.1.54 and <a href='https://www.npmjs.com/package/@flydotio/dockerfile' title=''>@flydotio/dockerfile</a> version 0.3.3, you can launch and deploy Bun applications using <code>fly launch</code> and <code>fly deploy</code>, provided:</p> <ul> <li>You&rsquo;ve installed Bun version 0.5.3 or later </li><li>You have a <code>package.json</code> that meets at least one of the following conditions: <ul> <li>It has a <code>start</code> entry in the <code>scripts</code> section. </li><li>It has a <code>module</code> entry and specifies <code>module</code> as the <code>type</code>. </li><li>It has a <code>main</code> entry. 
</li></ul> </li></ul> <p>Basically, if you can run <a href='https://bun.sh/docs/quickstart' title=''>Bun&rsquo;s Quickstart</a> and <a href='https://fly.io/docs/hands-on/' title=''>Fly&rsquo;s hands-on walk-through</a>, you have all you need to deploy your application on fly.io.</p> <p>We also have a <a href='https://github.com/fly-apps/bun/' title=''>sample</a> that you can deploy.</p> <p>Be forewarned that everything is beta at this point. Some issues we encountered while preparing this support:</p> <ul> <li><a href='https://github.com/oven-sh/bun/issues/3605' title=''><code>bun install</code> has no <code>--prune</code> option</a>. Our Dockerfiles use this to remove development dependencies after running <code>build</code>. Of course with bun you are less likely to need a build step as TS and JSX are built in. </li><li><a href='https://github.com/oven-sh/bun/issues/1579' title=''><code>throwIfNoEntry</code> is not supported in <code>fs.statSync</code></a>. <a href='https://github.com/fly-apps/node-demo' title=''><code>fly-apps/node-demo</code></a> uses that. </li><li>Programs that used <a href='https://nodejs.org/api/readline.html' title=''>readline</a> <a href='https://github.com/oven-sh/bun/issues/3604' title=''>never exit</a>. Switching to <a href='https://bun.sh/docs/api/globals' title=''>global</a>.<a href='https://developer.mozilla.org/en-US/docs/Web/API/Window/prompt' title=''>prompt</a> resolved this issue for <code>@flydotio/dockerfile</code>. </li></ul> <p>Undoubtedly there will be bugs in fly&rsquo;s dockerfile generator too. 
But as Node.js and Bun share the same generator, fixes that are made for either framework will generally benefit both.</p> <p>If you see a problem, <a href='https://community.fly.io/' title=''>start a discussion</a>, <a href='https://github.com/fly-apps/dockerfile-node' title=''>open an issue</a>, or <a href='https://github.com/fly-apps/dockerfile-node/pulls' title=''>create a pull request</a>.</p></content> </entry> <entry> <title>LiteFS Cloud: Distributed SQLite with Managed Backups</title> <link rel="alternate" href="https://fly.io/blog/litefs-cloud/"/> <id>https://fly.io/blog/litefs-cloud/</id> <published>2023-07-05T00:00:00+00:00</published> <updated>2023-11-21T21:08:37+00:00</updated> <media:thumbnail url="https://fly.io/blog/litefs-cloud/assets/litefs-cloud-thumb.webp"/> <content type="html"><div class="lead"><p>With Fly.io, <a href="https://fly.io/docs/speedrun/" title="">you can get your app running globally in a matter of minutes</a>, and with LiteFS, you can run SQLite alongside your app! Now we’re introducing LiteFS Cloud: managed backups and point-in-time restores for LiteFS—whether your app is running on Fly.io or anywhere else. <a href="https://fly.io/docs/litefs/speedrun/" title="">Try it out for yourself</a>!</p> </div> <p>We love <a href='https://fly.io/blog/all-in-on-sqlite-litestream/' title=''>SQLite in production</a>, and we&rsquo;re all about running apps close to users. That&rsquo;s why we created LiteFS: an open source distributed SQLite database that lives on the same filesystem as your application, and replicates data to all the nodes in your app cluster.</p> <p>With LiteFS, you get the simplicity, flexibility, and lightning-fast local reads of working with vanilla SQLite, but distributed (so it&rsquo;s close to your users)! It&rsquo;s especially great for read-heavy web applications. 
Learn more about LiteFS in the <a href='https://fly.io/docs/litefs/' title=''>LiteFS docs</a> and in <a href='https://fly.io/blog/introducing-litefs/' title=''>our blog post introducing LiteFS</a>.</p> <p>At Fly.io we&rsquo;ve been using LiteFS internally for a while now, and it&rsquo;s awesome!</p> <p>However, something is missing: disaster recovery. Because it&rsquo;s local to your app, you don&rsquo;t need to—indeed can&#39;t—pay someone to manage your LiteFS cluster, which means no managed backups. Until now, you&rsquo;ve had to <a href='https://fly.io/docs/litefs/backup/' title=''>build your own</a>: take regular snapshots, store them somewhere, figure out a retention policy, that sort of thing.</p> <p>This also means you can only restore from a point in time when you happen to have taken a snapshot, and you likely need to limit how frequently you snapshot for cost reasons. Wouldn&rsquo;t it be cool if you could have super-frequent reliable backups to restore from, without having to implement it yourself?</p> <p>Well, that&rsquo;s why we&rsquo;re launching, in preview, LiteFS Cloud: backups and restores for LiteFS, managed by Fly.io. 
It gives you painless and reliable backups, with the equivalent of a snapshot every five minutes (8,640 snapshots in a 30-day month!), whether your database is hosted with us, or anywhere else.</p> <h2 id='how-do-i-use-litefs-cloud' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#how-do-i-use-litefs-cloud' aria-label='Anchor'></a><span class='plain-code'>How do I use LiteFS Cloud?</span></h2> <p>There are a few steps to get started:</p> <ul> <li>Upgrade LiteFS to version 0.5.1 or greater </li><li>Create a LiteFS Cloud cluster in the Fly.io dashboard, <a href='https://fly.io/dashboard/personal/litefs' title=''>LiteFS Cloud section</a> </li><li>Make the LiteFS Cloud auth token available to your LiteFS </li></ul> <p><img alt="Screenshot of Fly.io dashboard, with a red arrow pointing to &quot;LiteFS Cloud&quot; in the left navbar, and another red arrow pointing to the &quot;Create&quot; button on the top right for creating a LiteFS Cloud cluster" src="/blog/litefs-cloud/assets/screenshot1.png" /></p> <p><a href='https://fly.io/docs/litefs/cloud-backups' title=''>There are some docs here</a>, but that’s literally it. Then your database will start automagically backing up, we’ll manage the backups for you, and you’ll be able to restore your database near instantaneously to any point in time in the last 30 days (with 5 minute granularity).</p> <p>I want to say that again because I think it’s just wild – you can restore your database to <em>any point in time, with 5 minute granularity</em>. <strong class='font-semibold text-navy-950'><em>Near instantaneously</em></strong>.</p> <p>Speaking of restores&mdash;you can do those in the dashboard too. You pick a date and time, and we’ll take the most recent snapshot before that timestamp and restore it. 
This will take a couple of seconds (or less).</p> <p><img alt="Screenshot of popup modal on Fly.io dashboard, with a date and time selector, and a text field with &quot;lfsc-test-runner/db&quot; typed in it, and a red button at the bottom with text &quot;I understand the consequences. Restore from this snapshot.&quot;" src="/blog/litefs-cloud/assets/screenshot2.png" /></p> <p>We&rsquo;ll introduce pricing in the coming months, but for now LiteFS Cloud is in preview and is free to use. Please go check it out, and let us know how it goes!</p> <h2 id='the-secret-sauce-ltx-amp-compactions' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#the-secret-sauce-ltx-amp-compactions' aria-label='Anchor'></a><span class='plain-code'>The secret sauce: LTX &amp; compactions</span></h2> <p>LiteFS is built on a simple file format called <a href='https://github.com/superfly/ltx' title=''>Lite Transaction File (LTX)</a> which is designed for fast, flexible replication and recovery in LiteFS itself and in LiteFS Cloud.</p> <p>But first, let&rsquo;s start off with what an LTX file represents: <em>a change set of database pages</em>.</p> <p>When you commit a write transaction in SQLite, it updates one or more fixed-sized blocks called pages. By default, these are 4KB in size. An LTX file is simply a sorted list of these changed pages. Whenever you perform a transaction in SQLite, LiteFS will build an LTX file for that transaction.</p> <p>The interesting part of LTX is that contiguous sets of LTX files can be merged together into one LTX file. 
This merge process is called <em>compaction</em>.</p> <p>For example, let&rsquo;s say you have 3 transactions in a row that update the following set of pages:</p> <ul> <li>LTX A: Pages 1, 5, 7 </li><li>LTX B: Pages 5, 6 </li><li>LTX C: Pages 5, 7 </li></ul> <p>With LTX compaction, you avoid the duplicate work that comes from overwriting the same pages one transaction at a time. Instead, one LTX file for transactions A through C contains the last version of each page, so the pages are stored and updated only once:</p> <p><img alt="Compacting three contiguous LTX files into a single LTX file." src="/blog/litefs-cloud/assets/single-level-compaction.png" /></p> <p>That, in a nutshell, is how a single-level compaction works.</p> <h2 id='its-ltx-all-the-way-down' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#its-ltx-all-the-way-down' aria-label='Anchor'></a><span class='plain-code'>It&rsquo;s LTX all the way down</span></h2> <p>Compactions let us take changes for a bunch of transactions and smoosh them down into a single, small file. That&rsquo;s cool and all, but how does that give us fast point-in-time restores? By the magic of multi-level compactions!</p> <p>Compaction levels are progressively larger time intervals over which we roll up transaction data. In the following illustration, you can see that the highest level (L3) starts with a full snapshot of the database. This occurs daily and it&rsquo;s our starting point during a restore.</p> <p>Next, we have an hourly compaction level called L2, so there will be an LTX file with page changes between midnight and 1am, and then another file for 1am to 2am, etc. Below that is L1, which holds 5-minute intervals of data.</p> <p><img alt="Compaction levels for snapshots (L3), hourly (L2), &amp; every five minutes (L1)." 
src="/blog/litefs-cloud/assets/multi-level-compaction.png" /></p> <p>When a restore is requested for a specific timestamp, we can determine a minimal set of LTX files to replay. For example, if we restored to January 10th at 8:15am we would grab the following files:</p> <ul> <li>Start with the snapshot for January 10th. </li><li>Fetch the eight hourly LTX files from midnight to 8am. </li><li>Fetch the three 5-minute interval LTX files from 8:00am to 8:15am. </li></ul> <p>Since LTX files are sorted by page number, we can perform a streaming merge of these twelve files and end up with the state of the database at the given timestamp.</p> <h2 id='department-of-redundancy-department' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#department-of-redundancy-department' aria-label='Anchor'></a><span class='plain-code'>Department of Redundancy Department</span></h2> <p>One of the primary goals of LiteFS is to be simple to use. However, that&rsquo;s not an easy goal for a distributed database when our industry is moving more and more towards highly dynamic and ephemeral infrastructure. Traditional consensus algorithms require stable membership and adjusting the member set can be complicated.</p> <p>With LiteFS, we chose to use async replication as the primary mode of operation. This has some trade-offs in durability guarantees but it makes the cluster much simpler to operate. LiteFS Cloud alleviates many of these trade-offs of async replication by writing data out to high-durability, high-availability object storage&mdash;for now, we&rsquo;re using S3.</p> <p>However, we don&rsquo;t write every individual LTX file to object storage immediately. The latency is too high and it&rsquo;s not cost effective when you write a lot of transactions. 
Instead, the LiteFS primary node will batch up its changes every second and send a single, compacted LTX file to LiteFS Cloud. Once there, LiteFS Cloud will batch these 1-second files together and flush them to storage periodically.</p> <p>We track the ID of the latest transaction that&rsquo;s been flushed, and we call this the &ldquo;high water mark&rdquo; or HWM. This transaction ID is propagated back down to the nodes of the LiteFS cluster so we can ensure that the transaction file is not removed from any node until it is safely persisted in object storage. With this approach, we have multiple layers of redundancy in case your LiteFS cluster can&rsquo;t communicate with LiteFS Cloud or if we can&rsquo;t communicate with S3.</p> <h2 id='whats-next-for-litefs-cloud' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#whats-next-for-litefs-cloud' aria-label='Anchor'></a><span class='plain-code'>What&rsquo;s next for LiteFS Cloud?</span></h2> <p>We have a small team dedicated to LiteFS Cloud, and we&rsquo;re chugging away at new exciting features! Right now, LiteFS Cloud is really just backups and restores, but we are working on a lot of other cool stuff:</p> <ul> <li>Upload your database in the Fly.io dashboard. This way you don&rsquo;t have to worry about figuring out how to initialize your database when you first deploy it, just upload the database in the dashboard and LiteFS will pull it from LiteFS Cloud. </li><li>Download a point-in-time snapshot of your database from the Fly.io dashboard. You can use this to spin up a local dev env (with production data), do some local analysis, etc. </li><li>Clone your LiteFS Cloud cluster to a new cluster, which you could use for a staging environment (or on-demand test environments for your CI pipelines) with real data. 
</li><li>Features to support apps that run on serverless platforms like Vercel, Google Cloud Run, Deno, and more. We&rsquo;ll need to develop a number of different features for this, stay tuned for more information in the coming weeks! </li></ul> <p>We&rsquo;re really excited about the future of LiteFS Cloud, so we wanted to share what we&rsquo;re thinking. We&rsquo;d also love to hear any feedback you have about these ideas that might inform our work.</p></content> </entry> </feed>
<?xml version="1.0" encoding="UTF-8"?> <feed xmlns="http://www.w3.org/2005/Atom" xmlns:media="http://search.yahoo.com/mrss/"> <title>The Fly Blog</title> <subtitle>News, tips, and tricks from the team at Fly</subtitle> <id>https://fly.io/blog/</id> <link href="https://fly.io/blog/"/> <link href="https://fly.io/blog/" rel="self"/> <updated>2025-03-27T00:00:00+00:00</updated> <author> <name>Fly</name> </author> <entry> <title>Operationalizing Macaroons</title> <link rel="alternate" href="https://fly.io/blog/operationalizing-macaroons/"/> <id>https://fly.io/blog/operationalizing-macaroons/</id> <published>2025-03-27T00:00:00+00:00</published> <updated>2025-04-01T19:05:33+00:00</updated> <media:thumbnail url="https://fly.io/blog/operationalizing-macaroons/assets/occult-circle.jpg"/> <content type="html"><div class="lead"><p>We’re Fly.io, a security bearer token company with a public cloud problem. You can read more about what our platform does (Docker container goes in, virtual machine in Singapore comes out), but this is just an engineering deep-dive into how we make our security tokens work. It’s a tokens nerd post.</p> </div><h2 id='1' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#1' aria-label='Anchor'></a><span class='plain-code'>1</span></h2> <p>We’ve spent <a href='https://fly.io/blog/api-tokens-a-tedious-survey/' title=''>too much time</a> talking about <a href='https://fly.io/blog/tokenized-tokens/' title=''>security tokens</a>, and about <a href='https://fly.io/blog/macaroons-escalated-quickly/' title=''>Macaroon tokens</a> <a href='https://github.com/superfly/macaroon/blob/main/macaroon-thought.md' title=''>in particular</a>. Writing another Macaroon treatise was not on my calendar. 
But we’re in the process of handing off our internal Macaroon project to a new internal owner, and in the process of truing up our operations manuals for these systems, I found myself in the position of writing a giant post about them. So, why not share?</p> <div class="callout"><p>Can I sum up Macaroons in a short paragraph? Macaroon tokens are bearer tokens (like JWTs) that use a cute chained-HMAC construction that allows an end-user to take any existing token they have and scope it down, all on their own. You can minimize your token before every API operation so that you’re only ever transmitting the least amount of privilege needed for what you’re actually doing, even if the token you were issued was an admin token. And they have a user-serviceable plug-in interface! <a href="https://fly.io/blog/macaroons-escalated-quickly/" title="">You’ll have to read the earlier post to learn more about that</a>.</p> </div><div class="right-sidenote"><p>Yes, probably, we are.</p> </div> <p>A couple years in to being the Internet’s largest user of Macaroons, I can report (as many predicted) that for our users, the cool things about Macaroons are a mixed bag in practice. It’s very neat that users can edit their own tokens, or even email them to partners without worrying too much. But users don’t really take advantage of token features.</p> <p>But I’m still happy we did this, because Macaroon quirks have given us a bunch of unexpected wins in our infrastructure. Our internal token system has turned out to be one of the nicer parts of our platform. Here’s why.</p> <p><img alt="This should clear everything up." 
src="/blog/operationalizing-macaroons/assets/schematic-diagram.png" /></p> <h2 id='2' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#2' aria-label='Anchor'></a><span class='plain-code'>2</span></h2> <p>As an operator, the most important thing to know about Macaroons is that they’re online-stateful; you need a database somewhere. A Macaroon token starts with a random field (a nonce) and the first thing you do when verifying a token is to look that nonce up in a database. So one of the most important details of a Macaroon implementation is where that database lives.</p> <p>I can tell you one place we’re not OK with it living: in our primary API cluster.</p> <p>There’s several reasons for that. Some of them are about scalability and reliability: far and away the most common failure mode of an outage on our platform is “deploys are broken”, and those failures are usually caused by API instability. It would not be OK if “deploys are broken” transitively meant “deployed apps can’t use security tokens”. But the biggest reason is security: root secrets for Macaroon tokens are hazmat, and a basic rule of thumb in secure design is: keep hazmat away from complicated code.</p> <p>So we created a deliberately simple system to manage token data. It’s called <code>tkdb</code>.</p> <div class="right-sidenote"><p>LiteFS: primary/replica distributed SQLite; Litestream: PITR SQLite replication to object storage; both work with unmodified SQLite libraries.</p> </div> <p><code>tkdb</code> is about 5000 lines of Go code that manages a SQLite database that is in turn managed by <a href='https://fly.io/docs/litefs/' title=''>LiteFS</a> and <a href='https://litestream.io/' title=''>Litestream</a>. 
It runs on isolated hardware (in the US, Europe, and Australia) and records in the database are encrypted with an injected secret. LiteFS gives us subsecond replication from our US primary to EU and AU, allows us to shift the primary to a different region, and gives us point-in-time recovery of the database.</p> <p>We’ve been running Macaroons for a couple years now, and the entire <code>tkdb</code> database is just a couple dozen megs large. Most of that data isn’t real. A full PITR recovery of the database takes just seconds. We use SQLite for a lot of our infrastructure, and this is one of the very few well-behaved databases we have.</p> <p>That’s in large part a consequence of the design of Macaroons. There’s actually not much for us to store! The most complicated possible Macaroon still chains up to a single root key (we generate a key per Fly.io “organization”; you don&rsquo;t share keys with your neighbors), and everything that complicates that Macaroon happens “offline”. We take advantage of &ldquo;attenuation&rdquo; far more than our users do.</p> <p>The result is that database writes are relatively rare and very simple: we just need to record an HMAC key when Fly.io organizations are created (that is, roughly, when people sign up for the service and actually do a deploy). That, and revocation lists (more on that later), which make up most of the data.</p> <h2 id='3' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#3' aria-label='Anchor'></a><span class='plain-code'>3</span></h2> <p>Talking to <code>tkdb</code> from the rest of our platform is complicated, for historical reasons.</p> <div class="right-sidenote"><p>NATS is fine, we just don’t really need it.</p> </div> <p>Ben Toews is responsible for most of the good things about this implementation. 
When he inherited the v0 Macaroons code from me, we were in the middle of a weird love affair with <a href='https://nats.io/' title=''>NATS</a>, the messaging system. So <code>tkdb</code> exported an RPC API over NATS messages.</p> <p>Our product security team can’t trust NATS (it’s not our code). That means a vulnerability in NATS can’t result in us losing control of all our tokens, or allow attackers to spoof authentication. Which in turn means you can’t run a plaintext RPC protocol for <code>tkdb</code> over NATS; attackers would just spoof “yes this token is fine” messages.</p> <div class="right-sidenote"><p>I highly recommend implementing Noise; <a href="http://www.noiseprotocol.org/noise.html" title="">the spec</a> is kind of a joy in a way you can’t appreciate until you use it, and it’s educational.</p> </div> <p>But you can’t just run TLS over NATS; NATS is a message bus, not a streaming secure channel. So I did the hipster thing and implemented <a href='http://www.noiseprotocol.org/noise.html' title=''>Noise</a>. We export a “verification” API, and a “signing” API for minting new tokens. Verification uses <code>Noise_IK</code> (which works like normal TLS) — anybody can verify, but everyone needs to prove they’re talking to the real <code>tkdb</code>. Signing uses <code>Noise_KK</code> (which works like mTLS) — only a few components in our system can mint tokens, and they get a special client key.</p> <p>A little over a year ago, <a href='https://fly.io/blog/the-exit-interview-jp/' title=''>JP</a> led an effort to replace NATS with HTTP, which is how you talk to <code>tkdb</code> today. Out of laziness, we kept the Noise stuff, which means the interface to <code>tkdb</code> is now HTTP/Noise. This is a design smell, but the security model is nice: across many thousands of machines, there are only a handful with the cryptographic material needed to mint a new Macaroon token. 
Neat!</p> <p><code>tkdb</code> is a Fly App (albeit deployed in special Fly-only isolated regions). Our infrastructure talks to it over “<a href='https://fly.io/docs/networking/flycast/' title=''>FlyCast</a>”, which is our internal Anycast service. If you’re in Singapore, you’ll probably get routed to the Australian <code>tkdb</code>. If Australia falls over, you’ll get routed to the closest backup. The proxy that implements FlyCast is smart, as is the <code>tkdb</code> client library, which will do exponential backoff retry transparently.</p> <p>Even with all that, we don’t like that Macaroon token verification is &ldquo;online&rdquo;. When you operate a global public cloud, one of the first things you learn is that <a href='https://fly.io/blog/the-5-hour-content-delivery-network/' title=''>the global Internet sucks</a>. Connectivity breaks all the time, and we’re paranoid about it. It’s painful for us that token verification can imply transoceanic links. Lovecraft was right about the oceans! Stay away!</p> <p>Our solution to this is caching. Macaroons, as it turns out, cache beautifully. That’s because once you’ve seen and verified a Macaroon, you have enough information to verify any more-specific Macaroon that descends from it; that’s a property of <a href='https://research.google/pubs/macaroons-cookies-with-contextual-caveats-for-decentralized-authorization-in-the-cloud/' title=''>their chaining HMAC construction</a>. Our client libraries cache verifications, and the cache hit ratio for verification is over 98%.</p> <h2 id='4' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#4' aria-label='Anchor'></a><span class='plain-code'>4</span></h2> <p><a href='http://cryto.net/~joepie91/blog/2016/06/13/stop-using-jwt-for-sessions/' title=''>Revocation isn’t a corner case</a>. 
It can’t be an afterthought. We’re potentially revoking tokens any time a user logs out. If that doesn’t work reliably, you wind up with “cosmetic logout”, which is a real vulnerability. When we kill a token, it needs to stay dead.</p> <p>Our revocation system is simple. It’s this table:</p> <div class="highlight-wrapper group relative "> <div class='highlight relative group'> <pre class='highlight '><code id="code-13jllwee">CREATE TABLE IF NOT EXISTS blacklist (
    nonce          BLOB NOT NULL UNIQUE,
    required_until DATETIME,
    created_at     DATETIME DEFAULT CURRENT_TIMESTAMP
);
</code></pre> </div> </div> <p>When we need a token to be dead, we have our primary API do a call to the <code>tkdb</code> “signing” RPC service for <code>revoke</code>. <code>revoke</code> takes the random nonce from the beginning of the Macaroon, discarding the rest, and adds it to the blacklist. Every Macaroon in the lineage of that nonce is now dead; we check the blacklist before verifying tokens.</p> <p>The obvious challenge here is caching; over 98% of our validation requests never hit <code>tkdb</code>. We certainly don’t want to propagate the blacklist database to 35 regions around the globe.</p> <p>Instead, the <code>tkdb</code> “verification” API exports an endpoint that provides a feed of revocation notifications. Our client library “subscribes” to this API (really, it just polls). Macaroons are revoked regularly (but not constantly), and when that happens, clients notice and prune their caches.</p> <p>If clients lose connectivity to <code>tkdb</code>, past some threshold interval, they just dump their entire cache, forcing verification to happen at <code>tkdb</code>.</p> <h2 id='5' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#5' aria-label='Anchor'></a><span class='plain-code'>5</span></h2> <p>A place where we’ve gotten a lot of internal mileage out of Macaroon features is service tokens. 
Service tokens are tokens used by code, rather than humans; almost always, a service token is something that is stored alongside running application code.</p> <p>An important detail of Fly.io’s Macaroons is the distinction between a “permissions” token and an “authentication” token. Macaroons by themselves express authorization, not authentication.</p> <p>That’s a useful constraint, and we want to honor it. By requiring a separate token for authentication, we minimize the impact of having the permissions token stolen; you can’t use it without authentication, so really it’s just like a mobile signed IAM policy expression. Neat!</p> <p>The way we express authentication is with a third-party caveat (<a href='https://fly.io/blog/macaroons-escalated-quickly/#4' title=''>see the old post for details</a>). Your main Fly.io Macaroon will have a caveat saying “this token is only valid if accompanied by the discharge token for a user in your organization from our authentication system”. Our authentication system does the login dance and issues those discharges.</p> <p>This is exactly what you want for user tokens and not at all what you want for a service token: we don’t want running code to store those authenticator tokens, because they’re hazardous.</p> <p>The solution we came up with for service tokens is simple: <code>tkdb</code> exports an API that uses its access to token secrets to strip off the third-party authentication caveat. To call into that API, you have to present a valid discharging authentication token; that is, you have to prove you could already have done whatever the token said. <code>tkdb</code> returns a new token with all the previous caveats, minus the expiration (you don’t usually want service tokens to expire).</p> <p>OK, so we’ve managed to transform a tuple <code>(unscary-token, scary-token)</code> into the new tuple <code>(scary-token)</code>. Not so impressive. 
But hold on: the recipient of <code>scary-token</code> can attenuate it further: we can lock it to a particular instance of <code>flyd</code>, or to a particular Fly Machine. Which means exfiltrating it doesn’t do you any good; to use it, you have to control the environment it’s intended to be used in.</p> <p>The net result of this scheme is that a compromised physical host will only give you access to tokens that have been used on that worker, which is a very nice property. Another way to look at it: every token used in production is traceable in some way to a valid token a user submitted. Neat!</p> <div class="right-sidenote"><p>All the cool spooky secret store names were taken.</p> </div> <p>We do a similar dance with Pet Semetary, our internal Vault replacement. Petsem manages user secrets for applications, such as Postgres connection strings. Petsem is its own Macaroon authority (it issues its own Macaroons with its own permissions system), and to do something with a secret, you need one of those Petsem-minted Macaroons.</p> <p>Our primary API servers field requests from users to set secrets for their apps. So the API has a Macaroon that allows secrets writes. But it doesn&rsquo;t allow reads: there’s no API call to dump your secrets, because our API servers don’t have that privilege. So far, so good.</p> <p>But when we boot up a Fly Machine, we need to inject the appropriate user secrets into it at boot; <em>something</em> needs a Macaroon that can read secrets. That “something” is <code>flyd</code>, our orchestrator, which runs on every worker server in our fleet.</p> <p>Clearly, we can’t give every <code>flyd</code> a Macaroon that reads every user’s secret. 
Most users will never deploy anything on any given worker, and we can’t have a security model that collapses down to “every worker is equally privileged”.</p> <p>Instead, the “read secret” Macaroon that <code>flyd</code> gets has a third-party caveat attached to it, which is dischargeable only by talking to <code>tkdb</code> and proving (with normal Macaroon tokens) that you have permissions for the org whose secrets you want to read. Once again, access is traceable to an end-user action, and minimized across our fleet. Neat!</p> <h2 id='6' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#6' aria-label='Anchor'></a><span class='plain-code'>6</span></h2> <p>Our token systems have some of the best telemetry in the whole platform.</p> <p>Most of that is down to <a href='http://opentelemetry.io/' title=''>OpenTelemetry</a> and <a href='https://www.honeycomb.io/' title=''>Honeycomb</a>. From the moment a request hits our API server through the moment <code>tkdb</code> responds to it, oTel <a href='https://opentelemetry.io/docs/concepts/context-propagation/' title=''>context propagation</a> gives us a single narrative about what’s happening.</p> <p><a href='https://fly.io/blog/the-exit-interview-jp/' title=''>I was a skeptic about oTel</a>. It’s really, really expensive. And, not to put too fine a point on it, oTel really cruds up our code. Once, I was an “80% of the value of tracing, we can get from logs and metrics” person. But I was wrong.</p> <p>Errors in our token system are rare. Usually, they’re just early indications of network instability, and between caching and FlyCast, we mostly don’t have to care about those alerts. When we do, it’s because something has gone so sideways that we’d have to care anyways. 
The <code>tkdb</code> code is remarkably stable and there hasn’t been an incident intervention with our token system in over a year.</p> <p>Past oTel, and the standard logging and Prometheus metrics every Fly App gets for free, we also have a complete audit trail for token operations, in a permanently retained OpenSearch cluster index. Since virtually all the operations that happen on our platform are mediated by Macaroons, this audit trail is itself pretty powerful.</p> <h2 id='7' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#7' aria-label='Anchor'></a><span class='plain-code'>7</span></h2> <p>So, that&rsquo;s pretty much it. The moral of the story for us is, Macaroons have a lot of neat features, our users mostly don&rsquo;t care about them — that may even be a good thing — but we get a lot of use out of them internally.</p> <p>As an engineering culture, we&rsquo;re allergic to &ldquo;microservices&rdquo;, and we flinched a bit at the prospect of adding a specific service just to manage tokens. But it&rsquo;s pulled its weight, and not added really any drama at all. We have at this point a second dedicated security service (Petsem), and even though they sort of rhyme with each other, we&rsquo;ve got no plans to merge them. <a href='https://how.complexsystems.fail/#10' title=''>Rule #10</a> and all that.</p> <p>Oh, and a total victory for LiteFS, Litestream, and infrastructure SQLite. Which, after managing an infrastructure SQLite project that routinely ballooned to tens of gigabytes and occasionally threatened service outages, is lovely to see.</p> <p>Macaroons! If you&rsquo;d asked us a year ago, we&rsquo;d have said the jury was still out on whether they were a good move. But then Ben Toews spent a year making them awesome, and so they are. 
<a href='https://github.com/superfly/macaroon' title=''>Most of the code is open source</a>!</p> </content> </entry> <entry> <title>Taming A Voracious Rust Proxy</title> <link rel="alternate" href="https://fly.io/blog/taming-rust-proxy/"/> <id>https://fly.io/blog/taming-rust-proxy/</id> <published>2025-02-26T00:00:00+00:00</published> <updated>2025-03-10T19:59:35+00:00</updated> <media:thumbnail url="https://fly.io/blog/taming-rust-proxy/assets/happy-crab-cover.jpg"/> <content type="html"><div class="lead"><p>Here’s a fun bug.</p> </div> <p>The basic idea of our service is that we run containers for our users, as hardware-isolated virtual machines (Fly Machines), on hardware we own around the world. What makes that interesting is that we also connect every Fly Machine to a global Anycast network. If your app is running in Hong Kong and Dallas, and a request for it arrives in Singapore, we&rsquo;ll route it to <code>HKG</code>.</p> <p>Our own hardware fleet is roughly divided into two kinds of servers: edges, which receive incoming requests from the Internet, and workers, which run Fly Machines. Edges exist almost solely to run a Rust program called <code>fly-proxy</code>, the router at the heart of our Anycast network.</p> <p>So: a week or so ago, we flag an incident. Lots of things generate incidents: synthetic monitoring failures, metric thresholds, health check failures. In this case two edge tripwires tripped: elevated <code>fly-proxy</code> HTTP errors, and skyrocketing CPU utilization, on a couple hosts in <code>IAD</code>.</p> <p>Our incident process is pretty ironed out at this point. We created an incident channel (we ❤️ <a href='https://rootly.com/' title=''>Rootly</a> for this, <a href='https://rootly.com/' title=''>seriously check out Rootly</a>, an infra MVP here for years now), and incident responders quickly concluded that, while something hinky was definitely going on, the platform was fine. 
We have a lot of edges, and we&rsquo;ve also recently converted many of our edge servers to significantly beefier hardware.</p> <p>Bouncing <code>fly-proxy</code> clears the problem up on an affected proxy. But this wouldn&rsquo;t be much of an interesting story if the problem didn&rsquo;t later come back. So, for some number of hours, we&rsquo;re in an annoying steady-state of getting paged and bouncing proxies. </p> <p>While this is happening, Pavel, on our proxy team, pulls a profile from an angry proxy. <img alt="A flamegraph profile, described better in the prose anyways." src="/blog/taming-rust-proxy/assets/proxy-profile.jpg" /> So, this is fuckin&rsquo; weird: a huge chunk of the profile is dominated by Rust <code>tracing</code>&rsquo;s <code>Subscriber</code>. But that doesn&rsquo;t make sense. The entire point of Rust <code>tracing</code>, which generates fine-grained span records for program activity, is that <code>entering</code> and <code>exiting</code> a span is very, very fast. </p> <p>If the mere act of <code>entering</code> a span in a Tokio stack is chewing up a significant amount of CPU, something has gone haywire: the actual code being traced must be doing next to nothing.</p> <h2 id='a-quick-refresher-on-async-rust' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#a-quick-refresher-on-async-rust' aria-label='Anchor'></a><span class='plain-code'>A Quick Refresher On Async Rust</span></h2> <p>So in Rust, like a lot of <code>async/await</code> languages, you&rsquo;ve got <code>Futures</code>. A <code>Future</code> is a type that represents the future value of an asynchronous computation, like reading from a socket. 
<code>Futures</code> are state machines, and they&rsquo;re lazy: they expose one basic operation, <code>poll</code>, which an executor (like Tokio) calls to advance the state machine. That <code>poll</code> returns whether the <code>Future</code> is still <code>Pending</code>, or <code>Ready</code> with a result.</p> <p>In theory, you could build an executor that drove a bunch of <code>Futures</code> just by storing them in a list and busy-polling each of them, round robin, until they return <code>Ready</code>. This executor would defeat much of the purpose of asynchronous programming, so no real executor works that way.</p> <p>Instead, a runtime like Tokio integrates <code>Futures</code> with an event loop (on <a href='https://man7.org/linux/man-pages/man7/epoll.7.html' title=''>epoll</a> or <a href='https://en.wikipedia.org/wiki/Kqueue' title=''>kqueue</a>) and, when calling <code>poll</code>, passes a <code>Waker</code>. The <code>Waker</code> is an abstract handle that allows the <code>Future</code> to instruct the Tokio runtime to call <code>poll</code> again, because something has happened.</p> <p>To complicate things: an ordinary <code>Future</code> is a one-shot value. Once it&rsquo;s <code>Ready</code>, it can&rsquo;t be <code>polled</code> anymore. But with network programming, that&rsquo;s usually not what you want: data usually arrives in streams, which you want to track and make progress on as you can. So async Rust provides <code>AsyncRead</code> and <code>AsyncWrite</code> traits, which build on <code>Futures</code>, and provide methods like <code>poll_read</code> that return <code>Ready</code> <em>every time</em> there&rsquo;s data ready. </p> <p>So far so good? OK. Now, there are two footguns in this design. 
</p> <p>The first footgun is that a <code>poll</code> of a <code>Future</code> that isn&rsquo;t <code>Ready</code> wastes cycles, and, if you have a bug in your code and that <code>Pending</code> poll happens to trip a <code>Waker</code>, you&rsquo;ll slip into an infinite loop. That&rsquo;s easy to see.</p> <p>The second and more insidious footgun is that an <code>AsyncRead</code> can <code>poll_read</code> to a <code>Ready</code> that doesn&rsquo;t actually progress its underlying state machine. Since the idea of <code>AsyncRead</code> is that you keep <code>poll_reading</code> until it stops being <code>Ready</code>, this too is an infinite loop.</p> <p>When we look at our profiles, what we see are samples that almost terminate in libc, but spend next to no time in the kernel doing actual I/O. The obvious explanation: we&rsquo;ve entered lots of <code>poll</code> functions, but they&rsquo;re doing almost nothing and returning immediately.</p> <h2 id='jaccuse' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#jaccuse' aria-label='Anchor'></a><span class='plain-code'>J&#39;accuse!</span></h2> <p>Wakeup issues are annoying to debug. 
But the flamegraph gives us the fully qualified type of the <code>Future</code> we&rsquo;re polling:</p> <div class="highlight-wrapper group relative rust"> <div 
class='highlight relative group'> <pre class='highlight '><code id="code-hfleqvh4"><span class="o">&amp;</span><span class="k">mut</span> <span class="nn">fp_io</span><span class="p">::</span><span class="nn">copy</span><span class="p">::</span><span class="n">Duplex</span><span class="o">&lt;&amp;</span><span class="k">mut</span> <span class="nn">fp_io</span><span class="p">::</span><span class="nn">reusable_reader</span><span class="p">::</span><span class="n">ReusableReader</span><span class="o">&lt;</span><span class="nn">fp_tcp</span><span class="p">::</span><span class="nn">peek</span><span class="p">::</span><span class="n">PeekableReader</span><span class="o">&lt;</span><span class="nn">tokio_rustls</span><span class="p">::</span><span class="nn">server</span><span class="p">::</span><span class="n">TlsStream</span><span class="o">&lt;</span><span class="nn">fp_tcp_metered</span><span class="p">::</span><span class="n">MeteredIo</span><span class="o">&lt;</span><span class="nn">fp_tcp</span><span class="p">::</span><span class="nn">peek</span><span class="p">::</span><span class="n">PeekableReader</span><span class="o">&lt;</span><span class="nn">fp_tcp</span><span class="p">::</span><span class="nn">permitted</span><span class="p">::</span><span class="n">PermittedTcpStream</span><span class="o">&gt;&gt;&gt;&gt;&gt;</span><span class="p">,</span> <span class="nn">connect</span><span class="p">::</span><span class="nn">conn</span><span class="p">::</span><span class="n">Conn</span><span class="o">&lt;</span><span class="nn">tokio</span><span class="p">::</span><span class="nn">net</span><span class="p">::</span><span class="nn">tcp</span><span class="p">::</span><span class="nn">stream</span><span class="p">::</span><span class="n">TcpStream</span><span class="o">&gt;</span> </code></pre> </div> </div> <p>This looks like a lot, but much of it is just wrapper types we wrote ourselves, and those wrappers don&rsquo;t do anything interesting. 
What&rsquo;s left to audit:</p> <ul> <li><code>Duplex</code>, the outermost type, one of ours, <em>and</em> </li><li><code>TlsStream</code>, from <a href='https://github.com/rustls/rustls' title=''>Rustls</a>. </li></ul> <p><code>Duplex</code> is a beast. It&rsquo;s the core I/O state machine for proxying between connections. It&rsquo;s not easy to reason about in specificity. But: it also doesn&rsquo;t do anything directly with a <code>Waker</code>; it&rsquo;s built around <code>AsyncRead</code> and <code>AsyncWrite</code>. It hasn&rsquo;t changed recently and we can&rsquo;t trigger misbehavior in it.</p> <p>That leaves <code>TlsStream</code>. <code>TlsStream</code> is an ultra-important, load-bearing type in the Rust ecosystem. Everybody uses it. Could it harbor an async Rust footgun? Turns out, it did!</p> <p>Unlike our <code>Duplex</code>, Rustls actually does have to get intimate with the underlying async executor. And, looking through the repository, Pavel uncovers <a href='https://github.com/rustls/tokio-rustls/issues/72' title=''>this issue</a>: sometimes, <code>TlsStreams</code> in Rustls just spin out. And it turns out, what&rsquo;s causing this is a TLS state machine bug: when a TLS session is orderly-closed, with a <code>CloseNotify</code> <code>Alert</code> record, the sender of that record has informed its counterparty that no further data will be sent. 
But if there&rsquo;s still buffered data on the underlying connection, <code>TlsStream</code> mishandles its <code>Waker</code>, and we fall into a busy-loop.</p> <p><a href='https://github.com/rustls/rustls/pull/1950/files' title=''>Pretty straightforward fix</a>!</p> <h2 id='what-actually-happened-to-us' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#what-actually-happened-to-us' aria-label='Anchor'></a><span class='plain-code'>What Actually Happened To Us</span></h2> <p>Our partners in object storage, <a href='https://www.tigrisdata.com/' title=''>Tigris Data</a>, were conducting some kind of load testing exercise. Some aspect of their testing system triggered the <code>TlsStream</code> state machine bug, which locked up one or more <code>TlsStreams</code> in the edge proxy handling whatever corner-casey stream they were sending.</p> <p>Tigris wasn&rsquo;t generating a whole lot of traffic; tens of thousands of connections, tops. But all of them sent small HTTP bodies and then terminated early. We figured some of those connections errored out, and set up the &ldquo;TLS CloseNotify happened before EOF&rdquo; scenario. </p> <p>To be truer to the chronology, we knew pretty early on in our investigation that something Tigris was doing with their load testing was probably triggering the bug, and we got them to stop. After we worked it out, and Pavel deployed the fix, we told them to resume testing. 
No spin-outs.</p> <h2 id='lessons-learned' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#lessons-learned' aria-label='Anchor'></a><span class='plain-code'>Lessons Learned</span></h2> <p>Keep your dependencies updated. Unless you shouldn&rsquo;t keep your dependencies updated. I mean, if there&rsquo;s a vulnerability (and, technically, this was a DoS vulnerability), always update. And if there&rsquo;s an important bugfix, update. But if there isn&rsquo;t an important bugfix, updating for the hell of it might also destabilize your project? So update maybe? Most of the time?</p> <p>Really, the truth of this is that keeping track of <em>what needs to be updated</em> is valuable work. The updates themselves are pretty fast and simple, but the process and testing infrastructure to confidently metabolize dependency updates is not. </p> <p>Our other lesson here is that there&rsquo;s an opportunity to spot these kinds of bugs more directly with our instrumentation. Spurious wakeups should be easy to spot, and triggering a metric when they happen should be cheap, because they&rsquo;re not supposed to happen often. So that&rsquo;s something we&rsquo;ll go do now.</p> </content> </entry> <entry> <title>We Were Wrong About GPUs</title> <link rel="alternate" href="https://fly.io/blog/wrong-about-gpu/"/> <id>https://fly.io/blog/wrong-about-gpu/</id> <published>2025-02-14T00:00:00+00:00</published> <updated>2025-02-17T10:54:41+00:00</updated> <media:thumbnail url="https://fly.io/blog/wrong-about-gpu/assets/choices-choices-cover.webp"/> <content type="html"><div class="lead"><p>We’re building a public cloud, on hardware we own. We raised money to do that, and to place some bets; one of them: GPU-enabling our customers. 
A progress report: GPUs aren’t going anywhere, but: GPUs aren’t going anywhere.</p> </div> <p>A couple years back, <a href="https://fly.io/gpu">we put a bunch of chips down</a> on the bet that people shipping apps to users on the Internet would want GPUs, so they could do AI/ML inference tasks. To make that happen, we created <a href="https://fly.io/docs/gpus/getting-started-gpus/">Fly GPU Machines</a>.</p> <p>A Fly Machine is a <a href="https://fly.io/blog/docker-without-docker/">Docker/OCI container</a> running inside a hardware-virtualized virtual machine somewhere on our global fleet of bare-metal worker servers. A GPU Machine is a Fly Machine with a hardware-mapped Nvidia GPU. It&rsquo;s a Fly Machine that can do fast CUDA.</p> <p>Like everybody else in our industry, we were right about the importance of AI/ML. If anything, we underestimated its importance. But the product we came up with probably doesn&rsquo;t fit the moment. It&rsquo;s a bet that doesn&rsquo;t feel like it&rsquo;s paying off.</p> <p><strong class='font-semibold text-navy-950'>If you&rsquo;re using Fly GPU Machines, don&rsquo;t freak out; we&rsquo;re not getting rid of them.</strong> But if you&rsquo;re waiting for us to do something bigger with them, a v2 of the product, you&rsquo;ll probably be waiting awhile.</p> <h3 id='what-it-took' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#what-it-took' aria-label='Anchor'></a><span class='plain-code'>What It Took</span></h3> <p>GPU Machines were not a small project for us. Fly Machines run on an idiosyncratically small hypervisor (normally Firecracker, but for GPU Machines <a href="https://github.com/cloud-hypervisor/cloud-hypervisor">Intel&rsquo;s Cloud Hypervisor</a>, a very similar Rust codebase that supports PCI passthrough). 
The Nvidia ecosystem is not geared to supporting micro-VM hypervisors.</p> <p>GPUs <a href="https://googleprojectzero.blogspot.com/2020/09/attacking-qualcomm-adreno-gpu.html">terrified our security team</a>. A GPU is just about the worst case hardware peripheral: intense multi-directional direct memory transfers</p> <div class="right-sidenote"><p>(not even bidirectional: in common configurations, GPUs talk to each other)</p> </div> <p>with arbitrary, end-user controlled computation, all operating outside our normal security boundary.</p> <p>We did a couple expensive things to mitigate the risk. We shipped GPUs on dedicated server hardware, so that GPU- and non-GPU workloads weren&rsquo;t mixed. Because of that, the only reason for a Fly Machine to be scheduled on a GPU machine was that it needed a PCI BDF for an Nvidia GPU, and there&rsquo;s a limited number of those available on any box. Those GPU servers were drastically less utilized and thus less cost-effective than our ordinary servers.</p> <p>We funded two very large security assessments, from <a href="https://www.atredis.com/">Atredis</a> and <a href="https://tetrelsec.com/">Tetrel</a>, to evaluate our GPU deployment. Matt Braun is writing up those assessments now. They were not cheap, and they took time.</p> <p>Security wasn&rsquo;t directly the biggest cost we had to deal with, but it was an indirect cause for a subtle reason.</p> <p>We could have shipped GPUs very quickly by doing what Nvidia recommended: standing up a standard K8s cluster to schedule GPU jobs on. Had we taken that path, and let our GPU users share a single Linux kernel, we&rsquo;d have been on Nvidia&rsquo;s driver happy-path.</p> <p>Alternatively, we could have used a conventional hypervisor. Nvidia suggested VMware (heh). But they could have gotten things working had we used QEMU. We like QEMU fine, and could have talked ourselves into a security story for it, but the whole point of Fly Machines is that they take milliseconds to start. 
We could not have offered our desired Developer Experience on the Nvidia happy-path.</p> <p>Instead, we burned months trying (and ultimately failing) to get Nvidia&rsquo;s host drivers working to map <a href="https://www.nvidia.com/en-us/data-center/virtual-solutions/">virtualized GPUs</a> into Intel Cloud Hypervisor. At one point, we hex-edited the closed-source drivers to trick them into thinking our hypervisor was QEMU.</p> <p>I&rsquo;m not sure any of this really mattered in the end. There&rsquo;s a segment of the market we weren&rsquo;t ever really able to explore because Nvidia&rsquo;s driver support kept us from thin-slicing GPUs. We&rsquo;d have been able to put together a really cheap offering for developers if we hadn&rsquo;t run up against that, and developers love &ldquo;cheap&rdquo;, but I can&rsquo;t prove that those customers are real.</p> <p>On the other hand, we&rsquo;re committed to delivering the Fly Machine DX for GPU workloads. Beyond the PCI/IOMMU drama, just getting an entire hardware GPU working in a Fly Machine was a lift. We needed Fly Machines that would come up with the right Nvidia drivers; our stack was built assuming that the customer&rsquo;s OCI container almost entirely defined the root filesystem for a Machine. We had to engineer around that in our <code>flyd</code> orchestrator. And almost everything people want to do with GPUs involves efficiently grabbing huge files full of model weights. Also annoying!</p> <p>And, of course, we bought GPUs. A lot of GPUs. Expensive GPUs.</p> <h3 id='why-it-isnt-working' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#why-it-isnt-working' aria-label='Anchor'></a><span class='plain-code'>Why It Isn&rsquo;t Working</span></h3> <p>The biggest problem: developers don&rsquo;t want GPUs. 
They don&rsquo;t even want AI/ML models. They want LLMs. <em>System engineers</em> may have smart, fussy opinions on how to get their models loaded with CUDA, and what the best GPU is. But <em>software developers</em> don&rsquo;t care about any of that. When a software developer shipping an app comes looking for a way for their app to deliver prompts to an LLM, you can&rsquo;t just give them a GPU.</p> <p>For those developers, who probably make up most of the market, it doesn&rsquo;t seem plausible for an insurgent public cloud to compete with OpenAI and Anthropic. Their APIs are fast enough, and developers thinking about performance in terms of &ldquo;tokens per second&rdquo; aren&rsquo;t counting milliseconds.</p> <div class="right-sidenote"><p>(you should all feel sympathy for us)</p> </div> <p>This makes us sad because we really like the point in the solution space we found. Developers shipping apps on Amazon will outsource to other public clouds to get cost-effective access to GPUs. But then they&rsquo;ll faceplant trying to handle data and model weights, backhauling gigabytes (at significant expense) from S3. We have app servers, GPUs, and object storage all under the same top-of-rack switch. But inference latency just doesn&rsquo;t seem to matter yet, so the market doesn&rsquo;t care.</p> <p>Past that, and just considering the system engineers who do care about GPUs rather than LLMs: the hardware product/market fit here is really rough.</p> <p>People doing serious AI work want galactically huge amounts of GPU compute. A whole enterprise A100 is a compromise position for them; they want an SXM cluster of H100s.</p> <div class="right-sidenote"><p>Near as we can tell, MIG gives you a UUID to talk to the host driver, not a PCI device.</p> </div> <p>We think there&rsquo;s probably a market for users doing lightweight ML work getting tiny GPUs. 
<a href="https://www.nvidia.com/en-us/technologies/multi-instance-gpu/">This is what Nvidia MIG does</a>, slicing a big GPU into arbitrarily small virtual GPUs. But for fully-virtualized workloads, it&rsquo;s not baked; we can&rsquo;t use it. And I&rsquo;m not sure how many of those customers there are, or whether we&rsquo;d get the density of customers per server that we need.</p> <p><a href="https://fly.io/blog/cutting-prices-for-l40s-gpus-in-half">That leaves the L40S customers</a>. There are a bunch of these! We dropped L40S prices last year, not because we were sour on GPUs but because they&rsquo;re the one part we have in our inventory people seem to get a lot of use out of. We&rsquo;re happy with them. But they&rsquo;re just another kind of compute that some apps need; they&rsquo;re not a driver of our core business. They&rsquo;re not the GPU bet paying off.</p> <p>Really, all of this is just a long way of saying that for most software developers, &ldquo;AI-enabling&rdquo; their app is best done with API calls to things like Claude and GPT, Replicate and RunPod.</p> <h3 id='what-did-we-learn' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#what-did-we-learn' aria-label='Anchor'></a><span class='plain-code'>What Did We Learn?</span></h3> <p>A very useful way to look at a startup is that it&rsquo;s a race to learn stuff. So, what&rsquo;s our report card?</p> <p>First off, when we embarked down this path in 2022, we were (like many other companies) operating in a sort of phlogiston era of AI/ML. The industry attention to AI had not yet collapsed around a small number of foundational LLM models. 
We expected there to be a diversity of <em>mainstream</em> models, the world <a href='https://github.com/elixir-nx/bumblebee' title=''>Elixir Bumblebee</a> looks forward to, where people pull different AI workloads off the shelf the same way they do Ruby gems.</p> <p>But <a href='https://www.cursor.com/' title=''>Cursor happened</a>, and, as they say, how are you going to keep &lsquo;em down on the farm once they&rsquo;ve seen Karl Hungus? It seems much clearer where things are heading.</p> <p>GPUs were a test of a Fly.io company credo: as we think about core features, we design for 10,000 developers, not for 5-6. It took a minute, but the credo wins here: GPU workloads for the 10,001st developer are a niche thing.</p> <p>Another way to look at a startup is as a series of bets. We put a lot of chips down here. But the buy-in for this tournament gave us a lot of chips to play with. Never making a big bet of any sort isn&rsquo;t a winning strategy. I&rsquo;d rather we&rsquo;d flopped the nut straight, but I think going in on this hand was the right call.</p> <p>A really important thing to keep in mind here, and something I think a lot of startup thinkers sleep on, is the extent to which this bet involved acquiring assets. Obviously, some of our <a href='https://fly.io/blog/the-exit-interview-jp/' title=''>costs here aren&rsquo;t recoverable</a>. But the hardware parts that aren&rsquo;t generating revenue will ultimately get liquidated; like with <a href='https://fly.io/blog/32-bit-real-estate/' title=''>our portfolio of IPv4 addresses</a>, I&rsquo;m even more comfortable making bets backed by tradable assets with durable value.</p> <p>In the end, I don&rsquo;t think GPU Fly Machines were going to be a hit for us no matter what we did. Because of that, one thing I&rsquo;m very happy about is that we didn&rsquo;t compromise the rest of the product for them. 
Security concerns slowed us down to where we probably learned what we needed to learn a couple months later than we could have otherwise, but we&rsquo;re scaling back our GPU ambitions without having sacrificed <a href='https://fly.io/blog/sandboxing-and-workload-isolation/' title=''>any of our isolation story</a>, and, ironically, GPUs <em>other people run</em> are making that story a lot more important. The same thing goes for our Fly Machine developer experience.</p> <p>We started this company building a Javascript runtime for edge computing. We learned that our customers didn&rsquo;t want a new Javascript runtime; they just wanted their native code to work. <a href='https://news.ycombinator.com/item?id=22616857' title=''>We shipped containers</a>, and no convincing was needed. We were wrong about Javascript edge functions, and I think we were wrong about GPUs. That&rsquo;s usually how we figure out the right answers: by being wrong about a lot of stuff.</p> </content> </entry> <entry> <title>The Exit Interview: JP Phillips</title> <link rel="alternate" href="https://fly.io/blog/the-exit-interview-jp/"/> <id>https://fly.io/blog/the-exit-interview-jp/</id> <published>2025-02-12T00:00:00+00:00</published> <updated>2025-02-12T14:06:21+00:00</updated> <media:thumbnail url="https://fly.io/blog/the-exit-interview-jp/assets/bye-bye-little-sebastian-cover.webp"/> <content type="html"><div class="lead"><p>JP Phillips is off to greener, or at least calmer, pastures. He joined us 4 years ago to build the next generation of our orchestration system, and has been one of the anchors of our engineering team. His last day is today. We wanted to know what he was thinking, and figured you might too.</p> </div> <p><em>Question 1: Why, JP? Just why?</em></p> <p>LOL. When I looked at what I wanted to see from here in the next 3-4 years, it didn&rsquo;t really match up with where we&rsquo;re currently heading. 
Specifically, with our new focus on MPG <em>[Managed Postgres]</em> and [llm] <em>[llm].</em></p> <div class="callout"><p>Editorial comment: Even I don’t know what [llm] is.</p> </div> <p>The Fly Machines platform is more or less finished, in the sense of being capable of supporting the next iteration of our products. My original desire to join Fly.io was to make Machines a product that would <a href='https://fly.io/blog/carving-the-scheduler-out-of-our-orchestrator/' title=''>rid us of HashiCorp Nomad</a>, and I feel like that&rsquo;s been accomplished.</p> <p><em>Where were you hoping to see us headed?</em></p> <p>More directly positioned as a cloud provider, rather than a platform-as-a-service; further along the customer journey from &ldquo;developers&rdquo; and &ldquo;startups&rdquo; to large established companies.</p> <p>And, it&rsquo;s not that I disagree with PAAS work or MPG! Rather, it&rsquo;s not something that excites me in a way that I&rsquo;d feel challenged and could continue to grow technically.</p> <p><em>Follow up question: does your family know what you’re doing here? Doing to us? Are they OK with it?</em></p> <p>Yes, my family was very involved in the decision, before I even talked to other companies.</p> <p><em>What&rsquo;s the thing you&rsquo;re happiest about having built here? It cannot be &ldquo;all of <code>flyd</code>&rdquo;.</em></p> <p>We&rsquo;ve enabled developers to run workloads from an OCI image and an API call all over the world. 
On any other cloud provider, the knowledge of how to pull that off comes with a professional certification.</p> <p><em>In what file in our <code>nomad-firecracker</code> repository would I find that code?</em></p> <p><a href='https://docs.machines.dev/#tag/machines/post/apps/{app_name}/machines' title=''>https://docs.machines.dev/#tag/machines/post/apps/{app_name}/machines</a></p> <p><img alt="A diagram that doesn&#39;t make any of this clearer" src="/blog/the-exit-interview-jp/assets/flaps.png?1/2&amp;center" /></p> <p><em>So you mean, literally, the whole Fly Machines API, and <code>flaps</code>, the API gateway for Fly Machines?</em></p> <p>Yes, all of it. The <code>flaps</code> API server, the <code>flyd</code> RPCs it calls, the <code>flyd</code> finite state machine system, the interface to running VMs.</p> <p><em>Is there something you especially like about that design?</em></p> <p>I like that it for the most part doesn&rsquo;t require any central coordination. And I like that the P90 for Fly Machine <code>create</code> calls is sub-5-seconds for pretty much every region except for Johannesburg and Hong Kong.</p> <p>I think the FSM design is something I&rsquo;m proud of; if I could take any code with me, it&rsquo;d be the <code>internal/fsm</code> in the <code>nomad-firecracker</code> repo.</p> <div class="callout"><p>You can read more about <a href="https://fly.io/blog/carving-the-scheduler-out-of-our-orchestrator/" title="">the <code>flyd</code> orchestrator JP led over here</a>. But, a quick decoder ring: <code>flyd</code> runs independently without any central coordination on thousands of “worker” servers around the globe. It’s structured as an API server for a bunch of finite state machine invocations, where an FSM might be something like “start a Fly Machine” or “create a new Fly Machine” or “cordon off a Fly Machine so we can update it”. 
Each FSM invocation is comprised of a bunch of steps, each of those steps has callbacks into the <code>flyd</code> code, and each step is logged in <a href="https://github.com/boltdb/bolt" title="">a BoltDB database</a>.</p> </div> <p><em>Thinking back, there are like two archetypes of insanely talented developers I’ve worked with. One is the kind that belts out ridiculous amounts of relatively sophisticated code on a whim, at like 3AM. Jerome [who leads our fly-proxy team], is that type. The other comes to projects with what feels like fully-formed, coherent designs that are not super intuitive, and the whole project just falls together around that design. Did you know you were going to do the FSM log thing when you started <code>flyd</code>?</em></p> <p>I definitely didn&rsquo;t have any specific design in mind when I started on <code>flyd</code>. I think the FSM stuff is a result of work I did at Compose.io / MongoHQ (where it was called &ldquo;recipes&rdquo;/&ldquo;operations&rdquo;) and the work I did at HashiCorp using Cadence.</p> <p>Once I understood what the product needed to do and look like, having a way to perform deterministic and durable execution felt like a good design.</p> <p><em>Cadence?</em></p> <p><a href='https://cadenceworkflow.io/' title=''>Cadence</a> is the child of AWS Step Functions and the predecessor to <a href='https://temporal.io/' title=''>Temporal</a> (the company).</p> <p>One of the biggest gains, with how it works in <code>flyd</code>, is knowing we would need to deploy <code>flyd</code> all day, every day. If <code>flyd</code> was in the middle of doing some work, it needed to pick back up right where it left off, post-deploy.</p> <p><em>OK, next question. What&rsquo;s the most impressive thing you saw someone else build here? 
To make things simpler and take some pressure off the interview, we can exclude any of my works from consideration.</em></p> <p>Probably <a href='https://github.com/superfly/corrosion' title=''><code>corrosion2</code></a>.</p> <div class="callout"><p>Sidebar: <code>corrosion2</code> is our state distribution system. While <code>flyd</code> runs individual Fly Machines for users, each instance is solely responsible for its own state; there’s no global scheduler. But we have platform components, most obviously <code>fly-proxy</code>, our Anycast router, that need to know what’s running where. <code>corrosion2</code> is a Rust service that does <a href="https://fly.io/blog/building-clusters-with-serf/" title="">SWIM gossip</a> to propagate information from each worker into a CRDT-structured SQLite database. <code>corrosion2</code> essentially means any component on our fleet can do SQLite queries to get near-real-time information about any Fly Machine around the world.</p> </div> <p>If for no other reason than that we deployed <code>corrosion</code>, learned from it, and were able to make significant and valuable improvements — and then migrate to the new system in a short period of time.</p> <p>Having a &ldquo;just SQLite&rdquo; interface, for async replicated changes around the world in seconds, it&rsquo;s pretty powerful.</p> <p>If we invested in <a href='https://antithesis.com/' title=''>Antithesis</a> or TLA+ testing, I think there&rsquo;s <a href='https://github.com/superfly/corrosion' title=''>potential for other companies</a> to get value out of <code>corrosion2</code>.</p> <p><em>Just as a general-purpose gossip-based SQLite CRDT gossip system?</em></p> <p>Yes.</p> <p><em>OK, you&rsquo;re being too nice. What&rsquo;s your least favorite thing about the platform?</em></p> <p>GraphQL. No, Elixir. 
It&rsquo;s a tie between GraphQL and Elixir.</p> <p>But probably GraphQL, by a hair.</p> <p><em>That&rsquo;s not the answer I expected.</em></p> <p>GraphQL slows everyone down, and everything. Elixir only slows me down.</p> <p><em>The rest of the platform, you&rsquo;re fine with? No complaints?</em></p> <p>I&rsquo;m happier now that we have <code>pilot</code>.</p> <div class="callout"><p><code>pilot</code> is our new <code>init</code>. When we launch a Fly Machine, <code>init</code> is our foothold in the machine; this is unlike a normal OCI runtime, where “pid 1” is often the user’s entrypoint program. Our original <code>init</code> was so simple people dunked on it and said it might as well have been a bash script; over time, <code>init</code> has sprouted a bunch of new features. <code>pilot</code> consolidates those features, and, more importantly, is itself a complete OCI runtime; <code>pilot</code> can natively run containers inside of Fly Machines.</p> </div> <p>Before <code>pilot</code>, there really wasn&rsquo;t any contract between <code>flyd</code> and <code>init</code>. And <code>init</code> was just &ldquo;whatever we wanted <code>init</code> to be&rdquo;. That limited its ability to serve us.</p> <p>Having <code>pilot</code> be an OCI-compliant runtime with an API for <code>flyd</code> to drive is a big win for the future of the Fly Machines API.</p> <p><em>Was I right that we should have used SQLite for <code>flyd</code>, or were you wrong to have used BoltDB?</em></p> <p>I still believe Bolt was the right choice. I&rsquo;ve never lost a second of sleep worried that someone is about to run a SQL update statement on a host, or across the whole fleet, and then mangle all our state data. 
And limiting the storage interface, by not using SQL, kept <code>flyd</code>&rsquo;s scope managed.</p> <p>On the engine side of the platform, which is what <code>flyd</code> is, I still believe SQL is too powerful for what <code>flyd</code> does.</p> <p><em>If you had this to do over again, would Bolt be precisely what you&rsquo;d pick, or is there something else you&rsquo;d want to try? Some cool-ass new KV store?</em></p> <p>Nah. But, I&rsquo;d maybe consider a SQLite database per-Fly-Machine. Then the scope of danger is about as small as it could possibly be.</p> <p><em>Whoah, that&rsquo;s an interesting thought. People sleep on the &ldquo;keep a zillion little SQLites&rdquo; design.</em></p> <p>Yeah, with per-Machine SQLite, once a Fly Machine is destroyed, we can just zip up the database and stash it in object storage. The biggest hold-up I have about it is how we&rsquo;d manage the schemas.</p> <p><em>OpenTelemetry: were you right all along?</em></p> <p>One hundred percent.</p> <p><em>I basically attribute oTel at Fly.io to you.</em></p> <p>Without oTel, it&rsquo;d be a disaster trying to troubleshoot the system. I&rsquo;d have ragequit trying.</p> <p><em>I remember there being a cost issue, with how much Honeycomb was going to end up charging us to manage all the data. But that seems silly in retrospect.</em></p> <p>For sure. It is 100% part of the decision and the conversation. But: we didn&rsquo;t have the best track record running a logs/metrics cluster at this fidelity. It was worth the money to pay someone else to manage tracing data.</p> <p><em>Strong agree. I think my only issue is just the extent to which it cruds up code. But I need to get over that.</em></p> <p>Yes, it&rsquo;s very explicit. I think the next big part of oTel is going to be auto-instrumentation, for profiling.</p> <p><em>You&rsquo;re a veteran Golang programmer. 
Say 3 nice things about Rust.</em></p> <div class="callout"><p>Most of our backend is in Go, but <code>fly-proxy</code>, <code>corrosion2</code>, and <code>pilot</code> are in Rust.</p> </div> <ol> <li>Option. </li><li>Match. </li><li>Serde macros. </li></ol> <p><em>Even I can&rsquo;t say shit about Option and match.</em></p> <p>Match is so much better than anything in Go.</p> <p><em>Elixir, Go, and Rust. An honest take on that programming cocktail.</em></p> <p>Three&rsquo;s a crowd, Elixir can stay home.</p> <p><em>If you could only lose one, you&rsquo;d keep Rust.</em></p> <p>I&rsquo;ve learned its shortcomings and the productivity far outweighs having to deal with the Rust compiler.</p> <p><em>You&rsquo;d be unhappy if we moved the <code>flaps</code> API code from Go to Elixir.</em></p> <p>Correct.</p> <p><em>I kind of buy the idea of doing orchestration and scheduling code, which is policy-intensive, in a higher-level language.</em></p> <p>Maybe. If Ruby had a better concurrency story, I don&rsquo;t think Elixir would have a place for us.</p> <div class="callout"><p>Here I need to note that Ruby is functionally dead here, and Elixir is ascendant.</p> </div> <p><em>We have an idiosyncratic management structure. We&rsquo;re bottom-up, but ambiguously so. We don&rsquo;t have roadmaps, except when we do. We have minimal top-down technical direction. Critique.</em></p> <p>It&rsquo;s too easy to lose sight of whether your current focus [in what you&rsquo;re building] is valuable to the company.</p> <p><em>The first thing I warn every candidate about on our &ldquo;do-not-work-here&rdquo; calls.</em></p> <p>I think it comes down to execution, and accountability to actually finish projects. I spun a lot trying to figure out what would be the most valuable work for Fly Machines.</p> <p><em>You don&rsquo;t have to be so nice about things.</em></p> <p>We struggle a lot with consistent communication. We change direction a little too often. 
It got to a point where I didn&rsquo;t see a point in devoting time and effort into projects, because I&rsquo;d not be able to show enough value quickly enough.</p> <p><em>I see things paying off later than we&rsquo;d hoped or expected they would. Our secret storage system, Pet Semetary, is a good example of this. Our K8s service, FKS, is another obvious one, since we&rsquo;re shipping MPG on it.</em></p> <p><em>This is your second time working with Kurt, at a company where he&rsquo;s the CEO. Give him a 1-4 star rating. He can take it! At least, I think he can take it.</em></p> <p>2022: ★★★★</p> <p>2023: ★★</p> <p>2024: ★★✩</p> <p>2025: ★★★✩</p> <p>On a four-star scale.</p> <p><em>Whoah. I did not expect a histogram. Say more about 2023!</em></p> <p>We hired too many people, too quickly, and didn&rsquo;t have the guardrails and structure in place for everybody to be successful.</p> <p><em>Also: GPUs!</em></p> <p>Yes. That was my next comment.</p> <p><em>Do we secretly agree about GPUs?</em></p> <p>I think so.</p> <p><em>Our side won the argument in the end! But at what cost?</em></p> <p>They were a killer distraction.</p> <p><em>Final question: how long will you remain in the first-responder on-call rotation after you leave? I assume at least until August. I have a shift this weekend; can you swap with me? I keep getting weekends.</em></p> <p>I am going to be asleep all weekend if any of my previous job changes are indicative.</p> <p><em>I sleep through on-call too! But nobody can yell at you for it now. I think you have the comparative advantage over me in on-calling.</em></p> <p>Yes I will absolutely take all your future on-call shifts, you have convinced me.</p> <p><em>All this aside: it has been a privilege watching you work. I hope your next gig is 100x more relaxing than this was. Or maybe I just hope that for myself. Except: I&rsquo;ll never escape this place. Thank you so much for doing this.</em></p> <p>Thank you! 
I&rsquo;m forever grateful for having the opportunity to be a part of Fly.io.</p> </content> </entry> <entry> <title>A Blog, If You Can Keep It</title> <link rel="alternate" href="https://fly.io/blog/a-blog-if-kept/"/> <id>https://fly.io/blog/a-blog-if-kept/</id> <published>2025-02-10T00:00:00+00:00</published> <updated>2025-02-19T13:16:17+00:00</updated> <media:thumbnail url="https://fly.io/blog/a-blog-if-kept/assets/keep-blog.webp"/> <content type="html"><div class="lead"><p>A boldfaced lede like this was a sure sign you were reading a carefully choreographed EffortPost from our team at Fly.io. We’re going to do less of those. Or the same amount but more of a different kind of post. Either way: launch an app on Fly.io today!</p> </div> <p>Over the last 5 years, we’ve done pretty well for ourselves writing content for Hacker News. And that’s <a href='https://news.ycombinator.com/item?id=39373476' title=''>mostly</a> been good for us. We don’t do conventional marketing, we don’t have a sales team, the rest of social media is atomized over 5 different sites. Writing pieces that HN takes seriously has been our primary outreach tool.</p> <p>There’s a recipe (probably several, but I know this one works) for charting a post on HN:</p> <ol> <li>Write an EffortPost, which is to say a dense technical piece over 2000 words long; within that rubric there’s a bunch of things that are catnip to HN, including runnable code, research surveys, and explainers. (There are also cat-repellants you learn to steer clear of.) </li><li>Assiduously avoid promotion. You have to write for the audience. We get away with sporadically including a call-to-action block in our posts, but otherwise: the post should make sense even if an unrelated company posted it after you went out of business. 
</li><li>Pick topics HN is interested in (it helps if all your topics are au courant for HN, and we’ve been <a href='https://news.ycombinator.com/item?id=32250426' title=''>very</a> <a href='https://news.ycombinator.com/item?id=32018066' title=''>lucky</a> in that regard). </li><li>Like 5-6 more style-guide things that help incrementally. Probably 3 different teams writing for HN will have 3 different style guides with only like &frac12; overlap. Ours, for instance, instructs writers to swear. </li></ol> <p>I like this kind of writing. It’s not even a chore. But it’s become an impediment for us, for a couple reasons: the team serializes behind an “editorial” function here, which keeps us from publishing everything we want; worse, caring so much about our track record leaves us noodling on posts interminably (the poor <a href='https://www.tigrisdata.com/' title=''>Tigrises</a> have been waiting for months for me to publish the piece I wrote about them and FoundationDB; take heart, this post today means that one is coming soon).</p> <p>But worst of all, I worried incessantly about us <a href='https://gist.github.com/tqbf/e853764b562a2d72a91a6986ca3b77c0' title=''>wearing out our welcome</a>. To my mind, we’d have 1, maybe 2 bites at the HN apple in a given month, and we needed to make them count.</p> <p>That was dumb. I am dumb about a lot of things! I came around to understanding this after Kurt demanded I publish my blog post about BFAAS (Bash Functions As A Service), 500 lines of Go code that had generated 4500 words in my draft. It was only after I made the decision to stop gatekeeping this blog that I realized <a href='https://simonwillison.net/' title=''>Simon Willison</a> has been disproving my “wearing out the welcome” theory, day in and day out, for years. He just writes stuff about LLMs when it interests him. I mean, it helps that he’s a better writer than we are. 
But he’s not wasting time choreographing things.</p> <p>Back in like 2009, <a href='https://web.archive.org/web/20110806040300/http://chargen.matasano.com/chargen/2009/7/22/if-youre-typing-the-letters-a-e-s-into-your-code-youre-doing.html' title=''>we had a blog</a> at another company I was at. That blog drove a lot of business for us (and, on three occasions, almost killed me). It was not in the least bit optimized for HN. I like pretending to be a magazine feature writer, but I miss writing dashed-off pieces every day and clearing space for other people on the team to write as well.</p> <p>So this is all just a heads up: we’re trying something new. This is a very long and self-indulgent way to say “we’re going to write a normal blog like it’s 2008”, but that’s how broken my brain is after years of having my primary dopaminergic rewards come from how long Fly.io blog posts stay on the front page: I have to disclaim blogging before we start doing it, lest I fail to meet expectations.</p> <p>Like I said. I’m real dumb. But: looking forward to getting a lot more stuff out on the web for people to read this year!</p> </content> </entry> <entry> <title>Did Semgrep Just Get A Lot More Interesting?</title> <link rel="alternate" href="https://fly.io/blog/semgrep-but-for-real-now/"/> <id>https://fly.io/blog/semgrep-but-for-real-now/</id> <published>2025-02-10T00:00:00+00:00</published> <updated>2025-02-11T00:20:14+00:00</updated> <media:thumbnail url="https://fly.io/blog/semgrep-but-for-real-now/assets/ci-cd-cover.webp"/> <content type="html"><div class="right-sidenote"><p>This whole paragraph is just one long sentence. 
God I love <a href="https://fly.io/blog/a-blog-if-kept/" title="">just random-ass blogging</a> again.</p> </div> <p><a href='https://ghuntley.com/stdlib/' title=''>This bit by Geoffrey Huntley</a> is super interesting to me and, despite calling out that LLM-driven development agents like Cursor have something like a 40% success rate at actually building anything that passes acceptance criteria, makes me think that more of the future of our field belongs to people who figure out how to use these weird bags of model weights than any of us are comfortable with. </p> <p>I’ve been dinking around with Cursor for a week now (if you haven’t, I think it’s something close to malpractice not to at least take it — or something like it — for a spin) and am just now from this post learning that Cursor has this <a href='https://docs.cursor.com/context/rules-for-ai' title=''>rules feature</a>. </p> <p>The important thing for me is not how Cursor rules work, but rather how Huntley uses them. He turns them back on themselves, writing rules to tell Cursor how to organize the rules, and then teach Cursor how to write (under human supervision) its own rules.</p> <p>Cursor kept trying to get Huntley to use Bazel as a build system. So he had Cursor write a rule for itself: “no bazel”. And there was no more Bazel. If I’d known I could do this, I probably wouldn’t have bounced from the Elixir project I had Cursor doing, where trying to get it to write simple unit tests got it all tangled up trying to make <a href='https://hexdocs.pm/mox/Mox.html' title=''>Mox</a> work. </p> <p>But I’m burying the lead. </p> <p>Security people have been for several years now somewhat in love with a tool called <a href='https://github.com/semgrep/semgrep' title=''>Semgrep</a>. Semgrep is a semantics-aware code search tool; using symbolic variable placeholders and otherwise ordinary code, you can write rules to match pretty much arbitrary expressions and control flow. 
</p> <p>If you’re an appsec person, where you obviously go with this is: you build a library of Semgrep searches for well-known vulnerability patterns (or, if you’re like us at Fly.io, you work out how to get Semgrep to catch the Rust concurrency footgun of RWLocks inside if-lets).</p> <p>The reality for most teams though is “ain’t nobody got time for that”. </p> <p>But I just checked and, unsurprisingly, 4o <a href='https://chatgpt.com/share/67aa94a7-ea3c-8012-845c-6c9491b33fe4' title=''>seems to do reasonably well</a> at generating Semgrep rules? Like: I have no idea if this rule is actually any good. But it looks like a Semgrep rule?</p> <p>What interests me is this: it seems obvious that we’re going to do more and more “closed-loop” LLM agent code generation stuff. By “closed loop”, I mean that the thingy that generates code is going to get to run the code and watch what happens when it’s interacted with. You’re just a small bit of glue code and a lot of system prompting away from building something like that right now: <a href='https://x.com/chris_mccord/status/1882839014845374683' title=''>Chris McCord is building</a> a thingy that generates whole Elixir/Phoenix apps and runs them as Fly Machines. When you deploy these kinds of things, the LLM gets to see the errors when the code is run, and it can just go fix them. It also gets to see errors and exceptions in the logs when you hit a page on the app, and it can just go fix them.</p> <p>With a bit more system prompting, you can get an LLM to try to generalize out from exceptions it fixes and generate unit test coverage for them. </p> <p>With a little bit more system prompting, you can probably get an LLM to (1) generate a Semgrep rule for the generalized bug it caught, (2) test the Semgrep rule with a positive/negative control, (3) save the rule, (4) test the whole codebase with Semgrep for that rule, and (5) fix anything it finds that way. 
</p> <p>That is a lot more interesting to me than tediously (and probably badly) trying to predict everything that will go wrong in my codebase a priori and Semgrepping for them. Which is to say: Semgrep — which I have always liked — is maybe a lot more interesting now? And tools like it?</p> </content> </entry> <entry> <title>VSCode’s SSH Agent Is Bananas</title> <link rel="alternate" href="https://fly.io/blog/vscode-ssh-wtf/"/> <id>https://fly.io/blog/vscode-ssh-wtf/</id> <published>2025-02-07T00:00:00+00:00</published> <updated>2025-02-07T21:53:40+00:00</updated> <media:thumbnail url="https://fly.io/static/images/default-post-thumbnail.webp"/> <content type="html"><p>We’re interested in getting integrated into the flow VSCode uses to do remote editing over SSH, because everybody is using VSCode now, and, in particular, they’re using forks of VSCode that generate code with LLMs. </p> <div class="right-sidenote"><p>“hallucination” is what we call it when LLMs get code wrong; “engineering” is what we call it when people do.</p> </div> <p>LLM-generated code is <a href='https://nicholas.carlini.com/writing/2024/how-i-use-ai.html' title=''>useful in the general case</a> if you know what you’re doing. But it’s ultra-useful if you can close the loop between the LLM and the execution environment (with an “Agent” setup). There’s lots to say about this, but for the moment: it’s a semi-effective antidote to hallucination: the LLM generates the code, the agent scaffolding runs the code, the code generates errors, the agent feeds it back to the LLM, the process iterates. </p> <p>So, obviously, the issue here is you don’t want this iterative development process happening on your development laptop, because LLMs have boundary issues, and they’ll iterate on your system configuration just as happily as on the Git project you happen to be working in. A thing you’d really like to be able to do: run a closed-loop agent-y (“agentic”? 
is that what we say now) configuration for an LLM, on a clean-slate Linux instance that spins up instantly and that can’t screw you over in any way. You get where we’re going with this.</p> <p>Anyways! I would like to register a concern.</p> <p>Emacs hosts the spiritual forebear of remote editing systems, a blob of hyper-useful Elisp called <a href='https://www.gnu.org/software/tramp/' title=''>“Tramp”</a>. If you can hook Tramp up to any kind of interactive environment — usually, an SSH session — where it can run Bourne shell commands, it can extend Emacs to that environment.</p> <p>So, VSCode has a feature like Tramp. Which, neat, right? You’d think, take Tramp, maybe simplify it a bit, switch out Elisp for TypeScript.</p> <p>You’d think wrong!</p> <p>Unlike Tramp, which lives off the land on the remote connection, VSCode mounts a full-scale invasion: it runs a Bash snippet stager that downloads an agent, including a binary installation of Node. </p> <p>I <em>think</em> this is <a href='https://github.com/microsoft/vscode/tree/c9e7e1b72f80b12ffc00e06153afcfedba9ec31f/src/vs/server/node' title=''>the source code</a>?</p> <p>The agent runs over port-forwarded SSH. It establishes a WebSockets connection back to your running VSCode front-end. The underlying protocol on that connection can:</p> <ul> <li>Wander around the filesystem </li><li>Edit arbitrary files </li><li>Launch its own shell PTY processes </li><li>Persist itself </li></ul> <p>In security-world, there’s a name for tools that work this way. I won’t say it out loud, because that’s not fair to VSCode, but let’s just say the name is murid in nature.</p> <p>I would be a little nervous about letting people VSCode-remote-edit stuff on dev servers, and apoplectic if that happened during an incident on something in production. 
</p> <p>It turns out we don’t have to care about any of this to get a custom connection to a Fly Machine working in VSCode, so none of this matters in any kind of deep way, but: we’ve decided to just be a blog again, so: we had to learn this, and now you do too.</p> </content> </entry> <entry> <title>AI GPU Clusters, From Your Laptop, With Livebook</title> <link rel="alternate" href="https://fly.io/blog/ai-gpu-clusters-from-your-laptop-livebook/"/> <id>https://fly.io/blog/ai-gpu-clusters-from-your-laptop-livebook/</id> <published>2024-09-24T00:00:00+00:00</published> <updated>2024-10-03T19:05:54+00:00</updated> <media:thumbnail url="https://fly.io/blog/ai-gpu-clusters-from-your-laptop-livebook/assets/ai-gpu-livebook-thumb.webp"/> <content type="html"><div class="lead"><p>Livebook, FLAME, and the Nx stack: three Elixir components that are easy to describe, more powerful than they look, and intricately threaded into the Elixir ecosystem. A few weeks ago, Chris McCord (👋) and Chris Grainger showed them off at ElixirConf 2024. We thought the talk was worth a recap.</p> </div> <p>Let&rsquo;s begin by introducing our cast of characters.</p> <p><a href='https://livebook.dev/' title=''>Livebook</a> is usually described as Elixir&rsquo;s answer to <a href='https://jupyter.org/' title=''>Jupyter Notebooks</a>. And that&rsquo;s a good way to think about it. But Livebook takes full advantage of the Elixir platform, which makes it sneakily powerful. By linking up directly with Elixir app clusters, Livebook can switch easily between driving compute locally or on remote servers, and makes it easy to bring in any kind of data into reproducible workflows.</p> <p><a href='https://fly.io/blog/rethinking-serverless-with-flame/' title=''>FLAME</a> is Elixir&rsquo;s answer to serverless computing. By having the library manage a pool of executors for you, FLAME lets you treat your entire application as if it was elastic and scale-to-zero. 
You configure FLAME with some basic information about where to run code and how many instances it&rsquo;s allowed to run with, and then mark off any arbitrary section of code with <code>FLAME.call</code>. The framework takes care of the rest. It&rsquo;s the upside of serverless without committing yourself to blowing your app apart into tiny, intricately connected pieces.</p> <p>The <a href='https://github.com/elixir-nx' title=''>Nx stack</a> is how you do Elixir-native AI and ML. Nx gives you an Elixir-native notion of tensor computations with GPU backends. <a href='https://github.com/elixir-nx/axon' title=''>Axon</a> builds a common interface for ML models on top of it. <a href='https://github.com/elixir-nx/bumblebee' title=''>Bumblebee</a> makes those models available to any Elixir app that wants to download them, from just a couple lines of code.</p> <p>Here is a quick video showing how to transfer a local tensor to a remote GPU, using Livebook, FLAME, and Nx:</p> <div class="youtube-container" data-exclude-render> <div class="youtube-video"> <iframe width="100%" height="100%" src="https://www.youtube.com/embed/5ImP3gpUSkQ" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen> </iframe> </div> </div> <p>Let&rsquo;s dive into the <a href='https://www.youtube.com/watch?v=4qoHPh0obv0' title=''>keynote</a>.</p> <h2 id='poking-a-hole-in-your-infrastructure' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#poking-a-hole-in-your-infrastructure' aria-label='Anchor'></a><span class='plain-code'>Poking a hole in your infrastructure</span></h2> <p>Any Livebook, including the one running on your laptop, can start a runtime running on a Fly Machine, in Fly.io&rsquo;s public cloud. 
That Elixir machine will (by default) live in your default Fly.io organization, giving it networked access to all the other apps that might live there.</p> <p>This is an access control situation that mostly just does what you want it to do without asking. Unless you ask it to, Fly.io isn&rsquo;t exposing anything to the Internet, or to other users of Fly.io. For instance: say we have a database we&rsquo;re going to use to generate reports. It can hang out on our Fly organization, inside of a private network with no connectivity to the world. We can spin up a Livebook instance that can talk to it, without doing any network or infrastructure engineering to make that happen.</p> <p>But wait, there&rsquo;s more. Because this is all Elixir, Livebook also allows you to connect to any running Erlang/Elixir application in your infrastructure to debug, introspect, and monitor it.</p> <p>Check out this clip of Chris McCord connecting <a href='https://rtt.fly.dev/' title=''>to an existing application</a> during the keynote:</p> <div class="youtube-container" data-exclude-render> <div class="youtube-video"> <iframe width="100%" height="100%" src="https://www.youtube.com/embed/4qoHPh0obv0?start=1106" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen> </iframe> </div> </div> <p>Running a snippet of code from a laptop on a remote server is a neat trick, but Livebook is doing something deeper than that. It&rsquo;s taking advantage of Erlang/Elixir&rsquo;s native facility with cluster computation and making it available to the notebook. As a result, when we do things like auto-completing, Livebook delivers results from modules defined on the remote node itself. 
🤯</p> <h2 id='elastic-scale-with-flame' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#elastic-scale-with-flame' aria-label='Anchor'></a><span class='plain-code'>Elastic scale with FLAME</span></h2> <p>When we first introduced FLAME, the example we used was video encoding.</p> <p>Video encoding is complicated and slow enough that you&rsquo;d normally make arrangements to run it remotely or in a background job queue, or as a triggerable Lambda function. The point of FLAME is to get rid of all those steps, and give them over to the framework instead. So: we wrote our <code>ffmpeg</code> calls inline like normal code, as if they were going to complete in microseconds, and wrapped them in <code>FLAME.call</code> blocks. That was it, that was the demo.</p> <p>Here, we&rsquo;re going to put a little AI spin on it.</p> <p>The first thing we&rsquo;re doing here is driving FLAME pools from Livebook. Livebook will automatically synchronize your notebook dependencies as well as any module or code defined in your notebook across nodes. That means any code we write in our notebook can be dispatched transparently out to arbitrarily many compute nodes, without ceremony.</p> <p>Now let&rsquo;s add some AI flair. We take an object store bucket full of video files. We use <code>ffmpeg</code> to extract stills from the video at different moments. 
Then: we send them to <a href='https://www.llama.com/' title=''>Llama</a>, running on <a href='https://fly.io/gpu' title=''>GPU Fly Machines</a> (still locked to our organization), to get descriptions of the stills.</p> <p>All those stills and descriptions get streamed back to our notebook, in real time:</p> <div class="youtube-container" data-exclude-render> <div class="youtube-video"> <iframe width="100%" height="100%" src="https://www.youtube.com/embed/4qoHPh0obv0?start=1692" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen> </iframe> </div> </div> <p>At the end, the descriptions are sent to <a href='https://mistral.ai/' title=''>Mistral</a>, which builds a summary.</p> <p>Thanks to FLAME, we get explicit control over the minimum and maximum number of nodes running at once, as well as their concurrency settings. As nodes finish processing each video, new ones are automatically sent to them, until the whole bucket has been traversed. 
Each node will automatically shut down after an idle timeout and the whole cluster terminates if you disconnect the Livebook runtime.</p> <p>Just like your app code, FLAME lets you take your notebook code designed to run locally, change almost nothing, and elastically execute it across ephemeral infrastructure.</p> <h2 id='64-gpus-hyperparameter-tuning-on-a-laptop' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#64-gpus-hyperparameter-tuning-on-a-laptop' aria-label='Anchor'></a><span class='plain-code'>64-GPUs hyperparameter tuning on a laptop</span></h2> <p>Next, Chris Grainger, CTO of <a href='https://amplified.ai/' title=''>Amplified</a>, takes the stage.</p> <p>For work at Amplified, Chris wants to analyze a gigantic archive of patents, on behalf of a client doing edible cannabinoid work. To do that, he uses a BERT model (BERT, from Google, is one of the OG &ldquo;transformer&rdquo; models, optimized for text comprehension).</p> <p>To make the BERT model effective for this task, he&rsquo;s going to do a hyperparameter training run.</p> <p>This is a much more complicated AI task than the Llama work we just showed. Chris is going to generate a cluster of 64 GPU Fly Machines, each running an <a href='https://fly.io/blog/cutting-prices-for-l40s-gpus-in-half/' title=''>L40s GPU</a>. On each of these nodes, he needs to:</p> <ul> <li>set up its environment (including native dependencies and GPU bindings) </li><li>load the training data </li><li>compile a different version of BERT with different parameters, optimizers, etc. </li><li>start the fine-tuning </li><li>stream its results in real-time to each assigned chart </li></ul> <p>Here&rsquo;s the clip. You&rsquo;ll see the results stream in, in real time, directly back to his Livebook. 
We&rsquo;ll wait, because it won&rsquo;t take long to watch:</p> <div class="youtube-container" data-exclude-render> <div class="youtube-video"> <iframe width="100%" height="100%" src="https://www.youtube.com/embed/4qoHPh0obv0?start=3344" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen> </iframe> </div> </div> <h2 id='this-is-just-the-beginning' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#this-is-just-the-beginning' aria-label='Anchor'></a><span class='plain-code'>This is just the beginning</span></h2> <p>The suggestion of mixing Livebook and FLAME to elastically scale notebook execution was originally proposed by Chris Grainger during ElixirConf EU. During the next four months, Jonatan Kłosko, Chris McCord, and José Valim worked part-time on making it a reality in time for ElixirConf US. Our ability to deliver such a rich combination of features in such a short period of time is a testament to the capabilities of the Erlang Virtual Machine, which Elixir and Livebook run on. Other features, such as <a href='https://github.com/elixir-explorer/explorer/issues/932' title=''>remote dataframes and distributed GC</a>, were implemented in a weekend. Bringing the same functionality to other ecosystems would take several additional months, sometimes accompanied by millions in funding, and oftentimes as part of a closed-source product.</p> <p>Furthermore, since we announced this feature, <a href='https://github.com/mruoss' title=''>Michael Ruoss</a> stepped in and brought the same functionality to Kubernetes. From Livebook v0.14.1, you can start Livebook runtimes inside a Kubernetes cluster and also use FLAME to elastically scale them. 
Expect more features and news in this space!</p> <p>Finally, Fly&rsquo;s infrastructure played a key role in making it possible to start a cluster of GPUs in seconds rather than minutes, and all it requires is a Docker image. We&rsquo;re looking forward to seeing how other technologies and notebook platforms can leverage Fly to also elevate their developer experiences.</p> <figure class="post-cta"> <figcaption> <h1>Launch a GPU app in seconds</h1> <p>Run your own LLMs or use Livebook for elastic GPU workflows&nbsp;✨</p> <a class="btn btn-lg" href="/gpu"> Go! &nbsp;<span class='opacity-50'>→</span> </a> </figcaption> <div class="image-container"> <img src="/static/images/cta-cat.webp" srcset="/static/images/[email protected] 2x" alt=""> </div> </figure> </content> </entry> <entry> <title>Accident Forgiveness</title> <link rel="alternate" href="https://fly.io/blog/accident-forgiveness/"/> <id>https://fly.io/blog/accident-forgiveness/</id> <published>2024-08-21T00:00:00+00:00</published> <updated>2024-08-27T21:13:01+00:00</updated> <media:thumbnail url="https://fly.io/blog/accident-forgiveness/assets/money-for-mistakes-blog-thumb.webp"/> <content type="html"><div class="lead"><p>We’re Fly.io, a new public cloud with simple, developer-friendly ergonomics. <a href="https://fly.io/speedrun" title="">Try it out; you’ll be deployed in just minutes</a>, and, as you’re about to read, with less financial risk.</p> </div> <p>Public cloud billing is terrifying.</p> <p>The premise of a public cloud &mdash; what sets it apart from a hosting provider &mdash; is 8,760 hours/year of on-tap deployable compute, storage, and networking. Cloud resources are &ldquo;elastic&rdquo;: they&rsquo;re acquired and released as needed; in the &ldquo;cloud-iest&rdquo; apps, without human intervention. 
Public cloud resources behave like utilities, and that&rsquo;s how they&rsquo;re priced.</p> <p>You probably can&rsquo;t tell me how much electricity your home is using right now, and may only come within tens of dollars of accurately predicting your water bill. But neither of those bills is all that scary, because you assume there&rsquo;s a limit to how much you could run them up in a single billing interval.</p> <p>That&rsquo;s not true of public clouds. There are only so many ways to &ldquo;spend&rdquo; water at your home, but there are indeterminably many ways to land on a code path that grabs another VM, or to miskey a configuration, or to leak long-running CI/CD environments every time a PR gets merged. Pick a practitioner at random, and I bet they&rsquo;ve read a story within the last couple months about someone running up a galactic-scale bill at some provider or other.</p> <h2 id='implied-accident-forgiveness' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#implied-accident-forgiveness' aria-label='Anchor'></a><span class='plain-code'>Implied Accident Forgiveness</span></h2> <p>For people who don&rsquo;t do a lot of cloud work, what all this means is that every PR push sets off a little alarm in the back of their heads: &ldquo;you may have just incurred $200,000 of costs!&rdquo;. The alarm is quickly silenced, though it&rsquo;s still subtly extracting a cortisol penalty. 
But by deadening the nerves that sense the danger of unexpected charges, those people are nudged closer to themselves being the next story on Twitter about an accidental $200,000 bill.</p> <p>The saving grace here, which you&rsquo;ll learn if you ever become that $200,000 story, is that nobody pays those bills.</p> <p>See, what cloud-savvy people know already is that providers have billing support teams, which spend a big chunk of their time conceding disputed bills. If you do something luridly stupid and rack up costs, AWS and GCP will probably cut you a break. We will too. Everyone does.</p> <p>If you didn&rsquo;t already know this, you&rsquo;re welcome; I&rsquo;ve made your life a little better, even if you don&rsquo;t run things on Fly.io.</p> <p>But as soothing as it is to know you can get a break from cloud providers, the billing situation here is still a long ways away from &ldquo;good&rdquo;. If you accidentally add a zero to a scale count and don&rsquo;t notice for several weeks, AWS or GCP will probably cut you a break. But they won&rsquo;t <em>definitely</em> do it, and even though your odds are good, you&rsquo;re still finding out at email- and phone-tag scale speeds. That&rsquo;s not fun!</p> <h2 id='explicit-accident-forgiveness' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#explicit-accident-forgiveness' aria-label='Anchor'></a><span class='plain-code'>Explicit Accident Forgiveness</span></h2> <p>Charging you for stuff you didn&rsquo;t want is bad business.</p> <p>Good business, we think, means making you so comfortable with your cloud you try new stuff. You, and everyone else on your team. Without a chaperone from the finance department.</p> <p>So we&rsquo;re going to do the work to make this official. 
If you&rsquo;re a customer of ours, we&rsquo;re going to charge you in exacting detail for every resource you intentionally use of ours, but if something blows up and you get an unexpected bill, we&rsquo;re going to let you off the hook.</p> <h2 id='not-so-fast' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#not-so-fast' aria-label='Anchor'></a><span class='plain-code'>Not So Fast</span></h2> <p>This is a Project, with a capital P. While we&rsquo;re kind of kicking ourselves for not starting it earlier, there are reasons we couldn&rsquo;t do it back in 2020.</p> <p>The Fully Automated Accident-Forgiving Billing System of the Future (which we are in fact building and may even one day ship) will give you a line-item veto on your invoice. We are a long ways away. The biggest reason is fraud.</p> <p>Sit back, close your eyes, and try to think about everything public clouds do to make your life harder. Chances are, most of those things are responses to fraud. Cloud platforms attract fraudsters like ants to an upturned ice cream cone. Thanks to the modern science of cryptography, fraudsters have had a 15 year head start on turning metered compute into picodollar-granular near-money assets.</p> <p>Since there&rsquo;s no bouncer at the door checking IDs here, an open-ended and automated commitment to accident forgiveness is, with iron certainty, going to be used overwhelmingly in order to trick us into &ldquo;forgiving&rdquo; cryptocurrency miners. We&rsquo;re cloud platform engineers. 
They&rsquo;re our primary pathogen.</p> <p>So, we&rsquo;re going to roll this out incrementally.</p> <div class="callout"><p><strong class="font-semibold text-navy-950">Why not billing alerts?</strong> We’ll get there, but here too there are reasons we haven’t yet: (1) meaningful billing alerts were incredibly difficult to do with our previous billing system, and building the new system and migrating our customers to it has been a huge lift, a nightmare from which we are only now waking (the billing system’s official name); and (2) we’re wary about alerts being a product design cop-out; if we can alert on something, why aren’t we fixing it?</p> </div><h2 id='accident-forgiveness-v0-84beta' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#accident-forgiveness-v0-84beta' aria-label='Anchor'></a><span class='plain-code'>Accident Forgiveness v0.84beta</span></h2> <p>All the same subtextual, implied reassurances that every cloud provider offers remain in place at Fly.io. You are strictly better off after this announcement, we promise.</p> <div class="right-sidenote"><p>I added the “almost” right before publishing, because I’m chicken.</p> </div> <p>Now: for customers that have a support contract with us, at any level, there&rsquo;s something new: I&rsquo;m saying the quiet part loud. The next time you see a bill with an unexpected charge on it, we&rsquo;ll refund that charge, (almost) no questions asked.</p> <p>That policy is so simple it feels anticlimactic to write. So, some additional color commentary:</p> <p>We&rsquo;re not advertising a limit to the number of times you can do this. If you&rsquo;re a serious customer of ours, I promise that you cannot remotely fathom the fullness of our fellow-feeling. 
You&rsquo;re not annoying us by getting us to refund unexpected charges. If you are growing a project on Fly.io, we will bend over backwards to keep you growing.</p> <p>How far can we take this? How simple can we keep this policy? We&rsquo;re going to find out together.</p> <p>To begin with, and in the spirit of &ldquo;doing things that won&rsquo;t scale&rdquo;, when we forgive a bill, what&rsquo;s going to happen next is this: I&rsquo;m going to set an irritating personal reminder for Kurt to look into what happened, now and then the day before your next bill, so we can see what&rsquo;s going wrong. He&rsquo;s going to hate that, which is the point: our best feature work is driven by Kurt-hate.</p> <p>Obviously, if you&rsquo;re rubbing your hands together excitedly over the opportunity this policy presents, then, well, not so much with the fellow-feeling. We reserve the right to cut you off.</p> <figure class="post-cta"> <figcaption> <h1>Support For Developers, By Developers</h1> <p>Explicit Accident Forgiveness is just one thing we like about Support at Fly.io.</p> <a class="btn btn-lg" href="https://fly.io/accident-forgiveness"> Go find out! &nbsp;<span class='opacity-50'>→</span> </a> </figcaption> <div class="image-container"> <img src="/static/images/cta-kitty.webp" srcset="/static/images/[email protected] 2x" alt=""> </div> </figure> <h2 id='whats-next-accident-protection' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#whats-next-accident-protection' aria-label='Anchor'></a><span class='plain-code'>What&rsquo;s Next: Accident Protection</span></h2> <p>We think this is a pretty good first step. But that&rsquo;s all it is.</p> <p>We can do better than offering you easy refunds for mistaken deployments and botched CI/CD jobs. 
What&rsquo;s better than getting a refund is never incurring the charge to begin with, and that&rsquo;s the next step we&rsquo;re working on.</p> <div class="right-sidenote"><p>More to come on that billing system.</p> </div> <p>We built a new billing system so that we can do things like that. For instance: we&rsquo;re in a position to catch sudden spikes in your month-over-month bills, flag them, and catch weird-looking deployments before we bill for them.</p> <p>Another thing we rebuilt billing for is <a href='https://community.fly.io/t/reservation-blocks-40-discount-on-machines-when-youre-ready-to-commit/20858' title=''>reserved pricing</a>. Already today you can get a steep discount from us reserving blocks of compute in advance. The trick to taking advantage of reserved pricing is confidently predicting a floor to your usage. For a lot of people, that means fighting feelings of loss aversion (nobody wants to get gym priced!). So another thing we can do in this same vein: catch opportunities to move customers to reserved blocks, and offer backdated reservations. We&rsquo;ll figure this out too.</p> <p>Someday, when we&rsquo;re in a monopoly position, our founders have all been replaced by ruthless MBAs, and Kurt has retired to farm coffee beans in lower Montana, we may stop doing this stuff. But until that day this is the right choice for our business.</p> <p>Meanwhile: like every public cloud, we provision our own hardware, and we have excess capacity. Your messed-up CI/CD jobs didn&rsquo;t really cost us anything, so if you didn&rsquo;t really want them, they shouldn&rsquo;t cost you anything either. Take us up on this! 
We love talking to you.</p> </content> </entry> <entry> <title>We're Cutting L40S Prices In Half</title> <link rel="alternate" href="https://fly.io/blog/cutting-prices-for-l40s-gpus-in-half/"/> <id>https://fly.io/blog/cutting-prices-for-l40s-gpus-in-half/</id> <published>2024-08-15T00:00:00+00:00</published> <updated>2024-08-16T02:01:46+00:00</updated> <media:thumbnail url="https://fly.io/blog/cutting-prices-for-l40s-gpus-in-half/assets/gpu-ga-thumb.webp"/> <content type="html"><div class="lead"><p>We’re Fly.io, a new public cloud with simple, developer-friendly ergonomics. And as of today, cheaper GPUs. <a href="https://fly.io/speedrun" title="">Try it out; you’ll be deployed in just minutes</a>.</p> </div> <p>We just lowered the prices on NVIDIA L40S GPUs to $1.25 per hour. Why? Because our feet are cold and we burn processor cycles for heat. But also other reasons.</p> <p>Let&rsquo;s back up.</p> <p>We offer 4 different NVIDIA GPU models; in increasing order of performance, they&rsquo;re the A10, the L40S, the 40G PCI A100, and the 80G SXM A100. Guess which one is most popular.</p> <p>We guessed wrong, and spent a lot of time working out how to maximize the amount of GPU power we could deliver to a single Fly Machine. Users surprised us. By a wide margin, the most popular GPU in our inventory is the A10.</p> <p>The A10 is an older generation of NVIDIA GPU with fewer, slower cores and less memory. It&rsquo;s the least capable GPU we offer. But that doesn&rsquo;t matter, because it&rsquo;s capable enough. It&rsquo;s solid for random inference tasks, and handles mid-sized generative AI stuff like Mistral Nemo or Stable Diffusion. For those workloads, there&rsquo;s not that much benefit in getting a beefier GPU.</p> <p>As a result, we can&rsquo;t get new A10s in fast enough for our users.</p> <p>If there&rsquo;s one thing we&rsquo;ve learned by talking to our customers over the last 4 years, it&rsquo;s that y&#39;all love a peek behind the curtain. 
So we&rsquo;re going to let you in on a little secret about how a hardware provider like Fly.io formulates GPU strategy: none of us know what the hell we&rsquo;re doing.</p> <p>If you had asked us in 2023 what the biggest GPU problem we could solve was, we&rsquo;d have said &ldquo;selling fractional A100 slices&rdquo;. We burned a whole quarter trying to get MIG, or at least vGPUs, working through IOMMU PCI passthrough on Fly Machines, in a project so cursed that Thomas has forsworn ever programming again. Then we went to market selling whole A100s, and for several more months it looked like the biggest problem we needed to solve was finding a secure way to expose NVLink-ganged A100 clusters to VMs so users could run training. Then H100s; can we find H100s anywhere? Maybe in a black market in Shenzhen?</p> <p>And here we are, a year later, looking at the data, and the least sexy, least interesting GPU part in the catalog is where all the action is.</p> <p>With actual customer data to back up the hypothesis, here&rsquo;s what we think is happening today:</p> <ul> <li>Most users who want to plug GPU-accelerated AI workloads into fast networks are doing inference, not training. </li><li>The hyperscaler public clouds strangle these customers, first with GPU instance surcharges, and then with egress fees for object storage data when those customers try to outsource the GPU stuff to GPU providers. </li><li>If you&rsquo;re trying to do something GPU-accelerated in response to an HTTP request, the right combination of GPU, instance RAM, fast object storage for datasets and model parameters, and networking is much more important than getting your hands on an H100. </li></ul> <p>This is a thing we didn&rsquo;t see coming, but should have: training workloads tend to look more like batch jobs, and inference tends to look more like transactions. Batch training jobs aren&rsquo;t that sensitive to networking or even reliability. 
Live inference jobs responding to end-user HTTP requests are. So, given our pricing, of course the A10s are a sweet spot.</p> <p>The next step up in our lineup after the A10 is the L40S. The L40S is a nice piece of kit. We&rsquo;re going to take a beat here and sell you on the L40S, because it&rsquo;s kind of awesome.</p> <p>The L40S is an AI-optimized version of the L40, which is the data center version of the GeForce RTX 4090, resembling two 4090s stapled together.</p> <p>If you&rsquo;re not a GPU hardware person, the RTX 4090 is a gaming GPU, the kind you&rsquo;d play ray-traced Witcher 3 on. NVIDIA&rsquo;s high-end gaming GPUs are actually reasonably good at AI workloads! But they suck in a data center rack: they chug power, they&rsquo;re hard to cool, and they&rsquo;re less dense. Also, NVIDIA can&rsquo;t charge as much for them.</p> <p>Hence the L40: (much) more memory, less energy consumption, designed for a rack, not a tower case. Marked up for &ldquo;enterprise&rdquo;.</p> <p>NVIDIA positioned the L40 as a kind of &ldquo;graphics&rdquo; AI GPU. Unlike the super high-end cards like the A100/H100, the L40 keeps all the rendering hardware, so it&rsquo;s good for 3D graphics and video processing. Which is sort of what you&rsquo;d expect from a &ldquo;professionalized&rdquo; GeForce card.</p> <p>A funny thing happened in the middle of 2023, though: the market for ultra-high-end NVIDIA cards went absolutely batshit. The huge cards you&rsquo;d gang up for training jobs got impossible to find, and NVIDIA became one of the most valuable companies in the world. Serious shops started working out plans to acquire groups of L40-type cards to work around the problem, whether or not they had graphics workloads.</p> <p>The only company in this space that does know what they&rsquo;re doing is NVIDIA. Nobody has written a highly-ranked Reddit post about GPU workloads without NVIDIA noticing and creating a new SKU. 
So they launched the L40S, which is an L40 with AI workload compute performance comparable to that of the A100 (without us getting into the details of F32 vs. F16 models).</p> <p>Long story short, the L40S is an A100-performer that we can price for A10 customers; the Volkswagen GTI of our lineup. We&rsquo;re going to see if we can make that happen.</p> <p>We think the combination of just-right-sized inference GPUs and Tigris object storage is pretty killer:</p> <ul> <li>model parameters, data sets, and compute are all close together </li><li>everything plugged into an Anycast network that&rsquo;s fast everywhere in the world </li><li>on VM instances that have enough memory to actually run real frameworks on </li><li>priced like we actually want you to use it. </li></ul> <p>You should use L40S cards without thinking hard about it. So we&rsquo;re making it official. You won&rsquo;t pay us a dime extra to use one instead of an A10. Have at it! Revolutionize the industry. For $1.25 an hour.</p> <p>Here are things you can do with an L40S on Fly.io today:</p> <ul> <li>You can run Llama 3.1 70B — a big Llama — for LLM jobs. </li><li>You can run Flux from Black Forest Labs for genAI images. </li><li>You can run Whisper for automated speech recognition. </li><li>You can do whole-genome alignment with SegAlign (Thomas&rsquo; biochemist kid who has been snatching free GPU hours for his lab gave us this one, and we&rsquo;re taking his word for it). </li><li>You can run DOOM Eternal, building the Stadia that Google couldn&rsquo;t pull off, because the L40S hasn&rsquo;t forgotten that it&rsquo;s a graphics GPU. </li></ul> <p>It&rsquo;s going to get chilly in Chicago in a month or so. Go light some cycles on fire! 
</p> </content> </entry> <entry> <title>Making Machines Move</title> <link rel="alternate" href="https://fly.io/blog/machine-migrations/"/> <id>https://fly.io/blog/machine-migrations/</id> <published>2024-07-30T00:00:00+00:00</published> <updated>2024-08-07T00:54:26+00:00</updated> <media:thumbnail url="https://fly.io/blog/machine-migrations/assets/migrations-thumb.webp"/> <content type="html"><div class="lead"><p>We’re Fly.io, a global public cloud with simple, developer-friendly ergonomics. If you’ve got a working Docker image, we’ll transmogrify it into a Fly Machine: a VM running on our hardware anywhere in the world. <a href="https://fly.io/speedrun" title="">Try it out; you’ll be deployed in just minutes</a>.</p> </div> <p>At the heart of our platform is a systems design tradeoff about durable storage for applications. When we added storage three years ago, to support stateful apps, we built it on attached NVMe drives. A benefit: a Fly App accessing a file on a Fly Volume is never more than a bus hop away from the data. A cost: a Fly App with an attached Volume is anchored to a particular worker physical.</p> <div class="right-sidenote"><p><code>bird</code>: a BGP4 route server.</p> </div> <p>Before offering attached storage, our on-call runbook was almost as simple as “de-bird that edge server”, “tell <a href='https://fly.io/blog/carving-the-scheduler-out-of-our-orchestrator/' title=''>Nomad</a> to drain that worker”, and “go back to sleep”. NVMe cost us that drain operation, which terribly complicated the lives of our infra team. We’ve spent the last year getting “drain” back. 
It’s one of the biggest engineering lifts we&rsquo;ve made, and if you didn’t notice, we lifted it cleanly.</p> <h3 id='the-goalposts' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#the-goalposts' aria-label='Anchor'></a><span class='plain-code'>The Goalposts</span></h3> <p>With stateless apps, draining a worker is easy. For each app instance running on the victim server, start a new instance elsewhere. Confirm it&rsquo;s healthy, then kill the old one. Rinse, repeat. At our 2020 scale, we could drain a fully loaded worker in just a handful of minutes.</p> <p>You can see why this process won&rsquo;t work for apps with attached volumes. Sure, create a new volume elsewhere on the fleet, and boot up a new Fly Machine attached to it. But the new volume is empty. The data&rsquo;s still stuck on the original worker. We asked, and customers were not OK with this kind of migration.</p> <p>Of course, we back Volume snapshots up (at an interval) to off-network storage. But for “drain”, restoring backups isn&rsquo;t nearly good enough. No matter the backup interval, a “restore from backup” migration will lose data, and a “backup and restore” migration incurs untenable downtime.</p> <p>The next thought you have is, “OK, copy the volume over”. And, yes, of course you have to do that. But you can’t just <code>copy</code>, <code>boot</code>, and then <code>kill</code> the old Fly Machine. Because the original Fly Machine is still alive and writing, you have to <code>kill</code> first, then <code>copy</code>, then <code>boot</code>.</p> <p>Fly Volumes can get pretty big. Even to a rack buddy physical server, you&rsquo;ll hit a point where draining incurs minutes of interruption, especially if you’re moving lots of volumes simultaneously. 
<code>Kill</code>, <code>copy</code>, <code>boot</code> is too slow.</p> <div class="callout"><p>There’s a world where even 15 minutes of interruption is tolerable. It’s the world where you run more than one instance of your application to begin with, so prolonged interruption of a single Fly Machine isn’t visible to your users. Do this! But we have to live in the same world as our customers, many of whom don’t run in high-availability configurations.</p> </div><h3 id='behold-the-clone-o-mat' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#behold-the-clone-o-mat' aria-label='Anchor'></a><span class='plain-code'>Behold The Clone-O-Mat</span></h3> <p><code>Copy</code>, <code>boot</code>, <code>kill</code> loses data. <code>Kill</code>, <code>copy</code>, <code>boot</code> takes too long. What we needed is a new operation: <code>clone</code>.</p> <p><code>Clone</code> is a lazier, asynchronous <code>copy</code>. It creates a new volume elsewhere on our fleet, just like <code>copy</code> would. But instead of blocking, waiting to transfer every byte from the original volume, <code>clone</code> returns immediately, with a transfer running in the background.</p> <p>A new Fly Machine can be booted with that cloned volume attached. Its blocks are mostly empty. But that’s OK: when the new Fly Machine tries to read from it, the block storage system works out whether the block has been transferred. If it hasn’t, it’s fetched over the network from the original volume; this is called &ldquo;hydration&rdquo;. Writes are even easier, and don’t hit the network at all.</p> <p><code>Kill</code>, <code>copy</code>, <code>boot</code> is slow. 
But <code>kill</code>, <code>clone</code>, <code>boot</code> is fast; it can be made asymptotically as fast as stateless migration.</p> <p>There are three big moving pieces to this design.</p> <ol> <li>First, we have to rig up our OS storage system to make this <code>clone</code> operation work. </li><li>Then, to read blocks over the network, we need a network protocol. (Spoiler: iSCSI, though we tried other stuff first.) </li><li>Finally, the gnarliest of those pieces is our orchestration logic: what’s running where, what state is it in, and whether it’s plugged in correctly. </li></ol> <h3 id='block-level-clone' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#block-level-clone' aria-label='Anchor'></a><span class='plain-code'>Block-Level Clone</span></h3> <p>The Linux feature we need to make this work already exists; <a href='https://docs.kernel.org/admin-guide/device-mapper/dm-clone.html' title=''>it’s called <code>dm-clone</code></a>. Given an existing, readable storage device, <code>dm-clone</code> gives us a new device, of identical size, where reads of uninitialized blocks will pull from the original. It sounds terribly complicated, but it’s actually one of the simpler kernel lego bricks. Let&rsquo;s demystify it.</p> <p>As far as Unix is concerned, random-access storage devices, be they spinning rust or NVMe drives, are all instances of the common class “block device”. 
A block device is addressed in fixed-size (say, 4KiB) chunks, and <a href='https://elixir.bootlin.com/linux/v5.11.11/source/include/linux/blk_types.h#L356' title=''>handles (roughly) these operations</a>:</p> <div class="highlight-wrapper group relative cpp"> <button type="button" class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-wrap-target="#code-aokru06k" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"><g buffered-rendering="static"><path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /><path d="M11.081 6.466L9.533 8.037l1.548 1.571" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"> Wrap text </span> </button> <button type="button" class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-copy-target="sibling" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"><g buffered-rendering="static"><path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /><path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /></g></svg> <span class="bubble-sm bubble-tl 
[--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"> Copy to clipboard </span> </button> <div class='highlight relative group'> <pre class='highlight '><code id="code-aokru06k"><span class="k">enum</span> <span class="n">req_opf</span> <span class="p">{</span> <span class="cm">/* read sectors from the device */</span> <span class="n">REQ_OP_READ</span> <span class="o">=</span> <span class="mi">0</span><span class="p">,</span> <span class="cm">/* write sectors to the device */</span> <span class="n">REQ_OP_WRITE</span> <span class="o">=</span> <span class="mi">1</span><span class="p">,</span> <span class="cm">/* flush the volatile write cache */</span> <span class="n">REQ_OP_FLUSH</span> <span class="o">=</span> <span class="mi">2</span><span class="p">,</span> <span class="cm">/* discard sectors */</span> <span class="n">REQ_OP_DISCARD</span> <span class="o">=</span> <span class="mi">3</span><span class="p">,</span> <span class="cm">/* securely erase sectors */</span> <span class="n">REQ_OP_SECURE_ERASE</span> <span class="o">=</span> <span class="mi">5</span><span class="p">,</span> <span class="cm">/* write the same sector many times */</span> <span class="n">REQ_OP_WRITE_SAME</span> <span class="o">=</span> <span class="mi">7</span><span class="p">,</span> <span class="cm">/* write the zero filled sector many times */</span> <span class="n">REQ_OP_WRITE_ZEROES</span> <span class="o">=</span> <span class="mi">9</span><span class="p">,</span> <span class="cm">/* ... */</span> <span class="p">};</span> </code></pre> </div> </div> <p>You can imagine designing a simple network protocol that supported all these options. It might have messages that looked something like:</p> <p><img alt="A packet diagram, just skip down to &quot;struct bio&quot; below" src="/blog/machine-migrations/assets/packet.png?2/3&amp;center" /> Good news! The Linux block system is organized as if your computer was a network running a protocol that basically looks just like that. 
Here’s the message structure:</p> <div class="right-sidenote"><p>I’ve <a href="https://elixir.bootlin.com/linux/v5.11.11/source/include/linux/blk_types.h#L223" title="">stripped a bunch of stuff out of here</a> but you don’t need any of it to understand what’s coming next.</p> </div><div class="highlight-wrapper group relative cpp"> <button type="button" class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-wrap-target="#code-6neynwnf" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"><g buffered-rendering="static"><path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /><path d="M11.081 6.466L9.533 8.037l1.548 1.571" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"> Wrap text </span> </button> <button type="button" class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-copy-target="sibling" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"><g buffered-rendering="static"><path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /><path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 
0-1.815.819-1.816 1.815v1.617" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"> Copy to clipboard </span> </button> <div class='highlight relative group'> <pre class='highlight '><code id="code-6neynwnf"><span class="cm">/* * main unit of I/O for the block layer and lower layers (ie drivers and * stacking drivers) */</span> <span class="k">struct</span> <span class="nc">bio</span> <span class="p">{</span> <span class="k">struct</span> <span class="nc">gendisk</span> <span class="o">*</span><span class="n">bi_disk</span><span class="p">;</span> <span class="kt">unsigned</span> <span class="kt">int</span> <span class="n">bi_opf</span><span class="p">;</span> <span class="kt">unsigned</span> <span class="kt">short</span> <span class="n">bi_flags</span><span class="p">;</span> <span class="kt">unsigned</span> <span class="kt">short</span> <span class="n">bi_ioprio</span><span class="p">;</span> <span class="n">blk_status_t</span> <span class="n">bi_status</span><span class="p">;</span> <span class="kt">unsigned</span> <span class="kt">short</span> <span class="n">bi_vcnt</span><span class="p">;</span> <span class="cm">/* how many bio_vec's */</span> <span class="k">struct</span> <span class="nc">bio_vec</span> <span class="n">bi_inline_vecs</span><span class="p">[]</span> <span class="cm">/* (page, len, offset) tuples */</span><span class="p">;</span> <span class="p">};</span> </code></pre> </div> </div> <p>No nerd has ever looked at a fixed-format message like this without thinking about writing a proxy for it, and <code>struct bio</code> is no exception. The proxy system in the Linux kernel for <code>struct bio</code> is called <code>device mapper</code>, or DM.</p> <p>DM target devices can plug into other DM devices. For that matter, they can do whatever the hell else they want, as long as they honor the interface. 
It boils down to a <code>map(bio)</code> function, which can dispatch a <code>struct bio</code>, or drop it, or muck with it and ask the kernel to resubmit it.</p> <p>You can do a whole lot of stuff with this interface: carve a big device into a bunch of smaller ones (<a href='https://docs.kernel.org/admin-guide/device-mapper/linear.html' title=''><code>dm-linear</code></a>), make one big striped device out of a bunch of smaller ones (<a href='https://docs.kernel.org/admin-guide/device-mapper/striped.html' title=''><code>dm-stripe</code></a>), do software RAID mirroring (<code>dm-raid1</code>), create snapshots of arbitrary existing devices (<a href='https://docs.kernel.org/admin-guide/device-mapper/snapshot.html' title=''><code>dm-snap</code></a>), cryptographically verify boot devices (<a href='https://docs.kernel.org/admin-guide/device-mapper/verity.html' title=''><code>dm-verity</code></a>), and a bunch more. Device Mapper is the kernel backend for the <a href='https://sourceware.org/lvm2/' title=''>userland LVM2 system</a>, which is how we do <a href='https://fly.io/blog/persistent-storage-and-fast-remote-builds/' title=''>thin pools and snapshot backups</a>.</p> <p>Which brings us to <code>dm-clone</code> : it’s a map function that boils down to:</p> <div class="highlight-wrapper group relative cpp"> <button type="button" class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-wrap-target="#code-rj5y343v" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"><g buffered-rendering="static"><path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 
00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /><path d="M11.081 6.466L9.533 8.037l1.548 1.571" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"> Wrap text </span> </button> <button type="button" class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-copy-target="sibling" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"><g buffered-rendering="static"><path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /><path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"> Copy to clipboard </span> </button> <div class='highlight relative group'> <pre class='highlight '><code id="code-rj5y343v"> <span class="cm">/* ... 
*/</span> <span class="n">region_nr</span> <span class="o">=</span> <span class="n">bio_to_region</span><span class="p">(</span><span class="n">clone</span><span class="p">,</span> <span class="n">bio</span><span class="p">);</span> <span class="c1">// we have the data</span> <span class="k">if</span> <span class="p">(</span><span class="n">dm_clone_is_region_hydrated</span><span class="p">(</span><span class="n">clone</span><span class="o">-&gt;</span><span class="n">cmd</span><span class="p">,</span> <span class="n">region_nr</span><span class="p">))</span> <span class="p">{</span> <span class="n">remap_and_issue</span><span class="p">(</span><span class="n">clone</span><span class="p">,</span> <span class="n">bio</span><span class="p">);</span> <span class="k">return</span> <span class="mi">0</span><span class="p">;</span> <span class="c1">// we don't and it's a read</span> <span class="p">}</span> <span class="k">else</span> <span class="k">if</span> <span class="p">(</span><span class="n">bio_data_dir</span><span class="p">(</span><span class="n">bio</span><span class="p">)</span> <span class="o">==</span> <span class="n">READ</span><span class="p">)</span> <span class="p">{</span> <span class="n">remap_to_source</span><span class="p">(</span><span class="n">clone</span><span class="p">,</span> <span class="n">bio</span><span class="p">);</span> <span class="k">return</span> <span class="mi">1</span><span class="p">;</span> <span class="p">}</span> <span class="c1">// we don't and it's a write</span> <span class="n">remap_to_dest</span><span class="p">(</span><span class="n">clone</span><span class="p">,</span> <span class="n">bio</span><span class="p">);</span> <span class="n">hydrate_bio_region</span><span class="p">(</span><span class="n">clone</span><span class="p">,</span> <span class="n">bio</span><span class="p">);</span> <span class="k">return</span> <span class="mi">0</span><span class="p">;</span> <span class="cm">/* ... 
*/</span> </code></pre> </div> </div><div class="right-sidenote"><p>a <a href="https://docs.kernel.org/admin-guide/device-mapper/kcopyd.html" title=""><code>kcopyd</code></a> thread runs in the background, rehydrating the device in addition to (and independent of) read accesses.</p> </div> <p><code>dm-clone</code> takes, in addition to the source device to clone from, a “metadata” device on which is stored a bitmap of the status of all the blocks: either “rehydrated” from the source, or not. That’s how it knows whether to fetch a block from the original device or the clone.</p> <h3 id='network-clone' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#network-clone' aria-label='Anchor'></a><span class='plain-code'>Network Clone</span></h3><div class="callout"><p><strong class="font-semibold text-navy-950"><code>flyd</code> in a nutshell:</strong> worker physicals run a service, <code>flyd</code>, which manages a couple of databases that are the source of truth for all the Fly Machines running there. Conceptually, <code>flyd</code> is a server for on-demand instances of durable finite state machines, each representing some operation on a Fly Machine (creation, start, stop, &amp;c), with the transition steps recorded carefully in a BoltDB database. An FSM step might be something like “assign a local IPv6 address to this interface”, or “check out a block device with the contents of this container”, and it’s straightforward to add and manage new ones.</p> </div> <p>Say we&rsquo;ve got <code>flyd</code> managing a Fly Machine with a volume on <code>worker-xx-cdg1-1</code>. We want it running on <code>worker-xx-cdg1-2</code>. Our whole fleet is meshed with WireGuard; everything can talk directly to everything else. 
So, conceptually:</p> <ol> <li><code>flyd</code> on <code>cdg1-1</code> stops the Fly Machine, and </li><li>sends a message to <code>flyd</code> on <code>cdg1-2</code> telling it to clone the source volume. </li><li><code>flyd</code> on <code>cdg1-2</code> starts a <code>dm-clone</code> instance, which creates a clone volume on <code>cdg1-2</code>, populating it, over some kind of network block protocol, from <code>cdg1-1</code>, and </li><li>boots a new Fly Machine, attached to the clone volume. </li><li><code>flyd</code> on <code>cdg1-2</code> monitors the clone operation, and, when the clone completes, converts the clone device to a simple linear device and cleans up. </li></ol> <p>For step (3) to work, the “original volume” on <code>cdg1-1</code> has to be visible on <code>cdg1-2</code>, which means we need to mount it over the network.</p> <div class="right-sidenote"><p><code>nbd</code> is so simple that it’s used as a sort of <code>dm-user</code> userland block device; to prototype a new block device, <a href="https://lwn.net/ml/linux-kernel/[email protected]/" title="">don’t bother writing a kernel module</a>, just write an <code>nbd</code> server.</p> </div> <p>Take your pick of protocols. iSCSI is the obvious one, but it’s relatively complicated, and Linux has native support for a much simpler one: <code>nbd</code>, the “network block device”. You could implement an <code>nbd</code> server in an afternoon, on top of a file or a SQLite database or S3, and the Linux kernel could mount it as a drive.</p> <p>We started out using <code>nbd</code>. But we kept getting stuck <code>nbd</code> kernel threads when there was any kind of network disruption. We’re a global public cloud; network disruption happens. Honestly, we could have debugged our way through this. 
But it was simpler just to spike out an iSCSI implementation, observe that it didn’t get jammed up when the network hiccuped, and move on.</p> <h3 id='putting-the-pieces-together' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#putting-the-pieces-together' aria-label='Anchor'></a><span class='plain-code'>Putting The Pieces Together</span></h3> <p>To drain a worker with minimal downtime and no lost data, we turn workers into temporary SANs, serving the volumes we need to drain to fresh-booted replica Fly Machines on a bunch of “target” physicals. Those SANs (combinations of <code>dm-clone</code>, iSCSI, and <a href='https://fly.io/blog/carving-the-scheduler-out-of-our-orchestrator/' title=''>our <code>flyd</code> orchestrator</a>) track the blocks copied from the origin, copying each one exactly once and cleaning up when the original volume has been fully copied.</p> <p>Problem solved!</p> <h3 id='no-there-were-more-problems' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#no-there-were-more-problems' aria-label='Anchor'></a><span class='plain-code'>No, There Were More Problems</span></h3> <p>When your problem domain is hard, anything you build whose design you can’t fit completely in your head is going to be a fiasco. Shorter form: “if you see Raft consensus in a design, we’ve done something wrong”.</p> <p>A virtue of this migration system is that, for as many moving pieces as it has, it fits in your head. What complexity it has is mostly shouldered by strategic bets we’ve already built teams around, most notably the <code>flyd</code> orchestrator. 
So we’ve been running this system for the better part of a year without much drama. Not no drama, though. Some drama.</p> <p>Example: we encrypt volumes. Our key management is fussy. We do per-volume encryption keys that provision alongside the volumes themselves, so no one worker has a volume skeleton key.</p> <p>If you think “migrating those volume keys from worker to worker” is the problem I’m building up to, well, that too, but the bigger problem is <code>trim</code>.</p> <p>Most people use just a small fraction of the volumes they allocate. A 100GiB volume with just 5MiB used wouldn’t be at all weird. You don’t want to spend minutes copying a volume that could have been fully hydrated in seconds.</p> <p>And indeed, <code>dm-clone</code> doesn’t want to do that either. Given a source block device (for us, an iSCSI mount) and the clone device, a <code>DISCARD</code> issued on the clone device will get picked up by <code>dm-clone</code>, which will simply <a href='https://elixir.bootlin.com/linux/v5.11.11/source/drivers/md/dm-clone-target.c#L1357' title=''>short-circuit the read</a> of the relevant blocks by marking them as hydrated in the metadata volume. Simple enough.</p> <p>To make that work, we need the target worker to see the plaintext of the source volume, so that it can do an <code>fstrim</code> (don’t get us started on how annoying it is to sandbox this) to read the filesystem, identify the unused blocks, and issue the <code>DISCARDs</code> where <code>dm-clone</code> can see them. Easy enough.</p> <div class="right-sidenote"><p>these curses have a lot to do with how hard it was to drain workers!</p> </div> <p>Except: two different workers, for cursed reasons, might be running different versions of <a href='https://gitlab.com/cryptsetup/cryptsetup' title=''>cryptsetup</a>, the userland bridge between LUKS2 and the <a href='https://docs.kernel.org/admin-guide/device-mapper/dm-crypt.html' title=''>kernel dm-crypt driver</a>. 
There are (or were) two different versions of cryptsetup on our network, and they default to different <a href='https://fossies.org/linux/cryptsetup/docs/on-disk-format-luks2.pdf' title=''>LUKS2 header sizes</a> — 4MiB and 16MiB. Implying two different plaintext volume sizes. </p> <p>So now part of the migration FSM is an RPC call that carries metadata about the designed LUKS2 configuration for the target VM. Not something we expected to have to build, but, whatever.</p> <div class="right-sidenote"><p>Corrosion deserves its own post.</p> </div> <p>Gnarlier example: workers are the source of truth for information about the Fly Machines running on them. Migration knocks the legs out from under that constraint, which we were relying on in Corrosion, the SWIM-gossip SQLite database we use to connect Fly Machines to our request routing. Race conditions. Debugging. Design changes. Next!</p> <p>Gnarliest example: our private networks. Recall: we automatically place every Fly Machine into <a href='https://fly.io/blog/incoming-6pn-private-networks/' title=''>a private network</a>; by default, it’s the one all the other apps in your organization run in. This is super handy for setting up background services, databases, and clustered applications. 20 lines of eBPF code in our worker kernels keeps anybody from “crossing the streams”, sending packets from one private network to another.</p> <div class="right-sidenote"><p>we’re members of an elite cadre of idiots who have managed to find designs that made us wish IPv6 addresses were even bigger.</p> </div> <p>We call this scheme 6PN (for “IPv6 Private Network”). It functions by <a href='https://fly.io/blog/ipv6-wireguard-peering#ipv6-private-networking-at-fly' title=''>embedding routing information directly into IPv6 addresses</a>. This is, perhaps, gross. But it allows us to route diverse private networks with constantly changing membership across a global fleet of servers without running a distributed routing protocol. 
As the beardy wizards who kept the Internet backbone up and running on Cisco AGS+’s once said: the best routing protocol is “static”.</p> <p>Problem: the embedded routing information in a 6PN address refers in part to specific worker servers.</p> <p>That’s fine, right? They’re IPv6 addresses. Nobody uses literal IPv6 addresses. Nobody uses IP addresses at all; they use the DNS. When you migrate a host, just give it a new 6PN address, and update the DNS.</p> <p>Friends, somebody did use literal IPv6 addresses. It was us. In the configurations for Fly Postgres clusters.</p> <div class="right-sidenote"><p>It’s also not operationally easy for us to shell into random Fly Machines, for good reason.</p> </div> <p>The obvious fix for this is not complicated; given <code>flyctl</code> ssh access to a Fly Postgres cluster, it’s like a 30 second ninja edit. But we run a <em>lot</em> of Fly Postgres clusters, and the change has to be coordinated carefully to avoid getting the cluster into a confused state. We went as far as adding a feature to our <code>init</code> to do network address mappings to keep old 6PN addresses reachable before biting the bullet and burning several weeks doing the direct configuration fix fleet-wide.</p> <figure class="post-cta"> <figcaption> <h1>Speedrun your app onto Fly.io.</h1> <p>3&hellip;2&hellip;1&hellip;</p> <a class="btn btn-lg" href="https://fly.io/speedrun"> Go! 
&nbsp;<span class='opacity-50'>→</span> </a> </figcaption> <div class="image-container"> <img src="/static/images/cta-kitty.webp" srcset="/static/images/[email protected] 2x" alt=""> </div> </figure> <h3 id='the-learning-it-burns' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#the-learning-it-burns' aria-label='Anchor'></a><span class='plain-code'>The Learning, It Burns!</span></h3> <p>We get asked a lot why we don’t do storage the “obvious” way, with an <a href='https://aws.amazon.com/ebs/' title=''>EBS-type</a> SAN fabric, abstracting it away from our compute. Locally-attached NVMe storage is an idiosyncratic choice, one we’ve had to write disclaimers for (single-node clusters can lose data!) since we first launched it.</p> <p>One answer is: we’re a startup. Building SAN infrastructure in every region we operate in would be tremendously expensive. Look at any feature in AWS that normal people know the name of, like EC2, EBS, RDS, or S3 — there’s a whole company in there. We launched storage when we were just 10 people, and even at our current size we probably have nothing resembling the resources EBS gets. AWS is pretty great!</p> <p>But another thing to keep in mind is: we’re learning as we go. And so even if we had the means to do an EBS-style SAN, we might not build it today.</p> <p>Instead, we’re a lot more interested in log-structured virtual disks (LSVD). LSVD uses NVMe as a local cache, but durably persists writes in object storage. 
You get most of the performance benefit of bus-hop disk writes, along with unbounded storage and S3-grade reliability.</p> <p><a href='https://community.fly.io/t/bottomless-s3-backed-volumes/15648' title=''>We launched LSVD experimentally last year</a>; in the intervening year, something happened to make LSVD even more interesting to us: <a href='https://www.tigrisdata.com/' title=''>Tigris Data</a> launched S3-compatible object storage in every one of our regions, so instead of backhauling updates to Northern Virginia, <a href='https://community.fly.io/t/tigris-backed-volumes/20792' title=''>we can keep them local</a>. We have more to say about LSVD, and a lot more to say about Tigris.</p> <p>Our first several months of migrations were done gingerly. By summer of 2024, we got to where our infra team could pull “drain this host” out of their toolbelt without much ceremony.</p> <p>We’re still not to the point where we’re migrating casually. Your Fly Machines are probably not getting migrated! There&rsquo;d need to be a reason! But the dream is fully-automated luxury space migration, in which you might get migrated semiregularly, as our systems work not just to drain problematic hosts but to rebalance workloads regularly. No time soon. But we’ll get there.</p> <p>This is the biggest thing our team has done since we replaced Nomad with flyd. Only the new billing system comes close. We did this thing not because it was easy, but because we thought it would be easy. It was not. But: worth it!</p> </content> </entry> <entry> <title>AWS without Access Keys</title> <link rel="alternate" href="https://fly.io/blog/oidc-cloud-roles/"/> <id>https://fly.io/blog/oidc-cloud-roles/</id> <published>2024-06-19T00:00:00+00:00</published> <updated>2024-06-27T14:03:59+00:00</updated> <media:thumbnail url="https://fly.io/blog/oidc-cloud-roles/assets/spooky-security-skeleton-thumb.webp"/> <content type="html"><div class="lead"><p>It’s dangerous to go alone. 
Fly.io runs full-stack apps by transmuting Docker containers into Fly Machines: ultra-lightweight hardware-backed VMs. You can run all your dependencies on Fly.io, but sometimes, you’ll need to work with other clouds, and we’ve made that pretty simple. Try Fly.io out for yourself; your Rails or Node app <a href="https://fly.io/speedrun" title="">can be up and running in just minutes</a>.</p> </div> <p>Let&rsquo;s hypopulate you an app serving generative AI cat images based on the weather forecast, running on a <code>g4dn.xlarge</code> ECS task in AWS <code>us-east-1</code>. It&rsquo;s going great; people didn&rsquo;t realize how dependent their cat pic prefs are on barometric pressure, and you&rsquo;re all anyone can talk about.</p> <p>Word reaches Australia and Europe, but you&rsquo;re not catching on, because the… latency is too high? Just roll with us here. Anyways: fixing this is going to require replicating ECS tasks and ECR images into <code>ap-southeast-2</code> and <code>eu-central-1</code> while also setting up load balancing. Nah.</p> <p>This is the O.G. Fly.io deployment story; one deployed app, one versioned container, one command to get it running anywhere in the world.</p> <p>But you have a problem: your app relies on training data, it&rsquo;s huge, your giant employer manages it, and it&rsquo;s in S3. Getting this to work will require AWS credentials.</p> <p>You could ask your security team to create a user, give it permissions, and hand over the AWS keypair. Then you could wash your neck and wait for the blade. Passing around AWS keypairs is the beginning of every horror story told about cloud security, and your security team ain&rsquo;t having it.</p> <p>There&rsquo;s a better way. It&rsquo;s drastically more secure, so your security people will at least hear you out. 
It&rsquo;s also so much easier on Fly.io that you might never bother creating an IAM service account again.</p> <h2 id='lets-get-it-out-of-the-way' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#lets-get-it-out-of-the-way' aria-label='Anchor'></a><span class='plain-code'>Let&rsquo;s Get It out of the Way</span></h2> <p>We&rsquo;re going to use OIDC to set up strictly limited trust between AWS and Fly.io.</p> <ol> <li>In AWS: we&rsquo;ll add Fly.io as an <code>Identity Provider</code> in AWS IAM, giving us an ID we can plug into any IAM <code>Role</code>. </li><li>Also in AWS: we&rsquo;ll create a <code>Role</code>, give it access to the S3 bucket with our tokenized cat data, and then attach the <code>Identity Provider</code> to it. </li><li>In Fly.io, we&rsquo;ll take the <code>Role</code> ARN we got from step 2 and set it as an environment variable in our app. </li></ol> <p>Our machines will now magically have access to the S3 bucket.</p> <h2 id='what-the-what' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#what-the-what' aria-label='Anchor'></a><span class='plain-code'>What the What</span></h2> <p>A reasonable question to ask here is, &ldquo;where&rsquo;s the credential&rdquo;? Ordinarily, to give a Fly Machine access to an AWS resource, you&rsquo;d use <code>fly secrets set</code> to add an <code>AWS_ACCESS_KEY_ID</code> and <code>AWS_SECRET_ACCESS_KEY</code> to the environment in the Machine. 
Here, we&rsquo;re not setting any secrets at all; we&rsquo;re just adding an ARN — which is not a credential — to the Machine.</p> <p>Here&rsquo;s what&rsquo;s happening.</p> <p>Fly.io operates an OIDC IdP at <code>oidc.fly.io</code>. It issues OIDC tokens, exclusively to Fly Machines. AWS can be configured to trust these tokens, on a role-by-role basis. That&rsquo;s the &ldquo;secret credential&rdquo;: the pre-configured trust relationship in IAM, and the public keypairs it manages. You, the user, never need to deal with these keys directly; it all happens behind the scenes, between AWS and Fly.io.</p> <p><img alt="A diagram: STS trusts OIDC.fly.io. OIDC.fly.io trusts flyd. flyd issues a token to the Machine, which proffers it to STS. STS sends an STS cred to the Machine, which then uses it to retrieve model weights from S3." src="/blog/oidc-cloud-roles/assets/oidc-diagram.webp" /></p> <p>The key actor in this picture is <code>STS</code>, the AWS <code>Security Token Service</code>. <code>STS</code>&lsquo;s main job is to vend short-lived AWS credentials, usually through some variant of an API called <code>AssumeRole</code>. Specifically, in our case: <code>AssumeRoleWithWebIdentity</code> tells <code>STS</code> to cough up an AWS keypair given an OIDC token (that matches a pre-configured trust relationship).</p> <p>That still leaves the question: how does your code, which is reaching out to the AWS APIs to get cat weights, drive any of this?</p> <h2 id='the-init-thickens' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#the-init-thickens' aria-label='Anchor'></a><span class='plain-code'>The Init Thickens</span></h2> <p>Every Fly Machine boots up into an <code>init</code> we wrote in Rust. 
It has slowly been gathering features.</p> <p>One of those features, which has been around for a while, is a server for a Unix socket at <code>/.fly/api</code>, which exports a subset of the Fly Machines API to privileged processes in the Machine. Think of it as our answer to the EC2 Instance Metadata Service. How it works is, every time we boot a Fly Machine, we pass it a <a href='https://fly.io/blog/macaroons-escalated-quickly/' title=''>Macaroon token</a> locked to that particular Machine; <code>init</code>&rsquo;s server for <code>/.fly/api</code> is a proxy that attaches that token to requests.</p> <div class="right-sidenote"><p>In addition to the API proxy being tricky to SSRF to.</p> </div> <p>What&rsquo;s neat about this is that the credential that drives <code>/.fly/api</code> is doubly protected:</p> <ol> <li>The Fly.io platform won&rsquo;t honor it unless it comes from that specific Fly Machine (<code>flyd</code>, our orchestrator, knows who it&rsquo;s talking to), <em>and</em> </li><li>Ordinary code running in a Fly Machine never gets a copy of the token to begin with. </li></ol> <p>You could rig up a local privilege escalation vulnerability and work out how to steal the Macaroon, but you can&rsquo;t exfiltrate it productively.</p> <p>So now you have half the puzzle worked out: OIDC is just part of the <a href='https://fly.io/docs/machines/api/' title=''>Fly Machines API</a> (specifically: <code>/v1/tokens/oidc</code>). 
A Fly Machine can hit a Unix socket and get an OIDC token tailored to that machine:</p> <div class="highlight-wrapper group relative "> <button type="button" class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-wrap-target="#code-9o3904mp" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"><g buffered-rendering="static"><path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /><path d="M11.081 6.466L9.533 8.037l1.548 1.571" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"> Wrap text </span> </button> <button type="button" class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-copy-target="sibling" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"><g buffered-rendering="static"><path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /><path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"> Copy to clipboard </span> </button> <div class='highlight relative 
group'> <pre class='highlight '><code id="code-9o3904mp">{ "app_id": "3671581", "app_name": "weather-cat", "aud": "sts.amazonaws.com", "image": "image:latest", "image_digest": "sha256:dff79c6da8dd4e282ecc6c57052f7cfbd684039b652f481ca2e3324a413ee43f", "iss": "https://oidc.fly.io/example", "machine_id": "3d8d377ce9e398", "machine_name": "ancient-snow-4824", "machine_version": "01HZJXGTQ084DX0G0V92QH3XW4", "org_id": "29873298", "org_name": "example", "region": "yyz", "sub": "example:weather-cat:ancient-snow-4824" } // some OIDC stuff trimmed </code></pre> </div> </div> <p>Look upon this holy blob, sealed with a published key managed by Fly.io&rsquo;s OIDC vault, and see that there lies within it enough information for AWS <code>STS</code> to decide to issue a session credential.</p> <p>We have still not completed the puzzle, because while you can probably now see how you&rsquo;d drive this process with a bunch of new code that you&rsquo;d tediously write, you are acutely aware that you have not yet endured that tedium — e pur si muove!</p> <p>One <code>init</code> feature remains to be disclosed, and it&rsquo;s cute.</p> <p>If, when <code>init</code> starts in a Fly Machine, it sees an <code>AWS_ROLE_ARN</code> environment variable set, it initiates a little dance; it:</p> <ol> <li>goes off and generates an OIDC token, the way we just described, </li><li>saves that OIDC token in a file, <em>and</em> </li><li>sets the <code>AWS_WEB_IDENTITY_TOKEN_FILE</code> and <code>AWS_ROLE_SESSION_NAME</code> environment variables for every process it launches. </li></ol> <p>The AWS SDK, linked to your application, does all the rest.</p> <p>Let&rsquo;s review: you add an <code>AWS_ROLE_ARN</code> variable to your Fly App, launch a Machine, and have it go fetch a file from S3. What happens next is:</p> <ol> <li><code>init</code> detects <code>AWS_ROLE_ARN</code> is set as an environment variable. 
</li><li><code>init</code> sends a request to <code>/v1/tokens/oidc</code> via <code>/.fly/api</code>. </li><li><code>init</code> writes the response to <code>/.fly/oidc_token</code>. </li><li><code>init</code> sets <code>AWS_WEB_IDENTITY_TOKEN_FILE</code> and <code>AWS_ROLE_SESSION_NAME</code>. </li><li>The entrypoint boots, and (say) runs <code>aws s3api get-object</code>. </li><li>The AWS SDK runs through the <a href='https://docs.aws.amazon.com/sdkref/latest/guide/standardized-credentials.html#credentialProviderChain' title=''>credential provider chain</a>. </li><li>The SDK sees that <code>AWS_WEB_IDENTITY_TOKEN_FILE</code> is set and calls <code>AssumeRoleWithWebIdentity</code> with the file contents. </li><li>AWS verifies the token against <a href='https://oidc.fly.io/' title=''><code>https://oidc.fly.io/</code></a><code>example/.well-known/openid-configuration</code>, which references a key Fly.io manages on isolated hardware. </li><li>AWS vends <code>STS</code> credentials for the assumed <code>Role</code>. </li><li>The SDK uses the <code>STS</code> credentials to access the S3 bucket. </li><li>AWS checks the <code>Role</code>&rsquo;s IAM policy to see if it has access to the S3 bucket. </li><li>AWS returns the contents of the bucket object. </li></ol> <h2 id='how-much-better-is-this' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#how-much-better-is-this' aria-label='Anchor'></a><span class='plain-code'>How Much Better Is This?</span></h2> <p>It is a lot better.</p> <div class="right-sidenote"><p>They asymptotically approach the security properties of Macaroon tokens.</p> </div> <p>Most importantly: AWS <code>STS</code> credentials are short-lived. 
Because they&rsquo;re generated dynamically, rather than stored in a configuration file or environment variable, they&rsquo;re already a little bit annoying for an attacker to recover. But they&rsquo;re also dead in minutes. They have a sharply limited blast radius. They rotate themselves, and fail closed.</p> <p>They&rsquo;re also easier to manage. This is a rare instance where you can reasonably drive the entire AWS side of the process from within the web console. Your cloud team adds <code>Roles</code> all the time; this is just a <code>Role</code> with an extra snippet of JSON. The resulting ARN isn&rsquo;t even a secret; your cloud team could just email or Slack message it back to you.</p> <p>Finally, they offer finer-grained control.</p> <p>To understand the last part, let&rsquo;s look at that extra snippet of JSON (the &ldquo;Trust Policy&rdquo;) your cloud team is sticking on the new <code>cat-bucket</code> <code>Role</code>:</p> <div class="highlight-wrapper group relative "> <button type="button" class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-wrap-target="#code-la5jlerc" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"><g buffered-rendering="static"><path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /><path d="M11.081 6.466L9.533 8.037l1.548 1.571" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"> Wrap text </span> </button> <button type="button" class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 
group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-copy-target="sibling" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"><g buffered-rendering="static"><path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /><path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"> Copy to clipboard </span> </button> <div class='highlight relative group'> <pre class='highlight '><code id="code-la5jlerc">{ "Version": "2012-10-17", "Statement": [ { "Effect": "Allow", "Principal": { "Federated": "arn:aws:iam::123456123456:oidc-provider/oidc.fly.io/example" }, "Action": "sts:AssumeRoleWithWebIdentity", "Condition": { "StringEquals": { "oidc.fly.io/example:aud": "sts.amazonaws.com" }, "StringLike": { "oidc.fly.io/example:sub": "example:weather-cat:*" } } } ] } </code></pre> </div> </div><div class="right-sidenote"><p>The <code>aud</code> check guarantees <code>STS</code> will only honor tokens that Fly.io deliberately vended for <code>STS</code>.</p> </div> <p>Recall the OIDC token we dumped earlier; much of what&rsquo;s in it, we can match in the Trust Policy. 
Every OIDC token Fly.io generates is going to have a <code>sub</code> field formatted <code>org:app:machine</code>, so we can lock IAM <code>Roles</code> down to organizations, or to specific Fly Apps, or even specific Fly Machine instances.</p> <figure class="post-cta"> <figcaption> <h1>Speedrun your app onto Fly.io.</h1> <p>3&hellip;2&hellip;1&hellip;</p> <a class="btn btn-lg" href="https://fly.io/speedrun"> Go! &nbsp;<span class='opacity-50'>→</span> </a> </figcaption> <div class="image-container"> <img src="/static/images/cta-rabbit.webp" srcset="/static/images/[email protected] 2x" alt=""> </div> </figure> <h2 id='and-so' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#and-so' aria-label='Anchor'></a><span class='plain-code'>And So</span></h2> <p>In case it&rsquo;s not obvious: this pattern works for any AWS API, not just S3.</p> <p>Our OIDC support on the platform and in Fly Machines will set arbitrary OIDC <code>audience</code> strings, so you can use it to authenticate to any OIDC-compliant cloud provider. It won&rsquo;t be as slick on Azure or GCP, because we haven&rsquo;t done the <code>init</code> features to light their APIs up with a single environment variable — but those features are easy, and we&rsquo;re just waiting for people to tell us what they need.</p> <p>For us, the gold standard for least-privilege, conditional access tokens remains Macaroons, and it&rsquo;s unlikely that we&rsquo;re going to do a bunch of internal stuff using OIDC. We even snuck Macaroons into this feature. 
But the security you&rsquo;re getting from this OIDC dance closes a lot of the gap between hardcoded user credentials and Macaroons, and it&rsquo;s easy to use — easier, in some ways, than it is to manage role-based access inside of a legacy EC2 deployment!</p> </content> </entry> <entry> <title>Picture This: Open Source AI for Image Description</title> <link rel="alternate" href="https://fly.io/blog/llm-image-description/"/> <id>https://fly.io/blog/llm-image-description/</id> <published>2024-05-09T00:00:00+00:00</published> <updated>2024-05-09T17:35:04+00:00</updated> <media:thumbnail url="https://fly.io/blog/llm-image-description/assets/image-description-thumb.webp"/> <content type="html"><div class="lead"><p>I’m Nolan, and I work on Fly Machines here at Fly.io. We’re building a new public cloud—one where you can spin up CPU and GPU workloads, around the world, in a jiffy. <a href="https://fly.io/speedrun/" title="">Try us out</a>; you can be up and running in minutes. This is a post about LLMs being really helpful, and an extensible project you can build with open source on a weekend.</p> </div> <p>Picture this, if you will.</p> <p>You&rsquo;re blind. You&rsquo;re in an unfamiliar hotel room on a trip to Chicago.</p> <div class="right-sidenote"><p>If you live in Chicago IRL, imagine the hotel in Winnipeg, <a href="https://www.cbc.ca/history/EPISCONTENTSE1EP10CH3PA5LE.html" title="">the Chicago of the North</a>.</p> </div> <p>You&rsquo;ve absent-mindedly set your coffee down, and can&rsquo;t remember where. You&rsquo;re looking for the thermostat so you don&rsquo;t wake up frozen. Or, just maybe, you&rsquo;re playing a fun-filled round of &ldquo;find the damn light switch so your sighted partner can get some sleep already!&rdquo;</p> <p>If, like me, you&rsquo;ve been blind for a while, you have plenty of practice finding things without the luxury of a quick glance around. 
It may be more tedious than you&rsquo;d like, but you&rsquo;ll get it done.</p> <p>But the speed of innovation in machine learning and large language models has been dizzying, and in 2024 you can snap a photo with your phone and have an app like <a href='https://www.bemyeyes.com/blog/announcing-be-my-ai/' title=''>Be My AI</a> or <a href='https://www.seeingai.com/' title=''>Seeing AI</a> tell you where in that picture it found your missing coffee mug, or where it thinks the light switch is.</p> <div class="right-sidenote"><p>Creative switch locations seem to be a point of pride for hotels, so the light game may be good for a few rounds of quality entertainment, regardless of how good your AI is.</p> </div> <p>This is <em>big</em>. It&rsquo;s hard for me to state just how exciting and empowering AI image descriptions have been for me without sounding like a shill. In the past year, I&rsquo;ve:</p> <ul> <li>Found shit in strange hotel rooms. </li><li>Gotten descriptions of scenes and menus in otherwise inaccessible video games. </li><li>Requested summaries of technical diagrams and other materials where details weren’t made available textually. </li></ul> <p>I&rsquo;ve been consistently blown away at how impressive and helpful AI-created image descriptions have been.</p> <p>Also&hellip;</p> <h2 id='which-thousand-words-is-this-picture-worth' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#which-thousand-words-is-this-picture-worth' aria-label='Anchor'></a><span class='plain-code'>Which thousand words is this picture worth?</span></h2> <p>As a blind internet user for the last three decades, I have extensive empirical evidence to corroborate what you already know in your heart: humans are pretty flaky about writing useful alt text for all the images they publish. 
This does tend to make large swaths of the internet inaccessible to me!</p> <p>In just a few years, the state of image description on the internet has gone from complete reliance on the aforementioned lovable, but ultimately tragically flawed, humans, to automated strings of words like <code>Image may contain person, glasses, confusion, banality, disillusionment</code>, to LLM-generated text that reads a lot like it was written by a person, perhaps sipping from a steaming cup of Earl Grey as they reflect on their previous experiences of a background that features a tree with snow on its branches, suggesting that this scene takes place during winter.</p> <p>If an image is missing alt text, or if you want a second opinion, there are screen-reader addons, like <a href='https://github.com/cartertemm/AI-content-describer/' title=''>this one</a> for <a href='https://www.nvaccess.org/download/' title=''>NVDA</a>, that you can use with an API key to get image descriptions from GPT-4 or Google Gemini as you read. This is awesome! </p> <p>And this brings me to the nerd snipe. How hard would it be to build an image description service we can host ourselves, using open source technologies? 
It turns out to be spookily easy.</p> <p>Here&rsquo;s what I came up with:</p> <ol> <li><a href='https://ollama.com/' title=''>Ollama</a> to run the model </li><li>A <a href='https://pocketbase.io' title=''>PocketBase</a> project that provides a simple authenticated API for users to submit images, get descriptions, and ask followup questions about the image </li><li>The simplest possible Python client to interact with the PocketBase app on behalf of users </li></ol> <p>The idea is to keep it modular and hackable, so if sentiment analysis or joke creation is your thing, you can swap out image description for that and have something going in, like, a weekend.</p> <p>If you&rsquo;re like me, and you go skipping through recipe blogs to find the &ldquo;go directly to recipe&rdquo; link, find the code itself <a href='https://github.com/superfly/llm-describer' title=''>here</a>. </p> <h2 id='the-llm-is-the-easiest-part' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#the-llm-is-the-easiest-part' aria-label='Anchor'></a><span class='plain-code'>The LLM is the easiest part</span></h2> <p>An API to accept images and prompts, run the model, and spit out answers sounds like a lot! But it&rsquo;s the simplest part of this whole thing, because: that&rsquo;s <a href='https://ollama.com/' title=''>Ollama</a>.</p> <p>You can just run the Ollama Docker image, get it to grab the model you want to use, and that&rsquo;s it. There&rsquo;s your AI server. (We have a <a href='https://fly.io/blog/scaling-llm-ollama/' title=''>blog post</a> all about deploying Ollama on Fly.io; Fly GPUs are rad, try&#39;em out, etc.).</p> <p>For this project, we need a model that can make sense&mdash;or at least words&mdash;out of a picture. 
<a href='https://llava-vl.github.io/' title=''>LLaVA</a> is a trained, Apache-licensed &ldquo;large multimodal model&rdquo; that fits the bill. Get the model with the Ollama CLI:</p> <div class="highlight-wrapper group relative cmd"> <button type="button" class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-wrap-target="#code-wohvpptj" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"><g buffered-rendering="static"><path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /><path d="M11.081 6.466L9.533 8.037l1.548 1.571" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"> Wrap text </span> </button> <button type="button" class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-copy-target="sibling" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"><g buffered-rendering="static"><path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /><path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-6px] tail 
[--tail-x:calc(100%-30px)] text-navy-950"> Copy to clipboard </span> </button> <div class='highlight relative group'> <pre class='highlight cmd'><code id="code-wohvpptj">ollama pull llava:34b </code></pre> </div> </div><div class="callout"><p>If you have hardware that can handle it, you could run this on your computer at home. If you run AI models on a cloud provider, be aware that GPU compute is expensive! <strong class="font-semibold text-navy-950">It’s important to take steps to ensure you’re not paying for a massive GPU 24/7.</strong></p> <p>On Fly.io, at the time of writing, you’d achieve this with the <a href="https://fly.io/docs/apps/autostart-stop/" title="">autostart and autostop</a> functions of the Fly Proxy, restricting Ollama access to internal requests over <a href="https://fly.io/docs/networking/private-networking/#flycast-private-fly-proxy-services" title="">Flycast</a> from the PocketBase app. That way, if there haven’t been any requests for a few minutes, the Fly Proxy stops the Ollama <a href="https://fly.io/docs/machines/" title="">Machine</a>, which releases the CPU, GPU, and RAM allocated to it. <a href="https://fly.io/blog/scaling-llm-ollama/" title="">Here’s a post</a> that goes into more detail. </p> </div><h2 id='a-multi-tool-on-the-backend' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#a-multi-tool-on-the-backend' aria-label='Anchor'></a><span class='plain-code'>A multi-tool on the backend</span></h2> <p>I want user auth so that not just anyone can grab my &ldquo;image description service&rdquo; and keep it busy generating short stories about their cat. If I build this out into a service for others to use, I might also want business logic around plans or credits, or mobile-friendly APIs for use in the field. 
<a href='https://pocketbase.io' title=''>PocketBase</a> provides a scaffolding for all of it. It&rsquo;s a Swiss army knife: a Firebase-like API on top of SQLite, complete with authentication, authorization, an admin UI, extensibility in JavaScript and Go, and various client-side APIs.</p> <div class="right-sidenote"><p>Yes, <em>of course</em> I’ve used an LLM to generate feline fanfic. Theme songs too. Hasn’t everyone? </p> </div> <p>I &ldquo;faked&rdquo; a task-specific API that supports followup questions by extending PocketBase in Go, modeling requests and responses as <a href='https://pocketbase.io/docs/collections/' title=''>collections</a> (i.e. SQLite tables) with <a href='https://pocketbase.io/docs/go-event-hooks/' title=''>event hooks</a> to trigger pre-set interactions with the Ollama app (via <a href='https://tmc.github.io/langchaingo' title=''>LangChainGo</a>) and the client (via the PocketBase API).</p> <p>If you&rsquo;re following along, <a href='https://github.com/superfly/llm-describer/blob/main/describer.go' title=''>here&rsquo;s the module</a> that handles all that, along with initializing the LLM connection.</p> <p>In a nutshell, this is the dance:</p> <ul> <li>When a user uploads an image, a hook on the <code>images</code> collection sends the image to Ollama, along with this prompt: <code>&quot;You are a helpful assistant describing images for blind screen reader users. Please describe this image.&quot;</code> </li><li>Ollama sends back its response, which the backend 1) passes back to the client and 2) stores in its <code>followups</code> collection for future reference. </li><li>If the user responds with a followup question about the image and description, that also goes into the <code>followups</code> collection; user-initiated changes to this collection trigger a hook to chain the new followup question with the image and the chat history into a new request for the model. </li><li>Lather, rinse, repeat. 
</li></ul> <p>This is a super simple hack to handle followup questions, and it’ll let you keep adding followups until something breaks. You&rsquo;ll see the quality of responses get poorer&mdash;possibly incoherent&mdash;as the context exceeds the context window.</p> <p>I also set up <a href='https://pocketbase.io/docs/api-rules-and-filters/' title=''>API rules</a> in PocketBase, ensuring that users can&rsquo;t read from or write to others&rsquo; chats with the AI.</p> <p>If image descriptions aren&rsquo;t your thing, this business logic is easily swappable for joke generation, extracting details from text, or any other simple task you might want to throw at an LLM. Just slot the best model into Ollama (LLaVA is pretty OK as a general starting point too), and match the PocketBase schema and pre-set prompts to your application.</p> <h2 id='a-seedling-of-a-client' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#a-seedling-of-a-client' aria-label='Anchor'></a><span class='plain-code'>A seedling of a client</span></h2> <p>With the image description service in place, the user can talk to it with any client that speaks the PocketBase API. PocketBase already has SDK clients in JavaScript and Dart, but because my screen reader is <a href='https://github.com/nvaccess/nvda' title=''>written in Python</a>, I went with a <a href='https://pypi.org/project/pocketbase/' title=''>community-created Python library</a>. That way I can build this out into an NVDA add-on if I want to.</p> <p>If you&rsquo;re a fancy Python developer, you probably have your preferred tooling for handling virtualenvs and friends. 
I&rsquo;m not, and since my screen reader doesn&rsquo;t use those anyway, I just <code>pip install</code>ed the library so my client can import it:</p> <div class="highlight-wrapper group relative cmd"> <button type="button" class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-wrap-target="#code-s8xqjyx2" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"><g buffered-rendering="static"><path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /><path d="M11.081 6.466L9.533 8.037l1.548 1.571" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"> Wrap text </span> </button> <button type="button" class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-copy-target="sibling" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"><g buffered-rendering="static"><path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /><path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"> 
Copy to clipboard </span> </button> <div class='highlight relative group'> <pre class='highlight cmd'><code id="code-s8xqjyx2">pip install pocketbase </code></pre> </div> </div> <p><a href='https://github.com/superfly/llm-describer/blob/main/__init__.py' title=''>My client</a> is a very simple script. It expects a couple of things: a file called <code>image.jpg</code>, located in the current directory, and environment variables to provide the service URL and user credentials to log into it with.</p> <p>When you run the client script, it uploads the image to the user’s <code>images</code> collection on the backend app, starting the back-and-forth between user and model we saw in the previous section. The client prints the model&rsquo;s output to the CLI and prompts the user to input a followup question, which it passes up to the <code>followups</code> collection, and so on.</p> <h2 id='all-together-now' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#all-together-now' aria-label='Anchor'></a><span class='plain-code'>All together now</span></h2> <p>I grabbed <a href='https://unsplash.com/photos/brown-trees-beside-river-under-blue-sky-during-daytime-kSvpTrfhaiU' title=''>this image</a> and saved it to a file called <em>image.jpg</em>. 
</p> <p>While I knew I was downloading an image of a winter scene, all I see on Unsplash is:</p> <blockquote> <p>brown trees beside river under blue sky during daytime Bright winter landscape with lake, snow, forest, beautiful blue sky and white clouds. An example of charming wildlife in Russia.</p> </blockquote> <p>Let&rsquo;s see what our very own AI describer thinks of this picture:</p> <div class="highlight-wrapper group relative plain"> <button type="button" class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-wrap-target="#code-lvuwb8nb" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"><g buffered-rendering="static"><path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /><path d="M11.081 6.466L9.533 8.037l1.548 1.571" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"> Wrap text </span> </button> <button type="button" class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-copy-target="sibling" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"><g buffered-rendering="static"><path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" 
/><path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"> Copy to clipboard </span> </button> <div class='highlight relative group'> <pre class='highlight '><code id="code-lvuwb8nb">$ python __init__.py The image depicts a serene winter scene. There is a small stream of water flowing through the center, with patches of snow on its surface and along the banks. The surrounding area appears to be a mix of bare trees and shrubs, suggesting it's late in the season or possibly early spring when some plants have started to grow again but haven't yet leafed out. The sky is clear and blue, indicating good weather conditions at the time the photo was taken. The sunlight seems to be coming from the left side of the image, casting a warm glow on the scene and creating reflections in the water. There are no visible texts or distinguishing marks that provide additional context about the location or the photographer. The overall atmosphere is peaceful and natural, with the quiet beauty of the landscape undisturbed except for the gentle movement of the stream. </code></pre> </div> </div> <p>Is it a stellar description? 
Maybe not, but it certainly gives me a better sense of connection with the scene.</p> <p>Let&rsquo;s see how our describer copes with a followup question.</p> <div class="highlight-wrapper group relative plain"> <button type="button" class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-wrap-target="#code-dgfkbrw6" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"><g buffered-rendering="static"><path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /><path d="M11.081 6.466L9.533 8.037l1.548 1.571" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"> Wrap text </span> </button> <button type="button" class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-copy-target="sibling" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"><g buffered-rendering="static"><path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /><path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] 
text-navy-950"> Copy to clipboard </span> </button> <div class='highlight relative group'> <pre class='highlight '><code id="code-dgfkbrw6">Enter your followup question, or 'quit' to quit: What types of trees are in the image? Sending followup question It's difficult to determine the exact species of trees in the image without more specific details, as many deciduous trees lose their leaves during winter and look quite similar. However, based on the general appearance of the bare branches and the fact that they are leafless, it is likely that these are common deciduous trees such as oaks, maples, or elms. The presence of snow and the absence of foliage suggest that this photo was taken during winter or early spring when the trees have not yet started to bloom again. The exact species would require a closer examination of the bark, buds, and other distinguishing features, which are not clearly visible in this image. </code></pre> </div> </div> <p>Boo, the general-purpose LLaVA model couldn&rsquo;t identify the leafless trees. At least it knows why it can&rsquo;t. Maybe there&rsquo;s a better model out there for that. Or we could train one, if we really needed tree identification! We could make every component of this service more sophisticated! </p> <p>But that I, personally, can make a proof of concept like this with a few days of effort continues to boggle my mind. Thanks to a handful of amazing open source projects, it&rsquo;s really, spookily, easy. 
And from here, I (or you) can build out a screen-reader addon, or a mobile app, or a different kind of AI service, with modular changes.</p> <h2 id='deployment-notes' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#deployment-notes' aria-label='Anchor'></a><span class='plain-code'>Deployment notes</span></h2> <p>On Fly.io, stopping GPU Machines saves you a bunch of money and some carbon footprint, in return for cold-start latency when you make a request for the first time in more than a few minutes. In testing this project, on the <code>a100-40gb</code> Fly Machine preset, the 34b-parameter LLaVA model took several seconds to generate each response. If the Machine was stopped when the request came in, starting it up took another handful of seconds, followed by several tens of seconds to load the model into GPU RAM. The total time from cold start to completed description was about 45 seconds. Just something to keep in mind.</p> <p>If you&rsquo;re running Ollama in the cloud, you likely want to put the model onto storage that&rsquo;s persistent, so you don&rsquo;t have to download it repeatedly. You could also build the model into a Docker image ahead of deployment.</p> <p>The PocketBase Golang app compiles to a single executable that you can run wherever. I run it on Fly.io, unsurprisingly, and the <a href='https://github.com/superfly/llm-describer/' title=''>repo</a> comes with a Dockerfile and a <a href='https://fly.io/docs/reference/configuration/' title=''><code>fly.toml</code></a> config file, which you can edit to point at your own Ollama instance. It uses a small persistent storage volume for the SQLite database. Under testing, it runs fine on a <code>shared-cpu-1x</code> Machine. 
</p> </content> </entry> <entry> <title>JIT WireGuard</title> <link rel="alternate" href="https://fly.io/blog/jit-wireguard-peers/"/> <id>https://fly.io/blog/jit-wireguard-peers/</id> <published>2024-03-12T00:00:00+00:00</published> <updated>2024-05-09T17:35:04+00:00</updated> <media:thumbnail url="https://fly.io/blog/jit-wireguard-peers/assets/network-thumbnail.webp"/> <content type="html"><div class="lead"><p>We’re Fly.io and we transmute containers into VMs, running them on our hardware around the world with the power of Firecracker alchemy. We do a lot of stuff with WireGuard, which has become a part of our customer API. This is a quick story about some tricks we played to make WireGuard faster and more scalable for the hundreds of thousands of people who now use it here.</p> </div> <p>One of many odd decisions we&rsquo;ve made at Fly.io is how we use WireGuard. It&rsquo;s not just that we use it in many places where other shops would use HTTPS and REST APIs. We&rsquo;ve gone a step beyond that: every time you run <code>flyctl</code>, our lovable, sprawling CLI, it conjures a TCP/IP stack out of thin air, with its own IPv6 address, and speaks directly to Fly Machines running on our networks.</p> <p>There are plusses and minuses to this approach, which we talked about <a href='https://fly.io/blog/our-user-mode-wireguard-year/' title=''>in a blog post a couple years back</a>. Some things, like remote-operated Docker builders, get easier to express (a Fly Machine, as far as <code>flyctl</code> is concerned, might as well be on the same LAN). But everything generally gets trickier to keep running reliably.</p> <p>It was a decision. 
We own it.</p> <p>Anyways, we&rsquo;ve made some improvements recently, and I&rsquo;d like to talk about them.</p> <h2 id='where-we-left-off' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#where-we-left-off' aria-label='Anchor'></a><span class='plain-code'>Where we left off</span></h2> <p>Until a few weeks ago, our gateways ran on a pretty simple system.</p> <ol> <li>We operate dozens of &ldquo;gateway&rdquo; servers around the world, whose sole purpose is to accept incoming WireGuard connections and connect them to the appropriate private networks. </li><li>Any time you run <code>flyctl</code> and it needs to talk to a Fly Machine (to build a container, pop an SSH console, copy files, or proxy to a service you&rsquo;re running), it spawns or connects to a background agent process. </li><li>The first time it runs, the agent generates a new WireGuard peer configuration from our GraphQL API. WireGuard peer configurations are very simple: just a public key and an address to connect to. </li><li>Our API in turn takes that peer configuration and sends it to the appropriate gateway (say, <code>ord</code>, if you&rsquo;re near Chicago) via an RPC we send over the NATS messaging system. </li><li>On the gateway, a service called <code>wggwd</code> accepts that configuration, saves it to a SQLite database, and adds it to the kernel using WireGuard&rsquo;s Golang libraries. <code>wggwd</code> acknowledges the installation of the peer to the API. </li><li>The API replies to your GraphQL request, with the configuration. </li><li>Your <code>flyctl</code> connects to the WireGuard peer, which works, because you receiving the configuration means it’s installed on the gateway. 
</li></ol> <p>I copy-pasted those last two bullet points from <a href='https://fly.io/blog/our-user-mode-wireguard-year/' title=''>that two-year-old post</a>, because when it works, it does <em>just work</em> reasonably well. (We ultimately did end up defaulting everybody to WireGuard-over-WebSockets, though.)</p> <p>But if it always worked, we wouldn&rsquo;t be here, would we?</p> <p>We ran into two annoying problems:</p> <p>One: NATS is fast, but doesn’t guarantee delivery. Back in 2022, Fly.io was pretty big on NATS internally. We&rsquo;ve moved away from it. For instance, our <a href='https://fly.io/blog/carving-the-scheduler-out-of-our-orchestrator/' title=''>internal <code>flyd</code> API</a> used to be driven by NATS; today, it&rsquo;s HTTP. Our NATS cluster was losing too many messages to host a reliable API on it. Scaling back our use of NATS made WireGuard gateways better, but still not great.</p> <p>Two: When <code>flyctl</code> exits, the WireGuard peer it created sticks around on the gateway. Nothing cleans up old peers. After all, you&rsquo;re likely going to come back tomorrow and deploy a new version of your app, or <code>fly ssh console</code> into it to debug something. Why remove a peer just to re-add it the next day?</p> <p>Unfortunately, the vast majority of peers are created by <code>flyctl</code> in CI jobs, which don&rsquo;t have persistent storage and can&rsquo;t reconnect to the same peer the next run; they generate new peers every time, no matter what.</p> <p>So, we ended up with a not-reliable-enough provisioning system, and gateways with hundreds of thousands of peers that will never be used again. 
The high stale peer count made kernel WireGuard operations very slow - especially loading all the peers back into the kernel after a gateway server reboot - and caused some kernel panics.</p> <p>There had to be</p> <h2 id='a-better-way' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#a-better-way' aria-label='Anchor'></a><span class='plain-code'>A better way.</span></h2> <p>Storing bajillions of WireGuard peers is no big challenge for any serious n-tier RDBMS. This isn&rsquo;t &ldquo;big data&rdquo;. The problem we have at Fly.io is that our gateways don&rsquo;t have serious n-tier RDBMSs. They&rsquo;re small. Scrappy. They live off the land.</p> <p>Seriously, though: you could store every WireGuard peer everybody has ever used at Fly.io in a single SQLite database, easily. What you can&rsquo;t do is store them all in the Linux kernel.</p> <p>So, at some point, as you push more and more peer configurations to a gateway, you have to start making decisions about which peers you&rsquo;ll enable in the kernel, and which you won&rsquo;t.</p> <p>Wouldn&rsquo;t it be nice if we just didn&rsquo;t have this problem? What if, instead of pushing configs to gateways, we had the gateways pull them from our API on demand?</p> <p>If you did that, peers would only have to be added to the kernel when the client wanted to connect. You could yeet them out of the kernel any time you wanted; the next time the client connected, they&rsquo;d just get pulled again, and everything would work fine.</p> <p>The problem you quickly run into when building this design is that Linux kernel WireGuard doesn&rsquo;t have a feature for installing peers on demand. 
However:</p> <h2 id='it-is-possible-to-jit-wireguard-peers' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#it-is-possible-to-jit-wireguard-peers' aria-label='Anchor'></a><span class='plain-code'>It is possible to JIT WireGuard peers</span></h2> <p>The Linux kernel&rsquo;s <a href='https://github.com/WireGuard/wgctrl-go' title=''>interface for configuring WireGuard</a> is <a href='https://docs.kernel.org/userspace-api/netlink/intro.html' title=''>Netlink</a> (which is basically a way to create a userland socket to talk to a kernel service). Here&rsquo;s a <a href='https://github.com/WireGuard/wg-dynamic/blob/master/netlink.h' title=''>summary of it as a C API</a>. Note that there&rsquo;s no API call to subscribe to &ldquo;incoming connection attempt&rdquo; events.</p> <p>That&rsquo;s OK! We can just make our own events. WireGuard connection requests are packets, and they&rsquo;re easily identifiable, so we can efficiently snatch them with a BPF filter and a <a href='https://github.com/google/gopacket' title=''>packet socket</a>.</p> <div class="callout"><p>Most of the time, it’s even easier for us to get the raw WireGuard packets, because our users now default to WebSockets WireGuard (which is just an unauthenticated WebSockets connection that shuttles framed UDP packets to and from an interface on the gateway), so that people who have trouble talking end-to-end in UDP can bring connections up.</p> </div> <p>We own the daemon code for that, and can just hook the packet receive function to snarf WireGuard packets.</p> <p>It&rsquo;s not obvious, but WireGuard doesn&rsquo;t have notions of &ldquo;client&rdquo; or &ldquo;server&rdquo;. It&rsquo;s a pure point-to-point protocol; peers connect to each other when they have traffic to send. 
The first peer to connect is called the <strong class='font-semibold text-navy-950'>initiator</strong>, and the peer it connects to is the <strong class='font-semibold text-navy-950'>responder</strong>.</p> <div class="right-sidenote"><p><a href="https://www.wireguard.com/papers/wireguard.pdf" title=""><em>The WireGuard paper</em></a> <em>is a good read.</em></p> </div> <p>For Fly.io, <code>flyctl</code> is typically our initiator, sending a single UDP packet to the gateway, which is the responder. According <a href='https://www.wireguard.com/papers/wireguard.pdf' title=''>to the WireGuard paper</a>, this first packet is a <code>handshake initiation</code>. It gets better: the packet type is recorded in a single plaintext byte. So this simple BPF filter catches all the incoming connections: <code>udp and dst port 51820 and udp[8] = 1</code>.</p> <p>In most other protocols, we&rsquo;d be done at this point; we&rsquo;d just scrape the username or whatnot out of the packet, go fetch the matching configuration, and install it in the kernel. With WireGuard, not so fast. WireGuard is based on Trevor Perrin&rsquo;s <a href='http://www.noiseprotocol.org/' title=''>Noise Protocol Framework</a>, and Noise goes way out of its way to <a href='http://www.noiseprotocol.org/noise.html#identity-hiding' title=''>hide identities</a> during handshakes. To identify incoming requests, we&rsquo;ll need to run enough Noise cryptography to decrypt the identity.</p> <p>The code to do this is fussy, but it&rsquo;s relatively short (about 200 lines). Helpfully, the kernel Netlink interface will give a privileged process the private key for an interface, so the secrets we need to unwrap WireGuard are easy to get. Then it&rsquo;s just a matter of running the first bit of the Noise handshake. 
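</p> <p>To make the packet-matching step concrete, here&rsquo;s an illustrative sketch of the same check the BPF filter does, written in userland. It is not the code that runs on our gateways, and it&rsquo;s in Python purely for brevity. The UDP header is 8 bytes long, so <code>udp[8]</code> is the first byte of the WireGuard message. A handshake initiation has message type <code>1</code>, followed by three zero reserved bytes and a little-endian sender index, and is always 148 bytes long. The function and constant names are made up for this example.</p>

```python
import struct

MSG_HANDSHAKE_INITIATION = 1  # WireGuard message type for a handshake initiation
INITIATION_LEN = 148          # fixed length of a handshake initiation message


def is_handshake_initiation(udp_payload: bytes) -> bool:
    """Return True if a UDP payload looks like a WireGuard handshake initiation.

    This mirrors the `udp[8] = 1` test in the BPF filter, plus two cheap
    sanity checks (exact length and zeroed reserved bytes).
    """
    if len(udp_payload) != INITIATION_LEN:
        return False
    if udp_payload[0] != MSG_HANDSHAKE_INITIATION:
        return False
    return udp_payload[1:4] == b"\x00\x00\x00"


def sender_index(udp_payload: bytes) -> int:
    """Read the initiator's 32-bit little-endian sender index at offset 4."""
    (index,) = struct.unpack_from("<I", udp_payload, 4)
    return index


if __name__ == "__main__":
    # A fake, zero-filled initiation: correct framing, no real cryptography.
    fake = bytes([1, 0, 0, 0]) + struct.pack("<I", 0x1234) + bytes(140)
    print(is_handshake_initiation(fake), hex(sender_index(fake)))
```

<p>A check like this only tells you that somebody is knocking. It can&rsquo;t tell you who, because the initiator&rsquo;s static public key is encrypted inside the message.</p> <p>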
If you&rsquo;re that kind of nerdy, <a href='https://gist.github.com/tqbf/9f2c2852e976e6566f962d9bca83062b' title=''>here&rsquo;s the code.</a></p> <p>At this point, we have the event feed we wanted: the public keys of every user trying to make a WireGuard connection to our gateways. We keep a rate-limited cache in SQLite, and when we see new peers, we&rsquo;ll make an internal HTTP API request to fetch the matching peer information and install it. This fits nicely into the little daemon that already runs on our gateways to manage WireGuard, and allows us to ruthlessly and recklessly remove stale peers with a <code>cron</code> job.</p> <p>But wait! There&rsquo;s more! We bounced this plan off Jason Donenfeld, and he tipped us off on a sneaky feature of the Linux WireGuard Netlink interface.</p> <div class="right-sidenote"><p>Jason is the hardest working person in show business.</p> </div> <p>Our API fetch for new peers is generally not going to be fast enough to respond to the first handshake initiation message a new client sends us. That&rsquo;s OK; WireGuard is pretty fast about retrying. But we can do better.</p> <p>When we get an incoming initiation message, we have the 4-tuple address of the desired connection, including the ephemeral source port <code>flyctl</code> is using. We can install the peer as if we&rsquo;re the initiator, and <code>flyctl</code> is the responder. The Linux kernel will initiate a WireGuard connection back to <code>flyctl</code>. This works; the protocol doesn&rsquo;t care a whole lot who&rsquo;s the server and who&rsquo;s the client. 
We get new connections established about as fast as they can possibly be installed.</p> <figure class="post-cta"> <figcaption> <h1>Launch an app in minutes</h1> <p>Speedrun an app onto Fly.io and get your own JIT WireGuard peer&nbsp;✨</p> <a class="btn btn-lg" href="/docs/speedrun/"> Speedrun &nbsp;<span class='opacity-50'>→</span> </a> </figcaption> <div class="image-container"> <img src="/static/images/cta-dog.webp" srcset="/static/images/[email protected] 2x" alt=""> </div> </figure> <h2 id='look-at-this-graph' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#look-at-this-graph' aria-label='Anchor'></a><span class='plain-code'>Look at this graph</span></h2> <p>We&rsquo;ve been running this in production for a few weeks and we&rsquo;re feeling pretty happy about it. We went from thousands, or hundreds of thousands, of stale WireGuard peers on a gateway to what rounds to none. Gateways now hold a lot less state, are faster at setting up peers, and can be rebooted without having to wait for many unused peers to be loaded back into the kernel.</p> <p>I&rsquo;ll leave you with this happy Grafana chart from the day of the switchover.</p> <p><img alt="a Grafana chart of &#39;kernel_stale_wg_peer_count&#39; vs. time. For the first few hours, all traces are flat. Most are at values between 0 and 50,000 and the top-most is just under 550,000. Towards the end of the graph, each line in turn jumps sharply down to the bottom, and at the end of the chart all datapoints are indistinguishable from 0." src="/blog/jit-wireguard-peers/assets/wireguard-peers-graph.webp" /></p> <p><strong class='font-semibold text-navy-950'>Editor&rsquo;s note:</strong> Despite our tearful protests, Lillian has decided to move on from Fly.io to explore new pursuits. 
We wish her much success and happiness!&nbsp;✨</p> </content> </entry> <entry> <title>Fly Kubernetes does more now</title> <link rel="alternate" href="https://fly.io/blog/fks-beta-live/"/> <id>https://fly.io/blog/fks-beta-live/</id> <published>2024-03-07T00:00:00+00:00</published> <updated>2024-04-22T18:28:43+00:00</updated> <media:thumbnail url="https://fly.io/blog/fks-beta-live/assets/fks-thumb.webp"/> <content type="html"><div class="lead"><p>Eons ago, we <a href="https://fly.io/blog/fks/" title="">announced</a> we were working on <a href="https://fly.io/docs/kubernetes/" title="">Fly Kubernetes</a>. It drummed up enough excitement to prove we were heading in the right direction. So, we got hard at work getting from barebones “early access” to a beta release. We’ll be onboarding customers to the closed beta over the next few weeks. Email us at <a href="mailto:[email protected]">[email protected]</a> and we’ll hook you up.</p> </div> <p>Fly Kubernetes is the &ldquo;blessed path&rdquo;™️ to using Kubernetes backed by Fly.io infrastructure. Or, in simpler terms, it is our managed Kubernetes service. We take care of the complexity of operating the Kubernetes control plane, leaving you with the unfettered joy of deploying your Kubernetes workloads. 
If you love Fly.io and K8s, this product is for you.</p> <h2 id='what-even-is-a-kubernete' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#what-even-is-a-kubernete' aria-label='Anchor'></a><span class='plain-code'>What even is a Kubernete?</span></h2> <p>So how did this all come to be&mdash;and what even is a Kubernete?</p> <div class="right-sidenote"><p>You can see more fun details in <a href="https://fly.io/blog/fks/" title="">Introducing Fly Kubernetes</a>.</p> </div> <p>If you wade through all the YAML and <a href='https://landscape.cncf.io/' title=''>CNCF projects</a>, what&rsquo;s left is an API for declaring workloads and how they should be accessed.</p> <p>But that&rsquo;s not what people usually talk / groan about. It&rsquo;s everything else that comes along with adopting Kubernetes: a container runtime (CRI), networking between workloads (CNI), which leads to DNS (CoreDNS). Then you layer on Prometheus for metrics and whatever the logging daemon du jour is at the time. Now you get to debate which Ingress&mdash;strike that&mdash;<em>Gateway</em> API to deploy, and if the next thing has anything to do with a Service Mess, then, as they like to say where I live, &ldquo;bless your heart&rdquo;.</p> <p>Finally, there&rsquo;s capacity planning. You&rsquo;ve got to pick and choose where, how and what the <a href='https://kubernetes.io/docs/concepts/architecture/nodes/' title=''>Nodes</a> will look like in order to configure and run the workloads.</p> <p>When we began thinking about what a Fly Kubernetes Service could look like, we started from first principles, as we do with most everything here. The best way we can describe it is the <a href='https://www.youtube.com/watch?v=Ddk9ci6geSs' title=''>scene from Iron Man 2 when Tony Stark discovers a new element</a>. 
As he&rsquo;s looking at the knowledge left behind by those that came before, he starts to imagine something entirely different and more capable than could have been accomplished previously. That&rsquo;s what happened to JP, but with K3s and Virtual Kubelet.</p> <h2 id='ok-then-wtf-whats-the-fks' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#ok-then-wtf-whats-the-fks' aria-label='Anchor'></a><span class='plain-code'>OK then, WTF (what&rsquo;s the FKS)?</span></h2> <p>We looked at what people need to get started—the API—and then started peeling away all the noise, filling in the gaps to connect things together to provide the power. Here&rsquo;s how this looks currently:</p> <ul> <li>Containerd/CRI → <a href='https://fly.io/blog/carving-the-scheduler-out-of-our-orchestrator/' title=''>flyd</a> + Firecracker + <a href='https://fly.io/blog/docker-without-docker/' title=''>our init</a>: our system transmogrifies Docker containers into Firecracker microVMs </li><li>Networking/CNI → Our <a href='https://fly.io/blog/ipv6-wireguard-peering/' title=''>internal WireGuard mesh</a> connects your pods together </li><li>Pods → Fly Machines VMs </li><li>Secrets → Secrets, only not the base64&rsquo;d kind </li><li>Services → The Fly Proxy </li><li>CoreDNS → CoreDNS (to be replaced with our custom internal DNS) </li><li>Persistent Volumes → Fly Volumes (coming soon) </li></ul> <p>Now&hellip;not everything is a one-to-one comparison, and we explicitly did not set out to support any and every configuration. We aren&rsquo;t dealing with resources like Network Policy and init containers, though we&rsquo;re also not completely ignoring them. 
By mapping many of the core primitives of Kubernetes to a Fly.io resource, we&rsquo;re able to focus on continuing to build the primitives that make our cloud better for workloads of all shapes and sizes.</p> <p>A key thing to notice above is that there&rsquo;s no &ldquo;Node&rdquo;.</p> <p><a href='https://virtual-kubelet.io/' title=''>Virtual Kubelet</a> plays a central role in FKS. It&rsquo;s magic, really. A Virtual Kubelet acts as if it&rsquo;s a standard Kubelet running on a Node, eager to run your workloads. However, there&rsquo;s no Node backing it. It instead behaves like an API, receiving requests from Kubernetes and transforming them into requests to deploy on a cloud compute service. In our case, that&rsquo;s Fly Machines.</p> <p>So what we have is Kubernetes calling out to our <a href='https://virtual-kubelet.io/docs/providers/' title=''>Virtual Kubelet provider</a>, a small Golang program we run alongside K3s, to create and run your pod. It creates <a href='https://fly.io/blog/docker-without-docker/' title=''>your pod as a Fly Machine</a>, via the <a href='/docs/machines/api/' title=''>Fly Machines API</a>, deploying it to any underlying host within that region. This shifts the burden of managing hardware capacity from you to us. 
We think that&rsquo;s a cool trick&mdash;thanks, Virtual Kubelet magic!</p> <h2 id='speedrun' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#speedrun' aria-label='Anchor'></a><span class='plain-code'>Speedrun</span></h2> <p>You can deploy your workloads (including GPUs) across any of our available regions using the Kubernetes API.</p> <p>You create a cluster with <code>flyctl</code>:</p> <div class="highlight-wrapper group relative cmd"> <div class='highlight relative group'> <pre class='highlight cmd'><code id="code-vomuctp1">fly ext k8s create --name hello --org personal --region iad
</code></pre> </div> </div> <p>When a cluster is created, it has the standard <code>default</code> namespace. You can inspect it:</p> <div class="highlight-wrapper group relative cmd"> <div class='highlight relative group'> <pre class='highlight cmd'><code id="code-f85r6bqf">kubectl get ns default --show-labels
</code></pre> </div> </div><div class="highlight-wrapper group relative output"> <div class='highlight relative group'> <pre class='highlight output whitespace-pre'><code id="code-6bmj8nmt">NAME      STATUS   AGE   LABELS
default   Active   20d   fly.io/app=fks-default-7zyjm3ovpdxmd0ep,kubernetes.io/metadata.name=default
</code></pre> </div> </div> <p>The <code>fly.io/app</code> label shows the name of the Fly App that corresponds to your cluster.</p> <p>It would seem appropriate to deploy the <a href='https://github.com/kubernetes-up-and-running/kuard' title=''>Kubernetes Up And Running demo</a> here, but since your pods are connected over an <a href='https://fly.io/blog/ipv6-wireguard-peering/' title=''>IPv6 WireGuard mesh</a>, we&rsquo;re going to use a <a href='https://github.com/jipperinbham/kuard' title=''>fork</a> with support for <a href='https://github.com/kubernetes-up-and-running/kuard/issues/46' title=''>IPv6 DNS</a>.</p> <div class="highlight-wrapper group relative cmd"> <div class='highlight relative group'> <pre class='highlight cmd'><code id="code-h0ws84lr">kubectl run \
  --image=ghcr.io/jipperinbham/kuard-amd64:blue \
  --labels="app=kuard-fks" \
  kuard
</code></pre> </div> </div> <p>And you can see its Machine representation via:</p> <div class="highlight-wrapper group relative cmd"> <div class='highlight relative group'> <pre class='highlight cmd'><code id="code-ktbm1ey3">fly machine list --app fks-default-7zyjm3ovpdxmd0ep
</code></pre> </div> </div><div class="highlight-wrapper group relative output"> <div class='highlight relative group'> <pre class='highlight output whitespace-pre'><code id="code-httmdmgs">ID NAME STATE REGION IMAGE IP ADDRESS VOLUME CREATED LAST UPDATED APP PLATFORM PROCESS GROUP SIZE
1852291c46ded8 kuard started iad jipperinbham/kuard-amd64:blue fdaa:0:48c8:a7b:228:4b6d:6e20:2 2024-03-05T18:54:41Z 
2024-03-05T18:54:44Z shared-cpu-1x:256MB
</code></pre> </div> </div> <p>This is important! Your pod is a Fly Machine! While we don’t yet support all kubectl features, Fly.io tooling will &ldquo;just work&rdquo; for cases where we don&rsquo;t yet support the kubectl way. So, for example, we don&rsquo;t have <code>kubectl port-forward</code> and <code>kubectl exec</code>, but you can use flyctl to forward ports and get a shell into a pod.</p> <p>Expose it to your internal network using the standard ClusterIP Service:</p> <div class="highlight-wrapper group relative cmd"> <div class='highlight relative group'> <pre class='highlight cmd'><code id="code-9dy6iy1l">kubectl expose pod kuard \
  --name=kuard \
  --port=8080 \
  --target-port=8080 \
  --selector='app=kuard-fks'
</code></pre> </div> </div> <p>ClusterIP Services work natively, and Fly.io internal DNS supports them. Within the cluster, CoreDNS works too.</p> <p>Access this Service locally via <a href='https://fly.io/docs/networking/private-networking/#flycast-private-load-balancing' title=''>flycast</a>: Get connected to your org&rsquo;s <a href='https://fly.io/docs/networking/private-networking/' title=''>6PN private WireGuard network</a>. 
Get kubectl to describe the <code>kuard</code> Service:</p> <div class="highlight-wrapper group relative cmd"> <button type="button" class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-wrap-target="#code-luy1nk1t" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"><g buffered-rendering="static"><path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /><path d="M11.081 6.466L9.533 8.037l1.548 1.571" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"> Wrap text </span> </button> <button type="button" class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-copy-target="sibling" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"><g buffered-rendering="static"><path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /><path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"> Copy to clipboard </span> </button> <div class='highlight relative group'> <pre 
class='highlight cmd'><code id="code-luy1nk1t">kubectl describe svc kuard </code></pre> </div> </div><div class="highlight-wrapper group relative output"> <button type="button" class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-wrap-target="#code-r8ykf5mk" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"><g buffered-rendering="static"><path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /><path d="M11.081 6.466L9.533 8.037l1.548 1.571" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"> Wrap text </span> </button> <button type="button" class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-copy-target="sibling" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"><g buffered-rendering="static"><path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /><path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"> Copy to clipboard </span> </button> <div 
class='highlight relative group'> <pre class='highlight output'><code id="code-r8ykf5mk">Name: kuard Namespace: default Labels: app=kuard-fks Annotations: fly.io/clusterip-allocator: configured service.fly.io/sync-version: 11507529969321451315 Selector: app=kuard-fks Type: ClusterIP IP Family Policy: SingleStack IP Families: IPv6 IP: fdaa:0:48c8:0:1::1a IPs: fdaa:0:48c8:0:1::1a Port: &lt;unset&gt; 8080/TCP TargetPort: 8080/TCP Endpoints: [fdaa:0:48c8:a7b:228:4b6d:6e20:2]:8080 Session Affinity: None Events: &lt;none&gt; </code></pre> </div> </div> <p>You can pull out the Service&rsquo;s IP address from the above output, and get at the KUARD UI using that: in this case, <code>http://[fdaa:0:48c8:0:1::1a]:8080</code>. </p> <p>Using internal DNS: <code>http://&lt;service_name&gt;.svc.&lt;app_name&gt;.flycast:8080</code>. Or, in our example: <code>http://kuard.svc.fks-default-7zyjm3ovpdxmd0ep.flycast:8080</code>.</p> <p>And finally CoreDNS: <code>&lt;service_name&gt;.&lt;namespace&gt;.svc.cluster.local</code> resolves to the <code>fdaa</code> IP and is routable within the cluster.</p> <figure class="post-cta"> <figcaption> <h1>Get in on the FKS beta</h1> <p>Email us at [email protected]</p> </figcaption> <div class="image-container"> <img src="/static/images/cta-turtle.webp" srcset="/static/images/[email protected] 2x" alt=""> </div> </figure> <h2 id='pricing' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#pricing' aria-label='Anchor'></a><span class='plain-code'>Pricing</span></h2> <p>The Fly Kubernetes Service is free during the beta. Fly Machines and Fly Volumes you create with it will cost the <a href='https://fly.io/docs/about/pricing/' title=''>same as for your other Fly.io projects</a>. 
It&rsquo;ll be <a href='https://fly.io/docs/about/pricing/#fly-kubernetes' title=''>$75/mo per cluster</a> after that, plus the cost of the other resources you create.</p> <h2 id='today-and-the-future' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#today-and-the-future' aria-label='Anchor'></a><span class='plain-code'>Today and the future</span></h2> <p>Today, Fly Kubernetes supports only a portion of the Kubernetes API. You can deploy pods using Deployments/ReplicaSets. Pods are able to communicate via Services using the standard K8s DNS format. Ephemeral and persistent volumes are supported.</p> <p>The most notable absences are multi-container pods, StatefulSets, network policies, horizontal pod autoscaling, and emptyDir volumes. We&rsquo;re working on supporting autoscaling and emptyDir volumes in the coming weeks and multi-container pods in the coming months.</p> <p>If you&rsquo;ve made it this far and are eagerly awaiting your chance to tell us and the rest of the internet &ldquo;this isn&rsquo;t Kubernetes!&rdquo;, well, we agree! It&rsquo;s not something we take lightly. We&rsquo;re still building, and conformance tests may be in the future for FKS. We&rsquo;ve made a deliberate decision to only care about fast-launching VMs as the one and only way to run workloads on our cloud. And we also know enough of our customers would like to use the Kubernetes API to create a fast-launching VM in the form of a Pod, and that&rsquo;s where this story begins. 
</p> </content> </entry> <entry> <title>Globally Distributed Object Storage with Tigris</title> <link rel="alternate" href="https://fly.io/blog/tigris-public-beta/"/> <id>https://fly.io/blog/tigris-public-beta/</id> <published>2024-02-15T00:00:00+00:00</published> <updated>2024-04-12T18:23:39+00:00</updated> <media:thumbnail url="https://fly.io/blog/tigris-public-beta/assets/tigris-public-beta-thumb.webp"/> <content type="html"><div class="lead"><p>We’re Fly.io and we transmute containers into VMs, running them on our hardware around the world with the power of Firecracker alchemy. That’s pretty cool, but we want to talk about something someone else built, that <a href="https://fly.io/docs/reference/tigris/" title="">you can use today</a> to build applications.</p> </div> <p>There are three hard things in computer science:</p> <ol> <li>Cache invalidation </li><li>Naming things </li><li><a href='https://aws.amazon.com/s3/' title=''>Doing a better job than Amazon of storing files</a> </li></ol> <p>Of all the annoying software problems that have no business being annoying, handling a file upload in a full-stack application stands apart, a universal if tractable malady, the plantar fasciitis of programming.</p> <p>Now, the actual act of clients placing files on servers is straightforward. Your framework <a href='https://hexdocs.pm/phoenix/file_uploads.html' title=''>has</a> <a href='https://edgeguides.rubyonrails.org/active_storage_overview.html' title=''>a</a> <a href='https://docs.djangoproject.com/en/5.0/topics/http/file-uploads/' title=''>feature</a> <a href='https://expressjs.com/en/resources/middleware/multer.html' title=''>that</a> <a href='https://github.com/yesodweb/yesod-cookbook/blob/master/cookbook/Cookbook-file-upload-saving-files-to-server.md' title=''>does</a> <a href='https://laravel.com/docs/10.x/filesystem' title=''>it</a>. 
What&rsquo;s hard is making sure that uploads stick around to be downloaded later.</p> <aside class="right-sidenote"><p>(yes, yes, we know, <a href="https://youtu.be/b2F-DItXtZs?t=102" title="">sharding /dev/null</a> is faster)</p> </aside> <p>Enter object storage, a pattern you may know by its colloquial name &ldquo;S3&rdquo;. Object storage occupies a funny place in software architecture, somewhere between a database and a filesystem. It&rsquo;s like <a href='https://man7.org/linux/man-pages/man3/malloc.3.html' title=''><code>malloc</code></a><code>()</code>, but for cloud storage instead of program memory.</p> <p><a href='https://www.kleenex.com/en-us/' title=''>S3</a>—err, object storage — is so important that it was the second AWS service ever introduced (EC2 was not the first!). Everybody wants it. We know, because they keep asking us for it.</p> <p>So why didn&rsquo;t we build it?</p> <p>Because we couldn&rsquo;t figure out a way to improve on S3. And we still haven&rsquo;t! But someone else did, at least for the kinds of applications we see on Fly.io.</p> <h2 id='but-first-some-back-story' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#but-first-some-back-story' aria-label='Anchor'></a><span class='plain-code'>But First, Some Back Story</span></h2> <p>S3 checks all the boxes. It&rsquo;s trivial to use. It&rsquo;s efficient and cost-effective. It has redundancies that would make a DoD contractor blush. It integrates with archival services like Glacier. And every framework supports it. At some point, the IETF should just take a deep sigh and write an S3 API RFC, XML signatures and all.</p> <p>There&rsquo;s at least one catch, though.</p> <p>Back in, like, &lsquo;07 people ran all their apps from a single city. S3 was designed to work for those kinds of apps. 
The data, the bytes on the disks (or whatever weird hyperputer AWS stores S3 bytes on), live in one place. A specific place. In a specific data center. As powerful and inspiring as The Architects are, they are mortals, and must obey the laws of physics.</p> <p>This observation feels banal, until you realize how apps have changed in the last decade. Apps and their users don&rsquo;t live in one specific place. They live all over the world. When users are close to the S3 data center, things are amazing! But things get less amazing the further away you get from the data center, and even less amazing the smaller and more frequent your reads and writes are.</p> <p>(Thought experiment: you have to pick one place in the world to route all your file storage. Where is it? Is it <a href='https://www.tripadvisor.com/Restaurant_Review-g30246-d1956555-Reviews-Ford_s_Fish_Shack_Ashburn-Ashburn_Loudoun_County_Virginia.html' title=''>Loudoun County, Virginia</a>?)</p> <p>So, for many modern apps, you end up having to <a href='https://stackoverflow.com/questions/32426249/aws-s3-bucket-with-multiple-regions' title=''>write things into different regions</a>, so that people close to the data get it from a region-specific bucket. Doing that pulls in CDN-caching things that complicate your application and put barriers between you and your data. Before you know it, you&rsquo;re wearing custom orthotics on your, uh, developer feet. (<em>I am done with this metaphor now, I promise.</em>)</p> <aside class="right-sidenote"><p>(well, okay, Backblaze B2 because somehow my bucket fits into their free tier, but you get the idea)</p> </aside> <p>Personally, I know this happens. Because I had to build one! I run a <a href='https://xeiaso.net/blog/xedn/' title=''>CDN backend</a> that&rsquo;s a caching proxy for S3 on six continents. 
All so that I can deliver images and video efficiently for the readers of my blog.</p> <aside class="right-sidenote"><p>(shut up, it’s a sandwich)</p> </aside> <p>What if data was really global? For some applications, it might not matter much. But for others, it matters a lot. When a sandwich lover in Australia snaps a picture of a <a href='https://en.wikipedia.org/wiki/Hamdog' title=''>hamdog</a>, the people most likely to want to see that photo are also in Australia. Routing those uploads and downloads through one building in Ashburn is no way to build a sandwich reviewing empire.</p> <p>Localizing all the data sounds like a hard problem. What if you didn&rsquo;t need to change anything on your end to accomplish it?</p> <h2 id='show-me-a-hero' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#show-me-a-hero' aria-label='Anchor'></a><span class='plain-code'>Show Me A Hero</span></h2> <p>Building a miniature CDN infrastructure just to handle file uploads seems like the kind of thing that could take a week or so of tinkering. The Fly.io unified theory of cloud development is that solutions are completely viable for full-stack developers only when they take less than 2 hours to get working.</p> <p>AWS agrees, which is why they have a SKU for it, <a href='https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/cloudfront_distribution' title=''>called Cloudfront</a>, which will, at some variably metered expense, optimize the read side of a single-write-region bucket: they&rsquo;ll set up <a href='https://fly.io/blog/the-5-hour-content-delivery-network/' title=''>a simple caching CDN</a> for you. 
You can probably get S3 and Cloudfront working within 2 hours, especially if you&rsquo;ve set it up before.</p> <p>Our friends at Tigris have this problem down to single-digit minutes, and what they came up with is a lot cooler than a caching CDN.</p> <p>Here&rsquo;s how it works. Tigris runs redundant FoundationDB clusters in our regions to track objects. They use Fly.io&rsquo;s NVMe volumes as a first level of cached raw byte store, and a queuing system modelled on <a href='https://www.foundationdb.org/files/QuiCK.pdf' title=''>Apple&rsquo;s QuiCK paper</a> to distribute object data to multiple replicas, to regions where the data is in demand, and to 3rd party object stores… like S3.</p> <p>If your objects are less than about 128 kilobytes, Tigris makes them instantly global. By default! Things are just snappy, all over the world, automatically, because they&rsquo;ve done all the work.</p> <p>But it gets better, because Tigris is also much more flexible than a simple caching CDN. It&rsquo;s globally distributed from the jump, with inter-region routing baked into its distribution layer. Tigris isn&rsquo;t a CDN, but rather a toolset that you can use to build arbitrary CDNs, with consistency guarantees, instant purge, and relay regions.</p> <p>There&rsquo;s a lot going on in this architecture, and it&rsquo;d be fun to dig into it more. But for now, you don&rsquo;t have to understand any of it. Because Tigris ties all this stuff together with an S3-compatible object storage API. 
If your framework can talk to S3, it can use Tigris.</p> <h2 id='fly-storage' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#fly-storage' aria-label='Anchor'></a><span class='plain-code'><code>fly storage</code></span></h2> <p>To get started with this, run the <code>fly storage create</code> command:</p> <div class="highlight-wrapper group relative "> <button type="button" class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-wrap-target="#code-69koa0wf" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"><g buffered-rendering="static"><path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /><path d="M11.081 6.466L9.533 8.037l1.548 1.571" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"> Wrap text </span> </button> <button type="button" class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-copy-target="sibling" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"><g buffered-rendering="static"><path d="M10.576 
7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /><path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"> Copy to clipboard </span> </button> <div class='highlight relative group'> <pre class='highlight '><code id="code-69koa0wf">$ fly storage create Choose a name, use the default, or leave blank to generate one: xe-foo-images Your Tigris project (xe-foo-images) is ready. See details and next steps with: https://fly.io/docs/reference/tigris/ Setting the following secrets on xe-foo: AWS_REGION BUCKET_NAME AWS_ENDPOINT_URL_S3 AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY Secrets are staged for the first deployment </code></pre> </div> </div> <p>All you have to do is fill in a bucket name. Hit enter. All of the configuration for the AWS S3 library will be injected into your application for you. And you don&rsquo;t even need to change the libraries that you&rsquo;re using. <a href='https://www.tigrisdata.com/docs/sdks/s3/' title=''>The Tigris examples</a> all use the AWS libraries to put and delete objects into Tigris using the same calls that you use for S3.</p> <p>I know how this looks for a lot of you. It looks like we&rsquo;re partnering with Tigris because we&rsquo;re chicken, and we didn&rsquo;t want to build something like this. Well, guess what: you&rsquo;re right!</p> <p>Compute and networking: those are things we love and understand. Object storage? <a href='https://fly.io/blog/the-5-hour-content-delivery-network/' title=''>We already gave away the game on how we&rsquo;d design a CDN for our own content</a>, and it wasn&rsquo;t nearly as slick as Tigris.</p> <p>Object storage is important. It needs to be good. We did not want to half-ass it. 
So we partnered with Tigris, so that they can put their full resources into making object storage as ✨magical✨ as Fly.io is.</p> <p>This also mirrors a lot of the Unix philosophy of Days Gone Past: you have individual parts that each do one thing very well and are then chained together to create a composite result. I mean, come on, would you seriously want to buy your servers the same place you buy your shoes?</p> <h2 id='one-bill-to-rule-them-all' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#one-bill-to-rule-them-all' aria-label='Anchor'></a><span class='plain-code'>One bill to rule them all</span></h2> <p>Well, okay, the main reason you would want to do that is that having everything under one bill makes it really easy for your accounting people. So, to make one bill for your compute, your block storage, your databases, your networking, and your object storage, we&rsquo;ve wrapped everything under one bill. You don&rsquo;t have to create separate accounts with Supabase or Upstash or PlanetScale or Tigris. Everything gets charged to your Fly.io bill and you pay one bill per month.</p> <aside class="right-sidenote"><p>This was actually going to be posted on Valentine’s Day, but we had to wait for the chocolate to go on sale.</p> </aside> <p>This is our Valentine&rsquo;s Day gift to you all. Object storage that just works. 
Stay tuned, because we have a couple of exciting features that build on the integration of Fly.io and Tigris and allow really unique things, such as truly global static website hosting and turning your bucket into a CDN in 5 minutes at most.</p> <p>Here&rsquo;s to many more happy developer days to come.</p> </content> </entry> <entry> <title>GPUs on Fly.io are available to everyone!</title> <link rel="alternate" href="https://fly.io/blog/gpu-ga/"/> <id>https://fly.io/blog/gpu-ga/</id> <published>2024-02-12T00:00:00+00:00</published> <updated>2024-04-12T18:23:39+00:00</updated> <media:thumbnail url="https://fly.io/blog/gpu-ga/assets/gpu-ga-thumb.webp"/> <content type="html"><div class="lead"><p>Fly.io makes it easy to spin up compute around the world, now including powerful GPUs. Unlock the power of large language models, text transcription, and image generation with our datacenter-grade muscle!</p> </div> <p>GPUs are now available to everyone!</p> <p>We know you&rsquo;ve been eager to use GPUs on Fly.io, and we&rsquo;re happy to announce that they&rsquo;re available to everyone. If you want, you can spin up GPU instances with any of the following cards:</p> <ul> <li>Ampere A100 (40GB) <code>a100-40gb</code> </li><li>Ampere A100 (80GB) <code>a100-80gb</code> </li><li>Lovelace L40s (48GB) <code>l40s</code> </li></ul> <p>To use a GPU instance today, change the <code>vm.size</code> for one of your apps or processes to any of the above GPU kinds. 
Here&rsquo;s how you can spin up an <a href='https://ollama.ai' title=''>Ollama</a> server in seconds:</p> <div class="highlight-wrapper group relative toml"> <button type="button" class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-wrap-target="#code-mgip5vdl" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"><g buffered-rendering="static"><path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /><path d="M11.081 6.466L9.533 8.037l1.548 1.571" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"> Wrap text </span> </button> <button type="button" class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-copy-target="sibling" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"><g buffered-rendering="static"><path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /><path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"> Copy to clipboard </span> </button> <div 
class='highlight relative group'> <pre class='highlight '><code id="code-mgip5vdl"><span class="py">app</span> <span class="p">=</span> <span class="s">"your-app-name"</span> <span class="py">region</span> <span class="p">=</span> <span class="s">"ord"</span> <span class="py">vm.size</span> <span class="p">=</span> <span class="s">"l40s"</span> <span class="nn">[http_service]</span> <span class="py">internal_port</span> <span class="p">=</span> <span class="mi">11434</span> <span class="py">force_https</span> <span class="p">=</span> <span class="kc">false</span> <span class="py">auto_stop_machines</span> <span class="p">=</span> <span class="kc">true</span> <span class="py">auto_start_machines</span> <span class="p">=</span> <span class="kc">true</span> <span class="py">min_machines_running</span> <span class="p">=</span> <span class="mi">0</span> <span class="py">processes</span> <span class="p">=</span> <span class="nn">["app"]</span> <span class="nn">[build]</span> <span class="py">image</span> <span class="p">=</span> <span class="s">"ollama/ollama"</span> <span class="nn">[mounts]</span> <span class="py">source</span> <span class="p">=</span> <span class="s">"models"</span> <span class="py">destination</span> <span class="p">=</span> <span class="s">"/root/.ollama"</span> <span class="py">initial_size</span> <span class="p">=</span> <span class="s">"100gb"</span> </code></pre> </div> </div> <p>Deploy this and bam, large language model inferencing from anywhere. If you want a private setup, see the article <a href='https://fly.io/blog/scaling-llm-ollama/' title=''>Scaling Large Language Models to zero with Ollama</a> for more information. You never know when you have a sandwich emergency and don&rsquo;t know what you can make with what you have on hand.</p> <p>We are working on getting some lower-cost A10 GPUs in the next few weeks. 
We&rsquo;ll update you when they&rsquo;re ready.</p> <p>If you want to explore the possibilities of GPUs on Fly.io, here are a few articles that may give you ideas:</p> <ul> <li><a href='https://fly.io/blog/not-midjourney-bot/' title=''>Deploy Your Own (Not) MidJourney Bot On Fly GPUs</a> </li><li><a href='https://fly.io/blog/scaling-llm-ollama/' title=''>Scaling Large Language Models to zero with Ollama</a> </li><li><a href='https://fly.io/blog/transcribing-on-fly-gpu-machines/' title=''>Transcribing on Fly GPU Machines</a> </li></ul> <p>Depending on factors such as your organization&rsquo;s age and payment history, you may need to go through additional verification steps.</p> <p>If you&rsquo;ve been experimenting with Fly.io GPUs and have made something cool, let us know on the <a href='https://community.fly.io/' title=''>Community Forums</a> or by mentioning us <a href='https://hachyderm.io/@flydotio' title=''>on Mastodon</a>! We&rsquo;ll boost the cool ones.</p> </content> </entry> <entry> <title>Event Driven Machines</title> <link rel="alternate" href="https://fly.io/blog/event-driven-machines/"/> <id>https://fly.io/blog/event-driven-machines/</id> <published>2024-02-05T00:00:00+00:00</published> <updated>2024-04-12T18:23:39+00:00</updated> <media:thumbnail url="https://fly.io/blog/event-driven-machines/assets/lambdo-thumb.webp"/> <content type="html"><div class="lead"><p>We’re Fly.io and we transmute containers into VMs, running them on our hardware around the world. We have fast-booting VMs, so why not <a href="https://fly.io/docs/speedrun/" title="">take advantage of them</a>?</p> </div> <p>Serverless is great because it has good ergonomics - when an event is received, a &ldquo;not-server&rdquo; boots quickly, code is run, and then everything is torn down. We&rsquo;re billed only on usage.</p> <p>It turns out that Fly.io shares many of <a href='https://fly.io/blog/the-serverless-server/' title=''>the same ergonomics</a> as serverless. 
Can we do a serverless on Fly.io? 🦆 Well, if it&rsquo;s quacking like a duck, let&rsquo;s call it a mallard.</p> <p>Here&rsquo;s a useful pattern for triggering our own not-servers with Fly Machines.</p> <h2 id='triggering-machines' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#triggering-machines' aria-label='Anchor'></a><span class='plain-code'>Triggering Machines</span></h2> <p>I want to make Machines do some work based on my own events. Fly.io can already <a href='https://fly.io/docs/apps/autostart-stop/' title=''>stop Machines when idle</a> based on HTTP, so let&rsquo;s concentrate on non-HTTP events.</p> <p>The process of running evented Machines involves:</p> <ol> <li>Listening for events </li><li>Spinning up Fly Machines to run our code (with the events as context) </li><li>Having event-aware code to run </li></ol> <p>To do this, I made a project and named it <a href='https://github.com/fly-apps/lambdo' title=''><strong class='font-semibold text-navy-950'>Lambdo</strong></a> because reasons. You can consider this project &ldquo;reference architecture&rdquo; in the same way you call a toddler&rsquo;s scribbling &ldquo;art&rdquo;.</p> <p>The goal is to run some of our code on a fresh not-server when an event is received. We want this done efficiently - a Machine should only exist long enough to process an event or 3.</p> <p>Lambdo does just that - it receives some events, and spins up Fly Machines with those events placed <em>inside</em> the VMs. 
Once the code finishes, the Machine is destroyed.</p> <div class='group relative min-w-0 bg-white shadow-md shadow-navy-500/10 rounded-xl mb-7 ring-1 ring-navy-300/40'><button type='button' class='bubble-wrap z-20 absolute right-2.5 top-2.5 text-transparent group-hover:text-navy-950 hocus:text-violet-600 bg-transparent group-hover:bg-white hocus:bg-violet-200/40 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none' data-wrap-target='#table-hzpri2l5' data-wrap-type='nowrap'><svg class='w-5 h-5 pointer-events-none' viewBox='0 0 20 20' fill='none' stroke='currentColor' stroke-width='1.5' stroke-linecap='round' stroke-linejoin='round'><g buffered-rendering='static'><path d='M11.912 10.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.314 2.314 0 00-2.315-2.31H4.959M15.187 14.5H4.959M8.802 10H4.959' /><path d='M13.081 8.466l-1.548 1.571 1.548 1.571' /></g></svg><span class='bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950'>Wrap text</span></button><div class='min-w-0 overflow-x-auto rounded-xl'><table class='table-stripe table-stretch table-pad text-sm whitespace-nowrap m-0' id='table-hzpri2l5'><thead class='text-navy-950 text-left'><tr> <th style="text-align: center"><img alt="the files are inside the computer" src="/blog/event-driven-machines/assets/files-are-inside-the-computer-cover.webp" /></th> </tr> </thead><tbody><tr> <td style="text-align: center">The files are <em>in</em> the computer!</td> </tr> </tbody></table></div></div><h2 id='listening-for-events' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#listening-for-events' aria-label='Anchor'></a><span class='plain-code'>Listening for Events</span></h2> <p>For our purposes, an event is just a JSON object. 
<code>{&quot;any&quot;: &quot;object&quot;, &quot;will&quot;: &quot;do&quot;}</code>.</p> <p>We want to turn events into compute, so we need some sort of event system. I decided to use a queue.</p> <h3 id='the-queue' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#the-queue' aria-label='Anchor'></a><span class='plain-code'>The Queue</span></h3> <p>The first thing I needed was a place to send events! I chose to use SQS, which let me continue to pretend servers don&rsquo;t exist.</p> <p>It&rsquo;s no surprise then that the first part of this project is <a href='https://github.com/fly-apps/lambdo/blob/main/internal/sqs/get_events.go' title=''>code that polls SQS</a>.</p> <p>When the polling returns some non-zero number of events, it collects the SQS messages&rsquo; JSON strings (and some meta data), resulting in an array of objects (a list of events).</p> <p>Then we send these events to some Machines.</p> <h2 id='spinning-up-machines' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#spinning-up-machines' aria-label='Anchor'></a><span class='plain-code'>Spinning Up Machines</span></h2> <p>Fly Machines are fast-booting micro-VMs, controlled by an <a href='https://fly.io/docs/machines/working-with-machines/' title=''>API</a>.</p> <p>A feature of that API is the ability to <a href='https://community.fly.io/t/machine-files/14453' title=''>create files</a> on a new Machine. This is how we&rsquo;ll get our events into the Machine.</p> <p>When Lambdo creates a Machine, it places a file at <code>/tmp/events.json</code>. 
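</p> <p>As a rough illustration, reading that file from a Node process could look something like the sketch below. <code>loadEvents</code> and <code>EventPayload</code> are hypothetical names made up for this example (they aren&rsquo;t part of Lambdo), and the sketch assumes the file holds a JSON array of event objects:</p> <div class="highlight-wrapper group relative typescript"> <div class='highlight relative group'> <pre class='highlight '><code>// Sketch only: loadEvents and EventPayload are hypothetical names, not part of Lambdo.
import { readFileSync } from "node:fs";

// For our purposes an event is just a JSON object, so keep the type loose.
type EventPayload = { [key: string]: unknown };

// Read the events file and return its contents as a list of events.
// Assumes the file holds a JSON array; anything else is rejected.
export function loadEvents(path: string = "/tmp/events.json"): EventPayload[] {
  const parsed: unknown = JSON.parse(readFileSync(path, "utf8"));
  if (!Array.isArray(parsed)) {
    throw new Error("expected " + path + " to contain a JSON array of events");
  }
  return parsed as EventPayload[];
}
</code></pre> </div> </div> <p>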
Our code just needs to read that file and parse the JSON.</p> <h3 id='running-our-code' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#running-our-code' aria-label='Anchor'></a><span class='plain-code'>Running Our Code</span></h3> <p>Part of the ergonomics of Serverless is (usually) being limited to running just a function. Fly.io doesn&rsquo;t really care what you run, which is to our advantage. We can choose to write discrete functions per event, or we can bring our whole <a href='https://signalvnoise.com/svn3/the-majestic-monolith/' title=''>Majestic Monolith</a> to bear.</p> <p>How do we package up our code? The real answer is &ldquo;however you want!&rdquo;, but here are two ideas.</p> <p><strong class='font-semibold text-navy-950'>Use Your Existing Code Base</strong></p> <p>You can just use your existing code base. 
This is especially easy if you&rsquo;re already deploying apps to Fly.io.</p> <p>All we&rsquo;d need to do is add some additional code - a command perhaps (<code>rake</code>, <code>artisan</code>, whatever) - that sucks in that JSON, iterates over the events, and does some stuff.</p> <div class="highlight-wrapper group relative php"> <button type="button" class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-wrap-target="#code-4juzgucl" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"><g buffered-rendering="static"><path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /><path d="M11.081 6.466L9.533 8.037l1.548 1.571" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"> Wrap text </span> </button> <button type="button" class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-copy-target="sibling" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"><g buffered-rendering="static"><path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /><path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 
0-1.815.819-1.816 1.815v1.617" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"> Copy to clipboard </span> </button> <div class='highlight relative group'> <pre class='highlight '><code id="code-4juzgucl"><span class="nv">$events</span> <span class="o">=</span> <span class="nb">json_decode</span><span class="p">(</span><span class="nb">file_get_contents</span><span class="p">(</span><span class="s2">"/tmp/events.json"</span><span class="p">));</span>

<span class="k">foreach</span> <span class="p">(</span><span class="nv">$events</span> <span class="k">as</span> <span class="nv">$event</span><span class="p">)</span> <span class="p">{</span>
    <span class="c1">// do a thing</span>
<span class="p">}</span>
</code></pre> </div> </div> <p>When we create an event, we&rsquo;ll tell Lambdo how to run your code - more on that later.</p> <p><strong class='font-semibold text-navy-950'>Use Lambdo&rsquo;s Base Images</strong></p> <p>This project also provides some &ldquo;runtimes&rdquo; (base images). This is a bit more &ldquo;traditional serverless&rdquo;, where you provide a function to run.</p> <p>Lambdo contains <a href='https://github.com/fly-apps/lambdo/tree/main/runtimes' title=''>two runtimes</a> right now - Node and PHP. 
There could be more, of course, but you know&hellip;lazy.</p> <p>The Node runtime <a href='https://github.com/fly-apps/lambdo/blob/main/runtimes/js/src/index.js' title=''>contains some code</a> that will read the JSON payload file (again, just an array of JSON events), and call a user-supplied JS function once per event.</p> <p>An <a href='https://github.com/fly-apps/lambdo/tree/main/runtimes/js/sample-project' title=''>example is here</a> - our code just needs to export a function that does stuff to the given event:</p> <div class="highlight-wrapper group relative javascript"> <button type="button" class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-wrap-target="#code-d6ki7m4i" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"><g buffered-rendering="static"><path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /><path d="M11.081 6.466L9.533 8.037l1.548 1.571" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"> Wrap text </span> </button> <button type="button" class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-copy-target="sibling" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"><g buffered-rendering="static"><path d="M10.576 
7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /><path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"> Copy to clipboard </span> </button> <div class='highlight relative group'> <pre class='highlight '><code id="code-d6ki7m4i"><span class="c1">// File /app/index.js</span>
<span class="nx">exports</span><span class="p">.</span><span class="nx">handler</span> <span class="o">=</span> <span class="k">async</span> <span class="kd">function</span><span class="p">(</span><span class="nx">event</span><span class="p">)</span> <span class="p">{</span>
  <span class="nx">console</span><span class="p">.</span><span class="nx">log</span><span class="p">(</span><span class="dl">"</span><span class="s2">Let's process an event! The event:</span><span class="dl">"</span><span class="p">,</span> <span class="nx">event</span><span class="p">)</span>
<span class="p">}</span>
</code></pre> </div> </div> <p>The <a href='https://github.com/fly-apps/lambdo/tree/main/runtimes/php' title=''>PHP runtime</a> follows the same idea. A user-supplied handler looks like this:</p> <div class="highlight-wrapper group relative php"> <button type="button" class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-wrap-target="#code-coch74a" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"><g buffered-rendering="static"><path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /><path d="M11.081 6.466L9.533 8.037l1.548 1.571" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"> Wrap text </span> </button> <button type="button" class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-copy-target="sibling" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"><g buffered-rendering="static"><path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /><path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 
8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"> Copy to clipboard </span> </button> <div class='highlight relative group'> <pre class='highlight '><code id="code-coch74a"><span class="c1">// File /app/index.php</span>
<span class="k">return</span> <span class="k">function</span><span class="p">(</span><span class="kt">array</span> <span class="nv">$event</span><span class="p">)</span> <span class="p">{</span>
    <span class="c1">// Do something with $event</span>
<span class="p">}</span>
</code></pre> </div> </div> <p>Explore the <a href='https://github.com/fly-apps/lambdo/tree/main/runtimes' title=''>runtimes</a> directory of the project to see how that&rsquo;s put together.</p> <h2 id='sending-an-event' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#sending-an-event' aria-label='Anchor'></a><span class='plain-code'>Sending an Event</span></h2> <p>Since our events are sent via SQS queue, it would be helpful to see an example SQS message. 
Remember how I mentioned the SQS message has some meta data?</p> <p>Here&rsquo;s an example, with said meta data:</p> <div class="highlight-wrapper group relative bash"> <button type="button" class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-wrap-target="#code-uwc3p0p" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"><g buffered-rendering="static"><path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /><path d="M11.081 6.466L9.533 8.037l1.548 1.571" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"> Wrap text </span> </button> <button type="button" class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-copy-target="sibling" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"><g buffered-rendering="static"><path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /><path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"> Copy to clipboard </span> 
</button> <div class='highlight relative group'> <pre class='highlight '><code id="code-uwc3p0p">aws sqs send-message <span class="se">\</span>
  <span class="nt">--queue-url</span><span class="o">=</span>https://sqs.&lt;region&gt;.amazonaws.com/&lt;account&gt;/&lt;queue&gt; <span class="se">\</span>
  <span class="nt">--message-body</span><span class="o">=</span><span class="s1">'{"foo": "bar"}'</span> <span class="se">\</span>
  <span class="nt">--message-attributes</span><span class="o">=</span><span class="s1">'{
    "size":{"DataType":"String","StringValue":"performance-2x"},
    "image":{"DataType":"String","StringValue":"fideloper/lambdo-php-sample:latest"}
  }'</span>
</code></pre> </div> </div> <p>The Body field of the SQS message is assumed to be a JSON string (it&rsquo;s the event itself, and its contents are arbitrary - whatever makes sense for you).</p> <p>The message attributes contain the meta data - up to 3 important details:</p> <ol> <li><code>image</code>: The image to run (it might be a Docker Hub image, or something you pushed to registry.fly.io). This is <strong class='font-semibold text-navy-950'>required</strong>. </li><li><code>size</code>: The CPU size and type to use† - defaults to <code>performance-2x</code> </li><li><code>command</code>: The command to run, which is the Docker <code>CMD</code> equivalent - defaults to whatever <code>CMD</code> is set to in the <code>Dockerfile</code> used to create the Machine image.†† </li></ol> <p>†You can get valid values for the <code>size</code> option by running <code>fly platform vm-sizes</code>.</p> <p>††It&rsquo;s an array form, e.g. 
<code>[&quot;php&quot;, &quot;artisan&quot;, &quot;foo&quot;]</code>. You may need to escape the double quotes if you&rsquo;re sending messages to SQS from a terminal.</p> <h2 id='we-did-a-lambda' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#we-did-a-lambda' aria-label='Anchor'></a><span class='plain-code'>We did a Lambda?</span></h2> <p>Fly.io isn&rsquo;t serverless, but it has all these primitives that add up to serverless. You have events, Fly.io has fast-booting VMs. They just make sense together!</p> <p>What we did here is use <a href='https://github.com/fly-apps/lambdo' title=''><strong class='font-semibold text-navy-950'>Lambdo</strong> to respond to events by spinning up a Machine</a>. Our code can process those events any way we want.</p> <p>What I like about this approach is how flexible it can be. We can choose the base image to use and the server type (even using GPU-enabled Machines) <em>per event</em>. Since we have full control over the Machine VMs responding to the events, we can do whatever we want inside of them. Pretty neat!</p> </content> </entry> <entry> <title>Delegating tasks to Fly Machines</title> <link rel="alternate" href="https://fly.io/blog/delegate-tasks-to-fly-machines/"/> <id>https://fly.io/blog/delegate-tasks-to-fly-machines/</id> <published>2024-02-01T00:00:00+00:00</published> <updated>2024-04-12T18:23:39+00:00</updated> <media:thumbnail url="https://fly.io/blog/delegate-tasks-to-fly-machines/assets/delegate-tasks-to-fly-machines-thumb.webp"/> <content type="html"><div class="lead"><p>We’re Fly.io. We run apps for our users on hardware we host around the world. Leveraging Fly.io Machines and Fly.io’s private network can make delegating expensive tasks a breeze. 
It’s easy to <a href="/docs/speedrun/" title="">get started</a>!</p> </div> <p>There are many ways to delegate work in web applications, from using background workers to serverless architecture. In this article, we explore a new machine pattern that takes advantage of Fly Machines and distinct process groups to make quick work of resource-intensive tasks.</p> <h2 id='the-problem' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#the-problem' aria-label='Anchor'></a><span class='plain-code'>The Problem</span></h2> <p>Let&rsquo;s say you&rsquo;re building a web application that has a few tasks that demand a hefty amount of memory or CPU juice. Resizing images, for example, can require a shocking amount of memory, but you might not need that much memory <em>all</em> of the time, for handling most of your web requests. Why pay for all that horsepower when you don&rsquo;t need it most of the time?</p> <p>What if there&rsquo;s a different way to delegate these resource-intensive tasks?</p> <h2 id='the-solution' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#the-solution' aria-label='Anchor'></a><span class='plain-code'>The Solution</span></h2> <p>What if you could simply delegate these types of tasks to a more powerful machine <em>only</em> when necessary? Let&rsquo;s build an example of this method in a sample app. 
We&rsquo;ll be using Next.js today, but this pattern is framework (and language) agnostic.</p> <p>Here&rsquo;s how it will work:</p> <ul> <li>A request hits an endpoint that does some resource-intensive tasks </li><li>The request is passed on to a copy of your app that&rsquo;s running on a more beefy machine </li><li>The beefy machine performs the intensive work and then hands the result back to the user via the &ldquo;weaker&rdquo; machine. </li></ul> <p><img alt="(A 3 panel comic of two characters, one small and one big and strong, both with computer screens for heads. Panel 1: Little guy hands the big guy a jar of pickles. Panel 2: Big guy opens the pickle jar. Panel 3: Big guy hands back the opened jar to the little guy, who is pleased; Illustration by Annie Sexton)" src="/blog/delegate-tasks-to-fly-machines/assets/./3-panel-comic-delegate-tasks-to-fly-machines.webp" /></p> <p>To demonstrate this task-delegation pattern, we&rsquo;re going to start with a single-page application that looks like this:</p> <p><img alt="(Screenshot of the demo app; its a single-page app with the header and description &quot;Open Pickle Jar: You&#39;ve got a jar of pickles (a zip file of some high-def pickle photos) that you would like to open (resize and display below)&quot;. Under the description there are two inputs, one for width and one for height, and a button that says &quot;Open pickle jar&quot;)" src="/blog/delegate-tasks-to-fly-machines/assets/./pickle-jar-screenshot.webp" /></p> <p>Our &ldquo;Open Pickle Jar&rdquo; app is quite simple: you provide the width and height and it goes off and resizes some high-resolution photos to those dimensions (exciting!).</p> <p>If you&rsquo;d like to follow along, you can clone the <code>start-here</code> branch of this repository: <a href='https://github.com/fly-apps/open-pickle-jar' title=''>https://github.com/fly-apps/open-pickle-jar</a> . The final changes are visible on the <code>main</code> branch. 
This app uses S3 for image storage, so you&rsquo;ll need to create a bucket called <code>open-pickle-jar</code> and provide <code>AWS_REGION</code>, <code>AWS_ACCESS_KEY_ID</code>, and <code>AWS_SECRET_ACCESS_KEY</code> as environment variables.</p> <p>This task is really just a stand-in for any HTTP request that kicks off a resource-intensive task. Get the request from the user, delegate it to a more powerful machine, and then return the result to the user. It&rsquo;s what happens when you can&rsquo;t open a pickle jar, and you ask for someone to help.</p> <p>Before we start, let&rsquo;s define some terms and what they mean on Fly.io:</p> <ul> <li><strong class='font-semibold text-navy-950'>Machines:</strong> Extremely fast-booting VMs. They can exist in different regions and even run different processes. </li><li><strong class='font-semibold text-navy-950'>App:</strong> An abstraction for a group of Machines running your code on Fly.io, along with the configuration, provisioned resources, and data we need to keep track of to run and route to your Machines. </li><li><strong class='font-semibold text-navy-950'>Process group:</strong> A collection of Machines running a specific process. Many apps only run a single process (typically a public-facing HTTP server), but you can define any number of them. </li><li><strong class='font-semibold text-navy-950'>fly.toml:</strong> A configuration file for deploying apps on Fly.io where you can set things like Machine specs, process groups, regions, and more. 
</li></ul> <hr> <p><strong class='font-semibold text-navy-950'>Setup Overview</strong></p> <p>Here&rsquo;s what we&rsquo;ll need for our application:</p> <ol> <li>A <strong class='font-semibold text-navy-950'>route</strong> that performs our resource-intensive task </li><li>A <strong class='font-semibold text-navy-950'>wrapper function</strong> that either: <ol> <li>Runs our resource-intensive task OR </li><li>Forwards the request to our more powerful Machine </li></ol> </li><li><strong class='font-semibold text-navy-950'>Two process groups</strong> running the <em>same process</em> but with differing Machine specs: <ol> <li>One for accepting HTTP traffic and handling most requests (let&rsquo;s call it <code>web</code>) </li><li>One internal-only group for doing the heavy lifting (let&rsquo;s call it <code>worker</code>) </li></ol> </li></ol> <p>In short, this is what our architecture will look like, a standard web and worker duo.</p> <p><img alt="(A simple graphic illustrating two servers; a small box containing &quot;npm run start&quot; and a larger box containing the same thing. The small is labeled &quot;web&quot; and the larger box is labeled &quot;worker&quot;.)" src="/blog/delegate-tasks-to-fly-machines/assets/./web-worker.webp" /></p> <h3 id='creating-our-route' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#creating-our-route' aria-label='Anchor'></a><span class='plain-code'>Creating our route</span></h3> <p>Next.js has two distinct routing patterns: Pages and App router. 
We&rsquo;ll use the App router in our example since it&rsquo;s the preferred method moving forward.</p> <p>Under your <code>/app</code> directory, create a new folder called <code>/open-pickle-jar</code> containing a <code>route.ts</code> .</p> <p>(We&rsquo;re using TypeScript here, but feel free to use normal JavaScript if you prefer!)</p> <div class="highlight-wrapper group relative "> <button type="button" class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-wrap-target="#code-lg2jvd1h" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"><g buffered-rendering="static"><path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /><path d="M11.081 6.466L9.533 8.037l1.548 1.571" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"> Wrap text </span> </button> <button type="button" class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-copy-target="sibling" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"><g buffered-rendering="static"><path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /><path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 
8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"> Copy to clipboard </span> </button> <div class='highlight relative group'> <pre class='highlight '><code id="code-lg2jvd1h">... /app /open-pickle-jar route.ts ... </code></pre> </div> </div> <p>Inside <code>route.ts</code> we&rsquo;ll flesh out our endpoint:</p> <div class="highlight-wrapper group relative typescript"> <button type="button" class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-wrap-target="#code-x0guz9t5" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"><g buffered-rendering="static"><path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /><path d="M11.081 6.466L9.533 8.037l1.548 1.571" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"> Wrap text </span> </button> <button type="button" class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-copy-target="sibling" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"><g buffered-rendering="static"><path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 
.995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /><path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"> Copy to clipboard </span> </button> <div class='highlight relative group'> <pre class='highlight '><code id="code-x0guz9t5"><span class="c1">// /app/open-pickle-jar/route.ts</span>
<span class="k">import</span> <span class="nx">delegateToWorker</span> <span class="k">from</span> <span class="dl">"</span><span class="s2">@/utils/delegateToWorker</span><span class="dl">"</span><span class="p">;</span>
<span class="k">import</span> <span class="p">{</span> <span class="nx">NextRequest</span><span class="p">,</span> <span class="nx">NextResponse</span> <span class="p">}</span> <span class="k">from</span> <span class="dl">"</span><span class="s2">next/server</span><span class="dl">"</span><span class="p">;</span>
<span class="k">import</span> <span class="p">{</span> <span class="nx">openPickleJar</span> <span class="p">}</span> <span class="k">from</span> <span class="dl">"</span><span class="s2">../openPickleJar</span><span class="dl">"</span><span class="p">;</span>

<span class="k">export</span> <span class="k">async</span> <span class="kd">function</span> <span class="nx">POST</span><span class="p">(</span><span class="nx">request</span><span class="p">:</span> <span class="nx">NextRequest</span><span class="p">)</span> <span class="p">{</span>
  <span class="kd">const</span> <span class="p">{</span> <span class="nx">width</span><span class="p">,</span> <span class="nx">height</span> <span class="p">}</span> <span class="o">=</span> <span class="k">await</span> <span class="nx">request</span><span class="p">.</span><span class="nx">json</span><span class="p">();</span>
  <span class="kd">const</span> <span class="nx">path</span> <span class="o">=</span> <span class="nx">request</span><span class="p">.</span><span class="nx">nextUrl</span><span class="p">.</span><span class="nx">pathname</span><span class="p">;</span>
  <span class="kd">const</span> <span class="nx">body</span> <span class="o">=</span> <span class="k">await</span> <span class="nx">delegateToWorker</span><span class="p">(</span><span class="nx">path</span><span class="p">,</span> <span class="nx">openPickleJar</span><span class="p">,</span> <span class="p">{</span> <span class="nx">width</span><span class="p">,</span> <span class="nx">height</span> <span class="p">});</span>
  <span class="k">return</span> <span class="nx">NextResponse</span><span class="p">.</span><span class="nx">json</span><span class="p">(</span><span class="nx">body</span><span class="p">);</span>
<span class="p">}</span>
</code></pre> </div> </div> <p>The function <code>openPickleJar</code> that we&rsquo;re importing contains our resource-intensive task, which in this case is extracting images from a <code>.zip</code> file, resizing them all to the new dimensions, and returning the new image URLs.</p> <p>Exporting a function named <code>POST</code> is how one defines a route for that HTTP method in Next.js. Ours calls a function <code>delegateToWorker</code> that accepts the path of the current endpoint (<code>/open-pickle-jar</code>), our resource-intensive function, and the same request parameters. 
This function doesn&rsquo;t yet exist, so let&rsquo;s build that next!</p> <h3 id='creating-our-wrapper-function' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#creating-our-wrapper-function' aria-label='Anchor'></a><span class='plain-code'>Creating our wrapper function</span></h3> <p>Now that we&rsquo;ve set up our endpoint, let&rsquo;s flesh out the wrapper function that delegates our request to a more powerful machine.</p> <p>We haven&rsquo;t defined our process groups just yet, but if you recall, the plan is to have two:</p> <ol> <li><code>web</code> - Our standard web server </li><li><code>worker</code> - For opening pickle jars (i.e., doing resource-intensive work). It&rsquo;s essentially a duplicate of <code>web</code>, but running on beefier Machines. </li></ol> <p>Here&rsquo;s what we want this wrapper function to do:</p> <ul> <li>If the current Machine is a <code>worker</code>, proceed to execute the resource-intensive task </li><li>If the current Machine is NOT a <code>worker</code>, make a new request to the identical endpoint on a <code>worker</code> Machine </li></ul> <p>Inside your <code>/utils</code> directory, create a file called <code>delegateToWorker.ts</code> with the following content:</p> <div class="highlight-wrapper group relative typescript"> <div class='highlight relative group'> <pre class='highlight '><code id="code-c07fgdhq"><span class="c1">// /utils/delegateToWorker.ts</span>
<span class="k">export</span> <span class="k">default</span> <span class="k">async</span> <span class="kd">function</span> <span class="nx">delegateToWorker</span><span class="p">(</span><span class="nx">path</span><span class="p">:</span> <span class="kr">string</span><span class="p">,</span> <span class="nx">func</span><span class="p">:</span> <span class="p">(...</span><span class="nx">args</span><span class="p">:</span> <span class="kr">any</span><span class="p">[])</span> <span class="o">=&gt;</span> <span 
class="nb">Promise</span><span class="o">&lt;</span><span class="kr">any</span><span class="o">&gt;</span><span class="p">,</span> <span class="nx">args</span><span class="p">:</span> <span class="nx">object</span><span class="p">):</span> <span class="nb">Promise</span><span class="o">&lt;</span><span class="kr">any</span><span class="o">&gt;</span> <span class="p">{</span> <span class="k">if</span> <span class="p">(</span><span class="nx">process</span><span class="p">.</span><span class="nx">env</span><span class="p">.</span><span class="nx">FLY_PROCESS_GROUP</span> <span class="o">===</span> <span class="dl">'</span><span class="s1">worker</span><span class="dl">'</span><span class="p">)</span> <span class="p">{</span> <span class="nx">console</span><span class="p">.</span><span class="nx">log</span><span class="p">(</span><span class="dl">'</span><span class="s1">running on the worker...</span><span class="dl">'</span><span class="p">);</span> <span class="k">return</span> <span class="nx">func</span><span class="p">({...</span><span class="nx">args</span><span class="p">});</span> <span class="p">}</span> <span class="k">else</span> <span class="p">{</span> <span class="nx">console</span><span class="p">.</span><span class="nx">log</span><span class="p">(</span><span class="dl">'</span><span class="s1">sending new request to worker...</span><span class="dl">'</span><span class="p">);</span> <span class="kd">const</span> <span class="nx">workerHost</span> <span class="o">=</span> <span class="nx">process</span><span class="p">.</span><span class="nx">env</span><span class="p">.</span><span class="nx">NODE_ENV</span> <span class="o">===</span> <span class="dl">'</span><span class="s1">development</span><span class="dl">'</span> <span class="p">?</span> <span class="dl">'</span><span class="s1">localhost:3001</span><span class="dl">'</span> <span class="p">:</span> <span class="s2">`worker.process.</span><span class="p">${</span><span 
class="nx">process</span><span class="p">.</span><span class="nx">env</span><span class="p">.</span><span class="nx">FLY_APP_NAME</span><span class="p">}</span><span class="s2">.internal:3000`</span><span class="p">;</span> <span class="kd">const</span> <span class="nx">response</span> <span class="o">=</span> <span class="k">await</span> <span class="nx">fetch</span><span class="p">(</span><span class="s2">`http://</span><span class="p">${</span><span class="nx">workerHost</span><span class="p">}${</span><span class="nx">path</span><span class="p">}</span><span class="s2">`</span><span class="p">,</span> <span class="p">{</span> <span class="na">method</span><span class="p">:</span> <span class="dl">'</span><span class="s1">POST</span><span class="dl">'</span><span class="p">,</span> <span class="na">headers</span><span class="p">:</span> <span class="p">{</span> <span class="dl">'</span><span class="s1">Content-Type</span><span class="dl">'</span><span class="p">:</span> <span class="dl">'</span><span class="s1">application/json</span><span class="dl">'</span> <span class="p">},</span> <span class="na">body</span><span class="p">:</span> <span class="nx">JSON</span><span class="p">.</span><span class="nx">stringify</span><span class="p">({...</span><span class="nx">args</span> <span class="p">})</span> <span class="p">});</span> <span class="k">return</span> <span class="nx">response</span><span class="p">.</span><span class="nx">json</span><span class="p">();</span> <span class="p">}</span> <span class="p">}</span> </code></pre> </div> </div> <p>In our <code>else</code> section, you&rsquo;ll notice that while developing locally (aka, when <code>NODE_ENV</code> is <code>development</code>) we define the hostname of our <code>worker</code> process to be <code>localhost:3001</code>. 
Typically Next.js apps run on port <code>3000</code>, so while testing our app locally, we can have two instances of our process running in different terminal shells:</p> <ul> <li><code>npm run dev</code> - This will run on <code>localhost:3000</code> and will act as our local <code>web</code> process </li><li><code>FLY_PROCESS_GROUP=worker npm run dev</code> - This will run on <code>localhost:3001</code> and will act as our <code>worker</code> process (Next.js should auto-increment the port if the original <code>3000</code> is already in use) </li></ul> <p>Also, if you&rsquo;re wondering about the <code>FLY_PROCESS_GROUP</code> and <code>FLY_APP_NAME</code> constants, these are <a href='https://fly.io/docs/reference/runtime-environment/' title=''>Fly.io-specific runtime environment variables</a> available on all apps.</p> <h3 id='accessing-our-worker-machines-internal' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#accessing-our-worker-machines-internal' aria-label='Anchor'></a><span class='plain-code'>Accessing our <code>worker</code> Machines (<code>.internal</code>)</span></h3> <p>Now, when this code is running in production (aka <code>NODE_ENV</code> is NOT <code>development</code>) you&rsquo;ll see that we&rsquo;re using a unique hostname to access our <code>worker</code> Machine.</p> <p>Apps belonging to the same organization on Fly.io are provided a number of <a href='https://fly.io/docs/networking/private-networking/#fly-io-internal-addresses' title=''>internal addresses</a>. These <code>.internal</code> addresses let you point to different Apps and Machines in your private network. 
For example:</p> <ul> <li><code>&lt;region&gt;.&lt;app name&gt;.internal</code> – To reach app instances in a particular region, like <code>gru.my-cool-app.internal</code> </li><li><code>&lt;app instance ID&gt;.&lt;app name&gt;.internal</code> - To reach a <em>specific</em> app instance. </li><li><code>&lt;process group&gt;.process.&lt;app name&gt;.internal</code> - To target app instances belonging to a specific process group. <strong class='font-semibold text-navy-950'>This is what we&rsquo;re using in our app.</strong> </li></ul> <p>Since our <code>worker</code> process group is running the same process as our <code>web</code> process (in our case, <code>npm run start</code>), we&rsquo;ll also need to make sure we use the same internal port (<code>3000</code>).</p> <h3 id='defining-our-process-groups-and-machines' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#defining-our-process-groups-and-machines' aria-label='Anchor'></a><span class='plain-code'>Defining our process groups and Machines</span></h3> <p>The last thing to do will be to define our two process groups and their respective Machine specs. We&rsquo;ll do this by editing our <code>fly.toml</code> configuration.</p> <p>If you don&rsquo;t have this file, go ahead and create a blank one and use the content below, but replace <code>app = open-pickle-jar</code> with your app&rsquo;s name, as well as your preferred <code>primary_region</code>. If you don&rsquo;t know what region you&rsquo;d like to deploy to, <a href='https://fly.io/docs/reference/regions/' title=''>here&rsquo;s the list of them</a>.</p> <p><strong class='font-semibold text-navy-950'>Before you deploy:</strong> Note that deploying this example app will spin up <strong class='font-semibold text-navy-950'>billable</strong> machines. 
Please feel free to alter the Machine (<code>[[vm]]</code>) specs listed here to ones that suit your budget or app&rsquo;s needs.</p> <div class="highlight-wrapper group relative "> <div class='highlight relative group'> <pre class='highlight '><code id="code-ffgx1pjb">app = "open-pickle-jar"
primary_region = "sea"

[build]

[processes]
web = "npm run start"
worker = "npm run start"

[http_service]
internal_port = 3000
force_https = true
auto_stop_machines = true
auto_start_machines = true
min_machines_running = 1
processes = ["web"]

[[vm]]
cpu_kind = "shared"
cpus = 1
memory_mb = 1024
processes = ["web"]

[[vm]]
size = "performance-4x"
processes = ["worker"]
</code></pre> </div> </div> <p>And that&rsquo;s it! With our <code>fly.toml</code> finished, we&rsquo;re ready to deploy our app!</p> <p><img src="https://slabstatic.com/prod/uploads/p1b436gf/posts/images/tH4GaGLVaDkh3RhIwCpiDRX3.png" /></p> <h2 id='discussion' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#discussion' aria-label='Anchor'></a><span class='plain-code'>Discussion</span></h2> <p>Today we built a machine pattern on top of Fly.io. This pattern allows us to have a lighter request server that can delegate certain tasks to a stronger server, meaning that we can have one Machine do all the heavy lifting that could block everything else while the other handles all the simple tasks for users. 
With this in mind, this is a fairly naïve implementation, and we can make this much better:</p> <h3 id='using-a-queue-for-better-resiliency' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#using-a-queue-for-better-resiliency' aria-label='Anchor'></a><span class='plain-code'>Using a queue for better resiliency</span></h3> <p>In its current state, our code isn&rsquo;t very resilient to failed requests. For this reason, you may want to consider keeping track of jobs in a queue with Redis (similar to Sidekiq in Ruby-land). When you have work you want to do, put it in the queue. Your queue worker would have to write the result somewhere (e.g., in Redis) that the application could fetch when it&rsquo;s ready.</p> <h3 id='starting-stopping-worker-machines' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#starting-stopping-worker-machines' aria-label='Anchor'></a><span class='plain-code'>Starting/stopping worker Machines</span></h3> <p>The benefit of this pattern is that you can limit how many &ldquo;beefy&rdquo; Machines you need to have available at any given time. Our demo app doesn&rsquo;t dictate how many <code>worker</code> Machines to have at any given time, but by adding timeouts you could elect to start and stop them as needed.</p> <p>Now, you may think that constantly starting and stopping Machines might incur higher response times, but note that we are NOT talking about creating/destroying Machines. Starting and stopping Machines only takes as long as it takes to start your web server (i.e. <code>npm run start</code>). 
The best part is that <strong class='font-semibold text-navy-950'>Fly.io does not charge for the CPU and RAM usage of stopped Machines.</strong> <a href='https://community.fly.io/t/we-are-going-to-start-collecting-charges-for-stopped-machines-rootfs-starting-april-25th/17825' title=''>We will charge for storage of their root filesystems on disk, starting April 25th, 2024</a>. Stopped Machines will still be much cheaper than running ones.</p> <h3 id='what-about-serverless-functions' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#what-about-serverless-functions' aria-label='Anchor'></a><span class='plain-code'>What about serverless functions?</span></h3> <p>This &ldquo;delegate to a beefy machine&rdquo; pattern is similar to serverless functions with platforms like AWS Lambda. The main difference is that serverless functions usually require you to segment your application into a bunch of small pieces, whereas the method discussed today just uses the app framework that you deploy to production. Each pattern has its own benefits and downsides.</p> <h2 id='conclusion' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#conclusion' aria-label='Anchor'></a><span class='plain-code'>Conclusion</span></h2> <p>The pattern outlined here is one more tool in your arsenal for scaling applications. By utilizing Fly.io&rsquo;s private network and <code>.internal</code> domains, it&rsquo;s quick and easy to pass work between different processes that run our app. 
If you&rsquo;d like to learn about more methods for scaling tasks in your applications, check out <a href='https://fly.io/blog/rethinking-serverless-with-flame/' title=''>Rethinking Serverless with FLAME</a> by Chris McCord and <a href='https://fly.io/blog/print-on-demand/' title=''>Print on Demand</a> by Sam Ruby.</p> </content> </entry> <entry> <title>Macaroons Escalated Quickly</title> <link rel="alternate" href="https://fly.io/blog/macaroons-escalated-quickly/"/> <id>https://fly.io/blog/macaroons-escalated-quickly/</id> <published>2024-01-31T00:00:00+00:00</published> <updated>2024-04-12T18:23:39+00:00</updated> <media:thumbnail url="https://fly.io/blog/macaroons-escalated-quickly/assets/evil-cookies-thumb.webp"/> <content type="html"><div class="lead"><p>We’re Fly.io and we transmute containers into VMs, running them on our hardware around the world. We built a new security token system, and can I tell you the good news about our lord and savior the Macaroon?</p> </div><h2 id='1' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#1' aria-label='Anchor'></a><span class='plain-code'>1</span></h2> <p>Let&rsquo;s implement an API token together. 
It&rsquo;s a design called &ldquo;Macaroons&rdquo;, but don&rsquo;t get hung up on that yet.</p> <p>First some <button toggle="#includes">throat-clearing</button>. Then:</p> <div id="includes" toggle-content="" aria-label="show very boring code"><div class="highlight-wrapper group relative python"> <div class="highlight relative group"> <pre class="highlight "><code id="code-1c9mit0n"><span class="kn">import</span> <span class="nn">sys</span>
<span class="kn">import</span> <span class="nn">os</span>
<span class="kn">import</span> <span class="nn">json</span>
<span class="kn">import</span> <span class="nn">hmac</span> <span class="k">as</span> <span class="n">hm</span>
<span class="kn">from</span> <span class="nn">base64</span> <span class="kn">import</span> <span class="n">b64encode</span><span class="p">,</span> <span class="n">b64decode</span>
<span class="kn">from</span> <span class="nn">hashlib</span> <span class="kn">import</span> <span class="n">sha256</span>

<span class="k">def</span> <span class="nf">hmac</span><span class="p">(</span><span class="n">k</span><span class="p">,</span> <span class="n">v</span><span class="p">):</span>
    <span class="k">return</span> <span class="n">hm</span><span class="p">.</span><span class="n">new</span><span class="p">(</span><span class="n">k</span><span class="p">,</span> <span class="n">v</span><span class="p">,</span> <span class="n">sha256</span><span class="p">).</span><span class="n">digest</span><span class="p">()</span>

<span class="k">def</span> <span class="nf">enc</span><span class="p">(</span><span class="n">x</span><span class="p">):</span>
    <span class="k">return</span> <span class="n">b64encode</span><span class="p">(</span><span class="n">x</span><span class="p">)</span>

<span class="k">def</span> <span class="nf">dec</span><span class="p">(</span><span class="n">x</span><span class="p">):</span>
    <span class="k">return</span> <span class="n">b64decode</span><span class="p">(</span><span class="n">x</span><span class="p">)</span>
</code></pre> </div> </div></div><div class="highlight-wrapper group relative python"> <div class='highlight relative group'> <pre class='highlight '><code id="code-7t25lxr4"><span class="k">def</span> <span class="nf">blank_token</span><span class="p">(</span><span class="n">uid</span><span class="p">,</span> <span class="n">key</span><span class="p">):</span>
    <span class="n">nonce</span> <span class="o">=</span> <span class="n">enc</span><span class="p">(</span><span class="s">":"</span><span class="p">.</span><span class="n">join</span><span class="p">([</span><span class="nb">str</span><span class="p">(</span><span class="n">uid</span><span class="p">),</span> <span class="n">os</span><span class="p">.</span><span class="n">urandom</span><span class="p">(</span><span class="mi">16</span><span class="p">)]))</span>
    <span class="k">return</span> <span class="n">json</span><span class="p">.</span><span class="n">dumps</span><span class="p">([</span><span class="n">nonce</span><span class="p">,</span> <span class="n">enc</span><span class="p">(</span><span class="n">hmac</span><span class="p">(</span><span class="n">key</span><span class="p">,</span> <span class="n">nonce</span><span class="p">))])</span>
</code></pre> </div> </div><div class="right-sidenote"><p>Bearer tokens: like cookies, blobs you attach to a request (usually in an HTTP header).</p> </div> <p>We&rsquo;re going to build a minimally-stateful bearer token, a blob signed with HMAC. Nothing fancy so far. <a href='https://api.rubyonrails.org/classes/ActiveSupport/MessageVerifier.html' title=''>Rails has done this</a> for a decade and a half.</p> <p>There&rsquo;s a <a href='http://cryto.net/~joepie91/blog/2016/06/13/stop-using-jwt-for-sessions/' title=''>fashion in API security for stateless tokens</a>, which encode all the data you&rsquo;d need to check any request accompanied by that token – without a database lookup. Stateless tokens have some nice properties, and some less-nice. Our tokens won&rsquo;t be stateless: they carry a user ID, with which we&rsquo;ll look up the HMAC key to verify it. 
But they&rsquo;ll stake out a sort of middle ground.</p> <div class="highlight-wrapper group relative python"> <div class='highlight relative group'> <pre 
class='highlight '><code id="code-r52d35ga"><span class="k">def</span> <span class="nf">attenuate</span><span class="p">(</span><span class="n">macStr</span><span class="p">,</span> <span class="n">cav</span><span class="p">):</span> <span class="n">mac</span> <span class="o">=</span> <span class="n">json</span><span class="p">.</span><span class="n">loads</span><span class="p">(</span><span class="n">macStr</span><span class="p">)</span> <span class="n">cavStr</span> <span class="o">=</span> <span class="n">json</span><span class="p">.</span><span class="n">dumps</span><span class="p">(</span><span class="n">cav</span><span class="p">)</span> <span class="n">oldTail</span> <span class="o">=</span> <span class="n">dec</span><span class="p">(</span><span class="n">mac</span><span class="p">[</span><span class="o">-</span><span class="mi">1</span><span class="p">])</span> <span class="n">newTail</span> <span class="o">=</span> <span class="n">enc</span><span class="p">(</span><span class="n">hmac</span><span class="p">(</span><span class="n">oldTail</span><span class="p">,</span> <span class="n">cavStr</span><span class="p">))</span> <span class="k">return</span> <span class="n">json</span><span class="p">.</span><span class="n">dumps</span><span class="p">(</span><span class="n">mac</span><span class="p">[</span><span class="mi">0</span><span class="p">:</span><span class="o">-</span><span class="mi">1</span><span class="p">]</span> <span class="o">+</span> <span class="p">[</span><span class="n">cavStr</span><span class="p">,</span> <span class="n">newTail</span><span class="p">])</span> <span class="n">m0</span> <span class="o">=</span> <span class="n">blank_token</span><span class="p">(</span><span class="mi">10</span><span class="p">,</span> <span class="n">keys</span><span class="p">[</span><span class="mi">10</span><span class="p">])</span> <span class="n">m1</span> <span class="o">=</span> <span class="n">attenuate</span><span class="p">(</span><span 
class="n">m0</span><span class="p">,</span> <span class="p">{</span><span class="s">'path'</span><span class="p">:</span> <span class="s">'/images'</span><span class="p">})</span> <span class="n">m2</span> <span class="o">=</span> <span class="n">attenuate</span><span class="p">(</span><span class="n">m1</span><span class="p">,</span> <span class="p">{</span><span class="s">'op'</span><span class="p">:</span> <span class="s">'read'</span><span class="p">})</span> </code></pre> </div> </div> <p>Let&rsquo;s add some stuff.</p> <p>The meat of our tokens will be a series of claims we call &ldquo;caveats&rdquo;. We call them that because each claim further restricts what the token authorizes. After <code>{&#39;path&#39;: &#39;/images&#39;}</code>, this token only allows operations that happen underneath the <code>/images</code> directory. Then, after <code>{&#39;op&#39;: &#39;read&#39;}</code>, it allows only reads, not writes.</p> <p>(I guess we&rsquo;re building a file sharing system. Whatever.)</p> <p>Some important things about this design. First: by implication from the fact that caveats further restrict tokens, a token with no caveats restricts nothing. It&rsquo;s a god-mode token. Don&rsquo;t honor it.</p> <div class="right-sidenote"><p>In other words: the ordering of caveats doesn’t matter.</p> </div> <p>Second: the rule for checking caveats is very simple: every single caveat must pass, evaluating <code>True</code> against the request that carries it, in isolation and without reference to any other caveat. If any caveat evaluates <code>False</code>, the request fails. 
In that way, we ensure that adding caveats to a token can only ever weaken it.</p> <p>With that in mind, take a closer look at this code:</p> <div class="highlight-wrapper group relative python"> <button type="button" class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-wrap-target="#code-n7mgbkwf" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"><g buffered-rendering="static"><path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /><path d="M11.081 6.466L9.533 8.037l1.548 1.571" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"> Wrap text </span> </button> <button type="button" class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-copy-target="sibling" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"><g buffered-rendering="static"><path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /><path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"> Copy to 
clipboard </span> </button> <div class='highlight relative group'> <pre class='highlight '><code id="code-n7mgbkwf"><span class="n">oldTail</span> <span class="o">=</span> <span class="n">dec</span><span class="p">(</span><span class="n">mac</span><span class="p">[</span><span class="o">-</span><span class="mi">1</span><span class="p">])</span> <span class="n">newTail</span> <span class="o">=</span> <span class="n">enc</span><span class="p">(</span><span class="n">hmac</span><span class="p">(</span><span class="n">oldTail</span><span class="p">,</span> <span class="n">cavStr</span><span class="p">))</span> </code></pre> </div> </div> <p>Every caveat is HMAC-signed independently, which is weird. Weirder still, the key for that HMAC is the output of the last HMAC. The caveats chain together, and the HMAC of the last caveat becomes the &ldquo;tail&rdquo; of the token.</p> <p>Creating a new blank token for a particular user requires a key that the server (and probably only the server) knows. But adding a caveat doesn&rsquo;t! Anybody can add a caveat. 
In our design, you, the user, can edit your own API token.</p> <div class="highlight-wrapper group relative python"> <button type="button" class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-wrap-target="#code-nx5eitys" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"><g buffered-rendering="static"><path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /><path d="M11.081 6.466L9.533 8.037l1.548 1.571" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"> Wrap text </span> </button> <button type="button" class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-copy-target="sibling" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"><g buffered-rendering="static"><path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /><path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"> Copy to clipboard </span> </button> <div class='highlight relative group'> <pre 
class='highlight '><code id="code-nx5eitys"><span class="k">def</span> <span class="nf">verify</span><span class="p">(</span><span class="n">macStr</span><span class="p">,</span> <span class="n">keys</span><span class="p">):</span> <span class="n">mac</span> <span class="o">=</span> <span class="n">json</span><span class="p">.</span><span class="n">loads</span><span class="p">(</span><span class="n">macStr</span><span class="p">)</span> <span class="n">nonce</span> <span class="o">=</span> <span class="n">dec</span><span class="p">(</span><span class="n">mac</span><span class="p">[</span><span class="mi">0</span><span class="p">]).</span><span class="n">split</span><span class="p">(</span><span class="s">":"</span><span class="p">)</span> <span class="n">key</span> <span class="o">=</span> <span class="n">keys</span><span class="p">[</span><span class="nb">int</span><span class="p">(</span><span class="n">nonce</span><span class="p">[</span><span class="mi">0</span><span class="p">])]</span> <span class="n">tail</span> <span class="o">=</span> <span class="s">""</span> <span class="k">for</span> <span class="n">cav</span> <span class="ow">in</span> <span class="n">mac</span><span class="p">[:</span><span class="o">-</span><span class="mi">1</span><span class="p">]:</span> <span class="n">tail</span> <span class="o">=</span> <span class="n">hmac</span><span class="p">(</span><span class="n">key</span><span class="p">,</span> <span class="n">cav</span><span class="p">)</span> <span class="n">key</span> <span class="o">=</span> <span class="n">tail</span> <span class="k">return</span> <span class="n">hm</span><span class="p">.</span><span class="n">compare_digest</span><span class="p">(</span><span class="n">tail</span><span class="p">,</span> <span class="n">dec</span><span class="p">(</span><span class="n">mac</span><span class="p">[</span><span class="o">-</span><span class="mi">1</span><span class="p">]))</span> <span class="n">verify</span><span 
class="p">(</span><span class="n">m2</span><span class="p">,</span> <span class="n">keys</span><span class="p">)</span> <span class="c1"># =&gt; True </span></code></pre> </div> </div> <p>For completeness, and to make a point, there&rsquo;s the verification code. Look up the original secret key from the user ID, and then it&rsquo;s chained HMAC all the way down. The point I&rsquo;m making is that Macaroons are very simple.</p> <h2 id='2' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#2' aria-label='Anchor'></a><span class='plain-code'>2</span></h2> <p>Back in 2014, Google published <a href='https://storage.googleapis.com/pub-tools-public-publication-data/pdf/41892.pdf' title=''>a paper at NDSS</a> introducing &ldquo;Macaroons&rdquo;, a new kind of cookie. Since then, they&rsquo;ve become a sort of hipster shibboleth. But they&rsquo;re more talked about than implemented, which is a nice way to say that practically nobody uses them.</p> <p>Until now! I dragged Fly.io into implementing them. Suckers!</p> <p>We had a problem: our API tokens were much too powerful. We needed to scope them down and let them express roles, and I scoped up that project to replace OAuth2 tokens altogether. We now have what I think is one of the more expansive Macaroon implementations on the Internet.</p> <p>I dragged us into using Macaroons because I wanted us to use a hipster token format. 
Google designed Macaroons for a bigger reason: they hoped to replace browser cookies with something much more powerful.</p> <p>The problem with simple bearer tokens, like browser cookies or JWTs, is that they&rsquo;re prone to being stolen and replayed by attackers.</p> <div class="right-sidenote"><p>game-over: pentest jargon for “very bad”</p> </div> <p>Worse, a stolen token is usually a game-over condition. In most schemes, a bearer token is an all-access pass for the associated user. For some applications this isn&rsquo;t that big a deal, but then, <a href='https://neilmadden.blog/2020/09/09/macaroon-access-tokens-for-oauth-part-2-transactional-auth/' title=''>think about banking</a>. A banking app token that authorizes arbitrary transactions is a recipe for having a small heart attack on every HTTP request.</p> <div class="right-sidenote"><p>(Perfectly minimized API tokens: a software security holy grail)</p> </div> <p>Macaroons are user-editable tokens that enable JIT-generated least-privilege tokens. With minimal ceremony and no additional API requests, a banking app Macaroon lets you authorize a request with a caveat like, I don&rsquo;t know, <code>{&#39;maxAmount&#39;: &#39;$5&#39;}</code>. I mean, something way better than that, probably lots of caveats, not just one, but you get the idea: a token so minimized you feel safe sending it with your request. Ideally, a token that only authorizes that single, intended request.</p> <h2 id='3' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#3' aria-label='Anchor'></a><span class='plain-code'>3</span></h2> <p>That&rsquo;s not why we like Macaroons. We already assume our tokens aren&rsquo;t being stolen.</p> <p>In most systems, the developers come up with a permissions system, and you’re stuck with it. 
We run a public cloud platform, and people want a lot of different things from our permissions. The dream is, we (the low-level platform developers on the team) design a single permission system, one time, and go about our jobs never thinking about this problem again.</p> <p>Instead of thinking of all of our &ldquo;roles&rdquo; in advance, we just model our platform with caveats:</p> <ol> <li>Users belong to <code>Organizations</code>. </li><li><code>Organizations</code> own <code>Apps</code>. </li><li><code>Apps</code> contain <code>Machines</code> and <code>Volumes</code>. </li><li>To any of these things, you can <code>Read</code>, <code>Write</code>, <code>Create</code>, <code>Delete</code>, and/or <code>Control</code> <aside class="right-sidenote">control being change of state, like “start” and “stop”</aside>. </li><li>Some administrivia, like expiration (<code>ValidityWindow</code>), locking tokens to specific Fly Machines (<code>FromMachineSource</code>), and escape hatches like <code>Mutation</code> (for our GraphQL API). </li></ol> <div class="right-sidenote"><p>(this is a vibes-based notation, don’t think too hard about it)</p> </div> <p>Simplistic. 
But it expresses admin tokens:</p> <div class="highlight-wrapper group relative "> <button type="button" class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-wrap-target="#code-x5iepn6s" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"><g buffered-rendering="static"><path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /><path d="M11.081 6.466L9.533 8.037l1.548 1.571" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"> Wrap text </span> </button> <button type="button" class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-copy-target="sibling" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"><g buffered-rendering="static"><path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /><path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"> Copy to clipboard </span> </button> <div class='highlight relative group'> <pre class='highlight '><code 
id="code-x5iepn6s">Organization 4721, mask=* </code></pre> </div> </div> <p>And it expresses normal user tokens:</p> <div class="highlight-wrapper group relative "> <button type="button" class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-wrap-target="#code-srsndejy" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"><g buffered-rendering="static"><path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /><path d="M11.081 6.466L9.533 8.037l1.548 1.571" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"> Wrap text </span> </button> <button type="button" class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-copy-target="sibling" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"><g buffered-rendering="static"><path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /><path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"> Copy to clipboard </span> </button> 
<div class='highlight relative group'> <pre class='highlight '><code id="code-srsndejy">Organization 4721, mask=read,write,control (App 123, mask=control), (App 345, mask=read, write, control) </code></pre> </div> </div> <p>And also an auditor-only token for that user:</p> <div class="highlight-wrapper group relative "> <button type="button" class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-wrap-target="#code-jh9ga1bt" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"><g buffered-rendering="static"><path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /><path d="M11.081 6.466L9.533 8.037l1.548 1.571" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"> Wrap text </span> </button> <button type="button" class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-copy-target="sibling" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"><g buffered-rendering="static"><path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /><path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 
1.815v1.617" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"> Copy to clipboard </span> </button> <div class='highlight relative group'> <pre class='highlight '><code id="code-jh9ga1bt">Organization 4721, mask=read,write,control (App 123, mask=control), (App 345, mask=read, write, control) Organization 4721, mask=read </code></pre> </div> </div><div class="right-sidenote"><p>(our deploy tokens are more complicated than this)</p> </div> <p>Or a deployment-only token, for a CI/CD system:</p> <div class="highlight-wrapper group relative "> <button type="button" class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-wrap-target="#code-pe18x39a" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"><g buffered-rendering="static"><path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /><path d="M11.081 6.466L9.533 8.037l1.548 1.571" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"> Wrap text </span> </button> <button type="button" class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-copy-target="sibling" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"><g buffered-rendering="static"><path 
d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /><path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"> Copy to clipboard </span> </button> <div class='highlight relative group'> <pre class='highlight '><code id="code-pe18x39a">Organization 4721, mask=write,control (App 123, mask=*) </code></pre> </div> </div> <p>Those are just the roles we came up with. Users can invent others. The important thing is that they don&rsquo;t have to bother me about them.</p> <h2 id='4' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#4' aria-label='Anchor'></a><span class='plain-code'>4</span></h2> <p>Astute readers will have noticed by now that we haven&rsquo;t shown any code that actually evaluates a caveat. That&rsquo;s because it&rsquo;s boring, and I&rsquo;m too lazy to write it out. Got an <code>Organization</code> token for <code>image-hosting</code> that allows <code>Reads</code>? Ok; check and make sure the incoming request is for an asset of <code>image-hosting</code>, and that it’s a <code>Read</code>. Whatever code you came up with, it’d be fine.</p> <p>These straightforward restrictions are called &ldquo;first party caveats&rdquo;. The first party is us, the platform. 
We&rsquo;ve got all the information we need to check them.</p> <p>Let&rsquo;s kit out our token format some more.</p> <div class="highlight-wrapper group relative python"> <button type="button" class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-wrap-target="#code-rvmob8wx" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"><g buffered-rendering="static"><path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /><path d="M11.081 6.466L9.533 8.037l1.548 1.571" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"> Wrap text </span> </button> <button type="button" class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-copy-target="sibling" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"><g buffered-rendering="static"><path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /><path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"> Copy to clipboard </span> 
</button> <div class='highlight relative group'> <pre class='highlight '><code id="code-rvmob8wx"><span class="k">def</span> <span class="nf">third_party_caveat</span><span class="p">(</span><span class="n">ka</span><span class="p">,</span> <span class="n">tail</span><span class="p">,</span> <span class="n">msg</span><span class="p">,</span> <span class="n">url</span><span class="p">):</span> <span class="n">crk</span> <span class="o">=</span> <span class="n">os</span><span class="p">.</span><span class="n">urandom</span><span class="p">(</span><span class="mi">16</span><span class="p">)</span> <span class="n">ticket</span> <span class="o">=</span> <span class="n">enc</span><span class="p">(</span><span class="n">encrypt</span><span class="p">(</span><span class="n">ka</span><span class="p">,</span> <span class="n">json</span><span class="p">.</span><span class="n">dumps</span><span class="p">({</span> <span class="s">'crk'</span><span class="p">:</span> <span class="n">enc</span><span class="p">(</span><span class="n">crk</span><span class="p">),</span> <span class="s">'msg'</span><span class="p">:</span> <span class="n">msg</span> <span class="p">})))</span> <span class="n">challenge</span> <span class="o">=</span> <span class="n">enc</span><span class="p">(</span><span class="n">encrypt</span><span class="p">(</span><span class="n">tail</span><span class="p">,</span> <span class="n">crk</span><span class="p">))</span> <span class="k">return</span> <span class="p">{</span> <span class="s">'url'</span><span class="p">:</span> <span class="n">url</span><span class="p">,</span> <span class="s">'ticket'</span><span class="p">:</span> <span class="n">ticket</span><span class="p">,</span> <span class="s">'challenge'</span> <span class="p">:</span> <span class="n">challenge</span> <span class="p">}</span> <span class="n">key</span> <span class="o">=</span> <span class="nb">bytes</span><span class="p">(</span><span class="s">"YELLOW SUBMARINE"</span><span 
class="p">)</span> <span class="n">url</span> <span class="o">=</span> <span class="s">"https://canary.service"</span> <span class="n">c3</span> <span class="o">=</span> <span class="n">third_party_caveat</span><span class="p">(</span><span class="n">key</span><span class="p">,</span> <span class="n">tail</span><span class="p">,</span> <span class="n">json</span><span class="p">.</span><span class="n">dumps</span><span class="p">({</span><span class="s">'user'</span><span class="p">:</span> <span class="s">'bobson.dugnutt'</span><span class="p">}),</span> <span class="n">url</span><span class="p">)</span> <span class="n">m3</span> <span class="o">=</span> <span class="n">attenuate</span><span class="p">(</span><span class="n">m2</span><span class="p">,</span> <span class="n">c3</span><span class="p">)</span> </code></pre> </div> </div> <p>Up till now, we&rsquo;ve gotten by with nothing but HMAC, which is one of the great charms of the design. Now we need to encrypt. There&rsquo;s no authenticated encryption in the Python standard library, but that won&rsquo;t stop us. <button toggle="#hmac-ctr">Ready to make some candy? 
Hand me that brake fluid!</button></p> <div id="hmac-ctr" toggle-content="" aria-label="show very silly code"><div class="highlight-wrapper group relative python"> <button type="button" class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-wrap-target="#code-brvb3s1v"> <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"><g buffered-rendering="static"><path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959"></path><path d="M11.081 6.466L9.533 8.037l1.548 1.571"></path></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"> Wrap text </span> </button> <button type="button" class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-copy-target="sibling"> <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"><g buffered-rendering="static"><path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z"></path><path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617"></path></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"> Copy to clipboard 
</span> </button> <div class="highlight relative group"> <pre class="highlight "><code id="code-brvb3s1v"><span class="c1"># do i really need to say that i'm not serious about this? </span> <span class="k">def</span> <span class="nf">hmactr</span><span class="p">(</span><span class="n">k</span><span class="p">,</span> <span class="n">n</span><span class="p">):</span> <span class="n">ks</span> <span class="o">=</span> <span class="n">hm</span><span class="p">.</span><span class="n">new</span><span class="p">(</span><span class="n">k</span><span class="o">+</span><span class="n">n</span><span class="p">)</span> <span class="k">for</span> <span class="n">counter</span> <span class="ow">in</span> <span class="nb">xrange</span><span class="p">(</span><span class="n">sys</span><span class="p">.</span><span class="n">maxint</span><span class="p">):</span> <span class="n">ks</span><span class="p">.</span><span class="n">update</span><span class="p">(</span><span class="nb">str</span><span class="p">(</span><span class="n">counter</span><span class="p">))</span> <span class="n">kbs</span> <span class="o">=</span> <span class="n">ks</span><span class="p">.</span><span class="n">digest</span><span class="p">()</span> <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">xrange</span><span class="p">(</span><span class="mi">16</span><span class="p">):</span> <span class="k">yield</span> <span class="n">kbs</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="k">def</span> <span class="nf">encrypt</span><span class="p">(</span><span class="n">k</span><span class="p">,</span> <span class="n">buf</span><span class="p">):</span> <span class="n">ak</span> <span class="o">=</span> <span class="n">hm</span><span class="p">.</span><span class="n">new</span><span class="p">(</span><span class="n">k</span><span class="p">,</span> <span class="s">'auth'</span><span class="p">).</span><span 
class="n">digest</span><span class="p">()</span> <span class="n">nonce</span> <span class="o">=</span> <span class="n">os</span><span class="p">.</span><span class="n">urandom</span><span class="p">(</span><span class="mi">16</span><span class="p">)</span> <span class="n">cipher</span> <span class="o">=</span> <span class="n">hmactr</span><span class="p">(</span><span class="n">hm</span><span class="p">.</span><span class="n">new</span><span class="p">(</span><span class="n">k</span><span class="p">,</span> <span class="s">'enc'</span><span class="p">).</span><span class="n">digest</span><span class="p">(),</span> <span class="n">nonce</span><span class="p">)</span> <span class="n">ctxt</span> <span class="o">=</span> <span class="nb">bytearray</span><span class="p">(</span><span class="n">buf</span><span class="p">)</span> <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">xrange</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="n">buf</span><span class="p">)):</span> <span class="n">ctxt</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">^=</span> <span class="nb">ord</span><span class="p">(</span><span class="n">cipher</span><span class="p">.</span><span class="nb">next</span><span class="p">())</span> <span class="n">res</span> <span class="o">=</span> <span class="n">nonce</span> <span class="o">+</span> <span class="nb">str</span><span class="p">(</span><span class="n">ctxt</span><span class="p">)</span> <span class="k">return</span> <span class="n">res</span> <span class="o">+</span> <span class="n">hm</span><span class="p">.</span><span class="n">new</span><span class="p">(</span><span class="n">ak</span><span class="p">,</span> <span class="n">res</span><span class="p">).</span><span class="n">digest</span><span class="p">()</span> <span class="k">def</span> <span class="nf">decrypt</span><span class="p">(</span><span 
class="n">k</span><span class="p">,</span> <span class="n">buf</span><span class="p">):</span> <span class="n">ak</span> <span class="o">=</span> <span class="n">hm</span><span class="p">.</span><span class="n">new</span><span class="p">(</span><span class="n">k</span><span class="p">,</span> <span class="s">'auth'</span><span class="p">).</span><span class="n">digest</span><span class="p">()</span> <span class="k">if</span> <span class="ow">not</span> <span class="n">hm</span><span class="p">.</span><span class="n">compare_digest</span><span class="p">(</span><span class="n">buf</span><span class="p">[</span><span class="o">-</span><span class="mi">16</span><span class="p">:],</span> <span class="n">hm</span><span class="p">.</span><span class="n">new</span><span class="p">(</span><span class="n">ak</span><span class="p">,</span> <span class="n">buf</span><span class="p">[:</span><span class="o">-</span><span class="mi">16</span><span class="p">]).</span><span class="n">digest</span><span class="p">()):</span> <span class="k">return</span> <span class="bp">False</span> <span class="n">nonce</span> <span class="o">=</span> <span class="n">buf</span><span class="p">[:</span><span class="mi">16</span><span class="p">]</span> <span class="n">cipher</span> <span class="o">=</span> <span class="n">hmactr</span><span class="p">(</span><span class="n">hm</span><span class="p">.</span><span class="n">new</span><span class="p">(</span><span class="n">k</span><span class="p">,</span> <span class="s">'enc'</span><span class="p">).</span><span class="n">digest</span><span class="p">(),</span> <span class="n">nonce</span><span class="p">)</span> <span class="n">ptxt</span> <span class="o">=</span> <span class="nb">bytearray</span><span class="p">(</span><span class="n">buf</span><span class="p">[</span><span class="mi">16</span><span class="p">:</span><span class="o">-</span><span class="mi">16</span><span class="p">])</span> <span class="k">for</span> <span class="n">i</span> 
<span class="ow">in</span> <span class="nb">xrange</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="n">buf</span><span class="p">[</span><span class="mi">16</span><span class="p">:</span><span class="o">-</span><span class="mi">16</span><span class="p">])):</span> <span class="n">ptxt</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">^=</span> <span class="nb">ord</span><span class="p">(</span><span class="n">cipher</span><span class="p">.</span><span class="nb">next</span><span class="p">())</span> <span class="k">return</span> <span class="nb">str</span><span class="p">(</span><span class="n">ptxt</span><span class="p">)</span> </code></pre> </div> </div></div> <p>With &ldquo;third-party&rdquo; caveats comes a cast of characters. We&rsquo;re still the first party. You&rsquo;ll play the second party. The third party is any other system in the world that you trust: an SSO system, an audit log, a revocation checker, whatever.</p> <p>Here&rsquo;s the trick of the third-party caveat: our platform doesn&rsquo;t know what your caveat means, and it doesn&rsquo;t have to. Instead, when you see a third-party caveat in your token, you tear a ticket off it and exchange it for a &ldquo;discharge Macaroon&rdquo; with that third party. You submit both Macaroons together to us.</p> <p>Let&rsquo;s attenuate our token with a third-party caveat hooking it up to a &ldquo;canary&rdquo; service that generates a notice approximately any time the token is used.</p> <p><img src="/blog/macaroons-escalated-quickly/assets/third-party.png?1/2&amp;wrap-left" /></p> <p>To build that canary caveat, you first make a <code>ticket</code> that users of the token will hand to your canary, and then a <code>challenge</code> that Fly.io will use to verify discharges your checker spits out. The ticket and the challenge are both encrypted. The ticket is encrypted under <code>KA</code>, so your service can read it. 
The challenge is encrypted under the previous Macaroon tail, so only Fly.io can read it. Both hide yet another key, the random HMAC key <code>CRK</code> (&ldquo;caveat root key&rdquo;).</p> <p>In addition to <code>CRK</code>, the ticket contains a message, which says whatever you want it to; Fly.io doesn&rsquo;t care. Typically, the message describes some kind of additional checking you want your service to perform before spitting out a discharge token.</p> <div class="highlight-wrapper group relative python"> <button type="button" class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-wrap-target="#code-135v2c4d" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"><g buffered-rendering="static"><path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /><path d="M11.081 6.466L9.533 8.037l1.548 1.571" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"> Wrap text </span> </button> <button type="button" class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-copy-target="sibling" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"><g buffered-rendering="static"><path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 
.995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /><path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"> Copy to clipboard </span> </button> <div class='highlight relative group'> <pre class='highlight '><code id="code-135v2c4d"><span class="k">def</span> <span class="nf">discharge</span><span class="p">(</span><span class="n">ka</span><span class="p">,</span> <span class="n">ticket</span><span class="p">):</span> <span class="n">ptxt</span> <span class="o">=</span> <span class="n">decrypt</span><span class="p">(</span><span class="n">ka</span><span class="p">,</span> <span class="n">dec</span><span class="p">(</span><span class="n">ticket</span><span class="p">))</span> <span class="k">if</span> <span class="n">ptxt</span> <span class="o">==</span> <span class="bp">False</span><span class="p">:</span> <span class="k">return</span> <span class="bp">False</span> <span class="n">tbody</span> <span class="o">=</span> <span class="n">json</span><span class="p">.</span><span class="n">loads</span><span class="p">(</span><span class="n">ptxt</span><span class="p">)</span> <span class="c1"># not shown: do something with tbody['msg'] </span> <span class="k">return</span> <span class="n">json</span><span class="p">.</span><span class="n">dumps</span><span class="p">([</span><span class="n">ticket</span><span class="p">,</span> <span class="n">enc</span><span class="p">(</span><span class="n">hmac</span><span class="p">(</span><span class="n">dec</span><span class="p">(</span><span class="n">tbody</span><span class="p">[</span><span class="s">'crk'</span><span class="p">]),</span> <span class="n">ticket</span><span class="p">))])</span> </code></pre> </div> </div> <p>To authorize a request with a token that includes a third-party caveat for the canary service, you 
need to get your hands on a corresponding discharge Macaroon. Normally, you do that by <code>POST</code>ing the ticket from the caveat to the service.</p> <p>Discharging is simple. The service, which holds <code>KA</code>, uses it to decrypt the ticket. It checks the message and makes some decisions. Finally, it mints a new macaroon, using <code>CRK</code>, recovered from the ticket, as the root key. The ticket itself is the nonce.</p> <p>If it wants, the third-party service can slap on a bunch of first-party caveats of its own. When we verify the Macaroon, we&rsquo;ll copy those caveats out and enforce them. Attenuation of a third-party discharge macaroon works like a normal macaroon.</p> <div class="highlight-wrapper group relative python"> <button type="button" class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-wrap-target="#code-gjymtoma" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"><g buffered-rendering="static"><path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /><path d="M11.081 6.466L9.533 8.037l1.548 1.571" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"> Wrap text </span> </button> <button type="button" class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-copy-target="sibling" > <svg 
class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"><g buffered-rendering="static"><path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /><path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"> Copy to clipboard </span> </button> <div class='highlight relative group'> <pre class='highlight '><code id="code-gjymtoma"><span class="k">def</span> <span class="nf">verify_third_party</span><span class="p">(</span><span class="n">tag</span><span class="p">,</span> <span class="n">cav</span><span class="p">,</span> <span class="n">discharges</span><span class="o">=</span><span class="p">[]):</span> <span class="n">crk</span> <span class="o">=</span> <span class="n">decrypt</span><span class="p">(</span><span class="n">tag</span><span class="p">,</span> <span class="n">dec</span><span class="p">(</span><span class="n">cav</span><span class="p">[</span><span class="s">'challenge'</span><span class="p">]))</span> <span class="k">if</span> <span class="n">crk</span> <span class="o">==</span> <span class="bp">False</span><span class="p">:</span> <span class="k">return</span> <span class="bp">False</span> <span class="n">discharge</span> <span class="o">=</span> <span class="bp">None</span> <span class="k">for</span> <span class="n">dcs</span> <span class="ow">in</span> <span class="n">discharges</span><span class="p">:</span> <span class="k">if</span> <span class="n">json</span><span class="p">.</span><span class="n">loads</span><span class="p">(</span><span class="n">dcs</span><span class="p">)[</span><span class="mi">0</span><span class="p">]</span> <span class="o">==</span> <span class="n">cav</span><span 
class="p">[</span><span class="s">'ticket'</span><span class="p">]:</span> <span class="n">discharge</span> <span class="o">=</span> <span class="n">dcs</span> <span class="k">break</span> <span class="k">if</span> <span class="ow">not</span> <span class="n">discharge</span><span class="p">:</span> <span class="k">return</span> <span class="bp">False</span> <span class="n">mac</span> <span class="o">=</span> <span class="n">json</span><span class="p">.</span><span class="n">loads</span><span class="p">(</span><span class="n">discharge</span><span class="p">)</span> <span class="n">key</span> <span class="o">=</span> <span class="n">crk</span> <span class="c1"># boring old stuff --------------------- </span> <span class="n">tag</span> <span class="o">=</span> <span class="s">""</span> <span class="k">for</span> <span class="n">cav</span> <span class="ow">in</span> <span class="n">mac</span><span class="p">[:</span><span class="o">-</span><span class="mi">1</span><span class="p">]:</span> <span class="n">tag</span> <span class="o">=</span> <span class="n">hmac</span><span class="p">(</span><span class="n">key</span><span class="p">,</span> <span class="n">cav</span><span class="p">)</span> <span class="n">key</span> <span class="o">=</span> <span class="n">tag</span> <span class="k">return</span> <span class="n">hm</span><span class="p">.</span><span class="n">compare_digest</span><span class="p">(</span><span class="n">tag</span><span class="p">,</span> <span class="n">dec</span><span class="p">(</span><span class="n">mac</span><span class="p">[</span><span class="o">-</span><span class="mi">1</span><span class="p">]))</span> </code></pre> </div> </div> <p>To verify tokens that have third-party caveats, start with the root Macaroon, walking the caveats like usual. At each third-party caveat, match the <code>ticket</code> from the caveat with the <code>nonce</code> on the discharge Macaroon. 
The root Macaroon&rsquo;s tail at that point in the walk decrypts the <code>challenge</code> in the caveat, recovering <code>CRK</code>, which cryptographically verifies the discharge.</p> <p>(The Macaroons paper uses different terms: “caveat identifier” or <code>cId</code> for “ticket”, and “verification-key identifier” or <code>vId</code> for “challenge”. These names are self-evidently bad and our contribution to the state of the art is to replace them.)</p> <p>There are two big applications for third-party caveats in Popular Macaroon Thought. First, they facilitate microservice-izing your auth logic, because you can stitch arbitrary policies together out of third-party caveats. Second, they seem like <a href='https://github.com/go-macaroon-bakery/macaroon-bakery' title=''>fertile ground for an ecosystem of interoperable Macaroon services</a>: Okta and Google could stand up SSO dischargers, for instance, or someone could do a really good revocation service.</p> <p>Neither of these lights us up. We&rsquo;re allergic to microservices. As for public protocols, well, it&rsquo;s good to want things. So we almost didn&rsquo;t even implement third-party caveats.</p> <h2 id='5' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#5' aria-label='Anchor'></a><span class='plain-code'>5</span></h2> <p>I&rsquo;m glad we did though, because they&rsquo;ve been pretty great.</p> <p>The first problem third-party caveats solved for us was hazmat tokens. To the extent possible, we want Macaroon tokens to be safe to transmit between users. 
Our Macaroons express permissions, but not authentication, so it’s almost safe to email them.</p> <p>The way it works is, our Macaroons all have a third-party caveat pointing to a &ldquo;login service&rdquo;, identifying the proper bearer either as a particular Fly.io user or as a member of some <code>Organization</code>. To allow a request with your token, you first need to collect the discharge from the login service, which requires authentication.</p> <p>The login discharge is very sensitive, but there isn&rsquo;t much reason to pass it around. The original permissions token is where all the interesting stuff is, and it&rsquo;s not scary. So that&rsquo;s nice.</p> <p><img src="/blog/macaroons-escalated-quickly/assets/fly-sso.png?1/3&amp;wrap-left" /></p> <p>Ben then came up with <a href="https://community.fly.io/t/organization-required-sso/17560">third-party caveats that require Google or GitHub SSO logins.</a> If your token has one of those caveats, when you run <code>flyctl deploy</code>, a browser will pop up to log you into your SSO IdP (if you haven’t done so recently already).</p> <p>We’ve put a <a href='https://fly.io/blog/tokenized-tokens/#tokenizer-the-fabled-4th-way' title=''>bunch of work into getting the guts of our SSO system working</a>, but that work has mostly been invisible to customers. But Macaroon-ized SSO has a subtle benefit: you can configure <a href='http://Fly.io' title=''>Fly.io</a> to automatically add SSO requirements to specific <code>Organizations</code> (so, for instance, a dev environment might not need SSO at all, and prod might need two).</p> <p>SSO requirements in most applications are a brittle pain in the ass. Ours are flexible and straightforward, and that happened almost by accident. Macaroons, baby!</p> <p>Here&rsquo;s a fun thing you can do with a Macaroon system: stand up a Slack bot, and give it an HTTP <code>POST</code> handler that accepts third-party tickets. 
Then:</p> <p><img src="/blog/macaroons-escalated-quickly/assets/bot-ok.png?1/2&amp;center&amp;border" /></p> <p>So, the bot is cute, but any platform could do that. What’s cool is the way our platform <em>doesn’t</em> work with Slack; in fact, nothing on our platform knows anything about Slack, and Slack doesn’t know anything about us. We didn’t reach out to a Slack endpoint. Everything was purely cryptographic.</p> <p>That bot could, if I sank some time into it, enforce arbitrary rules: it could selectively add caveats for the requests it authorizes, based on lookups of the users requesting them, at specific times of day, with specific logging. Theoretically, it could add third-party caveats of its own.</p> <p>The win for us with third-party caveats is that they create a plugin system for our security tokens. That’s an unusual place to see a plugin interface! But Macaroons are easy to understand and keep in your head, so we&rsquo;re pretty confident about the security issues.</p> <h2 id='6' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#6' aria-label='Anchor'></a><span class='plain-code'>6</span></h2> <p>Obviously, we didn&rsquo;t write our Macaroon code in Python, or with HMAC-SHA256-CTR.</p> <p>We landed on a primary implementation in Golang (Ben subsequently wrote an Elixir implementation). Our hash is SHA256, our cipher is Chapoly. We encode in MsgPack.</p> <div class="callout"><p>We didn’t use the pre-existing public implementation because <a href="https://securitycryptographywhatever.com/2021/08/12/what-do-we-do-about-jwt-with-jonathan-rudenberg/" title="">we were warned not to</a>. The Macaroon idea is simple, and it exists mostly as an academic paper, not a standard. 
The community that formed around building open source “standard” Macaroons decided to use untyped opaque blobs to represent caveats. We need things to be as rigidly unambiguous as they can be.</p> </div> <p><img src="/blog/macaroons-escalated-quickly/assets/verifier-service.png?2/3&amp;center" /></p> <p>The big strength of Macaroons as a cryptographic design — that it’s based almost entirely on HMAC — makes it a challenge to deploy. If you can verify a Macaroon, you can generate one. We have thousands of servers. They can&rsquo;t all be allowed to generate tokens.</p> <p>What we did instead:</p> <ul> <li>We split token checking into “verification” of token HMAC tags and “clearing” of token caveats. </li><li>Verification occurs only on a physically isolated token-verification service; to verify a token’s tag, you HTTP <code>POST</code> the token to the verifier. </li><li>Clearing of token caveats can happen anywhere. Token caveat clearing is domain-specific and subject to change; token verification is simple cryptography and changes rarely. </li><li>A token verification is cacheable. The client library for the token verifier does that, which speeds things up by exploiting the locality of token submissions. </li><li>The verification service is backed by a <a href='https://fly.io/docs/litefs/' title=''>LiteFS-distributed SQLite database</a>, so verification is fast globally — a major step forward from our legacy OAuth2 tokens, which are only fast in Ashburn, VA. </li></ul> <p><img src="/blog/macaroons-escalated-quickly/assets/service-token.png?2/3&amp;center" /></p> <p>Now buckle up, because I&rsquo;m about to try to get you to care about service tokens.</p> <p>We operate &ldquo;worker servers&rdquo; all over the world to host apps for our customers. To do that, those workers need access to customer secrets, like the key to decrypt a customer volume. To retrieve those secrets, the workers have to talk to secrets management servers.</p> <p>We manage a lot of workers. 
We trust them. But we don&rsquo;t trust them that much, if you get my drift. You don&rsquo;t want to just leave it up to the servers to decide which secrets they can access. The blast radius of a problem with a single worker should be no greater than the apps that are supposed to run there.</p> <p>The gold standard for approving access to customer information is, naturally, explicit customer authorization. We almost have that with Macaroons! The first time an app runs on a worker, <a href='https://fly.io/blog/carving-the-scheduler-out-of-our-orchestrator/' title=''>the orchestrator code</a> has a token, and it can pass that along to the secret stores.</p> <p>The problem is, you need that token more than once; not just when the user does a deploy, but potentially any time you restart the app or migrate it to a new worker. And you can&rsquo;t just store and replay user Macaroons. They have expirations.</p> <div class="right-sidenote"><p>This is like dropping privilege with things like pledge(2), but in a distributed system.</p> </div> <p>So our token verification service exposes an API that transforms a user token into a “service token”, which is just the token with the authentication caveat and expiration “stripped off”.</p> <p>What’s cool is: components that receive service tokens can attenuate them. For instance, we could lock a token to a particular worker, or even a particular Fly Machine. Then we can expose the whole <a href='https://fly.io/docs/machines/working-with-machines/' title=''>Fly Machines API</a> to customer VMs while keeping access traceable to specific customer tokens. 
Stealing the token from a Fly Machine doesn’t help you since it’s locked to that Fly Machine by a caveat attackers can’t strip.</p> <h2 id='7' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#7' aria-label='Anchor'></a><span class='plain-code'>7</span></h2> <p>If a customer loses their tokens to an attacker, we can’t just blow that off and let the attacker keep compromising the account!</p> <div class="right-sidenote"><p>This cancels every token derived through attenuation by that nonce.</p> </div> <p>Every Macaroon we issue is identified by a unique nonce, and we can revoke tokens by that nonce. This is just a basic function of the token verification service we just described.</p> <p>We host token caches all over our fleet. Token revocation invalidates the caches. Anything with a cache checks frequently whether to invalidate. Revocation is rare, so just keeping a revocation list and invalidating caches wholesale seems fine.</p> <h2 id='8' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#8' aria-label='Anchor'></a><span class='plain-code'>8</span></h2> <p>I get it, it&rsquo;s tough to get me to shut up about Macaroons.</p> <p>A couple years ago, I <a href='https://fly.io/blog/api-tokens-a-tedious-survey/' title=''>wrote a long survey of API token designs</a>, from JWTs (never!) to Biscuits. 
I had a <a href='https://fly.io/blog/api-tokens-a-tedious-survey/#macaroons' title=''>bunch to say about Macaroons</a>, not all of it positive, and said we&rsquo;d be plowing forward with them at Fly.io.</p> <p>My plan had been to follow up soon after with a deep dive on Macaroons as we planned them for Fly.io. I&rsquo;m glad I didn&rsquo;t do that, not just because it would&rsquo;ve been embarrassing to announce a feature that took us over 2 years to launch, but also because the process of working on this with Ben Toews changed a lot of my thinking about them.</p> <p>I think if you asked Ben, he&rsquo;d say he had mixed feelings about how much complexity we wrangled to get this launched. On the other hand: we got a lot of things out of them without trying very hard:</p> <ul> <li>Security tokens you can (almost) email to your users and partners without putting your account at risk. </li><li>A flexible permission system, encoded directly into the tokens, that users can drive without talking to our servers. </li><li>A plugin system that users can (when we clean up the tooling) use themselves, to add things like Passkeys or two-person-approval rules or audit logging, without us getting in the middle. </li><li>An SSO system that can stack different IdPs, mandate SSO login, and do that on a per-<code>Organization</code> basis. </li><li><a href='https://www.latacora.com/blog/2018/06/12/a-childs-garden/' title=''>Inter-service authorization</a> that is traceable back to customer actions, so our servers can&rsquo;t just make up which apps they&rsquo;re allowed to look at. </li><li>An elegant way of exposing our own APIs to customer Fly Machines with ambient authentication, but without the <a href="https://github.com/SummitRoute/imdsv2_wall_of_shame/blob/main/README.md">AWS IMDSv1 credential theft problem</a>. </li></ul> <p>There are downsides and warts! I&rsquo;m mostly not telling you about them! Pure restrictive caveats are an awkward way to express some roles. 
And, blinded by my hunger to get Macaroons deployed, I spat in the face of science and used internal database IDs as our public caveat format, an act for which JP will never forgive me.</p> <p>If I&rsquo;ve piqued your interest, <a href='https://github.com/superfly/macaroon' title=''>the code for this stuff is public</a>, along with some more <a href='https://github.com/superfly/macaroon/blob/main/macaroon-thought.md' title=''>detailed technical documentation</a>.</p> </content> </entry> <entry> <title>How Yoko Li makes towns, tamagoes, and tools for local AI</title> <link rel="alternate" href="https://fly.io/blog/how-i-fly-yoko-li/"/> <id>https://fly.io/blog/how-i-fly-yoko-li/</id> <published>2024-01-08T00:00:00+00:00</published> <updated>2024-04-12T18:23:39+00:00</updated> <media:thumbnail url="https://fly.io/blog/how-i-fly-yoko-li/assets/chat-bird-cover-thumb.webp"/> <content type="html"><p>Hello all, and welcome to another episode of How I Fly, a series where I interview developers about what they do with technology, what they find exciting, and the unexpected things they’ve learned along the way. This time I’m talking with <a href='https://twitter.com/stuffyokodraws' title=''>Yoko Li</a>, an investment partner at A16Z who’s also an open-source AI developer. She works on some of the most exciting AI projects in the world. 
I’m excited to share them with you today, with fun stories about the lessons she’s learned along the way.</p> <h2 id='cool-experiments' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#cool-experiments' aria-label='Anchor'></a><span class='plain-code'>Cool Experiments</span></h2> <p>One of Yoko’s most thought-provoking experiments is <a href='https://www.convex.dev/ai-town' title=''>AI Town</a>, a virtual town populated by AI agents that talk with each other. It takes advantage of the randomness of AI responses to create emergent behavior. When you open it, it looks like this:</p> <p><img alt="A picture of the AI Town homepage, a UI showing a top-down 2D RPG view with a visible river and a tent. The UI shows a conversation with the characters Alice and Stella." src="/blog/how-i-fly-yoko-li/assets/image1.webp" /></p> <p>You can see the AI agents talking with each other and watch how the relationships between them form and change over time. It’s also a lot of fun to watch.</p> <p>One of Yoko’s other experiments is <a href='https://ai-tamago.fly.dev/' title=''>AI Tamago</a>, a <a href='https://en.wikipedia.org/wiki/Tamagotchi' title=''>Tamagotchi</a> virtual pet implemented with a large language model instead of the state machine that we’re all used to. AI Tamago uses an unmodified version of LLaMA 2 7B to take in game state and user inputs, then generates what happens next. Every time you interact with your pet, it feeds data to LLaMA 2 and then uses Ollama’s JSON mode to generate unexpected output.</p> <p><img alt="A picture of the homepage of AI Tamago, showing a virtual pet with buttons to feed the pet, play with the pet, clean the pet, discipline the pet, check pet status, and deliver medical care to the pet." 
src="/blog/how-i-fly-yoko-li/assets/image4.webp" /></p> <p>It’s all the fun of the classic Tamagotchi toys from the ’90s (including the ability to randomly discipline your virtual pet) without any of the coin cell batteries or having to carry around the little egg-shaped puck.</p> <p>But that’s just something you can watch, not something that’s as easy to play with on your own machine. Yoko has also worked on the <a href='https://github.com/ykhli/local-ai-stack' title=''>Local AI Starter Kit</a> that lets you go from zero to AI in minutes. It’s a collection of chains of models that let you ingest a bunch of documents, store them in a database, and then use those documents as context for a language model to generate responses. It’s everything you need to implement a “chat with a knowledge base” feature.</p> <h3 id='the-dark-of-ai-experiments' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#the-dark-of-ai-experiments' aria-label='Anchor'></a><span class='plain-code'>The dark of AI experiments</span></h3> <p>The Local AI Starter Kit is significant because normally to do this, you need to set up billing and API keys for at least four different API providers, and then you need to write a bunch of (hopefully robust) code to tie it all together. With the Local AI Starter Kit, you can do this on your own hardware, with your own data, and your own models privately. It’s a huge step forward for democratizing access to this technology.</p> <p>Document search is one of my favorite usecases for AI, and it’s one of the most immediately useful ones. It’s also one of the most fiddly and annoying to get right. 
To help illustrate this, I’ve made a diagram of the steps involved with setting up document search by hand:</p> <p><img alt="A diagram showing the process of ingesting a pile of markdown documents into a vector database. The documents are broken into a collection of sections, then each section is passed through an embedding model and the resulting vectors are stored in a vector database." src="/blog/how-i-fly-yoko-li/assets/image3.webp" /></p> <p>You start with your Markdown documents. Most Markdown documents are easily broken up into sections where each section will focus on a single aspect of the larger topic of the document. You can take advantage of this best practice by letting people search for each section individually, which is typically a lot more useful than just searching the entire document.</p> <div class="right-sidenote"><p>Okay, okay, fine. Language encircles concepts instead of defining them directly. The point still stands that we’re operating at a level “below” words and sentences; I don’t want to bog this down in a bunch of linear algebra that neither of us understands well enough to explain in a single paragraph like I am here. The main point is that it lets you “fuzzy match” relevant documents in a way that exact word search queries never could on their own.</p> </div> <p>Essentially, the vector embeddings that you generate from an embedding model are a mathematical representation of the “concepts” that the embedding model uses that are adjacent to the text of your documents. When you use the same model to generate embeddings for your documents and user queries, this lets you find documents that are similar to the query, even when they don’t use precisely the same words. 
This is called “fuzzy searching” and it is one of the most difficult problems in computer science (right next to naming things).</p> <p>When a user comes to search the database, you do the same thing as ingestion:</p> <p><img alt="A diagram showing the full flow for doing document search Q&amp;A with a vector database. The user submits a question to an API endpoint, the question is broken into embedding vectors and used to search for similar vectors in the database. The relevant document fragments are fed into the prompt for a large language model to generate a response that is grounded in the facts from the documents that were ingested. The response is streamed to the user one token at a time." src="/blog/how-i-fly-yoko-li/assets/image2.webp" /></p> <p>The user query comes into your API endpoint. You use the same embedding model from earlier (omitted from the diagram for brevity) to turn that query into a vector. Then you query the same vector database to find documents that are similar to the query. Then you have a list of documents with metadata like the URL to the documentation page or section fragment in that page. From here you have two options. You can either use the documents to return a list of results to the user, or you can do the more fun thing: using those documents as context for a large language model to generate a response grounded in the relevant facts in those documents.</p> <div class="right-sidenote"><p>I think it’s also how OpenAI’s custom GPTs work, but they haven’t released technical details about how they work so this is outright speculation on my part.</p> </div> <p>This basic pattern is called Retrieval-augmented Generation (RAG), and it’s how Bing’s copilot chatbot works. The Local AI Starter Kit makes setting this pipeline up <em>effortless</em> and <em>fast</em>. 
It’s a huge step forward for making this groundbreaking technology accessible to everyone.</p> <h2 id='the-struggles' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#the-struggles' aria-label='Anchor'></a><span class='plain-code'>The struggles</span></h2> <blockquote> <p>When I was trying to get the AI models in AI Town to output JSON, I tried a bunch of different things. I got some good results by telling the model to &ldquo;only reply in JSON, no prose&rdquo;, but we ended up using a model tuned for outputting code. I think I inspired <a href='https://ollama.ai' title=''>Ollama</a> to add their JSON output feature.</p> </blockquote> <p>One of the main benefits of large language models is that they are essentially stochastic models of the entire Internet. They have a bunch of patterns formed that can let you create surprisingly different outputs from similar inputs. This is also one of the main drawbacks of large language models: they are essentially stochastic models of the entire Internet. They have a bunch of patterns formed that can let you create surprisingly different outputs from similar inputs. The outputs of these models are usually correct-ish enough (more correct if you ground the responses in document fact like you do with a Retrieval-augmented Generation system), but they are not always aligned with our observable reality.</p> <p>A lot of the time you will get outputs that don’t make any logical or factual sense. These are called “hallucinations” and they are one of the main drawbacks of large language models. If a hallucination pops in at the worst times, you’ve accidentally told someone how to poison themselves with chocolate chip cookies. 
This is, as the kids say, “bad”.</p> <p>The inherent randomness of the output of a large language model means that it can be difficult to get an exactly parsable format. Most of the time, you’d be able to coax the model to get usable JSON output, but without a schema it can sometimes generate wildly different JSON responses. Only sometimes. This isn’t deterministic and Yoko has found that this is one of the most frustrating parts of working with large language models.</p> <div class="right-sidenote"><p>This works by making any offending ungrammatical tokens weighted to negative infinity. It’s amazingly hacky but the hilarious part is that it works.</p> </div> <p>However, there are workarounds. <a href='https://github.com/ggerganov/llama.cpp' title=''>llama.cpp</a> offers a way to use a grammar file to strictly guide the output of a large language model by using a context-free grammar. This lets you get something more deterministic, but it’s still not perfect. It’s a lot better than nothing, though.</p> <p>One of the fun things that can happen with this is that you can have the model fail to generate anything but an endless stream of newlines in JSON mode. This is hilarious and usually requires some special detection logic to handle and restart the query. There’s work being done to let you use JSON schema to guide the generation of large language model outputs, but it’s not currently ready for the masses.</p> <div class="right-sidenote"><p>If it’s dumb and it works, is it really dumb?</p> </div> <p>However, one of the easiest ways to hack around this is by using a model that generates code instead of text. This is how Yoko got the AI Town and AI Tamago models to output JSON that was mostly valid. It’s a hack, but it works. This was made a lot easier for AI Town when one of the tools they use (<a href='https://ollama.ai' title=''>Ollama</a>) added support for JSON output from the model. 
This is a lot better than the code generation model hack, but research continues.</p> <h2 id='the-simple-joy-of-unexpected-outputs' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#the-simple-joy-of-unexpected-outputs' aria-label='Anchor'></a><span class='plain-code'>The simple joy of unexpected outputs</span></h2> <blockquote> <p>When I was making AI Town, I was inspired by <a href='https://en.wikipedia.org/wiki/The_Lifecycle_of_Software_Objects' title=''>The Lifecycle of Software Objects</a> by Ted Chiang. It&rsquo;s about a former zookeeper that trained AI agents to be pets, kinda like how we use Reinforcement Learning from Human Feedback to train AI models like ChatGPT.</p> </blockquote> <p>However, at the same time, there are cases where hallucinations are not only useful, but they are what make the implementation of a system possible. If large language models are essentially massive banks of the word frequencies of a huge part of culture, then the emergent output can create unexpected things that happen frequently. This lets emergent behavior form, which can be the backbone of games and is the key thing that makes AI Town work as well as it does.</p> <p>AI Tamago is also completely driven off of the results of large language model hallucinations. They are the core of what drives user inputs, the game loop, and the surprising reactions you get when disciplining your pet. The status screen takes in the game state and lets you know what your pet is feeling in a way that the segment displays of the Tamagotchi toys could never do.</p> <p>These enable you to build workflows that are <em>augmented</em> by the inherent randomness of the hallucinations instead of seeing them as drawbacks. 
This means you need to choose outputs that can have the hallucinations shine instead of being ugly warts you need to continuously shave away. Instead of using them for doing pathfinding, have them drive the AI of your characters or write the A* pathfinding algorithm so you don’t have to write it again for the billionth time.</p> <p>I’m not saying that large language models can replace the output of a human, but they are more like a language server for human languages as well as programming languages. They are best used when you are generating the boilerplate you don’t want to do yourself, or when you are throwing science at the wall to see what sticks.</p> <h2 id='in-conclusion' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#in-conclusion' aria-label='Anchor'></a><span class='plain-code'>In conclusion</span></h2> <p>Yoko is showing people how to use AI today, on local machines, with models of your choice, that allow you to experiment, hack and learn.</p> <p>I can’t wait to see what’s next!</p> <p>If you want to follow what Yoko does, here’s a few links to add to your feeds:</p> <ul> <li>Yoko’s <a href='https://twitter.com/stuffyokodraws' title=''>Twitter</a> (or X, or whatever we&rsquo;re supposed to call it now) </li><li>Yoko’s <a href='https://github.com/ykhli' title=''>GitHub</a> </li><li>Yoko’s <a href='https://yoko.dev/' title=''>Website</a> </li></ul> <p>(insert standard conclusion diatribe here)</p> </content> </entry> <entry> <title>Deploy Your Own (Not) Midjourney Bot on Fly GPUs</title> <link rel="alternate" href="https://fly.io/blog/not-midjourney-bot/"/> <id>https://fly.io/blog/not-midjourney-bot/</id> <published>2024-01-04T00:00:00+00:00</published> <updated>2024-04-12T18:23:39+00:00</updated> <media:thumbnail 
url="https://fly.io/blog/not-midjourney-bot/assets/purple-balloon-taking-off-thumb.webp"/> <content type="html"><div class="lead"><p>Fly.io has Enterprise-grade GPUs and servers all over the globe (or <em>disk</em>, depending on which side of the flat Earth debate you fall on) making it a great place to deploy your next disruptive AI app.</p> </div> <p>Some people daydream about normal things, like coffee machines or raising that Series A round (those are normal things to dream about, right?). I daydream about commanding a fleet of chonky <a href='https://resources.nvidia.com/en-us-l40s/l40s-datasheet-28413/' title=''>NVIDIA Lovelace L40Ss</a>. Also, totally normal. Well, fortunately for me and anyone else wanting to explore the world of generative AI — Fly.io has GPUs now!</p> <p>Sure, this technology will probably end up with the AI <a href='https://marketoonist.com/2023/03/ai-written-ai-read.html' title=''>talking to itself</a> while we go about our lives — but it seems like it&rsquo;s here to stay, so we should at least have some fun with it. In this post we&rsquo;ll put these GPUs to task and you&rsquo;ll learn how to build your very own AI image-generating Discord bot, kinda like Midjourney. Available 24/7 and ready to serve up all the pictures of cats eating burritos your heart desires. 
And because I&rsquo;d never tell you to draw the rest of the owl, I&rsquo;ll link to working code that you can deploy today.</p> <h2 id='latent-diffusion-models-have-entered-the-chat' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#latent-diffusion-models-have-entered-the-chat' aria-label='Anchor'></a><span class='plain-code'>Latent Diffusion Models Have Entered the Chat</span></h2> <p>In the realm of AI image generation, two names have become prominent: Midjourney and Stable Diffusion. Both are image generating software that allow you to synthesize an image from a textual prompt. One is a closed source paid service, while the other is open source and can run locally. Midjourney gained popularity because it allowed the less technically-inclined among us to explore this technology through its ease of use. Stable Diffusion democratized access to the technology, but it can be quite tricky to get good results out of it.</p> <p>Enter <a href='https://github.com/lllyasviel/Fooocus' title=''>Fooocus</a> (pronounced <em>focus</em>), an open source project that combines the best of both worlds and offers a user-friendly interface to Stable Diffusion. It&rsquo;s hands down the easiest way to get started with Stable Diffusion. Sure there are more popular tools like Stable Diffusion web UI and ComfyUI, but Fooocus adds some magic to reduce the need to manually tweak a bunch of settings. The most significant feature is probably GPT-2-based &ldquo;<a href='https://github.com/lllyasviel/Fooocus/discussions/117#raw' title=''>prompt expansion</a>&rdquo; to dynamically enhance prompts.</p> <p>The point of Fooocus is to <em>focus</em> on your prompt. The more you put into it, the more you get out. 
That said, a very simple prompt like &ldquo;forest elf&rdquo; can return high-quality images without the need to trawl the web for prompt ideas or fiddle with knobs and levers (although they&rsquo;re there if you want them).</p> <p>So, what can this thing <em>do</em>? Well, this…</p> <p><img alt="A black and white sketch of hot-air balloon over a mountain range generated using Fooocus with &quot;Pencil Sketch Drawing&quot; style and quality = True" src="/blog/not-midjourney-bot/assets/./balloon-sketch.webp" /></p> <p>Here&rsquo;s the full command I&rsquo;ve used to generate this image: <code>/imagine prompt: sketch of hot-air balloon over a mountain range style1: Pencil Sketch Drawing quality: true ar: 1664×576</code></p> <h2 id='what-were-building' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#what-were-building' aria-label='Anchor'></a><span class='plain-code'>What We&rsquo;re Building</span></h2> <p>We&rsquo;ll deploy two applications. The code to run the bot itself will run on normal VM hardware, and the API server doing all the hard work synthesizing alpacas out of thin air will run on GPU hardware.</p> <p><img alt="An architecture diagram explaining how the two apps will communicate and return the requested image to an end user." src="/blog/not-midjourney-bot/assets/./arch-diagram.png?center&amp;2/3" /></p> <p>Fooocus is served up as a web UI by default, but with a little elbow grease we can interact with it as a REST API. Fortunately, with more than 25k stars on GitHub at the time of writing, the project has a lively open-source community, so we don&rsquo;t need to do much work here — it&rsquo;s already been done for us. <a href='https://github.com/konieshadow/Fooocus-API' title=''>Fooocus-API</a> is a project that shoves FastAPI in front of a Fooocus runtime. 
We&rsquo;ll use this for the API server app.</p> <p>The Python-based bot connects to the <a href='https://discord.com/developers/docs/topics/gateway' title=''>Discord Gateway API</a> using the <a href='https://github.com/Pycord-Development/pycord' title=''>Pycord</a> library. When it starts up, it maintains an open pipe for data to flow back and forth via WebSockets. The bot app also includes a client that knows how to talk to the API server using Flycast and request the image it needs via HTTP.</p> <p>When we request an image from Discord using the <code>/imagine</code> slash command, we immediately respond using Pycord&rsquo;s <code>defer()</code> function to let Discord know that the request has been received and the bot is working on it — it&rsquo;ll take a few seconds to process your prompt, fabricate an image, upload it to Discord and let you share it with your friends. This is a blocking operation, so it won&rsquo;t perform well if you have hundreds of people on your Discord Server using the command. For that, you&rsquo;ll want to jiggle some wires to make the code non-blocking. But for now, this gives us a nice UX for the bot.</p> <p>When the API server returns the image, it gets saved to disk. 
We&rsquo;ll use the fantastic <a href='https://github.com/sqids/sqids-python' title=''>Sqids</a> library to generate collision-free file names:</p> <div class="highlight-wrapper group relative python"> <button type="button" class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-wrap-target="#code-75afx6ud" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"><g buffered-rendering="static"><path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /><path d="M11.081 6.466L9.533 8.037l1.548 1.571" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"> Wrap text </span> </button> <button type="button" class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-copy-target="sibling" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"><g buffered-rendering="static"><path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /><path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"> 
Copy to clipboard </span> </button> <div class='highlight relative group'> <pre class='highlight '><code id="code-75afx6ud"><span class="n">unique_id</span> <span class="o">=</span> <span class="bp">self</span><span class="p">.</span><span class="n">sqids</span><span class="p">.</span><span class="n">encode</span><span class="p">(</span> <span class="p">[</span><span class="n">ctx</span><span class="p">.</span><span class="n">author</span><span class="p">.</span><span class="nb">id</span><span class="p">,</span> <span class="nb">int</span><span class="p">(</span><span class="n">time</span><span class="p">.</span><span class="n">time</span><span class="p">())]</span> <span class="p">)</span> <span class="n">result_filename</span> <span class="o">=</span> <span class="sa">f</span><span class="s">"result_</span><span class="si">{</span><span class="n">unique_id</span><span class="si">}</span><span class="s">.png"</span> </code></pre> </div> </div> <p>We&rsquo;ll also use <code>asyncio</code> to check if the image is ready every second, and when it is, we send it off to Discord to complete the request:</p> <div class="highlight-wrapper group relative python"> <button type="button" class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-wrap-target="#code-w1v7557b" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"><g buffered-rendering="static"><path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /><path d="M11.081 6.466L9.533 8.037l1.548 1.571" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-9px] tail 
text-navy-950"> Wrap text </span> </button> <button type="button" class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-copy-target="sibling" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"><g buffered-rendering="static"><path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /><path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"> Copy to clipboard </span> </button> <div class='highlight relative group'> <pre class='highlight '><code id="code-w1v7557b"><span class="k">while</span> <span class="ow">not</span> <span class="n">os</span><span class="p">.</span><span class="n">path</span><span class="p">.</span><span class="n">exists</span><span class="p">(</span><span class="n">result_filename</span><span class="p">):</span> <span class="k">await</span> <span class="n">asyncio</span><span class="p">.</span><span class="n">sleep</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span> <span class="k">with</span> <span class="nb">open</span><span class="p">(</span><span class="n">result_filename</span><span class="p">,</span> <span class="s">"rb"</span><span class="p">)</span> <span class="k">as</span> <span class="n">f</span><span class="p">:</span> <span class="k">await</span> <span class="n">ctx</span><span class="p">.</span><span class="n">respond</span><span class="p">(</span> <span class="nb">file</span><span 
class="o">=</span><span class="n">discord</span><span class="p">.</span><span class="n">File</span><span class="p">(</span><span class="n">f</span><span class="p">,</span> <span class="n">result_filename</span><span class="p">)</span> <span class="p">)</span> </code></pre> </div> </div> <p>Neither of these two apps will be exposed to the Internet, yet they&rsquo;ll still be able to communicate with each other. One of the undersold stories about Fly.io is the ease with which two applications can communicate over the private network. We assign special IPv6 private network (6pn) addresses within the same organizational space and applications can effortlessly discover and connect to one another without any additional configuration.</p> <p>But what about load balancing and this &ldquo;scale-to-zero&rdquo; thing? We don&rsquo;t <em>just</em> want our two apps to talk to each other, we want the Fly Proxy to start our Machine when a request comes in, and stop it when idle. For that, we&rsquo;ll need <a href='https://fly.io/docs/reference/private-networking/#flycast-private-load balancing' title=''>Flycast</a>, our private load balancing feature.</p> <p>When you assign a Flycast IP to your app, you can route requests using a special <code>.flycast</code> domain. Those requests are routed through the Fly Proxy instead of directly to instances in your app. Meaning you get all the load balancing, rate limiting and other proxy goodness that you&rsquo;re accustomed to. The Proxy runs a process which can automatically downscale Machines every few minutes. 
It&rsquo;ll also start them right back up when a request comes in — this means we can take advantage of scale-to-zero, saving us a bunch of money!</p> <h2 id='the-imagine-command' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#the-imagine-command' aria-label='Anchor'></a><span class='plain-code'>The <code>/imagine</code> Command</span></h2> <p>The slash command is the heart of your bot, enabling you to generate images based on your prompt, right from within Discord. When you type <code>/imagine</code> into the Discord chat, you&rsquo;ll see some command options pop up.</p> <p>You&rsquo;ll need to input your base prompt (e.g. &ldquo;an alpaca sleeping in a grassy field&rdquo;) and optionally pick some styles (&ldquo;Pencil Sketch Drawing&rdquo;, &ldquo;Futuristic Retro Cyberpunk&rdquo;, &ldquo;MRE Dark Cyberpunk&rdquo; etc). With Fooocus, combining multiple styles — &ldquo;style-chaining&rdquo; — can help you achieve amazing results. Set the aspect ratio or provide negative prompts if needed, too.</p> <p>After you execute the command, the bot will request the image from the API, then send it as a response in the chat. 
Let’s see it in action!</p> <p><img alt="A GIF demo run-through showcasing the ability of the bot to generate images from Discord" src="/blog/not-midjourney-bot/assets/./demo.gif?card&amp;center" /></p> <h2 id='deployment-speedrun' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#deployment-speedrun' aria-label='Anchor'></a><span class='plain-code'>Deployment Speedrun</span></h2> <p><strong class='font-semibold text-navy-950'>First, we&rsquo;ll deploy the API server.</strong> For convenience (and to speed things up), we&rsquo;ll use a pre-built image when we deploy. With dependencies like <code>torch</code> and <code>torchvision</code> bundled in, it&rsquo;s a hefty image weighing in just shy of 12GB. With a normal Fly Machine this would not only be a bad idea, but not even possible due to an 8GB limit for the VM&rsquo;s rootfs. Fortunately the wizards behind Fly GPUs have accounted for our need to run huge models and their dependencies, and awarded us 50GB of rootfs.</p> <div class="right-sidenote"><p>Fly GPUs use <a href="https://github.com/cloud-hypervisor/cloud-hypervisor" title="">Cloud Hypervisor</a> and not <a href="https://github.com/firecracker-microvm/firecracker" title="">Firecracker</a> (like a regular Fly Machine) for virtualization. But even with a 12GB image, this doesn’t stop the Machine from booting in seconds when a new request comes in through the Proxy.</p> </div> <p>To start, clone the template <a href='https://github.com/fly-apps/not-midjourney-bot' title=''>repository</a>. You&rsquo;ll need this for both the bot and server apps. 
Then deploy the server with the Fly CLI:</p> <div class="highlight-wrapper group relative bash"> <button type="button" class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-wrap-target="#code-ytt1j7os" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"><g buffered-rendering="static"><path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /><path d="M11.081 6.466L9.533 8.037l1.548 1.571" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"> Wrap text </span> </button> <button type="button" class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-copy-target="sibling" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"><g buffered-rendering="static"><path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /><path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"> Copy to clipboard </span> </button> <div class='highlight relative group'> <pre class='highlight '><code 
id="code-ytt1j7os">fly deploy <span class="se">\</span>
  <span class="nt">--image</span> ghcr.io/fly-apps/not-midjourney-bot:server <span class="se">\</span>
  <span class="nt">--config</span> ./server/fly.toml <span class="se">\</span>
  <span class="nt">--no-public-ips</span>
</code></pre> </div> </div> <p>This command tells Fly.io to deploy your application based on the configuration specified in <code>fly.toml</code>, while the <code>--no-public-ips</code> flag secures your app by not exposing it to the public Internet.</p> <p>Remember Flycast? To use it, we’ll allocate a private IPv6 address:</p> <div class="highlight-wrapper group relative bash"> <button type="button" class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-wrap-target="#code-g3tqfpkl" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"><g buffered-rendering="static"><path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /><path d="M11.081 6.466L9.533 8.037l1.548 1.571" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"> Wrap text </span> </button> <button type="button" class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-copy-target="sibling" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" 
stroke-width="1.35"><g buffered-rendering="static"><path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /><path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"> Copy to clipboard </span> </button> <div class='highlight relative group'> <pre class='highlight '><code id="code-g3tqfpkl">fly ips allocate-v6 <span class="nt">--private</span> </code></pre> </div> </div> <p>Now, let&rsquo;s take a look at our <a href='https://github.com/fly-apps/not-midjourney-bot/blob/134bb634f97bf81040e489650f2334b48d976c10/server/fly.toml' title=''><code>fly.toml</code></a> config:</p> <div class="highlight-wrapper group relative toml"> <button type="button" class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-wrap-target="#code-a3s9879o" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"><g buffered-rendering="static"><path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /><path d="M11.081 6.466L9.533 8.037l1.548 1.571" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"> Wrap text </span> </button> <button type="button" class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 
group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-copy-target="sibling" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"><g buffered-rendering="static"><path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /><path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"> Copy to clipboard </span> </button> <div class='highlight relative group'> <pre class='highlight '><code id="code-a3s9879o"><span class="py">app</span> <span class="p">=</span> <span class="s">"alpaca-image-gen"</span>
<span class="py">primary_region</span> <span class="p">=</span> <span class="s">"ord"</span>
<span class="nn">[[vm]]</span>
<span class="py">size</span> <span class="p">=</span> <span class="s">"performance-8x"</span>
<span class="py">memory</span> <span class="p">=</span> <span class="s">"16gb"</span>
<span class="py">gpu_kind</span> <span class="p">=</span> <span class="s">"l40s"</span>
<span class="nn">[[services]]</span>
<span class="py">internal_port</span> <span class="p">=</span> <span class="mi">8888</span>
<span class="py">protocol</span> <span class="p">=</span> <span class="s">"tcp"</span>
<span class="py">auto_stop_machines</span> <span class="p">=</span> <span class="kc">true</span>
<span class="py">auto_start_machines</span> <span class="p">=</span> <span class="kc">true</span>
<span class="py">min_machines_running</span> <span class="p">=</span> <span class="mi">0</span>
<span class="nn">[[services.ports]]</span>
<span class="py">handlers</span> <span class="p">=</span> <span class="nn">["http"]</span>
<span class="py">port</span> <span class="p">=</span> <span class="mi">80</span>
<span class="py">force_https</span> <span class="p">=</span> <span class="kc">false</span>
<span class="nn">[mounts]</span>
<span class="py">source</span> <span class="p">=</span> <span class="s">"repositories"</span>
<span class="py">destination</span> <span class="p">=</span> <span class="s">"/app/repositories"</span>
<span class="py">initial_size</span> <span class="p">=</span> <span class="s">"20gb"</span>
</code></pre> </div> </div> <p>There are a few key things to note here:</p> <ol> <li>Currently, the NVIDIA L40Ss we&rsquo;re using when we specify <code>gpu_kind</code> are only available in <code>ORD</code>, so that&rsquo;s what we&rsquo;ve set the <code>primary_region</code> to. We&rsquo;re rolling out more GPUs to more regions in a hurry — but for now we&rsquo;ll host the bot in Chicago. </li><li>Out of the box, 8GB of system RAM is suggested. In my testing this wasn&rsquo;t close to enough: the Machine would frequently run out of memory and crash. I got things working better by using 16GB of RAM. </li><li>The FastAPI server binds to port 8888; we need to set this as our <code>internal_port</code>, or the Fly Proxy won&rsquo;t know where to send requests. </li><li>We want our Machine to <a href='https://fly.io/docs/apps/autostart-stop/' title=''>automatically stop and start</a>. </li><li>Flycast doesn&rsquo;t do HTTPS, so we won&rsquo;t force it here. Don&rsquo;t worry, it&rsquo;s still encrypted over the wire! </li><li>A volume is automatically created on the first deploy. On first boot, the app clones the Fooocus repo and downloads the Stable Diffusion model checkpoints onto that volume. This takes a couple of minutes, but the next time the Machine starts, it&rsquo;ll have everything it needs to serve a request within seconds. 
</li></ol> <div class="callout"><p>The <a href="https://github.com/fly-apps/not-midjourney-bot/blob/84e72d1e7048627b7c845fe3d44d45b278e451d5/README.md" title=""><strong class="font-semibold text-navy-950">README</strong></a> for this project has detailed instructions about setting up your Discord bot and adding it to a server. After setting up the permissions and privileged intents, you’ll get an OAuth2 URL. Use this URL to invite your bot to your Discord server and confirm the permissions. Once that’s done, grab your Discord API token; you’ll need it for the next step.</p> </div> <p><strong class='font-semibold text-navy-950'>With the API server up and running, it&rsquo;s time to deploy the Discord bot.</strong> This app will run on a normal Fly Machine, no GPU required. First, use the Fly CLI to set the <code>DISCORD_TOKEN</code> and <code>FOOOCUS_API_URL</code> (the Flycast endpoint for the API server) secrets. Then deploy:</p> <div class="highlight-wrapper group relative bash"> <button type="button" class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-wrap-target="#code-314htg3w" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"><g buffered-rendering="static"><path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /><path d="M11.081 6.466L9.533 8.037l1.548 1.571" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"> Wrap text </span> </button> <button type="button" class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 
group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-copy-target="sibling" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"><g buffered-rendering="static"><path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /><path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"> Copy to clipboard </span> </button> <div class='highlight relative group'> <pre class='highlight '><code id="code-314htg3w">fly deploy <span class="se">\</span>
  <span class="nt">--image</span> ghcr.io/fly-apps/not-midjourney-bot:bot <span class="se">\</span>
  <span class="nt">--config</span> ./bot/fly.toml <span class="se">\</span>
  <span class="nt">--no-public-ips</span>
</code></pre> </div> </div> <p>Notice that the bot app doesn&rsquo;t need to be publicly visible on the Internet either. Under the hood, the WebSocket connection to Discord&rsquo;s Gateway API allows the bot to communicate freely without the need to define any services in our <code>fly.toml</code>. 
This also means that the Fly Proxy will not downscale the app like it does the GPU Machine — the bot will always appear &ldquo;online&rdquo;.</p> <figure class="post-cta"> <figcaption> <h1>Not interested in GPUs?</h1> <p>You can still deploy apps on Fly.io today and be up and running in a matter of minutes.</p> <a class="btn btn-lg" href="https://fly.io/docs/speedrun/"> Deploy an app now<span class='opacity-50'>→</span> </a> </figcaption> <div class="image-container"> <img src="/static/images/cta-kitty.webp" srcset="/static/images/[email protected] 2x" alt=""> </div> </figure> <h2 id='how-do-i-know-this-thing-is-using-gpu-for-reals' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#how-do-i-know-this-thing-is-using-gpu-for-reals' aria-label='Anchor'></a><span class='plain-code'>How Do I Know This Thing Is Using GPU for Reals?</span></h2> <p>That&rsquo;s easy! NVIDIA provides us with a neat little command-line utility called <code>nvidia-smi</code> which we can use to monitor and get information about NVIDIA GPU devices.</p> <p>Let&rsquo;s SSH to the running Machine for the API server app and run an <code>nvidia-smi</code> query in one go. 
It&rsquo;s a little clunky, but you&rsquo;ll get the point:</p> <div class="highlight-wrapper group relative bash"> <button type="button" class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-wrap-target="#code-v0fauj3q" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"><g buffered-rendering="static"><path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /><path d="M11.081 6.466L9.533 8.037l1.548 1.571" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"> Wrap text </span> </button> <button type="button" class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-copy-target="sibling" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"><g buffered-rendering="static"><path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /><path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"> Copy to clipboard </span> </button> <div class='highlight relative group'> <pre 
class='highlight '><code id="code-v0fauj3q">fly ssh console <span class="se">\</span>
  <span class="nt">-C</span> <span class="s2">"nvidia-smi --query-gpu=gpu_name,utilization.gpu,utilization.memory,temperature.gpu,power.draw --format=csv,noheader --loop"</span>
</code></pre> </div> </div><div class="highlight-wrapper group relative "> <button type="button" class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-wrap-target="#code-j86zv5m2" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"><g buffered-rendering="static"><path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /><path d="M11.081 6.466L9.533 8.037l1.548 1.571" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"> Wrap text </span> </button> <button type="button" class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-copy-target="sibling" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"><g buffered-rendering="static"><path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /><path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 
0-1.815.819-1.816 1.815v1.617" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"> Copy to clipboard </span> </button> <div class='highlight relative group'> <pre class='highlight '><code id="code-j86zv5m2">Connecting to fdaa:2:f664:a7b:210:d8b2:8fd8:2... complete
NVIDIA L40S, 0 %, 0 %, 46, 88.63 W
NVIDIA L40S, 0 %, 0 %, 46, 88.61 W
NVIDIA L40S, 36 %, 4 %, 51, 103.41 W
NVIDIA L40S, 65 %, 25 %, 57, 280.90 W
NVIDIA L40S, 0 %, 0 %, 49, 91.13 W
NVIDIA L40S, 0 %, 0 %, 48, 89.76 W
</code></pre> </div> </div> <p>We ran the command in a loop while the bot was synthesizing an image, so we get to see the GPU ramp up, drawing more power and using more memory, before settling back to idle. The card is barely breaking a sweat!</p> <h2 id='how-much-will-these-alpaca-pics-cost-me' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#how-much-will-these-alpaca-pics-cost-me' aria-label='Anchor'></a><span class='plain-code'>How Much Will These Alpaca Pics Cost Me?</span></h2> <p>Let&rsquo;s talk about the cost-effectiveness of this setup. On Fly.io, an L40S GPU <a href='https://fly.io/docs/about/pricing/#gpus-and-fly-machines' title=''>costs</a> $2.50/hr. Add the VM resources and the storage for our models and you&rsquo;re looking at about $3.20/hr to run the GPU Machine. It&rsquo;s <em>on-demand</em>, too — if you&rsquo;re not using the compute, you&rsquo;re not paying for it! Keep in mind that some of these checkpoint models can be several gigabytes, and if you create a volume you will be charged for it even when you have no Machines running. 
It&rsquo;s also worth noting that the non-GPU bot app falls within our <a href='https://fly.io/docs/about/pricing/#free-allowances' title=''>free allowance</a>.</p> <div class="right-sidenote"><p>Rates are on-demand, with no minimum usage requirements. Discounted rates for reserved GPU Machines and dedicated hosts are also available if you email <a href="mailto:[email protected]" title="">[email protected]</a>.</p> </div> <p>In comparison, Midjourney offers several subscription tiers, with the cheapest plan costing $10/mo and providing 3.3 hours of &ldquo;fast&rdquo; GPU time (roughly equivalent to an enterprise-grade Fly GPU). This works out to about $3/hr, give or take a few cents.</p> <h2 id='where-can-i-take-this' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#where-can-i-take-this' aria-label='Anchor'></a><span class='plain-code'>Where Can I Take This?</span></h2> <p>There is a lot you can do to build out the bot&rsquo;s functionality. You control the source code for the bot, meaning that you can make it do <em>whatever you want</em>. You might decide to mimic Midjourney&rsquo;s <code>/blend</code> command to splice your own images into prompts (AKA img2img diffusion). You can do this by adding more commands to your <a href='https://guide.pycord.dev/popular-topics/cogs' title=''>Cog</a>, Pycord&rsquo;s way of grouping similar commands. You might decide to add a button to re-roll the image if you don&rsquo;t like it, or even specify the number of images to return. 
The possibilities are endless and your cloud bill&rsquo;s the limit!</p> <p>The full code for the bot and server (with detailed instructions on how to deploy it on Fly.io) can be found <a href='https://github.com/fly-apps/not-midjourney-bot' title=''><strong class='font-semibold text-navy-950'>here</strong></a>.</p> </content> </entry> <entry> <title>Fly With Alpine</title> <link rel="alternate" href="https://fly.io/blog/fly-with-alpine/"/> <id>https://fly.io/blog/fly-with-alpine/</id> <published>2023-12-21T00:00:00+00:00</published> <updated>2024-04-12T18:23:39+00:00</updated> <media:thumbnail url="https://fly.io/blog/fly-with-alpine/assets/fly-with-alpine-thumb.webp"/> <content type="html"><div class="lead"><p>Reduce image sizes and improve startup times by switching your base image to Alpine Linux.</p> </div> <p>Before proceeding, a caution. This is an engineering trade-off. Test carefully before deploying to production.</p> <p>By the end of this blog post you should have the information you need to make an informed decision.</p> <h2 id='introduction' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#introduction' aria-label='Anchor'></a><span class='plain-code'>Introduction</span></h2> <p><a href='https://www.alpinelinux.org/about/' title=''>Alpine Linux</a> is a Linux distribution that advertises itself as Small. Simple. Secure.</p> <p>It is indisputably smaller than the alternatives &ndash; when measured by image size. More on that in a bit. Some claim that this results in less memory usage and better performance. Others dispute these claims. For these, it is best that you test the results for yourself with your application.</p> <p>Simple is harder to measure. 
Some of the larger differences, like <a href='https://github.com/OpenRC/openrc#readme' title=''>OpenRC</a> vs <a href='https://systemd.io/' title=''>systemd</a>, are less relevant in container environments. Others, like <a href='https://busybox.net/' title=''>BusyBox</a>, are implementation details. Essentially, what you get is a Linux distribution in which some standard packages (e.g., bash) are not installed by default but can easily be added if needed.</p> <p>Secure is definitely an important attribute. The alternatives make comparable claims in this area. Do your own research and come to your own conclusions.</p> <p>Not mentioned is the downside: Alpine Linux has a smaller ecosystem than the alternatives, particularly when compared to Debian.</p> <h2 id='baseline' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#baseline' aria-label='Anchor'></a><span class='plain-code'>Baseline</span></h2> <p>Let&rsquo;s start with a baseline consisting of the Dockerfiles produced by <code>fly launch</code> for some of the most popular frameworks:</p> <div class="highlight-wrapper group relative "> <button type="button" class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-wrap-target="#code-ywliy2hv" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"><g buffered-rendering="static"><path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 
00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /><path d="M11.081 6.466L9.533 8.037l1.548 1.571" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"> Wrap text </span> </button> <button type="button" class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-copy-target="sibling" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"><g buffered-rendering="static"><path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /><path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"> Copy to clipboard </span> </button> <div class='highlight relative group'> <pre class='highlight '><code id="code-ywliy2hv">FROM fideloper/fly-laravel:${PHP_VERSION}
FROM hexpm/elixir:1.12.3-erlang-24.1.4-debian-bullseye-20210902-slim
FROM node:${NODE_VERSION}-slim
FROM oven/bun:${BUN_VERSION}-slim
FROM python:${PYTHON_VERSION}-slim-bullseye
FROM ruby:$RUBY_VERSION-slim
</code></pre> </div> </div> <p>What may not be obvious from these lines is that the base image for each is one of the following:</p> <ul> <li>Debian Bookworm (the current &ldquo;stable&rdquo; distribution) </li><li>Debian Bullseye (the previous &ldquo;stable&rdquo; distribution) </li><li>Ubuntu Focal Fossa (the previous LTS release of Ubuntu) </li></ul> <p>Once you factor in that Ubuntu is based on Debian, the conclusion is that Debian is effectively the 
default distribution for Fly.io. Rest assured that this isn&rsquo;t the result of a devious conspiracy by Fly.io, but rather a reflection of the default choices made independently by the developers of a number of frameworks and runtimes. Beyond this, all Fly.io is doing is choosing the &ldquo;slim&rdquo; version of the default distribution for each framework as the base.</p> <p>What&rsquo;s likely going on here is a virtuous circle: people choose Debian because of the ecosystem, and the ecosystem grows because people choose Debian.</p> <p>Now let&rsquo;s compare base image sizes:</p> <table class="ml-8 mb-8"> <thead> <tr> <th class="px-8"></th> <th class="px-8 underline">Alpine</th> <th class="px-8 underline">Debian slim</th> </tr> </thead> <tbody> <tr> <th class="text-left">Bun 1.0.18</th> <td class="text-center">43.10M</td> <td class="text-center">63.84M</td> </tr> <tr> <th class="text-left">Node 21.4.0</th> <td class="text-center">46.83M</td> <td class="text-center">70.08M</td> </tr> <tr> <th class="text-left">Python 3.12.1</th> <td class="text-center">17.59M</td> <td class="text-center">45.36M</td> </tr> <tr> <th class="text-left">Ruby 3.2</th> <td class="text-center">40.14M</td> <td class="text-center">74.36M</td> </tr> </tbody> </table> <p>And these numbers are just for the base images. I&rsquo;ve measured a minimal Rails/PostgreSQL/esbuild application at 304MB on Alpine and 428MB on Debian Slim. A minimal Bun application came in at 110MB on Alpine and 173MB on Debian Slim. 
And a minimal Node application came in at 142MB on Alpine and 207MB on Debian Slim.</p> <p>In each case, the Alpine image is smaller than its Debian slim equivalent.</p> <h2 id='switching-distributions' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#switching-distributions' aria-label='Anchor'></a><span class='plain-code'>Switching Distributions</span></h2> <p>Switching distributions (and switching back!) is easy.</p> <p>The first change is to replace <code>-slim</code> with <code>-alpine</code> in the <code>FROM</code> statements in your <code>Dockerfile</code>.</p> <p>Next, replace <code>apt-get update</code> with <code>apk update</code> and <code>apt-get install</code> with <code>apk add</code>. Delete any options you may have, like <code>-y</code> and <code>--no-install-recommends</code>; they aren&rsquo;t needed.</p> <p>Now review the names of the packages you are installing. Many are named the same. A few are different. You can use <a href='https://pkgs.alpinelinux.org/packages' title=''>Alpine packages</a> to look for the ones to use. 
Some examples of differences:</p> <table class="ml-8 mb-8" style="border-collapse: separate; border-spacing: 1rem 0"> <thead> <tr> <th class="px-8 underline text-left">Debian</th> <th class="px-8 underline text-left">Alpine</th> </tr> </thead> <tbody> <tr> <td>build-essential</td> <td>build-base</td> </tr> <tr> <td>chromium-sandbox</td> <td>chromium-chromedriver</td> </tr> <tr> <td>default-libmysqlclient-dev</td> <td>mysql-client</td> </tr> <tr> <td>default-mysql-client</td> <td>mysql-client</td> </tr> <tr> <td>freetds-bin</td> <td>freetds</td> </tr> <tr> <td>libicu-dev</td> <td>icu-dev</td> </tr> <tr> <td>libjemalloc</td> <td>jemalloc-dev</td> </tr> <tr> <td>libjpeg-dev</td> <td>jpeg-dev</td> </tr> <tr> <td>libmagickwand-dev</td> <td>imagemagick-libs</td> </tr> <tr> <td>libsqlite3-0</td> <td>sqlite-dev</td> </tr> <tr> <td>libtiff-dev</td> <td>tiff-dev</td> </tr> <tr> <td>libvips</td> <td>vips-dev</td> </tr> <tr> <td>node-gyp</td> <td>gyp</td> </tr> <tr> <td>pkg-config</td> <td>pkgconfig</td> </tr> <tr> <td>python</td> <td>python3</td> </tr> <tr> <td>python-is-python3</td> <td>python3</td> </tr> <tr> <td>sqlite3</td> <td>sqlite</td> </tr> </tbody> </table> <p>Note: the above is just an approximation. For example, while <code>libsqlite3-0</code> and <code>sqlite-dev</code> include everything you need to build an application that uses sqlite3, all that is needed at runtime is <code>sqlite-libs</code>. This relentless attention to detail contributes to smaller final image sizes.</p> <p>Note: For Bun, Node, and Rails users, knowledge of how to apply the above changes is built into recent versions of the Dockerfile generators that we provide. 
After all, computers are good at <code>if</code> statements:</p> <div class="highlight-wrapper group relative "> <button type="button" class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-wrap-target="#code-q2q9lq4b" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"><g buffered-rendering="static"><path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /><path d="M11.081 6.466L9.533 8.037l1.548 1.571" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"> Wrap text </span> </button> <button type="button" class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-copy-target="sibling" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"><g buffered-rendering="static"><path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /><path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"> Copy to clipboard </span> </button> <div class='highlight relative group'> <pre 
class='highlight '><code id="code-q2q9lq4b">bunx dockerfile --alpine
npx dockerfile --alpine
bin/rails generate dockerfile --alpine
</code></pre> </div> </div><figure class="post-cta"> <figcaption> <h1>Choose your own Linux Distribution</h1> <p>Deploy your project in a few minutes with Fly Launch. Then do more with Fly Machines.</p> <a class="btn btn-lg" href="https://fly.io/docs/"> Run your entire stack near your users </a> </figcaption> <div class="image-container"> <img src="/static/images/cta-rabbit.webp" srcset="/static/images/[email protected] 2x" alt=""> </div> </figure> <h2 id='potential-issues' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#potential-issues' aria-label='Anchor'></a><span class='plain-code'>Potential issues</span></h2> <p>Over time, we&rsquo;ve noted a number of issues.</p> <ul> <li>Alpine uses <a href='https://musl.libc.org/' title=''>musl</a> as its C runtime library. Debian uses <a href='https://www.gnu.org/software/libc/' title=''>glibc</a>. Software tested on glibc may not work as expected on musl. There are other potential compatibility issues as well, such as <a href='https://bell-sw.com/blog/how-to-deal-with-alpine-dns-issues/' title=''>DNS</a>. </li><li>Debian includes both <code>adduser</code> and <code>useradd</code>. Alpine, by default, only includes <code>adduser</code>. This can be addressed by installing a package like <a href='https://pkgs.alpinelinux.org/package/edge/community/armv7/shadow' title=''>shadow</a>, or by switching to <code>adduser</code>. </li><li>Packages like <a href='https://github.com/nodenv/node-build' title=''>node-build</a> require <code>bash</code>, which isn&rsquo;t included by default. 
Adding it back in allows <code>node-build</code> to run to completion, but the end result is that a precompiled Debian executable is installed that won&rsquo;t run on Alpine. An alternative is to download an <a href='https://unofficial-builds.nodejs.org/' title=''>unofficial build</a>. </li><li>New Alpine releases may not get the same level of testing as Debian, resulting in problems like <a href='https://github.com/sparklemotion/sqlite3-ruby/issues/434' title=''>sqlite3-ruby not working on Alpine 3.19</a>. In cases like this, stay on the previous version of Alpine for a short while, or compile the gem yourself. These issues are temporary. </li><li>Some packages, like Chrome, are not available for Alpine. Alternatives like Chromium may be necessary. </li></ul> <h2 id='conclusion' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#conclusion' aria-label='Anchor'></a><span class='plain-code'>Conclusion</span></h2> <p>While the Alpine community is not as large as Debian&rsquo;s, there is a substantial number of happy Alpine users.</p> <p>For the foreseeable future, the default for both the frameworks and fly.io will remain Debian, but we make it easy to switch.</p> <p>Try it out! Hopefully this post has provided insight into what you should evaluate before you switch.</p> </content> </entry> <entry> <title>Introducing Fly Kubernetes</title> <link rel="alternate" href="https://fly.io/blog/fks/"/> <id>https://fly.io/blog/fks/</id> <published>2023-12-18T00:00:00+00:00</published> <updated>2024-04-12T18:23:39+00:00</updated> <media:thumbnail url="https://fly.io/blog/fks/assets/fks-thumb.webp"/> <content type="html"><div class="lead"><p>We’re Fly.io, and if you’ve been following us awhile you probably just did a double-take. 
We’re building a new public cloud that runs containerized applications with virtual machine isolation on our own hardware around the world. And we’ve been doing it without any K8s. Until now!</p> </div><div class="callout"><p><strong class="font-semibold text-navy-950">Update, March 2024:</strong> FKS does more stuff now, and you can read about it in <a href="https://fly.io/blog/fks-beta-live/" title="">Fly Kubernetes does more now</a></p> </div> <p>We&rsquo;ll own it: we&rsquo;ve been snarky about Kubernetes. We are, at heart, old-school Unix nerds. We&rsquo;re still scandalized by <code>systemd</code>.</p> <p>To make matters more complicated, the problems we&rsquo;re working on <a href='https://fly.io/blog/carving-the-scheduler-out-of-our-orchestrator/' title=''>have a lot of overlap with K8s</a>, but <a href='https://fly.io/blog/carving-the-scheduler-out-of-our-orchestrator/#numad' title=''>just enough impedance mismatch</a> that it (<a href='https://www.nomadproject.io/' title=''>or anything that looks like it</a>) is a bad fit for our own platform.</p> <p>But, come on: you never took us too seriously about K8s, right? K8s is hard for us to use, but that doesn&rsquo;t mean it&rsquo;s not a great fit for what you&rsquo;re building. We&rsquo;ve been clear about that all along, right? Sure we have!</p> <p>Well, good news, everybody! 
If K8s is important for your project, and that&rsquo;s all that&rsquo;s been holding you back from <a href='https://fly.io/docs/speedrun/' title=''>trying out Fly.io</a>, we&rsquo;ve spent the past several months building something for you.</p> <h2 id='fly-io-for-kubernetians' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#fly-io-for-kubernetians' aria-label='Anchor'></a><span class='plain-code'>Fly.io For Kubernetians</span></h2> <p>Fly.io works by transmogrifying Docker containers into filesystems for <a href='https://firecracker-microvm.github.io/' title=''>lightweight hypervisors</a>, and running them on servers we rack in dozens of regions around the world.</p> <p>You can build something like Fly.io with &ldquo;standard&rdquo; orchestration tools like K8s. In fact, that&rsquo;s what we did to start, too. To keep things simple, we used Nomad, and instead of K8s CNIs, we built our own Rust-based TLS-terminating Anycast proxy (and designed a WireGuard/IPv6-based private network system <a href='https://fly.io/blog/bpf-xdp-packet-filters-and-udp/' title=''>based on eBPF</a>). But the ideas are the same.</p> <p>The way we look at it, the signature feature of a &ldquo;standard&rdquo; orchestrator is the global scheduler: the global eye in the sky that keeps track of vacancies on servers and optimizes the placement of new workloads. That&rsquo;s the problem we ran into. We&rsquo;re running over 200,000 applications, and we&rsquo;re doing so on every continent except Antarctica. 
The speed of light (and a globally distributed network of backhoes) has something to say about keeping a perfectly consistent global picture of hundreds of thousands of applications, and it&rsquo;s not pleasant.</p> <p>The other problem we ran into is that our Nomad scheduler kept trying to outsmart us, and, worse, our customers. It turns out that our users have pretty firm ideas of where they&rsquo;d like their apps to run. If they ask for São Paulo, they want São Paulo, not Rio. But global schedulers have other priorities, like optimally bin-packing resources, and sometimes <code>GIG</code> looks just as good as <code>GRU</code> to them.</p> <p>To escape the scaling and DX problems we were hitting, we rethought orchestration. Where orchestrators like K8s tend to work through distributed consensus, we keep state local to workers. Each racked server in our fleet is a source of truth about the apps running on it, and provides an API to a market-style &ldquo;scheduler&rdquo; that bids on resources in regions. <a href='https://fly.io/blog/carving-the-scheduler-out-of-our-orchestrator/#numad' title=''>You can read more about it here, if you&rsquo;re interested.</a> We call this system the <a href='https://fly.io/docs/machines/' title=''>Fly Machines API.</a></p> <p>An important detail to grok about how this all works – a reason we haven&rsquo;t, like, beaten the CAP theorem by doing this – is that Fly Machines API calls can fail. If Nomad or K8s tries to place a workload on some server, only to find out that it&rsquo;s filled up or thrown a rod, it will go hunt around for some other place to put it, like a good little robot. The Machines API won&rsquo;t do this. It&rsquo;ll just fail the request. 
In fact, it goes out of its way to fail the request quickly, to deliver feedback; if we can&rsquo;t schedule work in <code>JNB</code> right now, you might want instead to quickly deploy to <code>BOM</code>.</p> <h2 id='pluggable-orchestration-and-fks' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#pluggable-orchestration-and-fks' aria-label='Anchor'></a><span class='plain-code'>Pluggable Orchestration and FKS</span></h2> <p>In a real sense, what we&rsquo;ve done here is extract a chunk of the scheduling problem out of our orchestrator and hand it off to other components. For most of our users, that component is <a href='https://github.com/superfly/flyctl' title=''><code>flyctl</code>, our intrepid CLI</a>.</p> <p>But <a href='https://fly.io/docs/machines/working-with-machines/' title=''>Fly Machines is an API</a>, and anything can drive it. A lot of our users want quick answers to requests to schedule apps in specific regions, and <code>flyctl</code> does a fine job of that. But it&rsquo;s totally reasonable to want something that works more like the good little robots inside of K8s.</p> <p>You can build your own orchestrator with our API, but if what you&rsquo;re looking for is literally Kubernetes, we&rsquo;ve saved you the trouble. It&rsquo;s called Fly Kubernetes, or FKS for short.</p> <p>FKS is an implementation of Kubernetes that runs on top of Fly.io. You start it up using <code>flyctl</code>, by running <code>flyctl ext k8s create</code>.</p> <p>Under the hood, FKS is a straightforward combination of two well-known Kubernetes projects: <a href='https://k3s.io/' title=''>K3s, the lightweight CNCF-certified K8s distro</a>, and <a href='https://virtual-kubelet.io/' title=''>Virtual Kubelet</a>.</p> <p>Virtual Kubelet is interesting. 
In K8s-land, a <code>kubelet</code> is a host agent; it&rsquo;s the thing that runs on every server in your fleet that knows how to run a K8s Pod. Virtual Kubelet isn&rsquo;t a host agent; it&rsquo;s a software component that pretends to be a host, registering itself with K8s as if it were one, but then sneakily proxying the Kubelet API elsewhere.</p> <p>In FKS, &ldquo;elsewhere&rdquo; is <a href='https://fly.io/docs/machines/' title=''>Fly Machines</a>. All we have to do is satisfy the various APIs that Virtual Kubelet exposes. For example, the API for the lifecycle of a pod:</p> <div class="highlight-wrapper group relative "> <div class='highlight relative group'> <pre class='highlight '><code id="code-did7dsc1">type PodLifecycleHandler interface {
    CreatePod(ctx context.Context, pod *corev1.Pod) error
    UpdatePod(ctx context.Context, pod *corev1.Pod) error
    DeletePod(ctx context.Context, pod *corev1.Pod) error
    GetPod(ctx context.Context, namespace, name string) (*corev1.Pod, error)
    GetPodStatus(ctx context.Context, namespace, name string) (*corev1.PodStatus, error)
    GetPods(context.Context) ([]*corev1.Pod, error)
}
</code></pre> </div> </div> <p>This interface is easy to map to the Fly Machines API. 
For example:</p> <div class="highlight-wrapper group relative "> <div class='highlight relative group'> <pre class='highlight '><code id="code-hv82buwy">CreatePod -&gt; POST /apps/{app_name}/machines
UpdatePod -&gt; POST /apps/{app_name}/machines/{machine_id}
</code></pre> </div> </div> <p>K3s, meanwhile, is a stripped-down implementation of all of K8s that fits into a single binary. K3s does a bunch of clever things to be as streamlined as it is, but the most notable of them is <a href='https://github.com/k3s-io/kine' title=''>kine, an API shim that swaps <code>etcd</code> out for databases like SQLite</a>. Because of <code>kine</code>, K3s can manage multiple servers, but also gracefully runs on a single server, without distributed state.</p> <p>So that&rsquo;s what we do. When you create a cluster, we run K3s and the Virtual Kubelet on a single Fly Machine. We compile a <a href='https://kubernetes.io/docs/concepts/configuration/organize-cluster-access-kubeconfig/' title=''>kubeconfig</a>, with which you can talk to your K3s via <code>kubectl</code>. We set the whole thing up to run Pods on individual Fly Machines, so your cluster scales out directly using our platform, but with K8s tooling.</p> <p>One thing we like about this design is how much of the lifting is already done for us by the underlying platform. If you&rsquo;re a K8s person, take a second to think of all the different components you&rsquo;re dealing with: <a href='https://etcd.io/' title=''>etcd</a>, specifically provisioned nodes, the <a href='https://kubernetes.io/docs/reference/command-line-tools-reference/kube-proxy/' title=''>kube-proxy</a>, <a href='https://github.com/flannel-io/flannel' title=''>a CNI</a> binary and configuration and its integration with the host network, containerd, registries. But Fly.io already does most of those things. 
So this project was mostly chipping away components until we found the bare minimum: CoreDNS, SQLite persistence, and Virtual Kubelet.</p> <p>We ended up with something significantly simpler than K3s, which is saying something.</p> <p>Fly Kubernetes has some advantages over plain <code>flyctl</code> and <code>fly.toml</code>:</p> <ul> <li>Your deployment is more declarative than it is with the <code>fly.toml</code> file. You declare the exact state of everything down to replica counts, autoscaling rules, volume definitions, and more. </li><li>When you deploy with Fly Kubernetes, Kubernetes will automatically make your definitions match the state of the world. Machines go down? Kubernetes will whack them back online. </li></ul> <p>This is a different way to do orchestration and scheduling on Fly.io. It&rsquo;s not what everyone is going to want. But if you want it, you really want it, and we&rsquo;re psyched to give it to you: Fly.io&rsquo;s platform features, with Kubernetes handling configuration and driving your system to its desired state.</p> <p>We&rsquo;ve kept things simple to start with. There are K8s use cases we&rsquo;re a strong fit for today, and others we&rsquo;ll get better at in the near future, as K8s users drive the underlying platform (and particularly our proxy) forward.</p> <p><strong class='font-semibold text-navy-950'>Interested in getting early access? Email us at <a href="mailto:[email protected]">[email protected]</a> and we&rsquo;ll hook you up.</strong></p> <figure class="post-cta"> <figcaption> <h1>Not invested in K8s?</h1> <p>Nothing has to change for you! 
You can deploy apps on Fly.io today, in a matter of minutes, without talking to Sales.</p> <a class="btn btn-lg" href="https://fly.io/docs/speedrun/"> Deploy an app in minutes.<span class='opacity-50'>→</span> </a> </figcaption> <div class="image-container"> <img src="/static/images/cta-cat.webp" srcset="/static/images/[email protected] 2x" alt=""> </div> </figure> <div class="youtube-container" data-exclude-render> <div class="youtube-video"> <iframe width="100%" height="100%" src="https://www.youtube.com/embed/A3vFfZvUiwo" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen> </iframe> </div> </div> <h2 id='what-it-all-means' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#what-it-all-means' aria-label='Anchor'></a><span class='plain-code'>What It All Means</span></h2> <p>One obvious thing it means is that if you&rsquo;ve got an investment in Kubernetes tooling, you can keep it while running things on top of Fly.io. So that&rsquo;s pretty neat. Buy our cereal!</p> <p>But the computer science story is interesting, too. We placed a bet on an idiosyncratic strategy for doing global orchestration. We replaced global consensus, which is how Borg, Kubernetes, and Nomad all work, with a market-based system. That system was faster and, importantly, dumber than the consensus system it replaced.</p> <p>This had costs! Nomad&rsquo;s global consensus would do truly heroic amounts of work to make sure Fly Apps got scheduled somewhere, anywhere. Like a good capitalist, Fly Machines will tell you in no uncertain terms how much work it&rsquo;s willing to do for you (&ldquo;less than a Nomad&rdquo;).</p> <p>But that doesn&rsquo;t mean you&rsquo;re stuck with the answers Fly Machines gives by itself. 
Because Fly Machines is so simple, and tries so hard to be predictable, we hoped you&rsquo;d be able to build more sophisticated scheduling and orchestration schemes on top of it. And here you go: Kubernetes scheduling, as a plugin to the platform.</p> <p>More to come! We&rsquo;re itching to see just how many different ways this bet might pay off. Or: we&rsquo;ll perish in flames! Either way, it&rsquo;ll be fun to watch.</p> </content> </entry> <entry> <title>Fly.io has GPUs now</title> <link rel="alternate" href="https://fly.io/blog/fly-io-has-gpus-now/"/> <id>https://fly.io/blog/fly-io-has-gpus-now/</id> <published>2023-12-13T00:00:00+00:00</published> <updated>2024-04-12T18:23:39+00:00</updated> <media:thumbnail url="https://fly.io/blog/fly-io-has-gpus-now/assets/llama-portal-thumb.webp"/> <content type="html"><div class="lead"><p>We’re Fly.io, we’re a new public cloud that lets you put your compute where it matters: near your users. Today we’re announcing that you can do this with GPUs too, allowing you to do AI workloads on the edge. Want to find out more? Keep reading.</p> </div><h2 id='ai-is-pretty-fly' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#ai-is-pretty-fly' aria-label='Anchor'></a><span class='plain-code'>AI is pretty fly</span></h2> <p>AI is apparently a bit of a <em>thing</em> (maybe even <em>an thing</em> come to think about it). We&rsquo;ve seen entire industries get transformed in the wake of ChatGPT existing (somehow it&rsquo;s only been around for a year, I can&rsquo;t believe it either). It&rsquo;s likely to leave a huge impact on society as a whole in the same way that the Internet did once we got search engines. 
Like any good venture-capital funded infrastructure provider, we want to enable you to do hilarious things with AI using industrial-grade muscle.</p> <p>Fly.io lets you run a full-stack app&mdash;or an entire dev platform based on the <a href='https://fly.io/docs/machines/' title=''>Fly Machines API</a>&mdash;close to your users. Fly.io GPUs let you attach an <a href='https://www.nvidia.com/en-us/data-center/a100/' title=''>Nvidia A100</a> to whatever you&rsquo;re building, harnessing the full power of CUDA with more VRAM than your local 4090 can shake a ray-traced stick at. With these cards (or whatever you call a GPU attached to SXM fabric), AI/ML workloads are at your fingertips. You can <a href='https://fly.io/blog/transcribing-on-fly-gpu-machines/' title=''>recognize speech</a>, segment text, summarize articles, synthesize images, and more at speeds that would make your homelab blush. You can even set one up as your programming companion with <a href='https://github.com/deepseek-ai/DeepSeek-Coder' title=''>your model of choice</a> in case you&rsquo;ve just not been feeling it with the output of <em>other</em> models changing over time.</p> <p>If you want to find out more about what these cards are and what using them is like, check out <a href='https://fly.io/blog/what-are-these-gpus-really/' title=''>What are these &ldquo;GPUs&rdquo; really?</a> It covers the history of GPUs and why it&rsquo;s ironic that the cards we offer are called &ldquo;Graphics Processing Units&rdquo; in the first place.</p> <h2 id='fly-io-gpus-in-action' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#fly-io-gpus-in-action' aria-label='Anchor'></a><span class='plain-code'>Fly.io GPUs in Action</span></h2> <p>We want you to deploy your own code with your favorite models on top of 
Fly.io&rsquo;s cloud backbone. Fly.io GPUs make this really easy.</p> <p>You can get a GPU app running <a href='https://ollama.ai' title=''>Ollama</a> (our friends in text generation) in two steps:</p> <ol> <li><p>Put this in your <code>fly.toml</code>:</p> <div class="highlight-wrapper group relative toml"> <div class='highlight relative group'> <pre class='highlight '><code id="code-l8a9wi1z"><span class="py">app</span> <span class="p">=</span> <span class="s">"sandwich_ai"</span>
<span class="py">primary_region</span> <span class="p">=</span> <span class="s">"ord"</span>
<span class="py">vm.size</span> <span class="p">=</span> <span class="s">"a100-40gb"</span>

<span class="nn">[build]</span>
<span class="py">image</span> <span class="p">=</span> <span class="s">"ollama/ollama"</span>

<span class="nn">[mounts]</span>
<span class="py">source</span> <span class="p">=</span> <span class="s">"models"</span>
<span class="py">destination</span> <span class="p">=</span> <span class="s">"/root/.ollama"</span>
<span class="py">initial_size</span> <span class="p">=</span> <span class="s">"100gb"</span>
</code></pre> </div> </div></li><li><p>Run <code>fly apps create sandwich_ai &amp;&amp; fly deploy</code>.</p> </li></ol> <p>If you want to read more about how to start your new sandwich empire, check out <a href='https://fly.io/blog/scaling-llm-ollama/' title=''>Scaling Large Language Models to zero with Ollama</a>; it explains how to set up Ollama so that it <em>automatically scales itself down</em> when it&rsquo;s not in use.</p> <h2 id='the-speed-of-light-is-only-so-fast' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#the-speed-of-light-is-only-so-fast' aria-label='Anchor'></a><span class='plain-code'>The speed of light is only so fast</span></h2> <p>Being able to spin up GPUs is great, but where Fly.io really shines is inference at the edge.</p> <p>Let&rsquo;s say you have an app that lets users enter ingredients they have in 
their kitchen and receive a sandwich recipe. Your users expect their recipes <em>instantly</em> (or at least as fast as the other leading apps). Seconds count when you need an emergency sandwich.</p> <div class="left-sidenote"><p><br> <br> <br> It’s depressingly customary in the AI industry to cherry-pick outputs. This was not cherry-picked. I used <a href="https://ollama.ai/library/yi:34b" title="">yi:34b</a> to generate this recipe. I’m not sure what a taco salad sandwich is, but I might be willing to try it.</p> </div> <p><img alt="A conversation between a user and an artificial intelligence. The user asks: &quot;What kind of sandwich can I make with an egg, a taco shell, and a pound of ground beef?&quot; The AI responds: &quot; You can make a breakfast taco or a taco salad sandwich with the ingredients you have. Here&#39;s how to prepare each: for the breakfast taco. Cook the pound of ground beef in a pan until browned. Season with salt, pepper, and any other desired spices like chili powder or taco seasoning. While the beef is cooking, fry an egg (or scramble it if you prefer) in another pan. Warm up the taco shell in the oven or microwave according to the package instructions. Assemble the taco by placing the ground cooked beef in the taco shell followed by the fried egg. You can also add any additional toppings like diced onions avocados, avocado, salsa or cheese if you have them. Serve with a side of hot sauce or sour cream if desired. For the taco salad sandwich: Prepare the ground beef as in the breakfast taco recipe above. Instead of frying an egg, you can cook it sunny side up or over easy depending on your preference. Crumble the taco shell into small pieces and mix them with the cooked ground beef. Assemble the sandwich by layering the crumbled taco shell mixture and ground beef in two slices of bread. Top with the fried egg and any additional toppings you like such as shredded lettuce, diced tomatoes, or sliced olives. 
Serve with a side of salsa or ranch dressing if desired. Both options are hardy and can be easily customized based on your taste preferences. Enjoy!&quot;" src="/blog/fly-io-has-gpus-now/assets/sandwich-recipes.webp" /></p> <p>In the previous snippet, we deployed our app to <code>ord</code> (<code>primary_region = &quot;ord&quot;</code>). The good news is that our model returns a result really quickly and users in Chicago get instant sandwich recipes. It&rsquo;s a good experience for users near your datacentre, and you can do this on any half-decent cloud provider.</p> <p>But surely people outside of Chicago need sandwiches too. Amsterdam has sandwich fiends as well. And sometimes it takes too long to have their requests leap across the pond. The speed of light is only so fast, after all. Don&rsquo;t worry, we&rsquo;ve got your back. Fly.io has GPUs in datacentres all over the world. Even more, we&rsquo;ll let you run <em>the same program</em> with the same public IP address and the same TLS certificates in any region with GPU support.</p> <p>Don&rsquo;t believe us? 
See how you can scale your app up in Amsterdam with one command:</p> <div class="highlight-wrapper group relative "> <div class='highlight relative group'> <pre 
class='highlight '><code id="code-404ps1ts">fly scale count 2 --region ams </code></pre> </div> </div> <p>It&rsquo;s that easy.</p> <h2 id='actually-on-demand' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#actually-on-demand' aria-label='Anchor'></a><span class='plain-code'>Actually On-Demand</span></h2> <p>GPUs are powerful parallel processing packages, but they&rsquo;re not cheap! Once we have enough people wanting to turn their fridge contents into tasty sandwiches, keeping a GPU or two running makes sense. But we&rsquo;re just a small app still growing our user base while also funding the latest large sandwich model research. We want to only pay for GPUs when a user makes a request.</p> <p>Let&rsquo;s open up that <code>fly.toml</code> again, and add a section called <code>services</code>, and we&rsquo;ll include instructions on how we want our app to scale up and down:</p> <div class="highlight-wrapper group relative toml"> <button type="button" class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-wrap-target="#code-cfo4p0z3" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"><g buffered-rendering="static"><path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /><path d="M11.081 6.466L9.533 8.037l1.548 1.571" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"> Wrap text 
</span> </button> <button type="button" class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-copy-target="sibling" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"><g buffered-rendering="static"><path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /><path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"> Copy to clipboard </span> </button> <div class='highlight relative group'> <pre class='highlight '><code id="code-cfo4p0z3"><span class="nn">[[services]]</span> <span class="py">internal_port</span> <span class="p">=</span> <span class="mi">8080</span> <span class="py">protocol</span> <span class="p">=</span> <span class="s">"tcp"</span> <span class="py">auto_stop_machines</span> <span class="p">=</span> <span class="kc">true</span> <span class="py">auto_start_machines</span> <span class="p">=</span> <span class="kc">true</span> <span class="py">min_machines_running</span> <span class="p">=</span> <span class="mi">0</span> </code></pre> </div> </div> <p>Now when no one needs sandwich recipes, you don&rsquo;t pay for GPU time.</p> <h2 id='the-deets' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#the-deets' aria-label='Anchor'></a><span 
class='plain-code'>The Deets</span></h2> <p>We have GPUs ready to use in several US and EU regions and Sydney. You can deploy your sandwich, music generation, or AI illustration apps to:</p> <ul> <li><a href='https://www.nvidia.com/en-us/data-center/a100/' title=''>Ampere A100s</a> with 40gb of RAM for $2.50/hr </li><li><a href='https://www.nvidia.com/en-us/data-center/a100/' title=''>Ampere A100s</a> with 80gb of RAM for $3.50/hr </li><li><a href='https://resources.nvidia.com/en-us-l40s/l40s-datasheet-28413' title=''>Lovelace L40s</a> are coming soon (update: now here!) for $2.50/hr </li></ul> <p>By default, anything you deploy to GPUs will use eight heckin&rsquo; <a href='https://www.amd.com/en/processors/epyc-server-cpu-family' title=''>AMD EPYC</a> CPU cores, and you can attach volumes up to 500 gigabytes. We&rsquo;ll even give you discounts for reserved instances and dedicated hosts if you ask nicely.</p> <p>We hope you have fun with these new cards and we&rsquo;d love to see what you can do with them! Reach out to us on X (formerly Twitter) or <a href='https://community.fly.io/' title=''>the community forum</a> and share what you&rsquo;ve been up to. We&rsquo;d love to see what we can make easier!</p> </content> </entry> <entry> <title>What are these "GPUs" really?</title> <link rel="alternate" href="https://fly.io/blog/what-are-these-gpus-really/"/> <id>https://fly.io/blog/what-are-these-gpus-really/</id> <published>2023-12-11T00:00:00+00:00</published> <updated>2024-04-12T18:23:39+00:00</updated> <media:thumbnail url="https://fly.io/blog/what-are-these-gpus-really/assets/gpu-songstress-thumb.webp"/> <content type="html"><div class="lead"><p>Fly.io runs containerized apps with virtual machine isolation on our own hardware around the world, so you can safely run your code close to where your users are. 
We’re in the process of rolling out GPU support, and that’s what this post is about, but you don’t have to wait for that to try us out: <a href="https://fly.io/docs/speedrun/" title="">your app can be up and running on us in minutes</a>.</p> </div> <p>GPU hardware will let our users run all sorts of fun Artificial Intelligence and Machine Learning (AI/ML) workloads near their users. But, what are these &ldquo;GPUs&rdquo; really? What can they do? What <em>can&rsquo;t</em> they do?</p> <p>Listen here for my tale of woe as I spell out exactly what these cards are, are not, and what you can do with them. By the end of this magical journey, you should understand the true irony of them being called &ldquo;Graphics Processing Units&rdquo; and why every marketing term is always bad forever.</p> <h2 id='how-does-computer-formed' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#how-does-computer-formed' aria-label='Anchor'></a><span class='plain-code'>How does computer formed?</span></h2> <p>In the early days of computing, your computer generally had a few basic components:</p> <ul> <li>The CPU </li><li>Input device and assorted peripherals (keyboard, etc) </li><li>Output device (monitor, printer, etc) </li><li>Memory </li><li>Glue logic chips </li><li>Video rendering hardware </li></ul> <p>Taking the Commodore 64 as an example, it had a CPU, a chip to handle video output, a chip to handle audio output, and a chip to glue everything together. The CPU would read instructions from the RAM and then execute them to do things like draw to the screen, solve sudoku puzzles, play sounds, and so on.</p> <p>However, even though the CPU by itself was fast by the standards of the time, it could only do a million clock cycles per second or so. 
Imagine a very small shouting crystal vibrating millions of times per second triggering the CPU to do one part of a task and you&rsquo;ll get the idea. This is fast, but not fast enough when executing instructions can take longer than a single clock cycle and when your video output device needs to be updated 60 times per second.</p> <p>The main way they optimized this was by shunting a lot of the video output tasks to a bespoke device called the VIC-II (Video Interface Chip, version 2). This allowed the Commodore 64 to send a bunch of instructions to the VIC-II and then let it do its thing while the CPU was off doing other things. This is called &ldquo;offloading&rdquo;.</p> <p><img src="/blog/what-are-these-gpus-really/assets/./deus-ex-machina-cover.webp" /></p> <p>As technology advanced, the desire to do bigger and better things with both contemporary and future hardware increased. This came to a head when this little studio nobody had ever heard of called id Software released one of the most popular games of all time: DOOM.</p> <p>Now, even though DOOM was a huge advancement in gaming technology, it was still incredibly limited by the hardware of the time. It was actually a 2D game that used a lot of tricks to make it look (and feel) like it was 3D. It was also limited to a resolution of 320x200 and a hard cap of 35 frames per second. This was fine for the time (most movies were only at 24 frames per second), but it was clear that there was a lot of room for improvement.</p> <p>One of the main things that DOOM did was to use a pair of techniques to draw the world at near real-time. It used a combination of &ldquo;raycasting&rdquo; and binary-space partitioning to draw the world. This basically means that they drew a bunch of imaginary lines to where points in the map would be to figure out what color everything would be and then eliminated the parts of the map that were behind walls and other objects. 
This is a very simplified explanation, and if you want to know more, <a href='https://fabiensanglard.net/doomIphone/doomClassicRenderer.php' title=''>Fabien Sanglard explains the rendering</a> of DOOM in more detail.</p> <h2 id='the-dream-of-3d' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#the-dream-of-3d' aria-label='Anchor'></a><span class='plain-code'>The dream of 3D</span></h2> <p>However, a lot of this was logic that ran very slowly on the CPU, and while the CPU was doing the display logic, it couldn&rsquo;t do anything else, such as enemy AI or playing sounds. Hence the idea of a &ldquo;3D accelerator card&rdquo;. The idea: offload the 3D rendering logic to a separate device that could do it much faster than the CPU could, and free the CPU to do other things like AI, sound, and so on.</p> <p>This was the dream, but it was a long way off. Then Quake happened.</p> <div class="right-sidenote"><p>Really, Half-Life is based on Quake so much that the pattern for <a href="https://www.pcgamer.com/half-life-alyxs-lights-flicker-just-like-they-did-in-quake-almost-25-years-later/" title="">blinking lights</a> has carried forward 25 years later to Half-Life: Alyx in VR. If it ain’t broke, don’t fix it.</p> </div> <p>Unlike Doom, Quake was fully 3D on unmodified consumer hardware. Players could look up and down (something previously thought impossible without accelerator hardware!) and designers could make levels with that in mind. Quake also allowed much more complex geometry and textures. It was a huge leap forward in 3D gaming and it was only possible because of the massive leap in CPU power at the time. The Pentium family of processors was such a huge leap that it allowed them to bust through and do it in &ldquo;real time&rdquo;. 
Quake has since set the standard for multiplayer deathmatch games, and its source code has lineage to Call of Duty, Half-Life, Half-Life 2, DotA 2, Titanfall, and Apex Legends.</p> <p>However, the thing that really made 3D accelerator cards leap into the public spotlight was another little-known studio called Core Design and their 1996 release of Tomb Raider. It was built from the ground up to require the use of 3D accelerator cards. The cards flew off the shelves.</p> <p>&ldquo;3D accelerator cards&rdquo; would later become known as &ldquo;Graphics Processing Units&rdquo; or GPUs because of how synonymous they became with 3D gaming, engineering tasks such as Computer-Aided Drafting (CAD), and even the entire OS environment with compositors like <a href='https://en.wikipedia.org/wiki/Desktop_Window_Manager' title=''>DWM</a> on Windows Vista, <a href='https://en.wikipedia.org/wiki/Compiz' title=''>Compiz</a> on GNU+Linux, and <a href='https://en.wikipedia.org/wiki/Quartz_(graphics_layer)' title=''>Quartz</a> on macOS. Things became so much easier for everyone when 2D and 3D graphics were integrated into the same device so you didn&rsquo;t need to chain your output through your 3D accelerator card!</p> <h2 id='the-gpu-as-we-know-it' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#the-gpu-as-we-know-it' aria-label='Anchor'></a><span class='plain-code'>The GPU as we know it</span></h2> <p>When GPUs first came out, they were very simple devices. 
They had a few basic components:</p> <ul> <li>A framebuffer to store the current state of the screen </li><li>A command processor to take instructions from the game and translate them into something the hardware can understand </li><li>Memory to store temporary data </li><li>Shader processing hardware to allow designers to change how light and textures were rendered </li><li>A display output that was chained through an existing VGA card so that the user could see what was going on in real time (yes, this is something we actually did) </li></ul> <p>This basic architecture has remained the same for the past 20 years or so. The main difference is that as technology advanced, the capabilities of those cards increased. They got faster, more parallel, more capable, had more memory, were made cheaper, and so on. This gradually allowed for more and more complex games like Half-Life 2, Crysis, The Legend of Zelda: Breath of the Wild, Baldur&rsquo;s Gate 3, and so on.</p> <p>Over time, as more and more hardware was added, GPUs became computers in their own right (sometimes even bigger than the rest of the computer thanks to the need to cool things more aggressively). 
This new hardware includes:</p> <ul> <li>Video encoding hardware via NVENC and AMD VCE so that content creators can stream and record their gameplay in higher quality without having to impact the performance of the game </li><li><aside class="left-sidenote">Seriously, once you experience high framerate HDR raytraced Tetris you can&rsquo;t really go back to the old way.</aside> Raytracing accelerator cores via RTX so that light can be rendered more realistically </li><li>AI/ML cores to allow for dynamic upscaling to eke out more performance from the card </li><li>Display output hardware to allow for multiple monitors to be connected to the card </li><li>Faster and faster memory buses and interfaces to the rest of the system to allow for more data to be processed faster </li><li>Direct streaming from the drive to GPU memory to allow for faster loading times </li></ul> <p>But, at the same time, that AI/ML hardware started to get noticed by more and more people. It was discovered that the shader cores and then the CUDA cores could be used to do AI/ML workloads at ludicrous speeds. This enabled research and development of models like GPT-2, Stable Diffusion, DLSS, and so on. This has led to a Cambrian Explosion of AI/ML research and development that is continuing to this day.</p> <h2 id='the-quot-gpus-quot-that-fly-io-is-using' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#the-quot-gpus-quot-that-fly-io-is-using' aria-label='Anchor'></a><span class='plain-code'>The &ldquo;GPUs&rdquo; that Fly.io is using</span></h2> <p>I&rsquo;ve mostly been describing consumer GPUs and their capabilities up to this point because that&rsquo;s what we all have the biggest understanding of. 
There is a huge difference between the &ldquo;GPUs&rdquo; that you can get for server tasks and the ones you can buy for normal consumer tasks from a place like Newegg or Best Buy. The main difference is that enterprise-grade Graphics Processing Units do not have any of the hardware needed to process graphics.</p> <div class="right-sidenote"><p>Author’s note: This will not be the case in the future. Fly.io is going to add <a href="https://www.nvidia.com/en-us/data-center/l40s/" title="">Lovelace L40S GPUs</a> that do have 3D rendering, video encoding, shader cores, and so on. But, that’s not what we’re talking about today.</p> </div> <p>Yes. Really. They don&rsquo;t have rasterization hardware, shader cores, display outputs, or anything useful for trying to run games on them. They are AI/ML accelerator cards more than anything. It&rsquo;s kinda beautifully ironic that they&rsquo;re called Graphics Processing Units when they have no ability to process graphics.</p> <h2 id='what-can-you-do-with-them' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#what-can-you-do-with-them' aria-label='Anchor'></a><span class='plain-code'>What can you do with them?</span></h2> <p>These GPUs are really good at massively parallel tasks. This naturally translates to being very good at AI/ML tasks such as:</p> <ul> <li>Summarization (what is this article about in a few sentences?) </li><li>Translation (what does this article say in Spanish?) </li><li>Speech recognition (what is a voice clip saying?) </li><li>Speech synthesis (what does this text sound like?) </li><li>Text generation (what would a cat say if it could talk?) </li><li>Basic rote question and answering (what is the safe cooking temperature for chicken breasts in celsius?) </li><li>Text classification (is this article about cats or dogs?) 
</li><li>Sentiment analysis (is this article positive or negative, what could that mean about the companies involved?) </li><li>Image classification (is this a cat or a dog?) </li><li>Object detection (where are the cats and dogs in this image?) </li></ul> <p>Or any combination/chain of these tasks. A lot of this is pretty abstract building blocks that can be combined in a lot of different ways. This is why AI/ML stuff is so exciting right now. We&rsquo;re in the early days of understanding what these things are, what they can do, and how to use them properly.</p> <p>Imagine being able to load articles about the topic you are researching into your queries to find where someone said something roughly similar to what you&rsquo;re looking for. Queries like &ldquo;that one recipe with eggs that you fold over with ham in it&rdquo;. That&rsquo;s the kind of thing that&rsquo;s possible with AI/ML (and tools like vector databases) but difficult to impossible with traditional search engines.</p> <h2 id='how-to-use-ai-for-reals' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#how-to-use-ai-for-reals' aria-label='Anchor'></a><span class='plain-code'>How to use AI for reals</span></h2> <p>Fortunately and unfortunately, we&rsquo;re in the Cambrian Explosion days of this industry. Key advances happen constantly. Exact models and tooling change almost as often. This is both a very good thing and a very bad thing.</p> <p>If you want to get started today, here are a few models that you can play with right now:</p> <ul> <li><a href='https://ai.meta.com/llama/' title=''>Llama 2</a> - A generic foundation model with instruction and chat tuned variants. It&rsquo;s a good starting point for a lot of research and nearly everything else uses the same formats that Llama 2 does. 
</li><li><a href='https://openai.com/research/whisper' title=''>Whisper</a> - A speech to text model that transcribes audio files into text better than most professional dictation software. I, the author of this article, wrote most of this article using Whisper. </li><li><a href='https://huggingface.co/NurtureAI/OpenHermes-2.5-Mistral-7B-16k' title=''>OpenHermes-2.5 Mistral 7B 16k</a> - An instruction-tuned model that can operate on up to 16 thousand tokens (about 40 printed pages of text, 12,000 words) at once. It&rsquo;s a good starting point for summarization and other tasks that require a lot of context. I personally use it for my personal AI chatbot named <a href='https://xeiaso.net/characters/#Mimi' title=''>Mimi</a>. </li><li><aside class="right-sidenote">Seriously Annie, you&rsquo;re great!</aside> <a href='https://stability.ai/stable-diffusion' title=''>Stable Diffusion XL</a> - A text-to-image model that lets you create high quality images from simple text descriptions. It&rsquo;s a good starting point for tasks that require image generation, such as when you want to add images to your blog posts but don&rsquo;t have an artist like Annie to draw you what you want. </li></ul> <p>For a practical example, imagine that you have a set of <a href='https://xeiaso.net/talks/' title=''>conference talks that you&rsquo;ve given over the years</a>. You want to take those talk videos, extract the audio, and transform them into written text because some people learn better from text than video. 
The overall workflow would look something like this:</p> <ul> <li>Use ffmpeg to extract the audio track from the video files </li><li>Use Whisper to <a href='https://fly.io/blog/transcribing-on-fly-gpu-machines/' title=''>convert the audio files into subtitle files</a> </li><li>Break the subtitle file into sequences based on significant pauses between topics (humans do this subconsciously, take advantage of it and you can make things seem heckin&rsquo; magic) </li><li>Use a large language model to summarize the segments and create a title for each segment </li><li>Paste the rest of the text into a markdown document between the segment titles </li><li>Manually review the documents and make any necessary changes with technical terms that the model didn&rsquo;t know about or things the model got wrong because English is a minefield of homophones that even trained experts have trouble with (ask me how I know) </li><li>Publish the documents on your blog </li></ul> <p>Then bam, you don&rsquo;t just have a portfolio piece, you have the recipe for winning downtime from visitors of orange websites clicking on your link so much. You can also use this to create transcripts for your videos so that people who can&rsquo;t hear can still enjoy your content.</p> <p>The true advantage of these models comes not from using them as individual parts on their own, but from using them as a cohesive whole in a chain. This is where the real power of AI/ML comes from. It&rsquo;s not the individual models, but the ability to chain them together to do something useful. 
This is where the true opportunities for innovation lie.</p> <h2 id='conclusion' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#conclusion' aria-label='Anchor'></a><span class='plain-code'>Conclusion</span></h2> <p>So that&rsquo;s what these &ldquo;GPUs&rdquo; are really: they&rsquo;re AI/ML accelerator cards. The A100 cards are incapable of processing graphics or encoding video, but they&rsquo;re really, really good at AI/ML workloads. They allow you to do way more tasks per watt than any CPU ever could.</p> <p>I hope you enjoyed this tale of woe as I spilled out the horrible truths about marketing being awful forever and gave you ideas for how to <em>actually use</em> these graphics-free Graphics Processing Units to do useful things. But sadly, not for processing graphics unless you wait for the <a href='https://www.nvidia.com/en-us/data-center/l40s/' title=''>Lovelace L40S</a> cards early in 2024.</p> <p>Sign up for Fly.io today and try our GPUs! I can&rsquo;t wait to see what you build with them.</p> </content> </entry> <entry> <title>Scaling Large Language Models to zero with Ollama</title> <link rel="alternate" href="https://fly.io/blog/scaling-llm-ollama/"/> <id>https://fly.io/blog/scaling-llm-ollama/</id> <published>2023-12-06T12:00:00+00:00</published> <updated>2024-04-12T18:23:39+00:00</updated> <media:thumbnail url="https://fly.io/blog/scaling-llm-ollama/assets/thumb.webp"/> <content type="html"><div class="lead"><p>We’re Fly.io. We have powerful servers worldwide to run your code close to your users. Including GPUs so you can self host your own AI.</p> </div> <p>Open-source self-hosted AI tools have advanced a lot in the past 6 months. 
They give you new methods of expression (with QR code generation and Stable Diffusion), easy access to summarization powers that would have made Google blush a decade ago (even with untuned foundation models such as LLaMa 2 and Yi), conversational assistants that enable people to do more with their time, and speech recognition in <em>real time</em> on moderate hardware (with Whisper et al). With all these capabilities comes the need for more and more raw computational muscle to be able to do inference on bigger and bigger models, and eventually do things that we can&rsquo;t even imagine right now. Fly.io lets you put your compute where your users are so that you can do machine learning inference tasks on the edge with the power of enterprise-grade GPUs such as the Nvidia A100. You can also scale your GPU nodes to zero running Machines, so you only pay for what you actually need, when you need it.</p> <div class="right-sidenote"><p>It’s worth mentioning that “scaling to zero” doesn’t mean what you may think it means. When you “scale to zero” in Fly.io, you actually stop the running Machine. This means the Machine is still lying around on the same computer box that it runs on, but it’s just put to sleep. If there is a capacity issue then your app may be unable to wake back up. 
We are working on a solution to this, but for now you should be aware that scaling to zero is not the same as spinning down your Machine and spinning it back up again on a new computer box when you need it.</p> </div><div class="callout"><p>This is a continuation of the last post in this series about <a href="https://fly.io/blog/transcribing-on-fly-gpu-machines/" title="">how to use GPUs on Fly.io</a>.</p> </div><h2 id='why-scale-to-zero' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#why-scale-to-zero' aria-label='Anchor'></a><span class='plain-code'>Why scale to zero?</span></h2> <p>Running GPU nodes on top of Fly is expensive. Sure, GPUs enable you to do things a lot faster than CPUs ever could on their own, but you mostly will have things run idle between uses. This is where scaling to zero comes in. With scaling to zero, you can have your GPU nodes shut down when you&rsquo;re not using them. When your Machine stops, you aren&rsquo;t paying for the GPU any more. This is good for the environment and your wallet.</p> <p>In this post, we&rsquo;re going to be using <a href='https://ollama.ai' title=''>Ollama</a> to generate text. Ollama is a fancy wrapper around <a href='https://github.com/ggerganov/llama.cpp' title=''>llama.cpp</a> that allows you to run large language models on your own hardware with your choice of model. It also supports GPU acceleration, meaning that you can use Fly.io&rsquo;s huge GPUs to run your models faster than your RTX 3060 at home ever would on its own.</p> <p>One of the main downsides of using Ollama in a cloud environment is that it doesn&rsquo;t have authentication by default. Thanks to the power of about 70 lines of Go, we are able to shim that in after the fact. 
This will protect your server from random people on the internet using your GPU time (and spending your money) to generate text and integrate it into your own applications.</p> <p>Create a new folder called <code>ollama-scale-to-0</code>:</p> <div class="highlight-wrapper group relative bash"> <button type="button" class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-wrap-target="#code-hmfd22hk" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"><g buffered-rendering="static"><path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /><path d="M11.081 6.466L9.533 8.037l1.548 1.571" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"> Wrap text </span> </button> <button type="button" class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-copy-target="sibling" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"><g buffered-rendering="static"><path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /><path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /></g></svg> <span 
class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"> Copy to clipboard </span> </button> <div class='highlight relative group'> <pre class='highlight '><code id="code-hmfd22hk"><span class="nb">mkdir </span>ollama-scale-to-0 </code></pre> </div> </div><h2 id='fly-app-setup' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#fly-app-setup' aria-label='Anchor'></a><span class='plain-code'>Fly app setup</span></h2> <p>First, we need to create a new Fly app:</p> <div class="highlight-wrapper group relative bash"> <button type="button" class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-wrap-target="#code-tzghjjx5" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"><g buffered-rendering="static"><path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /><path d="M11.081 6.466L9.533 8.037l1.548 1.571" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"> Wrap text </span> </button> <button type="button" class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-copy-target="sibling" > <svg 
class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"><g buffered-rendering="static"><path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /><path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"> Copy to clipboard </span> </button> <div class='highlight relative group'> <pre class='highlight '><code id="code-tzghjjx5">fly launch <span class="nt">--no-deploy</span> </code></pre> </div> </div> <p>After selecting a name and an organization to run it in, this command will create the app and write out a <code>fly.toml</code> file for you:</p> <div class="highlight-wrapper group relative toml"> <button type="button" class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-wrap-target="#code-bfrjoo6m" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"><g buffered-rendering="static"><path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /><path d="M11.081 6.466L9.533 8.037l1.548 1.571" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"> Wrap text </span> </button> <button type="button" class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent 
group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-copy-target="sibling" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"><g buffered-rendering="static"><path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /><path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"> Copy to clipboard </span> </button> <div class='highlight relative group'> <pre class='highlight '><code id="code-bfrjoo6m"><span class="c"># fly.toml app configuration file generated for sparkling-violet-709 on 2023-11-14T12:13:53-05:00</span> <span class="c">#</span> <span class="c"># See https://fly.io/docs/reference/configuration/ for information about how to use this file.</span> <span class="c">#</span> <span class="py">app</span> <span class="p">=</span> <span class="s">"sparkling-violet-709"</span> <span class="py">primary_region</span> <span class="p">=</span> <span class="s">"ord"</span> <span class="nn">[http_service]</span> <span class="py">internal_port</span> <span class="p">=</span> <span class="mi">11434</span> <span class="c"># change me to 11434!</span> <span class="py">force_https</span> <span class="p">=</span> <span class="kc">false</span> <span class="c"># change me to false!</span> <span class="py">auto_stop_machines</span> <span class="p">=</span> <span class="kc">true</span> <span class="py">auto_start_machines</span> <span class="p">=</span> <span class="kc">true</span> <span class="py">min_machines_running</span> <span class="p">=</span> <span class="mi">0</span> <span class="py">processes</span> 
<span class="p">=</span> <span class="nn">["app"]</span> </code></pre> </div> </div> <p>This is the configuration file that Fly.io uses to know how to run your application. We&rsquo;re going to be modifying the <code>fly.toml</code> file to add some additional configuration to it, such as enabling GPU support:</p> <div class="highlight-wrapper group relative toml"> <button type="button" class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-wrap-target="#code-3lhl3358" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"><g buffered-rendering="static"><path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /><path d="M11.081 6.466L9.533 8.037l1.548 1.571" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"> Wrap text </span> </button> <button type="button" class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-copy-target="sibling" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"><g buffered-rendering="static"><path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /><path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 
0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"> Copy to clipboard </span> </button> <div class='highlight relative group'> <pre class='highlight '><code id="code-3lhl3358"><span class="py">app</span> <span class="p">=</span> <span class="s">"sparkling-violet-709"</span> <span class="py">primary_region</span> <span class="p">=</span> <span class="s">"ord"</span> <span class="py">vm.size</span> <span class="p">=</span> <span class="s">"a100-40gb"</span> <span class="c"># the GPU size, see https://fly.io/docs/gpus/gpu-quickstart/ for more info</span> </code></pre> </div> </div> <p>We don&rsquo;t want to expose the GPU to the internet, so we&rsquo;re going to create a <a href='https://fly.io/docs/reference/private-networking/#flycast-private-load-balancing' title=''>flycast</a> address to expose it to other services on your private network. To create a flycast address, run this command:</p> <div class="highlight-wrapper group relative bash"> <button type="button" class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-wrap-target="#code-bthlbecs" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"><g buffered-rendering="static"><path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /><path d="M11.081 6.466L9.533 8.037l1.548 1.571" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"> Wrap text </span> </button> <button type="button" class="bubble-wrap z-20 absolute 
right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-copy-target="sibling" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"><g buffered-rendering="static"><path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /><path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"> Copy to clipboard </span> </button> <div class='highlight relative group'> <pre class='highlight '><code id="code-bthlbecs">fly ips allocate-v6 <span class="nt">--private</span> </code></pre> </div> </div> <p>The <code>fly ips allocate-v6</code> command makes a unique address in your private network that you can use to access Ollama from your other services. Make sure to add the <code>--private</code> flag, otherwise you&rsquo;ll get a globally unique IP address instead of a private one.</p> <p>Next, you may need to remove all of the other public IP addresses for the app to lock it away from the public. Get a list of them with <code>fly ips list</code> and then remove them with <code>fly ips release &lt;ip&gt;</code>. Delete everything but your flycast IP.</p> <p>Next, we need to declare the volume for Ollama to store models in. If you don&rsquo;t do this, then when you scale to zero, your existing models will be destroyed and you will have to re-download them every time the server starts. This is not ideal, so we&rsquo;re going to create a persistent volume to store the models in. 
Add the following to your <code>fly.toml</code>:</p> <div class="highlight-wrapper group relative toml"> <button type="button" class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-wrap-target="#code-i9h5kt6l" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"><g buffered-rendering="static"><path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /><path d="M11.081 6.466L9.533 8.037l1.548 1.571" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"> Wrap text </span> </button> <button type="button" class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-copy-target="sibling" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"><g buffered-rendering="static"><path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /><path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"> Copy to clipboard </span> </button> <div class='highlight relative group'> <pre class='highlight 
'><code id="code-i9h5kt6l"><span class="nn">[build]</span> <span class="py">image</span> <span class="p">=</span> <span class="s">"ollama/ollama"</span> <span class="nn">[mounts]</span> <span class="py">source</span> <span class="p">=</span> <span class="s">"models"</span> <span class="py">destination</span> <span class="p">=</span> <span class="s">"/root/.ollama"</span> <span class="py">initial_size</span> <span class="p">=</span> <span class="s">"100gb"</span> </code></pre> </div> </div> <p>This will create a 100GB volume in the <a href='https://en.wikipedia.org/wiki/O%27Hare_International_Airport' title=''><code>ord</code></a> region when the app is deployed. This will be used to store the models that you download from the <a href='https://ollama.ai/library/' title=''>Ollama library</a>. You can make this smaller if you want, but 100GB is a good place to start from.</p> <p>Now that everything is set up, we can deploy this to Fly.io:</p> <div class="highlight-wrapper group relative bash"> <button type="button" class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-wrap-target="#code-iogi1ir3" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"><g buffered-rendering="static"><path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /><path d="M11.081 6.466L9.533 8.037l1.548 1.571" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"> Wrap text </span> </button> <button type="button" class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 
group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-copy-target="sibling" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"><g buffered-rendering="static"><path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /><path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"> Copy to clipboard </span> </button> <div class='highlight relative group'> <pre class='highlight '><code id="code-iogi1ir3">fly deploy </code></pre> </div> </div> <p>This will take a minute to pull the Ollama image, push it to a Machine, provision your volume, and kick everything else off with hypervisors, GPUs and whatnot. 
Once it&rsquo;s done, you should see something like this:</p> <div class="highlight-wrapper group relative bash"> <button type="button" class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-wrap-target="#code-rgjl7r36" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"><g buffered-rendering="static"><path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /><path d="M11.081 6.466L9.533 8.037l1.548 1.571" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"> Wrap text </span> </button> <button type="button" class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-copy-target="sibling" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"><g buffered-rendering="static"><path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /><path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"> Copy to clipboard </span> </button> <div class='highlight relative group'> <pre 
class='highlight '><code id="code-rgjl7r36"> ✔ Machine 17816141f55489 <span class="o">[</span>app] update succeeded <span class="nt">-------</span> Visit your newly deployed app at https://sparkling-violet-709.fly.dev/ </code></pre> </div> </div> <p>This is a lie because we just deleted the public IP addresses for this app. You can&rsquo;t access it from the internet, and by extension, random people can&rsquo;t access it either. For now, you can run an interactive session with Ollama using an ephemeral Fly Machine:</p> <div class="highlight-wrapper group relative bash"> <button type="button" class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-wrap-target="#code-pjpmi8ic" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"><g buffered-rendering="static"><path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /><path d="M11.081 6.466L9.533 8.037l1.548 1.571" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"> Wrap text </span> </button> <button type="button" class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-copy-target="sibling" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"><g buffered-rendering="static"><path d="M10.576 
7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /><path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"> Copy to clipboard </span> </button> <div class='highlight relative group'> <pre class='highlight '><code id="code-pjpmi8ic">fly m run <span class="nt">-e</span> <span class="nv">OLLAMA_HOST</span><span class="o">=</span>http://sparkling-violet-709.flycast <span class="nt">--shell</span> ollama/ollama </code></pre> </div> </div> <p>And then you can pull a model from the <a href='https://ollama.ai/library/' title=''>Ollama library</a> and generate some text:</p> <div class="highlight-wrapper group relative bash"> <button type="button" class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-wrap-target="#code-ytdqtkck" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"><g buffered-rendering="static"><path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /><path d="M11.081 6.466L9.533 8.037l1.548 1.571" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"> Wrap text </span> </button> <button type="button" class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 
group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-copy-target="sibling" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"><g buffered-rendering="static"><path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /><path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"> Copy to clipboard </span> </button> <div class='highlight relative group'> <pre class='highlight '><code id="code-ytdqtkck"><span class="nv">$ </span>ollama run openchat:7b-v3.5-fp16 <span class="o">&gt;&gt;&gt;</span> How <span class="k">do </span>I bake chocolate chip cookies? To bake chocolate chip cookies, follow these steps: 1. Preheat the oven to 375°F <span class="o">(</span>190°C<span class="o">)</span> and line a baking sheet with parchment paper or silicone baking mat. 2. In a large bowl, mix together 1 cup of unsalted butter <span class="o">(</span>softened<span class="o">)</span>, 3/4 cup granulated sugar, and 3/4 cup packed brown sugar <span class="k">until </span>light and fluffy. 3. Add 2 large eggs, one at a <span class="nb">time</span>, to the butter mixture, beating well after each addition. Stir <span class="k">in </span>1 teaspoon of pure vanilla extract. 4. In a separate bowl, whisk together 2 cups all-purpose flour, 1/2 teaspoon baking soda, and 1/2 teaspoon salt. Gradually add the dry ingredients to the wet ingredients, stirring <span class="k">until </span>just combined. 5. Fold <span class="k">in </span>2 cups of chocolate chips <span class="o">(</span>or chunks<span class="o">)</span> into the dough. 6. 
Drop rounded tablespoons of dough onto the prepared baking sheet, spacing them about 2 inches apart. 7. Bake <span class="k">for </span>10-12 minutes, or <span class="k">until </span>the edges are golden brown. The centers should still be slightly soft. 8. Allow the cookies to cool on the baking sheet <span class="k">for </span>a few minutes before transferring them to a wire rack to cool completely. Enjoy your homemade chocolate chip cookies! </code></pre> </div> </div> <p>If you want a persistent wake-on-use connection to your Ollama instance, you can set up a <a href='https://fly.io/docs/reference/private-networking/#discovering-apps-through-dns-on-a-wireguard-connection' title=''>connection to your Fly network using WireGuard</a>. This will let you use Ollama from your local applications without having to run them on Fly. For example, if you want to figure out the safe cooking temperature for ground beef in Celsius, you can query that in JavaScript with this snippet of code:</p> <div class="highlight-wrapper group relative typescript"> <button type="button" class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-wrap-target="#code-rlnqfarq" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"><g buffered-rendering="static"><path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /><path d="M11.081 6.466L9.533 8.037l1.548 1.571" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"> Wrap text </span> </button> <button type="button" class="bubble-wrap z-20 absolute right-1.5 top-1.5 
text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-copy-target="sibling" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"><g buffered-rendering="static"><path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /><path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"> Copy to clipboard </span> </button> <div class='highlight relative group'> <pre class='highlight '><code id="code-rlnqfarq"><span class="kd">const</span> <span class="nx">generateRequest</span> <span class="o">=</span> <span class="p">{</span> <span class="na">model</span><span class="p">:</span> <span class="dl">"</span><span class="s2">openchat:7b-v3.5-fp16</span><span class="dl">"</span><span class="p">,</span> <span class="na">prompt</span><span class="p">:</span> <span class="dl">"</span><span class="s2">What is the safe cooking temperature for ground beef in celsius?</span><span class="dl">"</span><span class="p">,</span> <span class="na">stream</span><span class="p">:</span> <span class="kc">false</span><span class="p">,</span> <span class="c1">// &lt;- important for Node/Deno clients</span> <span class="p">};</span> <span class="kd">let</span> <span class="nx">resp</span> <span class="o">=</span> <span class="k">await</span> <span class="nx">fetch</span><span class="p">(</span><span class="dl">"</span><span class="s2">http://sparkling-violet-709.flycast/api/generate</span><span class="dl">"</span><span class="p">,</span> <span 
class="p">{</span> <span class="na">method</span><span class="p">:</span> <span class="dl">"</span><span class="s2">POST</span><span class="dl">"</span><span class="p">,</span> <span class="na">body</span><span class="p">:</span> <span class="nx">JSON</span><span class="p">.</span><span class="nx">stringify</span><span class="p">(</span><span class="nx">generateRequest</span><span class="p">),</span> <span class="p">});</span> <span class="k">if</span> <span class="p">(</span><span class="nx">resp</span><span class="p">.</span><span class="nx">status</span> <span class="o">!==</span> <span class="mi">200</span><span class="p">)</span> <span class="p">{</span> <span class="k">throw</span> <span class="k">new</span> <span class="nb">Error</span><span class="p">(</span><span class="s2">`error fetching response: </span><span class="p">${</span><span class="nx">resp</span><span class="p">.</span><span class="nx">status</span><span class="p">}</span><span class="s2">: </span><span class="p">${</span><span class="k">await</span> <span class="nx">resp</span><span class="p">.</span><span class="nx">text</span><span class="p">()}</span><span class="s2">`</span><span class="p">);</span> <span class="p">}</span> <span class="nx">resp</span> <span class="o">=</span> <span class="k">await</span> <span class="nx">resp</span><span class="p">.</span><span class="nx">json</span><span class="p">();</span> <span class="nx">console</span><span class="p">.</span><span class="nx">log</span><span class="p">(</span><span class="nx">resp</span><span class="p">.</span><span class="nx">response</span><span class="p">);</span> <span class="c1">// Something like "The safe cooking temperature for ground beef is 71 degrees celsius (160 degrees fahrenheit)."</span> </code></pre> </div> </div><h2 id='scaling-to-zero' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 
after:hash opacity-0 group-hover:opacity-100 transition-all' href='#scaling-to-zero' aria-label='Anchor'></a><span class='plain-code'>Scaling to zero</span></h2> <p>The best part about all of this is that scaling down to zero running Machines takes no work: the Machine automatically shuts down when it&rsquo;s idle. Wait a few minutes and then verify it with <code>fly status</code>:</p> <div class="highlight-wrapper group relative bash"> <button type="button" class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-wrap-target="#code-u3h45u8u" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"><g buffered-rendering="static"><path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /><path d="M11.081 6.466L9.533 8.037l1.548 1.571" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"> Wrap text </span> </button> <button type="button" class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-copy-target="sibling" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"><g buffered-rendering="static"><path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" 
/><path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"> Copy to clipboard </span> </button> <div class='highlight relative group'> <pre class='highlight '><code id="code-u3h45u8u"><span class="nv">$ </span>fly status ... PROCESS ID VERSION REGION STATE ROLE CHECKS LAST UPDATED app 3d8d7949b22089 9 ord stopped 2023-11-14T19:34:24Z </code></pre> </div> </div> <p>The app has been stopped. This means that it&rsquo;s not running and you&rsquo;re not paying for it. When you want it to start up again, just make a request. It will automatically start up and you can use it as normal with the CLI or even just arbitrary calls to <a href='https://github.com/jmorganca/ollama/blob/main/docs/api.md' title=''>the API</a>.</p> <p>You can also upload your own models to the Ollama registry by <a href='https://github.com/jmorganca/ollama/blob/main/docs/import.md' title=''>creating your own Modelfile</a> and pushing it (though you will need to install Ollama locally to publish your own models). At this time, the only way to set a custom system prompt is to use a Modelfile and upload your model to the registry.</p> <h2 id='conclusion' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#conclusion' aria-label='Anchor'></a><span class='plain-code'>Conclusion</span></h2> <p>Ollama is a fantastic way to run large language models of your choice and the ability to use Fly.io&rsquo;s powerful GPUs means you can use bigger models with more parameters and a larger context window. 
This lets you make your assistants more lifelike, your conversations have more context, and your text generation more realistic.</p> <p>Oh, by the way, this also lets you use the new <code>json</code> mode to have your models call functions, similar to how ChatGPT would. To do this, have a system prompt that looks like this:</p> <div class="highlight-wrapper group relative "> <button type="button" class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-wrap-target="#code-p3jklt02" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"><g buffered-rendering="static"><path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /><path d="M11.081 6.466L9.533 8.037l1.548 1.571" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"> Wrap text </span> </button> <button type="button" class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-copy-target="sibling" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"><g buffered-rendering="static"><path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /><path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 
1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"> Copy to clipboard </span> </button> <div class='highlight relative group'> <pre class='highlight '><code id="code-p3jklt02">You are a helpful research assistant. The following functions are available for you to fetch further data to answer user questions, if relevant: { "function": "search_bing", "description": "Search the web for content on Bing. This allows users to search online/the internet/the web for content.", "arguments": [ { "name": "query", "type": "string", "description": "The search query string" } ] } { "function": "search_arxiv", "description": "Search for research papers on ArXiv. Make use of AND, OR and NOT operators as appropriate to join terms within the query.", "arguments": [ { "name": "query", "type": "string", "description": "The search query string" } ] } To call a function, respond - immediately and only - with a JSON object of the following format: { "function": "function_name", "arguments": { "argument1": "argument_value", "argument2": "argument_value" } } If no function needs to be called, respond with an empty JSON object: {} </code></pre> </div> </div> <p>Then you can use the <a href='https://github.com/jmorganca/ollama/blob/main/docs/api.md#request-json-mode' title=''>JSON format</a> to receive a JSON response from Ollama (hint: <code>--format=json</code> in the CLI or <code>format: &quot;json&quot;</code> in the API). This is a great way to make your assistants more lifelike and more useful. 
You will need to use something like <a href='https://www.langchain.com/' title=''>Langchain</a> or manual iterations to properly handle the cases where the user doesn&rsquo;t want to call a function, but that&rsquo;s a topic for another blog post.</p> <p>For the best results you may want to use a model with a larger context window such as <a href='https://ollama.ai/library/vicuna:13b-v1.5-16k-fp16' title=''>vicuna:13b-v1.5-16k-fp16</a> (16k == 16,384 token window) as JSON is very token-expensive. Future advances in the next few weeks (such as the Yi models gaining ludicrous token windows on the line of 200,000 tokens at the cost of ludicrous amounts of VRAM usage) will make this less of an issue. You can also get away with minifying the JSON in the functions and examples a lot, but you may need to experiment to get the best results.</p> <p>Happy hacking, y&#39;all.</p> </content> </entry> <entry> <title>Rethinking Serverless with FLAME</title> <link rel="alternate" href="https://fly.io/blog/rethinking-serverless-with-flame/"/> <id>https://fly.io/blog/rethinking-serverless-with-flame/</id> <published>2023-12-06T00:00:00+00:00</published> <updated>2024-04-12T18:23:39+00:00</updated> <media:thumbnail url="https://fly.io/blog/rethinking-serverless-with-flame/assets/flame-thumb.webp"/> <content type="html"><blockquote>Imagine if you could auto scale simply by wrapping any existing app code in a function and have that block of code run in a temporary copy of your app.</blockquote> <p>The pursuit of elastic, auto-scaling applications has taken us to silly places.</p> <p>Serverless/FaaS had a couple things going for it. Elastic Scale™ is hard. It&rsquo;s even harder when you need to manage those pesky servers. It also promised pay-what-you-use costs to avoid idle usage. Good stuff, right?</p> <p>Well the charade is over. You offload scaling concerns and the complexities of scaling, just to end up needing <em>more complexity</em>. 
Additional queues, storage, and glue code to communicate back to our app is just the starting point. Dev, test, and CI complexity balloons as fast as your costs. Oh, and you often have to rewrite your app in proprietary JavaScript – even if it&rsquo;s already written in JavaScript!</p> <p>At the same time, the rest of us have elastically scaled by starting more webservers. Or we&rsquo;ve dumped on complexity with microservices. This doesn&rsquo;t make sense. Piling on more webservers to transcode more videos or serve up more ML tasks isn&rsquo;t what we want. And granular scale shouldn&rsquo;t require slicing our apps into bespoke operational units with their own APIs and deployments to manage.</p> <p>Enough is enough. There&rsquo;s a better way to elastically scale applications.</p> <h2 id='the-flame-pattern' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#the-flame-pattern' aria-label='Anchor'></a><span class='plain-code'>The FLAME pattern</span></h2> <p>Here&rsquo;s what we really want:</p> <ul> <li>We don&rsquo;t want to manage those pesky servers. 
We already have this for our app deployments via <code>fly deploy</code>, <code>git push heroku</code>, <code>kubectl</code>, etc. </li><li>We want on-demand, <em>granular</em> elastic scale of specific parts of our app code </li><li>We don&rsquo;t want to rewrite our application or write parts of it in proprietary runtimes </li></ul> <p>Imagine if we could auto scale simply by wrapping any existing app code in a function and have that block of code run in a temporary copy of the app.</p> <p>Enter the FLAME pattern.</p> <blockquote>FLAME - Fleeting Lambda Application for Modular Execution</blockquote> <p>With FLAME, you treat your <em>entire application</em> as a lambda, where modular parts can be executed on short-lived infrastructure.</p> <p>No rewrites. No bespoke runtimes. No outrageous layers of complexity. Need to insert the results of an expensive operation into the database? PubSub broadcast the result of some expensive work? No problem! It&rsquo;s your whole app, so of course you can do it.</p> <p>The Elixir <a href='https://github.com/phoenixframework/flame' title=''>flame library</a> implements the FLAME pattern. It has a backend adapter for Fly.io, but you can use it on any cloud that gives you an API to spin up an instance with your app code running on it. We&rsquo;ll talk more about backends in a bit, as well as implementing FLAME in other languages.</p> <p>First, let&rsquo;s watch a realtime thumbnail generation example to see FLAME + Elixir in action:</p> <div class="youtube-container" data-exclude-render> <div class="youtube-video"> <iframe width="100%" height="100%" src="https://www.youtube.com/embed/l1xt_rkWdic" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen> </iframe> </div> </div> <p>Now let&rsquo;s walk through something a little more basic. 
Imagine we have a function to transcode video to thumbnails in our Elixir application after they are uploaded:</p> <div class="highlight-wrapper group relative elixir"> <button type="button" class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-wrap-target="#code-dcj5640t" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"><g buffered-rendering="static"><path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /><path d="M11.081 6.466L9.533 8.037l1.548 1.571" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"> Wrap text </span> </button> <button type="button" class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-copy-target="sibling" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"><g buffered-rendering="static"><path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /><path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"> Copy to clipboard </span> 
</button> <div class='highlight relative group'> <pre class='highlight '><code id="code-dcj5640t"><span class="k">def</span> <span class="n">generate_thumbnails</span><span class="p">(%</span><span class="no">Video</span><span class="p">{}</span> <span class="o">=</span> <span class="n">vid</span><span class="p">,</span> <span class="n">interval</span><span class="p">)</span> <span class="k">do</span> <span class="n">tmp</span> <span class="o">=</span> <span class="no">Path</span><span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="no">System</span><span class="o">.</span><span class="n">tmp_dir!</span><span class="p">(),</span> <span class="no">Ecto</span><span class="o">.</span><span class="no">UUID</span><span class="o">.</span><span class="n">generate</span><span class="p">())</span> <span class="no">File</span><span class="o">.</span><span class="n">mkdir!</span><span class="p">(</span><span class="n">tmp</span><span class="p">)</span> <span class="n">args</span> <span class="o">=</span> <span class="p">[</span><span class="s2">"-i"</span><span class="p">,</span> <span class="n">vid</span><span class="o">.</span><span class="n">url</span><span class="p">,</span> <span class="s2">"-vf"</span><span class="p">,</span> <span class="s2">"fps=1/</span><span class="si">#{</span><span class="n">interval</span><span class="si">}</span><span class="s2">"</span><span class="p">,</span> <span class="s2">"</span><span class="si">#{</span><span class="n">tmp</span><span class="si">}</span><span class="s2">/%02d.png"</span><span class="p">]</span> <span class="no">System</span><span class="o">.</span><span class="n">cmd</span><span class="p">(</span><span class="s2">"ffmpeg"</span><span class="p">,</span> <span class="n">args</span><span class="p">)</span> <span class="n">urls</span> <span class="o">=</span> <span class="no">VidStore</span><span class="o">.</span><span class="n">put_thumbnails</span><span class="p">(</span><span 
class="n">vid</span><span class="p">,</span> <span class="no">Path</span><span class="o">.</span><span class="n">wildcard</span><span class="p">(</span><span class="n">tmp</span> <span class="o">&lt;&gt;</span> <span class="s2">"/*.png"</span><span class="p">))</span> <span class="no">Repo</span><span class="o">.</span><span class="n">insert_all</span><span class="p">(</span><span class="no">Thumb</span><span class="p">,</span> <span class="no">Enum</span><span class="o">.</span><span class="n">map</span><span class="p">(</span><span class="n">urls</span><span class="p">,</span> <span class="o">&amp;</span><span class="p">%{</span><span class="ss">vid_id:</span> <span class="n">vid</span><span class="o">.</span><span class="n">id</span><span class="p">,</span> <span class="ss">url:</span> <span class="nv">&amp;1</span><span class="p">}))</span> <span class="k">end</span> </code></pre> </div> </div> <p>Our <code>generate_thumbnails</code> function accepts a video struct. We shell out to <code>ffmpeg</code> to take the video URL and generate thumbnails at a given interval. We then write the temporary thumbnail paths to durable storage. Finally, we insert the generated thumbnail URLs into the database.</p> <p>This works great locally, but CPU bound work like video transcoding can quickly bring our entire service to a halt in production. 
Instead of rewriting large swaths of our app to move this into microservices or some FaaS, we can simply wrap it in a FLAME call:</p> <div class="highlight-wrapper group relative elixir"> <button type="button" class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-wrap-target="#code-gcihj0ww" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"><g buffered-rendering="static"><path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /><path d="M11.081 6.466L9.533 8.037l1.548 1.571" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"> Wrap text </span> </button> <button type="button" class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-copy-target="sibling" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"><g buffered-rendering="static"><path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /><path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"> Copy to 
clipboard </span> </button> <div class='highlight relative group'> <pre class='highlight '><code id="code-gcihj0ww"><span class="k">def</span> <span class="n">generate_thumbnails</span><span class="p">(%</span><span class="no">Video</span><span class="p">{}</span> <span class="o">=</span> <span class="n">vid</span><span class="p">,</span> <span class="n">interval</span><span class="p">)</span> <span class="k">do</span> <span class="no">FLAME</span><span class="o">.</span><span class="n">call</span><span class="p">(</span><span class="no">MyApp</span><span class="o">.</span><span class="no">FFMpegRunner</span><span class="p">,</span> <span class="k">fn</span> <span class="o">-&gt;</span> <span class="n">tmp</span> <span class="o">=</span> <span class="no">Path</span><span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="no">System</span><span class="o">.</span><span class="n">tmp_dir!</span><span class="p">(),</span> <span class="no">Ecto</span><span class="o">.</span><span class="no">UUID</span><span class="o">.</span><span class="n">generate</span><span class="p">())</span> <span class="no">File</span><span class="o">.</span><span class="n">mkdir!</span><span class="p">(</span><span class="n">tmp</span><span class="p">)</span> <span class="n">args</span> <span class="o">=</span> <span class="p">[</span><span class="s2">"-i"</span><span class="p">,</span> <span class="n">vid</span><span class="o">.</span><span class="n">url</span><span class="p">,</span> <span class="s2">"-vf"</span><span class="p">,</span> <span class="s2">"fps=1/</span><span class="si">#{</span><span class="n">interval</span><span class="si">}</span><span class="s2">"</span><span class="p">,</span> <span class="s2">"</span><span class="si">#{</span><span class="n">tmp</span><span class="si">}</span><span class="s2">/%02d.png"</span><span class="p">]</span> <span class="no">System</span><span class="o">.</span><span class="n">cmd</span><span class="p">(</span><span 
class="s2">"ffmpeg"</span><span class="p">,</span> <span class="n">args</span><span class="p">)</span> <span class="n">urls</span> <span class="o">=</span> <span class="no">VidStore</span><span class="o">.</span><span class="n">put_thumbnails</span><span class="p">(</span><span class="n">vid</span><span class="p">,</span> <span class="no">Path</span><span class="o">.</span><span class="n">wildcard</span><span class="p">(</span><span class="n">tmp</span> <span class="o">&lt;&gt;</span> <span class="s2">"/*.png"</span><span class="p">))</span> <span class="no">Repo</span><span class="o">.</span><span class="n">insert_all</span><span class="p">(</span><span class="no">Thumb</span><span class="p">,</span> <span class="no">Enum</span><span class="o">.</span><span class="n">map</span><span class="p">(</span><span class="n">urls</span><span class="p">,</span> <span class="o">&amp;</span><span class="p">%{</span><span class="ss">vid_id:</span> <span class="n">vid</span><span class="o">.</span><span class="n">id</span><span class="p">,</span> <span class="ss">url:</span> <span class="nv">&amp;1</span><span class="p">}))</span> <span class="k">end</span><span class="p">)</span> <span class="k">end</span> </code></pre> </div> </div> <p>That&rsquo;s it! <code>FLAME.call</code> accepts the name of a runner pool, and a function. It then finds or boots a new copy of our entire application and runs the function there. Any variables the function closes over (like our <code>%Video{}</code> struct and <code>interval</code>) are passed along automatically.</p> <p>When the FLAME runner boots up, it connects back to the parent node, receives the function to run, executes it, and returns the result to the caller. 
Based on configuration, the booted runner either waits happily for more work before idling down, or extinguishes itself immediately.</p> <p>Let&rsquo;s visualize the flow:</p> <p><img alt="visualizing the flow" src="/blog/rethinking-serverless-with-flame/assets/visual.webp?centered" /></p> <p>We changed no other code and issued our DB write with <code>Repo.insert_all</code> just like before, because we are running our <em>entire</em> <em>application</em>. Database connection(s) and all. Except this fleeting application only runs that little function after startup and nothing else.</p> <p>In practice, a FLAME implementation will support a pool of runners for hot startup, scale-to-zero, and elastic growth. More on that later.</p> <h2 id='solving-a-problem-vs-removing-the-problem' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#solving-a-problem-vs-removing-the-problem' aria-label='Anchor'></a><span class='plain-code'>Solving a problem vs removing the problem</span></h2><blockquote>FaaS solutions help you solve a problem. FLAME removes the problem.</blockquote> <p>The FaaS labyrinth of complexity defies reason. And it&rsquo;s unavoidable. Let&rsquo;s walk through the thumbnail use-case to see how.</p> <p>We try to start with the simplest building block, like request/response AWS Lambda Function URLs.</p> <p>The complexity hits immediately.</p> <p>We start writing custom encoders/decoders on both sides to handle streaming the thumbnails back to the app over HTTP. Phew, that&rsquo;s done. Wait, is our video transcoding or user uploads going to take longer than 15 minutes? Sorry, hard timeout limit – time to split our videos into chunks to stay within the timeout, which means more lambdas to do that. 
Now we&rsquo;re orchestrating lambda workflows and relying on additional services, such as SQS and S3, to enable this.</p> <p>All the FaaS is doing is adding layers of communication between your code and the parts you want to run elastically. Each layer has its own glue integration price to pay.</p> <p>Ultimately handling this kind of use-case looks something like this:</p> <ul> <li>Trigger the lambda via HTTP endpoint, S3, or API gateway ($) </li><li>Write the bespoke lambda to transcode the video ($) </li><li>Place the thumbnail results into SQS ($) </li><li>Write the SQS consumer in our app (dev $) </li><li>Persist to DB and figure out how to get events back to active subscribers that may well be connected to other instances than the SQS consumer (dev $) </li></ul> <p>This is nuts. We pay the FaaS toll at every step. We shouldn&rsquo;t have to do any of this!</p> <p>FaaS provides a bunch of offerings to build a solution on top of. FLAME removes the problem entirely.</p> <h2 id='flame-backends' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#flame-backends' aria-label='Anchor'></a><span class='plain-code'>FLAME Backends</span></h2><blockquote>On Fly.io infrastructure the <code>FLAME.FlyBackend</code> can boot a copy of your application on a new <a href="https://fly.io/docs/machines/">Machine</a> and have it connect back to the parent for work within ~3s.</blockquote> <p>By default, FLAME ships with a <code>LocalBackend</code> and <code>FlyBackend</code>, but any host that provides an API to provision a server and run your app code can work as a FLAME backend. Erlang and Elixir primitives are doing all the heavy lifting here. 
The entire <code>FLAME.FlyBackend</code> is <a href='https://github.com/phoenixframework/flame/blob/main/lib/flame/fly_backend.ex' title=''>&lt; 200 LOC with docs</a>. The library has a single dependency, <code>req</code>, which is an HTTP client.</p> <p>Because Fly.io runs our applications as a packaged-up Docker image, we simply ask the Fly API to boot a new Machine for us with the same image that our app is currently running. Also, thanks to Fly infrastructure, we can guarantee the FLAME runners are started in the same region as the parent. This optimizes latency and lets you ship whatever data back and forth between parent and runner without having to think about it.</p> <h2 id='look-at-everything-were-not-doing' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#look-at-everything-were-not-doing' aria-label='Anchor'></a><span class='plain-code'>Look at everything we&rsquo;re not doing</span></h2> <p>With FaaS, just imagine how quickly the dev and testing story becomes a fate worse than death.</p> <p>To run the app locally, we either need to add some huge dev dependencies to simulate the entire FaaS pipeline, or worse, connect up our dev and test environments directly to the FaaS provider.</p> <p>With FLAME, your dev and test runners simply run on the local backend.</p> <p>Remember, this is your app. FLAME just controls where modular parts of it run. 
In dev or test, those parts simply run on the existing runtime on your laptop or CI server.</p> <p>Using Elixir, we can even send a file across to the remote FLAME application thanks to the distributed features of the Erlang VM:</p> <div class="highlight-wrapper group relative elixir"> <button type="button" class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-wrap-target="#code-6icc60nu" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"><g buffered-rendering="static"><path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /><path d="M11.081 6.466L9.533 8.037l1.548 1.571" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"> Wrap text </span> </button> <button type="button" class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-copy-target="sibling" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"><g buffered-rendering="static"><path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /><path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /></g></svg> <span 
class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"> Copy to clipboard </span> </button> <div class='highlight relative group'> <pre class='highlight '><code id="code-6icc60nu"><span class="k">def</span> <span class="n">generate_thumbnails</span><span class="p">(%</span><span class="no">Video</span><span class="p">{}</span> <span class="o">=</span> <span class="n">vid</span><span class="p">,</span> <span class="n">interval</span><span class="p">)</span> <span class="k">do</span> <span class="n">parent_stream</span> <span class="o">=</span> <span class="no">File</span><span class="o">.</span><span class="n">stream!</span><span class="p">(</span><span class="n">vid</span><span class="o">.</span><span class="n">filepath</span><span class="p">,</span> <span class="p">[],</span> <span class="mi">2048</span><span class="p">)</span> <span class="no">FLAME</span><span class="o">.</span><span class="n">call</span><span class="p">(</span><span class="no">MyApp</span><span class="o">.</span><span class="no">FFMpegRunner</span><span class="p">,</span> <span class="k">fn</span> <span class="o">-&gt;</span> <span class="n">tmp_file</span> <span class="o">=</span> <span class="no">Path</span><span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="no">System</span><span class="o">.</span><span class="n">tmp_dir!</span><span class="p">(),</span> <span class="no">Ecto</span><span class="o">.</span><span class="no">UUID</span><span class="o">.</span><span class="n">generate</span><span class="p">())</span> <span class="n">flame_stream</span> <span class="o">=</span> <span class="no">File</span><span class="o">.</span><span class="n">stream!</span><span class="p">(</span><span class="n">tmp_file</span><span class="p">)</span> <span class="no">Enum</span><span class="o">.</span><span class="n">into</span><span class="p">(</span><span class="n">parent_stream</span><span class="p">,</span> <span 
class="n">flame_stream</span><span class="p">)</span> <span class="n">tmp</span> <span class="o">=</span> <span class="no">Path</span><span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="no">System</span><span class="o">.</span><span class="n">tmp_dir!</span><span class="p">(),</span> <span class="no">Ecto</span><span class="o">.</span><span class="no">UUID</span><span class="o">.</span><span class="n">generate</span><span class="p">())</span> <span class="no">File</span><span class="o">.</span><span class="n">mkdir!</span><span class="p">(</span><span class="n">tmp</span><span class="p">)</span> <span class="n">args</span> <span class="o">=</span> <span class="p">[</span><span class="s2">"-i"</span><span class="p">,</span> <span class="n">tmp_file</span><span class="p">,</span> <span class="s2">"-vf"</span><span class="p">,</span> <span class="s2">"fps=1/</span><span class="si">#{</span><span class="n">interval</span><span class="si">}</span><span class="s2">"</span><span class="p">,</span> <span class="s2">"</span><span class="si">#{</span><span class="n">tmp</span><span class="si">}</span><span class="s2">/%02d.png"</span><span class="p">]</span> <span class="no">System</span><span class="o">.</span><span class="n">cmd</span><span class="p">(</span><span class="s2">"ffmpeg"</span><span class="p">,</span> <span class="n">args</span><span class="p">)</span> <span class="n">urls</span> <span class="o">=</span> <span class="no">VidStore</span><span class="o">.</span><span class="n">put_thumbnails</span><span class="p">(</span><span class="n">vid</span><span class="p">,</span> <span class="no">Path</span><span class="o">.</span><span class="n">wildcard</span><span class="p">(</span><span class="n">tmp</span> <span class="o">&lt;&gt;</span> <span class="s2">"/*.png"</span><span class="p">))</span> <span class="no">Repo</span><span class="o">.</span><span class="n">insert_all</span><span class="p">(</span><span 
class="no">Thumb</span><span class="p">,</span> <span class="no">Enum</span><span class="o">.</span><span class="n">map</span><span class="p">(</span><span class="n">urls</span><span class="p">,</span> <span class="o">&amp;</span><span class="p">%{</span><span class="ss">vid_id:</span> <span class="n">vid</span><span class="o">.</span><span class="n">id</span><span class="p">,</span> <span class="ss">url:</span> <span class="nv">&amp;1</span><span class="p">}))</span> <span class="k">end</span><span class="p">)</span> <span class="k">end</span> </code></pre> </div> </div> <p>On line 2 we open a file on the parent node to the video path. Then in the FLAME child, we stream the file from the parent node to the FLAME server in only a couple lines of code. That&rsquo;s it! No setup of S3 or HTTP interfaces required.</p> <p>With FLAME it&rsquo;s easy to miss everything we&rsquo;re not doing:</p> <ul> <li>We don&rsquo;t need to write code outside of our application. We can reuse business logic, database setup, PubSub, and all the features of our respective platforms </li><li>We don&rsquo;t need to manage deploys of separate services or endpoints </li><li>We don&rsquo;t need to write results to S3 or SQS just to pick up values back in our app </li><li>We skip the dev, test, and CI dependency dance </li></ul> <h2 id='flame-outside-elixir' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#flame-outside-elixir' aria-label='Anchor'></a><span class='plain-code'>FLAME outside Elixir</span></h2> <p>Elixir is fantastically well suited for the FLAME model because we get so much <a href='https://fly.io/phoenix-files/elixir-and-phoenix-can-do-it-all/' title=''>for free</a> like process supervision and distributed messaging. 
That said, any language with reasonable concurrency primitives can take advantage of this pattern. For example, my teammate, Lubien, created a proof-of-concept example for breaking out functions in your JavaScript application and running them inside a new Fly Machine: <a href='https://github.com/lubien/fly-run-this-function-on-another-machine' title=''>https://github.com/lubien/fly-run-this-function-on-another-machine</a></p> <p>So the general flow for a JavaScript-based FLAME call would be to move the functions you want to offload into a new file, which is executed on a runner pool. Provided the arguments are JSON-serializable, the general FLAME flow is similar to what we&rsquo;ve outlined here. Your application, your code, running on fleeting instances.</p> <p>A complete FLAME library will need to handle the following concerns:</p> <ul> <li>Elastic pool scale-up and scale-down logic </li><li>Hot vs cold startup with pools </li><li>Remote runner monitoring to avoid orphaned resources </li><li>How to monitor and keep deployments fresh </li></ul> <p>For the rest of this post we&rsquo;ll see how the Elixir FLAME library handles these concerns, as well as features uniquely suited to Elixir applications. But first, you might be wondering about your background job queues.</p> <h2 id='what-about-my-background-job-processor' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#what-about-my-background-job-processor' aria-label='Anchor'></a><span class='plain-code'>What about my background job processor?</span></h2> <p>FLAME works great inside your background job processor, but you may have noticed some overlap. If your job library handles scaling the worker pool, what is FLAME doing for you? 
There are a couple of important distinctions here.</p> <p>First, we reach for these queues when we need <em>durability guarantees</em>. We can often turn knobs to have the queues scale to handle more jobs as load changes. But durable operations are separate from elastic execution. Conflating these concerns can send you down a similar path to lambda complexity. Leaning on your worker queue purely for offloaded execution means writing all the glue code to get the data into and out of the job, and back to the caller or end-user&rsquo;s device somehow.</p> <p>For example, if we want to guarantee that we successfully generate thumbnails for a video after the user uploads it, then a job queue makes sense as the <em>dispatch, commit, and retry</em> <em>mechanism</em> for this operation. The actual transcoding could be a FLAME call inside the job itself, so we decouple the ideas of durability and scaled execution.</p> <p>On the other side, we have operations we don&rsquo;t need durability for. Take the screencast above where the user hasn&rsquo;t yet saved their video. Or an ML model execution where there&rsquo;s no need to waste resources churning a prompt if the user has already left the app. In those cases, it doesn&rsquo;t make sense to write to a durable store to pick up a job for work that will go right into the ether.</p> <h2 id='pooling-for-elastic-scale' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#pooling-for-elastic-scale' aria-label='Anchor'></a><span class='plain-code'>Pooling for Elastic Scale</span></h2> <p>With the Elixir implementation of FLAME, you define elastic pools of runners. 
This allows scale-to-zero behavior while also elastically scaling up FLAME servers with max concurrency limits.</p> <p>For example, let&rsquo;s take a look at the <code>start/2</code> callback, which is the entry point of all Elixir applications. We can drop in a <code>FLAME.Pool</code> for video transcoding and say we want it to scale to zero, boot a max of 10, and support 5 concurrent <code>ffmpeg</code> operations per runner:</p> <div class="highlight-wrapper group relative elixir"> <button type="button" class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-wrap-target="#code-glp57duz" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"><g buffered-rendering="static"><path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /><path d="M11.081 6.466L9.533 8.037l1.548 1.571" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"> Wrap text </span> </button> <button type="button" class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-copy-target="sibling" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"><g buffered-rendering="static"><path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 
1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /><path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"> Copy to clipboard </span> </button> <div class='highlight relative group'> <pre class='highlight '><code id="code-glp57duz"><span class="k">def</span> <span class="n">start</span><span class="p">(</span><span class="n">_type</span><span class="p">,</span> <span class="n">_args</span><span class="p">)</span> <span class="k">do</span> <span class="n">flame_parent</span> <span class="o">=</span> <span class="no">FLAME</span><span class="o">.</span><span class="no">Parent</span><span class="o">.</span><span class="n">get</span><span class="p">()</span> <span class="n">children</span> <span class="o">=</span> <span class="p">[</span> <span class="o">...</span><span class="p">,</span> <span class="no">MyApp</span><span class="o">.</span><span class="no">Repo</span><span class="p">,</span> <span class="p">{</span><span class="no">FLAME</span><span class="o">.</span><span class="no">Pool</span><span class="p">,</span> <span class="ss">name:</span> <span class="no">Thumbs</span><span class="o">.</span><span class="no">FFMpegRunner</span><span class="p">,</span> <span class="ss">min:</span> <span class="mi">0</span><span class="p">,</span> <span class="ss">max:</span> <span class="mi">10</span><span class="p">,</span> <span class="ss">max_concurrency:</span> <span class="mi">5</span><span class="p">,</span> <span class="ss">idle_shutdown_after:</span> <span class="mi">30_000</span><span class="p">},</span> <span class="n">!flame_parent</span> <span class="o">&amp;&amp;</span> <span class="no">MyAppWeb</span><span class="o">.</span><span class="no">Endpoint</span> <span class="p">]</span> <span class="o">|&gt;</span> <span class="no">Enum</span><span class="o">.</span><span 
class="n">filter</span><span class="p">(</span><span class="o">&amp;</span> <span class="nv">&amp;1</span><span class="p">)</span> <span class="n">opts</span> <span class="o">=</span> <span class="p">[</span><span class="ss">strategy:</span> <span class="ss">:one_for_one</span><span class="p">,</span> <span class="ss">name:</span> <span class="no">MyApp</span><span class="o">.</span><span class="no">Supervisor</span><span class="p">]</span> <span class="no">Supervisor</span><span class="o">.</span><span class="n">start_link</span><span class="p">(</span><span class="n">children</span><span class="p">,</span> <span class="n">opts</span><span class="p">)</span> <span class="k">end</span> </code></pre> </div> </div> <p>We use the presence of a FLAME parent to conditionally start our Phoenix webserver when booting the app. There&rsquo;s no reason to start a webserver if we aren&rsquo;t serving web traffic. Note we leave other services like the database <code>MyApp.Repo</code> alone because we want to make use of those services inside FLAME runners.</p> <p>Elixir&rsquo;s supervised process approach to applications is uniquely great for turning these kinds of knobs.</p> <p>We also set our pool to idle down after 30 seconds of no caller operations. This keeps our runners hot for a short while before discarding them. 
We could also pass <code>min: 1</code> to always ensure at least one <code>ffmpeg</code> runner is hot and ready for work by the time our application is started.</p> <h2 id='process-placement' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#process-placement' aria-label='Anchor'></a><span class='plain-code'>Process Placement</span></h2> <p>In Elixir, stateful bits of our applications are built around the <em>process</em> primitive – lightweight green threads with message mailboxes. Wrapping our otherwise stateless app code in synchronous <code>FLAME.call</code>s or async <code>FLAME.cast</code>s works great, but what about the stateful parts of our app?</p> <p><code>FLAME.place_child</code> exists to take an existing process specification in your Elixir app and start it on a FLAME runner instead of locally. You can use it anywhere you&rsquo;d use <code>Task.Supervisor.start_child</code>, <code>DynamicSupervisor.start_child</code>, or similar interfaces. Just like <code>FLAME.call</code>, the process is run on an elastic pool and runners idle down when the process completes its work.</p> <p>And like <code>FLAME.call</code>, it lets us take existing app code, change a single line of code, and continue shipping features.</p> <p>Let&rsquo;s walk through the example from the screencast above. Imagine we want to generate thumbnails for a video <em>as it is being uploaded</em>. Elixir and LiveView make this easy. 
We won&rsquo;t cover all the code here, but you can view the <a href='https://github.com/fly-apps/thumbnail_generator/blob/main/lib/thumbs/thumbnail_generator.ex' title=''>full app implementation</a>.</p> <p>Our first pass would be to write a LiveView upload writer that calls into a <code>ThumbnailGenerator</code>:</p> <div class="highlight-wrapper group relative elixir"> <button type="button" class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-wrap-target="#code-e630ykcb" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"><g buffered-rendering="static"><path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /><path d="M11.081 6.466L9.533 8.037l1.548 1.571" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"> Wrap text </span> </button> <button type="button" class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-copy-target="sibling" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"><g buffered-rendering="static"><path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /><path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 
1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"> Copy to clipboard </span> </button> <div class='highlight relative group'> <pre class='highlight '><code id="code-e630ykcb"><span class="k">defmodule</span> <span class="no">ThumbsWeb</span><span class="o">.</span><span class="no">ThumbnailUploadWriter</span> <span class="k">do</span> <span class="nv">@behaviour</span> <span class="no">Phoenix</span><span class="o">.</span><span class="no">LiveView</span><span class="o">.</span><span class="no">UploadWriter</span> <span class="n">alias</span> <span class="no">Thumbs</span><span class="o">.</span><span class="no">ThumbnailGenerator</span> <span class="k">def</span> <span class="n">init</span><span class="p">(</span><span class="n">opts</span><span class="p">)</span> <span class="k">do</span> <span class="n">generator</span> <span class="o">=</span> <span class="no">ThumbnailGenerator</span><span class="o">.</span><span class="n">open</span><span class="p">(</span><span class="n">opts</span><span class="p">)</span> <span class="p">{</span><span class="ss">:ok</span><span class="p">,</span> <span class="p">%{</span><span class="ss">gen:</span> <span class="n">generator</span><span class="p">}}</span> <span class="k">end</span> <span class="k">def</span> <span class="n">write_chunk</span><span class="p">(</span><span class="n">data</span><span class="p">,</span> <span class="n">state</span><span class="p">)</span> <span class="k">do</span> <span class="no">ThumbnailGenerator</span><span class="o">.</span><span class="n">stream_chunk!</span><span class="p">(</span><span class="n">state</span><span class="o">.</span><span class="n">gen</span><span class="p">,</span> <span class="n">data</span><span class="p">)</span> <span class="p">{</span><span class="ss">:ok</span><span class="p">,</span> <span class="n">state</span><span class="p">}</span> 
<span class="k">end</span> <span class="k">def</span> <span class="n">meta</span><span class="p">(</span><span class="n">state</span><span class="p">),</span> <span class="k">do</span><span class="p">:</span> <span class="p">%{</span><span class="ss">gen:</span> <span class="n">state</span><span class="o">.</span><span class="n">gen</span><span class="p">}</span> <span class="k">def</span> <span class="n">close</span><span class="p">(</span><span class="n">state</span><span class="p">,</span> <span class="n">_reason</span><span class="p">)</span> <span class="k">do</span> <span class="no">ThumbnailGenerator</span><span class="o">.</span><span class="n">close</span><span class="p">(</span><span class="n">state</span><span class="o">.</span><span class="n">gen</span><span class="p">)</span> <span class="p">{</span><span class="ss">:ok</span><span class="p">,</span> <span class="n">state</span><span class="p">}</span> <span class="k">end</span> <span class="k">end</span> </code></pre> </div> </div> <p>An upload writer is a behavior that simply ferries the uploaded chunks from the client into whatever we&rsquo;d like to do with them. Here we have a <code>ThumbnailGenerator.open/1</code> which starts a process that communicates with an <code>ffmpeg</code> shell. 
Inside <code>ThumbnailGenerator.open/1</code>, we use regular Elixir process primitives:</p> <div class="highlight-wrapper group relative elixir"> <button type="button" class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-wrap-target="#code-ziskaky4" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"><g buffered-rendering="static"><path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /><path d="M11.081 6.466L9.533 8.037l1.548 1.571" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"> Wrap text </span> </button> <button type="button" class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-copy-target="sibling" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"><g buffered-rendering="static"><path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /><path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"> Copy to clipboard </span> </button> <div 
class='highlight relative group'> <pre class='highlight '><code id="code-ziskaky4"> <span class="c1"># thumbnail_generator.ex</span> <span class="k">def</span> <span class="n">open</span><span class="p">(</span><span class="n">opts</span> <span class="p">\\</span> <span class="p">[])</span> <span class="k">do</span> <span class="no">Keyword</span><span class="o">.</span><span class="n">validate!</span><span class="p">(</span><span class="n">opts</span><span class="p">,</span> <span class="p">[</span><span class="ss">:timeout</span><span class="p">,</span> <span class="ss">:caller</span><span class="p">,</span> <span class="ss">:fps</span><span class="p">])</span> <span class="n">timeout</span> <span class="o">=</span> <span class="no">Keyword</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="n">opts</span><span class="p">,</span> <span class="ss">:timeout</span><span class="p">,</span> <span class="mi">5_000</span><span class="p">)</span> <span class="n">caller</span> <span class="o">=</span> <span class="no">Keyword</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="n">opts</span><span class="p">,</span> <span class="ss">:caller</span><span class="p">,</span> <span class="n">self</span><span class="p">())</span> <span class="n">ref</span> <span class="o">=</span> <span class="n">make_ref</span><span class="p">()</span> <span class="n">parent</span> <span class="o">=</span> <span class="n">self</span><span class="p">()</span> <span class="n">spec</span> <span class="o">=</span> <span class="p">{</span><span class="bp">__MODULE__</span><span class="p">,</span> <span class="p">{</span><span class="n">caller</span><span class="p">,</span> <span class="n">ref</span><span class="p">,</span> <span class="n">parent</span><span class="p">,</span> <span class="n">opts</span><span class="p">}}</span> <span class="p">{</span><span class="ss">:ok</span><span class="p">,</span> <span 
class="n">pid</span><span class="p">}</span> <span class="o">=</span> <span class="no">DynamicSupervisor</span><span class="o">.</span><span class="n">start_child</span><span class="p">(</span><span class="nv">@sup</span><span class="p">,</span> <span class="n">spec</span><span class="p">)</span> <span class="k">receive</span> <span class="k">do</span> <span class="p">{</span><span class="o">^</span><span class="n">ref</span><span class="p">,</span> <span class="p">%</span><span class="no">ThumbnailGenerator</span><span class="p">{}</span> <span class="o">=</span> <span class="n">gen</span><span class="p">}</span> <span class="o">-&gt;</span> <span class="p">%</span><span class="no">ThumbnailGenerator</span><span class="p">{</span><span class="n">gen</span> <span class="o">|</span> <span class="ss">pid:</span> <span class="n">pid</span><span class="p">}</span> <span class="k">after</span> <span class="n">timeout</span> <span class="o">-&gt;</span> <span class="k">exit</span><span class="p">(</span><span class="ss">:timeout</span><span class="p">)</span> <span class="k">end</span> <span class="k">end</span> </code></pre> </div> </div> <p>The details aren&rsquo;t super important here, except for line 10, where we call <code>{:ok, pid} = DynamicSupervisor.start_child(@sup, spec)</code>, which starts a supervised <code>ThumbnailGenerator</code> process. The rest of the implementation simply ferries chunks as stdin into <code>ffmpeg</code> and parses PNGs from stdout. 
Once a PNG delimiter is found in stdout, we send the <code>caller</code> process (our LiveView process) a message saying &ldquo;hey, here&rsquo;s an image&rdquo;:</p> <div class="highlight-wrapper group relative elixir"> <button type="button" class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-wrap-target="#code-y166mubi" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"><g buffered-rendering="static"><path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /><path d="M11.081 6.466L9.533 8.037l1.548 1.571" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"> Wrap text </span> </button> <button type="button" class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-copy-target="sibling" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"><g buffered-rendering="static"><path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /><path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-6px] tail 
[--tail-x:calc(100%-30px)] text-navy-950"> Copy to clipboard </span> </button> <div class='highlight relative group'> <pre class='highlight '><code id="code-y166mubi"><span class="c1"># thumbnail_generator.ex</span> <span class="nv">@png_begin</span> <span class="o">&lt;&lt;</span><span class="mi">137</span><span class="p">,</span> <span class="mi">80</span><span class="p">,</span> <span class="mi">78</span><span class="p">,</span> <span class="mi">71</span><span class="p">,</span> <span class="mi">13</span><span class="p">,</span> <span class="mi">10</span><span class="p">,</span> <span class="mi">26</span><span class="p">,</span> <span class="mi">10</span><span class="o">&gt;&gt;</span> <span class="k">defp</span> <span class="n">handle_stdout</span><span class="p">(</span><span class="n">state</span><span class="p">,</span> <span class="n">ref</span><span class="p">,</span> <span class="n">bin</span><span class="p">)</span> <span class="k">do</span> <span class="p">%</span><span class="no">ThumbnailGenerator</span><span class="p">{</span><span class="ss">ref:</span> <span class="o">^</span><span class="n">ref</span><span class="p">,</span> <span class="ss">caller:</span> <span class="n">caller</span><span class="p">}</span> <span class="o">=</span> <span class="n">state</span><span class="o">.</span><span class="n">gen</span> <span class="k">case</span> <span class="n">bin</span> <span class="k">do</span> <span class="o">&lt;&lt;</span><span class="nv">@png_begin</span><span class="p">,</span> <span class="n">_rest</span><span class="p">::</span><span class="n">binary</span><span class="o">&gt;&gt;</span> <span class="o">-&gt;</span> <span class="k">if</span> <span class="n">state</span><span class="o">.</span><span class="n">current</span> <span class="k">do</span> <span class="n">send</span><span class="p">(</span><span class="n">caller</span><span class="p">,</span> <span class="p">{</span><span class="n">ref</span><span class="p">,</span> <span 
class="ss">:image</span><span class="p">,</span> <span class="n">state</span><span class="o">.</span><span class="n">count</span><span class="p">,</span> <span class="n">encode</span><span class="p">(</span><span class="n">state</span><span class="p">)})</span> <span class="k">end</span> <span class="p">%{</span><span class="n">state</span> <span class="o">|</span> <span class="ss">count:</span> <span class="n">state</span><span class="o">.</span><span class="n">count</span> <span class="o">+</span> <span class="mi">1</span><span class="p">,</span> <span class="ss">current:</span> <span class="p">[</span><span class="n">bin</span><span class="p">]}</span> <span class="n">_</span> <span class="o">-&gt;</span> <span class="p">%{</span><span class="n">state</span> <span class="o">|</span> <span class="ss">current:</span> <span class="p">[</span><span class="n">bin</span> <span class="o">|</span> <span class="n">state</span><span class="o">.</span><span class="n">current</span><span class="p">]}</span> <span class="k">end</span> <span class="k">end</span> </code></pre> </div> </div> <p>The <code>caller</code> LiveView process then picks up the message in a <code>handle_info</code> callback and updates the UI:</p> <div class="highlight-wrapper group relative elixir"> <button type="button" class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-wrap-target="#code-3gf1jq5" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"><g buffered-rendering="static"><path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /><path 
d="M11.081 6.466L9.533 8.037l1.548 1.571" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"> Wrap text </span> </button> <button type="button" class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-copy-target="sibling" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"><g buffered-rendering="static"><path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /><path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"> Copy to clipboard </span> </button> <div class='highlight relative group'> <pre class='highlight '><code id="code-3gf1jq5"><span class="c1"># thumb_live.ex</span> <span class="k">def</span> <span class="n">handle_info</span><span class="p">({</span><span class="n">_ref</span><span class="p">,</span> <span class="ss">:image</span><span class="p">,</span> <span class="n">_count</span><span class="p">,</span> <span class="n">encoded</span><span class="p">},</span> <span class="n">socket</span><span class="p">)</span> <span class="k">do</span> <span class="p">%{</span><span class="ss">count:</span> <span class="n">count</span><span class="p">}</span> <span class="o">=</span> <span class="n">socket</span><span class="o">.</span><span class="n">assigns</span> <span class="p">{</span><span class="ss">:noreply</span><span class="p">,</span> <span class="n">socket</span> <span class="o">|&gt;</span> <span 
class="n">assign</span><span class="p">(</span><span class="ss">count:</span> <span class="n">count</span> <span class="o">+</span> <span class="mi">1</span><span class="p">,</span> <span class="ss">message:</span> <span class="s2">"Generating (</span><span class="si">#{</span><span class="n">count</span> <span class="o">+</span> <span class="mi">1</span><span class="si">}</span><span class="s2">)"</span><span class="p">)</span> <span class="o">|&gt;</span> <span class="n">stream_insert</span><span class="p">(</span><span class="ss">:thumbs</span><span class="p">,</span> <span class="p">%{</span><span class="ss">id:</span> <span class="n">count</span><span class="p">,</span> <span class="ss">encoded:</span> <span class="n">encoded</span><span class="p">})}</span> <span class="k">end</span> </code></pre> </div> </div> <p>The <code>send(caller, {ref, :image, state.count, encode(state)})</code> call is one of the magic parts of Elixir. Everything is a process, and we can message those processes, regardless of their location in the cluster.</p> <p>It&rsquo;s as if every instantiation of an object in your favorite OO language included a cluster-global unique identifier for working with methods on that object. 
The LiveView (a process) simply receives the image message and updates the UI with new images.</p> <p>Now let&rsquo;s head back over to our <code>ThumbnailGenerator.open/1</code> function and make this elastically scalable.</p> <div class="highlight-wrapper group relative diff"> <button type="button" class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-wrap-target="#code-5jadq56a" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"><g buffered-rendering="static"><path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /><path d="M11.081 6.466L9.533 8.037l1.548 1.571" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"> Wrap text </span> </button> <button type="button" class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-copy-target="sibling" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"><g buffered-rendering="static"><path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /><path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /></g></svg> <span class="bubble-sm 
bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"> Copy to clipboard </span> </button> <div class='highlight relative group'> <pre class='highlight '><code id="code-5jadq56a"><span class="gd">- {:ok, pid} = DynamicSupervisor.start_child(@sup, spec) </span><span class="gi">+ {:ok, pid} = FLAME.place_child(Thumbs.FFMpegRunner, spec) </span></code></pre> </div> </div> <p>That&rsquo;s it! Because everything is a process and processes can live anywhere, it doesn&rsquo;t matter what server our <code>ThumbnailGenerator</code> process lives on. It simply messages the caller with <code>send(caller, …)</code> and the messages are sent across the cluster if needed.</p> <p>Once the process exits, either from an explicit close, after the upload is done, or from the end-user closing their browser tab, the FLAME server will note the exit and idle down if no other work is being done.</p> <p>Check out the <a href='https://github.com/fly-apps/thumbnail_generator/blob/main/lib/thumbs/thumbnail_generator.ex' title=''>full implementation</a> if you&rsquo;re interested.</p> <h2 id='remote-monitoring' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#remote-monitoring' aria-label='Anchor'></a><span class='plain-code'>Remote Monitoring</span></h2> <p>All this transient infrastructure needs failsafe mechanisms to avoid orphaning resources. 
If a parent spins up a runner, that runner must take care of idling itself down when no work is present and handle failsafe shutdowns if it can no longer contact the parent node.</p> <p>Likewise, we need to shut down runners when parents are rolled for new deploys, as we must guarantee we&rsquo;re running the same code across the cluster.</p> <p>We also have active callers in many cases that are awaiting the result of work on runners that could go down for any reason.</p> <p>There&rsquo;s a lot to monitor here.</p> <p>There are also a number of failure modes that make this sound like a harrowing experience to implement. Fortunately, Elixir has all the primitives to make this an easy task thanks to the Erlang VM. Namely, we get the following for free:</p> <ul> <li>Process monitoring and supervision – we know when things go bad, whether on a node-local process or one across the cluster </li><li>Node monitoring – we know when nodes come up, and when nodes go away </li><li>Declarative and controlled app startup and shutdown – we carefully control the startup and shutdown sequence of applications as a matter of course. This allows us to gracefully shut down active runners when a fresh deploy is triggered, while giving them time to finish their work </li></ul> <p>We&rsquo;ll cover the internal implementation details in a future deep-dive post. For now, feel free to poke around <a href='https://github.com/phoenixframework/flame' title=''>the flame source</a>.</p> <h2 id='whats-next' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#whats-next' aria-label='Anchor'></a><span class='plain-code'>What&rsquo;s Next</span></h2> <p>We&rsquo;re just getting started with the Elixir FLAME library, but it&rsquo;s ready to try out now. 
In the future, look for more advanced pool growth techniques and deep dives into how the Elixir implementation works. You can also find me <a href='https://twitter.com/chris_mccord' title=''>@chris_mccord</a> to chat about implementing the FLAME pattern in your language of choice.</p> <p>Happy coding!</p> <p>–Chris</p> </content> </entry> <entry> <title>The risks of building apps on ChatGPT</title> <link rel="alternate" href="https://fly.io/blog/the-risks-of-building-apps-on-chatgpt/"/> <id>https://fly.io/blog/the-risks-of-building-apps-on-chatgpt/</id> <published>2023-12-05T00:00:00+00:00</published> <updated>2024-04-12T18:23:39+00:00</updated> <media:thumbnail url="https://fly.io/blog/the-risks-of-building-apps-on-chatgpt/assets/risks-building-on-chatgpt-thumb.webp"/> <content type="html"><div class="lead"><p>If AI will play an essential role in your application, then consider using a self-hosted, open source model instead of a proprietary and externally hosted one. In this post we explore some of the risks of the latter option. We’re Fly.io. We put your code into lightweight microVMs on our own hardware <a href="https://fly.io/docs/reference/regions/" title="">around the world</a>. <a href="https://fly.io/docs/speedrun/" title="">Check us out</a>—your app can be deployed in minutes.</p> </div> <p>The topic of &ldquo;AI&rdquo; gets a lot of attention and press. Coverage ranges from apocalyptic warnings to utopian predictions. The truth, as always, is likely somewhere in the middle. As developers, we are the ones who either imagine ways that AI can be used to enhance our products or do the herculean work of implementing it inside our companies.</p> <p>I believe the following statement to be true:</p> <blockquote> <p>AI won’t replace humans — but humans with AI will replace humans without AI.</p> </blockquote> <p>I believe this can be extended to many products and services and the companies that create them. 
Let&rsquo;s express it this way:</p> <blockquote> <p>AI won’t replace businesses — but businesses with AI will replace businesses without AI.</p> </blockquote> <p>Today I&rsquo;m assuming your business would benefit from using AI. Or, at the very least, your C-levels have decreed from on high that thou must integrateth with AI. With that out of the way, the next question is how you&rsquo;re meant to do it. This post is an argument to build on top of open source language models instead of closed models that you rent access to. We&rsquo;ll take a look at what convinced me.</p> <h2 id='but-openai-is-the-market-leader' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#but-openai-is-the-market-leader' aria-label='Anchor'></a><span class='plain-code'>But OpenAI is the market leader…</span></h2> <p>OpenAI, the creators of the famous ChatGPT, are the strong market leaders in this category. Why wouldn&rsquo;t you want to use the best in the business?</p> <p>Early on, stories of private corporate documents being uploaded by employees and then finding that private information leaking out to general ChatGPT users was a real black eye. <a href='https://www.sciencealert.com/many-companies-are-banning-chatgpt-this-is-why' title=''>Companies began banning employees from using ChatGPT for work</a>. 
It exposed that people&rsquo;s interactions with ChatGPT were being used as training data for future versions of the model.</p> <p>In response, OpenAI recently announced an <a href='https://openai.com/enterprise' title=''>Enterprise</a> offering promising that no Enterprise customer data is used for training.</p> <p>With the top objection addressed, it should be smooth sailing for wide adoption, right?</p> <p>Not so fast.</p> <p>While an Enterprise offering may address that concern, there are other subtle reasons to not use OpenAI, or other closed models, that can&rsquo;t be resolved by vague statements of enterprise privacy.</p> <h2 id='what-are-the-risks-for-building-on-top-of-openai' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#what-are-the-risks-for-building-on-top-of-openai' aria-label='Anchor'></a><span class='plain-code'>What are the risks for building on top of OpenAI?</span></h2> <p>Let&rsquo;s briefly outline the risks we take on when relying on a company like OpenAI for critical AI features in our applications.</p> <ul> <li><strong class='font-semibold text-navy-950'>Single provider risk</strong>: Relying deeply on an external service that plays a critical role in our business is risky. The integration is not easily swapped out for another service if needed. Additionally, we don&rsquo;t want part of our &ldquo;secret sauce&rdquo; to actually be another company&rsquo;s product. That&rsquo;s some seriously shaky ground! They <em>want</em> to sell the same thing to our competitors too. </li><li><strong class='font-semibold text-navy-950'>Regulation or Policy change risk</strong>: &ldquo;AI&rdquo; is being talked about a lot in politics. 
What&rsquo;s acceptable today may be deemed &ldquo;not allowed&rdquo; in the future and a corporation providing a newly regulated service must comply. </li><li><strong class='font-semibold text-navy-950'>Financial risk</strong>: <a href='https://www.washingtonpost.com/technology/2023/06/05/chatgpt-hidden-cost-gpu-compute/' title=''>AI chatbots lose money on every chat.</a> If the financial models that make our business profitable are built on impossible to maintain prices, then our business model may be at risk when it&rsquo;s time to &ldquo;make the AI engine profitable&rdquo; like we&rsquo;ve seen happen time and time again with every industry from cookware to video games. What might the true cost be? We don&rsquo;t know. &lsquo;Nuff said. </li><li><strong class='font-semibold text-navy-950'>Governance and leadership risk</strong>: The co-founder and CEO of OpenAI, <a href='https://openai.com/blog/openai-announces-leadership-transition' title=''>Sam Altman, was forced out of his own company by a coup from his board</a>. This was later resolved with both <a href='https://openai.com/blog/openai-announces-leadership-transition' title=''>Sam Altman and Greg Brockman returning</a>. This exposes another risk we don&rsquo;t often consider with our providers. More on this later. </li></ul> <p>Let&rsquo;s look a bit closer at the &ldquo;Single provider risk&rdquo;.</p> <h2 id='single-provider-risk' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#single-provider-risk' aria-label='Anchor'></a><span class='plain-code'>Single provider risk</span></h2> <p>For hobby usage, proof of concept work, and personal experiments, by all means, use ChatGPT! I do and I expect to continue to as well. 
It&rsquo;s fantastic for prototyping, it&rsquo;s trivial to set up, and it allows you to throw ink on canvas so much more quickly than any other option out there.</p> <p>Up until recently, I was all gung-ho for ChatGPT being integrated into my apps. What happened? November 2023 happened. It was a very bad month for OpenAI.</p> <p>I created a <a href='https://fly.io/phoenix-files/created-my-personal-ai-fitness-trainer-in-2-days/' title=''>Personal AI Fitness Trainer</a> powered by ChatGPT and on the morning of November 8th, I asked my personal trainer about the workout for the day and it failed. OpenAI was having a bad day with an outage.</p> <p>I don&rsquo;t fault someone for having a bad day. At some point, downtime happens to the best of us. And given enough time, it happens to <strong class='font-semibold text-navy-950'>all</strong> of us. But when possible, I want to prevent someone <em>else&rsquo;s</em> bad day from becoming <em>my</em> bad day too.</p> <h3 id='evaluating-a-critical-dependency' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#evaluating-a-critical-dependency' aria-label='Anchor'></a><span class='plain-code'>Evaluating a critical dependency</span></h3> <p>In my case, my personal fitness trainer being unavailable was a minor inconvenience, but I managed. However, it gave me pause. If I had built an AI fitness trainer as a service, that outage would be a much bigger deal and there would be nothing I could have done to fix it until the ChatGPT API came back up.</p> <p>With services like a Personal AI Fitness Trainer, the AI component is the primary focus and main value proposition of the app. That&rsquo;s pretty darn critical! 
If that AI service is interrupted, significantly altered (say, by the model suddenly refusing my requests for fitness information in ways that worked before) or my desired usage is denied (without warning or reason), the application is useless. That&rsquo;s an existential threat that could make my app evaporate overnight without warning.</p> <p>This highlights the risk of having a critical dependency on an external service.</p> <p>Modern applications depend on many services, both internal and external. But how <strong class='font-semibold text-navy-950'>critical</strong> that dependency is matters.</p> <p>Let&rsquo;s take a <em>very</em> simple application as an example. The application has a critical dependency on the database and both the app and database have a critical dependency on the underlying VMs/machines/provider. These critical dependencies are so common that we seldom think about them because we deal with them every day we come to work. It&rsquo;s just how things are.</p> <p><img alt="Diagram showing an application stack of hosting &gt; Database &gt; My Application and weak dependencies on logging, error reporting, etc. Then a critical dependency on an external AI as a Service. " src="/blog/the-risks-of-building-apps-on-chatgpt/assets/critical-dependency-vs-weak.png" /></p> <p>The danger comes when we draw a critical dependency line to an <strong class='font-semibold text-navy-950'>external</strong> <strong class='font-semibold text-navy-950'>service</strong>. If the service has a hiccup or the network between my app and their service starts dropping all my packets, the entire application goes down. Someone else&rsquo;s bad day gets spread around when that happens. 😞</p> <p>In order to protect ourselves from a risk like that, we should diversify our reliance away from a single external provider. How do we do that? 
We&rsquo;ll come back to this later.</p> <h3 id='we-are-not-without-dependencies' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#we-are-not-without-dependencies' aria-label='Anchor'></a><span class='plain-code'>We are not without dependencies</span></h3> <p>It&rsquo;s really common for apps to have external dependencies. The question is how critical to our service are those dependencies?</p> <p>What happens to the application when the external log aggregation service, email service, and error reporting services are all unreachable? If the app is designed well, then users may have a slightly degraded experience or, best case, the users won&rsquo;t even notice the issues at all!</p> <p>The key factor is these external services are not essential to our application functioning.</p> <h2 id='regulation-or-policy-change-risk' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#regulation-or-policy-change-risk' aria-label='Anchor'></a><span class='plain-code'>Regulation or Policy change risk</span></h2> <p>Our industry has a lot of misconceptions, fear, uncertainty, and doubt around the idea of regulation, but sometimes it&rsquo;s justified. I don&rsquo;t want you to think about regulation as a scary thing that yanks away control. Instead, let&rsquo;s think about regulation as when a government body gets involved to disallow businesses from doing or engaging in specific activities. Given that our industry has been so self-defined for so long, this feels like an existential threat. 
However, this is a good thing when we think about vehicle safety standards (you don&rsquo;t want your 4-ton mass of metal exploding while traveling at 70 mph), pollution, health risks, and more. It&rsquo;s a careful balance.</p> <p>Ironically, Sam Altman has been a major proponent <a href='https://www.forbes.com/sites/johannacostigan/2023/06/13/openais-sam-altman-makes-global-call-for-ai-regulation-and-includes-china/?sh=4fc007421b47' title=''>for government regulation</a> of the AI industry. Why would he want that?</p> <p>It turns out that <a href='https://www.cato.org/policy-analysis/regulatory-protectionism-hidden-threat-free-trade' title=''>regulation can also be used as a form of protectionism</a>. Or, put another way, when the people with an early lead see that <a href='https://www.semianalysis.com/p/google-we-have-no-moat-and-neither' title=''>they aren&rsquo;t defensible against advances with open source AI models</a>, they want to pull up the ladders behind them and have the government make it legally harder, or impossible, for competitors to catch up to them.</p> <p>If Altman&rsquo;s efforts are successful, then companies who create AI can expect government involvement and oversight. Added licensing requirements and certifications would raise the cost of starting a competing business.</p> <p>At this point you may be thinking something like &ldquo;but all of that is theoretical Mark, how would this affect my business&rsquo; use of AI today?&rdquo;</p> <p>Introducing an external organization that can dictate changes to an AI product risks breaking an existing company&rsquo;s applications or significantly reducing the effectiveness of the application. And those changes may come without notice or warning.</p> <p>Additionally, if my business is built on an external AI system protected from competition by regulators, that adds a significant risk. 
If they are now the only game in town, they can set whatever price they want.</p> <h2 id='governance-and-leadership-risk' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#governance-and-leadership-risk' aria-label='Anchor'></a><span class='plain-code'>Governance and leadership risk</span></h2> <p>In the week following the OpenAI outage (November 17th, to be precise), the entire tech industry was upended for most of a week by a post on the OpenAI blog <a href='https://openai.com/blog/openai-announces-leadership-transition' title=''>announcing that the OpenAI board had fired the co-founder and CEO, Sam Altman</a>. Then <a href='https://www.forbes.com/sites/richardnieva/2023/11/17/openai-president-and-co-founder-quits-over-sam-altman-firing/?sh=34fe4b621d57' title=''>Greg Brockman, co-founder and acting President, resigned in protest</a>.</p> <p>OpenAI is partnered with Microsoft, and on Nov 20, 2023, <a href='https://twitter.com/satyanadella/status/1726509045803336122' title=''>Satya Nadella (CEO of Microsoft) posted the following on X</a> (formerly Twitter):</p> <blockquote> <p>We remain committed to our partnership with OpenAI (OAI) and have confidence in our product roadmap, our ability to continue to innovate with everything we announced at Microsoft Ignite, and in continuing to support our customers and partners. We look forward to getting to know Emmett Shear and OAI&rsquo;s new leadership team and working with them. 
And <strong class='font-semibold text-navy-950'>we’re extremely excited to share the news that Sam Altman and Greg Brockman, together with colleagues, will be joining Microsoft to lead a new advanced AI research team.</strong> We look forward to moving quickly to provide them with the resources needed for their success.</p> </blockquote> <p>Microsoft nearly <a href='https://en.wikipedia.org/wiki/Acqui-hiring' title=''>acqui-hired</a> OpenAI for $0! That&rsquo;s some serious business Jujutsu.</p> <p>In the end, after 12 days of very public corporate chaos, <a href='https://openai.com/blog/sam-altman-returns-as-ceo-openai-has-a-new-initial-board' title=''>Sam Altman and Greg Brockman returned to OpenAI at their previous leadership positions</a> as if nothing happened (save the firing of the rest of the board).</p> <p>With all the drama and uncertainty resolved, you may say, &ldquo;it all worked out in the end, right? So what&rsquo;s the problem?&rdquo;</p> <p>This highlights the risk of building <em>any</em> critical business system on a product offered and hosted by an external company. When we do that, we implicitly take on all of that company&rsquo;s risks in addition to the risks our business already has! In this case, it&rsquo;s taking on all the risks of OpenAI while getting none of their financial benefits!</p> <h2 id='whats-the-alternative' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#whats-the-alternative' aria-label='Anchor'></a><span class='plain-code'>What&rsquo;s the alternative?</span></h2> <p>The thing big AI providers like OpenAI and Google seem to fear most is competition from open source AI models. And they should be afraid. 
Open source AI models continue to develop at a rapid pace (there are huge incremental improvements on a weekly basis) and, most importantly, they can be self-hosted.</p> <p>Additionally, it&rsquo;s not out of reach for us to <a href='https://huggingface.co/docs/transformers/training' title=''>fine-tune</a> a general model to better fit our needs by adding and removing capabilities rather than hope that the capabilities we need suddenly manifest for us.</p> <p>Doesn&rsquo;t this all sound like the classic argument in favor of open source?</p> <p>If we have the model and can host it ourselves, no one can take it away. When we self-host it, we are protected from:</p> <ul> <li>service interruptions from an external provider for a critical system </li><li>changes in licensing or usage fees (such as your provider suddenly doubling inference costs without warning via an email sent at 3 AM) </li><li>government regulators dictating a change to the model that negatively affects our use case (assuming our use isn&rsquo;t breaking the law, of course) </li><li>company policy changes that change the behavior of the model we rely on </li><li>rogue boards or a leadership crisis that impacts a provider </li></ul> <p>Using an open source and self-hosted model insulates us from these external risks.</p> <h2 id='i-still-need-gpus' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#i-still-need-gpus' aria-label='Anchor'></a><span class='plain-code'>I still need GPUs!</span></h2> <p>Getting dedicated access to a GPU is more expensive than renting limited time on OpenAI&rsquo;s servers. 
That&rsquo;s why a hobby or personal project is better off paying for the brief bits of time when needed.</p> <p>But let&rsquo;s face it.</p> <p>If you really want to integrate AI into your business, you need to host your own models. You can&rsquo;t control third-party privacy policies, but you can control your own policies when you are the one doing your own inference with your own models. Ideally this means getting your own GPUs and incurring the capital and operating expenditures, but thankfully we&rsquo;re in the future. We have the cloud now. There are many options you can use for renting GPU access from other companies. This is supported in the big clouds as well as Fly.io. You can check out our <a href='https://fly.io/docs/about/pricing/#gpus-and-fly-machines' title=''>GPU offerings here</a>.</p> <figure class="post-cta"> <figcaption> <h1>Fly.io also offers GPUs</h1> <p>Running inference on your own hosted models can help de-risk critical AI integrations.</p> <a class="btn btn-lg" href="https://fly.io/docs/about/pricing/#gpus-and-fly-machines"> GPU resource prices </a> </figcaption> <div class="image-container"> <img src="/static/images/cta-turtle.webp" srcset="/static/images/[email protected] 2x" alt=""> </div> </figure> <h2 id='closing-thoughts' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#closing-thoughts' aria-label='Anchor'></a><span class='plain-code'>Closing thoughts</span></h2> <p>It&rsquo;s important to take advantage of AI in our applications so we can reap the benefits. It can give us an important edge in the market! However, we should be extra cautious of building any critical features on a product offered by a proprietary external business. 
<a href='https://www.msn.com/en-us/money/companies/sam-altman-chaos-helped-openai-rivals-says-hugging-face-ceo-cl%C3%A9ment-delangue/ar-AA1kIFQP' title=''>Others are considering the risks of building on OpenAI as well</a>.</p> <p>Your specific level of risk depends on how central the AI aspect is to your business. If it&rsquo;s a central component like in my Personal AI Fitness Trainer, then I risk losing all my customers and even the company if any of the above mentioned risk factors happen to my AI provider. That&rsquo;s an existential risk that I can&rsquo;t do anything about without taking emergency heroic efforts.</p> <p>If the AI is sprinkled around the edges of the business, then suddenly losing it won&rsquo;t kill the company. However, if the AI isn&rsquo;t being well utilized, then the business may be at risk to competitors who place a bigger bet and take a bigger swing with AI.</p> <p>Oh, what interesting times we live in! 🙃</p> </content> </entry> <entry> <title>Print on Demand</title> <link rel="alternate" href="https://fly.io/blog/print-on-demand/"/> <id>https://fly.io/blog/print-on-demand/</id> <published>2023-11-29T00:00:00+00:00</published> <updated>2024-04-12T18:23:39+00:00</updated> <media:thumbnail url="https://fly.io/blog/print-on-demand/assets/print-on-demand-thumb.webp"/> <content type="html"><div class="lead"><p>Save money by using appliance machines to only allocate memory and other machine resources when you actually need them.</p> </div> <p>Scaling discussions often lead to recommendations to add more memory, more CPU, more machines, more regions, more, more, more.</p> <p>This post is different. It focuses instead on the idea of decomposing parts of your applications into event handlers, starting up Machines to handle the events when needed, and stopping them when the event is done. 
Along the way we will see how a few built in Fly.io primitives make this easy.</p> <p>To make the discussion concrete, we are going to focus on a common requirement: generation of PDFs from web pages. The code that we will introduce isn&rsquo;t merely an example produced in support of a blog post - rather it is code that was extracted from a production application, and packaged up into an appliance that you can deploy in minutes to add PDF generation to your existing application.</p> <p>But before we dive in, let&rsquo;s back up a bit.</p> <h2 id='motivation' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#motivation' aria-label='Anchor'></a><span class='plain-code'>Motivation</span></h2> <p>Normally the way this is approached is to start with a tool like <a href='https://github.com/puppeteer/puppeteer' title=''>Puppeteer</a>, <a href='https://github.com/Studiosity/grover#readme' title=''>Grover</a>, <a href='https://playwright.dev/' title=''>Playwright</a>, <a href='https://github.com/bitcrowd/chromic_pdf' title=''>ChromicPDF</a>, or <a href='https://spatie.be/docs/browsershot/v2/introduction' title=''>BrowserShot</a>. These and other tools ultimately launch a browser like <a href='https://developer.chrome.com/articles/new-headless/' title=''>Chrome headless</a>.</p> <p>Now a few things about Chrome itself:</p> <ul> <li>It likely is bigger than your entire web server. </li><li>It likely uses more memory than you see with a typical load on your server. </li><li>All total, people using your server likely spend much less time generating PDFs than they do using the rest of your application. </li></ul> <p>Taken together, this makes splitting PDF generation into a completely separate application an easy win. With a smaller image, your application will start faster. 
Memory usage will be more predictable, and the memory needed to generate PDFs will only be allocated when needed and can be scaled separately.</p> <h2 id='diving-in' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#diving-in' aria-label='Anchor'></a><span class='plain-code'>Diving in</span></h2> <p>Without further ado, the entire application is available on GitHub as <a href='https://github.com/fly-apps/pdf-appliance/#readme' title=''>fly-apps/pdf-appliance</a>. Installation is a simple matter of: clone repository, create app, adjust config, deploy, and scale.</p> <p>Next, you will need to integrate this into your application. All that is needed is to reply to requests that are intended to produce a PDF with a <a href='https://fly.io/docs/reference/dynamic-request-routing/#the-fly-replay-response-header' title=''>fly-replay</a> response header. This can either be done on individual application routes / controller actions, or it can be done globally via either middleware or a front end like <a href='https://www.nginx.com/' title=''>NGINX</a>. You can find a few examples in the <a href='https://github.com/fly-apps/pdf-appliance/#integrate-with-your-existing-application' title=''>README</a>.</p> <p>And, that&rsquo;s it. The most you might consider doing is issuing an additional HTTP request in anticipation of the user selecting what they want to print as this will <a href='https://github.com/fly-apps/pdf-appliance/#preloading-optional' title=''>preload the machine</a>.</p> <figure class="post-cta"> <figcaption> <h1>Scale at your own pace</h1> <p>Deploy your project in a few minutes with Fly Launch. 
Then do more with Fly Machines.</p> <a class="btn btn-lg" href="https://fly.io/docs/"> Run your entire stack near your users </a> </figcaption> <div class="image-container"> <img src="/static/images/cta-cat.webp" srcset="/static/images/[email protected] 2x" alt=""> </div> </figure> <p>If you don&rsquo;t have an application handy, you can try a demo. Go to <a href='https://smooth.fly.dev/' title=''>smooth.fly.dev</a>. Click on Demo, then on Publish, and finally on Invoices to see a PDF. The PDF you see will likely be underwhelming, as you would need to enter students, entries, packages, and options to fill out the page. But click refresh anyway and see how fast it responds. If you want to explore further, links to the <a href='https://smooth.fly.dev/showcase/docs/' title=''>documentation</a> and <a href='https://github.com/rubys/showcase#readme' title=''>code</a> can be found on the front page.</p> <h2 id='implementation-details' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#implementation-details' aria-label='Anchor'></a><span class='plain-code'>Implementation Details</span></h2> <p>The basic flow starts when a request for a PDF comes into your app. That request is replayed to the PDF appliance. A Chrome instance in that app then issues a second request to your app for the same URL minus the <code>.pdf</code> extension and converts the HTML it receives in response into a PDF. That PDF is then returned as the response to the original request.</p> <p>A single Google Chrome instance per machine will be reused across all requests, which itself is faster than starting a new instance per request. 
As all HTTP headers will be passed back to your application, this will seamlessly work with your existing session, cookies, and basic authentication.</p> <p>Starting up a machine on demand is handled by the <code>auto_stop_machines</code> setting in your <code>fly.toml</code>. With this in place, machines can confidently exit when idle, secure in the knowledge that they will be restarted when needed. See the <a href='https://github.com/fly-apps/pdf-appliance/#scaling' title=''>README</a> for more information on scaling.</p> <p>Note that different machines can use different languages and frameworks. This code is written in JavaScript and runs on Bun. It was designed to support a Ruby on Rails app, but can be used with any app.</p> <h2 id='a-reusable-pattern' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#a-reusable-pattern' aria-label='Anchor'></a><span class='plain-code'>A Reusable Pattern</span></h2> <p>If your app is small and your usage is low, scaling may not be much of a concern. But as your needs grow, your first instinct shouldn&rsquo;t merely be to throw more hardware at the problem, but rather to partition the problem so that each machine has a somewhat predictable capacity.</p> <p>Do this by taking a look at your application and looking for requests that are somehow different from the rest. 
Streaming audio and video files, handling websockets, converting text to speech or performing other AI processing, long running &ldquo;background&rdquo; computation, fetching static pages, producing PDFs, and updating databases all have different profiles in terms of server load.</p> <p>It might even be helpful &ndash; purely as a thought experiment &ndash; to think of replacing your main server with a proxy that does nothing more than route requests to separate machines based on the type of workload performed.</p> <p>Once you have come up with an allocation of functions performed to pools of machines, Fly-Replay is but one tool available to you. There is also a <a href='https://fly.io/docs/machines/working-with-machines/' title=''>Machines API</a> that will enable you to orchestrate whatever topology you can come up with. <a href='https://fly.io/laravel-bytes/cost-effective-queue-workers-with-fly-io-machines/' title=''>Cost-Effective Queue Workers With Fly.io Machines</a> gives a preview of what that would look like with Laravel.</p> </content> </entry> <entry> <title>Launching to Victory</title> <link rel="alternate" href="https://fly.io/blog/new-launch/"/> <id>https://fly.io/blog/new-launch/</id> <published>2023-11-28T00:00:00+00:00</published> <updated>2024-04-12T18:23:39+00:00</updated> <media:thumbnail url="https://fly.io/blog/new-launch/assets/thumb.webp"/> <content type="html"><div class="lead"><p>Fly.io is the new public cloud for running your applications near your users so it can be faster than ever. When you create a new application, you use the <code>fly launch</code> command to give the platform all the information it needs to send it out into the sky. We’ve made steps towards making launching a new app <em>even easier</em> because first impressions matter. 
<a href="https://fly.io/docs/speedrun/" title="">Try the new <code>fly launch</code> now</a>; you can have an app up and running in mere minutes.</p> </div> <p>Previously when you ran <code>fly launch</code>, you got asked a bunch of hopefully relevant questions to help you get your app up and running. We&rsquo;ve taken a lot of the guesswork out of the process and made it a lot more streamlined. It turns out that even though Fly.io developers use a variety of frameworks, languages, and toolchains you can fold most of them into a few basic infrastructure shapes.</p> <h2 id='the-new-launch' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#the-new-launch' aria-label='Anchor'></a><span class='plain-code'>The new launch</span></h2> <p>Now when you run <code>fly launch</code>, the CLI will infer what you want based on the source code of your application. For example, if you have a Rails app with SQLite, it&rsquo;ll give you an opinionated set of defaults that you can build from. If you don&rsquo;t, it&rsquo;ll give you other options so you can craft the infrastructure you need. I took one of my older applications named <a href='https://douglas-adams-quotes.fly.dev/' title=''>douglas-adams-quotes</a> and launched it with the new flow. Here&rsquo;s what it looks like:</p> <p><img alt="An animated GIF showing the new fully automated launch process. It starts by guessing what your app is and what needs it has, then presents you with a set of opinionated defaults so that you can confirm or deny. If you confirm it will build your application and deploy it, then give you the URL so you can use it." src="/blog/new-launch/assets/./the-gif-edited.gif" /></p> <p>If the settings it guessed are good enough, you can launch it into the cloud. 
If not, then you&rsquo;ll be taken to a webpage where you can confirm or change the settings it guessed.</p> <p>Once you say yes or confirm on the web, your app will get built and deployed (unless you asked it not to with <code>--no-deploy</code>). You&rsquo;ll get a link to your app so you can go check it out. It&rsquo;s that easy.</p> <h2 id='conclusion' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#conclusion' aria-label='Anchor'></a><span class='plain-code'>Conclusion</span></h2> <p>We hope that this can help you look before you <code>fly launch</code> into the wild unknowns of the cloud.</p> <p>Got any ideas or comments on how we can make this even smoother? Get in touch on our <a href='https://community.fly.io/' title=''>community forum</a>. We&rsquo;d love to hear from you.</p> </content> </entry> <entry> <title>How I Fly</title> <link rel="alternate" href="https://fly.io/blog/how-i-fly/"/> <id>https://fly.io/blog/how-i-fly/</id> <published>2023-11-17T00:00:00+00:00</published> <updated>2023-11-28T14:16:01+00:00</updated> <media:thumbnail url="https://fly.io/blog/how-i-fly/assets/thumb.webp"/> <content type="html"><div class="lead"><p>We are Fly.io. We make it easy to run your programs close to your users. We make it easy to update your programs whenever you need to and communicate between your services in an end-to-end encrypted fashion. Today, Xe is going to tell you what they do to use Fly.io effectively. <a href="https://fly.io/docs/speedrun/" title="">Deploy your first app</a> for free and scale it up to production. That’s what Xe did.</p> </div> <p>I&rsquo;m Xe Iaso. I&rsquo;m a writer, technical educator, and philosopher who focuses on making technology easy to understand and scale to your needs. 
I use Fly.io to host my website and in nearly all of my personal projects now. Fly.io allows me to experiment with new ideas quickly and then deploy them to the world with ease.</p> <h2 id='what-is-fly-io' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#what-is-fly-io' aria-label='Anchor'></a><span class='plain-code'>What is Fly.io?</span></h2> <p>Fly.io lets you host your applications in data centers close to your users. Fly.io also lets you have rolling updates of your programs and facilitates easy communication between your services inside and outside of your organization&rsquo;s private network.</p> <p>I use Fly.io to host my blog, its CDN (named XeDN for reasons which are an exercise for the reader), and a bunch of other supporting services that help make it run. It is easily the most fun I&rsquo;ve had deploying things since I worked at Heroku.</p> <h2 id='my-blog' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#my-blog' aria-label='Anchor'></a><span class='plain-code'>My blog</span></h2> <p>My blog is made up of several parts: the backend blog server and the CDN. Both are written in Go, my favorite programming language. The back-end blog server runs in Toronto, but XeDN runs in 35 datacenters worldwide. I plan to eventually move my blog to be served from XeDN, but for right now it&rsquo;s still comfortably running off of a single server in Toronto.</p> <p><img alt="The entire flow for how things run on Xesite." src="/blog/how-i-fly/assets/./rebuild-flow.svg" /></p> <p>Overall, my website&rsquo;s architecture looks like this. 
My website listens for updates from Patreon and GitHub to trigger rebuilds because of its <a href='https://xeiaso.net/blog/xesite-v4/' title=''>dystatic nature</a>. When I am working on new posts or building new assets, I upload them to Backblaze B2. Anytime someone tries to access one of the files on a XeDN node, the node will download it from Backblaze B2 if it doesn&rsquo;t have it locally already.</p> <p>With Fly.io, I don&rsquo;t have to worry about the user experience being degraded when servers go down. If any individual XeDN server goes down, I can rely on the other XeDN servers worldwide to pick up the slack thanks to the fact that Fly.io will shunt the traffic to the servers that aren&rsquo;t down. Combine this with some very aggressive caching logic for things like video assets, and I can make sure that my blog is fast for everyone, no matter where they are in the world.</p> <p>Of course, it doesn&rsquo;t end here. My CDN server is the back end that helps make my other projects work too. I spent some time working on a <a href='https://xeiaso.net/blog/iaso-fonts/' title=''>custom font</a> for all of my web properties, and I <a href='https://cdn.xeiaso.net/static/pkg/iosevka/specimen.html' title=''>serve it from my CDN</a> so that I can use it in every project of mine. This allows me to integrate it into other projects like <a href='https://arsene.fly.dev/' title=''>Arsène</a> without having to do anything special.</p> <h2 id='building-on-top-of-projects-with-fly-io' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#building-on-top-of-projects-with-fly-io' aria-label='Anchor'></a><span class='plain-code'>Building on top of projects with Fly.io</span></h2> <p>I like making projects that aren&rsquo;t entirely serious. 
I love using these projects to explore aspects and bits of technology that I would have never gotten to play with before. One of these is <a href='https://arsene.fly.dev' title=''>Arsène</a>, a project I used to explore what a &ldquo;dead internet&rdquo; powered by AI could look like.</p> <p>Every 12 hours, Arsène will have the ChatGPT API generate new posts and then use Stable Diffusion to create a (hopefully relevant) illustration for that post. I run a copy of the <a href='https://github.com/AUTOMATIC1111/stable-diffusion-webui' title=''>Automatic1111</a> Stable Diffusion API in my private network. When Arsène generates an image, it reaches out to that Stable Diffusion API directly over that private network to make the calls it needs. Since XeDN is in the same private network, I can also have Arsène send the images there to be cached and served all over the world.</p> <p>Here&rsquo;s what the total flow looks like:</p> <p><img alt="The flow of data for Arsène, showing how this lets me reuse projects" src="/blog/how-i-fly/assets/./reuse-flow.svg" /></p> <p>This means that when I am creating things, I am not just making one-off things that don&rsquo;t work with each other. I am creating individual building blocks that interoperate with each other. 
I am creating opportunities for me to reuse my infrastructure to create brand new things that are robust and scalable with minimal effort on my end.</p> <h2 id='my-other-projects' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#my-other-projects' aria-label='Anchor'></a><span class='plain-code'>My other projects</span></h2> <p>I have some other projects that I&rsquo;m working on that I don&rsquo;t want to get into too much detail about yet, but they&rsquo;re mostly going to involve transforming the basic ideas of using my CDN for distributing things and a webserver for sending HTML to users in new and interesting ways. I love using Fly.io for this because I am just allowed to create things instead of having to worry about how to implement them, where state is going to be stored, or how I&rsquo;m going to scale them.</p> <div class="callout"><p>Fly.io is the only platform I’ve used where I can spin up 35 copies of a program as easily as one copy of a program.</p> </div><h2 id='conclusion' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#conclusion' aria-label='Anchor'></a><span class='plain-code'>Conclusion</span></h2> <p>If you haven&rsquo;t given Fly.io a try yet, you&rsquo;re really missing out. It is utterly trivial to deploy your application across the globe. Not to mention, when your applications are idle, you can have them scale down to zero copies. This means that you only pay for what you actually use. 
I don&rsquo;t have to worry about overpaying for my blog by having a giant server in Helsinki running 24/7, even though I&rsquo;m only using a small sliver of it.</p> <p>If you want to learn more about Fly.io, you can check out <a href='https://fly.io' title=''>fly.io</a>. My CDN cost me nothing until I started adding cover art per post and the <a href='https://xeiaso.net//blog/how-mara-works-2020-09-30/' title=''>conversation snippets</a> with furry stickers. It definitely went over the bar when I started uploading video. I can see it scaling in the future as my demands scale too.</p> <p>Of course, this is barely even scratching the surface. Stay tuned for secret tricks you can use to dynamically spin up and spin down machines as you need. Imagine uploading an image, automatically creating a machine to handle compressing it, and uploading it to your storage back end. Imagine what you could do if compute was a faucet that you could turn on and off as you needed it.</p> <p>You can do it on Fly.io. Try it today, you can run an app on a 256 MB Machine for free. XeDN ran on three 256 MB Machines for a year. Arsène still runs on a 256 MB Machine to this day. It&rsquo;s more than enough for what you&rsquo;re going to do. And when it isn&rsquo;t, scaling up is <a href='https://fly.io/docs/about/pricing/' title=''>cheaper than you can imagine</a>.</p> </content> </entry> <entry> <title>Transcribing on Fly GPU Machines</title> <link rel="alternate" href="https://fly.io/blog/transcribing-on-fly-gpu-machines/"/> <id>https://fly.io/blog/transcribing-on-fly-gpu-machines/</id> <published>2023-11-13T00:00:00+00:00</published> <updated>2024-04-12T18:23:39+00:00</updated> <media:thumbnail url="https://fly.io/blog/transcribing-on-fly-gpu-machines/assets/whispering-thumb.webp"/> <content type="html"><div class="lead"><p>Fly.io has GPUs! 
If you want to run AI (or whatever) workloads, check out how to <a href="https://fly.io/docs/gpus/gpu-quickstart/" title="">get started with GPU Machines</a>!</p> </div> <p>Fly.io has GPU Machines, which means we can finally <del>play games</del> <del>mine bitcoin</del> <del>baghold NFTs</del> run AI workloads with just a few API calls.</p> <p>This is exciting! Running GPU workloads yourself is useful when the community™ builds upon available models to make them faster, more useful, or less restrictive than first-party APIs.</p> <p>One such tool is the <a href='https://github.com/ahmetoner/whisper-asr-webservice' title=''>Whisper Webservice</a>, which is conveniently packaged in a way that makes it a good candidate to use on Fly GPU Machines.</p> <p>Let&rsquo;s see how to use Fly.io GPUs by spinning up Whisper Webservice.</p> <h2 id='whisper-webservice' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#whisper-webservice' aria-label='Anchor'></a><span class='plain-code'>Whisper Webservice</span></h2> <p>Whisper is OpenAI&rsquo;s voice recognition service - it&rsquo;s used for audio transcription. 
To use it anywhere that&rsquo;s not OpenAI&rsquo;s platform, you need <a href='https://github.com/openai/whisper' title=''>some Python</a>, a few GB of storage, and (preferably) a GPU.</p> <p>The aforementioned <a href='https://github.com/ahmetoner/whisper-asr-webservice' title=''>Whisper Webservice</a> packages this up for us, while making Whisper faster, more useful, and less restricted than OpenAI&rsquo;s API:</p> <ol> <li>It provides a web API on top of Whisper&rsquo;s Python library </li><li>It (optionally) integrates <a href='https://github.com/guillaumekln/faster-whisper' title=''>faster-whisper</a> to make it, you know, faster </li><li>It (optionally) uses FFmpeg to process the uploaded audio file, useful for getting audio out of video files or converting audio formats </li></ol> <p>Luckily for us, and totally <strong class='font-semibold text-navy-950'>not</strong> why I chose this as an example - the project provides GPU-friendly Docker images. We&rsquo;ll use those to spin up Fly GPU Machines and process some audio files.</p> <p>(I&rsquo;ll also show examples of making your own Docker image!)</p> <h2 id='running-a-gpu-machine' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#running-a-gpu-machine' aria-label='Anchor'></a><span class='plain-code'>Running a GPU Machine</span></h2> <p>Spinning up a GPU Machine is very similar to any other Machine. 
The main difference is the new &ldquo;GPU kind&rdquo; option (<code>--vm-gpu-kind</code>), which takes 2 possible values:</p> <ol> <li><code>a100-pcie-40gb</code> </li><li><code>a100-sxm4-80gb</code> </li></ol> <p>These are 2 flavors of Nvidia A100 GPUs; the difference worth caring about is <code>40</code> vs <code>80</code> GB of memory (here&rsquo;s <a href='https://fly.io/docs/about/pricing/#gpus-and-fly-machines' title=''>pricing</a>).</p> <p>We&rsquo;ll create machines using <code>a100-pcie-40gb</code> because we don&rsquo;t need 80 freakin&rsquo; GB for what we&rsquo;re doing.</p> <p>Using <code>flyctl</code> is a great way to run a GPU Machine. We&rsquo;ll make an app and run the conveniently created <a href='https://hub.docker.com/r/onerahmet/openai-whisper-asr-webservice' title=''>Whisper Webservice Docker image</a> that supports Nvidia GPUs. The <code>flyctl</code> commands will default us into a <code>performance-8x</code> server size (8 CPUs, 16 GB RAM) unless we specify something different.</p> <p><strong class='font-semibold text-navy-950'>One caveat:</strong> AI model files are big. Docker images ideally aren&rsquo;t big - sending huge layers across the network angers the spiteful networking gods. If you shove models into your Docker images, you <em>might</em> have a bad time.</p> <p>We suggest creating a Fly Volume and making your Docker image download needed models when it first spins up. The Whisper service (and in my experience, OpenAI&rsquo;s Python library) does that for us.</p> <p>So, we&rsquo;ll create a volume to house (and cache) the models. In the case of the Whisper project, the models get placed in <code>/root/.cache/whisper</code> on its first boot, and so we&rsquo;ll mount our disk there.</p> <p>Alright, let&rsquo;s create a GPU Machine. 
Here&rsquo;s what the process looks like:</p> <div class="highlight-wrapper group relative bash"> <div class='highlight relative group'> <pre class='highlight '><code 
id="code-chwf29f7"><span class="nv">APP_NAME</span><span class="o">=</span><span class="s2">"whispering-zines"</span>

fly apps create <span class="nv">$APP_NAME</span> <span class="nt">-o</span> personal

<span class="c"># We "hint" --vm-gpu-kind so the volume</span>
<span class="c"># is provisioned on a GPU host</span>
<span class="c"># We choose region ord, where most Fly GPUs</span>
<span class="c"># currently live</span>
fly volumes create whisper_zine_cache <span class="nt">-s</span> 10 <span class="se">\</span>
    <span class="nt">-a</span> <span class="nv">$APP_NAME</span> <span class="nt">-r</span> ord <span class="nt">--vm-gpu-kind</span> a100-pcie-40gb

<span class="c"># Take note of the volume ID from the output ^</span>

<span class="c"># Run a machine that can accept web requests</span>
<span class="c"># from the public internet</span>
fly machines run onerahmet/openai-whisper-asr-webservice:latest-gpu <span class="se">\</span>
    <span class="nt">--vm-gpu-kind</span> a100-pcie-40gb <span class="se">\</span>
    <span class="nt">-p</span> 443:9000/tcp:tls:http <span class="nt">-p</span> 80:9000/tcp:http <span class="se">\</span>
    <span class="nt">-r</span> ord <span class="se">\</span>
    <span class="nt">-v</span> &lt;VOLUME_ID&gt;:/root/.cache/whisper <span class="se">\</span>
    <span class="nt">-e</span> <span class="nv">ASR_MODEL</span><span class="o">=</span>large <span class="nt">-e</span> <span class="nv">ASR_ENGINE</span><span class="o">=</span>faster_whisper <span class="se">\</span>
    <span class="nt">-a</span> <span class="nv">$APP_NAME</span>

<span class="c"># Allocate IPs so we can view it on the web</span>
fly ips allocate-v4 <span class="nt">--shared</span> <span class="nt">-a</span> <span class="nv">$APP_NAME</span>
fly ips allocate-v6 <span class="nt">-a</span> <span class="nv">$APP_NAME</span>
</code></pre> </div> </div> <p>That&rsquo;s all pretty standard for Fly Machines, <strong class='font-semibold text-navy-950'>except</strong> for the 
<code>--vm-gpu-kind</code> flags used both for volume <strong class='font-semibold text-navy-950'>and</strong> Machine creation. Volumes are pinned to specific hosts - using this flag tells Fly.io to create the volume on a GPU host. Assuming we set the same region (<code>-r ord</code>), creating a GPU Machine with the just-created volume will tell Fly.io to place the Machine on the same host as the volume.</p> <p><strong class='font-semibold text-navy-950'>Note:</strong> As my machine started up, I saw a log line <code>WARNING: The NVIDIA Driver was not detected. GPU functionality will not be available.</code>, which ended up being an issue of timing. Once everything was running, I was able to see things were working by using <code>fly ssh console -a $APP_NAME</code> and running the command <code>nvidia-smi</code> to confirm that the VM had a GPU. It also showed that the web service (Python in this case) was running as a GPU process.</p> <p>Once everything is running, you should be able to head to <code>$APP_NAME.fly.dev</code> and view it in the browser.</p> <p>The Whisper Webservice UI will let you try out individual calls in its API. This will also give you the information you need to make those calls from your code. There&rsquo;s a link to the API specification (e.g. 
<code>$APP_NAME.fly.dev/openapi.json</code>) you can use to, say, have <a href='https://www.blobr.io/post/create-api-specs-chatgpt' title=''>ChatGPT generate a client</a> in your language of choice.</p> <h2 id='automating-gpu-machines' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#automating-gpu-machines' aria-label='Anchor'></a><span class='plain-code'>Automating GPU Machines</span></h2> <p>If you want to automate this, you can use the <a href='https://fly.io/docs/machines/working-with-machines/' title=''>Machines API</a> (spec <a href='https://docs.machines.dev/swagger/index.html' title=''>here</a>).</p> <p>An easy way to get started is to spy on the API requests <code>flyctl</code> is making:</p> <div class="highlight-wrapper group relative bash"> <div class='highlight relative group'> <pre class='highlight '><code id="code-13v3zt2f"><span class="c"># Debug logs will output the API requests / responses</span>
<span class="c"># made to Fly.io's API.</span>
<span class="nv">LOG_LEVEL</span><span class="o">=</span>debug flyctl machine run ...
</code></pre> </div> </div> <p>This helped me figure out why my own initial API attempts failed - it turns out we need some extra parameters in the <code>compute</code> portion of the request JSON for creating a volume, and in the <code>guest</code> section for creating a Machine.</p> <p>For both volumes and Machines, we set the <code>gpu_kind</code> the same way we did in our <code>flyctl</code> command. However, we <em>also</em> need the <code>cpu_kind</code> to be set. 
Additionally, when creating a Machine, we need to set <code>cpus</code> and <code>memory_mb</code> to <a href='https://fly.io/docs/machines/guides-examples/machine-sizing/' title=''>valid values</a> for <code>performance</code> Machines.</p> <div class="highlight-wrapper group relative bash"> <div class='highlight relative group'> <pre class='highlight '><code id="code-e14p7s3k"><span class="nv">APP_NAME</span><span class="o">=</span><span class="s2">"whispering-zines"</span>

<span class="c"># Create a volume on a GPU host. Specify both</span>
<span class="c"># cpu_kind and gpu_kind</span>
curl <span class="nt">-H</span> <span class="s2">"Authorization: Bearer </span><span class="sb">`</span>fly auth token<span class="sb">`</span><span class="s2">"</span> <span class="se">\</span>
    <span class="nt">-H</span> <span class="s2">"Accept: application/json"</span> <span class="se">\</span>
    <span class="nt">-H</span> <span class="s2">"Content-Type: application/json"</span> <span class="se">\</span>
    https://api.machines.dev/v1/apps/<span class="nv">$APP_NAME</span>/volumes <span class="se">\</span>
    <span class="nt">-d</span> <span class="s1">'{
      "name": "whisper_zine_cache",
      "region": "ord",
      "size_gb": 10,
      "compute": {
        "cpu_kind": "performance",
        "gpu_kind": "a100-pcie-40gb"
      }
    }'</span>

<span class="c"># Take note of the volume ID from the response ^</span>

<span class="c"># Run a machine that can accept web requests</span>
<span class="c"># from the public internet.</span>
curl <span class="nt">-H</span> <span class="s2">"Authorization: Bearer </span><span class="sb">`</span>fly auth token<span class="sb">`</span><span class="s2">"</span> <span class="se">\</span>
    <span class="nt">-H</span> <span class="s2">"Accept: application/json"</span> <span class="se">\</span>
    <span class="nt">-H</span> <span class="s2">"Content-Type: application/json"</span> <span class="se">\</span>
    https://api.machines.dev/v1/apps/<span class="nv">$APP_NAME</span>/machines <span class="se">\</span>
    <span class="nt">-d</span> <span class="s1">'{
      "region": "ord",
      "config": {
        "env": {
          "ASR_ENGINE": "faster_whisper",
          "ASR_MODEL": "large",
          "FLY_PROCESS_GROUP": "app",
          "PRIMARY_REGION": "ord"
        },
        "mounts": [
          {
            "path": "/root/.cache/whisper",
            "volume": "&lt;VOLUME_ID&gt;",
            "name": "data"
          }
        ],
        "services": [
          {
            "protocol": "tcp",
            "internal_port": 9000,
            "autostop": false,
            "ports": [
              { "port": 80, "handlers": [ "http" ], "force_https": true },
              { "port": 443, "handlers": [ "http", "tls" ] }
            ]
          }
        ],
        "image": "onerahmet/openai-whisper-asr-webservice:latest-gpu",
        "guest": {
          "cpus": 8,
          "memory_mb": 16384,
          "cpu_kind": "performance",
          "gpu_kind": "a100-pcie-40gb"
        }
      }
    }'</span>
</code></pre> </div> </div> <p>After that, we can assign the app some IPs. You can use <code>flyctl</code> for this, or the <a href='https://api.fly.io/graphql' title=''>GraphQL API</a>. You can once again use debug mode with <code>flyctl</code> to see what API calls it makes. Side note: Eventually the Machines REST API will include the ability to allocate IP addresses.</p> <div class="highlight-wrapper group relative bash"> <div class='highlight relative group'> <pre class='highlight '><code id="code-v5sntcmu">fly ips allocate-v4 <span class="nt">--shared</span> <span class="nt">-a</span> <span class="nv">$APP_NAME</span>
fly ips allocate-v6 <span class="nt">-a</span> <span class="nv">$APP_NAME</span>
</code></pre> </div> </div> <p>If you&rsquo;re doing this type of work for your business, you may want to keep these Machines inside a private network anyway, in which case you won&rsquo;t be assigning the app IP addresses.</p> <h2 id='making-your-own-images' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#making-your-own-images' aria-label='Anchor'></a><span class='plain-code'>Making Your Own Images</span></h2> <p>There is, luckily (for me, a hardware ignoramus), less dark magic to making GPU-friendly Docker images than you might think. 
Basically, you just need to install the correct Nvidia drivers.</p> <p>A way to cheat at this is to run <a href='https://github.com/NVIDIA/nvidia-container-toolkit/tree/main' title=''>Nvidia CUDA base images</a>, but if you&rsquo;re made of sterner stuff, you can also start with a base Ubuntu image and install your own.</p> <p>While the Whisper webservice image is based on <code>nvidia/cuda:11.8.0-cudnn8-runtime-ubuntu22.04</code>, I got Whisper (plain, not the webservice) working with <code>ubuntu:22.04</code>:</p> <div class="highlight-wrapper group relative dockerfile"> <button type="button" class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-wrap-target="#code-qjwtp7g3" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"><g buffered-rendering="static"><path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /><path d="M11.081 6.466L9.533 8.037l1.548 1.571" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"> Wrap text </span> </button> <button type="button" class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-copy-target="sibling" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"><g buffered-rendering="static"><path d="M10.576 
7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /><path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"> Copy to clipboard </span> </button> <div class='highlight relative group'> <pre class='highlight '><code id="code-qjwtp7g3"><span class="c"># Base image</span>
<span class="k">FROM</span><span class="s"> ubuntu:22.04</span>

<span class="k">RUN </span>apt update <span class="nt">-q</span> <span class="o">&amp;&amp;</span> apt <span class="nb">install</span> <span class="nt">-y</span> ca-certificates wget <span class="se">\
</span>    <span class="o">&amp;&amp;</span> wget <span class="nt">-qO</span> /cuda-keyring.deb https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-keyring_1.1-1_all.deb <span class="se">\
</span>    <span class="o">&amp;&amp;</span> dpkg <span class="nt">-i</span> /cuda-keyring.deb <span class="o">&amp;&amp;</span> apt update <span class="nt">-q</span> <span class="se">\
</span>    <span class="o">&amp;&amp;</span> apt <span class="nb">install</span> <span class="nt">-y</span> <span class="nt">--no-install-recommends</span> ffmpeg libcudnn8 libcublas-12-2 <span class="se">\
</span>    git python3 python3-pip

<span class="k">WORKDIR</span><span class="s"> /app</span>

<span class="k">COPY</span><span class="s"> audio.mp3 /app/audio.mp3</span>
<span class="k">COPY</span><span class="s"> run.py /app/run.py</span>

<span class="k">CMD</span><span class="s"> ["python3", "run.py"]</span>
</code></pre> </div> </div> <p>You can find a full, <a href='https://github.com/fly-apps/whisper-example' title=''>working version of this here</a>.</p> <h2 id='this-time-its-different-i-guess' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 
text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#this-time-its-different-i-guess' aria-label='Anchor'></a><span class='plain-code'>This time it&rsquo;s different, I guess</span></h2> <p>AI feels a bit different than previous trends in that it has immediately-obvious benefits. No one needs to throw around catchy phrases with a wink-wink nudge-nudge (&ldquo;we like the art&rdquo;) for us to find value.</p> <p>Since AI workloads work most efficiently in GPUs, they remain a hot commodity. For those of us who didn&rsquo;t purchase enough $NVDA to retire, we can bring more value to our businesses by adding in AI.</p> <p>Fly Machines have always been a great little piece of tech to run &ldquo;ephemeral compute workloads&rdquo; (wait, do I work at AWS!?) - and this is what I like about GPU Machines. You can mix and match all sorts of AI stuff together to make a chain of useful tools!</p> </content> </entry> <entry> <title>Skip the API, Ship Your Database</title> <link rel="alternate" href="https://fly.io/blog/skip-the-api/"/> <id>https://fly.io/blog/skip-the-api/</id> <published>2023-09-13T00:00:00+00:00</published> <updated>2023-11-21T21:08:37+00:00</updated> <media:thumbnail url="https://fly.io/blog/skip-the-api/assets/skip-the-api-thumb.webp"/> <content type="html"><div class="lead"><p>With Fly.io, <a href="https://fly.io/docs/speedrun/" title="">you can get your app running globally in a matter of minutes</a>, and with LiteFS, you can run SQLite alongside your app! Now we’re introducing LiteFS Cloud: managed backups and point-in-time restores for LiteFS. <a href="https://fly.io/docs/litefs/speedrun/" title="">Try it out for yourself</a>!</p> </div> <p>My favorite part about building tools is discovering their unintended uses. 
It&rsquo;s like starting to write a murder mystery book but you have no idea who the killer is!</p> <p>History is filled with examples of these accidental discoveries: WD-40 was originally <a href='https://en.wikipedia.org/wiki/WD-40#History' title=''>used to protect ICBMs from rust</a> and now it fixes your squeaky doorknob. Bubble wrap was <a href='https://en.wikipedia.org/wiki/Bubble_Wrap_(brand)#History' title=''>originally sold as wallpaper</a> and now it protects your Amazon packages.</p> <p>When we started writing <a href='https://fly.io/docs/litefs/' title=''>LiteFS</a>, a distributed SQLite database, we thought it would be used to distribute data geographically so users in, say, Bucharest see response times as fast as users in San Jose. And for the most part, that&rsquo;s what LiteFS users are doing.</p> <p>But we discovered another unexpected use: replacing the API layer between services with SQLite databases.</p> <h2 id='how-it-started' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#how-it-started' aria-label='Anchor'></a><span class='plain-code'>How it started</span></h2> <p>In the early days of LiteFS development, we wanted to find a real-world test bed for our tool so we could smoke out any bugs that we didn&rsquo;t find during automated tests. Part of our existing infrastructure is a program called <em>Corrosion</em> that gossips state between all our servers. Corrosion tracks VM statuses, health checks, and a plethora of other information for each server and communicates this info with other servers so they can make intelligent decisions about request routing and VM placement. Corrosion keeps a fast, local copy of all this data in a SQLite database.</p> <p>So we set up a Corrosion instance that also ran on top of LiteFS. 
This helped root out some bugs but we also found another use for it: making Corrosion accessible to our internal services.</p> <p><img src="/blog/skip-the-api/assets/corrosion.png" /></p> <h2 id='shipping-the-kitchen-sink' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#shipping-the-kitchen-sink' aria-label='Anchor'></a><span class='plain-code'>Shipping the kitchen sink</span></h2> <p>The typical approach to making data available between services is to spend weeks designing an API and then building a service around it. Your API design needs to take into account the different use cases of each consuming service so that it can deliver the data it needs efficiently. You don&rsquo;t want your clients making a dozen API calls for every request!</p> <p><img src="/blog/skip-the-api/assets/architecture.png" /></p> <p>A different approach is to skip the API design entirely and just ship the entire database to your client. You don&rsquo;t need to consider the consuming service&rsquo;s access patterns as they can use vanilla SQL to query and join whatever data their heart desires. That&rsquo;s what we did using LiteFS.</p> <p>While we could have set up each downstream service as a Corrosion node, gossip protocols can be chatty and we really just needed a one-way stream of updates. 
Setting up a read-only LiteFS instance for a new service is simple—it just needs the hostname of the upstream primary node to connect to:</p> <div class="highlight-wrapper group relative "> <button type="button" class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-wrap-target="#code-e631uyyz" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"><g buffered-rendering="static"><path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /><path d="M11.081 6.466L9.533 8.037l1.548 1.571" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"> Wrap text </span> </button> <button type="button" class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-copy-target="sibling" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"><g buffered-rendering="static"><path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /><path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"> Copy to 
clipboard </span> </button> <div class='highlight relative group'> <pre class='highlight '><code id="code-e631uyyz">lease:
  type: "static"
  candidate: false
  advertise-url: "http://corrosion-bridge:20202"
</code></pre> </div> </div> <p>And voila! You have a full, read-only copy of the database on your app.</p> <h2 id='moving-compute-to-the-client' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#moving-compute-to-the-client' aria-label='Anchor'></a><span class='plain-code'>Moving compute to the client</span></h2> <p>API design is notoriously difficult as it&rsquo;s hard to know what your consuming services will need. Query languages such as <a href='https://graphql.org/' title=''>GraphQL</a> have even been invented for this specific problem!</p> <p>However, GraphQL has its own limitations. It&rsquo;s good for fetching raw data but it lacks built-in <a href='https://www.sqlite.org/lang_aggfunc.html' title=''>aggregation</a> &amp; advanced querying capabilities like <a href='https://www.sqlite.org/windowfunctions.html' title=''>windowing</a>. GraphQL is typically layered on top of an existing relational database that uses SQL. So why not just use SQL?</p> <p>Additionally, performing queries on your service means that you need to handle multiple tenants competing for compute resources. Managing these tenants involves rate limiting and query timeouts so that no one client consumes all the resources.</p> <p>By pushing a read-only copy of the database to clients, these restrictions aren&rsquo;t a concern anymore. A tenant can use 100% of its CPU for hours if it wants to. 
It won&rsquo;t adversely affect any other tenant because the query is running on its own hardware.</p> <h2 id='so-whats-the-downside' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#so-whats-the-downside' aria-label='Anchor'></a><span class='plain-code'>So what&rsquo;s the downside?</span></h2> <p>There are always trade-offs with any technology and shipping read-only replicas is no different. One obvious limitation of read-only replicas is that they&rsquo;re read-only. If your clients need to update data, they&rsquo;ll still need an API for those mutations.</p> <p>A less obvious downside is that the contract for a database can be less strict than an API. One benefit to an API layer is that you can change the underlying database structure but still massage data to look the same to clients. When you&rsquo;re shipping the raw database, that becomes more difficult. Fortunately, many database changes, such as adding columns to a table, are backwards compatible so clients don&rsquo;t need to change their code. Database views are also a great way to reshape data so it stays consistent—even when the underlying tables change.</p> <p>Finally, shipping a database limits your ability to restrict access to data. If you have a multi-tenant database, you can&rsquo;t ship that database without the client seeing all the data. One workaround for this is to use a database per tenant. SQLite databases are lightweight since they are just files on disk. 
This also has the added benefit of preventing queries in your application from accidentally fetching data across tenants.</p> <h2 id='where-do-we-take-this-next' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#where-do-we-take-this-next' aria-label='Anchor'></a><span class='plain-code'>Where do we take this next?</span></h2> <p>While this approach has worked well for some internal tooling, how does this look in the broader world of software? APIs are likely to stick around for the foreseeable future, so providing read-only database replicas makes sense for specific use cases where those APIs aren&rsquo;t a great fit.</p> <p>Imagine being able to query all your Stripe data or your GitHub data from a local database. You could join that data on to your own dataset and perform fast queries on your own hardware.</p> <p>While companies such as Stripe or GitHub likely colocate their tenant data into one database, many companies run an event bus using tools like Kafka which could allow them to generate per-tenant SQLite databases to then stream to customers.</p> <p>Pushing queries out to the end user has huge benefits for both the data provider &amp; the data consumer in terms of flexibility and power.</p> </content> </entry> <entry> <title>Automated Sentry Error Tracking</title> <link rel="alternate" href="https://fly.io/blog/sentry-partnership/"/> <id>https://fly.io/blog/sentry-partnership/</id> <published>2023-09-12T00:00:00+00:00</published> <updated>2023-11-21T21:08:37+00:00</updated> <media:thumbnail url="https://fly.io/blog/sentry-partnership/assets/sentry-thumb.webp"/> <content type="html"><div class="lead"><p>We’re Fly.io. 
We put your code into lightweight microVMs on our own hardware <a href="https://fly.io/docs/reference/regions/" title="">around the world</a>, close to your users. We partnered with <a href="https://sentry.io" title="">Sentry</a> to bring error and performance monitoring to your apps. Deploy your first app, and automatically get a year’s worth of credits for Sentry’s <a href="https://sentry.io/pricing/" title="">Team Plan</a>. <a href="https://fly.io/docs/speedrun/" title="">Check us out</a>—your app can be deployed and instrumented in minutes.</p> </div> <p>We&rsquo;ve been using Sentry since the dawn of the internet. Or at least as far back as the <a href='https://home.cern/science/physics/higgs-boson/how' title=''>discovery</a> of the Higgs boson. Project to project, the familiar Sentry issue detail screen has been our faithful debugging companion.</p> <p>Today is no exception: All of our Golang, Elixir, Ruby and Rust services report dutifully to Sentry.</p> <p>So, it felt natural to integrate Sentry as the default error monitoring tool. All new deployments on Fly.io get a Sentry project provisioned automatically. Existing apps can grab theirs with <code>flyctl ext sentry create</code>.</p> <p>Each Fly.io organization receives, for one year, a generous monthly quota:</p> <ul> <li>50,000 Error events </li><li>100,000 Performance units </li><li>500 Session Replays </li><li>1GB of storage for Attachments </li></ul> <p>Once your app is instrumented, you’ll automatically get notified of errors, latency issues, and crashes as soon as they occur in production. 
Sentry’s Team plan also gives you access to over 40 integrations, unlimited seats, and custom alerting.</p> <h2 id='auto-instrumenting-rails' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#auto-instrumenting-rails' aria-label='Anchor'></a><span class='plain-code'>Auto-instrumenting Rails</span></h2> <p>To see Sentry in action, let&rsquo;s launch our <a href='https://github.com/fly-apps/boomer' title=''>Boomer Rails App</a>. Yes kids, Rails is old school, and it&rsquo;s the easiest framework to auto-instrument.</p> <p>When <code>flyctl launch</code> detects a Rails app, the app is automatically set up to use a freshly minted Sentry project. Gems are installed, initializers planted, and finally, the <code>SENTRY_DSN</code> secret is set for deployment. We redacted some output for brevity.</p> <div class="highlight-wrapper group relative cmd"> <button type="button" class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-wrap-target="#code-onfm6lp2" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"><g buffered-rendering="static"><path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /><path d="M11.081 6.466L9.533 8.037l1.548 1.571" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"> Wrap text </span> </button> <button type="button" class="bubble-wrap z-20 absolute right-1.5 top-1.5 
text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-copy-target="sibling" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"><g buffered-rendering="static"><path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /><path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"> Copy to clipboard </span> </button> <div class='highlight relative group'> <pre class='highlight cmd'><code id="code-onfm6lp2">fly deploy </code></pre> </div> </div><div class="highlight-wrapper group relative "> <button type="button" class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-wrap-target="#code-93jth4av" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"><g buffered-rendering="static"><path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /><path d="M11.081 6.466L9.533 8.037l1.548 1.571" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"> Wrap text </span> </button> <button type="button" class="bubble-wrap z-20 absolute 
right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-copy-target="sibling" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"><g buffered-rendering="static"><path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /><path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"> Copy to clipboard </span> </button> <div class='highlight relative group'> <pre class='highlight '><code id="code-93jth4av">==&gt; Verifying app config ... Your Sentry project is ready. See details and next steps with: flyctl apps errors Setting the following secrets on boomerang: SENTRY_DSN ... Visit your newly deployed app at https://boomerang.fly.dev/ </code></pre> </div> </div> <p>Now, having Sentry configured at launch time means that deployment errors are captured early. This is useful for situations where apps fail to boot, run out of memory, and so on.</p> <p>Now let&rsquo;s force an application exception. 
We visit the app root, which goes Boom, thanks to some hastily written Ruby code.</p> <div class="highlight-wrapper group relative cmd"> <button type="button" class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-wrap-target="#code-suswx77f" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"><g buffered-rendering="static"><path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /><path d="M11.081 6.466L9.533 8.037l1.548 1.571" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"> Wrap text </span> </button> <button type="button" class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-copy-target="sibling" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"><g buffered-rendering="static"><path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /><path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"> Copy to clipboard </span> </button> <div class='highlight relative 
group'> <pre class='highlight cmd'><code id="code-suswx77f">flyctl open </code></pre> </div> </div> <p><img src="/blog/sentry-partnership/assets/boom-cover.webp?card&amp;center" /></p> <p>Oh shucks. Something went wrong. But, I got an email about this error.</p> <p><img src="/blog/sentry-partnership/assets/email-cover.webp?card&amp;center" /></p> <p>We could click &ldquo;View on Sentry&rdquo;. Instead, let&rsquo;s use <code>flyctl</code> to send us to the Sentry issues dashboard.</p> <div class="highlight-wrapper group relative cmd"> <button type="button" class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-wrap-target="#code-3seig9v4" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"><g buffered-rendering="static"><path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /><path d="M11.081 6.466L9.533 8.037l1.548 1.571" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"> Wrap text </span> </button> <button type="button" class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-copy-target="sibling" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"><g buffered-rendering="static"><path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 
0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /><path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"> Copy to clipboard </span> </button> <div class='highlight relative group'> <pre class='highlight cmd'><code id="code-3seig9v4">flyctl apps errors </code></pre> </div> </div> <p>We click through to this specific issue.</p> <p><img src="/blog/sentry-partnership/assets/dash.webp?card&amp;center" /></p> <p>We successfully debugged our issue. The takeaway: don&rsquo;t raise when you can call.</p> <p>Error tracking on Sentry is just scratching the surface. Check out their <a href='https://docs.sentry.io/product/performance/' title=''>performance monitoring</a>, <a href='https://docs.sentry.io/product/session-replay' title=''>session replay</a>, <a href='https://docs.sentry.io/product/alerts/' title=''>alerting</a> and <a href='https://docs.sentry.io/product/' title=''>much more</a>.</p> <h2 id='next-steps-for-fly-io-and-sentry' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#next-steps-for-fly-io-and-sentry' aria-label='Anchor'></a><span class='plain-code'>Next Steps for Fly.io and Sentry</span></h2> <p>For our next trick, we&rsquo;ll be tracking Fly.io releases in Sentry, so Sentry can link issues to their <a href='https://docs.sentry.io/product/releases/' title=''>release tracking</a> feature. We&rsquo;ll also send events like <a href='https://fly.io/docs/getting-started/troubleshooting/#out-of-memory-oom-or-high-cpu-usage' title=''>out-of-memory errors</a> to Sentry. The possibilities are endless.</p> <p>Got ideas or comments? 
Get in touch on our <a href='https://community.fly.io/' title=''>community forum</a>.</p> </content> </entry> <entry> <title>Tracking Application-Level Consistency with LiteFS</title> <link rel="alternate" href="https://fly.io/blog/tracking-consistency-with-litefs/"/> <id>https://fly.io/blog/tracking-consistency-with-litefs/</id> <published>2023-08-30T00:00:00+00:00</published> <updated>2023-11-21T21:08:37+00:00</updated> <media:thumbnail url="https://fly.io/blog/tracking-consistency-with-litefs/assets/tracking-consistency-thumb.webp"/> <content type="html"><div class="lead"><p>With Fly.io, <a href="https://fly.io/docs/speedrun/" title="">you can get your app running globally in a matter of minutes</a>, and with LiteFS, you can run SQLite alongside your app! Now we’re introducing LiteFS Cloud: managed backups and point-in-time restores for LiteFS. <a href="https://fly.io/docs/litefs/speedrun/" title="">Try it out for yourself</a>!</p> </div> <p>When we started the <a href='https://fly.io/docs/litefs/' title=''>LiteFS</a> project a year ago, we started with an ideal in mind rather than a specific implementation. We wanted not only to make it possible to run distributed SQLite, but also to make it… <em>gasp</em>… easy!</p> <p>There were hurdles that we expected to be hard, such as intercepting SQLite transaction boundaries via syscalls or shipping logs around the world while ensuring data integrity. But there was one hurdle that was unexpectedly hard: maintaining a consistent view from the application&rsquo;s perspective.</p> <p>LiteFS requires write transactions to be performed only at the primary node, and those transactions are then shipped to replicas instantaneously. Well, almost instantaneously. And therein lies the crux of our problem.</p> <p>Let&rsquo;s say your user sends a write request to the primary node in Madrid and the user&rsquo;s next read request goes to a local read-only replica in Rio de Janeiro. 
Most of the time LiteFS completes replication quickly and everything is fine. But if your request arrives a few milliseconds before data is replicated, then your user sees the database state from before the write occurred. That&rsquo;s no good.</p> <p>How exactly do we handle that when our database lives outside the user&rsquo;s application?</p> <h2 id='our-initial-series-of-failures-or-how-we-tried-to-teach-distributed-systems-to-users' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#our-initial-series-of-failures-or-how-we-tried-to-teach-distributed-systems-to-users' aria-label='Anchor'></a><span class='plain-code'>Our initial series of failures, or how we tried to teach distributed systems to users</span></h2> <p>Our first plan was to let LiteFS users manage consistency themselves. Every application may have different needs and, honestly, we didn&rsquo;t have a better plan at the time. However, once we started explaining how to track replication state, it became obvious that it was going to be an untenable approach. Let&rsquo;s start with a primer and you&rsquo;ll understand why.</p> <p>Every node in LiteFS maintains a <em>replication position</em> for each database which consists of two values:</p> <ul> <li>Transaction ID (TXID): An identifier that monotonically increases with every successful write transaction. </li><li>Post-Apply Checksum: A checksum of the entire database after the transaction has been written to disk. 
</li></ul> <p>You can read the current position from your LiteFS mount from the <code>-pos</code> file:</p> <div class="highlight-wrapper group relative "> <button type="button" class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-wrap-target="#code-wv6ha7bx" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"><g buffered-rendering="static"><path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /><path d="M11.081 6.466L9.533 8.037l1.548 1.571" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"> Wrap text </span> </button> <button type="button" class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-copy-target="sibling" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"><g buffered-rendering="static"><path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /><path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"> Copy to clipboard </span> </button> <div 
class='highlight relative group'> <pre class='highlight '><code id="code-wv6ha7bx">$ cat /litefs/my.db-pos
000000000042478b/8b73bc1d07d84988
</code></pre> </div> </div> <p>This example shows that we are at TXID <code>0x42478b</code> (or 4,343,691 in decimal) and the checksum of our whole database after the transaction is <code>8b73bc1d07d84988</code>. A replica can detect how far it&rsquo;s lagging behind by comparing its position to the primary&rsquo;s position. Typically, a monotonic transaction ID doesn&rsquo;t work in asynchronous replication systems like LiteFS, but when we couple it with a checksum it allows us to check for divergence, so the pair works surprisingly well.</p> <p>LiteFS handles the replication position internally; however, it would be up to the application to check it to ensure that its clients saw a consistent view. This meant that the application would have needed to have its clients track the TXID from their last write to the primary, and then the application would have to wait until its local replica caught up to that position before it could serve the request.</p> <p>That would have been a lot to manage. While you may find the nuts and bolts of replication interesting, sometimes you just want to get your app up and running!</p> <h2 id='lets-use-a-library-er-libraries' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#lets-use-a-library-er-libraries' aria-label='Anchor'></a><span class='plain-code'>Let&rsquo;s use a library! Er, libraries.</span></h2> <p>Teaching distributed systems to each and every LiteFS user was not going to work. So instead, we thought we could tuck that complexity away by providing a LiteFS client library. 
Just import a package and you&rsquo;re done!</p> <p>Libraries are a great way to abstract away the tough parts of a system. For example, nobody wants to roll their own cryptography implementation so they use a library. But LiteFS is a database so it needs to work across all languages which means we needed to implement a library for each language.</p> <p>Actually, it&rsquo;s worse than that. We need to act as a traffic cop to redirect incoming client requests to make sure they arrive at the primary node for writes or that they see a consistent view on a replica for reads. We aren&rsquo;t able to redirect writes at the data layer so it&rsquo;s typically handled at the HTTP layer. Within each language ecosystem there can be a variety of web server implementations: Ruby has Rails &amp; Sinatra, Go has net/http, gin, fasthttp, and whatever 12 new routers came out this week.</p> <h2 id='moving-up-the-abstraction-stack' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#moving-up-the-abstraction-stack' aria-label='Anchor'></a><span class='plain-code'>Moving up the abstraction stack</span></h2> <p>Abstraction often feels like a footgun. Generalizing functionality across multiple situations means that you lose flexibility in specific situations. Sometimes that means you shouldn&rsquo;t abstract but sometimes you just haven&rsquo;t found the right abstraction layer yet.</p> <p>For better or for worse, HTTP &amp; REST-like applications have become the norm in our industry and some of the conventions provide a great layer for LiteFS to build upon. 
Specifically, the convention of using <code>GET</code> requests for reading data and the other methods (<code>POST</code>, <code>PUT</code>, <code>DELETE</code>, etc) for writing data.</p> <p>Instead of developers injecting a LiteFS library into their application, we built a thin HTTP proxy that lives in front of the application.</p> <p><img alt="Wrapping the application with a proxy &amp; FUSE mount." src="https://slabstatic.com/prod/uploads/p1b436gf/posts/images/25yuWQlLKyLrkHBDFVcbU8to.png" /></p> <p>This approach has let us manage both the incoming client side via HTTP as well as the backend data plane via our FUSE mount. It lets us isolate the application developer from the low-level details of LiteFS replication while making it feel like they&rsquo;re developing against vanilla SQLite.</p> <h2 id='how-it-works' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#how-it-works' aria-label='Anchor'></a><span class='plain-code'>How it works</span></h2> <p>The LiteFS proxy design is simple but effective. As an example, let&rsquo;s start with a write request. A user creates a new order so they send a <code>POST /orders</code> request to your web app. The LiteFS proxy intercepts the request &amp; parses the HTTP headers to see that it&rsquo;s a <code>POST</code> write request. If the local node is a replica, the proxy forwards the request to the primary node.</p> <p>If the local node is the primary, it&rsquo;ll pass the request through to the application&rsquo;s web server and the request will be processed normally. When the response begins streaming out to the client, the proxy will attach a cookie with the TXID of the newly-written commit.</p> <p>When the client then sends a <code>GET</code> read request, the LiteFS proxy again intercepts it and parses the headers. 
It can see the TXID that was set in the cookie on the previous write and the proxy will check it against the replication position of the local replica. If replication has caught up to the client&rsquo;s last write transaction, it&rsquo;ll pass through the request to the application. Otherwise, it&rsquo;ll wait for the local node to catch up or it will eventually time out. The proxy is built into the <code>litefs</code> binary so communication with the internal replication state is wicked fast.</p> <h2 id='preventing-laggards' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#preventing-laggards' aria-label='Anchor'></a><span class='plain-code'>Preventing laggards</span></h2> <p>The proxy provides another benefit: health checks. Networks and servers don&rsquo;t always play nice when they&rsquo;re communicating across the world and sometimes they get disconnected. The proxy hooks into the LiteFS built-in heartbeat system to detect lag and it can report the node as unhealthy via a health check URL when this lag exceeds a threshold.</p> <p>If you&rsquo;re running on Fly.io, we&rsquo;ll take that node out of rotation when health checks begin reporting issues so users will automatically get routed to a different, healthy replica. 
When the replica reconnects to the primary, the health check will report as healthy and the node will rejoin.</p> <h2 id='the-tradeoffs-theres-always-tradeoffs' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#the-tradeoffs-theres-always-tradeoffs' aria-label='Anchor'></a><span class='plain-code'>The Tradeoffs… there&rsquo;s always tradeoffs!</span></h2> <p>Despite how well the LiteFS proxy works in most situations, there&rsquo;s gonna be times when it doesn&rsquo;t quite fit. For example, if your application cannot rely on cookies to track application state then the proxy won&rsquo;t work for you.</p> <p>There are also frameworks, like <a href='https://www.phoenixframework.org/' title=''>Phoenix</a>, which can rely heavily on websockets for live updates so this circumvents your traditional HTTP request/response approach that LiteFS proxy depends on. Finally, the proxy provides <a href='https://jepsen.io/consistency/models/read-your-writes' title=''>read-your-writes</a> guarantees which may not work for every application out there.</p> <p>In these cases, <a href='https://github.com/superfly/litefs/issues/new' title=''>let us know how we can improve the proxy</a> to make it work for more use cases! We&rsquo;d love to hear your thoughts.</p> <h2 id='diving-in-further' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#diving-in-further' aria-label='Anchor'></a><span class='plain-code'>Diving in further</span></h2> <p>The LiteFS proxy makes it easy to run SQLite applications in multiple regions around the world. 
You can even run many legacy applications with little to no change in the code.</p> <p>If you&rsquo;re interested in setting up LiteFS, check out our <a href='https://fly.io/docs/litefs/getting-started-fly/' title=''>Getting Started</a> guide. You can find additional details about configuring the proxy on our <a href='https://fly.io/docs/litefs/proxy/' title=''>Built-in HTTP Proxy</a> docs page.</p> </content> </entry> <entry> <title>Multiple Logs for Resiliency</title> <link rel="alternate" href="https://fly.io/blog/redundant-logs/"/> <id>https://fly.io/blog/redundant-logs/</id> <published>2023-07-21T00:00:00+00:00</published> <updated>2023-11-21T21:08:37+00:00</updated> <media:thumbnail url="https://fly.io/blog/redundant-logs/assets/lergs-thumb.webp"/> <content type="html"><p>You&rsquo;ve done everything right. You are well aware of <a href='https://en.wikipedia.org/wiki/Murphy%27s_law' title=''>Murphy&rsquo;s Law</a>. You have multiple redundant machines. You&rsquo;ve set up a regular backup schedule for your database, and perhaps you are even using <a href='https://fly.io/blog/litefs-cloud/' title=''>LiteFS Cloud</a>. You <a href='https://fly.io/blog/shipping-logs/' title=''>ship your logs</a> to <a href='https://logtail.com/' title=''>LogTail</a> or perhaps some other <a href='https://github.com/superfly/fly-log-shipper#provider-configuration' title=''>provider</a> so you can do forensic analysis should anything go wrong&hellip;</p> <p>Then the unexpected happens. A major network outage causes your application to misbehave. What&rsquo;s worse is that your logs are missing crucial data from this point, perhaps because of the same network outage. 
Maybe this time you are lucky and you can find the data you need by using copies of your logs via <a href='https://fly.io/docs/flyctl/logs/' title=''>flyctl logs</a> or the monitoring tab on the <a href='https://fly.io/docs/flyctl/dashboard/' title=''>flyctl dashboard</a> before they disappear forever.</p> <p>So, what is going on here? Let&rsquo;s look at the steps. Your application writes logs to STDOUT. Fly.io will take that output and send it to <a href='https://nats.io/' title=''>NATS</a>. The <a href='https://github.com/superfly/fly-log-shipper' title=''>Log Shipper</a> will take that data and hand it to <a href='https://vector.dev/docs/about/what-is-vector/' title=''>Vector</a>. From there it is shipped to your third-party logging provider. That&rsquo;s a lot of moving parts.</p> <p>All that is great, but just like how you have redundant machines in case of failures, you may want to have redundant logs in addition to the ones Fly.io and the log shipper provide. Below are two strategies for doing just that. You can use either or both, and best of all the logs you create will be in addition to your existing logs.</p> <h2 id='logging-to-multiple-places' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#logging-to-multiple-places' aria-label='Anchor'></a><span class='plain-code'>Logging to multiple places</span></h2> <p>The following approach is likely the most failsafe, but often the least convenient: having your primary application on each machine write to a separate log file in addition to standard out. This does mean that when you need this data you will have to fetch it from each machine and it will likely be rather raw. 
But at least it will be there even in the face of network failures.</p> <p>For best results put these logs on a <a href='https://fly.io/docs/reference/volumes/' title=''>volume</a> so that they survive a restart, and be prepared to rotate logs as they grow in size so that they don&rsquo;t eventually fill up that volume.</p> <p>This approach is necessarily framework specific, but most frameworks provide some ability to do this. A Rails example:</p> <div class="highlight-wrapper group relative ruby"> <button type="button" class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-wrap-target="#code-2yaa45j3" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"><g buffered-rendering="static"><path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /><path d="M11.081 6.466L9.533 8.037l1.548 1.571" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"> Wrap text </span> </button> <button type="button" class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-copy-target="sibling" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"><g buffered-rendering="static"><path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 
1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /><path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"> Copy to clipboard </span> </button> <div class='highlight relative group'> <pre class='highlight '><code id="code-2yaa45j3"><span class="n">logger</span> <span class="o">=</span> <span class="no">ActiveSupport</span><span class="o">::</span><span class="no">Logger</span><span class="p">.</span><span class="nf">new</span><span class="p">(</span><span class="no">STDOUT</span><span class="p">)</span>
<span class="n">logger</span><span class="p">.</span><span class="nf">formatter</span> <span class="o">=</span> <span class="n">config</span><span class="p">.</span><span class="nf">log_formatter</span>
<span class="n">volume_logger</span> <span class="o">=</span> <span class="no">ActiveSupport</span><span class="o">::</span><span class="no">Logger</span><span class="p">.</span><span class="nf">new</span><span class="p">(</span><span class="s2">"/logs/production.log"</span><span class="p">,</span> <span class="mi">3</span><span class="p">)</span>
<span class="n">logger</span> <span class="o">=</span> <span class="n">logger</span><span class="p">.</span><span class="nf">extend</span> <span class="no">ActiveSupport</span><span class="o">::</span><span class="no">Logger</span><span class="p">.</span><span class="nf">broadcast</span><span class="p">(</span><span class="n">volume_logger</span><span class="p">)</span>
</code></pre> </div> </div> <p>You probably already have the first two lines in your <code>config/environments/production.rb</code> file. Adjust and add the last two lines. That&rsquo;s it! 
You now have redundant logs.</p> <p>See the <a href='https://docs.ruby-lang.org/en/master/Logger.html#class-Logger-label-Log+File+Rotation' title=''>Ruby docs for Logger</a> for details on how to handle log rotation.</p> <p>Some pointers for other frameworks:</p> <ul> <li><a href='https://dev.to/darnahsan/elixir-logging-to-multiple-files-using-metadatafilter-3896' title=''>Elixir</a> </li><li><a href='https://laravel.com/docs/10.x/logging' title=''>Laravel</a> </li><li><a href='https://docs.python.org/3/howto/logging-cookbook.html#multiple-handlers-and-formatters' title=''>Python</a> </li><li><a href='https://github.com/winstonjs/winston#multiple-transports-of-the-same-type' title=''>Winston</a> for Node applications </li></ul> <h2 id='custom-log-shipper' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#custom-log-shipper' aria-label='Anchor'></a><span class='plain-code'>Custom log shipper</span></h2> <p>This approach is less bulletproof but may produce more immediately usable results. Instead of using Log Shipper, Vector, and a third party, it is easy to subscribe directly to NATS and process log entries yourself.</p> <p>What you are going to want is a separate app running on a separate machine so that it doesn&rsquo;t go down when there are problems with the machine you are monitoring, or even during the times when you are deploying a new version. 
If the code you write will be writing to disk, you will want a volume.</p> <p>Also like with log shipper, you will want to set the following secret:</p> <div class="highlight-wrapper group relative shell"> <button type="button" class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-wrap-target="#code-zdu3b55g" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"><g buffered-rendering="static"><path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /><path d="M11.081 6.466L9.533 8.037l1.548 1.571" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"> Wrap text </span> </button> <button type="button" class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-copy-target="sibling" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"><g buffered-rendering="static"><path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /><path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] 
text-navy-950"> Copy to clipboard </span> </button> <div class='highlight relative group'> <pre class='highlight '><code id="code-zdu3b55g">fly secrets <span class="nb">set </span><span class="nv">FLY_AUTH_TOKEN</span><span class="o">=</span><span class="si">$(</span>fly auth token<span class="si">)</span> </code></pre> </div> </div> <p>Here&rsquo;s a minimal JavaScript example that can be run using Node or Bun:</p> <div class="highlight-wrapper group relative javascript"> <button type="button" class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-wrap-target="#code-fxjs7ls8" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"><g buffered-rendering="static"><path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /><path d="M11.081 6.466L9.533 8.037l1.548 1.571" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"> Wrap text </span> </button> <button type="button" class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-copy-target="sibling" > <svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"><g buffered-rendering="static"><path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 
1.815-.82 1.815-1.815V7.239z" /><path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /></g></svg> <span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"> Copy to clipboard </span> </button> <div class='highlight relative group'> <pre class='highlight '><code id="code-fxjs7ls8"><span class="k">import</span> <span class="p">{</span> <span class="nx">connect</span><span class="p">,</span> <span class="nx">StringCodec</span> <span class="p">}</span> <span class="k">from</span> <span class="dl">"</span><span class="s2">nats</span><span class="dl">"</span><span class="p">;</span>
<span class="k">import</span> <span class="nx">fs</span> <span class="k">from</span> <span class="dl">'</span><span class="s1">node:fs</span><span class="dl">'</span><span class="p">;</span>

<span class="c1">// tailor these two constants for your needs</span>
<span class="kd">const</span> <span class="nx">LOG_FILE</span> <span class="o">=</span> <span class="dl">"</span><span class="s2">/log/production.log</span><span class="dl">"</span><span class="p">;</span>
<span class="kd">const</span> <span class="nx">ORGANIZATION</span> <span class="o">=</span> <span class="dl">"</span><span class="s2">your-organization-name</span><span class="dl">"</span><span class="p">;</span>

<span class="c1">// create a connection to a nats-server</span>
<span class="kd">const</span> <span class="nx">nc</span> <span class="o">=</span> <span class="k">await</span> <span class="nx">connect</span><span class="p">({</span>
  <span class="na">servers</span><span class="p">:</span> <span class="dl">"</span><span class="s2">[fdaa::3]:4223</span><span class="dl">"</span><span class="p">,</span>
  <span class="na">user</span><span class="p">:</span> <span class="nx">ORGANIZATION</span><span class="p">,</span>
  <span class="na">pass</span><span class="p">:</span> <span class="nx">process</span><span class="p">.</span><span class="nx">env</span><span class="p">.</span><span class="nx">FLY_AUTH_TOKEN</span>
<span class="p">});</span>

<span class="c1">// open log file</span>
<span class="kd">const</span> <span class="nx">file</span> <span class="o">=</span> <span class="nx">fs</span><span class="p">.</span><span class="nx">openSync</span><span class="p">(</span><span class="nx">LOG_FILE</span><span class="p">,</span> <span class="dl">'</span><span class="s1">a+</span><span class="dl">'</span><span class="p">);</span>

<span class="c1">// create a codec</span>
<span class="kd">const</span> <span class="nx">sc</span> <span class="o">=</span> <span class="nx">StringCodec</span><span class="p">();</span>

<span class="c1">// create a simple subscriber and iterate over messages</span>
<span class="c1">// matching the subscription</span>
<span class="kd">const</span> <span class="nx">sub</span> <span class="o">=</span> <span class="nx">nc</span><span class="p">.</span><span class="nx">subscribe</span><span class="p">(</span><span class="dl">"</span><span class="s2">logs.&gt;</span><span class="dl">"</span><span class="p">);</span>
<span class="k">for</span> <span class="k">await</span> <span class="p">(</span><span class="kd">const</span> <span class="nx">msg</span> <span class="k">of</span> <span class="nx">sub</span><span class="p">)</span> <span class="p">{</span>
  <span class="kd">const</span> <span class="nx">data</span> <span class="o">=</span> <span class="nx">JSON</span><span class="p">.</span><span class="nx">parse</span><span class="p">(</span><span class="nx">sc</span><span class="p">.</span><span class="nx">decode</span><span class="p">(</span><span class="nx">msg</span><span class="p">.</span><span class="nx">data</span><span class="p">));</span>

  <span class="c1">// build log file entry</span>
  <span class="kd">const</span> <span class="nx">log</span> <span class="o">=</span> <span class="p">[</span>
    <span class="nx">data</span><span class="p">.</span><span class="nx">timestamp</span><span class="p">.</span><span class="nx">padEnd</span><span class="p">(</span><span class="mi">30</span><span class="p">),</span>
    <span class="s2">`[</span><span class="p">${</span><span class="nx">data</span><span class="p">.</span><span class="nx">fly</span><span class="p">.</span><span class="nx">app</span><span class="p">.</span><span class="nx">instance</span><span class="p">}</span><span class="s2">]`</span><span class="p">,</span>
    <span class="nx">data</span><span class="p">.</span><span class="nx">fly</span><span class="p">.</span><span class="nx">region</span><span class="p">,</span>
    <span class="s2">`[</span><span class="p">${</span><span class="nx">data</span><span class="p">.</span><span class="nx">log</span><span class="p">.</span><span class="nx">level</span><span class="p">}</span><span class="s2">]`</span><span class="p">,</span>
    <span class="nx">data</span><span class="p">.</span><span class="nx">message</span>
  <span class="p">].</span><span class="nx">join</span><span class="p">(</span><span class="dl">'</span><span class="s1"> </span><span class="dl">'</span><span class="p">)</span> <span class="o">+</span> <span class="dl">"</span><span class="se">\n</span><span class="dl">"</span><span class="p">;</span>

  <span class="c1">// write entry to disk</span>
  <span class="nx">fs</span><span class="p">.</span><span class="nx">write</span><span class="p">(</span><span class="nx">file</span><span class="p">,</span> <span class="nx">log</span><span class="p">,</span> <span class="nx">error</span> <span class="o">=&gt;</span> <span class="p">{</span>
    <span class="k">if</span> <span class="p">(</span><span class="nx">error</span><span class="p">)</span> <span class="nx">console</span><span class="p">.</span><span class="nx">error</span><span class="p">(</span><span class="nx">error</span><span class="p">);</span>
  <span class="p">});</span>
<span class="p">}</span>
</code></pre> </div> </div> <p>The above is pretty straightforward. 
It connects to NATS, opens a file, subscribes to logs, parses each message, and writes out selected data to disk. This example is in JavaScript, but feel free to reimplement this basic approach using your favorite language, as NATS supports <a href='https://docs.nats.io/using-nats/developer' title=''>plenty</a>.</p> <p>Things to watch out for: you don&rsquo;t want recursive errors when exceptions occur during a write. You want to capture errors and reconnect to NATS when the connection goes down. You may even want to filter messages. A more complete example implementing a number of these features can be found <a href='https://github.com/rubys/showcase/blob/main/fly/applications/logger/logfiler.ts' title=''>here</a>.</p> <h2 id='conclusion' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#conclusion' aria-label='Anchor'></a><span class='plain-code'>Conclusion</span></h2> <p>Log failures are not common, and perhaps the redundant logs that fly.io already keeps will be sufficient for your needs. But it may be worth reviewing what your exposure is and how to mitigate that exposure should your logs fail at the worst possible time.</p> <p>Hopefully the approaches listed above give you ideas on how to ensure that you will always have the log data you need even in the most hostile environment conditions.</p> </content> </entry> <entry> <title>Tokenized Tokens</title> <link rel="alternate" href="https://fly.io/blog/tokenized-tokens/"/> <id>https://fly.io/blog/tokenized-tokens/</id> <published>2023-07-12T00:00:00+00:00</published> <updated>2023-11-21T21:08:37+00:00</updated> <media:thumbnail url="https://fly.io/blog/tokenized-tokens/assets/ghosts.png"/> <content type="html"><div class="lead"><p>We’re Fly.io. We run apps for our users on hardware we host around the world. 
Building security for a platform like this is tricky, and that’s what the post is about. But you don’t have to read any of this to get an app running on here. See how to <a href="https://fly.io/docs/speedrun/" title="">speedrun getting an app running on Fly.io here</a>.</p> </div> <p>We built some little security thingies. We&rsquo;re open sourcing them, and hoping you like them as much as we do. In a nutshell: it&rsquo;s a proxy that injects secrets into arbitrary 3rd-party API calls. We could describe it more completely here, but that wouldn&rsquo;t be as fun as writing a big long essay about how the thingies came to be, so: buckle up.</p> <p>The problem we confront is as old as Rails itself. Our application started simple: some controllers, some models. The only secrets it stored were bcrypt password hashes. But not unlike a pet baby alligator, it grew up. Now it&rsquo;s become more unruly than we&rsquo;d planned.</p> <p>That&rsquo;s because frameworks like Rails make it easy to collect secrets: you just create another model for them, <a href='https://guides.rubyonrails.org/active_record_encryption.html' title=''>roll some kind of secret to encrypt them</a>, jam that secret into the deployment environment, and call it a day.</p> <p>And, at least in less sensitive applications, or even the early days of an app like ours, that can work!</p> <div class="callout"><p>For what it’s worth, and to the annoyance of some of our Heroku refugees, we’ve never stored customer app secrets this way; our Rails API can write customer secrets, but has never been able to read them. We’ll talk more about how this works in a sec.</p> </div> <p>But for us, not anymore. At the stage we&rsquo;re at, all secrets are hazmat. 
And Rails itself is the portion of our attack surface we&rsquo;re least confident about – the rest of it is either outside of our trust boundaries, or written in Rust and Go, strongly-typed memory-safe languages that are easy to reason about, and which have never accidentally treated YAML as an executable file format.</p> <p>So, a few months back, during an integration with a 3rd party API that relied on OAuth2 tokens, we drew a line: ⚡ <em>henceforth, hazmat shall only be removed from Rails, never added</em> ⚡. This is easier said than done, though: despite prominent &ldquo;this is not a place of honor&rdquo; signs all over the codebase, our Rails API is still where much of the action in our system takes place.</p> <h3 id='how-apps-use-secrets-3-different-approaches' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#how-apps-use-secrets-3-different-approaches' aria-label='Anchor'></a><span class='plain-code'>How Apps Use Secrets: 3 Different Approaches</span></h3> <p><img src="/blog/tokenized-tokens/assets/secrets-1.png?2/3&amp;card&amp;center" /></p> <p>We just gave you one way, probably the most common. Stick &lsquo;em in a model, encrypt them with an environment secret, and watch Dependabot religiously for vulnerabilities in transitively-added libraries you&rsquo;ve never heard of before.</p> <p><img src="/blog/tokenized-tokens/assets/secrets-2.png?2/3&amp;card&amp;center" /></p> <p>Here&rsquo;s a second way, probably the second-most popular: use a secrets management system, like <a href='https://aws.amazon.com/kms/' title=''>KMS</a> or <a href='https://www.hashicorp.com/products/vault' title=''>Vault</a>. 
These systems, which are great, keep secrets encrypted and allow access based on an intricate access control language, which is great.</p> <p>That&rsquo;s what we do for customer app secrets, like <code>DATABASE_URL</code> and <code>API_KEY</code>. We use <a href='https://www.hashicorp.com/products/vault' title=''>HashiCorp Vault</a> (for the time being). Our Rails API has an access token for Vault that allows it to set secrets, but not read any of them back, like a kind of diode. A game-over Rails vulnerability might allow an attacker to scramble secrets, but not to easily dump them.</p> <p>In the happiest cases with secrets, systems like Vault can keep secret bits from ever touching the application. Customer app secrets are a happy case: Rails never needs to read them, <a href='https://fly.io/blog/carving-the-scheduler-out-of-our-orchestrator/' title=''>just our orchestrator</a>, to inject them into VM environments. In other happy cases, Vault operates on the app&rsquo;s behalf: signing a time-limited request URL for AWS, or making a direct request to a known 3rd-party service. Vault calls these features &ldquo;<a href='https://developer.hashicorp.com/vault/docs/secrets' title=''>secret engines</a>&rdquo;, and when you can get away with using them, it&rsquo;s hard to do better.</p> <p>The catch is, sometimes you can&rsquo;t get away with them. For most 3rd parties, Vault has no idea how to interact with them. And most secrets are bearer tokens, not request signatures. The only way to use those kinds of secrets is to read them into app memory. If good code can read a secret from Vault, so can a YAML vulnerability.</p> <div class="callout"><p>Still: this is better than nothing: even if apps can read raw secrets, systems like Vault can provide an audit trail of which secrets were pulled when, and make it much easier to rotate secrets, which you’ll want to do with raw secrets to contain their blast radius. 
HashiCorp Vault is great, and so is KMS; we recommend them unreservedly.</p> </div> <p><img src="/blog/tokenized-tokens/assets/secrets-3.png?2/3&amp;card&amp;center" /></p> <p>So that&rsquo;s why there&rsquo;s a third way to handle this problem, which is: decompose your application into services so that the parts that have to handle secrets are tiny and well-contained. The bulk of our domain-specific business code can chug along in Rails, and the parts that trade bearer tokens with 3rd parties can be built in a couple hundred lines of Go.</p> <p>This is a good approach, too. It&rsquo;s just cumbersome: a big application ends up dealing with lots of different kinds of secrets, and making a trusted microservice for each of them is a drag. What you want is to notice some commonality in how 3rd party API secrets are used, and to come up with some possible way of exploiting that.</p> <p>We thought long and hard on this and came up with:</p> <h3 id='tokenizer-the-fabled-4th-way' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#tokenizer-the-fabled-4th-way' aria-label='Anchor'></a><span class='plain-code'>Tokenizer: The Fabled 4th Way</span></h3> <p><img src="/blog/tokenized-tokens/assets/secrets-4.png?2/3&amp;card&amp;center" /></p> <p>We developed a multipurpose secret-using service called the <code>Tokenizer</code>.</p> <p><code>Tokenizer</code> is a stateless HTTP proxy that holds the private key of a <a href='https://pkg.go.dev/golang.org/x/crypto/nacl/box' title=''>Curve25519 keypair.</a></p> <p>When we get a new 3rd party API secret, we encrypt it to <code>Tokenizer&#39;s</code> public key; we &ldquo;tokenize&rdquo; it. Our API server can handle the (encrypted) tokenized secret, but it can&rsquo;t read or use it directly. 
Only <code>Tokenizer</code> can.</p> <p>When it comes time to talk to the 3rd party API, Rails does so via <code>Tokenizer</code>. Here&rsquo;s how that works:</p> <ol> <li>The API request is proxied, as an ordinary HTTP 1.1 request, through <code>Tokenizer</code>. </li><li>The request carries one or more additional <code>Proxy-Tokenizer</code> headers. </li><li>Each <code>Proxy-Tokenizer</code> header carries an encrypted secret and instructions for <code>Tokenizer</code> to rewrite the request in some way, usually by injecting the decrypted plaintext into a header. </li></ol> <p>You can think of <code>Tokenizer</code> as a sort of Vault-style &ldquo;secret engine&rdquo; that happens to capture virtually everything an app needs secrets for. It can even use decrypted secrets to selectively HMAC parts of requests, for APIs that authenticate with signatures instead of bearer tokens.</p> <p>Check it out: <a href='https://github.com/superfly/tokenizer' title=''>it&rsquo;s not super complicated</a>.</p> <p>Now, our goal is to keep Rails from ever touching secret bits. But, hold on: a game-over Rails vulnerability would give attackers an easy way around <code>Tokenizer</code>: you&rsquo;d just proxy requests for a particular secret to a service you ran that collected the plaintext.</p> <p>To mitigate that, we built the obvious feature: you can lock requests for specific secrets down to a list of allowed hosts or host regexp patterns.</p> <p>We think this approach to handling secrets is pretty similar to how payment processors tokenize payment card information, hence the name. The advantages are straightforward:</p> <ul> <li>Secrets are exposed to a much smaller attack surface that doesn&rsquo;t include Rails. </li><li>Virtually every usage of secrets we&rsquo;re likely to run across is captured by HTTP proxying, without us needing to write per-service code. </li><li>The tokenizer is a tiny project that&rsquo;s easy to audit and reason about. 
</li><li>Every language we work in already has first-class support for running requests through a proxy (something we already do for <a href='https://github.com/stripe/smokescreen' title=''>SSRF protection</a>). </li></ul> <h3 id='ssokenizer-tokenizing-oauth-sso' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#ssokenizer-tokenizing-oauth-sso' aria-label='Anchor'></a><span class='plain-code'>SSOkenizer: Tokenizing OAuth SSO</span></h3> <p>When we created <code>Tokenizer</code>, we were motivated by the problem of OAuth2 tokens that other service providers gave us, for partnership features we build for mutual customers.</p> <p>We&rsquo;d also dearly like our customers to use OAuth2/OIDC to log into Fly.io itself; it&rsquo;s more secure for them, and gives them the full complement of Google MFA features, meaning we don&rsquo;t immediately have to implement the full complement of Google MFA features. Letting people log into Fly.io with a Google OAuth token means we have to keep track of people&rsquo;s OAuth tokens. That sounds like a job for the <code>Tokenizer</code>!</p> <p>But there&rsquo;s a catch: acquiring those OAuth tokens in the first place means doing the OAuth2 dance, which means that for a brief window of time, Rails is handling hazmat. 
We&rsquo;d like to close that window.</p> <p><img src="/blog/tokenized-tokens/assets/ssokenizer.png?2/3&amp;card&amp;center" /></p> <p>Enter the <code>SSOkenizer</code>.</p> <p>The job of the <code>SSOkenizer</code> is to perform the OAuth2 dance on behalf of Rails, and then use the output of that process (the OAuth2 bearer token yielded from the OAuth2 code flow, which you can <a href='https://github.com/superfly/ssokenizer#ssokenizer' title=''>see in its cursed majesty here</a>) to drive the <code>Tokenizer</code>.</p> <p>In other words, where we&rsquo;d otherwise explicitly encrypt secrets to be tokenized a-priori, the <code>SSOkenizer</code> does that on the fly, passing tokenized OAuth2 credentials back to Rails. Those… tokenized tokens can only be used through the <code>Tokenizer</code> proxy, which is the only component in our system with the private key that unseals them.</p> <p>We think this is a pretty neat trick. The <code>SSOkenizer</code> itself is tiny, even smaller than the <code>Tokenizer</code> (<a href='https://github.com/superfly/ssokenizer/' title=''>you can read it here</a>), and essentially stateless; in fact, pretty much everything in this system is minimally stateful, except Rails, which is great at being stateful. 
We even keep almost all of OAuth2 out of Rails and confined to Go code (where it&rsquo;s practically the hello-world of Go OAuth2 libraries).</p> <p>A nice side effect-slash-validation of this design: once we got it working for Google, it became a super easy project to get OAuth2 logins working for other providers.</p> <h3 id='feel-free-to-poach-this' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#feel-free-to-poach-this' aria-label='Anchor'></a><span class='plain-code'>Feel Free To Poach This</span></h3> <p>We&rsquo;re psyched for a bunch of reasons:</p> <ul> <li>We&rsquo;ve got a clear path to rolling out SSO logins. </li><li>We can do integrations with third-party services now without infecting Rails with more hazmat secrets. </li><li>We&rsquo;ve honored the rule of &ldquo;only removing hazmat from Rails, not adding it&rdquo;. </li><li>We&rsquo;ve also cleared a path to getting all the rest of the hazmat Rails has access to tokenized. </li></ul> <p>These are standalone tools with no real dependencies on Fly.io, so they&rsquo;re easy for us to open source. 
Which is what we did: if they sound useful to you, check out the <a href='https://github.com/superfly/tokenizer' title=''>tokenizer</a> and <a href='https://github.com/superfly/ssokenizer' title=''>ssokenizer</a> repositories for instructions on deploying and using these services yourself.</p> </content> </entry> <entry> <title>Fly.io ❤️ Bun</title> <link rel="alternate" href="https://fly.io/blog/flydotio-heart-bun/"/> <id>https://fly.io/blog/flydotio-heart-bun/</id> <published>2023-07-11T00:00:00+00:00</published> <updated>2024-04-12T18:23:39+00:00</updated> <media:thumbnail url="https://fly.io/blog/flydotio-heart-bun/assets/flydotio-heart-bun-thumb.webp"/> <content type="html"><p><a href='https://lu.ma/cqk31rvl' title=''>Bun 1.0 comes out September 7th</a>. Fly.io is making preparations.</p> <p>Previously, we stated that <a href='https://fly.io/blog/flydotio-heart-js/' title=''>Fly.io ❤️ JS</a>, and we understandably started with Node.js. While that work is ongoing, it makes sense to start expanding to other runtimes.</p> <p>Bun is the obvious next choice given it <a href='https://bun.sh/docs/runtime/nodejs-apis' title=''>aims for complete Node.js API compatibility</a>.</p> <p>Starting with <a href='https://fly.io/docs/hands-on/install-flyctl/' title=''>flyctl</a> version 0.1.54 and <a href='https://www.npmjs.com/package/@flydotio/dockerfile' title=''>@flydotio/dockerfile</a> version 0.3.3, you can launch and deploy bun applications using <code>fly launch</code> and <code>fly deploy</code>, provided:</p> <ul> <li>You&rsquo;ve installed bun version 0.5.3 or later </li><li>You have a <code>package.json</code> that meets at least one of the following conditions: <ul> <li>It has a <code>start</code> entry in the <code>scripts</code> section. </li><li>It has a <code>module</code> entry and specifies <code>module</code> as the <code>type</code>. </li><li>It has a <code>main</code> entry. 
</li></ul> </li></ul> <p>Basically, if you can run <a href='https://bun.sh/docs/quickstart' title=''>Bun&rsquo;s Quickstart</a> and <a href='https://fly.io/docs/hands-on/' title=''>Fly&rsquo;s hands-on walk-through</a>, you have all you need to deploy your application on fly.io.</p> <p>We also have a <a href='https://github.com/fly-apps/bun/' title=''>sample</a> that you can deploy.</p> <p>Be forewarned that everything is beta at this point. Some issues we encountered while preparing this support:</p> <ul> <li><a href='https://github.com/oven-sh/bun/issues/3605' title=''><code>bun install</code> has no <code>--prune</code> option</a>. Our Dockerfiles use this to remove development dependencies after running <code>build</code>. Of course with bun you are less likely to need a build step as TS and JSX are built in. </li><li><a href='https://github.com/oven-sh/bun/issues/1579' title=''><code>throwIfNoEntry</code> is not supported in <code>fs.statSync</code></a>. <a href='https://github.com/fly-apps/node-demo' title=''><code>fly-apps/node-demo</code></a> uses that. </li><li>Programs that used <a href='https://nodejs.org/api/readline.html' title=''>readline</a> <a href='https://github.com/oven-sh/bun/issues/3604' title=''>never exit</a>. Switching to <a href='https://bun.sh/docs/api/globals' title=''>global</a>.<a href='https://developer.mozilla.org/en-US/docs/Web/API/Window/prompt' title=''>prompt</a> resolved this issue for <code>@flydotio/dockerfile</code>. </li></ul> <p>Undoubtedly there will be bugs in fly&rsquo;s dockerfile generator too. 
But as Node.js and Bun share the same generator, fixes that are made for either framework will generally benefit both.</p> <p>If you see a problem, <a href='https://community.fly.io/' title=''>start a discussion</a>, <a href='https://github.com/fly-apps/dockerfile-node' title=''>open an issue</a>, or <a href='https://github.com/fly-apps/dockerfile-node/pulls' title=''>create a pull request</a>.</p> </content> </entry> <entry> <title>LiteFS Cloud: Distributed SQLite with Managed Backups</title> <link rel="alternate" href="https://fly.io/blog/litefs-cloud/"/> <id>https://fly.io/blog/litefs-cloud/</id> <published>2023-07-05T00:00:00+00:00</published> <updated>2023-11-21T21:08:37+00:00</updated> <media:thumbnail url="https://fly.io/blog/litefs-cloud/assets/litefs-cloud-thumb.webp"/> <content type="html"><div class="lead"><p>With Fly.io, <a href="https://fly.io/docs/speedrun/" title="">you can get your app running globally in a matter of minutes</a>, and with LiteFS, you can run SQLite alongside your app! Now we’re introducing LiteFS Cloud: managed backups and point-in-time restores for LiteFS—whether your app is running on Fly.io or anywhere else. <a href="https://fly.io/docs/litefs/speedrun/" title="">Try it out for yourself</a>!</p> </div> <p>We love <a href='https://fly.io/blog/all-in-on-sqlite-litestream/' title=''>SQLite in production</a>, and we&rsquo;re all about running apps close to users. That&rsquo;s why we created LiteFS: an open source distributed SQLite database that lives on the same filesystem as your application, and replicates data to all the nodes in your app cluster.</p> <p>With LiteFS, you get the simplicity, flexibility, and lightning-fast local reads of working with vanilla SQLite, but distributed (so it&rsquo;s close to your users)! It&rsquo;s especially great for read-heavy web applications. 
Learn more about LiteFS in the <a href='https://fly.io/docs/litefs/' title=''>LiteFS docs</a> and in <a href='https://fly.io/blog/introducing-litefs/' title=''>our blog post introducing LiteFS</a>.</p> <p>At Fly.io we&rsquo;ve been using LiteFS internally for a while now, and it&rsquo;s awesome!</p> <p>However, something is missing: disaster recovery. Because it&rsquo;s local to your app, you don&rsquo;t need to—indeed can&#39;t—pay someone to manage your LiteFS cluster, which means no managed backups. Until now, you&rsquo;ve had to <a href='https://fly.io/docs/litefs/backup/' title=''>build your own</a>: take regular snapshots, store them somewhere, figure out a retention policy, that sort of thing.</p> <p>This also means you can only restore from a point in time when you happen to have taken a snapshot, and you likely need to limit how frequently you snapshot for cost reasons. Wouldn&rsquo;t it be cool if you could have super-frequent reliable backups to restore from, without having to implement it yourself?</p> <p>Well, that&rsquo;s why we&rsquo;re launching, in preview, LiteFS Cloud: backups and restores for LiteFS, managed by Fly.io. 
It gives you painless and reliable backups, with the equivalent of a snapshot every five minutes (8760 snapshots per month!), whether your database is hosted with us, or anywhere else.</p> <h2 id='how-do-i-use-litefs-cloud' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#how-do-i-use-litefs-cloud' aria-label='Anchor'></a><span class='plain-code'>How do I use LiteFS Cloud?</span></h2> <p>There&rsquo;s a few steps to get started:</p> <ul> <li>Upgrade LiteFS to version 0.5.1 or greater </li><li>Create a LiteFS Cloud cluster in the Fly.io dashboard, <a href='https://fly.io/dashboard/personal/litefs' title=''>LiteFS Cloud section</a> </li><li>Make the LiteFS Cloud auth token available to your LiteFS </li></ul> <p><img alt="Screenshot of Fly.io dashboard, with a red arrow pointing to &quot;LiteFS Cloud&quot; in the left navbar, and another red arrow pointing to the &quot;Create&quot; button on the top right for creating a LiteFS Cloud cluster" src="/blog/litefs-cloud/assets/screenshot1.png" /></p> <p><a href='https://fly.io/docs/litefs/cloud-backups' title=''>There are some docs here</a>, but that’s literally it. Then your database will start automagically backing up, we’ll manage the backups for you, and you’ll be able to restore your database near instantaneously to any point in time in the last 30 days (with 5 minute granularity).</p> <p>I want to say that again because I think it’s just wild – you can restore your database to <em>any point in time, with 5 minute granularity</em>. <strong class='font-semibold text-navy-950'><em>Near instantaneously</em></strong>.</p> <p>Speaking of restores&mdash;you can do those in the dashboard too. You pick a date and time, and we’ll take the most recent snapshot before that timestamp and restore it. 
This will take a couple of seconds (or less).</p> <p><img alt="Screenshot of popup modal on Fly.io dashboard, with a date and time selector, and a text field with &quot;lfsc-test-runner/db&quot; typed in it, and a red button at the bottom with text &quot;I understand the consequences. Restore from this snapshot.&quot;" src="/blog/litefs-cloud/assets/screenshot2.png" /></p> <p>We&rsquo;ll introduce pricing in the coming months, but for now LiteFS Cloud is in preview and is free to use. Please go check it out, and let us know how it goes!</p> <h2 id='the-secret-sauce-ltx-amp-compactions' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#the-secret-sauce-ltx-amp-compactions' aria-label='Anchor'></a><span class='plain-code'>The secret sauce: LTX &amp; compactions</span></h2> <p>LiteFS is built on a simple file format called <a href='https://github.com/superfly/ltx' title=''>Lite Transaction File (LTX)</a> which is designed for fast, flexible replication and recovery in LiteFS itself and in LiteFS Cloud.</p> <p>But first, let&rsquo;s start off with what an LTX file represents: <em>a change set of database pages</em>.</p> <p>When you commit a write transaction in SQLite, it updates one or more fixed-sized blocks called pages. By default, these are 4KB in size. An LTX file is simply a sorted list of these changed pages. Whenever you perform a transaction in SQLite, LiteFS will build an LTX file for that transaction.</p> <p>The interesting part of LTX is that contiguous sets of LTX files can be merged together into one LTX file. 
This merge process is called <em>compaction</em>.</p> <p>For example, let&rsquo;s say you have 3 transactions in a row that update the following set of pages:</p> <ul> <li>LTX A: Pages 1, 5, 7 </li><li>LTX B: Pages 5, 6 </li><li>LTX C: Pages 5, 7 </li></ul> <p>With LTX compaction, you avoid the duplicate work that comes from overwriting the same pages one transaction at a time. Instead, one LTX file for transactions A through C contains the last version of each page, so the pages are stored and updated only once:</p> <p><img alt="Compacting three contiguous LTX files into a single LTX file." src="/blog/litefs-cloud/assets/single-level-compaction.png" /></p> <p>That, in a nutshell, is how a single-level compaction works.</p> <h2 id='its-ltx-all-the-way-down' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#its-ltx-all-the-way-down' aria-label='Anchor'></a><span class='plain-code'>It&rsquo;s LTX all the way down</span></h2> <p>Compactions let us take changes for a bunch of transactions and smoosh them down into a single, small file. That&rsquo;s cool and all but how does that give us fast point-in-time restores? By the magic of multi-level compactions!</p> <p>Compaction levels are progressively larger time intervals that we roll up transaction data. In the following illustration, you can see that the highest level (L3) starts with a full snapshot of the database. This occurs daily and it&rsquo;s our starting point during a restore.</p> <p>Next, we have an hourly compaction level called L2 so there will be an LTX file with page changes between midnight and 1am, and then another file for 1am to 2am, etc. Below that is L1 which holds 5-minute intervals of data.</p> <p><img alt="Compaction levels for snapshots (L3), hourly (L2), &amp; every five minutes (L1)." 
src="/blog/litefs-cloud/assets/multi-level-compaction.png" /></p> <p>When a restore is requested for a specific timestamp, we can determine a minimal set of LTX files to replay. For example, if we restored to January 10th at 8:15am we would grab the following files:</p> <ul> <li>Start with the snapshot for January 10th. </li><li>Fetch the eight hourly LTX files from midnight to 8am. </li><li>Fetch the three 5-minute interval LTX files from 8:00am to 8:15am. </li></ul> <p>Since LTX files are sorted by page number, we can perform a streaming merge of these twelve files and end up with the state of the database at the given timestamp.</p> <h2 id='department-of-redundancy-department' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#department-of-redundancy-department' aria-label='Anchor'></a><span class='plain-code'>Department of Redundancy Department</span></h2> <p>One of the primary goals of LiteFS is to be simple to use. However, that&rsquo;s not an easy goal for a distributed database when our industry is moving more and more towards highly dynamic and ephemeral infrastructure. Traditional consensus algorithms require stable membership and adjusting the member set can be complicated.</p> <p>With LiteFS, we chose to use async replication as the primary mode of operation. This has some trade-offs in durability guarantees but it makes the cluster much simpler to operate. LiteFS Cloud alleviates many of these trade-offs of async replication by writing data out to high-durability, high-availability object storage&mdash;for now, we&rsquo;re using S3.</p> <p>However, we don&rsquo;t write every individual LTX file to object storage immediately. The latency is too high and it&rsquo;s not cost effective when you write a lot of transactions. 
Instead, the LiteFS primary node will batch up its changes every second and send a single, compacted LTX file to LiteFS Cloud. Once there, LiteFS Cloud will batch these 1-second files together and flush them to storage periodically.</p> <p>We track the ID of the latest transaction that&rsquo;s been flushed, and we call this the &ldquo;high water mark&rdquo; or HWM. This transaction ID is propagated back down to the nodes of the LiteFS cluster so we can ensure that the transaction file is not removed from any node until it is safely persisted in object storage. With this approach, we have multiple layers of redundancy in case your LiteFS cluster can&rsquo;t communicate with LiteFS Cloud or if we can&rsquo;t communicate with S3.</p> <h2 id='whats-next-for-litefs-cloud' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#whats-next-for-litefs-cloud' aria-label='Anchor'></a><span class='plain-code'>What&rsquo;s next for LiteFS Cloud?</span></h2> <p>We have a small team dedicated to LiteFS Cloud, and we&rsquo;re chugging away at new exciting features! Right now, LiteFS Cloud is really just backups and restores, but we are working on a lot of other cool stuff:</p> <ul> <li>Upload your database in the Fly.io dashboard. This way you don&rsquo;t have to worry about figuring out how to initialize your database when you first deploy it, just upload the database in the dashboard and LiteFS will pull it from LiteFS Cloud. </li><li>Download a point-in-time snapshot of your database from the Fly.io dashboard. You can use this to spin up a local dev env (with production data), do some local analysis, etc. </li><li>Clone your LiteFS Cloud cluster to a new cluster, which you could use for a staging environment (or on-demand test environments for your CI pipelines) with real data. 
</li><li>Features to support apps that run on serverless platforms like Vercel, Google Cloud Run, Deno, and more. We&rsquo;ll need to develop a number of different features for this, stay tuned for more information in the coming weeks! </li></ul> <p>We&rsquo;re really excited about the future of LiteFS Cloud, so we wanted to share what we&rsquo;re thinking. We&rsquo;d also love to hear any feedback you have about these ideas that might inform our work.</p> </content> </entry> </feed>
{ "cache-control": "max-age=0, private, must-revalidate", "cf-cache-status": "DYNAMIC", "cf-ray": "929b55a1d3aff32d-ORD", "connection": "keep-alive", "content-type": "text/xml", "date": "Tue, 01 Apr 2025 21:56:06 GMT", "etag": "W/\"67ec39be-dd651\"", "fly-request-id": "01JQSNNGAE35X4P31YE6HAX83M-chi", "last-modified": "Tue, 01 Apr 2025 19:08:46 GMT", "server": "cloudflare", "set-cookie": "fly_gtm={}; path=/; expires=Wed, 01 Apr 2026 21:56:06 GMT; max-age=31536000; secure; HttpOnly; SameSite=Lax", "transfer-encoding": "chunked", "vary": "accept-encoding", "via": "1.1 fly.io, 1.1 fly.io, 1.1 fly.io" }
{ "meta": { "type": "atom", "version": "1.0" }, "language": null, "title": "The Fly Blog", "description": "News, tips, and tricks from the team at Fly", "copyright": null, "url": "https://fly.io/blog/", "self": "https://fly.io/blog/", "published": null, "updated": "2025-03-27T00:00:00.000Z", "generator": null, "image": null, "authors": [ { "name": "Fly", "email": null, "url": null } ], "categories": [], "items": [ { "id": "https://fly.io/blog/operationalizing-macaroons/", "title": "Operationalizing Macaroons", "description": null, "url": "https://fly.io/blog/operationalizing-macaroons/", "published": "2025-03-27T00:00:00.000Z", "updated": "2025-04-01T19:05:33.000Z", "content": "<div class=\"lead\"><p>We’re Fly.io, a security bearer token company with a public cloud problem. You can read more about what our platform does (Docker container goes in, virtual machine in Singapore comes out), but this is just an engineering deep-dive into how we make our security tokens work. It’s a tokens nerd post.</p>\n</div><h2 id='1' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#1' aria-label='Anchor'></a><span class='plain-code'>1</span></h2>\n<p>We’ve spent <a href='https://fly.io/blog/api-tokens-a-tedious-survey/' title=''>too much time</a> talking about <a href='https://fly.io/blog/tokenized-tokens/' title=''>security tokens</a>, and about <a href='https://fly.io/blog/macaroons-escalated-quickly/' title=''>Macaroon tokens</a> <a href='https://github.com/superfly/macaroon/blob/main/macaroon-thought.md' title=''>in particular</a>. Writing another Macaroon treatise was not on my calendar. 
But we’re in the process of handing off our internal Macaroon project to a new internal owner, and in the process of truing up our operations manuals for these systems, I found myself in the position of writing a giant post about them. So, why not share?</p>\n<div class=\"callout\"><p>Can I sum up Macaroons in a short paragraph? Macaroon tokens are bearer tokens (like JWTs) that use a cute chained-HMAC construction that allows an end-user to take any existing token they have and scope it down, all on their own. You can minimize your token before every API operation so that you’re only ever transmitting the least amount of privilege needed for what you’re actually doing, even if the token you were issued was an admin token. And they have a user-serviceable plug-in interface! <a href=\"https://fly.io/blog/macaroons-escalated-quickly/\" title=\"\">You’ll have to read the earlier post to learn more about that</a>.</p>\n</div><div class=\"right-sidenote\"><p>Yes, probably, we are.</p>\n</div>\n<p>A couple years in to being the Internet’s largest user of Macaroons, I can report (as many predicted) that for our users, the cool things about Macaroons are a mixed bag in practice. It’s very neat that users can edit their own tokens, or even email them to partners without worrying too much. But users don’t really take advantage of token features.</p>\n\n<p>But I’m still happy we did this, because Macaroon quirks have given us a bunch of unexpected wins in our infrastructure. Our internal token system has turned out to be one of the nicer parts of our platform. 
Here’s why.</p>\n\n<p><img alt=\"This should clear everything up.\" src=\"/blog/operationalizing-macaroons/assets/schematic-diagram.png\" /></p>\n<h2 id='2' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#2' aria-label='Anchor'></a><span class='plain-code'>2</span></h2>\n<p>As an operator, the most important thing to know about Macaroons is that they’re online-stateful; you need a database somewhere. A Macaroon token starts with a random field (a nonce) and the first thing you do when verifying a token is to look that nonce up in a database. So one of the most important details of a Macaroon implementation is where that database lives.</p>\n\n<p>I can tell you one place we’re not OK with it living: in our primary API cluster.</p>\n\n<p>There’s several reasons for that. Some of them are about scalability and reliability: far and away the most common failure mode of an outage on our platform is “deploys are broken”, and those failures are usually caused by API instability. It would not be OK if “deploys are broken” transitively meant “deployed apps can’t use security tokens”. But the biggest reason is security: root secrets for Macaroon tokens are hazmat, and a basic rule of thumb in secure design is: keep hazmat away from complicated code.</p>\n\n<p>So we created a deliberately simple system to manage token data. It’s called <code>tkdb</code>.</p>\n<div class=\"right-sidenote\"><p>LiteFS: primary/replica distributed SQLite; Litestream: PITR SQLite replication to object storage; both work with unmodified SQLite libraries.</p>\n</div>\n<p><code>tkdb</code> is about 5000 lines of Go code that manages a SQLite database that is in turn managed by <a href='https://fly.io/docs/litefs/' title=''>LiteFS</a> and <a href='https://litestream.io/' title=''>Litestream</a>. 
It runs on isolated hardware (in the US, Europe, and Australia) and records in the database are encrypted with an injected secret. LiteFS gives us subsecond replication from our US primary to EU and AU, allows us to shift the primary to a different region, and gives us point-in-time recovery of the database.</p>\n\n<p>We’ve been running Macaroons for a couple years now, and the entire <code>tkdb</code> database is just a couple dozen megs large. Most of that data isn’t real. A full PITR recovery of the database takes just seconds. We use SQLite for a lot of our infrastructure, and this is one of the very few well-behaved databases we have.</p>\n\n<p>That’s in large part a consequence of the design of Macaroons. There’s actually not much for us to store! The most complicated possible Macaroon still chains up to a single root key (we generate a key per Fly.io “organization”; you don’t share keys with your neighbors), and everything that complicates that Macaroon happens “offline”. We take advantage of “attenuation” far more than our users do.</p>\n\n<p>The result is that database writes are relatively rare and very simple: we just need to record an HMAC key when Fly.io organizations are created (that is, roughly, when people sign up for the service and actually do a deploy). That, and revocation lists (more on that later), which make up most of the data.</p>\n<h2 id='3' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#3' aria-label='Anchor'></a><span class='plain-code'>3</span></h2>\n<p>Talking to <code>tkdb</code> from the rest of our platform is complicated, for historical reasons.</p>\n<div class=\"right-sidenote\"><p>NATS is fine, we just don’t really need it.</p>\n</div>\n<p>Ben Toews is responsible for most of the good things about this implementation. 
When he inherited the v0 Macaroons code from me, we were in the middle of a weird love affair with <a href='https://nats.io/' title=''>NATS</a>, the messaging system. So <code>tkdb</code> exported an RPC API over NATS messages.</p>\n\n<p>Our product security team can’t trust NATS (it’s not our code). That means a vulnerability in NATS can’t result in us losing control of all our tokens, or allow attackers to spoof authentication. Which in turn means you can’t run a plaintext RPC protocol for <code>tkdb</code> over NATS; attackers would just spoof “yes this token is fine” messages.</p>\n<div class=\"right-sidenote\"><p>I highly recommend implementing Noise; <a href=\"http://www.noiseprotocol.org/noise.html\" title=\"\">the spec</a> is kind of a joy in a way you can’t appreciate until you use it, and it’s educational.</p>\n</div>\n<p>But you can’t just run TLS over NATS; NATS is a message bus, not a streaming secure channel. So I did the hipster thing and implemented <a href='http://www.noiseprotocol.org/noise.html' title=''>Noise</a>. We export a “verification” API, and a “signing” API for minting new tokens. Verification uses <code>Noise_IK</code> (which works like normal TLS) — anybody can verify, but everyone needs to prove they’re talking to the real <code>tkdb</code>. Signing uses <code>Noise_KK</code> (which works like mTLS) — only a few components in our system can mint tokens, and they get a special client key.</p>\n\n<p>A little over a year ago, <a href='https://fly.io/blog/the-exit-interview-jp/' title=''>JP</a> led an effort to replace NATS with HTTP, which is how you talk to <code>tkdb</code> today. Out of laziness, we kept the Noise stuff, which means the interface to <code>tkdb</code> is now HTTP/Noise. This is a design smell, but the security model is nice: across many thousands of machines, there are only a handful with the cryptographic material needed to mint a new Macaroon token. 
Neat!</p>\n\n<p><code>tkdb</code> is a Fly App (albeit deployed in special Fly-only isolated regions). Our infrastructure talks to it over “<a href='https://fly.io/docs/networking/flycast/' title=''>FlyCast</a>”, which is our internal Anycast service. If you’re in Singapore, you’ll probably get routed to the Australian <code>tkdb</code>. If Australia falls over, you’ll get routed to the closest backup. The proxy that implements FlyCast is smart, as is the <code>tkdb</code> client library, which will do exponential backoff retry transparently.</p>\n\n<p>Even with all that, we don’t like that Macaroon token verification is “online”. When you operate a global public cloud one of the first things you learn is that <a href='https://fly.io/blog/the-5-hour-content-delivery-network/' title=''>the global Internet sucks</a>. Connectivity breaks all the time, and we’re paranoid about it. It’s painful for us that token verification can imply transoceanic links. Lovecraft was right about the oceans! Stay away!</p>\n\n<p>Our solution to this is caching. Macaroons, as it turns out, cache beautifully. That’s because once you’ve seen and verified a Macaroon, you have enough information to verify any more-specific Macaroon that descends from it; that’s a property of <a href='https://research.google/pubs/macaroons-cookies-with-contextual-caveats-for-decentralized-authorization-in-the-cloud/' title=''>their chaining HMAC construction</a>. Our client libraries cache verifications, and the cache hit ratio for verification is over 98%.</p>\n<h2 id='4' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#4' aria-label='Anchor'></a><span class='plain-code'>4</span></h2>\n<p><a href='http://cryto.net/~joepie91/blog/2016/06/13/stop-using-jwt-for-sessions/' title=''>Revocation isn’t a corner case</a>. 
It can’t be an afterthought. We’re potentially revoking tokens any time a user logs out. If that doesn’t work reliably, you wind up with “cosmetic logout”, which is a real vulnerability. When we kill a token, it needs to stay dead.</p>\n\n<p>Our revocation system is simple. It’s this table:</p>\n<div class=\"highlight-wrapper group relative \">\n <button \n type=\"button\"\n class=\"bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n data-wrap-target=\"#code-13jllwee\"\n >\n <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\" stroke-linecap=\"round\" stroke-linejoin=\"round\"><g buffered-rendering=\"static\"><path d=\"M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959\" /><path d=\"M11.081 6.466L9.533 8.037l1.548 1.571\" /></g></svg>\n <span class=\"bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950\">\n Wrap text\n </span>\n </button>\n <button \n type=\"button\"\n class=\"bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n data-copy-target=\"sibling\"\n >\n <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\"><g buffered-rendering=\"static\"><path d=\"M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z\" /><path d=\"M10.576 
10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617\" /></g></svg>\n <span class=\"bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950\">\n Copy to clipboard\n </span>\n </button>\n <div class='highlight relative group'>\n <pre class='highlight '><code id=\"code-13jllwee\"> CREATE TABLE IF NOT EXISTS blacklist ( \n nonce BLOB NOT NULL UNIQUE, \n required_until DATETIME,\n created_at DATETIME DEFAULT CURRENT_TIMESTAMP\n );\n</code></pre>\n </div>\n</div>\n<p>When we need a token to be dead, we have our primary API do a call to the <code>tkdb</code> “signing” RPC service for <code>revoke</code>. <code>revoke</code> takes the random nonce from the beginning of the Macaroon, discarding the rest, and adds it to the blacklist. Every Macaroon in the lineage of that nonce is now dead; we check the blacklist before verifying tokens.</p>\n\n<p>The obvious challenge here is caching; over 98% of our validation requests never hit <code>tkdb</code>. We certainly don’t want to propagate the blacklist database to 35 regions around the globe.</p>\n\n<p>Instead, the <code>tkdb</code> “verification” API exports an endpoint that provides a feed of revocation notifications. Our client library “subscribes” to this API (really, it just polls). 
Macaroons are revoked regularly (but not constantly), and when that happens, clients notice and prune their caches.</p>\n\n<p>If clients lose connectivity to <code>tkdb</code>, past some threshold interval, they just dump their entire cache, forcing verification to happen at <code>tkdb</code>.</p>\n<h2 id='5' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#5' aria-label='Anchor'></a><span class='plain-code'>5</span></h2>\n<p>A place where we’ve gotten a lot of internal mileage out of Macaroon features is service tokens. Service tokens are tokens used by code, rather than humans; almost always, a service token is something that is stored alongside running application code.</p>\n\n<p>An important detail of Fly.io’s Macaroons is the distinction between a “permissions” token and an “authentication” token. Macaroons by themselves express authorization, not authentication.</p>\n\n<p>That’s a useful constraint, and we want to honor it. By requiring a separate token for authentication, we minimize the impact of having the permissions token stolen; you can’t use it without authentication, so really it’s just like a mobile signed IAM policy expression. Neat!</p>\n\n<p>The way we express authentication is with a third-party caveat (<a href='https://fly.io/blog/macaroons-escalated-quickly/#4' title=''>see the old post for details</a>). Your main Fly.io Macaroon will have a caveat saying “this token is only valid if accompanied by the discharge token for a user in your organization from our authentication system”. 
Our authentication system does the login dance and issues those discharges.</p>\n\n<p>This is exactly what you want for user tokens and not at all what you want for a service token: we don’t want running code to store those authenticator tokens, because they’re hazardous.</p>\n\n<p>The solution we came up with for service tokens is simple: <code>tkdb</code> exports an API that uses its access to token secrets to strip off the third-party authentication caveat. To call into that API, you have to present a valid discharging authentication token; that is, you have to prove you could already have done whatever the token said. <code>tkdb</code> returns a new token with all the previous caveats, minus the expiration (you don’t usually want service tokens to expire).</p>\n\n<p>OK, so we’ve managed to transform a tuple <code>(unscary-token, scary-token)</code> into the new tuple <code>(scary-token)</code>. Not so impressive. But hold on: the recipient of <code>scary-token</code> can attenuate it further: we can lock it to a particular instance of <code>flyd</code>, or to a particular Fly Machine. Which means exfiltrating it doesn’t do you any good; to use it, you have to control the environment it’s intended to be used in.</p>\n\n<p>The net result of this scheme is that a compromised physical host will only give you access to tokens that have been used on that worker, which is a very nice property. Another way to look at it: every token used in production is traceable in some way to a valid token a user submitted. Neat!</p>\n<div class=\"right-sidenote\"><p>All the cool spooky secret store names were taken.</p>\n</div>\n<p>We do a similar dance with Pet Semetary, our internal Vault replacement. Petsem manages user secrets for applications, such as Postgres connection strings. 
Petsem is its own Macaroon authority (it issues its own Macaroons with its own permissions system), and to do something with a secret, you need one of those Petsem-minted Macaroons.</p>\n\n<p>Our primary API servers field requests from users to set secrets for their apps. So the API has a Macaroon that allows secrets writes. But it doesn’t allow reads: there’s no API call to dump your secrets, because our API servers don’t have that privilege. So far, so good.</p>\n\n<p>But when we boot up a Fly Machine, we need to inject the appropriate user secrets into it at boot; <em>something</em> needs a Macaroon that can read secrets. That “something” is <code>flyd</code>, our orchestrator, which runs on every worker server in our fleet.</p>\n\n<p>Clearly, we can’t give every <code>flyd</code> a Macaroon that reads every user’s secret. Most users will never deploy anything on any given worker, and we can’t have a security model that collapses down to “every worker is equally privileged”.</p>\n\n<p>Instead, the “read secret” Macaroon that <code>flyd</code> gets has a third-party caveat attached to it, which is dischargeable only by talking to <code>tkdb</code> and proving (with normal Macaroon tokens) that you have permissions for the org whose secrets you want to read. Once again, access is traceable to an end-user action, and minimized across our fleet. Neat!</p>\n<h2 id='6' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#6' aria-label='Anchor'></a><span class='plain-code'>6</span></h2>\n<p>Our token systems have some of the best telemetry in the whole platform.</p>\n\n<p>Most of that is down to <a href='http://opentelemetry.io/' title=''>OpenTelemetry</a> and <a href='https://www.honeycomb.io/' title=''>Honeycomb</a>. 
From the moment a request hits our API server through the moment <code>tkdb</code> responds to it, oTel <a href='https://opentelemetry.io/docs/concepts/context-propagation/' title=''>context propagation</a> gives us a single narrative about what’s happening.</p>\n\n<p><a href='https://fly.io/blog/the-exit-interview-jp/' title=''>I was a skeptic about oTel</a>. It’s really, really expensive. And, not to put too fine a point on it, oTel really cruds up our code. Once, I was an “80% of the value of tracing, we can get from logs and metrics” person. But I was wrong.</p>\n\n<p>Errors in our token system are rare. Usually, they’re just early indications of network instability, and between caching and FlyCast, we mostly don’t have to care about those alerts. When we do, it’s because something has gone so sideways that we’d have to care anyways. The <code>tkdb</code> code is remarkably stable and there hasn’t been an incident intervention with our token system in over a year.</p>\n\n<p>Past oTel, and the standard logging and Prometheus metrics every Fly App gets for free, we also have a complete audit trail for token operations, in a permanently retained OpenSearch cluster index. Since virtually all the operations that happen on our platform are mediated by Macaroons, this audit trail is itself pretty powerful.</p>\n<h2 id='7' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#7' aria-label='Anchor'></a><span class='plain-code'>7</span></h2>\n<p>So, that’s pretty much it. 
The moral of the story for us is, Macaroons have a lot of neat features, our users mostly don’t care about them — that may even be a good thing — but we get a lot of use out of them internally.</p>\n\n<p>As an engineering culture, we’re allergic to “microservices”, and we flinched a bit at the prospect of adding a specific service just to manage tokens. But it’s pulled its weight, and not added really any drama at all. We have at this point a second dedicated security service (Petsem), and even though they sort of rhyme with each other, we’ve got no plans to merge them. <a href='https://how.complexsystems.fail/#10' title=''>Rule #10</a> and all that.</p>\n\n<p>Oh, and a total victory for LiteFS, Litestream, and infrastructure SQLite. Which, after managing an infrastructure SQLite project that routinely ballooned to tens of gigabytes and occasionally threatened service outages, is lovely to see.</p>\n\n<p>Macaroons! If you’d asked us a year ago, we’d have said the jury was still out on whether they were a good move. But then Ben Toews spent a year making them awesome, and so they are. <a href='https://github.com/superfly/macaroon' title=''>Most of the code is open source</a>!</p>", "image": { "url": "https://fly.io/blog/operationalizing-macaroons/assets/occult-circle.jpg", "title": null }, "media": [], "authors": [ { "name": "Fly", "email": null, "url": null } ], "categories": [] }, { "id": "https://fly.io/blog/taming-rust-proxy/", "title": "Taming A Voracious Rust Proxy", "description": null, "url": "https://fly.io/blog/taming-rust-proxy/", "published": "2025-02-26T00:00:00.000Z", "updated": "2025-03-10T19:59:35.000Z", "content": "<div class=\"lead\"><p>Here’s a fun bug.</p>\n</div>\n<p>The basic idea of our service is that we run containers for our users, as hardware-isolated virtual machines (Fly Machines), on hardware we own around the world. What makes that interesting is that we also connect every Fly Machine to a global Anycast network. 
If your app is running in Hong Kong and Dallas, and a request for it arrives in Singapore, we’ll route it to <code>HKG</code>.</p>\n\n<p>Our own hardware fleet is roughly divided into two kinds of servers: edges, which receive incoming requests from the Internet, and workers, which run Fly Machines. Edges exist almost solely to run a Rust program called <code>fly-proxy</code>, the router at the heart of our Anycast network.</p>\n\n<p>So: a week or so ago, we flag an incident. Lots of things generate incidents: synthetic monitoring failures, metric thresholds, health check failures. In this case two edge tripwires tripped: elevated <code>fly-proxy</code> HTTP errors, and skyrocketing CPU utilization, on a couple hosts in <code>IAD</code>.</p>\n\n<p>Our incident process is pretty ironed out at this point. We created an incident channel (we ❤️ <a href='https://rootly.com/' title=''>Rootly</a> for this, <a href='https://rootly.com/' title=''>seriously check out Rootly</a>, an infra MVP here for years now), and incident responders quickly concluded that, while something hinky was definitely going on, the platform was fine. We have a lot of edges, and we’ve also recently converted many of our edge servers to significantly beefier hardware.</p>\n\n<p>Bouncing <code>fly-proxy</code> clears the problem up on an affected proxy. But this wouldn’t be much of an interesting story if the problem didn’t later come back. So, for some number of hours, we’re in an annoying steady-state of getting paged and bouncing proxies. </p>\n\n<p>While this is happening, Pavel, on our proxy team, pulls a profile from an angry proxy. \n<img alt=\"A flamegraph profile, described better in the prose anyways.\" src=\"/blog/taming-rust-proxy/assets/proxy-profile.jpg\" />\nSo, this is fuckin’ weird: a huge chunk of the profile is dominated by Rust <code>tracing</code>‘s <code>Subscriber</code>. But that doesn’t make sense. 
The entire point of Rust <code>tracing</code>, which generates fine-grained span records for program activity, is that <code>entering</code> and <code>exiting</code> a span is very, very fast. </p>\n\n<p>If the mere act of <code>entering</code> a span in a Tokio stack is chewing up a significant amount of CPU, something has gone haywire: the actual code being traced must be doing next to nothing.</p>\n<h2 id='a-quick-refresher-on-async-rust' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#a-quick-refresher-on-async-rust' aria-label='Anchor'></a><span class='plain-code'>A Quick Refresher On Async Rust</span></h2>\n<p>So in Rust, like a lot of <code>async/await</code> languages, you’ve got <code>Futures</code>. A <code>Future</code> is a type that represents the future value of an asynchronous computation, like reading from a socket. <code>Futures</code> are state machines, and they’re lazy: they expose one basic operation, <code>poll</code>, which an executor (like Tokio) calls to advance the state machine. That <code>poll</code> returns whether the <code>Future</code> is still <code>Pending</code>, or <code>Ready</code> with a result.</p>\n\n<p>In theory, you could build an executor that drove a bunch of <code>Futures</code> just by storing them in a list and busypolling each of them, round robin, until they return <code>Ready</code>. This executor would defeat much of the purpose of asynchronous programming, so no real executor works that way.</p>\n\n<p>Instead, a runtime like Tokio integrates <code>Futures</code> with an event loop (on <a href='https://man7.org/linux/man-pages/man7/epoll.7.html' title=''>epoll</a> or <a href='https://en.wikipedia.org/wiki/Kqueue' title=''>kqueue</a>) and, when calling <code>poll</code>, passes a <code>Waker</code>. 
The <code>Waker</code> is an abstract handle that allows the <code>Future</code> to instruct the Tokio runtime to call <code>poll</code>, because something has happened.</p>\n\n<p>To complicate things: an ordinary <code>Future</code> is a one-shot value. Once it’s <code>Ready</code>, it can’t be <code>polled</code> anymore. But with network programming, that’s usually not what you want: data usually arrives in streams, which you want to track and make progress on as you can. So async Rust provides <code>AsyncRead</code> and <code>AsyncWrite</code> traits, which build on <code>Futures</code>, and provide methods like <code>poll_read</code> that return <code>Ready</code> <em>every time</em> there’s data ready. </p>\n\n<p>So far so good? OK. Now, there are two footguns in this design. </p>\n\n<p>The first footgun is that a <code>poll</code> of a <code>Future</code> that isn’t <code>Ready</code> wastes cycles, and, if you have a bug in your code and that <code>Pending</code> poll happens to trip a <code>Waker</code>, you’ll slip into an infinite loop. That’s easy to see.</p>\n\n<p>The second and more insidious footgun is that an <code>AsyncRead</code> can <code>poll_read</code> to a <code>Ready</code> that doesn’t actually progress its underlying state machine. Since the idea of <code>AsyncRead</code> is that you keep <code>poll_reading</code> until it stops being <code>Ready</code>, this too is an infinite loop.</p>\n\n<p>When we look at our profiles, what we see are samples that almost terminate in libc, but spend next to no time in the kernel doing actual I/O. 
The obvious explanation: we’ve entered lots of <code>poll</code> functions, but they’re doing almost nothing and returning immediately.</p>\n<h2 id='jaccuse' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#jaccuse' aria-label='Anchor'></a><span class='plain-code'>J'accuse!</span></h2>\n<p>Wakeup issues are annoying to debug. But the flamegraph gives us the fully qualified type of the <code>Future</code> we’re polling:</p>\n<div class=\"highlight-wrapper group relative rust\">\n <button \n type=\"button\"\n class=\"bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n data-wrap-target=\"#code-hfleqvh4\"\n >\n <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\" stroke-linecap=\"round\" stroke-linejoin=\"round\"><g buffered-rendering=\"static\"><path d=\"M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959\" /><path d=\"M11.081 6.466L9.533 8.037l1.548 1.571\" /></g></svg>\n <span class=\"bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950\">\n Wrap text\n </span>\n </button>\n <button \n type=\"button\"\n class=\"bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n data-copy-target=\"sibling\"\n >\n <svg 
class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\"><g buffered-rendering=\"static\"><path d=\"M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z\" /><path d=\"M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617\" /></g></svg>\n <span class=\"bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950\">\n Copy to clipboard\n </span>\n </button>\n <div class='highlight relative group'>\n <pre class='highlight '><code id=\"code-hfleqvh4\"><span class=\"o\">&</span><span class=\"k\">mut</span> <span class=\"nn\">fp_io</span><span class=\"p\">::</span><span class=\"nn\">copy</span><span class=\"p\">::</span><span class=\"n\">Duplex</span><span class=\"o\"><&</span><span class=\"k\">mut</span> <span class=\"nn\">fp_io</span><span class=\"p\">::</span><span class=\"nn\">reusable_reader</span><span class=\"p\">::</span><span class=\"n\">ReusableReader</span><span class=\"o\"><</span><span class=\"nn\">fp_tcp</span><span class=\"p\">::</span><span class=\"nn\">peek</span><span class=\"p\">::</span><span class=\"n\">PeekableReader</span><span class=\"o\"><</span><span class=\"nn\">tokio_rustls</span><span class=\"p\">::</span><span class=\"nn\">server</span><span class=\"p\">::</span><span class=\"n\">TlsStream</span><span class=\"o\"><</span><span class=\"nn\">fp_tcp_metered</span><span class=\"p\">::</span><span class=\"n\">MeteredIo</span><span class=\"o\"><</span><span class=\"nn\">fp_tcp</span><span class=\"p\">::</span><span class=\"nn\">peek</span><span class=\"p\">::</span><span class=\"n\">PeekableReader</span><span class=\"o\"><</span><span class=\"nn\">fp_tcp</span><span class=\"p\">::</span><span class=\"nn\">permitted</span><span class=\"p\">::</span><span class=\"n\">PermittedTcpStream</span><span 
class=\"o\">>>>>></span><span class=\"p\">,</span> <span class=\"nn\">connect</span><span class=\"p\">::</span><span class=\"nn\">conn</span><span class=\"p\">::</span><span class=\"n\">Conn</span><span class=\"o\"><</span><span class=\"nn\">tokio</span><span class=\"p\">::</span><span class=\"nn\">net</span><span class=\"p\">::</span><span class=\"nn\">tcp</span><span class=\"p\">::</span><span class=\"nn\">stream</span><span class=\"p\">::</span><span class=\"n\">TcpStream</span><span class=\"o\">></span>\n</code></pre>\n </div>\n</div>\n<p>This looks like a lot, but much of it is just wrapper types we wrote ourselves, and those wrappers don’t do anything interesting. What’s left to audit:</p>\n\n<ul>\n<li><code>Duplex</code>, the outermost type, one of ours, <em>and</em>\n</li><li><code>TlsStream</code>, from <a href='https://github.com/rustls/rustls' title=''>Rustls</a>.\n</li></ul>\n\n<p><code>Duplex</code> is a beast. It’s the core I/O state machine for proxying between connections. It’s not easy to reason about in specificity. But: it also doesn’t do anything directly with a <code>Waker</code>; it’s built around <code>AsyncRead</code> and <code>AsyncWrite</code>. It hasn’t changed recently and we can’t trigger misbehavior in it.</p>\n\n<p>That leaves <code>TlsStream</code>. <code>TlsStream</code> is an ultra-important, load-bearing function in the Rust ecosystem. Everybody uses it. Could it harbor an async Rust footgun? Turns out, it did!</p>\n\n<p>Unlike our <code>Duplex</code>, Rustls actually does have to get intimate with the underlying async executor. And, looking through the repository, Pavel uncovers <a href='https://github.com/rustls/tokio-rustls/issues/72' title=''>this issue</a>: sometimes, <code>TlsStreams</code> in Rustls just spin out. 
And it turns out, what’s causing this is a TLS state machine bug: when a TLS session is orderly-closed, with a <code>CloseNotify</code> <code>Alert</code> record, the sender of that record has informed its counterparty that no further data will be sent. But if there’s still buffered data on the underlying connection, <code>TlsStream</code> mishandles its <code>Waker</code>, and we fall into a busy-loop.</p>\n\n<p><a href='https://github.com/rustls/rustls/pull/1950/files' title=''>Pretty straightforward fix</a>!</p>\n<h2 id='what-actually-happened-to-us' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#what-actually-happened-to-us' aria-label='Anchor'></a><span class='plain-code'>What Actually Happened To Us</span></h2>\n<p>Our partners in object storage, <a href='https://www.tigrisdata.com/' title=''>Tigris Data</a>, were conducting some kind of load testing exercise. Some aspect of their testing system triggered the <code>TlsStream</code> state machine bug, which locked up one or more <code>TlsStreams</code> in the edge proxy handling whatever corner-casey stream they were sending.</p>\n\n<p>Tigris wasn’t generating a whole lot of traffic; tens of thousands of connections, tops. But all of them sent small HTTP bodies and then terminated early. We figured some of those connections errored out, and set up the “TLS CloseNotify happened before EOF” scenario. </p>\n\n<p>To be truer to the chronology, we knew pretty early on in our investigation that something Tigris was doing with their load testing was probably triggering the bug, and we got them to stop. After we worked it out, and Pavel deployed the fix, we told them to resume testing. 
No spin-outs.</p>\n<h2 id='lessons-learned' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#lessons-learned' aria-label='Anchor'></a><span class='plain-code'>Lessons Learned</span></h2>\n<p>Keep your dependencies updated. Unless you shouldn’t keep your dependencies updated. I mean, if there’s a vulnerability (and, technically, this was a DoS vulnerability), always update. And if there’s an important bugfix, update. But if there isn’t an important bugfix, updating for the hell of it might also destabilize your project? So update maybe? Most of the time?</p>\n\n<p>Really, the truth of this is that keeping track of <em>what needs to be updated</em> is valuable work. The updates themselves are pretty fast and simple, but the process and testing infrastructure to confidently metabolize dependency updates is not. </p>\n\n<p>Our other lesson here is that there’s an opportunity to spot these kinds of bugs more directly with our instrumentation. Spurious wakeups should be easy to spot, and triggering a metric when they happen should be cheap, because they’re not supposed to happen often. So that’s something we’ll go do now.</p>", "image": { "url": "https://fly.io/blog/taming-rust-proxy/assets/happy-crab-cover.jpg", "title": null }, "media": [], "authors": [ { "name": "Fly", "email": null, "url": null } ], "categories": [] }, { "id": "https://fly.io/blog/wrong-about-gpu/", "title": "We Were Wrong About GPUs", "description": null, "url": "https://fly.io/blog/wrong-about-gpu/", "published": "2025-02-14T00:00:00.000Z", "updated": "2025-02-17T10:54:41.000Z", "content": "<div class=\"lead\"><p>We’re building a public cloud, on hardware we own. We raised money to do that, and to place some bets; one of them: GPU-enabling our customers. 
A progress report: GPUs aren’t going anywhere, but: GPUs aren’t going anywhere.</p>\n</div>\n<p>A couple years back, <a href=\"https://fly.io/gpu\">we put a bunch of chips down</a> on the bet that people shipping apps to users on the Internet would want GPUs, so they could do AI/ML inference tasks. To make that happen, we created <a href=\"https://fly.io/docs/gpus/getting-started-gpus/\">Fly GPU Machines</a>.</p>\n\n<p>A Fly Machine is a <a href=\"https://fly.io/blog/docker-without-docker/\">Docker/OCI container</a> running inside a hardware-virtualized virtual machine somewhere on our global fleet of bare-metal worker servers. A GPU Machine is a Fly Machine with a hardware-mapped Nvidia GPU. It’s a Fly Machine that can do fast CUDA.</p>\n\n<p>Like everybody else in our industry, we were right about the importance of AI/ML. If anything, we underestimated its importance. But the product we came up with probably doesn’t fit the moment. It’s a bet that doesn’t feel like it’s paying off.</p>\n\n<p><strong class='font-semibold text-navy-950'>If you’re using Fly GPU Machines, don’t freak out; we’re not getting rid of them.</strong> But if you’re waiting for us to do something bigger with them, a v2 of the product, you’ll probably be waiting awhile.</p>\n<h3 id='what-it-took' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#what-it-took' aria-label='Anchor'></a><span class='plain-code'>What It Took</span></h3>\n<p>GPU Machines were not a small project for us. Fly Machines run on an idiosyncratically small hypervisor (normally Firecracker, but for GPU Machines <a href=\"https://github.com/cloud-hypervisor/cloud-hypervisor\">Intel’s Cloud Hypervisor</a>, a very similar Rust codebase that supports PCI passthrough). 
The Nvidia ecosystem is not geared to supporting micro-VM hypervisors.</p>\n\n<p>GPUs <a href=\"https://googleprojectzero.blogspot.com/2020/09/attacking-qualcomm-adreno-gpu.html\">terrified our security team</a>. A GPU is just about the worst case hardware peripheral: intense multi-directional direct memory transfers</p>\n<div class=\"right-sidenote\"><p>(not even bidirectional: in common configurations, GPUs talk to each other)</p>\n</div>\n<p>with arbitrary, end-user controlled computation, all operating outside our normal security boundary.</p>\n\n<p>We did a couple expensive things to mitigate the risk. We shipped GPUs on dedicated server hardware, so that GPU- and non-GPU workloads weren’t mixed. Because of that, the only reason for a Fly Machine to be scheduled on a GPU machine was that it needed a PCI BDF for an Nvidia GPU, and there’s a limited number of those available on any box. Those GPU servers were drastically less utilized and thus less cost-effective than our ordinary servers.</p>\n\n<p>We funded two very large security assessments, from <a href=\"https://www.atredis.com/\">Atredis</a> and <a href=\"https://tetrelsec.com/\">Tetrel</a>, to evaluate our GPU deployment. Matt Braun is writing up those assessments now. They were not cheap, and they took time.</p>\n\n<p>Security wasn’t directly the biggest cost we had to deal with, but it was an indirect cause for a subtle reason.</p>\n\n<p>We could have shipped GPUs very quickly by doing what Nvidia recommended: standing up a standard K8s cluster to schedule GPU jobs on. Had we taken that path, and let our GPU users share a single Linux kernel, we’d have been on Nvidia’s driver happy-path.</p>\n\n<p>Alternatively, we could have used a conventional hypervisor. Nvidia suggested VMware (heh). But they could have gotten things working had we used QEMU. We like QEMU fine, and could have talked ourselves into a security story for it, but the whole point of Fly Machines is that they take milliseconds to start. 
We could not have offered our desired Developer Experience on the Nvidia happy-path.</p>\n\n<p>Instead, we burned months trying (and ultimately failing) to get Nvidia’s host drivers working to map <a href=\"https://www.nvidia.com/en-us/data-center/virtual-solutions/\">virtualized GPUs</a> into Intel Cloud Hypervisor. At one point, we hex-edited the closed-source drivers to trick them into thinking our hypervisor was QEMU.</p>\n\n<p>I’m not sure any of this really mattered in the end. There’s a segment of the market we weren’t ever really able to explore because Nvidia’s driver support kept us from thin-slicing GPUs. We’d have been able to put together a really cheap offering for developers if we hadn’t run up against that, and developers love “cheap”, but I can’t prove that those customers are real.</p>\n\n<p>On the other hand, we’re committed to delivering the Fly Machine DX for GPU workloads. Beyond the PCI/IOMMU drama, just getting an entire hardware GPU working in a Fly Machine was a lift. We needed Fly Machines that would come up with the right Nvidia drivers; our stack was built assuming that the customer’s OCI container almost entirely defined the root filesystem for a Machine. We had to engineer around that in our <code>flyd</code> orchestrator. And almost everything people want to do with GPUs involves efficiently grabbing huge files full of model weights. Also annoying!</p>\n\n<p>And, of course, we bought GPUs. A lot of GPUs. Expensive GPUs.</p>\n<h3 id='why-it-isnt-working' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#why-it-isnt-working' aria-label='Anchor'></a><span class='plain-code'>Why It Isn’t Working</span></h3>\n<p>The biggest problem: developers don’t want GPUs. They don’t even want AI/ML models. They want LLMs. 
<em>System engineers</em> may have smart, fussy opinions on how to get their models loaded with CUDA, and what the best GPU is. But <em>software developers</em> don’t care about any of that. When a software developer shipping an app comes looking for a way for their app to deliver prompts to an LLM, you can’t just give them a GPU.</p>\n\n<p>For those developers, who probably make up most of the market, it doesn’t seem plausible for an insurgent public cloud to compete with OpenAI and Anthropic. Their APIs are fast enough, and developers thinking about performance in terms of “tokens per second” aren’t counting milliseconds.</p>\n<div class=\"right-sidenote\"><p>(you should all feel sympathy for us)</p>\n</div>\n<p>This makes us sad because we really like the point in the solution space we found. Developers shipping apps on Amazon will outsource to other public clouds to get cost-effective access to GPUs. But then they’ll faceplant trying to handle data and model weights, backhauling gigabytes (at significant expense) from S3. We have app servers, GPUs, and object storage all under the same top-of-rack switch. But inference latency just doesn’t seem to matter yet, so the market doesn’t care.</p>\n\n<p>Past that, and just considering the system engineers who do care about GPUs rather than LLMs: the hardware product/market fit here is really rough.</p>\n\n<p>People doing serious AI work want galactically huge amounts of GPU compute. A whole enterprise A100 is a compromise position for them; they want an SXM cluster of H100s.</p>\n<div class=\"right-sidenote\"><p>Near as we can tell, MIG gives you a UUID to talk to the host driver, not a PCI device.</p>\n</div>\n<p>We think there’s probably a market for users doing lightweight ML work getting tiny GPUs. <a href=\"https://www.nvidia.com/en-us/technologies/multi-instance-gpu/\">This is what Nvidia MIG does</a>, slicing a big GPU into arbitrarily small virtual GPUs. 
But for fully-virtualized workloads, it’s not baked; we can’t use it. And I’m not sure how many of those customers there are, or whether we’d get the density of customers per server that we need.</p>\n\n<p><a href=\"https://fly.io/blog/cutting-prices-for-l40s-gpus-in-half\">That leaves the L40S customers</a>. There are a bunch of these! We dropped L40S prices last year, not because we were sour on GPUs but because they’re the one part we have in our inventory people seem to get a lot of use out of. We’re happy with them. But they’re just another kind of compute that some apps need; they’re not a driver of our core business. They’re not the GPU bet paying off.</p>\n\n<p>Really, all of this is just a long way of saying that for most software developers, “AI-enabling” their app is best done with API calls to things like Claude and GPT, Replicate and RunPod.</p>\n<h3 id='what-did-we-learn' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#what-did-we-learn' aria-label='Anchor'></a><span class='plain-code'>What Did We Learn?</span></h3>\n<p>A very useful way to look at a startup is that it’s a race to learn stuff. So, what’s our report card?</p>\n\n<p>First off, when we embarked down this path in 2022, we were (like many other companies) operating in a sort of phlogiston era of AI/ML. The industry attention to AI had not yet collapsed around a small number of foundational LLM models. 
We expected there to be a diversity of <em>mainstream</em> models, the world <a href='https://github.com/elixir-nx/bumblebee' title=''>Elixir Bumblebee</a> looks forward to, where people pull different AI workloads off the shelf the same way they do Ruby gems.</p>\n\n<p>But <a href='https://www.cursor.com/' title=''>Cursor happened</a>, and, as they say, how are you going to keep ‘em down on the farm once they’ve seen Karl Hungus? It seems much clearer where things are heading.</p>\n\n<p>GPUs were a test of a Fly.io company credo: as we think about core features, we design for 10,000 developers, not for 5-6. It took a minute, but the credo wins here: GPU workloads for the 10,001st developer are a niche thing.</p>\n\n<p>Another way to look at a startup is as a series of bets. We put a lot of chips down here. But the buy-in for this tournament gave us a lot of chips to play with. Never making a big bet of any sort isn’t a winning strategy. I’d rather we’d flopped the nut straight, but I think going in on this hand was the right call.</p>\n\n<p>A really important thing to keep in mind here, and something I think a lot of startup thinkers sleep on, is the extent to which this bet involved acquiring assets. Obviously, some of our <a href='https://fly.io/blog/the-exit-interview-jp/' title=''>costs here aren’t recoverable</a>. But the hardware parts that aren’t generating revenue will ultimately get liquidated; like with <a href='https://fly.io/blog/32-bit-real-estate/' title=''>our portfolio of IPv4 addresses</a>, I’m even more comfortable making bets backed by tradable assets with durable value.</p>\n\n<p>In the end, I don’t think GPU Fly Machines were going to be a hit for us no matter what we did. Because of that, one thing I’m very happy about is that we didn’t compromise the rest of the product for them. 
Security concerns slowed us down to where we probably learned what we needed to learn a couple months later than we could have otherwise, but we’re scaling back our GPU ambitions without having sacrificed <a href='https://fly.io/blog/sandboxing-and-workload-isolation/' title=''>any of our isolation story</a>, and, ironically, GPUs <em>other people run</em> are making that story a lot more important. The same thing goes for our Fly Machine developer experience.</p>\n\n<p>We started this company building a Javascript runtime for edge computing. We learned that our customers didn’t want a new Javascript runtime; they just wanted their native code to work. <a href='https://news.ycombinator.com/item?id=22616857' title=''>We shipped containers</a>, and no convincing was needed. We were wrong about Javascript edge functions, and I think we were wrong about GPUs. That’s usually how we figure out the right answers: by being wrong about a lot of stuff.</p>", "image": { "url": "https://fly.io/blog/wrong-about-gpu/assets/choices-choices-cover.webp", "title": null }, "media": [], "authors": [ { "name": "Fly", "email": null, "url": null } ], "categories": [] }, { "id": "https://fly.io/blog/the-exit-interview-jp/", "title": "The Exit Interview: JP Phillips", "description": null, "url": "https://fly.io/blog/the-exit-interview-jp/", "published": "2025-02-12T00:00:00.000Z", "updated": "2025-02-12T14:06:21.000Z", "content": "<div class=\"lead\"><p>JP Phillips is off to greener, or at least calmer, pastures. He joined us 4 years ago to build the next generation of our orchestration system, and has been one of the anchors of our engineering team. His last day is today. We wanted to know what he was thinking, and figured you might too.</p>\n</div>\n<p><em>Question 1: Why, JP? Just why?</em></p>\n\n<p>LOL. When I looked at what I wanted to see from here in the next 3-4 years, it didn’t really match up with where we’re currently heading. 
Specifically, with our new focus on MPG <em>[Managed Postgres]</em> and [llm] <em>[llm].</em></p>\n<div class=\"callout\"><p>Editorial comment: Even I don’t know what [llm] is.</p>\n</div>\n<p>The Fly Machines platform is more or less finished, in the sense of being capable of supporting the next iteration of our products. My original desire to join Fly.io was to make Machines a product that would <a href='https://fly.io/blog/carving-the-scheduler-out-of-our-orchestrator/' title=''>rid us of HashiCorp Nomad</a>, and I feel like that’s been accomplished.</p>\n\n<p><em>Where were you hoping to see us headed?</em></p>\n\n<p>More directly positioned as a cloud provider, rather than a platform-as-a-service; further along the customer journey from “developers” and “startups” to large established companies.</p>\n\n<p>And, it’s not that I disagree with PAAS work or MPG! Rather, it’s not something that excites me in a way that I’d feel challenged and could continue to grow technically.</p>\n\n<p><em>Follow up question: does your family know what you’re doing here? Doing to us? Are they OK with it?</em></p>\n\n<p>Yes, my family was very involved in the decision, before I even talked to other companies.</p>\n\n<p><em>What’s the thing you’re happiest about having built here? It cannot be “all of <code>flyd</code>”.</em></p>\n\n<p>We’ve enabled developers to run workloads from an OCI image and an API call all over the world. 
On any other cloud provider, the knowledge of how to pull that off comes with a professional certification.</p>\n\n<p><em>In what file in our <code>nomad-firecracker</code> repository would I find that code?</em></p>\n\n<p><a href='https://docs.machines.dev/#tag/machines/post/apps/{app_name}/machines' title=''>https://docs.machines.dev/#tag/machines/post/apps/{app_name}/machines</a></p>\n\n<p><img alt=\"A diagram that doesn't make any of this clearer\" src=\"/blog/the-exit-interview-jp/assets/flaps.png?1/2&center\" /></p>\n\n<p><em>So you mean, literally, the whole Fly Machines API, and <code>flaps</code>, the API gateway for Fly Machines?</em></p>\n\n<p>Yes, all of it. The <code>flaps</code> API server, the <code>flyd</code> RPCs it calls, the <code>flyd</code> finite state machine system, the interface to running VMs.</p>\n\n<p><em>Is there something you especially like about that design?</em></p>\n\n<p>I like that it for the most part doesn’t require any central coordination. And I like that the P90 for Fly Machine <code>create</code> calls is sub-5-seconds for pretty much every region except for Johannesburg and Hong Kong.</p>\n\n<p>I think the FSM design is something I’m proud of; if I could take any code with me, it’d be the <code>internal/fsm</code> in the <code>nomad-firecracker</code> repo.</p>\n<div class=\"callout\"><p>You can read more about <a href=\"https://fly.io/blog/carving-the-scheduler-out-of-our-orchestrator/\" title=\"\">the <code>flyd</code> orchestrator JP led over here</a>. But, a quick decoder ring: <code>flyd</code> runs independently without any central coordination on thousands of “worker” servers around the globe. It’s structured as an API server for a bunch of finite state machine invocations, where an FSM might be something like “start a Fly Machine” or “create a new Fly Machine” or “cordon off a Fly Machine so we can update it”. 
Each FSM invocation is comprised of a bunch of steps, each of those steps has callbacks into the <code>flyd</code> code, and each step is logged in <a href=\"https://github.com/boltdb/bolt\" title=\"\">a BoltDB database</a>.</p>\n</div>\n<p><em>Thinking back, there are like two archetypes of insanely talented developers I’ve worked with. One is the kind that belts out ridiculous amounts of relatively sophisticated code on a whim, at like 3AM. Jerome [who leads our fly-proxy team], is that type. The other comes to projects with what feels like fully-formed, coherent designs that are not super intuitive, and the whole project just falls together around that design. Did you know you were going to do the FSM log thing when you started <code>flyd</code>?</em></p>\n\n<p>I definitely didn’t have any specific design in mind when I started on <code>flyd</code>. I think the FSM stuff is a result of work I did at Compose.io / MongoHQ (where it was called “recipes”/“operations”) and the work I did at HashiCorp using Cadence.</p>\n\n<p>Once I understood what the product needed to do and look like, having a way to perform deterministic and durable execution felt like a good design.</p>\n\n<p><em>Cadence?</em></p>\n\n<p><a href='https://cadenceworkflow.io/' title=''>Cadence</a> is the child of AWS Step Functions and the predecessor to <a href='https://temporal.io/' title=''>Temporal</a> (the company).</p>\n\n<p>One of the biggest gains, with how it works in <code>flyd</code>, is knowing we would need to deploy <code>flyd</code> all day, every day. If <code>flyd</code> was in the middle of doing some work, it needed to pick back up right where it left off, post-deploy.</p>\n\n<p><em>OK, next question. What’s the most impressive thing you saw someone else build here? 
To make things simpler and take some pressure off the interview, we can exclude any of my works from consideration.</em></p>\n\n<p>Probably <a href='https://github.com/superfly/corrosion' title=''><code>corrosion2</code></a>.</p>\n<div class=\"callout\"><p>Sidebar: <code>corrosion2</code> is our state distribution system. While <code>flyd</code> runs individual Fly Machines for users, each instance is solely responsible for its own state; there’s no global scheduler. But we have platform components, most obviously <code>fly-proxy</code>, our Anycast router, that need to know what’s running where. <code>corrosion2</code> is a Rust service that does <a href=\"https://fly.io/blog/building-clusters-with-serf/\" title=\"\">SWIM gossip</a> to propagate information from each worker into a CRDT-structured SQLite database. <code>corrosion2</code> essentially means any component on our fleet can do SQLite queries to get near-real-time information about any Fly Machine around the world.</p>\n</div>\n<p>If for no other reason than that we deployed <code>corrosion</code>, learned from it, and were able to make significant and valuable improvements — and then migrate to the new system in a short period of time.</p>\n\n<p>Having a “just SQLite” interface, for async replicated changes around the world in seconds, it’s pretty powerful.</p>\n\n<p>If we invested in <a href='https://antithesis.com/' title=''>Antithesis</a> or TLA+ testing, I think there’s <a href='https://github.com/superfly/corrosion' title=''>potential for other companies</a> to get value out of <code>corrosion2</code>.</p>\n\n<p><em>Just as a general-purpose gossip-based SQLite CRDT gossip system?</em></p>\n\n<p>Yes.</p>\n\n<p><em>OK, you’re being too nice. What’s your least favorite thing about the platform?</em></p>\n\n<p>GraphQL. No, Elixir. 
It’s a tie between GraphQL and Elixir.</p>\n\n<p>But probably GraphQL, by a hair.</p>\n\n<p><em>That’s not the answer I expected.</em></p>\n\n<p>GraphQL slows everyone down, and everything. Elixir only slows me down.</p>\n\n<p><em>The rest of the platform, you’re fine with? No complaints?</em></p>\n\n<p>I’m happier now that we have <code>pilot</code>.</p>\n<div class=\"callout\"><p><code>pilot</code> is our new <code>init</code>. When we launch a Fly Machine, <code>init</code> is our foothold in the machine; this is unlike a normal OCI runtime, where “pid 1” is often the user’s entrypoint program. Our original <code>init</code> was so simple people dunked on it and said it might as well have been a bash script; over time, <code>init</code> has sprouted a bunch of new features. <code>pilot</code> consolidates those features, and, more importantly, is itself a complete OCI runtime; <code>pilot</code> can natively run containers inside of Fly Machines.</p>\n</div>\n<p>Before <code>pilot</code>, there really wasn’t any contract between <code>flyd</code> and <code>init</code>. And <code>init</code> was just “whatever we wanted <code>init</code> to be”. That limited its ability to serve us.</p>\n\n<p>Having <code>pilot</code> be an OCI-compliant runtime with an API for <code>flyd</code> to drive is a big win for the future of the Fly Machines API.</p>\n\n<p><em>Was I right that we should have used SQLite for <code>flyd</code>, or were you wrong to have used BoltDB?</em></p>\n\n<p>I still believe Bolt was the right choice. I’ve never lost a second of sleep worried that someone is about to run a SQL update statement on a host, or across the whole fleet, and then mangle all our state data. 
And limiting the storage interface, by not using SQL, kept <code>flyd</code>‘s scope managed.</p>\n\n<p>On the engine side of the platform, which is what <code>flyd</code> is, I still believe SQL is too powerful for what <code>flyd</code> does.</p>\n\n<p><em>If you had this to do over again, would Bolt be precisely what you’d pick, or is there something else you’d want to try? Some cool-ass new KV store?</em></p>\n\n<p>Nah. But, I’d maybe consider a SQLite database per-Fly-Machine. Then the scope of danger is about as small as it could possibly be.</p>\n\n<p><em>Whoah, that’s an interesting thought. People sleep on the “keep a zillion little SQLites” design.</em></p>\n\n<p>Yeah, with per-Machine SQLite, once a Fly Machine is destroyed, we can just zip up the database and stash it in object storage. The biggest hold-up I have about it is how we’d manage the schemas.</p>\n\n<p><em>OpenTelemetry: were you right all along?</em></p>\n\n<p>One hundred percent.</p>\n\n<p><em>I basically attribute oTel at Fly.io to you.</em></p>\n\n<p>Without oTel, it’d be a disaster trying to troubleshoot the system. I’d have ragequit trying.</p>\n\n<p><em>I remember there being a cost issue, with how much Honeycomb was going to end up charging us to manage all the data. But that seems silly in retrospect.</em></p>\n\n<p>For sure. It is 100% part of the decision and the conversation. But: we didn’t have the best track record running a logs/metrics cluster at this fidelity. It was worth the money to pay someone else to manage tracing data.</p>\n\n<p><em>Strong agree. I think my only issue is just the extent to which it cruds up code. But I need to get over that.</em></p>\n\n<p>Yes, it’s very explicit. I think the next big part of oTel is going to be auto-instrumentation, for profiling.</p>\n\n<p><em>You’re a veteran Golang programmer. 
Say 3 nice things about Rust.</em></p>\n<div class=\"callout\"><p>Most of our backend is in Go, but <code>fly-proxy</code>, <code>corrosion2</code>, and <code>pilot</code> are in Rust.</p>\n</div>\n<ol>\n<li>Option. \n</li><li>Match.\n</li><li>Serde macros.\n</li></ol>\n\n<p><em>Even I can’t say shit about Option and match.</em></p>\n\n<p>Match is so much better than anything in Go.</p>\n\n<p><em>Elixir, Go, and Rust. An honest take on that programming cocktail.</em></p>\n\n<p>Three’s a crowd, Elixir can stay home.</p>\n\n<p><em>If you could only lose one, you’d keep Rust.</em></p>\n\n<p>I’ve learned its shortcomings and the productivity far outweighs having to deal with the Rust compiler.</p>\n\n<p><em>You’d be unhappy if we moved the <code>flaps</code> API code from Go to Elixir.</em></p>\n\n<p>Correct.</p>\n\n<p><em>I kind of buy the idea of doing orchestration and scheduling code, which is policy-intensive, in a higher-level language.</em></p>\n\n<p>Maybe. If Ruby had a better concurrency story, I don’t think Elixir would have a place for us.</p>\n<div class=\"callout\"><p>Here I need to note that Ruby is functionally dead here, and Elixir is ascendant.</p>\n</div>\n<p><em>We have an idiosyncratic management structure. We’re bottom-up, but ambiguously so. We don’t have roadmaps, except when we do. We have minimal top-down technical direction. Critique.</em></p>\n\n<p>It’s too easy to lose sight of whether your current focus [in what you’re building] is valuable to the company.</p>\n\n<p><em>The first thing I warn every candidate about on our “do-not-work-here” calls.</em></p>\n\n<p>I think it comes down to execution, and accountability to actually finish projects. I spun a lot trying to figure out what would be the most valuable work for Fly Machines.</p>\n\n<p><em>You don’t have to be so nice about things.</em></p>\n\n<p>We struggle a lot with consistent communication. We change direction a little too often. 
It got to a point where I didn’t see a point in devoting time and effort into projects, because I’d not be able to show enough value quick enough.</p>\n\n<p><em>I see things paying off later than we’d hoped or expected they would. Our secret storage system, Pet Semetary, is a good example of this. Our K8s service, FKS, is another obvious one, since we’re shipping MPG on it.</em></p>\n\n<p><em>This is your second time working with Kurt, at a company where he’s the CEO. Give him a 1-4 star rating. He can take it! At least, I think he can take it.</em></p>\n\n<p>2022: ★★★★</p>\n\n<p>2023: ★★</p>\n\n<p>2024: ★★✩</p>\n\n<p>2025: ★★★✩</p>\n\n<p>On a four-star scale.</p>\n\n<p><em>Whoah. I did not expect a histogram. Say more about 2023!</em></p>\n\n<p>We hired too many people, too quickly, and didn’t have the guardrails and structure in place for everybody to be successful.</p>\n\n<p><em>Also: GPUs!</em></p>\n\n<p>Yes. That was my next comment.</p>\n\n<p><em>Do we secretly agree about GPUs?</em></p>\n\n<p>I think so.</p>\n\n<p><em>Our side won the argument in the end! But at what cost?</em></p>\n\n<p>They were a killer distraction.</p>\n\n<p><em>Final question: how long will you remain in the first-responder on-call rotation after you leave? I assume at least until August. I have a shift this weekend; can you swap with me? I keep getting weekends.</em></p>\n\n<p>I am going to be asleep all weekend if any of my previous job changes are indicative.</p>\n\n<p><em>I sleep through on-call too! But nobody can yell at you for it now. I think you have the comparative advantage over me in on-calling.</em></p>\n\n<p>Yes I will absolutely take all your future on-call shifts, you have convinced me.</p>\n\n<p><em>All this aside: it has been a privilege watching you work. I hope your next gig is 100x more relaxing than this was. Or maybe I just hope that for myself. Except: I’ll never escape this place. Thank you so much for doing this.</em></p>\n\n<p>Thank you! 
I’m forever grateful for having the opportunity to be a part of Fly.io.</p>", "image": { "url": "https://fly.io/blog/the-exit-interview-jp/assets/bye-bye-little-sebastian-cover.webp", "title": null }, "media": [], "authors": [ { "name": "Fly", "email": null, "url": null } ], "categories": [] }, { "id": "https://fly.io/blog/a-blog-if-kept/", "title": "A Blog, If You Can Keep It", "description": null, "url": "https://fly.io/blog/a-blog-if-kept/", "published": "2025-02-10T00:00:00.000Z", "updated": "2025-02-19T13:16:17.000Z", "content": "<div class=\"lead\"><p>A boldfaced lede like this was a sure sign you were reading a carefully choreographed EffortPost from our team at Fly.io. We’re going to do less of those. Or the same amount but more of a different kind of post. Either way: launch an app on Fly.io today!</p>\n</div>\n<p>Over the last 5 years, we’ve done pretty well for ourselves writing content for Hacker News. And that’s <a href='https://news.ycombinator.com/item?id=39373476' title=''>mostly</a> been good for us. We don’t do conventional marketing, we don’t have a sales team, the rest of social media is atomized over 5 different sites. Writing pieces that HN takes seriously has been our primary outreach tool.</p>\n\n<p>There’s a recipe (probably several, but I know this one works) for charting a post on HN:</p>\n\n<ol>\n<li>Write an EffortPost, which is to say a dense technical piece over 2000 words long; within that rubric there’s a bunch of things that are catnip to HN, including runnable code, research surveys, and explainers. (There are also cat-repellants you learn to steer clear of.)\n</li><li>Assiduously avoid promotion. You have to write for the audience. 
We get away with sporadically including a call-to-action block in our posts, but otherwise: the post should make sense even if an unrelated company posted it after you went out of business.\n</li><li>Pick topics HN is interested in (it helps if all your topics are au courant for HN, and we’ve been <a href='https://news.ycombinator.com/item?id=32250426' title=''>very</a> <a href='https://news.ycombinator.com/item?id=32018066' title=''>lucky</a> in that regard).\n</li><li>Like 5-6 more style-guide things that help incrementally. Probably 3 different teams writing for HN will have 3 different style guides with only like ½ overlap. Ours, for instance, instructs writers to swear.\n</li></ol>\n\n<p>I like this kind of writing. It’s not even a chore. But it’s become an impediment for us, for a couple reasons: the team serializes behind an “editorial” function here, which keeps us from publishing everything we want; worse, caring so much about our track record leaves us noodling on posts interminably (the poor <a href='https://www.tigrisdata.com/' title=''>Tigrises</a> have been waiting for months for me to publish the piece I wrote about them and FoundationDB; take heart, this post today means that one is coming soon).</p>\n\n<p>But worst of all, I worried incessantly about us <a href='https://gist.github.com/tqbf/e853764b562a2d72a91a6986ca3b77c0' title=''>wearing out our welcome</a>. To my mind, we’d have 1, maybe 2 bites at the HN apple in a given month, and we needed to make them count.</p>\n\n<p>That was dumb. I am dumb about a lot of things! I came around to understanding this after Kurt demanded I publish my blog post about BFAAS (Bash Functions As A Service), 500 lines of Go code that had generated 4500 words in my draft. It was only after I made the decision to stop gatekeeping this blog that I realized <a href='https://simonwillison.net/' title=''>Simon Willison</a> has been disproving my “wearing out the welcome” theory, day in and day out, for years. 
He just writes stuff about LLMs when it interests him. I mean, it helps that he’s a better writer than we are. But he’s not wasting time choreographing things.</p>\n\n<p>Back in like 2009, <a href='https://web.archive.org/web/20110806040300/http://chargen.matasano.com/chargen/2009/7/22/if-youre-typing-the-letters-a-e-s-into-your-code-youre-doing.html' title=''>we had a blog</a> at another company I was at. That blog drove a lot of business for us (and, on three occasions, almost killed me). It was not in the least bit optimized for HN. I like pretending to be a magazine feature writer, but I miss writing dashed-off pieces every day and clearing space for other people on the team to write as well.</p>\n\n<p>So this is all just a heads up: we’re trying something new. This is a very long and self-indulgent way to say “we’re going to write a normal blog like it’s 2008”, but that’s how broken my brain is after years of having my primary dopaminergic rewards come from how long Fly.io blog posts stay on the front page: I have to disclaim blogging before we start doing it, lest I fail to meet expectations.</p>\n\n<p>Like I said. I’m real dumb. But: looking forward to getting a lot more stuff out on the web for people to read this year!</p>", "image": { "url": "https://fly.io/blog/a-blog-if-kept/assets/keep-blog.webp", "title": null }, "media": [], "authors": [ { "name": "Fly", "email": null, "url": null } ], "categories": [] }, { "id": "https://fly.io/blog/semgrep-but-for-real-now/", "title": "Did Semgrep Just Get A Lot More Interesting?", "description": null, "url": "https://fly.io/blog/semgrep-but-for-real-now/", "published": "2025-02-10T00:00:00.000Z", "updated": "2025-02-11T00:20:14.000Z", "content": "<div class=\"right-sidenote\"><p>This whole paragraph is just one long sentence. 
God I love <a href=\"https://fly.io/blog/a-blog-if-kept/\" title=\"\">just random-ass blogging</a> again.</p>\n</div>\n<p><a href='https://ghuntley.com/stdlib/' title=''>This bit by Geoffrey Huntley</a> is super interesting to me and, despite calling out that LLM-driven development agents like Cursor have something like a 40% success rate at actually building anything that passes acceptance criteria, makes me think that more of the future of our field belongs to people who figure out how to use these weird bags of model weights than any of us are comfortable with. </p>\n\n<p>I’ve been dinking around with Cursor for a week now (if you haven’t, I think it’s something close to malpractice not to at least take it — or something like it — for a spin) and am just now from this post learning that Cursor has this <a href='https://docs.cursor.com/context/rules-for-ai' title=''>rules feature</a>. </p>\n\n<p>The important thing for me is not how Cursor rules work, but rather how Huntley uses them. He turns them back on themselves, writing rules to tell Cursor how to organize the rules, and then teach Cursor how to write (under human supervision) its own rules.</p>\n\n<p>Cursor kept trying to get Huntley to use Bazel as a build system. So he had Cursor write a rule for itself: “no bazel”. And there was no more Bazel. If I’d known I could do this, I probably wouldn’t have bounced from the Elixir project I had Cursor doing, where trying to get it to write simple unit tests got it all tangled up trying to make <a href='https://hexdocs.pm/mox/Mox.html' title=''>Mox</a> work. </p>\n\n<p>But I’m burying the lead. </p>\n\n<p>Security people have been for several years now somewhat in love with a tool called <a href='https://github.com/semgrep/semgrep' title=''>Semgrep</a>. Semgrep is a semantics-aware code search tool; using symbolic variable placeholders and otherwise ordinary code, you can write rules to match pretty much arbitrary expressions and control flow. 
</p>\n\n<p>If you’re an appsec person, where you obviously go with this is: you build a library of Semgrep searches for well-known vulnerability patterns (or, if you’re like us at Fly.io, you work out how to get Semgrep to catch the Rust concurrency footgun of RWLocks inside if-lets).</p>\n\n<p>The reality for most teams though is “ain’t nobody got time for that”. </p>\n\n<p>But I just checked and, unsurprisingly, 4o <a href='https://chatgpt.com/share/67aa94a7-ea3c-8012-845c-6c9491b33fe4' title=''>seems to do reasonably well</a> at generating Semgrep rules? Like: I have no idea if this rule is actually any good. But it looks like a Semgrep rule?</p>\n\n<p>What interests me is this: it seems obvious that we’re going to do more and more “closed-loop” LLM agent code generation stuff. By “closed loop”, I mean that the thingy that generates code is going to get to run the code and watch what happens when it’s interacted with. You’re just a small bit of glue code and a lot of system prompting away from building something like that right now: <a href='https://x.com/chris_mccord/status/1882839014845374683' title=''>Chris McCord is building</a> a thingy that generates whole Elixir/Phoenix apps and runs them as Fly Machines. When you deploy these kinds of things, the LLM gets to see the errors when the code is run, and it can just go fix them. It also gets to see errors and exceptions in the logs when you hit a page on the app, and it can just go fix them.</p>\n\n<p>With a bit more system prompting, you can get an LLM to try to generalize out from exceptions it fixes and generate unit test coverage for them. </p>\n\n<p>With a little bit more system prompting, you can probably get an LLM to (1) generate a Semgrep rule for the generalized bug it caught, (2) test the Semgrep rule with a positive/negative control, (3) save the rule, (4) test the whole codebase with Semgrep for that rule, and (5) fix anything it finds that way. 
</p>\n\n<p>That is a lot more interesting to me than tediously (and probably badly) trying to predict everything that will go wrong in my codebase a priori and Semgrepping for them. Which is to say: Semgrep — which I have always liked — is maybe a lot more interesting now? And tools like it?</p>", "image": { "url": "https://fly.io/blog/semgrep-but-for-real-now/assets/ci-cd-cover.webp", "title": null }, "media": [], "authors": [ { "name": "Fly", "email": null, "url": null } ], "categories": [] }, { "id": "https://fly.io/blog/vscode-ssh-wtf/", "title": "VSCode’s SSH Agent Is Bananas", "description": null, "url": "https://fly.io/blog/vscode-ssh-wtf/", "published": "2025-02-07T00:00:00.000Z", "updated": "2025-02-07T21:53:40.000Z", "content": "<p>We’re interested in getting integrated into the flow VSCode uses to do remote editing over SSH, because everybody is using VSCode now, and, in particular, they’re using forks of VSCode that generate code with LLMs. </p>\n<div class=\"right-sidenote\"><p>”hallucination” is what we call it when LLMs get code wrong; “engineering” is what we call it when people do.</p>\n</div>\n<p>LLM-generated code is <a href='https://nicholas.carlini.com/writing/2024/how-i-use-ai.html' title=''>useful in the general case</a> if you know what you’re doing. But it’s ultra-useful if you can close the loop between the LLM and the execution environment (with an “Agent” setup). There’s lots to say about this, but for the moment: it’s a semi-effective antidote to hallucination: the LLM generates the code, the agent scaffolding runs the code, the code generates errors, the agent feeds it back to the LLM, the process iterates. </p>\n\n<p>So, obviously, the issue here is you don’t want this iterative development process happening on your development laptop, because LLMs have boundary issues, and they’ll iterate on your system configuration just as happily on the Git project you happen to be working in. 
A thing you’d really like to be able to do: run a closed-loop agent-y (“agentic”? is that what we say now) configuration for an LLM, on a clean-slate Linux instance that spins up instantly and that can’t screw you over in any way. You get where we’re going with this.</p>\n\n<p>Anyways! I would like to register a concern.</p>\n\n<p>Emacs hosts the spiritual forebear of remote editing systems, a blob of hyper-useful Elisp called <a href='https://www.gnu.org/software/tramp/' title=''>“Tramp”</a>. If you can hook Tramp up to any kind of interactive environment — usually, an SSH session — where it can run Bourne shell commands, it can extend Emacs to that environment.</p>\n\n<p>So, VSCode has a feature like Tramp. Which, neat, right? You’d think, take Tramp, maybe simplify it a bit, switch out Elisp for Typescript.</p>\n\n<p>You’d think wrong!</p>\n\n<p>Unlike Tramp, which lives off the land on the remote connection, VSCode mounts a full-scale invasion: it runs a Bash snippet stager that downloads an agent, including a binary installation of Node. </p>\n\n<p>I <em>think</em> this is <a href='https://github.com/microsoft/vscode/tree/c9e7e1b72f80b12ffc00e06153afcfedba9ec31f/src/vs/server/node' title=''>the source code</a>?</p>\n\n<p>The agent runs over port-forwarded SSH. It establishes a WebSockets connection back to your running VSCode front-end. The underlying protocol on that connection can:</p>\n\n<ul>\n<li>Wander around the filesystem\n</li><li>Edit arbitrary files\n</li><li>Launch its own shell PTY processes\n</li><li>Persist itself\n</li></ul>\n\n<p>In security-world, there’s a name for tools that work this way. I won’t say it out loud, because that’s not fair to VSCode, but let’s just say the name is murid in nature.</p>\n\n<p>I would be a little nervous about letting people VSCode-remote-edit stuff on dev servers, and apoplectic if that happened during an incident on something in production. 
</p>\n\n<p>It turns out we don’t have to care about any of this to get a custom connection to a Fly Machine working in VSCode, so none of this matters in any kind of deep way, but: we’ve decided to just be a blog again, so: we had to learn this, and now you do too.</p>", "image": { "url": "https://fly.io/static/images/default-post-thumbnail.webp", "title": null }, "media": [], "authors": [ { "name": "Fly", "email": null, "url": null } ], "categories": [] }, { "id": "https://fly.io/blog/ai-gpu-clusters-from-your-laptop-livebook/", "title": "AI GPU Clusters, From Your Laptop, With Livebook", "description": null, "url": "https://fly.io/blog/ai-gpu-clusters-from-your-laptop-livebook/", "published": "2024-09-24T00:00:00.000Z", "updated": "2024-10-03T19:05:54.000Z", "content": "<div class=\"lead\"><p>Livebook, FLAME, and the Nx stack: three Elixir components that are easy to describe, more powerful than they look, and intricately threaded into the Elixir ecosystem. A few weeks ago, Chris McCord (👋) and Chris Grainger showed them off at ElixirConf 2024. We thought the talk was worth a recap.</p>\n</div>\n<p>Let’s begin by introducing our cast of characters.</p>\n\n<p><a href='https://livebook.dev/' title=''>Livebook</a> is usually described as Elixir’s answer to <a href='https://jupyter.org/' title=''>Jupyter Notebooks</a>. And that’s a good way to think about it. But Livebook takes full advantage of the Elixir platform, which makes it sneakily powerful. By linking up directly with Elixir app clusters, Livebook can switch easily between driving compute locally or on remote servers, and makes it easy to bring in any kind of data into reproducible workflows.</p>\n\n<p><a href='https://fly.io/blog/rethinking-serverless-with-flame/' title=''>FLAME</a> is Elixir’s answer to serverless computing. By having the library manage a pool of executors for you, FLAME lets you treat your entire application as if it was elastic and scale-to-zero. 
You configure FLAME with some basic information about where to run code and how many instances it’s allowed to run with, and then mark off any arbitrary section of code with <code>FLAME.call</code>. The framework takes care of the rest. It’s the upside of serverless without committing yourself to blowing your app apart into tiny, intricately connected pieces.</p>\n\n<p>The <a href='https://github.com/elixir-nx' title=''>Nx stack</a> is how you do Elixir-native AI and ML. Nx gives you an Elixir-native notion of tensor computations with GPU backends. <a href='https://github.com/elixir-nx/axon' title=''>Axon</a> builds a common interface for ML models on top of it. <a href='https://github.com/elixir-nx/bumblebee' title=''>Bumblebee</a> makes those models available to any Elixir app that wants to download them, from just a couple lines of code.</p>\n\n<p>Here is a quick video showing how to transfer a local tensor to a remote GPU, using Livebook, FLAME, and Nx:</p>\n<div class=\"youtube-container\" data-exclude-render>\n <div class=\"youtube-video\">\n <iframe\n width=\"100%\"\n height=\"100%\"\n src=\"https://www.youtube.com/embed/5ImP3gpUSkQ\"\n frameborder=\"0\"\n allow=\"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture\"\n allowfullscreen>\n </iframe>\n </div>\n</div>\n\n\n<p>Let’s dive into the <a href='https://www.youtube.com/watch?v=4qoHPh0obv0' title=''>keynote</a>.</p>\n<h2 id='poking-a-hole-in-your-infrastructure' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#poking-a-hole-in-your-infrastructure' aria-label='Anchor'></a><span class='plain-code'>Poking a hole in your infrastructure</span></h2>\n<p>Any Livebook, including the one running on your laptop, can start a runtime running on a Fly Machine, in Fly.io’s public cloud. 
That Elixir machine will (by default) live in your default Fly.io organization, giving it networked access to all the other apps that might live there.</p>\n\n<p>This is an access control situation that mostly just does what you want it to do without asking. Unless you ask it to, Fly.io isn’t exposing anything to the Internet, or to other users of Fly.io. For instance: say we have a database we’re going to use to generate reports. It can hang out on our Fly organization, inside of a private network with no connectivity to the world. We can spin up a Livebook instance that can talk to it, without doing any network or infrastructure engineering to make that happen.</p>\n\n<p>But wait, there’s more. Because this is all Elixir, Livebook also allows you to connect to any running Erlang/Elixir application in your infrastructure to debug, introspect, and monitor them.</p>\n\n<p>Check out this clip of Chris McCord connecting <a href='https://rtt.fly.dev/' title=''>to an existing application</a> during the keynote:</p>\n<div class=\"youtube-container\" data-exclude-render>\n <div class=\"youtube-video\">\n <iframe\n width=\"100%\"\n height=\"100%\"\n src=\"https://www.youtube.com/embed/4qoHPh0obv0?start=1106\"\n frameborder=\"0\"\n allow=\"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture\"\n allowfullscreen>\n </iframe>\n </div>\n</div>\n\n\n<p>Running a snippet of code from a laptop on a remote server is a neat trick, but Livebook is doing something deeper than that. It’s taking advantage of Erlang/Elixir’s native facility with cluster computation and making it available to the notebook. As a result, when we do things like auto-completing, Livebook delivers results from modules defined on the remote node itself. 
🤯</p>\n<h2 id='elastic-scale-with-flame' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#elastic-scale-with-flame' aria-label='Anchor'></a><span class='plain-code'>Elastic scale with FLAME</span></h2>\n<p>When we first introduced FLAME, the example we used was video encoding.</p>\n\n<p>Video encoding is complicated and slow enough that you’d normally make arrangements to run it remotely or in a background job queue, or as a triggerable Lambda function. The point of FLAME is to get rid of all those steps, and give them over to the framework instead. So: we wrote our <code>ffmpeg</code> calls inline like normal code, as if they were going to complete in microseconds, and wrapped them in <code>FLAME.call</code> blocks. That was it, that was the demo.</p>\n\n<p>Here, we’re going to put a little AI spin on it.</p>\n\n<p>The first thing we’re doing here is driving FLAME pools from Livebook. Livebook will automatically synchronize your notebook dependencies as well as any module or code defined in your notebook across nodes. That means any code we write in our notebook can be dispatched transparently out to arbitrarily many compute nodes, without ceremony.</p>\n\n<p>Now let’s add some AI flair. We take an object store bucket full of video files. We use <code>ffmpeg</code> to extract stills from the video at different moments. 
Then: we send them to <a href='https://www.llama.com/' title=''>Llama</a>, running on <a href='https://fly.io/gpu' title=''>GPU Fly Machines</a> (still locked to our organization), to get descriptions of the stills.</p>\n\n<p>All those stills and descriptions get streamed back to our notebook, in real time:</p>\n<div class=\"youtube-container\" data-exclude-render>\n <div class=\"youtube-video\">\n <iframe\n width=\"100%\"\n height=\"100%\"\n src=\"https://www.youtube.com/embed/4qoHPh0obv0?start=1692\"\n frameborder=\"0\"\n allow=\"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture\"\n allowfullscreen>\n </iframe>\n </div>\n</div>\n\n\n<p>At the end, the descriptions are sent to <a href='https://mistral.ai/' title=''>Mistral</a>, which builds a summary.</p>\n\n<p>Thanks to FLAME, we get explicit control over the minimum and the maximum number of nodes we want running at once, as well as their concurrency settings. As nodes finish processing each video, new ones are automatically sent to them, until the whole bucket has been traversed. 
Each node will automatically shut down after an idle timeout and the whole cluster terminates if you disconnect the Livebook runtime.</p>\n\n<p>Just like your app code, FLAME lets you take your notebook code designed to run locally, change almost nothing, and elastically execute it across ephemeral infrastructure.</p>\n<h2 id='64-gpus-hyperparameter-tuning-on-a-laptop' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#64-gpus-hyperparameter-tuning-on-a-laptop' aria-label='Anchor'></a><span class='plain-code'>64-GPUs hyperparameter tuning on a laptop</span></h2>\n<p>Next, Chris Grainger, CTO of <a href='https://amplified.ai/' title=''>Amplified</a>, takes the stage.</p>\n\n<p>For work at Amplified, Chris wants to analyze a gigantic archive of patents, on behalf of a client doing edible cannabinoid work. To do that, he uses a BERT model (BERT, from Google, is one of the OG “transformer” models, optimized for text comprehension).</p>\n\n<p>To make the BERT model effective for this task, he’s going to do a hyperparameter training run.</p>\n\n<p>This is a much more complicated AI task than the Llama work we just showed. Chris is going to generate a cluster of 64 GPU Fly Machines, each running an <a href='https://fly.io/blog/cutting-prices-for-l40s-gpus-in-half/' title=''>L40s GPU</a>. On each of these nodes, he needs to:</p>\n\n<ul>\n<li>set up its environment (including native dependencies and GPU bindings)\n</li><li>load the training data\n</li><li>compile a different version of BERT with different parameters, optimizers, etc.\n</li><li>start the fine-tuning\n</li><li>stream its results in real-time to each assigned chart\n</li></ul>\n\n<p>Here’s the clip. You’ll see the results stream in, in real time, directly back to his Livebook. 
We’ll wait, because it won’t take long to watch:</p>\n<div class=\"youtube-container\" data-exclude-render>\n <div class=\"youtube-video\">\n <iframe\n width=\"100%\"\n height=\"100%\"\n src=\"https://www.youtube.com/embed/4qoHPh0obv0?start=3344\"\n frameborder=\"0\"\n allow=\"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture\"\n allowfullscreen>\n </iframe>\n </div>\n</div>\n\n<h2 id='this-is-just-the-beginning' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#this-is-just-the-beginning' aria-label='Anchor'></a><span class='plain-code'>This is just the beginning</span></h2>\n<p>The suggestion of mixing Livebook and FLAME to elastically scale notebook execution was originally proposed by Chris Grainger during ElixirConf EU. During the next four months, Jonatan Kłosko, Chris McCord, and José Valim worked part-time on making it a reality in time for ElixirConf US. Our ability to deliver such a rich combination of features in such a short period of time is a testament to the capabilities of the Erlang Virtual Machine, which Elixir and Livebook run on. Other features, such as <a href='https://github.com/elixir-explorer/explorer/issues/932' title=''>remote dataframes and distributed GC</a>, were implemented in a weekend. Bringing the same functionality to other ecosystems would take several additional months, sometimes accompanied by millions in funding, and oftentimes as part of a closed-source product.</p>\n\n<p>Furthermore, since we announced this feature, <a href='https://github.com/mruoss' title=''>Michael Ruoss</a> stepped in and brought the same functionality to Kubernetes. From Livebook v0.14.1, you can start Livebook runtimes inside a Kubernetes cluster and also use FLAME to elastically scale them. 
Expect more features and news in this space!</p>\n\n<p>Finally, Fly’s infrastructure played a key role in making it possible to start a cluster of GPUs in seconds rather than minutes, and all it requires is a Docker image. We’re looking forward to seeing how other technologies and notebook platforms can leverage Fly to also elevate their developer experiences.</p>\n<figure class=\"post-cta\">\n <figcaption>\n <h1>Launch a GPU app in seconds</h1>\n <p>Run your own LLMs or use Livebook for elastic GPU workflows ✨</p>\n <a class=\"btn btn-lg\" href=\"/gpu\">\n Go! <span class='opacity-50'>→</span>\n </a>\n </figcaption>\n <div class=\"image-container\">\n <img src=\"/static/images/cta-cat.webp\" srcset=\"/static/images/[email protected] 2x\" alt=\"\">\n </div>\n</figure>", "image": { "url": "https://fly.io/blog/ai-gpu-clusters-from-your-laptop-livebook/assets/ai-gpu-livebook-thumb.webp", "title": null }, "media": [], "authors": [ { "name": "Fly", "email": null, "url": null } ], "categories": [] }, { "id": "https://fly.io/blog/accident-forgiveness/", "title": "Accident Forgiveness", "description": null, "url": "https://fly.io/blog/accident-forgiveness/", "published": "2024-08-21T00:00:00.000Z", "updated": "2024-08-27T21:13:01.000Z", "content": "<div class=\"lead\"><p>We’re Fly.io, a new public cloud with simple, developer-friendly ergonomics. <a href=\"https://fly.io/speedrun\" title=\"\">Try it out; you’ll be deployed in just minutes</a>, and, as you’re about to read, with less financial risk.</p>\n</div>\n<p>Public cloud billing is terrifying.</p>\n\n<p>The premise of a public cloud — what sets it apart from a hosting provider — is 8,760 hours/year of on-tap deployable compute, storage, and networking. Cloud resources are “elastic”: they’re acquired and released as needed; in the “cloud-iest” apps, without human intervention. 
Public cloud resources behave like utilities, and that’s how they’re priced.</p>\n\n<p>You probably can’t tell me how much electricity your home is using right now, and may only come within tens of dollars of accurately predicting your water bill. But neither of those bills are all that scary, because you assume there’s a limit to how much you could run them up in a single billing interval.</p>\n\n<p>That’s not true of public clouds. There are only so many ways to “spend” water at your home, but there are indeterminably many ways to land on a code path that grabs another VM, or to miskey a configuration, or to leak long-running CI/CD environments every time a PR gets merged. Pick a practitioner at random, and I bet they’ve read a story within the last couple months about someone running up a galactic-scale bill at some provider or other.</p>\n<h2 id='implied-accident-forgiveness' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#implied-accident-forgiveness' aria-label='Anchor'></a><span class='plain-code'>Implied Accident Forgiveness</span></h2>\n<p>For people who don’t do a lot of cloud work, what all this means is that every PR push sets off a little alarm in the back of their heads: “you may have just incurred $200,000 of costs!”. The alarm is quickly silenced, though it’s still subtly extracting a cortisol penalty. 
But by deadening the nerves that sense the danger of unexpected charges, those people are nudged closer to themselves being the next story on Twitter about an accidental $200,000 bill.</p>\n\n<p>The saving grace here, which you’ll learn if you ever become that $200,000 story, is that nobody pays those bills.</p>\n\n<p>See, what cloud-savvy people know already is that providers have billing support teams, which spend a big chunk of their time conceding disputed bills. If you do something luridly stupid and rack up costs, AWS and GCP will probably cut you a break. We will too. Everyone does.</p>\n\n<p>If you didn’t already know this, you’re welcome; I’ve made your life a little better, even if you don’t run things on Fly.io.</p>\n\n<p>But as soothing as it is to know you can get a break from cloud providers, the billing situation here is still a long ways away from “good”. If you accidentally add a zero to a scale count and don’t notice for several weeks, AWS or GCP will probably cut you a break. But they won’t <em>definitely</em> do it, and even though your odds are good, you’re still finding out at email- and phone-tag scale speeds. That’s not fun!</p>\n<h2 id='explicit-accident-forgiveness' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#explicit-accident-forgiveness' aria-label='Anchor'></a><span class='plain-code'>Explicit Accident Forgiveness</span></h2>\n<p>Charging you for stuff you didn’t want is bad business.</p>\n\n<p>Good business, we think, means making you so comfortable with your cloud you try new stuff. You, and everyone else on your team. Without a chaperone from the finance department.</p>\n\n<p>So we’re going to do the work to make this official. 
If you’re a customer of ours, we’re going to charge you in exacting detail for every resource you intentionally use of ours, but if something blows up and you get an unexpected bill, we’re going to let you off the hook.</p>\n<h2 id='not-so-fast' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#not-so-fast' aria-label='Anchor'></a><span class='plain-code'>Not So Fast</span></h2>\n<p>This is a Project, with a capital P. While we’re kind of kicking ourselves for not starting it earlier, there are reasons we couldn’t do it back in 2020.</p>\n\n<p>The Fully Automated Accident-Forgiving Billing System of the Future (which we are in fact building and may even one day ship) will give you a line-item veto on your invoice. We are a long ways away. The biggest reason is fraud.</p>\n\n<p>Sit back, close your eyes, and try to think about everything public clouds do to make your life harder. Chances are, most of those things are responses to fraud. Cloud platforms attract fraudsters like ants to an upturned ice cream cone. Thanks to the modern science of cryptography, fraudsters have had a 15 year head start on turning metered compute into picodollar-granular near-money assets.</p>\n\n<p>Since there’s no bouncer at the door checking IDs here, an open-ended and automated commitment to accident forgiveness is, with iron certainty, going to be used overwhelmingly in order to trick us into “forgiving” cryptocurrency miners. We’re cloud platform engineers. 
They’re our primary pathogen.</p>\n\n<p>So, we’re going to roll this out incrementally.</p>\n<div class=\"callout\"><p><strong class=\"font-semibold text-navy-950\">Why not billing alerts?</strong> We’ll get there, but here too there are reasons we haven’t yet: (1) meaningful billing alerts were incredibly difficult to do with our previous billing system, and building the new system and migrating our customers to it has been a huge lift, a nightmare from which we are only now waking (the billing system’s official name); and (2) we’re wary about alerts being a product design cop-out; if we can alert on something, why aren’t we fixing it?</p>\n</div><h2 id='accident-forgiveness-v0-84beta' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#accident-forgiveness-v0-84beta' aria-label='Anchor'></a><span class='plain-code'>Accident Forgiveness v0.84beta</span></h2>\n<p>All the same subtextual, implied reassurances that every cloud provider offers remain in place at Fly.io. You are strictly better off after this announcement, we promise.</p>\n<div class=\"right-sidenote\"><p>I added the “almost” right before publishing, because I’m chicken.</p>\n</div>\n<p>Now: for customers that have a support contract with us, at any level, there’s something new: I’m saying the quiet part loud. The next time you see a bill with an unexpected charge on it, we’ll refund that charge, (almost) no questions asked.</p>\n\n<p>That policy is so simple it feels anticlimactic to write. So, some additional color commentary:</p>\n\n<p>We’re not advertising a limit to the number of times you can do this. If you’re a serious customer of ours, I promise that you cannot remotely fathom the fullness of our fellow-feeling. You’re not annoying us by getting us to refund unexpected charges. 
If you are growing a project on Fly.io, we will bend over backwards to keep you growing.</p>\n\n<p>How far can we take this? How simple can we keep this policy? We’re going to find out together.</p>\n\n<p>To begin with, and in the spirit of “doing things that won’t scale”, when we forgive a bill, what’s going to happen next is this: I’m going to set an irritating personal reminder for Kurt to look into what happened, now and then the day before your next bill, so we can see what’s going wrong. He’s going to hate that, which is the point: our best feature work is driven by Kurt-hate.</p>\n\n<p>Obviously, if you’re rubbing your hands together excitedly over the opportunity this policy presents, then, well, not so much with the fellow-feeling. We reserve the right to cut you off.</p>\n<figure class=\"post-cta\">\n <figcaption>\n <h1>Support For Developers, By Developers</h1>\n <p>Explicit Accident Forgiveness is just one thing we like about Support at Fly.io.</p>\n <a class=\"btn btn-lg\" href=\"https://fly.io/accident-forgiveness\">\n Go find out! <span class='opacity-50'>→</span>\n </a>\n </figcaption>\n <div class=\"image-container\">\n <img src=\"/static/images/cta-kitty.webp\" srcset=\"/static/images/[email protected] 2x\" alt=\"\">\n </div>\n</figure>\n\n<h2 id='whats-next-accident-protection' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#whats-next-accident-protection' aria-label='Anchor'></a><span class='plain-code'>What’s Next: Accident Protection</span></h2>\n<p>We think this is a pretty good first step. But that’s all it is.</p>\n\n<p>We can do better than offering you easy refunds for mistaken deployments and botched CI/CD jobs. 
What’s better than getting a refund is never incurring the charge to begin with, and that’s the next step we’re working on.</p>\n<div class=\"right-sidenote\"><p>More to come on that billing system.</p>\n</div>\n<p>We built a new billing system so that we can do things like that. For instance: we’re in a position to catch sudden spikes in your month-over-month bills, flag them, and catch weird-looking deployments before we bill for them.</p>\n\n<p>Another thing we rebuilt billing for is <a href='https://community.fly.io/t/reservation-blocks-40-discount-on-machines-when-youre-ready-to-commit/20858' title=''>reserved pricing</a>. Already today you can get a steep discount from us reserving blocks of compute in advance. The trick to taking advantage of reserved pricing is confidently predicting a floor to your usage. For a lot of people, that means fighting feelings of loss aversion (nobody wants to get gym priced!). So another thing we can do in this same vein: catch opportunities to move customers to reserved blocks, and offer backdated reservations. We’ll figure this out too.</p>\n\n<p>Someday, when we’re in a monopoly position, our founders have all been replaced by ruthless MBAs, and Kurt has retired to farm coffee beans in lower Montana, we may stop doing this stuff. But until that day this is the right choice for our business.</p>\n\n<p>Meanwhile: like every public cloud, we provision our own hardware, and we have excess capacity. Your messed-up CI/CD jobs didn’t really cost us anything, so if you didn’t really want them, they shouldn’t cost you anything either. Take us up on this! 
We love talking to you.</p>", "image": { "url": "https://fly.io/blog/accident-forgiveness/assets/money-for-mistakes-blog-thumb.webp", "title": null }, "media": [], "authors": [ { "name": "Fly", "email": null, "url": null } ], "categories": [] }, { "id": "https://fly.io/blog/cutting-prices-for-l40s-gpus-in-half/", "title": "We're Cutting L40S Prices In Half", "description": null, "url": "https://fly.io/blog/cutting-prices-for-l40s-gpus-in-half/", "published": "2024-08-15T00:00:00.000Z", "updated": "2024-08-16T02:01:46.000Z", "content": "<div class=\"lead\"><p>We’re Fly.io, a new public cloud with simple, developer-friendly ergonomics. And as of today, cheaper GPUs. <a href=\"https://fly.io/speedrun\" title=\"\">Try it out; you’ll be deployed in just minutes</a>.</p>\n</div>\n<p>We just lowered the prices on NVIDIA L40S GPUs to $1.25 per hour. Why? Because our feet are cold and we burn processor cycles for heat. But also other reasons.</p>\n\n<p>Let’s back up.</p>\n\n<p>We offer 4 different NVIDIA GPU models; in increasing order of performance, they’re the A10, the L40S, the 40G PCI A100, and the 80G SXM A100. Guess which one is most popular.</p>\n\n<p>We guessed wrong, and spent a lot of time working out how to maximize the amount of GPU power we could deliver to a single Fly Machine. Users surprised us. By a wide margin, the most popular GPU in our inventory is the A10.</p>\n\n<p>The A10 is an older generation of NVIDIA GPU with fewer, slower cores and less memory. It’s the least capable GPU we offer. But that doesn’t matter, because it’s capable enough. It’s solid for random inference tasks, and handles mid-sized generative AI stuff like Mistral Nemo or Stable Diffusion. For those workloads, there’s not that much benefit in getting a beefier GPU.</p>\n\n<p>As a result, we can’t get new A10s in fast enough for our users.</p>\n\n<p>If there’s one thing we’ve learned by talking to our customers over the last 4 years, it’s that y'all love a peek behind the curtain. 
So we’re going to let you in on a little secret about how a hardware provider like Fly.io formulates GPU strategy: none of us know what the hell we’re doing.</p>\n\n<p>If you had asked us in 2023 what the biggest GPU problem we could solve was, we’d have said “selling fractional A100 slices”. We burned a whole quarter trying to get MIG, or at least vGPUs, working through IOMMU PCI passthrough on Fly Machines, in a project so cursed that Thomas has forsworn ever programming again. Then we went to market selling whole A100s, and for several more months it looked like the biggest problem we needed to solve was finding a secure way to expose NVLink-ganged A100 clusters to VMs so users could run training. Then H100s; can we find H100s anywhere? Maybe in a black market in Shenzhen?</p>\n\n<p>And here we are, a year later, looking at the data, and the least sexy, least interesting GPU part in the catalog is where all the action is.</p>\n\n<p>With actual customer data to back up the hypothesis, here’s what we think is happening today:</p>\n\n<ul>\n<li>Most users who want to plug GPU-accelerated AI workloads into fast networks are doing inference, not training. \n</li><li>The hyperscaler public clouds strangle these customers, first with GPU instance surcharges, and then with egress fees for object storage data when those customers try to outsource the GPU stuff to GPU providers.\n</li><li>If you’re trying to do something GPU-accelerated in response to an HTTP request, the right combination of GPU, instance RAM, fast object storage for datasets and model parameters, and networking is much more important than getting your hands on an H100.\n</li></ul>\n\n<p>This is a thing we didn’t see coming, but should have: training workloads tend to look more like batch jobs, and inference tends to look more like transactions. Batch training jobs aren’t that sensitive to networking or even reliability. Live inference jobs responding to end-user HTTP requests are. 
So, given our pricing, of course the A10s are a sweet spot.</p>\n\n<p>The next step up in our lineup after the A10 is the L40S. The L40S is a nice piece of kit. We’re going to take a beat here and sell you on the L40S, because it’s kind of awesome.</p>\n\n<p>The L40S is an AI-optimized version of the L40, which is the data center version of the GeForce RTX 4090, resembling two 4090s stapled together.</p>\n\n<p>If you’re not a GPU hardware person, the RTX 4090 is a gaming GPU, the kind you’d play ray-traced Witcher 3 on. NVIDIA’s high-end gaming GPUs are actually reasonably good at AI workloads! But they suck in a data center rack: they chug power, they’re hard to cool, and they’re less dense. Also, NVIDIA can’t charge as much for them.</p>\n\n<p>Hence the L40: (much) more memory, less energy consumption, designed for a rack, not a tower case. Marked up for “enterprise”.</p>\n\n<p>NVIDIA positioned the L40 as a kind of “graphics” AI GPU. Unlike the super high-end cards like the A100/H100, the L40 keeps all the rendering hardware, so it’s good for 3D graphics and video processing. Which is sort of what you’d expect from a “professionalized” GeForce card.</p>\n\n<p>A funny thing happened in the middle of 2023, though: the market for ultra-high-end NVIDIA cards went absolutely batshit. The huge cards you’d gang up for training jobs got impossible to find, and NVIDIA became one of the most valuable companies in the world. Serious shops started working out plans to acquire groups of L40-type cards to work around the problem, whether or not they had graphics workloads.</p>\n\n<p>The only company in this space that does know what they’re doing is NVIDIA. Nobody has written a highly-ranked Reddit post about GPU workloads without NVIDIA noticing and creating a new SKU. So they launched the L40S, which is an L40 with AI workload compute performance comparable to that of the A100 (without us getting into the details of F32 vs. 
F16 models).</p>\n\n<p>Long story short, the L40S is an A100-performer that we can price for A10 customers; the Volkswagen GTI of our lineup. We’re going to see if we can make that happen.</p>\n\n<p>We think the combination of just-right-sized inference GPUs and Tigris object storage is pretty killer:</p>\n\n<ul>\n<li>model parameters, data sets, and compute are all close together\n</li><li>everything plugged into an Anycast network that’s fast everywhere in the world\n</li><li>on VM instances that have enough memory to actually run real frameworks on\n</li><li>priced like we actually want you to use it.\n</li></ul>\n\n<p>You should use L40S cards without thinking hard about it. So we’re making it official. You won’t pay us a dime extra to use one instead of an A10. Have at it! Revolutionize the industry. For $1.25 an hour.</p>\n\n<p>Here are things you can do with an L40S on Fly.io today:</p>\n\n<ul>\n<li>You can run Llama 3.1 70B — a big Llama — for LLM jobs.\n</li><li>You can run Flux from Black Forest Labs for genAI images.\n</li><li>You can run Whisper for automated speech recognition.\n</li><li>You can do whole-genome alignment with SegAlign (Thomas’ biochemist kid who has been snatching free GPU hours for his lab gave us this one, and we’re taking his word for it).\n</li><li>You can run DOOM Eternal, building the Stadia that Google couldn’t pull off, because the L40S hasn’t forgotten that it’s a graphics GPU. \n</li></ul>\n\n<p>It’s going to get chilly in Chicago in a month or so. Go light some cycles on fire! 
</p>", "image": { "url": "https://fly.io/blog/cutting-prices-for-l40s-gpus-in-half/assets/gpu-ga-thumb.webp", "title": null }, "media": [], "authors": [ { "name": "Fly", "email": null, "url": null } ], "categories": [] }, { "id": "https://fly.io/blog/machine-migrations/", "title": "Making Machines Move", "description": null, "url": "https://fly.io/blog/machine-migrations/", "published": "2024-07-30T00:00:00.000Z", "updated": "2024-08-07T00:54:26.000Z", "content": "<div class=\"lead\"><p>We’re Fly.io, a global public cloud with simple, developer-friendly ergonomics. If you’ve got a working Docker image, we’ll transmogrify it into a Fly Machine: a VM running on our hardware anywhere in the world. <a href=\"https://fly.io/speedrun\" title=\"\">Try it out; you’ll be deployed in just minutes</a>.</p>\n</div>\n<p>At the heart of our platform is a systems design tradeoff about durable storage for applications. When we added storage three years ago, to support stateful apps, we built it on attached NVMe drives. A benefit: a Fly App accessing a file on a Fly Volume is never more than a bus hop away from the data. A cost: a Fly App with an attached Volume is anchored to a particular worker physical.</p>\n<div class=\"right-sidenote\"><p><code>bird</code>: a BGP4 route server.</p>\n</div>\n<p>Before offering attached storage, our on-call runbook was almost as simple as “de-bird that edge server”, “tell <a href='https://fly.io/blog/carving-the-scheduler-out-of-our-orchestrator/' title=''>Nomad</a> to drain that worker”, and “go back to sleep”. NVMe cost us that drain operation, which terribly complicated the lives of our infra team. We’ve spent the last year getting “drain” back. 
It’s one of the biggest engineering lifts we’ve made, and if you didn’t notice, we lifted it cleanly.</p>\n<h3 id='the-goalposts' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#the-goalposts' aria-label='Anchor'></a><span class='plain-code'>The Goalposts</span></h3>\n<p>With stateless apps, draining a worker is easy. For each app instance running on the victim server, start a new instance elsewhere. Confirm it’s healthy, then kill the old one. Rinse, repeat. At our 2020 scale, we could drain a fully loaded worker in just a handful of minutes.</p>\n\n<p>You can see why this process won’t work for apps with attached volumes. Sure, create a new volume elsewhere on the fleet, and boot up a new Fly Machine attached to it. But the new volume is empty. The data’s still stuck on the original worker. We asked, and customers were not OK with this kind of migration.</p>\n\n<p>Of course, we back Volume snapshots up (at an interval) to off-network storage. But for “drain”, restoring backups isn’t nearly good enough. No matter the backup interval, a “restore from backup” migration will lose data, and a “backup and restore” migration incurs untenable downtime.</p>\n\n<p>The next thought you have is, “OK, copy the volume over”. And, yes, of course you have to do that. But you can’t just <code>copy</code>, <code>boot</code>, and then <code>kill</code> the old Fly Machine. Because the original Fly Machine is still alive and writing, you have to <code>kill</code> first, then <code>copy</code>, then <code>boot</code>.</p>\n\n<p>Fly Volumes can get pretty big. Even to a rack buddy physical server, you’ll hit a point where draining incurs minutes of interruption, especially if you’re moving lots of volumes simultaneously. 
<code>Kill</code>, <code>copy</code>, <code>boot</code> is too slow.</p>\n<div class=\"callout\"><p>There’s a world where even 15 minutes of interruption is tolerable. It’s the world where you run more than one instance of your application to begin with, so prolonged interruption of a single Fly Machine isn’t visible to your users. Do this! But we have to live in the same world as our customers, many of whom don’t run in high-availability configurations.</p>\n</div><h3 id='behold-the-clone-o-mat' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#behold-the-clone-o-mat' aria-label='Anchor'></a><span class='plain-code'>Behold The Clone-O-Mat</span></h3>\n<p><code>Copy</code>, <code>boot</code>, <code>kill</code> loses data. <code>Kill</code>, <code>copy</code>, <code>boot</code> takes too long. What we needed is a new operation: <code>clone</code>.</p>\n\n<p><code>Clone</code> is a lazier, asynchronous <code>copy</code>. It creates a new volume elsewhere on our fleet, just like <code>copy</code> would. But instead of blocking, waiting to transfer every byte from the original volume, <code>clone</code> returns immediately, with a transfer running in the background.</p>\n\n<p>A new Fly Machine can be booted with that cloned volume attached. Its blocks are mostly empty. But that’s OK: when the new Fly Machine tries to read from it, the block storage system works out whether the block has been transferred. If it hasn’t, it’s fetched over the network from the original volume; this is called “hydration”. Writes are even easier, and don’t hit the network at all.</p>\n\n<p><code>Kill</code>, <code>copy</code>, <code>boot</code> is slow. 
But <code>kill</code>, <code>clone</code>, <code>boot</code> is fast; it can be made asymptotically as fast as stateless migration.</p>\n\n<p>There are three big moving pieces to this design.</p>\n\n<ol>\n<li>First, we have to rig up our OS storage system to make this <code>clone</code> operation work.\n</li><li>Then, to read blocks over the network, we need a network protocol. (Spoiler: iSCSI, though we tried other stuff first.)\n</li><li>Finally, the gnarliest of those pieces is our orchestration logic: what’s running where, what state is it in, and whether it’s plugged in correctly.\n</li></ol>\n<h3 id='block-level-clone' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#block-level-clone' aria-label='Anchor'></a><span class='plain-code'>Block-Level Clone</span></h3>\n<p>The Linux feature we need to make this work already exists; <a href='https://docs.kernel.org/admin-guide/device-mapper/dm-clone.html' title=''>it’s called <code>dm-clone</code></a>. Given an existing, readable storage device, <code>dm-clone</code> gives us a new device, of identical size, where reads of uninitialized blocks will pull from the original. It sounds terribly complicated, but it’s actually one of the simpler kernel lego bricks. Let’s demystify it.</p>\n\n<p>As far as Unix is concerned, random-access storage devices, be they spinning rust or NVMe drives, are all instances of the common class “block device”. 
A block device is addressed in fixed-size (say, 4KiB) chunks, and <a href='https://elixir.bootlin.com/linux/v5.11.11/source/include/linux/blk_types.h#L356' title=''>handles (roughly) these operations</a>:</p>\n<div class=\"highlight-wrapper group relative cpp\">\n <button \n type=\"button\"\n class=\"bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n data-wrap-target=\"#code-aokru06k\"\n >\n <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\" stroke-linecap=\"round\" stroke-linejoin=\"round\"><g buffered-rendering=\"static\"><path d=\"M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959\" /><path d=\"M11.081 6.466L9.533 8.037l1.548 1.571\" /></g></svg>\n <span class=\"bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950\">\n Wrap text\n </span>\n </button>\n <button \n type=\"button\"\n class=\"bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n data-copy-target=\"sibling\"\n >\n <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\"><g buffered-rendering=\"static\"><path d=\"M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z\" /><path d=\"M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 
0-1.815.819-1.816 1.815v1.617\" /></g></svg>\n <span class=\"bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950\">\n Copy to clipboard\n </span>\n </button>\n <div class='highlight relative group'>\n <pre class='highlight '><code id=\"code-aokru06k\"><span class=\"k\">enum</span> <span class=\"n\">req_opf</span> <span class=\"p\">{</span>\n <span class=\"cm\">/* read sectors from the device */</span>\n <span class=\"n\">REQ_OP_READ</span> <span class=\"o\">=</span> <span class=\"mi\">0</span><span class=\"p\">,</span>\n <span class=\"cm\">/* write sectors to the device */</span>\n <span class=\"n\">REQ_OP_WRITE</span> <span class=\"o\">=</span> <span class=\"mi\">1</span><span class=\"p\">,</span>\n <span class=\"cm\">/* flush the volatile write cache */</span>\n <span class=\"n\">REQ_OP_FLUSH</span> <span class=\"o\">=</span> <span class=\"mi\">2</span><span class=\"p\">,</span>\n <span class=\"cm\">/* discard sectors */</span>\n <span class=\"n\">REQ_OP_DISCARD</span> <span class=\"o\">=</span> <span class=\"mi\">3</span><span class=\"p\">,</span>\n <span class=\"cm\">/* securely erase sectors */</span>\n <span class=\"n\">REQ_OP_SECURE_ERASE</span> <span class=\"o\">=</span> <span class=\"mi\">5</span><span class=\"p\">,</span>\n <span class=\"cm\">/* write the same sector many times */</span>\n <span class=\"n\">REQ_OP_WRITE_SAME</span> <span class=\"o\">=</span> <span class=\"mi\">7</span><span class=\"p\">,</span>\n <span class=\"cm\">/* write the zero filled sector many times */</span>\n <span class=\"n\">REQ_OP_WRITE_ZEROES</span> <span class=\"o\">=</span> <span class=\"mi\">9</span><span class=\"p\">,</span>\n <span class=\"cm\">/* ... */</span>\n<span class=\"p\">};</span>\n</code></pre>\n </div>\n</div>\n<p>You can imagine designing a simple network protocol that supported all these options. 
It might have messages that looked something like:</p>\n\n<p><img alt=\"A packet diagram, just skip down to &quot;struct bio&quot; below\" src=\"/blog/machine-migrations/assets/packet.png?2/3&amp;center\" />\nGood news! The Linux block system is organized as if your computer was a network running a protocol that basically looks just like that. Here’s the message structure:</p>\n<div class=\"right-sidenote\"><p>I’ve <a href=\"https://elixir.bootlin.com/linux/v5.11.11/source/include/linux/blk_types.h#L223\" title=\"\">stripped a bunch of stuff out of here</a> but you don’t need any of it to understand what’s coming next.</p>\n</div><div class=\"highlight-wrapper group relative cpp\">\n <button \n type=\"button\"\n class=\"bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n data-wrap-target=\"#code-6neynwnf\"\n >\n <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\" stroke-linecap=\"round\" stroke-linejoin=\"round\"><g buffered-rendering=\"static\"><path d=\"M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959\" /><path d=\"M11.081 6.466L9.533 8.037l1.548 1.571\" /></g></svg>\n <span class=\"bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950\">\n Wrap text\n </span>\n </button>\n <button \n type=\"button\"\n class=\"bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n data-copy-target=\"sibling\"\n >\n <svg class=\"w-4 
h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\"><g buffered-rendering=\"static\"><path d=\"M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z\" /><path d=\"M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617\" /></g></svg>\n <span class=\"bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950\">\n Copy to clipboard\n </span>\n </button>\n <div class='highlight relative group'>\n <pre class='highlight '><code id=\"code-6neynwnf\"><span class=\"cm\">/*\n * main unit of I/O for the block layer and lower layers (ie drivers and\n * stacking drivers)\n */</span>\n<span class=\"k\">struct</span> <span class=\"nc\">bio</span> <span class=\"p\">{</span>\n <span class=\"k\">struct</span> <span class=\"nc\">gendisk</span> <span class=\"o\">*</span><span class=\"n\">bi_disk</span><span class=\"p\">;</span>\n <span class=\"kt\">unsigned</span> <span class=\"kt\">int</span> <span class=\"n\">bi_opf</span><span class=\"p\">;</span>\n <span class=\"kt\">unsigned</span> <span class=\"kt\">short</span> <span class=\"n\">bi_flags</span><span class=\"p\">;</span> \n <span class=\"kt\">unsigned</span> <span class=\"kt\">short</span> <span class=\"n\">bi_ioprio</span><span class=\"p\">;</span>\n <span class=\"n\">blk_status_t</span> <span class=\"n\">bi_status</span><span class=\"p\">;</span>\n <span class=\"kt\">unsigned</span> <span class=\"kt\">short</span> <span class=\"n\">bi_vcnt</span><span class=\"p\">;</span> <span class=\"cm\">/* how many bio_vec's */</span>\n <span class=\"k\">struct</span> <span class=\"nc\">bio_vec</span> <span class=\"n\">bi_inline_vecs</span><span class=\"p\">[]</span> <span class=\"cm\">/* (page, len, offset) tuples */</span><span class=\"p\">;</span>\n<span class=\"p\">};</span>\n</code></pre>\n </div>\n</div>\n<p>No 
nerd has ever looked at a fixed-format message like this without thinking about writing a proxy for it, and <code>struct bio</code> is no exception. The proxy system in the Linux kernel for <code>struct bio</code> is called <code>device mapper</code>, or DM.</p>\n\n<p>DM target devices can plug into other DM devices. For that matter, they can do whatever the hell else they want, as long as they honor the interface. It boils down to a <code>map(bio)</code> function, which can dispatch a <code>struct bio</code>, or drop it, or muck with it and ask the kernel to resubmit it.</p>\n\n<p>You can do a whole lot of stuff with this interface: carve a big device into a bunch of smaller ones (<a href='https://docs.kernel.org/admin-guide/device-mapper/linear.html' title=''><code>dm-linear</code></a>), make one big striped device out of a bunch of smaller ones (<a href='https://docs.kernel.org/admin-guide/device-mapper/striped.html' title=''><code>dm-stripe</code></a>), do software RAID mirroring (<code>dm-raid1</code>), create snapshots of arbitrary existing devices (<a href='https://docs.kernel.org/admin-guide/device-mapper/snapshot.html' title=''><code>dm-snap</code></a>), cryptographically verify boot devices (<a href='https://docs.kernel.org/admin-guide/device-mapper/verity.html' title=''><code>dm-verity</code></a>), and a bunch more. 
Device Mapper is the kernel backend for the <a href='https://sourceware.org/lvm2/' title=''>userland LVM2 system</a>, which is how we do <a href='https://fly.io/blog/persistent-storage-and-fast-remote-builds/' title=''>thin pools and snapshot backups</a>.</p>\n\n<p>Which brings us to <code>dm-clone</code> : it’s a map function that boils down to:</p>\n<div class=\"highlight-wrapper group relative cpp\">\n <button \n type=\"button\"\n class=\"bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n data-wrap-target=\"#code-rj5y343v\"\n >\n <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\" stroke-linecap=\"round\" stroke-linejoin=\"round\"><g buffered-rendering=\"static\"><path d=\"M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959\" /><path d=\"M11.081 6.466L9.533 8.037l1.548 1.571\" /></g></svg>\n <span class=\"bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950\">\n Wrap text\n </span>\n </button>\n <button \n type=\"button\"\n class=\"bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n data-copy-target=\"sibling\"\n >\n <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\"><g buffered-rendering=\"static\"><path d=\"M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 
1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z\" /><path d=\"M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617\" /></g></svg>\n <span class=\"bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950\">\n Copy to clipboard\n </span>\n </button>\n <div class='highlight relative group'>\n <pre class='highlight '><code id=\"code-rj5y343v\"> <span class=\"cm\">/* ... */</span> \n <span class=\"n\">region_nr</span> <span class=\"o\">=</span> <span class=\"n\">bio_to_region</span><span class=\"p\">(</span><span class=\"n\">clone</span><span class=\"p\">,</span> <span class=\"n\">bio</span><span class=\"p\">);</span>\n\n <span class=\"c1\">// we have the data</span>\n <span class=\"k\">if</span> <span class=\"p\">(</span><span class=\"n\">dm_clone_is_region_hydrated</span><span class=\"p\">(</span><span class=\"n\">clone</span><span class=\"o\">-></span><span class=\"n\">cmd</span><span class=\"p\">,</span> <span class=\"n\">region_nr</span><span class=\"p\">))</span> <span class=\"p\">{</span>\n <span class=\"n\">remap_and_issue</span><span class=\"p\">(</span><span class=\"n\">clone</span><span class=\"p\">,</span> <span class=\"n\">bio</span><span class=\"p\">);</span>\n <span class=\"k\">return</span> <span class=\"mi\">0</span><span class=\"p\">;</span>\n\n <span class=\"c1\">// we don't and it's a read</span>\n <span class=\"p\">}</span> <span class=\"k\">else</span> <span class=\"k\">if</span> <span class=\"p\">(</span><span class=\"n\">bio_data_dir</span><span class=\"p\">(</span><span class=\"n\">bio</span><span class=\"p\">)</span> <span class=\"o\">==</span> <span class=\"n\">READ</span><span class=\"p\">)</span> <span class=\"p\">{</span>\n <span class=\"n\">remap_to_source</span><span class=\"p\">(</span><span class=\"n\">clone</span><span class=\"p\">,</span> <span class=\"n\">bio</span><span class=\"p\">);</span>\n <span class=\"k\">return</span> <span 
class=\"mi\">1</span><span class=\"p\">;</span>\n <span class=\"p\">}</span>\n\n <span class=\"c1\">// we don't and it's a write</span>\n <span class=\"n\">remap_to_dest</span><span class=\"p\">(</span><span class=\"n\">clone</span><span class=\"p\">,</span> <span class=\"n\">bio</span><span class=\"p\">);</span>\n <span class=\"n\">hydrate_bio_region</span><span class=\"p\">(</span><span class=\"n\">clone</span><span class=\"p\">,</span> <span class=\"n\">bio</span><span class=\"p\">);</span>\n <span class=\"k\">return</span> <span class=\"mi\">0</span><span class=\"p\">;</span>\n <span class=\"cm\">/* ... */</span> \n</code></pre>\n </div>\n</div><div class=\"right-sidenote\"><p>a <a href=\"https://docs.kernel.org/admin-guide/device-mapper/kcopyd.html\" title=\"\"><code>kcopyd</code></a> thread runs in the background, rehydrating the device in addition to (and independent of) read accesses.</p>\n</div>\n<p><code>dm-clone</code> takes, in addition to the source device to clone from, a “metadata” device on which is stored a bitmap of the status of all the blocks: either “rehydrated” from the source, or not. That’s how it knows whether to fetch a block from the original device or the clone.</p>\n<h3 id='network-clone' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#network-clone' aria-label='Anchor'></a><span class='plain-code'>Network Clone</span></h3><div class=\"callout\"><p><strong class=\"font-semibold text-navy-950\"><code>flyd</code> in a nutshell:</strong> worker physicals run a service, <code>flyd</code>, which manages a couple of databases that are the source of truth for all the Fly Machines running there. 
Conceptually, <code>flyd</code> is a server for on-demand instances of durable finite state machines, each representing some operation on a Fly Machine (creation, start, stop, &c), with the transition steps recorded carefully in a BoltDB database. An FSM step might be something like “assign a local IPv6 address to this interface”, or “check out a block device with the contents of this container”, and it’s straightforward to add and manage new ones.</p>\n</div>\n<p>Say we’ve got <code>flyd</code> managing a Fly Machine with a volume on <code>worker-xx-cdg1-1</code>. We want it running on <code>worker-xx-cdg1-2</code>. Our whole fleet is meshed with WireGuard; everything can talk directly to everything else. So, conceptually:</p>\n\n<ol>\n<li><code>flyd</code> on <code>cdg1-1</code> stops the Fly Machine, and\n</li><li>sends a message to <code>flyd</code> on <code>cdg1-2</code> telling it to clone the source volume.\n</li><li><code>flyd</code> on <code>cdg1-2</code> starts a <code>dm-clone</code> instance, which creates a clone volume on <code>cdg1-2</code>, populating it, over some kind of network block protocol, from <code>cdg1-1</code>, and\n</li><li>boots a new Fly Machine, attached to the clone volume.\n</li><li><code>flyd</code> on <code>cdg1-2</code> monitors the clone operation, and, when the clone completes, converts the clone device to a simple linear device and cleans up.\n</li></ol>\n\n<p>For step (3) to work, the “original volume” on <code>cdg1-1</code> has to be visible on <code>cdg1-2</code>, which means we need to mount it over the network.</p>\n<div class=\"right-sidenote\"><p><code>nbd</code> is so simple that it’s used as a sort of <code>dm-user</code> userland block device; to prototype a new block device, <a href=\"https://lwn.net/ml/linux-kernel/[email protected]/\" title=\"\">don’t bother writing a kernel module</a>, just write an <code>nbd</code> server.</p>\n</div>\n<p>Take your pick of protocols. 
iSCSI is the obvious one, but it’s relatively complicated, and Linux has native support for a much simpler one: <code>nbd</code>, the “network block device”. You could implement an <code>nbd</code> server in an afternoon, on top of a file or a SQLite database or S3, and the Linux kernel could mount it as a drive.</p>\n\n<p>We started out using <code>nbd</code>. But we kept getting stuck <code>nbd</code> kernel threads when there was any kind of network disruption. We’re a global public cloud; network disruption happens. Honestly, we could have debugged our way through this. But it was simpler just to spike out an iSCSI implementation, observe that it didn’t get jammed up when the network hiccuped, and move on.</p>\n<h3 id='putting-the-pieces-together' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#putting-the-pieces-together' aria-label='Anchor'></a><span class='plain-code'>Putting The Pieces Together</span></h3>\n<p>To drain a worker with minimal downtime and no lost data, we turn workers into temporary SANs, serving the volumes we need to drain to fresh-booted replica Fly Machines on a bunch of “target” physicals. 
Those SANs — combinations of <code>dm-clone</code>, iSCSI, and <a href='https://fly.io/blog/carving-the-scheduler-out-of-our-orchestrator/' title=''>our <code>flyd</code> orchestrator</a> — track the blocks copied from the origin, copying each one exactly once and cleaning up when the original volume has been fully copied.</p>\n\n<p>Problem solved!</p>\n<h3 id='no-there-were-more-problems' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#no-there-were-more-problems' aria-label='Anchor'></a><span class='plain-code'>No, There Were More Problems</span></h3>\n<p>When your problem domain is hard, anything you build whose design you can’t fit completely in your head is going to be a fiasco. Shorter form: “if you see Raft consensus in a design, we’ve done something wrong”.</p>\n\n<p>A virtue of this migration system is that, for as many moving pieces as it has, it fits in your head. What complexity it has is mostly shouldered by strategic bets we’ve already built teams around, most notably the <code>flyd</code> orchestrator. So we’ve been running this system for the better part of a year without much drama. Not no drama, though. Some drama.</p>\n\n<p>Example: we encrypt volumes. Our key management is fussy. We do per-volume encryption keys that provision alongside the volumes themselves, so no one worker has a volume skeleton key.</p>\n\n<p>If you think “migrating those volume keys from worker to worker” is the problem I’m building up to, well, that too, but the bigger problem is <code>trim</code>.</p>\n\n<p>Most people use just a small fraction of the volumes they allocate. A 100GiB volume with just 5MiB used wouldn’t be at all weird. 
You don’t want to spend minutes copying a volume that could have been fully hydrated in seconds.</p>\n\n<p>And indeed, <code>dm-clone</code> doesn’t want to do that either. Given a source block device (for us, an iSCSI mount) and the clone device, a <code>DISCARD</code> issued on the clone device will get picked up by <code>dm-clone</code>, which will simply <a href='https://elixir.bootlin.com/linux/v5.11.11/source/drivers/md/dm-clone-target.c#L1357' title=''>short-circuit the read</a> of the relevant blocks by marking them as hydrated in the metadata volume. Simple enough.</p>\n\n<p>To make that work, we need the target worker to see the plaintext of the source volume (so that it can do an <code>fstrim</code> — don’t get us started on how annoying it is to sandbox this — to read the filesystem, identify the unused blocks, and issue the <code>DISCARDs</code> where <code>dm-clone</code> can see them). Easy enough.</p>\n<div class=\"right-sidenote\"><p>these curses have a lot to do with how hard it was to drain workers!</p>\n</div>\n<p>Except: two different workers, for cursed reasons, might be running different versions of <a href='https://gitlab.com/cryptsetup/cryptsetup' title=''>cryptsetup</a>, the userland bridge between LUKS2 and the <a href='https://docs.kernel.org/admin-guide/device-mapper/dm-crypt.html' title=''>kernel dm-crypt driver</a>. There are (or were) two different versions of cryptsetup on our network, and they default to different <a href='https://fossies.org/linux/cryptsetup/docs/on-disk-format-luks2.pdf' title=''>LUKS2 header sizes</a> — 4MiB and 16MiB. Implying two different plaintext volume sizes. </p>\n\n<p>So now part of the migration FSM is an RPC call that carries metadata about the designed LUKS2 configuration for the target VM. 
Not something we expected to have to build, but, whatever.</p>\n<div class=\"right-sidenote\"><p>Corrosion deserves its own post.</p>\n</div>\n<p>Gnarlier example: workers are the source of truth for information about the Fly Machines running on them. Migration knocks the legs out from under that constraint, which we were relying on in Corrosion, the SWIM-gossip SQLite database we use to connect Fly Machines to our request routing. Race conditions. Debugging. Design changes. Next!</p>\n\n<p>Gnarliest example: our private networks. Recall: we automatically place every Fly Machine into <a href='https://fly.io/blog/incoming-6pn-private-networks/' title=''>a private network</a>; by default, it’s the one all the other apps in your organization run in. This is super handy for setting up background services, databases, and clustered applications. 20 lines of eBPF code in our worker kernels keeps anybody from “crossing the streams”, sending packets from one private network to another.</p>\n<div class=\"right-sidenote\"><p>we’re members of an elite cadre of idiots who have managed to find designs that made us wish IPv6 addresses were even bigger.</p>\n</div>\n<p>We call this scheme 6PN (for “IPv6 Private Network”). It functions by <a href='https://fly.io/blog/ipv6-wireguard-peering#ipv6-private-networking-at-fly' title=''>embedding routing information directly into IPv6 addresses</a>. This is, perhaps, gross. But it allows us to route diverse private networks with constantly changing membership across a global fleet of servers without running a distributed routing protocol. As the beardy wizards who kept the Internet backbone up and running on Cisco AGS+’s once said: the best routing protocol is “static”.</p>\n\n<p>Problem: the embedded routing information in a 6PN address refers in part to specific worker servers.</p>\n\n<p>That’s fine, right? They’re IPv6 addresses. Nobody uses literal IPv6 addresses. Nobody uses IP addresses at all; they use the DNS. 
When you migrate a host, just give it a new 6PN address, and update the DNS.</p>\n\n<p>Friends, somebody did use literal IPv6 addresses. It was us. In the configurations for Fly Postgres clusters.</p>\n<div class=\"right-sidenote\"><p>It’s also not operationally easy for us to shell into random Fly Machines, for good reason.</p>\n</div>\n<p>The obvious fix for this is not complicated; given <code>flyctl</code> ssh access to a Fly Postgres cluster, it’s like a 30 second ninja edit. But we run a <em>lot</em> of Fly Postgres clusters, and the change has to be coordinated carefully to avoid getting the cluster into a confused state. We went as far as adding a feature to our <code>init</code> to do network address mappings to keep old 6PN addresses reachable before biting the bullet and burning several weeks doing the direct configuration fix fleet-wide.</p>\n<figure class=\"post-cta\">\n <figcaption>\n <h1>Speedrun your app onto Fly.io.</h1>\n <p>3…2…1…</p>\n <a class=\"btn btn-lg\" href=\"https://fly.io/speedrun\">\n Go! <span class='opacity-50'>→</span>\n </a>\n </figcaption>\n <div class=\"image-container\">\n <img src=\"/static/images/cta-kitty.webp\" srcset=\"/static/images/[email protected] 2x\" alt=\"\">\n </div>\n</figure>\n\n<h3 id='the-learning-it-burns' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#the-learning-it-burns' aria-label='Anchor'></a><span class='plain-code'>The Learning, It Burns!</span></h3>\n<p>We get asked a lot why we don’t do storage the “obvious” way, with an <a href='https://aws.amazon.com/ebs/' title=''>EBS-type</a> SAN fabric, abstracting it away from our compute. Locally-attached NVMe storage is an idiosyncratic choice, one we’ve had to write disclaimers for (single-node clusters can lose data!) 
since we first launched it.</p>\n\n<p>One answer is: we’re a startup. Building SAN infrastructure in every region we operate in would be tremendously expensive. Look at any feature in AWS that normal people know the name of, like EC2, EBS, RDS, or S3 — there’s a whole company in there. We launched storage when we were just 10 people, and even at our current size we probably have nothing resembling the resources EBS gets. AWS is pretty great!</p>\n\n<p>But another thing to keep in mind is: we’re learning as we go. And so even if we had the means to do an EBS-style SAN, we might not build it today.</p>\n\n<p>Instead, we’re a lot more interested in log-structured virtual disks (LSVD). LSVD uses NVMe as a local cache, but durably persists writes in object storage. You get most of the performance benefit of bus-hop disk writes, along with unbounded storage and S3-grade reliability.</p>\n\n<p><a href='https://community.fly.io/t/bottomless-s3-backed-volumes/15648' title=''>We launched LSVD experimentally last year</a>; in the intervening year, something happened to make LSVD even more interesting to us: <a href='https://www.tigrisdata.com/' title=''>Tigris Data</a> launched S3-compatible object storage in every one of our regions, so instead of backhauling updates to Northern Virginia, <a href='https://community.fly.io/t/tigris-backed-volumes/20792' title=''>we can keep them local</a>. We have more to say about LSVD, and a lot more to say about Tigris.</p>\n\n<p>Our first several months of migrations were done gingerly. By summer of 2024, we got to where our infra team can pull “drain this host” out of their toolbelt without much ceremony.</p>\n\n<p>We’re still not to the point where we’re migrating casually. Your Fly Machines are probably not getting migrated! There’d need to be a reason! 
But the dream is fully-automated luxury space migration, in which you might get migrated semiregularly, as our systems work not just to drain problematic hosts but to rebalance workloads regularly. No time soon. But we’ll get there.</p>\n\n<p>This is the biggest thing our team has done since we replaced Nomad with flyd. Only the new billing system comes close. We did this thing not because it was easy, but because we thought it would be easy. It was not. But: worth it!</p>", "image": { "url": "https://fly.io/blog/machine-migrations/assets/migrations-thumb.webp", "title": null }, "media": [], "authors": [ { "name": "Fly", "email": null, "url": null } ], "categories": [] }, { "id": "https://fly.io/blog/oidc-cloud-roles/", "title": "AWS without Access Keys", "description": null, "url": "https://fly.io/blog/oidc-cloud-roles/", "published": "2024-06-19T00:00:00.000Z", "updated": "2024-06-27T14:03:59.000Z", "content": "<div class=\"lead\"><p>It’s dangerous to go alone. Fly.io runs full-stack apps by transmuting Docker containers into Fly Machines: ultra-lightweight hardware-backed VMs. You can run all your dependencies on Fly.io, but sometimes, you’ll need to work with other clouds, and we’ve made that pretty simple. Try Fly.io out for yourself; your Rails or Node app <a href=\"https://fly.io/speedrun\" title=\"\">can be up and running in just minutes</a>.</p>\n</div>\n<p>Let’s hypopulate you an app serving generative AI cat images based on the weather forecast, running on a <code>g4dn.xlarge</code> ECS task in AWS <code>us-east-1</code>. It’s going great; people didn’t realize how dependent their cat pic prefs are on barometric pressure, and you’re all anyone can talk about.</p>\n\n<p>Word reaches Australia and Europe, but you’re not catching on, because the… latency is too high? Just roll with us here. 
Anyways: fixing this is going to require replicating ECS tasks and ECR images into <code>ap-southeast-2</code> and <code>eu-central-1</code> while also setting up load balancing. Nah.</p>\n\n<p>This is the O.G. Fly.io deployment story; one deployed app, one versioned container, one command to get it running anywhere in the world.</p>\n\n<p>But you have a problem: your app relies on training data, it’s huge, your giant employer manages it, and it’s in S3. Getting this to work will require AWS credentials.</p>\n\n<p>You could ask your security team to create a user, give it permissions, and hand over the AWS keypair. Then you could wash your neck and wait for the blade. Passing around AWS keypairs is the beginning of every horror story told about cloud security, and security team ain’t having it.</p>\n\n<p>There’s a better way. It’s drastically more secure, so your security people will at least hear you out. It’s also so much easier on Fly.io that you might never bother creating an IAM service account again.</p>\n<h2 id='lets-get-it-out-of-the-way' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#lets-get-it-out-of-the-way' aria-label='Anchor'></a><span class='plain-code'>Let’s Get It out of the Way</span></h2>\n<p>We’re going to use OIDC to set up strictly limited trust between AWS and Fly.io.</p>\n\n<ol>\n<li>In AWS: we’ll add Fly.io as an <code>Identity Provider</code> in AWS IAM, giving us an ID we can plug into any IAM <code>Role</code>.\n</li><li>Also in AWS: we’ll create a <code>Role</code>, give it access to the S3 bucket with our tokenized cat data, and then attach the <code>Identity Provider</code> to it.\n</li><li>In Fly.io, we’ll take the <code>Role</code> ARN we got from step 2 and set it as an environment variable in our app.\n</li></ol>\n\n<p>Our machines 
will now magically have access to the S3 bucket.</p>\n<h2 id='what-the-what' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#what-the-what' aria-label='Anchor'></a><span class='plain-code'>What the What</span></h2>\n<p>A reasonable question to ask here is, “where’s the credential”? Ordinarily, to give a Fly Machine access to an AWS resource, you’d use <code>fly secrets set</code> to add an <code>AWS_ACCESS_KEY_ID</code> and <code>AWS_SECRET_ACCESS_KEY</code> to the environment in the Machine. Here, we’re not setting any secrets at all; we’re just adding an ARN — which is not a credential — to the Machine.</p>\n\n<p>Here’s what’s happening.</p>\n\n<p>Fly.io operates an OIDC IdP at <code>oidc.fly.io</code>. It issues OIDC tokens, exclusively to Fly Machines. AWS can be configured to trust these tokens, on a role-by-role basis. That’s the “secret credential”: the pre-configured trust relationship in IAM, and the public keypairs it manages. You, the user, never need to deal with these keys directly; it all happens behind the scenes, between AWS and Fly.io.</p>\n\n<p><img alt=\"A diagram: STS trusts OIDC.fly.io. OIDC.fly.io trusts flyd. flyd issues a token to the Machine, which proffers it to STS. STS sends an STS cred to the Machine, which then uses it to retrieve model weights from S3.\" src=\"/blog/oidc-cloud-roles/assets/oidc-diagram.webp\" /></p>\n\n<p>The key actor in this picture is <code>STS</code>, the AWS <code>Security Token Service</code>. <code>STS</code>’s main job is to vend short-lived AWS credentials, usually through some variant of an API called <code>AssumeRole</code>. 
Specifically, in our case: <code>AssumeRoleWithWebIdentity</code> tells <code>STS</code> to cough up an AWS keypair given an OIDC token (that matches a pre-configured trust relationship).</p>\n\n<p>That still leaves the question: how does your code, which is reaching out to the AWS APIs to get cat weights, drive any of this?</p>\n<h2 id='the-init-thickens' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#the-init-thickens' aria-label='Anchor'></a><span class='plain-code'>The Init Thickens</span></h2>\n<p>Every Fly Machine boots up into an <code>init</code> we wrote in Rust. It has slowly been gathering features.</p>\n\n<p>One of those features, which has been around for awhile, is a server for a Unix socket at <code>/.fly/api</code>, which exports a subset of the Fly Machines API to privileged processes in the Machine. Think of it as our answer to the EC2 Instance Metadata Service. 
How it works is, every time we boot a Fly Machine, we pass it a <a href='https://fly.io/blog/macaroons-escalated-quickly/' title=''>Macaroon token</a> locked to that particular Machine; <code>init</code>’s server for <code>/.fly/api</code> is a proxy that attaches that token to requests.</p>\n<div class=\"right-sidenote\"><p>In addition to the API proxy being tricky to SSRF to.</p>\n</div>\n<p>What’s neat about this is that the credential that drives <code>/.fly/api</code> is doubly protected:</p>\n\n<ol>\n<li>The Fly.io platform won’t honor it unless it comes from that specific Fly Machine (<code>flyd</code>, our orchestrator, knows who it’s talking to), <em>and</em>\n</li><li>Ordinary code running in a Fly Machine never gets a copy of the token to begin with.\n</li></ol>\n\n<p>You could rig up a local privilege escalation vulnerability and work out how to steal the Macaroon, but you can’t exfiltrate it productively.</p>\n\n<p>So now you have half the puzzle worked out: OIDC is just part of the <a href='https://fly.io/docs/machines/api/' title=''>Fly Machines API</a> (specifically: <code>/v1/tokens/oidc</code>). 
A Fly Machine can hit a Unix socket and get an OIDC token tailored to that machine:</p>\n<div class=\"highlight-wrapper group relative \">\n <button \n type=\"button\"\n class=\"bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n data-wrap-target=\"#code-9o3904mp\"\n >\n <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\" stroke-linecap=\"round\" stroke-linejoin=\"round\"><g buffered-rendering=\"static\"><path d=\"M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959\" /><path d=\"M11.081 6.466L9.533 8.037l1.548 1.571\" /></g></svg>\n <span class=\"bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950\">\n Wrap text\n </span>\n </button>\n <button \n type=\"button\"\n class=\"bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n data-copy-target=\"sibling\"\n >\n <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\"><g buffered-rendering=\"static\"><path d=\"M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z\" /><path d=\"M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617\" /></g></svg>\n <span class=\"bubble-sm bubble-tl [--offset-l:-6px] tail 
[--tail-x:calc(100%-30px)] text-navy-950\">\n Copy to clipboard\n </span>\n </button>\n <div class='highlight relative group'>\n <pre class='highlight '><code id=\"code-9o3904mp\">{\n \"app_id\": \"3671581\",\n \"app_name\": \"weather-cat\",\n \"aud\": \"sts.amazonaws.com\",\n \"image\": \"image:latest\",\n \"image_digest\": \"sha256:dff79c6da8dd4e282ecc6c57052f7cfbd684039b652f481ca2e3324a413ee43f\",\n \"iss\": \"https://oidc.fly.io/example\",\n \"machine_id\": \"3d8d377ce9e398\",\n \"machine_name\": \"ancient-snow-4824\",\n \"machine_version\": \"01HZJXGTQ084DX0G0V92QH3XW4\",\n \"org_id\": \"29873298\",\n \"org_name\": \"example\",\n \"region\": \"yyz\",\n \"sub\": \"example:weather-cat:ancient-snow-4824\"\n} // some OIDC stuff trimmed\n</code></pre>\n </div>\n</div>\n<p>Look upon this holy blob, sealed with a published key managed by Fly.io’s OIDC vault, and see that there lies within it enough information for AWS <code>STS</code> to decide to issue a session credential.</p>\n\n<p>We have still not completed the puzzle, because while you can probably now see how you’d drive this process with a bunch of new code that you’d tediously write, you are acutely aware that you have not yet endured that tedium — e pur si muove!</p>\n\n<p>One <code>init</code> feature remains to be disclosed, and it’s cute.</p>\n\n<p>If, when <code>init</code> starts in a Fly Machine, it sees an <code>AWS_ROLE_ARN</code> environment variable set, it initiates a little dance; it:</p>\n\n<ol>\n<li>goes off and generates an OIDC token, the way we just described,\n</li><li>saves that OIDC token in a file, <em>and</em>\n</li><li>sets the <code>AWS_WEB_IDENTITY_TOKEN_FILE</code> and <code>AWS_ROLE_SESSION_NAME</code> environment variables for every process it launches.\n</li></ol>\n\n<p>The AWS SDK, linked to your application, does all the rest.</p>\n\n<p>Let’s review: you add an <code>AWS_ROLE_ARN</code> variable to your Fly App, launch a Machine, and have it go fetch a file from S3. 
What happens next is:</p>\n\n<ol>\n<li><code>init</code> detects <code>AWS_ROLE_ARN</code> is set as an environment variable.\n</li><li><code>init</code> sends a request to <code>/v1/tokens/oidc</code> via <code>/.fly/api</code>.\n</li><li><code>init</code> writes the response to <code>/.fly/oidc_token</code>.\n</li><li><code>init</code> sets <code>AWS_WEB_IDENTITY_TOKEN_FILE</code> and <code>AWS_ROLE_SESSION_NAME</code>.\n</li><li>The entrypoint boots, and (say) runs <code>aws s3api get-object</code>.\n</li><li>The AWS SDK runs through the <a href='https://docs.aws.amazon.com/sdkref/latest/guide/standardized-credentials.html#credentialProviderChain' title=''>credential provider chain</a>.\n</li><li>The SDK sees that <code>AWS_WEB_IDENTITY_TOKEN_FILE</code> is set and calls <code>AssumeRoleWithWebIdentity</code> with the file contents.\n</li><li>AWS verifies the token against <a href='https://oidc.fly.io/' title=''><code>https://oidc.fly.io/</code></a><code>example/.well-known/openid-configuration</code>, which references a key Fly.io manages on isolated hardware.\n</li><li>AWS vends <code>STS</code> credentials for the assumed <code>Role</code>.\n</li><li>The SDK uses the <code>STS</code> credentials to access the S3 bucket.\n</li><li>AWS checks the <code>Role</code>’s IAM policy to see if it has access to the S3 bucket.\n</li><li>AWS returns the contents of the bucket object.\n</li></ol>\n<h2 id='how-much-better-is-this' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#how-much-better-is-this' aria-label='Anchor'></a><span class='plain-code'>How Much Better Is This?</span></h2>\n<p>It is a lot better.</p>\n<div class=\"right-sidenote\"><p>They asymptotically approach the security properties of Macaroon tokens.</p>\n</div>\n<p>Most importantly: AWS <code>STS</code> 
credentials are short-lived. Because they’re generated dynamically, rather than stored in a configuration file or environment variable, they’re already a little bit annoying for an attacker to recover. But they’re also dead in minutes. They have a sharply limited blast radius. They rotate themselves, and fail closed.</p>\n\n<p>They’re also easier to manage. This is a rare instance where you can reasonably drive the entire AWS side of the process from within the web console. Your cloud team adds <code>Roles</code> all the time; this is just a <code>Role</code> with an extra snippet of JSON. The resulting ARN isn’t even a secret; your cloud team could just email or Slack message it back to you.</p>\n\n<p>Finally, they offer finer-grained control.</p>\n\n<p>To understand the last part, let’s look at that extra snippet of JSON (the “Trust Policy”) your cloud team is sticking on the new <code>cat-bucket</code> <code>Role</code>:</p>\n<div class=\"highlight-wrapper group relative \">\n <button \n type=\"button\"\n class=\"bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n data-wrap-target=\"#code-la5jlerc\"\n >\n <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\" stroke-linecap=\"round\" stroke-linejoin=\"round\"><g buffered-rendering=\"static\"><path d=\"M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959\" /><path d=\"M11.081 6.466L9.533 8.037l1.548 1.571\" /></g></svg>\n <span class=\"bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950\">\n Wrap text\n </span>\n </button>\n <button \n type=\"button\"\n class=\"bubble-wrap z-20 absolute right-1.5 top-1.5 
text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n data-copy-target=\"sibling\"\n >\n <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\"><g buffered-rendering=\"static\"><path d=\"M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z\" /><path d=\"M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617\" /></g></svg>\n <span class=\"bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950\">\n Copy to clipboard\n </span>\n </button>\n <div class='highlight relative group'>\n <pre class='highlight '><code id=\"code-la5jlerc\">{\n \"Version\": \"2012-10-17\",\n \"Statement\": [\n {\n \"Effect\": \"Allow\",\n \"Principal\": {\n \"Federated\": \"arn:aws:iam::123456123456:oidc-provider/oidc.fly.io/example\"\n },\n \"Action\": \"sts:AssumeRoleWithWebIdentity\",\n \"Condition\": {\n \"StringEquals\": {\n \"oidc.fly.io/example:aud\": \"sts.amazonaws.com\"\n },\n \"StringLike\": {\n \"oidc.fly.io/example:sub\": \"example:weather-cat:*\"\n }\n }\n }\n ]\n}\n</code></pre>\n </div>\n</div><div class=\"right-sidenote\"><p>The <code>aud</code> check guarantees <code>STS</code> will only honor tokens that Fly.io deliberately vended for <code>STS</code>.</p>\n</div>\n<p>Recall the OIDC token we dumped earlier; much of what’s in it, we can match in the Trust Policy. 
Every OIDC token Fly.io generates is going to have a <code>sub</code> field formatted <code>org:app:machine</code>, so we can lock IAM <code>Roles</code> down to organizations, or to specific Fly Apps, or even specific Fly Machine instances.</p>\n<figure class=\"post-cta\">\n <figcaption>\n <h1>Speedrun your app onto Fly.io.</h1>\n <p>3…2…1…</p>\n <a class=\"btn btn-lg\" href=\"https://fly.io/speedrun\">\n Go! <span class='opacity-50'>→</span>\n </a>\n </figcaption>\n <div class=\"image-container\">\n <img src=\"/static/images/cta-rabbit.webp\" srcset=\"/static/images/[email protected] 2x\" alt=\"\">\n </div>\n</figure>\n\n<h2 id='and-so' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#and-so' aria-label='Anchor'></a><span class='plain-code'>And So</span></h2>\n<p>In case it’s not obvious: this pattern works for any AWS API, not just S3.</p>\n\n<p>Our OIDC support on the platform and in Fly Machines will set arbitrary OIDC <code>audience</code> strings, so you can use it to authenticate to any OIDC-compliant cloud provider. It won’t be as slick on Azure or GCP, because we haven’t done the <code>init</code> features to light their APIs up with a single environment variable — but those features are easy, and we’re just waiting for people to tell us what they need.</p>\n\n<p>For us, the gold standard for least-privilege, conditional access tokens remains Macaroons, and it’s unlikely that we’re going to do a bunch of internal stuff using OIDC. We even snuck Macaroons into this feature. 
But the security you’re getting from this OIDC dance closes a lot of the gap between hardcoded user credentials and Macaroons, and it’s easy to use — easier, in some ways, than it is to manage role-based access inside of a legacy EC2 deployment!</p>", "image": { "url": "https://fly.io/blog/oidc-cloud-roles/assets/spooky-security-skeleton-thumb.webp", "title": null }, "media": [], "authors": [ { "name": "Fly", "email": null, "url": null } ], "categories": [] }, { "id": "https://fly.io/blog/llm-image-description/", "title": "Picture This: Open Source AI for Image Description", "description": null, "url": "https://fly.io/blog/llm-image-description/", "published": "2024-05-09T00:00:00.000Z", "updated": "2024-05-09T17:35:04.000Z", "content": "<div class=\"lead\"><p>I’m Nolan, and I work on Fly Machines here at Fly.io. We’re building a new public cloud—one where you can spin up CPU and GPU workloads, around the world, in a jiffy. <a href=\"https://fly.io/speedrun/\" title=\"\">Try us out</a>; you can be up and running in minutes. This is a post about LLMs being really helpful, and an extensible project you can build with open source on a weekend.</p>\n</div>\n<p>Picture this, if you will.</p>\n\n<p>You’re blind. You’re in an unfamiliar hotel room on a trip to Chicago.</p>\n<div class=\"right-sidenote\"><p>If you live in Chicago IRL, imagine the hotel in Winnipeg, <a href=\"https://www.cbc.ca/history/EPISCONTENTSE1EP10CH3PA5LE.html\" title=\"\">the Chicago of the North</a>.</p>\n</div>\n<p>You’ve absent-mindedly set your coffee down, and can’t remember where. You’re looking for the thermostat so you don’t wake up frozen. Or, just maybe, you’re playing a fun-filled round of “find the damn light switch so your sighted partner can get some sleep already!”</p>\n\n<p>If, like me, you’ve been blind for a while, you have plenty of practice finding things without the luxury of a quick glance around. 
It may be more tedious than you’d like, but you’ll get it done.</p>\n\n<p>But the speed of innovation in machine learning and large language models has been dizzying, and in 2024 you can snap a photo with your phone and have an app like <a href='https://www.bemyeyes.com/blog/announcing-be-my-ai/' title=''>Be My AI</a> or <a href='https://www.seeingai.com/' title=''>Seeing AI</a> tell you where in that picture it found your missing coffee mug, or where it thinks the light switch is.</p>\n<div class=\"right-sidenote\"><p>Creative switch locations seem to be a point of pride for hotels, so the light game may be good for a few rounds of quality entertainment, regardless of how good your AI is.</p>\n</div>\n<p>This is <em>big</em>. It’s hard for me to state just how exciting and empowering AI image descriptions have been for me without sounding like a shill. In the past year, I’ve:</p>\n\n<ul>\n<li>Found shit in strange hotel rooms. \n</li><li>Gotten descriptions of scenes and menus in otherwise inaccessible video games.\n</li><li>Requested summaries of technical diagrams and other materials where details weren’t made available textually. \n</li></ul>\n\n<p>I’ve been consistently blown away at how impressive and helpful AI-created image descriptions have been.</p>\n\n<p>Also…</p>\n<h2 id='which-thousand-words-is-this-picture-worth' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#which-thousand-words-is-this-picture-worth' aria-label='Anchor'></a><span class='plain-code'>Which thousand words is this picture worth?</span></h2>\n<p>As a blind internet user for the last three decades, I have extensive empirical evidence to corroborate what you already know in your heart: humans are pretty flaky about writing useful alt text for all the images they publish. 
This does tend to make large swaths of the internet inaccessible to me!</p>\n\n<p>In just a few years, the state of image description on the internet has gone from complete reliance on the aforementioned lovable, but ultimately tragically flawed, humans, to automated strings of words like <code>Image may contain person, glasses, confusion, banality, disillusionment</code>, to LLM-generated text that reads a lot like it was written by a person, perhaps sipping from a steaming cup of Earl Grey as they reflect on their previous experiences of a background that features a tree with snow on its branches, suggesting that this scene takes place during winter.</p>\n\n<p>If an image is missing alt text, or if you want a second opinion, there are screen-reader addons, like <a href='https://github.com/cartertemm/AI-content-describer/' title=''>this one</a> for <a href='https://www.nvaccess.org/download/' title=''>NVDA</a>, that you can use with an API key to get image descriptions from GPT-4 or Google Gemini as you read. This is awesome! </p>\n\n<p>And this brings me to the nerd snipe. How hard would it be to build an image description service we can host ourselves, using open source technologies? 
It turns out to be spookily easy.</p>\n\n<p>Here’s what I came up with:</p>\n\n<ol>\n<li><a href='https://ollama.com/' title=''>Ollama</a> to run the model\n</li><li>A <a href='https://pocketbase.io' title=''>PocketBase</a> project that provides a simple authenticated API for users to submit images, get descriptions, and ask followup questions about the image\n</li><li>The simplest possible Python client to interact with the PocketBase app on behalf of users\n</li></ol>\n\n<p>The idea is to keep it modular and hackable, so if sentiment analysis or joke creation is your thing, you can swap out image description for that and have something going in, like, a weekend.</p>\n\n<p>If you’re like me, and you go skipping through recipe blogs to find the “go directly to recipe” link, find the code itself <a href='https://github.com/superfly/llm-describer' title=''>here</a>. </p>\n<h2 id='the-llm-is-the-easiest-part' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#the-llm-is-the-easiest-part' aria-label='Anchor'></a><span class='plain-code'>The LLM is the easiest part</span></h2>\n<p>An API to accept images and prompts, run the model, and spit \nout answers sounds like a lot! But it’s the simplest part of this whole thing, because: \nthat’s <a href='https://ollama.com/' title=''>Ollama</a>.</p>\n\n<p>You can just run the Ollama Docker image, get it to grab the model \nyou want to use, and that’s it. There’s your AI server. (We have a <a href='https://fly.io/blog/scaling-llm-ollama/' title=''>blog post</a> \nall about deploying Ollama on Fly.io; Fly GPUs are rad, try'em out, etc.).</p>\n\n<p>For this project, we need a model that can make sense—or at least words—out of a picture. 
\n<a href='https://llava-vl.github.io/' title=''>LLaVA</a> is a trained, Apache-licensed “large multimodal model” that fits the bill. \nGet the model with the Ollama CLI:</p>\n<div class=\"highlight-wrapper group relative cmd\">\n <div class='highlight relative group'>\n <pre class='highlight cmd'><code id=\"code-wohvpptj\">ollama pull llava:34b\n</code></pre>\n </div>\n</div><div class=\"callout\"><p>If you have hardware that can handle it, you could run this on your computer at home. If you run AI models on a cloud provider, be aware that GPU compute is expensive! <strong class=\"font-semibold text-navy-950\">It’s important to take steps to ensure you’re not paying for a massive GPU 24/7.</strong></p>\n\n<p>On Fly.io, at the time of writing, you’d achieve this with the <a href=\"https://fly.io/docs/apps/autostart-stop/\" title=\"\">autostart and autostop</a> functions of the Fly Proxy, restricting Ollama access to internal requests over <a href=\"https://fly.io/docs/networking/private-networking/#flycast-private-fly-proxy-services\" title=\"\">Flycast</a> from the PocketBase app. That way, if there haven’t been any requests for a few minutes, the Fly Proxy stops the Ollama <a href=\"https://fly.io/docs/machines/\" title=\"\">Machine</a>, which releases the CPU, GPU, and RAM allocated to it. <a href=\"https://fly.io/blog/scaling-llm-ollama/\" title=\"\">Here’s a post</a> that goes into more detail. </p>\n</div><h2 id='a-multi-tool-on-the-backend' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#a-multi-tool-on-the-backend' aria-label='Anchor'></a><span class='plain-code'>A multi-tool on the backend</span></h2>\n<p>I want user auth to make sure that not just anyone can grab my “image description service” and keep it busy generating short stories about their cat. 
If I build this out into a service for others to use, I might also want business logic around plans or\ncredits, or mobile-friendly APIs for use in the field. <a href='https://pocketbase.io' title=''>PocketBase</a> provides a scaffolding for all of it. It’s a Swiss army knife: a Firebase-like API on top of SQLite, complete with authentication, authorization, an admin UI, extensibility in JavaScript and Go, and various client-side APIs.</p>\n<div class=\"right-sidenote\"><p>Yes, <em>of course</em> I’ve used an LLM to generate feline fanfic. Theme songs too. Hasn’t everyone? </p>\n</div>\n<p>I “faked” a task-specific API that supports followup questions by extending PocketBase in Go, modeling requests and responses as <a href='https://pocketbase.io/docs/collections/' title=''>collections</a> (i.e. SQLite tables) with <a href='https://pocketbase.io/docs/go-event-hooks/' title=''>event hooks</a> to trigger pre-set interactions with the Ollama app (via <a href='https://tmc.github.io/langchaingo' title=''>LangChainGo</a>) and the client (via the PocketBase API).</p>\n\n<p>If you’re following along, <a href='https://github.com/superfly/llm-describer/blob/main/describer.go' title=''>here’s the module</a>\nthat handles all that, along with initializing the LLM connection.</p>\n\n<p>In a nutshell, this is the dance:</p>\n\n<ul>\n<li>When a user uploads an image, a hook on the <code>images</code> collection sends the image to Ollama, along with this prompt:\n<code>\"You are a helpful assistant describing images for blind screen reader users. 
Please describe this image.\"</code>\n</li><li>Ollama sends back its response, which the backend 1) passes back to the client and 2) stores in its <code>followups</code> collection for future reference.\n</li><li>If the user responds with a followup question about the image and description, that also \ngoes into the <code>followups</code> collection; user-initiated changes to this collection trigger a hook to chain the new \nfollowup question with the image and the chat history into a new request for the model.\n</li><li>Lather, rinse, repeat.\n</li></ul>\n\n<p>This is a super simple hack to handle followup questions, and it’ll let you keep adding followups until \nsomething breaks. You’ll see the quality of responses get poorer, possibly incoherent, as the context \nexceeds the context window.</p>\n\n<p>I also set up <a href='https://pocketbase.io/docs/api-rules-and-filters/' title=''>API rules</a> in PocketBase,\nensuring that users can’t read from or write to others’ chats with the AI.</p>\n\n<p>If image descriptions aren’t your thing, this business logic is easily swappable \nfor joke generation, extracting details from text, or any other simple task you \nmight want to throw at an LLM. Just slot the best model into Ollama (LLaVA is pretty OK as a general starting point too), and match the PocketBase schema and pre-set prompts to your application.</p>\n<h2 id='a-seedling-of-a-client' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#a-seedling-of-a-client' aria-label='Anchor'></a><span class='plain-code'>A seedling of a client</span></h2>\n<p>With the image description service in place, the user can talk to it with any client that speaks the PocketBase API. 
PocketBase already has SDK clients in JavaScript and Dart, but because my screen reader is <a href='https://github.com/nvaccess/nvda' title=''>written in Python</a>, I went with a <a href='https://pypi.org/project/pocketbase/' title=''>community-created Python library</a>. That way I can build this out into an NVDA add-on \nif I want to.</p>\n\n<p>If you’re a fancy Python developer, you probably have your preferred tooling for\nhandling virtualenvs and friends. I’m not, and since my screen reader doesn’t use those\nanyway, I just <code>pip install</code>ed the library so my client can import it:</p>\n<div class=\"highlight-wrapper group relative cmd\">\n <div class='highlight relative group'>\n <pre class='highlight cmd'><code id=\"code-s8xqjyx2\">pip install pocketbase\n</code></pre>\n </div>\n</div>\n<p><a href='https://github.com/superfly/llm-describer/blob/main/__init__.py' title=''>My client</a> is a very simple script. \nIt expects a couple of things: a file called <code>image.jpg</code>, located in the current directory, \nand environment variables to provide the service URL and user credentials to log into it with.</p>\n\n<p>When you run the client script, it uploads the image to the user’s <code>images</code> collection on the \nbackend app, starting the back-and-forth between user and model we saw in the previous section. 
\nThe client prints the model’s output to the CLI and prompts the user to input a followup question, \nwhich it passes up to the <code>followups</code> collection, and so on.</p>\n<figure class=\"post-cta\">\n <figcaption>\n <h1>This can run on Fly.io.</h1>\n <p>Run your LLM on a datacenter-grade GPU.</p>\n <a class=\"btn btn-lg\" href=\"https://fly.io/gpu/\">\n Try out a Fly GPU <span class='opacity-50'>→</span>\n </a>\n </figcaption>\n <div class=\"image-container\">\n <img src=\"/static/images/cta-kitty.webp\" srcset=\"/static/images/[email protected] 2x\" alt=\"\">\n </div>\n</figure>\n\n<h2 id='all-together-now' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#all-together-now' aria-label='Anchor'></a><span class='plain-code'>All together now</span></h2>\n<p>I grabbed <a href='https://unsplash.com/photos/brown-trees-beside-river-under-blue-sky-during-daytime-kSvpTrfhaiU' title=''>this\nimage</a>\nand saved it to a file called <em>image.jpg</em>. </p>\n\n<p>While I knew I was downloading an image of a winter scene, all I see on Unsplash is:</p>\n\n<blockquote>\n<p>brown trees beside river under blue sky during daytime Bright winter landscape\nwith lake, snow, forest, beautiful blue sky and white clouds. 
An example of\ncharming wildlife in Russia.</p>\n</blockquote>\n\n<p>Let’s see what our very own AI describer thinks of this picture:</p>\n<div class=\"highlight-wrapper group relative plain\">\n <button \n type=\"button\"\n class=\"bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n data-wrap-target=\"#code-lvuwb8nb\"\n >\n <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\" stroke-linecap=\"round\" stroke-linejoin=\"round\"><g buffered-rendering=\"static\"><path d=\"M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959\" /><path d=\"M11.081 6.466L9.533 8.037l1.548 1.571\" /></g></svg>\n <span class=\"bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950\">\n Wrap text\n </span>\n </button>\n <button \n type=\"button\"\n class=\"bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n data-copy-target=\"sibling\"\n >\n <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\"><g buffered-rendering=\"static\"><path d=\"M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z\" /><path d=\"M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617\" /></g></svg>\n <span 
class=\"bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950\">\n Copy to clipboard\n </span>\n </button>\n <div class='highlight relative group'>\n <pre class='highlight '><code id=\"code-lvuwb8nb\">$ python __init__.py\nThe image depicts a serene winter scene. There is a small stream of water flowing through the center, with patches of snow on its surface and along the banks. The surrounding area appears to be a mix of bare trees and shrubs, suggesting it's late in the season or possibly early spring when some plants have started to grow again but haven't yet leafed out.\n\nThe sky is clear and blue, indicating good weather conditions at the time the photo was taken. The sunlight seems to be coming from the left side of the image, casting a warm glow on the scene and creating reflections in the water. There are no visible texts or distinguishing marks that provide additional context about the location or the photographer.\n\nThe overall atmosphere is peaceful and natural, with the quiet beauty of the landscape undisturbed except for the gentle movement of the stream.\n</code></pre>\n </div>\n</div>\n<p>Is it a stellar description? 
Maybe not, but it certainly gives me a better sense of connection with the scene.</p>\n\n<p>Let’s see how our describer copes with a followup question.</p>\n<div class=\"highlight-wrapper group relative plain\">\n <div class='highlight relative group'>\n <pre class='highlight '><code id=\"code-dgfkbrw6\">Enter your followup question, or 'quit' to quit: What types of trees are in the image?\nSending followup question\nIt's difficult to determine the exact species of trees in the image without more specific details, as many deciduous trees lose their leaves during winter and look quite similar. However, based on the general appearance of the bare branches and the fact that they are leafless, it is likely that these are common deciduous trees such as oaks, maples, or elms.\n\nThe presence of snow and the absence of foliage suggest that this photo was taken during winter or early spring when the trees have not yet started to bloom again. The exact species would require a closer examination of the bark, buds, and other distinguishing features, which are not clearly visible in this image.\n</code></pre>\n </div>\n</div>\n<p>Boo, the general-purpose LLaVA model couldn’t identify the leafless trees. At least it knows why it can’t. Maybe there’s a better model out \nthere for that. Or we could train one, if we really needed tree identification! We could make every component of \nthis service more sophisticated! </p>\n\n<p>But that I, personally, can make a proof of concept like this with a few days of effort\ncontinues to boggle my mind. Thanks to a handful of amazing open source projects, it’s really, spookily, easy. 
And from here, I (or you) can build out a screen-reader addon, or a mobile app, or a different kind of AI service, with modular changes.</p>\n<h2 id='deployment-notes' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#deployment-notes' aria-label='Anchor'></a><span class='plain-code'>Deployment notes</span></h2>\n<p>On Fly.io, stopping GPU Machines saves you a bunch of money and some carbon footprint, in return for cold-start latency when you make a request for the first time in more than a few minutes. In testing this project, on the <code>a100-40gb</code> Fly Machine preset, the 34b-parameter LLaVA model took several seconds to generate each response. If the Machine was stopped when the request came in, starting it up took another handful of seconds, followed by several tens of seconds to load the model into GPU RAM. The total time from cold start to completed description was about 45 seconds. Just something to keep in mind.</p>\n\n<p>If you’re running Ollama in the cloud, you likely want to put the model onto storage that’s persistent, so you don’t have to download it repeatedly. You could also build the model into a Docker image ahead of deployment.</p>\n\n<p>The PocketBase Golang app compiles to a single executable that you can run wherever.\nI run it on Fly.io, unsurprisingly, and the <a href='https://github.com/superfly/llm-describer/' title=''>repo</a> comes with a Dockerfile and a <a href='https://fly.io/docs/reference/configuration/' title=''><code>fly.toml</code></a> config file, which you can edit to point at your own Ollama instance. It uses a small persistent storage volume for the SQLite database. Under testing, it runs fine on a <code>shared-cpu-1x</code> Machine. 
</p>", "image": { "url": "https://fly.io/blog/llm-image-description/assets/image-description-thumb.webp", "title": null }, "media": [], "authors": [ { "name": "Fly", "email": null, "url": null } ], "categories": [] }, { "id": "https://fly.io/blog/jit-wireguard-peers/", "title": "JIT WireGuard", "description": null, "url": "https://fly.io/blog/jit-wireguard-peers/", "published": "2024-03-12T00:00:00.000Z", "updated": "2024-05-09T17:35:04.000Z", "content": "<div class=\"lead\"><p>We’re Fly.io and we transmute containers into VMs, running them on our hardware around the world with the power of Firecracker alchemy. We do a lot of stuff with WireGuard, which has become a part of our customer API. This is a quick story about some tricks we played to make WireGuard faster and more scalable for the hundreds of thousands of people who now use it here.</p>\n</div>\n<p>One of many odd decisions we’ve made at Fly.io is how we use WireGuard. It’s not just that we use it in many places where other shops would use HTTPS and REST APIs. We’ve gone a step beyond that: every time you run <code>flyctl</code>, our lovable, sprawling CLI, it conjures a TCP/IP stack out of thin air, with its own IPv6 address, and speaks directly to Fly Machines running on our networks.</p>\n\n<p>There are plusses and minuses to this approach, which we talked about <a href='https://fly.io/blog/our-user-mode-wireguard-year/' title=''>in a blog post a couple years back</a>. Some things, like remote-operated Docker builders, get easier to express (a Fly Machine, as far as <code>flyctl</code> is concerned, might as well be on the same LAN). But everything generally gets trickier to keep running reliably.</p>\n\n<p>It was a decision. 
We own it.</p>\n\n<p>Anyways, we’ve made some improvements recently, and I’d like to talk about them.</p>\n<h2 id='where-we-left-off' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#where-we-left-off' aria-label='Anchor'></a><span class='plain-code'>Where we left off</span></h2>\n<p>Until a few weeks ago, our gateways ran on a pretty simple system.</p>\n\n<ol>\n<li>We operate dozens of “gateway” servers around the world, whose sole purpose is to accept incoming WireGuard connections and connect them to the appropriate private networks.\n</li><li>Any time you run <code>flyctl</code> and it needs to talk to a Fly Machine (to build a container, pop an SSH console, copy files, or proxy to a service you’re running), it spawns or connects to a background agent process.\n</li><li>The first time it runs, the agent generates a new WireGuard peer configuration from our GraphQL API. WireGuard peer configurations are very simple: just a public key and an address to connect to.\n</li><li>Our API in turn takes that peer configuration and sends it to the appropriate gateway (say, <code>ord</code>, if you’re near Chicago) via an RPC we send over the NATS messaging system.\n</li><li>On the gateway, a service called <code>wggwd</code> accepts that configuration, saves it to a SQLite database, and adds it to the kernel using WireGuard’s Golang libraries. 
<code>wggwd</code> acknowledges the installation of the peer to the API.\n</li><li>The API replies to your GraphQL request, with the configuration.\n</li><li>Your <code>flyctl</code> connects to the WireGuard peer, which works, because you receiving the configuration means it’s installed on the gateway.\n</li></ol>\n\n<p>I copy-pasted those last two bullet points from <a href='https://fly.io/blog/our-user-mode-wireguard-year/' title=''>that two-year-old post</a>, because when it works, it does <em>just work</em> reasonably well. (We ultimately did end up defaulting everybody to WireGuard-over-WebSockets, though.)</p>\n\n<p>But if it always worked, we wouldn’t be here, would we?</p>\n\n<p>We ran into two annoying problems:</p>\n\n<p>One: NATS is fast, but doesn’t guarantee delivery. Back in 2022, Fly.io was pretty big on NATS internally. We’ve moved away from it. For instance, our <a href='https://fly.io/blog/carving-the-scheduler-out-of-our-orchestrator/' title=''>internal <code>flyd</code> API</a> used to be driven by NATS; today, it’s HTTP. Our NATS cluster was losing too many messages to host a reliable API on it. Scaling back our use of NATS made WireGuard gateways better, but still not great.</p>\n\n<p>Two: When <code>flyctl</code> exits, the WireGuard peer it created sticks around on the gateway. Nothing cleans up old peers. After all, you’re likely going to come back tomorrow and deploy a new version of your app, or <code>fly ssh console</code> into it to debug something. Why remove a peer just to re-add it the next day?</p>\n\n<p>Unfortunately, the vast majority of peers are created by <code>flyctl</code> in CI jobs, which don’t have persistent storage and can’t reconnect to the same peer the next run; they generate new peers every time, no matter what.</p>\n\n<p>So, we ended up with a not-reliable-enough provisioning system, and gateways with hundreds of thousands of peers that will never be used again. 
The high stale peer count made kernel WireGuard operations very slow - especially loading all the peers back into the kernel after a gateway server reboot - and caused some kernel panics.</p>\n\n<p>There had to be</p>\n<h2 id='a-better-way' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#a-better-way' aria-label='Anchor'></a><span class='plain-code'>A better way.</span></h2>\n<p>Storing bajillions of WireGuard peers is no big challenge for any serious n-tier RDBMS. This isn’t “big data”. The problem we have at Fly.io is that our gateways don’t have serious n-tier RDBMSs. They’re small. Scrappy. They live off the land.</p>\n\n<p>Seriously, though: you could store every WireGuard peer everybody has ever used at Fly.io in a single SQLite database, easily. What you can’t do is store them all in the Linux kernel.</p>\n\n<p>So, at some point, as you push more and more peer configurations to a gateway, you have to start making decisions about which peers you’ll enable in the kernel, and which you won’t.</p>\n\n<p>Wouldn’t it be nice if we just didn’t have this problem? What if, instead of pushing configs to gateways, we had the gateways pull them from our API on demand?</p>\n\n<p>If you did that, peers would only have to be added to the kernel when the client wanted to connect. You could yeet them out of the kernel any time you wanted; the next time the client connected, they’d just get pulled again, and everything would work fine.</p>\n\n<p>The problem you quickly run into when building this design is that Linux kernel WireGuard doesn’t have a feature for installing peers on demand. 
However:</p>\n<h2 id='it-is-possible-to-jit-wireguard-peers' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#it-is-possible-to-jit-wireguard-peers' aria-label='Anchor'></a><span class='plain-code'>It is possible to JIT WireGuard peers</span></h2>\n<p>The Linux kernel’s <a href='https://github.com/WireGuard/wgctrl-go' title=''>interface for configuring WireGuard</a> is <a href='https://docs.kernel.org/userspace-api/netlink/intro.html' title=''>Netlink</a> (which is basically a way to create a userland socket to talk to a kernel service). Here’s a <a href='https://github.com/WireGuard/wg-dynamic/blob/master/netlink.h' title=''>summary of it as a C API</a>. Note that there’s no API call to subscribe to “incoming connection attempt” events.</p>\n\n<p>That’s OK! We can just make our own events. WireGuard connection requests are packets, and they’re easily identifiable, so we can efficiently snatch them with a BPF filter and a <a href='https://github.com/google/gopacket' title=''>packet socket</a>.</p>\n<div class=\"callout\"><p>Most of the time, it’s even easier for us to get the raw WireGuard packets, because our users now default to WebSockets WireGuard (which is just an unauthenticated WebSockets connection that shuttles framed UDP packets to and from an interface on the gateway), so that people who have trouble talking end-to-end in UDP can bring connections up.</p>\n</div>\n<p>We own the daemon code for that, and can just hook the packet receive function to snarf WireGuard packets.</p>\n\n<p>It’s not obvious, but WireGuard doesn’t have notions of “client” or “server”. It’s a pure point-to-point protocol; peers connect to each other when they have traffic to send. 
The first peer to connect is called the <strong class='font-semibold text-navy-950'>initiator</strong>, and the peer it connects to is the <strong class='font-semibold text-navy-950'>responder</strong>.</p>\n<div class=\"right-sidenote\"><p><a href=\"https://www.wireguard.com/papers/wireguard.pdf\" title=\"\"><em>The WireGuard paper</em></a> <em>is a good read.</em></p>\n</div>\n<p>For Fly.io, <code>flyctl</code> is typically our initiator, sending a single UDP packet to the gateway, which is the responder. According <a href='https://www.wireguard.com/papers/wireguard.pdf' title=''>to the WireGuard paper</a>, this first packet is a <code>handshake initiation</code>. It gets better: the packet type is recorded in a single plaintext byte. So this simple BPF filter catches all the incoming connections: <code>udp and dst port 51820 and udp[8] = 1</code>.</p>\n\n<p>In most other protocols, we’d be done at this point; we’d just scrape the username or whatnot out of the packet, go fetch the matching configuration, and install it in the kernel. With WireGuard, not so fast. WireGuard is based on Trevor Perrin’s <a href='http://www.noiseprotocol.org/' title=''>Noise Protocol Framework</a>, and Noise goes way out of its way to <a href='http://www.noiseprotocol.org/noise.html#identity-hiding' title=''>hide identities</a> during handshakes. To identify incoming requests, we’ll need to run enough Noise cryptography to decrypt the identity.</p>\n\n<p>The code to do this is fussy, but it’s relatively short (about 200 lines). Helpfully, the kernel Netlink interface will give a privileged process the private key for an interface, so the secrets we need to unwrap WireGuard are easy to get. Then it’s just a matter of running the first bit of the Noise handshake. 
If you’re that kind of nerdy, <a href='https://gist.github.com/tqbf/9f2c2852e976e6566f962d9bca83062b' title=''>here’s the code.</a></p>\n\n<p>At this point, we have the event feed we wanted: the public keys of every user trying to make a WireGuard connection to our gateways. We keep a rate-limited cache in SQLite, and when we see new peers, we’ll make an internal HTTP API request to fetch the matching peer information and install it. This fits nicely into the little daemon that already runs on our gateways to manage WireGuard, and allows us to ruthlessly and recklessly remove stale peers with a <code>cron</code> job.</p>\n\n<p>But wait! There’s more! We bounced this plan off Jason Donenfeld, and he tipped us off on a sneaky feature of the Linux WireGuard Netlink interface.</p>\n<div class=\"right-sidenote\"><p>Jason is the hardest working person in show business.</p>\n</div>\n<p>Our API fetch for new peers is generally not going to be fast enough to respond to the first handshake initiation message a new client sends us. That’s OK; WireGuard is pretty fast about retrying. But we can do better.</p>\n\n<p>When we get an incoming initiation message, we have the 4-tuple address of the desired connection, including the ephemeral source port <code>flyctl</code> is using. We can install the peer as if we’re the initiator, and <code>flyctl</code> is the responder. The Linux kernel will initiate a WireGuard connection back to <code>flyctl</code>. This works; the protocol doesn’t care a whole lot who’s the server and who’s the client. 
We get new connections established about as fast as they can possibly be installed.</p>\n<figure class=\"post-cta\">\n <figcaption>\n <h1>Launch an app in minutes</h1>\n <p>Speedrun an app onto Fly.io and get your own JIT WireGuard peer ✨</p>\n <a class=\"btn btn-lg\" href=\"/docs/speedrun/\">\n Speedrun <span class='opacity-50'>→</span>\n </a>\n </figcaption>\n <div class=\"image-container\">\n <img src=\"/static/images/cta-dog.webp\" srcset=\"/static/images/[email protected] 2x\" alt=\"\">\n </div>\n</figure>\n\n<h2 id='look-at-this-graph' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#look-at-this-graph' aria-label='Anchor'></a><span class='plain-code'>Look at this graph</span></h2>\n<p>We’ve been running this in production for a few weeks and we’re feeling pretty happy about it. We went from thousands, or hundreds of thousands, of stale WireGuard peers on a gateway to what rounds to none. Gateways now hold a lot less state, are faster at setting up peers, and can be rebooted without having to wait for many unused peers to be loaded back into the kernel.</p>\n\n<p>I’ll leave you with this happy Grafana chart from the day of the switchover.</p>\n\n<p><img alt=\"a Grafana chart of 'kernel_stale_wg_peer_count' vs. time. For the first few hours, all traces are flat. Most are at values between 0 and 50,000 and the top-most is just under 550,000. Towards the end of the graph, each line in turn jumps sharply down to the bottom, and at the end of the chart all datapoints are indistinguishable from 0.\" src=\"/blog/jit-wireguard-peers/assets/wireguard-peers-graph.webp\" /></p>\n\n<p><strong class='font-semibold text-navy-950'>Editor’s note:</strong> Despite our tearful protests, Lillian has decided to move on from Fly.io to explore new pursuits. 
We wish her much success and happiness! ✨</p>", "image": { "url": "https://fly.io/blog/jit-wireguard-peers/assets/network-thumbnail.webp", "title": null }, "media": [], "authors": [ { "name": "Fly", "email": null, "url": null } ], "categories": [] }, { "id": "https://fly.io/blog/fks-beta-live/", "title": "Fly Kubernetes does more now", "description": null, "url": "https://fly.io/blog/fks-beta-live/", "published": "2024-03-07T00:00:00.000Z", "updated": "2024-04-22T18:28:43.000Z", "content": "<div class=\"lead\"><p>Eons ago, we <a href=\"https://fly.io/blog/fks/\" title=\"\">announced</a> we were working on <a href=\"https://fly.io/docs/kubernetes/\" title=\"\">Fly Kubernetes</a>. It drummed up enough excitement to prove we were heading in the right direction. So, we got hard at work to get from barebones “early access” to a beta release. We’ll be onboarding customers to the closed beta over the next few weeks. Email us at <a href=\"mailto:[email protected]\">[email protected]</a> and we’ll hook you up.</p>\n</div>\n<p>Fly Kubernetes is the “blessed path”™️ to using Kubernetes backed by Fly.io infrastructure. Or, in simpler terms, it is our managed Kubernetes service. We take care of the complexity of operating the Kubernetes control plane, leaving you with the unfettered joy of deploying your Kubernetes workloads. 
If you love Fly.io and K8s, this product is for you.</p>\n<h2 id='what-even-is-a-kubernete' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#what-even-is-a-kubernete' aria-label='Anchor'></a><span class='plain-code'>What even is a Kubernete?</span></h2>\n<p>So how did this all come to be—and what even is a Kubernete?</p>\n<div class=\"right-sidenote\"><p>You can see more fun details in <a href=\"https://fly.io/blog/fks/\" title=\"\">Introducing Fly Kubernetes</a>.</p>\n</div>\n<p>If you wade through all the YAML and <a href='https://landscape.cncf.io/' title=''>CNCF projects</a>, what’s left is an API for declaring workloads and how they should be accessed.</p>\n\n<p>But that’s not what people usually talk / groan about. It’s everything else that comes along with adopting Kubernetes: a container runtime (CRI), networking between workloads (CNI), which leads to DNS (CoreDNS). Then you layer on Prometheus for metrics and whatever the logging daemon du jour is at the time. Now you get to debate which Ingress—strike that—<em>Gateway</em> API to deploy and if the next thing is anything to do with a Service Mess, then as they like to say where I live, “bless your heart”.</p>\n\n<p>Finally, there’s capacity planning. You’ve got to pick and choose where, how and what the <a href='https://kubernetes.io/docs/concepts/architecture/nodes/' title=''>Nodes</a> will look like in order to configure and run the workloads.</p>\n\n<p>When we began thinking about what a Fly Kubernetes Service could look like, we started from first principles, as we do with most everything here. The best way we can describe it is the <a href='https://www.youtube.com/watch?v=Ddk9ci6geSs' title=''>scene from Iron Man 2 when Tony Stark discovers a new element</a>. 
As he’s looking at the knowledge left behind by those that came before, he starts to imagine something entirely different and more capable than could have been accomplished previously. That’s what happened to JP, but with K3s and Virtual Kubelet.</p>\n<h2 id='ok-then-wtf-whats-the-fks' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#ok-then-wtf-whats-the-fks' aria-label='Anchor'></a><span class='plain-code'>OK then, WTF (what’s the FKS)?</span></h2>\n<p>We looked at what people need to get started—the API—and then started peeling away all the noise, filling in the gaps to connect things together to provide the power. Here’s how this looks currently:</p>\n\n<ul>\n<li>Containerd/CRI → <a href='https://fly.io/blog/carving-the-scheduler-out-of-our-orchestrator/' title=''>flyd</a> + Firecracker + <a href='https://fly.io/blog/docker-without-docker/' title=''>our init</a>: our system transmogrifies Docker containers into Firecracker microVMs\n</li><li>Networking/CNI → Our <a href='https://fly.io/blog/ipv6-wireguard-peering/' title=''>internal WireGuard mesh</a> connects your pods together\n</li><li>Pods → Fly Machines VMs\n</li><li>Secrets → Secrets, only not the base64’d kind\n</li><li>Services → The Fly Proxy\n</li><li>CoreDNS → CoreDNS (to be replaced with our custom internal DNS)\n</li><li>Persistent Volumes → Fly Volumes (coming soon)\n</li></ul>\n\n<p>Now…not everything is a one-to-one comparison, and we explicitly did not set out to support any and every configuration. We aren’t dealing with resources like Network Policy and init containers, though we’re also not completely ignoring them. 
By mapping many of the core primitives of Kubernetes to a Fly.io resource, we’re able to focus on continuing to build the primitives that make our cloud better for workloads of all shapes and sizes.</p>\n\n<p>A key thing to notice above is that there’s no “Node”.</p>\n\n<p><a href='https://virtual-kubelet.io/' title=''>Virtual Kubelet</a> plays a central role in FKS. It’s magic, really. A Virtual Kubelet acts as if it’s a standard Kubelet running on a Node, eager to run your workloads. However, there’s no Node backing it. It instead behaves like an API, receiving requests from Kubernetes and transforming them into requests to deploy on a cloud compute service. In our case, that’s Fly Machines.</p>\n\n<p>So what we have is Kubernetes calling out to our <a href='https://virtual-kubelet.io/docs/providers/' title=''>Virtual Kubelet provider</a>, a small Golang program we run alongside K3s, to create and run your pod. It creates <a href='https://fly.io/blog/docker-without-docker/' title=''>your pod as a Fly Machine</a>, via the <a href='/docs/machines/api/' title=''>Fly Machines API</a>, deploying it to any underlying host within that region. This shifts the burden of managing hardware capacity from you to us. 
We think that’s a cool trick—thanks, Virtual Kubelet magic!</p>\n<h2 id='speedrun' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#speedrun' aria-label='Anchor'></a><span class='plain-code'>Speedrun</span></h2>\n<p>You can deploy your workloads (including GPUs) across any of our available regions using the Kubernetes API.</p>\n\n<p>You create a cluster with <code>flyctl</code>:</p>\n<div class=\"highlight-wrapper group relative cmd\">\n <button \n type=\"button\"\n class=\"bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n data-wrap-target=\"#code-vomuctp1\"\n >\n <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\" stroke-linecap=\"round\" stroke-linejoin=\"round\"><g buffered-rendering=\"static\"><path d=\"M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959\" /><path d=\"M11.081 6.466L9.533 8.037l1.548 1.571\" /></g></svg>\n <span class=\"bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950\">\n Wrap text\n </span>\n </button>\n <button \n type=\"button\"\n class=\"bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n data-copy-target=\"sibling\"\n >\n <svg class=\"w-4 h-4 pointer-events-none\" 
viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\"><g buffered-rendering=\"static\"><path d=\"M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z\" /><path d=\"M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617\" /></g></svg>\n <span class=\"bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950\">\n Copy to clipboard\n </span>\n </button>\n <div class='highlight relative group'>\n <pre class='highlight cmd'><code id=\"code-vomuctp1\">fly ext k8s create --name hello --org personal --region iad\n</code></pre>\n </div>\n</div>\n<p>When a cluster is created, it has the standard <code>default</code> namespace. You can inspect it:</p>\n<div class=\"highlight-wrapper group relative cmd\">\n <button \n type=\"button\"\n class=\"bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n data-wrap-target=\"#code-f85r6bqf\"\n >\n <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\" stroke-linecap=\"round\" stroke-linejoin=\"round\"><g buffered-rendering=\"static\"><path d=\"M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959\" /><path d=\"M11.081 6.466L9.533 8.037l1.548 1.571\" /></g></svg>\n <span class=\"bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950\">\n Wrap text\n </span>\n </button>\n <button \n type=\"button\"\n class=\"bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white 
focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n data-copy-target=\"sibling\"\n >\n <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\"><g buffered-rendering=\"static\"><path d=\"M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z\" /><path d=\"M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617\" /></g></svg>\n <span class=\"bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950\">\n Copy to clipboard\n </span>\n </button>\n <div class='highlight relative group'>\n <pre class='highlight cmd'><code id=\"code-f85r6bqf\">kubectl get ns default --show-labels\n</code></pre>\n </div>\n</div><div class=\"highlight-wrapper group relative output\">\n <button \n type=\"button\"\n class=\"bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n data-wrap-target=\"#code-6bmj8nmt\"\n >\n <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\" stroke-linecap=\"round\" stroke-linejoin=\"round\"><g buffered-rendering=\"static\"><path d=\"M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959\" /><path d=\"M11.081 6.466L9.533 8.037l1.548 1.571\" /></g></svg>\n <span class=\"bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950\">\n Wrap text\n </span>\n </button>\n 
<button \n type=\"button\"\n class=\"bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n data-copy-target=\"sibling\"\n >\n <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\"><g buffered-rendering=\"static\"><path d=\"M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z\" /><path d=\"M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617\" /></g></svg>\n <span class=\"bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950\">\n Copy to clipboard\n </span>\n </button>\n <div class='highlight relative group'>\n <pre class='highlight output whitespace-pre'><code id=\"code-6bmj8nmt\">NAME STATUS AGE LABELS\ndefault Active 20d fly.io/app=fks-default-7zyjm3ovpdxmd0ep,kubernetes.io/metadata.name=default\n</code></pre>\n </div>\n</div>\n<p>The <code>fly.io/app</code> label shows the name of the Fly App that corresponds to your cluster.</p>\n\n<p>It would seem appropriate to deploy the <a href='https://github.com/kubernetes-up-and-running/kuard' title=''>Kubernetes Up And Running demo</a> here, but since your pods are connected over an <a href='https://fly.io/blog/ipv6-wireguard-peering/' title=''>IPv6 WireGuard mesh</a>, we’re going to use a <a href='https://github.com/jipperinbham/kuard' title=''>fork</a> with support for <a href='https://github.com/kubernetes-up-and-running/kuard/issues/46' title=''>IPv6 DNS</a>.</p>\n<div class=\"highlight-wrapper group relative cmd\">\n <button \n type=\"button\"\n class=\"bubble-wrap z-20 absolute right-9 
-mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n data-wrap-target=\"#code-h0ws84lr\"\n >\n <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\" stroke-linecap=\"round\" stroke-linejoin=\"round\"><g buffered-rendering=\"static\"><path d=\"M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959\" /><path d=\"M11.081 6.466L9.533 8.037l1.548 1.571\" /></g></svg>\n <span class=\"bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950\">\n Wrap text\n </span>\n </button>\n <button \n type=\"button\"\n class=\"bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n data-copy-target=\"sibling\"\n >\n <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\"><g buffered-rendering=\"static\"><path d=\"M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z\" /><path d=\"M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617\" /></g></svg>\n <span class=\"bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950\">\n Copy to clipboard\n </span>\n </button>\n <div class='highlight relative group'>\n <pre class='highlight cmd'><code id=\"code-h0ws84lr\">kubectl run \\\n 
--image=ghcr.io/jipperinbham/kuard-amd64:blue \\\n --labels=\"app=kuard-fks\" \\\n kuard\n</code></pre>\n </div>\n</div>\n<p>And you can see its Machine representation via:</p>\n<div class=\"highlight-wrapper group relative cmd\">\n <button \n type=\"button\"\n class=\"bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n data-wrap-target=\"#code-ktbm1ey3\"\n >\n <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\" stroke-linecap=\"round\" stroke-linejoin=\"round\"><g buffered-rendering=\"static\"><path d=\"M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959\" /><path d=\"M11.081 6.466L9.533 8.037l1.548 1.571\" /></g></svg>\n <span class=\"bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950\">\n Wrap text\n </span>\n </button>\n <button \n type=\"button\"\n class=\"bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n data-copy-target=\"sibling\"\n >\n <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\"><g buffered-rendering=\"static\"><path d=\"M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z\" /><path d=\"M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617\" 
/></g></svg>\n <span class=\"bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950\">\n Copy to clipboard\n </span>\n </button>\n <div class='highlight relative group'>\n <pre class='highlight cmd'><code id=\"code-ktbm1ey3\">fly machine list --app fks-default-7zyjm3ovpdxmd0ep\n</code></pre>\n </div>\n</div><div class=\"highlight-wrapper group relative output\">\n <button \n type=\"button\"\n class=\"bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n data-wrap-target=\"#code-httmdmgs\"\n >\n <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\" stroke-linecap=\"round\" stroke-linejoin=\"round\"><g buffered-rendering=\"static\"><path d=\"M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959\" /><path d=\"M11.081 6.466L9.533 8.037l1.548 1.571\" /></g></svg>\n <span class=\"bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950\">\n Wrap text\n </span>\n </button>\n <button \n type=\"button\"\n class=\"bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n data-copy-target=\"sibling\"\n >\n <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\"><g buffered-rendering=\"static\"><path d=\"M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 
1.815-.82 1.815-1.815V7.239z\" /><path d=\"M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617\" /></g></svg>\n <span class=\"bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950\">\n Copy to clipboard\n </span>\n </button>\n <div class='highlight relative group'>\n <pre class='highlight output whitespace-pre'><code id=\"code-httmdmgs\">ID NAME STATE REGION IMAGE IP ADDRESS VOLUME CREATED LAST UPDATED APP PLATFORM PROCESS GROUP SIZE\n1852291c46ded8 kuard started iad jipperinbham/kuard-amd64:blue fdaa:0:48c8:a7b:228:4b6d:6e20:2 2024-03-05T18:54:41Z 2024-03-05T18:54:44Z shared-cpu-1x:256MB\n</code></pre>\n </div>\n</div>\n<p></div></p>\n\n<p>This is important! Your pod is a Fly Machine! While we don’t yet support all kubectl features, Fly.io tooling will “just work” for cases where we don’t yet support the kubectl way. So, for example, we don’t have <code>kubectl port-forward</code> and <code>kubectl exec</code>, but you can use flyctl to forward ports and get a shell into a pod.</p>\n\n<p>Expose it to your internal network using the standard ClusterIP Service:</p>\n<div class=\"highlight-wrapper group relative cmd\">\n <button \n type=\"button\"\n class=\"bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n data-wrap-target=\"#code-9dy6iy1l\"\n >\n <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\" stroke-linecap=\"round\" stroke-linejoin=\"round\"><g buffered-rendering=\"static\"><path d=\"M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959\" /><path d=\"M11.081 
6.466L9.533 8.037l1.548 1.571\" /></g></svg>\n <span class=\"bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950\">\n Wrap text\n </span>\n </button>\n <button \n type=\"button\"\n class=\"bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n data-copy-target=\"sibling\"\n >\n <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\"><g buffered-rendering=\"static\"><path d=\"M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z\" /><path d=\"M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617\" /></g></svg>\n <span class=\"bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950\">\n Copy to clipboard\n </span>\n </button>\n <div class='highlight relative group'>\n <pre class='highlight cmd'><code id=\"code-9dy6iy1l\">kubectl expose pod kuard \\\n --name=kuard \\\n --port=8080 \\\n --target-port=8080 \\\n --selector='app=kuard-fks'\n</code></pre>\n </div>\n</div>\n<p>ClusterIP Services work natively, and Fly.io internal DNS supports them. Within the cluster, CoreDNS works too.</p>\n\n<p>Access this Service locally via <a href='https://fly.io/docs/networking/private-networking/#flycast-private-load-balancing' title=''>flycast</a>: Get connected to your org’s <a href='https://fly.io/docs/networking/private-networking/' title=''>6PN private WireGuard network</a>. 
Get kubectl to describe the <code>kuard</code> Service:</p>\n<div class=\"highlight-wrapper group relative cmd\">\n <button \n type=\"button\"\n class=\"bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n data-wrap-target=\"#code-luy1nk1t\"\n >\n <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\" stroke-linecap=\"round\" stroke-linejoin=\"round\"><g buffered-rendering=\"static\"><path d=\"M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959\" /><path d=\"M11.081 6.466L9.533 8.037l1.548 1.571\" /></g></svg>\n <span class=\"bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950\">\n Wrap text\n </span>\n </button>\n <button \n type=\"button\"\n class=\"bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n data-copy-target=\"sibling\"\n >\n <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\"><g buffered-rendering=\"static\"><path d=\"M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z\" /><path d=\"M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617\" /></g></svg>\n <span class=\"bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950\">\n 
Copy to clipboard\n </span>\n </button>\n <div class='highlight relative group'>\n <pre class='highlight cmd'><code id=\"code-luy1nk1t\">kubectl describe svc kuard\n</code></pre>\n </div>\n</div><div class=\"highlight-wrapper group relative output\">\n <button \n type=\"button\"\n class=\"bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n data-wrap-target=\"#code-r8ykf5mk\"\n >\n <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\" stroke-linecap=\"round\" stroke-linejoin=\"round\"><g buffered-rendering=\"static\"><path d=\"M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959\" /><path d=\"M11.081 6.466L9.533 8.037l1.548 1.571\" /></g></svg>\n <span class=\"bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950\">\n Wrap text\n </span>\n </button>\n <button \n type=\"button\"\n class=\"bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n data-copy-target=\"sibling\"\n >\n <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\"><g buffered-rendering=\"static\"><path d=\"M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z\" /><path d=\"M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 
0-1.815.819-1.816 1.815v1.617\" /></g></svg>\n <span class=\"bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950\">\n Copy to clipboard\n </span>\n </button>\n <div class='highlight relative group'>\n <pre class='highlight output'><code id=\"code-r8ykf5mk\">Name: kuard\nNamespace: default\nLabels: app=kuard-fks\nAnnotations: fly.io/clusterip-allocator: configured\n service.fly.io/sync-version: 11507529969321451315\nSelector: app=kuard-fks\nType: ClusterIP\nIP Family Policy: SingleStack\nIP Families: IPv6\nIP: fdaa:0:48c8:0:1::1a\nIPs: fdaa:0:48c8:0:1::1a\nPort: <unset> 8080/TCP\nTargetPort: 8080/TCP\nEndpoints: [fdaa:0:48c8:a7b:228:4b6d:6e20:2]:8080\nSession Affinity: None\nEvents: <none>\n</code></pre>\n </div>\n</div>\n<p>You can pull out the Service’s IP address from the above output, and get at the KUARD UI using that: in this case, <code>http://[fdaa:0:48c8:0:1::1a]:8080</code>. </p>\n\n<p>Using internal DNS: <code>http://<service_name>.svc.<app_name>.flycast:8080</code>. Or, in our example: <code>http://kuard.svc.fks-default-7zyjm3ovpdxmd0ep.flycast:8080</code>.</p>\n\n<p>And finally CoreDNS: <code><service_name>.<namespace>.svc.cluster.local</code> resolves to the <code>fdaa</code> IP and is routable within the cluster.</p>\n<figure class=\"post-cta\">\n <figcaption>\n <h1>Get in on the FKS beta</h1>\n <p>Email us at [email protected]</p>\n </figcaption>\n <div class=\"image-container\">\n <img src=\"/static/images/cta-turtle.webp\" srcset=\"/static/images/[email protected] 2x\" alt=\"\">\n </div>\n</figure>\n\n<h2 id='pricing' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#pricing' aria-label='Anchor'></a><span class='plain-code'>Pricing</span></h2>\n<p>The Fly Kubernetes Service is free during the beta. 
Fly Machines and Fly Volumes you create with it will cost the <a href='https://fly.io/docs/about/pricing/' title=''>same as for your other Fly.io projects</a>. It’ll be <a href='https://fly.io/docs/about/pricing/#fly-kubernetes' title=''>$75/mo per cluster</a> after that, plus the cost of the other resources you create.</p>\n<h2 id='today-and-the-future' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#today-and-the-future' aria-label='Anchor'></a><span class='plain-code'>Today and the future</span></h2>\n<p>Today, Fly Kubernetes supports only a portion of the Kubernetes API. You can deploy pods using Deployments/ReplicaSets. Pods are able to communicate via Services using the standard K8s DNS format. Ephemeral and persistent volumes are supported.</p>\n\n<p>The most notable absences are: multi-container pods, StatefulSets, network policies, horizontal pod autoscaling and emptyDir volumes. We’re working on supporting autoscaling and emptyDir volumes in the coming weeks and multi-container pods in the coming months.</p>\n\n<p>If you’ve made it this far and are eagerly awaiting your chance to tell us and the rest of the internet “this isn’t Kubernetes!”, well, we agree! It’s not something we take lightly. We’re still building, and conformance tests may be in the future for FKS. We’ve made a deliberate decision to only care about fast-launching VMs as the one and only way to run workloads on our cloud. And we also know enough of our customers would like to use the Kubernetes API to create a fast-launching VM in the form of a Pod, and that’s where this story begins. 
</p>", "image": { "url": "https://fly.io/blog/fks-beta-live/assets/fks-thumb.webp", "title": null }, "media": [], "authors": [ { "name": "Fly", "email": null, "url": null } ], "categories": [] }, { "id": "https://fly.io/blog/tigris-public-beta/", "title": "Globally Distributed Object Storage with Tigris", "description": null, "url": "https://fly.io/blog/tigris-public-beta/", "published": "2024-02-15T00:00:00.000Z", "updated": "2024-04-12T18:23:39.000Z", "content": "<div class=\"lead\"><p>We’re Fly.io and we transmute containers into VMs, running them on our hardware around the world with the power of Firecracker alchemy. That’s pretty cool, but we want to talk about something someone else built, that <a href=\"https://fly.io/docs/reference/tigris/\" title=\"\">you can use today</a> to build applications.</p>\n</div>\n<p>There are three hard things in computer science:</p>\n\n<ol>\n<li>Cache invalidation\n</li><li>Naming things\n</li><li><a href='https://aws.amazon.com/s3/' title=''>Doing a better job than Amazon of storing files</a>\n</li></ol>\n\n<p>Of all the annoying software problems that have no business being annoying, handling a file upload in a full-stack application stands apart, a universal if tractable malady, the plantar fasciitis of programming.</p>\n\n<p>Now, the actual act of clients placing files on servers is straightforward. Your framework <a href='https://hexdocs.pm/phoenix/file_uploads.html' title=''>has</a> <a href='https://edgeguides.rubyonrails.org/active_storage_overview.html' title=''>a</a> <a href='https://docs.djangoproject.com/en/5.0/topics/http/file-uploads/' title=''>feature</a> <a href='https://expressjs.com/en/resources/middleware/multer.html' title=''>that</a> <a href='https://github.com/yesodweb/yesod-cookbook/blob/master/cookbook/Cookbook-file-upload-saving-files-to-server.md' title=''>does</a> <a href='https://laravel.com/docs/10.x/filesystem' title=''>it</a>. 
What’s hard is making sure that uploads stick around to be downloaded later.</p>\n<aside class=\"right-sidenote\"><p>(yes, yes, we know, <a href=\"https://youtu.be/b2F-DItXtZs?t=102\" title=\"\">sharding /dev/null</a> is faster)</p>\n</aside>\n<p>Enter object storage, a pattern you may know by its colloquial name “S3”. Object storage occupies a funny place in software architecture, somewhere between a database and a filesystem. It’s like <a href='https://man7.org/linux/man-pages/man3/malloc.3.html' title=''><code>malloc</code></a><code>()</code>, but for cloud storage instead of program memory.</p>\n\n<p><a href='https://www.kleenex.com/en-us/' title=''>S3</a>—err, object storage — is so important that it was the second AWS service ever introduced (EC2 was not the first!). Everybody wants it. We know, because they keep asking us for it.</p>\n\n<p>So why didn’t we build it?</p>\n\n<p>Because we couldn’t figure out a way to improve on S3. And we still haven’t! But someone else did, at least for the kinds of applications we see on Fly.io.</p>\n<h2 id='but-first-some-back-story' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#but-first-some-back-story' aria-label='Anchor'></a><span class='plain-code'>But First, Some Back Story</span></h2>\n<p>S3 checks all the boxes. It’s trivial to use. It’s efficient and cost-effective. It has redundancies that would make a DoD contractor blush. It integrates with archival services like Glacier. And every framework supports it. At some point, the IETF should just take a deep sigh and write an S3 API RFC, XML signatures and all.</p>\n\n<p>There’s at least one catch, though.</p>\n\n<p>Back in, like, ‘07 people ran all their apps from a single city. S3 was designed to work for those kinds of apps. 
The data, the bytes on the disks (or whatever weird hyperputer AWS stores S3 bytes on), live in one place. A specific place. In a specific data center. As powerful and inspiring as The Architects are, they are mortals, and must obey the laws of physics.</p>\n\n<p>This observation feels banal, until you realize how apps have changed in the last decade. Apps and their users don’t live in one specific place. They live all over the world. When users are close to the S3 data center, things are amazing! But things get less amazing the further away you get from the data center, and even less amazing the smaller and more frequent your reads and writes are.</p>\n\n<p>(Thought experiment: you have to pick one place in the world to route all your file storage. Where is it? Is it <a href='https://www.tripadvisor.com/Restaurant_Review-g30246-d1956555-Reviews-Ford_s_Fish_Shack_Ashburn-Ashburn_Loudoun_County_Virginia.html' title=''>Loudoun County, Virginia</a>?)</p>\n\n<p>So, for many modern apps, you end up having to <a href='https://stackoverflow.com/questions/32426249/aws-s3-bucket-with-multiple-regions' title=''>write things into different regions</a>, so that people close to the data get it from a region-specific bucket. Doing that pulls in CDN-caching things that complicate your application and put barriers between you and your data. Before you know it, you’re wearing custom orthotics on your, uh, developer feet. (<em>I am done with this metaphor now, I promise.</em>)</p>\n<aside class=\"right-sidenote\"><p>(well, okay, Backblaze B2 because somehow my bucket fits into their free tier, but you get the idea)</p>\n</aside>\n<p>Personally, I know this happens. Because I had to build one! I run a <a href='https://xeiaso.net/blog/xedn/' title=''>CDN backend</a> that’s a caching proxy for S3 on six continents. 
All so that I can deliver images and video efficiently for the readers of my blog.</p>\n<aside class=\"right-sidenote\"><p>(shut up, it’s a sandwich)</p>\n</aside>\n<p>What if data was really global? For some applications, it might not matter much. But for others, it matters a lot. When a sandwich lover in Australia snaps a picture of a <a href='https://en.wikipedia.org/wiki/Hamdog' title=''>hamdog</a>, the people most likely to want to see that photo are also in Australia. Routing those uploads and downloads through one building in Ashburn is no way to build a sandwich reviewing empire.</p>\n\n<p>Localizing all the data sounds like a hard problem. What if you didn’t need to change anything on your end to accomplish it?</p>\n<h2 id='show-me-a-hero' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#show-me-a-hero' aria-label='Anchor'></a><span class='plain-code'>Show Me A Hero</span></h2>\n<p>Building a miniature CDN infrastructure just to handle file uploads seems like the kind of thing that could take a week or so of tinkering. The Fly.io unified theory of cloud development is that solutions are completely viable for full-stack developers only when they take less than 2 hours to get working.</p>\n\n<p>AWS agrees, which is why they have a SKU for it, <a href='https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/cloudfront_distribution' title=''>called Cloudfront</a>, which will, at some variably metered expense, optimize the read side of a single-write-region bucket: they’ll set up <a href='https://fly.io/blog/the-5-hour-content-delivery-network/' title=''>a simple caching CDN</a> for you. 
You can probably get S3 and Cloudfront working within 2 hours, especially if you’ve set it up before.</p>\n\n<p>Our friends at Tigris have this problem down to single-digit minutes, and what they came up with is a lot cooler than a caching CDN.</p>\n\n<p>Here’s how it works. Tigris runs redundant FoundationDB clusters in our regions to track objects. They use Fly.io’s NVMe volumes as a first level of cached raw byte store, and a queuing system modelled on <a href='https://www.foundationdb.org/files/QuiCK.pdf' title=''>Apple’s QuiCK paper</a> to distribute object data to multiple replicas, to regions where the data is in demand, and to third-party object stores… like S3.</p>\n\n<p>If your objects are smaller than about 128 kilobytes, Tigris makes them instantly global. By default! Things are just snappy, all over the world, automatically, because they’ve done all the work.</p>\n\n<p>But it gets better, because Tigris is also much more flexible than a simple caching CDN. It’s globally distributed from the jump, with inter-region routing baked into its distribution layer. Tigris isn’t a CDN, but rather a toolset that you can use to build arbitrary CDNs, with consistency guarantees, instant purge and relay regions.</p>\n\n<p>There’s a lot going on in this architecture, and it’d be fun to dig into it more. But for now, you don’t have to understand any of it. Because Tigris ties all this stuff together with an S3-compatible object storage API. 
If your framework can talk to S3, it can use Tigris.</p>\n<h2 id='fly-storage' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#fly-storage' aria-label='Anchor'></a><span class='plain-code'><code>fly storage</code></span></h2>\n<p>To get started with this, run the <code>fly storage create</code> command:</p>\n<div class=\"highlight-wrapper group relative \">\n <button \n type=\"button\"\n class=\"bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n data-wrap-target=\"#code-69koa0wf\"\n >\n <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\" stroke-linecap=\"round\" stroke-linejoin=\"round\"><g buffered-rendering=\"static\"><path d=\"M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959\" /><path d=\"M11.081 6.466L9.533 8.037l1.548 1.571\" /></g></svg>\n <span class=\"bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950\">\n Wrap text\n </span>\n </button>\n <button \n type=\"button\"\n class=\"bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n data-copy-target=\"sibling\"\n >\n <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\"><g 
buffered-rendering=\"static\"><path d=\"M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z\" /><path d=\"M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617\" /></g></svg>\n <span class=\"bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950\">\n Copy to clipboard\n </span>\n </button>\n <div class='highlight relative group'>\n <pre class='highlight '><code id=\"code-69koa0wf\">$ fly storage create\nChoose a name, use the default, or leave blank to generate one: xe-foo-images\nYour Tigris project (xe-foo-images) is ready. See details and next steps with: https://fly.io/docs/reference/tigris/\n\nSetting the following secrets on xe-foo:\nAWS_REGION\nBUCKET_NAME\nAWS_ENDPOINT_URL_S3\nAWS_ACCESS_KEY_ID\nAWS_SECRET_ACCESS_KEY\n\nSecrets are staged for the first deployment\n</code></pre>\n </div>\n</div>\n<p>All you have to do is fill in a bucket name. Hit enter. All of the configuration for the AWS S3 library will be injected into your application for you. And you don’t even need to change the libraries that you’re using. <a href='https://www.tigrisdata.com/docs/sdks/s3/' title=''>The Tigris examples</a> all use the AWS libraries to put and delete objects into Tigris using the same calls that you use for S3.</p>\n\n<p>I know how this looks for a lot of you. It looks like we’re partnering with Tigris because we’re chicken, and we didn’t want to build something like this. Well, guess what: you’re right!</p>\n\n<p>Compute and networking: those are things we love and understand. Object storage? <a href='https://fly.io/blog/the-5-hour-content-delivery-network/' title=''>We already gave away the game on how we’d design a CDN for our own content</a>, and it wasn’t nearly as slick as Tigris.</p>\n\n<p>Object storage is important. It needs to be good. 
We did not want to half-ass it. So we partnered with Tigris, so that they can put their full resources into making object storage as ✨magical✨ as Fly.io is.</p>\n\n<p>This also mirrors a lot of the Unix philosophy of Days Gone Past: you have individual parts that each do one thing very well and are then chained together to create a composite result. I mean, come on, would you seriously want to buy your servers the same place you buy your shoes?</p>\n<h2 id='one-bill-to-rule-them-all' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#one-bill-to-rule-them-all' aria-label='Anchor'></a><span class='plain-code'>One bill to rule them all</span></h2>\n<p>Well, okay, the main reason why you would want to do that is that having everything under one bill makes it really easy for your accounting people. So we’ve wrapped your compute, your block storage, your databases, your networking, and your object storage under one bill. You don’t have to create separate accounts with Supabase or Upstash or PlanetScale or Tigris. Everything gets charged to your Fly.io bill and you pay one bill per month.</p>\n<aside class=\"right-sidenote\"><p>This was actually going to be posted on Valentine’s Day, but we had to wait for the chocolate to go on sale.</p>\n</aside>\n<p>This is our Valentine’s Day gift to you all. Object storage that just works. 
Stay tuned because we have a couple exciting features that build on top of the integration of Fly.io and Tigris that allow really unique things, such as truly global static website hosting and turning your bucket into a CDN in 5 minutes at most.</p>\n\n<p>Here’s to many more happy developer days to come.</p>", "image": { "url": "https://fly.io/blog/tigris-public-beta/assets/tigris-public-beta-thumb.webp", "title": null }, "media": [], "authors": [ { "name": "Fly", "email": null, "url": null } ], "categories": [] }, { "id": "https://fly.io/blog/gpu-ga/", "title": "GPUs on Fly.io are available to everyone!", "description": null, "url": "https://fly.io/blog/gpu-ga/", "published": "2024-02-12T00:00:00.000Z", "updated": "2024-04-12T18:23:39.000Z", "content": "<div class=\"lead\"><p>Fly.io makes it easy to spin up compute around the world, now including powerful GPUs. Unlock the power of large language models, text transcription, and image generation with our datacenter-grade muscle!</p>\n</div>\n<p>GPUs are now available to everyone!</p>\n\n<p>We know you’ve been excited about wanting to use GPUs on Fly.io and we’re happy to announce that they’re available for everyone. If you want, you can spin up GPU instances with any of the following cards:</p>\n\n<ul>\n<li>Ampere A100 (40GB) <code>a100-40gb</code>\n</li><li>Ampere A100 (80GB) <code>a100-80gb</code>\n</li><li>Lovelace L40s (48GB) <code>l40s</code>\n</li></ul>\n\n<p>To use a GPU instance today, change the <code>vm.size</code> for one of your apps or processes to any of the above GPU kinds. 
Here’s how you can spin up an <a href='https://ollama.ai' title=''>Ollama</a> server in seconds:</p>\n<div class=\"highlight-wrapper group relative toml\">\n <button \n type=\"button\"\n class=\"bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n data-wrap-target=\"#code-mgip5vdl\"\n >\n <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\" stroke-linecap=\"round\" stroke-linejoin=\"round\"><g buffered-rendering=\"static\"><path d=\"M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959\" /><path d=\"M11.081 6.466L9.533 8.037l1.548 1.571\" /></g></svg>\n <span class=\"bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950\">\n Wrap text\n </span>\n </button>\n <button \n type=\"button\"\n class=\"bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n data-copy-target=\"sibling\"\n >\n <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\"><g buffered-rendering=\"static\"><path d=\"M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z\" /><path d=\"M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617\" /></g></svg>\n <span class=\"bubble-sm bubble-tl [--offset-l:-6px] tail 
[--tail-x:calc(100%-30px)] text-navy-950\">\n Copy to clipboard\n </span>\n </button>\n <div class='highlight relative group'>\n <pre class='highlight '><code id=\"code-mgip5vdl\"><span class=\"py\">app</span> <span class=\"p\">=</span> <span class=\"s\">\"your-app-name\"</span>\n<span class=\"py\">region</span> <span class=\"p\">=</span> <span class=\"s\">\"ord\"</span>\n<span class=\"py\">vm.size</span> <span class=\"p\">=</span> <span class=\"s\">\"l40s\"</span>\n\n<span class=\"nn\">[http_service]</span>\n <span class=\"py\">internal_port</span> <span class=\"p\">=</span> <span class=\"mi\">11434</span>\n <span class=\"py\">force_https</span> <span class=\"p\">=</span> <span class=\"kc\">false</span>\n <span class=\"py\">auto_stop_machines</span> <span class=\"p\">=</span> <span class=\"kc\">true</span>\n <span class=\"py\">auto_start_machines</span> <span class=\"p\">=</span> <span class=\"kc\">true</span>\n <span class=\"py\">min_machines_running</span> <span class=\"p\">=</span> <span class=\"mi\">0</span>\n <span class=\"py\">processes</span> <span class=\"p\">=</span> <span class=\"nn\">[\"app\"]</span>\n\n<span class=\"nn\">[build]</span>\n <span class=\"py\">image</span> <span class=\"p\">=</span> <span class=\"s\">\"ollama/ollama\"</span>\n\n<span class=\"nn\">[mounts]</span>\n <span class=\"py\">source</span> <span class=\"p\">=</span> <span class=\"s\">\"models\"</span>\n <span class=\"py\">destination</span> <span class=\"p\">=</span> <span class=\"s\">\"/root/.ollama\"</span>\n <span class=\"py\">initial_size</span> <span class=\"p\">=</span> <span class=\"s\">\"100gb\"</span>\n</code></pre>\n </div>\n</div>\n<p>Deploy this and bam, large language model inferencing from anywhere. If you want a private setup, see the article <a href='https://fly.io/blog/scaling-llm-ollama/' title=''>Scaling Large Language Models to zero with Ollama</a> for more information. 
You never know when you’ll have a sandwich emergency and don’t know what you can make with what you have on hand.</p>\n\n<p>We are working on getting some lower-cost A10 GPUs in the next few weeks. We’ll update you when they’re ready.</p>\n\n<p>If you want to explore the possibilities of GPUs on Fly.io, here are a few articles that may give you ideas:</p>\n\n<ul>\n<li><a href='https://fly.io/blog/not-midjourney-bot/' title=''>Deploy Your Own (Not) MidJourney Bot On Fly GPUs</a>\n</li><li><a href='https://fly.io/blog/scaling-llm-ollama/' title=''>Scaling Large Language Models to zero with Ollama</a>\n</li><li><a href='https://fly.io/blog/transcribing-on-fly-gpu-machines/' title=''>Transcribing on Fly GPU Machines</a>\n</li></ul>\n\n<p>Depending on factors such as your organization’s age and payment history, you may need to go through additional verification steps.</p>\n\n<p>If you’ve been experimenting with Fly.io GPUs and have made something cool, let us know on the <a href='https://community.fly.io/' title=''>Community Forums</a> or by mentioning us <a href='https://hachyderm.io/@flydotio' title=''>on Mastodon</a>! We’ll boost the cool ones.</p>", "image": { "url": "https://fly.io/blog/gpu-ga/assets/gpu-ga-thumb.webp", "title": null }, "media": [], "authors": [ { "name": "Fly", "email": null, "url": null } ], "categories": [] }, { "id": "https://fly.io/blog/event-driven-machines/", "title": "Event Driven Machines", "description": null, "url": "https://fly.io/blog/event-driven-machines/", "published": "2024-02-05T00:00:00.000Z", "updated": "2024-04-12T18:23:39.000Z", "content": "<div class=\"lead\"><p>We’re Fly.io and we transmute containers into VMs, running them on our hardware around the world. 
We have fast-booting VMs, so why not <a href=\"https://fly.io/docs/speedrun/\" title=\"\">take advantage of them</a>?</p>\n</div>\n<p>Serverless is great because it has good ergonomics - when an event is received, a “not-server” boots quickly, code is run, and then everything is torn down. We’re billed only on usage.</p>\n\n<p>It turns out that Fly.io shares many of <a href='https://fly.io/blog/the-serverless-server/' title=''>the same ergonomics</a> as serverless. Can we do a serverless on Fly.io? 🦆 Well, if it’s quacking like a duck, let’s call it a mallard.</p>\n\n<p>Here’s a useful pattern for triggering our own not-servers with Fly Machines.</p>\n<h2 id='triggering-machines' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#triggering-machines' aria-label='Anchor'></a><span class='plain-code'>Triggering Machines</span></h2>\n<p>I want to make Machines do some work based on my own events. Fly.io can already <a href='https://fly.io/docs/apps/autostart-stop/' title=''>stop Machines when idle</a> based on HTTP traffic, so let’s concentrate on non-HTTP events.</p>\n\n<p>The process of running evented Machines involves:</p>\n\n<ol>\n<li>Listening for events\n</li><li>Spinning up Fly Machines to run our code (with the events as context)\n</li><li>Having event-aware code to run\n</li></ol>\n\n<p>To do this, I made a project and named it <a href='https://github.com/fly-apps/lambdo' title=''><strong class='font-semibold text-navy-950'>Lambdo</strong></a> because reasons.\nYou can consider this project “reference architecture” in the same way you call a toddler’s scribbling “art”.</p>\n\n<p>The goal is to run some of our code on a fresh not-server when an event is received. 
We want this done efficiently - a Machine should only exist long enough to process an event or 3.</p>\n\n<p>Lambdo does just that - it receives some events, and spins up Fly Machines with those events placed <em>inside</em> the VMs. Once the code finishes, the Machine is destroyed.</p>\n<div class='group relative min-w-0 bg-white shadow-md shadow-navy-500/10 rounded-xl mb-7 ring-1 ring-navy-300/40'><button type='button' class='bubble-wrap z-20 absolute right-2.5 top-2.5 text-transparent group-hover:text-navy-950 hocus:text-violet-600 bg-transparent group-hover:bg-white hocus:bg-violet-200/40 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none' data-wrap-target='#table-hzpri2l5' data-wrap-type='nowrap'><svg class='w-5 h-5 pointer-events-none' viewBox='0 0 20 20' fill='none' stroke='currentColor' stroke-width='1.5' stroke-linecap='round' stroke-linejoin='round'><g buffered-rendering='static'><path d='M11.912 10.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.314 2.314 0 00-2.315-2.31H4.959M15.187 14.5H4.959M8.802 10H4.959' /><path d='M13.081 8.466l-1.548 1.571 1.548 1.571' /></g></svg><span class='bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950'>Wrap text</span></button><div class='min-w-0 overflow-x-auto rounded-xl'><table class='table-stripe table-stretch table-pad text-sm whitespace-nowrap m-0' id='table-hzpri2l5'><thead class='text-navy-950 text-left'><tr>\n<th style=\"text-align: center\"><img alt=\"the files are inside the computer\" src=\"/blog/event-driven-machines/assets/files-are-inside-the-computer-cover.webp\" /></th>\n</tr>\n</thead><tbody><tr>\n<td style=\"text-align: center\">The files are <em>in</em> the computer!</td>\n</tr>\n</tbody></table></div></div><h2 id='listening-for-events' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 
transition-all' href='#listening-for-events' aria-label='Anchor'></a><span class='plain-code'>Listening for Events</span></h2>\n<p>For our purposes, an event is just a JSON object: <code>{\"any\": \"object\", \"will\": \"do\"}</code>.</p>\n\n<p>We want to turn events into compute, so we need some sort of event system. I decided to use a queue.</p>\n<h3 id='the-queue' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#the-queue' aria-label='Anchor'></a><span class='plain-code'>The Queue</span></h3>\n<p>The first thing I needed was a place to send events! I chose to use SQS, which let me continue to pretend servers don’t exist.</p>\n\n<p>It’s no surprise then that the first part of this project is <a href='https://github.com/fly-apps/lambdo/blob/main/internal/sqs/get_events.go' title=''>code that polls SQS</a>.</p>\n\n<p>When polling returns a non-zero number of messages, Lambdo collects the SQS messages’ JSON strings (and some metadata), resulting in an array of objects (a list of events).</p>\n\n<p>Then we send these events to some Machines.</p>\n<h2 id='spinning-up-machines' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#spinning-up-machines' aria-label='Anchor'></a><span class='plain-code'>Spinning Up Machines</span></h2>\n<p>Fly Machines are fast-booting micro-VMs, controlled by an <a href='https://fly.io/docs/machines/working-with-machines/' title=''>API</a>.</p>\n\n<p>A feature of that API is the ability to <a href='https://community.fly.io/t/machine-files/14453' title=''>create files</a> on a new Machine. 
This is how we’ll get our events into the Machine.</p>\n\n<p>When Lambdo creates a Machine, it places a file at <code>/tmp/events.json</code>. Our code just needs to read that file and parse the JSON.</p>\n<h3 id='running-our-code' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#running-our-code' aria-label='Anchor'></a><span class='plain-code'>Running Our Code</span></h3>\n<p>Part of the ergonomics of Serverless is (usually) being limited to running just a function. Fly.io doesn’t really care what you run, which is to our advantage. We can choose to write discrete functions per event, or we can bring our whole <a href='https://signalvnoise.com/svn3/the-majestic-monolith/' title=''>Majestic Monolith</a> to bear.</p>\n\n<p>How do we package up our code? The real answer is “however you want!”, but here are two ideas.</p>\n\n<p><strong class='font-semibold text-navy-950'>Use Your Existing Code Base</strong></p>\n\n<p>You can just use your existing code base. 
This is especially easy if you’re already deploying apps to Fly.io.</p>\n\n<p>All we’d need to do is add some additional code - a command perhaps (<code>rake</code>, <code>artisan</code>, whatever) - that sucks in that JSON, iterates over the events, and does some stuff.</p>\n<div class=\"highlight-wrapper group relative php\">\n <button \n type=\"button\"\n class=\"bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n data-wrap-target=\"#code-4juzgucl\"\n >\n <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\" stroke-linecap=\"round\" stroke-linejoin=\"round\"><g buffered-rendering=\"static\"><path d=\"M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959\" /><path d=\"M11.081 6.466L9.533 8.037l1.548 1.571\" /></g></svg>\n <span class=\"bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950\">\n Wrap text\n </span>\n </button>\n <button \n type=\"button\"\n class=\"bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n data-copy-target=\"sibling\"\n >\n <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\"><g buffered-rendering=\"static\"><path d=\"M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z\" /><path d=\"M10.576 10.577h2.109A1.825 
1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617\" /></g></svg>\n <span class=\"bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950\">\n Copy to clipboard\n </span>\n </button>\n <div class='highlight relative group'>\n <pre class='highlight '><code id=\"code-4juzgucl\"><span class=\"nv\">$events</span> <span class=\"o\">=</span> <span class=\"nb\">json_decode</span><span class=\"p\">(</span><span class=\"nb\">file_get_contents</span><span class=\"p\">(</span><span class=\"s2\">\"/tmp/events.json\"</span><span class=\"p\">));</span>\n\n<span class=\"k\">foreach</span> <span class=\"p\">(</span><span class=\"nv\">$events</span> <span class=\"k\">as</span> <span class=\"nv\">$event</span><span class=\"p\">)</span> <span class=\"p\">{</span>\n <span class=\"c1\">// do a thing</span>\n<span class=\"p\">}</span>\n</code></pre>\n </div>\n</div>\n<p>When we create an event, we’ll tell Lambdo how to run your code - more on that later.</p>\n\n<p><strong class='font-semibold text-navy-950'>Use Lambdo’s Base Images</strong></p>\n\n<p>This project also provides some “runtimes” (base images). This is a bit more “traditional serverless”, where you provide a function to run.</p>\n\n<p>Lambdo contains <a href='https://github.com/fly-apps/lambdo/tree/main/runtimes' title=''>two runtimes</a> right now - Node and PHP. 
There could be more, of course, but you know…lazy.</p>\n\n<p>The Node runtime <a href='https://github.com/fly-apps/lambdo/blob/main/runtimes/js/src/index.js' title=''>contains some code</a> that will read the JSON payload file (again, just an array of JSON events), and call a user-supplied JS function once per event.</p>\n\n<p>An <a href='https://github.com/fly-apps/lambdo/tree/main/runtimes/js/sample-project' title=''>example is here</a> - our code just needs to export a function that does stuff to the given event:</p>\n<div class=\"highlight-wrapper group relative javascript\">\n <button \n type=\"button\"\n class=\"bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n data-wrap-target=\"#code-d6ki7m4i\"\n >\n <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\" stroke-linecap=\"round\" stroke-linejoin=\"round\"><g buffered-rendering=\"static\"><path d=\"M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959\" /><path d=\"M11.081 6.466L9.533 8.037l1.548 1.571\" /></g></svg>\n <span class=\"bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950\">\n Wrap text\n </span>\n </button>\n <button \n type=\"button\"\n class=\"bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n data-copy-target=\"sibling\"\n >\n <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" 
stroke-width=\"1.35\"><g buffered-rendering=\"static\"><path d=\"M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z\" /><path d=\"M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617\" /></g></svg>\n <span class=\"bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950\">\n Copy to clipboard\n </span>\n </button>\n <div class='highlight relative group'>\n <pre class='highlight '><code id=\"code-d6ki7m4i\"><span class=\"c1\">// File /app/index.js</span>\n<span class=\"nx\">exports</span><span class=\"p\">.</span><span class=\"nx\">handler</span> <span class=\"o\">=</span> <span class=\"k\">async</span> <span class=\"kd\">function</span><span class=\"p\">(</span><span class=\"nx\">event</span><span class=\"p\">)</span> <span class=\"p\">{</span>\n <span class=\"nx\">console</span><span class=\"p\">.</span><span class=\"nx\">log</span><span class=\"p\">(</span><span class=\"dl\">\"</span><span class=\"s2\">Let's process an event! 
The event:</span><span class=\"dl\">\"</span><span class=\"p\">,</span> <span class=\"nx\">event</span><span class=\"p\">)</span>\n<span class=\"p\">}</span>\n</code></pre>\n </div>\n</div>\n<p>The <a href='https://github.com/fly-apps/lambdo/tree/main/runtimes/php' title=''>PHP runtime</a> is the same idea, a user-supplied handler looks like this:</p>\n<div class=\"highlight-wrapper group relative php\">\n <button \n type=\"button\"\n class=\"bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n data-wrap-target=\"#code-coch74a\"\n >\n <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\" stroke-linecap=\"round\" stroke-linejoin=\"round\"><g buffered-rendering=\"static\"><path d=\"M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959\" /><path d=\"M11.081 6.466L9.533 8.037l1.548 1.571\" /></g></svg>\n <span class=\"bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950\">\n Wrap text\n </span>\n </button>\n <button \n type=\"button\"\n class=\"bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n data-copy-target=\"sibling\"\n >\n <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\"><g buffered-rendering=\"static\"><path d=\"M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 
1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z\" /><path d=\"M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617\" /></g></svg>\n <span class=\"bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950\">\n Copy to clipboard\n </span>\n </button>\n <div class='highlight relative group'>\n <pre class='highlight '><code id=\"code-coch74a\"><span class=\"c1\">// File /app/index.php</span>\n<span class=\"k\">return</span> <span class=\"k\">function</span> <span class=\"p\">(</span><span class=\"kt\">array</span> <span class=\"nv\">$event</span><span class=\"p\">)</span> <span class=\"p\">{</span>\n <span class=\"c1\">// Do something with $event</span>\n<span class=\"p\">};</span>\n</code></pre>\n </div>\n</div>\n<p>Explore the <a href='https://github.com/fly-apps/lambdo/tree/main/runtimes' title=''>runtimes</a> directory of the project to see how that’s put together.</p>\n<h2 id='sending-an-event' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#sending-an-event' aria-label='Anchor'></a><span class='plain-code'>Sending an Event</span></h2>\n<p>Since our events are sent via an SQS queue, it would be helpful to see an example SQS message. 
Remember how I mentioned the SQS message has some metadata?</p>\n\n<p>Here’s an example, with said metadata:</p>\n<div class=\"highlight-wrapper group relative bash\">\n <button \n type=\"button\"\n class=\"bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n data-wrap-target=\"#code-uwc3p0p\"\n >\n <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\" stroke-linecap=\"round\" stroke-linejoin=\"round\"><g buffered-rendering=\"static\"><path d=\"M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959\" /><path d=\"M11.081 6.466L9.533 8.037l1.548 1.571\" /></g></svg>\n <span class=\"bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950\">\n Wrap text\n </span>\n </button>\n <button \n type=\"button\"\n class=\"bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n data-copy-target=\"sibling\"\n >\n <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\"><g buffered-rendering=\"static\"><path d=\"M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z\" /><path d=\"M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617\" /></g></svg>\n <span class=\"bubble-sm bubble-tl 
[--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950\">\n Copy to clipboard\n </span>\n </button>\n <div class='highlight relative group'>\n <pre class='highlight '><code id=\"code-uwc3p0p\">aws sqs send-message <span class=\"se\">\\</span>\n <span class=\"nt\">--queue-url</span><span class=\"o\">=</span>https://sqs.<region>.amazonaws.com/<account>/<queue> <span class=\"se\">\\</span>\n <span class=\"nt\">--message-body</span><span class=\"o\">=</span><span class=\"s1\">'{\"foo\": \"bar\"}'</span> <span class=\"se\">\\</span>\n <span class=\"nt\">--message-attributes</span><span class=\"o\">=</span><span class=\"s1\">'{\n\"size\":{\"DataType\":\"String\",\"StringValue\":\"performance-2x\"}, \n\"image\":{\"DataType\":\"String\",\"StringValue\":\"fideloper/lambdo-php-sample:latest\"}\n}'</span>\n</code></pre>\n </div>\n</div>\n<p>The Body field of the SQS message is assumed to be a JSON string (it’s the event itself, and its contents are arbitrary - whatever makes sense for you).</p>\n\n<p>The message attributes contain the metadata - up to 3 important details:</p>\n\n<ol>\n<li><code>image</code>: The image to run (it might be a Docker Hub image, or something you pushed to registry.fly.io). This is <strong class='font-semibold text-navy-950'>required</strong>.\n</li><li><code>size</code>: The CPU size and type to use† - defaults to <code>performance-2x</code>\n</li><li><code>command</code>: The command to run, which is the Docker <code>CMD</code> equivalent - defaults to whatever <code>CMD</code> is set to in the <code>Dockerfile</code> used to create the Machine image.††\n</li></ol>\n\n<p>†You can get valid values for the <code>size</code> option by running <code>fly platform vm-sizes</code>.</p>\n\n<p>††It takes an array form, e.g. 
<code>[\"php\", \"artisan\", \"foo\"]</code>; you may need to escape the double quotes if you’re sending messages to SQS from a terminal.</p>\n<h2 id='we-did-a-lambda' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#we-did-a-lambda' aria-label='Anchor'></a><span class='plain-code'>We did a Lambda?</span></h2>\n<p>Fly.io isn’t serverless, but it has all these primitives that add up to serverless. You have events; Fly.io has fast-booting VMs. They just make sense together!</p>\n\n<p>What we did here is use <a href='https://github.com/fly-apps/lambdo' title=''><strong class='font-semibold text-navy-950'>Lambdo</strong> to respond to events by spinning up a Machine</a>. Our code can process those events any way we want.</p>\n\n<p>What I like about this approach is how flexible it can be. We can choose the base image to use and the server type (even using GPU-enabled Machines) <em>per event</em>.\nSince we have full control over the Machine VMs responding to the events, we can do whatever we want inside of them. Pretty neat!</p>", "image": { "url": "https://fly.io/blog/event-driven-machines/assets/lambdo-thumb.webp", "title": null }, "media": [], "authors": [ { "name": "Fly", "email": null, "url": null } ], "categories": [] }, { "id": "https://fly.io/blog/delegate-tasks-to-fly-machines/", "title": "Delegating tasks to Fly Machines", "description": null, "url": "https://fly.io/blog/delegate-tasks-to-fly-machines/", "published": "2024-02-01T00:00:00.000Z", "updated": "2024-04-12T18:23:39.000Z", "content": "<div class=\"lead\"><p>We’re Fly.io. We run apps for our users on hardware we host around the world. Leveraging Fly.io Machines and Fly.io’s private network can make delegating expensive tasks a breeze. 
It’s easy to <a href=\"/docs/speedrun/\" title=\"\">get started</a>!</p>\n</div>\n<p>There are many ways to delegate work in web applications, from using background workers to serverless architecture. In this article, we explore a new machine pattern that takes advantage of Fly Machines and distinct process groups to make quick work of resource-intensive tasks.</p>\n<h2 id='the-problem' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#the-problem' aria-label='Anchor'></a><span class='plain-code'>The Problem</span></h2>\n<p>Let’s say you’re building a web application that has a few tasks that demand a hefty amount of memory or CPU juice. Resizing images, for example, can require a shocking amount of memory, but you might not need that much memory <em>all</em> of the time, for handling most of your web requests. Why pay for all that horsepower when you don’t need it most of the time?</p>\n\n<p>What if there’s a different way to delegate these resource-intensive tasks?</p>\n<h2 id='the-solution' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#the-solution' aria-label='Anchor'></a><span class='plain-code'>The Solution</span></h2>\n<p>What if you could simply delegate these types of tasks to a more powerful machine <em>only</em> when necessary? Let’s build an example of this method in a sample app. 
We’ll be using Next.js today, but this pattern is framework (and language) agnostic.</p>\n\n<p>Here’s how it will work:</p>\n\n<ul>\n<li>A request hits an endpoint that performs a resource-intensive task\n</li><li>The request is passed on to a copy of your app that’s running on a beefier machine\n</li><li>The beefy machine performs the intensive work and then hands the result back to the user via the “weaker” machine.\n</li></ul>\n\n<p><img alt=\"(A 3 panel comic of two characters, one small and one big and strong, both with computer screens for heads. Panel 1: Little guy hands the big guy a jar of pickles. Panel 2: Big guy opens the pickle jar. Panel 3: Big guy hands back the opened jar to the little guy, who is pleased; Illustration by Annie Sexton)\" src=\"/blog/delegate-tasks-to-fly-machines/assets/./3-panel-comic-delegate-tasks-to-fly-machines.webp\" /></p>\n\n<p>To demonstrate this task-delegation pattern, we’re going to start with a single-page application that looks like this:</p>\n\n<p><img alt=\"(Screenshot of the demo app; it's a single-page app with the header and description \"Open Pickle Jar: You've got a jar of pickles (a zip file of some high-def pickle photos) that you would like to open (resize and display below)\". Under the description there are two inputs, one for width and one for height, and a button that says \"Open pickle jar\")\" src=\"/blog/delegate-tasks-to-fly-machines/assets/./pickle-jar-screenshot.webp\" /></p>\n\n<p>Our “Open Pickle Jar” app is quite simple: you provide the width and height and it goes off and resizes some high-resolution photos to those dimensions (exciting!).</p>\n\n<p>If you’d like to follow along, you can clone the <code>start-here</code> branch of this repository: <a href='https://github.com/fly-apps/open-pickle-jar' title=''>https://github.com/fly-apps/open-pickle-jar</a>. The final changes are visible on the <code>main</code> branch. 
This app uses S3 for image storage, so you’ll need to create a bucket called <code>open-pickle-jar</code> and provide <code>AWS_REGION</code>, <code>AWS_ACCESS_KEY_ID</code>, and <code>AWS_SECRET_ACCESS_KEY</code> as environment variables.</p>\n\n<p>This task is really just a stand-in for any HTTP request that kicks off a resource-intensive task. Get the request from the user, delegate it to a more powerful machine, and then return the result to the user. It’s what happens when you can’t open a pickle jar, and you ask for someone to help.</p>\n\n<p>Before we start, let’s define some terms and what they mean on Fly.io:</p>\n\n<ul>\n<li><strong class='font-semibold text-navy-950'>Machines:</strong> Extremely fast-booting VMs. They can exist in different regions and even run different processes.\n</li><li><strong class='font-semibold text-navy-950'>App:</strong> An abstraction for a group of Machines running your code on Fly.io, along with the configuration, provisioned resources, and data we need to keep track of to run and route to your Machines.\n</li><li><strong class='font-semibold text-navy-950'>Process group:</strong> A collection of Machines running a specific process. 
Many apps only run a single process (typically a public-facing HTTP server), but you can define any number of them.\n</li><li><strong class='font-semibold text-navy-950'>fly.toml:</strong> A configuration file for deploying apps on Fly.io where you can set things like Machine specs, process groups, regions, and more.\n</li></ul>\n\n<hr>\n\n<p><strong class='font-semibold text-navy-950'>Setup Overview</strong></p>\n\n<p>Here’s what we’ll need for our application:</p>\n\n<ol>\n<li>A <strong class='font-semibold text-navy-950'>route</strong> that performs our resource-intensive task\n</li><li>A <strong class='font-semibold text-navy-950'>wrapper function</strong> that either:\n\n<ol>\n<li>Runs our resource-intensive task OR\n</li><li>Forwards the request to our more powerful Machine\n</li></ol>\n</li><li><strong class='font-semibold text-navy-950'>Two process groups</strong> running the <em>same process</em> but with differing Machine specs:\n\n<ol>\n<li>One for accepting HTTP traffic and handling most requests (let’s call it <code>web</code>)\n</li><li>One internal-only group for doing the heavy lifting (let’s call it <code>worker</code>)\n</li></ol>\n</li></ol>\n\n<p>In short, this is what our architecture will look like, a standard web and worker duo.</p>\n\n<p><img alt=\"(A simple graphic illustrating two servers; a small box containing \"npm run start\" and a larger box containing the same thing. 
The small is labeled \"web\" and the larger box is labeled \"worker\".)\" src=\"/blog/delegate-tasks-to-fly-machines/assets/./web-worker.webp\" /></p>\n<h3 id='creating-our-route' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#creating-our-route' aria-label='Anchor'></a><span class='plain-code'>Creating our route</span></h3>\n<p>Next.js has two distinct routing patterns: Pages and App router. We’ll use the App router in our example since it’s the preferred method moving forward.</p>\n\n<p>Under your <code>/app</code> directory, create a new folder called <code>/open-pickle-jar</code> containing a <code>route.ts</code> .</p>\n\n<p>(We’re using TypeScript here, but feel free to use normal JavaScript if you prefer!)</p>\n<div class=\"highlight-wrapper group relative \">\n <button \n type=\"button\"\n class=\"bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n data-wrap-target=\"#code-lg2jvd1h\"\n >\n <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\" stroke-linecap=\"round\" stroke-linejoin=\"round\"><g buffered-rendering=\"static\"><path d=\"M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959\" /><path d=\"M11.081 6.466L9.533 8.037l1.548 1.571\" /></g></svg>\n <span class=\"bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950\">\n Wrap text\n </span>\n </button>\n <button \n type=\"button\"\n class=\"bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent 
group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n data-copy-target=\"sibling\"\n >\n <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\"><g buffered-rendering=\"static\"><path d=\"M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z\" /><path d=\"M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617\" /></g></svg>\n <span class=\"bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950\">\n Copy to clipboard\n </span>\n </button>\n <div class='highlight relative group'>\n <pre class='highlight '><code id=\"code-lg2jvd1h\">...\n/app\n /open-pickle-jar\n route.ts\n...\n</code></pre>\n </div>\n</div>\n<p>Inside <code>route.ts</code> we’ll flesh out our endpoint:</p>\n<div class=\"highlight-wrapper group relative typescript\">\n <button \n type=\"button\"\n class=\"bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n data-wrap-target=\"#code-x0guz9t5\"\n >\n <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\" stroke-linecap=\"round\" stroke-linejoin=\"round\"><g buffered-rendering=\"static\"><path d=\"M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959\" /><path d=\"M11.081 6.466L9.533 8.037l1.548 
1.571\" /></g></svg>\n <span class=\"bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950\">\n Wrap text\n </span>\n </button>\n <button \n type=\"button\"\n class=\"bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n data-copy-target=\"sibling\"\n >\n <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\"><g buffered-rendering=\"static\"><path d=\"M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z\" /><path d=\"M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617\" /></g></svg>\n <span class=\"bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950\">\n Copy to clipboard\n </span>\n </button>\n <div class='highlight relative group'>\n <pre class='highlight '><code id=\"code-x0guz9t5\"><span class=\"c1\">// /app/open-pickle-jar/route.ts</span>\n\n<span class=\"k\">import</span> <span class=\"nx\">delegateToWorker</span> <span class=\"k\">from</span> <span class=\"dl\">\"</span><span class=\"s2\">@/utils/delegateToWorker</span><span class=\"dl\">\"</span><span class=\"p\">;</span>\n<span class=\"k\">import</span> <span class=\"p\">{</span> <span class=\"nx\">NextRequest</span><span class=\"p\">,</span> <span class=\"nx\">NextResponse</span> <span class=\"p\">}</span> <span class=\"k\">from</span> <span class=\"dl\">\"</span><span class=\"s2\">next/server</span><span class=\"dl\">\"</span><span class=\"p\">;</span>\n<span class=\"k\">import</span> <span class=\"p\">{</span> <span class=\"nx\">openPickleJar</span> <span 
class=\"p\">}</span> <span class=\"k\">from</span> <span class=\"dl\">\"</span><span class=\"s2\">../openPickleJar</span><span class=\"dl\">\"</span><span class=\"p\">;</span>\n\n<span class=\"k\">export</span> <span class=\"k\">async</span> <span class=\"kd\">function</span> <span class=\"nx\">POST</span><span class=\"p\">(</span><span class=\"nx\">request</span><span class=\"p\">:</span> <span class=\"nx\">NextRequest</span><span class=\"p\">)</span> <span class=\"p\">{</span>\n <span class=\"kd\">const</span> <span class=\"p\">{</span> <span class=\"nx\">width</span><span class=\"p\">,</span> <span class=\"nx\">height</span> <span class=\"p\">}</span> <span class=\"o\">=</span> <span class=\"k\">await</span> <span class=\"nx\">request</span><span class=\"p\">.</span><span class=\"nx\">json</span><span class=\"p\">();</span>\n <span class=\"kd\">const</span> <span class=\"nx\">path</span> <span class=\"o\">=</span> <span class=\"nx\">request</span><span class=\"p\">.</span><span class=\"nx\">nextUrl</span><span class=\"p\">.</span><span class=\"nx\">pathname</span><span class=\"p\">;</span>\n <span class=\"kd\">const</span> <span class=\"nx\">body</span> <span class=\"o\">=</span> <span class=\"k\">await</span> <span class=\"nx\">delegateToWorker</span><span class=\"p\">(</span><span class=\"nx\">path</span><span class=\"p\">,</span> <span class=\"nx\">openPickleJar</span><span class=\"p\">,</span> <span class=\"p\">{</span> <span class=\"nx\">width</span><span class=\"p\">,</span> <span class=\"nx\">height</span> <span class=\"p\">});</span>\n <span class=\"k\">return</span> <span class=\"nx\">NextResponse</span><span class=\"p\">.</span><span class=\"nx\">json</span><span class=\"p\">(</span><span class=\"nx\">body</span><span class=\"p\">);</span>\n<span class=\"p\">}</span>\n</code></pre>\n </div>\n</div>\n<p>The function <code>openPickleJar</code> that we’re importing contains our resource-intensive task, which in this case is extracting images from a 
<code>.zip</code> file, resizing them all to the new dimensions, and returning the new image URLs.</p>\n\n<p>The <code>POST</code> function is how you define routes for specific HTTP methods in Next.js, and ours calls a function <code>delegateToWorker</code> that accepts the path of the current endpoint (<code>/open-pickle-jar</code>), our resource-intensive function, and the same request parameters. This function doesn’t yet exist, so let’s build that next!</p>\n<h3 id='creating-our-wrapper-function' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#creating-our-wrapper-function' aria-label='Anchor'></a><span class='plain-code'>Creating our wrapper function</span></h3>\n<p>Now that we’ve set up our endpoint, let’s flesh out the wrapper function that delegates our request to a more powerful machine.</p>\n\n<p>We haven’t defined our process groups just yet, but if you recall, the plan is to have two:</p>\n\n<ol>\n<li><code>web</code> - Our standard web server\n</li><li><code>worker</code> - For opening pickle jars (i.e. doing resource-intensive work). 
It’s essentially a duplicate of <code>web</code>, but running on beefier Machines.\n</li></ol>\n\n<p>Here’s what we want this wrapper function to do:</p>\n\n<ul>\n<li>If the current machine is a <code>worker</code>, execute the resource-intensive task\n</li><li>If the current machine is NOT a <code>worker</code>, make a new request to the identical endpoint on a <code>worker</code> Machine\n</li></ul>\n\n<p>Inside your <code>/utils</code> directory, create a file called <code>delegateToWorker.ts</code> with the following content:</p>\n<div class=\"highlight-wrapper group relative typescript\">\n <button \n type=\"button\"\n class=\"bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n data-wrap-target=\"#code-c07fgdhq\"\n >\n <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\" stroke-linecap=\"round\" stroke-linejoin=\"round\"><g buffered-rendering=\"static\"><path d=\"M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959\" /><path d=\"M11.081 6.466L9.533 8.037l1.548 1.571\" /></g></svg>\n <span class=\"bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950\">\n Wrap text\n </span>\n </button>\n <button \n type=\"button\"\n class=\"bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n data-copy-target=\"sibling\"\n >\n <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" 
stroke=\"currentColor\" stroke-width=\"1.35\"><g buffered-rendering=\"static\"><path d=\"M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z\" /><path d=\"M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617\" /></g></svg>\n <span class=\"bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950\">\n Copy to clipboard\n </span>\n </button>\n <div class='highlight relative group'>\n <pre class='highlight '><code id=\"code-c07fgdhq\"><span class=\"c1\">// /utils/delegateToWorker.ts</span>\n\n<span class=\"k\">export</span> <span class=\"k\">default</span> <span class=\"k\">async</span> <span class=\"kd\">function</span> <span class=\"nx\">delegateToWorker</span><span class=\"p\">(</span><span class=\"nx\">path</span><span class=\"p\">:</span> <span class=\"kr\">string</span><span class=\"p\">,</span> <span class=\"nx\">func</span><span class=\"p\">:</span> <span class=\"p\">(...</span><span class=\"nx\">args</span><span class=\"p\">:</span> <span class=\"kr\">any</span><span class=\"p\">[])</span> <span class=\"o\">=></span> <span class=\"nb\">Promise</span><span class=\"o\"><</span><span class=\"kr\">any</span><span class=\"o\">></span><span class=\"p\">,</span> <span class=\"nx\">args</span><span class=\"p\">:</span> <span class=\"nx\">object</span><span class=\"p\">):</span> <span class=\"nb\">Promise</span><span class=\"o\"><</span><span class=\"kr\">any</span><span class=\"o\">></span> <span class=\"p\">{</span>\n\n <span class=\"k\">if</span> <span class=\"p\">(</span><span class=\"nx\">process</span><span class=\"p\">.</span><span class=\"nx\">env</span><span class=\"p\">.</span><span class=\"nx\">FLY_PROCESS_GROUP</span> <span class=\"o\">===</span> <span class=\"dl\">'</span><span class=\"s1\">worker</span><span class=\"dl\">'</span><span class=\"p\">)</span> 
<span class=\"p\">{</span>\n <span class=\"nx\">console</span><span class=\"p\">.</span><span class=\"nx\">log</span><span class=\"p\">(</span><span class=\"dl\">'</span><span class=\"s1\">running on the worker...</span><span class=\"dl\">'</span><span class=\"p\">);</span>\n <span class=\"k\">return</span> <span class=\"nx\">func</span><span class=\"p\">({...</span><span class=\"nx\">args</span><span class=\"p\">});</span>\n\n <span class=\"p\">}</span> <span class=\"k\">else</span> <span class=\"p\">{</span>\n <span class=\"nx\">console</span><span class=\"p\">.</span><span class=\"nx\">log</span><span class=\"p\">(</span><span class=\"dl\">'</span><span class=\"s1\">sending new request to worker...</span><span class=\"dl\">'</span><span class=\"p\">);</span>\n <span class=\"kd\">const</span> <span class=\"nx\">workerHost</span> <span class=\"o\">=</span> <span class=\"nx\">process</span><span class=\"p\">.</span><span class=\"nx\">env</span><span class=\"p\">.</span><span class=\"nx\">NODE_ENV</span> <span class=\"o\">===</span> <span class=\"dl\">'</span><span class=\"s1\">development</span><span class=\"dl\">'</span> <span class=\"p\">?</span> <span class=\"dl\">'</span><span class=\"s1\">localhost:3001</span><span class=\"dl\">'</span> <span class=\"p\">:</span> <span class=\"s2\">`worker.process.</span><span class=\"p\">${</span><span class=\"nx\">process</span><span class=\"p\">.</span><span class=\"nx\">env</span><span class=\"p\">.</span><span class=\"nx\">FLY_APP_NAME</span><span class=\"p\">}</span><span class=\"s2\">.internal:3000`</span><span class=\"p\">;</span>\n\n <span class=\"kd\">const</span> <span class=\"nx\">response</span> <span class=\"o\">=</span> <span class=\"k\">await</span> <span class=\"nx\">fetch</span><span class=\"p\">(</span><span class=\"s2\">`http://</span><span class=\"p\">${</span><span class=\"nx\">workerHost</span><span class=\"p\">}${</span><span class=\"nx\">path</span><span class=\"p\">}</span><span 
class=\"s2\">`</span><span class=\"p\">,</span> <span class=\"p\">{</span>\n <span class=\"na\">method</span><span class=\"p\">:</span> <span class=\"dl\">'</span><span class=\"s1\">POST</span><span class=\"dl\">'</span><span class=\"p\">,</span>\n <span class=\"na\">headers</span><span class=\"p\">:</span> <span class=\"p\">{</span>\n <span class=\"dl\">'</span><span class=\"s1\">Content-Type</span><span class=\"dl\">'</span><span class=\"p\">:</span> <span class=\"dl\">'</span><span class=\"s1\">application/json</span><span class=\"dl\">'</span>\n <span class=\"p\">},</span>\n <span class=\"na\">body</span><span class=\"p\">:</span> <span class=\"nx\">JSON</span><span class=\"p\">.</span><span class=\"nx\">stringify</span><span class=\"p\">({...</span><span class=\"nx\">args</span> <span class=\"p\">})</span>\n <span class=\"p\">});</span>\n <span class=\"k\">return</span> <span class=\"nx\">response</span><span class=\"p\">.</span><span class=\"nx\">json</span><span class=\"p\">();</span>\n <span class=\"p\">}</span>\n\n<span class=\"p\">}</span>\n</code></pre>\n </div>\n</div>\n<p>In our <code>else</code> section, you’ll notice that while developing locally (aka, when <code>NODE_ENV</code> is <code>development</code>) we define the hostname of our <code>worker</code> process to be <code>localhost:3001</code>. 
Typically Next.js apps run on port <code>3000</code>, so while testing our app locally, we can have two instances of our process running in different terminal shells:</p>\n\n<ul>\n<li><code>npm run dev</code> - This will run on <code>localhost:3000</code> and will act as our local <code>web</code> process\n</li><li><code>FLY_PROCESS_GROUP=worker npm run dev</code> - This will run on <code>localhost:3001</code> and will act as our <code>worker</code> process (Next.js should auto-increment the port if the original <code>3000</code> is already in use)\n</li></ul>\n\n<p>Also, if you’re wondering about the <code>FLY_PROCESS_GROUP</code> and <code>FLY_APP_NAME</code> constants, these are <a href='https://fly.io/docs/reference/runtime-environment/' title=''>Fly.io-specific runtime environment variables</a> available on all apps.</p>\n<h3 id='accessing-our-worker-machines-internal' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#accessing-our-worker-machines-internal' aria-label='Anchor'></a><span class='plain-code'>Accessing our <code>worker</code> Machines (<code>.internal</code>)</span></h3>\n<p>Now, when this code is running in production (aka <code>NODE_ENV</code> is NOT <code>development</code>) you’ll see that we’re using a unique hostname to access our <code>worker</code> Machine.</p>\n\n<p>Apps belonging to the same organization on Fly.io are provided a number of <a href='https://fly.io/docs/networking/private-networking/#fly-io-internal-addresses' title=''>internal addresses</a>. These <code>.internal</code> addresses let you point to different Apps and Machines in your private network. 
For example:</p>\n\n<ul>\n<li><code><region>.<app name>.internal</code> – To reach app instances in a particular region, like <code>gru.my-cool-app.internal</code>\n</li><li><code><app instance ID>.<app name>.internal</code> - To reach a <em>specific</em> app instance.\n</li><li><code><process group>.process.<app name>.internal</code> - To target app instances belonging to a specific process group. <strong class='font-semibold text-navy-950'>This is what we’re using in our app.</strong>\n</li></ul>\n\n<p>Since our <code>worker</code> process group is running the same process as our <code>web</code> process (in our case, <code>npm run start</code>), we’ll also need to make sure we use the same internal port (<code>3000</code>).</p>\n<h3 id='defining-our-process-groups-and-machines' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#defining-our-process-groups-and-machines' aria-label='Anchor'></a><span class='plain-code'>Defining our process groups and Machines</span></h3>\n<p>The last thing to do will be to define our two process groups and their respective Machine specs. We’ll do this by editing our <code>fly.toml</code> configuration.</p>\n\n<p>If you don’t have this file, go ahead and create a blank one and use the content below, but replace <code>app = open-pickle-jar</code> with your app’s name, as well as your preferred <code>primary_region</code>. If you don’t know what region you’d like to deploy to, <a href='https://fly.io/docs/reference/regions/' title=''>here’s the list of them</a>.</p>\n\n<p><strong class='font-semibold text-navy-950'>Before you deploy:</strong> Note that deploying this example app will spin up <strong class='font-semibold text-navy-950'>billable</strong> machines. 
Please feel free to alter the Machine (<code>[[vm]]</code>) specs listed here to ones that suit your budget or app’s needs.</p>\n<div class=\"highlight-wrapper group relative \">\n <button \n type=\"button\"\n class=\"bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n data-wrap-target=\"#code-ffgx1pjb\"\n >\n <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\" stroke-linecap=\"round\" stroke-linejoin=\"round\"><g buffered-rendering=\"static\"><path d=\"M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959\" /><path d=\"M11.081 6.466L9.533 8.037l1.548 1.571\" /></g></svg>\n <span class=\"bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950\">\n Wrap text\n </span>\n </button>\n <button \n type=\"button\"\n class=\"bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n data-copy-target=\"sibling\"\n >\n <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\"><g buffered-rendering=\"static\"><path d=\"M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z\" /><path d=\"M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617\" /></g></svg>\n <span class=\"bubble-sm bubble-tl 
[--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950\">\n Copy to clipboard\n </span>\n </button>\n <div class='highlight relative group'>\n <pre class='highlight '><code id=\"code-ffgx1pjb\">app = \"open-pickle-jar\"\nprimary_region = \"sea\"\n\n[build]\n\n[processes]\n web = \"npm run start\"\n worker = \"npm run start\"\n\n[http_service]\n internal_port = 3000\n force_https = true\n auto_stop_machines = true\n auto_start_machines = true\n min_machines_running = 1\n processes = [\"web\"]\n\n[[vm]]\n cpu_kind = \"shared\"\n cpus = 1\n memory_mb = 1024\n processes = [\"web\"]\n\n[[vm]]\n size = \"performance-4x\"\n processes = [\"worker\"]\n</code></pre>\n </div>\n</div>\n<p>And that’s it! With our <code>fly.toml</code> finished, we’re ready to deploy our app!</p>\n\n<p><img src=\"https://slabstatic.com/prod/uploads/p1b436gf/posts/images/tH4GaGLVaDkh3RhIwCpiDRX3.png\" /></p>\n<h2 id='discussion' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#discussion' aria-label='Anchor'></a><span class='plain-code'>Discussion</span></h2>\n<p>Today we built a machine pattern on top of Fly.io. This pattern allows us to have a lighter request server that can delegate certain tasks to a stronger server, meaning that we can have one Machine do all the heavy lifting that could block everything else while the other handles all the simple tasks for users. 
That said, this is a fairly naïve implementation, and there’s room to improve it:</p>\n<h3 id='using-a-queue-for-better-resiliency' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#using-a-queue-for-better-resiliency' aria-label='Anchor'></a><span class='plain-code'>Using a queue for better resiliency</span></h3>\n<p>In its current state, our code isn’t very resilient to failed requests. For this reason, you may want to consider keeping track of jobs in a queue with Redis (similar to Sidekiq in Ruby-land). When you have work you want to do, put it in the queue. Your queue worker would then write the result somewhere (e.g., in Redis) where the application can fetch it once it’s ready.</p>\n<h3 id='starting-stopping-worker-machines' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#starting-stopping-worker-machines' aria-label='Anchor'></a><span class='plain-code'>Starting/stopping worker Machines</span></h3>\n<p>A benefit of this pattern is that you can limit how many “beefy” Machines you need to have available at any given time. Our demo app doesn’t dictate how many <code>worker</code> Machines run at once, but by adding timeouts you could elect to start and stop them as needed.</p>\n\n<p>Now, you may think that constantly starting and stopping Machines might incur higher response times, but note that we are NOT talking about creating/destroying Machines. Starting and stopping Machines only takes as long as it takes to start your web server (i.e. <code>npm run start</code>). 
The best part is that <strong class='font-semibold text-navy-950'>Fly.io does not charge for the CPU and RAM usage of stopped Machines.</strong> <a href='https://community.fly.io/t/we-are-going-to-start-collecting-charges-for-stopped-machines-rootfs-starting-april-25th/17825' title=''>We will charge for storage of their root filesystems on disk, starting April 25th, 2024</a>. Stopped Machines will still be much cheaper than running ones.</p>\n<h3 id='what-about-serverless-functions' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#what-about-serverless-functions' aria-label='Anchor'></a><span class='plain-code'>What about serverless functions?</span></h3>\n<p>This “delegate to a beefy machine” pattern is similar to serverless functions with platforms like AWS Lambda. The main difference is that serverless functions usually require you to segment your application into a bunch of small pieces, whereas the method discussed today just uses the app framework that you deploy to production. Each pattern has its own benefits and downsides.</p>\n<h2 id='conclusion' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#conclusion' aria-label='Anchor'></a><span class='plain-code'>Conclusion</span></h2>\n<p>The pattern outlined here is one more tool in your arsenal for scaling applications. By utilizing Fly.io’s private network and <code>.internal</code> domains, it’s quick and easy to pass work between different processes that run our app. 
If you’d like to learn about more methods for scaling tasks in your applications, check out <a href='https://fly.io/blog/rethinking-serverless-with-flame/' title=''>Rethinking Serverless with FLAME</a> by Chris McCord and <a href='https://fly.io/blog/print-on-demand/' title=''>Print on Demand</a> by Sam Ruby.</p>\n<figure class=\"post-cta\">\n <figcaption>\n <h1>Get more done on Fly.io</h1>\n <p>Fly.io has fast booting machines at the ready for your dynamic workloads. It’s easy to get started. You can be off and running in minutes.</p>\n <a class=\"btn btn-lg\" href=\"https://fly.io/docs/speedrun/\">\n Deploy something today! <span class='opacity:50'>→</span>\n </a>\n </figcaption>\n <div class=\"image-container\">\n <img src=\"/static/images/cta-rabbit.webp\" srcset=\"/static/images/[email protected] 2x\" alt=\"\">\n </div>\n</figure>", "image": { "url": "https://fly.io/blog/delegate-tasks-to-fly-machines/assets/delegate-tasks-to-fly-machines-thumb.webp", "title": null }, "media": [], "authors": [ { "name": "Fly", "email": null, "url": null } ], "categories": [] }, { "id": "https://fly.io/blog/macaroons-escalated-quickly/", "title": "Macaroons Escalated Quickly", "description": null, "url": "https://fly.io/blog/macaroons-escalated-quickly/", "published": "2024-01-31T00:00:00.000Z", "updated": "2024-04-12T18:23:39.000Z", "content": "<div class=\"lead\"><p>We’re Fly.io and we transmute containers into VMs, running them on our hardware around the world. We built a new security token system, and can I tell you the good news about our lord and savior the Macaroon?</p>\n</div><h2 id='1' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#1' aria-label='Anchor'></a><span class='plain-code'>1</span></h2>\n<p>Let’s implement an API token together. 
It’s a design called “Macaroons”, but don’t get hung up on that yet.</p>\n\n<p>First some <button toggle=\"#includes\">throat-clearing</button>. Then:</p>\n<div id=\"includes\" toggle-content=\"\" aria-label=\"show very boring code\"><div class=\"highlight-wrapper group relative python\">\n <button type=\"button\" class=\"bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\" data-wrap-target=\"#code-1c9mit0n\">\n <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\" stroke-linecap=\"round\" stroke-linejoin=\"round\"><g buffered-rendering=\"static\"><path d=\"M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959\"></path><path d=\"M11.081 6.466L9.533 8.037l1.548 1.571\"></path></g></svg>\n <span class=\"bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950\">\n Wrap text\n </span>\n </button>\n <button type=\"button\" class=\"bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\" data-copy-target=\"sibling\">\n <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\"><g buffered-rendering=\"static\"><path d=\"M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z\"></path><path d=\"M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 
0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617\"></path></g></svg>\n <span class=\"bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950\">\n Copy to clipboard\n </span>\n </button>\n <div class=\"highlight relative group\">\n <pre class=\"highlight \"><code id=\"code-1c9mit0n\"><span class=\"kn\">import</span> <span class=\"nn\">sys</span>\n<span class=\"kn\">import</span> <span class=\"nn\">os</span>\n<span class=\"kn\">import</span> <span class=\"nn\">json</span>\n<span class=\"kn\">import</span> <span class=\"nn\">hmac</span> <span class=\"k\">as</span> <span class=\"n\">hm</span>\n<span class=\"kn\">from</span> <span class=\"nn\">base64</span> <span class=\"kn\">import</span> <span class=\"n\">b64encode</span><span class=\"p\">,</span> <span class=\"n\">b64decode</span>\n<span class=\"kn\">from</span> <span class=\"nn\">hashlib</span> <span class=\"kn\">import</span> <span class=\"n\">sha256</span>\n\n<span class=\"k\">def</span> <span class=\"nf\">hmac</span><span class=\"p\">(</span><span class=\"n\">k</span><span class=\"p\">,</span> <span class=\"n\">v</span><span class=\"p\">):</span> <span class=\"k\">return</span> <span class=\"n\">hm</span><span class=\"p\">.</span><span class=\"n\">new</span><span class=\"p\">(</span><span class=\"n\">k</span><span class=\"p\">,</span> <span class=\"n\">v</span><span class=\"p\">,</span> <span class=\"n\">sha256</span><span class=\"p\">).</span><span class=\"n\">digest</span><span class=\"p\">()</span>\n<span class=\"k\">def</span> <span class=\"nf\">enc</span><span class=\"p\">(</span><span class=\"n\">x</span><span class=\"p\">):</span> <span class=\"k\">return</span> <span class=\"n\">b64encode</span><span class=\"p\">(</span><span class=\"n\">x</span><span class=\"p\">)</span>\n<span class=\"k\">def</span> <span class=\"nf\">dec</span><span class=\"p\">(</span><span class=\"n\">x</span><span class=\"p\">):</span> <span class=\"k\">return</span> <span 
class=\"n\">b64decode</span><span class=\"p\">(</span><span class=\"n\">x</span><span class=\"p\">)</span>\n</code></pre>\n </div>\n</div></div><div class=\"highlight-wrapper group relative python\">\n <button \n type=\"button\"\n class=\"bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n data-wrap-target=\"#code-7t25lxr4\"\n >\n <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\" stroke-linecap=\"round\" stroke-linejoin=\"round\"><g buffered-rendering=\"static\"><path d=\"M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959\" /><path d=\"M11.081 6.466L9.533 8.037l1.548 1.571\" /></g></svg>\n <span class=\"bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950\">\n Wrap text\n </span>\n </button>\n <button \n type=\"button\"\n class=\"bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n data-copy-target=\"sibling\"\n >\n <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\"><g buffered-rendering=\"static\"><path d=\"M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z\" /><path d=\"M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617\" /></g></svg>\n <span 
class=\"bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950\">\n Copy to clipboard\n </span>\n </button>\n <div class='highlight relative group'>\n <pre class='highlight '><code id=\"code-7t25lxr4\"><span class=\"k\">def</span> <span class=\"nf\">blank_token</span><span class=\"p\">(</span><span class=\"n\">uid</span><span class=\"p\">,</span> <span class=\"n\">key</span><span class=\"p\">):</span>\n <span class=\"n\">nonce</span> <span class=\"o\">=</span> <span class=\"n\">enc</span><span class=\"p\">(</span><span class=\"s\">\":\"</span><span class=\"p\">.</span><span class=\"n\">join</span><span class=\"p\">([</span><span class=\"nb\">str</span><span class=\"p\">(</span><span class=\"n\">uid</span><span class=\"p\">),</span> <span class=\"n\">os</span><span class=\"p\">.</span><span class=\"n\">urandom</span><span class=\"p\">(</span><span class=\"mi\">16</span><span class=\"p\">)]))</span>\n <span class=\"k\">return</span> <span class=\"n\">json</span><span class=\"p\">.</span><span class=\"n\">dumps</span><span class=\"p\">([</span><span class=\"n\">nonce</span><span class=\"p\">,</span> <span class=\"n\">enc</span><span class=\"p\">(</span><span class=\"n\">hmac</span><span class=\"p\">(</span><span class=\"n\">key</span><span class=\"p\">,</span> <span class=\"n\">nonce</span><span class=\"p\">))])</span>\n</code></pre>\n </div>\n</div><div class=\"right-sidenote\"><p>Bearer tokens: like cookies, blobs you attach to a request (usually in an HTTP header).</p>\n</div>\n<p>We’re going to build a minimally-stateful bearer token, a blob signed with HMAC. Nothing fancy so far. 
<a href='https://api.rubyonrails.org/classes/ActiveSupport/MessageVerifier.html' title=''>Rails has done this</a> for a decade and a half.</p>\n\n<p>There’s a <a href='http://cryto.net/~joepie91/blog/2016/06/13/stop-using-jwt-for-sessions/' title=''>fashion in API security for stateless tokens</a>, which encode all the data you’d need to check any request accompanied by that token – without a database lookup. Stateless tokens have some nice properties, and some less-nice. Our tokens won’t be stateless: they carry a user ID, with which we’ll look up the HMAC key to verify it. But they’ll stake out a sort of middle ground.</p>\n<div class=\"highlight-wrapper group relative python\">\n <button \n type=\"button\"\n class=\"bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n data-wrap-target=\"#code-r52d35ga\"\n >\n <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\" stroke-linecap=\"round\" stroke-linejoin=\"round\"><g buffered-rendering=\"static\"><path d=\"M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959\" /><path d=\"M11.081 6.466L9.533 8.037l1.548 1.571\" /></g></svg>\n <span class=\"bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950\">\n Wrap text\n </span>\n </button>\n <button \n type=\"button\"\n class=\"bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n data-copy-target=\"sibling\"\n >\n 
<svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\"><g buffered-rendering=\"static\"><path d=\"M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z\" /><path d=\"M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617\" /></g></svg>\n <span class=\"bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950\">\n Copy to clipboard\n </span>\n </button>\n <div class='highlight relative group'>\n <pre class='highlight '><code id=\"code-r52d35ga\"><span class=\"k\">def</span> <span class=\"nf\">attenuate</span><span class=\"p\">(</span><span class=\"n\">macStr</span><span class=\"p\">,</span> <span class=\"n\">cav</span><span class=\"p\">):</span>\n <span class=\"n\">mac</span> <span class=\"o\">=</span> <span class=\"n\">json</span><span class=\"p\">.</span><span class=\"n\">loads</span><span class=\"p\">(</span><span class=\"n\">macStr</span><span class=\"p\">)</span>\n <span class=\"n\">cavStr</span> <span class=\"o\">=</span> <span class=\"n\">json</span><span class=\"p\">.</span><span class=\"n\">dumps</span><span class=\"p\">(</span><span class=\"n\">cav</span><span class=\"p\">)</span>\n <span class=\"n\">oldTail</span> <span class=\"o\">=</span> <span class=\"n\">dec</span><span class=\"p\">(</span><span class=\"n\">mac</span><span class=\"p\">[</span><span class=\"o\">-</span><span class=\"mi\">1</span><span class=\"p\">])</span>\n <span class=\"n\">newTail</span> <span class=\"o\">=</span> <span class=\"n\">enc</span><span class=\"p\">(</span><span class=\"n\">hmac</span><span class=\"p\">(</span><span class=\"n\">oldTail</span><span class=\"p\">,</span> <span class=\"n\">cavStr</span><span class=\"p\">))</span>\n <span class=\"k\">return</span> <span class=\"n\">json</span><span 
class=\"p\">.</span><span class=\"n\">dumps</span><span class=\"p\">(</span><span class=\"n\">mac</span><span class=\"p\">[</span><span class=\"mi\">0</span><span class=\"p\">:</span><span class=\"o\">-</span><span class=\"mi\">1</span><span class=\"p\">]</span> <span class=\"o\">+</span> <span class=\"p\">[</span><span class=\"n\">cavStr</span><span class=\"p\">,</span> <span class=\"n\">newTail</span><span class=\"p\">])</span>\n\n<span class=\"n\">m0</span> <span class=\"o\">=</span> <span class=\"n\">blank_token</span><span class=\"p\">(</span><span class=\"mi\">10</span><span class=\"p\">,</span> <span class=\"n\">keys</span><span class=\"p\">[</span><span class=\"mi\">10</span><span class=\"p\">])</span>\n<span class=\"n\">m1</span> <span class=\"o\">=</span> <span class=\"n\">attenuate</span><span class=\"p\">(</span><span class=\"n\">m0</span><span class=\"p\">,</span> <span class=\"p\">{</span><span class=\"s\">'path'</span><span class=\"p\">:</span> <span class=\"s\">'/images'</span><span class=\"p\">})</span>\n<span class=\"n\">m2</span> <span class=\"o\">=</span> <span class=\"n\">attenuate</span><span class=\"p\">(</span><span class=\"n\">m1</span><span class=\"p\">,</span> <span class=\"p\">{</span><span class=\"s\">'op'</span><span class=\"p\">:</span> <span class=\"s\">'read'</span><span class=\"p\">})</span>\n</code></pre>\n </div>\n</div>\n<p>Let’s add some stuff.</p>\n\n<p>The meat of our tokens will be a series of claims we call “caveats”. We call them that because each claim restricts further what the token authorizes. After <code>{'path': '/images'}</code>, this token only allows operations that happen underneath the <code>/images</code> directory. Then, after <code>{'op': 'read'}</code>, it allows only reads, not writes.</p>\n\n<p>(I guess we’re building a file sharing system. Whatever.)</p>\n\n<p>Some important things about this design. 
First: by implication from the fact that caveats further restrict tokens, a token with no caveats restricts nothing. It’s a god-mode token. Don’t honor it.</p>\n<div class=\"right-sidenote\"><p>In other words: the ordering of caveats doesn’t matter.</p>\n</div>\n<p>Second: the rule of checking caveats is very simple: every single caveat must pass, evaluating <code>True</code> against the request that carries it, in isolation and without reference to any other caveat. If any caveat evaluates <code>False</code>, the request fails. In that way, we ensure that adding caveats to a token can only ever weaken it.</p>\n\n<p>With that in mind, take a closer look at this code:</p>\n<div class=\"highlight-wrapper group relative python\">\n <button \n type=\"button\"\n class=\"bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n data-wrap-target=\"#code-n7mgbkwf\"\n >\n <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\" stroke-linecap=\"round\" stroke-linejoin=\"round\"><g buffered-rendering=\"static\"><path d=\"M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959\" /><path d=\"M11.081 6.466L9.533 8.037l1.548 1.571\" /></g></svg>\n <span class=\"bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950\">\n Wrap text\n </span>\n </button>\n <button \n type=\"button\"\n class=\"bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none 
focus:outline-none\"\n data-copy-target=\"sibling\"\n >\n <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\"><g buffered-rendering=\"static\"><path d=\"M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z\" /><path d=\"M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617\" /></g></svg>\n <span class=\"bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950\">\n Copy to clipboard\n </span>\n </button>\n <div class='highlight relative group'>\n <pre class='highlight '><code id=\"code-n7mgbkwf\"><span class=\"n\">oldTail</span> <span class=\"o\">=</span> <span class=\"n\">dec</span><span class=\"p\">(</span><span class=\"n\">mac</span><span class=\"p\">[</span><span class=\"o\">-</span><span class=\"mi\">1</span><span class=\"p\">])</span>\n<span class=\"n\">newTail</span> <span class=\"o\">=</span> <span class=\"n\">enc</span><span class=\"p\">(</span><span class=\"n\">hmac</span><span class=\"p\">(</span><span class=\"n\">oldTail</span><span class=\"p\">,</span> <span class=\"n\">cavStr</span><span class=\"p\">))</span>\n</code></pre>\n </div>\n</div>\n<p>Every caveat is HMAC-signed independently, which is weird. Weirder still, the key for that HMAC is the output of the last HMAC. The caveats chain together, and the HMAC of the last caveat becomes the “tail” of the token.</p>\n\n<p>Creating a new blank token for a particular user requires a key that the server (and probably only the server) knows. But adding a caveat doesn’t! Anybody can add a caveat. 
In our design, you, the user, can edit your own API token.</p>\n<div class=\"highlight-wrapper group relative python\">\n <button \n type=\"button\"\n class=\"bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n data-wrap-target=\"#code-nx5eitys\"\n >\n <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\" stroke-linecap=\"round\" stroke-linejoin=\"round\"><g buffered-rendering=\"static\"><path d=\"M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959\" /><path d=\"M11.081 6.466L9.533 8.037l1.548 1.571\" /></g></svg>\n <span class=\"bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950\">\n Wrap text\n </span>\n </button>\n <button \n type=\"button\"\n class=\"bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n data-copy-target=\"sibling\"\n >\n <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\"><g buffered-rendering=\"static\"><path d=\"M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z\" /><path d=\"M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617\" /></g></svg>\n <span class=\"bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] 
text-navy-950\">\n Copy to clipboard\n </span>\n </button>\n <div class='highlight relative group'>\n <pre class='highlight '><code id=\"code-nx5eitys\"><span class=\"k\">def</span> <span class=\"nf\">verify</span><span class=\"p\">(</span><span class=\"n\">macStr</span><span class=\"p\">,</span> <span class=\"n\">keys</span><span class=\"p\">):</span>\n <span class=\"n\">mac</span> <span class=\"o\">=</span> <span class=\"n\">json</span><span class=\"p\">.</span><span class=\"n\">loads</span><span class=\"p\">(</span><span class=\"n\">macStr</span><span class=\"p\">)</span>\n <span class=\"n\">nonce</span> <span class=\"o\">=</span> <span class=\"n\">dec</span><span class=\"p\">(</span><span class=\"n\">mac</span><span class=\"p\">[</span><span class=\"mi\">0</span><span class=\"p\">]).</span><span class=\"n\">split</span><span class=\"p\">(</span><span class=\"s\">\":\"</span><span class=\"p\">)</span>\n <span class=\"n\">key</span> <span class=\"o\">=</span> <span class=\"n\">keys</span><span class=\"p\">[</span><span class=\"nb\">int</span><span class=\"p\">(</span><span class=\"n\">nonce</span><span class=\"p\">[</span><span class=\"mi\">0</span><span class=\"p\">])]</span>\n <span class=\"n\">tail</span> <span class=\"o\">=</span> <span class=\"s\">\"\"</span>\n <span class=\"k\">for</span> <span class=\"n\">cav</span> <span class=\"ow\">in</span> <span class=\"n\">mac</span><span class=\"p\">[:</span><span class=\"o\">-</span><span class=\"mi\">1</span><span class=\"p\">]:</span>\n <span class=\"n\">tail</span> <span class=\"o\">=</span> <span class=\"n\">hmac</span><span class=\"p\">(</span><span class=\"n\">key</span><span class=\"p\">,</span> <span class=\"n\">cav</span><span class=\"p\">)</span>\n <span class=\"n\">key</span> <span class=\"o\">=</span> <span class=\"n\">tail</span>\n <span class=\"k\">return</span> <span class=\"n\">hm</span><span class=\"p\">.</span><span class=\"n\">compare_digest</span><span class=\"p\">(</span><span 
class=\"n\">tail</span><span class=\"p\">,</span> <span class=\"n\">dec</span><span class=\"p\">(</span><span class=\"n\">mac</span><span class=\"p\">[</span><span class=\"o\">-</span><span class=\"mi\">1</span><span class=\"p\">]))</span>\n\n<span class=\"n\">verify</span><span class=\"p\">(</span><span class=\"n\">m2</span><span class=\"p\">,</span> <span class=\"n\">keys</span><span class=\"p\">)</span> <span class=\"c1\"># => True\n</span></code></pre>\n </div>\n</div>\n<p>For completeness, and to make a point, there’s the verification code. Look up the original secret key from the user ID, and then it’s chained HMAC all the way down. The point I’m making is that Macaroons are very simple.</p>\n<h2 id='2' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#2' aria-label='Anchor'></a><span class='plain-code'>2</span></h2>\n<p>Back in 2014, Google published <a href='https://storage.googleapis.com/pub-tools-public-publication-data/pdf/41892.pdf' title=''>a paper at NDSS</a> introducing “Macaroons”, a new kind of cookie. Since then, they’ve become a sort of hipster shibboleth. But they’re more talked about than implemented, which is a nice way to say that practically nobody uses them.</p>\n\n<p>Until now! I dragged Fly.io into implementing them. Suckers!</p>\n\n<p>We had a problem: our API tokens were much too powerful. We needed to scope them down and let them express roles, and I scoped up that project to replace OAuth2 tokens altogether. We now have what I think is one of the more expansive Macaroon implementations on the Internet.</p>\n\n<p>I dragged us into using Macaroons because I wanted us to use a hipster token format. 
Google designed Macaroons for a bigger reason: they hoped to replace browser cookies with something much more powerful.</p>\n\n<p>The problem with simple bearer tokens, like browser cookies or JWTs, is that they’re prone to being stolen and replayed by attackers.</p>\n<div class=\"right-sidenote\"><p>game-over: pentest jargon for “very bad”</p>\n</div>\n<p>Worse, a stolen token is usually a game-over condition. In most schemes, a bearer token is an all-access pass for the associated user. For some applications this isn’t that big a deal, but then, <a href='https://neilmadden.blog/2020/09/09/macaroon-access-tokens-for-oauth-part-2-transactional-auth/' title=''>think about banking</a>. A banking app token that authorizes arbitrary transactions is a recipe for having a small heart attack on every HTTP request.</p>\n<div class=\"right-sidenote\"><p>(Perfectly minimized API tokens: a software security holy grail)</p>\n</div>\n<p>Macaroons are user-editable tokens that enable JIT-generated least-privilege tokens. With minimal ceremony and no additional API requests, a banking app Macaroon lets you authorize a request with a caveat like, I don’t know, <code>{'maxAmount': '$5'}</code>. I mean, something way better than that, probably lots of caveats, not just one, but you get the idea: a token so minimized you feel safe sending it with your request. Ideally, a token that only authorizes that single, intended request.</p>\n<h2 id='3' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#3' aria-label='Anchor'></a><span class='plain-code'>3</span></h2>\n<p>That’s not why we like Macaroons. We already assume our tokens aren’t being stolen.</p>\n\n<p>In most systems, the developers come up with a permissions system, and you’re stuck with it. 
We run a public cloud platform, and people want a lot of different things from our permissions. The dream is, we (the low-level platform developers on the team) design a single permission system, one time, and go about our jobs never thinking about this problem again.</p>\n\n<p>Instead of thinking of all of our “roles” in advance, we just model our platform with caveats:</p>\n\n<ol>\n<li>Users belong to <code>Organizations</code>.\n</li><li><code>Organizations</code> own <code>Apps</code>.\n</li><li><code>Apps</code> contain <code>Machines</code> and <code>Volumes</code>.\n</li><li>To any of these things, you can <code>Read</code>, <code>Write</code>, <code>Create</code>, <code>Delete</code>, and/or <code>Control</code> <aside class=\"right-sidenote\">control being change of state, like “start” and “stop”</aside>.\n</li><li>Some administrivia, like expiration (<code>ValidityWindow</code>), locking tokens to specific Fly Machines (<code>FromMachineSource</code>), and escape hatches like <code>Mutation</code> (for our GraphQL API).\n</li></ol>\n<div class=\"right-sidenote\"><p>(this is a vibes-based notation, don’t think too hard about it)</p>\n</div>\n<p>Simplistic. 
But it expresses admin tokens:</p>\n<div class=\"highlight-wrapper group relative \">\n <button \n type=\"button\"\n class=\"bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n data-wrap-target=\"#code-x5iepn6s\"\n >\n <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\" stroke-linecap=\"round\" stroke-linejoin=\"round\"><g buffered-rendering=\"static\"><path d=\"M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959\" /><path d=\"M11.081 6.466L9.533 8.037l1.548 1.571\" /></g></svg>\n <span class=\"bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950\">\n Wrap text\n </span>\n </button>\n <button \n type=\"button\"\n class=\"bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n data-copy-target=\"sibling\"\n >\n <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\"><g buffered-rendering=\"static\"><path d=\"M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z\" /><path d=\"M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617\" /></g></svg>\n <span class=\"bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950\">\n Copy to clipboard\n 
</span>\n </button>\n <div class='highlight relative group'>\n <pre class='highlight '><code id=\"code-x5iepn6s\">Organization 4721, mask=*\n</code></pre>\n </div>\n</div>\n<p>And it expresses normal user tokens:</p>\n<div class=\"highlight-wrapper group relative \">\n <button \n type=\"button\"\n class=\"bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n data-wrap-target=\"#code-srsndejy\"\n >\n <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\" stroke-linecap=\"round\" stroke-linejoin=\"round\"><g buffered-rendering=\"static\"><path d=\"M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959\" /><path d=\"M11.081 6.466L9.533 8.037l1.548 1.571\" /></g></svg>\n <span class=\"bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950\">\n Wrap text\n </span>\n </button>\n <button \n type=\"button\"\n class=\"bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n data-copy-target=\"sibling\"\n >\n <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\"><g buffered-rendering=\"static\"><path d=\"M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z\" /><path d=\"M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 
1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617\" /></g></svg>\n <span class=\"bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950\">\n Copy to clipboard\n </span>\n </button>\n <div class='highlight relative group'>\n <pre class='highlight '><code id=\"code-srsndejy\">Organization 4721, mask=read,write,control\n(App 123, mask=control), (App 345, mask=read, write, control)\n</code></pre>\n </div>\n</div>\n<p>And also an auditor-only token for that user:</p>\n<div class=\"highlight-wrapper group relative \">\n <button \n type=\"button\"\n class=\"bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n data-wrap-target=\"#code-jh9ga1bt\"\n >\n <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\" stroke-linecap=\"round\" stroke-linejoin=\"round\"><g buffered-rendering=\"static\"><path d=\"M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959\" /><path d=\"M11.081 6.466L9.533 8.037l1.548 1.571\" /></g></svg>\n <span class=\"bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950\">\n Wrap text\n </span>\n </button>\n <button \n type=\"button\"\n class=\"bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n data-copy-target=\"sibling\"\n >\n <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\"><g 
buffered-rendering=\"static\"><path d=\"M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z\" /><path d=\"M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617\" /></g></svg>\n <span class=\"bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950\">\n Copy to clipboard\n </span>\n </button>\n <div class='highlight relative group'>\n <pre class='highlight '><code id=\"code-jh9ga1bt\">Organization 4721, mask=read,write,control\n(App 123, mask=control), (App 345, mask=read, write, control)\nOrganization 4721, mask=read\n</code></pre>\n </div>\n</div><div class=\"right-sidenote\"><p>(our deploy tokens are more complicated than this)</p>\n</div>\n<p>Or a deployment-only token, for a CI/CD system:</p>\n<div class=\"highlight-wrapper group relative \">\n <button \n type=\"button\"\n class=\"bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n data-wrap-target=\"#code-pe18x39a\"\n >\n <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\" stroke-linecap=\"round\" stroke-linejoin=\"round\"><g buffered-rendering=\"static\"><path d=\"M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959\" /><path d=\"M11.081 6.466L9.533 8.037l1.548 1.571\" /></g></svg>\n <span class=\"bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950\">\n Wrap text\n </span>\n </button>\n <button \n type=\"button\"\n class=\"bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 
group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n data-copy-target=\"sibling\"\n >\n <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\"><g buffered-rendering=\"static\"><path d=\"M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z\" /><path d=\"M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617\" /></g></svg>\n <span class=\"bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950\">\n Copy to clipboard\n </span>\n </button>\n <div class='highlight relative group'>\n <pre class='highlight '><code id=\"code-pe18x39a\">Organization 4721, mask=write,control\n(App 123, mask=*)\n</code></pre>\n </div>\n</div>\n<p>Those are just the roles we came up with. Users can invent others. The important thing is that they don’t have to bother me about them.</p>\n<h2 id='4' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#4' aria-label='Anchor'></a><span class='plain-code'>4</span></h2>\n<p>Astute readers will have noticed by now that we haven’t shown any code that actually evaluates a caveat. That’s because it’s boring, and I’m too lazy to write it out. Got an <code>Organization</code> token for <code>image-hosting</code> that allows <code>Reads</code>? Ok; check and make sure the incoming request is for an asset of <code>image-hosting</code>, and that it’s a <code>Read</code>. 
Whatever code you came up with, it’d be fine.</p>\n\n<p>These straightforward restrictions are called “first party caveats”. The first party is us, the platform. We’ve got all the information we need to check them.</p>\n\n<p>Let’s kit out our token format some more.</p>\n<div class=\"highlight-wrapper group relative python\">\n <button \n type=\"button\"\n class=\"bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n data-wrap-target=\"#code-rvmob8wx\"\n >\n <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\" stroke-linecap=\"round\" stroke-linejoin=\"round\"><g buffered-rendering=\"static\"><path d=\"M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959\" /><path d=\"M11.081 6.466L9.533 8.037l1.548 1.571\" /></g></svg>\n <span class=\"bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950\">\n Wrap text\n </span>\n </button>\n <button \n type=\"button\"\n class=\"bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n data-copy-target=\"sibling\"\n >\n <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\"><g buffered-rendering=\"static\"><path d=\"M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z\" /><path d=\"M10.576 10.577h2.109A1.825 
1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617\" /></g></svg>\n <span class=\"bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950\">\n Copy to clipboard\n </span>\n </button>\n <div class='highlight relative group'>\n <pre class='highlight '><code id=\"code-rvmob8wx\"><span class=\"k\">def</span> <span class=\"nf\">third_party_caveat</span><span class=\"p\">(</span><span class=\"n\">ka</span><span class=\"p\">,</span> <span class=\"n\">tail</span><span class=\"p\">,</span> <span class=\"n\">msg</span><span class=\"p\">,</span> <span class=\"n\">url</span><span class=\"p\">):</span>\n <span class=\"n\">crk</span> <span class=\"o\">=</span> <span class=\"n\">os</span><span class=\"p\">.</span><span class=\"n\">urandom</span><span class=\"p\">(</span><span class=\"mi\">16</span><span class=\"p\">)</span>\n <span class=\"n\">ticket</span> <span class=\"o\">=</span> <span class=\"n\">enc</span><span class=\"p\">(</span><span class=\"n\">encrypt</span><span class=\"p\">(</span><span class=\"n\">ka</span><span class=\"p\">,</span> <span class=\"n\">json</span><span class=\"p\">.</span><span class=\"n\">dumps</span><span class=\"p\">({</span>\n <span class=\"s\">'crk'</span><span class=\"p\">:</span> <span class=\"n\">enc</span><span class=\"p\">(</span><span class=\"n\">crk</span><span class=\"p\">),</span>\n <span class=\"s\">'msg'</span><span class=\"p\">:</span> <span class=\"n\">msg</span>\n <span class=\"p\">})))</span>\n <span class=\"n\">challenge</span> <span class=\"o\">=</span> <span class=\"n\">enc</span><span class=\"p\">(</span><span class=\"n\">encrypt</span><span class=\"p\">(</span><span class=\"n\">tail</span><span class=\"p\">,</span> <span class=\"n\">crk</span><span class=\"p\">))</span>\n <span class=\"k\">return</span> <span class=\"p\">{</span> <span class=\"s\">'url'</span><span class=\"p\">:</span> <span class=\"n\">url</span><span class=\"p\">,</span> <span 
class=\"s\">'ticket'</span><span class=\"p\">:</span> <span class=\"n\">ticket</span><span class=\"p\">,</span> <span class=\"s\">'challenge'</span> <span class=\"p\">:</span> <span class=\"n\">challenge</span> <span class=\"p\">}</span>\n\n<span class=\"n\">key</span> <span class=\"o\">=</span> <span class=\"nb\">bytes</span><span class=\"p\">(</span><span class=\"s\">\"YELLOW SUBMARINE\"</span><span class=\"p\">)</span>\n<span class=\"n\">url</span> <span class=\"o\">=</span> <span class=\"s\">\"https://canary.service\"</span>\n<span class=\"n\">c3</span> <span class=\"o\">=</span> <span class=\"n\">third_party_caveat</span><span class=\"p\">(</span><span class=\"n\">key</span><span class=\"p\">,</span> <span class=\"n\">tail</span><span class=\"p\">,</span> <span class=\"n\">json</span><span class=\"p\">.</span><span class=\"n\">dumps</span><span class=\"p\">({</span><span class=\"s\">'user'</span><span class=\"p\">:</span> <span class=\"s\">'bobson.dugnutt'</span><span class=\"p\">}),</span> <span class=\"n\">url</span><span class=\"p\">)</span>\n<span class=\"n\">m3</span> <span class=\"o\">=</span> <span class=\"n\">attenuate</span><span class=\"p\">(</span><span class=\"n\">m2</span><span class=\"p\">,</span> <span class=\"n\">c3</span><span class=\"p\">)</span>\n</code></pre>\n </div>\n</div>\n<p>Up till now, we’ve gotten by with nothing but HMAC, which is one of the great charms of the design. Now we need to encrypt. There’s no authenticated encryption in the Python standard library, but that won’t stop us. <button toggle=\"#hmac-ctr\">Ready to make some candy? 
Hand me that brake fluid!</button></p>\n<div id=\"hmac-ctr\" toggle-content=\"\" aria-label=\"show very silly code\"><div class=\"highlight-wrapper group relative python\">\n <button type=\"button\" class=\"bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\" data-wrap-target=\"#code-brvb3s1v\">\n <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\" stroke-linecap=\"round\" stroke-linejoin=\"round\"><g buffered-rendering=\"static\"><path d=\"M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959\"></path><path d=\"M11.081 6.466L9.533 8.037l1.548 1.571\"></path></g></svg>\n <span class=\"bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950\">\n Wrap text\n </span>\n </button>\n <button type=\"button\" class=\"bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\" data-copy-target=\"sibling\">\n <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\"><g buffered-rendering=\"static\"><path d=\"M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z\"></path><path d=\"M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617\"></path></g></svg>\n <span class=\"bubble-sm bubble-tl 
[--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950\">\n Copy to clipboard\n </span>\n </button>\n <div class=\"highlight relative group\">\n <pre class=\"highlight \"><code id=\"code-brvb3s1v\"><span class=\"c1\"># do i really need to say that i'm not serious about this?\n</span>\n<span class=\"k\">def</span> <span class=\"nf\">hmactr</span><span class=\"p\">(</span><span class=\"n\">k</span><span class=\"p\">,</span> <span class=\"n\">n</span><span class=\"p\">):</span>\n <span class=\"n\">ks</span> <span class=\"o\">=</span> <span class=\"n\">hm</span><span class=\"p\">.</span><span class=\"n\">new</span><span class=\"p\">(</span><span class=\"n\">k</span><span class=\"o\">+</span><span class=\"n\">n</span><span class=\"p\">)</span>\n <span class=\"k\">for</span> <span class=\"n\">counter</span> <span class=\"ow\">in</span> <span class=\"nb\">xrange</span><span class=\"p\">(</span><span class=\"n\">sys</span><span class=\"p\">.</span><span class=\"n\">maxint</span><span class=\"p\">):</span>\n <span class=\"n\">ks</span><span class=\"p\">.</span><span class=\"n\">update</span><span class=\"p\">(</span><span class=\"nb\">str</span><span class=\"p\">(</span><span class=\"n\">counter</span><span class=\"p\">))</span>\n <span class=\"n\">kbs</span> <span class=\"o\">=</span> <span class=\"n\">ks</span><span class=\"p\">.</span><span class=\"n\">digest</span><span class=\"p\">()</span>\n <span class=\"k\">for</span> <span class=\"n\">i</span> <span class=\"ow\">in</span> <span class=\"nb\">xrange</span><span class=\"p\">(</span><span class=\"mi\">16</span><span class=\"p\">):</span> <span class=\"k\">yield</span> <span class=\"n\">kbs</span><span class=\"p\">[</span><span class=\"n\">i</span><span class=\"p\">]</span>\n\n<span class=\"k\">def</span> <span class=\"nf\">encrypt</span><span class=\"p\">(</span><span class=\"n\">k</span><span class=\"p\">,</span> <span class=\"n\">buf</span><span class=\"p\">):</span>\n <span class=\"n\">ak</span> <span 
class=\"o\">=</span> <span class=\"n\">hm</span><span class=\"p\">.</span><span class=\"n\">new</span><span class=\"p\">(</span><span class=\"n\">k</span><span class=\"p\">,</span> <span class=\"s\">'auth'</span><span class=\"p\">).</span><span class=\"n\">digest</span><span class=\"p\">()</span>\n <span class=\"n\">nonce</span> <span class=\"o\">=</span> <span class=\"n\">os</span><span class=\"p\">.</span><span class=\"n\">urandom</span><span class=\"p\">(</span><span class=\"mi\">16</span><span class=\"p\">)</span>\n <span class=\"n\">cipher</span> <span class=\"o\">=</span> <span class=\"n\">hmactr</span><span class=\"p\">(</span><span class=\"n\">hm</span><span class=\"p\">.</span><span class=\"n\">new</span><span class=\"p\">(</span><span class=\"n\">k</span><span class=\"p\">,</span> <span class=\"s\">'enc'</span><span class=\"p\">).</span><span class=\"n\">digest</span><span class=\"p\">(),</span> <span class=\"n\">nonce</span><span class=\"p\">)</span>\n <span class=\"n\">ctxt</span> <span class=\"o\">=</span> <span class=\"nb\">bytearray</span><span class=\"p\">(</span><span class=\"n\">buf</span><span class=\"p\">)</span>\n <span class=\"k\">for</span> <span class=\"n\">i</span> <span class=\"ow\">in</span> <span class=\"nb\">xrange</span><span class=\"p\">(</span><span class=\"nb\">len</span><span class=\"p\">(</span><span class=\"n\">buf</span><span class=\"p\">)):</span>\n <span class=\"n\">ctxt</span><span class=\"p\">[</span><span class=\"n\">i</span><span class=\"p\">]</span> <span class=\"o\">^=</span> <span class=\"nb\">ord</span><span class=\"p\">(</span><span class=\"n\">cipher</span><span class=\"p\">.</span><span class=\"nb\">next</span><span class=\"p\">())</span>\n <span class=\"n\">res</span> <span class=\"o\">=</span> <span class=\"n\">nonce</span> <span class=\"o\">+</span> <span class=\"nb\">str</span><span class=\"p\">(</span><span class=\"n\">ctxt</span><span class=\"p\">)</span>\n <span class=\"k\">return</span> <span 
class=\"n\">res</span> <span class=\"o\">+</span> <span class=\"n\">hm</span><span class=\"p\">.</span><span class=\"n\">new</span><span class=\"p\">(</span><span class=\"n\">ak</span><span class=\"p\">,</span> <span class=\"n\">res</span><span class=\"p\">).</span><span class=\"n\">digest</span><span class=\"p\">()</span>\n\n<span class=\"k\">def</span> <span class=\"nf\">decrypt</span><span class=\"p\">(</span><span class=\"n\">k</span><span class=\"p\">,</span> <span class=\"n\">buf</span><span class=\"p\">):</span>\n <span class=\"n\">ak</span> <span class=\"o\">=</span> <span class=\"n\">hm</span><span class=\"p\">.</span><span class=\"n\">new</span><span class=\"p\">(</span><span class=\"n\">k</span><span class=\"p\">,</span> <span class=\"s\">'auth'</span><span class=\"p\">).</span><span class=\"n\">digest</span><span class=\"p\">()</span>\n <span class=\"k\">if</span> <span class=\"ow\">not</span> <span class=\"n\">hm</span><span class=\"p\">.</span><span class=\"n\">compare_digest</span><span class=\"p\">(</span><span class=\"n\">buf</span><span class=\"p\">[</span><span class=\"o\">-</span><span class=\"mi\">16</span><span class=\"p\">:],</span> <span class=\"n\">hm</span><span class=\"p\">.</span><span class=\"n\">new</span><span class=\"p\">(</span><span class=\"n\">ak</span><span class=\"p\">,</span> <span class=\"n\">buf</span><span class=\"p\">[:</span><span class=\"o\">-</span><span class=\"mi\">16</span><span class=\"p\">]).</span><span class=\"n\">digest</span><span class=\"p\">()):</span>\n <span class=\"k\">return</span> <span class=\"bp\">False</span>\n <span class=\"n\">nonce</span> <span class=\"o\">=</span> <span class=\"n\">buf</span><span class=\"p\">[:</span><span class=\"mi\">16</span><span class=\"p\">]</span>\n <span class=\"n\">cipher</span> <span class=\"o\">=</span> <span class=\"n\">hmactr</span><span class=\"p\">(</span><span class=\"n\">hm</span><span class=\"p\">.</span><span class=\"n\">new</span><span class=\"p\">(</span><span 
class=\"n\">k</span><span class=\"p\">,</span> <span class=\"s\">'enc'</span><span class=\"p\">).</span><span class=\"n\">digest</span><span class=\"p\">(),</span> <span class=\"n\">nonce</span><span class=\"p\">)</span>\n <span class=\"n\">ptxt</span> <span class=\"o\">=</span> <span class=\"nb\">bytearray</span><span class=\"p\">(</span><span class=\"n\">buf</span><span class=\"p\">[</span><span class=\"mi\">16</span><span class=\"p\">:</span><span class=\"o\">-</span><span class=\"mi\">16</span><span class=\"p\">])</span>\n <span class=\"k\">for</span> <span class=\"n\">i</span> <span class=\"ow\">in</span> <span class=\"nb\">xrange</span><span class=\"p\">(</span><span class=\"nb\">len</span><span class=\"p\">(</span><span class=\"n\">buf</span><span class=\"p\">[</span><span class=\"mi\">16</span><span class=\"p\">:</span><span class=\"o\">-</span><span class=\"mi\">16</span><span class=\"p\">])):</span>\n <span class=\"n\">ptxt</span><span class=\"p\">[</span><span class=\"n\">i</span><span class=\"p\">]</span> <span class=\"o\">^=</span> <span class=\"nb\">ord</span><span class=\"p\">(</span><span class=\"n\">cipher</span><span class=\"p\">.</span><span class=\"nb\">next</span><span class=\"p\">())</span>\n <span class=\"k\">return</span> <span class=\"nb\">str</span><span class=\"p\">(</span><span class=\"n\">ptxt</span><span class=\"p\">)</span>\n</code></pre>\n </div>\n</div></div>\n<p>With “third-party” caveats comes a cast of characters. We’re still the first party. You’ll play the second party. The third party is any other system in the world that you trust: an SSO system, an audit log, a revocation checker, whatever.</p>\n\n<p>Here’s the trick of the third-party caveat: our platform doesn’t know what your caveat means, and it doesn’t have to. Instead, when you see a third-party caveat in your token, you tear a ticket off it and exchange it for a “discharge Macaroon” with that third party. 
You submit both Macaroons together to us.</p>\n\n<p>Let’s attenuate our token with a third-party caveat hooking it up to a “canary” service that generates a notice approximately any time the token is used.</p>\n\n<p><img src=\"/blog/macaroons-escalated-quickly/assets/third-party.png?1/2&wrap-left\" /></p>\n\n<p>To build that canary caveat, you first make a <code>ticket</code> that users of the token will hand to your canary, and then a <code>challenge</code> that Fly.io will use to verify discharges your checker spits out. The ticket and the challenge are both encrypted. The ticket is encrypted under <code>KA</code>, so your service can read it. The challenge is encrypted under the previous Macaroon tail, so only Fly.io can read it. Both hide yet another key, the random HMAC key <code>CRK</code> (“caveat root key”).</p>\n\n<p>In addition to <code>CRK</code>, the ticket contains a message, which says whatever you want it to; Fly.io doesn’t care. Typically, the message describes some kind of additional checking you want your service to perform before spitting out a discharge token.</p>\n<div class=\"highlight-wrapper group relative python\">\n <button \n type=\"button\"\n class=\"bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n data-wrap-target=\"#code-135v2c4d\"\n >\n <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\" stroke-linecap=\"round\" stroke-linejoin=\"round\"><g buffered-rendering=\"static\"><path d=\"M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959\" /><path d=\"M11.081 6.466L9.533 8.037l1.548 1.571\" /></g></svg>\n <span class=\"bubble-sm bubble-tl 
[--offset-l:-9px] tail text-navy-950\">\n Wrap text\n </span>\n </button>\n <button \n type=\"button\"\n class=\"bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n data-copy-target=\"sibling\"\n >\n <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\"><g buffered-rendering=\"static\"><path d=\"M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z\" /><path d=\"M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617\" /></g></svg>\n <span class=\"bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950\">\n Copy to clipboard\n </span>\n </button>\n <div class='highlight relative group'>\n <pre class='highlight '><code id=\"code-135v2c4d\"><span class=\"k\">def</span> <span class=\"nf\">discharge</span><span class=\"p\">(</span><span class=\"n\">ka</span><span class=\"p\">,</span> <span class=\"n\">ticket</span><span class=\"p\">):</span>\n <span class=\"n\">ptxt</span> <span class=\"o\">=</span> <span class=\"n\">decrypt</span><span class=\"p\">(</span><span class=\"n\">ka</span><span class=\"p\">,</span> <span class=\"n\">dec</span><span class=\"p\">(</span><span class=\"n\">ticket</span><span class=\"p\">))</span>\n <span class=\"k\">if</span> <span class=\"n\">ptxt</span> <span class=\"o\">==</span> <span class=\"bp\">False</span><span class=\"p\">:</span> <span class=\"k\">return</span> <span class=\"bp\">False</span>\n <span class=\"n\">tbody</span> <span class=\"o\">=</span> <span class=\"n\">json</span><span class=\"p\">.</span><span 
class=\"n\">loads</span><span class=\"p\">(</span><span class=\"n\">ptxt</span><span class=\"p\">)</span>\n <span class=\"c1\"># not shown: do something with tbody['msg']\n</span> <span class=\"k\">return</span> <span class=\"n\">json</span><span class=\"p\">.</span><span class=\"n\">dumps</span><span class=\"p\">([</span><span class=\"n\">ticket</span><span class=\"p\">,</span> <span class=\"n\">enc</span><span class=\"p\">(</span><span class=\"n\">hmac</span><span class=\"p\">(</span><span class=\"n\">dec</span><span class=\"p\">(</span><span class=\"n\">tbody</span><span class=\"p\">[</span><span class=\"s\">'crk'</span><span class=\"p\">]),</span> <span class=\"n\">ticket</span><span class=\"p\">))])</span>\n</code></pre>\n </div>\n</div>\n<p>To authorize a request with a token that includes a third-party caveat for the canary service, you need to get your hands on a corresponding discharge Macaroon. Normally, you do that by <code>POST</code>ing the ticket from the caveat to the service.</p>\n\n<p>Discharging is simple. The service, which holds <code>KA</code>, uses it to decrypt the ticket. It checks the message and makes some decisions. Finally, it mints a new macaroon, using <code>CRK</code>, recovered from the ticket, as the root key. The ticket itself is the nonce.</p>\n\n<p>If it wants, the third-party service can slap on a bunch of first-party caveats of its own. When we verify the Macaroon, we’ll copy those caveats out and enforce them. 
Attenuation of a third-party discharge macaroon works like a normal macaroon.</p>\n<div class=\"highlight-wrapper group relative python\">\n <button \n type=\"button\"\n class=\"bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n data-wrap-target=\"#code-gjymtoma\"\n >\n <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\" stroke-linecap=\"round\" stroke-linejoin=\"round\"><g buffered-rendering=\"static\"><path d=\"M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959\" /><path d=\"M11.081 6.466L9.533 8.037l1.548 1.571\" /></g></svg>\n <span class=\"bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950\">\n Wrap text\n </span>\n </button>\n <button \n type=\"button\"\n class=\"bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n data-copy-target=\"sibling\"\n >\n <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\"><g buffered-rendering=\"static\"><path d=\"M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z\" /><path d=\"M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617\" /></g></svg>\n <span class=\"bubble-sm bubble-tl [--offset-l:-6px] tail 
[--tail-x:calc(100%-30px)] text-navy-950\">\n Copy to clipboard\n </span>\n </button>\n <div class='highlight relative group'>\n <pre class='highlight '><code id=\"code-gjymtoma\"><span class=\"k\">def</span> <span class=\"nf\">verify_third_party</span><span class=\"p\">(</span><span class=\"n\">tag</span><span class=\"p\">,</span> <span class=\"n\">cav</span><span class=\"p\">,</span> <span class=\"n\">discharges</span><span class=\"o\">=</span><span class=\"p\">[]):</span>\n <span class=\"n\">crk</span> <span class=\"o\">=</span> <span class=\"n\">decrypt</span><span class=\"p\">(</span><span class=\"n\">tag</span><span class=\"p\">,</span> <span class=\"n\">dec</span><span class=\"p\">(</span><span class=\"n\">cav</span><span class=\"p\">[</span><span class=\"s\">'challenge'</span><span class=\"p\">]))</span>\n <span class=\"k\">if</span> <span class=\"n\">crk</span> <span class=\"o\">==</span> <span class=\"bp\">False</span><span class=\"p\">:</span> <span class=\"k\">return</span> <span class=\"bp\">False</span>\n <span class=\"n\">discharge</span> <span class=\"o\">=</span> <span class=\"bp\">None</span>\n <span class=\"k\">for</span> <span class=\"n\">dcs</span> <span class=\"ow\">in</span> <span class=\"n\">discharges</span><span class=\"p\">:</span>\n <span class=\"k\">if</span> <span class=\"n\">json</span><span class=\"p\">.</span><span class=\"n\">loads</span><span class=\"p\">(</span><span class=\"n\">dcs</span><span class=\"p\">)[</span><span class=\"mi\">0</span><span class=\"p\">]</span> <span class=\"o\">==</span> <span class=\"n\">cav</span><span class=\"p\">[</span><span class=\"s\">'ticket'</span><span class=\"p\">]:</span>\n <span class=\"n\">discharge</span> <span class=\"o\">=</span> <span class=\"n\">dcs</span>\n <span class=\"k\">break</span>\n <span class=\"k\">if</span> <span class=\"ow\">not</span> <span class=\"n\">discharge</span><span class=\"p\">:</span> <span class=\"k\">return</span> <span class=\"bp\">False</span>\n <span 
class=\"n\">mac</span> <span class=\"o\">=</span> <span class=\"n\">json</span><span class=\"p\">.</span><span class=\"n\">loads</span><span class=\"p\">(</span><span class=\"n\">discharge</span><span class=\"p\">)</span>\n <span class=\"n\">key</span> <span class=\"o\">=</span> <span class=\"n\">crk</span>\n <span class=\"c1\"># boring old stuff ---------------------\n</span> <span class=\"n\">tag</span> <span class=\"o\">=</span> <span class=\"s\">\"\"</span>\n <span class=\"k\">for</span> <span class=\"n\">cav</span> <span class=\"ow\">in</span> <span class=\"n\">mac</span><span class=\"p\">[:</span><span class=\"o\">-</span><span class=\"mi\">1</span><span class=\"p\">]:</span>\n <span class=\"n\">tag</span> <span class=\"o\">=</span> <span class=\"n\">hmac</span><span class=\"p\">(</span><span class=\"n\">key</span><span class=\"p\">,</span> <span class=\"n\">cav</span><span class=\"p\">)</span>\n <span class=\"n\">key</span> <span class=\"o\">=</span> <span class=\"n\">tag</span>\n <span class=\"k\">return</span> <span class=\"n\">hm</span><span class=\"p\">.</span><span class=\"n\">compare_digest</span><span class=\"p\">(</span><span class=\"n\">tag</span><span class=\"p\">,</span> <span class=\"n\">dec</span><span class=\"p\">(</span><span class=\"n\">mac</span><span class=\"p\">[</span><span class=\"o\">-</span><span class=\"mi\">1</span><span class=\"p\">]))</span>\n</code></pre>\n </div>\n</div>\n<p>To verify tokens that have third-party caveats, start with the root Macaroon, walking the caveats like usual. At each third-party caveat, match the <code>ticket</code> from the caveat with the <code>nonce</code> on the discharge Macaroon. 
The root Macaroon’s tail at that caveat (the tag just before it) decrypts the <code>challenge</code> in the caveat, recovering <code>CRK</code>, which cryptographically verifies the discharge.</p>\n\n<p>(The Macaroons paper uses different terms: “caveat identifier” or <code>cId</code> for “ticket”, and “verification-key identifier” or <code>vId</code> for “challenge”. These names are self-evidently bad and our contribution to the state of the art is to replace them.)</p>\n\n<p>There are two big applications for third-party caveats in Popular Macaroon Thought. First, they facilitate microservice-izing your auth logic, because you can stitch arbitrary policies together out of third-party caveats. Second, they seem like <a href='https://github.com/go-macaroon-bakery/macaroon-bakery' title=''>fertile ground for an ecosystem of interoperable Macaroon services</a>: Okta and Google could stand up SSO dischargers, for instance, or someone could stand up a really good revocation service.</p>\n\n<p>Neither of these lights us up. We’re allergic to microservices. As for public protocols, well, it’s good to want things. So we almost didn’t even implement third-party caveats.</p>\n<h2 id='5' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#5' aria-label='Anchor'></a><span class='plain-code'>5</span></h2>\n<p>I’m glad we did though, because they’ve been pretty great.</p>\n\n<p>The first problem third-party caveats solved for us was hazmat tokens. To the extent possible, we want Macaroon tokens to be safe to transmit between users. 
Our Macaroons express permissions, but not authentication, so it’s almost safe to email them.</p>\n\n<p>The way it works is, our Macaroons all have a third-party caveat pointing to a “login service”, identifying the proper bearer either as a particular Fly.io user or as a member of some <code>Organization</code>. To allow a request with your token, you first need to collect the discharge from the login service, which requires authentication.</p>\n\n<p>The login discharge is very sensitive, but there isn’t much reason to pass it around. The original permissions token is where all the interesting stuff is, and it’s not scary. So that’s nice.</p>\n\n<p><img src=\"/blog/macaroons-escalated-quickly/assets/fly-sso.png?1/3&wrap-left\" /></p>\n\n<p>Ben then came up with <a href=\"https://community.fly.io/t/organization-required-sso/17560\">third-party caveats that require Google or GitHub SSO logins.</a> If your token has one of those caveats, when you run <code>flyctl deploy</code>, a browser will pop up to log you into your SSO IdP (if you haven’t done so recently already).</p>\n\n<p>We’ve put a <a href='https://fly.io/blog/tokenized-tokens/#tokenizer-the-fabled-4th-way' title=''>bunch of work into getting the guts of our SSO system working</a>, but that work has mostly been invisible to customers. But Macaroon-ized SSO has a subtle benefit: you can configure <a href='http://Fly.io' title=''>Fly.io</a> to automatically add SSO requirements to specific <code>Organizations</code> (so, for instance, a dev environment might not need SSO at all, and prod might need two).</p>\n\n<p>SSO requirements in most applications are a brittle pain in the ass. Ours are flexible and straightforward, and that happened almost by accident. Macaroons, baby!</p>\n\n<p>Here’s a fun thing you can do with a Macaroon system: stand up a Slack bot, and give it an HTTP <code>POST</code> handler that accepts third-party tickets. 
Then:</p>\n\n<p><img src=\"/blog/macaroons-escalated-quickly/assets/bot-ok.png?1/2&center&border\" /></p>\n\n<p>So, the bot is cute, but any platform could do that. What’s cool is the way our platform <em>doesn’t</em> work with Slack; in fact, nothing on our platform knows anything about Slack, and Slack doesn’t know anything about us. We didn’t reach out to a Slack endpoint. Everything was purely cryptographic.</p>\n\n<p>That bot could, if I sank some time into it, enforce arbitrary rules: it could selectively add caveats for the requests it authorizes, based on lookups of the users requesting them, at specific times of day, with specific logging. Theoretically, it could add third-party caveats of its own.</p>\n\n<p>The win for us with third-party caveats is that they create a plugin system for our security tokens. That’s an unusual place to see a plugin interface! But Macaroons are easy to understand and keep in your head, so we’re pretty confident about the security issues.</p>\n<h2 id='6' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#6' aria-label='Anchor'></a><span class='plain-code'>6</span></h2>\n<p>Obviously, we didn’t write our Macaroon code in Python, or with HMAC-SHA256-CTR.</p>\n\n<p>We landed on a primary implementation in Golang (Ben subsequently wrote an Elixir implementation). Our hash is SHA256, our cipher is Chapoly. We encode in MsgPack.</p>\n<div class=\"callout\"><p>We didn’t use the pre-existing public implementation because <a href=\"https://securitycryptographywhatever.com/2021/08/12/what-do-we-do-about-jwt-with-jonathan-rudenberg/\" title=\"\">we were warned not to</a>. The Macaroon idea is simple, and it exists mostly as an academic paper, not a standard. 
The community that formed around building open source “standard” Macaroons decided to use untyped opaque blobs to represent caveats. We need things to be as rigidly unambiguous as they can be.</p>\n</div>\n<p><img src=\"/blog/macaroons-escalated-quickly/assets/verifier-service.png?2/3&center\" /></p>\n\n<p>The big strength of Macaroons as a cryptographic design (that it’s based almost entirely on HMAC) makes it a challenge to deploy. If you can verify a Macaroon, you can generate one. We have thousands of servers. They can’t all be allowed to generate tokens.</p>\n\n<p>What we did instead:</p>\n\n<ul>\n<li>We split token checking into “verification” of token HMAC tags and “clearing” of token caveats.\n</li><li>Verification occurs only on a physically isolated token-verification service; to verify a token’s tag, you HTTP <code>POST</code> the token to the verifier.\n</li><li>Clearing of token caveats can happen anywhere. Token caveat clearing is domain-specific and subject to change; token verification is simple cryptography and changes rarely.\n</li><li>Token verification results are cacheable. The client library for the token verifier caches them, which speeds things up by exploiting the locality of token submissions.\n</li><li>The verification service is backed by a <a href='https://fly.io/docs/litefs/' title=''>LiteFS-distributed SQLite database</a>, so verification is fast globally, a major step forward from our legacy OAuth2 tokens, which are only fast in Ashburn, VA.\n</li></ul>\n\n<p><img src=\"/blog/macaroons-escalated-quickly/assets/service-token.png?2/3&center\" /></p>\n\n<p>Now buckle up, because I’m about to try to get you to care about service tokens.</p>\n\n<p>We operate “worker servers” all over the world to host apps for our customers. To do that, those workers need access to customer secrets, like the key to decrypt a customer volume. To retrieve those secrets, the workers have to talk to secrets management servers.</p>\n\n<p>We manage a lot of workers. 
We trust them. But we don’t trust them that much, if you get my drift. You don’t want to just leave it up to the servers to decide which secrets they can access. The blast radius of a problem with a single worker should be no greater than the apps that are supposed to run there.</p>\n\n<p>The gold standard for approving access to customer information is, naturally, explicit customer authorization. We almost have that with Macaroons! The first time an app runs on a worker, <a href='https://fly.io/blog/carving-the-scheduler-out-of-our-orchestrator/' title=''>the orchestrator code</a> has a token, and it can pass that along to the secret stores.</p>\n\n<p>The problem is, you need that token more than once; not just when the user does a deploy, but potentially any time you restart the app or migrate it to a new worker. And you can’t just store and replay user Macaroons. They have expirations.</p>\n<div class=\"right-sidenote\"><p>This is like dropping privilege with things like pledge(2), but in a distributed system.</p>\n</div>\n<p>So our token verification service exposes an API that transforms a user token into a “service token”, which is just the token with the authentication caveat and expiration “stripped off”.</p>\n\n<p>What’s cool is: components that receive service tokens can attenuate them. For instance, we could lock a token to a particular worker, or even a particular Fly Machine. Then we can expose the whole <a href='https://fly.io/docs/machines/working-with-machines/' title=''>Fly Machines API</a> to customer VMs while keeping access traceable to specific customer tokens. 
Stealing the token from a Fly Machine doesn’t help you since it’s locked to that Fly Machine by a caveat attackers can’t strip.</p>\n<h2 id='7' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#7' aria-label='Anchor'></a><span class='plain-code'>7</span></h2>\n<p>If a customer loses their tokens to an attacker, we can’t just blow that off and let the attacker keep compromising the account!</p>\n<div class=\"right-sidenote\"><p>This cancels every token derived through attenuation by that nonce.</p>\n</div>\n<p>Every Macaroon we issue is identified by a unique nonce, and we can revoke tokens by that nonce. This is just a basic function of the token verification service we just described.</p>\n\n<p>We host token caches all over our fleet. Token revocation invalidates the caches. Anything with a cache checks frequently whether to invalidate. Revocation is rare, so just keeping a revocation list and invalidating caches wholesale seems fine.</p>\n<h2 id='8' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#8' aria-label='Anchor'></a><span class='plain-code'>8</span></h2>\n<p>I get it, it’s tough to get me to shut up about Macaroons.</p>\n\n<p>A couple years ago, I <a href='https://fly.io/blog/api-tokens-a-tedious-survey/' title=''>wrote a long survey of API token designs</a>, from JWTs (never!) to Biscuits. 
I had a <a href='https://fly.io/blog/api-tokens-a-tedious-survey/#macaroons' title=''>bunch to say about Macaroons</a>, not all of it positive, and said we’d be plowing forward with them at Fly.io.</p>\n\n<p>My plan had been to follow up soon after with a deep dive on Macaroons as we planned them for Fly.io. I’m glad I didn’t do that, not just because it would’ve been embarrassing to announce a feature that took us over 2 years to launch, but also because the process of working on this with Ben Toews changed a lot of my thinking about them.</p>\n\n<p>I think if you asked Ben, he’d say he had mixed feelings about how much complexity we wrangled to get this launched. On the other hand, we got a lot of things out of Macaroons without trying very hard:</p>\n\n<ul>\n<li>Security tokens you can (almost) email to your users and partners without putting your account at risk.\n</li><li>A flexible permission system, encoded directly into the tokens, that users can drive without talking to our servers.\n</li><li>A plugin system that users can (when we clean up the tooling) use themselves, to add things like Passkeys or two-person-approval rules or audit logging, without us getting in the middle.\n</li><li>An SSO system that can stack different IdPs, mandate SSO login, and do that on a per-<code>Organization</code> basis.\n</li><li><a href='https://www.latacora.com/blog/2018/06/12/a-childs-garden/' title=''>Inter-service authorization</a> that is traceable back to customer actions, so our servers can’t just make up which apps they’re allowed to look at.\n</li><li>An elegant way of exposing our own APIs to customer Fly Machines with ambient authentication, but without the <a href=\"https://github.com/SummitRoute/imdsv2_wall_of_shame/blob/main/README.md\">AWS IMDSv1 credential theft problem</a>.\n</li></ul>\n\n<p>There are downsides and warts! I’m mostly not telling you about them! Pure restrictive caveats are an awkward way to express some roles. 
And, blinded by my hunger to get Macaroons deployed, I spat in the face of science and used internal database IDs as our public caveat format, an act for which JP will never forgive me.</p>\n\n<p>If I’ve piqued your interest, <a href='https://github.com/superfly/macaroon' title=''>the code for this stuff is public</a>, along with some more <a href='https://github.com/superfly/macaroon/blob/main/macaroon-thought.md' title=''>detailed technical documentation</a>.</p>", "image": { "url": "https://fly.io/blog/macaroons-escalated-quickly/assets/evil-cookies-thumb.webp", "title": null }, "media": [], "authors": [ { "name": "Fly", "email": null, "url": null } ], "categories": [] }, { "id": "https://fly.io/blog/how-i-fly-yoko-li/", "title": "How Yoko Li makes towns, tamagoes, and tools for local AI", "description": null, "url": "https://fly.io/blog/how-i-fly-yoko-li/", "published": "2024-01-08T00:00:00.000Z", "updated": "2024-04-12T18:23:39.000Z", "content": "<p>Hello all, and welcome to another episode of How I Fly, a series where I interview developers about what they do with technology, what they find exciting, and the unexpected things they’ve learned along the way. This time I’m talking with <a href='https://twitter.com/stuffyokodraws' title=''>Yoko Li</a>, an investment partner at A16Z who’s also an open-source AI developer. She works on some of the most exciting AI projects in the world. 
I’m excited to share them with you today, with fun stories about the lessons she’s learned along the way.</p>\n<h2 id='cool-experiments' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#cool-experiments' aria-label='Anchor'></a><span class='plain-code'>Cool Experiments</span></h2>\n<p>One of Yoko’s most thought-provoking experiments is <a href='https://www.convex.dev/ai-town' title=''>AI Town</a>, a virtual town populated by AI agents that talk with each other. It takes advantage of the randomness of AI responses to create emergent behavior. When you open it, it looks like this:</p>\n\n<p><img alt=\"A picture of the AI Town homepage, a UI showing a top-down 2D RPG view with a visible river and a tent. The UI shows a conversation with the characters Alice and Stella.\" src=\"/blog/how-i-fly-yoko-li/assets/image1.webp\" /></p>\n\n<p>You can see the AI agents talking with each other and watch how the relationships between them form and change over time. It’s also a lot of fun to watch.</p>\n\n<p>One of Yoko’s other experiments is <a href='https://ai-tamago.fly.dev/' title=''>AI Tamago</a>, a <a href='https://en.wikipedia.org/wiki/Tamagotchi' title=''>Tamagotchi</a> virtual pet implemented with a large language model instead of the state machine that we’re all used to. AI Tamago uses an unmodified version of LLaMA 2 7B to take in game state and user inputs, then it generates what happens next. 
Every time you interact with your pet, it feeds data to LLaMA 2 and then uses Ollama’s JSON mode to generate unexpected output.</p>\n\n<p><img alt=\"A picture of the homepage of AI Tamago, showing a virtual pet with buttons to feed the pet, play with the pet, clean the pet, discipline the pet, check pet status, and deliver medical care to the pet.\" src=\"/blog/how-i-fly-yoko-li/assets/image4.webp\" /></p>\n\n<p>It’s all the fun of the classic Tamagotchi toys from the ’90s (including the ability to randomly discipline your virtual pet) without any of the coin cell batteries or having to carry around the little egg-shaped puck.</p>\n\n<p>But that’s just something you can watch, not something that’s as easy to play with on your own machine. Yoko has also worked on the <a href='https://github.com/ykhli/local-ai-stack' title=''>Local AI Starter Kit</a> that lets you go from zero to AI in minutes. It’s a collection of chains of models that let you ingest a bunch of documents, store them in a database, and then use those documents as context for a language model to generate responses. It’s everything you need to implement a “chat with a knowledge base” feature.</p>\n<h3 id='the-dark-of-ai-experiments' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#the-dark-of-ai-experiments' aria-label='Anchor'></a><span class='plain-code'>The dark of AI experiments</span></h3>\n<p>The Local AI Starter Kit is significant because normally to do this, you need to set up billing and API keys for at least four different API providers, and then you need to write a bunch of (hopefully robust) code to tie it all together. With the Local AI Starter Kit, you can do this on your own hardware, with your own data, and your own models privately. 
It’s a huge step forward for democratizing access to this technology.</p>\n\n<p>Document search is one of my favorite use cases for AI, and it’s one of the most immediately useful ones. It’s also one of the most fiddly and annoying to get right. To help illustrate this, I’ve made a diagram of the steps involved with setting up document search by hand:</p>\n\n<p><img alt=\"A diagram showing the process of ingesting a pile of markdown documents into a vector database. The documents are broken into a collection of sections, then each section is passed through an embedding model and the resulting vectors are stored in a vector database.\" src=\"/blog/how-i-fly-yoko-li/assets/image3.webp\" /></p>\n\n<p>You start with your Markdown documents. Most Markdown documents are easily broken up into sections where each section will focus on a single aspect of the larger topic of the document. You can take advantage of this best practice by letting people search for each section individually, which is typically a lot more useful than just searching the entire document.</p>\n<div class=\"right-sidenote\"><p>Okay, okay, fine. Language encircles concepts instead of defining them directly. The point still stands that we’re operating at a level “below” words and sentences; I don’t want to bog this down in a bunch of linear algebra that neither of us understands well enough to explain in a single paragraph like I am here. The main point is that it lets you “fuzzy match” relevant documents in a way that exact word search queries never could on their own.</p>\n</div>\n<p>Essentially, the vector embeddings that you generate from an embedding model are a mathematical representation of the “concepts” that the embedding model uses that are adjacent to the text of your documents. When you use the same model to generate embeddings for your documents and user queries, this lets you find documents that are similar to the query but don’t use precisely the same words. 
This is called “fuzzy searching” and it is one of the most difficult problems in computer science (right next to naming things).</p>\n\n<p>When a user comes to search the database, you do the same thing as ingestion:</p>\n\n<p><img alt=\"A diagram showing the full flow for doing document search Q&A with a vector database. The user submits a question to an API endpoint, the question is broken into embedding vectors and used to search for similar vectors in the database. The relevant document fragments are fed into the prompt for a large language model to generate a response that is grounded in the facts from the documents that were ingested. The response is streamed to the user one token at a time.\" src=\"/blog/how-i-fly-yoko-li/assets/image2.webp\" /></p>\n\n<p>The user query comes into your API endpoint. You use the same embedding model from earlier (omitted from the diagram for brevity) to turn that query into a vector. Then you query the same vector database to find documents that are similar to the query. Then you have a list of documents with metadata like the URL to the documentation page or section fragment in that page. From here you have two options. You can either use the documents to return a list of results to the user, or you can do the more fun thing: using those documents as context for a large language model to generate a response grounded in the relevant facts in those documents.</p>\n<div class=\"right-sidenote\"><p>I think it’s also how OpenAI’s custom GPTs work, but they haven’t released technical details about how they work so this is outright speculation on my part.</p>\n</div>\n<p>This basic pattern is called Retrieval-augmented Generation (RAG), and it’s how Bing’s copilot chatbot works. The Local AI Starter Kit makes setting this pipeline up <em>effortless</em> and <em>fast</em>. 
It’s a huge step forward for making this groundbreaking technology accessible to everyone.</p>\n<h2 id='the-struggles' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#the-struggles' aria-label='Anchor'></a><span class='plain-code'>The struggles</span></h2>\n<blockquote>\n<p>When I was trying to get the AI models in AI Town to output JSON, I tried a bunch of different things. I got some good results by telling the model to “only reply in JSON, no prose”, but we ended up using a model tuned for outputting code. I think I inspired <a href='https://ollama.ai' title=''>Ollama</a> to add their JSON output feature.</p>\n</blockquote>\n\n<p>One of the main benefits of large language models is that they are essentially stochastic models of the entire Internet. They have a bunch of patterns formed that can let you create surprisingly different outputs from similar inputs. This is also one of the main drawbacks of large language models: they are essentially stochastic models of the entire Internet. They have a bunch of patterns formed that can let you create surprisingly different outputs from similar inputs. The outputs of these models are usually correct-ish enough (more correct if you ground the responses in document fact like you do with a Retrieval-augmented Generation system), but they are not always aligned with our observable reality.</p>\n\n<p>A lot of the time you will get outputs that don’t make any logical or factual sense. These are called “hallucinations” and they are one of the main drawbacks of large language models. If a hallucination pops in at the worst times, you’ve accidentally told someone how to poison themselves with chocolate chip cookies. 
This is, as the kids say, “bad”.</p>\n\n<p>The inherent randomness of the output of a large language model means that it can be difficult to get an exactly parsable format. Most of the time, you’d be able to coax the model to get usable JSON output, but without schema it can sometimes generate wildly different JSON responses. Only sometimes. This isn’t deterministic and Yoko has found that this is one of the most frustrating parts of working with large language models.</p>\n<div class=\"right-sidenote\"><p>This works by making any offending ungrammatical tokens weighted to negative infinity. It’s amazingly hacky but the hilarious part is that it works.</p>\n</div>\n<p>However, there are workarounds. <a href='https://github.com/ggerganov/llama.cpp' title=''>llama.cpp</a> offers a way to use a grammar file to strictly guide the output of a large language model by using context-free grammar. This lets you get something more deterministic, but it’s still not perfect. It’s a lot better than nothing, though.</p>\n\n<p>One of the fun things that can happen with this is that you can have the model fail to generate anything but an endless stream of newlines in JSON mode. This is hilarious and usually requires some special detection logic to handle and restart the query. There’s work being done to let you use JSON schema to guide the generation of large language model outputs, but it’s not currently ready for the masses.</p>\n<div class=\"right-sidenote\"><p>If it’s dumb and it works, is it really dumb?</p>\n</div>\n<p>However, one of the easiest ways to hack around this is by using a model that generates code instead of text. This is how Yoko got the AI Town and AI Tamago models to output JSON that was mostly valid. It’s a hack, but it works. This was made a lot easier for AI town when one of the tools they use (<a href='https://ollama.ai' title=''>Ollama</a>) added support for JSON output from the model. 
This is a lot better than the code generation model hack, but research continues.</p>\n<h2 id='the-simple-joy-of-unexpected-outputs' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#the-simple-joy-of-unexpected-outputs' aria-label='Anchor'></a><span class='plain-code'>The simple joy of unexpected outputs</span></h2>\n<blockquote>\n<p>When I was making AI Town, I was inspired by <a href='https://en.wikipedia.org/wiki/The_Lifecycle_of_Software_Objects' title=''>The Lifecycle of Software Objects</a> by Ted Chiang. It’s about a former zookeeper that trained AI agents to be pets, kinda like how we use Reinforcement Learning from Human Feedback to train AI models like ChatGPT.</p>\n</blockquote>\n\n<p>However, at the same time, there are cases where hallucinations are not only useful, but they are what make the implementation of a system possible. If large language models are essentially massive banks of the word frequencies of a huge part of culture, then the emergent output can create unexpected things that happen frequently. This lets emergent behavior form, which can be the backbone of games and is the key thing that makes AI Town work as well as it does.</p>\n\n<p>AI Tamago is also completely driven off of the results of large language model hallucinations. They are the core of what drives user inputs, the game loop, and the surprising reactions you get when disciplining your pet. The status screen takes in the game state and lets you know what your pet is feeling in a way that the segment displays of the Tamagotchi toys could never do.</p>\n\n<p>These enable you to build workflows that are <em>augmented</em> by the inherent randomness of the hallucinations instead of seeing them as drawbacks. 
This means you need to choose outputs that can have the hallucinations shine instead of being ugly warts you need to continuously shave away. Instead of using them for doing pathfinding, have them drive the AI of your characters or write the A* pathfinding algorithm so you don’t have to write it again for the billionth time.</p>\n\n<p>I’m not saying that large language models can replace the output of a human, but they are more like a language server for human languages as well as programming languages. They are best used when you are generating the boilerplate you don’t want to do yourself, or when you are throwing science at the wall to see what sticks.</p>\n<h2 id='in-conclusion' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#in-conclusion' aria-label='Anchor'></a><span class='plain-code'>In conclusion</span></h2>\n<p>Yoko is showing people how to use AI today, on local machines, with models of your choice, that allow you to experiment, hack and learn.</p>\n\n<p>I can’t wait to see what’s next!</p>\n\n<p>If you want to follow what Yoko does, here are a few links to add to your feeds:</p>\n\n<ul>\n<li>Yoko’s <a href='https://twitter.com/stuffyokodraws' title=''>Twitter</a> (or X, or whatever we’re supposed to call it now)\n</li><li>Yoko’s <a href='https://github.com/ykhli' title=''>GitHub</a>\n</li><li>Yoko’s <a href='https://yoko.dev/' title=''>Website</a>\n</li></ul>\n\n<p>(insert standard conclusion diatribe here)</p>", "image": { "url": "https://fly.io/blog/how-i-fly-yoko-li/assets/chat-bird-cover-thumb.webp", "title": null }, "media": [], "authors": [ { "name": "Fly", "email": null, "url": null } ], "categories": [] }, { "id": "https://fly.io/blog/not-midjourney-bot/", "title": "Deploy Your Own (Not) Midjourney Bot on Fly GPUs", "description": null, "url": 
"https://fly.io/blog/not-midjourney-bot/", "published": "2024-01-04T00:00:00.000Z", "updated": "2024-04-12T18:23:39.000Z", "content": "<div class=\"lead\"><p>Fly.io has Enterprise-grade GPUs and servers all over the globe (or <em>disk</em>, depending on which side of the flat Earth debate you fall on) making it a great place to deploy your next disruptive AI app.</p>\n</div>\n<p>Some people daydream about normal things, like coffee machines or raising that Series A round (those are normal things to dream about, right?). I daydream about commanding a fleet of chonky <a href='https://resources.nvidia.com/en-us-l40s/l40s-datasheet-28413/' title=''>NVIDIA Lovelace L40Ss</a>. Also, totally normal. Well, fortunately for me and anyone else wanting to explore the world of generative AI — Fly.io has GPUs now!</p>\n\n<p>Sure, this technology will probably end up with the AI <a href='https://marketoonist.com/2023/03/ai-written-ai-read.html' title=''>talking to itself</a> while we go about our lives — but it seems like it’s here to stay, so we should at least have some fun with it. In this post we’ll put these GPUs to task and you’ll learn how to build your very own AI image-generating Discord bot, kinda like Midjourney. Available 24/7 and ready to serve up all the pictures of cats eating burritos your heart desires. 
And because I’d never tell you to draw the rest of the owl, I’ll link to working code that you can deploy today.</p>\n<h2 id='latent-diffusion-models-have-entered-the-chat' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#latent-diffusion-models-have-entered-the-chat' aria-label='Anchor'></a><span class='plain-code'>Latent Diffusion Models Have Entered the Chat</span></h2>\n<p>In the realm of AI image generation, two names have become prominent: Midjourney and Stable Diffusion. Both are image generating software that allow you to synthesize an image from a textual prompt. One is a closed source paid service, while the other is open source and can run locally. Midjourney gained popularity because it allowed the less technically-inclined among us to explore this technology through its ease of use. Stable Diffusion democratized access to the technology, but it can be quite tricky to get good results out of it.</p>\n\n<p>Enter <a href='https://github.com/lllyasviel/Fooocus' title=''>Fooocus</a> (pronounced <em>focus</em>), an open source project that combines the best of both worlds and offers a user-friendly interface to Stable Diffusion. It’s hands down the easiest way to get started with Stable Diffusion. Sure there are more popular tools like Stable Diffusion web UI and ComfyUI, but Fooocus adds some magic to reduce the need to manually tweak a bunch of settings. The most significant feature is probably GPT-2-based “<a href='https://github.com/lllyasviel/Fooocus/discussions/117#raw' title=''>prompt expansion</a>” to dynamically enhance prompts.</p>\n\n<p>The point of Fooocus is to <em>focus</em> on your prompt. The more you put into it, the more you get out. 
That said, a very simple prompt like “forest elf” can return high-quality images without the need to trawl the web for prompt ideas or fiddle with knobs and levers (although they’re there if you want them).</p>\n\n<p>So, what can this thing <em>do</em>? Well, this…</p>\n\n<p><img alt=\"A black and white sketch of hot-air balloon over a mountain range generated using Fooocus with \"Pencil Sketch Drawing\" style and quality = True\" src=\"/blog/not-midjourney-bot/assets/./balloon-sketch.webp\" /></p>\n\n<p>Here’s the full command I’ve used to generate this image: <code>/imagine prompt: sketch of hot-air balloon over a mountain range style1: Pencil Sketch Drawing quality: true ar: 1664×576</code></p>\n<h2 id='what-were-building' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#what-were-building' aria-label='Anchor'></a><span class='plain-code'>What We’re Building</span></h2>\n<p>We’ll deploy two applications. The code to run the bot itself will run on normal VM hardware, and the API server doing all the hard work synthesizing alpacas out of thin air will run on GPU hardware.</p>\n\n<p><img alt=\"An architecture diagram explaining how the two apps will communicate and return the requested image to an end user.\" src=\"/blog/not-midjourney-bot/assets/./arch-diagram.png?center&2/3\" /></p>\n\n<p>Fooocus is served up as a web UI by default, but with a little elbow grease we can interact with it as a REST API. Fortunately, with more than 25k stars on GitHub at the time of writing, the project has a lively open-source community, so we don’t need to do much work here — it’s already been done for us. <a href='https://github.com/konieshadow/Fooocus-API' title=''>Fooocus-API</a> is a project that shoves FastAPI in front of a Fooocus runtime. 
We’ll use this for the API server app.</p>\n\n<p>The Python-based bot connects to the <a href='https://discord.com/developers/docs/topics/gateway' title=''>Discord Gateway API</a> using the <a href='https://github.com/Pycord-Development/pycord' title=''>Pycord</a> library. When it starts up, it maintains an open pipe for data to flow back and forth via WebSockets. The bot app also includes a client that knows how to talk to the API server using Flycast and request the image it needs via HTTP.</p>\n\n<p>When we request an image from Discord using the <code>/imagine</code> slash command, we immediately respond using Pycord’s <code>defer()</code> function to let Discord know that the request has been received and the bot is working on it — it’ll take a few seconds to process your prompt, fabricate an image, upload it to Discord and let you share it with your friends. This is a blocking operation, so it won’t perform well if you have hundreds of people on your Discord Server using the command. For that, you’ll want to jiggle some wires to make the code non-blocking. But for now, this gives us a nice UX for the bot.</p>\n\n<p>When the API server returns the image, it gets saved to disk. 
We’ll use the fantastic <a href='https://github.com/sqids/sqids-python' title=''>Sqids</a> library to generate collision-free file names:</p>\n<div class=\"highlight-wrapper group relative python\">\n <button \n type=\"button\"\n class=\"bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n data-wrap-target=\"#code-75afx6ud\"\n >\n <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\" stroke-linecap=\"round\" stroke-linejoin=\"round\"><g buffered-rendering=\"static\"><path d=\"M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959\" /><path d=\"M11.081 6.466L9.533 8.037l1.548 1.571\" /></g></svg>\n <span class=\"bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950\">\n Wrap text\n </span>\n </button>\n <button \n type=\"button\"\n class=\"bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n data-copy-target=\"sibling\"\n >\n <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\"><g buffered-rendering=\"static\"><path d=\"M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z\" /><path d=\"M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617\" /></g></svg>\n <span 
class=\"bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950\">\n Copy to clipboard\n </span>\n </button>\n <div class='highlight relative group'>\n <pre class='highlight '><code id=\"code-75afx6ud\"><span class=\"n\">unique_id</span> <span class=\"o\">=</span> <span class=\"bp\">self</span><span class=\"p\">.</span><span class=\"n\">sqids</span><span class=\"p\">.</span><span class=\"n\">encode</span><span class=\"p\">(</span>\n <span class=\"p\">[</span><span class=\"n\">ctx</span><span class=\"p\">.</span><span class=\"n\">author</span><span class=\"p\">.</span><span class=\"nb\">id</span><span class=\"p\">,</span> <span class=\"nb\">int</span><span class=\"p\">(</span><span class=\"n\">time</span><span class=\"p\">.</span><span class=\"n\">time</span><span class=\"p\">())]</span>\n<span class=\"p\">)</span>\n\n<span class=\"n\">result_filename</span> <span class=\"o\">=</span> <span class=\"sa\">f</span><span class=\"s\">\"result_</span><span class=\"si\">{</span><span class=\"n\">unique_id</span><span class=\"si\">}</span><span class=\"s\">.png\"</span>\n</code></pre>\n </div>\n</div>\n<p>We’ll also use <code>asyncio</code> to check if the image is ready every second, and when it is, we send it off to Discord to complete the request:</p>\n<div class=\"highlight-wrapper group relative python\">\n <button \n type=\"button\"\n class=\"bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n data-wrap-target=\"#code-w1v7557b\"\n >\n <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\" stroke-linecap=\"round\" stroke-linejoin=\"round\"><g buffered-rendering=\"static\"><path d=\"M9.912 8.037h2.732c1.277 
0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959\" /><path d=\"M11.081 6.466L9.533 8.037l1.548 1.571\" /></g></svg>\n <span class=\"bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950\">\n Wrap text\n </span>\n </button>\n <button \n type=\"button\"\n class=\"bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n data-copy-target=\"sibling\"\n >\n <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\"><g buffered-rendering=\"static\"><path d=\"M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z\" /><path d=\"M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617\" /></g></svg>\n <span class=\"bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950\">\n Copy to clipboard\n </span>\n </button>\n <div class='highlight relative group'>\n <pre class='highlight '><code id=\"code-w1v7557b\"><span class=\"k\">while</span> <span class=\"ow\">not</span> <span class=\"n\">os</span><span class=\"p\">.</span><span class=\"n\">path</span><span class=\"p\">.</span><span class=\"n\">exists</span><span class=\"p\">(</span><span class=\"n\">result_filename</span><span class=\"p\">):</span>\n <span class=\"k\">await</span> <span class=\"n\">asyncio</span><span class=\"p\">.</span><span class=\"n\">sleep</span><span class=\"p\">(</span><span class=\"mi\">1</span><span class=\"p\">)</span>\n\n<span class=\"k\">with</span> <span class=\"nb\">open</span><span class=\"p\">(</span><span 
class=\"n\">result_filename</span><span class=\"p\">,</span> <span class=\"s\">\"rb\"</span><span class=\"p\">)</span> <span class=\"k\">as</span> <span class=\"n\">f</span><span class=\"p\">:</span>\n <span class=\"k\">await</span> <span class=\"n\">ctx</span><span class=\"p\">.</span><span class=\"n\">respond</span><span class=\"p\">(</span>\n <span class=\"nb\">file</span><span class=\"o\">=</span><span class=\"n\">discord</span><span class=\"p\">.</span><span class=\"n\">File</span><span class=\"p\">(</span><span class=\"n\">f</span><span class=\"p\">,</span> <span class=\"n\">result_filename</span><span class=\"p\">)</span>\n <span class=\"p\">)</span>\n</code></pre>\n </div>\n</div>\n<p>Neither of these two apps will be exposed to the Internet, yet they’ll still be able to communicate with each other. One of the undersold stories about Fly.io is the ease with which two applications can communicate over the private network. We assign special IPv6 private network (6pn) addresses within the same organizational space and applications can effortlessly discover and connect to one another without any additional configuration.</p>\n\n<p>But what about load balancing and this “scale-to-zero” thing? We don’t <em>just</em> want our two apps to talk to each other, we want the Fly Proxy to start our Machine when a request comes in, and stop it when idle. For that, we’ll need <a href='https://fly.io/docs/reference/private-networking/#flycast-private-load balancing' title=''>Flycast</a>, our private load balancing feature.</p>\n\n<p>When you assign a Flycast IP to your app, you can route requests using a special <code>.flycast</code> domain. Those requests are routed through the Fly Proxy instead of directly to instances in your app. Meaning you get all the load balancing, rate limiting and other proxy goodness that you’re accustomed to. The Proxy runs a process which can automatically downscale Machines every few minutes. 
It’ll also start them right back up when a request comes in — this means we can take advantage of scale-to-zero, saving us a bunch of money!</p>\n<h2 id='the-imagine-command' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#the-imagine-command' aria-label='Anchor'></a><span class='plain-code'>The <code>/imagine</code> Command</span></h2>\n<p>The slash command is the heart of your bot, enabling you to generate images based on your prompt, right from within Discord. When you type <code>/imagine</code> into the Discord chat, you’ll see some command options pop up.</p>\n\n<p>You’ll need to input your base prompt (e.g. “an alpaca sleeping in a grassy field”) and optionally pick some styles (“Pencil Sketch Drawing”, “Futuristic Retro Cyberpunk”, “MRE Dark Cyberpunk” etc). With Fooocus, combining multiple styles — “style-chaining” — can help you achieve amazing results. Set the aspect ratio or provide negative prompts if needed, too.</p>\n\n<p>After you execute the command, the bot will request the image from the API, then send it as a response in the chat. 
Let’s see it in action!</p>\n\n<p><img alt=\"A GIF demo run-through showcasing the ability of the bot to generate images from Discord\" src=\"/blog/not-midjourney-bot/assets/./demo.gif?card&amp;center\" /></p>\n<h2 id='deployment-speedrun' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#deployment-speedrun' aria-label='Anchor'></a><span class='plain-code'>Deployment Speedrun</span></h2>\n<p><strong class='font-semibold text-navy-950'>First, we’ll deploy the API server.</strong> For convenience (and to speed things up), we’ll use a pre-built image when we deploy. With dependencies like <code>torch</code> and <code>torchvision</code> bundled in, it’s a hefty image weighing in just shy of 12GB. On a normal Fly Machine this would be not just a bad idea but impossible, because the VM’s rootfs is limited to 8GB. Fortunately, the wizards behind Fly GPUs have accounted for our need to run huge models and their dependencies, and awarded us 50GB of rootfs.</p>\n<div class=\"right-sidenote\"><p>Fly GPUs use <a href=\"https://github.com/cloud-hypervisor/cloud-hypervisor\" title=\"\">Cloud Hypervisor</a> for virtualization, not <a href=\"https://github.com/firecracker-microvm/firecracker\" title=\"\">Firecracker</a> (which regular Fly Machines use). But even with a 12GB image, this doesn’t stop the Machine from booting in seconds when a new request comes in through the Proxy.</p>\n</div>\n<p>To start, clone the template <a href='https://github.com/fly-apps/not-midjourney-bot' title=''>repository</a>. You’ll need this for both the bot and server apps. 
Then deploy the server with the Fly CLI:</p>\n<div class=\"highlight-wrapper group relative bash\">\n <button \n type=\"button\"\n class=\"bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n data-wrap-target=\"#code-ytt1j7os\"\n >\n <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\" stroke-linecap=\"round\" stroke-linejoin=\"round\"><g buffered-rendering=\"static\"><path d=\"M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959\" /><path d=\"M11.081 6.466L9.533 8.037l1.548 1.571\" /></g></svg>\n <span class=\"bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950\">\n Wrap text\n </span>\n </button>\n <button \n type=\"button\"\n class=\"bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n data-copy-target=\"sibling\"\n >\n <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\"><g buffered-rendering=\"static\"><path d=\"M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z\" /><path d=\"M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617\" /></g></svg>\n <span class=\"bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950\">\n Copy to 
clipboard\n </span>\n </button>\n <div class='highlight relative group'>\n <pre class='highlight '><code id=\"code-ytt1j7os\">fly deploy <span class=\"se\">\\</span>\n <span class=\"nt\">--image</span> ghcr.io/fly-apps/not-midjourney-bot:server <span class=\"se\">\\</span>\n <span class=\"nt\">--config</span> ./server/fly.toml <span class=\"se\">\\</span>\n <span class=\"nt\">--no-public-ips</span>\n</code></pre>\n </div>\n</div>\n<p>This command tells Fly.io to deploy your application based on the configuration specified in the <code>fly.toml</code>, while the <code>--no-public-ips</code> flag secures your app by not exposing it to the public Internet.</p>\n\n<p>Remember Flycast? To use it, we’ll allocate a private IPv6:</p>\n<div class=\"highlight-wrapper group relative bash\">\n <button \n type=\"button\"\n class=\"bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n data-wrap-target=\"#code-g3tqfpkl\"\n >\n <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\" stroke-linecap=\"round\" stroke-linejoin=\"round\"><g buffered-rendering=\"static\"><path d=\"M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959\" /><path d=\"M11.081 6.466L9.533 8.037l1.548 1.571\" /></g></svg>\n <span class=\"bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950\">\n Wrap text\n </span>\n </button>\n <button \n type=\"button\"\n class=\"bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid 
place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n data-copy-target=\"sibling\"\n >\n <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\"><g buffered-rendering=\"static\"><path d=\"M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z\" /><path d=\"M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617\" /></g></svg>\n <span class=\"bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950\">\n Copy to clipboard\n </span>\n </button>\n <div class='highlight relative group'>\n <pre class='highlight '><code id=\"code-g3tqfpkl\">fly ips allocate-v6 <span class=\"nt\">--private</span>\n</code></pre>\n </div>\n</div>\n<p>Now, let’s take a look at our <a href='https://github.com/fly-apps/not-midjourney-bot/blob/134bb634f97bf81040e489650f2334b48d976c10/server/fly.toml' title=''><code>fly.toml</code></a> config:</p>\n<div class=\"highlight-wrapper group relative toml\">\n <button \n type=\"button\"\n class=\"bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n data-wrap-target=\"#code-a3s9879o\"\n >\n <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\" stroke-linecap=\"round\" stroke-linejoin=\"round\"><g buffered-rendering=\"static\"><path d=\"M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959\" /><path d=\"M11.081 6.466L9.533 8.037l1.548 1.571\" /></g></svg>\n <span class=\"bubble-sm 
bubble-tl [--offset-l:-9px] tail text-navy-950\">\n Wrap text\n </span>\n </button>\n <button \n type=\"button\"\n class=\"bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n data-copy-target=\"sibling\"\n >\n <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\"><g buffered-rendering=\"static\"><path d=\"M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z\" /><path d=\"M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617\" /></g></svg>\n <span class=\"bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950\">\n Copy to clipboard\n </span>\n </button>\n <div class='highlight relative group'>\n <pre class='highlight '><code id=\"code-a3s9879o\"><span class=\"py\">app</span> <span class=\"p\">=</span> <span class=\"s\">\"alpaca-image-gen\"</span>\n<span class=\"py\">primary_region</span> <span class=\"p\">=</span> <span class=\"s\">\"ord\"</span>\n\n<span class=\"nn\">[[vm]]</span>\n <span class=\"py\">size</span> <span class=\"p\">=</span> <span class=\"s\">\"performance-8x\"</span>\n <span class=\"py\">memory</span> <span class=\"p\">=</span> <span class=\"s\">\"16gb\"</span>\n <span class=\"py\">gpu_kind</span> <span class=\"p\">=</span> <span class=\"s\">\"l40s\"</span>\n\n<span class=\"nn\">[[services]]</span>\n <span class=\"py\">internal_port</span> <span class=\"p\">=</span> <span class=\"mi\">8888</span>\n <span class=\"py\">protocol</span> <span class=\"p\">=</span> <span class=\"s\">\"tcp\"</span>\n <span 
class=\"py\">auto_stop_machines</span> <span class=\"p\">=</span> <span class=\"kc\">true</span>\n <span class=\"py\">auto_start_machines</span> <span class=\"p\">=</span> <span class=\"kc\">true</span>\n <span class=\"py\">min_machines_running</span> <span class=\"p\">=</span> <span class=\"mi\">0</span>\n\n <span class=\"nn\">[[services.ports]]</span>\n <span class=\"py\">handlers</span> <span class=\"p\">=</span> <span class=\"nn\">[\"http\"]</span>\n <span class=\"py\">port</span> <span class=\"p\">=</span> <span class=\"mi\">80</span>\n <span class=\"py\">force_https</span> <span class=\"p\">=</span> <span class=\"kc\">false</span>\n\n<span class=\"nn\">[mounts]</span>\n <span class=\"py\">source</span> <span class=\"p\">=</span> <span class=\"s\">\"repositories\"</span>\n <span class=\"py\">destination</span> <span class=\"p\">=</span> <span class=\"s\">\"/app/repositories\"</span>\n <span class=\"py\">initial_size</span> <span class=\"p\">=</span> <span class=\"s\">\"20gb\"</span>\n</code></pre>\n </div>\n</div>\n<p>There are a few key things to note here:</p>\n\n<ol>\n<li>Currently, the NVIDIA L40Ss we’re using when we specify <code>gpu_kind</code> are only available in <code>ORD</code>, so that’s what we’ve set the <code>primary_region</code> to. We’re rolling out more GPUs to more regions in a hurry — but for now we’ll host the bot in Chicago.\n</li><li>Out of the box, 8GB of system RAM is suggested. In my testing this wasn’t close to enough: the Machine would frequently run out of memory and crash. I got things working better by using 16GB of RAM.\n</li><li>The FastAPI server binds to port 8888; we need to set this as our <code>internal_port</code>, or the Fly Proxy won’t know where to send requests.\n</li><li>We want our Machine to <a href='https://fly.io/docs/apps/autostart-stop/' title=''>automatically stop and start</a>.\n</li><li>Flycast doesn’t do HTTPS, so we won’t force it here. 
Don’t worry, it’s still encrypted over the wire!\n</li><li>A volume is automatically created on the first deploy. On first boot, the app clones the Fooocus repo and downloads the Stable Diffusion model checkpoints onto that volume. This takes a couple of minutes, but the next time the Machine starts, it’ll have everything it needs to serve a request within seconds.\n</li></ol>\n<div class=\"callout\"><p>The <a href=\"https://github.com/fly-apps/not-midjourney-bot/blob/84e72d1e7048627b7c845fe3d44d45b278e451d5/README.md\" title=\"\"><strong class=\"font-semibold text-navy-950\">README</strong></a> for this project has detailed instructions about setting up your Discord bot and adding it to a server. After setting up the permissions and privileged intents, you’ll get an OAuth2 URL. Use this URL to invite your bot to your Discord server and confirm the permissions. Once that’s done, grab your Discord API token; you’ll need it for the next step.</p>\n</div>\n<p><strong class='font-semibold text-navy-950'>With the API server up and running, it’s time to deploy the Discord bot.</strong> This app will run on a normal Fly Machine, no GPU required. First, set the <code>DISCORD_TOKEN</code> and <code>FOOOCUS_API_URL</code> (the Flycast endpoint for the API server) secrets, using the Fly CLI. 
Then deploy:</p>\n<div class=\"highlight-wrapper group relative bash\">\n <button \n type=\"button\"\n class=\"bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n data-wrap-target=\"#code-314htg3w\"\n >\n <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\" stroke-linecap=\"round\" stroke-linejoin=\"round\"><g buffered-rendering=\"static\"><path d=\"M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959\" /><path d=\"M11.081 6.466L9.533 8.037l1.548 1.571\" /></g></svg>\n <span class=\"bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950\">\n Wrap text\n </span>\n </button>\n <button \n type=\"button\"\n class=\"bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n data-copy-target=\"sibling\"\n >\n <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\"><g buffered-rendering=\"static\"><path d=\"M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z\" /><path d=\"M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617\" /></g></svg>\n <span class=\"bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950\">\n Copy to clipboard\n </span>\n </button>\n 
<div class='highlight relative group'>\n <pre class='highlight '><code id=\"code-314htg3w\">fly deploy <span class=\"se\">\\</span>\n <span class=\"nt\">--image</span> ghcr.io/fly-apps/not-midjourney-bot:bot <span class=\"se\">\\</span>\n <span class=\"nt\">--config</span> ./bot/fly.toml <span class=\"se\">\\</span>\n <span class=\"nt\">--no-public-ips</span>\n</code></pre>\n </div>\n</div>\n<p>Notice that the bot app doesn’t need to be publicly visible on the Internet either. Under the hood, the WebSocket connection to Discord’s Gateway API allows the bot to communicate freely without the need to define any services in our <code>fly.toml</code>. This also means that the Fly Proxy will not downscale the app like it does the GPU Machine — the bot will always appear “online”.</p>\n<figure class=\"post-cta\">\n <figcaption>\n <h1>Not interested in GPUs?</h1>\n <p>You can still deploy apps on Fly.io today and be up and running in a matter of minutes.</p>\n <a class=\"btn btn-lg\" href=\"https://fly.io/docs/speedrun/\">\n Deploy an app now<span class='opacity-50'>→</span>\n </a>\n </figcaption>\n <div class=\"image-container\">\n <img src=\"/static/images/cta-kitty.webp\" srcset=\"/static/images/[email protected] 2x\" alt=\"\">\n </div>\n</figure>\n\n<h2 id='how-do-i-know-this-thing-is-using-gpu-for-reals' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#how-do-i-know-this-thing-is-using-gpu-for-reals' aria-label='Anchor'></a><span class='plain-code'>How Do I Know This Thing Is Using GPU for Reals?</span></h2>\n<p>That’s easy! 
NVIDIA provides us with a neat little command-line utility called <code>nvidia-smi</code> which we can use to monitor and get information about NVIDIA GPU devices.</p>\n\n<p>Let’s SSH to the running Machine for the API server app and run an <code>nvidia-smi</code> query in one go. It’s a little clunky, but you’ll get the point:</p>\n<div class=\"highlight-wrapper group relative bash\">\n <button \n type=\"button\"\n class=\"bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n data-wrap-target=\"#code-v0fauj3q\"\n >\n <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\" stroke-linecap=\"round\" stroke-linejoin=\"round\"><g buffered-rendering=\"static\"><path d=\"M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959\" /><path d=\"M11.081 6.466L9.533 8.037l1.548 1.571\" /></g></svg>\n <span class=\"bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950\">\n Wrap text\n </span>\n </button>\n <button \n type=\"button\"\n class=\"bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n data-copy-target=\"sibling\"\n >\n <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\"><g buffered-rendering=\"static\"><path d=\"M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 
1.815-1.815V7.239z\" /><path d=\"M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617\" /></g></svg>\n <span class=\"bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950\">\n Copy to clipboard\n </span>\n </button>\n <div class='highlight relative group'>\n <pre class='highlight '><code id=\"code-v0fauj3q\">fly ssh console <span class=\"se\">\\</span>\n <span class=\"nt\">-C</span> <span class=\"s2\">\"nvidia-smi --query-gpu=gpu_name,utilization.gpu,utilization.memory,temperature.gpu,power.draw --format=csv,noheader --loop\"</span>\n</code></pre>\n </div>\n</div><div class=\"highlight-wrapper group relative \">\n <button \n type=\"button\"\n class=\"bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n data-wrap-target=\"#code-j86zv5m2\"\n >\n <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\" stroke-linecap=\"round\" stroke-linejoin=\"round\"><g buffered-rendering=\"static\"><path d=\"M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959\" /><path d=\"M11.081 6.466L9.533 8.037l1.548 1.571\" /></g></svg>\n <span class=\"bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950\">\n Wrap text\n </span>\n </button>\n <button \n type=\"button\"\n class=\"bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n 
data-copy-target=\"sibling\"\n >\n <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\"><g buffered-rendering=\"static\"><path d=\"M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z\" /><path d=\"M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617\" /></g></svg>\n <span class=\"bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950\">\n Copy to clipboard\n </span>\n </button>\n <div class='highlight relative group'>\n <pre class='highlight '><code id=\"code-j86zv5m2\">Connecting to fdaa:2:f664:a7b:210:d8b2:8fd8:2... complete\n\nNVIDIA L40S, 0 %, 0 %, 46, 88.63 W\nNVIDIA L40S, 0 %, 0 %, 46, 88.61 W\nNVIDIA L40S, 36 %, 4 %, 51, 103.41 W\nNVIDIA L40S, 65 %, 25 %, 57, 280.90 W\nNVIDIA L40S, 0 %, 0 %, 49, 91.13 W\nNVIDIA L40S, 0 %, 0 %, 48, 89.76 W\n</code></pre>\n </div>\n</div>\n<p>What we’ve done is run the command on a loop while the bot is actually doing work synthesizing an image and we get to see it ramp up and consume more wattage and VRAM. The card is barely breaking a sweat!</p>\n<h2 id='how-much-will-these-alpaca-pics-cost-me' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#how-much-will-these-alpaca-pics-cost-me' aria-label='Anchor'></a><span class='plain-code'>How Much Will These Alpaca Pics Cost Me?</span></h2>\n<p>Let’s talk about the cost-effectiveness of this setup. On Fly.io, an L40S GPU <a href='https://fly.io/docs/about/pricing/#gpus-and-fly-machines' title=''>costs</a> $2.50/hr. 
Tag on a few cents per hour for the VM resources and storage for our models and you’re looking at about $3.20/hr to run the GPU Machine. It’s <em>on-demand</em>, too: if you’re not using the compute, you’re not paying for it! Keep in mind that some of these checkpoint models run to several gigabytes, and if you create a volume to store them, you’ll be charged for it even when no Machines are running. It’s also worth noting that the non-GPU bot app falls into our <a href='https://fly.io/docs/about/pricing/#free-allowances' title=''>free allowance</a>.</p>\n<div class=\"right-sidenote\"><p>Rates are on-demand, with no minimum usage requirements. Discounted rates for reserved GPU Machines and dedicated hosts are also available if you email <a href=\"mailto:[email protected]\" title=\"\">[email protected]</a>.</p>\n</div>\n<p>By comparison, Midjourney offers several subscription tiers; the cheapest plan costs $10/mo and provides 3.3 hours of “fast” GPU time (roughly equivalent to an enterprise-grade Fly GPU). This works out to about $3/hr ($10 ÷ 3.3 hours), give or take a few cents.</p>\n<h2 id='where-can-i-take-this' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#where-can-i-take-this' aria-label='Anchor'></a><span class='plain-code'>Where Can I Take This?</span></h2>\n<p>There is a lot you can do to build out the bot’s functionality. You control the source code for the bot, meaning that you can make it do <em>whatever you want</em>. You might decide to mimic Midjourney’s <code>/blend</code> command to splice your own images into prompts (AKA img2img diffusion). You can do this by adding more commands to your <a href='https://guide.pycord.dev/popular-topics/cogs' title=''>Cog</a>, Pycord’s way of grouping similar commands. 
You might decide to add a button to roll the image if you don’t like it, or even specify the number of images to return. The possibilities are endless and your cloud bill’s the limit!</p>\n\n<p>The full code for the bot and server (with detailed instructions on how to deploy it on Fly.io) can be found <a href='https://github.com/fly-apps/not-midjourney-bot' title=''><strong class='font-semibold text-navy-950'>here</strong></a>.</p>", "image": { "url": "https://fly.io/blog/not-midjourney-bot/assets/purple-balloon-taking-off-thumb.webp", "title": null }, "media": [], "authors": [ { "name": "Fly", "email": null, "url": null } ], "categories": [] }, { "id": "https://fly.io/blog/fly-with-alpine/", "title": "Fly With Alpine", "description": null, "url": "https://fly.io/blog/fly-with-alpine/", "published": "2023-12-21T00:00:00.000Z", "updated": "2024-04-12T18:23:39.000Z", "content": "<div class=\"lead\"><p>Reduce image sizes and improve startup times by switching your base image to Alpine Linux.</p>\n</div>\n<p>Before proceeding, a caution. This is an engineering trade-off. Test carefully before deploying to production.</p>\n\n<p>By the end of this blog post you should have the information you need to make an informed decision.</p>\n<h2 id='introduction' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#introduction' aria-label='Anchor'></a><span class='plain-code'>Introduction</span></h2>\n<p><a href='https://www.alpinelinux.org/about/' title=''>Alpine Linux</a> is a Linux distribution that advertises itself as Small. Simple. Secure.</p>\n\n<p>It is indisputably smaller than the alternatives – when measured by image size. More on that in a bit. Some claim that this results in less memory usage and better performance. Others dispute these claims. 
For these, it is best to test the results yourself with your own application.</p>\n\n<p>Simple is harder to measure. Some of the larger differences, like <a href='https://github.com/OpenRC/openrc#readme' title=''>OpenRC</a> vs <a href='https://systemd.io/' title=''>systemd</a>, are less relevant in container environments. Others, like <a href='https://busybox.net/' title=''>BusyBox</a>, are implementation details. Essentially what you get is a Linux distribution with perhaps a number of standard packages (e.g., bash) not installed by default, but these can be easily added if needed.</p>\n\n<p>Secure is definitely an important attribute. The alternatives make comparable claims in this area. Do your own research and come to your own conclusions.</p>\n\n<p>Not mentioned is the downside: Alpine Linux has a smaller ecosystem than the alternatives, particularly when compared to Debian.</p>\n<h2 id='baseline' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#baseline' aria-label='Anchor'></a><span class='plain-code'>Baseline</span></h2>\n<p>Let’s start with a baseline consisting of the Dockerfiles produced by <code>fly launch</code> for some of the most popular\nframeworks:</p>\n<div class=\"highlight-wrapper group relative \">\n <button \n type=\"button\"\n class=\"bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n data-wrap-target=\"#code-ywliy2hv\"\n >\n <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\" 
stroke-linecap=\"round\" stroke-linejoin=\"round\"><g buffered-rendering=\"static\"><path d=\"M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959\" /><path d=\"M11.081 6.466L9.533 8.037l1.548 1.571\" /></g></svg>\n <span class=\"bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950\">\n Wrap text\n </span>\n </button>\n <button \n type=\"button\"\n class=\"bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n data-copy-target=\"sibling\"\n >\n <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\"><g buffered-rendering=\"static\"><path d=\"M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z\" /><path d=\"M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617\" /></g></svg>\n <span class=\"bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950\">\n Copy to clipboard\n </span>\n </button>\n <div class='highlight relative group'>\n <pre class='highlight '><code id=\"code-ywliy2hv\">FROM fideloper/fly-laravel:${PHP_VERSION}\nFROM hexpm/elixir:1.12.3-erlang-24.1.4-debian-bullseye-20210902-slim\nFROM node:${NODE_VERSION}-slim\nFROM oven/bun:${BUN_VERSION}-slim\nFROM python:${PYTHON_VERSION}-slim-bullseye\nFROM ruby:$RUBY_VERSION-slim\n</code></pre>\n </div>\n</div>\n<p>What may not be obvious to the naked eye from these results is that the base image for these is one of the following:</p>\n\n<ul>\n<li>Debian Bookworm (the current “stable” distribution)\n</li><li>Debian Bullseye (the 
previous “stable” distribution)\n</li><li>Ubuntu Focal Fossa (the previous LTS release of Ubuntu)\n</li></ul>\n\n<p>Once you factor in that Ubuntu is based on Debian, the conclusion is that Debian is effectively the default distribution for Fly.io. Rest assured that this isn’t the result of a devious conspiracy by Fly.io, but rather a reflection of the default choices made independently by the developers of a number of frameworks and runtimes. Beyond this, all Fly.io is doing is choosing the “slim” version of the default distribution for each framework as the base.</p>\n\n<p>What’s likely going on here is a virtuous circle: people choose Debian because of the ecosystem, and the ecosystem grows because people choose Debian.</p>\n\n<p>Now let’s compare base image sizes:</p>\n<table class=\"ml-8 mb-8\">\n<thead>\n<tr>\n <th class=\"px-8\">\n <th class=\"px-8 underline\">Alpine\n <th class=\"px-8 underline\">Debian slim\n</tr>\n</thead>\n<tbody>\n<tr>\n <th class=\"text-left\">Bun 1.0.18\n <td class=\"text-center\">43.10M\n <td class=\"text-center\">63.84M\n</tr>\n<tr>\n <th class=\"text-left\">Node 21.4.0\n <td class=\"text-center\">46.83M\n <td class=\"text-center\">70.08M\n</tr>\n<tr>\n <th class=\"text-left\">Python 3.12.1\n <td class=\"text-center\">17.59M\n <td class=\"text-center\">45.36M\n</tr>\n<tr>\n <th class=\"text-left\">Ruby 3.2\n <td class=\"text-center\">40.14M\n <td class=\"text-center\">74.36M\n</tr>\n</tbody>\n</table>\n\n\n<p>And these numbers are just for the base images. I’ve measured a minimal Rails/PostgreSQL/esbuild application at 304MB on Alpine and 428MB on Debian Slim. A minimal Bun application came in at 110MB on Alpine and 173MB on Debian Slim. 
And a minimal Node application at 142MB on Alpine and 207MB on Debian Slim.</p>\n\n<p>In each case, corresponding Alpine images are consistently smaller than their Debian slim equivalents.</p>\n<h2 id='switching-distributions' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#switching-distributions' aria-label='Anchor'></a><span class='plain-code'>Switching Distributions</span></h2>\n<p>Switching distributions (and switching back!) is easy.</p>\n\n<p>The first change is to replace <code>-slim</code> with <code>-alpine</code> in <code>FROM</code> statements in your <code>Dockerfile</code>.</p>\n\n<p>Next is to replace <code>apt-get update</code> with <code>apk update</code> and <code>apt-get install</code> with <code>apk add</code>. Delete any options you may have like <code>-y</code> and <code>--no-install-recommends</code>; they aren’t needed.</p>\n\n<p>Now review the names of the packages you are installing. Many are named the same. A few are different.\nYou can use <a href='https://pkgs.alpinelinux.org/packages' title=''>alpine packages</a> to look for ones to use. 
Some examples of\ndifferences:</p>\n<table class=\"ml-8 mb-8\" style=\"border-collapse: separate; border-spacing: 1rem 0\">\n<thead>\n<tr>\n <th class=\"px-8 underline text-left\">Debian\n <th class=\"px-8 underline text-left\">Alpine\n</tr>\n</thead>\n<tbody>\n<tr>\n <td>build-essential\n <td>build-base\n</tr>\n<tr>\n <td>chromium-sandbox\n <td>chromium-chromedriver\n</tr>\n<tr>\n <td>default-libmysqlclient-dev\n <td>mysql-client\n</tr>\n<tr>\n <td>default-mysqlclient\n <td>mysql-client\n</tr>\n<tr>\n <td>freetds-bin\n <td>freetds\n</tr>\n<tr>\n <td>libicu-dev\n <td>icu-dev\n</tr>\n<tr>\n <td>libjemalloc\n <td>jemalloc-dev\n</tr>\n<tr>\n <td>libjpeg-dev\n <td>jpeg-dev\n</tr>\n<tr>\n <td>libmagickwand-dev\n <td>imagemagick-libs\n</tr>\n<tr>\n <td>libsqlite3-0\n <td>sqlite-dev\n</tr>\n<tr>\n <td>libtiff-dev\n <td>tiff-dev\n</tr>\n<tr>\n <td>libvips\n <td>vips-dev\n</tr>\n<tr>\n <td>node-gyp\n <td>gyp\n</tr>\n<tr>\n <td>pkg-config\n <td>pkgconfig\n</tr>\n<tr>\n <td>python\n <td>python3\n</tr>\n<tr>\n <td>python-is-python3\n <td>python3\n</tr>\n<tr>\n <td>sqlite3\n <td>sqlite\n</tr>\n</tbody>\n</table>\n\n\n<p>Note: the above is just an approximation. For example, while <code>libsqlite3-0</code> and <code>sqlite-dev</code> include everything\nyou need to build an application that uses sqlite3, all that is needed at runtime is <code>sqlite-lib</code>. This relentless attention to detail contributes to smaller final image sizes.</p>\n\n<p>Note: For Bun, Node, and Rails users, knowledge of how to apply the above changes is included in recent versions of the dockerfile generators that we provide. 
After all, computers are good at <code>if</code> statements:</p>\n<div class=\"highlight-wrapper group relative \">\n <button \n type=\"button\"\n class=\"bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n data-wrap-target=\"#code-q2q9lq4b\"\n >\n <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\" stroke-linecap=\"round\" stroke-linejoin=\"round\"><g buffered-rendering=\"static\"><path d=\"M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959\" /><path d=\"M11.081 6.466L9.533 8.037l1.548 1.571\" /></g></svg>\n <span class=\"bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950\">\n Wrap text\n </span>\n </button>\n <button \n type=\"button\"\n class=\"bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n data-copy-target=\"sibling\"\n >\n <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\"><g buffered-rendering=\"static\"><path d=\"M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z\" /><path d=\"M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617\" /></g></svg>\n <span class=\"bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] 
text-navy-950\">\n Copy to clipboard\n </span>\n </button>\n <div class='highlight relative group'>\n <pre class='highlight '><code id=\"code-q2q9lq4b\">bunx dockerfile --alpine\nnpx dockerfile --alpine\nbin/rails generate dockerfile --alpine\n</code></pre>\n </div>\n</div><figure class=\"post-cta\">\n <figcaption>\n <h1>Choose your own Linux Distribution</h1>\n <p>Deploy your project in a few minutes with Fly Launch. Then do more with Fly Machines.</p>\n <a class=\"btn btn-lg\" href=\"https://fly.io/docs/\">\n Run your entire stack near your users\n </a>\n </figcaption>\n <div class=\"image-container\">\n <img src=\"/static/images/cta-rabbit.webp\" srcset=\"/static/images/[email protected] 2x\" alt=\"\">\n </div>\n</figure>\n\n<h2 id='potential-issues' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#potential-issues' aria-label='Anchor'></a><span class='plain-code'>Potential issues</span></h2>\n<p>Over time, we’ve noted a number of issues.</p>\n\n<ul>\n<li>Alpine uses <a href='https://musl.libc.org/' title=''>musl</a> for a runtime library. Debian uses <a href='https://www.gnu.org/software/libc/' title=''>glibc</a>. Software tested on glibc may not work as expected on musl. And there are other potential compatibility issues like <a href='https://bell-sw.com/blog/how-to-deal-with-alpine-dns-issues/' title=''>DNS</a>.\n</li><li>Debian includes both <code>adduser</code> and <code>useradd</code>. 
Alpine, by default, only includes <code>adduser</code>.\nThis can be addressed by installing a package like <a href='https://pkgs.alpinelinux.org/package/edge/community/armv7/shadow' title=''>shadow</a>, or switching to <code>adduser</code>.\n</li><li>Packages like <a href='https://github.com/nodenv/node-build' title=''>node-build</a> require <code>bash</code>, which isn’t included by default. Adding it back in allows <code>node-build</code> to run to completion, but the end result is that a precompiled Debian executable is installed that won’t run on Alpine.\nAn alternative is to download an <a href='https://unofficial-builds.nodejs.org/' title=''>unofficial build</a>.\n</li><li>Release candidates for Alpine may not get the same level of testing as Debian, resulting in problems\nlike <a href='https://github.com/sparklemotion/sqlite3-ruby/issues/434' title=''>sqlite3-ruby not working on Alpine 3.19</a>.\nIn cases like this, stay back on previous versions of Alpine for a short while, or compile the gem for yourself.\nThese issues are temporary.\n</li><li>Some packages, like Chrome, are not available for Alpine. Alternatives like Chromium may be necessary.\n</li></ul>\n<h2 id='conclusion' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#conclusion' aria-label='Anchor'></a><span class='plain-code'>Conclusion</span></h2>\n<p>While not as large a community as Debian, there is a substantial number of happy Alpine users.</p>\n\n<p>For the foreseeable future, the default for both the frameworks and therefore Fly.io will remain Debian, but we make it easy to switch.</p>\n\n<p>Try it out! 
Hopefully this blog post has provided insight into what you should evaluate before you switch.</p>", "image": { "url": "https://fly.io/blog/fly-with-alpine/assets/fly-with-alpine-thumb.webp", "title": null }, "media": [], "authors": [ { "name": "Fly", "email": null, "url": null } ], "categories": [] }, { "id": "https://fly.io/blog/fks/", "title": "Introducing Fly Kubernetes", "description": null, "url": "https://fly.io/blog/fks/", "published": "2023-12-18T00:00:00.000Z", "updated": "2024-04-12T18:23:39.000Z", "content": "<div class=\"lead\"><p>We’re Fly.io, and if you’ve been following us awhile you probably just did a double-take. We’re building a new public cloud that runs containerized applications with virtual machine isolation on our own hardware around the world. And we’ve been doing it without any K8s. Until now!</p>\n</div><div class=\"callout\"><p><strong class=\"font-semibold text-navy-950\">Update, March 2024:</strong> FKS does more stuff now, and you can read about it in <a href=\"https://fly.io/blog/fks-beta-live/\" title=\"\">Fly Kubernetes does more now</a></p>\n</div>\n<p>We’ll own it: we’ve been snarky about Kubernetes. We are, at heart, old-school Unix nerds. We’re still scandalized by <code>systemd</code>.</p>\n\n<p>To make matters more complicated, the problems we’re working on <a href='https://fly.io/blog/carving-the-scheduler-out-of-our-orchestrator/' title=''>have a lot of overlap with K8s</a>, but <a href='https://fly.io/blog/carving-the-scheduler-out-of-our-orchestrator/#numad' title=''>just enough impedance mismatch</a> that it (<a href='https://www.nomadproject.io/' title=''>or anything that looks like it</a>) is a bad fit for our own platform.</p>\n\n<p>But, come on: you never took us too seriously about K8s, right? K8s is hard for us to use, but that doesn’t mean it’s not a great fit for what you’re building. We’ve been clear about that all along, right? Sure we have!</p>\n\n<p>Well, good news, everybody! 
If K8s is important for your project, and that’s all that’s been holding you back from <a href='https://fly.io/docs/speedrun/' title=''>trying out Fly.io</a>, we’ve spent the past several months building something for you.</p>\n<h2 id='fly-io-for-kubernetians' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#fly-io-for-kubernetians' aria-label='Anchor'></a><span class='plain-code'>Fly.io For Kubernetians</span></h2>\n<p>Fly.io works by transmogrifying Docker containers into filesystems for <a href='https://firecracker-microvm.github.io/' title=''>lightweight hypervisors</a>, and running them on servers we rack in dozens of regions around the world.</p>\n\n<p>You can build something like Fly.io with “standard” orchestration tools like K8s. In fact, that’s what we did to start, too. To keep things simple, we used Nomad, and instead of K8s CNIs, we built our own Rust-based TLS-terminating Anycast proxy (and designed a WireGuard/IPv6-based private network system <a href='https://fly.io/blog/bpf-xdp-packet-filters-and-udp/' title=''>based on eBPF</a>). But the ideas are the same.</p>\n\n<p>The way we look at it, the signature feature of a “standard” orchestrator is the global scheduler: the global eye in the sky that keeps track of vacancies on servers and optimizes placement of new workloads. That’s the problem we ran into. We’re running over 200,000 applications, and we’re doing so on every continent except Antarctica. The speed of light (and a globally distributed network of backhoes) has something to say about keeping a perfectly consistent global picture of hundreds of thousands of applications, and it’s not pleasant.</p>\n\n<p>The other problem we ran into is that our Nomad scheduler kept trying to outsmart us, and, worse, our customers. 
It turns out that our users have pretty firm ideas of where they’d like their apps to run. If they ask for São Paulo, they want São Paulo, not Rio. But global schedulers have other priorities, like optimally bin-packing resources, and sometimes <code>GIG</code> looks just as good as <code>GRU</code> to them.</p>\n\n<p>To escape the scaling and DX problems we were hitting, we rethought orchestration. Where orchestrators like K8s tend to work through distributed consensus, we keep state local to workers. Each racked server in our fleet is a source of truth about the apps running on it, and provides an API to a market-style “scheduler” that bids on resources in regions. <a href='https://fly.io/blog/carving-the-scheduler-out-of-our-orchestrator/#numad' title=''>You can read more about it here, if you’re interested.</a> We call this system the <a href='https://fly.io/docs/machines/' title=''>Fly Machines API.</a></p>\n\n<p>An important detail to grok about how this all works – a reason we haven’t, like, beaten the CAP theorem by doing this – is that Fly Machines API calls can fail. If Nomad or K8s tries to place a workload on some server, only to find out that it’s filled up or thrown a rod, it will go hunt around for some other place to put it, like a good little robot. The Machines API won’t do this. It’ll just fail the request. 
In fact, it goes out of its way to fail the request quickly, to deliver feedback; if we can’t schedule work in <code>JNB</code> right now, you might want instead to quickly deploy to <code>BOM</code>.</p>\n<h2 id='pluggable-orchestration-and-fks' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#pluggable-orchestration-and-fks' aria-label='Anchor'></a><span class='plain-code'>Pluggable Orchestration and FKS</span></h2>\n<p>In a real sense what we’ve done here is extract a chunk of the scheduling problem out of our orchestrator, and hand it off to other components. For most of our users, that component is <a href='https://github.com/superfly/flyctl' title=''><code>flyctl</code>, our intrepid CLI</a>.</p>\n\n<p>But <a href='https://fly.io/docs/machines/working-with-machines/' title=''>Fly Machines is an API</a>, and anything can drive it. A lot of our users want quick answers to requests to schedule apps in specific regions, and <code>flyctl</code> does a fine job of that. But it’s totally reasonable to want something that works more like the good little robots inside of K8s.</p>\n\n<p>You can build your own orchestrator with our API, but if what you’re looking for is literally Kubernetes, we’ve saved you the trouble. It’s called Fly Kubernetes, or FKS for short.</p>\n\n<p>FKS is an implementation of Kubernetes that runs on top of Fly.io. You start it up using <code>flyctl</code>, by running <code>flyctl ext k8s create</code>.</p>\n\n<p>Under the hood, FKS is a straightforward combination of two well-known Kubernetes projects: <a href='https://k3s.io/' title=''>K3s, the lightweight CNCF-certified K8s distro</a>, and <a href='https://virtual-kubelet.io/' title=''>Virtual Kubelet</a>.</p>\n\n<p>Virtual Kubelet is interesting. 
In K8s-land, a <code>kubelet</code> is a host agent; it’s the thing that runs on every server in your fleet that knows how to run a K8s Pod. Virtual Kubelet isn’t a host agent; it’s a software component that pretends to be a host, registering itself with K8s as if it was one, but then sneakily proxying the Kubelet API elsewhere.</p>\n\n<p>In FKS, “elsewhere” is <a href='https://fly.io/docs/machines/' title=''>Fly Machines</a>. All we have to do is satisfy various APIs that virtual kubelet exposes. For example, the API for the lifecycle of a pod:</p>\n<div class=\"highlight-wrapper group relative \">\n <button \n type=\"button\"\n class=\"bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n data-wrap-target=\"#code-did7dsc1\"\n >\n <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\" stroke-linecap=\"round\" stroke-linejoin=\"round\"><g buffered-rendering=\"static\"><path d=\"M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959\" /><path d=\"M11.081 6.466L9.533 8.037l1.548 1.571\" /></g></svg>\n <span class=\"bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950\">\n Wrap text\n </span>\n </button>\n <button \n type=\"button\"\n class=\"bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n data-copy-target=\"sibling\"\n >\n <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" 
stroke=\"currentColor\" stroke-width=\"1.35\"><g buffered-rendering=\"static\"><path d=\"M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z\" /><path d=\"M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617\" /></g></svg>\n <span class=\"bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950\">\n Copy to clipboard\n </span>\n </button>\n <div class='highlight relative group'>\n <pre class='highlight '><code id=\"code-did7dsc1\">type PodLifecycleHandler interface {\n CreatePod(ctx context.Context, pod *corev1.Pod) error\n UpdatePod(ctx context.Context, pod *corev1.Pod) error\n DeletePod(ctx context.Context, pod *corev1.Pod) error\n GetPod(ctx context.Context, namespace, name string) (*corev1.Pod, error)\n GetPodStatus(ctx context.Context, namespace, name string) (*corev1.PodStatus, error)\n GetPods(context.Context) ([]*corev1.Pod, error)\n}\n</code></pre>\n </div>\n</div>\n<p>This interface is easy to map to the Fly Machines API. 
For example:</p>\n<div class=\"highlight-wrapper group relative \">\n <button \n type=\"button\"\n class=\"bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n data-wrap-target=\"#code-hv82buwy\"\n >\n <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\" stroke-linecap=\"round\" stroke-linejoin=\"round\"><g buffered-rendering=\"static\"><path d=\"M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959\" /><path d=\"M11.081 6.466L9.533 8.037l1.548 1.571\" /></g></svg>\n <span class=\"bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950\">\n Wrap text\n </span>\n </button>\n <button \n type=\"button\"\n class=\"bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n data-copy-target=\"sibling\"\n >\n <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\"><g buffered-rendering=\"static\"><path d=\"M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z\" /><path d=\"M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617\" /></g></svg>\n <span class=\"bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950\">\n Copy to clipboard\n </span>\n </button>\n 
<div class='highlight relative group'>\n <pre class='highlight '><code id=\"code-hv82buwy\">CreatePod -> POST /apps/{app_name}/machines\nUpdatePod -> POST /apps/{app_name}/machines/{machine_id}\n</code></pre>\n </div>\n</div>\n<p>K3s, meanwhile, is a stripped-down implementation of all of K8s that fits into a single binary. K3s does a bunch of clever things to be as streamlined as it is, but the most notable of them is <a href='https://github.com/k3s-io/kine' title=''>kine, an API shim that switches <code>etcd</code> out with databases like SQLite</a>. Because of <code>kine</code>, K3s can manage multiple servers, but also gracefully runs on a single server, without distributed state.</p>\n\n<p>So that’s what we do. When you create a cluster, we run K3s and the Virtual Kubelet on a single Fly Machine. We compile a <a href='https://kubernetes.io/docs/concepts/configuration/organize-cluster-access-kubeconfig/' title=''>kubeconfig</a>, with which you can talk to your K3s via <code>kubectl</code>. We set the whole thing up to run Pods on individual Fly Machines, so your cluster scales out directly using our platform, but with K8s tooling.</p>\n\n<p>One thing we like about this design is how much of the lifting is already done for us by the underlying platform. If you’re a K8s person, take a second to think of all the different components you’re dealing with: <a href='https://etcd.io/' title=''>etcd</a>, specifically provisioned nodes, the <a href='https://kubernetes.io/docs/reference/command-line-tools-reference/kube-proxy/' title=''>kube-proxy</a>, <a href='https://github.com/flannel-io/flannel' title=''>a CNI </a>binary and configuration and its integration with the host network, containerd, registries. But Fly.io already does most of those things. 
So this project was mostly chipping away components until we found the bare minimum: CoreDNS, SQLite persistence, and Virtual Kubelet.</p>\n\n<p>We ended up with something significantly simpler than K3s, which is saying something.</p>\n\n<p>Fly Kubernetes has some advantages over plain <code>flyctl</code> and <code>fly.toml</code>:</p>\n\n<ul>\n<li>Your deployment is more declarative than it is with the <code>fly.toml</code> file. You declare the exact state of everything down to replica counts, autoscaling rules, volume definitions, and more.\n</li><li>When you deploy with Fly Kubernetes, Kubernetes will automatically make your definitions match the state of the world. Machines go down? Kubernetes will whack them back online.\n</li></ul>\n\n<p>This is a different way to do orchestration and scheduling on Fly.io. It’s not what everyone is going to want. But if you want it, you really want it, and we’re psyched to give it to you: Fly.io’s platform features, with Kubernetes handling configuration and driving your system to its desired state.</p>\n\n<p>We’ve kept things simple to start with. There are K8s use cases we’re a strong fit for today, and others we’ll get better at in the near future, as K8s users drive the underlying platform (and particularly our proxy) forward.</p>\n\n<p><strong class='font-semibold text-navy-950'>Interested in getting early access? Email us at <a href=\"mailto:[email protected]\">[email protected]</a> and we’ll hook you up.</strong></p>\n<figure class=\"post-cta\">\n <figcaption>\n <h1>Not invested in K8s?</h1>\n <p>Nothing has to change for you! 
You can deploy apps on Fly.io today, in a matter of minutes, without talking to Sales.</p>\n <a class=\"btn btn-lg\" href=\"https://fly.io/docs/speedrun/\">\n Deploy an app in minutes.<span class='opacity-50'>→</span>\n </a>\n </figcaption>\n <div class=\"image-container\">\n <img src=\"/static/images/cta-cat.webp\" srcset=\"/static/images/[email protected] 2x\" alt=\"\">\n </div>\n</figure>\n\n<div class=\"youtube-container\" data-exclude-render>\n <div class=\"youtube-video\">\n <iframe\n width=\"100%\"\n height=\"100%\"\n src=\"https://www.youtube.com/embed/A3vFfZvUiwo\"\n frameborder=\"0\"\n allow=\"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture\"\n allowfullscreen>\n </iframe>\n </div>\n</div>\n\n<h2 id='what-it-all-means' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#what-it-all-means' aria-label='Anchor'></a><span class='plain-code'>What It All Means</span></h2>\n<p>One obvious thing it means is that if you’ve got an investment in Kubernetes tooling, you can keep it while running things on top of Fly.io. So that’s pretty neat. Buy our cereal!</p>\n\n<p>But the computer science story is interesting, too. We placed a bet on an idiosyncratic strategy for doing global orchestration. We replaced global consensus, which is how Borg, Kubernetes, and Nomad all work, with a market-based system. That system was faster and, importantly, dumber than the consensus system it replaced.</p>\n\n<p>This had costs! Nomad’s global consensus would do truly heroic amounts of work to make sure Fly Apps got scheduled somewhere, anywhere. 
Like a good capitalist, Fly Machines will tell you in no uncertain terms how much work it’s willing to do for you (“less than a Nomad”).</p>\n\n<p>But that doesn’t mean you’re stuck with the answers Fly Machines gives by itself. Because Fly Machines is so simple, and tries so hard to be predictable, we hoped you’d be able to build more sophisticated scheduling and orchestration schemes on top of it. And here you go: Kubernetes scheduling, as a plugin to the platform.</p>\n\n<p>More to come! We’re itching to see just how many different ways this bet might pay off. Or: we’ll perish in flames! Either way, it’ll be fun to watch.</p>", "image": { "url": "https://fly.io/blog/fks/assets/fks-thumb.webp", "title": null }, "media": [], "authors": [ { "name": "Fly", "email": null, "url": null } ], "categories": [] }, { "id": "https://fly.io/blog/fly-io-has-gpus-now/", "title": "Fly.io has GPUs now", "description": null, "url": "https://fly.io/blog/fly-io-has-gpus-now/", "published": "2023-12-13T00:00:00.000Z", "updated": "2024-04-12T18:23:39.000Z", "content": "<div class=\"lead\"><p>We’re Fly.io, we’re a new public cloud that lets you put your compute where it matters: near your users. Today we’re announcing that you can do this with GPUs too, allowing you to do AI workloads on the edge. Want to find out more? Keep reading.</p>\n</div><h2 id='ai-is-pretty-fly' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#ai-is-pretty-fly' aria-label='Anchor'></a><span class='plain-code'>AI is pretty fly</span></h2>\n<p>AI is apparently a bit of a <em>thing</em> (maybe even <em>an thing</em> come to think about it). We’ve seen entire industries get transformed in the wake of ChatGPT existing (somehow it’s only been around for a year, I can’t believe it either). 
It’s likely to leave a huge impact on society as a whole in the same way that the Internet did once we got search engines. Like any good venture-capital funded infrastructure provider, we want to enable you to do hilarious things with AI using industrial-grade muscle.</p>\n\n<p>Fly.io lets you run a full-stack app—or an entire dev platform based on the <a href='https://fly.io/docs/machines/' title=''>Fly Machines API</a>—close to your users. Fly.io GPUs let you attach an <a href='https://www.nvidia.com/en-us/data-center/a100/' title=''>Nvidia A100</a> to whatever you’re building, harnessing the full power of CUDA with more VRAM than your local 4090 can shake a ray-traced stick at. With these cards (or whatever you call a GPU attached to SXM fabric), AI/ML workloads are at your fingertips. You can <a href='https://fly.io/blog/transcribing-on-fly-gpu-machines/' title=''>recognize speech</a>, segment text, summarize articles, synthesize images, and more at speeds that would make your homelab blush. 
You can even set one up as your programming companion with <a href='https://github.com/deepseek-ai/DeepSeek-Coder' title=''>your model of choice</a> in case you’ve just not been feeling it with the output of <em>other</em> models changing over time.</p>\n\n<p>If you want to find out more about what these cards are and what using them is like, check out <a href='https://fly.io/blog/what-are-these-gpus-really/' title=''>What are these “GPUs” really?</a> It covers the history of GPUs and why it’s ironic that the cards we offer are called “Graphics Processing Units” in the first place.</p>\n<h2 id='fly-io-gpus-in-action' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#fly-io-gpus-in-action' aria-label='Anchor'></a><span class='plain-code'>Fly.io GPUs in Action</span></h2>\n<p>We want you to deploy your own code with your favorite models on top of Fly.io’s cloud backbone. 
Fly.io GPUs make this really easy.</p>\n\n<p>You can get a GPU app running <a href='https://ollama.ai' title=''>Ollama</a> (our friends in text generation) in two steps:</p>\n\n<ol>\n<li><p>Put this in your <code>fly.toml</code>:</p>\n<div class=\"highlight-wrapper group relative toml\">\n <button \n type=\"button\"\n class=\"bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n data-wrap-target=\"#code-l8a9wi1z\"\n >\n <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\" stroke-linecap=\"round\" stroke-linejoin=\"round\"><g buffered-rendering=\"static\"><path d=\"M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959\" /><path d=\"M11.081 6.466L9.533 8.037l1.548 1.571\" /></g></svg>\n <span class=\"bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950\">\n Wrap text\n </span>\n </button>\n <button \n type=\"button\"\n class=\"bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n data-copy-target=\"sibling\"\n >\n <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\"><g buffered-rendering=\"static\"><path d=\"M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z\" /><path d=\"M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 
0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617\" /></g></svg>\n <span class=\"bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950\">\n Copy to clipboard\n </span>\n </button>\n <div class='highlight relative group'>\n <pre class='highlight '><code id=\"code-l8a9wi1z\"><span class=\"py\">app</span> <span class=\"p\">=</span> <span class=\"s\">\"sandwich_ai\"</span>\n<span class=\"py\">primary_region</span> <span class=\"p\">=</span> <span class=\"s\">\"ord\"</span>\n<span class=\"py\">vm.size</span> <span class=\"p\">=</span> <span class=\"s\">\"a100-40gb\"</span>\n\n<span class=\"nn\">[build]</span>\n <span class=\"py\">image</span> <span class=\"p\">=</span> <span class=\"s\">\"ollama/ollama\"</span>\n\n<span class=\"nn\">[mounts]</span>\n <span class=\"py\">source</span> <span class=\"p\">=</span> <span class=\"s\">\"models\"</span>\n <span class=\"py\">destination</span> <span class=\"p\">=</span> <span class=\"s\">\"/root/.ollama\"</span>\n <span class=\"py\">initial_size</span> <span class=\"p\">=</span> <span class=\"s\">\"100gb\"</span>\n</code></pre>\n </div>\n</div></li><li><p>Run <code>fly apps create sandwich_ai && fly deploy</code>.</p>\n</li></ol>\n\n<p>If you want to read more about how to start your new sandwich empire, check out <a href='https://fly.io/blog/scaling-llm-ollama/' title=''>Scaling Large Language Models to zero with Ollama</a>, it explains how to set up Ollama so that it <em>automatically scales itself down</em> when it’s not in use.</p>\n<h2 id='the-speed-of-light-is-only-so-fast' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#the-speed-of-light-is-only-so-fast' aria-label='Anchor'></a><span class='plain-code'>The speed of light is only so fast</span></h2>\n<p>Being able to spin up GPUs is great, 
but where Fly.io really shines is inference at the edge.</p>\n\n<p>Let’s say you have an app that lets users enter ingredients they have in their kitchen and receive a sandwich recipe. Your users expect their recipes <em>instantly</em> (or at least as fast as the other leading apps). Seconds count when you need an emergency sandwich.</p>\n<div class=\"left-sidenote\"><p><br>\n<br>\n<br>\nIt’s depressingly customary in the AI industry to cherry-pick outputs. This was not cherry-picked. I used <a href=\"https://ollama.ai/library/yi:34b\" title=\"\">yi:34b</a> to generate this recipe. I’m not sure what a taco salad sandwich is, but I might be willing to try it.</p>\n</div>\n<p><img alt=\"A conversation between a user and an artificial intelligence. The user asks: \"What kind of sandwich can I make with an egg, a taco shell, and a pound of ground beef?\" The AI responds: \" You can make a breakfast taco or a taco salad sandwich with the ingredients you have. Here's how to prepare each: for the breakfast taco. Cook the pound of ground beef in a pan until browned. Season with salt, pepper, and any other desired spices like chili powder or taco seasoning. While the beef is cooking, fry an egg (or scramble it if you prefer) in another pan. Warm up the taco shell in the oven or microwave according to the package instructions. Assemble the taco by placing the ground cooked beef in the taco shell followed by the fried egg. You can also add any additional toppings like diced onions avocados, avocado, salsa or cheese if you have them. Serve with a side of hot sauce or sour cream if desired. For the taco salad sandwich: Prepare the ground beef as in the breakfast taco recipe above. Instead of frying an egg, you can cook it sunny side up or over easy depending on your preference. Crumble the taco shell into small pieces and mix them with the cooked ground beef. Assemble the sandwich by layering the crumbled taco shell mixture and ground beef in two slices of bread. 
Top with the fried egg and any additional toppings you like such as shredded lettuce, diced tomatoes, or sliced olives. Serve with a side of salsa or ranch dressing if desired. Both options are hardy and can be easily customized based on your taste preferences. Enjoy!\"\" src=\"/blog/fly-io-has-gpus-now/assets/sandwich-recipes.webp\" /></p>\n\n<p>In the previous snippet, we deployed our app to ord (<code>primary_region = \"ord\"</code>). The good news is that our model returns a result really quickly and users in Chicago get instant sandwich recipes. It’s a good experience for users near your datacentre, and you can do this on any half decent cloud provider.</p>\n\n<p>But surely people outside of Chicago need sandwiches too. Amsterdam has sandwich fiends as well. And sometimes it takes too long to have their requests leap across the pond. The speed of light is only so fast after all. Don’t worry, we’ve got your back. Fly.io has GPUs in datacentres all over the world. Even more, we’ll let you run <em>the same program</em> with the same public IP address and the same TLS certificates in any regions with GPU support.</p>\n\n<p>Don’t believe us? 
See how you can scale your app up in Amsterdam with one command:</p>\n<div class=\"highlight-wrapper group relative \">\n <button \n type=\"button\"\n class=\"bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n data-wrap-target=\"#code-404ps1ts\"\n >\n <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\" stroke-linecap=\"round\" stroke-linejoin=\"round\"><g buffered-rendering=\"static\"><path d=\"M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959\" /><path d=\"M11.081 6.466L9.533 8.037l1.548 1.571\" /></g></svg>\n <span class=\"bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950\">\n Wrap text\n </span>\n </button>\n <button \n type=\"button\"\n class=\"bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n data-copy-target=\"sibling\"\n >\n <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\"><g buffered-rendering=\"static\"><path d=\"M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z\" /><path d=\"M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617\" /></g></svg>\n <span class=\"bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] 
text-navy-950\">\n Copy to clipboard\n </span>\n </button>\n <div class='highlight relative group'>\n <pre class='highlight '><code id=\"code-404ps1ts\">fly scale count 2 --region ams\n</code></pre>\n </div>\n</div>\n<p>It’s that easy.</p>\n<h2 id='actually-on-demand' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#actually-on-demand' aria-label='Anchor'></a><span class='plain-code'>Actually On-Demand</span></h2>\n<p>GPUs are powerful parallel processing packages, but they’re not cheap! Once we have enough people wanting to turn their fridge contents into tasty sandwiches, keeping a GPU or two running makes sense. But we’re just a small app still growing our user base while also funding the latest large sandwich model research. We want to only pay for GPUs when a user makes a request.</p>\n\n<p>Let’s open up that <code>fly.toml</code> again, and add a section called <code>services</code>, and we’ll include instructions on how we want our app to scale up and down:</p>\n<div class=\"highlight-wrapper group relative toml\">\n <button \n type=\"button\"\n class=\"bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n data-wrap-target=\"#code-cfo4p0z3\"\n >\n <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\" stroke-linecap=\"round\" stroke-linejoin=\"round\"><g buffered-rendering=\"static\"><path d=\"M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959\" /><path d=\"M11.081 
6.466L9.533 8.037l1.548 1.571\" /></g></svg>\n <span class=\"bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950\">\n Wrap text\n </span>\n </button>\n <button \n type=\"button\"\n class=\"bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n data-copy-target=\"sibling\"\n >\n <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\"><g buffered-rendering=\"static\"><path d=\"M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z\" /><path d=\"M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617\" /></g></svg>\n <span class=\"bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950\">\n Copy to clipboard\n </span>\n </button>\n <div class='highlight relative group'>\n <pre class='highlight '><code id=\"code-cfo4p0z3\"><span class=\"nn\">[[services]]</span>\n <span class=\"py\">internal_port</span> <span class=\"p\">=</span> <span class=\"mi\">8080</span>\n <span class=\"py\">protocol</span> <span class=\"p\">=</span> <span class=\"s\">\"tcp\"</span>\n <span class=\"py\">auto_stop_machines</span> <span class=\"p\">=</span> <span class=\"kc\">true</span>\n <span class=\"py\">auto_start_machines</span> <span class=\"p\">=</span> <span class=\"kc\">true</span>\n <span class=\"py\">min_machines_running</span> <span class=\"p\">=</span> <span class=\"mi\">0</span>\n</code></pre>\n </div>\n</div>\n<p>Now when no one needs sandwich recipes, you don’t pay for GPU time.</p>\n<h2 id='the-deets' class='group flex items-start whitespace-pre-wrap relative 
mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#the-deets' aria-label='Anchor'></a><span class='plain-code'>The Deets</span></h2>\n<p>We have GPUs ready to use in several US and EU regions and Sydney. You can deploy your sandwich, music generation, or AI illustration apps to:</p>\n\n<ul>\n<li><a href='https://www.nvidia.com/en-us/data-center/a100/' title=''>Ampere A100s</a> with 40gb of RAM for $2.50/hr\n</li><li><a href='https://www.nvidia.com/en-us/data-center/a100/' title=''>Ampere A100s</a> with 80gb of RAM for $3.50/hr\n</li><li><a href='https://resources.nvidia.com/en-us-l40s/l40s-datasheet-28413' title=''>Lovelace L40s</a> are coming soon (update: now here!) for $2.50/hr\n</li></ul>\n\n<p>By default, anything you deploy to GPUs will use eight heckin’ <a href='https://www.amd.com/en/processors/epyc-server-cpu-family' title=''>AMD EPYC</a> CPU cores, and you can attach volumes up to 500 gigabytes. We’ll even give you discounts for reserved instances and dedicated hosts if you ask nicely.</p>\n\n<p>We hope you have fun with these new cards and we’d love to see what you can do with them! Reach out to us on X (formerly Twitter) or <a href='https://community.fly.io/' title=''>the community forum</a> and share what you’ve been up to. 
We’d love to see what we can make easier!</p>", "image": { "url": "https://fly.io/blog/fly-io-has-gpus-now/assets/llama-portal-thumb.webp", "title": null }, "media": [], "authors": [ { "name": "Fly", "email": null, "url": null } ], "categories": [] }, { "id": "https://fly.io/blog/what-are-these-gpus-really/", "title": "What are these \"GPUs\" really?", "description": null, "url": "https://fly.io/blog/what-are-these-gpus-really/", "published": "2023-12-11T00:00:00.000Z", "updated": "2024-04-12T18:23:39.000Z", "content": "<div class=\"lead\"><p>Fly.io runs containerized apps with virtual machine isolation on our own hardware around the world, so you can safely run your code close to where your users are. We’re in the process of rolling out GPU support, and that’s what this post is about, but you don’t have to wait for that to try us out: <a href=\"https://fly.io/docs/speedrun/\" title=\"\">your app can be up and running on us in minutes</a>.</p>\n</div>\n<p>GPU hardware will let our users run all sorts of fun Artificial Intelligence and Machine Learning (AI/ML) workloads near their users. But, what are these “GPUs” really? What can they do? What <em>can’t</em> they do?</p>\n\n<p>Listen here for my tale of woe as I spell out exactly what these cards are, are not, and what you can do with them. 
By the end of this magical journey, you should understand the true irony of them being called “Graphics Processing Units” and why every marketing term is always bad forever.</p>\n<h2 id='how-does-computer-formed' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#how-does-computer-formed' aria-label='Anchor'></a><span class='plain-code'>How does computer formed?</span></h2>\n<p>In the early days of computing, your computer generally had a few basic components:</p>\n\n<ul>\n<li>The CPU\n</li><li>Input device and assorted peripherals (keyboard, etc)\n</li><li>Output device (monitor, printer, etc)\n</li><li>Memory\n</li><li>Glue logic chips\n</li><li>Video rendering hardware\n</li></ul>\n\n<p>Taking the Commodore 64 as an example, it had a CPU, a chip to handle video output, a chip to handle audio output, and a chip to glue everything together. The CPU would read instructions from the RAM and then execute them to do things like draw to the screen, solve sudoku puzzles, play sounds, and so on.</p>\n\n<p>However, even though the CPU by itself was fast by the standards of the time, it could only do a million clock cycles per second or so. Imagine a very small shouting crystal vibrating millions of times per second triggering the CPU to do one part of a task and you’ll get the idea. This is fast, but not fast enough when executing instructions can take longer than a single clock cycle and when your video output device needs to be updated 60 times per second.</p>\n\n<p>The main way they optimized this was by shunting a lot of the video output tasks to a bespoke device called the VIC-II (Video Interface Chip, version 2). This allowed the Commodore 64 to send a bunch of instructions to the VIC-II and then let it do its thing while the CPU was off doing other things. 
This is called “offloading”.</p>\n\n<p><img src=\"/blog/what-are-these-gpus-really/assets/./deus-ex-machina-cover.webp\" /></p>\n\n<p>As technology advanced, the desire to do bigger and better things with both contemporary and future hardware increased. This came to a head when this little studio nobody had ever heard of called id Software released one of the most popular games of all time: DOOM.</p>\n\n<p>Now, even though DOOM was a huge advancement in gaming technology, it was still incredibly limited by the hardware of the time. It was actually a 2D game that used a lot of tricks to make it look (and feel) like it was 3D. It was also limited to a resolution of 320x200 and a hard cap of 35 frames per second. This was fine for the time (most movies were only at 24 frames per second), but it was clear that there was a lot of room for improvement.</p>\n\n<p>One of the main things that DOOM did was to use a pair of techniques to draw the world at near real-time. It used a combination of “raycasting” and binary-space partitioning to draw the world. This basically means that they drew a bunch of imaginary lines to where points in the map would be to figure out what color everything would be and then eliminated the parts of the map that were behind walls and other objects. 
This is a very simplified explanation, and if you want to know more, <a href='https://fabiensanglard.net/doomIphone/doomClassicRenderer.php' title=''>Fabien Sanglard explains the rendering</a> of DOOM in more detail.</p>\n<h2 id='the-dream-of-3d' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#the-dream-of-3d' aria-label='Anchor'></a><span class='plain-code'>The dream of 3D</span></h2>\n<p>However, a lot of this was logic that ran very slowly on the CPU, and while the CPU was doing the display logic, it couldn’t do anything else, such as enemy AI or playing sounds. Hence the idea of a “3D accelerator card”. The idea: offload the 3D rendering logic to a separate device that could do it much faster than the CPU could, and free the CPU to do other things like AI, sound, and so on.</p>\n\n<p>This was the dream, but it was a long way off. Then Quake happened.</p>\n<div class=\"right-sidenote\"><p>Really, Half-Life is based on Quake so much that the pattern for <a href=\"https://www.pcgamer.com/half-life-alyxs-lights-flicker-just-like-they-did-in-quake-almost-25-years-later/\" title=\"\">blinking lights</a> has carried forward 25 years later to Half-Life: Alyx in VR. If it ain’t broke, don’t fix it.</p>\n</div>\n<p>Unlike Doom, Quake was fully 3D on unmodified consumer hardware. Players could look up and down (something previously thought impossible without accelerator hardware!) and designers could make levels with that in mind. Quake also allowed much more complex geometry and textures. It was a huge leap forward in 3D gaming and it was only possible because of the massive leap in CPU power at the time. The Pentium family of processors was such a huge leap that it allowed them to bust through and do it in “real time”. 
Quake has since set the standard for multiplayer deathmatch games, and its source code has lineage to Call of Duty, Half-Life, Half-Life 2, DotA 2, Titanfall, and Apex Legends.</p>\n\n<p>However, the thing that really made 3D accelerator cards leap into the public spotlight was another little-known studio called Core Design and their 1996 release of Tomb Raider. It was built from the ground up to require the use of 3D accelerator cards. The cards flew off the shelves.</p>\n\n<p>“3D accelerator cards” would later become known as “Graphics Processing Units” or GPUs because of how synonymous they became with 3D gaming, engineering tasks such as Computer-Aided Drafting (CAD), and even the entire OS environment with compositors like <a href='https://en.wikipedia.org/wiki/Desktop_Window_Manager' title=''>DWM</a> on Windows Vista, <a href='https://en.wikipedia.org/wiki/Compiz' title=''>Compiz</a> on GNU+Linux, and <a href='https://en.wikipedia.org/wiki/Quartz_(graphics_layer)' title=''>Quartz</a> on macOS. Things became so much easier for everyone when 2D and 3D graphics were integrated into the same device so you didn’t need to chain your output through your 3D accelerator card!</p>\n<h2 id='the-gpu-as-we-know-it' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#the-gpu-as-we-know-it' aria-label='Anchor'></a><span class='plain-code'>The GPU as we know it</span></h2>\n<p>When GPUs first came out, they were very simple devices. 
They had a few basic components:</p>\n\n<ul>\n<li>A framebuffer to store the current state of the screen\n</li><li>A command processor to take instructions from the game and translate them into something the hardware can understand\n</li><li>Memory to store temporary data\n</li><li>Shader processing hardware to allow designers to change how light and textures were rendered\n</li><li>A display output that was chained through an existing VGA card so that the user could see what was going on in real time (yes, this is something we actually did)\n</li></ul>\n\n<p>This basic architecture has remained the same for the past 20 years or so. The main difference is that, as technology advanced, the capabilities of those cards increased. They got faster, more parallel, more capable, had more memory, were made cheaper, and so on. This gradually allowed for more and more complex games like Half-Life 2, Crysis, The Legend of Zelda: Breath of the Wild, Baldur’s Gate 3, and so on.</p>\n\n<p>Over time, as more and more hardware was added, GPUs became computers in their own right (sometimes even bigger than the rest of the computer thanks to the need to cool things more aggressively). 
This new hardware includes:</p>\n\n<ul>\n<li>Video encoding hardware via NVENC and AMD VCE so that content creators can stream and record their gameplay in higher quality without having to impact the performance of the game\n</li><li><aside class=\"left-sidenote\">Seriously, once you experience high framerate HDR raytraced Tetris you can’t really go back to the old way.</aside> Raytracing accelerator cores via RTX so that light can be rendered more realistically\n</li><li>AI/ML cores to allow for dynamic upscaling to eke out more performance from the card\n</li><li>Display output hardware to allow for multiple monitors to be connected to the card\n</li><li>Faster and faster memory buses and interfaces to the rest of the system to allow for more data to be processed faster\n</li><li>Direct streaming from the drive to GPU memory to allow for faster loading times\n</li></ul>\n\n<p>But, at the same time, that AI/ML hardware started to get noticed by more and more people. It was discovered that the shader cores and then the CUDA cores could be used to do AI/ML workloads at ludicrous speeds. This enabled research and development of models like GPT-2, Stable Diffusion, DLSS, and so on. This has led to a Cambrian Explosion of AI/ML research and development that is continuing to this day.</p>\n<h2 id='the-quot-gpus-quot-that-fly-io-is-using' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#the-quot-gpus-quot-that-fly-io-is-using' aria-label='Anchor'></a><span class='plain-code'>The “GPUs” that Fly.io is using</span></h2>\n<p>I’ve mostly been describing consumer GPUs and their capabilities up to this point because that’s what we all have the biggest understanding of. 
There is a huge difference between the “GPUs” that you can get for server tasks and normal consumer tasks from a place like Newegg or Best Buy. The main difference is that enterprise-grade Graphics Processing Units do not have any of the hardware needed to process graphics.</p>\n<div class=\"right-sidenote\"><p>Author’s note: This will not be the case in the future. Fly.io is going to add <a href=\"https://www.nvidia.com/en-us/data-center/l40s/\" title=\"\">Lovelace L40S GPUs</a> that do have 3D rendering, video encoding, shader cores, and so on. But, that’s not what we’re talking about today.</p>\n</div>\n<p>Yes. Really. They don’t have rasterization hardware, shader cores, display outputs, or anything useful for trying to run games on them. They are AI/ML accelerator cards more than anything. It’s kinda beautifully ironic that they’re called Graphics Processing Units when they have no ability to process graphics.</p>\n<h2 id='what-can-you-do-with-them' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#what-can-you-do-with-them' aria-label='Anchor'></a><span class='plain-code'>What can you do with them?</span></h2>\n<p>These GPUs are really good at massively parallel tasks. 
This naturally translates to being very good at AI/ML tasks such as:</p>\n\n<ul>\n<li>Summarization (what is this article about in a few sentences?)\n</li><li>Translation (what does this article say in Spanish?)\n</li><li>Speech recognition (what is a voice clip saying?)\n</li><li>Speech synthesis (what does this text sound like?)\n</li><li>Text generation (what would a cat say if it could talk?)\n</li><li>Basic rote question and answering (what is the safe cooking temperature for chicken breasts in celsius?)\n</li><li>Text classification (is this article about cats or dogs?)\n</li><li>Sentiment analysis (is this article positive or negative, what could that mean about the companies involved?)\n</li><li>Image classification (is this a cat or a dog?)\n</li><li>Object detection (where are the cats and dogs in this image?)\n</li></ul>\n\n<p>Or any combination/chain of these tasks. A lot of this is pretty abstract building blocks that can be combined in a lot of different ways. This is why AI/ML stuff is so exciting right now. We’re in the early days of understanding what these things are, what they can do, and how to use them properly.</p>\n\n<p>Imagine being able to load articles about the topic you are researching into your queries to find where someone said something roughly similar to what you’re looking for. Queries like “that one recipe with eggs that you fold over with ham in it”. 
That’s the kind of thing that’s possible with AI/ML (and tools like vector databases) but difficult to impossible with traditional search engines.</p>\n<h2 id='how-to-use-ai-for-reals' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#how-to-use-ai-for-reals' aria-label='Anchor'></a><span class='plain-code'>How to use AI for reals</span></h2>\n<p>Fortunately and unfortunately, we’re in the Cambrian Explosion days of this industry. Key advances happen constantly. The exact models and tooling change almost as often. This is both a very good thing and a very bad thing.</p>\n\n<p>If you want to get started today, here are a few models that you can play with right now:</p>\n\n<ul>\n<li><a href='https://ai.meta.com/llama/' title=''>Llama 2</a> - A generic foundation model with instruction-tuned and chat-tuned variants. It’s a good starting point for a lot of research, and nearly everything else uses the same formats that Llama 2 does.\n</li><li><a href='https://openai.com/research/whisper' title=''>Whisper</a> - A speech-to-text model that transcribes audio files into text better than most professional dictation software. I, the author of this article, wrote most of this article using Whisper.\n</li><li><a href='https://huggingface.co/NurtureAI/OpenHermes-2.5-Mistral-7B-16k' title=''>OpenHermes-2.5 Mistral 7B 16k</a> - An instruction-tuned model that can operate on up to 16 thousand tokens (about 40 printed pages of text, 12,000 words) at once. It’s a good starting point for summarization and other tasks that require a lot of context. 
I personally use it for my personal AI chatbot named <a href='https://xeiaso.net/characters/#Mimi' title=''>Mimi</a>.\n</li><li><aside class=\"right-sidenote\">Seriously Annie, you’re great!</aside> <a href='https://stability.ai/stable-diffusion' title=''>Stable Diffusion XL</a> - A text-to-image model that lets you create high quality images from simple text descriptions. It’s a good starting point for tasks that require image generation, such as when you want to add images to your blog posts but don’t have an artist like Annie to draw you what you want.\n</li></ul>\n\n<p>For a practical example, imagine that you have a set of <a href='https://xeiaso.net/talks/' title=''>conference talks that you’ve given over the years</a>. You want to take those talk videos, extract the audio, and transform them into written text because some people learn better from text than video. The overall workflow would look something like this:</p>\n\n<ul>\n<li>Use ffmpeg to extract the audio track from the video files\n</li><li>Use Whisper to <a href='https://fly.io/blog/transcribing-on-fly-gpu-machines/' title=''>convert the audio files into subtitle files</a>\n</li><li>Break the subtitle file into sequences based on significant pauses between topics (humans do this subconsciously, take advantage of it and you can make things seem heckin’ magic)\n</li><li>Use a large language model to summarize the segments and create a title for each segment\n</li><li>Paste the rest of the text into a markdown document between the segment titles\n</li><li>Manually review the documents and make any necessary changes with technical terms that the model didn’t know about or things the model got wrong because English is a minefield of homophones that even trained experts have trouble with (ask me how I know)\n</li><li>Publish the documents on your blog\n</li></ul>\n\n<p>Then bam, you don’t just have a portfolio piece, you have the recipe for winning downtime from visitors of orange websites clicking on 
your link so much. You can also use this to create transcripts for your videos so that people who can’t hear can still enjoy your content.</p>\n\n<p>The true advantage of these models comes not from using them as individual parts on their own, but as a cohesive whole in a chain. This is where the real power of AI/ML comes from. It’s not the individual models, but the ability to chain them together to do something useful. This is where the true opportunities for innovation lie.</p>\n<h2 id='conclusion' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#conclusion' aria-label='Anchor'></a><span class='plain-code'>Conclusion</span></h2>\n<p>So that’s what these “GPUs” are really: they’re AI/ML accelerator cards. The A100 cards are incapable of processing graphics or encoding video, but they’re really, really good at AI/ML workloads. They allow you to do way more tasks per watt than any CPU ever could.</p>\n\n<p>I hope you enjoyed this tale of woe as I spilled out the horrible truths about marketing being awful forever and gave you ideas for how to <em>actually use</em> these graphics-free Graphics Processing Units to do useful things. But sadly, not for processing graphics unless you wait for the <a href='https://www.nvidia.com/en-us/data-center/l40s/' title=''>Lovelace L40S</a> cards early in 2024.</p>\n\n<p>Sign up for Fly.io today and try our GPUs! 
I can’t wait to see what you build with them.</p>", "image": { "url": "https://fly.io/blog/what-are-these-gpus-really/assets/gpu-songstress-thumb.webp", "title": null }, "media": [], "authors": [ { "name": "Fly", "email": null, "url": null } ], "categories": [] }, { "id": "https://fly.io/blog/scaling-llm-ollama/", "title": "Scaling Large Language Models to zero with Ollama", "description": null, "url": "https://fly.io/blog/scaling-llm-ollama/", "published": "2023-12-06T12:00:00.000Z", "updated": "2024-04-12T18:23:39.000Z", "content": "<div class=\"lead\"><p>We’re Fly.io. We have powerful servers worldwide to run your code close to your users. Including GPUs so you can self host your own AI.</p>\n</div>\n<p>Open-source self-hosted AI tools have advanced a lot in the past 6 months. They allow you to create new methods of expression (with QR code generation and Stable Diffusion), easy access to summarization powers that would have made Google blush a decade ago (even with untuned foundation models such as LLaMa 2 and Yi), to conversational assistants that enable people to do more with their time, and to perform speech recognition in <em>real time</em> on moderate hardware (with Whisper et al). With all these capabilities comes the need for more and more raw computational muscle to be able to do inference on bigger and bigger models, and eventually do things that we can’t even imagine right now. Fly.io lets you put your compute where your users are so that you can do machine learning inference tasks on the edge with the power of enterprise-grade GPUs such as the Nvidia A100. You can also scale your GPU nodes to zero running Machines, so you only pay for what you actually need, when you need it.</p>\n<div class=\"right-sidenote\"><p>It’s worth mentioning that “scaling to zero” doesn’t mean what you may think it means. When you “scale to zero” in Fly.io, you actually stop the running Machine. 
This means the Machine is still lying around on the same computer box that it runs on, but it’s just put to sleep. If there is a capacity issue then your app may be unable to wake back up. We are working on a solution to this, but for now you should be aware that scaling to zero is not the same as spinning down your Machine and spinning it back up again on a new computer box when you need it.</p>\n</div><div class=\"callout\"><p>This is a continuation of the last post in this series about <a href=\"https://fly.io/blog/transcribing-on-fly-gpu-machines/\" title=\"\">how to use GPUs on Fly.io</a>.</p>\n</div><h2 id='why-scale-to-zero' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#why-scale-to-zero' aria-label='Anchor'></a><span class='plain-code'>Why scale to zero?</span></h2>\n<p>Running GPU nodes on top of Fly is expensive. Sure, GPUs enable you to do things a lot faster than CPUs ever could on their own, but they will mostly sit idle between uses. This is where scaling to zero comes in. With scaling to zero, you can have your GPU nodes shut down when you’re not using them. When your Machine stops, you aren’t paying for the GPU any more. This is good for the environment and your wallet.</p>\n\n<p>In this post, we’re going to be using <a href='https://ollama.ai' title=''>Ollama</a> to generate text. Ollama is a fancy wrapper around <a href='https://github.com/ggerganov/llama.cpp' title=''>llama.cpp</a> that allows you to run large language models on your own hardware with your choice of model. 
It also supports GPU acceleration, meaning that you can use Fly.io’s huge GPUs to run your models faster than your RTX 3060 at home ever would on its own.</p>\n\n<p>One of the main downsides of using Ollama in a cloud environment is that it doesn’t have authentication by default. Thanks to the power of about 70 lines of Go, we are able to shim that in after the fact. This will protect your server from random people on the internet using your GPU time (and spending your money) to generate text and integrate it into your own applications.</p>\n\n<p>Create a new folder called <code>ollama-scale-to-0</code>:</p>\n<div class=\"highlight-wrapper group relative bash\">\n <button \n type=\"button\"\n class=\"bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n data-wrap-target=\"#code-hmfd22hk\"\n >\n <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\" stroke-linecap=\"round\" stroke-linejoin=\"round\"><g buffered-rendering=\"static\"><path d=\"M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959\" /><path d=\"M11.081 6.466L9.533 8.037l1.548 1.571\" /></g></svg>\n <span class=\"bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950\">\n Wrap text\n </span>\n </button>\n <button \n type=\"button\"\n class=\"bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n data-copy-target=\"sibling\"\n >\n <svg class=\"w-4 h-4 
pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\"><g buffered-rendering=\"static\"><path d=\"M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z\" /><path d=\"M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617\" /></g></svg>\n <span class=\"bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950\">\n Copy to clipboard\n </span>\n </button>\n <div class='highlight relative group'>\n <pre class='highlight '><code id=\"code-hmfd22hk\"><span class=\"nb\">mkdir </span>ollama-scale-to-0\n</code></pre>\n </div>\n</div><h2 id='fly-app-setup' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#fly-app-setup' aria-label='Anchor'></a><span class='plain-code'>Fly app setup</span></h2>\n<p>First, we need to create a new Fly app:</p>\n<div class=\"highlight-wrapper group relative bash\">\n <button \n type=\"button\"\n class=\"bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n data-wrap-target=\"#code-tzghjjx5\"\n >\n <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\" stroke-linecap=\"round\" stroke-linejoin=\"round\"><g buffered-rendering=\"static\"><path d=\"M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959\" /><path d=\"M11.081 
6.466L9.533 8.037l1.548 1.571\" /></g></svg>\n <span class=\"bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950\">\n Wrap text\n </span>\n </button>\n <button \n type=\"button\"\n class=\"bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n data-copy-target=\"sibling\"\n >\n <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\"><g buffered-rendering=\"static\"><path d=\"M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z\" /><path d=\"M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617\" /></g></svg>\n <span class=\"bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950\">\n Copy to clipboard\n </span>\n </button>\n <div class='highlight relative group'>\n <pre class='highlight '><code id=\"code-tzghjjx5\">fly launch <span class=\"nt\">--no-deploy</span>\n</code></pre>\n </div>\n</div>\n<p>After selecting a name and an organization to run it in, this command will create the app and write out a <code>fly.toml</code> file for you:</p>\n<div class=\"highlight-wrapper group relative toml\">\n <button \n type=\"button\"\n class=\"bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n data-wrap-target=\"#code-bfrjoo6m\"\n >\n <svg class=\"w-4 h-4 pointer-events-none\" 
viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\" stroke-linecap=\"round\" stroke-linejoin=\"round\"><g buffered-rendering=\"static\"><path d=\"M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959\" /><path d=\"M11.081 6.466L9.533 8.037l1.548 1.571\" /></g></svg>\n <span class=\"bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950\">\n Wrap text\n </span>\n </button>\n <button \n type=\"button\"\n class=\"bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n data-copy-target=\"sibling\"\n >\n <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\"><g buffered-rendering=\"static\"><path d=\"M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z\" /><path d=\"M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617\" /></g></svg>\n <span class=\"bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950\">\n Copy to clipboard\n </span>\n </button>\n <div class='highlight relative group'>\n <pre class='highlight '><code id=\"code-bfrjoo6m\"><span class=\"c\"># fly.toml app configuration file generated for sparkling-violet-709 on 2023-11-14T12:13:53-05:00</span>\n<span class=\"c\">#</span>\n<span class=\"c\"># See https://fly.io/docs/reference/configuration/ for information about how to use this file.</span>\n<span class=\"c\">#</span>\n\n<span class=\"py\">app</span> <span class=\"p\">=</span> <span class=\"s\">\"sparkling-violet-709\"</span>\n<span 
class=\"py\">primary_region</span> <span class=\"p\">=</span> <span class=\"s\">\"ord\"</span>\n\n<span class=\"nn\">[http_service]</span>\n <span class=\"py\">internal_port</span> <span class=\"p\">=</span> <span class=\"mi\">11434</span> <span class=\"c\"># change me to 11434!</span>\n <span class=\"py\">force_https</span> <span class=\"p\">=</span> <span class=\"kc\">false</span> <span class=\"c\"># change me to false!</span>\n <span class=\"py\">auto_stop_machines</span> <span class=\"p\">=</span> <span class=\"kc\">true</span>\n <span class=\"py\">auto_start_machines</span> <span class=\"p\">=</span> <span class=\"kc\">true</span>\n <span class=\"py\">min_machines_running</span> <span class=\"p\">=</span> <span class=\"mi\">0</span>\n <span class=\"py\">processes</span> <span class=\"p\">=</span> <span class=\"nn\">[\"app\"]</span>\n</code></pre>\n </div>\n</div>\n<p>This is the configuration file that Fly.io uses to know how to run your application. We’re going to modify the <code>fly.toml</code> file to add some configuration to it, such as enabling GPU support:</p>\n<div class=\"highlight-wrapper group relative toml\">\n <button \n type=\"button\"\n class=\"bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n data-wrap-target=\"#code-3lhl3358\"\n >\n <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\" stroke-linecap=\"round\" stroke-linejoin=\"round\"><g buffered-rendering=\"static\"><path d=\"M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959\" /><path d=\"M11.081 6.466L9.533 8.037l1.548 1.571\" /></g></svg>\n <span class=\"bubble-sm 
bubble-tl [--offset-l:-9px] tail text-navy-950\">\n Wrap text\n </span>\n </button>\n <button \n type=\"button\"\n class=\"bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n data-copy-target=\"sibling\"\n >\n <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\"><g buffered-rendering=\"static\"><path d=\"M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z\" /><path d=\"M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617\" /></g></svg>\n <span class=\"bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950\">\n Copy to clipboard\n </span>\n </button>\n <div class='highlight relative group'>\n <pre class='highlight '><code id=\"code-3lhl3358\"><span class=\"py\">app</span> <span class=\"p\">=</span> <span class=\"s\">\"sparkling-violet-709\"</span>\n<span class=\"py\">primary_region</span> <span class=\"p\">=</span> <span class=\"s\">\"ord\"</span>\n<span class=\"py\">vm.size</span> <span class=\"p\">=</span> <span class=\"s\">\"a100-40gb\"</span> <span class=\"c\"># the GPU size, see https://fly.io/docs/gpus/gpu-quickstart/ for more info</span>\n</code></pre>\n </div>\n</div>\n<p>We don’t want to expose the GPU to the internet, so we’re going to create a <a href='https://fly.io/docs/reference/private-networking/#flycast-private-load-balancing' title=''>flycast</a> address to expose it to other services on your private network. 
To create a flycast address, run this command:</p>\n<div class=\"highlight-wrapper group relative bash\">\n <button \n type=\"button\"\n class=\"bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n data-wrap-target=\"#code-bthlbecs\"\n >\n <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\" stroke-linecap=\"round\" stroke-linejoin=\"round\"><g buffered-rendering=\"static\"><path d=\"M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959\" /><path d=\"M11.081 6.466L9.533 8.037l1.548 1.571\" /></g></svg>\n <span class=\"bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950\">\n Wrap text\n </span>\n </button>\n <button \n type=\"button\"\n class=\"bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n data-copy-target=\"sibling\"\n >\n <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\"><g buffered-rendering=\"static\"><path d=\"M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z\" /><path d=\"M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617\" /></g></svg>\n <span class=\"bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950\">\n Copy to 
clipboard\n </span>\n </button>\n <div class='highlight relative group'>\n <pre class='highlight '><code id=\"code-bthlbecs\">fly ips allocate-v6 <span class=\"nt\">--private</span>\n</code></pre>\n </div>\n</div>\n<p>The <code>fly ips allocate-v6</code> command makes a unique address in your private network that you can use to access Ollama from your other services. Make sure to add the <code>--private</code> flag, otherwise you’ll get a globally unique IP address instead of a private one.</p>\n\n<p>Next, you may need to remove all of the other public IP addresses for the app to lock it away from the public. Get a list of them with <code>fly ips list</code> and then remove them with <code>fly ips release <ip></code>. Delete everything but your flycast IP.</p>\n\n<p>Next, we need to declare the volume for Ollama to store models in. If you don’t do this, then when you scale to zero, your existing models will be destroyed and you will have to re-download them every time the server starts. This is not ideal, so we’re going to create a persistent volume to store the models in. 
Add the following to your <code>fly.toml</code>:</p>\n<div class=\"highlight-wrapper group relative toml\">\n <button \n type=\"button\"\n class=\"bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n data-wrap-target=\"#code-i9h5kt6l\"\n >\n <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\" stroke-linecap=\"round\" stroke-linejoin=\"round\"><g buffered-rendering=\"static\"><path d=\"M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959\" /><path d=\"M11.081 6.466L9.533 8.037l1.548 1.571\" /></g></svg>\n <span class=\"bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950\">\n Wrap text\n </span>\n </button>\n <button \n type=\"button\"\n class=\"bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n data-copy-target=\"sibling\"\n >\n <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\"><g buffered-rendering=\"static\"><path d=\"M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z\" /><path d=\"M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617\" /></g></svg>\n <span class=\"bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950\">\n Copy 
to clipboard\n </span>\n </button>\n <div class='highlight relative group'>\n <pre class='highlight '><code id=\"code-i9h5kt6l\"><span class=\"nn\">[build]</span>\n <span class=\"py\">image</span> <span class=\"p\">=</span> <span class=\"s\">\"ollama/ollama\"</span>\n\n<span class=\"nn\">[mounts]</span>\n <span class=\"py\">source</span> <span class=\"p\">=</span> <span class=\"s\">\"models\"</span>\n <span class=\"py\">destination</span> <span class=\"p\">=</span> <span class=\"s\">\"/root/.ollama\"</span>\n <span class=\"py\">initial_size</span> <span class=\"p\">=</span> <span class=\"s\">\"100gb\"</span>\n</code></pre>\n </div>\n</div>\n<p>This will create a 100GB volume in the <a href='https://en.wikipedia.org/wiki/O%27Hare_International_Airport' title=''><code>ord</code></a> region when the app is deployed. This will be used to store the models that you download from the <a href='https://ollama.ai/library/' title=''>Ollama library</a>. You can make this smaller if you want, but 100GB is a good place to start from.</p>\n\n<p>Now that everything is set up, we can deploy this to Fly.io:</p>\n<div class=\"highlight-wrapper group relative bash\">\n <button \n type=\"button\"\n class=\"bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n data-wrap-target=\"#code-iogi1ir3\"\n >\n <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\" stroke-linecap=\"round\" stroke-linejoin=\"round\"><g buffered-rendering=\"static\"><path d=\"M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959\" /><path d=\"M11.081 6.466L9.533 8.037l1.548 1.571\" /></g></svg>\n <span class=\"bubble-sm 
bubble-tl [--offset-l:-9px] tail text-navy-950\">\n Wrap text\n </span>\n </button>\n <button \n type=\"button\"\n class=\"bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n data-copy-target=\"sibling\"\n >\n <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\"><g buffered-rendering=\"static\"><path d=\"M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z\" /><path d=\"M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617\" /></g></svg>\n <span class=\"bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950\">\n Copy to clipboard\n </span>\n </button>\n <div class='highlight relative group'>\n <pre class='highlight '><code id=\"code-iogi1ir3\">fly deploy\n</code></pre>\n </div>\n</div>\n<p>This will take a minute to pull the Ollama image, push it to a Machine, provision your volume, and kick everything else off with hypervisors, GPUs and whatnot. 
Once it’s done, you should see something like this:</p>\n<div class=\"highlight-wrapper group relative bash\">\n <button \n type=\"button\"\n class=\"bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n data-wrap-target=\"#code-rgjl7r36\"\n >\n <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\" stroke-linecap=\"round\" stroke-linejoin=\"round\"><g buffered-rendering=\"static\"><path d=\"M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959\" /><path d=\"M11.081 6.466L9.533 8.037l1.548 1.571\" /></g></svg>\n <span class=\"bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950\">\n Wrap text\n </span>\n </button>\n <button \n type=\"button\"\n class=\"bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n data-copy-target=\"sibling\"\n >\n <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\"><g buffered-rendering=\"static\"><path d=\"M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z\" /><path d=\"M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617\" /></g></svg>\n <span class=\"bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950\">\n 
Copy to clipboard\n </span>\n </button>\n <div class='highlight relative group'>\n <pre class='highlight '><code id=\"code-rgjl7r36\"> ✔ Machine 17816141f55489 <span class=\"o\">[</span>app] update succeeded\n<span class=\"nt\">-------</span>\n\nVisit your newly deployed app at https://sparkling-violet-709.fly.dev/\n</code></pre>\n </div>\n</div>\n<p>This is a lie because we just deleted the public IP addresses for this app. You can’t access it from the internet, and by extension, random people can’t access it either. For now, you can run an interactive session with Ollama using an ephemeral Fly Machine:</p>\n<div class=\"highlight-wrapper group relative bash\">\n <button \n type=\"button\"\n class=\"bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n data-wrap-target=\"#code-pjpmi8ic\"\n >\n <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\" stroke-linecap=\"round\" stroke-linejoin=\"round\"><g buffered-rendering=\"static\"><path d=\"M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959\" /><path d=\"M11.081 6.466L9.533 8.037l1.548 1.571\" /></g></svg>\n <span class=\"bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950\">\n Wrap text\n </span>\n </button>\n <button \n type=\"button\"\n class=\"bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n data-copy-target=\"sibling\"\n >\n <svg class=\"w-4 h-4 
pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\"><g buffered-rendering=\"static\"><path d=\"M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z\" /><path d=\"M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617\" /></g></svg>\n <span class=\"bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950\">\n Copy to clipboard\n </span>\n </button>\n <div class='highlight relative group'>\n <pre class='highlight '><code id=\"code-pjpmi8ic\">fly m run <span class=\"nt\">-e</span> <span class=\"nv\">OLLAMA_HOST</span><span class=\"o\">=</span>http://sparkling-violet-709.flycast <span class=\"nt\">--shell</span> ollama/ollama\n</code></pre>\n </div>\n</div>\n<p>And then you can pull a model from the <a href='https://ollama.ai/library/' title=''>Ollama library</a> and generate some text:</p>\n<div class=\"highlight-wrapper group relative bash\">\n <button \n type=\"button\"\n class=\"bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n data-wrap-target=\"#code-ytdqtkck\"\n >\n <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\" stroke-linecap=\"round\" stroke-linejoin=\"round\"><g buffered-rendering=\"static\"><path d=\"M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959\" /><path d=\"M11.081 6.466L9.533 8.037l1.548 1.571\" /></g></svg>\n <span class=\"bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950\">\n Wrap text\n 
</span>\n </button>\n <button \n type=\"button\"\n class=\"bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n data-copy-target=\"sibling\"\n >\n <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\"><g buffered-rendering=\"static\"><path d=\"M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z\" /><path d=\"M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617\" /></g></svg>\n <span class=\"bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950\">\n Copy to clipboard\n </span>\n </button>\n <div class='highlight relative group'>\n <pre class='highlight '><code id=\"code-ytdqtkck\"><span class=\"nv\">$ </span>ollama run openchat:7b-v3.5-fp16\n<span class=\"o\">>>></span> How <span class=\"k\">do </span>I bake chocolate chip cookies?\n To bake chocolate chip cookies, follow these steps:\n\n1. Preheat the oven to 375°F <span class=\"o\">(</span>190°C<span class=\"o\">)</span> and line a baking sheet with parchment paper or silicone baking mat.\n\n2. In a large bowl, mix together 1 cup of unsalted butter <span class=\"o\">(</span>softened<span class=\"o\">)</span>, 3/4 cup granulated sugar, and 3/4\ncup packed brown sugar <span class=\"k\">until </span>light and fluffy.\n\n3. Add 2 large eggs, one at a <span class=\"nb\">time</span>, to the butter mixture, beating well after each addition. Stir <span class=\"k\">in </span>1\nteaspoon of pure vanilla extract.\n\n4. 
In a separate bowl, whisk together 2 cups all-purpose flour, 1/2 teaspoon baking soda, and 1/2 teaspoon\nsalt. Gradually add the dry ingredients to the wet ingredients, stirring <span class=\"k\">until </span>just combined.\n\n5. Fold <span class=\"k\">in </span>2 cups of chocolate chips <span class=\"o\">(</span>or chunks<span class=\"o\">)</span> into the dough.\n\n6. Drop rounded tablespoons of dough onto the prepared baking sheet, spacing them about 2 inches apart.\n\n7. Bake <span class=\"k\">for </span>10-12 minutes, or <span class=\"k\">until </span>the edges are golden brown. The centers should still be slightly soft.\n\n8. Allow the cookies to cool on the baking sheet <span class=\"k\">for </span>a few minutes before transferring them to a wire rack\nto cool completely.\n\nEnjoy your homemade chocolate chip cookies!\n</code></pre>\n </div>\n</div>\n<p>If you want a persistent wake-on-use connection to your Ollama instance, you can set up a <a href='https://fly.io/docs/reference/private-networking/#discovering-apps-through-dns-on-a-wireguard-connection' title=''>connection to your Fly network using WireGuard</a>. This will let you use Ollama from your local applications without having to run them on Fly. 
For example, if you want to figure out the safe cooking temperature for ground beef in Celsius, you can query that in JavaScript with this snippet of code:</p>\n<div class=\"highlight-wrapper group relative typescript\">\n <button \n type=\"button\"\n class=\"bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n data-wrap-target=\"#code-rlnqfarq\"\n >\n <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\" stroke-linecap=\"round\" stroke-linejoin=\"round\"><g buffered-rendering=\"static\"><path d=\"M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959\" /><path d=\"M11.081 6.466L9.533 8.037l1.548 1.571\" /></g></svg>\n <span class=\"bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950\">\n Wrap text\n </span>\n </button>\n <button \n type=\"button\"\n class=\"bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n data-copy-target=\"sibling\"\n >\n <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\"><g buffered-rendering=\"static\"><path d=\"M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z\" /><path d=\"M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617\" 
/></g></svg>\n <span class=\"bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950\">\n Copy to clipboard\n </span>\n </button>\n <div class='highlight relative group'>\n <pre class='highlight '><code id=\"code-rlnqfarq\"><span class=\"kd\">const</span> <span class=\"nx\">generateRequest</span> <span class=\"o\">=</span> <span class=\"p\">{</span>\n <span class=\"na\">model</span><span class=\"p\">:</span> <span class=\"dl\">\"</span><span class=\"s2\">openchat:7b-v3.5-fp16</span><span class=\"dl\">\"</span><span class=\"p\">,</span>\n <span class=\"na\">prompt</span><span class=\"p\">:</span> <span class=\"dl\">\"</span><span class=\"s2\">What is the safe cooking temperature for ground beef in celsius?</span><span class=\"dl\">\"</span><span class=\"p\">,</span>\n <span class=\"na\">stream</span><span class=\"p\">:</span> <span class=\"kc\">false</span><span class=\"p\">,</span> <span class=\"c1\">// <- important for Node/Deno clients</span>\n<span class=\"p\">};</span>\n\n<span class=\"kd\">let</span> <span class=\"nx\">resp</span> <span class=\"o\">=</span> <span class=\"k\">await</span> <span class=\"nx\">fetch</span><span class=\"p\">(</span><span class=\"dl\">\"</span><span class=\"s2\">http://sparkling-violet-709.flycast/api/generate</span><span class=\"dl\">\"</span><span class=\"p\">,</span> <span class=\"p\">{</span>\n <span class=\"na\">method</span><span class=\"p\">:</span> <span class=\"dl\">\"</span><span class=\"s2\">POST</span><span class=\"dl\">\"</span><span class=\"p\">,</span>\n <span class=\"na\">body</span><span class=\"p\">:</span> <span class=\"nx\">JSON</span><span class=\"p\">.</span><span class=\"nx\">stringify</span><span class=\"p\">(</span><span class=\"nx\">generateRequest</span><span class=\"p\">),</span>\n<span class=\"p\">});</span>\n\n<span class=\"k\">if</span> <span class=\"p\">(</span><span class=\"nx\">resp</span><span class=\"p\">.</span><span class=\"nx\">status</span> <span class=\"o\">!==</span> <span 
class=\"mi\">200</span><span class=\"p\">)</span> <span class=\"p\">{</span>\n <span class=\"k\">throw</span> <span class=\"k\">new</span> <span class=\"nb\">Error</span><span class=\"p\">(</span><span class=\"s2\">`error fetching response: </span><span class=\"p\">${</span><span class=\"nx\">resp</span><span class=\"p\">.</span><span class=\"nx\">status</span><span class=\"p\">}</span><span class=\"s2\">: </span><span class=\"p\">${</span><span class=\"k\">await</span> <span class=\"nx\">resp</span><span class=\"p\">.</span><span class=\"nx\">text</span><span class=\"p\">()}</span><span class=\"s2\">`</span><span class=\"p\">);</span>\n<span class=\"p\">}</span>\n\n<span class=\"nx\">resp</span> <span class=\"o\">=</span> <span class=\"k\">await</span> <span class=\"nx\">resp</span><span class=\"p\">.</span><span class=\"nx\">json</span><span class=\"p\">();</span>\n\n<span class=\"nx\">console</span><span class=\"p\">.</span><span class=\"nx\">log</span><span class=\"p\">(</span><span class=\"nx\">resp</span><span class=\"p\">.</span><span class=\"nx\">response</span><span class=\"p\">);</span> <span class=\"c1\">// Something like \"The safe cooking temperature for ground beef is 71 degrees celsius (160 degrees fahrenheit).\"</span>\n</code></pre>\n </div>\n</div><h2 id='scaling-to-zero' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#scaling-to-zero' aria-label='Anchor'></a><span class='plain-code'>Scaling to zero</span></h2>\n<p>The best part about all of this is that scaling down to zero running Machines takes no work at all: the Machine shuts down automatically when it’s idle. 
Wait a few minutes and then verify it with <code>fly status</code>:</p>\n<div class=\"highlight-wrapper group relative bash\">\n <button \n type=\"button\"\n class=\"bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n data-wrap-target=\"#code-u3h45u8u\"\n >\n <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\" stroke-linecap=\"round\" stroke-linejoin=\"round\"><g buffered-rendering=\"static\"><path d=\"M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959\" /><path d=\"M11.081 6.466L9.533 8.037l1.548 1.571\" /></g></svg>\n <span class=\"bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950\">\n Wrap text\n </span>\n </button>\n <button \n type=\"button\"\n class=\"bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n data-copy-target=\"sibling\"\n >\n <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\"><g buffered-rendering=\"static\"><path d=\"M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z\" /><path d=\"M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617\" /></g></svg>\n <span class=\"bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] 
text-navy-950\">\n Copy to clipboard\n </span>\n </button>\n <div class='highlight relative group'>\n <pre class='highlight '><code id=\"code-u3h45u8u\"><span class=\"nv\">$ </span>fly status\n\n...\n\nPROCESS ID VERSION REGION STATE ROLE CHECKS LAST UPDATED\napp 3d8d7949b22089 9 ord stopped 2023-11-14T19:34:24Z\n</code></pre>\n </div>\n</div>\n<p>The app has been stopped. This means that it’s not running and you’re not paying for it. When you want it to start up again, just make a request. It will automatically start up and you can use it as normal with the CLI or even just arbitrary calls to <a href='https://github.com/jmorganca/ollama/blob/main/docs/api.md' title=''>the API</a>.</p>\n\n<p>You can also upload your own models to the Ollama registry by <a href='https://github.com/jmorganca/ollama/blob/main/docs/import.md' title=''>creating your own Modelfile</a> and pushing it (though you will need to install Ollama locally to publish your own models). At this time, the only way to set a custom system prompt is to use a Modelfile and upload your model to the registry.</p>\n<h2 id='conclusion' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#conclusion' aria-label='Anchor'></a><span class='plain-code'>Conclusion</span></h2>\n<p>Ollama is a fantastic way to run large language models of your choice and the ability to use Fly.io’s powerful GPUs means you can use bigger models with more parameters and a larger context window. This lets you make your assistants more lifelike, your conversations have more context, and your text generation more realistic.</p>\n\n<p>Oh, by the way, this also lets you use the new <code>json</code> mode to have your models call functions, similar to how ChatGPT would. 
To do this, have a system prompt that looks like this:</p>\n<div class=\"highlight-wrapper group relative \">\n <button \n type=\"button\"\n class=\"bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n data-wrap-target=\"#code-p3jklt02\"\n >\n <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\" stroke-linecap=\"round\" stroke-linejoin=\"round\"><g buffered-rendering=\"static\"><path d=\"M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959\" /><path d=\"M11.081 6.466L9.533 8.037l1.548 1.571\" /></g></svg>\n <span class=\"bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950\">\n Wrap text\n </span>\n </button>\n <button \n type=\"button\"\n class=\"bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n data-copy-target=\"sibling\"\n >\n <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\"><g buffered-rendering=\"static\"><path d=\"M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z\" /><path d=\"M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617\" /></g></svg>\n <span class=\"bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950\">\n 
Copy to clipboard\n </span>\n </button>\n <div class='highlight relative group'>\n <pre class='highlight '><code id=\"code-p3jklt02\">You are a helpful research assistant. The following functions are available for you to fetch further data to answer user questions, if relevant:\n\n{\n \"function\": \"search_bing\",\n \"description\": \"Search the web for content on Bing. This allows users to search online/the internet/the web for content.\",\n \"arguments\": [\n {\n \"name\": \"query\",\n \"type\": \"string\",\n \"description\": \"The search query string\"\n }\n ]\n}\n\n{\n \"function\": \"search_arxiv\",\n \"description\": \"Search for research papers on ArXiv. Make use of AND, OR and NOT operators as appropriate to join terms within the query.\",\n \"arguments\": [\n {\n \"name\": \"query\",\n \"type\": \"string\",\n \"description\": \"The search query string\"\n }\n ]\n}\n\nTo call a function, respond - immediately and only - with a JSON object of the following format:\n{\n \"function\": \"function_name\",\n \"arguments\": {\n \"argument1\": \"argument_value\",\n \"argument2\": \"argument_value\"\n }\n}\n\nIf no function needs to be called, respond with an empty JSON object: {}\n</code></pre>\n </div>\n</div>\n<p>Then you can use the <a href='https://github.com/jmorganca/ollama/blob/main/docs/api.md#request-json-mode' title=''>JSON format</a> to receive a JSON response from Ollama (hint: <code>--format=json</code> in the CLI or <code>format: \"json\"</code> in the API). This is a great way to make your assistants more lifelike and more useful. 
You will need to use something like <a href='https://www.langchain.com/' title=''>LangChain</a> or manual iterations to properly handle the cases where the user doesn’t want to call a function, but that’s a topic for another blog post.</p>\n\n<p>For the best results, you may want to use a model with a larger context window such as <a href='https://ollama.ai/library/vicuna:13b-v1.5-16k-fp16' title=''>vicuna:13b-v1.5-16k-fp16</a> (16k == 16,384 token window), as JSON is very token-expensive. Future advances in the next few weeks (such as the Yi models gaining ludicrous token windows on the order of 200,000 tokens at the cost of ludicrous amounts of VRAM usage) will make this less of an issue. You can also get away with minifying the JSON in the functions and examples a lot, but you may need to experiment to get the best results.</p>\n\n<p>Happy hacking, y'all.</p>", "image": { "url": "https://fly.io/blog/scaling-llm-ollama/assets/thumb.webp", "title": null }, "media": [], "authors": [ { "name": "Fly", "email": null, "url": null } ], "categories": [] }, { "id": "https://fly.io/blog/rethinking-serverless-with-flame/", "title": "Rethinking Serverless with FLAME", "description": null, "url": "https://fly.io/blog/rethinking-serverless-with-flame/", "published": "2023-12-06T00:00:00.000Z", "updated": "2024-04-12T18:23:39.000Z", "content": "<blockquote>Imagine if you could auto scale simply by wrapping any existing app code in a function and have that block of code run in a temporary copy of your app.</blockquote>\n\n\n<p>The pursuit of elastic, auto-scaling applications has taken us to silly places.</p>\n\n<p>Serverless/FaaS had a couple of things going for it. Elastic Scale™ is hard. It’s even harder when you need to manage those pesky servers. It also promised pay-what-you-use costs to avoid idle usage. Good stuff, right?</p>\n\n<p>Well, the charade is over. You offload scaling concerns and the complexities of scaling, just to end up needing <em>more complexity</em>. 
Additional queues, storage, and glue code to communicate back to our app are just the starting point. Dev, test, and CI complexity balloons as fast as your costs. Oh, and you often have to rewrite your app in proprietary JavaScript – even if it’s already written in JavaScript!</p>\n\n<p>At the same time, the rest of us have elastically scaled by starting more webservers. Or we’ve dumped on complexity with microservices. This doesn’t make sense. Piling on more webservers to transcode more videos or serve up more ML tasks isn’t what we want. And granular scale shouldn’t require slicing our apps into bespoke operational units with their own APIs and deployments to manage.</p>\n\n<p>Enough is enough. There’s a better way to elastically scale applications.</p>\n<h2 id='the-flame-pattern' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#the-flame-pattern' aria-label='Anchor'></a><span class='plain-code'>The FLAME pattern</span></h2>\n<p>Here’s what we really want:</p>\n\n<ul>\n<li>We don’t want to manage those pesky servers. 
We already have this for our app deployments via <code>fly deploy</code>, <code>git push heroku</code>, <code>kubectl</code>, etc.\n</li><li>We want on-demand, <em>granular</em> elastic scale of specific parts of our app code\n</li><li>We don’t want to rewrite our application or write parts of it in proprietary runtimes\n</li></ul>\n\n<p>Imagine if we could auto scale simply by wrapping any existing app code in a function and have that block of code run in a temporary copy of the app.</p>\n\n<p>Enter the FLAME pattern.</p>\n<blockquote>FLAME - Fleeting Lambda Application for Modular Execution</blockquote>\n\n\n<p>With FLAME, you treat your <em>entire application</em> as a lambda, where modular parts can be executed on short-lived infrastructure.</p>\n\n<p>No rewrites. No bespoke runtimes. No outrageous layers of complexity. Need to insert the results of an expensive operation into the database? PubSub broadcast the result of some expensive work? No problem! It’s your whole app so of course you can do it.</p>\n\n<p>The Elixir <a href='https://github.com/phoenixframework/flame' title=''>flame library</a> implements the FLAME pattern. It has a backend adapter for Fly.io, but you can use it on any cloud that gives you an API to spin up an instance with your app code running on it. We’ll talk more about backends in a bit, as well as implementing FLAME in other languages.</p>\n\n<p>First, let’s watch a realtime thumbnail generation example to see FLAME + Elixir in action:</p>\n<div class=\"youtube-container\" data-exclude-render>\n <div class=\"youtube-video\">\n <iframe\n width=\"100%\"\n height=\"100%\"\n src=\"https://www.youtube.com/embed/l1xt_rkWdic\"\n frameborder=\"0\"\n allow=\"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture\"\n allowfullscreen>\n </iframe>\n </div>\n</div>\n\n\n<p>Now let’s walk through something a little more basic. 
Imagine we have a function to transcode video to thumbnails in our Elixir application after they are uploaded:</p>\n<div class=\"highlight-wrapper group relative elixir\">\n <button \n type=\"button\"\n class=\"bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n data-wrap-target=\"#code-dcj5640t\"\n >\n <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\" stroke-linecap=\"round\" stroke-linejoin=\"round\"><g buffered-rendering=\"static\"><path d=\"M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959\" /><path d=\"M11.081 6.466L9.533 8.037l1.548 1.571\" /></g></svg>\n <span class=\"bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950\">\n Wrap text\n </span>\n </button>\n <button \n type=\"button\"\n class=\"bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n data-copy-target=\"sibling\"\n >\n <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\"><g buffered-rendering=\"static\"><path d=\"M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z\" /><path d=\"M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617\" /></g></svg>\n <span class=\"bubble-sm bubble-tl 
[--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950\">\n Copy to clipboard\n </span>\n </button>\n <div class='highlight relative group'>\n <pre class='highlight '><code id=\"code-dcj5640t\"><span class=\"k\">def</span> <span class=\"n\">generate_thumbnails</span><span class=\"p\">(%</span><span class=\"no\">Video</span><span class=\"p\">{}</span> <span class=\"o\">=</span> <span class=\"n\">vid</span><span class=\"p\">,</span> <span class=\"n\">interval</span><span class=\"p\">)</span> <span class=\"k\">do</span>\n <span class=\"n\">tmp</span> <span class=\"o\">=</span> <span class=\"no\">Path</span><span class=\"o\">.</span><span class=\"n\">join</span><span class=\"p\">(</span><span class=\"no\">System</span><span class=\"o\">.</span><span class=\"n\">tmp_dir!</span><span class=\"p\">(),</span> <span class=\"no\">Ecto</span><span class=\"o\">.</span><span class=\"no\">UUID</span><span class=\"o\">.</span><span class=\"n\">generate</span><span class=\"p\">())</span>\n <span class=\"no\">File</span><span class=\"o\">.</span><span class=\"n\">mkdir!</span><span class=\"p\">(</span><span class=\"n\">tmp</span><span class=\"p\">)</span>\n <span class=\"n\">args</span> <span class=\"o\">=</span> <span class=\"p\">[</span><span class=\"s2\">\"-i\"</span><span class=\"p\">,</span> <span class=\"n\">vid</span><span class=\"o\">.</span><span class=\"n\">url</span><span class=\"p\">,</span> <span class=\"s2\">\"-vf\"</span><span class=\"p\">,</span> <span class=\"s2\">\"fps=1/</span><span class=\"si\">#{</span><span class=\"n\">interval</span><span class=\"si\">}</span><span class=\"s2\">\"</span><span class=\"p\">,</span> <span class=\"s2\">\"</span><span class=\"si\">#{</span><span class=\"n\">tmp</span><span class=\"si\">}</span><span class=\"s2\">/%02d.png\"</span><span class=\"p\">]</span>\n <span class=\"no\">System</span><span class=\"o\">.</span><span class=\"n\">cmd</span><span class=\"p\">(</span><span class=\"s2\">\"ffmpeg\"</span><span 
class=\"p\">,</span> <span class=\"n\">args</span><span class=\"p\">)</span>\n <span class=\"n\">urls</span> <span class=\"o\">=</span> <span class=\"no\">VidStore</span><span class=\"o\">.</span><span class=\"n\">put_thumbnails</span><span class=\"p\">(</span><span class=\"n\">vid</span><span class=\"p\">,</span> <span class=\"no\">Path</span><span class=\"o\">.</span><span class=\"n\">wildcard</span><span class=\"p\">(</span><span class=\"n\">tmp</span> <span class=\"o\"><></span> <span class=\"s2\">\"/*.png\"</span><span class=\"p\">))</span>\n <span class=\"no\">Repo</span><span class=\"o\">.</span><span class=\"n\">insert_all</span><span class=\"p\">(</span><span class=\"no\">Thumb</span><span class=\"p\">,</span> <span class=\"no\">Enum</span><span class=\"o\">.</span><span class=\"n\">map</span><span class=\"p\">(</span><span class=\"n\">urls</span><span class=\"p\">,</span> <span class=\"o\">&</span><span class=\"p\">%{</span><span class=\"ss\">vid_id:</span> <span class=\"n\">vid</span><span class=\"o\">.</span><span class=\"n\">id</span><span class=\"p\">,</span> <span class=\"ss\">url:</span> <span class=\"nv\">&1</span><span class=\"p\">}))</span>\n<span class=\"k\">end</span>\n</code></pre>\n </div>\n</div>\n<p>Our <code>generate_thumbnails</code> function accepts a video struct and an interval. We shell out to <code>ffmpeg</code> to take the video URL and generate thumbnails at the given interval. We then upload the temporary thumbnail files to durable storage. Finally, we insert the generated thumbnail URLs into the database.</p>\n\n<p>This works great locally, but CPU-bound work like video transcoding can quickly bring our entire service to a halt in production. 
Instead of rewriting large swaths of our app to move this into microservices or some FaaS, we can simply wrap it in a FLAME call:</p>\n<div class=\"highlight-wrapper group relative elixir\">\n <button \n type=\"button\"\n class=\"bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n data-wrap-target=\"#code-gcihj0ww\"\n >\n <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\" stroke-linecap=\"round\" stroke-linejoin=\"round\"><g buffered-rendering=\"static\"><path d=\"M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959\" /><path d=\"M11.081 6.466L9.533 8.037l1.548 1.571\" /></g></svg>\n <span class=\"bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950\">\n Wrap text\n </span>\n </button>\n <button \n type=\"button\"\n class=\"bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n data-copy-target=\"sibling\"\n >\n <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\"><g buffered-rendering=\"static\"><path d=\"M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z\" /><path d=\"M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617\" /></g></svg>\n <span class=\"bubble-sm 
bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950\">\n Copy to clipboard\n </span>\n </button>\n <div class='highlight relative group'>\n <pre class='highlight '><code id=\"code-gcihj0ww\"><span class=\"k\">def</span> <span class=\"n\">generate_thumbnails</span><span class=\"p\">(%</span><span class=\"no\">Video</span><span class=\"p\">{}</span> <span class=\"o\">=</span> <span class=\"n\">vid</span><span class=\"p\">,</span> <span class=\"n\">interval</span><span class=\"p\">)</span> <span class=\"k\">do</span>\n <span class=\"no\">FLAME</span><span class=\"o\">.</span><span class=\"n\">call</span><span class=\"p\">(</span><span class=\"no\">MyApp</span><span class=\"o\">.</span><span class=\"no\">FFMpegRunner</span><span class=\"p\">,</span> <span class=\"k\">fn</span> <span class=\"o\">-></span>\n <span class=\"n\">tmp</span> <span class=\"o\">=</span> <span class=\"no\">Path</span><span class=\"o\">.</span><span class=\"n\">join</span><span class=\"p\">(</span><span class=\"no\">System</span><span class=\"o\">.</span><span class=\"n\">tmp_dir!</span><span class=\"p\">(),</span> <span class=\"no\">Ecto</span><span class=\"o\">.</span><span class=\"no\">UUID</span><span class=\"o\">.</span><span class=\"n\">generate</span><span class=\"p\">())</span>\n <span class=\"no\">File</span><span class=\"o\">.</span><span class=\"n\">mkdir!</span><span class=\"p\">(</span><span class=\"n\">tmp</span><span class=\"p\">)</span>\n <span class=\"n\">args</span> <span class=\"o\">=</span>\n <span class=\"p\">[</span><span class=\"s2\">\"-i\"</span><span class=\"p\">,</span> <span class=\"n\">vid</span><span class=\"o\">.</span><span class=\"n\">url</span><span class=\"p\">,</span> <span class=\"s2\">\"-vf\"</span><span class=\"p\">,</span> <span class=\"s2\">\"fps=1/</span><span class=\"si\">#{</span><span class=\"n\">interval</span><span class=\"si\">}</span><span class=\"s2\">\"</span><span class=\"p\">,</span> <span class=\"s2\">\"</span><span 
class=\"si\">#{</span><span class=\"n\">tmp</span><span class=\"si\">}</span><span class=\"s2\">/%02d.png\"</span><span class=\"p\">]</span>\n <span class=\"no\">System</span><span class=\"o\">.</span><span class=\"n\">cmd</span><span class=\"p\">(</span><span class=\"s2\">\"ffmpeg\"</span><span class=\"p\">,</span> <span class=\"n\">args</span><span class=\"p\">)</span>\n <span class=\"n\">urls</span> <span class=\"o\">=</span> <span class=\"no\">VidStore</span><span class=\"o\">.</span><span class=\"n\">put_thumbnails</span><span class=\"p\">(</span><span class=\"n\">vid</span><span class=\"p\">,</span> <span class=\"no\">Path</span><span class=\"o\">.</span><span class=\"n\">wildcard</span><span class=\"p\">(</span><span class=\"n\">tmp</span> <span class=\"o\"><></span> <span class=\"s2\">\"/*.png\"</span><span class=\"p\">))</span>\n <span class=\"no\">Repo</span><span class=\"o\">.</span><span class=\"n\">insert_all</span><span class=\"p\">(</span><span class=\"no\">Thumb</span><span class=\"p\">,</span> <span class=\"no\">Enum</span><span class=\"o\">.</span><span class=\"n\">map</span><span class=\"p\">(</span><span class=\"n\">urls</span><span class=\"p\">,</span> <span class=\"o\">&</span><span class=\"p\">%{</span><span class=\"ss\">vid_id:</span> <span class=\"n\">vid</span><span class=\"o\">.</span><span class=\"n\">id</span><span class=\"p\">,</span> <span class=\"ss\">url:</span> <span class=\"nv\">&1</span><span class=\"p\">}))</span>\n <span class=\"k\">end</span><span class=\"p\">)</span>\n<span class=\"k\">end</span>\n</code></pre>\n </div>\n</div>\n<p>That’s it! <code>FLAME.call</code> accepts the name of a runner pool, and a function. It then finds or boots a new copy of our entire application and runs the function there. 
Any variables the function closes over (like our <code>%Video{}</code> struct and <code>interval</code>) are passed along automatically.</p>\n\n<p>When the FLAME runner boots up, it connects back to the parent node, receives the function to run, executes it, and returns the result to the caller. Based on configuration, the booted runner either waits happily for more work before idling down, or extinguishes itself immediately.</p>\n\n<p>Let’s visualize the flow:</p>\n\n<p><img alt=\"visualizing the flow\" src=\"/blog/rethinking-serverless-with-flame/assets/visual.webp?centered\" /></p>\n\n<p>We changed no other code and issued our DB write with <code>Repo.insert_all</code> just like before, because we are running our <em>entire</em> <em>application</em>. Database connection(s) and all. Except this fleeting application only runs that little function after startup and nothing else.</p>\n\n<p>In practice, a FLAME implementation will support a pool of runners for hot startup, scale-to-zero, and elastic growth. More on that later.</p>\n<h2 id='solving-a-problem-vs-removing-the-problem' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#solving-a-problem-vs-removing-the-problem' aria-label='Anchor'></a><span class='plain-code'>Solving a problem vs removing the problem</span></h2><blockquote>FaaS solutions help you solve a problem. FLAME removes the problem.</blockquote>\n\n\n<p>The FaaS labyrinth of complexity defies reason. And it’s unavoidable. Let’s walk through the thumbnail use-case to see how.</p>\n\n<p>We try to start with the simplest building block like request/response AWS Lambda Function URLs.</p>\n\n<p>The complexity hits immediately.</p>\n\n<p>We start writing custom encoders/decoders on both sides to handle streaming the thumbnails back to the app over HTTP. 
Phew, that’s done. Wait, is our video transcoding or user uploads going to take longer than 15 minutes? Sorry, hard timeout limit – time to split our videos into chunks to stay within the timeout, which means more lambdas to do that. Now we’re orchestrating lambda workflows and relying on additional services, such as SQS and S3, to enable this.</p>\n\n<p>All the FaaS is doing is adding layers of communication between your code and the parts you want to run elastically. Each layer has its own glue integration price to pay.</p>\n\n<p>Ultimately, handling this kind of use-case looks something like this:</p>\n\n<ul>\n<li>Trigger the lambda via HTTP endpoint, S3, or API gateway ($)\n</li><li>Write the bespoke lambda to transcode the video ($)\n</li><li>Place the thumbnail results into SQS ($)\n</li><li>Write the SQS consumer in our app (dev $)\n</li><li>Persist to DB and figure out how to get events back to active subscribers that may well be connected to other instances than the SQS consumer (dev $)\n</li></ul>\n\n<p>This is nuts. We pay the FaaS toll at every step. We shouldn’t have to do any of this!</p>\n\n<p>FaaS provides a bunch of offerings to build a solution on top of. 
FLAME removes the problem entirely.</p>\n<h2 id='flame-backends' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#flame-backends' aria-label='Anchor'></a><span class='plain-code'>FLAME Backends</span></h2><blockquote>On Fly.io infrastructure the <code>FLAME.FlyBackend</code> can boot a copy of your application on a new <a href=\"https://fly.io/docs/machines/\">Machine</a> and have it connect back to the parent for work within ~3s.</blockquote>\n\n\n<p>By default, FLAME ships with a <code>LocalBackend</code> and <code>FlyBackend</code>, but any host that provides an API to provision a server and run your app code can work as a FLAME backend. Erlang and Elixir primitives are doing all the heavy lifting here. The entire <code>FLAME.FlyBackend</code> is <a href='https://github.com/phoenixframework/flame/blob/main/lib/flame/fly_backend.ex' title=''>< 200 LOC with docs</a>. The library has a single dependency, <code>req</code>, which is an HTTP client.</p>\n\n<p>Because Fly.io runs our applications as a packaged-up Docker image, we simply ask the Fly API to boot a new Machine for us with the same image that our app is currently running. Also, thanks to Fly infrastructure, we can guarantee the FLAME runners are started in the same region as the parent. 
This optimizes latency and lets you ship whatever data back and forth between parent and runner without having to think about it.</p>\n<h2 id='look-at-everything-were-not-doing' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#look-at-everything-were-not-doing' aria-label='Anchor'></a><span class='plain-code'>Look at everything we’re not doing</span></h2>\n<p>With FaaS, just imagine how quickly the dev and testing story becomes a fate worse than death.</p>\n\n<p>To run the app locally, we either need to add some huge dev dependencies to simulate the entire FaaS pipeline, or worse, connect up our dev and test environments directly to the FaaS provider.</p>\n\n<p>With FLAME, your dev and test runners simply run on the local backend.</p>\n\n<p>Remember, this is your app. FLAME just controls where modular parts of it run. 
In dev or test, those parts simply run on the existing runtime on your laptop or CI server.</p>\n\n<p>Using Elixir, we can even send a file across to the remote FLAME application thanks to the distributed features of the Erlang VM:</p>\n<div class=\"highlight-wrapper group relative elixir\">\n <button \n type=\"button\"\n class=\"bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n data-wrap-target=\"#code-6icc60nu\"\n >\n <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\" stroke-linecap=\"round\" stroke-linejoin=\"round\"><g buffered-rendering=\"static\"><path d=\"M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959\" /><path d=\"M11.081 6.466L9.533 8.037l1.548 1.571\" /></g></svg>\n <span class=\"bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950\">\n Wrap text\n </span>\n </button>\n <button \n type=\"button\"\n class=\"bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n data-copy-target=\"sibling\"\n >\n <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\"><g buffered-rendering=\"static\"><path d=\"M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z\" /><path d=\"M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 
1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617\" /></g></svg>\n <span class=\"bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950\">\n Copy to clipboard\n </span>\n </button>\n <div class='highlight relative group'>\n <pre class='highlight '><code id=\"code-6icc60nu\"><span class=\"k\">def</span> <span class=\"n\">generate_thumbnails</span><span class=\"p\">(%</span><span class=\"no\">Video</span><span class=\"p\">{}</span> <span class=\"o\">=</span> <span class=\"n\">vid</span><span class=\"p\">,</span> <span class=\"n\">interval</span><span class=\"p\">)</span> <span class=\"k\">do</span>\n <span class=\"n\">parent_stream</span> <span class=\"o\">=</span> <span class=\"no\">File</span><span class=\"o\">.</span><span class=\"n\">stream!</span><span class=\"p\">(</span><span class=\"n\">vid</span><span class=\"o\">.</span><span class=\"n\">filepath</span><span class=\"p\">,</span> <span class=\"p\">[],</span> <span class=\"mi\">2048</span><span class=\"p\">)</span>\n <span class=\"no\">FLAME</span><span class=\"o\">.</span><span class=\"n\">call</span><span class=\"p\">(</span><span class=\"no\">MyApp</span><span class=\"o\">.</span><span class=\"no\">FFMpegRunner</span><span class=\"p\">,</span> <span class=\"k\">fn</span> <span class=\"o\">-></span>\n <span class=\"n\">tmp_file</span> <span class=\"o\">=</span> <span class=\"no\">Path</span><span class=\"o\">.</span><span class=\"n\">join</span><span class=\"p\">(</span><span class=\"no\">System</span><span class=\"o\">.</span><span class=\"n\">tmp_dir!</span><span class=\"p\">(),</span> <span class=\"no\">Ecto</span><span class=\"o\">.</span><span class=\"no\">UUID</span><span class=\"o\">.</span><span class=\"n\">generate</span><span class=\"p\">())</span>\n <span class=\"n\">flame_stream</span> <span class=\"o\">=</span> <span class=\"no\">File</span><span class=\"o\">.</span><span class=\"n\">stream!</span><span class=\"p\">(</span><span 
class=\"n\">tmp_file</span><span class=\"p\">)</span>\n <span class=\"no\">Enum</span><span class=\"o\">.</span><span class=\"n\">into</span><span class=\"p\">(</span><span class=\"n\">parent_stream</span><span class=\"p\">,</span> <span class=\"n\">flame_stream</span><span class=\"p\">)</span>\n\n <span class=\"n\">tmp</span> <span class=\"o\">=</span> <span class=\"no\">Path</span><span class=\"o\">.</span><span class=\"n\">join</span><span class=\"p\">(</span><span class=\"no\">System</span><span class=\"o\">.</span><span class=\"n\">tmp_dir!</span><span class=\"p\">(),</span> <span class=\"no\">Ecto</span><span class=\"o\">.</span><span class=\"no\">UUID</span><span class=\"o\">.</span><span class=\"n\">generate</span><span class=\"p\">())</span>\n <span class=\"no\">File</span><span class=\"o\">.</span><span class=\"n\">mkdir!</span><span class=\"p\">(</span><span class=\"n\">tmp</span><span class=\"p\">)</span>\n <span class=\"n\">args</span> <span class=\"o\">=</span>\n <span class=\"p\">[</span><span class=\"s2\">\"-i\"</span><span class=\"p\">,</span> <span class=\"n\">tmp_file</span><span class=\"p\">,</span> <span class=\"s2\">\"-vf\"</span><span class=\"p\">,</span> <span class=\"s2\">\"fps=1/</span><span class=\"si\">#{</span><span class=\"n\">interval</span><span class=\"si\">}</span><span class=\"s2\">\"</span><span class=\"p\">,</span> <span class=\"s2\">\"</span><span class=\"si\">#{</span><span class=\"n\">tmp</span><span class=\"si\">}</span><span class=\"s2\">/%02d.png\"</span><span class=\"p\">]</span>\n <span class=\"no\">System</span><span class=\"o\">.</span><span class=\"n\">cmd</span><span class=\"p\">(</span><span class=\"s2\">\"ffmpeg\"</span><span class=\"p\">,</span> <span class=\"n\">args</span><span class=\"p\">)</span>\n <span class=\"n\">urls</span> <span class=\"o\">=</span> <span class=\"no\">VidStore</span><span class=\"o\">.</span><span class=\"n\">put_thumbnails</span><span class=\"p\">(</span><span class=\"n\">vid</span><span 
class=\"p\">,</span> <span class=\"no\">Path</span><span class=\"o\">.</span><span class=\"n\">wildcard</span><span class=\"p\">(</span><span class=\"n\">tmp</span> <span class=\"o\"><></span> <span class=\"s2\">\"/*.png\"</span><span class=\"p\">))</span>\n <span class=\"no\">Repo</span><span class=\"o\">.</span><span class=\"n\">insert_all</span><span class=\"p\">(</span><span class=\"no\">Thumb</span><span class=\"p\">,</span> <span class=\"no\">Enum</span><span class=\"o\">.</span><span class=\"n\">map</span><span class=\"p\">(</span><span class=\"n\">urls</span><span class=\"p\">,</span> <span class=\"o\">&</span><span class=\"p\">%{</span><span class=\"ss\">vid_id:</span> <span class=\"n\">vid</span><span class=\"o\">.</span><span class=\"n\">id</span><span class=\"p\">,</span> <span class=\"ss\">url:</span> <span class=\"nv\">&1</span><span class=\"p\">}))</span>\n <span class=\"k\">end</span><span class=\"p\">)</span>\n<span class=\"k\">end</span>\n</code></pre>\n </div>\n</div>\n<p>On line 2 we open a file on the parent node to the video path. Then in the FLAME child, we stream the file from the parent node to the FLAME server in only a couple lines of code. That’s it! No setup of S3 or HTTP interfaces required.</p>\n\n<p>With FLAME it’s easy to miss everything we’re not doing:</p>\n\n<ul>\n<li>We don’t need to write code outside of our application. 
We can reuse business logic, database setup, PubSub, and all the features of our respective platforms\n</li><li>We don’t need to manage deploys of separate services or endpoints\n</li><li>We don’t need to write results to S3 or SQS just to pick up values back in our app\n</li><li>We skip the dev, test, and CI dependency dance\n</li></ul>\n<h2 id='flame-outside-elixir' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#flame-outside-elixir' aria-label='Anchor'></a><span class='plain-code'>FLAME outside Elixir</span></h2>\n<p>Elixir is fantastically well suited for the FLAME model because we get so much <a href='https://fly.io/phoenix-files/elixir-and-phoenix-can-do-it-all/' title=''>for free</a> like process supervision and distributed messaging. That said, any language with reasonable concurrency primitives can take advantage of this pattern. For example, my teammate, Lubien, created a proof of concept example for breaking out functions in your JavaScript application and running them inside a new Fly Machine: <a href='https://github.com/lubien/fly-run-this-function-on-another-machine' title=''>https://github.com/lubien/fly-run-this-function-on-another-machine</a></p>\n\n<p>So the general flow for a JavaScript-based FLAME call would be to move the modular executions to a new file, which is executed on a runner pool. Provided the arguments are JSON serializable, the general FLAME flow is similar to what we’ve outlined here. 
Your application, your code, running on fleeting instances.</p>\n\n<p>A complete FLAME library will need to handle the following concerns:</p>\n\n<ul>\n<li>Elastic pool scale-up and scale-down logic\n</li><li>Hot vs cold startup with pools\n</li><li>Remote runner monitoring to avoid orphaned resources\n</li><li>How to monitor and keep deployments fresh\n</li></ul>\n\n<p>For the rest of this post we’ll see how the Elixir FLAME library handles these concerns as well as features uniquely suited to Elixir applications. But first, you might be wondering about your background job queues.</p>\n<h2 id='what-about-my-background-job-processor' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#what-about-my-background-job-processor' aria-label='Anchor'></a><span class='plain-code'>What about my background job processor?</span></h2>\n<p>FLAME works great inside your background job processor, but you may have noticed some overlap. If your job library handles scaling the worker pool, what is FLAME doing for you? There are a couple of important distinctions here.</p>\n\n<p>First, we reach for these queues when we need <em>durability guarantees</em>. We can often turn knobs to have the queues scale to handle more jobs as load changes. But durable operations are separate from elastic execution. Conflating these concerns can send you down a similar path to lambda complexity. Leaning on your worker queue purely for offloaded execution means writing all the glue code to get the data into and out of the job, and back to the caller or end-user’s device somehow.</p>\n\n<p>For example, if we want to guarantee we successfully generated thumbnails for a video after the user uploads it, then a job queue makes sense as the <em>dispatch, commit, and retry mechanism</em> for this operation. 
The actual transcoding could be a FLAME call inside the job itself, so we decouple the ideas of durability and scaled execution.</p>\n\n<p>On the other hand, we have operations we don’t need durability for. Take the screencast above where the user hasn’t yet saved their video. Or an ML model execution where there’s no need to waste resources churning a prompt if the user has already left the app. In those cases, it doesn’t make sense to write to a durable store to pick up a job for work that will go right into the ether.</p>\n<h2 id='pooling-for-elastic-scale' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#pooling-for-elastic-scale' aria-label='Anchor'></a><span class='plain-code'>Pooling for Elastic Scale</span></h2>\n<p>With the Elixir implementation of FLAME, you define elastic pools of runners. This allows scale-to-zero behavior while also elastically scaling up FLAME servers with max concurrency limits.</p>\n\n<p>For example, let’s take a look at the <code>start/2</code> callback, which is the entry point of all Elixir applications. 
We can drop in a <code>FLAME.Pool</code> for video transcriptions and say we want it to scale to zero, boot a max of 10, and support 5 concurrent <code>ffmpeg</code> operations per runner:</p>\n<div class=\"highlight-wrapper group relative elixir\">\n <button \n type=\"button\"\n class=\"bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n data-wrap-target=\"#code-glp57duz\"\n >\n <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\" stroke-linecap=\"round\" stroke-linejoin=\"round\"><g buffered-rendering=\"static\"><path d=\"M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959\" /><path d=\"M11.081 6.466L9.533 8.037l1.548 1.571\" /></g></svg>\n <span class=\"bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950\">\n Wrap text\n </span>\n </button>\n <button \n type=\"button\"\n class=\"bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n data-copy-target=\"sibling\"\n >\n <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\"><g buffered-rendering=\"static\"><path d=\"M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z\" /><path d=\"M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 
0-1.815.819-1.816 1.815v1.617\" /></g></svg>\n <span class=\"bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950\">\n Copy to clipboard\n </span>\n </button>\n <div class='highlight relative group'>\n <pre class='highlight '><code id=\"code-glp57duz\"><span class=\"k\">def</span> <span class=\"n\">start</span><span class=\"p\">(</span><span class=\"n\">_type</span><span class=\"p\">,</span> <span class=\"n\">_args</span><span class=\"p\">)</span> <span class=\"k\">do</span>\n <span class=\"n\">flame_parent</span> <span class=\"o\">=</span> <span class=\"no\">FLAME</span><span class=\"o\">.</span><span class=\"no\">Parent</span><span class=\"o\">.</span><span class=\"n\">get</span><span class=\"p\">()</span>\n\n <span class=\"n\">children</span> <span class=\"o\">=</span> <span class=\"p\">[</span>\n <span class=\"o\">...</span><span class=\"p\">,</span>\n <span class=\"no\">MyApp</span><span class=\"o\">.</span><span class=\"no\">Repo</span><span class=\"p\">,</span>\n <span class=\"p\">{</span><span class=\"no\">FLAME</span><span class=\"o\">.</span><span class=\"no\">Pool</span><span class=\"p\">,</span>\n <span class=\"ss\">name:</span> <span class=\"no\">Thumbs</span><span class=\"o\">.</span><span class=\"no\">FFMpegRunner</span><span class=\"p\">,</span>\n <span class=\"ss\">min:</span> <span class=\"mi\">0</span><span class=\"p\">,</span>\n <span class=\"ss\">max:</span> <span class=\"mi\">10</span><span class=\"p\">,</span>\n <span class=\"ss\">max_concurrency:</span> <span class=\"mi\">5</span><span class=\"p\">,</span>\n <span class=\"ss\">idle_shutdown_after:</span> <span class=\"mi\">30_000</span><span class=\"p\">},</span>\n <span class=\"n\">!flame_parent</span> <span class=\"o\">&&</span> <span class=\"no\">MyAppWeb</span><span class=\"o\">.</span><span class=\"no\">Endpoint</span>\n <span class=\"p\">]</span>\n <span class=\"o\">|></span> <span class=\"no\">Enum</span><span class=\"o\">.</span><span 
class=\"n\">filter</span><span class=\"p\">(</span><span class=\"o\">&</span> <span class=\"nv\">&1</span><span class=\"p\">)</span>\n\n <span class=\"n\">opts</span> <span class=\"o\">=</span> <span class=\"p\">[</span><span class=\"ss\">strategy:</span> <span class=\"ss\">:one_for_one</span><span class=\"p\">,</span> <span class=\"ss\">name:</span> <span class=\"no\">MyApp</span><span class=\"o\">.</span><span class=\"no\">Supervisor</span><span class=\"p\">]</span>\n <span class=\"no\">Supervisor</span><span class=\"o\">.</span><span class=\"n\">start_link</span><span class=\"p\">(</span><span class=\"n\">children</span><span class=\"p\">,</span> <span class=\"n\">opts</span><span class=\"p\">)</span>\n<span class=\"k\">end</span>\n</code></pre>\n </div>\n</div>\n<p>We use the presence of a FLAME parent to conditionally start our Phoenix webserver when booting the app. There’s no reason to start a webserver if we aren’t serving web traffic. Note we leave other services like the database <code>MyApp.Repo</code> alone because we want to make use of those services inside FLAME runners.</p>\n\n<p>Elixir’s supervised process approach to applications is uniquely great for turning these kinds of knobs.</p>\n\n<p>We also set our pool to idle down after 30 seconds of no caller operations. This keeps our runners hot for a short while before discarding them. 
We could also pass a <code>min: 1</code> to always ensure at least one <code>ffmpeg</code> runner is hot and ready for work by the time our application is started.</p>\n<h2 id='process-placement' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#process-placement' aria-label='Anchor'></a><span class='plain-code'>Process Placement</span></h2>\n<p>In Elixir, stateful bits of our applications are built around the <em>process</em> primitive – lightweight green threads with message mailboxes. Wrapping our otherwise stateless app code in synchronous <code>FLAME.call</code>s or async <code>FLAME.cast</code>s works great, but what about the stateful parts of our app?</p>\n\n<p><code>FLAME.place_child</code> exists to take an existing process specification in your Elixir app and start it on a FLAME runner instead of locally. You can use it anywhere you’d use <code>Task.Supervisor.start_child</code>, <code>DynamicSupervisor.start_child</code>, or similar interfaces. Just like <code>FLAME.call</code>, the process is run on an elastic pool and runners handle idle down when the process completes its work.</p>\n\n<p>And like <code>FLAME.call</code>, it lets us take existing app code, change a single LOC, and continue shipping features.</p>\n\n<p>Let’s walk through the example from the screencast above. Imagine we want to generate thumbnails for a video <em>as it is being uploaded</em>. Elixir and LiveView make this easy. 
We won’t cover all the code here, but you can view the <a href='https://github.com/fly-apps/thumbnail_generator/blob/main/lib/thumbs/thumbnail_generator.ex' title=''>full app implementation</a>.</p>\n\n<p>Our first pass would be to write a LiveView upload writer that calls into a <code>ThumbnailGenerator</code>:</p>\n<div class=\"highlight-wrapper group relative elixir\">\n <button \n type=\"button\"\n class=\"bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n data-wrap-target=\"#code-e630ykcb\"\n >\n <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\" stroke-linecap=\"round\" stroke-linejoin=\"round\"><g buffered-rendering=\"static\"><path d=\"M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959\" /><path d=\"M11.081 6.466L9.533 8.037l1.548 1.571\" /></g></svg>\n <span class=\"bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950\">\n Wrap text\n </span>\n </button>\n <button \n type=\"button\"\n class=\"bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n data-copy-target=\"sibling\"\n >\n <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\"><g buffered-rendering=\"static\"><path d=\"M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 
1.815-1.815V7.239z\" /><path d=\"M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617\" /></g></svg>\n <span class=\"bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950\">\n Copy to clipboard\n </span>\n </button>\n <div class='highlight relative group'>\n <pre class='highlight '><code id=\"code-e630ykcb\"><span class=\"k\">defmodule</span> <span class=\"no\">ThumbsWeb</span><span class=\"o\">.</span><span class=\"no\">ThumbnailUploadWriter</span> <span class=\"k\">do</span>\n <span class=\"nv\">@behaviour</span> <span class=\"no\">Phoenix</span><span class=\"o\">.</span><span class=\"no\">LiveView</span><span class=\"o\">.</span><span class=\"no\">UploadWriter</span>\n\n <span class=\"n\">alias</span> <span class=\"no\">Thumbs</span><span class=\"o\">.</span><span class=\"no\">ThumbnailGenerator</span>\n\n <span class=\"k\">def</span> <span class=\"n\">init</span><span class=\"p\">(</span><span class=\"n\">opts</span><span class=\"p\">)</span> <span class=\"k\">do</span>\n <span class=\"n\">generator</span> <span class=\"o\">=</span> <span class=\"no\">ThumbnailGenerator</span><span class=\"o\">.</span><span class=\"n\">open</span><span class=\"p\">(</span><span class=\"n\">opts</span><span class=\"p\">)</span>\n <span class=\"p\">{</span><span class=\"ss\">:ok</span><span class=\"p\">,</span> <span class=\"p\">%{</span><span class=\"ss\">gen:</span> <span class=\"n\">generator</span><span class=\"p\">}}</span>\n <span class=\"k\">end</span>\n\n <span class=\"k\">def</span> <span class=\"n\">write_chunk</span><span class=\"p\">(</span><span class=\"n\">data</span><span class=\"p\">,</span> <span class=\"n\">state</span><span class=\"p\">)</span> <span class=\"k\">do</span>\n <span class=\"no\">ThumbnailGenerator</span><span class=\"o\">.</span><span class=\"n\">stream_chunk!</span><span class=\"p\">(</span><span class=\"n\">state</span><span 
class=\"o\">.</span><span class=\"n\">gen</span><span class=\"p\">,</span> <span class=\"n\">data</span><span class=\"p\">)</span>\n <span class=\"p\">{</span><span class=\"ss\">:ok</span><span class=\"p\">,</span> <span class=\"n\">state</span><span class=\"p\">}</span>\n <span class=\"k\">end</span>\n\n <span class=\"k\">def</span> <span class=\"n\">meta</span><span class=\"p\">(</span><span class=\"n\">state</span><span class=\"p\">),</span> <span class=\"k\">do</span><span class=\"p\">:</span> <span class=\"p\">%{</span><span class=\"ss\">gen:</span> <span class=\"n\">state</span><span class=\"o\">.</span><span class=\"n\">gen</span><span class=\"p\">}</span>\n\n <span class=\"k\">def</span> <span class=\"n\">close</span><span class=\"p\">(</span><span class=\"n\">state</span><span class=\"p\">,</span> <span class=\"n\">_reason</span><span class=\"p\">)</span> <span class=\"k\">do</span>\n <span class=\"no\">ThumbnailGenerator</span><span class=\"o\">.</span><span class=\"n\">close</span><span class=\"p\">(</span><span class=\"n\">state</span><span class=\"o\">.</span><span class=\"n\">gen</span><span class=\"p\">)</span>\n <span class=\"p\">{</span><span class=\"ss\">:ok</span><span class=\"p\">,</span> <span class=\"n\">state</span><span class=\"p\">}</span>\n <span class=\"k\">end</span>\n<span class=\"k\">end</span>\n</code></pre>\n </div>\n</div>\n<p>An upload writer is a behavior that simply ferries the uploaded chunks from the client into whatever we’d like to do with them. Here we have a <code>ThumbnailGenerator.open/1</code> which starts a process that communicates with an <code>ffmpeg</code> shell. 
Inside <code>ThumbnailGenerator.open/1</code>, we use regular Elixir process primitives:</p>\n<div class=\"highlight-wrapper group relative elixir\">\n <button \n type=\"button\"\n class=\"bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n data-wrap-target=\"#code-ziskaky4\"\n >\n <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\" stroke-linecap=\"round\" stroke-linejoin=\"round\"><g buffered-rendering=\"static\"><path d=\"M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959\" /><path d=\"M11.081 6.466L9.533 8.037l1.548 1.571\" /></g></svg>\n <span class=\"bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950\">\n Wrap text\n </span>\n </button>\n <button \n type=\"button\"\n class=\"bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n data-copy-target=\"sibling\"\n >\n <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\"><g buffered-rendering=\"static\"><path d=\"M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z\" /><path d=\"M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617\" /></g></svg>\n <span class=\"bubble-sm bubble-tl [--offset-l:-6px] tail 
[--tail-x:calc(100%-30px)] text-navy-950\">\n Copy to clipboard\n </span>\n </button>\n <div class='highlight relative group'>\n <pre class='highlight '><code id=\"code-ziskaky4\"> <span class=\"c1\"># thumbnail_generator.ex</span>\n <span class=\"k\">def</span> <span class=\"n\">open</span><span class=\"p\">(</span><span class=\"n\">opts</span> <span class=\"p\">\\\\</span> <span class=\"p\">[])</span> <span class=\"k\">do</span>\n <span class=\"no\">Keyword</span><span class=\"o\">.</span><span class=\"n\">validate!</span><span class=\"p\">(</span><span class=\"n\">opts</span><span class=\"p\">,</span> <span class=\"p\">[</span><span class=\"ss\">:timeout</span><span class=\"p\">,</span> <span class=\"ss\">:caller</span><span class=\"p\">,</span> <span class=\"ss\">:fps</span><span class=\"p\">])</span>\n <span class=\"n\">timeout</span> <span class=\"o\">=</span> <span class=\"no\">Keyword</span><span class=\"o\">.</span><span class=\"n\">get</span><span class=\"p\">(</span><span class=\"n\">opts</span><span class=\"p\">,</span> <span class=\"ss\">:timeout</span><span class=\"p\">,</span> <span class=\"mi\">5_000</span><span class=\"p\">)</span>\n <span class=\"n\">caller</span> <span class=\"o\">=</span> <span class=\"no\">Keyword</span><span class=\"o\">.</span><span class=\"n\">get</span><span class=\"p\">(</span><span class=\"n\">opts</span><span class=\"p\">,</span> <span class=\"ss\">:caller</span><span class=\"p\">,</span> <span class=\"n\">self</span><span class=\"p\">())</span>\n <span class=\"n\">ref</span> <span class=\"o\">=</span> <span class=\"n\">make_ref</span><span class=\"p\">()</span>\n <span class=\"n\">parent</span> <span class=\"o\">=</span> <span class=\"n\">self</span><span class=\"p\">()</span>\n\n <span class=\"n\">spec</span> <span class=\"o\">=</span> <span class=\"p\">{</span><span class=\"bp\">__MODULE__</span><span class=\"p\">,</span> <span class=\"p\">{</span><span class=\"n\">caller</span><span class=\"p\">,</span> <span 
class=\"n\">ref</span><span class=\"p\">,</span> <span class=\"n\">parent</span><span class=\"p\">,</span> <span class=\"n\">opts</span><span class=\"p\">}}</span>\n <span class=\"p\">{</span><span class=\"ss\">:ok</span><span class=\"p\">,</span> <span class=\"n\">pid</span><span class=\"p\">}</span> <span class=\"o\">=</span> <span class=\"no\">DynamicSupervisor</span><span class=\"o\">.</span><span class=\"n\">start_child</span><span class=\"p\">(</span><span class=\"nv\">@sup</span><span class=\"p\">,</span> <span class=\"n\">spec</span><span class=\"p\">)</span>\n\n <span class=\"k\">receive</span> <span class=\"k\">do</span>\n <span class=\"p\">{</span><span class=\"o\">^</span><span class=\"n\">ref</span><span class=\"p\">,</span> <span class=\"p\">%</span><span class=\"no\">ThumbnailGenerator</span><span class=\"p\">{}</span> <span class=\"o\">=</span> <span class=\"n\">gen</span><span class=\"p\">}</span> <span class=\"o\">-></span>\n <span class=\"p\">%</span><span class=\"no\">ThumbnailGenerator</span><span class=\"p\">{</span><span class=\"n\">gen</span> <span class=\"o\">|</span> <span class=\"ss\">pid:</span> <span class=\"n\">pid</span><span class=\"p\">}</span>\n <span class=\"k\">after</span>\n <span class=\"n\">timeout</span> <span class=\"o\">-></span> <span class=\"k\">exit</span><span class=\"p\">(</span><span class=\"ss\">:timeout</span><span class=\"p\">)</span>\n <span class=\"k\">end</span>\n <span class=\"k\">end</span>\n</code></pre>\n </div>\n</div>\n<p>The details aren’t super important here, except for line 10 where we call <code>{:ok, pid} = DynamicSupervisor.start_child(@sup, spec)</code>, which starts a supervised <code>ThumbnailGenerator</code> process. The rest of the implementation simply ferries chunks as stdin into <code>ffmpeg</code> and parses PNGs from stdout. 
Once a PNG delimiter is found in stdout, we send the <code>caller</code> process (our LiveView process) a message saying “hey, here’s an image”:</p>\n<div class=\"highlight-wrapper group relative elixir\">\n <button \n type=\"button\"\n class=\"bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n data-wrap-target=\"#code-y166mubi\"\n >\n <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\" stroke-linecap=\"round\" stroke-linejoin=\"round\"><g buffered-rendering=\"static\"><path d=\"M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959\" /><path d=\"M11.081 6.466L9.533 8.037l1.548 1.571\" /></g></svg>\n <span class=\"bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950\">\n Wrap text\n </span>\n </button>\n <button \n type=\"button\"\n class=\"bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n data-copy-target=\"sibling\"\n >\n <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\"><g buffered-rendering=\"static\"><path d=\"M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z\" /><path d=\"M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617\" /></g></svg>\n <span 
class=\"bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950\">\n Copy to clipboard\n </span>\n </button>\n <div class='highlight relative group'>\n <pre class='highlight '><code id=\"code-y166mubi\"><span class=\"c1\"># thumbnail_generator.ex</span>\n<span class=\"nv\">@png_begin</span> <span class=\"o\"><<</span><span class=\"mi\">137</span><span class=\"p\">,</span> <span class=\"mi\">80</span><span class=\"p\">,</span> <span class=\"mi\">78</span><span class=\"p\">,</span> <span class=\"mi\">71</span><span class=\"p\">,</span> <span class=\"mi\">13</span><span class=\"p\">,</span> <span class=\"mi\">10</span><span class=\"p\">,</span> <span class=\"mi\">26</span><span class=\"p\">,</span> <span class=\"mi\">10</span><span class=\"o\">>></span>\n<span class=\"k\">defp</span> <span class=\"n\">handle_stdout</span><span class=\"p\">(</span><span class=\"n\">state</span><span class=\"p\">,</span> <span class=\"n\">ref</span><span class=\"p\">,</span> <span class=\"n\">bin</span><span class=\"p\">)</span> <span class=\"k\">do</span>\n <span class=\"p\">%</span><span class=\"no\">ThumbnailGenerator</span><span class=\"p\">{</span><span class=\"ss\">ref:</span> <span class=\"o\">^</span><span class=\"n\">ref</span><span class=\"p\">,</span> <span class=\"ss\">caller:</span> <span class=\"n\">caller</span><span class=\"p\">}</span> <span class=\"o\">=</span> <span class=\"n\">state</span><span class=\"o\">.</span><span class=\"n\">gen</span>\n\n <span class=\"k\">case</span> <span class=\"n\">bin</span> <span class=\"k\">do</span>\n <span class=\"o\"><<</span><span class=\"nv\">@png_begin</span><span class=\"p\">,</span> <span class=\"n\">_rest</span><span class=\"p\">::</span><span class=\"n\">binary</span><span class=\"o\">>></span> <span class=\"o\">-></span>\n <span class=\"k\">if</span> <span class=\"n\">state</span><span class=\"o\">.</span><span class=\"n\">current</span> <span class=\"k\">do</span>\n <span 
class=\"n\">send</span><span class=\"p\">(</span><span class=\"n\">caller</span><span class=\"p\">,</span> <span class=\"p\">{</span><span class=\"n\">ref</span><span class=\"p\">,</span> <span class=\"ss\">:image</span><span class=\"p\">,</span> <span class=\"n\">state</span><span class=\"o\">.</span><span class=\"n\">count</span><span class=\"p\">,</span> <span class=\"n\">encode</span><span class=\"p\">(</span><span class=\"n\">state</span><span class=\"p\">)})</span>\n <span class=\"k\">end</span>\n\n <span class=\"p\">%{</span><span class=\"n\">state</span> <span class=\"o\">|</span> <span class=\"ss\">count:</span> <span class=\"n\">state</span><span class=\"o\">.</span><span class=\"n\">count</span> <span class=\"o\">+</span> <span class=\"mi\">1</span><span class=\"p\">,</span> <span class=\"ss\">current:</span> <span class=\"p\">[</span><span class=\"n\">bin</span><span class=\"p\">]}</span>\n\n <span class=\"n\">_</span> <span class=\"o\">-></span>\n <span class=\"p\">%{</span><span class=\"n\">state</span> <span class=\"o\">|</span> <span class=\"ss\">current:</span> <span class=\"p\">[</span><span class=\"n\">bin</span> <span class=\"o\">|</span> <span class=\"n\">state</span><span class=\"o\">.</span><span class=\"n\">current</span><span class=\"p\">]}</span>\n <span class=\"k\">end</span>\n<span class=\"k\">end</span>\n</code></pre>\n </div>\n</div>\n<p>The <code>caller</code> LiveView process then picks up the message in a <code>handle_info</code> callback and updates the UI:</p>\n<div class=\"highlight-wrapper group relative elixir\">\n <button \n type=\"button\"\n class=\"bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n data-wrap-target=\"#code-3gf1jq5\"\n >\n <svg class=\"w-4 
h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\" stroke-linecap=\"round\" stroke-linejoin=\"round\"><g buffered-rendering=\"static\"><path d=\"M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959\" /><path d=\"M11.081 6.466L9.533 8.037l1.548 1.571\" /></g></svg>\n <span class=\"bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950\">\n Wrap text\n </span>\n </button>\n <button \n type=\"button\"\n class=\"bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n data-copy-target=\"sibling\"\n >\n <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\"><g buffered-rendering=\"static\"><path d=\"M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z\" /><path d=\"M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617\" /></g></svg>\n <span class=\"bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950\">\n Copy to clipboard\n </span>\n </button>\n <div class='highlight relative group'>\n <pre class='highlight '><code id=\"code-3gf1jq5\"><span class=\"c1\"># thumb_live.ex</span>\n<span class=\"k\">def</span> <span class=\"n\">handle_info</span><span class=\"p\">({</span><span class=\"n\">_ref</span><span class=\"p\">,</span> <span class=\"ss\">:image</span><span class=\"p\">,</span> <span class=\"n\">_count</span><span class=\"p\">,</span> <span class=\"n\">encoded</span><span class=\"p\">},</span> <span 
class=\"n\">socket</span><span class=\"p\">)</span> <span class=\"k\">do</span>\n <span class=\"p\">%{</span><span class=\"ss\">count:</span> <span class=\"n\">count</span><span class=\"p\">}</span> <span class=\"o\">=</span> <span class=\"n\">socket</span><span class=\"o\">.</span><span class=\"n\">assigns</span>\n\n <span class=\"p\">{</span><span class=\"ss\">:noreply</span><span class=\"p\">,</span>\n <span class=\"n\">socket</span>\n <span class=\"o\">|></span> <span class=\"n\">assign</span><span class=\"p\">(</span><span class=\"ss\">count:</span> <span class=\"n\">count</span> <span class=\"o\">+</span> <span class=\"mi\">1</span><span class=\"p\">,</span> <span class=\"ss\">message:</span> <span class=\"s2\">\"Generating (</span><span class=\"si\">#{</span><span class=\"n\">count</span> <span class=\"o\">+</span> <span class=\"mi\">1</span><span class=\"si\">}</span><span class=\"s2\">)\"</span><span class=\"p\">)</span>\n <span class=\"o\">|></span> <span class=\"n\">stream_insert</span><span class=\"p\">(</span><span class=\"ss\">:thumbs</span><span class=\"p\">,</span> <span class=\"p\">%{</span><span class=\"ss\">id:</span> <span class=\"n\">count</span><span class=\"p\">,</span> <span class=\"ss\">encoded:</span> <span class=\"n\">encoded</span><span class=\"p\">})}</span>\n<span class=\"k\">end</span>\n</code></pre>\n </div>\n</div>\n<p>The <code>send(caller, {ref, :image, state.count, encode(state)})</code> call is one of the magic parts of Elixir. Everything is a process, and we can message those processes, regardless of their location in the cluster.</p>\n\n<p>It’s like if every instantiation of an object in your favorite OO lang included a cluster-global unique identifier to work with methods on that object. 
The LiveView (a process) simply receives the image message and updates the UI with new images.</p>\n\n<p>Now let’s head back over to our <code>ThumbnailGenerator.open/1</code> function and make this elastically scalable.</p>\n<div class=\"highlight-wrapper group relative diff\">\n <button \n type=\"button\"\n class=\"bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n data-wrap-target=\"#code-5jadq56a\"\n >\n <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\" stroke-linecap=\"round\" stroke-linejoin=\"round\"><g buffered-rendering=\"static\"><path d=\"M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959\" /><path d=\"M11.081 6.466L9.533 8.037l1.548 1.571\" /></g></svg>\n <span class=\"bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950\">\n Wrap text\n </span>\n </button>\n <button \n type=\"button\"\n class=\"bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n data-copy-target=\"sibling\"\n >\n <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\"><g buffered-rendering=\"static\"><path d=\"M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z\" /><path d=\"M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 
1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617\" /></g></svg>\n <span class=\"bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950\">\n Copy to clipboard\n </span>\n </button>\n <div class='highlight relative group'>\n <pre class='highlight '><code id=\"code-5jadq56a\"><span class=\"gd\">- {:ok, pid} = DynamicSupervisor.start_child(@sup, spec)\n</span><span class=\"gi\">+ {:ok, pid} = FLAME.place_child(Thumbs.FFMpegRunner, spec)\n</span></code></pre>\n </div>\n</div>\n<p>That’s it! Because everything is a process and processes can live anywhere, it doesn’t matter what server our <code>ThumbnailGenerator</code> process lives on. It simply messages the caller with <code>send(caller, …)</code> and the messages are sent across the cluster if needed.</p>\n\n<p>Once the process exits, either from an explicit close, after the upload is done, or from the end-user closing their browser tab, the FLAME server will note the exit and idle down if no other work is being done.</p>\n\n<p>Check out the <a href='https://github.com/fly-apps/thumbnail_generator/blob/main/lib/thumbs/thumbnail_generator.ex' title=''>full implementation</a> if you’re interested.</p>\n<h2 id='remote-monitoring' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#remote-monitoring' aria-label='Anchor'></a><span class='plain-code'>Remote Monitoring</span></h2>\n<p>All this transient infrastructure needs failsafe mechanisms to avoid orphaning resources. 
If a parent spins up a runner, that runner must take care of idling itself down when no work is present and handle failsafe shutdowns if it can no longer contact the parent node.</p>\n\n<p>Likewise, we need to shut down runners when parents are rolled for new deploys, as we must guarantee we’re running the same code across the cluster.</p>\n\n<p>We also have active callers in many cases that are awaiting the result of work on runners that could go down for any reason.</p>\n\n<p>There’s a lot to monitor here.</p>\n\n<p>There are also a number of failure modes that make this sound like a harrowing experience to implement. Fortunately, Elixir has all the primitives to make this an easy task thanks to the Erlang VM. Namely, we get the following for free:</p>\n\n<ul>\n<li>Process monitoring and supervision – we know when things go bad, whether on a node-local process or one across the cluster\n</li><li>Node monitoring – we know when nodes come up, and when nodes go away\n</li><li>Declarative and controlled app startup and shutdown – we carefully control the startup and shutdown sequence of applications as a matter of course. This allows us to gracefully shut down active runners when a fresh deploy is triggered, while giving them time to finish their work\n</li></ul>\n\n<p>We’ll cover the internal implementation details in a future deep-dive post. For now, feel free to poke around <a href='https://github.com/phoenixframework/flame' title=''>the flame source</a>.</p>\n<h2 id='whats-next' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#whats-next' aria-label='Anchor'></a><span class='plain-code'>What’s Next</span></h2>\n<p>We’re just getting started with the Elixir FLAME library, but it’s ready to try out now. 
In the future, look for more advanced pool growth techniques and deep dives into how the Elixir implementation works. You can also find me <a href='https://twitter.com/chris_mccord' title=''>@chris_mccord</a> to chat about implementing the FLAME pattern in your language of choice.</p>\n\n<p>Happy coding!</p>\n\n<p>–Chris</p>", "image": { "url": "https://fly.io/blog/rethinking-serverless-with-flame/assets/flame-thumb.webp", "title": null }, "media": [], "authors": [ { "name": "Fly", "email": null, "url": null } ], "categories": [] }, { "id": "https://fly.io/blog/the-risks-of-building-apps-on-chatgpt/", "title": "The risks of building apps on ChatGPT", "description": null, "url": "https://fly.io/blog/the-risks-of-building-apps-on-chatgpt/", "published": "2023-12-05T00:00:00.000Z", "updated": "2024-04-12T18:23:39.000Z", "content": "<div class=\"lead\"><p>If AI will play an essential role in your application, then consider using a self-hosted, open source model instead of a proprietary and externally hosted one. In this post we explore some of the risks of the latter option. We’re Fly.io. We put your code into lightweight microVMs on our own hardware <a href=\"https://fly.io/docs/reference/regions/\" title=\"\">around the world</a>. <a href=\"https://fly.io/docs/speedrun/\" title=\"\">Check us out</a>—your app can be deployed in minutes.</p>\n</div>\n<p>The topic of “AI” gets a lot of attention and press. Coverage ranges from apocalyptic warnings to Utopian predictions. The truth, as always, is likely somewhere in the middle. 
As developers, we are the ones who either imagine ways that AI can be used to enhance our products or do the herculean work of implementing it inside our companies.</p>\n\n<p>I believe the following statement to be true:</p>\n\n<blockquote>\n<p>AI won’t replace humans — but humans with AI will replace humans without AI.</p>\n</blockquote>\n\n<p>I believe this can be extended to many products and services and the companies that create them. Let’s express it this way:</p>\n\n<blockquote>\n<p>AI won’t replace businesses — but businesses with AI will replace businesses without AI.</p>\n</blockquote>\n\n<p>Today I’m assuming your business would benefit from using AI. Or, at the very least, your C-levels have decreed from on high that thou must integrateth with AI. With that out of the way, the next question is how you’re meant to do it. This post is an argument for building on top of open source language models instead of closed models that you rent access to. We’ll take a look at what convinced me.</p>\n<h2 id='but-openai-is-the-market-leader' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#but-openai-is-the-market-leader' aria-label='Anchor'></a><span class='plain-code'>But OpenAI is the market leader…</span></h2>\n<p>OpenAI, the creator of the famous ChatGPT, is the clear market leader in this category. Why wouldn’t you want to use the best in the business?</p>\n\n<p>Early on, stories of employees uploading private corporate documents, and of that private information then leaking out to general ChatGPT users, were a real black eye. <a href='https://www.sciencealert.com/many-companies-are-banning-chatgpt-this-is-why' title=''>Companies began banning employees from using ChatGPT for work</a>. 
It exposed that people’s interactions with ChatGPT were being used as training data for future versions of the model.</p>\n\n<p>In response, OpenAI recently announced an <a href='https://openai.com/enterprise' title=''>Enterprise</a> offering promising that no Enterprise customer data is used for training.</p>\n\n<p>With the top objection addressed, it should be smooth sailing for wide adoption, right?</p>\n\n<p>Not so fast.</p>\n\n<p>While an Enterprise offering may address that concern, there are other subtle reasons to not use OpenAI, or other closed models, that can’t be resolved by vague statements of enterprise privacy.</p>\n<h2 id='what-are-the-risks-for-building-on-top-of-openai' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#what-are-the-risks-for-building-on-top-of-openai' aria-label='Anchor'></a><span class='plain-code'>What are the risks for building on top of OpenAI?</span></h2>\n<p>Let’s briefly outline the risks we take on when relying on a company like OpenAI for critical AI features in our applications.</p>\n\n<ul>\n<li><strong class='font-semibold text-navy-950'>Single provider risk</strong>: Relying deeply on an external service that plays a critical role in our business is risky. The integration is not easily swapped out for another service if needed. Additionally, we don’t want part of our “secret sauce” to actually be another company’s product. That’s some seriously shaky ground! They <em>want</em> to sell the same thing to our competitors too.\n</li><li><strong class='font-semibold text-navy-950'>Regulation or Policy change risk</strong>: “AI” is being talked about a lot in politics. 
What’s acceptable today may be deemed “not allowed” in the future and a corporation providing a newly regulated service must comply.\n</li><li><strong class='font-semibold text-navy-950'>Financial risk</strong>: <a href='https://www.washingtonpost.com/technology/2023/06/05/chatgpt-hidden-cost-gpu-compute/' title=''>AI chatbots lose money on every chat.</a> If the financial models that make our business profitable are built on impossible to maintain prices, then our business model may be at risk when it’s time to “make the AI engine profitable” like we’ve seen happen time and time again with every industry from cookware to video games. What might the true cost be? We don’t know. ‘Nuff said.\n</li><li><strong class='font-semibold text-navy-950'>Governance and leadership risk</strong>: The co-founder and CEO of OpenAI, <a href='https://openai.com/blog/openai-announces-leadership-transition' title=''>Sam Altman, was forced out of his own company by a coup from his board</a>. This was later resolved with both <a href='https://openai.com/blog/openai-announces-leadership-transition' title=''>Sam Altman and Greg Brockman returning</a>. This exposes another risk we don’t often consider with our providers. More on this later.\n</li></ul>\n\n<p>Let’s look a bit closer at the “Single provider risk”.</p>\n<h2 id='single-provider-risk' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#single-provider-risk' aria-label='Anchor'></a><span class='plain-code'>Single provider risk</span></h2>\n<p>For hobby usage, proof of concept work, and personal experiments, by all means, use ChatGPT! I do and I expect to continue to as well. 
It’s fantastic for prototyping, it’s trivial to set up, and it allows you to throw ink on canvas so much more quickly than any other option out there.</p>\n\n<p>Up until recently, I was all gung-ho for ChatGPT being integrated into my apps. What happened? November 2023 happened. It was a very bad month for OpenAI.</p>\n\n<p>I created a <a href='https://fly.io/phoenix-files/created-my-personal-ai-fitness-trainer-in-2-days/' title=''>Personal AI Fitness Trainer</a> powered by ChatGPT and on the morning of November 8th, I asked my personal trainer about the workout for the day and it failed. OpenAI was having a bad day with an outage.</p>\n\n<p>I don’t fault someone for having a bad day. At some point, downtime happens to the best of us. And given enough time, it happens to <strong class='font-semibold text-navy-950'>all</strong> of us. But when possible, I want to prevent someone <em>else’s</em> bad day from becoming <em>my</em> bad day too.</p>\n<h3 id='evaluating-a-critical-dependency' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#evaluating-a-critical-dependency' aria-label='Anchor'></a><span class='plain-code'>Evaluating a critical dependency</span></h3>\n<p>In my case, my personal fitness trainer being unavailable was a minor inconvenience, but I managed. However, it gave me pause. If I had built an AI fitness trainer as a service, that outage would be a much bigger deal and there would be nothing I could have done to fix it until the ChatGPT API came back up.</p>\n\n<p>With services like a Personal AI Fitness Trainer, the AI component is the primary focus and main value proposition of the app. That’s pretty darn critical! 
If that AI service is interrupted, significantly altered (say, by the model suddenly refusing my requests for fitness information in ways that worked before) or my desired usage is denied (without warning or reason), the application is useless. That’s an existential threat that could make my app evaporate overnight without warning.</p>\n\n<p>This highlights the risk of having a critical dependency on an external service.</p>\n\n<p>Modern applications depend on many services, both internal and external. But how <strong class='font-semibold text-navy-950'>critical</strong> that dependency is matters.</p>\n\n<p>Let’s take a <em>very</em> simple application as an example. The application has a critical dependency on the database and both the app and database have a critical dependency on the underlying VMs/machines/provider. These critical dependencies are so common that we seldom think about them because we deal with them every day we come to work. It’s just how things are.</p>\n\n<p><img alt=\"Diagram showing an application stack of hosting > Database > My Application and weak dependencies on logging, error reporting, etc. Then a critical dependency on an external AI as a Service. \" src=\"/blog/the-risks-of-building-apps-on-chatgpt/assets/critical-dependency-vs-weak.png\" /></p>\n\n<p>The danger comes when we draw a critical dependency line to an <strong class='font-semibold text-navy-950'>external</strong> <strong class='font-semibold text-navy-950'>service</strong>. If the service has a hiccup or the network between my app and their service starts dropping all my packets, the entire application goes down. Someone else’s bad day gets spread around when that happens. 😞</p>\n\n<p>In order to protect ourselves from a risk like that, we should diversify our reliance away from a single external provider. How do we do that? 
We’ll come back to this later.</p>\n<h3 id='we-are-not-without-dependencies' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#we-are-not-without-dependencies' aria-label='Anchor'></a><span class='plain-code'>We are not without dependencies</span></h3>\n<p>It’s really common for apps to have external dependencies. The question is how critical to our service are those dependencies?</p>\n\n<p>What happens to the application when the external log aggregation service, email service, and error reporting services are all unreachable? If the app is designed well, then users may have a slightly degraded experience or, best case, the users won’t even notice the issues at all!</p>\n\n<p>The key factor is these external services are not essential to our application functioning.</p>\n<h2 id='regulation-or-policy-change-risk' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#regulation-or-policy-change-risk' aria-label='Anchor'></a><span class='plain-code'>Regulation or Policy change risk</span></h2>\n<p>Our industry has a lot of misconceptions, fear, uncertainty, and doubt around the idea of regulation, but sometimes it’s justified. I don’t want you to think about regulation as a scary thing that yanks away control. Instead, let’s think about regulation as when a government body gets involved to disallow businesses from doing or engaging in specific activities. Given that our industry has been so self-defined for so long, this feels like an existential threat. 
However, this is a good thing when we think about vehicle safety standards (you don’t want your 4-ton mass of metal exploding while traveling at 70 mph), pollution, health risks, and more. It’s a careful balance.</p>\n\n<p>Ironically, Sam Altman has been a major proponent <a href='https://www.forbes.com/sites/johannacostigan/2023/06/13/openais-sam-altman-makes-global-call-for-ai-regulation-and-includes-china/?sh=4fc007421b47' title=''>for government regulation</a> of the AI industry. Why would he want that?</p>\n\n<p>It turns out that <a href='https://www.cato.org/policy-analysis/regulatory-protectionism-hidden-threat-free-trade' title=''>regulation can also be used as a form of protectionism</a>. Or, put another way, when the people with an early lead see that <a href='https://www.semianalysis.com/p/google-we-have-no-moat-and-neither' title=''>they aren’t defensible against advances with open source AI models</a>, they want to pull up the ladders behind them and have the government make it legally harder, or impossible, for competitors to catch up to them.</p>\n\n<p>If Altman’s efforts are successful, then companies who create AI can expect government involvement and oversight. Added licensing requirements and certifications would raise the cost of starting a competing business.</p>\n\n<p>At this point you may be thinking something like “but all of that is theoretical Mark, how would this affect my business’ use of AI today?”</p>\n\n<p>Introducing an external organization that can dictate changes to an AI product risks breaking an existing company’s applications or significantly reducing the effectiveness of the application. And those changes may come without notice or warning.</p>\n\n<p>Additionally, if my business is built on an external AI system protected from competition by regulators, that adds a significant risk. 
If they are now the only game in town, they can set whatever price they want.</p>\n<h2 id='governance-and-leadership-risk' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#governance-and-leadership-risk' aria-label='Anchor'></a><span class='plain-code'>Governance and leadership risk</span></h2>\n<p>The week after the OpenAI outage (November 17th, to be precise), the entire tech industry was upended for most of a week by a post on the OpenAI blog <a href='https://openai.com/blog/openai-announces-leadership-transition' title=''>announcing that the OpenAI board had fired the co-founder and CEO, Sam Altman</a>. Then <a href='https://www.forbes.com/sites/richardnieva/2023/11/17/openai-president-and-co-founder-quits-over-sam-altman-firing/?sh=34fe4b621d57' title=''>Greg Brockman, co-founder and acting President, resigned in protest</a>.</p>\n\n<p>OpenAI is partnered with Microsoft, and on Nov 20, 2023, <a href='https://twitter.com/satyanadella/status/1726509045803336122' title=''>Satya Nadella (CEO of Microsoft) posted the following on X</a> (formerly Twitter):</p>\n\n<blockquote>\n<p>We remain committed to our partnership with OpenAI (OAI) and have confidence in our product roadmap, our ability to continue to innovate with everything we announced at Microsoft Ignite, and in continuing to support our customers and partners. We look forward to getting to know Emmett Shear and OAI’s new leadership team and working with them. 
And <strong class='font-semibold text-navy-950'>we’re extremely excited to share the news that Sam Altman and Greg Brockman, together with colleagues, will be joining Microsoft to lead a new advanced AI research team.</strong> We look forward to moving quickly to provide them with the resources needed for their success.</p>\n</blockquote>\n\n<p>Microsoft nearly <a href='https://en.wikipedia.org/wiki/Acqui-hiring' title=''>acqui-hired</a> OpenAI for $0! That’s some serious business Jujutsu.</p>\n\n<p>In the end, after 12 days of very public corporate chaos, <a href='https://openai.com/blog/sam-altman-returns-as-ceo-openai-has-a-new-initial-board' title=''>Sam Altman and Greg Brockman returned to OpenAI at their previous leadership positions</a> as if nothing happened (save the firing of the rest of the board).</p>\n\n<p>With all the drama and uncertainty resolved, you may say, “it all worked out in the end, right? So what’s the problem?”</p>\n\n<p>This highlights the risk of building <em>any</em> critical business system on a product offered and hosted by an external company. When we do that, we implicitly take on all of that company’s risks in addition to the risks our business already has! In this case, it’s taking on all the risks of OpenAI while getting none of their financial benefits!</p>\n<h2 id='whats-the-alternative' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#whats-the-alternative' aria-label='Anchor'></a><span class='plain-code'>What’s the alternative?</span></h2>\n<p>The thing big AI providers like OpenAI and Google seem to fear most is competition from open source AI models. And they should be afraid. 
Open source AI models continue to develop at a rapid pace (there are huge incremental improvements on a weekly basis) and, most importantly, they can be self-hosted.</p>\n\n<p>Additionally, it’s not out of reach for us to <a href='https://huggingface.co/docs/transformers/training' title=''>fine-tune</a> a general model to better fit our needs by adding and removing capabilities, rather than hoping that the capabilities we need suddenly manifest for us.</p>\n\n<p>Doesn’t this all sound like the classic argument in favor of open source?</p>\n\n<p>If we have the model and can host it ourselves, no one can take it away. When we self-host it, we are protected from:</p>\n\n<ul>\n<li>service interruptions from an external provider for a critical system\n</li><li>changes in licensing or usage fees (such as your provider suddenly doubling inference costs without warning via an email sent at 3AM)\n</li><li>government regulators dictating a change to the model that negatively affects our use case (assuming our use isn’t breaking the law, of course)\n</li><li>company policy changes that alter the behavior of the model we rely on\n</li><li>rogue boards or a leadership crisis that impacts a provider\n</li></ul>\n\n<p>Using an open source and self-hosted model insulates us from these external risks.</p>\n<h2 id='i-still-need-gpus' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#i-still-need-gpus' aria-label='Anchor'></a><span class='plain-code'>I still need GPUs!</span></h2>\n<p>Getting dedicated access to a GPU is more expensive than renting limited time on OpenAI’s servers. 
That’s why a hobby or personal project is better off paying for the brief bits of time when needed.</p>\n\n<p>But let’s face it.</p>\n\n<p>If you really want to integrate AI into your business, you need to host your own models. You can’t control third-party privacy policies, but you can control your own policies when you are the one doing your own inference with your own models. Ideally this means getting your own GPUs and incurring the capital and operating expenditures, but thankfully we’re in the future. We have the cloud now. There are many options you can use for renting GPU access from other companies. This is supported in the big clouds as well as Fly.io. You can check out our <a href='https://fly.io/docs/about/pricing/#gpus-and-fly-machines' title=''>GPU offerings here</a>.</p>\n<figure class=\"post-cta\">\n <figcaption>\n <h1>Fly.io also offers GPUs</h1>\n <p>Running inference on your own hosted models can help de-risk critical AI integrations.</p>\n <a class=\"btn btn-lg\" href=\"https://fly.io/docs/about/pricing/#gpus-and-fly-machines\">\n GPU resource prices\n </a>\n </figcaption>\n <div class=\"image-container\">\n <img src=\"/static/images/cta-turtle.webp\" srcset=\"/static/images/[email protected] 2x\" alt=\"\">\n </div>\n</figure>\n\n<h2 id='closing-thoughts' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#closing-thoughts' aria-label='Anchor'></a><span class='plain-code'>Closing thoughts</span></h2>\n<p>It’s important to take advantage of AI in our applications so we can reap the benefits. It can give us an important edge in the market! However, we should be extra cautious about building any critical features on a product offered by a proprietary external business. 
<a href='https://www.msn.com/en-us/money/companies/sam-altman-chaos-helped-openai-rivals-says-hugging-face-ceo-cl%C3%A9ment-delangue/ar-AA1kIFQP' title=''>Others are considering the risks of building on OpenAI as well</a>.</p>\n\n<p>Your specific level of risk depends on how central the AI aspect is to your business. If it’s a central component like in my Personal AI Fitness Trainer, then I risk losing all my customers and even the company if any of the above mentioned risk factors happen to my AI provider. That’s an existential risk that I can’t do anything about without taking emergency heroic efforts.</p>\n\n<p>If the AI is sprinkled around the edges of the business, then suddenly losing it won’t kill the company. However, if the AI isn’t being well utilized, then the business may be at risk to competitors who place a bigger bet and take a bigger swing with AI.</p>\n\n<p>Oh, what interesting times we live in! 🙃</p>", "image": { "url": "https://fly.io/blog/the-risks-of-building-apps-on-chatgpt/assets/risks-building-on-chatgpt-thumb.webp", "title": null }, "media": [], "authors": [ { "name": "Fly", "email": null, "url": null } ], "categories": [] }, { "id": "https://fly.io/blog/print-on-demand/", "title": "Print on Demand", "description": null, "url": "https://fly.io/blog/print-on-demand/", "published": "2023-11-29T00:00:00.000Z", "updated": "2024-04-12T18:23:39.000Z", "content": "<div class=\"lead\"><p>Save money by using appliance machines to only allocate memory and other machine resources when you actually need them.</p>\n</div>\n<p>Scaling discussions often lead to recommendations to add more memory, more CPU, more machines, more regions, more, more, more.</p>\n\n<p>This post is different. It focuses instead on the idea of decomposing parts of your applications into event handlers, starting up Machines to handle the events when needed, and stopping them when the event is done. 
Along the way we will see how a few built in Fly.io primitives make this easy.</p>\n\n<p>To make the discussion concrete, we are going to focus on a common requirement: generation of PDFs from web pages. The code that we will introduce isn’t merely an example produced in support of a blog post - rather it is code that was extracted from a production application, and packaged up into an appliance that you can deploy in minutes to add PDF generation to your existing application.</p>\n\n<p>But before we dive in, let’s back up a bit.</p>\n<h2 id='motivation' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#motivation' aria-label='Anchor'></a><span class='plain-code'>Motivation</span></h2>\n<p>Normally the way this is approached is to start with a tool like <a href='https://github.com/puppeteer/puppeteer' title=''>Puppeteer</a>, <a href='https://github.com/Studiosity/grover#readme' title=''>Grover</a>, <a href='https://playwright.dev/' title=''>Playwright</a>, <a href='https://github.com/bitcrowd/chromic_pdf' title=''>ChromicPDF</a>, or <a href='https://spatie.be/docs/browsershot/v2/introduction' title=''>BrowserShot</a>. These and other tools ultimately launch a browser like <a href='https://developer.chrome.com/articles/new-headless/' title=''>Chrome headless</a>.</p>\n\n<p>Now a few things about Chrome itself:</p>\n\n<ul>\n<li>It likely is bigger than your entire web server.\n</li><li>It likely uses more memory than you see with a typical load on your server.\n</li><li>All total, people using your server likely spend much less time generating PDFs than they do using the rest of your application. \n</li></ul>\n\n<p>Taken together, this makes splitting PDF generation into a completely separate application an easy win. 
With a smaller image, your application will start faster. Memory usage will be more predictable, and the memory needed to generate PDFs will only be allocated when needed and can be scaled separately.</p>\n<h2 id='diving-in' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#diving-in' aria-label='Anchor'></a><span class='plain-code'>Diving in</span></h2>\n<p>Without further ado, the entire application is available on GitHub as <a href='https://github.com/fly-apps/pdf-appliance/#readme' title=''>fly-apps/pdf-appliance</a>. Installation is a simple matter of: clone repository, create app, adjust config, deploy, and scale.</p>\n\n<p>Next, you will need to integrate this into your application. All that is needed is to reply to requests that are intended to produce a PDF with a <a href='https://fly.io/docs/reference/dynamic-request-routing/#the-fly-replay-response-header' title=''>fly-replay</a> response header. This can either be done on individual application routes / controller actions, or it can be done globally via either middleware or a front end like <a href='https://www.nginx.com/' title=''>NGINX</a>. You can find a few examples in the <a href='https://github.com/fly-apps/pdf-appliance/#integrate-with-your-existing-application' title=''>README</a>.</p>\n\n<p>And, that’s it. The most you might consider doing is issuing an additional HTTP request in anticipation of the user selecting what they want to print as this will <a href='https://github.com/fly-apps/pdf-appliance/#preloading-optional' title=''>preload the machine</a>.</p>\n<figure class=\"post-cta\">\n <figcaption>\n <h1>Scale at your own pace</h1>\n <p>Deploy your project in a few minutes with Fly Launch. 
Then do more with Fly Machines.</p>\n <a class=\"btn btn-lg\" href=\"https://fly.io/docs/\">\n Run your entire stack near your users\n </a>\n </figcaption>\n <div class=\"image-container\">\n <img src=\"/static/images/cta-cat.webp\" srcset=\"/static/images/[email protected] 2x\" alt=\"\">\n </div>\n</figure>\n\n\n<p>If you don’t have an application handy, you can try a demo. Go to <a href='https://smooth.fly.dev/' title=''>smooth.fly.dev</a>. Click on Demo, then on Publish, and finally on Invoices to see a PDF. The PDF you see will likely be underwhelming as you would need to enter students, entries, packages and options to fill out the page. But click refresh anyway and see how fast it responds. If you want to explore further, links to the <a href='https://smooth.fly.dev/showcase/docs/' title=''>documentation</a> and <a href='https://github.com/rubys/showcase#readme' title=''>code</a> can be found on the front page.</p>\n<h2 id='implementation-details' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#implementation-details' aria-label='Anchor'></a><span class='plain-code'>Implementation Details</span></h2>\n<p>The basic flow starts when a request comes into your app for a PDF. That request is replayed to the PDF appliance. A Chrome instance in that app then issues a second request to your app for the same URL minus the <code>.pdf</code> extension and then converts the HTML it receives in response into a PDF. That PDF is then returned as the response to the original request.</p>\n\n<p>A single Google Chrome instance per machine will be reused across all requests, which itself is faster than starting a new instance per request. 
As all HTTP headers will be passed back to your application, this will seamlessly work with your existing session, cookies, and basic authentication.</p>\n\n<p>Starting up a machine on demand is handled by the <code>auto_stop_machines</code> setting in your <code>fly.toml</code>. With this in place, machines can confidently exit when idle, secure in the knowledge that they will be restarted when needed. See the <a href='https://github.com/fly-apps/pdf-appliance/#scaling' title=''>README</a> for more information on scaling.</p>\n\n<p>Note that different machines can use different languages and frameworks. This code is written in JavaScript and runs on Bun. It was designed to support a Ruby on Rails app, but can be used with any app.</p>\n<h2 id='a-reusable-pattern' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#a-reusable-pattern' aria-label='Anchor'></a><span class='plain-code'>A Reusable Pattern</span></h2>\n<p>If your app is small and your usage is low, scaling may not be much of a\nconcern, but as your needs grow, your first instinct shouldn’t merely be to throw\nmore hardware at the problem, but rather to partition the problem so that each\nmachine has a somewhat predictable capacity.</p>\n\n<p>Do this by taking a look at your application and looking for requests that are\nsomehow different than the rest. 
Streaming audio and video files, handling websockets,\nconverting text to speech or performing other AI processing, long running\n“background” computation, fetching static pages, producing PDFs, and updating\ndatabases all have different profiles in terms of server load.</p>\n\n<p>It might even be helpful – purely as a thought experiment – to think of\nreplacing your main server with a proxy that does nothing more than route\nrequests to separate machines based on the type of workload performed.</p>\n\n<p>Once you have come up with an allocation of functions performed to pools of\nmachines, Fly-Replay is but one tool available to you. There is also a\n<a href='https://fly.io/docs/machines/working-with-machines/' title=''>Machines API</a> that will\nenable you to orchestrate whatever topology you can come up with.\n<a href='https://fly.io/laravel-bytes/cost-effective-queue-workers-with-fly-io-machines/' title=''>Cost-Effective Queue Workers With Fly.io\nMachines</a>\ngives a preview of what that would look like with Laravel.</p>", "image": { "url": "https://fly.io/blog/print-on-demand/assets/print-on-demand-thumb.webp", "title": null }, "media": [], "authors": [ { "name": "Fly", "email": null, "url": null } ], "categories": [] }, { "id": "https://fly.io/blog/new-launch/", "title": "Launching to Victory", "description": null, "url": "https://fly.io/blog/new-launch/", "published": "2023-11-28T00:00:00.000Z", "updated": "2024-04-12T18:23:39.000Z", "content": "<div class=\"lead\"><p>Fly.io is the new public cloud for running your applications near your users so it can be faster than ever. When you create a new application, you use the <code>fly launch</code> command to give the platform all the information it needs to send it out into the sky. We’ve made steps towards making launching a new app <em>even easier</em> because first impressions matter. 
<a href=\"https://fly.io/docs/speedrun/\" title=\"\">Try the new <code>fly launch</code> now</a>; you can have an app up and running in mere minutes.</p>\n</div>\n<p>Previously when you ran <code>fly launch</code>, you got asked a bunch of hopefully relevant questions to help you get your app up and running. We’ve taken a lot of the guesswork out of the process and made it a lot more streamlined. It turns out that even though Fly.io developers use a variety of frameworks, languages, and toolchains you can fold most of them into a few basic infrastructure shapes.</p>\n<h2 id='the-new-launch' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#the-new-launch' aria-label='Anchor'></a><span class='plain-code'>The new launch</span></h2>\n<p>Now when you run <code>fly launch</code>, the CLI will infer what you want based on the source code of your application. For example, if you have a Rails app with SQLite, it’ll give you an opinionated set of defaults that you can build from. If you don’t, it’ll give you other options so you can craft the infrastructure you need. I took one of my older applications named <a href='https://douglas-adams-quotes.fly.dev/' title=''>douglas-adams-quotes</a> and launched it with the new flow. Here’s what it looks like:</p>\n\n<p><img alt=\"An animated GIF showing the new fully automated launch process. It starts by guessing what your app is and what needs it has, then presents you with a set of opinionated defaults so that you can confirm or deny. If you confirm it will build your application and deploy it, then give you the URL so you can use it.\" src=\"/blog/new-launch/assets/./the-gif-edited.gif\" /></p>\n\n<p>If the settings it guessed are good enough, you can launch it into the cloud. 
If not, then you’ll be taken to a webpage where you can confirm or change the settings it guessed.</p>\n\n<p>Once you say yes or confirm on the web, your app will get built and deployed (unless you asked it not to with <code>--no-deploy</code>). You’ll get a link to your app so you can go check it out. It’s that easy.</p>\n<h2 id='conclusion' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#conclusion' aria-label='Anchor'></a><span class='plain-code'>Conclusion</span></h2>\n<p>We hope that this can help you look before you <code>fly launch</code> into the wild unknowns of the cloud.</p>\n\n<p>Got any ideas or comments on how we can make this even smoother? Get in touch on our <a href='https://community.fly.io/' title=''>community forum</a>. We’d love to hear from you.</p>", "image": { "url": "https://fly.io/blog/new-launch/assets/thumb.webp", "title": null }, "media": [], "authors": [ { "name": "Fly", "email": null, "url": null } ], "categories": [] }, { "id": "https://fly.io/blog/how-i-fly/", "title": "How I Fly", "description": null, "url": "https://fly.io/blog/how-i-fly/", "published": "2023-11-17T00:00:00.000Z", "updated": "2023-11-28T14:16:01.000Z", "content": "<div class=\"lead\"><p>We are Fly.io. We make it easy to run your programs close to your users. We make it easy to update your programs whenever you need to and communicate between your services in an end-to-end encrypted fashion. Today, Xe is going to tell you what they do to use Fly.io effectively. <a href=\"https://fly.io/docs/speedrun/\" title=\"\">Deploy your first app</a> for free and scale it up to production. That’s what Xe did.</p>\n</div>\n<p>I’m Xe Iaso. I’m a writer, technical educator, and philosopher who focuses on making technology easy to understand and scale to your needs. 
I use Fly.io to host my website and in nearly all of my personal projects now. Fly.io allows me to experiment with new ideas quickly and then deploy them to the world with ease.</p>\n<h2 id='what-is-fly-io' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#what-is-fly-io' aria-label='Anchor'></a><span class='plain-code'>What is Fly.io?</span></h2>\n<p>Fly.io lets you host your applications in data centers close to your users. Fly.io also lets you have rolling updates of your programs and facilitates easy communication between your services inside and outside of your organization’s private network.</p>\n\n<p>I use Fly.io to host my blog, its CDN (named XeDN for reasons which are an exercise for the reader), and a bunch of other supporting services that help make it run. It is easily the most fun I’ve had deploying things since I worked at Heroku.</p>\n<h2 id='my-blog' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#my-blog' aria-label='Anchor'></a><span class='plain-code'>My blog</span></h2>\n<p>My blog is made up of several parts: the backend blog server and the CDN. Both are written in Go, my favorite programming language. The back-end blog server runs in Toronto, but XeDN runs in 35 datacenters worldwide. I plan to eventually move my blog to be served from XeDN, but for right now it’s still comfortably running off of a single server in Toronto.</p>\n\n<p><img alt=\"The entire flow for how things run on Xesite.\" src=\"/blog/how-i-fly/assets/./rebuild-flow.svg\" /></p>\n\n<p>Overall, my website’s architecture looks like this. 
My website listens for updates from Patreon and GitHub to trigger rebuilds because of its <a href='https://xeiaso.net/blog/xesite-v4/' title=''>dystatic nature</a>. When I am working on new posts or building new assets, I upload them to Backblaze B2. Anytime someone tries to access one of the files on a XeDN node, it will download it from Backblaze B2 if it doesn’t have it locally already.</p>\n\n<p>With Fly.io, I don’t have to worry about the user experience being degraded when servers go down. If any individual XeDN server goes down, I can rely on the other XeDN servers worldwide to pick up the slack thanks to the fact that Fly.io will shunt the traffic to the servers that aren’t down. Combined with some very aggressive caching logic for things like video assets, this lets me make sure that my blog is fast for everyone, no matter where they are in the world.</p>\n\n<p>Of course, it doesn’t end here. My CDN server is the back end that helps make my other projects work too. I spent some time working on a <a href='https://xeiaso.net/blog/iaso-fonts/' title=''>custom font</a> for all of my web properties, and I <a href='https://cdn.xeiaso.net/static/pkg/iosevka/specimen.html' title=''>serve it from my CDN</a> so that I can use it in every project of mine. This allows me to integrate it into other projects like <a href='https://arsene.fly.dev/' title=''>Arsène</a> without having to do anything special.</p>\n<h2 id='building-on-top-of-projects-with-fly-io' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#building-on-top-of-projects-with-fly-io' aria-label='Anchor'></a><span class='plain-code'>Building on top of projects with Fly.io</span></h2>\n<p>I like making projects that aren’t entirely serious. 
I love using these projects to explore aspects and bits of technology that I would have never gotten to play with before. One of these is <a href='https://arsene.fly.dev' title=''>Arsène</a>, a project I used to explore what a “dead internet” powered by AI could look like.</p>\n\n<p>Every 12 hours, Arsène will have the ChatGPT API generate new posts and then use Stable Diffusion to create a (hopefully relevant) illustration for that post. I run a copy of the <a href='https://github.com/AUTOMATIC1111/stable-diffusion-webui' title=''>Automatic1111</a> Stable Diffusion API in my private network. When Arsène generates an image, it reaches out to that Stable Diffusion API directly over that private network to make the calls it needs. Since XeDN is in the same private network, I can also have Arsène send the images there to be cached and served all over the world.</p>\n\n<p>Here’s what the total flow looks like:</p>\n\n<p><img alt=\"The flow of data for Arsène, showing how this lets me reuse projects\" src=\"/blog/how-i-fly/assets/./reuse-flow.svg\" /></p>\n\n<p>This means that when I am creating things, I am not just making one-off things that don’t work with each other. I am creating individual building blocks that interoperate with each other. 
I am creating opportunities for me to reuse my infrastructure to create brand new things that are robust and scalable with minimal effort on my end.</p>\n<h2 id='my-other-projects' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#my-other-projects' aria-label='Anchor'></a><span class='plain-code'>My other projects</span></h2>\n<p>I have some other projects that I’m working on that I don’t want to get into too much detail about yet, but they’re mostly going to involve transforming the basic ideas of using my CDN for distributing things and a webserver for sending HTML to users in new and interesting ways. I love using Fly.io for this because I am just allowed to create things instead of having to worry about how to implement them, where state is going to be stored, or how I’m going to scale them.</p>\n<div class=\"callout\"><p>Fly.io is the only platform I’ve used where I can spin up 35 copies of a program as easily as one copy of a program.</p>\n</div><h2 id='conclusion' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#conclusion' aria-label='Anchor'></a><span class='plain-code'>Conclusion</span></h2>\n<p>If you haven’t given Fly.io a try yet, you’re really missing out. It is utterly trivial to deploy your application across the globe. Not to mention, when your applications are idle, you can have them scale down to zero copies. This means that you only pay for what you actually use. 
I don’t have to worry about overpaying for my blog by having a giant server in Helsinki running 24/7, even though I’m only using a small sliver of it.</p>\n\n<p>If you want to learn more about Fly.io, you can check out <a href='https://fly.io' title=''>fly.io</a>. My CDN cost me nothing until I started adding cover art per post and the <a href='https://xeiaso.net//blog/how-mara-works-2020-09-30/' title=''>conversation snippets</a> with furry stickers. It definitely went over the bar when I started uploading video. I can see it scaling in the future as my demands scale too.</p>\n\n<p>Of course, this is barely even scratching the surface. Stay tuned for secret tricks you can use to dynamically spin up and spin down machines as you need. Imagine uploading an image, automatically creating a machine to handle compressing it, and uploading it to your storage back end. Imagine what you could do if compute was a faucet that you could turn on and off as you needed it.</p>\n\n<p>You can do it on Fly.io. Try it today, you can run an app on a 256 MB Machine for free. XeDN ran on three 256 MB Machines for a year. Arsène still runs on a 256 MB Machine to this day. It’s more than enough for what you’re going to do. And when it isn’t, scaling up is <a href='https://fly.io/docs/about/pricing/' title=''>cheaper than you can imagine</a>.</p>", "image": { "url": "https://fly.io/blog/how-i-fly/assets/thumb.webp", "title": null }, "media": [], "authors": [ { "name": "Fly", "email": null, "url": null } ], "categories": [] }, { "id": "https://fly.io/blog/transcribing-on-fly-gpu-machines/", "title": "Transcribing on Fly GPU Machines", "description": null, "url": "https://fly.io/blog/transcribing-on-fly-gpu-machines/", "published": "2023-11-13T00:00:00.000Z", "updated": "2024-04-12T18:23:39.000Z", "content": "<div class=\"lead\"><p>Fly.io has GPUs! 
If you want to run AI (or whatever) workloads, check out how to <a href=\"https://fly.io/docs/gpus/gpu-quickstart/\" title=\"\">get started with GPU Machines</a>!</p>\n</div>\n<p>Fly.io has GPU Machines, which means we can finally <del>play games</del> <del>mine bitcoin</del> <del>baghold NFTs</del> run AI workloads with just a few API calls.</p>\n\n<p>This is exciting! Running GPU workloads yourself is useful when the community™ builds upon available models to make them faster, more useful, or less restrictive than first-party APIs.</p>\n\n<p>One such tool is the <a href='https://github.com/ahmetoner/whisper-asr-webservice' title=''>Whisper Webservice</a>, which is conveniently packaged in a way that makes it a good candidate to use on Fly GPU Machines.</p>\n\n<p>Let’s see how to use Fly.io GPU by spinning up Whisper Webservice.</p>\n<h2 id='whisper-webservice' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#whisper-webservice' aria-label='Anchor'></a><span class='plain-code'>Whisper Webservice</span></h2>\n<p>Whisper is OpenAI’s voice recognition service - it’s used for audio transcription. 
To use it anywhere that’s not OpenAI’s platform, you need <a href='https://github.com/openai/whisper' title=''>some Python</a>, a few GB of storage, and (preferably) a GPU.</p>\n\n<p>The aforementioned <a href='https://github.com/ahmetoner/whisper-asr-webservice' title=''>Whisper Webservice</a> packages this up for us, while making Whisper faster, more useful, and less restricted than OpenAI’s API:</p>\n\n<ol>\n<li>It provides a web API on top of Whisper’s Python library\n</li><li>It (optionally) integrates <a href='https://github.com/guillaumekln/faster-whisper' title=''>faster-whisper</a> to make it, you know, faster\n</li><li>It (optionally) uses FFmpeg to process the uploaded audio file, useful for getting audio out of video files or converting audio formats\n</li></ol>\n\n<p>Luckily for us, and totally <strong class='font-semibold text-navy-950'>not</strong> why I chose this as an example - the project provides GPU-friendly Docker images. We’ll use those to spin up Fly GPU Machines and process some audio files.</p>\n\n<p>(I’ll also show examples of making your own Docker image!)</p>\n<h2 id='running-a-gpu-machine' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#running-a-gpu-machine' aria-label='Anchor'></a><span class='plain-code'>Running a GPU Machine</span></h2>\n<p>Spinning up a GPU Machine is very similar to any other Machine. 
The main difference is the new “GPU kind” option (<code>--vm-gpu-kind</code>), which takes 2 possible values:</p>\n\n<ol>\n<li><code>a100-pcie-40gb</code>\n</li><li><code>a100-sxm4-80gb</code>\n</li></ol>\n\n<p>These are 2 flavors of Nvidia A100 GPUs; the difference worth caring about is <code>40</code> vs <code>80</code> GB of memory (here’s <a href='https://fly.io/docs/about/pricing/#gpus-and-fly-machines' title=''>pricing</a>).</p>\n\n<p>We’ll create machines using <code>a100-pcie-40gb</code> because we don’t need 80 freakin’ GB for what we’re doing.</p>\n\n<p>Using <code>flyctl</code> is a great way to run a GPU Machine. We’ll make an app and run the conveniently created <a href='https://hub.docker.com/r/onerahmet/openai-whisper-asr-webservice' title=''>Whisper Webservice Docker image</a> that supports Nvidia GPUs. The <code>flyctl</code> commands will default us into a <code>performance-8x</code> server size (8 CPUs, 16 GB RAM) unless we specify something different.</p>\n\n<p><strong class='font-semibold text-navy-950'>One caveat:</strong> AI model files are big. Docker images ideally aren’t big - sending huge layers across the network angers the spiteful networking gods. If you shove models into your Docker images, you <em>might</em> have a bad time.</p>\n\n<p>We suggest creating a Fly Volume and making your Docker image download needed models when it first spins up. The Whisper service (and in my experience, OpenAI’s Python library) does that for us.</p>\n\n<p>So, we’ll create a volume to house (and cache) the models. In the case of the Whisper project, the models get placed in <code>/root/.cache/whisper</code> on its first boot, and so we’ll mount our disk there.</p>\n\n<p>Alright, let’s create a GPU Machine. 
Here’s what the process looks like:</p>\n<div class=\"highlight-wrapper group relative bash\">\n <button \n type=\"button\"\n class=\"bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n data-wrap-target=\"#code-chwf29f7\"\n >\n <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\" stroke-linecap=\"round\" stroke-linejoin=\"round\"><g buffered-rendering=\"static\"><path d=\"M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959\" /><path d=\"M11.081 6.466L9.533 8.037l1.548 1.571\" /></g></svg>\n <span class=\"bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950\">\n Wrap text\n </span>\n </button>\n <button \n type=\"button\"\n class=\"bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n data-copy-target=\"sibling\"\n >\n <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\"><g buffered-rendering=\"static\"><path d=\"M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z\" /><path d=\"M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617\" /></g></svg>\n <span class=\"bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950\">\n Copy to 
clipboard\n </span>\n </button>\n <div class='highlight relative group'>\n <pre class='highlight '><code id=\"code-chwf29f7\"><span class=\"nv\">APP_NAME</span><span class=\"o\">=</span><span class=\"s2\">\"whispering-zines\"</span>\n\nfly apps create <span class=\"nv\">$APP_NAME</span> <span class=\"nt\">-o</span> personal\n\n<span class=\"c\"># We \"hint\" --vm-gpu-kind so the volume</span>\n<span class=\"c\"># is provisioned on a GPU host</span>\n<span class=\"c\"># We choose region ord, where most Fly GPUs</span>\n<span class=\"c\"># currently live</span>\nfly volumes create whisper_zine_cache <span class=\"nt\">-s</span> 10 <span class=\"se\">\\</span>\n <span class=\"nt\">-a</span> <span class=\"nv\">$APP_NAME</span> <span class=\"nt\">-r</span> ord <span class=\"nt\">--vm-gpu-kind</span> a100-pcie-40gb\n\n<span class=\"c\"># Take note of the volume ID from the output ^</span>\n\n<span class=\"c\"># Run a machine that can accept web requests</span>\n<span class=\"c\"># from the public internet</span>\nfly machines run onerahmet/openai-whisper-asr-webservice:latest-gpu <span class=\"se\">\\</span>\n <span class=\"nt\">--vm-gpu-kind</span> a100-pcie-40gb <span class=\"se\">\\</span>\n <span class=\"nt\">-p</span> 443:9000/tcp:tls:http <span class=\"nt\">-p</span> 80:9000/tcp:http <span class=\"se\">\\</span>\n <span class=\"nt\">-r</span> ord <span class=\"se\">\\</span>\n <span class=\"nt\">-v</span> <VOLUME_ID>:/root/.cache/whisper <span class=\"se\">\\</span>\n <span class=\"nt\">-e</span> <span class=\"nv\">ASR_MODEL</span><span class=\"o\">=</span>large <span class=\"nt\">-e</span> <span class=\"nv\">ASR_ENGINE</span><span class=\"o\">=</span>faster_whisper <span class=\"se\">\\</span>\n <span class=\"nt\">-a</span> <span class=\"nv\">$APP_NAME</span>\n\n<span class=\"c\"># Allocate IPs so we can view it on the web</span>\nfly ips allocate-v4 <span class=\"nt\">--shared</span> <span class=\"nt\">-a</span> <span class=\"nv\">$APP_NAME</span>\nfly ips 
allocate-v6 <span class=\"nt\">-a</span> <span class=\"nv\">$APP_NAME</span>\n</code></pre>\n </div>\n</div>\n<p>That’s all pretty standard for Fly Machines, <strong class='font-semibold text-navy-950'>except</strong> for the <code>--vm-gpu-kind</code> flags used both for volume <strong class='font-semibold text-navy-950'>and</strong> Machine creation. Volumes are pinned to specific hosts, so using this flag tells Fly.io to create the volume on a GPU host. Assuming we set the same region (<code>-r ord</code>), creating a GPU Machine with the just-created volume will tell Fly.io to place the Machine on the same host as the volume.</p>\n\n<p><strong class='font-semibold text-navy-950'>Note:</strong> As my Machine started up, I saw the log line <code>WARNING: The NVIDIA Driver was not detected. GPU functionality will not be available.</code>, which turned out to be a timing issue. Once everything was running, I connected with <code>fly ssh console -a $APP_NAME</code> and ran <code>nvidia-smi</code> to confirm that the VM had a GPU. The output also listed the running web service (Python in this case) as a GPU process.</p>\n\n<p>Once everything is running, you should be able to head to <code>$APP_NAME.fly.dev</code> and view it in the browser.</p>\n\n<p>The Whisper Webservice UI will let you try out individual calls in its API. This will also give you the information you need to make those calls from your code. There’s a link to the API specification (e.g. 
<code>$APP_NAME.fly.dev/openapi.json</code>) you can use to, say, have <a href='https://www.blobr.io/post/create-api-specs-chatgpt' title=''>ChatGPT generate a client</a> in your language of choice.</p>\n<h2 id='automating-gpu-machines' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#automating-gpu-machines' aria-label='Anchor'></a><span class='plain-code'>Automating GPU Machines</span></h2>\n<p>If you want to automate this, you can use the <a href='https://fly.io/docs/machines/working-with-machines/' title=''>Machines API</a> (spec <a href='https://docs.machines.dev/swagger/index.html' title=''>here</a>).</p>\n\n<p>An easy way to get started is to spy on the API requests <code>flyctl</code> is making:</p>\n<div class=\"highlight-wrapper group relative bash\">\n <button \n type=\"button\"\n class=\"bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n data-wrap-target=\"#code-13v3zt2f\"\n >\n <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\" stroke-linecap=\"round\" stroke-linejoin=\"round\"><g buffered-rendering=\"static\"><path d=\"M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959\" /><path d=\"M11.081 6.466L9.533 8.037l1.548 1.571\" /></g></svg>\n <span class=\"bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950\">\n Wrap text\n </span>\n </button>\n <button \n type=\"button\"\n class=\"bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent 
group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n data-copy-target=\"sibling\"\n >\n <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\"><g buffered-rendering=\"static\"><path d=\"M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z\" /><path d=\"M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617\" /></g></svg>\n <span class=\"bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950\">\n Copy to clipboard\n </span>\n </button>\n <div class='highlight relative group'>\n <pre class='highlight '><code id=\"code-13v3zt2f\"><span class=\"c\"># Debug logs will output the API requests / responses</span>\n<span class=\"c\"># made to Fly.io's API.</span>\n<span class=\"nv\">LOG_LEVEL</span><span class=\"o\">=</span>debug flyctl machine run ...\n</code></pre>\n </div>\n</div>\n<p>This helped me figure out why my own initial API attempts failed - it turns out we need some extra parameters in the <code>compute</code> portion of the request JSON for creating a volume, and the <code>guest</code> section for creating a Machine.</p>\n\n<p>For both volumes and Machines, we set the <code>gpu_kind</code> the same way we did in our <code>flyctl</code> command. However we <em>also</em> need the <code>cpu_kind</code> to be set. 
Additionally, when creating a Machine, we need to set <code>cpus</code> and <code>memory_mb</code> to <a href='https://fly.io/docs/machines/guides-examples/machine-sizing/' title=''>valid values</a> for <code>performance</code> Machines.</p>\n<div class=\"highlight-wrapper group relative bash\">\n <button \n type=\"button\"\n class=\"bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n data-wrap-target=\"#code-e14p7s3k\"\n >\n <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\" stroke-linecap=\"round\" stroke-linejoin=\"round\"><g buffered-rendering=\"static\"><path d=\"M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959\" /><path d=\"M11.081 6.466L9.533 8.037l1.548 1.571\" /></g></svg>\n <span class=\"bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950\">\n Wrap text\n </span>\n </button>\n <button \n type=\"button\"\n class=\"bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n data-copy-target=\"sibling\"\n >\n <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\"><g buffered-rendering=\"static\"><path d=\"M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z\" /><path d=\"M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 
1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617\" /></g></svg>\n <span class=\"bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950\">\n Copy to clipboard\n </span>\n </button>\n <div class='highlight relative group'>\n <pre class='highlight '><code id=\"code-e14p7s3k\"><span class=\"nv\">APP_NAME</span><span class=\"o\">=</span><span class=\"s2\">\"whispering-zines\"</span>\n\n<span class=\"c\"># Create a volume on a GPU host. Specify both</span>\n<span class=\"c\"># cpu_kind and gpu_kind</span>\ncurl <span class=\"nt\">-H</span> <span class=\"s2\">\"Authorization: Bearer </span><span class=\"sb\">`</span>fly auth token<span class=\"sb\">`</span><span class=\"s2\">\"</span> <span class=\"se\">\\</span>\n <span class=\"nt\">-H</span> <span class=\"s2\">\"Accept: application/json\"</span> <span class=\"se\">\\</span>\n <span class=\"nt\">-H</span> <span class=\"s2\">\"Content-Type: application/json\"</span> <span class=\"se\">\\</span>\n https://api.machines.dev/v1/apps/<span class=\"nv\">$APP_NAME</span>/volumes <span class=\"se\">\\</span>\n <span class=\"nt\">-d</span> <span class=\"s1\">'{\n \"name\": \"whisper_zine_cache\",\n \"region\": \"ord\",\n \"size_gb\": 10,\n \"compute\": {\n \"cpu_kind\": \"performance\",\n \"gpu_kind\": \"a100-pcie-40gb\"\n }\n }'</span>\n\n<span class=\"c\"># Take note of the volume ID from the response ^</span>\n\n<span class=\"c\"># Run a machine that can accept web requests</span>\n<span class=\"c\"># from the public internet.</span>\ncurl <span class=\"nt\">-H</span> <span class=\"s2\">\"Authorization: Bearer </span><span class=\"sb\">`</span>fly auth token<span class=\"sb\">`</span><span class=\"s2\">\"</span> <span class=\"se\">\\</span>\n <span class=\"nt\">-H</span> <span class=\"s2\">\"Accept: application/json\"</span> <span class=\"se\">\\</span>\n <span class=\"nt\">-H</span> <span class=\"s2\">\"Content-Type: application/json\"</span> <span class=\"se\">\\</span>\n 
https://api.machines.dev/v1/apps/<span class=\"nv\">$APP_NAME</span>/machines <span class=\"se\">\\</span>\n <span class=\"nt\">-d</span> <span class=\"s1\">'{\n \"region\": \"ord\",\n \"config\": {\n \"env\": {\n \"ASR_ENGINE\": \"faster_whisper\",\n \"ASR_MODEL\": \"large\",\n \"FLY_PROCESS_GROUP\": \"app\",\n \"PRIMARY_REGION\": \"ord\"\n },\n \"mounts\": [\n {\n \"path\": \"/root/.cache/whisper\",\n \"volume\": \"<VOLUME_ID>\",\n \"name\": \"data\"\n }\n ],\n \"services\": [\n {\n \"protocol\": \"tcp\",\n \"internal_port\": 9000,\n \"autostop\": false,\n \"ports\": [\n {\n \"port\": 80,\n \"handlers\": [\n \"http\"\n ],\n \"force_https\": true\n },\n {\n \"port\": 443,\n \"handlers\": [\n \"http\",\n \"tls\"\n ]\n }\n ]\n }\n ],\n \"image\": \"onerahmet/openai-whisper-asr-webservice:latest-gpu\",\n \"guest\": {\n \"cpus\": 8,\n \"memory_mb\": 16384,\n \"cpu_kind\": \"performance\",\n \"gpu_kind\": \"a100-pcie-40gb\"\n }\n }\n }'</span>\n</code></pre>\n </div>\n</div>\n<p>After that we can assign the app some IPs. You can use <code>flyctl</code> for this, or the <a href='https://api.fly.io/graphql' title=''>graphql API.</a> You can once again use debug mode with <code>flyctl</code> to see what API calls it makes. 
Side note: Eventually the Machines REST API will include the ability to allocate IP addresses.</p>\n<div class=\"highlight-wrapper group relative bash\">\n <button \n type=\"button\"\n class=\"bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n data-wrap-target=\"#code-v5sntcmu\"\n >\n <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\" stroke-linecap=\"round\" stroke-linejoin=\"round\"><g buffered-rendering=\"static\"><path d=\"M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959\" /><path d=\"M11.081 6.466L9.533 8.037l1.548 1.571\" /></g></svg>\n <span class=\"bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950\">\n Wrap text\n </span>\n </button>\n <button \n type=\"button\"\n class=\"bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n data-copy-target=\"sibling\"\n >\n <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\"><g buffered-rendering=\"static\"><path d=\"M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z\" /><path d=\"M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617\" /></g></svg>\n <span class=\"bubble-sm bubble-tl [--offset-l:-6px] tail 
[--tail-x:calc(100%-30px)] text-navy-950\">\n Copy to clipboard\n </span>\n </button>\n <div class='highlight relative group'>\n <pre class='highlight '><code id=\"code-v5sntcmu\">fly ips allocate-v4 <span class=\"nt\">--shared</span> <span class=\"nt\">-a</span> <span class=\"nv\">$APP_NAME</span>\nfly ips allocate-v6 <span class=\"nt\">-a</span> <span class=\"nv\">$APP_NAME</span>\n</code></pre>\n </div>\n</div>\n<p>If you’re doing this type of work for your business, you may want to keep these Machines inside a private network anyway, in which case you won’t be assigning it IP addresses.</p>\n<h2 id='making-your-own-images' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#making-your-own-images' aria-label='Anchor'></a><span class='plain-code'>Making Your Own Images</span></h2>\n<p>There is, luckily (for me, a hardware ignoramus) less dark magic to making GPU-friendly Docker images than you might think. 
Basically, you just need to install the correct Nvidia drivers.</p>\n\n<p>A way to cheat at this is to run <a href='https://github.com/NVIDIA/nvidia-container-toolkit/tree/main' title=''>Nvidia CUDA base images</a>, but if you’re made of sterner stuff, you can also start with a base Ubuntu image and install the drivers yourself.</p>\n\n<p>While the Whisper webservice image is based on <code>nvidia/cuda:11.8.0-cudnn8-runtime-ubuntu22.04</code>, I got Whisper (plain, not the webservice) working with <code>ubuntu:22.04</code>:</p>\n<div class=\"highlight-wrapper group relative dockerfile\">\n <button \n type=\"button\"\n class=\"bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n data-wrap-target=\"#code-qjwtp7g3\"\n >\n <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\" stroke-linecap=\"round\" stroke-linejoin=\"round\"><g buffered-rendering=\"static\"><path d=\"M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959\" /><path d=\"M11.081 6.466L9.533 8.037l1.548 1.571\" /></g></svg>\n <span class=\"bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950\">\n Wrap text\n </span>\n </button>\n <button \n type=\"button\"\n class=\"bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n data-copy-target=\"sibling\"\n >\n <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" 
stroke-width=\"1.35\"><g buffered-rendering=\"static\"><path d=\"M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z\" /><path d=\"M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617\" /></g></svg>\n <span class=\"bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950\">\n Copy to clipboard\n </span>\n </button>\n <div class='highlight relative group'>\n <pre class='highlight '><code id=\"code-qjwtp7g3\"><span class=\"c\"># Base image</span>\n<span class=\"k\">FROM</span><span class=\"s\"> ubuntu:22.04</span>\n\n<span class=\"k\">RUN </span>apt update <span class=\"nt\">-q</span> <span class=\"o\">&&</span> apt <span class=\"nb\">install</span> <span class=\"nt\">-y</span> ca-certificates wget <span class=\"se\">\\\n</span> <span class=\"o\">&&</span> wget <span class=\"nt\">-qO</span> /cuda-keyring.deb https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-keyring_1.1-1_all.deb <span class=\"se\">\\\n</span> <span class=\"o\">&&</span> dpkg <span class=\"nt\">-i</span> /cuda-keyring.deb <span class=\"o\">&&</span> apt update <span class=\"nt\">-q</span> <span class=\"se\">\\\n</span> <span class=\"o\">&&</span> apt <span class=\"nb\">install</span> <span class=\"nt\">-y</span> <span class=\"nt\">--no-install-recommends</span> ffmpeg libcudnn8 libcublas-12-2 <span class=\"se\">\\\n</span> git python3 python3-pip\n\n<span class=\"k\">WORKDIR</span><span class=\"s\"> /app</span>\n<span class=\"k\">COPY</span><span class=\"s\"> audio.mp3 /app/audio.mp3</span>\n<span class=\"k\">COPY</span><span class=\"s\"> run.py /app/run.py</span>\n\n<span class=\"k\">CMD</span><span class=\"s\"> [\"python3\", \"run.py\"]</span>\n</code></pre>\n </div>\n</div>\n<p>You can find a full, <a href='https://github.com/fly-apps/whisper-example' title=''>working version of this 
here</a>.</p>\n<h2 id='this-time-its-different-i-guess' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#this-time-its-different-i-guess' aria-label='Anchor'></a><span class='plain-code'>This time it’s different, I guess</span></h2>\n<p>AI feels a bit different than previous trends in that it has immediately obvious benefits. No one needs to throw around catchy phrases with a wink-wink nudge-nudge (“we like the art”) for us to find value.</p>\n\n<p>Since AI workloads run most efficiently on GPUs, GPUs remain a hot commodity. Those of us who didn’t purchase enough $NVDA to retire can still bring more value to our businesses by adding in AI.</p>\n\n<p>Fly Machines have always been a great little piece of tech to run “ephemeral compute workloads” (wait, do I work at AWS!?) - and this is what I like about GPU Machines. You can mix and match all sorts of AI stuff together to make a chain of useful tools!</p>", "image": { "url": "https://fly.io/blog/transcribing-on-fly-gpu-machines/assets/whispering-thumb.webp", "title": null }, "media": [], "authors": [ { "name": "Fly", "email": null, "url": null } ], "categories": [] }, { "id": "https://fly.io/blog/skip-the-api/", "title": "Skip the API, Ship Your Database", "description": null, "url": "https://fly.io/blog/skip-the-api/", "published": "2023-09-13T00:00:00.000Z", "updated": "2023-11-21T21:08:37.000Z", "content": "<div class=\"lead\"><p>With Fly.io, <a href=\"https://fly.io/docs/speedrun/\" title=\"\">you can get your app running globally in a matter of minutes</a>, and with LiteFS, you can run SQLite alongside your app! Now we’re introducing LiteFS Cloud: managed backups and point-in-time restores for LiteFS. 
<a href=\"https://fly.io/docs/litefs/speedrun/\" title=\"\">Try it out for yourself</a>!</p>\n</div>\n<p>My favorite part about building tools is discovering their unintended uses. It’s like starting to write a murder mystery book but you have no idea who the killer is!</p>\n\n<p>History is filled with examples of these accidental discoveries: WD-40 was originally <a href='https://en.wikipedia.org/wiki/WD-40#History' title=''>used to protect ICBMs from rust</a> and now it fixes your squeaky doorknob. Bubble wrap was <a href='https://en.wikipedia.org/wiki/Bubble_Wrap_(brand)#History' title=''>originally sold as wallpaper</a> and now it protects your Amazon packages.</p>\n\n<p>When we started writing <a href='https://fly.io/docs/litefs/' title=''>LiteFS</a>, a distributed SQLite database, we thought it would be used to distribute data geographically so users in, say, Bucharest see response times as fast as users in San Jose. And for the most part, that’s what LiteFS users are doing.</p>\n\n<p>But we discovered another unexpected use: replacing the API layer between services with SQLite databases.</p>\n<h2 id='how-it-started' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#how-it-started' aria-label='Anchor'></a><span class='plain-code'>How it started</span></h2>\n<p>In the early days of LiteFS development, we wanted to find a real-world test bed for our tool so we could smoke out any bugs that we didn’t find during automated tests. Part of our existing infrastructure is a program called <em>Corrosion</em> that gossips state between all our servers. Corrosion tracks VM statuses, health checks, and a plethora of other information for each server and communicates this info with other servers so they can make intelligent decisions about request routing and VM placement. 
Corrosion keeps a fast, local copy of all this data in a SQLite database.</p>\n\n<p>So we set up a Corrosion instance that also ran on top of LiteFS. This helped root out some bugs but we also found another use for it: making Corrosion accessible to our internal services.</p>\n\n<p><img src=\"/blog/skip-the-api/assets/corrosion.png\" /></p>\n<h2 id='shipping-the-kitchen-sink' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#shipping-the-kitchen-sink' aria-label='Anchor'></a><span class='plain-code'>Shipping the kitchen sink</span></h2>\n<p>The typical approach to making data available between services is to spend weeks designing an API and then building a service around it. Your API design needs to take into account the different use cases of each consuming service so that it can deliver the data it needs efficiently. You don’t want your clients making a dozen API calls for every request!</p>\n\n<p><img src=\"/blog/skip-the-api/assets/architecture.png\" /></p>\n\n<p>A different approach is to skip the API design entirely and just ship the entire database to your client. You don’t need to consider the consuming service’s access patterns as they can use vanilla SQL to query and join whatever data their heart desires. That’s what we did using LiteFS.</p>\n\n<p>While we could have set up each downstream service as a Corrosion node, gossip protocols can be chatty and we really just needed a one-way stream of updates. 
Setting up a read-only LiteFS instance for a new service is simple—it just needs the hostname of the upstream primary node to connect to:</p>\n<div class=\"highlight-wrapper group relative \">\n <button \n type=\"button\"\n class=\"bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n data-wrap-target=\"#code-e631uyyz\"\n >\n <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\" stroke-linecap=\"round\" stroke-linejoin=\"round\"><g buffered-rendering=\"static\"><path d=\"M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959\" /><path d=\"M11.081 6.466L9.533 8.037l1.548 1.571\" /></g></svg>\n <span class=\"bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950\">\n Wrap text\n </span>\n </button>\n <button \n type=\"button\"\n class=\"bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n data-copy-target=\"sibling\"\n >\n <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\"><g buffered-rendering=\"static\"><path d=\"M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z\" /><path d=\"M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617\" /></g></svg>\n <span class=\"bubble-sm 
bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950\">\n Copy to clipboard\n </span>\n </button>\n <div class='highlight relative group'>\n <pre class='highlight '><code id=\"code-e631uyyz\">lease:\n type: \"static\"\n candidate: false\n advertise-url: \"http://corrosion-bridge:20202\"\n</code></pre>\n </div>\n</div>\n<p>And voila! You have a full, read-only copy of the database on your app.</p>\n<h2 id='moving-compute-to-the-client' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#moving-compute-to-the-client' aria-label='Anchor'></a><span class='plain-code'>Moving compute to the client</span></h2>\n<p>API design is notoriously difficult as it’s hard to know what your consuming services will need. Query languages such as <a href='https://graphql.org/' title=''>GraphQL</a> have even been invented for this specific problem!</p>\n\n<p>However, GraphQL has its own limitations. It’s good for fetching raw data but it lacks built-in <a href='https://www.sqlite.org/lang_aggfunc.html' title=''>aggregation</a> & advanced querying capabilities like <a href='https://www.sqlite.org/windowfunctions.html' title=''>windowing</a>. GraphQL is typically layered on top of an existing relational database that uses SQL. So why not just use SQL?</p>\n\n<p>Additionally, performing queries on your service means that you need to handle multiple tenants competing for compute resources. Managing these tenants involves rate limiting and query timeouts so that no one client consumes all the resources.</p>\n\n<p>By pushing a read-only copy of the database to clients, these restrictions aren’t a concern anymore. A tenant can use 100% of its CPU for hours if it wants to. 
It won’t adversely affect any other tenant because the query is running on its own hardware.</p>\n<h2 id='so-whats-the-downside' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#so-whats-the-downside' aria-label='Anchor'></a><span class='plain-code'>So what’s the downside?</span></h2>\n<p>There are always trade-offs with any technology, and shipping read-only replicas is no different. One obvious limitation of read-only replicas is that they’re read-only. If your clients need to update data, they’ll still need an API for those mutations.</p>\n\n<p>A less obvious downside is that the contract for a database can be less strict than an API. One benefit of an API layer is that you can change the underlying database structure but still massage data to look the same to clients. When you’re shipping the raw database, that becomes more difficult. Fortunately, many database changes, such as adding columns to a table, are backwards compatible so clients don’t need to change their code. Database views are also a great way to reshape data so it stays consistent—even when the underlying tables change.</p>\n\n<p>Finally, shipping a database limits your ability to restrict access to data. If you have a multi-tenant database, you can’t ship that database without the client seeing all the data. One workaround for this is to use a database per tenant. SQLite databases are lightweight since they are just files on disk. 
This also has the added benefit of preventing queries in your application from accidentally fetching data across tenants.</p>\n<h2 id='where-do-we-take-this-next' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#where-do-we-take-this-next' aria-label='Anchor'></a><span class='plain-code'>Where do we take this next?</span></h2>\n<p>While this approach has worked well for some internal tooling, how does this look in the broader world of software? APIs are likely to stick around for the foreseeable future, so providing read-only database replicas makes sense for specific use cases where those APIs aren’t a great fit.</p>\n\n<p>Imagine being able to query all your Stripe data or your GitHub data from a local database. You could join that data on to your own dataset and perform fast queries on your own hardware.</p>\n\n<p>While companies such as Stripe or GitHub likely colocate their tenant data into one database, many companies run an event bus using tools like Kafka which could allow them to generate per-tenant SQLite databases to then stream to customers.</p>\n\n<p>Pushing queries out to the end user has huge benefits for both the data provider & the data consumer in terms of flexibility and power.</p>", "image": { "url": "https://fly.io/blog/skip-the-api/assets/skip-the-api-thumb.webp", "title": null }, "media": [], "authors": [ { "name": "Fly", "email": null, "url": null } ], "categories": [] }, { "id": "https://fly.io/blog/sentry-partnership/", "title": "Automated Sentry Error Tracking", "description": null, "url": "https://fly.io/blog/sentry-partnership/", "published": "2023-09-12T00:00:00.000Z", "updated": "2023-11-21T21:08:37.000Z", "content": "<div class=\"lead\"><p>We’re Fly.io. 
We put your code into lightweight microVMs on our own hardware <a href=\"https://fly.io/docs/reference/regions/\" title=\"\">around the world</a>, close to your users. We partnered with <a href=\"https://sentry.io\" title=\"\">Sentry</a> to bring error and performance monitoring to your apps. Deploy your first app, and automatically get a year’s worth of Sentry <a href=\"https://sentry.io/pricing/\" title=\"\">Team Plan</a> credits. <a href=\"https://fly.io/docs/speedrun/\" title=\"\">Check us out</a>—your app can be deployed and instrumented in minutes.</p>\n</div>\n<p>We’ve been using Sentry since the dawn of the internet. Or at least as far back as the <a href='https://home.cern/science/physics/higgs-boson/how' title=''>discovery</a> of the Higgs boson. Project to project, the familiar Sentry issue detail screen has been our faithful debugging companion.</p>\n\n<p>Today is no exception: All of our Golang, Elixir, Ruby and Rust services report dutifully to Sentry.</p>\n\n<p>So, it felt natural to integrate Sentry as the default error monitoring tool. All new deployments on Fly.io get a Sentry project provisioned automatically. Existing apps can grab theirs with <code>flyctl ext sentry create</code>.</p>\n\n<p>Each Fly.io organization receives, for one year, a generous monthly quota:</p>\n\n<ul>\n<li>50,000 Error events\n</li><li>100,000 Performance units\n</li><li>500 Session Replays\n</li><li>1GB of storage for Attachments\n</li></ul>\n\n<p>Once your app is instrumented, you’ll automatically get notified of production errors, latency issues, and crashes as soon as they occur. 
Sentry’s Team plan also gives you access to over 40 integrations, unlimited seats, and custom alerting.</p>\n<h2 id='auto-instrumenting-rails' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#auto-instrumenting-rails' aria-label='Anchor'></a><span class='plain-code'>Auto-instrumenting Rails</span></h2>\n<p>To see Sentry in action, let’s launch our <a href='https://github.com/fly-apps/boomer' title=''>Boomer Rails App</a>. Yes kids, Rails is old school, and it’s the easiest framework to auto-instrument.</p>\n\n<p>When <code>flyctl launch</code> detects a Rails app, it’s automatically set up to use a freshly minted Sentry project. Gems are installed, initializers planted, and finally, the <code>SENTRY_DSN</code> secret is set for deployment. We redacted some output for brevity.</p>\n<div class=\"highlight-wrapper group relative cmd\">\n <button \n type=\"button\"\n class=\"bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n data-wrap-target=\"#code-onfm6lp2\"\n >\n <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\" stroke-linecap=\"round\" stroke-linejoin=\"round\"><g buffered-rendering=\"static\"><path d=\"M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959\" /><path d=\"M11.081 6.466L9.533 8.037l1.548 1.571\" /></g></svg>\n <span class=\"bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950\">\n Wrap text\n </span>\n </button>\n <button \n type=\"button\"\n 
class=\"bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n data-copy-target=\"sibling\"\n >\n <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\"><g buffered-rendering=\"static\"><path d=\"M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z\" /><path d=\"M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617\" /></g></svg>\n <span class=\"bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950\">\n Copy to clipboard\n </span>\n </button>\n <div class='highlight relative group'>\n <pre class='highlight cmd'><code id=\"code-onfm6lp2\">fly deploy\n</code></pre>\n </div>\n</div><div class=\"highlight-wrapper group relative \">\n <button \n type=\"button\"\n class=\"bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n data-wrap-target=\"#code-93jth4av\"\n >\n <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\" stroke-linecap=\"round\" stroke-linejoin=\"round\"><g buffered-rendering=\"static\"><path d=\"M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959\" /><path d=\"M11.081 6.466L9.533 8.037l1.548 1.571\" /></g></svg>\n <span class=\"bubble-sm 
bubble-tl [--offset-l:-9px] tail text-navy-950\">\n Wrap text\n </span>\n </button>\n <button \n type=\"button\"\n class=\"bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n data-copy-target=\"sibling\"\n >\n <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\"><g buffered-rendering=\"static\"><path d=\"M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z\" /><path d=\"M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617\" /></g></svg>\n <span class=\"bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950\">\n Copy to clipboard\n </span>\n </button>\n <div class='highlight relative group'>\n <pre class='highlight '><code id=\"code-93jth4av\">==> Verifying app config\n\n...\n\nYour Sentry project is ready. See details and next steps with: flyctl apps errors\n\nSetting the following secrets on boomerang:\nSENTRY_DSN\n\n...\n\nVisit your newly deployed app at https://boomerang.fly.dev/\n</code></pre>\n </div>\n</div>\n<p>Now, having Sentry configured at launch time means that deployment errors are captured early. This is useful for situations where apps fail to boot, run out of memory, and so on.</p>\n\n<p>Now let’s force an application exception. 
We visit the app root, which goes Boom, thanks to some hastily written Ruby code.</p>\n<div class=\"highlight-wrapper group relative cmd\">\n <button \n type=\"button\"\n class=\"bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n data-wrap-target=\"#code-suswx77f\"\n >\n <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\" stroke-linecap=\"round\" stroke-linejoin=\"round\"><g buffered-rendering=\"static\"><path d=\"M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959\" /><path d=\"M11.081 6.466L9.533 8.037l1.548 1.571\" /></g></svg>\n <span class=\"bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950\">\n Wrap text\n </span>\n </button>\n <button \n type=\"button\"\n class=\"bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n data-copy-target=\"sibling\"\n >\n <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\"><g buffered-rendering=\"static\"><path d=\"M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z\" /><path d=\"M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617\" /></g></svg>\n <span class=\"bubble-sm bubble-tl [--offset-l:-6px] tail 
[--tail-x:calc(100%-30px)] text-navy-950\">\n Copy to clipboard\n </span>\n </button>\n <div class='highlight relative group'>\n <pre class='highlight cmd'><code id=\"code-suswx77f\">flyctl open\n</code></pre>\n </div>\n</div>\n<p><img src=\"/blog/sentry-partnership/assets/boom-cover.webp?card&center\" /></p>\n\n<p>Oh shucks. Something went wrong. But, I got an email about this error.</p>\n\n<p><img src=\"/blog/sentry-partnership/assets/email-cover.webp?card&center\" /></p>\n\n<p>We could click “View on Sentry”. Instead, let’s use <code>flyctl</code> to send us to the Sentry issues dashboard.</p>\n<div class=\"highlight-wrapper group relative cmd\">\n <button \n type=\"button\"\n class=\"bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n data-wrap-target=\"#code-3seig9v4\"\n >\n <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\" stroke-linecap=\"round\" stroke-linejoin=\"round\"><g buffered-rendering=\"static\"><path d=\"M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959\" /><path d=\"M11.081 6.466L9.533 8.037l1.548 1.571\" /></g></svg>\n <span class=\"bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950\">\n Wrap text\n </span>\n </button>\n <button \n type=\"button\"\n class=\"bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n data-copy-target=\"sibling\"\n >\n <svg class=\"w-4 h-4 pointer-events-none\" 
viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\"><g buffered-rendering=\"static\"><path d=\"M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z\" /><path d=\"M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617\" /></g></svg>\n <span class=\"bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950\">\n Copy to clipboard\n </span>\n </button>\n <div class='highlight relative group'>\n <pre class='highlight cmd'><code id=\"code-3seig9v4\">flyctl apps errors\n</code></pre>\n </div>\n</div>\n<p>We click through to this specific issue.</p>\n\n<p><img src=\"/blog/sentry-partnership/assets/dash.webp?card&center\" /></p>\n\n<p>We successfully debugged our issue. The takeaway: don’t raise when you can call.</p>\n\n<p>Error tracking on Sentry is just scratching the surface. 
Check out their <a href='https://docs.sentry.io/product/performance/' title=''>performance monitoring</a>, <a href='https://docs.sentry.io/product/session-replay' title=''>session replay</a>, <a href='https://docs.sentry.io/product/alerts/' title=''>alerting</a> and <a href='https://docs.sentry.io/product/' title=''>much more</a>.</p>\n<h2 id='next-steps-for-fly-io-and-sentry' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#next-steps-for-fly-io-and-sentry' aria-label='Anchor'></a><span class='plain-code'>Next Steps for Fly.io and Sentry</span></h2>\n<p>For our next trick, we’ll be tracking Fly.io releases in Sentry, so Sentry can link issues to their <a href='https://docs.sentry.io/product/releases/' title=''>release tracking</a> feature.\nWe’ll also send events like <a href='https://fly.io/docs/getting-started/troubleshooting/#out-of-memory-oom-or-high-cpu-usage' title=''>out-of-memory errors</a> to Sentry. The possibilities are endless.</p>\n\n<p>Got ideas or comments? Get in touch on our <a href='https://community.fly.io/' title=''>community forum</a>.</p>", "image": { "url": "https://fly.io/blog/sentry-partnership/assets/sentry-thumb.webp", "title": null }, "media": [], "authors": [ { "name": "Fly", "email": null, "url": null } ], "categories": [] }, { "id": "https://fly.io/blog/tracking-consistency-with-litefs/", "title": "Tracking Application-Level Consistency with LiteFS", "description": null, "url": "https://fly.io/blog/tracking-consistency-with-litefs/", "published": "2023-08-30T00:00:00.000Z", "updated": "2023-11-21T21:08:37.000Z", "content": "<div class=\"lead\"><p>With Fly.io, <a href=\"https://fly.io/docs/speedrun/\" title=\"\">you can get your app running globally in a matter of minutes</a>, and with LiteFS, you can run SQLite alongside your app! 
Now we’re introducing LiteFS Cloud: managed backups and point-in-time restores for LiteFS. <a href=\"https://fly.io/docs/litefs/speedrun/\" title=\"\">Try it out for yourself</a>!</p>\n</div>\n<p>When we started the <a href='https://fly.io/docs/litefs/' title=''>LiteFS</a> project a year ago, we started more with an ideal in mind rather than a specific implementation. We wanted to make it possible to not only run distributed SQLite but we also wanted to make it… <em>gasp</em>… easy!</p>\n\n<p>There were hurdles that we expected to be hard, such as intercepting SQLite transaction boundaries via syscalls or shipping logs around the world while ensuring data integrity. But there was one hurdle that was unexpectedly hard: maintaining a consistent view from the application’s perspective.</p>\n\n<p>LiteFS requires write transactions to only be performed at the primary node and then those transactions are shipped back to replicas instantaneously. Well, almost instantaneously. And therein lies the crux of our problem.</p>\n\n<p>Let’s say your user sends a write request to write to the primary node in Madrid and the user’s next read request goes to a local read-only replica in Rio de Janeiro. Most of the time LiteFS completes replication quickly and everything is fine. But if your request arrives a few milliseconds before data is replicated, then your user sees the database state from before the write occurred. 
That’s no good.</p>\n\n<p>How exactly do we handle that when our database lives outside the user’s application?</p>\n<h2 id='our-initial-series-of-failures-or-how-we-tried-to-teach-distributed-systems-to-users' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#our-initial-series-of-failures-or-how-we-tried-to-teach-distributed-systems-to-users' aria-label='Anchor'></a><span class='plain-code'>Our initial series of failures, or how we tried to teach distributed systems to users</span></h2>\n<p>Our first plan was to let LiteFS users manage consistency themselves. Every application may have different needs and, honestly, we didn’t have a better plan at the time. However, once we started explaining how to track replication state, it became obvious that it was going to be an untenable approach. Let’s start with a primer and you’ll understand why.</p>\n\n<p>Every node in LiteFS maintains a <em>replication position</em> for each database which consists of two values:</p>\n\n<ul>\n<li>Transaction ID (TXID): An identifier that monotonically increases with every successful write transaction.\n</li><li>Post-Apply Checksum: A checksum of the entire database after the transaction has been written to disk.\n</li></ul>\n\n<p>You can read the current position from your LiteFS mount from the <code>-pos</code> file:</p>\n<div class=\"highlight-wrapper group relative \">\n <button \n type=\"button\"\n class=\"bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n data-wrap-target=\"#code-wv6ha7bx\"\n >\n <svg class=\"w-4 h-4 
pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\" stroke-linecap=\"round\" stroke-linejoin=\"round\"><g buffered-rendering=\"static\"><path d=\"M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959\" /><path d=\"M11.081 6.466L9.533 8.037l1.548 1.571\" /></g></svg>\n <span class=\"bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950\">\n Wrap text\n </span>\n </button>\n <button \n type=\"button\"\n class=\"bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n data-copy-target=\"sibling\"\n >\n <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\"><g buffered-rendering=\"static\"><path d=\"M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z\" /><path d=\"M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617\" /></g></svg>\n <span class=\"bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950\">\n Copy to clipboard\n </span>\n </button>\n <div class='highlight relative group'>\n <pre class='highlight '><code id=\"code-wv6ha7bx\">$ cat /litefs/my.db-pos\n000000000042478b/8b73bc1d07d84988\n</code></pre>\n </div>\n</div>\n<p>This example shows that we are at TXID <code>0x42478b</code> (or 4,343,691 in decimal) and the checksum of our whole database after the transaction is <code>8b73bc1d07d84988</code>. A replica can detect how far it’s lagging behind by comparing its position to the primary’s position. 
Typically, a monotonic transaction ID doesn’t work in asynchronous replication systems like LiteFS, but when we couple it with a checksum it allows us to check for divergence, so the pair works surprisingly well.</p>\n\n<p>LiteFS handles the replication position internally. However, it would be up to the application to check it to ensure that its clients saw a consistent view. This meant that the application would have needed its clients to track the TXID from their last write to the primary, and then the application would have had to wait until its local replica caught up to that position before it could serve the request.</p>\n\n<p>That would have been a lot to manage. While you may find the nuts and bolts of replication interesting, sometimes you just want to get your app up and running!</p>\n<h2 id='lets-use-a-library-er-libraries' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#lets-use-a-library-er-libraries' aria-label='Anchor'></a><span class='plain-code'>Let’s use a library! Er, libraries.</span></h2>\n<p>Teaching distributed systems to each and every LiteFS user was not going to work. So instead, we thought we could tuck that complexity away by providing a LiteFS client library. Just import a package and you’re done!</p>\n\n<p>Libraries are a great way to abstract away the tough parts of a system. For example, nobody wants to roll their own cryptography implementation so they use a library. But LiteFS is a database, so it needs to work across all languages, which means we needed to implement a library for each language.</p>\n\n<p>Actually, it’s worse than that. We need to act as a traffic cop to redirect incoming client requests to make sure they arrive at the primary node for writes or that they see a consistent view on a replica for reads. 
We aren’t able to redirect writes at the data layer so it’s typically handled at the HTTP layer. Within each language ecosystem there can be a variety of web server implementations: Ruby has Rails & Sinatra, Go has net/http, gin, fasthttp, and whatever 12 new routers came out this week.</p>\n<h2 id='moving-up-the-abstraction-stack' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#moving-up-the-abstraction-stack' aria-label='Anchor'></a><span class='plain-code'>Moving up the abstraction stack</span></h2>\n<p>Abstraction often feels like a footgun. Generalizing functionality across multiple situations means that you lose flexibility in specific situations. Sometimes that means you shouldn’t abstract but sometimes you just haven’t found the right abstraction layer yet.</p>\n\n<p>For better or for worse, HTTP & REST-like applications have become the norm in our industry and some of the conventions provide a great layer for LiteFS to build upon. Specifically, the convention of using <code>GET</code> requests for reading data and the other methods (<code>POST</code>, <code>PUT</code>, <code>DELETE</code>, etc) for writing data.</p>\n\n<p>Instead of developers injecting a LiteFS library into their application, we built a thin HTTP proxy that lives in front of the application.</p>\n\n<p><img alt=\"Wrapping the application with a proxy & FUSE mount.\" src=\"https://slabstatic.com/prod/uploads/p1b436gf/posts/images/25yuWQlLKyLrkHBDFVcbU8to.png\" /></p>\n\n<p>This approach has let us manage both the incoming client side via HTTP as well as the backend data plane via our FUSE mount. 
It lets us isolate the application developer from the low-level details of LiteFS replication while making it feel like they’re developing against vanilla SQLite.</p>\n<h2 id='how-it-works' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#how-it-works' aria-label='Anchor'></a><span class='plain-code'>How it works</span></h2>\n<p>The LiteFS proxy design is simple but effective. As an example, let’s start with a write request. A user creates a new order so they send a <code>POST /orders</code> request to your web app. The LiteFS proxy intercepts the request & parses the HTTP headers to see that it’s a <code>POST</code> write request. If the local node is a replica, the proxy forwards the request to the primary node.</p>\n\n<p>If the local node is the primary, it’ll pass the request through to the application’s web server and the request will be processed normally. When the response begins streaming out to the client, the proxy will attach a cookie with the TXID of the newly-written commit.</p>\n\n<p>When the client then sends a <code>GET</code> read request, the LiteFS proxy again intercepts it and parses the headers. It can see the TXID that was set in the cookie on the previous write and the proxy will check it against the replication position of the local replica. If replication has caught up to the client’s last write transaction, it’ll pass through the request to the application. Otherwise, it’ll wait for the local node to catch up or it will eventually time out. 
The proxy is built into the <code>litefs</code> binary so communication with the internal replication state is wicked fast.</p>\n<h2 id='preventing-laggards' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#preventing-laggards' aria-label='Anchor'></a><span class='plain-code'>Preventing laggards</span></h2>\n<p>The proxy provides another benefit: health checks. Networks and servers don’t always play nice when they’re communicating across the world and sometimes they get disconnected. The proxy hooks into the LiteFS built-in heartbeat system to detect lag and it can report the node as unhealthy via a health check URL when this lag exceeds a threshold.</p>\n\n<p>If you’re running on Fly.io, we’ll take that node out of rotation when health checks begin reporting issues so users will automatically get routed to a different, healthy replica. When the replica reconnects to the primary, the health check will report as healthy and the node will rejoin.</p>\n<h2 id='the-tradeoffs-theres-always-tradeoffs' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#the-tradeoffs-theres-always-tradeoffs' aria-label='Anchor'></a><span class='plain-code'>The Tradeoffs… there’s always tradeoffs!</span></h2>\n<p>Despite how well the LiteFS proxy works in most situations, there’s gonna be times when it doesn’t quite fit. 
For example, if your application cannot rely on cookies to track application state then the proxy won’t work for you.</p>\n\n<p>There are also frameworks, like <a href='https://www.phoenixframework.org/' title=''>Phoenix</a>, which can rely heavily on websockets for live updates so this circumvents your traditional HTTP request/response approach that LiteFS proxy depends on. Finally, the proxy provides <a href='https://jepsen.io/consistency/models/read-your-writes' title=''>read-your-writes</a> guarantees which may not work for every application out there.</p>\n\n<p>In these cases, <a href='https://github.com/superfly/litefs/issues/new' title=''>let us know how we can improve the proxy</a> to make it work for more use cases! We’d love to hear your thoughts.</p>\n<h2 id='diving-in-further' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#diving-in-further' aria-label='Anchor'></a><span class='plain-code'>Diving in further</span></h2>\n<p>The LiteFS proxy makes it easy to run SQLite applications in multiple regions around the world. You can even run many legacy applications with little to no change in the code.</p>\n\n<p>If you’re interested in setting up LiteFS, check out our <a href='https://fly.io/docs/litefs/getting-started-fly/' title=''>Getting Started</a> guide. 
You can find additional details about configuring the proxy on our <a href='https://fly.io/docs/litefs/proxy/' title=''>Built-in HTTP Proxy</a> docs page.</p>", "image": { "url": "https://fly.io/blog/tracking-consistency-with-litefs/assets/tracking-consistency-thumb.webp", "title": null }, "media": [], "authors": [ { "name": "Fly", "email": null, "url": null } ], "categories": [] }, { "id": "https://fly.io/blog/redundant-logs/", "title": "Multiple Logs for Resiliency", "description": null, "url": "https://fly.io/blog/redundant-logs/", "published": "2023-07-21T00:00:00.000Z", "updated": "2023-11-21T21:08:37.000Z", "content": "<p>You’ve done everything right. You are well aware of\n<a href='https://en.wikipedia.org/wiki/Murphy%27s_law' title=''>Murphy’s Law</a>.\nYou have multiple redundant machines. You’ve set up\na regular backup schedule for your database, and perhaps are even using\n<a href='https://fly.io/blog/litefs-cloud/' title=''>LiteFS Cloud</a>. You\n<a href='https://fly.io/blog/shipping-logs/' title=''>ship your logs</a> to\n<a href='https://logtail.com/' title=''>LogTail</a> or perhaps some other\n<a href='https://github.com/superfly/fly-log-shipper#provider-configuration' title=''>provider</a>\nso you can do forensic analysis should anything go wrong…</p>\n\n<p>Then the unexpected happens. A major network outage causes your application to\nmisbehave. What’s worse is that your logs are missing crucial data from this\npoint, perhaps because of the same network outage. Maybe this time you are\nlucky and you can find the data you need by using copies of your logs via\n<a href='https://fly.io/docs/flyctl/logs/' title=''>flyctl logs</a> or the monitoring tab on the\n<a href='https://fly.io/docs/flyctl/dashboard/' title=''>flyctl dashboard</a> before they\ndisappear forever.</p>\n\n<p>So, what is going on here? Let’s look at the steps. Your application writes\nlogs to STDOUT. 
Fly.io will take that output and send it to\n<a href='https://nats.io/' title=''>NATS</a>. The <a href='https://github.com/superfly/fly-log-shipper' title=''>Log\nShipper</a> will take that data and\nhand it to <a href='https://vector.dev/docs/about/what-is-vector/' title=''>Vector</a>. From\nthere it is shipped to your third-party logging provider. That’s a lot of\nmoving parts.</p>\n\n<p>All that is great, but just like how you have redundant machines in case of\nfailures, you may want to have redundant logs in addition to the ones Fly.io\nand the log shipper provide. Below are two strategies for doing just that.\nYou can use either or both, and best of all the logs you create will be in\naddition to your existing logs.</p>\n<h2 id='logging-to-multiple-places' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#logging-to-multiple-places' aria-label='Anchor'></a><span class='plain-code'>Logging to multiple places</span></h2>\n<p>The following approach is likely the most failsafe, but often the least\nconvenient: having your primary application on each machine write to a\nseparate log file in addition to standard out. This does mean that when\nyou need this data you will have to fetch it from each machine and it\nwill likely be rather raw. But at least it will be there even in the face\nof network failures.</p>\n\n<p>For best results put these logs on a\n<a href='https://fly.io/docs/reference/volumes/' title=''>volume</a> so that they survive\na restart, and be prepared to rotate logs as they grow in size so\nthat they don’t eventually fill up that volume.</p>\n\n<p>This approach is necessarily framework specific, but most\nframeworks provide some ability to do this. 
A Rails example:</p>\n<div class=\"highlight-wrapper group relative ruby\">\n <button \n type=\"button\"\n class=\"bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n data-wrap-target=\"#code-2yaa45j3\"\n >\n <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\" stroke-linecap=\"round\" stroke-linejoin=\"round\"><g buffered-rendering=\"static\"><path d=\"M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959\" /><path d=\"M11.081 6.466L9.533 8.037l1.548 1.571\" /></g></svg>\n <span class=\"bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950\">\n Wrap text\n </span>\n </button>\n <button \n type=\"button\"\n class=\"bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n data-copy-target=\"sibling\"\n >\n <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\"><g buffered-rendering=\"static\"><path d=\"M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z\" /><path d=\"M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617\" /></g></svg>\n <span class=\"bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950\">\n Copy to clipboard\n </span>\n 
</button>\n <div class='highlight relative group'>\n <pre class='highlight '><code id=\"code-2yaa45j3\"><span class=\"n\">logger</span> <span class=\"o\">=</span> <span class=\"no\">ActiveSupport</span><span class=\"o\">::</span><span class=\"no\">Logger</span><span class=\"p\">.</span><span class=\"nf\">new</span><span class=\"p\">(</span><span class=\"no\">STDOUT</span><span class=\"p\">)</span>\n<span class=\"n\">logger</span><span class=\"p\">.</span><span class=\"nf\">formatter</span> <span class=\"o\">=</span> <span class=\"n\">config</span><span class=\"p\">.</span><span class=\"nf\">log_formatter</span>\n<span class=\"n\">volume_logger</span> <span class=\"o\">=</span> <span class=\"no\">ActiveSupport</span><span class=\"o\">::</span><span class=\"no\">Logger</span><span class=\"p\">.</span><span class=\"nf\">new</span><span class=\"p\">(</span><span class=\"s2\">\"/logs/production.log\"</span><span class=\"p\">,</span> <span class=\"mi\">3</span><span class=\"p\">)</span>\n<span class=\"n\">logger</span> <span class=\"o\">=</span> <span class=\"n\">logger</span><span class=\"p\">.</span><span class=\"nf\">extend</span> <span class=\"no\">ActiveSupport</span><span class=\"o\">::</span><span class=\"no\">Logger</span><span class=\"p\">.</span><span class=\"nf\">broadcast</span><span class=\"p\">(</span><span class=\"n\">volume_logger</span><span class=\"p\">)</span>\n</code></pre>\n </div>\n</div>\n<p>You probably already have the first two lines in your\n<code>config/environments/production.rb</code> file. Adjust and add the last\ntwo lines. That’s it! 
You now have redundant logs.</p>\n\n<p>See the <a href='https://docs.ruby-lang.org/en/master/Logger.html#class-Logger-label-Log+File+Rotation' title=''>Ruby docs for\nLogger</a>\nfor details on how to handle log rotation.</p>\n\n<p>Some pointers for other frameworks:</p>\n\n<ul>\n<li><a href='https://dev.to/darnahsan/elixir-logging-to-multiple-files-using-metadatafilter-3896' title=''>Elixir</a>\n</li><li><a href='https://laravel.com/docs/10.x/logging' title=''>Laravel</a>\n</li><li><a href='https://docs.python.org/3/howto/logging-cookbook.html#multiple-handlers-and-formatters' title=''>Python</a>\n</li><li><a href='https://github.com/winstonjs/winston#multiple-transports-of-the-same-type' title=''>Winston</a> for Node applications\n</li></ul>\n<h2 id='custom-log-shipper' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#custom-log-shipper' aria-label='Anchor'></a><span class='plain-code'>Custom log shipper</span></h2>\n<p>This approach is less bulletproof but may yield more immediately usable\nresults. Instead of using Log Shipper, Vector, and a third party, it is easy\nto subscribe directly to NATS and process log entries yourself.</p>\n\n<p>What you are going to want is a separate app running on a separate machine so\nthat it doesn’t go down when there are problems with the machine you are monitoring,\nor even during the times when you are deploying a new version. 
If the\ncode you write will be writing to disk, you will want a volume.</p>\n\n<p>Also like with log shipper, you will want to set the following secret:</p>\n<div class=\"highlight-wrapper group relative shell\">\n <button \n type=\"button\"\n class=\"bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n data-wrap-target=\"#code-zdu3b55g\"\n >\n <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\" stroke-linecap=\"round\" stroke-linejoin=\"round\"><g buffered-rendering=\"static\"><path d=\"M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959\" /><path d=\"M11.081 6.466L9.533 8.037l1.548 1.571\" /></g></svg>\n <span class=\"bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950\">\n Wrap text\n </span>\n </button>\n <button \n type=\"button\"\n class=\"bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n data-copy-target=\"sibling\"\n >\n <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\"><g buffered-rendering=\"static\"><path d=\"M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z\" /><path d=\"M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617\" /></g></svg>\n 
 <span class=\"bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950\">\n Copy to clipboard\n </span>\n </button>\n <div class='highlight relative group'>\n <pre class='highlight '><code id=\"code-zdu3b55g\">fly secrets <span class=\"nb\">set </span><span class=\"nv\">ACCESS_TOKEN</span><span class=\"o\">=</span><span class=\"si\">$(</span>fly auth token<span class=\"si\">)</span>\n</code></pre>\n </div>\n</div>\n<p>Here’s a minimal JavaScript example that can be run using Node or Bun:</p>\n<div class=\"highlight-wrapper group relative javascript\">\n <button \n type=\"button\"\n class=\"bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n data-wrap-target=\"#code-fxjs7ls8\"\n >\n <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\" stroke-linecap=\"round\" stroke-linejoin=\"round\"><g buffered-rendering=\"static\"><path d=\"M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959\" /><path d=\"M11.081 6.466L9.533 8.037l1.548 1.571\" /></g></svg>\n <span class=\"bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950\">\n Wrap text\n </span>\n </button>\n <button \n type=\"button\"\n class=\"bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n data-copy-target=\"sibling\"\n >\n <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" 
stroke-width=\"1.35\"><g buffered-rendering=\"static\"><path d=\"M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z\" /><path d=\"M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617\" /></g></svg>\n <span class=\"bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950\">\n Copy to clipboard\n </span>\n </button>\n <div class='highlight relative group'>\n <pre class='highlight '><code id=\"code-fxjs7ls8\"><span class=\"k\">import</span> <span class=\"p\">{</span> <span class=\"nx\">connect</span><span class=\"p\">,</span> <span class=\"nx\">StringCodec</span> <span class=\"p\">}</span> <span class=\"k\">from</span> <span class=\"dl\">\"</span><span class=\"s2\">nats</span><span class=\"dl\">\"</span><span class=\"p\">;</span>\n<span class=\"k\">import</span> <span class=\"nx\">fs</span> <span class=\"k\">from</span> <span class=\"dl\">'</span><span class=\"s1\">node:fs</span><span class=\"dl\">'</span><span class=\"p\">;</span>\n\n<span class=\"c1\">// tailor these two constants for your needs</span>\n<span class=\"kd\">const</span> <span class=\"nx\">LOG_FILE</span> <span class=\"o\">=</span> <span class=\"dl\">\"</span><span class=\"s2\">/log/production.log</span><span class=\"dl\">\"</span><span class=\"p\">;</span>\n<span class=\"kd\">const</span> <span class=\"nx\">ORGANIZATION</span> <span class=\"o\">=</span> <span class=\"dl\">\"</span><span class=\"s2\">your-organization-name</span><span class=\"dl\">\"</span><span class=\"p\">;</span>\n\n<span class=\"c1\">// create a connection to a nats-server</span>\n<span class=\"kd\">const</span> <span class=\"nx\">nc</span> <span class=\"o\">=</span> <span class=\"k\">await</span> <span class=\"nx\">connect</span><span class=\"p\">({</span>\n <span class=\"na\">servers</span><span class=\"p\">:</span> <span 
class=\"dl\">\"</span><span class=\"s2\">[fdaa::3]:4223</span><span class=\"dl\">\"</span><span class=\"p\">,</span>\n <span class=\"na\">user</span><span class=\"p\">:</span> <span class=\"nx\">ORGANIZATION</span><span class=\"p\">,</span>\n <span class=\"na\">pass</span><span class=\"p\">:</span> <span class=\"nx\">process</span><span class=\"p\">.</span><span class=\"nx\">env</span><span class=\"p\">.</span><span class=\"nx\">ACCESS_TOKEN</span>\n<span class=\"p\">});</span>\n\n<span class=\"c1\">// open log file</span>\n<span class=\"kd\">const</span> <span class=\"nx\">file</span> <span class=\"o\">=</span> <span class=\"nx\">fs</span><span class=\"p\">.</span><span class=\"nx\">openSync</span><span class=\"p\">(</span><span class=\"nx\">LOG_FILE</span><span class=\"p\">,</span> <span class=\"dl\">'</span><span class=\"s1\">a+</span><span class=\"dl\">'</span><span class=\"p\">);</span>\n\n<span class=\"c1\">// create a codec</span>\n<span class=\"kd\">const</span> <span class=\"nx\">sc</span> <span class=\"o\">=</span> <span class=\"nx\">StringCodec</span><span class=\"p\">();</span>\n\n<span class=\"c1\">// create a simple subscriber and iterate over messages</span>\n<span class=\"c1\">// matching the subscription</span>\n<span class=\"kd\">const</span> <span class=\"nx\">sub</span> <span class=\"o\">=</span> <span class=\"nx\">nc</span><span class=\"p\">.</span><span class=\"nx\">subscribe</span><span class=\"p\">(</span><span class=\"dl\">\"</span><span class=\"s2\">logs.></span><span class=\"dl\">\"</span><span class=\"p\">);</span>\n<span class=\"k\">for</span> <span class=\"k\">await</span> <span class=\"p\">(</span><span class=\"kd\">const</span> <span class=\"nx\">msg</span> <span class=\"k\">of</span> <span class=\"nx\">sub</span><span class=\"p\">)</span> <span class=\"p\">{</span>\n <span class=\"kd\">const</span> <span class=\"nx\">data</span> <span class=\"o\">=</span> <span class=\"nx\">JSON</span><span class=\"p\">.</span><span class=\"nx\">parse</span><span 
class=\"p\">(</span><span class=\"nx\">sc</span><span class=\"p\">.</span><span class=\"nx\">decode</span><span class=\"p\">(</span><span class=\"nx\">msg</span><span class=\"p\">.</span><span class=\"nx\">data</span><span class=\"p\">));</span>\n\n <span class=\"c1\">// build log file entry</span>\n <span class=\"kd\">const</span> <span class=\"nx\">log</span> <span class=\"o\">=</span> <span class=\"p\">[</span>\n <span class=\"nx\">data</span><span class=\"p\">.</span><span class=\"nx\">timestamp</span><span class=\"p\">.</span><span class=\"nx\">padEnd</span><span class=\"p\">(</span><span class=\"mi\">30</span><span class=\"p\">),</span>\n <span class=\"s2\">`[</span><span class=\"p\">${</span><span class=\"nx\">data</span><span class=\"p\">.</span><span class=\"nx\">fly</span><span class=\"p\">.</span><span class=\"nx\">app</span><span class=\"p\">.</span><span class=\"nx\">instance</span><span class=\"p\">}</span><span class=\"s2\">]`</span><span class=\"p\">,</span>\n <span class=\"nx\">data</span><span class=\"p\">.</span><span class=\"nx\">fly</span><span class=\"p\">.</span><span class=\"nx\">region</span><span class=\"p\">,</span>\n <span class=\"s2\">`[</span><span class=\"p\">${</span><span class=\"nx\">data</span><span class=\"p\">.</span><span class=\"nx\">log</span><span class=\"p\">.</span><span class=\"nx\">level</span><span class=\"p\">}</span><span class=\"s2\">]`</span><span class=\"p\">,</span>\n <span class=\"nx\">data</span><span class=\"p\">.</span><span class=\"nx\">message</span>\n <span class=\"p\">].</span><span class=\"nx\">join</span><span class=\"p\">(</span><span class=\"dl\">'</span><span class=\"s1\"> </span><span class=\"dl\">'</span><span class=\"p\">)</span> <span class=\"o\">+</span> <span class=\"dl\">\"</span><span class=\"se\">\\n</span><span class=\"dl\">\"</span><span class=\"p\">;</span>\n\n <span class=\"c1\">// write entry to disk</span>\n <span class=\"nx\">fs</span><span class=\"p\">.</span><span 
class=\"nx\">write</span><span class=\"p\">(</span><span class=\"nx\">file</span><span class=\"p\">,</span> <span class=\"nx\">log</span><span class=\"p\">,</span> <span class=\"nx\">error</span> <span class=\"o\">=></span> <span class=\"p\">{</span>\n <span class=\"k\">if</span> <span class=\"p\">(</span><span class=\"nx\">error</span><span class=\"p\">)</span> <span class=\"nx\">console</span><span class=\"p\">.</span><span class=\"nx\">error</span><span class=\"p\">(</span><span class=\"nx\">error</span><span class=\"p\">);</span>\n <span class=\"p\">});</span>\n<span class=\"p\">}</span>\n</code></pre>\n </div>\n</div>\n<p>The above is pretty straightforward. It connects to NATS, opens a file,\nsubscribes to logs, parses each message, and writes out selected data\nto disk. This example is in JavaScript, but feel free to reimplement\nthis basic approach using your favorite language, as NATS supports\n<a href='https://docs.nats.io/using-nats/developer' title=''>plenty</a>.</p>\n\n<p>Things to watch out for: you don’t want recursive errors when exceptions\noccur during write. You want to capture errors and reconnect to NATS\nwhen the connection goes down. You may even want to filter messages.\nA more complete example implementing a number of these features can be found\n<a href='https://github.com/rubys/showcase/blob/main/fly/applications/logger/logfiler.ts' title=''>here</a>.</p>\n<h2 id='conclusion' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#conclusion' aria-label='Anchor'></a><span class='plain-code'>Conclusion</span></h2>\n<p>Log failures are not common, and perhaps the redundant logs that fly.io already\nkeeps will be sufficient for your needs. 
But it may be worth reviewing what\nyour exposure is and how to mitigate that exposure should your logs fail at the\nworst possible time.</p>\n\n<p>Hopefully the approaches listed above give you ideas on how to ensure that\nyou will always have the log data you need even in the most hostile\nenvironment conditions.</p>", "image": { "url": "https://fly.io/blog/redundant-logs/assets/lergs-thumb.webp", "title": null }, "media": [], "authors": [ { "name": "Fly", "email": null, "url": null } ], "categories": [] }, { "id": "https://fly.io/blog/tokenized-tokens/", "title": "Tokenized Tokens", "description": null, "url": "https://fly.io/blog/tokenized-tokens/", "published": "2023-07-12T00:00:00.000Z", "updated": "2023-11-21T21:08:37.000Z", "content": "<div class=\"lead\"><p>We’re Fly.io. We run apps for our users on hardware we host around the world. Building security for a platform like this is tricky, and that’s what the post is about. But you don’t have to read any of this to get an app running on here. See how to <a href=\"https://fly.io/docs/speedrun/\" title=\"\">speedrun getting an app running on Fly.io here</a>.</p>\n</div>\n<p>We built some little security thingies. We’re open sourcing them, and hoping you like them as much as we do. In a nutshell: it’s a proxy that injects secrets into arbitrary 3rd-party API calls. We could describe it more completely here, but that wouldn’t be as fun as writing a big long essay about how the thingies came to be, so: buckle up.</p>\n\n<p>The problem we confront is as old as Rails itself. Our application started simple: some controllers, some models. The only secrets it stored were bcrypt password hashes. But not unlike a pet baby alligator, it grew up. 
Now it’s become more unruly than we’d planned.</p>\n\n<p>That’s because frameworks like Rails make it easy to collect secrets: you just create another model for them, <a href='https://guides.rubyonrails.org/active_record_encryption.html' title=''>roll some kind of secret to encrypt them</a>, jam that secret into the deployment environment, and call it a day.</p>\n\n<p>And, at least in less sensitive applications, or even the early days of an app like ours, that can work!</p>\n<div class=\"callout\"><p>For what it’s worth, and to the annoyance of some of our Heroku refugees, we’ve never stored customer app secrets this way; our Rails API can write customer secrets, but has never been able to read them. We’ll talk more about how this works in a sec.</p>\n</div>\n<p>But for us, not anymore. At the stage we’re at, all secrets are hazmat. And Rails itself is the portion of our attack surface we’re least confident about – the rest of it is either outside of our trust boundaries, or written in Rust and Go, strongly-typed memory-safe languages that are easy to reason about, and which have never accidentally treated YAML as an executable file format.</p>\n\n<p>So, a few months back, during an integration with a 3rd party API that relied on OAuth2 tokens, we drew a line: ⚡ <em>henceforth, hazmat shall only be removed from Rails, never added</em> ⚡. 
This is easier said than done, though: despite prominent “this is not a place of honor” signs all over the codebase, our Rails API is still where much of the action in our system takes place.</p>\n<h3 id='how-apps-use-secrets-3-different-approaches' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#how-apps-use-secrets-3-different-approaches' aria-label='Anchor'></a><span class='plain-code'>How Apps Use Secrets: 3 Different Approaches</span></h3>\n<p><img src=\"/blog/tokenized-tokens/assets/secrets-1.png?2/3&card&center\" /></p>\n\n<p>We just gave you one way, probably the most common. Stick ‘em in a model, encrypt them with an environment secret, and watch Dependabot religiously for vulnerabilities in transitively-added libraries you’ve never heard of before.</p>\n\n<p><img src=\"/blog/tokenized-tokens/assets/secrets-2.png?2/3&card&center\" /></p>\n\n<p>Here’s a second way, probably the second-most popular: use a secrets management system, like <a href='https://aws.amazon.com/kms/' title=''>KMS</a> or <a href='https://www.hashicorp.com/products/vault' title=''>Vault</a>. These systems, which are great, keep secrets encrypted and allow access based on an intricate access control language, which is great.</p>\n\n<p>That’s what we do for customer app secrets, like <code>DATABASE_URL</code> and <code>API_KEY</code>. We use <a href='https://www.hashicorp.com/products/vault' title=''>HashiCorp Vault</a> (for the time being). Our Rails API has an access token for Vault that allows it to set secrets, but not read any of them back, like a kind of diode. A game-over Rails vulnerability might allow an attacker to scramble secrets, but not to easily dump them.</p>\n\n<p>In the happiest cases with secrets, systems like Vault can keep secret bits from ever touching the application. 
Customer app secrets are a happy case: Rails never needs to read them, <a href='https://fly.io/blog/carving-the-scheduler-out-of-our-orchestrator/' title=''>just our orchestrator</a>, to inject them into VM environments. In other happy cases, Vault operates on the app’s behalf: signing a time-limited request URL for AWS, or making a direct request to a known 3rd-party service. Vault calls these features “<a href='https://developer.hashicorp.com/vault/docs/secrets' title=''>secret engines</a>”, and when you can get away with using them, it’s hard to do better.</p>\n\n<p>The catch is, sometimes you can’t get away with them. For most 3rd parties, Vault has no idea how to interact with them. And most secrets are bearer tokens, not request signatures. The only way to use those kinds of secrets is to read them into app memory. If good code can read a secret from Vault, so can a YAML vulnerability.</p>\n<div class=\"callout\"><p>Still, this is better than nothing: even if apps can read raw secrets, systems like Vault can provide an audit trail of which secrets were pulled when, and make it much easier to rotate secrets, which you’ll want to do with raw secrets to contain their blast radius. HashiCorp Vault is great, so is KMS; we recommend them unreservedly.</p>\n</div>\n<p><img src=\"/blog/tokenized-tokens/assets/secrets-3.png?2/3&card&center\" /></p>\n\n<p>So that’s why there’s a third way to handle this problem, which is: decompose your application into services so that the parts that have to handle secrets are tiny and well-contained. The bulk of our domain-specific business code can chug along in Rails, and the parts that trade bearer tokens with 3rd parties can be built in a couple hundred lines of Go.</p>\n\n<p>This is a good approach, too. It’s just cumbersome, because a big application ends up dealing with lots of different kinds of secrets, and making a trusted microservice for each of them is a drag. 
What you want is to notice some commonality in how 3rd party API secrets are used, and to come up with some possible way of exploiting that.</p>\n\n<p>We thought long and hard on this and came up with:</p>\n<h3 id='tokenizer-the-fabled-4th-way' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#tokenizer-the-fabled-4th-way' aria-label='Anchor'></a><span class='plain-code'>Tokenizer: The Fabled 4th Way</span></h3>\n<p><img src=\"/blog/tokenized-tokens/assets/secrets-4.png?2/3&card&center\" /></p>\n\n<p>We developed a multipurpose secret-using service called the <code>Tokenizer</code>.</p>\n\n<p><code>Tokenizer</code> is a stateless HTTP proxy that holds the private key of a <a href='https://pkg.go.dev/golang.org/x/crypto/nacl/box' title=''>Curve25519 keypair.</a></p>\n\n<p>When we get a new 3rd party API secret, we encrypt it to <code>Tokenizer's</code> public key; we “tokenize” it. Our API server can handle the (encrypted) tokenized secret, but it can’t read or use it directly. Only <code>Tokenizer</code> can.</p>\n\n<p>When it comes time to talk to the 3rd party API, Rails does so via <code>Tokenizer</code>. Here’s how that works:</p>\n\n<ol>\n<li>The API request is proxied, as an ordinary HTTP 1.1 request, through <code>Tokenizer</code>.\n</li><li>The request carries one or more additional <code>Proxy-Tokenizer</code> headers.\n</li><li>Each <code>Proxy-Tokenizer</code> header carries an encrypted secret and instructions for <code>Tokenizer</code> to rewrite the request in some way, usually by injecting the decrypted plaintext into a header.\n</li></ol>\n\n<p>You can think of <code>Tokenizer</code> as a sort of Vault-style “secret engine” that happens to capture virtually everything an app needs secrets for. 
It can even use decrypted secrets to selectively HMAC parts of requests, for APIs that authenticate with signatures instead of bearer tokens.</p>\n\n<p>Check it out: <a href='https://github.com/superfly/tokenizer' title=''>it’s not super complicated</a>.</p>\n\n<p>Now, our goal is to keep Rails from ever touching secret bits. But, hold on: a game-over Rails vulnerability would give attackers an easy way around <code>Tokenizer</code>: you’d just proxy requests for a particular secret to a service you ran that collected the plaintext.</p>\n\n<p>To mitigate that, we built the obvious feature: you can lock requests for specific secrets down to a list of allowed hosts or host regexp patterns.</p>\n\n<p>We think this approach to handling secrets is pretty similar to how payment processors tokenize payment card information, hence the name. The advantages are straightforward:</p>\n\n<ul>\n<li>Secrets are exposed to a much smaller attack surface that doesn’t include Rails.\n</li><li>Virtually every usage of secrets we’re likely to run across is captured by HTTP proxying, without us needing to write per-service code.\n</li><li>The tokenizer is a tiny project that’s easy to audit and reason about. 
\n</li><li>Every language we work in already has first-class support for running requests through a proxy (something we already do for <a href='https://github.com/stripe/smokescreen' title=''>SSRF protection</a>).\n</li></ul>\n<h3 id='ssokenizer-tokenizing-oauth-sso' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#ssokenizer-tokenizing-oauth-sso' aria-label='Anchor'></a><span class='plain-code'>SSOkenizer: Tokenizing OAuth SSO</span></h3>\n<p>When we created <code>Tokenizer</code>, we were motivated by the problem of OAuth2 tokens other service providers gave us, for partnership features we build for mutual customers.</p>\n\n<p>We’d also dearly like our customers to use OAuth2/OIDC to log into Fly.io itself; it’s more secure for them, and gives them the full complement of Google MFA features, meaning we don’t immediately have to implement the full complement of Google MFA features. Letting people log into Fly.io with a Google OAuth token means we have to keep track of people’s OAuth tokens. That sounds like a job for the <code>Tokenizer</code>!</p>\n\n<p>But there’s a catch: acquiring those OAuth tokens in the first place means doing the OAuth2 dance, which means that for a brief window of time, Rails is handling hazmat. 
We’d like to close that window.</p>\n\n<p><img src=\"/blog/tokenized-tokens/assets/ssokenizer.png?2/3&card&center\" /></p>\n\n<p>Enter the <code>SSOkenizer</code>.</p>\n\n<p>The job of the <code>SSOkenizer</code> is to perform the OAuth2 dance on behalf of Rails, and then use the output of that process (the OAuth2 bearer token yielded from the OAuth2 code flow, which you can <a href='https://github.com/superfly/ssokenizer#ssokenizer' title=''>see in its cursed majesty here</a>) to drive the <code>Tokenizer</code>.</p>\n\n<p>In other words, where we’d otherwise explicitly encrypt secrets to be tokenized a priori, the <code>SSOkenizer</code> does that on the fly, passing tokenized OAuth2 credentials back to Rails. Those… tokenized tokens can only be used through the <code>Tokenizer</code> proxy, which is the only component in our system with the private key that unseals them.</p>\n\n<p>We think this is a pretty neat trick. The <code>SSOkenizer</code> itself is tiny, even smaller than the <code>Tokenizer</code> (<a href='https://github.com/superfly/ssokenizer/' title=''>you can read it here</a>), and essentially stateless; in fact, pretty much everything in this system is minimally stateful, except Rails, which is great at being stateful. 
We even keep almost all of OAuth2 out of Rails and confined to Go code (where it’s practically the hello-world of Go OAuth2 libraries).</p>\n\n<p>A nice side effect-slash-validation of this design: once we got it working for Google, it became a super easy project to get OAuth2 logins working for other providers.</p>\n<h3 id='feel-free-to-poach-this' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#feel-free-to-poach-this' aria-label='Anchor'></a><span class='plain-code'>Feel Free To Poach This</span></h3>\n<p>We’re psyched for a bunch of reasons:</p>\n\n<ul>\n<li>We’ve got a clear path to rolling out SSO logins.\n</li><li>We can do integrations with third-party services now without infecting Rails with more hazmat secrets.\n</li><li>We’ve honored the rule of “only removing hazmat from Rails, not adding it”.\n</li><li>We’ve also cleared a path to getting all the rest of the hazmat Rails has access to tokenized. \n</li></ul>\n\n<p>These are standalone tools with no real dependencies on Fly.io, so they’re easy for us to open source. 
Which is what we did: if they sound useful to you, check out the <a href='https://github.com/superfly/tokenizer' title=''>tokenizer</a> and <a href='https://github.com/superfly/ssokenizer' title=''>ssokenizer</a> repositories for instructions on deploying and using these services yourself.</p>", "image": { "url": "https://fly.io/blog/tokenized-tokens/assets/ghosts.png", "title": null }, "media": [], "authors": [ { "name": "Fly", "email": null, "url": null } ], "categories": [] }, { "id": "https://fly.io/blog/flydotio-heart-bun/", "title": "Fly.io ❤️ Bun", "description": null, "url": "https://fly.io/blog/flydotio-heart-bun/", "published": "2023-07-11T00:00:00.000Z", "updated": "2024-04-12T18:23:39.000Z", "content": "<p><a href='https://lu.ma/cqk31rvl' title=''>Bun 1.0 comes out September 7th</a>. Fly.io is making preparations.</p>\n\n<p>Previously, we stated that <a href='https://fly.io/blog/flydotio-heart-js/' title=''>Fly.io ❤️ JS</a>, and we understandably started with Node.js. While that work is ongoing, it makes sense to start expanding to other runtimes.</p>\n\n<p>Bun is the obvious next choice given it <a href='https://bun.sh/docs/runtime/nodejs-apis' title=''>aims for complete Node.js API compatibility</a>.</p>\n\n<p>Starting with <a href='https://fly.io/docs/hands-on/install-flyctl/' title=''>flyctl</a> version 0.1.54 and <a href='https://www.npmjs.com/package/@flydotio/dockerfile' title=''>@flydotio/dockerfile</a> version 0.3.3, you can launch and deploy bun applications using <code>fly launch</code> and <code>fly deploy</code>,\nprovided:</p>\n\n<ul>\n<li>You’ve installed bun version 0.5.3 or later\n</li><li>You have a <code>package.json</code> that meets at least one of the following conditions:\n\n<ul>\n<li>It has a <code>start</code> entry in the <code>scripts</code> section.\n</li><li>It has a <code>module</code> entry and specifies <code>module</code> as the <code>type</code>.\n</li><li>It has a <code>main</code> 
entry.\n</li></ul>\n</li></ul>\n\n<p>Basically, if you can run <a href='https://bun.sh/docs/quickstart' title=''>Bun’s Quickstart</a> and <a href='https://fly.io/docs/hands-on/' title=''>Fly’s hands-on walk-through</a>, you have all you need to deploy your application on fly.io.</p>\n\n<p>We also have a <a href='https://github.com/fly-apps/bun/' title=''>sample</a> that you can deploy.</p>\n\n<p>Be forewarned that everything is beta at this point. Some issues we encountered while preparing this support:</p>\n\n<ul>\n<li><a href='https://github.com/oven-sh/bun/issues/3605' title=''><code>bun install</code> has no <code>--prune</code> option</a>. Our Dockerfiles use this to remove development dependencies after running <code>build</code>. Of course with bun you are less likely to need a build step as TS and JSX are built in.\n</li><li><a href='https://github.com/oven-sh/bun/issues/1579' title=''><code>throwIfNoEntry</code> is not supported in <code>fs.statSync</code></a>. <a href='https://github.com/fly-apps/node-demo' title=''><code>fly-apps/node-demo</code></a> uses that.\n</li><li>Programs that used <a href='https://nodejs.org/api/readline.html' title=''>readline</a> <a href='https://github.com/oven-sh/bun/issues/3604' title=''>never exit</a>. Switching to <a href='https://bun.sh/docs/api/globals' title=''>global</a>.<a href='https://developer.mozilla.org/en-US/docs/Web/API/Window/prompt' title=''>prompt</a> resolved this issue for <code>@flydotio/dockerfile</code>.\n</li></ul>\n\n<p>Undoubtedly there will be bugs in fly’s dockerfile generator too. 
But as Node.js and Bun share the same generator, fixes that are made for either framework will generally benefit both.</p>\n\n<p>If you see a problem, \n<a href='https://community.fly.io/' title=''>start a discussion</a>,\n<a href='https://github.com/fly-apps/dockerfile-node' title=''>open an issue</a>, or\n<a href='https://github.com/fly-apps/dockerfile-node/pulls' title=''>create a pull request</a>.</p>", "image": { "url": "https://fly.io/blog/flydotio-heart-bun/assets/flydotio-heart-bun-thumb.webp", "title": null }, "media": [], "authors": [ { "name": "Fly", "email": null, "url": null } ], "categories": [] }, { "id": "https://fly.io/blog/litefs-cloud/", "title": "LiteFS Cloud: Distributed SQLite with Managed Backups", "description": null, "url": "https://fly.io/blog/litefs-cloud/", "published": "2023-07-05T00:00:00.000Z", "updated": "2023-11-21T21:08:37.000Z", "content": "<div class=\"lead\"><p>With Fly.io, <a href=\"https://fly.io/docs/speedrun/\" title=\"\">you can get your app running globally in a matter of minutes</a>, and with LiteFS, you can run SQLite alongside your app! Now we’re introducing LiteFS Cloud: managed backups and point-in-time restores for LiteFS—whether your app is running on Fly.io or anywhere else. <a href=\"https://fly.io/docs/litefs/speedrun/\" title=\"\">Try it out for yourself</a>!</p>\n</div>\n<p>We love <a href='https://fly.io/blog/all-in-on-sqlite-litestream/' title=''>SQLite in production</a>, and we’re all about running apps close to users. That’s why we created LiteFS: an open source distributed SQLite database that lives on the same filesystem as your application, and replicates data to all the nodes in your app cluster.</p>\n\n<p>With LiteFS, you get the simplicity, flexibility, and lightning-fast local reads of working with vanilla SQLite, but distributed (so it’s close to your users)! It’s especially great for read-heavy web applications. 
Learn more about LiteFS in the <a href='https://fly.io/docs/litefs/' title=''>LiteFS docs</a> and in <a href='https://fly.io/blog/introducing-litefs/' title=''>our blog post introducing LiteFS</a>.</p>\n\n<p>At Fly.io we’ve been using LiteFS internally for a while now, and it’s awesome!</p>\n\n<p>However, something is missing: disaster recovery. Because it’s local to your app, you don’t need to—indeed can't—pay someone to manage your LiteFS cluster, which means no managed backups. Until now, you’ve had to <a href='https://fly.io/docs/litefs/backup/' title=''>build your own</a>: take regular snapshots, store them somewhere, figure out a retention policy, that sort of thing.</p>\n\n<p>This also means you can only restore from a point in time when you happen to have taken a snapshot, and you likely need to limit how frequently you snapshot for cost reasons. Wouldn’t it be cool if you could have super-frequent reliable backups to restore from, without having to implement it yourself?</p>\n\n<p>Well, that’s why we’re launching, in preview, LiteFS Cloud: backups and restores for LiteFS, managed by Fly.io. 
It gives you painless and reliable backups, with the equivalent of a snapshot every five minutes (8760 snapshots per month!), whether your database is hosted with us, or anywhere else.</p>\n<h2 id='how-do-i-use-litefs-cloud' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#how-do-i-use-litefs-cloud' aria-label='Anchor'></a><span class='plain-code'>How do I use LiteFS Cloud?</span></h2>\n<p>There are a few steps to get started:</p>\n\n<ul>\n<li>Upgrade LiteFS to version 0.5.1 or greater\n</li><li>Create a LiteFS Cloud cluster in the Fly.io dashboard, <a href='https://fly.io/dashboard/personal/litefs' title=''>LiteFS Cloud section</a>\n</li><li>Make the LiteFS Cloud auth token available to your LiteFS\n</li></ul>\n\n<p><img alt=\"Screenshot of Fly.io dashboard, with a red arrow pointing to &quot;LiteFS Cloud&quot; in the left navbar, and another red arrow pointing to the &quot;Create&quot; button on the top right for creating a LiteFS Cloud cluster\" src=\"/blog/litefs-cloud/assets/screenshot1.png\" /></p>\n\n<p><a href='https://fly.io/docs/litefs/cloud-backups' title=''>There are some docs here</a>, but that’s literally it. Then your database will start automagically backing up, we’ll manage the backups for you, and you’ll be able to restore your database near instantaneously to any point in time in the last 30 days (with 5 minute granularity).</p>\n\n<p>I want to say that again because I think it’s just wild – you can restore your database to <em>any point in time, with 5 minute granularity</em>. <strong class='font-semibold text-navy-950'><em>Near instantaneously</em></strong>.</p>\n\n<p>Speaking of restores, you can do those in the dashboard too. You pick a date and time, and we’ll take the most recent snapshot before that timestamp and restore it. 
This will take a couple of seconds (or less).</p>\n\n<p><img alt=\"Screenshot of popup modal on Fly.io dashboard, with a date and time selector, and a text field with &quot;lfsc-test-runner/db&quot; typed in it, and a red button at the bottom with text &quot;I understand the consequences. Restore from this snapshot.&quot;\" src=\"/blog/litefs-cloud/assets/screenshot2.png\" /></p>\n\n<p>We’ll introduce pricing in the coming months, but for now LiteFS Cloud is in preview and is free to use. Please go check it out, and let us know how it goes!</p>\n<h2 id='the-secret-sauce-ltx-amp-compactions' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#the-secret-sauce-ltx-amp-compactions' aria-label='Anchor'></a><span class='plain-code'>The secret sauce: LTX & compactions</span></h2>\n<p>LiteFS is built on a simple file format called <a href='https://github.com/superfly/ltx' title=''>Lite Transaction File (LTX)</a> which is designed for fast, flexible replication and recovery in LiteFS itself and in LiteFS Cloud.</p>\n\n<p>But first, let’s start off with what an LTX file represents: <em>a change set of database pages</em>.</p>\n\n<p>When you commit a write transaction in SQLite, it updates one or more fixed-sized blocks called pages. By default, these are 4KB in size. An LTX file is simply a sorted list of these changed pages. Whenever you perform a transaction in SQLite, LiteFS will build an LTX file for that transaction.</p>\n\n<p>The interesting part of LTX is that contiguous sets of LTX files can be merged together into one LTX file. 
This merge process is called <em>compaction</em>.</p>\n\n<p>For example, let’s say you have 3 transactions in a row that update the following set of pages:</p>\n\n<ul>\n<li>LTX A: Pages 1, 5, 7\n</li><li>LTX B: Pages 5, 6\n</li><li>LTX C: Pages 5, 7\n</li></ul>\n\n<p>With LTX compaction, you avoid the duplicate work that comes from overwriting the same pages one transaction at a time. Instead, one LTX file for transactions A through C contains the last version of each page, so the pages are stored and updated only once:</p>\n\n<p><img alt=\"Compacting three contiguous LTX files into a single LTX file.\" src=\"/blog/litefs-cloud/assets/single-level-compaction.png\" /></p>\n\n<p>That, in a nutshell, is how a single-level compaction works.</p>\n<h2 id='its-ltx-all-the-way-down' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#its-ltx-all-the-way-down' aria-label='Anchor'></a><span class='plain-code'>It’s LTX all the way down</span></h2>\n<p>Compactions let us take changes for a bunch of transactions and smoosh them down into a single, small file. That’s cool and all, but how does that give us fast point-in-time restores? By the magic of multi-level compactions!</p>\n\n<p>Compaction levels are progressively larger time intervals into which we roll up transaction data. In the following illustration, you can see that the highest level (L3) starts with a full snapshot of the database. This occurs daily and it’s our starting point during a restore.</p>\n\n<p>Next, we have an hourly compaction level called L2 so there will be an LTX file with page changes between midnight and 1am, and then another file for 1am to 2am, etc. 
Below that is L1 which holds 5-minute intervals of data.</p>\n\n<p><img alt=\"Compaction levels for snapshots (L3), hourly (L2), & every five minutes (L1).\" src=\"/blog/litefs-cloud/assets/multi-level-compaction.png\" /></p>\n\n<p>When a restore is requested for a specific timestamp, we can determine a minimal set of LTX files to replay. For example, if we restored to January 10th at 8:15am we would grab the following files:</p>\n\n<ul>\n<li>Start with the snapshot for January 10th.\n</li><li>Fetch the eight hourly LTX files from midnight to 8am.\n</li><li>Fetch the three 5-minute interval LTX files from 8:00am to 8:15am.\n</li></ul>\n\n<p>Since LTX files are sorted by page number, we can perform a streaming merge of these twelve files and end up with the state of the database at the given timestamp.</p>\n<h2 id='department-of-redundancy-department' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#department-of-redundancy-department' aria-label='Anchor'></a><span class='plain-code'>Department of Redundancy Department</span></h2>\n<p>One of the primary goals of LiteFS is to be simple to use. However, that’s not an easy goal for a distributed database when our industry is moving more and more towards highly dynamic and ephemeral infrastructure. Traditional consensus algorithms require stable membership and adjusting the member set can be complicated.</p>\n\n<p>With LiteFS, we chose to use async replication as the primary mode of operation. This has some trade-offs in durability guarantees but it makes the cluster much simpler to operate. 
LiteFS Cloud alleviates many of these trade-offs of async replication by writing data out to high-durability, high-availability object storage—for now, we’re using S3.</p>\n\n<p>However, we don’t write every individual LTX file to object storage immediately. The latency is too high and it’s not cost effective when you write a lot of transactions. Instead, the LiteFS primary node will batch up its changes every second and send a single, compacted LTX file to LiteFS Cloud. Once there, LiteFS Cloud will batch these 1-second files together and flush them to storage periodically.</p>\n\n<p>We track the ID of the latest transaction that’s been flushed, and we call this the “high water mark” or HWM. This transaction ID is propagated back down to the nodes of the LiteFS cluster so we can ensure that the transaction file is not removed from any node until it is safely persisted in object storage. With this approach, we have multiple layers of redundancy in case your LiteFS cluster can’t communicate with LiteFS Cloud or if we can’t communicate with S3.</p>\n<h2 id='whats-next-for-litefs-cloud' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#whats-next-for-litefs-cloud' aria-label='Anchor'></a><span class='plain-code'>What’s next for LiteFS Cloud?</span></h2>\n<p>We have a small team dedicated to LiteFS Cloud, and we’re chugging away at new exciting features! Right now, LiteFS Cloud is really just backups and restores, but we are working on a lot of other cool stuff:</p>\n\n<ul>\n<li>Upload your database in the Fly.io dashboard. 
This way you don’t have to worry about figuring out how to initialize your database when you first deploy it: just upload the database in the dashboard and LiteFS will pull it from LiteFS Cloud.\n</li><li>Download a point-in-time snapshot of your database from the Fly.io dashboard. You can use this to spin up a local dev env (with production data), do some local analysis, etc.\n</li><li>Clone your LiteFS Cloud cluster to a new cluster, which you could use for a staging environment (or on-demand test environments for your CI pipelines) with real data.\n</li><li>Features to support apps that run on serverless platforms like Vercel, Google Cloud Run, Deno, and more. We’ll need to develop a number of different features for this, so stay tuned for more information in the coming weeks!\n</li></ul>\n\n<p>We’re really excited about the future of LiteFS Cloud, so we wanted to share what we’re thinking. We’d also love to hear any feedback you have about these ideas that might inform our work.</p>", "image": { "url": "https://fly.io/blog/litefs-cloud/assets/litefs-cloud-thumb.webp", "title": null }, "media": [], "authors": [ { "name": "Fly", "email": null, "url": null } ], "categories": [] } ] }