Feed fetched in 82 ms.
Warning: Content type is application/atom+xml, not text/xml.
Feed is 144,831 characters long.
Warning: Feed is missing an ETag.
Warning: Feed is missing the Last-Modified HTTP header.
Warning: This feed does not have a stylesheet.
This appears to be an Atom feed.
Feed title: Aphyr: Posts
Feed self link matches feed URL.
Feed has 12 items.
First item published on 2025-02-21T23:04:55.000Z
Last item published on 2022-07-22T15:43:34.000Z
Home page URL: https://aphyr.com/
Home page has feed discovery link in <head>.
Error: Home page does not have a link to the feed in the <body>.
<?xml version="1.0" encoding="UTF-8"?> <feed xmlns="http://www.w3.org/2005/Atom"> <id>https://aphyr.com/</id> <title>Aphyr: Posts</title> <updated>2025-02-21T18:08:31-05:00</updated> <link href="https://aphyr.com/"></link> <link rel="self" href="https://aphyr.com/posts.atom"></link> <entry> <id>https://aphyr.com/posts/380-comments-on-executive-order-14168</id> <title>Comments on Executive Order 14168</title> <published>2025-02-21T18:04:55-05:00</published> <updated>2025-02-21T18:04:55-05:00</updated> <link rel="alternate" href="https://aphyr.com/posts/380-comments-on-executive-order-14168"></link> <author> <name>Aphyr</name> <uri>https://aphyr.com/</uri> </author> <content type="html"><p><em>Submitted to the Department of State, which is <a href="https://www.federalregister.gov/documents/2025/02/18/2025-02696/30-day-notice-of-proposed-information-collection-application-for-a-us-passport-for-eligible">requesting comments</a> on a proposed change which would align US passport gender markers with <a href="https://www.whitehouse.gov/presidential-actions/2025/01/defending-women-from-gender-ideology-extremism-and-restoring-biological-truth-to-the-federal-government/">executive order 14168</a>.</em></p> <p>Executive order 14168 is biologically incoherent and socially cruel. All passport applicants should be allowed to select whatever gender markers they feel best fit, including M, F, or X.</p> <p>In humans, neither sex nor gender is binary at any level. There are several possible arrangements of sex chromosomes: X, XX, XY, XXY, XYY, XXX, tetrasomies, pentasomies, etc. A single person can contain a mosaic of cells with different genetics: some XX, some XYY. Chromosomes may not align with genitalia: people with XY chromosomes may have a vulva and internal testes. People with XY chromosomes and a small penis may be surgically and socially reassigned female at birth—and never told what happened. 
None of these biological dimensions necessarily align with one’s internal concept of gender, or one’s social presentation.</p> <p>The executive order has no idea how biology works. It defines “female” as “a person belonging, at conception, to the sex that produces the large reproductive cell”. Zygotes do not produce reproductive cells at all: under this order none of us have a sex. Oogenesis doesn’t start until over a month into embryo development. Even if people were karyotyping their zygotes immediately after conception so they could tell what “legal” sex they were going to be, they could be wrong: which gametes we produce depends on the formation of the genital ridge.</p> <p>All this is to say that if people fill out these forms using this definition of sex, they’re guessing at a question which is both impossible to answer and socially irrelevant. You might be one of the roughly two percent of humans born with an uncommon sexual development and not even know it. Moreover, the proposed change fundamentally asks the wrong question: gender markers on passports are used by border control agents, and are expected to align with how those agents read the passport holder’s gender. A mismatch will create needless intimidation and hardship for travelers.</p> <p>Of course most of us will not have our identities challenged under this order. That animus is reserved for trans people, for gender-non-conforming people, for anyone whose genetics, body, dress, voice, or mannerisms don’t quite fit the mold. Those are the people who will suffer under this order. 
That cruelty should be resisted.</p></content> </entry> <entry> <id>https://aphyr.com/posts/379-geoblocking-the-uk-with-debian-nginx</id> <title>Geoblocking the UK with Debian &amp; Nginx</title> <published>2025-02-20T14:45:55-05:00</published> <updated>2025-02-20T14:45:55-05:00</updated> <link rel="alternate" href="https://aphyr.com/posts/379-geoblocking-the-uk-with-debian-nginx"></link> <author> <name>Aphyr</name> <uri>https://aphyr.com/</uri> </author> <content type="html"><p>A few quick notes for other folks who are <a href="https://geoblockthe.uk">geoblocking the UK</a>. I just set up a basic geoblock with Nginx on Debian. This is all stuff you can piece together, but the MaxMind and Nginx docs are a little vague about the details, so I figure it’s worth an actual writeup. My Nginx expertise is ~15 years out of date, so this might not be The Best Way to do things. YMMV.</p> <p>First, register for a free <a href="https://www.maxmind.com/en/geolite2/signup">MaxMind account</a>; you’ll need this to subscribe to their GeoIP database. Then install <code>geoipupdate</code>, a daemon which maintains a local copy of the lookup database, along with Nginx’s GeoIP2 module:</p> <pre><code>apt install geoipupdate libnginx-mod-http-geoip2 </code></pre> <p>Create a license key on the MaxMind site, and download a copy of the config file you’ll need. Drop that in <code>/etc/GeoIP.conf</code>. It’ll look like:</p> <pre><code>AccountID XXXX LicenseKey XXXX EditionIDs GeoLite2-Country </code></pre> <p>The package sets up a cron job automatically, but we should grab an initial copy of the file. This takes a couple of minutes, and writes out <code>/var/lib/GeoIP/GeoLite2-Country.mmdb</code>:</p> <pre><code>geoipupdate </code></pre> <p>The GeoIP2 module should already be loaded via <code>/etc/nginx/modules-enabled/50-mod-http-geoip2.conf</code>. Add a new config snippet like <code>/etc/nginx/conf.d/geoblock.conf</code>. 
The first part tells Nginx where to find the GeoIP database file, and then extracts the two-letter ISO country code for each request as a variable. The <code>map</code> part sets up an <code>$osa_geoblocked</code> variable, which is set to <code>1</code> for GB, otherwise <code>0</code>.</p> <pre><code>geoip2 /var/lib/GeoIP/GeoLite2-Country.mmdb { $geoip2_data_country_iso_code country iso_code; } map $geoip2_data_country_iso_code $osa_geoblocked { GB 1; default 0; } </code></pre> <p>Write an HTML file somewhere like <code>/var/www/custom_errors/osa.html</code>, explaining the block. Then serve that page for HTTP 451 status codes: in <code>/etc/nginx/sites-enabled/whatever</code>, add:</p> <pre><code>server { ... # UK OSA error page error_page 451 /osa.html; location /osa.html { internal; root /var/www/custom_errors/; } # When geoblocked, return 451 location / { if ($osa_geoblocked = 1) { return 451; } } } </code></pre> <p>Test your config with <code>nginx -t</code>, and then <code>service nginx reload</code>. You can test how things look from the UK using a VPN service, or something like <a href="https://www.locabrowser.com/">locabrowser</a>.</p> <p>This is, to be clear, a bad solution. MaxMind’s free database is not particularly precise, and in general IP lookup tables are chasing a moving target. I know for a fact that there are people in non-UK countries (like Ireland!) who have been inadvertently blocked by these lookup tables. 
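</p>

<p>If you want to check what the database actually returns for a particular address, Debian’s <code>mmdb-bin</code> package ships <code>mmdblookup</code>, which reads the mmdb file directly. A quick sketch (not from the original writeup; the IP below is the UK example address MaxMind uses in its documentation, so substitute whatever address you care about):</p>

```shell
# Install the command-line mmdb reader. Assumes the GeoLite2 database
# has already been fetched by geoipupdate, as above.
apt install mmdb-bin

# Look up the ISO country code Nginx will see for this address
mmdblookup --file /var/lib/GeoIP/GeoLite2-Country.mmdb \
           --ip 81.2.69.142 country iso_code
```

<p>If an address you expected to be outside the UK comes back as <code>GB</code>, the block is coming from the database itself, not from your Nginx config. 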
Making those people use Tor or a VPN <em>sucks</em>, but I don’t know what else to do in the current regulatory environment.</p></content> </entry> <entry> <id>https://aphyr.com/posts/378-seconds-since-the-epoch</id> <title>Seconds Since the Epoch</title> <published>2024-12-25T13:46:21-05:00</published> <updated>2024-12-25T13:46:21-05:00</updated> <link rel="alternate" href="https://aphyr.com/posts/378-seconds-since-the-epoch"></link> <author> <name>Aphyr</name> <uri>https://aphyr.com/</uri> </author> <content type="html"><p>This is not at all news, but it comes up often enough that I think there should be a concise explanation of the problem. People, myself included, like to say that POSIX time, also known as Unix time, is the <a href="https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Date">number</a> <a href="https://www.gnu.org/software/coreutils/manual/html_node/Seconds-since-the-Epoch.html">of</a> <a href="https://man7.org/linux/man-pages/man2/time.2.html">seconds</a> <a href="https://pkg.go.dev/time#Unix">since</a> <a href="https://dev.mysql.com/doc/refman/8.4/en/datetime.html">the</a> <a href="https://ruby-doc.org/core-3.0.0/Time.html">Unix</a> <a href="https://docs.datastax.com/en/cql-oss/3.x/cql/cql_reference/timestamp_type_r.html">epoch</a>, which was 1970-01-01 at 00:00:00.</p> <p>This is not true. Or rather, it isn’t true in the sense most people think. For example, it is presently 2024-12-25 at 18:51:26 UTC. The POSIX time is 1735152686. It has been 1735152713 seconds since the POSIX epoch. The POSIX time number is twenty-seven seconds lower.</p> <p>This is because POSIX time is derived <a href="https://nvlpubs.nist.gov/nistpubs/Legacy/FIPS/fipspub151-1.pdf">in IEEE 1003.1</a> from <a href="https://en.wikipedia.org/wiki/Coordinated_Universal_Time">Coordinated Universal Time</a>. The standard assumes that every day is exactly 86,400 seconds long. 
Specifically:</p> <blockquote> <p>The <em>time()</em> function returns the value of time in <b>seconds since the Epoch</b>.</p> </blockquote> <p>Which is defined as:</p> <blockquote> <p><b>seconds since the Epoch.</b> A value to be interpreted as the number of seconds between a specified time and the Epoch. A Coordinated Universal Time name (specified in terms of seconds (<em>tm_sec</em>), minutes (<em>tm_min</em>), hours (<em>tm_hour</em>), days since January 1 of the year (<em>tm_yday</em>), and calendar year minus 1900 (<em>tm_year</em>)) is related to a time represented as <em>seconds since the Epoch</em> according to the expression below.</p> <p>If year &lt; 1970 or the value is negative, the relationship is undefined. If year ≥ 1970 and the value is non-negative, the value is related to a Coordinated Universal Time name according to the expression:</p> <p><em>tm_sec</em> + <em>tm_min</em> * 60 + <em>tm_hour</em> * 3600 + <em>tm_yday</em> * 86400 + (<em>tm_year</em>-70) * 31536000 + ((<em>tm_year</em> - 69) / 4) * 86400</p> </blockquote> <p>The length of the day is not 86,400 seconds, and in fact changes over time. To keep UTC days from drifting too far from solar days, astronomers periodically declare a <a href="https://en.wikipedia.org/wiki/Leap_second">leap second</a> in UTC. Consequently, every few years POSIX time jumps backwards, <a href="https://marc.info/?l=linux-kernel&amp;m=134113577921904">wreaking</a> <a href="https://www.zdnet.com/article/qantas-suffers-delays-due-to-linux-leap-second-bug/">utter</a> <a href="https://blog.cloudflare.com/how-and-why-the-leap-second-affected-cloudflare-dns/">havoc</a>. Someday it might jump forward.</p> <h2><a href="#archaeology" id="archaeology">Archaeology</a></h2> <p>Appendix B of IEEE 1003 has a fascinating discussion of leap seconds:</p> <blockquote> <p>The concept of leap seconds is added for precision; at the time this standard was published, 14 leap seconds had been added since January 1, 1970. 
These 14 seconds are ignored to provide an easy and compatible method of computing time differences.</p> </blockquote> <p>I, too, love to ignore things to make my life easy. The standard authors knew “seconds since the epoch” were not, in fact, seconds since the epoch. And they admit as much:</p> <blockquote> <p>Most systems’ notion of “time” is that of a continuously-increasing value, so this value should increase even during leap seconds. However, not only do most systems not keep track of leap seconds, but most systems are probably not synchronized to any standard time reference. Therefore, it is inappropriate to require that a time represented as seconds since the Epoch precisely represent the number of seconds between the referenced time and the Epoch.</p> <p>It is sufficient to require that applications be allowed to treat this time as if it represented the number of seconds between the referenced time and the Epoch. It is the responsibility of the vendor of the system, and the administrator of the system, to ensure that this value represents the number of seconds between the referenced time and the Epoch as closely as necessary for the application being run on that system….</p> </blockquote> <p>I imagine there was some debate over this point. The appendix punts, saying that vendors and administrators must make time align “as closely as necessary”, and that “this value should increase even during leap seconds”. The latter is achievable, but the former is arguably impossible: the standard requires POSIX clocks be twenty-seven seconds off.</p> <blockquote> <p>Consistent interpretation of seconds since the Epoch can be critical to certain types of distributed applications that rely on such timestamps to synchronize events. The accrual of leap seconds in a time standard is not predictable. The number of leap seconds since the Epoch will likely increase. 
The standard is more concerned about the synchronization of time between applications of astronomically short duration and the Working Group expects these concerns to become more critical in the future.</p> </blockquote> <p>In a sense, the opposite happened. Time synchronization is <em>always</em> off, so systems generally function (however incorrectly) when times drift a bit. But leap seconds are rare, and the linearity evoked by the phrase “seconds since the epoch” is so deeply baked in to our intuition, that software can accrue serious, unnoticed bugs. Until a few years later, one of those tiny little leap seconds takes down a big chunk of the internet.</p> <h2><a href="#what-to-do-instead" id="what-to-do-instead">What To Do Instead</a></h2> <p>If you just need to compute the duration between two events on one computer, use <a href="https://docs.redhat.com/en/documentation/red_hat_enterprise_linux_for_real_time/7/html/reference_guide/sect-posix_clocks#sect-POSIX_clocks"><code>CLOCK_MONOTONIC</code></a>, or better yet, <code>CLOCK_BOOTTIME</code>. If you don’t need to exchange timestamps with other systems that assume POSIX time, use <a href="https://www.ipses.com/eng/in-depth-analysis/standard-of-time-definition/">TAI, GPS, or maybe LORAN</a>. If you do need rough alignment with other POSIX-timestamp systems, <a href="https://developers.google.com/time/smear">smear leap seconds</a> over a longer window of time. Libraries like <a href="https://github.com/qntm/t-a-i">qntm’s t-a-i</a> can convert back and forth between POSIX and TAI.</p> <p>There’s an ongoing effort to <a href="https://www.timeanddate.com/news/astronomy/end-of-leap-seconds-2022">end leap seconds</a>, hopefully <a href="https://www.bipm.org/documents/20126/64811223/Resolutions-2022.pdf/281f3160-fc56-3e63-dbf7-77b76500990f">by 2035</a>. 
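</p>

<p>In the meantime, the IEEE expression quoted above is easy to check numerically. Here’s a minimal Python sketch (mine, not the standard’s); note that POSIX’s <em>tm_yday</em> counts days <em>since</em> January 1, so it’s one less than the 1-based day-of-year in Python’s <code>struct_time</code>:</p>

```python
import calendar
import time

def seconds_since_epoch(tm):
    """The IEEE 1003.1 'seconds since the Epoch' expression.

    Ignores leap seconds entirely, exactly as the standard specifies.
    Valid for years 1970-2099; later POSIX revisions add century terms.
    """
    return (tm.tm_sec
            + tm.tm_min * 60
            + tm.tm_hour * 3600
            + (tm.tm_yday - 1) * 86400         # POSIX tm_yday is 0-based
            + (tm.tm_year - 1970) * 31536000   # the standard's tm_year - 70
            + ((tm.tm_year - 1969) // 4) * 86400)  # one extra day per leap year

tm = time.strptime("2024-12-25 18:51:26", "%Y-%m-%d %H:%M:%S")
print(seconds_since_epoch(tm))  # the POSIX time from earlier: 1735152686
print(calendar.timegm(tm))      # the stdlib agrees: it uses the same fiction
```

<p>Both print 1735152686, twenty-seven fewer than the true count of elapsed seconds. 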
It’ll require additional work to build conversion tables into everything that relies on the “86,400 seconds per day” assumption, but it should also make it much simpler to ask questions like “how many seconds between these two times”. At least for times after 2035!</p></content> </entry> <entry> <id>https://aphyr.com/posts/371-threads-wont-take-you-south-of-market</id> <title>Threads Won't Take You South of Market</title> <published>2024-12-01T10:01:36-05:00</published> <updated>2024-12-01T10:01:36-05:00</updated> <link rel="alternate" href="https://aphyr.com/posts/371-threads-wont-take-you-south-of-market"></link> <author> <name>Aphyr</name> <uri>https://aphyr.com/</uri> </author> <content type="html"><p>In June 2023, when <a href="https://threads.net">Threads</a> announced their <a href="https://techcrunch.com/2023/07/05/adam-mosseri-says-metas-threads-app-wont-have-activitypub-support-at-launch/">plans to federate</a> with other <a href="https://en.wikipedia.org/wiki/Fediverse">Fediverse instances</a>, there was a good deal of <a href="https://fedipact.online/">debate</a> around whether smaller instances should allow federation or block it pre-emptively. As one of the admins of <a href="https://woof.group">woof.group</a>, I wrote about some of the <a href="https://blog.woof.group/announcements/considering-large-instance-federation">potential risks and rewards</a> of federating with Threads. We decided to <a href="https://blog.woof.group/announcements/deferring-threads-federation">wait and see</a>.</p> <p>In my queer and leather circles, Facebook and Instagram have been generally understood as hostile environments for over a decade. In 2014, their <a href="https://www.eff.org/deeplinks/2014/09/facebooks-real-name-policy-can-cause-real-world-harm-lgbtq-community">“Real Name” policy</a> made life particularly difficult for trans people, drag queens, sex workers, and people who, for various reasons, needed to keep their real name disconnected from their queer life. 
My friends have been repeatedly suspended from both platforms for showing too much skin, or using the peach emoji. Meta’s moderation has been aggressive, opaque, and wildly inconsistent: sometimes full nudity is fine; other times a kiss or swimsuit is beyond the line. In some circles, maintaining a series of backup accounts in advance of one’s ban became de rigueur.</p> <p>I’d hoped that federation between Threads and the broader Fediverse might allow a <a href="https://blog.woof.group/mods/the-shape-of-social-space">more nuanced spectrum</a> of moderation norms. Threads might opt for a more conservative environment locally, but through federation, allow their users to interact with friends on instances with more liberal norms. Conversely, most of my real-life friends are still on Meta services—I’d love to see their posts and chat with them again. Threads could communicate with Gay Fedi (using the term in the broadest sense), and de-rank or hide content they don’t like on a per-post or per-account basis.</p> <p>This world seems technically feasible. Meta reports <a href="https://techcrunch.com/2024/11/03/threads-now-has-275m-monthly-active-users/">275 million Monthly Active Users (MAUs)</a>, and over <a href="https://www.statista.com/statistics/1092227/facebook-product-dau/">three billion</a> across other Meta services. The Fediverse has something like <a href="https://fedidb.org/">one million MAUs across various instances</a>. This is not a large jump in processing or storage; nor would it seem to require a large increase in moderation staff. Threads has already committed to doing the requisite engineering, user experience, and legal work to allow federation across a broad range of instances. Meta is swimming in cash.</p> <p>All this seems to be a moot point. A year and a half later, Threads <a href="https://www.theverge.com/24107998/threads-fediverse-mastodon-how-to">is barely half federated</a>. 
It publishes Threads posts to the world, but only if you dig into the settings and check the “Fediverse Sharing” box. Threads users can see replies to their posts, but can’t talk back. Threads users can’t mention others, see mentions from other people, or follow anyone outside Threads. This may work for syndication, but is essentially unusable for conversation.</p> <p>Despite the fact that Threads users can’t follow or see mentions from people on other instances, Threads has already <a href="https://www.threads.net/moderated_servers">opted to block</a> a slew of instances where gay &amp; leather people congregate. Threads blocks <a href="https://hypno.social">hypno.social</a>, <a href="https://rubber.social">rubber.social</a>, <a href="https://4bear.com">4bear.com</a>, <a href="https://nsfw.lgbt">nsfw.lgbt</a>, <a href="https://kinkyelephant.com">kinkyelephant.com</a>, <a href="https://kinktroet.social">kinktroet.social</a>, <a href="https://barkclub.xyz">barkclub.xyz</a>, <a href="https://mastobate.social">mastobate.social</a>, and <a href="https://kinky.business">kinky.business</a>. They also block the (now-defunct) instances <a href="https://bear.community">bear.community</a>, <a href="https://gaybdsm.group">gaybdsm.group</a>, and <a href="https://gearheads.social">gearheads.social</a>. They block more general queer-friendly instances like <a href="https://bark.lgbt">bark.lgbt</a>, <a href="https://super-gay.co">super-gay.co</a>, <a href="https://gay.camera">gay.camera</a>, and <a href="https://gaygeek.social">gaygeek.social</a>. They block sex-positive instances like <a href="https://nsfwphotography.social">nsfwphotography.social</a>, <a href="https://nsfw.social">nsfw.social</a>, and <a href="https://net4sw.com">net4sw.com</a>. All these instances are blocked for having “violated our Community Standards or Terms of Use”. 
Others like <a href="https://fisting.social">fisting.social</a>, <a href="https://mastodon.hypnoguys.com">mastodon.hypnoguys.com</a>, <a href="https://abdl.link">abdl.link</a>, <a href="https://qaf.men">qaf.men</a>, and <a href="https://social.rubber.family">social.rubber.family</a> are blocked for having “no publicly accessible feed”. I don’t know what this means: mastodon.hypnoguys.com, for instance, has the usual Mastodon <a href="https://mastodon.hypnoguys.com/public/local">publicly accessible local feed</a>.</p> <p>It’s not like these instances are hotbeds of spam, hate speech, or harassment: woof.group federates heavily with most of the servers I mentioned above, and we rarely have problems with their moderation. Most have reasonable and enforced media policies requiring sensitive-media flags for genitals, heavy play, and so on. Those policies are, generally speaking, looser than Threads’ (woof.group, for instance, allows butts!) but there are plenty of accounts and posts on these instances which would be anodyne under Threads’ rules.</p> <p>I am shocked that woof.group is <em>not</em> on Threads’ blocklist yet. We have similar users who post similar things. Our content policies are broadly similar—several of the instances Threads blocks actually adopted woof.group’s specific policy language. I doubt it’s our size: Threads blocks several instances with fewer than ten MAUs, and woof.group has over seven hundred.</p> <p>I’ve been out of the valley for nearly a decade, and I don’t have insight into Meta’s policies or decision-making. I’m sure Threads has their reasons. Whatever they are, Threads, like all of Meta’s services, feels distinctly uncomfortable with sex, and sexual expression is a vibrant aspect of gay culture.</p> <p>This is part of why I started woof.group: we deserve spaces moderated with our subculture in mind. 
But I also hoped that by designing a moderation policy which compromised with normative sensibilities, we might retain connections to a broader set of publics. This particular leather bar need not be an invite-only clubhouse; it can be a part of a walkable neighborhood. For nearly five years we’ve kept that balance, retaining open federation with almost all of the Fediverse. I get the sense that Threads intends to wall its users off from our world altogether—to make “bad gays” invisible. If Threads were a taxi service, it wouldn’t take you <a href="https://sfleatherdistrict.org/wp-content/uploads/2021/04/Rubin-Valley-of-Kings.pdf">South of Market</a>.</p></content> </entry> <entry> <id>https://aphyr.com/posts/370-ecobee-settings-for-heat-pumps-with-resistive-aux-heat</id> <title>Ecobee Settings for Heat Pumps with Resistive Aux Heat</title> <published>2024-02-28T23:41:38-05:00</published> <updated>2024-02-28T23:41:38-05:00</updated> <link rel="alternate" href="https://aphyr.com/posts/370-ecobee-settings-for-heat-pumps-with-resistive-aux-heat"></link> <author> <name>Aphyr</name> <uri>https://aphyr.com/</uri> </author> <content type="html"><p>I’m in the process of replacing an old radiator system with a centrally-ducted, air-source heat pump system with electric resistive backup heat. I’ve found that the default ecobee algorithm seems to behave surprisingly poorly for this system, and wanted to write up some of the settings that I’ve found yield better behavior.</p> <p>A disclaimer. I’m not an HVAC professional. I have two decades in software operations, a background in physics, and far too much experience inferring system dynamics from timeseries graphs. This advice may void your warranty, burn your house down, etc.; everything you do is at your own risk.</p> <h2><a href="#the-system" id="the-system">The System</a></h2> <p>First, a bit about the system in question. 
You can skip this section if you know about heat pumps, short cycling, staging, etc.</p> <p>There are two main subsystems: a heat pump and an air handler. The heat pump sits outside: it has a fan which moves outside air over a heat exchanger, and a compressor, which compresses a working fluid. The working fluid is connected in a loop to the air handler, where it runs through another heat exchanger to heat or cool the inside air. The air handler also has a blower fan which circulates air through the whole house. If the heat pump can’t keep up with demand, the air handler also has a pair of resistive electric heating coils, called <em>aux heat</em>, which can supplement or take over from the heat pumps.</p> <p>A few important things to know about heat pumps. First, electric resistive heaters have a <em>Coefficient of Performance</em> (CoP) of essentially 1: they take 1 joule of electricity and turn it into 1 joule of heat in the air. My heat pumps have a typical heating CoP of about 2-4, depending on temperature and load. They take 1 joule of electricity and suck 2 to 4 joules of heat from the outside air into the inside. This means they cost 2-4 times less (in electric opex, at least) than a standard resistive electric heating system.</p> <p>Second, heat pumps, like A/C systems, shouldn’t start and stop too frequently. Starting up causes large transient electrical and mechanical stresses. Ideally they should run at a low speed for several hours, rather than running at full blast, shutting off, then turning on again ten minutes later. This is called “short cycling”.</p> <p>Third, the heat pump’s fan, heat pump’s compressor, and the air handler’s fan are all variable-speed: they can run very slow (quiet, efficient), very fast (loud, more powerful), or at any speed in between. This helps reduce short-cycling, as well as improving efficiency and reducing noise. 
However, directly setting compressor and fan speed requires a special “communicating” thermostat made by the same manufacturer, which speaks a proprietary wire protocol. My manufacturer’s communicating thermostats are very expensive and have a reputation for buggy hardware and software, so I opted to get an <a href="https://www.ecobee.com/en-us/smart-thermostats/smart-wifi-thermostat/">ecobee 3 lite</a>. Like essentially every other thermostat on the planet, the ecobee uses ~8 wires with simple binary signals, like “please give me heat” and “please turn on the fan”. It can’t ask for a specific <em>amount</em> of heat.</p> <p>However, all is not lost. The standard thermostat protocol has a notion of a “two-stage” system—if the Y1 wire is hot, it’s asking for “some heat”, and if Y2 is also hot, it’s asking for “more heat”. My variable-speed heat pump emulates a two-stage system using a hysteresis mechanism. In stage 1, the heat pump offers some nominal low degree of heat. When the thermostat calls for stage 2, it kicks up the air handler blower a notch, and after 20 minutes, it slowly ramps up the heat pump compressor as well. I assume there’s a ramp-down for going back to stage 1. They say this provides “true variable-capacity operation”. You can imagine that the most efficient steady state is where the thermostat toggles rapidly between Y1 and Y2, causing the system to hang out at exactly the right variable speeds for current conditions—but I assume ecobee has some kind of frequency limiter to avoid damaging systems that actually have two separate stages with distinct startup/shutdown costs.</p> <p>The air handler’s aux heat is also staged: if the W1 wire is hot, I think (based on staring at the wiring diagram and air handler itself) it just energizes one of two coils. If W2 is also hot, it energizes both. 
I think this is good: we want to use as much of the heat pump heat as possible, and if we can get away with juuuust a little aux heat, instead of going full blast, that’ll save energy.</p> <p>In short: aux heat is 2-4x more expensive than heat pump heat; we want to use as little aux as possible. Short-cycling is bad: we want long cycle times. For maximum efficiency, we want both the heat pump and aux heat to be able to toggle between stage 1 and 2 depending on demand.</p> <h2><a href="#automatic-problems" id="automatic-problems">Automatic Problems</a></h2> <p>I initially left the ecobee at its automatic default settings for a few weeks; it’s supposed to learn the house dynamics and adapt. I noticed several problems. Presumably this behavior depends on weather, building thermal properties, HVAC dynamics, and however ecobee’s tuned their algorithm last week, so YMMV: check your system and see how it looks.</p> <p>It’s kind of buried, but ecobee offers a really nice time-series visualization of thermostat behavior on their web site. There’s also a Home Assistant integration that pulls in data from their API. It’s a pain in the ass to set up (ecobee, there’s no need for this to be so user-hostile), but it does work.</p> <p>Over the next few weeks I stared obsessively at time-series plots from both ecobee and Home Assistant, and mucked around with ecobee’s settings. Most of what I’ll describe below is configurable in the settings menu on the thermostat: look for “settings”, “installation settings”, “thresholds”.</p> <h2><a href="#reducing-aux-heat" id="reducing-aux-heat">Reducing Aux Heat</a></h2> <p>First, the automatics kicked on aux heat a <em>lot</em>. Even in situations where the heat pump would have been perfectly capable of getting up to temp, ecobee would burn aux heat to reach the target temperature (<em>set point</em>) faster.</p> <p>Part of the problem was that ecobee ships (I assume for safety reasons) with ludicrously high cut-off thresholds for heat pumps. 
Mine had “compressor min outdoor temperature” of something like 35 degrees, so the heat pump wouldn’t run for most of the winter. The actual minimum temperature of my model is -4, cold-climate heat pumps run down to -20. I lowered mine to -5; the manual says there’s a physical thermostat interlock on the heat pump itself, and I trust that more than the ecobee weather feed anyway.</p> <p>Second: ecobee seems to prioritize speed over progress: if it’s not getting to the set point fast enough, it’ll burn aux heat to get there sooner. I don’t want this: I’m perfectly happy putting on a jacket. After a bit I worked out that the heat pumps alone can cover the house load down to ~20 degrees or so, and raised “aux heat max outdoor temperature” to 25. If it’s any warmer than that, the system won’t use aux heat.</p> <h2><a href="#reverse-staging" id="reverse-staging">Reverse Staging</a></h2> <p>A second weird behavior: once the ecobee called for stage 2, either from the heat pump or aux, it would run in stage 2 until it hit the set point, then shut off the system entirely. Running aux stage 2 costs more energy. Running the heat pump in stage 2 shortens the cycle time: remember, the goal is a low, long running time.</p> <p><img class="attachment pure-img" src="/data/posts/370/no-reverse-staging.png" alt="A time-series plot showing that once stage 2 engages, it runs until shutting off, causing frequent cycling" title="A time-series plot showing that once stage 2 engages, it runs until shutting off, causing frequent cycling"></p> <p>The setting I used to fix this is called “reverse staging”. Ecobee’s <a href="https://support.ecobee.com/s/articles/Threshold-settings-for-ecobee-thermostats">documentation</a> says:</p> <blockquote> <p>Compressor Reverse Staging: Enables the second stage of the compressor near the temperature setpoint.</p> </blockquote> <p>As far as I can tell this documentation is completely wrong. 
From watching the graphs, this setting seems to allow the staging state machine to move from stage 2 back to stage 1, rather than forcing it to run in stage 2 until shutting off entirely. It’ll go back up to stage 2 if it needs to, and back down again.</p> <p><img class="attachment pure-img" src="/data/posts/370/reverse-staging.png" alt="With reverse staging, it'll jump up to stage 2, then drop back down to stage 1." title="With reverse staging, it'll jump up to stage 2, then drop back down to stage 1."></p> <h2><a href="#manual-staging" id="manual-staging">Manual Staging</a></h2> <p>I couldn’t seem to get ecobee’s automatic staging to drop back to stage 1 heat reliably, or avoid kicking on aux heat when stage 2 heat pump heat would have done fine. I eventually gave up and turned off automatic staging altogether. I went with the delta temperature settings. If the temperature delta between the set point and indoor air is more than 1 degree, it turns on heat pump stage 1. More than two degrees, stage 2. More than four degrees, aux 1. More than five degrees, aux 2. The goal here is to use only as much aux heat as absolutely necessary to supplement the heat pump. I also have aux heat configured to run concurrently with the heat pump: there’s a regime where the heat pump provides useful heat, but not quite enough, and my intuition is that <em>some</em> heat pump heat is cheaper than all aux.</p> <p>I initially tried the default 0.5 degree delta before engaging the heat pump’s first stage. It turns out that for some temperature regimes this creates rapid cycling: that first-phase heat is enough to heat the house rapidly to the set point, and then there’s nothing to do but shut the system off. The house cools, and the system kicks on again, several times per hour. 
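For concreteness, the delta-based staging policy described above can be sketched in a few lines of Ruby. This is purely illustrative, not ecobee’s actual firmware logic; the class and stage names are my own invention:

```ruby
# A rough sketch of manual delta-T staging: set point minus indoor
# temperature (degrees F) determines which stages run. Stages are
# cumulative: aux heat runs concurrently with the heat pump.
# Hypothetical names; the real logic lives in ecobee's firmware.
class Stager
  def stages(delta)
    s = []
    s << :heat_pump_1 if delta > 1
    s << :heat_pump_2 if delta > 2
    s << :aux_1       if delta > 4
    s << :aux_2       if delta > 5
    s
  end
end

stager = Stager.new
stager.stages(0.5) # => [] (coast, rather than short-cycling)
stager.stages(1.5) # => [:heat_pump_1]
stager.stages(4.5) # => [:heat_pump_1, :heat_pump_2, :aux_1]
```

Note that with the 0.5-degree default threshold, even a tiny delta would engage stage 1, which is what drove the rapid cycling described above.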
I raised the delta to 1 degree, which significantly extended the cycle time.</p> <h2><a href="#large-setback-with-preheating" id="large-setback-with-preheating">Large Setback with Preheating</a></h2> <p>A <em>setback</em> is when you lower your thermostat, e.g. while away from home or sleeping. There’s some folk wisdom that heat pumps should run at a constant temperature all the time, rather than have a large setback. As far as I can tell, this is because a properly-sized heat pump system (unlike a gas furnace) doesn’t deliver a ton of excess heat, so it can’t catch up quickly when asked to return to a higher temperature. To compensate, the system might dip into aux heat, and that’s super expensive.</p> <p>I’m in the US Midwest, where winter temperatures are usually around 15-40 F. I drop from 68 to 60 overnight, and the house can generally coast all night without having to run any HVAC at all. In theory the ecobee should be able to figure out the time required to come back to 68 and start the heat pump early in the morning, but in practice I found it would wait too long, and then the large difference between actual and set temp would trigger aux heat. To avoid this, I added a custom activity in Ecobee’s web interface (I call mine “preheat”), with a temperature of 64. I have my schedule set up with an hour of preheat in the morning, before going to the normal 68. This means there’s less of a delta-T, and the system can heat up entirely using the heat pump.</p> <p><img class="attachment pure-img" src="/data/posts/370/setback.png" alt="A time series graph showing temperature falling smoothly overnight as the HVAC is disabled, and then rising during the preheat phase in the morning." 
title="A time series graph showing temperature falling smoothly overnight as the HVAC is disabled, and then rising during the preheat phase in the morning."></p></content> </entry> <entry> <id>https://aphyr.com/posts/369-classnotfoundexception-java-util-sequencedcollection</id> <title>ClassNotFoundException: java.util.SequencedCollection</title> <published>2024-02-20T18:03:42-05:00</published> <updated>2024-02-20T18:03:42-05:00</updated> <link rel="alternate" href="https://aphyr.com/posts/369-classnotfoundexception-java-util-sequencedcollection"></link> <author> <name>Aphyr</name> <uri>https://aphyr.com/</uri> </author> <content type="html"><p>Recently I’ve had users of my libraries start reporting mysterious errors due to a missing reference to <code>SequencedCollection</code>, a Java interface added in JDK 21:</p> <pre><code>Execution error (ClassNotFoundException) at jdk.internal.loader.BuiltinClassLoader/loadClass (BuiltinClassLoader.java:641). java.util.SequencedCollection </code></pre> <p>Specifically, projects using <a href="https://github.com/jepsen-io/jepsen/issues/585">Jepsen 0.3.5</a> started throwing this error due to Clojure’s built-in <code>rrb_vector.clj</code>, which is particularly vexing given that the class doesn’t reference <code>SequencedCollection</code> at all.</p> <p>It turns out that the Clojure compiler, when run on JDK 21 or later, will automatically insert references to this class when compiling certain expressions–likely because it now appears in the supertypes of other classes. Jepsen had <code>:javac-options [&quot;-source&quot; &quot;11&quot; &quot;-target&quot; &quot;11&quot;]</code> in Jepsen’s <code>project.clj</code> already, but it still emitted references to <code>SequencedCollection</code> because the reference is inserted by the Clojure compiler, not <code>javac</code>. 
Similarly, adding <code>[&quot;--release&quot; &quot;11&quot;]</code> didn’t work.</p> <p>Long story short: as far as I can tell the only workaround is to downgrade to Java 17 (or anything prior to 21) when building Jepsen as a library. That’s not super hard with <code>update-alternatives</code>, but I still imagine I’ll be messing this up until Clojure’s compiler can get a patch.</p></content> </entry> <entry> <id>https://aphyr.com/posts/368-how-to-replace-your-cpap-in-only-666-days</id> <title>How to Replace Your CPAP In Only 666 Days</title> <published>2024-02-03T19:38:53-05:00</published> <updated>2024-02-03T19:38:53-05:00</updated> <link rel="alternate" href="https://aphyr.com/posts/368-how-to-replace-your-cpap-in-only-666-days"></link> <author> <name>Aphyr</name> <uri>https://aphyr.com/</uri> </author> <content type="html"><p><em>This story is not practical advice. For me, it’s closing the book on an almost two-year saga. For you, I hope it’s an enjoyable bit of bureaucratic schadenfreude. For Anthem, I hope it’s the subject of a series of painful but transformative meetings. This is not an isolated event. I’ve had dozens of struggles with Anthem customer support, and they all go like this.</em></p> <p><em>If you’re looking for practical advice: it’s this. Be polite. Document everything. Keep a log. Follow the claims process. Check the laws regarding insurance claims in your state. If you pass the legally-mandated deadline for your claim, call customer service. Do not allow them to waste a year of your life, or force you to resubmit your claim from scratch. Initiate a complaint with your state regulators, and escalate directly to <a href="mailto:[email protected]">Gail Boudreaux’s team</a>–or whoever Anthem’s current CEO is.</em></p> <p>To start, experience an equipment failure.</p> <p>Use your CPAP daily for six years. Wake up on day zero with it making a terrible sound. Discover that the pump assembly is failing. 
Inquire with Anthem Ohio, your health insurer, about how to have it repaired. Allow them to refer you to a list of local durable medical equipment providers. Start calling down the list. Discover half the list are companies like hair salons. Eventually reach a company in your metro which services CPAPs. Discover they will not repair broken equipment unless a doctor tells them to.</p> <p>Leave a message with your primary care physician. Call the original sleep center that provided your CPAP. Discover they can’t help, since you’re no longer in the same state. Return to your primary, who can’t help either, because he had nothing to do with your prescription. Put the sleep center and your primary in touch, and ask them to talk.</p> <p>On day six, call your primary to check in. He’s received a copy of your sleep records, and has forwarded them to a local sleep center you haven’t heard of. They, in turn, will talk to Anthem for you.</p> <p>On day 34, receive an approval letter labeled “confirmation of medical necessity” from Anthem, directed towards the durable medical equipment company. Call that company and confirm you’re waitlisted for a new CPAP. They are not repairable. Begin using your partner’s old CPAP, which is not the right class of device, but at least it helps.</p> <p>Over the next 233 days, call that medical equipment company regularly. Every time, inquire whether there’s been any progress, and hear “we’re still out of stock”. Ask them you what the manufacturer backlog might be, how many people are ahead of you in line, how many CPAPs they <em>do</em> receive per month, or whether anyone has ever received an actual device from them. They won’t answer any questions. Realize they are never going to help you.</p> <p>On day 267, realize there is no manufacturer delay. The exact machine you need is in stock on CPAP.com. Check to make sure there’s a claims process for getting reimbursed by Anthem. Pay over three thousand dollars for it. 
When it arrives, enjoy being able to breathe again.</p> <p>On day 282, follow CPAP.com’s documentation to file a claim with Anthem online. Include your prescription, receipt, shipping information, and the confirmation of medical necessity Anthem sent you.</p> <p>On day 309, open the mail to discover a mysterious letter from Anthem. They’ve received your appeal. You do not recall appealing anything. There is no information about what might have been appealed, but something will happen within 30-60 days. There is nothing about your claim.</p> <p>On day 418, emerge from a haze of lead, asbestos, leaks, and a host of other home-related nightmares; remember Anthem still hasn’t said anything about your claim. Discover your claim no longer appears on Anthem’s web site. Call Anthem customer service. They have no record of your claim either. Ask about the appeal letter you received. Listen, gobsmacked, as they explain that they decided your claim was in fact an appeal, and transferred it immediately to the appeals department. The appeals department examined the appeal and looked for the claim it was appealing. Finding none, they decided the appeal was moot, and rejected it. At no point did anyone inform you of this. Explain to Anthem’s agent that you filed a claim online, not an appeal. At their instruction, resign yourself to filing the entire claim again, this time using a form via physical mail. Include a detailed letter explaining the above.</p> <p>On day 499, retreat from the battle against home entropy to call Anthem again. Experience a sense of growing dread as the customer service agent is completely unable to locate either of your claims. After a prolonged conversation, she finds it using a different tool. There is no record of the claim from day 418. There was a claim submitted on day 282. Because the claim does not appear in her system, there is no claim. 
Experience the cognitive equivalent of the Poltergeist hallway shot as the agent tells you “Our members are not eligible for charges for claim submission”.</p> <p>Hear the sentence “There is a claim”. Hear the sentence “There is no claim”. Write these down in the detailed log you’ve been keeping of this unfurling Kafkaesque debacle. Ask again if there is anyone else who can help. There is no manager you can speak to. There is no tier II support. “I’m the only one you can talk to,” she says. Write that down.</p> <p>Call CPAP.com, which has a help line staffed by caring humans. Explain that contrary to their documentation, Anthem now says members cannot file claims for equipment directly. Ask if they are the provider. Discover the provider for the claim is probably your primary care physician, who has no idea this is happening. Leave a message with him anyway. Leave a plaintive message with your original sleep center for good measure.</p> <p>On day 502, call your sleep center again. They don’t submit claims to insurance, but they confirm that some people <em>do</em> successfully submit claims to Anthem using the process you’ve been trying. They confirm that Anthem is, in fact, hot garbage. Call your primary, send them everything you have, and ask if they can file a claim for you.</p> <p>On day 541, receive a letter from Anthem, responding to your inquiry. You weren’t aware you filed one.</p> <blockquote> <p>Please be informed that we have received your concern. Upon review we have noticed that there is no claim billed for the date of service mentioned in the submitted documents, Please provide us with a valid claim. If not submitted,provide us with a valid claim iamge to process your claim further.</p> </blockquote> <p>Stare at the letter, typos and all. Contemplate your insignificance in the face of the vast and uncaring universe that is Anthem.</p> <p>On day 559, steel your resolve and call Anthem again. 
Wait as this representative, too, digs for evidence of a claim. Listen with delight as she finds your documents from day 282. Confirm that yes, a claim definitely exists. Have her repeat that so you can write it down. Confirm that the previous agent was lying: members can submit claims. At her instruction, fill out the claim form a third time. Write a detailed letter, this time with a Document Control Number (DCN). Submit the entire package via registered mail. Wait for USPS to confirm delivery eight days later.</p> <p>On day 588, having received no response, call Anthem again. Explain yourself. You’re getting good at this. Let the agent find a reference number for an appeal, but not the claim. Incant the magic DCN, which unlocks your original claim. “I was able to confirm that this was a claim submitted form for a member,” he says. He sees your claim form, your receipts, your confirmation of medical necessity. However: “We still don’t have the claim”.</p> <p>Wait for him to try system after system. Eventually he confirms what you heard on day 418: the claims department transferred your claims to appeals. “Actually this is not an appeal, but it was denied as an appeal.” Agree as he decides to submit your claim manually again, with the help of his supervisor. Write down the call ref number: he promises you’ll receive an email confirmation, and an Explanation of Benefits in 30-40 business days.</p> <p>“I can assure you this is the last time you are going to call us regarding this.”</p> <p>While waiting for this process, recall insurance is a regulated industry. Check the Ohio Revised Code. Realize that section 3901.381 establishes deadlines for health insurers to respond to claims. They should have paid or denied each of your claims within 30 days–45 if supporting documentation was required. Leave a message with the Ohio Department of Insurance’s Market Conduct Division. 
File an insurance complaint with ODI as well.</p> <p>Grimly wait as no confirmation email arrives.</p> <p>On day 602, open an email from Anthem. They are “able to put the claim in the system and currenty on processed [sic] to be applied”. They’re asking for more time. Realize that Anthem is well past the 30-day deadline under the Ohio Revised Code for all three iterations of your claim.</p> <p>On day 607, call Anthem again. The representative explains that the claim will be received and processed as of your benefits. She asks you to allow 30-45 days from today. Quote section 3901.381 to her. She promises to expedite the request; it should be addressed within 72 business hours. Like previous agents, she promises to call you back. Nod, knowing she won’t.</p> <p>On day 610, email the Ohio Department of Insurance to explain that Anthem has found entirely new ways to avoid paying their claims on time. It’s been 72 hours without a callback; call Anthem again. She says “You submitted a claim and it was received” on day 282. She says the claim was expedited. Ask about the status of that expedited resolution. “Because on your plan we still haven’t received any claims,” she explains. Wonder if you’re having a stroke.</p> <p>Explain that it has been 328 days since you submitted your claim, and ask what is going on. She says that since the first page of your mailed claim was a letter, that might have caused it to be processed as an appeal. Remind yourself Anthem told you to enclose that letter. Wait as she attempts to refer you to the subrogation department, until eventually she gives up: the subrogation department doesn’t want to help.</p> <p>Call the subrogation department yourself. Allow Anthem’s representative to induce in you a period of brief aphasia. She wants to call a billing provider. Try to explain there is none: you purchased the machine yourself. She wants to refer you to collections. Wonder why on earth Anthem would want money from <em>you</em>. 
Write down “I literally can’t understand what she thinks is going on” in your log. Someone named Adrian will call you by tomorrow.</p> <p>Contemplate alternative maneuvers. Go on a deep Google dive, searching for increasingly obscure phrases gleaned from Anthem’s bureaucracy. Trawl through internal training PDFs for Anthem’s ethics and compliance procedures. Call their compliance hotline: maybe someone cares about the law. It’s a third-party call center for Elevance Health. Fail to realize this is another name for Anthem. Begin drawing a map of Anthem’s corporate structure.</p> <p>From a combination of publicly-available internal slide decks, LinkedIn, and obscure HR databases, discover the name, email, and phone number of Anthem’s Chief Compliance Officer. Call her, but get derailed by an internal directory that requires a 10-digit extension. Try the usual tricks with automated phone systems. No dice.</p> <p>Receive a call from an Anthem agent. Ask her what happened to “72 hours”. She says there’s been no response from the adjustments team. She doesn’t know when a response will come. There’s no one available to talk to. Agree to speak to another representative tomorrow. It doesn’t matter: they’ll never call you.</p> <p>Do more digging. Guess the CEO’s email from what you can glean of Anthem’s account naming scheme. Write her an email with a short executive summary and a detailed account of the endlessly-unfolding Boschian hellscape in which her company has entrapped you. A few hours later, receive an acknowledgement from an executive concierge at Elevance (Anthem). It’s polite, formal, and syntactically coherent. She promises to look into things. Smile. Maybe this will work.</p> <p>On day 617, receive a call from the executive concierge. 355 days after submission, she’s identified a problem with your claim. CPAP.com provided you with an invoice with a single line item (the CPAP) and two associated billing codes (a CPAP and humidifier). 
Explain that they are integrated components of a single machine. She understands, but insists you need a receipt with multiple line items for them anyway. Anthem has called CPAP.com, but they can’t discuss an invoice unless you call them. Explain you’ll call them right now.</p> <p>Call CPAP.com. Their customer support continues to be excellent. Confirm that it is literally impossible to separate the CPAP and humidifier, or to produce an invoice with two line items for a single item. Nod as they ask what the hell Anthem is doing. Recall that this is the exact same machine Anthem covered for you eight years ago. Start a joint call with the CPAP.com representative and Anthem’s concierge. Explain the situation to her voicemail.</p> <p>On day 623, receive a letter from ODI. Anthem has told ODI this was a problem with the billing codes, and ODI does not intervene in billing code issues. They have, however, initiated a secretive second investigation. There is no way to contact the second investigator.</p> <p>Write a detailed email to the concierge and ODI explaining that it took over three hundred days for Anthem to inform you of this purported billing code issue. Explain again that it is a single device. Emphasize that Anthem has been handling claims for this device for roughly a decade.</p> <p>Wait. On day 636, receive a letter from Anthem’s appeals department. They’ve received your request for an appeal. You never filed one. They want your doctor or facility to provide additional information to Carelon Medical Benefits Management. You have never heard of Carelon. There is no explanation of how to reach Carelon, or what information they might require. The letter concludes: “There is currently no authorization on file for the services rendered.” You need to seek authorization from a department called “Utilization Management”.</p> <p>Call the executive concierge again. 
Leave a voicemail asking what on earth is going on.</p> <p>On day 637, receive an email: she’s looking into it.</p> <p>On day 644, Anthem calls you. It’s a new agent who is immensely polite. Someone you’ve never heard of was asked to work on another project, so she’s taking over your case. She has no updates yet, but promises to keep in touch.</p> <p>She does so. On day 653, she informs you Anthem will pay your claim in full. On day 659, she provides a check number. On day 666, the check arrives.</p> <p>Deposit the check. Write a thank you email to the ODI and Anthem’s concierge. Write this, too, down in your log.</p></content> </entry> <entry> <id>https://aphyr.com/posts/367-why-is-jepsen-written-in-clojure</id> <title>Why is Jepsen Written in Clojure?</title> <published>2023-12-05T09:49:05-05:00</published> <updated>2023-12-05T09:49:05-05:00</updated> <link rel="alternate" href="https://aphyr.com/posts/367-why-is-jepsen-written-in-clojure"></link> <author> <name>Aphyr</name> <uri>https://aphyr.com/</uri> </author> <content type="html"><p>People keep asking why <a href="https://jepsen.io">Jepsen</a> is written in <a href="https://clojure.org/">Clojure</a>, so I figure it’s worth having a referencable answer. I’ve programmed in something like twenty languages. Why choose a Weird Lisp?</p> <p>Jepsen is built for testing concurrent systems–mostly databases. Because it tests concurrent systems, the language itself needs good support for concurrency. Clojure’s immutable, persistent data structures make it easier to write correct concurrent programs, and the language and runtime have excellent concurrency support: real threads, promises, futures, atoms, locks, queues, cyclic barriers, all of java.util.concurrent, etc. I also considered languages (like Haskell) with more rigorous control over side effects, but decided that Clojure’s less-dogmatic approach was preferable.</p> <p>Because Jepsen tests databases, it needs broad client support. 
Almost every database has a JVM client, typically written in Java, and Clojure has decent Java interop.</p> <p>Because testing is experimental work, I needed a language which was concise, adaptable, and well-suited to prototyping. Clojure is terse, and its syntactic flexibility–in particular, its macro system–works well for that. The threading macros make chained transformations readable, and macros enable re-usable error handling and easy control of resource scopes. The Clojure REPL is really handy for exploring the data a test run produces.</p> <p>Tests involve representing, transforming, and inspecting complex, nested data structures. Clojure’s data structures and standard library functions are possibly the best I’ve ever seen. I also print a lot of structures to the console and files: Clojure’s data syntax (EDN) is fantastic for this.</p> <p>Because tests involve manipulating a decent, but not huge, chunk of data, I needed a language with “good enough” performance. Clojure’s certainly not the fastest language out there, but idiomatic Clojure is usually within an order of magnitude or two of Java, and I can shave off the difference where critical. The JVM has excellent profiling tools, and these work well with Clojure.</p> <p>Jepsen’s (gosh) about a decade old now: I wanted a language with a mature core and emphasis on stability. Clojure is remarkably stable, both in terms of JVM target and the language itself. Libraries don’t “rot” anywhere near as quickly as in Scala or Ruby.</p> <p>Clojure does have significant drawbacks. It has a small engineering community and no (broadly-accepted, successful) static typing system. Both of these would constrain a large team, but Jepsen’s maintained and used by only 1-3 people at a time. Working with JVM primitives can be frustrating without dropping to Java; I do this on occasion. Some aspects of the polymorphism system are lacking, but these can be worked around with libraries. The error messages are terrible.
I have no apologetics for this. ;-)</p> <p>I prototyped Jepsen in a few different languages before settling on Clojure. A decade in, I think it was a pretty good tradeoff.</p></content> </entry> <entry> <id>https://aphyr.com/posts/366-finding-domains-that-send-unactionable-reports-in-mastodon</id> <title>Finding Domains That Send Unactionable Reports in Mastodon</title> <published>2023-09-03T10:25:47-05:00</published> <updated>2023-09-03T10:25:47-05:00</updated> <link rel="alternate" href="https://aphyr.com/posts/366-finding-domains-that-send-unactionable-reports-in-mastodon"></link> <author> <name>Aphyr</name> <uri>https://aphyr.com/</uri> </author> <content type="html"><p>One of the things we struggle with on woof.group is un-actionable reports. For various reasons, most of the reports we handle are for posts that are either appropriately content-warned or don’t require a content warning under our content policy–things like faces, butts, and shirtlessness. We can choose to ignore reports from a domain, but we’d rather not do that: it means we might miss out on important reports that require moderator action. We can also talk to remote instance administrators and ask them to talk to their users about not sending copies of reports to the remote instance if they don’t know what the remote instance policy is, but that’s time consuming, and we only want to do it if there’s an ongoing problem.</p> <p>I finally broke down and dug around in the data model to figure out how to get statistics on this. 
If you’re a Mastodon admin and you’d like to figure out which domains send you the most non-actionable reports, you can run this at <code>rails console</code>:</p> <pre><code><span></span><span class="c1"># Map of domains to [action, no-action] counts</span> <span class="n">stats</span> <span class="o">=</span> <span class="no">Report</span><span class="o">.</span><span class="n">all</span><span class="o">.</span><span class="n">reduce</span><span class="p">({})</span> <span class="k">do</span> <span class="o">|</span><span class="n">stats</span><span class="p">,</span> <span class="n">r</span><span class="o">|</span> <span class="n">domain</span> <span class="o">=</span> <span class="n">r</span><span class="o">.</span><span class="n">account</span><span class="o">.</span><span class="n">domain</span> <span class="n">domain_stats</span> <span class="o">=</span> <span class="n">stats</span><span class="o">[</span><span class="n">domain</span><span class="o">]</span> <span class="o">||</span> <span class="o">[</span><span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="o">]</span> <span class="n">action</span> <span class="o">=</span> <span class="n">r</span><span class="o">.</span><span class="n">history</span><span class="o">.</span><span class="n">any?</span> <span class="k">do</span> <span class="o">|</span><span class="n">a</span><span class="o">|</span> <span class="c1"># If you took action, there'd be a Status or Account target_type</span> <span class="n">a</span><span class="o">.</span><span class="n">target_type</span> <span class="o">!=</span> <span class="s1">'Report'</span> <span class="k">end</span> <span class="k">if</span> <span class="n">action</span> <span class="n">domain_stats</span><span class="o">[</span><span class="mi">0</span><span class="o">]</span> <span class="o">+=</span> <span class="mi">1</span> <span class="k">else</span> <span class="n">domain_stats</span><span class="o">[</span><span 
class="mi">1</span><span class="o">]</span> <span class="o">+=</span> <span class="mi">1</span> <span class="k">end</span> <span class="n">stats</span><span class="o">[</span><span class="n">domain</span><span class="o">]</span> <span class="o">=</span> <span class="n">domain_stats</span> <span class="n">stats</span> <span class="k">end</span> <span class="c1"># Top 20 domains, sorted by descending no-action count</span> <span class="n">stats</span><span class="o">.</span><span class="n">sort_by</span> <span class="k">do</span> <span class="o">|</span><span class="n">k</span><span class="p">,</span> <span class="n">stats</span><span class="o">|</span> <span class="n">stats</span><span class="o">[</span><span class="mi">1</span><span class="o">]</span> <span class="k">end</span><span class="o">.</span><span class="n">reverse</span><span class="o">.</span><span class="n">take</span><span class="p">(</span><span class="mi">20</span><span class="p">)</span> </code></pre> <p>For example:</p> <pre><code><span></span><span class="o">[[</span><span class="kp">nil</span><span class="p">,</span> <span class="o">[</span><span class="mi">77</span><span class="p">,</span> <span class="mi">65</span><span class="o">]</span> <span class="p">;</span> <span class="kp">nil</span> <span class="n">is</span> <span class="n">your</span> <span class="n">local</span> <span class="n">instance</span> <span class="o">[</span><span class="s2">&quot;foo.social&quot;</span><span class="p">,</span> <span class="o">[</span><span class="mi">3</span><span class="p">,</span> <span class="mi">52</span><span class="o">]]</span><span class="p">,</span> <span class="o">[</span><span class="s2">&quot;mastodon.oh.my.gosh&quot;</span><span class="p">,</span> <span class="o">[</span><span class="mi">40</span><span class="p">,</span> <span class="mi">24</span><span class="o">]]</span><span class="p">,</span> <span class="o">...</span> </code></pre></content> </entry> <entry> 
<entry> <id>https://aphyr.com/posts/365-10-operations-large-histories-with-jepsen</id> <title>10⁹ Operations: Large Histories with Jepsen</title> <published>2023-08-18T14:45:15-05:00</published> <updated>2023-08-18T14:45:15-05:00</updated> <link rel="alternate" href="https://aphyr.com/posts/365-10-operations-large-histories-with-jepsen"></link> <author> <name>Aphyr</name> <uri>https://aphyr.com/</uri> </author> <content type="html"><p><a href="https://github.com/jepsen-io/jepsen">Jepsen</a> is a library for writing tests of concurrent systems: everything from single-node data structures to distributed databases and queues. A key part of this process is recording a <em>history</em> of operations performed during the test. Jepsen <em>checkers</em> analyze a history to find consistency anomalies and to compute performance metrics. Traditionally Jepsen has stored the history in a Clojure vector (an immutable in-memory data structure like an array), and serialized it to disk at the end of the test. This limited Jepsen to histories on the order of tens of millions of operations. It also meant that if Jepsen crashed during a several-hour test run, it was impossible to recover any of the history for analysis. Finally, saving and loading large tests involved long wait times—sometimes upwards of ten minutes.</p> <p>Over the last year I’ve been working on ways to resolve these problems. Generators are up to ten times faster, and can emit up to 50,000 operations per second. A new operation datatype makes each operation smaller and faster to access. Jepsen’s new on-disk format allows us to stream histories incrementally to disk, to work with histories of up to a billion operations, far exceeding available memory, to recover safely from crashes, and to load tests almost instantly by deserializing data lazily. New history datatypes support both densely and sparsely indexed histories, and efficiently cache auxiliary indices.
They also support lazy disk-backed <code>map</code> and <code>filter</code>. These histories support both linear and concurrent folds, which dramatically improves checker performance on multicore systems: real-world checkers can readily analyze 250,000 to 900,000 operations/sec. Histories support multi-query optimization: when multiple threads fold over the same history, a query planner automatically fuses those folds together to perform them in a single pass. Since Jepsen often folds dozens of times over the same history, this saves a good deal of disk IO and deserialization time. These features are enabled by a new, transactional, dependency-aware task executor.</p> <h2><a href="#background" id="background">Background</a></h2> <p>Jepsen histories are basically a list of operations in time order. The meaning of an operation depends on the system being tested, but an operation could be something like “set <code>x</code> to three”, “read <code>y</code> and <code>z</code> as <code>2</code> and <code>7</code>, respectively”, or “add 64 to the queue”. Each of these logical operations has <em>two</em> operation elements in the history: an <em>invocation</em> when it begins, and a <em>completion</em> when it ends. This tells checkers which operations were performed concurrently, and which took place one after the next. 
For convenience, we use “operation” to refer both to a single entry in the history, and to an invocation/completion pair—it’s usually clear from context.</p> <p>For example, here’s a history of transactions from an etcd test:</p> <pre><code><span></span><span class="p">[{</span><span class="ss">:type</span> <span class="ss">:invoke</span>, <span class="ss">:f</span> <span class="ss">:txn</span>, <span class="ss">:value</span> <span class="p">[[</span><span class="ss">:w</span> <span class="mi">2</span> <span class="mi">1</span><span class="p">]]</span>, <span class="ss">:time</span> <span class="mi">3291485317</span>, <span class="ss">:process</span> <span class="mi">0</span>, <span class="ss">:index</span> <span class="mi">0</span><span class="p">}</span> <span class="p">{</span><span class="ss">:type</span> <span class="ss">:invoke</span>, <span class="ss">:f</span> <span class="ss">:txn</span>, <span class="ss">:value</span> <span class="p">[[</span><span class="ss">:r</span> <span class="mi">0</span> <span class="nv">nil</span><span class="p">]</span> <span class="p">[</span><span class="ss">:w</span> <span class="mi">1</span> <span class="mi">1</span><span class="p">]</span> <span class="p">[</span><span class="ss">:r</span> <span class="mi">2</span> <span class="nv">nil</span><span class="p">]</span> <span class="p">[</span><span class="ss">:w</span> <span class="mi">1</span> <span class="mi">2</span><span class="p">]]</span>, <span class="ss">:time</span> <span class="mi">3296209422</span>, <span class="ss">:process</span> <span class="mi">2</span>, <span class="ss">:index</span> <span class="mi">1</span><span class="p">}</span> <span class="p">{</span><span class="ss">:type</span> <span class="ss">:fail</span>, <span class="ss">:f</span> <span class="ss">:txn</span>, <span class="ss">:value</span> <span class="p">[[</span><span class="ss">:r</span> <span class="mi">0</span> <span class="nv">nil</span><span class="p">]</span> <span class="p">[</span><span 
class="ss">:w</span> <span class="mi">1</span> <span class="mi">1</span><span class="p">]</span> <span class="p">[</span><span class="ss">:r</span> <span class="mi">2</span> <span class="nv">nil</span><span class="p">]</span> <span class="p">[</span><span class="ss">:w</span> <span class="mi">1</span> <span class="mi">2</span><span class="p">]]</span>, <span class="ss">:time</span> <span class="mi">3565403674</span>, <span class="ss">:process</span> <span class="mi">2</span>, <span class="ss">:index</span> <span class="mi">2</span>, <span class="ss">:error</span> <span class="p">[</span><span class="ss">:duplicate-key</span> <span class="s">&quot;etcdserver: duplicate key given in txn request&quot;</span><span class="p">]}</span> <span class="p">{</span><span class="ss">:type</span> <span class="ss">:ok</span>, <span class="ss">:f</span> <span class="ss">:txn</span>, <span class="ss">:value</span> <span class="p">[[</span><span class="ss">:w</span> <span class="mi">2</span> <span class="mi">1</span><span class="p">]]</span>, <span class="ss">:time</span> <span class="mi">3767733708</span>, <span class="ss">:process</span> <span class="mi">0</span>, <span class="ss">:index</span> <span class="mi">3</span><span class="p">}]</span> </code></pre> <p>This shows two concurrent transactions executed by process 0 and 2 respectively. Process 0’s transaction attempted to write key 2 = 1 and succeeded, but process 2’s transaction tried to read 0, write key 1 = 1, read 2, then write key 1 = 2, and failed.</p> <p>Each operation is logically a map with <a href="https://github.com/jepsen-io/history/tree/be95aef21739167760c998e988fd4b3d10db1e3e#operation-structure">several predefined keys</a>. The <code>:type</code> field determines whether an operation is an invocation (<code>:invoke</code>) or a successful (<code>:ok</code>), failed (<code>:fail</code>), or indeterminate (<code>:info</code>) completion. 
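</p> <p>Because operations are ordinary maps, standard sequence functions work on them directly. As a quick sketch (using the four-operation <code>history</code> vector above), tallying operations by type is a one-liner:</p> <pre><code>(frequencies (map :type history))
; =&gt; {:invoke 2, :fail 1, :ok 1}

;; Collect the errors of all failed operations:
(-&gt;&gt; history
     (filter #(= :fail (:type %)))
     (map :error))</code></pre> <p>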
The <code>:f</code> and <code>:value</code> fields indicate what the operation did—their interpretation is up to the test in question. We use <code>:process</code> to track which logical thread performed the operation. <code>:index</code> is a strictly-monotonic unique ID for every operation. In <em>dense</em> histories, indices go 0, 1, 2, …. Filtering or limiting a history to a subset of those operations yields a <em>sparse</em> history, with IDs like 3, 7, 8, 20, …. Operations can also include all kinds of additional fields at the test’s discretion.</p> <p>During a test Jepsen decides what operation to invoke next by asking a <em>generator</em>: a functional data structure that evolves as operations are performed. Generators include a rich library of combinators that can build up complex sequences of operations, including sequencing, alternation, mapping, filtering, limiting, temporal delays, concurrency control, and selective nondeterminism.</p> <h2><a href="#generator-performance" id="generator-performance">Generator Performance</a></h2> <p>One of the major obstacles to high-throughput testing with Jepsen was that the generator system simply couldn’t produce enough operations to keep up with the database under test. Much of this time was spent in manipulating the generator <em>context</em>: a data structure that keeps track of which threads are free and which logical processes are being executed by which threads. Contexts are often checked or restricted—for instance, a generator might want to pick a random free process in order to emit an operation, or to filter the set of threads in a context to a specific subset for various sub-generators: two threads making writes, three more performing reads, and so on.</p> <p>Profiling revealed context operations as a major bottleneck in generator performance—specifically, manipulating the Clojure maps and sets we initially used to represent context data.
In Jepsen 0.3.0 I introduced a new <a href="https://github.com/jepsen-io/jepsen/blob/eeec7ef02ca278f52d3f1c71d9ff4ae5a75aad43/jepsen/src/jepsen/generator/context.clj">jepsen.generator.context</a> namespace with high-performance data structures for common context operations. I wrote a custom <a href="https://github.com/jepsen-io/jepsen/blob/main/jepsen/src/jepsen/generator/translation_table.clj">translation table</a> to map thread identifiers (e.g. <code>1</code>, <code>2</code>, … <code>:nemesis</code>) to and from integers. This let us encode context threads using Java <code>BitSet</code>s, which are significantly more compact than hashmap-backed sets and require no pointer chasing. To restrict threads to a specific subset, we <a href="https://github.com/jepsen-io/jepsen/blob/main/jepsen/src/jepsen/generator/context.clj#L311-L358">precompile a filter BitSet</a>, which turns expensive set intersection into a blazing-fast bitwise <code>and</code>. This bought us an <a href="https://github.com/jepsen-io/jepsen/commit/91d5e8fae64eca31b243fd978099c888db63260f">order-of-magnitude performance improvement</a> in a realistic generator performance test—pushing up to 26,000 ops/second through a 1024-thread generator.</p> <p>Based on <a href="https://www.yourkit.com/">YourKit</a> profiling I made additional performance tweaks. Jepsen now <a href="https://github.com/jepsen-io/jepsen/commit/165cba94be370a3226cb3c5e088b25598d0d97f0">memoizes function arity checking</a> when functions are used as generators. A few type hints helped avoid reflection in hot paths. <a href="https://github.com/jepsen-io/jepsen/commit/2c17e3253bd4459da07bf551a80d57f98e9d3f42">Additional micro-optimizations</a> bought further speedups. The function which fills in operation types, times, and processes spent a lot of time manipulating persistent maps, which we avoided by carefully tracking which fields had been read from the underlying map. 
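</p> <p>The precompiled-filter trick from above is easy to illustrate with plain Java interop. This is a sketch of the technique only, not Jepsen’s actual context code:</p> <pre><code>;; Threads are encoded as set bits. Restricting the free set to a
;; precompiled subset is a single bitwise AND: no set intersection,
;; no pointer chasing.
(let [free    (doto (java.util.BitSet. 1024) (.set 0 1024)) ; all 1024 threads free
      writers (doto (java.util.BitSet. 1024) (.set 0 2))]   ; precompiled filter: threads 0, 1
  (.and free writers)       ; mutates free in place
  (.cardinality free))
; =&gt; 2</code></pre> <p>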
We replaced expensive <code>mapcat</code> with a transducer, replaced <code>seq</code> with <code>count</code> to optimize empty-collection checks, and replaced a few <code>=</code> with <code>identical?</code> for symbols.</p> <p>In testing of list-append generators (a reasonably complex generator type) these improvements <a href="https://mastodon.jepsen.io/@jepsen/109287959624710790">let us push 50,000 operations per second</a> with 100 concurrent threads.</p> <h2><a href="#off-with-her-head" id="off-with-her-head">Off With Her Head</a></h2> <p>Another challenge in long-running Jepsen tests was a linear increase in memory consumption. This was caused by the test map, which is passed around ubiquitously and contains a reference to the initial generator that the test started with. Since generators are a lot like infinite sequences, “retaining the head” of the generator often resulted in keeping a chain of references to every single operation ever generated in memory.</p> <p>Because the test map is available during every phase of the test, and could be accessed by all kinds of uncontrolled code, we needed a way to enforce that the generator wouldn’t be retained accidentally. Jepsen now wraps the generator in a <a href="https://github.com/jepsen-io/jepsen/commit/7b8b5d531694a58a352de913c18c2bf2e4facacb">special <code>Forgettable</code> reference type</a> that can be explicitly forgotten. Forgetting that reference once the test starts running allows the garbage collector to free old generator states. Tests of 10<sup>9</sup> operations can now run in just a few hundred megabytes of heap.</p> <h2><a href="#operations-as-records" id="operations-as-records">Operations as Records</a></h2> <p>We initially represented operations as Clojure maps, which use arrays and hash tables internally. These maps require extra storage space, can only store JVM Objects, and require a small-but-noticeable lookup cost. 
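</p> <p>Clojure records recover most of that cost. A simplified sketch (the real Op datatype differs in its details):</p> <pre><code>(defrecord Op [^long index ^long time type process f value])

(def op (map-&gt;Op {:index 0, :time 3291485317, :type :invoke,
                  :process 0, :f :txn, :value [[:w 2 1]]}))

(:f op)                    ; a direct field read, not a hash lookup
(assoc op :error :timeout) ; still behaves like a map</code></pre> <p>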
Since looking up fields is so frequent, and memory efficiency important, we replaced them with Clojure <a href="https://clojure.org/reference/datatypes#_deftype_and_defrecord">records</a>. Our new <a href="https://github.com/jepsen-io/history/blob/be95aef21739167760c998e988fd4b3d10db1e3e/src/jepsen/history.clj#L250-L253">Op datatype</a> stores its index and time as primitive longs, and the type, process, f, and value as Object fields. This is a significantly denser layout in memory, which also aids cache locality. Access to fields still looks like a map lookup (e.g. <code>(:index some-op)</code>), but compiles into a type cache check and a read of that field in the Op object. Much faster!</p> <p>Clojure records work very much like maps—they store a reference to a hashmap, so tests can still store any number of extra fields on an Op. This preserves compatibility with existing Jepsen tests. The API is <em>slightly</em> different: <code>index</code> and <code>time</code> fields are no longer optional. Jepsen always generates these fields during tests, so this change mainly affects the internal test suites for Jepsen checkers.</p> <h2><a href="#streaming-disk-storage" id="streaming-disk-storage">Streaming Disk Storage</a></h2> <p>Jepsen initially saved tests, including both histories and the results of analysis, as a single map serialized with <a href="https://github.com/Datomic/fressian/wiki">Fressian</a>. This worked fine for small tests, but writing large histories could take tens of minutes. Jepsen now uses a <a href="https://github.com/jepsen-io/jepsen/blob/2310c73a23432c107d5e2604281a1f1a334ef06f/jepsen/src/jepsen/store/format.clj">custom disk format</a>. Jepsen files consist of a short header followed by an append-only log of immutable blocks. Each block contains a type, length, and checksum header, followed by data.
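</p> <p>A hypothetical sketch of that framing (the field widths and checksum choice here are illustrative; the real layout lives in <code>jepsen.store.format</code>):</p> <pre><code>(import '[java.nio ByteBuffer]
        '[java.util.zip CRC32])

(defn frame-block
  "Frames a byte array as [type length checksum data ...]."
  [block-type ^bytes data]
  (let [crc (doto (CRC32.) (.update data))
        buf (ByteBuffer/allocate (+ 2 4 4 (alength data)))]
    (.putShort buf (short block-type))
    (.putInt   buf (alength data))
    (.putInt   buf (unchecked-int (.getValue crc)))
    (.put      buf data)
    (.array    buf)))</code></pre> <p>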
These blocks form a persistent tree structure on disk: changes are written by appending new blocks to the end of the file, then flipping a single offset in the file header to point to the block that encodes the root of the tree.</p> <p>Blocks can store <a href="https://github.com/jepsen-io/jepsen/blob/2310c73a23432c107d5e2604281a1f1a334ef06f/jepsen/src/jepsen/store/format.clj#L97">arbitrary Fressian data</a>, <a href="https://github.com/jepsen-io/jepsen/blob/2310c73a23432c107d5e2604281a1f1a334ef06f/jepsen/src/jepsen/store/format.clj#L112">layered, lazily-decoded maps</a>, <a href="https://github.com/jepsen-io/jepsen/blob/2310c73a23432c107d5e2604281a1f1a334ef06f/jepsen/src/jepsen/store/format.clj#L130">streaming lists</a>, and <a href="https://github.com/jepsen-io/jepsen/blob/2310c73a23432c107d5e2604281a1f1a334ef06f/jepsen/src/jepsen/store/format.clj#L143">chunked vectors</a> which store pointers to other vectors “stitched together” into a single large vector. Chunked vectors store the index of the first element in each chunk, so you can jump to the correct chunk for any index in constant time.</p> <p>During a test Jepsen <a href="https://github.com/jepsen-io/jepsen/blob/2310c73a23432c107d5e2604281a1f1a334ef06f/jepsen/src/jepsen/store/format.clj#L1084-L1135">writes each operation</a> into a series of chunks on disk—each a streaming list of Fressian-encoded data. When a chunk is full (by default, 16384 operations), Jepsen seals that chunk, writes its checksum, and constructs a new chunked vector block with pointers to each chunk in the history—re-using existing chunks. It then writes a new root block and flips the root pointer at the start of the file. 
This ensures that at most the last 16384 operations can be lost during a crash.</p> <p>When Jepsen loads a test from disk it reads the header, decodes the root block, finds the base of the test map, and constructs a <a href="https://github.com/jepsen-io/jepsen/blob/2310c73a23432c107d5e2604281a1f1a334ef06f/jepsen/src/jepsen/store/format.clj#L1406-L1434">lazy map</a> which decodes results and the history only when asked for. Results are also encoded as <a href="https://github.com/jepsen-io/jepsen/blob/2310c73a23432c107d5e2604281a1f1a334ef06f/jepsen/src/jepsen/store/format.clj#L1252-L1285">lazy maps</a>, so tests with large results can still be decoded efficiently. This lets Jepsen read the high-level properties of tests—their names, timestamps, overall validity, etc., in a few milliseconds rather than tens of minutes. It also preserves compatibility: tests loaded from disk look and feel like regular hashmaps. Metadata allows those tests to be altered and saved back to disk efficiently, so you can re-analyze them later.</p> <p>Jepsen loads a test’s history using a <a href="https://github.com/jepsen-io/history/blob/be95aef21739167760c998e988fd4b3d10db1e3e/src/jepsen/history/core.clj#L166-L326">lazy chunked vector</a>. The top-level vector keeps a cached count and lookup table to efficiently map indices to chunks. Each chunk is lazily decoded on first read and cached in a JVM <a href="https://docs.oracle.com/javase/8/docs/api/java/lang/ref/SoftReference.html">SoftReference</a>, which can be reclaimed under memory pressure. The resulting vector looks to users just like an in-memory vector, which ensures compatibility with extant Jepsen checkers.</p> <h2><a href="#history-datatypes" id="history-datatypes">History Datatypes</a></h2> <p>When working with histories checkers need to perform some operations over and over again. 
For instance, they often need to count results, to look up an operation by its <code>:index</code> field, to jump between invocations and completions, and to schedule concurrent tasks for analysis. We therefore augment our lazy vectors of operations with an <a href="https://github.com/jepsen-io/history/blob/be95aef21739167760c998e988fd4b3d10db1e3e/src/jepsen/history.clj#L491-L524">IHistory</a> type which supports these operations. Our histories still act like a vector of operations, but also provide efficient, cached mapping of invocations to completions, a threadpool for scheduling analysis tasks, and a stateful execution planner for folds over the history, which we call a <em>folder</em>.</p> <p>When Jepsen records a history it assigns each operation a sequential index 0, 1, 2, 3, …. We call these histories <em>dense</em>. This makes it straightforward to look up an operation by its index: <code>(nth history idx)</code>. However, users might also filter a history to just some operations, or excerpt a slice of a larger history for detailed analysis. This results in a <em>sparse</em> history. Our sparse histories <a href="https://github.com/jepsen-io/history/blob/be95aef21739167760c998e988fd4b3d10db1e3e/src/jepsen/history.clj#L782-L850">maintain a lazily-computed mapping of operation indices to vector indices</a> to efficiently support this.</p> <p>We also provide lazy, history-specific versions of <a href="https://github.com/jepsen-io/history/blob/be95aef21739167760c998e988fd4b3d10db1e3e/src/jepsen/history.clj#L927-L1010">map</a> and <a href="https://github.com/jepsen-io/history/blob/be95aef21739167760c998e988fd4b3d10db1e3e/src/jepsen/history.clj#L1014-L1153">filter</a>. Checkers perform these operations on histories over and over again, but using standard Clojure <code>map</code> and <code>filter</code> would be inappropriate: once realized, the entire mapped/filtered sequence is stored in memory indefinitely.
They also require linear-time lookups for <code>nth</code>. Our versions <a href="https://github.com/jepsen-io/history/blob/be95aef21739167760c998e988fd4b3d10db1e3e/src/jepsen/history.clj#L969-L975">push down the map and filter operations into each access</a>, trading off cachability for reduced memory consumption. This pushdown also enables multiple queries to share the same map or filter function. We also provide <a href="https://github.com/jepsen-io/history/blob/be95aef21739167760c998e988fd4b3d10db1e3e/src/jepsen/history.clj#L939-L941">constant-time nth</a> on mapped histories, and <a href="https://github.com/jepsen-io/history/blob/be95aef21739167760c998e988fd4b3d10db1e3e/src/jepsen/history.clj#L1095-L1099">log-time nth</a> on filtered ones.</p> <h2><a href="#concurrent-folds" id="concurrent-folds">Concurrent Folds</a></h2> <p>Because Jepsen’s disk files are divided into multiple chunks we can efficiently parallelize folds (reductions) over histories. Jepsen histories provide their own <a href="https://github.com/jepsen-io/history/blob/be95aef21739167760c998e988fd4b3d10db1e3e/src/jepsen/history/fold.clj">folding facility</a>, similar to (and, indeed, directly compatible with) <a href="https://github.com/aphyr/tesser">Tesser</a>, a library I wrote for composing concurrent folds. Tesser is intended for commutative folds, but almost all of its operators—map, filter, group-by, fuse, facet, etc.—work properly in ordered reductions too. History folds can perform a concurrent reduction over each chunk independently, and then combine the chunks together in a linear reduction. Since modern CPUs can have dozens or even hundreds of cores, this yields a significant speedup. Since each core reads its chunk from the file in a linear scan, it’s also friendly to disk and memory readahead prediction. 
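</p> <p>The shape of such a fold (reduce each chunk independently, then combine the results in order) can be sketched without any Jepsen machinery:</p> <pre><code>(defn concurrent-fold
  "Reduces each chunk in parallel, then combines the per-chunk
  results left to right. A toy sketch of the real folder."
  [reducer combiner init chunks]
  (-&gt;&gt; chunks
       (pmap #(reduce reducer init %)) ; one parallel task per chunk
       (reduce combiner)))

;; Count :ok operations across two chunks:
(concurrent-fold (fn [n op] (if (= :ok (:type op)) (inc n) n))
                 + 0
                 [[{:type :ok} {:type :invoke}] [{:type :ok}]])
; =&gt; 2</code></pre> <p>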
If this sounds like <a href="https://en.wikipedia.org/wiki/MapReduce">MapReduce</a> on a single box, you’re right.</p> <p><img class="attachment pure-img" src="/data/posts/365/fold.jpg" alt="A diagram showing how concurrent folds proceed in independent reductions over each chunk, then combined in a second linear pass." title="A diagram showing how concurrent folds proceed in independent reductions over each chunk, then combined in a second linear pass."></p> <p>To support this access pattern each history comes with an associated <a href="https://github.com/jepsen-io/history/blob/be95aef21739167760c998e988fd4b3d10db1e3e/src/jepsen/history/fold.clj#L1381-L1405">Folder</a>: a stateful object that schedules, executes, and returns results from folds. Performing a <a href="https://github.com/jepsen-io/history/blob/be95aef21739167760c998e988fd4b3d10db1e3e/src/jepsen/history/fold.clj#L818-L845">linear fold over a history</a> is easy: the folder schedules a reduce of the first chunk of the history, then uses that result as the input to a reduce of the second chunk, and so on. A <a href="https://github.com/jepsen-io/history/blob/be95aef21739167760c998e988fd4b3d10db1e3e/src/jepsen/history/fold.clj#L847-L879">concurrent fold</a> schedules reductions over every chunk concurrently, and a series of combine tasks which fold the results of each chunk reduction in serial.</p> <p>Folders are careful about thread locality and memory barriers during inter-thread handoff, which means folds can use <em>mutable</em> in-memory structures for their accumulators, not just immutable values. The reducing and combining functions don’t need to acquire locks of their own, and most access is completely unsynchronized.
This works particularly well with the new <a href="https://aphyr.com/posts/363-fast-multi-accumulator-reducers">mutable reducers</a> in <a href="https://github.com/aphyr/dom-top">dom-top</a>.</p> <p>Folders and histories directly support the interfaces for plain-old Clojure <code>reduce</code> and <code>reducers/fold</code>, so existing checkers immediately take advantage of folder optimizations. They’re translated to folds over the history internally. Transducers work just like you’d expect.</p> <h2><a href="#multi-query-optimization" id="multi-query-optimization">Multi-Query Optimization</a></h2> <p>Jepsen tests often involve dozens of different checkers, each of which might perform many folds. When a history is larger than memory, some chunks are evicted from cache to make room for new ones. This means that a second fold might end up re-parsing chunks of history from disk that a previous fold already parsed–and forgot due to GC pressure. Running many folds at the same time also reduces locality: each fold likely reads an operation and follows pointers to other objects at different times and on different cores. This burns extra time in memory accesses <em>and</em> evicts other objects from the CPU’s cachelines. It would be much more efficient if we could do many folds in a single pass.</p> <p>How should this coordination be accomplished? In theory we could have a pure fold composition system and ask checkers to return the folds they intended to perform, along with some sort of continuation to evaluate when results were ready—but this requires rewriting every extant checker and breaking the intuitive path of a checker’s control flow. It also plays terribly with internal concurrency in a checker. Instead, each history’s folder provides a stateful coordination service for folds. 
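</p> <p>At its core, fusing two folds just means pairing their accumulators so that one pass feeds both. A toy sketch (the real folder also handles concurrency, chunk catch-up, and cancellation):</p> <pre><code>(defn fuse
  "Combines two [reducing-fn init] folds into a single fold that
  computes both results in one pass."
  [[rf1 init1] [rf2 init2]]
  [(fn [[acc1 acc2] x] [(rf1 acc1 x) (rf2 acc2 x)])
   [init1 init2]])

(let [count-ok  [(fn [n op] (if (= :ok (:type op)) (inc n) n)) 0]
      max-time  [(fn [t op] (max t (:time op))) 0]
      [rf init] (fuse count-ok max-time)]
  (reduce rf init [{:type :invoke, :time 3} {:type :ok, :time 7}]))
; =&gt; [1 7]</code></pre> <p>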
Checkers simply ask the folder to fold over the history, and the folder secretly fuses all those folds into as few passes as possible.</p> <p><img class="attachment pure-img" src="/data/posts/365/mqo.jpg" alt="When a second fold is run, the folder computes a fused fold and tries to do as much work as possible in a single pass." title="When a second fold is run, the folder computes a fused fold and tries to do as much work as possible in a single pass."></p> <p>Imagine one thread starts a fold (orange) that looks for G1a (aborted read). That fold finishes reducing chunk 0 of the history and is presently reducing chunks 3 and 5, when another thread starts a fold to compute latency statistics (blue). The folder realizes it has an ongoing concurrent fold. It builds a fused fold (gray) which performs both G1a and latency processing in a single step. It then schedules latency reductions for chunks 0, 3, and 5, to catch up with the G1a fold. Chunks 1, 2, 4, and 6+ haven’t been touched yet, so the folder cancels their reductions. In their place it spawns new fused reduction tasks for those chunks, and new combine tasks for their results. It automatically weaves together the independent and fused stages so that they re-use as much completed work as possible. Most of the history is processed in a single pass, and once completed, the folder delivers results from both folds to their respective callers. All of this is <a href="https://github.com/jepsen-io/history/blob/be95aef21739167760c998e988fd4b3d10db1e3e/src/jepsen/history/fold.clj#L1025-L1329">done transactionally</a>, ensuring exactly-once processing and thread safety even with unsynchronized mutable accumulators. 
A simpler join process allows the folder to fuse together <a href="https://github.com/jepsen-io/history/blob/be95aef21739167760c998e988fd4b3d10db1e3e/src/jepsen/history/fold.clj#L910-L1023">linear folds</a>.</p> <p>The end result of this byzantine plumbing is that callers don’t have to think about <em>how</em> the underlying pass is executed. They just ask to fold over the history, and the folder figures out an efficient plan for executing it. This might involve a few independent passes over the first few chunks, but those chunks are likely retained in cache, which cuts down on deserialization. The rest of the history is processed in a single pass. We get high throughput, reduced IO, and cache locality.</p> <h2><a href="#task-scheduling" id="task-scheduling">Task Scheduling</a></h2> <p>To take advantage of multiple cores efficiently, we want checkers to run their analyses in multiple threads—but not <em>too</em> many, lest we impose heavy context-switching costs. In order to ensure thread safety and exactly-once processing during multi-query optimization, we need to ensure that fold tasks are cancelled and created atomically. We also want to ensure that tasks which block on the results of other tasks don’t launch until their dependencies are complete. Otherwise blocked tasks would pin pool threads while they waited, leading to threadpool starvation and, in some cases, deadlock: a classic problem with Java threadpools.</p> <p>To solve these problems Jepsen’s history library includes a <a href="https://github.com/jepsen-io/history/blob/be95aef21739167760c998e988fd4b3d10db1e3e/src/jepsen/history/task.clj">transactional, dependency-aware task scheduler</a>. Tasks are functions which receive their dependencies’ results as arguments, and can only start when their dependencies are complete. This prevents deadlock. Tasks are cancellable, which also cancels their dependents. An exception in a task propagates to downstream dependencies, which simplifies error handling.
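</p> <p>The core contract (tasks receive their dependencies’ results, and cannot start before those dependencies finish) can be sketched with ordinary futures. This toy version omits cancellation, transactions, and the real executor’s thread management:</p> <pre><code>(defn task
  "Returns a future that waits for its dependency futures, then
  applies f to their results."
  [f &amp; deps]
  (future (apply f (map deref deps))))

(let [a (task (fn [] 2))
      b (task (fn [] 3))
      c (task + a b)] ; c can only run once a and b complete
  @c)
; =&gt; 5</code></pre> <p>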
Dereferencing a task returns its result, or throws if an exception occurred. These capabilities are backed by an <a href="https://github.com/jepsen-io/history/blob/be95aef21739167760c998e988fd4b3d10db1e3e/src/jepsen/history/task.clj#L373-L379">immutable executor state</a> which tracks the dependency graph, which tasks are ready and presently running, and the next task ID. A miniature <a href="https://github.com/jepsen-io/history/blob/be95aef21739167760c998e988fd4b3d10db1e3e/src/jepsen/history/task.clj#L806-L832">effect system</a> applies a log of “side effects” from changes to the immutable state. This approach allows us to support <a href="https://github.com/jepsen-io/history/blob/be95aef21739167760c998e988fd4b3d10db1e3e/src/jepsen/history/task.clj#L834-L851">arbitrary transactions</a> over the task executor: one can create and cancel any number of tasks in an atomic step.</p> <p>This is powered by questionable misuses of Java’s ExecutorService &amp; BlockingQueue: our “queue” is simply the immutable state’s dependency graph. We <a href="https://github.com/jepsen-io/history/blob/be95aef21739167760c998e988fd4b3d10db1e3e/src/jepsen/history/task.clj#L741-L747">ignore the ExecutorService’s attempts to add things to the queue</a> and instead mutate the state through our transaction channels. We wake up the executor with <a href="https://github.com/jepsen-io/history/blob/be95aef21739167760c998e988fd4b3d10db1e3e/src/jepsen/history/task.clj#L828-L832">trivial runnables</a> when tasks become available. This feels evil, but it <em>does</em> work: executor safety is validated by an extensive series of <a href="https://github.com/jepsen-io/history/tree/be95aef21739167760c998e988fd4b3d10db1e3e/test/jepsen">generative tests</a> which stress both linear and concurrent behavior.</p> <h2><a href="#compatibility" id="compatibility">Compatibility</a></h2> <p>The most significant API change was replacing plain vectors of operations with a first-class <code>IHistory</code> type. 
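</p> <p>Hand-built collections of operations can be wrapped into an <code>IHistory</code>. A sketch, assuming the plain one-argument arity of <code>jepsen.history/history</code> (the real library also offers options for validating or assigning indices):</p> <pre><code>(require '[jepsen.history :as h])

;; Hand-built operations; :index and :time are now mandatory.
(def ops [{:index 0, :time 0, :process 0, :type :invoke, :f :read, :value nil}
          {:index 1, :time 1, :process 0, :type :ok,     :f :read, :value 3}])

;; Wrap the plain vector so built-in checkers accept it.
(def history (h/history ops))</code></pre> <p>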
Users who constructed their own histories and fed them to built-in checkers—for instance, by calling regular Clojure <code>map</code> or <code>filter</code> on a history—received a blunt warning of type incompatibility. The fix here is straightforward: either use <code>jepsen.history/map</code> and friends, or <code>jepsen.history/history</code> to wrap an existing collection in a new <code>IHistory</code>. These wrappers are designed to work both for the clean histories Jepsen produces and for the incomplete test histories involved in Jepsen’s internal test suites.</p> <p>The other significant change was making operation <code>:time</code> and <code>:index</code> mandatory. These are always provided by Jepsen, but might be missing from hand-coded histories used in internal test suites. Operations are automatically promoted to <code>Op</code> records by the <code>history</code> wrapper function, and indices can either be validated or automatically assigned for testing.</p> <p>Otherwise the transition was remarkably successful. Existing code could continue to treat operations as hashmaps, to pretend histories were vectors, and to use standard Clojure collection operations as before. These all worked transparently, despite now being backed by lazily cached on-disk structures. I incrementally rewrote most of the core checkers in Jepsen and Elle to take advantage of the new Op type and fold system, and obtained significant speedups throughout the process.</p> <p>Dropping <code>:generator</code> from the test map is a recent change. I don’t <em>think</em> that’s going to cause problems; I’m not aware of anything that actually read the generator state other than debugging code, but you never know.</p> <h2><a href="#performance" id="performance">Performance</a></h2> <p>With Jepsen 0.2.7 an etcd list-append transactional test on a five-node LXC cluster might perform ~7,000 ops/sec (counting both invocations and completions). At this point the bottleneck is etcd, not Jepsen itself.
With a 16 GB heap, that test would exhaust memory and crash after roughly two hours. Given 90 minutes, it generated a single Fressian object of 2.2 GB containing 38,076,478 operations; this file was larger than MAX_INT bytes and caused Jepsen to crash.</p> <p>When limited to an hour that test squeaked in under both heap and serialization limits: 25,461,508 operations (7.07 kHz) in a 1.5 GB Jepsen file. Analyzing that in-memory history for G1a, G1b, and internal consistency took 202 seconds (126 kHz). This timeseries plot of heap use and garbage collections shows the linear growth in memory use required to retain the entire history and generator state in memory.</p> <p><img class="attachment pure-img" src="/data/posts/365/027-memory-just-enough.png" alt="A timeseries plot of heap use and garbage collections. Memory use rises linearly over the hour-long test run, followed by a spike in heap use and intense collection pressure during the final analysis phase." title="A timeseries plot of heap use and garbage collections. Memory use rises linearly over the hour-long test run, followed by a spike in heap use and intense collection pressure during the final analysis phase."></p> <p>The resulting Jepsen file took 366 seconds and a 58 GB heap to load the history—even just to read a single operation. There are various reasons for this, including the larger memory footprint of persistent maps and vectors, temporarily retained intermediate representations from the Fressian decoder, pointers and object header overhead, Fressian’s more-compact encodings of primitive types, and its caching of repeated values.</p> <p>With Jepsen 0.3.3 that same hour-long test consumed only a few hundred megabytes of heap until the analysis phase, where it took full advantage of the 16 GB heap to cache most of the history and avoid parsing chunks multiple times. It produced a history of 40,358,966 elements (11.2 kHz) in a 1.9 GB Jepsen file. The analysis completed in 97 seconds (416 kHz). 
Even though the analysis phase has to re-parse the entire history from disk, it does so in parallel, which significantly speeds up the process.</p> <p><img class="attachment pure-img" src="/data/posts/365/033-one-hour.png" alt="A timeseries plot of heap use and garbage collections. Memory use is essentially constant over the hour-long test run, oldgen rising slightly and then falling during a few collections. Memory spikes at the end of the test, when we re-load the history and analyze it." title="A timeseries plot of heap use and garbage collections. Memory use is essentially constant over the hour-long test run, oldgen rising slightly and then falling during a few collections. Memory spikes at the end of the test, when we re-load the history and analyze it."></p> <p>Given ~25.4 hours that same test suite recorded a billion (1,000,000,000) operations at roughly 11 kHz, producing a 108 GB Jepsen file. Checking that history for G1a, G1b, and internal anomalies took 1 hour and 6 minutes (249 kHz) with a 100 GB heap, and consumed 70-98% of 48 cores. Note that our heap needed to be much larger. G1a and G1b are global anomalies and require computing all failed and intermediate writes in an initial pass over the history, then retaining those index structures throughout a second pass. Under memory pressure, this analysis requires more frequent reloading of history chunks that have been evicted from cache. Simpler folds, like computing totals for ok, failed, and crashed operations, can execute at upwards of 900 kHz.</p> <p>Thanks to the new file format’s lazy structure, parsing the top-level test attributes (e.g. time, name, validity) took just 20-30 milliseconds. Fetching a single element of the history (regardless of position) took ~500 milliseconds: a few disk seeks to load the root, test, and history block, and then checksumming and parsing the appropriate chunk. 
The history block itself was ~715 KB, and consisted of pointers to 61,000 chunks of 16384 operations each.</p> <h2><a href="#conclusion" id="conclusion">Conclusion</a></h2> <p>Jepsen 0.3.3 is capable of generating 50,000 operations per second, tackling histories of up to a billion operations and checking close to a million operations per second. Of course, these numbers depend on what the generator, client, and test are doing! Going further may be tricky: many JVM collection APIs are limited to 2<sup>31</sup> - 1 elements. Many of the checkers in Jepsen have been rewritten to run as concurrent folds, and to use more memory-efficient structures. However, only a few analyses are completely local: many require O(n) memory and may be impossible to scale past a few million operations.</p> <p>You can start using <a href="https://github.com/jepsen-io/jepsen/commit/cc200b1fd66cc36f827259237727655472c73f8e">Jepsen 0.3.3</a> today. I hope it opens up new territory for distributed systems testers everywhere.</p></content> </entry> <entry> <id>https://aphyr.com/posts/363-fast-multi-accumulator-reducers</id> <title>Fast Multi-Accumulator Reducers</title> <published>2023-04-27T08:47:46-05:00</published> <updated>2023-04-27T08:47:46-05:00</updated> <link rel="alternate" href="https://aphyr.com/posts/363-fast-multi-accumulator-reducers"></link> <author> <name>Aphyr</name> <uri>https://aphyr.com/</uri> </author> <content type="html"><p>Again <a href="https://aphyr.com/posts/360-loopr-a-loop-reduction-macro-for-clojure">with the reductions</a>! I keep writing code which reduces over a collection, keeping track of more than one variable. For instance, here’s one way to find the mean of a collection of integers:</p> <pre><code><span></span><span class="p">(</span><span class="kd">defn </span><span class="nv">mean</span> <span class="s">&quot;A reducer to find the mean of a collection. 
Accumulators are [sum count] pairs.&quot;</span> <span class="p">([]</span> <span class="p">[</span><span class="mi">0</span> <span class="mi">0</span><span class="p">])</span> <span class="p">([[</span><span class="nv">sum</span> <span class="nv">count</span><span class="p">]]</span> <span class="p">(</span><span class="nb">/ </span><span class="nv">sum</span> <span class="nv">count</span><span class="p">))</span> <span class="p">([[</span><span class="nv">sum</span> <span class="nv">count</span><span class="p">]</span> <span class="nv">x</span><span class="p">]</span> <span class="p">[(</span><span class="nb">+ </span><span class="nv">sum</span> <span class="nv">x</span><span class="p">)</span> <span class="p">(</span><span class="nb">inc </span><span class="nv">count</span><span class="p">)]))</span> </code></pre> <p>This <code>mean</code> function is what Clojure calls a <a href="https://clojure.org/reference/reducers">reducer</a>, or a <a href="https://clojure.org/reference/transducers">reducing function</a>. With no arguments, it constructs a fresh accumulator. With two arguments, it combines an element of some collection with the accumulator, returning a new accumulator. 
With one argument, it transforms the accumulator into some final result.</p> <p>We can use this reducer with standard Clojure <code>reduce</code>, like so:</p> <pre><code><span></span><span class="p">(</span><span class="nf">require</span> <span class="o">'</span><span class="p">[</span><span class="nv">criterium.core</span> <span class="ss">:refer</span> <span class="p">[</span><span class="nv">quick-bench</span> <span class="nv">bench</span><span class="p">]]</span> <span class="o">'</span><span class="p">[</span><span class="nv">clojure.core.reducers</span> <span class="ss">:as</span> <span class="nv">r</span><span class="p">])</span> <span class="p">(</span><span class="k">def </span><span class="nv">numbers</span> <span class="p">(</span><span class="nf">vec</span> <span class="p">(</span><span class="nb">range </span><span class="mi">1</span><span class="nv">e7</span><span class="p">)))</span> <span class="p">(</span><span class="nf">mean</span> <span class="p">(</span><span class="nb">reduce </span><span class="nv">mean</span> <span class="p">(</span><span class="nf">mean</span><span class="p">)</span> <span class="nv">numbers</span><span class="p">))</span> <span class="c1">; =&gt; 9999999/2</span> </code></pre> <p>Or use <code>clojure.core.reducers/reduce</code> to apply that function to every element in a collection. We don’t need to call <code>(mean)</code> to get an identity here; <code>r/reduce</code> does that for us:</p> <pre><code><span></span><span class="p">(</span><span class="nf">mean</span> <span class="p">(</span><span class="nf">r/reduce</span> <span class="nv">mean</span> <span class="nv">numbers</span><span class="p">))</span> <span class="c1">; =&gt; 9999999/2</span> </code></pre> <p>Or we can use <code>transduce</code>, which goes one step further and calls the single-arity form to transform the accumulator. 
Transduce also takes a <code>transducer</code>: a function which takes a reducing function and transforms it into some other reducing function. In this case we’ll use <code>identity</code>–<code>mean</code> is exactly what we want.</p> <pre><code><span></span><span class="p">(</span><span class="nf">transduce</span> <span class="nb">identity </span><span class="nv">mean</span> <span class="nv">numbers</span><span class="p">)</span> <span class="c1">; =&gt; 9999999/2</span> </code></pre> <p>The problem with all of these approaches is that <code>mean</code> is slow:</p> <pre><code><span></span><span class="nv">user=&gt;</span> <span class="p">(</span><span class="nf">bench</span> <span class="p">(</span><span class="nf">transduce</span> <span class="nb">identity </span><span class="nv">mean</span> <span class="nv">numbers</span><span class="p">))</span> <span class="nv">Evaluation</span> <span class="nb">count </span><span class="err">:</span> <span class="mi">60</span> <span class="nv">in</span> <span class="mi">60</span> <span class="nv">samples</span> <span class="nv">of</span> <span class="mi">1</span> <span class="nv">calls.</span> <span class="nv">Execution</span> <span class="nb">time </span><span class="nv">mean</span> <span class="err">:</span> <span class="mf">1.169762</span> <span class="nv">sec</span> <span class="nv">Execution</span> <span class="nb">time </span><span class="nv">std-deviation</span> <span class="err">:</span> <span class="mf">46.845286</span> <span class="nv">ms</span> <span class="nv">Execution</span> <span class="nb">time </span><span class="nv">lower</span> <span class="nv">quantile</span> <span class="err">:</span> <span class="mf">1.139851</span> <span class="nv">sec</span> <span class="p">(</span> <span class="mf">2.5</span><span class="nv">%</span><span class="p">)</span> <span class="nv">Execution</span> <span class="nb">time </span><span class="nv">upper</span> <span class="nv">quantile</span> <span class="err">:</span> <span 
class="mf">1.311480</span> <span class="nv">sec</span> <span class="p">(</span><span class="mf">97.5</span><span class="nv">%</span><span class="p">)</span> <span class="nv">Overhead</span> <span class="nv">used</span> <span class="err">:</span> <span class="mf">20.346327</span> <span class="nv">ns</span> </code></pre> <p>So that’s 1.169 seconds to find the mean of 10^7 elements; about 8.5 MHz. What’s going on here? Let’s try <a href="https://www.yourkit.com/">Yourkit</a>:</p> <p><img class="attachment pure-img" src="/data/posts/363/tuple-time.png" alt="A Yourkit screenshot of this benchmark" title="A Yourkit screenshot of this benchmark"></p> <p>Yikes. We’re burning 42% of our total time in destructuring (<code>nth</code>) and creating (<code>LazilyPersistentVector.createOwning</code>) those vector pairs. Also Clojure vectors aren’t primitive, so these are all instances of Long.</p> <h2><a href="#faster-reducers" id="faster-reducers">Faster Reducers</a></h2> <p>Check this out:</p> <pre><code><span></span><span class="p">(</span><span class="nf">require</span> <span class="o">'</span><span class="p">[</span><span class="nv">dom-top.core</span> <span class="ss">:refer</span> <span class="p">[</span><span class="nv">reducer</span><span class="p">]])</span> <span class="p">(</span><span class="k">def </span><span class="nv">mean</span> <span class="s">&quot;A reducer to find the mean of a collection.&quot;</span> <span class="p">(</span><span class="nf">reducer</span> <span class="p">[</span><span class="nv">sum</span> <span class="mi">0</span>, <span class="nb">count </span><span class="mi">0</span><span class="p">]</span> <span class="c1">; Starting with a sum of 0 and count of 0,</span> <span class="p">[</span><span class="nv">x</span><span class="p">]</span> <span class="c1">; Take each element, x, and:</span> <span class="p">(</span><span class="nf">recur</span> <span class="p">(</span><span class="nb">+ </span><span class="nv">sum</span> <span 
class="nv">x</span><span class="p">)</span> <span class="c1">; Compute a new sum</span> <span class="p">(</span><span class="nb">inc </span><span class="nv">count</span><span class="p">))</span> <span class="c1">; And count</span> <span class="p">(</span><span class="nb">/ </span><span class="nv">sum</span> <span class="nv">count</span><span class="p">)))</span> <span class="c1">; Finally dividing sum by count</span> </code></pre> <p>The <code>reducer</code> macro takes a binding vector, like <code>loop</code>, with variables and their initial values. Then it takes a binding vector for an element of the collection. With each element it invokes the third form, which, like <code>loop</code>, recurs with new values for each accumulator variable. The optional fourth form is called at the end of the reduction, and transforms the accumulators into a final return value. You might recognize this syntax from <a href="https://aphyr.com/posts/360-loopr-a-loop-reduction-macro-for-clojure">loopr</a>.</p> <p>Under the hood, <code>reducer</code> <a href="https://github.com/aphyr/dom-top/blob/7f5d429593b42132b71c63f11d0fe8ae5b70ca68/src/dom_top/core.clj#L915-L957">dynamically compiles</a> an accumulator class with two fields: one for <code>sum</code> and one for <code>count</code>. It’ll re-use an existing accumulator class if it has one of the same shape handy. It then constructs a <a href="https://github.com/aphyr/dom-top/blob/7f5d429593b42132b71c63f11d0fe8ae5b70ca68/src/dom_top/core.clj#L1059-L1112">reducing function</a> which takes an instance of that accumulator class, and rewrites any use of <code>sum</code> and <code>count</code> in the iteration form into direct field access. 
Instead of constructing a fresh instance of the accumulator every time through the loop, it mutates the accumulator fields in-place, reducing GC churn.</p> <pre><code><span></span><span class="nv">user=&gt;</span> <span class="p">(</span><span class="nf">bench</span> <span class="p">(</span><span class="nf">transduce</span> <span class="nb">identity </span><span class="nv">mean</span> <span class="nv">numbers</span><span class="p">))</span> <span class="nv">Evaluation</span> <span class="nb">count </span><span class="err">:</span> <span class="mi">120</span> <span class="nv">in</span> <span class="mi">60</span> <span class="nv">samples</span> <span class="nv">of</span> <span class="mi">2</span> <span class="nv">calls.</span> <span class="nv">Execution</span> <span class="nb">time </span><span class="nv">mean</span> <span class="err">:</span> <span class="mf">753.896407</span> <span class="nv">ms</span> <span class="nv">Execution</span> <span class="nb">time </span><span class="nv">std-deviation</span> <span class="err">:</span> <span class="mf">10.828236</span> <span class="nv">ms</span> <span class="nv">Execution</span> <span class="nb">time </span><span class="nv">lower</span> <span class="nv">quantile</span> <span class="err">:</span> <span class="mf">740.657627</span> <span class="nv">ms</span> <span class="p">(</span> <span class="mf">2.5</span><span class="nv">%</span><span class="p">)</span> <span class="nv">Execution</span> <span class="nb">time </span><span class="nv">upper</span> <span class="nv">quantile</span> <span class="err">:</span> <span class="mf">778.701155</span> <span class="nv">ms</span> <span class="p">(</span><span class="mf">97.5</span><span class="nv">%</span><span class="p">)</span> <span class="nv">Overhead</span> <span class="nv">used</span> <span class="err">:</span> <span class="mf">20.346327</span> <span class="nv">ns</span> </code></pre> <p>Getting rid of the vector destructuring makes the reduction ~35% faster. 
I’ve seen real-world speedups in my reductions of 50% or more.</p> <h2><a href="#primitives" id="primitives">Primitives</a></h2> <p>Since we control class generation, we can take advantage of type hints to compile accumulators with primitive fields:</p> <pre><code><span></span><span class="p">(</span><span class="k">def </span><span class="nv">mean</span> <span class="p">(</span><span class="nf">reducer</span> <span class="p">[</span><span class="o">^</span><span class="nb">long </span><span class="nv">sum</span> <span class="mi">0</span>, <span class="o">^</span><span class="nb">long count </span><span class="mi">0</span><span class="p">]</span> <span class="p">[</span><span class="o">^</span><span class="nb">long </span><span class="nv">x</span><span class="p">]</span> <span class="p">(</span><span class="nf">recur</span> <span class="p">(</span><span class="nb">+ </span><span class="nv">sum</span> <span class="p">(</span><span class="nb">long </span><span class="nv">x</span><span class="p">))</span> <span class="p">(</span><span class="nb">inc </span><span class="nv">count</span><span class="p">))</span> <span class="p">(</span><span class="nb">/ </span><span class="nv">sum</span> <span class="nv">count</span><span class="p">)))</span> </code></pre> <p>Since <code>reduce</code>, <code>transduce</code>, etc. call our reducer with Object, rather than Long, we still have to unbox each element–but we <em>do</em> avoid boxing on the iteration variables, at least. A primitive-aware reduce could do better.</p> <p><img class="attachment pure-img" src="/data/posts/363/reducer-prim.png" alt="A Yourkit screenshot showing almost all our time is spent in unboxing incoming longs. The addition and destructuring costs are gone." title="A Yourkit screenshot showing almost all our time is spent in unboxing incoming longs. 
The addition and destructuring costs are gone."></p> <p>This gives us a 43% speedup over the standard pair reducer approach.</p> <pre><code><span></span><span class="nv">user=&gt;</span> <span class="p">(</span><span class="nf">bench</span> <span class="p">(</span><span class="nf">transduce</span> <span class="nb">identity </span><span class="nv">mean</span> <span class="nv">numbers</span><span class="p">))</span> <span class="nv">Evaluation</span> <span class="nb">count </span><span class="err">:</span> <span class="mi">120</span> <span class="nv">in</span> <span class="mi">60</span> <span class="nv">samples</span> <span class="nv">of</span> <span class="mi">2</span> <span class="nv">calls.</span> <span class="nv">Execution</span> <span class="nb">time </span><span class="nv">mean</span> <span class="err">:</span> <span class="mf">660.420219</span> <span class="nv">ms</span> <span class="nv">Execution</span> <span class="nb">time </span><span class="nv">std-deviation</span> <span class="err">:</span> <span class="mf">14.255986</span> <span class="nv">ms</span> <span class="nv">Execution</span> <span class="nb">time </span><span class="nv">lower</span> <span class="nv">quantile</span> <span class="err">:</span> <span class="mf">643.583522</span> <span class="nv">ms</span> <span class="p">(</span> <span class="mf">2.5</span><span class="nv">%</span><span class="p">)</span> <span class="nv">Execution</span> <span class="nb">time </span><span class="nv">upper</span> <span class="nv">quantile</span> <span class="err">:</span> <span class="mf">692.084752</span> <span class="nv">ms</span> <span class="p">(</span><span class="mf">97.5</span><span class="nv">%</span><span class="p">)</span> <span class="nv">Overhead</span> <span class="nv">used</span> <span class="err">:</span> <span class="mf">20.346327</span> <span class="nv">ns</span> </code></pre> <h2><a href="#early-return" id="early-return">Early Return</a></h2> <p>The <code>reducer</code> macro supports early 
return. If the iteration form does not <code>recur</code>, it skips the remaining elements and returns immediately. Here, we find the mean of at most 1000 elements:</p> <pre><code><span></span><span class="p">(</span><span class="k">def </span><span class="nv">mean</span> <span class="p">(</span><span class="nf">reducer</span> <span class="p">[</span><span class="o">^</span><span class="nb">long </span><span class="nv">sum</span> <span class="mi">0</span>, <span class="o">^</span><span class="nb">long count </span><span class="mi">0</span> <span class="ss">:as</span> <span class="nv">acc</span><span class="p">]</span> <span class="c1">; In the final form, call the accumulator `acc`</span> <span class="p">[</span><span class="o">^</span><span class="nb">long </span><span class="nv">x</span><span class="p">]</span> <span class="p">(</span><span class="k">if </span><span class="p">(</span><span class="nb">= count </span><span class="mi">1000</span><span class="p">)</span> <span class="p">(</span><span class="nb">/ </span><span class="nv">sum</span> <span class="nv">count</span><span class="p">)</span> <span class="c1">; Early return</span> <span class="p">(</span><span class="nf">recur</span> <span class="p">(</span><span class="nb">+ </span><span class="nv">sum</span> <span class="nv">x</span><span class="p">)</span> <span class="p">(</span><span class="nb">inc </span><span class="nv">count</span><span class="p">)))</span> <span class="c1">; Keep going</span> <span class="p">(</span><span class="k">if </span><span class="p">(</span><span class="nf">number?</span> <span class="nv">acc</span><span class="p">)</span> <span class="c1">; Depending on whether we returned early...</span> <span class="nv">acc</span> <span class="p">(</span><span class="k">let </span><span class="p">[[</span><span class="nv">sum</span> <span class="nv">count</span><span class="p">]</span> <span class="nv">acc</span><span class="p">]</span> <span class="p">(</span><span class="nb">/ 
</span><span class="nv">sum</span> <span class="nv">count</span><span class="p">)))))</span> </code></pre> <p>For consistency with <code>transduce</code>, both normal and early return accumulators are passed to the final form. Because the early-returned value might be of a different type, <code>reducer</code> lets you declare a single variable for the accumulators in the final form–here, we call it <code>acc</code>. Depending on whether we return early or normally, <code>acc</code> will either be a number or a vector of all accumulators: <code>[sum count]</code>.</p> <h2><a href="#available-now" id="available-now">Available Now</a></h2> <p>I’ve been using <code>reducer</code> extensively in <a href="https://jepsen.io/">Jepsen</a> over the last six months, and I’ve found it very helpful! It’s used in many of our checkers, and helps find anomalies in histories of database operations more quickly.</p> <p>If you’d like to try this for yourself, you’ll find <code>reducer</code> in <a href="https://github.com/aphyr/dom-top">dom-top</a>, my library for unorthodox control flow. 
It’s available on Clojars.</p> <p><em>A special thanks to Justin Conklin, who was instrumental in getting the bytecode generation and dynamic classloading <code>reducer</code> needs to work!</em></p></content> </entry> <entry> <id>https://aphyr.com/posts/361-monkeypox-in-ohio</id> <title>Monkeypox in Ohio</title> <published>2022-07-22T10:43:34-05:00</published> <updated>2022-07-22T10:43:34-05:00</updated> <link rel="alternate" href="https://aphyr.com/posts/361-monkeypox-in-ohio"></link> <author> <name>Aphyr</name> <uri>https://aphyr.com/</uri> </author> <content type="html"><p><em>Update 2022-08-12: The Hamilton County Health Department now has a <a href="https://www.hamiltoncountyhealth.org/services/programs/community-health-services-disease-prevention/monkeypox/">page about monkeypox</a> with symptoms and isolation guidance, as well as options for vaccination, testing, and treatment–look for “complete our monkeypox vaccine registration”. The Cincinnati Health Department is <a href="https://www.cincinnati-oh.gov/health/news/monkeypox-vaccination/">also offering vaccines</a> for high-risk groups. People in Hamilton County without a primary care physician who have symptoms can also call 513-357-7320 for the Cincinnati city health clinic.</em></p> <p>If you’re a gregarious gay man like me you’ve probably heard about monkeypox. <a href="https://www.cdc.gov/poxvirus/monkeypox/index.html">Monkeypox is an orthopoxvirus</a> which causes, in addition to systemic symptoms, lesions on the skin and mucosa. It’s transmitted primarily through skin-to-skin contact, though close-range droplet and fomite transfer are also possible. The current outbreak in industrialized nations is <a href="https://mobile.twitter.com/mugecevik/status/1549858617800753154">almost entirely</a> among gay, bisexual, and other men who have sex with men (GBMSM); likely via sexual networks. 
<a href="https://www.gov.uk/government/publications/monkeypox-outbreak-technical-briefings/investigation-into-monkeypox-outbreak-in-england-technical-briefing-3">In the UK, for example</a>, 99% of cases are male and 97% are among GBMSM. <a href="https://www.publichealthontario.ca/-/media/Documents/M/2022/monkeypox-episummary.pdf?sc_lang=en">Ontario reports</a> 99% of cases are in men. In New York <a href="https://www1.nyc.gov/assets/doh/downloads/pdf/han/advisory/2022/monkeypox-update-testing-treatment.pdf">99% of cases are in men who have sex with men</a>. For a good overview of what monkeypox looks like, how it’s spread, and ways we can reduce transmission, check out San Francisco Leathermen’s Discussion Group’s <a href="https://www.youtube.com/watch?v=iLninLYkXDc">presentation by MPH Frank Strona</a>.</p> <p>Earlier outbreaks of monkeypox have been generally self-limiting, but that does not appear to be the case with the current outbreak; perhaps because it’s spreading among men who enjoy lots of skin-to-skin contact with lots of people, and often travel to do just that. We saw cases come out of Darklands, IML, and Daddyland; they’re also rising quickly in major cities across the US. Cases haven’t taken off in Ohio yet, but for those of us who travel to large gay events it’s an increasing risk. Two of my friends tested positive last week; one landed in the ER. We’ve known for a while about symptoms and risk reduction tactics, so I won’t cover those here, but I <em>do</em> want to talk about vaccines and testing.</p> <p>There is a vaccine (in the US, <a href="https://www.cdc.gov/mmwr/volumes/71/wr/mm7122e1.htm">JYNNEOS</a>) for monkeypox, but we have nowhere near enough doses. Some cities like SF, New York, and Chicago have pop-up mass vaccination sites. 
The Ohio Department of Health, on the other hand, has been <a href="https://ohiocapitaljournal.com/2022/07/13/monkeypox-is-spreading-but-the-ohio-department-of-health-hasnt-spread-the-message/">basically silent on the outbreak</a>: two months in there’s no public messaging about risk factors, prevention, testing, or treatment. <a href="https://aspr.hhs.gov/SNS/Pages/JYNNEOS-Distribution.aspx">Only ten doses</a> have been distributed in the state. Many of my friends here have no idea how to get vaccinated.</p> <p>Through an incredible stroke of luck and a truly fantastic doctor, I got my first dose yesterday. I followed up by chatting with our county’s (super nice!) head epidemiologist, and asking what advice I can give to the local queer leather community. Here’s the scoop:</p> <h2><a href="#getting-vaccinated-in-ohio" id="getting-vaccinated-in-ohio">Getting Vaccinated in Ohio</a></h2> <p>Because of the limited supply of vaccine doses, we’re currently focusing vaccination on those at high risk. Ohio’s current high-risk category includes:</p> <ul> <li>Anyone who has had close intimate contact (sex, cuddling, making out, etc.) with a known monkeypox case</li> <li>Exposure to respiratory droplets (e.g. dancing close, kissing) with a known case</li> </ul> <p>The problem is that many of us don’t <em>know</em> we’ve been in contact with monkeypox cases: of the people I know who have tested positive, none knew where they got it from. You might dance, make out with, or fuck dozens of people at an event weekend. So (our epidemiologist advised me) your participation in events where people have lots of sex or skin-to-skin contact can <em>also</em> be considered a risk factor. In other states having <a href="https://www1.nyc.gov/site/doh/health/health-topics/monkeypox.page">multiple or anonymous partners in a two-week span</a> is a qualifying criterion; that might be worth a shot here too.</p> <p>There are no public vaccination sites in Ohio yet, as far as I know. 
Doses are distributed to individuals through their primary care physician (which poses a problem if you <em>don’t</em> have one, oof). You should start by talking to your doctor about your risk factors: for me that was “I have two friends who tested positive, I just did bear week in ptown, I play regularly, and I’m going to be participating in a large leather run in a few weeks.” If your doctor concurs you’re at high risk, they should get in touch with their usual contact at the county or city department of health. The department of health gathers information about risk factors and contact information, and then gets in touch with the Ohio department of health, which requisitions doses from the CDC and ships them to your doctor. For me it took three days from “talking to my doctor” to “first dose in arm”.</p> <p>The JYNNEOS shot is fairly easy: it’s a single sub-cutaneous injection with a thin needle into the fat on your arm–much better than the old orthopoxvirus vaccines. I barely felt it. The usual vaccine side effects apply: you <a href="https://www.fda.gov/media/131078/download">might have some redness, swelling, or pain</a> at the injection site; other common side effects include low-grade muscle pain, headache, and fatigue. I’ve noticed some redness and a little bit of pain, but it’s been super mild. You need to come back for a second dose four weeks later, and protection is maximized two weeks after the second dose.</p> <p>When you get vaccinated, there’s a 21-day observation period where you need to report monkeypox-related symptoms via phone, text, or email to the health department–they’ll get in touch with you about this.</p> <h2><a href="#getting-tested-in-ohio" id="getting-tested-in-ohio">Getting Tested in Ohio</a></h2> <p>There’s no dedicated testing system in place yet, and I know friends have had really mixed luck getting tested through their primaries, urgent care, and even some hospitals. 
That said, our county epidemiologist says that both the state lab and some private labs–notably LabCorp–can test for monkeypox. Theoretically any primary care provider, urgent care, or emergency room should be able to do the test: they take a swab of your lesions and ship it off to the lab for analysis. In Cincinnati, Equitas Health (2805 Gilbert Ave, Cincinnati, OH 45206; 513-815-4475) should be able to test–call ahead and check before showing up though.</p> <p>When you get tested, you should isolate until you get results back. If your results are negative, talk to your primary doctor about what to do: it may be some other kind of illness. If the test is positive, you’ll have to isolate until no longer contagious, which means the lesions need to scab over and heal–you should see new skin underneath. Your contacts do not need to isolate unless they show symptoms, but the county will start contact tracing, and just in case, I think it’d be a good idea for you to reach out to intimate contacts personally.</p> <h2><a href="#mileage-may-vary" id="mileage-may-vary">Mileage May Vary</a></h2> <p>Right now I’m the only person I know who’s gotten vaccinated here, and I haven’t needed to look for testing, so I don’t know what’s going to happen when other people start trying to access care. A friend just told me that his doctor didn’t think there <em>was</em> a vaccine: if this happens to you, try showing them the CDC’s <a href="https://www.cdc.gov/mmwr/volumes/71/wr/mm7122e1.htm">MMWR report on JYNNEOS</a>. If they can’t get you JYNNEOS, <a href="https://www.cdc.gov/poxvirus/monkeypox/clinicians/smallpox-vaccine.html">ACAM2000</a> is also available–it involves worse side effects and may not be a good fit if you have a weakened immune system, but for many people it should still offer significant protection against monkeypox. 
If your doctor still doesn’t get it, ask them to contact the local health dept anyway.</p> <p>In Dayton, one friend’s primary doctor states the only way to get vaccinated is at the ER following a possible exposure. In NE Ohio, another reader’s primary said to contact the health dept directly; they’ve tried, but so far no response.</p> <p>If you’re still having trouble getting tested or vaccinated–your doctor doesn’t understand, or they can’t find the right contact at the health department–and we’re friends locally, get in touch. I’m <a href="mailto:[email protected]">[email protected]</a>. I may be able to put you in touch with a doctor and/or health department staff who can help.</p></content> </entry> </feed>
<?xml version="1.0" encoding="UTF-8"?><feed xmlns="http://www.w3.org/2005/Atom"><id>https://aphyr.com/</id><title>Aphyr: Posts</title><updated>2025-02-21T18:08:31-05:00</updated><link href="https://aphyr.com/"></link><link rel="self" href="https://aphyr.com/posts.atom"></link><entry><id>https://aphyr.com/posts/380-comments-on-executive-order-14168</id><title>Comments on Executive Order 14168</title><published>2025-02-21T18:04:55-05:00</published><updated>2025-02-21T18:04:55-05:00</updated><link rel="alternate" href="https://aphyr.com/posts/380-comments-on-executive-order-14168"></link><author><name>Aphyr</name><uri>https://aphyr.com/</uri></author><content type="html"><p><em>Submitted to the Department of State, which is <a href="https://www.federalregister.gov/documents/2025/02/18/2025-02696/30-day-notice-of-proposed-information-collection-application-for-a-us-passport-for-eligible">requesting comments</a> on a proposed change which would align US passport gender markers with <a href="https://www.whitehouse.gov/presidential-actions/2025/01/defending-women-from-gender-ideology-extremism-and-restoring-biological-truth-to-the-federal-government/">executive order 14168</a>.</em></p> <p>Executive order 14168 is biologically incoherent and socially cruel. All passport applicants should be allowed to select whatever gender markers they feel best fit, including M, F, or X.</p> <p>In humans, neither sex nor gender is binary at any level. There are several possible arrangements of sex chromosomes: X, XX, XY, XXY, XYY, XXX, tetrasomies, pentasomies, etc. A single person can contain a mosaic of cells with different genetics: some XX, some XYY. Chromosomes may not align with genitalia: people with XY chromosomes may have a vulva and internal testes. People with XY chromosomes and a small penis may be surgically and socially reassigned female at birth—and never told what happened. 
None of these biological dimensions necessarily align with one’s internal concept of gender, or one’s social presentation.</p> <p>The executive order has no idea how biology works. It defines “female” as “a person belonging, at conception, to the sex that produces the large reproductive cell”. Zygotes do not produce reproductive cells at all: under this order none of us have a sex. Oogenesis doesn’t start until over a month into embryo development. Even if people were karyotyping their zygotes immediately after conception so they could tell what “legal” sex they were going to be, they could be wrong: which gametes we produce depends on the formation of the genital ridge.</p> <p>All this is to say that if people fill out these forms using this definition of sex, they’re guessing at a question which is both impossible to answer and socially irrelevant. You might be one of the roughly two percent of humans born with an uncommon sexual development and not even know it. Moreover, the proposed change fundamentally asks the wrong question: gender markers on passports are used by border control agents, and are expected to align with how those agents read the passport holder’s gender. A mismatch will create needless intimidation and hardship for travelers.</p> <p>Of course most of us will not have our identities challenged under this order. That animus is reserved for trans people, for gender-non-conforming people, for anyone whose genetics, body, dress, voice, or mannerisms don’t quite fit the mold. Those are the people who will suffer under this order. 
That cruelty should be resisted.</p> </content></entry><entry><id>https://aphyr.com/posts/379-geoblocking-the-uk-with-debian-nginx</id><title>Geoblocking the UK with Debian & Nginx</title><published>2025-02-20T14:45:55-05:00</published><updated>2025-02-20T14:45:55-05:00</updated><link rel="alternate" href="https://aphyr.com/posts/379-geoblocking-the-uk-with-debian-nginx"></link><author><name>Aphyr</name><uri>https://aphyr.com/</uri></author><content type="html"><p>A few quick notes for other folks who are <a href="https://geoblockthe.uk">geoblocking the UK</a>. I just set up a basic geoblock with Nginx on Debian. This is all stuff you can piece together, but the MaxMind and Nginx docs are a little vague about the details, so I figure it’s worth an actual writeup. My Nginx expertise is ~15 years out of date, so this might not be The Best Way to do things. YMMV.</p> <p>First, register for a free <a href="https://www.maxmind.com/en/geolite2/signup">MaxMind account</a>; you’ll need this to subscribe to their GeoIP database. Then install <code>geoipupdate</code>, which maintains a local copy of the lookup database, along with Nginx’s GeoIP2 module:</p> <pre><code>apt install geoipupdate libnginx-mod-http-geoip2 </code></pre> <p>Create a license key on the MaxMind site, and download a copy of the config file you’ll need. Drop that in <code>/etc/GeoIP.conf</code>. It’ll look like:</p> <pre><code>AccountID XXXX LicenseKey XXXX EditionIDs GeoLite2-Country </code></pre> <p>The package sets up a cron job automatically, but we should grab an initial copy of the file. This takes a couple minutes, and writes out <code>/var/lib/GeoIP/GeoLite2-Country.mmdb</code>:</p> <pre><code>geoipupdate </code></pre> <p>The GeoIP2 module should already be loaded via <code>/etc/nginx/modules-enabled/50-mod-http-geoip2.conf</code>. Add a new config snippet like <code>/etc/nginx/conf.d/geoblock.conf</code>. 
The first part tells Nginx where to find the GeoIP database file, and then extracts the two-letter ISO country code for each request as a variable. The <code>map</code> part sets up an <code>$osa_geoblocked</code> variable, which is set to <code>1</code> for GB, otherwise <code>0</code>.</p> <pre><code>geoip2 /var/lib/GeoIP/GeoLite2-Country.mmdb { $geoip2_data_country_iso_code country iso_code; } map $geoip2_data_country_iso_code $osa_geoblocked { GB 1; default 0; } </code></pre> <p>Write an HTML file somewhere like <code>/var/www/custom_errors/osa.html</code>, explaining the block. Then serve that page for HTTP 451 status codes: in <code>/etc/nginx/sites-enabled/whatever</code>, add:</p> <pre><code>server { ... # UK OSA error page error_page 451 /osa.html; location /osa.html { internal; root /var/www/custom_errors/; } # When geoblocked, return 451 location / { if ($osa_geoblocked = 1) { return 451; } } } </code></pre> <p>Test your config with <code>nginx -t</code>, and then <code>service nginx reload</code>. You can test how things look from the UK using a VPN service, or something like <a href="https://www.locabrowser.com/">locabrowser</a>.</p> <p>This is, to be clear, a bad solution. MaxMind’s free database is not particularly precise, and in general IP lookup tables are chasing a moving target. I know for a fact that there are people in non-UK countries (like Ireland!) who have been inadvertently blocked by these lookup tables. 
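If you want to check which country the database assigns to a specific address, you can query it directly. A sketch, assuming Debian’s <code>mmdb-bin</code> package, which provides the <code>mmdblookup</code> CLI; substitute an address you actually care about, since documentation addresses like the one below generally won’t be in the database:

```shell
# Install the MaxMind database lookup tool
apt install mmdb-bin

# Ask the GeoLite2 database which country it thinks an address is in
mmdblookup --file /var/lib/GeoIP/GeoLite2-Country.mmdb \
  --ip 203.0.113.7 country iso_code
```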
Making those people use Tor or a VPN <em>sucks</em>, but I don’t know what else to do in the current regulatory environment.</p> </content></entry><entry><id>https://aphyr.com/posts/378-seconds-since-the-epoch</id><title>Seconds Since the Epoch</title><published>2024-12-25T13:46:21-05:00</published><updated>2024-12-25T13:46:21-05:00</updated><link rel="alternate" href="https://aphyr.com/posts/378-seconds-since-the-epoch"></link><author><name>Aphyr</name><uri>https://aphyr.com/</uri></author><content type="html"><p>This is not at all news, but it comes up often enough that I think there should be a concise explanation of the problem. People, myself included, like to say that POSIX time, also known as Unix time, is the <a href="https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Date">number</a> <a href="https://www.gnu.org/software/coreutils/manual/html_node/Seconds-since-the-Epoch.html">of</a> <a href="https://man7.org/linux/man-pages/man2/time.2.html">seconds</a> <a href="https://pkg.go.dev/time#Unix">since</a> <a href="https://dev.mysql.com/doc/refman/8.4/en/datetime.html">the</a> <a href="https://ruby-doc.org/core-3.0.0/Time.html">Unix</a> <a href="https://docs.datastax.com/en/cql-oss/3.x/cql/cql_reference/timestamp_type_r.html">epoch</a>, which was 1970-01-01 at 00:00:00.</p> <p>This is not true. Or rather, it isn’t true in the sense most people think. For example, it is presently 2024-12-25 at 18:51:26 UTC. The POSIX time is 1735152686. It has been 1735152713 seconds since the POSIX epoch. The POSIX time number is twenty-seven seconds lower.</p> <p>This is because POSIX time is derived <a href="https://nvlpubs.nist.gov/nistpubs/Legacy/FIPS/fipspub151-1.pdf">in IEEE 1003.1</a> from <a href="https://en.wikipedia.org/wiki/Coordinated_Universal_Time">Coordinated Universal Time</a>. The standard assumes that every day is exactly 86,400 seconds long. 
Specifically:</p> <blockquote> <p>The <em>time()</em> function returns the value of time in <b>seconds since the Epoch</b>.</p> </blockquote> <p>Which is defined as:</p> <blockquote> <p><b>seconds since the Epoch.</b> A value to be interpreted as the number of seconds between a specified time and the Epoch. A Coordinated Universal Time name (specified in terms of seconds (<em>tm_sec</em>), minutes (<em>tm_min</em>), hours (<em>tm_hour</em>), days since January 1 of the year (<em>tm_yday</em>), and calendar year minus 1900 (<em>tm_year</em>)) is related to a time represented as <em>seconds since the Epoch</em> according to the expression below.</p> <p>If year &lt; 1970 or the value is negative, the relationship is undefined. If year ≥ 1970 and the value is non-negative, the value is related to a Coordinated Universal Time name according to the expression:</p> <p><em>tm_sec</em> + <em>tm_min</em> * 60 + <em>tm_hour</em> * 3600 + <em>tm_yday</em> * 86400 + (<em>tm_year</em>-70) * 31536000 + ((<em>tm_year</em> - 69) / 4) * 86400</p> </blockquote> <p>The length of the day is not 86,400 seconds, and in fact changes over time. To keep UTC days from drifting too far from solar days, astronomers periodically declare a <a href="https://en.wikipedia.org/wiki/Leap_second">leap second</a> in UTC. Consequently, every few years POSIX time jumps backwards, <a href="https://marc.info/?l=linux-kernel&amp;m=134113577921904">wreaking</a> <a href="https://www.zdnet.com/article/qantas-suffers-delays-due-to-linux-leap-second-bug/">utter</a> <a href="https://blog.cloudflare.com/how-and-why-the-leap-second-affected-cloudflare-dns/">havoc</a>. Someday it might jump forward.</p> <h2><a href="#archaeology" id="archaeology">Archaeology</a></h2> <p>Appendix B of IEEE 1003 has a fascinating discussion of leap seconds:</p> <blockquote> <p>The concept of leap seconds is added for precision; at the time this standard was published, 14 leap seconds had been added since January 1, 1970. 
These 14 seconds are ignored to provide an easy and compatible method of computing time differences.</p> </blockquote> <p>I, too, love to ignore things to make my life easy. The standard authors knew “seconds since the epoch” were not, in fact, seconds since the epoch. And they admit as much:</p> <blockquote> <p>Most systems’ notion of “time” is that of a continuously-increasing value, so this value should increase even during leap seconds. However, not only do most systems not keep track of leap seconds, but most systems are probably not synchronized to any standard time reference. Therefore, it is inappropriate to require that a time represented as seconds since the Epoch precisely represent the number of seconds between the referenced time and the Epoch.</p> <p>It is sufficient to require that applications be allowed to treat this time as if it represented the number of seconds between the referenced time and the Epoch. It is the responsibility of the vendor of the system, and the administrator of the system, to ensure that this value represents the number of seconds between the referenced time and the Epoch as closely as necessary for the application being run on that system….</p> </blockquote> <p>I imagine there was some debate over this point. The appendix punts, saying that vendors and administrators must make time align “as closely as necessary”, and that “this value should increase even during leap seconds”. The latter is achievable, but the former is arguably impossible: the standard requires POSIX clocks be twenty-seven seconds off.</p> <blockquote> <p>Consistent interpretation of seconds since the Epoch can be critical to certain types of distributed applications that rely on such timestamps to synchronize events. The accrual of leap seconds in a time standard is not predictable. The number of leap seconds since the Epoch will likely increase. 
The standard is more concerned about the synchronization of time between applications of astronomically short duration and the Working Group expects these concerns to become more critical in the future.</p> </blockquote> <p>In a sense, the opposite happened. Time synchronization is <em>always</em> off, so systems generally function (however incorrectly) when times drift a bit. But leap seconds are rare, and the linearity evoked by the phrase “seconds since the epoch” is so deeply baked into our intuition that software can accrue serious, unnoticed bugs. Until, a few years later, one of those tiny little leap seconds takes down a big chunk of the internet.</p> <h2><a href="#what-to-do-instead" id="what-to-do-instead">What To Do Instead</a></h2> <p>If you just need to compute the duration between two events on one computer, use <a href="https://docs.redhat.com/en/documentation/red_hat_enterprise_linux_for_real_time/7/html/reference_guide/sect-posix_clocks#sect-POSIX_clocks"><code>CLOCK_MONOTONIC</code></a>, or better yet, <code>CLOCK_BOOTTIME</code>. If you don’t need to exchange timestamps with other systems that assume POSIX time, use <a href="https://www.ipses.com/eng/in-depth-analysis/standard-of-time-definition/">TAI, GPS, or maybe LORAN</a>. If you do need rough alignment with other POSIX-timestamp systems, <a href="https://developers.google.com/time/smear">smear leap seconds</a> over a longer window of time. Libraries like <a href="https://github.com/qntm/t-a-i">qntm’s t-a-i</a> can convert back and forth between POSIX and TAI.</p> <p>There’s an ongoing effort to <a href="https://www.timeanddate.com/news/astronomy/end-of-leap-seconds-2022">end leap seconds</a>, hopefully <a href="https://www.bipm.org/documents/20126/64811223/Resolutions-2022.pdf/281f3160-fc56-3e63-dbf7-77b76500990f">by 2035</a>. 
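The standard’s expression quoted earlier is easy to check directly. A minimal Python sketch (the function name is mine; the example values are the ones from the start of this post):

```python
def posix_time(tm_year, tm_yday, tm_hour, tm_min, tm_sec):
    """The expression quoted from the standard: every day is assumed
    to be exactly 86,400 seconds, so leap seconds simply vanish.
    tm_year is calendar year minus 1900; tm_yday is days since Jan 1."""
    return (tm_sec
            + tm_min * 60
            + tm_hour * 3600
            + tm_yday * 86400
            + (tm_year - 70) * 31536000
            + ((tm_year - 69) // 4) * 86400)

# 2024-12-25 18:51:26 UTC: tm_year = 124, tm_yday = 359
print(posix_time(124, 359, 18, 51, 26))  # 1735152686
```

That’s the POSIX timestamp from the start of this post: twenty-seven true seconds short of the actual elapsed time since the epoch.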
It’ll require additional work to build conversion tables into everything that relies on the “86,400 seconds per day” assumption, but it should also make it much simpler to ask questions like “how many seconds between these two times”. At least for times after 2035!</p> </content></entry><entry><id>https://aphyr.com/posts/371-threads-wont-take-you-south-of-market</id><title>Threads Won't Take You South of Market</title><published>2024-12-01T10:01:36-05:00</published><updated>2024-12-01T10:01:36-05:00</updated><link rel="alternate" href="https://aphyr.com/posts/371-threads-wont-take-you-south-of-market"></link><author><name>Aphyr</name><uri>https://aphyr.com/</uri></author><content type="html"><p>In June 2023, when <a href="https://threads.net">Threads</a> announced their <a href="https://techcrunch.com/2023/07/05/adam-mosseri-says-metas-threads-app-wont-have-activitypub-support-at-launch/">plans to federate</a> with other <a href="https://en.wikipedia.org/wiki/Fediverse">Fediverse instances</a>, there was a good deal of <a href="https://fedipact.online/">debate</a> around whether smaller instances should allow federation or block it pre-emptively. As one of the admins of <a href="https://woof.group">woof.group</a>, I wrote about some of the <a href="https://blog.woof.group/announcements/considering-large-instance-federation">potential risks and rewards</a> of federating with Threads. We decided to <a href="https://blog.woof.group/announcements/deferring-threads-federation">wait and see</a>.</p> <p>In my queer and leather circles, Facebook and Instagram have been generally understood as hostile environments for over a decade. In 2014, their <a href="https://www.eff.org/deeplinks/2014/09/facebooks-real-name-policy-can-cause-real-world-harm-lgbtq-community">“Real Name” policy</a> made life particularly difficult for trans people, drag queens, sex workers, and people who, for various reasons, needed to keep their real name disconnected from their queer life. 
My friends have been repeatedly suspended from both platforms for showing too much skin, or using the peach emoji. Meta’s moderation has been aggressive, opaque, and wildly inconsistent: sometimes full nudity is fine; other times a kiss or swimsuit is beyond the line. In some circles, maintaining a series of backup accounts in advance of one’s ban became de rigueur.</p> <p>I’d hoped that federation between Threads and the broader Fediverse might allow a <a href="https://blog.woof.group/mods/the-shape-of-social-space">more nuanced spectrum</a> of moderation norms. Threads might opt for a more conservative environment locally, but through federation, allow their users to interact with friends on instances with more liberal norms. Conversely, most of my real-life friends are still on Meta services—I’d love to see their posts and chat with them again. Threads could communicate with Gay Fedi (using the term in the broadest sense), and de-rank or hide content they don’t like on a per-post or per-account basis.</p> <p>This world seems technically feasible. Meta reports <a href="https://techcrunch.com/2024/11/03/threads-now-has-275m-monthly-active-users/">275 million Monthly Active Users (MAUs)</a>, and over <a href="https://www.statista.com/statistics/1092227/facebook-product-dau/">three billion</a> across other Meta services. The Fediverse has something like <a href="https://fedidb.org/">one million MAUs across various instances</a>. This is not a large jump in processing or storage; nor would it seem to require a large increase in moderation staff. Threads has already committed to doing the requisite engineering, user experience, and legal work to allow federation across a broad range of instances. Meta is swimming in cash.</p> <p>All this seems a moot point. A year and a half later, Threads <a href="https://www.theverge.com/24107998/threads-fediverse-mastodon-how-to">is barely half federated</a>. 
It publishes Threads posts to the world, but only if you dig in to the settings and check the “Fediverse Sharing” box. Threads users can see replies to their posts, but can’t talk back. Threads users can’t mention others, see mentions from other people, or follow anyone outside Threads. This may work for syndication, but is essentially unusable for conversation.</p> <p>Despite the fact that Threads users can’t follow or see mentions from people on other instances, Threads has already <a href="https://www.threads.net/moderated_servers">opted to block</a> a slew of instances where gay &amp; leather people congregate. Threads blocks <a href="https://hypno.social">hypno.social</a>, <a href="rubber.social">rubber.social</a>, <a href="https://4bear.com">4bear.com</a>, <a href="https://nsfw.lgbt">nsfw.lgbt</a>, <a href="https://kinkyelephant.com">kinkyelephant.com</a>, <a href="https://kinktroet.social">kinktroet.social</a>, <a href="https://barkclub.xyz">barkclub.xyz</a>, <a href="https://mastobate.social">mastobate.social</a>, and <a href="https://kinky.business">kinky.business</a>. They also block the (now-defunct) instances <a href="https://bear.community">bear.community</a>, <a href="https://gaybdsm.group">gaybdsm.group</a>, and <a href="https://gearheads.social">gearheads.social</a>. They block more general queer-friendly instances like <a href="https://bark.lgbt">bark.lgbt</a>, <a href="https://super-gay.co">super-gay.co</a>, <a href="https://gay.camera">gay.camera</a>, and <a href="https://gaygeek.social">gaygeek.social</a>. They block sex-positive instances like <a href="https://nsfwphotography.social">nsfwphotography.social</a>, <a href="https://nsfw.social">nsfw.social</a>, and <a href="https://net4sw.com">net4sw.com</a>. All these instances are blocked for having “violated our Community Standards or Terms of Use”. 
Others like <a href="https://fisting.social">fisting.social</a>, <a href="https://mastodon.hypnoguys.com">mastodon.hypnoguys.com</a>, <a href="https://abdl.link">abdl.link</a>, <a href="https://qaf.men">qaf.men</a>, and <a href="https://social.rubber.family">social.rubber.family</a> are blocked for having “no publicly accessible feed”. I don’t know what this means: mastodon.hypnoguys.com, for instance, has the usual Mastodon <a href="https://mastodon.hypnoguys.com/public/local">publicly accessible local feed</a>.</p> <p>It’s not like these instances are hotbeds of spam, hate speech, or harassment: woof.group federates heavily with most of the servers I mentioned above, and we rarely have problems with their moderation. Most have reasonable and enforced media policies requiring sensitive-media flags for genitals, heavy play, and so on. Those policies are, generally speaking, looser than Threads’ (woof.group, for instance, allows butts!), but there are plenty of accounts and posts on these instances which would be anodyne under Threads’ rules.</p> <p>I am shocked that woof.group is <em>not</em> on Threads’ blocklist yet. We have similar users who post similar things. Our content policies are broadly similar—several of the instances Threads blocks actually adopted woof.group’s specific policy language. I doubt it’s our size: Threads blocks several instances with fewer than ten MAUs, and woof.group has over seven hundred.</p> <p>I’ve been out of the valley for nearly a decade, and I don’t have insight into Meta’s policies or decision-making. I’m sure Threads has their reasons. Whatever they are, Threads, like all of Meta’s services, feels distinctly uncomfortable with sex, and sexual expression is a vibrant aspect of gay culture.</p> <p>This is part of why I started woof.group: we deserve spaces moderated with our subculture in mind. 
But I also hoped that by designing a moderation policy which compromised with normative sensibilities, we might retain connections to a broader set of publics. This particular leather bar need not be an invite-only clubhouse; it can be a part of a walkable neighborhood. For nearly five years we’ve kept that balance, retaining open federation with most all the Fediverse. I get the sense that Threads intends to wall its users off from our world altogether—to make “bad gays” invisible. If Threads were a taxi service, it wouldn’t take you <a href="https://sfleatherdistrict.org/wp-content/uploads/2021/04/Rubin-Valley-of-Kings.pdf">South of Market</a>.</p> </content></entry><entry><id>https://aphyr.com/posts/370-ecobee-settings-for-heat-pumps-with-resistive-aux-heat</id><title>Ecobee Settings for Heat Pumps with Resistive Aux Heat</title><published>2024-02-28T23:41:38-05:00</published><updated>2024-02-28T23:41:38-05:00</updated><link rel="alternate" href="https://aphyr.com/posts/370-ecobee-settings-for-heat-pumps-with-resistive-aux-heat"></link><author><name>Aphyr</name><uri>https://aphyr.com/</uri></author><content type="html"><p>I’m in the process of replacing an old radiator system with a centrally-ducted, air-source heat pump system with electric resistive backup heat. I’ve found that the default ecobee algorithm seems to behave surprisingly poorly for this system, and wanted to write up some of the settings that I’ve found yield better behavior.</p> <p>A disclaimer. I’m not an HVAC professional. I have two decades in software operations, a background in physics, and far too much experience inferring system dynamics from timeseries graphs. This advice may void your warranty, burn your house down, etc.; everything you do is at your own risk.</p> <h2><a href="#the-system" id="the-system">The System</a></h2> <p>First, a bit about the system in question. 
You can skip this section if you know about heat pumps, short cycling, staging, etc.</p> <p>There are two main subsystems: a heat pump and an air handler. The heat pump sits outside: it has a fan which moves outside air over a heat exchanger, and a compressor, which compresses a working fluid. The working fluid is connected in a loop to the air handler, where it runs through another heat exchanger to heat or cool the inside air. The air handler also has a blower fan which circulates air through the whole house. If the heat pump can’t keep up with demand, the air handler also has a pair of resistive electric heating coils, called <em>aux heat</em>, which can supplement or take over from the heat pumps.</p> <p>A few important things to know about heat pumps. First, electric resistive heaters have a <em>Coefficient of Performance</em> (CoP) of essentially 1: they take 1 joule of electricity and turn it into 1 joule of heat in the air. My heat pumps have a typical heating CoP of about 2-4, depending on temperature and load. They take 1 joule of electricity and suck 2 to 4 joules of heat from the outside air into the inside. This means they cost 2-4 times less (in electric opex, at least) than a standard resistive electric heating system.</p> <p>Second, heat pumps, like A/C systems, shouldn’t start and stop too frequently. Starting up causes large transient electrical and mechanical stresses. Ideally they should run at a low speed for several hours, rather than running at full blast, shutting off, then turning on again ten minutes later; that rapid on-off pattern is called “short cycling”.</p> <p>Third, the heat pump’s fan, heat pump’s compressor, and the air handler’s fan are all variable-speed: they can run very slow (quiet, efficient), very fast (loud, more powerful), or at any speed in between. This helps reduce short-cycling, as well as improving efficiency and reducing noise. 
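To put rough numbers on the CoP cost difference described above, here’s a toy Python sketch (the figures are illustrative, not measurements from my system):

```python
def electrical_input_kwh(heat_kwh, cop):
    """Electricity needed to deliver a given amount of heat. A CoP of n
    means each kWh of electricity yields n kWh of heat indoors, so the
    electrical input is heat / CoP."""
    return heat_kwh / cop

heat = 90  # kWh of heat delivered to the house
print(electrical_input_kwh(heat, 1))  # resistive aux coils, CoP ~1: 90.0 kWh
print(electrical_input_kwh(heat, 3))  # heat pump, CoP ~3: 30.0 kWh
```

Same heat into the house, a third of the electricity: this is why the settings below try so hard to avoid aux heat.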
However, directly setting compressor and fan speed requires a special “communicating” thermostat made by the same manufacturer, which speaks a proprietary wire protocol. My manufacturer’s communicating thermostats are very expensive and have a reputation for buggy hardware and software, so I opted to get an <a href="https://www.ecobee.com/en-us/smart-thermostats/smart-wifi-thermostat/">ecobee 3 lite</a>. Like essentially every other thermostat on the planet, the ecobee uses ~8 wires with simple binary signals, like “please give me heat” and “please turn on the fan”. It can’t ask for a specific <em>amount</em> of heat.</p> <p>However, all is not lost. The standard thermostat protocol has a notion of a “two-stage” system—if the Y1 wire is hot, it’s asking for “some heat”, and if Y2 is also hot, it’s asking for “more heat”. My variable-speed heat pump emulates a two-stage system using a hysteresis mechanism. In stage 1, the heat pump offers some nominal low degree of heat. When the thermostat calls for stage 2, it kicks up the air handler blower a notch, and after 20 minutes, it slowly ramps up the heat pump compressor as well. I assume there’s a ramp-down for going back to stage 1. They say this provides “true variable-capacity operation”. You can imagine that the most efficient steady state is where the thermostat toggles rapidly between Y1 and Y2, causing the system to hang out at exactly the right variable speeds for current conditions—but I assume ecobee has some kind of frequency limiter to avoid damaging systems that actually have two separate stages with distinct startup/shutdown costs.</p> <p>The air handler’s aux heat is also staged: if the W1 wire is hot, I think (based on staring at the wiring diagram and air handler itself) it just energizes one of two coils. If W2 is also hot, it energizes both. 
I think this is good: we want to use as much of the heat pump heat as possible, and if we can get away with juuuust a little aux heat, instead of going full blast, that’ll save energy.</p> <p>In short: aux heat is 2-4x more expensive than heat pump heat; we want to use as little aux as possible. Short-cycling is bad: we want long cycle times. For maximum efficiency, we want both the heat pump and aux heat to be able to toggle between stage 1 and 2 depending on demand.</p> <h2><a href="#automatic-problems" id="automatic-problems">Automatic Problems</a></h2> <p>I initially left the ecobee at its automatic default settings for a few weeks; it’s supposed to learn the house dynamics and adapt. I noticed several problems. Presumably this behavior depends on weather, building thermal properties, HVAC dynamics, and however ecobee’s tuned their algorithm last week, so YMMV: check your system and see how it looks.</p> <p>It’s kind of buried, but ecobee offers a really nice time-series visualization of thermostat behavior on their web site. There’s also a Home Assistant integration that pulls in data from their API. It’s a pain in the ass to set up (ecobee, there’s no need for this to be so user-hostile), but it does work.</p> <p>Over the next few weeks I stared obsessively at time-series plots from both ecobee and Home Assistant, and mucked around with ecobee’s settings. Most of what I’ll describe below is configurable in the settings menu on the thermostat: look for “settings”, “installation settings”, “thresholds”.</p> <h2><a href="#reducing-aux-heat" id="reducing-aux-heat">Reducing Aux Heat</a></h2> <p>First, the automatics kicked on aux heat a <em>lot</em>. Even in situations where the heat pump would have been perfectly capable of getting up to temp, ecobee would burn aux heat to reach the target temperature (<em>set point</em>) faster.</p> <p>Part of the problem was that ecobee ships (I assume for safety reasons) with ludicrously high cut-off thresholds for heat pumps. 
Mine had a “compressor min outdoor temperature” of something like 35 degrees, so the heat pump wouldn’t run for most of the winter. The actual minimum temperature of my model is -4; cold-climate heat pumps run down to -20. I lowered mine to -5; the manual says there’s a physical thermostat interlock on the heat pump itself, and I trust that more than the ecobee weather feed anyway.</p> <p>Second: ecobee seems to prioritize speed over progress: if it’s not getting to the set point fast enough, it’ll burn aux heat to get there sooner. I don’t want this: I’m perfectly happy putting on a jacket. After a bit I worked out that the heat pumps alone can cover the house load down to ~20 degrees or so, and raised “aux heat max outdoor temperature” to 25. If it’s any warmer than that, the system won’t use aux heat.</p> <h2><a href="#reverse-staging" id="reverse-staging">Reverse Staging</a></h2> <p>A second weird behavior: once the ecobee called for stage 2, either from the heat pump or aux, it would run in stage 2 until it hit the set point, then shut off the system entirely. Running aux stage 2 costs more energy. Running the heat pump in stage 2 shortens the cycle time: remember, the goal is a low, long running time.</p> <p><img class="attachment pure-img" src="/data/posts/370/no-reverse-staging.png" alt="A time-series plot showing that once stage 2 engages, it runs until shutting off, causing frequent cycling" title="A time-series plot showing that once stage 2 engages, it runs until shutting off, causing frequent cycling"></p> <p>The setting I used to fix this is called “reverse staging”. Ecobee’s <a href="https://support.ecobee.com/s/articles/Threshold-settings-for-ecobee-thermostats">documentation</a> says:</p> <blockquote> <p>Compressor Reverse Staging: Enables the second stage of the compressor near the temperature setpoint.</p> </blockquote> <p>As far as I can tell this documentation is completely wrong. 
From watching the graphs, this setting seems to allow the staging state machine to move from stage 2 back to stage 1, rather than forcing it to run in stage 2 until shutting off entirely. It’ll go back up to stage 2 if it needs to, and back down again.</p> <p><img class="attachment pure-img" src="/data/posts/370/reverse-staging.png" alt="With reverse staging, it'll jump up to stage 2, then drop back down to stage 1." title="With reverse staging, it'll jump up to stage 2, then drop back down to stage 1."></p> <h2><a href="#manual-staging" id="manual-staging">Manual Staging</a></h2> <p>I couldn’t seem to get ecobee’s automatic staging to drop back to stage 1 heat reliably, or avoid kicking on aux heat when stage 2 heat pump heat would have done fine. I eventually gave up and turned off automatic staging altogether. I went with the delta temperature settings. If the temperature delta between the set point and indoor air is more than 1 degree, it turns on heat pump stage 1. More than two degrees, stage 2. More than four degrees, aux 1. More than five degrees, aux 2. The goal here is to use only as much aux heat as absolutely necessary to supplement the heat pump. I also have aux heat configured to run concurrently with the heat pump: there’s a regime where the heat pump provides useful heat, but not quite enough, and my intuition is that <em>some</em> heat pump heat is cheaper than all aux.</p> <p>I initially tried the default 0.5 degree delta before engaging the heat pump’s first stage. It turns out that for some temperature regimes this creates rapid cycling: that first-phase heat is enough to heat the house rapidly to the set point, and then there’s nothing to do but shut the system off. The house cools, and the system kicks on again, several times per hour. 
I raised the delta to 1 degree, which significantly extended the cycle time.</p> <h2><a href="#large-setback-with-preheating" id="large-setback-with-preheating">Large Setback with Preheating</a></h2> <p>A <em>setback</em> is when you lower your thermostat, e.g. while away from home or sleeping. There’s some folk wisdom that heat pumps should run at a constant temperature all the time, rather than have a large setback. As far as I can tell, this is because a properly-sized heat pump system (unlike a gas furnace) doesn’t deliver a ton of excess heat, so it can’t catch up quickly when asked to return to a higher temperature. To compensate, the system might dip into aux heat, and that’s super expensive.</p> <p>I’m in the US Midwest, where winter temperatures are usually around 15-40 F. I drop from 68 to 60 overnight, and the house can generally coast all night without having to run any HVAC at all. In theory the ecobee should be able to figure out the time required to come back to 68 and start the heat pump early in the morning, but in practice I found it would wait too long, and then the large difference between actual and set temp would trigger aux heat. To avoid this, I added a custom activity in Ecobee’s web interface (I call mine “preheat”), with a temperature of 64. I have my schedule set up with an hour of preheat in the morning, before going to the normal 68. This means there’s less of a delta-T, and the system can heat up entirely using the heat pump.</p> <p><img class="attachment pure-img" src="/data/posts/370/setback.png" alt="A time series graph showing temperature falling smoothly overnight as the HVAC is disabled, and then rising during the preheat phase in the morning." 
title="A time series graph showing temperature falling smoothly overnight as the HVAC is disabled, and then rising during the preheat phase in the morning."></p> </content></entry><entry><id>https://aphyr.com/posts/369-classnotfoundexception-java-util-sequencedcollection</id><title>ClassNotFoundException: java.util.SequencedCollection</title><published>2024-02-20T18:03:42-05:00</published><updated>2024-02-20T18:03:42-05:00</updated><link rel="alternate" href="https://aphyr.com/posts/369-classnotfoundexception-java-util-sequencedcollection"></link><author><name>Aphyr</name><uri>https://aphyr.com/</uri></author><content type="html"><p>Recently I’ve had users of my libraries start reporting mysterious errors due to a missing reference to <code>SequencedCollection</code>, a Java interface added in JDK 21:</p> <pre><code>Execution error (ClassNotFoundException) at jdk.internal.loader.BuiltinClassLoader/loadClass (BuiltinClassLoader.java:641). java.util.SequencedCollection </code></pre> <p>Specifically, projects using <a href="https://github.com/jepsen-io/jepsen/issues/585">Jepsen 0.3.5</a> started throwing this error due to Clojure’s built-in <code>rrb_vector.clj</code>, which is particularly vexing given that the class doesn’t reference <code>SequencedCollection</code> at all.</p> <p>It turns out that the Clojure compiler, when run on JDK 21 or later, will automatically insert references to this class when compiling certain expressions–likely because it now appears in the supertypes of other classes. Jepsen had <code>:javac-options [&quot;-source&quot; &quot;11&quot; &quot;-target&quot; &quot;11&quot;]</code> in Jepsen’s <code>project.clj</code> already, but it still emitted references to <code>SequencedCollection</code> because the reference is inserted by the Clojure compiler, not <code>javac</code>. 
Similarly, adding <code>[&quot;--release&quot; &quot;11&quot;]</code> didn’t work.</p> <p>Long story short: as far as I can tell the only workaround is to downgrade to Java 17 (or anything prior to 21) when building Jepsen as a library. That’s not super hard with <code>update-alternatives</code>, but I still imagine I’ll be messing this up until Clojure’s compiler can get a patch.</p> </content></entry><entry><id>https://aphyr.com/posts/368-how-to-replace-your-cpap-in-only-666-days</id><title>How to Replace Your CPAP In Only 666 Days</title><published>2024-02-03T19:38:53-05:00</published><updated>2024-02-03T19:38:53-05:00</updated><link rel="alternate" href="https://aphyr.com/posts/368-how-to-replace-your-cpap-in-only-666-days"></link><author><name>Aphyr</name><uri>https://aphyr.com/</uri></author><content type="html"><p><em>This story is not practical advice. For me, it’s closing the book on an almost two-year saga. For you, I hope it’s an enjoyable bit of bureaucratic schadenfreude. For Anthem, I hope it’s the subject of a series of painful but transformative meetings. This is not an isolated event. I’ve had dozens of struggles with Anthem customer support, and they all go like this.</em></p> <p><em>If you’re looking for practical advice: it’s this. Be polite. Document everything. Keep a log. Follow the claims process. Check the laws regarding insurance claims in your state. If you pass the legally-mandated deadline for your claim, call customer service. Do not allow them to waste a year of your life, or force you to resubmit your claim from scratch. Initiate a complaint with your state regulators, and escalate directly to <a href="mailto:[email protected]">Gail Boudreaux’s team</a>–or whoever Anthem’s current CEO is.</em></p> <p>To start, experience an equipment failure.</p> <p>Use your CPAP daily for six years. Wake up on day zero with it making a terrible sound. Discover that the pump assembly is failing. 
Inquire with Anthem Ohio, your health insurer, about how to have it repaired. Allow them to refer you to a list of local durable medical equipment providers. Start calling down the list. Discover half the list are companies like hair salons. Eventually reach a company in your metro which services CPAPs. Discover they will not repair broken equipment unless a doctor tells them to.</p> <p>Leave a message with your primary care physician. Call the original sleep center that provided your CPAP. Discover they can’t help, since you’re no longer in the same state. Return to your primary, who can’t help either, because he had nothing to do with your prescription. Put the sleep center and your primary in touch, and ask them to talk.</p> <p>On day six, call your primary to check in. He’s received a copy of your sleep records, and has forwarded them to a local sleep center you haven’t heard of. They, in turn, will talk to Anthem for you.</p> <p>On day 34, receive an approval letter labeled “confirmation of medical necessity” from Anthem, directed towards the durable medical equipment company. Call that company and confirm you’re waitlisted for a new CPAP. They are not repairable. Begin using your partner’s old CPAP, which is not the right class of device, but at least it helps.</p> <p>Over the next 233 days, call that medical equipment company regularly. Every time, inquire whether there’s been any progress, and hear “we’re still out of stock”. Ask them what the manufacturer backlog might be, how many people are ahead of you in line, how many CPAPs they <em>do</em> receive per month, or whether anyone has ever received an actual device from them. They won’t answer any questions. Realize they are never going to help you.</p> <p>On day 267, realize there is no manufacturer delay. The exact machine you need is in stock on CPAP.com. Check to make sure there’s a claims process for getting reimbursed by Anthem. Pay over three thousand dollars for it. 
When it arrives, enjoy being able to breathe again.</p> <p>On day 282, follow CPAP.com’s documentation to file a claim with Anthem online. Include your prescription, receipt, shipping information, and the confirmation of medical necessity Anthem sent you.</p> <p>On day 309, open the mail to discover a mysterious letter from Anthem. They’ve received your appeal. You do not recall appealing anything. There is no information about what might have been appealed, but something will happen within 30-60 days. There is nothing about your claim.</p> <p>On day 418, emerge from a haze of lead, asbestos, leaks, and a host of other home-related nightmares; remember Anthem still hasn’t said anything about your claim. Discover your claim no longer appears on Anthem’s web site. Call Anthem customer service. They have no record of your claim either. Ask about the appeal letter you received. Listen, gobsmacked, as they explain that they decided your claim was in fact an appeal, and transferred it immediately to the appeals department. The appeals department examined the appeal and looked for the claim it was appealing. Finding none, they decided the appeal was moot, and rejected it. At no point did anyone inform you of this. Explain to Anthem’s agent that you filed a claim online, not an appeal. At their instruction, resign yourself to filing the entire claim again, this time using a form via physical mail. Include a detailed letter explaining the above.</p> <p>On day 499, retreat from the battle against home entropy to call Anthem again. Experience a sense of growing dread as the customer service agent is completely unable to locate either of your claims. After a prolonged conversation, she finds it using a different tool. There is no record of the claim from day 418. There was a claim submitted on day 282. Because the claim does not appear in her system, there is no claim. 
Experience the cognitive equivalent of the Poltergeist hallway shot as the agent tells you “Our members are not eligible for charges for claim submission”.</p> <p>Hear the sentence “There is a claim”. Hear the sentence “There is no claim”. Write these down in the detailed log you’ve been keeping of this unfurling Kafkaesque debacle. Ask again if there is anyone else who can help. There is no manager you can speak to. There is no tier II support. “I’m the only one you can talk to,” she says. Write that down.</p> <p>Call CPAP.com, which has a help line staffed by caring humans. Explain that contrary to their documentation, Anthem now says members cannot file claims for equipment directly. Ask if they are the provider. Discover the provider for the claim is probably your primary care physician, who has no idea this is happening. Leave a message with him anyway. Leave a plaintive message with your original sleep center for good measure.</p> <p>On day 502, call your sleep center again. They don’t submit claims to insurance, but they confirm that some people <em>do</em> successfully submit claims to Anthem using the process you’ve been trying. They confirm that Anthem is, in fact, hot garbage. Call your primary, send them everything you have, and ask if they can file a claim for you.</p> <p>On day 541, receive a letter from Anthem, responding to your inquiry. You weren’t aware you filed one.</p> <blockquote> <p>Please be informed that we have received your concern. Upon review we have noticed that there is no claim billed for the date of service mentioned in the submitted documents, Please provide us with a valid claim. If not submitted,provide us with a valid claim iamge to process your claim further.</p> </blockquote> <p>Stare at the letter, typos and all. Contemplate your insignificance in the face of the vast and uncaring universe that is Anthem.</p> <p>On day 559, steel your resolve and call Anthem again. 
Wait as this representative, too, digs for evidence of a claim. Listen with delight as she finds your documents from day 282. Confirm that yes, a claim definitely exists. Have her repeat that so you can write it down. Confirm that the previous agent was lying: members can submit claims. At her instruction, fill out the claim form a third time. Write a detailed letter, this time with a Document Control Number (DCN). Submit the entire package via registered mail. Wait for USPS to confirm delivery eight days later.</p> <p>On day 588, having received no response, call Anthem again. Explain yourself. You’re getting good at this. Let the agent find a reference number for an appeal, but not the claim. Incant the magic DCN, which unlocks your original claim. “I was able to confirm that this was a claim submitted form for a member,” he says. He sees your claim form, your receipts, your confirmation of medical necessity. However: “We still don’t have the claim”.</p> <p>Wait for him to try system after system. Eventually he confirms what you heard on day 418: the claims department transferred your claims to appeals. “Actually this is not an appeal, but it was denied as an appeal.” Agree as he decides to submit your claim manually again, with the help of his supervisor. Write down the call ref number: he promises you’ll receive an email confirmation, and an Explanation of Benefits in 30-40 business days.</p> <p>“I can assure you this is the last time you are going to call us regarding this.”</p> <p>While waiting for this process, recall insurance is a regulated industry. Check the Ohio Revised Code. Realize that section 3901.381 establishes deadlines for health insurers to respond to claims. They should have paid or denied each of your claims within 30 days–45 if supporting documentation was required. Leave a message with the Ohio Department of Insurance’s Market Conduct Division. 
File an insurance complaint with ODI as well.</p> <p>Grimly wait as no confirmation email arrives.</p> <p>On day 602, open an email from Anthem. They are “able to put the claim in the system and currenty on processed [sic] to be applied”. They’re asking for more time. Realize that Anthem is well past the 30-day deadline under the Ohio Revised Code for all three iterations of your claim.</p> <p>On day 607, call Anthem again. The representative explains that the claim will be received and processed as of your benefits. She asks you to allow 30-45 days from today. Quote section 3901.381 to her. She promises to expedite the request; it should be addressed within 72 business hours. Like previous agents, she promises to call you back. Nod, knowing she won’t.</p> <p>On day 610, email the Ohio Department of Insurance to explain that Anthem has found entirely new ways to avoid paying their claims on time. It’s been 72 hours without a callback; call Anthem again. She says “You submitted a claim and it was received” on day 282. She says the claim was expedited. Ask about the status of that expedited resolution. “Because on your plan we still haven’t received any claims,” she explains. Wonder if you’re having a stroke.</p> <p>Explain that it has been 328 days since you submitted your claim, and ask what is going on. She says that since the first page of your mailed claim was a letter, that might have caused it to be processed as an appeal. Remind yourself Anthem told you to enclose that letter. Wait as she attempts to refer you to the subrogation department, until eventually she gives up: the subrogation department doesn’t want to help.</p> <p>Call the subrogation department yourself. Allow Anthem’s representative to induce in you a period of brief aphasia. She wants to call a billing provider. Try to explain there is none: you purchased the machine yourself. She wants to refer you to collections. Wonder why on earth Anthem would want money from <em>you</em>. 
Write down “I literally can’t understand what she thinks is going on” in your log. Someone named Adrian will call you by tomorrow.</p> <p>Contemplate alternative maneuvers. Go on a deep Google dive, searching for increasingly obscure phrases gleaned from Anthem’s bureaucracy. Trawl through internal training PDFs for Anthem’s ethics and compliance procedures. Call their compliance hotline: maybe someone cares about the law. It’s a third-party call center for Elevance Health. Fail to realize this is another name for Anthem. Begin drawing a map of Anthem’s corporate structure.</p> <p>From a combination of publicly-available internal slide decks, LinkedIn, and obscure HR databases, discover the name, email, and phone number of Anthem’s Chief Compliance Officer. Call her, but get derailed by an internal directory that requires a 10-digit extension. Try the usual tricks with automated phone systems. No dice.</p> <p>Receive a call from an Anthem agent. Ask her what happened to “72 hours”. She says there’s been no response from the adjustments team. She doesn’t know when a response will come. There’s no one available to talk to. Agree to speak to another representative tomorrow. It doesn’t matter: they’ll never call you.</p> <p>Do more digging. Guess the CEO’s email from what you can glean of Anthem’s account naming scheme. Write her an email with a short executive summary and a detailed account of the endlessly-unfolding Boschian hellscape in which her company has entrapped you. A few hours later, receive an acknowledgement from an executive concierge at Elevance (Anthem). It’s polite, formal, and syntactically coherent. She promises to look into things. Smile. Maybe this will work.</p> <p>On day 617, receive a call from the executive concierge. 355 days after submission, she’s identified a problem with your claim. CPAP.com provided you with an invoice with a single line item (the CPAP) and two associated billing codes (a CPAP and humidifier). 
Explain that they are integrated components of a single machine. She understands, but insists you need a receipt with multiple line items for them anyway. Anthem has called CPAP.com, but they can’t discuss an invoice unless you call them. Explain you’ll call them right now.</p> <p>Call CPAP.com. Their customer support continues to be excellent. Confirm that it is literally impossible to separate the CPAP and humidifier, or to produce an invoice with two line items for a single item. Nod as they ask what the hell Anthem is doing. Recall that this is the exact same machine Anthem covered for you eight years ago. Start a joint call with the CPAP.com representative and Anthem’s concierge. Explain the situation to her voicemail.</p> <p>On day 623, receive a letter from ODI. Anthem has told ODI this was a problem with the billing codes, and ODI does not intervene in billing code issues. They have, however, initiated a secretive second investigation. There is no way to contact the second investigator.</p> <p>Write a detailed email to the concierge and ODI explaining that it took over three hundred days for Anthem to inform you of this purported billing code issue. Explain again that it is a single device. Emphasize that Anthem has been handling claims for this device for roughly a decade.</p> <p>Wait. On day 636, receive a letter from Anthem’s appeals department. They’ve received your request for an appeal. You never filed one. They want your doctor or facility to provide additional information to Carelon Medical Benefits Management. You have never heard of Carelon. There is no explanation of how to reach Carelon, or what information they might require. The letter concludes: “There is currently no authorization on file for the services rendered.” You need to seek authorization from a department called “Utilization Management”.</p> <p>Call the executive concierge again. 
Leave a voicemail asking what on earth is going on.</p> <p>On day 637, receive an email: she’s looking into it.</p> <p>On day 644, Anthem calls you. It’s a new agent who is immensely polite. Someone you’ve never heard of was asked to work on another project, so she’s taking over your case. She has no updates yet, but promises to keep in touch.</p> <p>She does so. On day 653, she informs you Anthem will pay your claim in full. On day 659, she provides a check number. On day 666, the check arrives.</p> <p>Deposit the check. Write a thank you email to the ODI and Anthem’s concierge. Write this, too, down in your log.</p> </content></entry><entry><id>https://aphyr.com/posts/367-why-is-jepsen-written-in-clojure</id><title>Why is Jepsen Written in Clojure?</title><published>2023-12-05T09:49:05-05:00</published><updated>2023-12-05T09:49:05-05:00</updated><link rel="alternate" href="https://aphyr.com/posts/367-why-is-jepsen-written-in-clojure"></link><author><name>Aphyr</name><uri>https://aphyr.com/</uri></author><content type="html"><p>People keep asking why <a href="https://jepsen.io">Jepsen</a> is written in <a href="https://clojure.org/">Clojure</a>, so I figure it’s worth having a referencable answer. I’ve programmed in something like twenty languages. Why choose a Weird Lisp?</p> <p>Jepsen is built for testing concurrent systems–mostly databases. Because it tests concurrent systems, the language itself needs good support for concurrency. Clojure’s immutable, persistent data structures make it easier to write correct concurrent programs, and the language and runtime have excellent concurrency support: real threads, promises, futures, atoms, locks, queues, cyclic barriers, all of java.util.concurrent, etc. I also considered languages (like Haskell) with more rigorous control over side effects, but decided that Clojure’s less-dogmatic approach was preferable.</p> <p>Because Jepsen tests databases, it needs broad client support. 
Almost every database has a JVM client, typically written in Java, and Clojure has decent Java interop.</p> <p>Because testing is experimental work, I needed a language which was concise, adaptable, and well-suited to prototyping. Clojure is terse, and its syntactic flexibility–in particular, its macro system–works well for that. The threading macros make chained transformations readable, and macros enable re-usable error handling and easy control of resource scopes. The Clojure REPL is really handy for exploring the data a test run produces.</p> <p>Tests involve representing, transforming, and inspecting complex, nested data structures. Clojure’s data structures and standard library functions are possibly the best I’ve ever seen. I also print a lot of structures to the console and files: Clojure’s data syntax (EDN) is fantastic for this.</p> <p>Because tests involve manipulating a decent, but not huge, chunk of data, I needed a language with “good enough” performance. Clojure’s certainly not the fastest language out there, but idiomatic Clojure is usually within an order of magnitude or two of Java, and I can shave off the difference where critical. The JVM has excellent profiling tools, and these work well with Clojure.</p> <p>Jepsen’s (gosh) about a decade old now: I wanted a language with a mature core and emphasis on stability. Clojure is remarkably stable, both in terms of JVM target and the language itself. Libraries don’t “rot” anywhere near as quickly as in Scala or Ruby.</p> <p>Clojure does have significant drawbacks. It has a small engineering community and no (broadly-accepted, successful) static typing system. Both of these would constrain a large team, but Jepsen’s maintained and used by only 1-3 people at a time. Working with JVM primitives can be frustrating without dropping to Java; I do this on occasion. Some aspects of the polymorphism system are lacking, but these can be worked around with libraries. The error messages are terrible. 
I have no apologetics for this. ;-)</p> <p>I prototyped Jepsen in a few different languages before settling on Clojure. A decade in, I think it was a pretty good tradeoff.</p> </content></entry><entry><id>https://aphyr.com/posts/366-finding-domains-that-send-unactionable-reports-in-mastodon</id><title>Finding Domains That Send Unactionable Reports in Mastodon</title><published>2023-09-03T10:25:47-05:00</published><updated>2023-09-03T10:25:47-05:00</updated><link rel="alternate" href="https://aphyr.com/posts/366-finding-domains-that-send-unactionable-reports-in-mastodon"></link><author><name>Aphyr</name><uri>https://aphyr.com/</uri></author><content type="html"><p>One of the things we struggle with on woof.group is un-actionable reports. For various reasons, most of the reports we handle are for posts that are either appropriately content-warned or don’t require a content warning under our content policy–things like faces, butts, and shirtlessness. We can choose to ignore reports from a domain, but we’d rather not do that: it means we might miss out on important reports that require moderator action. We can also talk to remote instance administrators and ask them to talk to their users about not sending copies of reports to the remote instance if they don’t know what the remote instance policy is, but that’s time consuming, and we only want to do it if there’s an ongoing problem.</p> <p>I finally broke down and dug around in the data model to figure out how to get statistics on this. 
If you’re a Mastodon admin and you’d like to figure out which domains send you the most non-actionable reports, you can run this at <code>rails console</code>:</p> <pre><code><span></span><span class="c1"># Map of domains to [action, no-action] counts</span> <span class="n">stats</span> <span class="o">=</span> <span class="no">Report</span><span class="o">.</span><span class="n">all</span><span class="o">.</span><span class="n">reduce</span><span class="p">({})</span> <span class="k">do</span> <span class="o">|</span><span class="n">stats</span><span class="p">,</span> <span class="n">r</span><span class="o">|</span> <span class="n">domain</span> <span class="o">=</span> <span class="n">r</span><span class="o">.</span><span class="n">account</span><span class="o">.</span><span class="n">domain</span> <span class="n">domain_stats</span> <span class="o">=</span> <span class="n">stats</span><span class="o">[</span><span class="n">domain</span><span class="o">]</span> <span class="o">||</span> <span class="o">[</span><span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="o">]</span> <span class="n">action</span> <span class="o">=</span> <span class="n">r</span><span class="o">.</span><span class="n">history</span><span class="o">.</span><span class="n">any?</span> <span class="k">do</span> <span class="o">|</span><span class="n">a</span><span class="o">|</span> <span class="c1"># If you took action, there'd be a Status or Account target_type</span> <span class="n">a</span><span class="o">.</span><span class="n">target_type</span> <span class="o">!=</span> <span class="s1">'Report'</span> <span class="k">end</span> <span class="k">if</span> <span class="n">action</span> <span class="n">domain_stats</span><span class="o">[</span><span class="mi">0</span><span class="o">]</span> <span class="o">+=</span> <span class="mi">1</span> <span class="k">else</span> <span class="n">domain_stats</span><span class="o">[</span><span 
class="mi">1</span><span class="o">]</span> <span class="o">+=</span> <span class="mi">1</span> <span class="k">end</span> <span class="n">stats</span><span class="o">[</span><span class="n">domain</span><span class="o">]</span> <span class="o">=</span> <span class="n">domain_stats</span> <span class="n">stats</span> <span class="k">end</span> <span class="c1"># Top 20 domains, sorted by descending no-action count</span> <span class="n">stats</span><span class="o">.</span><span class="n">sort_by</span> <span class="k">do</span> <span class="o">|</span><span class="n">k</span><span class="p">,</span> <span class="n">stats</span><span class="o">|</span> <span class="n">stats</span><span class="o">[</span><span class="mi">1</span><span class="o">]</span> <span class="k">end</span><span class="o">.</span><span class="n">reverse</span><span class="o">.</span><span class="n">take</span><span class="p">(</span><span class="mi">20</span><span class="p">)</span> </code></pre> <p>For example:</p> <pre><code><span></span><span class="o">[[</span><span class="kp">nil</span><span class="p">,</span> <span class="o">[</span><span class="mi">77</span><span class="p">,</span> <span class="mi">65</span><span class="o">]</span> <span class="p">;</span> <span class="kp">nil</span> <span class="n">is</span> <span class="n">your</span> <span class="n">local</span> <span class="n">instance</span> <span class="o">[</span><span class="s2">&quot;foo.social&quot;</span><span class="p">,</span> <span class="o">[</span><span class="mi">3</span><span class="p">,</span> <span class="mi">52</span><span class="o">]]</span><span class="p">,</span> <span class="o">[</span><span class="s2">&quot;mastodon.oh.my.gosh&quot;</span><span class="p">,</span> <span class="o">[</span><span class="mi">40</span><span class="p">,</span> <span class="mi">24</span><span class="o">]]</span><span class="p">,</span> <span class="o">...</span> </code></pre> 
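<p>If you’d rather rank domains by the <em>share</em> of reports that went unactioned, rather than the raw count (a variant that isn’t in the snippet above), you can post-process the same <code>stats</code> map in plain Ruby. The domains and numbers below are invented for illustration:</p>

```ruby
# stats maps a domain (nil = your local instance) to [actioned, unactioned]
# report counts, in the shape produced by the rails console snippet above.
# These particular domains and counts are made-up examples.
stats = {
  nil => [77, 65],
  "foo.social" => [3, 52],
  "mastodon.oh.my.gosh" => [40, 24],
}

# Sort by the fraction of reports that required no action, descending.
ranked = stats.sort_by do |domain, (actioned, unactioned)|
  total = actioned + unactioned
  total.zero? ? 0 : -unactioned.to_f / total
end

ranked.each do |domain, (actioned, unactioned)|
  name = domain || "(local)"
  puts "#{name}: #{unactioned}/#{actioned + unactioned} unactioned"
end
```

<p>With these example numbers, foo.social sorts first: 52 of its 55 reports required no moderator action.</p>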
</content></entry><entry><id>https://aphyr.com/posts/365-10-operations-large-histories-with-jepsen</id><title>10⁹ Operations: Large Histories with Jepsen</title><published>2023-08-18T14:45:15-05:00</published><updated>2023-08-18T14:45:15-05:00</updated><link rel="alternate" href="https://aphyr.com/posts/365-10-operations-large-histories-with-jepsen"></link><author><name>Aphyr</name><uri>https://aphyr.com/</uri></author><content type="html"><p><a href="https://github.com/jepsen-io/jepsen">Jepsen</a> is a library for writing tests of concurrent systems: everything from single-node data structures to distributed databases and queues. A key part of this process is recording a <em>history</em> of operations performed during the test. Jepsen <em>checkers</em> analyze a history to find consistency anomalies and to compute performance metrics. Traditionally Jepsen has stored the history in a Clojure vector (an immutable in-memory data structure like an array), and serialized it to disk at the end of the test. This limited Jepsen to histories on the order of tens of millions of operations. It also meant that if Jepsen crashed during a several-hour test run, it was impossible to recover any of the history for analysis. Finally, saving and loading large tests involved long wait times—sometimes upwards of ten minutes.</p> <p>Over the last year I’ve been working on ways to resolve these problems. Generators are up to ten times faster, and can emit up to 50,000 operations per second. A new operation datatype makes each operation smaller and faster to access. Jepsen’s new on-disk format allows us to stream histories incrementally to disk, to work with histories of up to a billion operations (far exceeding available memory), to recover safely from crashes, and to load tests almost instantly by deserializing data lazily. New history datatypes support both densely and sparsely indexed histories, and efficiently cache auxiliary indices. 
They also support lazy disk-backed <code>map</code> and <code>filter</code>. These histories support both linear and concurrent folds, which dramatically improves checker performance on multicore systems: real-world checkers can readily analyze 250,000 to 900,000 operations/sec. Histories support multi-query optimization: when multiple threads fold over the same history, a query planner automatically fuses those folds together to perform them in a single pass. Since Jepsen often folds dozens of times over the same history, this saves a good deal of disk IO and deserialization time. These features are enabled by a new, transactional, dependency-aware task executor.</p> <h2><a href="#background" id="background">Background</a></h2> <p>Jepsen histories are basically a list of operations in time order. The meaning of an operation depends on the system being tested, but an operation could be something like “set <code>x</code> to three”, “read <code>y</code> and <code>z</code> as <code>2</code> and <code>7</code>, respectively”, or “add 64 to the queue”. Each of these logical operations has <em>two</em> operation elements in the history: an <em>invocation</em> when it begins, and a <em>completion</em> when it ends. This tells checkers which operations were performed concurrently, and which took place one after the next. 
For convenience, we use “operation” to refer both to a single entry in the history, and to an invocation/completion pair—it’s usually clear from context.</p> <p>For example, here’s a history of transactions from an etcd test:</p> <pre><code><span></span><span class="p">[{</span><span class="ss">:type</span> <span class="ss">:invoke</span>, <span class="ss">:f</span> <span class="ss">:txn</span>, <span class="ss">:value</span> <span class="p">[[</span><span class="ss">:w</span> <span class="mi">2</span> <span class="mi">1</span><span class="p">]]</span>, <span class="ss">:time</span> <span class="mi">3291485317</span>, <span class="ss">:process</span> <span class="mi">0</span>, <span class="ss">:index</span> <span class="mi">0</span><span class="p">}</span> <span class="p">{</span><span class="ss">:type</span> <span class="ss">:invoke</span>, <span class="ss">:f</span> <span class="ss">:txn</span>, <span class="ss">:value</span> <span class="p">[[</span><span class="ss">:r</span> <span class="mi">0</span> <span class="nv">nil</span><span class="p">]</span> <span class="p">[</span><span class="ss">:w</span> <span class="mi">1</span> <span class="mi">1</span><span class="p">]</span> <span class="p">[</span><span class="ss">:r</span> <span class="mi">2</span> <span class="nv">nil</span><span class="p">]</span> <span class="p">[</span><span class="ss">:w</span> <span class="mi">1</span> <span class="mi">2</span><span class="p">]]</span>, <span class="ss">:time</span> <span class="mi">3296209422</span>, <span class="ss">:process</span> <span class="mi">2</span>, <span class="ss">:index</span> <span class="mi">1</span><span class="p">}</span> <span class="p">{</span><span class="ss">:type</span> <span class="ss">:fail</span>, <span class="ss">:f</span> <span class="ss">:txn</span>, <span class="ss">:value</span> <span class="p">[[</span><span class="ss">:r</span> <span class="mi">0</span> <span class="nv">nil</span><span class="p">]</span> <span class="p">[</span><span 
class="ss">:w</span> <span class="mi">1</span> <span class="mi">1</span><span class="p">]</span> <span class="p">[</span><span class="ss">:r</span> <span class="mi">2</span> <span class="nv">nil</span><span class="p">]</span> <span class="p">[</span><span class="ss">:w</span> <span class="mi">1</span> <span class="mi">2</span><span class="p">]]</span>, <span class="ss">:time</span> <span class="mi">3565403674</span>, <span class="ss">:process</span> <span class="mi">2</span>, <span class="ss">:index</span> <span class="mi">2</span>, <span class="ss">:error</span> <span class="p">[</span><span class="ss">:duplicate-key</span> <span class="s">&quot;etcdserver: duplicate key given in txn request&quot;</span><span class="p">]}</span> <span class="p">{</span><span class="ss">:type</span> <span class="ss">:ok</span>, <span class="ss">:f</span> <span class="ss">:txn</span>, <span class="ss">:value</span> <span class="p">[[</span><span class="ss">:w</span> <span class="mi">2</span> <span class="mi">1</span><span class="p">]]</span>, <span class="ss">:time</span> <span class="mi">3767733708</span>, <span class="ss">:process</span> <span class="mi">0</span>, <span class="ss">:index</span> <span class="mi">3</span><span class="p">}]</span> </code></pre> <p>This shows two concurrent transactions executed by process 0 and 2 respectively. Process 0’s transaction attempted to write key 2 = 1 and succeeded, but process 2’s transaction tried to read 0, write key 1 = 1, read 2, then write key 1 = 2, and failed.</p> <p>Each operation is logically a map with <a href="https://github.com/jepsen-io/history/tree/be95aef21739167760c998e988fd4b3d10db1e3e#operation-structure">several predefined keys</a>. The <code>:type</code> field determines whether an operation is an invocation (<code>:invoke</code>) or a successful (<code>:ok</code>), failed (<code>:fail</code>), or indeterminate (<code>:info</code>) completion. 
The <code>:f</code> and <code>:value</code> fields indicate what the operation did—their interpretation is up to the test in question. We use <code>:process</code> to track which logical thread performed the operation. <code>:index</code> is a strictly-monotonic unique ID for every operation. In <em>dense</em> histories, indices go 0, 1, 2, …. Filtering or limiting a history to a subset of those operations yields a <em>sparse</em> history, with IDs like 3, 7, 8, 20, …. Operations can also include all kinds of additional fields at the test’s discretion.</p> <p>During a test Jepsen decides what operation to invoke next by asking a <em>generator</em>: a functional data structure that evolves as operations are performed. Generators include a rich library of combinators that can build up complex sequences of operations, including sequencing, alternation, mapping, filtering, limiting, temporal delays, concurrency control, and selective nondeterminism.</p> <h2><a href="#generator-performance" id="generator-performance">Generator Performance</a></h2> <p>One of the major obstacles to high-throughput testing with Jepsen was that the generator system simply couldn’t produce enough operations to keep up with the database under test. Much of this time was spent in manipulating the generator <em>context</em>: a data structure that keeps track of which threads are free and which logical processes are being executed by which threads. Contexts are often checked or restricted—for instance, a generator might want to pick a random free process in order to emit an operation, or to filter the set of threads in a context to a specific subset for various sub-generators: two threads making writes, three more performing reads, and so on.</p> <p>Profiling revealed context operations as a major bottleneck in generator performance—specifically, manipulating the Clojure maps and sets we initially used to represent context data.
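</p>

<p>For concreteness, the original representation looked something like this (an illustrative sketch, not Jepsen’s actual context structure):</p>

<pre><code>;; A context as plain persistent maps and sets. Every restriction
;; allocates fresh sets and hashes each thread identifier.
(require '[clojure.set :as set])

(def ctx {:free-threads #{0 1 2 :nemesis}
          :workers      {0 0, 1 1, 2 2, :nemesis :nemesis}})

;; Restricting to a subset of threads is a full set intersection:
(set/intersection (:free-threads ctx) #{0 1})
;; => #{0 1}
</code></pre>

<p>Each intersection hashes and compares every element; at tens of thousands of operations per second, that overhead adds up fast.</p>

<p>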
In Jepsen 0.3.0 I introduced a new <a href="https://github.com/jepsen-io/jepsen/blob/eeec7ef02ca278f52d3f1c71d9ff4ae5a75aad43/jepsen/src/jepsen/generator/context.clj">jepsen.generator.context</a> namespace with high-performance data structures for common context operations. I wrote a custom <a href="https://github.com/jepsen-io/jepsen/blob/main/jepsen/src/jepsen/generator/translation_table.clj">translation table</a> to map thread identifiers (e.g. <code>1</code>, <code>2</code>, … <code>:nemesis</code>) to and from integers. This let us encode context threads using Java <code>BitSet</code>s, which are significantly more compact than hashmap-backed sets and require no pointer chasing. To restrict threads to a specific subset, we <a href="https://github.com/jepsen-io/jepsen/blob/main/jepsen/src/jepsen/generator/context.clj#L311-L358">precompile a filter BitSet</a>, which turns expensive set intersection into a blazing-fast bitwise <code>and</code>. This bought us an <a href="https://github.com/jepsen-io/jepsen/commit/91d5e8fae64eca31b243fd978099c888db63260f">order-of-magnitude performance improvement</a> in a realistic generator performance test—pushing up to 26,000 ops/second through a 1024-thread generator.</p> <p>Based on <a href="https://www.yourkit.com/">YourKit</a> profiling I made additional performance tweaks. Jepsen now <a href="https://github.com/jepsen-io/jepsen/commit/165cba94be370a3226cb3c5e088b25598d0d97f0">memoizes function arity checking</a> when functions are used as generators. A few type hints helped avoid reflection in hot paths. <a href="https://github.com/jepsen-io/jepsen/commit/2c17e3253bd4459da07bf551a80d57f98e9d3f42">Additional micro-optimizations</a> bought further speedups. The function which fills in operation types, times, and processes spent a lot of time manipulating persistent maps, which we avoided by carefully tracking which fields had been read from the underlying map. 
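</p>

<p>The BitSet-based thread filtering described above can be sketched as follows (hypothetical helper names; the real implementation lives in <code>jepsen.generator.context</code>):</p>

<pre><code>(import 'java.util.BitSet)

;; After a translation table maps threads to small integers, a set of
;; threads is just a BitSet. Hypothetical helpers:
(defn bitset [indices]
  (let [bs (BitSet.)]
    (doseq [i indices] (.set bs (int i)))
    bs))

(defn restrict
  "Intersects free threads with a precompiled filter: one bitwise and.
  Clones first, because BitSet's .and mutates in place."
  [^BitSet free ^BitSet filt]
  (doto ^BitSet (.clone free) (.and filt)))

(.cardinality (restrict (bitset [0 1 2 3]) (bitset [1 3])))
;; => 2
</code></pre>

<p>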
We replaced expensive <code>mapcat</code> with a transducer, replaced <code>seq</code> with <code>count</code> to optimize empty-collection checks, and replaced a few <code>=</code> with <code>identical?</code> for symbols.</p> <p>In testing of list-append generators (a reasonably complex generator type) these improvements <a href="https://mastodon.jepsen.io/@jepsen/109287959624710790">let us push 50,000 operations per second</a> with 100 concurrent threads.</p> <h2><a href="#off-with-her-head" id="off-with-her-head">Off With Her Head</a></h2> <p>Another challenge in long-running Jepsen tests was a linear increase in memory consumption. This was caused by the test map, which is passed around ubiquitously and contains a reference to the initial generator that the test started with. Since generators are a lot like infinite sequences, “retaining the head” of the generator often resulted in keeping a chain of references to every single operation ever generated in memory.</p> <p>Because the test map is available during every phase of the test, and could be accessed by all kinds of uncontrolled code, we needed a way to enforce that the generator wouldn’t be retained accidentally. Jepsen now wraps the generator in a <a href="https://github.com/jepsen-io/jepsen/commit/7b8b5d531694a58a352de913c18c2bf2e4facacb">special <code>Forgettable</code> reference type</a> that can be explicitly forgotten. Forgetting that reference once the test starts running allows the garbage collector to free old generator states. Tests of 10<sup>9</sup> operations can now run in just a few hundred megabytes of heap.</p> <h2><a href="#operations-as-records" id="operations-as-records">Operations as Records</a></h2> <p>We initially represented operations as Clojure maps, which use arrays and hash tables internally. These maps require extra storage space, can only store JVM Objects, and require a small-but-noticeable lookup cost. 
Since looking up fields is so frequent, and memory efficiency important, we replaced them with Clojure <a href="https://clojure.org/reference/datatypes#_deftype_and_defrecord">records</a>. Our new <a href="https://github.com/jepsen-io/history/blob/be95aef21739167760c998e988fd4b3d10db1e3e/src/jepsen/history.clj#L250-L253">Op datatype</a> stores its index and time as primitive longs, and the type, process, f, and value as Object fields. This is a significantly denser layout in memory, which also aids cache locality. Access to fields still looks like a map lookup (e.g. <code>(:index some-op)</code>), but compiles into a type cache check and a read of that field in the Op object. Much faster!</p> <p>Clojure records work very much like maps—they store a reference to a hashmap, so tests can still store any number of extra fields on an Op. This preserves compatibility with existing Jepsen tests. The API is <em>slightly</em> different: <code>index</code> and <code>time</code> fields are no longer optional. Jepsen always generates these fields during tests, so this change mainly affects the internal test suites for Jepsen checkers.</p> <h2><a href="#streaming-disk-storage" id="streaming-disk-storage">Streaming Disk Storage</a></h2> <p>Jepsen initially saved tests, including both histories and the results of analysis, as a single map serialized with <a href="https://github.com/Datomic/fressian/wiki">Fressian</a>. This worked fine for small tests, but writing large histories could take tens of minutes. Jepsen now uses a <a href="https://github.com/jepsen-io/jepsen/blob/2310c73a23432c107d5e2604281a1f1a334ef06f/jepsen/src/jepsen/store/format.clj">custom disk format</a>. Jepsen files consist of a short header followed by an append-only log of immutable blocks. Each block contains a type, length, and checksum header, followed by data.
These blocks form a persistent tree structure on disk: changes are written by appending new blocks to the end of the file, then flipping a single offset in the file header to point to the block that encodes the root of the tree.</p> <p>Blocks can store <a href="https://github.com/jepsen-io/jepsen/blob/2310c73a23432c107d5e2604281a1f1a334ef06f/jepsen/src/jepsen/store/format.clj#L97">arbitrary Fressian data</a>, <a href="https://github.com/jepsen-io/jepsen/blob/2310c73a23432c107d5e2604281a1f1a334ef06f/jepsen/src/jepsen/store/format.clj#L112">layered, lazily-decoded maps</a>, <a href="https://github.com/jepsen-io/jepsen/blob/2310c73a23432c107d5e2604281a1f1a334ef06f/jepsen/src/jepsen/store/format.clj#L130">streaming lists</a>, and <a href="https://github.com/jepsen-io/jepsen/blob/2310c73a23432c107d5e2604281a1f1a334ef06f/jepsen/src/jepsen/store/format.clj#L143">chunked vectors</a> which store pointers to other vectors “stitched together” into a single large vector. Chunked vectors store the index of the first element in each chunk, so you can jump to the correct chunk for any index in constant time.</p> <p>During a test Jepsen <a href="https://github.com/jepsen-io/jepsen/blob/2310c73a23432c107d5e2604281a1f1a334ef06f/jepsen/src/jepsen/store/format.clj#L1084-L1135">writes each operation</a> into a series of chunks on disk—each a streaming list of Fressian-encoded data. When a chunk is full (by default, 16384 operations), Jepsen seals that chunk, writes its checksum, and constructs a new chunked vector block with pointers to each chunk in the history—re-using existing chunks. It then writes a new root block and flips the root pointer at the start of the file. 
This ensures that at most the last 16384 operations can be lost during a crash.</p> <p>When Jepsen loads a test from disk it reads the header, decodes the root block, finds the base of the test map, and constructs a <a href="https://github.com/jepsen-io/jepsen/blob/2310c73a23432c107d5e2604281a1f1a334ef06f/jepsen/src/jepsen/store/format.clj#L1406-L1434">lazy map</a> which decodes results and the history only when asked for. Results are also encoded as <a href="https://github.com/jepsen-io/jepsen/blob/2310c73a23432c107d5e2604281a1f1a334ef06f/jepsen/src/jepsen/store/format.clj#L1252-L1285">lazy maps</a>, so tests with large results can still be decoded efficiently. This lets Jepsen read the high-level properties of tests—their names, timestamps, overall validity, etc., in a few milliseconds rather than tens of minutes. It also preserves compatibility: tests loaded from disk look and feel like regular hashmaps. Metadata allows those tests to be altered and saved back to disk efficiently, so you can re-analyze them later.</p> <p>Jepsen loads a test’s history using a <a href="https://github.com/jepsen-io/history/blob/be95aef21739167760c998e988fd4b3d10db1e3e/src/jepsen/history/core.clj#L166-L326">lazy chunked vector</a>. The top-level vector keeps a cached count and lookup table to efficiently map indices to chunks. Each chunk is lazily decoded on first read and cached in a JVM <a href="https://docs.oracle.com/javase/8/docs/api/java/lang/ref/SoftReference.html">SoftReference</a>, which can be reclaimed under memory pressure. The resulting vector looks to users just like an in-memory vector, which ensures compatibility with extant Jepsen checkers.</p> <h2><a href="#history-datatypes" id="history-datatypes">History Datatypes</a></h2> <p>When working with histories checkers need to perform some operations over and over again. 
For instance, they often need to count results, look up an operation by its <code>:index</code> field, jump between invocations and completions, and schedule concurrent tasks for analysis. We therefore augment our lazy vectors of operations with an <a href="https://github.com/jepsen-io/history/blob/be95aef21739167760c998e988fd4b3d10db1e3e/src/jepsen/history.clj#L491-L524">IHistory</a> type which supports these operations. Our histories still act like a vector of operations, but also provide efficient, cached mapping of invocations to completions, a threadpool for scheduling analysis tasks, and a stateful execution planner for folds over the history, which we call a <em>folder</em>.</p> <p>When Jepsen records a history it assigns each operation a sequential index 0, 1, 2, 3, …. We call these histories <em>dense</em>. This makes it straightforward to look up an operation by its index: <code>(nth history idx)</code>. However, users might also filter a history to just some operations, or excerpt a slice of a larger history for detailed analysis. This results in a <em>sparse</em> history. Our sparse histories <a href="https://github.com/jepsen-io/history/blob/be95aef21739167760c998e988fd4b3d10db1e3e/src/jepsen/history.clj#L782-L850">maintain a lazily-computed mapping of operation indices to vector indices</a> to efficiently support this.</p> <p>We also provide lazy, history-specific versions of <a href="https://github.com/jepsen-io/history/blob/be95aef21739167760c998e988fd4b3d10db1e3e/src/jepsen/history.clj#L927-L1010">map</a> and <a href="https://github.com/jepsen-io/history/blob/be95aef21739167760c998e988fd4b3d10db1e3e/src/jepsen/history.clj#L1014-L1153">filter</a>. Checkers perform these operations on histories over and over again, but using standard Clojure <code>map</code> and <code>filter</code> would be inappropriate: once realized, the entire mapped/filtered sequence is stored in memory indefinitely.
They also require linear-time lookups for <code>nth</code>. Our versions <a href="https://github.com/jepsen-io/history/blob/be95aef21739167760c998e988fd4b3d10db1e3e/src/jepsen/history.clj#L969-L975">push down the map and filter operations into each access</a>, trading off cachability for reduced memory consumption. This pushdown also enables multiple queries to share the same map or filter function. We also provide <a href="https://github.com/jepsen-io/history/blob/be95aef21739167760c998e988fd4b3d10db1e3e/src/jepsen/history.clj#L939-L941">constant-time nth</a> on mapped histories, and <a href="https://github.com/jepsen-io/history/blob/be95aef21739167760c998e988fd4b3d10db1e3e/src/jepsen/history.clj#L1095-L1099">log-time nth</a> on filtered ones.</p> <h2><a href="#concurrent-folds" id="concurrent-folds">Concurrent Folds</a></h2> <p>Because Jepsen’s disk files are divided into multiple chunks we can efficiently parallelize folds (reductions) over histories. Jepsen histories provide their own <a href="https://github.com/jepsen-io/history/blob/be95aef21739167760c998e988fd4b3d10db1e3e/src/jepsen/history/fold.clj">folding facility</a>, similar to (and, indeed, directly compatible with) <a href="https://github.com/aphyr/tesser">Tesser</a>, a library I wrote for composing concurrent folds. Tesser is intended for commutative folds, but almost all of its operators—map, filter, group-by, fuse, facet, etc.—work properly in ordered reductions too. History folds can perform a concurrent reduction over each chunk independently, and then combine the chunks together in a linear reduction. Since modern CPUs can have dozens or even hundreds of cores, this yields a significant speedup. Since each core reads its chunk from the file in a linear scan, it’s also friendly to disk and memory readahead prediction. 
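</p>

<p>As an illustrative sketch (not the actual <code>jepsen.history.fold</code> API), a chunked fold reduces each chunk on its own thread, then combines the partial results in chunk order:</p>

<pre><code>;; Reduce every chunk concurrently, then combine partials in order.
;; This preserves ordered-fold semantics while using all cores.
(defn fold-chunks [reducer init combiner chunks]
  (let [partials (mapv (fn [chunk] (future (reduce reducer init chunk)))
                       chunks)]
    (reduce combiner (map deref partials))))

;; Counting :ok operations across two chunks:
(fold-chunks (fn [n op] (if (= :ok (:type op)) (inc n) n))
             0
             +
             [[{:type :ok} {:type :invoke}] [{:type :ok}]])
;; => 2
</code></pre>

<p>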
If this sounds like <a href="https://en.wikipedia.org/wiki/MapReduce">MapReduce</a> on a single box, you’re right.</p> <p><img class="attachment pure-img" src="/data/posts/365/fold.jpg" alt="A diagram showing how concurrent folds proceed in independent reductions over each chunk, then combined in a second linear pass." title="A diagram showing how concurrent folds proceed in independent reductions over each chunk, then combined in a second linear pass."></p> <p>To support this access pattern each history comes with an associated <a href="https://github.com/jepsen-io/history/blob/be95aef21739167760c998e988fd4b3d10db1e3e/src/jepsen/history/fold.clj#L1381-L1405">Folder</a>: a stateful object that schedules, executes, and returns results from folds. Performing a <a href="https://github.com/jepsen-io/history/blob/be95aef21739167760c998e988fd4b3d10db1e3e/src/jepsen/history/fold.clj#L818-L845">linear fold over a history</a> is easy: the folder schedules a reduce of the first chunk of the history, then uses that result as the input to a reduce of the second chunk, and so on. A <a href="https://github.com/jepsen-io/history/blob/be95aef21739167760c998e988fd4b3d10db1e3e/src/jepsen/history/fold.clj#L847-L879">concurrent fold</a> schedules reductions over every chunk concurrently, and a series of combine tasks which fold the results of each chunk reduction in serial.</p> <p>Folders are careful about thread locality and memory barriers during inter-thread handoff, which means folds can use <em>mutable</em> in-memory structures for their accumulators, not just immutable values. The reducing and combining functions don’t need to acquire locks of their own, and most access is completely unsynchronized.
This works particularly well with the new <a href="https://aphyr.com/posts/363-fast-multi-accumulator-reducers">mutable reducers</a> in <a href="https://github.com/aphyr/dom-top">dom-top</a>.</p> <p>Folders and histories directly support the interfaces for plain-old Clojure <code>reduce</code> and <code>reducers/fold</code>, so existing checkers immediately take advantage of folder optimizations. They’re translated to folds over the history internally. Transducers work just like you’d expect.</p> <h2><a href="#multi-query-optimization" id="multi-query-optimization">Multi-Query Optimization</a></h2> <p>Jepsen tests often involve dozens of different checkers, each of which might perform many folds. When a history is larger than memory, some chunks are evicted from cache to make room for new ones. This means that a second fold might end up re-parsing chunks of history from disk that a previous fold already parsed–and forgot due to GC pressure. Running many folds at the same time also reduces locality: each fold likely reads an operation and follows pointers to other objects at different times and on different cores. This burns extra time in memory accesses <em>and</em> evicts other objects from the CPU’s cachelines. It would be much more efficient if we could do many folds in a single pass.</p> <p>How should this coordination be accomplished? In theory we could have a pure fold composition system and ask checkers to return the folds they intended to perform, along with some sort of continuation to evaluate when results were ready—but this requires rewriting every extant checker and breaking the intuitive path of a checker’s control flow. It also plays terribly with internal concurrency in a checker. Instead, each history’s folder provides a stateful coordination service for folds. 
Checkers simply ask the folder to fold over the history, and the folder secretly fuses all those folds into as few passes as possible.</p> <p><img class="attachment pure-img" src="/data/posts/365/mqo.jpg" alt="When a second fold is run, the folder computes a fused fold and tries to do as much work as possible in a single pass." title="When a second fold is run, the folder computes a fused fold and tries to do as much work as possible in a single pass."></p> <p>Imagine one thread starts a fold (orange) that looks for G1a (aborted read). That fold finishes reducing chunk 0 of the history and is presently reducing chunks 3 and 5, when another thread starts a fold to compute latency statistics (blue). The folder realizes it has an ongoing concurrent fold. It builds a fused fold (gray) which performs both G1a and latency processing in a single step. It then schedules latency reductions for chunks 0, 3, and 5, to catch up with the G1a fold. Chunks 1, 2, 4, and 6+ haven’t been touched yet, so the folder cancels their reductions. In their place it spawns new fused reduction tasks for those chunks, and new combine tasks for their results. It automatically weaves together the independent and fused stages so that they re-use as much completed work as possible. Most of the history is processed in a single pass, and once completed, the folder delivers results from both folds to their respective callers. All of this is <a href="https://github.com/jepsen-io/history/blob/be95aef21739167760c998e988fd4b3d10db1e3e/src/jepsen/history/fold.clj#L1025-L1329">done transactionally</a>, ensuring exactly-once processing and thread safety even with unsynchronized mutable accumulators. 
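</p>

<p>The heart of the fusion is small, even if the bookkeeping is not: a fused fold simply pairs up the accumulators of its constituent folds. A minimal sketch (not the actual fold representation):</p>

<pre><code>;; Fuse two [identity-fn reducing-fn] folds into one whose accumulator
;; is a vector of both accumulators, so one pass serves both queries.
(defn fuse [[id-a rf-a] [id-b rf-b]]
  [(fn [] [(id-a) (id-b)])
   (fn [[acc-a acc-b] x] [(rf-a acc-a x) (rf-b acc-b x)])])

;; One pass computes a count of :ok ops and the maximum :time:
(let [[id rf] (fuse [(constantly 0)
                     (fn [n op] (if (= :ok (:type op)) (inc n) n))]
                    [(constantly 0)
                     (fn [t op] (max t (:time op)))])]
  (reduce rf (id) [{:type :ok, :time 3} {:type :invoke, :time 5}]))
;; => [1 5]
</code></pre>

<p>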
A simpler join process allows the folder to fuse together <a href="https://github.com/jepsen-io/history/blob/be95aef21739167760c998e988fd4b3d10db1e3e/src/jepsen/history/fold.clj#L910-L1023">linear folds</a>.</p> <p>The end result of this byzantine plumbing is that callers don’t have to think about <em>how</em> the underlying pass is executed. They just ask to fold over the history, and the folder figures out an efficient plan for executing it. This might involve a few independent passes over the first few chunks, but those chunks are likely retained in cache, which cuts down on deserialization. The rest of the history is processed in a single pass. We get high throughput, reduced IO, and cache locality.</p> <h2><a href="#task-scheduling" id="task-scheduling">Task Scheduling</a></h2> <p>To take advantage of multiple cores efficiently, we want checkers to run their analyses in multiple threads—but not <em>too</em> many, lest we impose heavy context-switching costs. In order to ensure thread safety and exactly-once processing during multi-query optimization, we need to ensure that fold tasks are cancelled and created atomically. We also want to ensure that tasks which block on the results of other tasks don’t launch until their dependencies are complete. Otherwise, launching blocked tasks would lead to threadpool starvation and, in some cases, deadlock: a classic problem with Java threadpools.</p> <p>To solve these problems Jepsen’s history library includes a <a href="https://github.com/jepsen-io/history/blob/be95aef21739167760c998e988fd4b3d10db1e3e/src/jepsen/history/task.clj">transactional, dependency-aware task scheduler</a>. Tasks are functions which receive their dependencies’ results as arguments, and can only start when their dependencies are complete. This prevents deadlock. Tasks are cancellable, which also cancels their dependencies. An exception in a task propagates to downstream dependencies, which simplifies error handling.
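</p>

<p>The model is easy to emulate naively with futures (a sketch, not the real task executor): each task dereferences its dependencies, then applies its function. The catch is that a blocked future pins a pool thread, which is exactly the starvation problem the real scheduler avoids by deferring launch until dependencies have finished:</p>

<pre><code>;; Naive dependency tasks: deref deps, then run. Correct, but a task
;; waiting on its deps occupies a thread the whole time.
(defn naive-task [deps f]
  (future (apply f (mapv deref deps))))

(def a (naive-task [] (fn [] 2)))
(def b (naive-task [] (fn [] 3)))
(def c (naive-task [a b] +))  ; runs + on a's and b's results
@c
;; => 5
</code></pre>

<p>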
Dereferencing a task returns its result, or throws if an exception occurred. These capabilities are backed by an <a href="https://github.com/jepsen-io/history/blob/be95aef21739167760c998e988fd4b3d10db1e3e/src/jepsen/history/task.clj#L373-L379">immutable executor state</a> which tracks the dependency graph, which tasks are ready and presently running, and the next task ID. A miniature <a href="https://github.com/jepsen-io/history/blob/be95aef21739167760c998e988fd4b3d10db1e3e/src/jepsen/history/task.clj#L806-L832">effect system</a> applies a log of “side effects” from changes to the immutable state. This approach allows us to support <a href="https://github.com/jepsen-io/history/blob/be95aef21739167760c998e988fd4b3d10db1e3e/src/jepsen/history/task.clj#L834-L851">arbitrary transactions</a> over the task executor: one can create and cancel any number of tasks in an atomic step.</p> <p>This is powered by questionable misuses of Java’s ExecutorService &amp; BlockingQueue: our “queue” is simply the immutable state’s dependency graph. We <a href="https://github.com/jepsen-io/history/blob/be95aef21739167760c998e988fd4b3d10db1e3e/src/jepsen/history/task.clj#L741-L747">ignore the ExecutorService’s attempts to add things to the queue</a> and instead mutate the state through our transaction channels. We wake up the executor with <a href="https://github.com/jepsen-io/history/blob/be95aef21739167760c998e988fd4b3d10db1e3e/src/jepsen/history/task.clj#L828-L832">trivial runnables</a> when tasks become available. This feels evil, but it <em>does</em> work: executor safety is validated by an extensive series of <a href="https://github.com/jepsen-io/history/tree/be95aef21739167760c998e988fd4b3d10db1e3e/test/jepsen">generative tests</a> which stress both linear and concurrent behavior.</p> <h2><a href="#compatibility" id="compatibility">Compatibility</a></h2> <p>The most significant API change was replacing plain vectors of operations with a first-class <code>IHistory</code> type. 
Users who constructed their own histories and fed them to built-in checkers—for instance, by calling regular Clojure <code>map</code> or <code>filter</code> on a history—received a blunt warning of type incompatibility. The fix here is straightforward: either use <code>jepsen.history/map</code> and friends, or <code>jepsen.history/history</code> to wrap an existing collection in a new <code>IHistory</code>. These wrappers are designed to work both for the clean histories Jepsen produces as well as the incomplete test histories involved in Jepsen’s internal test suites.</p> <p>The other significant change was making operation <code>:time</code> and <code>:index</code> mandatory. These are always provided by Jepsen, but might be missing from hand-coded histories used in internal test suites. Operations are automatically promoted to hashmaps by the <code>history</code> wrapper function, and indices can either be validated or automatically assigned for testing.</p> <p>Otherwise the transition was remarkably successful. Existing code could continue to treat operations as hashmaps, to pretend histories were vectors, and to use standard Clojure collection operations as before. These all worked transparently, despite now being backed by lazily cached on-disk structures. I incrementally rewrote most of the core checkers in Jepsen and Elle to take advantage of the new Op type and fold system, and obtained significant speedups throughout the process.</p> <p>Dropping <code>:generator</code> from the test map is a recent change. I don’t <em>think</em> that’s going to cause problems; I’m not aware of anything that actually read the generator state other than debugging code, but you never know.</p> <h2><a href="#performance" id="performance">Performance</a></h2> <p>With Jepsen 0.2.7 an etcd list-append transactional test on a five-node LXC cluster might perform ~7,000 ops/sec (counting both invocations and completions). At this point the bottleneck is etcd, not Jepsen itself. 
With a 16 GB heap, that test would exhaust memory and crash after roughly two hours. Given 90 minutes, it generated a single Fressian object of 2.2 GB containing 38,076,478 operations; this file was larger than MAX_INT bytes and caused Jepsen to crash.</p> <p>When limited to an hour that test squeaked in under both heap and serialization limits: 25,461,508 operations (7.07 kHz) in a 1.5 GB Jepsen file. Analyzing that in-memory history for G1a, G1b, and internal consistency took 202 seconds (126 kHz). This timeseries plot of heap use and garbage collections shows the linear growth in memory use required to retain the entire history and generator state in memory.</p> <p><img class="attachment pure-img" src="/data/posts/365/027-memory-just-enough.png" alt="A timeseries plot of heap use and garbage collections. Memory use rises linearly over the hour-long test run, followed by a spike in heap use and intense collection pressure during the final analysis phase." title="A timeseries plot of heap use and garbage collections. Memory use rises linearly over the hour-long test run, followed by a spike in heap use and intense collection pressure during the final analysis phase."></p> <p>The resulting Jepsen file took 366 seconds and a 58 GB heap to load the history—even just to read a single operation. There are various reasons for this, including the larger memory footprint of persistent maps and vectors, temporarily retained intermediate representations from the Fressian decoder, pointers and object header overhead, Fressian’s more-compact encodings of primitive types, and its caching of repeated values.</p> <p>With Jepsen 0.3.3 that same hour-long test consumed only a few hundred megabytes of heap until the analysis phase, where it took full advantage of the 16 GB heap to cache most of the history and avoid parsing chunks multiple times. It produced a history of 40,358,966 elements (11.2 kHz) in a 1.9 GB Jepsen file. The analysis completed in 97 seconds (416 kHz). 
Even though the analysis phase has to re-parse the entire history from disk, it does so in parallel, which significantly speeds up the process.</p> <p><img class="attachment pure-img" src="/data/posts/365/033-one-hour.png" alt="A timeseries plot of heap use and garbage collections. Memory use is essentially constant over the hour-long test run, oldgen rising slightly and then falling during a few collections. Memory spikes at the end of the test, when we re-load the history and analyze it." title="A timeseries plot of heap use and garbage collections. Memory use is essentially constant over the hour-long test run, oldgen rising slightly and then falling during a few collections. Memory spikes at the end of the test, when we re-load the history and analyze it."></p> <p>Given ~25.4 hours that same test suite recorded a billion (1,000,000,000) operations at roughly 11 kHz, producing a 108 GB Jepsen file. Checking that history for G1a, G1b, and internal anomalies took 1 hour and 6 minutes (249 kHz) with a 100 GB heap, and consumed 70-98% of 48 cores. Note that our heap needed to be much larger. G1a and G1b are global anomalies and require computing all failed and intermediate writes in an initial pass over the history, then retaining those index structures throughout a second pass. Under memory pressure, this analysis requires more frequent reloading of history chunks that have been evicted from cache. Simpler folds, like computing totals for ok, failed, and crashed operations, can execute at upwards of 900 kHz.</p> <p>Thanks to the new file format’s lazy structure, parsing the top-level test attributes (e.g. time, name, validity) took just 20-30 milliseconds. Fetching a single element of the history (regardless of position) took ~500 milliseconds: a few disk seeks to load the root, test, and history block, and then checksumming and parsing the appropriate chunk. 
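</p>

<p>With the default chunk size of 16384 operations, locating an operation’s chunk is simple arithmetic (a sketch of the fixed-size case; chunked vectors in general consult their per-chunk first-index table):</p>

<pre><code>;; Map a history index to [chunk-number offset-within-chunk],
;; assuming fixed-size chunks of 16384 operations.
(defn locate [index chunk-size]
  [(quot index chunk-size) (rem index chunk-size)])

(locate 999999999 16384)
;; => [61035 2559]
</code></pre>

<p>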
The history block itself was ~715 KB, and consisted of pointers to 61,000 chunks of 16384 operations each.</p> <h2><a href="#conclusion" id="conclusion">Conclusion</a></h2> <p>Jepsen 0.3.3 is capable of generating 50,000 operations per second, tackling histories of up to a billion operations and checking close to a million operations per second. Of course, these numbers depend on what the generator, client, and test are doing! Going further may be tricky: many JVM collection APIs are limited to 2<sup>31</sup> - 1 elements. Many of the checkers in Jepsen have been rewritten to run as concurrent folds, and to use more memory-efficient structures. However, only a few analyses are completely local: many require O(n) memory and may be impossible to scale past a few million operations.</p> <p>You can start using <a href="https://github.com/jepsen-io/jepsen/commit/cc200b1fd66cc36f827259237727655472c73f8e">Jepsen 0.3.3</a> today. I hope it opens up new territory for distributed systems testers everywhere.</p> </content></entry><entry><id>https://aphyr.com/posts/363-fast-multi-accumulator-reducers</id><title>Fast Multi-Accumulator Reducers</title><published>2023-04-27T08:47:46-05:00</published><updated>2023-04-27T08:47:46-05:00</updated><link rel="alternate" href="https://aphyr.com/posts/363-fast-multi-accumulator-reducers"></link><author><name>Aphyr</name><uri>https://aphyr.com/</uri></author><content type="html"><p>Again <a href="https://aphyr.com/posts/360-loopr-a-loop-reduction-macro-for-clojure">with the reductions</a>! I keep writing code which reduces over a collection, keeping track of more than one variable. For instance, here’s one way to find the mean of a collection of integers:</p> <pre><code><span></span><span class="p">(</span><span class="kd">defn </span><span class="nv">mean</span> <span class="s">&quot;A reducer to find the mean of a collection. 
Accumulators are [sum count] pairs.&quot;</span> <span class="p">([]</span> <span class="p">[</span><span class="mi">0</span> <span class="mi">0</span><span class="p">])</span> <span class="p">([[</span><span class="nv">sum</span> <span class="nv">count</span><span class="p">]]</span> <span class="p">(</span><span class="nb">/ </span><span class="nv">sum</span> <span class="nv">count</span><span class="p">))</span> <span class="p">([[</span><span class="nv">sum</span> <span class="nv">count</span><span class="p">]</span> <span class="nv">x</span><span class="p">]</span> <span class="p">[(</span><span class="nb">+ </span><span class="nv">sum</span> <span class="nv">x</span><span class="p">)</span> <span class="p">(</span><span class="nb">inc </span><span class="nv">count</span><span class="p">)]))</span> </code></pre> <p>This <code>mean</code> function is what Clojure calls a <a href="https://clojure.org/reference/reducers">reducer</a>, or a <a href="https://clojure.org/reference/transducers">reducing function</a>. With no arguments, it constructs a fresh accumulator. With two arguments, it combines an element of some collection with the accumulator, returning a new accumulator. 
With one argument, it transforms the accumulator into some final result.</p> <p>We can use this reducer with standard Clojure <code>reduce</code>, like so:</p> <pre><code><span></span><span class="p">(</span><span class="nf">require</span> <span class="o">'</span><span class="p">[</span><span class="nv">criterium.core</span> <span class="ss">:refer</span> <span class="p">[</span><span class="nv">quick-bench</span> <span class="nv">bench</span><span class="p">]]</span> <span class="o">'</span><span class="p">[</span><span class="nv">clojure.core.reducers</span> <span class="ss">:as</span> <span class="nv">r</span><span class="p">])</span> <span class="p">(</span><span class="k">def </span><span class="nv">numbers</span> <span class="p">(</span><span class="nf">vec</span> <span class="p">(</span><span class="nb">range </span><span class="mi">1</span><span class="nv">e7</span><span class="p">)))</span> <span class="p">(</span><span class="nf">mean</span> <span class="p">(</span><span class="nb">reduce </span><span class="nv">mean</span> <span class="p">(</span><span class="nf">mean</span><span class="p">)</span> <span class="nv">numbers</span><span class="p">))</span> <span class="c1">; =&gt; 9999999/2</span> </code></pre> <p>Or use <code>clojure.core.reducers/reduce</code> to apply that function to every element in a collection. We don’t need to call <code>(mean)</code> to get an identity here; <code>r/reduce</code> does that for us:</p> <pre><code><span></span><span class="p">(</span><span class="nf">mean</span> <span class="p">(</span><span class="nf">r/reduce</span> <span class="nv">mean</span> <span class="nv">numbers</span><span class="p">))</span> <span class="c1">; =&gt; 9999999/2</span> </code></pre> <p>Or we can use <code>transduce</code>, which goes one step further and calls the single-arity form to transform the accumulator. 
Transduce also takes a <code>transducer</code>: a function which takes a reducing function and transforms it into some other reducing function. In this case we’ll use <code>identity</code>–<code>mean</code> is exactly what we want.</p> <pre><code><span></span><span class="p">(</span><span class="nf">transduce</span> <span class="nb">identity </span><span class="nv">mean</span> <span class="nv">numbers</span><span class="p">)</span> <span class="c1">; =&gt; 9999999/2</span> </code></pre> <p>The problem with all of these approaches is that <code>mean</code> is slow:</p> <pre><code><span></span><span class="nv">user=&gt;</span> <span class="p">(</span><span class="nf">bench</span> <span class="p">(</span><span class="nf">transduce</span> <span class="nb">identity </span><span class="nv">mean</span> <span class="nv">numbers</span><span class="p">))</span> <span class="nv">Evaluation</span> <span class="nb">count </span><span class="err">:</span> <span class="mi">60</span> <span class="nv">in</span> <span class="mi">60</span> <span class="nv">samples</span> <span class="nv">of</span> <span class="mi">1</span> <span class="nv">calls.</span> <span class="nv">Execution</span> <span class="nb">time </span><span class="nv">mean</span> <span class="err">:</span> <span class="mf">1.169762</span> <span class="nv">sec</span> <span class="nv">Execution</span> <span class="nb">time </span><span class="nv">std-deviation</span> <span class="err">:</span> <span class="mf">46.845286</span> <span class="nv">ms</span> <span class="nv">Execution</span> <span class="nb">time </span><span class="nv">lower</span> <span class="nv">quantile</span> <span class="err">:</span> <span class="mf">1.139851</span> <span class="nv">sec</span> <span class="p">(</span> <span class="mf">2.5</span><span class="nv">%</span><span class="p">)</span> <span class="nv">Execution</span> <span class="nb">time </span><span class="nv">upper</span> <span class="nv">quantile</span> <span class="err">:</span> <span 
class="mf">1.311480</span> <span class="nv">sec</span> <span class="p">(</span><span class="mf">97.5</span><span class="nv">%</span><span class="p">)</span> <span class="nv">Overhead</span> <span class="nv">used</span> <span class="err">:</span> <span class="mf">20.346327</span> <span class="nv">ns</span> </code></pre> <p>So that’s 1.169 seconds to find the mean of 10^7 elements; about 8.5 MHz. What’s going on here? Let’s try <a href="https://www.yourkit.com/">Yourkit</a>:</p> <p><img class="attachment pure-img" src="/data/posts/363/tuple-time.png" alt="A Yourkit screenshot of this benchmark" title="A Yourkit screenshot of this benchmark"></p> <p>Yikes. We’re burning 42% of our total time in destructuring (<code>nth</code>) and creating (<code>LazilyPersistentVector.createOwning</code>) those vector pairs. Also Clojure vectors aren’t primitive, so these are all instances of Long.</p> <h2><a href="#faster-reducers" id="faster-reducers">Faster Reducers</a></h2> <p>Check this out:</p> <pre><code><span></span><span class="p">(</span><span class="nf">require</span> <span class="o">'</span><span class="p">[</span><span class="nv">dom-top.core</span> <span class="ss">:refer</span> <span class="p">[</span><span class="nv">reducer</span><span class="p">]])</span> <span class="p">(</span><span class="k">def </span><span class="nv">mean</span> <span class="s">&quot;A reducer to find the mean of a collection.&quot;</span> <span class="p">(</span><span class="nf">reducer</span> <span class="p">[</span><span class="nv">sum</span> <span class="mi">0</span>, <span class="nb">count </span><span class="mi">0</span><span class="p">]</span> <span class="c1">; Starting with a sum of 0 and count of 0,</span> <span class="p">[</span><span class="nv">x</span><span class="p">]</span> <span class="c1">; Take each element, x, and:</span> <span class="p">(</span><span class="nf">recur</span> <span class="p">(</span><span class="nb">+ </span><span class="nv">sum</span> <span 
class="nv">x</span><span class="p">)</span> <span class="c1">; Compute a new sum</span> <span class="p">(</span><span class="nb">inc </span><span class="nv">count</span><span class="p">))</span> <span class="c1">; And count</span> <span class="p">(</span><span class="nb">/ </span><span class="nv">sum</span> <span class="nv">count</span><span class="p">)))</span> <span class="c1">; Finally dividing sum by count</span> </code></pre> <p>The <code>reducer</code> macro takes a binding vector, like <code>loop</code>, with variables and their initial values. Then it takes a binding vector for an element of the collection. With each element it invokes the third form, which, like <code>loop</code>, recurs with new values for each accumulator variable. The optional fourth form is called at the end of the reduction, and transforms the accumulators into a final return value. You might recognize this syntax from <a href="https://aphyr.com/posts/360-loopr-a-loop-reduction-macro-for-clojure">loopr</a>.</p> <p>Under the hood, <code>reducer</code> <a href="https://github.com/aphyr/dom-top/blob/7f5d429593b42132b71c63f11d0fe8ae5b70ca68/src/dom_top/core.clj#L915-L957">dynamically compiles</a> an accumulator class with two fields: one for <code>sum</code> and one for <code>count</code>. It’ll re-use an existing accumulator class if it has one of the same shape handy. It then constructs a <a href="https://github.com/aphyr/dom-top/blob/7f5d429593b42132b71c63f11d0fe8ae5b70ca68/src/dom_top/core.clj#L1059-L1112">reducing function</a> which takes an instance of that accumulator class, and rewrites any use of <code>sum</code> and <code>count</code> in the iteration form into direct field access. 
Instead of constructing a fresh instance of the accumulator every time through the loop, it mutates the accumulator fields in-place, reducing GC churn.</p> <pre><code><span></span><span class="nv">user=&gt;</span> <span class="p">(</span><span class="nf">bench</span> <span class="p">(</span><span class="nf">transduce</span> <span class="nb">identity </span><span class="nv">mean</span> <span class="nv">numbers</span><span class="p">))</span> <span class="nv">Evaluation</span> <span class="nb">count </span><span class="err">:</span> <span class="mi">120</span> <span class="nv">in</span> <span class="mi">60</span> <span class="nv">samples</span> <span class="nv">of</span> <span class="mi">2</span> <span class="nv">calls.</span> <span class="nv">Execution</span> <span class="nb">time </span><span class="nv">mean</span> <span class="err">:</span> <span class="mf">753.896407</span> <span class="nv">ms</span> <span class="nv">Execution</span> <span class="nb">time </span><span class="nv">std-deviation</span> <span class="err">:</span> <span class="mf">10.828236</span> <span class="nv">ms</span> <span class="nv">Execution</span> <span class="nb">time </span><span class="nv">lower</span> <span class="nv">quantile</span> <span class="err">:</span> <span class="mf">740.657627</span> <span class="nv">ms</span> <span class="p">(</span> <span class="mf">2.5</span><span class="nv">%</span><span class="p">)</span> <span class="nv">Execution</span> <span class="nb">time </span><span class="nv">upper</span> <span class="nv">quantile</span> <span class="err">:</span> <span class="mf">778.701155</span> <span class="nv">ms</span> <span class="p">(</span><span class="mf">97.5</span><span class="nv">%</span><span class="p">)</span> <span class="nv">Overhead</span> <span class="nv">used</span> <span class="err">:</span> <span class="mf">20.346327</span> <span class="nv">ns</span> </code></pre> <p>Getting rid of the vector destructuring makes the reduction ~35% faster. 
I’ve seen real-world speedups in my reductions of 50% or more.</p> <h2><a href="#primitives" id="primitives">Primitives</a></h2> <p>Since we control class generation, we can take advantage of type hints to compile accumulators with primitive fields:</p> <pre><code><span></span><span class="p">(</span><span class="k">def </span><span class="nv">mean</span> <span class="p">(</span><span class="nf">reducer</span> <span class="p">[</span><span class="o">^</span><span class="nb">long </span><span class="nv">sum</span> <span class="mi">0</span>, <span class="o">^</span><span class="nb">long count </span><span class="mi">0</span><span class="p">]</span> <span class="p">[</span><span class="o">^</span><span class="nb">long </span><span class="nv">x</span><span class="p">]</span> <span class="p">(</span><span class="nf">recur</span> <span class="p">(</span><span class="nb">+ </span><span class="nv">sum</span> <span class="p">(</span><span class="nb">long </span><span class="nv">x</span><span class="p">))</span> <span class="p">(</span><span class="nb">inc </span><span class="nv">count</span><span class="p">))</span> <span class="p">(</span><span class="nb">/ </span><span class="nv">sum</span> <span class="nv">count</span><span class="p">)))</span> </code></pre> <p>Since <code>reduce</code>, <code>transduce</code>, etc. call our reducer with Object, rather than Long, we still have to unbox each element–but we <em>do</em> avoid boxing on the iteration variables, at least. A primitive-aware reduce could do better.</p> <p><img class="attachment pure-img" src="/data/posts/363/reducer-prim.png" alt="A Yourkit screenshot showing almost all our time is spent in unboxing incoming longs. The addition and destructuring costs are gone." title="A Yourkit screenshot showing almost all our time is spent in unboxing incoming longs. 
The addition and destructuring costs are gone."></p> <p>This gives us a 43% speedup over the standard pair reducer approach.</p> <pre><code><span></span><span class="nv">user=&gt;</span> <span class="p">(</span><span class="nf">bench</span> <span class="p">(</span><span class="nf">transduce</span> <span class="nb">identity </span><span class="nv">mean</span> <span class="nv">numbers</span><span class="p">))</span> <span class="nv">Evaluation</span> <span class="nb">count </span><span class="err">:</span> <span class="mi">120</span> <span class="nv">in</span> <span class="mi">60</span> <span class="nv">samples</span> <span class="nv">of</span> <span class="mi">2</span> <span class="nv">calls.</span> <span class="nv">Execution</span> <span class="nb">time </span><span class="nv">mean</span> <span class="err">:</span> <span class="mf">660.420219</span> <span class="nv">ms</span> <span class="nv">Execution</span> <span class="nb">time </span><span class="nv">std-deviation</span> <span class="err">:</span> <span class="mf">14.255986</span> <span class="nv">ms</span> <span class="nv">Execution</span> <span class="nb">time </span><span class="nv">lower</span> <span class="nv">quantile</span> <span class="err">:</span> <span class="mf">643.583522</span> <span class="nv">ms</span> <span class="p">(</span> <span class="mf">2.5</span><span class="nv">%</span><span class="p">)</span> <span class="nv">Execution</span> <span class="nb">time </span><span class="nv">upper</span> <span class="nv">quantile</span> <span class="err">:</span> <span class="mf">692.084752</span> <span class="nv">ms</span> <span class="p">(</span><span class="mf">97.5</span><span class="nv">%</span><span class="p">)</span> <span class="nv">Overhead</span> <span class="nv">used</span> <span class="err">:</span> <span class="mf">20.346327</span> <span class="nv">ns</span> </code></pre> <h2><a href="#early-return" id="early-return">Early Return</a></h2> <p>The <code>reducer</code> macro supports early 
return. If the iteration form does not <code>recur</code>, it skips the remaining elements and returns immediately. Here, we find the mean of at most 1000 elements:</p> <pre><code><span></span><span class="p">(</span><span class="k">def </span><span class="nv">mean</span> <span class="p">(</span><span class="nf">reducer</span> <span class="p">[</span><span class="o">^</span><span class="nb">long </span><span class="nv">sum</span> <span class="mi">0</span>, <span class="o">^</span><span class="nb">long count </span><span class="mi">0</span> <span class="ss">:as</span> <span class="nv">acc</span><span class="p">]</span> <span class="c1">; In the final form, call the accumulator `acc`</span> <span class="p">[</span><span class="o">^</span><span class="nb">long </span><span class="nv">x</span><span class="p">]</span> <span class="p">(</span><span class="k">if </span><span class="p">(</span><span class="nb">= count </span><span class="mi">1000</span><span class="p">)</span> <span class="p">(</span><span class="nb">/ </span><span class="nv">sum</span> <span class="nv">count</span><span class="p">)</span> <span class="c1">; Early return</span> <span class="p">(</span><span class="nf">recur</span> <span class="p">(</span><span class="nb">+ </span><span class="nv">sum</span> <span class="nv">x</span><span class="p">)</span> <span class="p">(</span><span class="nb">inc </span><span class="nv">count</span><span class="p">)))</span> <span class="c1">; Keep going</span> <span class="p">(</span><span class="k">if </span><span class="p">(</span><span class="nf">number?</span> <span class="nv">acc</span><span class="p">)</span> <span class="c1">; Depending on whether we returned early...</span> <span class="nv">acc</span> <span class="p">(</span><span class="k">let </span><span class="p">[[</span><span class="nv">sum</span> <span class="nv">count</span><span class="p">]</span> <span class="nv">acc</span><span class="p">]</span> <span class="p">(</span><span class="nb">/ 
</span><span class="nv">sum</span> <span class="nv">count</span><span class="p">)))))</span> </code></pre> <p>For consistency with <code>transduce</code>, both normal and early return accumulators are passed to the final form. Because the early-returned value might be of a different type, <code>reducer</code> lets you declare a single variable for the accumulators in the final form–here, we call it <code>acc</code>. Depending on whether we return early or normally, <code>acc</code> will either be a number or a vector of all accumulators: <code>[sum count]</code>.</p> <h2><a href="#available-now" id="available-now">Available Now</a></h2> <p>I’ve been using <code>reducer</code> extensively in <a href="https://jepsen.io/">Jepsen</a> over the last six months, and I’ve found it very helpful! It’s used in many of our checkers, and helps find anomalies in histories of database operations more quickly.</p> <p>If you’d like to try this for yourself, you’ll find <code>reducer</code> in <a href="https://github.com/aphyr/dom-top">dom-top</a>, my library for unorthodox control flow. 
It’s available on Clojars.</p> <p><em>A special thanks to Justin Conklin, who was instrumental in getting the bytecode generation and dynamic classloading <code>reducer</code> needs to work!</em></p> </content></entry><entry><id>https://aphyr.com/posts/361-monkeypox-in-ohio</id><title>Monkeypox in Ohio</title><published>2022-07-22T10:43:34-05:00</published><updated>2022-07-22T10:43:34-05:00</updated><link rel="alternate" href="https://aphyr.com/posts/361-monkeypox-in-ohio"></link><author><name>Aphyr</name><uri>https://aphyr.com/</uri></author><content type="html"><p><em>Update 2022-08-12: The Hamilton County Health Department now has a <a href="https://www.hamiltoncountyhealth.org/services/programs/community-health-services-disease-prevention/monkeypox/">page about monkeypox</a> with symptoms and isolation guidance, as well as options for vaccination, testing, and treatment–look for “complete our monkeypox vaccine registration”. The Cincinnati Health Department is <a href="https://www.cincinnati-oh.gov/health/news/monkeypox-vaccination/">also offering vaccines</a> for high-risk groups. People in Hamilton County without a primary care physician who have symptoms can also call 513-357-7320 for the Cincinnati city health clinic.</em></p> <p>If you’re a gregarious gay man like me you’ve probably heard about monkeypox. <a href="https://www.cdc.gov/poxvirus/monkeypox/index.html">Monkeypox is an orthopoxvirus</a> which causes, in addition to systemic symptoms, lesions on the skin and mucosa. It’s transmitted primarily through skin-to-skin contact, though close-range droplet and fomite transfer are also possible. The current outbreak in industrialized nations is <a href="https://mobile.twitter.com/mugecevik/status/1549858617800753154">almost entirely</a> among gay, bisexual, and other men who have sex with men (GBMSM); likely via sexual networks. 
<a href="https://www.gov.uk/government/publications/monkeypox-outbreak-technical-briefings/investigation-into-monkeypox-outbreak-in-england-technical-briefing-3">In the UK, for example</a>, 99% of cases are male and 97% are among GBMSM. <a href="https://www.publichealthontario.ca/-/media/Documents/M/2022/monkeypox-episummary.pdf?sc_lang=en">Ontario reports</a> 99% of cases are in men. In New York <a href="https://www1.nyc.gov/assets/doh/downloads/pdf/han/advisory/2022/monkeypox-update-testing-treatment.pdf">99% of cases are in men who have sex with men</a>. For a good overview of what monkeypox looks like, how it’s spread, and ways we can reduce transmission, check out San Francisco Leathermen’s Discussion Group’s <a href="https://www.youtube.com/watch?v=iLninLYkXDc">presentation by MPH Frank Strona</a>.</p> <p>Earlier outbreaks of monkeypox have been generally self-limiting, but that does not appear to be the case with the current outbreak; perhaps because it’s spreading among men who enjoy lots of skin-to-skin contact with lots of people, and often travel to do just that. We saw cases come out of Darklands, IML, and Daddyland; they’re also rising quickly in major cities across the US. Cases haven’t taken off in Ohio yet, but for those of us who travel to large gay events it’s an increasing risk. Two of my friends tested positive last week; one landed in the ER. We’ve known for a while about symptoms and risk reduction tactics, so I won’t cover those here, but I <em>do</em> want to talk about vaccines and testing.</p> <p>There is a vaccine (in the US, <a href="https://www.cdc.gov/mmwr/volumes/71/wr/mm7122e1.htm">JYNNEOS</a>) for monkeypox, but we have nowhere near enough doses. Some cities like SF, New York, and Chicago have pop-up mass vaccination sites. 
The Ohio Department of Health, on the other hand, has been <a href="https://ohiocapitaljournal.com/2022/07/13/monkeypox-is-spreading-but-the-ohio-department-of-health-hasnt-spread-the-message/">basically silent on the outbreak</a>: two months in there’s no public messaging about risk factors, prevention, testing, or treatment. <a href="https://aspr.hhs.gov/SNS/Pages/JYNNEOS-Distribution.aspx">Only ten doses</a> have been distributed in the state. Many of my friends here have no idea how to get vaccinated.</p> <p>Through an incredible stroke of luck and a truly fantastic doctor, I got my first dose yesterday. I followed up by chatting with our county’s (super nice!) head epidemiologist, and asking what advice I can give to the local queer leather community. Here’s the scoop:</p> <h2><a href="#getting-vaccinated-in-ohio" id="getting-vaccinated-in-ohio">Getting Vaccinated in Ohio</a></h2> <p>Because of the limited supply of vaccine doses, we’re currently focusing vaccination on those at high risk. Ohio’s current high-risk category includes:</p> <ul> <li>Anyone who has had close intimate contact (sex, cuddling, making out, etc.) with a known monkeypox case</li> <li>Exposure to respiratory droplets (e.g. dancing close, kissing) with a known case</li> </ul> <p>The problem is that many of us don’t <em>know</em> we’ve been in contact with monkeypox cases: of the people I know who have tested positive, none knew where they got it from. You might dance, make out with, or fuck dozens of people at an event weekend. So (our epidemiologist advised me) your participation in events where people have lots of sex or skin-to-skin contact can <em>also</em> be considered a risk factor. In other states having <a href="https://www1.nyc.gov/site/doh/health/health-topics/monkeypox.page">multiple or anonymous partners in a two-week span</a> is a qualifying criterion; that might be worth a shot here too.</p> <p>There are no public vaccination sites in Ohio yet, as far as I know. 
Doses are distributed to individuals through their primary care physician (which poses a problem if you <em>don’t</em> have one, oof). You should start by talking to your doctor about your risk factors: for me that was “I have two friends who tested positive, I just did bear week in ptown, I play regularly, and I’m going to be participating in a large leather run in a few weeks.” If your doctor concurs you’re at high risk, they should get in touch with their usual contact at the county or city department of health. The department of health gathers information about risk factors and contact information, and then gets in touch with the Ohio department of health, which requisitions doses from the CDC and ships them to your doctor. For me it took three days from “talking to my doctor” to “first dose in arm”.</p> <p>The JYNNEOS shot is fairly easy: it’s a single sub-cutaneous injection with a thin needle into the fat on your arm–much better than the old orthopoxvirus vaccines. I barely felt it. The usual vaccine side effects apply: you <a href="https://www.fda.gov/media/131078/download">might have some redness, swelling, or pain</a> at the injection site; other common side effects include low-grade muscle pain, headache, and fatigue. I’ve noticed some redness and a little bit of pain, but it’s been super mild. You need to come back for a second dose four weeks later, and protection is maximized two weeks after the second dose.</p> <p>When you get vaccinated, there’s a 21-day observation period where you need to report monkeypox-related symptoms via phone, text, or email to the health department–they’ll get in touch with you about this.</p> <h2><a href="#getting-tested-in-ohio" id="getting-tested-in-ohio">Getting Tested in Ohio</a></h2> <p>There’s no dedicated testing system in place yet, and I know friends have had really mixed luck getting tested through their primaries, urgent care, and even some hospitals. 
That said, our county epidemiologist says that both the state lab and some private labs–notably LabCorp–can test for monkeypox. Theoretically any primary care provider, urgent care, or emergency room should be able to do the test: they take a swab of your lesions and ship it off to the lab for analysis. In Cincinnati, Equitas Health (2805 Gilbert Ave, Cincinnati, OH 45206; 513-815-4475) should be able to test–call ahead and check before showing up though.</p> <p>When you get tested, you should isolate until you get results back. If your results are negative, talk to your primary doctor about what to do: it may be some other kind of illness. If the test is positive, you’ll have to isolate until no longer contagious, which means the lesions need to scab over and heal–you should see new skin underneath. Your contacts do not need to isolate unless they show symptoms, but the county will start contact tracing, and just in case, I think it’d be a good idea for you to reach out to intimate contacts personally.</p> <h2><a href="#mileage-may-vary" id="mileage-may-vary">Mileage May Vary</a></h2> <p>Right now I’m the only person I know who’s gotten vaccinated here, and I haven’t needed to look for testing, so I don’t know what’s going to happen when other people start trying to access care. A friend just told me that his doctor didn’t think there <em>was</em> a vaccine: if this happens to you, try showing them the CDC’s <a href="https://www.cdc.gov/mmwr/volumes/71/wr/mm7122e1.htm">MMWR report on JYNNEOS</a>. If they can’t get you JYNNEOS, <a href="https://www.cdc.gov/poxvirus/monkeypox/clinicians/smallpox-vaccine.html">ACAM2000</a> is also available–it involves worse side effects and may not be a good fit if you have a weakened immune system, but for many people it should still offer significant protection against monkeypox. 
If your doctor still doesn’t get it, ask them to contact the local health dept anyway.</p> <p>In Dayton, one friend’s primary doctor states the only way to get vaccinated is at the ER following a possible exposure. In NE Ohio, another reader’s primary said to contact the health dept directly; they’ve tried, but so far no response.</p> <p>If you’re still having trouble getting tested or vaccinated–your doctor doesn’t understand, or they can’t find the right contact at the health department–and we’re friends locally, get in touch. I’m <a href="mailto:[email protected]">[email protected]</a>. I may be able to put you in touch with a doctor and/or health department staff who can help.</p> </content></entry></feed>
{ "cache-control": "private,max-age=60", "cf-cache-status": "DYNAMIC", "cf-ray": "929b3dc745dde819-ORD", "connection": "keep-alive", "content-type": "application/atom+xml", "date": "Tue, 01 Apr 2025 21:39:49 GMT", "server": "cloudflare", "set-cookie": "JSESSIONID=S9mwOr8Bem1V4f6rTnj66nr22UIp6eop63XeJFxh; path=/; secure; HttpOnly; Max-Age=2592000; Expires=Thu, 01-May-2025 21:39:49 GMT", "strict-transport-security": "max-age=31536000; includeSubdomains", "transfer-encoding": "chunked", "vary": "accept-encoding", "x-content-type-options": "nosniff", "x-frame-options": "SAMEORIGIN, DENY", "x-geoblock": "US", "x-xss-protection": "1; mode=block" }
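The response headers above confirm two of the validator warnings: there is no ETag and no Last-Modified, so a feed reader can never receive a 304 Not Modified and must re-download the full ~145 KB body on every poll. A minimal sketch of the conditional fetch those validators would enable (hypothetical helper names; Python stdlib only):

```python
import urllib.error
import urllib.request

def validator_headers(etag=None, last_modified=None):
    """Build conditional-request headers from whatever validators we cached."""
    headers = {}
    if etag is not None:
        headers["If-None-Match"] = etag
    if last_modified is not None:
        headers["If-Modified-Since"] = last_modified
    return headers

def fetch_feed(url, etag=None, last_modified=None):
    """Poll a feed politely: send cached validators, treat 304 as 'unchanged'."""
    req = urllib.request.Request(url, headers=validator_headers(etag, last_modified))
    try:
        with urllib.request.urlopen(req) as resp:
            # 200: new body; remember these validators for the next poll.
            return resp.read(), resp.headers.get("ETag"), resp.headers.get("Last-Modified")
    except urllib.error.HTTPError as e:
        if e.code == 304:
            # Unchanged: keep the cached copy and the old validators.
            return None, etag, last_modified
        raise
```

Without an ETag or Last-Modified in the response, `validator_headers` is always empty and every poll transfers the whole document.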
{ "meta": { "type": "atom", "version": "1.0" }, "language": null, "title": "Aphyr: Posts", "description": null, "copyright": null, "url": "https://aphyr.com/", "self": "https://aphyr.com/posts.atom", "published": null, "updated": "2025-02-21T23:08:31.000Z", "generator": null, "image": null, "authors": [], "categories": [], "items": [ { "id": "https://aphyr.com/posts/380-comments-on-executive-order-14168", "title": "Comments on Executive Order 14168", "description": null, "url": "https://aphyr.com/posts/380-comments-on-executive-order-14168", "published": "2025-02-21T23:04:55.000Z", "updated": "2025-02-21T23:04:55.000Z", "content": "<p><em>Submitted to the Department of State, which is <a href=\"https://www.federalregister.gov/documents/2025/02/18/2025-02696/30-day-notice-of-proposed-information-collection-application-for-a-us-passport-for-eligible\">requesting comments</a> on a proposed change which would align US passport gender markers with <a href=\"https://www.whitehouse.gov/presidential-actions/2025/01/defending-women-from-gender-ideology-extremism-and-restoring-biological-truth-to-the-federal-government/\">executive order 14168</a>.</em></p>\n<p>Executive order 14168 is biologically incoherent and socially cruel. All passport applicants should be allowed to select whatever gender markers they feel best fit, including M, F, or X.</p>\n<p>In humans, neither sex nor gender is binary at any level. There are several possible arrangements of sex chromosomes: X, XX, XY, XXY, XYY, XXX, tetrasomies, pentasomies, etc. A single person can contain a mosaic of cells with different genetics: some XX, some XYY. Chromosomes may not align with genitalia: people with XY chromosomes may have a vulva and internal testes. People with XY chromosomes and a small penis may be surgically and socially reassigned female at birth—and never told what happened. 
None of these biological dimensions necessarily align with one’s internal concept of gender, or one’s social presentation.</p>\n<p>The executive order has no idea how biology works. It defines “female” as “a person belonging, at conception, to the sex that produces the large reproductive cell”. Zygotes do not produce reproductive cells at all: under this order none of us have a sex. Oogenesis doesn’t start until over a month into embryo development. Even if people were karyotyping their zygotes immediately after conception so they could tell what “legal” sex they were going to be, they could be wrong: which gametes we produce depends on the formation of the genital ridge.</p>\n<p>All this is to say that if people fill out these forms using this definition of sex, they’re guessing at a question which is both impossible to answer and socially irrelevant. You might be one of the roughly two percent of humans born with an uncommon sexual development and not even know it. Moreover, the proposed change fundamentally asks the wrong question: gender markers on passports are used by border control agents, and are expected to align with how those agents read the passport holder’s gender. A mismatch will create needless intimidation and hardship for travelers.</p>\n<p>Of course most of us will not have our identities challenged under this order. That animus is reserved for trans people, for gender-non-conforming people, for anyone whose genetics, body, dress, voice, or mannerisms don’t quite fit the mold. Those are the people who will suffer under this order. 
That cruelty should be resisted.</p>", "image": null, "media": [], "authors": [ { "name": "Aphyr", "email": null, "url": "https://aphyr.com/" } ], "categories": [] }, { "id": "https://aphyr.com/posts/379-geoblocking-the-uk-with-debian-nginx", "title": "Geoblocking the UK with Debian & Nginx", "description": null, "url": "https://aphyr.com/posts/379-geoblocking-the-uk-with-debian-nginx", "published": "2025-02-20T19:45:55.000Z", "updated": "2025-02-20T19:45:55.000Z", "content": "<p>A few quick notes for other folks who are <a href=\"https://geoblockthe.uk\">geoblocking the UK</a>. I just set up a basic geoblock with Nginx on Debian. This is all stuff you can piece together, but the Maxmind and Nginx docs are a little vague about the details, so I figure it’s worth an actual writeup. My Nginx expertise is ~15 years out of date, so this might not be The Best Way to do things. YMMV.</p>\n<p>First, register for a free <a href=\"https://www.maxmind.com/en/geolite2/signup\">MaxMind account</a>; you’ll need this to subscribe to their GeoIP database. Then set up a daemon to maintain a copy of the lookup file locally, and Nginx’s GeoIP2 module:</p>\n<pre><code><span></span>apt install geoipupdate libnginx-mod-http-geoip2\n</code></pre>\n<p>Create a license key on the MaxMind site, and download a copy of the config file you’ll need. Drop that in <code>/etc/GeoIP.conf</code>. It’ll look like:</p>\n<pre><code>AccountID XXXX\nLicenseKey XXXX\nEditionIDs GeoLite2-Country\n</code></pre>\n<p>The package sets up a cron job automatically, but we should grab an initial copy of the file. This takes a couple minutes, and writes out <code>/var/lib/GeoIP/GeoLite2-Country.mmdb</code>:</p>\n<pre><code><span></span>geoipupdate\n</code></pre>\n<p>The GeoIP2 module should already be loaded via <code>/etc/nginx/modules-enabled/50-mod-http-geoip2.conf</code>. Add a new config snippet like <code>/etc/nginx/conf.d/geoblock.conf</code>. 
The first part tells Nginx where to find the GeoIP database file, and then extracts the two-letter ISO country code for each request as a variable. The <code>map</code> part sets up an <code>$osa_geoblocked</code> variable, which is set to <code>1</code> for GB, otherwise <code>0</code>.</p>\n<pre><code>geoip2 /var/lib/GeoIP/GeoLite2-Country.mmdb {\n $geoip2_data_country_iso_code country iso_code;\n}\n\nmap $geoip2_data_country_iso_code $osa_geoblocked {\n GB 1;\n default 0;\n}\n</code></pre>\n<p>Write an HTML file somewhere like <code>/var/www/custom_errors/osa.html</code>, explaining the block. Then serve that page for HTTP 451 status codes: in <code>/etc/nginx/sites-enabled/whatever</code>, add:</p>\n<pre><code>server {\n ...\n # UK OSA error page\n error_page 451 /osa.html;\n location /osa.html {\n internal;\n root /var/www/custom_errors/;\n }\n\n # When geoblocked, return 451\n location / {\n if ($osa_geoblocked = 1) {\n return 451;\n }\n }\n}\n</code></pre>\n<p>Test your config with <code>nginx -t</code>, and then <code>service nginx reload</code>. You can test how things look from the UK using a VPN service, or something like <a href=\"https://www.locabrowser.com/\">locabrowser</a>.</p>\n<p>This is, to be clear, a bad solution. MaxMind’s free database is not particularly precise, and in general IP lookup tables are chasing a moving target. I know for a fact that there are people in non-UK countries (like Ireland!) who have been inadvertently blocked by these lookup tables. 
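One partial mitigation, not from the original writeup: nginx’s core `geo` module can layer a manual exemption list over the country lookup. A sketch, assuming you swap the `map` block above for this (the `192.0.2.0/24` range is a placeholder from the documentation address space; substitute real misclassified addresses):

```nginx
# Hypothetical allowlist for addresses the GeoIP database misclassifies.
geo $geoblock_exempt {
    default      0;
    192.0.2.0/24 1;
}

# Exempted addresses override the country-code decision: only
# non-exempt GB traffic gets $osa_geoblocked = 1.
map "$geoip2_data_country_iso_code:$geoblock_exempt" $osa_geoblocked {
    "GB:0"  1;
    default 0;
}
```

This keeps the blocking decision in one place, at the cost of hand-maintaining the exemption list.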
Making those people use Tor or a VPN <em>sucks</em>, but I don’t know what else to do in the current regulatory environment.</p>", "image": null, "media": [], "authors": [ { "name": "Aphyr", "email": null, "url": "https://aphyr.com/" } ], "categories": [] }, { "id": "https://aphyr.com/posts/378-seconds-since-the-epoch", "title": "Seconds Since the Epoch", "description": null, "url": "https://aphyr.com/posts/378-seconds-since-the-epoch", "published": "2024-12-25T18:46:21.000Z", "updated": "2024-12-25T18:46:21.000Z", "content": "<p>This is not at all news, but it comes up often enough that I think there should be a concise explanation of the problem. People, myself included, like to say that POSIX time, also known as Unix time, is the <a href=\"https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Date\">number</a> <a href=\"https://www.gnu.org/software/coreutils/manual/html_node/Seconds-since-the-Epoch.html\">of</a> <a href=\"https://man7.org/linux/man-pages/man2/time.2.html\">seconds</a> <a href=\"https://pkg.go.dev/time#Unix\">since</a> <a href=\"https://dev.mysql.com/doc/refman/8.4/en/datetime.html\">the</a> <a href=\"https://ruby-doc.org/core-3.0.0/Time.html\">Unix</a> <a href=\"https://docs.datastax.com/en/cql-oss/3.x/cql/cql_reference/timestamp_type_r.html\">epoch</a>, which was 1970-01-01 at 00:00:00.</p>\n<p>This is not true. Or rather, it isn’t true in the sense most people think. For example, it is presently 2024-12-25 at 18:51:26 UTC. The POSIX time is 1735152686. It has been 1735152713 seconds since the POSIX epoch. The POSIX time number is twenty-seven seconds lower.</p>\n<p>This is because POSIX time is derived <a href=\"https://nvlpubs.nist.gov/nistpubs/Legacy/FIPS/fipspub151-1.pdf\">in IEEE 1003.1</a> from <a href=\"https://en.wikipedia.org/wiki/Coordinated_Universal_Time\">Coordinated Universal Time</a>. The standard assumes that every day is exactly 86,400 seconds long. 
Specifically:</p>\n<blockquote>\n<p>The <em>time()</em> function returns the value of time in <b>seconds since the Epoch</b>.</p>\n</blockquote>\n<p>Which is defined as:</p>\n<blockquote>\n<p><b>seconds since the Epoch.</b> A value to be interpreted as the number of seconds between a specified time and the Epoch. A Coordinated Universal Time name (specified in terms of seconds (<em>tm_sec</em>), minutes (<em>tm_min</em>), hours (<em>tm_hour</em>), days since January 1 of the year (<em>tm_yday</em>), and calendar year minus 1900\n(<em>tm_year</em>)) is related to a time represented as <em>seconds since the Epoch</em> according to the expression below.</p>\n<p>If year < 1970 or the value is negative, the relationship is undefined. If year ≥ 1970 and the value is non-negative, the value is related to a Coordinated Universal Time name according to the expression:</p>\n<p><em>tm_sec</em> + <em>tm_min</em> * 60 + <em>tm_hour</em> * 3600 + <em>tm_yday</em> * 86400 +\n(<em>tm_year</em>-70) * 31536000 + ((<em>tm_year</em> - 69) / 4) * 86400</p>\n</blockquote>\n<p>The length of the day is not 86,400 seconds, and in fact changes over time. To keep UTC days from drifting too far from solar days, astronomers periodically declare a <a href=\"https://en.wikipedia.org/wiki/Leap_second\">leap second</a> in UTC. Consequently, every few years POSIX time jumps backwards, <a href=\"https://marc.info/?l=linux-kernel&m=134113577921904\">wreaking</a> <a href=\"https://www.zdnet.com/article/qantas-suffers-delays-due-to-linux-leap-second-bug/\">utter</a> <a href=\"https://blog.cloudflare.com/how-and-why-the-leap-second-affected-cloudflare-dns/\">havoc</a>. 
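A quick sanity check (mine, not the standard’s): the expression above, transcribed into Python, reproduces the POSIX time quoted at the top of this post. Note that the division in the leap-year term is C-style integer division:

```python
def posix_seconds(tm_sec, tm_min, tm_hour, tm_yday, tm_year):
    """Seconds since the Epoch, exactly as the IEEE 1003.1 expression
    defines it. tm_yday is days since January 1 (0-based); tm_year is
    the calendar year minus 1900."""
    return (tm_sec + tm_min * 60 + tm_hour * 3600 + tm_yday * 86400
            + (tm_year - 70) * 31536000 + ((tm_year - 69) // 4) * 86400)

# 2024-12-25 18:51:26 UTC is day 359 of 2024 (0-based), year 124.
print(posix_seconds(26, 51, 18, 359, 124))  # 1735152686
```

Add the twenty-seven leap seconds accrued since 1970 and you get 1735152713, the actual number of elapsed seconds.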
Someday it might jump forward.</p>\n<h2><a href=\"#archaeology\" id=\"archaeology\">Archaeology</a></h2>\n<p>Appendix B of IEEE 1003 has a fascinating discussion of leap seconds:</p>\n<blockquote>\n<p>The concept of leap seconds is added for precision; at the time this standard was published, 14 leap seconds had been added since January 1, 1970. These 14 seconds are ignored to provide an easy and compatible method of computing time differences.</p>\n</blockquote>\n<p>I, too, love to ignore things to make my life easy. The standard authors knew “seconds since the epoch” were not, in fact, seconds since the epoch. And they admit as much:</p>\n<blockquote>\n<p>Most systems’ notion of “time” is that of a continuously-increasing value, so this value should increase even during leap seconds. However, not only do most systems not keep track of leap seconds, but most systems are probably not synchronized to any standard time reference. Therefore, it is inappropriate to require that a time represented as seconds since the Epoch precisely represent the number of seconds between the referenced time and the Epoch.</p>\n<p>It is sufficient to require that applications be allowed to treat this time as if it represented the number of seconds between the referenced time and the Epoch. It is the responsibility of the vendor of the system, and the administrator of the system, to ensure that this value represents the number of seconds between the referenced time and the Epoch as closely as necessary for the application being run on that system….</p>\n</blockquote>\n<p>I imagine there was some debate over this point. The appendix punts, saying that vendors and administrators must make time align “as closely as necessary”, and that “this value should increase even during leap seconds”. 
The latter is achievable, but the former is arguably impossible: the standard requires POSIX clocks be twenty-seven seconds off.</p>\n<blockquote>\n<p>Consistent interpretation of seconds since the Epoch can be critical to certain types of distributed applications that rely on such timestamps to synchronize events. The accrual of leap seconds in a time standard is not predictable. The number of leap seconds since the Epoch will likely increase. The standard is\nmore concerned about the synchronization of time between applications of astronomically short duration and the Working Group expects these concerns to become more critical in the future.</p>\n</blockquote>\n<p>In a sense, the opposite happened. Time synchronization is <em>always</em> off, so systems generally function (however incorrectly) when times drift a bit. But leap seconds are rare, and the linearity evoked by the phrase “seconds since the epoch” is so deeply baked in to our intuition, that software can accrue serious, unnoticed bugs. Until a few years later, one of those tiny little leap seconds takes down a big chunk of the internet.</p>\n<h2><a href=\"#what-to-do-instead\" id=\"what-to-do-instead\">What To Do Instead</a></h2>\n<p>If you just need to compute the duration between two events on one computer, use <a href=\"https://docs.redhat.com/en/documentation/red_hat_enterprise_linux_for_real_time/7/html/reference_guide/sect-posix_clocks#sect-POSIX_clocks\"><code>CLOCK_MONOTONIC</code></a>, or better yet, <code>CLOCK_BOOTTIME</code>. If you don’t need to exchange timestamps with other systems that assume POSIX time, use <a href=\"https://www.ipses.com/eng/in-depth-analysis/standard-of-time-definition/\">TAI, GPS, or maybe LORAN</a>. If you do need rough alignment with other POSIX-timestamp systems, <a href=\"https://developers.google.com/time/smear\">smear leap seconds</a> over a longer window of time. 
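To illustrate what smearing means, here’s a sketch (my own, modeled loosely on Google’s 24-hour linear smear, not taken from any real implementation): rather than stepping, the clock runs slightly slow across a long window, absorbing the leap second gradually:

```python
def smeared_offset(seconds_until_midnight_utc, smear_window=86400):
    """Fraction of a positive leap second (occurring at midnight UTC)
    already applied, under a linear smear over the preceding day. The
    smeared clock runs slow by 1/86400, so it never jumps."""
    elapsed = smear_window - seconds_until_midnight_utc
    if elapsed <= 0:
        return 0.0  # smear window hasn't started yet
    if elapsed >= smear_window:
        return 1.0  # the full leap second has been absorbed
    return elapsed / smear_window

# Halfway through the window, half the leap second is applied:
print(smeared_offset(43200))  # 0.5
```

Every timestamp inside the window is slightly wrong, but monotonic, which is usually the trade you want.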
Libraries like <a href=\"https://github.com/qntm/t-a-i\">qntm’s t-a-i</a> can convert back and forth between POSIX and TAI.</p>\n<p>There’s an ongoing effort to <a href=\"https://www.timeanddate.com/news/astronomy/end-of-leap-seconds-2022\">end leap seconds</a>, hopefully <a href=\"https://www.bipm.org/documents/20126/64811223/Resolutions-2022.pdf/281f3160-fc56-3e63-dbf7-77b76500990f\">by 2035</a>. It’ll require additional work to build conversion tables into everything that relies on the “86,400 seconds per day” assumption, but it should also make it much simpler to ask questions like “how many seconds between these two times”. At least for times after 2035!</p>", "image": null, "media": [], "authors": [ { "name": "Aphyr", "email": null, "url": "https://aphyr.com/" } ], "categories": [] }, { "id": "https://aphyr.com/posts/371-threads-wont-take-you-south-of-market", "title": "Threads Won't Take You South of Market", "description": null, "url": "https://aphyr.com/posts/371-threads-wont-take-you-south-of-market", "published": "2024-12-01T15:01:36.000Z", "updated": "2024-12-01T15:01:36.000Z", "content": "<p>In June 2023, when <a href=\"https://threads.net\">Threads</a> announced their <a href=\"https://techcrunch.com/2023/07/05/adam-mosseri-says-metas-threads-app-wont-have-activitypub-support-at-launch/\">plans to federate</a> with other <a href=\"https://en.wikipedia.org/wiki/Fediverse\">Fediverse instances</a>, there was a good deal of <a href=\"https://fedipact.online/\">debate</a> around whether smaller instances should allow federation or block it pre-emptively. As one of the admins of <a href=\"https://woof.group\">woof.group</a>, I wrote about some of the <a href=\"https://blog.woof.group/announcements/considering-large-instance-federation\">potential risks and rewards</a> of federating with Threads. 
We decided to <a href=\"https://blog.woof.group/announcements/deferring-threads-federation\">wait and see</a>.</p>\n<p>In my queer and leather circles, Facebook and Instagram have been generally understood as hostile environments for over a decade. In 2014, their <a href=\"https://www.eff.org/deeplinks/2014/09/facebooks-real-name-policy-can-cause-real-world-harm-lgbtq-community\">“Real Name” policy</a> made life particularly difficult for trans people, drag queens, sex workers, and people who, for various reasons, needed to keep their real name disconnected from their queer life. My friends have been repeatedly suspended from both platforms for showing too much skin, or using the peach emoji. Meta’s moderation has been aggressive, opaque, and wildly inconsistent: sometimes full nudity is fine; other times a kiss or swimsuit is beyond the line. In some circles, maintaining a series of backup accounts in advance of one’s ban became de rigueur.</p>\n<p>I’d hoped that federation between Threads and the broader Fediverse might allow a <a href=\"https://blog.woof.group/mods/the-shape-of-social-space\">more nuanced spectrum</a> of moderation norms. Threads might opt for a more conservative environment locally, but through federation, allow their users to interact with friends on instances with more liberal norms. Conversely, most of my real-life friends are still on Meta services—I’d love to see their posts and chat with them again. Threads could communicate with Gay Fedi (using the term in the broadest sense), and de-rank or hide content they don’t like on a per-post or per-account basis.</p>\n<p>This world seems technically feasible. Meta reports <a href=\"https://techcrunch.com/2024/11/03/threads-now-has-275m-monthly-active-users/\">275 million Monthly Active Users (MAUs)</a>, and over <a href=\"https://www.statista.com/statistics/1092227/facebook-product-dau/\">three billion</a> across other Meta services. 
Fediverse has something like <a href=\"https://fedidb.org/\">one million MAUs across various instances</a>. This is not a large jump in processing or storage; nor would it seem to require a large increase in moderation staff. Threads has already committed to doing the requisite engineering, user experience, and legal work to allow federation across a broad range of instances. Meta is swimming in cash.</p>\n<p>All this seems a moot point. A year and a half later, Threads <a href=\"https://www.theverge.com/24107998/threads-fediverse-mastodon-how-to\">is barely half federated</a>. It publishes Threads posts to the world, but only if you dig into the settings and check the “Fediverse Sharing” box. Threads users can see replies to their posts, but can’t talk back. Threads users can’t mention others, see mentions from other people, or follow anyone outside Threads. This may work for syndication, but is essentially unusable for conversation.</p>\n<p>Despite the fact that Threads users can’t follow or see mentions from people on other instances, Threads has already <a href=\"https://www.threads.net/moderated_servers\">opted to block</a> a slew of instances where gay & leather people congregate. Threads blocks <a href=\"https://hypno.social\">hypno.social</a>, <a href=\"https://rubber.social\">rubber.social</a>, <a href=\"https://4bear.com\">4bear.com</a>, <a href=\"https://nsfw.lgbt\">nsfw.lgbt</a>, <a href=\"https://kinkyelephant.com\">kinkyelephant.com</a>, <a href=\"https://kinktroet.social\">kinktroet.social</a>, <a href=\"https://barkclub.xyz\">barkclub.xyz</a>, <a href=\"https://mastobate.social\">mastobate.social</a>, and <a href=\"https://kinky.business\">kinky.business</a>. They also block the (now-defunct) instances <a href=\"https://bear.community\">bear.community</a>, <a href=\"https://gaybdsm.group\">gaybdsm.group</a>, and <a href=\"https://gearheads.social\">gearheads.social</a>. 
They block more general queer-friendly instances like <a href=\"https://bark.lgbt\">bark.lgbt</a>, <a href=\"https://super-gay.co\">super-gay.co</a>, <a href=\"https://gay.camera\">gay.camera</a>, and <a href=\"https://gaygeek.social\">gaygeek.social</a>. They block sex-positive instances like <a href=\"https://nsfwphotography.social\">nsfwphotography.social</a>, <a href=\"https://nsfw.social\">nsfw.social</a>, and <a href=\"https://net4sw.com\">net4sw.com</a>. All these instances are blocked for having “violated our Community Standards or Terms of Use”. Others like <a href=\"https://fisting.social\">fisting.social</a>, <a href=\"https://mastodon.hypnoguys.com\">mastodon.hypnoguys.com</a>, <a href=\"https://abdl.link\">abdl.link</a>, <a href=\"https://qaf.men\">qaf.men</a>, and <a href=\"https://social.rubber.family\">social.rubber.family</a> are blocked for having “no publicly accessible feed”. I don’t know what this means: mastodon.hypnoguys.com, for instance, has the usual Mastodon <a href=\"https://mastodon.hypnoguys.com/public/local\">publicly accessible local feed</a>.</p>\n<p>It’s not like these instances are hotbeds of spam, hate speech, or harassment: woof.group federates heavily with most of the servers I mentioned above, and we rarely have problems with their moderation. Most have reasonable and enforced media policies requiring sensitive-media flags for genitals, heavy play, and so on. Those policies are generally speaking looser than Threads’ (woof.group, for instance, allows butts!) but there are plenty of accounts and posts on these instances which would be anodyne under Threads’ rules.</p>\n<p>I am shocked that woof.group is <em>not</em> on Threads’ blocklist yet. We have similar users who post similar things. Our content policies are broadly similar—several of the instances Threads blocks actually adopted woof.group’s specific policy language. 
I doubt it’s our size: Threads blocks several instances with fewer than ten MAUs, and woof.group has over seven hundred.</p>\n<p>I’ve been out of the valley for nearly a decade, and I don’t have insight into Meta’s policies or decision-making. I’m sure Threads has their reasons. Whatever they are, Threads, like all of Meta’s services, feels distinctly uncomfortable with sex, and sexual expression is a vibrant aspect of gay culture.</p>\n<p>This is part of why I started woof.group: we deserve spaces moderated with our subculture in mind. But I also hoped that by designing a moderation policy which compromised with normative sensibilities, we might retain connections to a broader set of publics. This particular leather bar need not be an invite-only clubhouse; it can be a part of a walkable neighborhood. For nearly five years we’ve kept that balance, retaining open federation with most all the Fediverse. I get the sense that Threads intends to wall its users off from our world altogether—to make “bad gays” invisible. If Threads were a taxi service, it wouldn’t take you <a href=\"https://sfleatherdistrict.org/wp-content/uploads/2021/04/Rubin-Valley-of-Kings.pdf\">South of Market</a>.</p>", "image": null, "media": [], "authors": [ { "name": "Aphyr", "email": null, "url": "https://aphyr.com/" } ], "categories": [] }, { "id": "https://aphyr.com/posts/370-ecobee-settings-for-heat-pumps-with-resistive-aux-heat", "title": "Ecobee Settings for Heat Pumps with Resistive Aux Heat", "description": null, "url": "https://aphyr.com/posts/370-ecobee-settings-for-heat-pumps-with-resistive-aux-heat", "published": "2024-02-29T04:41:38.000Z", "updated": "2024-02-29T04:41:38.000Z", "content": "<p>I’m in the process of replacing an old radiator system with a centrally-ducted, air-source heat pump system with electric resistive backup heat. 
I’ve found that the default ecobee algorithm seems to behave surprisingly poorly for this system, and wanted to write up some of the settings that I’ve found yield better behavior.</p>\n<p>A disclaimer. I’m not an HVAC professional. I have two decades in software operations, a background in physics, and far too much experience inferring system dynamics from timeseries graphs. This advice may void your warranty, burn your house down, etc.; everything you do is at your own risk.</p>\n<h2><a href=\"#the-system\" id=\"the-system\">The System</a></h2>\n<p>First, a bit about the system in question. You can skip this section if you know about heat pumps, short cycling, staging, etc.</p>\n<p>There are two main subsystems: a heat pump and an air handler. The heat pump sits outside: it has a fan which moves outside air over a heat exchanger, and a compressor, which compresses a working fluid. The working fluid is connected in a loop to the air handler, where it runs through another heat exchanger to heat or cool the inside air. The air handler also has a blower fan which circulates air through the whole house. If the heat pump can’t keep up with demand, the air handler also has a pair of resistive electric heating coils, called <em>aux heat</em>, which can supplement or take over from the heat pumps.</p>\n<p>A few important things to know about heat pumps. First, electric resistive heaters have a <em>Coefficient of Performance</em> (CoP) of essentially 1: they take 1 joule of electricity and turn it into 1 joule of heat in the air. My heat pumps have a typical heating CoP of about 2-4, depending on temperature and load. They take 1 joule of electricity and suck 2 to 4 joules of heat from the outside air into the inside. This means they cost 2-4 times less (in electric opex, at least) than a standard resistive electric heating system.</p>\n<p>Second, heat pumps, like A/C systems, shouldn’t start and stop too frequently. 
Starting up causes large transient electrical and mechanical stresses. Ideally they should run at a low speed for several hours; running at full blast, shutting off, then turning on again ten minutes later is called “short cycling”.</p>\n<p>Third, the heat pump’s fan, heat pump’s compressor, and the air handler’s fan are all variable-speed: they can run very slow (quiet, efficient), very fast (loud, more powerful), or at any speed in between. This helps reduce short-cycling, as well as improving efficiency and reducing noise. However, directly setting compressor and fan speed requires a special “communicating” thermostat made by the same manufacturer, which speaks a proprietary wire protocol. My manufacturer’s communicating thermostats are very expensive and have a reputation for buggy hardware and software, so I opted to get an <a href=\"https://www.ecobee.com/en-us/smart-thermostats/smart-wifi-thermostat/\">ecobee 3 lite</a>. Like essentially every other thermostat on the planet, the ecobee uses ~8 wires with simple binary signals, like “please give me heat” and “please turn on the fan”. It can’t ask for a specific <em>amount</em> of heat.</p>\n<p>However, all is not lost. The standard thermostat protocol has a notion of a “two-stage” system—if the Y1 wire is hot, it’s asking for “some heat”, and if Y2 is also hot, it’s asking for “more heat”. My variable-speed heat pump emulates a two-stage system using a hysteresis mechanism. In stage 1, the heat pump offers some nominal low degree of heat. When the thermostat calls for stage 2, it kicks up the air handler blower a notch, and after 20 minutes, it slowly ramps up the heat pump compressor as well. I assume there’s a ramp-down for going back to stage 1. They say this provides “true variable-capacity operation”. 
You can imagine that the most efficient steady state is where the thermostat toggles rapidly between Y1 and Y2, causing the system to hang out at exactly the right variable speeds for current conditions—but I assume ecobee has some kind of frequency limiter to avoid damaging systems that actually have two separate stages with distinct startup/shutdown costs.</p>\n<p>The air handler’s aux heat is also staged: if the W1 wire is hot, I think (based on staring at the wiring diagram and air handler itself) it just energizes one of two coils. If W2 is also hot, it energizes both. I think this is good: we want to use as much of the heat pump heat as possible, and if we can get away with juuuust a little aux heat, instead of going full blast, that’ll save energy.</p>\n<p>In short: aux heat is 2-4x more expensive than heat pump heat; we want to use as little aux as possible. Short-cycling is bad: we want long cycle times. For maximum efficiency, we want both the heat pump and aux heat to be able to toggle between stage 1 and 2 depending on demand.</p>\n<h2><a href=\"#automatic-problems\" id=\"automatic-problems\">Automatic Problems</a></h2>\n<p>I initially left the ecobee at its automatic default settings for a few weeks; it’s supposed to learn the house dynamics and adapt. I noticed several problems. Presumably this behavior depends on weather, building thermal properties, HVAC dynamics, and however ecobee’s tuned their algorithm last week, so YMMV: check your system and see how it looks.</p>\n<p>It’s kind of buried, but ecobee offers a really nice time-series visualization of thermostat behavior on their web site. There’s also a Home Assistant integration that pulls in data from their API. It’s a pain in the ass to set up (ecobee, there’s no need for this to be so user-hostile), but it does work.</p>\n<p>Over the next few weeks I stared obsessively at time-series plots from both ecobee and Home Assistant, and mucked around with ecobee’s settings. 
Most of what I’ll describe below is configurable in the settings menu on the thermostat: look for “settings”, “installation settings”, “thresholds”.</p>\n<h2><a href=\"#reducing-aux-heat\" id=\"reducing-aux-heat\">Reducing Aux Heat</a></h2>\n<p>First, the automatics kicked on aux heat a <em>lot</em>. Even in situations where the heat pump would have been perfectly capable of getting up to temp, ecobee would burn aux heat to reach the target temperature (<em>set point</em>) faster.</p>\n<p>Part of the problem was that ecobee ships (I assume for safety reasons) with ludicrously high cut-off thresholds for heat pumps. Mine had “compressor min outdoor temperature” of something like 35 degrees, so the heat pump wouldn’t run for most of the winter. The actual minimum temperature of my model is -4, cold-climate heat pumps run down to -20. I lowered mine to -5; the manual says there’s a physical thermostat interlock on the heat pump itself, and I trust that more than the ecobee weather feed anyway.</p>\n<p>Second: ecobee seems to prioritize speed over progress: if it’s not getting to the set point fast enough, it’ll burn aux heat to get there sooner. I don’t want this: I’m perfectly happy putting on a jacket. After a bit I worked out that the heat pumps alone can cover the house load down to ~20 degrees or so, and raised “aux heat max outdoor temperature” to 25. If it’s any warmer than that, the system won’t use aux heat.</p>\n<h2><a href=\"#reverse-staging\" id=\"reverse-staging\">Reverse Staging</a></h2>\n<p>A second weird behavior: once the ecobee called for stage 2, either from the heat pump or aux, it would run in stage 2 until it hit the set point, then shut off the system entirely. Running aux stage 2 costs more energy. 
Running the heat pump in stage 2 shortens the cycle time: remember, the goal is a low, long running time.</p>\n<p><img class=\"attachment pure-img\" src=\"/data/posts/370/no-reverse-staging.png\" alt=\"A time-series plot showing that once stage 2 engages, it runs until shutting off, causing frequent cycling\" title=\"A time-series plot showing that once stage 2 engages, it runs until shutting off, causing frequent cycling\"></p>\n<p>The setting I used to fix this is called “reverse staging”. Ecobee’s <a href=\"https://support.ecobee.com/s/articles/Threshold-settings-for-ecobee-thermostats\">documentation</a> says:</p>\n<blockquote>\n<p>Compressor Reverse Staging: Enables the second stage of the compressor near the temperature setpoint.</p>\n</blockquote>\n<p>As far as I can tell this documentation is completely wrong. From watching the graphs, this setting seems to allow the staging state machine to move from stage 2 back to stage 1, rather than forcing it to run in stage 2 until shutting off entirely. It’ll go back up to stage 2 if it needs to, and back down again.</p>\n<p><img class=\"attachment pure-img\" src=\"/data/posts/370/reverse-staging.png\" alt=\"With reverse staging, it'll jump up to stage 2, then drop back down to stage 1.\" title=\"With reverse staging, it'll jump up to stage 2, then drop back down to stage 1.\"></p>\n<h2><a href=\"#manual-staging\" id=\"manual-staging\">Manual Staging</a></h2>\n<p>I couldn’t seem to get ecobee’s automatic staging to drop back to stage 1 heat reliably, or avoid kicking on aux heat when stage 2 heat pump heat would have done fine. I eventually gave up and turned off automatic staging altogether. I went with the delta temperature settings. If the temperature delta between the set point and indoor air is more than 1 degree, it turns on heat pump stage 1. More than two degrees, stage 2. More than four degrees, aux 1. More than five degrees, aux 2. 
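As pseudocode, those thresholds behave roughly like this (my paraphrase of the settings described above, not ecobee’s actual implementation):

```python
def staging(delta_f):
    """Highest stage called for at a given delta (set point minus indoor
    temperature, degrees F), per the manual thresholds I configured.
    Aux is configured to run concurrently with the heat pump."""
    if delta_f > 5:
        return "aux stage 2 (plus heat pump)"
    if delta_f > 4:
        return "aux stage 1 (plus heat pump)"
    if delta_f > 2:
        return "heat pump stage 2"
    if delta_f > 1:
        return "heat pump stage 1"
    return "off"

print(staging(1.5))  # heat pump stage 1
```

The wide gap between the stage-2 threshold (two degrees) and the first aux threshold (four degrees) is what keeps the heat pump doing the bulk of the work.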
The goal here is to use only as much aux heat as absolutely necessary to supplement the heat pump. I also have aux heat configured to run concurrently with the heat pump: there’s a regime where the heat pump provides useful heat, but not quite enough, and my intuition is that <em>some</em> heat pump heat is cheaper than all aux.</p>\n<p>I initially tried the default 0.5 degree delta before engaging the heat pump’s first stage. It turns out that for some temperature regimes this creates rapid cycling: that first-phase heat is enough to heat the house rapidly to the set point, and then there’s nothing to do but shut the system off. The house cools, and the system kicks on again, several times per hour. I raised the delta to 1 degree, which significantly extended the cycle time.</p>\n<h2><a href=\"#large-setback-with-preheating\" id=\"large-setback-with-preheating\">Large Setback with Preheating</a></h2>\n<p>A <em>setback</em> is when you lower your thermostat, e.g. while away from home or sleeping. There’s some folk wisdom that heat pumps should run at a constant temperature all the time, rather than have a large setback. As far as I can tell, this is because a properly-sized heat pump system (unlike a gas furnace) doesn’t deliver a ton of excess heat, so it can’t catch up quickly when asked to return to a higher temperature. To compensate, the system might dip into aux heat, and that’s super expensive.</p>\n<p>I’m in the US Midwest, where winter temperatures are usually around 15-40 F. I drop from 68 to 60 overnight, and the house can generally coast all night without having to run any HVAC at all. In theory the ecobee should be able to figure out the time required to come back to 68 and start the heat pump early in the morning, but in practice I found it would wait too long, and then the large difference between actual and set temp would trigger aux heat. 
To avoid this, I added a custom activity in Ecobee’s web interface (I call mine “preheat”), with a temperature of 64. I have my schedule set up with an hour of preheat in the morning, before going to the normal 68. This means there’s less of a delta-T, and the system can heat up entirely using the heat pump.</p>\n<p><img class=\"attachment pure-img\" src=\"/data/posts/370/setback.png\" alt=\"A time series graph showing temperature falling smoothly overnight as the HVAC is disabled, and then rising during the preheat phase in the morning.\" title=\"A time series graph showing temperature falling smoothly overnight as the HVAC is disabled, and then rising during the preheat phase in the morning.\"></p>", "image": null, "media": [], "authors": [ { "name": "Aphyr", "email": null, "url": "https://aphyr.com/" } ], "categories": [] }, { "id": "https://aphyr.com/posts/369-classnotfoundexception-java-util-sequencedcollection", "title": "ClassNotFoundException: java.util.SequencedCollection", "description": null, "url": "https://aphyr.com/posts/369-classnotfoundexception-java-util-sequencedcollection", "published": "2024-02-20T23:03:42.000Z", "updated": "2024-02-20T23:03:42.000Z", "content": "<p>Recently I’ve had users of my libraries start reporting mysterious errors due to a missing reference to <code>SequencedCollection</code>, a Java interface added in JDK 21:</p>\n<pre><code>Execution error (ClassNotFoundException) at\njdk.internal.loader.BuiltinClassLoader/loadClass (BuiltinClassLoader.java:641).\njava.util.SequencedCollection\n</code></pre>\n<p>Specifically, projects using <a href=\"https://github.com/jepsen-io/jepsen/issues/585\">Jepsen 0.3.5</a> started throwing this error due to Clojure’s built-in <code>rrb_vector.clj</code>, which is particularly vexing given that the class doesn’t reference <code>SequencedCollection</code> at all.</p>\n<p>It turns out that the Clojure compiler, when run on JDK 21 or later, will automatically insert references to this class when 
compiling certain expressions, likely because it now appears in the supertypes of other classes. Jepsen had <code>:javac-options [\"-source\" \"11\" \"-target\" \"11\"]</code> in its <code>project.clj</code> already, but it still emitted references to <code>SequencedCollection</code> because the reference is inserted by the Clojure compiler, not <code>javac</code>. Similarly, adding <code>[\"--release\" \"11\"]</code> didn’t work.</p>\n<p>Long story short: as far as I can tell the only workaround is to downgrade to Java 17 (or anything prior to 21) when building Jepsen as a library. That’s not super hard with <code>update-alternatives</code>, but I still imagine I’ll be messing this up until Clojure’s compiler can get a patch.</p>", "image": null, "media": [], "authors": [ { "name": "Aphyr", "email": null, "url": "https://aphyr.com/" } ], "categories": [] }, { "id": "https://aphyr.com/posts/368-how-to-replace-your-cpap-in-only-666-days", "title": "How to Replace Your CPAP In Only 666 Days", "description": null, "url": "https://aphyr.com/posts/368-how-to-replace-your-cpap-in-only-666-days", "published": "2024-02-04T00:38:53.000Z", "updated": "2024-02-04T00:38:53.000Z", "content": "<p><em>This story is not practical advice. For me, it’s closing the book on an almost two-year saga. For you, I hope it’s an enjoyable bit of bureaucratic schadenfreude. For Anthem, I hope it’s the subject of a series of painful but transformative meetings. This is not an isolated event. I’ve had dozens of struggles with Anthem customer support, and they all go like this.</em></p>\n<p><em>If you’re looking for practical advice: it’s this. Be polite. Document everything. Keep a log. Follow the claims process. Check the laws regarding insurance claims in your state. If you pass the legally-mandated deadline for your claim, call customer service. Do not allow them to waste a year of your life, or force you to resubmit your claim from scratch. 
Initiate a complaint with your state regulators, and escalate directly to <a href=\"mailto:[email protected]\">Gail Boudreaux’s team</a>–or whoever Anthem’s current CEO is.</em></p>\n<p>To start, experience an equipment failure.</p>\n<p>Use your CPAP daily for six years. Wake up on day zero with it making a terrible sound. Discover that the pump assembly is failing. Inquire with Anthem Ohio, your health insurer, about how to have it repaired. Allow them to refer you to a list of local durable medical equipment providers. Start calling down the list. Discover half the list are companies like hair salons. Eventually reach a company in your metro which services CPAPs. Discover they will not repair broken equipment unless a doctor tells them to.</p>\n<p>Leave a message with your primary care physician. Call the original sleep center that provided your CPAP. Discover they can’t help, since you’re no longer in the same state. Return to your primary, who can’t help either, because he had nothing to do with your prescription. Put the sleep center and your primary in touch, and ask them to talk.</p>\n<p>On day six, call your primary to check in. He’s received a copy of your sleep records, and has forwarded them to a local sleep center you haven’t heard of. They, in turn, will talk to Anthem for you.</p>\n<p>On day 34, receive an approval letter labeled “confirmation of medical necessity” from Anthem, directed towards the durable medical equipment company. Call that company and confirm you’re waitlisted for a new CPAP. They are not repairable. Begin using your partner’s old CPAP, which is not the right class of device, but at least it helps.</p>\n<p>Over the next 233 days, call that medical equipment company regularly. Every time, inquire whether there’s been any progress, and hear “we’re still out of stock”. 
Ask them what the manufacturer backlog might be, how many people are ahead of you in line, how many CPAPs they <em>do</em> receive per month, or whether anyone has ever received an actual device from them. They won’t answer any questions. Realize they are never going to help you.</p>\n<p>On day 267, realize there is no manufacturer delay. The exact machine you need is in stock on CPAP.com. Check to make sure there’s a claims process for getting reimbursed by Anthem. Pay over three thousand dollars for it. When it arrives, enjoy being able to breathe again.</p>\n<p>On day 282, follow CPAP.com’s documentation to file a claim with Anthem online. Include your prescription, receipt, shipping information, and the confirmation of medical necessity Anthem sent you.</p>\n<p>On day 309, open the mail to discover a mysterious letter from Anthem. They’ve received your appeal. You do not recall appealing anything. There is no information about what might have been appealed, but something will happen within 30-60 days. There is nothing about your claim.</p>\n<p>On day 418, emerge from a haze of lead, asbestos, leaks, and a host of other home-related nightmares; remember Anthem still hasn’t said anything about your claim. Discover your claim no longer appears on Anthem’s web site. Call Anthem customer service. They have no record of your claim either. Ask about the appeal letter you received. Listen, gobsmacked, as they explain that they decided your claim was in fact an appeal, and transferred it immediately to the appeals department. The appeals department examined the appeal and looked for the claim it was appealing. Finding none, they decided the appeal was moot, and rejected it. At no point did anyone inform you of this. Explain to Anthem’s agent that you filed a claim online, not an appeal. At their instruction, resign yourself to filing the entire claim again, this time using a form via physical mail. 
Include a detailed letter explaining the above.</p>\n<p>On day 499, retreat from the battle against home entropy to call Anthem again. Experience a sense of growing dread as the customer service agent is completely unable to locate either of your claims. After a prolonged conversation, she finds it using a different tool. There is no record of the claim from day 418. There was a claim submitted on day 282. Because the claim does not appear in her system, there is no claim. Experience the cognitive equivalent of the Poltergeist hallway shot as the agent tells you “Our members are not eligible for charges for claim submission”.</p>\n<p>Hear the sentence “There is a claim”. Hear the sentence “There is no claim”. Write these down in the detailed log you’ve been keeping of this unfurling Kafkaesque debacle. Ask again if there is anyone else who can help. There is no manager you can speak to. There is no tier II support. “I’m the only one you can talk to,” she says. Write that down.</p>\n<p>Call CPAP.com, which has a help line staffed by caring humans. Explain that contrary to their documentation, Anthem now says members cannot file claims for equipment directly. Ask if they are the provider. Discover the provider for the claim is probably your primary care physician, who has no idea this is happening. Leave a message with him anyway. Leave a plaintive message with your original sleep center for good measure.</p>\n<p>On day 502, call your sleep center again. They don’t submit claims to insurance, but they confirm that some people <em>do</em> successfully submit claims to Anthem using the process you’ve been trying. They confirm that Anthem is, in fact, hot garbage. Call your primary, send them everything you have, and ask if they can file a claim for you.</p>\n<p>On day 541, receive a letter from Anthem, responding to your inquiry. You weren’t aware you filed one.</p>\n<blockquote>\n<p>Please be informed that we have received your concern. 
Upon review we have noticed that there is no claim billed for the date of service mentioned in the submitted documents, Please provide us with a valid claim. If not submitted,provide us with a valid claim iamge to process your claim further.</p>\n</blockquote>\n<p>Stare at the letter, typos and all. Contemplate your insignificance in the face of the vast and uncaring universe that is Anthem.</p>\n<p>On day 559, steel your resolve and call Anthem again. Wait as this representative, too, digs for evidence of a claim. Listen with delight as she finds your documents from day 282. Confirm that yes, a claim definitely exists. Have her repeat that so you can write it down. Confirm that the previous agent was lying: members can submit claims. At her instruction, fill out the claim form a third time. Write a detailed letter, this time with a Document Control Number (DCN). Submit the entire package via registered mail. Wait for USPS to confirm delivery eight days later.</p>\n<p>On day 588, having received no response, call Anthem again. Explain yourself. You’re getting good at this. Let the agent find a reference number for an appeal, but not the claim. Incant the magic DCN, which unlocks your original claim. “I was able to confirm that this was a claim submitted form for a member,” he says. He sees your claim form, your receipts, your confirmation of medical necessity. However: “We still don’t have the claim”.</p>\n<p>Wait for him to try system after system. Eventually he confirms what you heard on day 418: the claims department transferred your claims to appeals. “Actually this is not an appeal, but it was denied as an appeal.” Agree as he decides to submit your claim manually again, with the help of his supervisor. 
Write down the call ref number: he promises you’ll receive an email confirmation, and an Explanation of Benefits in 30-40 business days.</p>\n<p>“I can assure you this is the last time you are going to call us regarding this.”</p>\n<p>While waiting for this process, recall insurance is a regulated industry. Check the Ohio Revised Code. Realize that section 3901.381 establishes deadlines for health insurers to respond to claims. They should have paid or denied each of your claims within 30 days–45 if supporting documentation was required. Leave a message with the Ohio Department of Insurance’s Market Conduct Division. File an insurance complaint with ODI as well.</p>\n<p>Grimly wait as no confirmation email arrives.</p>\n<p>On day 602, open an email from Anthem. They are “able to put the claim in the system and currenty on processed [sic] to be applied”. They’re asking for more time. Realize that Anthem is well past the 30-day deadline under the Ohio Revised Code for all three iterations of your claim.</p>\n<p>On day 607, call Anthem again. The representative explains that the claim will be received and processed as of your benefits. She asks you to allow 30-45 days from today. Quote section 3901.381 to her. She promises to expedite the request; it should be addressed within 72 business hours. Like previous agents, she promises to call you back. Nod, knowing she won’t.</p>\n<p>On day 610, email the Ohio Department of Insurance to explain that Anthem has found entirely new ways to avoid paying their claims on time. It’s been 72 hours without a callback; call Anthem again. She says “You submitted a claim and it was received” on day 282. She says the claim was expedited. Ask about the status of that expedited resolution. “Because on your plan we still haven’t received any claims,” she explains. Wonder if you’re having a stroke.</p>\n<p>Explain that it has been 328 days since you submitted your claim, and ask what is going on. 
She says that since the first page of your mailed claim was a letter, that might have caused it to be processed as an appeal. Remind yourself Anthem told you to enclose that letter. Wait as she attempts to refer you to the subrogation department, until eventually she gives up: the subrogation department doesn’t want to help.</p>\n<p>Call the subrogation department yourself. Allow Anthem’s representative to induce in you a period of brief aphasia. She wants to call a billing provider. Try to explain there is none: you purchased the machine yourself. She wants to refer you to collections. Wonder why on earth Anthem would want money from <em>you</em>. Write down “I literally can’t understand what she thinks is going on” in your log. Someone named Adrian will call you by tomorrow.</p>\n<p>Contemplate alternative maneuvers. Go on a deep Google dive, searching for increasingly obscure phrases gleaned from Anthem’s bureaucracy. Trawl through internal training PDFs for Anthem’s ethics and compliance procedures. Call their compliance hotline: maybe someone cares about the law. It’s a third-party call center for Elevance Health. Fail to realize this is another name for Anthem. Begin drawing a map of Anthem’s corporate structure.</p>\n<p>From a combination of publicly-available internal slide decks, LinkedIn, and obscure HR databases, discover the name, email, and phone number of Anthem’s Chief Compliance Officer. Call her, but get derailed by an internal directory that requires a 10-digit extension. Try the usual tricks with automated phone systems. No dice.</p>\n<p>Receive a call from an Anthem agent. Ask her what happened to “72 hours”. She says there’s been no response from the adjustments team. She doesn’t know when a response will come. There’s no one available to talk to. Agree to speak to another representative tomorrow. It doesn’t matter: they’ll never call you.</p>\n<p>Do more digging. Guess the CEO’s email from what you can glean of Anthem’s account naming scheme. 
Write her an email with a short executive summary and a detailed account of the endlessly-unfolding Boschian hellscape in which her company has entrapped you. A few hours later, receive an acknowledgement from an executive concierge at Elevance (Anthem). It’s polite, formal, and syntactically coherent. She promises to look into things. Smile. Maybe this will work.</p>\n<p>On day 617, receive a call from the executive concierge. 355 days after submission, she’s identified a problem with your claim. CPAP.com provided you with an invoice with a single line item (the CPAP) and two associated billing codes (a CPAP and humidifier). Explain that they are integrated components of a single machine. She understands, but insists you need a receipt with multiple line items for them anyway. Anthem has called CPAP.com, but they can’t discuss an invoice unless you call them. Explain you’ll call them right now.</p>\n<p>Call CPAP.com. Their customer support continues to be excellent. Confirm that it is literally impossible to separate the CPAP and humidifier, or to produce an invoice with two line items for a single item. Nod as they ask what the hell Anthem is doing. Recall that this is the exact same machine Anthem covered for you eight years ago. Start a joint call with the CPAP.com representative and Anthem’s concierge. Explain the situation to her voicemail.</p>\n<p>On day 623, receive a letter from ODI. Anthem has told ODI this was a problem with the billing codes, and ODI does not intervene in billing code issues. They have, however, initiated a secretive second investigation. There is no way to contact the second investigator.</p>\n<p>Write a detailed email to the concierge and ODI explaining that it took over three hundred days for Anthem to inform you of this purported billing code issue. Explain again that it is a single device. Emphasize that Anthem has been handling claims for this device for roughly a decade.</p>\n<p>Wait. 
On day 636, receive a letter from Anthem’s appeals department. They’ve received your request for an appeal. You never filed one. They want your doctor or facility to provide additional information to Carelon Medical Benefits Management. You have never heard of Carelon. There is no explanation of how to reach Carelon, or what information they might require. The letter concludes: “There is currently no authorization on file for the services rendered.” You need to seek authorization from a department called “Utilization Management”.</p>\n<p>Call the executive concierge again. Leave a voicemail asking what on earth is going on.</p>\n<p>On day 637, receive an email: she’s looking into it.</p>\n<p>On day 644, Anthem calls you. It’s a new agent who is immensely polite. Someone you’ve never heard of was asked to work on another project, so she’s taking over your case. She has no updates yet, but promises to keep in touch.</p>\n<p>She does so. On day 653, she informs you Anthem will pay your claim in full. On day 659, she provides a check number. On day 666, the check arrives.</p>\n<p>Deposit the check. Write a thank you email to the ODI and Anthem’s concierge. Write this, too, down in your log.</p>", "image": null, "media": [], "authors": [ { "name": "Aphyr", "email": null, "url": "https://aphyr.com/" } ], "categories": [] }, { "id": "https://aphyr.com/posts/367-why-is-jepsen-written-in-clojure", "title": "Why is Jepsen Written in Clojure?", "description": null, "url": "https://aphyr.com/posts/367-why-is-jepsen-written-in-clojure", "published": "2023-12-05T14:49:05.000Z", "updated": "2023-12-05T14:49:05.000Z", "content": "<p>People keep asking why <a href=\"https://jepsen.io\">Jepsen</a> is written in <a href=\"https://clojure.org/\">Clojure</a>, so I figure it’s worth having a referencable answer. I’ve programmed in something like twenty languages. Why choose a Weird Lisp?</p>\n<p>Jepsen is built for testing concurrent systems–mostly databases. 
Because it tests concurrent systems, the language itself needs good support for concurrency. Clojure’s immutable, persistent data structures make it easier to write correct concurrent programs, and the language and runtime have excellent concurrency support: real threads, promises, futures, atoms, locks, queues, cyclic barriers, all of java.util.concurrent, etc. I also considered languages (like Haskell) with more rigorous control over side effects, but decided that Clojure’s less-dogmatic approach was preferable.</p>\n<p>Because Jepsen tests databases, it needs broad client support. Almost every database has a JVM client, typically written in Java, and Clojure has decent Java interop.</p>\n<p>Because testing is experimental work, I needed a language which was concise, adaptable, and well-suited to prototyping. Clojure is terse, and its syntactic flexibility–in particular, its macro system–works well for that. The threading macros, for instance, make chained transformations readable, and macros enable re-usable error handling and easy control of resource scopes. The Clojure REPL is really handy for exploring the data a test run produces.</p>\n<p>Tests involve representing, transforming, and inspecting complex, nested data structures. Clojure’s data structures and standard library functions are possibly the best I’ve ever seen. I also print a lot of structures to the console and files: Clojure’s data syntax (EDN) is fantastic for this.</p>\n<p>Because tests involve manipulating a decent, but not huge, chunk of data, I needed a language with “good enough” performance. Clojure’s certainly not the fastest language out there, but idiomatic Clojure is usually within an order of magnitude or two of Java, and I can shave off the difference where critical. The JVM has excellent profiling tools, and these work well with Clojure.</p>\n<p>Jepsen’s (gosh) about a decade old now: I wanted a language with a mature core and emphasis on stability. 
Clojure is remarkably stable, both in terms of JVM target and the language itself. Libraries don’t “rot” anywhere near as quickly as in Scala or Ruby.</p>\n<p>Clojure does have significant drawbacks. It has a small engineering community and no (broadly-accepted, successful) static typing system. Both of these would constrain a large team, but Jepsen’s maintained and used by only 1-3 people at a time. Working with JVM primitives can be frustrating without dropping to Java; I do this on occasion. Some aspects of the polymorphism system are lacking, but these can be worked around with libraries. The error messages are terrible. I have no apologetics for this. ;-)</p>\n<p>I prototyped Jepsen in a few different languages before settling on Clojure. A decade in, I think it was a pretty good tradeoff.</p>", "image": null, "media": [], "authors": [ { "name": "Aphyr", "email": null, "url": "https://aphyr.com/" } ], "categories": [] }, { "id": "https://aphyr.com/posts/366-finding-domains-that-send-unactionable-reports-in-mastodon", "title": "Finding Domains That Send Unactionable Reports in Mastodon", "description": null, "url": "https://aphyr.com/posts/366-finding-domains-that-send-unactionable-reports-in-mastodon", "published": "2023-09-03T15:25:47.000Z", "updated": "2023-09-03T15:25:47.000Z", "content": "<p>One of the things we struggle with on woof.group is un-actionable reports. For various reasons, most of the reports we handle are for posts that are either appropriately content-warned or don’t require a content warning under our content policy–things like faces, butts, and shirtlessness. We can choose to ignore reports from a domain, but we’d rather not do that: it means we might miss out on important reports that require moderator action. 
We can also talk to remote instance administrators and ask them to talk to their users about not sending copies of reports to the remote instance if they don’t know what the remote instance policy is, but that’s time consuming, and we only want to do it if there’s an ongoing problem.</p>\n<p>I finally broke down and dug around in the data model to figure out how to get statistics on this. If you’re a Mastodon admin and you’d like to figure out which domains send you the most non-actionable reports, you can run this at <code>rails console</code>:</p>\n<pre><code><span></span><span class=\"c1\"># Map of domains to [action, no-action] counts</span>\n<span class=\"n\">stats</span> <span class=\"o\">=</span> <span class=\"no\">Report</span><span class=\"o\">.</span><span class=\"n\">all</span><span class=\"o\">.</span><span class=\"n\">reduce</span><span class=\"p\">({})</span> <span class=\"k\">do</span> <span class=\"o\">|</span><span class=\"n\">stats</span><span class=\"p\">,</span> <span class=\"n\">r</span><span class=\"o\">|</span>\n <span class=\"n\">domain</span> <span class=\"o\">=</span> <span class=\"n\">r</span><span class=\"o\">.</span><span class=\"n\">account</span><span class=\"o\">.</span><span class=\"n\">domain</span>\n <span class=\"n\">domain_stats</span> <span class=\"o\">=</span> <span class=\"n\">stats</span><span class=\"o\">[</span><span class=\"n\">domain</span><span class=\"o\">]</span> <span class=\"o\">||</span> <span class=\"o\">[</span><span class=\"mi\">0</span><span class=\"p\">,</span> <span class=\"mi\">0</span><span class=\"o\">]</span>\n <span class=\"n\">action</span> <span class=\"o\">=</span> <span class=\"n\">r</span><span class=\"o\">.</span><span class=\"n\">history</span><span class=\"o\">.</span><span class=\"n\">any?</span> <span class=\"k\">do</span> <span class=\"o\">|</span><span class=\"n\">a</span><span class=\"o\">|</span>\n <span class=\"c1\"># If you took action, there'd be a Status or Account target_type</span>\n 
<span class=\"n\">a</span><span class=\"o\">.</span><span class=\"n\">target_type</span> <span class=\"o\">!=</span> <span class=\"s1\">'Report'</span>\n <span class=\"k\">end</span>\n <span class=\"k\">if</span> <span class=\"n\">action</span>\n <span class=\"n\">domain_stats</span><span class=\"o\">[</span><span class=\"mi\">0</span><span class=\"o\">]</span> <span class=\"o\">+=</span> <span class=\"mi\">1</span>\n <span class=\"k\">else</span>\n <span class=\"n\">domain_stats</span><span class=\"o\">[</span><span class=\"mi\">1</span><span class=\"o\">]</span> <span class=\"o\">+=</span> <span class=\"mi\">1</span>\n <span class=\"k\">end</span>\n <span class=\"n\">stats</span><span class=\"o\">[</span><span class=\"n\">domain</span><span class=\"o\">]</span> <span class=\"o\">=</span> <span class=\"n\">domain_stats</span>\n <span class=\"n\">stats</span>\n<span class=\"k\">end</span>\n\n<span class=\"c1\"># Top 20 domains, sorted by descending no-action count</span>\n<span class=\"n\">stats</span><span class=\"o\">.</span><span class=\"n\">sort_by</span> <span class=\"k\">do</span> <span class=\"o\">|</span><span class=\"n\">k</span><span class=\"p\">,</span> <span class=\"n\">stats</span><span class=\"o\">|</span>\n <span class=\"n\">stats</span><span class=\"o\">[</span><span class=\"mi\">1</span><span class=\"o\">]</span>\n<span class=\"k\">end</span><span class=\"o\">.</span><span class=\"n\">reverse</span><span class=\"o\">.</span><span class=\"n\">take</span><span class=\"p\">(</span><span class=\"mi\">20</span><span class=\"p\">)</span>\n</code></pre>\n<p>For example:</p>\n<pre><code><span></span><span class=\"o\">[[</span><span class=\"kp\">nil</span><span class=\"p\">,</span> <span class=\"o\">[</span><span class=\"mi\">77</span><span class=\"p\">,</span> <span class=\"mi\">65</span><span class=\"o\">]</span> <span class=\"p\">;</span> <span class=\"kp\">nil</span> <span class=\"n\">is</span> <span class=\"n\">your</span> <span 
class=\"n\">local</span> <span class=\"n\">instance</span>\n <span class=\"o\">[</span><span class=\"s2\">\"foo.social\"</span><span class=\"p\">,</span> <span class=\"o\">[</span><span class=\"mi\">3</span><span class=\"p\">,</span> <span class=\"mi\">52</span><span class=\"o\">]]</span><span class=\"p\">,</span>\n <span class=\"o\">[</span><span class=\"s2\">\"mastodon.oh.my.gosh\"</span><span class=\"p\">,</span> <span class=\"o\">[</span><span class=\"mi\">40</span><span class=\"p\">,</span> <span class=\"mi\">24</span><span class=\"o\">]]</span><span class=\"p\">,</span>\n <span class=\"o\">...</span>\n</code></pre>", "image": null, "media": [], "authors": [ { "name": "Aphyr", "email": null, "url": "https://aphyr.com/" } ], "categories": [] }, { "id": "https://aphyr.com/posts/365-10-operations-large-histories-with-jepsen", "title": "10⁹ Operations: Large Histories with Jepsen", "description": null, "url": "https://aphyr.com/posts/365-10-operations-large-histories-with-jepsen", "published": "2023-08-18T19:45:15.000Z", "updated": "2023-08-18T19:45:15.000Z", "content": "<p><a href=\"https://github.com/jepsen-io/jepsen\">Jepsen</a> is a library for writing tests of concurrent systems: everything from single-node data structures to distributed databases and queues. A key part of this process is recording a <em>history</em> of operations performed during the test. Jepsen <em>checkers</em> analyze a history to find consistency anomalies and to compute performance metrics. Traditionally Jepsen has stored the history in a Clojure vector (an immutable in-memory data structure like an array), and serialized it to disk at the end of the test. This limited Jepsen to histories on the order of tens of millions of operations. It also meant that if Jepsen crashed during a several-hour test run, it was impossible to recover any of the history for analysis. 
Finally, saving and loading large tests involved long wait times—sometimes upwards of ten minutes.</p>\n<p>Over the last year I’ve been working on ways to resolve these problems. Generators are up to ten times faster, and can emit up to 50,000 operations per second. A new operation datatype makes each operation smaller and faster to access. Jepsen’s new on-disk format allows us to stream histories incrementally to disk, to work with histories of up to a billion operations, far exceeding available memory, to recover safely from crashes, and to load tests almost instantly by deserializing data lazily. New history datatypes support both densely and sparsely indexed histories, and efficiently cache auxiliary indices. They also support lazy disk-backed <code>map</code> and <code>filter</code>. These histories support both linear and concurrent folds, which dramatically improves checker performance on multicore systems: real-world checkers can readily analyze 250,000 to 900,000 operations/sec. Histories support multi-query optimization: when multiple threads fold over the same history, a query planner automatically fuses those folds together to perform them in a single pass. Since Jepsen often folds dozens of times over the same history, this saves a good deal of disk IO and deserialization time. These features are enabled by a new, transactional, dependency-aware task executor.</p>\n<h2><a href=\"#background\" id=\"background\">Background</a></h2>\n<p>Jepsen histories are basically a list of operations in time order. The meaning of an operation depends on the system being tested, but an operation could be something like “set <code>x</code> to three”, “read <code>y</code> and <code>z</code> as <code>2</code> and <code>7</code>, respectively”, or “add 64 to the queue”. Each of these logical operations has <em>two</em> operation elements in the history: an <em>invocation</em> when it begins, and a <em>completion</em> when it ends. 
This tells checkers which operations were performed concurrently, and which took place one after the next. For convenience, we use “operation” to refer both to a single entry in the history, and to an invocation/completion pair—it’s usually clear from context.</p>\n<p>For example, here’s a history of transactions from an etcd test:</p>\n<pre><code><span></span><span class=\"p\">[{</span><span class=\"ss\">:type</span> <span class=\"ss\">:invoke</span>, <span class=\"ss\">:f</span> <span class=\"ss\">:txn</span>, <span class=\"ss\">:value</span> <span class=\"p\">[[</span><span class=\"ss\">:w</span> <span class=\"mi\">2</span> <span class=\"mi\">1</span><span class=\"p\">]]</span>, <span class=\"ss\">:time</span> <span class=\"mi\">3291485317</span>, <span class=\"ss\">:process</span> <span class=\"mi\">0</span>, <span class=\"ss\">:index</span> <span class=\"mi\">0</span><span class=\"p\">}</span>\n <span class=\"p\">{</span><span class=\"ss\">:type</span> <span class=\"ss\">:invoke</span>, <span class=\"ss\">:f</span> <span class=\"ss\">:txn</span>, <span class=\"ss\">:value</span> <span class=\"p\">[[</span><span class=\"ss\">:r</span> <span class=\"mi\">0</span> <span class=\"nv\">nil</span><span class=\"p\">]</span> <span class=\"p\">[</span><span class=\"ss\">:w</span> <span class=\"mi\">1</span> <span class=\"mi\">1</span><span class=\"p\">]</span> <span class=\"p\">[</span><span class=\"ss\">:r</span> <span class=\"mi\">2</span> <span class=\"nv\">nil</span><span class=\"p\">]</span> <span class=\"p\">[</span><span class=\"ss\">:w</span> <span class=\"mi\">1</span> <span class=\"mi\">2</span><span class=\"p\">]]</span>, <span class=\"ss\">:time</span> <span class=\"mi\">3296209422</span>, <span class=\"ss\">:process</span> <span class=\"mi\">2</span>, <span class=\"ss\">:index</span> <span class=\"mi\">1</span><span class=\"p\">}</span>\n <span class=\"p\">{</span><span class=\"ss\">:type</span> <span class=\"ss\">:fail</span>, <span class=\"ss\">:f</span> 
<span class=\"ss\">:txn</span>, <span class=\"ss\">:value</span> <span class=\"p\">[[</span><span class=\"ss\">:r</span> <span class=\"mi\">0</span> <span class=\"nv\">nil</span><span class=\"p\">]</span> <span class=\"p\">[</span><span class=\"ss\">:w</span> <span class=\"mi\">1</span> <span class=\"mi\">1</span><span class=\"p\">]</span> <span class=\"p\">[</span><span class=\"ss\">:r</span> <span class=\"mi\">2</span> <span class=\"nv\">nil</span><span class=\"p\">]</span> <span class=\"p\">[</span><span class=\"ss\">:w</span> <span class=\"mi\">1</span> <span class=\"mi\">2</span><span class=\"p\">]]</span>, <span class=\"ss\">:time</span> <span class=\"mi\">3565403674</span>, <span class=\"ss\">:process</span> <span class=\"mi\">2</span>, <span class=\"ss\">:index</span> <span class=\"mi\">2</span>, <span class=\"ss\">:error</span> <span class=\"p\">[</span><span class=\"ss\">:duplicate-key</span> <span class=\"s\">\"etcdserver: duplicate key given in txn request\"</span><span class=\"p\">]}</span>\n <span class=\"p\">{</span><span class=\"ss\">:type</span> <span class=\"ss\">:ok</span>, <span class=\"ss\">:f</span> <span class=\"ss\">:txn</span>, <span class=\"ss\">:value</span> <span class=\"p\">[[</span><span class=\"ss\">:w</span> <span class=\"mi\">2</span> <span class=\"mi\">1</span><span class=\"p\">]]</span>, <span class=\"ss\">:time</span> <span class=\"mi\">3767733708</span>, <span class=\"ss\">:process</span> <span class=\"mi\">0</span>, <span class=\"ss\">:index</span> <span class=\"mi\">3</span><span class=\"p\">}]</span>\n</code></pre>\n<p>This shows two concurrent transactions executed by process 0 and 2 respectively. 
Process 0’s transaction attempted to write key 2 = 1 and succeeded, but process 2’s transaction tried to read 0, write key 1 = 1, read 2, then write key 1 = 2, and failed.</p>\n<p>Each operation is logically a map with <a href=\"https://github.com/jepsen-io/history/tree/be95aef21739167760c998e988fd4b3d10db1e3e#operation-structure\">several predefined keys</a>. The <code>:type</code> field determines whether an operation is an invocation (<code>:invoke</code>) or a successful (<code>:ok</code>), failed (<code>:fail</code>), or indeterminate (<code>:info</code>) completion. The <code>:f</code> and <code>:value</code> fields indicate what the operation did—their interpretation is up to the test in question. We use <code>:process</code> to track which logical thread performed the operation. <code>:index</code> is a strictly-monotonic unique ID for every operation. In <em>dense</em> histories, indices go 0, 1, 2, …. Filtering or limiting a history to a subset of those operations yields a <em>sparse</em> history, with IDs like 3, 7, 8, 20, …. Operations can also include all kinds of additional fields at the test’s discretion.</p>\n<p>During a test Jepsen decides what operation to invoke next by asking a <em>generator</em>: a functional data structure that evolves as operations are performed. Generators include a rich library of combinators that can build up complex sequences of operations, including sequencing, alternation, mapping, filtering, limiting, temporal delays, concurrency control, and selective nondeterminism.</p>\n<h2><a href=\"#generator-performance\" id=\"generator-performance\">Generator Performance</a></h2>\n<p>One of the major obstacles to high-throughput testing with Jepsen was that the generator system simply couldn’t produce enough operations to keep up with the database under test. 
Much of this time was spent in manipulating the generator <em>context</em>: a data structure that keeps track of which threads are free and which logical processes are being executed by which threads. Contexts are often checked or restricted—for instance, a generator might want to pick a random free process in order to emit an operation, or to filter the set of threads in a context to a specific subset for various sub-generators: two threads making writes, three more performing reads, and so on.</p>\n<p>Profiling revealed context operations as a major bottleneck in generator performance—specifically, manipulating the Clojure maps and sets we initially used to represent context data. In Jepsen 0.3.0 I introduced a new <a href=\"https://github.com/jepsen-io/jepsen/blob/eeec7ef02ca278f52d3f1c71d9ff4ae5a75aad43/jepsen/src/jepsen/generator/context.clj\">jepsen.generator.context</a> namespace with high-performance data structures for common context operations. I wrote a custom <a href=\"https://github.com/jepsen-io/jepsen/blob/main/jepsen/src/jepsen/generator/translation_table.clj\">translation table</a> to map thread identifiers (e.g. <code>1</code>, <code>2</code>, … <code>:nemesis</code>) to and from integers. This let us encode context threads using Java <code>BitSet</code>s, which are significantly more compact than hashmap-backed sets and require no pointer chasing. To restrict threads to a specific subset, we <a href=\"https://github.com/jepsen-io/jepsen/blob/main/jepsen/src/jepsen/generator/context.clj#L311-L358\">precompile a filter BitSet</a>, which turns expensive set intersection into a blazing-fast bitwise <code>and</code>. 
This bought us an <a href=\"https://github.com/jepsen-io/jepsen/commit/91d5e8fae64eca31b243fd978099c888db63260f\">order-of-magnitude performance improvement</a> in a realistic generator performance test—pushing up to 26,000 ops/second through a 1024-thread generator.</p>\n<p>Based on <a href=\"https://www.yourkit.com/\">YourKit</a> profiling I made additional performance tweaks. Jepsen now <a href=\"https://github.com/jepsen-io/jepsen/commit/165cba94be370a3226cb3c5e088b25598d0d97f0\">memoizes function arity checking</a> when functions are used as generators. A few type hints helped avoid reflection in hot paths. <a href=\"https://github.com/jepsen-io/jepsen/commit/2c17e3253bd4459da07bf551a80d57f98e9d3f42\">Additional micro-optimizations</a> bought further speedups. The function which fills in operation types, times, and processes spent a lot of time manipulating persistent maps, which we avoided by carefully tracking which fields had been read from the underlying map. We replaced expensive <code>mapcat</code> with a transducer, replaced <code>seq</code> with <code>count</code> to optimize empty-collection checks, and replaced a few <code>=</code> with <code>identical?</code> for symbols.</p>\n<p>In testing of list-append generators (a reasonably complex generator type) these improvements <a href=\"https://mastodon.jepsen.io/@jepsen/109287959624710790\">let us push 50,000 operations per second</a> with 100 concurrent threads.</p>\n<h2><a href=\"#off-with-her-head\" id=\"off-with-her-head\">Off With Her Head</a></h2>\n<p>Another challenge in long-running Jepsen tests was a linear increase in memory consumption. This was caused by the test map, which is passed around ubiquitously and contains a reference to the initial generator that the test started with. 
Since generators are a lot like infinite sequences, “retaining the head” of the generator often resulted in keeping a chain of references to every single operation ever generated in memory.</p>\n<p>Because the test map is available during every phase of the test, and could be accessed by all kinds of uncontrolled code, we needed a way to enforce that the generator wouldn’t be retained accidentally. Jepsen now wraps the generator in a <a href=\"https://github.com/jepsen-io/jepsen/commit/7b8b5d531694a58a352de913c18c2bf2e4facacb\">special <code>Forgettable</code> reference type</a> that can be explicitly forgotten. Forgetting that reference once the test starts running allows the garbage collector to free old generator states. Tests of 10<sup>9</sup> operations can now run in just a few hundred megabytes of heap.</p>\n<h2><a href=\"#operations-as-records\" id=\"operations-as-records\">Operations as Records</a></h2>\n<p>We initially represented operations as Clojure maps, which use arrays and hash tables internally. These maps require extra storage space, can only store JVM Objects, and require a small-but-noticeable lookup cost. Since looking up fields is so frequent, and memory efficiency important, we replaced them with Clojure <a href=\"https://clojure.org/reference/datatypes#_deftype_and_defrecord\">records</a>. Our new <a href=\"https://github.com/jepsen-io/history/blob/be95aef21739167760c998e988fd4b3d10db1e3e/src/jepsen/history.clj#L250-L253\">Op datatype</a> stores its index and time as primitive longs, and the type, process, f, and value as Object fields. This is a significantly denser layout in memory, which also aids cache locality. Access to fields still looks like a map lookup (e.g. <code>(:index some-op)</code>), but compiles into a type cache check and a read of that field in the Op object. Much faster!</p>\n<p>Clojure records work very much like maps—they store a reference to a hashmap, so tests can still store any number of extra fields on an Op. 
This preserves compatibility with existing Jepsen tests. The API is <em>slightly</em> different: <code>index</code> and <code>time</code> fields are no longer optional. Jepsen always generates these fields during tests, so this change mainly affects the internal test suites for Jepsen checkers.</p>\n<h2><a href=\"#streaming-disk-storage\" id=\"streaming-disk-storage\">Streaming Disk Storage</a></h2>\n<p>Jepsen initially saved tests, including both histories and the results of analysis, as a single map serialized with <a href=\"https://github.com/Datomic/fressian/wiki\">Fressian</a>. This worked fine for small tests, but writing large histories could take tens of minutes. Jepsen now uses a <a href=\"https://github.com/jepsen-io/jepsen/blob/2310c73a23432c107d5e2604281a1f1a334ef06f/jepsen/src/jepsen/store/format.clj\">custom disk format</a>. Jepsen files consist of a short header followed by an append-only log of immutable blocks. Each block contains a type, length, and checksum header, followed by data. These blocks form a persistent tree structure on disk: changes are written by appending new blocks to the end of the file, then flipping a single offset in the file header to point to the block that encodes the root of the tree.</p>\n<p>Blocks can store <a href=\"https://github.com/jepsen-io/jepsen/blob/2310c73a23432c107d5e2604281a1f1a334ef06f/jepsen/src/jepsen/store/format.clj#L97\">arbitrary Fressian data</a>, <a href=\"https://github.com/jepsen-io/jepsen/blob/2310c73a23432c107d5e2604281a1f1a334ef06f/jepsen/src/jepsen/store/format.clj#L112\">layered, lazily-decoded maps</a>, <a href=\"https://github.com/jepsen-io/jepsen/blob/2310c73a23432c107d5e2604281a1f1a334ef06f/jepsen/src/jepsen/store/format.clj#L130\">streaming lists</a>, and <a href=\"https://github.com/jepsen-io/jepsen/blob/2310c73a23432c107d5e2604281a1f1a334ef06f/jepsen/src/jepsen/store/format.clj#L143\">chunked vectors</a> which store pointers to other vectors “stitched together” into a single large vector. 
Chunked vectors store the index of the first element in each chunk, so you can jump to the correct chunk for any index in constant time.</p>\n<p>During a test Jepsen <a href=\"https://github.com/jepsen-io/jepsen/blob/2310c73a23432c107d5e2604281a1f1a334ef06f/jepsen/src/jepsen/store/format.clj#L1084-L1135\">writes each operation</a> into a series of chunks on disk—each a streaming list of Fressian-encoded data. When a chunk is full (by default, 16384 operations), Jepsen seals that chunk, writes its checksum, and constructs a new chunked vector block with pointers to each chunk in the history—re-using existing chunks. It then writes a new root block and flips the root pointer at the start of the file. This ensures that at most the last 16384 operations can be lost during a crash.</p>\n<p>When Jepsen loads a test from disk it reads the header, decodes the root block, finds the base of the test map, and constructs a <a href=\"https://github.com/jepsen-io/jepsen/blob/2310c73a23432c107d5e2604281a1f1a334ef06f/jepsen/src/jepsen/store/format.clj#L1406-L1434\">lazy map</a> which decodes results and the history only when asked for. Results are also encoded as <a href=\"https://github.com/jepsen-io/jepsen/blob/2310c73a23432c107d5e2604281a1f1a334ef06f/jepsen/src/jepsen/store/format.clj#L1252-L1285\">lazy maps</a>, so tests with large results can still be decoded efficiently. This lets Jepsen read the high-level properties of tests—their names, timestamps, overall validity, etc., in a few milliseconds rather than tens of minutes. It also preserves compatibility: tests loaded from disk look and feel like regular hashmaps. Metadata allows those tests to be altered and saved back to disk efficiently, so you can re-analyze them later.</p>\n<p>Jepsen loads a test’s history using a <a href=\"https://github.com/jepsen-io/history/blob/be95aef21739167760c998e988fd4b3d10db1e3e/src/jepsen/history/core.clj#L166-L326\">lazy chunked vector</a>. 
The top-level vector keeps a cached count and lookup table to efficiently map indices to chunks. Each chunk is lazily decoded on first read and cached in a JVM <a href=\"https://docs.oracle.com/javase/8/docs/api/java/lang/ref/SoftReference.html\">SoftReference</a>, which can be reclaimed under memory pressure. The resulting vector looks to users just like an in-memory vector, which ensures compatibility with extant Jepsen checkers.</p>\n<h2><a href=\"#history-datatypes\" id=\"history-datatypes\">History Datatypes</a></h2>\n<p>When working with histories checkers need to perform some operations over and over again. For instance, they often need a count of results, to look up an operation by its <code>:index</code> field, to jump between invocations and completions, and to schedule concurrent tasks for analysis. We therefore augment our lazy vectors of operations with an <a href=\"https://github.com/jepsen-io/history/blob/be95aef21739167760c998e988fd4b3d10db1e3e/src/jepsen/history.clj#L491-L524\">IHistory</a> type which supports these operations. Our histories still act like a vector of operations, but also provide efficient, cached mapping of invocations to completions, a threadpool for scheduling analysis tasks, and a stateful execution planner for folds over the history, which we call a <em>folder</em>.</p>\n<p>When Jepsen records a history it assigns each operation a sequential index 0, 1, 2, 3, …. We call these histories <em>dense</em>. This makes it straightforward to look up an operation by its index: <code>(nth history idx)</code>. However, users might also filter a history to just some operations, or excerpt a slice of a larger history for detailed analysis. This results in a <em>sparse</em> history. 
Our sparse histories <a href=\"https://github.com/jepsen-io/history/blob/be95aef21739167760c998e988fd4b3d10db1e3e/src/jepsen/history.clj#L782-L850\">maintain a lazily-computed mapping of operation indices to vector indices</a> to efficiently support this.</p>\n<p>We also provide lazy, history-specific versions of <a href=\"https://github.com/jepsen-io/history/blob/be95aef21739167760c998e988fd4b3d10db1e3e/src/jepsen/history.clj#L927-L1010\">map</a> and <a href=\"https://github.com/jepsen-io/history/blob/be95aef21739167760c998e988fd4b3d10db1e3e/src/jepsen/history.clj#L1014-L1153\">filter</a>. Checkers perform these operations on histories over and over again, but using standard Clojure <code>map</code> and <code>filter</code> would be inappropriate: once realized, the entire mapped/filtered sequence is stored in memory indefinitely. They also require linear-time lookups for <code>nth</code>. Our versions <a href=\"https://github.com/jepsen-io/history/blob/be95aef21739167760c998e988fd4b3d10db1e3e/src/jepsen/history.clj#L969-L975\">push down the map and filter operations into each access</a>, trading off cachability for reduced memory consumption. This pushdown also enables multiple queries to share the same map or filter function. We also provide <a href=\"https://github.com/jepsen-io/history/blob/be95aef21739167760c998e988fd4b3d10db1e3e/src/jepsen/history.clj#L939-L941\">constant-time nth</a> on mapped histories, and <a href=\"https://github.com/jepsen-io/history/blob/be95aef21739167760c998e988fd4b3d10db1e3e/src/jepsen/history.clj#L1095-L1099\">log-time nth</a> on filtered ones.</p>\n<h2><a href=\"#concurrent-folds\" id=\"concurrent-folds\">Concurrent Folds</a></h2>\n<p>Because Jepsen’s disk files are divided into multiple chunks we can efficiently parallelize folds (reductions) over histories. 
Jepsen histories provide their own <a href=\"https://github.com/jepsen-io/history/blob/be95aef21739167760c998e988fd4b3d10db1e3e/src/jepsen/history/fold.clj\">folding facility</a>, similar to (and, indeed, directly compatible with) <a href=\"https://github.com/aphyr/tesser\">Tesser</a>, a library I wrote for composing concurrent folds. Tesser is intended for commutative folds, but almost all of its operators—map, filter, group-by, fuse, facet, etc.—work properly in ordered reductions too. History folds can perform a concurrent reduction over each chunk independently, and then combine the chunks together in a linear reduction. Since modern CPUs can have dozens or even hundreds of cores, this yields a significant speedup. Since each core reads its chunk from the file in a linear scan, it’s also friendly to disk and memory readahead prediction. If this sounds like <a href=\"https://en.wikipedia.org/wiki/MapReduce\">MapReduce</a> on a single box, you’re right.</p>\n<p><img class=\"attachment pure-img\" src=\"/data/posts/365/fold.jpg\" alt=\"A diagram showing how concurrent folds proceed in independent reductions over each chunk, then combined in a second linear pass.\" title=\"A diagram showing how concurrent folds proceed in independent reductions over each chunk, then combined in a second linear pass.\"></p>\n<p>To support this access pattern each history comes with an associated <a href=\"https://github.com/jepsen-io/history/blob/be95aef21739167760c998e988fd4b3d10db1e3e/src/jepsen/history/fold.clj#L1381-L1405\">Folder</a>: a stateful object that schedules, executes, and returns results from folds. Performing a <a href=\"https://github.com/jepsen-io/history/blob/be95aef21739167760c998e988fd4b3d10db1e3e/src/jepsen/history/fold.clj#L818-L845\">linear fold over a history</a> is easy: the folder schedules a reduce of the first chunk of the history, then uses that result as the input to a reduce of the second chunk, and so on. 
A <a href=\"https://github.com/jepsen-io/history/blob/be95aef21739167760c998e988fd4b3d10db1e3e/src/jepsen/history/fold.clj#L847-L879\">concurrent fold</a> schedules reductions over every chunk concurrently, and a series of combine tasks which fold the results of each chunk reduction in serial.</p>\n<p>Folders are careful about thread locality and memory barriers during inter-thread handoff, which means folds can use <em>mutable</em> in-memory structures for their accumulators, not just immutable values. The reducing and combining functions don’t need to acquire locks of their own, and most access is completely unsynchronized. This works particularly well with the new <a href=\"https://aphyr.com/posts/363-fast-multi-accumulator-reducers\">mutable reducers</a> in <a href=\"https://github.com/aphyr/dom-top\">dom-top</a>.</p>\n<p>Folders and histories directly support the interfaces for plain-old Clojure <code>reduce</code> and <code>reducers/fold</code>, so existing checkers immediately take advantage of folder optimizations. They’re translated to folds over the history internally. Transducers work just like you’d expect.</p>\n<h2><a href=\"#multi-query-optimization\" id=\"multi-query-optimization\">Multi-Query Optimization</a></h2>\n<p>Jepsen tests often involve dozens of different checkers, each of which might perform many folds. When a history is larger than memory, some chunks are evicted from cache to make room for new ones. This means that a second fold might end up re-parsing chunks of history from disk that a previous fold already parsed–and forgot due to GC pressure. Running many folds at the same time also reduces locality: each fold likely reads an operation and follows pointers to other objects at different times and on different cores. This burns extra time in memory accesses <em>and</em> evicts other objects from the CPU’s cachelines. 
It would be much more efficient if we could do many folds in a single pass.</p>\n<p>How should this coordination be accomplished? In theory we could have a pure fold composition system and ask checkers to return the folds they intended to perform, along with some sort of continuation to evaluate when results were ready—but this requires rewriting every extant checker and breaking the intuitive path of a checker’s control flow. It also plays terribly with internal concurrency in a checker. Instead, each history’s folder provides a stateful coordination service for folds. Checkers simply ask the folder to fold over the history, and the folder secretly fuses all those folds into as few passes as possible.</p>\n<p><img class=\"attachment pure-img\" src=\"/data/posts/365/mqo.jpg\" alt=\"When a second fold is run, the folder computes a fused fold and tries to do as much work as possible in a single pass.\" title=\"When a second fold is run, the folder computes a fused fold and tries to do as much work as possible in a single pass.\"></p>\n<p>Imagine one thread starts a fold (orange) that looks for G1a (aborted read). That fold finishes reducing chunk 0 of the history and is presently reducing chunks 3 and 5, when another thread starts a fold to compute latency statistics (blue). The folder realizes it has an ongoing concurrent fold. It builds a fused fold (gray) which performs both G1a and latency processing in a single step. It then schedules latency reductions for chunks 0, 3, and 5, to catch up with the G1a fold. Chunks 1, 2, 4, and 6+ haven’t been touched yet, so the folder cancels their reductions. In their place it spawns new fused reduction tasks for those chunks, and new combine tasks for their results. It automatically weaves together the independent and fused stages so that they re-use as much completed work as possible. 
Most of the history is processed in a single pass, and once completed, the folder delivers results from both folds to their respective callers. All of this is <a href=\"https://github.com/jepsen-io/history/blob/be95aef21739167760c998e988fd4b3d10db1e3e/src/jepsen/history/fold.clj#L1025-L1329\">done transactionally</a>, ensuring exactly-once processing and thread safety even with unsynchronized mutable accumulators. A simpler join process allows the folder to fuse together <a href=\"https://github.com/jepsen-io/history/blob/be95aef21739167760c998e988fd4b3d10db1e3e/src/jepsen/history/fold.clj#L910-L1023\">linear folds</a>.</p>\n<p>The end result of this byzantine plumbing is that callers don’t have to think about <em>how</em> the underlying pass is executed. They just ask to fold over the history, and the folder figures out an efficient plan for executing it. This might involve a few independent passes over the first few chunks, but those chunks are likely retained in cache, which cuts down on deserialization. The rest of the history is processed in a single pass. We get high throughput, reduced IO, and cache locality.</p>\n<h2><a href=\"#task-scheduling\" id=\"task-scheduling\">Task Scheduling</a></h2>\n<p>To take advantage of multiple cores efficiently, we want checkers to run their analyses in multiple threads—but not <em>too</em> many, lest we impose heavy context-switching costs. In order to ensure thread safety and exactly-once processing during multi-query optimization, we need to ensure that fold tasks are cancelled and created atomically. We also want to ensure that tasks which block on the results of other tasks don’t launch until their dependencies are complete. 
Launched too early, such tasks would sit blocked on their threads, leading to threadpool starvation and in some cases deadlock: a classic problem with Java threadpools.</p>\n<p>To solve these problems Jepsen’s history library includes a <a href=\"https://github.com/jepsen-io/history/blob/be95aef21739167760c998e988fd4b3d10db1e3e/src/jepsen/history/task.clj\">transactional, dependency-aware task scheduler</a>. Tasks are functions which receive their dependencies’ results as arguments, and can only start when their dependencies are complete. This prevents deadlock. Tasks are cancellable, which also cancels their dependencies. An exception in a task propagates to downstream dependencies, which simplifies error handling. Dereferencing a task returns its result, or throws if an exception occurred. These capabilities are backed by an <a href=\"https://github.com/jepsen-io/history/blob/be95aef21739167760c998e988fd4b3d10db1e3e/src/jepsen/history/task.clj#L373-L379\">immutable executor state</a> which tracks the dependency graph, which tasks are ready and presently running, and the next task ID. A miniature <a href=\"https://github.com/jepsen-io/history/blob/be95aef21739167760c998e988fd4b3d10db1e3e/src/jepsen/history/task.clj#L806-L832\">effect system</a> applies a log of “side effects” from changes to the immutable state. This approach allows us to support <a href=\"https://github.com/jepsen-io/history/blob/be95aef21739167760c998e988fd4b3d10db1e3e/src/jepsen/history/task.clj#L834-L851\">arbitrary transactions</a> over the task executor: one can create and cancel any number of tasks in an atomic step.</p>\n<p>This is powered by questionable misuses of Java’s ExecutorService & BlockingQueue: our “queue” is simply the immutable state’s dependency graph. We <a href=\"https://github.com/jepsen-io/history/blob/be95aef21739167760c998e988fd4b3d10db1e3e/src/jepsen/history/task.clj#L741-L747\">ignore the ExecutorService’s attempts to add things to the queue</a> and instead mutate the state through our transaction channels. 
We wake up the executor with <a href=\"https://github.com/jepsen-io/history/blob/be95aef21739167760c998e988fd4b3d10db1e3e/src/jepsen/history/task.clj#L828-L832\">trivial runnables</a> when tasks become available. This feels evil, but it <em>does</em> work: executor safety is validated by an extensive series of <a href=\"https://github.com/jepsen-io/history/tree/be95aef21739167760c998e988fd4b3d10db1e3e/test/jepsen\">generative tests</a> which stress both linear and concurrent behavior.</p>\n<h2><a href=\"#compatibility\" id=\"compatibility\">Compatibility</a></h2>\n<p>The most significant API change was replacing plain vectors of operations with a first-class <code>IHistory</code> type. Users who constructed their own histories and fed them to built-in checkers—for instance, by calling regular Clojure <code>map</code> or <code>filter</code> on a history—received a blunt warning of type incompatibility. The fix here is straightforward: either use <code>jepsen.history/map</code> and friends, or <code>jepsen.history/history</code> to wrap an existing collection in a new <code>IHistory</code>. These wrappers are designed to work both for the clean histories Jepsen produces as well as the incomplete test histories involved in Jepsen’s internal test suites.</p>\n<p>The other significant change was making operation <code>:time</code> and <code>:index</code> mandatory. These are always provided by Jepsen, but might be missing from hand-coded histories used in internal test suites. Operations are automatically promoted to hashmaps by the <code>history</code> wrapper function, and indices can either be validated or automatically assigned for testing.</p>\n<p>Otherwise the transition was remarkably successful. Existing code could continue to treat operations as hashmaps, to pretend histories were vectors, and to use standard Clojure collection operations as before. These all worked transparently, despite now being backed by lazily cached on-disk structures. 
I incrementally rewrote most of the core checkers in Jepsen and Elle to take advantage of the new Op type and fold system, and obtained significant speedups throughout the process.</p>\n<p>Dropping <code>:generator</code> from the test map is a recent change. I don’t <em>think</em> that’s going to cause problems; I’m not aware of anything that actually read the generator state other than debugging code, but you never know.</p>\n<h2><a href=\"#performance\" id=\"performance\">Performance</a></h2>\n<p>With Jepsen 0.2.7 an etcd list-append transactional test on a five-node LXC cluster might perform ~7,000 ops/sec (counting both invocations and completions). At this point the bottleneck is etcd, not Jepsen itself. With a 16 GB heap, that test would exhaust memory and crash after roughly two hours. Given 90 minutes, it generated a single Fressian object of 2.2 GB containing 38,076,478 operations; this file was larger than MAX_INT bytes and caused Jepsen to crash.</p>\n<p>When limited to an hour that test squeaked in under both heap and serialization limits: 25,461,508 operations (7.07 kHz) in a 1.5 GB Jepsen file. Analyzing that in-memory history for G1a, G1b, and internal consistency took 202 seconds (126 kHz). This timeseries plot of heap use and garbage collections shows the linear growth in memory use required to retain the entire history and generator state in memory.</p>\n<p><img class=\"attachment pure-img\" src=\"/data/posts/365/027-memory-just-enough.png\" alt=\"A timeseries plot of heap use and garbage collections. Memory use rises linearly over the hour-long test run, followed by a spike in heap use and intense collection pressure during the final analysis phase.\" title=\"A timeseries plot of heap use and garbage collections. 
Memory use rises linearly over the hour-long test run, followed by a spike in heap use and intense collection pressure during the final analysis phase.\"></p>\n<p>The resulting Jepsen file took 366 seconds and a 58 GB heap to load the history—even just to read a single operation. There are various reasons for this, including the larger memory footprint of persistent maps and vectors, temporarily retained intermediate representations from the Fressian decoder, pointers and object header overhead, Fressian’s more-compact encodings of primitive types, and its caching of repeated values.</p>\n<p>With Jepsen 0.3.3 that same hour-long test consumed only a few hundred megabytes of heap until the analysis phase, where it took full advantage of the 16 GB heap to cache most of the history and avoid parsing chunks multiple times. It produced a history of 40,358,966 elements (11.2 kHz) in a 1.9 GB Jepsen file. The analysis completed in 97 seconds (416 kHz). Even though the analysis phase has to re-parse the entire history from disk, it does so in parallel, which significantly speeds up the process.</p>\n<p><img class=\"attachment pure-img\" src=\"/data/posts/365/033-one-hour.png\" alt=\"A timeseries plot of heap use and garbage collections. Memory use is essentially constant over the hour-long test run, oldgen rising slightly and then falling during a few collections. Memory spikes at the end of the test, when we re-load the history and analyze it.\" title=\"A timeseries plot of heap use and garbage collections. Memory use is essentially constant over the hour-long test run, oldgen rising slightly and then falling during a few collections. Memory spikes at the end of the test, when we re-load the history and analyze it.\"></p>\n<p>Given ~25.4 hours that same test suite recorded a billion (1,000,000,000) operations at roughly 11 kHz, producing a 108 GB Jepsen file. 
Checking that history for G1a, G1b, and internal anomalies took 1 hour and 6 minutes (249 kHz) with a 100 GB heap, and consumed 70-98% of 48 cores. Note that our heap needed to be much larger. G1a and G1b are global anomalies and require computing all failed and intermediate writes in an initial pass over the history, then retaining those index structures throughout a second pass. Under memory pressure, this analysis requires more frequent reloading of history chunks that have been evicted from cache. Simpler folds, like computing totals for ok, failed, and crashed operations, can execute at upwards of 900 kHz.</p>\n<p>Thanks to the new file format’s lazy structure, parsing the top-level test attributes (e.g. time, name, validity) took just 20-30 milliseconds. Fetching a single element of the history (regardless of position) took ~500 milliseconds: a few disk seeks to load the root, test, and history block, and then checksumming and parsing the appropriate chunk. The history block itself was ~715 KB, and consisted of pointers to 61,000 chunks of 16384 operations each.</p>\n<h2><a href=\"#conclusion\" id=\"conclusion\">Conclusion</a></h2>\n<p>Jepsen 0.3.3 is capable of generating 50,000 operations per second, tackling histories of up to a billion operations and checking close to a million operations per second. Of course, these numbers depend on what the generator, client, and test are doing! Going further may be tricky: many JVM collection APIs are limited to 2<sup>31</sup> - 1 elements. Many of the checkers in Jepsen have been rewritten to run as concurrent folds, and to use more memory-efficient structures. However, only a few analyses are completely local: many require O(n) memory and may be impossible to scale past a few million operations.</p>\n<p>You can start using <a href=\"https://github.com/jepsen-io/jepsen/commit/cc200b1fd66cc36f827259237727655472c73f8e\">Jepsen 0.3.3</a> today. 
I hope it opens up new territory for distributed systems testers everywhere.</p>", "image": null, "media": [], "authors": [ { "name": "Aphyr", "email": null, "url": "https://aphyr.com/" } ], "categories": [] }, { "id": "https://aphyr.com/posts/363-fast-multi-accumulator-reducers", "title": "Fast Multi-Accumulator Reducers", "description": null, "url": "https://aphyr.com/posts/363-fast-multi-accumulator-reducers", "published": "2023-04-27T13:47:46.000Z", "updated": "2023-04-27T13:47:46.000Z", "content": "<p>Again <a href=\"https://aphyr.com/posts/360-loopr-a-loop-reduction-macro-for-clojure\">with the reductions</a>! I keep writing code which reduces over a collection, keeping track of more than one variable. For instance, here’s one way to find the mean of a collection of integers:</p>\n<pre><code><span></span><span class=\"p\">(</span><span class=\"kd\">defn </span><span class=\"nv\">mean</span>\n <span class=\"s\">\"A reducer to find the mean of a collection. Accumulators are [sum count] pairs.\"</span>\n <span class=\"p\">([]</span> <span class=\"p\">[</span><span class=\"mi\">0</span> <span class=\"mi\">0</span><span class=\"p\">])</span>\n <span class=\"p\">([[</span><span class=\"nv\">sum</span> <span class=\"nv\">count</span><span class=\"p\">]]</span> <span class=\"p\">(</span><span class=\"nb\">/ </span><span class=\"nv\">sum</span> <span class=\"nv\">count</span><span class=\"p\">))</span>\n <span class=\"p\">([[</span><span class=\"nv\">sum</span> <span class=\"nv\">count</span><span class=\"p\">]</span> <span class=\"nv\">x</span><span class=\"p\">]</span>\n <span class=\"p\">[(</span><span class=\"nb\">+ </span><span class=\"nv\">sum</span> <span class=\"nv\">x</span><span class=\"p\">)</span> <span class=\"p\">(</span><span class=\"nb\">inc </span><span class=\"nv\">count</span><span class=\"p\">)]))</span>\n</code></pre>\n<p>This <code>mean</code> function is what Clojure calls a <a href=\"https://clojure.org/reference/reducers\">reducer</a>, or a <a 
href=\"https://clojure.org/reference/transducers\">reducing function</a>. With no arguments, it constructs a fresh accumulator. With two arguments, it combines an element of some collection with the accumulator, returning a new accumulator. With one argument, it transforms the accumulator into some final result.</p>\n<p>We can use this reducer with standard Clojure <code>reduce</code>, like so:</p>\n<pre><code><span></span><span class=\"p\">(</span><span class=\"nf\">require</span> <span class=\"o\">'</span><span class=\"p\">[</span><span class=\"nv\">criterium.core</span> <span class=\"ss\">:refer</span> <span class=\"p\">[</span><span class=\"nv\">quick-bench</span> <span class=\"nv\">bench</span><span class=\"p\">]]</span> <span class=\"o\">'</span><span class=\"p\">[</span><span class=\"nv\">clojure.core.reducers</span> <span class=\"ss\">:as</span> <span class=\"nv\">r</span><span class=\"p\">])</span>\n<span class=\"p\">(</span><span class=\"k\">def </span><span class=\"nv\">numbers</span> <span class=\"p\">(</span><span class=\"nf\">vec</span> <span class=\"p\">(</span><span class=\"nb\">range </span><span class=\"mi\">1</span><span class=\"nv\">e7</span><span class=\"p\">)))</span>\n<span class=\"p\">(</span><span class=\"nf\">mean</span> <span class=\"p\">(</span><span class=\"nb\">reduce </span><span class=\"nv\">mean</span> <span class=\"p\">(</span><span class=\"nf\">mean</span><span class=\"p\">)</span> <span class=\"nv\">numbers</span><span class=\"p\">))</span>\n<span class=\"c1\">; => 9999999/2</span>\n</code></pre>\n<p>Or use <code>clojure.core.reducers/reduce</code> to apply that function to every element in a collection. 
We don’t need to call <code>(mean)</code> to get an identity here; <code>r/reduce</code> does that for us:</p>\n<pre><code><span></span><span class=\"p\">(</span><span class=\"nf\">mean</span> <span class=\"p\">(</span><span class=\"nf\">r/reduce</span> <span class=\"nv\">mean</span> <span class=\"nv\">numbers</span><span class=\"p\">))</span>\n<span class=\"c1\">; => 9999999/2</span>\n</code></pre>\n<p>Or we can use <code>transduce</code>, which goes one step further and calls the single-arity form to transform the accumulator. Transduce also takes a <code>transducer</code>: a function which takes a reducing function and transforms it into some other reducing function. In this case we’ll use <code>identity</code>–<code>mean</code> is exactly what we want.</p>\n<pre><code><span></span><span class=\"p\">(</span><span class=\"nf\">transduce</span> <span class=\"nb\">identity </span><span class=\"nv\">mean</span> <span class=\"nv\">numbers</span><span class=\"p\">)</span>\n<span class=\"c1\">; => 9999999/2</span>\n</code></pre>\n<p>The problem with all of these approaches is that <code>mean</code> is slow:</p>\n<pre><code><span></span><span class=\"nv\">user=></span> <span class=\"p\">(</span><span class=\"nf\">bench</span> <span class=\"p\">(</span><span class=\"nf\">transduce</span> <span class=\"nb\">identity </span><span class=\"nv\">mean</span> <span class=\"nv\">numbers</span><span class=\"p\">))</span>\n<span class=\"nv\">Evaluation</span> <span class=\"nb\">count </span><span class=\"err\">:</span> <span class=\"mi\">60</span> <span class=\"nv\">in</span> <span class=\"mi\">60</span> <span class=\"nv\">samples</span> <span class=\"nv\">of</span> <span class=\"mi\">1</span> <span class=\"nv\">calls.</span>\n <span class=\"nv\">Execution</span> <span class=\"nb\">time </span><span class=\"nv\">mean</span> <span class=\"err\">:</span> <span class=\"mf\">1.169762</span> <span class=\"nv\">sec</span>\n <span class=\"nv\">Execution</span> <span class=\"nb\">time 
</span><span class=\"nv\">std-deviation</span> <span class=\"err\">:</span> <span class=\"mf\">46.845286</span> <span class=\"nv\">ms</span>\n <span class=\"nv\">Execution</span> <span class=\"nb\">time </span><span class=\"nv\">lower</span> <span class=\"nv\">quantile</span> <span class=\"err\">:</span> <span class=\"mf\">1.139851</span> <span class=\"nv\">sec</span> <span class=\"p\">(</span> <span class=\"mf\">2.5</span><span class=\"nv\">%</span><span class=\"p\">)</span>\n <span class=\"nv\">Execution</span> <span class=\"nb\">time </span><span class=\"nv\">upper</span> <span class=\"nv\">quantile</span> <span class=\"err\">:</span> <span class=\"mf\">1.311480</span> <span class=\"nv\">sec</span> <span class=\"p\">(</span><span class=\"mf\">97.5</span><span class=\"nv\">%</span><span class=\"p\">)</span>\n <span class=\"nv\">Overhead</span> <span class=\"nv\">used</span> <span class=\"err\">:</span> <span class=\"mf\">20.346327</span> <span class=\"nv\">ns</span>\n</code></pre>\n<p>So that’s 1.169 seconds to find the mean of 10^7 elements; about 8.5 MHz. What’s going on here? Let’s try <a href=\"https://www.yourkit.com/\">Yourkit</a>:</p>\n<p><img class=\"attachment pure-img\" src=\"/data/posts/363/tuple-time.png\" alt=\"A Yourkit screenshot of this benchmark\" title=\"A Yourkit screenshot of this benchmark\"></p>\n<p>Yikes. We’re burning 42% of our total time in destructuring (<code>nth</code>) and creating (<code>LazilyPersistentVector.createOwning</code>) those vector pairs. 
Also Clojure vectors aren’t primitive, so these are all instances of Long.</p>\n<h2><a href=\"#faster-reducers\" id=\"faster-reducers\">Faster Reducers</a></h2>\n<p>Check this out:</p>\n<pre><code><span></span><span class=\"p\">(</span><span class=\"nf\">require</span> <span class=\"o\">'</span><span class=\"p\">[</span><span class=\"nv\">dom-top.core</span> <span class=\"ss\">:refer</span> <span class=\"p\">[</span><span class=\"nv\">reducer</span><span class=\"p\">]])</span>\n<span class=\"p\">(</span><span class=\"k\">def </span><span class=\"nv\">mean</span>\n <span class=\"s\">\"A reducer to find the mean of a collection.\"</span>\n <span class=\"p\">(</span><span class=\"nf\">reducer</span> <span class=\"p\">[</span><span class=\"nv\">sum</span> <span class=\"mi\">0</span>, <span class=\"nb\">count </span><span class=\"mi\">0</span><span class=\"p\">]</span> <span class=\"c1\">; Starting with a sum of 0 and count of 0,</span>\n <span class=\"p\">[</span><span class=\"nv\">x</span><span class=\"p\">]</span> <span class=\"c1\">; Take each element, x, and:</span>\n <span class=\"p\">(</span><span class=\"nf\">recur</span> <span class=\"p\">(</span><span class=\"nb\">+ </span><span class=\"nv\">sum</span> <span class=\"nv\">x</span><span class=\"p\">)</span> <span class=\"c1\">; Compute a new sum</span>\n <span class=\"p\">(</span><span class=\"nb\">inc </span><span class=\"nv\">count</span><span class=\"p\">))</span> <span class=\"c1\">; And count</span>\n <span class=\"p\">(</span><span class=\"nb\">/ </span><span class=\"nv\">sum</span> <span class=\"nv\">count</span><span class=\"p\">)))</span> <span class=\"c1\">; Finally dividing sum by count</span>\n</code></pre>\n<p>The <code>reducer</code> macro takes a binding vector, like <code>loop</code>, with variables and their initial values. Then it takes a binding vector for an element of the collection. 
With each element it invokes the third form, which, like <code>loop</code>, recurs with new values for each accumulator variable. The optional fourth form is called at the end of the reduction, and transforms the accumulators into a final return value. You might recognize this syntax from <a href=\"https://aphyr.com/posts/360-loopr-a-loop-reduction-macro-for-clojure\">loopr</a>.</p>\n<p>Under the hood, <code>reducer</code> <a href=\"https://github.com/aphyr/dom-top/blob/7f5d429593b42132b71c63f11d0fe8ae5b70ca68/src/dom_top/core.clj#L915-L957\">dynamically compiles</a> an accumulator class with two fields: one for <code>sum</code> and one for <code>count</code>. It’ll re-use an existing accumulator class if it has one of the same shape handy. It then constructs a <a href=\"https://github.com/aphyr/dom-top/blob/7f5d429593b42132b71c63f11d0fe8ae5b70ca68/src/dom_top/core.clj#L1059-L1112\">reducing function</a> which takes an instance of that accumulator class, and rewrites any use of <code>sum</code> and <code>count</code> in the iteration form into direct field access. 
Instead of constructing a fresh instance of the accumulator every time through the loop, it mutates the accumulator fields in-place, reducing GC churn.</p>\n<pre><code><span></span><span class=\"nv\">user=></span> <span class=\"p\">(</span><span class=\"nf\">bench</span> <span class=\"p\">(</span><span class=\"nf\">transduce</span> <span class=\"nb\">identity </span><span class=\"nv\">mean</span> <span class=\"nv\">numbers</span><span class=\"p\">))</span>\n<span class=\"nv\">Evaluation</span> <span class=\"nb\">count </span><span class=\"err\">:</span> <span class=\"mi\">120</span> <span class=\"nv\">in</span> <span class=\"mi\">60</span> <span class=\"nv\">samples</span> <span class=\"nv\">of</span> <span class=\"mi\">2</span> <span class=\"nv\">calls.</span>\n <span class=\"nv\">Execution</span> <span class=\"nb\">time </span><span class=\"nv\">mean</span> <span class=\"err\">:</span> <span class=\"mf\">753.896407</span> <span class=\"nv\">ms</span>\n <span class=\"nv\">Execution</span> <span class=\"nb\">time </span><span class=\"nv\">std-deviation</span> <span class=\"err\">:</span> <span class=\"mf\">10.828236</span> <span class=\"nv\">ms</span>\n <span class=\"nv\">Execution</span> <span class=\"nb\">time </span><span class=\"nv\">lower</span> <span class=\"nv\">quantile</span> <span class=\"err\">:</span> <span class=\"mf\">740.657627</span> <span class=\"nv\">ms</span> <span class=\"p\">(</span> <span class=\"mf\">2.5</span><span class=\"nv\">%</span><span class=\"p\">)</span>\n <span class=\"nv\">Execution</span> <span class=\"nb\">time </span><span class=\"nv\">upper</span> <span class=\"nv\">quantile</span> <span class=\"err\">:</span> <span class=\"mf\">778.701155</span> <span class=\"nv\">ms</span> <span class=\"p\">(</span><span class=\"mf\">97.5</span><span class=\"nv\">%</span><span class=\"p\">)</span>\n <span class=\"nv\">Overhead</span> <span class=\"nv\">used</span> <span class=\"err\">:</span> <span class=\"mf\">20.346327</span> <span 
class=\"nv\">ns</span>\n</code></pre>\n<p>Getting rid of the vector destructuring makes the reduction ~35% faster. I’ve seen real-world speedups in my reductions of 50% or more.</p>\n<h2><a href=\"#primitives\" id=\"primitives\">Primitives</a></h2>\n<p>Since we control class generation, we can take advantage of type hints to compile accumulators with primitive fields:</p>\n<pre><code><span></span><span class=\"p\">(</span><span class=\"k\">def </span><span class=\"nv\">mean</span>\n <span class=\"p\">(</span><span class=\"nf\">reducer</span> <span class=\"p\">[</span><span class=\"o\">^</span><span class=\"nb\">long </span><span class=\"nv\">sum</span> <span class=\"mi\">0</span>, \n <span class=\"o\">^</span><span class=\"nb\">long count </span><span class=\"mi\">0</span><span class=\"p\">]</span> \n <span class=\"p\">[</span><span class=\"o\">^</span><span class=\"nb\">long </span><span class=\"nv\">x</span><span class=\"p\">]</span> \n <span class=\"p\">(</span><span class=\"nf\">recur</span> <span class=\"p\">(</span><span class=\"nb\">+ </span><span class=\"nv\">sum</span> <span class=\"p\">(</span><span class=\"nb\">long </span><span class=\"nv\">x</span><span class=\"p\">))</span> <span class=\"p\">(</span><span class=\"nb\">inc </span><span class=\"nv\">count</span><span class=\"p\">))</span>\n <span class=\"p\">(</span><span class=\"nb\">/ </span><span class=\"nv\">sum</span> <span class=\"nv\">count</span><span class=\"p\">)))</span> \n</code></pre>\n<p>Since <code>reduce</code>, <code>transduce</code>, etc. call our reducer with Object, rather than Long, we still have to unbox each element–but we <em>do</em> avoid boxing on the iteration variables, at least. A primitive-aware reduce could do better.</p>\n<p><img class=\"attachment pure-img\" src=\"/data/posts/363/reducer-prim.png\" alt=\"A Yourkit screenshot showing almost all our time is spent in unboxing incoming longs. 
The addition and destructuring costs are gone.\" title=\"A Yourkit screenshot showing almost all our time is spent in unboxing incoming longs. The addition and destructuring costs are gone.\"></p>\n<p>This gives us a 43% speedup over the standard pair reducer approach.</p>\n<pre><code><span></span><span class=\"nv\">user=></span> <span class=\"p\">(</span><span class=\"nf\">bench</span> <span class=\"p\">(</span><span class=\"nf\">transduce</span> <span class=\"nb\">identity </span><span class=\"nv\">mean</span> <span class=\"nv\">numbers</span><span class=\"p\">))</span>\n<span class=\"nv\">Evaluation</span> <span class=\"nb\">count </span><span class=\"err\">:</span> <span class=\"mi\">120</span> <span class=\"nv\">in</span> <span class=\"mi\">60</span> <span class=\"nv\">samples</span> <span class=\"nv\">of</span> <span class=\"mi\">2</span> <span class=\"nv\">calls.</span>\n <span class=\"nv\">Execution</span> <span class=\"nb\">time </span><span class=\"nv\">mean</span> <span class=\"err\">:</span> <span class=\"mf\">660.420219</span> <span class=\"nv\">ms</span>\n <span class=\"nv\">Execution</span> <span class=\"nb\">time </span><span class=\"nv\">std-deviation</span> <span class=\"err\">:</span> <span class=\"mf\">14.255986</span> <span class=\"nv\">ms</span>\n <span class=\"nv\">Execution</span> <span class=\"nb\">time </span><span class=\"nv\">lower</span> <span class=\"nv\">quantile</span> <span class=\"err\">:</span> <span class=\"mf\">643.583522</span> <span class=\"nv\">ms</span> <span class=\"p\">(</span> <span class=\"mf\">2.5</span><span class=\"nv\">%</span><span class=\"p\">)</span>\n <span class=\"nv\">Execution</span> <span class=\"nb\">time </span><span class=\"nv\">upper</span> <span class=\"nv\">quantile</span> <span class=\"err\">:</span> <span class=\"mf\">692.084752</span> <span class=\"nv\">ms</span> <span class=\"p\">(</span><span class=\"mf\">97.5</span><span class=\"nv\">%</span><span class=\"p\">)</span>\n <span 
class=\"nv\">Overhead</span> <span class=\"nv\">used</span> <span class=\"err\">:</span> <span class=\"mf\">20.346327</span> <span class=\"nv\">ns</span>\n</code></pre>\n<h2><a href=\"#early-return\" id=\"early-return\">Early Return</a></h2>\n<p>The <code>reducer</code> macro supports early return. If the iteration form does not <code>recur</code>, it skips the remaining elements and returns immediately. Here, we find the mean of at most 1000 elements:</p>\n<pre><code><span></span><span class=\"p\">(</span><span class=\"k\">def </span><span class=\"nv\">mean</span>\n <span class=\"p\">(</span><span class=\"nf\">reducer</span> <span class=\"p\">[</span><span class=\"o\">^</span><span class=\"nb\">long </span><span class=\"nv\">sum</span> <span class=\"mi\">0</span>,\n <span class=\"o\">^</span><span class=\"nb\">long count </span><span class=\"mi\">0</span>\n <span class=\"ss\">:as</span> <span class=\"nv\">acc</span><span class=\"p\">]</span> <span class=\"c1\">; In the final form, call the accumulator `acc`</span>\n <span class=\"p\">[</span><span class=\"o\">^</span><span class=\"nb\">long </span><span class=\"nv\">x</span><span class=\"p\">]</span>\n <span class=\"p\">(</span><span class=\"k\">if </span><span class=\"p\">(</span><span class=\"nb\">= count </span><span class=\"mi\">1000</span><span class=\"p\">)</span>\n <span class=\"p\">(</span><span class=\"nb\">/ </span><span class=\"nv\">sum</span> <span class=\"nv\">count</span><span class=\"p\">)</span> <span class=\"c1\">; Early return</span>\n <span class=\"p\">(</span><span class=\"nf\">recur</span> <span class=\"p\">(</span><span class=\"nb\">+ </span><span class=\"nv\">sum</span> <span class=\"nv\">x</span><span class=\"p\">)</span> <span class=\"p\">(</span><span class=\"nb\">inc </span><span class=\"nv\">count</span><span class=\"p\">)))</span> <span class=\"c1\">; Keep going</span>\n <span class=\"p\">(</span><span class=\"k\">if </span><span class=\"p\">(</span><span class=\"nf\">number?</span> 
<span class=\"nv\">acc</span><span class=\"p\">)</span> <span class=\"c1\">; Depending on whether we returned early...</span>\n <span class=\"nv\">acc</span>\n <span class=\"p\">(</span><span class=\"k\">let </span><span class=\"p\">[[</span><span class=\"nv\">sum</span> <span class=\"nv\">count</span><span class=\"p\">]</span> <span class=\"nv\">acc</span><span class=\"p\">]</span>\n <span class=\"p\">(</span><span class=\"nb\">/ </span><span class=\"nv\">sum</span> <span class=\"nv\">count</span><span class=\"p\">)))))</span>\n</code></pre>\n<p>For consistency with <code>transduce</code>, both normal and early return accumulators are passed to the final form. Because the early-returned value might be of a different type, <code>reducer</code> lets you declare a single variable for the accumulators in the final form–here, we call it <code>acc</code>. Depending on whether we return early or normally, <code>acc</code> will either be a number or a vector of all accumulators: <code>[sum count]</code>.</p>\n<h2><a href=\"#available-now\" id=\"available-now\">Available Now</a></h2>\n<p>I’ve been using <code>reducer</code> extensively in <a href=\"https://jepsen.io/\">Jepsen</a> over the last six months, and I’ve found it very helpful! It’s used in many of our checkers, and helps find anomalies in histories of database operations more quickly.</p>\n<p>If you’d like to try this for yourself, you’ll find <code>reducer</code> in <a href=\"https://github.com/aphyr/dom-top\">dom-top</a>, my library for unorthodox control flow. 
It’s available on Clojars.</p>\n<p><em>A special thanks to Justin Conklin, who was instrumental in getting the bytecode generation and dynamic classloading <code>reducer</code> needs to work!</em></p>", "image": null, "media": [], "authors": [ { "name": "Aphyr", "email": null, "url": "https://aphyr.com/" } ], "categories": [] }, { "id": "https://aphyr.com/posts/361-monkeypox-in-ohio", "title": "Monkeypox in Ohio", "description": null, "url": "https://aphyr.com/posts/361-monkeypox-in-ohio", "published": "2022-07-22T15:43:34.000Z", "updated": "2022-07-22T15:43:34.000Z", "content": "<p><em>Update 2022-08-12: The Hamilton County Health Department now has a <a href=\"https://www.hamiltoncountyhealth.org/services/programs/community-health-services-disease-prevention/monkeypox/\">page about monkeypox</a> with symptoms and isolation guidance, as well as options for vaccination, testing, and treatment–look for “complete our monkeypox vaccine registration”. The Cincinnati Health Department is <a href=\"https://www.cincinnati-oh.gov/health/news/monkeypox-vaccination/\">also offering vaccines</a> for high-risk groups. People in Hamilton County without a primary care physician who have symptoms can also call 513-357-7320 for the Cincinnati city health clinic.</em></p>\n<p>If you’re a gregarious gay man like me you’ve probably heard about monkeypox. <a href=\"https://www.cdc.gov/poxvirus/monkeypox/index.html\">Monkeypox is an orthopoxvirus</a> which causes, in addition to systemic symptoms, lesions on the skin and mucosa. It’s transmitted primarily through skin-to-skin contact, though close-range droplet and fomite transfer are also possible. The current outbreak in industrialized nations is <a href=\"https://mobile.twitter.com/mugecevik/status/1549858617800753154\">almost entirely</a> among gay, bisexual, and other men who have sex with men (GBMSM); likely via sexual networks. 
<a href=\"https://www.gov.uk/government/publications/monkeypox-outbreak-technical-briefings/investigation-into-monkeypox-outbreak-in-england-technical-briefing-3\">In the UK, for example</a>, 99% of cases are male and 97% are among GBMSM. <a href=\"https://www.publichealthontario.ca/-/media/Documents/M/2022/monkeypox-episummary.pdf?sc_lang=en\">Ontario reports</a> 99% of cases are in men. In New York <a href=\"https://www1.nyc.gov/assets/doh/downloads/pdf/han/advisory/2022/monkeypox-update-testing-treatment.pdf\">99% of cases are in men who have sex with men</a>. For a good overview of what monkeypox looks like, how it’s spread, and ways we can reduce transmission, check out San Francisco Leathermen’s Discussion Group’s <a href=\"https://www.youtube.com/watch?v=iLninLYkXDc\">presentation by MPH Frank Strona</a>.</p>\n<p>Earlier outbreaks of monkeypox have been generally self-limiting, but that does not appear to be the case with the current outbreak; perhaps because it’s spreading among men who enjoy lots of skin-to-skin contact with lots of people, and often travel to do just that. We saw cases come out of Darklands, IML, and Daddyland; they’re also rising quickly in major cities across the US. Cases haven’t taken off in Ohio yet, but for those of us who travel to large gay events it’s an increasing risk. Two of my friends tested positive last week; one landed in the ER. We’ve known for a while about symptoms and risk reduction tactics, so I won’t cover those here, but I <em>do</em> want to talk about vaccines and testing.</p>\n<p>There is a vaccine (in the US, <a href=\"https://www.cdc.gov/mmwr/volumes/71/wr/mm7122e1.htm\">JYNNEOS</a>) for monkeypox, but we have nowhere near enough doses. Some cities like SF, New York, and Chicago have pop-up mass vaccination sites. 
The Ohio Department of Health, on the other hand, has been <a href=\"https://ohiocapitaljournal.com/2022/07/13/monkeypox-is-spreading-but-the-ohio-department-of-health-hasnt-spread-the-message/\">basically silent on the outbreak</a>: two months in there’s no public messaging about risk factors, prevention, testing, or treatment. <a href=\"https://aspr.hhs.gov/SNS/Pages/JYNNEOS-Distribution.aspx\">Only ten doses</a> have been distributed in the state. Many of my friends here have no idea how to get vaccinated.</p>\n<p>Through an incredible stroke of luck and a truly fantastic doctor, I got my first dose yesterday. I followed up by chatting with our county’s (super nice!) head epidemiologist, and asking what advice I can give to the local queer leather community. Here’s the scoop:</p>\n<h2><a href=\"#getting-vaccinated-in-ohio\" id=\"getting-vaccinated-in-ohio\">Getting Vaccinated in Ohio</a></h2>\n<p>Because of the limited supply of vaccine doses, we’re currently focusing vaccination on those at high risk. Ohio’s current high-risk category includes:</p>\n<ul>\n<li>Anyone who has had close intimate contact (sex, cuddling, making out, etc.) with a known monkeypox case</li>\n<li>Exposure to respiratory droplets (e.g. dancing close, kissing) with a known case</li>\n</ul>\n<p>The problem is that many of us don’t <em>know</em> we’ve been in contact with monkeypox cases: of the people I know who have tested positive, none knew where they got it from. You might dance, make out with, or fuck dozens of people at an event weekend. So (our epidemiologist advised me) your participation in events where people have lots of sex or skin-to-skin contact can <em>also</em> be considered a risk factor. 
In other states having <a href=\"https://www1.nyc.gov/site/doh/health/health-topics/monkeypox.page\">multiple or anonymous partners in a two-week span</a> is a qualifying criterion; that might be worth a shot here too.</p>\n<p>There are no public vaccination sites in Ohio yet, as far as I know. Doses are distributed to individuals through their primary care physician (which poses a problem if you <em>don’t</em> have one, oof). You should start by talking to your doctor about your risk factors: for me that was “I have two friends who tested positive, I just did bear week in ptown, I play regularly, and I’m going to be participating in a large leather run in a few weeks.” If your doctor concurs you’re at high risk, they should get in touch with their usual contact at the county or city department of health. The department of health gathers information about risk factors and contact information, and then gets in touch with the Ohio department of health, which requisitions doses from the CDC and ships them to your doctor. For me it took three days from “talking to my doctor” to “first dose in arm”.</p>\n<p>The JYNNEOS shot is fairly easy: it’s a single sub-cutaneous injection with a thin needle into the fat on your arm–much better than the old orthopoxvirus vaccines. I barely felt it. The usual vaccine side effects apply: you <a href=\"https://www.fda.gov/media/131078/download\">might have some redness, swelling, or pain</a> at the injection site; other common side effects include low-grade muscle pain, headache, and fatigue. I’ve noticed some redness and a little bit of pain, but it’s been super mild. 
You need to come back for a second dose four weeks later, and protection is maximized two weeks after the second dose.</p>\n<p>When you get vaccinated, there’s a 21-day observation period where you need to report monkeypox-related symptoms via phone, text, or email to the health department–they’ll get in touch with you about this.</p>\n<h2><a href=\"#getting-tested-in-ohio\" id=\"getting-tested-in-ohio\">Getting Tested in Ohio</a></h2>\n<p>There’s no dedicated testing system in place yet, and I know friends have had really mixed luck getting tested through their primaries, urgent care, and even some hospitals. That said, our county epidemiologist says that both the state lab and some private labs–notably LabCorp–can test for monkeypox. Theoretically any primary care provider, urgent care, or emergency room should be able to do the test: they take a swab of your lesions and ship it off to the lab for analysis. In Cincinnati, Equitas Health (2805 Gilbert Ave, Cincinnati, OH 45206; 513-815-4475) should be able to test–call ahead and check before showing up though.</p>\n<p>When you get tested, you should isolate until you get results back. If your results are negative, talk to your primary doctor about what to do: it may be some other kind of illness. If the test is positive, you’ll have to isolate until no longer contagious, which means the lesions need to scab over and heal–you should see new skin underneath. Your contacts do not need to isolate unless they show symptoms, but the county will start contact tracing, and just in case, I think it’d be a good idea for you to reach out to intimate contacts personally.</p>\n<h2><a href=\"#mileage-may-vary\" id=\"mileage-may-vary\">Mileage May Vary</a></h2>\n<p>Right now I’m the only person I know who’s gotten vaccinated here, and I haven’t needed to look for testing, so I don’t know what’s going to happen when other people start trying to access care. 
A friend just told me that his doctor didn’t think there <em>was</em> a vaccine: if this happens to you, try showing them the CDC’s <a href=\"https://www.cdc.gov/mmwr/volumes/71/wr/mm7122e1.htm\">MMWR report on JYNNEOS</a>. If they can’t get you JYNNEOS, <a href=\"https://www.cdc.gov/poxvirus/monkeypox/clinicians/smallpox-vaccine.html\">ACAM2000</a> is also available–it involves worse side effects and may not be a good fit if you have a weakened immune system, but for many people it should still offer significant protection against monkeypox. If your doctor still doesn’t get it, ask them to contact the local health dept anyway.</p>\n<p>In Dayton, one friend’s primary doctor states the only way to get vaccinated is at the ER following a possible exposure. In NE Ohio, another reader’s primary said to contact the health dept directly; they’ve tried, but so far no response.</p>\n<p>If you’re still having trouble getting tested or vaccinated–your doctor doesn’t understand, or they can’t find the right contact at the health department–and we’re friends locally, get in touch. I’m <a href=\"mailto:[email protected]\">[email protected]</a>. I may be able to put you in touch with a doctor and/or health department staff who can help.</p>", "image": null, "media": [], "authors": [ { "name": "Aphyr", "email": null, "url": "https://aphyr.com/" } ], "categories": [] } ] }