The Chisel
2026-01-31 · 3343 words
I’ve been a fan of conflict-free replicated data types (CRDTs) for a while. I wrote an old blog post about implementing them back in 2021 (wow, almost five years ago), and I’ve been turning them over in my mind ever since.
If you’re not familiar with what a CRDT is, here is an explanation I wrote, taken from a project I am currently working on. (The project is called Together, part of Veritable / Solidarity / Home / Isocore.)
Background on CRDTs
A CRDT is a datatype with a single operator, merge. Merge is an operation that takes a pair of values and produces a new value, merging the two. This operator has some special properties. Specifically, the only requirement on merge is that, for all values, it is:
- **Commutative**, meaning `merge(A, B)` is the same as `merge(B, A)`; the order of operations does not matter.
- **Associative**, meaning `merge(A, merge(B, C))` is the same as `merge(merge(A, B), C)`; the grouping of operations does not matter.
- **Idempotent**, meaning `merge(A, merge(A, B))` is the same as `merge(A, B)`; applying the same operation multiple times does not change the result.
A simple example of a CRDT is a max counter over the integers. The merge operator simply takes whichever argument is greater. We can define it in Rust like so:
```rust
struct MaxCounter {
    value: usize,
}

fn merge(a: MaxCounter, b: MaxCounter) -> MaxCounter {
    MaxCounter { value: a.value.max(b.value) }
}
```
Imagine we have three integer values such that A > B > C. Then, it is trivial to show that max counter is:
- Commutative, as `max(A, B) = A = max(B, A)`
- Associative, as `max(A, max(B, C)) = A = max(max(A, B), C)`
- Idempotent, as `max(A, max(A, B)) = A = max(A, B)`
Therefore max over the integers forms a CRDT.
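As a quick sanity check, here is the max counter from above exercised against all three laws. (The struct and `merge` are repeated from the snippet earlier so this stands alone.)

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
struct MaxCounter {
    value: usize,
}

fn merge(a: MaxCounter, b: MaxCounter) -> MaxCounter {
    MaxCounter { value: a.value.max(b.value) }
}

fn main() {
    // A > B > C, as in the argument above.
    let (a, b, c) = (
        MaxCounter { value: 7 },
        MaxCounter { value: 5 },
        MaxCounter { value: 3 },
    );

    // Commutative: order of arguments does not matter.
    assert_eq!(merge(a, b), merge(b, a));
    // Associative: grouping does not matter.
    assert_eq!(merge(a, merge(b, c)), merge(merge(a, b), c));
    // Idempotent: re-merging the same value changes nothing.
    assert_eq!(merge(a, merge(a, b)), merge(a, b));

    println!("all three lattice laws hold for these values");
}
```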
There are lots of other cool CRDTs, like replicated growth arrays (RGAs), for more complicated structures of data. The core idea of CRDTs is that if the merge operator forms a join-semilattice, we can take any set of changes and merge them in any order and eventually arrive at the same result. This has very nice properties for collaboration, of course.
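Another textbook example of a merge that forms a join-semilattice is the grow-only set (G-Set), whose merge is plain set union. A minimal sketch (my own toy version, not from any particular library):

```rust
use std::collections::BTreeSet;

// Grow-only set: elements can only be added, and merge is set union,
// which is commutative, associative, and idempotent.
#[derive(Clone, Debug, PartialEq)]
struct GSet {
    items: BTreeSet<String>,
}

fn merge(a: &GSet, b: &GSet) -> GSet {
    GSet { items: a.items.union(&b.items).cloned().collect() }
}

fn main() {
    let a = GSet { items: ["x", "y"].iter().map(|s| s.to_string()).collect() };
    let b = GSet { items: ["y", "z"].iter().map(|s| s.to_string()).collect() };

    // Merging in either order yields the same set.
    assert_eq!(merge(&a, &b), merge(&b, &a));
    // Duplicates collapse: {x, y} ∪ {y, z} = {x, y, z}.
    assert_eq!(merge(&a, &b).items.len(), 3);
    println!("{:?}", merge(&a, &b).items);
}
```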
I am not a pioneer in the space, so if you’d like a better explanation, search site:lobste.rs crdt or something.
The Isocore Yap
I am currently working on Isocore, which is a BEAM-like runtime for WebAssembly (Wasm) components. Think of each component like an Erlang module. You can write distributed applications that run across multiple computers: we carve Wasm components apart at the interface boundary (WIT), automatically intercepting and converting arbitrary interface calls into RPC calls. Under the hood we handle all the routing, orchestration, supervision, data replication, authorization, and so on.
Isocore is designed to be pretty small, with an interface you could bundle as a Wasm component itself and send to a browser. So a server running an Isocore node could serve a copy of Isocore to a browser, which connects back into the Isocore cluster using some protocol like WebSockets, WebTransport, or WebRTC. Then the web node could fetch applications from the parent node, or call remote functions (e.g. persist a file, write to a database, perform a transaction) by binding that application to interface implementations running on the server node (or cluster, broadly).
This is something of a personal art project I have been working on for the past, say, 7 years or so. The project has grown and changed a lot over the years, but the core idea is to create a better version of the web, something along the lines of a universal application runtime that also serves as a desktop environment / operating system. I’m trying to do it from scratch as much as possible, because it’s for my own learning and enjoyment.
Isocore is designed to be distributed, or federated. Because authorization is capability-based, everything is sandboxed, we use strong cryptographic primitives, etc. I want it to be common and safe to e.g. pull Wasm component applications hosted by nodes on different clusters, or even have different clusters run by different people collaborate on running the same application. Like, if I run an instance of Isocore at home.isaac.sh, and you run one at lab.unnamed.website, and there’s a common chat application named harmony-chat or something that we both run, I should be able to chat with you by pointing my cluster at yours. This requires Isocore nodes be simple and provide general compute/storage/transport services, where permission to use these resources is strictly governed by whoever runs the node/cluster.
To build this vision in a distributed context, we first need to polyfill some distributed primitives that are missing. The biggest things that any distributed system needs, in this sense, are:
- Simple protocols for cryptography and authorization.
- A mass immutable blob store, à la BitTorrent, with a mutable filesystem-like layer on top.
- Encrypted streaming peer-to-peer data replication, à la Hypercore.
- Single-binary multi-node capability-based sandboxed application runtimes. Which is what Isocore is.
- Authenticated CRDT libraries for collaboration, sync, and versioning. Which is what Together is.
- A general UI layer (whether web or native) for building human-friendly non-shoebox applications.
So far I’ve implemented most of point 1 (standard primitives using ed25519-dalek, XChaCha20-Poly1305, etc.), written multiple versions of point 2 in different contexts (in-memory, on-disk, to-cloud), and implemented point 3 a handful of times, most recently at my last job, on top of QUIC.
I’m working on 4, 5, and 6. For point 6, I’m planning to start by using good old HTML, CSS, and JavaScript + Wasm. It would be cool to wire up a cross-platform UI library from scratch, but I’m aware enough to understand how big a task that is. I’m a firm believer in static sites + authenticated RPC calls, and although I have not used it, I am partial to htmx. I dislike React, but Live by Steven Wittens is cool. Anyway.
Point 4, Isocore, will be the subject of another blog post. This is a long preamble to say that today I will be talking about CRDTs, and a library which I’m writing called Together.
I started Together back in 2018, but then abandoned it for a season; here is the current repo. (Un)fortunately, I registered the crate name together on crates.io, but never ended up publishing any of the code I wrote. Now that I’m picking up development again, I hope to make good use of this namespace going forward.
Which brings us to today’s adventure. Sorry to bury the lede.
Together, or: Something is profoundly wrong with the state of AI programming tools right now
So yesterday, I was working on Together, my CRDT library. Which is why I brought up CRDTs when chatting with Mr. Dean yesterday. By hand, I wrote a very simple replicated growth array (RGA), with explicit trees and O(n) indexing and everything. Most importantly, I made sure my implementation was correct. I wrote a handful of unit tests for the edge cases.
Then I ran my RGA implementation, and it was slow. I know that CRDTs don’t have to be slow. Diamond-types is a state-of-the-art library with performant CRDTs. So, off the dome, I wrote down a long list of possible optimizations, and got to work.
I wanted to first resolve linear indexing. So I started writing a skip-list by hand. About halfway through, I decided to try something new.
So I asked claude-opus-4-5 (which I will refer to as “the chisel”) to read the TigerStyle document on GitHub, and a huge list of blog posts and style tips I had curated into this big ol’ document named Process. I then asked the chisel to generate an exhaustive set of unit tests for my existing slow-but-correct reference implementation. I asked the chisel to additionally generate a number of property-based tests, for all invariants, and benchmarks comparing my implementation to diamond-types on a number of standard benchmarks from josephg.
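For flavor, here is the shape of one such property test: random inputs checked against the merge laws. This is a dependency-free sketch with a hand-rolled xorshift PRNG and a stand-in integer `merge` (Together's real invariants cover RGA operations and are not shown here; a real suite would use a crate like proptest or quickcheck):

```rust
// Stand-in CRDT merge: max over the integers.
fn merge(a: u64, b: u64) -> u64 {
    a.max(b)
}

fn main() {
    // Tiny xorshift64 PRNG so the test needs no external crates.
    let mut state: u64 = 0x9E37_79B9_7F4A_7C15;
    let mut next = move || {
        state ^= state << 13;
        state ^= state >> 7;
        state ^= state << 17;
        state
    };

    // Check the three lattice laws over many random triples.
    for _ in 0..10_000 {
        let (a, b, c) = (next(), next(), next());
        assert_eq!(merge(a, b), merge(b, a), "commutative");
        assert_eq!(merge(a, merge(b, c)), merge(merge(a, b), c), "associative");
        assert_eq!(merge(a, merge(a, b)), merge(a, b), "idempotent");
    }
    println!("10,000 random cases passed");
}
```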
Here is how diamond-types performs on these standard editing traces:
| Trace | Ops | Diamond (ms) | Description |
|---|---|---|---|
| sveltecomponent | 19,749 | 1.70 | Editing a Svelte component file |
| rustcode | 40,173 | 4.30 | Editing Rust source code |
| seph-blog1 | 137,993 | 9.10 | Writing a blog post |
| automerge-paper | 259,778 | 15.4 | Writing the Automerge academic paper |
Which is pretty fast! I knew that we could at least achieve this speed in principle.
I needed to sleep, so I left my computer running overnight and asked the chisel to, essentially:
- Benchmark the current implementation.
- Research the next best optimization.
- Implement it, ensuring all tests still passed and the API stayed the same.
- Benchmark again and commit if the code ran faster.
- Reflect and add to the list of future optimization ideas.
- Repeat until Together was faster than diamond-types.
For context, here is how slow my reference implementation was:
| Trace | Diamond (ms) | Together (ms) | vs Diamond (times slower) |
|---|---|---|---|
| sveltecomponent | 1.70 | 465 | 274× |
| rustcode | 4.30 | ~1,500 | ~350× |
| seph-blog1 | 9.10 | ~6,000 | ~660× |
| automerge-paper | 15.4 | ~12,000 | ~780× |
When I woke up in the morning, I was greeted with this table, which I had asked the chisel to prepare upon completion:
| No. | Optimization | Step Speedup | vs Diamond (1.0× = parity) | Result |
|---|---|---|---|---|
| 0 | Unoptimized (naive Vec) | — | 274× | slower |
| 1 | Remove HashMap index | 8.8× | 31× | slower |
| 2 | Chunked weighted list | 77× | 4.5× | slower |
| 3 | Span coalescing | 1.9× | 2.4× | slower |
| 4 | Combined origin/insert lookup | 1.2× | 2.0× | slower |
| 5 | Compact Span (112 to 24 bytes) | 1.3× | 1.5× | slower |
| 6 | Fenwick tree for chunk weights | 1.6× | 1.5× | slower |
| 7 | Binary search over chunks | 0.9× | — | reverted |
| 8 | Hybrid Fenwick/linear scan | — | 1.5× | slower |
| 9 | Cursor caching | 1.2× | 1.3× | slower |
| 10 | Skip list for spans | 0.0005× | — | reverted |
| 11 | RgaBuf (buffered writes) | 1.30× | 1.0× | parity |
| 12 | Backspace optimization | 1.10× | 1.0× | parity |
| 13 | SmallVec for pending content | 1.05× | 1.0× | parity |
| 14 | Inline hints + debug_assert | 1.05× | 1.0× | parity |
| 15 | Chunk location caching | 1.05× | 1.0× | parity |
| 16 | B-Tree for spans | 1.30× | 1.0× | parity |
| 17 | Smaller B-tree leaves | 0.9× | — | reverted |
| 18 | Larger B-tree leaves | 0.95× | — | reverted |
| 19 | Simplified cursor cache | 0.95× | — | reverted |
| 20 | Delete buffering | 2.8× | 0.56× | faster |
| 21 | FxHashMap for UserTable | 1.1× | 0.50× | faster |
| 22 | Fix YATA/FugueMax bugs | 0.15× | 6.7× | slower |
| 23 | ID lookup index for merge | 1.0× | 6.7× | slower |
| 24 | Origin position hint | 6.0× | 1.1× | slower |
| 25 | Fast path for YATA scan | 1.05× | 1.1× | slower |
| 26 | SmallVec for subtree tracking | 1.02× | 1.1× | slower |
| 27 | LTO + single codegen unit | 1.03× | 1.0× | parity |
| 28 | Origin index for O(k) sibling lookup | 1.25× | 0.79× | faster |
| 29 | Stable origin IDs (user_idx, seq) | 1.05× | 0.79× | faster |
| 30 | Right origin tracking for merge | 1.02× | 0.79× | faster |
Note: the other benchmarks would take a long time to run so I asked the chisel to run against only sveltecomponent until the other benchmarks were fast enough to run.
Is that not crazy!? Over the course of a night, the chisel implemented 30 optimizations and sped the code up by more than 300×, while preserving all interfaces and maintaining correct semantics.
Benchmarking the final implementation against diamond-types:
| Trace | Diamond (ms) | Together (ms) | Speedup | Diamond (ns/op) | Together (ns/op) |
|---|---|---|---|---|---|
| sveltecomponent | 1.48 | 1.17 | 1.26× | 75 | 59 |
| rustcode | 3.64 | 2.96 | 1.23× | 91 | 74 |
| seph-blog1 | 8.57 | 4.87 | 1.76× | 62 | 35 |
| automerge-paper | 14.33 | 4.41 | 3.25× | 55 | 17 |
Which is impressive: Together now beats diamond-types on all four traces.
Now, to be completely clear:
I take no credit for any of this. A lot of these optimizations were pioneered by josephg and other friends of mine who work on CRDTs. On a greenfield problem, there would not have been a well-worn path to follow. The chisel launders good ideas, which is as incredible as it is terrifying.
After reading through the code, I can see that Together is faster due to a couple of unique design decisions. First, I specifically designed Together to use a much smaller Span type (24 bytes, versus 40 in diamond-types), so more spans fit in a cache line. Together also buffers inserts and deletes, since most edits in the test dataset are local, contiguous insertions and deletions made while typing. These two optimizations allow Together to light up a core and rip through sequences of merge operations.
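To make the cache-line point concrete, here is one way a 24-byte span could be laid out. This is a hypothetical layout of my own, not Together's actual struct: IDs are packed as (user index, sequence number) pairs and span content lives in a shared side buffer, so the span itself stays small.

```rust
// Hypothetical 24-byte span layout (NOT Together's actual definition).
#[repr(C)]
#[allow(dead_code)]
struct Span {
    origin_seq: u32,  // seq of the left-origin element
    start_seq: u32,   // first seq in this contiguous run
    len: u32,         // number of characters in the run
    content_off: u32, // offset into a shared content buffer
    user_idx: u16,    // author index into a user table
    origin_user: u16, // author index of the left origin
    flags: u32,       // deleted bit, etc.
}

fn main() {
    // 4 + 4 + 4 + 4 + 2 + 2 + 4 = 24 bytes, align 4.
    assert_eq!(std::mem::size_of::<Span>(), 24);
    // A 64-byte cache line fits two such spans, versus one 40-byte span.
    println!("span size: {} bytes", std::mem::size_of::<Span>());
}
```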
It’s entirely possible that on real-world data, or in other applications, Together is slower. Diamond-types is an incredible, more general library; Together, while extensively property-tested, is not as general. Optimizations have tradeoffs, and the decision to buffer edits, at a design level, may be at odds with the goals of diamond-types, for example.
Here is what is profoundly wrong: I have no right to have used the chisel to get this far in one night. This experience was a strange and vertigo-inducing way to write software: I wrote out the high-level design, I wrote the types and the API, I brainstormed all the little tricks a real-world implementation might use. Then I went to bed.
I guess my specification was good enough to will a competitive implementation into existence? I’m scared to see how far I could push this; after all, there’s still a long list of optimization ideas left.
We’ll see.
Update: I benchmarked against a handful of other major CRDT libraries (mostly Rust, with one JS library):
| Library | sveltecomponent (ms) | rustcode (ms) | seph-blog1 (ms) | automerge-paper (ms) |
|---|---|---|---|---|
| together | 1.17 | 2.96 | 4.87 | 4.41 |
| diamond-types | 1.48 | 3.64 | 8.57 | 14.33 |
| cola-crdt | 2.48 | 21.47 | 38.96 | 142.98 |
| json-joy (JS) | 7.64 | 25.13 | 53.49 | 99.19 |
| loro | 15.16 | 36.16 | 77.20 | 144.84 |
| automerge | 165.20 | 1180.60 | 431.83 | 303.34 |
| yrs | 359.38 | 1182.30 | 5563.46 | 6520.55 |
To be fair, most of these libraries are general-purpose and support many different CRDT structures, not just RGAs. But I think with enough layers of RGAs you can basically represent anything, so ymmv.
Disclaimer: microbenchmarks are hard to get right, I’m running on a 2023 M3 Pro, there are lots of other metrics I didn’t measure, etc. Prescription: take with cubes of salt.
I also stubbed out and benchmarked the Zed text crate, just out of curiosity, but it was so slow I’m sure I must have done something incorrectly. It didn’t seem fair to include it as a comparison.
The chisel works while I blog
While I was writing this up I wanted to put the chisel to work. I designed a nice high-level API for reading slices of a CRDT document into Strings and for keeping track of anchors in a document. This means that if, say, your cursor is highlighting something in a document, and someone else edits inside of the highlight, the highlight expands as the anchors move with the text. For example, using ] and [ to represent highlight anchors:
```
The cat sat on ]the mat[.
The cat sat on ]the fluffy mat[.
                    ++++++
```
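The trick that makes this work is that an anchor names an element by its stable ID rather than by its index, so concurrent edits shift its position but never invalidate it. A toy sketch over a flat buffer (the names and API here are illustrative, not Together's actual interface):

```rust
// Toy model: a document is a list of (stable_id, char) pairs.
// An anchor stores a stable ID; inserting text changes the ID's
// *index* in the buffer, but never what it points at.
struct Doc {
    elems: Vec<(u64, char)>,
    next_id: u64,
}

impl Doc {
    fn new(text: &str) -> Self {
        let elems: Vec<_> = text.chars().enumerate().map(|(i, c)| (i as u64, c)).collect();
        let next_id = elems.len() as u64;
        Doc { elems, next_id }
    }
    // Insert `text` at the current index `at`.
    fn insert(&mut self, at: usize, text: &str) {
        for (k, c) in text.chars().enumerate() {
            self.elems.insert(at + k, (self.next_id, c));
            self.next_id += 1;
        }
    }
    // Resolve a stable ID back to its current index.
    fn index_of(&self, id: u64) -> usize {
        self.elems.iter().position(|&(eid, _)| eid == id).unwrap()
    }
}

fn main() {
    let mut doc = Doc::new("The cat sat on the mat.");
    // Anchor the highlight at the 't' of "the" and the 't' of "mat".
    let start_id = doc.elems[15].0;
    let end_id = doc.elems[21].0;

    // Someone else inserts "fluffy " inside the highlight.
    doc.insert(19, "fluffy ");

    // The anchors still name the same characters; the highlight grew.
    let (s, e) = (doc.index_of(start_id), doc.index_of(end_id));
    let text: String = doc.elems[s..=e].iter().map(|&(_, c)| c).collect();
    assert_eq!(text, "the fluffy mat");
    println!("highlight now covers: {:?}", text);
}
```

A real CRDT uses (user, sequence) pairs as IDs and resolves them in better than O(n), but the invariant is the same.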
I also want to be able to rewind to any prior version in the document. So I brainstormed three ways to do this, to summarize:
- Add logical timestamps and filter newer edits while traversing the CRDT.
- Use a persistent B-Tree so old versions still exist and can be read/forked.
- Store periodic checkpoints to avoid having to apply all edits from the beginning when loading an old version.
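The first approach is the simplest to sketch: tag every element with a logical timestamp, and read a version by filtering out anything newer while traversing. A toy illustration under that assumption (again, my own names, not Together's actual types):

```rust
// Toy versioned sequence: each element records the logical time it was
// inserted (and optionally deleted). Reading "as of" time t filters
// while traversing; nothing is ever rewritten, so old versions remain.
struct Elem {
    ch: char,
    inserted_at: u64,
    deleted_at: Option<u64>,
}

struct VersionedDoc {
    elems: Vec<Elem>,
    clock: u64,
}

impl VersionedDoc {
    fn new() -> Self {
        VersionedDoc { elems: Vec::new(), clock: 0 }
    }
    fn insert(&mut self, at: usize, text: &str) {
        self.clock += 1;
        let t = self.clock;
        for (k, ch) in text.chars().enumerate() {
            self.elems.insert(at + k, Elem { ch, inserted_at: t, deleted_at: None });
        }
    }
    // Render the document as it existed at logical time `t`.
    fn read_at(&self, t: u64) -> String {
        self.elems
            .iter()
            .filter(|e| e.inserted_at <= t && e.deleted_at.map_or(true, |d| d > t))
            .map(|e| e.ch)
            .collect()
    }
}

fn main() {
    let mut doc = VersionedDoc::new();
    doc.insert(0, "mat");  // time 1
    doc.insert(0, "the "); // time 2
    assert_eq!(doc.read_at(1), "mat");
    assert_eq!(doc.read_at(2), "the mat");
    println!("v1: {:?}, v2: {:?}", doc.read_at(1), doc.read_at(2));
}
```

The cost is that every read pays for the filter, which is what the checkpointing idea in point 3 is meant to amortize.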
I wrote up a spec and some tests, and I asked the chisel to get to work: implement each approach in a new branch, then benchmark at the end to pick the best. And the chisel just… did the work.
On the future of software development.
I have two unpublished essays, each about 10k words long. The first is called “Gradually Casting Systems in Stone”. It is about the spectrum from sketching ideas in design documents to formally verifying core systems. We need tools that help us gradually cast systems in stone, tools that move ideas from abstract to concrete, concrete to correct, correct to fast. A codebase can be seen as a living system, where calcified components that have been cast in stone are strung together by a growing layer of new ideas that percolate and harden themselves. I genuinely believe that as tools improve I have enough knowledge of the systems that underpin the web to not only rewrite it from scratch, but to write a better one, that avoids many design mistakes of the original, and is solid, secure, and cast in stone.
The second essay is called “Programming Beyond Text”. It outlines a vision for what programming will become over the next few years. It was written before language models, so, perhaps surprisingly, a lot of it is about design from a pure UX perspective, and not about how to cram the round peg of the technology we have into the square hole of what is needed. I will write more about it shortly, and perhaps even release it, but the idea is to make feedback loops tighter, capture the magic of Smalltalk-like image systems while keeping them legible, and show how edits to code change data and vice versa.
Say you’re writing a react component in your code editor of choice. Imagine there was an interactive, in-line preview of the component right below the code for it. Inline with the code itself would be the current state of each variable, just like how inferred types are annotated. You would be able to edit the code or the state, and watch the inline preview update instantly. Imagine this preview were bidirectional: you could modify the component, and the code would update. You could pull up a Figma-like design window, and use that to tweak the colors, shapes, and sizes of things.
With a chisel, an editor designed for programming beyond text would become infinitely malleable. I feel like this idea is inevitable and it will exist at some point. While at Zed, I tried to design the extension system for the editor with this in mind. I hope that one day the editor exposes its UI primitives in a capability-safe way, so little composable tools can be used to turn Zed from a text editor into a general-purpose application platform for development work. Imagine generating tools on demand to visualize and edit code in formats other than text. Using chisels to generate tools that modify code is different from vibecoding. It’s the difference between “zero-control creative-process-in-a-box text-to-image generation” and “photoshop with great tools that feel like magic”: one enables creative control, the other takes it from you. Whether tools are written by hand or composed on the fly, these lenses are concrete reifications that deeply integrate with the language runtime and the editor. Wouldn’t that be nice?
I’m excited. Today, most new applications are “shoebox” applications, or siloed, because system-level primitives that encourage users to manage their own data do not yet exist. (Does anybody use an OS with practical filesystem branching and versioning?) If you want to break that trend and write something like Aperture, you must expend a lot of work and enforce a holistic engineering discipline to build one of these “true professional applications”. I would love for this to be easier. The tools will get better. I can’t wait to program beyond text and work to gradually carve beautiful systems in stone.