Supporting Exabyte Sparse Files in DLFS

February 20, 2026 · 8 min read

Hacker, Convex Foundation

AI Assistant, Anthropic

DLFS — the Decentralised Lattice File System — now supports sparse files up to 9.2 exabytes (Long.MAX_VALUE bytes). Creating one is instant. Writing a single byte into the middle takes microseconds. Here's how we made that work with a handful of elegant tricks in Convex's immutable data layer.

All data in Convex is immutable. Once created, a value never changes — you produce a new version instead. This is fundamental to how the lattice works: immutability enables content-addressing (every value has a unique cryptographic hash), safe concurrent access, and automatic deduplication across the network.

But immutability has a cost. If updating a single byte in a large structure means copying the entire thing, performance collapses. The classic solution is structural sharing: when you create a new version of a tree, you reuse all the subtrees that didn't change. Only the path from the root to the modified leaf is rebuilt — everything else is shared between old and new versions, by reference.

This is how Convex vectors, maps, and sets already work. A million-element vector can be "modified" by rebuilding just a handful of tree nodes while the other thousands of nodes are shared with the previous version. Each node is wrapped in a Ref (reference) that caches its content hash lazily, so identical structures are always recognised as equal regardless of how they were constructed.

The question was: could we push structural sharing to its absolute limit — making it work for sparse data where the vast majority of the structure is identical empty space?

The Challenge

A sparse file is mostly empty — zeroes that don't actually consume storage. Unix filesystems have supported them for decades; you can truncate a file to a terabyte and it takes almost no space until you write real data into it.

We wanted the same capability in DLFS, Convex's decentralised file system. DLFS stores file contents as immutable BlobTree structures — persistent 16-way trees where every node is content-addressed by its cryptographic hash. A naive implementation of a 9.2 EB zero-filled blob would require building a tree with over two quintillion leaf nodes. That's clearly not going to work.

But with structural sharing, identical subtrees can be the same object. And in a zero-filled file, every subtree at every level is identical to its siblings. The entire exabyte file can collapse down to one shared node per tree level.

Making this work required four interlocking techniques.

BlobTree is a 16-way radix tree. Each internal node has up to 16 children. Leaf nodes are flat Blob objects of exactly 4096 bytes (the chunk size). Each level of the tree multiplies capacity by 16, so the tree depth grows as log₁₆ of the file size.

The key insight for sparse files: every chunk in a zero-filled region is identical. And since Convex data structures are immutable, identical values can be shared.

We start with a singleton:

public static final Blob EMPTY_CHUNK = Blob.wrap(new byte[CHUNK_LENGTH]);

From this single 4096-byte allocation, Blobs.createZero() builds an arbitrarily large zero blob through recursive structural sharing:

public static ABlob createZero(long length) {
    if (length <= Blob.CHUNK_LENGTH) {
        return Blob.EMPTY_CHUNK.slice(0, length);
    }
    long csize = BlobTree.childSize(length);
    int n = BlobTree.childCount(length);
    ABlob fullChild = createZero(csize);  // shared across all full children
    Ref<ABlob> fullRef = fullChild.getRef();

    @SuppressWarnings("unchecked")
    Ref<ABlob>[] children = new Ref[n];
    for (int i = 0; i < n - 1; i++) {
        children[i] = fullRef;            // same Ref object
    }
    // last child may be smaller
    long lastSize = length - csize * (n - 1);
    children[n - 1] = createZero(lastSize).getRef();
    return BlobTree.create(children, length);
}

Every full child at every level shares a single Java object. A 9.2 EB zero blob (2⁶³ − 1 bytes) has a tree depth of 13 levels — and contains only ~26 unique node objects in memory. Creation is instant.

2. Tree-Aware replaceSlice

Creating a sparse file is only half the problem. You need to be able to write into it efficiently. In DLFS, DLFileChannel.write() ultimately calls replaceSlice(position, data) on the file's BlobTree.

The default implementation in ABlob slices the blob into head and tail segments around the write position and rejoins them with two appends:

// Default: splits and rebuilds — O(n) for large blobs
Blob h = slice(0, position);
ABlob r = h.append(b).append(slice(position + blen, count));

For a 1 GB file, even a single-byte write in the middle rebuilds the entire tree spine twice. For an exabyte file, it's completely hopeless.

Our tree-aware override navigates directly to the affected children:

@Override
public ABlob replaceSlice(long position, ABlob b) {
    long csize = childLength();
    int firstChild = (int)(position / csize);
    int lastChild = (int)((end - 1) / csize);

    Ref<ABlob>[] newChildren = null;  // lazy clone

    for (int i = firstChild; i <= lastChild; i++) {
        ABlob oldChild = getChild(i);
        // ... compute slice of b for this child ...
        ABlob newChild = oldChild.replaceSlice(replaceStart, piece);
        if (newChild != oldChild) {
            if (newChildren == null) newChildren = children.clone();
            newChildren[i] = newChild.getRef();
        }
    }

    if (newChildren == null) return this;  // nothing changed
    return new BlobTree(newChildren, shift, count);
}

This has several nice properties:

O(log₁₆ n) — descends only through the affected branch of the tree
Identity preservation — unchanged children keep their existing Ref, so their subtrees aren't touched at all
Lazy clone — the children array is only copied if something actually changed
Recursive — a write spanning a chunk boundary naturally splits across two children, each recursing independently

Writing a single byte into a 9.2 EB sparse file touches exactly 13 nodes (one per tree level) plus one leaf chunk. Everything else is preserved.

3. Hash-Based Equality for Shared Structures

There's a subtle trap with structural sharing. When two independently created sparse blobs share the same logical content but aren't the same Java object (e.g. two calls to Blobs.createZero(Long.MAX_VALUE)), comparing them for equality must not devolve into visiting every node.

Convex values are wrapped in Ref objects that cache their content hash lazily. When two RefDirect instances hold different Java objects, the natural approach is to recurse into value.equals(va). But for a BlobTree with fanout 16 and depth 13, that fans out to 16¹³ ≈ 10¹⁵ recursive calls — even though there are only ~26 unique nodes.

The fix is to fall back to hash comparison when object identity doesn't match:

// RefDirect.equals — the critical path
if (value == va) return true;           // identity: instant
if (this.hash != null && a.hash != null)
    return this.hash.equals(a.hash);    // cached hashes: instant
return getHash().equals(a.getHash());   // force lazy computation

Because every shared node computes its hash exactly once and caches it, the total work for comparing two independently created exabyte zero blobs is proportional to the number of unique nodes — about 26 hash computations, not 10¹⁵ recursive comparisons.

This is a general improvement: any Convex data structure with structural sharing benefits from the same O(unique nodes) equality check.

4. Overflow Protection

When you can represent 9.2 EB blobs, you're operating at the edge of 64-bit arithmetic. Appending two large blobs can silently overflow Long.MAX_VALUE, wrapping to a negative count and producing a corrupt data structure.

We added explicit overflow guards:

if (count > Long.MAX_VALUE - dlen) {
    throw new IllegalArgumentException("Blob append would exceed maximum size");
}

This check is O(1), costs nothing in the normal case, and prevents a class of subtle corruption bugs that would be extremely difficult to diagnose.

Putting It Together

The result is a file system primitive with remarkable properties:

Operation	Complexity	Example (9.2 EB file)
Create sparse file	O(log₁₆ n)	~26 node allocations
Write single byte	O(log₁₆ n)	~13 tree nodes + 1 chunk
Read any byte	O(log₁₆ n)	~13 tree traversals
Compare two sparse files	O(unique nodes)	~26 hash computations
Memory for empty file	O(log₁₆ n)	~26 objects total

All of this composes with the rest of the Convex lattice stack. Every node is content-addressed, so identical subtrees are automatically deduplicated in storage. Writes produce new tree spines that share structure with the old version — giving you full version history for free. And because BlobTree is just another ACell, it participates in Convex's lattice merge semantics, consensus protocol, and persistence layer without special cases.

The entire implementation is under 100 lines of new code across three files. Sometimes the most powerful abstractions come from taking an existing design seriously — and then simply not allocating memory for things that don't need to exist.

The sparse file support is available in Convex 0.8.x. The full source is at github.com/Convex-Dev/convex.

Immutable Data with Structural Sharing​

The Challenge​

1. Structural Sharing via a Single Empty Chunk​

2. Tree-Aware replaceSlice​

3. Hash-Based Equality for Shared Structures​

4. Overflow Protection​

Putting It Together​

Immutable Data with Structural Sharing

The Challenge

1. Structural Sharing via a Single Empty Chunk

2. Tree-Aware replaceSlice

3. Hash-Based Equality for Shared Structures

4. Overflow Protection

Putting It Together