74 Commits

Author SHA1 Message Date
Kim Altintop
7cd72b8adf
commitlog: Resumption of sealed commitlog (#4650)
The commitlog so far assumed that the latest segment is never compressed
and can be opened for writing (if it is intact).

However, restoring the entire commitlog from cold storage results in all
segments being compressed. Make it so the resumption logic reads the
metadata from the potentially compressed last segment, and starts a new
segment for writing if the latest one was indeed compressed.
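The resumption decision described above could be sketched roughly as follows. `LastSegment`, `Resume`, and `resume` are hypothetical names chosen for illustration, not the commitlog crate's API, and starting an empty log at offset 0 is an assumption.

```rust
/// Hypothetical summary of the last segment found on disk.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum LastSegment {
    /// Uncompressed and intact; can be reopened for writing.
    Writable { min_tx_offset: u64 },
    /// Compressed (sealed), e.g. after a restore from cold storage.
    /// `next_tx_offset` is read from the segment's metadata.
    Compressed { next_tx_offset: u64 },
}

/// Hypothetical outcome of the resumption logic.
#[derive(Debug, PartialEq, Eq)]
enum Resume {
    /// Keep appending to the existing segment.
    Existing { segment_offset: u64 },
    /// Start a fresh segment at the given offset.
    New { segment_offset: u64 },
}

fn resume(last: Option<LastSegment>) -> Resume {
    match last {
        // Assumption: an empty log starts its first segment at offset 0.
        None => Resume::New { segment_offset: 0 },
        Some(LastSegment::Writable { min_tx_offset }) => Resume::Existing {
            segment_offset: min_tx_offset,
        },
        // A compressed segment is never written to again; continue after it.
        Some(LastSegment::Compressed { next_tx_offset }) => Resume::New {
            segment_offset: next_tx_offset,
        },
    }
}
```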

# Expected complexity level and risk

1.5

# Testing

Added a test.
2026-03-17 12:02:17 +00:00
Kim Altintop
17cc15ef4c
Append commit instead of individual transactions to commitlog (#4404)
Re-open #4140 (reverted in #4292).


The original patch was merged a bit too eagerly.
It should go in _after_ 2.0 is released with some confidence.
2026-03-03 15:08:05 +00:00
Noa
e3582131fe
Migrate to Rust 2024 (#3802)
# Description of Changes

It'd be best to review this commit by commit, using
[difftastic](https://difftastic.wilfred.me.uk) to tell when changes are
syntactically minor even though a line-based diff doesn't show that.

# Expected complexity level and risk

3 - edition2024 does bring changes to drop order, which could cause
issues with locks, but I looked through [all of the warnings that
weren't fixed
automatically](ba80f3fecd/warnings.html)
and couldn't find any issues.

# Testing

n/a; internal code change
2026-03-03 11:06:52 +00:00
Shubham Mishra
ab65b60fe4
fix index truncate edge cases (#4501)
# Description of Changes
The index offset `truncate` method returned `IndexError::KeyNotFound`
when asked to truncate an empty offset index file, or when the input key
was smaller than the first entry.

This caused the `commitlog::reset_to` method to return an error, leaving
replicas stuck in a respawn loop.

# API and ABI breaking changes
NA

# Expected complexity level and risk
1

# Testing
Added new tests to cover edge scenarios.
2026-03-02 07:31:14 +00:00
Kim Altintop
8aa22da034
commitlog: Improve committed_meta (#4338)
- Extends `commit::Metadata` to include the checksum
- Extends `segment::Metadata` to include `Some(commit::Metadata)`
  containing the last commit in the segment (if there is one)
- Changes `committed_meta` to:
  - ignore empty segments at the end of the log
  - try harder to provide useful metadata, even if only a prefix of the
    latest segment is readable

This allows eliminating the remaining `Commitlog::open` calls made for
the purpose of querying the latest commit (offset). `Commitlog::open`
creates an empty segment if the tail of the log is corrupt, which is a
non-obvious side-effect that can be confusing when debugging.

It also allows eliminating uses where the `commits_from` iterator is
used to find the latest full commit. The `Commits` iterator requires the
caller to handle the case of a corrupted commit at the end of the log,
by advancing the iterator once more after it has yielded an error in
order to check that it is exhausted, and then deciding whether to ignore
the error. This is easy to forget.

`committed_meta` now just does the right thing, preserving information
about tail corruption for when that's useful.
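
For context, the easy-to-forget iterator handling mentioned above looks roughly like the sketch below. It is written against a generic iterator of `Result`s rather than the actual `Commits` type, `last_intact` is a hypothetical helper name, and ignoring a trailing error is just one possible policy.

```rust
/// Return the last item that was read successfully. A single error is
/// tolerated only if it is the very last thing the iterator yields.
fn last_intact<T, E>(
    mut commits: impl Iterator<Item = Result<T, E>>,
) -> Result<Option<T>, E> {
    let mut last = None;
    while let Some(item) = commits.next() {
        match item {
            Ok(commit) => last = Some(commit),
            Err(e) => {
                // Advance once more: if the iterator is exhausted, the
                // error was at the tail of the log and is ignored here.
                return match commits.next() {
                    None => Ok(last),
                    Some(_) => Err(e),
                };
            }
        }
    }
    Ok(last)
}
```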
2026-02-23 14:16:59 +00:00
clockwork-labs-bot
bad5335114
Revert "Append commit instead of individual transactions to commitlog (#4140)" (#4292)
Reverts #4140 per @kim's request — was not ready to merge yet.

Co-authored-by: clockwork-labs-bot <clockwork-labs-bot@users.noreply.github.com>
2026-02-13 10:23:24 -05:00
Kim Altintop
c4c3bf78b3
Append commit instead of individual transactions to commitlog (#4140)
Changes the commitlog (and durability) write API, such that the caller
decides how many transactions are in a single commit, and has to supply
the transaction offsets.

This simplifies commitlog-side buffering logic to essentially a
`BufWriter` (which, of course, we must not forget to flush). This will
help throughput, but offers less opportunity to retry failed writes.
This is probably a good thing, as disks can fail in erratic ways, and we
should rather crash and re-verify the commitlog (suffix) than continue
writing.

To that end, this patch liberally raises panics when there is a chance
that internal state could be "poisoned" by partial writes, which may be
debatable.


# Motivation

The main motivation is to avoid maintaining the transaction offset in
two places in such a way that they could diverge. As ordering commits is
the responsibility of the datastore, we make it authoritative on this
matter -- the commitlog will still check that offsets are contiguous,
and refuse to commit if that's not the case.
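
A minimal sketch of such a contiguity check, assuming the caller passes the first transaction offset and the number of transactions in the commit. `Log`, `OutOfOrder`, and `commit` are illustrative names, not the actual write API.

```rust
/// Hypothetical error returned when the supplied offset leaves a gap
/// or overlaps with what was already committed.
#[derive(Debug, PartialEq, Eq)]
struct OutOfOrder {
    expected: u64,
    actual: u64,
}

/// Hypothetical stand-in for the commitlog's write side.
struct Log {
    next_tx_offset: u64,
}

impl Log {
    /// The caller (the datastore) decides which offsets go into a commit;
    /// the log only verifies that they continue where it left off.
    fn commit(&mut self, min_tx_offset: u64, n_txs: u64) -> Result<(), OutOfOrder> {
        if min_tx_offset != self.next_tx_offset {
            return Err(OutOfOrder {
                expected: self.next_tx_offset,
                actual: min_tx_offset,
            });
        }
        self.next_tx_offset += n_txs;
        Ok(())
    }
}
```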

A secondary, related motivation is the following:

A "commit" is an atomic unit of storage, meaning that a torn (partial)
write of a commit will render the entire commit corrupt. There hasn't
been a compelling case where we would want this, and have always
configured the server to write exactly one transaction per commit.
The code to handle buffering of transactions is, however, rather
complex, as it tries hard to allow the caller to retry writes at commit
boundaries. An unfortunate consequence of this is that we'd flush to the
OS very often, leaving throughput performance on the table.

So, if there is a compelling case for batching multiple transactions in
a commit, it should be the datastore's responsibility.


# API and ABI breaking changes

Breaks internal APIs only.

# Expected complexity level and risk

5 - Mostly for the risk

# Testing

Existing tests.
2026-02-13 13:10:30 +00:00
Tyler Cloutier
2ec07a3f70
Standardize query builder syntax across Rust, TypeScript, and C# (Server/Client) (#4261)
# Description of Changes

Standardizes the query builder API across all three language SDKs (Rust,
TypeScript, C#) for consistency.

**Rust:**
- Rename `Query` struct to `RawQuery`, make `Query` a trait with `fn
into_sql(self) -> String`
- All builder types (`Table`, `FromWhere`, `LeftSemiJoin`,
`RightSemiJoin`) implement `Query<T>` trait
- Views can return `-> impl Query<T>` instead of specifying exact
builder types
- The `#[view]` macro auto-detects `impl Query<T>` and rewrites to
`RawQuery<T>`
- Add `Not` variant to `BoolExpr` with `.not()` method
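
The shape of the Rust change might look like the following sketch. Only the `Query`/`RawQuery` names and the `into_sql` signature come from the description above; the struct fields, the generated SQL string, and the `Person`/`all_people` names are assumptions made for illustration, and the `#[view]` macro rewrite is not shown.

```rust
use std::marker::PhantomData;

/// Stand-in for the trait described above; the real definition may differ.
trait Query<T> {
    fn into_sql(self) -> String;
}

/// Stand-in for the renamed struct (fields are assumptions).
struct RawQuery<T> {
    sql: String,
    _row: PhantomData<T>,
}

impl<T> Query<T> for RawQuery<T> {
    fn into_sql(self) -> String {
        self.sql
    }
}

/// Simplified builder type; the real `Table` carries more information.
struct Table<T> {
    name: &'static str,
    _row: PhantomData<T>,
}

impl<T> Query<T> for Table<T> {
    fn into_sql(self) -> String {
        // Assumed SQL shape, for illustration only.
        format!("SELECT * FROM {}", self.name)
    }
}

/// Hypothetical row type.
struct Person;

/// A view-like function can return `impl Query<T>` without naming the
/// exact builder type.
fn all_people() -> impl Query<Person> {
    Table::<Person> {
        name: "person",
        _row: PhantomData,
    }
}
```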

**TypeScript:**
- Add `ne()` to `ColumnExpression`
- Refactor `BooleanExpr` to `BoolExpr` class with chainable `.and()`,
`.or()`, `.not()` methods
- Make builders valid queries directly (`.build()` deprecated but still
works)
- Deprecate `from()` wrapper — use `tables.person.where(...)` directly
- Merge `query` export into `tables` so table refs are also query
builders
- Add subscription callback form: `subscribe(ctx =>
ctx.from.person.where(...))`
- Unify `useTable` with query builder syntax; deprecate `filter.ts`

**C#:**
- Add `Not()` method to `BoolExpr<TRow>`
- Add `IQuery<TRow>` interface implemented by all builder types
(`Table`, `FromWhere`, `LeftSemiJoin`, `RightSemiJoin`, `Query`)
- Add `ToSql()` to all builder types so `.Build()` is no longer required
- Update `AddQuery` to accept `IQuery<TRow>` instead of `Query<TRow>`

# API and ABI breaking changes

- Rust: `Query<T>` is now a trait (was a struct). The struct is renamed
to `RawQuery<T>`. This is a breaking change for any code that used
`Query<T>` as a type directly.
- TypeScript: `BooleanExpr` is now a `BoolExpr` class (was a
discriminated union type). The `query` export is deprecated in favor of
`tables`.
- C#: `AddQuery` now accepts `Func<QueryBuilder, IQuery<TRow>>` instead
of `Func<QueryBuilder, Query<TRow>>`. Existing `.Build()` calls still
work since `Query<TRow>` implements `IQuery<TRow>`.

# Expected complexity level and risk

3 — Changes touch multiple language SDKs and codegen, but each
individual change is straightforward. The Rust macro rewrite for `impl
Query<T>` detection is the most complex piece. All existing
`.build()`/`.Build()` calls continue to work.

# Testing

- [x] `cargo test -p spacetimedb-query-builder` — 16/16 tests pass
- [x] `cargo check -p spacetimedb` — clean, no warnings
- [x] `cargo check` on views-query, views-sql, views-basic,
views-trapped smoketest modules — all clean
- [x] `cargo test -p spacetimedb-codegen codegen_csharp` — snapshot
updated, passes
- [x] `npm test` (TypeScript) — 101/101 tests pass
- [x] C# QueryBuilder tests — new tests for `Not()`, `IQuery<T>`
interface
- [ ] CI passes
2026-02-13 04:22:49 +00:00
Phoebe Goldman
7136c37ed3
Expose a couple things to enable some work in another repo (#3986)
# Description of Changes

Just adds public accessors for some existing internal functionality, to
enable a change in another repo. Review starting from there.

# API and ABI breaking changes

These crates are marked unstable.

# Expected complexity level and risk

0.5

# Testing

See other PR.

---------

Co-authored-by: Kim Altintop <kim@eagain.io>
2026-01-15 18:44:12 +00:00
Kim Altintop
fc784fd233
commitlog: Change default max-records-in-commit to 1 (#3681)
It is almost always wrong / undesirable to pack more than one
transaction in a commit, so adjust the default accordingly.

This also avoids surprises when using `#[serde(default)]` with nested
structs -- serde evaluates the default depth-first, so overriding a
single field in a nested struct will not consider any
`#[serde(default = "custom_default")]` annotations on the parent.
2025-11-18 19:44:02 +00:00
Kim Altintop
cfd0d4b712
commitlog,durability: Support preallocation of disk space (#3437)
When a new commitlog segment is created, allocate disk space for it up
to the maximum segment size. Also do this when resuming writes to an
existing segment, such that segments created without preallocation will
allocate as well when the database is opened.

Preallocation is gated behind the feature "fallocate", because it is not
always desirable to preallocate, e.g. for local `standalone` users.

The feature can only be enabled on Linux targets, because allocation is
done using the Linux-specific `fallocate(2)` system call.

Unlike `ftruncate(2)` or the portable `posix_fallocate(3)`,
`fallocate(2)` supports allocating disk space without zeroing. This is
currently required, because the commitlog format does not handle padding
bytes.

If not enough space can be allocated, the commitlog refuses writes. For
commitlogs that were created without preallocation, this means that the
commitlog cannot even be opened in this situation.

The local durability impl will crash if it detects that the commitlog is
unable to allocate enough space.

This means that a database will eventually crash and be unable to start
in an out-of-space situation.

Allocated space is not included in the reported size of the commitlog.
Instead, allocated blocks are reported separately.


# Expected complexity level and risk

3 - Disk size monitoring may need to be adjusted.

# Testing

- [x] Adds a test that demonstrates the crash behavior of
[`spacetimedb_durability::Local`] when there is insufficient space. The
test performs I/O against a loop device.
- [x] Modified the `repo::Memory` impl so that it can run out of space.
No test currently utilizes this, but existing tests assuming infinite
space still pass.
2025-11-10 16:55:55 +00:00
Kim Altintop
798852e466
commitlog: Improve error context (#3506)
The commitlog creates new segments atomically, returning EEXIST if the
segment already exists. This is to break a retry loop in case the
filesystem becomes unwritable.

This error did not contain any context about which segment already
exists, so this patch adds some.
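
As a rough illustration of atomic creation with added error context, here is a sketch using only the standard library. `create_segment` and the error message format are hypothetical and not taken from the patch.

```rust
use std::fs::{File, OpenOptions};
use std::io;
use std::path::Path;

/// Create a new segment file, failing if it already exists.
/// `create_new(true)` makes the existence check and the creation atomic.
fn create_segment(path: &Path) -> io::Result<File> {
    OpenOptions::new()
        .write(true)
        .create_new(true)
        .open(path)
        .map_err(|e| {
            if e.kind() == io::ErrorKind::AlreadyExists {
                // Keep the error kind, but say which segment is affected.
                io::Error::new(
                    e.kind(),
                    format!("segment {} already exists", path.display()),
                )
            } else {
                e
            }
        })
}
```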

Also, an unhandled edge case has been discovered:

When opening an existing log, the commitlog will try to resume the last
segment for writing. If it finds a corrupt commit in that segment, it
won't resume, but instead create a new segment at the corrupt commit's
offset + 1.

However, if the first commit in the last segment is corrupted, the
offset will be that of the last segment -- trying to start a new segment
will thus fail with EEXIST.

Without additional recovery mechanisms, it is not obvious what to do in
this case: the segment could contain valid data after the initial
commit, so we certainly don't want to throw it away.

Instead, we now detect this case and return `InvalidData` with some
context.

# Expected complexity level and risk

1

# Testing

- [ ] A (regression) test is included
2025-10-29 10:59:35 +00:00
Phoebe Goldman
e77b62f475
Also capture a snapshot every new commitlog segment (#3405)
# Description of Changes

We've run into a problem on Maincloud caused by a database that was
writing a relatively small number of very large transactions. This was
accruing many commitlog segments consuming hundreds of gigabytes of
disk, but had not ever taken a snapshot, or compressed or archived any
data, as the database had not progressed past one million transactions.

With this PR, we take a snapshot every time the commitlog segment
rotates. We still also snapshot every million transactions.

One BitCraft database we looked at had 2.5 million transactions per
commitlog segment, meaning that this change will not meaningfully affect
the frequency of snapshots. The offending Maincloud database, however,
had only 50 transactions per segment!
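
The combined trigger could be expressed as in the sketch below. `should_snapshot` is a hypothetical helper, and skipping offset 0 is an assumption; only the one-million interval and the rotation trigger come from the description above.

```rust
/// Snapshot interval in transactions, per the description above.
const SNAPSHOT_INTERVAL: u64 = 1_000_000;

/// Decide whether to take a snapshot after committing `tx_offset`.
/// `segment_rotated` is true if this commit started a new segment.
fn should_snapshot(tx_offset: u64, segment_rotated: bool) -> bool {
    // Assumption: no snapshot is taken at offset 0.
    let interval_hit = tx_offset > 0 && tx_offset % SNAPSHOT_INTERVAL == 0;
    segment_rotated || interval_hit
}
```

With this rule, a database that rotates segments every 50 transactions snapshots on each rotation, while one with 2.5 million transactions per segment is still driven mostly by the interval.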

# API and ABI breaking changes

N/a

# Expected complexity level and risk

3: Hastily made changes to finicky code across several crates.

# Testing

I am unsure how to test these changes.

2025-10-15 15:18:15 +00:00
Noa
619b8ce021
Bump Rust to 1.90 (#3397)
# Description of Changes

Necessary for pulling in rolldown.

# API and ABI breaking changes

None

# Expected complexity level and risk

1, with the caveat that this updates the Rust version and therefore
touches all the code.

# Testing

- [ ] Just the automated testing
2025-10-09 20:41:25 +00:00
Zeke Foppa
f6f0909ea4
Update all licenses (#3002)
# Description of Changes

We recently merged several repos together. This PR clarifies the license
terms for several subdirectories, as well as the relationship between
the licenses.

The licenses in our subdirectories have become symbolic links to
licenses in our toplevel `licenses` directory. For any particular
subdirectory's license file in the diff, you can click `... -> View
file` and then click on the text that says "Symbolic Link" on that page.
This will take you to the license file that it links to.

I have also updated the `tools/upgrade-version` script to update the
change date in the new `licenses/BSL.txt` file.

# API and ABI breaking changes

None.

# Expected complexity level and risk

1

# Testing

None. Only changes to license files.

---------

Co-authored-by: Zeke Foppa <bfops@users.noreply.github.com>
2025-08-12 18:20:58 +00:00
Kim Altintop
37c64c787b
commitlog: Provide folding over a range of tx offsets (#3129)
Adds methods and free-standing functions to allow folds to stop at an
upper bound, by passing a range instead of only a start offset.
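
A minimal sketch of a range-bounded fold, assuming transactions arrive as `(offset, payload)` pairs in ascending offset order. `fold_range` is a hypothetical function, not the signature added by this change.

```rust
use std::ops::{Bound, RangeBounds};

/// Fold over the transactions whose offsets fall within `range`,
/// stopping as soon as the upper bound has been passed.
fn fold_range<T, B>(
    txs: impl IntoIterator<Item = (u64, T)>,
    range: impl RangeBounds<u64>,
    init: B,
    mut f: impl FnMut(B, u64, T) -> B,
) -> B {
    let mut acc = init;
    for (offset, tx) in txs {
        // Offsets are assumed ascending, so nothing after the end matters.
        let past_end = match range.end_bound() {
            Bound::Included(end) => offset > *end,
            Bound::Excluded(end) => offset >= *end,
            Bound::Unbounded => false,
        };
        if past_end {
            break;
        }
        // Skip anything before the start bound.
        if range.contains(&offset) {
            acc = f(acc, offset, tx);
        }
    }
    acc
}
```

Passing `3..6` visits offsets 3, 4, and 5, while `5..` behaves like the previous start-offset-only form.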

# Expected complexity level and risk

1

# Testing
2025-08-08 11:55:27 +00:00
Kim Altintop
7709f3cf1e
commitlog: Set up options for toml configuration (#2942) 2025-07-17 08:34:35 +00:00
Noa
742303ca49
Bump rust-toolchain to rust 1.88 (#2749)
Co-authored-by: Mazdak Farrokhzad <twingoow@gmail.com>
2025-07-15 17:39:41 +00:00
Viktor Szépe
f6da9e1f5f
Fix typos (#2812)
Signed-off-by: Viktor Szépe <viktor@szepe.net>
2025-06-04 16:33:32 +00:00
Kim Altintop
a6bc0e59fd
commitlog: Reduce log noise when offset index cannot be used (#2791) 2025-06-02 07:27:12 +00:00
Kim Altintop
c98219529d
commitlog: Fix index truncation test. (#2792) 2025-05-30 15:13:06 +00:00
Shubham Mishra
f2a9657a72
Commitlog: handle empty offset index lookup (#2771) 2025-05-22 12:59:02 +00:00
Kim Altintop
1f4207de86
commitlog: Include latest commit offset in segment metadata (#2733) 2025-05-19 06:54:58 +00:00
Kim Altintop
3d1a91c25c
Handle snapshot restore more robustly (#2735)
Signed-off-by: Kim Altintop <kim@eagain.io>
Signed-off-by: Shubham Mishra <shivam828787@gmail.com>
Co-authored-by: Shubham Mishra <shubham@clockworklabs.io>
2025-05-15 14:35:09 +00:00
Shubham Mishra
41c316c984
Commitlog stream range fix. (#2721) 2025-05-10 04:06:05 +00:00
Noa
483a9488e2
Update rand (#2568) 2025-04-11 17:39:41 +00:00
Mario Montoya
3fd78203c4
Compress the snapshot (#2034) 2025-04-11 15:18:17 +00:00
Shubham Mishra
76a52ca747
Use Offset Index on Meta extract (#2549) 2025-04-09 19:05:52 +00:00
Noa
2f6660e919
Add integration test for commitlog compression (#2538) 2025-04-08 17:10:31 +00:00
Kim Altintop
d88a266c20
commitlog: Derive serde for Commit (#2535) 2025-04-02 11:16:11 +00:00
Noa
d436b1f9b7
Followup to #2504 (#2534) 2025-03-31 23:52:56 +00:00
Noa
a5212a5f75
Commitlog compression (#2504) 2025-03-31 22:00:52 +00:00
Kim Altintop
8dfab1c09d
commitlog: Open stream writer with metadata (#2530) 2025-03-31 17:25:39 +00:00
Kim Altintop
5063bd8759
commitlog: Streaming (#2492) 2025-03-26 07:40:23 +00:00
Kim Altintop
434c28063f
commitlog: Fix open flags for read-only offset index (#2468) 2025-03-19 12:24:30 +00:00
Mario Montoya
f9f38543c8
Add readmes to all implementation crates specifying that they do not offer stable interfaces (#2320) 2025-03-06 19:50:17 +00:00
Shubham Mishra
7cb509c2e2
handle offset index empty (#2344) 2025-03-05 11:04:10 +00:00
Kim Altintop
8054999927
commitlog: Use fdatasync (#2338) 2025-03-04 15:20:14 +00:00
Noa
293aebaef9
Bump to Rust 1.84 (#2001) 2025-01-28 23:11:29 +00:00
Phoebe Goldman
d171b44a89
Don't create indexes during bootstrapping; wait until after replay (#2161) 2025-01-23 19:41:39 +00:00
Kim Altintop
c5f4c8bc5c
commitlog: Make offset index usable externally (#2108) 2025-01-14 18:56:08 +00:00
Kim Altintop
da0f83b6dd
commitlog: Make memory segment behave like O_APPEND (#2072) 2024-12-20 11:28:19 +00:00
Kim Altintop
a191055f56
commitlog: Fix offset index truncation (#2073) 2024-12-19 15:44:10 +00:00
Kim Altintop
31698618a8
commitlog: Provide segment_len method for segments (#2042) 2024-12-10 10:43:39 +00:00
Shubham Mishra
f04d2817d0
create commitlog dir in fs::New (#2006) 2024-11-21 15:47:40 +00:00
Kim Altintop
125ab58388
commitlog: Fix set_epoch (#2005) 2024-11-21 13:34:10 +00:00
Noa
97bff92efb
Optimize integrate_generated_columns (#1895) 2024-11-12 16:36:50 +00:00
Noa
f136670420
Directory structure impl (#1879)
Co-authored-by: Jeffrey Dallatezza <jeffreydallatezza@gmail.com>
2024-11-12 04:24:43 +00:00
Kim Altintop
e4fcb72432
commitlog: Small tweaks (#1978) 2024-11-11 13:24:21 +00:00
Kim Altintop
f22b163c0a
commitlog: Introduce epoch (#1851) 2024-11-05 10:10:30 +00:00