Binary Format
GigabiteStudios edited this page 2026-06-17 19:04:13 -05:00

This page documents the binary layout as currently implemented. All multi-byte fixed-width integers are little-endian.

Common primitives

Primitive        Encoding
u8               One byte
u32le            Four-byte unsigned integer, least-significant byte first
u64le            Eight-byte unsigned integer, least-significant byte first
varu32 / varu64  Base-128 varint, seven payload bits per byte, continuation bit 0x80
vari64           Signed integer zigzag-transformed, then encoded as varu64
string           varu32 byte_length, followed by that many bytes, no terminator

Zigzag maps small signed magnitudes to small unsigned values: 0 → 0, -1 → 1, 1 → 2, -2 → 3.
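
The varint and zigzag primitives above can be sketched in a few lines of Python. This is an illustration, not the library's own code: the function names are hypothetical, and the byte order (least-significant seven-bit group first, as in LEB128) is an assumption, since the table states only the group size and the continuation bit.

```python
def encode_varu(value: int) -> bytes:
    """Base-128 varint; assumes the least-significant 7-bit group comes first."""
    if value < 0:
        raise ValueError("varu expects a non-negative integer")
    out = bytearray()
    while value >= 0x80:
        out.append((value & 0x7F) | 0x80)  # continuation bit set: more bytes follow
        value >>= 7
    out.append(value)                      # final byte has the continuation bit clear
    return bytes(out)


def decode_varu(buf: bytes, pos: int = 0, max_bits: int = 64):
    """Return (value, next_pos); raise on truncated, over-long, or overflowing input."""
    result = 0
    shift = 0
    while True:
        if pos >= len(buf):
            raise ValueError("truncated varint")
        if shift >= max_bits:
            raise ValueError("varint too long")
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if result >> max_bits:
            raise ValueError("varint overflows %d bits" % max_bits)
        if not byte & 0x80:
            return result, pos
        shift += 7


def zigzag_encode(n: int) -> int:
    """Map a signed 64-bit value to unsigned: 0, -1, 1, -2 become 0, 1, 2, 3."""
    return (n << 1) ^ (n >> 63)


def zigzag_decode(u: int) -> int:
    """Inverse of zigzag_encode."""
    return (u >> 1) ^ -(u & 1)


def encode_vari64(n: int) -> bytes:
    """vari64: zigzag first, then varu64."""
    return encode_varu(zigzag_encode(n))
```

Passing `max_bits=32` to `decode_varu` gives the varu32 behaviour; the same loop serves both widths.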

Type tags

Byte  Type
0     null
1     string
2     signed integer
3     double
4     boolean
5     object
6     array

Node encoding

A general node is:

u8 type
payload(type)

Payloads are:

Type            Payload
null            no bytes
string          string primitive
signed integer  vari64
double          IEEE-754 double bits as u64le
boolean         u8, zero is false and nonzero is true
object          varu32 count, then repeated string key + node value
array           u8 element_type, varu32 count, then elements

For a mixed array (element_type == 0), every item is a complete node with its own type byte. For a typed array, each item stores only the payload for the declared element type, with no type byte. String, signed integer, boolean, and double typed arrays therefore avoid repeated tags. Object typed arrays still encode each item as a full node.
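
As a rough illustration of the node and array rules, the following Python sketch encodes plain Python values. It rests on several assumptions not stated on this page: strings are UTF-8, varints are least-significant group first, empty arrays and arrays of nulls or nested arrays are written as mixed, and all function names are hypothetical.

```python
import struct

NULL, STRING, INT, DOUBLE, BOOL, OBJECT, ARRAY = range(7)
PACKED = {STRING, INT, DOUBLE, BOOL}  # typed arrays of these omit per-item tags


def varu(value: int) -> bytes:
    out = bytearray()
    while value >= 0x80:
        out.append((value & 0x7F) | 0x80)
        value >>= 7
    out.append(value)
    return bytes(out)


def enc_string(text: str) -> bytes:
    raw = text.encode("utf-8")  # assumption: the page only says "bytes"
    return varu(len(raw)) + raw


def tag_of(value) -> int:
    if value is None:
        return NULL
    if isinstance(value, bool):  # must precede int: bool is a subclass of int
        return BOOL
    if isinstance(value, str):
        return STRING
    if isinstance(value, int):
        return INT
    if isinstance(value, float):
        return DOUBLE
    if isinstance(value, dict):
        return OBJECT
    if isinstance(value, list):
        return ARRAY
    raise TypeError("unsupported value: %r" % (value,))


def enc_payload(value, tag: int) -> bytes:
    if tag == NULL:
        return b""
    if tag == STRING:
        return enc_string(value)
    if tag == INT:
        return varu((value << 1) ^ (value >> 63))  # zigzag, then varu64
    if tag == DOUBLE:
        return struct.pack("<d", value)            # IEEE-754 bits, little-endian
    if tag == BOOL:
        return b"\x01" if value else b"\x00"
    if tag == OBJECT:
        out = varu(len(value))
        for key, item in value.items():
            out += enc_string(key) + enc_node(item)
        return out
    # ARRAY: use a typed array only when every item shares one eligible tag.
    tags = {tag_of(item) for item in value}
    elem = tags.pop() if len(tags) == 1 else NULL
    if elem not in PACKED and elem != OBJECT:
        elem = NULL                                # fall back to a mixed array
    out = bytes([elem]) + varu(len(value))
    for item in value:
        out += enc_payload(item, elem) if elem in PACKED else enc_node(item)
    return out


def enc_node(value) -> bytes:
    tag = tag_of(value)
    return bytes([tag]) + enc_payload(value, tag)
```

With these assumptions, `enc_node([1, 2, 3])` yields six bytes (array tag, element type 2, count 3, then three one-byte zigzag varints), while `enc_node([1, "a"])` falls back to a mixed array and repeats a type byte per item.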

iKv1 binary document

4 bytes  magic = "iKv1"
u8       kind = 'b'
u32le    version = 1
string   root_name
node     root

iKv1 serializes the complete root sequentially.
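
A minimal Python sketch of the iKv1 envelope follows. It takes the root node as already-encoded bytes so that it stays focused on the header; `write_ikv1` and `read_ikv1_header` are hypothetical names, and UTF-8 for root_name is an assumption.

```python
import struct


def encode_varu(value: int) -> bytes:
    out = bytearray()
    while value >= 0x80:
        out.append((value & 0x7F) | 0x80)
        value >>= 7
    out.append(value)
    return bytes(out)


def decode_varu32(buf: bytes, pos: int):
    result = 0
    shift = 0
    while True:
        if pos >= len(buf):
            raise ValueError("truncated varint")
        if shift >= 32:
            raise ValueError("varu32 too long")
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def write_ikv1(root_name: str, root_node: bytes) -> bytes:
    """magic, kind, version, root_name, then the complete root node."""
    name = root_name.encode("utf-8")
    return (b"iKv1" + b"b" + struct.pack("<I", 1)
            + encode_varu(len(name)) + name + root_node)


def read_ikv1_header(buf: bytes):
    """Return (root_name, offset of the root node) or raise ValueError."""
    if len(buf) < 9:
        raise ValueError("short header")
    if buf[:4] != b"iKv1" or buf[4:5] != b"b":
        raise ValueError("wrong magic or kind byte")
    (version,) = struct.unpack_from("<I", buf, 5)
    if version != 1:
        raise ValueError("unsupported version")
    name_len, pos = decode_varu32(buf, 9)
    if name_len > len(buf) - pos:
        raise ValueError("truncated root_name")
    return buf[pos:pos + name_len].decode("utf-8"), pos + name_len
```

For example, a document named "cfg" with a null root is 14 bytes long, and the root node starts at offset 13.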

iKv2 indexed binary document

iKv2 requires an object root and stores top-level members in a sorted index:

4 bytes  magic = "iKv2"
u8       kind = 'b'
u32le    version = 2
u32le    flags (bit 0 = indexed root)
string   root_name
varu32   entry_count

repeat entry_count times:
    string key

repeat entry_count times:
    u8     type
    u32le  payload_offset
    u32le  payload_size

payload area:
    complete node payload for each indexed entry

Offsets are absolute byte offsets from the start of the file. The writer sorts top-level keys lexicographically before writing the index and payload area.
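
The offset arithmetic is easier to see in code. The Python sketch below assembles an iKv2 document from top-level members that are already encoded as (type tag, payload bytes) pairs. It assumes payloads are packed contiguously in key order right after the index and that "lexicographically" means byte-wise comparison of UTF-8 keys; neither detail is stated above, and `write_ikv2` is a hypothetical name.

```python
import struct


def varu(value: int) -> bytes:
    out = bytearray()
    while value >= 0x80:
        out.append((value & 0x7F) | 0x80)
        value >>= 7
    out.append(value)
    return bytes(out)


def enc_string(text: str) -> bytes:
    raw = text.encode("utf-8")
    return varu(len(raw)) + raw


def write_ikv2(root_name: str, members: dict) -> bytes:
    """members maps key -> (type_tag, payload_bytes_without_type_byte)."""
    keys = sorted(members, key=lambda k: k.encode("utf-8"))

    # Fixed header: magic, kind, version = 2, flags with bit 0 (indexed root) set.
    head = b"iKv2" + b"b" + struct.pack("<II", 2, 1)
    head += enc_string(root_name) + varu(len(keys))
    head += b"".join(enc_string(k) for k in keys)

    # Each index entry is u8 + u32le + u32le = 9 bytes, so the payload
    # area begins right after the last entry.
    offset = len(head) + 9 * len(keys)
    index = bytearray()
    payloads = bytearray()
    for key in keys:
        tag, payload = members[key]
        index += struct.pack("<BII", tag, offset, len(payload))
        payloads += payload
        offset += len(payload)  # absolute offset of the next payload
    return head + bytes(index) + bytes(payloads)
```

Because the offsets are absolute, the index can only be written once the sizes of the header and key table are known, which is why the sketch builds those parts first.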

Lazy top-level loading

When iKv2 binary input is loaded:

  1. The complete byte buffer is copied or owned by the root.
  2. Index keys, types, offsets, and sizes are validated.
  3. An in-memory hash index is built over the key table.
  4. Top-level values remain encoded until operator[] or an object lookup requests a key.
  5. The selected payload is decoded and attached to the root.

This avoids eagerly decoding unrelated top-level values. Nested data inside a selected payload is decoded normally.
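
The five steps above can be sketched as a small Python class. This is a simplified model rather than the actual loader: `LazyRoot` and the `decode_payload` callback are hypothetical, a Python dict stands in for the hash index, and decoded values are cached on the root to model "attached".

```python
import struct


def read_varu32(buf: bytes, pos: int):
    result = 0
    shift = 0
    while True:
        if pos >= len(buf):
            raise ValueError("truncated varint")
        if shift >= 32:
            raise ValueError("varu32 too long")
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def read_string(buf: bytes, pos: int):
    size, pos = read_varu32(buf, pos)
    if size > len(buf) - pos:
        raise ValueError("truncated string")
    return buf[pos:pos + size].decode("utf-8"), pos + size


class LazyRoot:
    def __init__(self, data, decode_payload):
        self._buf = bytes(data)          # step 1: the root owns the buffer
        self._decode = decode_payload    # called as decode_payload(tag, payload)
        self._cache = {}
        buf = self._buf
        if len(buf) < 13 or buf[:5] != b"iKv2b":
            raise ValueError("bad header")
        version, flags = struct.unpack_from("<II", buf, 5)
        if version != 2 or not flags & 1:
            raise ValueError("bad version or missing indexed-root flag")
        self.name, pos = read_string(buf, 13)
        count, pos = read_varu32(buf, pos)
        keys = []
        for _ in range(count):
            key, pos = read_string(buf, pos)
            keys.append(key)
        index_end = pos + 9 * count
        if index_end > len(buf):
            raise ValueError("truncated index")
        self._index = {}                 # step 3: hash index over the key table
        for key in keys:                 # step 2: validate each entry
            tag, off, size = struct.unpack_from("<BII", buf, pos)
            pos += 9
            if tag > 6 or off < index_end or off + size > len(buf):
                raise ValueError("bad index entry for %r" % key)
            self._index[key] = (tag, off, size)

    def __getitem__(self, key):
        if key not in self._cache:       # step 4: still encoded until requested
            tag, off, size = self._index[key]
            # step 5: decode only the selected payload and keep the result
            self._cache[key] = self._decode(tag, self._buf[off:off + size])
        return self._cache[key]
```

In this sketch, constructing a `LazyRoot` never calls `decode_payload`; the first lookup of a key calls it once, and later lookups of the same key reuse the cached value.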

Validation

The reader rejects:

  • short or truncated headers
  • wrong magic, kind byte, or numeric version
  • missing indexed-root flag in iKv2
  • malformed or overflowing varints
  • out-of-range offsets and sizes
  • unknown type tags
  • incomplete object/array payloads
  • payloads that do not consume their declared iKv2 range exactly
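
To illustrate the last rule in particular, here is a hedged Python sketch that decodes one indexed payload inside its declared range and rejects it unless the range is consumed exactly. To stay short it handles only the scalar tags (0 to 4); objects and arrays are deliberately left out and reported as unsupported. `decode_entry` and its helpers are hypothetical names.

```python
import struct


def read_varu(buf: bytes, pos: int, end: int, max_bits: int = 64):
    """Decode a varint without reading at or past `end`."""
    result = 0
    shift = 0
    while True:
        if pos >= end:
            raise ValueError("truncated varint")
        if shift >= max_bits:
            raise ValueError("varint too long")
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if result >> max_bits:
            raise ValueError("varint overflow")
        if not byte & 0x80:
            return result, pos
        shift += 7


def read_scalar(buf: bytes, pos: int, end: int, tag: int):
    """Return (value, next_pos) for tags 0..4; raise ValueError otherwise."""
    if tag == 0:                                   # null: no bytes
        return None, pos
    if tag == 1:                                   # string: varu32 length + bytes
        size, pos = read_varu(buf, pos, end, 32)
        if size > end - pos:
            raise ValueError("truncated string")
        return buf[pos:pos + size].decode("utf-8"), pos + size
    if tag == 2:                                   # signed integer: zigzag varu64
        raw, pos = read_varu(buf, pos, end)
        return (raw >> 1) ^ -(raw & 1), pos
    if tag == 3:                                   # double: 8 bytes, little-endian
        if end - pos < 8:
            raise ValueError("truncated double")
        return struct.unpack_from("<d", buf, pos)[0], pos + 8
    if tag == 4:                                   # boolean: nonzero is true
        if end - pos < 1:
            raise ValueError("truncated boolean")
        return buf[pos] != 0, pos + 1
    raise ValueError("unknown or unsupported type tag %d" % tag)


def decode_entry(buf: bytes, tag: int, offset: int, size: int):
    end = offset + size
    if offset > len(buf) or end > len(buf):
        raise ValueError("offset or size out of range")
    value, pos = read_scalar(buf, offset, end, tag)
    if pos != end:
        raise ValueError("payload does not consume its declared range exactly")
    return value
```

Passing `end` down to every helper keeps a malformed length inside one entry from reading into its neighbour's bytes.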

The binary layout is an implementation contract for the current iKv versions. Any new, incompatible layout should use a new format version.