Binary Format
This page documents the current implementation-level layout. All multibyte fixed-width integers are little-endian.
Common primitives
| Primitive | Encoding |
|---|---|
| u8 | One byte |
| u32le | Four-byte unsigned integer, least-significant byte first |
| u64le | Eight-byte unsigned integer, least-significant byte first |
| varu32 / varu64 | Base-128 varint, seven payload bits per byte, continuation bit 0x80 |
| vari64 | Signed integer zigzag-transformed, then encoded as varu64 |
| string | varu32 byte_length, followed by that many bytes, no terminator |
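The varint and string rows can be sketched as follows. This is an illustration, not the library's code: the helper names `put_varu64`, `get_varu64`, and `put_string` are hypothetical, and the sketch assumes the seven-bit groups are written low-order first (LEB128 style), which the table does not state explicitly.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

// Append v as a base-128 varint: seven payload bits per byte, with 0x80 set
// on every byte except the last. Group order (low-order first) is assumed.
void put_varu64(Bytes& out, std::uint64_t v) {
    while (v >= 0x80) {
        out.push_back(static_cast<std::uint8_t>((v & 0x7F) | 0x80));
        v >>= 7;
    }
    out.push_back(static_cast<std::uint8_t>(v));
}

// Read a varint starting at pos. Returns false on truncation or when the
// value does not fit in 64 bits. On success, value is set and pos is advanced
// past the varint.
bool get_varu64(const Bytes& in, std::size_t& pos, std::uint64_t& value) {
    std::uint64_t result = 0;
    for (unsigned shift = 0; shift < 64; shift += 7) {
        if (pos >= in.size()) return false;             // truncated input
        const std::uint8_t byte = in[pos++];
        const std::uint64_t group = byte & 0x7F;
        if (shift == 63 && group > 1) return false;     // bits beyond bit 63
        result |= group << shift;
        if ((byte & 0x80) == 0) {
            value = result;
            return true;
        }
    }
    return false;                                       // longer than ten bytes
}

// String primitive: varint byte length, then the raw bytes, no terminator.
void put_string(Bytes& out, const std::string& s) {
    put_varu64(out, s.size());
    out.insert(out.end(), s.begin(), s.end());
}
```

A varu32 uses the same byte layout; a reader would additionally reject values above 0xFFFFFFFF.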
Zigzag maps small signed magnitudes to small unsigned values: 0 → 0, -1 → 1, 1 → 2, -2 → 3.
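A minimal sketch of the zigzag mapping, using hypothetical function names. It follows the standard zigzag definition, which reproduces the four examples above.

```cpp
#include <cstdint>

// Map a signed value to an unsigned one so that small magnitudes stay small:
// 0 -> 0, -1 -> 1, 1 -> 2, -2 -> 3, ...
std::uint64_t zigzag_encode(std::int64_t v) {
    const std::uint64_t sign = v < 0 ? ~std::uint64_t{0} : std::uint64_t{0};
    return (static_cast<std::uint64_t>(v) << 1) ^ sign;
}

// Inverse mapping: the low bit carries the sign, the remaining bits the magnitude.
std::int64_t zigzag_decode(std::uint64_t u) {
    const std::uint64_t mask = std::uint64_t{0} - (u & 1);  // all ones when the low bit is set
    return static_cast<std::int64_t>((u >> 1) ^ mask);
}
```

A vari64 is then `zigzag_encode(v)` written as a varu64.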
Type tags
| Byte | Type |
|---|---|
| 0 | null |
| 1 | string |
| 2 | signed integer |
| 3 | double |
| 4 | boolean |
| 5 | object |
| 6 | array |
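In C++ the tag table could be mirrored by a scoped enum. The names `TypeTag` and `is_known_tag` are hypothetical and are not taken from the iKvxx API; only the numeric values come from the table.

```cpp
#include <cstdint>

// Numeric values follow the type tag table above.
enum class TypeTag : std::uint8_t {
    Null = 0,
    String = 1,
    Integer = 2,
    Double = 3,
    Boolean = 4,
    Object = 5,
    Array = 6,
};

// A reader can use a check like this to reject unknown tags.
bool is_known_tag(std::uint8_t byte) {
    return byte <= static_cast<std::uint8_t>(TypeTag::Array);
}
```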
Node encoding
A general node is:
```text
u8 type
payload(type)
```
Payloads are:
| Type | Payload |
|---|---|
| null | no bytes |
| string | string primitive |
| integer | vari64 |
| float | IEEE-754 double bits as u64le |
| boolean | u8, zero is false and nonzero is true |
| object | varu32 count, then repeated string key + node value |
| array | u8 element_type, varu32 count, then elements |
For a mixed array (element_type == 0), every item is a complete node with its own type byte. For a typed array, each item stores only the payload for the declared element type, with no per-item type byte. String, integer, boolean, and float typed arrays therefore avoid repeated tags. Object typed arrays still encode each item as a full node.
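To make the difference concrete, the sketch below encodes the same list of signed integers once as a typed array and once as a mixed array. The helper names are hypothetical, and the varint group order (low-order first) is an assumption.

```cpp
#include <cstdint>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

// Base-128 varint, low-order group first (assumed).
void put_varu64(Bytes& out, std::uint64_t v) {
    while (v >= 0x80) {
        out.push_back(static_cast<std::uint8_t>((v & 0x7F) | 0x80));
        v >>= 7;
    }
    out.push_back(static_cast<std::uint8_t>(v));
}

// vari64: zigzag transform, then varint.
void put_vari64(Bytes& out, std::int64_t v) {
    const std::uint64_t sign = v < 0 ? ~std::uint64_t{0} : std::uint64_t{0};
    put_varu64(out, (static_cast<std::uint64_t>(v) << 1) ^ sign);
}

// Typed array: element_type = 2 (signed integer), items carry no type byte.
Bytes encode_typed_int_array(const std::vector<std::int64_t>& items) {
    Bytes out;
    out.push_back(6);               // node type: array
    out.push_back(2);               // element_type: signed integer
    put_varu64(out, items.size());  // count
    for (std::int64_t v : items) put_vari64(out, v);
    return out;
}

// Mixed array: element_type = 0, every item is a full node with its own tag.
Bytes encode_mixed_int_array(const std::vector<std::int64_t>& items) {
    Bytes out;
    out.push_back(6);               // node type: array
    out.push_back(0);               // element_type: mixed
    put_varu64(out, items.size());  // count
    for (std::int64_t v : items) {
        out.push_back(2);           // per-item type byte: signed integer
        put_vari64(out, v);
    }
    return out;
}
```

For the items 1, -1, 2 the typed form is six bytes (06 02 03 02 01 04), while the mixed form is nine bytes because each item repeats the tag.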
iKv1 binary document
```text
4 bytes magic = "iKv1"
u8 kind = 'b'
u32le version = 1
string root_name
node root
```
iKv1 serializes the complete root sequentially.
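A minimal sketch of a writer for this layout. `write_ikv1` and the helpers are hypothetical names, and the root node is passed in already encoded (type byte plus payload) to keep the example short.

```cpp
#include <cstdint>
#include <string>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

void put_u32le(Bytes& out, std::uint32_t v) {
    for (int i = 0; i < 4; ++i) {
        out.push_back(static_cast<std::uint8_t>(v >> (8 * i)));  // least-significant byte first
    }
}

// Base-128 varint, low-order group first (assumed).
void put_varu32(Bytes& out, std::uint32_t v) {
    while (v >= 0x80) {
        out.push_back(static_cast<std::uint8_t>((v & 0x7F) | 0x80));
        v >>= 7;
    }
    out.push_back(static_cast<std::uint8_t>(v));
}

void put_string(Bytes& out, const std::string& s) {
    put_varu32(out, static_cast<std::uint32_t>(s.size()));
    out.insert(out.end(), s.begin(), s.end());
}

// Header fields in the documented order, followed by the encoded root node.
Bytes write_ikv1(const std::string& root_name, const Bytes& root_node) {
    Bytes out = {'i', 'K', 'v', '1', 'b'};  // magic + kind
    put_u32le(out, 1);                      // version
    put_string(out, root_name);
    out.insert(out.end(), root_node.begin(), root_node.end());
    return out;
}
```

With a root name of "cfg" and a null root (a single 0x00 type byte), the document is 14 bytes long.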
iKv2 indexed binary document
iKv2 requires an object root and stores top-level members in a sorted index:
```text
4 bytes magic = "iKv2"
u8 kind = 'b'
u32le version = 2
u32le flags (bit 0 = indexed root)
string root_name
varu32 entry_count
repeat entry_count times:
    string key
repeat entry_count times:
    u8 type
    u32le payload_offset
    u32le payload_size
payload area:
    complete node payload for each indexed entry
```
Offsets are absolute byte offsets from the start of the file. The writer sorts top-level keys lexicographically before writing the index and payload area.
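The sketch below shows one way a writer could compute the absolute offsets. It is an illustration under several assumptions: the names `Member` and `write_ikv2` are hypothetical, keys are ordered bytewise (the order `std::map<std::string, ...>` provides), payloads are stored back to back in index order, and each payload excludes the type byte because the index entry already carries it.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

void put_u32le(Bytes& out, std::uint32_t v) {
    for (int i = 0; i < 4; ++i) {
        out.push_back(static_cast<std::uint8_t>(v >> (8 * i)));
    }
}

// Base-128 varint, low-order group first (assumed).
void put_varu32(Bytes& out, std::uint32_t v) {
    while (v >= 0x80) {
        out.push_back(static_cast<std::uint8_t>((v & 0x7F) | 0x80));
        v >>= 7;
    }
    out.push_back(static_cast<std::uint8_t>(v));
}

void put_string(Bytes& out, const std::string& s) {
    put_varu32(out, static_cast<std::uint32_t>(s.size()));
    out.insert(out.end(), s.begin(), s.end());
}

// One top-level member: its type tag and its already-encoded payload.
struct Member {
    std::uint8_t type;
    Bytes payload;
};

Bytes write_ikv2(const std::string& root_name,
                 const std::map<std::string, Member>& members) {
    Bytes out = {'i', 'K', 'v', '2', 'b'};  // magic + kind
    put_u32le(out, 2);                      // version
    put_u32le(out, 1);                      // flags: bit 0 = indexed root
    put_string(out, root_name);
    put_varu32(out, static_cast<std::uint32_t>(members.size()));

    // Key table, in sorted order (std::map iterates keys in ascending order).
    for (const auto& kv : members) put_string(out, kv.first);

    // Each index entry is 1 + 4 + 4 = 9 bytes, so the payload area starts
    // right after the entry table. Offsets count from the start of the file.
    std::uint32_t offset = static_cast<std::uint32_t>(out.size() + 9 * members.size());
    for (const auto& kv : members) {
        const std::uint32_t size = static_cast<std::uint32_t>(kv.second.payload.size());
        out.push_back(kv.second.type);
        put_u32le(out, offset);
        put_u32le(out, size);
        offset += size;
    }

    // Payload area.
    for (const auto& kv : members) {
        out.insert(out.end(), kv.second.payload.begin(), kv.second.payload.end());
    }
    return out;
}
```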
Lazy top-level loading
When iKv2 binary input is loaded:
- The complete byte buffer is copied or owned by the root.
- Index keys, types, offsets, and lengths are validated.
- An in-memory hash index is built over the key table.
- Top-level values remain encoded until `operator[]`/object lookup requests a key.
- The selected payload is decoded and attached to the root.
This avoids eagerly decoding unrelated top-level values. Nested data inside a selected payload is decoded normally.
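The loading steps above can be sketched with a small cache in front of the index. Everything here is hypothetical (`LazyRoot`, `Slot`, `Value`), not the iKvxx API: the index is assumed to be validated already, and "decoding" is reduced to copying the payload bytes so the example stays short.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

// Stand-in for a decoded value; a real node type would cover every tag.
struct Value {
    std::uint8_t type;
    Bytes payload;  // placeholder: real decoding would parse these bytes
};

// One validated index entry.
struct Slot {
    std::uint8_t type;
    std::uint32_t offset;
    std::uint32_t size;
};

class LazyRoot {
public:
    LazyRoot(Bytes buffer, std::unordered_map<std::string, Slot> index)
        : buffer_(std::move(buffer)), index_(std::move(index)) {}

    // Returns nullptr for a missing key. Each key is decoded at most once;
    // later lookups return the cached value.
    const Value* find(const std::string& key) {
        auto hit = decoded_.find(key);
        if (hit != decoded_.end()) return &hit->second;

        auto slot = index_.find(key);
        if (slot == index_.end()) return nullptr;

        const Slot& s = slot->second;
        Value v{s.type, Bytes(buffer_.begin() + s.offset,
                              buffer_.begin() + s.offset + s.size)};
        ++decode_count_;
        return &decoded_.emplace(key, std::move(v)).first->second;
    }

    std::size_t decode_count() const { return decode_count_; }

private:
    Bytes buffer_;                                     // whole file, owned by the root
    std::unordered_map<std::string, Slot> index_;      // hash index over the key table
    std::unordered_map<std::string, Value> decoded_;   // values decoded so far
    std::size_t decode_count_ = 0;
};
```

Looking up one key leaves every other top-level payload untouched in the buffer, which is the property the format is designed to provide.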
Validation
The reader rejects:
- short or truncated headers
- wrong magic, kind byte, or numeric version
- missing indexed-root flag in iKv2
- malformed or overflowing varints
- out-of-range offsets and sizes
- unknown type tags
- incomplete object/array payloads
- payloads that do not consume their declared iKv2 range exactly
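A few of these checks can be sketched as below. The function names are hypothetical, and the actual reader may order or structure its checks differently. The range check widens to 64 bits so that `offset + size` cannot wrap around.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

std::uint32_t get_u32le(const Bytes& in, std::size_t pos) {
    return static_cast<std::uint32_t>(in[pos]) |
           (static_cast<std::uint32_t>(in[pos + 1]) << 8) |
           (static_cast<std::uint32_t>(in[pos + 2]) << 16) |
           (static_cast<std::uint32_t>(in[pos + 3]) << 24);
}

// Fixed-size part of the iKv2 header: magic, kind, version, flags.
bool check_ikv2_header(const Bytes& file) {
    if (file.size() < 13) return false;                           // truncated header
    if (std::memcmp(file.data(), "iKv2", 4) != 0) return false;   // wrong magic
    if (file[4] != 'b') return false;                             // wrong kind byte
    if (get_u32le(file, 5) != 2) return false;                    // wrong version
    if ((get_u32le(file, 9) & 1u) == 0) return false;             // indexed-root flag missing
    return true;
}

// True when [offset, offset + size) lies inside the file.
bool range_ok(std::uint64_t file_size, std::uint32_t offset, std::uint32_t size) {
    const std::uint64_t end = static_cast<std::uint64_t>(offset) + size;
    return end <= file_size;
}

// True when a decoder that started at offset stopped exactly at the end of
// the declared range.
bool consumed_exactly(std::uint32_t offset, std::uint32_t size, std::uint64_t end_pos) {
    return end_pos == static_cast<std::uint64_t>(offset) + size;
}
```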
Binary layout is an implementation contract for the current iKv versions. New incompatible layouts should use a new format version.