# MinIO MemKV RELEASE.2026-05-12T19-07-44Z

Released: 2026-05-12

MinIO MemKV is a high-performance, RDMA-backed inference context memory store
that bridges GPU memory and NVMe. This release ships the server, the NIXL
plugin for NVIDIA Dynamo / KVBM, an LD_PRELOAD shim for MLPerf-Storage kvcache
workloads, and Python plugins for LMCache (vLLM) and sglang.

> **First Release:** This is the inaugural production release of MinIO MemKV.

> **Minimum requirements:** Linux kernel **6.8.x or newer** on amd64 or arm64
> for the RDMA + NVMe performance path. macOS 13+ (Apple Silicon or Intel) is
> supported for file-mode + TCP deployments. See
> [Hardware / Compatibility](#hardware--compatibility) for the full matrix.

---

## Downloads

### Server Binary

| Platform | Architecture | Download |
| -------- | ------------ | -------- |
| Linux    | amd64        | [memkv](https://dl.min.io/aistor/memkv/release/linux-amd64/memkv) |
| Linux    | arm64        | [memkv](https://dl.min.io/aistor/memkv/release/linux-arm64/memkv) |

### NIXL Plugin (for Dynamo / KVBM integrations)

| Platform | Architecture | Download |
| -------- | ------------ | -------- |
| Linux    | amd64        | [libplugin_MEMKV.so](https://dl.min.io/aistor/memkv/release/linux-amd64/libplugin_MEMKV.so) |
| Linux    | arm64        | [libplugin_MEMKV.so](https://dl.min.io/aistor/memkv/release/linux-arm64/libplugin_MEMKV.so) |

### LD_PRELOAD Shim (for MLPerf-Storage kvcache workloads)

| Platform | Architecture | Download |
| -------- | ------------ | -------- |
| Linux    | amd64        | [libmemkv_preload.so](https://dl.min.io/aistor/memkv/release/linux-amd64/libmemkv_preload.so) |
| Linux    | arm64        | [libmemkv_preload.so](https://dl.min.io/aistor/memkv/release/linux-arm64/libmemkv_preload.so) |

### Packages

`.deb`, `.rpm`, and `.apk` packages bundle the server + both `.so` sidecars + the LMCache and sglang Python wheels into a single per-arch install.

| Format | Architecture | Download |
| ------ | ------------ | -------- |
| DEB    | amd64        | [memkv\_20260512190744.0.0_amd64.deb](https://dl.min.io/aistor/memkv/release/linux-amd64/memkv_20260512190744.0.0_amd64.deb) |
| DEB    | arm64        | [memkv\_20260512190744.0.0_arm64.deb](https://dl.min.io/aistor/memkv/release/linux-arm64/memkv_20260512190744.0.0_arm64.deb) |
| RPM    | amd64        | [memkv-20260512190744.0.0-1.x86_64.rpm](https://dl.min.io/aistor/memkv/release/linux-amd64/memkv-20260512190744.0.0-1.x86_64.rpm) |
| RPM    | arm64        | [memkv-20260512190744.0.0-1.aarch64.rpm](https://dl.min.io/aistor/memkv/release/linux-arm64/memkv-20260512190744.0.0-1.aarch64.rpm) |
| APK    | amd64        | [memkv\_20260512190744.0.0_x86_64.apk](https://dl.min.io/aistor/memkv/release/linux-amd64/memkv_20260512190744.0.0_x86_64.apk) |
| APK    | arm64        | [memkv\_20260512190744.0.0_aarch64.apk](https://dl.min.io/aistor/memkv/release/linux-arm64/memkv_20260512190744.0.0_aarch64.apk) |

After installing the deb/rpm, the Python plugin wheels land at `/usr/share/memkv/wheels/`:

```bash
pip install /usr/share/memkv/wheels/memkv_lmcache-*.whl
pip install /usr/share/memkv/wheels/memkv_sglang-*.whl
```

The NIXL plugin is auto-symlinked to `/opt/nvidia/nvda_nixl/lib/plugins/` when that directory exists (postinstall hook).
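
To confirm the hook ran, check for the symlink (path as packaged above):

```bash
ls -l /opt/nvidia/nvda_nixl/lib/plugins/libplugin_MEMKV.so
```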

### Container Image

```bash
docker pull quay.io/minio/memkv:RELEASE.2026-05-12T19-07-44Z
docker pull quay.io/minio/memkv:latest
```

The container ships the server plus the NIXL plugin (under `/usr/local/lib/plugins/`). The LD_PRELOAD shim and Python wheels are not included in the container image; install the deb / rpm / apk on the host for those.
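
A minimal sketch of running the container, assuming the default data/admin
ports (9900/9901, see Observability below); the entrypoint arguments, volume
layout, and key format here are assumptions rather than the documented
invocation:

```bash
docker run --rm \
  -p 9900:9900 -p 9901:9901 \
  -e MEMKV_AUTH_KEY="$(openssl rand -hex 32)" \
  quay.io/minio/memkv:RELEASE.2026-05-12T19-07-44Z
```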

### Verification

Each binary is signed with both minisign (preferred) and GPG; sha256sums are published alongside.

```bash
# minisign
minisign -Vm memkv -P RWTx5Zr1tiHQLwG9keckT0c45M3AGeHD6IvimQHpyRywVWGbP1aVSGav

# sha256
sha256sum -c memkv.sha256sum
```

(The minisign public key above is the MinIO release-signing key — same key used across the AIStor product line.)

---

## What's in this release

MemKV ships as one server plus a set of framework sidecars. They share a
single wire protocol, a single authentication model, and a single
`memkv-client` engine — picking different sidecars is how you integrate with
different inference stacks.

- **`memkv` server (Rust).** RDMA-first KV server with three storage modes:
  `direct` (raw NVMe over `io_uring` + `O_DIRECT`), `memory` (RAM-backed,
  for benchmarks and ephemeral caches), and `file` (mmap on a regular
  filesystem, for hosts without raw drives: laptops, Apple Silicon Macs,
  edge nodes, kind / CI clusters). All three modes share the same auth,
  license, metrics, and on-the-wire format. See the sketch after this list
  for how the modes map to hosts.
- **NIXL plugin (`libplugin_MEMKV.so`).** Native plugin for the NVIDIA
  Inference Xfer Library. Lets NVIDIA Dynamo / KVBM offload KV context
  blocks directly to MemKV; supports both `DRAM_SEG` and `VRAM_SEG`
  (via `nvidia-peermem`).
- **LD_PRELOAD shim (`libmemkv_preload.so`).** Drop-in shim that redirects
  filesystem reads/writes from the MLPerf-Storage kvcache workload to a
  MemKV cluster — no source changes required to benchmark.
- **`memkv-lmcache` Python plugin.** `StoragePluginInterface` backend that
  plugs MemKV into LMCache / vLLM. Includes a zero-copy batched-put path
  and per-stage timing.
- **`memkv-sglang` Python plugin.** `HiCacheStorage` backend that plugs
  MemKV into sglang's hierarchical KV cache, with an RDMA zero-copy read
  fast path and batched put + warmup mirroring the LMCache backend.
- **Embedded documentation.** `memkv doc` serves the same Fumadocs site
  that lives at <https://docs.min.io/memkv/> — zstd-compressed and baked
  into the binary, so the docs are always pinned to the build.
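
A minimal sketch of how the storage modes map to hosts. The flag names below
are illustrative, not the actual CLI; only the mode names (`direct`, `memory`,
`file`) and the transport values come from this release note, so check
`memkv --help` for the real spelling.

```bash
# Flag names are hypothetical; modes and transports are from the list above.

# Laptop / CI / macOS-style host: file-mode storage over TCP, no RDMA or raw NVMe.
memkv --storage file --path /var/lib/memkv --transport tcp

# Production GPU-adjacent host: direct mode against raw NVMe, RDMA preferred.
memkv --storage direct --drives /dev/nvme0n1,/dev/nvme1n1 --transport auto
```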

## Architecture & Transport

- **DC (Dynamically Connected) RDMA transport on Mellanox ConnectX-4+.** A
  shared pool of 256 DCI (DC Initiator) QPs is acquired round-robin, and each
  work request can address any DCT (DC Target) endpoint, cutting the QP count
  from O(N²) to O(N) while holding line-rate throughput.
- **Transparent RC fallback on non-Mellanox NICs.** No configuration
  required; the client negotiates DC vs RC during the connection handshake.
- **`transport={auto,rdma,tcp}` knob.** `auto` is the default and prefers
  RDMA; `rdma` makes RDMA-required deployments fail loudly rather than
  silently downgrade; `tcp` is for development and file-mode deployments.
- **End-to-end transparent fallback.** Plugin, preload shim, and the
  underlying client all share the same RDMA→TCP downgrade path, so a
  single deployment can mix RDMA and TCP clients against one cluster.
- **Direct I/O pattern, no server cache.** The server RDMA-READs from
  client memory on writes and RDMA-WRITEs to client memory on reads, then
  persists to / reads from NVMe over `io_uring` + `O_DIRECT`. Data does
  not buffer through server-side caches.
- **HMAC-SHA256 auth on every UDP / RDMA control message.** The auth MAC is
  scoped to the header. A loopback dev key is derived automatically when
  `auth_key` is unset so development flows work without ceremony; production
  deployments should set `MEMKV_AUTH_KEY` / `network.auth_key` explicitly
  (see the example after this list).
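
A hedged example of the production posture described above: set the auth key
explicitly and pin RDMA so a misconfigured fabric fails loudly. The key
derivation and the flag spelling are assumptions; `MEMKV_AUTH_KEY` and the
`{auto,rdma,tcp}` values are from this section.

```bash
# Export a shared auth key instead of relying on the auto-derived loopback dev key.
# (Key length/format shown here is an assumption.)
export MEMKV_AUTH_KEY="$(openssl rand -hex 32)"

# Require RDMA rather than allowing a silent downgrade to TCP.
# (Exact flag/config-key spelling is an assumption; the knob values are documented above.)
memkv --transport rdma ...
```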

## Storage

- **Extent-based block store.** Large context blocks are split into 4 MB
  extents so I/O parallelizes across drives (the same shape XFS uses for
  large files), removing serialization bottlenecks on multi-megabyte
  transfers.
- **JBOD backend with bitmap allocator.** Pools capacity across many drives
  attached to the same PCIe domain as the NIC; shared-nothing across
  servers.
- **Single-drive mode with zero-copy persist.** Co-located deployments
  with one NVMe behind the GPU host skip the JBOD allocator.
- **Write-ahead journal.** Crash-consistent block-store metadata.
- **Background TRIM coalescing.** DELETE marks blocks free and queues
  ranges to a worker that coalesces and issues `BLKDISCARD` every 5 s or
  every 1024 extents, whichever comes first — avoids synchronous-TRIM
  latency without surrendering sustained write performance.
- **`storage/filestore` preallocation.** On Linux the backing file is
  preallocated via `fallocate`; on macOS via `fcntl(F_PREALLOCATE)` +
  `ftruncate`. The on-disk size matches `file_size_gb` from day zero, with
  no sparse-allocation surprises (verification sketch after this list).
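
One way to confirm the preallocation took effect on a file-mode host: the
apparent and allocated sizes of the backing file should both match
`file_size_gb`. The backing-file path below is illustrative.

```bash
# Apparent size vs. blocks actually allocated on disk; both should equal file_size_gb.
ls -lh /var/lib/memkv/store.bin
du -h  /var/lib/memkv/store.bin
```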

## Performance

Benchmarked on 2x storage servers (12x PCIe Gen4 QLC NVMe each) against a GPU
node with 2x ConnectX-7 400GbE NICs.

- **Peak aggregate throughput: 90.0 GiB/s read, 89.9 GiB/s write** — 97%
  of the 2x 400GbE read line rate.
- **Linear scaling:** 45.2 GiB/s on one server, 89.9–90.0 GiB/s on two.
  Add servers to scale; no coordination overhead.
- **Direct I/O hot path.** `io_uring` + `O_DIRECT` reads and writes; the
  block layer parallelizes across drives via 4 MB extents.

Inference TTFT improvements depend on model, hardware, and cache hit rate.
Measure with the bench tools in `scripts/` against your environment.

## Observability

- **Prometheus on the admin port.** The metrics exporter is default-on; admin
  endpoints (including `/v1/` probes) live on port 9901, the data plane on
  9900 (scrape example after this list).
- **Per-stage server histograms.** End-to-end and per-stage latencies are
  bucketed individually (RDMA acquire, RDMA transfer, `io_uring` write,
  `io_uring` fsync) rather than averaged into a single number.
- **Stage timing on the plugin side too.** LMCache and sglang plugins
  emit outer-envelope timing for `put_batch` / `get_blocking`, with RDMA
  warmup and Prometheus default-on, so cache-tier latency attribution
  works end to end.
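
A quick scrape against the admin port. The ports are documented above; the
`/metrics` route is the usual Prometheus convention and the metric names are
not spelled out here, so treat both as assumptions.

```bash
# Admin endpoints (including the Prometheus exporter) listen on 9901; data plane on 9900.
curl -s http://localhost:9901/metrics | head
```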

## Hardware / Compatibility

- **Linux amd64 and arm64.** Full server + NIXL plugin + preload shim +
  Python wheels bundle shipped as deb / rpm / apk and as standalone
  binaries.
- **Minimum Linux kernel: 6.8.x or newer.** Older kernels are not
  supported.
- **macOS (Apple Silicon or Intel), macOS 13+.** Server runs in file-mode
  storage over TCP: no RDMA, no NVMe direct, no JBOD pooling, no hugepages.
  Same wire protocol, auth, and license as the Linux build. Build via
  `cargo build --release` on macOS; this release does not bundle a
  prebuilt macOS binary.
- **NICs.** Mellanox ConnectX-4 or newer for full DC transport. Any
  RoCE-capable NIC works via RC fallback at reduced QP-scaling
  efficiency.
- **Drives.** NVMe attached to the same PCIe domain as the NIC, addressed
  via `io_uring` + `O_DIRECT`. Tested on PCIe Gen4 QLC; Gen5 TLC/SLC drives
  should further reduce latency.
- **GPU integration.** NVIDIA H200 via NIXL is the reference platform for
  inference-cache workloads. Any NIXL-compatible framework integrates via
  the same `libplugin_MEMKV.so`.

## Known Limitations

- **Single-replica.** This release does not cross-replicate blocks
  between MemKV servers — durability is a function of the underlying
  NVMe and the client's retention policy. Plan capacity per-server.
- **Container image scope.** `quay.io/minio/memkv` ships the server plus
  the NIXL plugin only. Deployments that need the LD_PRELOAD shim or the
  Python wheels should install the deb / rpm / apk on the host.
- **No prebuilt macOS binary in this release.** macOS server is supported
  by the codebase and distributed as a source build for now.
- **License required.** The server verifies a MinIO Software License on
  startup; a no-cost `play-free.license` is accepted on Linux for
  evaluation. Production deployments need an Enterprise Agreement.

---

## Documentation

- Hosted docs: <https://docs.min.io/memkv/>
- Embedded docs (in the binary): `memkv doc` serves the same site locally.

## Support

- Licensed customers: <https://subnet.min.io>
- Security disclosures: <security@min.io>
