Storage

In a typical transactional environment, the result of every successful action must be saved to physical storage with the right balance between durability and performance, minimizing losses in emergency situations while maximizing the availability and responsiveness of the system. We have tried to cover as many cases and requirements as possible by providing a rich set of configuration options.

First, we should define what a journal means in Reveno terms. A journal is a coupled pair of files: a transactions file and an events file. The first one persists all executed “transaction actions”, while the second one persists the executed events for each command, so they won’t be dispatched again on the next system replay.
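To make the split between the two files concrete, here is a minimal in-memory sketch, assuming transactions are identified by a numeric id. The class `JournalPairSketch` and its methods are hypothetical and not part of Reveno's API. On replay, every transaction in the transactions journal would be re-applied to rebuild state, but events are only re-dispatched for transactions that have no entry in the events journal.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model of a journal as two side-by-side logs (not Reveno's API).
final class JournalPairSketch {
    // Transactions journal: every executed transaction action, in execution order.
    final List<Long> txLog = new ArrayList<>();
    // Events journal: ids of transactions whose events have already been dispatched.
    final Set<Long> eventsLog = new HashSet<>();

    // Records a transaction; eventsDispatched=false models a crash before dispatching.
    void execute(long txId, boolean eventsDispatched) {
        txLog.add(txId);
        if (eventsDispatched) {
            eventsLog.add(txId);
        }
    }

    // Returns the transactions whose events still need dispatching after a replay.
    // All transactions in txLog would be re-applied to state regardless.
    List<Long> replayEventsToDispatch() {
        List<Long> pending = new ArrayList<>();
        for (long txId : txLog) {
            if (!eventsLog.contains(txId)) {
                pending.add(txId);
            }
        }
        return pending;
    }
}
```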

By default, Reveno stores everything directly on the OS file system, as this is the most efficient option: it has the fewest possible layers between your process and the hard disk. Journals are strictly append-only, representing a chain of “events” that can be replayed to restore a previous system state.
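The append-only-and-replay idea can be illustrated with a tiny file-backed log. This is a sketch under an assumed record layout (each record is a single signed `long` amount, and the rebuilt "state" is a running balance); it is not Reveno's on-disk format, and `AppendOnlyJournalSketch` is a hypothetical name.

```java
import java.io.BufferedInputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical append-only journal: records are only ever added at the end of the file.
final class AppendOnlyJournalSketch {
    static void append(Path journal, long amount) throws IOException {
        // Append mode: earlier records are never overwritten.
        try (DataOutputStream out = new DataOutputStream(
                new FileOutputStream(journal.toFile(), true))) {
            out.writeLong(amount);
        }
    }

    // Replays the chain of records in order to rebuild the state (here, a balance).
    static long replay(Path journal) throws IOException {
        long balance = 0;
        try (DataInputStream in = new DataInputStream(
                new BufferedInputStream(Files.newInputStream(journal)))) {
            while (true) {
                try {
                    balance += in.readLong();
                } catch (EOFException end) {
                    return balance; // reached the end of the journal
                }
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path journal = Files.createTempFile("journal", ".bin");
        append(journal, 100);
        append(journal, -30);
        append(journal, 5);
        System.out.println(replay(journal)); // prints 75
        Files.delete(journal);
    }
}
```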

Journaling can be configured to strike the best balance between throughput and durability for your case. Out of the box, there are four file channel options:

  • Unbuffered IO – the slowest option, where each transaction’s data is written directly to disk synchronously. Guarantees fault tolerance in case of either a VM or an OS crash.
  • OS-level buffering – a compromise between fault tolerance and write speed. This option relies on the OS’s internal page cache. All data persisted by transactions will survive a VM crash, but a small amount of data might still be lost in case of an OS crash.
  • VM-level buffering – one of the fastest options, but at the same time the least fault tolerant. A direct buffer is created for write operations and is eventually flushed to disk in batches. In case of a VM crash, a significant number of transactions might be lost.
  • Memory-mapped IO – one of the fastest options, and safer than the previous one. This option relies on the OS’s mmap capability. All data written by completed transactions will survive a VM crash, but some data might still be lost in case of an OS crash. Choose this option carefully, after comparing its performance against VM-level buffering, as memory-mapped files can perform poorly on some operating systems. It is also available only for pre-allocated volumes.
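The four options differ mainly in when written bytes leave the JVM and when they are forced to the device. The sketch below maps each one onto plain JDK NIO calls to illustrate that difference; it is an illustration only, not Reveno's actual channel implementation, and the `Mode` enum and `writeAll` method are hypothetical.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical illustration of four durability levels using JDK NIO (not Reveno code).
final class ChannelOptionsSketch {
    enum Mode { UNBUFFERED, OS_BUFFERED, VM_BUFFERED, MMAP }

    // Writes all records using the given mode; assumes each record fits in 64 KiB.
    static void writeAll(Path file, Mode mode, byte[][] records) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.CREATE,
                StandardOpenOption.READ, StandardOpenOption.WRITE,
                StandardOpenOption.TRUNCATE_EXISTING)) {
            switch (mode) {
                case UNBUFFERED:
                    for (byte[] r : records) {
                        writeFully(ch, ByteBuffer.wrap(r));
                        ch.force(true); // flush data and metadata to the device after every record
                    }
                    break;
                case OS_BUFFERED:
                    for (byte[] r : records) {
                        writeFully(ch, ByteBuffer.wrap(r)); // data sits in the OS page cache
                    }
                    break;
                case VM_BUFFERED: {
                    ByteBuffer buf = ByteBuffer.allocateDirect(64 * 1024);
                    for (byte[] r : records) {
                        if (buf.remaining() < r.length) {
                            drain(ch, buf); // batched flush once the JVM-side buffer fills up
                        }
                        buf.put(r); // until drained, the record exists only in JVM memory
                    }
                    drain(ch, buf);
                    break;
                }
                case MMAP: {
                    long total = 0;
                    for (byte[] r : records) total += r.length;
                    if (total == 0) break;
                    // Mapping needs a fixed-size region, so extend the file first
                    // (the role a pre-allocated volume plays).
                    ch.write(ByteBuffer.allocate(1), total - 1);
                    MappedByteBuffer map = ch.map(FileChannel.MapMode.READ_WRITE, 0, total);
                    for (byte[] r : records) map.put(r); // written pages are owned by the OS
                    map.force(); // explicit flush of the mapped region
                    break;
                }
            }
        }
    }

    private static void writeFully(FileChannel ch, ByteBuffer buf) throws IOException {
        while (buf.hasRemaining()) ch.write(buf);
    }

    private static void drain(FileChannel ch, ByteBuffer buf) throws IOException {
        buf.flip();
        writeFully(ch, buf);
        buf.clear();
    }
}
```

The trade-off shows up in where data lives after `writeAll` returns from each write: on the device (unbuffered), in the kernel (OS-level and mmap), or possibly still in a JVM buffer (VM-level).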

Each new journal is a separate file in the file system. Once the file is created, the engine starts appending new events to it. However, writing to a file that is not pre-allocated is bad for latency: the file has to be extended on nearly every write operation, which requires more syscalls, degrades mean latency, and can increase worst-case latency significantly. That’s where volumes come in.

A volume is a pre-allocated file that can eventually become a journal. When volumes are used, a number of additional actions take place.
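As an illustration of what pre-allocation means at the file level, the following sketch writes the full size of a volume up front, so later journal writes overwrite existing blocks instead of growing the file. Writing real zeros (rather than only setting the file length) is one common way to avoid sparse files; this is an assumption about technique, not a description of how Reveno allocates volumes, and `preallocate` is a hypothetical helper.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical volume pre-allocation helper (not part of Reveno's API).
final class VolumeSketch {
    static void preallocate(Path volume, long size) throws IOException {
        // allocateDirect returns a zero-initialized buffer; we never put data into it.
        ByteBuffer zeros = ByteBuffer.allocateDirect(64 * 1024);
        try (FileChannel ch = FileChannel.open(volume,
                StandardOpenOption.CREATE_NEW, StandardOpenOption.WRITE)) {
            long written = 0;
            while (written < size) {
                zeros.clear();
                zeros.limit((int) Math.min(zeros.capacity(), size - written));
                written += ch.write(zeros);
            }
            ch.force(true); // make sure the allocated blocks and file size reach the disk
        }
    }
}
```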

On startup, if there are no volumes available, the engine creates reveno.config().journaling().volumes() volumes. Otherwise, if the number of available volumes is less than reveno.config().journaling().minVolumes(), it allocates the remaining ones.
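The startup rule can be written as a small decision function. Here `volumes` and `minVolumes` stand for the values returned by reveno.config().journaling().volumes() and minVolumes(); the method itself is hypothetical, and it assumes that "the remaining ones" means topping up to exactly `minVolumes`.

```java
// Hypothetical encoding of the startup volume rule (not Reveno's implementation).
final class StartupVolumesSketch {
    static int volumesToAllocate(int available, int volumes, int minVolumes) {
        if (available == 0) {
            return volumes;                // fresh start: create the full configured set
        }
        if (available < minVolumes) {
            return minVolumes - available; // assumption: top up to the minimum
        }
        return 0;                          // enough spare volumes already exist
    }
}
```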

At runtime, when the working journal becomes full, the engine rolls over to the next available volume. At some point (when the number of available volumes gets close to minVolumes()), Reveno starts preparing new volumes in parallel with the current execution. In the worst case, when the journal is full and no new volume is available, the engine blocks execution until the next one is ready.
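The rolling behavior can be sketched with a blocking queue of ready volumes and a background thread that prepares replacements. Volumes are modeled as integer ids, and the class, its refill condition, and the "prepare" step are hypothetical stand-ins for Reveno's internals. `BlockingQueue.take()` reproduces the worst case described above: if no volume is ready, the caller waits until one is.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical volume roller (not Reveno's implementation); rollToNext() is meant
// to be called from a single writer thread.
final class VolumeRollerSketch implements AutoCloseable {
    private final BlockingQueue<Integer> ready = new LinkedBlockingQueue<>();
    private final ExecutorService preparer = Executors.newSingleThreadExecutor();
    private final AtomicInteger nextId = new AtomicInteger();
    private final AtomicInteger inFlight = new AtomicInteger(); // scheduled, not yet ready
    private final int minVolumes;

    VolumeRollerSketch(int initialVolumes, int minVolumes) {
        this.minVolumes = minVolumes;
        for (int i = 0; i < initialVolumes; i++) {
            ready.add(nextId.getAndIncrement());
        }
    }

    // Called when the current journal is full; blocks only if no volume is ready.
    int rollToNext() throws InterruptedException {
        int volume = ready.take();
        // Close to the minimum: prepare replacements in parallel with execution.
        while (ready.size() + inFlight.get() < minVolumes) {
            inFlight.incrementAndGet();
            preparer.submit(() -> {
                // Real code would pre-allocate a volume file here.
                ready.add(nextId.getAndIncrement());
                inFlight.decrementAndGet();
            });
        }
        return volume;
    }

    @Override
    public void close() {
        preparer.shutdownNow();
    }
}
```

Because a single thread prepares volumes in order and the queue is FIFO, volumes are handed out in the order they were created.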

That said, you should configure the number of volumes so that this blocking never happens. To achieve that, estimate your engine load and set the reveno.config().journaling().volumesSize() property accordingly.
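A back-of-the-envelope calculation is a reasonable starting point for that estimate: it tells you how often the engine rolls over to a new volume at a given load, which you can then compare with how quickly new volumes can be prepared. The load figures below (50,000 transactions per second at about 200 bytes each) are made-up illustrative inputs, not measured Reveno numbers, and both methods are hypothetical helpers.

```java
// Hypothetical sizing helpers; measure your own transaction rate and record size.
final class VolumeSizingSketch {
    // Seconds until one volume of volumeBytes is filled at the given load.
    static double secondsPerVolume(long volumeBytes, long txPerSecond, long bytesPerTx) {
        return (double) volumeBytes / ((double) txPerSecond * bytesPerTx);
    }

    // Volume size needed so that one volume lasts at least targetSeconds.
    static long volumeBytesFor(double targetSeconds, long txPerSecond, long bytesPerTx) {
        return (long) Math.ceil(targetSeconds * txPerSecond * bytesPerTx);
    }

    public static void main(String[] args) {
        // 50,000 tx/s * 200 bytes = 10,000,000 bytes/s of journal writes.
        System.out.println(secondsPerVolume(512L * 1024 * 1024, 50_000, 200)); // about 53.7
        System.out.println(volumeBytesFor(60, 50_000, 200)); // 600000000
    }
}
```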