diff options
Diffstat (limited to 'libs/libmdbx/src/README.md')
-rw-r--r-- | libs/libmdbx/src/README.md | 539 |
1 files changed, 316 insertions, 223 deletions
diff --git a/libs/libmdbx/src/README.md b/libs/libmdbx/src/README.md index 78cc906763..a5b420f35b 100644 --- a/libs/libmdbx/src/README.md +++ b/libs/libmdbx/src/README.md @@ -1,159 +1,139 @@ +### The [repository now only mirrored on the Github](https://abf.io/erthink/libmdbx) due to illegal discriminatory restrictions for Russian Crimea and for sovereign crimeans. +<!-- Required extensions: pymdownx.betterem, pymdownx.tilde, pymdownx.emoji, pymdownx.tasklist, pymdownx.superfences --> +----- + libmdbx ====================================== -**Revised and extended descendant of [Symas LMDB](https://symas.com/lmdb/).** +Revised and extended descendant of [Lightning Memory-Mapped Database](https://en.wikipedia.org/wiki/Lightning_Memory-Mapped_Database) (aka _LMDB_). +Русскоязычная версия [здесь](README-RU.md). -*The Future will be positive.* -[](https://travis-ci.org/leo-yuriev/libmdbx) -[](https://ci.appveyor.com/project/leo-yuriev/libmdbx/branch/master) -[](https://scan.coverity.com/projects/reopen-libmdbx) +_libmdbx_ is superior to LMDB in terms of features and reliability, not +inferior in performance. _libmdbx_ works on Linux, FreeBSD, MacOS X and +other systems compliant with POSIX.1-2008, but also support Windows as a +complementary platform. -## Project Status for now - - - The stable versions - ([_stable/0.0_](https://github.com/leo-yuriev/libmdbx/tree/stable/0.0) - and - [_stable/0.1_](https://github.com/leo-yuriev/libmdbx/tree/stable/0.1) - branches) of _MDBX_ are frozen, i.e. no new features or API changes, but - only bug fixes. - - - The next version - ([_devel_](https://github.com/leo-yuriev/libmdbx/tree/devel) branch) - **is under active non-public development**, i.e. current API and set of - features are extreme volatile. - - - The immediate goal of development is formation of the stable API and - the stable internal database format, which allows realise all PLANNED - FEATURES: - 1. Integrity check by [Merkle tree](https://en.wikipedia.org/wiki/Merkle_tree); - 2. Support for [raw block devices](https://en.wikipedia.org/wiki/Raw_device); - 3. Separate place (HDD) for large data items; - 4. Using "[Roaring bitmaps](http://roaringbitmap.org/about/)" inside garbage collector; - 5. Non-sequential reclaiming, like PostgreSQL's [Vacuum](https://www.postgresql.org/docs/9.1/static/sql-vacuum.html); - 6. [Asynchronous lazy data flushing](https://sites.fas.harvard.edu/~cs265/papers/kathuria-2008.pdf) to disk(s); - 7. etc... - -Don't miss libmdbx for other runtimes. - -| Runtime | GitHub | Author | -| ------------- | ------------- | ------------- | -| JVM | [mdbxjni](https://github.com/castortech/mdbxjni) | [Castor Technologies](https://castortech.com/) | -| .NET | [mdbx.NET](https://github.com/wangjia184/mdbx.NET) | [Jerry Wang](https://github.com/wangjia184) | +The next version is under active non-public development, which will +radically change both the API and the database format. The goal of this +revolution is to provide a clearer and more reliable API, add more +features and new database properties. ------ +*The Future will (be) [Positive](https://www.ptsecurity.com). Всё будет хорошо.* -Nowadays MDBX intended for Linux, and support Windows (since Windows -Server 2008) as a complementary platform. Support for other OS could be -implemented on commercial basis. However such enhancements (i.e. pull -requests) could be accepted in mainstream only when corresponding public -and free Continuous Integration service will be available. +[](https://travis-ci.org/leo-yuriev/libmdbx) +[](https://ci.appveyor.com/project/leo-yuriev/libmdbx/branch/master) +[](https://scan.coverity.com/projects/reopen-libmdbx) -## Contents +## Table of Contents - [Overview](#overview) - - [Comparison with other DBs](#comparison-with-other-dbs) - - [History & Acknowledgments](#history) -- [Main features](#main-features) -- [Improvements over LMDB](#improvements-over-lmdb) -- [Gotchas](#gotchas) - - [Long-time read transactions problem](#long-time-read-transactions-problem) - - [Data safety in async-write-mode](#data-safety-in-async-write-mode) + - [Comparison with other databases](#comparison-with-other-databases) + - [History & Acknowledgments](#history) +- [Description](#description) + - [Key features](#key-features) + - [Improvements over LMDB](#improvements-over-lmdb) + - [Gotchas](#gotchas) + - [Problem of long-time reading](#problem-of-long-time-reading) + - [Durability in asynchronous writing mode](#durability-in-asynchronous-writing-mode) +- [Usage](#usage) + - [Building](#building) + - [Bindings](#bindings) - [Performance comparison](#performance-comparison) - - [Integral performance](#integral-performance) - - [Read scalability](#read-scalability) - - [Sync-write mode](#sync-write-mode) - - [Lazy-write mode](#lazy-write-mode) - - [Async-write mode](#async-write-mode) - - [Cost comparison](#cost-comparison) + - [Integral performance](#integral-performance) + - [Read scalability](#read-scalability) + - [Sync-write mode](#sync-write-mode) + - [Lazy-write mode](#lazy-write-mode) + - [Async-write mode](#async-write-mode) + - [Cost comparison](#cost-comparison) +----- ## Overview _libmdbx_ is an embedded lightweight key-value database engine oriented -for performance under Linux and Windows. +for performance. _libmdbx_ allows multiple processes to read and update several key-value tables concurrently, while being [ACID](https://en.wikipedia.org/wiki/ACID)-compliant, with minimal -overhead and operation cost of Olog(N). - -_libmdbx_ provides -[serializability](https://en.wikipedia.org/wiki/Serializability) and -consistency of data after crash. Read-write transactions don't block -read-only transactions and are -[serialized](https://en.wikipedia.org/wiki/Serializability) by -[mutex](https://en.wikipedia.org/wiki/Mutual_exclusion). +overhead and Olog(N) operation cost. -_libmdbx_ +_libmdbx_ enforce +[serializability](https://en.wikipedia.org/wiki/Serializability) for +writers by single +[mutex](https://en.wikipedia.org/wiki/Mutual_exclusion) and affords [wait-free](https://en.wikipedia.org/wiki/Non-blocking_algorithm#Wait-freedom) -provides parallel read transactions without atomic operations or -synchronization primitives. +for parallel readers without atomic/interlocked operations, while +writing and reading transactions do not block each other. + +_libmdbx_ can guarantee consistency after crash depending of operation +mode. _libmdbx_ uses [B+Trees](https://en.wikipedia.org/wiki/B%2B_tree) and -[mmap](https://en.wikipedia.org/wiki/Memory-mapped_file), doesn't use -[WAL](https://en.wikipedia.org/wiki/Write-ahead_logging). This might -have caveats for some workloads. +[Memory-Mapping](https://en.wikipedia.org/wiki/Memory-mapped_file), +doesn't use [WAL](https://en.wikipedia.org/wiki/Write-ahead_logging) +which might be a caveat for some workloads. -### Comparison with other DBs -Because _libmdbx_ is currently overhauled, I think it's better to just -link [chapter of Comparison with other -databases](https://github.com/coreos/bbolt#comparison-with-other-databases) -here. +### Comparison with other databases +For now please refer to [chapter of "BoltDB comparison with other +databases"](https://github.com/coreos/bbolt#comparison-with-other-databases) +which is also (mostly) applicable to MDBX. ### History The _libmdbx_ design is based on [Lightning Memory-Mapped Database](https://en.wikipedia.org/wiki/Lightning_Memory-Mapped_Database). Initial development was going in -[ReOpenLDAP](https://github.com/leo-yuriev/ReOpenLDAP) project, about a -year later it received separate development effort and in autumn 2015 -was isolated to separate project, which was [presented at Highload++ -2015 conference](http://www.highload.ru/2015/abstracts/1831.html). +[ReOpenLDAP](https://github.com/leo-yuriev/ReOpenLDAP) project. About a +year later libmdbx was isolated to separate project, which was +[presented at Highload++ 2015 +conference](http://www.highload.ru/2015/abstracts/1831.html). -Since early 2017 _libmdbx_ is used in [Fast PositiveTables](https://github.com/leo-yuriev/libfpta), -by [Positive Technologies](https://www.ptsecurity.com). +Since early 2017 _libmdbx_ is used in [Fast Positive Tables](https://github.com/leo-yuriev/libfpta), +and development is funded by [Positive Technologies](https://www.ptsecurity.com). -#### Acknowledgments -Howard Chu (Symas Corporation) - the author of LMDB, from which +### Acknowledgments +Howard Chu <hyc@openldap.org> is the author of LMDB, from which originated the MDBX in 2015. -Martin Hedenfalk <martin@bzero.se> - the author of `btree.c` code, which +Martin Hedenfalk <martin@bzero.se> is the author of `btree.c` code, which was used for begin development of LMDB. +----- + +Description +=========== -Main features -============= +## Key features -_libmdbx_ inherits all keys features and characteristics from +_libmdbx_ inherits all features and characteristics from [LMDB](https://en.wikipedia.org/wiki/Lightning_Memory-Mapped_Database): -1. Data is stored in ordered map, keys are always sorted, range lookups -are supported. +1. Key-value pairs are stored in ordered map(s), keys are always sorted, +range lookups are supported. -2. Data is [mmaped](https://en.wikipedia.org/wiki/Memory-mapped_file) to -memory of each worker DB process, read transactions are zero-copy. +2. Data is [memory-mapped](https://en.wikipedia.org/wiki/Memory-mapped_file) +into each worker DB process, and could be accessed zero-copy from transactions. 3. Transactions are -[ACID](https://en.wikipedia.org/wiki/ACID)-compliant, thanks to +[ACID](https://en.wikipedia.org/wiki/ACID)-compliant, through to [MVCC](https://en.wikipedia.org/wiki/Multiversion_concurrency_control) and [CoW](https://en.wikipedia.org/wiki/Copy-on-write). Writes are strongly serialized and aren't blocked by reads, transactions can't conflict with each other. Reads are guaranteed to get only commited data ([relaxing serializability](https://en.wikipedia.org/wiki/Serializability#Relaxing_serializability)). -4. Reads and queries are +4. Read transactions are [non-blocking](https://en.wikipedia.org/wiki/Non-blocking_algorithm), -don't use [atomic -operations](https://en.wikipedia.org/wiki/Linearizability#High-level_atomic_operations). +don't use [atomic operations](https://en.wikipedia.org/wiki/Linearizability#High-level_atomic_operations). Readers don't block each other and aren't blocked by writers. Read performance scales linearly with CPU core count. - > Though "connect to DB" (start of first read transaction in thread) and - > "disconnect from DB" (shutdown or thread termination) requires to - > acquire a lock to register/unregister current thread from "readers - > table" + > Nonetheless, "connect to DB" (starting the first read transaction in a thread) and + > "disconnect from DB" (closing DB or thread termination) requires a lock + > acquisition to register/unregister at the "readers table". 5. Keys with multiple values are stored efficiently without key -duplication, sorted by value, including integers (reasonable for +duplication, sorted by value, including integers (valuable for secondary indexes). -6. Efficient operation on short fixed length keys, including integer -ones. +6. Efficient operation on short fixed length keys, +including 32/64-bit integer types. 7. [WAF](https://en.wikipedia.org/wiki/Write_amplification) (Write Amplification Factor) и RAF (Read Amplification Factor) are Olog(N). @@ -161,14 +141,12 @@ Amplification Factor) и RAF (Read Amplification Factor) are Olog(N). 8. No [WAL](https://en.wikipedia.org/wiki/Write-ahead_logging) and transaction journal. In case of a crash no recovery needed. No need for regular maintenance. Backups can be made on the fly on working DB - without freezing writers. +without freezing writers. -9. No custom memory management, all done with standard OS syscalls. +9. No additional memory management, all done by basic OS services. --------------------------------------------------------------------------------- -Improvements over LMDB -====================== +## Improvements over LMDB 1. Automatic dynamic DB size management according to the parameters specified by `mdbx_env_set_geometry()` function. Including @@ -192,21 +170,23 @@ additional resources for that. [BBWC](https://en.wikipedia.org/wiki/Disk_buffer#Write_acceleration) this may greatly improve write performance. -4. Fast estimation of range query result size via functions `mdbx_estimate_range()`, -`mdbx_estimate_move()` and `mdbx_estimate_distance()`. E.g. for selection the -optimal query execution plan. +4. Fast estimation of range query result size via functions +`mdbx_estimate_range()`, `mdbx_estimate_move()` and +`mdbx_estimate_distance()`. E.g. for selection the optimal query +execution plan. 5. `mdbx_chk` tool for DB integrity check. -6. Support for keys and values of zero length, including sorted -duplicates. +6. Support for keys and values of zero length, including multi-values +(aka sorted duplicates). -7. Ability to assign up to 3 markers to commiting transaction with -`mdbx_canary_put()` and then get them in read transaction by -`mdbx_canary_get()`. +7. Ability to assign up to 3 persistent 64-bit markers to commiting +transaction with `mdbx_canary_put()` and then get them in read +transaction by `mdbx_canary_get()`. 8. Ability to update or delete record and get previous value via -`mdbx_replace()`. Also can update specific multi-value. +`mdbx_replace()`. Also allows update the specific item from multi-value +with the same key. 9. Sequence generation via `mdbx_dbi_sequence()`. @@ -228,30 +208,29 @@ duplicates. * abort current write transaction with returning error code. -11. Ability to open DB in exclusive mode with `MDBX_EXCLUSIVE` flag. +11. Ability to open DB in exclusive mode by `MDBX_EXCLUSIVE` flag. -12. Ability to get how far current read-only snapshot is from latest -version of the DB by `mdbx_txn_straggler()`. +12. Ability to get how far current read-transaction snapshot lags +from the latest version of the DB by `mdbx_txn_straggler()`. -13. Ability to explicitly request update of present record without -creating new record. Implemented as `MDBX_CURRENT` flag for -`mdbx_put()`. +13. Ability to explicitly update the existing record, not insertion +a new one. Implemented as `MDBX_CURRENT` flag for `mdbx_put()`. 14. Fixed `mdbx_cursor_count()`, which returns correct count of -duplicated for all table types and any cursor position. +duplicated (aka multi-value) for all cases and any cursor position. 15. `mdbx_env_info()` to getting additional info, including number of -the oldest snapshot of DB, which is used by one of the readers. +the oldest snapshot of DB, which is used by someone of the readers. 16. `mdbx_del()` doesn't ignore additional argument (specifier) `data` for tables without duplicates (without flag `MDBX_DUPSORT`), if `data` is not null then always uses it to verify record, which is being deleted. -17. Ability to open dbi-table with simultaneous setup of comparators for -keys and values, via `mdbx_dbi_open_ex()`. +17. Ability to open dbi-table with simultaneous with race-free setup +of comparators for keys and values, via `mdbx_dbi_open_ex()`. -18. `mdbx_is_dirty()`to find out if key or value is on dirty page, that +18. `mdbx_is_dirty()`to find out if given key or value is on dirty page, that useful to avoid copy-out before updates. 19. Correct update of current record in `MDBX_CURRENT` mode of @@ -266,21 +245,21 @@ useful to avoid copy-out before updates. 22. Ability to get value by key and duplicates count by `mdbx_get_ex()`. 23. Functions `mdbx_cursor_on_first()` and `mdbx_cursor_on_last()`, -which allows to know if cursor is currently on first or last position +which allows to check cursor is currently on first or last position respectively. -24. Automatic creation of synchronization points (flush changes to -persistent storage) when changes reach set threshold (threshold can be -set by `mdbx_env_set_syncbytes()`). +24. Automatic creation of steady commit-points (flushing data to the +disk) when the volume of changes reaches a threshold, which can be +set by `mdbx_env_set_syncbytes()`. 25. Control over debugging and receiving of debugging messages via `mdbx_setup_debug()`. -26. Function `mdbx_env_pgwalk()` for page-walking all pages in DB. +26. Function `mdbx_env_pgwalk()` for page-walking the DB. -27. Three meta-pages instead of two, this allows to guarantee -consistently update weak sync-points without risking to corrupt last -steady sync-point. +27. Three meta-pages instead of two, that allows to guarantee +consistency of data when updating weak commit-points without the +risk of damaging the last steady commit-point. 28. Guarantee of DB integrity in `WRITEMAP+MAPSYNC` mode: > Current _libmdbx_ gives a choice of safe async-write mode (default) @@ -292,9 +271,9 @@ steady sync-point. creation of steady synchronization point) via `mdbx_env_close_ex()`. 30. If read transaction is aborted via `mdbx_txn_abort()` or -`mdbx_txn_reset()` then DBI-handles, which were opened in it, aren't -closed or deleted. This allows to avoid several types of hard-to-debug -errors. +`mdbx_txn_reset()` then DBI-handles, which were opened during it, +will not be closed or deleted. In several cases this allows +to avoid hard-to-debug errors. 31. All cursors in all read and write transactions can be reused by `mdbx_cursor_renew()` and MUST be freed explicitly. @@ -307,13 +286,29 @@ errors. > - double-free; > - memory corruption and segfaults. --------------------------------------------------------------------------------- + +32. On **Mac OS X** the `fcntl(F_FULLFSYNC)` syscall is used _by +default_ to synchronize data with the disk, as this is [the only way to +guarantee data +durability](https://developer.apple.com/library/archive/documentation/System/Conceptual/ManPages_iPhoneOS/man2/fsync.2.html) +in case of power failure. Unfortunately, in scenarios with high write +intensity, the use of `F_FULLFSYNC` significant degrades performance +compared to LMDB, where the `fsync()` syscall is used. Therefore, +_libmdbx_ allows you to override this behavior by defining the +`MDBX_OSX_SPEED_INSTEADOF_DURABILITY=1` option while build the library. + +33. On **Windows** the `LockFileEx()` syscall is used for locking, since +it allows place the database on network drives, and provides protection +against incompetent user actions (aka +[poka-yoke](https://en.wikipedia.org/wiki/Poka-yoke)). Therefore +_libmdbx_ may be a little lag in performance tests from LMDB where a +named mutexes are used. + ## Gotchas -1. At one moment there can be only one writer. But this allows to -serialize writes and eliminate any possibility of conflict or logical -errors during transaction rollback. +1. There cannot be more than one writer at a time. This allows serialize an +updates and eliminate any possibility of conflicts, deadlocks or logical errors. 2. No [WAL](https://en.wikipedia.org/wiki/Write-ahead_logging) means relatively big [WAF](https://en.wikipedia.org/wiki/Write_amplification) @@ -346,7 +341,7 @@ performance bottleneck in `MAPASYNC` mode. > storage then it's much more preferable to use `std::map`. -4. LMDB has a problem of long-time readers which degrades performance +4. _LMDB_ has a problem of long-time readers which degrades performance and bloats DB. > _libmdbx_ addresses that, details below. @@ -357,56 +352,41 @@ of data. > Details below. -#### Long-time read transactions problem +### Problem of long-time reading Garbage collection problem exists in all databases one way or another (e.g. VACUUM in PostgreSQL). But in _libmdbx_ and LMDB it's even more -important because of high performance and deliberate simplification of -internals with emphasis on performance. +discernible because of high transaction rate and intentional internals +simplification in favor of performance. + +Understanding the problem requires some explanation, but can be +difficult for quick perception. So is is reasonable +to simplify this as follows: -* Altering data during long read operation may exhaust available space -on persistent storage. +* Massive altering of data during a parallel long read operation may +exhaust the free DB space. -* If available space is exhausted then any attempt to update data -results in `MAP_FULL` error until long read operation ends. +* If the available space is exhausted, any attempt to update the data +will cause a "MAP_FULL" error until a long read transaction is +completed. -* Main examples of long readers is hot backup and debugging of client -application which actively uses read transactions. +* A good example of long readers is a hot backup or debugging of +a client application while retaining an active read transaction. * In _LMDB_ this results in degraded performance of all operations of -syncing data to persistent storage. - -* _libmdbx_ has a mechanism which aborts such operations and `LIFO RECLAIM` -mode which addresses performance degradation. - -Read operations operate only over snapshot of DB which is consistent on -the moment when read transaction started. This snapshot doesn't change -throughout the transaction but this leads to inability to reclaim the -pages until read transaction ends. - -In _LMDB_ this leads to a problem that memory pages, allocated for -operations during long read, will be used for operations and won't be -reclaimed until DB process terminates. In _LMDB_ they are used in -[FIFO](https://en.wikipedia.org/wiki/FIFO_(computing_and_electronics)) -manner, which causes increased page count and less chance of cache hit -during I/O. In other words: one long-time reader can impact performance -of all database until it'll be reopened. - -_libmdbx_ addresses the problem, details below. Illustrations to this -problem can be found in the -[presentation](http://www.slideshare.net/leoyuriev/lmdb). There is also -example of performance increase thanks to -[BBWC](https://en.wikipedia.org/wiki/Disk_buffer#Write_acceleration) -when `LIFO RECLAIM` enabled in _libmdbx_. - -#### Data safety in async-write mode -In `WRITEMAP+MAPSYNC` mode dirty pages are written to persistent storage -by kernel. This means that in case of application crash OS kernel will -write all dirty data to disk and nothing will be lost. But in case of -hardware malfunction or OS kernel fatal error only some dirty data might -be synced to disk, and there is high probability that pages with -metadata saved, will point to non-saved, hence non-existent, data pages. -In such situation, DB is completely corrupted and can't be repaired even -if there was full sync before the crash via `mdbx_env_sync(). +writing data to persistent storage. + +* _libmdbx_ has the `OOM-KICK` mechanism which allow to abort such +operations and the `LIFO RECLAIM` mode which addresses performance +degradation. + +### Durability in asynchronous writing mode +In `WRITEMAP+MAPSYNC` mode updated (aka dirty) pages are written to +persistent storage by the OS kernel. This means that if the application +fails, the OS kernel will finish writing all updated data to disk and +nothing will be lost. However, in the case of hardware malfunction or OS +kernel fatal error, only some updated data can be written to disk and +the database structure is likely to be destroyed. In such situation, DB +is completely corrupted and can't be repaired. _libmdbx_ addresses this by fully reimplementing write path of data: @@ -414,39 +394,154 @@ _libmdbx_ addresses this by fully reimplementing write path of data: instead their shadow copies are used and their updates are synced after data is flushed to disk. -* During transaction commit _libmdbx_ marks synchronization points as -steady or weak depending on how much synchronization needed between RAM -and persistent storage, e.g. in `WRITEMAP+MAPSYNC` commited transactions -are marked as weak, but during explicit data synchronization - as -steady. +* During transaction commit _libmdbx_ marks it as a steady or weak +depending on synchronization status between RAM and persistent storage. +For instance, in the `WRITEMAP+MAPSYNC` mode committed transactions +are marked as weak by default, but as steady after explicit data flushes. * _libmdbx_ maintains three separate meta-pages instead of two. This -allows to commit transaction with steady or weak synchronization point -without losing two previous synchronization points (one of them can be -steady, and second - weak). This allows to order weak and steady -synchronization points in any order without losing consistency in case -of system crash. - -* During DB open _libmdbx_ rollbacks to the last steady synchronization -point, this guarantees database integrity. - -For data safety pages which form database snapshot with steady -synchronization point must not be updated until next steady -synchronization point. So last steady synchronization point creates -"long-time read" effect. The only difference that in case of memory -exhaustion the problem will be immediately addressed by flushing changes -to persistent storage and forming new steady synchronization point. +allows to commit transaction as steady or weak without losing two +previous commit points (one of them can be steady, and another +weak). Thus, after a fatal system failure, it will be possible to +rollback to the last steady commit point. + +* During DB open _libmdbx_ rollbacks to the last steady commit point, +this guarantees database integrity after a crash. However, if the +database opening in read-only mode, such rollback cannot be performed +which will cause returning the MDBX_WANNA_RECOVERY error. + +For data integrity a pages which form database snapshot with steady +commit point, must not be updated until next steady commit point. +Therefore the last steady commit point creates an effect analogues to +"long-time read". The only difference that now in case of space +exhaustion the problem will be immediately addressed by writing changes +to disk and forming the new steady commit point. + +So in async-write mode _libmdbx_ will always use new pages until the +free DB space will be exhausted or `mdbx_env_sync()` will be invoked, +and the total write traffic to the disk will be the same as in +sync-write mode. + +Currently libmdbx gives a choice between a safe async-write mode +(default) and `UTTERLY_NOSYNC` mode which may lead to DB corruption +after a system crash, i.e. like the LMDB. + +Next version of _libmdbx_ will be automatically create steady commit +points in async-write mode upon completion transfer data to the disk. -So in async-write mode _libmdbx_ will always use new pages until memory -is exhausted or `mdbx_env_sync()` is invoked. Total disk usage will be -almost the same as in sync-write mode. +-------------------------------------------------------------------------------- -Current _libmdbx_ gives a choice of safe async-write mode (default) and -`UTTERLY_NOSYNC` mode which may result in full DB corruption during -system crash as with LMDB. +Usage +===== + +## Building + +To build on all platforms except Windows the prerequirements are the +same: non-obsolete versions of GNU Make, +[bash](https://en.wikipedia.org/wiki/Bash_(Unix_shell)), C and C++ +compilers compatible with GCC or CLANG. On Windows you will need only : +Microsoft Visual Studio 2015 or later, Windows SDK for Windows 8 or +later. + +Historically, the libmdbx builing is based on single +[Makefile](https://en.wikipedia.org/wiki/Makefile) which assumes +different recipes depending on target platform. In the next versions, it +is planned to switch to [CMake](https://en.wikipedia.org/wiki/CMake), +with the refusal to support other tools. + +#### DSO/DLL unloading and destructors of Thread-Local-Storage objects +When building _libmdbx_ as a shared library or use static _libmdbx_ as a +part of another dynamic library, it is advisable to make sure that your +system ensures the correctness of the call destructors of +Thread-Local-Storage objects when unloading dynamic libraries'. + +If this is not the case, then unloading a dynamic-link library with +_libmdbx_ code inside, can result in either a resource leak or a crash +due to calling destructors from an already unloaded DSO/DLL object. The +problem can only manifest in a multithreaded application, which makes +the unloading of shared dynamic libraries with _libmdbx_ code inside, +after using _libmdbx_. It is known that TLS-destructors are properly +maintained in the following cases: + +- On all modern versions of Windows (Windows 7 and later). + +- On systems with the +[`__cxa_thread_atexit_impl()`](https://sourceware.org/glibc/wiki/Destructor%20support%20for%20thread_local%20variables) +function in the standard C library, including systems with GNU libc +version 2.18 and later. + +- On systems with libpthread/ntpl from GNU libc with bug fixes +[#21031](https://sourceware.org/bugzilla/show_bug.cgi?id=21031) and +[#21032](https://sourceware.org/bugzilla/show_bug.cgi?id=21032), or +where there are no similar bugs in the pthreads implementation. + +### Linux and other platforms with GNU Make +To build the library it is enough to execute `make all` in the directory +of source code, and `make check` for execute the basic tests. + +If the `make` installed on the system is not GNU Make, there will be a +lot of errors from make when trying to build. In this case, perhaps you +should use `gmake` instead of `make`, or even `gnu-make`, etc. + +### FreeBSD and related platforms +As a rule, in such systems, the default is to use Berkeley Make. And GNU +Make is called by the gmake command or may be missing. In addition, +[bash](https://en.wikipedia.org/wiki/Bash_(Unix_shell)) may be absent. + +You need to install the required components: GNU Make, bash, C and C++ +compilers compatible with GCC or CLANG. After that, to build the +library, it is enough execute `gmake all` (or `make all`) in the +directory with source code, and `gmake check` (or `make check`) to run +the basic tests. + +### Windows +For building _libmdbx_ on Windows the [Microsoft Visual +Studio](https://en.wikipedia.org/wiki/Microsoft_Visual_Studio) is +recommended, but not tools such as MinGW, MSYS, or Cygwin. To do this, +the libmdbx source code includes the set of appropriate project files +that are compatible with Visual Studio 2015, the Windows SDK for Windows +8.1, and later. Just open `mdbx.sln` in Visual Studio and build the +library. + +To build with newer versions of the SDK or Visual Studio, it should be +sufficient to execute "Retarget solution". To build for older versions +of Windows (such as Windows XP) or by older compilers, you will need to +convert or recreate the corresponding project files yourself. + +Building by MinGW, MSYS or Cygwin is potentially possible. However, +these scripts are not tested and will probably require you to modify the +Makefile. It should be noted that in _libmdbx_ was efforts to resolve +runtime dependencies from CRT and other libraries Visual Studio. +For this is enough define the `MDBX_AVOID_CRT` during build. + +An example of running a basic test script can be found in the +[CI-script](appveyor.yml) for [AppVeyor](https://www.appveyor.com/). To +run the [long stochastic test scenario](test/long_stochastic.sh), +[bash](https://en.wikipedia.org/wiki/Bash_(Unix_shell)) is required, and +the such testing is recommended with place the test data on the +[RAM-disk](https://en.wikipedia.org/wiki/RAM_drive). + +### MacOS X +Current [native build tools](https://en.wikipedia.org/wiki/Xcode) for +MacOS X include GNU Make, CLANG and an outdated version of bash. +Therefore, to build the library, it is enough to run `make all` in the +directory with source code, and run `make check` to execute the base +tests. If something goes wrong, it is recommended to install +[Homebrew](https://brew.sh/) and try again. + +To run the [long stochastic test scenario](test/long_stochastic.sh), you +will need to install the current (not outdated) version of +[bash](https://en.wikipedia.org/wiki/Bash_(Unix_shell)). To do this, we +recommend that you install [Homebrew](https://brew.sh/) and then execute +`brew install bash`. + +## Bindings + + | Runtime | GitHub | Author | + | -------- | ------ | ------ | + | Java | [mdbxjni](https://github.com/castortech/mdbxjni) | [Castor Technologies](https://castortech.com/) | + | .NET | [mdbx.NET](https://github.com/wangjia184/mdbx.NET) | [Jerry Wang](https://github.com/wangjia184) | -Next version of _libmdbx_ will create steady synchronization points -automatically in async-write mode. -------------------------------------------------------------------------------- @@ -458,9 +553,7 @@ and multiple [scripts](https://github.com/pmwkaa/ioarena/tree/HL%2B%2B2015) runs on Lenovo Carbon-2 laptop, i7-4600U 2.1 GHz, 8 Gb RAM, SSD SAMSUNG MZNTD512HAGL-000L1 (DXT23L0Q) 512 Gb. --------------------------------------------------------------------------------- - -### Integral performance +## Integral performance Here showed sum of performance metrics in 3 benchmarks: @@ -486,7 +579,7 @@ Here showed sum of performance metrics in 3 benchmarks: -------------------------------------------------------------------------------- -### Read Scalability +## Read Scalability Summary performance with concurrent read/search queries in 1-2-4-8 threads on 4 CPU cores machine. @@ -495,7 +588,7 @@ threads on 4 CPU cores machine. -------------------------------------------------------------------------------- -### Sync-write mode +## Sync-write mode - Linear scale on left and dark rectangles mean arithmetic mean transactions per second; @@ -517,7 +610,7 @@ and after full run the database contains 10,000 small key-value records. -------------------------------------------------------------------------------- -### Lazy-write mode +## Lazy-write mode - Linear scale on left and dark rectangles mean arithmetic mean of thousands transactions per second; @@ -544,7 +637,7 @@ records. -------------------------------------------------------------------------------- -### Async-write mode +## Async-write mode - Linear scale on left and dark rectangles mean arithmetic mean of thousands transactions per second; @@ -568,7 +661,7 @@ and after full run the database contains 10,000 small key-value records. -------------------------------------------------------------------------------- -### Cost comparison +## Cost comparison Summary of used resources during lazy-write mode benchmarks: |