Nakamichi

Nakamichi 'Dragoneye' highlights:

- The latest Zennish LZSS Microdeduplicator;
- File-to-File [de]compressor;
- Superfast decompression rates, superslow compression rates;
- On big (500++MB) textual data, second only to Hamid's LzTurbo 29, ratiowise, resourcewise and speedwise - TRIPLE TRUMP :P;
- Single-threaded Non-SIMD console tool written in plain C, compilable under Windows and Linux;
- An LZSS (Lempel–Ziv–Storer–Szymanski) implementation with Greedy Parsing and 1TB Sliding Window;
- Ability to deduplicate chunks as short as 64 bytes, reaching 1TB backwards;
- Targets huge textual datasets (mainly English), weak-'n'-slow on binary data;
- One goal is to boost traversing (full-text parsing) of the whole XML dump of Wikipedia, ~64GB strong, via TRANSPARENT decompression;
- The first matchfinder using both the fastest memmem() Railgun ‘Trolldom’ and B-trees;
- The first parser using either Internal or External RAM, decided by a single command line option - 'i' or 'e';
- Hashpot/hashpool (residing in Physical RAM) can be tuned via a command line parameter, thus lessening the B-trees' heights/attempts;
- The B-trees form the second layer, the first being a HASH table handled by FNV1A-Jesteress;
- The Leprechaunesque (Internal/External) B-trees of order 3 (2 keys MAX) are highly optimized;
- DEPRECATED (too slow): To keep the LEAF’s footprint small, keys 36/64 bytes long are hashed by SHA3-224, otherwise left intact;
- The building of B-trees is done in 128 PASSES, thus LOCALITY/LOCALIZATION leads to cache-friendliness; for example, instead of confusing/blinding
  the SSD controller by building 2^27 = 128M B-trees at a time, the 'PASSES' revision lowers the "noise/mayhem" 128 times by processing 1M B-trees at a time;
- 100% FREE;
- SCALABLE! Gets faster when more Physical and/or External RAM is available; on servers with 1TB RAM (or desktops with 64GB and a 1TB Optane SSD) it will dance...

TO-DO:
- Trivial: bring back building the B-trees in System RAM in passes, thus saving the SSD from thrashing (ONLY SEQUENTIAL DUMPS), and much faster too.
- 2019-Aug-15: INCOMING! Trivial: skip inserting UNIQUE KEYS into the B-trees, thus saving big_time and big_space; this revision is to be only ~50 lines of additional code.

URL TAG: http://www.sanmayce.com/Nakamichi/index.html#Highlights

Below is a depiction of how unwanted "matches" get discarded...
For example, on an i5-2430M with a Crucial MX200 256GB SSD, the 'Silesia' corpus was compressed at a rate of 988 B/s to 72,022,153 bytes.
If there were 293N = 60,650,304KB of System RAM to house the B-trees, the rate would change "a little": 30x at a minimum.

URL TAG: http://www.sanmayce.com/Nakamichi/index.html#SCREENSHOTS

What is the compression speed boost in the latest revision? Let us see the 'www.kernel.org_linux-4.20.7.tar' (854,876,160 bytes) test datafile...

The memory requirements and building speed of the old/deprecated 'SHA3' revision:
"Nakamichi_Ryuugan-ditto-1TB_RAM_(5GB)_PASSES_Intel150_64bit" www.kernel.org_linux-4.20.7.tar www.kernel.org_linux-4.20.7.tar.Nakamichi 27 232000 e
Leprechaun: RAM needed to house B-trees (relative to the file being ripped): 185N = 154,669,823KB
Leprechaun: Total IOPS for 10,756,401,287 'freads' and 10,609,824,046 'fwrites' (of packets 98 bytes long) during loading traversing all orders: 49,966 IOPS
The memory requirements and building speed of the latest (downloadable, a page below) 'PASSES' revision:
"Nakamichi_Ryuugan-ditto-1TB_RAM_(5GB)_PASSES_Intel150_64bit" www.kernel.org_linux-4.20.7.tar www.kernel.org_linux-4.20.7.tar.Nakamichi 27 232000 E
Leprechaun: RAM needed to house B-trees (relative to the file being ripped): 232N = 189,564MB
Leprechaun: Total IOPS for 10,767,962,228 'freads' and 10,609,717,802 'fwrites' (of packets 170 bytes long) during loading traversing all orders: 277,679 IOPS
So, 277,679 / 49,966 ≈ 5.56x faster building!
