Nakamichi 'Hayabusa' a.k.a. 'Peregrine'
Falconinely fast ( ͡⚆ ͜ʖ ͡⚆) textual decompressor


Nakamichi

/Photo by Masaki/


高速伸長
Above four kanjis were written as a reply to paying my respects to Prof. Okumura, they are the motto/banner of this page.
Fastest textual decompression is targeted, here comes Nakamichi.

Update, 2015-Feb-12:

Couldn't resist and finally wrote 'Hayabusa' (peregrine falcon), a lightweighted 'Tengu' with Match Lengths 8 and 4.
Nakamichi 'Tengu' is a native XMM (128bit) decompression etude, 1MB Sliding Window is in use.
Nakamichi 'Hayabusa' is a native GP (64bit) decompression etude, 2MB Sliding Window is in use.
The beauty of this last Nakamichi variant is in its:
- licenselessness (100% FREE); ( ͡⚆ ͜ʖ ͡⚆)
- falconine textual decompression speed;
- potential ability to remove in future the second reload (in the literal branch), exploiting the fact of 7 bytes MAX long literals.

Since 'Hayabusa' targets only Enlgish texts, in my opinion, the most relevant such benchmark has to be executed on the queen of English prose:
02/13/2015  03:45 AM        33,258,496 Agatha_Christie_89-ebooks_TXT.tar
02/13/2015  03:46 AM        13,365,137 Agatha_Christie_89-ebooks_TXT.tar.lz4
02/13/2015  04:09 AM        13,148,167 Agatha_Christie_89-ebooks_TXT.tar.Nakamichi

D:\_KAZE\EBOOKs_for_benching\z>..\lz4 -9 -Sx -b -T1 Agatha_Christie_89-ebooks_TXT.tar
Nb of threads = 1 ; Compression Level = 9
Agatha_Christie :  33258496 ->  13365090 ( 40.19%),   12.6 MB/s ,  908.7 MB/s

D:\_KAZE\EBOOKs_for_benching\z>..\Nakamichi_Hayabusa_GP.exe Agatha_Christie_89-ebooks_TXT.tar.Nakamichi /benchmark
Nakamichi 'Hayabusa', written by Kaze, based on Nobuo Ito's LZSS source, babealicious suggestion by m^2 enforced, muffinesque suggestion by Jim Dempsey enforced.
Decompressing 13148167 bytes ...
RAM-to-RAM performance: 896 MB/s.
Memory pool starting address: 00000000011D0080 ... 64 byte aligned, OK
Copying a 512MB block 1024 times i.e. 512GB READ + 512GB WRITTEN ...
memcpy(): (512MB block); 524288MB copied in 193533 clocks or 2.709MB per clock
RAM-to-RAM performance vs memcpy() ratio (bigger-the-better): 33%
'Hayabusa' happened to outspeed one of the two LZ kings (LZ4) on OSHO.TXT reaching 960 MB/s vs 947.4 MB/s,
but this is the exception that validates the rule, heh-heh.

unsigned int Decompress (char* ret, char* src, unsigned int srcSize) {
	char* retLOCAL = ret;
	char* srcLOCAL = src;
	char* srcEndLOCAL = src+srcSize;
	unsigned int DWORDtrio;
	unsigned int DWORDtrioDumbo;
	while (srcLOCAL < srcEndLOCAL) {
		DWORDtrio = *(unsigned int*)srcLOCAL;
#ifndef _N_GP
#ifdef _N_prefetch_64
		_mm_prefetch((char*)(srcLOCAL + 64), _MM_HINT_T0);
#endif
#ifdef _N_prefetch_128
		_mm_prefetch((char*)(srcLOCAL + 64*2), _MM_HINT_T0);
#endif
#ifdef _N_prefetch_4096
		_mm_prefetch((char*)(srcLOCAL + 64*64), _MM_HINT_T0);
#endif
#endif
// |1stLSB    |2ndLSB  |3rdLSB   |
// -------------------------------
// |OO|L|xxxxx|xxxxxxxx|xxxxxx|xx|
// -------------------------------
// [1bit          16bit]    24bit]
// L = 0b means Long MatchLength, (8-L) or 8
// L = 1b means Long MatchLength, (8-L) or 4
// OO = 00b means Literal                                                                        
// OO = 01b MatchOffset, 0xFFFFFFFF>>OO, 3 bytes long i.e. Sliding Window is 3*8-L-OO=3*8-3=21 or 2MB    
// OO = 10b MatchOffset, 0xFFFFFFFF>>OO, 2 bytes long i.e. Sliding Window is 2*8-L-OO=2*8-3=13 or 8KB    
// OO = 11b MatchOffset, 0xFFFFFFFF>>OO, 1 byte long  i.e. Sliding Window is 1*8-L-OO=1*8-3=5 or 32B     
		if ( (DWORDtrio & 0x03) == 0x00 ) {                                                       
				#ifdef _N_GP
		memcpy(retLOCAL, (const char *)( (uint64_t)(srcLOCAL+1) ), 8);
				#endif
				#ifdef _N_XMM
		SlowCopy128bit( (const char *)( (uint64_t)(srcLOCAL+1) ), retLOCAL );
				#endif
		retLOCAL+= ((DWORDtrio & 0xFF)>>3);
		srcLOCAL+= ((DWORDtrio & 0xFF)>>3)+1;
		} else {
		DWORDtrioDumbo = (DWORDtrio & 0x03)<<3; // To avoid 'LEA'
				#ifdef _N_GP
			memcpy(retLOCAL, (const char *)( (uint64_t)(retLOCAL-((DWORDtrio&(0xFFFFFFFF>>DWORDtrioDumbo))>>3)) ), 8);
				#endif
				#ifdef _N_XMM
			SlowCopy128bit( (const char *)( (uint64_t)(retLOCAL-((DWORDtrio&(0xFFFFFFFF>>DWORDtrioDumbo))>>3)) ), retLOCAL );
				#endif
		retLOCAL+= 8-(DWORDtrio&0x04);
		srcLOCAL+= 4-(DWORDtrioDumbo>>3);
		}
	}        
	return (unsigned int)(retLOCAL - ret);
}

; 'Hayabusa' decompression loop, 6d-14+2=91 bytes long:
; mark_description "Intel(R) C++ Intel(R) 64 Compiler XE for applications running on Intel(R) 64, Version 15.0.0.108 Build 20140";
; mark_description "-O3 -D_N_GP -FAcs";

.B6.3::                         
  00014 41 8b 01         mov eax, DWORD PTR [r9]                
  00017 89 c1            mov ecx, eax                           
  00019 83 e1 03         and ecx, 3                             
  0001c 75 17            jne .B6.5 
.B6.4::                         
  0001e 0f b6 c0         movzx eax, al                          
  00021 c1 e8 03         shr eax, 3                             
  00024 49 8b 49 01      mov rcx, QWORD PTR [1+r9]              
  00028 49 89 0a         mov QWORD PTR [r10], rcx               
  0002b 4c 03 d0         add r10, rax                           
  0002e ff c0            inc eax                                
  00030 4c 03 c8         add r9, rax                            
  00033 eb 35            jmp .B6.6 
.B6.5::                         
  00035 c1 e1 03         shl ecx, 3                             
  00038 41 bb ff ff ff 
        ff               mov r11d, -1                           
  0003e 41 d3 eb         shr r11d, cl                           
  00041 44 23 d8         and r11d, eax                          
  00044 83 e0 04         and eax, 4                             
  00047 41 c1 eb 03      shr r11d, 3                            
  0004b f7 d8            neg eax                                
  0004d 83 c0 08         add eax, 8                             
  00050 49 f7 db         neg r11                                
  00053 4d 03 da         add r11, r10                           
  00056 c1 e9 03         shr ecx, 3                             
  00059 f7 d9            neg ecx                                
  0005b 83 c1 04         add ecx, 4                             
  0005e 4d 8b 1b         mov r11, QWORD PTR [r11]               
  00061 4d 89 1a         mov QWORD PTR [r10], r11               
  00064 4c 03 d0         add r10, rax                           
  00067 4c 03 c9         add r9, rcx                            
.B6.6::                         
  0006a 4d 3b c8         cmp r9, r8                             
  0006d 72 a5            jb .B6.3 
The package (181KB) containing the source:
http://www.sanmayce.com/Hayabusa/Nakamichi_Hayabusa_Intel15.zip

Nakamichi

And the performance with epub2txt 'Autobiography_411-ebooks_Collection.tar' corpus:
10/05/2014  03:23 AM   273,401,856 Autobiography_411-ebooks_Collection.tar
10/05/2014  06:54 AM   117,126,206 Autobiography_411-ebooks_Collection.tar.lz4                   ! -9 !
10/06/2014  07:43 PM   115,558,538 Autobiography_411-ebooks_Collection.tar.v1.2_19.lzt           ! -19 !
10/05/2014  03:23 AM   113,511,873 Autobiography_411-ebooks_Collection.tar.Z
02/12/2015  08:17 AM   113,073,442 Autobiography_411-ebooks_Collection.tar.Hayabusa.Nakamichi
10/05/2014  09:12 AM   107,237,997 Autobiography_411-ebooks_Collection.tar.Tengu.Nakamichi
10/06/2014  04:06 AM   102,514,628 Autobiography_411-ebooks_Collection.tar.lzh                   / LHA32 version 2.67 /
10/06/2014  02:23 AM   101,966,054 Autobiography_411-ebooks_Collection.tar.method1.zpaq          / zpaq 6.50 /
10/05/2014  06:47 AM    97,569,200 Autobiography_411-ebooks_Collection.tar.zip                   ! -tzip -mx9 !
10/06/2014  02:25 AM    87,439,698 Autobiography_411-ebooks_Collection.tar.method2.zpaq          / zpaq 6.50 /
10/06/2014  02:21 AM    82,724,804 Autobiography_411-ebooks_Collection.tar.sr2
10/06/2014  07:18 PM    77,194,175 Autobiography_411-ebooks_Collection.tar.lzhds.nz              ! -cD !
10/06/2014  03:34 AM    75,388,190 Autobiography_411-ebooks_Collection.tar.ST3_block256.bsc      ! -m0 -Tt -b256 !
10/05/2014  06:53 AM    69,832,119 Autobiography_411-ebooks_Collection.tar.7z                    ! -t7z -mx9 !
10/06/2014  07:57 PM    69,398,819 Autobiography_411-ebooks_Collection.tar.v1.2_39_block256.lzt  ! -39 -b256 -p1 !
10/06/2014  08:36 PM    67,946,358 Autobiography_411-ebooks_Collection.tar.v1.2_49_block256.lzt  ! -49 -b256 -p1 !
10/05/2014  06:56 AM    66,517,316 Autobiography_411-ebooks_Collection.tar.order04.PPMonstr      ! -m1024 -o4 !
10/06/2014  03:35 AM    65,416,196 Autobiography_411-ebooks_Collection.tar.ST4_block256.bsc      ! -m1 -Tt -b256 !
10/06/2014  03:36 AM    61,018,244 Autobiography_411-ebooks_Collection.tar.ST5_block256.bsc      ! -m2 -Tt -b256 !
10/06/2014  02:27 AM    59,401,801 Autobiography_411-ebooks_Collection.tar.method3.zpaq          / zpaq 6.50 /
10/05/2014  07:00 AM    58,736,456 Autobiography_411-ebooks_Collection.tar.order06.PPMonstr      ! -m1024 -o6 !
10/05/2014  02:55 AM    58,266,061 Autobiography_411-ebooks_Collection.tar.tangelo               / Version 2.3 /
10/06/2014  02:32 AM    57,962,141 Autobiography_411-ebooks_Collection.tar.method4.zpaq          / zpaq 6.50 /
10/05/2014  07:41 AM    56,862,830 Autobiography_411-ebooks_Collection.tar.bbb                   ! cfm128 !
10/05/2014  07:05 AM    55,962,342 Autobiography_411-ebooks_Collection.tar.order08.PPMonstr      ! -m1024 -o8 !
10/06/2014  03:33 AM    55,464,039 Autobiography_411-ebooks_Collection.tar.BWT_block256.bsc      ! -m3 -Tt -b256 !
10/06/2014  03:02 AM    55,117,057 Autobiography_411-ebooks_Collection.tar.method5.zpaq          / zpaq 6.50 /
10/05/2014  07:26 AM    55,011,526 Autobiography_411-ebooks_Collection.tar.order32.PPMonstr      ! -m1024 -o32 !
10/05/2014  07:15 AM    54,635,921 Autobiography_411-ebooks_Collection.tar.order16.PPMonstr      ! -m1024 -o16 !
10/06/2014  07:16 PM    52,631,060 Autobiography_411-ebooks_Collection.tar.cm.nz                 ! -cc !
10/06/2014  02:06 AM    48,194,287 Autobiography_411-ebooks_Collection.tar.paq8hp12              ! -8 !
Benchmarks below are for laptop with Core 2 Q9550s @2833MHz.
D:\_KAZE\EBOOKs_for_benching\z>..\lz4 -9 -Sx -b -T1 Autobiography_411-ebooks_Collection.tar
Nb of threads = 1 ; Compression Level = 9
Autobiography_4 : 273401856 -> 117125927 ( 42.84%),   12.7 MB/s ,  902.3 MB/s

D:\_KAZE\EBOOKs_for_benching\z>..\Nakamichi_Hayabusa_GP.exe Autobiography_411-ebooks_Collection.tar.Nakamichi /benchmark
Nakamichi 'Hayabusa', written by Kaze, based on Nobuo Ito's LZSS source, babealicious suggestion by m^2 enforced, muffinesque suggestion by Jim Dempsey enforced.
Decompressing 113073442 bytes ...
RAM-to-RAM performance: 832 MB/s.
Memory pool starting address: 0000000007130080 ... 64 byte aligned, OK
Copying a 512MB block 1024 times i.e. 512GB READ + 512GB WRITTEN ...
memcpy(): (512MB block); 524288MB copied in 193289 clocks or 2.712MB per clock
RAM-to-RAM performance vs memcpy() ratio (bigger-the-better): 30%

D:\_KAZE\EBOOKs_for_benching\z>..\lz4 -9 -Sx -b -T1 enwik8
Nb of threads = 1 ; Compression Level = 9
enwik8          : 100000000 ->  42283793 ( 42.28%),   18.1 MB/s ,  929.5 MB/s

D:\_KAZE\EBOOKs_for_benching\z>..\Nakamichi_Hayabusa_GP.exe enwik8.Nakamichi /benchmark
Nakamichi 'Hayabusa', written by Kaze, based on Nobuo Ito's LZSS source, babealicious suggestion by m^2 enforced, muffinesque suggestion by Jim Dempsey enforced.
Decompressing 43010309 bytes ...
RAM-to-RAM performance: 768 MB/s.
Memory pool starting address: 0000000002EA0080 ... 64 byte aligned, OK
Copying a 512MB block 1024 times i.e. 512GB READ + 512GB WRITTEN ...
memcpy(): (512MB block); 524288MB copied in 193336 clocks or 2.712MB per clock
RAM-to-RAM performance vs memcpy() ratio (bigger-the-better): 28%

D:\_KAZE\EBOOKs_for_benching\z>..\lz4 -9 -Sx -b -T1 OSHO.TXT
Nb of threads = 1 ; Compression Level = 9
OSHO.TXT        : 206908949 ->  71399094 ( 34.51%),   14.2 MB/s ,  947.4 MB/s

D:\_KAZE\EBOOKs_for_benching\z>..\Nakamichi_Hayabusa_GP.exe OSHO.TXT.Nakamichi /benchmark
Nakamichi 'Hayabusa', written by Kaze, based on Nobuo Ito's LZSS source, babealicious suggestion by m^2 enforced, muffinesque suggestion by Jim Dempsey enforced.
Decompressing 75006396 bytes ...
RAM-to-RAM performance: 960 MB/s.
Memory pool starting address: 0000000004C60080 ... 64 byte aligned, OK
Copying a 512MB block 1024 times i.e. 512GB READ + 512GB WRITTEN ...
memcpy(): (512MB block); 524288MB copied in 193505 clocks or 2.709MB per clock
RAM-to-RAM performance vs memcpy() ratio (bigger-the-better): 35%

D:\_KAZE\EBOOKs_for_benching\z>..\lz4 -9 -Sx -b -T1 silesia.tar
Nb of threads = 1 ; Compression Level = 9
silesia.tar     : 211948544 ->  78036331 ( 36.82%),   16.5 MB/s , 1132.3 MB/s

D:\_KAZE\EBOOKs_for_benching\z>..\Nakamichi_Hayabusa_GP.exe silesia.tar.Nakamichi /benchmark
Nakamichi 'Hayabusa', written by Kaze, based on Nobuo Ito's LZSS source, babealicious suggestion by m^2 enforced, muffinesque suggestion by Jim Dempsey enforced.
Decompressing 92192717 bytes ...
RAM-to-RAM performance: 832 MB/s.
Memory pool starting address: 0000000005E00080 ... 64 byte aligned, OK
Copying a 512MB block 1024 times i.e. 512GB READ + 512GB WRITTEN ...
memcpy(): (512MB block); 524288MB copied in 193482 clocks or 2.710MB per clock
RAM-to-RAM performance vs memcpy() ratio (bigger-the-better): 30%
And quick comparison with 'LZ4 -9':
02/12/2015  04:18 AM           152,089 alice29.txt
02/12/2015  05:35 PM            63,705 alice29.txt.lz4
09/30/2014  06:31 AM            73,787 alice29.txt.Tengu.Nakamichi
02/12/2015  04:18 AM            74,782 alice29.txt.Hayabusa.Nakamichi

02/12/2015  03:34 AM       273,401,856 Autobiography_411-ebooks_Collection.tar
02/12/2015  05:36 PM       117,126,206 Autobiography_411-ebooks_Collection.tar.lz4
10/05/2014  09:12 AM       107,237,997 Autobiography_411-ebooks_Collection.tar.Tengu.Nakamichi
02/12/2015  08:17 AM       113,073,442 Autobiography_411-ebooks_Collection.tar.Hayabusa.Nakamichi

02/12/2015  03:22 AM         3,153,408 CalgaryCorpus.tar
02/12/2015  05:35 PM         1,195,853 CalgaryCorpus.tar.lz4
09/30/2014  07:15 AM         1,296,773 CalgaryCorpus.tar.Tengu.Nakamichi
02/12/2015  09:34 AM         1,388,997 CalgaryCorpus.tar.Hayabusa.Nakamichi

02/12/2015  03:24 AM        10,192,446 dickens
02/12/2015  05:35 PM         4,442,992 dickens.lz4
09/30/2014  07:49 AM         3,993,467 dickens.Tengu.Nakamichi
02/12/2015  09:45 AM         4,197,881 dickens.Hayabusa.Nakamichi

02/12/2015  02:48 AM       100,000,000 enwik8
02/12/2015  05:36 PM        42,283,904 enwik8.lz4
09/30/2014  12:06 PM        40,860,927 enwik8.Tengu.Nakamichi
02/12/2015  11:47 AM        43,010,309 enwik8.Hayabusa.Nakamichi

02/12/2015  03:15 AM         3,903,143 MASAKARI_General-Purpose_Grade_English_Wordlist.wrd
02/12/2015  05:34 PM         1,539,449 MASAKARI_General-Purpose_Grade_English_Wordlist.wrd.lz4
09/30/2014  06:56 AM         1,308,271 MASAKARI_General-Purpose_Grade_English_Wordlist.wrd.Tengu.Nakamichi
02/12/2015  09:28 AM         1,269,686 MASAKARI_General-Purpose_Grade_English_Wordlist.wrd.Hayabusa.Nakamichi ( ͡⚆ ͜ʖ ͡⚆)

02/12/2015  03:26 AM         5,582,655 shaks12.txt
02/12/2015  05:35 PM         2,315,036 shaks12.txt.lz4
09/30/2014  06:45 AM         2,211,086 shaks12.txt.Tengu.Nakamichi
02/12/2015  09:37 AM         2,297,940 shaks12.txt.Hayabusa.Nakamichi

02/12/2015  03:22 AM       211,948,544 silesia.tar
02/12/2015  10:02 PM        78,036,550 silesia.tar.lz4
10/01/2014  12:52 PM        82,918,238 silesia.tar.Tengu.Nakamichi
02/12/2015  09:19 PM        92,192,717 silesia.tar.Hayabusa.Nakamichi

Nakamichi

/Photo by Masaki/




Update, 2015-Feb-20:

The French artist Rachelle Bartel, a.k.a Lillycat inspired me to ... dream of a new variant called 'Loonette' a.k.a. 'The Rabbit Girl',
probably it will be XMMized 'Hayabusa' i.e. Match Lengths will be 12/8 and the Sliding Windows will be ... slid by one order i.e.
(2*8-3)bit/(3*8-3)bit/(4*8-3)bit or 13bit/21bit/29bit or 8KB/2MB/512MB.
Well, here it comes:

Nakamichi

Nakamichi

Nakamichi

Nakamichi

Nakamichi

How it behaves is still an open question. 'Loonette' has its future in the future.

Copyleft Sanmayce, 2015 Feb 20; character encoding: charset=utf-8; for contacts: sanmayce@sanmayce.com