Nakamichi

高速伸長
The four kanji above were written in reply when I paid my respects to Prof. Okumura; they are the motto/banner of this page.
The target is the fastest textual decompression; here comes Nakamichi.

Update, 2014-Oct-07:

Still can't find time to read the funny and USEFUL "GRAMMAR SNOBS Are Great Big MEANIES" by June Casagrande.
And an excerpt from Chapter 41 "Satan’s Vocabulary":

SATAN UNVEILS NEW LANGUAGE
"ENGLISH" TO TORMENT HUNDREDS OF
MILLIONS OF SOULS WITH "DEVILISHLY
IMPOSSIBLE" VOCABULARY
By Bernie Crisp
Underworld Times Staff Writer

HADES — The Prince of Darkness this week unveiled a
new language he claims will one day terrorize more than
half the planet with vocabulary so illogical and treacherous
it amounts to a field of "verbal land mines."
"All who doubt my evil majesty, behold: ‘flammable’
and ‘inflammable’ are the same!" Mephistopheles said in a
press conference on Thursday. "In this new language, your
founder can flounder and your flounder can founder! You
can be fazed by a phase or phase out being fazed. You can
click with a clique. You can feign a feint until you’re so
faint that you faint! You can hoard your hoard or even a
whole horde! You can rein in your reign in the rain! You
can complement a compliment or compliment a complement.
This is the suffering I unleash on the world. I am
Satan!"
The Dark Lord then went on to explain to reporters
the goal of this new mechanism of evil.
"Untold millions will stumble; they will fall. And the
only way they will be able to escape their eternal shame is
by making a pact with me! I am Satan!"
Beelzebub then disappeared in a loud burst of smoke
and flame, leaving press secretary Simon Cowell to field
further questions. Much to the media’s surprise, Cowell
began handing out press kits that contained comprehensive
guides to the new language’s vocabulary.
"Doesn’t it defeat the purpose of creating cryptic language
if you hand out a guide to that language?" a reporter
from the Tupelo Star-Pentagram asked.
"Ah, but you underestimate Lucifer," Cowell said.
"For herein lies the true evil genius of his plan. All the
information — everything you need — to be completely
successful within this system will be readily available
and right at your fingertips. That way, when you fail —
and you will fail — you’ll have no one to blame but yourself."
Cowell also said this philosophy will be the basis of a
new economic system called "capitalism," but declined to
disclose further details.

A must-read book.

My dream to sidekick English language usage by offering 100% FREE rip/search/decompress tools continues.
Sadly, the status quo is ugly: nearly 2,200,000,000 people use English, and what FREE/REAL support do they have?!
Had I been an alien assessing the situation on Earth I would have said – ‘WHAT A SHITTY STALEMATE.’
As far as I can see the Earth is plagued by blood-sucking hypocrites who have built many a sect disguised as EDUCATIONAL.
One superb song goes: "... THE VIBE IS WRONG ... YOU KEEP YOUR LOVE LOCKDOWN YOU LOSE ..."
The pimps that sell EDUCATION make me sick.
Early this year, while writing 'Gallowwalker', I was obsessed with one notion, 'a sidekick/fulcrum becoming operational':
a visual tool for textual support offering easy-to-use, superfast word/phrase suggesting/checking.

...
You know, I really don't know much about my mother.
I remember her drinking a lot and always angry and fighting.
I knew she had dreams of becoming a schoolteacher.
But then she met my father...
well, the man I was told was my father.
The fast-talking, cool-dressing pimp who I always credited with changing the path of my mother's life.
And before long, she was caught up in the street life.
But she paid the heavy toll because at heart she really wasn't that girl at all.
So she drank to cover up the pain.
...
You know, I have, like, a 9th grade education from a really, really fucked up, bad school, right?
And all my kids go to private schools and Ivy League schools, right?
And every now and then I text them and I see how they're doing, and they call me back... text me back and say,
"Daddy, this is not how you spell my name, Daddy.
 And you don't spell 'birthday' like this, Dad.
 I can't believe you can't spell 'birthday'."
...
/From the 'Mike Tyson - Undisputed Truth'/

I won't be surprised at all if 1,200,000,000 out of 2,200,000,000 would misspell 'birthday'.
The shame lies not in people erring but in the JOINED FORCES behind all this SICKNESS, keeping LITERACY out of reach.
Free access to English texts is as much a right of EVERYONE as the right to happiness stated in the constitution, that's right.
I see no genuine will whatsoever to help the fellow man, only business and "PROFESSIONALISM".
The old term 'men of letters' has lost its sacred power in our days, most of the teachers/writers are simply 'moneymakers'.
Or, as one popular song goes 'She don't believe in shooting-stars but she believe in shoes-and-cars'.
My point: this nasty sickness, if not ended, will spread as in 'RESIDENT EVIL', hurting an ocean of souls.
Old thinking has to go, the tomorrow that never comes, in my view, is now.
The topic is vast, but as Babaji says 'One spark is enough.'
The want (not merely need) for a purer corpus of English texts resulted in the 'Autobiography_411-ebooks_Collection.tar' corpus:
10/05/2014  03:23 AM   273,401,856 Autobiography_411-ebooks_Collection.tar
10/05/2014  06:54 AM   117,126,206 Autobiography_411-ebooks_Collection.tar.lz4                   ! -9 !
10/06/2014  07:43 PM   115,558,538 Autobiography_411-ebooks_Collection.tar.v1.2_19.lzt           ! -19 !
10/05/2014  03:23 AM   113,511,873 Autobiography_411-ebooks_Collection.tar.Z
10/05/2014  09:12 AM   107,237,997 Autobiography_411-ebooks_Collection.tar.Tengu.Nakamichi
10/06/2014  04:06 AM   102,514,628 Autobiography_411-ebooks_Collection.tar.lzh                   / LHA32 version 2.67 /
10/06/2014  02:23 AM   101,966,054 Autobiography_411-ebooks_Collection.tar.method1.zpaq          / zpaq 6.50 /
10/05/2014  06:47 AM    97,569,200 Autobiography_411-ebooks_Collection.tar.zip                   ! -tzip -mx9 !
10/06/2014  02:25 AM    87,439,698 Autobiography_411-ebooks_Collection.tar.method2.zpaq          / zpaq 6.50 /
10/06/2014  02:21 AM    82,724,804 Autobiography_411-ebooks_Collection.tar.sr2
10/06/2014  07:18 PM    77,194,175 Autobiography_411-ebooks_Collection.tar.lzhds.nz              ! -cD !
10/06/2014  03:34 AM    75,388,190 Autobiography_411-ebooks_Collection.tar.ST3_block256.bsc      ! -m0 -Tt -b256 !
10/05/2014  06:53 AM    69,832,119 Autobiography_411-ebooks_Collection.tar.7z                    ! -t7z -mx9 !
10/06/2014  07:57 PM    69,398,819 Autobiography_411-ebooks_Collection.tar.v1.2_39_block256.lzt  ! -39 -b256 -p1 !
10/06/2014  08:36 PM    67,946,358 Autobiography_411-ebooks_Collection.tar.v1.2_49_block256.lzt  ! -49 -b256 -p1 !
10/05/2014  06:56 AM    66,517,316 Autobiography_411-ebooks_Collection.tar.order04.PPMonstr      ! -m1024 -o4 !
10/06/2014  03:35 AM    65,416,196 Autobiography_411-ebooks_Collection.tar.ST4_block256.bsc      ! -m1 -Tt -b256 !
10/06/2014  03:36 AM    61,018,244 Autobiography_411-ebooks_Collection.tar.ST5_block256.bsc      ! -m2 -Tt -b256 !
10/06/2014  02:27 AM    59,401,801 Autobiography_411-ebooks_Collection.tar.method3.zpaq          / zpaq 6.50 /
10/05/2014  07:00 AM    58,736,456 Autobiography_411-ebooks_Collection.tar.order06.PPMonstr      ! -m1024 -o6 !
10/05/2014  02:55 AM    58,266,061 Autobiography_411-ebooks_Collection.tar.tangelo               / Version 2.3 /
10/06/2014  02:32 AM    57,962,141 Autobiography_411-ebooks_Collection.tar.method4.zpaq          / zpaq 6.50 /
10/05/2014  07:41 AM    56,862,830 Autobiography_411-ebooks_Collection.tar.bbb                   ! cfm128 !
10/05/2014  07:05 AM    55,962,342 Autobiography_411-ebooks_Collection.tar.order08.PPMonstr      ! -m1024 -o8 !
10/06/2014  03:33 AM    55,464,039 Autobiography_411-ebooks_Collection.tar.BWT_block256.bsc      ! -m3 -Tt -b256 !
10/06/2014  03:02 AM    55,117,057 Autobiography_411-ebooks_Collection.tar.method5.zpaq          / zpaq 6.50 /
10/05/2014  07:26 AM    55,011,526 Autobiography_411-ebooks_Collection.tar.order32.PPMonstr      ! -m1024 -o32 !
10/05/2014  07:15 AM    54,635,921 Autobiography_411-ebooks_Collection.tar.order16.PPMonstr      ! -m1024 -o16 !
10/06/2014  07:16 PM    52,631,060 Autobiography_411-ebooks_Collection.tar.cm.nz                 ! -cc !
10/06/2014  02:06 AM    48,194,287 Autobiography_411-ebooks_Collection.tar.paq8hp12              ! -8 !
D:\_KAZE\DDETT>Nakamichi_Tengu_GP_64bit.exe Autobiography_411-ebooks_Collection.tar.Nakamichi /report
Nakamichi 'Tengu', written by Kaze, based on Nobuo Ito's LZSS source, babealicious suggestion by m^2 enforced, muffinesque suggestion by Jim Dempsey enforced.
Decompressing 107237997 bytes ...
RAM-to-RAM performance: 576 MB/s.
Memory pool starting address: 0000000006BB0080 ... 64 byte aligned, OK
Copying a 512MB block 1024 times i.e. 512GB READ + 512GB WRITTEN ...
memcpy(): (512MB block); 524288MB copied in 280989 clocks or 1.866MB per clock
RAM-to-RAM performance vs memcpy() ratio (bigger-the-better): 30%
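The ratio in the last report line can be reproduced from the figures printed above it. A minimal sketch, assuming (the report itself does not say so) that one 'clock' is one millisecond, as with clock() where CLOCKS_PER_SEC is 1000, and that the percentage is truncated rather than rounded. The function name is hypothetical and not part of Nakamichi:

```c
#include <assert.h>
#include <stdint.h>

/* Decompression speed as a percentage of memcpy() speed.
   ASSUMPTION: one clock = 1 ms, so MB per clock * 1000 = MB/s.
   Integer division truncates, which reproduces the 30% reported above. */
static unsigned memcpy_ratio_percent(uint64_t decomp_mb_per_s,
                                     uint64_t mb_copied, uint64_t clocks)
{
    assert(clocks > 0);
    uint64_t memcpy_mb_per_s = mb_copied * 1000u / clocks;
    assert(memcpy_mb_per_s > 0);
    return (unsigned)(decomp_mb_per_s * 100u / memcpy_mb_per_s);
}
```

With the 'Tengu' figures, memcpy_ratio_percent(576, 524288, 280989) evaluates to 30.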

D:\_KAZE\DDETT>lz4 -9 -Sx -b -T1 Autobiography_411-ebooks_Collection.tar
Nb of threads = 1 ; Compression Level = 9
Autobiography_4 : 273401856 -> 117125927 ( 42.84%),   10.2 MB/s ,  710.8 MB/s

D:\_KAZE\DDETT>Yappy_32bit.exe Autobiography_411-ebooks_Collection.tar 65536 999
YAPPY: [b 64K] bytes 273401856 -> 158222198  57.9%  comp  27.3 MB/s  uncomp 437.5 MB/s

D:\_KAZE\DDETT>Yappy_32bit.exe Autobiography_411-ebooks_Collection.tar 1048576 999
YAPPY: [b 1024K] bytes 273401856 -> 156590246  57.3%  comp  27.0 MB/s  uncomp 434.6 MB/s
The benchmark above was run on my laptop with a T7500 at 2200MHz; the goal is to open/traverse ‘transparently’, i.e.
‘on the fly’, 2,000,000+ ebooks, not to speak of epapers/emagazines. Too many times I see empty/uncomprehending eyes
when I mention ‘greed-for-speed’ instead of ‘need-for-speed’, so see for yourself:
Pure text (EPUB2TXT): 411 ebooks ~ 260MB, so 2,000,000 ebooks come to roughly 2,000,000/411x260 = 1,265,206MB or 1.2TB!
Now, 273,401,856:107,237,997 = 2.5:1, i.e. 1.2TB shrinks to about 491GB; with a 512GB M.2 SSD reading at 1GB/s and a Haswell
(it offers 1500+MB/s decompression speed with ‘Tengu’) we are nearing one good scenario with potential.
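The back-of-the-envelope projection above can be sketched in a few lines of C. The function names are hypothetical; the inputs are the figures quoted above (about 260MB for 411 ebooks, 273,401,856 bytes packed to 107,237,997). Without the intermediate rounding to 1.2TB and 2.5:1 the packed size comes out near 496,000MB, the same ballpark as the 491GB estimate.

```c
#include <assert.h>
#include <stdint.h>

/* Scale the plain-text size of a sample of books up to a larger library (MB). */
static double projected_text_mb(double sample_mb, unsigned sample_books,
                                unsigned target_books)
{
    assert(sample_books > 0);
    return sample_mb / sample_books * target_books;
}

/* Size after compression (MB), using one measured original/packed pair
   as the expected ratio for the whole library. */
static double projected_packed_mb(double text_mb, uint64_t original_bytes,
                                  uint64_t packed_bytes)
{
    assert(original_bytes > 0);
    return text_mb * (double)packed_bytes / (double)original_bytes;
}
```

Calling projected_text_mb(260.0, 411, 2000000) and feeding the result to projected_packed_mb() with the two byte counts gives the projected size to compare against the SSD capacity.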

My first attempt (the 'DDETT' corpus) to offer solid ground for benchmarking is good enough, though not purely bookish.
This textual corpus, built from contemporary English prose and consisting of 411 unique files, fits ultrawell in heavy English
text benchmarking. The corpus contains the TXT format of EPUB sources; 'calibre' by Kovid Goyal was used as the converter,
a great tool! Some 84 out of 411 covers are shown below:

Nakamichi

Update, 2014-Oct-02:

天狗
It's time for the fastest variant for English texts.
Happy to share my latest Nakamichi fine-tuned for English texts, it is called 'Tengu' a.k.a. 'Skydog'.


I have always been interested in Japanese mythology and had recently been researching the subject of tengu,
which means "heavenly dogs", for an upcoming novel.
There are actually two forms of tengu. The first and more ancient type, karasu or "crow" tengu, has the beak, claws,
and wings of a bird but the body of a man. The logo for Tengu Press Hawaii depicts the head of a bird tengu wearing
a small round priest's cap. Yamabushi or "mountain priest" tengu are probably more well known. They take the form of
barefooted elderly mountain priests with extremely long noses. Wooden masks of both types of tengu are popular.
In English, the term tengu merely translates as "goblin" without distinction as to the two forms.
Since they lived in the mountains, tengu often took the form of the eccentric yamabushi (mountain priests) who
also lived there. Many yamabushi were thought to possess magical powers derived from their ascetic practices and
the sacredness of the mountains themselves. Over time, the folklore of tengu and yamabushi became intertwined.
The yamabushi form of tengu became most popular and even the bird tengu were shown wearing the short robes and
caps of priests. Tengu were also portrayed as being more mischievous than evil and were often depicted helping people.
In their last incarnation as humans, tengu were arrogant samurai or priests - that is why they have beaks or long noses.
The expression 'tengu ni naru' is thus an admonition to avoid being arrogant.
Tengu apparently have a hierarchy. Long nosed tengu are generally in charge of bird tengu. The king of all tengu is Sojobo,
an elderly, white-haired yamabushi tengu. Sojobo is famous for teaching martial arts and strategy to Minamoto Yoshitsune on
Mt. Kurama, north of Kyoto.
/Tengu: 'The Legendary Mountain Goblins of Japan' by Charles C. Goodin/

Sōjōbō (僧正坊, lit. "high Buddhist priest") is the mythical king of the tengu, minor deities who
inhabit the mountains and forests of Japan. Sōjōbō is an ancient yamabushi (mountain hermit) tengu with long, white hair
and an unnaturally long nose. He carries a fan made from seven feathers as a sign of his position at the top of tengu society. He is extremely powerful, and one legend says he has the strength of 1,000 normal tengu.
Sōjōbō lives on Mount Kurama (north of Kyoto).
/From Wikipedia, the free encyclopedia/

TENGU
The Slayer of Vanity

Tengu 天狗 are mountain and forest goblins with both Shinto and Buddhist attributes. Their supernatural powers include
shape-shifting into human or animal forms, the ability to speak to humans without moving their mouth, the magic of moving
instantly from place to place without using their wings, and the sorcery to appear uninvited in the dreams of the living.
The patron of martial arts, the bird-like Tengu is a skilled warrior and mischief maker, especially prone to playing tricks
on arrogant and vainglorious Buddhist priests, and to punishing those who willfully misuse knowledge and authority to gain
fame or position. In bygone days, they also inflicted their punishments on vain and arrogant samurai warriors. They dislike
braggarts, and those who corrupt the Dharma (Buddhist Law).
Daitengu, or "Major Tengu." Typically appears as man with a very long nose and red face. They often wear a pair of
geta (Japanese wooden sandals), and carry a magic fan made of bird feathers that can create a hellish tornado.
The long nose relates to the Tengu’s hatred of arrogance and prejudice. Priests with no true knowledge, prideful
individuals, those attached to fame, and those who willfully mislead or misuse the Buddhist canons are turned into the
long-nosed Yamabushi Tengu (or sent to Tengudo, the realm of the Tengu) after their deaths. Corrupt Buddhist monks and
corrupt Buddhist monasteries were in fact a major concern throughout Japan’s middle ages. Tengu are thus seen as protectors
of the Dharma (Buddhist law), and punish those who mislead the people.

/Excerpts from superb www.onmarkproductions.com by Mark Schumacher/

Nakamichi
The masterpiece above is Chozobudgie's work at deviantART. Magical!

Update, 2014-Sep-13:

中道落花
Since 'Blacklead' is currently nearly impractical, its 256MB far exceeding the L1/L2/L3 caches, a lightened variation comes to save the day.
It is called 'Rakka' a.k.a. 'Fallenflower' and uses 2MB (a typical cache size per core nowadays) instead of 256MB.

How close, yet distant are the two associative flower notions 'Fleurs Du Mal' and 'Fallen Sakura petals'.
The latter being the symbol of untainted individuals dying still full of life.
The widespread translation of Charles Baudelaire's 'Fleurs du mal' is 'Flowers of Evil', which is brutally incorrect.

'Les Fleurs du mal', why!?
One of the early editions was correctly named 'Ces Fleurs Maladives'.
The author himself names them in 'Dedication':
...
I dedicate
These unhealthy flowers

Also, 'La Muse malade' or 'The Sick Muse' reveals the spirit behind all the poems, namely, 'SICK/ILL' which is not 'EVIL'.

My poor Muse, alas! what ails you today?
Your hollow eyes are full of nocturnal visions;
I see in turn reflected on your face
Horror and madness, cold and taciturn.
Have the green succubus, the rosy elf,
Poured out for you love and fear from their urns?
Has the hand of Nightmare, cruel and despotic,
Plunged you to the bottom of some weird Minturnae?

Alas, poor Muse, what ails you so today?
Your hollow eyes with midnight visions burn,
And turn about, in your complexion play
Madness and horror, cold and taciturn.
Green succubus and rosy imp — have they
Poured you both fear and love into one glass?
Or with his tyrant fist the nightmare, say,
Submerged you in some fabulous morass?

In his day (and even now) illness was considered to be caused by evil (spirits/energy).
Simply, the meaning is 'The ILL/HURT/HARMED/sickly/weak/unhealthy flowers' or 'Illness inflicted on Flowers by evil'.
Or more poetically, 'Flowers harmed by evil'.
Some translators go even further by calling them 'POISONOUS'.
So sad that attention is focused on infliction rather than on primal nature, before/after all they are flowers.

* il est d'une timidité maladive : he's pathologically shy
* des paroles qui font du mal : words that hurt | hurtful words
* les maux dont souffre notre société : the ills/evils afflicting our society
* faire du mal à qn : to harm/hurt sb

/The Unabridged Collins-Robert Electronic French Dictionary/

Sarah Brightman's song of the same name is great:
... All my life I have been waiting for in this perfume of pain ...

Madonna's 'Erotica' is pretty close, the single is awesome.
I'll be your sorceress, your heart's magician
I'm not a witch, I'm a love technician
I'll be a guiding light in your darkest hour
I'm gonna change your life, I'm like a poison flower
...
Only the one that hurts you can make you feel better
Only the one that inflicts the pain can take it away
The variation/performance from 'The Confessions Tour' (Thailand) complements the original beautifully.

MALADIE D'AMOUR starring Nastassja Kinski says it all making the full circle.

Update, 2014-Sep-09:

中道黒鉛
Happy to share the mainstay variant, Nakamichi 'Kokuen' a.k.a. 'Blacklead'.
It is a beautiful mix of 'Kinroba' and 'Yoko'.

黒鉛 [こくえん] (n) blacklead; graphite
落花 ~ Rakka - falling of blossoms; fallen flowers
高所 ~ Takami - elevated place; excellent idea
高速 [こうそく] (adj-na,n,adj-no) (1) high speed, high gear, (2) (abbr) highway, freeway, expressway, motorway
伸長 [しんちょう] (n,vs,adj-no) expansion, extension, elongation, stretching, uncompression


Update, 2014-Sep-07:

雪男子
Nakamichi 'Yoko' (short for 'Yukiotoko') is just a whim.
It is a subvariant of 'Kinutora' that adds a 4-bit window, so 16B/4KB/1MB windows are in use.
The name comes from the lovely German movie 'YOKO', SOED says:
Yeti
...
[Tibetan 'yeh-teh' little manlike animal.]

yuki ~ [ゆき] (n) snow
otoko(noko) ~ 男子 [だんし] (n) youth, young man
kaisoku ~ 快速 [かいそく] (adj-na,n,adj-no) high speed, celerity, mobility, express (train that bypasses many stations)
kaikyo ~ 快挙 [かいきょ] (n) brilliant achievement

'Yukiotoko' is okay; I wonder, is 'Yukidanji' (after 'Kaidanji') plausible?

Nakamichi 'Yoko' and the BOOSTED 'Kinutora' are in this package:
http://www.sanmayce.com/Nakamichi/Nakamichi_Kinutora-BOOSTED_Yoko.zip

BOOSTED 'Kinutora' is 1+% faster.
D:\Nakamichi_Yoko>dir

09/26/1996  04:51 PM           152,089 alice29.txt
09/07/2014  05:07 PM            76,331 alice29.txt.Yoko.Nakamichi
05/16/2014  07:22 AM         3,153,408 CalgaryCorpus.tar
09/07/2014  05:17 PM         1,320,229 CalgaryCorpus.tar.Yoko.Nakamichi
05/16/2014  07:22 AM        10,192,446 dickens
09/07/2014  05:50 PM         4,260,015 dickens.Yoko.Nakamichi
05/16/2014  07:22 AM       100,000,000 enwik8
09/07/2014  11:02 PM        42,652,249 enwik8.Yoko.Nakamichi
06/03/2014  07:35 PM         5,582,655 shaks12.txt
09/07/2014  05:30 PM         2,315,289 shaks12.txt.Yoko.Nakamichi
A bit better compression than 'Kinutora', however on wordlists it is a 'byte'.
05/16/2014  07:22 AM         3,903,143 MASAKARI_General-Purpose_Grade_English_Wordlist.wrd
09/08/2014  05:11 AM         1,563,885 MASAKARI_General-Purpose_Grade_English_Wordlist.wrd.Kinutora.Nakamichi
09/08/2014  04:59 AM         1,539,449 MASAKARI_General-Purpose_Grade_English_Wordlist.wrd.lz4
09/07/2014  11:12 PM         1,341,690 MASAKARI_General-Purpose_Grade_English_Wordlist.wrd.Yoko.Nakamichi

Update, 2014-Sep-03:

Nakamichi 'Kinutora' is by far the strongest&fastest variant among all Nakamichita.
The two major testdatasets 'DDETT' and 'OSHO' were Silkentigered, 'Washi' falls 607-493=114 MB/s behind.
Even 'Butsuhira' cannot keep up with 'Silkentiger', being 607-556=51 MB/s slower.
1.0:1     1,082,907,648 DDETT-Definitive_Decompression_English_Texts_Torture.tar
2.8:1       386,308,041 DDETT-Definitive_Decompression_English_Texts_Torture.tar.lz4                 ! -9 !
2.8:1       380,665,413 DDETT-Definitive_Decompression_English_Texts_Torture.tar.19.lzt              ! -19 !
2.8:1       377,818,677 DDETT-Definitive_Decompression_English_Texts_Torture.tar.Kinutora.Nakamichi

1.0:1       206,908,949 OSHO.TXT
2.8:1        71,399,309 OSHO.TXT.lz4                                                                 ! -9 !
2.9:1        70,067,665 OSHO.TXT.lzt                                                                 ! -19 !
2.9:1        69,860,415 OSHO.TXT.Kinutora.Nakamichi
Tigerish performance on English texts, the two top LZ monsters are outroared using 1MB sliding window!
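The ratio column in the listings above is simply original size over packed size. A small sketch with a hypothetical helper name; keeping one decimal by truncation rather than rounding is an assumption, but it is what reproduces every row above, e.g. 2.8:1 for OSHO.TXT.lz4, where rounding would give 2.9:1.

```c
#include <assert.h>
#include <stdint.h>

/* Compression ratio in tenths, truncated: 28 means 2.8:1.
   ASSUMPTION: the listing truncates to one decimal instead of rounding. */
static unsigned ratio_tenths(uint64_t original_bytes, uint64_t packed_bytes)
{
    assert(packed_bytes > 0);
    return (unsigned)(original_bytes * 10u / packed_bytes);
}
```

For instance, ratio_tenths(206908949, 69860415) evaluates to 29, i.e. the 2.9:1 shown for 'Kinutora' on OSHO.TXT.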
My prefinal showdown on a laptop with a Core 2 Q9550s at 2.83GHz:
C:\Nakamichi_Kinutora>dir

09/03/2014  11:37 AM     1,082,907,648 DDETT-Definitive_Decompression_English_Texts_Torture.tar
08/29/2014  05:06 PM       477,188,523 DDETT-Definitive_Decompression_English_Texts_Torture.tar.Jiten.Nakamichi
09/03/2014  11:02 AM       377,818,677 DDETT-Definitive_Decompression_English_Texts_Torture.tar.Kinutora.Nakamichi
08/03/2014  11:20 AM       370,986,428 DDETT-Definitive_Decompression_English_Texts_Torture.tar.Washi.Nakamichi
...
09/03/2014  11:36 AM       206,908,949 OSHO.TXT
08/29/2014  02:06 AM        87,532,655 OSHO.TXT.Jiten.Nakamichi
09/03/2014  10:28 AM        69,860,415 OSHO.TXT.Kinutora.Nakamichi
08/08/2014  04:42 PM        66,793,112 OSHO.TXT.Washi.Nakamichi

Nakamichi_Kinutora_YMMless.exe DDETT-Definitive_Decompression_English_Texts_Torture.tar.Nakamichi /report >>Results.txt

Nakamichi 'Kinutora', written by Kaze, based on Nobuo Ito's LZSS source, babealicious suggestion by m^2 enforced, muffinesque suggestion by Jim Dempsey enforced.
Decompressing 377818677 bytes ...
RAM-to-RAM performance: 607 MB/s.
Memory pool starting address: 0000000016D30080 ... 64 byte aligned, OK
Copying a 256MB block 1024 times i.e. 256GB READ + 256GB WRITTEN ...
memcpy(): (256MB block); 262144MB copied in 94553 clocks or 2.772MB per clock
RAM-to-RAM performance vs memcpy() ratio (bigger-the-better): 21%

lz4 -9 -Sx -b -T1 DDETT-Definitive_Decompression_English_Texts_Torture.tar 2>>Results.txt

Nb of threads = 1 ; Compression Level = 9
Not enough memory for 'DDETT-Definitive_Decompression_English_Texts_Torture.tar' full size; testing 896 MB only...
DDETT-Definitiv : 939524096 -> 329287352 ( 35.05%),   14.4 MB/s ,  982.1 MB/s

lz4 -9 -Sx -b DDETT-Definitive_Decompression_English_Texts_Torture.tar 2>>Results.txt

Nb of threads = 4 ; Compression Level = 9
Not enough memory for 'DDETT-Definitive_Decompression_English_Texts_Torture.tar' full size; testing 896 MB only...
DDETT-Definitiv : 939524096 -> 329287352 ( 35.05%),   55.7 MB/s , 2263.9 MB/s
Again, LZ4 proves to be the fastest RAM2RAM decompressor I have ever seen! Yann is a wizard!
I only hope this brutal domination of LZ4 will be lessened on Haswell; after all, the habitat of Silkentiger is the YMM realm.


Update, 2014-Sep-01:

絹虎
The little brother of 'Butsuhira' - 'Kinutora' a.k.a. 'the silken tiger' comes roaring/promising.
Silkentiger became instantly my favorite since I have high affinity to pure English texts where it excels.
So vivid, almost 3D, the work of this masteress reminds me of one Japanese ivory craftsman, a masterpiece for emperors.

Nakamichi
------------------------------------------------------------------------------------------------------------------------------------------------------
| compressor \ filedataset      | alice29.txt    | CalgaryCorpus.tar  | shaks12.txt        | dickens             | enwik8                            |
------------------------------------------------------------------------------------------------------------------------------------------------------
| UNCOMPRESSED                  | 152,089        | 3,153,408          | 5,582,655          | 10,192,446          | 100,000,000                       |
| Nakamichi 'Jiten'      (32KB) | 071,924 / 0244 | 1,533,344 / 018744 | 2,657,388 / 005642 | 04,943,712 / 005555 | 051,084,523 / 0326068 /  ??? MB/s |
| Nakamichi 'Kinutora'    (1MB) | 076,705 / 0031 | 1,328,285 / 002955 | 2,330,896 / 000056 | 04,278,431 / 000124 | 042,888,017 / 0003492 /  ??? MB/s |
| Nakamichi 'Butsuhira'   (2MB) | 078,121 / 0307 | 1,461,597 / 007240 | 2,321,653 / 000406 | 04,176,004 / 000661 | 043,462,243 / 0035177 /  ??? MB/s |
| Nakamichi 'Washi'       (4MB) | 088,897 / 0006 | 1,484,221 / 000966 | 2,384,536 / 000011 | 04,261,276 / 000012 | 042,714,346 / 0000232 / 435+ MB/s |
| 7z's gz, Ultra Deflate32      | 051,707        | 0,980,026          | 1,934,787          | 03,681,828          | 035,102,891                       |
| 7z's zip, Ultra Deflate64     | 050,051        | 0,945,849          | 1,834,240          | 03,508,645          | 033,757,921                       |
| TANGELO 2.3                   | 039,160        | 0,710,066          | 1,236,021          | 02,279,659          | 020,921,619                       |
| LZ4 v1.4, -9                  | 063,705        | 1,195,853          | 2,315,036          | 04,442,992          | 042,283,904         / 2186.9 MB/s |
| Yappy, 8192 10000             | 087,965        | 1,654,203          | 3,337,964          | 06,374,780          | 057,701,807         /  698.7 MB/s |
| Yappy, 65536 10000            | 081,217        | 1,544,271          | 3,120,688          | 05,912,295          | 054,162,908         /  679.4 MB/s |
| Yappy, 1048576 10000          | 080,353        | 1,530,823          | 3,091,493          | 05,850,648          | 053,687,370         /  679.4 MB/s |
------------------------------------------------------------------------------------------------------------------------------------------------------
Didn't have enough time to compress the 'DDETT' corpus; still to be done.
unsigned int Decompress_Kinutora (char* ret, char* src, unsigned int srcSize) {
	// The muffinesque suggestion by Jim Dempsey enforced: walk local pointers instead of indexes.
	char* retLOCAL = ret;
	char* srcLOCAL = src;
	char* srcEndLOCAL = src+srcSize;
	unsigned int DWORDtrio;
	while (srcLOCAL < srcEndLOCAL) {
		// Fetch 4 bytes at once; only the lowest 1..3 of them belong to the current token.
		DWORDtrio = *(unsigned int*)srcLOCAL;
		// |1stLSB     |2ndLSB  |3rdLSB   |
		// --------------------------------
		// |T|LL|O|xxxx|xxxxxxxx|xxxxxx|xx|
		// --------------------------------
		// [1bit           16bit]    24bit]
		// T = 0 means Literal
		// T = 1 means Match
		// LL = 00b means Long MatchLength, 32>>LL or 32
		// LL = 01b means Long MatchLength, 32>>LL or 16
		// LL = 10b means Long MatchLength, 32>>LL or 8
		// LL = 11b means Long MatchLength, 32>>LL or 4
		// O = 0 means Long MatchOffset, 3 bytes long i.e. Sliding Window is 3*8-T-LL-O=3*8-4=20 bits or 1MB
		// O = 1 means Short MatchOffset, 2 bytes long i.e. Sliding Window is 2*8-T-LL-O=2*8-4=12 bits or 4KB
		if (DWORDtrio & 0x01) {
			// Match: always copy 32 bytes, then advance the output by the real match length only.
			#ifndef _N_YMM
			memcpy(retLOCAL, (const char *)( (uint64_t)(retLOCAL-((DWORDtrio&(0xFFFFFF>>((DWORDtrio & 0x08)<<0)))>>4)) ), 32);
			#else
			SlowCopy256bit( (const char *)( (uint64_t)(retLOCAL-((DWORDtrio&(0xFFFFFF>>((DWORDtrio & 0x08)<<0)))>>4)) ), retLOCAL );
			#endif
			srcLOCAL+= (uint64_t)(3-((DWORDtrio & 0x08)>>3));
			retLOCAL+= (uint64_t)( Min_Match_Length>>((DWORDtrio>>1)&0x03) );
		} else {
			// Literal run: the high nibble of the first byte is the run length; always copy 16 bytes.
			#ifndef _N_YMM
			memcpy(retLOCAL, (const char *)( (uint64_t)(srcLOCAL+1) ), 16);
			#else
			SlowCopy128bit( (const char *)( (uint64_t)(srcLOCAL+1) ), retLOCAL );
			#endif
			srcLOCAL+= ((DWORDtrio & 0xFF)>>4)+1;
			retLOCAL+= ((DWORDtrio & 0xFF)>>4);
		}
	}
	return (unsigned int)(retLOCAL - ret);
}
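
The bit layout described in the comments can be checked in isolation. Below is a minimal sketch that only parses one token header and performs no copying. The struct and function names are hypothetical, and the value 32 for Min_Match_Length is assumed from the '32>>LL' comments rather than taken from the full source.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    int      is_match;    /* T bit: 1 = match, 0 = literal run             */
    unsigned src_advance; /* input bytes consumed by this token            */
    unsigned out_advance; /* output bytes produced by this token           */
    uint32_t offset;      /* backward distance for a match, 0 for literals */
} KinutoraToken;

/* Sliding window size implied by an offset field of the given bit width. */
static uint32_t window_bytes(unsigned offset_bits)
{
    assert(offset_bits < 32);
    return (uint32_t)1 << offset_bits;
}

/* Parse the header found in the low bytes of a little-endian 32-bit fetch. */
static KinutoraToken parse_token(uint32_t dword)
{
    KinutoraToken t;
    t.is_match = (int)(dword & 0x01u);
    if (t.is_match) {
        unsigned ll = (dword >> 1) & 0x03u;           /* length selector     */
        unsigned short_offset = (dword >> 3) & 0x01u; /* O bit               */
        uint32_t mask = short_offset ? 0xFFFFu : 0xFFFFFFu;
        t.src_advance = 3u - short_offset;            /* 2- or 3-byte token  */
        t.out_advance = 32u >> ll;                    /* ASSUMED base of 32  */
        t.offset = (dword & mask) >> 4;               /* 12 or 20 offset bits */
    } else {
        unsigned run = (dword & 0xFFu) >> 4;          /* high nibble, 0..15  */
        t.src_advance = run + 1u;                     /* header + literals   */
        t.out_advance = run;
        t.offset = 0;
    }
    return t;
}
```

For example, the header 0x123D (T=1, LL=10b, O=1) parses as a match of 8 bytes at distance 0x123 that consumes 2 input bytes, and window_bytes(12) and window_bytes(20) give the 4KB and 1MB windows.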
The source and executables are here:
http://www.sanmayce.com/Nakamichi/Nakamichi_Kinutora.zip

Update, 2014-Aug-28:

My prefinal showdown on a laptop with a Core 2 Q9550s at 2.83GHz:
C:\Nakamichi_Butsuhira_Jiten_Kaidanji_Kinroba_Nin_Washi>Nakamichi_Butsuhira_branchless_XMM.exe DDETT-Definitive_Decompression_English_Texts_Torture.tar.Butsuhira.Nakamichi
Nakamichi 'Butsuhira', written by Kaze, based on Nobuo Ito's LZSS source, babealicious suggestion by m^2 enforced, muffinesque suggestion by Jim Dempsey enforced.
Decompressing 392117791 bytes ...
RAM-to-RAM performance: 516 MB/s.

C:\Nakamichi_Butsuhira_Jiten_Kaidanji_Kinroba_Nin_Washi>Nakamichi_Butsuhira_XMM.exe DDETT-Definitive_Decompression_English_Texts_Torture.tar.Butsuhira.Nakamichi
Nakamichi 'Butsuhira', written by Kaze, based on Nobuo Ito's LZSS source, babealicious suggestion by m^2 enforced, muffinesque suggestion by Jim Dempsey enforced.
Decompressing 392117791 bytes ...
RAM-to-RAM performance: 556 MB/s.

C:\Nakamichi_Butsuhira_Jiten_Kaidanji_Kinroba_Nin_Washi>Nakamichi_Jiten_GP.exe DDETT-Definitive_Decompression_English_Texts_Torture.tar.Jiten.Nakamichi
Nakamichi 'Jiten', written by Kaze, based on Nobuo Ito's LZSS source, babealicious suggestion by m^2 enforced, muffinesque suggestion by Jim Dempsey enforced.
Decompressing 477188523 bytes ...
RAM-to-RAM performance: 817 MB/s.

C:\Nakamichi_Butsuhira_Jiten_Kaidanji_Kinroba_Nin_Washi>Nakamichi_Washi_XMM.exe DDETT-Definitive_Decompression_English_Texts_Torture.tar.Washi.Nakamichi
Nakamichi 'Washi', written by Kaze, based on Nobuo Ito's LZSS source, babealicious suggestion by m^2 enforced, muffinesque suggestion by Jim Dempsey enforced.
Decompressing 370986428 bytes ...
RAM-to-RAM performance: 493 MB/s.

C:\Nakamichi_Butsuhira_Jiten_Kaidanji_Kinroba_Nin_Washi>Nakamichi_Kaidanji_YMMless.exe DDETT-Definitive_Decompression_English_Texts_Torture.tar.Kaidanji.Nakamichi /report
Nakamichi 'Kaidanji', written by Kaze, based on Nobuo Ito's LZSS source, babealicious suggestion by m^2 enforced.
Decompressing 586929167 bytes ...
RAM-to-RAM performance: 826 MB/s.
Memory pool starting address: 0000000023490080 ... 64 byte aligned, OK
Copying a 256MB block 1024 times i.e. 256GB READ + 256GB WRITTEN ...
memcpy(): (256MB block); 262144MB copied in 94880 clocks or 2.763MB per clock
RAM-to-RAM performance vs memcpy() ratio (bigger-the-better): 29%
The results are bittersweet.
'Kaidanji' disappoints, no trace of its 1014MB/s.
'Butsuhira' didn't engladden me.
'Jiten' is sweet.

The English language has one superb feature: it easily harbors words from other languages, which led to its richest status today.
My quick search for adjectives describing dominance over monsters fell flat, so here are a few quick coinages from me:
supra+monstrous=supramonstrous
'Supra' is dearer than its substitute 'super', I still keep some old Japanese micro-floppy-disks from TOKIN corporation.
The booklets within crystal cases read 'Supra-Quality'.
monstri+cide+ous=monstricideous (after huge/hugeous)
monstri+cide+ish=monstricidish (after large/largish)
monstri+cide+ic=monstricidic[al] (after imbecile/imbecilic mimicking diable/diabolic)
monstri+cide+al=monstricidal (after fungi/fungicidal)
immanitas+i+cide+ous=immanitasicideous
immanitas+i+cide+ish=immanitasicidish
immanitas+i+cide+ic=immanitasicidic[al]
immanitas+i+cide+al=immanitasicidal
As always, some variants are marginal; the most magnetic I find to be 'monstricidal'.
Thus, when one says something is monstrously fast, I trump it with monstricidally.
From Latin:
supra PREP ACC [XXXAX] above, beyond; over; more than; in charge of, in authority over;
immanitas, immanitatis N (3rd) F [XXXCO] brutality, savage character, frightfulness; huge/vast size; barbarity; monster;
monstrum, monstri N (2nd) N [XXXBX] monster; portent, unnatural thing/event regarded as omen/sign/portent;

Update, 2014-Aug-26:


Out of curiosity I wrote the 'missing link' between 'Butsuhira' (2MB window) and 'Jiten' (32KB window).
It is called 'Nin' and uses a 256KB window.
------------------------------------------------------------------------------------------------------------------------------------------------------
| compressor \ filedataset      | alice29.txt    | CalgaryCorpus.tar  | shaks12.txt        | dickens             | enwik8                            |
------------------------------------------------------------------------------------------------------------------------------------------------------
| UNCOMPRESSED                  | 152,089        | 3,153,408          | 5,582,655          | 10,192,446          | 100,000,000                       |
| Nakamichi 'Jiten'      (32KB) | 071,924 / 0244 | 1,533,344 / 018744 | 2,657,388 / 005642 | 04,943,712 / 005555 | 051,084,523 / 0326068 /  676 MB/s |
| Nakamichi 'Nin'       (256KB) | 076,320 / 0031 | 1,502,068 / 003122 | 2,568,754 / 000059 | 04,711,885 / 000181 | 048,923,717 / 0008474 /  ??? MB/s |
| Nakamichi 'Butsuhira'   (2MB) | 078,121 / 0307 | 1,461,597 / 007240 | 2,321,653 / 000406 | 04,176,004 / 000661 | 043,462,243 / 0035177 /  ??? MB/s |
| Nakamichi 'Washi'       (4MB) | 088,897 / 0006 | 1,484,221 / 000966 | 2,384,536 / 000011 | 04,261,276 / 000012 | 042,714,346 / 0000232 / 435+ MB/s |
| 7z's gz, Ultra Deflate32      | 051,707        | 0,980,026          | 1,934,787          | 03,681,828          | 035,102,891                       |
| 7z's zip, Ultra Deflate64     | 050,051        | 0,945,849          | 1,834,240          | 03,508,645          | 033,757,921                       |
| TANGELO 2.3                   | 039,160        | 0,710,066          | 1,236,021          | 02,279,659          | 020,921,619                       |
| LZ4 v1.4, -9                  | 063,705        | 1,195,853          | 2,315,036          | 04,442,992          | 042,283,904         / 2186.9 MB/s |
| Yappy, 8192 10000             | 087,965        | 1,654,203          | 3,337,964          | 06,374,780          | 057,701,807         /  698.7 MB/s |
| Yappy, 65536 10000            | 081,217        | 1,544,271          | 3,120,688          | 05,912,295          | 054,162,908         /  679.4 MB/s |
| Yappy, 1048576 10000          | 080,353        | 1,530,823          | 3,091,493          | 05,850,648          | 053,687,370         /  679.4 MB/s |
------------------------------------------------------------------------------------------------------------------------------------------------------
After some months, I intend to post the final roster who-is-who regarding 'DDETT' corpus.
Up to now, the shortlist is:
Nakamichi 'Kaidanji'   (64KB) - main feature(s): worst compression, fastest decompression
Nakamichi 'Jiten'      (32KB) - main feature(s): halves most files, superfast decompression, YAPPY dethroner
Nakamichi 'Nin'       (256KB) - main feature(s): halves most files, fasterer decompression, well-balanced
Nakamichi 'Butsuhira'   (2MB) - main feature(s): very good compression, faster decompression, EXCELLENT balance
Nakamichi 'Washi'       (4MB) - main feature(s): very good compression, fast decompression, very well-balanced
Nakamichi 'Kinroba'   (256MB) - main feature(s): best compression, slowest decompression, SLOWEST compression under sun

The usual suspects' sources and executables are in-here:
http://www.sanmayce.com/Nakamichi/Nakamichi_Butsuhira_Jiten_Kaidanji_Kinroba_Nin_Washi.zip
Note: Reuploaded 2014-Aug-28, updated with faster 'Jiten' and faster 'Washi'.

Update, 2014-Aug-25:

First, I would like to thank Colin0912, Blameless and Jim Dempsey; their help is really appreciated.
For so long I wanted to see what Haswell holds, and Blameless' laptop says it all:
C:\Users\Dyer\Desktop\New folder>Nakamichi_Washi_YMM.exe DDETT-Definitive_Decompression_English_Texts_Torture.tar.Nakamichi /report
Nakamichi 'Washi', written by Kaze, based on Nobuo Ito's LZSS source, babealicious suggestion by m^2 enforced.
Decompressing 370986428 bytes ...
RAM-to-RAM performance: 986 MB/s.
Memory pool starting address: 00000000166D0080 ... 64 byte aligned, OK
Copying a 256MB block 1024 times i.e. 256GB READ + 256GB WRITTEN ...
memcpy(): (256MB block); 262144MB copied in 21966 clocks or 11.934MB per clock
RAM-to-RAM performance vs memcpy() ratio (bigger-the-better): 8%
This is an i7-4700MQ with 2x8GiB of DDR3L-2133 CL 11-11-11 that is turboing at ~3.2GHz during this test.
The weaklink was in my corner, therefore I made some adjustments and now 'Washi' is dethroned by 'Butsuhira'.
仏平
I was lucky to write a stronger (on English texts) and, more importantly, faster variant called 'Butsuhira'.
unsigned int Decompress_Butsuhira (char* ret, char* src, unsigned int srcSize) {
	//unsigned int srcIndex=0; // Dummy me
	//unsigned int retIndex=0; // Dummy me
	// The muffinesque suggestion by Jim Dempsey enforced:
	char* retLOCAL = ret;
	char* srcLOCAL = src;
	char* srcEndLOCAL = src+srcSize;
	unsigned int DWORDtrio;
	unsigned int Flag;
	//while(srcIndex < srcSize){ // Dummy me
	while(srcLOCAL < srcEndLOCAL){
		//DWORDtrio = *(unsigned int*)&src[srcIndex]; // Dummy me
		DWORDtrio = *(unsigned int*)srcLOCAL;
// |1stLSB     |2ndLSB  |3rdLSB   |
// --------------------------------
// |T|L|O|xxxxx|xxxxxxxx|xxxxxx|xx|
// --------------------------------
// [1bit           16bit]    24bit]
// T = 0 means Literal
// T = 1 means Match
// L = 0 means Long MatchLength, 16>>(L+O) or 8/16
// L = 1 means Short MatchLength, 16>>(L+O) or 4/8
// O = 0 means Long MatchOffset, 3 bytes long i.e. Sliding Window is 3*8-T-L-O=3*8-3=21 or 2MB
// O = 1 means Short MatchOffset, 2 bytes long i.e. Sliding Window is 2*8-T-L-O=2*8-3=13 or 8KB
		if (DWORDtrio & 0x01) {
				#ifndef _N_XMM
		//memcpy((ret+retIndex), (const char *)( (uint64_t)(ret+retIndex-((DWORDtrio&(0xFFFFFF>>((DWORDtrio & 0x04)<<1)))>>3)) ), 16); // Dummy me
		memcpy(retLOCAL, (const char *)( (uint64_t)(retLOCAL-((DWORDtrio&(0xFFFFFF>>((DWORDtrio & 0x04)<<1)))>>3)) ), 16);
				#endif
				#ifdef _N_XMM
		//SlowCopy128bit( (const char *)( (uint64_t)(ret+retIndex-((DWORDtrio&(0xFFFFFF>>((DWORDtrio & 0x04)<<1)))>>3)) ), ret+retIndex ); // Dummy me
		SlowCopy128bit( (const char *)( (uint64_t)(retLOCAL-((DWORDtrio&(0xFFFFFF>>((DWORDtrio & 0x04)<<1)))>>3)) ), retLOCAL );
				#endif
		//srcIndex+= (uint64_t)(3-((DWORDtrio & 0x04)>>2)); // Dummy me
		srcLOCAL+= (uint64_t)(3-((DWORDtrio & 0x04)>>2));
		//retIndex+= (uint64_t)( Min_Match_Length>>( ((DWORDtrio & 0x04)>>2) + ((DWORDtrio & 0x02)>>1) ) ); // Dummy me
		retLOCAL+= (uint64_t)( Min_Match_Length>>( ((DWORDtrio & 0x04)>>2) + ((DWORDtrio & 0x02)>>1) ) );
		} else {
				#ifndef _N_XMM
		//memcpy((ret+retIndex), (const char *)( (uint64_t)(src+srcIndex+1) ), 16); // Dummy me
		memcpy(retLOCAL, (const char *)( (uint64_t)(srcLOCAL+1) ), 16);
				#endif
				#ifdef _N_XMM
		//SlowCopy128bit( (const char *)( (uint64_t)(src+srcIndex+1+16*(0)) ), ret+retIndex ); // Dummy me
		SlowCopy128bit( (const char *)( (uint64_t)(srcLOCAL+1+16*(0)) ), retLOCAL );
				#endif
		//srcIndex+= ((DWORDtrio & 0xFF)>>3)+1; // Dummy me
		srcLOCAL+= ((DWORDtrio & 0xFF)>>3)+1;
		//retIndex+= ((DWORDtrio & 0xFF)>>3); // Dummy me
		retLOCAL+= ((DWORDtrio & 0xFF)>>3);
		}
	}        
	//return retIndex; // Dummy me
	return (unsigned int)(retLOCAL - ret);
}
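
The bit layout in the comments above can be restated as a small field decoder. This is only a sketch for reading the format, not part of the Nakamichi sources: the names ButsuhiraToken and butsuhira_decode_token are hypothetical, and Min_Match_Length is assumed to be 16, as the '16>>(L+O)' comments suggest.

```c
#include <stdint.h>

/* Hypothetical view of one 'Butsuhira' token; not from the original sources. */
typedef struct {
    int      is_match;     /* T bit: 1 = match, 0 = literal run        */
    uint32_t offset;       /* backward distance, meaningful for matches */
    uint32_t src_advance;  /* bytes consumed from the compressed stream */
    uint32_t out_advance;  /* bytes produced in the output buffer       */
} ButsuhiraToken;

static ButsuhiraToken butsuhira_decode_token(uint32_t dword) {
    ButsuhiraToken t;
    uint32_t T = dword & 1u;         /* bit 0 */
    uint32_t L = (dword >> 1) & 1u;  /* bit 1 */
    uint32_t O = (dword >> 2) & 1u;  /* bit 2 */
    t.is_match = (int)T;
    if (T) {
        /* O = 1 keeps 16 bits (13-bit offset), O = 0 keeps 24 bits (21-bit offset) */
        uint32_t mask = 0xFFFFFFu >> (O << 3);
        t.offset      = (dword & mask) >> 3;
        t.src_advance = 3u - O;
        t.out_advance = 16u >> (L + O);  /* assumes Min_Match_Length == 16 */
    } else {
        /* upper 5 bits of the first byte hold the literal run length */
        uint32_t len  = (dword & 0xFFu) >> 3;
        t.offset      = 0;
        t.src_advance = len + 1u;
        t.out_advance = len;
    }
    return t;
}
```

For instance, the DWORD 0x29 (T=1, L=0, O=0) decodes to a 16-byte match at distance 5 that consumes 3 source bytes.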

; 'Butsuhira' decompression loop, 8a-1a+2=114 bytes long:
; mark_description "Intel(R) C++ Intel(R) 64 Compiler XE for applications running on Intel(R) 64, Version 12.1.1.258 Build 20111";
; mark_description "-O3 -QxSSE2 -D_N_XMM -FAcs";

.B7.3::                         
  0001a 8b 02            mov eax, DWORD PTR [rdx]               
  0001c a8 01            test al, 1                             
  0001e 74 4e            je .B7.5 
.B7.4::                         
  00020 89 c5            mov ebp, eax                           
  00022 41 bb ff ff ff 
        00               mov r11d, 16777215                     
  00028 83 e5 04         and ebp, 4                             
  0002b 8d 4c 2d 00      lea ecx, DWORD PTR [rbp+rbp]           
  0002f c1 ed 02         shr ebp, 2                             
  00032 41 d3 eb         shr r11d, cl                           
  00035 44 23 d8         and r11d, eax                          
  00038 83 e0 02         and eax, 2                             
  0003b 41 c1 eb 03      shr r11d, 3                            
  0003f 49 f7 db         neg r11                                
  00042 4d 03 da         add r11, r10                           
  00045 d1 e8            shr eax, 1                             
  00047 f3 41 0f 6f 03   movdqu xmm0, XMMWORD PTR [r11]         
  0004c 41 89 eb         mov r11d, ebp                          
  0004f 03 e8            add ebp, eax                           
  00051 41 f7 db         neg r11d                               
  00054 89 e9            mov ecx, ebp                           
  00056 b8 10 00 00 00   mov eax, 16                            
  0005b 41 83 c3 03      add r11d, 3                            
  0005f d3 e8            shr eax, cl                            
  00061 49 03 d3         add rdx, r11                           
  00064 f3 41 0f 7f 02   movdqu XMMWORD PTR [r10], xmm0         
  00069 4c 03 d0         add r10, rax                           
  0006c eb 19            jmp .B7.6 
.B7.5::                         
  0006e 0f b6 c0         movzx eax, al                          
  00071 c1 e8 03         shr eax, 3                             
  00074 f3 0f 6f 42 01   movdqu xmm0, XMMWORD PTR [1+rdx]       
  00079 f3 41 0f 7f 02   movdqu XMMWORD PTR [r10], xmm0         
  0007e 8d 68 01         lea ebp, DWORD PTR [1+rax]             
  00081 4c 03 d0         add r10, rax                           
  00084 48 03 d5         add rdx, rbp                           
.B7.6::                         
  00087 49 3b d0         cmp rdx, r8                            
  0008a 72 8e            jb .B7.3 
'Butsuhira' uses a 2MB sliding window whereas 'Washi' uses 4MB, that is, references to previous occurrences stay closer and are more cacheable.
Also, I chose to sacrifice size in favor of speed by encoding 8:2 before 16:3.
That is, 8-byte matches encoded in 2 bytes get higher priority than 16-byte matches encoded in 3 bytes.
On my laptop (Core 2 T7500 2200MHz), the quicktest with 'enwik8':
D:\Nakamichi_Butsuhira_Washi_JB>Nakamichi_Washi_XMM.exe enwik8.Nakamichi
Nakamichi 'Washi', written by Kaze, based on Nobuo Ito's LZSS source, babealicious suggestion by m^2 enforced, muffinesque suggestion by Jim Dempsey enforced.
Decompressing 42714346 bytes ...
RAM-to-RAM performance: 265 MB/s.

D:\Nakamichi_Butsuhira_Washi_JB>Nakamichi_Butsuhira_branchless_XMM.exe enwik8.Nakamichi
Nakamichi 'Butsuhira', written by Kaze, based on Nobuo Ito's LZSS source, babealicious suggestion by m^2 enforced, muffinesque suggestion by Jim Dempsey enforced.
Decompressing 43462243 bytes ...
RAM-to-RAM performance: 321 MB/s.

D:\Nakamichi_Butsuhira_Washi_JB>Nakamichi_Butsuhira_XMM.exe enwik8.Nakamichi
Nakamichi 'Butsuhira', written by Kaze, based on Nobuo Ito's LZSS source, babealicious suggestion by m^2 enforced, muffinesque suggestion by Jim Dempsey enforced.
Decompressing 43462243 bytes ...
RAM-to-RAM performance: 339 MB/s.
Didn't have enough time to benchmark Nakamichi_Butsuhira_XMM with 'DDETT' corpus, to be done.
The package containing the 'Butsuhira_JB' and 'Washi_JB' sources and 64bit executables:
http://www.sanmayce.com/Nakamichi/Nakamichi_Butsuhira_Washi_JB.zip

Also, 'Kaidanji' reminds us of itself on Colin0912's desktop, Intel i5-4690K (3.50GHz-3.90GHz):
Nakamichi 'Kaidanji', written by Kaze, based on Nobuo Ito's LZSS source, babealicious suggestion by m^2 enforced.
Decompressing 63430147 bytes ...
RAM-to-RAM performance: 2980 MB/s.
Memory pool starting address: 00000000050CA080 ... 64 byte aligned, OK
Copying a 256MB block 1024 times i.e. 256GB READ + 256GB WRITTEN ...
memcpy(): (256MB block); 262144MB copied in 25566 clocks or 10.254MB per clock
RAM-to-RAM performance vs memcpy() ratio (bigger-the-better): 29%
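
The ratio line in these reports can be reproduced by hand. The sketch below assumes that one 'clock' equals one millisecond and that the percentage is truncated rather than rounded; both are guesses that happen to fit the numbers in the logs, and ratio_vs_memcpy_percent is a hypothetical name, not a function from Nakamichi.

```c
#include <stdint.h>

/* Hypothetical helper; assumes 1 clock == 1 ms and a truncated percentage. */
static unsigned ratio_vs_memcpy_percent(uint64_t decomp_mb_per_s,
                                        uint64_t copied_mb,
                                        uint64_t clocks_ms) {
    /* memcpy speed in MB/s = copied_mb * 1000 / clocks_ms, so
       ratio in percent    = decomp_mb_per_s * 100 / memcpy speed */
    return (unsigned)((decomp_mb_per_s * clocks_ms * 100u) / (copied_mb * 1000u));
}
```

With 2980 MB/s against 262144MB copied in 25566 clocks this gives 29, and with 986 MB/s against 21966 clocks it gives 8, matching the reports.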

YAPPY: [b 8K] bytes 100000000 -> 57701807  57.7%  comp  67.1 MB/s  uncomp 1821.7 MB/s 
YAPPY: [b 64K] bytes 100000000 -> 54162908  54.2%  comp  62.3 MB/s  uncomp 1730.6 MB/s 
YAPPY: [b 1024K] bytes 100000000 -> 53687370  53.7%  comp  62.3 MB/s  uncomp 1689.6 MB/s 
This is for an Intel 4690K with Corsair Vengeance CL10 10-10-10-27.
This is the unoptimized 'Kaidanji', using 2xXMM instead of 1xYMM, and without Jim Dempsey's tweak.

- tenohira ~ the palm (of one's hand)
平 - hira ~ 1. something broad and flat; palm of the hand; 2. common; ordinary
仏 - butsu ~ Buddha; Buddhism
悲願 - higan ~ one's dearest wish; Buddha's vow to save humanity
菩薩 - bosatsu ~ bodhisattva (one who vows to save all beings before becoming a Buddha)
忍 - nin ~ (arch) endurance; forbearance; patience; self-restraint
忍辱 - ninniku ~ (Buddh) forbearance (in the face of difficulty, persecution, etc.)
不作為 - fusakui ~ forbearance

Update, 2014-Aug-24:

Rewatching one of Stephen Chow's superb movies led to one superb mix of 'Washi' and 'Jiten'.
Not knowing the four kanji that title the manual the beggar sold to our boy, I dummily chose the closest.
In Japanese 'Butsuhira' (after Butsudou), in Chinese I have to ask.
Instead of 'The Buddha's Palm' I took delight in reversing the definite article, thus, 'The Palm of a Buddha'.
My understanding of the movie's main thread is that 'NO FEAR MUDRA' is the ultimate weapon.
The terrorsome (way 'thicker' than fearsome) speed and power of Demonic entities rule until this palm 'descends'.
In the Kungfu world this technique is royalty, revered by the pure at heart and ridiculed by the profane.

The position of the tag 'T' saves one costly negation: it is already 0|1.
Therefore the branchless variant is even sweeter; the branchful one is given below:
unsigned int Decompress_Butsuhira (char* ret, char* src, unsigned int srcSize) {
	unsigned int srcIndex=0;
	unsigned int retIndex=0;
	unsigned int DWORDtrio;
	unsigned int Flag;
	while (srcIndex < srcSize) {
		DWORDtrio = *(unsigned int*)&src[srcIndex];
// |1stLSB     |2ndLSB  |3rdLSB   |
// --------------------------------
// |T|L|O|xxxxx|xxxxxxxx|xxxxxx|xx|
// --------------------------------
// [1bit           16bit]    24bit]
// T = 0 means Literal
// T = 1 means Match
// L = 0 means Long MatchLength, 16>>(L+O) or 8/16
// L = 1 means Short MatchLength, 16>>(L+O) or 4/8
// O = 0 means Long MatchOffset, 3 bytes long i.e. Sliding Window is 3*8-T-L-O=3*8-3=21 or 2MB
// O = 1 means Short MatchOffset, 2 bytes long i.e. Sliding Window is 2*8-T-L-O=2*8-3=13 or 8KB
		if (DWORDtrio & 0x01) {
				#ifndef _N_XMM
		memcpy((ret+retIndex), (const char *)( (uint64_t)(ret+retIndex-((DWORDtrio&(0xFFFFFF>>((DWORDtrio & 0x04)<<1)))>>3)) ), 16);
				#endif
				#ifdef _N_XMM
		SlowCopy128bit( (const char *)( (uint64_t)(ret+retIndex-((DWORDtrio&(0xFFFFFF>>((DWORDtrio & 0x04)<<1)))>>3)) ), ret+retIndex );
				#endif
		srcIndex+= (uint64_t)(3-((DWORDtrio & 0x04)>>2));
		retIndex+= (uint64_t)( Min_Match_Length>>( ((DWORDtrio & 0x04)>>2) + ((DWORDtrio & 0x02)>>1) ) );
		} else {
				#ifndef _N_XMM
		memcpy((ret+retIndex), (const char *)( (uint64_t)(src+srcIndex+1) ), 16);
				#endif
				#ifdef _N_XMM
		SlowCopy128bit( (const char *)( (uint64_t)(src+srcIndex+1+16*(0)) ), ret+retIndex );
				#endif
		srcIndex+= ((DWORDtrio & 0xFF)>>3)+1;
		retIndex+= ((DWORDtrio & 0xFF)>>3);
		}
	}        
	return retIndex;
}
In quick tests 'Butsuhira' outperformed 'Washi', strangely enough in the movie our boy went higher than the Eagle, too.
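
The remark about the tag 'T' already being 0|1 can be illustrated with a minimal branchless source selection. This is a sketch of the general masking trick only, not the actual branchless 'Butsuhira' code; select_source is a hypothetical helper name.

```c
#include <stdint.h>

/* Hypothetical helper, not taken from the Nakamichi sources. */
static const char *select_source(uint32_t dword,
                                 const char *literal_src,
                                 const char *match_src) {
    /* T sits in bit 0, so (dword & 1) is already 0 or 1 and needs no '!'.
       Subtracting it from zero at pointer width gives 0 or all ones. */
    uintptr_t mask = (uintptr_t)0 - (uintptr_t)(dword & 1u);
    return (const char *)(((uintptr_t)match_src & mask) |
                          ((uintptr_t)literal_src & ~mask));
}
```

Building the mask at full pointer width matters: a mask computed in 32-bit arithmetic and then widened would leave the upper half of a 64-bit pointer unselected.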

------------------------------------------------------------------------------------------------------------------------------------------------------
| compressor \ filedataset      | alice29.txt    | CalgaryCorpus.tar  | shaks12.txt        | dickens             | enwik8                            |
------------------------------------------------------------------------------------------------------------------------------------------------------
| UNCOMPRESSED                  | 152,089        | 3,153,408          | 5,582,655          | 10,192,446          | 100,000,000                       |
| Nakamichi 'Jiten'      (32KB) | 071,924 / 0244 | 1,533,344 / 018744 | 2,657,388 / 005642 | 04,943,712 / 005555 | 051,084,523 / 0326068 /  676 MB/s |
| Nakamichi 'Butsuhira'   (2MB) | 078,121 / 0307 | 1,461,597 / 007240 | 2,321,653 / 000406 | 04,176,004 / 000661 | 043,462,243 / 0035177 /  ??? MB/s |
| Nakamichi 'Washi'       (4MB) | 088,897 / 0006 | 1,484,221 / 000966 | 2,384,536 / 000011 | 04,261,276 / 000012 | 042,714,346 / 0000232 / 435+ MB/s |
| 7z's gz, Ultra Deflate32      | 051,707        | 0,980,026          | 1,934,787          | 03,681,828          | 035,102,891                       |
| 7z's zip, Ultra Deflate64     | 050,051        | 0,945,849          | 1,834,240          | 03,508,645          | 033,757,921                       |
| TANGELO 2.3                   | 039,160        | 0,710,066          | 1,236,021          | 02,279,659          | 020,921,619                       |
| LZ4 v1.4, -9                  | 063,705        | 1,195,853          | 2,315,036          | 04,442,992          | 042,283,904         / 2186.9 MB/s |
| Yappy, 8192 10000             | 087,965        | 1,654,203          | 3,337,964          | 06,374,780          | 057,701,807         /  698.7 MB/s |
| Yappy, 65536 10000            | 081,217        | 1,544,271          | 3,120,688          | 05,912,295          | 054,162,908         /  679.4 MB/s |
| Yappy, 1048576 10000          | 080,353        | 1,530,823          | 3,091,493          | 05,850,648          | 053,687,370         /  679.4 MB/s |
------------------------------------------------------------------------------------------------------------------------------------------------------

The package containing the source and 64bit executable:
http://www.sanmayce.com/Nakamichi/Nakamichi_Butsuhira.zip

Update, 2014-Aug-23:

Up to this moment the shortlist is this - 'Washi' and 'Jiten', 'the Eagle's grip' and 'the Light touch' as I call them.
Nevertheless, I couldn't resist trying XMM again, with 6/12 matches and an 8MB window, i.e. (24-1)bit.
This is 'Suiken', 酔拳, a.k.a. 'Drunkenfist'.
Yes, the legendary and unforgettable drunkenmaster Yuen Siu-tien 袁小田 a.k.a. Yuan Xiaotian.
'Drunkenfist' is good at big (8+MB) texts:
D:\Nakamichi_Suiken>Nakamichi_Suiken_XMM.exe dickens
Nakamichi 'Suiken', written by Kaze, based on Nobuo Ito's LZSS source, babealicious suggestion by m^2 enforced.
Compressing 10192446 bytes ...
\; Each rotation means 64KB are encoded; Done 100%
NumberOfFullLiterals (lower-the-better): 588
NumberOfTinyMatches: 894542
NumberOfShortMatches: 379034
RAM-to-RAM performance: 3 KB/s.

D:\Nakamichi_Suiken>Nakamichi_Suiken_XMM.exe enwik8
Nakamichi 'Suiken', written by Kaze, based on Nobuo Ito's LZSS source, babealicious suggestion by m^2 enforced.
Compressing 100000000 bytes ...
/; Each rotation means 64KB are encoded; Done 100%
NumberOfFullLiterals (lower-the-better): 21411
NumberOfTinyMatches: 7439513
NumberOfShortMatches: 4149661
RAM-to-RAM performance: 1 KB/s.

D:\Nakamichi_Suiken>dir

05/16/2014  07:22 AM        10,192,446 dickens
08/23/2014  05:35 AM         4,216,204 dickens.Nakamichi
05/16/2014  07:22 AM       100,000,000 enwik8
08/24/2014  04:29 AM        42,077,135 enwik8.Nakamichi

unsigned int Decompress_Suiken (char* ret, char* src, unsigned int srcSize) {
	unsigned int srcIndex=0;
	unsigned int retIndex=0;
	unsigned int DWORDtrio;
	unsigned int Flag;
	uint64_t FlagMASK; //=       0xFFFFFFFFFFFFFFFF;
	uint64_t FlagMASKnegated; //=0x0000000000000000;

	while (srcIndex < srcSize) {
		DWORDtrio = *(unsigned int*)&src[srcIndex];
// |1stLSB   |2ndLSB  |3rdLSB   |
// ------------------------------
// |xxxx|TTTT|xxxxxxxx|xxxxxxx|L|
// ------------------------------
// [1bit                   24bit]
// L = 0 means MatchLength (12>>LL) or 12
// L = 1 means MatchLength (12>>LL) or 6
		Flag=!(DWORDtrio & 0xF0);
		// In here Flag=0|1
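		// Widen Flag before subtracting so the mask covers all 64 pointer bits;
		// 'Flag - 1' done in 32-bit arithmetic would give 0x00000000FFFFFFFF.
		FlagMASKnegated= (uint64_t)Flag - 1; // -1|0
		FlagMASK= ~FlagMASKnegated;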
				#ifdef _N_XMM
		SlowCopy128bit( (const char *)( ((uint64_t)(src+srcIndex+1)&FlagMASK) + ((uint64_t)(ret+retIndex-(DWORDtrio&0x7FFFFF))&FlagMASKnegated) ), (ret+retIndex+16*(0)));
				#endif
				#ifndef _N_XMM
		memcpy((ret+retIndex+16*(0)), (const char *)( ((uint64_t)(src+srcIndex+1)&FlagMASK) + ((uint64_t)(ret+retIndex-(DWORDtrio&0x7FFFFF))&FlagMASKnegated) ), 16);
				#endif
		srcIndex+= ((uint64_t)((DWORDtrio & 0xFF)+1)&FlagMASK) + ((uint64_t)(3)&FlagMASKnegated) ;
		retIndex+= ((uint64_t)((DWORDtrio & 0xFF))&FlagMASK) +   ((uint64_t)(Min_Match_Length>>((DWORDtrio&0xFFFFFF)>>(24-1)))&FlagMASKnegated) ;
	}
	return retIndex;
}
/*
; 'Suiken' decompression loop, be-40+2=128 bytes long:
; mark_description "Intel(R) C++ Intel(R) 64 Compiler XE for applications running on Intel(R) 64, Version 12.1.1.258 Build 20111";
; mark_description "-O3 -QxSSE2 -D_N_XMM -FAcs";

.B7.3::                         
  00040 42 8b 0c 12      mov ecx, DWORD PTR [rdx+r10]           
  00044 33 ff            xor edi, edi                           
  00046 f7 c1 f0 00 00 
        00               test ecx, 240                          
  0004c 0f 44 f8         cmove edi, eax                         
  0004f 49 89 cc         mov r12, rcx                           
  00052 ff cf            dec edi                                
  00054 49 81 e4 ff ff 
        7f 00            and r12, 8388607                       
  0005b 49 f7 dc         neg r12                                
  0005e 48 89 fe         mov rsi, rdi                           
  00061 4d 03 e1         add r12, r9                            
  00064 48 f7 d6         not rsi                                
  00067 4e 8d 6c 12 01   lea r13, QWORD PTR [1+rdx+r10]         
  0006c 4d 03 e3         add r12, r11                           
  0006f 4c 23 ee         and r13, rsi                           
  00072 4c 23 e7         and r12, rdi                           
  00075 0f b6 d9         movzx ebx, cl                          
  00078 ff c3            inc ebx                                
  0007a f3 43 0f 6f 04 
        2c               movdqu xmm0, XMMWORD PTR [r12+r13]     
  00080 49 89 fc         mov r12, rdi                           
  00083 48 23 de         and rbx, rsi                           
  00086 49 83 e4 03      and r12, 3                             
  0008a 49 03 dc         add rbx, r12                           
  0008d 49 03 da         add rbx, r10                           
  00090 41 89 da         mov r10d, ebx                          
  00093 0f b6 d9         movzx ebx, cl                          
  00096 81 e1 ff ff ff 
        00               and ecx, 16777215                      
  0009c c1 e9 17         shr ecx, 23                            
  0009f 48 23 de         and rbx, rsi                           
  000a2 be 0c 00 00 00   mov esi, 12                            
  000a7 d3 ee            shr esi, cl                            
  000a9 48 23 f7         and rsi, rdi                           
  000ac 48 03 de         add rbx, rsi                           
  000af 49 03 db         add rbx, r11                           
  000b2 f3 43 0f 7f 04 
        19               movdqu XMMWORD PTR [r9+r11], xmm0      
  000b8 41 89 db         mov r11d, ebx                          
  000bb 45 3b d0         cmp r10d, r8d                          
  000be 72 80            jb .B7.3 
*/
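
For reading the 'Suiken' format, the fields used by the loop above can be pulled apart one token at a time. The sketch below is not from the Nakamichi sources: SuikenToken and suiken_decode_token are hypothetical names, and Min_Match_Length is assumed to be 12, as the layout comments suggest.

```c
#include <stdint.h>

/* Hypothetical view of one 'Suiken' token; not from the original sources. */
typedef struct {
    int      is_match;     /* 1 when the TTTT nibble is non-zero        */
    uint32_t offset;       /* 23-bit backward distance for matches      */
    uint32_t src_advance;  /* bytes consumed from the compressed stream */
    uint32_t out_advance;  /* bytes produced in the output buffer       */
} SuikenToken;

static SuikenToken suiken_decode_token(uint32_t dword) {
    SuikenToken t;
    if (dword & 0xF0u) {
        /* match: the low 23 bits are the offset, bit 23 halves the length */
        t.is_match    = 1;
        t.offset      = dword & 0x7FFFFFu;
        t.src_advance = 3u;
        t.out_advance = 12u >> ((dword & 0xFFFFFFu) >> 23);  /* assumes Min_Match_Length == 12 */
    } else {
        /* literal run: with TTTT == 0 the first byte is the length, 0..15 */
        uint32_t len  = dword & 0xFFu;
        t.is_match    = 0;
        t.offset      = 0;
        t.src_advance = len + 1u;
        t.out_advance = len;
    }
    return t;
}
```

For instance, the DWORD 0x800010 decodes to a 6-byte match at distance 16.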

The package containing the source and 64bit executable:
http://www.sanmayce.com/Nakamichi/Nakamichi_Suiken.zip

toshiyori ~ an old man
nomite ~ a hard-drinker
nomitsu ~ wild honey
keima ~ the knight

拳 - kobushi ~ fist
猿拳 - saruken ~ Monkey Fist; Monkey-Style kung-fu
鉄拳 - tekken ~ fist
形意拳 - keiiken ~ shape-of-the-mind fist; Hsing I Chuan
虎燕拳 - koenken ~ Tiger Swallow Fist
五形拳 - gokeiken ~ Wu Xing Fist; Five Form Fist (Dragon, Snake, Tiger, Crane, Leopard)
太極拳 - taikyokuken ~ grand ultimate fist; Tai Chi Chuan
八極拳 - hakkyokuken ~ Eight Extremities Fist

Update, 2014-Aug-20:

First, the unfinished benchmark, Kinroba's performance on laptop with Q9550s 2.83GHz:
E:\Goldendonkey>timer32 Nakamichi_Kinroba_YMMless.exe enwik8
Nakamichi 'Kinroba', written by Kaze, based on Nobuo Ito's LZSS source, babealicious suggestion by m^2 enforced.
Compressing 100000000 bytes ...
/; Each rotation means 64KB are encoded; Done 100%
NumberOfFullLiterals (lower-the-better): 237
NumberOfTinyMatchesSmallWindow (4): 2019986
NumberOfShortMatchesSmallWindow (8): 1233902
NumberOfMediumMatchesSmallWindow (16): 287565
NumberOfLongMatchesSmallWindow (32): 87923
NumberOfTinyMatchesRegularWindow (4): 1725463
NumberOfShortMatchesRegularWindow (8): 2624994
NumberOfMediumMatchesRegularWindow (16): 345508
NumberOfLongMatchesRegularWindow (32): 78694
NumberOfTinyMatchesBigWindow (4): 0
NumberOfShortMatchesBigWindow (8): 1940102
NumberOfMediumMatchesBigWindow (16): 1084111
NumberOfLongMatchesBigWindow (32): 123782
RAM-to-RAM performance: 0 KB/s.

Kernel  Time =     0.561 =    0%
User    Time =274395.431 =   99%
Process Time =274395.993 =   99%    Virtual  Memory =    224 MB
Global  Time =274603.553 =  100%    Physical Memory =    133 MB

E:\Goldendonkey>dir

05/16/2014  07:22 AM       100,000,000 enwik8
08/18/2014  06:16 AM        36,860,695 enwik8.Nakamichi

E:\Goldendonkey>Nakamichi_Kinroba_YMMless.exe enwik8.Nakamichi
Nakamichi 'Kinroba', written by Kaze, based on Nobuo Ito's LZSS source, babealicious suggestion by m^2 enforced.
Decompressing 36860695 bytes ...
RAM-to-RAM performance: 243 MB/s.
Decompression is relatively slow, but this is with a 256MB block (in fact ~100MB here); when multi-threaded, the block should be 3MB/12MB.
Moreover, this is without the YMM register (4 QWORDs are used instead), misery within misery.

A few sketches on the upcoming Building-Blocks compression approach.
I put several useful tools together in the 'Chunkerito' mini-package.
They all, in one way or another, will contribute to the future Nakamichi revisions.
E:\Chunkerito_revision1+_Goldendonkey_Washi_BuildingBlocksDumper>dir

08/18/2014  04:24 AM            12,895 Chunkerito.c
08/18/2014  04:24 AM            62,976 Chunkerito.exe
08/18/2014  04:24 AM           140,288 Leprechaun_BB001hex_128p_Intel.exe
08/18/2014  04:24 AM           139,776 Leprechaun_BB002hex_128p_Intel.exe
08/18/2014  04:24 AM           140,288 Leprechaun_BB004hex_128p_Intel.exe
08/18/2014  04:24 AM           141,312 Leprechaun_BB008hex_128p_Intel.exe
08/18/2014  04:24 AM           141,824 Leprechaun_BB016hex_128p_Intel.exe
08/18/2014  04:24 AM           142,848 Leprechaun_BB032hex_128p_Intel.exe
08/18/2014  04:24 AM           142,336 Leprechaun_BB048hex_128p_Intel.exe
08/18/2014  04:24 AM           141,312 Leprechaun_BB128hex_128p_Intel.exe
08/18/2014  04:24 AM           327,386 Leprechaun_BBhex.c
08/18/2014  04:24 AM               952 Leprechaun_BBhex_COMPILE.BAT
08/18/2014  04:24 AM               113 Leprechaun_Dump_BuildingBlocks_Order_001.bat
08/18/2014  04:24 AM               113 Leprechaun_Dump_BuildingBlocks_Order_002.bat
08/18/2014  04:24 AM               113 Leprechaun_Dump_BuildingBlocks_Order_004.bat
08/18/2014  04:24 AM               113 Leprechaun_Dump_BuildingBlocks_Order_008.bat
08/18/2014  04:24 AM               113 Leprechaun_Dump_BuildingBlocks_Order_016.bat
08/18/2014  04:24 AM               113 Leprechaun_Dump_BuildingBlocks_Order_032.bat
08/18/2014  04:24 AM               113 Leprechaun_Dump_BuildingBlocks_Order_048.bat
08/18/2014  04:24 AM               113 Leprechaun_Dump_BuildingBlocks_Order_128.bat
08/18/2014  04:24 AM               324 MakeEXEs_Kinroba.bat
08/18/2014  04:24 AM               304 MakeEXEs_Washi.bat
08/18/2014  04:24 AM             1,604 MokujIN prompt.lnk
08/18/2014  04:24 AM            86,007 Nakamichi_Kinroba.c
08/18/2014  04:24 AM           146,432 Nakamichi_Kinroba.doc
08/18/2014  04:24 AM            94,716 Nakamichi_Kinroba.pdf
08/18/2014  04:24 AM            98,304 Nakamichi_Kinroba_YMMless_32bit.exe
08/18/2014  04:24 AM           112,128 Nakamichi_Kinroba_YMM_64bit.exe
08/18/2014  04:24 AM            78,917 Nakamichi_Washi.c
08/18/2014  04:24 AM           486,848 Nakamichi_Washi_sourcelist.pdf
08/18/2014  04:24 AM            96,768 Nakamichi_Washi_YMMless_32bit.exe
08/18/2014  04:24 AM           112,128 Nakamichi_Washi_YMM_64bit.exe
08/18/2014  04:24 AM             9,458 sha1sum.c
08/18/2014  04:24 AM            68,608 sha1sum.exe
One example of how a 1GB file is split into 128MB chunks:
D:\Chunkerito_revision1+_Goldendonkey_Washi_BuildingBlocksDumper>move ..\DDETT-Definitive_Decompression_English_Texts_Torture.tar

D:\Chunkerito_revision1+_Goldendonkey_Washi_BuildingBlocksDumper>Chunkerito.exe
Chunkerito, revision 1+, written by Kaze.
Purpose: To chunkize/split any file to 'ChunkSize' long chunks.
Usage: Chunker filename ChunkSize
Note: For 128MB chunks use ChunkSize = 134217728

D:\Chunkerito_revision1+_Goldendonkey_Washi_BuildingBlocksDumper>Chunkerito.exe DDETT-Definitive_Decompression_English_Texts_Torture.tar 134217728
Chunkerito, revision 1+, written by Kaze.
Purpose: To chunkize/split any file to 'ChunkSize' long chunks.
Usage: Chunker filename ChunkSize
Note: For 128MB chunks use ChunkSize = 134217728
Size of Input TEXTual file: 1,082,907,648
|; Chunk # 9 has been created ...

D:\Chunkerito_revision1+_Goldendonkey_Washi_BuildingBlocksDumper>dir

08/15/2014  10:52 PM       134,217,728 Chunkerito.000,001
08/15/2014  10:52 PM       134,217,728 Chunkerito.000,002
08/15/2014  10:52 PM       134,217,728 Chunkerito.000,003
08/15/2014  10:52 PM       134,217,728 Chunkerito.000,004
08/15/2014  10:52 PM       134,217,728 Chunkerito.000,005
08/15/2014  10:52 PM       134,217,728 Chunkerito.000,006
08/15/2014  10:52 PM       134,217,728 Chunkerito.000,007
08/15/2014  10:52 PM       134,217,728 Chunkerito.000,008
08/15/2014  10:52 PM         9,165,824 Chunkerito.000,009
08/15/2014  07:52 AM     1,082,907,648 DDETT-Definitive_Decompression_English_Texts_Torture.tar
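
The chunk count and the size of the last chunk in the listing above follow from simple integer arithmetic. A minimal sketch, with the hypothetical names ChunkLayout and chunk_layout (this is not Chunkerito's own code):

```c
#include <stdint.h>

/* Hypothetical helper, not taken from Chunkerito.c. */
typedef struct {
    uint64_t chunk_count;      /* number of chunk files produced          */
    uint64_t last_chunk_size;  /* size of the final, possibly short chunk */
} ChunkLayout;

static ChunkLayout chunk_layout(uint64_t file_size, uint64_t chunk_size) {
    ChunkLayout r = {0, 0};
    if (file_size == 0 || chunk_size == 0) {
        return r;  /* nothing to split */
    }
    /* round up: a partial trailing chunk still needs its own file */
    r.chunk_count     = (file_size + chunk_size - 1) / chunk_size;
    r.last_chunk_size = file_size - (r.chunk_count - 1) * chunk_size;
    return r;
}
```

For the 1,082,907,648-byte tar and ChunkSize = 134217728 this yields 9 chunks, the last one 9,165,824 bytes long, as in the directory listing.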
The statistics below show why I am overpowered by the computations required to get the pieces (distinct BuildingBlocks) of the puzzle (the 1GB file).
The puzzle's pieces (distinct BuildingBlocks) of order 1/4/8/16/32/48/128 are extracted.
The number preceding the BuildingBlock (given in HEX format) says how many times it is encountered.
Of course, all BBs appearing once, i.e. with the 000,000,001 tag, are to be discarded.
The idea is to get rid of the whole search process by offering all OFFSETs (each housed by a DWORD) of a given BB.
OFFSETs are all below 1GB, so 4 bytes are enough.
Thus, by subtracting OFFSETs, 'precomputed' distances will be easily accessible.
For example, see a bit below, the match 486F7276 (BB order 4) is encountered 8 times, therefore its 'record' looks like:
[4bytes  ][4bytes][8bytes]
[486F7276][DWORD0][QWORD0]
[DWORD0] = 8
[QWORD0] = OFFSET (within BB OFFSET pool) i.e. the starting address of its OFFSETs in ascending order:
[OFFSET+00] = [DWORD1]
[OFFSET+04] = [DWORD2]
...
[OFFSET+28] = [DWORD8]
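
The record above and the 'distances by subtraction' idea can be sketched in C as follows. This is a reading of the layout, not code from the package: BBRecord4 and bb_distances are hypothetical names, and QWORD0 is assumed to be a byte offset into a pool of DWORD OFFSETs stored in ascending order.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical in-memory form of the order-4 record [4bytes][4bytes][8bytes]. */
typedef struct {
    uint8_t  bb[4];             /* the BuildingBlock itself, e.g. 48 6F 72 76  */
    uint32_t count;             /* DWORD0: number of occurrences               */
    uint64_t pool_byte_offset;  /* QWORD0: where its OFFSETs start in the pool */
} BBRecord4;

/* Writes the distances between consecutive occurrences into 'out' and
   returns how many were written (at most count - 1, at most out_cap). */
static size_t bb_distances(const BBRecord4 *rec, const uint32_t *offset_pool,
                           uint32_t *out, size_t out_cap) {
    const uint32_t *offs = offset_pool + rec->pool_byte_offset / sizeof(uint32_t);
    size_t n = 0;
    uint32_t i;
    for (i = 1; i < rec->count && n < out_cap; i++) {
        out[n++] = offs[i] - offs[i - 1];  /* ascending order keeps this positive */
    }
    return n;
}
```

A BB encountered 8 times yields 7 distances, and no searching is involved in obtaining them.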

And the actual 'BBs/matches ripping' - the first stage of precomputation:

Order 004 in HRF (Human-Readable-Format):
RIPPING: E:\Chunkerito_revision1+_Goldendonkey_Washi_BuildingBlocksDumper>Leprechaun_Dump_DDETT_BuildingBlocks_Order_004.bat DDETT-Definitive_Decompression_English_Texts_Torture.tar
RIPFILE: 26,875,310 DDETT-Definitive_Decompression_English_Texts_Torture.tar_Unique_004bytes_long_blocks_in_HEX.txt
RIPDATA: E:\Chunkerito_revision1+_Goldendonkey_Washi_BuildingBlocksDumper>type DDETT-Definitive_Decompression_English_Texts_Torture.tar_Unique_004bytes_long_blocks_in_HEX.txt|more
000,000,008     486F7276
000,000,005     5F6C666C
000,000,420     6C642D73
000,000,003     61632D77
000,000,025     69632D67
000,000,002     656D62C3
000,000,002     594D4157
000,000,069     2C666577
...

Total memory needed for one pass: 77,988KB
Total distinct BBs/phrases: 1,845,425
Total (128 passes) time: 3458 second(s)
Order 008 in HRF (Human-Readable-Format):
RIPPING: E:\Chunkerito_revision1+_Goldendonkey_Washi_BuildingBlocksDumper>Leprechaun_Dump_DDETT_BuildingBlocks_Order_008.bat DDETT-Definitive_Decompression_English_Texts_Torture.tar
RIPFILE: 993,870,210 DDETT-Definitive_Decompression_English_Texts_Torture.tar_Unique_008bytes_long_blocks_in_HEX.txt
RIPDATA: E:\Chunkerito_revision1+_Goldendonkey_Washi_BuildingBlocksDumper>type DDETT-Definitive_Decompression_English_Texts_Torture.tar_Unique_008bytes_long_blocks_in_HEX.txt|more
000,000,003     2053544154452034
000,000,007     5354204953204541
000,000,002     627574204D656E64
000,000,004     6F6D616E223EF769
000,000,002     776173650D0A706C
000,000,018     5354494C4C204341
000,000,003     3C693E70726F5F65
000,000,008     65732E200D0A512E
...

Total memory needed for one pass: 3,497,412KB
Total distinct BBs/phrases: 71,214,872
Total (128 passes) time: 4203 second(s)
Order 016 in HRF (Human-Readable-Format):
RIPPING: E:\Chunkerito_revision1+_Goldendonkey_Washi_BuildingBlocksDumper>Leprechaun_Dump_DDETT_BuildingBlocks_Order_016.bat DDETT-Definitive_Decompression_English_Texts_Torture.tar
RIPFILE: 5,116,625,908 DDETT-Definitive_Decompression_English_Texts_Torture.tar_Unique_016bytes_long_blocks_in_HEX.txt
RIPDATA: E:\Chunkerito_revision1+_Goldendonkey_Washi_BuildingBlocksDumper>type DDETT-Definitive_Decompression_English_Texts_Torture.tar_Unique_016bytes_long_blocks_in_HEX.txt|more
000,000,004     7370656C6C696E67206F667C74656C65
000,000,003     207665727920776F726B696E67206F66
000,000,002     617070656172616E6365206F72206D6F
000,000,002     757320666F7220746865204272697469
000,000,003     72697479206F66207468652065617274
...

Total memory needed for one pass: 39,766,974KB
Total distinct BBs/phrases: 554,282,812
Total (128 passes) time: 5979 second(s)
Order 032 in HRF (Human-Readable-Format):
RIPPING: E:\Chunkerito_revision1+_Goldendonkey_Washi_BuildingBlocksDumper>Leprechaun_Dump_DDETT_BuildingBlocks_Order_032.bat DDETT-Definitive_Decompression_English_Texts_Torture.tar
RIPFILE: 5,100,468,984 DDETT-Definitive_Decompression_English_Texts_Torture.tar_Unique_032bytes_long_blocks_in_HEX.txt
RIPDATA: E:\Chunkerito_revision1+_Goldendonkey_Washi_BuildingBlocksDumper>type DDETT-Definitive_Decompression_English_Texts_Torture.tar_Unique_032bytes_long_blocks_in_HEX.txt|more
000,000,002     756572793A2D0D0A0D0A20202020202020202020202020202020466F72206120
000,000,002     732E626D70223E3C2F613E3C623E5F736D5F725F726576765F6E6C5F73687469
000,000,002     72642C20616E642061726520746F206265206B696E677320616E64207072696E
000,000,002     61207374616E6461726420626F74746C652E0D0A456E676C697368094A65726F
000,000,002     206265206C65667420696E207468697320706C616E65206F6E6C792E20546865
000,000,002     6D6F6F6E206D617920626520636F6D706172656420746F205061726161727468
...

Total memory needed for one pass: 106,480,394KB
Total distinct BBs/phrases: 898,193,409
Total (128 passes) time: 8886 second(s)
Order 048 in HRF (Human-Readable-Format):
RIPPING: E:\Chunkerito_revision1+_Goldendonkey_Washi_BuildingBlocksDumper>Leprechaun_Dump_DDETT_BuildingBlocks_Order_048.bat DDETT-Definitive_Decompression_English_Texts_Torture.tar
RIPFILE: 5,177,818,580 DDETT-Definitive_Decompression_English_Texts_Torture.tar_Unique_048bytes_long_blocks_in_HEX.txt
RIPDATA: E:\Chunkerito_revision1+_Goldendonkey_Washi_BuildingBlocksDumper>type DDETT-Definitive_Decompression_English_Texts_Torture.tar_Unique_048bytes_long_blocks_in_HEX.txt|more
000,000,005     7375703E323C2F7375703E3B20616E20696E7374616E6365206F6620746869732E203C666F6E742073697A653D313E4D
000,000,002     696E6720796F75722067726F6F76650D0A5769746820616E20696E65666661626C6520616E64206D616C652064656C69
000,000,002     7320636F6E7374616E7420706F736974697665207374617465200A6F66206C696768742C20706F7765722C206A6F792C
000,000,002     65707320696E206D7920646972656374696F6E2E0D0A0D0A49742072656D61696E6564206E6F77206F6E6C7920746F20
000,000,002     6520616E642074686520746563686E6971756520746F207475726E20746865206D696E6420617761792066726F6D2074
...

Total memory needed for one pass: 159,520,214KB
Total distinct BBs/phrases: 962,829,673
Total (128 passes) time: 10077 second(s)
Order 128 in HRF (Human-Readable-Format):
RIPPING: E:\Chunkerito_revision1+_Goldendonkey_Washi_BuildingBlocksDumper>Leprechaun_Dump_DDETT_BuildingBlocks_Order_128.bat DDETT-Definitive_Decompression_English_Texts_Torture.tar
RIPFILE: 5,785,162,560 DDETT-Definitive_Decompression_English_Texts_Torture.tar_Unique_128bytes_long_blocks_in_HEX.txt
RIPDATA: E:\Chunkerito_revision1+_Goldendonkey_Washi_BuildingBlocksDumper>type DDETT-Definitive_Decompression_English_Texts_Torture.tar_Unique_128bytes_long_blocks_in_HEX.txt|more
000,000,002     4E616D67617920446F6F6C612773206661636520636C6F7564656420666F722061206D6F6D656E742E2053686F72746C790D0A6166746572776172642068652077697468647265772066726F6D206D792074656E742C20616E6420492068656172642068696D2073696E67696E670D0A736F66746C7920616D6F6E6720746865
000,000,002     6164206E657665722074686F756768742061626F7574206974206265666F72652E2054726565626561726420686164207361696420736F6D657468696E672061626F75742077697A617264732C20627574206576656E207468656E20686520686164206E6F742074686F75676874206F662047616E64616C66206173206F6E65
000,000,002     6F7274616C2C207468652061626964696E672C2074686174207768696368200A6973206E6F742061666665637465642062792074696D653B20776973656C79207365656B696E672054686174206265636175736520796F75206861766520656E7175697265642C200A6F62736572766564206C69666520616E64206F62736572
000,000,002     656E206F66206D617465726E616C20766963652C0D0A4F72206F6620666563756E646974792074686520686964656F75732070726963652E0D0A0D0A0D0A576520686176652028636F72727570746564206E6174696F6E732920697420697320747275650D0A42656175746965732074686520616E6369656E742070656F706C
000,000,002     3E3C2F613E203C693E6E6F756E3C2F693E20283C693E5A6F6F6C6F67793C2F693E29203D203C6120687265663D22646F633A3832363637223E3C666F6E742073697A653D313E444555544F4D45524954453C2F666F6E743E3C2F613E203C666F6E742073697A653D313E4C31393C2F666F6E743E2E203C62723E3C6A31313439
...

Total memory needed for one pass: 417,054,121KB
Total distinct BBs/phrases: 1,044,425,155
Total (128 passes) time: 20841 second(s)
Had I had a 512GB SSD I would have run order 128 in one pass, 'Z' option, that is external B-trees.
On second thought, seeing how short on RAM today's PCs are, all orders ought to be ripped in one pass that way.

In summary, the stats for orders 4/8/16/32/48/128 are:
Order 004: Total / Distinct / Distinct_with_2[+]_occurrences BBs: (1,082,907,648-004+1) / 0,001,845,425 / (0,026,875,310/(12+004*2+2)=001,221,605)
Order 008: Total / Distinct / Distinct_with_2[+]_occurrences BBs: (1,082,907,648-008+1) / 0,071,214,872 / (0,993,870,210/(12+008*2+2)=033,129,007)
Order 016: Total / Distinct / Distinct_with_2[+]_occurrences BBs: (1,082,907,648-016+1) / 0,554,282,812 / (5,116,625,908/(12+016*2+2)=111,230,998)
Order 032: Total / Distinct / Distinct_with_2[+]_occurrences BBs: (1,082,907,648-032+1) / 0,898,193,409 / (5,100,468,984/(12+032*2+2)=065,390,628)
Order 048: Total / Distinct / Distinct_with_2[+]_occurrences BBs: (1,082,907,648-048+1) / 0,962,829,673 / (5,177,818,580/(12+048*2+2)=047,071,078)
Order 128: Total / Distinct / Distinct_with_2[+]_occurrences BBs: (1,082,907,648-128+1) / 1,044,425,155 / (5,785,162,560/(12+128*2+2)=021,426,528)
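
The divisions above can be reproduced with a few lines of C. This is only a sketch of the arithmetic: the 12 in the divisor presumably covers the 11-character count field plus one separator byte and the trailing 2 a CR LF pair, which is my reading of the formula, not something stated in the dump. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes per dump line, following the formula in the summary:
   12 (count field plus separator) + 2 hex digits per BB byte + 2 (line end). */
static uint64_t dump_line_bytes(unsigned order) {
    return 12u + 2u * (uint64_t)order + 2u;
}

/* Number of BB lines in a dump file of `file_bytes` bytes. */
static uint64_t dump_line_count(uint64_t file_bytes, unsigned order) {
    uint64_t line = dump_line_bytes(order);
    assert(file_bytes % line == 0);  /* every line has the same width */
    return file_bytes / line;
}

/* Total number of BBs (overlapping windows) of a given order in a file. */
static uint64_t total_blocks(uint64_t file_bytes, unsigned order) {
    return file_bytes - order + 1;
}
```

For instance `dump_line_count(26875310ULL, 4)` yields the 1,221,605 listed for order 004.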

For the exact size of the Nakamichi.BBOP file (the BB OFFSET pool), all occurrences have to be summed and multiplied by 4.
As for BB files (pointing to Nakamichi.BBOP) they are of size:
Nakamichi.004.BB: 001,221,605*(004+4+8)=0,019,545,680
Nakamichi.008.BB: 033,129,007*(008+4+8)=0,662,580,140
Nakamichi.016.BB: 111,230,998*(016+4+8)=3,114,467,944
Nakamichi.032.BB: 065,390,628*(032+4+8)=2,877,187,632

In total 19,545,680+662,580,140+3,114,467,944+2,877,187,632=6,673,781,396 bytes or 6364MB, which fits in the physical RAM of a 'modern' PC with 16GB.
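
The same sizes as a small C sketch, assuming each record is the BB itself plus a 4-byte count and an 8-byte pool position, as in the record layout given earlier. `bb_file_bytes` and `bb_total_bytes` are hypothetical helper names.

```c
#include <assert.h>
#include <stdint.h>

/* Size of one Nakamichi.NNN.BB file: each record holds the BB itself,
   a 4-byte occurrence count and an 8-byte position into the OFFSET pool. */
static uint64_t bb_file_bytes(uint64_t records, unsigned order) {
    assert(order > 0);
    return records * ((uint64_t)order + 4u + 8u);
}

/* Sum of the four files (orders 4/8/16/32) using the record counts listed above. */
static uint64_t bb_total_bytes(void) {
    return bb_file_bytes(1221605ULL, 4) + bb_file_bytes(33129007ULL, 8)
         + bb_file_bytes(111230998ULL, 16) + bb_file_bytes(65390628ULL, 32);
}
```

Dividing `bb_total_bytes()` by 1024*1024 gives the whole-megabyte figure.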
And the time needed to create the BB files:
Order 004: Total (128 passes) time: 3458 second(s)
Order 008: Total (128 passes) time: 4203 second(s)
Order 016: Total (128 passes) time: 5979 second(s)
Order 032: Total (128 passes) time: 8886 second(s)
Or, 3458+4203+5979+8886=22526 seconds, ugh, some nasty 6h.

Also, order 1 was ripped:
/*
E:\Chunkerito_revision1+_Goldendonkey_Washi_BuildingBlocksDumper>type DDETT-Definitive_Decompression_English_Texts_Torture.tar_Unique_001bytes_long_blocks_in_HEX.txt|sort/r
185,485,317     20
086,539,710     65
061,659,760     74
057,702,347     61
055,736,285     6F
053,296,887     6E
053,105,807     69
046,702,098     73
042,311,563     72
039,995,482     68
029,912,849     6C
027,709,393     64
021,116,713     75
019,978,250     63
019,759,916     66
018,016,495     6D
015,928,640     67
015,539,382     0A
014,767,818     0D
014,720,429     70
013,882,649     3E
013,880,698     3C
013,584,450     79
012,839,223     2C
012,681,891     77
012,658,135     62
011,622,474     2E
009,361,858     76
...
E:\Chunkerito_revision1+_Goldendonkey_Washi_BuildingBlocksDumper>type enwik9_Unique_001bytes_long_blocks_in_HEX.txt|sort/r
139,132,610     20
077,130,764     65
057,589,780     74
055,297,286     61
050,051,692     69
049,514,893     6F
046,064,994     6E
043,296,607     72
040,844,869     73
028,424,211     6C
026,177,831     68
022,976,495     64
020,385,009     63
020,216,224     5D
020,214,205     5B
019,831,916     75
019,597,417     6D
015,486,987     70
014,109,043     67
013,147,025     0A
012,824,746     66
010,044,325     79
009,578,832     2E
...
*/

static const unsigned char DDETT[256] = {0x20, 0x65, 0x74, 0x61, 0x6F, 0x6E, 0x69, 0x73, 0x72, 0x68, 0x6C, 0x64, 0x75, 0x63, 0x66, 0x6D, 0x67, 0x0A, 0x0D, 0x70, 0x3E, 0x3C, 0x79};
static const unsigned char Enwik[256] = {0x20, 0x65, 0x74, 0x61, 0x69, 0x6F, 0x6E, 0x72, 0x73, 0x6C, 0x68, 0x64, 0x63, 0x5D, 0x5B, 0x75, 0x6D, 0x70, 0x67, 0x0A, 0x66, 0x79, 0x2E};

printf("TOP 23 chars:\n");
printf("DDETT          | ENWIK9\n");
printf("-------------------------------\n");
for (j = 0; j < 23; j++) {
	if (DDETT[j]!=10 && DDETT[j]!=13)
		printf("%02X = %03d = '%c' | ",DDETT[j],DDETT[j],DDETT[j]);
	if (DDETT[j]==10 )
		printf("%02X = %03d = LF  | ",DDETT[j],DDETT[j]);
	if (DDETT[j]==13 )
		printf("%02X = %03d = CR  | ",DDETT[j],DDETT[j]);
	if (Enwik[j]!=10 && Enwik[j]!=13)
		printf("%02X = %03d = '%c'\n",Enwik[j], Enwik[j],Enwik[j]);
	if (Enwik[j]==10 )
		printf("%02X = %03d = LF\n",Enwik[j], Enwik[j]);
	if (Enwik[j]==13 )
		printf("%02X = %03d = CR\n",Enwik[j], Enwik[j]);
}
exit(1);

/*
TOP 23 chars:
DDETT          | ENWIK9
-------------------------------
20 = 032 = ' ' | 20 = 032 = ' '
65 = 101 = 'e' | 65 = 101 = 'e'
74 = 116 = 't' | 74 = 116 = 't'
61 = 097 = 'a' | 61 = 097 = 'a'
6F = 111 = 'o' | 69 = 105 = 'i'
6E = 110 = 'n' | 6F = 111 = 'o'
69 = 105 = 'i' | 6E = 110 = 'n'
73 = 115 = 's' | 72 = 114 = 'r'
72 = 114 = 'r' | 73 = 115 = 's'
68 = 104 = 'h' | 6C = 108 = 'l'
6C = 108 = 'l' | 68 = 104 = 'h'
64 = 100 = 'd' | 64 = 100 = 'd'
75 = 117 = 'u' | 63 = 099 = 'c'
63 = 099 = 'c' | 5D = 093 = ']'
66 = 102 = 'f' | 5B = 091 = '['
6D = 109 = 'm' | 75 = 117 = 'u'
67 = 103 = 'g' | 6D = 109 = 'm'
0A = 010 = LF  | 70 = 112 = 'p'
0D = 013 = CR  | 67 = 103 = 'g'
70 = 112 = 'p' | 0A = 010 = LF
3E = 062 = '>' | 66 = 102 = 'f'
3C = 060 = '<' | 79 = 121 = 'y'
79 = 121 = 'y' | 2E = 046 = '.'
*/
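
For the record, the order 1 ranking itself needs no heavy machinery. Below is a minimal sketch of counting and ranking bytes in a buffer; it is not the Leprechaun dumper, and the names `ByteCount`, `by_count_desc` and `rank_bytes` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct {
    unsigned char byte;   /* the order-1 BuildingBlock */
    uint64_t      count;  /* how many times it occurs */
} ByteCount;

/* qsort comparator: higher count first, ties broken by byte value. */
static int by_count_desc(const void *a, const void *b) {
    const ByteCount *x = a, *y = b;
    if (x->count != y->count) return (x->count < y->count) ? 1 : -1;
    return (int)x->byte - (int)y->byte;
}

/* Fill `out` with all 256 byte values ranked by frequency in buf[0..len). */
static void rank_bytes(const unsigned char *buf, size_t len, ByteCount out[256]) {
    assert(buf != NULL || len == 0);
    for (int i = 0; i < 256; i++) {
        out[i].byte = (unsigned char)i;
        out[i].count = 0;
    }
    for (size_t i = 0; i < len; i++) out[buf[i]].count++;
    qsort(out, 256, sizeof out[0], by_count_desc);
}
```

The first 23 entries of `out` then play the role of the DDETT[] and Enwik[] tables above.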

Update, 2014-Aug-12:

Influenced by the story of Kintaro, the legendary 'Golden Boy', here comes my most lovely variant so far, 'Kinroba'.
Probably no such word existed in Japanese until now; one of the most underestimated virtues is forbearance.
Forbearance and Faith are so closely related that sometimes I think they are one and the same thing.
So, in honor of this animal I coined the word 'Golden Donkey'.
Also, seeing how 'modern' CPU-RAM subsystems dislike branchless code, I returned back from the future by using one 'if-else'.

I haven't had enough time to benchmark it yet, but my expectations are HIGH.
For 'enwik8' I expect to go under the 40,000,000 mark.
As for decompression speed, just look at the Assembly code, only 2 DWORD memory accesses, mutsi-mutsi.
; mark_description "Intel(R) C++ Intel(R) 64 Compiler XE for applications running on Intel(R) 64, Version 12.1.1.258 Build 20111";
; mark_description "-O3 -QxSSE2 -D_N_YMM -FAcs";

.B8.3::
  0001f 8b 1c 10         mov ebx, DWORD PTR [rax+rdx]           
  00022 41 89 db         mov r11d, ebx                          
  00025 41 83 e3 03      and r11d, 3                            
  00029 74 51            je .B8.5 
.B8.4::                         
  0002b 44 89 d9         mov ecx, r11d                          
  0002e be ff ff ff 3f   mov esi, 1073741823                    
  00033 83 f1 03         xor ecx, 3                             
  00036 41 ff c3         inc r11d                               
  00039 c1 e1 03         shl ecx, 3                             
  0003c d3 ee            shr esi, cl                            
  0003e 23 f3            and esi, ebx                           
  00040 c1 ee 02         shr esi, 2                             
  00043 48 f7 de         neg rsi                                
  00046 49 03 f2         add rsi, r10                           
  00049 4c 03 d8         add r11, rax                           
  0004c 44 89 d8         mov eax, r11d                          
  0004f c4 c1 7e 6f 04 
        31               vmovdqu ymm0, YMMWORD PTR [r9+rsi]     
  00055 be ff ff ff ff   mov esi, -1                            
  0005a d3 ee            shr esi, cl                            
  0005c f7 d9            neg ecx                                
  0005e 83 c1 1e         add ecx, 30                            
  00061 23 de            and ebx, esi                           
  00063 d3 eb            shr ebx, cl                            
  00065 89 d9            mov ecx, ebx                           
  00067 bb 20 00 00 00   mov ebx, 32                            
  0006c d3 eb            shr ebx, cl                            
  0006e 49 03 d9         add rbx, r9                            
  00071 c4 81 7e 7f 04 
        11               vmovdqu YMMWORD PTR [r9+r10], ymm0     
  00077 41 89 d9         mov r9d, ebx                           
  0007a eb 1c            jmp .B8.6 
.B8.5::                         
  0007c c5 fe 6f 44 10 
        01               vmovdqu ymm0, YMMWORD PTR [1+rax+rdx]  
  00082 0f b6 db         movzx ebx, bl                          
  00085 c1 eb 02         shr ebx, 2                             
  00088 c4 81 7e 7f 04 
        11               vmovdqu YMMWORD PTR [r9+r10], ymm0     
  0008e 8d 44 03 01      lea eax, DWORD PTR [1+rbx+rax]         
  00092 41 03 d9         add ebx, r9d                           
  00095 41 89 d9         mov r9d, ebx                           
.B8.6::                         
  00098 41 3b c0         cmp eax, r8d                           
  0009b 72 82            jb .B8.3 

unsigned int DecompressKinroba (char* ret, char* src, unsigned int srcSize) {
	unsigned int srcIndex=0;
	unsigned int retIndex=0;
	unsigned int DWORDtrio;
	unsigned int Flag;
	while (srcIndex < srcSize) {
		DWORDtrio = *(unsigned int*)&src[srcIndex];
// |1stLSB  |2ndLSB   |3rdLSB   |4thLSB   |
// ----------------------------------------
// |FFxxxxxx|xxxxxx|TT|xxxxxx|TT|xxxxxx|TT|
// ----------------------------------------
// [1bit         16bit]    24bit]    32bit]
// TT = 0 means MatchLength (32>>TT) or 32
// TT = 1 means MatchLength (32>>TT) or 16
// TT = 2 means MatchLength (32>>TT) or 8
// TT = 3 means MatchLength (32>>TT) or 4
// FF = 0 means Literal
// FF = 1 means MatchOffset 2 bytes long i.e. Sliding Window is 2*8-FF-TT=2*8-2-2=12 bits or 4KB
// FF = 2 means MatchOffset 3 bytes long i.e. Sliding Window is 3*8-FF-TT=3*8-2-2=20 bits or 1MB
// FF = 3 means MatchOffset 4 bytes long i.e. Sliding Window is 4*8-FF-TT=4*8-2-2=28 bits or 256MB
		if (DWORDtrio & 0x03) {
				#ifndef _N_YMM
		memcpy((ret+retIndex), (const char *)( (uint64_t)(ret+retIndex-((DWORDtrio&(0x3FFFFFFF>>((3-(DWORDtrio & 0x03))<<3)))>>2)) ), 32);
				#endif
				#ifdef _N_YMM
		SlowCopy256bit( (const char *)( (uint64_t)(ret+retIndex-((DWORDtrio&(0x3FFFFFFF>>((3-(DWORDtrio & 0x03))<<3)))>>2)) ), (ret+retIndex+32*(0)));
				#endif
		srcIndex+= (uint64_t)(1+(DWORDtrio & 0x03));
		retIndex+= (uint64_t)( (Min_Match_Length) >>( (DWORDtrio&(0xFFFFFFFF>>((3-(DWORDtrio & 0x03))<<3))) >> (30-((3-(DWORDtrio & 0x03))<<3)) ));
		} else {
				#ifndef _N_YMM
		memcpy((ret+retIndex), (const char *)( (uint64_t)(src+srcIndex+1) ), 32);
				#endif
				#ifdef _N_YMM
		SlowCopy256bit( (const char *)( (uint64_t)(src+srcIndex+1+32*(0)) ), (ret+retIndex+32*(0)) );
				#endif
		srcIndex+= ((DWORDtrio & 0xFF)>>2)+1;
		retIndex+= ((DWORDtrio & 0xFF)>>2);
		}
	}        
	return retIndex;
}
The package containing the source and 32bit/64bit executables:
http://www.sanmayce.com/Nakamichi/Nakamichi_Kinroba.zip
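
To make the bit juggling in DecompressKinroba easier to follow, here is a sketch that only parses one token and copies nothing. It mirrors the masks and shifts of the function above and assumes Min_Match_Length is 32, as the TT comments suggest. `KinrobaToken`, `kinroba_parse` and `kinroba_window_bits` are hypothetical names, not part of the package.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    int      is_literal;   /* 1 for a literal run, 0 for a match */
    unsigned token_bytes;  /* bytes consumed from the compressed stream */
    uint32_t offset;       /* backward distance; 0 for a literal run */
    unsigned length;       /* bytes produced in the output */
} KinrobaToken;

/* Parse the DWORD loaded at the current source position. */
static KinrobaToken kinroba_parse(uint32_t dword) {
    KinrobaToken t;
    unsigned ff = dword & 0x03;               /* FF: low 2 bits of the 1st byte */
    if (ff == 0) {
        t.is_literal  = 1;
        t.length      = (dword & 0xFF) >> 2;  /* run length in the upper 6 bits */
        t.token_bytes = 1 + t.length;         /* header byte + the literals */
        t.offset      = 0;
    } else {
        unsigned unused = (3 - ff) << 3;      /* DWORD bits beyond this token */
        unsigned tt = (dword & (0xFFFFFFFFu >> unused)) >> (30 - unused);
        t.is_literal  = 0;
        t.token_bytes = 1 + ff;               /* 2, 3 or 4 bytes */
        t.offset      = (dword & (0x3FFFFFFFu >> unused)) >> 2;
        t.length      = 32u >> tt;            /* assumes Min_Match_Length == 32 */
    }
    return t;
}

/* Offset width in bits for a match token: bytes*8 minus FF and TT. */
static unsigned kinroba_window_bits(unsigned ff) {
    assert(ff >= 1 && ff <= 3);
    return (1 + ff) * 8 - 2 - 2;
}
```

For FF=1/2/3 `kinroba_window_bits` returns 12/20/28, i.e. the 4KB/1MB/256MB windows.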

Update, 2014-Aug-10:

Time for a quite natural follow-up to 'Washi', a heavyweight variant called 'Keigan'.
In 'Keigan' 2MB/512MB sliding windows are used; going from 4MB (22bit) to 2MB (21bit) hurts ratio.
However, if many 32/16 long matches exist within the next window (29bit), the tradeoff will pay off (32:4 or 16:4).
I brutally disregard today's limitations like cache sizes and other boring stuff.
'Keigan' is a maverick and targets the late twenties of the 21st century, heh-heh, deeply penetrative, as it should be.

Update, 2014-Aug-04:

Just saw a helping hand, big thanks go to Blameless:
http://www.overclock.net/t/1496912/superfast-decompression-white-fox-benchmark
Time for a quite natural follow-up to 'Washi', a lightweight variant called 'Jiten'.
In my view 'Washi' is superbly balanced - no speed/size compromises of any kind, yet in 'Jiten' I sacrifice size for speed.

Contemporary Prithvi statue from Jakarta Indonesia.
Prithvi/Jiten: Guardian of the downward direction. Earth goddess. Consort of Brahma/Bonten.
Prithvi worship fell out of fashion quickly after the Vedic period,
as other mother goddesses like Lakshmi and Saraswati gained popularity.
As a result, Indian Prithvi images are rare. However,
in Indonesia Prithvi was adopted as "Ibu Pertiwi" and is the national personification of Indonesia,
much like Bharat Mata is in India.

Saraswati/Benzaiten 弁財天
The only female of the Shichi Fukujin is Benzaiten (a.k.a. Benten) and is originally the Hindu goddess of water.
Another one of the Seven Lucky Gods. Overall she seems to have been transmitted to Japan without much major change.
She has retained her association with waters, music, language and knowledge.
She has become the bodhisattva of entertainers, similar to how musicians worship Saraswati in India.
Shinto worshippers have also adopted her as a kami.
In her Japanese representation, she is the Goddess of Arts and Knowledge.
------------------------------------------------------------------------------------------------------------------------------------------------------
| compressor \ filedataset      | alice29.txt    | CalgaryCorpus.tar  | shaks12.txt        | dickens             | enwik8                            |
------------------------------------------------------------------------------------------------------------------------------------------------------
| UNCOMPRESSED                  | 152,089        | 3,153,408          | 5,582,655          | 10,192,446          | 100,000,000                       |
| Nakamichi 'Jiten'      (32KB) | 071,924 / 0244 | 1,533,344 / 018744 | 2,657,388 / 005642 | 04,943,712 / 005555 | 051,084,523 / 0326068 /  676 MB/s |
| Nakamichi 'Washi'       (4MB) | 088,897 / 0006 | 1,484,221 / 000966 | 2,384,536 / 000011 | 04,261,276 / 000012 | 042,714,346 / 0000232 / 435+ MB/s |
| 7z's gz, Ultra Deflate32      | 051,707        | 0,980,026          | 1,934,787          | 03,681,828          | 035,102,891                       |
| 7z's zip, Ultra Deflate64     | 050,051        | 0,945,849          | 1,834,240          | 03,508,645          | 033,757,921                       |
| TANGELO 2.3                   | 039,160        | 0,710,066          | 1,236,021          | 02,279,659          | 020,921,619                       |
| LZ4 v1.4, -9                  | 063,705        | 1,195,853          | 2,315,036          | 04,442,992          | 042,283,904         / 2186.9 MB/s |
| Yappy, 8192 10000             | 087,965        | 1,654,203          | 3,337,964          | 06,374,780          | 057,701,807         /  698.7 MB/s |
| Yappy, 65536 10000            | 081,217        | 1,544,271          | 3,120,688          | 05,912,295          | 054,162,908         /  679.4 MB/s |
| Yappy, 1048576 10000          | 080,353        | 1,530,823          | 3,091,493          | 05,850,648          | 053,687,370         /  679.4 MB/s |
------------------------------------------------------------------------------------------------------------------------------------------------------
The package containing the source:
http://www.sanmayce.com/Nakamichi/Nakamichi_Jiten.zip
unsigned int Decompress_Jiten (char* ret, char* src, unsigned int srcSize) {
	unsigned int srcIndex=0;
	unsigned int retIndex=0;
	unsigned int DWORDtrio;
	// |1stLSB   |2ndLSB   |
	// ---------------------
	// |xxxTT|TTT|xxxxxx|xL|
	// ---------------------
	// [1bit          16bit]
	// LL = 0 means (8>>LL) or 8
	// LL = 1 means (8>>LL) or 4
	while (srcIndex < srcSize) {
		DWORDtrio = *(unsigned short int*)&src[srcIndex];
		if ( (DWORDtrio & 0xF8) == 0 ) {
			*(uint64_t*)(ret+retIndex+8*(0)) = *(uint64_t*)(src+srcIndex+1+16*(0));
			retIndex+= (DWORDtrio & 0xFF); 
			srcIndex+= ((DWORDtrio & 0xFF)+1);
		} else {
			*(uint64_t*)(ret+retIndex+8*(0)) = *(uint64_t*)(ret+retIndex-(DWORDtrio&0x7FFF));
			retIndex+= (Min_Match_Length>>(DWORDtrio>>(23-8)));
			srcIndex+= (3-1);
		}
	}
	return retIndex;
}
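
The 'Jiten' token can be parsed the same way. The sketch below follows the tests in Decompress_Jiten and assumes Min_Match_Length is 8, as the LL comments suggest. It builds the 16-bit word byte by byte instead of casting the pointer, which gives the same value as the original load on little-endian machines. `JitenToken`, `jiten_load16` and `jiten_parse` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    int      is_literal;   /* 1 for a literal run, 0 for a match */
    unsigned token_bytes;  /* bytes consumed from the compressed stream */
    unsigned offset;       /* backward distance; 0 for a literal run */
    unsigned length;       /* bytes produced in the output */
} JitenToken;

/* Little-endian 16-bit load without an unaligned pointer cast. */
static uint16_t jiten_load16(const char *p) {
    assert(p != NULL);
    return (uint16_t)((unsigned char)p[0] | ((unsigned)(unsigned char)p[1] << 8));
}

static JitenToken jiten_parse(uint16_t word) {
    JitenToken t;
    if ((word & 0xF8) == 0) {          /* bits 3..7 of the 1st byte are clear */
        t.is_literal  = 1;
        t.length      = word & 0xFF;   /* 0..7 literals follow the header byte */
        t.token_bytes = 1 + t.length;
        t.offset      = 0;
    } else {
        t.is_literal  = 0;
        t.offset      = word & 0x7FFF; /* 15-bit offset, hence the 32KB window */
        t.length      = 8u >> (word >> 15);  /* assumes Min_Match_Length == 8 */
        t.token_bytes = 2;
    }
    return t;
}
```

A match always costs 2 bytes and yields 8 or 4, which is where the size-for-speed sacrifice comes from.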
D:\DDETT>Nakamichi_Jiten_GP.exe DDETT-Definitive_Decompression_English_Texts_Torture.tar
Nakamichi 'Jiten', written by Kaze, based on Nobuo Ito's LZSS source, babealicious suggestion by m^2 enforced.
Compressing 1082907648 bytes ...
\; Each rotation means 64KB are encoded; Done 100%
NumberOfFullLiterals (lower-the-better): 1128961
NumberOfTinyMatches: 108701600
NumberOfShortMatches: 70675430
NumberOfMediumMatches: 0
NumberOfLongMatches: 0
RAM-to-RAM performance: 197 KB/s.

D:\DDETT>dir

08/03/2014  07:46 PM     1,082,907,648 DDETT-Definitive_Decompression_English_Texts_Torture.tar
08/04/2014  07:01 AM       477,188,523 DDETT-Definitive_Decompression_English_Texts_Torture.tar.Nakamichi
08/02/2014  05:19 AM             1,604 MokujIN prompt.lnk
08/04/2014  05:37 AM           109,056 Nakamichi_Jiten_GP.exe

D:\DDETT>Nakamichi_Jiten_GP.exe DDETT-Definitive_Decompression_English_Texts_Torture.tar.Nakamichi /report
Nakamichi 'Jiten', written by Kaze, based on Nobuo Ito's LZSS source, babealicious suggestion by m^2 enforced.
Decompressing 477188523 bytes ...
RAM-to-RAM performance: 769 MB/s.
Memory pool starting address: 000000001CBD0080 ... 64 byte aligned, OK
Copying a 256MB block 1024 times i.e. 256GB READ + 256GB WRITTEN ...
memcpy(): (256MB block); 262144MB copied in 94911 clocks or 2.762MB per clock
RAM-to-RAM performance vs memcpy() ratio (bigger-the-better): 27%

D:\DDETT>Yappy_32bit.exe DDETT-Definitive_Decompression_English_Texts_Torture.tar 32768 10000
YAPPY: [b 32K] bytes 1082907648 -> 526570443  48.6%  comp  30.3 MB/s  uncomp 668.9 MB/s
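
The /report figures above can be cross-checked with a little arithmetic. The sketch assumes that one 'clock' is one millisecond (CLOCKS_PER_SEC of 1000, as is usual for Windows compilers) and that the percentage is truncated; both are my assumptions, the report itself does not say so. The function names are hypothetical.

```c
#include <assert.h>

/* memcpy() throughput as printed by the report: MB copied per clock tick. */
static double mb_per_clock(double megabytes, double clocks) {
    assert(clocks > 0.0);
    return megabytes / clocks;
}

/* Decompression speed relative to memcpy(), in whole percent (truncated). */
static unsigned ratio_percent(double decomp_mb_per_s, double memcpy_mb_per_clock,
                              double clocks_per_sec) {
    assert(memcpy_mb_per_clock > 0.0 && clocks_per_sec > 0.0);
    return (unsigned)(100.0 * decomp_mb_per_s
                      / (memcpy_mb_per_clock * clocks_per_sec));
}
```

With 262144MB in 94911 clocks and 769 MB/s, `ratio_percent(769.0, mb_per_clock(262144.0, 94911.0), 1000.0)` lands on the reported 27%.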

Update, 2014-Aug-03:

Looking at all the compression corpora scattered across the Internet, one thought arises again and again - we can do better.
Mixing different types of data as in the SILESIA corpus is good, but it says little about English texts as a whole,
especially in the decompression speed department.
Hardcore English text torturers like me need the text picture only, and a DEFINITIVE one on top of that, that is:
- Giving coherent compression ratio statistics;
- Giving MAIN RAM decompression speed statistics;
- Being big enough in order to escape being trapped within L1/L2/L3 when compressed.

A year ago I saw a division of Intel, India, producing i7s with 30MB cache.
Now it is time to have a compressed corpus far bigger than this, yes?
For example, 'enwik8' is simply outdated; 'enwik9' is quite useful but not exactly.
So, after tarring 1GB (1,082,810,446 bytes) of mainstream English texts into one single file we have:
1.0:1     1,082,907,648 DDETT-Definitive_Decompression_English_Texts_Torture.tar
2.8:1       380,665,413 DDETT-Definitive_Decompression_English_Texts_Torture.tar.19.lzt              ! -19 !
4.7:1       227,860,332 DDETT-Definitive_Decompression_English_Texts_Torture.tar.49.lzt              ! -49 !
5.0:1       213,816,605 DDETT-Definitive_Decompression_English_Texts_Torture.tar.49_block128MB.lzt   ! -49 -b128 !
5.0:1       216,188,730 DDETT-Definitive_Decompression_English_Texts_Torture.tar.7z                  ! -t7z -mx9 !
5.8:1       186,555,864 DDETT-Definitive_Decompression_English_Texts_Torture.tar.block128MB.bbb      ! cfm128 !
3.2:1       328,252,314 DDETT-Definitive_Decompression_English_Texts_Torture.tar.Doboz
2.0:1       534,389,558 DDETT-Definitive_Decompression_English_Texts_Torture.tar.FastLZ              ! -2 !
2.8:1       386,308,041 DDETT-Definitive_Decompression_English_Texts_Torture.tar.lz4                 ! -9 !
3.2:1       336,177,082 DDETT-Definitive_Decompression_English_Texts_Torture.tar.lzh                 / Yoshizaki's 20+ years old LHA /
4.4:1       241,416,540 DDETT-Definitive_Decompression_English_Texts_Torture.tar.lzhds.nz            ! -cD !
6.8:1       157,378,486 DDETT-Definitive_Decompression_English_Texts_Torture.tar.cm.nz               ! -cc !
5.0:1       213,997,220 DDETT-Definitive_Decompression_English_Texts_Torture.tar.order04.PPMonstr    ! -m1024 -o4 !
5.7:1       189,072,423 DDETT-Definitive_Decompression_English_Texts_Torture.tar.order06.PPMonstr    ! -m1024 -o6 !
6.0:1       179,543,238 DDETT-Definitive_Decompression_English_Texts_Torture.tar.order08.PPMonstr    ! -m1024 -o8 !
6.3:1       169,784,214 DDETT-Definitive_Decompression_English_Texts_Torture.tar.order16.PPMonstr    ! -m1024 -o16 !
6.3:1       169,634,437 DDETT-Definitive_Decompression_English_Texts_Torture.tar.order32.PPMonstr    ! -m1024 -o32 !
2.5:1       427,885,522 DDETT-Definitive_Decompression_English_Texts_Torture.tar.QuickLZ             / Level 3 /
3.9:1       271,076,720 DDETT-Definitive_Decompression_English_Texts_Torture.tar.sr2
4.4:1       245,308,541 DDETT-Definitive_Decompression_English_Texts_Torture.tar.ST3_block128MB.bsc  ! -m0b128 -Tt ! / Version 2.3.0 /
5.0:1       214,353,318 DDETT-Definitive_Decompression_English_Texts_Torture.tar.ST4_block128MB.bsc  ! -m1b128 -Tt ! / Version 2.3.0 /
5.3:1       201,591,671 DDETT-Definitive_Decompression_English_Texts_Torture.tar.ST5_block128MB.bsc  ! -m2b128 -Tt ! / Version 2.3.0 /
5.8:1       184,725,469 DDETT-Definitive_Decompression_English_Texts_Torture.tar.BWT_block128MB.bsc  ! -m3b128 -Tt ! / Version 2.3.0 /
6.0:1       179,037,137 DDETT-Definitive_Decompression_English_Texts_Torture.tar.tangelo             / Version 2.3 /
1.4:1       758,789,281 DDETT-Definitive_Decompression_English_Texts_Torture.tar.0001K.Yappy         ! 1024 10000 !
1.5:1       683,426,649 DDETT-Definitive_Decompression_English_Texts_Torture.tar.0002K.Yappy         ! 2048 10000 !
1.7:1       612,838,178 DDETT-Definitive_Decompression_English_Texts_Torture.tar.0004K.Yappy         ! 4096 10000 !
1.9:1       563,557,594 DDETT-Definitive_Decompression_English_Texts_Torture.tar.0008K.Yappy         ! 8192 10000 !
2.0:1       538,909,255 DDETT-Definitive_Decompression_English_Texts_Torture.tar.0016K.Yappy         ! 16384 10000 !
2.0:1       520,393,884 DDETT-Definitive_Decompression_English_Texts_Torture.tar.0064K.Yappy         ! 65536 10000 !
2.0:1       515,767,567 DDETT-Definitive_Decompression_English_Texts_Torture.tar.0256K.Yappy         ! 262144 10000 !
2.1:1       514,318,888 DDETT-Definitive_Decompression_English_Texts_Torture.tar.4096K.Yappy         ! 4194304 10000 !
2.9:1       370,986,428 DDETT-Definitive_Decompression_English_Texts_Torture.tar.Nakamichi
2.7:1       393,403,817 DDETT-Definitive_Decompression_English_Texts_Torture.tar.Z
3.3:1       319,107,528 DDETT-Definitive_Decompression_English_Texts_Torture.tar.zip                 ! -tzip -mx9 !
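
The ratios in the left column appear to be truncated, not rounded, to one decimal (e.g. Doboz at 3.299 is listed as 3.2). A sketch of that computation in integer arithmetic, with the hypothetical name `ratio_tenths`:

```c
#include <assert.h>
#include <stdint.h>

/* Compression ratio original:compressed in tenths, truncated.
   A return value of 28 reads as 2.8:1. */
static unsigned ratio_tenths(uint64_t original, uint64_t compressed) {
    assert(compressed != 0);
    return (unsigned)(original * 10u / compressed);
}
```

For the .19.lzt line, `ratio_tenths(1082907648ULL, 380665413ULL)` gives 28, that is 2.8:1.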

D:\DDETT>type DDETT-Definitive_Decompression_English_Texts_Torture.tar.SHA1
50c692b4500bab649789fa642c8619a495d399bc  DDETT-Definitive_Decompression_English_Texts_Torture.tar
The tarred files are listed:
07/31/2014  10:01 AM        21,697,024 _(ebook)Collection_Star_Trek(58_books).tar
07/31/2014  10:01 AM        28,762,624 _Agatha.Christie.Complete.Work.74.books.tar
07/31/2014  10:01 AM        12,432,384 _Cambridge.History.of.Japan.6.Volumes.Set.PDF.ebooks.tar
07/31/2014  10:01 AM         4,182,016 _hemingway_ernest_-_14_books_in_txt_format.tar
07/31/2014  10:01 AM        43,488,256 _Swedenborg.tar
07/31/2014  10:01 AM            98,848 3333_Latin_Powers.TXT
07/31/2014  10:01 AM           595,144 A_Popular_Dictionary_of_Shinto_(1997)BBS.pdf.txt
07/31/2014  10:01 AM           316,172 ALICE.txt
07/31/2014  10:01 AM           152,089 alice29.txt
07/31/2014  10:01 AM         1,865,925 An_Encyclopedia_of_Swearing_-_The_Social_History_of_Oaths_Profanity_Foul_Language_And_Ethnic_Slurs_in_the_English-speaking_World.pdf.txt
07/31/2014  10:01 AM         2,137,088 Baum_Frank.tar
07/31/2014  10:01 AM         4,587,478 bible.txt
07/31/2014  10:01 AM        19,832,832 Books_by_Krishnamurti.tar
07/31/2014  10:01 AM           739,840 Bradbury_Ray.tar
07/31/2014  10:01 AM         5,574,144 Burroughs_Edgar_Rice.tar
07/31/2014  10:01 AM            27,703 CARLOS.TXT
07/31/2014  10:01 AM         5,851,648 Carlos_Castaneda.tar
07/31/2014  10:01 AM         4,447,744 Clarke_Arthur.tar
07/31/2014  10:01 AM         3,697,142 Complete_Sherlock_Holmes.txt
07/31/2014  10:01 AM         1,104,971 Dead_Souls-Nikolai_Vasilievich_Gogol.txt
07/31/2014  10:01 AM         3,019,264 Defoe_Daniel.tar
07/31/2014  10:01 AM        10,192,446 dickens
07/31/2014  10:01 AM         2,891,125 Don_Quixote.txt
07/31/2014  10:01 AM         1,333,914 Dumas_Alexandre_-_The_Three_Musketeers.txt
07/31/2014  10:01 AM        19,373,635 Encyclopedia_Of_Astronomy_And_Astrophysics.pdf.txt
07/31/2014  10:01 AM         4,033,424 Encyclopedia_of_Consciousness.pdf.txt
07/31/2014  10:01 AM         6,710,958 Encyclopedia_of_Insects.pdf.txt
07/31/2014  10:01 AM         7,086,083 Encyclopedia_of_Occultism_and_Parapsychology_A-L.pdf.txt
07/31/2014  10:01 AM         7,082,800 Encyclopedia_of_Occultism_and_Parapsychology_M-Z.pdf.txt
07/31/2014  10:01 AM         1,821,285 English-Japanese_Dictionary.pdf.txt
07/31/2014  10:01 AM        65,638,667 enwikt-defs-20140206-en.tsv.txt
07/31/2014  10:01 AM         1,820,160 Fleurs_du_mal.tar
07/31/2014  10:01 AM        11,546,860 Goyathlay.txt
07/31/2014  10:01 AM         7,477,248 Herbert_Frank.tar
07/31/2014  10:01 AM        28,402,602 HERITAGE.TXT
07/31/2014  10:01 AM         2,036,212 Japanese-English_Dictionary.pdf.txt
07/31/2014  10:01 AM        30,477,824 JRR_Tolkien.tar
07/31/2014  10:01 AM        14,060,032 King_Stephen.tar
07/31/2014  10:01 AM         5,735,936 KJV_Bible.tar
07/31/2014  10:01 AM         5,248,512 Koontz_Dean.tar
07/31/2014  10:01 AM        15,677,952 Krishnamurti.tar
07/31/2014  10:01 AM         2,173,035 L_R_Hubbard_-_Battlefield_Earth-_FULL.txt
07/31/2014  10:01 AM         6,175,246 Latin_DICTPAGE.RAW.TXT
07/31/2014  10:01 AM         3,903,143 MASAKARI_General-Purpose_Grade_English_Wordlist.wrd
07/31/2014  10:01 AM        24,823,016 mobythesaurus.txt
07/31/2014  10:01 AM         1,387,801 Nana.txt
07/31/2014  10:01 AM         4,642,789 NETBible_Noteless.txt
07/31/2014  10:01 AM         1,882,268 New_Larousse_Encyclopedia_Of_Mythology.pdf.txt
07/31/2014  10:01 AM        12,921,904 New_Oxford_Textbook_of_Psychiatry.txt
07/31/2014  10:01 AM       132,728,832 New_Shorter_Oxford_English_Dictionary_fifth_edition.tar
07/31/2014  10:01 AM       206,908,949 OSHO.TXT
07/31/2014  10:01 AM         6,293,011 OXFORD_Collocations_Dictionary.txt
07/31/2014  10:01 AM           278,528 Poe_Edgar_Allan.tar
07/31/2014  10:01 AM         3,284,649 Project_Gutenberg_EBook_of_War_and_Peace_by_Leo_Tolstoy_wrnpc11.txt
07/31/2014  10:01 AM         1,255,801 Project_Gutenberg_Etext_of_Moby_Dick_moby11.txt
07/31/2014  10:01 AM         1,702,164 Project_Gutenberg_Etext_of_The_Canterbury_Tales_cbtls11.txt
07/31/2014  10:01 AM         3,497,369 Project_Gutenberg_Etext_Of_The_CIA_World_Factbook_for_2000_world00.txt
07/31/2014  10:01 AM         6,884,210 Project_Gutenberg_Etext_of_The_Complete_Memoires_of_Casanova_csnva11.txt
07/31/2014  10:01 AM         5,527,772 Project_Gutenberg_Etext_of_the_Entire_PG_Memoirs_of_Napoleon_napol10.txt
07/31/2014  10:01 AM         1,798,759 Project_Gutenberg_Etext_of_The_Works_of_Rudyard_Kipling_1vkip10.txt
07/31/2014  10:01 AM         1,899,162 Project_Gutenberg_Etext_The_Descent_of_Man_by_Charles_Darwin_dscmn10.txt
07/31/2014  10:01 AM         3,984,896 Quran.tar
07/31/2014  10:01 AM         2,926,838 Rosemary_Ellen_Cuiley_The_Encyclopedia_of_Ghosts_Spirits_2007.pdf.txt
07/31/2014  10:01 AM         4,999,168 Sahih_Bukhari.tar
07/31/2014  10:01 AM         5,582,655 shaks12.txt
07/31/2014  10:01 AM        32,536,576 Sivananda.tar
07/31/2014  10:01 AM         5,433,344 Sookie_Stackhouse.tar
07/31/2014  10:01 AM         9,220,096 Sri_Sathya_Sai_Discourses.tar
07/31/2014  10:01 AM        60,006,912 Sri_Sathya_Sai_major_texts.tar
07/31/2014  10:01 AM         2,718,208 SriAurobindo.tar
07/31/2014  10:01 AM         5,558,784 SriSwamiChidanandaji.tar
07/31/2014  10:01 AM         3,955,200 Star_Wars.tar
07/31/2014  10:01 AM         2,620,928 SwamiKrishnanandaDiscourses.tar
07/31/2014  10:01 AM        18,963,456 Terry_Pratchett.tar
07/31/2014  10:01 AM         1,769,929 The.Encyclopedia.Of.Magic.And.Alchemy---420ebooks.pdf.txt
07/31/2014  10:01 AM         1,171,090 The.Encyclopedia.of.Religious.Phenomena---420ebooks.pdf.txt
07/31/2014  10:01 AM           532,612 THE_COLD_HEART_by_Wilhelm_Hauff_six_translations.txt
07/31/2014  10:01 AM         3,942,903 The_Count_of_Monte_Cristo.txt
07/31/2014  10:01 AM         6,375,032 The_Crusades_An_Encyclopedia_4_Vol._(2006).pdf.txt
07/31/2014  10:01 AM           809,222 The_Dice_Man_-_Luke_Rhinehart.txt
07/31/2014  10:01 AM         1,946,410 The_Encyclopedia_of_Celtic_Mythology_and_Folklore.pdf.txt
07/31/2014  10:01 AM         1,426,253 The_Encyclopedia_of_Demons_and_Demonology.pdf.txt
07/31/2014  10:01 AM         2,042,836 The_Encyclopedia_of_Witches_Witchcraft_and_Wicca.pdf.txt
07/31/2014  10:01 AM           109,909 The_Little_Match_Girl+The_Steadfast_Tin_Soldier_15-translations.txt
07/31/2014  10:01 AM         4,596,124 The_Oxford_Thesaurus_An_A-Z_Dictionary_of_Synonyms.txt
07/31/2014  10:01 AM         1,391,054 The_Project_Gutenberg_Etext_of_The_Idiot_by_Dostoevsky_idiot10.txt
07/31/2014  10:01 AM         1,113,088 The_Simplicissimus_Project.tar
07/31/2014  10:01 AM           689,260 The_Teachings_and_Practices_of_the_Early_Quanzhen_Taoist_Masters.pdf.txt
07/31/2014  10:01 AM         2,888,704 The_Works_of_Edgar_Allen_Poe_(five_volumes).tar
07/31/2014  10:01 AM           522,911 Thus_Spake_Zarathustra_by_Friedrich_Nietzsche.txt
07/31/2014  10:01 AM           246,461 Tzu_Sun_-_The_Art_Of_War.txt
07/31/2014  10:01 AM         6,906,816 UF-ENG-001World-2009-0.22.SRT.txt
07/31/2014  10:01 AM         4,612,608 Webster_Bible.tar
07/31/2014  10:01 AM         1,384,960 Wells_HG.tar
07/31/2014  10:01 AM        22,457,344 www.gutenberg.org_Folklore_(Bookshelf).tar
07/31/2014  10:01 AM        14,141,440 www.gutenberg.org_Paganism_(Bookshelf).tar
For a long time I wanted to have one solid English comparative TEXTUAL [DE]COMPRESSION corpus, so here it is.

Nakamichi

To decompress 'DDETT' you need the package containing the source and YMM executable:
http://www.sanmayce.com/Nakamichi/Nakamichi_Washi_and_Kaidanji.zip
Note: Re-uploaded 2014-Aug-07; by mistake a draft a few hours older had been zipped. Now OK, my apologies.

And a quick showdown against the superfast Yappy on a laptop with a Core 2 Q9550s @2.83GHz:
C:\DDETT>Nakamichi_Washi_YMMless.exe DDETT-Definitive_Decompression_English_Texts_Torture.tar
Nakamichi 'Washi', written by Kaze, based on Nobuo Ito's LZSS source, babealicious suggestion by m^2 enforced.
Compressing 1082907648 bytes ...
\; Each rotation means 64KB are encoded; Done 100%
NumberOfFullLiterals (lower-the-better): 290
NumberOfTinyMatches: 35428349
NumberOfShortMatches: 64570042
NumberOfMediumMatches: 16084083
NumberOfLongMatches: 5077863
RAM-to-RAM performance: 4 KB/s.

C:\DDETT>Nakamichi_Washi_YMMless.exe DDETT-Definitive_Decompression_English_Texts_Torture.tar.Nakamichi /report
Nakamichi 'Washi', written by Kaze, based on Nobuo Ito's LZSS source, babealicious suggestion by m^2 enforced.
Decompressing 370986428 bytes ...
RAM-to-RAM performance: 493 MB/s.
Memory pool starting address: 0000000016720080 ... 64 byte aligned, OK
Copying a 256MB block 1024 times i.e. 256GB READ + 256GB WRITTEN ...
memcpy(): (256MB block); 262144MB copied in 94693 clocks or 2.768MB per clock
RAM-to-RAM performance vs memcpy() ratio (bigger-the-better): 17%

C:\DDETT>Yappy_32bit.exe DDETT-Definitive_Decompression_English_Texts_Torture.tar 1024 10000
YAPPY: [b 1K] bytes 1082907648 -> 758789281  70.1%  comp  49.0 MB/s  uncomp 817.7 MB/s 

C:\DDETT>Yappy_32bit.exe DDETT-Definitive_Decompression_English_Texts_Torture.tar 2048 10000
YAPPY: [b 2K] bytes 1082907648 -> 683426649  63.1%  comp  43.7 MB/s  uncomp 761.0 MB/s 

C:\DDETT>Yappy_32bit.exe DDETT-Definitive_Decompression_English_Texts_Torture.tar 4096 10000
YAPPY: [b 4K] bytes 1082907648 -> 612838178  56.6%  comp  38.4 MB/s  uncomp 711.7 MB/s 

C:\DDETT>Yappy_32bit.exe DDETT-Definitive_Decompression_English_Texts_Torture.tar 8192 10000
YAPPY: [b 8K] bytes 1082907648 -> 563557594  52.0%  comp  32.7 MB/s  uncomp 675.4 MB/s 

C:\DDETT>Yappy_32bit.exe DDETT-Definitive_Decompression_English_Texts_Torture.tar 16384 10000
YAPPY: [b 16K] bytes 1082907648 -> 538909255  49.8%  comp  30.8 MB/s  uncomp 675.4 MB/s 

C:\DDETT>Yappy_32bit.exe DDETT-Definitive_Decompression_English_Texts_Torture.tar 65536 10000
YAPPY: [b 64K] bytes 1082907648 -> 520393884  48.1%  comp  29.9 MB/s  uncomp 668.9 MB/s 

C:\DDETT>Yappy_32bit.exe DDETT-Definitive_Decompression_English_Texts_Torture.tar 262144 10000
YAPPY: [b 256K] bytes 1082907648 -> 515767567  47.6%  comp  30.1 MB/s  uncomp 675.4 MB/s 

C:\DDETT>Yappy_32bit.exe DDETT-Definitive_Decompression_English_Texts_Torture.tar 4194304 10000
YAPPY: [b 4096K] bytes 1082907648 -> 514318888  47.5%  comp  29.5 MB/s  uncomp 668.9 MB/s 
My expectation is that 'Washi' with YMM support is gonna SCREAM! Remains to be seen...
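
Nakamichi's log prints byte counts but no ratio, while Yappy prints a percentage. To put both on the same scale, here is a minimal sketch; the helper name `ratio_percent` is mine and is not part of either tool:

```c
#include <assert.h>
#include <stdint.h>

/* Compressed size as a percentage of the original size (smaller is better).
   Hypothetical helper, not taken from Nakamichi or Yappy. */
static double ratio_percent(uint64_t original_bytes, uint64_t compressed_bytes) {
    assert(original_bytes != 0);
    return 100.0 * (double)compressed_bytes / (double)original_bytes;
}
```

With the sizes logged above, `ratio_percent(1082907648, 370986428)` comes out at roughly 34.3% for 'Washi', against 47.5% for Yappy's best run (4096K blocks). Yappy in turn decompresses faster on this machine (668.9 MB/s vs 493 MB/s).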

Nakamichi

Eagle Talons
Get a Grip!
Like other birds of prey, eagles have very special feet, which are different from those of other animals.
We call those special feet talons.
Eagle feet have claws, but so do the feet on dogs, cats, squirrels, raccoons, robins, and even tiny hummingbirds.
What makes eagle feet different? First, the claws must be extremely strong and sharp.
When an eagle catches a fish, those claws have to slice into a stiff,
strong fish with thick scales protecting its body. (All birds of prey use their feet for killing,
from the tiniest Elf Owl and American Kestrel to the largest eagles.)
But sharp claws are NOT the reason eagle feet are called talons; after all, cats have sharp claws, too,
but they don't have talons. What makes talons different? They are designed to carry things.
An eagle foot is made up of four muscular toes, powerful enough to hang onto a fairly large fish
as the eagle carries it through the air.
Eagle talons are among the largest and strongest in the bird world. However, their feet aren't quite
as well designed for capturing fish as ospreys' feet are. One reason has to do with their toes.
Both eagles and osprey have three front toes and one back toe. But one of an osprey's front toes is opposable,
like our thumbs, and it can rotate backward.
How does an opposable toe help an osprey catch fish? When an eagle holds a fish, it has the front toes
from both feet on one side of the fish (6 toes) and just the back toes (2) on the other side of the fish.
Ospreys catch their fish with two toes from both feet on each side of the fish. This is more balanced.
Fish thrash their bodies back and forth when they are struggling against being caught.
If they thrash away from an eagle's front toes (with no back toe pushing the opposite way),
the fish may actually jerk itself out of the eagle's talons. If the eagle is over water,
it's even possible that the fish will survive the fall and its injuries may heal.
A fifth-grade boy in Duluth, Minnesota once caught a walleye that had six scars on one side of its body,
and two scars on the other, exactly the size and shape of eagle toes!
/An excerpt from http://www.learner.org 'Journey North Bald Eagles'/

SputnikFace: Last summer I saw a hawk closeup for the first time perched on a basketball court fence. Fucker was huge.
Looked like Danny Devito was standing on the fence in a hawk suit.
And he gave 0 fucks, glaring at me like 'bish, I'll crush your fuckin arms and peck out your eyes, keep ballin, muthafucka'.
Real shit.
/A comment on http://www.reddit.com 'Ever wonder how big an eagle talon is?'/

dancingwithcats: 900 psi is enough to crush a human skull if applied properly. Now a few more people can be terrified.
/A comment on http://www.reddit.com 'Ever wonder how big an eagle talon is?'/

LostSoulsAlliance: Depending on the type of eagle, they can exert between 500 - 1000 psi with their claws...
more than enough to crush your arm bones should it be perched there.
They also have an arrangement of bones/tendons that allows them to lock like a vice-grip--
meaning they can squeeze the hell out of your arm, then "lock" their grip and remain tightly attached with minimal effort.
/A comment on http://www.reddit.com 'Ever wonder how big an eagle talon is?'/

Update, 2014-Jul-29:

First, I thank Fantasy who helped me again.
It's time for a native YMM variant, a 'Kumataka' derivative called 'Washi'. Straight to the etude:
unsigned int Decompress(char* ret, char* src, unsigned int srcSize){
	unsigned int srcIndex=0;
	unsigned int retIndex=0;
	unsigned int DWORDtrio;
	unsigned int Flag;
	uint64_t FlagMASK; //=       0xFFFFFFFFFFFFFFFF;
	uint64_t FlagMASKnegated; //=0x0000000000000000;

	while(srcIndex < srcSize){
		DWORDtrio = *(unsigned int*)&src[srcIndex];
// |1stLSB   |2ndLSB  |3rdLSB   |
// ------------------------------
// |xxxxx|TTT|xxxxxxxF|xxxxxx|LL|
// ------------------------------
// [1bit                   24bit]
// LL = 0 means MatchLength (32>>LL), i.e. 32
// LL = 1 means MatchLength (32>>LL), i.e. 16
// LL = 2 means MatchLength (32>>LL), i.e. 8
// LL = 3 means MatchLength (32>>LL), i.e. 4
// TO-DO: F = 1 means MatchOffset 3 bytes long
// TO-DO: F = 0 means MatchOffset 2 bytes long
		Flag=!(DWORDtrio & 0xE0);
		// In here Flag=0|1
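		FlagMASKnegated= (uint64_t)Flag - 1; // -1|0; widened before the subtraction so that -1 is a full 64-bit mask and pointers above 4GB survive the AND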
		FlagMASK= ~FlagMASKnegated;
		// DWORDtrio&(0xFFFFFF>>((DWORDtrio&0x8000)>>12)) // shift by 0/8 for 3/2 bytes
		// DWORDtrio&(0xFFFFFF>>(((DWORDtrio&0xFFFF)>>15)<<3)) // shift by 0/8 for 3/2 bytes
//				#ifdef _N_XMM
//		SlowCopy128bit( (const char *)( ((uint64_t)(src+srcIndex+1+16*(0))&FlagMASK) + ((uint64_t)(ret+retIndex-(DWORDtrio&0x3FFFFF))&FlagMASKnegated) ), (ret+retIndex+16*(0)));
//		SlowCopy128bit( (const char *)( ((uint64_t)(src+srcIndex+1+16*(1))&FlagMASK) + ((uint64_t)(ret+retIndex-(DWORDtrio&0x3FFFFF)+16*(1))&FlagMASKnegated) ), (ret+retIndex+16*(1)));
//				#endif
				#ifndef _N_YMM
		memcpy((ret+retIndex+16*(0)), (const char *)( ((uint64_t)(src+srcIndex+1)&FlagMASK) + ((uint64_t)(ret+retIndex-(DWORDtrio&0x3FFFFF))&FlagMASKnegated) ), 32);
				#endif
				#ifdef _N_YMM
		SlowCopy256bit( (const char *)( ((uint64_t)(src+srcIndex+1)&FlagMASK) + ((uint64_t)(ret+retIndex-(DWORDtrio&0x3FFFFF))&FlagMASKnegated) ), (ret+retIndex+16*(0)));
				#endif
		srcIndex+= ((uint64_t)((DWORDtrio & 0xFF)+1)&FlagMASK) + ((uint64_t)(3)&FlagMASKnegated) ;
		retIndex+= ((uint64_t)((DWORDtrio & 0xFF))&FlagMASK) +   ((uint64_t)(Min_Match_Length>>((DWORDtrio&0xFFFFFF)>>22))&FlagMASKnegated) ;
	}
	return retIndex;
}
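
To make the token layout in the comments easier to follow, here is a minimal sketch that parses a single 'Washi' token the branchy way. The struct, its field names and `washi_parse` are hypothetical, not part of the package; the masks (0xE0, 0x3FFFFF, >>22) and the base length 32 are the ones used in Decompress() above and its listing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one 'Washi' token; the field names are mine. */
typedef struct {
    int      is_literal;   /* 1: run of literals, 0: match */
    uint32_t src_advance;  /* bytes consumed from the compressed stream */
    uint32_t out_advance;  /* bytes produced in the output */
    uint32_t offset;       /* match distance back into the output (0 for literals) */
} WashiToken;

/* Parse the token starting at p; p must point at 3 or more readable bytes. */
static WashiToken washi_parse(const unsigned char *p) {
    assert(p != NULL);
    /* Assemble the 24 bits byte by byte (little-endian) instead of the
       unaligned 32-bit load used in Decompress(). */
    uint32_t t = (uint32_t)p[0] | ((uint32_t)p[1] << 8) | ((uint32_t)p[2] << 16);
    WashiToken k;
    if ((t & 0xE0) == 0) {
        /* TTT == 0: literal run, the first byte holds the run length */
        k.is_literal  = 1;
        k.out_advance = t & 0xFF;
        k.src_advance = k.out_advance + 1;  /* length byte + the literals */
        k.offset      = 0;
    } else {
        /* TTT != 0: match, low 22 bits are the offset, top 2 bits are LL */
        k.is_literal  = 0;
        k.offset      = t & 0x3FFFFF;
        k.out_advance = 32u >> (t >> 22);   /* 32, 16, 8 or 4 */
        k.src_advance = 3;
    }
    return k;
}
```

For example, the bytes 0x20 0x00 0x40 parse as a match with offset 32 and LL = 1, i.e. 16 output bytes. The real loop computes both alternatives and picks one with masks, so it never branches on the token type.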

; 'Washi' decompression loop, be-40+2=128 bytes long:
; mark_description "Intel(R) C++ Intel(R) 64 Compiler XE for applications running on Intel(R) 64, Version 12.1.1.258 Build 20111";
; mark_description "-O3 -QxSSE2 -D_N_YMM -FAcs";

.B8.3::                         
  00040 42 8b 0c 12      mov ecx, DWORD PTR [rdx+r10]           
  00044 33 ff            xor edi, edi                           
  00046 f7 c1 e0 00 00 
        00               test ecx, 224                          
  0004c 0f 44 f8         cmove edi, eax                         
  0004f 49 89 cc         mov r12, rcx                           
  00052 ff cf            dec edi                                
  00054 49 81 e4 ff ff 
        3f 00            and r12, 4194303                       
  0005b 49 f7 dc         neg r12                                
  0005e 48 89 fe         mov rsi, rdi                           
  00061 4d 03 e1         add r12, r9                            
  00064 48 f7 d6         not rsi                                
  00067 4e 8d 74 12 01   lea r14, QWORD PTR [1+rdx+r10]         
  0006c 4d 03 e3         add r12, r11                           
  0006f 4c 23 f6         and r14, rsi                           
  00072 4c 23 e7         and r12, rdi                           
  00075 0f b6 d9         movzx ebx, cl                          
  00078 ff c3            inc ebx                                
  0007a c4 81 7e 6f 04 
        34               vmovdqu ymm0, YMMWORD PTR [r12+r14]    
  00080 49 89 fc         mov r12, rdi                           
  00083 48 23 de         and rbx, rsi                           
  00086 49 83 e4 03      and r12, 3                             
  0008a 49 03 dc         add rbx, r12                           
  0008d 49 03 da         add rbx, r10                           
  00090 41 89 da         mov r10d, ebx                          
  00093 0f b6 d9         movzx ebx, cl                          
  00096 81 e1 ff ff ff 
        00               and ecx, 16777215                      
  0009c c1 e9 16         shr ecx, 22                            
  0009f 48 23 de         and rbx, rsi                           
  000a2 be 20 00 00 00   mov esi, 32                            
  000a7 d3 ee            shr esi, cl                            
  000a9 48 23 f7         and rsi, rdi                           
  000ac 48 03 de         add rbx, rsi                           
  000af 49 03 db         add rbx, r11                           
  000b2 c4 81 7e 7f 04 
        19               vmovdqu YMMWORD PTR [r9+r11], ymm0     
  000b8 45 3b d0         cmp r10d, r8d                          
  000bb 41 89 db         mov r11d, ebx                          
  000be 72 80            jb .B8.3 
------------------------------------------------------------------------------------------------------------------------------------------------------
| compressor \ filedataset      | alice29.txt    | CalgaryCorpus.tar  | shaks12.txt        | dickens             | enwik8                            |
------------------------------------------------------------------------------------------------------------------------------------------------------
| UNCOMPRESSED                  | 152,089        | 3,153,408          | 5,582,655          | 10,192,446          | 100,000,000                       |
| Nakamichi 'Jiten'      (16KB) | n/a            | n/a                | n/a                | n/a                 | n/a                               |
| Nakamichi 'Kaidanji'   (64KB) | 092,285 / 0328 | 1,862,449 / 011838 | 3,391,657 / 006799 | 06,387,079 / 014977 | 063,430,147 / 0283161 / 1014 MB/s |
| Nakamichi 'Sanbashi'    (2MB) | 095,682 / 0267 | 1,560,737 / 006841 | 2,559,110 / 000519 | 04,614,250 / 000558 | 046,881,842 / 0026545 /  607 MB/s |
| Nakamichi 'Washi'       (4MB) | 088,897 / 0006 | 1,484,221 / 000966 | 2,384,536 / 000011 | 04,261,276 / 000012 | 042,714,346 / 0000232 / 435+ MB/s |
| 7z's gz, Ultra Deflate32      | 051,707        | 0,980,026          | 1,934,787          | 03,681,828          | 035,102,891                       |
| 7z's zip, Ultra Deflate64     | 050,051        | 0,945,849          | 1,834,240          | 03,508,645          | 033,757,921                       |
| TANGELO 2.3                   | 039,160        | 0,710,066          | 1,236,021          | 02,279,659          | 020,921,619                       |
| LZ4 v1.4, -9                  | 063,705        | 1,195,853          | 2,315,036          | 04,442,992          | 042,283,904         / 2186.9 MB/s |
| Yappy, 8192 10000             | 087,965        | 1,654,203          | 3,337,964          | 06,374,780          | 057,701,807         /  698.7 MB/s |
| Yappy, 65536 10000            | 081,217        | 1,544,271          | 3,120,688          | 05,912,295          | 054,162,908         /  679.4 MB/s |
| Yappy, 1048576 10000          | 080,353        | 1,530,823          | 3,091,493          | 05,850,648          | 053,687,370         /  679.4 MB/s |
------------------------------------------------------------------------------------------------------------------------------------------------------
Note:
A Core 2 Q9550s @2.83GHz was used; since the Q9550s doesn't support YMM registers, 'Washi' fell back to the slow 'memcpy', hence the 435+ MB/s.
In its native mode (AVX) the result should be MUCH better.
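
The fallback described in this note can be sketched as a single 32-byte copy routine that uses one YMM load/store when the compiler targets AVX and plain memcpy otherwise. This is only an illustration: `copy32` is a hypothetical stand-in (the package's SlowCopy256bit is not shown on this page), and it takes its arguments in memcpy order (destination first).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#ifdef __AVX__
#include <immintrin.h>
#endif

/* Copy exactly 32 bytes from src to dst; all 32 bytes are read before any
   byte is written, so overlapping source and destination are handled. */
static void copy32(unsigned char *dst, const unsigned char *src) {
    assert(dst != NULL && src != NULL);
#ifdef __AVX__
    /* Native path: one unaligned 256-bit load, one unaligned 256-bit store. */
    __m256i v = _mm256_loadu_si256((const __m256i *)src);
    _mm256_storeu_si256((__m256i *)dst, v);
#else
    /* Fallback path for CPUs without AVX: go through a temporary buffer. */
    unsigned char tmp[32];
    memcpy(tmp, src, sizeof tmp);
    memcpy(dst, tmp, sizeof tmp);
#endif
}
```

The temporary buffer in the fallback keeps the same read-everything-then-write behaviour as the register copy, which matters when a match offset is smaller than 32.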

Nakamichi
鷲 - washi ~ eagle

I played all my cards/jokers and got trumped once more... lzturbo and LZ4 are simply too strong.
-------------------------------------------------
| Compressor | version              | options   |
-------------------------------------------------
| lzturbo    | Apr 29 2013, 1.1     | -19       |
| LZ4        | Sep 17 2013, 1.4     | -9 -Sx -v |
| Nakamichi  | Jul 28 2014, 'Washi' | NONE      |
-------------------------------------------------
07/29/2014  05:51 PM         5,851,648 Carlos_Castaneda.tar
07/29/2014  05:51 PM         2,282,027 Carlos_Castaneda.tar.lz4
07/29/2014  05:51 PM         2,246,350 Carlos_Castaneda.tar.lzt
07/29/2014  07:05 AM         2,261,121 Carlos_Castaneda.tar.Nakamichi

07/29/2014  05:52 PM         3,697,142 Complete_Sherlock_Holmes.txt
07/29/2014  05:52 PM         1,592,036 Complete_Sherlock_Holmes.txt.lz4
07/29/2014  05:52 PM         1,571,775 Complete_Sherlock_Holmes.txt.lzt
07/29/2014  09:35 AM         1,609,989 Complete_Sherlock_Holmes.txt.Nakamichi

07/29/2014  05:52 PM         1,104,971 Dead_Souls-Nikolai_Vasilievich_Gogol.txt
07/29/2014  05:52 PM           391,220 Dead_Souls-Nikolai_Vasilievich_Gogol.txt.lz4
07/29/2014  05:52 PM           384,711 Dead_Souls-Nikolai_Vasilievich_Gogol.txt.lzt
07/29/2014  09:37 AM           457,529 Dead_Souls-Nikolai_Vasilievich_Gogol.txt.Nakamichi

07/29/2014  05:51 PM        10,192,446 dickens
07/29/2014  05:51 PM         4,442,988 dickens.lz4
07/29/2014  05:51 PM         4,376,867 dickens.lzt
07/29/2014  07:50 AM         4,261,276 dickens.Nakamichi

07/29/2014  06:01 PM       100,000,000 enwik8
07/29/2014  06:02 PM        42,283,900 enwik8.lz4
07/29/2014  06:02 PM        41,929,879 enwik8.lzt
07/29/2014  09:44 AM        42,714,346 enwik8.Nakamichi

07/29/2014  05:51 PM         3,984,896 Quran.tar
07/29/2014  05:51 PM         1,181,474 Quran.tar.lz4
07/29/2014  05:51 PM         1,168,352 Quran.tar.lzt
07/29/2014  07:58 AM         1,282,396 Quran.tar.Nakamichi

07/29/2014  05:51 PM         4,999,168 Sahih_Bukhari.tar
07/29/2014  05:51 PM         1,513,903 Sahih_Bukhari.tar.lz4
07/29/2014  05:51 PM         1,490,615 Sahih_Bukhari.tar.lzt
07/29/2014  08:09 AM         1,554,461 Sahih_Bukhari.tar.Nakamichi

07/29/2014  05:51 PM         5,582,655 shaks12.txt
07/29/2014  05:51 PM         2,315,032 shaks12.txt.lz4
07/29/2014  05:51 PM         2,292,169 shaks12.txt.lzt
07/29/2014  08:33 AM         2,384,536 shaks12.txt.Nakamichi

07/29/2014  05:51 PM         6,906,816 UF-ENG-001World-2009-0.22.SRT.txt
07/29/2014  05:52 PM         2,580,766 UF-ENG-001World-2009-0.22.SRT.txt.lz4
07/29/2014  05:52 PM         2,541,429 UF-ENG-001World-2009-0.22.SRT.txt.lzt
07/29/2014  08:57 AM         2,602,606 UF-ENG-001World-2009-0.22.SRT.txt.Nakamichi

07/29/2014  05:51 PM         4,612,608 Webster_Bible.tar
07/29/2014  05:51 PM         1,686,144 Webster_Bible.tar.lz4
07/29/2014  05:51 PM         1,655,983 Webster_Bible.tar.lzt
07/29/2014  09:10 AM         1,769,937 Webster_Bible.tar.Nakamichi

07/29/2014  05:52 PM        12,432,384 _Cambridge.History.of.Japan.6.Volumes.Set.PDF.ebooks.tar
07/29/2014  05:52 PM         5,034,638 _Cambridge.History.of.Japan.6.Volumes.Set.PDF.ebooks.tar.lz4
07/29/2014  05:52 PM         4,975,335 _Cambridge.History.of.Japan.6.Volumes.Set.PDF.ebooks.tar.lzt
07/29/2014  10:35 AM         5,022,756 _Cambridge.History.of.Japan.6.Volumes.Set.PDF.ebooks.tar.Nakamichi

07/29/2014  05:52 PM         4,182,016 _hemingway_ernest_-_14_books_in_txt_format.tar
07/29/2014  05:52 PM         1,640,401 _hemingway_ernest_-_14_books_in_txt_format.tar.lz4
07/29/2014  05:52 PM         1,615,037 _hemingway_ernest_-_14_books_in_txt_format.tar.lzt
07/29/2014  10:46 AM         1,741,129 _hemingway_ernest_-_14_books_in_txt_format.tar.Nakamichi
The package containing the source and the above English texts:
http://www.sanmayce.com/Nakamichi/Nakamichi_Washi.zip
Note: Re-uploaded 2014-Aug-07; by mistake a draft a few hours older had been zipped. Now OK, my apologies.

The upcoming Nakamichi 'Jiten' will use 2 bytes (instead of 3) per match.

Update, 2014-Jul-24:

Time for a thoroughbred to enter: 'Kumataka', a variant embodying all the good features of 'Kaibutsu' and 'Hitomi'.
|1stLSB   |2ndLSB  |
--------------------
|xxxx|TTTT|xxxxxxxF|
--------------------
[1bit         16bit]
Bits 5/6/7/8 are the TAG, and the TAG sits within the OFFSET area; all transfers go through one XMM register.
The flag (the 16th bit) says whether Match_Length is (16-0)>>0 or (16-0)>>1, i.e. 16 or 8.
Performance-wise, 'Kumataka' disappoints, yet it is a blueprint for future variants.
A two-faceted tweak makes 'Kumataka' a standout: it uses only a 31KB window, and the 16th bit is not used the way it is in 'Hanazakari'.
In 'Hanazakari' the inner 32KB are 'saturated' with long matches, while the short ones are 'banished' to the outer 32KB of the window.
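
Since the TAG lives inside the offset bits, not every offset can be written as a match. A minimal sketch of what that implies is below. The names are hypothetical and `base_length` is a parameter of mine (16 according to the text above); the masks 0xF0 and 0x7FFF and the shift by 15 are the ones visible in the 'Kumataka' loop listing further down.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one 'Kumataka' token; the field names are mine. */
typedef struct {
    int      is_literal;  /* 1: run of literals, 0: match */
    uint32_t offset;      /* match distance (0 for literals) */
    uint32_t length;      /* bytes produced in the output */
} KumatakaToken;

/* Parse the 16-bit token at p (little-endian); p must point at 2 readable bytes. */
static KumatakaToken kumataka_parse(const unsigned char *p, uint32_t base_length) {
    assert(p != NULL);
    uint32_t t = (uint32_t)p[0] | ((uint32_t)p[1] << 8);
    KumatakaToken k;
    if ((t & 0xF0) == 0) {          /* TTTT == 0: literal run */
        k.is_literal = 1;
        k.offset     = 0;
        k.length     = t & 0xFF;
    } else {                        /* TTTT != 0: match */
        k.is_literal = 0;
        k.offset     = t & 0x7FFF;              /* low 15 bits */
        k.length     = base_length >> (t >> 15); /* flag halves the length */
    }
    return k;
}

/* An offset whose bits 5..8 are all zero would be read back as a literal run,
   so a compressor following this layout cannot emit it as a match. */
static int kumataka_offset_encodable(uint32_t offset) {
    return offset <= 0x7FFF && (offset & 0xF0) != 0;
}
```

For instance, offset 0x110 is encodable while 0x100 is not, because its TAG bits are all zero.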
And quick results for 'enwik8' on a Core 2 T7500:
Nakamichi 'Nekomata', written by Kaze, based on Nobuo Ito's LZSS source, babealicious suggestion by m^2 enforced.
Decompressing 52146282 bytes ...
RAM-to-RAM performance: 435 MB/s.

Nakamichi 'Hitomi', written by Kaze, based on Nobuo Ito's LZSS source, babealicious suggestion by m^2 enforced.
Decompressing 52146282 bytes ...
RAM-to-RAM performance: 433 MB/s.

Nakamichi 'Kitsune', written by Kaze, based on Nobuo Ito's LZSS source, babealicious suggestion by m^2 enforced.
Decompressing 57108834 bytes ...
RAM-to-RAM performance: 379 MB/s.

Nakamichi 'Kaidanji', written by Kaze, based on Nobuo Ito's LZSS source, babealicious suggestion by m^2 enforced.
Decompressing 63430147 bytes ...
RAM-to-RAM performance: 676 MB/s.

Nakamichi 'Kumataka', written by Kaze, based on Nobuo Ito's LZSS source, babealicious suggestion by m^2 enforced.
Decompressing 66433275 bytes ...
RAM-to-RAM performance: 671 MB/s.

YAPPY: [b 1K] bytes 100000000 -> 73533773  73.5%  comp  41.3 MB/s  uncomp 658.4 MB/s 
YAPPY: [b 2K] bytes 100000000 -> 67516056  67.5%  comp  38.0 MB/s  uncomp 602.5 MB/s 
YAPPY: [b 4K] bytes 100000000 -> 61757720  61.8%  comp  34.2 MB/s  uncomp 547.6 MB/s 
YAPPY: [b 8K] bytes 100000000 -> 57701807  57.7%  comp  30.7 MB/s  uncomp 516.4 MB/s 
YAPPY: [b 64K] bytes 100000000 -> 54162908  54.2%  comp  28.7 MB/s  uncomp 509.1 MB/s 
YAPPY: [b 1024K] bytes 100000000 -> 53687370  53.7%  comp  28.3 MB/s  uncomp 509.5 MB/s 

Nakamichi 'Kumataka', written by Kaze, based on Nobuo Ito's LZSS source, babealicious suggestion by m^2 enforced.
Decompressing 66433275 bytes ...
RAM-to-RAM performance: 676 MB/s.
Memory pool starting address: 00000000044F0080 ... 64 byte aligned, OK
Copying a 256MB block 1024 times i.e. 256GB READ + 256GB WRITTEN ...
memcpy(): (256MB block); 262144MB copied in 139792 clocks or 1.875MB per clock
RAM-to-RAM performance vs memcpy() ratio (bigger-the-better): 36%
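
The last line of the report follows from the two lines before it. Here is a minimal sketch of that arithmetic, under the ASSUMPTION that one reported 'clock' is one millisecond (so 1.875MB per clock means about 1875 MB/s) and that the percentage is truncated. The function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Decompression speed as a whole-number percentage of memcpy() speed.
   ASSUMPTION: 'clocks' are milliseconds; the result is truncated, not rounded. */
static unsigned memcpy_ratio_percent(uint64_t decompress_MBps,
                                     uint64_t copied_MB,
                                     uint64_t clocks) {
    assert(copied_MB != 0);
    /* memcpy speed in MB/s is copied_MB * 1000 / clocks, hence:
       ratio% = 100 * decompress_MBps / (copied_MB * 1000 / clocks) */
    return (unsigned)(decompress_MBps * 100u * clocks / (copied_MB * 1000u));
}
```

Feeding it the 'Kumataka' figures above (676 MB/s, 262144MB, 139792 clocks) gives 36, and the 'Washi' figures from the DDETT run (493 MB/s, 262144MB, 94693 clocks) give 17, matching both reports.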
Nakamichi
熊鷹 - kumataka ~ Crested Hawk-Eagle
; 'Kumataka' decompression loop, b9-40+2=123 bytes long:
; mark_description "Intel(R) C++ Intel(R) 64 Compiler XE for applications running on Intel(R) 64, Version 12.1.1.258 Build 20111";
; mark_description "-O3 -QxSSE2 -D_N_XMM -FAcs";

.B7.3::                         
  00040 42 0f b7 0c 12   movzx ecx, WORD PTR [rdx+r10]          
  00045 33 ff            xor edi, edi                           
  00047 f7 c1 f0 00 00 
        00               test ecx, 240                          
  0004d 0f 44 f8         cmove edi, eax                         
  00050 49 89 cc         mov r12, rcx                           
  00053 ff cf            dec edi                                
  00055 49 81 e4 ff 7f 
        00 00            and r12, 32767                         
  0005c 49 f7 dc         neg r12                                
  0005f 48 89 fe         mov rsi, rdi                           
  00062 4d 03 e1         add r12, r9                            
  00065 48 f7 d6         not rsi                                
  00068 4e 8d 6c 12 01   lea r13, QWORD PTR [1+rdx+r10]         
  0006d 4d 03 e3         add r12, r11                           
  00070 4c 23 ee         and r13, rsi                           
  00073 4c 23 e7         and r12, rdi                           
  00076 0f b6 d9         movzx ebx, cl                          
  00079 ff c3            inc ebx                                
  0007b f3 43 0f 6f 04 
        2c               movdqu xmm0, XMMWORD PTR [r12+r13]     
  00081 49 89 fc         mov r12, rdi                           
  00084 48 23 de         and rbx, rsi                           
  00087 49 83 e4 02      and r12, 2                             
  0008b 49 03 dc         add rbx, r12                           
  0008e 49 03 da         add rbx, r10                           
  00091 41 89 da         mov r10d, ebx                          
  00094 0f b6 d9         movzx ebx, cl                          
  00097 c1 e9 0f         shr ecx, 15                            
  0009a 48 23 de         and rbx, rsi                           
  0009d be 0d 00 00 00   mov esi, 13                            
  000a2 d3 ee            shr esi, cl                            
  000a4 48 23 f7         and rsi, rdi                           
  000a7 48 03 de         add rbx, rsi                           
  000aa 49 03 db         add rbx, r11                           
  000ad f3 43 0f 7f 04 
        19               movdqu XMMWORD PTR [r9+r11], xmm0      
  000b3 41 89 db         mov r11d, ebx                          
  000b6 45 3b d0         cmp r10d, r8d                          
  000b9 72 85            jb .B7.3 
The package (116KB) containing the source:
http://www.sanmayce.com/Nakamichi/Nakamichi_Kumataka.zip

Update, 2014-Jul-20:

Another variant of 'Aratama' using Match_Length 4/8 is called 'Hitomi'; it uses one GP register for stores.
'Nekomata' and 'Hitomi' are siblings and decode interchangeably; however, 'Hitomi' uses AND-masking instead of IMULs.
Sadly, no speed boost whatsoever on the Core 2 I currently play with.

Nakamichi
'Hitomi' is multiplication-free, with not a single IMUL:
unsigned int Decompress(char* ret, char* src, unsigned int srcSize){
	unsigned int srcIndex=0;
	unsigned int retIndex=0;
	unsigned int WORDpair;
	unsigned int Flag;
	uint64_t FlagMASK; //=       0xFFFFFFFFFFFFFFFF;
	uint64_t FlagMASKnegated; //=0x0000000000000000;
	uint64_t QWORD;

	while(srcIndex < srcSize){
		WORDpair = *(unsigned short int*)&src[srcIndex];
		//QWORD = *(uint64_t*)&src[srcIndex];

		//*(uint64_t*)(ret+retIndex+8*(0)) = *(uint64_t*)(src+srcIndex+1+8*(0));
		//srcIndex+=(((WORDpair & 0xFF)>>0)+1);                                           
		//retIndex+=(WORDpair & 0xFF)>>0;                                                 

		//*(uint64_t*)(ret+retIndex) = *(uint64_t*)(ret+retIndex-WORDpair);
		//srcIndex=srcIndex+2;
		//retIndex+=Min_Match_Length;

		Flag=!(WORDpair & 0xF0);

		// In here Flag=0|1
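		FlagMASKnegated= (uint64_t)Flag - 1; // -1|0; widened before the subtraction so that -1 is a full 64-bit mask and pointers above 4GB survive the AND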
		FlagMASK= ~FlagMASKnegated;

		// Nekomata [
		//*(uint64_t*)(ret+retIndex) = *(uint64_t*)( (uint64_t)(src+srcIndex+1)*(Flag) + (uint64_t)(ret+retIndex-WORDpair)*(!Flag) );
		// Nekomata ]
		*(uint64_t*)(ret+retIndex) = *(uint64_t*)( ((uint64_t)(src+srcIndex+1)&FlagMASK) + ((uint64_t)(ret+retIndex-WORDpair)&FlagMASKnegated) );

		// Nekomata [
		//srcIndex+= ((WORDpair & 0xFF)+1 +1)*(Flag) + (2)*(!Flag) ; // +1 due to m^2's tweak
		//retIndex+= ((WORDpair & 0xFF) +1)*(Flag) + (Min_Match_Length + ((WORDpair & 0x08)>>1) )*(!Flag) ; // +1 due to m^2's tweak
		// Nekomata ]
		srcIndex+= ((uint64_t)((WORDpair & 0xFF)+1 +1)&FlagMASK) + ((uint64_t)(2)&FlagMASKnegated) ; // +1 due to m^2's tweak
		retIndex+= ((uint64_t)((WORDpair & 0xFF) +1)&FlagMASK) + ((uint64_t)(Min_Match_Length + ((WORDpair & 0x08)>>1) )&FlagMASKnegated) ; // +1 due to m^2's tweak
	}
	return retIndex;
}
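
The difference between the two siblings boils down to how one of two values is picked without a branch. Isolated from the decompressor, it can be sketched like this; the function names are mine, and the flag must be 0 or 1 in both:

```c
#include <assert.h>
#include <stdint.h>

/* 'Nekomata' style: select with two multiplications. */
static uint64_t select_mul(uint64_t a, uint64_t b, unsigned flag) {
    assert(flag <= 1);
    return a * flag + b * !flag;
}

/* 'Hitomi' style: select with two ANDs. The flag is widened to 64 bits
   before the subtraction so that the mask covers the whole value. */
static uint64_t select_mask(uint64_t a, uint64_t b, unsigned flag) {
    assert(flag <= 1);
    uint64_t mask_b = (uint64_t)flag - 1;  /* flag=1 -> 0, flag=0 -> all ones */
    uint64_t mask_a = ~mask_b;             /* flag=1 -> all ones, flag=0 -> 0 */
    return (a & mask_a) + (b & mask_b);
}
```

Both return `a` when the flag is 1 and `b` when it is 0, which is why streams produced for one variant decode with the other.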

; 'Hitomi' decompression loop, b2-40+2=116 bytes long:
; mark_description "Intel(R) C++ Intel(R) 64 Compiler XE for applications running on Intel(R) 64, Version 12.1.1.258 Build 20111";
; mark_description "-O3 -D_N_GP -FAcs";

.B6.3::                         
  00040 46 0f b7 1c 12   movzx r11d, WORD PTR [rdx+r10]         
  00045 33 ff            xor edi, edi                           
  00047 41 f7 c3 f0 00 
        00 00            test r11d, 240                         
  0004e 4e 8d 6c 12 01   lea r13, QWORD PTR [1+rdx+r10]         
  00053 0f 44 f8         cmove edi, eax                         
  00056 45 89 dc         mov r12d, r11d                         
  00059 ff cf            dec edi                                
  0005b 49 f7 dc         neg r12                                
  0005e 48 89 fe         mov rsi, rdi                           
  00061 4c 03 e1         add r12, rcx                           
  00064 48 f7 d6         not rsi                                
  00067 4d 03 e1         add r12, r9                            
  0006a 4c 23 ee         and r13, rsi                           
  0006d 4c 23 e7         and r12, rdi                           
  00070 4b 8b 1c 2c      mov rbx, QWORD PTR [r12+r13]           
  00074 49 89 fd         mov r13, rdi                           
  00077 4a 89 1c 09      mov QWORD PTR [rcx+r9], rbx            
  0007b 49 83 e5 02      and r13, 2                             
  0007f 41 0f b6 db      movzx ebx, r11b                        
  00083 41 83 e3 08      and r11d, 8                            
  00087 41 d1 eb         shr r11d, 1                            
  0008a 41 83 c3 04      add r11d, 4                            
  0008e 4c 23 df         and r11, rdi                           
  00091 44 8d 63 02      lea r12d, DWORD PTR [2+rbx]            
  00095 ff c3            inc ebx                                
  00097 4c 23 e6         and r12, rsi                           
  0009a 4d 03 e5         add r12, r13                           
  0009d 48 23 de         and rbx, rsi                           
  000a0 4d 03 e2         add r12, r10                           
  000a3 49 03 db         add rbx, r11                           
  000a6 45 89 e2         mov r10d, r12d                         
  000a9 49 03 d9         add rbx, r9                            
  000ac 41 89 d9         mov r9d, ebx                           
  000af 45 3b d0         cmp r10d, r8d                          
  000b2 72 8c            jb .B6.3 
The package (45.7MB) containing the source:
http://www.sanmayce.com/Nakamichi/Nakamichi_Hitomi.zip

They came flyin' from far away, now I'm under their spell
I love hearing the stories that they tell
They've seen places beyond my land and they've found new horizons
They speak strangely but I understand
And I dream I'm an eagle
And I dream I can spread my wings
Flyin' high, high, I'm a bird in the sky
I'm an eagle that rides on the breeze
High, high, what a feeling to fly
Over mountains and forests and seas
And to go anywhere that I please
As all good friends we talk all night, and we fly wing to wing
I have questions and they know everything
There's no limit to what I feel, we climb higher and higher
Am I dreamin' or is it all real
Is it true I'm an eagle
Is it true I can spread my wings
Flying high, high, I'm a bird in the sky
(I'm an eagle)
I'm an eagle that rides on the breeze
High, up high, what a feeling to fly
(What a feeling)
Over mountains and forests and seas
And to go anywhere that I please
/ABBA - 'Eagle' Lyrics/

Update, 2014-Jun-27:

Cattime.
The variant of 'Aratama' using Match_Length 4/8 is called 'Nekomata'; it uses one GP register for stores.
------------------------------------------------------------------------------------------------------------------------------------------------------
| compressor \ file/dataset     | alice29.txt    | CalgaryCorpus.tar  | shaks12.txt        | dickens             | enwik8                            |
------------------------------------------------------------------------------------------------------------------------------------------------------
| UNCOMPRESSED                  | 152,089        | 3,153,408          | 5,582,655          | 10,192,446          | 100,000,000                       |
| Nakamichi 'Kaidanji'   (64KB) | 092,285 / 0328 | 1,862,449 / 011838 | 3,391,657 / 006799 | 06,387,079 / 014977 | 063,430,147 / 0283161 / 1014 MB/s |
| Nakamichi 'Hanazakari' (64KB) | 080,270 / 0218 | 1,620,653 / 005396 | 2,737,003 / 000452 | 05,068,626 / 000630 | 054,693,537 / 0031544 /  756 MB/s |
| Nakamichi 'Aratama'    (64KB) | 095,560 / 5310 | 1,933,936 / 121922 | 3,518,810 / 190256 | 06,620,093 / 363922 | 066,251,713 / 4170885 /  866 MB/s |
| Nakamichi 'Kitsune'    (64KB) | 082,750 / 0218 | 1,635,663 / 012410 | 2,961,702 / 001998 | 05,572,322 / 003358 | 057,108,834 / 0204190 /  554 MB/s |
| Nakamichi 'Nekomata'   (64KB) | 074,931 / 0273 | 1,567,442 / 016978 | 2,654,994 / 002077 | 04,951,599 / 003544 | 052,146,282 / 0235685 /  607 MB/s |
| Nakamichi 'Sanbashi'    (2MB) | 095,682 / 0267 | 1,560,737 / 006841 | 2,559,110 / 000519 | 04,614,250 / 000558 | 046,881,842 / 0026545 /  607 MB/s |
| Nakamichi 'Kaiko'      (16MB) | 100,756 / 1371 | 1,834,530 / 022877 | 2,704,445 / 004412 | 04,694,952 / 004356 | 047,016,954 / 0097458 /  289 MB/s |
| 7z's gz, Ultra Deflate32      | 051,707        | 0,980,026          | 1,934,787          | 03,681,828          | 035,102,891                       |
| 7z's zip, Ultra Deflate64     | 050,051        | 0,945,849          | 1,834,240          | 03,508,645          | 033,757,921                       |
| TANGELO 2.3                   | 039,160        | 0,710,066          | 1,236,021          | 02,279,659          | 020,921,619                       |
| LZ4 v1.4, -9                  | 063,705        | 1,195,853          | 2,315,036          | 04,442,992          | 042,283,904         / 2186.9 MB/s |
| Yappy, 8192 10000             | 087,965        | 1,654,203          | 3,337,964          | 06,374,780          | 057,701,807         /  698.7 MB/s |
| Yappy, 65536 10000            | 081,217        | 1,544,271          | 3,120,688          | 05,912,295          | 054,162,908         /  679.4 MB/s |
| Yappy, 1048576 10000          | 080,353        | 1,530,823          | 3,091,493          | 05,850,648          | 053,687,370         /  679.4 MB/s |
------------------------------------------------------------------------------------------------------------------------------------------------------
; 'Nekomata' decompression loop, af-40+2=113 bytes long:
; mark_description "Intel(R) C++ Intel(R) 64 Compiler XE for applications running on Intel(R) 64, Version 12.1.1.258 Build 20111";
; mark_description "-O3 -D_N_GP -FAcs";

.B6.3::                         
  00040 45 33 e4         xor r12d, r12d                         
  00043 41 89 c5         mov r13d, eax                          
  00046 33 db            xor ebx, ebx                           
  00048 4c 03 e9         add r13, rcx                           
  0004b 45 0f b7 1c 12   movzx r11d, WORD PTR [r10+rdx]         
  00050 41 f7 c3 f0 00 
        00 00            test r11d, 240                         
  00057 44 89 de         mov esi, r11d                          
  0005a 4a 8d 7c 12 01   lea rdi, QWORD PTR [1+rdx+r10]         
  0005f 45 0f 44 e1      cmove r12d, r9d                        
  00063 48 f7 de         neg rsi                                
  00066 49 03 f5         add rsi, r13                           
  00069 45 85 e4         test r12d, r12d                        
  0006c 41 0f 44 d9      cmove ebx, r9d                         
  00070 49 0f af fc      imul rdi, r12                          
  00074 48 0f af f3      imul rsi, rbx                          
  00078 48 8b 34 3e      mov rsi, QWORD PTR [rsi+rdi]           
  0007c 49 89 75 00      mov QWORD PTR [r13], rsi               
  00080 41 0f b6 f3      movzx esi, r11b                        
  00084 41 83 e3 08      and r11d, 8                            
  00088 41 d1 eb         shr r11d, 1                            
  0008b 41 83 c3 04      add r11d, 4                            
  0008f 44 0f af db      imul r11d, ebx                         
  00093 8d 7e 02         lea edi, DWORD PTR [2+rsi]             
  00096 ff c6            inc esi                                
  00098 41 0f af fc      imul edi, r12d                         
  0009c 41 0f af f4      imul esi, r12d                         
  000a0 44 03 d7         add r10d, edi                          
  000a3 03 c6            add eax, esi                           
  000a5 41 03 c3         add eax, r11d                          
  000a8 45 8d 14 5a      lea r10d, DWORD PTR [r10+rbx*2]        
  000ac 45 3b d0         cmp r10d, r8d                          
  000af 72 8f            jb .B6.3 
The package (30.2MB) containing the source:
http://www.sanmayce.com/Nakamichi/Nakamichi_Nekomata.zip
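
For readers who prefer C to the listing, here is a minimal sketch of what the 'Nekomata' loop above appears to do, reconstructed from the assembly alone. It is not the source from the package. The token layout (a low byte below 16 marks a literal run of low+1 bytes; anything else is a 16-bit match offset whose bit 3 selects Match_Length 4 or 8), the limit of 8 literal bytes per token, and the name nekomata_sketch_decode are all assumptions. The listing picks the copy source with cmove/imul instead of branching; the sketch uses a plain if for readability.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, inferred from the listing (not the author's code).
   Decodes src_len bytes of tokens into dst and returns the bytes written.
   Both buffers must have 8 spare bytes past their end, because every
   step loads and stores one full QWORD regardless of the real length. */
size_t nekomata_sketch_decode(const uint8_t *src, size_t src_len, uint8_t *dst)
{
    size_t in = 0, out = 0;

    while (in < src_len) {                  /* cmp r10d, r8d / jb */
        /* movzx r11d, WORD PTR [r10+rdx]: one little-endian 16-bit read */
        unsigned token = (unsigned)src[in] | ((unsigned)src[in + 1] << 8);
        unsigned low = token & 0xFF;        /* movzx esi, r11b */
        const uint8_t *from;
        size_t in_step, out_step;
        uint64_t q;

        if ((token & 0xF0) == 0) {          /* test r11d, 240 */
            /* Literal run: low+1 bytes follow the length byte.
               Assumption: never more than the 8 bytes one store covers. */
            assert(low < 8);
            from = src + in + 1;
            in_step = (size_t)low + 2;
            out_step = (size_t)low + 1;
        } else {
            /* Match: the whole 16-bit token is the backward offset. */
            assert(token <= out);
            from = dst + out - token;
            in_step = 2;
            out_step = ((token & 8) >> 1) + 4;  /* and 8, shr 1, add 4 */
        }

        memcpy(&q, from, 8);                /* mov rsi, QWORD PTR [...] */
        memcpy(dst + out, &q, 8);           /* mov QWORD PTR [r13], rsi */
        in += in_step;
        out += out_step;
    }
    return out;
}
```

Under this reading every token costs one WORD load, one QWORD load and one QWORD store, and the only branch left is the loop condition.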

At last I bundled four Nakamichi variants into one package that decompresses enwik8, with Yappy included as a companion.
The benchmark package is called 'White Fox': http://www.sanmayce.com/Downloads/_KAZE_Greed-For-Speed_Hakukitsune.7z (157MB).


I also wanted to clarify the picture by gathering some stats on the laptop (Core 2 Q9550S) that I have been using lately.

AIDA64 Extreme Edition v3.00.2500 says:
Intel Core 2 Quad Q9550S 'Yorkfield' @ 2.83GHz

CPU FSB: 332.9 MHz (original: 333 MHz)
Memory Bus: 332.9 MHz
DRAM:FSB Ratio: 1:1
L1 Code Cache: 32 KB per core
L2 Cache: 2x 6 MB (On-Die, ECC, ASC, Full-Speed) 

[Memory Read]

CPU                          CPU Clock  Memory                   CL-RCD-RP-RAS  Read Speed
6x Core i7-3960X Extreme HT  3300 MHz   Quad DDR3-1600           9-9-9-24 CR2   45162 MB/s
8x Xeon X5550 HT             2666 MHz   Triple DDR3-1333         9-9-9-24 CR1   37831 MB/s
8x FX-8150                   3600 MHz   Dual DDR3-1866           9-10-9-27 CR2  26428 MB/s
4x Core i7-3770K HT          3500 MHz   Dual DDR3-1600           9-9-9-24 CR2   23559 MB/s
4x A10-5800K                 3800 MHz   Dual DDR3-1866           9-10-9-27 CR2  21456 MB/s
6x Core i7-990X Extreme HT   3466 MHz   Triple DDR3-1333         9-9-9-24 CR1   21110 MB/s
12x Opteron 2431             2400 MHz   Unganged Dual DDR2-800R  6-6-6-18 CR1   19763 MB/s
4x Core i7-2600 HT           3400 MHz   Dual DDR3-1333           9-9-9-24 CR1   19552 MB/s
8x Opteron 2378              2400 MHz   Unganged Dual DDR2-800R  6-6-6-18 CR1   18834 MB/s
6x Phenom II X6 1100T        3300 MHz   Unganged Dual DDR3-1333  9-9-9-24 CR2   14730 MB/s
8x Xeon E5462                2800 MHz   Quad DDR2-640FB          5-5-5-15        7646 MB/s
4x Core 2 Quad Q9550S        2833 MHz   Dual DDR2-667            5-5-5-13 CR2    7457 MB/s

[Memory Copy]
 
CPU                          CPU Clock  Memory                   CL-RCD-RP-RAS  Copy Speed
6x Core i7-3960X Extreme HT  3300 MHz   Quad DDR3-1600           9-9-9-24 CR2   42150 MB/s
8x Xeon X5550 HT             2666 MHz   Triple DDR3-1333         9-9-9-24 CR1   34882 MB/s
6x Core i7-990X Extreme HT   3466 MHz   Triple DDR3-1333         9-9-9-24 CR1   24509 MB/s
8x FX-8150                   3600 MHz   Dual DDR3-1866           9-10-9-27 CR2  23799 MB/s
4x Core i7-3770K HT          3500 MHz   Dual DDR3-1600           9-9-9-24 CR2   22772 MB/s
4x A10-5800K                 3800 MHz   Dual DDR3-1866           9-10-9-27 CR2  18038 MB/s
4x Core i7-2600 HT           3400 MHz   Dual DDR3-1333           9-9-9-24 CR1   17836 MB/s
8x Opteron 2378              2400 MHz   Unganged Dual DDR2-800R  6-6-6-18 CR1   17362 MB/s
12x Opteron 2431             2400 MHz   Unganged Dual DDR2-800R  6-6-6-18 CR1   17174 MB/s
6x Phenom II X6 1100T        3300 MHz   Unganged Dual DDR3-1333  9-9-9-24 CR2   12444 MB/s
8x Xeon E5462                2800 MHz   Quad DDR2-640FB          5-5-5-15        8216 MB/s
4x Core 2 Quad Q9550S        2833 MHz   Dual DDR2-667            5-5-5-13 CR2    6686 MB/s

[CPU ZLib]
 
CPU                          CPU Clock  Memory                   CL-RCD-RP-RAS  Score
6x Core i7-3960X Extreme HT  3300 MHz   Quad DDR3-1600           9-9-9-24 CR2   444.5 MB/s
12x Opteron 2431             2400 MHz   Unganged Dual DDR2-800R  6-6-6-18 CR1   366.5 MB/s
6x Core i7-990X Extreme HT   3466 MHz   Triple DDR3-1333         9-9-9-24 CR1   358.7 MB/s
8x Xeon X5550 HT             2666 MHz   Triple DDR3-1333         9-9-9-24 CR1   358.1 MB/s
4x Core i7-3770K HT          3500 MHz   Dual DDR3-1600           9-9-9-24 CR2   317.2 MB/s
4x Core i7-2600 HT           3400 MHz   Dual DDR3-1333           9-9-9-24 CR1   289.2 MB/s
8x Xeon E5462                2800 MHz   Quad DDR2-640FB          5-5-5-15       281.4 MB/s
8x FX-8150                   3600 MHz   Dual DDR3-1866           9-10-9-27 CR2  276.3 MB/s
6x Phenom II X6 1100T        3300 MHz   Unganged Dual DDR3-1333  9-9-9-24 CR2   256.7 MB/s
8x Opteron 2378              2400 MHz   Unganged Dual DDR2-800R  6-6-6-18 CR1   244.3 MB/s
4x A10-5800K                 3800 MHz   Dual DDR3-1866           9-10-9-27 CR2  171.5 MB/s
4x Core 2 Quad Q9550S        2833 MHz   Dual DDR2-667            5-5-5-13 CR2   122.6 MB/s
After running 'RUNME.BAT' the results are:
C:\_KAZE_Greed-For-Speed_Hakukitsune>Nakamichi_Hanazakari_GP.exe enwik8.Hanazakari.Nakamichi
Nakamichi 'Hanazakari', written by Kaze, based on Nobuo Ito's LZSS source, babealicious suggestion by m^2 enforced.
Decompressing 54693537 bytes ...
RAM-to-RAM performance: 756 MB/s; 756 MB/s; 756 MB/s.

C:\_KAZE_Greed-For-Speed_Hakukitsune>Nakamichi_Hanazakari_XMM.exe enwik8.Hanazakari.Nakamichi
Nakamichi 'Hanazakari', written by Kaze, based on Nobuo Ito's LZSS source, babealicious suggestion by m^2 enforced.
Decompressing 54693537 bytes ...
RAM-to-RAM performance: 607 MB/s; 607 MB/s; 607 MB/s.

C:\_KAZE_Greed-For-Speed_Hakukitsune>Nakamichi_Kaidanji_GP.exe enwik8.Kaidanji.Nakamichi
Nakamichi 'Kaidanji', written by Kaze, based on Nobuo Ito's LZSS source, babealicious suggestion by m^2 enforced.
Decompressing 63430147 bytes ...
RAM-to-RAM performance: 1014 MB/s; 1014 MB/s; 1003 MB/s.

C:\_KAZE_Greed-For-Speed_Hakukitsune>Nakamichi_Kaidanji_XMM.exe enwik8.Kaidanji.Nakamichi
Nakamichi 'Kaidanji', written by Kaze, based on Nobuo Ito's LZSS source, babealicious suggestion by m^2 enforced.
Decompressing 63430147 bytes ...
RAM-to-RAM performance: 1014 MB/s; 1014 MB/s; 866 MB/s.
Memory pool starting address: 0000000004100080 ... 64 byte aligned, OK
Copying a 256MB block 1024 times i.e. 256GB READ + 256GB WRITTEN ...
memcpy(): (256MB block); 262144MB copied in 95005 clocks or 2.759MB per clock
RAM-to-RAM performance vs memcpy() ratio (bigger-the-better): 31%

C:\_KAZE_Greed-For-Speed_Hakukitsune>Nakamichi_Kitsune_GP.exe enwik8.Kitsune.Nakamichi
Nakamichi 'Kitsune', written by Kaze, based on Nobuo Ito's LZSS source, babealicious suggestion by m^2 enforced.
Decompressing 57108834 bytes ...
RAM-to-RAM performance: 551 MB/s; 554 MB/s; 554 MB/s.

C:\_KAZE_Greed-For-Speed_Hakukitsune>Nakamichi_Kitsune_XMM.exe enwik8.Kitsune.Nakamichi
Nakamichi 'Kitsune', written by Kaze, based on Nobuo Ito's LZSS source, babealicious suggestion by m^2 enforced.
Decompressing 57108834 bytes ...
RAM-to-RAM performance: 507 MB/s; 507 MB/s; 507 MB/s.

C:\_KAZE_Greed-For-Speed_Hakukitsune>Nakamichi_Sanbashi_GP.exe enwik8.Sanbashi.Nakamichi
Nakamichi 'Sanbashi', written by Kaze, based on Nobuo Ito's LZSS source, babealicious suggestion by m^2 enforced.
Decompressing 46881842 bytes ...
RAM-to-RAM performance: 607 MB/s; 607 MB/s; 607 MB/s.

C:\_KAZE_Greed-For-Speed_Hakukitsune>Nakamichi_Sanbashi_XMM.exe enwik8.Sanbashi.Nakamichi
Nakamichi 'Sanbashi', written by Kaze, based on Nobuo Ito's LZSS source, babealicious suggestion by m^2 enforced.
Decompressing 46881842 bytes ...
RAM-to-RAM performance: 504 MB/s; 507 MB/s; 507 MB/s.

C:\_KAZE_Greed-For-Speed_Hakukitsune>Yappy_32bit.exe enwik8 8192 10000
YAPPY: [b 8K] bytes 100000000 -> 57701807  57.7%  comp  38.7 MB/s  uncomp 688.6 MB/s 

C:\_KAZE_Greed-For-Speed_Hakukitsune>Yappy_32bit.exe enwik8 65536 10000
YAPPY: [b 64K] bytes 100000000 -> 54162908  54.2%  comp  36.2 MB/s  uncomp 679.4 MB/s 

C:\_KAZE_Greed-For-Speed_Hakukitsune>Yappy_32bit.exe enwik8 1048576 10000
YAPPY: [b 1024K] bytes 100000000 -> 53687370  53.7%  comp  36.0 MB/s  uncomp 661.1 MB/s 
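
The "vs memcpy() ratio" printed for 'Kaidanji' XMM above can be reproduced with a few lines of arithmetic. This is only a sketch: it assumes one clock is one millisecond (which matches the reported 2.759MB per clock), that the last of the three RAM-to-RAM figures is the one compared, and that the result is truncated rather than rounded. The function names are mine, not taken from the Nakamichi source.

```c
#include <assert.h>

/* memcpy() throughput in MB/s, assuming one clock tick is 1 ms. */
unsigned long memcpy_mb_per_s(unsigned long long mb_copied,
                              unsigned long long clocks)
{
    assert(clocks > 0);
    return (unsigned long)(mb_copied * 1000ULL / clocks);
}

/* Decompression speed as a truncated percentage of memcpy() speed. */
unsigned long ratio_percent(unsigned long decomp_mb_per_s,
                            unsigned long memcpy_mbps)
{
    assert(memcpy_mbps > 0);
    return decomp_mb_per_s * 100UL / memcpy_mbps;
}
```

With the figures above, 262144MB in 95005 clocks gives 2759 MB/s, and 866 MB/s against that is 31%.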
I have 13 test data files (mainly textual) that are useful for getting an idea of how weak the compression ratio is:

For 'Hanazakari':
09/26/1996  04:51 PM           152,089 alice29.txt
06/22/2014  03:38 AM            80,270 alice29.txt.Hanazakari.Nakamichi
06/07/2014  02:59 PM        13,846,016 Bible_Bible_Bible.tar
06/22/2014  06:48 AM         6,084,248 Bible_Bible_Bible.tar.Hanazakari.Nakamichi
05/16/2014  07:22 AM         3,153,408 CalgaryCorpus.tar
06/22/2014  03:40 AM         1,620,653 CalgaryCorpus.tar.Hanazakari.Nakamichi
05/16/2014  07:22 AM        10,192,446 dickens
06/22/2014  03:45 AM         5,068,626 dickens.Hanazakari.Nakamichi
06/11/2014  09:35 PM       100,000,000 enwik8
06/22/2014  04:22 AM        54,693,537 enwik8.Hanazakari.Nakamichi
06/24/2014  04:09 AM     1,000,000,000 enwik9
06/22/2014  04:27 PM       516,805,980 enwik9.Hanazakari.Nakamichi
05/16/2014  07:22 AM        11,546,860 Goyathlay.txt
06/22/2014  06:54 AM         5,072,124 Goyathlay.txt.Hanazakari.Nakamichi
05/16/2014  07:22 AM       846,351,894 Kazahana_on.PAGODA-order-5.txt
06/22/2014  10:54 AM       268,125,302 Kazahana_on.PAGODA-order-5.txt.Hanazakari.Nakamichi
05/16/2014  07:22 AM        20,617,071 Large_traffic_log_file_of_a_popular_website_fp.log
06/22/2014  06:55 AM         5,721,115 Large_traffic_log_file_of_a_popular_website_fp.log.Hanazakari.Nakamichi
06/05/2014  07:35 PM       132,728,832 New_Shorter_Oxford_English_Dictionary_fifth_edition.tar
06/22/2014  07:19 AM        51,827,756 New_Shorter_Oxford_English_Dictionary_fifth_edition.tar.Hanazakari.Nakamichi
05/16/2014  07:22 AM       206,908,949 OSHO.TXT
06/22/2014  08:33 AM        88,953,095 OSHO.TXT.Hanazakari.Nakamichi
06/03/2014  07:35 PM         5,582,655 shaks12.txt
06/22/2014  03:42 AM         2,737,003 shaks12.txt.Hanazakari.Nakamichi
05/16/2014  07:22 AM       211,948,544 silesia.tar
06/22/2014  06:44 AM       108,267,950 silesia.tar.Hanazakari.Nakamichi
C:\Nakamichi_brutal_tests_Hanazakari>Nakamichi_Hanazakari\Nakamichi_Hanazakari_GP.exe enwik9.Nakamichi
Nakamichi 'Hanazakari', written by Kaze, based on Nobuo Ito's LZSS source, babealicious suggestion by m^2 enforced.
Decompressing 516805980 bytes ...
RAM-to-RAM performance: 735 MB/s.

C:\Nakamichi_brutal_tests_Hanazakari>Nakamichi_Hanazakari\Nakamichi_Hanazakari_XMM.exe enwik9.Nakamichi
Nakamichi 'Hanazakari', written by Kaze, based on Nobuo Ito's LZSS source, babealicious suggestion by m^2 enforced.
Decompressing 516805980 bytes ...
RAM-to-RAM performance: 657 MB/s.
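
The directory listings give raw sizes only. To put them on the same footing as the percentage Yappy prints (e.g. 57.7%), the compressed size can be divided by the original size. A minimal sketch follows; the helper name is hypothetical, and the result is in hundredths of a percent so that it stays in integers.

```c
#include <assert.h>

/* Compressed size as hundredths of a percent of the original size,
   truncated: 5469 means 54.69%. Hypothetical helper for illustration. */
unsigned compressed_per_10000(unsigned long long compressed,
                              unsigned long long original)
{
    assert(original > 0);
    return (unsigned)(compressed * 10000ULL / original);
}
```

For 'Hanazakari' this gives 5469 (54.69%) for enwik8 (54,693,537 of 100,000,000 bytes) and 5168 (51.68%) for enwik9 (516,805,980 of 1,000,000,000 bytes).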
For 'Kaidanji':
09/26/1996  04:51 PM           152,089 alice29.txt
06/23/2014  05:50 PM            92,285 alice29.txt.Kaidanji.Nakamichi
06/07/2014  02:59 PM        13,846,016 Bible_Bible_Bible.tar
06/23/2014  09:48 PM         7,201,021 Bible_Bible_Bible.tar.Kaidanji.Nakamichi
05/16/2014  07:22 AM         3,153,408 CalgaryCorpus.tar
06/23/2014  05:51 PM         1,862,449 CalgaryCorpus.tar.Kaidanji.Nakamichi
05/16/2014  07:22 AM        10,192,446 dickens
06/23/2014  05:54 PM         6,387,079 dickens.Kaidanji.Nakamichi
05/16/2014  07:22 AM       100,000,000 enwik8
06/23/2014  06:16 PM        63,430,147 enwik8.Kaidanji.Nakamichi
06/24/2014  04:07 AM     1,000,000,000 enwik9
06/24/2014  02:37 AM       590,380,377 enwik9.Kaidanji.Nakamichi
05/16/2014  07:22 AM        11,546,860 Goyathlay.txt
06/23/2014  09:49 PM         4,889,336 Goyathlay.txt.Kaidanji.Nakamichi
05/16/2014  07:22 AM       846,351,894 Kazahana_on.PAGODA-order-5.txt
06/23/2014  11:28 PM       279,899,511 Kazahana_on.PAGODA-order-5.txt.Kaidanji.Nakamichi
05/16/2014  07:22 AM        20,617,071 Large_traffic_log_file_of_a_popular_website_fp.log
06/23/2014  09:50 PM         5,772,170 Large_traffic_log_file_of_a_popular_website_fp.log.Kaidanji.Nakamichi
06/05/2014  07:35 PM       132,728,832 New_Shorter_Oxford_English_Dictionary_fifth_edition.tar
06/23/2014  10:04 PM        56,609,197 New_Shorter_Oxford_English_Dictionary_fifth_edition.tar.Kaidanji.Nakamichi
05/16/2014  07:22 AM       206,908,949 OSHO.TXT
06/23/2014  10:38 PM       106,013,567 OSHO.TXT.Kaidanji.Nakamichi
06/03/2014  07:35 PM         5,582,655 shaks12.txt
06/23/2014  05:52 PM         3,391,657 shaks12.txt.Kaidanji.Nakamichi
05/16/2014  07:22 AM       211,948,544 silesia.tar
06/23/2014  09:45 PM       128,649,094 silesia.tar.Kaidanji.Nakamichi
C:\Nakamichi_brutal_tests_Kaidanji>Nakamichi_Kaidanji\Nakamichi_Kaidanji_GP.exe enwik9.Nakamichi
Nakamichi 'Kaidanji', written by Kaze, based on Nobuo Ito's LZSS source, babealicious suggestion by m^2 enforced.
Decompressing 590380377 bytes ...
RAM-to-RAM performance: 1035 MB/s.

C:\Nakamichi_brutal_tests_Kaidanji>Nakamichi_Kaidanji\Nakamichi_Kaidanji_xmm.exe enwik9.Nakamichi
Nakamichi 'Kaidanji', written by Kaze, based on Nobuo Ito's LZSS source, babealicious suggestion by m^2 enforced.
Decompressing 590380377 bytes ...
RAM-to-RAM performance: 924 MB/s.
For 'Kitsune':
09/26/1996  04:51 PM           152,089 alice29.txt
06/23/2014  03:45 AM            82,750 alice29.txt.Kitsune.Nakamichi
06/07/2014  02:59 PM        13,846,016 Bible_Bible_Bible.tar
06/23/2014  06:17 AM         6,491,990 Bible_Bible_Bible.tar.Kitsune.Nakamichi
05/16/2014  07:22 AM         3,153,408 CalgaryCorpus.tar
06/23/2014  03:47 AM         1,635,663 CalgaryCorpus.tar.Kitsune.Nakamichi
05/16/2014  07:22 AM        10,192,446 dickens
06/23/2014  03:53 AM         5,572,322 dickens.Kitsune.Nakamichi
05/16/2014  07:22 AM       100,000,000 enwik8
06/23/2014  04:38 AM        57,108,834 enwik8.Kitsune.Nakamichi
06/24/2014  04:12 AM     1,000,000,000 enwik9
06/23/2014  04:48 PM       523,820,475 enwik9.Kitsune.Nakamichi
05/16/2014  07:22 AM        11,546,860 Goyathlay.txt
06/23/2014  06:22 AM         4,851,157 Goyathlay.txt.Kitsune.Nakamichi
05/16/2014  07:22 AM       846,351,894 Kazahana_on.PAGODA-order-5.txt
06/23/2014  10:00 AM       204,945,294 Kazahana_on.PAGODA-order-5.txt.Kitsune.Nakamichi
05/16/2014  07:22 AM        20,617,071 Large_traffic_log_file_of_a_popular_website_fp.log
06/23/2014  06:23 AM         4,043,639 Large_traffic_log_file_of_a_popular_website_fp.log.Kitsune.Nakamichi
06/05/2014  07:35 PM       132,728,832 New_Shorter_Oxford_English_Dictionary_fifth_edition.tar
06/23/2014  06:53 AM        46,750,535 New_Shorter_Oxford_English_Dictionary_fifth_edition.tar.Kitsune.Nakamichi
05/16/2014  07:22 AM       206,908,949 OSHO.TXT
06/23/2014  08:02 AM        93,856,538 OSHO.TXT.Kitsune.Nakamichi
06/03/2014  07:35 PM         5,582,655 shaks12.txt
06/23/2014  03:49 AM         2,961,702 shaks12.txt.Kitsune.Nakamichi
05/16/2014  07:22 AM       211,948,544 silesia.tar
06/23/2014  06:13 AM       104,839,740 silesia.tar.Kitsune.Nakamichi
C:\Nakamichi_brutal_tests_Kitsune>Nakamichi_Kitsune\Nakamichi_Kitsune_GP.exe enwik9.Nakamichi
Nakamichi 'Kitsune', written by Kaze, based on Nobuo Ito's LZSS source, babealicious suggestion by m^2 enforced.
Decompressing 523820475 bytes ...
RAM-to-RAM performance: 593 MB/s.

C:\Nakamichi_brutal_tests_Kitsune>Nakamichi_Kitsune\Nakamichi_Kitsune_xmm.exe enwik9.Nakamichi
Nakamichi 'Kitsune', written by Kaze, based on Nobuo Ito's LZSS source, babealicious suggestion by m^2 enforced.
Decompressing 523820475 bytes ...
RAM-to-RAM performance: 545 MB/s.
For 'Sanbashi':
09/26/1996  04:51 PM           152,089 alice29.txt
06/06/2014  11:12 PM            95,682 alice29.txt.Sanbashi.Nakamichi
05/16/2014  07:22 AM        13,846,016 Bible_Bible_Bible.tar
06/07/2014  12:28 AM         5,381,070 Bible_Bible_Bible.tar.Sanbashi.Nakamichi
05/16/2014  07:22 AM         3,153,408 CalgaryCorpus.tar
06/06/2014  11:20 PM         1,560,737 CalgaryCorpus.tar.Sanbashi.Nakamichi
05/16/2014  07:22 AM        10,192,446 dickens
06/06/2014  11:59 PM         4,614,250 dickens.Sanbashi.Nakamichi
06/10/2014  11:54 PM       100,000,000 enwik8
06/07/2014  06:39 AM        46,881,842 enwik8.Sanbashi.Nakamichi
06/24/2014  04:16 AM     1,000,000,000 enwik9
06/10/2014  05:50 AM       427,440,514 enwik9.Sanbashi.Nakamichi
05/16/2014  07:22 AM        11,546,860 Goyathlay.txt
06/07/2014  01:02 AM         4,228,600 Goyathlay.txt.Sanbashi.Nakamichi
06/10/2014  11:58 PM       846,351,894 Kazahana_on.PAGODA-order-5.txt
06/08/2014  04:00 AM       147,898,802 Kazahana_on.PAGODA-order-5.txt.Sanbashi.Nakamichi
05/16/2014  07:22 AM        20,617,071 Large_traffic_log_file_of_a_popular_website_fp.log
06/07/2014  01:05 AM         3,298,137 Large_traffic_log_file_of_a_popular_website_fp.log.Sanbashi.Nakamichi
06/05/2014  07:35 PM       132,728,832 New_Shorter_Oxford_English_Dictionary_fifth_edition.tar
06/07/2014  09:44 AM        37,000,029 New_Shorter_Oxford_English_Dictionary_fifth_edition.tar.Sanbashi.Nakamichi
06/10/2014  11:58 PM       206,908,949 OSHO.TXT
06/07/2014  04:18 PM        73,837,310 OSHO.TXT.Sanbashi.Nakamichi
06/03/2014  07:35 PM         5,582,655 shaks12.txt
06/06/2014  11:34 PM         2,559,110 shaks12.txt.Sanbashi.Nakamichi
06/10/2014  11:59 PM       211,948,544 silesia.tar
06/10/2014  10:29 PM        93,058,243 silesia.tar.Sanbashi.Nakamichi
C:\Nakamichi_brutal_tests_Sanbashi>Nakamichi_Sanbashi\Nakamichi_Sanbashi_GP_64bit.exe enwik9.Nakamichi
Nakamichi 'Sanbashi', written by Kaze, based on Nobuo Ito's LZSS source, babealicious suggestion by m^2 enforced.
Decompressing 427440514 bytes ...
RAM-to-RAM performance: 610 MB/s.

C:\Nakamichi_brutal_tests_Sanbashi>Nakamichi_Sanbashi\Nakamichi_Sanbashi_XMM_64bit.exe enwik9.Nakamichi
Nakamichi 'Sanbashi', written by Kaze, based on Nobuo Ito's LZSS source, babealicious suggestion by m^2 enforced.
Decompressing 427440514 bytes ...
RAM-to-RAM performance: 536 MB/s.
For 'Kaiko':
09/26/1996  04:51 PM           152,089 alice29.txt
06/11/2014  05:13 AM           100,756 alice29.txt.Kaiko.Nakamichi
06/07/2014  02:59 PM        13,846,016 Bible_Bible_Bible.tar
06/14/2014  05:31 PM         6,017,304 Bible_Bible_Bible.tar.Kaiko.Nakamichi
05/16/2014  07:22 AM         3,153,408 CalgaryCorpus.tar
06/11/2014  05:17 AM         1,834,530 CalgaryCorpus.tar.Kaiko.Nakamichi
05/16/2014  07:22 AM        10,192,446 dickens
06/11/2014  05:47 AM         4,694,952 dickens.Kaiko.Nakamichi
06/11/2014  09:34 PM       100,000,000 enwik8
06/11/2014  07:24 PM        47,016,954 enwik8.Kaiko.Nakamichi
06/24/2014  04:14 AM     1,000,000,000 enwik9
06/22/2014  03:37 AM       457,975,369 enwik9.Kaiko.Nakamichi
05/16/2014  07:22 AM        11,546,860 Goyathlay.txt
06/14/2014  06:21 PM         5,240,874 Goyathlay.txt.Kaiko.Nakamichi
05/16/2014  07:22 AM       846,351,894 Kazahana_on.PAGODA-order-5.txt
06/16/2014  08:42 AM       327,417,998 Kazahana_on.PAGODA-order-5.txt.Kaiko.Nakamichi
05/16/2014  07:22 AM        20,617,071 Large_traffic_log_file_of_a_popular_website_fp.log
06/14/2014  06:33 PM         7,827,069 Large_traffic_log_file_of_a_popular_website_fp.log.Kaiko.Nakamichi
06/05/2014  07:35 PM       132,728,832 New_Shorter_Oxford_English_Dictionary_fifth_edition.tar
06/15/2014  01:44 AM        54,508,294 New_Shorter_Oxford_English_Dictionary_fifth_edition.tar.Kaiko.Nakamichi
05/16/2014  07:22 AM       206,908,949 OSHO.TXT
06/15/2014  11:39 AM        83,171,686 OSHO.TXT.Kaiko.Nakamichi
06/03/2014  07:35 PM         5,582,655 shaks12.txt
06/11/2014  05:25 AM         2,704,445 shaks12.txt.Kaiko.Nakamichi
05/16/2014  07:22 AM       211,948,544 silesia.tar
06/14/2014  04:54 PM       114,279,086 silesia.tar.Kaiko.Nakamichi
C:\Nakamichi_brutal_tests_Kaiko>Nakamichi_Kaiko\Nakamichi_Kaiko_GP.exe enwik9.Nakamichi
Nakamichi 'Kaiko', written by Kaze, based on Nobuo Ito's LZSS source, babealicious suggestion by m^2 enforced.
Decompressing 457975369 bytes ...
RAM-to-RAM performance: 302 MB/s.

C:\Nakamichi_brutal_tests_Kaiko>Nakamichi_Kaiko\Nakamichi_Kaiko_XMM.exe enwik9.Nakamichi
Nakamichi 'Kaiko', written by Kaze, based on Nobuo Ito's LZSS source, babealicious suggestion by m^2 enforced.
Decompressing 457975369 bytes ...
RAM-to-RAM performance: 299 MB/s.
Thanks to help from Garrard, here is how 'Thuban' executes Nakamichi:

AMD 'Thuban' Phenom II X6 1090T @3.2 GHz features:
Cores/Threads: 6/6
Level 1 cache size: 6 x 64 KB instruction/data caches
Level 2 cache size: 6 x 512 KB caches
Level 3 cache size: 6 MB shared cache
C:\_KAZE_Greed-For-Speed_Hakukitsune>Nakamichi_Hanazakari_GP.exe enwik8.Hanazakari.Nakamichi
Nakamichi 'Hanazakari', written by Kaze, based on Nobuo Ito's LZSS source, babealicious suggestion by m^2 enforced.
Decompressing 54693537 bytes ...
RAM-to-RAM performance: 851 MB/s; 851 MB/s; 874 MB/s.

C:\_KAZE_Greed-For-Speed_Hakukitsune>Nakamichi_Hanazakari_XMM.exe enwik8.Hanazakari.Nakamichi
Nakamichi 'Hanazakari', written by Kaze, based on Nobuo Ito's LZSS source, babealicious suggestion by m^2 enforced.
Decompressing 54693537 bytes ...
RAM-to-RAM performance: 756 MB/s; 883 MB/s; 822 MB/s.

C:\_KAZE_Greed-For-Speed_Hakukitsune>Nakamichi_Kaidanji_GP.exe enwik8.Kaidanji.Nakamichi
Nakamichi 'Kaidanji', written by Kaze, based on Nobuo Ito's LZSS source, babealicious suggestion by m^2 enforced.
Decompressing 63430147 bytes ...
RAM-to-RAM performance: 1135 MB/s; 1149 MB/s; 1096 MB/s.

C:\_KAZE_Greed-For-Speed_Hakukitsune>Nakamichi_Kaidanji_XMM.exe enwik8.Kaidanji.Nakamichi /memtest
Nakamichi 'Kaidanji', written by Kaze, based on Nobuo Ito's LZSS source, babealicious suggestion by m^2 enforced.
Decompressing 63430147 bytes ...
RAM-to-RAM performance: 1135 MB/s; 1149 MB/s; 1121 MB/s.
Memory pool starting address: 0000000004200080 ... 64 byte aligned, OK
Copying a 256MB block 1024 times i.e. 256GB READ + 256GB WRITTEN ...
memcpy(): (256MB block); 262144MB copied in 93210 clocks or 2.812MB per clock
RAM-to-RAM performance vs memcpy() ratio (bigger-the-better): 39%

C:\_KAZE_Greed-For-Speed_Hakukitsune>Nakamichi_Kitsune_GP.exe enwik8.Kitsune.Nakamichi
Nakamichi 'Kitsune', written by Kaze, based on Nobuo Ito's LZSS source, babealicious suggestion by m^2 enforced.
Decompressing 57108834 bytes ...
RAM-to-RAM performance: 504 MB/s; 526 MB/s; 518 MB/s.

C:\_KAZE_Greed-For-Speed_Hakukitsune>Nakamichi_Kitsune_XMM.exe enwik8.Kitsune.Nakamichi
Nakamichi 'Kitsune', written by Kaze, based on Nobuo Ito's LZSS source, babealicious suggestion by m^2 enforced.
Decompressing 57108834 bytes ...
RAM-to-RAM performance: 592 MB/s; 599 MB/s; 607 MB/s.

C:\_KAZE_Greed-For-Speed_Hakukitsune>Nakamichi_Sanbashi_GP.exe enwik8.Sanbashi.Nakamichi
Nakamichi 'Sanbashi', written by Kaze, based on Nobuo Ito's LZSS source, babealicious suggestion by m^2 enforced.
Decompressing 46881842 bytes ...
RAM-to-RAM performance: 564 MB/s; 560 MB/s; 557 MB/s.

C:\_KAZE_Greed-For-Speed_Hakukitsune>Nakamichi_Sanbashi_XMM.exe enwik8.Sanbashi.Nakamichi
Nakamichi 'Sanbashi', written by Kaze, based on Nobuo Ito's LZSS source, babealicious suggestion by m^2 enforced.
Decompressing 46881842 bytes ...
RAM-to-RAM performance: 541 MB/s; 538 MB/s; 532 MB/s.

C:\_KAZE_Greed-For-Speed_Hakukitsune>Yappy_32bit.exe enwik8 8192 10000
YAPPY: [b 8K] bytes 100000000 -> 57701807  57.7%  comp  49.3 MB/s  uncomp 942.4 MB/s 

C:\_KAZE_Greed-For-Speed_Hakukitsune>Yappy_32bit.exe enwik8 65536 10000
YAPPY: [b 64K] bytes 100000000 -> 54162908  54.2%  comp  47.7 MB/s  uncomp 921.4 MB/s 

C:\_KAZE_Greed-For-Speed_Hakukitsune>Yappy_32bit.exe enwik8 1048576 10000
YAPPY: [b 1024K] bytes 100000000 -> 53687370  53.7%  comp  46.4 MB/s  uncomp 888.8 MB/s 

C:\_KAZE_Greed-For-Speed_Hakukitsune>timer32.exe "MokujIN_r5+_16-Threads_IntelV12_64bit_O3" 524288 524288 /stats
MokujIN, Multiplication of INtegers, an OpenMP (multi-threaded) string multiplier, 16 threads enforced, written by Kaze, 2012-Nov-16, revision 5fix+.
omp_get_num_procs( ) = 6
omp_get_max_threads( ) = 6
Multiplying performance for operands 00,000,006 digits long (footprint: ~000,000KB, checksum: 6beb,c722): 36 MokujINs i.e. digits per second.
Multiplying performance for operands 00,000,012 digits long (footprint: ~000,000KB, checksum: 9e28,8598): 144 MokujINs i.e. digits per second.
Multiplying performance for operands 00,000,023 digits long (footprint: ~000,000KB, checksum: df58,ed82): 529 MokujINs i.e. digits per second.
Multiplying performance for operands 00,000,046 digits long (footprint: ~000,001KB, checksum: fbbf,9e27): 2,116 MokujINs i.e. digits per second.
Multiplying performance for operands 00,000,092 digits long (footprint: ~000,002KB, checksum: 5280,1546): 8,464 MokujINs i.e. digits per second.
Multiplying performance for operands 00,000,184 digits long (footprint: ~000,005KB, checksum: 3061,d39b): 33,856 MokujINs i.e. digits per second.
Multiplying performance for operands 00,000,367 digits long (footprint: ~000,011KB, checksum: 99a2,9b3c): 134,689 MokujINs i.e. digits per second.
Multiplying performance for operands 00,000,733 digits long (footprint: ~000,022KB, checksum: 71bd,9970): 537,289 MokujINs i.e. digits per second.
Multiplying performance for operands 00,001,465 digits long (footprint: ~000,045KB, checksum: c536,0c57): 2,146,225 MokujINs i.e. digits per second.
Multiplying performance for operands 00,002,929 digits long (footprint: ~000,091KB, checksum: f1f7,a243): 8,579,041 MokujINs i.e. digits per second.
Multiplying performance for operands 00,005,857 digits long (footprint: ~000,183KB, checksum: 5a3a,564c): 34,304,449 MokujINs i.e. digits per second.
Multiplying performance for operands 00,011,714 digits long (footprint: ~000,366KB, checksum: 464c,a182): 137,217,796 MokujINs i.e. digits per second.
Multiplying performance for operands 00,023,428 digits long (footprint: ~000,732KB, checksum: f9a9,2f30): 548,871,184 MokujINs i.e. digits per second.
Multiplying performance for operands 00,046,855 digits long (footprint: ~001,464KB, checksum: 86ba,47be): 548,847,756 MokujINs i.e. digits per second.
Multiplying performance for operands 00,093,710 digits long (footprint: ~002,928KB, checksum: b194,1027): 627,254,578 MokujINs i.e. digits per second.
Multiplying performance for operands 00,187,419 digits long (footprint: ~005,856KB, checksum: 97dc,040c): 616,243,536 MokujINs i.e. digits per second.
Multiplying performance for operands 00,374,838 digits long (footprint: ~011,713KB, checksum: 13e5,96d7): 630,060,655 MokujINs i.e. digits per second.
Multiplying performance for operands 00,749,676 digits long (footprint: ~023,427KB, checksum: 8ccc,73c7): 604,316,241 MokujINs i.e. digits per second.
Multiplying performance for operands 01,499,351 digits long (footprint: ~046,854KB, checksum: 8db2,fd7c): 605,617,839 MokujINs i.e. digits per second.
Dumping the result to 'MokujIN.txt' ... OK
Total Time: 4,941 second(s).

Kernel  Time =     4.227 =    0%
User    Time = 25046.600 =  506%
Process Time = 25050.827 =  507%    Virtual  Memory =    475 MB
Global  Time =  4940.650 =  100%    Physical Memory =     55 MB
Overall, nothing impressive, yet 'Kitsune' is foxy; AFAIK even the 'K' computer is unable to handle its foxiness.
I believe it could foxify the searches (by reducing the I/O) on machines with 5 ALUs, but I have to wait and see.

Update, 2014-Jun-15:

Foxtime.
The variant of 'Aratama' using Match_Length 5/13 is called 'Kitsune'; it uses one XMM register instead of one GP register.
On machines optimized for XMM I expect much faster execution.
The code generated by the Intel compiler is, in my opinion, excellent - with only 1/2/1 WORD/DWORD/QWORD memory accesses.
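A minimal sketch of how the 5/13 Match_Length selection might look in C, read off the 'Kitsune' listing on this page (test against 224, bit 16 shifted right by one, plus 5). The function names are hypothetical and the token layout is only my reading of the generated code, not the actual source; memcpy stands in for the unaligned XMM load/store pair.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumption: a token is a literal run when bits 5..7 of the low byte are all zero. */
unsigned kitsune_is_literal(uint32_t WORDpair)
{
    return (WORDpair & 0xE0u) == 0;
}

/* Assumption: bit 4 selects the match length, 0 -> 5 bytes, 1 -> 13 bytes. */
unsigned kitsune_match_length(uint32_t WORDpair)
{
    return ((WORDpair & 16u) >> 1) + 5u;
}

/* Output advance for one token, written in the multiply-by-flag (branchless) style. */
unsigned kitsune_output_advance(uint32_t WORDpair)
{
    unsigned Flag = kitsune_is_literal(WORDpair);
    return (WORDpair & 0xFFu) * Flag + kitsune_match_length(WORDpair) * (Flag ^ 1u);
}

/* Portable stand-in for the 16-byte unaligned load followed by a 16-byte store. */
void kitsune_copy16(uint8_t *dst, const uint8_t *src)
{
    uint8_t tmp[16];
    assert(dst != NULL && src != NULL);
    memcpy(tmp, src, sizeof tmp);
    memcpy(dst, tmp, sizeof tmp);
}
```

Every step copies a full 16 bytes no matter how few are needed, which is why only the index advance depends on the token; both buffers therefore need 16 bytes of slack at the end.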
------------------------------------------------------------------------------------------------------------------------------------------------------
| compressor \ filedataset      | alice29.txt    | CalgaryCorpus.tar  | shaks12.txt        | dickens             | enwik8                            |
------------------------------------------------------------------------------------------------------------------------------------------------------
| UNCOMPRESSED                  | 152,089        | 3,153,408          | 5,582,655          | 10,192,446          | 100,000,000                       |
| Nakamichi 'Kaidanji'   (64KB) | 092,285 / 0328 | 1,862,449 / 011838 | 3,391,657 / 006799 | 06,387,079 / 014977 | 063,430,147 / 0283161 / 1014 MB/s |
| Nakamichi 'Hanazakari' (64KB) | 080,270 / 0218 | 1,620,653 / 005396 | 2,737,003 / 000452 | 05,068,626 / 000630 | 054,693,537 / 0031544 /  756 MB/s |
| Nakamichi 'Aratama'    (64KB) | 095,560 / 5310 | 1,933,936 / 121922 | 3,518,810 / 190256 | 06,620,093 / 363922 | 066,251,713 / 4170885 /  866 MB/s |
| Nakamichi 'Kitsune'    (64KB) | 082,750 / 0218 | 1,635,663 / 012410 | 2,961,702 / 001998 | 05,572,322 / 003358 | 057,108,834 / 0204190 /  554 MB/s |
| Nakamichi 'Sanbashi'    (2MB) | 095,682 / 0267 | 1,560,737 / 006841 | 2,559,110 / 000519 | 04,614,250 / 000558 | 046,881,842 / 0026545 /  607 MB/s |
| Nakamichi 'Kaiko'      (16MB) | 100,756 / 1371 | 1,834,530 / 022877 | 2,704,445 / 004412 | 04,694,952 / 004356 | 047,016,954 / 0097458 /  289 MB/s |
| 7z's gz, Ultra Deflate32      | 051,707        | 0,980,026          | 1,934,787          | 03,681,828          | 035,102,891                       |
| 7z's zip, Ultra Deflate64     | 050,051        | 0,945,849          | 1,834,240          | 03,508,645          | 033,757,921                       |
| TANGELO 2.3                   | 039,160        | 0,710,066          | 1,236,021          | 02,279,659          | 020,921,619                       |
| LZ4 v1.4, -9                  | 063,705        | 1,195,853          | 2,315,036          | 04,442,992          | 042,283,904         / 2186.9 MB/s |
| Yappy, 8192 10000             | 087,965        | 1,654,203          | 3,337,964          | 06,374,780          | 057,701,807         /  698.7 MB/s |
| Yappy, 65536 10000            | 081,217        | 1,544,271          | 3,120,688          | 05,912,295          | 054,162,908         /  679.4 MB/s |
| Yappy, 1048576 10000          | 080,353        | 1,530,823          | 3,091,493          | 05,850,648          | 053,687,370         /  679.4 MB/s |
------------------------------------------------------------------------------------------------------------------------------------------------------
; Nakamichi_Kitsune_XMM's decompression loop, b0-40+2=114 bytes long
; mark_description "Intel(R) C++ Intel(R) 64 Compiler XE for applications running on Intel(R) 64, Version 12.1.1.258 Build 20111";
; mark_description "-O3 -QxSSE2 -D_N_XMM -FAcs";

.B7.3::                         
  00040 45 33 e4         xor r12d, r12d                         
  00043 41 89 c5         mov r13d, eax                          
  00046 33 db            xor ebx, ebx                           
  00048 4c 03 e9         add r13, rcx                           
  0004b 45 0f b7 1c 12   movzx r11d, WORD PTR [r10+rdx]         
  00050 41 f7 c3 e0 00 
        00 00            test r11d, 224                         
  00057 44 89 de         mov esi, r11d                          
  0005a 4a 8d 7c 12 01   lea rdi, QWORD PTR [1+rdx+r10]         
  0005f 45 0f 44 e1      cmove r12d, r9d                        
  00063 48 f7 de         neg rsi                                
  00066 49 03 f5         add rsi, r13                           
  00069 45 85 e4         test r12d, r12d                        
  0006c 41 0f 44 d9      cmove ebx, r9d                         
  00070 49 0f af fc      imul rdi, r12                          
  00074 48 0f af f3      imul rsi, rbx                          
  00078 f3 0f 6f 04 3e   movdqu xmm0, XMMWORD PTR [rsi+rdi]     
  0007d 41 0f b6 fb      movzx edi, r11b                        
  00081 41 83 e3 10      and r11d, 16                           
  00085 41 d1 eb         shr r11d, 1                            
  00088 41 83 c3 05      add r11d, 5                            
  0008c 44 0f af db      imul r11d, ebx                         
  00090 f3 41 0f 7f 45 
        00               movdqu XMMWORD PTR [r13], xmm0         
  00096 8d 77 01         lea esi, DWORD PTR [1+rdi]             
  00099 41 0f af fc      imul edi, r12d                         
  0009d 41 0f af f4      imul esi, r12d                         
  000a1 44 03 d6         add r10d, esi                          
  000a4 03 c7            add eax, edi                           
  000a6 41 03 c3         add eax, r11d                          
  000a9 45 8d 14 5a      lea r10d, DWORD PTR [r10+rbx*2]        
  000ad 45 3b d0         cmp r10d, r8d                          
  000b0 72 8e            jb .B7.3 
The package (124KB) containing the source:
http://www.sanmayce.com/Downloads/Nakamichi_Kitsune.zip

There are two Bulgarian artists (songstresses) whom I like a lot, Toni and Ruth Koleva.
Four or so years ago Toni, with her 'Kakto predi', totally made me feel.
Two years ago Nyree (Ruth Koleva), with her 'Scream My Name', hit me hard.
So now I love seeing them as a duo, making 'Break of Day'.
I salute everyone who is still on the path of appreciativeness with it:

The video clip was made free of charge to support 'I Can Too' - a foundation for helping children with special needs.

Some names of the old/incoming NakamichiTA:

In the 2nd year of Koka (1845), there was a turtle who was worshiped in Lake Shinobazu, in Ueno.
This turtle was different from normal turtles. Its shell was white, and had faint markings on it that
could be read as kanji characters. Its neck, legs, and arms were unusually thick. The turtle was originally
from the great lake of Nagai in Settsu, and had been brought to Lake Shinobazu by virtuous local men who
had purchased the turtle in Osaka then brought it home and dedicated it to the goddess Benzaten.

White turtles have a history of sacredness. There is a legend from India of a one-eyed white turtle who
listened intently to the sermons of the Buddha Shakuson. China speaks of a white turtle who descended from Heaven
and brought with it peace and tranquility. And in Japan the white turtle is revered as a symbol of peace.
The appearance of a white turtle is thought necessary to ensure a peaceful Imperial reign.

/百物語怪談会 Hyakumonogatari Kaidankai/

弁天 - Benten/Benzaten ~ the goddess of fortune
粗玉 - aratama ~ unpolished and uncut gem
快男児 - kaidanji ~ nice guy
木鼠 - kinezumi ~ squirrel
桟橋 - sanbashi ~ wharf; bridge; jetty; pier
花盛 - hanazakari ~ flowers in full bloom; time of year in which flowers are in full bloom
嵐 - arashi ~ storm; tempest
片言 - katakoto ~ a smattering; talk like a baby; speak haltingly
乳母車 - ubaguruma ~ baby carriage; perambulator
幼子 - osanago ~ infant; baby; little child
小象 - kozou ~ young (baby) elephant
黒金剛石 - kurokongouseki ~ black diamond; carbon; carbonado
鶏群一鶴 - keigunikkaku ~ a swan among ducklings; a diamond among stones; a great figure among the common run of men
金剛 - kongou ~ 1. vajra (indestructible substance); diamond; adamantine 2. thunderbolt; Indra's weapon; Buddhist symbol of the indestructible truth
悪魔 - akuma ~ devil; demon; fiend; Satan; evil spirit
翠玉 - suigyoku ~ emerald; jade
川蝉 - kawasemi ~ 1. kingfisher 2. jade (gem) 3. beautiful lustrous colour similar to that of the kingfisher's feathers
渦中 - kachuu ~ vortex; maelstrom; whirlpool; convulsions; upheaval
白玉 - shiratama ~ 1. shiratama camellia (Camellia japonica) 2. white gem; rice flour dumpling
珠玉 - shugyoku ~ gem; jewel
逸品 - ippin ~ article of rare beauty; gem
宝石 - houseki ~ gem; jewel
妖怪 - yokai ~ mysterious phenomena
猫また/猫股/猫又 - nekomata ~ a cat-like yokai
鎌鼬 - kamaitachi ~ a type of Japanese folkloric monster (yokai), thought to be a trio of weasels who appear in a whirlwind to cut their victim
狐 - kitsune ~ fox

怪獣 - Kaiju – 怪 (kai; mysterious) + 獣 (ju; beast), meaning "monster." Most of Japan's famous yokai are kaiju. Godzilla is a dai-kaiju, or "great monster."
超自然 - Choshizen - 超 (cho; super) + 自然 (shizen; natural), meaning the supernatural, including mysterious natural phenomena.
変化 - Henge - 変 (hen; strange) + 化 (ge; to change, transform), meaning shape-shifters like tanuki, foxes, and old cats.
幽霊 - Yurei - 幽 (yu; dim) + 霊 (rei; spirit), meaning ghosts, and spirits of the dead.

Nekomata - The Supernatural Cat
... the picture you present of a dancing white cat dressed in a red robe comes from Kawanabe Kyōsai 河鍋暁斎 (1831-1889),
a spoof of a 12th-century hand scroll known as Frolicking Animals & Humans (Chōjū Jinbutsu Giga 鳥獣人物戯画). The latter
work is a national treasure at Kōzan-ji Temple 高山寺 (Kyoto). In any case, if you look at the entire painting, you could
also "justifiable" call the creature a white fox (not a white cat), for beside it is a tanuki — and the tanuki
is very closely related to the fox. The color white is also important, for white foxes are considered especially powerful
and serve as the mount or animal companion of Inari (kami of rice) as well as the mount of Dakiniten and various Tengu.
At the end of the day, however, we can only speculate, for Cat/Fox/Tanuki lore is a very complex topic.
/Mark Schumacher's comment on Zack Davisson's blog/

Kitsune by Ahyicodae on deviantART

Thought I would try an anthro piece.
This is a character whose name is writ in the heavens as Celestial Kitsune, a black fox (actually a half-fox,
she has a human father) who attained divinity through her wisdom and compassion. Like all foxes, she is a trickster.
While she is very young by fox spirit standards, she already has enough of a reputation to be known to Ninetails (who,
not being disposed herself to the human trait of "virtue," looks down her nose at the halfbreed whose human side so strongly
directs her nature). Recently she tricked the dragon god out of an ancient map, and she is in quite a lot of trouble for it.
Though I suspect, wily creature that she is, that she'll be fine (so long as the dragon never catches her).
Like all kitsune she has powers of transformation. She is about to put on her mask and change to a human.
/Ahyicodae/

Update, 2014-Jun-13:

While playing with 'Kinezumi' (a refined variant of 'Kaidanji') one fully branchless etude emerged - Nakamichi 'Aratama'.
Love it! 'Aratama' is really an uncut gem, rough beauty indeed!
My knowledge of the internals is limited; nevertheless, I expect 'Aratama', when multi-threaded, to overshadow ... 'memcpy'.
One important tweak of m^2's is still to be made: utilizing the MatchLength bits to their fullest, 1..8 instead of 1..7 (to-do).
// For enwik8 on Q9550s:
// | Nakamichi 'Kaidanji'     (64KB) |  063,430,147 / 0283161 / 1014 MB/s |
// | Nakamichi 'Aratama' ML=8 (64KB) |  066,251,713 / 4170885 /  866 MB/s |
// | Nakamichi 'Aratama' ML=7 (64KB) |  061,499,908 / 2800002 /  762 MB/s |
// | Nakamichi 'Aratama' ML=5 (64KB) |  056,528,218 / 0717352 /  607 MB/s |
// | Nakamichi 'Hanazakari'   (64KB) |  054,693,537 / 0031544 /  756 MB/s |
/*
; 'Aratama' decompression loop, a0-40+2=98 bytes long:
; mark_description "Intel(R) C++ Intel(R) 64 Compiler XE for applications running on Intel(R) 64, Version 12.1.1.258 Build 20111";
; mark_description "-O3 -D_N_GP -FAcs";

.B6.3::                         
  00040 45 33 e4         xor r12d, r12d                         

;;; 			*(uint64_t*)(ret+retIndex+8*(0)) = *(uint64_t*)( (uint64_t)(src+srcIndex+1+8*(0))*(unsigned int)(Flag) + (uint64_t)(ret+retIndex-WORDpair)*(unsigned int)(!Flag) );

  00043 41 89 c5         mov r13d, eax                          
  00046 33 db            xor ebx, ebx                           
  00048 4c 03 e9         add r13, rcx                           
  0004b 45 0f b7 1c 12   movzx r11d, WORD PTR [r10+rdx]         
  00050 41 f7 c3 f8 00 
        00 00            test r11d, 248                         
  00057 44 89 de         mov esi, r11d                          
  0005a 4a 8d 7c 12 01   lea rdi, QWORD PTR [1+rdx+r10]         
  0005f 45 0f 44 e1      cmove r12d, r9d                        
  00063 48 f7 de         neg rsi                                
  00066 49 03 f5         add rsi, r13                           
  00069 45 85 e4         test r12d, r12d                        

;;; 			srcIndex+= (((WORDpair & 0xFF)>>0)+1)*(unsigned int)(Flag) + (2)*(unsigned int)(!Flag) ;

  0006c 45 0f b6 db      movzx r11d, r11b                       
  00070 41 0f 44 d9      cmove ebx, r9d                         
  00074 49 0f af fc      imul rdi, r12                          
  00078 48 0f af f3      imul rsi, rbx                          
  0007c 48 8b 34 3e      mov rsi, QWORD PTR [rsi+rdi]           
  00080 41 8d 7b 01      lea edi, DWORD PTR [1+r11]             
  00084 41 0f af fc      imul edi, r12d                         

;;; 			retIndex+= ((WORDpair & 0xFF)>>0)*(unsigned int)(Flag) + (Min_Match_Length)*(unsigned int)(!Flag) ;

  00088 45 0f af dc      imul r11d, r12d                        
  0008c 44 03 d7         add r10d, edi                          
  0008f 41 03 c3         add eax, r11d                          
  00092 49 89 75 00      mov QWORD PTR [r13], rsi               
  00096 45 8d 14 5a      lea r10d, DWORD PTR [r10+rbx*2]        
  0009a 45 3b d0         cmp r10d, r8d                          
  0009d 8d 04 d8         lea eax, DWORD PTR [rax+rbx*8]         
  000a0 72 9e            jb .B6.3 
*/
The package (29KB) containing the sources:
http://www.sanmayce.com/Downloads/Nakamichi_Aratama.zip
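The three source lines quoted inside the 'Aratama' listing above can be put together into a small stand-alone loop. This is only a sketch under assumptions: the function name is hypothetical, the Flag test (low byte masked with 0xF8) and Min_Match_Length = 8 are taken from the ML=8 listing, and the token is read byte by byte as little-endian. Whether a given compiler keeps it branchless is not guaranteed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MIN_MATCH_LENGTH 8u /* assumption: the ML=8 setting from the listing */

/* Decodes srcSize bytes of tokens into ret and returns the number of bytes produced.
   Both buffers need at least 8 bytes of slack, since every step copies a full QWORD. */
size_t aratama_sketch_decode(const uint8_t *src, size_t srcSize, uint8_t *ret)
{
    size_t srcIndex = 0, retIndex = 0;
    assert(src != NULL && ret != NULL);
    while (srcIndex < srcSize) {
        /* 16-bit little-endian token: a literal header or a match offset. */
        uint32_t WORDpair = (uint32_t)src[srcIndex] | ((uint32_t)src[srcIndex + 1] << 8);
        uintptr_t Flag = (WORDpair & 0xF8u) == 0;   /* 1 = literal run, 0 = match */
        uintptr_t NotFlag = Flag ^ 1u;
        /* Select the copy source by multiplication instead of a branch. */
        uintptr_t from = ((uintptr_t)src + srcIndex + 1) * Flag
                       + ((uintptr_t)ret + retIndex - WORDpair) * NotFlag;
        uint64_t qword;
        memcpy(&qword, (const void *)from, sizeof qword);
        memcpy(ret + retIndex, &qword, sizeof qword);
        srcIndex += (size_t)(((WORDpair & 0xFFu) + 1u) * Flag + 2u * NotFlag);
        retIndex += (size_t)((WORDpair & 0xFFu) * Flag + MIN_MATCH_LENGTH * NotFlag);
    }
    return retIndex;
}
```

For example, the 12-byte stream 07 'abcdefg' 01 'h' 08 00 (two literal runs, then a match at offset 8) decodes to the 16 bytes "abcdefghabcdefgh". Note that in this layout a match offset can never have bits 3..7 of its low byte all zero, so every offset is at least 8 and the 8-byte copy never overlaps its own output.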

Update, 2014-Jun-11:

Finally I decided to dump here a table showing some important stats about the 4 major, most useful variants.
As the main reference, 7zip's Ultra Deflate64 is given in bold. TANGELO, LZ4 and Yappy are included as well.
-------------------------------------------------------------------------------------------------------------------------------------------------
| compressor \ filedataset      | alice29.txt    | CalgaryCorpus.tar | shaks12.txt      | dickens            | enwik8                           |
-------------------------------------------------------------------------------------------------------------------------------------------------
| UNCOMPRESSED                  | 152,089        | 3,153,408         | 5,582,655        | 10,192,446         | 100,000,000                      |
| Nakamichi 'Kaidanji'   (64KB) | 092,285 / 0328 | 1,862,449 / 11838 | 3,391,657 / 6799 | 06,387,079 / 14977 | 063,430,147 / 283161 / 1014 MB/s |
| Nakamichi 'Hanazakari' (64KB) | 080,270 / 0218 | 1,620,653 / 05396 | 2,737,003 / 0452 | 05,068,626 / 00630 | 054,693,537 / 031544 /  756 MB/s |
| Nakamichi 'Sanbashi'    (2MB) | 095,682 / 0267 | 1,560,737 / 06841 | 2,559,110 / 0519 | 04,614,250 / 00558 | 046,881,842 / 026545 /  607 MB/s |
| Nakamichi 'Kaiko'      (16MB) | 100,756 / 1371 | 1,834,530 / 22877 | 2,704,445 / 4412 | 04,694,952 / 04356 | 047,016,954 / 097458 /  289 MB/s |
| 7z's gz, Ultra Deflate32      | 051,707        | 0,980,026         | 1,934,787        | 03,681,828         | 035,102,891                      |
| 7z's zip, Ultra Deflate64     | 050,051        | 0,945,849         | 1,834,240        | 03,508,645         | 033,757,921                      |
| TANGELO 2.3                   | 039,160        | 0,710,066         | 1,236,021        | 02,279,659         | 020,921,619                      |
| LZ4 v1.4, -9                  | 063,705        | 1,195,853         | 2,315,036        | 04,442,992         | 042,283,904        / 2186.9 MB/s |
| Yappy, 8192 10000             | 087,965        | 1,654,203         | 3,337,964        | 06,374,780         | 057,701,807        /  698.7 MB/s |
| Yappy, 65536 10000            | 081,217        | 1,544,271         | 3,120,688        | 05,912,295         | 054,162,908        /  679.4 MB/s |
| Yappy, 1048576 10000          | 080,353        | 1,530,823         | 3,091,493        | 05,850,648         | 053,687,370        /  679.4 MB/s |
-------------------------------------------------------------------------------------------------------------------------------------------------
The package (381KB) containing sources of the four NakamichiS/NakamichiTA:
http://www.sanmayce.com/Downloads/Nakamichi_Kaidanji_Hanazakari_Sanbashi_Kaiko.zip

Note1: The second number is NumberOfFullLiterals (lower-the-better), the third - decompression speed.
Note2: Speeds obtained on laptop Core 2 Q9550s 2.83GHz (4 cores/threads) running Windows 7 64bit reaching 440 MegaMokujINs.
Note3: Speed of 'memcpy' was 2769 MB/s.
Note4: For comparison, my 'Bonboniera' laptop Core 2 T7500 2.20GHz (2 cores/threads) reaches 171 MegaMokujINs.
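A rough sketch of how a memcpy baseline and the "vs memcpy" percentage could be computed. It is not the /memtest code from the package: the function names are hypothetical, clock() is assumed as the timer (it reports CPU time on some platforms), and the block size and repeat count are up to the caller.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Integer percentage of a decompression speed relative to the memcpy speed. */
unsigned speed_ratio_percent(unsigned decode_mb_s, unsigned memcpy_mb_s)
{
    assert(memcpy_mb_s != 0);
    return decode_mb_s * 100u / memcpy_mb_s;
}

/* Copies one block `repeats` times and returns MB/s (0.0 on bad input or failed allocation). */
double memcpy_mb_per_second(size_t block_bytes, unsigned repeats)
{
    unsigned char *a, *b;
    volatile unsigned char sink = 0; /* light guard against the copy being optimized away */
    clock_t start, ticks;
    unsigned i;

    if (block_bytes == 0 || repeats == 0) return 0.0;
    a = malloc(block_bytes);
    b = malloc(block_bytes);
    if (a == NULL || b == NULL) { free(a); free(b); return 0.0; }
    memset(a, 1, block_bytes);
    memset(b, 0, block_bytes);

    start = clock();
    for (i = 0; i < repeats; i++) {
        a[0] = (unsigned char)i;
        memcpy(b, a, block_bytes);
        sink ^= b[block_bytes - 1];
    }
    ticks = clock() - start;
    if (ticks <= 0) ticks = 1; /* avoid dividing by zero on coarse timers */

    free(a);
    free(b);
    (void)sink;
    return ((double)block_bytes * repeats / (1024.0 * 1024.0))
         / ((double)ticks / CLOCKS_PER_SEC);
}
```

With the figures on this page, speed_ratio_percent(507, 2769) gives 18, the ratio the 'Sanbashi' XMM memtest run prints for enwik8.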

'Hanazakari', when multi-threaded, has the potential (it uses a 64KB window) to outspeed memcpy's 2769 MB/s.
I don't know how delusional this expectation is; simply, 4x756 MB/s = 3024 MB/s > 2769 MB/s.

Cyan's LZ4 is really awesome; it utilizes the resources well, up to 387%. 'Sanbashi' follows with its one thread:
C:\Nakamichi_brutal_tests_Sanbashi>timer32.exe lz4 -9 -b enwik8
Nb of threads = 4 ; Compression Level = 9
enwik8          : 100000000 ->  42283793 ( 42.28%),   47.8 MB/s , 2186.9 MB/s

Kernel  Time =     0.202 =    1%
User    Time =    57.954 =  386%
Process Time =    58.157 =  387%    Virtual  Memory =   1797 MB
Global  Time =    14.991 =  100%    Physical Memory =    140 MB

C:\Nakamichi_brutal_tests_Sanbashi>Nakamichi_Sanbashi_XMM_64bit.exe enwik8.Nakamichi /memtest
Nakamichi 'Sanbashi', written by Kaze, based on Nobuo Ito's LZSS source, babealicious suggestion by m^2 enforced.
Decompressing 46881842 bytes ...
RAM-to-RAM performance: 507 MB/s.
Memory pool starting address: 0000000003160080 ... 64 byte aligned, OK
Copying a 256MB block 1024 times i.e. 256GB READ + 256GB WRITTEN ...
memcpy(): (256MB block); 262144MB copied in 94661 clocks or 2.769MB per clock
RAM-to-RAM performance vs memcpy() ratio (bigger-the-better): 18%

C:\Nakamichi_brutal_tests_Sanbashi>Nakamichi_Sanbashi_GP_64bit.exe enwik8.Nakamichi
Nakamichi 'Sanbashi', written by Kaze, based on Nobuo Ito's LZSS source, babealicious suggestion by m^2 enforced.
Decompressing 46881842 bytes ...
RAM-to-RAM performance: 607 MB/s.

C:\Nakamichi_brutal_tests_Sanbashi>Nakamichi_Sanbashi_XMM_64bit.exe enwik9.Nakamichi
Nakamichi 'Sanbashi', written by Kaze, based on Nobuo Ito's LZSS source, babealicious suggestion by m^2 enforced.
Decompressing 427440514 bytes ...
RAM-to-RAM performance: 531 MB/s.

C:\Nakamichi_brutal_tests_Sanbashi>Nakamichi_Sanbashi_GP_64bit.exe enwik9.Nakamichi
Nakamichi 'Sanbashi', written by Kaze, based on Nobuo Ito's LZSS source, babealicious suggestion by m^2 enforced.
Decompressing 427440514 bytes ...
RAM-to-RAM performance: 605 MB/s.

C:\Nakamichi_brutal_tests_Sanbashi>Nakamichi_Sanbashi_XMM_64bit.exe Kazahana_on.PAGODA-order-5.txt.Nakamichi
Nakamichi 'Sanbashi', written by Kaze, based on Nobuo Ito's LZSS source, babealicious suggestion by m^2 enforced.
Decompressing 147898802 bytes ...
RAM-to-RAM performance: 890 MB/s.

C:\Nakamichi_brutal_tests_Sanbashi>Nakamichi_Sanbashi_GP_64bit.exe Kazahana_on.PAGODA-order-5.txt.Nakamichi
Nakamichi 'Sanbashi', written by Kaze, based on Nobuo Ito's LZSS source, babealicious suggestion by m^2 enforced.
Decompressing 147898802 bytes ...
RAM-to-RAM performance: 974 MB/s.

C:\Nakamichi_brutal_tests_Sanbashi>Nakamichi_Sanbashi_XMM_64bit.exe OSHO.TXT.Nakamichi
Nakamichi 'Sanbashi', written by Kaze, based on Nobuo Ito's LZSS source, babealicious suggestion by m^2 enforced.
Decompressing 73837310 bytes ...
RAM-to-RAM performance: 573 MB/s.

C:\Nakamichi_brutal_tests_Sanbashi>Nakamichi_Sanbashi_GP_64bit.exe OSHO.TXT.Nakamichi
Nakamichi 'Sanbashi', written by Kaze, based on Nobuo Ito's LZSS source, babealicious suggestion by m^2 enforced.
Decompressing 73837310 bytes ...
RAM-to-RAM performance: 630 MB/s.

C:\Nakamichi_brutal_tests_Sanbashi>Nakamichi_Sanbashi_XMM_64bit.exe silesia.tar.Nakamichi
Nakamichi 'Sanbashi', written by Kaze, based on Nobuo Ito's LZSS source, babealicious suggestion by m^2 enforced.
Decompressing 93058243 bytes ...
RAM-to-RAM performance: 716 MB/s.

C:\Nakamichi_brutal_tests_Sanbashi>Nakamichi_Sanbashi_GP_64bit.exe silesia.tar.Nakamichi
Nakamichi 'Sanbashi', written by Kaze, based on Nobuo Ito's LZSS source, babealicious suggestion by m^2 enforced.
Decompressing 93058243 bytes ...
RAM-to-RAM performance: 808 MB/s.
The actual 'MokujINiada' 4 cores/threads benchmark:
C:\Nakamichi_brutal_tests_Sanbashi>MokujIN_test.bat

C:\Nakamichi_brutal_tests_Sanbashi>timer32.exe "MokujIN_r5+_16-Threads_IntelV12_64bit_O3" 524288 524288 /stats
MokujIN, Multiplication of INtegers, an OpenMP (multi-threaded) string multiplier, 16 threads enforced, written by Kaze, 2012-Nov-16, revision 5fix+.
omp_get_num_procs( ) = 4
omp_get_max_threads( ) = 4
Multiplying performance for operands 00,000,006 digits long (footprint: ~000,000KB, checksum: 6beb,c722): 36 MokujINs i.e. digits per second.
Multiplying performance for operands 00,000,012 digits long (footprint: ~000,000KB, checksum: 9e28,8598): 144 MokujINs i.e. digits per second.
Multiplying performance for operands 00,000,023 digits long (footprint: ~000,000KB, checksum: df58,ed82): 529 MokujINs i.e. digits per second.
Multiplying performance for operands 00,000,046 digits long (footprint: ~000,001KB, checksum: fbbf,9e27): 2,116 MokujINs i.e. digits per second.
Multiplying performance for operands 00,000,092 digits long (footprint: ~000,002KB, checksum: 5280,1546): 8,464 MokujINs i.e. digits per second.
Multiplying performance for operands 00,000,184 digits long (footprint: ~000,005KB, checksum: 3061,d39b): 33,856 MokujINs i.e. digits per second.
Multiplying performance for operands 00,000,367 digits long (footprint: ~000,011KB, checksum: 99a2,9b3c): 134,689 MokujINs i.e. digits per second.
Multiplying performance for operands 00,000,733 digits long (footprint: ~000,022KB, checksum: 71bd,9970): 537,289 MokujINs i.e. digits per second.
Multiplying performance for operands 00,001,465 digits long (footprint: ~000,045KB, checksum: c536,0c57): 2,146,225 MokujINs i.e. digits per second.
Multiplying performance for operands 00,002,929 digits long (footprint: ~000,091KB, checksum: f1f7,a243): 8,579,041 MokujINs i.e. digits per second.
Multiplying performance for operands 00,005,857 digits long (footprint: ~000,183KB, checksum: 5a3a,564c): 34,304,449 MokujINs i.e. digits per second.
Multiplying performance for operands 00,011,714 digits long (footprint: ~000,366KB, checksum: 464c,a182): 137,217,796 MokujINs i.e. digits per second.
Multiplying performance for operands 00,023,428 digits long (footprint: ~000,732KB, checksum: f9a9,2f30): 548,871,184 MokujINs i.e. digits per second.
Multiplying performance for operands 00,046,855 digits long (footprint: ~001,464KB, checksum: 86ba,47be): 439,078,205 MokujINs i.e. digits per second.
Multiplying performance for operands 00,093,710 digits long (footprint: ~002,928KB, checksum: b194,1027): 439,078,205 MokujINs i.e. digits per second.
Multiplying performance for operands 00,187,419 digits long (footprint: ~005,856KB, checksum: 97dc,040c): 439,073,519 MokujINs i.e. digits per second.
Multiplying performance for operands 00,374,838 digits long (footprint: ~011,713KB, checksum: 13e5,96d7): 440,449,925 MokujINs i.e. digits per second.
Multiplying performance for operands 00,749,676 digits long (footprint: ~023,427KB, checksum: 8ccc,73c7): 440,449,925 MokujINs i.e. digits per second.
Multiplying performance for operands 01,499,351 digits long (footprint: ~046,854KB, checksum: 8db2,fd7c): 440,363,059 MokujINs i.e. digits per second.
Dumping the result to 'MokujIN.txt' ... OK
Total Time: 6,807 second(s).

Kernel  Time =     0.280 =    0%
User    Time = 27091.664 =  398%
Process Time = 27091.944 =  398%    Virtual  Memory =    475 MB
Global  Time =  6806.463 =  100%    Physical Memory =     55 MB

C:\Nakamichi_brutal_tests_Sanbashi>
Where the 'Silkcaterpillar', 'Nostalgia' and 'Abyss' meet.

It's time for the latest variant, an extended 'Kaidanji': the 16bit window has become 24bit.
Its name is 蚕/回顧/懐古 'Kaiko' a.k.a. 'Silkcaterpillar/Recollection/Nostalgia'.
The natural follow-up will be 深潭 'Shintan' a.k.a. 'Abyss'; its first bit will be used as a 16/8bit Match-Length selector.
The reason to [en]cluster these beauties is the process of deep (16MB backward) revocation - abysmally reinvoking for today's caches.

The Silkworm (Bombyx mori) is the caterpillar of a moth whose cocoon is used to make silk; it is not a worm at all.
This insect is also called the silkworm-moth and the mulberry silkworm. It is native to Northern China.
It has 3 pairs of thorax/thoracic legs, 4 pairs of abdomen/abdominal prolegs and 1 pair of anal prolegs, or 3x2+4x2+1x2=16.
Guess what, the future Nakamichi will enforce no more no less than 16 threads.

Class: Insecta (insects)
Order: Lepidoptera (butterflies and moths)
Suborder: Ditrysia (Moths, Butterflies, Skippers)
Superfamily: Bombycoidea
Family: Bombycidae
Genus: Bombyx
Species: Bombyx mori

Some facts:
The silk from the silkworm's cocoon is a single, continuous thread.
It is made of a protein that is secreted from two salivary glands in the caterpillar's head.
The Chinese have harvested silk from silkworm cocoons for thousands of years.
To harvest silk, the silkworm is allowed to spin its cocoon and it is then put in boiling water to kill the pupa and help unravel the thread.
Each cocoon contains a single silk thread that is about 300 to 900 meters long.
It takes 5000 silkworms to make a single kimono, in fact one kimono is worth, as a basis, 5000 deaths, enjoy.

"Remotely-Operated-Vehicle KAIKO reached the deepest area of Mariana trench and made the deepest diving record of 10,911m on March 24, 1995."
/Development and Construction of Launcher System of 10000m-Class Remotely-Operated-Vehicle 'KAIKO' 'Mitsubishi Heavy Industry'/

The Disturbing Ways We Extract Silk From Silkworms

The latest research on silkworms is wonderful news on the fashion front,
opening up the possibility for new textiles and more efficient manufacturing methods.
But for the silkworms, it also sounds kind of creepy in a science fiction nightmare way.

First, a little background. Humans have had a long and rewarding relationship with the
Chinese mulberry silkworm Bombyx mori, ever since someone figured out 6,000 years ago
how to unravel the threads from the caterpillars' cocoons and weave them into gorgeous
textiles for China's emperors. The Chinese managed to keep the process secret for centuries,
until it became the object of history's first known instance of commercial espionage.
In the sixth century A.D., the European emperor Justinian, according to legend,
dispatched two monks to China. They returned with both the silkworm eggs and seeds
for the mulberry trees on which they feed, smuggled home inside bamboo walking sticks.

The result today is a global industry. China is once again leading the world,
producing 58,000 tons of silk annually. (And that is a lot of caterpillars.)
The United States also had a thriving silk industry, until the introduction of nylon
in World War II. My family was among the many that benefited from it: My grandfather
was a warper at silk mills in Manchester, Connecticut, and Paterson, N.J.
So without silkworm wages — not to put too fine a point on it — there would be no me.

Now back to the new research, just out in the journal Biomacromolecules. British researchers
have devised a means for continuously milking silk from living silkworms. This is a big deal
because of an unfortunate fact behind the loveliness of silk: Up until now, the only way
to unravel silk was to boil the cocoons, killing the silkworms inside. Mohandas Gandhi
criticized the process, as have modern animal rights activists. But when researchers tried
to extract the silk more directly, the caterpillars resisted, clamping onto the line and
snapping it. The record for reeling a strand of silk out of a living caterpillar was just six meters.

The researchers in the new study noticed that silkworms employ a "play dead" behavior,
lingering in a state of self-induced paralysis when injured. Otherwise, the caterpillar's way
of moving — picture the classic inchworm — would cause hydrostatic pressure and further tear any wound.
Alex Wood, a physician and entomologist, identified the chemical the caterpillars rely on.
By injecting it into caterpillars, researchers at Oxford University were able to induce a state
of semi-paralysis.

The resulting production technique could make buying silk acceptable again, for people who have balked
because of that business about boiling. Even so, the new technique doesn't make a pretty picture:
The caterpillar is attached to a stick and suspended in mid-air, like one of the unconscious
organ donors in the science fiction movie Coma. One end of the silk that the worm is producing becomes
attached to a reel, which slowly winds it up, keeping time with the caterpillar.
In its semi-conscious state, the caterpillar may realize what's happening. And it may not like it.
But paralysis means that it is too weak to snap the wonderfully strong silk that it is producing.
So far, the researchers have been able to extract silk from a single caterpillar for up to six hours
and a record of 500 meters.

The press release for the new study touts the discovery as the key to manufacturing a variety
of new silk products, including medical implants (as in Coma). It also highlights "the exciting
potential for genetically modifying silkworms to induce paralysis 'on-demand,' a particularly useful
feature for mass-rearing."

But to be honest, I think I'm going to have bad dreams tonight. You know the kind — lying there paralyzed
while strangers move around in the shadows.

I suppose, though, that it is at least a step up from being boiled alive.

/Richard Conniff, September 24, 2013/

Update, 2014-Jun-03:

Two things.

First.
A variant dedicated entirely to ZMM enters: both literals and matches are handled by 64-byte-long registers.
This (32/512)bit etude is called 'Inazuma' a.k.a. 'Lightning' and is intended to outspeed 'Bhagavati' for my PAGODA files.
'Inazuma' is a derivative of 'Sanbashi' with the following changes:
- Sliding window sizes are (8+8-3)/(8+16-3)/(8+24-3) or 13bit/21bit/29bit or 8KB/2MB/512MB;
- The literal MAX length should be increased from 31 to 63, but for now it stays 31;
- The additional bit is used as a Match_Length selector, 8/55.
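The list above can be sketched as a token parser. This is a hedged reading of the bullets and of the 'Inazuma' listing on this page, not the actual source: the struct, the function names and the exact bit positions (2-bit tag in bits 0..1, selector in bit 2, offset above it) are assumptions.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    unsigned is_literal;  /* 1 for a literal run, 0 for a match */
    unsigned token_bytes; /* header bytes consumed: 1 for literals, 2..4 for matches */
    uint32_t offset;      /* backward distance (matches only) */
    unsigned length;      /* literal count 0..31, or match length 8 or 55 */
} InazumaToken;

/* Window width in bits for match tags 1..3: (8+8-3), (8+16-3), (8+24-3). */
unsigned inazuma_window_bits(unsigned tag)
{
    assert(tag >= 1 && tag <= 3);
    return 8u + 8u * tag - 3u;
}

/* Splits the DWORD read at the current source position into its fields. */
InazumaToken inazuma_parse(uint32_t dword)
{
    InazumaToken t;
    unsigned tag = dword & 3u;
    if (tag == 0) {
        t.is_literal = 1;
        t.token_bytes = 1;
        t.offset = 0;
        t.length = (dword & 0xFFu) >> 3; /* 5 bits: up to 31 literals */
    } else {
        /* Keep only the tag+1 bytes that belong to this token. */
        uint32_t kept = dword & (0xFFFFFFFFu >> (24u - 8u * tag));
        t.is_literal = 0;
        t.token_bytes = tag + 1u;
        t.offset = kept >> 3;
        t.length = 8u + 47u * ((kept >> 2) & 1u); /* selector bit: 8 or 55 */
    }
    return t;
}
```

Shifting 1 left by inazuma_window_bits(1), (2) and (3) gives 8192, 2097152 and 536870912 bytes, i.e. the 8KB/2MB/512MB windows from the list.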

Kara Swanson took this striking shot:

; 'Inazuma' decompression loop, 9c-1e+2=128 bytes long:
; mark_description "Intel(R) C++ Compiler XE for applications running on IA-32, Version 14.0.1.139 Build 20131008";
; mark_description "-O3 -QxCORE-AVX2 -D_N_ZMM -FAcs";

.B8.3:                          
  0001e 8b 4c 24 14      mov ecx, DWORD PTR [20+esp]            
  00022 8b 0c 0a         mov ecx, DWORD PTR [edx+ecx]           
  00025 8b d9            mov ebx, ecx                           
  00027 83 e3 03         and ebx, 3                             
  0002a 75 28            jne .B8.5 
.B8.4:                          
  0002c 8b 5c 24 14      mov ebx, DWORD PTR [20+esp]            
  00030 0f b6 c9         movzx ecx, cl                          
  00033 c1 e9 03         shr ecx, 3                             
  00036 62 f1 7c 48 10 
        84 1a 01 00 00 
        00               vmovups zmm0, ZMMWORD PTR [1+edx+ebx]  
  00041 8b 74 24 10      mov esi, DWORD PTR [16+esp]            
  00045 8d 54 0a 01      lea edx, DWORD PTR [1+edx+ecx]         
  00049 62 f1 7c 48 11 
        04 30            vmovups ZMMWORD PTR [eax+esi], zmm0    
  00050 03 c1            add eax, ecx                           
  00052 eb 44            jmp .B8.6 
.B8.5:                          
  00054 bf ff ff ff ff   mov edi, -1                            
  00059 8d 34 dd 00 00 
        00 00            lea esi, DWORD PTR [ebx*8]             
  00060 f7 de            neg esi                                
  00062 8d 54 1a 01      lea edx, DWORD PTR [1+edx+ebx]         
  00066 83 c6 18         add esi, 24                            
  00069 c4 e2 4b f7 f7   shrx esi, edi, esi                     
  0006e 23 ce            and ecx, esi                           
  00070 8b 5c 24 10      mov ebx, DWORD PTR [16+esp]            
  00074 8b f1            mov esi, ecx                           
  00076 c1 ee 03         shr esi, 3                             
  00079 83 e1 04         and ecx, 4                             
  0007c f7 de            neg esi                                
  0007e c1 e9 02         shr ecx, 2                             
  00081 03 d8            add ebx, eax                           
  00083 03 f3            add esi, ebx                           
  00085 6b c9 2f         imul ecx, ecx, 47                      
  00088 62 f1 7c 48 10 
        06               vmovups zmm0, ZMMWORD PTR [esi]        
  0008e 62 f1 7c 48 11 
        03               vmovups ZMMWORD PTR [ebx], zmm0        
  00094 8d 44 08 08      lea eax, DWORD PTR [8+eax+ecx]         
.B8.6:                          
  00098 3b 54 24 18      cmp edx, DWORD PTR [24+esp]            
  0009c 72 80            jb .B8.3 
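Read back into C, the loop above does roughly the following. This is a sketch derived from the listing, not the actual source: the name `inazuma_decode_sketch` is hypothetical, exact lengths are copied with memcpy/memmove instead of whole 64-byte registers, and the bounds checks are additions the listing does not have. As read from the listing, the low 2 bits of the first byte are the tag (0 = literal run, 1..3 = match token of tag+1 bytes), the literal length sits in the top 5 bits, bit 2 selects Match_Length 8/55 and the offset is the token value shifted right by 3.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: returns bytes written, or SIZE_MAX on malformed input. */
size_t inazuma_decode_sketch(const uint8_t *src, size_t src_size,
                             uint8_t *dst, size_t dst_cap)
{
    size_t i = 0, o = 0;
    assert(src != NULL && dst != NULL);
    while (i < src_size) {
        unsigned tag = src[i] & 3u;
        if (tag == 0) {
            /* Literal run: length 0..31 in the top 5 bits of the first byte. */
            size_t len = (size_t)(src[i] >> 3);
            if (i + 1 + len > src_size || o + len > dst_cap) return SIZE_MAX;
            memcpy(dst + o, src + i + 1, len);
            i += 1 + len;
            o += len;
        } else {
            /* Match token: tag+1 bytes, little-endian. */
            size_t token_bytes = (size_t)tag + 1, k, offset, len;
            uint32_t v = 0;
            if (i + token_bytes > src_size) return SIZE_MAX;
            for (k = 0; k < token_bytes; k++)
                v |= (uint32_t)src[i + k] << (8 * k);
            offset = (size_t)(v >> 3);      /* 13/21/29-bit offset */
            len = (v & 4u) ? 55 : 8;        /* Match_Length selector */
            if (offset > o || o + len > dst_cap) return SIZE_MAX;
            /* memmove mirrors the load-then-store order of the listing. */
            memmove(dst + o, dst + o - offset, len);
            i += token_bytes;
            o += len;
        }
    }
    return o;
}
```

For instance, the 11-byte stream 0x40, "ABCDEFGH", 0x41, 0x00 is one 8-byte literal run followed by a 2-byte match token with offset 8 and Match_Length 8.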
And quick results for 'Inazuma' (on files that are not redundant enough):
D:\_KAZE\Nakamichi_Kaidanji_benchmark\Nakamichi_benchmark\Nakamichi_Inazuma>dir

06/03/2014  09:08 AM           152,089 alice29.txt
06/03/2014  09:08 AM           110,525 alice29.txt.Nakamichi        !NumberOfFullLiterals (lower-the-better): 268!
05/16/2014  07:22 AM         3,153,408 CalgaryCorpus.tar
06/03/2014  02:25 PM         1,888,341 CalgaryCorpus.tar.Nakamichi  !NumberOfFullLiterals (lower-the-better): 6810!
06/03/2014  02:02 PM        10,192,446 dickens
06/03/2014  10:08 AM         5,765,200 dickens.Nakamichi            !NumberOfFullLiterals (lower-the-better): 510!

D:\_KAZE\Nakamichi_Kaidanji_benchmark\Nakamichi_benchmark\Nakamichi_Inazuma>
The archive (213KB) containing the source and executable:
http://www.sanmayce.com/Nakamichi/Nakamichi_Inazuma.zip

Second.
Some stats on an upcoming compression scheme based on Leprechaun_BBhex - my superfast ripper of fixed-length unique chunks.
Back in 1989, Prof. Haruhiko Okumura used 256 binary trees to speed up match finding in his LZSS tool.
Many thanks go to him; extending/modifying his approach, I chose to use 16,777,216 B-trees of order 3.
In fact, it is a 128MB 1-way hash where each slot is 8 bytes long and houses the root of a tree.

The archive (615KB) containing the sources and executables:
http://www.sanmayce.com/Nakamichi/Get_all_Building-Blocks_008-016-032-064-bytes_long_from_ANY_file.zip
Below, the evaluation of 'enwik8' is given:
D:\_KAZE\Get_all_Building-Blocks_008bytes_long_from_ANY_file>type Dump_BuildingBlocks_Order_008-016-032-064_footprint-02GB.bat
@echo off
if '%1'=='' goto usage
dir %1/b> %1.lst
Leprechaun_BB008hex_32p_Intel_32bit.exe %1.lst %1_All_Unique_008bytes_long_blocks_in_HEX.txt 1634567 y
Leprechaun_BB016hex_32p_Intel_32bit.exe %1.lst %1_All_Unique_016bytes_long_blocks_in_HEX.txt 1634567 y
Leprechaun_BB032hex_32p_Intel_32bit.exe %1.lst %1_All_Unique_032bytes_long_blocks_in_HEX.txt 1634567 y
Leprechaun_BB064hex_32p_Intel_32bit.exe %1.lst %1_All_Unique_064bytes_long_blocks_in_HEX.txt 1634567 y
goto terminate
:usage
echo Usage: Dump_BuildingBlocks_Order_008-016-032-064_footprint-02GB.bat filename
echo.
:terminate

D:\_KAZE\Get_all_Building-Blocks_008bytes_long_from_ANY_file>Dump_BuildingBlocks_Order_008-016-032-064_footprint-02GB.bat enwik8
...
Results for Building-Blocks order 08:
Total memory needed for one pass: 957,260KB
Total distinct phrases: 18,772,793
Total time: 143 second(s)

Results for Building-Blocks order 16:
Total memory needed for one pass: 5,339,645KB
Total distinct phrases: 73,117,571
Total time: 262 second(s)

Results for Building-Blocks order 32:
Total memory needed for one pass: 11,253,648KB
Total distinct phrases: 93,753,132
Total time: 390 second(s)

Results for Building-Blocks order 64:
Total memory needed for one pass: 20,997,768KB
Total distinct phrases: 97,831,594
Total time: 559 second(s)

Note1: Each byte is dumped as HEX i.e. 2bytes, also CRLF at end of each line is +2, thus 97,831,594*(2*64+2) = 12,718,107,220.
Note2: For one pass (instead of 32 passes) the needed memory is up to 21GB.
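
The arithmetic of Note1 can be cross-checked against the dump sizes in the directory listing. A minimal sketch, where the function name `hex_dump_bytes_sketch` is hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* Size in bytes of a HEX dump holding `unique` blocks of `order` bytes each:
   2 hex characters per byte plus a 2-byte CRLF per line. */
uint64_t hex_dump_bytes_sketch(uint64_t unique, unsigned order)
{
    assert(order > 0);
    return unique * (2u * (uint64_t)order + 2u);
}
```

With 97,831,594 unique blocks of order 64 this gives 12,718,107,220, and with 18,772,793 unique blocks of order 8 it gives 337,910,274, the sizes of the two corresponding enwik8 dump files.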

Now, the TOTAL number of BBs for 100,000,000 bytes long window/block is 100,000,000-ORDER+1, or:
100,000,000-08+1=99,999,993 TOTAL BBs; 18,772,793 UNIQUE BBs; 99,999,993/18,772,793= 5.32 ratio
100,000,000-16+1=99,999,985 TOTAL BBs; 73,117,571 UNIQUE BBs; 99,999,985/73,117,571= 1.36 ratio
100,000,000-32+1=99,999,969 TOTAL BBs; 93,753,132 UNIQUE BBs; 99,999,969/93,753,132= 1.06 ratio
100,000,000-64+1=99,999,937 TOTAL BBs; 97,831,594 UNIQUE BBs; 99,999,937/97,831,594= 1.02 ratio
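
The same TOTAL/UNIQUE bookkeeping as a small sketch; the function names are hypothetical and nothing beyond the formula above is assumed:

```c
#include <assert.h>
#include <stdint.h>

/* Number of (overlapping) Building-Blocks of `order` bytes in a window. */
uint64_t total_bbs_sketch(uint64_t window, unsigned order)
{
    assert(order >= 1 && window >= order);
    return window - order + 1;
}

/* TOTAL/UNIQUE ratio: how many times, on average, each distinct BB occurs. */
double bb_ratio_sketch(uint64_t window, unsigned order, uint64_t unique)
{
    assert(unique > 0);
    return (double)total_bbs_sketch(window, order) / (double)unique;
}
```

A ratio close to 1 means almost every block of that length occurs only once, so there is little left to match at that length.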

These 4 orders tell a lot about what type of data the file holds.
The ratio drops steeply between ORDER 8 and ORDER 16 and stays close to 1 afterwards, which suggests that no big gain would come from Match Length 16+.
The whole mumbo-jumbo is quite straightforward:
- once the number of occurrences of each BB is known, it is easy to obtain the GOLDEN table housing the ABSOLUTE OFFSETS;
- each position/offset takes 4 bytes, therefore the table would be 4x(100,000,000-ORDER+1) bytes;
- for the extreme mode, if the window/block is 512MB then the table is ~2GB;
- the power of B-tree order 3 is SHazam+amazing=SHamazing, the cluster of OFFSETS is findable with only 3 lookups;
- if we use a subwindow/subblock within the window/block, say 256KB, the encoding is preceded by 'ON THE FLY' BB ripping for this subwindow/subblock.
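
One way to picture such a table of absolute offsets, for small inputs only. The sketch below is NOT the B-tree scheme described above: it swaps in a plain qsort to cluster the offsets of identical BBs next to each other, and all names (`build_offset_table_sketch`, `cmp_bb`, the two globals) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Context for the qsort comparator (hypothetical; not thread-safe). */
static const uint8_t *g_data;
static size_t g_order;

/* Orders two offsets by the BB they point at, then by offset. */
static int cmp_bb(const void *a, const void *b)
{
    uint32_t pa = *(const uint32_t *)a, pb = *(const uint32_t *)b;
    int c = memcmp(g_data + pa, g_data + pb, g_order);
    if (c != 0) return c;
    return (pa > pb) - (pa < pb);
}

/* Returns a malloc'ed table of n-order+1 absolute offsets (4 bytes each),
   with the offsets of equal BBs adjacent; the caller frees it. */
uint32_t *build_offset_table_sketch(const uint8_t *data, size_t n,
                                    size_t order, size_t *count)
{
    size_t total, i;
    uint32_t *table;
    assert(data != NULL && count != NULL);
    *count = 0;
    if (order == 0 || n < order || n > UINT32_MAX) return NULL;
    total = n - order + 1;
    table = malloc(total * sizeof *table);
    if (table == NULL) return NULL;
    for (i = 0; i < total; i++) table[i] = (uint32_t)i;
    g_data = data;
    g_order = order;
    qsort(table, total, sizeof *table, cmp_bb);
    *count = total;
    return table;
}
```

For "abcabc" with ORDER 3 the table holds the offsets 0, 3, 1, 2: both occurrences of "abc" come first, then "bca", then "cab".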
D:\_KAZE\Get_all_Building-Blocks_008bytes_long_from_ANY_file>dir

06/03/2014  01:13 AM               595 Dump_BuildingBlocks_Order_008-016-032-064_footprint-02GB.bat
06/03/2014  01:14 AM               599 Dump_BuildingBlocks_Order_008-016-032-064_footprint-20GB.bat
05/16/2014  07:22 AM       100,000,000 enwik8
06/03/2014  01:24 AM                 8 enwik8.lst
06/03/2014  12:09 AM       337,910,274 enwik8_All_Unique_008bytes_long_blocks_in_HEX.txt
06/03/2014  01:25 AM       337,910,274 enwik8_All_Unique_008bytes_long_blocks_in_HEX_01p.txt
06/03/2014  12:14 AM     2,485,997,414 enwik8_All_Unique_016bytes_long_blocks_in_HEX.txt
06/03/2014  12:20 AM     6,187,706,712 enwik8_All_Unique_032bytes_long_blocks_in_HEX.txt
06/03/2014  12:29 AM    12,718,107,220 enwik8_All_Unique_064bytes_long_blocks_in_HEX.txt
06/03/2014  02:13 AM            58,164 Leprechaun.LOG
06/03/2014  01:17 AM           140,800 Leprechaun_BB008hex_01p_Intel_64bit.exe
06/02/2014  10:01 PM           129,536 Leprechaun_BB008hex_32p_Intel_32bit.exe
06/03/2014  01:17 AM           141,824 Leprechaun_BB016hex_01p_Intel_64bit.exe
06/02/2014  10:02 PM           131,072 Leprechaun_BB016hex_32p_Intel_32bit.exe
06/03/2014  01:17 AM           142,336 Leprechaun_BB032hex_01p_Intel_64bit.exe
06/02/2014  10:02 PM           131,584 Leprechaun_BB032hex_32p_Intel_32bit.exe
06/03/2014  01:17 AM           141,312 Leprechaun_BB064hex_01p_Intel_64bit.exe
06/02/2014  10:02 PM           131,584 Leprechaun_BB064hex_32p_Intel_32bit.exe
06/03/2014  01:17 AM           326,299 Leprechaun_BBhex.c
06/03/2014  01:18 AM            57,472 Leprechaun_BBhex_01p.zip
06/03/2014  01:16 AM            57,460 Leprechaun_BBhex_32p.zip
06/03/2014  01:19 AM               441 Leprechaun_BBhex_COMPILE.BAT
05/16/2014  07:22 AM       206,908,949 OSHO.TXT
06/03/2014  01:27 AM                10 OSHO.TXT.lst
06/03/2014  01:31 AM       161,216,928 OSHO.TXT_All_Unique_008bytes_long_blocks_in_HEX.txt
06/03/2014  01:26 AM       161,216,928 OSHO.TXT_All_Unique_008bytes_long_blocks_in_HEX_01p.txt
06/03/2014  01:39 AM     3,134,852,122 OSHO.TXT_All_Unique_016bytes_long_blocks_in_HEX.txt
06/03/2014  01:53 AM    12,407,731,974 OSHO.TXT_All_Unique_032bytes_long_blocks_in_HEX.txt
06/03/2014  02:13 AM    25,798,574,490 OSHO.TXT_All_Unique_064bytes_long_blocks_in_HEX.txt
06/03/2014  01:55 AM               258 RIP_BB-008bytes_in_one_pass.bat

D:\_KAZE\Get_all_Building-Blocks_008bytes_long_from_ANY_file>type enwik8_All_Unique_008bytes_long_blocks_in_HEX.txt|more
373735373C2F6964
6F766965297C5472
526F6D615D5D0A23
0A5B5B73636F3A42
746820412E4A2E20
74692727272E2727
6E647C5374657068
6F7273656E20486F
...

D:\_KAZE\Get_all_Building-Blocks_008bytes_long_from_ANY_file>type enwik8_All_Unique_016bytes_long_blocks_in_HEX.txt|more
616374696F6E3D7072696E7420416C70
CF81CEBFCEBD27272028656C65637472
2074686174207365676D656E74206C65
5D5D0A0A546865206E65787420737465
7665642066726F6D2061206461797469
726C6965737420646561646C696E6520
68616E20697320706879736963616C6C
697327272C204C6F6E646F6E3A204765
6F6C756D6573292E20204E657720596F
...

D:\_KAZE\Get_all_Building-Blocks_008bytes_long_from_ANY_file>type enwik8_All_Unique_032bytes_long_blocks_in_HEX.txt|more
756F743B73656520616C736F2671756F743B206C696E6B73290A2A205B5B4172
84D8AFD98AD98620D987D985D8A7D98AD988D9862727272920285B5B4D617263
27270A0A4175737472616C69616E205B5B466F726569676E204D696E69737465
5371756172652727272C206E6F742070617274206F6620746865206F72696769
637475726573202C636F6E76656E74696F6E20657463275D0A2A205B68747470
756F743B49662049204C6F76656420596F752671756F743B20616E642C206861
69736820706F736974696F6E206F6E20746865207465727269746F7269616C20
20496E61636365737369626C6520496E204368696E615D272727204368696E61
26616D703B6E6273703B26616D703B6E6273703B627D7D0A7C2026616D703B6E
...

D:\_KAZE\Get_all_Building-Blocks_008bytes_long_from_ANY_file>type enwik8_All_Unique_064bytes_long_blocks_in_HEX.txt|more
6E6520426F7576696572204B656E6E65647920285456206D6F766965297C4A61637175656C696E6520426F7576696572204B656E6E6564795D5D272720283139
6573292E0A0A3D3D436F676E61746573207769746820456E676C6973683D3D0A546865726520617265206D616E79204765726D616E20776F7264732074686174
6976655D0A2A5B687474703A2F2F75736572732E616265722E61632E756B2F6467772F6473702E68746D20445350206C696E6B735D0A2A5B687474703A2F2F77
67202671756F743B707572697479206F6620696E74656E74696F6E732671756F743B292E20466F7220746865206E657874206669667465656E20796561727320
6E20456E676C6973682C206173207468657920616C6C20657870726573732061205B5B5375627365747C7375627365745D5D2072656C6174696F6E736869702E
72732C20436F6D6D6F646F726520616C736F207461726765746564206465706172746D656E742073746F72657320616E6420746F792073746F7265732E205468
206D6F7265207375736365707469626C6520746F2073756767657374696F6E2E2057696E746572205B313935305D20636F6D6D656E7473207468617420746865
5D5D202D205B5B4A616D6573204F7469735D5D2C20416D65726963616E206C617779657220616E642070617472696F742028642E205B5B313738335D5D290A2A
6965772E6A73703F61727469643D38373926616D703B6C65747465723D4226616D703B7365617263683D6265726E6172645F6F665F636C616972766175785D20
756C642077617264206F66662074686520636F6C64657374206368696C6C2E205468652042756E6461626572672044697374696C6C696E6720436F6D70616E79
...

D:\_KAZE\Get_all_Building-Blocks_008bytes_long_from_ANY_file>
Update, 2014-Jun-02:

Couldn't resist trying three 'weird', most simplistic variations called 'Zangetsu', 'Sanrenpatsu' and 'Nirenpatsu'.
They use 64KB/256B/256B windows and fixed Match Lengths of 4/3/2, respectively.
King me! My first Japanese word coinage - 'Sanrenpatsu' after the 'Nirenpatsu', it means a triple-barrelled gun.
Here 'Zangetsu' has a different meaning than 斬月, 'Slaying Moon', see further below.

Milky Way framing the setting Moon at dawn
Astrophotographer Miguel Claro captured this stunning 21-image mosaic showing the arch of the Milky Way framing the setting Moon at dawn.
Mr. Claro used a Canon 60Da – ISO1600 Lens 24mm f/2; Exp. 15 seconds, taken on 06/04/2013 at 5:32 AM local time.
Mr. Claro said via email:
"Near the center at the right of palm trees, the moon shines brightly,
although not interfering with the giant arc of the Milky Way
where it is possible to distinguish a lot of constellations like Ursa Minor,
with the Polaris star to the left of the image,
until the swan (Cygnus), with its North America nebula (NGC7000) clearly visible, down to the right,
we still find the constellation of Sagittarius and Scorpio, with the brilliant super giant star, Antares."

computerfinger said on April 16, 2013 at 4:59 PM:
Truly beautiful.
But... if that is Polaris to the left, then isn't it the "rising" moon at dawn (not the "setting" as captioned)?

Antares
A giant red binary star, the brightest in the constellation Scorpio, about 424 light-years from Earth.
/Heritage/

Walter Koprolin did another superb shot:

Nakamichi
Mr. Koprolin's notes:
"The moon rises to end a long night of astrophotography at Ebenwaldhöhe, Lower Austria, my most frequented observation site.
The lowlands were covered by a layer of fog which blocked all artificial light pollution from below.
Three airplane trails can be spotted near the moon. The bright spot below the moon is a lens reflex.
Photo taken in December 2004, exposure was 30 seconds using a zoom lens at 18mm f/3.5."
; 'Zangetsu' decompression loop, 5c-22+2=60 bytes long:
; mark_description "Intel(R) C++ Compiler XE for applications running on IA-32, Version 12.1.1.258 Build 20111011";
; mark_description "-O3 -QxSSE2 -D_N_XMM -FAcs";

.B7.3:                          
  00022 0f b7 3c 32      movzx edi, WORD PTR [edx+esi]          
  00026 f7 c7 f0 00 00 
        00               test edi, 240                          
  0002c 74 13            je .B7.5 
.B7.4:                          
  0002e f7 df            neg edi                                
  00030 8d 0c 03         lea ecx, DWORD PTR [ebx+eax]           
  00033 03 f9            add edi, ecx                           
  00035 83 c2 02         add edx, 2                             
  00038 83 c0 04         add eax, 4                             
  0003b 8b 3f            mov edi, DWORD PTR [edi]               
  0003d 89 39            mov DWORD PTR [ecx], edi               
  0003f eb 17            jmp .B7.6 
.B7.5:                          
  00041 81 e7 ff 00 00 
        00               and edi, 255                           
  00047 f3 0f 6f 44 32 
        01               movdqu xmm0, XMMWORD PTR [1+edx+esi]   
  0004d f3 0f 7f 04 18   movdqu XMMWORD PTR [eax+ebx], xmm0     
  00052 03 c7            add eax, edi                           
  00054 8d 54 3a 01      lea edx, DWORD PTR [1+edx+edi]         
.B7.6:                          
  00058 3b 54 24 18      cmp edx, DWORD PTR [24+esp]            
  0005c 72 c4            jb .B7.3 
; 'Sanrenpatsu' decompression loop, 51-22+2=49 bytes long:
; mark_description "Intel(R) C++ Compiler XE for applications running on IA-32, Version 12.1.1.258 Build 20111011";
; mark_description "-O3 -QxSSE2 -D_N_XMM -FAcs";

.B6.3:                          
  00022 0f b6 3c 1a      movzx edi, BYTE PTR [edx+ebx]          
  00026 83 ff 10         cmp edi, 16                            
  00029 72 11            jb .B6.5 
.B6.4:                          
  0002b f7 df            neg edi                                
  0002d 8d 0c 06         lea ecx, DWORD PTR [esi+eax]           
  00030 03 f9            add edi, ecx                           
  00032 42               inc edx                                
  00033 83 c0 03         add eax, 3                             
  00036 8b 3f            mov edi, DWORD PTR [edi]               
  00038 89 39            mov DWORD PTR [ecx], edi               
  0003a eb 11            jmp .B6.6 
.B6.5:                          
  0003c f3 0f 6f 44 1a 
        01               movdqu xmm0, XMMWORD PTR [1+edx+ebx]   
  00042 f3 0f 7f 04 30   movdqu XMMWORD PTR [eax+esi], xmm0     
  00047 03 c7            add eax, edi                           
  00049 8d 54 3a 01      lea edx, DWORD PTR [1+edx+edi]         
.B6.6:                          
  0004d 3b 54 24 18      cmp edx, DWORD PTR [24+esp]            
  00051 72 cf            jb .B6.3 
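The two loops above differ only in token size and Match Length, so one plain-C reading covers both. This is a sketch derived from the listings, not the actual source: the name `renpatsu_decode_sketch` is hypothetical, the bounds checks are additions, and exact lengths are copied instead of whole XMM/DWORD chunks. As read from the listings, a first byte below 16 is a literal run of that many bytes; anything else is a match whose token value is the offset itself. 'Zangetsu' corresponds to token_bytes=2, match_len=4 and 'Sanrenpatsu' to token_bytes=1, match_len=3; no listing is given for 'Nirenpatsu', so using token_bytes=1, match_len=2 for it is only an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: returns bytes written, or SIZE_MAX on malformed input. */
size_t renpatsu_decode_sketch(const uint8_t *src, size_t src_size,
                              uint8_t *dst, size_t dst_cap,
                              size_t token_bytes, size_t match_len)
{
    size_t i = 0, o = 0;
    assert(src != NULL && dst != NULL);
    assert(token_bytes == 1 || token_bytes == 2);
    while (i < src_size) {
        unsigned first = src[i];
        if (first < 16) {
            /* Literal run of 0..15 bytes following the length byte. */
            size_t len = first;
            if (i + 1 + len > src_size || o + len > dst_cap) return SIZE_MAX;
            memcpy(dst + o, src + i + 1, len);
            i += 1 + len;
            o += len;
        } else {
            /* Match: the token itself is the (little-endian) offset. */
            size_t offset = first;
            if (i + token_bytes > src_size) return SIZE_MAX;
            if (token_bytes == 2) offset |= (size_t)src[i + 1] << 8;
            if (offset > o || o + match_len > dst_cap) return SIZE_MAX;
            memmove(dst + o, dst + o - offset, match_len);
            i += token_bytes;
            o += match_len;
        }
    }
    return o;
}
```

A side effect visible in this reading: offsets whose low byte is below 16 cannot be encoded, since such a byte is taken as a literal length.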
And quick results for 'Zangetsu', 'Sanrenpatsu' and 'Nirenpatsu':
D:\_KAZE\Nakamichi_Kaidanji_benchmark\Nakamichi_benchmark\Nakamichi_Zangetsu_Sanrenpatsu_Nirenpatsu>dir

09/26/1996  04:51 PM           152,089 alice29.txt
06/03/2014  12:15 AM           134,071 alice29.txt.Nirenpatsu.Nakamichi
06/02/2014  11:39 PM           137,028 alice29.txt.Sanrenpatsu.Nakamichi
06/02/2014  11:41 PM            86,014 alice29.txt.Zangetsu.Nakamichi
05/16/2014  07:22 AM         3,153,408 CalgaryCorpus.tar
06/03/2014  12:15 AM         2,331,533 CalgaryCorpus.tar.Nirenpatsu.Nakamichi
06/02/2014  11:39 PM         2,327,617 CalgaryCorpus.tar.Sanrenpatsu.Nakamichi
06/02/2014  11:43 PM         1,852,387 CalgaryCorpus.tar.Zangetsu.Nakamichi
05/16/2014  07:22 AM        10,192,446 dickens
06/03/2014  12:15 AM         7,588,467 dickens.Nirenpatsu.Nakamichi
06/02/2014  11:39 PM         8,175,469 dickens.Sanrenpatsu.Nakamichi
06/02/2014  11:42 PM         5,577,996 dickens.Zangetsu.Nakamichi
05/16/2014  07:22 AM       100,000,000 enwik8
06/03/2014  12:16 AM        75,206,423 enwik8.Nirenpatsu.Nakamichi                          ! 304 MB/s !
06/02/2014  11:39 PM        77,603,111 enwik8.Sanrenpatsu.Nakamichi                         ! 507 MB/s !
06/02/2014  11:57 PM        58,954,442 enwik8.Zangetsu.Nakamichi                            ! 607 MB/s !

D:\_KAZE\Nakamichi_Kaidanji_benchmark\Nakamichi_benchmark\Nakamichi_Zangetsu_Sanrenpatsu_Nirenpatsu>
The archive (226KB) containing the sources and executables:
http://www.sanmayce.com/Nakamichi/Nakamichi_Zangetsu_Sanrenpatsu_Nirenpatsu.zip

Nakamichi
What a ringing word 'Hanabanashii' is, with the flower kanji again, 'flowery battle'; who if not the Japanese would coin such a Hanabanashii bigram.

Update, 2014-May-28:

A little reordering led to a faster variation of 'Hanabi' called 'Hanazakari' a.k.a. 'full bloom'.
So far, the two most useful (sizewise/speedwise) variants are Sanbashi/Hanazakari.
The basic change is to avoid bit operations, shocker, like this:
|1stLSB      |2ndLSB  |
-----------------------
|xxxx|x|0|0|0|xxxxxxxF|
-----------------------
[1bit            16bit]
Bits 6/7/8 are the TAG; the TAG sits within the OFFSET area, thus OFFSETS are byte-aligned i.e. SHRless.
In a way 'Hanazakari' is a strengthened 'Kaidanji': the 16th bit says whether Match_Length is 8 or 5 bytes long (0 or 1 respectively).
It may seem counter-intuitive, but in that way the inner 32KB are 'saturated' with longer matches while the shorter ones are 'banished' to the outer 32KB of the window.
Of course all OFFSETS with bits 6/7/8 being 000 are to be discarded.
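
The layout above, together with the 'Hanazakari' listing further below, can be read as the following plain-C sketch. It is not the actual source: the name `hanazakari_decode_sketch` is hypothetical, the bounds checks are additions, and exact lengths are copied instead of fixed-size QWORD chunks.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: returns bytes written, or SIZE_MAX on malformed input. */
size_t hanazakari_decode_sketch(const uint8_t *src, size_t src_size,
                                uint8_t *dst, size_t dst_cap)
{
    size_t i = 0, o = 0;
    assert(src != NULL && dst != NULL);
    while (i < src_size) {
        unsigned lo = src[i];
        if ((lo & 0xE0u) == 0) {
            /* TAG 000: literal run, length 0..31 in the low 5 bits. */
            size_t len = lo;
            if (i + 1 + len > src_size || o + len > dst_cap) return SIZE_MAX;
            memcpy(dst + o, src + i + 1, len);
            i += 1 + len;
            o += len;
        } else {
            /* Match: the whole 16-bit word is the offset, no shifting. */
            size_t offset, len;
            if (i + 2 > src_size) return SIZE_MAX;
            offset = (size_t)lo | ((size_t)src[i + 1] << 8);
            len = (offset & 0x8000u) ? 5 : 8;   /* 16th bit: 0 -> 8, 1 -> 5 */
            if (offset > o || o + len > dst_cap) return SIZE_MAX;
            memmove(dst + o, dst + o - offset, len);
            i += 2;
            o += len;
        }
    }
    return o;
}
```

Since the 16th bit is part of the offset itself, every offset below 32768 carries an 8-byte match and every offset from 32768 up carries a 5-byte one, which is the inner/outer 32KB split described above.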
And quick results for 'Hanazakari':
D:\_KAZE\Nakamichi_Kaidanji_benchmark\Nakamichi_benchmark\Nakamichi_Sanbashi_Hanabi_Kaidanji>dir

05/16/2014  07:22 AM         3,153,408 CalgaryCorpus.tar
05/25/2014  12:13 PM         1,610,426 CalgaryCorpus.tar.Hanabi.Nakamichi
05/27/2014  11:16 PM         1,620,653 CalgaryCorpus.tar.Hanazakari.Nakamichi
05/26/2014  05:51 PM         1,560,737 CalgaryCorpus.tar.Sanbashi.Nakamichi
05/16/2014  07:22 AM       100,000,000 enwik8
05/26/2014  12:19 AM        53,293,681 enwik8.Hanabi.Nakamichi                              
05/28/2014  12:03 AM        54,693,537 enwik8.Hanazakari.Nakamichi                          ! 554 MB/s !
05/19/2014  06:37 AM        46,881,842 enwik8.Sanbashi.Nakamichi                            ! 381 MB/s !
05/16/2014  07:22 AM       846,351,894 Kazahana_on.PAGODA-order-5.txt
05/26/2014  02:12 AM       264,635,955 Kazahana_on.PAGODA-order-5.txt.Hanabi.Nakamichi      
05/28/2014  04:37 AM       268,125,302 Kazahana_on.PAGODA-order-5.txt.Hanazakari.Nakamichi  ! 795 MB/s !
05/27/2014  05:50 PM       147,898,802 Kazahana_on.PAGODA-order-5.txt.Sanbashi.Nakamichi    ! 608 MB/s !
05/19/2014  03:38 PM       206,908,949 OSHO.TXT
05/25/2014  11:40 PM        87,774,118 OSHO.TXT.Hanabi.Nakamichi                            
05/28/2014  01:38 AM        88,953,095 OSHO.TXT.Hanazakari.Nakamichi                        ! 549 MB/s !
05/19/2014  03:30 PM        73,837,310 OSHO.TXT.Sanbashi.Nakamichi                          ! 394 MB/s !
05/22/2014  02:43 PM       211,948,544 silesia.tar
05/25/2014  10:45 PM       118,424,825 silesia.tar.Hanabi.Nakamichi                         
05/28/2014  07:36 AM       108,267,950 silesia.tar.Hanazakari.Nakamichi                     ! 680 MB/s !
05/21/2014  11:26 AM        93,058,243 silesia.tar.Sanbashi.Nakamichi                       ! 561 MB/s !

D:\_KAZE\Nakamichi_Kaidanji_benchmark\Nakamichi_benchmark\Nakamichi_Sanbashi_Hanabi_Kaidanji>
The archive (120KB) containing the source and executable:
http://www.sanmayce.com/Nakamichi/Nakamichi_Hanazakari.zip
; 'Hanazakari' decompression loop:
; mark_description "Intel(R) C++ Intel(R) 64 Compiler XE for applications running on Intel(R) 64, Version 12.1.1.258 Build 20111";
; mark_description "-O3 -D_N_GP -FAcs";

.B6.3::                         
  00017 41 0f b7 04 12   movzx eax, WORD PTR [r10+rdx]          
  0001c a9 e0 00 00 00   test eax, 224                          
  00021 74 26            je .B6.5 
.B6.4::                         
  00023 89 c3            mov ebx, eax                           
  00025 41 83 c1 02      add r9d, 2                             
  00029 c1 e8 0f         shr eax, 15                            
  0002c 48 f7 db         neg rbx                                
  0002f 48 03 d9         add rbx, rcx                           
  00032 45 89 ca         mov r10d, r9d                          
  00035 8d 04 40         lea eax, DWORD PTR [rax+rax*2]         
  00038 f7 d8            neg eax                                
  0003a 4d 8b 0c 1b      mov r9, QWORD PTR [r11+rbx]            
  0003e 4d 89 0c 0b      mov QWORD PTR [r11+rcx], r9            
  00042 46 8d 5c 18 08   lea r11d, DWORD PTR [8+rax+r11]        
  00047 eb 32            jmp .B6.6 
.B6.5::                         
  00049 49 8b 5c 12 01   mov rbx, QWORD PTR [1+r10+rdx]         
  0004e 0f b6 c0         movzx eax, al                          
  00051 49 89 1c 0b      mov QWORD PTR [r11+rcx], rbx           
  00055 49 8b 5c 12 09   mov rbx, QWORD PTR [9+r10+rdx]         
  0005a 49 89 5c 0b 08   mov QWORD PTR [8+r11+rcx], rbx         
  0005f 49 8b 5c 12 11   mov rbx, QWORD PTR [17+r10+rdx]        
  00064 4d 8b 54 12 19   mov r10, QWORD PTR [25+r10+rdx]        
  00069 49 89 5c 0b 10   mov QWORD PTR [16+r11+rcx], rbx        
  0006e 4d 89 54 0b 18   mov QWORD PTR [24+r11+rcx], r10        
  00073 45 8d 54 01 01   lea r10d, DWORD PTR [1+r9+rax]         
  00078 44 03 d8         add r11d, eax                          
.B6.6::                         
  0007b 45 89 d1         mov r9d, r10d                          
  0007e 45 3b c8         cmp r9d, r8d                           
  00081 72 94            jb .B6.3 
Update, 2014-May-24:

I was curious to see what the gap between 'Kaibutsu' and 'Kaidanji' holds; this variant is named 'Hanabi' a.k.a. 'fireworks'.
It's cute how the Japanese see flowers in snowflakes and fire-bursts: 'Hanabi' has two kanji, the first being 'Hana' - flower.
The basic premise is to avoid/minimize bit operations, shocker, like this:
|1stLSB      |2ndLSB  |
-----------------------
|x|0|0|0|xxxx|xxxxxxxx|
-----------------------
[1bit            16bit]
Reading a DWORD is not fast on Core2 but guess what, I don't care, to me it is history, I target i7.
Yet, for now, WORD stays.
The first four bits are the TAG; the TAG sits within the OFFSET area, thus OFFSETS are byte-aligned i.e. SHRless.
In a way 'Hanabi' is a strengthened 'Kaibutsu': the first bit says whether Match_Length is 5 or 8 bytes long.
Of course all OFFSETS with bits 2/3/4 all unset are to be discarded.
Literal MAX length is 15, handled by one XMM transfer, the tradeoff is kinda brutal but at the same time trickish even maverickish.
And quick results for 'Hanabi':
D:\_KAZE\Nakamichi_Kaidanji_benchmark\Nakamichi_benchmark\Nakamichi_Hanabi>dir

05/24/2014  02:02 PM         3,153,408 CalgaryCorpus.tar
05/24/2014  02:02 PM         1,610,426 CalgaryCorpus.tar.Nakamichi  !NumberOfFullLiterals (lower-the-better): 16001!
05/22/2014  02:41 PM       100,000,000 enwik8
05/24/2014  02:44 PM        53,293,681 enwik8.Nakamichi             !NumberOfFullLiterals (lower-the-better): 165971!
05/16/2014  08:22 AM       206,908,949 OSHO.TXT
05/24/2014  03:55 PM        87,774,118 OSHO.TXT.Nakamichi           !NumberOfFullLiterals (lower-the-better): 78495!

D:\_KAZE\Nakamichi_Kaidanji_benchmark\Nakamichi_benchmark\Nakamichi_Hanabi>Nakamichi_Hanabi_XMM.exe enwik8.Nakamichi
Nakamichi 'Hanabi', written by Kaze, based on Nobuo Ito's LZSS source, babealicious suggestion by m^2 enforced.
Decompressing 53293681 bytes ...
RAM-to-RAM performance: 507 MB/s.

D:\_KAZE\Nakamichi_Kaidanji_benchmark\Nakamichi_benchmark\Nakamichi_Hanabi>Nakamichi_Hanabi_XMM.exe OSHO.TXT.Nakamichi
Nakamichi 'Hanabi', written by Kaze, based on Nobuo Ito's LZSS source, babealicious suggestion by m^2 enforced.
Decompressing 87774118 bytes ...
RAM-to-RAM performance: 548 MB/s.
The archive (118KB) containing the source and executable:
http://www.sanmayce.com/Nakamichi/Nakamichi_Hanabi.zip

Along with the space speeds achieved by Cyan, I saw m^2 reach some supernifty speeds as well, way to go.

Update, 2014-May-22:

And the third dataset (MIX) 'silesia':
G:\Decompression_Showdown_Nakamichi-Sanbashi_vs_lzturbo_vs_lz4>dir/on

09/17/2013  02:22 PM           118,272 LZ4.exe
04/29/2013  09:35 PM           349,609 lzturbo.exe
05/18/2014  05:14 PM           110,080 Nakamichi_Sanbashi_GP_64bit.exe
05/22/2014  04:06 PM       211,948,544 silesia.tar
05/22/2014  03:49 PM        78,036,546 silesia.tar.lz4
05/22/2014  03:49 PM        77,361,268 silesia.tar.lzt
05/21/2014  12:26 PM        93,058,243 silesia.tar.Nakamichi
05/22/2014  03:59 PM               939 Results_Core2_T7500_fee.txt
05/22/2014  04:00 PM               939 Results_Core2_T7500_faw.txt
05/22/2014  03:59 PM               939 Results_Core2_T7500_fum.txt

Nakamichi_Sanbashi_GP_64bit.exe silesia.tar.Nakamichi: 
Nakamichi 'Sanbashi', written by Kaze, based on Nobuo Ito's LZSS source, babealicious suggestion by m^2 enforced.
Decompressing 93058243 bytes ...
RAM-to-RAM performance: 561 MB/s.

Kernel  Time =     0.639 =   66%
User    Time =     0.327 =   33%
Process Time =     0.967 =  100%    Virtual  Memory =   1116 MB
Global  Time =     0.967 =  100%    Physical Memory =    293 MB

lzturbo.exe -d silesia.tar.lzt: 

Kernel  Time =     0.733 =  109%
User    Time =     0.312 =   46%
Process Time =     1.045 =  155%    Virtual  Memory =     98 MB
Global  Time =     0.670 =  100%    Physical Memory =     72 MB

LZ4.exe -d -f -Sx -v silesia.tar.lz4 silesia.tar: 

Kernel  Time =     0.608 =   86%
User    Time =     0.358 =   51%
Process Time =     0.967 =  137%    Virtual  Memory =     33 MB
Global  Time =     0.702 =  100%    Physical Memory =     35 MB
Conclusion: Sanbashi struggles with non-textual data, 'silesia' is composed of some big binary chunks.

Update, 2014-May-20:

Until today, I hadn't looked into Cyan's lz4 source nor tested it.
Seeing the mind-blowing results of lz4, a decompression showdown between Sanbashi/lz4/LzTurbo finally comes here.
The thing that strikes me is the internal benchmark of lz4, it is simply the fastest I have ever seen.
Didn't have enough time to compress 'silesia.tar' (to be done soon), so only the other two (XML/TEXT) datasets were tested.
Again, I did the showdown on 1GB ramdisk using Windows 7 64bit.
So, let us see who-is-who:
G:\Decompression_Showdown_Nakamichi-Sanbashi_vs_lzturbo_vs_lz4>dir/on

05/20/2014  02:07 PM       100,000,000 enwik8
05/20/2014  02:06 PM        42,283,900 enwik8.lz4
05/20/2014  02:03 PM        41,929,879 enwik8.lzt
05/19/2014  07:37 AM        46,881,842 enwik8.Nakamichi
09/17/2013  02:22 PM           118,272 LZ4.exe
05/18/2014  09:24 PM            60,781 LZ4v140.zip
04/29/2013  09:35 PM           349,609 lzturbo.exe
04/13/2014  10:28 PM           855,237 lzturbo.zip
05/18/2014  01:17 PM               477 MakeEXEs_Sanbashi.bat
05/04/2014  06:38 PM             1,604 MokujIN prompt.lnk
05/18/2014  02:00 PM            74,777 Nakamichi_Sanbashi.c
05/20/2014  12:25 PM           325,120 Nakamichi_Sanbashi.doc
05/20/2014  12:25 PM           229,668 Nakamichi_Sanbashi.pdf
05/18/2014  05:14 PM           110,080 Nakamichi_Sanbashi_GP_64bit.exe
05/20/2014  12:15 PM           427,821 Nakamichi_Sanbashi_XMM_32bit.cod
05/20/2014  12:15 PM            96,256 Nakamichi_Sanbashi_XMM_32bit.exe
05/20/2014  02:09 PM       206,908,949 OSHO.TXT
05/20/2014  02:07 PM        71,399,305 OSHO.TXT.lz4
05/20/2014  02:06 PM        70,067,665 OSHO.TXT.lzt
05/19/2014  04:30 PM        73,837,310 OSHO.TXT.Nakamichi
05/20/2014  02:11 PM             4,062 Results.txt
05/20/2014  01:36 PM             4,062 Results_Core2_T7500_fee.txt
05/20/2014  01:42 PM             4,062 Results_Core2_T7500_faw.txt
05/20/2014  01:48 PM             4,062 Results_Core2_T7500_fum.txt
05/20/2014  01:59 PM             1,681 RUNME.BAT
05/04/2014  06:38 PM             4,096 timer32.exe

G:\Decompression_Showdown_Nakamichi-Sanbashi_vs_lzturbo_vs_lz4>LZ4.exe -b -9 enwik8
Nb of threads = 2 ; Compression Level = 9
enwik8          : 100000000 ->  42283793 ( 42.28%),   27.3 MB/s , 1370.5 MB/s

G:\Decompression_Showdown_Nakamichi-Sanbashi_vs_lzturbo_vs_lz4>LZ4.exe -b -9 OSHO.TXT
Nb of threads = 2 ; Compression Level = 9
OSHO.TXT        : 206908949 ->  71399094 ( 34.51%),   21.7 MB/s , 1365.7 MB/s

G:\Decompression_Showdown_Nakamichi-Sanbashi_vs_lzturbo_vs_lz4>
Fantastic speeds, Cyan, I salute you with one of my favorite French songs: 'Thievery Corporation - Un Simple Histoire'.
It would be informative if LzTurbo had such an internal benchmark as well.
And the main test: running 'RUNME.BAT' produces 'Results.txt'; again, I ran it three times, the global times (in seconds) are given:

Sanbashi: 0.577+1.076+0.577+1.076+0.577+1.076 = 4.959
LzTurbo:  0.343+0.639+0.343+0.624+0.358+0.639 = 2.946
lz4:      0.390+0.748+0.358+0.702+0.374+0.686 = 3.258

And the compression ratio:

Sanbashi: 46,881,842+73,837,310 = 120,719,152/(100000000+206908949)*100 = 39.3%
LzTurbo:  41,929,879+70,067,665 = 111,997,544/(100000000+206908949)*100 = 36.4%
lz4:      42,283,900+71,399,305 = 113,683,205/(100000000+206908949)*100 = 37.0%
G:\Decompression_Showdown_Nakamichi-Sanbashi_vs_lzturbo_vs_lz4>timer32 Nakamichi_Sanbashi_GP_64bit.exe enwik8.Nakamichi
Nakamichi 'Sanbashi', written by Kaze, based on Nobuo Ito's LZSS source, babealicious suggestion by m^2 enforced.
Decompressing 46881842 bytes ...
RAM-to-RAM performance: 365 MB/s.

Kernel  Time =     0.296 =   52%
User    Time =     0.265 =   47%
Process Time =     0.561 =   99%    Virtual  Memory =   1072 MB
Global  Time =     0.564 =  100%    Physical Memory =    142 MB

G:\Decompression_Showdown_Nakamichi-Sanbashi_vs_lzturbo_vs_lz4>timer32 LZ4.exe -d -f -Sx -v enwik8.lz4 enwik8
*** LZ4 for Windows 32-bits v1.4, by Yann Collet (Sep 17 2013) ***
Blocks size : 4096 KB
Extracted  95 MB
Successfully decoded 100000000 bytes
Done in 0.35 s ==> 270.93 MB/s

Kernel  Time =     0.312 =   88%
User    Time =     0.156 =   44%
Process Time =     0.468 =  132%    Virtual  Memory =     33 MB
Global  Time =     0.352 =  100%    Physical Memory =     35 MB

G:\Decompression_Showdown_Nakamichi-Sanbashi_vs_lzturbo_vs_lz4>timer32 lzturbo.exe -d enwik8.lzt .

Kernel  Time =     0.421 =  126%
User    Time =     0.140 =   42%
Process Time =     0.561 =  168%    Virtual  Memory =     98 MB
Global  Time =     0.334 =  100%    Physical Memory =     71 MB

G:\Decompression_Showdown_Nakamichi-Sanbashi_vs_lzturbo_vs_lz4>
The archive (101MB) containing the source and executables and (2 of 3) datasets:
http://www.sanmayce.com/Nakamichi/Decompression_Showdown_Nakamichi-Sanbashi_vs_lzturbo_vs_lz4.7z

Conclusion: Sanbashi is inferior to the monsters lzturbo and lz4.
I wish I had the opportunity to run this very test on a 3rd/4th generation i7 CPU; XMM on my Core 2 doesn't perform well.

Update, 2014-May-18:

It's time for the strongest, so far, variant - Nakamichi 'Sanbashi' a.k.a. 'pier'.
'Sanbashi' is a BRANCHLESS follow-up to Nakamichi 'Sanagi', a variant much like 'Sanshi' but with a few tweaks:
- Sliding window sizes are (8-3)/(16-3)/(24-3) or 5bit/13bit/21bit or 32B/8KB/2MB;
- The literal MAX length is lowered from 63 to 31;
- The additional bit is used for Match_Length selector, 8/16.
09/26/1996  04:51 PM           152,089 alice29.txt
05/18/2014  03:30 PM            95,682 alice29.txt.Nakamichi        !NumberOfFullLiterals (lower-the-better): 267!
07/07/2003  02:48 PM           768,771 book1
05/18/2014  03:31 PM           470,903 book1.Nakamichi              !NumberOfFullLiterals (lower-the-better): 674!
05/18/2014  01:17 PM         3,153,408 CalgaryCorpus.tar
05/18/2014  01:11 PM         1,560,737 CalgaryCorpus.tar.Nakamichi  !NumberOfFullLiterals (lower-the-better): 6841!
05/18/2014  03:30 PM        10,192,446 dickens
05/18/2014  03:15 PM         4,614,250 dickens.Nakamichi            !NumberOfFullLiterals (lower-the-better): 558!
The archive (276KB) containing the source and executables:
http://www.sanmayce.com/Nakamichi/Nakamichi_Sanbashi.zip

; 'Sanbashi' decompression loop:
; mark_description "Intel(R) C++ Intel(R) 64 Compiler XE for applications running on Intel(R) 64, Version 12.1.1.258 Build 20111";
; mark_description "-O3 -QxSSE2 -D_N_XMM -FAcs";

.B7.3::                         
  00024 45 89 cb         mov r11d, r9d                          
  00027 89 c7            mov edi, eax                           
  00029 49 03 fa         add rdi, r10                           
  0002c 41 8b 1c 13      mov ebx, DWORD PTR [r11+rdx]           
  00030 89 de            mov esi, ebx                           
  00032 83 e6 03         and esi, 3                             
  00035 75 26            jne .B7.5 
.B7.4::                         
  00037 f3 41 0f 6f 44 
        13 01            movdqu xmm0, XMMWORD PTR [1+r11+rdx]   
  0003e f3 0f 7f 07      movdqu XMMWORD PTR [rdi], xmm0         
  00042 0f b6 db         movzx ebx, bl                          
  00045 c1 eb 03         shr ebx, 3                             
  00048 f3 41 0f 6f 4c 
        13 11            movdqu xmm1, XMMWORD PTR [17+r11+rdx]  
  0004f f3 0f 7f 4f 10   movdqu XMMWORD PTR [16+rdi], xmm1      
  00054 03 c3            add eax, ebx                           
  00056 45 8d 4c 19 01   lea r9d, DWORD PTR [1+r9+rbx]          
  0005b eb 37            jmp .B7.6 
.B7.5::                         
  0005d 41 bb ff ff ff 
        00               mov r11d, 16777215                     
  00063 8d 0c f5 00 00 
        00 00            lea ecx, DWORD PTR [rsi*8]             
  0006a f7 d9            neg ecx                                
  0006c 44 03 ce         add r9d, esi                           
  0006f 83 c1 18         add ecx, 24                            
  00072 41 d3 eb         shr r11d, cl                           
  00075 41 23 db         and ebx, r11d                          
  00078 89 de            mov esi, ebx                           
  0007a 83 e3 04         and ebx, 4                             
  0007d c1 ee 03         shr esi, 3                             
  00080 48 f7 de         neg rsi                                
  00083 48 03 f7         add rsi, rdi                           
  00086 8d 04 58         lea eax, DWORD PTR [rax+rbx*2]         
  00089 83 c0 08         add eax, 8                             
  0008c f3 0f 6f 06      movdqu xmm0, XMMWORD PTR [rsi]         
  00090 f3 0f 7f 07      movdqu XMMWORD PTR [rdi], xmm0         
.B7.6::                         
  00094 45 3b c8         cmp r9d, r8d                           
  00097 72 8b            jb .B7.3 
Nakamichi 'Sanbashi' is the/an unoptimized Nakamichi 'Captain Apache', spelled NAKAMITCHI-KAPITAN-APATCHI.
/Heritage etymology for 'captain': from Late Latin 'capitaneus', chief/

They are after me with guns, knives and fast fast horses ... they call him 'Captain Apache'.
/Lee Van Cleef/

Update, 2014-May-14:

Two new variants (tweaks in essence) 'Kaibutsu' and 'Sanagi' have been made/tested.

The archive (115KB) containing the source and executables:
http://www.sanmayce.com/Nakamichi/Nakamichi_Kaibutsu.zip

The benchmark (72MB) containing the source, executables and the XML dataset:
http://www.sanmayce.com/Nakamichi/Decompression_Showdown_Nakamichi_Sanagi_vs_Yappy.7z


Nakamichi 'Kaibutsu' is a variant much like 'Kaidanji' but with a lower Match_Length and with shifting removed altogether (m^2's idea).
Finally, it's time for ZMM to enter in the form of Nakamichi 'Sanagi', a variant much like 'Sanshi' but with a bigger Match_Length.
Also, in my view the limit (63) for literals is excellent, NumberOfFullLiterals is only 3932 for 'enwik8'.
'Sanagi' struggles with files < 4MB (its maximal window); in the test below it gains strength after crossing the 4MB threshold.
This is not a problem for me since the targeted files are always much much bigger than 4MB.
'Sanagi' a.k.a. 'chrysalis' tortured with some worst-case scenarios (files that are not big enough and/or not highly redundant):

Only the following line is changed between runs:
#define Min_Match_Length (8+4) // In fact it equals 13
#define Min_Match_Length (8+3)
#define Min_Match_Length (8+2)
#define Min_Match_Length (8+1)
#define Min_Match_Length (8+0) // In fact it equals 9 (DEFAULT)
#define Min_Match_Length (8-1)
#define Min_Match_Length (8-2)
#define Min_Match_Length (8-3)
#define Min_Match_Length (8-4)
#define Min_Match_Length (8-5)
#define Min_Match_Length (8-6) // In fact it equals 3

Results for 'Sanagi' and 'Kaibutsu':
'Sanagi'       | 00,152,089 alice29.txt | 00,768,771 book1    | 10,192,446 dickens    | 100,000,000 enwik8   |
--------------------------------------------------------------------------------------------------------------
Min_Match = 13 | 00,124,696 / 844       | 00,660,746 / 5089   | 06,462,649 / 9333     |                      |
Min_Match = 12 | 00,119,425 / 594       | 00,624,368 / 3246   | 05,907,518 / 3440     |                      |
Min_Match = 11 | 00,113,639 / 366       | 00,583,797 / 1676   | 05,410,147 / 1162     |                      |
Min_Match = 10 | 00,107,859 / 196       | 00,541,034 / 0703   | 05,004,090 / 0433     |                      |
Min_Match = 09 | 00,102,947 / 097       | 00,504,548 / 0251   | 04,733,412 / 0177     | 047,939,156 / 003932 |
Min_Match = 08 | 00,098,578 / 043       | 00,475,715 / 0104   | 04,616,823 / 0081     |                      |
Min_Match = 07 | 00,096,646 / 014       | 00,459,782 / 0032   | 04,681,597 / 0039     |                      |
Min_Match = 06 | 00,097,860 / 005       | 00,465,742 / 0009   | 04,968,758 / 0009     |                      |
Min_Match = 05 | 00,104,408 / 002       | 00,503,838 / 0004   | 05,532,945 / 0003     |                      |
Min_Match = 04 | 00,120,349 / 000       | 00,593,028 / 0002   | 06,474,521 / 0001     |                      |
Min_Match = 03 | 00,153,922 / 000       | 00,772,711 / 0001   | 08,144,882 / 0000     |                      |
--------------------------------------------------------------------------------------------------------------
'Kaibutsu'     | 00,152,089 alice29.txt | 00,768,771 book1    | 10,192,446 dickens    | 100,000,000 enwik8   |
--------------------------------------------------------------------------------------------------------------
Min_Match = 05 | 00,080,133 / 106       | 00,416,358 / 0259   | 05,198,596 / 1448     | 056,188,976 / 104433 |
--------------------------------------------------------------------------------------------------------------
Note: The second number is NumberOfFullLiterals (lower-the-better).

'Kaibutsu' is such a little scamp, only 66 bytes long:
.B7.3::                         
	mov r11d, r9d                          
	movzx ebx, WORD PTR [r11+rdx]          
	movzx eax, bl                          
	cmp eax, 16                            
	jb .B7.5 
	add r9d, 2                             
	neg rbx                                
	add rbx, rcx                           
	mov rax, QWORD PTR [r10+rbx]           
	mov QWORD PTR [r10+rcx], rax           
	add r10d, 5                            
	jmp .B7.6 
.B7.5::                         
	movdqu xmm0, XMMWORD PTR [1+r11+rdx]   
	movdqu XMMWORD PTR [r10+rcx], xmm0     
	add r10d, eax                          
	lea r9d, DWORD PTR [1+r9+rax]          
.B7.6::                         
	cmp r9d, r8d                           
	jb .B7.3 
Unfortunately, too slow compared to 'Kaidanji':
D:\_KAZE\Nakamichi_Kaidanji_benchmark\Nakamichi_Kaibutsu>Nakamichi_Kaibutsu_xmm.exe enwik9.Nakamichi
Nakamichi 'Kaibutsu', written by Kaze, based on Nobuo Ito's LZSS source, babealicious suggestion by m^2 enforced.
Decompressing 548919605 bytes ...
RAM-to-RAM performance: 452 MB/s.

D:\_KAZE\Nakamichi_Kaidanji_benchmark\Nakamichi_Kaibutsu>Nakamichi_Kaibutsu_xmm.exe enwik8.Nakamichi
Nakamichi 'Kaibutsu', written by Kaze, based on Nobuo Ito's LZSS source, babealicious suggestion by m^2 enforced.
Decompressing 56188976 bytes ...
RAM-to-RAM performance: 507 MB/s.

D:\_KAZE\Nakamichi_Kaidanji_benchmark\Nakamichi_Kaibutsu>Nakamichi_Kaidanji_GP_64bit.exe enwik8.Kaidanji.Nakamichi
Nakamichi 'Kaidanji', written by Kaze, based on Nobuo Ito's LZSS source, babealicious suggestion by m^2 enforced.
Decompressing 63430147 bytes ...
RAM-to-RAM performance: 756 MB/s.

YAPPY: [b 2K] bytes 100000000 -> 67556975  67.6%  comp  38.7 MB/s  uncomp 602.5 MB/s 
YAPPY: [b 4K] bytes 100000000 -> 61850858  61.9%  comp  35.5 MB/s  uncomp 547.6 MB/s 
YAPPY: [b 8K] bytes 100000000 -> 57844183  57.8%  comp  32.3 MB/s  uncomp 516.4 MB/s 
YAPPY: [b 16K] bytes 100000000 -> 55841094  55.8%  comp  30.7 MB/s  uncomp 516.4 MB/s 
YAPPY: [b 32K] bytes 100000000 -> 54842431  54.8%  comp  30.7 MB/s  uncomp 509.5 MB/s 
YAPPY: [b 64K] bytes 100000000 -> 54347923  54.3%  comp  30.6 MB/s  uncomp 516.9 MB/s 
YAPPY: [b 4096K] bytes 100000000 -> 53852558  53.9%  comp  30.1 MB/s  uncomp 509.1 MB/s 
Note: Actually, the .EXE compiled as GP (without XMM) on my laptop performs faster - 551 MB/s.
This 'singularity' I spotted in hash etudes too, however on i7 3rd gen. the picture is reversed, there XMM rules.
BTW, 'Kaibutsu' compressed 'silesia.tar' (211,948,544) and 'CalgaryCorpus.tar' (3,153,408) down to 119,956,667/1,723,262.

Update, 2014-May-10:

Three years ago Hamid Buzidi with his LzTurbo impressed me much.
I can describe this superb tool in two words only: outstanding performance.
In my opinion, LzTurbo was and still is the king.
If anyone thinks differently please contact me and I will put your performer into the clash, gladly.
Having finished my decompressor Nakamichi 'Sanshi', the clash between these two superfast decompressors is given below.
Just run 'RUNME.BAT' and after 10-15 minutes the 'Results.txt' file will be auto-loaded into NOTEPAD.
The benchmark (174MB) containing the source, executables and the three XML/TEXT/MIX datasets:
http://www.sanmayce.com/Nakamichi/Decompression_Showdown_Nakamichi-Sanshi_vs_lzturbo.7z
D:\Decompression_Showdown_Nakamichi-Sanshi_vs_lzturbo>dir

05/10/2014  05:48 AM        41,929,879 enwik8.lzt
05/08/2014  07:51 AM        46,516,482 enwik8.Nakamichi
04/29/2013  08:35 PM           349,609 lzturbo.exe
04/13/2014  09:28 PM           855,237 lzturbo.zip
05/10/2014  06:26 AM               142 MakeEXE.bat
05/04/2014  05:38 PM             1,604 MokujIN prompt.lnk
05/10/2014  06:26 AM            69,865 Nakamichi_Sanshi.c
05/10/2014  08:51 AM           330,752 Nakamichi_Sanshi.doc
05/10/2014  08:51 AM           228,031 Nakamichi_Sanshi.pdf
05/10/2014  06:26 AM           254,593 Nakamichi_Sanshi_GP_64bit.cod
05/10/2014  06:26 AM           107,008 Nakamichi_Sanshi_GP_64bit.exe
05/10/2014  05:51 AM        70,067,665 OSHO.TXT.lzt
05/07/2014  04:21 PM        76,483,638 OSHO.TXT.Nakamichi
05/10/2014  06:34 AM             4,040 Results1_Core2_T7500.txt
05/10/2014  06:43 AM             4,040 Results2_Core2_T7500.txt
05/10/2014  06:52 AM             4,040 Results3_Core2_T7500.txt
05/10/2014  06:21 AM             1,390 RUNME.BAT
05/10/2014  05:54 AM        77,361,268 silesia.tar.lzt
05/10/2014  05:13 AM        99,042,931 silesia.tar.Nakamichi
05/04/2014  05:38 PM             4,096 timer32.exe
Aftermath:
enwik8.lzt + OSHO.TXT.lzt + silesia.tar.lzt = 189,358,812 bytes
enwik8.Nakamichi + OSHO.TXT.Nakamichi + silesia.tar.Nakamichi = 222,043,051 bytes
First loss, LzTurbo compresses 17% better.
The test was done on 1GB RAMDISK under Windows 7, Core 2 T7500 2200MHz.
I ran 'RUNME.BAT' three times, for charm, and after summing the 'Global Time' we have:
For 'Sanshi':
0.748+1.185+1.045
+
0.624+1.170+1.060
+
0.624+1.170+1.045
=
8.671
For 'LzTurbo':
0.358+0.639+0.639
+
0.343+0.639+0.639
+
0.358+0.624+0.655
=
4.894
Second loss, LzTurbo decompresses 77% faster, ouch-ouch.
Conclusion:
LzTurbo is a monster for sure, 'Sanshi' is inferior in both compression ratio and decompression speed departments.
Hamid, I salute you with one of my favorite songs (one of many by Delerium and Conjure One): 'Conjure One - Nargis feat. Azam Ali'.
D:\Decompression_Showdown_Nakamichi-Sanshi_vs_lzturbo>type Results1_Core2_T7500.txt
Nakamichi_Sanshi_GP_64bit.exe enwik8.Nakamichi:
Nakamichi 'Sanshi', written by Kaze, based on Nobuo Ito's LZSS source, babealicious suggestion by m^2 enforced.
Decompressing 46516482 bytes ...
RAM-to-RAM performance: 290 MB/s.

Kernel  Time =     0.296 =   39%
User    Time =     0.343 =   45%
Process Time =     0.639 =   85%    Virtual  Memory =   1072 MB
Global  Time =     0.748 =  100%    Physical Memory =    142 MB

Nakamichi_Sanshi_GP_64bit.exe OSHO.TXT.Nakamichi:
Nakamichi 'Sanshi', written by Kaze, based on Nobuo Ito's LZSS source, babealicious suggestion by m^2 enforced.
Decompressing 76483638 bytes ...
RAM-to-RAM performance: 332 MB/s.

Kernel  Time =     0.624 =   52%
User    Time =     0.546 =   46%
Process Time =     1.170 =   98%    Virtual  Memory =   1100 MB
Global  Time =     1.185 =  100%    Physical Memory =    273 MB

Nakamichi_Sanshi_GP_64bit.exe silesia.tar.Nakamichi:
Nakamichi 'Sanshi', written by Kaze, based on Nobuo Ito's LZSS source, babealicious suggestion by m^2 enforced.
Decompressing 99042931 bytes ...
RAM-to-RAM performance: 478 MB/s.

Kernel  Time =     0.686 =   65%
User    Time =     0.358 =   34%
Process Time =     1.045 =  100%    Virtual  Memory =   1122 MB
Global  Time =     1.045 =  100%    Physical Memory =    299 MB

lzturbo.exe -d enwik8.lzt:

Kernel  Time =     0.405 =  113%
User    Time =     0.124 =   34%
Process Time =     0.530 =  147%    Virtual  Memory =     98 MB
Global  Time =     0.358 =  100%    Physical Memory =     71 MB

lzturbo.exe -d OSHO.TXT.lzt:

Kernel  Time =     0.717 =  112%
User    Time =     0.312 =   48%
Process Time =     1.029 =  160%    Virtual  Memory =     98 MB
Global  Time =     0.639 =  100%    Physical Memory =     67 MB

lzturbo.exe -d silesia.tar.lzt:

Kernel  Time =     0.686 =  107%
User    Time =     0.343 =   53%
Process Time =     1.029 =  160%    Virtual  Memory =     98 MB
Global  Time =     0.639 =  100%    Physical Memory =     72 MB
By the way 'Shichifukujin' trumped 'Kaidanji' and 'Sanshi' on 'Calgary':
05/08/2014  12:14 AM         3,153,408 CalgaryCorpus.tar
05/10/2014  02:02 AM         1,862,449 CalgaryCorpus.tar.Nakamichi-Kaidanji
05/10/2014  01:26 AM         1,810,863 CalgaryCorpus.tar.Nakamichi-Sanshi
05/10/2014  01:02 AM         1,779,782 CalgaryCorpus.tar.Nakamichi-Shichifukujin
Update, 2014-May-08:

Observing the proportion of Leonardo Fibonacci (particularly) in flowers 'forced' me to make Nakamichi 'Daikuni', a natural follow-up to 'Sanshi'.
The only difference is in the Match_Length trio: instead of 8/8/8, I was curious to see what the lower 'natural order' 8|5/8|5/8|5 holds.
Some fifteen years ago, I was hit superheavily while watching the first (Zatoichi #21: 'The Festival Of Fire') out of 26 Zato Ichi movies, especially by the scene where he buried in the forest the beautiful wife of the furious hatamoto.
One unforgettable image of light looming through trees and splitting the darkness into sheaves of rays; Japanese has a word for such a phenomenon, sadly I forgot it.
So, my word is for chrysanthemum, the favorite flower of the beauty ... and the Imperator.

In order to form a trio (along with Kaidanji/Sanshi) Daikuni has to remain BRANCHLESS.
A niftiness (speed boost, that is) lies in the branchless way of deriving the 5/8/13 out of 1/2/3.
The 'TAG' variable, deciding the Match_Length, has binary values 01, 10, 11, this is how I obtained the sequence:
First, since 1-(TAG>>1) gives 1/0/0, 8-3*(1-(TAG>>1)) gives 8-3*1/8-3*0/8-3*0 or 5/8/8.
Second, by using the above approach and after negation we have the 'formula':
01: 05 = 1*2+1*3 = (1+(!((~TAG)&0x03)))*2+TAG*3
10: 08 = 1*2+2*3 = (1+(!((~TAG)&0x03)))*2+TAG*3
11: 13 = 2*2+3*3 = (1+(!((~TAG)&0x03)))*2+TAG*3
It might be a deboost, but I don't care, I like it.
And here comes the new thing!
In this chrysalis variant the search is for 8/5 (in that order) long matches within the three 6/14/22 (in that order) Sliding Windows, crazy slow!
Didn't have enough time to benchmark it, only for 'dickens':
05/06/2014  06:01 AM        10,192,446 dickens
05/06/2014  05:54 AM         6,387,079 dickens.Nakamichi-Kaidanji
05/06/2014  05:51 AM         5,799,901 dickens.Nakamichi-Shichifukujin
05/08/2014  07:22 PM         4,982,222 dickens.Nakamichi-Daikuni
05/07/2014  10:54 AM         4,617,821 dickens.Nakamichi-Sanshi
05/06/2014  06:03 AM         4,376,867 dickens.lzt_19
05/06/2014  06:05 AM         2,279,659 dickens.tangelo
A bit (or two) disappointed, 'Daikuni' proves to be inferior to 'Sanshi'.
Anyway, here it is, the package (66KB) containing the source and executable:
http://www.sanmayce.com/Nakamichi/Nakamichi_Daikuni.zip

Note: The compression speed is atrociously low, but this troubles me little; given the extra-simplicity of the encoding scheme, it is only a matter of time before a superfast etude enters the battlefield.

Overdue, I found time to juxtapose the duo Kaidanji/Sanshi:
D:\_KAZE\Nakamichi_Kaidanji_benchmark\Nakamichi_Sanshi>Nakamichi_r1-RSSBO_1GB_Wordfetcher_TRIAD_NOmemcpy_FIX_Kaidanji_FIX_GP_32bit.exe enwik8
Nakamichi, revision 1-RSSBO_1GB_Wordfetcher_TRIAD_NOmemcpy_FIX_Kaidanji_FIX, written by Kaze, based on Nobuo Ito's LZSS source, babealicious suggestion by m^2 enforced.
Compressing 100000000 bytes ...
/; Each rotation means 128KB are encoded; Done 100%
RAM-to-RAM performance: 62 KB/s.

D:\_KAZE\Nakamichi_Kaidanji_benchmark\Nakamichi_Sanshi>Nakamichi_Sanshi_GP.exe enwik8
Nakamichi 'Sanshi', written by Kaze, based on Nobuo Ito's LZSS source, babealicious suggestion by m^2 enforced.
Compressing 100000000 bytes ...
/; Each rotation means 128KB are encoded; Done 100%
NumberOfFullLiterals (lower-the-better): 41986
RAM-to-RAM performance: 4 KB/s.

D:\_KAZE\Nakamichi_Kaidanji_benchmark\Nakamichi_Daikuni>Nakamichi_Daikuni_GP.exe enwik8
Nakamichi 'Daikuni', written by Kaze, based on Nobuo Ito's LZSS source, babealicious suggestion by m^2 enforced.
Compressing 100000000 bytes ...
/; Each rotation means 128KB are encoded; Done 100%
NumberOfFullLiterals (lower-the-better): 73017
RAM-to-RAM performance: 3 KB/s.

D:\_KAZE\Nakamichi_Kaidanji_benchmark\Nakamichi_Shichifukujin>Nakamichi_Shichifukujin_GP.exe enwik8
Nakamichi 'Shichifukujin', written by Kaze, based on Nobuo Ito's LZSS source, babealicious suggestion by m^2 enforced.
Compressing 100000000 bytes ...
/; Each rotation means 128KB are encoded; Done 100%
RAM-to-RAM performance: 68 KB/s.

05/08/2014  01:57 AM       100,000,000 enwik8
05/08/2014  01:33 AM        63,430,147 enwik8.Nakamichi-Kaidanji
05/09/2014  09:00 AM        59,813,510 enwik8.Nakamichi-Shichifukujin
05/09/2014  08:14 AM        51,043,573 enwik8.Nakamichi-Daikuni
05/08/2014  07:51 AM        46,516,482 enwik8.Nakamichi-Sanshi
Update, 2014-May-07:

The latest SUPERIOR variant betters the compression ratio significantly while remaining BRANCHLESS!
It is called 'Sanshi' or 'Silk-thread', it falls in one class with 'Kaidanji', yet, due to the many L2/L3 references it is twice as slow.
// Sanshi:
// Decompression main loop:
/*
; mark_description "Intel(R) C++ Intel(R) 64 Compiler XE for applications running on Intel(R) 64, Version 12.1.1.258 Build 20111";
; mark_description "-O3 -D_N_GP -FAcs";

.B6.3::                         
  00022 41 8b 1c 13      mov ebx, DWORD PTR [r11+rdx]           
  00026 89 d9            mov ecx, ebx                           
  00028 83 e1 03         and ecx, 3                             
  0002b 4b 8d 34 11      lea rsi, QWORD PTR [r9+r10]            
  0002f 75 2a            jne .B6.5 
.B6.4::                         
  00031 0f b6 db         movzx ebx, bl                          
  00034 c1 eb 02         shr ebx, 2                             
  00037 49 8b 4c 13 01   mov rcx, QWORD PTR [1+r11+rdx]         
  0003c 44 03 d3         add r10d, ebx                          
  0003f 48 89 0e         mov QWORD PTR [rsi], rcx               
  00042 49 8b 4c 13 09   mov rcx, QWORD PTR [9+r11+rdx]         
  00047 4d 8b 5c 13 11   mov r11, QWORD PTR [17+r11+rdx]        
  0004c 4c 89 5e 10      mov QWORD PTR [16+rsi], r11            
  00050 44 8d 5c 18 01   lea r11d, DWORD PTR [1+rax+rbx]        
  00055 48 89 4e 08      mov QWORD PTR [8+rsi], rcx             
  00059 eb 29            jmp .B6.6 
.B6.5::                         
  0005b 03 c1            add eax, ecx                           
  0005d c1 e1 03         shl ecx, 3                             
  00060 41 83 c2 08      add r10d, 8                            
  00064 f7 d9            neg ecx                                
  00066 83 c1 18         add ecx, 24                            
  00069 41 89 c3         mov r11d, eax                          
  0006c b8 ff ff ff 00   mov eax, 16777215                      
  00071 d3 e8            shr eax, cl                            
  00073 23 d8            and ebx, eax                           
  00075 c1 eb 02         shr ebx, 2                             
  00078 48 f7 db         neg rbx                                
  0007b 48 03 de         add rbx, rsi                           
  0007e 48 8b 03         mov rax, QWORD PTR [rbx]               
  00081 48 89 06         mov QWORD PTR [rsi], rax               
.B6.6::                         
  00084 44 89 d8         mov eax, r11d                          
  00087 41 3b c0         cmp eax, r8d                           
  0008a 72 96            jb .B6.3 
*/
D:\_KAZE\Nakamichi_Kaidanji_benchmark\Nakamichi_Sanshi>Nakamichi_Sanshi_64bit.exe OSHO.TXT.Nakamichi
Nakamichi 'Sanshi', written by Kaze, based on Nobuo Ito's LZSS source, babealicious suggestion by m^2 enforced.
Decompressing 76483638 bytes ...
RAM-to-RAM performance: 340 MB/s.
Memory pool starting address: 0000000004D60080 ... 64 byte aligned, OK
Copying a 256MB block 1024 times i.e. 256GB READ + 256GB WRITTEN ...
memcpy(): (256MB block); 262144MB copied in 140448 clocks or 1.866MB per clock
RAM-to-RAM performance vs memcpy() ratio (bigger-the-better): 18%
So, here it is, the package (533KB) containing the source, executable and PDF:
http://www.sanmayce.com/Nakamichi/Nakamichi_Sanshi.zip

In fact, Nakamichi means 'Middle/Harmonious Way', it has all to do with eightfoldness.
If you ask me, the so-called 'Golden section/ratio' has to be called 'Concordant/Harmonious section/ratio',
the etymology is clear - HEART - in a more poetic way 'Heartratio', one beautiful add-on to heartbeat/heartbreak/heartburn.

The latest/fastest variant (r6) is 'Kaidanji', it is so simple and ultra-light-weight that I cannot see how to speed it up.

So, here it is, the package (107MB) containing sources, executables and the testdataset:
http://www.sanmayce.com/Downloads/Nakamichi_Kaidanji_benchmark.7z

The contents (just run 'RUNME.BAT' and 'Results.txt' will be auto-loaded into NOTEPAD):
D:\_KAZE\Nakamichi_Kaidanji_benchmark>dir

05/04/2014  05:38 PM         2,209,537 Fennec_Fox_or_Fennecus_zerda_1920x1200_derivate.png
05/04/2014  05:38 PM         4,889,336 Goyathlay.txt.Nakamichi
05/04/2014  05:38 PM        11,224,983 Goyathlay_844-pages.pdf
05/04/2014  05:38 PM           494,080 Kazahana.exe
05/04/2014  05:38 PM             1,014 Kazahana_compile_Intel12.bat
05/04/2014  05:38 PM         6,122,496 Kazahana_logo.doc
05/04/2014  05:38 PM         1,352,934 Kazahana_logo.pdf
05/04/2014  05:38 PM           999,644 Kazahana_r1-++fix+nowait_critical_nixFIX_WolfRAM+fixITER+EX+CS.c
05/04/2014  05:38 PM         2,699,104 Kaze_desktop_vlcsnap-23262_OutputLevels22_Hue-122_Saturation-80_1920x1200.png
05/04/2014  05:38 PM           559,515 Leprechaun_16FIXFIX_40-pages.pdf
05/04/2014  05:38 PM           129,536 Leprechaun_BB008hex_32p_32bit_Intel.exe
05/04/2014  05:38 PM         4,281,711 Leprechaun_BBhex_rev15fixfix_subrevB.zip
05/04/2014  05:38 PM           131,584 Leprechaun_x-leton_32bit_Intel_04_128p.exe
05/04/2014  05:38 PM            71,964 LZSS-master.zip
05/04/2014  05:38 PM             1,197 MakeEXEs.bat
05/04/2014  05:38 PM         2,045,146 MASAKARI_General-Purpose_Grade_English_Wordlist.wrd.Nakamichi
05/04/2014  05:38 PM             1,604 MokujIN prompt.lnk
05/04/2014  05:38 PM         5,153,280 MokujIN.doc
05/04/2014  05:38 PM         1,969,229 MokujIN.pdf
05/04/2014  05:38 PM           587,193 MokujIN_16threads.c
05/04/2014  05:38 PM               127 MokujIN_compile_Intel.bat
05/04/2014  05:38 PM           488,960 MokujIN_r5+_16-Threads_IntelV12_64bit_O3.exe
05/04/2014  05:38 PM            63,304 Nakamichi_r1-RSSBO_1GB_Wordfetcher_TRIAD_NOmemcpy_FIX_Kaidanji_FIX.c
05/04/2014  05:38 PM           308,736 Nakamichi_r1-RSSBO_1GB_Wordfetcher_TRIAD_NOmemcpy_FIX_Kaidanji_FIX.doc
05/04/2014  05:38 PM           224,039 Nakamichi_r1-RSSBO_1GB_Wordfetcher_TRIAD_NOmemcpy_FIX_Kaidanji_FIX.pdf
05/04/2014  05:38 PM           273,453 Nakamichi_r1-RSSBO_1GB_Wordfetcher_TRIAD_NOmemcpy_FIX_Kaidanji_FIX_GP_32bit.cod
05/04/2014  05:38 PM            91,648 Nakamichi_r1-RSSBO_1GB_Wordfetcher_TRIAD_NOmemcpy_FIX_Kaidanji_FIX_GP_32bit.exe
05/04/2014  05:38 PM           267,701 Nakamichi_r1-RSSBO_1GB_Wordfetcher_TRIAD_NOmemcpy_FIX_Kaidanji_FIX_GP_64bit.cod
05/04/2014  05:38 PM           107,008 Nakamichi_r1-RSSBO_1GB_Wordfetcher_TRIAD_NOmemcpy_FIX_Kaidanji_FIX_GP_64bit.exe
05/04/2014  05:38 PM           265,708 Nakamichi_r1-RSSBO_1GB_Wordfetcher_TRIAD_NOmemcpy_FIX_Kaidanji_FIX_XMM_64bit.cod
05/04/2014  05:38 PM           107,008 Nakamichi_r1-RSSBO_1GB_Wordfetcher_TRIAD_NOmemcpy_FIX_Kaidanji_FIX_XMM_64bit.exe
05/04/2014  05:38 PM           269,536 Nakamichi_r1-RSSBO_1GB_Wordfetcher_TRIAD_NOmemcpy_FIX_Kaidanji_FIX_YMM_64bit.cod
05/04/2014  05:38 PM           107,520 Nakamichi_r1-RSSBO_1GB_Wordfetcher_TRIAD_NOmemcpy_FIX_Kaidanji_FIX_YMM_64bit.exe
05/04/2014  05:38 PM            88,064 Project_GAMERA_banner_r19.doc
05/04/2014  05:38 PM           206,646 Project_GAMERA_banner_r19.pdf
05/04/2014  05:38 PM             2,205 Results_Core2_T7500.txt
05/04/2014  05:38 PM               540 RUNME.BAT
05/04/2014  05:38 PM       128,649,094 silesia.tar.Nakamichi
05/04/2014  05:38 PM         1,077,805 Sub-project_Schisch_8-pages.pdf
05/04/2014  05:38 PM             4,096 timer32.exe
05/04/2014  05:38 PM             8,810 Yappy.cpp
05/04/2014  05:38 PM           101,376 Yappy_32bit.exe

D:\_KAZE\Nakamichi_Kaidanji_benchmark>
The etude below is entrusted to perform decompression:
// Decompression main loop, 30 nifty lines:
/*
; mark_description "Intel(R) C++ Intel(R) 64 Compiler XE for applications running on Intel(R) 64, Version 12.1.1.258 Build 20111";
; mark_description "-O3 -D_N_GP -FAcs";

.B6.3::                         
  00017 41 0f b7 04 12   movzx eax, WORD PTR [r10+rdx]          
  0001c a8 07            test al, 7                             
  0001e 75 37            jne .B6.5 
.B6.4::                         
  00020 0f b6 c0         movzx eax, al                          
  00023 49 8b 5c 12 01   mov rbx, QWORD PTR [1+r10+rdx]         
  00028 c1 e8 03         shr eax, 3                             
  0002b 49 89 1c 0b      mov QWORD PTR [r11+rcx], rbx           
  0002f 49 8b 5c 12 09   mov rbx, QWORD PTR [9+r10+rdx]         
  00034 49 89 5c 0b 08   mov QWORD PTR [8+r11+rcx], rbx         
  00039 49 8b 5c 12 11   mov rbx, QWORD PTR [17+r10+rdx]        
  0003e 4d 8b 54 12 19   mov r10, QWORD PTR [25+r10+rdx]        
  00043 49 89 5c 0b 10   mov QWORD PTR [16+r11+rcx], rbx        
  00048 4d 89 54 0b 18   mov QWORD PTR [24+r11+rcx], r10        
  0004d 45 8d 54 01 01   lea r10d, DWORD PTR [1+r9+rax]         
  00052 44 03 d8         add r11d, eax                          
  00055 eb 19            jmp .B6.6 
.B6.5::                         
  00057 41 83 c1 02      add r9d, 2                             
  0005b 48 f7 d8         neg rax                                
  0005e 48 03 c1         add rax, rcx                           
  00061 45 89 ca         mov r10d, r9d                          
  00064 49 8b 1c 03      mov rbx, QWORD PTR [r11+rax]           
  00068 49 89 1c 0b      mov QWORD PTR [r11+rcx], rbx           
  0006c 41 83 c3 08      add r11d, 8                            
.B6.6::                         
  00070 45 89 d1         mov r9d, r10d                          
  00073 45 3b c8         cmp r9d, r8d                           
  00076 72 9f            jb .B6.3 
*/
The decompression of compressed XML/PAGODA/TEXT files reached 41%/50%/35% of the 'memcpy()':
Nakamichi, revision 1-RSSBO_1GB_Wordfetcher_TRIAD_NOmemcpy_FIX_Kaidanji, written by Kaze, based on Nobuo Ito's LZSS source, babealicious suggestion by m^2 enforced.
Decompressing 590380377 bytes ...
RAM-to-RAM performance: 773 MB/s.
Memory pool starting address: 0000000023950080 ... 64 byte aligned, OK
Copying a 256MB block 1024 times i.e. 256GB READ + 256GB WRITTEN ...
memcpy(): (256MB block); 262144MB copied in 139387 clocks or 1.881MB per clock
RAM-to-RAM performance vs memcpy() ratio (bigger-the-better): 41%

Nakamichi, revision 1-RSSBO_1GB_Wordfetcher_TRIAD_NOmemcpy_FIX_Kaidanji, written by Kaze, based on Nobuo Ito's LZSS source, babealicious suggestion by m^2 enforced.
Decompressing 279899511 bytes ...
RAM-to-RAM performance: 956 MB/s.
Memory pool starting address: 0000000010F70080 ... 64 byte aligned, OK
Copying a 256MB block 1024 times i.e. 256GB READ + 256GB WRITTEN ...
memcpy(): (256MB block); 262144MB copied in 139731 clocks or 1.876MB per clock
RAM-to-RAM performance vs memcpy() ratio (bigger-the-better): 50%

Nakamichi, revision 1-RSSBO_1GB_Wordfetcher_TRIAD_NOmemcpy_FIX_Kaidanji, written by Kaze, based on Nobuo Ito's LZSS source, babealicious suggestion by m^2 enforced.
Decompressing 106013567 bytes ...
RAM-to-RAM performance: 664 MB/s.
Memory pool starting address: 0000000006AC0080 ... 64 byte aligned, OK
Copying a 256MB block 1024 times i.e. 256GB READ + 256GB WRITTEN ...
memcpy(): (256MB block); 262144MB copied in 140089 clocks or 1.871MB per clock
RAM-to-RAM performance vs memcpy() ratio (bigger-the-better): 35%
Okay, to end the confusion, who-is-who is given:
Nakamichi (r6,MatchLength=8) stands for Nakamichi 'Kaidanji', written 2014-May-04;
Nakamichi (r6,MatchLength=7) stands for Nakamichi 'Shichifukujin', written 2014-May-05;
Nakamichi (r7,MatchLength=8|8|8) stands for Nakamichi 'Sanshi', written 2014-May-07, uses offsets 6/14/22 bit.

And the actual shootout:
05/01/2014  05:28 AM     1,000,000,000 enwik9
05/02/2014  10:18 AM       590,380,377 enwik9.Nakamichi-Kaidanji                                        ! 773 MB/s !
05/05/2014  05:51 PM       564,867,308 enwik9.Nakamichi-Shichifukujin                                   ! 686 MB/s !
05/01/2014  05:27 PM       501,106,828 enwik9.Yappy_65536                                               ! 545.9 MB/s !
05/01/2014  05:27 PM       371,915,871 enwik9.lzt_19
05/01/2014  04:55 AM       178,497,116 enwik9.tangelo

05/01/2014  02:59 AM        11,546,860 Goyathlay.txt
05/02/2014  12:40 AM         4,889,336 Goyathlay.txt.Nakamichi-Kaidanji
05/05/2014  04:47 AM         4,833,295 Goyathlay.txt.Nakamichi-Shichifukujin
05/07/2014  11:19 AM         4,413,175 Goyathlay.txt.Nakamichi-Sanshi
05/01/2014  05:47 PM         3,432,371 Goyathlay.txt.lzt_19
05/01/2014  05:47 PM         3,350,305 Goyathlay.txt.Yappy_65536
05/01/2014  04:55 AM         1,218,240 Goyathlay.txt.tangelo

05/01/2014  04:02 AM       846,351,894 Kazahana_on.PAGODA-order-5.txt
05/05/2014  02:25 PM       299,423,737 Kazahana_on.PAGODA-order-5.txt.Nakamichi-Shichifukujin           ! 833 MB/s !
05/02/2014  02:36 AM       279,899,511 Kazahana_on.PAGODA-order-5.txt.Nakamichi-Kaidanji                ! 956 MB/s !
05/01/2014  05:38 PM       178,027,899 Kazahana_on.PAGODA-order-5.txt.Yappy_65536                       ! 932.6 MB/s !
05/01/2014  05:38 PM       107,077,360 Kazahana_on.PAGODA-order-5.txt.lzt_19
05/01/2014  05:08 AM        35,834,062 Kazahana_on.PAGODA-order-5.txt.tangelo

05/01/2014  02:58 AM         3,903,143 MASAKARI_General-Purpose_Grade_English_Wordlist.wrd
05/07/2014  10:27 AM         2,106,310 MASAKARI_General-Purpose_Grade_English_Wordlist.wrd.Nakamichi-Sanshi
05/02/2014  01:56 AM         2,045,146 MASAKARI_General-Purpose_Grade_English_Wordlist.wrd.Nakamichi-Kaidanji
05/05/2014  04:48 AM         1,899,539 MASAKARI_General-Purpose_Grade_English_Wordlist.wrd.Nakamichi-Shichifukujin
05/01/2014  05:47 PM         1,516,082 MASAKARI_General-Purpose_Grade_English_Wordlist.wrd.lzt_19
05/01/2014  05:47 PM         1,478,808 MASAKARI_General-Purpose_Grade_English_Wordlist.wrd.Yappy_65536
05/01/2014  05:08 AM           516,997 MASAKARI_General-Purpose_Grade_English_Wordlist.wrd.tangelo

05/01/2014  04:30 AM       206,908,949 OSHO.TXT
05/02/2014  12:32 AM       106,013,567 OSHO.TXT.Nakamichi-Kaidanji                                      ! 664 MB/s !
05/05/2014  04:36 AM       100,533,785 OSHO.TXT.Nakamichi-Shichifukujin                                 ! 601 MB/s !
05/01/2014  05:10 PM        96,745,005 OSHO.TXT.Yappy_65536                                             ! 513.0 MB/s !
05/07/2014  04:21 PM        76,483,638 OSHO.TXT.Nakamichi-Sanshi                                        ! 340 MB/s !
05/01/2014  05:10 PM        70,067,665 OSHO.TXT.lzt_19
05/01/2014  05:12 AM        34,419,437 OSHO.TXT.tangelo

05/02/2014  12:18 AM       211,948,544 silesia.tar
05/01/2014  02:49 AM       128,649,094 silesia.tar.Nakamichi-Kaidanji                                   ! 860 MB/s !
05/05/2014  01:30 PM       124,804,964 silesia.tar.Nakamichi-Shichifukujin                              ! 808 MB/s !
05/01/2014  04:41 PM       100,815,697 silesia.tar.Yappy_65536                                          ! 656.3 MB/s !
05/01/2014  04:41 PM        77,361,268 silesia.tar.lzt_19
05/01/2014  06:44 AM        44,861,109 silesia.tar.tangelo

05/06/2014  06:01 AM        10,192,446 dickens
05/06/2014  05:54 AM         6,387,079 dickens.Nakamichi-Kaidanji
05/06/2014  05:51 AM         5,799,901 dickens.Nakamichi-Shichifukujin
05/07/2014  10:54 AM         4,617,821 dickens.Nakamichi-Sanshi
05/06/2014  06:03 AM         4,376,867 dickens.lzt_19
05/06/2014  06:05 AM         2,279,659 dickens.tangelo
The decompression speeds above were obtained on my 'Bonboniera' laptop, a Core2 T7500 at 2200MHz running Windows 7 64-bit.

The five rounds that led to (r6) are at:
http://www.codeproject.com/Articles/250566/Fastest-strstr-like-function-in-C?msg=4808432#xx4808432xx

Here I say 'thanks' to Nobuo Ito, m^2, Lasse Reinhold, Fantasy and Harold; I appreciate your (not their) help.

With the implementation of several new ideas, the following revisions are to appear:
- Nakamichi 'Dozaemon', nickname for 'drowned corpse', based on a legendary warrior who looked like one. /'Akage'/
- Nakamichi 'Hayabusa', yes-yes, as in the fastest shinkansen
- Nakamichi 'Hashiriyomi'
- Nakamichi 'Kokuen'
- Nakamichi 'Kokotsu'
- Nakamichi 'Shinju'

INOUYE's Japanese-English Dictionary:
"... that its completion leaves a void in my daily routine, a feeling of bereavement and loneliness;", been there, love it:
The Middle-school pupil, for whose use this work is primarily intended,
may consider that he has of late been but too well served with dictionaries of all descriptions
designed to minister to his intellectual needs,
and that there is no room or raison d'être for still another Japanese-English dictionary.
Yet I do not regard its compilation as altogether a work of supererogation.
Japanese-English dictionaries, it is true, there are in plenty;
but excellent as some of them are in their way,
the majority are sadly to seek with respect to the commonest words and phrases in our language.
In this point I believe I may claim that this dictionary differs from most of its predecessors;
for my plan has been to collect as far as possible all Japanese words,
phrases, and sentences in common use and turn them into English,
and though slips and oversights are unavoidable in a work of this kind,
in no case have I purposely omitted anything on the mere ground of its being hard to translate.
I have met every difficulty fair and square; and if exception is taken, 
as doubtless it will be, to some of my renderings, 
I can only reply that I have done my level best and can do no more.
For more than two years I have devoted every moment that I could call my own to the compilation of this work
and toiled at it day and night, often until the small hours of the morning,
that its completion leaves a void in my daily routine, a feeling of bereavement and loneliness;
and yet, as I look at it once more before I send it forth, the conviction grows upon me that,
with all my careful nursing, it is a very imperfect bantling that I have brought into the world.
Still, if it proves itself of service and lends a helping hand to the Middle-school pupil
as he treads gingerly the thorny path of English composition and conversation,
I shall have reason to be proud of this youngest child of mine.
In conclusion I have much pleasure in acknowledging the very valuable assistance which I have received
in the compilation of this dictionary from Mr. T. Ishikawa, of the Seisoku-Chugakko
and Mr. E. Ando, of the Uyeda Middle School.

JUKICHI INOUYE.
Tokyo,
March, 1909.

/The Preface to 'INOUYE's Japanese-English Dictionary'/
Copyleft Sanmayce, 2014 Apr 30; last modified on 2014 Oct 07; character encoding: charset=utf-8; for contacts: sanmayce@sanmayce.com