


README_Schmekerezada.TXT:

Short DIZ (description) of the package 'Schmekerezada.tar.gz'.

This console tool sorts the lines of a given file, FAST, on both Windows and Linux.
The binaries target SSE4.2 CPUs and run 1 or 4 threads, so my cutest machinette, a Thinkpad 11e (8GB RAM, 4 cores/4 threads), sets the baseline.
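
Since the binaries target SSE4.2, it can help to confirm that the CPU advertises that instruction set before running them. A minimal sketch for Linux, assuming the usual /proc/cpuinfo layout in which SSE4.2 shows up as the 'sse4_2' flag; the helper name has_sse42 is hypothetical and not part of the package:

```shell
#!/bin/sh
# has_sse42 [CPUINFO_FILE]
# Returns 0 when the first 'flags' line of the given cpuinfo file
# (default: /proc/cpuinfo) lists sse4_2, non-zero otherwise.
# Hypothetical helper, not shipped with Schmekerezada.
has_sse42() {
    cpuinfo="${1:-/proc/cpuinfo}"
    # -w matches sse4_2 as a whole word, so sse4_1 alone does not count
    grep -m1 '^flags' "$cpuinfo" 2>/dev/null | grep -qw 'sse4_2'
}

if has_sse42; then
    echo "SSE4.2 reported: the SSE4.2 binaries should be able to run"
else
    echo "SSE4.2 not reported: the SSE4.2 binaries may fail with an illegal instruction"
fi
```

The optional file argument exists only so the check can be tried against a saved copy of cpuinfo from another machine.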

The package contains these:

[sanmayce@djudjeto2 Schmekerezada]$ ls -l

-rwxr-xr-x. 1 sanmayce sanmayce      18675 Jul 13 00:05 Akkodah_v3.h
-rwxr-xr-x. 1 sanmayce sanmayce       1439 Jul 13 00:05 bench_PARAMETER.sh
-rwxr-xr-x. 1 sanmayce sanmayce       9327 Jul 13 00:05 crumsort.c
-rwxr-xr-x. 1 sanmayce sanmayce      10704 Jul 13 00:05 crumsort.h
-rwxr-xr-x. 1 sanmayce sanmayce        150 Jul 13 00:05 GENERATE_Xmillion_Knight-Tours.bat
-rwxr-xr-x. 1 sanmayce sanmayce        151 Jul 13 00:05 GENERATE_Xmillion_Knight-Tours.sh
-rwxr-xr-x. 1 sanmayce sanmayce      78263 Jul 13 00:05 Knight-Tour_FNV1A_YoshimitsuTRIADii_vs_CRC32_TRISMUS.c
-rwxr-xr-x. 1 sanmayce sanmayce      33792 Jul 13 00:05 Knight-Tour_FNV1A_YoshimitsuTRIADii_vs_CRC32_TRISMUS.elf
-rwxr-xr-x. 1 sanmayce sanmayce     139776 Jul 13 00:05 Knight-Tour_FNV1A_YoshimitsuTRIADii_vs_CRC32_TRISMUS.exe
-rwxr-xr-x. 1 sanmayce sanmayce      78553 Jul 13 00:05 log_su_Intel_Celeron_N4100_Cores-4.txt
-rwxr-xr-x. 1 sanmayce sanmayce      66313 Jul 13 00:05 log_su_Intel_Kaby-Lake_i5-7200U_Cores-2.txt
-rwxr-xr-x. 1 sanmayce sanmayce     151607 Jul 13 00:05 Magnetica_v18.h
-rwxr-xr-x. 1 sanmayce sanmayce        303 Jul 13 00:05 MAKE_CLANG_Schmekeriada.bat
-rwxr-xr-x. 1 sanmayce sanmayce        404 Jul 13 00:05 make_elf_CLANG_Schmekeriada.sh
-rwxr-xr-x. 1 sanmayce sanmayce       1066 Jul 13 00:05 make_elf_exe_GCC_Schmekeriada.sh
-rwxr-xr-x. 1 sanmayce sanmayce        728 Jul 13 00:05 MAKE_ICL.bat
-rwxr-xr-x. 1 sanmayce sanmayce      23515 Jul 13 00:05 quadsort.c
-rwxr-xr-x. 1 sanmayce sanmayce      12100 Jul 13 00:05 quadsort.h
-rwxr-xr-x. 1 sanmayce sanmayce    3288935 Jul 13 00:05 Quicksort_Magnetica_COVERS.pdf
-rwxr-xr-x. 1 sanmayce sanmayce  268684544 Jul 13 00:05 Schmekerezada_CLANG_16.0.1_SSE4.2_TetraThread.elf
-rwxr-xr-x. 1 sanmayce sanmayce    1606950 Jul 13 00:05 Schmekerezada_CLANG_16.0.1_SSE4.2_TetraThread.elf.asm
-rwxr-xr-x. 1 sanmayce sanmayce  268648576 Jul 13 00:05 Schmekerezada_GCC_13.0.1_SSE4.2_MonoThread.elf
-rwxr-xr-x. 1 sanmayce sanmayce  269694245 Jul 13 00:05 Schmekerezada_GCC_13.0.1_SSE4.2_MonoThread.exe
-rwxr-xr-x. 1 sanmayce sanmayce  268656936 Jul 13 00:05 Schmekerezada_GCC_13.0.1_SSE4.2_TetraThread.elf
-rwxr-xr-x. 1 sanmayce sanmayce     953727 Jul 13 00:05 Schmekerezada_GCC_13.0.1_SSE4.2_TetraThread.elf.asm
-rwxr-xr-x. 1 sanmayce sanmayce  269703091 Jul 13 00:05 Schmekerezada_GCC_13.0.1_SSE4.2_TetraThread.exe
-rwxr-xr-x. 1 sanmayce sanmayce     306350 Jul 13 00:05 Schmekeriada.c
-rwxr-xr-x. 1 sanmayce sanmayce      12927 Jul 13 00:05 sort_vs_Schmekerezada.sh
-rwxr-xr-x. 1 sanmayce sanmayce       6144 Jul 13 00:05 timer64.exe

-rwxr-xr-x. 1 sanmayce sanmayce 1359441920 Jul 13 00:05 linux-6.1.38.tar
-rwxr-xr-x. 1 sanmayce sanmayce 3313061631 Jul 13 00:05 www.ncbi.nlm.nih.gov_genome_guide_human_GRCh38_latest_genomic.fna
-rwxr-xr-x. 1 sanmayce sanmayce 3870000000 Jul 13 00:05 30000000.KnightTours.txt
-rwxr-xr-x. 1 sanmayce sanmayce 2099451904 Jul 13 00:05 Fedora-Workstation-Live-x86_64-38-1.6.iso

[sanmayce@djudjeto2 Schmekerezada]$ sha1sum *

a21deb40fec1591e8d9505c15d54322faa3e4618  Akkodah_v3.h
5af613a91318f5d15cea962123f0af44ad5c7338  bench_PARAMETER.sh
06b1f4034569ba908c7236e54bb7689d4e071121  crumsort.c
22b68f5ecc9e355e6ed9be4044d1741f255bb40b  crumsort.h
413c26e8443ddadd0605bfb8d68eea795b295b20  GENERATE_Xmillion_Knight-Tours.bat
595dcc597f87bf6f71613354c000b5dfc620e07a  GENERATE_Xmillion_Knight-Tours.sh
3a0be7e1767007cc2e9c0de949d767af64b48bc6  Knight-Tour_FNV1A_YoshimitsuTRIADii_vs_CRC32_TRISMUS.c
98e4d1ecda7d862604ebc2ece300d5f57ba5365b  Knight-Tour_FNV1A_YoshimitsuTRIADii_vs_CRC32_TRISMUS.elf
1f5907713512e24ee75115a32e64ec8bbd9ff955  Knight-Tour_FNV1A_YoshimitsuTRIADii_vs_CRC32_TRISMUS.exe
2aa7fc07cd9981f1f62dc68799edb501124c19fb  log_su_Intel_Celeron_N4100_Cores-4.txt
3a2b4e105d52cded87a163ca393a58cdda969683  log_su_Intel_Kaby-Lake_i5-7200U_Cores-2.txt
dc1f49ecc858da545d4bc7c86d2d4beb683adc3d  Magnetica_v18.h
42ea7db44d8821477ae250e123f51c780728954f  MAKE_CLANG_Schmekeriada.bat
2319549123d6e462ab92f4fb0e9c1c01f5c61d3e  make_elf_CLANG_Schmekeriada.sh
d2037c50131cae3e853968be90a2fc52d252a65c  make_elf_exe_GCC_Schmekeriada.sh
b275b35d5a0d370809dc355a78e61bc12f150767  MAKE_ICL.bat
1fd6d4365743b2ddb59d167cdd2fd0f078f042de  quadsort.c
3817e01a5bd6b0b929d5ce1b9ac8014de5f4c14f  quadsort.h
770ee11a3d1e1176abe5d724eb3bc5ba334fab65  Quicksort_Magnetica_COVERS.pdf
8bb9012967b2c319c46e1e299e81d4583954bfe8  Schmekerezada_CLANG_16.0.1_SSE4.2_TetraThread.elf
3f593ba538a7e44f9e3c18121266b0bd6a0ce29d  Schmekerezada_CLANG_16.0.1_SSE4.2_TetraThread.elf.asm
57a2789a730aa068a8977e811210418ddcd75c2c  Schmekerezada_GCC_13.0.1_SSE4.2_MonoThread.elf
66b49258c127ac0f8699f0e232efaa3402792555  Schmekerezada_GCC_13.0.1_SSE4.2_MonoThread.exe
192b346bcf7a3067fe305108318d50e87629e29d  Schmekerezada_GCC_13.0.1_SSE4.2_TetraThread.elf
cbc57c319311cf9b61bee238a3ae565aaafffa30  Schmekerezada_GCC_13.0.1_SSE4.2_TetraThread.elf.asm
ffcb37bf83f2c56e045c9212ce09b9dd1e01ac98  Schmekerezada_GCC_13.0.1_SSE4.2_TetraThread.exe
bc8cc0fc85a57ab64192fc741b7444ff6f3a50a0  Schmekeriada.c
7b76a8f00a45a873784a0c8ee673a0b0051b32ab  sort_vs_Schmekerezada.sh
97c68b70851e3f1db82e73107849fda988308c48  timer64.exe

74f6610fe722658cf2940be1853e9c13d4da464a  linux-6.1.38.tar
daa181d9ec4b58d17778bfcb5bdc96472c76fd15  www.ncbi.nlm.nih.gov_genome_guide_human_GRCh38_latest_genomic.fna
11a6084ed17de2a37e05e10a89baa8b22d90e68b  30000000.KnightTours.txt
9904b33356a7852fdf0df49c6e63d74361cc5d5b  Fedora-Workstation-Live-x86_64-38-1.6.iso
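
The 'sha1sum *' listing above doubles as a checksum manifest. A minimal sketch of verifying a download against it, assuming the listing has been saved (without the blank lines) to a file next to the package contents; the manifest name Schmekerezada.sha1 and the helper name verify_sha1 are hypothetical. It also assumes GNU coreutils 8.25 or newer for --ignore-missing, which skips the big corpora when they are not present:

```shell
#!/bin/sh
# verify_sha1 MANIFEST
# Checks every file named in MANIFEST (lines of "<sha1>  <filename>", as
# printed by 'sha1sum *') against its recorded digest. Files that are
# absent are skipped; any mismatch makes the function return non-zero.
verify_sha1() {
    manifest="$1"
    # Run from the manifest's directory so the relative file names resolve
    ( cd "$(dirname "$manifest")" &&
      sha1sum --check --quiet --ignore-missing "$(basename "$manifest")" )
}

# Usage (hypothetical manifest name):
#   verify_sha1 ./Schmekerezada.sha1 && echo "all present files match"
```

With --quiet only failing files are printed, so a clean run produces no output and exit status 0.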

The benefits (compared to Windows' sort and Linux' sort) are:
- 100% FREE sourcecode, no licenses and shenanigans;
- Faster than both, see 'log_su_Intel_Celeron_N4100_Cores-4.txt';
- Neither of the two is really cross-platform; I found no counterpart binaries on the net that are transparently usable;
- Good starting point/playground for C coders.

Enfun!
2023-Jul-12,
Sanmayce

sort_vs_Schmekerezada.sh:

#   _________        .__                       __                                               .___        
#  /   _____/  ____  |  |__    _____    ____  |  | __  ____ _______   ____  _____________     __| _/_____   
#  \_____  \ _/ ___\ |  |  \  /     \ _/ __ \ |  |/ /_/ __ \\_  __ \_/ __ \ \___   /\__  \   / __ | \__  \  
#  /        \\  \___ |   Y  \|  Y Y  \\  ___/ |    < \  ___/ |  | \/\  ___/  /    /  / __ \_/ /_/ |  / __ \_
# /_______  / \___  >|___|  /|__|_|  / \___  >|__|_ \ \___  >|__|    \___  >/_____ \(____  /\____ | (____  /
#         \/      \/      \/       \/      \/      \/     \/             \/       \/     \/      \/      \/ 

if [ ! -f "./30000000.KnightTours.txt" ]; then
sh GENERATE_Xmillion_Knight-Tours.sh 30000000
fi
sh bench_PARAMETER.sh linux-6.1.38.tar
sh bench_PARAMETER.sh www.ncbi.nlm.nih.gov_genome_guide_human_GRCh38_latest_genomic.fna
sh bench_PARAMETER.sh Fedora-Workstation-Live-x86_64-38-1.6.iso
sh bench_PARAMETER.sh 30000000.KnightTours.txt

#-rwxrwxrwx. 1 kaze kaze 1359441920 Jul 11 14:53 linux-6.1.38.tar
#-rwxrwxrwx. 1 kaze kaze 3313061631 Apr  8  2022 www.ncbi.nlm.nih.gov_genome_guide_human_GRCh38_latest_genomic.fna
#-rwxrwxrwx. 1 kaze kaze 2099451904 Jul 11 17:35 Fedora-Workstation-Live-x86_64-38-1.6.iso
#-rwxrwxrwx. 1 kaze kaze 3870000000 Jul 12 01:33 30000000.KnightTours.txt

# +----------------------------------+-------------------------------+-------------------------------+-------------------------------+-------------------------------+
# |        \ Corpus Name             |              linux-6.1.38.tar |              Human_Genome_DNA |      30000000.KnightTours.txt | Fedora-Workstat...-38-1.6.iso |
# |        \ Corpus Size in Bytes    |                 1,359,441,920 |                 3,313,061,631 |                 3,870,000,000 |                 2,099,451,904 |
# | Sorter \ Corpus Size in Lines    |                    35,585,653 |                    40,902,071 |                    30,000,000 |                     8,189,810 |
# +----------------------------------+-------------------------------+-------------------------------+-------------------------------+-------------------------------+
# | sort v9.0 --parallel=4 -T ./     |  99,472,516,848 instructions  | 128,285,206,805 instructions  | 109,679,415,651 instructions  |  27,716,611,486 instructions  |
# |                                  |  21,494,037,021 branches      |  27,548,013,135 branches      |  23,469,594,117 branches      |   5,918,985,732 branches      |
# |                                  |     461,065,737 branch-misses |     579,823,078 branch-misses |     156,337,384 branch-misses |     166,284,942 branch-misses |
# |                                  |            20.5 seconds       |            42.9 seconds       |            29.8 seconds       |            13.0 seconds       |
# +----------------------------------+-------------------------------+-------------------------------+-------------------------------+-------------------------------+
# | sort v9.0 --parallel=16 -T ./    |  99,826,000,759 instructions  | 128,975,431,969 instructions  | 106,611,922,490 instructions  |  30,359,262,417 instructions  |
# |                                  |  21,496,860,784 branches      |  27,614,861,865 branches      |  22,799,055,675 branches      |   6,494,585,359 branches      |
# |                                  |     465,714,553 branch-misses |     587,152,024 branch-misses |     169,894,912 branch-misses |     176,038,949 branch-misses |
# |                                  |            20.1 seconds       |            39.7 seconds       |            30.1 seconds       |            13.4 seconds       |
# +----------------------------------+-------------------------------+-------------------------------+-------------------------------+-------------------------------+
# | Schmekerezada_CLANG_16.0.1 (v19) |  64,682,963,173 instructions  | 104,566,733,962 instructions  | 135,489,760,619 instructions  |  37,726,691,231 instructions  |
# |                                  |  13,770,680,866 branches      |  22,623,299,550 branches      |  31,235,409,067 branches      |   9,080,478,581 branches      |
# |                                  |     423,462,180 branch-misses |     571,367,377 branch-misses |     521,102,315 branch-misses |     120,282,275 branch-misses |
# |                                  |            14.6 seconds       |            30.5 seconds       |            24.3 seconds       |             9.8 seconds       |
# +----------------------------------+-------------------------------+-------------------------------+-------------------------------+-------------------------------+
# | Schmekerezada_GCC_13.0.1 (v19)   |  73,570,739,976 instructions  | 120,331,569,174 instructions  | 146,686,441,751 instructions  |  42,112,144,823 instructions  |
# |                                  |  14,652,384,005 branches      |  24,465,263,549 branches      |  33,075,459,615 branches      |   9,627,303,775 branches      |
# |                                  |     421,083,216 branch-misses |     568,237,768 branch-misses |     523,361,025 branch-misses |     121,180,507 branch-misses |
# |                                  |            19.5 seconds       |            37.8 seconds       |            32.3 seconds       |            14.7 seconds       |
# +----------------------------------+-------------------------------+-------------------------------+-------------------------------+-------------------------------+
# Note1: The benchmark was run in 'performance' mode, as superuser;
# Note2: Linux version 5.18.15-200.fc36.x86_64 (mockbuild@bkernel01.iad2.fedoraproject.org) (gcc (GCC) 12.1.1 20220507 (Red Hat 12.1.1-1), GNU ld version 2.37-27.fc36) #1 SMP PREEMPT_DYNAMIC Sun Jul 31 21:30:34 UTC 2022
# Note3: Testmachine: Laptop i5-7200U CPU, 3.1GHz max turbo 2cores/4threads, L1d: 64 KiB (2 instances), L1i: 64 KiB (2 instances), L2: 512 KiB (2 instances), L3: 3 MiB (1 instance), 36GB DDR4 2133MT/s, running Fedora 36;
# Note4: Schmekerezada is tetrathreaded;
# Note5: The current drive: SSD SATA KINGSTON SKC6001024G (1GB cache);
# Note6a: Partition in use:
# Note6b: Filesystem     Type  Size  Used Avail Use% Mounted on
# Note6c: /dev/sdb1      ext4  331G  226G   89G  72% /
# Note7: LC_ALL=C locale was used for sort;
# Note8: After sorting, the sha1sum of the outputs was compared - they all matched;
# Note9: The CLANG build beats the GCC build on all four corpora here (1.24x to 1.50x in wall-clock time), as happens all too often;
# NoteA: The KT_30M corpus has fixed-size lines - 128 bytes each (129 with the line terminator) - and all lines are unique; here the LittleEndian-To-BigEndian technique pays off;
# NoteB: The time statistics (wall clock) are reported by Linux’ perf;
# NoteC: It is worth mentioning that the CLANG executable is run before its GCC counterpart, so the GCC run is more likely to find the whole file already cached.
# 
# So, cumulatively:
# Schmekerezada (CLANG): 14.6+30.5+24.3+9.8=79.2 seconds
# GNU sort --parallel=16: 20.1+39.7+30.1+13.4=103.3 seconds
# Schmekerezada (compiled with CLANG) is only 103.3/79.2=1.30x, or 30%, faster than GNU sort with 16 threads - a cold shower for those who (like me) underestimated Mergesort's parallelizability.

# +----------------------------------+-------------------------------+-------------------------------+-------------------------------+-------------------------------+
# |        \ Corpus Name             |              linux-6.1.38.tar |              Human_Genome_DNA |      30000000.KnightTours.txt | Fedora-Workstat...-38-1.6.iso |
# |        \ Corpus Size in Bytes    |                 1,359,441,920 |                 3,313,061,631 |                 3,870,000,000 |                 2,099,451,904 |
# | Sorter \ Corpus Size in Lines    |                    35,585,653 |                    40,902,071 |                    30,000,000 |                     8,189,810 |
# +----------------------------------+-------------------------------+-------------------------------+-------------------------------+-------------------------------+
# | sort v9.1 --parallel=4 -T ./     | 102,796,467,972 instructions  | 145,940,013,705 instructions  | 134,526,463,714 instructions  |  32,039,863,819 instructions  |
# |                                  |  21,511,871,163 branches      |  30,368,703,735 branches      |  27,618,897,382 branches      |   6,671,955,880 branches      |
# |                                  |     451,366,228 branch-misses |     646,691,572 branch-misses |     248,678,541 branch-misses |     182,670,430 branch-misses |
# |                                  |            38.9 seconds       |            89.6 seconds       |           101.2 seconds       |            22.6 seconds       |
# +----------------------------------+-------------------------------+-------------------------------+-------------------------------+-------------------------------+
# | sort v9.1 --parallel=16 -T ./    |  95,974,435,959 instructions  | 139,181,029,596 instructions  | 121,021,375,460 instructions  |  43,584,980,865 instructions  |
# |                                  |  20,010,961,812 branches      |  28,921,319,248 branches      |  24,752,255,755 branches      |   9,001,685,334 branches      |
# |                                  |     457,525,469 branch-misses |     638,439,543 branch-misses |     252,280,650 branch-misses |     251,235,618 branch-misses |
# |                                  |            34.6 seconds       |            76.0 seconds       |            97.7 seconds       |            31.6 seconds       |
# +----------------------------------+-------------------------------+-------------------------------+-------------------------------+-------------------------------+
# | Schmekerezada_CLANG_16.0.1 (v19) |  65,342,898,585 instructions  | 106,570,792,202 instructions  | 137,387,012,432 instructions  |  38,298,101,683 instructions  |
# |                                  |  13,854,546,733 branches      |  22,916,875,802 branches      |  31,582,212,549 branches      |   9,150,687,078 branches      |
# |                                  |     442,781,547 branch-misses |     565,776,851 branch-misses |     518,222,887 branch-misses |     125,573,907 branch-misses |
# |                                  |            33.1 seconds       |            61.4 seconds       |            50.7 seconds       |            20.9 seconds       |
# +----------------------------------+-------------------------------+-------------------------------+-------------------------------+-------------------------------+
# | Schmekerezada_GCC_13.0.1 (v19)   |  75,034,087,501 instructions  | 123,284,759,010 instructions  | 150,447,993,706 instructions  |  43,962,088,192 instructions  |
# |                                  |  14,933,728,465 branches      |  25,022,793,479 branches      |  33,785,964,900 branches      |   9,980,116,461 branches      |
# |                                  |     436,610,153 branch-misses |     566,297,480 branch-misses |     545,784,186 branch-misses |     127,345,641 branch-misses |
# |                                  |            35.2 seconds       |            65.4 seconds       |            52.7 seconds       |            23.3 seconds       |
# +----------------------------------+-------------------------------+-------------------------------+-------------------------------+-------------------------------+
# Note1: The benchmark was run in 'performance' mode, as superuser;
# Note2: Linux version 6.2.12-300.fc38.x86_64 (mockbuild@54604edad16f4e818e702bda973f7473) (gcc (GCC) 13.0.1 20230401 (Red Hat 13.0.1-0), GNU ld version 2.39-9.fc38) #1 SMP PREEMPT_DYNAMIC Thu Apr 20 23:05:25 UTC 2023
# Note3: Testmachine: Laptop Thinkpad 11e, Celeron N4100 CPU, 2.4GHz max turbo 4cores/4threads, L1d: 96 KiB (4 instances), L1i: 128 KiB (4 instances), L2: 4 MiB (1 instance), 8GB DDR4 2400MT/s, running Fedora 38;
# Note4: Schmekerezada is tetrathreaded;
# Note5: The current drive: SSD nvme 1TB TS1TMTE400S (DRAM-less cache);
# Note6a: Partition in use:
# Note6b: Filesystem     Type  Size  Used Avail Use% Mounted on
# Note6c: /dev/nvme0n1p2 ext4  875G  567G  264G  69% /
# Note7: LC_ALL=C locale was used for sort;
# Note8: After sorting, the sha1sum of the outputs was compared - they all matched;
# Note9: The CLANG build beats the GCC build on all four corpora here (1.04x to 1.11x in wall-clock time), as happens all too often;
# NoteA: The KT_30M corpus has fixed-size lines - 128 bytes each (129 with the line terminator) - and all lines are unique; here the LittleEndian-To-BigEndian technique pays off;
# NoteB: The time statistics (wall clock) are reported by Linux’ perf;
# NoteC: It is worth mentioning that the CLANG executable is run before its GCC counterpart, so the GCC run is more likely to find the whole file already cached.
# 
# So, cumulatively:
# Schmekerezada (CLANG): 33.1+61.4+50.7+20.9=166.1 seconds
# GNU sort --parallel=16: 34.6+76.0+97.7+31.6=239.9 seconds
# Schmekerezada (compiled with CLANG) is only 239.9/166.1=1.44x, or 44%, faster than GNU sort with 16 threads - a cold shower for those who (like me) underestimated Mergesort's parallelizability.

Sanmayce
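
The cumulative comparisons in the notes above are plain arithmetic over the 'seconds' rows of each table. A small sketch that recomputes them with awk; the helper name speedup is hypothetical, and the numbers are copied from the two tables:

```shell
#!/bin/sh
# speedup "BASELINE_TIMES" "CANDIDATE_TIMES"
# Sums two space-separated lists of wall-clock seconds and prints
# the baseline total, the candidate total and their ratio (baseline/candidate).
speedup() {
    awk -v base="$1" -v cand="$2" 'BEGIN {
        nb = split(base, b, " "); nc = split(cand, c, " ")
        for (i = 1; i <= nb; i++) sb += b[i]
        for (i = 1; i <= nc; i++) sc += c[i]
        printf "%.1f %.1f %.2f\n", sb, sc, sb / sc
    }'
}

# i5-7200U: GNU sort --parallel=16 vs. Schmekerezada (CLANG)
speedup "20.1 39.7 30.1 13.4" "14.6 30.5 24.3 9.8"    # prints: 103.3 79.2 1.30
# Celeron N4100: GNU sort --parallel=16 vs. Schmekerezada (CLANG)
speedup "34.6 76.0 97.7 31.6" "33.1 61.4 50.7 20.9"   # prints: 239.9 166.1 1.44
```

Feeding in the --parallel=4 rows instead gives 106.2/79.2=1.34x on the i5-7200U and 252.3/166.1=1.52x on the Celeron N4100, so the choice of baseline shifts the headline figure somewhat.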


* Booklet: Quicksort_Magnetica_Schmekerezada.pdf, 4.39 MB (4,609,096 bytes).
* Download Schmekerezada, full C sourcecode, Linux/Windows binaries : Schmekerezada_no-corpora.tar.gz, 5.91 MB (6,208,197 bytes).

Copyleft Sanmayce, 2023 Jul 14; for contacts: sanmayce@sanmayce.com