
Version 15 (modified by faltet, 3 years ago)

Updated benchmarks for Blosc 0.9.0

Synthetic Benchmarks

Here is an example of how to compile and run the simple benchmark that comes with Blosc.

Unix (gcc):

$ cd src
$ gcc -O3 -o bench bench.c blosc.c blosclz.c shuffle.c -msse2 -lpthread
$ ./bench [nthreads]

Windows (MSVC):

> cd src
> cl /Ox /Febench.exe bench.c blosc.c blosclz.c shuffle.c /link pthreadvc2.lib
> bench.exe [nthreads]

Below are some outputs for different numbers of threads on the following setup:

Processor model: Intel Core 2 Quad Q8400 @ 2.66 GHz

L1 cache size (per core): 32 KB

L2 cache size (shared): 4 MB

Compiler: gcc version 4.4.1

OS: openSUSE 11.2, 64-bit


For 1 thread:

********************** Setup info *****************************
Blosc version: 0.9.0 (2010-05-04)
Using random data with 20 significant bits (out of 32)
Dataset size: 1048576 bytes     Type size: 4 bytes
Shuffle active?  Yes            Number of threads: 1
********************** Running benchmarks *********************
memcpy(write):            352.3 us, 2838.1 MB/s
memcpy(read):             287.5 us, 3478.3 MB/s
Compression level: 1
compression(write):       835.6 us, 1196.7 MB/s   Final bytes: 293072  Compr ratio: 3.58
decompression(read):      453.1 us, 2207.0 MB/s   OK
Compression level: 2
compression(write):       918.1 us, 1089.2 MB/s   Final bytes: 293072  Compr ratio: 3.58
decompression(read):      453.1 us, 2207.1 MB/s   OK
Compression level: 3
compression(write):       998.8 us, 1001.2 MB/s   Final bytes: 293072  Compr ratio: 3.58
decompression(read):      453.0 us, 2207.7 MB/s   OK
Compression level: 4
compression(write):      1844.0 us, 542.3 MB/s    Final bytes: 334768  Compr ratio: 3.13
decompression(read):      641.0 us, 1560.0 MB/s   OK
Compression level: 5
compression(write):      2170.0 us, 460.8 MB/s    Final bytes: 244432  Compr ratio: 4.29
decompression(read):     1026.7 us, 974.0 MB/s    OK
Compression level: 6
compression(write):      2170.2 us, 460.8 MB/s    Final bytes: 244432  Compr ratio: 4.29
decompression(read):     1027.5 us, 973.2 MB/s    OK
Compression level: 7
compression(write):      2123.7 us, 470.9 MB/s    Final bytes: 209544  Compr ratio: 5.00
decompression(read):     1190.3 us, 840.1 MB/s    OK
Compression level: 8
compression(write):      2123.8 us, 470.9 MB/s    Final bytes: 209544  Compr ratio: 5.00
decompression(read):     1190.4 us, 840.0 MB/s    OK
Compression level: 9
compression(write):      1882.9 us, 531.1 MB/s    Final bytes: 186216  Compr ratio: 5.63
decompression(read):     1073.4 us, 931.6 MB/s    OK
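
The setup info above mentions "random data with 20 significant bits (out of 32)". As a rough illustration of what such a dataset looks like, here is a minimal sketch that fills a buffer with 32-bit values whose top 12 bits are always zero. The function name fill_random_bits() and the xorshift32 generator are assumptions made for this example; they are not taken from bench.c, which may generate its data differently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fill `buf` with `n` 32-bit values that use only the low `sigbits` bits.
   Hypothetical helper: the xorshift32 generator is an assumption chosen
   for reproducibility, not the generator used by bench.c. */
static void fill_random_bits(uint32_t *buf, size_t n, int sigbits, uint32_t seed)
{
    uint32_t state = seed ? seed : 1u;  /* xorshift state must be non-zero */
    size_t i;

    assert(sigbits >= 1 && sigbits <= 32);
    for (i = 0; i < n; i++) {
        /* One xorshift32 step */
        state ^= state << 13;
        state ^= state >> 17;
        state ^= state << 5;
        /* Keep the top `sigbits` bits, shifted down to the low end */
        buf[i] = state >> (32 - sigbits);
    }
}
```

With sigbits set to 20, every value is below 2^20, so the most significant byte of each 4-byte item is always zero. That kind of redundancy is what the shuffle filter and the compressor can exploit.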

For 4 threads:

********************** Setup info *****************************
Blosc version: 0.9.0 (2010-05-04)
Using random data with 20 significant bits (out of 32)
Dataset size: 1048576 bytes     Type size: 4 bytes
Shuffle active?  Yes            Number of threads: 4
********************** Running benchmarks *********************
memcpy(write):            345.4 us, 2895.2 MB/s
memcpy(read):             284.1 us, 3520.5 MB/s
Compression level: 1
compression(write):       317.2 us, 3152.3 MB/s   Final bytes: 293072  Compr ratio: 3.58
decompression(read):      144.4 us, 6923.8 MB/s   OK
Compression level: 2
compression(write):       306.0 us, 3267.9 MB/s   Final bytes: 293072  Compr ratio: 3.58
decompression(read):      144.1 us, 6942.0 MB/s   OK
Compression level: 3
compression(write):       354.4 us, 2821.3 MB/s   Final bytes: 293072  Compr ratio: 3.58
decompression(read):      152.3 us, 6565.3 MB/s   OK
Compression level: 4
compression(write):       567.8 us, 1761.2 MB/s   Final bytes: 334768  Compr ratio: 3.13
decompression(read):      197.2 us, 5070.8 MB/s   OK
Compression level: 5
compression(write):       627.1 us, 1594.5 MB/s   Final bytes: 244432  Compr ratio: 4.29
decompression(read):      284.6 us, 3514.2 MB/s   OK
Compression level: 6
compression(write):       627.2 us, 1594.4 MB/s   Final bytes: 244432  Compr ratio: 4.29
decompression(read):      284.5 us, 3514.9 MB/s   OK
Compression level: 7
compression(write):       619.2 us, 1614.9 MB/s   Final bytes: 209544  Compr ratio: 5.00
decompression(read):      325.7 us, 3069.9 MB/s   OK
Compression level: 8
compression(write):       620.0 us, 1613.0 MB/s   Final bytes: 209544  Compr ratio: 5.00
decompression(read):      325.8 us, 3069.6 MB/s   OK
Compression level: 9
compression(write):       592.2 us, 1688.6 MB/s   Final bytes: 186216  Compr ratio: 5.63
decompression(read):      316.3 us, 3162.0 MB/s   OK
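
The throughput and ratio columns in the listings above can be re-derived from the dataset size, the elapsed time and the final size. The sketch below assumes that 1 MB means 2^20 bytes, which matches the printed figures to within rounding of the displayed times. The helper names throughput_mbs() and compr_ratio() are made up for this example and are not part of Blosc.

```c
#include <assert.h>
#include <stddef.h>

/* Throughput in MB/s for `nbytes` processed in `time_us` microseconds.
   Assumption: 1 MB = 2^20 bytes. */
static double throughput_mbs(size_t nbytes, double time_us)
{
    assert(time_us > 0.0);
    return ((double)nbytes / (1024.0 * 1024.0)) / (time_us * 1e-6);
}

/* Compression ratio: original size divided by compressed size. */
static double compr_ratio(size_t nbytes, size_t cbytes)
{
    assert(cbytes > 0);
    return (double)nbytes / (double)cbytes;
}
```

For instance, throughput_mbs(1048576, 835.6) gives about 1196.7 MB/s and compr_ratio(1048576, 293072) gives about 3.58, in line with the level 1 figures for 1 thread.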

As you can see, using 4 threads on a quad-core processor makes compression/decompression go between 2.6x and 3.7x faster, depending on the compression level (!). More importantly, with threading Blosc can actually be faster than a plain memcpy, both when writing to memory (1.12x faster) and when reading (up to 2x!).
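
The speedup figures can be checked directly against the timings listed above. This is a minimal sketch; the helper name speedup() is made up for this example.

```c
#include <assert.h>

/* How many times faster the candidate is than the baseline,
   given both elapsed times in the same unit (here, microseconds). */
static double speedup(double baseline_us, double candidate_us)
{
    assert(candidate_us > 0.0);
    return baseline_us / candidate_us;
}
```

Taking the numbers from the two runs: speedup(2170.0, 627.1) is about 3.46 for level 5 compression (1 thread versus 4 threads), speedup(345.4, 306.0) is about 1.13 for level 2 compression against memcpy(write), and speedup(284.1, 144.1) is about 1.97 for level 2 decompression against memcpy(read).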

These results definitely open the door to performing computations faster by using compression, and I foresee that this technique will become increasingly useful as processors gain more cores (which seems to be the trend for future computing).

If you want to help with the fine-tuning of Blosc for other processors, please send the output of this benchmark and processor details to blosc@…. Thanks!
