source: trunk/README.txt @ 210

Revision 210, 5.2 KB checked in by faltet, 4 years ago (diff)

Clearify the license statement.

===============================================================
 Blosc: A blocking, shuffling and lossless compression library
===============================================================

:Author: Francesc Alted i Abad
:Contact: faltet@pytables.org
:URL: http://blosc.pytables.org


What is it?
===========

Blosc [1]_ is a high-performance compressor optimized for binary data.
It has been designed to transmit data to the processor cache faster
than the traditional, non-compressed, direct memory fetch approach via
a memcpy() call.  Blosc is the first compressor (that I'm aware of)
that is meant not only to reduce the size of large datasets on-disk or
in-memory, but also to accelerate memory-bound computations.

It uses the blocking technique (as described in [2]_) to reduce
activity on the memory bus as much as possible.  In short, this
technique works by dividing datasets into blocks that are small enough
to fit in the caches of modern processors and performing compression /
decompression there.  It also leverages, if available, SIMD
instructions (SSE2) and the multi-threading capabilities of CPUs in
order to accelerate the compression / decompression process as much
as possible.
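
The blocking idea can be sketched in plain C as follows.  This is only
an illustration, not Blosc's actual code: the names ``block_fn``,
``for_each_block`` and ``sum_bytes`` are hypothetical, and the per-block
callback merely stands in for the compression / decompression step.
The point is that each block is small enough to be processed while it
sits in cache.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-block callback.  In a real compressor this would be
   the compression or decompression step applied to one block. */
typedef void (*block_fn)(const unsigned char *block, size_t len, void *ctx);

/* Walk `src` in consecutive blocks of at most `blocksize` bytes and call
   `fn` on each one.  Returns the number of blocks visited. */
size_t for_each_block(const unsigned char *src, size_t nbytes,
                      size_t blocksize, block_fn fn, void *ctx)
{
    size_t remaining = nbytes;
    size_t nblocks = 0;

    assert(blocksize > 0);
    while (remaining > 0) {
        /* The last block may be shorter than blocksize. */
        size_t len = remaining < blocksize ? remaining : blocksize;
        fn(src, len, ctx);
        src += len;
        remaining -= len;
        nblocks++;
    }
    return nblocks;
}

/* Example callback: add up the bytes of a block into *ctx. */
void sum_bytes(const unsigned char *block, size_t len, void *ctx)
{
    unsigned long *total = (unsigned long *)ctx;
    size_t i;

    for (i = 0; i < len; i++) {
        *total += block[i];
    }
}
```

For instance, walking a 10-byte buffer with a block size of 4 calls the
callback three times (on 4, 4 and 2 bytes).  In practice the block size
would be chosen according to the cache sizes of the target CPU.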

You can see some recent benchmarks of Blosc performance in [3]_.

Blosc is distributed under the MIT license; see LICENSES/BLOSC.txt for
details.

.. [1] http://blosc.pytables.org
.. [2] http://www.pytables.org/docs/CISE-12-2-ScientificPro.pdf
.. [3] http://blosc.pytables.org/trac/wiki/SyntheticBenchmarks


Meta-compression and other advantages over other existing compressors
=====================================================================

Blosc is not like other compressors: it should rather be called a
meta-compressor, because it can use different compressors and
pre-conditioners (programs that generally improve the compression
ratio).  It can still be called a compressor, though, because it
already integrates one compressor and one pre-conditioner, so it
works as a compressor out of the box.

Currently it uses BloscLZ, a compressor heavily based on FastLZ
(http://fastlz.org/), and a highly optimized Shuffle pre-conditioner
(it can use SSE2 instructions, if available).  However, different
compressors or pre-conditioners may be added in the future.
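
To illustrate what a shuffle pre-conditioner does, here is a minimal,
unoptimized sketch in plain C.  It is not the code in shuffle.c (which
can use SSE2), and the names ``shuffle_bytes`` and ``unshuffle_bytes``
are hypothetical.  The sketch assumes the usual byte-shuffle layout:
the first bytes of all elements are stored together, then the second
bytes, and so on, so that similar bytes end up adjacent.  Trailing
bytes that do not fill a whole element are copied unchanged.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Group byte 0 of every element, then byte 1, and so on.
   `typesize` is the element size in bytes; `src` and `dest` must not
   overlap and must both hold `nbytes` bytes. */
void shuffle_bytes(size_t typesize, size_t nbytes,
                   const unsigned char *src, unsigned char *dest)
{
    size_t nelem, leftover_start, i, j;

    assert(typesize > 0);
    nelem = nbytes / typesize;
    for (j = 0; j < typesize; j++) {
        for (i = 0; i < nelem; i++) {
            dest[j * nelem + i] = src[i * typesize + j];
        }
    }
    /* Bytes past the last whole element are copied verbatim. */
    leftover_start = nelem * typesize;
    memcpy(dest + leftover_start, src + leftover_start,
           nbytes - leftover_start);
}

/* Inverse of shuffle_bytes: restore the original element layout. */
void unshuffle_bytes(size_t typesize, size_t nbytes,
                     const unsigned char *src, unsigned char *dest)
{
    size_t nelem, leftover_start, i, j;

    assert(typesize > 0);
    nelem = nbytes / typesize;
    for (j = 0; j < typesize; j++) {
        for (i = 0; i < nelem; i++) {
            dest[i * typesize + j] = src[j * nelem + i];
        }
    }
    leftover_start = nelem * typesize;
    memcpy(dest + leftover_start, src + leftover_start,
           nbytes - leftover_start);
}
```

For example, the three 4-byte elements ``01 00 00 00``, ``02 00 00 00``
and ``03 00 00 00`` shuffle to ``01 02 03`` followed by nine zero
bytes: a long run of identical bytes that a compressor handles easily.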

Blosc is in charge of coordinating the compressor and pre-conditioners
so that they can leverage the blocking technique (described above) as
well as multi-threaded execution (if several cores are available)
automatically.  As a result, every compressor and pre-conditioner
works at very high speed, even if it was not initially designed for
blocking or multi-threading.

Other advantages of Blosc are:

    * Meant for binary data: can take advantage of the type size
      meta-information for improved compression ratio (using the
      integrated shuffle pre-conditioner).

    * Small overhead on non-compressible data: at most 16 additional
      bytes over the source buffer length are needed to compress
      *every* input.

    * Maximum destination length: contrary to many other compressors,
      both the compression and decompression routines support a
      maximum size for the destination buffer.

When taken together, all these features set Blosc apart from other
similar solutions.
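
The 16-byte bound listed above makes it easy to size a destination
buffer that can never be too small.  The sketch below only does the
arithmetic and does not call Blosc itself; the names
``MAX_OVERHEAD_BYTES``, ``worst_case_dest_size`` and
``dest_is_large_enough`` are hypothetical, chosen for illustration.
See blosc.h for the actual routines and their signatures.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: constant named here for illustration only; its value is
   the 16-byte bound quoted in the list above. */
#define MAX_OVERHEAD_BYTES 16

/* Smallest destination size that can hold the compressed form of any
   input of `nbytes` bytes, given the 16-byte bound. */
size_t worst_case_dest_size(size_t nbytes)
{
    /* Guard against size_t wrap-around for absurdly large inputs. */
    assert(nbytes <= (size_t)-1 - MAX_OVERHEAD_BYTES);
    return nbytes + MAX_OVERHEAD_BYTES;
}

/* Returns 1 if a destination buffer of `destsize` bytes is large enough
   for the worst case, 0 otherwise. */
int dest_is_large_enough(size_t nbytes, size_t destsize)
{
    return destsize >= MAX_OVERHEAD_BYTES
        && nbytes <= destsize - MAX_OVERHEAD_BYTES;
}
```

A caller would typically allocate ``worst_case_dest_size(nbytes)``
bytes (1016 bytes for a 1000-byte source) and pass that value as the
maximum destination length to the compression routine.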


Compiling your application with Blosc
=====================================

Blosc consists of the following files (in the blosc/ directory):

blosc.h and blosc.c      -- the main routines
blosclz.h and blosclz.c  -- the actual compressor
shuffle.h and shuffle.c  -- the shuffle code

Just add these files to your project in order to use Blosc.  For
information on the compression and decompression routines, see blosc.h.

To compile using GCC/MINGW (4.4 or higher recommended):

  gcc -O3 -msse2 -o myprog myprog.c blosc/*.c -lpthread

Using Windows and MSVC (2008 or higher recommended):

  cl /Ox /Femyprog.exe myprog.c blosc\*.c  /link pthreadvc2.lib

[remember to set the LIB and INCLUDE environment variables to the
pthread-win32 directories first]

A simple usage example is the benchmark in the bench/bench.c file.
Another example, which uses Blosc as a generic HDF5 filter, is in
the hdf5/ directory.

I have not yet tried to compile this with compilers other than GCC,
MINGW, Intel ICC or MSVC.  Please report your experiences with your
own platforms.


Testing Blosc
=============

Go to the test/ directory and issue:

$ make test

These tests are very basic, and only valid for platforms where the GNU
make/gcc tools are available.  If you really want to test Blosc the
hard way, look at:

http://blosc.pytables.org/trac/wiki/SyntheticBenchmarks

where instructions on how to intensively test (and benchmark) Blosc
are given.  If you get an error while running these tests, please
report it back!


Filter for HDF5
===============

For those who want to use Blosc as a filter in the HDF5 library,
there is an implementation in the hdf5/ directory.


Acknowledgments
===============

I'd like to thank the PyTables community, which has collaborated in
the exhaustive testing of Blosc.  With an aggregate amount of more than
300 TB of different datasets compressed *and* decompressed
successfully, I can say that Blosc is pretty safe now and ready for
production purposes.


----

  **Enjoy data!**