Version 20 (modified by faltet, 3 years ago)
Blosc: A blocking, shuffling and loss-less compression library
What Is It?
Blosc is a high-performance compressor optimized for binary data. It is designed to transmit data to the processor cache faster than the traditional, non-compressed, direct memory fetch approach. Blosc is the first (that I'm aware of) of a series of compressors meant not only to reduce the size of large datasets on disk or in memory, but also to accelerate computations that are currently memory-bound.
It uses the blocking technique (as described in this article) to reduce activity on the memory bus as much as possible. In short, the blocking technique works by dividing datasets into blocks that are small enough to fit in the L1 cache of modern processors and performing compression/decompression there. For more information about Blosc, as well as some preliminary benchmarks, see the last part of this presentation.
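The blocking idea can be illustrated with a minimal Python sketch. It is not Blosc's implementation: zlib stands in for Blosc's own codec, the 32 KiB block size is an assumed value chosen because it is a common L1 data cache size, and the function names are hypothetical. The sketch only shows the data layout (independent, cache-sized blocks); pure Python will not reproduce the speed benefit.

```python
import zlib
from typing import List

# Assumed block size: 32 KiB, a common L1 data cache size.
# This is an illustrative choice, not a value taken from Blosc.
BLOCK_SIZE = 32 * 1024


def compress_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> List[bytes]:
    """Split data into fixed-size blocks and compress each one independently.

    zlib is a stand-in compressor here, not the codec Blosc uses.
    """
    return [
        zlib.compress(data[start:start + block_size])
        for start in range(0, len(data), block_size)
    ]


def decompress_blocks(blocks: List[bytes]) -> bytes:
    """Decompress each block in turn and join the results."""
    return b"".join(zlib.decompress(block) for block in blocks)


# Usage: 1 MiB of repetitive binary data, split into 32 KiB blocks.
data = bytes(range(256)) * 4096
blocks = compress_blocks(data)
restored = decompress_blocks(blocks)
```

Because every block is compressed on its own, any single block can be decompressed without touching the others, so the working set at each step stays at roughly one block.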
Where Can Blosc Be Used?
Blosc is being developed mainly for the needs of the PyTables database, although it may be used elsewhere. Although it is still in beta, it is expected to allow PyTables to perform arithmetic (for example, see http://pytables.org/moin/ComputingKernel) and indexing operations on large datasets well beyond the speed of more traditional approaches (like memmap'ed access to files).
Is It Stable?
No, not yet, so please be careful when using it. Blosc is still a young project (in terms of what is needed for a compressor to be considered stable), and it is currently undergoing very intensive testing on many different kinds of datasets. That said, I have frozen the Blosc format since version 0.8, so the format is guaranteed not to change for a long while. Also, Blosc is being included in version 2.2 of PyTables, so it is probably being tested quite intensively in many places. Still, Blosc needs much more testing before it can be declared stable enough for production purposes.
Want To Contribute?
Your cooperation is very important for making Blosc stable as soon as possible, so if you detect a bug or want to propose an enhancement, feel free to open a new ticket. You can also contribute to this project by simply compiling and running a small benchmark, as explained in the SyntheticBenchmarks page, and mailing the results for your platform back to me.
Blosc is free software and released under the terms of the very permissive MIT license, so you can use it in almost any way you want!
The root of the subversion repository for Blosc is located at:
There is no source tarball as such yet. I'll provide one once Blosc becomes stable.
About This Site
This is a place where you can have a look at the Blosc sources, download patches, view existing (open or already closed) tickets and file new tickets. Its goal is to simplify effective tracking and handling of Blosc issues, enhancements and overall progress.
Important note: To prevent spam, you must meet the following requirement before modifying anything on this site:
- You need to enable cookies to access some parts of this site; otherwise, you may trigger spam protection and get a "Payment Required" page. Sorry for the inconvenience.
Like all Wiki pages, this page is editable: you can modify its contents simply by using your web browser. Just click on the "Edit this page" link at the bottom of the page. WikiFormatting will give you a detailed description of the available Wiki formatting commands.