Changes between Version 21 and Version 22 of WikiStart
Timestamp: 06/12/10 15:25:28 (3 years ago)
WikiStart
v21 v22
8 8 Blosc is a high performance compressor optimized for binary data. It has been designed to transmit data to the processor cache faster than the traditional, non-compressed, direct memory fetch approach. Blosc is the first (that I'm aware of) of a series of compressors that are meant not only to reduce the size of large datasets on-disk or in-memory, but also to accelerate computations that are currently memory-bound.
9 9
10 (v21) It uses the blocking technique (as described in this [http://www.pytables.org/docs/CISE-12-2-ScientificPro.pdf article]) to reduce activity on the memory bus as much as possible. In short, the blocking technique works by dividing datasets in blocks that are small enough to fit in L1 cache of modern processor and perform compression/decompression there. It also leverages threading in nowadays multicore processors so as to accelerate the compression/decompression process to a maximum.
10 (v22) It uses the blocking technique (as described in this [http://www.pytables.org/docs/CISE-12-2-ScientificPro.pdf article]) to reduce activity on the memory bus as much as possible. In short, the blocking technique works by dividing datasets in blocks that are small enough to fit in L1 cache of modern processor and perform compression/decompression there. It also leverages multimedia extensions (SSE2) and multi-threading capabilities in nowadays multicore processors so as to accelerate the compression/decompression process to a maximum.
11 11
12 12 You may want to see more info about Blosc in the last part of this [http://www.pytables.org/docs/StarvingCPUs.pdf presentation]. You can see some recent benchmarks in SyntheticBenchmarks.
… …
14 14 == Where Blosc Can Be Used? ==
15 15
16 (v21) Blosc is being developed mainly for the needs of the [http://www.pytables.org/ PyTables] database, although it may be used elsewhere. Although it is still in beta state, it is expected to allow !PyTables to perform arithmetic (for example, see [http://pytables.org/moin/ComputingKernel]) and indexing operations with large datasets well beyond the speed of more traditional approaches (like memmap'ed access to files).
16 (v22) Blosc is being developed mainly for the needs of the [http://www.pytables.org/ PyTables] database, although it may be used elsewhere. Although it is still a young project, it is expected to allow !PyTables to perform arithmetic (for example, see [http://pytables.org/moin/ComputingKernel]) and indexing operations with large datasets well beyond the speed of more traditional approaches (like memmap'ed access to files).
17 17
18 18 == Is It Stable? ==
19 19
20 (v21) No, not yet, so please be careful when using it. Blosc is still a young project (in terms of what is needed for a compressor to be considered stable), and it is currently undergoing very intensive testing on many different kinds of datasets. Being said this, since 0.8 version I've frozen the format of Blosc, so at least it is guaranteed that the format will not change in a long while. The API is not yet frozen too (once this is done, that will mark the 1.0 release).
20 (v22) No, not yet, so please be careful when using it. Being said this, since 0.8 version the format has been frozen, so at least it is guaranteed that it will not change in a long while. The API has been frozen in release 0.9.5 too. The only part that remains is testing Blosc extensively and broadly.
21 21
22 (v21) I'm currently testing it very hard, and I'm happy to say that, since 0.9.1 on, it worked flawlessly compressing several thousands of terabytes on Windows and Unix machines, both in 32-bit and 64-bit. Also, it is being included in the [http://www.pytables.org/download/preliminary/ 2.2] version of !PyTables, so it is probably being tested quite intensively in many other places.
22 (v22) Part of the !PyTables community is currently testing Blosc very hard now, and I'm happy to say that, since 0.9.5 on, it worked flawlessly compressing several thousands of terabytes on many different Windows and Unix boxes, both in 32-bit and 64-bit. Also, it is being included in the [http://www.pytables.org/download/preliminary/ 2.2] version of !PyTables, so it is probably being tested quite intensively in many other places. When all this test process would end (very soon now), that will mark the beginning of the 1.x series.
23 23
24 24 == Want To Contribute? ==
25 25
26 (v21) Your cooperation is very important to make Blosc stable as soon as possible so, if you detect some bug or want to propose an enhancement, feel free to open a new ticket. Also, you can contribute to this project by simply compiling and running a small benchmark as explained in the SyntheticBenchmarks page and mailing back the results for your platform to [http://pytables.org/moin/FrancescAlted me].
26 (v22) Your cooperation is very important to make Blosc stable as soon as possible so, if you detect some bug or want to propose an enhancement, feel free to open a new ticket. Also, you can contribute to this project by simply compiling and running different benchmark and test suites as explained in the SyntheticBenchmarks page.
27 27
28 28 == Blosc License ==
… …
38 38 == Source tarball ==
39 39
40 (v21) There is not a source tarball as such yet. I'll provide one once Blosc will become stable.
40 (v22) There is not a source tarball as such yet. I'll provide one once Blosc will be declared stable.
41 41
42 42 == About This Site ==
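The blocking technique mentioned in line 10 of the WikiStart text (split the dataset into blocks small enough to stay in cache, then compress and decompress each block independently) can be illustrated with a minimal sketch. This is not Blosc's API or implementation: `zlib` from the Python standard library stands in for Blosc's own codec, the SSE2 and multi-threading parts are left out, and the 32 KiB block size and the function names `compress_blocks` and `decompress_blocks` are assumptions made only for this example.

```python
import zlib
from typing import List

# Assumed block size for illustration only; a real implementation would
# pick it from the actual L1 cache size of the target processor.
BLOCK_SIZE = 32 * 1024


def compress_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> List[bytes]:
    """Split `data` into fixed-size blocks and compress each one separately."""
    return [
        zlib.compress(data[start:start + block_size])
        for start in range(0, len(data), block_size)
    ]


def decompress_blocks(blocks: List[bytes]) -> bytes:
    """Decompress each block independently and join the results in order."""
    return b"".join(zlib.decompress(block) for block in blocks)


if __name__ == "__main__":
    payload = bytes(range(256)) * 1024  # 256 KiB of repetitive binary data
    blocks = compress_blocks(payload)
    # Round trip: the joined decompressed blocks must equal the input.
    assert decompress_blocks(blocks) == payload
    print(len(blocks), "blocks,", sum(len(b) for b in blocks), "compressed bytes")
```

Because every block is compressed on its own, blocks could in principle be handed to separate threads, which is the property the wiki text relies on when it mentions multi-threading. A Python sketch like this one only shows the data layout; it will not reproduce the cache-level speedups the page describes.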