Changeset 155


Timestamp: 06/08/10 11:22:48 (3 years ago)
Author: faltet
Message:

Additional fine-tuning for the non-compressed case.

I've adopted a multi-domain algorithm for adapting to different cases (buffer size and number of cores, basically).
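
The selection rule described in the message can be sketched as a single predicate. This is only an illustration: `use_threaded_copy` is a hypothetical helper that does not exist in blosc.c, and the 64 KB threshold and the `nthreads > 1` test are taken from the condition added in this changeset.

```c
#include <stddef.h>

#define KB 1024

/* Hypothetical helper (not part of blosc.c).
   Returns 1 when the do_job() path would be chosen for a non-compressed
   buffer, and 0 when a single plain memcpy() would be used instead. */
static int use_threaded_copy(size_t nbytes, int nthreads)
{
  /* Large buffers or more than one thread: go through do_job().
     Small buffers on a single thread: one memcpy() call. */
  return (nbytes > 64*KB) || (nthreads > 1);
}
```

Note that the comparison is strict, so under this rule a buffer of exactly 64 KB handled by one thread still takes the memcpy() path.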

File: 1 edited

  • trunk/src/blosc.c

    --- trunk/src/blosc.c (r154)
    +++ trunk/src/blosc.c (r155)
    @@ -42,6 +42,10 @@
     #define MAX_THREADS 64
     
    +/* Some useful units */
    +#define KB 1024
    +#define MB (1024*KB)
    +
     /* The size of L1 cache.  32 KB is quite common nowadays. */
    -#define L1 (32*1024)
    +#define L1 (32*KB)
     
     
    @@ -497,5 +501,8 @@
       else if (nbytes >= L1*4) {
         blocksize = L1 * 4;
    -    if (clevel <= 3) {
    +    if (clevel == 0) {
    +      blocksize /= 16;
    +    }
    +    else if (clevel <= 3) {
           blocksize /= 8;
         }
    @@ -627,11 +634,14 @@
     
       if (*flags & BLOSC_MEMCPYED) {
    -    params.ntbytes = BLOSC_MAX_OVERHEAD;
    -    ntbytes = do_job();
    -    /* The next is more effective?  It does not seem so on platforms
    -       where memcpy is not well optimized for transmitting long chunks.
    -       Also, using multicores benefits speed, specially on Windows. */
    -    /* memcpy(dest+BLOSC_MAX_OVERHEAD, src, nbytes);
    -    ntbytes = nbytes + BLOSC_MAX_OVERHEAD; */
    +    if ((nbytes > 64*KB) || (nthreads > 1)) {
    +      /* More effective in multi-core processors or large buffers */
    +      params.ntbytes = BLOSC_MAX_OVERHEAD;
    +      ntbytes = do_job();
    +    }
    +    else {
    +      /* More effective in single-core processors or small buffers */
    +      memcpy(dest+BLOSC_MAX_OVERHEAD, src, nbytes);
    +      ntbytes = nbytes + BLOSC_MAX_OVERHEAD;
    +    }
       }
     
    @@ -712,10 +722,13 @@
       /* Check whether this buffer is memcpy'ed */
       if (flags & BLOSC_MEMCPYED) {
    -    ntbytes = do_job();
    -     /* The next is more effective?  It does not seem so on platforms
    -       where memcpy is not well optimized for transmitting long chunks.
    -       Also, using multicores benefits speed, specially on Windows. */
    -    /* memcpy(dest, src+BLOSC_MAX_OVERHEAD, nbytes);
    -    ntbytes = nbytes; */
    +    if ((nbytes > 64*KB) || (nthreads > 1)) {
    +      /* More effective in multi-core processors or large buffers */
    +      ntbytes = do_job();
    +    }
    +    else {
    +      /* More effective in single-core processors or small buffers */
    +      memcpy(dest, src+BLOSC_MAX_OVERHEAD, nbytes);
    +      ntbytes = nbytes;
    +    }
       }
       else {
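
As a rough illustration of the block size change in the second hunk, the sketch below models only the `nbytes >= L1*4` branch shown in the diff. `blocksize_for_large_buffer` is a hypothetical name, not a function in blosc.c. The branches for smaller buffers and any handling of compression levels above 3 fall outside the hunk, so here they are assumed away: the sketch returns 0 for smaller buffers and leaves the base size untouched for higher levels.

```c
#include <stddef.h>

#define KB 1024
#define L1 (32*KB)   /* assumed L1 cache size, as defined in this changeset */

/* Hypothetical helper (not part of blosc.c).
   Models only the `nbytes >= L1*4` branch visible in the diff. */
static size_t blocksize_for_large_buffer(size_t nbytes, int clevel)
{
  size_t blocksize;

  if (nbytes < (size_t)(L1 * 4)) {
    return 0;                 /* other branches are not shown in the diff */
  }

  blocksize = L1 * 4;         /* base size: 128 KB */
  if (clevel == 0) {
    blocksize /= 16;          /* non-compressed case: 8 KB blocks */
  }
  else if (clevel <= 3) {
    blocksize /= 8;           /* low compression levels: 16 KB blocks */
  }
  /* Levels above 3 lie outside the hunk; the base size is kept as-is here. */
  return blocksize;
}
```

With these definitions, a 1 MB buffer at `clevel == 0` would be split into 8 KB blocks, half the 16 KB that r154 used for every level up to 3.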