Hee: performance - Fast percentile in C++ -

Tuesday, 15 July 2014

performance - Fast percentile in C++ -

my programme calculates monte carlo simulation value-at-risk metric. simplify much possible, have:

1/ simulated daily cashflows 2/ sample of possible 1-year cashflow, need draw 365 random daily cashflows , sum them

hence, daily cashflows empirically given distrobution function sampled 365 times. this,

1/ sort daily cashflows array called *this->distro* 2/ calculate 365 percentiles corresponding random probabilities

i need simulation of yearly cashflow, say, 10k times population of simulated yearly cashflows work with. having distribution function of daily cashflows prepared, sampling like...

for ( unsigned int idxsim = 0; idxsim < _g.xsimulationcount; idxsim++ ) {     generatedval = 0.0;     ( register unsigned int idxday = 0; idxday < 365; idxday ++ )     {         prob = (flt_type)fastrand();         // prob [0,1]         didx = prob * dmaxdistroindex;       // scale prob distro function size                                              // index distro array         _floor = ((flt_type)(long)didx);     // fast version of floor         _ceil  = _floor + 1.0f;              // 'fast' ceil:)         iidx1  = (unsigned int)( _floor );         iidx2  = iidx1 + 1;          // interpolation per se         generatedval += this->distro[iidx1]*(_ceil - didx  );         generatedval += this->distro[iidx2]*(didx  - _floor);     }     this->yearlycashflows[idxsim] = generatedval ; }

the code within of both for cycles linear interpolation. if, usd 1000 corresponds prob=0.01, usd 10000 corresponds prob=0.1 if don't have empipirical number p=0.05 want usd 5000 interpolation.

the question: code runs correctly, though profiler says programme spends cca 60% of runtime on interpolation per se. question is, how can create task faster? sample runtimes reported vtune specific lines follows:

prob = (flt_type)fastrand();         //  0.727s didx = prob * dmaxdistroindex;       //  1.435s _floor = ((flt_type)(long)didx);     //  0.718s _ceil  = _floor + 1.0f;              //    -  iidx1  = (unsigned int)( _floor );   // 4.949s iidx2  = iidx1 + 1;                  //    -  // interpolation per se generatedval += this->distro[iidx1]*(_ceil - didx  );  //    - generatedval += this->distro[iidx2]*(didx  - _floor);  // 12.704s

dashes mean profiler reports no runtimes lines.

any hint appreciated. daniel

edit: both c.fogelklou , msalters have pointed out great enhancements. best code in line c.fogelklou said

converter = distrodimension / (flt_type)(rand_max + 1) ( unsigned int idxsim = 0; idxsim < _g.xsimulationcount; idxsim++ ) {     generatedval = 0.0;     ( register unsigned int idxday = 0; idxday < 365; idxday ++ )     {         didx  = (flt_type)fastrand() * converter;         iidx1 = (unsigned long)didx);         _floor = (flt_type)iidx1;         generatedval += this->distro[iidx1] + this->diffs[iidx1] *(didx  - _floor);     } }

and best have along msalter's lines

normalizer = 1.0/(flt_type)(rand_max + 1); ( unsigned int idxsim = 0; idxsim < _g.xsimulationcount; idxsim++ ) {     generatedval = 0.0;     ( register unsigned int idxday = 0; idxday < 365; idxday ++ )     {         didx  = (flt_type)fastrand()* normalizer ;         iidx1 = fastrand() % _g.xdaycount;         generatedval += this->distro[iidx1];         generatedval += this->diffs[iidx1]*didx;     } }

the sec code approx. 30 percent faster. now, of 95s of total runtime, lastly line consumes 68s. lastly 1 line consumes 3.2s hence double*double multiplication must devil. thought of sse - saving lastly 3 operands array , carry out vector multiplication of this->diffs[i]*didx[i] , add together this->distro[i] code ran 50 percent slower. hence, think nail wall.

many all. d.

this proposal little optimization, removing need ceil, 2 casts, , 1 of multiplies. if running on fixed point processor, explain why multiplies , casts between float , int taking long. in case, seek using fixed point optimizations or turning on floating point in compiler if cpu supports it!

for ( unsigned int idxsim = 0; idxsim < _g.xsimulationcount; idxsim++ ) {     generatedval = 0.0;     ( register unsigned int idxday = 0; idxday < 365; idxday ++ )     {         prob = (flt_type)fastrand();         // prob [0,1]         didx = prob * dmaxdistroindex;       // scale prob distro function size                                              // index distro array         iidx1  = (long)didx;         _floor = (flt_type)iidx1;     // fast version of floor         iidx2  = iidx1 + 1;          // interpolation per se         {            const flt_type diff = this->distro[iidx2] - this->distro[iidx1];            const flt_type interp = this->distro[iidx1] + diff * (didx - _floor);            generatedval += interp;         }     }     this->yearlycashflows[idxsim] = generatedval ; }

c++ performance percentile

Hee

Tuesday, 15 July 2014

performance - Fast percentile in C++ -

No comments:

Post a Comment