OpenMP - current usage in ql

OpenMP - current usage in ql

Peter Caspers-4
Hello,

enabling OpenMP slows down the test suite on my machine significantly.
Compiling with gcc 4.10.0 and -O3 on my i7-2760QM@2.40GHz I get the
timings below (left = OpenMP enabled, right = disabled), using all 8
threads.

Am I doing something wrong here, or does anyone get a different picture?
If not, I wonder whether we should remove the existing #pragmas in the
fd and tree parts of the library again.

Thanks a lot
Peter

Testing Barone-Adesi and Whaley approximation for American options...
Testing Bjerksund and Stensland approximation for American options...
Testing Ju approximation for American options...
Testing finite-difference engine for American options...
Testing finite-differences American option greeks...
Testing finite-differences shout option greeks...

Tests completed in 20.25 s / Tests completed in 1.63 s

Testing analytic continuous geometric average-price Asians...
Testing analytic continuous geometric average-price Asian greeks...
Testing analytic discrete geometric average-price Asians...
Testing analytic discrete geometric average-strike Asians...
Testing Monte Carlo discrete geometric average-price Asians...
Testing Monte Carlo discrete arithmetic average-price Asians...
Testing Monte Carlo discrete arithmetic average-strike Asians...
Testing discrete-averaging geometric Asian greeks...
Testing use of past fixings in Asian options...

Tests completed in 19.28 s / Tests completed in 6.16 s

Testing barrier options against Haug's values...
Testing barrier options against Babsiri's values...
Testing barrier options against Beaglehole's values...
Testing local volatility and Heston FD engines for barrier options...

Tests completed in 13.86 s / Tests completed in 2.70 s

Testing dividend European option values with no dividends...
Testing dividend European option with a dividend on today's date...
Testing dividend European option greeks...
Testing finite-difference dividend European option values...
Testing finite-differences dividend European option greeks...
Testing finite-differences dividend American option greeks...
Testing degenerate finite-differences dividend European option...
Testing degenerate finite-differences dividend American option...

Tests completed in 25.06 s / Tests completed in 3.55 s

Testing FDM with barrier option for Heston model vs Black-Scholes model...
Testing FDM with barrier option in Heston model...
Testing FDM with American option in Heston model...
Testing FDM Heston for Ikonen and Toivanen tests...
Testing FDM Heston with Black Scholes model...
Testing FDM with European option with dividends in Heston model...
Testing FDM Heston convergence...

Tests completed in 3 m 31.86 s / Tests completed in 44.90 s

Testing indexing of a linear operator...
Testing uniform grid mesher...
Testing application of first-derivatives map...
Testing application of second-derivatives map...
Testing application of second-order mixed-derivatives map...
Testing triple-band map solution...
Testing FDM with barrier option in Heston model...
Testing FDM with American option in Heston model...
Testing FDM with express certificate in Heston model...
Testing FDM with Heston Hull-White model...
Testing bi-conjugated gradient stabilized algorithm with Heston operator...
Testing Crank-Nicolson with initial implicit damping steps for a digital option...
Testing SparseMatrixReference type...
Testing assignment to zero in sparse matrix...

Tests completed in 46.73 s / Tests completed in 6.63 s

------------------------------------------------------------------------------
HPCC Systems Open Source Big Data Platform from LexisNexis Risk Solutions
Find What Matters Most in Your Big Data with HPCC Systems
Open Source. Fast. Scalable. Simple. Ideal for Dirty Data.
Leverages Graph Analysis for Fast Processing & Easy Data Exploration
http://p.sf.net/sfu/hpccsystems
_______________________________________________
QuantLib-dev mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/quantlib-dev

Re: OpenMP - current usage in ql

Klaus Spanderen-2

Hi Peter,

I share your experience: OpenMP slows down QL on my hardware as well. It might work for very large problems, but for "normal" problems the overhead kills the speed-up. I'd rather remove it.

regards

Klaus
On Saturday, June 14, 2014 08:10:55 PM Peter Caspers wrote:
> [...]



Re: OpenMP - current usage in ql

Joseph Wang-4

That's quite odd, since OpenMP should not be causing such huge slowdowns.

Since the OpenMP code paths are not compiled in by default, I'd rather keep the pragmas there.

Also, is there any possibility that the timing code is off?


Re: OpenMP - current usage in ql

Luigi Ballabio
Yes, the timing might be off. I suspect that the Boost timer is
reporting the total CPU time, that is, the sum of the time spent on
each CPU. On my box, if I run the BermudanSwaption example with OpenMP
enabled, it outputs:

Run completed in 2 m 35 s

but if I call it through "time", I get an output like:

real    1m19.767s
user    2m34.183s
sys     0m0.538s

that is, total CPU time 2m34s, but real time 1m19s. Being the
untrusting individual that I am, I also timed it with a stopwatch. The
elapsed time is actually 1m19s :)

This said, I still see a little slowdown in the test cases Peter
listed. My times are:

AmericanOptionTest: disabled 2.4s, enabled 3.4s (real time)
AsianOptionTest: disabled 10.6s, enabled 10.4s
BarrierOptionTest: disabled 4.9s, enabled 6.1s
DividendOptionTest: disabled 5.1s, enabled 6.5s
FdHestonTest: disabled 73.4s, enabled 76.8s
FdmLinearOpTest: disabled 11.4s, enabled 11.6s

Not much, but a bit slower anyway. I've only got 2 CPUs though (and I
compiled with -O2). Peter, what do you get on your 8 CPUs if you run
the cases via "time"?

Luigi
On Sun, Jun 15, 2014 at 3:40 PM, Joseph Wang <[hidden email]> wrote:
> [...]



--
<https://implementingquantlib.blogspot.com>
<https://twitter.com/lballabio>


Re: OpenMP - current usage in ql

japari
Hi,

Have you tried using only the 4 physical cores of your CPU? I don't use OpenMP, but I do use Boost threads, and hyperthreading does very weird things; one is that beyond 4 threads (in this case) scaling stops being linear, which makes sense. Luigi, your 2 CPUs are physical, right?

Just a shot.
Best


----- Original Message -----
> [...]


Re: OpenMP - current usage in ql

Peter Caspers-4
Oh yes, my timings below are total CPU time rather than wall-clock
time (I usually measure the latter by just counting seconds in my
head...). That was unfair, sorry! With the time command I get (for
the AmericanOptionTest)

g++ -O3 -fopenmp

OMP_NUM_THREADS=1   real = 1.925s
OMP_NUM_THREADS=2   real = 1.468s
OMP_NUM_THREADS=3   real = 1.590s
OMP_NUM_THREADS=4   real = 1.647s
OMP_NUM_THREADS=5   real = 1.780s
OMP_NUM_THREADS=6   real = 1.838s
OMP_NUM_THREADS=7   real = 2.081s
OMP_NUM_THREADS=8   real = 2.282s

g++ -O3

real = 1.638s

Still, the point is the same, in my opinion. With 8 cores I'd expect a
speed-up factor of maybe 4 to 6. What we see instead is something
around 1 (often below 1, as it seems), so effectively all the
additional CPU time is eaten up by the overhead of multiple threads.
That's not worth it, is it? I haven't tried many optimizations with
OpenMP yet, but what I see in "good" cases are the factors of 4-6
above. I wouldn't parallelize for much below that.

best regards
Peter
On 15 June 2014 17:18,  <[hidden email]> wrote:
> [...]


Re: OpenMP - current usage in ql

Piter Dias-4
Is there a chance the test is too small? I remember that many years ago (during my Algorithmics days) we used to make tests as big as the client's real portfolio in order to give server-buying advice (mainly based on the number of processors, due to scenario valuation).

Would the speed-up factors change significantly if the base scenario ran for, let's say, 15 minutes? It would make the fixed costs more negligible.

Regards,

_____________________
Piter Dias



> Date: Sun, 15 Jun 2014 20:11:28 +0200
> From: [hidden email]
> To: [hidden email]
> CC: [hidden email]; [hidden email]; [hidden email]
> Subject: Re: [Quantlib-dev] OpenMP - current usage in ql
> [...]


Re: OpenMP - current usage in ql

Joseph Wang-4
There's a reason why it's off by default. :-) :-(

There's not that much parallelization going on in the PDE and tree code. OpenMP is used mainly for copying arrays, so you should see some modest improvements if you are working with very large arrays, but nowhere close to a factor of N for N cores.

The problem is that the big speed-ups are for things like Monte Carlo, which is "ridiculously parallel." The trouble with MC is that once you parallelize, the random number generator gives you different answers, which makes it impossible to test, and you also have the possibility of very subtle bugs from RNG correlations across different processors. Getting QuantLib to produce consistent RNG streams in multi-core MC turns out to be a non-trivial project.

Alternatively, the algorithms currently used for PDEs in QuantLib, particularly the ones relating to tridiagonal matrix operations, turn out to be terrible for parallel computing. There could be better speed-ups for PDEs with different algorithms that suit parallel systems. Also, if you want to parallelize, you want to use an explicit PDE scheme rather than an implicit one.

The two projects that I can think of are:

1) getting MC working for OpenMP, or
2) putting in better parallel algorithms for PDEs.

If no one else is working on this, I should be in a position to do it in three months or so. Right now I'm working on the front end of my bitcoin trading system (http://www.bitquant.com.hk/). Once I get the front end working, I'll connect QuantLib to the backend through OpenGamma and the Java interface, at which point I'll need to go back to backend programming.

Also, if there are any other QuantLib people going to HK for the bitcoin conference, let me know. It turns out that QuantLib is perfect for bitcoin derivatives. With most financial products, you get the code and the trading system through the broker; since bitcoin has no brokers, trading systems have to be external, which leaves a space for open source software.

The other project I'm working on: there are a ton of HK people (including my wife) who trade warrants and callable bull/bear certificates, and most of them don't have access to any sort of analytics. The reason is that brokers either don't care whether their customers have access to analytics, or actually don't want clients with analytics, since the brokers are taking the other side of the trade and want their clients to lose money.

On Mon, Jun 16, 2014 at 4:20 AM, Piter Dias <[hidden email]> wrote:
Is there a chance of the test being too small? I remember that many years ago (during my Algorithmics days) we used to make tests as big as the client real portfolio in order to make server buying advices (mainly based on number of processors due to scenarios valuation).

Would the speed-up factors significantly change if the base scenario runs for, lets say, 15 minutes? It would make fixed costs more negligible.

Regards,

_____________________
Piter Dias



> Date: Sun, 15 Jun 2014 20:11:28 +0200
> From: [hidden email]
> To: [hidden email]
> CC: [hidden email]; [hidden email]; [hidden email]
> Subject: Re: [Quantlib-dev] OpenMP - current usage in ql

>
> oh yes, my timings below are total CPU time rather than wall-clock
> time (I usually measure the latter by just counting seconds in my
> head...). That was unfair, sorry! With the time command I get (for
> the AmericanOptionTest)
>
> g++ -O3 -fopenmp
>
> OMP_NUM_THREADS=1 real = 1.925s
> OMP_NUM_THREADS=2 real = 1.468s
> OMP_NUM_THREADS=3 real = 1.590s
> OMP_NUM_THREADS=4 real = 1.647s
> OMP_NUM_THREADS=5 real = 1.780s
> OMP_NUM_THREADS=6 real = 1.838s
> OMP_NUM_THREADS=7 real = 2.081s
> OMP_NUM_THREADS=8 real = 2.282s
>
> g++ -O3
>
> real = 1.638s
>
> still, the point is the same imo. With 8 cores I'd expect maybe a
> speed-up factor of 4 to 6. What we see instead is something around 1
> (often below 1, as it seems), so effectively all the additional CPU
> time is eaten up by the overhead of multiple threads. That's not
> worth it, is it? I haven't tried many optimizations with omp yet, but
> what I see in "good" cases is the 4-6 above. I wouldn't parallelize
> for much below that.

>
> best regards
> Peter
>
> On 15 June 2014 17:18, <[hidden email]> wrote:
> > Hi,
> > Have you tried using only the 4 physical cores of your CPU? I don't use OpenMP, but I use boost threads, and hyperthreading does very weird things; one is that above 4 threads (in this case) scaling stops being linear, which makes sense. Luigi, your 2 CPUs are physical, right?
> > Just a shot.
> > Best
> >
> >
> > ----- Original Message -----
> >> Yes, the timing might be off. I suspect that the Boost timer is
> >> reporting the total CPU time, that is, the sum of the actual time per
> >> each CPU. On my box, if I run the BermudanSwaption example with
> >> OpenMP
> >> enabled, it outputs:
> >>
> >> Run completed in 2 m 35 s
> >>
> >> but if I call it through "time", I get an output like:
> >>
> >> real 1m19.767s
> >> user 2m34.183s
> >> sys 0m0.538s
> >>
> >> that is, total CPU time 2m34s, but real time 1m19s. Being the
> >> untrusting individual that I am, I also timed it with a stopwatch.
> >> The
> >> elapsed time is actually 1m19s :)
> >>
> >> This said, I still see a little slowdown in the test cases Peter
> >> listed. My times are:
> >>
> >> AmericanOptionTest: disabled 2.4s, enabled 3.4s (real time)
> >> AsianOptionTest: disabled 10.6s, enabled 10.4s
> >> BarrierOptionTest: disabled 4.9s, enabled 6.1s
> >> DividendOptionTest: disabled 5.1s, enabled 6.5s
> >> FdHestonTest: disabled 73.4s, enabled 76.8s
> >> FdmLinearOpTest: disabled 11.4s, enabled 11.6s
> >>
> >> Not much, but a bit slower anyway. I've only got 2 CPUs though (and I
> >> compiled with -O2). Peter, what do you get on your 8 CPUs if you run
> >> the cases via "time"?
> >>
> >> Luigi
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >> On Sun, Jun 15, 2014 at 3:40 PM, Joseph Wang <[hidden email]>
> >> wrote:
> >> >
> >> > That's quite odd, since OpenMP should not be causing such huge
> >> > slowdowns.
> >> >
> >> > Since by default the items are not compiled, I'd rather keep the
> >> > pragmas
> >> > there.
> >> >
> >> > Also, is there any possibility that the timing code is off?
> >> >
> >> > ------------------------------------------------------------------------------
> >> > HPCC Systems Open Source Big Data Platform from LexisNexis Risk
> >> > Solutions
> >> > Find What Matters Most in Your Big Data with HPCC Systems
> >> > Open Source. Fast. Scalable. Simple. Ideal for Dirty Data.
> >> > Leverages Graph Analysis for Fast Processing & Easy Data
> >> > Exploration
> >> > http://p.sf.net/sfu/hpccsystems
> >> > _______________________________________________
> >> > QuantLib-dev mailing list
> >> > [hidden email]
> >> > https://lists.sourceforge.net/lists/listinfo/quantlib-dev
> >> >
> >>
> >>
> >>
> >> --
> >> <https://implementingquantlib.blogspot.com>
> >> <https://twitter.com/lballabio>
> >>
> >
>



Re: OpenMP - current usage in ql

japari

Hello
 
> The problem is that the big speedups are for things like Monte Carlo,
> which are "ridiculously parallel." The trouble with MC is that once
> you parallelize, the random number generator gives you different
> answers, and it becomes impossible to test; you also have the
> possibility of very subtle bugs with RNG correlations across
> different processors. Getting QuantLib to have consistent
> RNGs in multi-core MC turns out to be a non-trivial project.
>

I have done this with threads, and it's not that difficult to avoid those problems. But most (if not all) of the generators in the library are not suitable for it: not this version of the Mersenne Twister, nor the distribution generators, since most are rejection algorithms. I do get exactly the same result figures with one thread or with N. Sobol is OK but limited for distribution mapping; what I do is wrap an interface around TINA's MT generator, but then you need to link against that library. The speed-up is practically linear in the number of CPUs, at least in the context of the problem I use it for.



> Alternatively, the algorithms being used for PDEs in QuantLib,
> particularly those relating to tridiagonal matrix operations, turn out
> to be terrible for parallel computing. There could be some better
> speedups for PDEs with different algorithms that suit parallel
> systems better. Also, if you want to parallelize, you want to use an
> explicit PDE scheme rather than an implicit one.
>
> The two projects that I can think of are:
>
> 1) getting MC working for openmp or
> 2) putting in better parallel algos for PDE's.
>

Back in the tokamak dark ages people used to parallelize implicit PDE solvers by breaking up the domain and solving the pieces concurrently (rather than parallelizing the algebraic solver); all the subtlety was in stitching them back together at each time step. That was a long time ago and I can't remember the details.

Best







Re: OpenMP - current usage in ql

Luigi Ballabio
In reply to this post by Piter Dias-4
Trees seem to fare better. On the BermudanSwaption example, the
elapsed time does halve on two cores (more or less). Peter, what do
you get on 4 or 8?
Also, there might be other factors that enter the equation (number of
cache lines, for example?)

Luigi


On Sun, Jun 15, 2014 at 10:20 PM, Piter Dias <[hidden email]> wrote:

> Is there a chance of the test being too small? I remember that many years
> ago (during my Algorithmics days) we used to make tests as big as the client
> real portfolio in order to make server buying advices (mainly based on
> number of processors due to scenarios valuation).
>
> Would the speed-up factors significantly change if the base scenario runs
> for, lets say, 15 minutes? It would make fixed costs more negligible.
>
> Regards,
>
> _____________________
> Piter Dias
> [hidden email]
> www.piterdias.com



--
<https://implementingquantlib.blogspot.com>
<https://twitter.com/lballabio>


Re: OpenMP - current usage in ql

Peter Caspers-4
yes, this example scales well up to 6 cores:

without omp 1m20s
threads=2, real=0m42s
threads=4, real=0m36s
threads=5, real=0m30s
threads=6, real=0m27s
threads=7, real=0m40s
threads=8, real=2m53s

Peter


On 16 June 2014 09:45, Luigi Ballabio <[hidden email]> wrote:

> Trees seem to fare better. On the BermudanSwaption example, the
> elapsed time does halve on two cores (more or less). Peter, what do
> you get on 4 or 8?
> Also, there might be other factors that enter the equation (number of
> cache lines, for example?)
>
> Luigi


Re: OpenMP - current usage in ql

Peter Caspers-4
Hi,

I did some more tests. First of all, I have to correct my numbers below
for 7 and 8 threads: when I rerun the example I get ~26s for both; I
don't know what went wrong last time. Anyway, I would say the parallel
for in lattice.hpp is indeed useful, giving a speed-up factor of 4.6
on 8 cores in this example, which is fine.

Next I looked at the FDM code again in more detail. In
triplebandlinearop.cpp and ninepointlinearop.cpp, all omp pragmas are
applied to loops of this kind:

for (Size i=0; i < size; ++i) {
      diag[i]  = y_diag[i];
      lower[i] = y_lower[i];
      upper[i] = y_upper[i];
}

which I guess the compiler already optimizes very well on a single
thread, so multithreading would only make sense for very big sizes
(playing around with toy loops of similar complexity, I see a speed-up
only for loop sizes around 1e8 and bigger). Indeed, disabling the
parallel for pragmas in the operator classes does not change the
performance of the test cases filtered by --run_test='*/*/*Fd*', see
below (only the overhead is avoided, which seems desirable). These
loops all seem to be vectorized by the compiler without our having to
do anything, since I get the same running times when adding
#pragma omp simd explicitly.

I am not sure about parallelevolver.hpp and stepcondition.hpp, but at
least I don't see any benefit in the Fd test cases or in the
BermudanSwaption example (which I think also uses them?).

all #pragma omp enabled

8 threads real    1m29.793s user    8m15.733s
2 threads real    1m13.676s user    1m56.217s

disable triplebandlinearop.cpp

8 threads real    1m31.091s user    6m47.130s
2 threads real    1m15.548s user    1m43.742s

disable triplebandlinearop.cpp, ninepointlinearop.cpp

8 threads real    1m18.263s user    1m56.950s
2 threads real    1m15.677s user    1m16.592s

disable triplebandlinearop.cpp, ninepointlinearop.cpp,
parallelevolver.hpp, stepcondition.hpp

real    1m14.468s user    1m11.959s

Peter

On 16 June 2014 12:52, Peter Caspers <[hidden email]> wrote:

> yes, this example scales well up to 6 cores:
>
> without omp 1m20s
> threads=2, real=0m42s
> threads=4, real=0m36s
> threads=5, real=0m30s
> threads=6, real=0m27s
> threads=7, real=0m40s
> threads=8, real=2m53s
>
> Peter
>
>
> On 16 June 2014 09:45, Luigi Ballabio <[hidden email]> wrote:
>> Trees seem to fare better. On the BermudanSwaption example, the
>> elapsed time does halve on two cores (more or less). Peter, what do
>> you get on 4 or 8?
>> Also, there might be other factors that enter the equation (number of
>> cache lines, for example?)
>>
>> Luigi

------------------------------------------------------------------------------
HPCC Systems Open Source Big Data Platform from LexisNexis Risk Solutions
Find What Matters Most in Your Big Data with HPCC Systems
Open Source. Fast. Scalable. Simple. Ideal for Dirty Data.
Leverages Graph Analysis for Fast Processing & Easy Data Exploration
http://p.sf.net/sfu/hpccsystems
_______________________________________________
QuantLib-dev mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/quantlib-dev

Re: OpenMP - current usage in ql

Joseph Wang-4
Thanks for the note. I'll take a look at the code some time next week. Since the loops are being vectorized anyway, it looks like there isn't any benefit to parallelization, so I'll see about taking the omp pragmas out.
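(For context: an omp pragma only takes effect when OpenMP is enabled at compile time, e.g. via g++ -fopenmp, which also defines the _OPENMP macro; without it the pragmas are silently ignored. A minimal standalone sketch, not QuantLib code:)

```cpp
#include <string>

// Reports whether this translation unit was built with OpenMP support.
// Without -fopenmp (or the compiler's equivalent flag), _OPENMP is
// undefined and every "#pragma omp ..." in the code is a silent no-op.
std::string openmpStatus() {
#ifdef _OPENMP
    return "enabled";
#else
    return "disabled";
#endif
}
```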


On Thu, Jun 19, 2014 at 11:47 PM, Peter Caspers <[hidden email]> wrote:
Hi,

I did some more tests. First of all, I have to correct my numbers below
for 7 and 8 threads: when I rerun the example I get ~26s for both. I
don't know what went wrong the last time. Anyway, I would say the
parallel for in lattice.hpp is indeed useful, giving a speed-up factor
of 4.6 on 8 cores in this example; that's fine.

Next I looked at the FDM code again in more detail. In
triplebandlinearop.cpp and ninepointlinearop.cpp all omp pragmas are
applied to loops of this kind

for (Size i=0; i < size; ++i) {
      diag[i]  = y_diag[i];
      lower[i] = y_lower[i];
      upper[i] = y_upper[i];
}

which I guess are already optimized very well by the compiler on one
thread, and multithreading would only make sense for very big sizes
(playing around with loops of similar complexity in toy examples, I see
a speed-up only with loop sizes around 1E+8 and bigger). Indeed,
disabling the parallel for pragmas in the operator classes does not
change the performance of the test cases filtered by
--run_test='*/*/*Fd*', see below (only the overhead is avoided, which
seems desirable). It seems these loops are all vectorized by the
compiler without having to do anything, because I get the same
running times when adding #pragma omp simd explicitly.
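(The toy experiment mentioned above can be reproduced along these lines; copyTimeSeconds is a hypothetical helper, not QuantLib code, and the crossover size depends heavily on compiler, flags, and hardware. Build once with and once without -fopenmp to compare:)

```cpp
#include <chrono>
#include <cstddef>
#include <vector>

// Times the bare element-wise copy that the FDM operator pragmas wrap.
// For small n the OpenMP thread start-up overhead dominates and the
// plain (auto-vectorized) loop wins; only for very large n can the
// parallel version pay off.
double copyTimeSeconds(std::size_t n) {
    if (n == 0) return 0.0;
    std::vector<double> src(n, 1.0), dst(n, 0.0);
    const long size = static_cast<long>(n);
    const auto t0 = std::chrono::steady_clock::now();
    #pragma omp parallel for
    for (long i = 0; i < size; ++i)
        dst[i] = src[i];
    const auto t1 = std::chrono::steady_clock::now();
    // reading dst keeps the copy from being optimized away entirely
    return std::chrono::duration<double>(t1 - t0).count() + 0.0 * dst[n - 1];
}
```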

I am not sure about parallelevolver.hpp and stepcondition.hpp, but at
least I don't see any benefit in the Fd test cases or in the
BermudanSwaption example (which I think also uses them?).

all #pragma omp enabled

8 threads real    1m29.793s user    8m15.733s
2 threads real    1m13.676s user    1m56.217s

disable triplebandlinearop.cpp

8 threads real    1m31.091s user    6m47.130s
2 threads real    1m15.548s user    1m43.742s

disable triplebandlinearop.cpp, ninepointlinearop.cpp

8 threads real    1m18.263s user    1m56.950s
2 threads real    1m15.677s user    1m16.592s

disable triplebandlinearop.cpp, ninepointlinearop.cpp,
parallelevolver.hpp, stepcondition.hpp

real    1m14.468s user    1m11.959s

Peter

On 16 June 2014 12:52, Peter Caspers <[hidden email]> wrote:
> yes, this example scales well up to 6 cores:
>
> without omp 1m20s
> threads=2, real=0m42s
> threads=4, real=0m36s
> threads=5, real=0m30s
> threads=6, real=0m27s
> threads=7, real=0m40s
> threads=8, real=2m53s
>
> Peter
>
>
> On 16 June 2014 09:45, Luigi Ballabio <[hidden email]> wrote:
>> Trees seem to fare better. On the BermudanSwaption example, the
>> elapsed time does halve on two cores (more or less). Peter, what do
>> you get on 4 or 8?
>> Also, there might be other factors that enter the equation (number of
>> cache lines, for example?)
>>
>> Luigi
>>
>>
>> On Sun, Jun 15, 2014 at 10:20 PM, Piter Dias <[hidden email]> wrote:
>>> Is there a chance of the test being too small? I remember that many years
>>> ago (during my Algorithmics days) we used to make tests as big as the
>>> client's real portfolio in order to give server-buying advice (mainly
>>> based on the number of processors needed for scenario valuation).
>>>
>>> Would the speed-up factors significantly change if the base scenario ran
>>> for, let's say, 15 minutes? It would make the fixed costs more negligible.
>>>
>>> Regards,
>>>
>>> _____________________
>>> Piter Dias
>>> [hidden email]
>>> www.piterdias.com
>>>
>>>
>>>
>>>> Date: Sun, 15 Jun 2014 20:11:28 +0200
>>>> From: [hidden email]
>>>> To: [hidden email]
>>>> CC: [hidden email]; [hidden email];
>>>> [hidden email]
>>>> Subject: Re: [Quantlib-dev] OpenMP - current usage in ql
>>>
>>>>
>>>> Oh yes, my timings below are total CPU time rather than wall-clock
>>>> time (I usually measure the latter by just counting seconds in my
>>>> head...). That was unfair, sorry! With the time command I get (for
>>>> the AmericanOptionTest):
>>>>
>>>> g++ -O3 -fopenmp
>>>>
>>>> OMP_NUM_THREADS=1 real = 1.925s
>>>> OMP_NUM_THREADS=2 real = 1.468s
>>>> OMP_NUM_THREADS=3 real = 1.590s
>>>> OMP_NUM_THREADS=4 real = 1.647s
>>>> OMP_NUM_THREADS=5 real = 1.780s
>>>> OMP_NUM_THREADS=6 real = 1.838s
>>>> OMP_NUM_THREADS=7 real = 2.081s
>>>> OMP_NUM_THREADS=8 real = 2.282s
>>>>
>>>> g++ -O3
>>>>
>>>> real = 1.638s
>>>>
>>>> Still, the point is the same, imo. With 8 cores I'd expect maybe a
>>>> speed-up factor of 4 to 6. What we see instead is something around 1
>>>> (often below 1, it seems), so effectively all the additional CPU
>>>> time is eaten up by the overhead of multiple threads. That's not
>>>> worth it, is it? I haven't tried many optimizations with omp yet, but
>>>> what I see in "good" cases is the factor of 4-6 above. I wouldn't
>>>> parallelize for much below that.
>>>>
>>>> best regards
>>>> Peter
>>>>
>>>>
>>>> On 15 June 2014 17:18, <[hidden email]> wrote:
>>>> > Hi,
>>>> > Have you tried using only the 4 physical cores of your CPU? I don't
>>>> > use OpenMP, but I use Boost threads, and hyperthreading does very weird
>>>> > things; one is that beyond 4 threads (in this case) scaling stops being
>>>> > linear, which makes sense. Luigi, your 2 CPUs are physical, right?
>>>> > Best
>>>> >
>>>> >
>>>> > ----- Original Message -----
>>>> >> Yes, the timing might be off. I suspect that the Boost timer is
>>>> >> reporting the total CPU time, that is, the sum of the actual time
>>>> >> spent on each CPU. On my box, if I run the BermudanSwaption example
>>>> >> with OpenMP enabled, it outputs:
>>>> >>
>>>> >> Run completed in 2 m 35 s
>>>> >>
>>>> >> but if I call it through "time", I get an output like:
>>>> >>
>>>> >> real 1m19.767s
>>>> >> user 2m34.183s
>>>> >> sys 0m0.538s
>>>> >>
>>>> >> that is, total CPU time 2m34s, but real time 1m19s. Being the
>>>> >> untrusting individual that I am, I also timed it with a stopwatch.
>>>> >> The
>>>> >> elapsed time is actually 1m19s :)
>>>> >>
>>>> >> This said, I still see a little slowdown in the test cases Peter
>>>> >> listed. My times are:
>>>> >>
>>>> >> AmericanOptionTest: disabled 2.4s, enabled 3.4s (real time)
>>>> >> AsianOptionTest: disabled 10.6s, enabled 10.4s
>>>> >> BarrierOptionTest: disabled 4.9s, enabled 6.1s
>>>> >> DividendOptionTest: disabled 5.1s, enabled 6.5s
>>>> >> FdHestonTest: disabled 73.4s, enabled 76.8s
>>>> >> FdmLinearOpTest: disabled 11.4s, enabled 11.6s
>>>> >>
>>>> >> Not much, but a bit slower anyway. I've only got 2 CPUs though (and I
>>>> >> compiled with -O2). Peter, what do you get on your 8 CPUs if you run
>>>> >> the cases via "time"?
>>>> >>
>>>> >> Luigi
>>>> >>
>>>> >>
>>>> >>
>>>> >>
>>>> >>
>>>> >>
>>>> >>
>>>> >> On Sun, Jun 15, 2014 at 3:40 PM, Joseph Wang <[hidden email]>
>>>> >> wrote:
>>>> >> >
>>>> >> > That's quite odd since OpenMP should not be causing such huge
>>>> >> > slowdowns.
>>>> >> >
>>>> >> > Since by default these items are not compiled in, I'd rather keep
>>>> >> > the pragmas there.
>>>> >> >
>>>> >> > Also, is there any possibility that the timing code is off?
>>>> >> >
>>>> >> >
>>>> >>
>>>> >>
>>>> >>
>>>> >> --
>>>> >> <https://implementingquantlib.blogspot.com>
>>>> >> <https://twitter.com/lballabio>
>>>> >>
>>>> >>
>>>> >
>>>> >
>>>>
>>>>
>>
>>
>>
>> --
>> <https://implementingquantlib.blogspot.com>
>> <https://twitter.com/lballabio>


------------------------------------------------------------------------------
HPCC Systems Open Source Big Data Platform from LexisNexis Risk Solutions
Find What Matters Most in Your Big Data with HPCC Systems
Open Source. Fast. Scalable. Simple. Ideal for Dirty Data.
Leverages Graph Analysis for Fast Processing & Easy Data Exploration
http://p.sf.net/sfu/hpccsystems
_______________________________________________
QuantLib-dev mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/quantlib-dev