QuantLib - Re: OpenMP - current usage in ql

Posted by Peter Caspers-4 on Jun 16, 2014; 10:52am
URL: http://quantlib.414.s1.nabble.com/OpenMP-current-usage-in-ql-tp15458p15477.html

Yes, this example scales well up to 6 threads, then degrades at 7 and 8:

without omp   1m20s
threads=2     real=0m42s
threads=4     real=0m36s
threads=5     real=0m30s
threads=6     real=0m27s
threads=7     real=0m40s
threads=8     real=2m53s

Peter

On 16 June 2014 09:45, Luigi Ballabio <[hidden email]> wrote:

> Trees seem to fare better. On the BermudanSwaption example, the
> elapsed time does halve on two cores (more or less). Peter, what do
> you get on 4 or 8?
> Also, there might be other factors that enter the equation (number of
> cache lines, for example?)
>
> Luigi
>
>
> On Sun, Jun 15, 2014 at 10:20 PM, Piter Dias <[hidden email]> wrote:
>> Is there a chance of the test being too small? I remember that many years
>> ago (during my Algorithmics days) we used to make tests as big as the client's
>> real portfolio in order to give server purchasing advice (mainly based on
>> the number of processors, because of scenario valuation).
>>
>> Would the speed-up factors change significantly if the base scenario ran
>> for, let's say, 15 minutes? It would make the fixed costs more negligible.
>>
>> Regards,
>>
>> _____________________
>> Piter Dias
>> [hidden email]
>> www.piterdias.com
>>
>>
>>
>>> Date: Sun, 15 Jun 2014 20:11:28 +0200
>>> From: [hidden email]
>>> To: [hidden email]
>>> CC: [hidden email]; [hidden email];
>>> [hidden email]
>>> Subject: Re: [Quantlib-dev] OpenMP - current usage in ql
>>
>>>
>>> Oh yes, my timings below are total CPU time rather than wall-clock
>>> time (I usually measure the latter by just counting seconds in my
>>> head...). That was unfair, sorry! With the time command I get (for
>>> the AmericanOptionTest)
>>>
>>> g++ -O3 -fopenmp
>>>
>>> OMP_NUM_THREADS=1 real = 1.925s
>>> OMP_NUM_THREADS=2 real = 1.468s
>>> OMP_NUM_THREADS=3 real = 1.590s
>>> OMP_NUM_THREADS=4 real = 1.647s
>>> OMP_NUM_THREADS=5 real = 1.780s
>>> OMP_NUM_THREADS=6 real = 1.838s
>>> OMP_NUM_THREADS=7 real = 2.081s
>>> OMP_NUM_THREADS=8 real = 2.282s
>>>
>>> g++ -O3
>>>
>>> real = 1.638s
>>>
>>> Still, the point is the same imo. With 8 cores I'd expect maybe a
>>> speed-up factor of 4 to 6. What we see instead is something around 1
>>> (often below 1, it seems), so effectively all the additional CPU
>>> time is eaten up by the overhead of multiple threads. That's not
>>> worth it, is it? I haven't tried many optimizations with omp yet, but
>>> what I see in "good" cases is the factor of 4 to 6 above. I wouldn't
>>> parallelize for much less than that.
>>>
>>> best regards
>>> Peter
>>>
>>> On 15 June 2014 17:18, <[hidden email]> wrote:
>>> > Hi,
>>> > Have you tried to use only the 4 physical threads of your CPU? I don't
>>> > use OpenMP but I use Boost threads, and hyperthreading does very weird
>>> > things; one is that over 4 threads (in this case) scaling stops being
>>> > linear, which makes sense. Luigi, your 2 CPUs are physical, right?
>>> > Just a shot.
>>> > Best
>>> >
>>> >
>>> > ----- Original Message -----
>>> >> Yes, the timing might be off. I suspect that the Boost timer is
>>> >> reporting the total CPU time, that is, the sum of the time spent
>>> >> on each CPU. On my box, if I run the BermudanSwaption example with
>>> >> OpenMP enabled, it outputs:
>>> >>
>>> >> Run completed in 2 m 35 s
>>> >>
>>> >> but if I call it through "time", I get an output like:
>>> >>
>>> >> real 1m19.767s
>>> >> user 2m34.183s
>>> >> sys 0m0.538s
>>> >>
>>> >> that is, total CPU time 2m34s, but real time 1m19s. Being the
>>> >> untrusting individual that I am, I also timed it with a stopwatch.
>>> >> The elapsed time is actually 1m19s :)
>>> >>
>>> >> This said, I still see a little slowdown in the test cases Peter
>>> >> listed. My times are:
>>> >>
>>> >> AmericanOptionTest: disabled 2.4s, enabled 3.4s (real time)
>>> >> AsianOptionTest: disabled 10.6s, enabled 10.4s
>>> >> BarrierOptionTest: disabled 4.9s, enabled 6.1s
>>> >> DividendOptionTest: disabled 5.1s, enabled 6.5s
>>> >> FdHestonTest: disabled 73.4s, enabled 76.8s
>>> >> FdmLinearOpTest: disabled 11.4s, enabled 11.6s
>>> >>
>>> >> Not much, but a bit slower anyway. I've only got 2 CPUs though (and I
>>> >> compiled with -O2). Peter, what do you get on your 8 CPUs if you run
>>> >> the cases via "time"?
>>> >>
>>> >> Luigi
>>> >>
>>> >> On Sun, Jun 15, 2014 at 3:40 PM, Joseph Wang <[hidden email]>
>>> >> wrote:
>>> >> >
>>> >> > That's quite odd, since OpenMP should not be causing such huge
>>> >> > slowdowns.
>>> >> >
>>> >> > Since by default the items are not compiled, I'd rather keep the
>>> >> > pragmas
>>> >> > there.
>>> >> >
>>> >> > Also, is there any possibility that the timing code is off?
>>> >> >
>>> >> >
>>> >> > ------------------------------------------------------------------------------
>>> >> > HPCC Systems Open Source Big Data Platform from LexisNexis Risk
>>> >> > Solutions
>>> >> > Find What Matters Most in Your Big Data with HPCC Systems
>>> >> > Open Source. Fast. Scalable. Simple. Ideal for Dirty Data.
>>> >> > Leverages Graph Analysis for Fast Processing & Easy Data
>>> >> > Exploration
>>> >> > http://p.sf.net/sfu/hpccsystems
>>> >> > _______________________________________________
>>> >> > QuantLib-dev mailing list
>>> >> > [hidden email]
>>> >> > https://lists.sourceforge.net/lists/listinfo/quantlib-dev
>>> >> >
>>> >>
>>> >>
>>> >>
>>> >> --
>>> >> <https://implementingquantlib.blogspot.com>
>>> >> <https://twitter.com/lballabio>
>>> >>
>>> >>
>>> >>
>>> >
>>> >
>>>
>>>
>
>
>
> --
> <https://implementingquantlib.blogspot.com>
> <https://twitter.com/lballabio>
