http://quantlib.414.s1.nabble.com/OpenMP-current-usage-in-ql-tp15458p15477.html
> Trees seem to fare better. On the BermudanSwaption example, the
> elapsed time does halve on two cores (more or less). Peter, what do
> you get on 4 or 8?
> Also, there might be other factors that enter the equation (number of
> cache lines, for example?)
>
> Luigi
>
>
> On Sun, Jun 15, 2014 at 10:20 PM, Piter Dias <[hidden email]> wrote:
>> Is there a chance that the test is too small? I remember that many years
>> ago (during my Algorithmics days) we used to make tests as big as the
>> client's real portfolio in order to give server-purchasing advice (mainly
>> based on the number of processors, because of scenario valuation).
>>
>> Would the speed-up factors change significantly if the base scenario ran
>> for, let's say, 15 minutes? That would make fixed costs more negligible.
>>
>> Regards,
>>
>> _____________________
>> Piter Dias
>> [hidden email]
>> www.piterdias.com
>>
>>
>>
>>> Date: Sun, 15 Jun 2014 20:11:28 +0200
>>> From: [hidden email]
>>> To: [hidden email]
>>> CC: [hidden email]; [hidden email]; [hidden email]
>>> Subject: Re: [Quantlib-dev] OpenMP - current usage in ql
>>>
>>> Oh yes, my timings below are total CPU time rather than wall-clock
>>> time (I usually measure the latter by just counting seconds in my
>>> head...). That was unfair, sorry! With the time command I get (for
>>> the AmericanOptionTest)
>>>
>>> g++ -O3 -fopenmp
>>>
>>> OMP_NUM_THREADS=1 real = 1.925s
>>> OMP_NUM_THREADS=2 real = 1.468s
>>> OMP_NUM_THREADS=3 real = 1.590s
>>> OMP_NUM_THREADS=4 real = 1.647s
>>> OMP_NUM_THREADS=5 real = 1.780s
>>> OMP_NUM_THREADS=6 real = 1.838s
>>> OMP_NUM_THREADS=7 real = 2.081s
>>> OMP_NUM_THREADS=8 real = 2.282s
>>>
>>> g++ -O3
>>>
>>> real = 1.638s
>>>
>>> Still, the point is the same imo. With 8 cores I'd expect maybe a
>>> speed-up factor of 4 to 6. What we see instead is something around 1
>>> (often below 1, it seems), so effectively all the additional CPU
>>> time is eaten up by the overhead for multiple threads. That's not
>>> worth it, is it? I haven't tried many optimizations with omp yet, but
>>> what I see in "good" cases is the factor of 4 to 6 mentioned above. I
>>> wouldn't parallelize for much less.
>>>
>>> best regards
>>> Peter
>>>
>>> On 15 June 2014 17:18, <[hidden email]> wrote:
>>> > Hi,
>>> > Have you tried using only the 4 physical cores of your CPU? I don't
>>> > use OpenMP, but I use Boost threads, and hyperthreading does very weird
>>> > things; one is that beyond 4 threads (in this case) scaling stops being
>>> > linear, which makes sense. Luigi, your 2 CPUs are physical, right?
>>> > Just a shot.
>>> > Best
>>> >
>>> >
>>> > ----- Original Message -----
>>> >> Yes, the timing might be off. I suspect that the Boost timer is
>>> >> reporting the total CPU time, that is, the sum of the time spent on
>>> >> each CPU. On my box, if I run the BermudanSwaption example with
>>> >> OpenMP enabled, it outputs:
>>> >>
>>> >> Run completed in 2 m 35 s
>>> >>
>>> >> but if I call it through "time", I get an output like:
>>> >>
>>> >> real 1m19.767s
>>> >> user 2m34.183s
>>> >> sys 0m0.538s
>>> >>
>>> >> that is, total CPU time 2m34s, but real time 1m19s. Being the
>>> >> untrusting individual that I am, I also timed it with a stopwatch.
>>> >> The elapsed time is actually 1m19s :)
>>> >>
>>> >> This said, I still see a little slowdown in the test cases Peter
>>> >> listed. My times are:
>>> >>
>>> >> AmericanOptionTest: disabled 2.4s, enabled 3.4s (real time)
>>> >> AsianOptionTest: disabled 10.6s, enabled 10.4s
>>> >> BarrierOptionTest: disabled 4.9s, enabled 6.1s
>>> >> DividendOptionTest: disabled 5.1s, enabled 6.5s
>>> >> FdHestonTest: disabled 73.4s, enabled 76.8s
>>> >> FdmLinearOpTest: disabled 11.4s, enabled 11.6s
>>> >>
>>> >> Not much, but a bit slower anyway. I've only got 2 CPUs though (and I
>>> >> compiled with -O2). Peter, what do you get on your 8 CPUs if you run
>>> >> the cases via "time"?
>>> >>
>>> >> Luigi
>>> >>
>>> >>
>>> >> On Sun, Jun 15, 2014 at 3:40 PM, Joseph Wang <[hidden email]> wrote:
>>> >> >
>>> >> > That's quite odd, since OpenMP should not be causing such huge
>>> >> > slowdowns.
>>> >> >
>>> >> > Since the items are not compiled by default, I'd rather keep the
>>> >> > pragmas there.
>>> >> >
>>> >> > Also, is there any possibility that the timing code is off?
>>> >> >
>>> >> >
>>> >> > ------------------------------------------------------------------------------
>>> >> > HPCC Systems Open Source Big Data Platform from LexisNexis Risk Solutions
>>> >> > Find What Matters Most in Your Big Data with HPCC Systems
>>> >> > Open Source. Fast. Scalable. Simple. Ideal for Dirty Data.
>>> >> > Leverages Graph Analysis for Fast Processing & Easy Data Exploration
>>> >> > http://p.sf.net/sfu/hpccsystems
>>> >> > _______________________________________________
>>> >> > QuantLib-dev mailing list
>>> >> > [hidden email]
>>> >> > https://lists.sourceforge.net/lists/listinfo/quantlib-dev
>>> >>
>>> >>
>>> >>
>>> >> --
>>> >> <https://implementingquantlib.blogspot.com>
>>> >> <https://twitter.com/lballabio>
>>> >>
>>> >>
>>> >
>>> >
>>>
>
>
> --
> <https://implementingquantlib.blogspot.com>
> <https://twitter.com/lballabio>