
Re: OpenMP - current usage in ql

Posted by Joseph Wang-4 on Jun 16, 2014; 12:45am
URL: http://quantlib.414.s1.nabble.com/OpenMP-current-usage-in-ql-tp15458p15468.html

There's a reason why it's "off" by default. :-) :-(

There's not that much parallelization going on in the PDE and tree code.  OpenMP is used mainly for copying arrays, so you should see some modest improvement if you are working with very large arrays, but nowhere close to a factor of N for N cores.
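
To make that concrete, here is roughly what that kind of loop-level usage looks like (just a sketch, not QuantLib's actual code): the pragma splits the work across a thread team, but the loop is memory-bound, so the gain flattens out well below a factor of N.

    // Sketch of OpenMP loop-level parallelism over an array operation.
    // Not QuantLib's actual implementation; illustrative only.
    #include <vector>
    #include <cstddef>

    // dst is assumed to be already sized to match src.
    void scaledCopy(const std::vector<double>& src,
                    std::vector<double>& dst, double factor) {
        // Each thread handles a chunk of indices; the thread-team
        // overhead only pays off for very large arrays.
        #pragma omp parallel for
        for (std::ptrdiff_t i = 0; i < (std::ptrdiff_t) src.size(); ++i)
            dst[i] = factor * src[i];
    }

Compiling with -fopenmp activates the pragma; without it the loop just runs serially, which is why "off" is a safe default.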

The problem is that the big speedups are for things like Monte Carlo, which are "ridiculously parallel."  The trouble with MC is that once you parallelize, the random number generator gives you different answers, which makes testing impossible, and you also have the possibility of very subtle bugs from RNG correlations across different processors.  Getting QuantLib to produce consistent RNGs in multi-core MC turns out to be a non-trivial project.
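
To illustrate (a toy sketch, not QuantLib code, and the per-thread seeding below is made up purely for the example): as soon as each thread draws from its own stream, the set of random numbers, and therefore the estimate, depends on the thread count and no longer matches the serial run you test against.

    // Toy parallel MC (not QuantLib code) showing why the numbers change:
    // each thread gets its own RNG stream, so the draws differ from the
    // single sequential stream used in a serial run.
    #include <cmath>
    #include <cstdio>
    #include <random>
    #ifdef _OPENMP
    #include <omp.h>
    #endif

    int main() {
        const long nPaths = 1 << 20;
        double sum = 0.0;
        #pragma omp parallel reduction(+:sum)
        {
            int tid = 0;
    #ifdef _OPENMP
            tid = omp_get_thread_num();
    #endif
            // Per-thread seed: the estimate now depends on OMP_NUM_THREADS.
            std::mt19937_64 rng(42 + tid);
            std::normal_distribution<double> z;
            #pragma omp for
            for (long i = 0; i < nPaths; ++i)
                sum += std::exp(0.2 * z(rng));   // toy lognormal "payoff"
        }
        std::printf("estimate = %.6f\n", sum / nPaths);
        return 0;
    }

Making that reproducible means giving every path a well-defined sub-stream (skip-ahead or per-path seeding) regardless of how paths are assigned to threads, which is exactly the non-trivial part.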

On the PDE side, the algorithms QuantLib currently uses, particularly the ones relating to tridiagonal matrix operations, turn out to be terrible for parallel computing.  There could be better speedups for PDEs with algorithms designed for parallel systems.  Also, if you want to parallelize, you want to use an explicit PDE scheme rather than an implicit one.
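
For contrast, here is a sketch of one explicit time step for a 1-D diffusion-type PDE (not QuantLib's FD framework; names and the boundary handling are made up for the example).  Every node depends only on the previous time level, so the loop parallelizes cleanly, whereas the forward/backward sweeps of a tridiagonal (Thomas) solve in an implicit scheme are inherently sequential.  The usual price is the stability restriction on the explicit time step.

    // One explicit finite-difference step for u_t = (sigma^2/2) * u_xx,
    // with lambda = 0.5 * sigma^2 * dt / dx^2. Sketch only.
    #include <vector>
    #include <cstddef>

    void explicitStep(const std::vector<double>& u,
                      std::vector<double>& uNew, double lambda) {
        const std::ptrdiff_t n = (std::ptrdiff_t) u.size();
        // Each interior node reads only the previous time level, so the
        // iterations are independent and can be split across threads.
        #pragma omp parallel for
        for (std::ptrdiff_t i = 1; i < n - 1; ++i)
            uNew[i] = u[i] + lambda * (u[i+1] - 2.0*u[i] + u[i-1]);
        uNew[0] = u.front();     // boundary values kept fixed for simplicity
        uNew[n-1] = u.back();
    }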

The two projects that I can think of are:

1) getting MC working with OpenMP, or
2) putting in better parallel algorithms for PDEs.

If no one else is working on this, I should be in a position to do it in three months or so.  Right now I'm working on the front end of my bitcoin trading system (http://www.bitquant.com.hk/).  Once the front end is working, I'll connect QuantLib to the backend through OpenGamma and the Java interface, at which point I'll need to go back to backend programming.

Also, if any other QuantLib people are going to HK for the bitcoin conference, let me know.  It turns out that QuantLib is perfect for bitcoin derivatives.  With most financial products, you get the code and the trading system through the broker; since bitcoin has no brokers, trading systems have to be external, which leaves a space for open source software.

The other project I'm working on comes from the fact that a ton of HK people (including my wife) trade warrants and callable bull/bear certificates, and most of them don't have access to any sort of analytics.  The reason is that brokers either don't care whether their customers have analytics, or actively don't want clients to have them, since the brokers are taking the other side of the trade and want their clients to lose money.




On Mon, Jun 16, 2014 at 4:20 AM, Piter Dias <[hidden email]> wrote:
Is there a chance the test is too small? I remember that many years ago (during my Algorithmics days) we used to make the tests as big as the client's real portfolio in order to give advice on which servers to buy (mainly the number of processors, driven by scenario valuation).

Would the speed-up factors change significantly if the base scenario ran for, let's say, 15 minutes? It would make the fixed costs more negligible.

Regards,

_____________________
Piter Dias



> Date: Sun, 15 Jun 2014 20:11:28 +0200
> From: [hidden email]
> To: [hidden email]
> CC: [hidden email]; [hidden email]; [hidden email]
> Subject: Re: [Quantlib-dev] OpenMP - current usage in ql

>
> oh yes, my timings below are total CPU time rather than wall-clock
> time (I usually measure the latter by just counting seconds in my
> head...). That was unfair, sorry! With the time command I get (for
> the AmericanOptionTest):
>
> g++ -O3 -fopenmp
>
> OMP_NUM_THREADS=1 real = 1.925s
> OMP_NUM_THREADS=2 real = 1.468s
> OMP_NUM_THREADS=3 real = 1.590s
> OMP_NUM_THREADS=4 real = 1.647s
> OMP_NUM_THREADS=5 real = 1.780s
> OMP_NUM_THREADS=6 real = 1.838s
> OMP_NUM_THREADS=7 real = 2.081s
> OMP_NUM_THREADS=8 real = 2.282s
>
> g++ -O3
>
> real = 1.638s
>
> still, the point is the same imo. With 8 cores I'd expect maybe a
> speed-up factor of 4 to 6. What we see instead is something around 1
> (often below 1, it seems), so effectively all the additional CPU
> time is eaten up by the overhead of the extra threads. That's not
> worth it, is it? I haven't tried many optimizations with omp yet, but
> what I see in "good" cases is the 4-6 range above. I wouldn't
> parallelize for much below that.

>
> best regards
> Peter
>
> On 15 June 2014 17:18, <[hidden email]> wrote:
> > Hi,
> > Have you tried using only the 4 physical cores of your CPU? I don't use OpenMP, but I use Boost threads, and hyperthreading does very weird things; one is that beyond 4 threads (in this case) scaling stops being linear, which makes sense. Luigi, your 2 CPUs are physical, right?
> > just a shot.
> > Best
> >
> >
> > ----- Original Message -----
> >> Yes, the timing might be off. I suspect that the Boost timer is
> >> reporting the total CPU time, that is, the sum of the actual time per
> >> each CPU. On my box, if I run the BermudanSwaption example with
> >> OpenMP
> >> enabled, it outputs:
> >>
> >> Run completed in 2 m 35 s
> >>
> >> but if I call it through "time", I get an output like:
> >>
> >> real 1m19.767s
> >> user 2m34.183s
> >> sys 0m0.538s
> >>
> >> that is, total CPU time 2m34s, but real time 1m19s. Being the
> >> untrusting individual that I am, I also timed it with a stopwatch.
> >> The
> >> elapsed time is actually 1m19s :)
> >>
> >> This said, I still see a little slowdown in the test cases Peter
> >> listed. My times are:
> >>
> >> AmericanOptionTest: disabled 2.4s, enabled 3.4s (real time)
> >> AsianOptionTest: disabled 10.6s, enabled 10.4s
> >> BarrierOptionTest: disabled 4.9s, enabled 6.1s
> >> DividendOptionTest: disabled 5.1s, enabled 6.5s
> >> FdHestonTest: disabled 73.4s, enabled 76.8s
> >> FdmLinearOpTest: disabled 11.4s, enabled 11.6s
> >>
> >> Not much, but a bit slower anyway. I've only got 2 CPUs though (and I
> >> compiled with -O2). Peter, what do you get on your 8 CPUs if you run
> >> the cases via "time"?
> >>
> >> Luigi
> >>
> >> On Sun, Jun 15, 2014 at 3:40 PM, Joseph Wang <[hidden email]>
> >> wrote:
> >> >
> >> > That's quite odd since OpenMP should not be causing such huge
> >> > slowdowns.
> >> >
> >> > Since by default the pragmas are not compiled in, I'd rather keep
> >> > them there.
> >> >
> >> > Also, is there any possibility that the timing code is off?
> >> >
> >>
> >> --
> >> <https://implementingquantlib.blogspot.com>
> >> <https://twitter.com/lballabio>
> >>
> >>
> >
>

