Hi Joseph, all,
I added a wrapper for the dcmt library (Dynamic Creator of Mersenne Twisters):

https://github.com/lballabio/quantlib/pull/132

I think this is a useful building block for multithreaded Monte Carlo. Since the dynamic creation takes a long time for larger p (it feels more like mining than computing ...), I precomputed 8 independent instances (i.e. for use in at most 8 parallel threads) for the "standard" value p = 19937 and word size 32. They can be instantiated with

    MersenneTwisterDynamicRng mt( mtdesc_0_8_19937[i] , seed_i );

for i = 0, ... , 7.

In addition, random number generation seems a bit faster in the dcmt library than with the original QL twister: I observe running times scaled by a factor of 0.8 when generating 1E8 numbers.

All this is of course experimental and not well tested, so any feedback and experiences are very welcome. I'd be very interested in your opinion on the dcmt library and its applications in parallel Monte Carlo.

Peter

On 20 October 2013 16:01, Joseph Wang <[hidden email]> wrote:
> I've done some more parallelization with OpenMP and QuantLib. I've uploaded
> the changes to https://github.com/joequant/quantlib. The branch openmp
> has some changes that I've issued a pull request for. openmp-mcario has
> some changes that need some more work.
>
> I've gotten the MC to work by generating the paths in a critical section.
> Calculating the prices once I have the path is multithreaded, but right now
> I need to generate the paths in a single thread to make sure that the same
> sequence is generated.
>
> The big issue right now is that there is a race condition in the calculation
> of barrier options which is causing one regression test to fail. The
> problem is that the random number generator is being called in
> BarrierPathPricer, and since that is run multithreaded, the sequence that is
> being pulled will change from run to run based on whether other paths have
> pulled random numbers already.
>
> I think that fixing this is going to need some code restructuring, but I'd
> like to get some thoughts as to how to do this. Basically, the interface
> needs to be changed slightly so that the random numbers are drawn in a fixed
> order, and that might mean one call to get any additional random numbers in
> a pricer, which gets called in a critical section, and another to run the
> pricer with the random numbers.

_______________________________________________
QuantLib-dev mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/quantlib-dev
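To make the one-generator-per-thread idea from this message concrete, here is a minimal sketch. It is not the wrapper from the pull request: std::mt19937 with a distinct seed per worker stands in for MersenneTwisterDynamicRng( mtdesc_0_8_19937[i] , seed_i ), and distinct seeds do not give the same independence guarantee as dcmt's distinct parameter sets. The names partialSum and parallelSums are hypothetical. The point is only that each worker owns its generator, so the result for worker i does not depend on thread scheduling.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <thread>
#include <vector>

// Sum n uniform draws taken from one worker's private generator.
// Assumption: std::mt19937 is only a stand-in for a dcmt-created twister.
double partialSum(std::uint32_t seed, std::size_t n) {
    std::mt19937 rng(seed);
    std::uniform_real_distribution<double> u(0.0, 1.0);
    double sum = 0.0;
    for (std::size_t k = 0; k < n; ++k)
        sum += u(rng);
    return sum;
}

// Start one thread per seed. Every thread touches only its own generator
// and its own output slot, so no locking is needed and no draw order is shared.
std::vector<double> parallelSums(const std::vector<std::uint32_t>& seeds,
                                 std::size_t n) {
    // mirrors the 8 precomputed instances mentioned in the message
    assert(seeds.size() <= 8);
    std::vector<double> out(seeds.size(), 0.0);
    std::vector<std::thread> workers;
    for (std::size_t i = 0; i < seeds.size(); ++i)
        workers.emplace_back(
            [&out, &seeds, i, n] { out[i] = partialSum(seeds[i], n); });
    for (auto& w : workers)
        w.join();
    return out;
}
```

Calling parallelSums({11, 22, 33}, 1000) twice should give identical vectors, and entry i should equal partialSum(seeds[i], 1000) computed serially. With gcc this may need -pthread.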
Hi Peter,
I have tested your dcmt wrapper with the code below. In a single thread, dcmt seems to be 4x slower than the original QL MT. Is this consistent with what you see?

    #include <ql/quantlib.hpp>
    #include <boost/timer.hpp>
    #include <iostream>

    using namespace QuantLib;
    using namespace std;

    int main() {

        // number of draws per generator, read from stdin
        Size samples;
        cin >> samples;
        boost::timer myTimer;

        // time the original QuantLib Mersenne Twister
        MersenneTwisterUniformRng originalMT;
        for(Size i=0; i<samples; ++i)
            originalMT.next();

        cout << myTimer.elapsed() << endl;

        myTimer.restart();

        // time the dcmt-based generator (precomputed instance 5, seed 1)
        MersenneTwisterDynamicRng mt( mtdesc_0_8_19937[5] , 1);

        for(Size i=0; i<samples; ++i) {
            mt.next();
        }

        cout << myTimer.elapsed() << endl;

        // wait for input so the console window stays open
        int n;
        std::cin>>n;
        return 0;
    }

Regards,
Cheng

-----Original Message-----
From: Peter Caspers [mailto:[hidden email]]
Sent: 6 September 2014 20:48
To: Joseph Wang
Cc: QuantLib Mailing Lists
Subject: Re: [Quantlib-dev] Openmp work on mcarlo : Dynamic Creator MT

Hi Joseph, all,

I added a wrapper for the dcmt library (Dynamic Creator of Mersenne Twisters).
[...]
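As a possible variation on the timing test above, the following sketch times a generator with std::chrono and accumulates every draw into a checksum, so that an optimizing compiler cannot drop the loop as dead code. Whether that actually happened in any of the measurements in this thread is not known; this is only a precaution. The function name timeDraws is hypothetical, and the generator is assumed to be callable as rng(), as std::mt19937 is. The QuantLib generators expose next() instead, so they would need a small adapter.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <random>

// Draw `samples` numbers from `rng`, return the elapsed wall-clock seconds,
// and write the sum of all draws to `checksum` so the work has a visible effect.
template <class Rng>
double timeDraws(Rng& rng, std::size_t samples, std::uint64_t& checksum) {
    assert(samples > 0);
    const auto start = std::chrono::steady_clock::now();
    std::uint64_t acc = 0;
    for (std::size_t i = 0; i < samples; ++i)
        acc += rng();  // assumption: the generator is callable, like std::mt19937
    const auto stop = std::chrono::steady_clock::now();
    checksum = acc;
    return std::chrono::duration<double>(stop - start).count();
}
```

Printing the checksum next to the timing also gives a quick sanity check that two builds of the same generator produce the same stream.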
In reply to this post by Peter Caspers-4
Yes. It would be nice if we could get it in.
My experience with MC is that the big bottleneck with parallel applications is a testing issue (i.e. how do you verify that the number is correct). The industry-standard approach involves using an RNG that can be started at a given location, so that it generates the same random numbers for the parallel and the standard paths. I know of one bank where they ended up using Mersenne Twister for this.
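A minimal sketch of "an RNG that can be started at a given location", under two assumptions that are not from this thread: every path consumes a fixed number of draws, and std::mt19937 with its discard() member stands in for whatever generator a production system would use. The function names are hypothetical. Path i skips i * drawsPerPath numbers and then reads its own block, so it sees exactly the numbers it would have received in a purely sequential run.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Draws for one path: position the stream at the start of this path's block.
// Assumption: each path uses exactly drawsPerPath numbers.
std::vector<std::uint32_t> drawsForPath(std::uint32_t seed,
                                        std::size_t pathIndex,
                                        std::size_t drawsPerPath) {
    assert(drawsPerPath > 0);
    std::mt19937 rng(seed);
    rng.discard(static_cast<unsigned long long>(pathIndex) * drawsPerPath);
    std::vector<std::uint32_t> out(drawsPerPath);
    for (auto& x : out)
        x = static_cast<std::uint32_t>(rng());
    return out;
}

// Reference: the same stream read front to back in a single thread.
std::vector<std::uint32_t> sequentialDraws(std::uint32_t seed,
                                           std::size_t total) {
    std::mt19937 rng(seed);
    std::vector<std::uint32_t> out(total);
    for (auto& x : out)
        x = static_cast<std::uint32_t>(rng());
    return out;
}
```

Because drawsForPath depends only on (seed, pathIndex), it can be called from any thread in any order and still reproduce the single-threaded numbers, which is what makes the parallel result testable. Note that discard() on a Mersenne Twister may cost time proportional to the number of skipped draws, and a pricer that pulls a variable number of extra draws (as described for BarrierPathPricer earlier in the thread) would not fit the fixed-block assumption as is.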
In reply to this post by cheng li
Hi Cheng,
no, I get better timings with the dcmt implementation, e.g. for 1E8 numbers

    dcmt     0.982s
    quantlib 1.159s

on my computer. Can you post your platform and compiler settings, so that I can try to reproduce?

Thanks
Peter

On 12 September 2014 05:29, cheng li <[hidden email]> wrote:
> Hi Peter,
>
> I have used your wrapper dcmt library and test with following codes: It
> seems dcmt in single thread is 4X slower than the QL original MT. Is this
> consistent with your side?
> [...]
In reply to this post by Peter Caspers-4
Hi Cheng,
indeed, with MSVC I get a slowdown by a factor of ~2.8x. As I said, under gcc the running time scales by ~0.8x, i.e. a speed-up (with -O3).

Does anyone have an idea where the different behaviour under gcc / Linux and MSVC might come from (and how to improve the MSVC side, if possible)?

Kind regards
Peter

On 13 September 2014 08:27, Cheng Li <[hidden email]> wrote:
> Thanks Peter.
>
> Regards,
> Cheng
>
>> On 13 September 2014, at 13:29, Peter Caspers <[hidden email]> wrote:
>>
>> I will have a look on Monday (I have a Windows machine at work) and see how it works there.
>>
>> Thanks
>> Peter
>>
>>> On 13.09.2014 at 04:41, Cheng Li <[hidden email]> wrote:
>>>
>>> I am on Win7 x64, using VS 2012 with QuantLib 1.4 and Boost 1.55 in release mode.
>>>
>>>> On 13 September 2014, at 0:08, Peter Caspers <[hidden email]> wrote:
>>>>
>>>> Hi Cheng,
>>>>
>>>> no, I get better timings with the dcmt implementation, e.g. for 1E8 numbers
>>>> [...]
Thanks Peter. I also tested on Ubuntu: dcmt is about 3~4x slower there with -O2 optimization.

I'll also try -O3 on my Ubuntu machine.

Regards,
Cheng

-----Original Message-----
From: Peter Caspers [mailto:[hidden email]]
Sent: 17 September 2014 0:32
To: Cheng Li; QuantLib Mailing Lists
Subject: Re: RE: [Quantlib-dev] Openmp work on mcarlo : Dynamic Creator MT

Hi Cheng,

indeed with msvc I get a slow down with a factor of ~2.8x. As I said, under gcc it is a speed up ~ 0.8x (with -O3).
[...]
with gcc 4.9.1 and -O2 the new MT is a bit slower than the original one (but only by a factor of 1.1). I have to add both -frename-registers and -finline-functions to -O2 to get back the speed-up I mentioned before.

Which compiler do you use on Ubuntu?

Peter

On 17 September 2014 03:26, cheng li <[hidden email]> wrote:
> Thanks Peter. I test on Ubuntu also, about 3~4X lower with -O2 optiomization.
>
> I'll try -O3 on my machine also with Ubuntu.
> [...]
Hi Peter,
I used gcc 4.8.2.

My result with -O3 optimization is still not good: the new MT performs about the same as before (roughly a 3~4X slowdown).

I used the following statement to turn on -O3 optimization before running ./configure for QuantLib:

    export CXXFLAGS="-g -O3"

Am I right?

Regards,
Cheng
Let me try your statement once I have time.

Regards,
Cheng
Hi Cheng,
sorry, this was my fault: I messed up the timings because I did not use consistent optimizer flags when compiling the library and the test program.

On Windows (the same machine on which I run Ubuntu, though that doesn't really matter, because my office computer gives very similar timings), generating 1E8 random numbers with O2 takes 400ms / 1100ms for the original ql mt / dynamic creator mt. The ql mt is just as fast as the boost mt implementation, by the way.

On Ubuntu with gcc 4.8.1 and O3 I get 290ms / 870ms, and with O2 a close value: 910ms for the creator mt. It also makes no difference whether I use gcc 4.9.1 or clang 3.6.0.

If I call the original C routine directly, without using the wrapper object, I get 720ms. If I use the original library and a C example (both compiled with O3, which is the configuration the library is shipped with, since it has a hardcoded makefile), I get 730ms.

This means the wrapper introduces a slowdown of about 20%, which seems not too bad. Otherwise the dcmt is slower than the original mt by a factor of around 2-3 in all cases. Since this is already the case with the original library, I wouldn't try to do anything about it at the moment.

What is your opinion on this?

Peter

I compared different platforms again, but now on the _same_ machine: Original MT / Dynamic Creator MT (generation of 1E8 numbers, single threaded, with O2 (MSVC) and O3 (gcc, clang)). I also checked the boost implementation mt19937, which is very close to the ql original mt in all cases.

Windows / MSVC 2010   =>  400 ms / 1100 ms
Ubuntu / gcc 4.9.1    => 1200 ms / 1050 ms
Ubuntu / gcc 4.8.1    => 1180 ms / 1040 ms
Ubuntu / clang 3.6.0  => 1340 ms / 1150 ms

clang 290 720 870 (c 730)

so it looks like MSVC does a specific optimization for the QL and boost mt19937, which does not apply on the other platforms and not to the dynamic creator mt. At the moment I still don't know what it is.
Hi Peter,
Thanks for your hard work. I think our results are consistent.

Regards,
Cheng
>>>>>>> >>>>>>> All this is of course experimental and not well tested, so any >>>>>>> feedback and experiences are very welcome. I'd be very >>>>>>> interested in your opinion on the dcmt library and applications in parallel monte carlo. >>>>>>> >>>>>>> Peter >>>>>>> >>>>>>>> On 20 October 2013 16:01, Joseph Wang <[hidden email]> wrote: >>>>>>>> I've done some more parallelization with openmp and quantlib. >>>>>>>> I've uploaded the changes to the >>>>>>>> https://github.com/joequant/quantlib. The branch openmp has some changes that I've issued a pull-request for. >>>>>>>> openmp-mcario has some changes that need some more work. >>>>>>>> >>>>>>>> I've gotten the MC to work by generating the paths in a >>>>>>>> critical >>>>>>> situation. >>>>>>>> Calculating the prices once I have the path is multithreaded, >>>>>>>> but right now I need to generate the paths in a single thread >>>>>>>> to make sure that the same sequence is generated. >>>>>>>> >>>>>>>> The big issue right now is that there is a race condition in >>>>>>>> the calculation of barrier options which is causing one >>>>>>>> regression test to fail. The problem is that the random number >>>>>>>> generator is being called in BarrierPathPricer, and since that >>>>>>>> is run multithread, the sequence that is being pulled will >>>>>>>> change from run to run based on whether other paths have pulled random numbers already. >>>>>>>> >>>>>>>> I think that fixing this is going to need some code >>>>>>>> restructuring, but I'd like to get some thoughts as to how to >>>>>>>> do this. Basically, the interface needs to be changed slightly >>>>>>>> so that the random numbers are drawn in a fixed order, and that >>>>>>>> might mean one call to get any additional random numbers in a >>>>>>>> pricer, which gets called in a critical section, and another to run the pricer with the random numbers. 
>>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> --------------------------------------------------------------- >>>>>>>> - >>>>>>>> - >>>>>>>> ----- >>>>>>>> -------- October Webinars: Code for Performance Free Intel >>>>>>>> webinars can help you accelerate application performance. >>>>>>>> Explore tips for MPI, OpenMP, advanced profiling, and more. Get >>>>>>>> the most from the latest Intel processors and coprocessors. See >>>>>>>> abstracts and register > >>>>>>>> http://pubads.g.doubleclick.net/gampad/clk?id=60135031&iu=/4140 >>>>>>>> / o stg.c lktrk _______________________________________________ >>>>>>>> QuantLib-dev mailing list >>>>>>>> [hidden email] >>>>>>>> https://lists.sourceforge.net/lists/listinfo/quantlib-dev >>>>>>> >>>>>>> ---------------------------------------------------------------- >>>>>>> - >>>>>>> - >>>>>>> ---------- >>>>>>> -- >>>>>>> Slashdot TV. >>>>>>> Video for Nerds. Stuff that matters. >>>>>>> http://tv.slashdot.org/ >>>>>>> _______________________________________________ >>>>>>> QuantLib-dev mailing list >>>>>>> [hidden email] >>>>>>> https://lists.sourceforge.net/lists/listinfo/quantlib-dev >>>>>>> >> > > ------------------------------------------------------------------------------ Slashdot TV. Video for Nerds. Stuff that Matters. http://pubads.g.doubleclick.net/gampad/clk?id=160591471&iu=/4140/ostg.clktrk _______________________________________________ QuantLib-dev mailing list [hidden email] https://lists.sourceforge.net/lists/listinfo/quantlib-dev |
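The timings quoted in this message depend on measuring both generators under identical conditions. A minimal sketch of such a harness follows; it is not code from the thread or from the pull request. It assumes any generator callable as `rng()` (for example `std::mt19937` as a stand-in for the QuantLib and dcmt classes, which are not used here), and `TimingResult`, `timeGenerator` and `slowdown` are hypothetical names introduced for illustration. The draws are summed so that the optimizer cannot discard the loop, and wall-clock time is taken from `std::chrono::steady_clock`.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <random>

// Result of one timing run (hypothetical type, for illustration only).
struct TimingResult {
    double milliseconds;     // elapsed wall-clock time
    std::uint64_t checksum;  // sum of all draws, keeps the loop observable
};

// Draws `samples` numbers from `rng` and measures the elapsed wall-clock time.
// Rng is assumed to be callable as rng() and to return an unsigned integer.
template <class Rng>
TimingResult timeGenerator(Rng& rng, std::uint64_t samples) {
    assert(samples > 0);
    const auto start = std::chrono::steady_clock::now();
    std::uint64_t sum = 0;
    for (std::uint64_t i = 0; i < samples; ++i)
        sum += rng();  // using the result prevents dead-code elimination
    const auto stop = std::chrono::steady_clock::now();
    const std::chrono::duration<double, std::milli> elapsed = stop - start;
    return TimingResult{elapsed.count(), sum};
}

// Ratio of two timings; values above 1 mean the candidate is slower.
inline double slowdown(const TimingResult& reference,
                       const TimingResult& candidate) {
    return candidate.milliseconds / reference.milliseconds;
}
```

With `std::mt19937 rng(1);`, a call such as `timeGenerator(rng, 100000000)` corresponds to the 1E8-draw setup used in the thread. The library and the test program should be compiled with the same optimizer flags, since mismatched flags were the source of the earlier discrepancy reported above.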
Hi Cheng,
I switched to a template class for the precomputed twisters, which is faster by a factor of 2 (450ms instead of 870ms). It can be instantiated with

MersenneTwisterCustomRng<Mtdesc19937_5> mt(42);

with 5 replaceable by 0 to 7 as before. The other class is now only needed if you want to create an mt at runtime.

The pull request is updated accordingly.

Best regards
Peter
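One plausible reason for a speed-up of this kind is that parameters passed as template arguments are compile-time constants, so the compiler can fold shifts and masks into the instructions instead of loading them from a descriptor on every call. The sketch below illustrates that idea only; it is not the code from the pull request. `RuntimeParams`, `StandardTempering`, `temperRuntime` and `temperStatic` are hypothetical names, and the constants are the tempering values of the standard MT19937, not one of the precomputed `Mtdesc19937_*` sets.

```cpp
#include <cstdint>

// Parameter set known only at run time (hypothetical layout).
struct RuntimeParams {
    std::uint32_t shift0, shift1, shiftB, shiftC;
    std::uint32_t maskB, maskC;
};

// Tempering step with parameters read from the struct on every call.
inline std::uint32_t temperRuntime(std::uint32_t x, const RuntimeParams& p) {
    x ^= x >> p.shift0;
    x ^= (x << p.shiftB) & p.maskB;
    x ^= (x << p.shiftC) & p.maskC;
    x ^= x >> p.shift1;
    return x;
}

// The same parameters as compile-time constants (standard MT19937 values).
struct StandardTempering {
    static constexpr std::uint32_t shift0 = 11;
    static constexpr std::uint32_t shift1 = 18;
    static constexpr std::uint32_t shiftB = 7;
    static constexpr std::uint32_t shiftC = 15;
    static constexpr std::uint32_t maskB = 0x9D2C5680u;
    static constexpr std::uint32_t maskC = 0xEFC60000u;
};

// Tempering step with parameters taken from the template argument, so the
// compiler can treat every shift and mask as an immediate constant.
template <class P>
inline std::uint32_t temperStatic(std::uint32_t x) {
    x ^= x >> P::shift0;
    x ^= (x << P::shiftB) & P::maskB;
    x ^= (x << P::shiftC) & P::maskC;
    x ^= x >> P::shift1;
    return x;
}
```

Both variants compute the same value, for example `temperStatic<StandardTempering>(x)` and `temperRuntime(x, p)` with `p` filled from the same constants; only the point at which the parameters become known differs. Whether this accounts for the measured factor of 2 would have to be checked against the generated code.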
Hi Peter,
Thanks for your effort. I'll definitely give it a try :)

Regards,
Cheng
yes, please. The slowdown on Windows on my office computer is around 1.6 now.
best regards
Peter

On 22 September 2014 03:48, cheng li <[hidden email]> wrote:
> Hi Peter,
>
> Thanks for your effort. I'll definitely have a try:)
>
> Regards,
> Cheng
|
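For anyone who wants to reproduce slowdown factors like the ones quoted in this thread, here is a minimal timing sketch. It is not the benchmark used above: it relies only on standard-library engines as stand-ins for the QuantLib and dcmt generators, and the helper names timeDraws and slowdownFactor are made up for this illustration. The factor is taken as (time of the candidate) / (time of the reference), so a value above 1 means the candidate is slower.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <random>

// Draw `samples` numbers from `rng` and return the elapsed time in milliseconds.
// The draws are summed into `sink` so the optimizer cannot discard the loop.
template <class Rng>
double timeDraws(Rng& rng, std::uint64_t samples, std::uint64_t& sink) {
    auto start = std::chrono::steady_clock::now();
    std::uint64_t sum = 0;
    for (std::uint64_t i = 0; i < samples; ++i)
        sum += rng();
    auto stop = std::chrono::steady_clock::now();
    sink += sum;
    return std::chrono::duration<double, std::milli>(stop - start).count();
}

// Slowdown of `candidate` relative to `reference` (hypothetical helper).
// Use a sample count large enough that both timings are clearly above zero.
template <class Ref, class Cand>
double slowdownFactor(Ref& reference, Cand& candidate,
                      std::uint64_t samples, std::uint64_t& sink) {
    assert(samples > 0);
    double tRef = timeDraws(reference, samples, sink);
    double tCand = timeDraws(candidate, samples, sink);
    return tCand / tRef;
}
```

Called with two seeded engines, e.g. std::mt19937 as reference and std::mt19937_64 as candidate, and 1E8 samples, this gives one factor per run. As noted earlier in the thread, the library and the test program should be compiled with the same optimizer flags, otherwise the timings are not comparable.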
Hi Peter,
On my side the performance has also improved; the slowdown is now around a factor of 2.5. Thanks for your help.

Regards,
Cheng

-----Original Message-----
From: Peter Caspers [mailto:[hidden email]]
Sent: 22 September 2014 16:05
To: cheng li
Cc: QuantLib Mailing Lists
Subject: Re: RE: RE: RE: RE: RE: [Quantlib-dev] Openmp work on mcarlo : Dynamic Creator MT

yes, please. The slowdown on Windows on my office computer is around 1.6 now.

best regards
Peter
|
Hi,
I think I was able to further improve the performance of the precomputed twisters, i.e. the ones constructed as

MersenneTwisterCustomRng<Mtdesc19937_5> mt(42);

They now seem to be just as fast as the original QuantLib twister (I only tested on Linux). The PR is updated.

Cheng, would you maybe like to double-check?

Thanks a lot
Peter

On 23 September 2014 03:50, cheng li <[hidden email]> wrote:
> Hi Peter,
>
> On my side the performance is also improved. Now around 2.5 slow down. Thanks for your help.
>
> Regards,
> Cheng
|
Will do, my pleasure:)
Regards,
Cheng

-----Original Message-----
From: Peter Caspers [mailto:[hidden email]]
Sent: 27 October 2014 3:49
To: cheng li
Cc: QuantLib Mailing Lists
Subject: Re: RE: RE: RE: RE: RE: RE: [Quantlib-dev] Openmp work on mcarlo : Dynamic Creator MT

Hi,

I think I could further improve the performance of the precomputed twisters (i.e. the ones constructed as

    MersenneTwisterCustomRng<Mtdesc19937_5> mt(42);

). Now they seem to be just as fast as the original one (I only tested on Linux). The PR is updated.

Cheng, would you maybe like to double check?

Thanks a lot
Peter
In reply to this post by Peter Caspers-4
Hi Peter,
It works great on Windows.

With 9999999999 samples:

Original MT: 35.63
Dynamic MT: 37.03

I also tried 100000, 100000000 and 1000000000 samples. The results are similar, and the elapsed time grows linearly.

I tried VC++ 2012. VC++ 2010 will behave the same in my opinion. I will get back to you when the VC++ 2010 test has finished.

Regards,
Cheng
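For anyone who wants to repeat this kind of timing comparison, here is a minimal sketch of a timing helper. It is not the test program used in this thread: timeDraws is a hypothetical name, and it works with any generator callable as rng(), for example std::mt19937 from the standard library as a stand-in for the QuantLib classes. The sample count is a 64-bit integer because counts such as 9999999999 do not fit into 32 bits.

```cpp
#include <chrono>
#include <cstdint>
#include <random>

// Hypothetical helper: draws `samples` numbers from `rng` and returns the
// elapsed wall-clock time in seconds. The running sum is handed back through
// `checksum` so that the compiler cannot optimize the loop away.
template <class Rng>
double timeDraws(Rng& rng, std::uint64_t samples, std::uint64_t& checksum) {
    const auto start = std::chrono::steady_clock::now();
    std::uint64_t acc = 0;
    for (std::uint64_t i = 0; i < samples; ++i)
        acc += rng();  // one draw per iteration
    checksum = acc;
    const std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;
    return elapsed.count();
}
```

A call could look like timeDraws(mt, 100000000ULL, sum) with std::mt19937 mt(42) and std::uint64_t sum. Building the library and the test program with the same optimizer flags matters for such comparisons.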
Hello,
I made an attempt to add multithreading to the general Monte Carlo framework. It is built around a multithreaded Mersenne twister class using the dcmt library wrapper mentioned earlier.

From an end user perspective, multithreading is enabled by compiling the library and the application with OpenMP enabled (--enable-openmp if you use configure) and using a multithreaded RNG trait, for example

    engine = MakeMCEuropeanHestonEngine<PseudoRandomMultiThreaded>(process)
                 .withStepsPerYear(11)
                 .withAntitheticVariate()
                 .withSamples(500000)
                 .withSeed(1234);

where the usual PseudoRandom is replaced by PseudoRandomMultiThreaded.

A nice property is that the results can be reproduced (up to round-off errors) as long as the seed is the same (of course) and the number of threads being used is the same.

I only added a critical section where the path results are added to the sample accumulator in MonteCarloModel (which does not cost much in my tests).

However, the MC engines have to be carefully reviewed one by one before using them multithreaded. Without that review they will usually give non-deterministic crashes with scary error messages. In particular, the path pricer and the process used for path generation must be made thread safe. I did this (as an example) for the MCEuropeanHestonEngine, which essentially meant ensuring that these calls

    riskFreeRate_->forwardRate(t0, t1, Continuous)
        - dividendYield_->forwardRate(t0, t1, Continuous);

in the HestonProcess do not trigger write operations in the yield term structures' underlying LazyObjects. That can be done with a critical section around these calls, which however is a performance killer (on 8 threads there is effectively no speed-up compared to a single thread). A better solution is to trigger the computation of the two yield term structures before the simulation, which I do when the path pricer is created. It seems clear that no notifications whatsoever are sent during the simulation, so this should work fine.
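The pattern described above can be illustrated with a small standalone sketch. It is not the code from the pull request: LazyCurve and simulate are hypothetical names, LazyCurve only mimics a lazily cached term structure, and std::mt19937 with distinct seeds is a stand-in for the dcmt twisters (distinct seeds alone give no independence guarantee). The sketch also fixes the number of streams rather than the number of threads, which is a choice made here for simplicity.

```cpp
#include <cstddef>
#include <random>

// Hypothetical stand-in for a term structure backed by a lazy cache: the
// first call to value() writes to the cache, so making that first call from
// several threads at once would be a data race.
class LazyCurve {
  public:
    explicit LazyCurve(double rate) : rate_(rate) {}
    void calculate() const {
        if (!calculated_) { cached_ = rate_; calculated_ = true; }
    }
    double value() const { calculate(); return cached_; }
  private:
    double rate_;
    mutable double cached_ = 0.0;
    mutable bool calculated_ = false;
};

// Monte Carlo mean of (rate + U), U uniform on [0,1), over a fixed number of
// streams. Without OpenMP the pragmas are ignored and the loop runs serially.
double simulate(const LazyCurve& curve, unsigned long seed,
                int streams, std::size_t samplesPerStream) {
    curve.calculate();  // trigger the lazy write once, before going parallel

    double sum = 0.0;
    #pragma omp parallel for
    for (int s = 0; s < streams; ++s) {
        // one generator per stream (assumption: seed offset as a stand-in
        // for a separately created twister per thread)
        std::mt19937 rng(static_cast<std::mt19937::result_type>(seed + s));
        std::uniform_real_distribution<double> u(0.0, 1.0);
        double local = 0.0;
        for (std::size_t i = 0; i < samplesPerStream; ++i)
            local += curve.value() + u(rng);  // read-only access to the curve
        #pragma omp critical
        sum += local;  // the only shared write, once per stream
    }
    return sum / (static_cast<double>(streams) *
                  static_cast<double>(samplesPerStream));
}
```

Because each stream draws from its own generator, the result depends only on the seed and the number of streams, up to round-off from the order in which the partial sums are added. The critical section is entered once per stream, so it costs little compared to locking around every curve access.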
Still, only a speedup of 2x is achieved in this example on 8 threads, which is not very impressive. I am not sure at the moment why it is not better. I made similar adjustments to the MCAmericanEngine and the LongstaffSchwartzPathPricer. The speedup is even smaller there. I will try to replace OpenMP with a more native approach later to see what is going on. Nevertheless, the thing is flying at least.

If somebody is interested in having a look, here is the pull request containing the changes in the framework, the added dcmt wrapper and test cases.

https://github.com/lballabio/quantlib/pull/280

Thank you
Peter

On 27 October 2014 at 04:46, cheng li <[hidden email]> wrote:
> Hi Peter,
>
> It works great on Windows.
>
> Trying 9999999999 samples:
>
> Original MT: 35.63
> Dynamic MT: 37.03
>
> I also tried 100000, 100000000 and 1000000000 samples.
>
> The results are similar and the elapsed time grows linearly.
>
> I tried vc++ 2012. vc++ 2010 will behave the same in my opinion. I will get back to you when the vc++ 2010 test has finished.
>
> Regards,
> Cheng
>
> -----Original Message-----
> From: Peter Caspers [mailto:[hidden email]]
> Sent: 27 October 2014 3:49
> To: cheng li
> Cc: QuantLib Mailing Lists
> Subject: Re: RE: RE: RE: RE: RE: RE: [Quantlib-dev] Openmp work on mcarlo : Dynamic Creator MT
>
> Hi,
>
> I think I could further improve the performance of the precomputed twisters (i.e. the ones constructed as
>
> MersenneTwisterCustomRng<Mtdesc19937_5> mt(42);
>
> ). Now they seem to be just as fast as the original one (I only tested on Linux). The PR is updated.
>
> Cheng, would you maybe like to double check ?
>
> Thanks a lot
> Peter
>
> On 23 September 2014 03:50, cheng li <[hidden email]> wrote:
>> Hi Peter,
>>
>> On my side the performance is also improved. The slowdown is now around 2.5x. Thanks for your help.
>>
>> Regards,
>> Cheng
>>
>> -----Original Message-----
>> From: Peter Caspers [mailto:[hidden email]]
>> Sent: 22 September 2014 16:05
>> To: cheng li
>> Cc: QuantLib Mailing Lists
>> Subject: Re: RE: RE: RE: RE: RE: [Quantlib-dev] Openmp work on mcarlo : Dynamic Creator MT
>>
>> yes, please. The slowdown on Windows on my office computer is around 1.6 now.
>> best regards
>> Peter
>>
>> On 22 September 2014 03:48, cheng li <[hidden email]> wrote:
>>> Hi Peter,
>>>
>>> Thanks for your effort. I'll definitely give it a try :)
>>>
>>> Regards,
>>> Cheng
>>>
>>> -----Original Message-----
>>> From: Peter Caspers [mailto:[hidden email]]
>>> Sent: 21 September 2014 23:11
>>> To: cheng.li
>>> Cc: QuantLib Mailing Lists
>>> Subject: Re: RE: RE: RE: RE: [Quantlib-dev] Openmp work on mcarlo : Dynamic Creator MT
>>>
>>> Hi Cheng,
>>>
>>> I switched to a template class for precomputed twisters, which is
>>> faster by a factor of 2 (450ms instead of 870ms). This can be
>>> instantiated with
>>>
>>> MersenneTwisterCustomRng<Mtdesc19937_5> mt(42);
>>>
>>> with 5 replaceable by 0 to 7 as before. The other class is now only needed if you want to create an mt at runtime.
>>>
>>> The pull request is updated accordingly.
>>>
>>> Best regards
>>> Peter
>>>
>>> On 21 September 2014 08:11, cheng.li <[hidden email]> wrote:
>>>> Hi Peter,
>>>>
>>>> Thanks for your hard work. I think our results are consistent.
>>>>
>>>> Regards,
>>>> Cheng
>>>>
>>>> -----Original Message-----
>>>> From: Peter Caspers [mailto:[hidden email]]
>>>> Sent: 21 September 2014 0:33
>>>> To: cheng li
>>>> Cc: QuantLib Mailing Lists
>>>> Subject: Re: RE: RE: RE: [Quantlib-dev] Openmp work on mcarlo : Dynamic Creator MT
>>>>
>>>> Hi Cheng,
>>>>
>>>> sorry, this was my fault, I messed up the timings, because I did not use consistent optimizer flags when compiling the library and the test program.
>>>>
>>>> Actually on Windows (same machine on which I run Ubuntu, which
>>>> doesn't really matter, because my computer in the office gives very
>>>> similar timings) I get for 1E8 random numbers generated (with O2)
>>>>
>>>> 400ms / 1100ms
>>>>
>>>> for the original ql mt / dynamic creator mt. The ql mt is just as
>>>> fast as the boost mt implementation by the way. On Ubuntu with gcc
>>>> 4.8.1 and O3 I get
>>>>
>>>> 290ms / 870ms
>>>>
>>>> and with O2 a close value, for the creator mt 910ms. Also it makes no difference if I use gcc 4.9.1 or clang 3.6.0.
>>>>
>>>> If I directly call the original C routine without using the wrapper object, I get 720ms.
>>>>
>>>> If I use the original library and a C example (both compiled with O3, which is the configuration in which the library is shipped (it has a hardcoded makefile)) => 730ms.
>>>>
>>>> This means the wrapper introduces a slowdown of 20%, which seems not too bad.
>>>>
>>>> Otherwise the dcmt is slower by a factor of around 2-3 compared to the original mt in all cases. Since this is already the case with the original library, I wouldn't try to do anything about it at the moment.
>>>>
>>>> What is your opinion on this ?
>>>>
>>>> Peter
>>>>
>>>> I compared different platforms again, but now on the _same_ machine - Original MT / Dynamic Creator MT (generation of 1E8 numbers, single threaded, with O2 (MSVC) and O3 (gcc, clang)). I also checked the boost implementation mt19937, which is very close to the ql original mt in all cases.
>>>>
>>>> Windows / MSVC 2010 => 400ms / 1100ms
>>>> Ubuntu / gcc 4.9.1 => 1200 ms / 1050 ms
>>>> Ubuntu / gcc 4.8.1 => 1180 ms / 1040 ms
>>>> Ubuntu / clang 3.6.0 => 1340 ms / 1150 ms
>>>>
>>>> clang
>>>> 290
>>>> 720
>>>> 870
>>>>
>>>> (c 730)
>>>>
>>>> so it looks like MSVC does a specific optimization for the QL and boost mt19937, which does not apply on the other platforms and not to the dynamic creator mt.
>>>>
>>>> At the moment I still don't know what it is.
>>>>
>>>> On 18 September 2014 03:33, cheng li <[hidden email]> wrote:
>>>>> Let me try your statement once I have time.
>>>>>
>>>>> Regards,
>>>>> Cheng
>>>>>
>>>>> -----Original Message-----
>>>>> From: cheng li [mailto:[hidden email]]
>>>>> Sent: 18 September 2014 9:18
>>>>> To: 'Peter Caspers'
>>>>> Cc: 'QuantLib Mailing Lists'
>>>>> Subject: RE: RE: RE: [Quantlib-dev] Openmp work on mcarlo : Dynamic Creator MT
>>>>>
>>>>> Hi Peter,
>>>>>
>>>>> I used gcc 4.8.2.
>>>>>
>>>>> My result with O3 optimization is still not good. The new MT
>>>>> performs similarly (about 3~4x slower).
>>>>>
>>>>> I used this statement to turn on O3 optimization before running
>>>>> ./configure for QuantLib:
>>>>>
>>>>> export CXXFLAGS="-g -O3"
>>>>>
>>>>> Am I right?
>>>>>
>>>>> Regards,
>>>>> Cheng
>>>>>
>>>>> -----Original Message-----
>>>>> From: Peter Caspers [mailto:[hidden email]]
>>>>> Sent: 18 September 2014 0:36
>>>>> To: cheng li
>>>>> Cc: QuantLib Mailing Lists
>>>>> Subject: Re: RE: RE: [Quantlib-dev] Openmp work on mcarlo : Dynamic Creator MT
>>>>>
>>>>> with gcc 4.9.1 and O2 the new mt is a bit slower than the original one (but only by a factor of 1.1).
>>>>> I have to add both -frename-registers and -finline-functions to -O2 to get back the speedup I mentioned before.
>>>>>
>>>>> Which compiler do you use on Ubuntu ?
>>>>>
>>>>> Peter
>>>>>
>>>>> On 17 September 2014 03:26, cheng li <[hidden email]> wrote:
>>>>>> Thanks Peter. I tested on Ubuntu as well, about 3~4x slower with -O2 optimization.
>>>>>>
>>>>>> I'll also try -O3 on my Ubuntu machine.
>>>>>>
>>>>>> Regards,
>>>>>> Cheng
>>>>>>
>>>>>> -----Original Message-----
>>>>>> From: Peter Caspers [mailto:[hidden email]]
>>>>>> Sent: 17 September 2014 0:32
>>>>>> To: Cheng Li; QuantLib Mailing Lists
>>>>>> Subject: Re: RE: [Quantlib-dev] Openmp work on mcarlo : Dynamic Creator MT
>>>>>>
>>>>>> Hi Cheng,
>>>>>>
>>>>>> indeed with msvc I get a slowdown by a factor of ~2.8x.
>>>>>> As I said, under gcc it is a speedup of ~ 0.8x (with -O3).
>>>>>>
>>>>>> Does anyone have an idea where the different behaviour under gcc /
>>>>>> linux and msvc might come from (and how to improve the msvc side
>>>>>> if possible) ?
>>>>>>
>>>>>> Kind regards
>>>>>> Peter
>>>>>>
>>>>>> On 13 September 2014 08:27, Cheng Li <[hidden email]> wrote:
>>>>>>> Thanks Peter.
>>>>>>>
>>>>>>> Regards,
>>>>>>> Cheng
>>>>>>>
>>>>>>> Sent from my iPad
>>>>>>>
>>>>>>>> On 13 September 2014 at 13:29, Peter Caspers <[hidden email]> wrote:
>>>>>>>>
>>>>>>>> I will have a look on Monday (I have a Windows machine at work)
>>>>>>>> and see how it works there
>>>>>>>>
>>>>>>>> Thanks
>>>>>>>> Peter
>>>>>>>>
>>>>>>>> Sent from my iPhone
>>>>>>>>
>>>>>>>>> On 13 September 2014 at 04:41, Cheng Li <[hidden email]> wrote:
>>>>>>>>>
>>>>>>>>> I am on Win7 x64, using vs 2012 with quantlib 1.4 and boost 1.55
>>>>>>>>> in release mode
>>>>>>>>>
>>>>>>>>> Sent from my iPad
>>>>>>>>>
>>>>>>>>>> On 13 September 2014 at 0:08, Peter Caspers <[hidden email]> wrote:
>>>>>>>>>>
>>>>>>>>>> Hi Cheng,
>>>>>>>>>>
>>>>>>>>>> no, I get better timings with the dcmt implementation, e.g.
>>>>>>>>>> for 1E8 numbers
>>>>>>>>>>
>>>>>>>>>> dcmt 0.982s
>>>>>>>>>> quantlib 1.159s
>>>>>>>>>>
>>>>>>>>>> on my computer. Can you post your platform and compiler
>>>>>>>>>> settings, so that I can try to reproduce ?
>>>>>>>>>>
>>>>>>>>>> Thanks
>>>>>>>>>> Peter
>>>>>>>>>>
>>>>>>>>>>> On 12 September 2014 05:29, cheng li <[hidden email]> wrote:
>>>>>>>>>>> Hi Peter,
>>>>>>>>>>>
>>>>>>>>>>> I have used your dcmt library wrapper and tested it with the
>>>>>>>>>>> following code. It seems dcmt in a single thread is 4x slower than
>>>>>>>>>>> the QL original MT. Is this consistent with what you see?
>>>>>>>>>>>
>>>>>>>>>>> #include <ql/quantlib.hpp>
>>>>>>>>>>> #include <boost/timer.hpp>
>>>>>>>>>>> #include <iostream>
>>>>>>>>>>>
>>>>>>>>>>> using namespace QuantLib;
>>>>>>>>>>> using namespace std;
>>>>>>>>>>>
>>>>>>>>>>> int main() {
>>>>>>>>>>>
>>>>>>>>>>>     // number of random numbers to draw from each generator
>>>>>>>>>>>     Size samples;
>>>>>>>>>>>     cin >> samples;
>>>>>>>>>>>     boost::timer myTimer;
>>>>>>>>>>>
>>>>>>>>>>>     // time the original QuantLib Mersenne twister
>>>>>>>>>>>     MersenneTwisterUniformRng originalMT;
>>>>>>>>>>>     for(Size i=0; i<samples; ++i)
>>>>>>>>>>>         originalMT.next();
>>>>>>>>>>>
>>>>>>>>>>>     cout << myTimer.elapsed() << endl;
>>>>>>>>>>>
>>>>>>>>>>>     myTimer.restart();
>>>>>>>>>>>
>>>>>>>>>>>     // time one of the precomputed dynamic creator twisters
>>>>>>>>>>>     MersenneTwisterDynamicRng mt( mtdesc_0_8_19937[5] , 1);
>>>>>>>>>>>
>>>>>>>>>>>     for(Size i=0; i<samples; ++i) {
>>>>>>>>>>>         mt.next();
>>>>>>>>>>>     }
>>>>>>>>>>>
>>>>>>>>>>>     cout << myTimer.elapsed() << endl;
>>>>>>>>>>>
>>>>>>>>>>>     // wait for input so that the console window stays open
>>>>>>>>>>>     int n;
>>>>>>>>>>>     std::cin>>n;
>>>>>>>>>>>     return 0;
>>>>>>>>>>> }
>>>>>>>>>>>
>>>>>>>>>>> Regards,
>>>>>>>>>>> Cheng
>>>>>>>>>>>
>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>> From: Peter Caspers [mailto:[hidden email]]
>>>>>>>>>>> Sent: 6 September 2014 20:48
>>>>>>>>>>> To: Joseph Wang
>>>>>>>>>>> Cc: QuantLib Mailing Lists
>>>>>>>>>>> Subject: Re: [Quantlib-dev] Openmp work on mcarlo : Dynamic Creator MT
>>>>>>>>>>>
>>>>>>>>>>> Hi Joseph, all,
>>>>>>>>>>>
>>>>>>>>>>> I added a wrapper for the dcmt library (Dynamic Creator of
>>>>>>>>>>> Mersenne Twisters).
>>>>>>>>>>>
>>>>>>>>>>> https://github.com/lballabio/quantlib/pull/132
>>>>>>>>>>>
>>>>>>>>>>> I guess this is a useful building block for multithreaded monte carlo.
>>>>>>>>>>> Since for bigger p the dynamic creation takes a long time (it
>>>>>>>>>>> feels more like mining than computing ...), I precomputed 8
>>>>>>>>>>> independent instances (i.e. for use in at most 8 parallel
>>>>>>>>>>> threads), for the "standard" value p = 19937 and word size 32,
>>>>>>>>>>> which one can instantiate with
>>>>>>>>>>>
>>>>>>>>>>> MersenneTwisterDynamicRng mt( mtdesc_0_8_19937[i] , seed_i );
>>>>>>>>>>>
>>>>>>>>>>> for i = 0, ... , 7.
>>>>>>>>>>>
>>>>>>>>>>> In addition the speed of random number generation seems a bit
>>>>>>>>>>> faster in the dcmt library than with the original ql twister.
>>>>>>>>>>> I observe running times scaled by a factor of 0.8 when generating 1E8 numbers.
>>>>>>>>>>>
>>>>>>>>>>> All this is of course experimental and not well tested, so
>>>>>>>>>>> any feedback and experiences are very welcome. I'd be very
>>>>>>>>>>> interested in your opinion on the dcmt library and applications in parallel monte carlo.
>>>>>>>>>>>
>>>>>>>>>>> Peter
>>>>>>>>>>>
>>>>>>>>>>>> On 20 October 2013 16:01, Joseph Wang <[hidden email]> wrote:
>>>>>>>>>>>> I've done some more parallelization with openmp and quantlib.
>>>>>>>>>>>> I've uploaded the changes to
>>>>>>>>>>>> https://github.com/joequant/quantlib. The branch openmp has some changes that I've issued a pull-request for.
>>>>>>>>>>>> openmp-mcario has some changes that need some more work.
>>>>>>>>>>>>
>>>>>>>>>>>> I've gotten the MC to work by generating the paths in a
>>>>>>>>>>>> critical section.
>>>>>>>>>>>> Calculating the prices once I have the path is
>>>>>>>>>>>> multithreaded, but right now I need to generate the paths in
>>>>>>>>>>>> a single thread to make sure that the same sequence is generated.
>>>>>>>>>>>>
>>>>>>>>>>>> The big issue right now is that there is a race condition in
>>>>>>>>>>>> the calculation of barrier options which is causing one
>>>>>>>>>>>> regression test to fail.
>>>>>>>>>>>> The problem is that the random
>>>>>>>>>>>> number generator is being called in BarrierPathPricer, and
>>>>>>>>>>>> since that is run multithreaded, the sequence that is being
>>>>>>>>>>>> pulled will change from run to run based on whether other paths have pulled random numbers already.
>>>>>>>>>>>>
>>>>>>>>>>>> I think that fixing this is going to need some code
>>>>>>>>>>>> restructuring, but I'd like to get some thoughts as to how
>>>>>>>>>>>> to do this. Basically, the interface needs to be changed
>>>>>>>>>>>> slightly so that the random numbers are drawn in a fixed
>>>>>>>>>>>> order, and that might mean one call to get any additional
>>>>>>>>>>>> random numbers in a pricer, which gets called in a critical section, and another to run the pricer with the random numbers.