
Re: Re: Re: Re: Re: Openmp work on mcarlo : Dynamic Creator MT

Posted by Peter Caspers-4 on Sep 21, 2014; 3:10pm
URL: http://quantlib.414.s1.nabble.com/Re-Openmp-work-on-mcarlo-Dynamic-Creator-MT-tp15832p15904.html

Hi Cheng,

I switched to a template class for precomputed twisters, which is
faster by a factor of 2 (450ms instead of 870ms). This can be
instantiated with

MersenneTwisterCustomRng<Mtdesc19937_5> mt(42);

where the 5 can be replaced by any index from 0 to 7, as before. The
other class (MersenneTwisterDynamicRng) is now only needed if you want
to create an MT at runtime.
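
One plausible reason a template class can beat a runtime descriptor is that the generator's constants become compile-time values the optimizer can fold into the instructions. The following is only a sketch of that idea, not the code from the pull request: it uses the standard MT19937 tempering step, and the names TemperingParams, Mt19937Tempering, temperRuntime, temperStatic and checkVariantsAgree are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Runtime variant: the tempering parameters live in a struct and are read
// from memory on every call (analogous to a descriptor chosen at runtime).
struct TemperingParams {
    unsigned u, s, t, l;   // shift widths
    std::uint32_t b, c;    // bit masks
};

inline std::uint32_t temperRuntime(std::uint32_t y, const TemperingParams& p) {
    y ^= y >> p.u;
    y ^= (y << p.s) & p.b;
    y ^= (y << p.t) & p.c;
    y ^= y >> p.l;
    return y;
}

// Compile-time variant: the same parameters are static constants of a
// descriptor type (hypothetical name), so they are known to the compiler.
struct Mt19937Tempering {
    static constexpr unsigned u = 11, s = 7, t = 15, l = 18;
    static constexpr std::uint32_t b = 0x9D2C5680u, c = 0xEFC60000u;
};

template <class Desc>
inline std::uint32_t temperStatic(std::uint32_t y) {
    y ^= y >> Desc::u;
    y ^= (y << Desc::s) & Desc::b;
    y ^= (y << Desc::t) & Desc::c;
    y ^= y >> Desc::l;
    return y;
}

// Self-check: both variants must produce identical output.
inline void checkVariantsAgree() {
    const TemperingParams p = {11, 7, 15, 18, 0x9D2C5680u, 0xEFC60000u};
    const std::uint32_t inputs[] = {0u, 1u, 12345u, 0xFFFFFFFFu};
    for (std::uint32_t y : inputs)
        assert(temperRuntime(y, p) == temperStatic<Mt19937Tempering>(y));
}
```

Both functions compute the same value; whether the static variant is actually faster depends on the compiler and flags, so it should be measured rather than assumed.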

The pull request is updated accordingly.

Best regards
Peter




On 21 September 2014 08:11, cheng.li <[hidden email]> wrote:

> Hi Peter,
>
> Thanks for your hard work. I think our results are consistent.
>
> Regards,
> Cheng
>
> -----Original Message-----
> From: Peter Caspers [mailto:[hidden email]]
> Sent: 21 September 2014 0:33
> To: cheng li
> Cc: QuantLib Mailing Lists
> Subject: Re: Re: Re: Re: [Quantlib-dev] Openmp work on mcarlo : Dynamic Creator MT
>
> Hi Cheng,
>
> sorry, this was my fault, I messed up the timings, because I did not use consistent optimizer flags when compiling the library and the test program.
>
> Actually, on Windows (the same machine on which I run Ubuntu, though that doesn't really matter, because my office computer gives very similar
> timings) I get, for 1E8 random numbers generated (with O2),
>
> 400ms / 1100ms
>
> for the original ql mt / dynamic creator mt. The ql mt is just as fast as the boost mt implementation by the way. On Ubuntu with gcc 4.8.1 and O3 I get
>
> 290ms / 870ms
>
> and with O2 a similar value: 910ms for the creator mt. It also makes no difference whether I use gcc 4.9.1 or clang 3.6.0.
>
> If I directly call the original C routine without using the wrapper object, I get 720ms.
>
> If I use the original library and a C example (both compiled with O3, which is how the library is shipped, since it has a hardcoded makefile), I get 730ms.
>
> This means the wrapper introduces a slowdown of about 20%, which doesn't seem too bad.
>
> Otherwise the dcmt is slower by a factor of around 2-3 compared to the original mt in all cases. Since this is already the case with the original library, I wouldn't try to do anything about it at the moment.
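
For reference, the ratios behind the "20%" and "factor of around 2-3" statements can be worked out from the timings quoted in this message (wrapper vs. direct C call on Ubuntu, wrapper vs. original mt on Ubuntu, dcmt vs. original mt on Windows, direct C call vs. original mt on Ubuntu):

```latex
\[
\frac{870}{720} \approx 1.21, \qquad
\frac{870}{290} = 3.0, \qquad
\frac{1100}{400} = 2.75, \qquad
\frac{720}{290} \approx 2.48
\]
```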
>
> What is your opinion on this ?
>
> Peter
>
> I compared different platforms again, but now on the _same_ machine - Original MT / Dynamic Creator MT (generation of 1E8 numbers, single threaded, with O2 (MSVC) and O3 (gcc, clang)). I also checked the boost implementation mt19937, which is very close to the ql original mt in all cases.
>
> Windows / MSVC 2010 => 400ms / 1100ms
> Ubuntu / gcc 4.9.1 => 1200 ms / 1050 ms
> Ubuntu / gcc 4.8.1 => 1180 ms / 1040 ms
> Ubuntu / clang 3.6.0 => 1340 ms / 1150 ms
>
> clang
> 290
> 720
> 870
>
> (c 730)
>
> so it looks like MSVC does a specific optimization for the QL and boost mt19937, which does not apply on the other platforms and not to the dynamic creator mt.
>
> At the moment I still don't know what it is.
>
> On 18 September 2014 03:33, cheng li <[hidden email]> wrote:
>> Let me try your settings once I have time.
>>
>> Regards,
>> Cheng
>>
>> -----Original Message-----
>> From: cheng li [mailto:[hidden email]]
>> Sent: 18 September 2014 9:18
>> To: 'Peter Caspers'
>> Cc: 'QuantLib Mailing Lists'
>> Subject: Re: Re: Re: [Quantlib-dev] Openmp work on mcarlo : Dynamic Creator MT
>>
>> Hi Peter,
>>
>> I used gcc 4.8.2.
>>
>> My result with O3 optimization is still not good: the new MT performs
>> about the same as before (about 3~4X slower).
>>
>> I used the following statement to turn on O3 optimization before running
>> ./configure for QuantLib:
>>
>> export CXXFLAGS="-g -O3"
>>
>> Am I right?
>>
>> Regards,
>> Cheng
>>
>> -----Original Message-----
>> From: Peter Caspers [mailto:[hidden email]]
>> Sent: 18 September 2014 0:36
>> To: cheng li
>> Cc: QuantLib Mailing Lists
>> Subject: Re: Re: Re: [Quantlib-dev] Openmp work on mcarlo : Dynamic Creator MT
>>
>> with gcc 4.9.1 and O2 the new mt is a bit slower than the original one (but only by a factor of 1.1).
>> I have to add both -frename-registers and -finline-functions to -O2 to get back the speedup I mentioned before.
>>
>> Which compiler do you use on Ubuntu ?
>>
>> Peter
>>
>>
>>
>> On 17 September 2014 03:26, cheng li <[hidden email]> wrote:
>>> Thanks Peter. I tested on Ubuntu as well: about 3~4X slower with -O2 optimization.
>>>
>>> I'll also try -O3 on my Ubuntu machine.
>>>
>>> Regards,
>>> Cheng
>>>
>>> -----Original Message-----
>>> From: Peter Caspers [mailto:[hidden email]]
>>> Sent: 17 September 2014 0:32
>>> To: Cheng Li; QuantLib Mailing Lists
>>> Subject: Re: Re: [Quantlib-dev] Openmp work on mcarlo : Dynamic Creator MT
>>>
>>> Hi Cheng,
>>>
>>> indeed, with msvc I get a slowdown by a factor of ~2.8x. As I said, under gcc it is a speedup (running time scaled by ~0.8x, with -O3).
>>>
>>> Does anyone have an idea where the different behaviour under gcc /
>>> linux and msvc might come from (and how to improve the msvc side if
>>> possible) ?
>>>
>>> Kind regards
>>> Peter
>>>
>>>
>>>
>>> On 13 September 2014 08:27, Cheng Li <[hidden email]> wrote:
>>>> Thanks Peter.
>>>>
>>>> Regards,
>>>> Cheng
>>>>
>>>> Sent from my iPad
>>>>
>>>>> On 13 September 2014, at 13:29, Peter Caspers <[hidden email]> wrote:
>>>>>
>>>>> I will have a look on Monday (I have a Windows machine at work)
>>>>> and see how it works there.
>>>>>
>>>>> Thanks
>>>>> Peter
>>>>>
>>>>> Sent from my iPhone
>>>>>
>>>>>> On 13.09.2014 at 04:41, Cheng Li <[hidden email]> wrote:
>>>>>>
>>>>>> I am on Win7 x64bit, using vs 2012 with quantlib 1.4 boost 1.55
>>>>>> under release mode
>>>>>>
>>>>>> Sent from my iPad
>>>>>>
>>>>>>> On 13 September 2014, at 0:08, Peter Caspers <[hidden email]> wrote:
>>>>>>>
>>>>>>> Hi Cheng,
>>>>>>>
>>>>>>> no, I get better timings with the dcmt implementation, e.g. for
>>>>>>> 1E8 numbers
>>>>>>>
>>>>>>> dcmt 0.982s
>>>>>>> quantlib 1.159s
>>>>>>>
>>>>>>> on my computer. Can you post your platform and compiler settings,
>>>>>>> so that I can try to reproduce ?
>>>>>>>
>>>>>>> Thanks
>>>>>>> Peter
>>>>>>>
>>>>>>>> On 12 September 2014 05:29, cheng li <[hidden email]> wrote:
>>>>>>>> Hi Peter,
>>>>>>>>
>>>>>>>> I have used your dcmt wrapper library and tested it with the
>>>>>>>> following code. It seems dcmt in a single thread is 4X slower than
>>>>>>>> the original QL MT. Is this consistent with what you see?
>>>>>>>>
>>>>>>>> #include <ql/quantlib.hpp>
>>>>>>>> #include <boost/timer.hpp>
>>>>>>>> #include <iostream>
>>>>>>>>
>>>>>>>> using namespace QuantLib;
>>>>>>>> using namespace std;
>>>>>>>>
>>>>>>>> int main() {
>>>>>>>>
>>>>>>>>      // Number of draws per generator, read from stdin.
>>>>>>>>      Size samples;
>>>>>>>>      cin >> samples;
>>>>>>>>      boost::timer myTimer;
>>>>>>>>
>>>>>>>>      // Time the original QuantLib Mersenne Twister.
>>>>>>>>      MersenneTwisterUniformRng originalMT;
>>>>>>>>      for(Size i=0; i<samples; ++i)
>>>>>>>>              originalMT.next();
>>>>>>>>
>>>>>>>>      cout << myTimer.elapsed() << endl;
>>>>>>>>
>>>>>>>>      myTimer.restart();
>>>>>>>>
>>>>>>>>      // Time the dcmt wrapper, using precomputed instance 5 and seed 1.
>>>>>>>>      MersenneTwisterDynamicRng mt( mtdesc_0_8_19937[5] , 1);
>>>>>>>>
>>>>>>>>      for(Size i=0; i<samples; ++i) {
>>>>>>>>              mt.next();
>>>>>>>>      }
>>>>>>>>
>>>>>>>>      cout << myTimer.elapsed() << endl;
>>>>>>>>
>>>>>>>>      // Wait for input so the console window stays open.
>>>>>>>>      int n;
>>>>>>>>      cin >> n;
>>>>>>>>      return 0;
>>>>>>>> }
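
A timing loop like the one above discards every draw, so how much work survives optimization can depend on the compiler and flags. As a rough sketch of a harness that is less sensitive to this, the version below accumulates a checksum and uses std::chrono. It is written against std::mt19937 from the standard library as a stand-in, not against QuantLib or dcmt; BenchResult and timeDraws are hypothetical names, and a generator with a different interface (such as the next() method used above) would need a small adapter.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <random>

struct BenchResult {
    double seconds;          // wall-clock time spent in the loop
    std::uint64_t checksum;  // sum of all draws, keeps the loop observable
};

// Draws `samples` numbers from `rng` (any callable returning an unsigned
// integer) and times the loop. Returning the checksum makes it harder for
// the optimizer to discard the draws.
template <class Rng>
BenchResult timeDraws(Rng& rng, std::size_t samples) {
    assert(samples > 0);  // a zero-length run would time nothing
    std::uint64_t sum = 0;
    const auto start = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < samples; ++i)
        sum += rng();
    const auto stop = std::chrono::steady_clock::now();
    const double secs = std::chrono::duration<double>(stop - start).count();
    return BenchResult{secs, sum};
}
```

Printing the checksum next to the elapsed time, and building the library and the test program with the same optimizer flags, makes runs on different platforms easier to compare.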
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Cheng
>>>>>>>>
>>>>>>>> -----Original Message-----
>>>>>>>> From: Peter Caspers [mailto:[hidden email]]
>>>>>>>> Sent: 6 September 2014 20:48
>>>>>>>> To: Joseph Wang
>>>>>>>> Cc: QuantLib Mailing Lists
>>>>>>>> Subject: Re: [Quantlib-dev] Openmp work on mcarlo : Dynamic Creator MT
>>>>>>>>
>>>>>>>> Hi Joseph, all,
>>>>>>>>
>>>>>>>> I added a wrapper for the dcmt library (Dynamic Creator of
>>>>>>>> Mersenne Twisters).
>>>>>>>>
>>>>>>>> https://github.com/lballabio/quantlib/pull/132
>>>>>>>>
>>>>>>>> I guess this is a useful building block for multithreaded monte carlo.
>>>>>>>> Since for bigger p the dynamic creation takes a long time (it
>>>>>>>> feels more like mining than computing ...), I precomputed 8 independent instances (i.e.
>>>>>>>> for use in at most 8 parallel threads), for the "standard" value
>>>>>>>> p = 19937 and word size 32, which one can instantiate with
>>>>>>>>
>>>>>>>> MersenneTwisterDynamicRng mt( mtdesc_0_8_19937[i] , seed_i );
>>>>>>>>
>>>>>>>> for i = 0, ... , 7.
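
The one-generator-per-thread pattern described above can be sketched as follows. This is an illustration under stated assumptions, not the pull request's code: it uses std::mt19937 with a different seed per worker as a stand-in, which is weaker than what dcmt provides (dcmt gives each instance its own parameter set, whereas differently seeded copies of one generator carry no independence guarantee). The function name parallelMeans and the seed values are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Worker i only ever touches generators[i], so no locking is needed and
// each worker's stream is the same from run to run.
std::vector<double> parallelMeans(std::size_t workers,
                                  std::size_t samplesPerWorker) {
    assert(samplesPerWorker > 0);

    std::vector<std::mt19937> generators;
    for (std::size_t i = 0; i < workers; ++i)
        generators.emplace_back(static_cast<std::uint32_t>(1000 + i));  // hypothetical seeds

    std::vector<double> means(workers, 0.0);
    const int n = static_cast<int>(workers);

    // Without OpenMP enabled the pragma is ignored and the loop runs serially.
    #pragma omp parallel for
    for (int i = 0; i < n; ++i) {
        const std::size_t w = static_cast<std::size_t>(i);
        std::uniform_real_distribution<double> u(0.0, 1.0);
        double sum = 0.0;
        for (std::size_t k = 0; k < samplesPerWorker; ++k)
            sum += u(generators[w]);
        means[w] = sum / static_cast<double>(samplesPerWorker);
    }
    return means;
}
```

Because every worker owns its generator and writes only its own slot of the result, two calls with the same arguments return identical vectors regardless of how the threads are scheduled.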
>>>>>>>>
>>>>>>>> In addition the speed of random number generation seems a bit
>>>>>>>> faster in the dcmt library than with the original ql twister. I
>>>>>>>> observe running times scaled by a factor of 0.8 when generating 1E8 numbers.
>>>>>>>>
>>>>>>>> All this is of course experimental and not well tested, so any
>>>>>>>> feedback and experiences are very welcome. I'd be very
>>>>>>>> interested in your opinion on the dcmt library and applications in parallel monte carlo.
>>>>>>>>
>>>>>>>> Peter
>>>>>>>>
>>>>>>>>> On 20 October 2013 16:01, Joseph Wang <[hidden email]> wrote:
>>>>>>>>> I've done some more parallelization with openmp and quantlib.
>>>>>>>>> I've uploaded the changes to the
>>>>>>>>> https://github.com/joequant/quantlib.  The branch openmp has some changes that I've issued a pull-request for.
>>>>>>>>> openmp-mcario has some changes that need some more work.
>>>>>>>>>
>>>>>>>>> I've gotten the MC to work by generating the paths in a
>>>>>>>>> critical section.
>>>>>>>>> Calculating the prices once I have the path is multithreaded,
>>>>>>>>> but right now I need to generate the paths in a single thread
>>>>>>>>> to make sure that the same sequence is generated.
>>>>>>>>>
>>>>>>>>> The big issue right now is that there is a race condition in
>>>>>>>>> the calculation of barrier options which is causing one
>>>>>>>>> regression test to fail.  The problem is that the random number
>>>>>>>>> generator is being called in BarrierPathPricer, and since that
>>>>>>>>> is run multithread, the sequence that is being pulled will
>>>>>>>>> change from run to run based on whether other paths have pulled random numbers already.
>>>>>>>>>
>>>>>>>>> I think that fixing this is going to need some code
>>>>>>>>> restructuring, but I'd like to get some thoughts as to how to
>>>>>>>>> do this.  Basically, the interface needs to be changed slightly
>>>>>>>>> so that the random numbers are drawn in a fixed order, and that
>>>>>>>>> might mean one call to get any additional random numbers in a
>>>>>>>>> pricer, which gets called in a critical section, and another to run the pricer with the random numbers.
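
The two-call structure proposed above (draw in a fixed order first, price in parallel afterwards) can be sketched like this. It is a toy illustration, not QuantLib's path pricer interface: the payoff is a plain call on the terminal value of a driftless log-normal walk, std::mt19937 stands in for the library's generator, and the names Draws, drawAll and priceAll are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

typedef std::vector<std::vector<double> > Draws;

// Phase 1 (serial): draw every variate each path needs, in a fixed order.
Draws drawAll(std::mt19937& rng, std::size_t paths, std::size_t steps) {
    std::normal_distribution<double> z(0.0, 1.0);
    Draws draws(paths, std::vector<double>(steps));
    for (std::size_t p = 0; p < paths; ++p)
        for (std::size_t k = 0; k < steps; ++k)
            draws[p][k] = z(rng);
    return draws;
}

// Phase 2 (parallel): price each path from its own pre-drawn numbers.
// No generator is touched here, so thread scheduling cannot change results.
double priceAll(const Draws& draws, double s0, double volPerStep, double strike) {
    assert(!draws.empty());
    std::vector<double> payoffs(draws.size(), 0.0);
    const int n = static_cast<int>(draws.size());

    // Without OpenMP enabled the pragma is ignored and the loop runs serially.
    #pragma omp parallel for
    for (int p = 0; p < n; ++p) {
        const std::vector<double>& z = draws[static_cast<std::size_t>(p)];
        double logS = std::log(s0);
        for (std::size_t k = 0; k < z.size(); ++k)
            logS += volPerStep * z[k];
        payoffs[static_cast<std::size_t>(p)] = std::max(std::exp(logS) - strike, 0.0);
    }

    // Sum serially in path order so the floating-point result is reproducible.
    double sum = 0.0;
    for (std::size_t p = 0; p < payoffs.size(); ++p)
        sum += payoffs[p];
    return sum / static_cast<double>(payoffs.size());
}
```

The cost of this design is memory for the pre-drawn numbers; a pricer that needs extra variates (as the barrier pricer does) would have to declare how many it needs so that phase 1 can draw them too.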
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> QuantLib-dev mailing list
>>>>>>>>> [hidden email]
>>>>>>>>> https://lists.sourceforge.net/lists/listinfo/quantlib-dev
>>>>>>>>
>>>>>>>>
>>>
>>
>>
>
