Re: Re: Re: Re: Openmp work on mcarlo : Dynamic Creator MT

Posted by cheng li on Sep 21, 2014; 6:11am
URL: http://quantlib.414.s1.nabble.com/Re-Openmp-work-on-mcarlo-Dynamic-Creator-MT-tp15832p15900.html

Hi Peter,

Thanks for your hard work. I think our results are consistent.

Regards,
Cheng

-----Original Message-----
From: Peter Caspers [mailto:[hidden email]]
Sent: September 21, 2014 0:33
To: cheng li
Cc: QuantLib Mailing Lists
Subject: Re: Re: Re: Re: [Quantlib-dev] Openmp work on mcarlo : Dynamic Creator MT

Hi Cheng,

sorry, this was my fault, I messed up the timings, because I did not use consistent optimizer flags when compiling the library and the test program.
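One way to catch such a mismatch early is to let both the library and the test program report how they were built. The following is only a minimal sketch, assuming GCC or Clang, which predefine `__OPTIMIZE__` at -O1 and above. `optimizationState` is a hypothetical helper and not part of QuantLib or dcmt. MSVC does not define this macro, so the helper reports "unknown" there.

```cpp
#include <string>

// Hypothetical helper: reports how the translation unit containing it was compiled.
// __OPTIMIZE__ is predefined by GCC and Clang in optimizing builds (-O1 and above).
// Other compilers (e.g. MSVC) do not define it, so the answer there is "unknown".
inline std::string optimizationState() {
#if defined(__OPTIMIZE__)
    return "optimized";
#elif defined(__GNUC__) || defined(__clang__)
    return "not optimized";
#else
    return "unknown";
#endif
}
```

Printing the result once from the library and once from the test program would show whether the two were built with comparable settings. It does not distinguish O2 from O3.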

On Windows (the same machine on which I run Ubuntu, although that doesn't really matter, because my office computer gives very similar timings), I get the following for generating 1E8 random numbers (with O2):

400ms / 1100ms

for the original ql mt / dynamic creator mt. The ql mt is just as fast as the boost mt implementation by the way. On Ubuntu with gcc 4.8.1 and O3 I get

290ms / 870ms

and with O2 a similar value: 910ms for the creator mt. It also makes no difference whether I use gcc 4.9.1 or clang 3.6.0.
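For reference, timings of this kind can be taken with a loop like the one below. This is only a sketch under stated assumptions: it uses `std::mt19937` from the C++ standard library as a stand-in for the ql, boost and dcmt generators discussed here, and `timeDraws` is a hypothetical helper name. The draws are folded into a checksum so that an optimizing compiler cannot discard the loop.

```cpp
#include <chrono>
#include <cstdint>
#include <random>

// Hypothetical helper: draws n numbers from rng and returns the elapsed wall
// time in milliseconds. The XOR of all draws is stored in `sink` so that the
// loop has an observable result and cannot be optimized away.
template <class Rng>
double timeDraws(Rng& rng, std::uint64_t n, std::uint64_t& sink) {
    const auto start = std::chrono::steady_clock::now();
    std::uint64_t acc = 0;
    for (std::uint64_t i = 0; i < n; ++i)
        acc ^= rng();
    const auto stop = std::chrono::steady_clock::now();
    sink = acc;
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

A call such as `std::mt19937 rng; std::uint64_t sink = 0; double ms = timeDraws(rng, 100000000ULL, sink);` would time 1E8 draws. Printing `sink` together with `ms` keeps the result in use.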

If I directly call the original C routine without using the wrapper object, I get 720ms.

If I use the original library and a C example, both compiled with O3 (this is the configuration in which the library is shipped, since it has a hardcoded makefile), I get 730ms.

This means the wrapper introduces a slowdown of about 20% (870ms vs. 720ms), which does not seem too bad.

Otherwise the dcmt is slower by a factor of around 2-3 compared to the original mt in all cases. Since this is already the case with the original library, I wouldn't try to do anything about it at the moment.

What is your opinion on this ?

Peter


I compared different platforms again, but now on the _same_ machine: Original MT / Dynamic Creator MT (generation of 1E8 numbers, single threaded, with O2 (MSVC) and O3 (gcc, clang)). I also checked the boost implementation mt19937, which is very close to the ql original mt in all cases.

Windows / MSVC 2010 => 400 ms / 1100 ms
Ubuntu / gcc 4.9.1 => 1200 ms / 1050 ms
Ubuntu / gcc 4.8.1 => 1180 ms / 1040 ms
Ubuntu / clang 3.6.0 => 1340 ms / 1150 ms

clang
290
720
870

(c 730)

so it looks like MSVC does a specific optimization for the QL and boost mt19937, which does not apply on the other platforms and not to the dynamic creator mt.

At the moment I still don't know what it is.

On 18 September 2014 03:33, cheng li <[hidden email]> wrote:

> Let me try your settings once I have time.
>
> Regards,
> Cheng
>
> -----Original Message-----
> From: cheng li [mailto:[hidden email]]
> Sent: September 18, 2014 9:18
> To: 'Peter Caspers'
> Cc: 'QuantLib Mailing Lists'
> Subject: Re: Re: Re: [Quantlib-dev] Openmp work on mcarlo : Dynamic Creator MT
>
> Hi Peter,
>
> I used gcc 4.8.2.
>
> My result with O3 optimization is still not good: the new MT performs
> about as before (about 3~4x slower).
>
> I used the following statement to turn on O3 optimization before running
> ./configure for QuantLib:
>
> export CXXFLAGS="-g -O3"
>
> Am I right?
>
> Regards,
> Cheng
>
> -----Original Message-----
> From: Peter Caspers [mailto:[hidden email]]
> Sent: September 18, 2014 0:36
> To: cheng li
> Cc: QuantLib Mailing Lists
> Subject: Re: Re: Re: [Quantlib-dev] Openmp work on mcarlo : Dynamic Creator MT
>
> with gcc 4.9.1 and O2 the new mt is a bit slower than the original one (but only by a factor of 1.1).
> I have to add both -frename-registers and -finline-functions to -O2 to get back the speed up I mentioned before.
>
> Which compiler do you use on Ubuntu ?
>
> Peter
>
>
>
> On 17 September 2014 03:26, cheng li <[hidden email]> wrote:
>> Thanks Peter. I tested on Ubuntu as well: about 3~4x slower with -O2 optimization.
>>
>> I'll also try -O3 on my Ubuntu machine.
>>
>> Regards,
>> Cheng
>>
>> -----Original Message-----
>> From: Peter Caspers [mailto:[hidden email]]
>> Sent: September 17, 2014 0:32
>> To: Cheng Li; QuantLib Mailing Lists
>> Subject: Re: Re: [Quantlib-dev] Openmp work on mcarlo : Dynamic Creator MT
>>
>> Hi Cheng,
>>
>> indeed, with msvc I get a slowdown by a factor of ~2.8x. As I said, under gcc it is a speedup (running times scaled by ~0.8x, with -O3).
>>
>> Does anyone have an idea where the different behaviour under gcc /
>> linux and msvc might come from (and how to improve the msvc side if
>> possible) ?
>>
>> Kind regards
>> Peter
>>
>>
>>
>> On 13 September 2014 08:27, Cheng Li <[hidden email]> wrote:
>>> Thanks Peter.
>>>
>>> Regards,
>>> Cheng
>>>
>>> Sent from my iPad
>>>
>>>> On September 13, 2014, at 13:29, Peter Caspers <[hidden email]> wrote:
>>>>
>>>> I will have a look on Monday (I have a Windows machine at work)
>>>> and see how it works there
>>>>
>>>> Thanks
>>>> Peter
>>>>
>>>> Sent from my iPhone
>>>>
>>>>> On 13.09.2014 at 04:41, Cheng Li <[hidden email]> wrote:
>>>>>
>>>>> I am on Win7 x64bit, using vs 2012 with quantlib 1.4 boost 1.55
>>>>> under release mode
>>>>>
>>>>> Sent from my iPad
>>>>>
>>>>>> On September 13, 2014, at 0:08, Peter Caspers <[hidden email]> wrote:
>>>>>>
>>>>>> Hi Cheng,
>>>>>>
>>>>>> no, I get better timings with the dcmt implementation, e.g. for
>>>>>> 1E8 numbers
>>>>>>
>>>>>> dcmt 0.982s
>>>>>> quantlib 1.159s
>>>>>>
>>>>>> on my computer. Can you post your platform and compiler settings,
>>>>>> so that I can try to reproduce ?
>>>>>>
>>>>>> Thanks
>>>>>> Peter
>>>>>>
>>>>>>> On 12 September 2014 05:29, cheng li <[hidden email]> wrote:
>>>>>>> Hi Peter,
>>>>>>>
>>>>>>> I have used your dcmt wrapper library and tested it with the
>>>>>>> following code. It seems that dcmt in a single thread is 4x slower
>>>>>>> than the original QL MT. Is this consistent with what you see?
>>>>>>>
>>>>>>> #include <ql/quantlib.hpp>
>>>>>>> #include <boost/timer.hpp>
>>>>>>> #include <iostream>
>>>>>>>
>>>>>>> using namespace QuantLib;
>>>>>>> using namespace std;
>>>>>>>
>>>>>>> int main() {
>>>>>>>
>>>>>>>      // number of draws per generator, read from stdin
>>>>>>>      // (Size, to match the unsigned loop counters below)
>>>>>>>      Size samples;
>>>>>>>      cin >> samples;
>>>>>>>      boost::timer myTimer;
>>>>>>>
>>>>>>>      // QuantLib's original Mersenne Twister
>>>>>>>      MersenneTwisterUniformRng originalMT;
>>>>>>>      for(Size i=0; i<samples; ++i)
>>>>>>>              originalMT.next();
>>>>>>>
>>>>>>>      cout << myTimer.elapsed() << endl;
>>>>>>>
>>>>>>>      myTimer.restart();
>>>>>>>
>>>>>>>      // dcmt wrapper: precomputed instance 5, seed 1
>>>>>>>      MersenneTwisterDynamicRng mt( mtdesc_0_8_19937[5] , 1);
>>>>>>>
>>>>>>>      for(Size i=0; i<samples; ++i) {
>>>>>>>              mt.next();
>>>>>>>      }
>>>>>>>
>>>>>>>      cout << myTimer.elapsed() << endl;
>>>>>>>
>>>>>>>      // block until further input (keeps a console window open)
>>>>>>>      int n;
>>>>>>>      std::cin>>n;
>>>>>>>      return 0;
>>>>>>> }
>>>>>>>
>>>>>>> Regards,
>>>>>>> Cheng
>>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: Peter Caspers [mailto:[hidden email]]
>>>>>>> Sent: September 6, 2014 20:48
>>>>>>> To: Joseph Wang
>>>>>>> Cc: QuantLib Mailing Lists
>>>>>>> Subject: Re: [Quantlib-dev] Openmp work on mcarlo : Dynamic Creator MT
>>>>>>>
>>>>>>> Hi Joseph, all,
>>>>>>>
>>>>>>> I added a wrapper for the dcmt library (Dynamic Creator of
>>>>>>> Mersenne Twisters).
>>>>>>>
>>>>>>> https://github.com/lballabio/quantlib/pull/132
>>>>>>>
>>>>>>> I guess this is a useful building block for multithreaded monte carlo.
>>>>>>> Since for bigger p the dynamic creation takes a long time (it
>>>>>>> feels more like mining than computing ...), I precomputed 8 independent instances (i.e.
>>>>>>> for use in at most 8 parallel threads), for the "standard" value
>>>>>>> p = 19937 and word size 32, which one can instantiate with
>>>>>>>
>>>>>>> MersenneTwisterDynamicRng mt( mtdesc_0_8_19937[i] , seed_i );
>>>>>>>
>>>>>>> for i = 0, ... , 7.
>>>>>>>
>>>>>>> In addition the speed of random number generation seems a bit
>>>>>>> faster in the dcmt library than with the original ql twister. I
>>>>>>> observe running times scaled by a factor of 0.8 when generating 1E8 numbers.
>>>>>>>
>>>>>>> All this is of course experimental and not well tested, so any
>>>>>>> feedback and experiences are very welcome. I'd be very
>>>>>>> interested in your opinion on the dcmt library and applications in parallel monte carlo.
>>>>>>>
>>>>>>> Peter
>>>>>>>
>>>>>>>> On 20 October 2013 16:01, Joseph Wang <[hidden email]> wrote:
>>>>>>>> I've done some more parallelization with openmp and quantlib.
>>>>>>>> I've uploaded the changes to the
>>>>>>>> https://github.com/joequant/quantlib.  The branch openmp has some changes that I've issued a pull-request for.
>>>>>>>> openmp-mcario has some changes that need some more work.
>>>>>>>>
>>>>>>>> I've gotten the MC to work by generating the paths in a
>>>>>>>> critical section.
>>>>>>>> Calculating the prices once I have the path is multithreaded,
>>>>>>>> but right now I need to generate the paths in a single thread
>>>>>>>> to make sure that the same sequence is generated.
>>>>>>>>
>>>>>>>> The big issue right now is that there is a race condition in
>>>>>>>> the calculation of barrier options which is causing one
>>>>>>>> regression test to fail.  The problem is that the random number
>>>>>>>> generator is being called in BarrierPathPricer, and since that
>>>>>>>> is run multithreaded, the sequence that is being pulled will
>>>>>>>> change from run to run based on whether other paths have pulled random numbers already.
>>>>>>>>
>>>>>>>> I think that fixing this is going to need some code
>>>>>>>> restructuring, but I'd like to get some thoughts as to how to
>>>>>>>> do this.  Basically, the interface needs to be changed slightly
>>>>>>>> so that the random numbers are drawn in a fixed order, and that
>>>>>>>> might mean one call to get any additional random numbers in a
>>>>>>>> pricer, which gets called in a critical section, and another to run the pricer with the random numbers.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> QuantLib-dev mailing list
>>>>>>>> [hidden email]
>>>>>>>> https://lists.sourceforge.net/lists/listinfo/quantlib-dev

