Re: Re: Re: Re: Re: Re: Openmp work on mcarlo : Dynamic Creator MT

Posted by cheng li on Sep 23, 2014; 1:49am
URL: http://quantlib.414.s1.nabble.com/Re-Openmp-work-on-mcarlo-Dynamic-Creator-MT-tp15832p15910.html

Hi Peter,

On my side the performance has also improved. The slowdown is now around a factor of 2.5. Thanks for your help.

Regards,
Cheng

-----Original Message-----
From: Peter Caspers [mailto:[hidden email]]
Sent: September 22, 2014 16:05
To: cheng li
Cc: QuantLib Mailing Lists
Subject: Re: Re: Re: Re: Re: Re: [Quantlib-dev] Openmp work on mcarlo : Dynamic Creator MT

Yes, please. The slowdown on Windows on my office computer is now around a factor of 1.6.
Best regards
Peter

On 22 September 2014 03:48, cheng li <[hidden email]> wrote:

> Hi Peter,
>
> Thanks for your effort. I'll definitely have a try:)
>
> Regards,
> Cheng
>
> -----Original Message-----
> From: Peter Caspers [mailto:[hidden email]]
> Sent: September 21, 2014 23:11
> To: cheng.li
> Cc: QuantLib Mailing Lists
> Subject: Re: Re: Re: Re: Re: [Quantlib-dev] Openmp work on mcarlo : Dynamic
> Creator MT
>
> Hi Cheng,
>
> I switched to a template class for precomputed twisters, which is
> faster by a factor of 2 (450ms instead of 870ms). This can be
> instantiated with
>
> MersenneTwisterCustomRng<Mtdesc19937_5> mt(42);
>
> with 5 replaceable by 0 to 7 as before. The other class (MersenneTwisterDynamicRng) is now only needed if you want to create an MT at runtime.
>
> The pull request is updated accordingly.
>
> Best regards
> Peter
>
>
>
>
> On 21 September 2014 08:11, cheng.li <[hidden email]> wrote:
>> Hi Peter,
>>
>> Thanks for your hard work. I think our results are consistent.
>>
>> Regards,
>> Cheng
>>
>> -----Original Message-----
>> From: Peter Caspers [mailto:[hidden email]]
>> Sent: September 21, 2014 0:33
>> To: cheng li
>> Cc: QuantLib Mailing Lists
>> Subject: Re: Re: Re: Re: [Quantlib-dev] Openmp work on mcarlo : Dynamic
>> Creator MT
>>
>> Hi Cheng,
>>
>> sorry, this was my fault, I messed up the timings, because I did not use consistent optimizer flags when compiling the library and the test program.
>>
>> On Windows (the same machine on which I run Ubuntu, though that
>> doesn't really matter, because my office computer gives very
>> similar timings), generating 1E8 random numbers with O2 takes
>>
>> 400ms / 1100ms
>>
>> for the original ql mt / dynamic creator mt. The ql mt is just as
>> fast as the boost mt implementation by the way. On Ubuntu with gcc
>> 4.8.1 and O3 I get
>>
>> 290ms / 870ms
>>
>> and with O2 a close value, for the creator mt 910ms. Also it makes no difference if I use gcc 4.9.1 or clang 3.6.0.
>>
>> If I directly call the original C routine without using the wrapper object, I get 720ms.
>>
>> If I use the original library and a C example, both compiled with O3 (this is the configuration in which the library is shipped; it has a hardcoded makefile), I get 730ms.
>>
>> This means the wrapper introduces a slowdown of about 20% (870ms vs. 720ms), which seems not too bad.
>>
>> Otherwise the dcmt is slower by a factor of around 2-3 compared to the original mt in all cases. Since this is already the case with the original library, I wouldn't try to do anything about it at the moment.
>>
>> What is your opinion on this ?
>>
>> Peter
>>
>> I compared different platforms again, but now on the _same_ machine - Original MT / Dynamic Creator MT (generation of 1E8 numbers, single threaded, with O2 (MSVC) and O3 (gcc, clang)). I also checked the boost implementation mt19937, which is very close to the ql original mt in all cases.
>>
>> Windows / MSVC 2010  => 400 ms / 1100 ms
>> Ubuntu / gcc 4.9.1   => 1200 ms / 1050 ms
>> Ubuntu / gcc 4.8.1   => 1180 ms / 1040 ms
>> Ubuntu / clang 3.6.0 => 1340 ms / 1150 ms
>>
>> clang
>> 290
>> 720
>> 870
>>
>> (c 730)
>>
>> so it looks like MSVC does a specific optimization for the QL and boost mt19937, which does not apply on the other platforms and not to the dynamic creator mt.
>>
>> At the moment I still don't know what it is.
>>
>> On 18 September 2014 03:33, cheng li <[hidden email]> wrote:
>>> Let me try your suggestion once I have time.
>>>
>>> Regards,
>>> Cheng
>>>
>>> -----Original Message-----
>>> From: cheng li [mailto:[hidden email]]
>>> Sent: September 18, 2014 9:18
>>> To: 'Peter Caspers'
>>> Cc: 'QuantLib Mailing Lists'
>>> Subject: Re: Re: Re: [Quantlib-dev] Openmp work on mcarlo : Dynamic
>>> Creator MT
>>>
>>> Hi Peter,
>>>
>>> I used gcc 4.8.2.
>>>
>>> My result with -O3 optimization is still not good: the new MT
>>> performs about the same as before (roughly 3-4x slower).
>>>
>>> I used the following statement to turn on -O3 optimization before
>>> running ./configure for QuantLib:
>>>
>>> export CXXFLAGS="-g -O3"
>>>
>>> Am I right?
>>>
>>> Regards,
>>> Cheng
>>>
>>> -----Original Message-----
>>> From: Peter Caspers [mailto:[hidden email]]
>>> Sent: September 18, 2014 0:36
>>> To: cheng li
>>> Cc: QuantLib Mailing Lists
>>> Subject: Re: Re: Re: [Quantlib-dev] Openmp work on mcarlo : Dynamic
>>> Creator MT
>>>
>>> With gcc 4.9.1 and -O2 the new mt is a bit slower than the original one (but only by a factor of 1.1).
>>> I have to add both -frename-registers and -finline-functions to -O2 to get back the speedup I mentioned before.
>>>
>>> Which compiler do you use on Ubuntu ?
>>>
>>> Peter
>>>
>>>
>>>
>>> On 17 September 2014 03:26, cheng li <[hidden email]> wrote:
>>>> Thanks Peter. I tested on Ubuntu as well: about 3-4x slower with -O2 optimization.
>>>>
>>>> I'll try -O3 on my machine also with Ubuntu.
>>>>
>>>> Regards,
>>>> Cheng
>>>>
>>>> -----Original Message-----
>>>> From: Peter Caspers [mailto:[hidden email]]
>>>> Sent: September 17, 2014 0:32
>>>> To: Cheng Li; QuantLib Mailing Lists
>>>> Subject: Re: Re: [Quantlib-dev] Openmp work on mcarlo : Dynamic Creator
>>>> MT
>>>>
>>>> Hi Cheng,
>>>>
>>>> Indeed, with MSVC I get a slowdown by a factor of ~2.8. As I said, under gcc the running time scales by ~0.8, i.e. a speedup (with -O3).
>>>>
>>>> Does anyone have an idea where the different behaviour under gcc /
>>>> linux and msvc might come from (and how to improve the msvc side if
>>>> possible) ?
>>>>
>>>> Kind regards
>>>> Peter
>>>>
>>>>
>>>>
>>>> On 13 September 2014 08:27, Cheng Li <[hidden email]> wrote:
>>>>> Thanks Peter.
>>>>>
>>>>> Regards,
>>>>> Cheng
>>>>>
>>>>> Sent from my iPad
>>>>>
>>>>>> On September 13, 2014, at 13:29, Peter Caspers <[hidden email]> wrote:
>>>>>>
>>>>>> I will have a look on Monday (I have a Windows machine at work)
>>>>>> and see how it works there.
>>>>>>
>>>>>> Thanks
>>>>>> Peter
>>>>>>
>>>>>> Sent from my iPhone
>>>>>>
>>>>>>> On September 13, 2014, at 04:41, Cheng Li <[hidden email]> wrote:
>>>>>>>
>>>>>>> I am on Windows 7 x64, using VS 2012 with QuantLib 1.4 and
>>>>>>> Boost 1.55, in release mode.
>>>>>>>
>>>>>>> Sent from my iPad
>>>>>>>
>>>>>>>> On September 13, 2014, at 0:08, Peter Caspers <[hidden email]> wrote:
>>>>>>>>
>>>>>>>> Hi Cheng,
>>>>>>>>
>>>>>>>> no, I get better timings with the dcmt implementation, e.g. for
>>>>>>>> 1E8 numbers
>>>>>>>>
>>>>>>>> dcmt 0.982s
>>>>>>>> quantlib 1.159s
>>>>>>>>
>>>>>>>> on my computer. Can you post your platform and compiler
>>>>>>>> settings, so that I can try to reproduce ?
>>>>>>>>
>>>>>>>> Thanks
>>>>>>>> Peter
>>>>>>>>
>>>>>>>>> On 12 September 2014 05:29, cheng li <[hidden email]> wrote:
>>>>>>>>> Hi Peter,
>>>>>>>>>
>>>>>>>>> I have used your dcmt wrapper and tested it with the following
>>>>>>>>> code. It seems dcmt in a single thread is 4X slower than the
>>>>>>>>> original QL MT. Is this consistent with what you see on your side?
>>>>>>>>>
>>>>>>>>> #include <ql/quantlib.hpp>
>>>>>>>>> #include <boost/timer.hpp>
>>>>>>>>> #include <iostream>
>>>>>>>>>
>>>>>>>>> using namespace QuantLib;
>>>>>>>>> using namespace std;
>>>>>>>>>
>>>>>>>>> int main() {
>>>>>>>>>
>>>>>>>>>      // number of random numbers to draw from each generator
>>>>>>>>>      Size samples;
>>>>>>>>>      cin >> samples;
>>>>>>>>>      boost::timer myTimer;
>>>>>>>>>
>>>>>>>>>      // time the original QuantLib Mersenne Twister
>>>>>>>>>      MersenneTwisterUniformRng originalMT;
>>>>>>>>>      for(Size i=0; i<samples; ++i)
>>>>>>>>>              originalMT.next();
>>>>>>>>>
>>>>>>>>>      cout << myTimer.elapsed() << endl;
>>>>>>>>>
>>>>>>>>>      myTimer.restart();
>>>>>>>>>
>>>>>>>>>      // time the dcmt wrapper (precomputed instance 5, seed 1)
>>>>>>>>>      MersenneTwisterDynamicRng mt( mtdesc_0_8_19937[5] , 1);
>>>>>>>>>
>>>>>>>>>      for(Size i=0; i<samples; ++i) {
>>>>>>>>>              mt.next();
>>>>>>>>>      }
>>>>>>>>>
>>>>>>>>>      cout << myTimer.elapsed() << endl;
>>>>>>>>>
>>>>>>>>>      // read a dummy value before exiting (keeps a console window open)
>>>>>>>>>      int n;
>>>>>>>>>      std::cin>>n;
>>>>>>>>>      return 0;
>>>>>>>>> }
>>>>>>>>>
>>>>>>>>> Regards,
>>>>>>>>> Cheng
>>>>>>>>>
>>>>>>>>> -----Original Message-----
>>>>>>>>> From: Peter Caspers [mailto:[hidden email]]
>>>>>>>>> Sent: September 6, 2014 20:48
>>>>>>>>> To: Joseph Wang
>>>>>>>>> Cc: QuantLib Mailing Lists
>>>>>>>>> Subject: Re: [Quantlib-dev] Openmp work on mcarlo : Dynamic Creator
>>>>>>>>> MT
>>>>>>>>>
>>>>>>>>> Hi Joseph, all,
>>>>>>>>>
>>>>>>>>> I added a wrapper for the dcmt library (Dynamic Creator of
>>>>>>>>> Mersenne Twisters).
>>>>>>>>>
>>>>>>>>> https://github.com/lballabio/quantlib/pull/132
>>>>>>>>>
>>>>>>>>> I guess this is a useful building block for multithreaded monte carlo.
>>>>>>>>> Since for bigger p the dynamic creation takes a long time (it
>>>>>>>>> feels more like mining than computing ...), I precomputed 8 independent instances (i.e.
>>>>>>>>> for use in at most 8 parallel threads), for the "standard"
>>>>>>>>> value p = 19937 and word size 32, which one can instantiate
>>>>>>>>> with
>>>>>>>>>
>>>>>>>>> MersenneTwisterDynamicRng mt( mtdesc_0_8_19937[i] , seed_i );
>>>>>>>>>
>>>>>>>>> for i = 0, ... , 7.
>>>>>>>>>
>>>>>>>>> In addition the speed of random number generation seems a bit
>>>>>>>>> faster in the dcmt library than with the original ql twister.
>>>>>>>>> I observe running times scaled by a factor of 0.8 when generating 1E8 numbers.
>>>>>>>>>
>>>>>>>>> All this is of course experimental and not well tested, so any
>>>>>>>>> feedback and experiences are very welcome. I'd be very
>>>>>>>>> interested in your opinion on the dcmt library and applications in parallel monte carlo.
>>>>>>>>>
>>>>>>>>> Peter
>>>>>>>>>
>>>>>>>>>> On 20 October 2013 16:01, Joseph Wang <[hidden email]> wrote:
>>>>>>>>>> I've done some more parallelization with openmp and quantlib.
>>>>>>>>>> I've uploaded the changes to the
>>>>>>>>>> https://github.com/joequant/quantlib.  The branch openmp has some changes that I've issued a pull-request for.
>>>>>>>>>> openmp-mcario has some changes that need some more work.
>>>>>>>>>>
>>>>>>>>>> I've gotten the MC to work by generating the paths in a
>>>>>>>>>> critical section.
>>>>>>>>>> Calculating the prices once I have the path is multithreaded,
>>>>>>>>>> but right now I need to generate the paths in a single thread
>>>>>>>>>> to make sure that the same sequence is generated.
>>>>>>>>>>
>>>>>>>>>> The big issue right now is that there is a race condition in
>>>>>>>>>> the calculation of barrier options which is causing one
>>>>>>>>>> regression test to fail.  The problem is that the random
>>>>>>>>>> number generator is being called in BarrierPathPricer, and
>>>>>>>>>> since that is run multithreaded, the sequence that is being
>>>>>>>>>> pulled will change from run to run based on whether other paths have pulled random numbers already.
>>>>>>>>>>
>>>>>>>>>> I think that fixing this is going to need some code
>>>>>>>>>> restructuring, but I'd like to get some thoughts as to how to
>>>>>>>>>> do this.  Basically, the interface needs to be changed
>>>>>>>>>> slightly so that the random numbers are drawn in a fixed
>>>>>>>>>> order, and that might mean one call to get any additional
>>>>>>>>>> random numbers in a pricer, which gets called in a critical section, and another to run the pricer with the random numbers.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> _______________________________________________
>>>>>>>>>> QuantLib-dev mailing list
>>>>>>>>>> [hidden email]
>>>>>>>>>> https://lists.sourceforge.net/lists/listinfo/quantlib-dev
>>>>>>>>>
>>>>>>>>>
>>>>
>>>
>>>
>>
>

