http://quantlib.414.s1.nabble.com/Re-Openmp-work-on-mcarlo-Dynamic-Creator-MT-tp15832p15997.html
It works great on Windows.
The results are similar and the elapsed time grows linearly.
I tried VC++ 2012. In my opinion VC++ 2010 will behave the same. I will get back to you when the VC++ 2010 test has finished.
Subject: Re: RE: RE: RE: RE: RE: RE: [Quantlib-dev] Openmp work on mcarlo : Dynamic Creator MT
I think I could further improve the performance of the precomputed twisters (i.e. the ones constructed as
). Now they seem to be just as fast as the original one (I only tested on Linux). The PR is updated.
> Hi Peter,
>
> On my side the performance has also improved. The slowdown is now around 2.5x. Thanks for your help.
>
> Regards,
> Cheng
>
> -----Original Message-----
> From: Peter Caspers [mailto:[hidden email]]
> Sent: 22 September 2014 16:05
> To: cheng li
> Cc: QuantLib Mailing Lists
> Subject: Re: RE: RE: RE: RE: RE: [Quantlib-dev] Openmp work on mcarlo :
> Dynamic Creator MT
>
> yes, please. The slowdown on Windows on my office computer is around 1.6 now.
> best regards
> Peter
>
> On 22 September 2014 03:48, cheng li <[hidden email]> wrote:
>> Hi Peter,
>>
>> Thanks for your effort. I'll definitely give it a try :)
>>
>> Regards,
>> Cheng
>>
>> -----Original Message-----
>> From: Peter Caspers [mailto:[hidden email]]
>> Sent: 21 September 2014 23:11
>> To: cheng.li
>> Cc: QuantLib Mailing Lists
>> Subject: Re: RE: RE: RE: RE: [Quantlib-dev] Openmp work on mcarlo :
>> Dynamic Creator MT
>>
>> Hi Cheng,
>>
>> I switched to a template class for precomputed twisters, which is
>> faster by a factor of 2 (450ms instead of 870ms). This can be
>> instantiated with
>>
>> MersenneTwisterCustomRng<Mtdesc19937_5> mt(42);
>>
>> with 5 replaceable by 0 to 7 as before. The other class is now only needed if you want to create an MT at runtime.
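
A rough sketch of why a template version can be faster: when the twister's parameters are template arguments, the compiler sees them as constants and can fold them into the generated code, whereas a runtime descriptor has to be read from memory on every call. The example only covers the tempering step. The names `TemperDesc`, `temperRuntime`, `temperStatic` and `kMt19937Desc` are made up for illustration and are not the classes from the pull request; I am also only assuming that the descriptor carries masks and shifts of this kind. The constants are those of the standard MT19937.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical runtime descriptor: the parameters live in memory.
struct TemperDesc {
    std::uint32_t maskB, maskC;
    int shift0, shiftB, shiftC, shift1;
};

// Runtime variant: every call reads the parameters from the descriptor.
inline std::uint32_t temperRuntime(const TemperDesc& d, std::uint32_t y) {
    y ^= y >> d.shift0;
    y ^= (y << d.shiftB) & d.maskB;
    y ^= (y << d.shiftC) & d.maskC;
    y ^= y >> d.shift1;
    return y;
}

// Compile-time variant: the parameters are template arguments, so the
// compiler can treat them as immediate constants.
template <std::uint32_t MaskB, std::uint32_t MaskC,
          int Shift0, int ShiftB, int ShiftC, int Shift1>
inline std::uint32_t temperStatic(std::uint32_t y) {
    y ^= y >> Shift0;
    y ^= (y << ShiftB) & MaskB;
    y ^= (y << ShiftC) & MaskC;
    y ^= y >> Shift1;
    return y;
}

// Tempering parameters of the standard MT19937.
const TemperDesc kMt19937Desc = {0x9d2c5680u, 0xefc60000u, 11, 7, 15, 18};

inline std::uint32_t temperMt19937(std::uint32_t y) {
    return temperStatic<0x9d2c5680u, 0xefc60000u, 11, 7, 15, 18>(y);
}

// Both variants must produce identical output for the same parameters.
inline void checkVariantsAgree() {
    const std::uint32_t inputs[] = {0u, 1u, 0x12345678u, 0xffffffffu};
    for (int i = 0; i < 4; ++i)
        assert(temperRuntime(kMt19937Desc, inputs[i]) == temperMt19937(inputs[i]));
}
```

The two variants compute the same thing; only the place where the parameters come from differs, which is presumably where the measured difference originates.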
>>
>> The pull request is updated accordingly.
>>
>> Best regards
>> Peter
>>
>>
>>
>>
>> On 21 September 2014 08:11, cheng.li <[hidden email]> wrote:
>>> Hi Peter,
>>>
>>> Thanks for your hard work. I think our results are consistent.
>>>
>>> Regards,
>>> Cheng
>>>
>>> -----Original Message-----
>>> From: Peter Caspers [mailto:[hidden email]]
>>> Sent: 21 September 2014 0:33
>>> To: cheng li
>>> Cc: QuantLib Mailing Lists
>>> Subject: Re: RE: RE: RE: [Quantlib-dev] Openmp work on mcarlo : Dynamic
>>> Creator MT
>>>
>>> Hi Cheng,
>>>
>>> sorry, this was my fault, I messed up the timings, because I did not use consistent optimizer flags when compiling the library and the test program.
>>>
>>> Actually on Windows (same machine on which I run Ubuntu, which
>>> doesn't really matter, because my computer in office gives very
>>> similar timings) I get for 1E8 random numbers generated (with O2)
>>>
>>> 400ms / 1100ms
>>>
>>> for the original ql mt / dynamic creator mt. The ql mt is just as
>>> fast as the boost mt implementation by the way. On Ubuntu with gcc
>>> 4.8.1 and O3 I get
>>>
>>> 290ms / 870ms
>>>
>>> and with O2 a close value, for the creator mt 910ms. Also it makes no difference if I use gcc 4.9.1 or clang 3.6.0.
>>>
>>> If I directly call the original C routine without using the wrapper object, I get 720ms.
>>>
>>> If I use the original library and a C example (both compiled with O3, which is how the library is shipped, since it has a hardcoded makefile), I get 730ms.
>>>
>>> This means the wrapper introduces a slowdown of about 20%, which does not seem too bad.
>>>
>>> Otherwise the dcmt is slower by a factor of around 2-3 compared to the original mt in all cases. Since this is already the case with the original library, I wouldn't try to do anything about it at the moment.
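
The ratios quoted in this message follow directly from the timings. A small helper makes the arithmetic explicit; the function names are mine and purely illustrative.

```cpp
#include <cassert>

// Ratio of a candidate timing to a baseline timing; a value > 1 means slower.
inline double slowdownFactor(double baselineMs, double candidateMs) {
    assert(baselineMs > 0.0);
    return candidateMs / baselineMs;
}

// Relative overhead of the candidate over the baseline, in percent.
inline double overheadPercent(double baselineMs, double candidateMs) {
    return (slowdownFactor(baselineMs, candidateMs) - 1.0) * 100.0;
}
```

With the numbers above, `slowdownFactor(400, 1100)` gives 2.75 on Windows, `slowdownFactor(290, 870)` gives 3.0 on Ubuntu, and `overheadPercent(720, 870)` gives roughly 20.8 for the wrapper against the direct C call.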
>>>
>>> What is your opinion on this ?
>>>
>>> Peter
>>>
>>> I compared different platforms again, but now on the _same_ machine - Original MT / Dynamic Creator MT (generation of 1E8 numbers, single threaded, with O2 (MSVC) and O3 (gcc, clang)). I also checked the boost implementation mt19937, which is very close to the ql original mt in all cases.
>>>
>>> Windows / MSVC 2010 => 400 ms / 1100 ms
>>> Ubuntu / gcc 4.9.1 => 1200 ms / 1050 ms
>>> Ubuntu / gcc 4.8.1 => 1180 ms / 1040 ms
>>> Ubuntu / clang 3.6.0 => 1340 ms / 1150 ms
>>>
>>> clang
>>> 290
>>> 720
>>> 870
>>>
>>> (c 730)
>>>
>>> So it looks like MSVC applies a specific optimization to the QL and boost mt19937, which applies neither on the other platforms nor to the dynamic creator mt.
>>>
>>> At the moment I still don't know what it is.
>>>
>>> On 18 September 2014 03:33, cheng li <[hidden email]> wrote:
>>>> Let me try your suggestion once I have time.
>>>>
>>>> Regards,
>>>> Cheng
>>>>
>>>> -----Original Message-----
>>>> From: cheng li [mailto:[hidden email]]
>>>> Sent: 18 September 2014 9:18
>>>> To: 'Peter Caspers'
>>>> Cc: 'QuantLib Mailing Lists'
>>>> Subject: RE: RE: RE: [Quantlib-dev] Openmp work on mcarlo : Dynamic
>>>> Creator MT
>>>>
>>>> Hi Peter,
>>>>
>>>> I used gcc 4.8.2.
>>>>
>>>> My result with O3 optimization is still not good: the new MT
>>>> performs about the same as before (a slowdown of about 3-4x).
>>>>
>>>> I used the following statement to turn on O3 optimization before
>>>> running ./configure for QuantLib:
>>>>
>>>> export CXXFLAGS="-g -O3"
>>>>
>>>> Is that right?
>>>>
>>>> Regards,
>>>> Cheng
>>>>
>>>> -----Original Message-----
>>>> From: Peter Caspers [mailto:[hidden email]]
>>>> Sent: 18 September 2014 0:36
>>>> To: cheng li
>>>> Cc: QuantLib Mailing Lists
>>>> Subject: Re: RE: RE: [Quantlib-dev] Openmp work on mcarlo : Dynamic
>>>> Creator MT
>>>>
>>>> with gcc 4.9.1 and O2 the new mt is a bit slower than the original one (but only by a factor of 1.1).
>>>> I have to add both -frename-registers and -finline-functions to -O2 to get back the speedup I mentioned before.
>>>>
>>>> Which compiler do you use on Ubuntu ?
>>>>
>>>> Peter
>>>>
>>>>
>>>>
>>>> On 17 September 2014 03:26, cheng li <[hidden email]> wrote:
>>>>> Thanks Peter. I also tested on Ubuntu: about 3-4x slower with -O2 optimization.
>>>>>
>>>>> I'll also try -O3 on my Ubuntu machine.
>>>>>
>>>>> Regards,
>>>>> Cheng
>>>>>
>>>>> -----Original Message-----
>>>>> From: Peter Caspers [mailto:[hidden email]]
>>>>> Sent: 17 September 2014 0:32
>>>>> To: Cheng Li; QuantLib Mailing Lists
>>>>> Subject: Re: RE: [Quantlib-dev] Openmp work on mcarlo : Dynamic Creator
>>>>> MT
>>>>>
>>>>> Hi Cheng,
>>>>>
>>>>> indeed with msvc I get a slow down with a factor of ~2.8x. As I said, under gcc it is a speed up ~ 0.8x (with -O3).
>>>>>
>>>>> Does anyone have an idea where the different behaviour under gcc /
>>>>> linux and msvc might come from (and how to improve the msvc side
>>>>> if possible) ?
>>>>>
>>>>> Kind regards
>>>>> Peter
>>>>>
>>>>>
>>>>>
>>>>> On 13 September 2014 08:27, Cheng Li <[hidden email]> wrote:
>>>>>> Thanks Peter.
>>>>>>
>>>>>> Regards,
>>>>>> Cheng
>>>>>>
>>>>>> Sent from my iPad
>>>>>>
>>>>>>> On 13 September 2014, at 13:29, Peter Caspers <[hidden email]> wrote:
>>>>>>>
>>>>>>> I will have a look on Monday (I have a Windows machine at work)
>>>>>>> and see how it works there.
>>>>>>>
>>>>>>> Thanks
>>>>>>> Peter
>>>>>>>
>>>>>>> Sent from my iPhone
>>>>>>>
>>>>>>>> On 13.09.2014 at 04:41, Cheng Li <[hidden email]> wrote:
>>>>>>>>
>>>>>>>> I am on 64-bit Windows 7, using VS 2012 with QuantLib 1.4 and
>>>>>>>> Boost 1.55 in release mode.
>>>>>>>>
>>>>>>>> Sent from my iPad
>>>>>>>>
>>>>>>>>> On 13 September 2014, at 0:08, Peter Caspers <[hidden email]> wrote:
>>>>>>>>>
>>>>>>>>> Hi Cheng,
>>>>>>>>>
>>>>>>>>> no, I get better timings with the dcmt implementation, e.g.
>>>>>>>>> for 1E8 numbers
>>>>>>>>>
>>>>>>>>> dcmt 0.982s
>>>>>>>>> quantlib 1.159s
>>>>>>>>>
>>>>>>>>> on my computer. Can you post your platform and compiler
>>>>>>>>> settings, so that I can try to reproduce ?
>>>>>>>>>
>>>>>>>>> Thanks
>>>>>>>>> Peter
>>>>>>>>>
>>>>>>>>>> On 12 September 2014 05:29, cheng li <[hidden email]> wrote:
>>>>>>>>>> Hi Peter,
>>>>>>>>>>
>>>>>>>>>> I have used your dcmt wrapper and tested it with the following
>>>>>>>>>> code. It seems that dcmt in a single thread is 4x slower than the
>>>>>>>>>> original QL MT. Is this consistent with what you see?
>>>>>>>>>>
>>>>>>>>>> #include <ql/quantlib.hpp>
>>>>>>>>>> #include <boost/timer.hpp>
>>>>>>>>>> #include <iostream>
>>>>>>>>>>
>>>>>>>>>> using namespace QuantLib;
>>>>>>>>>> using namespace std;
>>>>>>>>>>
>>>>>>>>>> int main() {
>>>>>>>>>>
>>>>>>>>>>     // number of random numbers to generate, read from stdin
>>>>>>>>>>     Size samples;
>>>>>>>>>>     cin >> samples;
>>>>>>>>>>     boost::timer myTimer;
>>>>>>>>>>
>>>>>>>>>>     // original QuantLib Mersenne Twister
>>>>>>>>>>     MersenneTwisterUniformRng originalMT;
>>>>>>>>>>     for (Size i = 0; i < samples; ++i)
>>>>>>>>>>         originalMT.next();
>>>>>>>>>>
>>>>>>>>>>     cout << myTimer.elapsed() << endl;
>>>>>>>>>>
>>>>>>>>>>     myTimer.restart();
>>>>>>>>>>
>>>>>>>>>>     // precomputed dcmt instance no. 5 from the pull request, seed 1
>>>>>>>>>>     MersenneTwisterDynamicRng mt(mtdesc_0_8_19937[5], 1);
>>>>>>>>>>
>>>>>>>>>>     for (Size i = 0; i < samples; ++i) {
>>>>>>>>>>         mt.next();
>>>>>>>>>>     }
>>>>>>>>>>
>>>>>>>>>>     cout << myTimer.elapsed() << endl;
>>>>>>>>>>
>>>>>>>>>>     // wait for input so that the console window stays open
>>>>>>>>>>     int n;
>>>>>>>>>>     std::cin >> n;
>>>>>>>>>>     return 0;
>>>>>>>>>> }
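
One caveat with a loop like this is that the generated numbers are never used, so an optimizer is in principle free to drop some of the work, which makes timings harder to compare across compilers and flags. A possible variant, sketched with std::chrono and with std::mt19937 standing in for the QuantLib generators, folds every draw into a checksum that is handed back to the caller. `TimingResult` and `timeGenerator` are hypothetical names; a QuantLib generator would need a small adapter exposing `operator()`.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <random>

struct TimingResult {
    double milliseconds;     // wall-clock time spent in the loop
    std::uint64_t checksum;  // sum of all draws, keeps the loop observable
};

// Times `samples` calls of rng() and accumulates the output so that
// the calls cannot be discarded as dead code.
template <class Rng>
TimingResult timeGenerator(Rng& rng, std::size_t samples) {
    assert(samples > 0);
    typedef std::chrono::steady_clock Clock;
    std::uint64_t checksum = 0;
    const Clock::time_point start = Clock::now();
    for (std::size_t i = 0; i < samples; ++i)
        checksum += rng();
    const Clock::time_point stop = Clock::now();
    TimingResult result;
    result.milliseconds =
        std::chrono::duration<double, std::milli>(stop - start).count();
    result.checksum = checksum;
    return result;
}
```

A call would look like `std::mt19937 rng(42); TimingResult r = timeGenerator(rng, 100000000);`, after which both `r.milliseconds` and `r.checksum` should be printed.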
>>>>>>>>>>
>>>>>>>>>> Regards,
>>>>>>>>>> Cheng
>>>>>>>>>>
>>>>>>>>>> -----Original Message-----
>>>>>>>>>> From: Peter Caspers [mailto:[hidden email]]
>>>>>>>>>> Sent: 6 September 2014 20:48
>>>>>>>>>> To: Joseph Wang
>>>>>>>>>> Cc: QuantLib Mailing Lists
>>>>>>>>>> Subject: Re: [Quantlib-dev] Openmp work on mcarlo : Dynamic
>>>>>>>>>> Creator MT
>>>>>>>>>>
>>>>>>>>>> Hi Joseph, all,
>>>>>>>>>>
>>>>>>>>>> I added a wrapper for the dcmt library (Dynamic Creator of
>>>>>>>>>> Mersenne Twisters).
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> https://github.com/lballabio/quantlib/pull/132
>>>>>>>>>>
>>>>>>>>>> I guess this is a useful building block for multithreaded monte carlo.
>>>>>>>>>> Since for bigger p the dynamic creation takes a long time (it
>>>>>>>>>> feels more like mining than computing ...), I precomputed 8 independent instances (i.e.
>>>>>>>>>> for use in at most 8 parallel threads), for the "standard"
>>>>>>>>>> value p = 19937 and word size 32, which one can instantiate
>>>>>>>>>> with
>>>>>>>>>>
>>>>>>>>>> MersenneTwisterDynamicRng mt( mtdesc_0_8_19937[i] , seed_i );
>>>>>>>>>>
>>>>>>>>>> for i = 0, ... , 7.
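
To illustrate how one generator per thread could be used, here is a minimal sketch of a Monte Carlo estimate of pi with 8 streams. It does not use the classes from the pull request: std::mt19937 with different seeds stands in for the precomputed twisters, which is a weaker construction, since differently seeded copies of one twister are not designed to be independent in the way dcmt's distinct parameter sets are meant to be. `estimatePi` and `kStreams` are made-up names. Each stream writes into its own slot and the slots are summed in a fixed order, so the result does not depend on thread scheduling; if OpenMP is not enabled, the pragma is ignored and the loop simply runs serially.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

const int kStreams = 8;  // matches the 8 precomputed instances

double estimatePi(std::size_t samplesPerStream, unsigned baseSeed) {
    assert(samplesPerStream > 0);
    std::vector<std::size_t> hits(kStreams, 0);

    #pragma omp parallel for
    for (int s = 0; s < kStreams; ++s) {
        // Stand-in for: MersenneTwisterDynamicRng mt(mtdesc_0_8_19937[s], seed_s);
        std::mt19937 rng(baseSeed + static_cast<unsigned>(s));
        std::uniform_real_distribution<double> uniform(0.0, 1.0);
        std::size_t local = 0;
        for (std::size_t i = 0; i < samplesPerStream; ++i) {
            const double x = uniform(rng);
            const double y = uniform(rng);
            if (x * x + y * y <= 1.0)
                ++local;
        }
        hits[s] = local;  // each stream owns exactly one slot
    }

    // Combine the per-stream counts in a fixed order.
    std::size_t total = 0;
    for (int s = 0; s < kStreams; ++s)
        total += hits[s];
    return 4.0 * static_cast<double>(total) /
           (static_cast<double>(kStreams) * static_cast<double>(samplesPerStream));
}
```

Calling `estimatePi(20000, 1u)` twice gives the same value both times, with or without OpenMP, because no generator is shared between streams.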
>>>>>>>>>>
>>>>>>>>>> In addition the speed of random number generation seems a bit
>>>>>>>>>> faster in the dcmt library than with the original ql twister.
>>>>>>>>>> I observe running times scaled by a factor of 0.8 when generating 1E8 numbers.
>>>>>>>>>>
>>>>>>>>>> All this is of course experimental and not well tested, so
>>>>>>>>>> any feedback and experiences are very welcome. I'd be very
>>>>>>>>>> interested in your opinion on the dcmt library and applications in parallel monte carlo.
>>>>>>>>>>
>>>>>>>>>> Peter
>>>>>>>>>>
>>>>>>>>>>> On 20 October 2013 16:01, Joseph Wang <[hidden email]> wrote:
>>>>>>>>>>> I've done some more parallelization with openmp and quantlib.
>>>>>>>>>>> I've uploaded the changes to https://github.com/joequant/quantlib.
>>>>>>>>>>> The branch openmp has some changes that I've issued a pull-request for.
>>>>>>>>>>> openmp-mcario has some changes that need some more work.
>>>>>>>>>>>
>>>>>>>>>>> I've gotten the MC to work by generating the paths in a
>>>>>>>>>>> critical section. Calculating the prices once I have the path is
>>>>>>>>>>> multithreaded, but right now I need to generate the paths in
>>>>>>>>>>> a single thread to make sure that the same sequence is generated.
>>>>>>>>>>>
>>>>>>>>>>> The big issue right now is that there is a race condition in
>>>>>>>>>>> the calculation of barrier options which is causing one
>>>>>>>>>>> regression test to fail. The problem is that the random
>>>>>>>>>>> number generator is being called in BarrierPathPricer, and
>>>>>>>>>>> since that is run multithreaded, the sequence that is being
>>>>>>>>>>> pulled will change from run to run based on whether other paths have pulled random numbers already.
>>>>>>>>>>>
>>>>>>>>>>> I think that fixing this is going to need some code
>>>>>>>>>>> restructuring, but I'd like to get some thoughts as to how
>>>>>>>>>>> to do this. Basically, the interface needs to be changed
>>>>>>>>>>> slightly so that the random numbers are drawn in a fixed
>>>>>>>>>>> order, and that might mean one call to get any additional
>>>>>>>>>>> random numbers in a pricer, which gets called in a critical section, and another to run the pricer with the random numbers.
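
One way to picture the two-phase interface described here: draw every random number a path needs in a fixed, serial order first, then run the pricing in parallel on the stored draws. The sketch below is not QuantLib code. `drawPaths`, `priceUpAndOutCall` and the toy random walk are hypothetical, and std::mt19937 stands in for the library's generator.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

typedef std::vector<std::vector<double> > Draws;

// Phase 1 (serial): pull all normal draws in a fixed order.
Draws drawPaths(std::size_t paths, std::size_t steps, unsigned seed) {
    std::mt19937 rng(seed);
    std::normal_distribution<double> normal(0.0, 1.0);
    Draws draws(paths, std::vector<double>(steps));
    for (std::size_t p = 0; p < paths; ++p)
        for (std::size_t t = 0; t < steps; ++t)
            draws[p][t] = normal(rng);
    return draws;
}

// Phase 2 (parallel): price each path from its stored draws.
// No generator is touched here, so thread scheduling cannot change the result.
double priceUpAndOutCall(const Draws& draws, double spot, double strike,
                         double barrier, double stepVol) {
    assert(!draws.empty());
    const long n = static_cast<long>(draws.size());
    std::vector<double> payoff(draws.size(), 0.0);

    #pragma omp parallel for
    for (long p = 0; p < n; ++p) {
        double s = spot;
        bool knockedOut = false;
        for (std::size_t t = 0; t < draws[p].size(); ++t) {
            s += stepVol * draws[p][t];  // toy arithmetic random walk
            if (s >= barrier) {
                knockedOut = true;
                break;
            }
        }
        payoff[p] = knockedOut ? 0.0 : std::max(s - strike, 0.0);
    }

    // Sum in a fixed order so the average is reproducible.
    double sum = 0.0;
    for (long p = 0; p < n; ++p)
        sum += payoff[p];
    return sum / static_cast<double>(n);
}
```

The price is memory: all draws are held at once, so a real implementation would probably work in batches of paths rather than storing the whole simulation.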
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> _______________________________________________
>>>>>>>>>>> QuantLib-dev mailing list
>>>>>>>>>>> [hidden email]
>>>>>>>>>>> https://lists.sourceforge.net/lists/listinfo/quantlib-dev