Re: Openmp work on mcarlo : Dynamic Creator MT

Peter Caspers-4

Re: Openmp work on mcarlo : Dynamic Creator MT

Hi Joseph, all,

I added a wrapper for the dcmt library (Dynamic Creator of Mersenne Twisters).

https://github.com/lballabio/quantlib/pull/132

I guess this is a useful building block for multithreaded monte carlo.
Since for bigger p the dynamic creation takes a long time (it feels
more like mining than computing ...), I precomputed 8 independent
instances (i.e. for use in at most 8 parallel threads), for the
"standard" value p = 19937 and word size 32, which one can instantiate
with

MersenneTwisterDynamicRng mt( mtdesc_0_8_19937[i] , seed_i );

for i = 0, ... , 7.
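
A minimal sketch of the one-generator-per-thread pattern described above. The dcmt wrapper itself is not used here: std::mt19937 with a different seed per stream is only a stand-in, and unlike dcmt's distinct parameter sets it gives no independence guarantee. The names kStreams and sumPerStream are hypothetical. The OpenMP pragma is optional; without it the loop runs serially and returns the same numbers.

```cpp
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Stand-in for the 8 precomputed dcmt instances (assumption: plain
// std::mt19937 seeded differently per stream, NOT the dcmt library).
const std::size_t kStreams = 8;

// Sum `perStream` uniform draws from each stream. Stream i is created and
// used inside a single loop iteration, so no generator state is shared
// between threads and the result does not depend on scheduling.
std::vector<double> sumPerStream(std::size_t perStream, std::uint32_t baseSeed) {
    std::vector<double> sums(kStreams, 0.0);
#ifdef _OPENMP
#pragma omp parallel for
#endif
    for (int i = 0; i < static_cast<int>(kStreams); ++i) {
        std::mt19937 rng(baseSeed + static_cast<std::uint32_t>(i));
        std::uniform_real_distribution<double> u(0.0, 1.0);
        double s = 0.0;
        for (std::size_t k = 0; k < perStream; ++k)
            s += u(rng);
        sums[i] = s;
    }
    return sums;
}
```

With the wrapper from the pull request, the stand-in construction would presumably become the MersenneTwisterDynamicRng line quoted above, one descriptor index i per thread.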

In addition, random number generation seems a bit faster with the dcmt
library than with the original QuantLib twister: I observe running
times scaled by a factor of about 0.8 when generating 1E8 numbers.

All this is of course experimental and not well tested, so any
feedback and experiences are very welcome. I'd be very interested in
your opinion on the dcmt library and applications in parallel monte
carlo.

Peter

On 20 October 2013 16:01, Joseph Wang <[hidden email]> wrote:

> I've done some more parallelization with openmp and quantlib. I've uploaded
> the changes to https://github.com/joequant/quantlib. The branch openmp
> has some changes that I've issued a pull-request for. openmp-mcario has
> some changes that need some more work.
>
> I've gotten the MC to work by generating the paths in a critical section.
> Calculating the prices once I have the path is multithreaded, but right now
> I need to generate the paths in a single thread to make sure that the same
> sequence is generated.
>
> The big issue right now is that there is a race condition in the calculation
> of barrier options which is causing one regression test to fail. The
> problem is that the random number generator is being called in
> BarrierPathPricer, and since that is run multithreaded, the sequence that is
> being pulled will change from run to run based on whether other paths have
> pulled random numbers already.
>
> I think that fixing this is going to need some code restructuring, but I'd
> like to get some thoughts as to how to do this. Basically, the interface
> needs to be changed slightly so that the random numbers are drawn in a fixed
> order, and that might mean one call to get any additional random numbers in
> a pricer, which gets called in a critical section, and another to run the
> pricer with the random numbers.
> _______________________________________________
> QuantLib-dev mailing list
> [hidden email]
> https://lists.sourceforge.net/lists/listinfo/quantlib-dev
>
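
The two-call interface proposed in the quoted message (draw all random numbers in a fixed order first, then run the pricer on them in parallel) can be sketched roughly as follows. This is not QuantLib code: drawAll, pricePath and priceAll are hypothetical names, the payoff is a toy one, and std::mt19937 stands in for the library's generators. The OpenMP pragma is optional.

```cpp
#include <algorithm>
#include <cstddef>
#include <random>
#include <vector>

// Phase 1 (serial): draw every random number in a fixed order, so the
// sequence cannot depend on thread scheduling.
std::vector<double> drawAll(std::size_t nPaths, std::size_t drawsPerPath, unsigned seed) {
    std::mt19937 rng(seed);
    std::normal_distribution<double> z(0.0, 1.0);
    std::vector<double> draws(nPaths * drawsPerPath);
    for (double& d : draws)
        d = z(rng);
    return draws;
}

// Phase 2 (thread-safe): a toy pricer that only reads its own slice of
// draws. Hypothetical payoff: the maximum of the running sum, floored at 0.
double pricePath(const double* z, std::size_t n) {
    double level = 0.0, maxLevel = 0.0;
    for (std::size_t k = 0; k < n; ++k) {
        level += z[k];
        maxLevel = std::max(maxLevel, level);
    }
    return maxLevel;
}

double priceAll(std::size_t nPaths, std::size_t drawsPerPath, unsigned seed) {
    const std::vector<double> draws = drawAll(nPaths, drawsPerPath, seed);
    std::vector<double> values(nPaths);
#ifdef _OPENMP
#pragma omp parallel for
#endif
    for (long p = 0; p < static_cast<long>(nPaths); ++p)
        values[p] = pricePath(draws.data() + static_cast<std::size_t>(p) * drawsPerPath,
                              drawsPerPath);
    // Serial reduction in path order keeps the floating-point sum reproducible.
    double sum = 0.0;
    for (double v : values)
        sum += v;
    return sum / static_cast<double>(nPaths);
}
```

The cost of this design is that all draws are held in memory at once; drawing in batches would be one way to bound that.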

cheng li

Re: Openmp work on mcarlo : Dynamic Creator MT

Hi Peter,

I have used your dcmt wrapper and tested it with the following code. In a
single thread, dcmt seems to be 4x slower than the original QL MT. Is this
consistent with what you see?

#include <ql/quantlib.hpp>
#include <boost/timer.hpp>
#include <iostream>

using namespace QuantLib;
using namespace std;

int main() {

    // number of random numbers to draw from each generator
    Size samples;
    cin >> samples;
    boost::timer myTimer;

    // original QuantLib Mersenne Twister
    MersenneTwisterUniformRng originalMT;
    for (Size i = 0; i < samples; ++i)
        originalMT.next();

    cout << myTimer.elapsed() << endl;

    myTimer.restart();

    // dcmt-based twister: precomputed instance 5, seed 1
    MersenneTwisterDynamicRng mt(mtdesc_0_8_19937[5], 1);

    for (Size i = 0; i < samples; ++i) {
        mt.next();
    }

    cout << myTimer.elapsed() << endl;

    // wait for input so that the console window stays open
    int n;
    cin >> n;
    return 0;
}

Regards,
Cheng
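
A possible variant of the timing loop above, as a sketch only: it accumulates the drawn values into a checksum, so that an optimizing compiler cannot discard the loop, and it uses std::chrono instead of boost::timer. TimedRun and timeDraws are hypothetical names, and the generator is assumed to be callable as rng(), as the standard library engines are.

```cpp
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <random>

struct TimedRun {
    double seconds;          // wall-clock time spent in the loop
    std::uint64_t checksum;  // sum of all draws, returned so the work is observable
};

// Draw `samples` numbers from `rng` and time the loop.
template <class Rng>
TimedRun timeDraws(Rng& rng, std::size_t samples) {
    const auto start = std::chrono::steady_clock::now();
    std::uint64_t checksum = 0;
    for (std::size_t i = 0; i < samples; ++i)
        checksum += rng();  // consume the value so the loop cannot be dropped
    const auto stop = std::chrono::steady_clock::now();
    return { std::chrono::duration<double>(stop - start).count(), checksum };
}
```

For the two QuantLib generators in the program above, one would presumably call next() inside the loop and accumulate the returned sample value instead of calling rng(). Generator construction is kept outside the timed region here, unlike in the original program.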

-----Original Message-----
From: Peter Caspers [mailto:[hidden email]]
Sent: 6 September 2014 20:48
To: Joseph Wang
Cc: QuantLib Mailing Lists
Subject: Re: [Quantlib-dev] Openmp work on mcarlo : Dynamic Creator MT


Joseph Wang-4

Re: Openmp work on mcarlo : Dynamic Creator MT

In reply to this post by Peter Caspers-4

Yes, it would be nice if we could get it in.

My experience with MC is that the big bottleneck with parallel
applications is a testing issue (i.e. how do you verify that the
number is correct). The industry-standard approach involves using an
RNG that can be started at a given location, so that it generates the
same random numbers for the parallel and the standard paths. I know of
one bank where they ended up using Mersenne Twister for this.
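
A rough sketch of the "start the RNG at a given location" idea, under the assumption that skipping ahead is done with std::mt19937::discard from the standard library (not a QuantLib or dcmt facility). The function names sequential and chunked are hypothetical. Each chunk rebuilds the generator from the same seed, skips to its own offset, and fills its slice, so the parallel result should match the single-threaded sequence exactly.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Reference run: one generator, all numbers drawn in order.
std::vector<std::uint32_t> sequential(std::uint32_t seed, std::size_t n) {
    std::mt19937 rng(seed);
    std::vector<std::uint32_t> out(n);
    for (auto& x : out)
        x = static_cast<std::uint32_t>(rng());
    return out;
}

// Chunked run: every chunk positions its own generator at the chunk's
// starting offset, so chunks can be filled in any order or in parallel.
// `chunk` must be greater than zero.
std::vector<std::uint32_t> chunked(std::uint32_t seed, std::size_t n, std::size_t chunk) {
    std::vector<std::uint32_t> out(n);
    const long nChunks = static_cast<long>((n + chunk - 1) / chunk);
#ifdef _OPENMP
#pragma omp parallel for
#endif
    for (long c = 0; c < nChunks; ++c) {
        const std::size_t begin = static_cast<std::size_t>(c) * chunk;
        const std::size_t end = std::min(n, begin + chunk);
        std::mt19937 rng(seed);
        rng.discard(begin);  // skip the draws that belong to earlier chunks
        for (std::size_t i = begin; i < end; ++i)
            out[i] = static_cast<std::uint32_t>(rng());
    }
    return out;
}
```

In common standard library implementations discard on a Mersenne Twister steps through the skipped values one by one, so this is mainly useful as a correctness check; a generator with a cheap jump-ahead would be needed for the approach to scale.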


Peter Caspers-4

Re: Re: Openmp work on mcarlo : Dynamic Creator MT

In reply to this post by cheng li

Hi Cheng,

no, I get better timings with the dcmt implementation, e.g. for 1E8 numbers

dcmt 0.982s
quantlib 1.159s

on my computer. Can you post your platform and compiler settings, so
that I can try to reproduce this?

Thanks
Peter

On 12 September 2014 05:29, cheng li <[hidden email]> wrote:


Peter Caspers-4

Re: Re: Openmp work on mcarlo : Dynamic Creator MT

In reply to this post by Peter Caspers-4

Hi Cheng,

indeed, with msvc I get a slowdown by a factor of ~2.8x. As I said,
under gcc it is a speedup (running time scaled by ~0.8x, with -O3).

Does anyone have an idea where the different behaviour under gcc /
linux and msvc might come from (and how to improve the msvc side, if
possible)?

Kind regards
Peter

On 13 September 2014 08:27, Cheng Li <[hidden email]> wrote:

> Thanks Peter.
>
> Regards,
> Cheng
>
> Sent from my iPad
>
>> On 13 September 2014, at 13:29, Peter Caspers <[hidden email]> wrote:
>>
>> I will have a look on Monday (I have a Windows machine at work) and see how it works there.
>>
>> Thanks
>> Peter
>>
>> Sent from my iPhone
>>
>>> On 13 September 2014, at 04:41, Cheng Li <[hidden email]> wrote:
>>>
>>> I am on Win7 x64, using VS 2012 with QuantLib 1.4 and Boost 1.55 in release mode.
>>>
>>> Sent from my iPad
>>>
>>>> On 13 September 2014, at 0:08, Peter Caspers <[hidden email]> wrote:

cheng li

Re: Re: Openmp work on mcarlo : Dynamic Creator MT

Thanks Peter. I also tested on Ubuntu: dcmt is about 3-4x slower there with -O2 optimization.

I'll also try -O3 on my Ubuntu machine.

Regards,
Cheng

-----Original Message-----
From: Peter Caspers [mailto:[hidden email]]
Sent: 17 September 2014 0:32
To: Cheng Li; QuantLib Mailing Lists
Subject: Re: Re: [Quantlib-dev] Openmp work on mcarlo : Dynamic Creator MT


Peter Caspers-4

Re: Re: Re: Openmp work on mcarlo : Dynamic Creator MT

With gcc 4.9.1 and -O2 the new MT is a bit slower than the original one
(but only by a factor of 1.1).
I have to add both -frename-registers and -finline-functions to -O2 to
get back the speedup I mentioned before.

Which compiler do you use on Ubuntu?

Peter

On 17 September 2014 03:26, cheng li <[hidden email]> wrote:

> Thanks Peter. I test on Ubuntu also, about 3~4X lower with -O2 optiomization.
>
> I'll try -O3 on my machine also with Ubuntu.
>
> Regards,
> Cheng
>
> -----邮件原件-----
> 发件人: Peter Caspers [mailto:[hidden email]]
> 发送时间: 2014年9月17日 0:32
> 收件人: Cheng Li; QuantLib Mailing Lists
> 主题: Re: 答复: [Quantlib-dev] Openmp work on mcarlo : Dynamic Creator MT
>
> Hi Cheng,
>
> indeed with msvc I get a slow down with a factor of ~2.8x. As I said, under gcc it is a speed up ~ 0.8x (with -O3).
>
> Does anyone have an idea where the different behaviour under gcc / linux and msvc might come from (and how to improve the msvc side if
> possible) ?
>
> Kind regards
> Peter
>
>
>
> On 13 September 2014 08:27, Cheng Li <[hidden email]> wrote:
>> Thanks Peter.
>>
>> Regards,
>> Cheng
>>
>> 发自我的 iPad
>>
>>> 在 2014年9月13日，13:29，Peter Caspers <[hidden email]> 写道：
>>>
>>> I will have a look on monday ( I have a Windows machine at work ) and
>>> see how it works there
>>>
>>> Thanks
>>> Peter
>>>
>>> Von meinem iPhone gesendet
>>>
>>>> Am 13.09.2014 um 04:41 schrieb Cheng Li <[hidden email]>:
>>>>
>>>> I am on Win7 x64bit, using vs 2012 with quantlib 1.4 boost 1.55
>>>> under release mode
>>>>
>>>> 发自我的 iPad
>>>>
>>>>> 在 2014年9月13日，0:08，Peter Caspers <[hidden email]> 写道：
>>>>>
>>>>> Hi Cheng,
>>>>>
>>>>> no, I get better timings with the dcmt implementation, e.g. for 1E8
>>>>> numbers
>>>>>
>>>>> dcmt 0.982s
>>>>> quantlib 1.159s
>>>>>
>>>>> on my computer. Can you post your platform and compiler settings,
>>>>> so that I can try to reproduce ?
>>>>>
>>>>> Thanks
>>>>> Peter
>>>>>
>>>>>> On 12 September 2014 05:29, cheng li <[hidden email]> wrote:
>>>>>> Hi Peter,
>>>>>>
>>>>>> I have used your wrapper dcmt library and test with following
>>>>>> codes: It seems dcmt in single thread is 4X slower than the QL
>>>>>> original MT. Is this consistent with your side?
>>>>>>
>>>>>> #include <ql/quantlib.hpp>
>>>>>> #include <boost/timer.hpp>
>>>>>> #include <iostream>
>>>>>>
>>>>>> using namespace QuantLib;
>>>>>> using namespace std;
>>>>>>
>>>>>> int main() {
>>>>>>
>>>>>> int samples;
>>>>>> cin >> samples;
>>>>>> boost::timer myTimer;
>>>>>>
>>>>>> MersenneTwisterUniformRng orignalMT;
>>>>>> for(Size i=0; i<samples; ++i)
>>>>>> orignalMT.next();
>>>>>>
>>>>>> cout << myTimer.elapsed() << endl;
>>>>>>
>>>>>> myTimer.restart();
>>>>>>
>>>>>> MersenneTwisterDynamicRng mt( mtdesc_0_8_19937[5] , 1);
>>>>>>
>>>>>> for(Size i=0; i<samples; ++i) {
>>>>>> mt.next();
>>>>>> }
>>>>>>
>>>>>> cout << myTimer.elapsed() << endl;
>>>>>>
>>>>>> int n;
>>>>>> std::cin>>n;
>>>>>> return 0;
>>>>>> }
>>>>>>
>>>>>> Regards,
>>>>>> Cheng
>>>>>>
>>>>>> -----邮件原件-----
>>>>>> 发件人: Peter Caspers [mailto:[hidden email]]
>>>>>> 发送时间: 2014年9月6日 20:48
>>>>>> 收件人: Joseph Wang
>>>>>> 抄送: QuantLib Mailing Lists
>>>>>> 主题: Re: [Quantlib-dev] Openmp work on mcarlo : Dynamic Creator MT
>>>>>>
>>>>>> Hi Joseph, all,
>>>>>>
>>>>>> I added a wrapper for the dcmt library (Dynamic Creator of
>>>>>> Mersenne Twisters).
>>>>>>
>>>>>> https://github.com/lballabio/quantlib/pull/132
>>>>>>
>>>>>> I guess this is a useful building block for multithreaded monte carlo.
>>>>>> Since for bigger p the dynamic creation takes a long time (it
>>>>>> feels more like mining than computing ...), I precomputed 8 independent instances (i.e.
>>>>>> for use in at most 8 parallel threads), for the "standard" value p
>>>>>> = 19937 and word size 32, which one can instantiate with
>>>>>>
>>>>>> MersenneTwisterDynamicRng mt( mtdesc_0_8_19937[i] , seed_i );
>>>>>>
>>>>>> for i = 0, ... , 7.
>>>>>>
>>>>>> In addition the speed of random number generation seems a bit
>>>>>> faster in the dcmt library than with the original ql twister. I
>>>>>> observe running times scaled by a factor of 0.8 when generating 1E8 numbers.
>>>>>>
>>>>>> All this is of course experimental and not well tested, so any
>>>>>> feedback and experiences are very welcome. I'd be very interested
>>>>>> in your opinion on the dcmt library and applications in parallel monte carlo.
>>>>>>
>>>>>> Peter
>>>>>>
>>>>>>> On 20 October 2013 16:01, Joseph Wang <[hidden email]> wrote:
>>>>>>> I've done some more parallelization with openmp and quantlib.
>>>>>>> I've uploaded the changes to the
>>>>>>> https://github.com/joequant/quantlib. The branch openmp has some changes that I've issued a pull-request for.
>>>>>>> openmp-mcario has some changes that need some more work.
>>>>>>>
>>>>>>> I've gotten the MC to work by generating the paths in a critical
>>>>>>> situation. Calculating the prices once I have the path is
>>>>>>> multithreaded, but right now I need to generate the paths in a
>>>>>>> single thread to make sure that the same sequence is generated.
>>>>>>>
>>>>>>> The big issue right now is that there is a race condition in the
>>>>>>> calculation of barrier options which is causing one regression
>>>>>>> test to fail. The problem is that the random number generator is
>>>>>>> being called in BarrierPathPricer, and since that is run
>>>>>>> multithread, the sequence that is being pulled will change from
>>>>>>> run to run based on whether other paths have pulled random numbers already.
>>>>>>>
>>>>>>> I think that fixing this is going to need some code
>>>>>>> restructuring, but I'd like to get some thoughts as to how to do
>>>>>>> this. Basically, the interface needs to be changed slightly so
>>>>>>> that the random numbers are drawn in a fixed order, and that
>>>>>>> might mean one call to get any additional random numbers in a
>>>>>>> pricer, which gets called in a critical section, and another to run the pricer with the random numbers.
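
The two-call idea proposed above (first draw all random numbers in a fixed order, then run the pricer on the pre-drawn numbers) can be sketched roughly as follows. This is only an illustration under stated assumptions: it uses std::mt19937 from the standard library rather than any QuantLib generator, and pricePath and simulate are hypothetical names, not QuantLib interfaces.

```cpp
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical stand-in for a path pricer that needs one extra random
// number per path (as a barrier pricer might). Not QuantLib code.
double pricePath(double pathDraw, double extraDraw) {
    return pathDraw > 0.5 ? extraDraw : 0.0;
}

// Phase 1 draws every random number from one generator in a fixed order.
// Phase 2 only reads the pre-drawn numbers, so its iterations are
// independent and could run in any order or on several threads.
double simulate(std::size_t paths, unsigned seed, bool reverseOrder) {
    std::mt19937 rng(seed);
    std::uniform_real_distribution<double> u(0.0, 1.0);

    std::vector<double> pathDraws(paths), extraDraws(paths);
    for (std::size_t i = 0; i < paths; ++i) {   // serial section
        pathDraws[i] = u(rng);
        extraDraws[i] = u(rng);
    }

    std::vector<double> values(paths);
    // An OpenMP "parallel for" could be placed on this loop; here the
    // order is merely reversed to mimic a different schedule.
    for (std::size_t k = 0; k < paths; ++k) {
        std::size_t i = reverseOrder ? paths - 1 - k : k;
        values[i] = pricePath(pathDraws[i], extraDraws[i]);
    }

    double sum = 0.0;
    for (double v : values) sum += v;           // fixed-order reduction
    return paths ? sum / paths : 0.0;
}
```

Because each path writes only to its own slot and the final sum runs in index order, simulate(n, seed, false) and simulate(n, seed, true) return the same value, which is the reproducibility property the regression test needs.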

cheng li

Re: Re: Re: Openmp work on mcarlo : Dynamic Creator MT

Hi Peter,

I used gcc 4.8.2.

My results with -O3 are still not good: the new MT performs about the same as before (roughly a 3~4X slowdown).

I used the following statement to turn on -O3 before running ./configure for QuantLib:

export CXXFLAGS="-g -O3"

Am I right?

Regards,
Cheng
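
On the question above: exporting CXXFLAGS before ./configure normally determines how the QuantLib library itself is built, but a separately compiled test program only gets the flags given on its own compile line. A minimal sketch, assuming GCC or clang, of a helper the test program could print to confirm that it was built with optimization. The name optimizationState is hypothetical and not part of QuantLib.

```cpp
#include <string>

// Reports how this translation unit was compiled. __OPTIMIZE__ is
// predefined by GCC and clang when optimization is enabled; other
// compilers (e.g. MSVC) do not define it, so the result is "unknown".
std::string optimizationState() {
#if defined(__OPTIMIZE__)
    return "optimized";
#elif defined(__GNUC__)
    return "not optimized";
#else
    return "unknown";
#endif
}
```

Printing optimizationState() at the start of the timing program gives a quick check that the benchmark and the library were not built with different optimization levels.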

-----Original Message-----
From: Peter Caspers [mailto:[hidden email]]
Sent: 18 September 2014 0:36
To: cheng li
Cc: QuantLib Mailing Lists
Subject: Re: Re: Re: [Quantlib-dev] Openmp work on mcarlo : Dynamic Creator MT

with gcc 4.9.1 and O2 the new mt is a bit slower than the original one (but only by a factor of 1.1).
I have to add both -frename-registers, -finline-functions to -O2 to get the speed up back I mentioned before.

Which compiler do you use on Ubuntu ?

Peter

On 17 September 2014 03:26, cheng li <[hidden email]> wrote:

> Thanks Peter. I tested on Ubuntu as well: about 3~4X slower with -O2 optimization.
>
> I'll try -O3 on my machine also with Ubuntu.
>
> Regards,
> Cheng
>
> -----Original Message-----
> From: Peter Caspers [mailto:[hidden email]]
> Sent: 17 September 2014 0:32
> To: Cheng Li; QuantLib Mailing Lists
> Subject: Re: Re: [Quantlib-dev] Openmp work on mcarlo : Dynamic Creator MT
>
> Hi Cheng,
>
> indeed with msvc I get a slow down with a factor of ~2.8x. As I said, under gcc it is a speed up ~ 0.8x (with -O3).
>
> Does anyone have an idea where the different behaviour under gcc /
> linux and msvc might come from (and how to improve the msvc side if
> possible) ?
>
> Kind regards
> Peter
>
>
>
> On 13 September 2014 08:27, Cheng Li <[hidden email]> wrote:
>> Thanks Peter.
>>
>> Regards,
>> Cheng
>>
>> Sent from my iPad
>>
>>> On 13 September 2014, at 13:29, Peter Caspers <[hidden email]> wrote:
>>>
>>> I will have a look on monday ( I have a Windows machine at work )
>>> and see how it works there
>>>
>>> Thanks
>>> Peter
>>>
>>> Sent from my iPhone
>>>
>>>> On 13.09.2014 at 04:41, Cheng Li <[hidden email]> wrote:
>>>>
>>>> I am on Win7 x64bit, using vs 2012 with quantlib 1.4 boost 1.55
>>>> under release mode
>>>>
>>>> Sent from my iPad
>>>>
>>>>> On 13 September 2014, at 0:08, Peter Caspers <[hidden email]> wrote:
>>>>>
>>>>> Hi Cheng,
>>>>>
>>>>> no, I get better timings with the dcmt implementation, e.g. for
>>>>> 1E8 numbers
>>>>>
>>>>> dcmt 0.982s
>>>>> quantlib 1.159s
>>>>>
>>>>> on my computer. Can you post your platform and compiler settings,
>>>>> so that I can try to reproduce ?
>>>>>
>>>>> Thanks
>>>>> Peter
>>>>>
>>>>>> On 12 September 2014 05:29, cheng li <[hidden email]> wrote:
>>>>>> Hi Peter,
>>>>>>
>>>>>> I have used your dcmt wrapper and tested it with the following
>>>>>> code. It seems dcmt in a single thread is 4X slower than the QL
>>>>>> original MT. Is this consistent with what you see?
>>>>>>
>>>>>> #include <ql/quantlib.hpp>
>>>>>> #include <boost/timer.hpp>
>>>>>> #include <iostream>
>>>>>>
>>>>>> using namespace QuantLib;
>>>>>> using namespace std;
>>>>>>
>>>>>> int main() {
>>>>>>
>>>>>>     int samples;
>>>>>>     cin >> samples;
>>>>>>     boost::timer myTimer;
>>>>>>
>>>>>>     MersenneTwisterUniformRng orignalMT;
>>>>>>     for(Size i=0; i<samples; ++i)
>>>>>>         orignalMT.next();
>>>>>>
>>>>>>     cout << myTimer.elapsed() << endl;
>>>>>>
>>>>>>     myTimer.restart();
>>>>>>
>>>>>>     MersenneTwisterDynamicRng mt( mtdesc_0_8_19937[5] , 1);
>>>>>>
>>>>>>     for(Size i=0; i<samples; ++i) {
>>>>>>         mt.next();
>>>>>>     }
>>>>>>
>>>>>>     cout << myTimer.elapsed() << endl;
>>>>>>
>>>>>>     int n;
>>>>>>     std::cin>>n;
>>>>>>     return 0;
>>>>>> }
>>>>>>
>>>>>> Regards,
>>>>>> Cheng

cheng li

Re: Re: Re: Openmp work on mcarlo : Dynamic Creator MT

Let me try your suggested flags once I have time.

Regards,
Cheng


Peter Caspers-4

Re: Re: Re: Re: Openmp work on mcarlo : Dynamic Creator MT

Hi Cheng,

sorry, this was my fault, I messed up the timings, because I did not
use consistent optimizer flags when compiling the library and the test
program.

Actually on Windows (same machine on which I run Ubuntu, which doesn't
really matter, because my computer in office gives very similar
timings) I get for 1E8 random numbers generated (with O2)

400ms / 1100ms

for the original ql mt / dynamic creator mt. The ql mt is just as fast
as the boost mt implementation by the way. On Ubuntu with gcc 4.8.1
and O3 I get

290ms / 870ms

and with O2 a close value, for the creator mt 910ms. Also it makes no
difference if I use gcc 4.9.1 or clang 3.6.0.

If I directly call the original C routine without using the wrapper
object, I get 720ms.

If I use the original library and a C example (both compiled with O3,
this is the configuration how the library is shipped (it has a
hardcoded make file)) => 730ms.

This means the wrapper introduces a slowdown of about 20% (870ms vs. 720ms), which seems not too bad.

Otherwise the dcmt is slower by a factor of around 2-3 compared to the
original mt in all cases. Since this is already the case with the
original library, I wouldn't try to do anything about it at the
moment.

What is your opinion on this ?

Peter

I compared different platforms again, but now on the _same_ machine -
Original MT / Dynamic Creator MT (generation of 1E8 numbers, single
threaded, with O2 (MSVC) and O3 (gcc, clang)). I also checked the
boost implementation mt19937, which is very close to the ql original
mt in all cases.

Windows / MSVC 2010  =>  400 ms / 1100 ms
Ubuntu / gcc 4.9.1   => 1200 ms / 1050 ms
Ubuntu / gcc 4.8.1   => 1180 ms / 1040 ms
Ubuntu / clang 3.6.0 => 1340 ms / 1150 ms

clang: 290 / 720 / 870 (c 730)

so it looks like MSVC does a specific optimization for the QL and
boost mt19937, which does not apply on the other platforms and not to
the dynamic creator mt.

At the moment I still don't know what it is.
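
Timing comparisons like the ones above are sensitive to how the compiler treats the generation loop, so it can help to make the measured work explicit. A minimal sketch of a harness using only the standard library: the draws are accumulated into a checksum so the loop cannot be removed as dead code. Timing and timeDraws are hypothetical names (not QuantLib or dcmt API), and std::mt19937 is only a stand-in generator. This is not a claim about what caused the platform differences reported here.

```cpp
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <random>

struct Timing {
    double milliseconds;     // wall-clock time spent in the loop
    std::uint64_t checksum;  // sum of all draws, to be printed by the caller
};

// Times n draws from any generator callable as gen(). The draws are
// summed into a checksum that the caller should print or otherwise use,
// so the compiler cannot discard the loop as dead code.
template <class Generator>
Timing timeDraws(Generator& gen, std::size_t n) {
    std::uint64_t checksum = 0;
    auto start = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < n; ++i)
        checksum += static_cast<std::uint64_t>(gen());
    auto stop = std::chrono::steady_clock::now();
    std::chrono::duration<double, std::milli> elapsed = stop - start;
    return {elapsed.count(), checksum};
}
```

Usage would look like `std::mt19937 gen(42); Timing t = timeDraws(gen, 100000000);`. A generator that exposes next() instead of operator(), as in the test program earlier in this thread, would need a small lambda adapter that returns an integer from each draw.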


cheng li

Re: Re: Re: Re: Openmp work on mcarlo : Dynamic Creator MT

Hi Peter,

Thanks for your hard work. I think our results are consistent.

Regards,
Cheng

-----邮件原件-----
发件人: Peter Caspers [mailto:[hidden email]]
发送时间: 2014年9月21日 0:33
收件人: cheng li
抄送: QuantLib Mailing Lists
主题: Re: 答复: 答复: 答复: [Quantlib-dev] Openmp work on mcarlo : Dynamic Creator MT

Hi Cheng,

sorry, this was my fault, I messed up the timings, because I did not use consistent optimizer flags when compiling the library and the test program.

Actually on Windows (same machine on which I run Ubuntu, which doesn't really matter, because my computer in office gives very similar
timings) I get for 1E8 random numbers generated (with O2)

400ms / 1100ms

for the original ql mt / dynamic creator mt. The ql mt is just as fast as the boost mt implementation by the way. On Ubuntu with gcc 4.8.1 and O3 I get

290ms / 870ms

and with O2 a close value, for the creator mt 910ms. Also it makes no difference if I use gcc 4.9.1 or clang 3.6.0.

If I directly call the original C routine without using the wrapper object, I get 720ms.

If I use the original library and a C example (both compiled with O3, this is the configuration how the library is shipped (it has a hardcoded make file)) => 730ms.

This means, the wrapper introduces a slow down by 20% which seems not too bad.

Otherwise the dcmt is slower by a factor of around 2-3 compared to the original mt in all cases. Since this is already the case with the original library, I wouldn't try to do anything about it at the moment.

What is your opinion on this ?

Peter

I compared dfiferent platforms again, but now on the _same_ machine - Original MT / Dynamic Creator MT (generation of 1E8 numbers, single threaded, with O2 (MSVC) and O3 (gcc, clang)). I also checked the boost implementation mt19937, which is very close to the ql original mt in all cases.

Winodws / MSVC 2010 => 400ms / 1100ms
Ubuntu / gcc 4.9.1 => 1200 ms / 1050 ms
Ubuntu / gcc 4.8.1 => 1180 ms / 1040 ms
Ubuntu / clang 3.6.0 => 1340 ms / 1150 ms

clang
290
720
870

(c 730)

so it looks like MSVC does a specific optimization for the QL and boost mt19937, which does not apply on the other platforms and not the the dynamic creator mt.

At the moment I stil don't know what it is.

On 18 September 2014 03:33, cheng li <[hidden email]> wrote:

> Let me try your statement once I have a time.
>
> Regards,
> Cheng
>
> -----邮件原件-----
> 发件人: cheng li [mailto:[hidden email]]
> 发送时间: 2014年9月18日 9:18
> 收件人: 'Peter Caspers'
> 抄送: 'QuantLib Mailing Lists'
> 主题: 答复: 答复: 答复: [Quantlib-dev] Openmp work on mcarlo : Dynamic Creator
> MT
>
> Hi Peter,
>
> I used gcc 4.8.2.
>
> My result with O3 optimization is still not good. Similar performance
> of new MT ( about 3~4X speed down)
>
> I used such statement to turn on o3 optimization before I do
> ./configure for QuantLib,
>
> Export CXXFLAGS="-g -O3"
>
> Am I right?
>
> Regards,
> Cheng
>
> -----邮件原件-----
> 发件人: Peter Caspers [mailto:[hidden email]]
> 发送时间: 2014年9月18日 0:36
> 收件人: cheng li
> 抄送: QuantLib Mailing Lists
> 主题: Re: 答复: 答复: [Quantlib-dev] Openmp work on mcarlo : Dynamic Creator
> MT
>
> with gcc 4.9.1 and O2 the new mt is a bit slower than the original one (but only by a factor of 1.1).
> I have to add both -frename-registers, -finline-functions to -O2 to get the speed up back I mentioned before.
>
> Which compiler do you use on Ubuntu ?
>
> Peter
>
>
>
> On 17 September 2014 03:26, cheng li <[hidden email]> wrote:
>> Thanks Peter. I tested on Ubuntu as well: about 3-4x slower with -O2 optimization.
>>
>> I'll also try -O3 on my Ubuntu machine.
>>
>> Regards,
>> Cheng
>>
>> -----Original Message-----
>> From: Peter Caspers [mailto:[hidden email]]
>> Sent: 17 September 2014 0:32
>> To: Cheng Li; QuantLib Mailing Lists
>> Subject: Re: Re: [Quantlib-dev] Openmp work on mcarlo : Dynamic Creator MT
>>
>> Hi Cheng,
>>
>> indeed with msvc I get a slow down with a factor of ~2.8x. As I said, under gcc it is a speed up ~ 0.8x (with -O3).
>>
>> Does anyone have an idea where the different behaviour under gcc /
>> linux and msvc might come from (and how to improve the msvc side if
>> possible) ?
>>
>> Kind regards
>> Peter
>>
>>
>>
>> On 13 September 2014 08:27, Cheng Li <[hidden email]> wrote:
>>> Thanks Peter.
>>>
>>> Regards,
>>> Cheng
>>>
>>> Sent from my iPad
>>>
>>>> On 13 September 2014, at 13:29, Peter Caspers <[hidden email]> wrote:
>>>>
>>>> I will have a look on monday ( I have a Windows machine at work )
>>>> and see how it works there
>>>>
>>>> Thanks
>>>> Peter
>>>>
>>>> Sent from my iPhone
>>>>
>>>>> On 13.09.2014 at 04:41, Cheng Li <[hidden email]> wrote:
>>>>>
>>>>> I am on Win7 x64, using VS 2012 with QuantLib 1.4 and boost 1.55
>>>>> in release mode
>>>>>
>>>>> Sent from my iPad
>>>>>
>>>>>> On 13 September 2014, at 0:08, Peter Caspers <[hidden email]> wrote:
>>>>>>
>>>>>> Hi Cheng,
>>>>>>
>>>>>> no, I get better timings with the dcmt implementation, e.g. for
>>>>>> 1E8 numbers
>>>>>>
>>>>>> dcmt 0.982s
>>>>>> quantlib 1.159s
>>>>>>
>>>>>> on my computer. Can you post your platform and compiler settings,
>>>>>> so that I can try to reproduce ?
>>>>>>
>>>>>> Thanks
>>>>>> Peter
>>>>>>
>>>>>>> On 12 September 2014 05:29, cheng li <[hidden email]> wrote:
>>>>>>> Hi Peter,
>>>>>>>
>>>>>>> I have used your wrapper for the dcmt library and tested it with
>>>>>>> the following code. It seems that dcmt in a single thread is 4x
>>>>>>> slower than the original QL MT. Is this consistent with your results?
>>>>>>>
>>>>>>> #include <ql/quantlib.hpp>
>>>>>>> #include <boost/timer.hpp>
>>>>>>> #include <iostream>
>>>>>>>
>>>>>>> using namespace QuantLib;
>>>>>>> using namespace std;
>>>>>>>
>>>>>>> int main() {
>>>>>>>
>>>>>>>     // number of random draws per generator
>>>>>>>     Size samples;
>>>>>>>     cin >> samples;
>>>>>>>     boost::timer myTimer;
>>>>>>>
>>>>>>>     // original QuantLib Mersenne Twister
>>>>>>>     MersenneTwisterUniformRng originalMT;
>>>>>>>     for (Size i = 0; i < samples; ++i)
>>>>>>>         originalMT.next();
>>>>>>>
>>>>>>>     cout << myTimer.elapsed() << endl;
>>>>>>>
>>>>>>>     myTimer.restart();
>>>>>>>
>>>>>>>     // dcmt-based generator from the wrapper (precomputed instance 5)
>>>>>>>     MersenneTwisterDynamicRng mt(mtdesc_0_8_19937[5], 1);
>>>>>>>
>>>>>>>     for (Size i = 0; i < samples; ++i) {
>>>>>>>         mt.next();
>>>>>>>     }
>>>>>>>
>>>>>>>     cout << myTimer.elapsed() << endl;
>>>>>>>
>>>>>>>     // wait for input so that the console window stays open
>>>>>>>     int n;
>>>>>>>     std::cin >> n;
>>>>>>>     return 0;
>>>>>>> }
>>>>>>>
>>>>>>> Regards,
>>>>>>> Cheng
>>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: Peter Caspers [mailto:[hidden email]]
>>>>>>> Sent: 6 September 2014 20:48
>>>>>>> To: Joseph Wang
>>>>>>> Cc: QuantLib Mailing Lists
>>>>>>> Subject: Re: [Quantlib-dev] Openmp work on mcarlo : Dynamic Creator MT
>>>>>>>
>>>>>>> Hi Joseph, all,
>>>>>>>
>>>>>>> I added a wrapper for the dcmt library (Dynamic Creator of
>>>>>>> Mersenne Twisters).
>>>>>>>
>>>>>>> https://github.com/lballabio/quantlib/pull/132
>>>>>>>
>>>>>>> I guess this is a useful building block for multithreaded monte carlo.
>>>>>>> Since for bigger p the dynamic creation takes a long time (it
>>>>>>> feels more like mining than computing ...), I precomputed 8 independent instances (i.e.
>>>>>>> for use in at most 8 parallel threads), for the "standard" value
>>>>>>> p = 19937 and word size 32, which one can instantiate with
>>>>>>>
>>>>>>> MersenneTwisterDynamicRng mt( mtdesc_0_8_19937[i] , seed_i );
>>>>>>>
>>>>>>> for i = 0, ... , 7.
>>>>>>>
>>>>>>> In addition the speed of random number generation seems a bit
>>>>>>> faster in the dcmt library than with the original ql twister. I
>>>>>>> observe running times scaled by a factor of 0.8 when generating 1E8 numbers.
>>>>>>>
>>>>>>> All this is of course experimental and not well tested, so any
>>>>>>> feedback and experiences are very welcome. I'd be very
>>>>>>> interested in your opinion on the dcmt library and applications in parallel monte carlo.
>>>>>>>
>>>>>>> Peter
>>>>>>>
>>>>>>>> On 20 October 2013 16:01, Joseph Wang <[hidden email]> wrote:
>>>>>>>> I've done some more parallelization with openmp and quantlib.
>>>>>>>> I've uploaded the changes to the
>>>>>>>> https://github.com/joequant/quantlib. The branch openmp has some changes that I've issued a pull-request for.
>>>>>>>> openmp-mcario has some changes that need some more work.
>>>>>>>>
>>>>>>>> I've gotten the MC to work by generating the paths in a
>>>>>>>> critical situation.
>>>>>>>> Calculating the prices once I have the path is multithreaded,
>>>>>>>> but right now I need to generate the paths in a single thread
>>>>>>>> to make sure that the same sequence is generated.
>>>>>>>>
>>>>>>>> The big issue right now is that there is a race condition in
>>>>>>>> the calculation of barrier options which is causing one
>>>>>>>> regression test to fail. The problem is that the random number
>>>>>>>> generator is being called in BarrierPathPricer, and since that
>>>>>>>> is run multithread, the sequence that is being pulled will
>>>>>>>> change from run to run based on whether other paths have pulled random numbers already.
>>>>>>>>
>>>>>>>> I think that fixing this is going to need some code
>>>>>>>> restructuring, but I'd like to get some thoughts as to how to
>>>>>>>> do this. Basically, the interface needs to be changed slightly
>>>>>>>> so that the random numbers are drawn in a fixed order, and that
>>>>>>>> might mean one call to get any additional random numbers in a
>>>>>>>> pricer, which gets called in a critical section, and another to run the pricer with the random numbers.

Peter Caspers-4

Re: Re: Re: Re: Re: Openmp work on mcarlo : Dynamic Creator MT

Hi Cheng,

I switched to a template class for precomputed twisters, which is
faster by a factor of 2 (450ms instead of 870ms). This can be
instantiated with

MersenneTwisterCustomRng<Mtdesc19937_5> mt(42);

with 5 replaceable by 0 to 7 as before. The other class,
MersenneTwisterDynamicRng, is now only needed if you want to create
an MT at runtime.
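
To illustrate why fixing the parameters at compile time can pay off, here is a small sketch. It is not the code from the pull request, and the thread does not confirm that this is the source of the speed-up; it is only one plausible mechanism. All names (TemperParams, temperRuntime, temperStatic, Mt19937Tempering) are made up for this example. It shows the tempering step of a 32-bit Mersenne Twister twice: once reading shifts and masks from a descriptor at runtime, and once taking them from a type, so that the compiler sees them as constants. The constants are the standard MT19937 tempering values.

```cpp
#include <cstdint>

// Hypothetical runtime descriptor; a real dcmt descriptor has more fields.
struct TemperParams {
    std::uint32_t maskB, maskC;
    int shift0, shiftB, shiftC, shift1;
};

// Runtime variant: shifts and masks are read from the descriptor on each call.
inline std::uint32_t temperRuntime(std::uint32_t x, const TemperParams& p) {
    x ^= x >> p.shift0;
    x ^= (x << p.shiftB) & p.maskB;
    x ^= (x << p.shiftC) & p.maskC;
    x ^= x >> p.shift1;
    return x;
}

// Compile-time description: the standard MT19937 tempering constants.
struct Mt19937Tempering {
    static constexpr std::uint32_t maskB = 0x9d2c5680u;
    static constexpr std::uint32_t maskC = 0xefc60000u;
    static constexpr int shift0 = 11, shiftB = 7, shiftC = 15, shift1 = 18;
};

// Template variant: the same arithmetic, but every parameter is a constant
// known to the compiler when the function is instantiated.
template <class D>
inline std::uint32_t temperStatic(std::uint32_t x) {
    x ^= x >> D::shift0;
    x ^= (x << D::shiftB) & D::maskB;
    x ^= (x << D::shiftC) & D::maskC;
    x ^= x >> D::shift1;
    return x;
}
```

Both variants return the same values for the same parameters; only the way the parameters reach the generated code differs.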

The pull request is updated accordingly.

Best regards
Peter

On 21 September 2014 08:11, cheng.li <[hidden email]> wrote:

> Hi Peter,
>
> Thanks for your hard work. I think our results are consistent.
>
> Regards,
> Cheng
>
> -----Original Message-----
> From: Peter Caspers [mailto:[hidden email]]
> Sent: 21 September 2014 0:33
> To: cheng li
> Cc: QuantLib Mailing Lists
> Subject: Re: Re: Re: Re: [Quantlib-dev] Openmp work on mcarlo : Dynamic Creator MT
>
> Hi Cheng,
>
> Sorry, this was my fault: I messed up the timings because I did not use consistent optimizer flags when compiling the library and the test program.
>
> On Windows (on the same machine on which I run Ubuntu, which doesn't really matter, because my office computer gives very similar
> timings) I actually get, for 1E8 random numbers generated (with O2),
>
> 400ms / 1100ms
>
> for the original QL MT / dynamic creator MT. The QL MT is just as fast as the boost MT implementation, by the way. On Ubuntu with gcc 4.8.1 and O3 I get
>
> 290ms / 870ms
>
> and with O2 a close value, 910ms for the creator MT. It also makes no difference whether I use gcc 4.9.1 or clang 3.6.0.
>
> If I call the original C routine directly, without the wrapper object, I get 720ms.
>
> If I use the original library and a C example (both compiled with O3, which is the configuration in which the library is shipped, since it has a hardcoded makefile), I get 730ms.
>
> This means that the wrapper introduces a slowdown of about 20%, which does not seem too bad.
>
> Otherwise the dcmt is slower than the original MT by a factor of around 2-3 in all cases. Since this is already the case with the original library, I wouldn't try to do anything about it at the moment.
>
> What is your opinion on this ?
>
> Peter

cheng li

Re: Re: Re: Re: Re: Openmp work on mcarlo : Dynamic Creator MT

Hi Peter,

Thanks for your effort. I'll definitely give it a try :)

Regards,
Cheng



Peter Caspers-4

Re: Re: Re: Re: Re: Re: Openmp work on mcarlo : Dynamic Creator MT

Yes, please. The slowdown on Windows on my office computer is now around a factor of 1.6.
Best regards
Peter

On 22 September 2014 03:48, cheng li <[hidden email]> wrote:

> Hi Peter,
>
> Thanks for your effort. I'll definitely give it a try :)
>
> Regards,
> Cheng
>
> -----Original Message-----
> From: Peter Caspers [mailto:[hidden email]]
> Sent: 21 September 2014 23:11
> To: cheng.li
> Cc: QuantLib Mailing Lists
> Subject: Re: Re: Re: Re: Re: [Quantlib-dev] Openmp work on mcarlo : Dynamic Creator MT
>
> Hi Cheng,
>
> I switched to a template class for precomputed twisters, which is faster by a factor of 2 (450ms instead of 870ms). This can be instantiated with
>
> MersenneTwisterCustomRng<Mtdesc19937_5> mt(42);
>
> with 5 replaceable by 0 to 7 as before. The other class, MersenneTwisterDynamicRng, is now only needed if you want to create an MT at runtime.
>
> The pull request is updated accordingly.
>
> Best regards
> Peter
>
>
>
>
> On 21 September 2014 08:11, cheng.li <[hidden email]> wrote:
>> Hi Peter,
>>
>> Thanks for your hard work. I think our results are consistent.
>>
>> Regards,
>> Cheng
>>
>> -----Original Message-----
>> From: Peter Caspers [mailto:[hidden email]]
>> Sent: 21 September 2014 0:33
>> To: cheng li
>> Cc: QuantLib Mailing Lists
>> Subject: Re: Re: Re: Re: [Quantlib-dev] Openmp work on mcarlo : Dynamic Creator MT
>>
>> Hi Cheng,
>>
>> Sorry, this was my fault: I messed up the timings because I did not use consistent optimizer flags when compiling the library and the test program.
>>
>> On Windows (on the same machine on which I run Ubuntu, which doesn't
>> really matter, because my office computer gives very similar
>> timings) I actually get, for 1E8 random numbers generated (with O2),
>>
>> 400ms / 1100ms
>>
>> for the original QL MT / dynamic creator MT. The QL MT is just as fast
>> as the boost MT implementation, by the way. On Ubuntu with gcc 4.8.1
>> and O3 I get
>>
>> 290ms / 870ms
>>
>> and with O2 a close value, 910ms for the creator MT. It also makes no difference whether I use gcc 4.9.1 or clang 3.6.0.
>>
>> If I call the original C routine directly, without the wrapper object, I get 720ms.
>>
>> If I use the original library and a C example (both compiled with O3, which is the configuration in which the library is shipped, since it has a hardcoded makefile), I get 730ms.
>>
>> This means that the wrapper introduces a slowdown of about 20%, which does not seem too bad.
>>
>> Otherwise the dcmt is slower than the original MT by a factor of around 2-3 in all cases. Since this is already the case with the original library, I wouldn't try to do anything about it at the moment.
>>
>> What is your opinion on this ?
>>
>> Peter
>>
>> I compared different platforms again, but now on the _same_ machine - Original MT / Dynamic Creator MT (generation of 1E8 numbers, single threaded, with O2 (MSVC) and O3 (gcc, clang)). I also checked the boost implementation mt19937, which is very close to the ql original mt in all cases.
>>
>> Windows / MSVC 2010 => 400 ms / 1100 ms
>> Ubuntu / gcc 4.9.1 => 1200 ms / 1050 ms
>> Ubuntu / gcc 4.8.1 => 1180 ms / 1040 ms
>> Ubuntu / clang 3.6.0 => 1340 ms / 1150 ms
>>
>> clang
>> 290
>> 720
>> 870
>>
>> (c 730)
>>
>> so it looks like MSVC does a specific optimization for the QL and boost mt19937, which does not apply on the other platforms and not to the dynamic creator mt.
>>
>> At the moment I still don't know what it is.
>>
>> On 18 September 2014 03:33, cheng li <[hidden email]> wrote:
>>> Let me try your statement once I have time.
>>>
>>> Regards,
>>> Cheng
>>>
>>> -----Original Message-----
>>> From: cheng li [mailto:[hidden email]]
>>> Sent: 18 September 2014 09:18
>>> To: 'Peter Caspers'
>>> Cc: 'QuantLib Mailing Lists'
>>> Subject: Re: Re: Re: [Quantlib-dev] Openmp work on mcarlo : Dynamic
>>> Creator MT
>>>
>>> Hi Peter,
>>>
>>> I used gcc 4.8.2.
>>>
>>> My result with O3 optimization is still not good. The new MT performs
>>> similarly to before (about a 3~4X slowdown).
>>>
>>> I used this statement to turn on -O3 optimization before running
>>> ./configure for QuantLib:
>>>
>>> export CXXFLAGS="-g -O3"
>>>
>>> Am I right?
>>>
>>> Regards,
>>> Cheng
>>>
>>> -----Original Message-----
>>> From: Peter Caspers [mailto:[hidden email]]
>>> Sent: 18 September 2014 00:36
>>> To: cheng li
>>> Cc: QuantLib Mailing Lists
>>> Subject: Re: Re: Re: [Quantlib-dev] Openmp work on mcarlo : Dynamic
>>> Creator MT
>>>
>>> with gcc 4.9.1 and O2 the new mt is a bit slower than the original one (but only by a factor of 1.1).
>>> I have to add both -frename-registers, -finline-functions to -O2 to get the speed up back I mentioned before.
>>>
>>> Which compiler do you use on Ubuntu ?
>>>
>>> Peter
>>>
>>>
>>>
>>> On 17 September 2014 03:26, cheng li <[hidden email]> wrote:
>>>> Thanks Peter. I tested on Ubuntu as well; it is about 3~4X slower with -O2 optimization.
>>>>
>>>> I'll try -O3 on my machine also with Ubuntu.
>>>>
>>>> Regards,
>>>> Cheng
>>>>
>>>> -----Original Message-----
>>>> From: Peter Caspers [mailto:[hidden email]]
>>>> Sent: 17 September 2014 00:32
>>>> To: Cheng Li; QuantLib Mailing Lists
>>>> Subject: Re: Re: [Quantlib-dev] Openmp work on mcarlo : Dynamic Creator
>>>> MT
>>>>
>>>> Hi Cheng,
>>>>
>>>> indeed with msvc I get a slow down with a factor of ~2.8x. As I said, under gcc it is a speed up ~ 0.8x (with -O3).
>>>>
>>>> Does anyone have an idea where the different behaviour under gcc /
>>>> linux and msvc might come from (and how to improve the msvc side if
>>>> possible) ?
>>>>
>>>> Kind regards
>>>> Peter
>>>>
>>>>
>>>>
>>>> On 13 September 2014 08:27, Cheng Li <[hidden email]> wrote:
>>>>> Thanks Peter.
>>>>>
>>>>> Regards,
>>>>> Cheng
>>>>>
>>>>> Sent from my iPad
>>>>>
>>>>>> On 13 September 2014, at 13:29, Peter Caspers <[hidden email]> wrote:
>>>>>>
>>>>>> I will have a look on Monday (I have a Windows machine at work)
>>>>>> and see how it works there
>>>>>>
>>>>>> Thanks
>>>>>> Peter
>>>>>>
>>>>>> Sent from my iPhone
>>>>>>
>>>>>>> On 13 September 2014, at 04:41, Cheng Li <[hidden email]> wrote:
>>>>>>>
>>>>>>> I am on Win7 x64bit, using vs 2012 with quantlib 1.4 and boost 1.55
>>>>>>> under release mode
>>>>>>>
>>>>>>> Sent from my iPad
>>>>>>>
>>>>>>>> On 13 September 2014, at 0:08, Peter Caspers <[hidden email]> wrote:
>>>>>>>>
>>>>>>>> Hi Cheng,
>>>>>>>>
>>>>>>>> no, I get better timings with the dcmt implementation, e.g. for
>>>>>>>> 1E8 numbers
>>>>>>>>
>>>>>>>> dcmt 0.982s
>>>>>>>> quantlib 1.159s
>>>>>>>>
>>>>>>>> on my computer. Can you post your platform and compiler
>>>>>>>> settings, so that I can try to reproduce ?
>>>>>>>>
>>>>>>>> Thanks
>>>>>>>> Peter
>>>>>>>>
>>>>>>>>> On 12 September 2014 05:29, cheng li <[hidden email]> wrote:
>>>>>>>>> Hi Peter,
>>>>>>>>>
>>>>>>>>> I have used your dcmt wrapper and tested it with the following
>>>>>>>>> code. It seems dcmt in a single thread is 4X slower than the
>>>>>>>>> original QL MT. Is this consistent with your results?
>>>>>>>>>
>>>>>>>>> #include <ql/quantlib.hpp>
>>>>>>>>> #include <boost/timer.hpp>
>>>>>>>>> #include <iostream>
>>>>>>>>>
>>>>>>>>> using namespace QuantLib;
>>>>>>>>> using namespace std;
>>>>>>>>>
>>>>>>>>> int main() {
>>>>>>>>>
>>>>>>>>>     // number of random numbers to draw from each generator
>>>>>>>>>     Size samples;
>>>>>>>>>     cin >> samples;
>>>>>>>>>     boost::timer myTimer;
>>>>>>>>>
>>>>>>>>>     // original QuantLib Mersenne Twister
>>>>>>>>>     MersenneTwisterUniformRng originalMT;
>>>>>>>>>     for (Size i = 0; i < samples; ++i)
>>>>>>>>>         originalMT.next();
>>>>>>>>>
>>>>>>>>>     cout << myTimer.elapsed() << endl;
>>>>>>>>>
>>>>>>>>>     myTimer.restart();
>>>>>>>>>
>>>>>>>>>     // dcmt wrapper, using the precomputed instance no. 5
>>>>>>>>>     MersenneTwisterDynamicRng mt(mtdesc_0_8_19937[5], 1);
>>>>>>>>>
>>>>>>>>>     for (Size i = 0; i < samples; ++i) {
>>>>>>>>>         mt.next();
>>>>>>>>>     }
>>>>>>>>>
>>>>>>>>>     cout << myTimer.elapsed() << endl;
>>>>>>>>>
>>>>>>>>>     // wait for input so that the console window stays open
>>>>>>>>>     int n;
>>>>>>>>>     std::cin >> n;
>>>>>>>>>     return 0;
>>>>>>>>> }
>>>>>>>>>
>>>>>>>>> Regards,
>>>>>>>>> Cheng
>>>>>>>>>
>>>>>>>>> -----Original Message-----
>>>>>>>>> From: Peter Caspers [mailto:[hidden email]]
>>>>>>>>> Sent: 6 September 2014 20:48
>>>>>>>>> To: Joseph Wang
>>>>>>>>> Cc: QuantLib Mailing Lists
>>>>>>>>> Subject: Re: [Quantlib-dev] Openmp work on mcarlo : Dynamic Creator
>>>>>>>>> MT
>>>>>>>>>
>>>>>>>>> Hi Joseph, all,
>>>>>>>>>
>>>>>>>>> I added a wrapper for the dcmt library (Dynamic Creator of
>>>>>>>>> Mersenne Twisters).
>>>>>>>>>
>>>>>>>>> https://github.com/lballabio/quantlib/pull/132
>>>>>>>>>
>>>>>>>>> I guess this is a useful building block for multithreaded monte carlo.
>>>>>>>>> Since for bigger p the dynamic creation takes a long time (it
>>>>>>>>> feels more like mining than computing ...), I precomputed 8 independent instances (i.e.
>>>>>>>>> for use in at most 8 parallel threads), for the "standard"
>>>>>>>>> value p = 19937 and word size 32, which one can instantiate
>>>>>>>>> with
>>>>>>>>>
>>>>>>>>> MersenneTwisterDynamicRng mt( mtdesc_0_8_19937[i] , seed_i );
>>>>>>>>>
>>>>>>>>> for i = 0, ... , 7.
>>>>>>>>>
>>>>>>>>> In addition the speed of random number generation seems a bit
>>>>>>>>> faster in the dcmt library than with the original ql twister. I
>>>>>>>>> observe running times scaled by a factor of 0.8 when generating 1E8 numbers.
>>>>>>>>>
>>>>>>>>> All this is of course experimental and not well tested, so any
>>>>>>>>> feedback and experiences are very welcome. I'd be very
>>>>>>>>> interested in your opinion on the dcmt library and applications in parallel monte carlo.
>>>>>>>>>
>>>>>>>>> Peter
>>>>>>>>>
>>>>>>>>>> On 20 October 2013 16:01, Joseph Wang <[hidden email]> wrote:
>>>>>>>>>> I've done some more parallelization with openmp and quantlib.
>>>>>>>>>> I've uploaded the changes to the
>>>>>>>>>> https://github.com/joequant/quantlib. The branch openmp has some changes that I've issued a pull-request for.
>>>>>>>>>> openmp-mcario has some changes that need some more work.
>>>>>>>>>>
>>>>>>>>>> I've gotten the MC to work by generating the paths in a critical situation.
>>>>>>>>>> Calculating the prices once I have the path is multithreaded,
>>>>>>>>>> but right now I need to generate the paths in a single thread
>>>>>>>>>> to make sure that the same sequence is generated.
>>>>>>>>>>
>>>>>>>>>> The big issue right now is that there is a race condition in
>>>>>>>>>> the calculation of barrier options which is causing one
>>>>>>>>>> regression test to fail. The problem is that the random
>>>>>>>>>> number generator is being called in BarrierPathPricer, and
>>>>>>>>>> since that is run multithread, the sequence that is being
>>>>>>>>>> pulled will change from run to run based on whether other paths have pulled random numbers already.
>>>>>>>>>>
>>>>>>>>>> I think that fixing this is going to need some code
>>>>>>>>>> restructuring, but I'd like to get some thoughts as to how to
>>>>>>>>>> do this. Basically, the interface needs to be changed
>>>>>>>>>> slightly so that the random numbers are drawn in a fixed
>>>>>>>>>> order, and that might mean one call to get any additional
>>>>>>>>>> random numbers in a pricer, which gets called in a critical section, and another to run the pricer with the random numbers.

cheng li

Re: Re: Re: Re: Re: Re: Openmp work on mcarlo : Dynamic Creator MT

Hi Peter,

On my side the performance has also improved: the slowdown is now around 2.5x. Thanks for your help.

Regards,
Cheng
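
The slowdown factors quoted in this thread are ratios of wall-clock times for drawing the same number of values from two generators. A minimal sketch of such a measurement follows; it is not the test program used above. `secondsToDraw` and `slowdownFactor` are hypothetical helper names, and any standard-library engine (e.g. `std::mt19937`) is only a stand-in for the QuantLib and dcmt generators. The generator is constructed outside the timed region, and both the library and the test program should be built with the same optimizer flags, since inconsistent flags were the source of the conflicting numbers earlier in the thread.

```cpp
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <random>

// Time how long it takes to draw `samples` values from `rng`, in seconds.
// The drawn values are XOR-ed into `sink` so that the optimizer cannot
// discard the loop as dead code.
// (Hypothetical helper; Rng is any callable engine such as std::mt19937.)
template <class Rng>
double secondsToDraw(Rng& rng, std::size_t samples, std::uint64_t& sink) {
    const auto start = std::chrono::steady_clock::now();
    std::uint64_t acc = 0;
    for (std::size_t i = 0; i < samples; ++i)
        acc ^= static_cast<std::uint64_t>(rng());
    const auto stop = std::chrono::steady_clock::now();
    sink ^= acc;
    return std::chrono::duration<double>(stop - start).count();
}

// Slowdown of a candidate generator relative to a baseline generator;
// a value above 1 means the candidate is slower.
inline double slowdownFactor(double baselineSeconds, double candidateSeconds) {
    return candidateSeconds / baselineSeconds;
}
```

For example, the Windows timings reported earlier (400ms / 1100ms) correspond to `slowdownFactor(400.0, 1100.0)`, i.e. a factor of 2.75.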


Peter Caspers-4

Re: Re: Re: Re: Re: Re: Re: Openmp work on mcarlo : Dynamic Creator MT

Hi,

I think I could further improve the performance of the precomputed
twisters (i.e. the ones constructed as

MersenneTwisterCustomRng<Mtdesc19937_5> mt(42);

). Now they seem to be just as fast as the original one (I only tested
on Linux). The PR is updated.

Cheng, would you maybe like to double check ?

Thanks a lot
Peter
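
One plausible reason why a template class can be faster than a generator configured at run time is that the twister parameters become compile-time constants, which the compiler can fold into the generated instructions. The sketch below illustrates this for the tempering step only. It is not the code from the pull request: `TemperParams`, `Mt19937Temper`, `temperRuntime` and `temperStatic` are hypothetical names, and the constants are the standard MT19937 tempering parameters rather than those of a dcmt-generated twister.

```cpp
#include <cstdint>

// Tempering parameters held at run time (hypothetical layout).
struct TemperParams {
    std::uint32_t maskB, maskC;
    int shift0, shiftB, shiftC, shift1;
};

// Run-time version: masks and shift counts are read from the struct
// on every call.
inline std::uint32_t temperRuntime(std::uint32_t y, const TemperParams& p) {
    y ^= y >> p.shift0;
    y ^= (y << p.shiftB) & p.maskB;
    y ^= (y << p.shiftC) & p.maskC;
    y ^= y >> p.shift1;
    return y;
}

// Descriptor type carrying the same parameters as compile-time constants.
// The values are the standard MT19937 tempering parameters.
struct Mt19937Temper {
    static constexpr std::uint32_t maskB = 0x9d2c5680u;
    static constexpr std::uint32_t maskC = 0xefc60000u;
    static constexpr int shift0 = 11;
    static constexpr int shiftB = 7;
    static constexpr int shiftC = 15;
    static constexpr int shift1 = 18;
};

// Compile-time version: all masks and shift counts are constants known
// to the compiler when the template is instantiated.
template <class Desc>
inline std::uint32_t temperStatic(std::uint32_t y) {
    y ^= y >> Desc::shift0;
    y ^= (y << Desc::shiftB) & Desc::maskB;
    y ^= (y << Desc::shiftC) & Desc::maskC;
    y ^= y >> Desc::shift1;
    return y;
}
```

Both versions compute the same value for the same parameters; whether the compile-time variant is measurably faster depends on the compiler and flags, so it should be timed rather than assumed.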

On 23 September 2014 03:50, cheng li <[hidden email]> wrote:

> Hi Peter,
>
> On my side the performance has also improved: the slowdown is now around 2.5x. Thanks for your help.
>
> Regards,
> Cheng
>
> -----Original Message-----
> From: Peter Caspers [mailto:[hidden email]]
> Sent: 22 September 2014 16:05
> To: cheng li
> Cc: QuantLib Mailing Lists
> Subject: Re: Re: Re: Re: Re: Re: [Quantlib-dev] Openmp work on mcarlo : Dynamic Creator MT
>
> yes, please. The slowdown on Windows on my office computer is around 1.6 now.
> best regards
> Peter
>
> On 22 September 2014 03:48, cheng li <[hidden email]> wrote:
>> Hi Peter,
>>
>> Thanks for your effort. I'll definitely have a try:)
>>
>> Regards,
>> Cheng
>>
>> -----Original Message-----
>> From: Peter Caspers [mailto:[hidden email]]
>> Sent: 21 September 2014 23:11
>> To: cheng.li
>> Cc: QuantLib Mailing Lists
>> Subject: Re: Re: Re: Re: Re: [Quantlib-dev] Openmp work on mcarlo : Dynamic
>> Creator MT
>>
>> Hi Cheng,
>>
>> I switched to a template class for precomputed twisters, which is
>> faster by a factor of 2 (450ms instead of 870ms). This can be
>> instantiated with
>>
>> MersenneTwisterCustomRng<Mtdesc19937_5> mt(42);
>>
>> with 5 replaceable by 0 to 7 as before. The other is only needed now if you want to create a mt during runtime.
>>
>> The pull request is updated accordingly.
>>
>> Best regards
>> Peter
>>
>>
>>
>>
>> On 21 September 2014 08:11, cheng.li <[hidden email]> wrote:
>>> Hi Peter,
>>>
>>> Thanks for your hard work. I think our results are consistent.
>>>
>>> Regards,
>>> Cheng
>>>
>>> -----Original Message-----
>>> From: Peter Caspers [mailto:[hidden email]]
>>> Sent: 21 September 2014 00:33
>>> To: cheng li
>>> Cc: QuantLib Mailing Lists
>>> Subject: Re: Re: Re: Re: [Quantlib-dev] Openmp work on mcarlo : Dynamic
>>> Creator MT
>>>
>>> Hi Cheng,
>>>
>>> sorry, this was my fault, I messed up the timings, because I did not use consistent optimizer flags when compiling the library and the test program.
>>>
>>> Actually on Windows (same machine on which I run Ubuntu, which
>>> doesn't really matter, because my computer in office gives very
>>> similar
>>> timings) I get for 1E8 random numbers generated (with O2)
>>>
>>> 400ms / 1100ms
>>>
>>> for the original ql mt / dynamic creator mt. The ql mt is just as
>>> fast as the boost mt implementation by the way. On Ubuntu with gcc
>>> 4.8.1 and O3 I get
>>>
>>> 290ms / 870ms
>>>
>>> and with O2 a close value, for the creator mt 910ms. Also it makes no difference if I use gcc 4.9.1 or clang 3.6.0.
>>>
>>> If I directly call the original C routine without using the wrapper object, I get 720ms.
>>>
>>> If I use the original library and a C example (both compiled with O3, this is the configuration how the library is shipped (it has a hardcoded make file)) => 730ms.
>>>
>>> This means, the wrapper introduces a slow down by 20% which seems not too bad.
>>>
>>> Otherwise the dcmt is slower by a factor of around 2-3 compared to the original mt in all cases. Since this is already the case with the original library, I wouldn't try to do anything about it at the moment.
>>>
>>> What is your opinion on this ?
>>>
>>> Peter
>>>
>>> I compared different platforms again, but now on the _same_ machine - Original MT / Dynamic Creator MT (generation of 1E8 numbers, single threaded, with O2 (MSVC) and O3 (gcc, clang)). I also checked the boost implementation mt19937, which is very close to the ql original mt in all cases.
>>>
>>> Windows / MSVC 2010 => 400 ms / 1100 ms
>>> Ubuntu / gcc 4.9.1 => 1200 ms / 1050 ms
>>> Ubuntu / gcc 4.8.1 => 1180 ms / 1040 ms
>>> Ubuntu / clang 3.6.0 => 1340 ms / 1150 ms
>>>
>>> clang
>>> 290
>>> 720
>>> 870
>>>
>>> (c 730)
>>>
>>> so it looks like MSVC does a specific optimization for the QL and boost mt19937, which does not apply on the other platforms and not to the dynamic creator mt.
>>>
>>> At the moment I still don't know what it is.
>>>
>>> On 18 September 2014 03:33, cheng li <[hidden email]> wrote:
>>>> Let me try your statement once I have time.
>>>>
>>>> Regards,
>>>> Cheng
>>>>
>>>> -----Original Message-----
>>>> From: cheng li [mailto:[hidden email]]
>>>> Sent: 18 September 2014 09:18
>>>> To: 'Peter Caspers'
>>>> Cc: 'QuantLib Mailing Lists'
>>>> Subject: Re: Re: Re: [Quantlib-dev] Openmp work on mcarlo : Dynamic
>>>> Creator MT
>>>>
>>>> Hi Peter,
>>>>
>>>> I used gcc 4.8.2.
>>>>
>>>> My result with O3 optimization is still not good. The new MT performs
>>>> similarly to before (about a 3~4X slowdown).
>>>>
>>>> I used this statement to turn on -O3 optimization before running
>>>> ./configure for QuantLib:
>>>>
>>>> export CXXFLAGS="-g -O3"
>>>>
>>>> Am I right?
>>>>
>>>> Regards,
>>>> Cheng
>>>>
>>>> -----Original Message-----
>>>> From: Peter Caspers [mailto:[hidden email]]
>>>> Sent: 18 September 2014 00:36
>>>> To: cheng li
>>>> Cc: QuantLib Mailing Lists
>>>> Subject: Re: Re: Re: [Quantlib-dev] Openmp work on mcarlo : Dynamic
>>>> Creator MT
>>>>
>>>> with gcc 4.9.1 and O2 the new mt is a bit slower than the original one (but only by a factor of 1.1).
>>>> I have to add both -frename-registers, -finline-functions to -O2 to get the speed up back I mentioned before.
>>>>
>>>> Which compiler do you use on Ubuntu ?
>>>>
>>>> Peter
>>>>
>>>>
>>>>
>>>> On 17 September 2014 03:26, cheng li <[hidden email]> wrote:
>>>>> Thanks Peter. I tested on Ubuntu as well; it is about 3~4X slower with -O2 optimization.
>>>>>
>>>>> I'll try -O3 on my machine also with Ubuntu.
>>>>>
>>>>> Regards,
>>>>> Cheng
>>>>>
>>>>> -----Original Message-----
>>>>> From: Peter Caspers [mailto:[hidden email]]
>>>>> Sent: 17 September 2014 00:32
>>>>> To: Cheng Li; QuantLib Mailing Lists
>>>>> Subject: Re: Re: [Quantlib-dev] Openmp work on mcarlo : Dynamic Creator
>>>>> MT
>>>>>
>>>>> Hi Cheng,
>>>>>
>>>>> indeed with msvc I get a slow down with a factor of ~2.8x. As I said, under gcc it is a speed up ~ 0.8x (with -O3).
>>>>>
>>>>> Does anyone have an idea where the different behaviour under gcc /
>>>>> linux and msvc might come from (and how to improve the msvc side if
>>>>> possible) ?
>>>>>
>>>>> Kind regards
>>>>> Peter
>>>>>
>>>>>
>>>>>
>>>>> On 13 September 2014 08:27, Cheng Li <[hidden email]> wrote:
>>>>>> Thanks Peter.
>>>>>>
>>>>>> Regards,
>>>>>> Cheng
>>>>>>
>>>>>> Sent from my iPad
>>>>>>
>>>>>>> On 13 September 2014, at 13:29, Peter Caspers <[hidden email]> wrote:
>>>>>>>
>>>>>>> I will have a look on Monday (I have a Windows machine at work)
>>>>>>> and see how it works there
>>>>>>>
>>>>>>> Thanks
>>>>>>> Peter
>>>>>>>
>>>>>>> Sent from my iPhone
>>>>>>>
>>>>>>>> On 13 September 2014, at 04:41, Cheng Li <[hidden email]> wrote:
>>>>>>>>
>>>>>>>> I am on Win7 x64bit, using vs 2012 with quantlib 1.4 and boost 1.55
>>>>>>>> under release mode
>>>>>>>>
>>>>>>>> Sent from my iPad
>>>>>>>>
>>>>>>>>> On 13 September 2014, at 0:08, Peter Caspers <[hidden email]> wrote:
>>>>>>>>>
>>>>>>>>> Hi Cheng,
>>>>>>>>>
>>>>>>>>> no, I get better timings with the dcmt implementation, e.g. for
>>>>>>>>> 1E8 numbers
>>>>>>>>>
>>>>>>>>> dcmt 0.982s
>>>>>>>>> quantlib 1.159s
>>>>>>>>>
>>>>>>>>> on my computer. Can you post your platform and compiler
>>>>>>>>> settings, so that I can try to reproduce ?
>>>>>>>>>
>>>>>>>>> Thanks
>>>>>>>>> Peter
>>>>>>>>>
>>>>>>>>>> On 12 September 2014 05:29, cheng li <[hidden email]> wrote:
>>>>>>>>>> Hi Peter,
>>>>>>>>>>
>>>>>>>>>> I have used your dcmt wrapper and tested it with the following
>>>>>>>>>> code. It seems dcmt in a single thread is 4X slower than the
>>>>>>>>>> original QL MT. Is this consistent with your results?
>>>>>>>>>>
>>>>>>>>>> #include <ql/quantlib.hpp>
>>>>>>>>>> #include <boost/timer.hpp>
>>>>>>>>>> #include <iostream>
>>>>>>>>>>
>>>>>>>>>> using namespace QuantLib;
>>>>>>>>>> using namespace std;
>>>>>>>>>>
>>>>>>>>>> int main() {
>>>>>>>>>>
>>>>>>>>>>     // number of random numbers to draw from each generator
>>>>>>>>>>     Size samples;
>>>>>>>>>>     cin >> samples;
>>>>>>>>>>     boost::timer myTimer;
>>>>>>>>>>
>>>>>>>>>>     // original QuantLib Mersenne Twister
>>>>>>>>>>     MersenneTwisterUniformRng originalMT;
>>>>>>>>>>     for (Size i = 0; i < samples; ++i)
>>>>>>>>>>         originalMT.next();
>>>>>>>>>>
>>>>>>>>>>     cout << myTimer.elapsed() << endl;
>>>>>>>>>>
>>>>>>>>>>     myTimer.restart();
>>>>>>>>>>
>>>>>>>>>>     // dcmt wrapper, using the precomputed instance no. 5
>>>>>>>>>>     MersenneTwisterDynamicRng mt(mtdesc_0_8_19937[5], 1);
>>>>>>>>>>
>>>>>>>>>>     for (Size i = 0; i < samples; ++i) {
>>>>>>>>>>         mt.next();
>>>>>>>>>>     }
>>>>>>>>>>
>>>>>>>>>>     cout << myTimer.elapsed() << endl;
>>>>>>>>>>
>>>>>>>>>>     // wait for input so that the console window stays open
>>>>>>>>>>     int n;
>>>>>>>>>>     std::cin >> n;
>>>>>>>>>>     return 0;
>>>>>>>>>> }
>>>>>>>>>>
>>>>>>>>>> Regards,
>>>>>>>>>> Cheng
>>>>>>>>>>
>>>>>>>>>> -----Original Message-----
>>>>>>>>>> From: Peter Caspers [mailto:[hidden email]]
>>>>>>>>>> Sent: 6 September 2014 20:48
>>>>>>>>>> To: Joseph Wang
>>>>>>>>>> Cc: QuantLib Mailing Lists
>>>>>>>>>> Subject: Re: [Quantlib-dev] Openmp work on mcarlo : Dynamic Creator
>>>>>>>>>> MT
>>>>>>>>>>
>>>>>>>>>> Hi Joseph, all,
>>>>>>>>>>
>>>>>>>>>> I added a wrapper for the dcmt library (Dynamic Creator of
>>>>>>>>>> Mersenne Twisters).
>>>>>>>>>>
>>>>>>>>>> https://github.com/lballabio/quantlib/pull/132
>>>>>>>>>>
>>>>>>>>>> I guess this is a useful building block for multithreaded monte carlo.
>>>>>>>>>> Since for bigger p the dynamic creation takes a long time (it
>>>>>>>>>> feels more like mining than computing ...), I precomputed 8 independent instances (i.e.
>>>>>>>>>> for use in at most 8 parallel threads), for the "standard"
>>>>>>>>>> value p = 19937 and word size 32, which one can instantiate
>>>>>>>>>> with
>>>>>>>>>>
>>>>>>>>>> MersenneTwisterDynamicRng mt( mtdesc_0_8_19937[i] , seed_i );
>>>>>>>>>>
>>>>>>>>>> for i = 0, ... , 7.
>>>>>>>>>>
>>>>>>>>>> In addition the speed of random number generation seems a bit
>>>>>>>>>> faster in the dcmt library than with the original ql twister.
>>>>>>>>>> I observe running times scaled by a factor of 0.8 when generating 1E8 numbers.
>>>>>>>>>>
>>>>>>>>>> All this is of course experimental and not well tested, so any
>>>>>>>>>> feedback and experiences are very welcome. I'd be very
>>>>>>>>>> interested in your opinion on the dcmt library and applications in parallel monte carlo.
>>>>>>>>>>
>>>>>>>>>> Peter
>>>>>>>>>>
>>>>>>>>>>> On 20 October 2013 16:01, Joseph Wang <[hidden email]> wrote:
>>>>>>>>>>> I've done some more parallelization with openmp and quantlib.
>>>>>>>>>>> I've uploaded the changes to the
>>>>>>>>>>> https://github.com/joequant/quantlib. The branch openmp has some changes that I've issued a pull-request for.
>>>>>>>>>>> openmp-mcario has some changes that need some more work.
>>>>>>>>>>>
>>>>>>>>>>> I've gotten the MC to work by generating the paths in a
>>>>>>>>>>> critical section.
>>>>>>>>>>> Calculating the prices once I have the path is multithreaded,
>>>>>>>>>>> but right now I need to generate the paths in a single thread
>>>>>>>>>>> to make sure that the same sequence is generated.
>>>>>>>>>>>
>>>>>>>>>>> The big issue right now is that there is a race condition in
>>>>>>>>>>> the calculation of barrier options which is causing one
>>>>>>>>>>> regression test to fail. The problem is that the random
>>>>>>>>>>> number generator is being called in BarrierPathPricer, and
>>>>>>>>>>> since that is run multithread, the sequence that is being
>>>>>>>>>>> pulled will change from run to run based on whether other paths have pulled random numbers already.
>>>>>>>>>>>
>>>>>>>>>>> I think that fixing this is going to need some code
>>>>>>>>>>> restructuring, but I'd like to get some thoughts as to how to
>>>>>>>>>>> do this. Basically, the interface needs to be changed
>>>>>>>>>>> slightly so that the random numbers are drawn in a fixed
>>>>>>>>>>> order, and that might mean one call to get any additional
>>>>>>>>>>> random numbers in a pricer, which gets called in a critical section, and another to run the pricer with the random numbers.
>>>>>>>>>>>

------------------------------------------------------------------------------
_______________________________________________
QuantLib-dev mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/quantlib-dev

cheng li

Re: Re: Re: Re: Re: Re: Re: Openmp work on mcarlo : Dynamic Creator MT

Will do, my pleasure:)

Regards,
Cheng

-----Original Message-----
From: Peter Caspers [mailto:[hidden email]]
Sent: 27 October 2014 3:49
To: cheng li
Cc: QuantLib Mailing Lists
Subject: Re: Re: Re: Re: Re: Re: Re: [Quantlib-dev] Openmp work on mcarlo : Dynamic Creator MT

Hi,

I think I could further improve the performance of the precomputed twisters (i.e. the ones constructed as

MersenneTwisterCustomRng<Mtdesc19937_5> mt(42);

). Now they seem to be just as fast as the original one (I only tested on Linux). The PR is updated.
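
For readers wondering why a template parameter can make such a difference: with a template, the twister's parameters are compile-time constants, so the compiler can fold shifts and masks directly into the generated code instead of loading them from a descriptor on every call. The sketch below only illustrates that idea on the tempering step. It is not the code from the pull request; the names `ExampleRuntimeDesc`, `ExampleStaticDesc`, `temperRuntime` and `temperStatic` are made up for illustration, and the constants are the well-known tempering parameters of the standard MT19937, not one of the precomputed dcmt descriptors.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical runtime descriptor: the parameters live in memory and are
// read on every call.
struct ExampleRuntimeDesc {
    std::uint32_t shift0, shiftB, shiftC, shift1;
    std::uint32_t maskB, maskC;
};

inline std::uint32_t temperRuntime(std::uint32_t y, const ExampleRuntimeDesc& d) {
    assert(d.shift0 < 32 && d.shiftB < 32 && d.shiftC < 32 && d.shift1 < 32);
    y ^= y >> d.shift0;
    y ^= (y << d.shiftB) & d.maskB;
    y ^= (y << d.shiftC) & d.maskC;
    y ^= y >> d.shift1;
    return y;
}

// Hypothetical compile-time descriptor: the same parameters as constants.
// Values are the standard MT19937 tempering parameters (sample data only).
struct ExampleStaticDesc {
    static constexpr std::uint32_t shift0 = 11, shiftB = 7, shiftC = 15, shift1 = 18;
    static constexpr std::uint32_t maskB = 0x9D2C5680u, maskC = 0xEFC60000u;
};

// Same tempering, but every parameter is known to the compiler.
template <class Desc>
inline std::uint32_t temperStatic(std::uint32_t y) {
    y ^= y >> Desc::shift0;
    y ^= (y << Desc::shiftB) & Desc::maskB;
    y ^= (y << Desc::shiftC) & Desc::maskC;
    y ^= y >> Desc::shift1;
    return y;
}
```

Both functions compute the same value for the same parameters; only the point at which the parameters become known differs. Whether this alone explains the measured speed-up is not something the sketch can show.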

Cheng, would you maybe like to double check ?

Thanks a lot
Peter

On 23 September 2014 03:50, cheng li <[hidden email]> wrote:

> Hi Peter,
>
> On my side the performance has also improved. The slowdown is now around a factor of 2.5. Thanks for your help.
>
> Regards,
> Cheng
>
> -----Original Message-----
> From: Peter Caspers [mailto:[hidden email]]
> Sent: 22 September 2014 16:05
> To: cheng li
> Cc: QuantLib Mailing Lists
> Subject: Re: Re: Re: Re: Re: Re: [Quantlib-dev] Openmp work on mcarlo :
> Dynamic Creator MT
>
> yes, please. The slowdown on Windows on my office computer is around 1.6 now.
> best regards
> Peter
>
> On 22 September 2014 03:48, cheng li <[hidden email]> wrote:
>> Hi Peter,
>>
>> Thanks for your effort. I'll definitely have a try:)
>>
>> Regards,
>> Cheng
>>
>> -----Original Message-----
>> From: Peter Caspers [mailto:[hidden email]]
>> Sent: 21 September 2014 23:11
>> To: cheng.li
>> Cc: QuantLib Mailing Lists
>> Subject: Re: Re: Re: Re: Re: [Quantlib-dev] Openmp work on mcarlo :
>> Dynamic Creator MT
>>
>> Hi Cheng,
>>
>> I switched to a template class for precomputed twisters, which is
>> faster by a factor of 2 (450ms instead of 870ms). This can be
>> instantiated with
>>
>> MersenneTwisterCustomRng<Mtdesc19937_5> mt(42);
>>
>> with 5 replaceable by 0 to 7 as before. The other is only needed now if you want to create a mt during runtime.
>>
>> The pull request is updated accordingly.
>>
>> Best regards
>> Peter
>>
>>
>>
>>
>> On 21 September 2014 08:11, cheng.li <[hidden email]> wrote:
>>> Hi Peter,
>>>
>>> Thanks for your hard work. I think our results are consistent.
>>>
>>> Regards,
>>> Cheng
>>>
>>> -----Original Message-----
>>> From: Peter Caspers [mailto:[hidden email]]
>>> Sent: 21 September 2014 0:33
>>> To: cheng li
>>> Cc: QuantLib Mailing Lists
>>> Subject: Re: Re: Re: Re: [Quantlib-dev] Openmp work on mcarlo : Dynamic
>>> Creator MT
>>>
>>> Hi Cheng,
>>>
>>> sorry, this was my fault, I messed up the timings, because I did not use consistent optimizer flags when compiling the library and the test program.
>>>
>>> Actually on Windows (same machine on which I run Ubuntu, which
>>> doesn't really matter, because my computer in office gives very
>>> similar timings) I get for 1E8 random numbers generated (with O2)
>>>
>>> 400ms / 1100ms
>>>
>>> for the original ql mt / dynamic creator mt. The ql mt is just as
>>> fast as the boost mt implementation by the way. On Ubuntu with gcc
>>> 4.8.1 and O3 I get
>>>
>>> 290ms / 870ms
>>>
>>> and with O2 a close value, for the creator mt 910ms. Also it makes no difference if I use gcc 4.9.1 or clang 3.6.0.
>>>
>>> If I directly call the original C routine without using the wrapper object, I get 720ms.
>>>
>>> If I use the original library and a C example (both compiled with O3, this is the configuration how the library is shipped (it has a hardcoded make file)) => 730ms.
>>>
>>> This means, the wrapper introduces a slow down by 20% which seems not too bad.
>>>
>>> Otherwise the dcmt is slower by a factor of around 2-3 compared to the original mt in all cases. Since this is already the case with the original library, I wouldn't try to do anything about it at the moment.
>>>
>>> What is your opinion on this ?
>>>
>>> Peter
>>>
>>> I compared different platforms again, but now on the _same_ machine - Original MT / Dynamic Creator MT (generation of 1E8 numbers, single threaded, with O2 (MSVC) and O3 (gcc, clang)). I also checked the boost implementation mt19937, which is very close to the ql original mt in all cases.
>>>
>>> Windows / MSVC 2010 => 400 ms / 1100 ms
>>> Ubuntu / gcc 4.9.1 => 1200 ms / 1050 ms
>>> Ubuntu / gcc 4.8.1 => 1180 ms / 1040 ms
>>> Ubuntu / clang 3.6.0 => 1340 ms / 1150 ms
>>>
>>> clang
>>> 290
>>> 720
>>> 870
>>>
>>> (c 730)
>>>
>>> so it looks like MSVC does a specific optimization for the QL and boost mt19937, which does not apply on the other platforms and not to the dynamic creator mt.
>>>
>>> At the moment I still don't know what it is.
>>>
>>> On 18 September 2014 03:33, cheng li <[hidden email]> wrote:
>>>> Let me try your suggestion once I have time.
>>>>
>>>> Regards,
>>>> Cheng
>>>>
>>>> -----Original Message-----
>>>> From: cheng li [mailto:[hidden email]]
>>>> Sent: 18 September 2014 9:18
>>>> To: 'Peter Caspers'
>>>> Cc: 'QuantLib Mailing Lists'
>>>> Subject: Re: Re: Re: [Quantlib-dev] Openmp work on mcarlo : Dynamic
>>>> Creator MT
>>>>
>>>> Hi Peter,
>>>>
>>>> I used gcc 4.8.2.
>>>>
>>>> My result with O3 optimization is still not good. The new MT
>>>> performs about the same as before (about a 3-4x slowdown).
>>>>
>>>> I used the following statement to turn on O3 optimization before
>>>> running ./configure for QuantLib:
>>>>
>>>> export CXXFLAGS="-g -O3"
>>>>
>>>> Am I right?
>>>>
>>>> Regards,
>>>> Cheng
>>>>
>>>> -----Original Message-----
>>>> From: Peter Caspers [mailto:[hidden email]]
>>>> Sent: 18 September 2014 0:36
>>>> To: cheng li
>>>> Cc: QuantLib Mailing Lists
>>>> Subject: Re: Re: Re: [Quantlib-dev] Openmp work on mcarlo : Dynamic
>>>> Creator MT
>>>>
>>>> with gcc 4.9.1 and O2 the new mt is a bit slower than the original one (but only by a factor of 1.1).
>>>> I have to add both -frename-registers, -finline-functions to -O2 to get the speed up back I mentioned before.
>>>>
>>>> Which compiler do you use on Ubuntu ?
>>>>
>>>> Peter
>>>>
>>>>
>>>>
>>>> On 17 September 2014 03:26, cheng li <[hidden email]> wrote:
>>>>> Thanks Peter. I tested on Ubuntu as well; it is about 3-4x slower with -O2 optimization.
>>>>>
>>>>> I'll try -O3 on my machine also with Ubuntu.
>>>>>
>>>>> Regards,
>>>>> Cheng
>>>>>
>>>>> -----Original Message-----
>>>>> From: Peter Caspers [mailto:[hidden email]]
>>>>> Sent: 17 September 2014 0:32
>>>>> To: Cheng Li; QuantLib Mailing Lists
>>>>> Subject: Re: Re: [Quantlib-dev] Openmp work on mcarlo : Dynamic Creator
>>>>> MT
>>>>>
>>>>> Hi Cheng,
>>>>>
>>>>> indeed with msvc I get a slow down with a factor of ~2.8x. As I said, under gcc it is a speed up ~ 0.8x (with -O3).
>>>>>
>>>>> Does anyone have an idea where the different behaviour under gcc /
>>>>> linux and msvc might come from (and how to improve the msvc side
>>>>> if possible) ?
>>>>>
>>>>> Kind regards
>>>>> Peter
>>>>>
>>>>>
>>>>>
>>>>> On 13 September 2014 08:27, Cheng Li <[hidden email]> wrote:
>>>>>> Thanks Peter.
>>>>>>
>>>>>> Regards,
>>>>>> Cheng
>>>>>>
>>>>>> Sent from my iPad
>>>>>>
>>>>>>> On 13 September 2014, at 13:29, Peter Caspers <[hidden email]> wrote:
>>>>>>>
>>>>>>> I will have a look on Monday (I have a Windows machine at work)
>>>>>>> and see how it works there
>>>>>>>
>>>>>>> Thanks
>>>>>>> Peter
>>>>>>>
>>>>>>> Sent from my iPhone
>>>>>>>
>>>>>>>> On 13.09.2014 at 04:41, Cheng Li <[hidden email]> wrote:
>>>>>>>>
>>>>>>>> I am on Win7 x64bit, using vs 2012 with quantlib 1.4 boost 1.55
>>>>>>>> under release mode

cheng li

Re: Re: Re: Re: Re: Re: Re: Openmp work on mcarlo : Dynamic Creator MT

In reply to this post by Peter Caspers-4

Hi Peter,

It works great on Windows.

I tried 9999999999 samples:

Original MT: 35.63
Dynamic MT: 37.03

I also tried 100000, 100000000 and 1000000000 samples. The results are similar and the elapsed time grows linearly.

I tested with vc++ 2012. In my opinion vc++ 2010 will behave the same. I will get back to you when the vc++ 2010 test has finished.
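
For anyone who wants to repeat this kind of comparison without QuantLib at hand, here is a minimal timing sketch. It is only an illustration under some assumptions: `timeGenerator` and `TimingResult` are made-up names, any generator callable as `rng()` can be passed in (`std::mt19937` is used as a stand-in in the usage note, not the QuantLib or dcmt classes), and the checksum exists only so that the optimizer cannot drop the loop. The counter is 64 bit on purpose, since a sample count such as 9999999999 does not fit into a 32-bit `int`.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <random>

// Elapsed wall-clock time plus a checksum of all generated numbers.
struct TimingResult {
    double seconds;
    std::uint64_t checksum;
};

// Draws `samples` numbers from `rng` and measures how long that takes.
template <class Rng>
TimingResult timeGenerator(Rng& rng, std::uint64_t samples) {
    assert(samples > 0);
    std::uint64_t checksum = 0;
    const auto start = std::chrono::steady_clock::now();
    for (std::uint64_t i = 0; i < samples; ++i)
        checksum += rng();  // using the result keeps the loop from being optimized away
    const auto stop = std::chrono::steady_clock::now();
    return {std::chrono::duration<double>(stop - start).count(), checksum};
}
```

A call would look like `std::mt19937 mt(42); TimingResult r = timeGenerator(mt, 100000000);`. As earlier messages in this thread show, the library and the test program should be compiled with the same optimizer flags, otherwise the timings are not comparable.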

Regards,
Cheng


Peter Caspers-4

Re: Re: Re: Re: Re: Re: Re: Re: Openmp work on mcarlo : Dynamic Creator MT

Hello,

I made an attempt to add multithreading to the general Monte Carlo
framework. It is built around a multithreaded Mersenne twister class
using the dcmt library wrapper mentioned earlier. From an end-user
perspective, multithreading is enabled by compiling the library and
application with OpenMP enabled (--enable-openmp if you use configure)
and using a multithreaded RNG trait, for example

engine = MakeMCEuropeanHestonEngine<PseudoRandomMultiThreaded>(process)
.withStepsPerYear(11)
.withAntitheticVariate()
.withSamples(500000)
.withSeed(1234);

where the usual PseudoRandom is replaced by PseudoRandomMultiThreaded.
A nice property is that the results can be reproduced (up to round-off
errors) as long as the seed is the same (of course) and the number of
threads being used is the same.
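
The reproducibility property can be illustrated with a minimal sketch. It assumes one generator per stream and a fixed number of streams. The function name estimateSecondMoment is hypothetical, and std::mt19937 with a shifted seed is only a stand-in for the precomputed dcmt twisters; it is not what the pull request does internally.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>

// Hypothetical helper, not part of QuantLib or the pull request.
// Estimates E[U^2] for U uniform on [0,1) using one generator per stream.
// ASSUMPTION: std::mt19937 seeded with seed + s stands in for the s-th
// precomputed dcmt twister. Distinct seeds do not give the independence
// that dcmt aims for with distinct parameter sets.
double estimateSecondMoment(int numStreams, std::size_t samplesPerStream,
                            std::uint32_t seed) {
    assert(numStreams > 0 && samplesPerStream > 0);
    double sum = 0.0;
    // Without OpenMP the pragma is ignored and the loop runs serially.
    #pragma omp parallel for
    for (int s = 0; s < numStreams; ++s) {
        // Each stream owns its generator, so no draw depends on thread timing.
        std::mt19937 rng(seed + static_cast<std::uint32_t>(s));
        std::uniform_real_distribution<double> uniform(0.0, 1.0);
        double localSum = 0.0;
        for (std::size_t i = 0; i < samplesPerStream; ++i) {
            const double u = uniform(rng);
            localSum += u * u;
        }
        // Only the merge into the shared accumulator is serialized.
        #pragma omp critical
        sum += localSum;
    }
    return sum / (static_cast<double>(numStreams) *
                  static_cast<double>(samplesPerStream));
}
```

Calling estimateSecondMoment(8, 100000, 1234) twice should agree up to round-off, because the order in which the partial sums are merged is the only thing that can vary between runs. Changing the number of streams changes which numbers are drawn, and hence the result.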

I only added a critical section when adding the path results to the
sample accumulator in MonteCarloModel (which does not cost much in my
tests). However, the MC engines have to be carefully reviewed one by
one before using them multithreaded - otherwise they will usually give
non-deterministic crashes with scary error messages. In particular the
path pricer and the process used for path generation must be made
thread safe. I did this (as an example) for the MCEuropeanHestonEngine,
which essentially meant ensuring that these calls

riskFreeRate_->forwardRate(t0, t1, Continuous)
- dividendYield_->forwardRate(t0, t1, Continuous);

in the HestonProcess do not trigger write operations in the yield term
structures' underlying LazyObjects. That can be done with a critical
section around these calls, which however is a performance killer (on 8
threads there is then effectively no speedup compared to a single
thread). A better solution is to trigger the computation
of the two yield term structures before the simulation, which I do
when the path pricer is created. It seems clear that during the
simulation no notifications whatsoever are sent, so this should work
fine. Still, only a speedup of 2x is achieved in this example on 8
threads, which is not very impressive. I am not sure at the moment why
this is not better. I made similar adjustments to the MCAmericanEngine
and the LongstaffSchwartzPathPricer. The speedup is even smaller there.
I will try to replace OpenMP by a more native approach later to see
what is going on. Nevertheless, the thing is flying at least.
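
The idea of triggering the lazy computation before the simulation can be sketched as follows. LazyFlatCurve and sumOfDiscounts are hypothetical names for illustration only; the class mimics the caching pattern of a lazily evaluated term structure in miniature and is not QuantLib's LazyObject.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical stand-in for a lazily evaluated term structure. This is NOT
// QuantLib's LazyObject, only the same caching pattern in miniature.
class LazyFlatCurve {
  public:
    explicit LazyFlatCurve(double rate)
    : rate_(rate), calculated_(false), logDiscountPerYear_(0.0) {}

    // Const interface that nevertheless writes to mutable members on first
    // use; this hidden write is what makes concurrent first calls unsafe.
    double discount(double t) const {
        calculate();
        return std::exp(logDiscountPerYear_ * t);
    }
    void calculate() const {
        if (!calculated_) {
            logDiscountPerYear_ = -rate_;
            calculated_ = true;
        }
    }
    bool calculated() const { return calculated_; }

  private:
    double rate_;
    mutable bool calculated_;
    mutable double logDiscountPerYear_;
};

// Sums discount factors at t = 0.01, 0.02, ..., 0.01 * n.
double sumOfDiscounts(const LazyFlatCurve& curve, int n) {
    assert(n > 0);
    // Warm the cache in a single thread, so that the parallel loop below
    // only reads the cached state and never writes it.
    curve.calculate();
    double total = 0.0;
    // Without OpenMP the pragma is ignored and the loop runs serially.
    #pragma omp parallel for reduction(+ : total)
    for (int i = 1; i <= n; ++i)
        total += curve.discount(0.01 * i);
    return total;
}
```

The sketch relies on the same assumption as above: nothing invalidates the cached state while the parallel loop is running. If that does not hold, the calls would need a lock again, with the performance cost described.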

If anybody is interested in having a look, here is the pull request
containing the changes in the framework, the added dcmt wrapper and
test cases.

https://github.com/lballabio/quantlib/pull/280

Thank you
Peter

On 27 October 2014 at 04:46, cheng li <[hidden email]> wrote:

> Hi Peter,
>
> It works great on Windows.
>
> I tried 9999999999 samples:
>
> Original MT: 35.63
> Dynamic MT: 37.03
>
> I also tried 100000, 100000000 and 1000000000 samples.
>
> The results are similar and the elapsed time grows linearly.
>
> I tried vc++ 2012. vc++ 2010 will behave the same in my opinion. I will get back to you when the vc++ 2010 test has finished.
>
> Regards,
> Cheng
>
> -----Original Message-----
> From: Peter Caspers [mailto:[hidden email]]
> Sent: 27 October 2014 3:49
> To: cheng li
> Cc: QuantLib Mailing Lists
> Subject: Re: Re: Re: Re: Re: Re: Re: [Quantlib-dev] Openmp work on mcarlo : Dynamic Creator MT
>
> Hi,
>
> I think I could further improve the performance of the precomputed twisters (i.e. the ones constructed as
>
> MersenneTwisterCustomRng<Mtdesc19937_5> mt(42);
>
> ). Now they seem to be just as fast as the original one (I only tested on Linux). The PR is updated.
>
> Cheng, would you maybe like to double check ?
>
> Thanks a lot
> Peter
>
> On 23 September 2014 03:50, cheng li <[hidden email]> wrote:
>> Hi Peter,
>>
>> On my side the performance has also improved. The slowdown is now around 2.5x. Thanks for your help.
>>
>> Regards,
>> Cheng
>>
>> -----Original Message-----
>> From: Peter Caspers [mailto:[hidden email]]
>> Sent: 22 September 2014 16:05
>> To: cheng li
>> Cc: QuantLib Mailing Lists
>> Subject: Re: Re: Re: Re: Re: Re: [Quantlib-dev] Openmp work on mcarlo :
>> Dynamic Creator MT
>>
>> yes, please. The slowdown on Windows on my office computer is around 1.6 now.
>> best regards
>> Peter
>>
>> On 22 September 2014 03:48, cheng li <[hidden email]> wrote:
>>> Hi Peter,
>>>
>>> Thanks for your effort. I'll definitely give it a try :)
>>>
>>> Regards,
>>> Cheng
>>>
>>> -----Original Message-----
>>> From: Peter Caspers [mailto:[hidden email]]
>>> Sent: 21 September 2014 23:11
>>> To: cheng.li
>>> Cc: QuantLib Mailing Lists
>>> Subject: Re: Re: Re: Re: Re: [Quantlib-dev] Openmp work on mcarlo :
>>> Dynamic Creator MT
>>>
>>> Hi Cheng,
>>>
>>> I switched to a template class for precomputed twisters, which is
>>> faster by a factor of 2 (450ms instead of 870ms). This can be
>>> instantiated with
>>>
>>> MersenneTwisterCustomRng<Mtdesc19937_5> mt(42);
>>>
>>> with 5 replaceable by 0 to 7 as before. The other class is now only needed if you want to create an mt at runtime.
>>>
>>> The pull request is updated accordingly.
>>>
>>> Best regards
>>> Peter
>>>
>>>
>>>
>>>
>>> On 21 September 2014 08:11, cheng.li <[hidden email]> wrote:
>>>> Hi Peter,
>>>>
>>>> Thanks for your hard work. I think our results are consistent.
>>>>
>>>> Regards,
>>>> Cheng
>>>>
>>>> -----Original Message-----
>>>> From: Peter Caspers [mailto:[hidden email]]
>>>> Sent: 21 September 2014 0:33
>>>> To: cheng li
>>>> Cc: QuantLib Mailing Lists
>>>> Subject: Re: Re: Re: Re: [Quantlib-dev] Openmp work on mcarlo : Dynamic
>>>> Creator MT
>>>>
>>>> Hi Cheng,
>>>>
>>>> sorry, this was my fault, I messed up the timings, because I did not use consistent optimizer flags when compiling the library and the test program.
>>>>
>>>> Actually on Windows (same machine on which I run Ubuntu, which
>>>> doesn't really matter, because my computer in office gives very
>>>> similar
>>>> timings) I get for 1E8 random numbers generated (with O2)
>>>>
>>>> 400ms / 1100ms
>>>>
>>>> for the original ql mt / dynamic creator mt. The ql mt is just as
>>>> fast as the boost mt implementation by the way. On Ubuntu with gcc
>>>> 4.8.1 and O3 I get
>>>>
>>>> 290ms / 870ms
>>>>
>>>> and with O2 a close value, for the creator mt 910ms. Also it makes no difference if I use gcc 4.9.1 or clang 3.6.0.
>>>>
>>>> If I directly call the original C routine without using the wrapper object, I get 720ms.
>>>>
>>>> If I use the original library and a C example, both compiled with O3 (which is how the library is shipped, since it has a hardcoded makefile), I get 730ms.
>>>>
>>>> This means, the wrapper introduces a slow down by 20% which seems not too bad.
>>>>
>>>> Otherwise the dcmt is slower by a factor of around 2-3 compared to the original mt in all cases. Since this is already the case with the original library, I wouldn't try to do anything about it at the moment.
>>>>
>>>> What is your opinion on this ?
>>>>
>>>> Peter
>>>>
>>>> I compared different platforms again, but now on the _same_ machine - Original MT / Dynamic Creator MT (generation of 1E8 numbers, single threaded, with O2 (MSVC) and O3 (gcc, clang)). I also checked the boost implementation mt19937, which is very close to the ql original mt in all cases.
>>>>
>>>> Windows / MSVC 2010 => 400 ms / 1100 ms
>>>> Ubuntu / gcc 4.9.1 => 1200 ms / 1050 ms
>>>> Ubuntu / gcc 4.8.1 => 1180 ms / 1040 ms
>>>> Ubuntu / clang 3.6.0 => 1340 ms / 1150 ms
>>>>
>>>> clang
>>>> 290
>>>> 720
>>>> 870
>>>>
>>>> (c 730)
>>>>
>>>> so it looks like MSVC does a specific optimization for the QL and boost mt19937, which does not apply on the other platforms and not to the dynamic creator mt.
>>>>
>>>> At the moment I still don't know what it is.
>>>>
>>>> On 18 September 2014 03:33, cheng li <[hidden email]> wrote:
>>>>> Let me try your statement once I have time.
>>>>>
>>>>> Regards,
>>>>> Cheng
>>>>>
>>>>> -----Original Message-----
>>>>> From: cheng li [mailto:[hidden email]]
>>>>> Sent: 18 September 2014 9:18
>>>>> To: 'Peter Caspers'
>>>>> Cc: 'QuantLib Mailing Lists'
>>>>> Subject: Re: Re: Re: [Quantlib-dev] Openmp work on mcarlo : Dynamic
>>>>> Creator MT
>>>>>
>>>>> Hi Peter,
>>>>>
>>>>> I used gcc 4.8.2.
>>>>>
>>>>> My result with O3 optimization is still not good. The new MT
>>>>> performs about the same as before (about 3-4x slower).
>>>>>
>>>>> I used this statement to turn on O3 optimization before running
>>>>> ./configure for QuantLib:
>>>>>
>>>>> export CXXFLAGS="-g -O3"
>>>>>
>>>>> Am I right?
>>>>>
>>>>> Regards,
>>>>> Cheng
>>>>>
>>>>> -----Original Message-----
>>>>> From: Peter Caspers [mailto:[hidden email]]
>>>>> Sent: 18 September 2014 0:36
>>>>> To: cheng li
>>>>> Cc: QuantLib Mailing Lists
>>>>> Subject: Re: Re: Re: [Quantlib-dev] Openmp work on mcarlo : Dynamic
>>>>> Creator MT
>>>>>
>>>>> with gcc 4.9.1 and O2 the new mt is a bit slower than the original one (but only by a factor of 1.1).
>>>>> I have to add both -frename-registers, -finline-functions to -O2 to get the speed up back I mentioned before.
>>>>>
>>>>> Which compiler do you use on Ubuntu ?
>>>>>
>>>>> Peter
>>>>>
>>>>>
>>>>>
>>>>> On 17 September 2014 03:26, cheng li <[hidden email]> wrote:
>>>>>> Thanks Peter. I also tested on Ubuntu: about 3-4x slower with -O2 optimization.
>>>>>>
>>>>>> I'll also try -O3 on my Ubuntu machine.
>>>>>>
>>>>>> Regards,
>>>>>> Cheng
>>>>>>
>>>>>> -----Original Message-----
>>>>>> From: Peter Caspers [mailto:[hidden email]]
>>>>>> Sent: 17 September 2014 0:32
>>>>>> To: Cheng Li; QuantLib Mailing Lists
>>>>>> Subject: Re: Re: [Quantlib-dev] Openmp work on mcarlo : Dynamic Creator
>>>>>> MT
>>>>>>
>>>>>> Hi Cheng,
>>>>>>
>>>>>> indeed with msvc I get a slow down with a factor of ~2.8x. As I said, under gcc it is a speed up ~ 0.8x (with -O3).
>>>>>>
>>>>>> Does anyone have an idea where the different behaviour under gcc /
>>>>>> linux and msvc might come from (and how to improve the msvc side
>>>>>> if
>>>>>> possible) ?
>>>>>>
>>>>>> Kind regards
>>>>>> Peter
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 13 September 2014 08:27, Cheng Li <[hidden email]> wrote:
>>>>>>> Thanks Peter.
>>>>>>>
>>>>>>> Regards,
>>>>>>> Cheng
>>>>>>>
>>>>>>> Sent from my iPad
>>>>>>>
>>>>>>>> On 13 September 2014, at 13:29, Peter Caspers <[hidden email]> wrote:
>>>>>>>>
>>>>>>>> I will have a look on monday ( I have a Windows machine at work
>>>>>>>> ) and see how it works there
>>>>>>>>
>>>>>>>> Thanks
>>>>>>>> Peter
>>>>>>>>
>>>>>>>> Sent from my iPhone
>>>>>>>>
>>>>>>>>> On 13 September 2014, at 04:41, Cheng Li <[hidden email]> wrote:
>>>>>>>>>
>>>>>>>>> I am on Win7 x64bit, using vs 2012 with quantlib 1.4 boost 1.55
>>>>>>>>> under release mode
>>>>>>>>>
>>>>>>>>> Sent from my iPad
>>>>>>>>>
>>>>>>>>>> On 13 September 2014, at 0:08, Peter Caspers <[hidden email]> wrote:
>>>>>>>>>>
>>>>>>>>>> Hi Cheng,
>>>>>>>>>>
>>>>>>>>>> no, I get better timings with the dcmt implementation, e.g.
>>>>>>>>>> for
>>>>>>>>>> 1E8 numbers
>>>>>>>>>>
>>>>>>>>>> dcmt 0.982s
>>>>>>>>>> quantlib 1.159s
>>>>>>>>>>
>>>>>>>>>> on my computer. Can you post your platform and compiler
>>>>>>>>>> settings, so that I can try to reproduce ?
>>>>>>>>>>
>>>>>>>>>> Thanks
>>>>>>>>>> Peter
>>>>>>>>>>
>>>>>>>>>>> On 12 September 2014 05:29, cheng li <[hidden email]> wrote:
>>>>>>>>>>> Hi Peter,
>>>>>>>>>>>
>>>>>>>>>>> I have used your dcmt wrapper and tested it with the following
>>>>>>>>>>> code. It seems dcmt in a single thread is 4x slower than the
>>>>>>>>>>> original QL MT. Is this consistent with what you see?
>>>>>>>>>>>
>>>>>>>>>>> #include <ql/quantlib.hpp>
>>>>>>>>>>> #include <boost/timer.hpp>
>>>>>>>>>>> #include <iostream>
>>>>>>>>>>>
>>>>>>>>>>> using namespace QuantLib;
>>>>>>>>>>> using namespace std;
>>>>>>>>>>>
>>>>>>>>>>> int main() {
>>>>>>>>>>>
>>>>>>>>>>>     // number of random numbers to draw from each generator
>>>>>>>>>>>     Size samples;
>>>>>>>>>>>     cin >> samples;
>>>>>>>>>>>     boost::timer myTimer;
>>>>>>>>>>>
>>>>>>>>>>>     // original QuantLib Mersenne twister
>>>>>>>>>>>     MersenneTwisterUniformRng originalMT;
>>>>>>>>>>>     for (Size i = 0; i < samples; ++i)
>>>>>>>>>>>         originalMT.next();
>>>>>>>>>>>
>>>>>>>>>>>     cout << myTimer.elapsed() << endl;
>>>>>>>>>>>
>>>>>>>>>>>     myTimer.restart();
>>>>>>>>>>>
>>>>>>>>>>>     // precomputed dcmt instance number 5, seed 1
>>>>>>>>>>>     MersenneTwisterDynamicRng mt(mtdesc_0_8_19937[5], 1);
>>>>>>>>>>>
>>>>>>>>>>>     for (Size i = 0; i < samples; ++i) {
>>>>>>>>>>>         mt.next();
>>>>>>>>>>>     }
>>>>>>>>>>>
>>>>>>>>>>>     cout << myTimer.elapsed() << endl;
>>>>>>>>>>>
>>>>>>>>>>>     // wait for input so that the console window stays open
>>>>>>>>>>>     int n;
>>>>>>>>>>>     std::cin >> n;
>>>>>>>>>>>     return 0;
>>>>>>>>>>> }
>>>>>>>>>>>
>>>>>>>>>>> Regards,
>>>>>>>>>>> Cheng
>>>>>>>>>>>
>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>> From: Peter Caspers [mailto:[hidden email]]
>>>>>>>>>>> Sent: 6 September 2014 20:48
>>>>>>>>>>> To: Joseph Wang
>>>>>>>>>>> Cc: QuantLib Mailing Lists
>>>>>>>>>>> Subject: Re: [Quantlib-dev] Openmp work on mcarlo : Dynamic
>>>>>>>>>>> Creator MT
>>>>>>>>>>>
>>>>>>>>>>> Hi Joseph, all,
>>>>>>>>>>>
>>>>>>>>>>> I added a wrapper for the dcmt library (Dynamic Creator of
>>>>>>>>>>> Mersenne Twisters).
>>>>>>>>>>>
>>>>>>>>>>> https://github.com/lballabio/quantlib/pull/132
>>>>>>>>>>>
>>>>>>>>>>> I guess this is a useful building block for multithreaded monte carlo.
>>>>>>>>>>> Since for bigger p the dynamic creation takes a long time (it
>>>>>>>>>>> feels more like mining than computing ...), I precomputed 8 independent instances (i.e.
>>>>>>>>>>> for use in at most 8 parallel threads), for the "standard"
>>>>>>>>>>> value p = 19937 and word size 32, which one can instantiate
>>>>>>>>>>> with
>>>>>>>>>>>
>>>>>>>>>>> MersenneTwisterDynamicRng mt( mtdesc_0_8_19937[i] , seed_i );
>>>>>>>>>>>
>>>>>>>>>>> for i = 0, ... , 7.
>>>>>>>>>>>
>>>>>>>>>>> In addition the speed of random number generation seems a bit
>>>>>>>>>>> faster in the dcmt library than with the original ql twister.
>>>>>>>>>>> I observe running times scaled by a factor of 0.8 when generating 1E8 numbers.
>>>>>>>>>>>
>>>>>>>>>>> All this is of course experimental and not well tested, so
>>>>>>>>>>> any feedback and experiences are very welcome. I'd be very
>>>>>>>>>>> interested in your opinion on the dcmt library and applications in parallel monte carlo.
>>>>>>>>>>>
>>>>>>>>>>> Peter
>>>>>>>>>>>
>>>>>>>>>>>> On 20 October 2013 16:01, Joseph Wang <[hidden email]> wrote:
>>>>>>>>>>>> I've done some more parallelization with openmp and quantlib.
>>>>>>>>>>>> I've uploaded the changes to the
>>>>>>>>>>>> https://github.com/joequant/quantlib. The branch openmp has some changes that I've issued a pull-request for.
>>>>>>>>>>>> openmp-mcario has some changes that need some more work.
>>>>>>>>>>>>
>>>>>>>>>>>> I've gotten the MC to work by generating the paths in a
>>>>>>>>>>>> critical
>>>>>>>>>>> situation.
>>>>>>>>>>>> Calculating the prices once I have the path is
>>>>>>>>>>>> multithreaded, but right now I need to generate the paths in
>>>>>>>>>>>> a single thread to make sure that the same sequence is generated.
>>>>>>>>>>>>
>>>>>>>>>>>> The big issue right now is that there is a race condition in
>>>>>>>>>>>> the calculation of barrier options which is causing one
>>>>>>>>>>>> regression test to fail. The problem is that the random
>>>>>>>>>>>> number generator is being called in BarrierPathPricer, and
>>>>>>>>>>>> since that is run multithread, the sequence that is being
>>>>>>>>>>>> pulled will change from run to run based on whether other paths have pulled random numbers already.
>>>>>>>>>>>>
>>>>>>>>>>>> I think that fixing this is going to need some code
>>>>>>>>>>>> restructuring, but I'd like to get some thoughts as to how
>>>>>>>>>>>> to do this. Basically, the interface needs to be changed
>>>>>>>>>>>> slightly so that the random numbers are drawn in a fixed
>>>>>>>>>>>> order, and that might mean one call to get any additional
>>>>>>>>>>>> random numbers in a pricer, which gets called in a critical section, and another to run the pricer with the random numbers.
>>>>>>>>>>>>