Openmp work on mcarlo

7 messages

Openmp work on mcarlo

Joseph Wang-4
I've done some more parallelization with OpenMP and QuantLib.  I've uploaded the changes to https://github.com/joequant/quantlib.  The openmp branch has changes for which I've issued a pull request; the openmp-mcario branch has changes that need more work.

I've gotten the MC to work by generating the paths in a critical section.  Calculating the prices once I have the paths is multithreaded, but right now I need to generate the paths in a single thread to make sure that the same sequence is generated.

The big issue right now is a race condition in the calculation of barrier options, which is causing one regression test to fail.  The problem is that the random number generator is called inside BarrierPathPricer, and since that runs multithreaded, the sequence each path pulls changes from run to run depending on whether other paths have already pulled random numbers.

I think fixing this will need some code restructuring, and I'd like some thoughts on how to do it.  Basically, the interface needs to change slightly so that the random numbers are drawn in a fixed order.  That might mean one call to draw any additional random numbers a pricer needs, run inside a critical section, and another call to run the pricer with those numbers.
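As a concrete sketch of that split (a toy LCG stands in for QuantLib's generators, and mcEstimate is an invented illustrative function, not the actual MC interface):

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Stand-in RNG (not one of QuantLib's generators): a simple 64-bit LCG.
struct Lcg {
    std::uint64_t s;
    explicit Lcg(std::uint64_t seed) : s(seed) {}
    double next() {
        s = 6364136223846793005ULL * s + 1442695040888963407ULL;
        return (s >> 11) * (1.0 / 9007199254740992.0); // uniform in [0,1)
    }
};

// Phase 1 draws every number in a fixed, single-threaded order;
// phase 2 prices the paths in parallel without touching the RNG.
// The final sum is done serially so the floating-point summation
// order -- and therefore the result -- is identical from run to run.
double mcEstimate(std::size_t nPaths, std::size_t nSteps, std::uint64_t seed) {
    Lcg rng(seed);
    std::vector<std::vector<double> > draws(nPaths, std::vector<double>(nSteps));
    for (std::size_t i = 0; i < nPaths; ++i)        // serial: deterministic order
        for (std::size_t j = 0; j < nSteps; ++j)
            draws[i][j] = rng.next();

    std::vector<double> payoff(nPaths);
    #pragma omp parallel for                        // parallel: pure function of draws
    for (long i = 0; i < (long)nPaths; ++i) {
        double x = 0.0;
        for (std::size_t j = 0; j < nSteps; ++j)
            x += draws[i][j] - 0.5;                 // toy per-path "payoff"
        payoff[i] = x;
    }

    double sum = 0.0;
    for (std::size_t i = 0; i < nPaths; ++i)        // serial reduction, fixed order
        sum += payoff[i];
    return sum / nPaths;
}
```

The cost is holding all draws in memory at once; drawing per path inside a critical section avoids that at the price of serializing the draws.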




------------------------------------------------------------------------------
October Webinars: Code for Performance
Free Intel webinars can help you accelerate application performance.
Explore tips for MPI, OpenMP, advanced profiling, and more. Get the most from
the latest Intel processors and coprocessors. See abstracts and register >
http://pubads.g.doubleclick.net/gampad/clk?id=60135031&iu=/4140/ostg.clktrk
_______________________________________________
QuantLib-dev mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/quantlib-dev

Re: Openmp work on mcarlo

Michael Sharpe
Does the number of random numbers needed per iteration change or is it a constant amount? 

Would it be possible to encapsulate the random generator state so each thread could own its own RNG?

If that isn't feasible, or it reduces the quality of the generation, would it be possible to spawn a producer thread that pushes random numbers onto a queue (you could even investigate boost::lockfree::queue to avoid locking, though it requires Boost 1.53) and have the worker threads pop from that queue whenever a new random number is needed?
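For reference, here's a rough sketch of that producer/queue shape, using a mutex-guarded std::queue instead of boost::lockfree::queue so it builds without Boost; all names are illustrative, and the LCG is a stand-in generator:

```cpp
#include <condition_variable>
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// Bounded-by-convention queue: one producer pushes RNG output,
// consumers pop as needed; close() signals the end of the stream.
class RandomQueue {
    std::queue<double> q_;
    std::mutex m_;
    std::condition_variable cv_;
    bool done_ = false;
public:
    void push(double x) {
        { std::lock_guard<std::mutex> l(m_); q_.push(x); }
        cv_.notify_one();
    }
    void close() {
        { std::lock_guard<std::mutex> l(m_); done_ = true; }
        cv_.notify_all();
    }
    bool pop(double& x) {                 // false once drained and closed
        std::unique_lock<std::mutex> l(m_);
        cv_.wait(l, [&] { return !q_.empty() || done_; });
        if (q_.empty()) return false;
        x = q_.front(); q_.pop();
        return true;
    }
};

// Producer thread fills the queue with n draws; the caller drains it.
std::vector<double> drainN(std::size_t n) {
    RandomQueue rq;
    std::thread producer([&] {
        std::uint64_t s = 42;
        for (std::size_t i = 0; i < n; ++i) {
            s = 6364136223846793005ULL * s + 1442695040888963407ULL;
            rq.push((s >> 11) * (1.0 / 9007199254740992.0));
        }
        rq.close();
    });
    std::vector<double> out;
    double x;
    while (rq.pop(x)) out.push_back(x);
    producer.join();
    return out;
}
```

Note that with more than one consumer, which path gets which number is still scheduling-dependent, so a shared queue alone doesn't restore determinism.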

Mike


On Sun, Oct 20, 2013 at 7:01 AM, Joseph Wang <[hidden email]> wrote:

Re: Openmp work on mcarlo

Kyle Schlansker
I think the standard solution is to have each thread maintain its own state (i.e., its own RNG seed).

Having a shared queue, lock-free or not, would introduce nondeterminism unless all consumer threads read from it in the same order from run to run, which doesn't seem to be guaranteed.
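A minimal sketch of the per-thread-state idea (toy LCG and invented names, not QuantLib's generators; with OpenMP enabled the output depends on the thread count, and naive id-based seeding like this can correlate the streams):

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#endif

// Each OpenMP thread seeds its own LCG from (base seed, thread id)
// and fills its share of the output. Race-free, since each index is
// written by exactly one thread -- but which stream produced a given
// draw now depends on the schedule and the number of threads.
std::vector<double> perThreadDraws(std::size_t nPaths, std::uint64_t base) {
    std::vector<double> out(nPaths);
    #pragma omp parallel
    {
        std::uint64_t tid = 0;
    #ifdef _OPENMP
        tid = (std::uint64_t)omp_get_thread_num();
    #endif
        // Naive seeding: fine for a sketch, risky for stream quality.
        std::uint64_t s = base ^ (0x9e3779b97f4a7c15ULL * (tid + 1));
        #pragma omp for
        for (long i = 0; i < (long)nPaths; ++i) {
            s = 6364136223846793005ULL * s + 1442695040888963407ULL;
            out[i] = (s >> 11) * (1.0 / 9007199254740992.0); // in [0,1)
        }
    }
    return out;
}
```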

--
kyle

Sent from my mobile; apologies for any deficiencies of spelling or message tone.

On Oct 21, 2013, at 2:53 PM, Mike Sharpe <[hidden email]> wrote:


Re: Openmp work on mcarlo

Joseph Wang-4
It's possible to figure out the number of random number evaluations and then skip ahead N evaluations for each thread.  In practice this works, but the code is very fragile.  Having each thread keep its own state is also used, but it requires knowing ahead of time how many threads will be used, and you have to be very careful not to introduce bad correlations between the per-thread streams.
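For what the skip-ahead looks like on a toy LCG (illustrative only; QuantLib's own generators would each need their own jump function): composing the affine update map with itself by repeated squaring advances the state k draws in O(log k), so thread t can jump forward by t*N draws cheaply.

```cpp
#include <cstdint>

// Toy LCG x -> A*x + C (mod 2^64); unsigned wrap-around gives the modulus.
const std::uint64_t A = 6364136223846793005ULL;
const std::uint64_t C = 1442695040888963407ULL;

std::uint64_t lcgNext(std::uint64_t x) { return A * x + C; }

// Advance the state by k draws in O(log k): maintain the current map
// y -> a*y + c and the accumulated map y -> ra*y + rc, composing the
// current map into the result whenever the low bit of k is set.
// (f∘f)(y) = a*(a*y + c) + c = a^2 * y + (a + 1)*c.
std::uint64_t lcgSkip(std::uint64_t x, std::uint64_t k) {
    std::uint64_t a = A, c = C;   // current map (A, C)^(2^i)
    std::uint64_t ra = 1, rc = 0; // accumulated map, starts as identity
    while (k) {
        if (k & 1) { rc = a * rc + c; ra *= a; }
        c = (a + 1) * c;          // square the current map
        a *= a;
        k >>= 1;
    }
    return ra * x + rc;
}
```

The fragility Joseph mentions comes in at the layer above: the jump distance has to exactly match the number of draws each path really consumes, which a path-dependent pricer can make hard to predict.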

I'm trying to get deterministic order by putting the RNG into a critical section.  Right now the goal isn't to get the same order between non-OMP and OMP runs, but at least to get identical answers between different OMP runs.  This isn't a hard algorithm issue; it's a C++ class/architecture issue, since it involves changing the interface of the MC classes, and I'd like to get that right before moving anything to the main code branch.

One other technique that I've seen is to generate all of the paths ahead of time, store them in memory, and then run the pricing afterwards.  This gets you deterministic ordering, you can do clever things with the RNG, and it's very useful for generating Greeks, but you then have to carefully manage memory, and I want to avoid that.

I've been able to get nice speedups with OpenMP for the FD and lattice code, and I'd appreciate it if people could test the patches (the openmp branch of joequant/quantlib on GitHub).  One thing OpenMP favors is simple schemes, so I've parallelized those.  There are some well-known parallel algorithms for tridiagonalizing a matrix, and I can implement one once the MC work goes in.
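The lattice case is the easy kind of loop: within one rollback step, each node's new value depends only on the previous step's values, so the loop over nodes parallelizes directly.  A toy binomial step (not QuantLib's Lattice interface; names are illustrative):

```cpp
#include <cstddef>
#include <vector>

// One backward-induction step on a recombining binomial lattice:
// cur[i] = discount * (pUp * next[i+1] + (1 - pUp) * next[i]).
// Every cur[i] reads only `next`, so the iterations are independent
// and the pragma introduces no race.
std::vector<double> rollbackStep(const std::vector<double>& next,
                                 double pUp, double discount) {
    std::vector<double> cur(next.size() - 1);
    #pragma omp parallel for
    for (long i = 0; i < (long)cur.size(); ++i)
        cur[i] = discount * (pUp * next[i + 1] + (1.0 - pUp) * next[i]);
    return cur;
}
```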



On Tue, Oct 22, 2013 at 4:31 AM, Kyle Schlansker <[hidden email]> wrote:

Re: Openmp work on mcarlo

Luigi Ballabio
Hi Joseph,
    it will probably be a while before we add to the main branch any
changes that break backward compatibility.  How about I create a
long-lived branch for your changes and pull them there?  The
advantage would be that your code might be found more easily.  On the
other hand, the disadvantage is that pull requests from other people
might get made against my repo rather than yours, even though you
would be the right person to manage them; that would be easier if
your fork were the reference for further development.

Just let me know what you prefer.

Later,
    Luigi



On Tue, Oct 22, 2013 at 4:51 AM, Joseph Wang <[hidden email]> wrote:




--
<https://implementingquantlib.blogspot.com>
<https://twitter.com/lballabio>


Re: Openmp work on mcarlo

Joseph Wang-4
Can we move onto the main tree any changes that don't break backward compatibility, and leave anything that does break compatibility outside until we figure out what the best way of handling things is?

If we default OpenMP to off, then the changes to lattice and FD give some speedups without breaking compatibility.  We can then work on the things that break compatibility separately.  Parallelizing mcarlo turns out to be extremely tricky, so I want to work on it separately; in particular, before breaking backward compatibility I want to be absolutely sure that there is no way to maintain it.
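One way to make off the default would be a guard macro along these lines (QL_ENABLE_OPENMP is an invented name here, not an existing QuantLib macro): the pragma is only emitted when the user opts in and the compiler actually supports OpenMP, so the default build stays serial and byte-for-byte reproducible.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical opt-in switch: expands to an OpenMP pragma only when
// both the user flag and compiler support are present; otherwise it
// expands to nothing and the loop runs serially.
#if defined(QL_ENABLE_OPENMP) && defined(_OPENMP)
#define QL_OMP_PARALLEL_FOR _Pragma("omp parallel for")
#else
#define QL_OMP_PARALLEL_FOR
#endif

// Example use: each iteration writes a distinct element, so the loop
// is correct whether or not the pragma is active.
std::vector<double> squares(std::size_t n) {
    std::vector<double> out(n);
    QL_OMP_PARALLEL_FOR
    for (long i = 0; i < (long)n; ++i)
        out[i] = double(i) * double(i);
    return out;
}
```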

When I worked on a similar problem in a different code base, we managed to preserve backward compatibility, but it was a really, really hard slog involving a lot of very delicate coding.  If I can play with the code for a while, it might be possible to do the same thing here.



On Tue, Oct 22, 2013 at 6:13 PM, Luigi Ballabio <[hidden email]> wrote:

Re: Openmp work on mcarlo

Luigi Ballabio
It sounds like a plan.  I'll have to look into defaulting OpenMP to
off (the macro currently defaults to on).  Also, I'm adding a few
comments to your pull request.  Once the initial mods are in, I would
go for a long-lived branch in your fork (so that you remain in charge
of managing the related pull requests).  Backward compatibility would
be great, assuming it doesn't make the code brittle, but I'm afraid
the time you'd have to put into that could be better spent...

Later,
    Luigi



On Tue, Oct 22, 2013 at 2:21 PM, Joseph Wang <[hidden email]> wrote:

> Can we move onto the main tree any changes that don't break backward
> compatibility, and leave anything that does break compatibility outside
> until we figure out what the best way of handling things is?
>
> If we default openmp to off, then the changes to lattice and fd give some
> speedups without breaking compatibility.  We can then work on the things
> that break compatibility separately.  Parallelizing the Monte Carlo code
> turns out to be extremely tricky, so I want to work on it separately.  In
> particular, I want to be absolutely sure that there is no way to maintain
> backward compatibility before we break it.
>
> When I worked on a similar problem on a different code base, we managed to
> preserve backward compatibility, but it was a really, really hard slog
> involving a lot of very delicate coding.  If I can play with the code for a
> while, it might be possible to do the same thing here.
>
>
>
> On Tue, Oct 22, 2013 at 6:13 PM, Luigi Ballabio <[hidden email]>
> wrote:
>>
>> Hi Joseph,
>>     it will probably be a while before we add to the main branch any
>> changes that break backward compatibility.  How about I create a
>> long-lived branch for your changes and I pull your changes there? The
>> advantage would be that your code might be found more easily. On the
>> other hand, the disadvantage is that pull requests from other people
>> might get made against my repo, not yours, even though the right
>> person to manage them would be you instead; this would be easier if
>> your fork was the reference for further development.
>>
>> Just let me know what you prefer.
>>
>> Later,
>>     Luigi
>>
>>
>>
>> On Tue, Oct 22, 2013 at 4:51 AM, Joseph Wang <[hidden email]> wrote:
>> > It's possible to figure out the number of random number evaluations, and
>> > then skip ahead N evaluations for each thread.  In practice this works,
>> > but the code is very fragile.  Having each thread have its own state is
>> > also used, but it requires the user to know ahead of time how many
>> > threads to use, and you have to be very careful not to introduce bad
>> > correlations.
>> >
>> > I'm trying to get a deterministic order by putting the RNG into a
>> > critical section.  Right now the goal isn't to get the same order
>> > between non-OMP and OMP, but to at least get identical answers between
>> > different OMP runs.  This isn't a hard algorithm issue, but it's a C++
>> > class/architecture issue, since it involves changing the interface of
>> > the MC classes, and I'd like to get this right before moving anything
>> > to the main code branch.
>> >
>> > One other technique that I've seen is to generate all of the paths ahead
>> > of time, store them in memory, and then run the pricing afterwards.
>> > This gets you deterministic ordering, and you can do clever things with
>> > the RNG, and it's very useful for generating greeks, but you then have
>> > to carefully manage memory, and I want to avoid that.
>> >
>> > I've been able to get nice speedups with OpenMP for the FD and lattice
>> > code, and I'd appreciate it if people could test the patches (the
>> > openmp branch on joequant/quantlib on github).  One thing OpenMP does
>> > is favor simple schemes, so I've parallelized those.  There are some
>> > well-known parallel algorithms for solving tridiagonal systems, and I
>> > can implement one of those once the MC work goes in.
>> >
>> >
>> >
>> > On Tue, Oct 22, 2013 at 4:31 AM, Kyle Schlansker <[hidden email]>
>> > wrote:
>> >>
>> >> I think the standard solution is to have each thread maintain its own
>> >> state (i.e. its own rand seed).
>> >>
>> >> Having a shared queue, lock-free or not, would then introduce
>> >> nondeterminism unless all consumer threads read from that queue in the
>> >> same order from run to run, which it seems is not guaranteed.
>> >>
>> >> --
>> >> kyle
>> >>
>> >> Sent from my mobile; apologies for any deficiencies of spelling or
>> >> message
>> >> tone.
>> >>
>> >> On Oct 21, 2013, at 2:53 PM, Mike Sharpe <[hidden email]> wrote:
>> >>
>> >> Does the number of random numbers needed per iteration change, or is
>> >> it a constant amount?
>> >>
>> >> Would it be possible to encapsulate the random generator state so each
>> >> thread could own its own RNG?
>> >>
>> >> If that isn't feasible, or reduces the quality of the generation, would
>> >> it be possible to spawn a producer thread that pushes random numbers
>> >> onto a queue (could even investigate boost::lockfree::queue to avoid
>> >> locking, though it requires boost 1.53) and have the worker thread just
>> >> pop off from that queue whenever a new random number is needed?
>> >>
>> >> Mike
>> >>
>> >>
>> >> On Sun, Oct 20, 2013 at 7:01 AM, Joseph Wang <[hidden email]>
>> >> wrote:
>> >>>
>> >>> I've done some more parallelization with openmp and quantlib.  I've
>> >>> uploaded the changes to the https://github.com/joequant/quantlib.  The
>> >>> branch openmp has some changes that I've issued a pull-request for.
>> >>> openmp-mcario has some changes that need some more work.
>> >>>
>> >>> I've gotten the MC to work by generating the paths in a critical
>> >>> section.  Calculating the prices once I have the path is
>> >>> multithreaded, but right now I need to generate the paths in a single
>> >>> thread to make sure that the same sequence is generated.
>> >>>
>> >>> The big issue right now is that there is a race condition in the
>> >>> calculation of barrier options, which is causing one regression test
>> >>> to fail.  The problem is that the random number generator is being
>> >>> called in BarrierPathPricer, and since that is run multithreaded, the
>> >>> sequence that is being pulled will change from run to run based on
>> >>> whether other paths have pulled random numbers already.
>> >>>
>> >>> I think that fixing this is going to need some code restructuring, but
>> >>> I'd like to get some thoughts as to how to do this.  Basically, the
>> >>> interface needs to be changed slightly so that the random numbers are
>> >>> drawn in a fixed order.  That might mean one call to get any
>> >>> additional random numbers in a pricer, which gets called in a critical
>> >>> section, and another call to run the pricer with the random numbers.
>>
>>
>>
>> --
>> <https://implementingquantlib.blogspot.com>
>> <https://twitter.com/lballabio>
>
>



--
<https://implementingquantlib.blogspot.com>
<https://twitter.com/lballabio>
