Re: Reorganization of Git repository

Posted by Peter Caspers-4 on Dec 19, 2015; 6:40pm
URL: http://quantlib.414.s1.nabble.com/Reorganization-of-Git-repository-tp16962p17197.html

Yes, it works, thank you. I had some self-inflicted difficulties due
to some long-forgotten faulty commits. I accidentally committed a few
large binary files and deleted them again, but git keeps them in the
history forever. When I tried to push my filtered master to the new
fork of your QuantLib repository, GitHub complained that the push would
violate its file size limit of 100M and therefore rejected it as a
whole. It seems that this limit was not in place when I made the faulty
commits back in June 2013 (or maybe these commits were the reason to
introduce the limit... :-)).

Although it is not impossible that I am the only one dumb enough to do
such things, here is a recipe for repairing things in case somebody
else has similar problems (and doesn't dare to ask). It might be a
good opportunity to check the branches you want to migrate for unwanted
large files anyway.

Here is a nice blog post on how to get a list of the blobs in a
repository, sorted by size:

http://naleid.com/blog/2012/01/17/finding-and-purging-big-files-from-git-history
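
For reference, a listing of this kind can be sketched in a few lines of
shell. This is not the blog's script, just a minimal version under two
assumptions: git is recent enough that cat-file --batch-check accepts a
format string, and the function name list_large_blobs is made up for
illustration.

```shell
#!/bin/sh
# Hypothetical helper (the name is an assumption, not from the blog):
# print the largest blobs reachable from any ref, one per line, as
# "<blob hash> <size in bytes> <path>".
list_large_blobs() {
    count="${1:-20}"    # how many entries to print
    # rev-list prints "<hash> <path>" for every reachable object;
    # cat-file adds type and size and passes the path through as %(rest).
    git rev-list --objects --all \
    | git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' \
    | sed -n 's/^blob //p' \
    | sort -k2,2 -n -r \
    | head -n "$count"
}
```

Calling list_large_blobs 14 inside the repository prints the 14 biggest
blobs, largest first.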

In my case, for example, the top entries were (blob hash, size in bytes, path):

6fb186ffdef398a7114b3396a460c439af86ebc6 243316004 QuantLib/ql/.libs/libQuantLib.so.0.0.0
b2a00d0d70f2b7face89d0bb663d2f3283e1b206 174158133 QuantLib/test-suite/.libs/quantlib-test-suite
a39d347bcc50972c50fb7cc6ce33ff277bf0afaf 82943858 QuantLib/Examples/LatentModel/LatentModel
6044c4006c0bee541eb40cc27ada49bca4f7534c 45307201 QuantLib/test-suite/.libs/quantlib-benchmark
b2b0e357c9b86637c0df36cfe28c7d2193f2dc05 17600512 QuantLibXL/Workbooks/Tests/YieldCurveBasisAdjustmentMonitor.xls
3fdf1707ab1b60f068b9c683ee80f50b169248a4 16226304 QuantLibXL/framework2/bin/QuantLibXL-vc90-mt-1_1_0.xll
e7db41194c71b8680522801b7bf99fb6eb6fb7bc 15711232 QuantLibXL/Workbooks/Tests/YieldCurveBasisAdjustmentMonitor.xls
1563a7dbcb98e6f21f01d2844993cd515f708f0a 14321664 QuantLibXL/Workbooks/Tests/YieldCurveBasisAdjustmentMonitor.xls
50c4e5a956bad7e2a95e30eb21cec0b310cdb2f2 13601280 QuantLibXL/Workbooks/Drafts/PairwiseProjections.xls
8d92f48cdf62e1a5d1bb4cec4840cef1f6d09eb7 13552128 QuantLibXL/Workbooks/Drafts/PairwiseProjections.xls
50a3e9b7e34fa96c7ba82e857028fb942d109c7b 13551104 QuantLibXL/Workbooks/PairwiseProjections.xls
d4318bbe3f626e37c4343ff774135adf56cc5f41 12685312 QuantLibXL/framework2/bin/QuantLibXL-vc80-mt-s-0_9_6.xll
1a7f0eef209097b60dacd23bcd247a0cfc451c64 11760815 QuantLib/ql/experimental/models/betaetatabulation.cpp
52c14c298212344fe8a5e9cd62d7244f62910c4e 10442532 QuantLib-site/slides/qlum15/kienitz.pdf

It was the first two files that GitHub complained about, but
obviously numbers 3 and 4 should not be part of the git repository
either. I found a small script that identifies all commits that
contain a suspicious blob:

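#!/bin/sh
# Usage: git-blob.sh <blob-hash-or-prefix> [git log options, e.g. --all]
# Lists every commit whose tree contains the given blob.
obj_name="$1"
shift
# tformat: ends every line with a newline, so read also sees the last commit
git log "$@" --pretty=tformat:'%T %h %s' \
| while read -r tree commit subject ; do
    # ls-tree -r prints "<mode> blob <hash>TAB<path>" for each file
    if git ls-tree -r "$tree" | grep -q " blob $obj_name" ; then
        echo "$commit $subject"
    fi
done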

and that can be called e.g. like this

git-blob.sh 6fb186ff --all

to identify the commits for the first blob above. This gives the two commits

7bf1f20 restructure directories
b4239ca restructure directories

which can be investigated further with git show. For example, the first
four blobs came from some late-night commits I made to my master back
in mid-2013, obviously without a proper .gitignore file. The xll looks
suspicious too, but it was committed by Eric in an official branch, so
that is ok.
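
Besides git show --stat <commit>, one way to see what a suspicious
commit actually carries is to list the files in its tree together with
their sizes. The following is only a sketch; the function name
commit_large_files and the default threshold are assumptions made for
illustration.

```shell
#!/bin/sh
# Hypothetical helper: print "<size in bytes> <path>" for every file in
# the tree of commit $1 that is at least $2 bytes (default 10000000).
commit_large_files() {
    commit="$1"
    min_bytes="${2:-10000000}"
    # ls-tree -r -l prints "<mode> <type> <hash> <size>TAB<path>"
    git ls-tree -r -l "$commit" \
    | awk -F '\t' -v min="$min_bytes" '{
          split($1, meta, " ")      # meta[4] is the size column
          if (meta[4] + 0 >= min) print meta[4], $2
      }'
}
```

For instance, commit_large_files b4239ca 40000000 would show every file
of 40M or more in that commit.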

The blog post also lists the filter commands that can be used to purge
the unwanted files from the git history. The drawback is that this
changes the commit hashes, but as far as I can see only those in the
guilty branch are affected, so e.g. Luigi's commits in his master,
which I mirror in a vendor branch, stay the same. I guess I can live
with that; at least the full history of the branch is preserved.
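
A purge along these lines can be sketched with git filter-branch and an
index filter. This is a minimal version, not necessarily the exact
commands from the blog; the function name purge_path_from_history and
the PURGE_PATH variable are made up for illustration. It is safest to
try it on a copy of the repository, with a clean working tree.

```shell
#!/bin/sh
# Hypothetical helper: remove one path from every commit in the given
# revisions (e.g. a branch name). Rewritten commits get new hashes.
purge_path_from_history() {
    target="$1"
    shift
    # The filter is evaluated once per commit inside filter-branch's own
    # shell, so the path is handed over through the environment.
    PURGE_PATH="$target" git filter-branch --prune-empty \
        --index-filter 'git rm -r --cached --ignore-unmatch --quiet -- "$PURGE_PATH"' \
        -- "$@"
}
```

A call could look like
purge_path_from_history QuantLib/ql/.libs/libQuantLib.so.0.0.0 master.
Note that filter-branch keeps the old commits reachable under
refs/original/ (and in the reflog), so the repository does not shrink
until those are removed and git gc has pruned the old objects.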

The filter commands take quite a while to finish. There is also a Java
tool, BFG Repo-Cleaner,

https://rtyley.github.io/bfg-repo-cleaner/

which works much faster. I tried that too, because there was a small
hope that the commit hashes would stay the same by a miracle, but they
do not. In the end I used the filter approach, and it seems to work: I
finally managed to migrate all my branches to the new repo. The size
is reasonable too; it takes 131M, while Luigi's reference repo has 95M.
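
To compare repository sizes before and after such a cleanup, the numbers
reported by git count-objects can be added up. A small sketch follows;
the function name repo_size_kib is an assumption, and running git gc
first makes the figures more comparable.

```shell
#!/bin/sh
# Hypothetical helper: print the size of the object database in KiB,
# i.e. loose objects ("size") plus packs ("size-pack").
repo_size_kib() {
    git count-objects -v \
    | awk '$1 == "size:" || $1 == "size-pack:" { total += $2 }
           END { print total + 0 }'
}
```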

Best regards
Peter


On 18 December 2015 at 00:05, Luigi Ballabio <[hidden email]> wrote:

> It's possible, with the only caveat that git filter-branch is a destructive
> operation so your local repo can be converted just once. It's ok if you have
> just one module to convert, but you'll need more than one clone if you want
> to convert several modules.
>
> Luigi
>
>
> On Thu, Dec 17, 2015 at 8:09 PM Peter Caspers <[hidden email]> wrote:
>>
>> Hello Luigi,
>>
>> could one also do this "git filter-branch --prune-empty
>> --subdirectory-filter QuantLib -- --all" directly on (a copy of) one's
>> local repository, thus keeping *all* (local) branches, and then push
>> back some branches to a fresh fork of lballabio/QuantLib? This way it
>> would be easier to keep private branches in the local repo that should
>> not appear on the github repo. Or would that be unsafe for some
>> reason?
>>
>> Thanks a lot
>> Peter
>>
>>
>> On 17 December 2015 at 17:55, Luigi Ballabio <[hidden email]> wrote:
>> > Hi all,
>> >     Eric and I just finished splitting the QuantLib git repository in
>> > smaller modules. The main forks for the new, smaller repos are:
>> >
>> > QuantLib: https://github.com/lballabio/QuantLib
>> > QuantLib-SWIG: https://github.com/lballabio/QuantLib-SWIG
>> > reposit: https://github.com/eehlers/reposit
>> > QuantLibAddin: https://github.com/eehlers/QuantLibAddin
>> > QuantLibXL: https://github.com/eehlers/QuantLibXL
>> >
>> > Eric's modules refer to the new reposit build.  The old build is still
>> > hosted at https://github.com/eehlers/quantlib for the time being.
>> >
>> > For those of you that had forked the old repository, see the instructions
>> > at http://quantlib.org/forkmigration.shtml for migrating any modifications
>> > you want to keep.
>> >
>> > Thanks for your patience,
>> >     Luigi
>> >
>> >
>> >
>> > On Thu, Dec 10, 2015 at 5:36 PM Luigi Ballabio <[hidden email]> wrote:
>> >>
>> >> Hi all,
>> >>     Eric and I will be splitting the QuantLib git repository in smaller
>> >> modules in the next few days (see my previous post, quoted below, for more
>> >> details).  Once it's done, most of you that forked it on GitHub or cloned
>> >> it locally can probably just fork or clone again the modules you're
>> >> interested in. For those of you that have additional branches you want to
>> >> keep, I'll be posting instructions for migrating them to the new modules.
>> >>
>> >> Later,
>> >>     Luigi
>> >>
>> >>
>> >> On Mon, Oct 26, 2015 at 3:09 PM Luigi Ballabio <[hidden email]> wrote:
>> >>>
>> >>> [ cross-posted to quantlib-users and quantlib-dev; apologies for any
>> >>> duplicates. ]
>> >>>
>> >>> Hi all,
>> >>>     I'm currently 3 or 4 issues away from setting up the 1.7 release.
>> >>>
>> >>> Shortly after doing that, and in concert with the other maintainers, I'll
>> >>> reorganize the Git repository so that the current, monolithic one
>> >>> containing all the modules will be split into smaller ones, with one
>> >>> module per current directory; thus, there will be a repository for the
>> >>> core C++ library, one for the Excel addin and so on.
>> >>>
>> >>> This will make it more convenient for the maintainers to manage the
>> >>> modules for which they have responsibility, and will also make it a lot
>> >>> easier to add new modules.  We had considered doing this when we migrated
>> >>> from subversion to git, and in hindsight we should have gone ahead at that
>> >>> time.
>> >>>
>> >>> I'm aware this will cause inconveniences to the 500+ people that forked
>> >>> the repository on GitHub.  I am sorry for this, and I will try to minimize
>> >>> the pain: I'll migrate the open pull requests to the new repository, and
>> >>> I'll try to make some kind of guide to help those of you with local
>> >>> changes to move them to the new fork. In the meantime, your current forks
>> >>> are not going away.
>> >>>
>> >>> Thanks for the understanding. I'll post a timeline as soon as I have one.
>> >>>
>> >>> Luigi
>> >>>
>> >>> --
>> >>>
>> >>> <http://leanpub.com/implementingquantlib>
>> >>> <http://implementingquantlib.com>
>> >>> <http://twitter.com/lballabio>
>> >>
>> >
>> >
>> > ------------------------------------------------------------------------------
>> >
>> > _______________________________________________
>> > QuantLib-dev mailing list
>> > [hidden email]
>> > https://lists.sourceforge.net/lists/listinfo/quantlib-dev
>> >
>
