BFG Repo-Cleaner does not clean my GitLab repository



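This is a really worrying and stressful problem. I cannot find any answer that works for me, so I have not been able to back up my code for more than a year (too bad!).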

I am using BFG Repo-Cleaner to clean my repo. The actual files total only a few KB, but the remote has grown to about 9.8 GB, which makes it impossible for me to git push.

Here is what I did:

repo-clean$ git clone --mirror https://gitlab.com/our-projects/my-specific-project.git
Cloning into bare repository 'my-specific-project.git'...
Username for 'https://gitlab.com': my-username
Password for 'https://my-username@gitlab.com': 
remote: Enumerating objects: 1306, done.
remote: Total 1306 (delta 0), reused 0 (delta 0), pack-reused 1306
Receiving objects: 100% (1306/1306), 9.73 GiB | 37.61 MiB/s, done.
Resolving deltas: 100% (232/232), done.

Check the repository size:

repo-clean$ cd my-specific-project.git
repo-clean/my-specific-project.git$ du -sh *
4.0K    branches
4.0K    config
4.0K    description
4.0K    HEAD
64K     hooks
8.0K    info
9.8G    objects
4.0K    packed-refs
12K     refs
repo-clean/my-specific-project.git$ cd ..
repo-clean$
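
As a cross-check on the du figure, Git can report the size of its own object store. The following is a minimal sketch, not part of the original session: it builds a throwaway repository (the demo directory, big.bin, and the 1 MiB size are made up for illustration) and reads the packed size with git count-objects.

```shell
#!/bin/sh
# Sketch: report how much disk space a repository's object store uses.
set -eu

workdir=$(mktemp -d)              # throwaway directory standing in for the mirror clone
trap 'rm -rf "$workdir"' EXIT

git init -q "$workdir/demo"
cd "$workdir/demo"

# Commit one 1 MiB file of random (incompressible) data so there is something to measure.
head -c 1048576 /dev/urandom > big.bin
git add big.bin
git -c user.name=demo -c user.email=demo@example.com commit -q -m "add big.bin"

# Pack the loose objects so the size shows up under size-pack, as in a fresh clone.
git gc -q

# -v prints the detailed breakdown, -H uses human-readable units.
git count-objects -vH

# Without -H, size-pack is reported in KiB, which is easier to use in scripts.
size_pack_kib=$(git count-objects -v | awk '/^size-pack:/ {print $2}')
echo "packed size: ${size_pack_kib} KiB"
```

In a real mirror clone, only the git count-objects -vH line is needed; everything before it just sets up the demo.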

Then I ran BFG to clean my repo:

repo-clean$ java -jar bfg.jar --strip-blobs-bigger-than 50M my-specific-project.git
Using repo : ~/repo-clean/my-specific-project.git
Scanning packfile for large blobs: 1306
Scanning packfile for large blobs completed in 172 ms.
Found 48 blob ids for large blobs - biggest=4510353716 smallest=74220532
Total size (unpacked)=25890690884
Found 132 objects to protect
Found 3 commit-pointing refs : HEAD, refs/heads/master, refs/merge-requests/1/head
Protected commits
-----------------
These are your protected commits, and so their contents will NOT be altered:
* commit 628fb69b (protected by 'HEAD') - contains 3 dirty files : 
- models/RF_modelGeolife.h5 (146.7 MB)
- models/RF_modelSMF.h5 (249.3 MB)
- models/RF_modelgeolife.h5 (146.7 MB)
WARNING: The dirty content above may be removed from other commits, but as
the *protected* commits still use it, it will STILL exist in your repository.
Details of protected dirty content have been recorded here :
~/repo-clean/my-specific-project.git.bfg-report/2022-05-09/15-50-17/protected-dirt/
If you *really* want this content gone, make a manual commit that removes it,
and then run the BFG on a fresh copy of your repo.

Cleaning
--------
Found 53 commits
Cleaning commits:       100% (53/53)
Cleaning commits completed in 150 ms.
Updating 2 Refs
---------------
Ref                          Before     After   
------------------------------------------------
refs/heads/master          | 628fb69b | 12113214
refs/merge-requests/1/head | f1182758 | 6c3ad899
Updating references:    100% (2/2)
...Ref update completed in 30 ms.
Commit Tree-Dirt History
------------------------
Earliest                                       Latest
|                                                   |
..............DDDmmmDDDDmmmmDDDDDDDDDDDDDDDmmmmmmmmmm
D = dirty commits (file tree fixed)
m = modified commits (commit message or parents changed)
. = clean commits (no changes to file tree)
Before     After   
-------------------------------------------
First modified commit | fc7cf2f9 | a772ae4a
Last dirty commit     | d4a1a3d4 | c4a6ad7f
Deleted files
-------------
Filename                                                    Git id                                                       
-------------------------------------------------------------------------------------------------------------------------
3Class_Instances.pkl                                      | ceebb395 (558.1 MB)                                          
Beijing_KerasData.pkl                                     | 8681a270 (133.4 MB)                                          
Filtered_Trajectory.pkl                                   | bfe06d09 (137.8 MB)                                          
Foot_Car_Instances.pkl                                    | c4bea045 (537.3 MB)                                          
Foot_Car_Instances2.pkl                                   | 8d9b96ad (537.3 MB)                                          
Instance_Geolife.pickle                                   | ee16e13b (412.5 MB)                                          
Instance_Geolife_Beijing.pkl                              | c2cd394a (409.6 MB)                                          
RF_modelGeolife.h5                                        | 5629ee4d (146.7 MB)                                          
RF_modelSMF.h5                                            | 14372982 (249.3 MB)                                          
RF_modelgeolife.h5                                        | 36293e2c (146.7 MB)                                          
Revised_InstanceCreation+NoJerkOutlier+NOSmoothing.pickle | 29ff8dd4 (269.6 MB)                                          
Revised_KerasData_NoSmoothing.pickle                      | 2421f835 (91.7 MB), 775b6041 (1.5 GB)                        
Revised_Trajectory_Label_Array.pickle                     | 059a4596 (84.5 MB)                                           
Revised_Trajectory_Label_Array2017.pickle                 | 7e24d6f7 (216.7 MB)                                          
Revised_Trajectory_Label_Array2018.pickle                 | cee1e176 (791.3 MB)                                          
...

In total, 71 object ids were changed. Full details are logged here:
~/repo-clean/my-specific-project.git.bfg-report/2022-05-09/15-50-17
BFG run is complete! When ready, run: git reflog expire --expire=now --all && git gc --prune=now --aggressive

Remove the unwanted dirty data:

repo-clean$ cd my-specific-project.git
~/repo-clean/my-specific-project.git$ git reflog expire --expire=now --all && git gc --prune=now --aggressive
Enumerating objects: 1310, done.
Counting objects: 100% (1310/1310), done.
Delta compression using up to 8 threads
Compressing objects: 100% (1242/1242), done.
Writing objects: 100% (1310/1310), done.
Building bitmaps: 100% (53/53), done.
Total 1310 (delta 245), reused 962 (delta 0), pack-reused 0
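
The reason BFG asks for this command is that rewritten-away objects stay on disk as long as a reflog entry still points at them. The sketch below illustrates that behaviour in a throwaway repository; the branch name tmp and the file names are hypothetical, and deleting a branch stands in for the history rewrite that BFG performs.

```shell
#!/bin/sh
# Sketch: an unreachable blob survives until the reflog is expired and gc prunes it.
set -eu

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

git init -q "$workdir/demo"
cd "$workdir/demo"
git config user.name demo
git config user.email demo@example.com

printf 'keep\n' > keep.txt
git add keep.txt
git commit -q -m "first commit"

# Add a large file on a side branch, then throw the branch away.
git checkout -q -b tmp
head -c 204800 /dev/urandom > big.bin
git add big.bin
git commit -q -m "add big.bin"
blob=$(git rev-parse HEAD:big.bin)
git checkout -q -
git branch -q -D tmp

# No branch contains the blob now, but the HEAD reflog still references the old commit.
if git cat-file -e "$blob" 2>/dev/null; then before=present; else before=gone; fi

# The same command BFG recommends: drop reflog entries, then prune unreachable objects.
git reflog expire --expire=now --all
git gc -q --prune=now --aggressive

if git cat-file -e "$blob" 2>/dev/null; then after=present; else after=gone; fi
echo "before gc: $before, after gc: $after"
```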

Then I tried to push to the remote, which should have been the last step, but it failed:

~/repo-clean/my-specific-project.git$ git push
Username for 'https://gitlab.com': my-username
Password for 'https://my-username@gitlab.com': 
Enumerating objects: 1310, done.
Writing objects: 100% (1310/1310), 2.08 GiB | 21.02 MiB/s, done.
Total 1310 (delta 0), reused 0 (delta 0), pack-reused 1310
remote: Resolving deltas: 100% (245/245), done.
remote: GitLab: Your push to this repository has been rejected because it would exceed storage limits. Please contact your GitLab administrator for more information.
To https://gitlab.com/our-projects/my-specific-project.git
! [remote rejected] master -> master (pre-receive hook declined)
! [remote rejected] refs/merge-requests/1/head -> refs/merge-requests/1/head (deny updating a hidden ref)
error: failed to push some refs to 'https://our-projects/my-specific-project.git'

It seems BFG has reduced the repo size to about 2.1 GB (this includes untracked directories such as venv and the data directory).

~/repo-clean/my-specific-project.git$ du -sh *
4.0K    branches
4.0K    config
4.0K    description
4.0K    HEAD
64K     hooks
12K     info
2.1G    objects
4.0K    packed-refs
16K     refs
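
To see which blobs still account for the remaining size, one option is to list every object reachable from any ref together with its size. This is a rough sketch in a throwaway repository, not output from the repo above; the file names and the 100 KiB threshold are made up for illustration. It also shows that a file deleted in a later commit is still carried in history.

```shell
#!/bin/sh
# Sketch: list blobs above a size threshold that are still reachable from any ref.
set -eu

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

git init -q "$workdir/demo"
cd "$workdir/demo"
git config user.name demo
git config user.email demo@example.com

printf 'print("hello")\n' > train.py
head -c 204800 /dev/urandom > model.h5       # 200 KiB stand-in for a large model file
git add train.py model.h5
git commit -q -m "add code and a model file"

# Deleting the file in a later commit does not remove it from history.
git rm -q model.h5
git commit -q -m "remove model file"

threshold=102400                             # bytes; hypothetical cut-off

# rev-list prints "<id> <path>"; cat-file adds type and size, %(rest) carries the path.
large_blobs=$(
  git rev-list --objects --all |
    git cat-file --batch-check='%(objecttype) %(objectsize) %(rest)' |
    awk -v limit="$threshold" '$1 == "blob" && $2 > limit {print $2, $3}' |
    sort -rn
)
printf '%s\n' "$large_blobs"                 # prints: 204800 model.h5
```

Run against the cleaned mirror, the pipeline in the middle (from git rev-list to sort) would show whatever large blobs the protected commits still hold.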

Note

I have also tried similar tools: git-filter-repo, which produced an error that I reported to the GitLab community without getting any help, and gitlab-rake, which did not work either.

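Pay attention to the warning. It tells you that your most recent commit, the one HEAD points to, contains three very large files. BFG treats that commit as protected, so it will not remove them.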

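As the warning says, you should git rm --cached those three files (and, while you are at it, any other large files you do not want), then git commit, and then run BFG again on a fresh copy of the repository. Because git rm needs a working tree, make that commit in a regular clone rather than in the --mirror clone.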

(Of course, make sure you also add all of these files to .gitignore, so that they are not accidentally added to a commit again.)
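
The steps above might look roughly like this. It is a sketch in a throwaway repository: the 4 KiB file only stands in for the real model files, and the *.h5 ignore pattern is an assumption about which files you want to keep out of Git.

```shell
#!/bin/sh
# Sketch: stop tracking a large file, ignore it from now on, and commit the change.
set -eu

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

git init -q "$workdir/demo"
cd "$workdir/demo"
git config user.name demo
git config user.email demo@example.com

# Starting point: a commit that contains both code and a model file.
mkdir models
head -c 4096 /dev/urandom > models/RF_modelSMF.h5   # small stand-in for the real file
printf 'print("train")\n' > train.py
git add .
git commit -q -m "code plus model file"

# 1. Remove the file from the index only; the copy on disk is kept.
git rm -q --cached models/RF_modelSMF.h5

# 2. Ignore such files so they cannot be staged again by accident.
printf '*.h5\n' >> .gitignore
git add .gitignore

# 3. Commit, so the newest commit no longer references the large file.
git commit -q -m "Stop tracking model files"

git ls-files                                        # the model file is no longer listed
```

After this commit is in place, the newest commit is clean, so a later BFG run on a fresh copy has no protected dirty content left to skip.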
