Speed up percona-server git checkout using repository snapshots (git bundles)
Activity

Julia Vural March 5, 2025 at 12:34 PM
If we had this, we would save a minute on each build, which adds up to quite a bit of savings. We would appreciate it if you could take a look when you have time.

Yura Sorokin April 3, 2023 at 1:51 PM
To be honest, we used to have "--depth=1" previously, and it did help a lot with download data size. However, we had to remove this option because some logic in the build scripts depends on being able to execute "git log" and get the real commit history. I am not saying we need absolutely everything here, but we do need the commit history between at least 2 consecutive minor releases (for Percona Server this is usually about 1500-2000 commits).
So I did a little experiment. The results:
Full history - 4.27 GiB
"--depth=0" - 442.16 MiB
"--depth=2000" - 4.14 GiB
In other words, for a realistic scenario with "--depth=2000", the download data size reduction is only about 3%, which is not worth the effort.
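As a quick check of the arithmetic, here is a minimal sketch that recomputes the relative savings from the three sizes quoted above. The helper name `reduction_percent` is made up for illustration; the labels are copied from the list as written.

```python
# Download sizes as quoted in the comment above, converted to MiB.
MIB_PER_GIB = 1024

sizes_mib = {
    "full history": 4.27 * MIB_PER_GIB,
    "--depth=0": 442.16,
    "--depth=2000": 4.14 * MIB_PER_GIB,
}


def reduction_percent(full: float, reduced: float) -> float:
    """Relative saving of `reduced` compared with `full`, in percent."""
    return (full - reduced) / full * 100.0


full = sizes_mib["full history"]
for label in ("--depth=0", "--depth=2000"):
    saving = reduction_percent(full, sizes_mib[label])
    print(f"{label}: {saving:.1f}% smaller than the full history")
```

The shallowest clone saves roughly 90% of the transfer, while the depth that keeps enough history for the build scripts saves only about 3%.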
What I really wanted as part of this task is to find a way to create a pre-built read-only disk image that can easily be mounted and shared between different EC2 instances (Jenkins workers).
We could put not only git repository snapshots (bundles) there, but also other huge dependencies (like the unpacked Boost libraries source tree).

surabhi.bhat March 28, 2023 at 5:27 AM
Hi,
Yes, we have already fixed the issue via https://jira.percona.com/browse/PXB-2904 by using a git shallow clone with the --depth=1 parameter, which fetches only the most recent commit and takes considerably less time as well. For example:
Could you please let me know if this alternative approach works and if the task can be closed?
Thanks,
Surabhi

Lenz Grimmer March 27, 2023 at 4:27 PM
Haven't we solved this issue using an alternative approach via PXB-2904 already? Does this still have to be implemented?

Aaditya Dubey December 5, 2022 at 9:40 AM
Hi,
Thank you for the report.

Description
The following approach can be used to speed up the process of checking out percona-server source code from GitHub.
We can periodically (say, every time a new version of Percona Server for MySQL is released) create percona-server repository snapshots using git bundle.
Currently (after release-8.0.30-22 has been merged into the 8.0 trunk), the size of percona-server.bundle is about 4 GB.
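A minimal sketch of how such a snapshot could be produced is shown here. The function names are hypothetical, and it assumes `repo_dir` is an up-to-date clone of percona-server (a `git clone --mirror` copy keeps every branch as a local ref, so `--all` captures them). The ticket does not say which `git bundle` options are intended, so `--all` is an assumption.

```python
import subprocess
from pathlib import Path
from typing import List


def bundle_create_cmd(bundle_path: str) -> List[str]:
    # --all packs every ref in the local repository into the bundle.
    return ["git", "bundle", "create", bundle_path, "--all"]


def bundle_verify_cmd(bundle_path: str) -> List[str]:
    # Checks that the bundle file is valid for the repository it is run in.
    return ["git", "bundle", "verify", bundle_path]


def create_snapshot(repo_dir: str, bundle_path: str) -> None:
    """Create and verify a bundle from the clone in repo_dir."""
    # Resolve to an absolute path because git runs with cwd=repo_dir.
    bundle = str(Path(bundle_path).resolve())
    for cmd in (bundle_create_cmd(bundle), bundle_verify_cmd(bundle)):
        subprocess.run(cmd, cwd=repo_dir, check=True)


# Print the command instead of running it (no repository needed).
print(" ".join(bundle_create_cmd("percona-server.bundle")))
```

Calling `create_snapshot("/path/to/percona-server.git", "percona-server.bundle")` after each release would refresh the snapshot; both paths are placeholders.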
percona-server.bundle must be put on shared storage accessible by Jenkins workers / Azure Pipelines workers. The easiest option would be AWS S3 or Azure Blob Storage. Other alternatives, such as a shared read-only disk image mounted directly into worker filesystems or including the bundle in Docker images, may also be considered.
Jenkins workers / Azure Pipelines workers will first download (or get direct access to) this percona-server.bundle.
Inside the workers, the procedure for checking out a specific branch (say, "release-8.0.31-23") will include the following steps:
Cloning the content of the snapshot from the bundle.
Adding a remote <remote> for the repository where <branch> resides (for instance, https://github.com/percona/percona-server.git).
Fetching the requested branch <branch> directly from the GitHub repo. Please note that at this stage only those commits that are not included in the snapshot percona-server.bundle will be downloaded.
Checking out <branch>.
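The four steps could be sketched as follows. This is an illustration under stated assumptions rather than the actual build-script logic: the function names, the working directory `percona-server`, and the remote name `github` are all hypothetical. The remote cannot be called `origin`, because `git clone` already assigns that name to the bundle file.

```python
import subprocess
from typing import List


def checkout_commands(bundle: str, workdir: str, remote: str,
                      url: str, branch: str) -> List[List[str]]:
    """Build the git commands for a bundle-based checkout of `branch`."""
    tracking_ref = f"refs/remotes/{remote}/{branch}"
    return [
        # 1. Clone the snapshot; the bundle file acts as the "origin" remote.
        ["git", "clone", bundle, workdir],
        # 2. Add the real repository under a separate remote name.
        ["git", "-C", workdir, "remote", "add", remote, url],
        # 3. Fetch only the requested branch; objects already present
        #    from the bundle do not need to be downloaded again.
        ["git", "-C", workdir, "fetch", remote,
         f"+refs/heads/{branch}:{tracking_ref}"],
        # 4. Create (or reset) the local branch at the fetched commit.
        ["git", "-C", workdir, "checkout", "-B", branch, tracking_ref],
    ]


def run_all(commands: List[List[str]]) -> None:
    """Execute the commands in order, stopping at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)


commands = checkout_commands(
    bundle="percona-server.bundle",
    workdir="percona-server",   # hypothetical target directory
    remote="github",            # hypothetical remote name
    url="https://github.com/percona/percona-server.git",
    branch="release-8.0.31-23",
)
for cmd in commands:
    print(" ".join(cmd))  # dry run: show what would be executed
```

Passing `commands` to `run_all` on a worker that already has the bundle would perform the checkout. The explicit refspec in step 3 is a design choice that makes the fetched ref predictable for step 4.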
The same idea can be applied to submodules.