Caching APT packages in GitHub Actions workflow

Question

I use the following GitHub Actions workflow for my C project. The workflow finishes in ~40 seconds, but more than half of that time is spent installing the valgrind package and its dependencies.

I believe caching could help me speed up the workflow. I do not mind waiting a couple of extra seconds, but this just seems like a pointless waste of GitHub's resources.

    name: C Workflow

    on: [push, pull_request]

    jobs:
      build:
        runs-on: ubuntu-latest

        steps:
        - uses: actions/checkout@v1

        - name: make
          run: make

        - name: valgrind
          run: |
            sudo apt-get install -y valgrind
            valgrind -v --leak-check=full --show-leak-kinds=all ./bin

Running sudo apt-get install -y valgrind installs the following packages:

gdb
gdbserver
libbabeltrace1
libc6-dbg
libipt1
valgrind

I know Actions support caching of a specific directory (and there are already several answered SO questions and articles about this), but I am not sure where all the different packages installed by apt end up. I assume /bin/ or /usr/bin/ are not the only directories affected by installing packages.

Is there an elegant way to cache the installed system packages for future workflow runs?

smac89 · Accepted Answer · 2022-05-16 16:21:08Z

The purpose of this answer is to show how caching can be done with GitHub Actions, not necessarily how to cache valgrind specifically (although it does that too). I also try to explain why not everything can or should be cached: the time it takes to save and restore a cache has to be weighed against the time it takes to simply reinstall the dependency.

You will make use of the actions/cache action to do this.

Add it as a step (before you need to use valgrind):

    - name: Cache valgrind
      uses: actions/cache@v2
      id: cache-valgrind
      with:
        path: "~/valgrind"
        key: ${{secrets.VALGRIND_VERSION}}

The next step should attempt to install the cached version if any or install from the repositories:

    - name: Install valgrind
      env:
        CACHE_HIT: ${{steps.cache-valgrind.outputs.cache-hit}}
        VALGRIND_VERSION: ${{secrets.VALGRIND_VERSION}}
      run: |
        if [[ "$CACHE_HIT" == 'true' ]]; then
          sudo cp --verbose --force --recursive ~/valgrind/* /
        else
          sudo apt-get install --yes valgrind="$VALGRIND_VERSION"
          mkdir -p ~/valgrind
          sudo dpkg -L valgrind | while IFS= read -r f; do if test -f $f; then echo $f; fi; done | xargs cp --parents --target-directory ~/valgrind/
        fi

Explanation

Set VALGRIND_VERSION secret to be the output of:

apt-cache policy valgrind | grep -oP '(?<=Candidate:\s)(.+)'

This allows you to invalidate the cache when a new version is released, simply by changing the value of the secret.
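If you would rather compute the version inside the workflow than paste it into a secret by hand, the same value can be extracted with a small helper. This is only a sketch: the function names and the key layout are my own, and it uses sed instead of the grep -oP lookbehind shown above.

```shell
#!/bin/sh
# Print the "Candidate:" version from `apt-cache policy <pkg>` output on stdin.
candidate_version() {
    sed -n 's/^[[:space:]]*Candidate:[[:space:]]*//p'
}

# Build a cache key of the form "<package>-<candidate version>".
# The key layout is an illustrative choice, not something the answer prescribes.
cache_key_for() {
    pkg=$1
    printf '%s-%s\n' "$pkg" "$(apt-cache policy "$pkg" | candidate_version)"
}
```

In a workflow step you could then write something like echo "key=$(cache_key_for valgrind)" >> "$GITHUB_OUTPUT" and use that step output as the cache key, so the cache is invalidated automatically when the candidate version changes.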

dpkg -L valgrind is used to list all the files installed when using sudo apt-get install valgrind.

What we can now do with this command is to copy all the dependencies to our cache folder:

dpkg -L valgrind | while IFS= read -r f; do if test -f $f; then echo $f; fi; done | xargs cp --parents --target-directory ~/valgrind/
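The one-liner above can also be written as a small function, which makes it easier to see what happens to each path. A minimal sketch, assuming GNU cp (for --parents); the function name stage_files is made up for illustration.

```shell
#!/bin/sh
# Read file paths on stdin (one per line, as `dpkg -L <pkg>` prints them) and
# copy every regular file into the staging directory $1, keeping its full path.
# Directories in the list are skipped. Requires GNU cp for --parents.
stage_files() {
    dest=$1
    mkdir -p "$dest"
    while IFS= read -r f; do
        if [ -f "$f" ]; then
            cp --parents "$f" "$dest"
        fi
    done
}
```

Usage would be dpkg -L valgrind | stage_files ~/valgrind. Because each path is quoted and copied individually, this variant also copes with file names that contain spaces, which the xargs version does not.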

Furthermore

In addition to copying all the components of valgrind, it may also be necessary to copy its dependencies (such as libc in this case), but I don't recommend continuing along this path because the dependency chain just grows from there. To be precise, the dependencies that need to be copied to end up with an environment in which valgrind can run are:

libc6
libgcc1
gcc-8-base

To copy all these dependencies, you can use the same syntax as above:

    for dep in libc6 libgcc1 gcc-8-base; do
        dpkg -L $dep | while IFS= read -r f; do if test -f $f; then echo $f; fi; done | xargs cp --parents --target-directory ~/valgrind/
    done

Is all this work really worth the trouble when all that is required to install valgrind in the first place is to simply run sudo apt-get install valgrind? If your goal is to speed up the build process, then you also have to take into consideration the amount of time it is taking to restore (downloading, and extracting) the cache vs simply running the command again to install valgrind.

And finally to restore the cache, assuming it is stored at /tmp/valgrind, you can use the command:

cp --force --recursive /tmp/valgrind/* /

This basically copies all the files from the cache onto the root partition.
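For trying the restore step out without touching the real root, it helps to make the target directory a parameter. A sketch under that assumption (restore_cache is a hypothetical name, and GNU cp is assumed for the long options):

```shell
#!/bin/sh
# Copy everything under the cache directory $1 onto the target root $2
# (default "/"). Using "$1/." rather than "$1/*" also picks up dot-files
# at the top level of the cache. On a real runner, writing to "/" needs sudo.
restore_cache() {
    cache_dir=$1
    target=${2:-/}
    cp --force --recursive "$cache_dir"/. "$target"
}
```

Existing files in the target that are not part of the cache are left alone; files with the same path are overwritten.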

In addition to the process above, I also have an example of "caching valgrind" by installing and compiling it from source. The cache is now about 63MB (compressed) in size and one still needs to separately install libc which kind of defeats the purpose.

Note: Another answer to this question proposes what I would consider to be a safer approach to caching dependencies: using a container which comes with the dependencies pre-installed. The best part is that you can use actions to keep those containers up to date.


Oh, I see, that's ingenious. I had no idea you could safely take all the installed files and just move them to another directory without breaking something. I'm not sure it works though. I ran the workflow 3 times and always got Cache not found for input keys: ***. I added the VALGRIND_VERSION secret in Settings > Secrets, is that right?

I've managed to get a cache hit now, but I'm getting the following error from valgrind:

    --2906-- Reading syms from /lib/x86_64-linux-gnu/ld-2.27.so
    --2906-- Considering /lib/x86_64-linux-gnu/ld-2.27.so ..
    --2906-- .. CRC mismatch (computed 1b7c895e wanted 2943108a)
    --2906-- object doesn't have a symbol table

@natiiix there is a possibility that caching valgrind made it so that the libc dependency is not installed when the cache is retrieved. I am not near a monitor now, but I looked up your error and it seems like it is a bug with valgrind. You can try also installing libc version 6 and see if that helps. I will update the answer later today.

Yes, it seems so. If I add sudo apt-get install -y libc6-dbg, then it works fine, but then I'm also back where I started because the installation of that package takes 30 more seconds.

@natiiix It seems that caching valgrind may be more work than anticipated, but at least this shows how caching can be done on Ubuntu. Looking at the dependencies of valgrind, there are at least 6 of them, and I think they probably all need to be cached if this is to work.

deivid · Accepted Answer · 2020-03-29 21:02:46Z


You could create a Docker image with valgrind preinstalled and run your workflow on that.

Create a Dockerfile with something like:

    FROM ubuntu
    RUN apt-get update && apt-get install -y valgrind

Build it and push it to Docker Hub:

    docker build -t natiiix/valgrind .
    docker push natiiix/valgrind

Then use something like the following as your workflow:

    name: C Workflow

    on: [push, pull_request]

    jobs:
      build:
        runs-on: ubuntu-latest
        container: natiiix/valgrind

        steps:
        - uses: actions/checkout@v1

        - name: make
          run: make

        - name: valgrind
          run: valgrind -v --leak-check=full --show-leak-kinds=all ./bin

Completely untested, but you get the idea.


natiiix Over a year ago

This is a very interesting idea, but it kind of undermines the whole principle of letting GitHub Actions cache the environment / artifacts for future runs and instead requires some additional effort from my side. On the other hand, once done, this could probably be reused quite easily.

deivid Over a year ago

It's up to you to decide what works best for you, or what requires the most effort from your side ¯_(ツ)_/¯

Pure Function Over a year ago

Personally I think this is the most sensible answer, since the other answers that cache the dependency manually show how fraught that is.

smac89 Over a year ago

I honestly like this idea better than my answer. You can even create a separate workflow that continuously builds the container and deploys it to GitHub itself.

Cecil Curry Over a year ago

This is the way. The accepted answer is hyper-fragile and clearly guaranteed to fail. While deploying yet another third-party hosting service also adds a modicum of fragility, the one-line triviality of this solution speaks volumes for Docker. Like it or (more likely) not, this is the only sane solution for caching system-wide apt packages.


Israel Alberto RV · Accepted Answer · 2020-12-24 17:33:49Z

Update: I created a GitHub action that implements this solution with less code and better optimizations: Cache Anything New.

This solution is similar to the most voted one. I tried that solution, but it didn't work for me because I was installing texlive-latex and pandoc, which have many dependencies and sub-dependencies.

I created a solution which should help many people. It covers two cases: installing a couple of packages (apt install), and building a program (make) when the build takes a while.

Solution:

A step that contains all the logic and builds the cache:
- Use find to create a list of all the files in the container.
- Install the packages or build the programs, whatever you want to cache.
- Use find again to create a second list of all the files in the container.
- Use diff to get the newly created files.
- Add these new files to the cache directory. This directory is stored automatically by actions/cache@v2.
A step that loads the created cache:
- Copy all the files from the cache directory to the main path /.
The steps that benefit from the cache, plus any other steps that you need.
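The snapshot, install, snapshot, compare, copy sequence above can be tried out on a small directory tree before pointing it at /. The sketch below is a simplified variant and not the code from the workflow: it compares sorted lists with comm instead of diff, the function names are made up, and GNU cp is assumed for --parents.

```shell
#!/bin/sh
# Write a sorted list of all regular files and symlinks under $1 to stdout.
snapshot() {
    find "$1" \( -type f -o -type l \) | LC_ALL=C sort
}

# Print the paths that appear only in the later snapshot file $2,
# i.e. the files created between the two snapshots.
new_files() {
    LC_ALL=C comm -13 "$1" "$2"
}

# Copy each path listed on stdin into the cache directory $1,
# keeping the full path and the file attributes.
copy_to_cache() {
    while IFS= read -r path; do
        cp -a --parents "$path" "$1"
    done
}
```

In a workflow the installation would run between the two snapshot calls, and the output of new_files would be piped into copy_to_cache with the directory that actions/cache stores.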

When to use this?

Without a cache, the whole process (including the package installation) took about 2 minutes.
With the cache, the first run takes 7 to 10 minutes because the cache has to be created.
- Later runs that use the cache take about 1 minute for the whole process.
It is only worthwhile if your main process takes a lot of time, and it is most convenient if you deploy very often.
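To put the rough timings above into perspective, you can work out after how many cached runs the expensive first run has paid for itself. A small sketch with a hypothetical helper; it only does integer arithmetic on whole minutes and assumes the timings stay constant from run to run.

```shell
#!/bin/sh
# Number of cache-hit runs needed before the one-off cost of building the
# cache is paid back. All arguments are whole minutes:
#   $1 = run that builds the cache
#   $2 = run without any cache
#   $3 = run with a cache hit
break_even_runs() {
    extra=$(( $1 - $2 ))   # extra time spent on the cache-building run
    saved=$(( $2 - $3 ))   # time saved by each later cache hit
    if [ "$saved" -le 0 ]; then
        echo "never"
        return 0
    fi
    if [ "$extra" -le 0 ]; then
        echo 0
        return 0
    fi
    echo $(( (extra + saved - 1) / saved ))   # ceiling division
}
```

With the figures quoted here, break_even_runs 7 2 1 gives 5 and break_even_runs 10 2 1 gives 8, so the cache only starts to pay off after roughly five to eight runs that hit it.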

Implementation:

Source code: .github/workflows
Landing page of my actions: workflows.

release.yml

    name: CI - Release books

    on:
      release:
        types: [ released ]
      workflow_dispatch:

    jobs:
      build:
        runs-on: ubuntu-18.04
        steps:
          - uses: actions/checkout@v2

          - uses: actions/cache@v2
            id: cache-packages
            with:
              path: ${{ runner.temp }}/cache-linux
              key: ${{ runner.os }}-cache-packages-v2.1

          - name: Install packages
            if: steps.cache-packages.outputs.cache-hit != 'true'
            env:
              SOURCE: ${{ runner.temp }}/cache-linux
            run: |
              set +xv
              echo "# --------------------------------------------------------"
              echo "# Action environment variables"
              echo "github.workspace: ${{ github.workspace }}"
              echo "runner.workspace: ${{ runner.workspace }}"
              echo "runner.os: ${{ runner.os }}"
              echo "runner.temp: ${{ runner.temp }}"
              echo "# --------------------------------------------------------"
              echo "# Where am I?"
              pwd
              echo "SOURCE: ${SOURCE}"
              ls -lha /
              sudo du -h -d 1 / 2> /dev/null || true
              echo "# --------------------------------------------------------"
              echo "# APT update"
              sudo apt update
              echo "# --------------------------------------------------------"
              echo "# Set up snapshot"
              mkdir -p "${{ runner.temp }}"/snapshots/
              echo "# --------------------------------------------------------"
              echo "# Install tools"
              sudo rm -f /var/lib/apt/lists/lock
              #sudo apt install -y vim bash-completion
              echo "# --------------------------------------------------------"
              echo "# Take first snapshot"
              sudo find / \
                    -type f,l \
                    -not \( -path "/sys*" -prune \) \
                    -not \( -path "/proc*" -prune \) \
                    -not \( -path "/mnt*" -prune \) \
                    -not \( -path "/dev*" -prune \) \
                    -not \( -path "/run*" -prune \) \
                    -not \( -path "/etc/mtab*" -prune \) \
                    -not \( -path "/var/cache/apt/archives*" -prune \) \
                    -not \( -path "/tmp*" -prune \) \
                    -not \( -path "/var/tmp*" -prune \) \
                    -not \( -path "/var/backups*" \) \
                    -not \( -path "/boot*" -prune \) \
                    -not \( -path "/vmlinuz*" -prune \) \
                    > "${{ runner.temp }}"/snapshots/snapshot_01.txt 2> /dev/null \
                    || true
              echo "# --------------------------------------------------------"
              echo "# Install pandoc and dependencies"
              sudo apt install -y texlive-latex-extra wget
              wget -q https://github.com/jgm/pandoc/releases/download/2.11.2/pandoc-2.11.2-1-amd64.deb
              sudo dpkg -i pandoc-2.11.2-1-amd64.deb
              rm -f pandoc-2.11.2-1-amd64.deb
              echo "# --------------------------------------------------------"
              echo "# Take second snapshot"
              sudo find / \
                    -type f,l \
                    -not \( -path "/sys*" -prune \) \
                    -not \( -path "/proc*" -prune \) \
                    -not \( -path "/mnt*" -prune \) \
                    -not \( -path "/dev*" -prune \) \
                    -not \( -path "/run*" -prune \) \
                    -not \( -path "/etc/mtab*" -prune \) \
                    -not \( -path "/var/cache/apt/archives*" -prune \) \
                    -not \( -path "/tmp*" -prune \) \
                    -not \( -path "/var/tmp*" -prune \) \
                    -not \( -path "/var/backups*" \) \
                    -not \( -path "/boot*" -prune \) \
                    -not \( -path "/vmlinuz*" -prune \) \
                    > "${{ runner.temp }}"/snapshots/snapshot_02.txt 2> /dev/null \
                    || true
              echo "# --------------------------------------------------------"
              echo "# Filter new files"
              diff -C 1 \
                  --color=always \
                  "${{ runner.temp }}"/snapshots/snapshot_01.txt \
                  "${{ runner.temp }}"/snapshots/snapshot_02.txt \
                  | grep -E "^\+" \
                  | sed -E s/..// \
                  > "${{ runner.temp }}"/snapshots/snapshot_new_files.txt
              < "${{ runner.temp }}"/snapshots/snapshot_new_files.txt wc -l
              ls -lha "${{ runner.temp }}"/snapshots/
              echo "# --------------------------------------------------------"
              echo "# Make cache directory"
              rm -fR "${SOURCE}"
              mkdir -p "${SOURCE}"
              while IFS= read -r LINE
              do
                sudo cp -a --parent "${LINE}" "${SOURCE}"
              done < "${{ runner.temp }}"/snapshots/snapshot_new_files.txt
              ls -lha "${SOURCE}"
              echo ""
              sudo du -sh "${SOURCE}" || true
              echo "# --------------------------------------------------------"

          - name: Copy cached packages
            if: steps.cache-packages.outputs.cache-hit == 'true'
            env:
              SOURCE: ${{ runner.temp }}/cache-linux
            run: |
              echo "# --------------------------------------------------------"
              echo "# Using Cached packages"
              ls -lha "${SOURCE}"
              sudo cp --force --recursive "${SOURCE}"/. /
              echo "# --------------------------------------------------------"

          - name: Generate release files and commit in GitHub
            run: |
              echo "# --------------------------------------------------------"
              echo "# Generating release files"
              git fetch --all
              git pull --rebase origin main
              git checkout main
              cd ./src/programming-from-the-ground-up
              ./make.sh
              cd ../../
              ls -lha release/
              git config --global user.name 'Israel Roldan'
              git config --global user.email '[email protected]'
              git add .
              git status
              git commit -m "Automated Release."
              git push
              git status
              echo "# --------------------------------------------------------"

Explaining some pieces of the code:

Here is the cache action. The key is generated once and compared in later executions. The path is the directory whose files are packed into the compressed cache file.

    - uses: actions/cache@v2
      id: cache-packages
      with:
        path: ${{ runner.temp }}/cache-linux
        key: ${{ runner.os }}-cache-packages-v2.1

These conditions check the result of the cache lookup: if an entry exists for the key, cache-hit is 'true'.

    if: steps.cache-packages.outputs.cache-hit != 'true'
    if: steps.cache-packages.outputs.cache-hit == 'true'

It's not critical, but the first time the du command runs, Linux indexes all the files (5~8 minutes); after that, find takes only ~50 seconds to list all the files. You can delete this line if you want.

The || true suffix keeps a non-zero exit status of the command (whose error messages are discarded by 2> /dev/null) from failing the step; otherwise the action would stop because it detects that your script returned an error. You will see this a couple of times in the script.

sudo du -h -d 1 / 2> /dev/null || true
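The effect of || true is easy to reproduce locally. In this sketch the subshell switches on set -e, which makes the shell exit at the first failing command, much like a failing command ends a run step; the function name and the non-existent path are made up for the demonstration.

```shell
#!/bin/sh
# Run a command that fails, but keep the surrounding script alive.
# `set -e` makes the shell exit on the first failing command; appending
# `|| true` turns the overall status of that one command into success.
tolerant_listing() {
    (
        set -e
        ls /nonexistent-path-for-demo 2> /dev/null || true
        echo "still running"
    )
}
```

Calling tolerant_listing prints still running. If you remove the || true, the subshell exits at the ls line and nothing is printed.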

This is the magical part: use find to generate a list of the current files, excluding some directories to keep the cache folder small. It is executed again after the installations and builds; for that second snapshot the file name must be different: snapshot_02.txt.

    sudo find / \
          -type f,l \
          -not \( -path "/sys*" -prune \) \
          -not \( -path "/proc*" -prune \) \
          -not \( -path "/mnt*" -prune \) \
          -not \( -path "/dev*" -prune \) \
          -not \( -path "/run*" -prune \) \
          -not \( -path "/etc/mtab*" -prune \) \
          -not \( -path "/var/cache/apt/archives*" -prune \) \
          -not \( -path "/tmp*" -prune \) \
          -not \( -path "/var/tmp*" -prune \) \
          -not \( -path "/var/backups*" \) \
          -not \( -path "/boot*" -prune \) \
          -not \( -path "/vmlinuz*" -prune \) \
          > "${{ runner.temp }}"/snapshots/snapshot_01.txt 2> /dev/null \
          || true
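If the long chain of -not \( ... -prune \) clauses is hard to read, the same idea can be tried on a small tree with the more common "prune or print" form of find. This is a sketch, not the command from the workflow: the function name and the two excluded directory names are only examples.

```shell
#!/bin/sh
# List regular files and symlinks under $1, without descending into the
# "proc" and "tmp" directories directly below it.
snapshot_without_volatile_dirs() {
    root=$1
    find "$root" \
        \( -path "$root/proc" -o -path "$root/tmp" \) -prune \
        -o \( -type f -o -type l \) -print \
        | LC_ALL=C sort
}
```

Note that a pattern with a trailing *, such as "/sys*" in the original command, also matches any other top-level name that starts with the same letters, so exact paths like the ones in this sketch are a bit more selective.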

Install some packages and pandoc.

    sudo apt install -y texlive-latex-extra wget
    wget -q https://github.com/jgm/pandoc/releases/download/2.11.2/pandoc-2.11.2-1-amd64.deb
    sudo dpkg -i pandoc-2.11.2-1-amd64.deb
    rm -f pandoc-2.11.2-1-amd64.deb

Generate the text file listing the newly added files; these can be symbolic links, too.

    diff -C 1 \
        "${{ runner.temp }}"/snapshots/snapshot_01.txt \
        "${{ runner.temp }}"/snapshots/snapshot_02.txt \
        | grep -E "^\+" \
        | sed -E s/..// \
        > "${{ runner.temp }}"/snapshots/snapshot_new_files.txt

Finally, copy all the files into the cache directory in archive mode (cp -a) to preserve their original attributes.

    while IFS= read -r LINE
    do
      sudo cp -a --parent "${LINE}" "${SOURCE}"
    done < "${{ runner.temp }}"/snapshots/snapshot_new_files.txt

Step to copy all the cached files into the main path /.

    - name: Copy cached packages
      if: steps.cache-packages.outputs.cache-hit == 'true'
      env:
        SOURCE: ${{ runner.temp }}/cache-linux
      run: |
        echo "# --------------------------------------------------------"
        echo "# Using Cached packages"
        ls -lha "${SOURCE}"
        sudo cp --force --recursive "${SOURCE}"/. /
        echo "# --------------------------------------------------------"

This step is where I use the packages restored from the cache: the ./make.sh script uses pandoc to do some conversions. As I mentioned, you can add other steps that benefit from the cache, or steps that do not use it at all.

    - name: Generate release files and commit in GitHub
      run: |
        echo "# --------------------------------------------------------"
        echo "# Generating release files"
        cd ./src/programming-from-the-ground-up
        ./make.sh

Andry · Accepted Answer · 2022-09-04 07:25:39Z

For instance, several implementations already exist:

https://github.com/awalsh128/cache-apt-pkgs-action
- installs and uses apt-fast from https://git.io/vokNn instead of calling apt-get directly (https://askubuntu.com/questions/52243/what-is-apt-fast-and-should-i-use-it)
- generates a unique cache directory name from the input package list
- uses dpkg -L to list changes
- tars package files into ${cache_dir}/${installed_package}.tar (without compression).
  Compression is not required because actions/cache already compresses:
  https://github.com/awalsh128/cache-apt-pkgs-action/issues/46
  https://github.com/awalsh128/cache-apt-pkgs-action/pull/53
https://github.com/airvzxf/cache-anything-new-action
- scans the Linux container for anything new that was added after your custom script ran, then caches all the new files
- the script must be in a standalone file inside the GitHub workflows directory
- does not generate a unique cache directory name
- can exclude user directories from the scan
- can be much slower than just using dpkg -L, but finds all the changes in the file system
https://github.com/Mudlet/xmlstarlet-action
- example of a Dockerfile that runs xmlstarlet with arguments
- limited to a static or already committed Dockerfile and entrypoint.sh; cannot use an external script or instruction set
- must be used from the GitHub Actions pipeline only; cannot be called from an inner bash or other script, because the install and the run cannot be separated
- ~50% slower than a single apt-get install, but can be faster for multiple packages
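Two of the ideas listed for the first action, a cache name derived from the package list and an uncompressed tar per package, can be sketched in a few lines. This is only an illustration of the technique, not the actual implementation of any of these actions; the function names are made up, and GNU tar and sha256sum are assumed.

```shell
#!/bin/sh
# Derive a key from a package list that does not depend on argument order.
packages_key() {
    printf '%s\n' "$@" | LC_ALL=C sort -u | sha256sum | cut -d ' ' -f 1
}

# Pack the regular files listed on stdin (e.g. from `dpkg -L <pkg>`) into
# the uncompressed tar archive $1. GNU tar strips the leading "/" from
# member names and prints a notice about it on stderr.
pack_files() {
    while IFS= read -r f; do
        if [ -f "$f" ]; then
            printf '%s\n' "$f"
        fi
    done | tar -cf "$1" -T -
}

# Unpack archive $1 relative to the root directory $2 (default "/").
unpack_files() {
    tar -xf "$1" -C "${2:-/}"
}
```

For example, dpkg -L valgrind | pack_files "$cache_dir/valgrind.tar" on a cache miss, and sudo tar -xf "$cache_dir/valgrind.tar" -C / (or unpack_files run with enough permissions) on a cache hit, with cache_dir being whatever directory you hand to actions/cache.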

Thank you for keeping this question's answers updated even after nearly 3 years. When I originally asked, Actions was still a closed beta feature, and it's obviously widely adopted across industries nowadays, so it's a very different situation with an incredibly different level of support from all sides.

Leonardo C. · Accepted Answer · 2023-04-06 15:38:19Z

It should be possible to use an apt-cacher-ng container as a service to cache apt downloads. You would then just have to set up apt-get to use the local proxy provided by this service container; the apt-cacher-ng docs have a how-to on setting up an apt proxy. The final step would be to cache the apt-cacher-ng cache directory via the GitHub cache action.

If I ever get around to testing this, I will update this answer.

Locally I have an apt-cacher-ng container set up, and its cache is in ~/.dockercache/apt-cacher-ng. So I do believe the theory is sound.
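Pointing apt at such a proxy comes down to one configuration line. A minimal sketch, untested against a service container like the rest of this answer: the helper name and the file name 01proxy are arbitrary, and the host and port must match wherever the apt-cacher-ng container is reachable (3142 is its default port).

```shell
#!/bin/sh
# Write an APT configuration snippet that routes HTTP downloads through a proxy.
#   $1 = proxy URL, e.g. http://localhost:3142
#   $2 = file to write; on a real system a file under /etc/apt/apt.conf.d/
write_apt_proxy_conf() {
    printf 'Acquire::http::Proxy "%s";\n' "$1" > "$2"
}
```

On a runner this could be used as write_apt_proxy_conf http://localhost:3142 /tmp/01proxy followed by sudo mv /tmp/01proxy /etc/apt/apt.conf.d/01proxy, before the first apt-get call.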

koppor · Accepted Answer · 2025-08-14 10:11:48Z

Even though the following action executes an apt-get update each time, it had no issues in my setting: https://github.com/Eeems-Org/apt-cache-action

    - name: Install valgrind
      uses: Eeems-Org/apt-cache-action@v1
      with:
        packages: valgrind
