
Problem:

My program is given a list of (requirements_123.txt, program_123.py) pairs (actually a list of script lines like pip install a==1 b==2 c==3 && python program_123.py).

My program needs to run each program in an isolated virtual environment based on the current environment.

Requirements:

  • Current environment is not modified
  • Program environment is based on the current environment
  • No reinstalling the packages from the current env: it's slow, and it does not really work (package sources might be missing, build tools might be missing). No pip freeze | pip install, please.
  • Fast. Copying gigabytes of files from current environment to a new environment every time is too slow. Symlinking might be OK as a last resort.

Ideal solution: I set some environment variables for each program, pointing to a new virtual environment dir, and then just execute the script and pip does the right thing.

How can I do this?

What do I mean by "overlay": Python already has some "overlays". There are system packages and user packages. User packages "shadow" the system packages, but non-shadowed system packages are still visible to the programs. When pip installs the packages in the user directory, it does not uninstall the system package version. This is the exact behavior I need. I just need a third overlay layer: "system packages", "user packages", "program packages".
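For illustration, here is a rough sketch of what that third layer could look like using nothing but environment variables. pip's --target option and the PYTHONPATH variable are real; the overlay path and package names are placeholders, and pip will not install into a PYTHONPATH directory on its own (you would have to pass --target every time), which is why this only approximates the ideal solution above:

# Sketch: a per-program "overlay" directory (the path is a placeholder).
overlay=/tmp/program_123_overlay
# Install the program's packages into the overlay only; the current env is untouched.
pip install --target "$overlay" a==1 b==2 c==3
# PYTHONPATH entries come before site-packages on sys.path, so overlay packages
# shadow user and system packages, while non-shadowed packages remain visible.
PYTHONPATH="$overlay" python program_123.py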

Related questions (but they do not consider the user dir packages, only the virtual environments):

"Cascading" virtual Python environnements Is it possible to create nested virtual environments for python?

P.S.

Regarding the comment "If pip freeze doesn't even work, you have much larger problems lurking":

There are many reasons why the result of pip freeze > requirements.txt does not work in practice:

  • System-installed packages installed using apt.
  • Packages installed from package indexes other than PyPI (PyTorch does that); the package conda-package-handling, for example, is not on PyPI.
  • Conda packages.
  • Packages built from source some time ago (and your compilers are different now).
  • Installs from git or zip/whl files.
  • Editable installs.

I've just checked a default notebook instance in Google Cloud and almost half of the pip freeze list looks like this:

threadpoolctl @ file:///tmp/tmp79xdzxkt/threadpoolctl-2.1.0-py3-none-any.whl
tifffile @ file:///home/conda/feedstock_root/build_artifacts/tifffile_1597357726309/work

Anyway, this is just one of the many reasons why pip freeze | pip install does not work in practice.
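If you want a rough check of how many such entries your own environment has, you can grep pip freeze for direct references; the " @ " marker is how pip prints them. This is only a quick heuristic, not an exact classification:

# Rough heuristic: count direct-reference (non-index) entries in pip freeze output.
pip freeze | grep -c ' @ '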

5 Comments

  • ...why? If these are independent programs, just run each in their own venv? That's literally what a virtual environment is for? Commented Feb 20, 2023 at 22:54
  • I do not think it is feasible. Maybe try the conda ecosystem; as far as I understand, if you have two environments with the same library, it still has only 1x the disk-space footprint. Otherwise, maybe you can do things with symlinks and/or with .pth files, but it is a bit complicated. I am pretty sure this question has been asked multiple times here before; maybe there are some better ideas in the answers. Commented Feb 21, 2023 at 9:51
  • So every time your program runs, you need to install a different set of packages, which may not be compatible with each other? Or do you just install your "master program" once? Why does it have to be fast? Are you creating some sort of custom CI / automated testing system? Commented Feb 21, 2023 at 9:57
  • stackoverflow.com/q/74436125 -- stackoverflow.com/q/50953575 -- stackoverflow.com/q/61019081 Commented Feb 21, 2023 at 9:58
  • Honestly, it sounds like you need to fix that setup, not go "how do I keep working with this broken system". If pip freeze doesn't even work, you have much larger problems lurking. Commented Feb 22, 2023 at 15:58

1 Answer


You can add a .pth file (a site module feature) to the site packages directory of your derived virtual environment with a line pointing to the site-packages path of your base virtual environment.

In shell, you can do it like this:

# Assumes that the base virtual environment exists; activate it.
. base/bin/activate

# Create the derived virtual environment.
python -m venv ./derived

# Make the derived virtual environment import base's packages too.
base_site_packages="$(python -c 'import sysconfig; print(sysconfig.get_paths()["purelib"])')"
derived_site_packages="$(./derived/bin/python -c 'import sysconfig; print(sysconfig.get_paths()["purelib"])')"
echo "$base_site_packages" > "$derived_site_packages"/_base_packages.pth

base_site_packages is usually base/lib/python<VERSION>/site-packages; the code to get it is taken from https://stackoverflow.com/a/46071447/3063, and the same goes for derived_site_packages.

The packages installed in the base environment will be available in the derived environment. You can verify this by doing pip list in the derived environment.

# Deactivating the base environment is optional,
# meaning that the derived environment can be activated directly too.
deactivate
. ./derived/bin/activate
pip list

To install your custom Python packages and run your script in the custom environment, you don't necessarily need to activate the derived environment. You can call the derived Python environment's pip and python directly and it should just work:

./derived/bin/pip install a==1 b==2 c==3
./derived/bin/python program_123.py
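Tying this back to the question's list of (requirements_123.txt, program_123.py) pairs, a loop along these lines could create one derived environment per program. This is only a sketch: it assumes the base environment is active, that $base_site_packages is set as above, and that the files are named like requirements_<ID>.txt / program_<ID>.py.

# Sketch: one derived environment per (requirements, program) pair.
for req in requirements_*.txt; do
    id=${req#requirements_}; id=${id%.txt}
    env_dir="./derived_$id"
    python -m venv "$env_dir"
    # Point the derived env's site-packages at the base env's packages.
    derived_sp="$("$env_dir"/bin/python -c 'import sysconfig; print(sysconfig.get_paths()["purelib"])')"
    echo "$base_site_packages" > "$derived_sp/_base_packages.pth"
    # Install the program's own packages and run it.
    "$env_dir"/bin/pip install -r "$req"
    "$env_dir"/bin/python "program_$id.py"
done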

1 Comment

Thank you. Looks like this should work. (We'll need to add the user packages to the .pth file too.)
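For completeness, the user site-packages directory the comment mentions could be appended as a second line in the same .pth file. python -m site --user-site is a real command that prints that directory; treat the rest as an untested sketch:

# Sketch: also expose the user site-packages via a second .pth line.
user_site="$(python -m site --user-site)"
echo "$user_site" >> "$derived_site_packages"/_base_packages.pth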
