
I frequently have to re-create virtual environments from a requirements.txt and I am already using $PIP_DOWNLOAD_CACHE. It still takes a lot of time and I noticed the following:

Pip spends a lot of time between the following two lines:

Downloading/unpacking SomePackage==1.4 (from -r requirements.txt (line 2))
Using download cache from $HOME/.pip_download_cache/cached_package.tar.gz

It takes ~20 seconds on average to decide it's going to use the cached package; the install itself is then fast. This adds up to a lot of time when you have to install dozens of packages (enough, in fact, to prompt this question).

What is going on in the background? Are there some sort of integrity checks against the online package?

Is there a way to speed this up?

edit: Looking at:

time pip install -v Django==1.4 

I get:

real 1m16.120s
user 0m4.312s
sys  0m1.280s

The full output is here http://pastebin.com/e4Q2B5BA. It looks like pip is spending its time looking for a valid download link even though it already has a valid cache of http://pypi.python.org/packages/source/D/Django/Django-1.4.tar.gz.

Is there a way to look for the cache first and stop there if versions match?

  • I'm wondering if this has anything to do with Issue #304, "install -U foo reinstalls foo's dependencies even if they're already satisfied" ( github.com/pypa/pip/issues/304 ). This is probably totally unrelated, but it's another weird pip issue. Commented Sep 13, 2012 at 15:28
  • In this case I'm just installing new packages in a clean virtualenv, no upgrades. Commented Sep 13, 2012 at 15:53
  • Yeah, I just meant to suggest that since the upgrade code hits the net to check for / get packages even when it really shouldn't - there might be something in the 'use cached' code that does the same. Commented Sep 13, 2012 at 16:24
  • Ah, right, that's the same kind of behavior. Commented Sep 13, 2012 at 16:45

2 Answers


After spending some time studying the pip internals and profiling some package installations, I came to the conclusion that even with a download cache, pip does the following for each package:

  • goes to the main index url, usually http://pypi.python.org/simple// (example)
  • follows every link to fetch additional web pages
  • extracts all links from all those pages
  • checks the validity of all the links against the package name and version requirements
  • selects the most recent version from the valid links
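The last two steps above can be sketched as follows. The link names and the toy version parser here are illustrative only; real pip handles many more archive formats and version schemes.

```python
# Sketch of pip's link filtering and version selection: filter
# candidate links by package name (and an optional exact version),
# then pick the newest matching sdist.
import re

def pick_best(links, name, version_req=None):
    best = None
    for url in links:
        # Only consider links that look like "<name>-<version>.tar.gz".
        m = re.search(r'/%s-([0-9][0-9.]*)\.tar\.gz$' % re.escape(name), url)
        if not m:
            continue  # not a valid sdist link for this package
        version = tuple(int(p) for p in m.group(1).strip('.').split('.'))
        if version_req is not None and version != version_req:
            continue  # an exact "==" requirement rules this link out
        if best is None or version > best[0]:
            best = (version, url)
    return best and best[1]

links = [
    "http://pypi.python.org/packages/source/D/Django/Django-1.3.tar.gz",
    "http://pypi.python.org/packages/source/D/Django/Django-1.4.tar.gz",
]
print(pick_best(links, "Django"))
```

All of this work happens before the cache is even consulted, which is why the question observes a long pause per package.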

Only then, with a download url in hand, does pip check the download cache folder (if one is configured) and decide not to download from that url when a local file named after the url is already present.
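As far as I can tell, the cache maps a download url to a local filename by percent-encoding the whole url, so one url corresponds to exactly one cache entry. A minimal sketch of that mapping (the exact helper pip uses internally may differ):

```python
# Sketch of the download-cache filename scheme: the full URL is
# percent-encoded with no characters left "safe", so "://" and "/"
# become %3A%2F%2F and %2F in the cached file's name.
import os
from urllib.parse import quote

def cache_path(cache_dir, url):
    return os.path.join(cache_dir, quote(url, safe=''))

url = "http://pypi.python.org/packages/source/D/Django/Django-1.4.tar.gz"
print(cache_path("/home/me/.pip_download_cache", url))
```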

My guess is that a lot of time could be saved by checking the cache upfront, but I do not understand the pip code base well enough to start the required modifications. Of course this would only work for exact version requirements (==), because with other constraints, like >= or >, we still want to crawl the index looking for the latest version.
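The upfront check could look roughly like this. Everything here is hypothetical: the requirements parser is deliberately naive, and matching cache entries by filename suffix is a simplification of the percent-encoded-url scheme described above.

```python
# Hypothetical pre-check: for every exactly-pinned requirement
# (name==version), see whether a matching sdist already sits in
# the download cache, before letting pip crawl the index at all.
import os
import re

def cached_pins(requirements_path, cache_dir):
    hits = []
    entries = os.listdir(cache_dir) if os.path.isdir(cache_dir) else []
    with open(requirements_path) as f:
        for line in f:
            m = re.match(r'^\s*([A-Za-z0-9_.-]+)==([A-Za-z0-9_.-]+)', line)
            if not m:
                continue  # >=, > etc. still need an index crawl
            name, version = m.groups()
            # Cache entries are URL-encoded URLs, but the trailing
            # "name-version.tar.gz" part survives encoding unchanged.
            suffix = "%s-%s.tar.gz" % (name, version)
            if any(e.endswith(suffix) for e in entries):
                hits.append((name, version))
    return hits
```

Packages returned by such a function could be installed straight from the cache, leaving only the unpinned ones to go through the normal crawl.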

Nevertheless, I was able to make a small pull request which will save us some time if merged.


1 Comment

Also have a look at devpi, which acts as a local proxy to pypi. Using a proxy like this will save pip from going to the interwebs, and you'll have a speedy installation!
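If you go the devpi route, pip can be pointed at the local proxy through its config file. A sketch, assuming devpi-server is running on its default port 3141 with the standard root/pypi mirror index:

```ini
; ~/.pip/pip.conf — route pip through a local devpi proxy instead
; of hitting pypi.python.org directly (index URL is an assumption
; based on devpi's defaults).
[global]
index-url = http://localhost:3141/root/pypi/+simple/
```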

One alternative may be to avoid rebuilding the virtualenv and to instead take a copy of a master virtual environment that you can update and copy as required.

virtualenvwrapper provides some support for doing this with the cpvirtualenv command.

1 Comment

I'll add that a recent installation of virtualenvwrapper is needed (i.e. one from after this bugfix). My installation dated from March, and cpvirtualenv took a lot of time and did not preserve the --no-site-packages option. Now it works well and fast (a couple of seconds), given you have a similar virtualenv at hand, of course.
