9

I'd like to get a list of all files in my Gentoo Linux system that were not installed by the package manager (Portage). This is because I want to keep my system as clean as possible, removing all useless files lying around.

Here is what I've tried so far. First of all, I generate the list of all files that belong to some package tracked by Portage:

equery files "*" | sort | uniq > portage.txt 

Then I generate the list of all files on my system, except those that I don't care about:

find / \( -path /dev -o -path /proc -o -path /sys -o -path /media \
  -o -path /mnt -o -path /usr/portage -o -path /var/db/pkg \
  -o -path /var/www/localhost/htdocs -o -path /lib64/modules \
  -o -path /usr/src -o -path /var/cache -o -path /home \
  -o -path /root -o -path /run -o -path /var/run -o -path /var/tmp \
  -o -path /var/log -o -path /tmp -o -path /etc/config-archive \
  -o -path /usr/local/portage -o -path /boot \) -prune \
  -o -type f | sort | uniq > all.txt
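A detail of this find command that may be worth checking: when an expression contains no explicit action, find applies -print to the whole expression, so the pruned directories themselves are printed as well. The sketch below reproduces this in a throwaway directory (the keep and skip names are made up for illustration) and shows that an explicit -print after -type f lists only regular files.

```shell
#!/bin/sh
# Throwaway tree: one directory to scan, one to prune.
tmp=$(mktemp -d)
mkdir "$tmp/keep" "$tmp/skip"
touch "$tmp/keep/a" "$tmp/skip/b"

# No explicit action: the pruned directory itself ends up in the output.
implicit=$(find "$tmp" -path "$tmp/skip" -prune -o -type f | sort)

# Explicit -print on the right-hand branch: only regular files are listed.
explicit=$(find "$tmp" -path "$tmp/skip" -prune -o -type f -print | sort)

printf 'implicit:\n%s\n\nexplicit:\n%s\n' "$implicit" "$explicit"
rm -rf "$tmp"
```

With about twenty pruned paths the effect on all.txt is small, but those directory names would otherwise show up among the untracked entries.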

Finally, I get the list of all files that are not tracked by Portage:

comm -13 portage.txt all.txt > extra.txt 
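comm expects both inputs to be sorted in the same collation order, so it is probably safest to run sort and comm under the same locale (LC_ALL=C here is an assumption, not something the commands above set). A small self-contained sketch with made-up file lists:

```shell
#!/bin/sh
export LC_ALL=C
tmp=$(mktemp -d)

# Hypothetical stand-ins for portage.txt and all.txt.
printf '%s\n' /usr/bin/foo /etc/foo.conf /usr/src/linux/Makefile \
    | sort -u > "$tmp/portage.txt"
printf '%s\n' /usr/bin/foo /etc/foo.conf /usr/local/bin/stray \
    | sort -u > "$tmp/all.txt"

# -1 drops lines unique to portage.txt, -3 drops lines common to both,
# leaving only the lines unique to all.txt.
extra=$(comm -13 "$tmp/portage.txt" "$tmp/all.txt")
printf '%s\n' "$extra"

rm -rf "$tmp"
```

In this toy data the tracked list also contains a path under /usr/src that the file scan never sees, which is the kind of entry -13 silently discards.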

Some statistics:

wc -l portage.txt all.txt extra.txt
127724 portage.txt
 78371 all.txt
  8438 extra.txt

As you can see, I still get more than eight thousand extra files. I'd like to reduce that number so that I can focus on the files that really need to be deleted.

I noticed that in extra.txt there are thousands of files in a small number of directories, such as /usr/lib64/gcc, /usr/lib64/python2.7 and /usr/lib64/python3.2. The /usr/lib64/gcc/x86_64-pc-linux-gnu/4.6.3/crtbegin.o file, for example, is not in portage.txt because, in its place, there is /usr/lib/gcc/x86_64-pc-linux-gnu/4.6.3/crtbegin.o. On my system /usr/lib is a symlink to /usr/lib64. So it seems that I need to properly handle symlinks to get better results. Perhaps by adding in portage.txt all files they point to. I don't really know how to do that.

Also, why is portage.txt bigger than all.txt? Shouldn't it be the opposite, since the files tracked by Portage are a subset of all the files on my system?

Finally, am I forgetting any other location that should also be excluded in the find command?

3
  • 1
    "This is because I want to keep my system as clean as possible, removing all useless files lying around." — is your own time you've already spent on that cheaper than wasted megabytes of disk space? :) Commented Oct 14, 2012 at 17:00
  • Well, I should have said that it's also for finding files that belong to a package that has not been installed via the package manager. I needed a program but no recent ebuild was available, and I have yet to learn how to write ebuilds properly. Commented Oct 14, 2012 at 18:36
  • This might be helpful: us.generation-nt.com/answer/… Commented Oct 15, 2012 at 21:34

3 Answers

6

What you are looking for might be qfile. It is part of the app-portage/portage-utils package and provides the -o (--orphans) option. You can use something like

find /usr/bin -type f | xargs -I{} qfile -o {} 

to get a list of orphaned files in /usr/bin.

Remark: Sadly, qfile in the current stable version of portage-utils does not support reading from stdin, and the solution mentioned in the qfile man page, qfile -o $(find /usr/bin), does not work if the find result set is large. We therefore have to work around it a little, using xargs.
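A possible variation, assuming GNU find and xargs: since the man page form passes many paths to a single qfile call, dropping -I{} lets xargs pack as many paths into each invocation as the argument limit allows, and -print0 with -0 keeps unusual file names intact. The find_orphans function name and the guard for a missing qfile are illustrative additions, and this sketch has not been checked against any particular portage-utils version.

```shell
#!/bin/sh
# find_orphans DIR: list regular files under DIR that qfile reports as orphans.
find_orphans() {
    if ! command -v qfile >/dev/null 2>&1; then
        echo "qfile not found (install app-portage/portage-utils)" >&2
        return 127
    fi
    # -r: do not run qfile at all when find yields nothing (GNU xargs).
    find "$1" -type f -print0 | xargs -0 -r qfile -o
}

find_orphans /usr/bin || true
```

Compared with -I{}, which starts one qfile process per file, this should start only a handful of processes for a directory the size of /usr/bin.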

BTW, this is not something I myself came up with, but I found it at gossamer-threads, a comment by yvasilev.

3
  • Gentoo doesn't use the Debian package manager. Commented Jan 9, 2016 at 22:22
  • 1
    True. Gentoo uses portage. Like the original question clearly stated. Who wanted to know how to find orphaned files on a Debian system? Commented Jan 11, 2016 at 0:01
  • I've changed find /usr/bin | xargs -I{} qfile -o {} to find /usr/bin -type f | xargs -I{} qfile -o {}, because without -type f in the find command qfile would also check all the symlinks, which normally don't belong to any packages... Commented Jan 9, 2022 at 22:21
1

I managed to fix the problem related to symlinks in portage.txt by running the following command:

equery files '*' | while IFS= read -r i; do readlink -e "${i}"; done | sort | uniq \
  > portage.txt

This serves to put in portage.txt the files symlinks point to, and not symlinks themselves. It's needed because the find command that creates all.txt doesn't list any symlink, but just the files they point to, so there would be a lot of false positives otherwise. It's quite a slow command, as it runs readlink on thousands of files, but I couldn't find a better solution. Any suggestion is welcome.
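One possible way to speed this up, assuming GNU coreutils and findutils: readlink accepts several paths per invocation, so xargs can hand it large batches instead of starting one process per line. The sketch below compares both forms on a throwaway tree that mimics the /usr/lib to /usr/lib64 symlink. The file names are made up, and on a real system the input would come from equery rather than printf.

```shell
#!/bin/sh
tmp=$(mktemp -d)
# Resolve the temp dir itself, in case it sits behind a symlink.
root=$(readlink -e "$tmp")
mkdir -p "$root/usr/lib64"
ln -s lib64 "$root/usr/lib"
touch "$root/usr/lib64/crtbegin.o"

# Paths as a package database might record them, some via the symlink.
list() {
    printf '%s\n' "$root/usr/lib/crtbegin.o" "$root/usr/lib64/crtbegin.o"
}

# One readlink process per path (the slow form).
slow=$(list | while IFS= read -r i; do readlink -e "$i"; done | sort -u)

# One readlink process per batch of paths.
fast=$(list | xargs -d '\n' readlink -e | sort -u)

printf '%s\n' "$fast"
rm -rf "$tmp"
```

Both forms print a single canonical path here. Paths that no longer exist are skipped by readlink -e in either form, though the batched form then exits with a non-zero status.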

Another thing I understood (this was easier) is why portage.txt was bigger than all.txt. This is mainly because I explicitly pruned the /usr/src directory and all files beneath it from the results of the find command, but equery listed them regardless.

The last thing I did, even if this was not in the question, was to ignore Python stuff (mostly __pycache__ files and files with the .pyc or .pyo suffix):

grep '\(\.cpython-32\)\?\.py[co]$\|/__pycache__' candidates.txt \
  > candidates-bytecode.txt
sed -e 's/\(\.cpython-32\)\?\.py[co]$/.py/' \
  -e 's/\/__pycache__//' \
  candidates-bytecode.txt | sort | uniq \
  > candidates-bytecode-source.txt
comm -23 candidates-bytecode-source.txt portage.txt \
  > orphaned-bytecode.txt

This way I trace the origin of all Python stuff and check if it's in portage.txt. As you can see I wrote the same regular expression two times, one for the grep command and the other for the sed command, but perhaps it can be done in just a single step.
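One possible single-step version, assuming GNU sed (the \? operator used above is already a GNU extension): run both substitutions and print a line only when at least one of them changed it, which makes the separate grep unnecessary. The sample paths are made up for illustration.

```shell
#!/bin/sh
# Made-up candidate list: two bytecode files and one unrelated file.
candidates='/usr/lib64/python3.2/__pycache__/os.cpython-32.pyc
/usr/lib64/python2.7/site.pyo
/usr/local/bin/stray'

# "t keep" jumps to the label when a substitution succeeded on this line;
# otherwise "b" ends the cycle, and -n means nothing is printed.
sources=$(printf '%s\n' "$candidates" | sed -n \
    -e 's/\(\.cpython-32\)\?\.py[co]$/.py/' \
    -e 's/\/__pycache__//' \
    -e 't keep' -e 'b' -e ':keep' -e 'p' | sort -u)

printf '%s\n' "$sources"
```

The result can then be compared against portage.txt with comm -23 as before, provided both lists are sorted the same way.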

1
  • It would probably be a lot faster to use cat /var/db/pkg/*/*/CONTENTS | sed -r 's/^... //; s/ ([0-9a-f]+ )[0-9]+$//; s/ -> .*$//' directly, instead of the amazingly slow Python-based equery files '*'. Commented Aug 30, 2016 at 11:10
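A sketch of that idea on a made-up CONTENTS file, assuming the usual layout of "obj PATH MD5 MTIME", "sym PATH -> TARGET MTIME" and "dir PATH" entries (worth verifying against a real file under /var/db/pkg). This variant keeps only obj entries, since the find command in the question lists regular files only. The package name, paths and hashes are invented.

```shell
#!/bin/sh
tmp=$(mktemp -d)
mkdir -p "$tmp/app-misc/foo-1.0"
cat > "$tmp/app-misc/foo-1.0/CONTENTS" <<'EOF'
dir /usr/bin
obj /usr/bin/foo d41d8cd98f00b204e9800998ecf8427e 1350000000
obj /usr/share/foo/read me.txt 0cc175b9c0f1b6a831c399e269772661 1350000000
sym /usr/lib/libfoo.so -> libfoo.so.1 1350000000
EOF

# Keep "obj" lines and strip the type prefix plus the trailing hash and mtime.
tracked=$(cat "$tmp"/*/*/CONTENTS |
    sed -n -E 's/^obj (.*) [0-9a-f]+ [0-9]+$/\1/p' | sort -u)

printf '%s\n' "$tracked"
rm -rf "$tmp"
```

On a real system the glob would be /var/db/pkg/*/*/CONTENTS, and the resulting paths may still need the readlink step from the answer above.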
0

IIRC, Gentoo stores package info in plain text (perhaps under /var/db/), so searching it directly can be slow.

The best way to do this is to create an SQLite database (or any other database) of all package files, then list all files on your system and look each one up in the database; any file that is not found does not belong to Portage.
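For lists of this size a full database may not be needed: a hash lookup in awk is one lighter-weight option (a swapped-in technique, not the SQLite approach itself). The sketch loads the tracked paths into an in-memory table and prints every path from the second list that is missing from it. Both lists are made up, and unlike with comm they do not need to be sorted.

```shell
#!/bin/sh
tmp=$(mktemp -d)
printf '%s\n' /usr/bin/foo /etc/foo.conf > "$tmp/portage.txt"
printf '%s\n' /usr/local/bin/stray /usr/bin/foo /etc/foo.conf /opt/tool/run \
    > "$tmp/all.txt"

# First file: remember each tracked path. Second file: print paths not seen.
untracked=$(awk 'NR == FNR { tracked[$0]; next } !($0 in tracked)' \
    "$tmp/portage.txt" "$tmp/all.txt")

printf '%s\n' "$untracked"
rm -rf "$tmp"
```

The NR == FNR test assumes the first file is not empty; with an empty tracked list every path would be treated as tracked data rather than looked up.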
