1
$ sudo lsof -u t | grep -i "\.pdf"
evince 1788 t 37r REG 8,4 176328 134478 /home/t/some/path1/white space/string1 + string2 string3.pdf
evince 3737 t 36r REG 8,4 1252636 6692680 /home/t/some/path2/white space/string5 string3.pdf

How can I extract only the second column (pids of processes)?

How can I extract only the ninth column (pathnames of files)? (pathnames can contain any character allowed by Linux and ext4 file systems)

My real command is

$ sudo lsof -u t | grep -v "wineserv" | grep REG | grep "\.pdf" | grep "string" 

where I search for records whose first column (COMMAND) isn't wineserv, whose fifth column (TYPE) is REG, and whose ninth column (NAME) contains both .pdf and string.

I prefer bash, awk, or Python solutions (and maybe Perl, but I don't know Perl, so I won't be able to verify whether it is correct or modify it later).

Thanks.

7
  • 1
    lsof has a -F flag according to the manual, so you could do lsof -F p to get just the PID itself. Let me know if you want that as an answer, but of course I can do Python and awk parsing as well. Commented Feb 16, 2019 at 1:38
  • @SergiyKolodyazhnyy Thanks, and yes. See my update. Commented Feb 16, 2019 at 1:51
  • 1
    Related: unix.stackexchange.com/q/299040/117549 Commented Feb 16, 2019 at 2:59
  • no need for lsof: find /proc/*/fd -ilname '*.pdf' 2>/dev/null | awk -F/ '{print$3}' (btw, this will also work if the filenames contain newline, spaces, etc). Commented Feb 16, 2019 at 12:56
  • @mosvy Thanks. How does parsing the output of find on the /proc file system compare to parsing lsof output? Commented Feb 16, 2019 at 15:14
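To make the -F suggestion from the comments concrete, here is a minimal Python sketch that parses `lsof -F pctn` output, where each field sits on its own line prefixed by a one-letter key (p = PID, c = command, f = file descriptor, t = type, n = name). The function names `parse_lsof_fields`, `matching_pdfs`, and `run_lsof` are made up for illustration, and the prefix test on the command name is an assumption (wineserv looks like a truncated wineserver). Treat it as a sketch, not a tested replacement for the pipeline in the question.

```python
import subprocess


def parse_lsof_fields(text):
    """Yield (pid, command, type, name) for each file entry in `lsof -F pctn` output."""
    pid = command = ftype = None
    # Split on "\n" only, so unusual separators inside names are left alone.
    for line in text.split("\n"):
        if not line:
            continue
        key, value = line[0], line[1:]
        if key == "p":        # start of a new process set
            pid, command = int(value), None
        elif key == "c":      # command name of the current process
            command = value
        elif key == "f":      # start of a new file set
            ftype = None
        elif key == "t":      # file type, e.g. REG or DIR
            ftype = value
        elif key == "n":      # file name: whole rest of the line, spaces included
            yield pid, command, ftype, value


def matching_pdfs(text, needle="string"):
    """Apply the filters from the question to parsed -F output."""
    return [
        (pid, name)
        for pid, command, ftype, name in parse_lsof_fields(text)
        # Assumption: match wineserver by prefix, whatever the truncation.
        if command is not None and not command.startswith("wineserv")
        and ftype == "REG" and ".pdf" in name and needle in name
    ]


def run_lsof(user):
    """Hypothetical helper: capture `lsof -u USER -F pctn` as text."""
    return subprocess.run(
        ["lsof", "-u", user, "-F", "pctn"],
        capture_output=True, text=True,
    ).stdout


if __name__ == "__main__":
    for pid, name in matching_pdfs(run_lsof("t")):
        print(pid, name)
```

Because the name field is the whole remainder of its line, spaces in pathnames need no special handling here.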

2 Answers

3

Using regular expressions:

$ ... | perl -nlE '/.*? (\d+).*?(\/.*)/ and print("$1 ; $2")'
1788 ; /home/t/some/path1/white space/string1 + string2 string3.pdf
3737 ; /home/t/some/path2/white space/string5 string3.pdf
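Since the asker mentioned not knowing Perl, here is a rough Python equivalent of the same regular expression. It is only a sketch: the name `pid_and_path` is made up, and, like the Perl version, it assumes the PID is the first run of digits preceded by a space and that the pathname starts at the first "/" after it.

```python
import re

# Same pattern as the Perl one-liner: lazily skip to the first " <digits>",
# then lazily skip to the first "/" and capture everything after it.
LSOF_LINE = re.compile(r".*? (\d+).*?(/.*)")


def pid_and_path(line):
    """Return (pid, path) from one line of default lsof output, or None."""
    m = LSOF_LINE.match(line.rstrip("\n"))
    return (int(m.group(1)), m.group(2)) if m else None


sample = ("evince 1788 t 37r REG 8,4 176328 134478 "
          "/home/t/some/path1/white space/string1 + string2 string3.pdf")
print(pid_and_path(sample))
# (1788, '/home/t/some/path1/white space/string1 + string2 string3.pdf')
```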
2
  • Thanks. By (\/.*), do you assume that lsof always output resolved absolute pathnames not relative pathnames? see unix.stackexchange.com/questions/501002/… Commented Feb 17, 2019 at 2:27
  • @Tim, yes (I thought this is the default behavior of lsof). I believe some other situations can also easily be covered (some limitations are predictable). Commented Feb 17, 2019 at 13:13
2

If I understand your requirements this should work:

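awk '{
  for (i = 9; i <= NF; i++) {
    # regex literals, so that \. is a literal dot
    if ($i ~ /string/ && $1 != "wineserv" && $5 == "REG" && $NF ~ /\.pdf$/) {
      $1 = $2 = $3 = $4 = $5 = $6 = $7 = $8 = ""   # blank the first 8 fields
      print
      break   # print each matching record only once
    }
  }
}'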
  • Loop through all the fields from 9 to the end, if one contains string:

    • Check that field 1 does not equal wineserv
    • field 5 does equal REG
    • The last field ends in .pdf (I think it's safe to assume that, even if the file name contains whitespace, the extension is in the last part)
  • If all conditions are met erase the first 8 fields and print what's left
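For comparison, the same checks can be sketched in Python. Splitting on whitespace at most 8 times leaves the ninth field (NAME) intact, including any runs of spaces inside the path. This assumes lsof's default nine-column layout with every column filled in, which may not hold for all record types. The function name `filter_lsof_lines` is made up for illustration.

```python
def filter_lsof_lines(lines, needle="string"):
    """Yield (pid, name) for default-format lsof lines that pass the filters."""
    for line in lines:
        # maxsplit=8 gives at most 9 fields; the last keeps its inner spaces.
        fields = line.rstrip("\n").split(None, 8)
        if len(fields) < 9:
            continue  # not a full nine-column record
        command, pid, ftype, name = fields[0], fields[1], fields[4], fields[8]
        if (command != "wineserv" and ftype == "REG"
                and ".pdf" in name and needle in name):
            yield int(pid), name


sample_lines = [
    "COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME",
    "evince 1788 t 37r REG 8,4 176328 134478 /home/t/some/path1/white  space/string1 + string2 string3.pdf",
    "wineserv 42 t 5r REG 8,4 10 11 /home/t/string.pdf",
]
for pid, name in filter_lsof_lines(sample_lines):
    print(pid, name)
```

One difference from the awk version: assigning to fields in awk rebuilds the record with single spaces between fields, so consecutive spaces in a pathname are collapsed there, while the split above keeps them.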

11
  • Thanks. In $NF ~ ".pdf", the . doesn't work as a literal dot. Commented Feb 16, 2019 at 2:14
  • @Tim: Thanks didn't realize that. I'll update with \ Commented Feb 16, 2019 at 2:15
  • Sorry, forgot to say $NF ~ "\.pdf" doesn't work either. pathnames containing /.../pdf.../... will still match. I don't know why they match. Commented Feb 16, 2019 at 2:23
  • @Tim: How about "\.pdf$" Commented Feb 16, 2019 at 2:32
  • That works. But still why /.../pdf.../... matches \.pdf? Commented Feb 16, 2019 at 2:33
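A likely explanation for the last question, at least in gawk: "\.pdf" is a string constant, so awk processes the backslash escape before the regex engine sees the pattern. gawk warns and treats \. as a plain dot, which leaves the regex .pdf, where the dot matches any character, including "/". Writing the pattern as a regex literal (/\.pdf/) or doubling the backslash in the string ("\\.pdf") avoids this. The effect on matching can be illustrated in Python, where a raw string plays the role of the regex literal. The path below is made up.

```python
import re

path = "/home/t/pdfs/notes.txt"   # hypothetical path with "/pdf" but no ".pdf"

unescaped = bool(re.search(".pdf", path))    # dot matches any character, here "/"
escaped = bool(re.search(r"\.pdf", path))    # literal dot required
anchored = bool(re.search(r"\.pdf$", "/home/t/white space/a b.pdf"))

print(unescaped, escaped, anchored)
# True False True
```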
