I have an operation that works through originallist (dowork.sh originallist) and records what it has finished in cleaned1. cleaned1 is sorted differently from originallist. I need to generate a list of what is left for dowork.sh to process. Essentially: list cleanedR - list cleaned1 = list cleaned2. It's a minus (set difference) operation. I found that I can do this with the following grep options:
- -F to treat each pattern as a fixed string instead of a regular expression (we don't want grep interpreting characters in filenames as regular-expression syntax),
- -v to invert the match, i.e. exclude matching lines (which is the minus operation),
- -f to read the patterns from the file cleaned1 instead of taking a single expression on the command line ("obtain PATTERN from FILE").
    # wc -l cleaned*
     9157094 cleaned1
    14283591 cleanedR
    # du -sh cleaned*
    1.3G    cleaned1
    2.0G    cleanedR
    # grep -Fvf cleaned1 originallist > cleaned2

The grep runs for 5 minutes, uses up to 42G of RAM, then exits with failure; cleaned2 is 0 bytes long.
At the end, cleaned2 should be 14283591 - 9157094 = 5126497 lines long.
This is the correct syntax for such an operation (I tested it with a 10-line cleanedR and a 3-line cleaned1; the resulting cleaned2 had 7 lines), but it uses a lot of RAM. Is there a way to make this work without grep using so much RAM? I know it will take a while, and I am okay with that.
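The small-scale test described above can be reproduced roughly as follows. This is only a sketch: the scratch directory and the file_N entries are made-up toy data, not the real lists, and the -x flag is an addition to the original command. -x makes grep match whole lines only; with -F alone, the pattern file_1 would also remove file_10, because -F matches fixed substrings.

```shell
# Hypothetical scratch directory and toy data (assumptions for illustration).
workdir=$(mktemp -d)
seq 1 10 | sed 's/^/file_/' > "$workdir/cleanedR"        # 10 lines: file_1 .. file_10
printf 'file_7\nfile_1\nfile_2\n' > "$workdir/cleaned1"  # 3 lines, in a different order

# -F  fixed strings, not regular expressions
# -x  match whole lines only (not in the original command)
# -v  invert the match: keep lines that match no pattern
# -f  read the patterns from the file cleaned1
grep -Fvxf "$workdir/cleaned1" "$workdir/cleanedR" > "$workdir/cleaned2"

wc -l < "$workdir/cleaned2"   # 7 lines remain
```

This only confirms the logic on tiny inputs; it does nothing to reduce grep's memory use on the full-size lists.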
I am looking for something like sort's -T option, which lets you avoid /tmp (RAM in my case) and use another directory for temporary files. From sort's help text:

    -T, --temporary-directory=DIR
           use DIR for temporaries, not $TMPDIR or /tmp; multiple options specify multiple directories
Would comm or the -x option help?
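For illustration, the sort -T idea combined with comm might look like the sketch below. It is not a tested solution for the full-size lists: the scratch directory, the sorttmp directory, and the file_N entries are hypothetical toy data, and it assumes each line is a complete entry to be compared as a whole. comm requires both inputs to be sorted in the same collation order, so LC_ALL=C is set for both sort and comm.

```shell
# Use one byte-wise collation order for both sort and comm.
export LC_ALL=C

# Hypothetical scratch directory and toy data (assumptions for illustration).
workdir=$(mktemp -d)
sorttmp="$workdir/sorttmp"      # stand-in for a disk-backed directory outside /tmp
mkdir -p "$sorttmp"
seq 1 10 | sed 's/^/file_/' > "$workdir/cleanedR"
printf 'file_7\nfile_1\nfile_2\n' > "$workdir/cleaned1"

# sort spills its temporary files to the -T directory instead of $TMPDIR or /tmp.
sort -T "$sorttmp" "$workdir/cleanedR" > "$workdir/cleanedR.sorted"
sort -T "$sorttmp" "$workdir/cleaned1" > "$workdir/cleaned1.sorted"

# comm -23 suppresses lines unique to the second file (column 2) and lines
# common to both (column 3), leaving only lines unique to the first file.
comm -23 "$workdir/cleanedR.sorted" "$workdir/cleaned1.sorted" > "$workdir/cleaned2"

wc -l < "$workdir/cleaned2"
```

Because comm reads its two sorted inputs sequentially, it does not need to hold the pattern list in memory the way grep -f does. The output comes out in sorted order rather than in the original order of cleanedR.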