Try the following script:
#!/bin/bash
logfile="$1"
nfiles=$(grep -c 'checking file' "$logfile")
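# awk drops repeated IDs (keeping first-seen order) so the count below is really of unique IDs
failed_userid=($(grep -oP 'failed reading user id: \K[^ ]*' "$logfile" | awk '!seen[$0]++'))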
corrupted_files=($(grep -oP '[^ ]*(?= is corrupt)' "$logfile"))
echo "Total Number of Files Scanned - $nfiles"
echo "Total Number of Unique User ID failed - ${#failed_userid[@]}"
echo "Total Number of Files Corrupted - ${#corrupted_files[@]}"
echo
echo "List of Unique User Id's which are corrupt - "
for uid in "${failed_userid[@]}"; do
    echo "$uid"
done
echo
echo "Files which are corrupted - "
for corf in "${corrupted_files[@]}"; do
    echo "$corf"
done
Run it with
$ ./script file.log
For the input from your question, the result looks like this:
Total Number of Files Scanned - 3
Total Number of Unique User ID failed - 3
Total Number of Files Corrupted - 1

List of Unique User Id's which are corrupt -
18446744073135142816
18446744073698151136
18446744072929739296

Files which are corrupted -
/database/batch/p1_snapshot//p1_weekly_1980_0_200003_5.data
Short explanation:
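- `-c` makes grep print the number of matching lines instead of the lines themselves
- `-P` enables Perl-compatible regular expression syntax (this is a GNU grep option)
- `-o` prints only the matched part of each line rather than the whole line
- `(?=...)` is a positive look-ahead: the text inside must follow the match, but it is not included in the output
- `\K` resets the start of the reported match: everything before it must still match, but it is dropped from the output (in effect a variable-length look-behind)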
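To see what each option does on its own, here is a small sketch you can run separately from the script. The sample log lines, the paths in them, and the temporary file are made up for illustration and are not your real log. It assumes GNU grep, since `-P` is not available everywhere.

```shell
#!/bin/bash
# Hypothetical sample log, written to a temporary file for the demo only.
sample=$(mktemp)
cat > "$sample" <<'EOF'
checking file /tmp/a.data
failed reading user id: 111
checking file /tmp/b.data
failed reading user id: 222
/tmp/b.data is corrupt
EOF

# -c: count the lines that match instead of printing them.
count=$(grep -c 'checking file' "$sample")

# -o alone: prints the whole matched text, prefix included.
with_prefix=$(grep -o 'failed reading user id: [^ ]*' "$sample")

# -oP with \K: the prefix must match, but only the part after \K is printed.
ids=$(grep -oP 'failed reading user id: \K[^ ]*' "$sample")

# -oP with (?=...): " is corrupt" must follow, but is not part of the output.
files=$(grep -oP '[^ ]*(?= is corrupt)' "$sample")

echo "count: $count"
echo "with prefix:"; echo "$with_prefix"
echo "ids:"; echo "$ids"
echo "files: $files"

rm -f "$sample"
```

Comparing the `with prefix` and `ids` output shows what `\K` removes: the first keeps `failed reading user id: ` in front of each number, the second prints only the numbers.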