Try the following script:
#!/bin/bash logfile="$1" nfiles=$(grep -c 'checking file' "$logfile") failed_userid=($(grep -oP 'failed reading user id: \K[^ ]*' "$logfile")) corrupted_files=($(grep -oP '[^ ]*(?= is corrupt)' "$logfile")) echo "Total Number of Files Scanned - $nfiles" echo "Total Number of Unique User ID failed - ${#failed_userid[@]}" echo "Total Number of Files Corrupted - ${#corrupted_files[@]}" echo echo "List of Unique User Id's which are corrupt - " for uid in "${failed_userid[@]}"; do echo "$uid" done echo echo "Files which are corrupted - " for corf in "${corrupted_files[@]}"; do echo "$corf" done Run it with
$ ./script file.log The result for input from your question looks like
Total Number of Files Scanned - 3 Total Number of Unique User ID failed - 3 Total Number of Files Corrupted - 1 List of Unique User Id's which are corrupt - 18446744073135142816 18446744073698151136 18446744072929739296 Files which are corrupted - /database/batch/p1_snapshot//p1_weekly_1980_0_200003_5.data Short explanation:
-coption of grep counts the matching lines-Penables perl regular expresions syntax-omatches only part of lines(?=construct is the so called positive look-ahead (take it as pattern, but do not include to the output)\Kis look-behind assertion (take whole pattern, but throw away from result everything up to this point)
The rest should be obvious. Be aware however that I've assumed there are no whitespaces in file names!