Corrected terminology of \K assertion

edited Oct 27, 2014 at 23:57

Try the following script:

#!/bin/bash

logfile="$1"

nfiles=$(grep -c 'checking file' "$logfile")
failed_userid=($(grep -oP 'failed reading user id: \K[^ ]*' "$logfile"))
corrupted_files=($(grep -oP '[^ ]*(?= is corrupt)' "$logfile"))

echo "Total Number of Files Scanned - $nfiles"
echo "Total Number of Unique User ID failed - ${#failed_userid[@]}"
echo "Total Number of Files Corrupted - ${#corrupted_files[@]}"
echo
echo "List of Unique User Id's which are corrupt - "
for uid in "${failed_userid[@]}"; do
    echo "$uid"
done
echo
echo "Files which are corrupted - "
for corf in "${corrupted_files[@]}"; do
    echo "$corf"
done

Run it with

$ ./script file.log

For the input from your question, the result looks like this:

Total Number of Files Scanned - 3
Total Number of Unique User ID failed - 3
Total Number of Files Corrupted - 1

List of Unique User Id's which are corrupt - 
18446744073135142816
18446744073698151136
18446744072929739296

Files which are corrupted - 
/database/batch/p1_snapshot//p1_weekly_1980_0_200003_5.data

Short explanation:

-c makes grep count the matching lines instead of printing them
-P enables Perl-compatible regular expression syntax
-o prints only the matched part of each line
(?=...) is the so-called positive look-ahead (the pattern must match, but it is not included in the output)
\K acts like a look-behind (the whole pattern must match, but everything matched up to this point is dropped from the result)
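
To see these options in isolation, here is a minimal sketch. The two sample lines are made up for the demonstration and are not taken from the real log; it assumes GNU grep built with -P support.

```shell
#!/bin/bash
# Hypothetical sample lines, invented for this demonstration only.
sample='failed reading user id: 12345 (hypothetical)
/tmp/example.data is corrupt'

# \K: "failed reading user id: " must match, but is dropped from the output.
uid=$(grep -oP 'failed reading user id: \K[^ ]*' <<< "$sample")

# (?=...): " is corrupt" must follow, but is not part of the output.
file=$(grep -oP '[^ ]*(?= is corrupt)' <<< "$sample")

# -c: count the lines that match instead of printing them.
count=$(grep -c 'is corrupt' <<< "$sample")

echo "$uid"     # prints 12345
echo "$file"    # prints /tmp/example.data
echo "$count"   # prints 1
```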

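The rest should be obvious. Be aware, however, that I've assumed there is no whitespace in file names: the pattern [^ ]* stops at the first space, and the unquoted $(...) inside the array assignment splits its output on whitespace.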
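
If file names may contain spaces, one possible variation is sketched below. It rests on two assumptions that are not confirmed by the question: that nothing precedes the file name on an "is corrupt" line, and that bash 4 or newer is available for mapfile. The log content here is hypothetical.

```shell
#!/bin/bash
# Hypothetical log content; the real log format may differ.
logfile=$(mktemp)
printf '%s\n' '/tmp/my file.data is corrupt' '/tmp/other.data is corrupt' > "$logfile"

# Assumption: the file name is the only thing before " is corrupt",
# so .* can be used instead of [^ ]* to allow spaces in the name.
# mapfile -t stores each output line as exactly one array element,
# so names containing spaces are not split into several elements.
mapfile -t corrupted_files < <(grep -oP '.*(?= is corrupt)' "$logfile")

echo "Total Number of Files Corrupted - ${#corrupted_files[@]}"
for corf in "${corrupted_files[@]}"; do
    echo "$corf"
done

rm -f "$logfile"
```

Names containing newlines would still break this, since grep works line by line.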

Corrected terminology of \K assertion

edited Oct 27, 2014 at 23:52

jimmij

added 246 characters in body

edited Oct 27, 2014 at 23:40

jimmij

answered Oct 27, 2014 at 23:33

jimmij
