The immediate thought is wc, but the next, not-so-immediate thought is: does *nix's wc count only *nix line endings (\x0a)?... It seems so.
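A quick check (a sketch; the byte sequences are spelled out with printf escapes) suggests that wc -l really is counting raw 0x0a bytes, regardless of encoding:

```shell
# wc -l counts 0x0a bytes, not logical lines.
printf 'a\r\nb\r\n' | wc -l     # ASCII CRLF text: two 0x0a bytes -> 2
printf 'a\0\r\0\n\0' | wc -l    # UTF-16LE "a\r\n" (61 00 0d 00 0a 00) -> 1
```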
I've semi-wangled my way around it, but I feel there may (must?) be a simpler way than working on a hex dump of the original.
Here is my version, but there is still a mysterious discrepancy in the tallies: wc reports one more 0a than the sum of this script's CRLF and 0a counts.
```shell
file="nagaricb.nag"
echo "Report on CR and LF in UTF-16LE/CR-LF"
echo "====================================="
cat "$file" |        # a useless comment, courtesy of cat
xxd -p -c 2 |
sed -nr '
  /0a../{
    /0a00/!{
      i 0a: embedded in non-newline chars
      b
    }
  }
  /0d../{
    /0d00/!{
      i 0d: embedded in non-newline chars
      b
    }
  }
  /0a00/{
    i LF:   found stray 0a00
    b
  }
  /0d00/{
    N
    /0d00\n0a00/{
      i CRLF: found as normal newline pairs
      b
    }
    i CR:   found stray 0d00
  }' |
sort |
uniq -c
echo " ====="
printf '   %s   wc\n' "$(<"$file" wc -l)"
```

Output:
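For context (a minimal sketch, using the same tools as the script), `xxd -p -c 2` emits one two-byte UTF-16LE code unit per line as four hex digits, which is exactly what the `0d00`/`0a00` patterns in the sed program match against:

```shell
# UTF-16LE "a\r\n" -> one 4-hex-digit line per code unit
printf 'a\0\r\0\n\0' | xxd -p -c 2
# 6100
# 0d00
# 0a00
```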
```
Report on CR and LF in UTF-16LE/CR-LF
=====================================
    125 0a: embedded in non-newline chars
    407 0d: embedded in non-newline chars
  31826 CRLF: found as normal newline pairs
 =====
  31952   wc
```

Is there some more standard/simple way to do this?
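For comparison, one candidate "standard" route (a sketch, assuming iconv is available and the encoding is known) is to transcode to UTF-8 first, so the only remaining 0x0a bytes are genuine newlines, and then count as usual:

```shell
# A 3-line UTF-16LE/CRLF sample, transcoded and then counted
printf 'a\0\r\0\n\0b\0\r\0\n\0c\0\r\0\n\0' |
  iconv -f UTF-16LE -t UTF-8 | wc -l    # -> 3
# For the real file: iconv -f UTF-16LE -t UTF-8 "$file" | wc -l
```

If the file carries a BOM, `-f UTF-16` (no LE/BE suffix) should let iconv pick the byte order up from the BOM itself, at least with glibc's iconv.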
Comments:

- wc (GNU coreutils) 7.4
- wc (GNU coreutils) 8.14
- ਊ (U+0A0A) or ऊ (U+090A)... that's the only time the problem shows itself... My file has 532 such chars.
- A 0a that is not "legitimate", I guess, to fix your script (xx0a doesn't get counted, and 0a0a only counts for one, if I understand it correctly).
- wc (and awk's counting of NR) is out by a further 1... the above script's line-count is the same as shown in emacs... I'm just trying to find a less clumsy way of counting lines in a UTF-16LE/CR-LF file (with BOM, in this case, if that makes a difference).