I have some text files which I am trying to transform with a Perl script on Windows. The text files look normal in Notepad++, but all the regexes in my script were failing to match. Then I noticed that when I open the text files in Notepad++, the status bar says "UCS-2 Little Endia" (sic). I am assuming this corresponds to the encoding UCS-2LE. So I created "readFile" and "writeFile" subs in Perl, like so:

    use PerlIO::encoding;

    my $enc = ':encoding(UCS-2LE)';

    sub readFile {
        my ($fName) = @_;
        open my $f, "<$enc", $fName or die "can't read $fName\n";
        local $/;    # slurp mode: read the whole file at once
        my $txt = <$f>;
        close $f;
        return $txt;
    }

    sub writeFile {
        my ($fName, $txt) = @_;
        open my $f, ">$enc", $fName or die "can't write $fName\n";
        print $f $txt;
        close $f;
    }

    my $fName = 'someFile.txt';
    my $txt = readFile $fName;
    # ... transform $txt using s/// ...
    writeFile $fName, $txt;

Now the regexes match (although less often than I expect), but the output contains long strings of Asian-looking characters interspersed with long strings of the correct text. Is my code wrong? Or perhaps Notepad++ is wrong about the encoding? How should I proceed?
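
One way to check whether the encoding guess is right is to look at the raw bytes of the file, bypassing any encoding or line-ending layers. The following is only a minimal sketch: `hexHead` is a hypothetical helper name that is not part of the script above, and the file to inspect is assumed to be passed as the first command-line argument. If the file really is little-endian UCS-2/UTF-16 with a byte-order mark, the dump should start with `FF FE`, and ASCII-range characters should appear as one byte followed by `00`.

```perl
use strict;
use warnings;

# Hypothetical helper: return the first $n bytes of a file as a
# space-separated string of hex values. The file is opened with
# the :raw layer so no decoding or newline translation happens.
sub hexHead {
    my ($fName, $n) = @_;
    open my $f, '<:raw', $fName or die "can't read $fName: $!\n";
    read $f, my $buf, $n;
    close $f;
    return join ' ', map { sprintf '%02X', ord } split //, $buf;
}

# Dump the first 16 bytes of the file named on the command line, if any.
print hexHead($ARGV[0], 16), "\n" if @ARGV;
```

Running the same dump on the file before and after the script has rewritten it may show where the two start to differ at the byte level.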