

edited May 23, 2023 at 5:12




If the dos2unix.raku script doesn't work, a regex solution applied to the slurped file might help (\v matches vertical whitespace). Raku claims to honor the Unicode definition of line boundaries in its regex dialect: https://unicode.org/reports/tr18/#Line_Boundaries
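To see what \v covers, here is a minimal in-memory sketch. The sample string and variable names are hypothetical, not part of the original answer. It relies on Raku treating "\r\n" as a single grapheme, so one \v match consumes a whole CRLF pair:

```raku
# Hypothetical sample text mixing several line terminators (not from the answer).
my $text = "one\r\ntwo\rthree\x[2028]four\n";

# "\r\n" is a single grapheme in Raku, so it counts as one character.
say "\r\n".chars;    # 1

# \v matches vertical whitespace (LF, VT, FF, CR, NEL, U+2028, U+2029);
# a CRLF grapheme is matched as one unit and replaced by a single "\n".
my $unix = $text.subst(/ \v /, "\n", :g);

say $unix.raku;
```

Because CRLF is one grapheme, the substitution never leaves a stray "\n\n" behind where a CRLF used to be.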


answered May 23, 2023 at 5:07

jubilatious1


Using Raku (formerly known as Perl_6)

If the OP believes the problem is Unicode-related, passing the file through a Raku script might help, since Raku handles UTF-8 by default:

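```
~$ cat dos2unix.raku
# read from standard input; .lines recognizes both "\n" and "\r\n" by default and strips them
my $fh1 = open $*IN, :r;
# below, use :w (write-only) or :x (write-only :exclusive, i.e. 'no-clobber')
my $fh2 = open $*OUT, :x, nl-out => "\n";
# put writes each line followed by nl-out, i.e. a bare LF
for $fh1.lines() { $fh2.put($_) };
$fh1.close;
$fh2.close;
```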

Save the code above to a file (e.g. dos2unix.raku), add a shebang line and make it executable, or simply call it at the command line:

~$ raku dos2unix.raku < ends_with_CRLF.txt > ends_with_LF.txt
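The loop works because Raku's line splitting already recognizes both "\n" and "\r\n". A minimal in-memory sketch of the same idea follows. It uses Str.lines on a hypothetical string instead of the file handle's .lines, an assumption made so the example needs no input file:

```raku
# Hypothetical in-memory sample standing in for ends_with_CRLF.txt.
my $dos = "1\r\n2\r\n3\r\n";

# Str.lines drops the line terminators; "\r\n" and "\n" are both
# recognized, so no CR survives into the individual lines.
my @lines = $dos.lines;
say @lines.elems;                    # 3

# Re-join with a bare LF, as $fh2.put does when nl-out is "\n".
my $unix = @lines.map({ $_ ~ "\n" }).join;
say $unix.encode('utf8').bytes;      # 6
```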

Example Input with DOS line endings (0d 0a per line):

```
~$ jot -w '%d' 5 | raku unix2dos.raku | hexdump -C
00000000  31 0d 0a 32 0d 0a 33 0d  0a 34 0d 0a 35 0d 0a     |1..2..3..4..5..|
```

Example Output converted to Unix line endings (0a per line):

```
~$ jot -w '%d' 5 | raku unix2dos.raku | raku dos2unix.raku | hexdump -C
00000000  31 0a 32 0a 33 0a 34 0a  35 0a                    |1.2.3.4.5.|
0000000a
```

The result matches genuine Unix line endings (0a per line):

```
~$ jot -w '%d' 5 | hexdump -C
00000000  31 0a 32 0a 33 0a 34 0a  35 0a                    |1.2.3.4.5.|
0000000a
```
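If hexdump isn't available, a rough Raku-only check is possible. This sketch assumes the converted text is already in a string (the $converted sample is hypothetical); it prints the UTF-8 bytes in hex and confirms that no 0x0d bytes remain:

```raku
# Hypothetical stand-in for the converted output shown above.
my $converted = "1\n2\n3\n4\n5\n";
my $buf = $converted.encode('utf8');

# Print the bytes in hex, similar to the middle columns of hexdump -C.
say $buf.list.fmt('%02x', ' ');      # 31 0a 32 0a 33 0a 34 0a 35 0a

# Confirm that no carriage-return (0x0d) bytes remain.
say so $buf.list.none == 0x0d;       # True
```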

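```
~$ raku -e 'slurp.subst(:global, / \v /, "\n").chop.put;' file
```

OR

```
~$ raku -e 'slurp.subst(:global, / <+ :Zl + :Zp> /, "\n").chop.put;' file
```

The second form targets only the Unicode Line_Separator (Zl, U+2028) and Paragraph_Separator (Zp, U+2029) characters, so it does not touch CR or LF. In both forms, .chop removes the final character (the trailing line ending, assuming the file ends with one) because put appends its own "\n".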

See the first link below for the unix2dos.raku script (i.e. the converse answer).

References:
https://unix.stackexchange.com/a/743445/227738
https://docs.raku.org/language/newline.html
https://raku.org

Example Source:
https://unix.stackexchange.com/a/742732/227738