OK, my first attempt was unbearably slow. Here is a much better solution that processed a 1.8 GB file in 2 min 48 sec :-)
I used hybrid batch/JScript, so it runs natively on any Windows machine from XP onward. No third-party exe file is needed, and there is nothing to compile.
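I read and write in chunks of about 1 MB (1,000,000 characters per read). The logic is actually pretty simple: I replace every \r\n with a single space, and every #@#@# with \r\n. Each chunk is converted only up to the end of its last #@#@#, and the remainder is carried over into the next read, so a delimiter is never split across chunks. You can easily change the string values in the code to suit your needs, but note that the delimiter and newline strings are used to build regular expressions, so any regex special characters in them must be escaped.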
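To illustrate why carrying the tail over is safe, here is a minimal sketch of the same chunk logic as a pure JavaScript function that runs under Node.js instead of cscript. The names `fixChunks` and `escapeRegExp` are hypothetical, the array of chunks stands in for successive `WScript.StdIn.Read(1000000)` calls, and `escapeRegExp` is an addition that lets the delimiter contain regex metacharacters. Splitting the input at any point should give the same output, at least for input where delimiter occurrences don't run into each other.

```javascript
// Hypothetical helper: escape regex metacharacters so a literal string can be used in a RegExp.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Sketch of the fixLines.bat loop; `chunks` simulates successive StdIn.Read() results.
function fixChunks(chunks, delim, delimReplace, nl, nlReplace) {
  const delimRegex = new RegExp(escapeRegExp(delim), 'g');
  const nlRegex = new RegExp(escapeRegExp(nl), 'g');
  let out = '';
  let str = '';
  let pos = 0;
  for (const chunk of chunks) {
    // Keep the unprocessed tail from the previous pass and append the new chunk.
    str = str.substring(pos) + chunk;
    pos = str.lastIndexOf(delim);
    if (pos >= 0) {
      // Convert everything up to and including the last delimiter.
      pos += delim.length;
      out += str.substring(0, pos).replace(nlRegex, nlReplace).replace(delimRegex, delimReplace);
    } else {
      pos = 0; // no delimiter yet: keep the whole string for the next pass
    }
  }
  // The final tail contains no delimiter, so only newlines are replaced.
  if (str.length > pos) out += str.substring(pos).replace(nlRegex, nlReplace);
  return out;
}

// Same input, split awkwardly between \r and \n:
console.log(JSON.stringify(fixChunks(['a\r', '\nb#@#@#c\r\nd#@#@#e'], '#@#@#', '\r\n', '\r\n', ' ')));
// prints "a b\r\nc d\r\ne"
```

The tail that is carried over can grow without bound if the input contains no delimiter for a long stretch, so memory use depends on the longest gap between delimiters, not on the chunk size alone.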
fixLines.bat
```bat
@if (@X)==(@Y) @end /* Harmless hybrid line that begins a JScript comment

::--- Batch section within JScript comment that calls the internal JScript ----
@echo off
setlocal disableDelayedExpansion
if "%~1" equ "" (
  echo Error: missing input argument
  exit /b 1
)
if "%~2" equ "" (
  set "out=%~f1.new"
) else (
  set "out=%~2"
)
<"%~1" >"%out%" cscript //nologo //E:JScript "%~f0"
if "%~2" equ "" move /y "%out%" "%~1" >nul
exit /b

----- End of JScript comment, beginning of normal JScript ------------------*/
// delim and nl are used as RegExp patterns, so escape any regex metacharacters if you change them
var delim='#@#@#', delimReplace='\r\n',
    nl='\r\n', nlReplace=' ',
    pos=0, str='';
var delimRegex=new RegExp(delim,"g"), nlRegex=new RegExp(nl,"g");

while( !WScript.StdIn.AtEndOfStream ) {
  // Keep the unprocessed tail from the last pass and append the next ~1 MB
  str=str.substring(pos)+WScript.StdIn.Read(1000000);
  // Everything up to the end of the last delimiter can be converted safely
  pos=str.lastIndexOf(delim);
  if (pos>=0) {
    pos+=delim.length;
    WScript.StdOut.Write(str.substring(0,pos).replace(nlRegex,nlReplace).replace(delimRegex,delimReplace));
  } else {
    pos=0; // no delimiter yet, so keep the whole string for the next pass
  }
}
// The remaining tail contains no delimiter, only newlines to replace
if (str.length>pos) WScript.StdOut.Write(str.substring(pos).replace(nlRegex,nlReplace));
```
To fix input.txt and write the output to output.txt:
```
fixLines input.txt output.txt
```
To overwrite the original file test.txt (the output is first written to test.txt.new, which is then moved over the original):
```
fixLines test.txt
```
Just for kicks, I attempted to process the 1.8 GB file using JREPL.BAT. I didn't expect it to work, because with the /M option JREPL must load the entire file into memory. No matter how much memory is installed in the computer, JScript is limited to a maximum string size of 2 GB, and I suspect additional constraints come into play as well.
```
jrepl "\r?\n:#@#@#" " :\r\n" /m /x /t : /f input.txt /o output.txt
```
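For comparison, the JREPL command amounts to one global regex with an alternation and a per-match replacement applied to the whole input. The following is a rough plain-JavaScript sketch of that idea, not JREPL's actual code; `translateAll` is a hypothetical name. Note that `\r?\n` also converts bare `\n` line endings, whereas fixLines.bat only converts `\r\n`.

```javascript
// Sketch of the idea behind:  jrepl "\r?\n:#@#@#" " :\r\n" /m /x /t :
// One pass over the whole string; each match is mapped to its own replacement.
function translateAll(text) {
  return text.replace(/\r?\n|#@#@#/g, function (m) {
    return m === '#@#@#' ? '\r\n' : ' ';
  });
}

console.log(JSON.stringify(translateAll('a\r\nb#@#@#c\nd')));
// prints "a b\r\nc d"
```

Because both substitutions happen in a single pass, the `\r\n` produced for a delimiter is never rescanned and turned into a space; fixLines.bat gets the same effect by replacing newlines before delimiters. The drawback is that `text` must hold the entire file at once, which is what runs out of memory on a 1.8 GB input.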
It took 5 minutes for the command to fail with an "Out Of Memory" error. Then it took a long time for my computer to recover from the severe memory abuse.
Below is my original custom batch/JScript solution that reads and writes one character at a time.
slow.bat
```bat
@if (@X)==(@Y) @end /* Harmless hybrid line that begins a JScript comment

::--- Batch section within JScript comment that calls the internal JScript ----
@echo off
setlocal disableDelayedExpansion
if "%~1" equ "" (
  echo Error: missing input argument
  exit /b 1
)
if "%~2" equ "" (
  set "out=%~f1.new"
) else (
  set "out=%~2"
)
<"%~1" >"%out%" cscript //nologo //E:JScript "%~f0"
if "%~2" equ "" move /y "%out%" "%~1" >nul
exit /b

----- End of JScript comment, beginning of normal JScript ------------------*/
var delim='#@#@#', delimReplace='\r\n', nlReplace=' ',
    pos=0, chr;   // pos = number of delimiter characters matched so far

while( !WScript.StdIn.AtEndOfStream ) {
  chr=WScript.StdIn.Read(1);
  if (chr==delim.charAt(pos)) {
    // Extend the partial match; emit the replacement once the whole delimiter is seen
    if (++pos==delim.length) {
      WScript.StdOut.Write(delimReplace);
      pos=0;
    }
  } else {
    // Mismatch: flush any partially matched delimiter text unchanged
    if (pos) {
      WScript.StdOut.Write(delim.substring(0,pos));
      pos=0;
    }
    // The current char may itself start a new delimiter (e.g. "#@##@#@#").
    // Restarting at position 0 is enough for '#@#@#', but not for every delimiter.
    if (chr==delim.charAt(0)) {
      pos=1;
    } else if (chr=='\n') {
      WScript.StdOut.Write(nlReplace);
    } else if (chr!='\r') {
      WScript.StdOut.Write(chr);
    }
  }
}
// Flush a partial delimiter left at end of file
if (pos) WScript.StdOut.Write(delim.substring(0,pos));
```
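A character-at-a-time matcher like the one in slow.bat has to decide what to do when a partial delimiter match fails. Simply flushing the partial match can miss a delimiter that begins inside it (the #@#@# starting at index 3 of `#@##@#@#`), and for a self-overlapping delimiter such as `aab`, even restarting at the first character is not enough. The standard general fix is the Knuth-Morris-Pratt (KMP) failure table. Below is a sketch of that swapped-in technique as pure JavaScript; `buildFailure` and `streamReplace` are hypothetical names, and the string argument stands in for the stream of `Read(1)` results.

```javascript
// KMP failure table: fail[i] = length of the longest proper prefix of
// delim[0..i] that is also a suffix of it.
function buildFailure(delim) {
  const fail = new Array(delim.length).fill(0);
  let k = 0;
  for (let i = 1; i < delim.length; i++) {
    while (k > 0 && delim[i] !== delim[k]) k = fail[k - 1];
    if (delim[i] === delim[k]) k++;
    fail[i] = k;
  }
  return fail;
}

// Streaming replace in the style of slow.bat: \n -> nlReplace, \r dropped,
// delim -> delimReplace, using KMP fallback instead of a naive restart.
function streamReplace(chars, delim, delimReplace, nlReplace) {
  const fail = buildFailure(delim);
  let out = '';
  let pos = 0; // number of delimiter characters currently matched
  for (const chr of chars) {
    // On mismatch, fall back to the longest prefix that is still matched,
    // flushing the characters that can no longer be part of a delimiter.
    while (pos > 0 && chr !== delim[pos]) {
      const keep = fail[pos - 1];
      out += delim.substring(0, pos - keep);
      pos = keep;
    }
    if (chr === delim[pos]) {
      if (++pos === delim.length) {
        out += delimReplace;
        pos = 0;
      }
    } else if (chr === '\n') {
      out += nlReplace;
    } else if (chr !== '\r') {
      out += chr;
    }
  }
  out += delim.substring(0, pos); // partial delimiter at end of input
  return out;
}

console.log(streamReplace('aaab', 'aab', 'X', ' '));
// prints aX
```

A naive restart would print `aaab` for that last call, because after `aa` fails on the third `a` it discards the fact that the last `a` is already a valid one-character prefix.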
It worked, but it was a dog. Here are the timing results for processing a 155 MB file:
| Script | Time (155 MB file) |
|---|---|
| slow.bat | 3120 sec (52 min) |
| jrepl.bat | 55 sec |
| fixLines.bat | 15 sec |
I verified that all three solutions gave the same result.