read-line returns a fresh string each time it is called, and this has at least two consequences. First, every call allocates a new string, which brings the overhead of allocation and potential garbage collection. Second, strings in SBCL are Unicode strings by default, and the *standard-input* and *standard-output* streams use UTF-8 by default. So using read-line incurs the combined overhead of string creation, memory management, and Unicode processing.
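The allocation overhead is easy to observe: SBCL's time macro reports bytes consed, and every read-line call shows up there. A quick illustration (a hypothetical snippet, not from the original discussion):

(with-input-from-string (in (format nil "abc~%def~%"))
  (time (loop for line = (read-line in nil)
              while line
              count line)))

;; TIME reports a non-zero "bytes consed" figure even for this tiny
;; input, because each READ-LINE allocates a fresh string.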
From the discussion that has resulted in this Q&A, it seems that the real bottleneck has to do with character I/O and bivalent streams such as *standard-input* and *standard-output*.
For reference, here is the timing to process the 1 GB test file for the OP read-line/write-string program when run on my laptop:
$ time cat dummy_1G.txt | ./op-echo > out_1G.txt

real    3m49.827s
user    1m28.260s
sys     2m21.909s
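(The OP's exact code is not reproduced in this answer; a minimal sketch of such a read-line/write-string loop, as I reconstruct it, would be along these lines.)

#!/usr/bin/env -S sbcl --script

;;; op-echo -- assumed reconstruction of the OP read-line/write-string
;;; loop, for reference only.

(defun main ()
  (loop for line = (read-line *standard-input* nil)
        while line
        do (write-string line *standard-output*)
           (write-char #\Newline *standard-output*)))

(eval-when (:execute) (main))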
There are surely ways to improve on all of the solutions found below.
A Simple Solution
After trying some byte-buffer solutions, it occurred to me that the main bottleneck is in writing characters to *standard-output*; reading characters from *standard-input* seems less troublesome.
Here is a program that uses read-line to read from *standard-input*, but uses the SBCL extension string-to-octets together with write-sequence so that bytes, rather than characters, are written to *standard-output*.
#!/usr/bin/env -S sbcl --script

;;; simple-echo

(declaim (optimize (speed 3)))

(defun main ()
  (loop for line = (read-line *standard-input* nil)
        while line
        ;; Process `line`.
        do (write-sequence (sb-ext:string-to-octets
                            (concatenate 'string line (string #\Newline)))
                           *standard-output*)))

(eval-when (:execute) (main))
This seems to be about the simplest thing you could do to get a significant performance boost; it shows an improvement of nearly 6x over the OP program:
$ time cat dummy_1G.txt | ./simple-echo > out_1G.txt

real    0m40.311s
user    0m38.671s
sys     0m2.167s
Another Simple (But Not Portable) Solution
The idea of @ignis volens to use /dev/stdin can be extended to use /dev/stdout and combined with the idea to write bytes instead of characters to output.
#!/usr/bin/env -S sbcl --script

;;; linux-echo

(declaim (optimize (speed 3)))

(defun main ()
  (with-open-file (in "/dev/stdin")
    (with-open-file (out "/dev/stdout"
                         :direction :output
                         :if-exists :append
                         :element-type '(unsigned-byte 8))
      (loop for line = (read-line in nil)
            while line
            ;; Process line.
            do (write-sequence (sb-ext:string-to-octets
                                (concatenate 'string line (string #\Newline)))
                               out)))))

(eval-when (:execute) (main))
This is the fastest solution I have tested, but it relies on the Linux /dev/stdin and /dev/stdout device files: it may or may not work on other Unix-like platforms, and it will certainly fail on Windows. But for the right user this solution provides a 10x speedup over the original OP code.
$ time cat dummy_1G.txt | ./linux-echo > out_1G.txt

real    0m24.187s
user    0m22.407s
sys     0m2.233s
A Portable Solution
This solution is a little bit faster than the first simple solution, but not as fast as the non-portable solution using /dev/stdin and /dev/stdout. But this solution will work on any platform running SBCL.
In an earlier version of this answer I showed a solution which read the input in blocks of bytes and searched those bytes for newline bytes. This was reasonably fast, but fragile, required platform-specific handling of newlines, and required that lines could sensibly be byte-wise interpreted (meaning that input containing unicode characters would be problematic).
Here is an updated version of that idea which is not so problematic, does not rely on platform-specific treatment of newlines, and can be used with unicode input.
Here the input is read from *standard-input* into a byte buffer for speed. The byte buffer is then converted into a character buffer using the SBCL extension octets-to-string.
The character buffer is traversed line-by-line as needed when get-line is called. When the character buffer is exhausted, the byte buffer is refilled and a new character buffer is provided.
#!/usr/bin/env -S sbcl --script

;;;; process-lines

;; Buffer for raw I/O bytes.
(defparameter *buffer-size* 4096)
(defparameter *byte-buffer*
  (make-array *buffer-size* :element-type '(unsigned-byte 8)))

;; Buffer for characters converted from byte buffer.
(defparameter *char-buffer* (make-array 0 :element-type 'character))
(defparameter *end-char* 0)   ; One-past last buffered character.
(defparameter *start-line* 0) ; Start of next line in character buffer.

(declaim (optimize (speed 3))
         (type (simple-array (unsigned-byte 8) (*)) *byte-buffer*)
         (type (simple-array character (*)) *char-buffer*)
         (type fixnum *buffer-size* *end-char* *start-line*))

(defun get-line ()
  (when (zerop *start-line*) ; Attempt to fill byte buffer when empty.
    (let ((end-byte (read-sequence *byte-buffer* *standard-input*)))
      (if (zerop end-byte)
          (return-from get-line nil) ; Return nil when input is exhausted.
          (setf *char-buffer* (sb-ext:octets-to-string *byte-buffer*
                                                       :end end-byte)
                *end-char* (length *char-buffer*)))))
  (if (<= *start-line* *end-char*) ; Otherwise: end of input reached.
      (let ((end-line (position #\Newline *char-buffer* :start *start-line*)))
        (if end-line
            (let ((start *start-line*))
              (declare (type fixnum start))
              (setf *start-line* (+ end-line 1))
              (subseq *char-buffer* start (+ end-line 1)))
            ;; Line end not in byte buffer.
            (let ((partial-line (subseq *char-buffer* *start-line* *end-char*)))
              (declare (type (simple-array character) partial-line))
              (setf *start-line* 0)
              (let ((finish-line (get-line)))
                (if finish-line
                    (concatenate 'string partial-line finish-line)
                    partial-line)))))
      nil))

(defun main ()
  (do ((line (get-line) (get-line)))
      ((null line))
    ;; Do something with `line`, e.g.:
    ;; (setf (aref line 0) #\B)
    ;; (write-string line *standard-output*) ; slow
    (write-sequence (sb-ext:string-to-octets line) *standard-output*)))

(eval-when (:execute) (main))
This solution allows input to be read quickly as bytes from *standard-input*, but the result returned from get-line is a string, so normal string operations can be used for line processing. The processed line is then converted back to bytes before writing to *standard-output*. The result is much faster than the original OP read-line/write-string program:
$ time cat dummy_1G.txt | ./process-lines > out_1G.txt

real    0m33.768s
user    0m32.459s
sys     0m2.210s
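To customize the processing step, only the body of main needs to change. For example (a hypothetical processing step), to upcase every line before writing it:

(defun main ()
  (do ((line (get-line) (get-line)))
      ((null line))
    ;; Hypothetical processing: upcase the line, then write it as bytes.
    (write-sequence (sb-ext:string-to-octets (string-upcase line))
                    *standard-output*)))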
Note that using (write-sequence (sb-ext:string-to-octets line) *standard-output*) here instead of (write-string line *standard-output*) resulted in a 7x speedup of this program, and the program as a whole is about 7x faster than the OP program. This leads me to believe that the main performance bottleneck is in writing the output, not reading the input.
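If you want to reproduce that comparison in isolation, here is a rough micro-benchmark sketch (my own, names assumed; run it as a script with output redirected to /dev/null so that terminal rendering does not dominate the timings):

;; Compare the two output paths for N copies of an 80-character line.
(defun bench-output (n)
  (let ((line (make-string 80 :initial-element #\x)))
    ;; Character path, as in the OP program:
    (time (dotimes (i n) (write-string line *standard-output*)))
    ;; Byte path, as used above; this works because SBCL's standard
    ;; streams accept octets when the script is run from a shell:
    (time (dotimes (i n)
            (write-sequence (sb-ext:string-to-octets line)
                            *standard-output*)))))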
Processing Bytes in Bulk
One way to reduce the overhead is to treat I/O as bytes instead of characters. This avoids the poor character I/O performance of current versions of SBCL. Reading the bytes into a suitably-sized buffer also helps with performance.
Here is a simple script that echoes *standard-input* to *standard-output*:
#!/usr/bin/env -S sbcl --script

;;;; my-echo

(defparameter *buffer-size* 4096)

(defun main ()
  (declare (optimize (speed 3)))
  (loop with buffer = (make-array *buffer-size*
                                  :element-type '(unsigned-byte 8))
        for pos = (read-sequence buffer *standard-input*)
        do (write-sequence buffer *standard-output* :start 0 :end pos)
        until (< pos *buffer-size*)))

(eval-when (:execute) (main))
This program uses read-sequence and write-sequence to process bytes, and on my (old and slowish) laptop the result is almost 30 times faster than the same program using an array of character elements. Here is a timing result for the 1 GB file described by OP:
$ time cat dummy_1G.txt | ./my-echo > out_1G.txt

real    0m1.270s
user    0m0.238s
sys     0m1.367s
For sheer byte-copying speed, this program is about 180x faster than the OP program.
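For comparison, the character-element variant mentioned above differs from my-echo only in the buffer's element type; a sketch:

;; The slow variant: same loop, but buffering characters instead of
;; bytes, which routes every block through UTF-8 decoding and encoding.
(defun main ()
  (declare (optimize (speed 3)))
  (loop with buffer = (make-array *buffer-size* :element-type 'character)
        for pos = (read-sequence buffer *standard-input*)
        do (write-sequence buffer *standard-output* :start 0 :end pos)
        until (< pos *buffer-size*)))

On my laptop that character version is almost 30x slower than the byte version, as noted above.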