There are a lot of answers already, but unfortunately most of them are just tiny savings on a problem that is barely optimizable at the code level...
I worked on several projects where line counting was a core function of the software, and working as fast as possible with a huge number of files was of paramount importance.
The main bottleneck of line counting is I/O access: you have to read through the whole file to detect the line-return characters, and there is simply no way around that. The second potential bottleneck is memory management: the more you load at once, the faster you can process, but this bottleneck is negligible compared to the first.
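For reference, here is a minimal chunked counter in Python that illustrates the point (the 1 MiB buffer size is an arbitrary choice): the work is essentially one sequential pass over the raw bytes looking for newlines, so the disk, not the code, sets the pace.

```python
def count_lines(path, buf_size=1024 * 1024):
    """Count newline bytes by reading large raw chunks, so that I/O
    dominates instead of Python-level per-line overhead.
    Note: a final line without a trailing '\n' is not counted."""
    count = 0
    with open(path, "rb") as f:
        chunk = f.read(buf_size)
        while chunk:
            count += chunk.count(b"\n")
            chunk = f.read(buf_size)
    return count
```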
Hence, there are three major ways to reduce the processing time of a line-counting function, apart from tiny optimizations such as disabling garbage collection and other micro-managing tricks:
Hardware solution: the major and most obvious way is non-programmatic: buy a very fast SSD/flash hard drive. By far, this is how you can get the biggest speed boosts.
Data preparation solution (preprocessing and lines parallelization): if you generate the files you process, can modify how they are generated, or can acceptably preprocess them, then first convert the line endings to Unix style (\n), since this saves one character per line compared to the Windows style \r\n (not a big saving, but it's an easy gain); secondly, and most importantly, write lines of fixed length if you can. If you need variable length, you can pad shorter lines, as long as the length variability is not too big. This way you can calculate the number of lines instantly from the total file size, which is much faster to access (this calculation is sketched below). Also, with fixed-length lines you can generally pre-allocate memory, which speeds up processing, and you can process lines in parallel. Of course, parallelization works better with an SSD/flash disk, which has much faster random-access I/O than an HDD. Often, the best solution to a problem is to preprocess it so that it better fits your end purpose.
Parallelization (multiple disks + hardware solution): if you can buy multiple hard disks (ideally SSDs), you can go beyond the speed of a single disk by storing your files in a balanced way across the disks (easiest is to balance by total size) and reading from all of them in parallel (a parallel-reading sketch is also given below). You can then expect a speed boost roughly proportional to the number of disks. If buying multiple disks is not an option, parallelization likely won't help (except if your disk has multiple read heads, as some professional-grade disks do; but even then the disk's internal cache memory and PCB circuitry will likely be a bottleneck and prevent you from using all heads fully in parallel, and you would also have to write code specific to that drive, because you need to know the exact cluster mapping to store your files on clusters under different heads and read them back with different heads). Indeed, sequential reading is almost always faster than random reading, and parallelizing on a single disk behaves more like random reading than sequential reading (you can test your hard drive in both modes with CrystalDiskMark, for example).
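Here is a sketch of the fixed-length idea; LINE_LEN is a made-up record size, assuming every line has been padded to exactly that many bytes, including the trailing newline:

```python
import os

LINE_LEN = 128  # hypothetical fixed record length in bytes, including the trailing '\n'

def count_fixed_length_lines(path, line_len=LINE_LEN):
    """With every line padded to exactly line_len bytes, the count is
    just file size // line length: no file contents are read at all."""
    size = os.path.getsize(path)
    if size % line_len:
        raise ValueError("file size is not a multiple of the fixed line length")
    return size // line_len
```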
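And a sketch of counting several files in parallel, one worker process per file. The paths are placeholders; the speed-up only materializes when the files really sit on separate physical disks, or on an SSD with good random I/O:

```python
from concurrent.futures import ProcessPoolExecutor

def _count_one(path, buf_size=1024 * 1024):
    # Same chunked newline counting as the earlier sketch.
    count = 0
    with open(path, "rb") as f:
        chunk = f.read(buf_size)
        while chunk:
            count += chunk.count(b"\n")
            chunk = f.read(buf_size)
    return count

def count_lines_parallel(paths, max_workers=None):
    """One worker process per file; worthwhile mainly when the files
    sit on different physical disks or on a fast SSD."""
    with ProcessPoolExecutor(max_workers=max_workers) as pool:
        return dict(zip(paths, pool.map(_count_one, paths)))

if __name__ == "__main__":
    # Hypothetical paths, ideally spread over different physical disks.
    print(count_lines_parallel(["/disk1/a.log", "/disk2/b.log"]))
```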
If none of those are an option, then you can only rely on micromanaging tricks to improve the speed of your line-counting function by a few percent, but don't expect anything really significant. Rather, expect the time you spend tweaking to be disproportionate compared to the speed improvement you'll see.
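For completeness, here is what one such micro-tweak looks like, the garbage-collection trick mentioned earlier, as a sketch (the wrapped call in the comment is hypothetical and refers to the earlier counting sketch):

```python
import gc
from contextlib import contextmanager

@contextmanager
def gc_paused():
    """Temporarily disable the garbage collector around a hot loop;
    for line counting this is usually worth a few percent at best."""
    was_enabled = gc.isenabled()
    gc.disable()
    try:
        yield
    finally:
        if was_enabled:
            gc.enable()

# Hypothetical usage, wrapping one of the counting sketches above:
# with gc_paused():
#     total = count_lines("big_file.txt")
```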