How to read first N lines of a file?

Question

We have a large raw data file that we would like to trim to a specified size.

How would I go about getting the first N lines of a text file in Python? Will the OS being used have any effect on the implementation?

Can I give n as a command-line argument?

– user6882757, Commented Jul 23, 2019 at 9:07
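On the command-line question in the comment above: one possible approach is to parse the path and N with `argparse`. The sketch below is only an illustration; the names `head_lines` and `main` are hypothetical and do not come from any answer in this thread. As for the OS, opening the file in text mode lets Python translate platform line endings, so the same code should behave the same on Windows and Unix.

```python
import argparse
from itertools import islice


def head_lines(path, n):
    """Return up to the first n lines of the text file at path."""
    # Text mode (the default) translates \r\n and \r to \n on every OS.
    with open(path) as input_file:
        return list(islice(input_file, n))


def main(argv):
    """Parse PATH and N from argv and print the first N lines."""
    parser = argparse.ArgumentParser(description="Print the first N lines of a file.")
    parser.add_argument("path", help="file to read")
    parser.add_argument("n", type=int, help="number of lines to print")
    args = parser.parse_args(argv)
    for line in head_lines(args.path, args.n):
        print(line, end="")  # lines already end with a newline
```

In a script, `main(sys.argv[1:])` would typically be called under the usual `if __name__ == "__main__":` guard (after `import sys`).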

Community · Accepted Answer · 2023-05-17 05:58:32Z

345

Python 3:

    with open(path_to_file) as input_file:
        head = [next(input_file) for _ in range(lines_number)]
    print(head)

Python 2:

    with open(path_to_file) as input_file:
        head = [next(input_file) for _ in xrange(lines_number)]
    print head

Here's another way (both Python 2 & 3):

    from itertools import islice

    with open(path_to_file) as input_file:
        head = list(islice(input_file, lines_number))
    print(head)

edited May 17, 2023 at 5:58

CommunityBot


answered Nov 20, 2009 at 0:27

John La Rooy


16 Comments

Russell Over a year ago

Thanks, that is very helpful indeed. What is the difference between the two? (in terms of performance, required libraries, compatibility etc)?

John La Rooy Over a year ago

I expect the performance to be similar, maybe the first to be slightly faster. But the first one won't work if the file doesn't have at least N lines. You are best to measure the performance against some typical data you will be using it with.

Alasdair Over a year ago

The with statement works on Python 2.6, and requires an extra import statement on 2.5. For 2.4 or earlier, you'd need to rewrite the code with a try...except block. Stylistically, I prefer the first option, although as mentioned the second is more robust for short files.

alicederyn Over a year ago

islice is probably faster as it is implemented in C.

Ilian Iliev Over a year ago

Keep in mind that if the file has fewer than N lines, this will raise a StopIteration exception that you must handle.
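The difference the comments describe can be seen with a short input. This is a minimal sketch, assuming an `io.StringIO` object as a stand-in for an open file; `head_strict` and `head_lenient` are hypothetical names for the `next()` and `islice` variants from the answer.

```python
import io
from itertools import islice


def head_strict(lines, n):
    """Return exactly n lines; raises StopIteration if fewer are available."""
    return [next(lines) for _ in range(n)]


def head_lenient(lines, n):
    """Return up to n lines; a short input simply gives a shorter list."""
    return list(islice(lines, n))


short_file = io.StringIO("first\nsecond\n")  # stands in for a 2-line file
print(head_lenient(short_file, 5))  # both lines, no error

short_file.seek(0)  # rewind before trying the strict variant
try:
    head_strict(short_file, 5)
except StopIteration:
    print("fewer than 5 lines available")
```

If short files are expected, the `islice` form avoids the need for a `try`/`except` around the read.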


Apostla · Accepted Answer · 2025-04-13 11:25:35Z

33

    N = 10
    with open("file.txt", "r") as file:
        for i in range(N):
            line = next(file).strip()
            print(line)

edited Apr 13 at 11:25

Apostla


answered Nov 20, 2009 at 2:04

ghostdog74


4 Comments

AMC Over a year ago

Why open the file in append mode?

Ekrem Dinçel Over a year ago

@AMC I think it is for not deleting the file, but we should use 'r' here instead.

AMC Over a year ago

@Kowalski Append mode is for adding to the file, r is indeed the more logical choice, I think.

lena Over a year ago

@ghostdog74, how can I read the next N values?
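One way to read the next N lines is to keep pulling from the same open file object, since it remembers its position. The sketch below is an assumption about what the commenter wants, not something taken from the answer; `read_in_batches` is a hypothetical helper and `io.StringIO` stands in for an open file.

```python
import io
from itertools import islice


def read_in_batches(lines, n):
    """Yield successive lists of up to n lines from an iterable of lines."""
    lines = iter(lines)  # make sure every islice call resumes where the last stopped
    while True:
        batch = list(islice(lines, n))
        if not batch:
            return
        yield batch


sample = io.StringIO("1\n2\n3\n4\n5\n")  # stands in for an open file
for batch in read_in_batches(sample, 2):
    print([line.strip() for line in batch])
```

The last batch may be shorter than N when the number of lines is not a multiple of N.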

G M · Accepted Answer · 2020-04-08 08:46:23Z

If you want to read the first lines with minimal code and you don't care about performance, you can use .readlines(), which returns a list, and then slice the list.

E.g. for the first 5 lines:

    with open("pathofmyfileandfileandname") as myfile:
        firstNlines = myfile.readlines()[0:5]  # put here the interval you want

Note: the whole file is read, so this is not the best option from a performance point of view, but it is easy to use, fast to write, and easy to remember, so it is very convenient for a one-time calculation.

    print(firstNlines)

One advantage compared to the other answers is that you can easily select a range of lines, e.g. skipping the first 10 lines with [10:30], skipping the last 10 with [:-10], or taking only even-indexed lines with [::2].

The top answer is probably way more efficient, but this one works like a charm for small files.
Note that this actually reads the whole file into a list first (myfile.readlines()) and then slices the first 5 lines of it.
I see no reason to use this, it's not any simpler than the vastly more efficient solutions.
@AMC thanks for the feedback, I use it in the console for exploring the data when I have to have a quick look to the first lines, it just saves me time in writing code.
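For larger files, range selections like these can also be done without building the full list. The following is a sketch under the assumption that forward ranges map onto `itertools.islice` and that "the last n lines" can be kept in a bounded `collections.deque`; `line_range` and `last_lines` are hypothetical names.

```python
import io
from collections import deque
from itertools import islice


def line_range(lines, start, stop, step=1):
    """Lazily return lines[start:stop:step] without reading past stop."""
    return list(islice(lines, start, stop, step))


def last_lines(lines, n):
    """Return the last n lines, keeping at most n lines in memory."""
    return list(deque(lines, maxlen=n))


sample = "".join(f"{i}\n" for i in range(40))  # 40 numbered lines
print(line_range(io.StringIO(sample), 10, 30)[:3])  # starts at the 11th line
print(last_lines(io.StringIO(sample), 2))
```

Note that `islice` does not accept negative indices, so a slice such as [:-10] has no direct lazy equivalent here; `last_lines` still iterates over the whole input but stores only n lines at a time.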

AMC · Accepted Answer · 2020-08-19 18:39:40Z

13

What I do is read the first N lines using pandas. The performance is probably not the best, but for example, if N=1000:

    import pandas as pd

    yourfile = pd.read_csv('path/to/your/file.csv', nrows=1000)

edited Aug 19, 2020 at 18:39

AMC


answered Apr 11, 2017 at 14:54

RRuiz


4 Comments

philshem Over a year ago

Better would be to use the nrows option, which can be set to 1000 and the entire file isn't loaded. pandas.pydata.org/pandas-docs/stable/generated/… In general, pandas has this and other memory-saving techniques for big files.

RRuiz Over a year ago

Yes, you are right. I just corrected it. Sorry for the mistake.

philshem Over a year ago

You may also want to add sep to define a column delimiter (which shouldn't occur in a non-csv file)

AMC Over a year ago

@Cro-Magnon I cannot find the pandas.read() function in the documentation, do you know of any information on the subject?

u0b34a0f6ae · Accepted Answer · 2009-11-20 00:58:29Z

8

The file object does not expose a dedicated method for reading a given number of lines.

I guess the easiest way would be the following:

    lines = []
    with open(file_name) as f:
        lines.extend(f.readline() for i in xrange(N))

edited Nov 20, 2009 at 0:58

u0b34a0f6ae


answered Nov 20, 2009 at 0:27

artdanil


1 Comment

artdanil Over a year ago

This is something I had actually intended, though I thought of adding each line to a list. Thank you.

FatihAkici · Accepted Answer · 2018-03-02 23:42:23Z

The two most intuitive ways of doing this would be:

Iterate on the file line-by-line, and break after N lines.
Iterate on the file line-by-line using the next() method N times. (This is essentially just a different syntax for what the top answer does.)

Here is the code:

    # Method 1:
    with open("fileName", "r") as f:
        counter = 0
        for line in f:
            print line
            counter += 1
            if counter == N:
                break

    # Method 2:
    with open("fileName", "r") as f:
        for i in xrange(N):
            line = f.next()
            print line

The bottom line is, as long as you don't use readlines() or enumerate the whole file into memory, you have plenty of options.

"The bottom line is, as long as you don't use readlines() or enumerate the whole file into memory, you have plenty of options." Isn't enumerate() lazy?
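On the laziness question: `enumerate()` does pull items one at a time, so combining it with `break` reads no further than needed. A minimal sketch, with the hypothetical helper `head_with_enumerate` and `io.StringIO` standing in for an open file:

```python
import io


def head_with_enumerate(lines, n):
    """Collect the first n lines, stopping as soon as enough have been read."""
    result = []
    if n <= 0:
        return result
    for index, line in enumerate(lines, start=1):
        result.append(line)
        if index == n:
            break  # enumerate is lazy: no line past n is pulled from the iterator
    return result


sample = io.StringIO("a\nb\nc\nd\n")
print(head_with_enumerate(sample, 2))
print(next(sample))  # the third line is still waiting to be read
```

The whole file only ends up in memory if the enumerate object is exhausted into a container, for example with `list(enumerate(f))`.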

fdb · Accepted Answer · 2011-01-20 19:42:58Z

Based on gnibbler's top-voted answer (Nov 20 '09 at 0:27): this class adds head() and tail() methods to the file object.

    class File(file):
        def head(self, lines_2find=1):
            self.seek(0)  # Rewind file
            return [self.next() for x in xrange(lines_2find)]

        def tail(self, lines_2find=1):
            self.seek(0, 2)  # go to end of file
            bytes_in_file = self.tell()
            lines_found, total_bytes_scanned = 0, 0
            while (lines_2find+1 > lines_found and
                   bytes_in_file > total_bytes_scanned):
                byte_block = min(1024, bytes_in_file-total_bytes_scanned)
                self.seek(-(byte_block+total_bytes_scanned), 2)
                total_bytes_scanned += byte_block
                lines_found += self.read(1024).count('\n')
            self.seek(-total_bytes_scanned, 2)
            line_list = list(self.readlines())
            return line_list[-lines_2find:]

Usage:

    f = File('path/to/file', 'r')
    f.head(3)
    f.tail(3)
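The class above subclasses the built-in `file` type, which exists only in Python 2. A rough Python 3 counterpart could be written as plain functions, as sketched below. This is not the author's code: the function names and the `block_size` parameter are assumptions, and the file is opened in binary mode (so lines come back as `bytes`) because text-mode files do not support seeking backwards from the end.

```python
import os
from itertools import islice


def head(path, n=1):
    """Return up to the first n lines of the file as bytes objects."""
    with open(path, "rb") as f:
        return list(islice(f, n))


def tail(path, n=1, block_size=1024):
    """Return up to the last n lines of the file as bytes objects."""
    if n <= 0:
        return []
    with open(path, "rb") as f:
        f.seek(0, os.SEEK_END)
        position = f.tell()
        data = b""
        # Read blocks backwards until more than n newlines have been seen
        # (so the first of the n lines is complete) or the start is reached.
        while position > 0 and data.count(b"\n") <= n:
            step = min(block_size, position)
            position -= step
            f.seek(position)
            data = f.read(step) + data
    return data.splitlines(keepends=True)[-n:]
```

Usage would look like `head('path/to/file', 3)` and `tail('path/to/file', 3)`; decoding the returned bytes is left to the caller.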

Maxim Plaksin · Accepted Answer · 2011-12-07 09:03:39Z

The most convenient way, in my opinion:

    LINE_COUNT = 3
    print [s for (i, s) in enumerate(open('test.txt')) if i < LINE_COUNT]

This solution is based on a list comprehension. The object returned by open() supports the iteration interface. enumerate() wraps it and returns (index, item) tuples; we then check that the index is inside the accepted range (if i < LINE_COUNT) and simply print the result.

Enjoy the Python. ;)

This just seems like a slightly more complex alternative to [next(file) for _ in range(LINE_COUNT)].
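One detail worth noting about the comprehension above: the `if i < LINE_COUNT` filter does not stop the iteration, so the rest of the file is still read and discarded. A possible variation that ends early uses `itertools.takewhile`; this is a sketch with a hypothetical helper name, not part of the original answer.

```python
import io
from itertools import takewhile


def head_takewhile(lines, n):
    """Return the first n lines, ending iteration once the index reaches n."""
    indexed = takewhile(lambda pair: pair[0] < n, enumerate(lines))
    return [line for _, line in indexed]


sample = io.StringIO("a\nb\nc\nd\ne\n")  # stands in for an open file
print(head_takewhile(sample, 2))
```

`takewhile` has to pull one extra item to discover that the condition failed, so one line beyond the first n is consumed from the file object.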

Surya Chhetri · Accepted Answer · 2016-10-28 02:36:25Z

For the first 5 lines, simply do:

    N = 5
    with open("data_file", "r") as file:
        for i in range(N):
            print file.next()

John Machin · Accepted Answer · 2009-11-20 02:00:36Z

If you want something that obviously (without looking up esoteric stuff in manuals) works without imports and try/except and works on a fair range of Python 2.x versions (2.2 to 2.6):

    def headn(file_name, n):
        """Like *x head -N command"""
        result = []
        nlines = 0
        assert n >= 1
        for line in open(file_name):
            result.append(line)
            nlines += 1
            if nlines >= n:
                break
        return result

    if __name__ == "__main__":
        import sys
        rval = headn(sys.argv[1], int(sys.argv[2]))
        print rval
        print len(rval)

Alejandro D. Somoza · Accepted Answer · 2014-11-25 06:25:07Z

If you have a really big file, and assuming you want the output to be a numpy array, using np.genfromtxt will freeze your computer. This is so much better in my experience:

    import numpy as np

    def load_big_file(fname, maxrows):
        '''only works for well-formed text file of space-separated doubles'''
        rows = []  # unknown number of lines, so use list
        with open(fname) as f:
            j = 0
            for line in f:
                if j == maxrows:
                    break
                else:
                    line = [float(s) for s in line.split()]
                    rows.append(np.array(line, dtype=np.double))
                    j += 1
        return np.vstack(rows)  # convert list of vectors to array

"If you have a really big file, and assuming you want the output to be a numpy array" That's quite a unique set of restrictions, I can't really see any advantages to this over the alternatives.

Linh K Ha · Accepted Answer · 2021-07-10 13:03:24Z

To handle a file with fewer than n lines, I fall back to reading the whole file:

    def head(filename: str, n: int):
        try:
            with open(filename) as f:
                head_lines = [next(f).rstrip() for x in range(n)]
        except StopIteration:
            with open(filename) as f:
                head_lines = f.read().splitlines()
        return head_lines

Credit goes to John La Rooy and Ilian Iliev. Use this function for the best performance together with exception handling.

Revision 1: Thanks to FrankM for the feedback. To handle file existence and read permission, we can further add:

    import errno
    import os

    def head(filename: str, n: int):
        if not os.path.isfile(filename):
            raise FileNotFoundError(errno.ENOENT, os.strerror(errno.ENOENT), filename)
        if not os.access(filename, os.R_OK):
            raise PermissionError(errno.EACCES, os.strerror(errno.EACCES), filename)

        try:
            with open(filename) as f:
                head_lines = [next(f).rstrip() for x in range(n)]
        except StopIteration:
            with open(filename) as f:
                head_lines = f.read().splitlines()
        return head_lines

You can either go with the second version, or go with the first one and handle the file exception later. The check is quick and mostly free from a performance standpoint.

Well, this isn't foolproof. If there is an exception, you try to read the file again, which could throw another exception. This works if the file exists and you have permission to read it. If not, it results in an exception. The accepted answer provides (solution 3) a variant which does the same using islice (reads the whole file when it has fewer lines). But your solution is better than variants 1 and 2.
Thanks @FrankM for the feedback, please see my revised answer.

ibuch · Accepted Answer · 2025-04-08 13:10:13Z

Elaborating on previous answer from G M:

If you want to read the first lines quickly and you care about performance, you can use .readlines(n), which reads roughly the first n bytes, and then slice the list with [0:5]. The number of bytes to read is given by the sizehint argument (1024 in the example):

    with open("pathofmyfileandfileandname") as myfile:
        firstNlines = myfile.readlines(1024)[0:5]
        # (): max byte count to read from file
        # []: put here the interval you want

Only the first part of the file (byte count rounded up to the next buffer size) is read, which speeds up the process if you are skimming bigger data files.

Syntax Reference in Python File readlines() Method

Steve Bading · Accepted Answer · 2012-12-06 18:02:26Z

Starting at Python 2.6, you can take advantage of more sophisticated functions in the IO base class. So the top-rated answer above can be rewritten as:

    with open("datafile") as myfile:
        head = myfile.readlines(N)
    print head

(You don't have to worry about your file having fewer than N lines, since no StopIteration exception is thrown.)

According to the docs N is the number of bytes to read, not the number of lines.
Wow. Talk about poor naming. The function name mentions lines but the argument refers to bytes.
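The point made in these comments can be checked directly. In Python 3 the argument to `readlines()` is a size hint counted in bytes or characters, not a line count. The sketch below assumes an `io.StringIO` object as a stand-in for an open text file and compares the hint with `itertools.islice`, which does count lines.

```python
import io
from itertools import islice

# Ten short lines, each 6 characters long including the newline.
sample = "".join(f"row {i}\n" for i in range(10))

by_hint = io.StringIO(sample).readlines(3)       # 3 is a size hint, not a line count
by_count = list(islice(io.StringIO(sample), 3))  # exactly the first 3 lines

print(len(by_hint), len(by_count))
```

With lines longer than the hint, `readlines(3)` stops well before three lines have been collected, so it is not a drop-in replacement for the top answer.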

Caconde · Accepted Answer · 2019-08-23 21:53:57Z

0

This worked for me

    f = open("history_export.csv", "r")
    line = 5
    for x in range(line):
        a = f.readline()
        print(a)

edited Aug 23, 2019 at 21:53

Caconde


answered Aug 23, 2019 at 19:18

Sukanta


1 Comment

AMC Over a year ago

Why not use a context manager? In any case, I don't see how this improves on the many existing answers.

sandyp · Accepted Answer · 2019-11-11 23:09:16Z

-1

This works for Python 2 & 3:

    from itertools import islice

    with open('/tmp/filename.txt') as inf:
        for line in islice(inf, N, N+M):
            print(line)

answered Nov 11, 2019 at 23:09

sandyp


1 Comment

AMC Over a year ago

This is virtually identical to the decade-old top answer.

Shakirul · Accepted Answer · 2020-04-23 14:44:30Z

    fname = input("Enter file name: ")
    num_lines = 0

    with open(fname, 'r') as f:  # lines count
        for line in f:
            num_lines += 1

    num_lines_input = int(input("Enter line numbers: "))

    if num_lines_input <= num_lines:
        f = open(fname, "r")
        for x in range(num_lines_input):
            a = f.readline()
            print(a)
    else:
        f = open(fname, "r")
        for x in range(num_lines_input):
            a = f.readline()
            print(a)
        print("Don't have", num_lines_input, " lines print as much as you can")

    print("Total lines in the text", num_lines)

shivam singh · Accepted Answer · 2021-10-04 13:23:36Z

Simply convert your CSV file object to a list using list(file_data):

    import csv

    with open('your_csv_file.csv') as file_obj:
        file_data = csv.reader(file_obj)
        file_list = list(file_data)
        for row in file_list[:4]:
            print(row)

This will be horribly slow for huge files, since you have to load every single line just to get the first 4 of them.
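To address the concern in the comment, the reader object can be sliced lazily instead of being converted to a list first. A minimal sketch, assuming the standard `csv` module and a hypothetical helper `first_rows`; `io.StringIO` stands in for an open CSV file.

```python
import csv
import io
from itertools import islice


def first_rows(csv_file, n):
    """Return the first n parsed rows from an open CSV file object."""
    reader = csv.reader(csv_file)
    return list(islice(reader, n))  # stops after n rows instead of loading all


sample = io.StringIO("name,score\nann,1\nbob,2\ncat,3\n")
for row in first_rows(sample, 2):
    print(row)
```

With a real file, the `csv` documentation recommends opening it with `newline=''`, for example `open('your_csv_file.csv', newline='')`.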

Oleksandr Novik · Accepted Answer · 2021-11-20 14:50:54Z

Here's another decent solution with a list comprehension:

    file = open('file.txt', 'r')
    lines = [next(file) for x in range(3)]  # first 3 lines will be in this list
    file.close()

Gelzone · Accepted Answer · 2023-01-06 08:48:46Z

An easy way to get the first 10 lines:

    with open('fileName.txt', mode='r') as file:
        lines = [line.rstrip('\n') for line in file][:10]
        print(lines)

Eric Aya · Accepted Answer · 2017-07-12 16:58:36Z

-2

    #!/usr/bin/python

    import subprocess

    p = subprocess.Popen(["tail", "-n 3", "passlist"], stdout=subprocess.PIPE)
    output, err = p.communicate()
    print output

This method worked for me.

edited Jul 12, 2017 at 16:58

Eric Aya


answered Jul 12, 2017 at 16:25

Mansur Ul Hasan


2 Comments

AMC Over a year ago

This isn't really a Python solution, though.

user9608133 Over a year ago

I do not even understand what is written in your answer. Please add some explanation.
