
I have a very simple question: what is the most efficient way to read the individual entries on each line of a text file with Python?

Suppose I have a text file like:

    42017 360940084.621356 21.00 09/06/2015 13:08:04
    42017 360941465.680841 29.00 09/06/2015 13:31:05
    42017 360948446.517761 16.00 09/06/2015 15:27:26
    42049 361133954.539315 31.00 11/06/2015 18:59:14
    42062 361208584.222483 10.00 12/06/2015 15:43:04
    42068 361256740.238150 19.00 13/06/2015 05:05:40

In C I would do:

    while (fscanf(file_name, "%d %lf %f %d/%d/%d %d:%d:%d",
                  &id, &t0, &score, &day, &month, &year,
                  &hour, &minute, &second) != EOF) {
        ...some instruction...
    }

What would be the best way to do something like this in Python? I want to store every value in a separate variable, since I have to work with those variables throughout the code.

Thanks in advance!


3 Answers


I think muddyfish's answer is good. Here is another way (maybe a bit lighter):

    import time

    with open(file) as f:
        for line in f:
            identifier, t0, score, date, hour = line.split()
            # You can also get a time_struct from the time
            timer = time.strptime(date + hour, "%d/%m/%Y%H:%M:%S")
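
If you need each entry (day, month, year and so on) as its own value, as with the fscanf call in the question, one option is to convert the split pieces and let datetime.strptime parse the date and time together. This is only a minimal sketch, assuming every line holds exactly five whitespace-separated fields. The name parse_line and the sample list are made up for illustration.

```python
from datetime import datetime


def parse_line(line):
    """Split one record and convert each field to a suitable type."""
    identifier, t0, score, date, clock = line.split()
    # Parse date and time together; the parts are then plain attributes.
    stamp = datetime.strptime(date + " " + clock, "%d/%m/%Y %H:%M:%S")
    return int(identifier), float(t0), float(score), stamp


# Sample records copied from the question (normally read from a file).
lines = [
    "42017 360940084.621356 21.00 09/06/2015 13:08:04",
    "42068 361256740.238150 19.00 13/06/2015 05:05:40",
]

for line in lines:
    identifier, t0, score, stamp = parse_line(line)
    # Individual entries, without further calls to split("/") or split(":")
    print(identifier, score, stamp.day, stamp.month, stamp.year,
          stamp.hour, stamp.minute, stamp.second)
```

With a datetime object you can also subtract two timestamps directly, which may be handier than juggling separate day and month integers.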

3 Comments

Note that id is the name of a built-in function, so assigning to it shadows the built-in. If you want to use it as an identifier, use id_ = value instead.
Thanks FunkySayu! I also ended up with something similar. Since I need each single entry (day, month, year, etc.), I was wondering whether there is a faster way, or do I have to call split("/") and split(":") again on the date and time?
The point is that I have to work with each single entry (for example, doing arithmetic with t0 and the different days and months), so I need to store the data in separate variables.

I would look up the str.split() method.

For example you could use

    for line in file.readlines():
        # split() with no argument also drops the trailing newline
        data = dict(zip(("id", "t0", "score", "date", "time"), line.split()))
        instructions()
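
If you would rather mirror the fscanf format string directly, a regular expression with one group per conversion is another possibility. The following is a rough sketch, not a full scanf replacement: it assumes non-negative plain decimal numbers, and the names RECORD and scan are invented for this example.

```python
import re

# One group per conversion in the C format string
# "%d %lf %f %d/%d/%d %d:%d:%d".
RECORD = re.compile(
    r"(\d+)\s+([\d.]+)\s+([\d.]+)\s+"
    r"(\d+)/(\d+)/(\d+)\s+(\d+):(\d+):(\d+)"
)


def scan(line):
    """Return the nine fields of one record, or None if it does not match."""
    match = RECORD.match(line)
    if match is None:
        return None
    g = match.groups()
    # id is an int, t0 and score are floats, the rest are ints.
    return (int(g[0]), float(g[1]), float(g[2])) + tuple(int(x) for x in g[3:])


sample = "42049 361133954.539315 31.00 11/06/2015 18:59:14"
id_, t0, score, day, month, year, hour, minute, second = scan(sample)
print(id_, t0, score, day, month, year, hour, minute, second)
```

Returning None for a non-matching line lets the caller decide whether to skip or report malformed records.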



Depending on what you want to do with the data, pandas may be something to look into:

    import pandas as pd

    with open(file_name) as infile:
        df = pd.read_fwf(infile, header=None, parse_dates=[[3, 4]],
                         date_parser=lambda x: pd.to_datetime(x, format='%d/%m/%Y %H:%M:%S'))

The double list [[3, 4]], together with the date_parser argument, will read columns 3 and 4 (0-indexed) as a single datetime column. You can then access individual parts of that column with

    >>> df['3_4'].dt.hour
    0    13
    1    13
    2    15
    3    18
    4    15
    5     5
    dtype: int64

(If you don't like the '3_4' key, pass a dict as the parse_dates argument above instead:

    parse_dates={'timestamp': [3, 4]}

)

read_fwf is for reading fixed-width columns, which your data seems to adhere to. Alternatively, there are functions such as read_csv and read_table, among others.

(This answer is pretty much a duplicate of this SO answer, but since this question here is more general, I've put this here as another answer, not as a comment.)

