I have a bunch of files. For each one I want to open it, read the first line, parse that line into several expected pieces of information, and then put the filename and the parsed data as a row in a dataframe. My question concerns the recommended syntax to build the dataframe in a pandanic/pythonic way (the file-opening and parsing I already have figured out).

For a dumbed-down example, the following seems to be the recommended thing to do when you want to create one new column:

    df = pd.DataFrame(files, columns=['filename'])
    df['first_letter'] = df.apply(lambda x: x['filename'][:1], axis=1)

but I can't, say, do this:

    df['first_letter'], df['second_letter'] = df.apply(lambda x: (x['filename'][:1], x['filename'][1:2]), axis=1)

as the apply function creates only one column with tuples in it.

Keep in mind that in place of the lambda function I will use a function that opens the file, then reads and parses the first line.

1 Answer

You can put the two values in a Series, and then it will be returned as a dataframe from the apply (where each series is a row in that dataframe). With a dummy example:

    In [29]: df = pd.DataFrame(['Aa', 'Bb', 'Cc'], columns=['filenames'])

    In [30]: df
    Out[30]:
      filenames
    0        Aa
    1        Bb
    2        Cc

    In [31]: df['filenames'].apply(lambda x : pd.Series([x[0], x[1]]))
    Out[31]:
       0  1
    0  A  a
    1  B  b
    2  C  c

This you can then assign to two new columns:

    In [33]: df[['first', 'second']] = df['filenames'].apply(lambda x : pd.Series([x[0], x[1]]))

    In [34]: df
    Out[34]:
      filenames first second
    0        Aa     A      a
    1        Bb     B      b
    2        Cc     C      c
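
For the actual use case in the question (a function that opens each file and parses its first line), the same idea might look roughly like the sketch below. The `parse_first_line` function, the comma-separated header format, the field names `sample_id` and `date`, and the throwaway files are all hypothetical, made up only so the example runs on its own. It uses `DataFrame.apply` with `axis=1`, as in the question, and gives the returned Series a named index so the new columns get names directly.

```python
import tempfile
from pathlib import Path

import pandas as pd


def parse_first_line(path):
    """Hypothetical parser: read the first line and split it into two fields."""
    with open(path) as fh:
        first_line = fh.readline().strip()
    # Assumed header format for this sketch only: "<sample_id>,<date>"
    sample_id, date = first_line.split(",", 1)
    # The Series index labels become the column names of the result.
    return pd.Series({"sample_id": sample_id, "date": date})


# Create a couple of throwaway files so the sketch is self-contained.
tmp_dir = Path(tempfile.mkdtemp())
files = []
for name, header in [("a.txt", "S1,2014-01-01"), ("b.txt", "S2,2014-01-02")]:
    path = tmp_dir / name
    path.write_text(header + "\nrest of the file\n")
    files.append(str(path))

df = pd.DataFrame(files, columns=["filename"])

# Row-wise apply: because the function returns a Series for each row,
# the result is a DataFrame with one column per Series label.
parsed = df.apply(lambda row: parse_first_line(row["filename"]), axis=1)

# Attach the parsed columns to the original frame (aligned on the index).
df = df.join(parsed)
print(df)
```

Calling `apply` once and joining the result means each file is opened only once, which matters more here than in the single-letter toy example.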