pd.dataframe.apply() create multiple new columns

Question

I have a bunch of files that I want to open, read the first line of, parse into several expected pieces of information, and then store, together with the filenames, as rows in a dataframe. My question concerns the recommended syntax for building the dataframe in a pandanic/pythonic way (I already have the file opening and parsing figured out).

For a dumbed-down example, the following seems to be the recommended thing to do when you want to create one new column:

    df = pd.DataFrame(files, columns=['filename'])
    df['first_letter'] = df.apply(lambda x: x['filename'][:1], axis=1)

but I can't, say, do this:

    df['first_letter'], df['second_letter'] = df.apply(lambda x: (x['filename'][:1], x['filename'][1:2]), axis=1)

as the apply function creates only one column with tuples in it.

Keep in mind that in place of the lambda function I will use a function that opens the file, then reads and parses its first line.

joris · Accepted Answer · 2014-05-23 22:24:37Z

You can put the two values in a Series and return that from the function; apply then returns a dataframe in which each returned Series is a row. With a dummy example:

    In [29]: df = pd.DataFrame(['Aa', 'Bb', 'Cc'], columns=['filenames'])

    In [30]: df
    Out[30]:
      filenames
    0        Aa
    1        Bb
    2        Cc

    In [31]: df['filenames'].apply(lambda x : pd.Series([x[0], x[1]]))
    Out[31]:
       0  1
    0  A  a
    1  B  b
    2  C  c

This you can then assign to two new columns:

    In [33]: df[['first', 'second']] = df['filenames'].apply(lambda x : pd.Series([x[0], x[1]]))

    In [34]: df
    Out[34]:
      filenames first second
    0        Aa     A      a
    1        Bb     B      b
    2        Cc     C      c
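
For the real use case in the question (a function that opens each file and parses its first line), the same pattern might look roughly like the sketch below. The function `parse_first_line`, the comma-separated first line, and the column names `name` and `value` are all assumptions made up for illustration; the actual parsing will differ. The sketch goes through `df.apply(..., axis=1)` as in the question, and gives the returned Series an index so that the resulting columns are labelled instead of numbered 0 and 1.

```python
import tempfile
from pathlib import Path

import pandas as pd


def parse_first_line(path):
    """Hypothetical parser: read the first line and split it into two fields."""
    with open(path) as fh:
        first_line = fh.readline().strip()
    # Assumed format for this sketch: "name,value" on the first line.
    name, value = first_line.split(",", 1)
    # The Series index becomes the column labels of the frame apply() returns.
    return pd.Series({"name": name, "value": value})


# Create two throwaway files so the sketch runs on its own.
tmp_dir = Path(tempfile.mkdtemp())
files = []
for stem, header in [("a", "alpha,1"), ("b", "beta,2")]:
    path = tmp_dir / f"{stem}.txt"
    path.write_text(header + "\nrest of the file\n")
    files.append(str(path))

df = pd.DataFrame(files, columns=["filename"])

# Each row yields one Series, so apply() returns a DataFrame with one row per file.
parsed = df.apply(lambda row: parse_first_line(row["filename"]), axis=1)

# Assign the parsed columns back, as in the answer above.
df[["name", "value"]] = parsed
```

Returning a labelled Series keeps the column names next to the parsing logic, so adding a third parsed field only requires changing the function and the list of target columns.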
