
I have a DataFrame like this:

+----------+---+
|      code|idn|
+----------+---+
|   [I0478]|  0|
|   [B0527]|  1|
|   [C0798]|  2|
|   [C0059]|  3|
|   [I0767]|  4|
|   [I1001]|  5|
|   [C0446]|  6|
+----------+---+

And I want to add a new column to the DataFrame:

+----------+---+------+
|      code|idn| item |
+----------+---+------+
|   [I0478]|  0| I0478|
|   [B0527]|  1| B0527|
|   [C0798]|  2| C0798|
|   [C0059]|  3| C0059|
|   [I0767]|  4| I0767|
|   [I1001]|  5| I1001|
|   [C0446]|  6| C0446|
+----------+---+------+

Please help me do this!
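
For reference, a minimal sketch of how the sample DataFrame above could be built (assuming PySpark and a single-element array column for code):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Sample data matching the table above: a one-element array column "code" and an integer "idn"
data = [(["I0478"], 0), (["B0527"], 1), (["C0798"], 2), (["C0059"], 3),
        (["I0767"], 4), (["I1001"], 5), (["C0446"], 6)]
df = spark.createDataFrame(data, ["code", "idn"])
df.show()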


4 Answers


Use []:

df.withColumn("item", df["item"][0]) 

2 Comments

AnalysisException: u"Field name should be String Literal, but it's 0;"
@PhongNguyen I had the same issue even with []. Turned out my column was actually a Struct rather than an Array.
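
A quick way to tell which case applies (assuming df is the DataFrame from the question) is to print the schema; an array column shows up as array<string>, a struct column as struct<...>:

# ArrayType vs StructType in the output tells you whether [0] indexing can work
df.printSchema()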

The problem becomes evident if you look at the schema: the column you are trying to subset is not an array but a struct. The solution is to expand the column with .*.

df.select('code.*', 'idn') 
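
If the expanded field should end up named item as in the question, it can be selected with an alias; a sketch assuming the struct has a single string field (the field name value below is hypothetical, check the real schema first):

from pyspark.sql.functions import col

# "value" is a placeholder field name; df.printSchema() shows the actual one
df.select(col("code.value").alias("item"), "idn")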



Python:

import pandas as pd

array = {'code': [['I0478'], ['B0527'], ['C0798'], ['C0059'], ['I0767'], ['I1001'], ['C0446']],
         'idn': [0, 1, 2, 3, 4, 5, 6]}
df = pd.DataFrame(array)
# Strip the brackets and quotes from the string representation of each one-element list
df['item'] = df.apply(lambda row: str(row.code).lstrip('[').rstrip(']').strip("'").strip(), axis=1)
print(df)
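
If each code entry really is a one-element Python list (as in the dict above), a shorter pandas spelling is to index into the list directly:

# .str[0] takes the first element of each list in the "code" column
df['item'] = df['code'].str[0]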

1 Comment

The question is about apache-spark, not pandas.
df.withColumn("item", df["code"][0]) 

If the "item" column is Array type, if it's Struct of string, you may need to inspect the key of the element of item by df.select("code").collect()[0], see what key(string) it has.

