Linked Questions

1 vote
0 answers
64 views

I'm searching for a way to add an 'id' column to my dataframe (dfProc) with sequential numbers from 1 (or zero) up to the number of rows (in this example it has 10 rows, but my df has a variable number of rows). The ...
— Jeaslf
0 votes
0 answers
57 views

I am using pyspark 1.3.1 and need to generate a unique id/number for each row in a dataframe. Since window functions are not available in PySpark 1.3.1, I am not able to make use of row_number ...
— Mohan
192 votes
11 answers
555k views

I have a Spark DataFrame (using PySpark 1.5.1) and would like to add a new column. I've tried the following without any success: type(randomed_hours) # => list # Create in Python and transform ...
— Boris
34 votes
1 answer
68k views

I have a csv file which I convert to a DataFrame (df) in pyspark; after some transformations, I want to add a column to df which should be a simple row id (starting from 0 or 1 up to N). I converted df in ...
— ankit patel
16 votes
4 answers
85k views

From a PySpark SQL dataframe like

    name  age  city
    abc   20   A
    def   30   B

how do I get the last row? (With df.limit(1) I can get the first row of the dataframe into a new dataframe.) And how can I access the ...
— Satya
7 votes
2 answers
31k views

In Python or R, there are ways to slice a DataFrame by index. For example, in pandas: df.iloc[5:10,:]. Is there a similar way in pyspark to slice data based on the position of rows?
— Gavin
6 votes
1 answer
9k views

Assuming I have the following dataframe: dummy_data = [('a',1),('b',25),('c',3),('d',8),('e',1)] df = sc.parallelize(dummy_data).toDF(['letter','number']) And I want to create the following ...
— Mpizos Dimitris
8 votes
1 answer
8k views

Is there a way to append a dataframe horizontally to another one, assuming both have an identical number of rows? This would be the equivalent of pandas concat with axis=1: result = pd.concat([df1, df4],...
— WestCoastProjects
-4 votes
2 answers
11k views

Is there a random number / sequence number generator in SparkSQL? For example: Netezza has sequence numbers; MySQL has sequence numbers. Thanks.
— sri hari kali charan Tummala
0 votes
2 answers
6k views

I have a Dataframe with a single column, as shown below. Type: 'BAT' 'BAT' 'BALL' 'BAT' 'BALL' 'BALL' To the above dataframe I have added a new column called 'const': df = df.withColumn('const',F.lit(...
— GeorgeOfTheRF
1 vote
0 answers
1k views

I have a python list (p_list) of 0s and 1s with as many elements as a spark dataframe that has only one column (all elements are like: 'imaj7felb438l6hk', ...). And I am trying to add this list as ...
— ARS