Linked Questions

1 vote
0 answers
64 views

I'm searching for a way to add an 'id' column to my dataframe (dfProc) with sequential numbers from 1 (or zero) up to the number of rows (in this example it has 10 rows, but my df has a variable number of rows). The ...
— Jeaslf
0 votes
0 answers
57 views

I am using pyspark 1.3.1 and need to generate a unique id/number for each row in a dataframe. Since window functions are not available in PySpark 1.3.1, I am not able to make use of row_number ...
— Mohan
192 votes
11 answers
555k views

I have a Spark DataFrame (using PySpark 1.5.1) and would like to add a new column. I've tried the following without any success: type(randomed_hours) # => list # Create in Python and transform ...
— Boris
34 votes
1 answer
68k views

I have a csv file which I convert to a DataFrame (df) in pyspark; after some transformations, I want to add a column to df which should be a simple row id (starting from 0 or 1 up to N). I converted df in ...
— ankit patel
16 votes
4 answers
85k views

From a PySpark SQL dataframe like

    name  age  city
    abc   20   A
    def   30   B

how do I get the last row? (With df.limit(1) I can get the first row of the dataframe into a new dataframe.) And how can I access the ...
— Satya
7 votes
2 answers
31k views

In Python or R, there are ways to slice a DataFrame by index. For example, in pandas: df.iloc[5:10,:]. Is there a similar way in pyspark to slice data based on the position of rows?
— Gavin
6 votes
1 answer
9k views

Assuming I have the following dataframe: dummy_data = [('a',1),('b',25),('c',3),('d',8),('e',1)] df = sc.parallelize(dummy_data).toDF(['letter','number']) And I want to create the following ...
— Mpizos Dimitris
8 votes
1 answer
8k views

Is there a way to append a dataframe horizontally to another one, assuming both have an identical number of rows? This would be the equivalent of pandas concat with axis=1: result = pd.concat([df1, df4],...
— WestCoastProjects
-4 votes
2 answers
11k views

Is there a random number / sequence number generator in SparkSQL? For example: Netezza has sequence numbers; MySQL has sequence numbers. Thanks.
— sri hari kali charan Tummala
0 votes
2 answers
6k views

I have a Dataframe with a single column, as shown below. Type: 'BAT' 'BAT' 'BALL' 'BAT' 'BALL' 'BALL' To the above dataframe I have added a new column called 'const': df = df.withColumn('const',F.lit(...
— GeorgeOfTheRF
1 vote
0 answers
1k views

I have a python list (p_list) of 0s and 1s with as many elements as a spark dataframe that has only one column (all elements are like: 'imaj7felb438l6hk', ...). And I am trying to add this list as ...
— ARS