Counter function on an ArrayType column in PySpark

From this data frame

+-----+------------------+
|store|            values|
+-----+------------------+
|    1|[1, 2, 3, 4, 5, 6]|
|    2|            [2, 3]|
+-----+------------------+

I would like to apply the Counter function to get this:

+-----+------------------------------+
|store|                        values|
+-----+------------------------------+
|    1|{1:1, 2:1, 3:1, 4:1, 5:1, 6:1}|
|    2|{2:1, 3:1}                    |
+-----+------------------------------+
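For reference, the per-row transformation I am after is exactly what Python's `collections.Counter` does to a plain list (a sketch outside Spark, using the sample rows above):

```python
from collections import Counter

# Counter maps each element of the array to its number of occurrences,
# which is the per-row result shown in the desired output above.
row1 = [1, 2, 3, 4, 5, 6]
row2 = [2, 3]

print(dict(Counter(row1)))  # {1: 1, 2: 1, 3: 1, 4: 1, 5: 1, 6: 1}
print(dict(Counter(row2)))  # {2: 1, 3: 1}
```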

I got this data frame using the answer to another question:

GroupBy and concat array columns pyspark

So I tried to modify the code from the answers, like this:

Option 1:

def flatten_counter(val):
    return Counter(reduce(lambda x, y: x + y, val))

udf_flatten_counter = sf.udf(flatten_counter, ty.ArrayType(ty.IntegerType()))
df3 = df2.select("store", flatten_counter("values2").alias("values3"))
df3.show(truncate=False)

Option 2:

df.rdd.map(lambda r: (r.store, r.values)) \
    .reduceByKey(lambda x, y: x + y) \
    .map(lambda row: Counter(row[1])) \
    .toDF(['store', 'values']) \
    .show()
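As a sanity check of the flatten-then-count logic itself (outside Spark), Option 1's `flatten_counter` behaves like this in plain Python on hypothetical grouped data. Note that `Counter` produces a dict of value-to-count pairs, not a list of integers, which I assume is at odds with declaring the UDF's return type as `ArrayType(IntegerType())`:

```python
from collections import Counter
from functools import reduce

def flatten_counter(val):
    # Concatenate the inner lists, then count occurrences of each value.
    return Counter(reduce(lambda x, y: x + y, val))

# Hypothetical grouped data: a list of the arrays collected for one store.
grouped = [[1, 2, 3], [4, 5, 6]]
print(dict(flatten_counter(grouped)))  # {1: 1, 2: 1, 3: 1, 4: 1, 5: 1, 6: 1}
```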

but neither of them works.

Does anybody know how I can do it?

Thank you