
Suppose I have a list of key-value pairs:

kvs = [('x', 0), ('a', 1)] 

Now I'd like to create a Spark Row from kvs with the same order of keys as in kvs.
How do I do this in Python?

3 Comments
  • convert it to a dict and use Row(**kvs) Commented Oct 1, 2017 at 11:02
  • It does not preserve the order of the pairs. Commented Oct 1, 2017 at 11:10
  • you can use OrderedDict stackoverflow.com/questions/38253385/… Commented Oct 1, 2017 at 11:10

2 Answers


I haven't run this yet, but please check it; I'll edit the answer if it fails when I run it.

from pyspark.sql import Row

kvs = [('x', 0), ('a', 1)]
h = {}
for k, v in kvs:      # build a dict from the pairs
    h[k] = v
row = Row(**h)

2 Comments

Thanks, but it does not preserve the order of the pairs in kvs.
Check how to preserve order using OrderedDict stackoverflow.com/questions/38253385/…
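
For reference, a minimal sketch of the OrderedDict route from the linked question. Whether Row(**kwargs) keeps this order depends on the Spark version: Spark 3.0+ no longer sorts keyword fields alphabetically, while older releases do, so on Spark 2.x this alone may not be enough.

from collections import OrderedDict
from pyspark.sql import Row

kvs = [('x', 0), ('a', 1)]
row = Row(**OrderedDict(kvs))   # insertion order follows kvs
print(row)   # Row(x=0, a=1) on Spark 3.0+; Row(a=1, x=0) on older releases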

You can:

from pyspark.sql import Row

Row(*[k for k, _ in kvs])(*[v for _, v in kvs])
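
For example, with the kvs above this yields a Row whose fields follow the list order. A quick local check (Row here is just a tuple subclass, so no SparkSession is required):

from pyspark.sql import Row

kvs = [('x', 0), ('a', 1)]
MyRow = Row(*[k for k, _ in kvs])    # a Row "class" with fields ('x', 'a')
row = MyRow(*[v for _, v in kvs])    # an instance with values in the same order
print(row)            # Row(x=0, a=1)
print(row.x, row.a)   # 0 1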

That said, in my opinion it is better to avoid Row altogether. Other than being a convenient class for representing local values fetched from the JVM backend, it has no special meaning in Spark. In almost every context:

tuple(v for _, v in kvs) 

is a perfectly valid replacement for Row.
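
A minimal sketch of that plain-tuple route, assuming a local SparkSession named spark: pass the tuples together with explicit column names to createDataFrame, and no Row is needed at all.

from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[1]").getOrCreate()

kvs = [('x', 0), ('a', 1)]
data = [tuple(v for _, v in kvs)]   # [(0, 1)]
columns = [k for k, _ in kvs]       # ['x', 'a']

df = spark.createDataFrame(data, columns)
df.show()   # columns appear as x, a (in that order), with a single row 0, 1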

