Do the core transforms (Map, Filter, Flatten) in Apache Beam process data elements in parallel? If so, when should we use the ParDo transform specifically?

2 Answers

Beam implements the Map and Reduce concepts. All "map" operations, meaning operations that can be applied to each element on its own (filter, map, ...), can be done in parallel (on the same server with different threads, or on different servers).

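All "reduce" operations, which need to compare a set of values from the PCollection together, require those values to be brought together on the same server/thread. For keyed data this applies per key, so different keys can still be handled in parallel.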

So, use ParDo when you perform element-wise operations, that is, operations on a single entry in the PCollection.
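
To make the distinction concrete, here is a minimal sketch using the Beam Python SDK. The helper names (`word_length`, `is_long`, `build_and_run`) and the length threshold are made up for illustration, and `apache_beam` is imported inside the function only so the helpers stay usable without Beam installed.

```python
def word_length(word):
    """Element-wise step: the result depends only on this one element."""
    return len(word)


def is_long(length):
    """Element-wise predicate; the threshold of 4 is an arbitrary choice."""
    return length >= 4


def build_and_run(words):
    # Assumption: the apache_beam package is installed. It is imported here,
    # not at module level, so the two helpers above work without it.
    import apache_beam as beam

    with beam.Pipeline() as pipeline:
        lengths = (
            pipeline
            | "Create" >> beam.Create(words)
            # Element-wise transforms: a runner is free to spread these
            # across threads or workers.
            | "Length" >> beam.Map(word_length)
            | "KeepLong" >> beam.Filter(is_long)
        )
        # Aggregation: the values must be brought together for one result.
        total = lengths | "Sum" >> beam.CombineGlobally(sum)
        total | "Print" >> beam.Map(print)
```

Calling `build_and_run(["map", "filter", "flatten"])` runs the pipeline locally on the default runner. How much real parallelism you get depends on the runner and its configuration.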

I will refer you to the Apache Beam docs:

Core Beam transforms, ParDo

In simple terms, you use ParDo when you have a user-defined function that you want to apply in your pipeline. For example, say you want to split every sentence in a paragraph into single words. You would want to apply a split() function, but split() is not one of the core Beam transforms, so ParDo lets you smuggle it in.
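
The sentence-splitting case could look roughly like this in the Python SDK. This is a sketch, not the only way to do it: `split_words`, `SplitWords` and `run` are names invented here, and `apache_beam` is assumed to be installed when `run` is called.

```python
def split_words(sentence):
    """Yield each whitespace-separated word of a single sentence."""
    for word in sentence.split():
        yield word


def run(sentences):
    # Assumption: the apache_beam package is installed. Importing it here
    # keeps split_words usable on its own.
    import apache_beam as beam

    class SplitWords(beam.DoFn):
        """User-defined DoFn: one input sentence, zero or more output words."""

        def process(self, element):
            yield from split_words(element)

    with beam.Pipeline() as pipeline:
        (
            pipeline
            | "Create" >> beam.Create(sentences)
            # ParDo applies the DoFn to every element independently.
            | "Split" >> beam.ParDo(SplitWords())
            | "Print" >> beam.Map(print)
        )
```

For example, `run(["use ParDo for custom logic", "one element at a time"])` prints one word per line, in no guaranteed order. For a one-liner like this, `beam.FlatMap(split_words)` would do the same job; a full DoFn becomes worthwhile when the logic needs more than a single function.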
