I have two DataFrames (Scala, Apache Spark 1.6.1):
1) Matches
    MatchID | Player1    | Player2
    ------------------------------
    1       | John Wayne | John Doe
    2       | Ive Fish   | San Simon

2) Personal Data
    Player     | BirthYear
    ----------------------
    John Wayne | 1986
    Ive Fish   | 1990
    San Simon  | 1974
    John Doe   | 1995

How can I create a new DataFrame with the BirthYear of both players and the difference between them:
    MatchID | Player1    | Player2   | BYear_P1 | BYear_P2 | Diff
    -------------------------------------------------------------
    1       | John Wayne | John Doe  | 1986     | 1995     | 9
    2       | Ive Fish   | San Simon | 1990     | 1974     | 16
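For reference, the result above amounts to looking up each player's birth year and taking the absolute difference. Below is a minimal plain-Scala sketch of that lookup, with no Spark involved; the `Match` and `MatchWithYears` case classes and the `withBirthYears` helper are hypothetical names. It builds a Player → BirthYear map once and probes it for both players, which is roughly what a broadcast hash join does with the smaller table. Like the inner joins in the question, a match is dropped when either player has no birth year.

```scala
object BirthYearLookup {
  // Hypothetical record types mirroring the two tables in the question.
  final case class Match(matchId: Int, player1: String, player2: String)
  final case class MatchWithYears(matchId: Int, player1: String, player2: String,
                                  bYearP1: Int, bYearP2: Int, diff: Int)

  // Probe the in-memory lookup table for both players of every match.
  // A match is skipped if either player is missing (inner-join semantics).
  def withBirthYears(matches: Seq[Match], birthYears: Map[String, Int]): Seq[MatchWithYears] =
    for {
      m  <- matches
      y1 <- birthYears.get(m.player1)
      y2 <- birthYears.get(m.player2)
    } yield MatchWithYears(m.matchId, m.player1, m.player2, y1, y2, math.abs(y1 - y2))

  def main(args: Array[String]): Unit = {
    val matches = Seq(Match(1, "John Wayne", "John Doe"), Match(2, "Ive Fish", "San Simon"))
    val birthYears = Map("John Wayne" -> 1986, "Ive Fish" -> 1990, "San Simon" -> 1974, "John Doe" -> 1995)
    withBirthYears(matches, birthYears).foreach(println)
  }
}
```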
I tried joining once for the first player:

    val df = MatchesDF.join(PersonalDF, MatchesDF("Player1") === PersonalDF("Player"))

and then joining again for the second player:

    val resDf = df.join(PersonalDF, df("Player2") === PersonalDF("Player"))

but this is a VERY time-consuming operation.
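One option, sketched here under the assumption that PersonalDF is small enough to fit in each executor's memory, is to keep the two joins but alias the lookup table twice and mark it for broadcasting with `broadcast` from `org.apache.spark.sql.functions` (available since Spark 1.5). The aliases `m`, `p1` and `p2` keep the column references unambiguous, which matters because joining the same PersonalDF twice otherwise leaves two `Player` and two `BirthYear` columns. The broadcast hint lets Spark join without shuffling the matches table. `MatchBirthYears` and `withBirthYears` are hypothetical names, and the speed-up is not guaranteed for every data size.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.{DataFrame, SQLContext}
import org.apache.spark.sql.functions.{abs, broadcast, col}

// Hypothetical helper object; the column names come from the question.
object MatchBirthYears {
  // Adds BYear_P1, BYear_P2 and Diff by joining the (assumed small) lookup table twice.
  def withBirthYears(matchesDF: DataFrame, personalDF: DataFrame): DataFrame = {
    // Two aliased, broadcast-hinted views of the same table so "p1.*" and "p2.*" resolve unambiguously.
    val p1 = broadcast(personalDF).as("p1")
    val p2 = broadcast(personalDF).as("p2")
    matchesDF.as("m")
      .join(p1, col("m.Player1") === col("p1.Player"))
      .join(p2, col("m.Player2") === col("p2.Player"))
      .select(
        col("m.MatchID"), col("m.Player1"), col("m.Player2"),
        col("p1.BirthYear").as("BYear_P1"),
        col("p2.BirthYear").as("BYear_P2"),
        abs(col("p1.BirthYear") - col("p2.BirthYear")).as("Diff"))
  }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("MatchBirthYears").setMaster("local[*]"))
    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._

    // Sample data from the question.
    val matchesDF = Seq((1, "John Wayne", "John Doe"), (2, "Ive Fish", "San Simon"))
      .toDF("MatchID", "Player1", "Player2")
    val personalDF = Seq(("John Wayne", 1986), ("Ive Fish", 1990), ("San Simon", 1974), ("John Doe", 1995))
      .toDF("Player", "BirthYear")

    withBirthYears(matchesDF, personalDF).orderBy("MatchID").show()
    sc.stop()
  }
}
```

Spark already broadcasts a join side automatically when its estimated size is below `spark.sql.autoBroadcastJoinThreshold` (10 MB by default), so the explicit hint mainly helps when that size estimate is missing or too large.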
Is there another way to do this in Scala with Apache Spark?