I have a dataset, which contains lines in the format (tab separated):

    Title<\t>Text

Now for every word in Text, I want to create a (Word, Title) pair. For instance:

    ABC<\t>Hello World

gives me

    (Hello, ABC)
    (World, ABC)

Using Scala, I wrote the following:

    val file = sc.textFile("s3n://file.txt")
    val title = file.map(line => line.split("\t")(0))
    val wordtitle = file.map(line => (line.split("\t")(1).split(" ").map(word => (word, line.split("\t")(0)))))

But this gives me the following output:

    [Lscala.Tuple2;@2204b589
    [Lscala.Tuple2;@632a46d1
    [Lscala.Tuple2;@6c8f7633
    [Lscala.Tuple2;@3e9945f3
    [Lscala.Tuple2;@40bf74a0
    [Lscala.Tuple2;@5981d595
    [Lscala.Tuple2;@5aed571b
    [Lscala.Tuple2;@13f1dc40
    [Lscala.Tuple2;@6bb2f7fa
    [Lscala.Tuple2;@32b67553
    [Lscala.Tuple2;@68d0b627
    [Lscala.Tuple2;@8493285

How do I solve this?
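
For context, each `[Lscala.Tuple2;@...` entry is the JVM's default string form of an array of tuples: `map` produces one array per line, so the RDD holds arrays rather than individual pairs. One possible way around this is `flatMap`, which flattens those per-line arrays into single pairs. The following is only a minimal sketch under that assumption; `wordTitlePairs` is a hypothetical helper name, not something from the code above, and the local check uses plain Scala collections so it runs without Spark.

```scala
// Hypothetical helper (not in the original code): turns one "Title<\t>Text"
// line into a sequence of (word, title) pairs.
def wordTitlePairs(line: String): Seq[(String, String)] = {
  // Split once into at most two fields: title and text.
  val fields = line.split("\t", 2)
  if (fields.length < 2) Seq.empty // skip lines that contain no tab
  else {
    val title = fields(0)
    fields(1).split(" ").filter(_.nonEmpty).map(word => (word, title)).toSeq
  }
}

// Local check on plain Scala collections; no Spark needed.
val sample = Seq("ABC\tHello World")
val pairs = sample.flatMap(wordTitlePairs)
println(pairs.mkString(" ")) // prints: (Hello,ABC) (World,ABC)

// With the RDD from the question (assumes `file` as defined there):
// val wordtitle = file.flatMap(wordTitlePairs)
// wordtitle.take(10).foreach(println)
```

Splitting each line only once also avoids repeating `line.split("\t")` for every word.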
Further reading
What I want to achieve is to count the number of Words that occur in a Text for a particular Title.
The subsequent code that I have written is:

    val wordcountperfile = file.map(line => (line.split("\t")(1).split(" ").flatMap(word => word), line.split("\t")(0))).map(word => (word, 1)).reduceByKey(_ + _)

But it does not work. Any input on this is welcome. Thanks!
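
As far as I can tell, the attempt above keys each record by a whole array paired with the title rather than by a single word, so `reduceByKey` never gets to merge individual words. Assuming the goal is the number of times each word occurs under each title, one possible sketch is to emit `((word, title), 1)` per word and sum the ones. `wordTitleOnes` is a hypothetical helper name, and the local check imitates `reduceByKey(_ + _)` with plain Scala collections so it runs without Spark.

```scala
// Hypothetical helper: emits ((word, title), 1) for every word in a line.
def wordTitleOnes(line: String): Seq[((String, String), Int)] = {
  val fields = line.split("\t", 2)
  if (fields.length < 2) Seq.empty // skip lines that contain no tab
  else fields(1).split(" ").filter(_.nonEmpty).map(word => ((word, fields(0)), 1)).toSeq
}

// Local check with plain collections, mimicking reduceByKey(_ + _).
val sample = Seq("ABC\tHello World Hello", "XYZ\tHello")
val counts: Map[(String, String), Int] =
  sample.flatMap(wordTitleOnes)
    .groupBy(_._1)
    .map { case (key, ones) => (key, ones.map(_._2).sum) }

println(counts(("Hello", "ABC"))) // prints: 2

// With the RDD from the question (assumes `file` as defined there):
// val wordcountperfile = file.flatMap(wordTitleOnes).reduceByKey(_ + _)
```

If the total number of words per title is wanted instead, keying by the title alone should work the same way.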