stefanobaghino

In order to convert a DataFrame to a Dataset you need an Encoder. You can do it by simply adding a context bound requiring an Encoder for T:

def read[T <: Product : Encoder](sql: String): Dataset[T] = {
  import sparkSession.implicits._
  val sqlContext = sparkSession.sqlContext
  val df: DataFrame = sqlContext.read.option("query", sql).load()
  df.as[T]
}

A context bound is syntactic sugar for the following:

def read[T <: Product](sql: String)(implicit $ev: Encoder[T]): Dataset[T]

which means that you need exactly one instance of an Encoder[T] available in the implicit scope at the call site.
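If no suitable implicit is in scope, you can also fill that implicit parameter by hand. A minimal sketch against the read signature above, using a hypothetical case class MyRow and Spark's Encoders.product, which derives an Encoder for Product types:

import org.apache.spark.sql.{Dataset, Encoders}

case class MyRow(id: Long, name: String) // hypothetical row type matching the query's columns

// Pass the Encoder explicitly instead of relying on an import of spark.implicits._:
val rows: Dataset[MyRow] = read[MyRow]("select id, name from a_table")(Encoders.product[MyRow])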

This is needed because the as method itself requires this context bound.
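For reference, this is (simplified) the signature of as on org.apache.spark.sql.Dataset; it carries the same Encoder context bound:

// Simplified from org.apache.spark.sql.Dataset:
def as[U : Encoder]: Dataset[U]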

Spark itself can provide most of the Encoders you may need (primitives, Strings and case classes so far) by importing (as you did) the implicits from your SparkSession. These, however, must be available in the implicit scope at the call site, meaning that what you want is probably something more like the following:

def read[T <: Product : Encoder](spark: SparkSession, sql: String): Dataset[T] = {
  import spark.implicits._
  val df: DataFrame = spark.sqlContext.read.option("query", sql).load()
  df.as[T]
}

val spark: SparkSession = ??? // your SparkSession object
import spark.implicits._

val ds: Dataset[YourType] = read[YourType](spark, "select something from a_table")
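For this to compile, YourType has to be something Spark can encode, typically a case class whose fields line up with the columns produced by the query. A hypothetical example:

// Hypothetical: field names and types must match the columns returned by the query
case class YourType(something: String)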
