stefanobaghino

In order to convert a DataFrame to a Dataset you need an Encoder. You can do it by simply adding a context bound requiring an Encoder for T:

def read[T <: Product : Encoder](sql: String): Dataset[T] = {
  import sparkSession.implicits._
  val sqlContext = sparkSession.sqlContext
  val df: DataFrame = sqlContext.read.option("query", sql).load()
  df.as[T]
}

A context bound is syntactic sugar for the following:

def read[T <: Product](sql: String)(implicit $ev: Encoder[T]): Dataset[T]

which means that you need exactly one instance of an Encoder[T] available in the implicit scope at the call site.
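If no suitable implicit is in scope, you can also fill that implicit parameter by hand. A minimal sketch against the read signature above, using a hypothetical case class MyRow and Spark's Encoders.product, which derives an Encoder for Product types:

import org.apache.spark.sql.{Dataset, Encoders}

case class MyRow(id: Long, name: String) // hypothetical row type matching the query's columns

// Pass the Encoder explicitly instead of relying on an import of spark.implicits._:
val rows: Dataset[MyRow] = read[MyRow]("select id, name from a_table")(Encoders.product[MyRow])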

This is needed because the as method itself requires this context bound.
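For reference, this is (simplified) the signature of as on org.apache.spark.sql.Dataset; it carries the same Encoder context bound:

// Simplified from org.apache.spark.sql.Dataset:
def as[U : Encoder]: Dataset[U]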

Spark itself can provide most of the Encoders you may need (primitives, Strings and case classes so far) by importing (as you did) the implicits from your SparkSession. These, however, must be available in the implicit scope at the call site, meaning that what you want is probably something more like the following:

def read[T <: Product : Encoder](spark: SparkSession, sql: String): Dataset[T] = {
  import spark.implicits._
  val df: DataFrame = spark.sqlContext.read.option("query", sql).load()
  df.as[T]
}

val spark: SparkSession = ??? // your SparkSession object
import spark.implicits._

val ds: Dataset[YourType] = read[YourType](spark, "select something from a_table")
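For this to compile, YourType has to be something Spark can encode, typically a case class whose fields line up with the columns produced by the query. A hypothetical example:

// Hypothetical: field names and types must match the columns returned by the query
case class YourType(something: String)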
