The second version returns the Option itself for the bar field of the case class, so you do not get the underlying primitive value the way the first version does. You can use the following code to get primitive values:
    import org.apache.spark.sql.{Encoders, Row}
    import org.apache.spark.sql.catalyst.expressions.GenericRowWithSchema

    def buildRowWithSchemaV2(record: MyCaseClass): Row = {
      val recordValues: Array[Any] = record.getClass.getDeclaredFields.map { field =>
        field.setAccessible(true)
        val returnValue = field.get(record)
        if (returnValue.isInstanceOf[Option[_]]) {
          // unwrap the Option so the Row holds the plain value; a None becomes null
          returnValue.asInstanceOf[Option[Any]].getOrElse(null)
        } else {
          returnValue
        }
      }
      new GenericRowWithSchema(recordValues, Encoders.product[MyCaseClass].schema)
    }

That said, I would suggest you use a DataFrame or Dataset instead.
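For illustration, here is a minimal sketch of how this could be called. The question only tells us that the Option field is called bar; the other field name and the sample values are assumptions:

    // assumed shape of the case class; only the bar field name comes from the question
    case class MyCaseClass(foo: String, bar: Option[String])

    val row = buildRowWithSchemaV2(MyCaseClass("some value", Option("optional value")))
    println(row)        // the bar value is unwrapped, e.g. [some value,optional value]
    println(row.schema) // the schema derived from Encoders.product[MyCaseClass]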
A DataFrame and a Dataset are themselves collections of Rows with a schema.
So once you have a case class defined, you just need to encode your input data into that case class. For example, let's say you have the following input data:
    val data = Seq(
      ("test1", "value1"),
      ("test2", "value2"),
      ("test3", "value3"),
      ("test4", null)
    )

If you have a text file, you can read it with sparkContext.textFile and split it according to your needs.
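As a rough sketch of the text-file route (the path, the comma delimiter, and the column positions are all assumptions here):

    // hypothetical input file and delimiter; adapt the split to your actual format
    val rdd = sparkContext.textFile("/path/to/input.txt")
      .map(_.split(","))
      .map(parts => (parts(0), if (parts.length > 1) parts(1) else null))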
Now, once you have your data in an RDD (or a local collection like the Seq above), converting it to a DataFrame or Dataset takes only a couple of lines of code:
    import sqlContext.implicits._

    val dataFrame = data.map(d => MyCaseClass(d._1, Option(d._2))).toDF

Using .toDS instead would give you a Dataset. Either way, you now have a collection of Rows with a schema.
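For example, the Dataset variant would look like this (same imports, same assumed MyCaseClass as above):

    val dataSet = data.map(d => MyCaseClass(d._1, Option(d._2))).toDS
    // dataSet is a Dataset[MyCaseClass]; it still carries the schema of the case class
    println(dataSet.schema)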
For validation you can do the following:
    println(dataFrame.schema)                    // check that there is a schema
    println(dataFrame.take(1).getClass.getName)  // check that it is a collection of Rows

Hope this answers your question.