How can I parse this CSV file in Scala to extract Data objects containing (date, time, longitude, latitude)?

*M…….:Dy4.5

*N……….:14_540

*V…..:N

*S….:1.2.1

*yyyy/mm/dd;hh:mm:ss;long;lat

2016/05/09;12:50:19;-122.45006;38.47320

2016/05/09;13:04:10;-122.45011;38.47317

I already wrote this function, but it only reads the file. I don't know how to transform each line into an object:

def readData(fileName: String): Vector[Array[String]] = {
  for {
    line <- Source.fromFile(fileName).getLines().toVector
    values = line.split(";").map(_.trim)
  } yield values
}
  • I think you need to define 'values' as a variable or value. You can use a regular expression to find and save particular segments of text. Commented Mar 30, 2017 at 16:37
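
The regular-expression idea from this comment could look roughly like the sketch below. The `Row` pattern and the `parseWithRegex` helper are hypothetical names that do not appear in the original post, and the pattern assumes exactly the `yyyy/mm/dd;hh:mm:ss;long;lat` layout shown in the sample.

```scala
import scala.util.matching.Regex

object RegexSketch {
  // Hypothetical pattern: date ; time ; signed decimal longitude ; signed decimal latitude.
  val Row: Regex =
    """(\d{4}/\d{2}/\d{2});(\d{2}:\d{2}:\d{2});(-?\d+(?:\.\d+)?);(-?\d+(?:\.\d+)?)""".r

  // Returns the four captured fields, or None for header lines, blank lines,
  // and anything else that does not match the whole pattern.
  def parseWithRegex(line: String): Option[(String, String, String, String)] =
    line.trim match {
      case Row(date, time, lon, lat) => Some((date, time, lon, lat))
      case _                         => None
    }
}
```

Because a Scala `Regex` used as an extractor must match the entire string, the `*`-prefixed header lines fall through to the `None` case without any extra check.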

2 Answers

You can use Scala pattern matching for this, building on Anastasiia Kharchenko's response:

import scala.io.Source

def readData(fileName: String): Vector[Data] = {
  for {
    line <- Source.fromFile(fileName).getLines().toVector
    data <- parseCsvLine(line) // lines that fail to parse are dropped
  } yield data
}

def parseCsvLine(line: String): Option[Data] = {
  line.split(";").toVector.map(_.trim) match {
    case Vector(date, time, longitude, latitude) =>
      Some(Data(date, time, longitude, latitude))
    case _ =>
      println(s"WARNING UNKNOWN DATA FORMAT FOR LINE: $line")
      None
  }
}
3 Comments

Your solution works perfectly, thanks. Can you please help me convert the date from String to a Date type? I tried the following, but I always get a NullPointerException: val format = new java.text.SimpleDateFormat("yyyy/MM/dd") Some(Data(format.parse(date),....
First I would make sure everything was parsed correctly. Then I would add a method to the case class for the data (continuing the example above): case class Data(date: String, time: String, longitude: String, latitude: String) { def getDate(): java.util.Date = { val format = new java.text.SimpleDateFormat("yyyy/MM/dd"); format.parse(date) } } I would not convert the date inside parseCsvLine, as that function should only focus on parsing a CSV line.
I used java.time.format.DateTimeFormatter and it resolved my issue, because it's not practical to redefine a getter for each attribute. Thanks a lot for your help.
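
The `java.time` approach mentioned in the last comment might look like the following sketch. `TypedData` and `parseTyped` are hypothetical names introduced here for illustration only; the patterns assume the `yyyy/MM/dd` and `HH:mm:ss` layouts from the sample file, and lines that fail to parse are silently dropped rather than logged.

```scala
import java.time.{LocalDate, LocalTime}
import java.time.format.DateTimeFormatter
import scala.util.Try

// Hypothetical typed variant of Data; not part of the original answer.
case class TypedData(date: LocalDate, time: LocalTime, longitude: Double, latitude: Double)

object TypedParsing {
  // DateTimeFormatter instances are immutable and thread-safe, so they can be shared.
  private val dateFormat = DateTimeFormatter.ofPattern("yyyy/MM/dd")
  private val timeFormat = DateTimeFormatter.ofPattern("HH:mm:ss")

  // Returns None when the field count is wrong or any field fails to convert.
  def parseTyped(line: String): Option[TypedData] =
    line.split(";").map(_.trim) match {
      case Array(date, time, lon, lat) =>
        Try(
          TypedData(
            LocalDate.parse(date, dateFormat),
            LocalTime.parse(time, timeFormat),
            lon.toDouble,
            lat.toDouble
          )
        ).toOption
      case _ => None
    }
}
```

Wrapping the conversions in `Try` keeps the header line `*yyyy/mm/dd;hh:mm:ss;long;lat`, which also splits into four fields, from throwing: its first field is not a valid date, so the result is `None`.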
Assuming you have a class Data:

case class Data(date: String, time: String, longitude: String, latitude: String) 

(date and time are strings just to keep the example simple).

The code below will give you a vector of Data objects:

def readData(fileName: String): Vector[Data] = {
  for {
    line <- Source.fromFile(fileName).getLines().toVector
    values = line.split(";").map(_.trim) // the sample file is ';'-delimited
    data = Data(values(0), values(1), values(2), values(3))
  } yield data
}

1 Comment

When I try your solution I get the error: Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 1
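
A likely cause of that exception, judging from the sample file, is that the first lines are `*`-prefixed headers (plus blank lines) that contain no delimiter, so the split yields a single element and `values(1)` is out of range. A minimal sketch that skips such lines before building objects follows. `parseLines` and `readDataSkippingHeaders` are hypothetical names, and the `Data` case class is repeated only so the block is self-contained.

```scala
import scala.io.Source
import scala.util.Using

case class Data(date: String, time: String, longitude: String, latitude: String)

object SkipHeaders {
  // Keeps only non-header lines that split into exactly four ';'-separated fields.
  def parseLines(lines: Iterator[String]): Vector[Data] =
    lines
      .map(_.trim)
      .filter(line => line.nonEmpty && !line.startsWith("*")) // drop blanks and '*' headers
      .map(_.split(";").map(_.trim))
      .collect { case Array(date, time, lon, lat) => Data(date, time, lon, lat) }
      .toVector

  // Using.resource (Scala 2.13+) closes the file after the lines are consumed.
  def readDataSkippingHeaders(fileName: String): Vector[Data] =
    Using.resource(Source.fromFile(fileName))(source => parseLines(source.getLines()))
}
```

The explicit `startsWith("*")` check matters because the column-name header `*yyyy/mm/dd;hh:mm:ss;long;lat` does split into four fields and would otherwise be turned into a `Data` value.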
