The xlsx package is reading dates incorrectly. I've read all the top similar questions here and had a scout round the internet, but I can't find this particular behaviour, where the origin changes if there's non-date data in a column.
I have a tiny Excel spreadsheet you can get from Dropbox:
https://www.dropbox.com/s/872q9mzb5uzukws/test.xlsx
It has three rows, two columns. First is a date, second is a number. The third row has "Grand Total" in the date column.
If I read in the first two rows with read.xlsx and tell it the first column is a date then this works:
    read.xlsx("./test.xlsx",head=FALSE,1,colClasses=c("Date","integer"),endRow=2)
              X1 X2
    1 2014-06-29 49
    2 2014-06-30 46

Those are indeed the dates in the spreadsheet. If I try and read all three rows, something goes wrong:

    read.xlsx("./test.xlsx",head=FALSE,1,colClasses=c("Date","integer"))
              X1    X2
    1 2084-06-30    49
    2 2084-07-01    46
    3       <NA> 89251
    Warning message:
    In as.POSIXlt.Date(x) : NAs introduced by coercion

If I try reading in as integers I get different integers:

    > read.xlsx("./test.xlsx",head=FALSE,1,colClasses=c("integer","integer"),endRow=2)
         X1 X2
    1 16250 49
    2 16251 46
    > read.xlsx("./test.xlsx",head=FALSE,1,colClasses=c("integer","integer"))
         X1    X2
    1 41819    49
    2 41820    46
    3    NA 89251

The first integers are correctly converted using as.Date(s1$X1,origin="1970-01-01") (Unix epoch) and the second integers are correctly converted using as.Date(s2$X1, origin="1899-12-30") (Excel epoch). If I convert the second lot using 1970 I get the 2084 dates.
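To make the arithmetic concrete, here is a small base-R check that needs neither the spreadsheet nor the xlsx package. It only uses the integers shown above; the variable names are made up for illustration. The two origins are 25569 days apart, which is exactly the difference between the two sets of integers.

```r
# Serial numbers as returned by the two read.xlsx calls above
unix_serial  <- 16250L  # value seen when only the two date rows are read
excel_serial <- 41819L  # value seen when the "Grand Total" row is included

# Offset between the Excel origin and the Unix origin, in days
epoch_gap <- as.numeric(as.Date("1970-01-01") - as.Date("1899-12-30"))
epoch_gap                    # 25569
excel_serial - unix_serial   # 25569

# Each serial gives the right date only with its matching origin
right_unix  <- as.Date(unix_serial,  origin = "1970-01-01")  # "2014-06-29"
right_excel <- as.Date(excel_serial, origin = "1899-12-30")  # "2014-06-29"

# Mixing them up reproduces the bogus 2084 date
wrong_mix <- as.Date(excel_serial, origin = "1970-01-01")    # "2084-06-30"
```

So the 2084 dates look like Excel serial numbers being interpreted as days since 1970.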
So: Am I doing something wrong? Is the best thing to read as integers, and if any NAs then convert using Excel epoch, otherwise use Unix epoch? Or is it a bug in the xlsx package?
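For concreteness, the workaround I have in mind would look roughly like this. It is only a sketch: serial_to_date is a made-up helper name, it uses base R only, and it assumes the "NA present means Excel epoch" pattern seen above holds in general, which I have not verified beyond this one file.

```r
# Hypothetical helper (not part of xlsx): choose the origin depending on
# whether the integer column contains NAs, i.e. had non-date cells in it.
serial_to_date <- function(x) {
  origin <- if (anyNA(x)) "1899-12-30" else "1970-01-01"
  as.Date(x, origin = origin)
}

# Integers from the two-row read (no NA, treated as Unix epoch)
d_two_rows <- serial_to_date(c(16250L, 16251L))

# Integers from the three-row read (NA present, treated as Excel epoch)
d_three_rows <- serial_to_date(c(41819L, 41820L, NA))
```

With the values from the question, both calls give 2014-06-29 and 2014-06-30 for the first two rows, and the "Grand Total" row stays NA.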
The xlsx version is 0.5.1.
XLConnect package, but that seems to have its own problems - I can't get it to read the first row: readWorksheet(loadWorkbook("test.xlsx"),"Sheet1",startRow=0). Weird.
readWorksheet has set header = TRUE.
xlsx::read.xlsx. Note that if you specify as.data.frame=FALSE to read.xlsx, then in all 4 cases (with and without the third row and with specification of "Date" or "integer"), the numerical values are 41819 or 41820. I'd file an issue with the maintainer.