20

The xlsx package is reading dates in wrongly. I've read all the top similar Q's here and had a scout round the internet but I can't find this particular behaviour where the origin changes if there's non-date data in a column.

I have a tiny Excel spreadsheet you can get from dropbox:

https://www.dropbox.com/s/872q9mzb5uzukws/test.xlsx

It has three rows, two columns. First is a date, second is a number. The third row has "Grand Total" in the date column.

If I read in the first two rows with read.xlsx and tell it the first column is a date then this works:

read.xlsx("./test.xlsx",head=FALSE,1,colClasses=c("Date","integer"),endRow=2) X1 X2 1 2014-06-29 49 2 2014-06-30 46 

Those are indeed the dates in the spreadsheet. If I try and read all three rows, something goes wrong:

read.xlsx("./test.xlsx",head=FALSE,1,colClasses=c("Date","integer")) X1 X2 1 2084-06-30 49 2 2084-07-01 46 3 <NA> 89251 Warning message: In as.POSIXlt.Date(x) : NAs introduced by coercion 

If I try reading in as integers I get different integers:

> read.xlsx("./test.xlsx",head=FALSE,1,colClasses=c("integer","integer"),endRow=2) X1 X2 1 16250 49 2 16251 46 > read.xlsx("./test.xlsx",head=FALSE,1,colClasses=c("integer","integer")) X1 X2 1 41819 49 2 41820 46 3 NA 89251 

The first integers are correctly converted using as.Date(s1$X1,origin="1970-01-01") (Unix epoch) and the second integers are correctly converted using as.Date(s2$X1, origin="1899-12-30") (Excel epoch). If I convert the second lot using 1970 I get the 2084 dates.

So: Am I doing something wrong? Is the best thing to read as integers, and if any NAs then convert using Excel epoch, otherwise use Unix epoch? Or is it a bug in the xlsx package?

xlsx version is Version: 0.5.1

3
  • I was about to recommend the XLConnect package, but that seems to have its own problems - I can't get it to read the first row: readWorksheet(loadWorkbook("test.xlsx"),"Sheet1",startRow=0). Weird. Commented Aug 6, 2014 at 11:27
  • @StephanKolassa per default readWorksheet has set header = TRUE. Commented Aug 6, 2014 at 11:42
  • It is almost certainly a bug in xlxs::read.xlsx. Note that if you specify as.data.frame=FALSE to read.xlsx, the in all 4 cases (with and without the third row and with specification of "Date" or "integer"), the numerical values are 41819 or 41820. I'd file an issue with the maintainer. Commented Aug 6, 2014 at 15:17

3 Answers 3

21

The dates can be read as integers and later converted to Date using openxlsx::convertToDate() function.

More here

Sign up to request clarification or add additional context in comments.

1 Comment

This worked great for me, since colClasses does not seem to be an argument of open.xlsx() anymore.
5

XLConnect is able to handle this pretty sweet:

test <- readWorksheetFromFile( "~/Downloads/test.xlsx", sheet = "Sheet1", header = FALSE ) test Col1 Col2 1 2014-06-29 00:00:00 49 2 2014-06-30 00:00:00 46 3 Grand Total 89251 

The Problem you have is obvious, that the first column is of mixed type: character and POSIXct. XLConnect is able to read each cell correctly in, but casts all cells of a column then to the most common type, which is character in this case.

str(test) 'data.frame': 3 obs. of 2 variables: $ Col1: chr "2014-06-29 00:00:00" "2014-06-30 00:00:00" "Grand Total" $ Col2: num 49 46 89251 

2 Comments

yeah that is a problem with R in general, R does not have proper multidimensional list support. you can have listed lists, matrix, and data.frames, but all have their issues.
adding colTypes=c("Date","integer") seems to do the correct conversion and sets the last item to NA. I don't really care about the text in the last item.
3

The problem you're having is that Excel stores the number of days since Jan-0-1900, and that is the number R is reading from the excel file. When you convert in R, you are converting based on the number of days since Jan-1-1970. If you subtract the number of days between those two first, it should work.

Comments

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.