1

I am trying to store several columns of the data, and then merge them with the other data when necessary. However, the data could only be merged with other data when I created it at the start, and it fails after fwrite and fread the data.

region <- climatebound_rast |> as.data.table(xy = TRUE) fwrite(region, "region.csv") region2 <- fread("region.csv", stringsAsFactors = TRUE) all.equal(region,region2) [1] TRUE TEST <- raster_to_table[Variable=="npp"&Model=="DLEM", 1:4] # 1. Merge raster data with region data merged_region <- merge(TEST, region, all.x = TRUE) |> drop_na(region) merged_region Key: <x, y> x y Model Variable region <num> <num> <char> <char> <fctr> 1: 87.99277 49.0312 DLEM npp MNG 2: 88.49277 48.5312 DLEM npp MNG 3: 88.49277 49.0312 DLEM npp MNG 4: 88.99277 48.0312 DLEM npp MNG 5: 88.99277 48.5312 DLEM npp MNG --- 5846: 161.49277 68.5312 DLEM npp RUS 5847: 161.99277 68.5312 DLEM npp RUS 5848: 161.99277 69.0312 DLEM npp RUS 5849: 162.49277 68.5312 DLEM npp RUS 5850: 162.49277 69.0312 DLEM npp RUS merged_region2 <- merge(TEST, region2, all.x = TRUE) |> drop_na(region) merged_region2 Key: <x, y> Empty data.table (0 rows and 5 cols): x,y,Model,Variable,region 
4
  • all.equal tests if the values are equal, but there may be more going on there. Compare str(region) and str(region2) as a next step. Commented May 12 at 17:55
  • Although not relevant to solving the problem, I'd like to ask where the as.data.table and raster_to_table functions come from the package? Commented May 13 at 2:48
  • @Breeze, raster_to_table is an object instead of a function. The function as.data.table is from the package data.table. Commented May 13 at 3:13
  • @GregorThomas OK, thanks Commented May 13 at 3:13

1 Answer 1

5

all.equal() is not the same as identical()

What is happening here is that you are trying to join on a float which cannot be precisely represented and is altered by saving to csv. Here's a simple example:

library(data.table) dat1 <- data.table(x = 1/3, label = "one third") dat2 <- data.table(x = 1/3) 

It's possible (though inadvisable - see below) to merge these:

merge(dat1, dat2) # Key: <x> # x label # <num> <char> # 1: 0.3333333 one third 

However, let's write dat1 to a csv and then read it back in. Floating-point numbers are stored in binary format with finite precision, meaning decimal fractions which cannot be expressed as a power of 2 (like 1/3) cannot be represented exactly. When written to a csv, values which cannot be expressed precisely as decimals are rounded to a limited number of decimal places, and reading them back introduces tiny errors.

fwrite(dat1, "./dat.csv") dat1_fread <- fread("./dat.csv") all.equal(dat1, dat1_fread) # TRUE identical(dat1, dat1_fread) # FALSE 

You can see that dat1 and dat1_fread are all.equal(), as this is a utility to compare R objects x and y testing ‘near equality’. However, they are not identical(). Joins require exact matches. And so we don't return any results if we join on x:

merge(dat1_fread, dat2) # Empty data.table (0 rows and 2 cols): x,label 

You can see the issue if you print the values with high enough precision:

sprintf("%.20f", dat1$x) # [1] "0.33333333333333331483" sprintf("%.20f", dat2$x) # [1] "0.33333333333333331483" sprintf("%.20f", dat1_fread$x) # [1] "0.33333333333333298176" 

Interestingly, the text value of x stored in dat.csv is 0.333333333333333, which is not the same as the value when it is read back in. This is because when fwrite() saves the csv, it rounds 1/3 to 15 significant digits. When fread() reads the csv, it sees a number which cannot be exactly represented in binary. It then represents it as 0.33333333333333298176.... This is a fundamental limitation of floating-point arithmetic, not an R issue.

How to resolve this

The best plan is not to join on a float. If you have to join on a float, you could round, if you can afford to lose some precision (which essentially you can because you can't rely on floats being stored with very high accuracy). If you don't want to round, don't save your file to csv but serialize to some binary format instead. For example, rds:

saveRDS(dat1, "./dat.rds") dat1_rds <- readRDS("./dat.rds") merge(dat1_rds, dat2) # Key: <x> # x label # <num> <char> # 1: 0.3333333 one third 
Sign up to request clarification or add additional context in comments.

1 Comment

Thanks a lot! this is an excellent answer with clear examples and explanations.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.