
I have R code that extracts data from NetCDF (.nc) files to build a time series for a specific location, given its latitude and longitude. However, the output contains only three values instead of the entire date range. Why is this happening?

I also get this error:

```
`x` must not contain list or matrix columns:
✖ invalid columns at index(s): 1
Run `rlang::last_trace()` to see where the error occurred.
```

How can I fix this?

```r
install.packages("ncdf4")
install.packages("tidyverse")
install.packages("lubridate")
library(ncdf4)
library(tidyverse)
library(lubridate)

# Step 1: Define the folder containing NetCDF files and output path
folder_path <- "C:/Users/WINDOWS 10/Downloads/MSWEP/Daily"
output_csv <- "C:/Users/WINDOWS 10/Downloads/Full_Precipitation_Timeseries2.csv"

# Define the target latitude and longitude
target_lat <- 14.8903
target_lon <- -19.2321

# Initialize an empty data frame to store the results
all_time_series <- data.frame(Date = as.Date(character()), Precipitation = numeric())

# Step 2: Process each NetCDF file in the folder
nc_files <- list.files(folder_path, pattern = "\\.nc$", full.names = TRUE)

for (file in nc_files) {
  cat("\nProcessing file:", file, "\n")

  # Open the NetCDF file
  nc <- nc_open(file)

  # Extract latitude, longitude, and time variables
  latitudes <- ncvar_get(nc, "lat")    # Adjust "lat" if variable name differs
  longitudes <- ncvar_get(nc, "lon")   # Adjust "lon" if variable name differs
  time <- ncvar_get(nc, "time")        # Adjust "time" if variable name differs

  # Convert time to dates
  time_units <- ncatt_get(nc, "time", "units")$value
  cat("Time units:", time_units, "\n")
  time_origin <- strsplit(time_units, "since ")[[1]][2]
  if (!is.null(time_origin)) {
    dates <- as.Date(time, origin = time_origin)
    cat("Sample dates:", head(dates), "\n")
  } else {
    cat("Warning: Time origin not found in file:", file, "\n")
    nc_close(nc)
    next
  }

  # Enforce `dates` as a Date vector
  dates <- as.Date(dates)

  # Find the nearest grid point indices for target lat/lon
  lat_idx <- which.min(abs(latitudes - target_lat))
  lon_idx <- which.min(abs(longitudes - target_lon))
  cat("Latitude index:", lat_idx, "Longitude index:", lon_idx, "\n")

  # Extract precipitation data for all available time points
  precip_subset <- ncvar_get(nc, "precipitation",
                             start = c(lon_idx, lat_idx, 1),
                             count = c(1, 1, -1))  # Adjust count if needed

  # Ensure `precip_subset` is a numeric vector
  precip_subset <- as.vector(precip_subset)

  # Debugging: Check dimensions and content of precipitation data
  cat("Length of precip_subset:", length(precip_subset), "\n")
  if (length(precip_subset) > 0) {
    cat("Sample precipitation data:", head(precip_subset), "\n")
  } else {
    cat("Warning: Empty precipitation data for file:", file, "\n")
    nc_close(nc)
    next
  }

  # Combine dates and precipitation into a data frame
  file_time_series <- data.frame(Date = dates, Precipitation = precip_subset)

  # Ensure Date is consistently a Date type
  file_time_series$Date <- as.Date(file_time_series$Date)

  # Append to the overall time series
  all_time_series <- bind_rows(all_time_series, file_time_series)

  # Close the NetCDF file
  nc_close(nc)
}

# Debugging: Check the final combined time series
cat("\nFinal time series preview:\n")
print(head(all_time_series))

# Step 3: Save the combined time series to a CSV file
if (nrow(all_time_series) > 0) {
  write_csv(all_time_series, output_csv)
  cat("Precipitation time series saved to:", output_csv, "\n")
} else {
  cat("No valid data extracted. CSV file was not created.\n")
}
```
  • Did you run the trace? What does that contain? Commented Dec 13, 2024 at 4:48
  • Impossible to really help without your data. You are looping over NC files? How many? Is the error message when processing one of them or when you combine all of them? Rewrite the code in the loop as a function that works on one file and test on each file. Commented Dec 13, 2024 at 7:34
  • The folder contains over 200 NC files. The error appears after the whole dataset has been processed, at the point where the data is exported to CSV. I have shared a link to the data: drive.google.com/drive/folders/… Commented Dec 14, 2024 at 23:27
  • The error occurs with fewer files, so you only needed to share three of them which would save us a lot of downloads. Minimise the work of people who are helping you! Commented Dec 15, 2024 at 13:57
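Following the suggestion to move the loop body into a function that handles one file, here is a minimal sketch. It is split into small pure helpers, `nearest_index()` and `build_series()` (hypothetical names), that work on plain vectors and arrays so they can be tested without any NetCDF file, plus a thin wrapper, `read_point_series()` (also hypothetical), that reads one file with ncdf4. It assumes the variable names from the question (`lat`, `lon`, `time`, `precipitation`), a `[lon, lat, time]` dimension order, and time units of the form "days since <date>"; check these against your files.

```r
# Index of the grid coordinate closest to the target value
nearest_index <- function(coords, target) which.min(abs(coords - target))

# Build one file's data frame. as.vector() drops any dim attribute
# (ncvar_get() results may carry one), so both columns are plain vectors.
# Assumes time_vals are day offsets from `origin`.
build_series <- function(time_vals, origin, values) {
  dates <- as.Date(as.vector(time_vals), origin = origin)
  values <- as.vector(values)
  stopifnot(length(dates) == length(values))  # fail early on a mismatch
  data.frame(Date = dates, Precipitation = values)
}

# Thin ncdf4 wrapper (hypothetical): read the nearest-cell series of one file.
# ncdf4 is only needed when this function is actually called.
read_point_series <- function(file, target_lat, target_lon) {
  nc <- ncdf4::nc_open(file)
  on.exit(ncdf4::nc_close(nc))  # close the file even if an error occurs
  lat_idx <- nearest_index(ncdf4::ncvar_get(nc, "lat"), target_lat)
  lon_idx <- nearest_index(ncdf4::ncvar_get(nc, "lon"), target_lon)
  units <- ncdf4::ncatt_get(nc, "time", "units")$value
  build_series(
    time_vals = ncdf4::ncvar_get(nc, "time"),
    origin = sub(".*since ", "", units),  # assumes "days since YYYY-MM-DD ..."
    values = ncdf4::ncvar_get(nc, "precipitation",
                              start = c(lon_idx, lat_idx, 1),
                              count = c(1, 1, -1))
  )
}

# Usage (needs ncdf4 and the files): test each file on its own, then combine
# series <- lapply(nc_files, read_point_series,
#                  target_lat = 14.8903, target_lon = -19.2321)
# all_time_series <- do.call(rbind, series)
```

Testing one file at a time this way makes it easy to see which file (if any) produces an unexpected number of rows.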

1 Answer

For some reason, building your data frame with `dplyr::bind_rows` produces a data frame that `readr::write_csv` rejects:

```
> readr::write_csv(all_time_series, "d.csv")
Error in `cli_block()`:
! `x` must not contain list or matrix columns:
✖ invalid columns at index(s): 1
Run `rlang::last_trace()` to see where the error occurred.
```

It has something to do with the dimensions of the date column (column 1). The simplest fix is:

  • Remove `library(tidyverse)` and never type that again. If you want to use functions from the tidyverse collection of packages, attach only the components you need, e.g. `library(ggplot2)` or `library(dplyr)`, or prefix the function calls, as in `dplyr::bind_rows`.
  • Remove `library(lubridate)`, because your script doesn't use it.
  • Replace `bind_rows` with `rbind`.
  • Replace `write_csv` with `write.csv`. You might want to add some extra arguments to drop the row names or to change the quoting or separator.
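A minimal sketch of the append-and-save step after those replacements, using only base R. The two small data frames `file1` and `file2` are made-up stand-ins for the per-file `file_time_series` results, and the CSV goes to a temporary file rather than the question's `output_csv` path.

```r
# Made-up per-file results standing in for file_time_series
file1 <- data.frame(Date = as.Date("2020-01-01") + 0:1, Precipitation = c(0.0, 2.4))
file2 <- data.frame(Date = as.Date("2020-01-03") + 0:1, Precipitation = c(5.1, 0.3))

# Start from an empty data frame, as the question's script does
all_time_series <- data.frame(Date = as.Date(character()), Precipitation = numeric())

# Base R rbind() in place of dplyr::bind_rows();
# rbind's data frame method skips zero-row arguments such as the empty start
for (file_time_series in list(file1, file2)) {
  all_time_series <- rbind(all_time_series, file_time_series)
}

# Base R write.csv() in place of readr::write_csv();
# row.names = FALSE drops the row labels
output_csv <- tempfile(fileext = ".csv")
write.csv(all_time_series, output_csv, row.names = FALSE)

# Read the file back to check the round trip (Date comes back as text)
check <- read.csv(output_csv)
print(check)
```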

With those changes your script works, and it is lighter and possibly faster because it has fewer dependencies.

I don't know the underlying reason why it breaks with `bind_rows` and `write_csv` (I have an idea, but no enthusiasm to dig into fixing it when the real fix is "use the corresponding base R functions").
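One plausible explanation, offered as an untested assumption rather than a confirmed diagnosis: `ncdf4::ncvar_get()` often returns even a one-dimensional variable such as `time` as a 1-d array, and `as.Date(time, origin = ...)` can carry that `dim` attribute over to the dates, so the `Date` column ends up as an array instead of a plain vector. That would be consistent with the complaint about column 1. The sketch below simulates `time` with `array()` (no NetCDF file needed; values and origin are made up) and shows two ways to strip the attribute while keeping the `Date` class.

```r
# Stand-in for ncvar_get(nc, "time"): a 1-d array of day offsets
# (assumption: this mirrors what ncdf4 returns for a 1-d variable)
time <- array(c(0, 1, 2), dim = 3)
origin <- "1979-01-01"  # made-up origin for illustration

dates <- as.Date(time, origin = origin)
print(dim(dates))  # a non-NULL result means the dates are still an array

# Fix 1: drop the dim attribute before converting to Date
dates_fixed <- as.Date(as.vector(time), origin = origin)

# Fix 2: drop it from an existing Date object (the class is kept)
dates_fixed2 <- dates
dim(dates_fixed2) <- NULL

# Build the per-file data frame from a plain Date vector
precip <- c(0.0, 1.2, 3.4)  # made-up precipitation values
file_time_series <- data.frame(Date = dates_fixed, Precipitation = precip)
str(file_time_series)
```

Note that `as.vector()` is applied to the numeric `time` values, not to the `Date` object, because `as.vector()` on a `Date` would also strip its class.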

Here's what I get from your NCDF files:

```
> prec = read.csv("./Full_Precipitation_Timeseries2.csv")
> plot(as.Date(prec$Date), prec$Precipitation, type = "l")
```

[Image: line plot of `Precipitation` against `Date` from the exported CSV]
