0

I am trying to import my data using read_excel but I need to interpret any NA string value to missing values but I am stuck

Currently, my data has NAs all over the place and I need them to be blank so that when I run colsums(is.na(data)) it shouldn't show 0s

NA is chr in this table within numeric numbers as you can see in screenshot

data <- read_excel(workbook_path, na = c("")) colSums(is.na(data)) 

enter image description here

enter image description here

enter image description here

20
  • 1
    did you try read_excel(workbook_path, na = c("", "NA")) ? Commented Jan 26, 2023 at 0:35
  • Does this answer your question? Using read_excel(na = ) how do you specify more than one NA character string? Commented Jan 26, 2023 at 0:36
  • @ArthurYip I tried that code but {r} data <- read_excel(workbook_path, na = c("", "NA")) print(data) colSums(is.na(data)) I keep seeing 0 which means I have NAs still Commented Jan 26, 2023 at 0:37
  • I just ran sum(is.na(data)) and I get 40 which means my NAs didn't disappear Commented Jan 26, 2023 at 0:39
  • 1
    ahhhhhhh you are right, I was interpreting the question wrong !!!!!!! seeing 40 is the right way. thank you Commented Jan 26, 2023 at 1:12

1 Answer 1

1

Using one of readxl example files for reproducible example, you can open its location by running browseURL(dirname(readxl_example("type-me.xlsx"))), though the sheet looks like this:

enter image description here

library(readxl) library(dplyr) xlsx <- readxl_example("type-me.xlsx") # open file location explorer: # browseURL(dirname(readxl_example("type-me.xlsx"))) # by default blank cells are treated as missing data, note the single <NA>: df <- read_excel(xlsx, sheet = "text_coercion") %>% head(n = 2) df #> # A tibble: 2 × 2 #> text explanation #> <chr> <chr> #> 1 <NA> "empty" #> 2 cabbage "\"cabbage\"" # add "empty" to na vector, note 2 <NA> values: df <- readxl::read_excel(xlsx, sheet = "text_coercion", na = c("", "empty")) %>% head(n = 2) df #> # A tibble: 2 × 2 #> text explanation #> <chr> <chr> #> 1 <NA> <NA> #> 2 cabbage "\"cabbage\"" # to replace all(!) NA values with "" df[is.na(df)] <- "" df #> # A tibble: 2 × 2 #> text explanation #> <chr> <chr> #> 1 "" "" #> 2 "cabbage" "\"cabbage\"" 

Created on 2023-01-26 with reprex v2.0.2

Note from your screenshot: you have column names in the first row of your dataframe, this breaks data type detection (everything is chr) and you should deal with that first; at that point data[is.na(data)] <- "" will no longer work as you can not write strings to numerical columns. And it's perfectly fine.

Sign up to request clarification or add additional context in comments.

Comments

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.