52

To calculate the number of NAs in the entire data.frame, I can use sum(is.na(df)); however, how can I count the number of NAs in each column of a big data.frame? I tried apply(df, 2, function(x) sum(is.na(df$x))) but that didn't seem to work.

  • Try removing 'df$' in df$x. Commented Oct 9, 2014 at 8:31
  • @AndreyShabalin please post this as an answer (add some code, e.g. x <- data.frame(a = c(1, 2, NA, NA, 1), b = c(1, 1, 1, 1, NA));apply(x, 2, function(z) sum(is.na(z))) ). Commented Oct 10, 2014 at 11:32
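The first comment's fix works because, inside the anonymous function, df$x does not use the argument x: the $ operator looks for a column literally named "x". When no such column exists, df$x is NULL, is.na(NULL) has length 0, and every count comes out as 0 (if df did have a column named x, every column would report that column's count instead). A minimal sketch using the toy data from the second comment, stored as df to match the question:

```r
# Toy data from the second comment, stored as df to match the question
df <- data.frame(a = c(1, 2, NA, NA, 1), b = c(1, 1, 1, 1, NA))

# Original attempt: df$x looks up a column named "x", which does not exist,
# so it returns NULL and is.na(NULL) has length 0 (R may also warn here).
wrong <- suppressWarnings(apply(df, 2, function(x) sum(is.na(df$x))))
print(wrong)   # every column reports 0

# Fix: use the column that apply() passes in as x
right <- apply(df, 2, function(x) sum(is.na(x)))
print(right)   # a = 2, b = 1
```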

12 Answers

76

You could try:

colSums(is.na(df))
# V1 V2 V3 V4 V5
#  2  4  2  4  4

Data:

set.seed(42)
df <- as.data.frame(matrix(sample(c(NA, 0:4), 5*20, replace = TRUE), ncol = 5))
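colSums(is.na(df)) works because is.na() on a data frame returns a logical matrix of the same shape, and colSums() counts the TRUE values in each column while keeping the column names. A small sketch on hand-made data (the column names and values are illustrative only); the same idea with colMeans() gives the share of missing values per column:

```r
# Illustrative data frame with a known number of NAs per column
df <- data.frame(
  id    = 1:5,
  score = c(10, NA, 7, NA, 3),
  label = c("a", NA, "c", "d", "e")
)

na_counts <- colSums(is.na(df))   # named numeric vector: id 0, score 2, label 1
na_share  <- colMeans(is.na(df))  # fraction missing: id 0, score 0.4, label 0.2

# Columns with the most missing values first
print(sort(na_counts, decreasing = TRUE))
print(na_share)
```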

15

Since dplyr::summarise_all() has been superseded by using across() inside summarise(), and dplyr::funs() has been deprecated, the current tidyverse approach would probably be something like:

df %>% summarise(across(everything(), ~ sum(is.na(.x)))) 
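A sketch of the across() approach, assuming dplyr 1.0 or later (where across() and the where() selection helper are available); the data frame is made up for illustration. Swapping everything() for another tidyselect expression restricts the count to some columns:

```r
library(dplyr)

# Illustrative data with NAs in numeric, character and logical columns
df <- data.frame(
  a = c(1, NA, 3),
  b = c(NA, NA, "z"),
  c = c(TRUE, FALSE, NA)
)

# One-row result with the NA count of every column
all_cols <- df %>% summarise(across(everything(), ~ sum(is.na(.x))))

# Only numeric columns, selected with the where() helper
num_cols <- df %>% summarise(across(where(is.numeric), ~ sum(is.na(.x))))

print(all_cols)  # a = 1, b = 2, c = 1
print(num_cols)  # a = 1
```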

13

With dplyr...

df %>% summarise_all(funs(sum(is.na(.)))) 

or using the purrr library

map(df, ~sum(is.na(.))) 

5

You can use sapply():

sapply(X = df, FUN = function(x) sum(is.na(x))) 
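A closely related base R option (not part of the original answer) is vapply(). It behaves like sapply() but requires you to declare the type and length of each result through FUN.VALUE, so it always returns a named integer vector here, even for a data frame with no columns, where sapply() would return an empty list. A sketch with an illustrative data frame and a hypothetical helper named count_na:

```r
# Illustrative data frame
df <- data.frame(x = c(1, NA, NA), y = c("a", "b", NA))

count_na <- function(col) sum(is.na(col))  # hypothetical helper: NA count of one column

# FUN.VALUE = integer(1) states that each call returns a single integer;
# vapply() stops with an error if that is ever not the case.
na_counts <- vapply(df, count_na, FUN.VALUE = integer(1))
print(na_counts)  # x = 2, y = 1

# On a data frame with no columns vapply() still returns an integer vector,
# whereas sapply() would return an empty list.
empty_counts <- vapply(data.frame(), count_na, FUN.VALUE = integer(1))
```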

4

You could try any of the following:

  1. Using colSums()

    colSums(is.na(df))

  2. Using apply()

    apply(df, 2, function(x) {sum(is.na(x))})

  3. Using a custom function

    sum.na <- function (x) { sum(is.na(x)) }

    print(sapply(df, sum.na))  # apply it per column; sum.na(df) alone returns the total for the whole data frame

  4. Using lapply()

    lapply(df, function(x) sum(is.na(x)))

  5. Using sapply()

    sapply(df, function(x) sum(is.na(x)))
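The options above differ mainly in the type of object they return. A quick check on a small illustrative data frame (values made up) that they agree: colSums() gives a named double vector, apply() and sapply() give named integer vectors, and lapply() gives a named list, which unlist() flattens:

```r
# Illustrative numeric data frame
df <- data.frame(p = c(NA, 2, 3, NA), q = c(1, NA, 3, 4), r = c(1, 2, 3, 4))

sum.na <- function(x) sum(is.na(x))  # NA count of one vector

m1 <- colSums(is.na(df))                       # named double vector
m2 <- apply(df, 2, function(x) sum(is.na(x)))  # named integer vector
m3 <- sapply(df, sum.na)                       # named integer vector
m4 <- unlist(lapply(df, sum.na))               # lapply() returns a list; unlist() flattens it

# All four agree on the counts: p = 2, q = 1, r = 0
stopifnot(all(m1 == m2), all(m2 == m3), all(m3 == m4))
print(m3)
```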

2

To keep the column names in the result, use this variation (substitute the name of your data frame for df):

apply(is.na(df), 2, sum) 

1

Try:

apply(df, 2, function(x) length(which(is.na(x)))) 
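length(which(is.na(x))) gives the same count as sum(is.na(x)), because which() returns the positions of the TRUE values. Taking the idea one step further (an extension, not part of the original answer), which(is.na(df), arr.ind = TRUE) returns the row and column of every NA, which helps when you also need to locate the missing values; tabulating the column index recovers the per-column counts. A sketch with made-up data:

```r
# Illustrative data frame
df <- data.frame(u = c(1, NA, 3), v = c(NA, NA, 6), w = c(7, 8, 9))

na_mat <- is.na(df)                   # logical matrix, same shape as df
pos <- which(na_mat, arr.ind = TRUE)  # one row per NA, with columns "row" and "col"
print(pos)

# Count NAs per column from the positions; the factor levels keep columns with no NAs
per_col <- table(factor(colnames(df)[pos[, "col"]], levels = colnames(df)))
print(per_col)  # u = 1, v = 2, w = 0
```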

1

We can also use dplyr to achieve this:

df %>% select(everything()) %>% summarise_all(funs(sum(is.na(.)))) 

This approach lets you restrict the count to specific columns by replacing everything() with the columns you are interested in analysing. For further reading, see https://sebastiansauer.github.io/sum-isna/.

1

A more modern approach using dplyr's across() (summarise_all() is now superseded):

df %>% summarise(across(everything(), ~sum(is.na(.)))) 

0

You can use

apply(is.na(df), 2, sum) 

This returns the total number of NAs in each column.

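Example:

df <- data.frame(x = as.numeric(c(1,2,3,4,5,6,6,'fg',8,8,3,4,2)),
                 y = as.numeric(c(1,2,3,4,5,'as',7,8,9,9,1,4,2)),
                 z = as.numeric(c(1,4,6,7,'a',12,45,7,'as',1,23,12,'la')))
# the non-numeric strings ('fg', 'as', 'a', 'la') become NA, with "NAs introduced by coercion" warnings
apply(is.na(df), 2, sum)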

Output:

x y z
1 1 3

0

A possible data.table approach:

library(data.table)
egdf = data.frame(x=c(1, 10, NA, NA, 2), y=c(2.4, NA, 2, 3.5, NA))
setDT(egdf) # make it a data.table
egdf[, z := x+y] # add another column
# use .SDcols after 2nd ',' to specify only some columns
egdf[, lapply(.SD, function(x) {return(sum(is.na(x)))}) ,]
###
#    x y z
# 1: 2 2 4

1 Comment

This seems separate from my previous answer, but if I should add it to that one instead, please let me know and I will.
0

Another very simple approach is to call summary() on the data frame. It shows, at a glance, the NA count for each numeric, logical or factor column that contains missing values (character columns only report their length, class and mode).

From this website (intro2r) on the summary() function, "If a variable contains missing data then the number of NA values is also reported." (Note, I could not find confirmation in the official docs.)

A downside of summary() is that extracting the NA counts for use in another context is more involved, though there are ways to parse a summary() object (see, for example, this link).
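One way to pull the counts out is to call summary() on each column separately rather than parsing the printed table. This is a sketch that assumes all columns are numeric: for a numeric vector with missing values the result has an element named "NA's", and that element is simply absent when there are none. The data frame is made up for illustration:

```r
# Illustrative numeric data frame
df <- data.frame(a = c(1, NA, 3, NA), b = c(5, 6, 7, 8))

print(summary(df))  # an NA's line appears only in the block for column a

# Read the "NA's" entry of each column's summary, defaulting to 0 when absent
na_from_summary <- sapply(df, function(col) {
  s <- summary(col)
  if ("NA's" %in% names(s)) s[["NA's"]] else 0
})
print(na_from_summary)  # a = 2, b = 0
```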
