R: how to total the number of NA in each col of data.frame

Question

To calculate the number of NAs in the entire data.frame, I can use sum(is.na(df)); however, how can I count the number of NAs in each column of a big data.frame? I tried apply(df, 2, function(x) sum(is.na(df$x))), but that didn't seem to work.

@AndreyShabalin please post this as an answer (add some code, e.g. x <- data.frame(a = c(1, 2, NA, NA, 1), b = c(1, 1, 1, 1, NA)); apply(x, 2, function(z) sum(is.na(z)))).
– Roman Luštrik, Commented Oct 10, 2014 at 11:32
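A minimal sketch of why the attempt in the question fails, using the toy data from the comment (the data frame is named `df` here purely for illustration): inside `function(x)`, `df$x` looks up a column literally named "x" instead of using the argument `x`. Since no such column exists, it is `NULL` and every count comes out as 0.

```r
# Toy data from the comment above
df <- data.frame(a = c(1, 2, NA, NA, 1), b = c(1, 1, 1, 1, NA))

# Attempt from the question: `df$x` asks for a column named "x", which is NULL,
# so sum(is.na(NULL)) is 0 for every column (is.na(NULL) also warns; suppressed here).
wrong <- suppressWarnings(apply(df, 2, function(x) sum(is.na(df$x))))

# Fix: use the argument `x`, which already holds one column per call.
right <- apply(df, 2, function(x) sum(is.na(x)))

print(wrong)
print(right)
```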

akrun · Accepted Answer · 2014-10-09 08:59:51Z

You could try:

colSums(is.na(df))
# V1 V2 V3 V4 V5
#  2  4  2  4  4

data

set.seed(42)
df <- as.data.frame(matrix(sample(c(NA, 0:4), 5*20, replace = TRUE), ncol = 5))
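A small sketch of why colSums(is.na(df)) works, on a hypothetical data frame with known NA positions: is.na() applied to a data frame returns a logical matrix with the same dimensions and column names, and colSums() counts each TRUE as 1. This also works when columns have different types, because is.na() runs on the data frame before anything is coerced.

```r
# Hypothetical data with a numeric and a character column
df <- data.frame(a = c(1, NA, 3), b = c(NA, NA, "z"))

na_matrix <- is.na(df)          # logical matrix, 3 rows x 2 columns
na_counts <- colSums(na_matrix) # TRUE counts as 1, FALSE as 0

print(na_counts)
```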

climatestudent · Accepted Answer · 2021-07-07 21:46:27Z

Since dplyr::summarise_all() has been superseded by using across() inside summarise(), and dplyr::funs() has been deprecated, the current tidyverse approach would probably be something like:

df %>% summarise(across(everything(), ~ sum(is.na(.x))))

Nettle · Accepted Answer · 2018-09-17 15:46:44Z


With dplyr...

df %>% summarise_all(funs(sum(is.na(.))))

Or, using the purrr package:

map(df, ~sum(is.na(.)))


Victorp · Accepted Answer · 2014-10-09 08:30:11Z


You can use sapply():

sapply(X = df, FUN = function(x) sum(is.na(x)))


thisisadi · Accepted Answer · 2021-10-31 04:40:25Z

You could try any of the following approaches:

Using colSums()

colSums(is.na(df))
Using apply()

apply(df, 2, function(x) {sum(is.na(x))})
Using a function

sum.na <- function (x) { sum(is.na(x)) }

print(sapply(df, sum.na))  # sum.na(df) on its own would return the total NA count for the whole data frame
Using lapply()

lapply(df, function(x) sum(is.na(x)))
Using sapply()

sapply(df, function(x) sum(is.na(x)))
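The lapply() and sapply() variants differ only in the shape of the result: lapply() returns a list, while sapply() simplifies it to a named vector. A minimal sketch on hypothetical data, reusing the sum.na helper from above; vapply() is added as a swapped-in alternative (not part of the original answer) that additionally checks each result is a single integer.

```r
# Hypothetical data
df <- data.frame(a = c(1, NA, 3), b = c(NA, NA, 6))

# Helper from the answer: number of NAs in one vector
sum.na <- function(x) { sum(is.na(x)) }

as_list    <- lapply(df, sum.na)                        # list with elements $a and $b
as_vector  <- sapply(df, sum.na)                        # named integer vector
as_checked <- vapply(df, sum.na, FUN.VALUE = integer(1)) # like sapply(), with a type check

print(as_vector)
```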

dwolf · Accepted Answer · 2018-07-17 21:18:08Z

To maintain the names of each column, use this variation (substitute name of dataframe for df in example):

apply(is.na(df), 2, sum)

alko989 · Accepted Answer · 2014-10-09 08:30:16Z


Try:

apply(df, 2, function(x) length(which(is.na(x))))


Sandy · Accepted Answer · 2021-06-07 08:53:01Z

We can also use dplyr functions to get the same result:

df %>% select(everything()) %>% summarise_all(funs(sum(is.na(.))))

This lets you restrict the count to particular columns by replacing everything() with the columns you want to analyse. For further reading, see https://sebastiansauer.github.io/sum-isna/.
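The same column restriction can be sketched in base R (a swapped-in alternative, not the dplyr code above) by subsetting the data frame before counting. The data and the column names in `cols` are hypothetical.

```r
# Hypothetical data
df <- data.frame(a = c(1, NA, 3), b = c(NA, NA, 6), c = c("x", NA, "z"))

# Hypothetical columns of interest
cols <- c("a", "c")

# df[cols] keeps only those columns; then count NAs per column as usual
na_subset <- colSums(is.na(df[cols]))
print(na_subset)
```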

Marta Cz-C · Accepted Answer · 2024-07-19 13:45:32Z

A more modern approach using dplyr's across() (summarise_all() is now superseded):

df %>% summarise(across(everything(), ~sum(is.na(.))))

Macosso · Accepted Answer · 2021-05-17 19:14:13Z

You can use

apply(is.na(df), 2, sum)

This returns the number of NAs in each column.

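Example (as.numeric() turns the non-numeric strings into NA, with an "NAs introduced by coercion" warning):

df <- data.frame(
  x = as.numeric(c(1, 2, 3, 4, 5, 6, 6, 'fg', 8, 8, 3, 4, 2)),
  y = as.numeric(c(1, 2, 3, 4, 5, 'as', 7, 8, 9, 9, 1, 4, 2)),
  z = as.numeric(c(1, 4, 6, 7, 'a', 12, 45, 7, 'as', 1, 23, 12, 'la'))
)
apply(is.na(df), 2, sum)

Output:

x y z
1 1 3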

climatestudent · Accepted Answer · 2023-11-22 12:14:34Z

A possible data.table approach:

library(data.table)

egdf = data.frame(x = c(1, 10, NA, NA, 2), y = c(2.4, NA, 2, 3.5, NA))
setDT(egdf)         # make it a data.table
egdf[, z := x + y]  # add another column
# to count only some columns, add .SDcols = <columns> as a further argument after j
egdf[, lapply(.SD, function(x) sum(is.na(x)))]
#    x y z
# 1: 2 2 4

This seems separate from my previous answer, but if I should add it there instead, please let me know and I will.

J Prestone · Accepted Answer · 2024-02-07 00:22:18Z

Another very simple approach is to call summary() on the data frame; it shows the NA count for each column at a glance.

From this website (intro2r) on the summary() function, "If a variable contains missing data then the number of NA values is also reported." (Note, I could not find confirmation in the official docs.)

A downside of summary() is that extracting the NA counts for use elsewhere is more involved, though there are ways to parse a summary() object (see, for example, this link).
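One way to pull the counts out programmatically, sketched under the assumption that every column is numeric: summary() on a numeric vector returns a named vector that gains an extra element named "NA's" when missing values are present. The helper `na_from_summary` is hypothetical; for character columns summary() reports only Length/Class/Mode, so this sketch would return 0 there, and colSums(is.na(df)) remains the more direct route.

```r
# Hypothetical all-numeric data
df <- data.frame(a = c(1, NA, 3, NA), b = c(2.5, 1, NA, 4))

# Hypothetical helper: read the "NA's" entry from summary(), or 0 if absent.
# unclass() drops the summaryDefault/table class, leaving a plain named numeric vector.
na_from_summary <- function(col) {
  s <- unclass(summary(col))
  if ("NA's" %in% names(s)) s[["NA's"]] else 0
}

na_counts <- sapply(df, na_from_summary)
print(na_counts)
```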
