61

[UPDATE: there is now a native transpose() function in data.table package]

I often need to transpose a data.table, every time it takes several lines of code and I am wondering if there's any better solution than mine.

if we take sample table

library(data.table) mydata <- data.table(col0=c("row1","row2","row3"), col1=c(11,21,31), col2=c(12,22,32), col3=c(13,23,33)) mydata # col0 col1 col2 col3 # row1 11 12 13 # row2 21 22 23 # row3 31 32 33 

and just transpose it with t(), it will be transposed to the matrix with conversion to character type, while applying data.table to such matrix will lose row.names:

t(mydata) # [,1] [,2] [,3] # col0 "row1" "row2" "row3" # col1 "11" "21" "31" # col2 "12" "22" "32" # col3 "13" "23" "33" data.table(t(mydata)) # V1 V2 V3 # row1 row2 row3 # 11 21 31 # 12 22 32 # 13 23 33 

so I had to write a function for this:

tdt <- function(inpdt){ transposed <- t(inpdt[,-1,with=F]); colnames(transposed) <- inpdt[[1]]; transposed <- data.table(transposed, keep.rownames=T); setnames(transposed, 1, names(inpdt)[1]); return(transposed); } tdt(mydata) # col0 row1 row2 row3 # col1 11 21 31 # col2 12 22 32 # col3 13 23 33 

is there anything I could optimize here or do it in "nicer" way?

6
  • 3
    I'm curious why you'd need to transpose a data.table? Transposing is a natural operation for numeric matrices but I have a hard time thinking of problems where it would be a good idea to transpose a by-definition non-uniform object like a data.table or a data.frame. Care to give an example? Commented Feb 22, 2015 at 3:55
  • 2
    well, most of my experimental data files are tables where main part is indeed numeric matrix, but accompanied by several non-numeric fields, so using data.table is quite logical for me. Most typical example would be the table where each column is a sample and each row is a probe; characteristics of probes and/or samples are recorded in the same table. Commented Feb 22, 2015 at 4:33
  • can you fread with header = FALSE and then use transpose? Commented Mar 7, 2019 at 10:31
  • @eddi one example of where transposing a data.frame is useful: in single cell genomics two concepts of storing data are present. one with genes in rows and cells in columns and one vice versa. so if you want to use different software packages you would need to transpose.. Commented Aug 23, 2019 at 16:07
  • Given that there now is the native data.table function transpose, it might be worth re-considering what answer to accept. Commented Jan 21, 2020 at 11:54

6 Answers 6

62

Why not just melt and dcast the data.table?

require(data.table) dcast(melt(mydata, id.vars = "col0"), variable ~ col0) # variable row1 row2 row3 # 1: col1 11 21 31 # 2: col2 12 22 32 # 3: col3 13 23 33 
Sign up to request clarification or add additional context in comments.

4 Comments

hmmm, it looks neat, thanks! (I will keep the question open for a while to see if there are other suggestions)
@VasilyA, I think that the main question comes back to why you would be using a data.table for matrix-type data. Are you doing a lot of work with grouping and so on?
yes, I use a lot of grouping and "subsetting", often need to select only a part of that big matrix.
@buhtz, I suppose you know that you should use library(data.table) before dcast and melt? If that doesn't solve your problem, please share the version of "data.table" that you're using.
46

The current docs show a builtin transpose method.

Specifically, you can do:

transpose(mydata, keep.names = "col", make.names = "col0") ## col row1 row2 row3 ## 1: col1 11 21 31 ## 2: col2 12 22 32 ## 3: col3 13 23 33 

3 Comments

This should be the way to go. But at the moment it doesn't keep the column names as rownames. There is a issue opened github.com/Rdatatable/data.table/issues/1886 to ask for it. I guess if we vote there it will be implemented sooner.
Is there a way to keep numeric as they are instead of converting to characters?
In case anybody else had trouble with it: I had to call it explicitly as data.table::transpose() to not get purrr's transpose().
21

Here's an alternative solution that only uses data.table and that is closer to the original idea of using t to transpose.

mydata[, data.table(t(.SD), keep.rownames=TRUE), .SDcols=-"col0"] ## rn V1 V2 V3 ## 1: col1 11 21 31 ## 2: col2 12 22 32 ## 3: col3 13 23 33 

If keeping the rownames is important, setnames can be used. Admittedly this becomes a bit clumsy and probably the recast solution is preferable.

setnames(mydata[, data.table(t(.SD), keep.rownames=TRUE), .SDcols=-"col0"], mydata[, c('rn', col0)])[] ## rn row1 row2 row3 ## 1: col1 11 21 31 ## 2: col2 12 22 32 ## 3: col3 13 23 33 

2 Comments

This approach is certainly easier to understand. By the way, release notes for v1.9.5 of data.table suggest that reshape2 will soon not be required for melt and dcast. github.com/Rdatatable/…
thanks, @shadow! I like that we only use data.table, but I should note that column names (i.e. original row names - "row1","row2","row3") - are lost here.
8
df <- as.data.frame(t(mydata)) 

is what I tried and df is a data.frame and the column names on mydata are now row names on df

Comments

2

Here's a solution that uses a wrapper to tidy up the output of the data.table transpose function.

With really large data sets this seems to be more efficient than the dcast/melt approach (I tested it on a 8000 row x 29000 column data set, the below function works in about 3 minutes but dcast/melt crashed R):

# Function to clean up output of data.table transpose: transposedt <- function(dt, varlabel) { require(data.table) dtrows = names(dt) dtcols = as.list(c(dt[,1])) dtt = transpose(dt) dtt[, eval(varlabel) := dtrows] setnames(dtt, old = names(dtt), new = c(dtcols[[1]], eval(varlabel))) dtt = dtt[-1,] setcolorder(dtt, c(eval(varlabel), names(dtt)[1:(ncol(dtt) - 1)])) return(dtt) } # Some dummy data mydt <- data.table(col0 = c(paste0("row", seq_along(1:100))), col01 = c(sample(seq_along(1:100), 100)), col02 = c(sample(seq_along(1:100), 100)), col03 = c(sample(seq_along(1:100), 100)), col04 = c(sample(seq_along(1:100), 100)), col05 = c(sample(seq_along(1:100), 100)), col06 = c(sample(seq_along(1:100), 100)), col07 = c(sample(seq_along(1:100), 100)), col08 = c(sample(seq_along(1:100), 100)), col09 = c(sample(seq_along(1:100), 100)), col10 = c(sample(seq_along(1:100), 100))) # Apply the function: mydtt <- transposedt(mydt, "myvariables") # View the results: > mydtt[,1:10] myvariables row1 row2 row3 row4 row5 row6 row7 row8 row9 1: col01 58 53 14 96 51 30 26 15 68 2: col02 6 72 46 62 69 9 63 32 78 3: col03 21 36 94 41 54 74 82 64 15 4: col04 68 41 66 30 31 78 51 67 26 5: col05 49 30 52 78 73 71 5 66 44 6: col06 89 35 79 67 6 88 62 97 73 7: col07 66 15 27 29 58 40 35 82 57 8: col08 55 47 83 30 23 65 48 56 87 9: col09 41 10 21 33 55 81 94 25 34 10: col10 35 17 41 44 21 66 69 61 46 

What is also useful is that columns (ex rows) occur in their original order and you can name the variables column something meaningful.

Comments

-1

The tdt function which I provide below should be faster

tdt <- function(DT, transpose.col, ...) { # The transpose function is efficient, but lacks the keeping of row and colnames new.row.names <- colnames(DT) new.row.names <- new.row.names[!new.row.names %in% transpose.col] new.col.names <- DT[, transpose.col, with = F] DT <- DT[, !colnames(DT) %in% transpose.col, with = F] DT <- transpose(DT, ...) colnames(DT) <- unlist(new.col.names) DT$var <- new.row.names # change order of DT after transposing setcolorder(DT, c("var", setdiff(names(DT), "var"))) colnames(DT)[1] <- transpose.col return(DT) } library(microbenchmark); library(microbenchmarkCore) DT <- data.table(x=1:1000, y=paste("name", 1:1000, sep = "_"), z = paste("test", 1:1000, sep = ".")) rbind(microbenchmark(tdt(DT, "y")), microbenchmark(dcast(melt(DT, id.vars = "y"), variable ~ y)), microbenchmark(DT[, data.table(t(.SD), keep.rownames=TRUE), .SDcols=-"y"])) Unit: milliseconds expr min lq mean median uq max neval cld tdt(DT, "y") 3.463842 3.719341 4.308158 3.911599 4.576477 20.406940 100 a dcast(melt(DT, id.vars = "y"), variable ~ y) 5.146119 5.496761 5.826647 5.580796 5.870584 9.536541 100 a DT[, data.table(t(.SD), keep.rownames = TRUE), .SDcols = -"y"] 29.975567 34.554989 40.807036 36.724430 39.102396 104.242218 100 b d <- tdt(DT, "y") d[1:2, 1:11] y name_1 name_2 name_3 name_4 name_5 name_6 name_7 name_8 name_9 name_10 1: x 1 2 3 4 5 6 7 8 9 10 2: z test.1 test.2 test.3 test.4 test.5 test.6 test.7 test.8 test.9 test.10 DT[1:10, 1:3] x y z 1: 1 name_1 test.1 2: 2 name_2 test.2 3: 3 name_3 test.3 4: 4 name_4 test.4 5: 5 name_5 test.5 6: 6 name_6 test.6 7: 7 name_7 test.7 8: 8 name_8 test.8 9: 9 name_9 test.9 10: 10 name_10 test.10 class(d) [1] "data.table" "data.frame" 

Comments

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.