best way to transpose data.table

Question

[UPDATE: there is now a native transpose() function in data.table package]

I often need to transpose a data.table, every time it takes several lines of code and I am wondering if there's any better solution than mine.

if we take sample table

library(data.table) mydata <- data.table(col0=c("row1","row2","row3"), col1=c(11,21,31), col2=c(12,22,32), col3=c(13,23,33)) mydata # col0 col1 col2 col3 # row1 11 12 13 # row2 21 22 23 # row3 31 32 33

and just transpose it with t(), it will be transposed to the matrix with conversion to character type, while applying data.table to such matrix will lose row.names:

t(mydata) # [,1] [,2] [,3] # col0 "row1" "row2" "row3" # col1 "11" "21" "31" # col2 "12" "22" "32" # col3 "13" "23" "33" data.table(t(mydata)) # V1 V2 V3 # row1 row2 row3 # 11 21 31 # 12 22 32 # 13 23 33

so I had to write a function for this:

tdt <- function(inpdt){ transposed <- t(inpdt[,-1,with=F]); colnames(transposed) <- inpdt[[1]]; transposed <- data.table(transposed, keep.rownames=T); setnames(transposed, 1, names(inpdt)[1]); return(transposed); } tdt(mydata) # col0 row1 row2 row3 # col1 11 21 31 # col2 12 22 32 # col3 13 23 33

is there anything I could optimize here or do it in "nicer" way?

I'm curious why you'd need to transpose a data.table? Transposing is a natural operation for numeric matrices but I have a hard time thinking of problems where it would be a good idea to transpose a by-definition non-uniform object like a data.table or a data.frame. Care to give an example? — eddi
– eddi, Commented Feb 22, 2015 at 3:55
well, most of my experimental data files are tables where main part is indeed numeric matrix, but accompanied by several non-numeric fields, so using data.table is quite logical for me. Most typical example would be the table where each column is a sample and each row is a probe; characteristics of probes and/or samples are recorded in the same table. — Vasily A
– Vasily A, Commented Feb 22, 2015 at 4:33
@eddi one example of where transposing a data.frame is useful: in single cell genomics two concepts of storing data are present. one with genes in rows and cells in columns and one vice versa. so if you want to use different software packages you would need to transpose.. — Tapper
– Tapper, Commented Aug 23, 2019 at 16:07
Given that there now is the native data.table function transpose, it might be worth re-considering what answer to accept. — gofvonx
– gofvonx, Commented Jan 21, 2020 at 11:54

buhtz · Accepted Answer · 2018-12-23 21:53:10Z

62

Why not just melt and dcast the data.table?

require(data.table) dcast(melt(mydata, id.vars = "col0"), variable ~ col0) # variable row1 row2 row3 # 1: col1 11 21 31 # 2: col2 12 22 32 # 3: col3 13 23 33

edited Dec 23, 2018 at 21:53

buhtz

12.5k21 gold badges95 silver badges196 bronze badges

answered Feb 22, 2015 at 3:24

A5C1D2H2I1M1N2O1R2T1

194k30 gold badges416 silver badges496 bronze badges

Sign up to request clarification or add additional context in comments.

4 Comments

Vasily A Over a year ago

hmmm, it looks neat, thanks! (I will keep the question open for a while to see if there are other suggestions)

A5C1D2H2I1M1N2O1R2T1 Over a year ago

@VasilyA, I think that the main question comes back to why you would be using a data.table for matrix-type data. Are you doing a lot of work with grouping and so on?

Vasily A Over a year ago

yes, I use a lot of grouping and "subsetting", often need to select only a part of that big matrix.

A5C1D2H2I1M1N2O1R2T1 Over a year ago

@buhtz, I suppose you know that you should use library(data.table) before dcast and melt? If that doesn't solve your problem, please share the version of "data.table" that you're using.

A5C1D2H2I1M1N2O1R2T1 · Accepted Answer · 2020-05-15 21:54:56Z

46

The current docs show a builtin transpose method.

Specifically, you can do:

transpose(mydata, keep.names = "col", make.names = "col0") ## col row1 row2 row3 ## 1: col1 11 21 31 ## 2: col2 12 22 32 ## 3: col3 13 23 33

edited May 15, 2020 at 21:54

A5C1D2H2I1M1N2O1R2T1

194k30 gold badges416 silver badges496 bronze badges

answered Oct 31, 2016 at 10:51

abalter

10.5k18 gold badges103 silver badges172 bronze badges

3 Comments

skan Over a year ago

This should be the way to go. But at the moment it doesn't keep the column names as rownames. There is a issue opened github.com/Rdatatable/data.table/issues/1886 to ask for it. I guess if we vote there it will be implemented sooner.

Herman Toothrot Over a year ago

Is there a way to keep numeric as they are instead of converting to characters?

Tapper Over a year ago

In case anybody else had trouble with it: I had to call it explicitly as data.table::transpose() to not get purrr's transpose().

shadow · Accepted Answer · 2015-02-24 09:52:39Z

Here's an alternative solution that only uses data.table and that is closer to the original idea of using t to transpose.

mydata[, data.table(t(.SD), keep.rownames=TRUE), .SDcols=-"col0"] ## rn V1 V2 V3 ## 1: col1 11 21 31 ## 2: col2 12 22 32 ## 3: col3 13 23 33

If keeping the rownames is important, setnames can be used. Admittedly this becomes a bit clumsy and probably the recast solution is preferable.

setnames(mydata[, data.table(t(.SD), keep.rownames=TRUE), .SDcols=-"col0"], mydata[, c('rn', col0)])[] ## rn row1 row2 row3 ## 1: col1 11 21 31 ## 2: col2 12 22 32 ## 3: col3 13 23 33

This approach is certainly easier to understand. By the way, release notes for v1.9.5 of data.table suggest that reshape2 will soon not be required for melt and dcast. github.com/Rdatatable/…
thanks, @shadow! I like that we only use data.table, but I should note that column names (i.e. original row names - "row1","row2","row3") - are lost here.

ascripter · Accepted Answer · 2019-02-20 23:42:48Z

8

df <- as.data.frame(t(mydata))

is what I tried and df is a data.frame and the column names on mydata are now row names on df

edited Feb 20, 2019 at 23:42

ascripter

6,31512 gold badges54 silver badges74 bronze badges

answered Feb 20, 2019 at 23:21

Japa19

811 silver badge1 bronze badge

Comments

Amy M · Accepted Answer · 2017-04-28 18:39:39Z

Here's a solution that uses a wrapper to tidy up the output of the data.table transpose function.

With really large data sets this seems to be more efficient than the dcast/melt approach (I tested it on a 8000 row x 29000 column data set, the below function works in about 3 minutes but dcast/melt crashed R):

# Function to clean up output of data.table transpose: transposedt <- function(dt, varlabel) { require(data.table) dtrows = names(dt) dtcols = as.list(c(dt[,1])) dtt = transpose(dt) dtt[, eval(varlabel) := dtrows] setnames(dtt, old = names(dtt), new = c(dtcols[[1]], eval(varlabel))) dtt = dtt[-1,] setcolorder(dtt, c(eval(varlabel), names(dtt)[1:(ncol(dtt) - 1)])) return(dtt) } # Some dummy data mydt <- data.table(col0 = c(paste0("row", seq_along(1:100))), col01 = c(sample(seq_along(1:100), 100)), col02 = c(sample(seq_along(1:100), 100)), col03 = c(sample(seq_along(1:100), 100)), col04 = c(sample(seq_along(1:100), 100)), col05 = c(sample(seq_along(1:100), 100)), col06 = c(sample(seq_along(1:100), 100)), col07 = c(sample(seq_along(1:100), 100)), col08 = c(sample(seq_along(1:100), 100)), col09 = c(sample(seq_along(1:100), 100)), col10 = c(sample(seq_along(1:100), 100))) # Apply the function: mydtt <- transposedt(mydt, "myvariables") # View the results: > mydtt[,1:10] myvariables row1 row2 row3 row4 row5 row6 row7 row8 row9 1: col01 58 53 14 96 51 30 26 15 68 2: col02 6 72 46 62 69 9 63 32 78 3: col03 21 36 94 41 54 74 82 64 15 4: col04 68 41 66 30 31 78 51 67 26 5: col05 49 30 52 78 73 71 5 66 44 6: col06 89 35 79 67 6 88 62 97 73 7: col07 66 15 27 29 58 40 35 82 57 8: col08 55 47 83 30 23 65 48 56 87 9: col09 41 10 21 33 55 81 94 25 34 10: col10 35 17 41 44 21 66 69 61 46

What is also useful is that columns (ex rows) occur in their original order and you can name the variables column something meaningful.

PeterAU · Accepted Answer · 2018-02-08 15:33:08Z

The tdt function which I provide below should be faster

tdt <- function(DT, transpose.col, ...) { # The transpose function is efficient, but lacks the keeping of row and colnames new.row.names <- colnames(DT) new.row.names <- new.row.names[!new.row.names %in% transpose.col] new.col.names <- DT[, transpose.col, with = F] DT <- DT[, !colnames(DT) %in% transpose.col, with = F] DT <- transpose(DT, ...) colnames(DT) <- unlist(new.col.names) DT$var <- new.row.names # change order of DT after transposing setcolorder(DT, c("var", setdiff(names(DT), "var"))) colnames(DT)[1] <- transpose.col return(DT) } library(microbenchmark); library(microbenchmarkCore) DT <- data.table(x=1:1000, y=paste("name", 1:1000, sep = "_"), z = paste("test", 1:1000, sep = ".")) rbind(microbenchmark(tdt(DT, "y")), microbenchmark(dcast(melt(DT, id.vars = "y"), variable ~ y)), microbenchmark(DT[, data.table(t(.SD), keep.rownames=TRUE), .SDcols=-"y"])) Unit: milliseconds expr min lq mean median uq max neval cld tdt(DT, "y") 3.463842 3.719341 4.308158 3.911599 4.576477 20.406940 100 a dcast(melt(DT, id.vars = "y"), variable ~ y) 5.146119 5.496761 5.826647 5.580796 5.870584 9.536541 100 a DT[, data.table(t(.SD), keep.rownames = TRUE), .SDcols = -"y"] 29.975567 34.554989 40.807036 36.724430 39.102396 104.242218 100 b d <- tdt(DT, "y") d[1:2, 1:11] y name_1 name_2 name_3 name_4 name_5 name_6 name_7 name_8 name_9 name_10 1: x 1 2 3 4 5 6 7 8 9 10 2: z test.1 test.2 test.3 test.4 test.5 test.6 test.7 test.8 test.9 test.10 DT[1:10, 1:3] x y z 1: 1 name_1 test.1 2: 2 name_2 test.2 3: 3 name_3 test.3 4: 4 name_4 test.4 5: 5 name_5 test.5 6: 6 name_6 test.6 7: 7 name_7 test.7 8: 8 name_8 test.8 9: 9 name_9 test.9 10: 10 name_10 test.10 class(d) [1] "data.table" "data.frame"

Collectives™ on Stack Overflow

best way to transpose data.table

6 Answers 6

4 Comments

3 Comments

2 Comments

Comments

Comments

Comments

Linked

Hot Network Questions

Collectives™ on Stack Overflow

6 Answers 6

4 Comments

3 Comments

2 Comments

Comments

Comments

Comments

Linked

Related