
I am trying to create a simple data frame that contains information about authors and their respective papers. I have a matrix with the author IDs as the rows and the paper IDs as the columns. The matrix contains 1s and 0s, where a 1 indicates that the author worked on that paper. For example, if A2P[1,1] == 1, the author with ID 1 worked on the paper with ID 1.

I am trying to convert this matrix into a simple data frame that contains all of these relationships, something that just contains the author IDs and the papers that they worked on. As in,

    au_ID  P_ID
    1      1
    1      12   # Author 1 has worked on both paper 1 and 12
    2      1    # Author 2 has also worked on paper 1, in addition to papers 2 and 3.
    2      2
    2      3
    ...

Here is what I am doing:

    list1 <- list()
    list2 <- list()

    # Rows are Author IDs
    # Columns are Paper IDs
    for (row in 1:nrow(A2P)){
      for (col in 1:ncol(A2P)){
        if (A2P[row,col] == 1){
          list1 <- append(list1, row)
          list2 <- append(list2, col)
        }
      }
    }

    authorship["au_ID"] = list1
    authorship["P_ID"] = list2

I am having difficulty getting this code to run quickly. It is taking forever to run, going on twenty minutes now. I think it has something to do with appending each row and column value to each of the lists, but I am unsure.

Any help would be greatly appreciated! Thank you so much!

1 Answer


You likely need which(A2P == 1L, arr.ind = TRUE):

    mat <- matrix(c(1L, 0L, 0L, 0L, 1L, 1L, 1L, 0L, 1L), ncol = 3)
    mat
    #     [,1] [,2] [,3]
    #[1,]    1    0    1
    #[2,]    0    1    0
    #[3,]    0    1    1

    which(mat == 1L, arr.ind = TRUE)
    #     row col
    #[1,]   1   1
    #[2,]   2   2
    #[3,]   3   2
    #[4,]   1   3
    #[5,]   3   3

In this case, row would correspond to au_ID and col would correspond to P_ID. Then to get it in your format completely:

    authorship <- which(mat == 1L, arr.ind = TRUE)
    colnames(authorship) <- c('au_ID', 'P_ID')
    as.data.frame(authorship)
    ##  au_ID P_ID
    ##1     1    1
    ##2     2    2
    ##3     3    2
    ##4     1    3
    ##5     3    3
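
Note that which() walks the matrix column by column, so the rows above come out grouped by paper rather than by author. If you want them ordered by author as in your example, a sorting step can be added. The following is only a sketch: the helper name incidence_to_pairs is made up for illustration, and it assumes that row and column positions in the matrix are the author and paper IDs.

```r
# Hypothetical helper (not a base R function): turns a 0/1 author-by-paper
# matrix into a data frame of (au_ID, P_ID) pairs, sorted by author, then paper.
# Assumption: row index == author ID and column index == paper ID.
incidence_to_pairs <- function(A2P) {
  idx <- which(A2P == 1L, arr.ind = TRUE)
  pairs <- data.frame(
    au_ID = unname(idx[, "row"]),  # unname() drops any names inherited from the matrix
    P_ID  = unname(idx[, "col"])
  )
  pairs <- pairs[order(pairs$au_ID, pairs$P_ID), ]
  rownames(pairs) <- NULL  # renumber rows after sorting
  pairs
}

mat <- matrix(c(1L, 0L, 0L,
                0L, 1L, 1L,
                1L, 0L, 1L), ncol = 3)
authorship <- incidence_to_pairs(mat)
authorship
#   au_ID P_ID
# 1     1    1
# 2     1    3
# 3     2    2
# 4     3    2
# 5     3    3
```

If your IDs are stored in the matrix's rownames and colnames instead of being the positions themselves, you would need to map the indices back through those names.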

3 Comments

Nice solution, love the simplicity
Thank you very much! My code actually did finish running, and I was able to create the data frame I wanted using the following: authorship <- do.call(rbind, Map(data.frame, au_ID=list1, P_ID=list2))
However, yours is so much more compact and does not use a loop. Thank you very much! I really appreciate it as I'm quite new to R - I was trying to use Python syntax with the whole authorship["au_ID"] thing.
