Remove and set aside elements of vectors within a list that don't exist in another vector

Question

I'm trying to form arguments for use in the reshape() function. I have a vector of column names, some of which should be merged by reshape() because they share the same letter at the end:

> v <- c("x","da","db","ea","eb","ec","fb")

Most of these column names consist of one pre character followed by one post character. The pre part will be the timevar argument and the post part will be the v.names argument in reshape(). They are defined as:

> pre <- c("d","e","f")
> post <- c("a","b","c")

I have organized the problem this way because I will have to do this for a variable number of columns in different files. By parsing the column names like this, I should be able to solve it with an algorithm rather than a manual hack.
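A minimal sketch of that parsing step, assuming every relevant column name is exactly one pre character followed by one post character (so a name like x simply fails to match). The variable names first, last, is_pair and parsed are illustrative only:

```r
v    <- c("x", "da", "db", "ea", "eb", "ec", "fb")
pre  <- c("d", "e", "f")
post <- c("a", "b", "c")

# Split each name into its first and last character.
first <- substr(v, 1, 1)
last  <- substr(v, nchar(v), nchar(v))

# A name counts as a pre/post pair only if it is two characters long
# and both characters are known prefixes/postfixes.
is_pair <- nchar(v) == 2 & first %in% pre & last %in% post

parsed <- data.frame(name = v[is_pair], pre = first[is_pair],
                     post = last[is_pair], stringsAsFactors = FALSE)
parsed
```

Anything that does not parse (here only x) is a candidate id column.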

My desired output is a list of vectors that only include elements of v that share the same post letter. The intention is to use these as the varying parameter in reshape():

> desired_lov
$a
[1] "da" "ea"

$b
[1] "db" "eb" "fb"
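One possible way to get that list directly, assuming two-character names: keep the names built from known parts, split them by their last character (split() names the groups after it), and keep only groups with more than one member. lengths() needs R 3.2.0 or later; candidates and groups are illustrative names:

```r
v    <- c("x", "da", "db", "ea", "eb", "ec", "fb")
pre  <- c("d", "e", "f")
post <- c("a", "b", "c")

# Keep only names built from a known prefix and a known postfix.
candidates <- v[nchar(v) == 2 &
                substr(v, 1, 1) %in% pre &
                substr(v, 2, 2) %in% post]

# Group the candidates by their last character; the group names are the postfixes.
groups <- split(candidates, substr(candidates, 2, 2))

# Only postfixes shared by at least two columns are worth merging.
desired_lov <- groups[lengths(groups) > 1]
desired_lov
```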

In addition, I would like to keep track of which elements of the original vector v are missing from desired_lov. The intention is to use these as the idvar parameter in reshape():

> desired_idh
[1] "x" "ec"
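Once desired_lov exists, the leftovers are just a set difference. A short sketch, with desired_lov typed in by hand here:

```r
v <- c("x", "da", "db", "ea", "eb", "ec", "fb")
desired_lov <- list(a = c("da", "ea"), b = c("db", "eb", "fb"))

# Everything in v that was not placed into one of the groups,
# in the order it appears in v.
desired_idh <- setdiff(v, unlist(desired_lov))
desired_idh
```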

Given all that, someone helped me build a list of vectors containing all possible column names with those prefixes and postfixes. Each vector in this list is named after an element of post, and I believe this is important for reshape(), since it will merge the columns in each vector under a common name:

> lov <- Map(function(x) paste0(pre,x),post)
> lov
$a
[1] "da" "ea" "fa"

$b
[1] "db" "eb" "fb"

$c
[1] "dc" "ec" "fc"
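The names come from Map() itself: it is a wrapper around mapply() with SIMPLIFY = FALSE, and mapply()'s default USE.NAMES = TRUE uses an unnamed character first argument as the names of the result. A sketch with an equivalent lapply() version (lov2 is an illustrative name) that makes the naming explicit:

```r
pre  <- c("d", "e", "f")
post <- c("a", "b", "c")

lov <- Map(function(x) paste0(pre, x), post)
names(lov)

# Same list, with the names attached explicitly.
lov2 <- setNames(lapply(post, function(x) paste0(pre, x)), post)
identical(lov, lov2)
```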

However, this builds more name combinations than actually exist in v, so I would like to keep track of which names in v do not appear in lov. I've tried:

> idh <- NULL
> Map(function(x) idh <- paste(idh,lov[[x]][lov[[x]] %in% v]),1:length(lov))
[[1]]
[1] " da" " ea"

[[2]]
[1] " db" " eb" " fb"

[[3]]
[1] " ec"

> idh
NULL

Apparently, though, I'm not succeeding in modifying the idh variable using Map().
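A sketch of why this happens and one way around it. Each call of the anonymous function runs in its own environment, so `idh <- ...` creates a local idh there and leaves the global one alone. Working with the value Map()/lapply() returns avoids the problem; found and still_null are illustrative names:

```r
v    <- c("x", "da", "db", "ea", "eb", "ec", "fb")
pre  <- c("d", "e", "f")
post <- c("a", "b", "c")
lov  <- Map(function(x) paste0(pre, x), post)

idh <- NULL
# The assignment below only creates a local idh inside each call.
invisible(Map(function(x) idh <- "changed", seq_along(lov)))
still_null <- is.null(idh)
still_null

# Instead, use the returned value: names of lov that really occur in v ...
found <- unlist(lapply(lov, function(x) x[x %in% v]), use.names = FALSE)
# ... and the names of v that are not among them.
idh <- setdiff(v, found)
idh
```

Note that ec still counts as found here; it only ends up in idh once single-member groups are dropped.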

For the next step (once I figure out the part immediately above), to strip out the elements of lov that don't match v, I've tried:

> Map(function(x) lov[[x]] <- lov[[x]][lov[[x]] %in% v],1:length(lov))
[[1]]
[1] "da" "ea"

[[2]]
[1] "db" "eb" "fb"

[[3]]
[1] "ec"

> lov
$a
[1] "da" "ea" "fa"

$b
[1] "db" "eb" "fb"

$c
[1] "dc" "ec" "fc"

This gives promising output (I would still need to remove every vector in the list with length < 2, since I'm only looking for columns that share their second character), but once again it fails to actually remove those elements from lov.
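A minimal sketch of the same filtering that does change lov: the assignment inside Map() only modified a local copy, so assign the result of lapply() (which keeps the list names) back to lov, then drop the short vectors with Filter():

```r
v    <- c("x", "da", "db", "ea", "eb", "ec", "fb")
pre  <- c("d", "e", "f")
post <- c("a", "b", "c")
lov  <- Map(function(x) paste0(pre, x), post)

# Keep only the names that exist in v; reassigning is what updates lov.
lov <- lapply(lov, function(x) x[x %in% v])

# Drop postfixes that occur only once; the remaining names are kept.
lov <- Filter(function(x) length(x) > 1, lov)
lov
```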

I've searched, but all I keep finding are ways to remove elements from a single vector. This seems to be a different problem, since I'm trying to remove elements from multiple vectors embedded in a list while preserving the vector names in that list.

Edit: I do know about x ahead of time, so I can manually exclude it where needed. But I don't know ahead of time that c is a unique postfix (in this particular example), so that must be determined within the script.
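A sketch of deducing the unique postfixes in the script, assuming the known id columns (here just x) are listed by hand in a hypothetical known vector and the postfix is always the last character:

```r
v     <- c("x", "da", "db", "ea", "eb", "ec", "fb")
known <- "x"   # hypothetical: id columns known in advance

# Count how often each last character occurs among the remaining names.
rest   <- setdiff(v, known)
last   <- substr(rest, nchar(rest), nchar(rest))
counts <- table(last)

# Postfixes that occur only once are not shared, so they cannot be merged.
unique_post <- names(counts)[counts == 1]
unique_post
```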

In your initial vector, there are some elements with c as the end character. So you want to remove x and c and then split up the vector based on the last character. You could use grep/split/substr to get the desired output, i.e. v1 <- v[!grepl('\\bx\\b|c$', v)]; split(v1, substr(v1, 2, 2))
– akrun, Commented May 14, 2015 at 12:15
@akrun I don't think OP wants to explicitly exclude 'x' and 'c'. They want code that identifies 'x' and 'c' based on the rule that they do not share a terminal character with any other string in the vector.
– Pierre L, Commented May 14, 2015 at 12:26
@akrun That's pretty cool, but I don't know about c being a unique postfix ahead of time (I do, however, know about x ahead of time). That part needs to be deduced in the program, which is why I've structured things this way.
– Shawn, Commented May 14, 2015 at 12:26
@plafort Sorry, I didn't read the full post. Just giving some ideas.
– akrun, Commented May 14, 2015 at 12:26
@Shawn If you know the substrings that need to be matched, we can match them and remove the rest instead of matching c and x.
– akrun, Commented May 14, 2015 at 12:27

Pierre L · Accepted Answer · 2015-05-14 13:09:02Z

freq <- lapply(Map(function(x) grep(x, v), post), length)
index <- Map(function(x) grep(x, v), names(freq)[freq>1])
lapply(index, function(x) v[x])
$a
[1] "da" "ea"

$b
[1] "db" "eb" "fb"

and

v[-unlist(index)]
[1] "x" "ec"

Data

v <- c("x","da","db","ea","eb","ec","fb")
pre <- c("d","e","f")
post <- c("a","b","c")
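A possible refinement of the answer's approach, offered as a sketch: grep(x, v) matches the postfix letter anywhere in a name, so a prefix that happens to equal a postfix letter would be miscounted. Anchoring the pattern with "$" restricts the match to the end of the name. Selecting the leftovers with %in% also avoids the case where no postfix is shared, in which v[-integer(0)] returns an empty vector instead of all of v. groups and idh are illustrative names:

```r
v    <- c("x", "da", "db", "ea", "eb", "ec", "fb")
post <- c("a", "b", "c")

# Positions of the names ending in each postfix ("$" anchors at the end).
index <- Map(function(x) grep(paste0(x, "$"), v), post)

# Keep only postfixes shared by more than one column.
index <- index[lengths(index) > 1]

groups <- lapply(index, function(i) v[i])
idh    <- v[!seq_along(v) %in% unlist(index)]
groups
idh
```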
