I'm trying to form arguments for use in the reshape() function. I have a vector of column names, some of which should be merged by reshape() because they share the same letter at the end:
> v <- c("x","da","db","ea","eb","ec","fb")

Most of these column names consist of a pre character followed by a post character. pre will be the timevar argument and post will be the v.names argument in reshape(). They are defined as:
> pre <- c("d","e","f")
> post <- c("a","b","c")

I have organized the problem this way since there are a variable number of columns I will have to perform this on for different files. By parsing the column names like this, I'm sure I can do this with an algorithm rather than a manual hack.
My desired output is a list of vectors that only include elements of v that share the same post letter. The intention is to use these as the varying parameter in reshape():
> desired_lov
$a
[1] "da" "ea"

$b
[1] "db" "eb" "fb"

In addition, I would like to keep track of which elements of the original v vector are missing from desired_lov. The intention is to use these as the idvar parameter in reshape():
> desired_idh
[1] "x"  "ec"

With all that given, someone helped me to build a list of vectors with possible column names with those prefixes and postfixes. Each vector in this list is named after an element in post, and I believe this is important in order for this to work with reshape() since it will merge those columns in each vector under a common name:
> lov <- Map(function(x) paste0(pre,x),post)
> lov
$a
[1] "da" "ea" "fa"

$b
[1] "db" "eb" "fb"

$c
[1] "dc" "ec" "fc"

However, this builds more names from those combinations than actually exist in v. So I would like to keep track of which names in v do not exist in lov, for which I've tried:
> idh <- NULL
> Map(function(x) idh <- paste(idh,lov[[x]][lov[[x]] %in% v]),1:length(lov))
[[1]]
[1] " da" " ea"

[[2]]
[1] " db" " eb" " fb"

[[3]]
[1] " ec"

> idh
NULL

But apparently I'm not succeeding in modifying the idh variable using Map().
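A minimal sketch of what seems to be happening here, assuming the cause is R's function scoping: `<-` inside the anonymous function binds a new local `idh` on every call, so the global `idh` is never touched. One alternative is to capture what the function returns. The name `matched` is hypothetical and only used for illustration.

```r
v    <- c("x", "da", "db", "ea", "eb", "ec", "fb")
pre  <- c("d", "e", "f")
post <- c("a", "b", "c")
lov  <- Map(function(x) paste0(pre, x), post)

# Assignment with `<-` inside a function creates a local variable;
# the global idh is left unchanged.
idh <- NULL
invisible(Map(function(x) idh <- x, lov))
is.null(idh)

# Instead of modifying a variable from inside the function, keep the
# values the function returns. `matched` (hypothetical name) collects
# every candidate name in lov that actually exists in v.
matched <- unlist(lapply(lov, function(x) x[x %in% v]), use.names = FALSE)
matched
```

Note that `matched` holds the names that were found, not the ones that were missed; the leftover names would still have to be derived from it.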
For the next step (after I figure out the bit immediately above), in order to strip out the elements of lov that don't match v, I've tried:
> Map(function(x) lov[[x]] <- lov[[x]][lov[[x]] %in% v],1:length(lov))
[[1]]
[1] "da" "ea"

[[2]]
[1] "db" "eb" "fb"

[[3]]
[1] "ec"

> lov
$a
[1] "da" "ea" "fa"

$b
[1] "db" "eb" "fb"

$c
[1] "dc" "ec" "fc"

Which gives me promising output (I would need to remove all vectors from that list that have length < 2 since I'm only looking for duplicated columns based on their second characters), but once again it failed to actually modify lov by removing the elements I was trying to remove.
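One way the filtering step could be sketched is to iterate over the list itself rather than over 1:length(lov), and to assign the result to a new variable. lapply() and Filter() both keep the names of the list they are given, so $a and $b are preserved. The names `lov_present` and `lov_multi` are hypothetical.

```r
v    <- c("x", "da", "db", "ea", "eb", "ec", "fb")
pre  <- c("d", "e", "f")
post <- c("a", "b", "c")
lov  <- Map(function(x) paste0(pre, x), post)

# Keep only the candidate names that exist in v. lapply() over the
# named list returns a list with the same names.
lov_present <- lapply(lov, function(x) x[x %in% v])

# Drop groups with fewer than two columns, since a single column has
# nothing to be merged with. Filter() subsets the list and keeps names.
lov_multi <- Filter(function(x) length(x) >= 2, lov_present)
lov_multi
```

Iterating over indices, as in the attempt above, returns an unnamed list, which is why the output there shows [[1]], [[2]], [[3]] instead of $a, $b, $c.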
I've tried searching, but all I keep finding are ways to remove elements of vectors. This seems to be a much different problem since I'm trying to remove elements from multiple vectors embedded in a list while trying to preserve the vector names in that list.
Edit: I do know about x ahead of time, so I can manually exclude it where needed. But I don't know that c is a unique postfix ahead of time (in this particular example), so it needs to be determined within the script.
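A rough sketch of how a unique postfix such as c could be deduced inside the script, by counting how often each postfix occurs among the names of v. It assumes, as in this example, that every pre and post is a single character. The names `is_pair`, `postfix`, `counts`, `unique_post`, `is_id`, `idvars` and `varying` are hypothetical.

```r
v    <- c("x", "da", "db", "ea", "eb", "ec", "fb")
pre  <- c("d", "e", "f")
post <- c("a", "b", "c")

# TRUE for names that are a valid <pre><post> combination.
is_pair <- v %in% outer(pre, post, paste0)

# Postfix of each valid name, NA for everything else (here: "x").
# Assumes single-character prefixes and postfixes.
postfix <- ifelse(is_pair, substr(v, 2, 2), NA)

# Count occurrences per postfix; table() ignores NA by default.
counts      <- table(postfix)
unique_post <- names(counts)[as.vector(counts) == 1]

# Columns that are not a valid pair, or whose postfix occurs only once,
# are treated as id columns; the rest are grouped by postfix.
is_id   <- !is_pair | postfix %in% unique_post
idvars  <- v[is_id]
varying <- split(v[!is_id], postfix[!is_id])
```

With this approach neither x nor c has to be named explicitly; both fall out of the counts.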
c as the end character. So, you wanted to remove x and c and then split up the vector based on the last character. You may use grep/split/substr to get the desired output, i.e. v1 <- v[!grepl('\\bx\\b|c$', v)]; split(v1, substr(v1, 2,2))

c being a unique postfix ahead of time (I do, however, know about x ahead of time). That part needs to be deduced in the program, which is why I've structured things this way.

substrings that need to be matched, we can match them and remove the rest instead of matching c and x