
I have the following files:

value<-c("ABC_Seed_1_0.csv", "ABC_Seed_1_1.csv", "ABC_Seed_10_0.csv", "ABC_Seed_10_1.csv") 

I would like to find and delete only the files that belong to the archive seed_1.tar.xz (i.e. all files named ABC_Seed_1_*.csv).

The problem is that if I search for seed_1, I also get seed_10. Is there a trick?
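To see why the trailing underscore matters, here is a minimal base R check on the example names. It assumes the seed number is always followed by `_` in the CSV names.

```r
value <- c("ABC_Seed_1_0.csv", "ABC_Seed_1_1.csv",
           "ABC_Seed_10_0.csv", "ABC_Seed_10_1.csv")

# "Seed_1" is a substring of "Seed_10", so every name matches
grepl("Seed_1", value, fixed = TRUE)
# -> TRUE TRUE TRUE TRUE

# Requiring the "_" that follows the seed number excludes Seed_10
grepl("Seed_1_", value, fixed = TRUE)
# -> TRUE TRUE FALSE FALSE
```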

I've tried appending the "_" using paste0:

#Available files
value<-c("ABC_Seed_1_0.csv", "ABC_Seed_1_1.csv", "ABC_Seed_10_0.csv", "ABC_Seed_10_1.csv")
library(dplyr)
library(tidyr)
#File to match against (minus extension)
file<-c("seed_10.tar.xz")
ListToDelete<- value %>%
  as_tibble %>%
  filter(value, stringr::str_detect(string = value, pattern = paste0(fixed(tools::file_path_sans_ext(file, compression = TRUE),ignore_case = TRUE),"_")))
#Returns an empty tibble
file.remove(ListToDelete)
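One likely reason the attempt comes back empty: `paste0()` turns the `fixed(..., ignore_case = TRUE)` object back into a plain character string, so the pattern becomes a case-sensitive `"seed_10_"` that cannot match `"Seed_10_"`. Below is a base R sketch of the same idea (strip the archive extension, then require underscores around the seed) that handles case explicitly by lowercasing both sides. The helper name `files_for_archive` is hypothetical, and it assumes archive names look like `seed_<n>.tar.xz`.

```r
value <- c("ABC_Seed_1_0.csv", "ABC_Seed_1_1.csv",
           "ABC_Seed_10_0.csv", "ABC_Seed_10_1.csv")

# Hypothetical helper: return the files that belong to one archive
files_for_archive <- function(archive, files) {
  # "seed_10.tar.xz" -> "seed_10" (compression = TRUE also drops ".xz")
  prefix <- tools::file_path_sans_ext(archive, compression = TRUE)
  # Underscores on both sides so seed_1 cannot match Seed_10
  key <- tolower(paste0("_", prefix, "_"))
  # fixed = TRUE treats the key literally; tolower() makes it case-insensitive
  files[grepl(key, tolower(files), fixed = TRUE)]
}

files_for_archive("seed_10.tar.xz", value)
files_for_archive("seed_1.tar.xz", value)
```

`ignore.case = TRUE` is not combined with `fixed = TRUE` here because `grepl()` ignores it in that mode, hence the `tolower()` on both sides.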
  • There is definitely a regex for that, but you can also use stringr::word(), i.e. stringr::word(value, 3, sep = '_') == '1'. Commented Dec 8, 2020 at 10:58
  • Using Tim's regex, you could also use grep("ABC_Seed_1_\\d+.csv", value, value = TRUE). Commented Dec 8, 2020 at 11:26
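The `stringr::word()` idea from the first comment can be mirrored in base R without extra packages. This sketch assumes every name has the seed number in the third `_`-separated field; comparing the whole field with `==` is what keeps `"1"` from matching `"10"`.

```r
value <- c("ABC_Seed_1_0.csv", "ABC_Seed_1_1.csv",
           "ABC_Seed_10_0.csv", "ABC_Seed_10_1.csv")

# "ABC_Seed_1_0.csv" -> "ABC" "Seed" "1" "0.csv"
fields <- strsplit(value, "_", fixed = TRUE)

# Third field of each name is the seed number (NA if a name is shorter)
seed <- vapply(fields, `[`, character(1), 3)

# Exact comparison: "10" is not equal to "1"
value[seed == "1"]
```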

2 Answers


You might be making this more complicated than it needs to be. In base R, I would just use grepl here:

value[grepl("ABC_Seed_1_\\d+.csv", value)]
[1] "ABC_Seed_1_0.csv" "ABC_Seed_1_1.csv"

Data:

value <- c("ABC_Seed_1_0.csv", "ABC_Seed_1_1.csv", "ABC_Seed_10_0.csv", "ABC_Seed_10_1.csv") 
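To reuse this pattern for any seed (e.g. inside a loop), a sketch can build it from the seed number. The helper `seed_pattern` is hypothetical; it adds `^`/`$` anchors and escapes the dot so `.csv` is matched literally, and uses `[0-9]+` in place of `\\d+`.

```r
value <- c("ABC_Seed_1_0.csv", "ABC_Seed_1_1.csv",
           "ABC_Seed_10_0.csv", "ABC_Seed_10_1.csv")

# Hypothetical helper: whole-name pattern for one seed number.
# The "_" after the seed stops 1 from matching 10.
seed_pattern <- function(seed) paste0("^ABC_Seed_", seed, "_[0-9]+\\.csv$")

value[grepl(seed_pattern(1), value)]
value[grepl(seed_pattern(10), value)]
```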

1 Comment

Thanks Tim, you're right, I always make things more complicated. I'm trying to find seed_1 (taken from the seed_1.tar.xz filename) in the value vector, so how would I put it into that grepl call? Here you've used a specific string, but first I need to split seed_1.tar.xz and then feed the result in, no? I should clarify that all of this is inside a for loop, so later I will need to extract all files belonging to seed_10, etc. (sorry if this was not clear).

To improve on the previous answer: assuming your file names follow a standard pattern, first split the input with strsplit and extract the seed number, then use grepl as suggested.

E.g.

value[grepl(paste("Seed_",as.numeric(strsplit(file, "[_|.]")[[1]][2]),"_",sep=""), value, fixed=TRUE)] 
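Since the files are processed in a for loop, here is a sketch of the same split-then-grepl approach driving an actual deletion. To stay self-contained it creates the example CSVs in a temporary directory (`demo_dir` and the `archives` vector are hypothetical stand-ins for your real folder and archive list); it keeps the seed as a character string rather than converting it with `as.numeric`.

```r
# Hypothetical setup: the example CSVs in a fresh temporary folder
demo_dir <- file.path(tempdir(), "seed_demo")
unlink(demo_dir, recursive = TRUE)
dir.create(demo_dir)
value <- c("ABC_Seed_1_0.csv", "ABC_Seed_1_1.csv",
           "ABC_Seed_10_0.csv", "ABC_Seed_10_1.csv")
invisible(file.create(file.path(demo_dir, value)))

# Archives to process, one per loop iteration
archives <- c("seed_1.tar.xz")

for (archive in archives) {
  # "seed_1.tar.xz" -> "seed" "1" "tar" "xz"; the 2nd piece is the seed
  seed <- strsplit(archive, "[_.]")[[1]][2]
  # Underscore on both sides so seed 1 cannot match Seed_10
  key <- paste0("Seed_", seed, "_")
  present <- list.files(demo_dir)
  to_delete <- present[grepl(key, present, fixed = TRUE)]
  file.remove(file.path(demo_dir, to_delete))
}

# Only the Seed_10 files should remain
list.files(demo_dir)
```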

6 Comments

Thank you. This returns all my files, though. I only want seed_1 or seed_10, not both.
That's odd, it works as intended here. Matching on both the preceding and the following underscore should eliminate any ambiguity.
Ahh, I'm so sorry, I made a mistake. My tar csv files are named like ABC_Seed1_0.tar.xz.
OK, so if I remove the first _ it works.
Great, so the confusion is solved :) It really pays to learn some regex basics for this kind of problem; the grep family is very useful.
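As a small regex-basics illustration of the `Seed1` versus `Seed_1` mix-up, one pattern can accept both spellings by making the underscore optional with `_?`. The mixed list of names below is illustrative only, built from the names mentioned in this thread.

```r
candidates <- c("ABC_Seed_1_0.csv", "ABC_Seed1_0.tar.xz",
                "ABC_Seed_10_0.csv", "ABC_Seed10_0.tar.xz")
seed <- 1

# "_?" makes the underscore after "Seed" optional;
# the trailing "_" still stops 1 from matching 10
candidates[grepl(paste0("Seed_?", seed, "_"), candidates)]
```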