
I have a folder of .txt files, each with a long name such as "ctrl_Jack_DrugA_XXuM.txt". However, the names are missing an important piece of information: the timestamp.

That information is available in the data frame inside each file. Each file contains multiple columns, and one of them, "Pid_treatmentsum", holds values such as "Jack_R4_200514_DrugA_XXuM.txt".

Before I proceed downstream, I want to sort the files into subfolders based on the name (such as Jack) and the timestamp (such as "R4_200514"). To do that, I need to rename each file using its "Pid_treatmentsum" value.

Now the code:

```
# create MRE
# file 1
Row <- c(rep("16", 20))
column <- c(rep("3", 20))
Pid <- c(rep("Jack", 20))
Stimulation <- c(rep("3S", 20))
Drug <- c(rep("2DG", 20))
Dose <- c(rep("3uM", 20))
Treatmentsum <- c(rep(paste("Jack", "3S", '2DG', '3uM', sep = "_"), 20))
PiD_treatmentsum <- c(rep(paste('Jack', "T4_20200501", '3S', '2DG', '3uM', sep = "_"), 20))
sampleset <- data.frame(Row, column, Pid, Stimulation, Drug, Dose, Treatmentsum, PiD_treatmentsum)
write.table(sampleset, file = "ctrl_Jack_3S_2DG_3uM.txt", sep = "\t", row.names = F, col.names = T)

# file 2
Row <- c(rep("16", 40))
column <- c(rep("3", 40))
Pid <- c(rep("Mark", 40))
Stimulation <- c(rep("3S", 40))
Drug <- c(rep("STS", 40))
Dose <- c(rep("1uM", 40))
Treatmentsum <- c(rep(paste("Mark", "3S", 'STS', '1uM', sep = "_"), 40))
PiD_treatmentsum <- c(rep(paste('Mark', "T5_20200501", '3S', 'STS', '1uM', sep = "_"), 40))
sampleset <- data.frame(Row, column, Pid, Stimulation, Drug, Dose, Treatmentsum, PiD_treatmentsum)
write.table(sampleset, file = "ctrl_Mark_3S_STS_1uM.txt", sep = "\t", row.names = F, col.names = T)

# rename all the files using their PiD_treatmentsum
filenames <- list.files("C:/UsersXXX", pattern="*.txt")
outdirectory <- "~/out"
lapply(filenames, function(x) {
  df <- read.csv(x, sep="\t", header=TRUE, fill = T, stringsAsFactors = F)
  a <- as.character(unique(df[["PiD_treatmentsum"]]))
  b <- paste0("ctrl_", a, '.txt', sep="")
  newname <- file.rename(basename(x), b)
  write.table(df, paste0(outdirectory,"/", newname, sep="\t", quote=FALSE, row.names=F, col.names=TRUE)
})
```

Here R reports an "unexpected }" error. I think I must have screwed up the loop.

If I dissect the code and run it on a single file as an example, it works:

```
df <- read.csv('ctrl_Jack_3S_2DG_3uM.txt', sep="\t", header=TRUE, fill = T, stringsAsFactors=F)
a <- as.character(unique(df[["PiD_treatmentsum"]]))
b <- paste0("ctrl_", a, '.txt', sep="")
basename('ctrl_Jack_3S_2DG_3uM.txt')
file.rename(basename('ctrl_Jack_3S_2DG_3uM.txt'), b)
```

A little help and explanation will be appreciated :)

  • df$Pid_treatmentsum is a column of the data.frame df and not a string. Depending on the content of df you can try newfilename <- df$Pid_treatmentsum[1] Commented Feb 24, 2020 at 20:08
  • Hi @dario, I have tried your suggestion. All elements in that column are identical within each file, so I'm happy to index any one of them. However, file.rename still gives me the same error: invalid 'to' argument Commented Feb 24, 2020 at 20:12
  • What is the value of newfilename? Please edit your question and add the output of head(df), as well as the values of x, newfilename and outputdirectory for a value of x that raises the error Commented Feb 24, 2020 at 20:15
  • Hi @dario, I broke up the loop and ran the code on a single file in the folder to check what you asked for. The newfilename value is a Factor with 1 level. Commented Feb 24, 2020 at 20:28
  • @dario edited as above. I hope this helps Commented Feb 24, 2020 at 20:34
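The comments above point at the column being read as a factor rather than as a character string. As a minimal sketch of that situation (the one-column data frame below is a hypothetical stand-in for a file's contents, not the asker's real data), converting the value with as.character yields a plain string that can be used to build a file name:

```r
# Hypothetical stand-in for the data read from one file; stringsAsFactors = TRUE
# reproduces the "Factor with 1 level" situation described in the comments
df <- data.frame(
  PiD_treatmentsum = rep("Jack_T4_20200501_3S_2DG_3uM", 3),
  stringsAsFactors = TRUE
)

newfilename <- df$PiD_treatmentsum[1]
class(newfilename)                        # "factor"

# Convert to a plain character string before building a file name from it
newfilename <- as.character(newfilename)
target <- paste0("ctrl_", newfilename, ".txt")
```

Reading the file with stringsAsFactors = FALSE, as the question's code already does, should avoid the factor in the first place.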

1 Answer

This should work:

```
# create MRE
# file 1
Row <- c(rep("16", 20))
column <- c(rep("3", 20))
Pid <- c(rep("Jack", 20))
Stimulation <- c(rep("3S", 20))
Drug <- c(rep("2DG", 20))
Dose <- c(rep("3uM", 20))
Treatmentsum <- c(rep(paste("Jack", "3S", '2DG', '3uM', sep = "_"), 20))
PiD_treatmentsum <- c(rep(paste('Jack', "T4_20200501", '3S', '2DG', '3uM', sep = "_"), 20))
sampleset <- data.frame(Row, column, Pid, Stimulation, Drug, Dose, Treatmentsum, PiD_treatmentsum)
write.table(sampleset, file = "ctrl_Jack_3S_2DG_3uM.txt", sep = "\t", row.names = F, col.names = T)

# file 2
Row <- c(rep("16", 40))
column <- c(rep("3", 40))
Pid <- c(rep("Mark", 40))
Stimulation <- c(rep("3S", 40))
Drug <- c(rep("STS", 40))
Dose <- c(rep("1uM", 40))
Treatmentsum <- c(rep(paste("Mark", "3S", 'STS', '1uM', sep = "_"), 40))
PiD_treatmentsum <- c(rep(paste('Mark', "T5_20200501", '3S', 'STS', '1uM', sep = "_"), 40))
sampleset <- data.frame(Row, column, Pid, Stimulation, Drug, Dose, Treatmentsum, PiD_treatmentsum)
write.table(sampleset, file = "ctrl_Mark_3S_STS_1uM.txt", sep = "\t", row.names = F, col.names = T)
```

I only changed the last three lines. We rename the file using file.rename, which returns a logical value, so newname is now TRUE if the rename succeeded and FALSE if it failed.
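To illustrate the return value, here is a small sketch that renames a throwaway file in tempdir(). The file names are made up for the example and are not part of the original data set:

```r
# Hypothetical placeholder files in a temporary directory
old_path <- file.path(tempdir(), "ctrl_old_name.txt")
new_path <- file.path(tempdir(), "ctrl_new_name.txt")
unlink(c(old_path, new_path))          # start from a clean state
writeLines("placeholder", old_path)

# file.rename returns a logical: TRUE if the rename succeeded
renamed <- file.rename(old_path, new_path)

# Renaming a file that does not exist returns FALSE (and emits a warning)
missing_path <- file.path(tempdir(), "does_not_exist.txt")
renamed_missing <- suppressWarnings(file.rename(missing_path, old_path))
```

Because the result is only a logical flag, it cannot be used as the new file name; the new name is still held in the variable passed as the second argument.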

Then we create outdirectory. dir.create raises a warning if the directory already exists, but nothing is overwritten. We could first test whether outdirectory exists and, if so, skip the dir.create call.
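A minimal sketch of that existence check, assuming a directory under tempdir() in place of the real "~/out":

```r
# Hypothetical output directory; substitute the real outdirectory
outdirectory <- file.path(tempdir(), "out")

# Only create the directory when it is not there yet, which avoids the warning
if (!dir.exists(outdirectory)) {
  dir.create(outdirectory, recursive = TRUE)
}

# Running the same guarded block again is a no-op
if (!dir.exists(outdirectory)) {
  dir.create(outdirectory, recursive = TRUE)
}
```

Doing this once before the lapply call, rather than inside the function, would also avoid repeating the check for every file.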

Finally we use file.copy to copy the renamed file into outdirectory. We can use file.path to concatenate the directory and filename.

```
# rename all the files using their PiD_treatmentsum
# and copy them to outdirectory
filenames <- list.files(".", pattern="*M\\.txt")
outdirectory <- "~/out"
lapply(filenames, function(x) {
  df <- read.csv(x, sep="\t", header=TRUE, fill = T, stringsAsFactors = F)
  a <- as.character(unique(df[["PiD_treatmentsum"]]))
  b <- paste0("ctrl_", a, '.txt', sep="")
  newname <- file.rename(basename(x), b)
  dir.create(outdirectory)
  file.copy(b, file.path(outdirectory, b))
})
```

I'd suggest updating the variable names to something more meaningful to make future refactoring easier ;)


7 Comments

Thank you, it's running now. I selected 600 files and let it run; I will see the results tomorrow morning :)
Best of luck!! ;)
Excellent! Glad to hear that!
Hi @dario, there is an error. The code runs okay in the beginning and files are created, but it reports an error after processing 33 files (out of 600): "Error in file.rename(basename(x), b) : 'from' and 'to' are of different lengths"
So this error occurred while processing filenames[34]? What is the value of that? And what are the values of df, as.character(unique(df[["PiD_treatmentsum"]])), and outdirectory?
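One plausible cause of that error, though not one confirmed in this thread, is a file whose PiD_treatmentsum column contains more than one distinct value, so that b has a length other than 1. The sketch below flags such files before any renaming happens. The helper distinct_ids and the two demo files are hypothetical and exist only for illustration:

```r
# Hypothetical helper: distinct values of one column in a tab-separated file
distinct_ids <- function(path, column = "PiD_treatmentsum") {
  df <- read.csv(path, sep = "\t", header = TRUE, fill = TRUE,
                 stringsAsFactors = FALSE)
  unique(as.character(df[[column]]))
}

# Demo data (made up): one consistent file and one with two distinct IDs
demo_dir <- file.path(tempdir(), "rename_check")
dir.create(demo_dir, showWarnings = FALSE)
good <- file.path(demo_dir, "good.txt")
bad  <- file.path(demo_dir, "bad.txt")
write.table(data.frame(PiD_treatmentsum = rep("Jack_T4_20200501_3S_2DG_3uM", 3)),
            good, sep = "\t", row.names = FALSE)
write.table(data.frame(PiD_treatmentsum = c("Mark_T5_20200501_3S_STS_1uM",
                                            "Mark_T6_20200502_3S_STS_1uM")),
            bad, sep = "\t", row.names = FALSE)

# Count distinct IDs per file and keep the files that do not have exactly one
files <- c(good, bad)
n_ids <- vapply(files, function(f) length(distinct_ids(f)), integer(1))
problem_files <- files[n_ids != 1]
basename(problem_files)
```

Running the same check over the real filenames vector would show whether filenames[34] is such a file; an empty or missing column would be flagged too, since it yields zero distinct values.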
