I'm using the AWS CLI to copy files from an S3 bucket to my R machine using a command like below:
system( "aws s3 cp s3://my_bucket_location/ ~/my_r_location/ --recursive --exclude '*' --include '*trans*' --region us-east-1" ) This works as expected, i.e. it copies all files in my_bucket_location that have "trans" in the filename at that location.
The problem that I am facing is that I have other files with similar naming conventions that I don't want to import in this step. As an example, in the list below I only want to copy the first two files, not the last two:
File list trans_120215.csv trans_130215.csv sum_trans_120215.csv sum_trans_130215.csv If I was using regex I could make it more specific like "^trans_\\d+" to bring in just the first two files, but this doesn't seem possible using AWS CLI. So my question is there a way to have more complex pattern matching using AWS CLI like below?
system( "aws s3 cp s3://my_bucket_location/ ~/my_r_location/ --recursive --exclude '*' --include '^trans_\\d+' --region us-east-1" ) Please note that I can only use information about the file in question, i.e. that I want to import a file with pattern "^trans_\\d+", I can't use the fact that the other unwanted files contain sum_ at the start, because this is only an example there could be other files with similar names like "check_trans_120215.csv".
I have considered other alternatives like below, but hoping there is a way to adjust the copy command to avoid going down either of these routes:
- Listing all items in the bucket > using regex in R to specify the files that I want > Only importing those files
- Keeping the copy command as it is > delete unwanted files on the R machine after the copy