
I have some data files and I wish to rename them for my pipeline.

The files look like this:

{unique_ids}_{experiment_condition}_L{3_digit_number}.txt 

I need to rename them so that the experiment condition flag appears at the end of the filename, just before the extension, as follows:

{unique_ids}_L{3_digit_number}_{experiment_condition}.txt 

The lengths of unique_ids and experiment_condition are not fixed.

Example:

ghad312fd2_Mb_L002.txt becomes ghad312fd2_L002_Mb.txt.

Thank you!

  • Do unique_ids or experiment_condition contain underscores? Commented Feb 9, 2022 at 14:21
  • What operating system are you using? Do you have the perl rename command? What is the output of file $(readlink -f $(which rename))? Commented Feb 9, 2022 at 15:23
  • Also, you say you have {unique_ids}_L{3_digit_number}_{experiment_condition}.txt, but your example file name (ghad312fd2_Mb_L002.txt) is {unique_ids}_{experiment_condition}_L{3_digit_number}.txt. Can you give us a clearer example? Commented Feb 9, 2022 at 15:24

8 Answers

Using the Perl-based rename utility to rename all the files in the current directory matching the pattern ./*_*_*.txt (i.e. any file whose name contains at least two underscores and ends with .txt):

rename -n 's/([^_]+)_([^_]+)\.txt$/$2_$1.txt/' ./*_*_*.txt 

This swaps the last two underscore-delimited parts of the filename, excluding the filename suffix .txt. Remove -n to run this for real after ensuring that it seems to be doing the correct thing.
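
To preview what that substitution does to a single name without renaming anything, the same regular expression can be tried with sed -E. This is only a sketch: preview_swap is a made-up helper name, and sed spells the back-references \1 and \2 where Perl uses $1 and $2:

```shell
#!/usr/bin/env bash
# preview_swap (hypothetical helper): print the name the rename expression
# would produce, swapping the last two underscore-delimited fields before .txt
preview_swap() {
    printf '%s\n' "$1" | sed -E 's/([^_]+)_([^_]+)\.txt$/\2_\1.txt/'
}

preview_swap ghad312fd2_Mb_L002.txt    # prints ghad312fd2_L002_Mb.txt
```

Since only the last two fields are swapped, underscores inside unique_ids are harmless here, but an underscore inside experiment_condition would split the condition in two.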

  • @Cbhihe You will have to try installing both. On my (OpenBSD) system, the utility is actually called prename. I would be somewhat surprised if a name collision between the two utilities wasn't appropriately handled somehow. Commented Feb 9, 2022 at 19:32
  • I did think you were on BSD :-). I just found out that on Archlinux, installing perl-rename does preserve the default utility GNU rename. Tx. Commented Feb 9, 2022 at 21:06
  • @Cbhihe, AFAIK, there's no GNU rename. There's (a quite dumb) one in util-linux though, which might be the one installed on your system. Commented Feb 11, 2022 at 15:46
  • @StéphaneChazelas: You are (depressingly) correct on both counts: there is no specific GNU-flavored rename and the util-linux version installed by default is really "pared down" to bare bones compared to perl-rename. Instead the nifty regex based in-place substitution capability of perl-rename blew me away. perl rules ! (at least sometimes). Commented Feb 11, 2022 at 16:22
  • @Cbhihe, yes even the original rename from 33 years ago written as 10 lines of perl code was infinitely better than util-linux's (itself added around 2000 in 2.10e, annoying many Linux users at the time when some distributions started including it) Commented Feb 11, 2022 at 17:07

With the zsh shell:

autoload zmv
zmv -n '(*)(_*)(_L[0-9](#c3))(.txt)' '$1$3$2$4'

(remove -n (dry-run) if happy).

[0-9](#c3) matches a sequence of 3 ASCII decimal digits. You can also use <0-999> to match on numbers from 0 to 999 (bearing in mind it would also match on 0000123) or <-> for any number (any sequence of one or more ASCII decimal digits).
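
For comparison, a POSIX shell has no counterpart to (#c3), so a sketch of the same three-digit test in sh simply repeats the bracket expression three times. matches_pattern is a hypothetical helper, and it only checks a name; it does not rename anything:

```shell
#!/bin/sh
# matches_pattern (hypothetical helper): succeed only for names ending in
# _L followed by exactly three digits and .txt, with at least one more
# underscore earlier in the name
matches_pattern() {
    case $1 in
        *_*_L[0-9][0-9][0-9].txt) return 0 ;;
        *) return 1 ;;
    esac
}

for name in ghad312fd2_Mb_L002.txt ghad312fd2_Mb_L02.txt; do
    if matches_pattern "$name"; then
        echo "match:    $name"
    else
        echo "no match: $name"
    fi
done
```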

Try also

for FN in gh*; do IFS="_." read ID XC NR EXT <<< "$FN"; echo mv -- "$FN" "${ID}_${NR}_${XC}.${EXT}"; done 

It reads four variables from each file name via the "here string" and reconstructs the new file name from them. Remove the echo if you are happy with what you see.
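
This sketch runs only the read step on the example name and prints each field; it uses lower-case variable names and read -r, which the one-liner above does not:

```shell
#!/usr/bin/env bash
name=ghad312fd2_Mb_L002.txt
# IFS="_." makes read split on both '_' and '.', one field per variable
IFS="_." read -r id cond num ext <<< "$name"
printf 'id=%s cond=%s num=%s ext=%s\n' "$id" "$cond" "$num" "$ext"
echo "${id}_${num}_${cond}.${ext}"   # ghad312fd2_L002_Mb.txt
```

If a name has more than four fields, the surplus (separators included) ends up in the last variable, so this approach assumes neither unique_ids nor experiment_condition contains _ or a dot.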

  • Don't use all upper case for non-exported shell variable names to avoid clashing with existing variables and obfuscating your code, see correct-bash-and-shell-script-variable-capitalization Commented Feb 9, 2022 at 18:10
  • @EdMorton The argument for case on shell variables seems to be mixed. The accepted answer in your link says to use all lower-case but the comments, including some from long-time Bell Labs employees, have reasons for not sticking to lower case. Readability and distinction from commands are the major arguments. Commented Feb 9, 2022 at 19:40
  • @doneal24 there's one person in the comments arguing for all upper case names and claiming to be an ex Bell Labs employee. As someone who worked at AT&T/Bell Labs/Lucent/etc. for 30 years myself I promise you that being a long term Bell Labs employee doesn't offer any particular authority on shell conventions, it just means you're old. You don't see people using all upper case variable names for readability and distinction from library functions, etc. in C, Java, Go, or any other non-ancient programming language so the argument you should do so in shell just doesn't hold water. Commented Feb 9, 2022 at 20:48
  • I also started with Fortran around 1978, manually writing programs on graph paper that were snail-mailed to the local college where a secretary typed them onto punch cards for a tech to run through the mainframe to snail-mail the output back a week later telling me that I forgot a semi-colon. Saying "why do it if you know..." is very much like saying if you know you're not going to crash why wear a seat belt? Just like quoting variables, you don't avoid all upper-case variables to protect against what you know about, you do it to protect against surprises. Commented Feb 9, 2022 at 22:10
  • @doneal24 for some examples, if you look at the well-respected, frequently referenced, and constantly reviewed bash FAQ, mywiki.wooledge.org/BashFAQ, I would be shocked if you see any non-exported variables that are all upper case unless they're in a "what not to do" section. I couldn't find any by poking around just now. Commented Feb 9, 2022 at 22:38

I propose this:

for i in *.txt; do
    n="${i%%.*}"
    id="$(echo "$n" | cut -d_ -f1)"
    e="$(echo "$n" | cut -d_ -f2)"
    d="$(echo "$n" | cut -d_ -f3)"
    echo mv -- "$i" "${id}_${d}_${e}.txt"
done

If you are happy with the output, remove the echo, leaving mv -- "$i" "${id}_${d}_${e}.txt", which will actually move the file.

  • I think the last line should be removed. Commented Feb 10, 2022 at 7:06
  • Does this spawn six child processes for each file to rename? Due to $( | ) expansion. Commented Feb 10, 2022 at 12:56
  • @rexkogitans right, removed, thanks. Commented Feb 10, 2022 at 13:46

Assuming unique_ids contains no underscore, put this in a script and run it with GNU sed (or any other sed that supports -E), passing your file names as arguments:

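#!/bin/bash
for f in "$@"; do
    new_name=$(printf '%s\n' "$f" | sed -E 's/([^_]+)_(.+)_(L[0-9]{3})\.txt$/\1_\3_\2.txt/')
    # sed leaves non-matching names unchanged; skip those
    [ "$new_name" = "$f" ] && continue
    echo "$f -> $new_name"
    mv -- "$f" "$new_name"
done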
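
Because (.+) is greedy and may itself contain underscores, this substitution, unlike the read, cut and awk answers that split on every _, copes with an experiment condition that contains _. A quick check of essentially the same sed expression on two names (new_name is a hypothetical helper, and the second file name is made up for illustration):

```shell
#!/bin/sh
# new_name (hypothetical helper): print the reordered name for one file name
new_name() {
    printf '%s\n' "$1" | sed -E 's/([^_]+)_(.+)_(L[0-9]{3})\.txt$/\1_\3_\2.txt/'
}

new_name ghad312fd2_Mb_L002.txt        # ghad312fd2_L002_Mb.txt
new_name ghad312fd2_Mb_dark_L002.txt   # ghad312fd2_L002_Mb_dark.txt
```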

With sh:

for f in *.txt
do
    # Get the extension
    ext=".${f##*[.]}"
    # Get the 3-digit number part
    ext_trail="${f%[.]*}"
    digit_number="L${ext_trail##*_L}"
    # tmp holds everything before the number: unique id and condition
    tmp="${ext_trail%_*}"
    # Get the experiment condition (everything after the first _)
    experimental_condition="${tmp#*_}"
    # Get the unique id (everything before the first _)
    unique_id="${tmp%%_*}"
    echo mv -- "$f" "${unique_id}_${digit_number}_${experimental_condition}${ext}"
done

With bash:

for f in *.txt
do
    [[ "$f" =~ ^([^_]*)_([^_]*)(_L[0-9]{3})[.]txt$ ]] &&
        echo mv -- "$f" "${BASH_REMATCH[1]}${BASH_REMATCH[3]}_${BASH_REMATCH[2]}.txt"
done
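
To see why the replacement puts no _ between ${BASH_REMATCH[1]} and ${BASH_REMATCH[3]}, it helps to print the capture groups for the example name; the third group already includes the leading _:

```shell
#!/usr/bin/env bash
f=ghad312fd2_Mb_L002.txt
if [[ "$f" =~ ^([^_]*)_([^_]*)(_L[0-9]{3})[.]txt$ ]]; then
    # Index 0 is the whole match; 1-3 are the parenthesised groups
    for i in 0 1 2 3; do
        printf 'BASH_REMATCH[%d]=%s\n' "$i" "${BASH_REMATCH[$i]}"
    done
fi
```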

Try mmv:

mmv '*_*_*.txt' '#1_#3_#2.txt' 

It obviously only works if there are no other underscores present in the file names.

If the format is as rigid as that, you could build on this:

echo ghad312fd2_Mb_L002.txt | awk -F'[_.]' -v OFS=_ '{ print $1, $3, $2 "." $4 }' 

output: ghad312fd2_L002_Mb.txt

Could look like this in a script:

#!/bin/bash
for f in *.txt; do
    mv -v -- "$f" "$(awk -F'[_.]' -v OFS=_ '{ print $1, $3, $2 "." $4 }' <<<"$f")"
done
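
One caveat with that loop is that every *.txt file is renamed, even one that does not follow the pattern. A guarded variant is sketched below: awk_new_name is a hypothetical helper that prints nothing unless the name splits into exactly four fields, and the loop keeps an echo so nothing is renamed until you remove it:

```shell
#!/bin/sh
# awk_new_name (hypothetical helper): print the reordered name, or nothing
# when the name does not split into exactly four fields on '_' and '.'
awk_new_name() {
    printf '%s\n' "$1" | awk -F'[_.]' -v OFS=_ 'NF == 4 { print $1, $3, $2 "." $4 }'
}

for f in *.txt; do
    new=$(awk_new_name "$f")
    # Skip files whose names don't fit the expected layout
    [ -n "$new" ] || continue
    echo mv -v -- "$f" "$new"
done
```

The check is deliberately loose: a name such as a_b_c.txt also has four fields and would still be reordered.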