In a shell script, I need to convert filenames to uppercase, but only if the converted filename does not already exist. In this particular case I need to change only the basename to uppercase, leaving the extension (if any) as it is.

My idea is to first extract the basename and the extension separately, convert the basename to uppercase using the tr command, and then check whether the changed basename together with the extension already exists in the directory.

If it does not exist, I will rename the original file to the uppercase basename using mv. I think this can be done in two ways: first using expr, and second using cut with . (a period) as the delimiter.

To extract the basename with expr (e.g. from the filename python1.py or phonelist), I have written this:

basefile=`expr "$filename" : '\(.*\)\.*.*' ` 

I used \.* so that filenames without any extension are also handled, because \.* would ignore zero or more occurrences of ., but this expression is not working properly with expr. For any filename, it returns the whole filename unchanged.

Can anyone please explain where I am going wrong? Also, please suggest how I can use expr to extract only the extension from the filename.

  • Have you tried using the basename command to extract it? Commented Apr 15, 2020 at 10:16
  • As a side note, avoid the old-style backtick notation for command substitution; it is discouraged in favor of the $( ... ) notation, i.e. basefile="$(expr "$filename" : '\(.*\)\.*.*')" in your case. Commented Apr 15, 2020 at 10:47
  • @Dagelf Yes, but to use basename we must already know the extension. In this case the files may have any extension. Commented Apr 15, 2020 at 12:05
  • @roaima Bash shell Commented Apr 15, 2020 at 12:06
  • @Admin OK, thank you. The book I am following uses the backtick notation; that is why I used it. Commented Apr 15, 2020 at 12:07

3 Answers

If the shell is bash, using just bash parameter expansion:

file="aaa.bbb.dat"
name=${file%.*}          # delete everything after last dot
ext=${file##*.}          # delete everything up to last dot
upcase=${name^^*}.$ext   # uppercase everything
echo "$upcase"

AAA.BBB.dat

Trying with a more difficult case:

file="déjà vu . dat "
name=${file%.*}          # delete everything after last dot
ext=${file##*.}          # delete everything up to last dot
upcase=${name^^*}.$ext   # uppercase everything
echo ":$upcase:"

Gives:

:DÉJÀ VU . dat : 

So:

  • double quotes aren't necessary in the assignments, only when the result is used
  • Uppercase seems OK even for non-ASCII characters
  • Just one question: @Gilles already proposed that in this answer but cautioned that non-ASCII characters may not be correctly transformed; can you comment on whether the bash internals do that correctly? Commented Apr 15, 2020 at 12:12
  • @roaima Doesn't seem necessary, counter-example gladly accepted Commented Apr 15, 2020 at 12:23
  • @AdminBee Seems to work OK, see edited answer. Commented Apr 15, 2020 at 12:24

When there's an ambiguity in how far a group extends, regex engines favor the longest match, starting with the leftmost group. For any file name, \(.*\) therefore matches the whole name and \.*.* matches the empty string.

You'll need two cases: with or without extension. Note also that if a file name starts with a ., that's not the start of an extension.
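If expr is still wanted, the two cases might be sketched as below. This is only an illustration under some assumptions: split_name is a hypothetical helper name, file names contain no newlines, and the leading x is a guard so that names that start with - or look like expr operators are not misparsed. Requiring a literal \. followed by [^.]*$ forces the greedy \(.*\) to stop at the last dot.

```shell
# split_name is a hypothetical helper; it sets the variables base and ext.
# Assumes the file name has no directory part and no newlines.
split_name() {
    case $1 in
        ?*.*)
            # There is an extension: capture everything before the last dot.
            # expr exits with status 1 when the result is empty or 0,
            # so "|| true" keeps that from aborting scripts run with set -e.
            base=$(expr "x$1" : 'x\(.*\)\.[^.]*$') || true
            # Capture only what follows the last dot.
            ext=$(expr "x$1" : 'x.*\.\([^.]*\)$') || true
            ;;
        *)
            # No extension (this also covers names such as .bashrc).
            base=$1
            ext=
            ;;
    esac
}

split_name python1.py
printf '%s|%s\n' "$base" "$ext"    # python1|py
split_name phonelist
printf '%s|%s\n' "$base" "$ext"    # phonelist|
```

The case statement decides whether an extension exists at all, so the exit status of expr never has to be used for that decision.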

I don't understand why you want to use expr. Shell parameter manipulation is easier.
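As a quick illustration of that parameter manipulation (the sample file name is made up): % and %% remove a matching suffix, # and ## remove a matching prefix, and the doubled form removes the longest match instead of the shortest.

```shell
filename="archive.tar.gz"   # illustrative sample name

echo "${filename%.*}"    # archive.tar  (shortest suffix matching .* removed)
echo "${filename%%.*}"   # archive      (longest suffix matching .* removed)
echo "${filename#*.}"    # tar.gz       (shortest prefix matching *. removed)
echo "${filename##*.}"   # gz           (longest prefix matching *. removed)
```

These four forms are part of POSIX sh, so they work outside bash as well.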

On converting to uppercase, note that GNU tr, the implementation found on most Linux systems, does not handle multibyte characters. It only does byte manipulation. For example, in a UTF-8 locale echo accentué | tr a-z A-Z results in ACCENTUé, not ACCENTUÉ. Use a locale-aware tool such as awk instead. In bash, you can use ${filename^^?}, but that's not available in sh. Make sure that your script is running in the correct locale for the file names' encoding.

I assume that the filename doesn't contain a directory part. If it does, separate it first.

case $filename in
  ?*.*) # There is an extension
    base="${filename%.*}"; ext=".${filename##*.}";;
  *) # No extension
    base="$filename"; ext="";;
esac
# Append a "." so that command substitution cannot strip a trailing newline
upcased_base="$(printf %s. "$base" | awk '$0 = toupper($0)')"
upcased="${upcased_base%.}$ext"

The trailing . in %s. that then gets stripped from $upcased_base ensures that the script correctly handles file names with a newline immediately before the extension. Without this, the command substitution would strip off trailing newlines. You don't need this if you've already ensured that your file names don't contain newline characters.
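Putting the pieces together with the existence check from the question, the whole rename step could look roughly like this. It is a sketch, not a definitive implementation: upcase_name and rename_upper are hypothetical helper names, file names are assumed to contain no newlines and no directory part, and the file system is assumed to be case-sensitive.

```shell
# upcase_name (hypothetical helper): print the name with only its
# basename part converted to uppercase; the extension is left alone.
upcase_name() {
    case $1 in
        ?*.*) un_base=${1%.*}; un_ext=.${1##*.} ;;   # there is an extension
        *)    un_base=$1;      un_ext= ;;            # no extension
    esac
    # awk's toupper() is used here rather than tr, as discussed above.
    un_upper=$(printf '%s\n' "$un_base" | awk '{ print toupper($0) }')
    printf '%s\n' "$un_upper$un_ext"
}

# rename_upper (hypothetical helper): rename a file only when the
# uppercase name is not already taken. Returns 1 when it skips a file.
rename_upper() {
    ru_target=$(upcase_name "$1")
    if [ "$1" = "$ru_target" ]; then
        return 0                       # already uppercase, nothing to do
    fi
    if [ -e "$ru_target" ]; then
        printf 'skipping %s: %s already exists\n' "$1" "$ru_target" >&2
        return 1
    fi
    mv -- "$1" "$ru_target"
}
```

A loop such as for f in *; do rename_upper "$f"; done would then apply it to every entry in the current directory.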

  • Why does \.*.* match the empty string? Can you please explain? Commented Apr 15, 2020 at 12:11
  • This is because * means "zero or more" of the preceding character/sub-expression in the context of regular expressions. Commented Apr 15, 2020 at 12:13
  • Actually I am just a beginner in shell scripting and Linux commands. So far I know very few commands, and one of those is expr. That's why I tried to solve this problem using expr. Commented Apr 15, 2020 at 12:17
  • Can you please explain what # and % mean in your code? Commented Apr 15, 2020 at 12:20
  • If you are new to bash scripting, I would recommend the Bash Guide for a good overview. In addition, using shellcheck - also available as standalone tool in many Linux distributions - can help debug shell scripts and prevent unexpected behavior. Commented Apr 15, 2020 at 12:26

Here is an entirely awk-based solution, where you would put the following line in your shell script:

uppercasename="$(echo "$filename" | awk 'BEGIN{FS=OFS="."} NF==1{$1=toupper($1)} {for (i=1;i<NF;i++) $i=toupper($i)} 1')" 

This will use the . as field separator for input and output and, if only one field is found, convert that to uppercase; in all other cases it converts all but the last field to uppercase. It then prints the result (this is the meaning of the 1, which is a shorthand notation for {print}).

If you are using bash, you could get rid of the pipe and state it as

uppercasename="$(awk 'BEGIN{FS=OFS="."} NF==1{$1=toupper($1)} {for (i=1;i<NF;i++) $i=toupper($i)} 1' <<< "$filename")" 

using a here-string.

Note that this is designed so that in the borderline case of a filename ending in a ., as in myfile.this.txt., it will treat that like an "empty but present suffix" and convert it to MYFILE.THIS.TXT.. Also, if the filename starts with a . and has no other extension (as in .myfile), it will keep that as lowercase.
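To see those cases side by side, the awk program can be wrapped in a small function and run over a few sample names. The wrapper name upcase_fields is hypothetical, and printf is swapped in for echo only so that unusual names are passed through unchanged.

```shell
# upcase_fields is a hypothetical wrapper around the awk program above.
upcase_fields() {
    printf '%s\n' "$1" |
        awk 'BEGIN{FS=OFS="."} NF==1{$1=toupper($1)} {for (i=1;i<NF;i++) $i=toupper($i)} 1'
}

for name in python1.py phonelist myfile.this.txt. .myfile; do
    printf '%s -> %s\n' "$name" "$(upcase_fields "$name")"
done
# python1.py -> PYTHON1.py
# phonelist -> PHONELIST
# myfile.this.txt. -> MYFILE.THIS.TXT.
# .myfile -> .myfile
```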
