3

I'm trying to batch rename files in a folder with PHP. It's mostly working, though I'm having problems with accented characters.

An example of a filename with accented characters is ÅRE_GRÖN.JPG.

I would like to rename that file to ARE_GRON.JPG.

If I read the files in like this:

    <?php
    $path = __DIR__;
    $dir_handle = opendir($path);
    while ($file = readdir($dir_handle)) {
        echo $file . "\n";
    }
    closedir($dir_handle);

...And the page displays ÅRE_GRÖN.JPG.

If I add header('Content-Type: text/html; charset=UTF-8'); to the beginning of my script, it displays the correct file name, but the rename() function seems to have no effect either way.

Here's what I've tried:

    while ($file = readdir($dir_handle)) {
        rename($file, str_replace('Ö', 'O', $file)); # No effect
        rename($file, str_replace('Ö', 'O', $file)); # No effect
    }

Where am I going wrong?


Do say if you believe I'm using the wrong tool for the job. If anyone knows how to achieve this with a Bash script, show me. I have no Bash chops.

10
  • Are you on Windows or Linux? Commented Feb 12, 2013 at 21:54
  • Is your PHP script encoded as UTF-8? Commented Feb 12, 2013 at 21:56
  • Since he said Bash, I would guess he is referring to bash(1), which would suggest Linux. Commented Feb 12, 2013 at 21:56
  • This is what I could find: bugs.php.net/bug.php?id=39660 However I believe there should already be a work-around for this, like using the encoding system PHP is okay with. I'll post an answer if I ever find anything. Also possible duplicate to: stackoverflow.com/questions/873853/… Commented Feb 12, 2013 at 21:57
  • It could easily be Bash on cygwin, natively on Windows or on FreeBSD. Commented Feb 12, 2013 at 21:57

2 Answers

2

I figured out how to do it.

I first ran urlencode() on the filename. This converts the string:

MÖRKGRÅ.JPG 

To the URL friendly:

MO%CC%88RKGRA%CC%8A.JPG 

I then ran str_replace() on the URL-encoded string, passing the search strings and their replacements as arrays. I only needed it for a few Swedish characters, so my solution looked like this:

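    <?php
    header('Content-Type: text/html; charset=UTF-8');

    $path = __DIR__;
    $dir_handle = opendir($path);

    // URL-encoded forms of the decomposed characters (base letter + combining mark)
    $search  = array('A%CC%8A', 'A%CC%88', 'O%CC%88');
    $replace = array('A', 'A', 'O');

    // Compare against false so that a file named "0" does not end the loop early
    while (($file = readdir($dir_handle)) !== false) {
        rename($file, str_replace($search, $replace, urlencode($file)));
    }
    closedir($dir_handle);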

Job done :)


I've come to realise this is more versatile than I anticipated. In another script, urlencode() gave me slightly different output, but it's easy to adjust the search and replace arrays accordingly:

    $search  = array('%26Aring%3B', '%26Auml%3B', '%26Ouml%3B', '+');
    $replace = array('A', 'A', 'O', '_');
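Since the question also asks about a shell approach, here is a rough sketch of the same idea without the URL-encoding step. It assumes the filenames are stored in decomposed form (base letter followed by a combining mark), as the %CC%88 and %CC%8A sequences in the urlencode() output above suggest, and it only handles those two marks. The function names strip_marks and preview_renames are made up for illustration, and the loop only prints the mv commands instead of running them.

```shell
# Remove the two combining marks seen in the urlencode() output above.
strip_marks() {
    diaeresis=$(printf '\314\210')   # U+0308 COMBINING DIAERESIS, bytes CC 88
    ring=$(printf '\314\212')        # U+030A COMBINING RING ABOVE, bytes CC 8A
    # LC_ALL=C makes sed treat the name as plain bytes.
    printf '%s\n' "$1" | LC_ALL=C sed "s/$diaeresis//g; s/$ring//g"
}

# Dry run: print the mv command for each entry in directory $1 that would change.
preview_renames() {
    for f in "$1"/*; do
        base=${f##*/}
        new=$(strip_marks "$base")
        [ "$base" = "$new" ] && continue   # nothing to strip
        [ -e "$1/$new" ] && continue       # never overwrite an existing file
        printf 'mv -- "%s" "%s"\n' "$f" "$1/$new"
    done
}

strip_marks "$(printf 'MO\314\210RKGRA\314\212.JPG')"   # prints MORKGRA.JPG
```

Running `preview_renames .` lists the renames for the current directory. Once the list looks right, the printf line can be swapped for a real mv call.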

0

If you have a limited number of characters you want to replace, you can do it with

    for f in *; do mv "$f" "${f//Ö/O}" 2> /dev/null; done

On GNU you could more generally use

    expr=""
    for char in {A..Z}; do
        expr+="s/[[=$char=]]/$char/g; "
    done
    for f in *; do
        mv "$f" "$(sed -e "$expr" <<< "$f")" 2> /dev/null
    done

to replace all accented variants of each letter with the plain ASCII letter, for every letter in the alphabet, but with no guarantees for OS X sed. Beware that this has the side effect of capitalizing all filenames.

2 Comments

Hmm.. I tried running that first script from the directory that holds the files, but it didn't seem to have any effect.
Try copy-pasting the Ö character from the filename rather than typing it. Unicode has a lot of pretty identical Ö characters.
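
To make that point concrete, here is a small sketch that builds two byte sequences which both render as Ö. The variable names are only illustrative. The urlencode() output elsewhere on this page (O%CC%88) suggests the filenames in question use the second, decomposed form, which would explain why searching for a typed Ö matches nothing.

```shell
# Two strings that look the same on screen but are different bytes.
precomposed=$(printf '\303\226')   # U+00D6, UTF-8 bytes C3 96
decomposed=$(printf 'O\314\210')   # "O" + U+0308 combining diaeresis, bytes 4F CC 88

if [ "$precomposed" = "$decomposed" ]; then
    echo "same bytes"
else
    echo "different bytes"         # this branch runs
fi

# Dump each form as hex to see the difference.
printf '%s' "$precomposed" | od -An -tx1
printf '%s' "$decomposed"  | od -An -tx1
```

Piping a real filename through the same od command shows which form the filesystem is actually returning.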
