3

I have two million text files on a server that is accessible to internet users. I was asked to make a change (a string replace operation) to these files as soon as possible. I was thinking about doing a str_replace on every text file on the server. However, I don't want to tie up the server and make it unreachable for internet users.

Do you think the following is a good idea?

<?php
ini_set('max_execution_time', 1000);
$path = realpath('/dir/');
$objects = new RecursiveIteratorIterator(
    new RecursiveDirectoryIterator($path),
    RecursiveIteratorIterator::SELF_FIRST
);
foreach ($objects as $name => $object) {
    set_time_limit(100);
    // do str_replace stuff on the file
}
5
  • 8
    "two million text files in a server" - wut. Commented Apr 22, 2015 at 19:12
  • 2
    figure out how to do 2 and the rest will follow, just like sheep Commented Apr 22, 2015 at 19:12
  • Why would this make the server unreachable? It should be able to run multiple requests at the same time. Commented Apr 22, 2015 at 19:13
  • 7
    This does not sound like a job for PHP, but rather find and sed. superuser.com/search?q=replace+multiple+files+sed Something along these lines: superuser.com/questions/146389/… Commented Apr 22, 2015 at 19:14
  • @user2070775 I guess you don't want your server to hang after a couple of minutes, right? Read my answer, specifically the xargs explanation. Commented Apr 22, 2015 at 19:53

3 Answers

4

Use find, xargs and sed from the shell, e.g.:

cd /dir
find . -type f -print0 | xargs -0 sed -i 's/OLD/NEW/g'

This searches all files recursively (hidden ones included) under the current directory and replaces OLD with NEW using sed.
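Before running this on two million files, it is probably worth trying the pipeline on a throwaway copy first. A minimal sketch, assuming GNU sed (BSD/macOS sed needs `-i ''` instead of `-i`); the scratch directory and the file names are made up for illustration:

```shell
# Scratch directory standing in for /dir (hypothetical layout).
workdir=$(mktemp -d)
mkdir -p "$workdir/sub"
printf 'OLD value\n' > "$workdir/a.txt"
printf 'keep OLD and OLD\n' > "$workdir/sub/.hidden.txt"

# Same pipeline as above, run from inside the scratch directory.
(cd "$workdir" && find . -type f -print0 | xargs -0 sed -i 's/OLD/NEW/g')

# Show the result, including the hidden file in the subdirectory.
cat "$workdir/a.txt" "$workdir/sub/.hidden.txt"
```

Both files should now contain NEW in place of every OLD, which confirms that hidden files and subdirectories are covered.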


Why -print0?

From man find:

If you are piping the output of find into another program and there is the faintest possibility that the files which you are searching for might contain a newline, then you should seriously consider using the '-print0' option instead of '-print'.
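To see what that warning means in practice, here is a small sketch that counts how many arguments xargs hands to a command for a single file whose name contains a newline. The file name is invented for the demo, and it assumes the temp directory path itself contains no whitespace:

```shell
workdir=$(mktemp -d)
# One file whose name contains a space and a newline (legal on most Unix filesystems).
touch "$workdir/part one
part two.txt"

# Newline-delimited output: xargs splits the single path on whitespace.
plain_count=$(find "$workdir" -type f -print | xargs sh -c 'echo "$#"' sh)

# NUL-delimited output: the path arrives as exactly one argument.
nul_count=$(find "$workdir" -type f -print0 | xargs -0 sh -c 'echo "$#"' sh)

echo "-print gave $plain_count arguments, -print0 gave $nul_count"
```

With -print, sed would be handed several path fragments that don't exist; with -print0 it gets the one real path.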


Why xargs?

From man find (describing -exec command ;):

The specified command is run once for each matched file.

That is, if there are 2000 files in /dir, then find ... -exec ... \; will result in 2000 invocations of sed, whereas find ... | xargs ... will only invoke sed once or twice.
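The difference in invocation counts can be checked without touching any real data. A rough sketch, using 200 empty files in a temp directory as a stand-in and a tiny `sh -c 'echo run'` command in place of sed, so that each invocation prints exactly one line:

```shell
workdir=$(mktemp -d)
# 200 empty files stand in for the 2000 in the example above.
i=1
while [ "$i" -le 200 ]; do
  : > "$workdir/file$i.txt"
  i=$((i + 1))
done

# -exec ... \; starts the command once per file.
per_file=$(find "$workdir" -type f -exec sh -c 'echo run' sh {} \; | wc -l | tr -d ' ')

# xargs packs many paths into each command line.
batched=$(find "$workdir" -type f -print0 | xargs -0 sh -c 'echo run' sh | wc -l | tr -d ' ')

# -exec ... {} + also batches paths, much like xargs.
plus=$(find "$workdir" -type f -exec sh -c 'echo run' sh {} + | wc -l | tr -d ' ')

echo "\\; runs: $per_file, xargs runs: $batched, + runs: $plus"
```

Note that the batching comes from how arguments are passed, not from xargs as such: the `{} +` form of -exec groups paths the same way, so only the `\;` form pays the per-file process cost.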


3

Don't do this with PHP; it's most likely to fail horribly and it'll take up all your system resources.

find . -type f -exec sed -i 's/search/replace/g' {} + 

The example above replaces the search string with the replace string. It works recursively on regular files, including hidden ones.
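One caveat: sed -i rewrites every file it is given, even when nothing matches, which means a lot of needless disk writes across two million files. A possible variation, not part of the answer above, is to let grep pick out only the files that contain the string. This sketch assumes GNU grep, xargs and sed (`-Z`, `-r` and `-i` respectively), and the file names are placeholders:

```shell
workdir=$(mktemp -d)
printf 'a search term\n' > "$workdir/hit.txt"
printf 'nothing to change\n' > "$workdir/miss.txt"
inode_before=$(ls -i "$workdir/miss.txt" | awk '{print $1}')

# grep -l lists only files that contain the string; -Z ends each name with NUL.
# xargs -r skips running sed entirely when grep finds nothing.
find "$workdir" -type f -exec grep -lZ 'search' {} + |
  xargs -0 -r sed -i 's/search/replace/g'

inode_after=$(ls -i "$workdir/miss.txt" | awk '{print $1}')
cat "$workdir/hit.txt"
```

The unchanged inode of miss.txt shows that sed never rewrote it, while hit.txt has been updated.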

1 Comment

Your answer will make the server hang after a while, you should use xargs, read my explanation.
0

You could also do this with a Python program limited to one core (which is the default). If your machine has multiple cores, and at least one is generally free, you should be set.
