
I am new to Linux. I have a directory with approximately 250,000 files, and I need to count the number of files matching a pattern.

I tried using the following command:

ls -1 20061101-20131101_kh5x7tte9n_2010_* | wc -l 

I got the following error message:

-bash: /bin/ls: Argument list too long 

Please help. Thanks in advance


8 Answers


It might be better to use find for this:

find . -name "pattern_*" -printf '.' | wc -m 

In your specific case:

find . -maxdepth 1 -name "20061101-20131101_kh5x7tte9n_2010_*" -printf '.' | wc -m 

find will return the list of files matching the criteria. -maxdepth 1 restricts the search to the given directory, without descending into subdirectories (thanks Petesh!). -printf '.' prints a dot for every match, so that file names containing newlines cannot skew the count.

Then wc -m counts the characters, which equals the number of matching files.
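As a quick illustration of why counting dots is safer than counting lines, here is a hypothetical demo in a scratch directory (the file names are made up; -printf assumes GNU find):

```shell
# One ordinary file and one whose name contains an embedded newline.
mkdir -p /tmp/count_demo && cd /tmp/count_demo
touch "pattern_a"
touch "pattern_b
with_newline"

# Line counting sees three lines for two files:
find . -name "pattern_*" | wc -l             # prints 3

# Dot-per-match counting stays correct:
find . -name "pattern_*" -printf '.' | wc -m # prints 2

cd / && rm -rf /tmp/count_demo
```

The embedded newline produces an extra line in find's normal output, which is exactly what the dot trick avoids.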


Performance comparison of the two options:

Let's create 10,000 files matching the pattern:

$ for i in {1..10000}; do touch 20061101-20131101_kh5x7tte9n_201_$i; done 

And then compare the time it takes to get the result with ls -1 ... or find ...:

$ time find . -maxdepth 1 -name "20061101-20131101_kh5x7tte9n_201_*" | wc -l
10000

real    0m0.034s
user    0m0.017s
sys     0m0.021s

$ time ls -1 | grep 20061101-20131101_kh5x7tte9n_201 | wc -l
10000

real    0m0.254s
user    0m0.245s
sys     0m0.020s

find is 5 times faster! But if we use ls -1f (thanks again, Petesh!), then ls is even faster than find:

$ time ls -1f | grep 20061101-20131101_kh5x7tte9n_201 | wc -l
10000

real    0m0.023s
user    0m0.020s
sys     0m0.012s

7 Comments

to prevent recursing into subdirectories, you could use -maxdepth 1 (if it's supported in that version of find)
ls has the bad habit of sorting before outputting, you should test with ls -1 -f to get a similar behaviour as find for performance evaluation
Pretty interesting, @Petesh, didn't know about it. I have tested the performance and to me with ls -1f it was even faster than find.
If you use the -printf '.' trick, you should count characters (wc -m) not lines. Alternatively, add a newline after the dot (-printf '.\n').
How about using --count (-c) for grep and skipping wc? I would expect performance gain. (And also a simpler expression.) Then again, for the same reasons, I would expect find with -name to be faster than ls|grep while apparently it is not...

You got "Argument list too long" because the shell expands your pattern into the full list of matching file names before running ls. Try:

find -maxdepth 1 -name '20061101-20131101_kh5x7tte9n_2010_*' | wc -l 

Please pay attention: the pattern is enclosed in quotes to prevent shell expansion.
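A small sketch of what the quoting changes (the directory and file names are hypothetical):

```shell
mkdir -p /tmp/glob_demo && cd /tmp/glob_demo
touch file_1 file_2 file_3

# Unquoted: the shell expands the glob first, so the command receives one
# argument per match -- with 250,000 matches this overflows ARG_MAX.
echo file_*                               # prints: file_1 file_2 file_3

# Quoted: find receives the literal pattern and does the matching itself,
# so the argument list stays tiny no matter how many files match.
find . -maxdepth 1 -name 'file_*' | wc -l # prints 3

cd / && rm -rf /tmp/glob_demo
```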


6

The MacOS / OS X command line solution

If you are attempting to do this from the command line on a Mac, you will soon find that find does not support the -printf option.

To accomplish the same result as the solution proposed by fedorqui-supports-monica try this:

find . -name "pattern_*" -exec stat -f "." {} \; | wc -l 

This will find all files matching the pattern you entered, print a . (followed by a newline) for each of them, and finally count the lines, which gives the number of matches.

Using find to count matching filenames in MacOS and OS X

To limit your search depth to the current directory, add -maxdepth 1 to the command like so:

find . -maxdepth 1 -name "196288.*" -exec stat -f "." {} \; | wc -l 


5

Just do:

find . -name "pattern_*" |wc -l 


2

Try this:

ls -1 | grep 20061101-20131101_kh5x7tte9n_2010_ | wc -l 
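As one of the comments above suggests, grep's -c flag counts matching lines directly, so the wc step can be dropped; a small demo (scratch directory and file count are invented):

```shell
mkdir -p /tmp/grep_demo && cd /tmp/grep_demo
touch 20061101-20131101_kh5x7tte9n_2010_{1..5}

# -c makes grep print the count of matching lines instead of the lines
# themselves, saving one process in the pipeline.
ls -1 | grep -c 20061101-20131101_kh5x7tte9n_2010_   # prints 5

cd / && rm -rf /tmp/grep_demo
```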


1

You should generally avoid ls in scripts. In fact, performing the calculation in a shell function avoids the "Argument list too long" error entirely, because there is no exec boundary, so the ARG_MAX limit doesn't come into play.

number_of_files () {
    if [ -e "$1" ]; then
        echo "$#"
    else
        echo 0
    fi
}

The conditional guards against the case where the glob is not expanded at all (which is the default behavior out of the box; in Bash, you can shopt -s nullglob to make wildcards that match no files expand to the empty string).

Try it:

number_of_files 20061101-20131101_kh5x7tte9n_2010_* 
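A hypothetical session showing how the guard behaves (file names are made up; the function is repeated so the snippet is self-contained):

```shell
number_of_files () {
    if [ -e "$1" ]; then
        echo "$#"
    else
        echo 0
    fi
}

mkdir -p /tmp/nf_demo && cd /tmp/nf_demo
touch match_1 match_2

number_of_files match_*    # glob expands to 2 names, prints 2
number_of_files no_such_*  # glob matches nothing, so $1 is the literal
                           # pattern; [ -e ] fails and it prints 0
cd / && rm -rf /tmp/nf_demo
```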


0

First of all, it is better not to use ls in scripts, as explained in this article!

This problem can be solved in many ways. Here are some of the most elegant ones that come to mind:

count=$(printf '%s\n' *pattern* | wc -l)
# or
count=$(shopt -s nullglob; files=(*pattern*); echo ${#files[@]})
# or
count=$(file *pattern* | wc -l)
# or
count=$(stat -c "%n" *pattern* | wc -l)
# or
count=$(du -a *pattern* | wc -l)
# or
count=$(echo *pattern* | wc -w)

But the last one gives the wrong number when file names contain whitespace.
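A minimal sketch of that failure mode (file names are invented):

```shell
mkdir -p /tmp/ws_demo && cd /tmp/ws_demo
touch "pattern_one" "pattern two"

# Word counting splits "pattern two" into two words:
echo *pattern* | wc -w           # prints 3, not 2

# Printing one name per line and counting lines stays correct
# (as long as no name contains a newline):
printf '%s\n' *pattern* | wc -l  # prints 2

cd / && rm -rf /tmp/ws_demo
```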

1 Comment

The array version, files=(*pattern*), will not work as expected if the pattern matches no file. In that case, the count will be 1, not 0, because the array will contain the pattern itself. shopt -s nullglob must be set for it to work.

-3
ls -1 | grep '20061101-20131101_kh5x7tte9n_2010_*' | wc -l 

The previous answer did not include quotes around the search string, nor the * wildcard.

2 Comments

This is basically a repeat of a previous answer plus it won't work.
This is confusing shell wildcards and regular expressions. grep supports the latter, and will find a match on any substring, so the trailing wildcard is unnecessary, and also doesn't mean what you think. I support the idea that you should generally use quoting around your regexes, but in this particular case, it's not necessary, and the incorrect regex ruins the answer. For the record, the wildcard * (which mustn't be quoted) corresponds to the regex .*
