44

I have a list of file names:

names = ['aet2000','ppt2000', 'aet2001', 'ppt2001'] 

While I have found some functions that can work to grep character strings, I haven't figured out how to grep all elements of a list.

For instance, I would like to call:

grep(names,'aet') 

and get:

['aet2000','aet2001'] 

I'm sure it's not too hard, but I am new to Python.


Update: The question above apparently wasn't precise enough. All the answers below work for the example but not for my actual data. Here is my code to make the list of file names:

years = range(2000,2011)
months = ["jan","feb","mar","apr","may","jun","jul","aug","sep","oct","nov","dec"]
variables = ["cwd","ppt","aet","pet","tmn","tmx"] # *variable name* with wildcards
tifnames = list(range(0,(len(years)*len(months)*len(variables)+1) ))
i = 0
for variable in variables:
    for year in years:
        for month in months:
            fullname = str(variable)+str(year)+str(month)+".tif"
            tifnames[i] = fullname
            i = i+1

Running filter(lambda x: 'aet' in x, tifnames) or the code from the other answers returns:

Traceback (most recent call last):
  File "<pyshell#89>", line 1, in <module>
    func(tifnames,'aet')
  File "<pyshell#88>", line 2, in func
    return [i for i in l if s in i]
TypeError: argument of type 'int' is not iterable

Despite the fact that tifnames is a list of character strings:

>>> type(tifnames[1])
<type 'str'>

Do you guys see what's going on here? Thanks again!

  • Look at tifnames[-1]; it's an integer. Instead of preallocating space, simply write tifnames = [] and then tifnames.append(fullname), or use a dictionary if the indexing matters. Commented Oct 11, 2012 at 17:56
  • The last item in your list tifnames is 792 (an integer); that is why you get the error when running our code. Commented Oct 11, 2012 at 17:57
  • Nice, thanks! Life savers, all of you. Commented Oct 11, 2012 at 17:58
  • +1 for list operations. @mmann1123, are you coming from a C language background? Commented Oct 11, 2012 at 18:09
  • The expression list(range(0,(len(years)*len(months)*len(variables)+1) )) is both wrong and unnecessary. It is wrong because of the +1 at the end: the list will be one element too long. It is unnecessary because you can simply use tifnames = [] and then append the values to it in your loop. Commented Oct 11, 2012 at 20:20
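To see what the comments describe, here is a minimal sketch that reproduces the off-by-one with hypothetical, shortened inputs (two variables, two years, two months, not the question's full data). Preallocating with range(... + 1) leaves one leftover integer at the end of the list, and the 'aet' in name test fails when it reaches that integer. Building the list directly with a comprehension avoids the problem.

```python
# Hypothetical, shortened inputs so the list stays small.
years = range(2000, 2002)
months = ["jan", "feb"]
variables = ["ppt", "aet"]

# Buggy approach from the question: preallocate one slot too many (+1).
n = len(years) * len(months) * len(variables)
tifnames = list(range(0, n + 1))
i = 0
for variable in variables:
    for year in years:
        for month in months:
            tifnames[i] = variable + str(year) + month + ".tif"
            i += 1

print(tifnames[-1])  # the unfilled last slot still holds the integer n (here 8)

try:
    [name for name in tifnames if 'aet' in name]
except TypeError as exc:
    # 'in' cannot be applied to the leftover int
    print("TypeError:", exc)

# Fix: build the list directly, so every element is a string.
tifnames = [variable + str(year) + month + ".tif"
            for variable in variables
            for year in years
            for month in months]
print([name for name in tifnames if 'aet' in name])
```

With the list built this way there is no extra slot to forget about, so the filtering answers below work unchanged.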

5 Answers

70

Use filter():

>>> names = ['aet2000','ppt2000', 'aet2001', 'ppt2001']
>>> filter(lambda x:'aet' in x, names)
['aet2000', 'aet2001']

With a regex:

>>> import re
>>> filter(lambda x: re.search(r'aet', x), names)
['aet2000', 'aet2001']

In Python 3, filter returns an iterator, so call list() on it to get a list:

>>> list(filter(lambda x:'aet' in x, names))
['aet2000', 'aet2001']
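One practical consequence of Python 3's filter being an iterator: it is lazy and can only be consumed once, so a second pass over the same filter object yields nothing. A small sketch using the names list from the question:

```python
names = ['aet2000', 'ppt2000', 'aet2001', 'ppt2001']

matches = filter(lambda x: 'aet' in x, names)  # lazy iterator in Python 3
first_pass = list(matches)   # consumes the iterator
second_pass = list(matches)  # already exhausted, so this is empty

print(first_pass)   # ['aet2000', 'aet2001']
print(second_pass)  # []
```

If you need the matches more than once, call list() once and reuse the resulting list.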

Otherwise, use a list comprehension (it works in both Python 2 and 3):

>>> [name for name in names if 'aet' in name]
['aet2000', 'aet2001']
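If you think in shell wildcards rather than substrings (the question's code carries a "*variable name* with wildcards" comment), the standard-library fnmatch.filter is another option. Note that this is glob-style matching, not grep-style: the pattern has to match the whole name, so 'aet' alone matches nothing and you need 'aet*' (or '*aet*'). A short sketch:

```python
import fnmatch

names = ['aet2000', 'ppt2000', 'aet2001', 'ppt2001']

# Shell-style wildcards: '*' matches any run of characters, '?' a single one.
print(fnmatch.filter(names, 'aet*'))   # ['aet2000', 'aet2001']
print(fnmatch.filter(names, '*2001'))  # ['aet2001', 'ppt2001']
print(fnmatch.filter(names, 'aet'))    # [] because the whole name must match
```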

6 Comments

Worth noting that there can be a space between the lambda argument declaration and its body (e.g. lambda x: 'aet' in x).
In Python 3, you might want to wrap the filter in a list like so: list(filter(lambda x:'aet' in x, names))
If you want a list, a list comprehension also works: [n for n in names if 'aet' in n]. A list comprehension is really what you're looking for; it is the Python equivalent of Perl's grep. The built-in filter() returns an iterator (in Python 3), which is useful if you'd like to process data on the fly.
Python 3 example?
@KhurshidAlam Updated.
12

Try this out. It may not be the "shortest" of all the code shown, but for someone trying to learn Python, I think it teaches more.

names = ['aet2000','ppt2000', 'aet2001', 'ppt2001']
found = []
for name in names:
    if 'aet' in name:
        found.append(name)
print(found)

Output

['aet2000', 'aet2001'] 

Edit: Changed to produce a list.

See also:

How to use Python to find out the words begin with vowels in a list?

2 Comments

This doesn't produce the output the OP asked for, namely a list.
Oh, thanks @gerrat. Didn't realize OP needed list output. Fixed.
9
>>> names = ['aet2000', 'ppt2000', 'aet2001', 'ppt2001']
>>> def grep(l, s):
...     return [i for i in l if s in i]
...
>>> grep(names, 'aet')
['aet2000', 'aet2001']

Regex version, closer to grep, although not needed in this case:

>>> import re
>>> def func(l, s):
...     return [i for i in l if re.search(s, i)]
...
>>> func(names, r'aet')
['aet2000', 'aet2001']
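Taking the "closer to grep" idea one step further, here is a minimal sketch of a regex filter with two extra options. The function name grep_list and its invert and ignore_case keyword arguments are hypothetical, chosen to mimic grep's -v and -i flags; the sample list is also hypothetical (one name is upper-cased to show case handling). The pattern is compiled once and reused for every item.

```python
import re

def grep_list(items, pattern, invert=False, ignore_case=False):
    """Return the items that match the regex `pattern` anywhere (like grep).

    `invert` (like grep -v) keeps the non-matching items instead;
    `ignore_case` (like grep -i) makes the match case-insensitive.
    """
    flags = re.IGNORECASE if ignore_case else 0
    regex = re.compile(pattern, flags)  # compile once, reuse for every item
    return [item for item in items if bool(regex.search(item)) != invert]

sample = ['aet2000', 'ppt2000', 'AET2001', 'ppt2001']  # hypothetical data
print(grep_list(sample, r'aet'))                    # ['aet2000']
print(grep_list(sample, r'aet', ignore_case=True))  # ['aet2000', 'AET2001']
print(grep_list(sample, r'aet', invert=True))       # ['ppt2000', 'AET2001', 'ppt2001']
print(grep_list(sample, r'^ppt\d{4}$'))             # ['ppt2000', 'ppt2001']
```

The last call shows where a regex earns its keep over a plain substring test: anchors and character classes let you require an exact "ppt plus four digits" shape.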

2 Comments

Totally. The name l looks too much like the integer 1, especially for a global audience reading in various fonts. People are free to do what they want in the privacy of their own code, but this classroom is a place to be crystal clear and avoid any confusion, yep.
4

You should look into the Python module called re. Below is a grep function implementation in Python that uses re. It will help you understand how re works (of course, only after you read about re).

import re

def grep(pattern, word_list):
    # compile once; expr.match() tests for a match at the start of each element
    expr = re.compile(pattern)
    return [elem for elem in word_list if expr.match(elem)]
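A caveat when using re for grep-like filtering: Pattern.match (like re.match) only succeeds when the pattern matches at the beginning of the string, so it behaves like grep '^pattern'. re.search, by contrast, looks anywhere in the string, which is closer to plain grep. A small comparison with the question's names:

```python
import re

names = ['aet2000', 'ppt2000', 'aet2001', 'ppt2001']
expr = re.compile('2001')

# match() only tries the start of each string, so '2001' is never found...
print([n for n in names if expr.match(n)])   # []
# ...while search() scans the whole string.
print([n for n in names if expr.search(n)])  # ['aet2001', 'ppt2001']
```

With 'aet' both methods give the same result here only because every matching name happens to start with it.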


2

You do not need to preallocate the list tifnames or use the counter to put in elements. Just append the data to the list as generated or use a list comprehension.

I.e., just do this:

import re

years = ['2000','2011']
months = ["jan","feb","mar","apr","may","jun","jul","aug","sep","oct","nov","dec"]
variables = ["cwd","ppt","aet","pet","tmn","tmx"] # *variable name* with wildcards
tifnames = []
for variable in variables:
    for year in years:
        for month in months:
            fullname = variable+year+month+".tif"
            tifnames.append(fullname)
print(tifnames)
print('===')
print(list(filter(lambda x: re.search(r'aet',x),tifnames)))

Prints:

['cwd2000jan.tif', 'cwd2000feb.tif', 'cwd2000mar.tif', 'cwd2000apr.tif', 'cwd2000may.tif', 'cwd2000jun.tif', 'cwd2000jul.tif', 'cwd2000aug.tif', 'cwd2000sep.tif', 'cwd2000oct.tif', 'cwd2000nov.tif', 'cwd2000dec.tif', 'cwd2011jan.tif', 'cwd2011feb.tif', 'cwd2011mar.tif', 'cwd2011apr.tif', 'cwd2011may.tif', 'cwd2011jun.tif', 'cwd2011jul.tif', 'cwd2011aug.tif', 'cwd2011sep.tif', 'cwd2011oct.tif', 'cwd2011nov.tif', 'cwd2011dec.tif', 'ppt2000jan.tif', 'ppt2000feb.tif', 'ppt2000mar.tif', 'ppt2000apr.tif', 'ppt2000may.tif', 'ppt2000jun.tif', 'ppt2000jul.tif', 'ppt2000aug.tif', 'ppt2000sep.tif', 'ppt2000oct.tif', 'ppt2000nov.tif', 'ppt2000dec.tif', 'ppt2011jan.tif', 'ppt2011feb.tif', 'ppt2011mar.tif', 'ppt2011apr.tif', 'ppt2011may.tif', 'ppt2011jun.tif', 'ppt2011jul.tif', 'ppt2011aug.tif', 'ppt2011sep.tif', 'ppt2011oct.tif', 'ppt2011nov.tif', 'ppt2011dec.tif', 'aet2000jan.tif', 'aet2000feb.tif', 'aet2000mar.tif', 'aet2000apr.tif', 'aet2000may.tif', 'aet2000jun.tif', 'aet2000jul.tif', 'aet2000aug.tif', 'aet2000sep.tif', 'aet2000oct.tif', 'aet2000nov.tif', 'aet2000dec.tif', 'aet2011jan.tif', 'aet2011feb.tif', 'aet2011mar.tif', 'aet2011apr.tif', 'aet2011may.tif', 'aet2011jun.tif', 'aet2011jul.tif', 'aet2011aug.tif', 'aet2011sep.tif', 'aet2011oct.tif', 'aet2011nov.tif', 'aet2011dec.tif', 'pet2000jan.tif', 'pet2000feb.tif', 'pet2000mar.tif', 'pet2000apr.tif', 'pet2000may.tif', 'pet2000jun.tif', 'pet2000jul.tif', 'pet2000aug.tif', 'pet2000sep.tif', 'pet2000oct.tif', 'pet2000nov.tif', 'pet2000dec.tif', 'pet2011jan.tif', 'pet2011feb.tif', 'pet2011mar.tif', 'pet2011apr.tif', 'pet2011may.tif', 'pet2011jun.tif', 'pet2011jul.tif', 'pet2011aug.tif', 'pet2011sep.tif', 'pet2011oct.tif', 'pet2011nov.tif', 'pet2011dec.tif', 'tmn2000jan.tif', 'tmn2000feb.tif', 'tmn2000mar.tif', 'tmn2000apr.tif', 'tmn2000may.tif', 'tmn2000jun.tif', 'tmn2000jul.tif', 'tmn2000aug.tif', 'tmn2000sep.tif', 'tmn2000oct.tif', 'tmn2000nov.tif', 'tmn2000dec.tif', 'tmn2011jan.tif', 'tmn2011feb.tif', 'tmn2011mar.tif', 'tmn2011apr.tif', 'tmn2011may.tif', 'tmn2011jun.tif', 'tmn2011jul.tif', 'tmn2011aug.tif', 'tmn2011sep.tif', 'tmn2011oct.tif', 'tmn2011nov.tif', 'tmn2011dec.tif', 'tmx2000jan.tif', 'tmx2000feb.tif', 'tmx2000mar.tif', 'tmx2000apr.tif', 'tmx2000may.tif', 'tmx2000jun.tif', 'tmx2000jul.tif', 'tmx2000aug.tif', 'tmx2000sep.tif', 'tmx2000oct.tif', 'tmx2000nov.tif', 'tmx2000dec.tif', 'tmx2011jan.tif', 'tmx2011feb.tif', 'tmx2011mar.tif', 'tmx2011apr.tif', 'tmx2011may.tif', 'tmx2011jun.tif', 'tmx2011jul.tif', 'tmx2011aug.tif', 'tmx2011sep.tif', 'tmx2011oct.tif', 'tmx2011nov.tif', 'tmx2011dec.tif']
===
['aet2000jan.tif', 'aet2000feb.tif', 'aet2000mar.tif', 'aet2000apr.tif', 'aet2000may.tif', 'aet2000jun.tif', 'aet2000jul.tif', 'aet2000aug.tif', 'aet2000sep.tif', 'aet2000oct.tif', 'aet2000nov.tif', 'aet2000dec.tif', 'aet2011jan.tif', 'aet2011feb.tif', 'aet2011mar.tif', 'aet2011apr.tif', 'aet2011may.tif', 'aet2011jun.tif', 'aet2011jul.tif', 'aet2011aug.tif', 'aet2011sep.tif', 'aet2011oct.tif', 'aet2011nov.tif', 'aet2011dec.tif']

And, depending on whether you find it more readable, it would be more idiomatic Python to write this:

import re

years = ['2000','2011']
months = ["jan","feb","mar","apr","may","jun","jul","aug","sep","oct","nov","dec"]
variables = ["cwd","ppt","aet","pet","tmn","tmx"]
# loop order matches the nested for-loops above: variable, then year, then month
tifnames = [v+y+m+".tif" for v in variables for y in years for m in months]
print(tifnames)
print('===')
print([e for e in tifnames if re.search(r'aet',e)])

...same output
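Another way to build the same list of names, sketched here as an alternative rather than as part of the answer above, is itertools.product from the standard library. It yields tuples with the rightmost iterable varying fastest, which is the same order as the nested for-loops (variable outermost, month innermost).

```python
import itertools

years = ['2000', '2011']
months = ["jan", "feb", "mar", "apr", "may", "jun",
          "jul", "aug", "sep", "oct", "nov", "dec"]
variables = ["cwd", "ppt", "aet", "pet", "tmn", "tmx"]

# product() varies the last iterable (months) fastest, like the nested loops.
tifnames = [v + y + m + ".tif"
            for v, y, m in itertools.product(variables, years, months)]

aet_files = [name for name in tifnames if 'aet' in name]
print(len(tifnames), len(aet_files))  # 144 24
print(aet_files[:2])                  # ['aet2000jan.tif', 'aet2000feb.tif']
```

This avoids both the preallocation and the manual counter, and adding another dimension (say, a list of regions) is just one more argument to product().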

