
I want to search a CSV file and print either True or False, depending on whether or not I found the string. However, I'm running into the problem that it returns a false positive if it finds the string embedded in a larger string of text. For example, it returns True if the string is foo and the term foobar is in the CSV file. I need to be able to return exact matches.

username = input()
if username in open('Users.csv').read():
    print("True")
else:
    print("False")

I've looked at using mmap, re and csv module functions, but I haven't got anywhere with them.

EDIT: Here is an alternative method:

import re
import csv

username = input()
with open('Users.csv', 'rt') as f:
    reader = csv.reader(f)
    for row in reader:
        re.search(r'\bNOTSUREHERE\b', username)
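A minimal sketch of how the `\b` idea could be completed, assuming the whole file fits in memory. The pattern has to be built from `username` (escaped with `re.escape` so characters such as `.` or `+` are taken literally) and searched in the file's text, not in `username` itself. Note that `\b` only marks a boundary between word and non-word characters, so `foo` would still match inside `foo-bar`, and a username that starts or ends with a non-word character may not match as expected; the csv-based answers avoid this. The function name `contains_word` is hypothetical.

```python
import re


def contains_word(text, word):
    # \b asserts a word/non-word boundary on each side of the escaped word,
    # so 'foo' matches in 'a,foo,b' but not in 'a,foobar,b'
    pattern = r'\b' + re.escape(word) + r'\b'
    return re.search(pattern, text) is not None
```

It could then be used as `print(contains_word(f.read(), username))` inside the `with open('Users.csv', 'rt') as f:` block.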

6 Answers

Answer (score 35)

When you read a CSV file using the csv module, it returns each row as a list of columns. So if you want to look up your string, you should modify your code like this:

import csv

username = input()
with open('Users.csv', 'rt') as f:
    reader = csv.reader(f, delimiter=',')  # good point by @paco
    for row in reader:
        for field in row:
            if field == username:
                print("is in file")

But as it is a CSV file, you might expect the username to be in a given column:

with open('Users.csv', 'rt') as f:
    reader = csv.reader(f, delimiter=',')
    for row in reader:
        # if the username is expected in column 3 (-> index 2); skip shorter rows
        if len(row) > 2 and row[2] == username:
            print("is in file")
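Since the question asks for a single True/False, both loops can be wrapped in a function that returns as soon as a match is found. A sketch, where the name `username_in_csv` and its `column` parameter are hypothetical:

```python
import csv


def username_in_csv(lines, username, column=None):
    # lines: any iterable of text lines, e.g. a file opened with newline=''
    # column: optional 0-based column index; if None, every field is checked
    for row in csv.reader(lines, delimiter=','):
        if column is None:
            if username in row:  # list membership compares whole fields
                return True
        elif len(row) > column and row[column] == username:
            return True
    return False
```

For example, `with open('Users.csv', newline='') as f: print(username_in_csv(f, username))` prints True or False; the csv documentation recommends opening files with `newline=''`.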

Answer (score 3)

I used the top answer; it works and looks OK, but it was too slow for me.

I had an array of many strings and wanted to check whether each of them was in a large CSV file. No other requirements.

For this purpose I used the following (simplified: I actually iterated through an array of strings and did other work than printing):

with open('my_csv.csv', 'rt') as c:
    str_arr_csv = c.readlines()

Together with:

if str(my_str) in str(str_arr_csv):
    print("True")

The reduction in time was about 90% for me. The code looks ugly, but I'm all about speed. Sometimes.
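Keep in mind that `str(my_str) in str(str_arr_csv)` is still a substring test on the text of the whole list, so it has the same foo/foobar false positive the question is trying to avoid. If exact field matches are needed and there are many strings to check, one option (a sketch, not the approach timed above) is to read all fields into a `set` once, so each lookup is an exact membership test with average constant time. The helper name `load_fields` is hypothetical.

```python
import csv


def load_fields(lines):
    # Collect every field of every row into a set for exact-match lookups
    fields = set()
    for row in csv.reader(lines):
        fields.update(row)
    return fields
```

Inside `with open('my_csv.csv', newline='') as c:` one would call `fields = load_fields(c)` once and then test each string with `my_str in fields`.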

Answer (score 1)

You should have a look at the csv module in Python.

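import csv

is_in_file = False
# Python 3: open in text mode ('rb' would make csv.reader fail on bytes)
with open('my_file.csv', 'r', newline='') as csvfile:
    my_content = csv.reader(csvfile, delimiter=',')
    for row in my_content:
        if username in row:
            is_in_file = True
            break  # no need to read further
print(is_in_file)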

It assumes that your delimiter is a comma (replace it with your own delimiter otherwise). Note that username must be defined beforehand, and change the file name to match yours. The code loops through all the lines in the CSV file; row is a list of strings containing each element of the row. For example, if your CSV file contains Joe,Peter,Michel, the row will be ['Joe', 'Peter', 'Michel']. Then you can check whether your username is in that list.
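To make the difference from the question's `in` test concrete: on a list, `in` compares whole elements with `==`, while on a string it looks for any substring. A small demonstration using the row from the example above (the `io.StringIO` object just stands in for a real file):

```python
import csv
import io

line = "Joe,Peter,Michel\n"
row = next(csv.reader(io.StringIO(line), delimiter=','))

print(row)             # ['Joe', 'Peter', 'Michel']
print('Pete' in line)  # True: substring of the raw text
print('Pete' in row)   # False: no field equals 'Pete'
print('Peter' in row)  # True: exact field match
```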

Answer (score 0)
import csv

scoresList = []
with open("playerScores_v2.txt") as csvfile:
    scores = csv.reader(csvfile, delimiter=",")
    for row in scores:
        scoresList.append(row)

playername = input("Enter the player name you would like the score for:")
print("{0:40} {1:10} {2:10}".format("Name", "Level", "Score"))
for row in scoresList:
    # only print rows whose first column (Name) is exactly the entered name
    if row and row[0] == playername:
        print("{0:40} {1:10} {2:10}".format(row[0], row[1], row[2]))

Answer (score 0)

EXTENDED ALGO:
Since my CSV can contain values with spaces around them, such as ", atleft,atright , both ", I patched zmo's code as follows:

 if field.strip() == username: 

and it works, thanks.
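For illustration, this is what the csv module returns for a line like the one above and why `strip()` is needed. The csv module also has a `skipinitialspace=True` option, but it only drops spaces immediately after a delimiter, so trailing spaces such as in `atright ` would remain. The sample line is taken from the text above; `field_matches` is a hypothetical helper name.

```python
import csv
import io


def field_matches(lines, username):
    # Compare each field with surrounding whitespace removed
    for row in csv.reader(lines):
        for field in row:
            if field.strip() == username:
                return True
    return False


sample = ", atleft,atright , both \n"
print(next(csv.reader(io.StringIO(sample))))          # ['', ' atleft', 'atright ', ' both ']
print(field_matches(io.StringIO(sample), 'atright'))  # True
```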

OLD-FASHIONED ALGO
I had previously coded an 'old-fashioned' algorithm that handles any of the allowed separators (here comma, space and newline), so I was curious to compare performance.
With 10,000 rounds on a very simple CSV file, I got:

------------------ algo 1 old fashion ---------------
done in 1.931804895401001 s.
------------------ algo 2 with csv ---------------
done in 1.926626205444336 s.

Since it is only about 0.25% slower, I think this good old hand-made algorithm can help somebody (and it will be useful if there are other parasitic characters, since strip() only removes whitespace).
The algorithm works on bytes, so it can also be used for data other than strings.
It searches for a name that is not embedded in another one by checking that the bytes immediately to its left and right are allowed separators.
It mainly uses loops that exit as early as possible through break or continue.

# True if byte x is NOT an allowed separator: comma (44), space (32), LF (10), CR (13)
# (set as a function to be able to run several chained tests)
def separatorsNok(x):
    return (x != 44) and (x != 32) and (x != 10) and (x != 13)


def searchUserName(userName, fileName):
    # read the file as binary (assumed to be UTF-8, like userName)
    with open(fileName, 'rb') as f:
        contents = f.read()
    lenOfFile = len(contents)
    # convert the username to bytes
    userBytes = userName.encode('utf-8')
    lenOfUser = len(userBytes)
    found = False
    start = 0  # candidate start position of the name in the file
    while start <= lenOfFile - lenOfUser:
        # compare the full name byte by byte, stopping at the first mismatch
        posInUser = 0
        while posInUser < lenOfUser and contents[start + posInUser] == userBytes[posInUser]:
            posInUser += 1
        if posInUser < lenOfUser:
            start += 1
            continue
        # found a full name, check that it is isolated on the left and on the right
        # left is OK at the very beginning or after a space, comma or newline
        if start > 0 and separatorsNok(contents[start - 1]):  # previousLeft
            start += 1
            continue
        # right is OK at the very end or before a space, comma or newline
        end = start + lenOfUser
        if end < lenOfFile and separatorsNok(contents[end]):  # nextRight
            start += 1
            continue
        # found and bordered
        found = True
        break  # main while
    if found:
        print(userName, "is in file")  # at position start
    return found

To test it: searchUserName('pirla', 'test.csv')

Like the other answers, this code exits at the first match, but it can easily be extended to find all matches.
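As a sketch of the "find all" extension, the same isolation rule (an allowed separator, or the start/end of the data, on each side) can also be expressed as a bytes regular expression with lookarounds instead of a hand-written loop. This swaps in the `re` module rather than extending the loop above; the separator set (comma, space, LF, CR) mirrors `separatorsNok`, and the name `findAllUserNames` is hypothetical.

```python
import re


def findAllUserNames(userName, contents):
    # contents: the file's bytes, e.g. open(fileName, 'rb').read()
    name = re.escape(userName.encode('utf-8'))
    # (?<![^, \r\n]) : at the start, or preceded by comma/space/CR/LF
    # (?![^, \r\n])  : at the end, or followed by comma/space/CR/LF
    pattern = rb'(?<![^, \r\n])' + name + rb'(?![^, \r\n])'
    # return the start offset of every isolated occurrence
    return [m.start() for m in re.finditer(pattern, contents)]
```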

HTH

Answer (score -1)
#!/usr/bin/python
import csv

with open('my.csv', 'r') as f:
    lines = f.readlines()
    cnt = 0
    for entry in lines:
        if 'foo' in entry:
            cnt += 1
    print("No of foo entry Count :".ljust(20, '.'), cnt)

Comment:

This is wrong. The question explicitly states that it should not match foobar if it's looking for foo. Your code is also over-complicated. You don't need to split the lines - you can just scan for foo if you ignore the limitation.
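For comparison, a sketch of a count that respects the question's requirement: `row.count('foo')` counts only fields equal to `'foo'`, so `foobar` is not counted. The function name `count_exact` is hypothetical.

```python
import csv


def count_exact(lines, term):
    # Count fields that are exactly equal to term across all rows
    return sum(row.count(term) for row in csv.reader(lines))
```

For example, `with open('my.csv', newline='') as f: print(count_exact(f, 'foo'))`.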
