How to "best" check if a string/word occurs in a csv file

Question

I want to check if a specific word (user-defined via input) occurs in a csv file. I've come up with code that does that, but since I'm a beginner and don't want to adopt any "bad habits", I'm wondering if it is the fastest, easiest and shortest way to do it. Any suggested improvements are appreciated.

This works (mostly, see below), but the whole thing with the "yes" variable makes me think that there has to be a better way to solve this.

    def add(self, name):
        with open(filepath, "r+") as file:
            csvreader = csv.reader(file, delimiter=",", quotechar='"')
            csvwriter = csv.writer(file, delimiter=",", quotechar='"')
            yes = False
            for line in csvreader:
                if name in line[0]:
                    yes = True
            if yes:
                print("This ingredient has already been added")
            else:
                csvwriter.writerow([name])

It sometimes throws an "IndexError: list index out of range". I have no idea why, because it only happens sometimes. Other times it works fine...

Very simple improvement; add a break after yes = True. No point continuing going through the file. Otherwise, the code looks pretty good to me. If you can't store the file in memory, you're not going to get faster.
– roganjosh, Commented Feb 3, 2019 at 12:54
The index error, though, is unusual. Presumably there is a blank line at the end of the file? You could use if line and name in line[0]:
– roganjosh, Commented Feb 3, 2019 at 12:57
Why are you only checking the first word of a line? And your code will fail on empty lines, so check for that first: if line and [...].
– Jan Christoph Terasa, Commented Feb 3, 2019 at 12:58
@JanChristophTerasa it's not the first word, it's the first value. It could easily be a sentence or more.
– roganjosh, Commented Feb 3, 2019 at 13:01
@roganjosh Each line is split at delimiter into tokens, or words.
– Jan Christoph Terasa, Commented Feb 3, 2019 at 13:08

roganjosh · Accepted Answer · 2019-02-03 15:59:29Z

There are 2 improvements you could make:

1. After the value is found and you set the found flag to True, add a break; there's no point continuing to scan the file.
2. Your index error likely comes from a blank line. A blank line produces an empty list, which is falsey, so we can check for that before trying to access by index: if line and name in line[0]:. This will not attempt the index if the first condition is falsey.
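Both changes can be sketched together as follows. This is only one possible arrangement, not the asker's exact method: the helper name first_column_contains is hypothetical, filepath is passed in as a parameter rather than taken from the surrounding scope, and an early return plays the role of the break plus flag. It also assumes the file already ends with a newline, since the new row is appended in a separate "a"-mode open.

```python
import csv


def first_column_contains(filepath, name):
    """Return True if `name` occurs in the first field of any row."""
    with open(filepath, newline="") as file:
        for row in csv.reader(file, delimiter=",", quotechar='"'):
            # csv.reader yields [] for a blank line, so test the row
            # before indexing it. Use == instead of `in` for an exact match.
            if row and name in row[0]:
                return True  # stop scanning at the first match
    return False


def add(filepath, name):
    """Append `name` as a new row unless it is already present."""
    if first_column_contains(filepath, name):
        print("This ingredient has already been added")
        return False
    with open(filepath, "a", newline="") as file:
        csv.writer(file, delimiter=",", quotechar='"').writerow([name])
    return True
```

Splitting the lookup from the write keeps the read position and the write position from interfering with each other, which is harder to reason about when a reader and a writer share one "r+" handle.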

In terms of being falsey, this refers to objects that will be considered False without actually being a Boolean. This includes None and empty sequences such as an empty string (''), an empty list ([]) etc. Empty sequences don't support indexing, even for the zeroth index, so that's why you get an error on a blank line.

With falsey items, we don't need a direct comparison (==) to True or False; indeed, such a comparison gives the wrong answer (for example, [] == False evaluates to False). But you can do boolean-type checks on them, e.g. if some_sequence: or if not some_sequence:. Also, the and operator evaluates its conditions left to right and stops as soon as it finds a falsey one. In the case of if line and ..., it never gets to the point of trying to index line because it already knows the list is empty.
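A small illustration of the points above, using hand-written rows in place of a real file. The rows list and the first_field_has helper are made up for this example; the empty list stands in for what a blank line looks like once it has been parsed.

```python
# Rows as a CSV reader would produce them; [] represents a blank line.
rows = [["salt"], [], ["pepper", "black"]]

# An empty list is falsy, yet it is not equal to False.
assert not []
assert not ([] == False)


def first_field_has(row, name):
    # `and` stops at the first falsy operand, so row[0] is never
    # evaluated when row is empty.
    return bool(row and name in row[0])


matches = [first_field_has(row, "pepper") for row in rows]
print(matches)  # [False, False, True]

# Without the guard, indexing the empty row raises IndexError.
empty_row = []
try:
    empty_row[0]
except IndexError as exc:
    error = str(exc)
    print(error)  # list index out of range
```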

Thanks, that got rid of the error. Could you please explain, what "falsey" means and what the "line" condition in the if statement does?
@Viggar added an explanation but I'm on a phone. Hopefully it's clear.
Thanks, it's clearer now. One thing I still don't get, though: does the "line" in the if-statement come from ME declaring it "line" in the line above (in the "for line in ..." part)? Or is it a Python-internal "keyword"?
@Viggar it has no meaning at all in Python; it comes entirely from you calling your loop variable line.

Jan Christoph Terasa · Accepted Answer · 2019-02-03 17:23:26Z


There is no reason to use csv at all to find a word in a file:

    def word_in_file(filename, name):
        with open(filename, 'r') as f:
            for line in f:
                if name in line:
                    return True
        return False

edited Feb 3, 2019 at 17:23

answered Feb 3, 2019 at 13:11


9 Comments

roganjosh Over a year ago

Except that this now scans the full lines of text. If the value would only appear in the first column, this will be slower.

Jan Christoph Terasa Over a year ago

@roganjosh I assume you did a benchmark. How do you know that if ... in ... does not short-circuit when it finds the first occurrence?

roganjosh Over a year ago

And in all the cases it doesn't find the value, it scans the entire text

roganjosh Over a year ago

What if the CSV has 100 columns and you're only interested in the first column? That's 100 times more work for every single file, regardless of whether or not the word exists, minus the one line when in might short-circuit

roganjosh Over a year ago

Not to mention that if you are indeed only interested in the first column, this is now prone to false positives
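The false-positive concern can be shown with a tiny in-memory example. The sample data and both function names are invented for illustration, and the text is read through io.StringIO instead of a real file.

```python
import csv
import io

# Hypothetical data: "salt" appears only in the second column.
data = "flour,500\nsugar,contains no salt\n\n"


def word_in_lines(text, name):
    # Substring search over each whole line, as in the answer above.
    return any(name in line for line in io.StringIO(text))


def word_in_first_column(text, name):
    # Parse each line and look only at the first field,
    # skipping blank lines (which parse to an empty list).
    reader = csv.reader(io.StringIO(text, newline=""))
    return any(row and name in row[0] for row in reader)


print(word_in_lines(data, "salt"))         # True
print(word_in_first_column(data, "salt"))  # False
```

Which behaviour is right depends on the question being asked: the whole-line search answers "does the word occur anywhere in the file", while the first-column check answers "is this value already present as an entry".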
