Creates a class that searches for the elements in a list that contain a sequence

Question

Not sure what I should write here. The code should be self-explanatory.

    """This is a data definition class--Searchable_list.

    Searchable_list takes a list of strings and makes it searchable.
    Searchable meaning you can find which elements in the list have a pattern.
    """
    class Searchable_list(object):
        """this will make your word list searchable.
        Note, It will also lose the original order of the list."""
        def __init__(self, lis):
            assert hasattr(lis,"__iter__")
            self.search_dict=dict()
            for word in set(lis):self.add_word(word)

        def add_word(self,word):
            """this will add a word to the search_dict
            search dict is of the form: {letter:{nextletter:{(index,word)}}}
            """
            assert type(word) is str#or isinstance(word,str)
            for index,val in enumerate(word[:-1]):
                next_letter=self.search_dict.setdefault(val,dict())
                words_list=next_letter.setdefault(word[index+1],set())#object modification
                words_list.add((index,word))#object modification

        def find_matches(self,seq):
            """finds all the words in the list with this sequence. Uses '.' as wildcard.
            """
            s_d=self.search_dict
            assert len(seq)>1
            #could put a try catch to catch key errors
            for index,letter in enumerate(seq[:-1]):
                if not(letter=="."and seq[index+1]=="."):
                    #no point if they all match...
                    if letter==".":
                        L_m=set.union(*(i.get(seq[index+1],set()) for i in s_d.values()))
                        #.get is important here. not all i's have i[seq[index+1]]
                    elif seq[index+1]==".":
                        L_m=set.union(*(i for i in s_d[letter].values()))
                    else:
                        L_m=s_d[letter].get(seq[index+1],{})#this is a set.
                    #L_m==letter_matches
                    if index>0:
                        m_m=((i-index,word) for i,word in L_m)
                        #m_m=matches_matches. These words still have the pattern.
                        #you're matching all indexes to the original m_s
                        m_s.intersection_update(m_m)
                        #m_s=matches_set
                    else:
                        m_s=L_m.copy()
                        #http://stackoverflow.com/questions/23200969/how-to-clone-or-copy-a-set-in-python
            return m_s

EDIT: Because this post was bumped and I've since made some fairly major improvements, here's a link to the final version: https://github.com/user-name-is-taken/words-with-friends/blob/master/WWF_DDC.py (note: the Scrabble parts just adapt this code for Scrabble). It's not very clean, but it has optimizations this version doesn't have. If there's interest, I can add comments explaining the optimizations on GitHub or post something about them here (probably as another answer). For now, the basic idea behind the optimizations is that Python's set.intersection is faster than set.union.

"Not sure what I should write here." Does it work as intended? Do you want a review of any and all aspects of your code? If you can answer both questions with "yes", you probably posted in the right place. Of course, you could always check the help center.
– Mast ♦, Commented Mar 10, 2017 at 22:26

mikeLundquist · Accepted Answer · 2017-03-26 14:49:10Z

Here is an updated version of find_matches. The changes are:

I added a while loop that removes trailing "."s; see the code comments for an explanation.
I moved the intersection out of the loop, both for speed and to make the code cleaner. This required adding the setsList list.

    def find_matches(self,seq):
        """finds all the words in the list with this sequence. Uses '.' as wildcard.
        """
        assert len(seq)>1
        s_d = self.search_dict
        setsList =[]
        while seq[-1]=='.':
            #not solved by if index+1=='.' because there's no [letter][''] for word endings in self.search_dict.
            #without this, .f. wouldn't find (0,"of"), because the L_m in the seq[index+1]=="." if wouldn't include it.
            seq = seq[:-1]
        for index,letter in enumerate(seq[:-1]):
            if not(letter=="." and seq[index+1]=="."):#no point if they all match...
                if letter==".":
                    L_m = set.union(*(i.get(seq[index+1],set()) for i in s_d.values()))
                    #.get is important here. not all is have i[seq[index+1]]
                elif seq[index+1]==".":
                    L_m = set.union(*s_d[letter].values())
                else:
                    L_m = s_d[letter].get(seq[index+1],{})#this is a set.
                    #not using s_d.get could cause errors here...
                #L_m==letter_matches
                setsList.append({(i-index,word) for i,word in L_m})
        return set.intersection(*setsList)
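As a rough, standalone illustration of the two changes (stripping trailing wildcards, then intersecting once at the end), here is a sketch written as plain functions rather than methods. The names build_index, candidate_sets, cur and nxt are mine, not from the posted code, and returning an empty set when nothing is left to look up is an assumption; the posted method raises in that case.

```python
def build_index(words):
    """Map letter -> next letter -> set of (index, word) pairs."""
    index = {}
    for word in set(words):
        for pos, letter in enumerate(word[:-1]):
            following = index.setdefault(letter, {})
            following.setdefault(word[pos + 1], set()).add((pos, word))
    return index


def find_matches(index, seq):
    """Return (start, word) pairs where seq occurs; '.' is a wildcard."""
    # Trailing wildcards are dropped because the index stores no entry
    # for "letter followed by end of word".
    seq = seq.rstrip(".")
    candidate_sets = []
    for pos in range(len(seq) - 1):
        cur, nxt = seq[pos], seq[pos + 1]
        if cur == "." and nxt == ".":
            continue  # two wildcards in a row constrain nothing
        if cur == ".":
            pairs = set().union(*(d.get(nxt, set()) for d in index.values()))
        elif nxt == ".":
            pairs = set().union(*index.get(cur, {}).values())
        else:
            pairs = index.get(cur, {}).get(nxt, set())
        # Shift every hit back to where the whole pattern would start,
        # so hits from different positions can be compared directly.
        candidate_sets.append({(i - pos, word) for i, word in pairs})
    if not candidate_sets:
        return set()  # assumption: treat "nothing to look up" as no match
    return set.intersection(*candidate_sets)


index = build_index(["of", "off", "often", "if"])
print(find_matches(index, "o.t"))
print(find_matches(index, ".f."))
```

Collecting the shifted sets first and calling set.intersection once keeps the loop body free of the index>0 special case from the original version.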

J_H · Accepted Answer · 2017-09-04 03:38:08Z

naming

Based on reading your comment, I propose this name:

class SearchableCollection(object):

Rely on duck typing, rather than trying to write Java code in Python:

    def __init__(self, words):
        # (delete this line) assert hasattr(lis,"__iter__")
        self.words = {}
        for word in words:
            self._add_word(word)

It appears that add_word is not part of your public API. Mark it so with a leading underscore, or make it a nested def. Do not assert that type is str.
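For the nested def option, one possible shape is sketched below. It is only an illustration: the loop body is carried over from the question's add_word, the local names letter and words_set are my own choices, and the type assertion is simply left out.

```python
class SearchableCollection:
    def __init__(self, words):
        # letter -> next letter -> set of (index, word)
        self.words = {}

        def add_word(word):
            # Helper visible only inside __init__, so it cannot be
            # mistaken for part of the public API.
            for index, letter in enumerate(word[:-1]):
                next_letter = self.words.setdefault(letter, {})
                words_set = next_letter.setdefault(word[index + 1], set())
                words_set.add((index, word))

        for word in set(words):
            add_word(word)


collection = SearchableCollection(["of", "off"])
print(collection.words["o"]["f"])
```

The trade-off is that a nested helper cannot be called later to add more words, so the leading-underscore method is the better fit if the collection needs to grow after construction.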

 for index,val in enumerate(word[:-1]):

Please name it letter rather than the very vague val. Or cur_letter, parallel with next_letter.

You are using setdefault() in a sensible way. But you might be happier using defaultdict.
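A minimal sketch of what the defaultdict variant might look like, assuming the class and method names suggested earlier in this answer; the nested lambda factory is one way to get the two-level layout, not the only one.

```python
from collections import defaultdict


class SearchableCollection:
    def __init__(self, words):
        # letter -> next letter -> set of (index, word);
        # missing keys are created on first access.
        self.words = defaultdict(lambda: defaultdict(set))
        for word in words:
            self._add_word(word)

    def _add_word(self, word):
        for index, letter in enumerate(word[:-1]):
            next_letter = word[index + 1]
            self.words[letter][next_letter].add((index, word))


collection = SearchableCollection(["of", "off"])
print(collection.words["o"]["f"])
```

One caveat: indexing a defaultdict with a missing key inserts an empty entry, so read-only lookups in find_matches are safer with .get() than with square brackets.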

You named it words_list, but apparently you meant words_set.

style

Run $ flake8 WWF_DDC.py and follow its advice, please.
