5
\$\begingroup\$

I have a real-world problem involving finding the best known term from a string containing sets of terms. We have Term objects that are comparable in that they have a method a_term.is_better_than(other_term). This method is Boolean -- if it returns False, then a_term may be as good as other_term, but isn't better.

Using this method, I need to pare down a set of terms to the best ones. The best way I've been able to think about this is to compare pairs of terms and eliminate any terms when one isn't as good as the other, something like this:

def get_best_terms(terms: Set[Term]): Set[Term] best_terms = set() m_terms = list(terms) while m_terms: term = m_terms.pop() for i in reversed(range(len(m_terms))): if term.is_better_than(m_terms[i]): del m_terms[i] if m_terms[i].is_better_than(term): break else: # We never encountered the `break` statement best_terms.add(term) break best_terms |= {a_term for a_term in m_terms if not (term.is_better_than(a_term) or a_term.is_better_than(term))} return best_terms 

This approach is functional, but pretty gross in my opinion. For one thing, there's a lot of mutation going on here. For another, once I've identified a single best term, finding all the terms that are equally good is messy. The whole thing is hard to read.

What do you suggest?

Examples

A Term is a string with some metadata attached. For example,

a = AutoTerm(term='honda', term_span=1, term_range=(1, 2), original_term='CAR HONDA ACC...') b = AutoTerm(term='2015 honda accord', term_span=2, term_range=(1, 3), original_term='2015 HONDA ACC...') c = AutoTerm(term='2015 accord', ...) 

Sometimes a term is a better choice because it's more specific, but there are other nuances, such as a match in a database. Let's say, arbitrarily, that b and c are equally good, but a is not as good. Then

assert get_best_terms({a, b, c}) == {b, c} 

should pass.

\$\endgroup\$
3
  • \$\begingroup\$ Can you provide an example of Term and some example IO? \$\endgroup\$ Commented Jul 31, 2018 at 13:57
  • \$\begingroup\$ Is the ranking of Terms transitive? If Term A is better than Term B, and Term B is better than Term C, is Term A guaranteed to be better than Term C? \$\endgroup\$ Commented Jul 31, 2018 at 20:01
  • 1
    \$\begingroup\$ @DavidConrad yes. \$\endgroup\$ Commented Jul 31, 2018 at 20:06

2 Answers 2

5
\$\begingroup\$
  1. You have a bug, if you remove an item in your for loop without breaking you get an IndexError. Instead use a worse set so that you know what to skip.
  2. What you seem to be doing is performing a sort, and so if you make a cmp function you can use sorted. To do this you need to use functools.cmp_to_key and then you can itertools.takewhile.

And so I'd use:

import functools import itertools from typing import Set, List class Term(int): def is_better_than(self, other): return self // 2 > other // 2 def cmp(self: Term, other: Term): if self.is_better_than(other): return 1 if other.is_better_than(self): return -1 return 0 def get_best_terms(terms: Set[Term]) -> List[Term]: if not terms: return [] terms = sorted(terms, key=functools.cmp_to_key(cmp), reverse=True) best = terms[0] return list(itertools.takewhile(lambda i: not best.is_better_than(i), terms)) print(get_best_terms(map(Term, [1, 2, 3, 4, 5]))) print(get_best_terms(map(Term, [1, 2, 3, 4]))) print(get_best_terms(map(Term, [5, 1, 2, 4, 3]))) print(get_best_terms(map(Term, [5, 1, 4, 3]))) 

This can however be simplified if Terms are comparable, as you won't need to use key and can remove the need for a lambda.

@functools.total_ordering class Term(int): def __gt__(self, other: Term) -> bool: return self // 2 > other // 2 def __eq__(self, other: Term) -> bool: return self // 2 == other // 2 def get_best_terms(terms: Set[Term]) -> List[Term]: if not terms: return [] terms = sorted(terms, reverse=True) return list(itertools.takewhile(terms[0].__eq__, terms)) 
\$\endgroup\$
7
  • \$\begingroup\$ Yeah, this is great, thanks. I had thought about using total_ordering and sorting, but I was thinking I didn't actually need to sort. But in looking again, I think the gain in readability is worth every penny. \$\endgroup\$ Commented Jul 31, 2018 at 15:19
  • \$\begingroup\$ "in your for without breaking" for loop? \$\endgroup\$ Commented Jul 31, 2018 at 15:26
  • \$\begingroup\$ @Acccumulation Yes, I'll try make that clearer \$\endgroup\$ Commented Jul 31, 2018 at 15:27
  • \$\begingroup\$ @kojiro You didn't say in your question but is Term.is_better_than costly? Because you could make the code run in \$O(n)\$ time, with a worst case \$2n\$ calls to Term.is_best_terms if they are all equal. \$\endgroup\$ Commented Jul 31, 2018 at 15:34
  • \$\begingroup\$ It's not costly afaict. It is impossible to understand, though, which is why I am still having trouble writing a satisfactory __eq__ for total ordering. I'm also considering using max and a filter instead of takewhile, which trades the sorted for max 2x passes over the set. \$\endgroup\$ Commented Jul 31, 2018 at 15:36
3
\$\begingroup\$

If all you want are the best terms, then it seems to me that you can just keep a list of the current best terms. Every time you find a term that at least as good as any one of the previous best terms, then, assuming transitivity, it must be at least as good as all the previous best terms. And since the previous best terms were at least as good as all the terms tested so far, the new term is at least as good as all the terms tested so for. Next you check whether the new term is strictly better than one of the current best terms. If so, then you can replace the list of best terms with a list containing just the new term. If not, then the current term is just as good as the current best terms, so you should append it to the list of best terms.

best_terms = [] for term in terms: if not best_terms: best_terms = [term] continue if not best_terms[0].is_better_than(term): if term.is_better_than(best_terms[0]): best_terms = [term] else best_terms.append(term) 
\$\endgroup\$
2
  • 1
    \$\begingroup\$ Please tell me you changed your account name just to give this answer, because that'd be about perfect. \$\endgroup\$ Commented Jul 31, 2018 at 15:47
  • \$\begingroup\$ Except "is_better_than" returns false if the terms are equally good. You seem to be considering otherwise... \$\endgroup\$ Commented Jul 31, 2018 at 20:14

You must log in to answer this question.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.