Improve the efficiency of the algorithm

Question

I'm trying to improve the algorithm. Now it works for O(n) and iterates through all the elements of the set. Always. My attempts to achieve incomplete O(n) lead to the introduction of var variables. It would be great to do without var.

Task: We need a class that implements a list of company names by substring - from the list of all available names, output a certain number of companies that start with the entered line. It is assumed that the class will be called when filling out a form on a website/mobile application with a high RPS (Requests per second).

My solution:

class SuggestService(companyNames : Seq[String]) { def suggest(input: String, numberOfSuggest : Int) : Seq[String] = { val resultCompanyNames = for { name <- companyNames if input.equals(name.take(input.length)) } yield name resultCompanyNames.take(numberOfSuggest) } //TODO: My code }

Scastie: https://scastie.scala-lang.org/mIC5ZTGwRyKnuJAbhM1VlA

The algorithm already looks good enough. Next trick is to introduce caching. Reddis or memcached will do fine. Also on the frontend, don't send a query every keystroke. Use windows of 150ms after the last keystroke to decide when to send a query — smac89
– smac89, Commented Jul 11, 2021 at 23:00
companyNames.view.filter(_.startsWith(input)).take(numberOfSuggest).toSeq — jwvh
– jwvh, Commented Jul 11, 2021 at 23:33

Kolmar · Accepted Answer · 2021-07-12 02:45:32Z

The solution proposed by @jwvh in the comments:

companyNames.view.filter(_.startsWith(input)).take(numberOfSuggest).toSeq

is good when you have several dozen company names, but in the worst case you'll have to check every single company name. If you have thousands of company names and many requests per second, this has a chance to become a serious bottleneck.

A better approach might be to sort the company names and use binary search to find the first potential result in O(L log N), where L is the average length of a company name:

import scala.collection.imuutable.ArraySeq // in Scala 2.13 class SuggestService(companyNames: Seq[String]) { // in Scala 2.12 use companyNames.toIndexedSeq.sorted private val sortedNames = companyNames.to(ArraySeq).sorted @annotation.tailrec private def binarySearch(input: String, from: Int = 0, to: Int = sortedNames.size): Int = { if (from == to) from else { val cur = (from + to) / 2 if (sortedNames(cur) < input) binarySearch(input, cur + 1, to) else binarySearch(input, from, cur) } } def suggest(input: String, numberOfSuggest: Int): Seq[String] = { val start = binarySearch(input) sortedNames.view .slice(start, start + numberOfSuggest) .takeWhile(_.startsWith(input)) .toSeq } }

Note that the built-in binary search in scala (sortedNames.search) returns any result, not necessarily the first one. And the built-in binary search from Java works either on Arrays (Arrays.binarySearch) or on Java collections (Collections.binarySearch). So in the code above I've provided an explicit implementation of the lower bound binary search.

If you need still better performance, you can use a trie data structure. After traversing the trie and finding the node corresponding to input in O(L) (may depend on trie implementation), you can then continue down from this node to retrieve the first numberOfSuggest results. The query time doesn't depend on N at all, so you can use this method with millions company names.

I really like the solution using binary search. It is very simple and effective. But my search for a solution led me to the idea that using a prefix tree is the most efficient. You confirmed this at the end. I will try to implement a tree. If I understand correctly, then first I need to implement the tree inside the SuggestService class.
If you only ever look for the start of the word then a prefix tree is the way to go. Building the word list from a full tree will be slow as there are many nodes to visit, so maybe truncate the tree after a few letters and switch to linear search on much smaller lists, which keeping list building efficient as well.

Collectives™ on Stack Overflow

Improve the efficiency of the algorithm

1 Answer 1

2 Comments

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

2 Comments

Related