0

I'm trying to improve the algorithm. Now it works for O(n) and iterates through all the elements of the set. Always. My attempts to achieve incomplete O(n) lead to the introduction of var variables. It would be great to do without var.

Task: We need a class that implements a list of company names by substring - from the list of all available names, output a certain number of companies that start with the entered line. It is assumed that the class will be called when filling out a form on a website/mobile application with a high RPS (Requests per second).

My solution:

class SuggestService(companyNames : Seq[String]) { def suggest(input: String, numberOfSuggest : Int) : Seq[String] = { val resultCompanyNames = for { name <- companyNames if input.equals(name.take(input.length)) } yield name resultCompanyNames.take(numberOfSuggest) } //TODO: My code } 

Scastie: https://scastie.scala-lang.org/mIC5ZTGwRyKnuJAbhM1VlA

3
  • 1
    The algorithm already looks good enough. Next trick is to introduce caching. Reddis or memcached will do fine. Also on the frontend, don't send a query every keystroke. Use windows of 150ms after the last keystroke to decide when to send a query Commented Jul 11, 2021 at 23:00
  • Have you tried making resultCompanyNames lazy? Commented Jul 11, 2021 at 23:23
  • 2
    companyNames.view.filter(_.startsWith(input)).take(numberOfSuggest).toSeq Commented Jul 11, 2021 at 23:33

1 Answer 1

2

The solution proposed by @jwvh in the comments:

companyNames.view.filter(_.startsWith(input)).take(numberOfSuggest).toSeq 

is good when you have several dozen company names, but in the worst case you'll have to check every single company name. If you have thousands of company names and many requests per second, this has a chance to become a serious bottleneck.

A better approach might be to sort the company names and use binary search to find the first potential result in O(L log N), where L is the average length of a company name:

import scala.collection.imuutable.ArraySeq // in Scala 2.13 class SuggestService(companyNames: Seq[String]) { // in Scala 2.12 use companyNames.toIndexedSeq.sorted private val sortedNames = companyNames.to(ArraySeq).sorted @annotation.tailrec private def binarySearch(input: String, from: Int = 0, to: Int = sortedNames.size): Int = { if (from == to) from else { val cur = (from + to) / 2 if (sortedNames(cur) < input) binarySearch(input, cur + 1, to) else binarySearch(input, from, cur) } } def suggest(input: String, numberOfSuggest: Int): Seq[String] = { val start = binarySearch(input) sortedNames.view .slice(start, start + numberOfSuggest) .takeWhile(_.startsWith(input)) .toSeq } } 

Note that the built-in binary search in scala (sortedNames.search) returns any result, not necessarily the first one. And the built-in binary search from Java works either on Arrays (Arrays.binarySearch) or on Java collections (Collections.binarySearch). So in the code above I've provided an explicit implementation of the lower bound binary search.


If you need still better performance, you can use a trie data structure. After traversing the trie and finding the node corresponding to input in O(L) (may depend on trie implementation), you can then continue down from this node to retrieve the first numberOfSuggest results. The query time doesn't depend on N at all, so you can use this method with millions company names.

Sign up to request clarification or add additional context in comments.

2 Comments

I really like the solution using binary search. It is very simple and effective. But my search for a solution led me to the idea that using a prefix tree is the most efficient. You confirmed this at the end. I will try to implement a tree. If I understand correctly, then first I need to implement the tree inside the SuggestService class.
If you only ever look for the start of the word then a prefix tree is the way to go. Building the word list from a full tree will be slow as there are many nodes to visit, so maybe truncate the tree after a few letters and switch to linear search on much smaller lists, which keeping list building efficient as well.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.