Brute force searching

Question

I'm currently working on a natural language processing project and I have two ArrayLists<String> where each ArrayList contains a verb at index 0, a noun at index 1 and a noun at index 2 (repeat). I'm trying to find any two nouns in the first list that is present consecutively in the second by using:

 for(int i = 1; i<finalKnowledgeGraph.size(); i+=3) { for(int j = 1; j<finalKnowledgeGraph.size(); j+=3) { for(int k=1; k<storeAsserions.size(); k+=3) { if(finalKnowledgeGraph.get(i).equals(storeAsserions.get(k)) && finalKnowledgeGraph.get(j+1).equals(storeAsserions.get(k+1))){ System.out.println("Found one"); } else if(finalKnowledgeGraph.get(i+1).equals(storeAsserions.get(k)) && finalKnowledgeGraph.get(j).equals(storeAsserions.get(k+1))) { System.out.println("Found another"); } } } }

However, this code has cubic complexity and both ArrayLists are thousands of lines long. I'm wondering if anyone has any tips on how I could go about speeding up this process. Also, I know virtually nothing about optimization so if you do have some help please break it down relatively simply. One of my friends recently suggested a HashMap and searching through that but in my head that's simply pushing the search problem from one data-structure to another

\$\begingroup\$ Look up how search engines do it. \$\endgroup\$

markspace
– markspace

2017-11-18 18:40:24 +00:00
Commented Nov 18, 2017 at 18:40 — markspace
– markspace, Commented Nov 18, 2017 at 18:40

J_H · Accepted Answer · 2017-11-21 01:05:15Z

Your current approach would benefit from sorted inputs.

But an even simpler approach would be to ignore the irrelevant verbs and rephrase the problem like this:

A noun phrase is a pair of nouns. We are given two input sets of noun phrases, and must compute set intersection, using a problem specific notion of "phrase equality".

This implies a linear O(n) input pre-processing pass, where you populate a HashSet with each input phrase, X+" "+Y.

With that in hand, your task is trivial. Scan the other collection of phrases, and for each phrase make two set membership queries: is X+" "+Y or Y+" "+X present in the set?

BTW, your identifier storeAsserions seems to be a typo for storeAssertions. And please include a space after keywords like if & for.

Stack Exchange Network

Brute force searching

1 Answer 1

You must log in to answer this question.

Hot Network Questions

Brute force searching

1 Answer 1

You must log in to answer this question.

Related

Hot Network Questions