4
\$\begingroup\$

I'm currently working on a natural language processing project and I have two ArrayLists<String> where each ArrayList contains a verb at index 0, a noun at index 1 and a noun at index 2 (repeat). I'm trying to find any two nouns in the first list that is present consecutively in the second by using:

 for(int i = 1; i<finalKnowledgeGraph.size(); i+=3) { for(int j = 1; j<finalKnowledgeGraph.size(); j+=3) { for(int k=1; k<storeAsserions.size(); k+=3) { if(finalKnowledgeGraph.get(i).equals(storeAsserions.get(k)) && finalKnowledgeGraph.get(j+1).equals(storeAsserions.get(k+1))){ System.out.println("Found one"); } else if(finalKnowledgeGraph.get(i+1).equals(storeAsserions.get(k)) && finalKnowledgeGraph.get(j).equals(storeAsserions.get(k+1))) { System.out.println("Found another"); } } } } 

However, this code has cubic complexity and both ArrayLists are thousands of lines long. I'm wondering if anyone has any tips on how I could go about speeding up this process. Also, I know virtually nothing about optimization so if you do have some help please break it down relatively simply. One of my friends recently suggested a HashMap and searching through that but in my head that's simply pushing the search problem from one data-structure to another

\$\endgroup\$
1
  • 1
    \$\begingroup\$ Look up how search engines do it. \$\endgroup\$ Commented Nov 18, 2017 at 18:40

1 Answer 1

3
\$\begingroup\$

Your current approach would benefit from sorted inputs.

But an even simpler approach would be to ignore the irrelevant verbs and rephrase the problem like this:

A noun phrase is a pair of nouns. We are given two input sets of noun phrases, and must compute set intersection, using a problem specific notion of "phrase equality".

This implies a linear O(n) input pre-processing pass, where you populate a HashSet with each input phrase, X+" "+Y.

With that in hand, your task is trivial. Scan the other collection of phrases, and for each phrase make two set membership queries: is X+" "+Y or Y+" "+X present in the set?

BTW, your identifier storeAsserions seems to be a typo for storeAssertions. And please include a space after keywords like if & for.

\$\endgroup\$

You must log in to answer this question.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.