My current project has us using TreeSet and TreeMap in Java, with an input array of 10514 Song elements read in from a text file. Each Song contains Artist, Title, and Lyrics fields. The aim of this project is to conduct fast searches on the lyrics using sets and maps.
First, I iterate over the input Song array, accessing the lyrics field of each Song and creating a Scanner object to iterate over the words in the lyrics, using the code below. commonWords is a TreeSet of words that should not be keys, and lyricWords is the overall map of words to Songs.

    public void buildSongMap() {
        for (Song song : songs) {
            // method variables
            String currentLyrics = song.getLyrics().toLowerCase();
            TreeSet<Song> addToSet = null;
            Scanner readIn = new Scanner(currentLyrics);
            String word = readIn.next();

            while (readIn.hasNext()) {
                if (!commonWords.contains(word) && !word.equals("") && word.length() > 1) {
                    if (lyricWords.containsKey(word)) {
                        addToSet = lyricWords.get(word);
                        addToSet.add(song);
                        word = readIn.next();
                    } else
                        buildSongSet(word);
                } else
                    word = readIn.next();
            }
        }
    }

In order to build the songSet, I use this code:

    public void buildSongSet(String word) {
        TreeSet<Song> songSet = new TreeSet<Song>();

        for (Song song : songs) {
            // adds song to set
            if (song.getLyrics().contains(word)) {
                songSet.add(song);
            }
        }
        lyricWords.put(word, songSet);
        System.out.println("Word added " + word);
    }

Now, since buildSongSet is called from inside a loop and itself loops over every song, creating the map executes in O(N^2) time. When the input array is 4 songs, searches run very fast, but when using the full array of 10514 elements, it can take more than 15 minutes to build the map on a 2.4GHz machine with 6 GiB RAM. What can I do to make this code more efficient? Unfortunately, reducing the input data is not an option.
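
One direction that might avoid the nested scan is to add each song to the set of every word it contains as that word is read, so the song array is traversed only once. The following is a minimal sketch of that idea, not a drop-in replacement: the class name `LyricIndexSketch`, the nested `Song` class (including its `compareTo` ordering by artist, then title), and the sample data are all hypothetical stand-ins, since the real Song class is not shown here.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Scanner;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

class LyricIndexSketch {

    // Hypothetical stand-in for the project's Song class; TreeSet needs it to be Comparable.
    static final class Song implements Comparable<Song> {
        final String artist;
        final String title;
        final String lyrics;

        Song(String artist, String title, String lyrics) {
            this.artist = artist;
            this.title = title;
            this.lyrics = lyrics;
        }

        String getLyrics() {
            return lyrics;
        }

        // Assumed ordering: by artist, then by title.
        @Override
        public int compareTo(Song other) {
            int byArtist = artist.compareTo(other.artist);
            return byArtist != 0 ? byArtist : title.compareTo(other.title);
        }
    }

    // Builds the word -> songs map in a single pass over the songs.
    static TreeMap<String, TreeSet<Song>> buildSongMap(List<Song> songs, Set<String> commonWords) {
        TreeMap<String, TreeSet<Song>> lyricWords = new TreeMap<>();
        for (Song song : songs) {
            Scanner readIn = new Scanner(song.getLyrics().toLowerCase());
            while (readIn.hasNext()) {
                String word = readIn.next();
                if (word.length() > 1 && !commonWords.contains(word)) {
                    // Create the set the first time a word is seen, then add the current song.
                    lyricWords.computeIfAbsent(word, k -> new TreeSet<>()).add(song);
                }
            }
            readIn.close();
        }
        return lyricWords;
    }

    public static void main(String[] args) {
        List<Song> songs = Arrays.asList(
                new Song("A", "One", "Love is the answer"),
                new Song("B", "Two", "all you need is love"));
        Set<String> commonWords = new TreeSet<>(Arrays.asList("the", "is", "you"));

        TreeMap<String, TreeSet<Song>> index = buildSongMap(songs, commonWords);
        System.out.println(index.keySet());           // prints [all, answer, love, need]
        System.out.println(index.get("love").size()); // prints 2
    }
}
```

With this shape, each word read costs one map lookup and one set insertion, so the total work grows with the number of words in all lyrics rather than with the number of distinct words times the number of songs. Two behavioral differences from the code above are worth noting: a song is matched on whole whitespace-separated tokens rather than by `String.contains`, which also matches substrings (a song containing "lover" would match "love"), and the last word of each lyric is processed, whereas the loop above stops before handling it.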