I want to replace rare words with _RARE_ in a JSON tree using Java.
My rareWords list contains:

    late populate convicts

So for the JSON below:

    ["S", ["PP", ["ADP", "In"], ["NP", ["DET", "the"], ["NP", ["ADJ", "late"], ["NOUN", "1700<s"]]]], ["S", ["NP", ["ADJ", "British"], ["NOUN", "convicts"]], ["S", ["VP", ["VERB", "were"], ["VP", ["VERB", "used"], ["S+VP", ["PRT", "to"], ["VP", ["VERB", "populate"], ["WHNP", ["DET", "which"], ["NOUN", "colony"]]]]]], [".", "?"]]]]

I should get:
    ["S", ["PP", ["ADP", "In"], ["NP", ["DET", "the"], ["NP", ["ADJ", "_RARE_"], ["NOUN", "1700<s"]]]], ["S", ["NP", ["ADJ", "British"], ["NOUN", "_RARE_"]], ["S", ["VP", ["VERB", "were"], ["VP", ["VERB", "used"], ["S+VP", ["PRT", "to"], ["VP", ["VERB", "_RARE_"], ["WHNP", ["DET", "which"], ["NOUN", "colony"]]]]]], [".", "?"]]]]

Notice how ["ADJ","late"] was replaced by ["ADJ","_RARE_"].

My code so far is below.
I recursively iterate over the tree and, as soon as a rare word is found, I create a new JSON array and try to replace the existing tree node with it. See the comment "// this Doesn't work" in the code below; that is where I got stuck. The tree remains unchanged outside of this function.
    public static void traverseTreeAndReplaceWithRare(JsonArray tree){
        //System.out.println(tree.getAsJsonArray());
        for (int x = 0; x < tree.getAsJsonArray().size(); x++) {
            if(!tree.get(x).isJsonArray()) {
                if(tree.size()==2) {
                    //beware it will get here twice for same word
                    String word= tree.get(1).toString();
                    word=word.replaceAll("\"", ""); // removing double quotes
                    if(rareWords.contains(word)) {
                        JsonParser parser = new JsonParser();
                        //This works perfectly
                        System.out.println("Orig:"+tree);
                        JsonElement jsonElement = parser.parse("["+tree.get(0)+","+"_RARE_"+"]");
                        JsonArray newRareArray = jsonElement.getAsJsonArray();
                        //This works perfectly
                        System.out.println("New:"+newRareArray);
                        tree=newRareArray; // this Doesn't work
                    }
                }
                continue;
            }
            traverseTreeAndReplaceWithRare(tree.get(x).getAsJsonArray());
        }
    }

Code for calling the above (I use Google's Gson):

    JsonParser parser = new JsonParser();
    JsonElement jsonElement = parser.parse(strJSON);
    JsonArray tree = jsonElement.getAsJsonArray();
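Java passes the `tree` reference by value, so `tree=newRareArray;` only rebinds the local variable inside the method; the parent array still points at the old node. One possible way around this is to mutate the node the caller already holds instead of building a replacement. The following is a minimal sketch, not a definitive implementation. It assumes Gson 2.3 or later (for `JsonArray.set`), and the class name `RareWordReplacer`, the method name `replaceRareInPlace`, and the shortened sample input are made up for illustration.

```java
import com.google.gson.JsonArray;
import com.google.gson.JsonElement;
import com.google.gson.JsonParser;
import com.google.gson.JsonPrimitive;

import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class RareWordReplacer {
    // Rare-word list from the question.
    static final Set<String> rareWords =
            new HashSet<>(Arrays.asList("late", "populate", "convicts"));

    /** Replaces rare leaf words by mutating the arrays in place. */
    public static void replaceRareInPlace(JsonArray tree) {
        // A leaf looks like ["TAG", "word"]: exactly two primitive elements.
        if (tree.size() == 2
                && tree.get(0).isJsonPrimitive()
                && tree.get(1).isJsonPrimitive()) {
            // getAsString() returns the word without surrounding quotes.
            if (rareWords.contains(tree.get(1).getAsString())) {
                // Overwrite the word inside the existing array, so every
                // holder of this array (including the parent) sees the change.
                tree.set(1, new JsonPrimitive("_RARE_"));
            }
            return;
        }
        // Inner node: recurse into each child that is itself an array.
        for (JsonElement child : tree) {
            if (child.isJsonArray()) {
                replaceRareInPlace(child.getAsJsonArray());
            }
        }
    }

    public static void main(String[] args) {
        // Shortened, hypothetical input in the same shape as the question's tree.
        String strJSON =
                "[\"S\", [\"NP\", [\"ADJ\", \"late\"], [\"NOUN\", \"colony\"]], [\".\", \"?\"]]";
        JsonArray tree = new JsonParser().parse(strJSON).getAsJsonArray();
        replaceRareInPlace(tree);
        System.out.println(tree);
    }
}
```

Because only leaves of the form ["TAG", "word"] are modified, the tags and the overall tree shape are left untouched, and the caller's `tree` reflects the change without needing a return value.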
strJSON.replaceAll("(late|populate|convicts)", "_RARE_")
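Two things to watch with the one-liner above: `String` is immutable, so the result of `replaceAll` has to be assigned to something, and a bare alternation also matches inside longer words (for example "late" inside "plate"). A hedged sketch of a stricter variant is below; it matches only whole quoted tokens and builds the pattern from the word list. The class name `RareRegex` and the method name `replaceRare` are made up for illustration, and the approach assumes the words contain no JSON escape sequences and that no tag is spelled the same as a rare word.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class RareRegex {
    /** Replaces whole quoted tokens such as "late" with "_RARE_" in raw JSON text. */
    public static String replaceRare(String json, List<String> rareWords) {
        // Quote each word so regex metacharacters in it are taken literally.
        String alternatives = rareWords.stream()
                .map(Pattern::quote)
                .collect(Collectors.joining("|"));
        // Require the surrounding double quotes so only complete tokens match.
        Pattern pattern = Pattern.compile("\"(?:" + alternatives + ")\"");
        return pattern.matcher(json)
                .replaceAll(Matcher.quoteReplacement("\"_RARE_\""));
    }

    public static void main(String[] args) {
        String strJSON = "[\"NP\", [\"ADJ\", \"late\"], [\"NOUN\", \"plate\"]]";
        // The result must be assigned; the original string is not modified.
        strJSON = replaceRare(strJSON, Arrays.asList("late", "populate", "convicts"));
        System.out.println(strJSON);
    }
}
```

This works on the text before parsing, so it avoids walking the tree at all, at the cost of being blind to the JSON structure.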