I need to group CSV columns. The input looks like this:

    User ID  Group
    ABC      Group1
    DEF      Group2
    ABC      Group3
    GHI      Group4
    XYZ      Group2
    UVW      Group5
    XYZ      Group1
    ABC      Group1
    DEF      Group2

The output should be:

    ABC Group1 -> 2
    ABC Group3 -> 1
    DEF Group2 -> 2
    GHI Group4 -> 1
    UVW Group5 -> 1
    XYZ Group2 -> 1
    XYZ Group1 -> 1

I also need to group the data per user ID. For example, for ABC: ((Group1 occurs twice) / (total number of occurrences of ABC)) + ((Group3 occurs once) / (total number of occurrences of ABC)), so ABC --> 2/3 + 1/3.

    ABC --> 2/3 (no. of occurrences of ABC) + 1/3
    DEF --> 2/2
    GHI --> 1/1
    UVW --> 1/1
    XYZ --> 1/2 + 1/2

The first set of results is obtained using the Guava library:
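
    import com.google.common.collect.Multiset;
    import com.google.common.collect.TreeMultiset;
    import java.io.BufferedReader;
    import java.io.FileReader;

    // Count each "userId,group" pair; TreeMultiset keeps the keys sorted.
    Multiset<String> set = TreeMultiset.create();
    BufferedReader reader = null;
    try {
        reader = new BufferedReader(new FileReader("test.csv"));
        String line;
        // readLine() returns null at end of file
        while ((line = reader.readLine()) != null) {
            String[] currLineSplitted = line.split(",");
            set.add(currLineSplitted[0] + "," + currLineSplitted[1]);
        }
        for (String key : set.elementSet()) {
            System.out.println(key + " : " + set.count(key));
        }
    } finally {
        if (reader != null) {
            reader.close();
        }
    }

I'm not sure how to get the second result by grouping.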

What does XYZ-->1/2+1/2 mean? You wrote 2/2 (no. of occurrences of ABC), so I guess (but that's not clear) that the second number is the number of occurrences, but what's the first? What does the number of occurrences refer to: global occurrences or per group?
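
Here is a minimal sketch of one possible reading of the second result. It assumes each term is (rows with this user ID and this group) / (all rows with this user ID), which is only a guess at what the question intends. It uses plain java.util maps rather than Guava. The class name GroupShares, the method shares, and the inline sample rows (given without the header line) are hypothetical stand-ins for reading test.csv.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class GroupShares {

    // Hypothetical helper: takes "userId,group" lines and returns, per user ID,
    // a string of count/total terms joined by "+".
    static Map<String, String> shares(List<String> lines) {
        // user ID -> (group -> number of rows with that pair), both sorted by key
        Map<String, Map<String, Integer>> counts = new TreeMap<>();
        for (String line : lines) {
            String[] parts = line.split(",");
            if (parts.length < 2) {
                continue; // skip blank or malformed rows
            }
            counts.computeIfAbsent(parts[0].trim(), k -> new TreeMap<>())
                  .merge(parts[1].trim(), 1, Integer::sum);
        }

        Map<String, String> result = new TreeMap<>();
        for (Map.Entry<String, Map<String, Integer>> entry : counts.entrySet()) {
            // Total number of rows for this user ID (assumed denominator)
            int total = 0;
            for (int count : entry.getValue().values()) {
                total += count;
            }
            List<String> terms = new ArrayList<>();
            for (int count : entry.getValue().values()) {
                terms.add(count + "/" + total);
            }
            result.put(entry.getKey(), String.join("+", terms));
        }
        return result;
    }

    public static void main(String[] args) {
        // Sample rows from the question, assumed to be comma-separated
        List<String> lines = Arrays.asList(
                "ABC,Group1", "DEF,Group2", "ABC,Group3", "GHI,Group4", "XYZ,Group2",
                "UVW,Group5", "XYZ,Group1", "ABC,Group1", "DEF,Group2");
        shares(lines).forEach((user, s) -> System.out.println(user + "-->" + s));
    }
}
```

With the sample rows this prints ABC-->2/3+1/3, DEF-->2/2, GHI-->1/1, UVW-->1/1 and XYZ-->1/2+1/2. If the denominator is meant to be something else, such as the global row count, only the computation of total would change.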