
I need to group the rows of a CSV file by its two columns. The input looks like this:

    User ID  Group
    ABC      Group1
    DEF      Group2
    ABC      Group3
    GHI      Group4
    XYZ      Group2
    UVW      Group5
    XYZ      Group1
    ABC      Group1
    DEF      Group2

The first output should count how often each (User ID, Group) pair occurs:

    ABC Group1 ->2
    ABC Group3 ->1
    DEF Group2 ->2
    GHI Group4 ->1
    UVW Group5 ->1
    XYZ Group2 ->1
    XYZ Group1 ->1

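I then need to group the data per User ID, so that each group's count is divided by the total number of occurrences of that User ID. For example, for ABC: (Group1 occurs twice)/(total number of occurrences of ABC) + (Group3 occurs once)/(total number of occurrences of ABC), so ABC-->2/3+1/3. The full second output would be: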

    ABC-->2/3+1/3    (3 = no. of occurrences of ABC)
    DEF-->2/2
    GHI-->1/1
    UVW-->1/1
    XYZ-->1/2+1/2

I get the first set of results using the Guava library:

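    import com.google.common.collect.Multiset;
    import com.google.common.collect.TreeMultiset;
    import java.io.BufferedReader;
    import java.io.FileReader;

    // Sorted multiset of "userID,group" keys; count(key) gives the occurrences
    Multiset<String> set = TreeMultiset.create();
    // try-with-resources closes the reader even if an exception is thrown
    try (BufferedReader reader = new BufferedReader(new FileReader("test.csv"))) {
        String line;
        while ((line = reader.readLine()) != null) {
            String[] currLineSplitted = line.split(",");
            set.add(currLineSplitted[0] + "," + currLineSplitted[1]);
        }
        for (String key : set.elementSet()) {
            System.out.println(key + " : " + set.count(key));
        }
    }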

I am not sure how to get the second result by grouping.

  • Very unclear. What do all the numbers mean? What exactly do you want? Commented Aug 20, 2014 at 11:46
  • I don't get the second grouping, could you explain the syntax? What does XYZ-->1/2+1/2 mean? You wrote 2/2(no. of occurrences of ABC), so I guess (but that's not clear) that the second number is the number of occurrences, but what's the first? What does the number of occurrences refer to? Global occurrences or per group? Commented Aug 20, 2014 at 11:46
  • A better explanation of the 2nd output would help to give you a solution. Commented Aug 20, 2014 at 11:48
  • In ABC-->((Group1 occurs twice)/(total number of occurrences of ABC))+((Group3 occurs once)/(total number of occurrences of ABC)), so ABC-->2/3+1/3. Commented Aug 20, 2014 at 14:26

1 Answer


You should use a map of collections instead of a plain set. Something like this:

    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.io.IOException;
    import java.util.Map;
    import java.util.TreeMap;

    public class GroupCounts {
        public static void main(String[] args) throws IOException {
            // userID -> (group -> number of occurrences)
            // EDITED: TreeMap keeps the keys sorted; pass your own Comparator
            // to the constructor if you need a different order
            Map<String, Map<String, Integer>> supermap = new TreeMap<>();
            try (BufferedReader reader = new BufferedReader(new FileReader("test.csv"))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    String[] currLineSplitted = line.split(",");
                    if (currLineSplitted.length < 2) {
                        continue; // skip blank or malformed lines
                    }
                    Map<String, Integer> innermap = supermap.get(currLineSplitted[0]);
                    if (innermap == null) {
                        innermap = new TreeMap<>();
                        supermap.put(currLineSplitted[0], innermap);
                    }
                    Integer count = innermap.get(currLineSplitted[1]);
                    // EDITED: a group seen for the first time starts at 1
                    innermap.put(currLineSplitted[1], count == null ? 1 : count + 1);
                }
            }
            for (String userID : supermap.keySet()) {
                Map<String, Integer> m = supermap.get(userID);
                //===========first result=============
                for (String group : m.keySet()) {
                    System.out.println(userID + " " + group + " : " + m.get(group));
                }
                //=====================================
            }
            for (String userID : supermap.keySet()) {
                Map<String, Integer> m = supermap.get(userID);
                //===========second result=============
                int total = 0; // total occurrences of this userID across all its groups
                for (int count : m.values()) {
                    total += count;
                }
                StringBuilder sb = new StringBuilder(userID + "-->");
                String separator = "";
                for (String group : m.keySet()) {
                    sb.append(separator).append(m.get(group)).append("/").append(total);
                    separator = "+";
                }
                System.out.println(sb.toString());
                //=====================================
            }
        }
    }

EDIT: My mistake: the Integer counts must be created with 1 as the start value. The entries can be sorted with whatever ordering suits your case.
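To make the map-of-maps idea easier to try out, here is a minimal self-contained sketch that works on rows held in memory instead of reading test.csv. The class name GroupShares and the helpers countByUserAndGroup and shares are made up for this illustration, and it assumes each row is a "userID,group" string and that the denominator is the total number of rows for that user, as described in the question.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;
import java.util.TreeMap;

public class GroupShares {

    // Builds userID -> (group -> count). TreeMap keeps both levels sorted by key.
    static Map<String, Map<String, Integer>> countByUserAndGroup(List<String> lines) {
        Map<String, Map<String, Integer>> counts = new TreeMap<>();
        for (String line : lines) {
            String[] parts = line.split(",");
            if (parts.length < 2) {
                continue; // skip blank or malformed rows
            }
            counts.computeIfAbsent(parts[0].trim(), k -> new TreeMap<>())
                  .merge(parts[1].trim(), 1, Integer::sum); // first sighting starts at 1
        }
        return counts;
    }

    // Formats one user as "ABC-->2/3+1/3": each group count over the user's total.
    static String shares(String userId, Map<String, Integer> groupCounts) {
        int total = 0;
        for (int count : groupCounts.values()) {
            total += count;
        }
        StringJoiner terms = new StringJoiner("+", userId + "-->", "");
        for (int count : groupCounts.values()) {
            terms.add(count + "/" + total);
        }
        return terms.toString();
    }

    public static void main(String[] args) {
        // Sample rows taken from the question (header row left out).
        List<String> lines = Arrays.asList(
                "ABC,Group1", "DEF,Group2", "ABC,Group3", "GHI,Group4", "XYZ,Group2",
                "UVW,Group5", "XYZ,Group1", "ABC,Group1", "DEF,Group2");
        Map<String, Map<String, Integer>> counts = countByUserAndGroup(lines);
        for (Map.Entry<String, Map<String, Integer>> user : counts.entrySet()) {
            System.out.println(shares(user.getKey(), user.getValue()));
        }
    }
}
```

For the sample rows, ABC is printed as ABC-->2/3+1/3. Because the inner maps are sorted by group name, the terms for XYZ appear in the order Group1, Group2 rather than in input order.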


1 Comment

This is what I get from the above approach: XYZGroup1 : 0, ABCGroup1 : 0, DEFGroup2 : 0, GHIGroup4 : 0, UVWGroup5 : 0, XYZ-->0/1, ABC-->0/1, DEF-->0/1, GHI-->0/1, UVW-->0/1
