2

I have a text file containing data in tag-value format. I want to parse this file to build a trie. What would be the best approach?

Sample of the file (a string inside "" is a tag, and '#' comments out the line):

    #Hi, this is a sample file.
    "abcd" = 12;
    "abcde" = 16;
    "http" = 32;
    "sip" = 21;
  • What is a Trie? If you mean Tree, this data is not in a tree structure. Homework tag? Commented Jun 17, 2010 at 20:27
  • This is a trie. Commented Jun 17, 2010 at 20:30
  • @Byron A Trie is a data structure similar to a Tree. See en.wikipedia.org/wiki/Trie Commented Jun 17, 2010 at 20:30
  • @Hank Thank you. I've not heard of that data structure before. Commented Jun 17, 2010 at 20:34
  • @Byron It's somewhat obscure. I don't think I've seen one "in the wild" because they aren't really targeted at my domain. Commented Jun 17, 2010 at 20:47

4 Answers

5

This is basically a properties file. I would remove the " around the tags, then use the Properties class (http://java.sun.com/javase/6/docs/api/java/util/Properties.html#load(java.io.Reader)) to load the file.
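As a rough illustration of this suggestion, the sketch below loads tag/value text through `Properties.load(Reader)`. The class name `PropertiesLoadSketch` and the inline sample string are assumptions made for the example, and the quotes are assumed to have been removed from the tags already. Note that the trailing `;` is not part of the properties syntax, so it remains in the value and still has to be trimmed.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

public class PropertiesLoadSketch {
    // Loads tag/value text with java.util.Properties; lines starting with '#' are skipped as comments.
    public static Properties load(String text) throws IOException {
        Properties props = new Properties();
        props.load(new StringReader(text));
        return props;
    }

    public static void main(String[] args) throws IOException {
        // Assumption: the quotes around the tags were already removed from the file.
        String text = "#Hi, this is a sample file.\n"
                + "abcd = 12;\n"
                + "abcde = 16;\n";
        Properties props = load(text);
        // The ';' is kept as part of the value.
        System.out.println(props.getProperty("abcd")); // prints "12;"
    }
}
```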

1 Comment

Thanks for enlightening me with this new Class.
5

Read that in using Properties and trim the excess parts (", ; and whitespace). Short example:

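    // load() is an instance method (and throws IOException), so create the object first
    Properties props = new Properties();
    props.load(this.getClass().getResourceAsStream("path/to.file"));

    Map<String, String> cleanedProps = new HashMap<String, String>();
    for (Map.Entry<Object, Object> pair : props.entrySet()) {
        cleanedProps.put(cleanKey((String) pair.getKey()),
                         cleanValue((String) pair.getValue()));
    }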

Note that in the solution above you only need to implement cleanKey() and cleanValue() yourself. You may want to change the data types as needed; I used Strings just as an example.
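For illustration, the two helpers might look something like the sketch below. The answer leaves their implementation open, so the class name `TagCleaner` and the exact trimming rules (surrounding double quotes on the key, one trailing `;` on the value) are assumptions based on the sample file.

```java
public class TagCleaner {
    // Hypothetical helper: strips whitespace and the surrounding double quotes from a tag.
    public static String cleanKey(String rawKey) {
        String key = rawKey.trim();
        if (key.length() >= 2 && key.startsWith("\"") && key.endsWith("\"")) {
            key = key.substring(1, key.length() - 1);
        }
        return key;
    }

    // Hypothetical helper: drops one trailing ';' and any surrounding whitespace from a value.
    public static String cleanValue(String rawValue) {
        String value = rawValue.trim();
        if (value.endsWith(";")) {
            value = value.substring(0, value.length() - 1).trim();
        }
        return value;
    }

    public static void main(String[] args) {
        System.out.println(cleanKey("\"abcd\"") + " -> " + cleanValue("12;")); // prints "abcd -> 12"
    }
}
```

If the values are always numeric, `Integer.parseInt(cleanValue(...))` would turn them into integers.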

2 Comments

Or simplify the source file so it doesn't have these extra characters. ;)
He gave us the input to work with, I'm just working with what I got.
1

There are many ways to do this; others have mentioned that java.util.Properties gets most of the job done, and is probably the most robust solution.

One other option is to use a java.util.Scanner.

Here's an example that scans a String for simplicity:

    String text =
        "#Hi, this is a sample file.\n" +
        "\n" +
        "\"abcd\" = 12; \r\n" +
        "\"abcde\"=16;\n" +
        " # \"ignore\" = 13;\n" +
        "\"http\" = 32; # Comment here \r" +
        "\"zzz\" = 666; # Out of order! \r" +
        " \"sip\" = 21 ;";

    System.out.println(text);
    System.out.println("----------");

    SortedMap<String,Integer> map = new TreeMap<String,Integer>();
    // Quotes, '=', ';' and spaces separate tokens; line breaks do not
    Scanner sc = new Scanner(text).useDelimiter("[\"=; ]+");
    while (sc.hasNextLine()) {
        // Only lines whose first token is a lowercase tag hold a pair
        if (sc.hasNext("[a-z]+")) {
            map.put(sc.next(), sc.nextInt());
        }
        sc.nextLine(); // skip the rest of the line (comments included)
    }
    System.out.println(map);

This prints (as seen on ideone.com):

    #Hi, this is a sample file.

    "abcd" = 12;
    "abcde"=16;
     # "ignore" = 13;
    "http" = 32; # Comment here
    "zzz" = 666; # Out of order!
     "sip" = 21 ;
    ----------
    {abcd=12, abcde=16, http=32, sip=21, zzz=666}
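The question asks for a trie, and the map built by the Scanner loop already holds the parsed pairs, so what remains is inserting each entry into a trie. The sketch below is one minimal way to do that. `TagTrie`, `put` and `get` are hypothetical names rather than anything from the JDK, and integer values are assumed.

```java
import java.util.HashMap;
import java.util.Map;

public class TagTrie {
    // One node per character; value is non-null only where a tag ends.
    private static final class Node {
        final Map<Character, Node> children = new HashMap<Character, Node>();
        Integer value;
    }

    private final Node root = new Node();

    // Walks down one node per character of the tag, creating nodes as needed, then stores the value.
    public void put(String tag, int value) {
        Node node = root;
        for (char c : tag.toCharArray()) {
            Node next = node.children.get(c);
            if (next == null) {
                next = new Node();
                node.children.put(c, next);
            }
            node = next;
        }
        node.value = value;
    }

    // Returns the stored value, or null if the tag was never inserted.
    public Integer get(String tag) {
        Node node = root;
        for (char c : tag.toCharArray()) {
            node = node.children.get(c);
            if (node == null) {
                return null;
            }
        }
        return node.value;
    }

    public static void main(String[] args) {
        TagTrie trie = new TagTrie();
        trie.put("abcd", 12);
        trie.put("abcde", 16);
        // "abc" is only a prefix of inserted tags, so it has no value.
        System.out.println(trie.get("abcd") + " " + trie.get("abcde") + " " + trie.get("abc")); // prints "12 16 null"
    }
}
```

Filling it from the parsed map would then be a loop over `map.entrySet()` that calls `trie.put(entry.getKey(), entry.getValue())`.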

0

The most natural way is probably this:

    void doParse() {
        String text =
            "#Hi, this is a sample file.\n" +
            "\"abcd\" = 12;\n" +
            "\"abcde\" = 16;\n" +
            "#More comment\n" +
            "\"http\" = 32;\n" +
            "\"sip\" = 21;";

        Matcher matcher = Pattern.compile("\"(.+)\" = ([0-9]+)").matcher(text);
        while (matcher.find()) {
            String txt = matcher.group(1);
            int val = Integer.parseInt(matcher.group(2));
            System.out.format("parsed: %s , %d%n", txt, val);
        }
    }

1 Comment

I think this will still find a key/value pair on a line that is commented with #.
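One possible way to address this is to anchor the pattern at the start of each line with `Pattern.MULTILINE`, so that a line beginning with `#` cannot match. The sketch below is a variation on the answer's regex, not the answerer's own code. The class name `AnchoredTagParser` is hypothetical, and it assumes at most one pair per line, terminated by `;`.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AnchoredTagParser {
    // With MULTILINE, ^ matches at the start of every line, so a pair is only
    // accepted when the quoted tag is the first non-blank thing on its line.
    private static final Pattern PAIR = Pattern.compile(
            "^[ \\t]*\"([^\"]+)\"[ \\t]*=[ \\t]*([0-9]+)[ \\t]*;", Pattern.MULTILINE);

    public static Map<String, Integer> parse(String text) {
        Map<String, Integer> result = new LinkedHashMap<String, Integer>();
        Matcher matcher = PAIR.matcher(text);
        while (matcher.find()) {
            result.put(matcher.group(1), Integer.parseInt(matcher.group(2)));
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "#Hi, this is a sample file.\n"
                + "\"abcd\" = 12;\n"
                + "# \"ignore\" = 13;\n"
                + "\"http\" = 32; # Comment here\n";
        System.out.println(parse(text)); // prints "{abcd=12, http=32}"
    }
}
```

Text after the `;` is simply never consumed by the match, which is why the trailing comment on the `http` line does no harm.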
