FlowLogParser is a Java application designed to process and analyze network flow logs using concurrent processing. It parses flow log files, categorizes traffic based on ports and protocols, and generates statistical reports.
- Default Protocol Support:
  - Limited to 8 predefined protocols in the default PROTOCOL_MAP
  - Any protocol/port combination not in the map is labeled as "Untagged"
  - Custom protocol mappings can be provided via the optional protocol_map_file
- Flow Log Version:
  - Assumes VPC Flow Logs version 2 format
  - No header row in the flow log file
  - Each line must contain at least 8 fields
  - Lines with fewer than 8 fields are skipped
  - Fields are space-separated
- Field Requirements:
  - protocol (field 8): Must be a numeric protocol identifier
  - Invalid or malformed fields result in the line being skipped
  - Other fields are not validated because they are not used in the analysis
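
The line-level rules above (space-separated fields, at least 8 fields, numeric protocol in field 8, malformed lines skipped) can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the class `FlowLineSketch`, the method `portProtocolKey`, and the map `SAMPLE_PROTOCOLS` are hypothetical names, and the map holds only three well-known IANA protocol numbers rather than the contents of PROTOCOL_MAP. Reading the destination port from field 7 follows the default AWS version 2 field order and is an assumption about what the parser uses; falling back to the raw protocol number for unmapped protocols is likewise an assumption.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch; names are illustrative and not taken from FlowLogParser.
final class FlowLineSketch {
    // Illustrative subset of IANA protocol numbers, NOT the project's PROTOCOL_MAP.
    static final Map<Integer, String> SAMPLE_PROTOCOLS =
            Map.of(1, "icmp", 6, "tcp", 17, "udp");

    static final int MIN_FIELDS = 8;

    /** Returns "dstport,protocol" for a usable line, or empty if the line should be skipped. */
    static Optional<String> portProtocolKey(String line) {
        String trimmed = line.trim();
        if (trimmed.isEmpty()) {
            return Optional.empty();
        }
        // Fields are space-separated; "\\s+" also tolerates repeated spaces.
        String[] fields = trimmed.split("\\s+");
        if (fields.length < MIN_FIELDS) {
            return Optional.empty(); // too few fields: skip
        }
        try {
            int dstPort = Integer.parseInt(fields[6]);  // field 7 in the default v2 order (assumption)
            int protocol = Integer.parseInt(fields[7]); // field 8: numeric protocol identifier
            // Assumption: unmapped protocols fall back to their raw number.
            String name = SAMPLE_PROTOCOLS.getOrDefault(protocol, String.valueOf(protocol));
            return Optional.of(dstPort + "," + name);
        } catch (NumberFormatException e) {
            return Optional.empty(); // non-numeric port or protocol: skip
        }
    }

    public static void main(String[] args) {
        String valid = "2 123456789012 eni-0a1b2c3d 10.0.1.201 198.51.100.2 443 49153 6 "
                + "25 20000 1620140761 1620140821 ACCEPT OK";
        System.out.println(portProtocolKey(valid));                  // Optional[49153,tcp]
        System.out.println(portProtocolKey("2 123 eni-1 10.0.0.1")); // Optional.empty
    }
}
```

Returning `Optional.empty()` for every skip case keeps the caller free of exception handling, which is convenient when many lines are processed in parallel.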
- Each valid line increments both tag and port-protocol counters
- Duplicate lines are counted separately
- "Untagged" is used when no matching tag is found in lookup table

Clone the repository and compile:

    git clone https://github.com/nachivrn/FlowLogParser.git
    cd FlowLogParser
    javac src/FlowLogParser.java -d .
    javac -cp .:lib/junit-platform-console-standalone-1.8.2.jar src/FlowLogParserTest.java -d .

Run the code:

    # Usage: java FlowLogParser <flow_log_file> <lookup_table_file> <output_file> [protocol_map_file]
    java FlowLogParser ./data/flowlogfile.txt ./data/lookuptable.csv ./data/output.txt

Check output:

    cat ./data/output.txt

Run the Unit & Functional tests:

    java -jar lib/junit-platform-console-standalone-1.8.2.jar --class-path . --select-class FlowLogParserTest

- Basic Functionality Tests
  - Validates lookup table loading
  - Tests single line processing
  - Validates concurrent processing
- Edge Cases
  - Tests duplicate entries
  - Tests malformed input
  - Tests empty files
- Error Handling Tests
  - Invalid protocol numbers
  - Missing fields
  - Malformed input lines
- Tested with 100,000+ flow log records (approximately 10 MB) and a lookup table with 10,000 entries
- Measured processing time for large datasets
- Confirmed proper thread utilization
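
For readers curious how the counting rules (both counters incremented per valid line, duplicates counted separately, "Untagged" as the fallback) can stay correct under concurrent processing, here is one possible sketch. It is not the project's implementation: `ConcurrentCountSketch`, `record`, and `recordAll` are hypothetical names, the tags `web` and `email` are made-up lookup entries, and the use of a fixed thread pool with `ConcurrentHashMap` and `LongAdder` is an assumed design, chosen because those classes allow lock-free increments from many threads.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical sketch; names and design are assumptions, not FlowLogParser's code.
final class ConcurrentCountSketch {
    final Map<String, LongAdder> tagCounts = new ConcurrentHashMap<>();
    final Map<String, LongAdder> portProtocolCounts = new ConcurrentHashMap<>();

    /** Counts one already-validated "port,protocol" key; safe to call from many threads. */
    void record(String key, Map<String, String> lookup) {
        String tag = lookup.getOrDefault(key, "Untagged");
        tagCounts.computeIfAbsent(tag, k -> new LongAdder()).increment();
        portProtocolCounts.computeIfAbsent(key, k -> new LongAdder()).increment();
    }

    /** Records every key on a fixed-size pool and waits for all tasks to finish. */
    void recordAll(List<String> keys, Map<String, String> lookup, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (String key : keys) {
            pool.submit(() -> record(key, lookup));
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("counting did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while counting", e);
        }
    }

    public static void main(String[] args) {
        // Made-up lookup entries for illustration only.
        Map<String, String> lookup = Map.of("443,tcp", "web", "25,tcp", "email");
        List<String> keys = List.of("443,tcp", "443,tcp", "25,tcp", "9999,udp");

        ConcurrentCountSketch counts = new ConcurrentCountSketch();
        counts.recordAll(keys, lookup, 4);

        System.out.println(counts.tagCounts.get("web").sum());              // 2 (duplicate counted twice)
        System.out.println(counts.tagCounts.get("Untagged").sum());         // 1
        System.out.println(counts.portProtocolCounts.get("443,tcp").sum()); // 2
    }
}
```

Submitting one task per key keeps the sketch short; with 100,000+ records, handing each worker a batch of lines would likely reduce scheduling overhead.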