I have a very large text file to parse for some information. On each line I check for certain keywords (I call them "flags"). Once I find a flag, I gather the data that comes right after it (usually just a name or a number) with the method below, which works:
```python
def findValue(string, flag):
    string = string.strip()
    startIndex = string.find(flag) + len(flag)
    index = startIndex
    char = string[index:index+1]
    while char != " " and index < len(string):
        index += 1
        char = string[index:index+1]
    endIndex = index
    return string[startIndex:endIndex]
```

However, it would be much simpler to just use split() with whitespace as the separator and take the next item in the resulting list, rather than "crawling" through the characters.
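For reference, here is a minimal sketch of that split()-based alternative; the name `find_value_split` is hypothetical, and it assumes the flag appears as its own whitespace-delimited token with the value as the token right after it:

```python
def find_value_split(line, flag):
    # Split the line on runs of whitespace and return the token
    # that immediately follows the flag token, if any.
    tokens = line.split()
    try:
        return tokens[tokens.index(flag) + 1]
    except (ValueError, IndexError):
        return ""  # flag not found, or nothing after it
```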
The log files I am parsing are really large (around 1.5 million lines or more), so I would like to know whether, and by how much, using split() on each line would hurt efficiency compared to my current method.
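Rather than guessing, a quick micro-benchmark with timeit can show the actual difference on a representative line. This sketch assumes `findValue` from the question and the hypothetical `find_value_split` above are both defined; the sample line and flag are made up:

```python
import timeit

# Made-up sample line; the trailing space in "FLAG " matters for findValue,
# which starts reading the value immediately after the flag substring.
sample = "timestamp=12:00:01 FLAG value42 other fields here"

crawl_time = timeit.timeit(lambda: findValue(sample, "FLAG "), number=100_000)
split_time = timeit.timeit(lambda: find_value_split(sample, "FLAG"), number=100_000)

print(f"character crawl: {crawl_time:.3f}s  split(): {split_time:.3f}s")
```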
- `char = string[index:index+1]` creates a new string on every iteration, which is very inefficient. `split(string[startIndex:])` would be much faster than your current method.
- Does `string` contain the entire contents of the file, or just a single line?
- Isn't `string[i]` equivalent to `string[i:i+1]`? And maybe a bit more efficient?
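A sketch of the variant suggested in the first comment: slice the line once after the flag and split the remainder instead of crawling character by character. The name `find_value_remainder` is hypothetical; splitting on a single space is used here to mirror the original loop, which stops at the next literal space:

```python
def find_value_remainder(line, flag):
    # Slice once after the flag, then split the remainder instead of
    # building a one-character string per loop iteration.
    start = line.find(flag)
    if start == -1:
        return ""                    # flag not present
    rest = line[start + len(flag):]
    return rest.split(" ", 1)[0]     # everything up to the next space
```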