0

I have a text file that consists of multiple blocks such as the one shown here:

TestVar 00000000 WWWWWW 222.222 222.222 222.222 UNKNOWN ,,,,,,,, ,,,,,, ,,,,,, 

Each part is always 8 characters long (e.g. "TestVar ", "00000000"). From each line that starts with TestVar, I would like the code to return the following output:

WWWWWW_00000000 

Can someone help me with this? I have used regex before, but never with Python, and I am quite new to both of them.

Thanks

2
  • 3
    Do you really need regexp? Could you just split each line and take the 2nd and 3rd elements? Commented Sep 12, 2012 at 13:43
  • @PierreGM - A regex is easy to use here, since the OP also needs to check that the string begins with TestVar, so it can all be done in one step. Commented Sep 12, 2012 at 13:48
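
The split-based alternative raised in the first comment could look roughly like this. It is only a minimal sketch, assuming the fields are separated by whitespace and never contain spaces themselves; `extract_ids` is a hypothetical helper name, not something from the question.

```python
def extract_ids(text):
    """Return 'THIRD_SECOND' for every line whose first field is TestVar."""
    results = []
    for line in text.splitlines():
        fields = line.split()  # split on any run of whitespace
        # Need at least three fields, and the first must be exactly TestVar.
        if len(fields) >= 3 and fields[0] == "TestVar":
            results.append(fields[2] + "_" + fields[1])
    return results


sample = (
    "TestVar 00000000 WWWWWW 222.222 222.222 222.222\n"
    "UNKNOWN ,,,,,,,, ,,,,,, ,,,,,,\n"
)
print(extract_ids(sample))
```

Unlike the regex answers, this version does not check that the second field is exactly eight digits; that check would have to be added separately if it matters.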

4 Answers

2

Assuming you don't want us to write the code for you, here is a link that is quite specific: http://docs.python.org/howto/regex.html#regex-howto

Keep in mind that you will likely want to use findall(), and write your patterns as raw strings (r'...') so you don't constantly need to double the backslashes.

You might want to show us the code you have already written that isn't working, so we can help you better. Good luck.
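
To illustrate the raw-string advice, here is a small sketch. The pattern is an assumption based on the sample line in the question (an 8-digit second field followed by a non-whitespace third field), and the sample text is shortened for brevity.

```python
import re

# A raw string keeps backslashes literal, so \s and \d reach the regex engine
# unchanged; the non-raw equivalent needs every backslash doubled.
raw_pattern = r"^TestVar\s+(\d{8})\s+(\S+)"
escaped_pattern = "^TestVar\\s+(\\d{8})\\s+(\\S+)"
assert raw_pattern == escaped_pattern

text = "TestVar 00000000 WWWWWW 222.222\nUNKNOWN ,,,,,,,,\n"
# re.MULTILINE makes ^ match at the start of every line, not just the string.
pairs = re.findall(raw_pattern, text, re.MULTILINE)
print(pairs)
```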


1

With the regex pattern ^TestVar\s+(\d{8})\s+(\S+) you can get that output as follows:

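    import re

    # Raw string so the regex backslashes are not treated as string escapes.
    p = re.compile(r'^TestVar\s+(\d{8})\s+(\S+)')
    m = p.match('TestVar 00000000 WWWWWW 222.222 222.222 222.222')
    if m:
        # group(1) is the 8-digit field, group(2) is the field after it.
        print('Match found: ' + m.group(2) + '_' + m.group(1))
    else:
        print('No match')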

Test this demo here.


To find all occurrences in a multiline input string, use:

    p = re.compile(r"^TestVar\s+(\d{8})\s+(\S+)", re.MULTILINE)
    m = p.findall(input_string)  # input_string holds the full multiline text

To learn more about regex with Python, see http://docs.python.org/howto/regex.html

1 Comment

First of all, thank you very much. Could anyone by chance explain how to put this regex into Python code to check it against a string?
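
One possible way to address the comment above is to apply the pattern to each line of a text file, as in this sketch. The helper name `ids_from_file` and the temporary sample file are made up for illustration; the real file path would come from your own setup.

```python
import os
import re
import tempfile

PATTERN = re.compile(r"^TestVar\s+(\d{8})\s+(\S+)")


def ids_from_file(path):
    """Yield 'LETTERS_NUMBER' for each line of the file that starts with TestVar."""
    with open(path) as handle:
        for line in handle:
            match = PATTERN.match(line)  # match() only matches at the line start
            if match:
                yield match.group(2) + "_" + match.group(1)


# Write a small sample file so the sketch runs on its own.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("TestVar 00000000 WWWWWW 222.222 222.222 222.222\n")
    tmp.write("UNKNOWN ,,,,,,,, ,,,,,, ,,,,,,\n")
    sample_path = tmp.name

results = list(ids_from_file(sample_path))
os.remove(sample_path)
print(results)
```

Reading line by line means re.MULTILINE is not needed here, since each line is matched on its own.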
1

You mention multiple occurrences of the pattern, in which case you could use re.findall along with re.MULTILINE:

    input_string = """
    TestVar 00000000 WWWWWW 222.222 222.222 222.222
    UNKNOWN ,,,,,,,, ,,,,,, ,,,,,,
    TestVar 22222222 AAAAAA 222.222 222.222 222.222
    UNKNOWN ,,,,,,,, ,,,,,, ,,,,,,
    """

    import re

    pat = re.compile(r"^TestVar\s+(\d{8})\s+(\S+)", re.MULTILINE)
    matches = pat.findall(input_string)
    # Result: matches == [('00000000', 'WWWWWW'), ('22222222', 'AAAAAA')]
    for num, let in matches:
        # Letters first, then the number, as requested in the question.
        print("%s_%s" % (let, num))


0

Without regexp:

    lines = ["TestVar 00000000 WWWWWW 222.222 222.222 222.222",
             "UNKNOWN ,,,,,,,, ,,,,,, ,,,,,,"]
    # Cut each line into 8-character fields, then keep only the TestVar lines.
    fields = [[line[i:i+8] for i in range(0, len(line), 8)] for line in lines]
    print([toks[2].strip(' ') + '_' + toks[1]
           for toks in fields if toks[0] == 'TestVar '])
