5

I have the following code to match dates:

import re
date_reg_exp2 = re.compile(r'\d{2}([-/.])(\d{2}|[a-zA-Z]{3})\1(\d{4}|\d{2})|\w{3}\s\d{2}[,.]\s\d{4}')
matches_list = date_reg_exp2.findall("23-SEP-2015 and 23-09-2015 and 23-09-15 and Sep 23, 2015")
print(matches_list)

The output I expect is

["23-SEP-2015","23-09-2015","23-09-15","Sep 23, 2015"] 

What I am getting is:

[('-', 'SEP', '2015'), ('-', '09', '2015'), ('-', '09', '15'), ('', '', '')] 

Please check the link for regex here.

5 Comments
  • I think your first ( may be in the wrong place: the first two digits are not captured, and the first thing you've told it to capture is the [-/.] separator. Commented Dec 11, 2015 at 10:05
  • Really, it's a little difficult for a regex to do that. What about just using "23-SEP-2015 and 23-09-2015 and 23-09-15 and Sep 23, 2015".split(' and ') in this case? Commented Dec 11, 2015 at 10:07
  • In this case it works, but the input string is not actually separated by "and". It can be "This string is 23-09-2015", and it can also be something else. I need a match of ['23-09-2015']. Commented Dec 11, 2015 at 10:15
  • @SimonFraser I'm not good with regex. If you can help me with the above expression, that would be great. Commented Dec 11, 2015 at 10:17
  • @PalepuKartheek Have a look at the regex in my answer. It will handle the string from which you want to extract the date. Commented Dec 11, 2015 at 10:27

3 Answers

3

The problem is that when a pattern contains capturing groups, re.findall returns only the captured texts and not Group 0 (the whole match). Since you need the whole match, use re.finditer and take the group() value of each match:

matches_list = [x.group() for x in date_reg_exp2.finditer("23-SEP-2015 and 23-09-2015 and 23-09-15 and Sep 23, 2015")] 

See IDEONE demo

re.findall(pattern, string, flags=0)
Return all non-overlapping matches of pattern in string, as a list of strings... If one or more groups are present in the pattern, return a list of groups; this will be a list of tuples if the pattern has more than one group.

re.finditer(pattern, string, flags=0)
Return an iterator yielding MatchObject instances over all non-overlapping matches for the RE pattern in string.
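To make the difference concrete, here is a minimal sketch that runs both calls on the pattern and sample string from the question. The variable names groups_only and whole_matches are only illustrative and do not come from the original post.

```python
import re

# Pattern and sample text are copied from the question.
date_reg_exp2 = re.compile(
    r'\d{2}([-/.])(\d{2}|[a-zA-Z]{3})\1(\d{4}|\d{2})|\w{3}\s\d{2}[,.]\s\d{4}'
)
text = "23-SEP-2015 and 23-09-2015 and 23-09-15 and Sep 23, 2015"

# findall: the pattern has three capturing groups, so each item is a tuple
# of those groups. The last date matches the second alternative, where no
# group takes part, so its tuple holds three empty strings.
groups_only = date_reg_exp2.findall(text)

# finditer: each item is a match object; group() (same as group(0))
# returns the whole matched text.
whole_matches = [m.group() for m in date_reg_exp2.finditer(text)]

print(groups_only)
print(whole_matches)
```

Note that the ([-/.]) group cannot simply be made non-capturing here, because the pattern refers back to it with \1 to require the same separator twice.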


2

You could try this regex

date_reg_exp2 = re.compile(r'(\d{2}(/|-|\.)\w{3}(/|-|\.)\d{4})|([a-zA-Z]{3}\s\d{2}(,|-|\.|,)?\s\d{4})|(\d{2}(/|-|\.)\d{2}(/|-|\.)\d+)') 

Then use re.finditer()

for m in re.finditer(date_reg_exp2, "23-SEP-2015 and 23-09-2015 and 23-09-15 and Sep 23, 2015"):
    print(m.group())

The output will be:

23-SEP-2015
23-09-2015
23-09-15
Sep 23, 2015

4 Comments

Your regex also captures something like 55.123.4567, which I don't need. Also, the regex tester reports "Unescaped Forward Slash" for /, so I guess you need to use \/.
I think . does not occur in your dates, so it need not be included in the regex, and for / you can use \/.
In the example I have given above, . doesn't occur, but in real cases a lot of . occur (at least in my case).
Some cases I can think of: dd-mm-yyyy, dd/mm/yyyy, dd.mm.yyyy, dd-mon-yyyy, dd/mon/yyyy, dd.mon.yyyy, Mon dd, yyyy, and Mon dd. yyyy.
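One possible way to cover the formats listed above while rejecting strings such as 55.123.4567 is sketched below. It is only a sketch under assumptions: month names are taken to be three-letter English abbreviations, and MONTH, STRICT_DATE_RE and find_dates are hypothetical names invented for this illustration. It reuses the \1 backreference from the question so both separators must be the same, and adds digit lookarounds so a date cannot be cut out of a longer number. Inside a Python pattern, / needs no escaping.

```python
import re

# Assumption: months appear as three-letter English abbreviations.
MONTH = r'(?:jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)'

# Hypothetical stricter pattern, not taken from any of the answers.
STRICT_DATE_RE = re.compile(
    # dd<sep>mm<sep>yy(yy) or dd<sep>mon<sep>yy(yy), same separator twice,
    # not directly preceded or followed by another digit
    r'(?<!\d)\d{2}([-/.])(?:\d{2}|' + MONTH + r')\1(?:\d{4}|\d{2})(?!\d)'
    # Mon dd, yyyy or Mon dd. yyyy
    r'|\b' + MONTH + r'\s\d{2}[,.]\s\d{4}(?!\d)',
    re.IGNORECASE,
)


def find_dates(text):
    """Return every whole-match date string found in text."""
    return [m.group(0) for m in STRICT_DATE_RE.finditer(text)]


print(find_dates("23-SEP-2015 and 23-09-2015 and 23-09-15 and Sep 23, 2015"))
print(find_dates("This string is 23-09-2015"))
print(find_dates("55.123.4567"))
```

The sketch still checks shape only; it does not validate that the day or month is in range, so a value like 99-99-9999 would pass. Parsing each candidate with datetime.strptime afterwards would be one way to filter those out.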
1

Try this:

import re
# The first group, (\d{2}-([A-Z]{3}|\d{2})-(\d{4}|\d{2})), matches the first three date formats;
# the second alternative matches the last format.
dates = "23-SEP-2015 and 23-09-2015 and 23-09-15 and Sep 23, 2015"
for x in re.finditer(r'((\d{2}-([A-Z]{3}|\d{2})-(\d{4}|\d{2}))|([a-zA-Z]{3}\s\d{1,2},\s\d{4}))', dates):
    print(x.group(1))
