I am trying to write a regex to identify some dates.
The string I am working on is:

String = 'these are just rubbish 11-2-2222, 24-3-1695-194475 12-13-1111, 32/11/2000 \
these are dates 4-02-2011, 12/12/1990, 31-11-1690, 11 July 1990, 7 Oct 2012 \
these are actual deal- by 12 December six people died and in June 2000 he told, by 5 July 2001, he will leave.'

The regex looks like:
import re

re.findall('(\
[\b, ]\
([1-9]|0[1-9]|[12][0-9]|3[01])\
[-/.\s+]\
(1[1-2]|0[1-9]|[1-9]|Jan|January|Feb|February|Mar|March|Apr|April|May|Jun|June|Jul|July|Aug|August|Sept|September|Oct|October|Nov|November|Dec|December)\
(?:[-/.\s+](1[0-9]\d\d|20[0-2][0-5]))?\
[^\da-zA-Z])',String)

The output I get is:
[(' 11-2-', '11', '2', ''), (' 24-3-1695-', '24', '3', '1695'), (' 4-02-2011,', '4', '02', '2011'), (' 12/12/1990,', '12', '12', '1990'), (' 31-11-1690,', '31', '11', '1690'), (' 11 July 1990,', '11', 'July', '1990'), (' 7 Oct 2012 ', '7', 'Oct', '2012'), (' 12 December ', '12', 'December', ''), (' 5 July 2001,', '5', 'July', '2001')]

Problems:
The first two outputs are wrong. They appear because of the optional expression

(?:[-/.\s+](1[0-9]\d\d|20[0-2][0-5]))?

which I added to handle cases like "12 December". How do I get rid of them?

There is also a case, "June 2000", that is not handled by the expression. Can I implement something within the expression that handles this case without affecting the others?
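
One direction that might address both problems is sketched below. It is only a rough idea, not a verified general solution. It splits the pattern into three alternatives (numeric date with a mandatory year, day plus month name with an optional year, month name plus year) and replaces the consumed boundary characters with lookarounds. The names `DAY`, `MONTH_NUM`, `MONTH_NAME`, `YEAR` and `DATE_RE` are made up for illustration. It assumes the numeric month was meant to be `1[0-2]` (so that 10 is accepted), that numeric dates use only `-`, `/` or `.` as separators, and it keeps the year range from the original pattern.

```python
import re

text = ("these are just rubbish 11-2-2222, 24-3-1695-194475 12-13-1111, 32/11/2000 "
        "these are dates 4-02-2011, 12/12/1990, 31-11-1690, 11 July 1990, 7 Oct 2012 "
        "these are actual deal- by 12 December six people died and in June 2000 he told, "
        "by 5 July 2001, he will leave.")

# Building blocks (hypothetical names). All groups are non-capturing,
# so findall() returns the whole matched date as a string.
DAY = r"(?:3[01]|[12][0-9]|0?[1-9])"
MONTH_NUM = r"(?:1[0-2]|0?[1-9])"      # assumption: 10 should be a valid month
MONTH_NAME = (r"(?:Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|June?|July?"
              r"|Aug(?:ust)?|Sept(?:ember)?|Oct(?:ober)?|Nov(?:ember)?|Dec(?:ember)?)")
YEAR = r"(?:1[0-9]{3}|20[0-2][0-5])"   # same year range as the original pattern

DATE_RE = re.compile(rf"""
    (?<![\w/.-])                              # not glued to a preceding word, digit or separator
    (?:
        {DAY}[-/.]{MONTH_NUM}[-/.]{YEAR}      # 4-02-2011, 12/12/1990 (year is mandatory here)
      | {DAY}\s+{MONTH_NAME}\b(?:\s+{YEAR})?  # 11 July 1990, 12 December (year is optional here)
      | {MONTH_NAME}\s+{YEAR}                 # June 2000
    )
    (?![-/.]?\d)                              # not followed by more digits, as in 1695-194475
""", re.VERBOSE)

found = DATE_RE.findall(text)
print(found)
```

Making the year optional only in the month-name alternative is what should keep "11-2-" from matching, and the trailing negative lookahead is what should reject "24-3-1695-" because more digits follow. The sketch does not check that the two numeric separators are the same, and it does not validate calendar dates (31-11-1690 is still accepted).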