I'm trying to extract the dates from a text into a list using re.
Here is what I've written:

    import re

    dateStr = "20 Mar 2009; 20 March 2009; 20 Mar. 2009; 20 March, 2009"
    regex = r'(?:\d{1,2}[/-]*)?(?:Mar)?[a-z\s,.]*(?:\d{1,2}[/-]*)+(?:\d{2,4})+'
    result = re.findall(regex, dateStr)

Even though I put (?:\d{1,2}[/-]*) at the beginning of the expression, the day digits are missing from the matches. Here is what I get:

    ['Mar 2009', 'March 2009', 'Mar. 2009', 'March, 2009']

Could you help? Thanks
Edit:
This question was solved through the comments.
Original assignment string:

    dateStr = "04-20-2009; 04/20/09; 4/20/09; 4/3/09; Mar 20, 2009; March 20, 2009; Mar. 20, 2009; Mar 20 2009; 20 Mar 2009; 20 March 2009; 2 Mar. 2009; 20 March, 2009; Mar 20th, 2009; Mar 21st, 2009; Mar 22nd, 2009; Feb 2009; Sep 2009; Oct 2010; 6/2008; 12/2009; 2009; 2010"
Allow whitespace in the separator classes: r'(?:\d{1,2}[\s/-]*)?(?:Mar)?[a-z\s,.]*(?:\d{1,2}[\s/-]*)+(?:\d{2,4})+' (see demo). From the looks of it, you want to match many more date formats. What are your pattern requirements?
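To see why the day digits go missing, it may help to run both patterns side by side on the four-date string from the question. This is only a minimal sketch; the variable names are illustrative. In the original pattern, [/-]* cannot consume the space that follows the day, so the optional leading group has to match nothing and the match starts at "Mar". Adding \s to that class lets the day and its trailing space be consumed before "Mar".

```python
import re

date_str = "20 Mar 2009; 20 March 2009; 20 Mar. 2009; 20 March, 2009"

# Pattern from the question: the separator class [/-]* has no whitespace,
# so "20 " cannot be followed directly by "Mar" inside one match.
original = r'(?:\d{1,2}[/-]*)?(?:Mar)?[a-z\s,.]*(?:\d{1,2}[/-]*)+(?:\d{2,4})+'

# Pattern from the comment: \s is added to both separator classes.
fixed = r'(?:\d{1,2}[\s/-]*)?(?:Mar)?[a-z\s,.]*(?:\d{1,2}[\s/-]*)+(?:\d{2,4})+'

original_result = re.findall(original, date_str)
fixed_result = re.findall(fixed, date_str)

print(original_result)  # ['Mar 2009', 'March 2009', 'Mar. 2009', 'March, 2009']
print(fixed_result)     # ['20 Mar 2009', '20 March 2009', '20 Mar. 2009', '20 March, 2009']
```

Note that this only covers the four "day month year" variants shown here. The pattern still hard-codes "Mar", so the longer assignment string with other months and numeric formats would need a broader pattern.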