I'm trying to extract date information from a string. The string may look like:
- 5 months and 17 hours
- 1 month and 19 days
- 3 months and 1 day
- 2 years 1 month and 2 days
- 1 year 1 month and 1 days and 1 hour
And I'd like to extract:
- y=0 m=5 d=0 h=17
- y=0 m=1 d=19 h=0
- y=0 m=3 d=1 h=0
- y=2 m=1 d=2 h=0
- y=1 m=1 d=1 h=1
I started working something out like this:

    publishedWhen = '1 year 1 month and 1 days and 1 hour'
    y,m,d,h = 0,0,0,0
    if 'day ' in publishedWhen:
        d = int(publishedWhen.split(' day ')[0])
    if 'days ' in publishedWhen:
        d = int(publishedWhen.split(' days ')[0])
    if 'days ' not in publishedWhen and 'day ' not in publishedWhen:
        d = 0
    if 'month ' in publishedWhen:
        m = int(publishedWhen.split(' month ')[0])
        d = int(publishedWhen.replace(publishedWhen.split(' month ')[0] + ' month ','').replace('and','').replace('days','').replace('day',''))
    if 'months ' in publishedWhen:
        m = int(publishedWhen.split(' months ')[0])

However, I know that this code is bug-ridden (some cases are probably not taken into account), and that a regex would probably produce something much cleaner and more effective. Is this true? Which regex would help me extract all this information?
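
For illustration, here is a minimal sketch of what a regex-based approach might look like. It assumes the string only ever contains the units year, month, day and hour, each preceded by a whole number, and that any unit not mentioned should be 0. The names `UNIT_PATTERN` and `parse_duration` are hypothetical, chosen just for this example.

```python
import re

# Hypothetical pattern: a number, optional whitespace, then a unit name
# in singular or plural form ("day" or "days").
UNIT_PATTERN = re.compile(r'(\d+)\s*(year|month|day|hour)s?\b')

def parse_duration(text):
    """Return (y, m, d, h) found in text; units not mentioned stay at 0."""
    values = {'year': 0, 'month': 0, 'day': 0, 'hour': 0}
    # findall returns one (number, unit) tuple per match
    for number, unit in UNIT_PATTERN.findall(text):
        values[unit] = int(number)
    return values['year'], values['month'], values['day'], values['hour']

y, m, d, h = parse_duration('1 year 1 month and 1 days and 1 hour')
print(y, m, d, h)
```

Because each unit is matched independently, the order of the units and the presence or absence of "and" should not matter under these assumptions.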