4

I have strings like these in my Python program and I added the wanted result as requested:

    "Sat 1 Dec - 11h + 14h / Sun 2 Dec - 12h30" ("Sat 1 Dec 11h", "Sat 1 Dec 14h", "Sun 2 Dec 12h30")
    "Tue 27 + Wed 28 Nov - 20h30" ("Tue 27 Nov 20h30", "Wed 28 Nov 20h30")
    "Fri 4 + Sat 5 Jan - 20h30" ("Fri 4 Jan 20h30", "Sat 5 Jan 20h30")
    "Wed 23 Jan - 20h" ("Wed 23 Jan 20h")
    "Sat 26 Jan - 11h + 14h / Sun 27 Jan - 11h" ("Sat 26 Jan 11h", "Sat 26 Jan 14h", "Sun 27 Jan 11h")
    "Fri 8 and Sat 9 Feb - 20h30 + thu 1 feb - 15h" ("Fri 8 Feb 20h30", "Sat 9 Feb 20h30", "Thu 1 feb 15h")
    "Sat 2 Mar - 11h + 14h / Sun 3 Mar - 11h" ("Sat 2 Mar 11h", "Sat 2 Mar 14h", "Sun 3 Mar 11h")
    "Wed 12, Thu 13, Fri 14 and Sat 15 Jun - 19h + Sun 16 Jun - 12h30" ("Wed 12 Jun 19h", "Thu 13 Jun 19h", "Fri 14 Jun 19h", "Sat 15 Jun 19h", "Sun 16 Jun 12h30")

With these two regexes I can find the 3 dates of the first string:

    (Mon|Tue|Wed|Thu|Fri|Sat|Sun)\s([0-9]{1,2}\s(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec))(?:.*?)([0-9]{1,2}[uh\:](?:[0-9]{2})?)
    (Mon|Tue|Wed|Thu|Fri|Sat|Sun)\s([0-9]{1,2}\s(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec))(?:.*?\+\s)([0-9]{1,2}[uh\:](?:[0-9]{2})?)

Is it possible to get all the dates from these strings with one or two regex patterns (to match all of them)? I think this is what it needs to do: find the first following month for each date if none is given, get the corresponding time, and, if a date is followed by multiple hours, produce multiple datetimes for that date. Formatting is not that important.

3
  • Should it be Sat 1 Dec + 14h? Commented Nov 16, 2012 at 14:03
  • I could answer this if you wrote examples for more of the dates. I would split the strings and append dates based on any + signs, or other rules/patterns I'm noticing. String parsing usually works better for data massaging, IMO. But I can't tell how to approach it with only one example. Please update and I'll post a solution. Commented Nov 16, 2012 at 14:25
  • I am struggling to incorporate a solution that can accommodate for the...unusual...format you've provided for your last example, "Wed 12, Thu 13, Fri 14 and Sat 15 Jun - 19h + Sun 16 Jun - 12h30". The overloading of the + is a particular sore spot. You don't have any more examples of dates with ands and + signs in them, do you? Commented Nov 16, 2012 at 16:08

2 Answers

1

I got you started. This is my interpretation of your problem. I leave the implementation of parse_complex up to you.

    class DateParser(object):
        """parse dates according to the custom rules here:

        >>> DateParser("Sat 1 Dec - 11h + 14h / Sun 2 Dec - 12h30").parse()
        ("Sat 1 Dec 11h", "Sat 1 Dec 14h", "Sun 2 Dec 12h30")
        >>> DateParser("Tue 27 + Wed 28 Nov - 20h30").parse()
        ("Tue 27 Nov 20h30", "Wed 28 Nov 20h30")
        >>> DateParser("Fri 4 + Sat 5 Jan - 20h30").parse()
        ("Fri 4 Jan 20h30", "Sat 5 Jan 20h30")
        >>> DateParser("Wed 23 Jan - 20h").parse()
        ("Wed 23 Jan 20h",)
        >>> DateParser("Sat 26 Jan - 11h + 14h / Sun 27 Jan - 11h").parse()
        ("Sat 26 Jan 11h", "Sat 26 Jan 14h", "Sun 27 Jan 11h")
        >>> DateParser("Fri 8 and Sat 9 Feb - 20h30 + thu 1 feb - 15h").parse()
        ("Fri 8 Feb 20h30", "Sat 9 Feb 20h30", "Thu 1 feb 15h")
        >>> DateParser("Sat 2 Mar - 11h + 14h / Sun 3 Mar - 11h").parse()
        ("Sat 2 Mar 11h", "Sat 2 Mar 14h", "Sun 3 Mar 11h")
        >>> DateParser("Wed 12, Thu 13, Fri 14 and Sat 15 Jun - 19h + Sun 16 Jun - 12h30").parse()
        ("Wed 12 Jun 19h", "Thu 13 Jun 19h", "Fri 14 Jun 19h", "Sat 15 Jun 19h", "Sun 16 Jun 12h30")
        """

        def __init__(self, line):
            self.date = line
            self.dates = self.split_dates(line)
            self.final = []
            self.days = ['mon', 'tue', 'wed', 'thu', 'fri', 'sat', 'sun']
            self.mons = ['jan', 'feb', 'mar', 'apr', 'may', 'jun',
                         'jul', 'aug', 'sep', 'oct', 'nov', 'dec']

        def parse(self):
            if self.is_complex():
                self.parse_complex()
            else:
                self.parse_simple()
            return tuple(self.final)

        def parse_simple(self):
            """typical formats:
            Day 00 + Day 01 Mon - 00h00
            Day 00 Mon - 00h00 + 01h00
            Day 00 Mon - 00h00 / Day 02 Mon - 00h00
            """
            for date in self.dates:
                mods = self.split_modifiers(date)
                date_mods = []
                for mod in mods:
                    if self.is_complete(mod):
                        # only *one* base_date
                        base_date, time = self.split_time(mod)
                        date_mods.append(time)
                    else:
                        date_mods.append(mod)
                for mod in date_mods:
                    if self.is_hour(mod):
                        # Sat 1 Dec - 11h + 14h
                        self.final.append(' '.join([base_date, mod]))
                    else:
                        # Fri 4 + Sat 5 Jan - 20h30
                        self.final.append(' '.join([mod, self.extract_month(base_date), time]))

        def parse_complex(self):
            """typical format:
            Day 00, Day 01 and Day 02 Mon - 00h00 + Day 03 Mon 01h00
            """
            raise NotImplementedError()

        def is_complex(self):
            """presence of the complex date attribute requires special parsing"""
            return self.date.find(' and ') > -1

        def is_complete(self, section):
            """section has format `Day 00 Mon - 00h00`
            must have no modifiers to determine completeness
            """
            sections = [x.lower() for x in section.split()]
            try:
                # unpacking fails unless there are exactly five parts
                dow, dom, moy, dash, time = sections
            except ValueError:
                return False
            return all([dow in self.days, moy in self.mons])

        def is_hour(self, section):
            return section[0].isdigit()

        def is_day(self, section):
            return section[:3].lower() in self.days

        def extract_month(self, section):
            """return the month present in the string, if any"""
            for mon in self.mons:
                if section.lower().find(mon) > -1:
                    found = section.lower().index(mon)
                    return section[found : found + 3]
            return None

        def split_dates(self, section):
            """split individual dates from a list of dates"""
            return section.split(' / ')

        def split_time(self, section):
            """split individual times from a complete date"""
            return section.split(' - ')

        def split_modifiers(self, section):
            """extend a date by implying that they share a date or a time
            modifiers change their meaning when parsing a complex date
            """
            return section.split(' + ')

    >>> DateParser("Fri 4 + Sat 5 Jan - 20h30 / Sat 1 Dec - 11h + 14h + 16h / Sun 2 Dec - 12h30").parse()
    ('Fri 4 Jan 20h30', 'Sat 5 Jan 20h30', 'Sat 1 Dec 11h', 'Sat 1 Dec 14h', 'Sat 1 Dec 16h', 'Sun 2 Dec 12h30')

If you have questions about the way I've documented this class, feel free to get back to me, and I can help you out some more. This problem was a little more complex than I first thought, and I need to get some other stuff done first.
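For the complex case that `parse_complex` leaves open, one possible approach is to ignore the separators (`+`, `/`, `,`, `and`, `-`) altogether and scan only for day, month and time tokens. The sketch below is a standalone function, not a drop-in for the class above; `TOKEN` and `expand_dates` are hypothetical names. It assumes that every group of days is followed by its month before its first time, and that a day seen after a time starts a new group.

```python
import re

# One alternation per token kind; separators are simply skipped.
TOKEN = re.compile(
    r"\b(?P<dow>Mon|Tue|Wed|Thu|Fri|Sat|Sun)\s+(?P<dom>\d{1,2})\b"
    r"|\b(?P<month>Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)\b"
    r"|\b(?P<time>\d{1,2}[uh:](?:\d{2})?)",
    re.IGNORECASE,
)


def expand_dates(text):
    """Return one 'Day N Mon time' string per (day, time) pair in text."""
    results = []
    days = []            # days waiting for a month and a time
    month = None         # month shared by the current group of days
    last_was_time = False
    for m in TOKEN.finditer(text):
        if m.group("dow"):
            if last_was_time:
                # A day after a time opens a new group.
                days, month = [], None
            days.append("{} {}".format(m.group("dow").capitalize(), m.group("dom")))
            last_was_time = False
        elif m.group("month"):
            month = m.group("month").capitalize()
            last_was_time = False
        else:
            # A time applies to every day in the current group.
            for day in days:
                results.append("{} {} {}".format(day, month, m.group("time")))
            last_was_time = True
    return results


print(expand_dates("Wed 12, Thu 13, Fri 14 and Sat 15 Jun - 19h + Sun 16 Jun - 12h30"))
```

Because the separators carry no meaning here, the overloaded `+` stops being a problem; the trade-off is that malformed input (for example a time with no preceding day) is silently skipped rather than reported. Day and month names are capitalized on output, so `thu 1 feb` comes back as `Thu 1 Feb`.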


1 Comment

It's not regex but it's probably better to catch exceptions too. You really helped me out here, thanks!
0

I do not have enough reputation to comment on your question, but can't you just use grouping? Add parentheses around the day-number-month part, find its group number, and put it together with the grouped hour part. Then I think you only need one regex, plus a bit of group handling. Here is a link: http://flockhart.virtualave.net/RBIF0100/regexp.html. It has a small example of grouping (the second one); it is very simple, and you probably already know this stuff.
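As a rough illustration of the grouping idea, the sketch below reuses the first pattern from the question with `re.findall` and joins the three groups of each match. `SIMPLE` and `simple_dates` are hypothetical names. Note the limit: a pattern with a fixed number of groups yields one time per date, so the `14h` in the first example string is not picked up.

```python
import re

# First pattern from the question: (day) (number month) ... (time)
SIMPLE = re.compile(
    r"(Mon|Tue|Wed|Thu|Fri|Sat|Sun)\s"
    r"([0-9]{1,2}\s(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec))"
    r"(?:.*?)"
    r"([0-9]{1,2}[uh:](?:[0-9]{2})?)"
)


def simple_dates(text):
    """Join the three captured groups of every match into one string."""
    return [" ".join(groups) for groups in SIMPLE.findall(text)]


print(simple_dates("Wed 23 Jan - 20h"))
# ['Wed 23 Jan 20h']
print(simple_dates("Sat 1 Dec - 11h + 14h / Sun 2 Dec - 12h30"))
# ['Sat 1 Dec 11h', 'Sun 2 Dec 12h30']
```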

Kind regards

1 Comment

Only in .NET can you get an arbitrary number of captures (by reusing a single group). In all other regex engines (as far as I know) a reused capturing group will always overwrite what has been captured before.
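The overwriting behaviour is easy to check in Python's standard `re` module. This is a minimal demonstration on a made-up input string: a capturing group inside a repetition keeps only its last capture, while `re.findall` with a single-token pattern returns every occurrence.

```python
import re

text = "11h + 14h + 16h"

# The capturing group is repeated three times; only the last capture survives.
m = re.match(r"(?:(\d{1,2}h)(?: \+ )?)+", text)
print(m.group(0))  # 11h + 14h + 16h
print(m.group(1))  # 16h

# Matching one token at a time sidesteps the limitation.
times = re.findall(r"\d{1,2}h", text)
print(times)  # ['11h', '14h', '16h']
```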
