2

I have scraped some data that includes opening hours in 12-hour format. The string looks like this: Mon - Fri:,10:00 am - 7:00 pm. I need to extract the times 10:00 am and 7:00 pm and convert them to 24-hour format. The final string I want is:

Mon - Fri:,10:00 - 19:00 

Any help would be appreciated in this regard. I have tried the following:

    import re

    txt = 'Mon - Fri:,10:00 am - 7:00 pm'
    data = re.findall(r'\s(\d{2}\:\d{2}\s?(?:AM|PM|am|pm))', txt)
    print(data)

But neither this regex nor any other I tried did the job.

2
  • I cannot find a regex to extract the times from this string. Once I have the regex, I can move forward. I would be obliged if you could help me write one. Commented May 19, 2020 at 16:11
  • 1
    I have posted an attempt. Commented May 19, 2020 at 16:18

4 Answers

3

Your regex requires whitespace before the leading digit, which prevents ,10:00 am from matching, and it requires two digits before the colon, which fails to match 7:00 pm. r"(?i)(\d?\d:\d\d (?:a|p)m)" seems like the most precise option.
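
To see both failure modes side by side, here is a small sketch that runs the question's pattern and the pattern suggested above against the same string. The variable names `original` and `relaxed` are only for illustration.

```python
import re

txt = "Mon - Fri:,10:00 am - 7:00 pm"

# Pattern from the question: needs whitespace, then exactly two hour digits.
original = re.findall(r"\s(\d{2}\:\d{2}\s?(?:AM|PM|am|pm))", txt)

# Pattern suggested above: one or two hour digits, no leading whitespace required.
relaxed = re.findall(r"(?i)(\d?\d:\d\d (?:a|p)m)", txt)

print(original)  # []
print(relaxed)   # ['10:00 am', '7:00 pm']
```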

After that, parse each match with datetime.strptime and convert it to 24-hour ("military") time using the "%H:%M" format string. Invalid times like 10:67 will raise a ValueError (if you anticipate strings that should be ignored instead, tighten the regex so that it matches only valid 12-hour times).

    import re
    from datetime import datetime

    def to_military_time(x):
        return datetime.strptime(x.group(), "%I:%M %p").strftime("%H:%M")

    txt = "Mon - Fri:,10:00 am - 7:00 pm"
    data = re.sub(r"(?i)(\d?\d:\d\d (?:a|p)m)", to_military_time, txt)
    print(data)  # => Mon - Fri:,10:00 - 19:00
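
If malformed times should be skipped rather than raise, one option is a stricter pattern. The following is a minimal sketch, not part of the answer above: `VALID_12H`, `to_24h`, and `convert_hours` are hypothetical names, and the pattern assumes hours 1-12, minutes 00-59, and a single space before am/pm.

```python
import re
from datetime import datetime

# Hypothetical stricter pattern: hour 1-12, minute 00-59, one space, am/pm.
# The lookbehind stops a match from starting in the middle of a digit run
# (so "13:45 pm" is not partially matched as "3:45 pm").
VALID_12H = re.compile(r"(?i)(?<!\d)(?:1[0-2]|0?[1-9]):[0-5]\d [ap]m")

def to_24h(match):
    # Every match is a valid 12-hour time, so strptime should not raise here.
    return datetime.strptime(match.group(), "%I:%M %p").strftime("%H:%M")

def convert_hours(text):
    # Replace each valid 12-hour time; anything else is left untouched.
    return VALID_12H.sub(to_24h, text)

print(convert_hours("Mon - Fri:,10:00 am - 7:00 pm"))  # Mon - Fri:,10:00 - 19:00
print(convert_hours("Sat:,10:67 am - 7:00 pm"))        # Sat:,10:67 am - 19:00
```

The trade-off is that bad input passes through silently instead of failing loudly, so this only makes sense when unparseable strings are expected in the scraped data.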

1

Your regex looks only for two-digit hours (\d{2}) preceded by whitespace (\s). The following also captures one-digit hours and allows a comma in place of the whitespace.

data = re.findall(r'[\s,](\d{1,2}\:\d{2}\s?(?:AM|PM|am|pm))', txt) 

However, you might want to consider all punctuation as valid:

data = re.findall(r'[\s!"#$%&\'\(\)*+,-./:;\<=\>?@\[\\\]^_`\{|\}~](\d{1,2}\:\d{2}\s?(?:AM|PM|am|pm))', txt) 
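
As a possible alternative to listing every punctuation character, a word boundary (\b) can stand in for "anything that is not a letter, digit, or underscore". This is a sketch of a swapped-in technique, not the approach shown above, and `find_times` is a hypothetical helper name.

```python
import re

# \b matches between a non-word character (space, comma, ...) and a digit,
# and also at the very start or end of the string.
TIME_12H = re.compile(r"\b(\d{1,2}:\d{2}\s?(?:am|pm))\b", re.IGNORECASE)

def find_times(text):
    # Return every 12-hour time found in the text, in order of appearance.
    return TIME_12H.findall(text)

print(find_times("Mon - Fri:,10:00 am - 7:00 pm"))  # ['10:00 am', '7:00 pm']
```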


1

The regex needs to change as follows.

    import re

    text = 'Mon - Fri:,10:00 am - 7:00 pm'
    result = re.match(r'\D* - \D*:,([\d\s\w:]+) - ([\d\s\w:]+)', text)
    print(result.group(1))  # prints 10:00 am
    print(result.group(2))  # prints 7:00 pm

You need quantifiers such as '+' and '*' to tell the regex to match multiple characters; on its own, \s matches only a single whitespace character.
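
A short sketch of the difference, using a made-up input with three spaces between the time and "am":

```python
import re

spaced = "10:00   am"  # three spaces before "am"

# A bare \s consumes exactly one whitespace character, so this finds nothing.
single = re.search(r"\d+:\d+\sam", spaced)

# \s+ consumes one or more whitespace characters.
one_or_more = re.search(r"\d+:\d+\s+am", spaced)

# \s* consumes zero or more, so it also accepts "10:00am".
zero_or_more = re.search(r"\d+:\d+\s*am", "10:00am")

print(single)                # None
print(one_or_more.group())   # 10:00   am
print(zero_or_more.group())  # 10:00am
```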

You can learn more about regex here.

https://regexr.com/

And here you can try regex online.

https://regex101.com/


1

Why not use the time module?

    import time

    data = "Mon - Fri:,10:00 am - 7:00 pm"

    # Split the day range from the hours, then split the two times apart.
    parts = data.split(",")
    days = parts[0]
    hours = parts[1]
    parts = hours.split("-")

    # Parse each 12-hour time and format it as 24-hour.
    t1 = time.strptime(parts[0].strip(), "%I:%M %p")
    t2 = time.strptime(parts[1].strip(), "%I:%M %p")
    result = days + "," + time.strftime("%H:%M", t1) + " - " + time.strftime("%H:%M", t2)
    print(result)

Output:

Mon - Fri:,10:00 - 19:00 
