3

I have CSV data files with timestamped records that are in local time. Unfortunately the data files cover the period where daylight saving time changes (Nov 3rd 2013) so the time component of the timestamps for the records go: 12:45, 1:00, 1:15, 1:30, 1:45, 1:00, 1:15, 1:30, 1:45, 2:00. I want to be able to convert and store the values in the database as UTC.

Unfortunately the standard DateTime.Parse() function of .NET will parse as this (all November 3rd 2013):

| Time String | Parsed Local Time | In DST | Parsed Local Time to UTC | 12:45 am | 12:45 am | Yes | 4:45 am | 12:59:59 am | 12:59:59 am | Yes | 4:59:59 am | 01:00 am | 1:00 am | No | 6:00 am | 01:15 am | 1:15 am | No | 6:15 am 

So it never sees the 1:00-1:59:59 am range as being in DST and my parsed timestamps in UTC jumps an hour.

Is there a library or class out there that will allow me to parse the timestamps and take into account the change in DST? Like some instantiatable class that will remember the stream of timestamps it's already received and adjust the parsed timestamp accordingly?

Assumptions about the data that can be made while parsing:

  1. I have the start time of file (timestamp of the first record) in the header section of the file in both local and UTC.
  2. The records are in order by timestamp
  3. All local times are in Eastern Standard
  4. The data could also go the other way: from out of DST into it
  5. The records contain a full timestamp in the format: yyyy/mm/dd HH:mm:ss (2013/11/03 00:45:00)

Note: While my software is in C#, I did not tag C#/.NET specifically as I figured I could use any language's implementation of a solution and recode if necessary.

3
  • can successive timestamps in your file go backwards in UTC? Commented Oct 6, 2014 at 16:44
  • No, they can be assumed to be in order of UTC. Commented Oct 6, 2014 at 17:02
  • 1
    I was just being lazy in typing the table above. The timestamp field in the data file contains a full date and time. I'll add it to the above assumptions. Commented Oct 6, 2014 at 17:34

2 Answers 2

3

In C#:

// Define the input values. string[] input = { "2013-11-03 00:45:00", "2013-11-03 01:00:00", "2013-11-03 01:15:00", "2013-11-03 01:30:00", "2013-11-03 01:45:00", "2013-11-03 01:00:00", "2013-11-03 01:15:00", "2013-11-03 01:30:00", "2013-11-03 01:45:00", "2013-11-03 02:00:00", }; // Get the time zone the input is meant to be interpreted in. TimeZoneInfo tz = TimeZoneInfo.FindSystemTimeZoneById("Eastern Standard Time"); // Create an array for the output values DateTimeOffset[] output = new DateTimeOffset[input.Length]; // Start with the assumption that DST is active, as ambiguities occur when moving // out of daylight time into standard time. bool dst = true; // Iterate through the input. for (int i = 0; i < input.Length; i++) { // Parse the input string as a DateTime with Unspecified kind DateTime dt = DateTime.ParseExact(input[i], "yyyy-MM-dd HH:mm:ss", CultureInfo.InvariantCulture); // Determine the offset. TimeSpan offset; if (tz.IsAmbiguousTime(dt)) { // Get the possible offsets, and use the DST flag and the previous entry // to determine if we are past the transition point. This only works // because we have outside knowledge that the items are in sequence. TimeSpan[] offsets = tz.GetAmbiguousTimeOffsets(dt); offset = dst && (i == 0 || dt >= output[i - 1].DateTime) ? offsets[1] : offsets[0]; } else { // The value is unambiguous, so just get the single offset it can be. offset = tz.GetUtcOffset(dt); } // Use the determined values to construct a DateTimeOffset DateTimeOffset dto = new DateTimeOffset(dt, offset); // We can unambiguously check a DateTimeOffset for daylight saving time, // which sets up the DST flag for the next iteration. dst = tz.IsDaylightSavingTime(dto); // Save the DateTimeOffset to the output array. output[i] = dto; } // Show the output for debugging foreach (var dto in output) { Console.WriteLine("{0:yyyy-MM-dd HH:mm:ss zzzz} => {1:yyyy-MM-dd HH:mm:ss} UTC", dto, dto.UtcDateTime); } 

Output:

2013-11-03 00:45:00 -04:00 => 2013-11-03 04:45:00 UTC 2013-11-03 01:00:00 -04:00 => 2013-11-03 05:00:00 UTC 2013-11-03 01:15:00 -04:00 => 2013-11-03 05:15:00 UTC 2013-11-03 01:30:00 -04:00 => 2013-11-03 05:30:00 UTC 2013-11-03 01:45:00 -04:00 => 2013-11-03 05:45:00 UTC 2013-11-03 01:00:00 -05:00 => 2013-11-03 06:00:00 UTC 2013-11-03 01:15:00 -05:00 => 2013-11-03 06:15:00 UTC 2013-11-03 01:30:00 -05:00 => 2013-11-03 06:30:00 UTC 2013-11-03 01:45:00 -05:00 => 2013-11-03 06:45:00 UTC 2013-11-03 02:00:00 -05:00 => 2013-11-03 07:00:00 UTC 

Note that this assumes that the first time you encounter an ambiguous time like 1:00 that it will be in DST. Say your list was truncated to just the last 5 entries - you wouldn't know that those were in standard time. There's not much you could do in that particular case.

Sign up to request clarification or add additional context in comments.

2 Comments

Dude. Fantastic. I didn't know about TimeZoneInfo and was basically implemented my own IsAmbiguousTime() but didn't have a good way to get the ambiguous offsets. And I was trying to drill too far into the guessing how I should adjust forwards or backwards. THANK YOU! I was able to easily encapsulate this into a class and add some unit tests. It works going into DST too (as long as you have at least one value before the change).
There shouldn't be any requirement to have one value before the change going into DST. In that transition, there is a gap but there is no valid time that is ambiguous. See also the dst tag wiki.
2

If successive timestamps can't go backwards if expressed as time in UTC then this Python script can convert the local time into UTC:

#!/usr/bin/env python3 import sys from datetime import datetime, timedelta import pytz # $ pip install pytz tz = pytz.timezone('America/New_York' if len(sys.argv) < 2 else sys.argv[1]) previous = None #XXX set it from UTC time: `first_entry_utc.astimezone(tz)` for line in sys.stdin: # read from stdin naive = datetime.strptime(line.strip(), "%Y/%m/%d %H:%M:%S") # no timezone try: local = tz.localize(naive, is_dst=None) # attach timezone info except pytz.AmbiguousTimeError: # assume ambiguous time always corresponds to True -> False transition local = tz.localize(naive, is_dst=True) if previous >= local: # timestamps must be increasing local = tz.localize(naive, is_dst=False) assert previous < local #NOTE: allow NonExistentTimeError to propagate (there shouldn't be # invalid local times in the input) previous = local utc = local.astimezone(pytz.utc) timestamp = utc.timestamp() time_format = "%Y-%m-%d %H:%M:%S %Z%z" print("{local:{time_format}}; {utc:{time_format}}; {timestamp:.0f}" .format_map(vars())) 

Input

2013/11/03 00:45:00 2013/11/03 01:00:00 2013/11/03 01:15:00 2013/11/03 01:30:00 2013/11/03 01:45:00 2013/11/03 01:00:00 2013/11/03 01:15:00 2013/11/03 01:30:00 2013/11/03 01:45:00 2013/11/03 02:00:00 

Output

2013-11-03 00:45:00 EDT-0400; 2013-11-03 04:45:00 UTC+0000; 1383453900 2013-11-03 01:00:00 EDT-0400; 2013-11-03 05:00:00 UTC+0000; 1383454800 2013-11-03 01:15:00 EDT-0400; 2013-11-03 05:15:00 UTC+0000; 1383455700 2013-11-03 01:30:00 EDT-0400; 2013-11-03 05:30:00 UTC+0000; 1383456600 2013-11-03 01:45:00 EDT-0400; 2013-11-03 05:45:00 UTC+0000; 1383457500 2013-11-03 01:00:00 EST-0500; 2013-11-03 06:00:00 UTC+0000; 1383458400 2013-11-03 01:15:00 EST-0500; 2013-11-03 06:15:00 UTC+0000; 1383459300 2013-11-03 01:30:00 EST-0500; 2013-11-03 06:30:00 UTC+0000; 1383460200 2013-11-03 01:45:00 EST-0500; 2013-11-03 06:45:00 UTC+0000; 1383461100 2013-11-03 02:00:00 EST-0500; 2013-11-03 07:00:00 UTC+0000; 1383462000 

9 Comments

Thank you for this, it's great. The other solution was in C#, so more easily applied to my project, but for those using Python this seems pretty straight forward.
@MattJohnson: it is nice but it fails for 2014-10-26 in Europe/Moscow timezone with the latest tz database (dst is 0 before and after the jump).
Hmmmm... tz.normalize(tz.localize(datetime(2014,10,26,1,0,0)) + timedelta(hours=1)) works. (Though it would be better not to hardcode the 1 hour). But you're right that localize by itself has no way to give the +03:00 offset because is_dst would be false for both.
@MattJohnson: tz.normalize() fixes nothing here: tz.localize(datetime(2014,10,26,1,0,0) + timedelta(hours=1)) also produces +0300. I don't see it helps to parse the input (replace 2013/11/03 with 2014/10/26 above) in Europe/Moscow timezone.
The time changes. Without normalize, you get 02:00+3:00. With normalize, you get 01:00+3:00
|

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.