DateTimeFormatter parsing - timezone names and daylight savings overlap times

Question

For improving performance of some legacy code, I am considering a replacement of java.text.SimpleDateFormat by java.time.format.DateTimeFormatter.

Among the tasks performed is parsing date/time values that had been serialized using java.util.Date.toString. With SimpleDateFormat, it was possible to turn them back into the original timestamps (neglecting fractional seconds), however I am facing problems when attempting to do the same with DateTimeFormatter.

When formatting with either, my local timezone is indicated as CET or CEST, depending on whether daylight savings time is in effect for the time to be formatted. However, it appears that at parsing time, DateTimeFormatter treats CET and CEST the same.

This creates a problem with the overlap occurring at the end of daylight savings time. When formatting, 02:00:00 is produced twice, for times one hour apart, once with the CEST and once with the CET timezone name, which is fine. But at parsing time, that difference can't be recovered.

Here is an example:

    long msecPerHour = 3600000L;
    long cet_dst_2016 = 1477778400000L;
    DateTimeFormatter formatter = DateTimeFormatter.ofPattern("EEE MMM dd HH:mm:ss zzz yyyy", Locale.ENGLISH);
    ZoneId timezone = ZoneId.of("Europe/Berlin");
    for (int hours = 0; hours < 6; ++hours) {
        long time = cet_dst_2016 + msecPerHour * hours;
        String formatted = formatter.format(Instant.ofEpochMilli(time).atZone(timezone));
        long parsedTime = Instant.from(formatter.parse(formatted)).toEpochMilli();
        System.out.println(formatted + ", diff: " + (parsedTime - time));
    }

which results in

    Sun Oct 30 00:00:00 CEST 2016, diff: 0
    Sun Oct 30 01:00:00 CEST 2016, diff: 0
    Sun Oct 30 02:00:00 CEST 2016, diff: 0
    Sun Oct 30 02:00:00 CET 2016, diff: -3600000
    Sun Oct 30 03:00:00 CET 2016, diff: 0
    Sun Oct 30 04:00:00 CET 2016, diff: 0

It shows that the second occurrence of 02:00:00, in spite of the different timezone name, was treated like the first one. So the result is effectively off by one hour.

Obviously the formatted string has all the information needed, and SimpleDateFormat parsing did in fact honor it. Is it possible to round-trip through formatting and parsing, using DateTimeFormatter, with the given pattern?
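For comparison, here is a minimal sketch of the legacy round trip the question refers to. The class and method names (`LegacyRoundTrip`, `roundTripDiff`) are made up for illustration; the pattern, the zone and the starting millisecond value come from the question. Going by the question's description, each printed difference should be 0, including the two 02:00:00 values in the overlap.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical helper class, not part of the original code base.
final class LegacyRoundTrip {
    static final String PATTERN = "EEE MMM dd HH:mm:ss zzz yyyy";

    /** Formats the instant in Europe/Berlin, parses the text back, returns parsed minus original millis. */
    static long roundTripDiff(long epochMilli) {
        // Fresh instance per call: SimpleDateFormat is mutable and not thread-safe,
        // and parsing a zone name may change the time zone set on the formatter.
        SimpleDateFormat format = new SimpleDateFormat(PATTERN, Locale.ENGLISH);
        format.setTimeZone(TimeZone.getTimeZone("Europe/Berlin"));
        String text = format.format(new Date(epochMilli));
        try {
            return format.parse(text).getTime() - epochMilli;
        } catch (ParseException e) {
            throw new IllegalStateException("Could not parse own output: " + text, e);
        }
    }

    public static void main(String[] args) {
        long start = 1477778400000L; // cet_dst_2016 from the question
        for (int hours = 0; hours < 6; hours++) {
            long time = start + 3_600_000L * hours;
            System.out.println(time + ", diff: " + roundTripDiff(time));
        }
    }
}
```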

JodaStephen · Accepted Answer · 2017-03-30 10:53:23Z

It is possible for a specific case:

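    // Assumes: import static java.time.temporal.ChronoField.OFFSET_SECONDS;
    // ImmutableMap is from Guava (com.google.common.collect.ImmutableMap).
    DateTimeFormatter formatter = new DateTimeFormatterBuilder()
        .appendPattern("EEE MMM dd HH:mm:ss ")
        .appendText(OFFSET_SECONDS, ImmutableMap.of(2L * 60 * 60, "CEST", 1L * 60 * 60, "CET"))
        .appendPattern(" yyyy")
        .toFormatter(Locale.ENGLISH);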

This maps the exact offset to the expected text. Where this fails is when you need to deal with more than one time-zone.
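As a rough, self-contained sketch of the same idea without Guava: `Map.of` (Java 9+) stands in for `ImmutableMap.of`, and the class and method names are hypothetical. It runs the six hours from the question through the offset-mapping formatter; because the offset itself is parsed from the text, both 02:00:00 values should come back unchanged.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.time.temporal.ChronoField;
import java.util.Locale;
import java.util.Map;

// Hypothetical demo class; only covers the CET/CEST pair, as in the answer.
final class OffsetTextRoundTrip {
    static final DateTimeFormatter FORMATTER = new DateTimeFormatterBuilder()
            .appendPattern("EEE MMM dd HH:mm:ss ")
            // Map total offset seconds to the abbreviation: +02:00 -> CEST, +01:00 -> CET.
            .appendText(ChronoField.OFFSET_SECONDS, Map.of(7200L, "CEST", 3600L, "CET"))
            .appendPattern(" yyyy")
            .toFormatter(Locale.ENGLISH);

    /** Formats the instant in Europe/Berlin, parses the text back, returns parsed minus original millis. */
    static long roundTripDiff(long epochMilli) {
        String text = FORMATTER.format(
                Instant.ofEpochMilli(epochMilli).atZone(ZoneId.of("Europe/Berlin")));
        return Instant.from(FORMATTER.parse(text)).toEpochMilli() - epochMilli;
    }

    public static void main(String[] args) {
        long start = 1477778400000L; // cet_dst_2016 from the question
        for (int hours = 0; hours < 6; hours++) {
            long time = start + 3_600_000L * hours;
            System.out.println(time + ", diff: " + roundTripDiff(time));
        }
    }
}
```

Note that the parsed result carries only an offset, not a region such as Europe/Berlin, which is one reason this approach does not extend to several time zones.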

To do the job properly requires a JDK change.

Comment: Thank you, also for filing the JDK issue. I do indeed have to support other timezones as well (e.g. PDT/PST), so I will stick to parsing with SimpleDateFormat for patterns that contain z specifiers.

Michael · Accepted Answer · 2021-11-11 15:41:05Z

It seems like a bug. I tested in Java 17 and it's still the same behaviour. I dug into the parsing logic and I can see why this happens.

One of the first things that happens is TimeZoneNameUtility.getZoneStrings(locale) is called. This gives you a 2D array of Strings:

    [
        [
            "Europe/Paris",
            "Central European Standard Time",
            "CET",
            "Central European Summer Time",
            "CEST",
            "Central European Time",
            "CET"
        ],
        // others
    ]

It builds a prefix tree out of them. All items in here get mapped to the 0th item - "Europe/Paris". When it's parsing, it descends the prefix tree one character at a time e.g. C... E... T..., then returns a match if there was one. Since CEST and CET map to the same thing, they're effectively just aliases of one another.

Later on, that string is passed to ZoneId.of(), so the information about whether the name denoted summer time or standard time has been thrown away.
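One way to look at this from the outside is to parse just the abbreviation and ask the result what it holds. This is only a sketch with hypothetical class and method names; which zone ID comes back depends on the JDK version and its locale data, so no output is shown. On the JDK versions discussed here I would expect the result to contain a zone but no offset.

```java
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;
import java.time.temporal.ChronoField;
import java.time.temporal.TemporalAccessor;
import java.time.temporal.TemporalQueries;
import java.util.Locale;

// Hypothetical inspection helper, not part of the JDK or of the original code.
final class ParsedZoneInspection {
    private static final DateTimeFormatter ZONE_ONLY =
            DateTimeFormatter.ofPattern("zzz", Locale.ENGLISH);

    /** Returns the ZoneId the parser settled on for the given abbreviation. */
    static ZoneId parsedZone(String abbreviation) {
        TemporalAccessor parsed = ZONE_ONLY.parse(abbreviation);
        return parsed.query(TemporalQueries.zoneId());
    }

    /** Reports whether the parsed result exposes an offset at all. */
    static boolean carriesOffset(String abbreviation) {
        return ZONE_ONLY.parse(abbreviation).isSupported(ChronoField.OFFSET_SECONDS);
    }

    public static void main(String[] args) {
        for (String abbreviation : new String[] {"CET", "CEST"}) {
            System.out.println(abbreviation
                    + " -> zone " + parsedZone(abbreviation)
                    + ", carries offset: " + carriesOffset(abbreviation));
        }
    }
}
```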

It does seem in Java 18 that there have been significant changes in this code, so maybe they're addressing that.

Anonymous · Accepted Answer · 2021-11-25 00:27:11Z

The general workaround

JodaStephen, the main author of java.time, in his answer shows a workaround for the case of CET and CEST (Central European Time and Central European Summer Time). I present a workaround that I believe will work in all time zones having different abbreviations for standard time and summer time (DST).

    public static ZonedDateTime parse(String text) {
        ZonedDateTime result = ZonedDateTime.parse(text, FORMATTER);
        if (result.format(FORMATTER).equals(text)) {
            return result;
        }

        // Default we get the earlier offset at overlap,
        // so if it didn’t work, try the later offset
        result = result.withLaterOffsetAtOverlap();
        if (result.format(FORMATTER).equals(text)) {
            return result;
        }

        // As a last desperate attempt, try earlier offset explicitly
        result = result.withEarlierOffsetAtOverlap();
        if (result.format(FORMATTER).equals(text)) {
            return result;
        }

        // Give up
        throw new IllegalArgumentException();
    }

The method could use any formatter with a time zone name or abbreviation as long as it’s supposed to give the same output from formatting as the input it parses (so optional parts are a no-no, for example). I have assumed a formatter equivalent to yours:

    private static final DateTimeFormatter FORMATTER
            = DateTimeFormatter.ofPattern("EEE MMM dd HH:mm:ss zzz yyyy", Locale.ROOT);

Your trouble was with a millisecond value of 1 477 789 200 000, which was formatted into Sun Oct 30 02:00:00 CET 2016 and then parsed to 1 477 785 600 000 for a difference of -3 600 000 milliseconds. So let’s try my method with that one.

    private static final ZoneId TIME_ZONE = ZoneId.of("Europe/Berlin");

    long trouble = 1_477_789_200_000L;
    String formatted = Instant.ofEpochMilli(trouble).atZone(TIME_ZONE).format(FORMATTER);
    ZonedDateTime zdt = parse(formatted);
    long parsedTime = zdt.toInstant().toEpochMilli();
    System.out.println(formatted + ", diff: " + (parsedTime - trouble));

Output is:

Sun Oct 30 02:00:00 CET 2016, diff: 0
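To check more than the single troublesome value, the pieces above can be put into one runnable sketch that walks through all six hours from the question. This is a condensed variant, not the exact method above: the three attempts are folded into a loop, the class name is hypothetical, and I have used Locale.ENGLISH as in the question. The zone of the returned ZonedDateTime is whatever the parser picked for the abbreviation, so only the instant is compared.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.util.List;
import java.util.Locale;

// Hypothetical wrapper class around the format-and-compare idea.
final class OverlapAwareParser {
    static final DateTimeFormatter FORMATTER =
            DateTimeFormatter.ofPattern("EEE MMM dd HH:mm:ss zzz yyyy", Locale.ENGLISH);

    /** Parses the text, then picks the offset whose formatted form reproduces the input. */
    static ZonedDateTime parse(String text) {
        ZonedDateTime parsed = ZonedDateTime.parse(text, FORMATTER);
        List<ZonedDateTime> candidates = List.of(
                parsed,                               // whatever the parser chose
                parsed.withLaterOffsetAtOverlap(),    // standard time in a fall overlap
                parsed.withEarlierOffsetAtOverlap()); // summer time in a fall overlap
        for (ZonedDateTime candidate : candidates) {
            if (candidate.format(FORMATTER).equals(text)) {
                return candidate;
            }
        }
        throw new IllegalArgumentException("No offset reproduces the input: " + text);
    }

    /** Formats the instant in Europe/Berlin, parses the text back, returns parsed minus original millis. */
    static long roundTripDiff(long epochMilli) {
        String text = Instant.ofEpochMilli(epochMilli)
                .atZone(ZoneId.of("Europe/Berlin"))
                .format(FORMATTER);
        return parse(text).toInstant().toEpochMilli() - epochMilli;
    }

    public static void main(String[] args) {
        long start = 1_477_778_400_000L; // cet_dst_2016 from the question
        for (int hours = 0; hours < 6; hours++) {
            long time = start + 3_600_000L * hours;
            System.out.println(time + ", diff: " + roundTripDiff(time));
        }
    }
}
```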

But don’t parse three letter time zone abbreviations

All of the above said, even with a workaround for the fall overlap, you are on shaky ground when trying to parse time zone abbreviations. Many of the most common ones are ambiguous, and you don’t know what you get from parsing. CET and CEST are common abbreviations for many European time zones that at present share offset +01:00 during standard time and +02:00 during summer time, but historically each has had its own offset, and they may go separate ways again since the EU has been working towards giving up summer time completely. At some point one time zone may use CET all year and another CEST all year. My code above does not account for that.

Instead simply take the output from ZonedDateTime.toString and parse it back using the one-arg ZonedDateTime.parse(CharSequence).
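A minimal sketch of that ISO round trip, with a hypothetical class name and assuming a reasonably recent JDK. The string produced by toString contains both the offset and the zone ID, so the two 02:00 values in the overlap stay distinguishable without any abbreviation lookup.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZonedDateTime;

// Hypothetical demo class for the toString/parse round trip.
final class IsoRoundTrip {
    /** Serializes the instant via ZonedDateTime.toString and parses it back. */
    static ZonedDateTime roundTrip(long epochMilli) {
        ZonedDateTime original =
                Instant.ofEpochMilli(epochMilli).atZone(ZoneId.of("Europe/Berlin"));
        String text = original.toString(); // ISO-like text with offset and [zone ID]
        return ZonedDateTime.parse(text);  // one-arg parse uses ISO_ZONED_DATE_TIME
    }

    public static void main(String[] args) {
        long earlier = 1_477_785_600_000L; // 02:00 summer time in the overlap
        long later = 1_477_789_200_000L;   // 02:00 standard time, the troublesome value
        for (long time : new long[] {earlier, later}) {
            ZonedDateTime back = roundTrip(time);
            System.out.println(back + ", diff: " + (back.toInstant().toEpochMilli() - time));
        }
    }
}
```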
