Skip to content

Implement zoneinfo parsing/compilation and add TZif structs #257

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 49 commits into from
Jun 21, 2025

Conversation

nekevss
Copy link
Member

@nekevss nekevss commented Apr 2, 2025

It's time to finally tackle the zoneinfo compilation and tzif sourcing issue. The work here lays the baseline for to handle sourcing. There's still a lot more to add, and maybe some bugs to uncover. But this is also major progress to supporting the ability to source tzifs.

Opening as a draft first for early feedback due to just how dense this actually is. So far, the work here allows us to parse zoneinfo files and compile sets of transition data from the parsed zoneinfo files. There is also related work in provider to setup the tzif structures.

The steps missing from this PR is:

  • All that's left is to bridge the gap from transition sets -> TZIFs,
  • Plug everything into the bakeddata tool or a new connected tool.

@nekevss nekevss requested a review from jedel1043 April 2, 2025 03:04
@nekevss nekevss force-pushed the impl-zoneinfo-support branch 2 times, most recently from f0dd735 to fe9d51b Compare April 2, 2025 03:16
It's time to finally tackle the zoneinfo and tzif sourcing
issue. The work here lays the baseline for to handle sourcing.
There's still a lot more to add, and maybe some bugs to
uncover. But this is also major progress.
@nekevss nekevss force-pushed the impl-zoneinfo-support branch from 03303bd to 3a9ffcc Compare April 4, 2025 05:36
@nekevss nekevss force-pushed the impl-zoneinfo-support branch from 3a9ffcc to a12f14e Compare April 4, 2025 13:33
Copy link
Member Author

@nekevss nekevss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding some general comments for context.

write!(file, "//@generated\n\n{generated}")
}

fn write_debug(&self, debug_path: &Path) -> io::Result<()> {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there may be a better way to output the debug info rather than just one single JSON that will hopefully make the debugs a little more digestible

let tzif = TransitionData {
transitions,
single_line_zone,
// TODO: Handle POSIX tz string
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR is already really big, and sort of dense. Not sure that POSIX tz string serialization should be added to it.

But if it's preferred, then I can add it.

///
/// Please note: implementation is very minimal
#[derive(Debug)]
pub struct TzifBlockV2 {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are a lot of TZif structs everywhere. I like the convenience of a related struct being in zoneinfo_compiler, but I could be convinced that it should live elsewhere.

It might be nice to use structs from tzif, but that may require some upstream changes.

})
}

pub(crate) fn calculate_transitions_for_year(
Copy link
Member Author

@nekevss nekevss Apr 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method is where a large bulk of the zoneinfo compilation work lives.

I tried my best to document it as much as possible because it's basically a rough reverse engineering of outzone. So keeping general steps in the code seemed like a good idea for whenever we have to come back to it.

@nekevss nekevss marked this pull request as ready for review April 4, 2025 22:07
@nekevss
Copy link
Member Author

nekevss commented Apr 5, 2025

Just to note: I'm working through a couple bugs that I came across when double checking some more TZifs.

@nekevss nekevss requested a review from leftmostcat May 28, 2025 02:38
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we remove entries from the transitions table of each example? It would be nice to have something smaller for tests, not just a big file of values that we don't really use.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not entirely sure ... is there a different testing scheme that would be preferred.

We need to be able to test against the offset, the savings flag (is_dst), and the time zone abbreviation for correctness, so every entry in the transitions table is used in the test.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not proposing not testing those things, I'm proposing removing all the entries that are double testing the same things by carefully selecting a smaller range of transitions that has all the things we need to test. This would imply modifying the test data generation code to be a bit more aware about the test cases instead of just dumping everything from /usr/share/zoneinfo

Copy link
Member Author

@nekevss nekevss Jun 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The best option to do this would be to manually generate the test data in a slim format, which would contain all the primary transitions in the data, and removes the extra transitions from a bloat format that is normally found in /usr/share/zoneinfo.

I'm not exactly clear on how much that may save in the transitions though.

If you're referring to an even smaller subset of transitions from the slim format than it may be possible. Not exactly sure how at this point and it'd probably need to be worked through.

Just to note, while building this and testing this against these test cases, I've hit nearly every primary transition point. It's possible to correct a tricky transition, but in correcting it you've messed up the transition before and after and so on.

EDIT: Updating to slim removed ~6000 lines in 1b1ea48

pub transition_types: ZeroVec<'data, u8>,
// NOTE: zoneinfo64 does a fun little bitmap str
pub types: ZeroVec<'data, LocalTimeRecord>,
pub posix: &'data str,
Copy link
Member

@jedel1043 jedel1043 Jun 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would propose parsing the posix string, instead of leaving it as a raw string. IIRC, zoneinfo64 has the Rules field, which could map nicely into a common rules format for tzif and zoneinfo64, right?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this is changed in #265. If it isn't then, it would be easy to add on that follow up. This was primarily a filler at the time.

Copy link
Member Author

@nekevss nekevss Jun 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

zoneinfo64 has the Rules field, which could map nicely into a common rules format for tzif and zoneinfo64, right?

It does have a rules format, but I haven't looked super deeply at it or its purpose. It seemed a little strange or edge case based. The posix struct would map to the final* fields in the zone table entry.

EDIT: correction, they have consolidated the rules into a table. So, yes there is potential to map between the two formats if need be.

@nekevss nekevss requested a review from jedel1043 June 20, 2025 23:41
Copy link
Member

@jedel1043 jedel1043 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very impressive work!

@nekevss nekevss merged commit 6b4868c into main Jun 21, 2025
8 checks passed
@github-project-automation github-project-automation bot moved this from In review to Done in Temporal Implementation Jun 21, 2025
@nekevss nekevss deleted the impl-zoneinfo-support branch June 21, 2025 20:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
providers Related to time zone providers
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

3 participants