-
Notifications
You must be signed in to change notification settings - Fork 150
chore(typing): SparkLikeExpr
properties
#2152
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. Weβll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@MarcoGorelli if you merge this ahead of #2132 - you'll be able to get more useful IDE feedback from I checked out the PR and was confused why I didn't get hints from #2051 |
narwhals/_spark_like/expr_str.py
Outdated
def is_naive(obj: Any) -> bool: | ||
return obj is not None and {"%s", "%z", "Z"}.isdisjoint(obj) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
def is_naive(obj: Any) -> bool: | |
return obj is not None and {"%s", "%z", "Z"}.isdisjoint(obj) | |
def is_naive(format: str) -> bool: | |
return {"%s", "%z", "Z"}.isdisjoint(format) |
just to signal that this checks the date time format (should we maybe call it is_naive_format
?).
Also based on how it is used, format
will always be a non-null str right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @EdAbati, yeah is_naive_format
sounds good to me
I renamed format
to obj
to avoid a ruff warning for shadowing builtins.format
Also narrowing from Any
to str
sounds good as well.
I originally wrote it using a TypeIs
but removed that since it gave a false-positive on unreachability
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Happy for you to make those changes and merge π
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually I think the current set
needs to be changed to:
{"s", "z", "Z"}
I'm relying on format
splitting into a sequence of characters - but two of them are 2-char sequences.
I wonder why that didn't fail a test?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder why that didn't fail a test?
I actually haven't seen a test covering this π double checking with the datetime wizard @MarcoGorelli , is it something that we are not testing on purpose?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably related to
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe I'm reading this wrong, but I dont think we have coverage for any backend with a tz-aware format for .str.to_datetime(...)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@MarcoGorelli really need some help on writing a test for this π
Right now I've got something kinda working for a lot of backends.
But it fails to parse for sqlframe
, pyarrow
.
Also polars.Expr.str.to_datetime
seems to have more options - that might be able to preserve the timezone offset?
test_to_datetime_tz_aware
@pytest.mark.parametrize(
("fmt", "data", "expected_pandas", "expected_polars_duckdb"),
[
(
"%Y-%m-%d %H:%M:%S%z",
{"a": ["2020-01-01 12:34:56+0200"]},
"2020-01-01 12:34:56+02:00",
"2020-01-01 10:34:56+00:00",
)
],
)
def test_to_datetime_tz_aware(
constructor: Constructor,
fmt: str,
data: dict[str, Sequence[str]],
expected_pandas: str,
expected_polars_duckdb: str,
) -> None:
constructor_str = str(constructor)
if "pandas" in constructor_str or "dask" in constructor_str:
expected = expected_pandas
elif "polars" in constructor_str or "duckdb" in constructor_str:
expected = expected_polars_duckdb
else:
expected = "fails"
result = (
nw.from_native(constructor(data))
.lazy()
.select(b=nw.col("a").str.to_datetime(fmt))
.collect()
.item(row=0, column="b")
)
assert str(result) == expected
I can fully understand nope-ing out of handling timezones for all these different packages π
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alright so (e19dcdf) -> (194ef83) now has a single test
sqlframe
works, pyspark
apprently doesn't https://github.com/narwhals-dev/narwhals/actions/runs/13700361438/job/38312364899?pr=2152
Absolutely don't want to do timezones again π©
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@FBruzzesi I see you added strptime_to_pyspark_format
in #1826
AFAICT there were never any tests for tz-aware timestamps and right now I can't get one to work π
Tried my best, but as of (ce3145d) I'm giving up
Maybe this was just a TZDATA issue locally? https://github.com/narwhals-dev/narwhals/actions/runs/13699734154/job/38310256617?pr=2152
Not even sure what `pyarrow` is doing here https://github.com/narwhals-dev/narwhals/actions/runs/13700021595/job/38311197947?pr=2152
SparkLikeExpr
propertiesSparkLikeExpr
property typing, fix str.to_datetime
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for looking into this!
I think this PR is doing too much, please either address typing or fix str.to_datetime
the former should be fine, some comments for the latter:
- can we keep
format
even if it overrides the builtin? - we should probably make sure that we consistently parse into UTC, so there should only be a single
expected
fortest_to_datetime_tz_aware
I can take care of the timezone handling separately, but it does need splitting out
SparkLikeExpr
property typing, fix str.to_datetime
SparkLikeExpr
properties
Thanks for the review @MarcoGorelli Caught me at a good time with the feedback - hopefully I've addressed it all in:
Phew - I'm in no hurry to deal w/ timezones again π EditJust for some clarity - I touched that method to resolve these https://github.com/narwhals-dev/narwhals/actions/runs/13682912007/job/38259412675?pr=2152 Looking back now - I'm not sure that was super clear |
narwhals/_spark_like/expr_str.py
Outdated
def is_naive_format(format: str) -> bool: # noqa: A002 | ||
return {"s", "z", "Z"}.isdisjoint(format) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i don't really like how we're changing the logic here, 'z'
in a format string means the 'z'
literal, it's '%z'
we need to check for
the rest looks good though, well done!
PR to address time zone aware parsing: #2166
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @MarcoGorelli
I'm happy for that bit to be reverted and just keep the splitting out into the new function?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This kind of any(x in (...))
check would probably be good
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure! if you're happy with #2166 to add a test and adjust some logic, happy to then take this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hopefully got it done (311c7ea)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks @dangotbanned much appreciated!
What type of PR is this? (check all applicable)
Related issues
Checklist
If you have comments or can explain your changes, please do so below
Porting over (#2051), didn't realize this was declared twice until (#2132)