chore(typing): enable typing checks for pyspark
#2051
Conversation
for more information, see https://pre-commit.ci
I'd really like this PR to be resolving warnings vs just adding ignore comments.
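For illustration, here is a hedged sketch of that trade-off (hypothetical code, not from this PR, assuming pyspark's own annotations mark `DataFrame.head()` as returning an optional `Row`): an ignore comment hides an edge case that resolving the warning would surface.

```python
from pyspark.sql import DataFrame


def first_row_ignored(df: DataFrame) -> dict:
    row = df.head()  # annotated as an optional Row in pyspark's own stubs
    # the ignore comment silences the checker but hides the empty-frame case
    return row.asDict()  # type: ignore[union-attr]


def first_row_resolved(df: DataFrame) -> dict:
    row = df.head()
    if row is None:
        # resolving the warning means handling the None case for real
        return {}
    return row.asDict()
```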
Thanks for the comments, I'll move the `pyspark` typing and related unpivot bug …
Spotted an error locally that should be showing in CI https://github.com/narwhals-dev/narwhals/blob/12afbfe03ea3cc46330d2dd3520892912859926f/narwhals/dependencies.py#L234
https://github.com/narwhals-dev/narwhals/actions/runs/13527240491/job/37800780977?pr=2051 This one is gonna take a little extra work to fix Update(11bde2e) is a pretty fast-and-loose solution to the problem. We should think about moving the
Important: Food for thought - not proposing these changes become part of this PR
Fixes this and all the issues that propagated from solving https://github.com/narwhals-dev/narwhals/actions/runs/13527240491/job/37800780977?pr=2051
Thanks for working on this PR @EdAbati!
Had a lot of fun spiraling through the issues that came up here.
I'll leave this open to let someone else merge - since I've made a lot of changes that I'd also want reviewed
I must admit I still get easily lost in the typing shenanigans - can't spot anything that does not sound reasonable, but it's not a review that you should trust either
from narwhals._spark_like.expr import SparkLikeExpr
from narwhals._spark_like.group_by import SparkLikeLazyGroupBy
from narwhals._spark_like.namespace import SparkLikeNamespace
from narwhals.dtypes import DType
from narwhals.utils import Version

SQLFrameDataFrame: TypeAlias = _SQLFrameDataFrame[Any, Any, Any, Any, Any]
Damn that's a hell of a generic
> Damn that's a hell of a generic

Ikr! All the `Unknown`(s) in this gave me quite the scare
https://github.com/narwhals-dev/narwhals/actions/runs/13527240491/job/37800780977?pr=2051
if TYPE_CHECKING:
    from pyspark.sql import types

return types
Is there any drawback to this? I have barely seen these conditionals in non-import blocks
> Is there any drawback to this? I have barely seen these conditionals in non-import blocks

Oh let me introduce you to how `pytest` works then
https://github.com/pytest-dev/pytest/blob/9cf2cae94355cd83ad7e8d88f976c5a524c98cfb/src/_pytest/mark/structures.py#L432-L485
https://github.com/pytest-dev/pytest/blob/9cf2cae94355cd83ad7e8d88f976c5a524c98cfb/src/_pytest/mark/structures.py#L505-L512
https://github.com/pytest-dev/pytest/blob/9cf2cae94355cd83ad7e8d88f976c5a524c98cfb/src/_pytest/_py/path.py#L203-L217
`polars` does some pretty complex stuff here - which includes tricking the type checker to recognise these symbols & getting lazy imports from them:
https://github.com/pola-rs/polars/blob/45ec22a18c0cbdc67f56b3230f8b76149679ef02/py-polars/polars/dependencies.py
https://github.com/pola-rs/polars/blob/45ec22a18c0cbdc67f56b3230f8b76149679ef02/py-polars/polars/dataframe/frame.py#L88-L90
🤯🤯
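A heavily simplified sketch of the general idea (not polars' actual machinery): the type checker resolves a real import, while runtime access goes through a module-level hook so the dependency is only imported when first touched.

```python
from typing import TYPE_CHECKING, Any

if TYPE_CHECKING:
    # the checker sees the real module and can type-check attribute access
    import pyarrow
else:
    def __getattr__(name: str) -> Any:  # PEP 562 module-level hook
        # only pay the import cost if/when the dependency is actually used,
        # e.g. via `from this_module import pyarrow` elsewhere
        if name == "pyarrow":
            import pyarrow

            return pyarrow
        raise AttributeError(name)
```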
> Is there any drawback to this? I have barely seen these conditionals in non-import blocks
I'm not sure I'd say drawback - but I would recommend only trying this as a last resort.
Here - I'm using it to describe something the type system doesn't support yet.
Another use was #2064 (comment) - for a backwards compat bug fix.
I'd much rather do things in a simple way whenever possible
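A hedged sketch of what that buys here (names assumed, not the exact narwhals code): the checker only sees the `TYPE_CHECKING` branch and so pins the return type to `pyspark.sql.types`, while the runtime import stays deferred and may resolve to the `sqlframe` equivalent instead.

```python
from __future__ import annotations

from typing import TYPE_CHECKING


def native_types(use_sqlframe: bool = False):
    if TYPE_CHECKING:
        # only the type checker takes this path; it fixes the return type
        from pyspark.sql import types

        return types
    if use_sqlframe:
        from sqlframe.base import types  # same surface, different backend
    else:
        from pyspark.sql import types
    return types
```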
I did a similar thing here to convert these functions into typeguards
narwhals/narwhals/_arrow/utils.py
Lines 18 to 62 in 81353a3
if TYPE_CHECKING:
    from typing import TypeVar
    from typing_extensions import TypeAlias
    from typing_extensions import TypeIs
    from narwhals._arrow.series import ArrowSeries
    from narwhals._arrow.typing import ArrowArray
    from narwhals._arrow.typing import ArrowChunkedArray
    from narwhals._arrow.typing import Incomplete
    from narwhals._arrow.typing import StringArray
    from narwhals.dtypes import DType
    from narwhals.typing import _AnyDArray
    from narwhals.utils import Version

    # NOTE: stubs don't allow for `ChunkedArray[StructArray]`
    # Intended to represent the `.chunks` property storing `list[pa.StructArray]`
    ChunkedArrayStructArray: TypeAlias = ArrowChunkedArray

    _T = TypeVar("_T")

    def is_timestamp(t: Any) -> TypeIs[pa.TimestampType[Any, Any]]: ...
    def is_duration(t: Any) -> TypeIs[pa.DurationType[Any]]: ...
    def is_list(t: Any) -> TypeIs[pa.ListType[Any]]: ...
    def is_large_list(t: Any) -> TypeIs[pa.LargeListType[Any]]: ...
    def is_fixed_size_list(t: Any) -> TypeIs[pa.FixedSizeListType[Any, Any]]: ...
    def is_dictionary(
        t: Any,
    ) -> TypeIs[pa.DictionaryType[Any, Any, Any]]: ...
    def extract_regex(
        strings: ArrowChunkedArray,
        /,
        pattern: str,
        *,
        options: Any = None,
        memory_pool: Any = None,
    ) -> ChunkedArrayStructArray: ...
else:
    from pyarrow.compute import extract_regex
    from pyarrow.types import is_dictionary  # noqa: F401
    from pyarrow.types import is_duration
    from pyarrow.types import is_fixed_size_list
    from pyarrow.types import is_large_list
    from pyarrow.types import is_list
    from pyarrow.types import is_timestamp
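A small usage sketch (hypothetical, assuming the `TYPE_CHECKING` overlay above is in effect) of what those `TypeIs` signatures buy: after the guard, the checker narrows the value to the specific pyarrow type, so its attributes resolve without casts or ignores.

```python
import pyarrow as pa
from pyarrow.types import is_timestamp


def describe(dtype: pa.DataType) -> str:
    if is_timestamp(dtype):
        # narrowed to pa.TimestampType here, so .unit and .tz type-check
        return f"timestamp[{dtype.unit}, tz={dtype.tz}]"
    return str(dtype)
```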
Forgot to mention, but (#2051 (comment)) would be the direction to go in - to avoid needing `TYPE_CHECKING` blocks.

Keeping everything in one class means you need to either do this - or introduce `Union`(s) and `isinstance` checks to resolve them on every usage - to type this correctly.
Currently, we branch on `Implementation`, but we can't represent the relationship between `Implementation` and these unions:
- `pyspark.sql.types` or `sqlframe.base.types`
- `pyspark.sql.functions` or `sqlframe.base.functions`
- `pyspark.sql.Window` or `sqlframe.base.window.Window`
- `pyspark.sql.dataframe.DataFrame.sparkSession` or `sqlframe.base.dataframe.BaseDataFrame.session`
They're a mix of modules, classes and object attributes
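For illustration only, a hedged sketch of that `Union` + `isinstance` alternative (alias names assumed): every usage site would have to re-narrow the union before touching the attributes that diverge between the two backends.

```python
from __future__ import annotations

from typing import TYPE_CHECKING, Any, Union

if TYPE_CHECKING:
    from pyspark.sql import DataFrame as PySparkDataFrame
    from sqlframe.base.dataframe import BaseDataFrame

    # assumed alias name for the union of native frames
    NativeFrame = Union[PySparkDataFrame, BaseDataFrame[Any, Any, Any, Any, Any]]


def session_of(df: NativeFrame) -> Any:
    from pyspark.sql import DataFrame as PySparkDataFrame

    if isinstance(df, PySparkDataFrame):
        return df.sparkSession  # pyspark spells the attribute this way
    return df.session  # sqlframe's BaseDataFrame spells it this way
```

The runtime `isinstance` check forces an eager pyspark import, which is part of the cost that branching on `Implementation` avoids in the first place.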
Ah perfect, thanks @MarcoGorelli - yeah I'm happy with it if you are.

Edit: ffs every time with the double gif 🤦
* chore(typing): Add typing for `SparkLikeExpr` properties - Porting over (#2051), didn't realize this was declared twice until (#2132)
* chore: fix typing and simplify `SparkLikeExprStringNamespace.to_datetime` - Resolves https://github.com/narwhals-dev/narwhals/actions/runs/13682912007/job/38259412675?pr=2152
* rename function
* use single chars in set
* fix: Remove timezone offset replacement
* test: Adds `test_to_datetime_tz_aware` - Resolves #2152 (comment)
* test: possibly fix `pyarrow` in ci? Maybe this was just a TZDATA issue locally? https://github.com/narwhals-dev/narwhals/actions/runs/13699734154/job/38310256617?pr=2152
* test: xfail polars `3.8`, fix false positive pyarrow - https://github.com/narwhals-dev/narwhals/actions/runs/13699804987/job/38310487932?pr=2152 https://github.com/narwhals-dev/narwhals/actions/runs/13699804987/job/38310488783?pr=2152
* test: narrower xfail, tz-less expected? Not even sure what `pyarrow` is doing here - https://github.com/narwhals-dev/narwhals/actions/runs/13700021595/job/38311197947?pr=2152
* test: account for `pyarrow` version changes - https://github.com/narwhals-dev/narwhals/actions/runs/13700267075/job/38312036397?pr=215
* test: maybe fix `pyspark` - https://github.com/narwhals-dev/narwhals/actions/runs/13700361438/job/38312364899?pr=2152
* revert: go back to typing fixes only - Addresses #2152 (review)
* chore: ignore `format` shadowing - #2152 (review)
* keep logic the same I hope - #2152 (comment)

---------

Co-authored-by: Edoardo Abati <[email protected]>
- Added in narwhals-dev#2187, narwhals-dev#2051
- No longer needed since `sqlframe` only typing
What type of PR is this? (check all applicable)
Related issues
Checklist
If you have comments or can explain your changes, please do so below