
fix TypeVar used in (SparkLike|DuckDB)Expr #2044


Closed
dangotbanned opened this issue Feb 19, 2025 · 5 comments · Fixed by #2149

Comments

@dangotbanned
Member

dangotbanned commented Feb 19, 2025

Originally posted by @EdAbati in #2005 (comment)

This reminds me that we should fix the typing for CompliantSeriesT_co
for Spark we use Column, and in duckdb, Expression. They are not compliant series :/

@MarcoGorelli
Member

thanks!

we may want to just make CompliantExpr generic in some TypeVar, rather than CompliantSeriesT_co, which would then force us to implement unnecessary compliant series for duckdb / pyspark
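
Roughly, something like the following (a hypothetical sketch; the TypeVar name is just a placeholder, and the real CompliantExpr has many more members):

from typing import Any, Generic, Protocol, Sequence, TypeVar

# Hypothetical: an unconstrained TypeVar instead of one bound to CompliantSeries,
# so duckdb / pyspark could parametrise CompliantExpr with their native expression
# types without defining compliant series of their own.
IntoSeriesT_co = TypeVar("IntoSeriesT_co", covariant=True)


class CompliantExpr(Protocol, Generic[IntoSeriesT_co]):
    # Returns whatever the backend evaluates expressions into: compliant series
    # for pandas / pyarrow, native expressions for duckdb / pyspark.
    def __call__(self, df: Any) -> Sequence[IntoSeriesT_co]: ...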

@dangotbanned
Member Author

dangotbanned commented Feb 19, 2025

#2044 (comment)

Sounds interesting @MarcoGorelli

I'm exploring an idea that seems like it could be related - I should have something more concrete to share later today.

Essentially, I think the pandas & pyarrow impls could be better represented in the type system with sub-protocols.
I was initially looking at this to fix #2035 (comment) and all the `# type: ignore`(s) in _expression_parsing.py, but the deeper I go, the more it seems we could share more code between the two (#1942).

It sounds like pyspark & duckdb may also benefit from something like that - even if it's only a small thing like a TypeVar and not anything structural (rough sketch below).
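
As a rough sketch of the sub-protocol idea (hypothetical names; evaluate_into_series is not an actual narwhals function, just an illustration):

from typing import Protocol, Sequence, TypeVar

SeriesT = TypeVar("SeriesT")
SeriesT_co = TypeVar("SeriesT_co", covariant=True)


class EagerExpr(Protocol[SeriesT_co]):
    # Narrower protocol that only the eager backends (pandas-like, pyarrow)
    # would need to satisfy.
    def __call__(self, df: object) -> Sequence[SeriesT_co]: ...


def evaluate_into_series(expr: EagerExpr[SeriesT], df: object) -> Sequence[SeriesT]:
    # A shared helper typed once against the sub-protocol, usable by both
    # eager implementations without per-backend `# type: ignore`s.
    return expr(df)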

@MarcoGorelli
Member

sure sounds good, thanks for your investigations!

@dangotbanned
Member Author

sure sounds good, thanks for your investigations!

@MarcoGorelli I've just pushed to https://github.com/narwhals-dev/narwhals/tree/reuse-sub-protocols
Everything seems to be type-checking a lot better 🎉

Hope you can get the gist of where I'm going there.

Didn't wanna make it a PR yet since it'll touch too much and be a headache to keep in sync.

dangotbanned added a commit that referenced this issue Feb 20, 2025
Didn't resolve any issues, better solution in (#2044 (comment))
@dangotbanned
Member Author

dangotbanned commented Feb 21, 2025

thanks!

we may want to just make CompliantExpr generic in some TypeVar, rather than CompliantSeriesT_co, which would then force us to implement unnecessary compliant series for duckdb / pyspark

@MarcoGorelli
Would this be an opportunity to encode the levels of support concept into the type system? (ignoring interchange)

The distinction would be EagerExpr and LazyExpr:

from typing import Generic, Protocol, TypeVar

CompliantSeriesT_co = TypeVar(
    "CompliantSeriesT_co", bound="CompliantSeries", covariant=True
)
NativeExprT_co = TypeVar("NativeExprT_co", bound="NativeExpr", covariant=True)


class CompliantSeries(Protocol): ...


# TODO: Find some common property each share
class NativeExpr(Protocol): ...


class EagerExpr(Generic[CompliantSeriesT_co], Protocol): ...


class LazyExpr(Generic[NativeExprT_co], Protocol): ...

EagerExpr would be what we currently have for CompliantExpr:


class CompliantExpr(Protocol, Generic[CompliantSeriesT_co]):
    _implementation: Implementation
    _backend_version: tuple[int, ...]
    _evaluate_output_names: Callable[
        [CompliantDataFrame | CompliantLazyFrame], Sequence[str]
    ]
    _alias_output_names: Callable[[Sequence[str]], Sequence[str]] | None
    _depth: int
    _function_name: str

    def __call__(self, df: Any) -> Sequence[CompliantSeriesT_co]: ...
    def __narwhals_expr__(self) -> None: ...
    def __narwhals_namespace__(self) -> CompliantNamespace[CompliantSeriesT_co]: ...
    def is_null(self) -> Self: ...
    def alias(self, name: str) -> Self: ...
    def cast(self, dtype: DType) -> Self: ...
    def __and__(self, other: Any) -> Self: ...
    def __or__(self, other: Any) -> Self: ...
    def __add__(self, other: Any) -> Self: ...
    def __sub__(self, other: Any) -> Self: ...
    def __mul__(self, other: Any) -> Self: ...
    def __floordiv__(self, other: Any) -> Self: ...
    def __truediv__(self, other: Any) -> Self: ...
    def __mod__(self, other: Any) -> Self: ...
    def __pow__(self, other: Any) -> Self: ...
    def __gt__(self, other: Any) -> Self: ...
    def __ge__(self, other: Any) -> Self: ...
    def __lt__(self, other: Any) -> Self: ...
    def __le__(self, other: Any) -> Self: ...
    def broadcast(
        self, kind: Literal[ExprKind.AGGREGATION, ExprKind.LITERAL]
    ) -> Self: ...
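
On the eager side, the existing backends could then be declared against it, e.g. (illustrative only, reusing the existing compliant series types):

class PandasLikeExpr(EagerExpr["PandasLikeSeries"]): ...
class ArrowExpr(EagerExpr["ArrowSeries"]): ...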

And LazyExpr would be for DaskExpr, DuckDBExpr, and SparkLikeExpr, which would be redefined as:

class DaskExpr(LazyExpr["dask.dataframe.Series"]): ...
class DuckDBExpr(LazyExpr["duckdb.Expression"]): ...
class SparkLikeExpr(LazyExpr["pyspark.sql.Column"]): ...
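
Building on the LazyExpr sketch above, backend-internal helpers could then be typed against the native expression type directly; a hypothetical helper just to show the parametrisation, not actual narwhals code:

from typing import TYPE_CHECKING, Sequence

if TYPE_CHECKING:
    import duckdb


# Hypothetical: the type checker now knows these expressions evaluate to
# duckdb.Expression, with no compliant series involved anywhere.
def combine(
    exprs: Sequence[LazyExpr["duckdb.Expression"]],
) -> LazyExpr["duckdb.Expression"]: ...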

Note

(Eager|Lazy)Expr would likely be a sub-protocol of the current CompliantExpr or some newly named thing.
Just trying to keep this minimal here.

dangotbanned added a commit that referenced this issue Feb 22, 2025
- Only showing locally when `dask` is installed
- Hoping it doesn't show in CI as unused
- Can't fix until #2044
dangotbanned added a commit that referenced this issue Feb 22, 2025
Didn't resolve any issues, better solution in (#2044 (comment))
dangotbanned added a commit that referenced this issue Feb 28, 2025
- Big overlap with the gaps in `DaskExpr`
- Probably should move bits into #2044 (comment)
dangotbanned added a commit that referenced this issue Mar 1, 2025
@dangotbanned dangotbanned linked a pull request Mar 6, 2025 that will close this issue