Skip to content

chore: Add EagerSeries*Namespace protocols #2294

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 18 commits into from
Mar 27, 2025
Merged

Conversation

dangotbanned
Copy link
Member

@dangotbanned dangotbanned commented Mar 25, 2025

Closes #2230

What type of PR is this? (check all applicable)

  • πŸ’Ύ Refactor
  • ✨ Feature
  • πŸ› Bug Fix
  • πŸ”§ Optimization
  • πŸ“ Documentation
  • βœ… Test
  • 🐳 Other

Related issues

Checklist

  • Code follows style guide (ruff)
  • Tests added
  • Documented the changes

If you have comments or can explain your changes, please do so below

- Needs to work differently to `Expr` version
Using `.cat` as a test case, since it has only 1 method
Pushing to try out `3.8`& `3.10` bugs in CI again
- Regular `Protocol` failed on `3.8`, `3.9`, `3.10`
  - (f4c0cd6)
- Testing if I need on **every** protocol, or just the one with an `__init__`
- Thinking this allows using the `Protocol` defs within `EagerSeries` properties

6a39d56
They were added before the package existed and don't belong in `utils`
Comment on lines +310 to +343
class _SeriesNamespace( # type: ignore[misc]
_StoresCompliant[CompliantSeriesT_co],
_StoresNative[NativeSeriesT_co],
Protocol[CompliantSeriesT_co, NativeSeriesT_co],
):
_compliant_series: CompliantSeriesT_co

@property
def compliant(self) -> CompliantSeriesT_co:
return self._compliant_series

@property
def native(self) -> NativeSeriesT_co:
return self._compliant_series.native # type: ignore[no-any-return]

def from_native(self, series: Any, /) -> CompliantSeriesT_co:
return self.compliant._from_native_series(series)


class EagerSeriesNamespace(
_SeriesNamespace[EagerSeriesT_co, NativeSeriesT_co],
Generic[EagerSeriesT_co, NativeSeriesT_co],
):
_compliant_series: EagerSeriesT_co

def __init__(self, series: EagerSeriesT_co, /) -> None:
self._compliant_series = series


class EagerSeriesCatNamespace( # type: ignore[misc]
_SeriesNamespace[EagerSeriesT_co, NativeSeriesT_co],
CatNamespace[EagerSeriesT_co],
Protocol[EagerSeriesT_co, NativeSeriesT_co],
): ...
Copy link
Member Author

@dangotbanned dangotbanned Mar 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that these are in the right place, I can collect my thoughts on the 3 solutions we can use for Protocol definintions with an __init__.

Problem

Defining a constructor in a Protocol can produce two different bugs.
I've documented that in (#2064 (comment)) - but the short of it is 3.8 and 3.(9|10) have issues.

Solutions

1. Protocol38

This was the first one I discovered (for #2064) and I really dislike it due to the complexity.

"Fixing" 3.8 requires tricking the type checker into thinking typing.Generic is typing.Protocol.

Imports

if not TYPE_CHECKING: # pragma: no cover
if sys.version_info >= (3, 9):
from typing import Protocol as Protocol38
else:
from typing import Generic as Protocol38
else: # pragma: no cover
# TODO @dangotbanned: Remove after dropping `3.8` (#2084)
# - https://github.com/narwhals-dev/narwhals/pull/2064#discussion_r1965921386
from typing import Protocol as Protocol38

The side-effect is every sub-protocol is then required to use that Protocol38 definition or face runtime errors - that can no-longer be caught statically

E   TypeError: Protocols can only inherit from other protocols, got <class 'narwhals._compliant.series.EagerSeriesNamespace'>

"Fixing" 3.(9|10) requires redefining __init__ anyway in the impl.
Which mostly negates the benefit of defining in the Protocol πŸ€¦β€β™‚οΈ

Repeat __init__

def __init__(
self: Self,
call: Callable[[EagerDataFrameT], Sequence[EagerSeriesT]],
*,
depth: int,
function_name: str,
evaluate_output_names: Callable[[EagerDataFrameT], Sequence[str]],
alias_output_names: Callable[[Sequence[str]], Sequence[str]] | None,
implementation: Implementation,
backend_version: tuple[int, ...],
version: Version,
call_kwargs: dict[str, Any] | None = None,
) -> None: ...

def __init__(
self: Self,
call: Callable[[ArrowDataFrame], Sequence[ArrowSeries]],
*,
depth: int,
function_name: str,
evaluate_output_names: Callable[[ArrowDataFrame], Sequence[str]],
alias_output_names: Callable[[Sequence[str]], Sequence[str]] | None,
backend_version: tuple[int, ...],
version: Version,
call_kwargs: dict[str, Any] | None = None,
implementation: Implementation | None = None,
) -> None:
self._call = call
self._depth = depth
self._function_name = function_name
self._depth = depth
self._evaluate_output_names = evaluate_output_names
self._alias_output_names = alias_output_names
self._backend_version = backend_version
self._version = version
self._call_kwargs = call_kwargs or {}
self._metadata: ExprMetadata | None = None

def __init__(
self: Self,
call: Callable[[PandasLikeDataFrame], Sequence[PandasLikeSeries]],
*,
depth: int,
function_name: str,
evaluate_output_names: Callable[[PandasLikeDataFrame], Sequence[str]],
alias_output_names: Callable[[Sequence[str]], Sequence[str]] | None,
implementation: Implementation,
backend_version: tuple[int, ...],
version: Version,
call_kwargs: dict[str, Any] | None = None,
) -> None:
self._call = call
self._depth = depth
self._function_name = function_name
self._evaluate_output_names = evaluate_output_names
self._alias_output_names = alias_output_names
self._implementation = implementation
self._backend_version = backend_version
self._version = version
self._call_kwargs = call_kwargs or {}
self._metadata: ExprMetadata | None = None

2. Define the constructor another way

(#2261) introduced a different solution - but still needed to use Protocol38 to avoid the metaclass issues of the first solution.

I like this a lot more as it feels more intuitive from the outside - given the clear chain of @classmethod and @property definitions.

when-then

def when(self: Self, predicate: ArrowExpr) -> ArrowWhen:
return ArrowWhen.from_expr(predicate, context=self)

class ArrowWhen(EagerWhen[ArrowDataFrame, ArrowSeries, ArrowExpr, "ArrowChunkedArray"]):
@property
def _then(self) -> type[ArrowThen]:
return ArrowThen

@classmethod
def from_expr(cls, condition: ExprT, /, *, context: _FullContext) -> Self:
obj = cls.__new__(cls)
obj._condition = condition
obj._then_value = None
obj._otherwise_value = None
obj._implementation = context._implementation
obj._backend_version = context._backend_version
obj._version = context._version
return obj

@property
def _then(self) -> type[CompliantThen[FrameT, SeriesT, ExprT]]: ...

def then(
self, value: IntoExpr[SeriesT, ExprT], /
) -> CompliantThen[FrameT, SeriesT, ExprT]:
return self._then.from_when(self, value)

@classmethod
def from_when(
cls,
when: CompliantWhen[FrameT, SeriesT, ExprT],
then: IntoExpr[SeriesT, ExprT],
/,
) -> Self:
when._then_value = then
obj = cls.__new__(cls)
obj._call = when
obj._when_value = when
obj._depth = 0
obj._function_name = "whenthen"
obj._evaluate_output_names = getattr(
then, "_evaluate_output_names", lambda _df: ["literal"]
)
obj._alias_output_names = getattr(then, "_alias_output_names", None)
obj._implementation = when._implementation
obj._backend_version = when._backend_version
obj._version = when._version
obj._call_kwargs = {}
return obj

The only downside I could see is __new__ being a less-understood method than __init__.
Personally it took me a while to understand when it makes sense to use.
Even now that I do - I didn't think of it as a solution until a month after (#2064) merged πŸ˜…

3. Avoid constructors entirely in Protocol(s)

That is the solution this comment is attached to.

It might be easy to miss, but _SeriesNamespace is a Protocol and EagerSeriesNamespace is a Generic class.

The interesting aspect of that is the Protocol(s) represent the interface - but the concrete impl is separate.
This approach makes sense when the interface and implementation differ greatly.
Here it is ArrowSeries and PandasLikeSeries - but we go fully structural for all of Polars

@dangotbanned
Copy link
Member Author

dangotbanned commented Mar 26, 2025

https://github.com/narwhals-dev/narwhals/actions/runs/14091955981/job/39470550165?pr=2294

Haven't figured out why yet, but the .pivot(...) here is throwing for Series.dt

def render_table_and_write_to_output(
results: list[pl.DataFrame], # pyright: ignore[reportRedeclaration]
title: str,
output_filename: str,
) -> None:
results: pl.DataFrame = (
pl.concat(results)
.with_columns(supported=pl.lit(":white_check_mark:"))
.pivot(on="Backend", values="supported", index=["Method"])

First guess would be the extra properties for PandasLike in

class PandasLikeSeriesNamespace(EagerSeriesNamespace["PandasLikeSeries", Any]):
@property
def implementation(self) -> Implementation:
return self.compliant._implementation
@property
def backend_version(self) -> tuple[int, ...]:
return self.compliant._backend_version
@property
def version(self) -> Version:
return self.compliant._version

Update

Resolved in (3ff77a0)

@dangotbanned dangotbanned changed the title chore(DRAFT): Add EagerSeries*Namespace protocols chore: Add EagerSeries*Namespace protocols Mar 26, 2025
@dangotbanned dangotbanned marked this pull request as ready for review March 26, 2025 21:02
@dangotbanned dangotbanned mentioned this pull request Mar 26, 2025
15 tasks
@dangotbanned dangotbanned linked an issue Mar 26, 2025 that may be closed by this pull request
15 tasks
Copy link
Member

@MarcoGorelli MarcoGorelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks @dangotbanned !

@MarcoGorelli MarcoGorelli merged commit 47d0d80 into main Mar 27, 2025
28 checks passed
@MarcoGorelli MarcoGorelli deleted the series-ns-cleanup branch March 27, 2025 18:50
dangotbanned added a commit that referenced this pull request Apr 1, 2025
Still need to untangle `CompliantExpr` + friends

#2294 (comment)
MarcoGorelli pushed a commit that referenced this pull request Apr 4, 2025
* refactor: Add `SelectorNamespace.from_namespace`

#2294 (comment)

* refactor: Reuse pattern from `when_then`

* refactor: Only use `Protocol38` in sub `CompliantExpr`

Still need to untangle `CompliantExpr` + friends

#2294 (comment)

* refactor: Simplify typing

* refactor: Move `Eval*` aliases to `_compliant.typing`
@dangotbanned dangotbanned mentioned this pull request Jun 20, 2025
10 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Formalise Compliant* protocols
2 participants