-
Notifications
You must be signed in to change notification settings - Fork 150
chore: Add EagerSeries*Namespace
protocols
#2294
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. Weβll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
- Needs to work differently to `Expr` version
Using `.cat` as a test case, since it has only 1 method
Pushing to try out `3.8`& `3.10` bugs in CI again
- Regular `Protocol` failed on `3.8`, `3.9`, `3.10` - (f4c0cd6) - Testing if I need on **every** protocol, or just the one with an `__init__`
- Thinking this allows using the `Protocol` defs within `EagerSeries` properties 6a39d56
They were added before the package existed and don't belong in `utils`
class _SeriesNamespace( # type: ignore[misc] | ||
_StoresCompliant[CompliantSeriesT_co], | ||
_StoresNative[NativeSeriesT_co], | ||
Protocol[CompliantSeriesT_co, NativeSeriesT_co], | ||
): | ||
_compliant_series: CompliantSeriesT_co | ||
|
||
@property | ||
def compliant(self) -> CompliantSeriesT_co: | ||
return self._compliant_series | ||
|
||
@property | ||
def native(self) -> NativeSeriesT_co: | ||
return self._compliant_series.native # type: ignore[no-any-return] | ||
|
||
def from_native(self, series: Any, /) -> CompliantSeriesT_co: | ||
return self.compliant._from_native_series(series) | ||
|
||
|
||
class EagerSeriesNamespace( | ||
_SeriesNamespace[EagerSeriesT_co, NativeSeriesT_co], | ||
Generic[EagerSeriesT_co, NativeSeriesT_co], | ||
): | ||
_compliant_series: EagerSeriesT_co | ||
|
||
def __init__(self, series: EagerSeriesT_co, /) -> None: | ||
self._compliant_series = series | ||
|
||
|
||
class EagerSeriesCatNamespace( # type: ignore[misc] | ||
_SeriesNamespace[EagerSeriesT_co, NativeSeriesT_co], | ||
CatNamespace[EagerSeriesT_co], | ||
Protocol[EagerSeriesT_co, NativeSeriesT_co], | ||
): ... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now that these are in the right place, I can collect my thoughts on the 3 solutions we can use for Protocol
definintions with an __init__
.
Problem
Defining a constructor in a Protocol
can produce two different bugs.
I've documented that in (#2064 (comment)) - but the short of it is 3.8
and 3.(9|10)
have issues.
Solutions
1. Protocol38
This was the first one I discovered (for #2064) and I really dislike it due to the complexity.
"Fixing" 3.8
requires tricking the type checker into thinking typing.Generic
is typing.Protocol
.
Imports
narwhals/narwhals/_compliant/expr.py
Lines 40 to 48 in 697bb2c
if not TYPE_CHECKING: # pragma: no cover | |
if sys.version_info >= (3, 9): | |
from typing import Protocol as Protocol38 | |
else: | |
from typing import Generic as Protocol38 | |
else: # pragma: no cover | |
# TODO @dangotbanned: Remove after dropping `3.8` (#2084) | |
# - https://github.com/narwhals-dev/narwhals/pull/2064#discussion_r1965921386 | |
from typing import Protocol as Protocol38 |
The side-effect is every sub-protocol is then required to use that Protocol38
definition or face runtime errors - that can no-longer be caught statically
E TypeError: Protocols can only inherit from other protocols, got <class 'narwhals._compliant.series.EagerSeriesNamespace'>
"Fixing" 3.(9|10)
requires redefining __init__
anyway in the impl.
Which mostly negates the benefit of defining in the Protocol
π€¦ββοΈ
Repeat __init__
narwhals/narwhals/_compliant/expr.py
Lines 332 to 345 in 697bb2c
def __init__( | |
self: Self, | |
call: Callable[[EagerDataFrameT], Sequence[EagerSeriesT]], | |
*, | |
depth: int, | |
function_name: str, | |
evaluate_output_names: Callable[[EagerDataFrameT], Sequence[str]], | |
alias_output_names: Callable[[Sequence[str]], Sequence[str]] | None, | |
implementation: Implementation, | |
backend_version: tuple[int, ...], | |
version: Version, | |
call_kwargs: dict[str, Any] | None = None, | |
) -> None: ... | |
narwhals/narwhals/_arrow/expr.py
Lines 33 to 55 in 697bb2c
def __init__( | |
self: Self, | |
call: Callable[[ArrowDataFrame], Sequence[ArrowSeries]], | |
*, | |
depth: int, | |
function_name: str, | |
evaluate_output_names: Callable[[ArrowDataFrame], Sequence[str]], | |
alias_output_names: Callable[[Sequence[str]], Sequence[str]] | None, | |
backend_version: tuple[int, ...], | |
version: Version, | |
call_kwargs: dict[str, Any] | None = None, | |
implementation: Implementation | None = None, | |
) -> None: | |
self._call = call | |
self._depth = depth | |
self._function_name = function_name | |
self._depth = depth | |
self._evaluate_output_names = evaluate_output_names | |
self._alias_output_names = alias_output_names | |
self._backend_version = backend_version | |
self._version = version | |
self._call_kwargs = call_kwargs or {} | |
self._metadata: ExprMetadata | None = None |
narwhals/narwhals/_pandas_like/expr.py
Lines 69 to 91 in 697bb2c
def __init__( | |
self: Self, | |
call: Callable[[PandasLikeDataFrame], Sequence[PandasLikeSeries]], | |
*, | |
depth: int, | |
function_name: str, | |
evaluate_output_names: Callable[[PandasLikeDataFrame], Sequence[str]], | |
alias_output_names: Callable[[Sequence[str]], Sequence[str]] | None, | |
implementation: Implementation, | |
backend_version: tuple[int, ...], | |
version: Version, | |
call_kwargs: dict[str, Any] | None = None, | |
) -> None: | |
self._call = call | |
self._depth = depth | |
self._function_name = function_name | |
self._evaluate_output_names = evaluate_output_names | |
self._alias_output_names = alias_output_names | |
self._implementation = implementation | |
self._backend_version = backend_version | |
self._version = version | |
self._call_kwargs = call_kwargs or {} | |
self._metadata: ExprMetadata | None = None |
2. Define the constructor another way
(#2261) introduced a different solution - but still needed to use Protocol38
to avoid the metaclass issues of the first solution.
I like this a lot more as it feels more intuitive from the outside - given the clear chain of @classmethod
and @property
definitions.
when-then
narwhals/narwhals/_arrow/namespace.py
Lines 241 to 242 in 697bb2c
def when(self: Self, predicate: ArrowExpr) -> ArrowWhen: | |
return ArrowWhen.from_expr(predicate, context=self) |
narwhals/narwhals/_arrow/namespace.py
Lines 282 to 285 in 697bb2c
class ArrowWhen(EagerWhen[ArrowDataFrame, ArrowSeries, ArrowExpr, "ArrowChunkedArray"]): | |
@property | |
def _then(self) -> type[ArrowThen]: | |
return ArrowThen |
narwhals/narwhals/_compliant/when_then.py
Lines 72 to 82 in 697bb2c
@classmethod | |
def from_expr(cls, condition: ExprT, /, *, context: _FullContext) -> Self: | |
obj = cls.__new__(cls) | |
obj._condition = condition | |
obj._then_value = None | |
obj._otherwise_value = None | |
obj._implementation = context._implementation | |
obj._backend_version = context._backend_version | |
obj._version = context._version | |
return obj | |
narwhals/narwhals/_compliant/when_then.py
Lines 63 to 64 in 697bb2c
@property | |
def _then(self) -> type[CompliantThen[FrameT, SeriesT, ExprT]]: ... |
narwhals/narwhals/_compliant/when_then.py
Lines 67 to 70 in 697bb2c
def then( | |
self, value: IntoExpr[SeriesT, ExprT], / | |
) -> CompliantThen[FrameT, SeriesT, ExprT]: | |
return self._then.from_when(self, value) |
narwhals/narwhals/_compliant/when_then.py
Lines 94 to 115 in 697bb2c
@classmethod | |
def from_when( | |
cls, | |
when: CompliantWhen[FrameT, SeriesT, ExprT], | |
then: IntoExpr[SeriesT, ExprT], | |
/, | |
) -> Self: | |
when._then_value = then | |
obj = cls.__new__(cls) | |
obj._call = when | |
obj._when_value = when | |
obj._depth = 0 | |
obj._function_name = "whenthen" | |
obj._evaluate_output_names = getattr( | |
then, "_evaluate_output_names", lambda _df: ["literal"] | |
) | |
obj._alias_output_names = getattr(then, "_alias_output_names", None) | |
obj._implementation = when._implementation | |
obj._backend_version = when._backend_version | |
obj._version = when._version | |
obj._call_kwargs = {} | |
return obj |
The only downside I could see is __new__
being a less-understood method than __init__
.
Personally it took me a while to understand when it makes sense to use.
Even now that I do - I didn't think of it as a solution until a month after (#2064) merged π
3. Avoid constructors entirely in Protocol
(s)
That is the solution this comment is attached to.
It might be easy to miss, but _SeriesNamespace
is a Protocol
and EagerSeriesNamespace
is a Generic
class.
The interesting aspect of that is the Protocol
(s) represent the interface - but the concrete impl is separate.
This approach makes sense when the interface and implementation differ greatly.
Here it is ArrowSeries
and PandasLikeSeries
- but we go fully structural for all of Polars
I forgot that one isn't for `pd.Series`
Not used anywhere else, no need to check `Implementation` so many times
https://github.com/narwhals-dev/narwhals/actions/runs/14091955981/job/39470550165?pr=2294 Haven't figured out why yet, but the narwhals/utils/generate_backend_completeness.py Lines 103 to 111 in 697bb2c
First guess would be the extra properties for narwhals/narwhals/_pandas_like/utils.py Lines 813 to 824 in 7daa8f6
UpdateResolved in (3ff77a0) |
Seems to have solved the issue locally #2294 (comment)
EagerSeries*Namespace
protocolsEagerSeries*Namespace
protocols
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks @dangotbanned !
Still need to untangle `CompliantExpr` + friends #2294 (comment)
* refactor: Add `SelectorNamespace.from_namespace` #2294 (comment) * refactor: Reuse pattern from `when_then` * refactor: Only use `Protocol38` in sub `CompliantExpr` Still need to untangle `CompliantExpr` + friends #2294 (comment) * refactor: Simplify typing * refactor: Move `Eval*` aliases to `_compliant.typing`
Closes #2230
What type of PR is this? (check all applicable)
Related issues
_Stores(Native|Compliant)
Β #2130._native_series
refsΒ #2293Compliant*
protocolsΒ #2230CompliantExpr
Β #21193.8
Β #2084Checklist
If you have comments or can explain your changes, please do so below
<3.11
compatible approachEagerSeries*Namespace
protocolsΒ #2294 (comment))PandasLike*
CatNamespace
DateTimeNamespace
ListNamespace
StringNamespace
StructNamespace
generate_backend_completeness
EagerSeries*Namespace
protocolsΒ #2294 (comment))