-
Notifications
You must be signed in to change notification settings - Fork 146
chore(typing): Generic CompliantDataFrame
#2115
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. Weβll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
- Originally in #2064 - Will eventually support #2104 (comment)
- Now for `PolarsDataFrame` to be compliant, `PolarsSeries.alias` needs to be present - Since `PolarsDataFrame` can be returned in all of these places, they all required the update to `_polars`
Originally c11dc95
@property | ||
def columns(self) -> Sequence[str]: ... | ||
def get_column(self, name: str) -> CompliantSeriesT_co: ... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@MarcoGorelli I chose get_column
just to provide a method that uses CompliantSeriesT_co
.
The eventual goal will be (#2104 (comment))
def aggregate(self: Self, *exprs: ArrowExpr) -> Self: | ||
def aggregate(self: ArrowDataFrame, *exprs: ArrowExpr) -> ArrowDataFrame: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why does this need changing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@MarcoGorelli
pyright
is fine with Self
IIRC - but mypy
still needs all the extra inline annotations that we have on main
narwhals/narwhals/_arrow/dataframe.py
Lines 357 to 374 in dc9fcaa
def aggregate(self: Self, *exprs: ArrowExpr) -> Self: | |
return self.select(*exprs) | |
def select(self: Self, *exprs: ArrowExpr) -> Self: | |
new_series: Sequence[ArrowSeries] = evaluate_into_exprs(self, *exprs) | |
if not new_series: | |
# return empty dataframe, like Polars does | |
return self._from_native_frame( | |
self._native_frame.__class__.from_arrays([]), validate_column_names=False | |
) | |
names = [s.name for s in new_series] | |
new_series = align_series_full_broadcast(*new_series) | |
df = pa.Table.from_arrays([s._native_series for s in new_series], names=names) | |
return self._from_native_frame(df, validate_column_names=True) | |
def with_columns(self: Self, *exprs: ArrowExpr) -> Self: | |
native_frame = self._native_frame | |
new_columns: list[ArrowSeries] = evaluate_into_exprs(self, *exprs) |
I had a tough time with that as well in (#2055) - where I've tried moving it into a method:
narwhals/narwhals/_protocols.py
Lines 50 to 87 in 35cef0b
class ReuseDataFrame(CompliantDataFrame, Protocol[ReuseSeriesT]): | |
def _evaluate_into_expr( | |
self: ReuseDataFrameT, expr: ReuseExpr[ReuseDataFrameT, ReuseSeriesT], / | |
) -> Sequence[ReuseSeriesT]: | |
_, aliases = evaluate_output_names_and_aliases(expr, self, []) | |
result = expr(self) | |
if list(aliases) != [s.name for s in result]: | |
msg = f"Safety assertion failed, expected {aliases}, got {result}" | |
raise AssertionError(msg) | |
return result | |
def _evaluate_into_exprs( | |
self, *exprs: ReuseExpr[Self, ReuseSeriesT] | |
) -> Sequence[ReuseSeriesT]: | |
return list(chain.from_iterable(self._evaluate_into_expr(expr) for expr in exprs)) | |
# NOTE: `mypy` is **very** fragile here in what is permitted | |
# DON'T CHANGE UNLESS IT SOLVES ANOTHER ISSUE | |
def _maybe_evaluate_expr( | |
self, expr: ReuseExpr[ReuseDataFrame[ReuseSeriesT], ReuseSeriesT] | T, / | |
) -> ReuseSeriesT | T: | |
if is_reuse_expr(expr): | |
result: Sequence[ReuseSeriesT] = expr(self) | |
if len(result) > 1: | |
msg = ( | |
"Multi-output expressions (e.g. `nw.all()` or `nw.col('a', 'b')`) " | |
"are not supported in this context" | |
) | |
raise ValueError(msg) | |
return result[0] | |
return expr | |
# NOTE: DON'T CHANGE THIS EITHER | |
def is_reuse_expr( | |
obj: ReuseExpr[Any, ReuseSeriesT] | Any, | |
) -> TypeIs[ReuseExpr[Any, ReuseSeriesT]]: | |
return hasattr(obj, "__narwhals_expr__") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was the log from (c11dc95)
narwhals/_arrow/dataframe.py:350: error: Cannot infer type argument 1 of
"evaluate_into_exprs" [misc]
new_series: Sequence[ArrowSeries] = evaluate_into_exprs(self, ...
^~~~~~~~~~~~~~~~~~~~~~~~~~...
narwhals/_arrow/dataframe.py:363: error: Cannot infer type argument 1 of
"evaluate_into_exprs" [misc]
new_columns: list[ArrowSeries] = evaluate_into_exprs(self, *ex...
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~...
narwhals/_arrow/dataframe.py:548: error: Cannot infer type argument 1 of
"evaluate_into_exprs" [misc]
mask_native = evaluate_into_exprs(self, predicate)[0]._nat...
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
narwhals/_pandas_like/dataframe.py:395: error: Cannot infer type argument 1 of
"evaluate_into_exprs" [misc]
new_series: list[PandasLikeSeries] = evaluate_into_exprs(self,...
^~~~~~~~~~~~~~~~~~~~~~~~~...
narwhals/_pandas_like/dataframe.py:445: error: Cannot infer type argument 1 of
"evaluate_into_exprs" [misc]
mask = evaluate_into_exprs(self, predicate)[0]
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
narwhals/_pandas_like/dataframe.py:454: error: Cannot infer type argument 1 of
"evaluate_into_exprs" [misc]
new_columns: list[PandasLikeSeries] = evaluate_into_exprs(self...
^~~~~~~~~~~~~~~~~~~~~~~~...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
well done @dangotbanned , thanks for improving Narwhals!
What type of PR is this? (check all applicable)
Related issues
CompliantSelector
Β #2064)mypy
errors fixed in (c11dc95)DataFrame.iter_columns
Β #2104 (comment))Checklist
If you have comments or can explain your changes, please do so below