Skip to content

chore(typing): Generic CompliantDataFrame #2115

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Mar 1, 2025

Conversation

dangotbanned
Copy link
Member

What type of PR is this? (check all applicable)

  • πŸ’Ύ Refactor
  • ✨ Feature
  • πŸ› Bug Fix
  • πŸ”§ Optimization
  • πŸ“ Documentation
  • βœ… Test
  • 🐳 Other

Related issues

Checklist

  • Code follows style guide (ruff)
  • Tests added
  • Documented the changes

If you have comments or can explain your changes, please do so below

- Originally in #2064
- Will eventually support #2104 (comment)
- Now for `PolarsDataFrame` to be compliant, `PolarsSeries.alias` needs to be present
- Since `PolarsDataFrame` can be returned in all of these places, they all required the update to `_polars`
Comment on lines 69 to +71
@property
def columns(self) -> Sequence[str]: ...
def get_column(self, name: str) -> CompliantSeriesT_co: ...
Copy link
Member Author

@dangotbanned dangotbanned Feb 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After this, the only part that should be new in (#2064) is adding .schema for this and CompliantLazyFrame.

I can add that here, split into another PR, or just include in (#2064)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@MarcoGorelli I chose get_column just to provide a method that uses CompliantSeriesT_co.

The eventual goal will be (#2104 (comment))

@dangotbanned dangotbanned marked this pull request as ready for review February 28, 2025 14:47
@dangotbanned dangotbanned requested a review from FBruzzesi March 1, 2025 10:24
def aggregate(self: Self, *exprs: ArrowExpr) -> Self:
def aggregate(self: ArrowDataFrame, *exprs: ArrowExpr) -> ArrowDataFrame:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why does this need changing?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@MarcoGorelli
pyright is fine with Self IIRC - but mypy still needs all the extra inline annotations that we have on main

def aggregate(self: Self, *exprs: ArrowExpr) -> Self:
return self.select(*exprs)
def select(self: Self, *exprs: ArrowExpr) -> Self:
new_series: Sequence[ArrowSeries] = evaluate_into_exprs(self, *exprs)
if not new_series:
# return empty dataframe, like Polars does
return self._from_native_frame(
self._native_frame.__class__.from_arrays([]), validate_column_names=False
)
names = [s.name for s in new_series]
new_series = align_series_full_broadcast(*new_series)
df = pa.Table.from_arrays([s._native_series for s in new_series], names=names)
return self._from_native_frame(df, validate_column_names=True)
def with_columns(self: Self, *exprs: ArrowExpr) -> Self:
native_frame = self._native_frame
new_columns: list[ArrowSeries] = evaluate_into_exprs(self, *exprs)

I had a tough time with that as well in (#2055) - where I've tried moving it into a method:

class ReuseDataFrame(CompliantDataFrame, Protocol[ReuseSeriesT]):
def _evaluate_into_expr(
self: ReuseDataFrameT, expr: ReuseExpr[ReuseDataFrameT, ReuseSeriesT], /
) -> Sequence[ReuseSeriesT]:
_, aliases = evaluate_output_names_and_aliases(expr, self, [])
result = expr(self)
if list(aliases) != [s.name for s in result]:
msg = f"Safety assertion failed, expected {aliases}, got {result}"
raise AssertionError(msg)
return result
def _evaluate_into_exprs(
self, *exprs: ReuseExpr[Self, ReuseSeriesT]
) -> Sequence[ReuseSeriesT]:
return list(chain.from_iterable(self._evaluate_into_expr(expr) for expr in exprs))
# NOTE: `mypy` is **very** fragile here in what is permitted
# DON'T CHANGE UNLESS IT SOLVES ANOTHER ISSUE
def _maybe_evaluate_expr(
self, expr: ReuseExpr[ReuseDataFrame[ReuseSeriesT], ReuseSeriesT] | T, /
) -> ReuseSeriesT | T:
if is_reuse_expr(expr):
result: Sequence[ReuseSeriesT] = expr(self)
if len(result) > 1:
msg = (
"Multi-output expressions (e.g. `nw.all()` or `nw.col('a', 'b')`) "
"are not supported in this context"
)
raise ValueError(msg)
return result[0]
return expr
# NOTE: DON'T CHANGE THIS EITHER
def is_reuse_expr(
obj: ReuseExpr[Any, ReuseSeriesT] | Any,
) -> TypeIs[ReuseExpr[Any, ReuseSeriesT]]:
return hasattr(obj, "__narwhals_expr__")

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was the log from (c11dc95)

narwhals/_arrow/dataframe.py:350: error: Cannot infer type argument 1 of
"evaluate_into_exprs"  [misc]
            new_series: Sequence[ArrowSeries] = evaluate_into_exprs(self, ...
                                                ^~~~~~~~~~~~~~~~~~~~~~~~~~...
narwhals/_arrow/dataframe.py:363: error: Cannot infer type argument 1 of
"evaluate_into_exprs"  [misc]
            new_columns: list[ArrowSeries] = evaluate_into_exprs(self, *ex...
                                             ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~...
narwhals/_arrow/dataframe.py:548: error: Cannot infer type argument 1 of
"evaluate_into_exprs"  [misc]
                mask_native = evaluate_into_exprs(self, predicate)[0]._nat...
                              ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
narwhals/_pandas_like/dataframe.py:395: error: Cannot infer type argument 1 of
"evaluate_into_exprs"  [misc]
            new_series: list[PandasLikeSeries] = evaluate_into_exprs(self,...
                                                 ^~~~~~~~~~~~~~~~~~~~~~~~~...
narwhals/_pandas_like/dataframe.py:445: error: Cannot infer type argument 1 of
"evaluate_into_exprs"  [misc]
                mask = evaluate_into_exprs(self, predicate)[0]
                       ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
narwhals/_pandas_like/dataframe.py:454: error: Cannot infer type argument 1 of
"evaluate_into_exprs"  [misc]
            new_columns: list[PandasLikeSeries] = evaluate_into_exprs(self...
                                                  ^~~~~~~~~~~~~~~~~~~~~~~~...

Copy link
Member

@MarcoGorelli MarcoGorelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

well done @dangotbanned , thanks for improving Narwhals!

@MarcoGorelli MarcoGorelli merged commit b987e1a into main Mar 1, 2025
26 of 28 checks passed
@MarcoGorelli MarcoGorelli deleted the typing-compliant-frames-1 branch March 1, 2025 18:32
dangotbanned added a commit that referenced this pull request Mar 1, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants