Skip to content

enforced return types of groupby sequence arguments #10271

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

zerothi
Copy link
Contributor

@zerothi zerothi commented Apr 29, 2025

By distinguishing between scalar vs. sequence arguments we can now correctly return a scalar/tuple of the corresponding group values that is grouped over.

This makes autonomous usage of groupby more predictable.

E.g.

().groupby("x")[0] is a scalar

whereas

().groupby(["x"])[0] is a tuple of length 1

This meant some slight restructuring in the
core.groupby code, but it should be relatively simple.

zerothi added 2 commits April 29, 2025 09:38
By distinguishing between scalar vs. sequence arguments
we can now correctly return a scalar/tuple of the corresponding
group values that is grouped over.

This makes autonomous usage of `groupby` more predictable.

E.g.

   ().groupby("x")[0] is a scalar

whereas

   ().groupby(["x"])[0] is a tuple of length 1

This meant some slight restructuring in the
core.groupby code, but it should be relatively simple.

Signed-off-by: Nick Papior <nickpapior@gmail.com>
Signed-off-by: Nick Papior <nickpapior@gmail.com>
@zerothi
Copy link
Contributor Author

zerothi commented Apr 29, 2025

@dcherian some pointers here would be helpful, I don't fully understand the implications here... :(

@@ -429,6 +430,8 @@ def _parse_group_and_groupers(
)
for group, grouper in grouper_mapping.items()
)
if isinstance(group, str):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

instead I'd return a flag squeeze_group_label_on_iter that is True if group is str

@@ -626,7 +628,7 @@ class GroupBy(Generic[T_Xarray]):
def __init__(
self,
obj: T_Xarray,
groupers: tuple[ResolvedGrouper, ...],
groupers: ResolvedGrouper | tuple[ResolvedGrouper, ...],
restore_coord_dims: bool = True,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
restore_coord_dims: bool = True,
squeeze_group_label_on_iter: bool
restore_coord_dims: bool = True,

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and then I'd handle this inside __iter__ where you'll need

for label, group in zip(self.unique_coord.data, self._iter_grouped(), strict=True):
    # handle promotion to tuple
    yield label, group

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Data*.groupby with sequences yields scalars for len-1 sequences
2 participants