Skip to content

[SYNPY-1578] DatasetCollection OOP Model #1189

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 38 commits into from
Apr 17, 2025

Conversation

BWMac
Copy link
Contributor

@BWMac BWMac commented Apr 10, 2025

Problem:

We do not currently have an OOP model representing the Synapse DatasetCollection model. We should follow the example of the Dataset class and use the table component mixins to create one.

Solution:

This PR follows the Dataset example to create the DatasetCollection model. Users are now able to:

  • Create DatasetCollections in Synapse
  • Delete DatasetCollections
  • Get existing DatasetCollections
  • Add Datasets to their collection
  • Remove Datasets from their collection
  • Annotate Datasets within a collection using custom columns
  • Save snapshots of DatasetCollections

Needed documentation pages and a tutorial to help users get started using DatasetCollections were added to the mkdocs site.

Testing:

  • New methods were manually tested as they were implemented and as part of the process of creating the tutorial
  • Unit and Integration tests were created for all new logic introduced into with the DatasetCollection class

Notes:

  • The _cast_into_class_type function in entity_factory.py was getting pretty unwieldy with the amount if/elif lines that were included, but a lot of logic was repeated across many different concrete types. I refactored that function to reduce complexity and hopefully make it more maintainable. I will need to add MaterializedView to this new mapping method and resolve conflicts once [SYNPY-1579] Introduce the materialized view #1190 is merged.
  • During testing, we discovered that there is a condition in the Synapse backend when you create a snapshot of a View that causes additional async jobs to run subsequent to the initial snapshot job itself. I added retry logic in the async communicator class to account for in progress async jobs that we are not in control of.

@BWMac BWMac marked this pull request as ready for review April 14, 2025 18:17
@BWMac BWMac requested a review from a team as a code owner April 14, 2025 18:17
@BryanFauble
Copy link
Member

The _cast_into_class_type function in entity_factory.py was getting pretty unwieldy with the amount if/elif lines that were included, but a lot of logic was repeated across many different concrete types. I refactored that function to reduce complexity and hopefully make it more maintainable. I will need to add MaterializedView to this new mapping method/resolve conflicts once #1190 is merged.

@BWMac I merged in the changes to develop. This branch is ready for your resolution to the merge conflicts

@BryanFauble BryanFauble requested a review from Copilot April 14, 2025 19:41
Copy link

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot reviewed 18 out of 18 changed files in this pull request and generated no comments.

Comments suppressed due to low confidence (2)

tests/unit/synapseclient/models/synchronous/unit_test_dataset.py:121

  • [nitpick] Consider using a less brittle error message matcher (for example, checking for a substring like 'Dataset or EntityRef') to reduce test fragility if the error message formatting changes.
with pytest.raises(ValueError, match="item must be a Dataset or EntityRef. 1 is a <class 'int'>",):

synapseclient/api/entity_factory.py:383

  • [nitpick] It might help to add an inline comment clarifying that entity_to_update is reused when provided, which is particularly relevant now that the mapping includes DatasetCollection.
entity_instance = entity_to_update or entity_class()

@BWMac BWMac requested a review from BryanFauble April 16, 2025 14:41
Copy link
Member

@BryanFauble BryanFauble left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left my last round of comments, great changes here. Thanks for your attention to detail on the changes.

Copy link
Member

@thomasyu888 thomasyu888 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔥 nice! i took a look at the tests and thanks for tackling this!

@BWMac BWMac merged commit 4d26c49 into develop Apr 17, 2025
28 checks passed
@BWMac BWMac deleted the synpy-1578-oop-model-dataset-collection branch April 17, 2025 14:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants