[SYNPY-1578] DatasetCollection OOP Model #1189
Merged
38 commits
a084116 adds initial DatasetCollection implementation (BWMac)
2d4b8e6 adds unit tests (BWMac)
3d14cd7 pre-commit (BWMac)
cd9e910 updates docstrings (BWMac)
e1a1e8a adds integration tests (BWMac)
41eaf74 adds docs pages (BWMac)
25f12ff removes example script section from dataset documentation (BWMac)
3a5b017 adds dataset collection tutorial (BWMac)
2ffdeab fixes tutorial script (BWMac)
c43a172 adds tutorial path to mkdocs.yml (BWMac)
0b40a21 bullet points (BWMac)
73baab0 fixes tutorial code lines (BWMac)
9d2984e fixes tutorial references (BWMac)
c4866f6 test doc format fix (BWMac)
6c196a3 fixes dataset docs (BWMac)
daedf46 fixes sync integration tests (BWMac)
84a73e3 fixes DatasetCollection docstrings (BWMac)
60fa4f7 refactors entity factory (BWMac)
cd208d6 fixes argument error (BWMac)
3a70496 updates test strings (BWMac)
b7e728e Merge branch 'develop' into synpy-1578-oop-model-dataset-collection (BWMac)
7512c0e pre-commit (BWMac)
e0e82bd Update docs/tutorials/python/dataset_collection.md (BWMac)
27a7656 updates tutorials (BWMac)
cd775a5 removes elif block (BWMac)
0b2603f pre-commit (BWMac)
66c562a removes unused cleanup (BWMac)
87950e9 updates version handling and tests (BWMac)
1368c7a fix async tests (BWMac)
5d39a79 addresses comments (BWMac)
f279036 fixes docstrings (BWMac)
d63b212 adds retry logic for uncaught async jobs (BWMac)
345a2ee set max on timeout (BWMac)
7a993ba addresses comments (BWMac)
fe17606 updates unit test for version num (BWMac)
80edc7e fixes incorrect line number (BWMac)
6bc4374 adds missing snapshot tests (BWMac)
f646dc1 corrects type hint (BWMac)
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,31 @@ | ||
# Dataset Collection | ||
|
||
Contained within this file are experimental interfaces for working with the Synapse Python | ||
Client. Unless otherwise noted these interfaces are subject to change at any time. Use | ||
at your own risk. | ||
|
||
## API reference | ||
|
||
::: synapseclient.models.DatasetCollection | ||
    options: | ||
        inherited_members: true | ||
        members: | ||
        - add_item_async | ||
        - remove_item_async | ||
        - store_async | ||
        - get_async | ||
        - delete_async | ||
        - update_rows_async | ||
        - snapshot_async | ||
        - query_async | ||
        - query_part_mask_async | ||
        - add_column | ||
        - delete_column | ||
        - reorder_column | ||
        - rename_column | ||
        - get_permissions | ||
        - get_acl | ||
        - set_permissions | ||
--- | ||
::: synapseclient.models.EntityRef | ||
--- |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,31 @@ | ||
# Dataset Collection | ||
|
||
Contained within this file are experimental interfaces for working with the Synapse Python | ||
Client. Unless otherwise noted these interfaces are subject to change at any time. Use | ||
at your own risk. | ||
|
||
## API reference | ||
|
||
::: synapseclient.models.DatasetCollection | ||
    options: | ||
        inherited_members: true | ||
        members: | ||
        - add_item | ||
        - remove_item | ||
        - store | ||
        - get | ||
        - delete | ||
        - update_rows | ||
        - snapshot | ||
        - query | ||
        - query_part_mask | ||
        - add_column | ||
        - delete_column | ||
        - reorder_column | ||
        - rename_column | ||
        - get_permissions | ||
        - get_acl | ||
        - set_permissions | ||
--- | ||
::: synapseclient.models.EntityRef | ||
--- |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,112 @@ | ||
# Dataset Collections | ||
Dataset Collections are a way to organize, annotate, and publish sets of datasets for others to use. Dataset Collections behave similarly to Tables and EntityViews, but provide some default behavior that makes it easy to put a group of datasets together. | ||
|
||
This tutorial will walk through the basics of working with Dataset Collections using the Synapse Python Client. | ||
|
||
# Tutorial Purpose | ||
In this tutorial, you will: | ||
|
||
- Create a Dataset Collection | ||
- Add datasets to the collection | ||
- Add a custom column to the collection | ||
- Update the collection with new annotations | ||
- Query the collection | ||
- Save a snapshot of the collection | ||
|
||
# Prerequisites | ||
* This tutorial assumes that you have a project in Synapse and have already created datasets that you would like to add to a Dataset Collection. | ||
* If you need help creating datasets, you can refer to the [dataset tutorial](./dataset.md). | ||
* Pandas must be installed as shown in the [installation documentation](../installation.md). | ||
|
||
## 1. Get the ID of your Synapse project | ||
|
||
Let's get started by authenticating with Synapse and retrieving the ID of your project. | ||
|
||
```python | ||
{!docs/tutorials/python/tutorial_scripts/dataset_collection.py!lines=3-16} | ||
``` | ||
|
||
## 2. Create your Dataset Collection | ||
|
||
Next, we will create the Dataset Collection using the project ID to tell Synapse where we want the Dataset Collection to be created. After this step, we will have a Dataset Collection object with all of the necessary information to start building the collection. | ||
|
||
```python | ||
{!docs/tutorials/python/tutorial_scripts/dataset_collection.py!lines=25-33} | ||
``` | ||
|
||
Because we haven't added any datasets to the collection yet, it will be empty, but if you view the Dataset Collection's schema in the UI, you will notice that Dataset Collections come with default columns. | ||
|
||
|
||
 | ||
|
||
## 3. Add Datasets to the Dataset Collection | ||
|
||
Now, let's add some datasets to the collection. We will loop through our dataset IDs and add each dataset to the collection using the `add_item` method. | ||
|
||
```python | ||
{!docs/tutorials/python/tutorial_scripts/dataset_collection.py!lines=37-38} | ||
``` | ||
|
||
Whenever we make changes to the Dataset Collection, we need to call the `store()` method to save the changes to Synapse. | ||
|
||
```python | ||
{!docs/tutorials/python/tutorial_scripts/dataset_collection.py!lines=40} | ||
``` | ||
|
||
And now we are able to see our Dataset Collection with all of the datasets that we added to it. | ||
|
||
 | ||
|
||
## 4. Retrieve the Dataset Collection | ||
|
||
Now that our Dataset Collection has been created and we have added some Datasets to it, we can retrieve the Dataset Collection from Synapse the next time we need to use it. | ||
|
||
```python | ||
{!docs/tutorials/python/tutorial_scripts/dataset_collection.py!lines=44-46} | ||
``` | ||
|
||
## 5. Add a custom column to the Dataset Collection | ||
|
||
In addition to the default columns, you may want to annotate items in your DatasetCollection using custom columns. | ||
|
||
```python | ||
{!docs/tutorials/python/tutorial_scripts/dataset_collection.py!lines=50-56} | ||
``` | ||
|
||
Our custom column isn't all that useful empty, so let's update the Dataset Collection with some values. | ||
|
||
```python | ||
{!docs/tutorials/python/tutorial_scripts/dataset_collection.py!lines=59-67} | ||
``` | ||
|
||
## 6. Query the Dataset Collection | ||
|
||
If you want to query your DatasetCollection for items that match certain criteria, you can do so using the `query` method. | ||
|
||
```python | ||
{!docs/tutorials/python/tutorial_scripts/dataset_collection.py!lines=71-74} | ||
``` | ||
|
||
## 7. Save a snapshot of the Dataset Collection | ||
|
||
Finally, let's save a snapshot of the Dataset Collection. A snapshot is a read-only version that captures the collection's current state and can be referenced later. | ||
|
||
```python | ||
{!docs/tutorials/python/tutorial_scripts/dataset_collection.py!lines=77} | ||
``` | ||
|
||
## Source Code for this Tutorial | ||
|
||
<details class="quote"> | ||
<summary>Click to show me</summary> | ||
|
||
```python | ||
{!docs/tutorials/python/tutorial_scripts/dataset_collection.py!} | ||
``` | ||
</details> | ||
|
||
## References | ||
- [DatasetCollection](../../reference/experimental/sync/dataset_collection.md) | ||
- [Dataset](../../reference/experimental/sync/dataset.md) | ||
- [Project](../../reference/experimental/sync/project.md) | ||
- [Column][synapseclient.models.Column] | ||
- [syn.login][synapseclient.Synapse.login] |
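A note on step 6 of the tutorial: the query passed to the `query` method is assembled with an f-string. The sketch below shows one way that string could be built through a small helper. `build_annotation_query` and the placeholder ID `syn123` are hypothetical and not part of `synapseclient`; doubling single quotes is standard SQL escaping and is assumed here, not confirmed, to be what Synapse expects.

```python
def build_annotation_query(table_id: str, column: str, value: str) -> str:
    """Build a SELECT for rows whose `column` equals `value` (hypothetical helper)."""
    # Double any single quote so the value stays inside the SQL string literal.
    escaped = value.replace("'", "''")
    return f"SELECT id, {column} FROM {table_id} WHERE {column} = '{escaped}'"


# "syn123" is a placeholder; the tutorial uses the collection's own ID here.
query = build_annotation_query("syn123", "my_annotation", "good dataset")
print(query)
```

The resulting string could then be passed as the `query` argument in step 6.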
Binary file added
BIN
+146 KB
docs/tutorials/python/tutorial_screenshots/dataset_collection_default_schema.png
Binary file added
BIN
+121 KB
docs/tutorials/python/tutorial_screenshots/dataset_collection_with_datasets.png
77 changes: 77 additions & 0 deletions
docs/tutorials/python/tutorial_scripts/dataset_collection.py
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,77 @@ | ||
"""Here is where you'll find the code for the DatasetCollection tutorial.""" | ||
|
||
import pandas as pd | ||
|
||
from synapseclient import Synapse | ||
from synapseclient.models import Column, ColumnType, Dataset, DatasetCollection, Project | ||
|
||
# First, let's get the project that we want to create the DatasetCollection in | ||
syn = Synapse() | ||
syn.login() | ||
|
||
project = Project( | ||
    name="My uniquely named project about Alzheimer's Disease" | ||
).get() # Replace with your project name | ||
project_id = project.id | ||
print(f"My project ID is {project_id}") | ||
|
||
# This tutorial assumes that you have already created datasets that you would like to add to a DatasetCollection. | ||
# If you need help creating datasets, you can refer to the dataset tutorial. | ||
|
||
# For this example, we will be using datasets already created in the project. | ||
# Let's create the DatasetCollection. We'll use the project id as the parent id. | ||
# At first, the DatasetCollection will be empty, but if you view the DatasetCollection's schema in the UI, | ||
# you will notice that DatasetCollections come with default columns. | ||
DATASET_IDS = [ | ||
    "syn65987017", | ||
    "syn65987019", | ||
    "syn65987020", | ||
] # Replace with your dataset IDs | ||
test_dataset_collection = DatasetCollection( | ||
    parent_id=project_id, name="test_dataset_collection" | ||
).store() | ||
print(f"My DatasetCollection's ID is {test_dataset_collection.id}") | ||
|
||
# Now, let's add some datasets to the collection. We will loop through our dataset ids and add each dataset to the | ||
# collection using the `add_item` method. | ||
for dataset_id in DATASET_IDS: | ||
    test_dataset_collection.add_item(Dataset(id=dataset_id).get()) | ||
# Our changes won't be persisted to Synapse until we call the `store` method on our DatasetCollection. | ||
test_dataset_collection.store() | ||
|
||
# Now that our DatasetCollection with all of our datasets has been created, the next time we want to use it, | ||
# we can retrieve it from Synapse. | ||
my_retrieved_dataset_collection = DatasetCollection(id=test_dataset_collection.id).get() | ||
print(f"My DatasetCollection's ID is still {my_retrieved_dataset_collection.id}") | ||
print(f"My DatasetCollection has {len(my_retrieved_dataset_collection.items)} items") | ||
|
||
# In addition to the default columns, you may want to annotate items in your DatasetCollection using | ||
# custom columns. | ||
my_retrieved_dataset_collection.add_column( | ||
    column=Column( | ||
        name="my_annotation", | ||
        column_type=ColumnType.STRING, | ||
    ) | ||
) | ||
my_retrieved_dataset_collection.store() | ||
|
||
# Now that our custom column has been added, we can update the DatasetCollection with new annotations. | ||
modified_data = pd.DataFrame( | ||
    { | ||
        "id": DATASET_IDS, | ||
        "my_annotation": ["good dataset"] * len(DATASET_IDS), | ||
    } | ||
) | ||
my_retrieved_dataset_collection.update_rows( | ||
    values=modified_data, primary_keys=["id"], dry_run=False | ||
) | ||
|
||
# If you want to query your DatasetCollection for items that match certain criteria, you can do so | ||
# using the `query` method. | ||
rows = my_retrieved_dataset_collection.query( | ||
    query=f"SELECT id, my_annotation FROM {my_retrieved_dataset_collection.id} WHERE my_annotation = 'good dataset'" | ||
) | ||
print(rows) | ||
|
||
# Create a snapshot of the DatasetCollection | ||
my_retrieved_dataset_collection.snapshot(comment="test snapshot") |
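A note on the `update_rows` step in the script above: the `my_annotation` column needs one value per entry in `DATASET_IDS`, since both lists feed the same DataFrame. The following is a minimal sketch in plain Python, with no Synapse or pandas calls, of how the two list-multiplication spellings differ. The name `dataset_ids` is a stand-in introduced only for this illustration.

```python
# Stand-in IDs for illustration; the tutorial script uses DATASET_IDS.
dataset_ids = ["syn65987017", "syn65987019", "syn65987020"]

# Multiplying the string inside the list repeats the text,
# which yields a list with a single long element.
repeated_text = ["good dataset" * len(dataset_ids)]

# Multiplying the list repeats the element,
# which yields one value per dataset ID.
one_per_row = ["good dataset"] * len(dataset_ids)

print(len(repeated_text))  # 1
print(len(one_per_row))    # 3
```

Only the second form lines up with the `id` column, so it is the one that suits a DataFrame keyed on dataset IDs.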