Sage-Bionetworks · BWMac · Apr 17, 2025 · Apr 10, 2025 · Apr 10, 2025 · Apr 10, 2025
@@ -0,0 +1,31 @@
+# Dataset Collection
+
+Contained within this file are experimental interfaces for working with the Synapse Python
+Client. Unless otherwise noted these interfaces are subject to change at any time. Use
+at your own risk.
+
+## API reference
+
+::: synapseclient.models.DatasetCollection
+    options:
+        inherited_members: true
+        members:
+            - add_item_async
+            - remove_item_async
+            - store_async
+            - get_async
+            - delete_async
+            - update_rows_async
+            - snapshot_async
+            - query_async
+            - query_part_mask_async
+            - add_column
+            - delete_column
+            - reorder_column
+            - rename_column
+            - get_permissions
+            - get_acl
+            - set_permissions
+---
+::: synapseclient.models.EntityRef
+---
@@ -4,16 +4,6 @@ Contained within this file are experimental interfaces for working with the Syna
 Client. Unless otherwise noted these interfaces are subject to change at any time. Use
 at your own risk.
 
-## Example Script:
-
-<details class="quote">
-  <summary>Working with Synapse datasets</summary>
-
-```python
-{!docs/scripts/object_orientated_programming_poc/oop_poc_dataset.py!}
-```
-</details>
-
 ## API reference
 
 ::: synapseclient.models.Dataset

@@ -0,0 +1,31 @@
+# Dataset Collection
+
+Contained within this file are experimental interfaces for working with the Synapse Python
+Client. Unless otherwise noted these interfaces are subject to change at any time. Use
+at your own risk.
+
+## API reference
+
+::: synapseclient.models.DatasetCollection
+    options:
+        inherited_members: true
+        members:
+            - add_item
+            - remove_item
+            - store
+            - get
+            - delete
+            - update_rows
+            - snapshot
+            - query
+            - query_part_mask
+            - add_column
+            - delete_column
+            - reorder_column
+            - rename_column
+            - get_permissions
+            - get_acl
+            - set_permissions
+---
+::: synapseclient.models.EntityRef
+---
@@ -1,7 +1,7 @@
 # Datasets
 Datasets in Synapse are a way to organize, annotate, and publish sets of files for others to use. Datasets behave similarly to Tables and EntityViews, but provide some default behavior that makes it easy to put a group of files together.
 
-This tutorial will walk through basics of working with datasets using the Synapse Python client.
+This tutorial will walk through the basics of working with datasets using the Synapse Python Client.
 
 # Tutorial Purpose
 In this tutorial, you will:
@@ -29,15 +29,15 @@ In this tutorial, you will:
 Let's get started by authenticating with Synapse and retrieving the ID of your project.
 
 ```python
-{!docs/tutorials/python/tutorial_scripts/dataset.py!lines=17-23}
+{!docs/tutorials/python/tutorial_scripts/dataset.py!lines=3-24}
 ```
 
 ## 2. Create your Dataset
 
 Next, we will create the dataset. We will use the project ID to tell Synapse where we want the dataset to be created. After this step, we will have a Dataset object with all of the needed information to start building the dataset.
 
 ```python
-{!docs/tutorials/python/tutorial_scripts/dataset.py!lines=27-28}
+{!docs/tutorials/python/tutorial_scripts/dataset.py!lines=29-30}
 ```
 
 Because we haven't added any files to the dataset yet, it will be empty, but if you view the dataset's schema in the UI, you will notice that datasets come with default columns that help to describe each file that we add to the dataset.
@@ -50,20 +50,20 @@ Let's add some files to the dataset now. There are three ways to add files to a
 
 1. Add an Entity Reference to a file with its ID and version
 ```python
-{!docs/tutorials/python/tutorial_scripts/dataset.py!lines=32-34}
+{!docs/tutorials/python/tutorial_scripts/dataset.py!lines=34-36}
 ```
 2. Add a File with its ID and version
 ```python
-{!docs/tutorials/python/tutorial_scripts/dataset.py!lines=36-38}
+{!docs/tutorials/python/tutorial_scripts/dataset.py!lines=38-40}
 ```
 3. Add a Folder. When adding a folder, all child files inside of the folder are added to the dataset recursively.
 ```python
-{!docs/tutorials/python/tutorial_scripts/dataset.py!lines=40-42}
+{!docs/tutorials/python/tutorial_scripts/dataset.py!lines=42-44}
 ```
 
 Whenever we make changes to the dataset, we need to call the `store()` method to save the changes to Synapse.
 ```python
-{!docs/tutorials/python/tutorial_scripts/dataset.py!lines=44}
+{!docs/tutorials/python/tutorial_scripts/dataset.py!lines=46}
 ```
 
 And now we are able to see our dataset with all of the files that we added to it.
@@ -75,37 +75,37 @@ And now we are able to see our dataset with all of the files that we added to it
 Now that we have a dataset with some files in it, we can retrieve the dataset from Synapse the next time we need to use it.
 
 ```python
-{!docs/tutorials/python/tutorial_scripts/dataset.py!lines=48-50}
+{!docs/tutorials/python/tutorial_scripts/dataset.py!lines=50-52}
 ```
 
 ## 5. Query the dataset
 
 Now that we have a dataset with some files in it, we can query the dataset to find files that match certain criteria.
 
 ```python
-{!docs/tutorials/python/tutorial_scripts/dataset.py!lines=54-57}
+{!docs/tutorials/python/tutorial_scripts/dataset.py!lines=56-59}
 ```
 
 ## 6. Add a custom column to the dataset
 
 We can also add a custom column to the dataset. This will allow us to annotate files in the dataset with additional information.
 
 ```python
-{!docs/tutorials/python/tutorial_scripts/dataset.py!lines=61-67}
+{!docs/tutorials/python/tutorial_scripts/dataset.py!lines=63-69}
 ```
 
 Our custom column isn't all that useful empty, so let's update the dataset with some values.
 
 ```python
-{!docs/tutorials/python/tutorial_scripts/dataset.py!lines=70-78}
+{!docs/tutorials/python/tutorial_scripts/dataset.py!lines=72-80}
 ```
 
 ## 7. Save a snapshot of the dataset
 
 Finally, let's save a snapshot of the dataset. This creates a read-only version of the dataset that captures the current state of the dataset and can be referenced later.
 
 ```python
-{!docs/tutorials/python/tutorial_scripts/dataset.py!lines=82-86}
+{!docs/tutorials/python/tutorial_scripts/dataset.py!lines=84-88}
 ```
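
The `lines=` ranges in these include directives shift because the tutorial script gained lines near the top. As a rough sketch of how such a range maps onto a script, assuming the include syntax treats ranges as 1-indexed and inclusive (the helper `select_lines` is hypothetical and not part of any plugin):

```python
def select_lines(text: str, spec: str) -> str:
    """Return the lines of `text` named by `spec`, e.g. "3-5" or "7".

    Assumes 1-indexed, inclusive ranges (an assumption about the include syntax).
    """
    start_text, _, end_text = spec.partition("-")
    start = int(start_text)
    end = int(end_text) if end_text else start
    lines = text.splitlines()
    # Convert the 1-indexed inclusive range to a 0-indexed half-open slice.
    return "\n".join(lines[start - 1 : end])


script = "line one\nline two\nline three\nline four"
print(select_lines(script, "2-3"))
print(select_lines(script, "4"))
```

Running a check like this against the tutorial script after each edit is one way to confirm that every snippet still shows the intended statements.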
 
 ## Source Code for this Tutorial

@@ -0,0 +1,112 @@
+# Dataset Collections
+Dataset Collections are a way to organize, annotate, and publish sets of datasets for others to use. Dataset Collections behave similarly to Tables and EntityViews, but provide some default behavior that makes it easy to put a group of datasets together.
+
+This tutorial will walk through the basics of working with Dataset Collections using the Synapse Python Client.
+
+# Tutorial Purpose
+In this tutorial, you will:
+
+- Create a Dataset Collection
+- Add datasets to the collection
+- Add a custom column to the collection
+- Update the collection with new annotations
+- Query the collection
+- Save a snapshot of the collection
+
+# Prerequisites
+* This tutorial assumes that you have a project in Synapse and have already created datasets that you would like to add to a Dataset Collection.
+* If you need help creating datasets, you can refer to the [dataset tutorial](./dataset.md).
+* Pandas must be installed as shown in the [installation documentation](../installation.md).
+
+## 1. Get the ID of your Synapse project
+
+Let's get started by authenticating with Synapse and retrieving the ID of your project.
+
+```python
+{!docs/tutorials/python/tutorial_scripts/dataset_collection.py!lines=3-16}
+```
+
+## 2. Create your Dataset Collection
+
+Next, we will create the Dataset Collection using the project ID to tell Synapse where we want the Dataset Collection to be created. After this step, we will have a Dataset Collection object with all of the necessary information to start building the collection.
+
+```python
+{!docs/tutorials/python/tutorial_scripts/dataset_collection.py!lines=25-33}
+```
+
+Because we haven't added any datasets to the collection yet, it will be empty, but if you view the Dataset Collection's schema in the UI, you will notice that Dataset Collections come with default columns.
+
+![Dataset Collection Default Schema](./tutorial_screenshots/dataset_collection_default_schema.png)
+
+## 3. Add Datasets to the Dataset Collection
+
+Now, let's add some datasets to the collection. We will loop through our dataset IDs and add each dataset to the collection using the `add_item` method.
+
+```python
+{!docs/tutorials/python/tutorial_scripts/dataset_collection.py!lines=37-38}
+```
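
Before looping, it can help to check that each entry at least looks like a Synapse ID. A minimal sketch, assuming plain unversioned IDs of the form `syn` followed by digits; `looks_like_synapse_id` is a hypothetical helper and not part of `synapseclient`, and it does not confirm that the entity exists:

```python
import re

# Assumption: a plain Synapse ID is "syn" followed by one or more digits.
SYNAPSE_ID_PATTERN = re.compile(r"syn\d+")


def looks_like_synapse_id(value: str) -> bool:
    """Return True if `value` has the shape of an unversioned Synapse ID."""
    return SYNAPSE_ID_PATTERN.fullmatch(value) is not None


# The last entry is deliberately malformed to show what gets flagged.
dataset_ids = ["syn65987017", "syn65987019", "65987020"]
invalid = [d for d in dataset_ids if not looks_like_synapse_id(d)]
print(invalid)
```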
+
+Whenever we make changes to the Dataset Collection, we need to call the `store()` method to save the changes to Synapse.
+
+```python
+{!docs/tutorials/python/tutorial_scripts/dataset_collection.py!lines=40}
+```
+
+And now we are able to see our Dataset Collection with all of the datasets that we added to it.
+
+![Dataset Collection with Datasets](./tutorial_screenshots/dataset_collection_with_datasets.png)
+
+## 4. Retrieve the Dataset Collection
+
+Now that our Dataset Collection has been created and we have added some Datasets to it, we can retrieve the Dataset Collection from Synapse the next time we need to use it.
+
+```python
+{!docs/tutorials/python/tutorial_scripts/dataset_collection.py!lines=44-46}
+```
+
+## 5. Add a custom column to the Dataset Collection
+
+In addition to the default columns, you may want to annotate items in your Dataset Collection using custom columns.
+
+```python
+{!docs/tutorials/python/tutorial_scripts/dataset_collection.py!lines=50-56}
+```
+
+Our custom column isn't all that useful empty, so let's update the Dataset Collection with some values.
+
+```python
+{!docs/tutorials/python/tutorial_scripts/dataset_collection.py!lines=59-67}
+```
+
+## 6. Query the Dataset Collection
+
+If you want to query your Dataset Collection for items that match certain criteria, you can do so using the `query` method.
+
+```python
+{!docs/tutorials/python/tutorial_scripts/dataset_collection.py!lines=71-74}
+```
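
The query passed to `query` is an SQL-like string that names the collection by its ID. A small sketch of building that string outside the client, assuming the query language follows the standard SQL convention of doubling single quotes inside string literals; `build_annotation_query` and the ID `syn123` are hypothetical:

```python
def build_annotation_query(collection_id: str, column: str, value: str) -> str:
    """Build a SELECT statement filtering one column by an exact string value."""
    # Double any single quotes so the value cannot end the string literal early.
    escaped = value.replace("'", "''")
    return f"SELECT id, {column} FROM {collection_id} WHERE {column} = '{escaped}'"


query = build_annotation_query("syn123", "my_annotation", "good dataset")
print(query)
```

Keeping the string construction in one place makes it easier to reuse the same filter with a different collection ID or annotation value.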
+
+## 7. Save a snapshot of the Dataset Collection
+
+Finally, let's save a snapshot of the Dataset Collection. This creates a read-only version that captures the current state of the Dataset Collection and can be referenced later.
+
+```python
+{!docs/tutorials/python/tutorial_scripts/dataset_collection.py!lines=77}
+```
+
+## Source Code for this Tutorial
+
+<details class="quote">
+  <summary>Click to show me</summary>
+
+```python
+{!docs/tutorials/python/tutorial_scripts/dataset_collection.py!}
+```
+</details>
+
+## References
+- [DatasetCollection](../../reference/experimental/sync/dataset_collection.md)
+- [Dataset](../../reference/experimental/sync/dataset.md)
+- [Project](../../reference/experimental/sync/project.md)
+- [Column][synapseclient.models.Column]
+- [syn.login][synapseclient.Synapse.login]
@@ -17,9 +17,11 @@
 syn = Synapse()
 syn.login()
 
-project = Project(name="My Testing Project").get()  # Replace with your project name
+project = Project(
+    name="My uniquely named project about Alzheimer's Disease"
+).get()  # Replace with your project name
 project_id = project.id
-print(project_id)
+print(f"My project ID is {project_id}")
 
 # Next, let's create the dataset. We'll use the project id as the parent id.
 # To begin, the dataset will be empty, but if you view the dataset's schema in the UI,

@@ -0,0 +1,77 @@
+"""Here is where you'll find the code for the DatasetCollection tutorial."""
+
+import pandas as pd
+
+from synapseclient import Synapse
+from synapseclient.models import Column, ColumnType, Dataset, DatasetCollection, Project
+
+# First, let's get the project that we want to create the DatasetCollection in
+syn = Synapse()
+syn.login()
+
+project = Project(
+    name="My uniquely named project about Alzheimer's Disease"
+).get()  # Replace with your project name
+project_id = project.id
+print(f"My project ID is {project_id}")
+
+# This tutorial assumes that you have already created datasets that you would like to add to a DatasetCollection.
+# If you need help creating datasets, you can refer to the dataset tutorial.
+
+# For this example, we will be using datasets already created in the project.
+# Let's create the DatasetCollection. We'll use the project id as the parent id.
+# At first, the DatasetCollection will be empty, but if you view the DatasetCollection's schema in the UI,
+# you will notice that DatasetCollections come with default columns.
+DATASET_IDS = [
+    "syn65987017",
+    "syn65987019",
+    "syn65987020",
+]  # Replace with your dataset IDs
+test_dataset_collection = DatasetCollection(
+    parent_id=project_id, name="test_dataset_collection"
+).store()
+print(f"My DatasetCollection's ID is {test_dataset_collection.id}")
+
+# Now, let's add some datasets to the collection. We will loop through our dataset ids and add each dataset to the
+# collection using the `add_item` method.
+for dataset_id in DATASET_IDS:
+    test_dataset_collection.add_item(Dataset(id=dataset_id).get())
+# Our changes won't be persisted to Synapse until we call the `store` method on our DatasetCollection.
+test_dataset_collection.store()
+
+# Now that our DatasetCollection with all of our datasets has been created, the next time we want to use it,
+# we can retrieve it from Synapse.
+my_retrieved_dataset_collection = DatasetCollection(id=test_dataset_collection.id).get()
+print(f"My DatasetCollection's ID is still {my_retrieved_dataset_collection.id}")
+print(f"My DatasetCollection has {len(my_retrieved_dataset_collection.items)} items")
+
+# In addition to the default columns, you may want to annotate items in your DatasetCollection using
+# custom columns.
+my_retrieved_dataset_collection.add_column(
+    column=Column(
+        name="my_annotation",
+        column_type=ColumnType.STRING,
+    )
+)
+my_retrieved_dataset_collection.store()
+
+# Now that our custom column has been added, we can update the DatasetCollection with new annotations.
+modified_data = pd.DataFrame(
+    {
+        "id": DATASET_IDS,
+        "my_annotation": ["good dataset"] * len(DATASET_IDS),
+    }
+)
+my_retrieved_dataset_collection.update_rows(
+    values=modified_data, primary_keys=["id"], dry_run=False
+)
+
+# If you want to query your DatasetCollection for items that match certain criteria, you can do so
+# using the `query` method.
+rows = my_retrieved_dataset_collection.query(
+    query=f"SELECT id, my_annotation FROM {my_retrieved_dataset_collection.id} WHERE my_annotation = 'good dataset'"
+)
+print(rows)
+
+# Create a snapshot of the DatasetCollection
+my_retrieved_dataset_collection.snapshot(comment="test snapshot")
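
The annotation column needs one value per dataset ID, so where the `*` sits relative to the brackets matters. A short sketch of the two spellings in plain Python (no Synapse or pandas calls; the variable names are illustrative):

```python
dataset_ids = ["syn65987017", "syn65987019", "syn65987020"]

# Repeating the list yields one annotation per dataset ID.
per_row = ["good dataset"] * len(dataset_ids)

# Repeating the string yields a single, concatenated element.
single = ["good dataset" * len(dataset_ids)]

print(len(per_row), len(single))
```

A column whose length differs from the `id` column cannot be combined with it when a pandas DataFrame is built from a dict of lists, so only the first spelling fits the update step.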