Skip to content

Enable sampling multiple nodes per "example" #155

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

kmontemayor2-sc
Copy link
Contributor

We have a use case where we'd like to sample against multiple nodes at the same time. For instance, we'd like to generate subgraphs for both A and B in the below graph.

image

I'd like to be able to specify that input_nodes = [[A, B], [C, D]] as inputs to DistNeighborLoader 1 so that we sample against A, B for one example and B, C for another example.

While it is possible to achieve the same functionality via batching, I feel like it will be very tedious to keep track of both the "example count" and "batch size" and how they interplay.
Additionally, if we must batch here, then we would need to construct our input_nodes very carefully, which I feel is also quite error prone.

Without this change, when we are iterating through the NodeSamplerInput in _sampling_worker_loop 2, if we have something like NodeSamplerInput([[0, 1], [2, 3]]) then when the dataloader outputs tensor([0]) we would pass in tensor([[0, 1]]), whose shape is (1, 2) to the inducer 3, and the inducer would use seed_size = tensor([[0, 1]]).size(0) = 1

I feel like the existing behavior is somewhat surprising, as it essentially means if you pass in a tensor of greater than dim 1, some of your inputs get ignored.

Alternatives:

I considered making it so we could pass in NodeSamplerInput (or a subclass) to DisNeighborLoader but that seemed quite complicated given that the sampler input has some interplay with the worker options/etc and I wasn't sure how to consolidate all of the types of input_data (input_data: list[RemoteNodeSplitSamplerInput] | RemoteNodeSplitSamplerInput | RemoteNodePathSamplerInput | NodeSamplerInput) 4

@CLAassistant
Copy link

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants