A faster implementation of AddMetaPaths #5390

Closed
@EdisonLeeeee

🚀 The feature, motivation and pitch

I recently came across the AddMetaPaths transform and found it quite slow, particularly on large-scale heterogeneous graphs. Take the example from this discussion:

from torch_geometric.datasets import AMiner
from torch_geometric.transforms import AddMetaPaths

data = AMiner(root='./data/aminer_pyg/')[0]
# Metapaths: author-paper-author and author-paper-venue-paper-author.
metapaths = [
    [("author", "paper"), ("paper", "author")],
    [("author", "paper"), ("paper", "venue"), ("venue", "paper"), ("paper", "author")],
]
# max_sample=1 caps the number of sampled neighbors within metapaths.
data = AddMetaPaths(metapaths, max_sample=1)(data)

This takes hours and a large amount of memory to run on the AMiner graph, even though I specified max_sample=1 in AddMetaPaths.

Alternatives

I think the implementation can be accelerated by generating the metapaths with a sequence of one-step random walks. For example, for the metapath [("author", "paper"), ("paper", "author")], we can perform a one-step random walk on the ("author", "paper") graph and then another on the ("paper", "author") graph, which generates the metapath efficiently.

I've tested this method and it runs much faster than the current implementation, with lower memory overhead. Could it be added to PyG?
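
To make the idea concrete, here is a minimal sketch in plain Python on a toy graph. It only illustrates chaining one-step random walks. It is not the tested implementation mentioned above and not PyG's API. The helper names build_adjacency and sample_metapath_edges, the walks_per_node parameter, and the toy edge lists are all hypothetical.

```python
import random
from collections import defaultdict


def build_adjacency(edges):
    """Map each source node to the list of its destination nodes."""
    adj = defaultdict(list)
    for src, dst in edges:
        adj[src].append(dst)
    return adj


def sample_metapath_edges(edge_lists, start_nodes, walks_per_node=1, seed=0):
    """Sample metapath edges by chaining one-step random walks.

    edge_lists holds one list of (src, dst) pairs per edge type in the
    metapath, in order. Returns a sorted list of unique (start, end) pairs.
    """
    rng = random.Random(seed)
    adjs = [build_adjacency(edges) for edges in edge_lists]
    found = set()
    for start in start_nodes:
        for _ in range(walks_per_node):
            node = start
            for adj in adjs:
                neighbors = adj.get(node)
                if not neighbors:  # dead end: this walk is dropped
                    node = None
                    break
                # One-step random walk along the current edge type.
                node = rng.choice(neighbors)
            if node is not None:
                found.add((start, node))
    return sorted(found)


# Toy author-paper graph (made-up data): author 3 has no papers.
author_paper = [(0, 0), (1, 0), (1, 1), (2, 1)]
paper_author = [(p, a) for a, p in author_paper]

# author -> paper -> author metapath, one walk per start author.
apa_edges = sample_metapath_edges([author_paper, paper_author], start_nodes=range(4))
print(apa_edges)
```

In this sketch each start node takes at most walks_per_node walks, and each walk takes one step per edge type in the metapath, so the work grows with the number of start nodes rather than with the number of multi-hop paths. A practical version would presumably vectorize the per-step neighbor sampling over all start nodes with tensors instead of looping in Python.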

Additional context

No response
