Description
🚀 The feature, motivation and pitch
I recently came across the `AddMetaPaths` transform and found it quite slow, particularly on large-scale heterogeneous graphs. Take the example from this discussion:
from torch_geometric.datasets import AMiner
from torch_geometric.transforms import AddMetaPaths
data = AMiner(root='./data/aminer_pyg/')[0]
metapaths = [[("author", "paper"), ("paper", "author")],
[("author", "paper"), ("paper", "venue"), ("venue", "paper"), ("paper", "author")]]
data = AddMetaPaths(metapaths)(data)
It takes hours and a large amount of memory to run on the AMiner graph, even though I specified `max_sample=1` in `AddMetaPaths`.
Alternatives
I think the implementation can be accelerated by generating the metapaths through a sequence of one-step random walks. For example, for the metapath `[("author", "paper"), ("paper", "author")]`, we can perform a one-step random walk on the `("author", "paper")` graph and then on the `("paper", "author")` graph to generate the metapath efficiently.
I've tested this method, and it runs much faster than the current implementation with lower memory overhead. Would it be possible to add it to PyG?
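To make the idea concrete, here is a minimal, framework-free sketch of chaining one-step random walks along a metapath. It is only an illustration, not the implementation I tested and not PyG code: the helper names `build_adjacency` and `sample_metapath_edges` are hypothetical, and it assumes each edge type is given as a plain list of `(src, dst)` pairs rather than an `edge_index` tensor.

```python
import random
from collections import defaultdict


def build_adjacency(edges):
    """Map each source node to the list of its destination nodes."""
    adj = defaultdict(list)
    for src, dst in edges:
        adj[src].append(dst)
    return adj


def sample_metapath_edges(edge_lists, metapath, start_nodes, rng=None):
    """Sample at most one metapath edge per start node.

    For every start node, take one random step per edge type in the
    metapath. A walk that reaches a node with no outgoing edge of the
    required type is dropped.
    """
    if rng is None:
        rng = random.Random()
    # One adjacency map per step of the metapath (hypothetical input layout).
    adjs = [build_adjacency(edge_lists[step]) for step in metapath]
    sampled = set()
    for start in start_nodes:
        node = start
        for adj in adjs:
            neighbors = adj.get(node)
            if not neighbors:
                node = None  # dead end: this walk yields no edge
                break
            node = rng.choice(neighbors)
        if node is not None:
            sampled.add((start, node))
    return sorted(sampled)


# Toy graph: 3 authors, 2 papers.
edges = {
    ("author", "paper"): [(0, 0), (1, 0), (1, 1), (2, 1)],
    ("paper", "author"): [(0, 0), (0, 1), (1, 1), (1, 2)],
}
metapath = [("author", "paper"), ("paper", "author")]
pairs = sample_metapath_edges(edges, metapath, range(3), rng=random.Random(0))
```

Each walk costs one lookup per metapath step, so the work grows with the number of start nodes and the metapath length rather than with the size of the intermediate adjacency products. A tensor-based version would presumably vectorize the per-node sampling instead of looping in Python.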
Additional context
No response