Add Performer to GPSConv #7465

Merged · 19 commits into pyg-team:master on Jun 11, 2023

Conversation

@zechengz (Member) commented May 30, 2023

Description

ZINC Example

python3 examples/graph_gps.py --attn_type performer

I directly replaced the multihead attention with Performer fast attention using the ReLU kernel. The performance is similar (there is some randomness across runs and for the graph classification task). Below is the test_mae plot for each epoch.

[Figure: test_mae per training epoch on ZINC]
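
For context, Performer-style fast attention replaces the softmax attention matrix with a kernel feature map, which allows computing K^T V first and brings the cost down from quadratic to linear in the number of nodes. A minimal sketch of ReLU-kernel linear attention (illustrative only; the function and tensor names are assumptions, not the actual code in torch_geometric/nn/attention/performer.py):

import torch
import torch.nn.functional as F

def relu_linear_attention(q, k, v):
    # q, k, v: [batch, heads, num_nodes, head_dim]
    q, k = F.relu(q), F.relu(k)                             # ReLU feature map instead of softmax
    kv = torch.einsum('bhnd,bhne->bhde', k, v)              # K^T V first: O(N * d^2) instead of O(N^2 * d)
    out = torch.einsum('bhnd,bhde->bhne', q, kv)            # Q (K^T V)
    norm = torch.einsum('bhnd,bhd->bhn', q, k.sum(dim=2))   # row-wise normalizer
    return out / norm.clamp(min=1e-6).unsqueeze(-1)

q = k = v = torch.randn(2, 4, 16, 8)          # toy tensors: batch=2, heads=4, N=16, d=8
print(relu_linear_attention(q, k, v).shape)   # torch.Size([2, 4, 16, 8])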

@codecov (bot) commented May 30, 2023

Codecov Report

Merging #7465 (1481cdc) into master (0520bbb) will decrease coverage by 0.30%.
The diff coverage is 99.03%.

❗ Current head 1481cdc differs from pull request most recent head 4f98778. Consider uploading reports for the commit 4f98778 to get more accurate results.

@@            Coverage Diff             @@
##           master    #7465      +/-   ##
==========================================
- Coverage   91.70%   91.40%   -0.30%     
==========================================
  Files         447      449       +2     
  Lines       24925    25024      +99     
==========================================
+ Hits        22857    22874      +17     
- Misses       2068     2150      +82     
Impacted Files                               Coverage Δ
torch_geometric/nn/attention/performer.py     98.90% <98.90%> (ø)
torch_geometric/nn/attention/__init__.py     100.00% <100.00%> (ø)
torch_geometric/nn/conv/gps_conv.py          100.00% <100.00%> (ø)

... and 17 files with indirect coverage changes


@wsad1 (Member) left a comment

Left some initial comments. Will take a look at it again.
Also, what sort of speedup do you see using Performer? It would be nice to benchmark this.

return out


class PerformerAttention(torch.nn.Module):
Member:

@zechengz, can PerformerAttention be used by any PyG model that needs multihead attention? If so, can we move it to some common place like torch_geometric/nn/models or something?

@zechengz (Member Author):

It seems that moving it to torch_geometric/nn/models would cause a circular import. Any suggestions?

Member:

Ah, I see. Since we plan to add more attention modules, let's create a folder called torch_geometric/nn/attention and put all the attention modules there.
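
For illustration, the new package could simply re-export its modules from a package-level __init__.py, matching the two new files listed in the coverage report above (the exact exports are an assumption):

# torch_geometric/nn/attention/__init__.py (illustrative sketch)
from .performer import PerformerAttention

__all__ = ['PerformerAttention']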

@zechengz (Member Author):

Ok

@zechengz requested a review from @wsad1 on June 2, 2023, 08:16
@wsad1 (Member) left a comment

@zechengz I am not sure how many attention models we plan to add. But if there are potentially several out there and multiple PyG modules could use them, we should define an Attention base class and have all the attention modules we add share the same interface, similar to torch_geometric/nn/aggr. With this, the user can simply pass any attention module as an argument to the GNN modules that support attention. WDYT? cc: @rusty1s

I don't want to gate this PR on the above suggestion, though. I've left some more comments, and it looks good.
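
For illustration, a minimal sketch of what such a shared interface could look like, loosely following the torch_geometric/nn/aggr pattern (the class names, method signatures, and the dense [batch_size, num_nodes, channels] layout are assumptions, not an agreed design):

from typing import Optional

import torch


class Attention(torch.nn.Module):
    # Hypothetical base class: an attention module maps a dense, padded
    # node-feature tensor of shape [batch_size, num_nodes, channels] to a
    # tensor of the same shape, optionally using a boolean padding mask.
    def forward(self, x: torch.Tensor,
                mask: Optional[torch.Tensor] = None) -> torch.Tensor:
        raise NotImplementedError

    def reset_parameters(self):
        pass


class IdentityAttention(Attention):
    # Trivial example implementation, only meant to show the interface.
    def forward(self, x: torch.Tensor,
                mask: Optional[torch.Tensor] = None) -> torch.Tensor:
        if mask is not None:
            x = x * mask.unsqueeze(-1)
        return x

A layer such as GPSConv could then accept any Attention instance as an argument instead of selecting the implementation via a string flag.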

@zechengz (Member Author) commented Jun 5, 2023

@wsad1 The current plan is to add this Performer attention, and I am also working on another one called BigBird. So, together with the normal multihead attention, we will have at least three different global attention variants. I also observed that there are quite a few attention methods besides Performer and BigBird that could be useful to add to PyG. So creating the folder torch_geometric/nn/attention makes sense to me. Thanks!
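
For reference, a short usage sketch of the interface this PR exposes; it assumes the attn_type argument added to GPSConv in this PR, and the surrounding setup (inner SAGEConv, sizes) is only illustrative:

import torch
from torch_geometric.nn import GPSConv, SAGEConv

channels = 64
conv = GPSConv(channels, SAGEConv(channels, channels), heads=4,
               attn_type='performer')

x = torch.randn(10, channels)                # 10 nodes with 64 features each
edge_index = torch.randint(0, 10, (2, 40))   # 40 random edges
out = conv(x, edge_index)                    # shape: [10, 64]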

@zechengz requested a review from @wsad1 on June 5, 2023, 07:03
@wsad1 (Member) left a comment

Thanks, @zechengz!
Good to merge after the comments are addressed.

@zechengz merged commit e4297b1 into pyg-team:master on Jun 11, 2023