
Adding a cell list neighbor list module #169


Merged: 79 commits, May 29, 2023

Conversation

@RaulPPelaez (Collaborator) commented May 3, 2023

This PR includes a new module called DistanceCellList (looking for a better name), which is an alternative to the Distance module that provides three strategies to find neighbors:

  1. The O(N^2) getNeighborPairs functionality from NNPOps, referred to as brute, which required making it batch-aware. The batching functionality is actually a really minimal modification, so it could be upstreamed to NNPOps and used from there instead.
  2. A cell list distance module, using a classic spatial hashing approach (see here)
  3. An improved O(N^2) approach referred to as shared (from NVIDIA here)

The current Distance module has some drawbacks:

  1. Really slow for a single batch
  2. Does not understand periodic boundary conditions
  3. Incompatible with CUDA graphs
  4. Always returns redundant pairs (it gives you both (i,j) and (j,i)). This is required by the current message passing modules.

The new module solves all these by being:

  1. Two orders of magnitude faster for a single batch (while being at least as fast as Distance in all instances tested)
  2. CUDA graph compatible (and jit.script compatible)
  3. A drop-in replacement for Distance when using default parameters*.
  4. Compatible with periodic boundary conditions.
  5. Able to optionally skip redundant neighbors, saving time and memory.

*There is a caveat: Distance requires max_num_neighs (maximum allowed neighbors per particle), while DistanceCellList requires max_num_pairs (maximum total number of neighbor pairs).

#From DistanceCellList:
"""
        max_num_pairs : int
            Maximum number of pairs to store.
            If negative, it is interpreted as (minus) the maximum number of neighbors per atom.
"""

This is the current declaration of the new module:

class DistanceCellList(torch.nn.Module):
    def __init__(
        self,
        cutoff_lower=0.0,
        cutoff_upper=5.0,
        max_num_pairs=-32,
        return_vecs=False,
        loop=False,
        strategy="brute",
        include_transpose=True,
        resize_to_fit=True,
        check_errors=True,
        box=None,
    ):
        super(DistanceCellList, self).__init__()
        """ Compute the neighbor list for a given cutoff.
        This operation can be placed inside a CUDA graph in some cases.
        In particular, resize_to_fit and check_errors must be False.

        Parameters
        ----------
        cutoff_lower : float
            Lower cutoff for the neighbor list.
        cutoff_upper : float
            Upper cutoff for the neighbor list.
        max_num_pairs : int
            Maximum number of pairs to store.
            If negative, it is interpreted as (minus) the maximum number of neighbors per atom.
        strategy : str
            Strategy to use for computing the neighbor list. Can be one of
            ["shared", "brute", "cell"].
            Shared: An O(N^2) algorithm that leverages CUDA shared memory, best for large numbers of particles.
            Brute: A brute force O(N^2) algorithm, best for small numbers of particles.
            Cell: A cell list algorithm, best for large numbers of particles, low cutoffs and low batch sizes.
        box : Optional[torch.Tensor]
            Size of the box, shape (3,3) or None.
            If strategy is "cell", the box must be diagonal.
        loop : bool
            Whether to include self-interactions.
        include_transpose : bool
            Whether to include the transpose of the neighbor list.
        resize_to_fit : bool
            Whether to resize the neighbor list to the actual number of pairs found.
            When False, the list is padded with (-1,-1) pairs up to max_num_pairs.
            If this is True the operation is not CUDA graph compatible.
        check_errors : bool
            Whether to check for too many pairs. If this is True the operation is not CUDA graph compatible.
        return_vecs : bool
            Whether to return the distance vectors.
        """

Changes to the installation process

This operation is written in C++ and CUDA. TorchMD-Net does not currently have an in-place build system, being Python-only.
I build this as a torch cpp_extension with JIT compilation, meaning that the CUDA/C++ code is compiled transparently the first time DistanceCellList is instantiated, in a way compatible with the current "pip install -e ." workflow.
If a user does not use the new module, nothing is compiled and no additional dependencies or overhead are required.
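For context, torch's JIT extension mechanism works roughly like the following sketch of torch.utils.cpp_extension.load; the extension and source file names here are placeholders, not the actual ones in this PR:

from torch.utils.cpp_extension import load

# Placeholder names: the first call compiles the sources with NVCC and caches
# the build; subsequent instantiations reuse the cached extension.
neighbors = load(
    name="torchmdnet_neighbors",
    sources=["neighbors.cpp", "neighbors.cu"],
    verbose=True,
)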

OTOH, a user constructing DistanceCellList will:

  1. Require NVCC installed with a CUDA version ABI-compatible with torch (cudatoolkit-dev from conda-forge works)
  2. Experience a compilation time of several minutes the first time DistanceCellList is used after "pip install -e ." (torch will cache the compilation).

Tasks:

  • Adapt the neighbor functions in NNPOps to the torchmd-net "pip install -e ." installation (thanks to POC: neighbor search #61).
  • Implement a batch-aware cell list neighbor construction
    • Optimize the cell list.
    • Make the operation CUDA graph compatible
  • Add tests
  • Add benchmark
  • Feature parity with current Distance module.
    • Add the "loop" parameter (include self-interactions)
    • Add the "cutoff_lower" parameter
  • Add periodic boundary conditions
    • Support rectangular boxes
    • Support triclinic boxes
      • Brute force
      • Cell list
  • Add backwards pass
    • CPU (Autograd takes care of it)
    • GPU (common backwards pass for every strategy)
  • torch.jit.script compatibility

Challenges:

  1. The brute approach cannot handle more than 32K particles total. AFAIK, there is no way to make it work without destroying what makes it so fast. Anyhow, this strategy is not really suited for such high workloads. There is a guard that simply forces the shared strategy if the user selected brute but more than 32K particles are requested (see the sketch after this list).
  2. The performance of the cell strategy degrades quickly with the number of batches. This is because I construct a single cell list such that particles in the same cell are contiguous in memory. When traversing the cell list, one thus finds particles from all batches, forcing a lot of unnecessary checks. I mitigate this by sorting by batch inside each cell, which allows skipping some pairs, but this could surely be done in a smarter way (maybe a binary search looking for the first atom in the current particle's batch?).
    The alternative, constructing a cell list per batch, requires much more memory and cannot be done without GPU-CPU memory copies.
  3. Automatically choosing a strategy based on some heuristic. I tried this in a million ways, but jit.script is not taking it. The heuristic cannot be applied until the forward method (when positions and batch are known), and changing the function dynamically like that is just not something TorchScript supports, as far as I can tell.
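As an illustration of the guard mentioned in point 1, the logic amounts to something like this (a sketch with illustrative names, not the exact code in the PR):

MAX_BRUTE_PARTICLES = 32768  # the brute kernel cannot handle more than ~32K particles

def pick_strategy(requested: str, num_particles: int) -> str:
    # Fall back to the shared-memory O(N^2) kernel when brute would overflow.
    if requested == "brute" and num_particles > MAX_BRUTE_PARTICLES:
        return "shared"
    return requested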

@RaulPPelaez (Collaborator Author)

> I did a quick skim through the code. A thorough review is going to take a long time, given how much code there is! But I had a couple of high level thoughts about it.

Thanks for taking a look, Peter! I am aware it's a lot of code -.-

> First, I wonder if the CUDA parts would go better in NNPOps? Building a neighbor list is a really common operation. That would make it available for more than just TorchMD-Net.

I think this would indeed be a good addition. Feel free to upstream this to NNPOps and we will then switch to that in torchmd-net. I can help too. Currently NNPOps does not support some of the things that are implemented here, in particular the possibility of including self-interactions and "transpose" pairs.
These can be added after the NNPOps operation in a simple way, although I am not sure how to do so while remaining CUDA-graph compatible. I can only think of ways to go about it that cost performance and memory.

To port this to NNPOps we need to decide if we want to put the following functionality there:

  1. Add a lower cutoff
  2. Add the include_transpose and loop options
  3. How to handle the selection of the different strategies (three separate functions vs one with a parameter?)
  4. Batches.

All of these can be implemented in NNPOps as additions to the interface, so as to remain backward compatible.

As a side note, I studied the voxel implementation you shared. One thing that I found improves traversal performance a lot is to use a single loop to go over cells:

for (int i = 0; i < 27; i++) {
    const int neighbor_cell = getNeighborCellIndex(cell_i, i, cell_dim);
    addNeighborsForCell(i_atom, neighbor_cell, cell_list, box_size, list);
}

Instead of three nested ones:
https://github.com/openmm/openmm/blob/d6cca3903aa0be02c105aed4febcfa2747f48fc1/platforms/reference/src/SimTKReference/ReferenceNeighborList.cpp#L154

It's the classic "recompute whatever to avoid branching" CUDA tradeoff, and it also promotes unrolling. This is cool and all, but it completely destroys the triclinic traversal strategy you have in the reference. Not sure how to go about it yet...
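For illustration, the flat neighbor index in the single loop above is just the usual base-3 decomposition; a sketch of what a getNeighborCellIndex-style helper decodes (the actual helper in the PR may differ):

def neighbor_cell_offset(i: int) -> tuple[int, int, int]:
    # Decode a flat index in 0..26 into a cell offset in {-1, 0, 1}^3.
    dx = i % 3 - 1
    dy = (i // 3) % 3 - 1
    dz = i // 9 - 1
    return dx, dy, dz

The neighbor cell is then the current cell plus this offset, wrapped periodically.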

> Second, this code seems way more complicated than it needs to be. For example, everything related to hashing. That's a textbook implementation of a generic voxel structure that can support arbitrarily sparse data points scattered over an arbitrarily large volume of space. But for the sorts of applications we care about, it's way more complexity than we need.
>
> For molecular models where atoms are evenly distributed over a fixed volume, it's really easy to sort them into voxels.
>
> 1. Record the index of the voxel each atom is in.
>
> 2. Sort them.
>
> 3. For every voxel, record the index of the first atom in the sorted list that's in that voxel.

This is exactly what my code does, but I use a 64-bit hash, composed of a Morton hash and the batch index, to sort by instead of just the linear cell index:

auto ci = getCell(pi, box_size, cutoff);
// Calculate the hash
const uint32_t hash = hashMorton(ci);
// Create a hash combining the Morton hash and the batch index, so that atoms in the same cell
// are contiguous
const uint64_t hash_final = (static_cast<uint64_t>(hash) << 32) | i_batch;

> Step 1 can be implemented with PyTorch in just a small amount of Python code.

Why separate this from the rest of the CUDA implementation? Assigning the hash is a small kernel launch in C++, and I am much more familiar with bit manipulation there.

> Step 2 is a single call to torch.sort().

I resorted to using the radix sort implementation that comes with CUDA via cub because:

  1. Tensor (and thus torch.sort) does not support uint64, which I need in order to pack both the cell hash and the batch index into the key.
  2. I could not make torch.sort play well with CUDA graphs. I also had this problem with thrust::sort (both torch and thrust call cub::DeviceRadixSort like I do down the line). Both of them synchronize at some point, be it to allocate some temp memory, copy, or decide on the number of radix sweeps (this is a guess).

The Morton hash only uses 30 bits, so I could ignore the last two bits and use torch.long, solving point 1. I chose not to do that because I suspect I can leverage these bits to improve batch handling in the future. Right now I construct a single cell list with all batches in it, but maybe a cell-list-per-batch is best? Which amounts to just switching the order here:

const uint64_t hash_final = (static_cast<uint64_t>(hash) << 32) | i_batch;

One painful thing about achieving this is that you do not actually know the number of batches on the CPU side, so you need a cudaMemcpy to implement it. Not sure how to solve that...

For 2, though, calling the low-level-ish cub::DeviceRadixSort directly takes ~20 lines of boilerplate code; I do not see the value of losing CUDA graphs for that. I could agree if all the rest could be written in pytorch, which would make the thing work with all torch backends automatically. But since the list traversal requires a CUDA kernel anyway...
Also, cub exposes these begin_bit/end_bit parameters https://nvlabs.github.io/cub/structcub_1_1_device_radix_sort.html that thrust and torch hardcode to 8*sizeof(KeyT), but they can be used to do funny things when sorting, such as saving radix sweeps. Granted, not that important when sorting is basically instantaneous, but still useful because it saves kernel launches.
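To make the torch.long option above concrete, here is a rough sketch (not code from the PR) of packing a 30-bit cell hash and the batch index into a single signed 64-bit key that torch.sort can handle:

import torch

# Sketch, not from the PR: with only 30 bits in the cell hash the packed key
# stays positive, so it fits in torch.long (int64) and can go through torch.sort.
def pack_keys(cell_hash: torch.Tensor, batch: torch.Tensor) -> torch.Tensor:
    return (cell_hash.to(torch.int64) << 32) | batch.to(torch.int64)

# order = torch.argsort(pack_keys(cell_hash, batch))

Swapping the order of the two fields, as discussed above, would give a cell-list-per-batch layout instead.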

> Step 3 is a single linear pass through the sorted list, though it's also easy to parallelize for efficiency. And you're done. The atoms in voxel i are the ones from voxelStart[i] to voxelStart[i+1].

These things change, but last time I checked, the access pattern resulting from having both voxelStart and voxelEnd (while taking twice the memory) was much faster. Anyhow, this is what this kernel does:

template <typename scalar_t>
__global__ void fillCellOffsetsD(const Accessor<scalar_t, 2> sorted_positions,
                                 const Accessor<int32_t, 1> sorted_indices,
                                 Accessor<int32_t, 1> cell_start, Accessor<int32_t, 1> cell_end,
                                 scalar3<scalar_t> box_size, scalar_t cutoff) {
    // Since positions are sorted by cell, for a given atom, if the previous atom is in a different
    // cell, then the current atom is the first atom in its cell. We use this fact to fill the
    // cell_start and cell_end arrays
    const int32_t i_atom = blockIdx.x * blockDim.x + threadIdx.x;
    if (i_atom >= sorted_positions.size(0))
        return;
    const auto pi = fetchPosition(sorted_positions, i_atom);
    const int3 cell_dim = getCellDimensions(box_size, cutoff);
    const int icell = getCellIndex(getCell(pi, box_size, cutoff), cell_dim);
    int im1_cell;
    if (i_atom > 0) {
        int im1 = i_atom - 1;
        const auto pim1 = fetchPosition(sorted_positions, im1);
        im1_cell = getCellIndex(getCell(pim1, box_size, cutoff), cell_dim);
    } else {
        im1_cell = 0;
    }
    if (icell != im1_cell || i_atom == 0) {
        int n_cells = cell_start.size(0);
        cell_start[icell] = i_atom;
        if (i_atom > 0) {
            cell_end[im1_cell] = i_atom;
        }
    }
    if (i_atom == sorted_positions.size(0) - 1) {
        cell_end[icell] = i_atom + 1;
    }
}

Again, I decided to use a kernel instead of trying to torchify it because I anticipate one can be smart about it to improve batch handling.

> Bringing in Thrust as a dependency also seems unnecessary. It looks like mostly all you're using it for are the min() and max() functions?

I initially used thrust::sort and some containers/allocators from thrust; it seems I forgot to take out some headers. Thrust has many convenience functions, such as min/max, that replace the std alternatives that cannot be used in device code.
I do not see thrust as a dependency since it comes with CUDA: if you can include cuda.h you can include thrust. I could just roll my own thrust::min/max, but why deprive ourselves of the utilities in thrust? It's a good library and it's always there, AFAIK.
Is there a situation in which you have the CUDA headers but not thrust?

@peastman (Collaborator)

I don't understand what the hashes are adding. Ultimately you want to know the list of atoms in a particular voxel. Sorting by hash rather than voxel index just adds a lot of code complexity and runtime overhead for no obvious benefit.

I understand the benefit of hashes in the generic case, when you have arbitrarily sparse data scattered over an arbitrarily large volume of space. You might have billions of voxels, almost all of which are empty. But that's not the case in molecular simulations. The data is evenly distributed over a small area of space. The number of voxels is in the thousands at most, and few are empty.

> Why separate this from the rest of the CUDA implementation?

You can replace a few hundred lines of CUDA code with probably 10-20 lines of Python, and it will be just as fast. This isn't the bottleneck operation.
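For reference, a rough sketch of what those few lines of Python might look like, under the assumption of a cubic box of side box_size centered at the origin and a single batch (illustrative only, not code from this PR):

import torch

def build_cell_list(pos, box_size, cutoff):
    # Sketch of the three steps above (assumes a cubic box centered at the
    # origin and no batches); not code from the PR.
    n_cells_dim = int(box_size // cutoff)      # cells per dimension
    cell_size = box_size / n_cells_dim
    # 1. Voxel index of each atom: shift to [0, box_size) and bin by cell size.
    cell_xyz = ((pos + 0.5 * box_size) / cell_size).long().clamp_(0, n_cells_dim - 1)
    cell_idx = cell_xyz[:, 0] + n_cells_dim * (cell_xyz[:, 1] + n_cells_dim * cell_xyz[:, 2])
    # 2. Sort atoms by voxel index and reorder the positions accordingly.
    sorted_cells, order = torch.sort(cell_idx)
    sorted_pos = pos[order]
    # 3. First atom of each voxel in the sorted list; atoms in voxel i are
    #    sorted_pos[voxel_start[i]:voxel_start[i + 1]].
    n_cells = n_cells_dim ** 3
    voxel_start = torch.searchsorted(sorted_cells, torch.arange(n_cells + 1, device=pos.device))
    return sorted_pos, order, voxel_start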

@RaulPPelaez (Collaborator Author)

> I don't understand what the hashes are adding. Ultimately you want to know the list of atoms in a particular voxel. Sorting by hash rather than voxel index just adds a lot of code complexity and runtime overhead for no obvious benefit.

I believe I did not convey a detail of my strategy correctly: I do not only compute the cell index/hash of each position and sort the indices, I also reorder the positions and batch arrays according to this hash and use these sorted copies when traversing the cell list.

This has a profound impact on performance because it increases data locality, not only in terms of cache but also by increasing access coherence.

This is where the Z-order hash comes into play: since one goes over the 27 neighboring cells, a Z-order increases the chance of neighboring cells being contiguous in memory. A simple linear cell index means one has to jump n.x*n.y elements in the voxelStart array when traversing cells at a different height.

This improved things by like 20% when I first implemented it (on a GTX 980). I checked now on a 4090 and a Titan V and the effect is negligible. I guess cache sizes are crazy now :p. Or maybe the overhead of the atomic addition of neighbor pairs hides any gains from this.
So ok, let's leave the simpler cell index as the hash.
I also tried not including the batch in the hash, which prevents breaking early during traversal based on batch. Surprisingly, skipping the batch check actually increases performance a bit. Maybe loading chunks of other batches helps the cache overall?
I took it out, which allows me to leave the hash as an int32 and use torch::sort. Annoyingly, torch::sort returns the indices in int64, requiring an extra cast. However, this is negligible.
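(For context, the Z-order/Morton hash discussed above is the classic bit-interleaving trick. A plain-Python sketch of a 30-bit 3D Morton encode, 10 bits per axis, could look like the following; it is illustrative only and not the hashMorton kernel code.)

def expand_bits(v: int) -> int:
    # Spread the lowest 10 bits of v so that two zero bits sit between each.
    v = (v * 0x00010001) & 0xFF0000FF
    v = (v * 0x00000101) & 0x0F00F00F
    v = (v * 0x00000011) & 0xC30C30C3
    v = (v * 0x00000005) & 0x49249249
    return v

def morton_hash(cx: int, cy: int, cz: int) -> int:
    # Interleave the bits of three 10-bit cell coordinates into a 30-bit key.
    return (expand_bits(cx) << 2) | (expand_bits(cy) << 1) | expand_bits(cz)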

Now, recording the voxel index of each particle requires some arithmetic and checks, and my pytorch-fu is not enough to see how to transform this into an efficient sequence of torch calls:

/*
 * @brief Get the cell index of a point
 * @param p The point position
 * @param box_size The size of the box in each dimension
 * @param cutoff The cutoff
 * @return The cell index
 */
template <typename scalar_t>
__device__ int3 getCell(scalar3<scalar_t> p, scalar3<scalar_t> box_size, scalar_t cutoff) {
    p = rect::apply_pbc<scalar_t>(p, box_size);
    // Take to the [0, box_size] range and divide by cutoff (which is the cell size)
    int cx = floorf((p.x + scalar_t(0.5) * box_size.x) / cutoff);
    int cy = floorf((p.y + scalar_t(0.5) * box_size.y) / cutoff);
    int cz = floorf((p.z + scalar_t(0.5) * box_size.z) / cutoff);
    int3 cell_dim = getCellDimensions(box_size, cutoff);
    // Wrap around. If the position of a particle is exactly box_size, it will be in the last cell,
    // which results in an illegal access down the line.
    if (cx == cell_dim.x)
        cx = 0;
    if (cy == cell_dim.y)
        cy = 0;
    if (cz == cell_dim.z)
        cz = 0;
    return make_int3(cx, cy, cz);
}

/*
 * @brief Get the index of a cell in a 1D array of cells.
 * @param cell The cell coordinates, assumed to be in the range [0, cell_dim].
 * @param cell_dim The number of cells in each dimension
 */
__device__ int getCellIndex(int3 cell, int3 cell_dim) {
    return cell.x + cell_dim.x * (cell.y + cell_dim.y * cell.z);
}

I am also scared to torchify just this part, since the above device functions are also used when traversing, and translating the construction to torch would require implementing this logic twice.
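(As a rough idea of what such a translation could look like, here is a hedged torch sketch of getCell/getCellIndex assuming a rectangular box given as a (3,) tensor of side lengths centered at the origin; it is illustrative only and indeed duplicates the device-side logic.)

import torch

# Sketch, not from the PR: torch version of getCell/getCellIndex for a
# rectangular box centered at the origin, with box_size a (3,) tensor.
def get_cell_index(pos: torch.Tensor, box_size: torch.Tensor, cutoff: float) -> torch.Tensor:
    # Apply rectangular periodic boundary conditions.
    pos = pos - box_size * torch.round(pos / box_size)
    cell_dim = (box_size / cutoff).to(torch.long)         # cells per dimension
    # Shift to [0, box_size] and divide by the cutoff (the cell size).
    cell = ((pos + 0.5 * box_size) / cutoff).to(torch.long)
    # Wrap the edge case where a coordinate lands exactly on box_size.
    cell = torch.where(cell == cell_dim, torch.zeros_like(cell), cell)
    # Linear index: x + nx * (y + ny * z).
    return cell[:, 0] + cell_dim[0] * (cell[:, 1] + cell_dim[1] * cell[:, 2])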

> But that's not the case in molecular simulations. The data is evenly distributed over a small area of space. The number of voxels is in the thousands at most, and few are empty.

I do not have any hope for the cell list being a viable option for inference on small molecules. I am thinking about dense large systems, like a water box, or some protein with implicit water. Something like 32^3 cells and above. To me, anything below that sounds like an N^2 with extra steps. Maybe I should aim for other stuff?

@raimis (Collaborator) commented May 25, 2023

We need this in production as soon as possible. After discussing with @RaulPPelaez, we decided:

  • The cell list algorithm seems to be good enough for our purpose. Further improvements (if any) will be in a separate PR.
  • The cell list algorithm does not support triclinic cells. We don't need that at the moment. Let's just open an issue as a reminder.

@peastman (Collaborator)

> I am thinking about dense large systems, like a water box, or some protein with implicit water. Something like 32^3 cells and above.

In a water box, there are usually no empty voxels. So anything you do is going to be linear in the number of voxels (which is linear in the number of atoms).

> The cell list algorithm seems to be good enough for our purpose. Further improvements (if any) will be in a separate PR.

Ok. Let me do one review pass through the code first.

> The cell list algorithm does not support triclinic cells. We don't need that at the moment. Let's just open an issue as a reminder.

Agreed.

@@ -77,6 +79,164 @@ def message(self, x_j, W):
return x_j * W


class OptimizedDistance(torch.nn.Module):
Collaborator

Would it make sense to just replace the existing Distance class, instead of adding another class with a different name? Are there cases when someone would prefer the old class instead?

Collaborator Author

The reason I did not do this is that I am wary of some current use of Distance relying on something unexpected (maybe related to the ordering?). I am waiting for regular users of each model to tell me "I trained with the new Distance and everything is ok". Maybe a bit superstitious, but if it is all the same I would rather do another PR to replace the current uses.

Comment on lines +73 to +86
int3 periodic_cell = cell;
if (cell.x < 0)
    periodic_cell.x += cell_dim.x;
if (cell.x >= cell_dim.x)
    periodic_cell.x -= cell_dim.x;
if (cell.y < 0)
    periodic_cell.y += cell_dim.y;
if (cell.y >= cell_dim.y)
    periodic_cell.y -= cell_dim.y;
if (cell.z < 0)
    periodic_cell.z += cell_dim.z;
if (cell.z >= cell_dim.z)
    periodic_cell.z -= cell_dim.z;
return periodic_cell;
Collaborator

This assumes it will never be off by more than one grid width. Is that guaranteed to be true? A safer (and much simpler) implementation is

return make_int3(cell.x%cell_dim.x, cell.y%cell_dim.y, cell.z%cell_dim.z);

Collaborator Author

Careful, in C++ -1%10 is -1, not 9

@RaulPPelaez (Collaborator Author), May 26, 2023

Which is why the previous suggestion for getCell using modulus also does not work when particles are to the left of the main box, ugghhhh.
What am I missing here?

@RaulPPelaez (Collaborator Author)

I am going to merge this now so we can move on. @peastman please feel free to open a PR if you see how to transform some of the kernels into torch ops; I gave it a try but got nowhere.
I will open a new PR as I switch current uses of Distance to OptimizedDistance.

@RaulPPelaez RaulPPelaez merged commit e20876f into torchmd:main May 29, 2023