From 64726fcd0f8cd7bf8cba15c9dd485dd652dc0895 Mon Sep 17 00:00:00 2001
From: Tristan Youngs
Date: Wed, 3 Jun 2020 10:56:43 +0100
Subject: [PATCH 1/3] Add draft parallelism design doc.

---
 design/parallelism.md | 41 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 41 insertions(+)
 create mode 100644 design/parallelism.md

diff --git a/design/parallelism.md b/design/parallelism.md
new file mode 100644
index 0000000..83936b9
--- /dev/null
+++ b/design/parallelism.md
@@ -0,0 +1,41 @@
+# Dissolve Parallel Tech Spec
+
+## Introduction
+
+### Purpose
+This document lays out the current and future technical requirements for the parallel architecture of the Dissolve project.
+
+### Background
+
+Dissolve is an atomistic simulation code for the interrogation and analysis of neutron scattering data. Since the simulation boxes (collections of atoms) that it produces must be representative of the physical systems measured experimentally, particle numbers of 1 million and beyond must be treatable. This is not practical on a single-core machine, or even a modestly powerful workstation, and so suitable parallel technology must be implemented.
+
+### Current Implementation
+
+At present (v0.7.0) Dissolve utilises the widely available MPI standard (via OpenMPI) to achieve parallelism across multiple CPU cores over multiple physical processing enclosures. Each slave process maintains its own copy of all simulation data (a replicated data strategy), leading to high memory requirements.
+
+Dissolve's original design was to allow partitioning of processors into sub-groups in order to permit:
+
+1. Fine-grained control over partitioning of workload in order to achieve balance between processing time and communication overhead.
+2. Forking of the main loop to allow simultaneous calculations targeting different atomic configurations or modules.
+
+1) is particularly important in the implementation of parallel Monte Carlo techniques, where trial moves must be distributed throughout the simulation box in such a way that they are independent of each other. This is achievable, but heavily system dependent (size, molecule type, etc.) and requires adaptive partitioning of the available processes. As a consequence of 2), each main loop iteration requires a broadcast of all simulation data between processes to ensure that each process has up-to-date information, forcing the requirement for T::broadcast() and T::equality() (enforced by inheriting from GenericItemBase) for each class object that needs to be distributed.
+
+## Requirements
+
+### Code
+
+- A single codebase must be maintained for the serial, GUI and parallel versions, with parallelism built in when requested through the build generator (CMake).
+- The GUI version will remain a serial code, as its scope covers only the set-up and analysis of simulations, and the running of small tutorial systems that do not need to be run in parallel.
+
+### General Approach
+
+- The ability to partition the available processes into task-groups for certain algorithms is beneficial (see the sketch at the end of this document).
+- Scaling over CPU- and GPU-based multi-core systems should be considered.
+- Replicated data strategies over CPU cores should not be employed; such approaches are not feasible for systems of 1 million particles and beyond.
+
+### System Architecture
+
+- Parallelisation of the Linux non-GUI version is essential.
+- Parallelisation of the OSX non-GUI version is desirable.
+- Parallelisation of the Windows non-GUI version is beneficial, but not essential.
+- Parallelisation of the GUI version is desirable, but not essential.
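+
+### Illustrative Sketch: Task-Group Partitioning
+
+As a minimal sketch of the task-group partitioning noted under General Approach above, the following standalone example splits the world communicator into a hypothetical pair of groups using raw MPI. It illustrates the general mechanism only; the group count and the rank-modulo assignment are assumptions made for illustration, not Dissolve's actual process-group code.
+
+```cpp
+// Sketch only: partition the available MPI processes into task-groups.
+// The choice of two groups and the rank-modulo assignment are hypothetical.
+#include <mpi.h>
+#include <cstdio>
+
+int main(int argc, char **argv)
+{
+    MPI_Init(&argc, &argv);
+
+    int worldRank = 0, worldSize = 1;
+    MPI_Comm_rank(MPI_COMM_WORLD, &worldRank);
+    MPI_Comm_size(MPI_COMM_WORLD, &worldSize);
+
+    // Assign each process to one of two task-groups; each group could then
+    // work on a different configuration or module independently.
+    const int nGroups = 2;
+    const int groupIndex = worldRank % nGroups;
+
+    // Split the world communicator so that collective operations (sums,
+    // broadcasts) can subsequently be performed within a single group.
+    MPI_Comm groupComm;
+    MPI_Comm_split(MPI_COMM_WORLD, groupIndex, worldRank, &groupComm);
+
+    int groupRank = 0, groupSize = 1;
+    MPI_Comm_rank(groupComm, &groupRank);
+    MPI_Comm_size(groupComm, &groupSize);
+    std::printf("World rank %d of %d -> group %d (rank %d of %d)\n", worldRank,
+                worldSize, groupIndex, groupRank, groupSize);
+
+    MPI_Comm_free(&groupComm);
+    MPI_Finalize();
+    return 0;
+}
+```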

From 6ad035d1071aba17a15d5bcb0612fff9dd03aae6 Mon Sep 17 00:00:00 2001
From: Tristan Youngs
Date: Mon, 8 Jun 2020 18:29:29 +0100
Subject: [PATCH 2/3] Update doc.

---
 design/parallelism.md | 17 +++++++++++++++--
 1 file changed, 15 insertions(+), 2 deletions(-)

diff --git a/design/parallelism.md b/design/parallelism.md
index 83936b9..c992d0c 100644
--- a/design/parallelism.md
+++ b/design/parallelism.md
@@ -9,6 +9,16 @@ This document lays out the current and future technical requirements for the par

 Dissolve is an atomistic simulation code for the interrogation and analysis of neutron scattering data. Since the simulation boxes (collections of atoms) that it produces must be representative of the physical systems measured experimentally, particle numbers of 1 million and beyond must be treatable. This is not practical on a single-core machine, or even a modestly powerful workstation, and so suitable parallel technology must be implemented.

+## Reason for Parallel Operation
+
+At present Dissolve's core operation is to generate (ensembles of) atomic coordinates for a given system that are consistent with measured experimental data. It achieves this by defining a reference description of the basic (approximate) interactions occurring between atoms, and then iteratively modifying this potential while evolving the system using standard techniques such as Monte Carlo or Molecular Dynamics, or a combination of both. Once a suitable 'fit' to the experimental data has been achieved, further (long) simulations are performed in order to calculate structural properties of interest to the end user. The basic workflow is thus:
+
+REF -> ( SIM -> REFINE ) -> SIM
+
+The speed at which the program converges on the best solution is heavily dependent on the types and numbers of molecules present in the system, the strength of the interactions between atoms, the complexity of the system, and the completeness of the experimental data driving the refinement process. As such, the number of refinement iterations required in the main loop cannot easily be predicted, although thousands of iterations for small molecular liquids, up to tens of thousands for self-assembled systems such as micelles, are common. For multi-million atom systems with higher complexity the number of required refinement iterations will likely increase further.
+
+In the simulation itself, performed at least once on every refinement iteration, simple evolution mechanisms such as Molecular Dynamics scale as O(N(N+1)/2) (per timestep), which demands considerable computational effort for millions of atoms and beyond, and which might be run for 50-100 steps or more on each pass. However, these may be parallelised with relatively little effort, since the change to every particle in the system is computed in a single calculation that does not itself alter the state. Monte Carlo offers a greater challenge to efficient parallelisation: each particle or molecule is moved individually, and the resulting energy change is used to decide whether the move is accepted or not. Since moving a particle or molecule affects the energy of the surrounding particles and molecules (within the cutoff distance), care must be taken to ensure that any parallel scheme trials only simultaneous moves that are independent of each other.
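+
+As a purely illustrative sketch of one way to satisfy this independence requirement, the following standalone example uses a simple parity ('checkerboard') partition into cells no smaller than the cutoff and enumerates the sets of cells whose contents could be moved concurrently. The cell counts are hypothetical, and this shows the underlying idea only, not Dissolve's actual distribution scheme.
+
+```cpp
+// Sketch only: group cells so that trial moves within one parity class are
+// guaranteed to be independent (assuming trial positions stay in their cell).
+#include <cstdio>
+#include <vector>
+
+struct Cell { int x, y, z; };
+
+int main()
+{
+    // Hypothetical box divided into 6x6x6 cells, each at least one cutoff wide.
+    // An even cell count per side keeps the parity pattern consistent across
+    // the periodic boundary.
+    const int nCellsPerSide = 6;
+    std::vector<Cell> cells;
+    for (int x = 0; x < nCellsPerSide; ++x)
+        for (int y = 0; y < nCellsPerSide; ++y)
+            for (int z = 0; z < nCellsPerSide; ++z)
+                cells.push_back({x, y, z});
+
+    // Two distinct cells in the same parity class differ by at least two cell
+    // indices in some direction, so their contents are separated by at least
+    // one full cell width (>= cutoff) and moves in them cannot affect each other.
+    for (int parity = 0; parity < 8; ++parity)
+    {
+        const int px = parity & 1, py = (parity >> 1) & 1, pz = (parity >> 2) & 1;
+        int nIndependent = 0;
+        for (const auto &cell : cells)
+            if (cell.x % 2 == px && cell.y % 2 == py && cell.z % 2 == pz)
+                ++nIndependent;   // moves in these cells could be trialled concurrently
+        std::printf("Parity class %d: %d mutually independent cells\n", parity, nIndependent);
+    }
+    return 0;
+}
+```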
+

 ### Current Implementation

 At present (v0.7.0) Dissolve utilises the widely available MPI standard (via OpenMPI) to achieve parallelism across multiple CPU cores over multiple physical processing enclosures. Each slave process maintains its own copy of all simulation data (a replicated data strategy), leading to high memory requirements.
@@ -18,7 +28,9 @@ Dissolve's original design was to allow partitioning of processors into sub-grou
 1. Fine-grained control over partitioning of workload in order to achieve balance between processing time and communication overhead.
 2. Forking of the main loop to allow simultaneous calculations targeting different atomic configurations or modules.

-1) is particularly important in the implementation of parallel Monte Carlo techniques, where trial moves must be distributed throughout the simulation box in such a way that they are independent of each other. This is achievable, but heavily system dependent (size, molecule type, etc.) and requires adaptive partitioning of the available processes. As a consequence of 2), each main loop iteration requires a broadcast of all simulation data between processes to ensure that each process has up-to-date information, forcing the requirement for T::broadcast() and T::equality() (enforced by inheriting from GenericItemBase) for each class object that needs to be distributed.
+As a consequence of 2), each main loop iteration requires a broadcast of all simulation data between processes to ensure that each process has up-to-date information, forcing the requirement for T::broadcast() and T::equality() (enforced by inheriting from GenericItemBase) for each class object that needs to be distributed.
+
+Energy and force calculations are relatively straightforward to parallelise, simply involving striding through elements of arrays according to the number of available processors (or groups of processors), followed by a sum of the final value(s) over the relevant MPI communicator. Monte Carlo techniques are parallelised through the use of a MoleculeDistributor class, which partitions the system contents in such a way as to allow independent particle/molecule moves to be trialled in parallel, based on the simulation cutoff distance used for the energy calculation. Point 1) above is important here, as the number of regions that the system can be partitioned into is not directly predictable, and so adaptive partitioning of the available processes is required.

 ## Requirements

@@ -38,4 +50,5 @@ Dissolve's original design was to allow partitioning of processors into sub-grou
 - Parallelisation of the Linux non-GUI version is essential.
 - Parallelisation of the OSX non-GUI version is desirable.
 - Parallelisation of the Windows non-GUI version is beneficial, but not essential.
-- Parallelisation of the GUI version is desirable, but not essential.
+- Parallelisation of the GUI version is not essential.
+- Any parallel scheme so chosen should be flexible enough to permit the adoption and parallelisation of future algorithms into the code.

From 2a73728ce8021cebb949dac1dbc5904247f94a1c Mon Sep 17 00:00:00 2001
From: Tristan Youngs
Date: Sat, 25 Jul 2020 09:34:22 +0100
Subject: [PATCH 3/3] Update gitignore.

---
 .gitignore | 1 +
 1 file changed, 1 insertion(+)

diff --git a/.gitignore b/.gitignore
index a40d31a..522ab50 100644
--- a/.gitignore
+++ b/.gitignore
@@ -1,2 +1,3 @@
 *.swp
 *.backup
+*.kdev4
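
As an illustration of the strided energy parallelisation described in the second patch above (striding over array elements by processor count, followed by a sum over the relevant MPI communicator), the following minimal sketch divides a pairwise energy sum over MPI processes and reduces the partial results. The coordinates, the Lennard-Jones parameters and the absence of a cutoff are hypothetical simplifications; this is not Dissolve's actual energy code.

```cpp
// Sketch only: strided parallelisation of an O(N(N+1)/2) pair-energy sum,
// followed by a reduction over the communicator. Parameters are hypothetical.
#include <mpi.h>
#include <cstddef>
#include <cstdio>
#include <vector>

struct Vec3 { double x, y, z; };

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank = 0, nProcs = 1;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nProcs);

    // Hypothetical replicated coordinates: every process holds the same atoms,
    // mirroring the current replicated-data strategy.
    std::vector<Vec3> r;
    for (int i = 0; i < 1000; ++i)
        r.push_back({(i % 10) * 1.5, ((i / 10) % 10) * 1.5, (i / 100) * 1.5});

    const double epsilon = 0.25, sigma = 1.0;   // hypothetical LJ parameters
    double localEnergy = 0.0;

    // Stride over the outer loop index: process 'rank' handles i = rank,
    // rank + nProcs, rank + 2*nProcs, ... so the pair sum is divided roughly
    // evenly between the available processes.
    for (std::size_t i = rank; i < r.size(); i += nProcs)
        for (std::size_t j = i + 1; j < r.size(); ++j)
        {
            const double dx = r[i].x - r[j].x, dy = r[i].y - r[j].y, dz = r[i].z - r[j].z;
            const double sr2 = sigma * sigma / (dx * dx + dy * dy + dz * dz);
            const double sr6 = sr2 * sr2 * sr2;
            localEnergy += 4.0 * epsilon * (sr6 * sr6 - sr6);
        }

    // Sum the partial energies over the communicator so that every process
    // ends up with the total.
    double totalEnergy = 0.0;
    MPI_Allreduce(&localEnergy, &totalEnergy, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);

    if (rank == 0)
        std::printf("Total pair energy = %f\n", totalEnergy);

    MPI_Finalize();
    return 0;
}
```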