Feature: Parallel iterations #63
Open
This PR adds parallel iterations, along with a couple of bug fixes and improvements. I consider this an important feature because running iterations concurrently significantly speeds up training. This is the implementation I tried, and it has worked for my use cases.
Primary changes:
- Use `concurrent.futures` in `controller.py` to spawn processes that perform the training iterations
- Add `run_iteration_sync` in a new file, `iteration.py`, which the workers run to allow concurrent execution
- `LLMEnsemble` and `ProgramDatabase` (these classes are not pickleable, so this is the best approach I could think of for using them in each of the worker processes)

Minor changes:
- `calculate_edit_distance` was crashing the database when I used it, and since libraries that implement this routine already exist, I switched to one of them (`levenshtein`)
- `_calculate_island_diversity`, since it's easier to read
- `_calculate_feature_coords`, since it's normalized for code length
- `allowed_population_overflow`, since otherwise the database was adding and removing a program every iteration once it reached the allowed program limit
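The primary change, running iterations in worker processes even though some objects cannot be pickled, can be sketched with `ProcessPoolExecutor`'s `initializer`/`initargs` hook: each worker builds its own instance of the non-pickleable object once, and only plain picklable values cross the process boundary. This is a minimal sketch under stated assumptions, not the PR's code. `NonPicklableResource`, `init_worker`, `is_picklable` and `run_parallel` are hypothetical, and while `run_iteration_sync` borrows the PR's name, its signature and body here are invented.

```python
import concurrent.futures
import pickle
import threading
from typing import Optional


class NonPicklableResource:
    """Hypothetical stand-in for a class like LLMEnsemble or ProgramDatabase.

    It owns a threading.Lock, which pickle refuses to serialize, so instances
    cannot be passed to worker processes as task arguments.
    """

    def __init__(self, seed: int) -> None:
        self.seed = seed
        self._lock = threading.Lock()

    def evaluate(self, iteration: int) -> int:
        # Placeholder "work" for one iteration; deterministic so it is easy to check.
        with self._lock:
            return self.seed * 1000 + iteration


def is_picklable(obj: object) -> bool:
    """Return True if obj survives pickle.dumps (needed to cross process boundaries)."""
    try:
        pickle.dumps(obj)
    except (TypeError, pickle.PicklingError, AttributeError):
        return False
    return True


# Per-process state: each worker builds its own instance in the initializer
# instead of receiving one from the parent process.
_worker_resource: Optional[NonPicklableResource] = None


def init_worker(seed: int) -> None:
    """Pool initializer: runs once in every worker process."""
    global _worker_resource
    _worker_resource = NonPicklableResource(seed)


def run_iteration_sync(iteration: int) -> tuple[int, int]:
    """Hypothetical worker entry point: only picklable ints go in and come out."""
    if _worker_resource is None:
        raise RuntimeError("init_worker() has not run in this process")
    return iteration, _worker_resource.evaluate(iteration)


def run_parallel(seed: int, start: int, n_iterations: int, max_workers: int = 4) -> dict[int, int]:
    """Fan iterations out to a process pool and collect results as they finish."""
    results: dict[int, int] = {}
    with concurrent.futures.ProcessPoolExecutor(
        max_workers=max_workers, initializer=init_worker, initargs=(seed,)
    ) as pool:
        futures = [pool.submit(run_iteration_sync, i) for i in range(start, start + n_iterations)]
        for future in concurrent.futures.as_completed(futures):
            iteration, score = future.result()
            results[iteration] = score
    return results
```

Call `run_parallel` from under an `if __name__ == "__main__":` guard so it also works with the `spawn` start method. For example, `run_parallel(seed=1, start=0, n_iterations=8)` returns a dict mapping iterations 0 to 7 to the values 1000 to 1007. The design trade-off is that per-worker state is built once per process rather than once per task, which avoids pickling at the cost of each worker holding its own copy.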
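The `allowed_population_overflow` fix is a form of hysteresis. Instead of evicting a program as soon as the population exceeds its limit (one add and one remove on every iteration), the population may grow past the limit by a fixed margin and is then trimmed back in a single pass. The sketch below assumes an eviction policy of dropping the lowest-scoring programs. That policy and the `Population` class, `trims` counter and `simulate` helper are all hypothetical; only the `allowed_population_overflow` name comes from the PR.

```python
from dataclasses import dataclass, field


@dataclass
class Population:
    """Hypothetical population with a soft cap (names are illustrative only)."""

    population_size: int  # target size the population is trimmed back to
    allowed_population_overflow: int  # extra programs tolerated before trimming
    scores: dict[str, float] = field(default_factory=dict)  # program id -> score
    trims: int = 0  # how many times a trim pass has run

    def add(self, program_id: str, score: float) -> None:
        self.scores[program_id] = score
        # Trim only once the soft cap is exceeded, not at every step past the target.
        if len(self.scores) > self.population_size + self.allowed_population_overflow:
            self._trim()

    def _trim(self) -> None:
        # Assumed eviction policy: drop the lowest-scoring programs down to the target size.
        ranked = sorted(self.scores, key=self.scores.__getitem__)
        for program_id in ranked[: len(self.scores) - self.population_size]:
            del self.scores[program_id]
        self.trims += 1


def simulate(overflow: int, n_programs: int = 20, size: int = 10) -> Population:
    """Add n_programs programs with increasing scores and return the final population."""
    population = Population(population_size=size, allowed_population_overflow=overflow)
    for i in range(n_programs):
        population.add(f"program-{i}", float(i))
    return population
```

With `simulate(0)`, the trim runs on every add after the tenth (ten times for 20 programs). With `simulate(5)`, it runs only once, when the sixteenth program arrives, and the population then grows from 10 to 14 without further evictions.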