Skip to content

SALTO-315, SALTO-316: Move parser to worker thread #423

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 2 commits into from
Closed

SALTO-315, SALTO-316: Move parser to worker thread #423

wants to merge 2 commits into from

Conversation

ori-moisis
Copy link
Contributor

  • Moved calls to WASM parser to a different thread to avoid blocking the main thread
  • Changed input for parse from buffer to string because in fact we always needed string

Trying to use worker pool libraries yielded too many strange errors on thread teardown so this implementation is just a single worker thread (i.e - no real performance gain).
This can be quite easily changed to run on a worker pool for better performance but hopefully this won't be very valuable once @roironn is done with the new parser.

@ori-moisis ori-moisis requested a review from a team November 17, 2019 16:26
@royra
Copy link

royra commented Nov 17, 2019

Codecov Report

Merging #423 into master will decrease coverage by 0.04%.
The diff coverage is 94.56%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #423      +/-   ##
==========================================
- Coverage   94.08%   94.03%   -0.05%     
==========================================
  Files         111      112       +1     
  Lines        4899     4946      +47     
  Branches      873      879       +6     
==========================================
+ Hits         4609     4651      +42     
- Misses        290      295       +5
Flag Coverage Δ
#unit_test 94.03% <94.56%> (-0.05%) ⬇️
Impacted Files Coverage Δ
packages/salto/src/workspace/workspace.ts 94.35% <ø> (ø) ⬆️
packages/salto/src/parser/internal/types.ts 100% <ø> (ø) ⬆️
packages/salto/src/parser/expressions.ts 100% <ø> (ø) ⬆️
packages/cli/package_native.js 0% <ø> (ø) ⬆️
packages/salto/src/workspace/config.ts 95.38% <100%> (ø) ⬆️
packages/salto/src/parser/internal/worker.ts 100% <100%> (ø)
packages/cli/src/argparser.ts 97.61% <100%> (+0.02%) ⬆️
packages/cli/src/cli.ts 97.43% <100%> (+0.13%) ⬆️
packages/salto/src/parser/parse.ts 98.63% <100%> (ø) ⬆️
packages/salto/src/parser/dump.ts 97.26% <100%> (ø) ⬆️
... and 3 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update d4f0944...2fd498f. Read the comment docs.

- Moved calls to WASM parser to a different thread to avoid blocking the main thread
- Changed input for parse from buffer to string because in fact we always needed string
const loadWasmModule = async (): Promise<WebAssembly.Module> => {
// Relative path from source location
const modulePath = path.join(__dirname, '..', '..', '..', 'hcl.wasm')
const data = await readFile(modulePath)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Breaks packaging...

This here should specifically be fs.readFile or fs.readFileSync since we're reading a bundled "resource", it should be read from the bundle by the fs methods patched by nexe. file.ts uses the fs.promises api which is not patched.

Probably best to add the above statement as a comment there :)

Copy link
Contributor Author

@ori-moisis ori-moisis Nov 17, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought the readFileSync was a mistake there. will revert

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

promisify on fs.readFile seems to work

// Generate call ID
const currMaxID = _.max([0, ...this.inFlightIDs]) as number
const callID = currMaxID + 1
this.inFlightIDs.add(callID)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if this simplifies - instead of having both inflightIds and activeCalls store a single map of callId to { worker id, return func } ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we don't always need the active calls map (single threaded is still an option) but we always need the call IDs.

return context.return as HclReturn
} finally {
// cleanup call context from global scope
delete hclParserCall[callId]
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see comment above regarding dynamic delete.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is used for communication with go. not sure we can / want to change how that works as part of this change

/* istanbul ignore next */
if (parentPort !== null) {
const parent = parentPort
const handleCall = async ([func, args]: ['call' | 'stop', unknown[]]): Promise<void> => {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can type it:

['stop'] | ['call', wasmModule: WebAssembly.Module, context: HclCallContext]

singleThreaded: boolean
}
export const initializeCore = async (
args: InitializeArgs = { singleThreaded: true },
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIUC the default value here causes tests to run in single-threaded mode?

If so, seems a bit implicit and buried, maybe add jest global setup?

Copy link
Contributor Author

@ori-moisis ori-moisis Nov 17, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The default value here causes the parser to run the single threaded mode by default, that applies to most tests as well as the vscode extension and generally anywhere else that might want to use the parser other than the CLI.

Unfortunately jest global setup doesn't run in every jest process, only in the main one, so if we change the default (which I think we should not, at least until we see what it does to the extension) and try to use it, we still end up with threaded parsers in the jest workers and jest times out on a lot of tests.
There is a different method of using setup files that inject an additional setup for test every file in a project, but that would have to be implemented separately for every project in the repository as far as I can understand and IMO it won't really make it any less hidden because you'd have to know about that feature in jest and look for it specifically

const go = new Go()
go.env = GO_ENV
// eslint-disable-next-line no-undef
const inst = await WebAssembly.instantiate(wasmModule, go.importObject)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can't we reuse the instance per worker?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Edit: Looks like this is because the go main can only handle a single call. If so, is instantiating the module per call time consuming? I think it can be solved by applying the solution suggested here that we talked about, which will remove the callId mechanism, removing one stack of logic...

Copy link
Contributor Author

@ori-moisis ori-moisis Nov 17, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

once the module is loaded, instantiating doesn't take very long (comparing to the amount of time it takes to run the go code), around 25-40ms IIRC.
edit: actually the number I had in mind was with the actual parsing for a small file, so the instanciate probably takes less...

}

// The following code only runs in the worker thread so it is never reported for coverage
/* istanbul ignore next */
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the complete lack of coverage here might be slightly dangerous. is there a way to run this code in tests, in "mock worker" mode?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it is covered, just not reported because, as the comment says, jest doesn't collect coverage data from the worker thread

@@ -13,7 +13,7 @@ func convertAllDiagnostics(
parserErrs hcl.Diagnostics,
traversalErrs hcl.Diagnostics,
) []interface{} {
result := make([]interface{}, 0, len(parserErrs) + len(traversalErrs))
result := make([]interface{}, 0, len(parserErrs)+len(traversalErrs))
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this gofmt?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes

@ori-moisis ori-moisis closed this Nov 19, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants