feat: Build cached anchor/alias node list for faster alias resolving #612

PolyMeilex · 2025-03-28T03:10:25Z

Currently, resolving an alias requires visiting the whole document tree until the alias node is found, so the further down the file an alias appears, the slower its resolution.
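For illustration, here is a rough sketch of that per-alias walk. It assumes a hypothetical minimal `YNode` type rather than the yaml library's `Node` classes, and `resolveByWalk` is an illustrative name, not library API. The logic is to walk from the root in document order, remember the last node whose anchor matches the alias name, and stop once the alias itself is reached.

```typescript
// Hypothetical minimal node model for illustration; not the yaml library's classes.
interface YNode {
  kind: 'scalar' | 'map' | 'seq' | 'alias';
  anchor?: string; // anchor name on a scalar/map/seq, e.g. &foo
  source?: string; // on alias nodes: the referenced anchor name, e.g. *foo
  value?: unknown; // scalar payload
  children?: YNode[]; // child nodes in document order
}

// Resolve one alias by walking the whole tree from the root in document order,
// remembering the last node whose anchor matches, until the alias itself is
// reached. Every call starts over at the root.
function resolveByWalk(root: YNode, alias: YNode): YNode | undefined {
  const name = alias.source;
  if (name === undefined) return undefined;
  let found: YNode | undefined;
  let done = false;
  const walk = (node: YNode): void => {
    if (done) return;
    if (node === alias) {
      done = true;
      return;
    }
    if (node.anchor === name) found = node;
    for (const child of node.children ?? []) walk(child);
  };
  walk(root);
  return found;
}
```

With A aliases in a document of N nodes, this sketch does on the order of A·N node visits in the worst case, which matches the observation that aliases further down the file cost more.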

This is an attempt to speed this process up significantly by:

- Putting all nodes in a list and iterating over it, which is significantly faster than descending the tree via the visitor every time.
- Filtering that list: if we are putting nodes into a list anyway, we might as well keep only the ones needed for alias resolution, so only aliases and anchored nodes are cached.
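A minimal sketch of the two steps above, again using a hypothetical `YNode` type (not the yaml library's `Node` classes). `buildAliasCache` and `resolveFromCache` are illustrative names only; in this PR the list ends up being built inside the Alias `resolve()` method and reused across aliases, but the actual code is not reproduced here.

```typescript
// Hypothetical minimal node model for illustration; not the yaml library's classes.
interface YNode {
  kind: 'scalar' | 'map' | 'seq' | 'alias';
  anchor?: string; // anchor name on a scalar/map/seq
  source?: string; // on alias nodes: the referenced anchor name
  value?: unknown;
  children?: YNode[]; // child nodes in document order
}

// Flatten the tree once, in document (pre-order) order, keeping only the
// nodes that matter for alias resolution: aliases and anchored nodes.
function buildAliasCache(root: YNode): YNode[] {
  const out: YNode[] = [];
  const stack: YNode[] = [root];
  while (stack.length > 0) {
    const node = stack.pop()!;
    if (node.kind === 'alias' || node.anchor !== undefined) out.push(node);
    const kids = node.children ?? [];
    // Push in reverse so that children are popped left to right.
    for (let i = kids.length - 1; i >= 0; i--) stack.push(kids[i]);
  }
  return out;
}

// Resolve an alias with a linear scan of the cached list: the result is the
// last anchored node with the same name that precedes the alias.
function resolveFromCache(cache: YNode[], alias: YNode): YNode | undefined {
  const name = alias.source;
  if (name === undefined) return undefined;
  let found: YNode | undefined;
  for (const node of cache) {
    if (node === alias) break;
    if (node.anchor === name) found = node;
  }
  return found;
}
```

Building the list is a single pass over the tree; each lookup then scans only the anchors and aliases that precede the alias instead of the whole tree. Because the scan keeps the last match before the alias, a re-used anchor name still resolves to its most recent definition.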

Not sure if this is the right approach, but it helped me reduce the load time of my dataset, which is not very alias-heavy, from 10s to 4s.

eemeli

I can see how this might be helpful. I'd much prefer for most of the work to be done in the Alias resolve() method, though. See inline for further comments.

Would you be able to share some YAML file with which the performance benefits of the change become measurable?

src/nodes/Alias.ts

src/nodes/toJS.ts

PolyMeilex · 2025-03-29T16:43:54Z

> Would you be able to share some YAML file with which the performance benefits of the change become measurable?

Had to anonymize it, so it looks kinda goofy, but here is one of the worst offenders in my data set: https://pastebin.com/nPB4hLyT

(usually my files are a lot smaller than this, but with each loaded file it adds up quickly)

eemeli

Getting there, I think. And thank you for the sample data, I'll need to find a bit of time to test how much this improves the performance on it.

src/nodes/Alias.ts

src/nodes/toJS.ts

PolyMeilex · 2025-03-29T19:28:13Z

> I'll need to find a bit of time to test how much this improves the performance on it.

Not a proper scientific benchmark, but with the example file this is the difference on my setup:

before

parse: 141.241ms
toJs: 1.684s

after

parse: 145.326ms
toJs: 26.266ms

Also just as a fun fact, caching all nodes without any filtering still gives a significant improvement as we skip the visitor's overhead:

parse: 143.696ms
toJs: 119.525ms
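The wall-clock numbers above can't be reproduced from here, but a rough, hypothetical cost model helps explain them: count how many nodes each strategy inspects before reaching each alias. The synthetic document shape in `makeDoc` is an assumption for illustration, not the pastebin sample. A fresh visitor walk and a scan of an unfiltered list inspect exactly the same nodes in the same order, so the unfiltered gain presumably comes from avoiding per-node visitor overhead, while filtering also shrinks the number of nodes inspected.

```typescript
// Hypothetical minimal node model; not the yaml library's classes.
interface YNode {
  kind: 'scalar' | 'map' | 'seq' | 'alias';
  anchor?: string;
  source?: string;
  children?: YNode[];
}

// Synthetic document (an assumption for illustration): a sequence of n maps,
// each anchored as &a<i> with `width` scalar children, each followed by *a<i>.
function makeDoc(n: number, width: number): { root: YNode; aliases: YNode[] } {
  const items: YNode[] = [];
  const aliases: YNode[] = [];
  for (let i = 0; i < n; i++) {
    const scalars: YNode[] = [];
    for (let j = 0; j < width; j++) scalars.push({ kind: 'scalar' });
    items.push({ kind: 'map', anchor: `a${i}`, children: scalars });
    const alias: YNode = { kind: 'alias', source: `a${i}` };
    items.push(alias);
    aliases.push(alias);
  }
  return { root: { kind: 'seq', children: items }, aliases };
}

// All nodes in document (pre-order) order: the order in which a visitor walk
// or an unfiltered cached list would inspect them.
function preorder(root: YNode): YNode[] {
  const out: YNode[] = [];
  const visit = (node: YNode): void => {
    out.push(node);
    for (const child of node.children ?? []) visit(child);
  };
  visit(root);
  return out;
}

// Total nodes inspected when each alias is resolved by scanning `order`
// from the start up to and including the alias itself.
function inspected(order: YNode[], aliases: YNode[]): number {
  let total = 0;
  for (const alias of aliases) total += order.indexOf(alias) + 1;
  return total;
}

const { root, aliases } = makeDoc(100, 10);
const all = preorder(root);
const filtered = all.filter((node) => node.kind === 'alias' || node.anchor !== undefined);
console.log(inspected(all, aliases)); // 60700: visitor walk or unfiltered list
console.log(inspected(filtered, aliases)); // 10100: anchors and aliases only
```

In this model the count still grows quadratically with the number of aliases in both cases; filtering only reduces how many nodes each scan has to step over.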

PolyMeilex · 2025-03-30T01:50:57Z

We could also build an anchors map where the key would be the anchor's name and the value would be the node with all its child aliases already resolved (or maybe already JSified). If a duplicate anchor name were found, it would simply replace the previous entry. I'm assuming there is some reason, which I'm not aware of, why it isn't done that way already.

eemeli

Thank you @PolyMeilex for your contribution!

With your test file, this does provide something like a 12x improvement when I test it locally, though processing it does require adjusting maxAliasCount to avoid the input being detected as a resource exhaustion attack.

> We could also build an anchors map where a key would be the anchor's name [...] but I'm assuming that there is some reason why that's not how it's done already, that I'm not aware of.

That's effectively ctx.anchors, but it's not as simple as one might like. The YAML spec allows for fun structures like this:

    &foo
    key:
    - *foo
    - &foo 42
    - *foo

Here, the resolution of the top-level map depends on resolving all of its contents, and that includes resolving the first *foo as a circular reference to the map itself, as well as resolving the second *foo as the scalar 42, thanks to the anchor re-use.

Getting that to work right was a bit tricky.
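To see why position matters, here is a hypothetical sketch of the single-pass anchors map suggested earlier, converting to plain JS in document order. It is not the yaml library's toJS or `ctx.anchors`; the `YNode` type and `toJS` function here are illustrative assumptions, and it ignores things like alias-count limits and non-string keys. A map filled in completely before any alias is looked up would end with `foo → 42` and resolve the first `*foo` to 42 as well, which is wrong; updating the map as anchors are encountered avoids that.

```typescript
// Hypothetical minimal node model; not the yaml library's classes.
// Map children alternate key, value, key, value, ...
interface YNode {
  kind: 'scalar' | 'map' | 'seq' | 'alias';
  anchor?: string;
  source?: string;
  value?: unknown;
  children?: YNode[];
}

// Single pass in document order. `anchors` maps each anchor name to the JS
// value of its most recent definition so far, so a re-used name only affects
// aliases that come after it.
function toJS(node: YNode, anchors: Map<string, unknown>): unknown {
  switch (node.kind) {
    case 'alias': {
      const name = node.source ?? '';
      if (!anchors.has(name)) throw new Error(`Unresolved alias *${name}`);
      return anchors.get(name);
    }
    case 'scalar':
      if (node.anchor !== undefined) anchors.set(node.anchor, node.value);
      return node.value;
    case 'seq': {
      const arr: unknown[] = [];
      // Register the container before converting its contents, so an alias
      // inside it can point back at it (a circular reference).
      if (node.anchor !== undefined) anchors.set(node.anchor, arr);
      for (const child of node.children ?? []) arr.push(toJS(child, anchors));
      return arr;
    }
    case 'map': {
      const obj: Record<string, unknown> = {};
      if (node.anchor !== undefined) anchors.set(node.anchor, obj);
      const kids = node.children ?? [];
      for (let i = 0; i + 1 < kids.length; i += 2) {
        const key = String(toJS(kids[i], anchors));
        obj[key] = toJS(kids[i + 1], anchors);
      }
      return obj;
    }
  }
}

// The example from the comment above, built by hand.
const doc: YNode = {
  kind: 'map',
  anchor: 'foo',
  children: [
    { kind: 'scalar', value: 'key' },
    {
      kind: 'seq',
      children: [
        { kind: 'alias', source: 'foo' },
        { kind: 'scalar', anchor: 'foo', value: 42 },
        { kind: 'alias', source: 'foo' },
      ],
    },
  ],
};
const result = toJS(doc, new Map()) as { key: unknown[] };
console.log(result.key[0] === result, result.key[1], result.key[2]); // true 42 42
```

Registering the map before its contents is what turns the first `*foo` into a circular reference; the later `&foo 42` then shadows it for the second `*foo` only.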

PolyMeilex · 2025-03-30T13:28:19Z

> That's effectively ctx.anchors, but it's not as simple as one might like. The YAML spec allows for fun structures like this:
>
>     &foo
>     key:
>     - *foo
>     - &foo 42
>     - *foo
>
> Here, the resolution of the top-level map depends on resolving all of its contents, and that includes resolving the first *foo as a circular reference to the map itself, as well as resolving the second *foo as the scalar 42, thanks to the anchor re-use.
>
> Getting that to work right was a bit tricky.

Oh, yeah, that's fun. I would rather pay the price of wasteful iteration than try to implement this 😅

Thank you for being open to this! I will try to find more low-hanging fruit, so hopefully we can keep generators while still making the package usable for my use case 🚀

feat: Build cached anchor/alias node list for faster alias resolving

13a141c

eemeli requested changes Mar 29, 2025


src/nodes/Alias.ts (outdated)

src/nodes/toJS.ts (outdated)

src/nodes/toJS.ts (outdated)

Build cache in alias resolve method

2854cb7

eemeli requested changes Mar 29, 2025


src/nodes/Alias.ts (outdated)

src/nodes/toJS.ts (outdated)

Simplify resolve implementation

3c5e2a3

Fix typos

a284b8e

eemeli approved these changes Mar 30, 2025


eemeli merged commit 55c5ef4 into eemeli:main Mar 30, 2025
21 checks passed

This was referenced Mar 30, 2025

Performance issue on relatively small file #537

Closed

Missing error on parsed document for unresolved alias #497

Closed
