H2.0 runner memory optimization spike #5256


Closed
1 task
jbrown-xentity opened this issue May 19, 2025 · 3 comments
Assignees
Labels
H2.0/Harvest-Runner Harvest Source Processing for Harvesting 2.0

Comments

@jbrown-xentity
Contributor

Purpose

We want to optimize memory usage on cloud.gov, but we're not sure what the current process will require from the system.

Given the question above, testing is needed to establish the facts that will inform next steps.

One day of effort has been allocated; once complete, findings will be demonstrated and specific next actions will be decided.

Acceptance Criteria

[ACs should be clearly demo-able/verifiable whenever possible. Try specifying them using BDD.]

  • GIVEN our largest source (IOOS) is harvestable
    WHEN 1 day expires
    THEN the amount of required memory to successfully harvest the source is known
    AND a memory increase recommendation is made
    AND any optimization/fixes are proposed (if necessary)

Background

See https://datagov-harvest-admin-dev.app.cloud.gov/harvest_source/554d15db-6080-4441-b4b2-d045451d6967, which is currently crashing regularly.
If memory usage turns out to be high, we may want to investigate what a "typical" source's memory requirements are and consider an S/M/L approach.
May relate to, or be blocked by, #5254.

Sketch

Start at 2 G and increase as failures occur. Report success once the import stage starts. Review the code for possible optimizations.
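The probe loop described above could be sketched as follows. This is a hedged sketch, not the team's actual tooling: the app name "harvest-runner" is a placeholder, and the `cf scale` invocation assumes the standard Cloud Foundry CLI used on cloud.gov.

```python
# Sketch of the probing approach: start the harvester at 2 G and step the
# memory limit up each time the job is OOM-killed.
# The app name "harvest-runner" is a placeholder, not a confirmed name.
import subprocess


def memory_ladder(start_gb: int = 2, max_gb: int = 8):
    """Yield memory limits to try, in GB, one step at a time."""
    gb = start_gb
    while gb <= max_gb:
        yield gb
        gb += 1


def scale_app(app: str, gb: int) -> None:
    # On cloud.gov (Cloud Foundry), app memory is adjusted with `cf scale`.
    subprocess.run(["cf", "scale", app, "-m", f"{gb}G", "-f"], check=True)


if __name__ == "__main__":
    for gb in memory_ladder():
        print(f"next attempt: {gb}G")
        # then: scale_app("harvest-runner", gb) and rerun the harvest job,
        # stopping once the job survives to the import stage
```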

@jbrown-xentity jbrown-xentity self-assigned this May 19, 2025
@jbrown-xentity jbrown-xentity added the H2.0/Harvest-Runner Harvest Source Processing for Harvesting 2.0 label May 19, 2025
@jbrown-xentity jbrown-xentity moved this to 🏗 In Progress [8] in data.gov team board May 19, 2025
@jbrown-xentity
Contributor Author

So the harvester never made it past the compare stage. It ran for 5.1 hours, and the memory usage climbed slowly and consistently. You can see that usage here.
Some unfortunate things:

  • There was no logging during the extraction process, so I had no way of knowing how far along we were or how much further we had to go.
  • We did get a log line that extraction completed, but the job ran out of memory at the hashing step. It was already right at the 3 G limit, so it's unclear how much more headroom it will need.
  • I downloaded a metadata file from the WAF. It was 129 KB (0.129 MB). 35K of those is about 4,500 MB, or roughly 4.5 G. So just holding the downloads in memory is roughly 4.5 G.
  • We have 7 sources with > 10K datasets. Of those, 2 are DCAT-US and 5 are WAF.
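The back-of-envelope estimate above can be checked quickly. The figures come from the comment (one sampled 129 KB file, roughly 35K records in the source); decimal units are assumed.

```python
# Rough memory estimate for holding every WAF record in memory at once.
record_kb = 129          # size of one sampled metadata file, in KB
record_count = 35_000    # approximate number of records in the source

total_mb = record_kb * record_count / 1000   # KB -> MB (decimal units)
total_gb = total_mb / 1000                   # MB -> GB

print(f"{total_mb:.0f} MB ~= {total_gb:.1f} G")  # 4515 MB ~= 4.5 G
```

So the raw downloads alone roughly fill a 4 G container before any parsing, hashing, or comparison overhead is counted.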

To get the largest sources working, we will probably need at least 4 G. However, we probably don't need that for most jobs. I'd like to have a working session next week to consider implementing t-shirt sizing of harvest sources so we can size jobs accordingly. The logic will be a bit more complex, but not much.
A 5 G test is ongoing; I expect it to take at least 5 hours to extract from the source, and then possibly longer to sync with CKAN. I will review the statistics in the morning.
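A minimal sketch of the t-shirt-sizing idea, assuming jobs are sized by the source's dataset count. The thresholds and memory values below are illustrative placeholders, not anything the team has decided.

```python
# Hypothetical S/M/L sizing: pick a memory limit for a harvest job based on
# how many datasets the source held last time. Thresholds are placeholders.
def job_memory_gb(dataset_count: int) -> int:
    if dataset_count < 1_000:
        return 2   # S: the common case
    if dataset_count < 10_000:
        return 4   # M
    return 6       # L: e.g. the 7 sources with > 10K datasets

print(job_memory_gb(500), job_memory_gb(5_000), job_memory_gb(35_000))  # 2 4 6
```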

@FuhuXia
Member

FuhuXia commented May 22, 2025

shocking

[attached image]

@jbrown-xentity
Contributor Author

Unfortunately this job has still not finished. Worse, it hasn't moved on: its memory usage hasn't changed significantly since 4 p.m. yesterday. New Relic logs show that the task never made it past the external records prep. As of this writing, the task is still running and holding its memory.

I've made a ticket (#5261) to follow up on this spike, as I don't believe the current solution is tenable for a large source like IOOS. We will discuss in office hours whether the proposed sketch is appropriate, and what other optimizations are possible.

@jbrown-xentity jbrown-xentity moved this from 🏗 In Progress [8] to 👀 Needs Review [2] in data.gov team board May 23, 2025
@neilmb neilmb moved this from 👀 Needs Review [2] to ✔ Done in data.gov team board Jun 2, 2025
@neilmb neilmb closed this as completed by moving to ✔ Done in data.gov team board Jun 2, 2025