Skip to content

fix support for litserve>0.2.4 #1994

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 34 commits into from
Apr 15, 2025
Merged

fix support for litserve>0.2.4 #1994

merged 34 commits into from
Apr 15, 2025

Conversation

ali-alshaar7
Copy link
Contributor

@ali-alshaar7 ali-alshaar7 commented Apr 3, 2025

LitGPT CI fails with latest LitServe (>=0.2.4), this PR fixes that.

The real issue was that LitGPT was always failing with LitServe but the tests were flaky and they passed. Post 0.2.4, LitServe added the mechanism to check if the inference worker is alive before starting the uvicorn worker and it made the LitGPT CI fail.

@ali-alshaar7 ali-alshaar7 changed the title upgrade lit serve upgrade litserve Apr 3, 2025
@t-vi
Copy link
Collaborator

t-vi commented Apr 3, 2025

Do we expect the serving to work on macos? that seems to be failing.

@ali-alshaar7
Copy link
Contributor Author

Do we expect the serving to work on macos? that seems to be failing.

Yeah, passes locally.

@ali-alshaar7 ali-alshaar7 force-pushed the alia/lsup branch 2 times, most recently from 0133554 to 7270c23 Compare April 4, 2025 02:15
Copy link
Contributor

@aniketmaurya aniketmaurya left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Love this @ali-alshaar7!! thank you so much 🚀

Also, can we set the version >=0.2.7? This way we will know if in case LitGPT stops working for a new version of LitServe again?

@Borda
Copy link
Member

Borda commented Apr 7, 2025

@aniketmaurya do you see any reason why it may hang/fail on Mac?

@aniketmaurya
Copy link
Contributor

reran with debugging enabled @Borda! probably some issue with LitGPT setting up the devices. Ali mentioned that it works locally. will debug this in depth later today.

@aniketmaurya
Copy link
Contributor

I get this error locally:

 File "/Users/aniket/Projects/github/litgpt/litgpt/config.py", line 149, in from_file
    return cls(**file_kwargs)
           ^^^^^^^^^^^^^^^^^^
TypeError: Config.__init__() got an unexpected keyword argument 'sliding_window_layer_placing'

investigating more cc @ali-alshaar7 @k223kim

@k223kim
Copy link
Contributor

k223kim commented Apr 7, 2025

@aniketmaurya Did you pull the main branch? That's coming from my gemma PR

@aniketmaurya aniketmaurya changed the title upgrade litserve fix support for litserve>0.2.4 Apr 7, 2025
@t-vi
Copy link
Collaborator

t-vi commented Apr 7, 2025

Also, can we set the version >=0.2.7? This way we will know if in case LitGPT stops working for a new version of LitServe again?

Note that the useful CI relation is opposite to >= dependencies: If we want to declare a >= 0.2.7 (or whatever version should be the minimum) dependency on LitServe, LitServe should ideally gain a test that updates to it don't break LitGPT.
If we don't have that, it seems more prudent to limit LitGPT compat to what we know we work with and then bump the dependency in a PR (that will check compat in the CI tests).

@aniketmaurya
Copy link
Contributor

right @t-vi, we should aim for a test to ensure this!

@Borda
Copy link
Member

Borda commented Apr 7, 2025

we should aim for a test to ensure this!

it did not help so let's revert it back and try to debug it...

@aniketmaurya
Copy link
Contributor

seems like the server process is not terminated properly on macos - ERROR: [Errno 48] Address already in use

@ali-alshaar7
Copy link
Contributor Author

ali-alshaar7 commented Apr 7, 2025

seems like the server process is not terminated properly on macos - ERROR: [Errno 48] Address already in use

Hey, thanks for looking into this. I did add a kill_process_tree which seemed to resolve that on Ubuntu and windows. In any case, it shouldn't fail the test, since if there's already a server at that port, the test can just ping that and pass right?

@aniketmaurya
Copy link
Contributor

aniketmaurya commented Apr 7, 2025

I think uvicorn runs in the main process and rest of the FastAPI logic lies in the child processes so if the child processes die then it won't be reachable.

Maybe try using this logic from LitServe tests for running and stopping the tests?

@ali-alshaar7
Copy link
Contributor Author

ali-alshaar7 commented Apr 7, 2025

I think uvicorn runs in the main process and rest of the FastAPI logic lies in the child processes so if the child processes die then it won't be reachable.

Yeah, so if we kill all the children, we should free the port and be able to spin up a new server and so we won't have this issue right? that's what kill_process_tree does.

@aniketmaurya
Copy link
Contributor

yup, you're right! theoretically that's how things should go.

@Borda
Copy link
Member

Borda commented Apr 7, 2025

@aniketmaurya would be nice if litServe has a function to properly terminate itself :)

@Borda Borda enabled auto-merge (squash) April 15, 2025 10:22
@Borda Borda merged commit 3d66f32 into main Apr 15, 2025
15 checks passed
@Borda Borda deleted the alia/lsup branch April 15, 2025 11:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants