[Bug]: v1 AsyncLLM model hangs if provided with 2 successive batches of items to process #17385
Comments
You called sleep async after the first set of batches. So when you input the second set of batches, the engine is still in sleep mode.
Ha, wait, I removed this in my test; let me make sure and edit my example (I had tried manually starting the core engine / putting it to sleep).
I confirm it's also not working when removing the "sleep async" line. I edited the issue to reflect the two cases I tried.
Shouldn't you also await the sleeping async check?
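(For context, a minimal sketch of what awaiting that check could look like. It assumes the engine exposes awaitable sleep() / is_sleeping() / wake_up() methods as discussed in this thread, and `async_batch` stands in for whatever batch-submission helper the user's code uses; none of these names or signatures are verified against a specific vLLM version.)

```python
# Hedged sketch only: assumes AsyncLLM exposes awaitable sleep()/is_sleeping()/
# wake_up() methods as discussed in this thread; names and signatures are
# not verified against a specific vLLM version.
async def process_batch_with_sleep(model, prompts):
    # Make sure the engine is awake before submitting new requests.
    if await model.is_sleeping():
        await model.wake_up()

    # async_batch is a placeholder for the user's batch-submission helper.
    results = await async_batch(model=model, prompts=prompts)

    # Put the engine back to sleep once the batch is done, awaiting the call
    # instead of firing it without awaiting.
    await model.sleep(level=1)
    return results
```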
Hm, possibly? However, the main issue is not coming from this, as the MWE is failing either way.
Did you try running the MWE?
I am outside right now so haven't tried.
No problem! When you try it, tell me if you can't reproduce, or if you need further logs :)
I don't have remaining bandwidth today, sorry. @robertgshaw2-redhat @njhill can you help repro/debug?
@clefourrier you shouldn't reuse this across multiple calls to `asyncio.run()`. See: https://docs.python.org/3.9/library/asyncio-task.html#asyncio.run

You should structure your code like:

```python
async def main():
    model = AsyncLLM.from_engine_args(AsyncEngineArgs(**model_args))
    # ...
    await async_batch(model=model, ...)
    # more batch calls ...


if __name__ == "__main__":
    asyncio.run(main())
```
So you don't expect the final interface of the AsyncLLM to be similar to the simple LLM or other models which can be called async, for example all the API models with litellm/openai/tgi?
@clefourrier I don't really understand your question. How is it that you're trying to use it? You can make as many async / concurrent batch calls to it as you like; it just needs to be in the context of a single event loop. This is more of a "how Python asyncio works" kind of thing than anything to do with the AsyncLLM interface.
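For illustration, a minimal sketch of that structure, extending the main()/async_batch naming from the snippet above; model_args, generation_prompts, and logprob_prompts are placeholders, and the v1 AsyncLLM import path may differ between vLLM versions:

```python
# Hedged sketch: several batch calls sharing a single event loop.
# model_args, generation_prompts, logprob_prompts and async_batch are
# placeholders; the AsyncLLM import path may differ between vLLM versions.
import asyncio

from vllm import AsyncEngineArgs
from vllm.v1.engine.async_llm import AsyncLLM


async def main():
    model = AsyncLLM.from_engine_args(AsyncEngineArgs(**model_args))

    # Consecutive batches, awaited one after another in the same loop...
    gen_results = await async_batch(model=model, prompts=generation_prompts)
    logprob_results = await async_batch(model=model, prompts=logprob_prompts)

    # ...or concurrently, if the batches are independent of each other.
    gen_results, logprob_results = await asyncio.gather(
        async_batch(model=model, prompts=generation_prompts),
        async_batch(model=model, prompts=logprob_prompts),
    )


if __name__ == "__main__":
    asyncio.run(main())  # a single event loop for the whole run
```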
It's highly possible I'm missing something. In evaluation, I usually want to run requests of a similar type together (for example, generations with similar sampling parameters, or requests looking at logprobs vs requests looking at generated text); each group runs in its own loop, and the groups do not happen at the same time and are not concurrent with one another. I assumed that since asyncio.run always creates and closes a new event loop, I could call it consecutively, but that might be an issue with my reasoning.
Ok, let me come back to you on this.
Can't you run them consecutively using the same event loop?
Not trivially given where I'm running this ^^
Ok, thanks to you both for your help. I ended up changing the pipeline code of our eval suite to support this!
Your current environment
Possibly relevant packages from pip list
Possibly relevant env vars
🐛 Describe the bug
Expected behavior: launch an AsyncLLM model, then use it to generate on X successive batches without issues (here, a batch is a group of inputs provided together in the asyncio.run call).
Actual behavior: the AsyncLLM model processes the first batch without issue, then hangs on the second batch.
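To make the pattern concrete, here is a hedged reproduction sketch (not the original MWE from this issue): the model name, the async_batch helper, and the v1 AsyncLLM import path are illustrative assumptions. The model is created once, and each batch is then processed inside its own asyncio.run() call, i.e. its own event loop.

```python
# Hypothetical reproduction sketch, NOT the original MWE from this issue.
# The model name, async_batch helper and import paths are assumptions.
import asyncio

from vllm import SamplingParams
from vllm.engine.arg_utils import AsyncEngineArgs
from vllm.v1.engine.async_llm import AsyncLLM


async def async_batch(model: AsyncLLM, prompts: list[str]) -> list[str]:
    """Submit one generate() request per prompt and collect the final texts."""

    async def one(prompt: str, request_id: str) -> str:
        final = None
        async for output in model.generate(prompt, SamplingParams(max_tokens=32), request_id):
            final = output
        return final.outputs[0].text

    return list(await asyncio.gather(*(one(p, f"req-{i}") for i, p in enumerate(prompts))))


model = AsyncLLM.from_engine_args(AsyncEngineArgs(model="facebook/opt-125m"))

# First batch: completes as expected.
print(asyncio.run(async_batch(model, ["Hello"] * 4)))

# Second batch, submitted through a new asyncio.run() call (a new event loop):
# this is where the hang is observed.
print(asyncio.run(async_batch(model, ["World"] * 4)))
```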
Notes: