Description
Different workloads may need different tuning to reach the best performance. Adding a performance tuning guide under "operate" could help users. To motivate this issue, here are a few scenarios that need different configurations to perform well:
High-load, short-lived invocations calling other invocations
With the default settings, a high-load workload in which invocations call other invocations can end up in a situation where the callers occupy the slots that the callees need to run, until the inactivity-timeout kicks in. In this situation, one should either reduce the inactivity-timeout to free slots faster or, if the service endpoints can handle more load, increase the concurrent-invocations-limit. As part of this, we should also document that the invoker has the concept of slots, which are occupied by in-flight invocations. More details on the described problem can be found in restatedev/restate#2758.
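To make the slot problem concrete for the guide, here is a minimal toy model in Python. It is not Restate code: `InvokerModel`, its fields, and the numbers used are hypothetical and only stand in for the concurrent-invocations-limit and inactivity-timeout settings. It assumes every waiting caller started waiting at the same moment and holds its slot until the inactivity timeout suspends it.

```python
from dataclasses import dataclass


@dataclass
class InvokerModel:
    """Toy model of invoker slots (hypothetical, not the real invoker)."""

    slot_limit: int            # stands in for concurrent-invocations-limit
    inactivity_timeout: float  # seconds; stands in for inactivity-timeout

    def free_slots(self, waiting_callers: int) -> int:
        # Each caller that is waiting on a callee still occupies one slot.
        return max(0, self.slot_limit - waiting_callers)

    def callee_start_delay(self, waiting_callers: int) -> float:
        """Seconds until the first callee can obtain a slot.

        Assumption: all callers began waiting at t=0, so a full invoker
        frees slots only once the inactivity timeout suspends them.
        """
        if self.free_slots(waiting_callers) > 0:
            return 0.0
        return self.inactivity_timeout


# Illustrative numbers only; these are not Restate's defaults.
saturated = InvokerModel(slot_limit=100, inactivity_timeout=60.0)
shorter_timeout = InvokerModel(slot_limit=100, inactivity_timeout=1.0)
more_slots = InvokerModel(slot_limit=200, inactivity_timeout=60.0)

for model in (saturated, shorter_timeout, more_slots):
    print(model, model.callee_start_delay(waiting_callers=100))
```

In this sketch both remedies from the paragraph above reduce the delay: the shorter timeout frees slots sooner, and the higher limit leaves slots available for callees even while callers wait.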
Invocations with long side effects/steps
For invocations with long-lasting side effects/steps (e.g. when querying an LLM), the current inactivity-timeout might be too aggressive. In this case, the system might suspend an invocation that could still have made progress, just because an individual step took too long. Here it would be beneficial to increase the inactivity-timeout.
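A guide could illustrate this with a small check like the following Python sketch. The function names, the `headroom` factor, and the example durations are all hypothetical, and the model simply assumes that a step lasting longer than the inactivity timeout leads to a suspension; the real invoker may behave differently.

```python
from typing import Iterable


def suspended_mid_step(step_seconds: float, inactivity_timeout: float) -> bool:
    # Assumption: a single step longer than the timeout triggers suspension.
    return step_seconds > inactivity_timeout


def suggested_timeout(step_seconds: Iterable[float], headroom: float = 1.5) -> float:
    # Hypothetical rule of thumb: cover the slowest observed step plus headroom.
    return max(step_seconds) * headroom


# Illustrative step durations in seconds, e.g. one slow LLM call among fast steps.
observed = [2.0, 90.0, 15.0]

print(suspended_mid_step(max(observed), inactivity_timeout=60.0))
print(suggested_timeout(observed))
```

The headroom factor is a design choice in this sketch, not a documented recommendation; it exists so that normal variation in step duration does not immediately exceed the timeout again.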