Natively support Memory<byte> #2311
Conversation
There seems to be no progress on any of the several approaches to reduce allocations on the publishing side of Kafka. All of them are stuck at the review phase. We are affected by the issue as well and would really like to see how we can push this topic so that it finally gets the attention it deserves. @verdie-g Did you end up forking the project and adding your changes?
We used a fork with #2219 just to confirm the important performance benefits, but we were not ready to maintain a fork. We are still waiting for a review on that PR.
🎉 All Contributor License Agreements have been signed. Ready to merge.
@Claimundefine @rayokota could you have a look at this PR 🙏 It's been sitting for a while and I believe it fixes an important performance issue in the client.
Closes #1238, #1603, #1725, #2177, #2219, #1782, #2367.
Problem
The current serializer `ISerializer` forces the user to return a `byte[]`. This is an important performance problem for two reasons:
- A buffer of the exact size of the serialized data has to be allocated and the data copied into it to return a `byte[]`. In the example sketched below the list, a protobuf object is serialized into a `MemoryStream`; to send these bytes to the Kafka client, `MemoryStream.GetBuffer`, which returns the underlying buffer, can't be used because it would return the data plus some extra bytes. So `MemoryStream.ToArray` has to be used, meaning the entire buffer is copied.
- The `byte[]` passed to `Produce` will be pinned, which causes an important amount of pressure on the GC. If pooling were available, we could create a pool of buffers on the POH (pinned object heap) using `GC.AllocateArray`.
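As a rough illustration of the copy described above (the broker address, topic name, and payload here are placeholders, not taken from the PR), the current API forces something like this:

```csharp
using System;
using System.IO;
using Confluent.Kafka;

var config = new ProducerConfig { BootstrapServers = "localhost:9092" };
using var producer = new ProducerBuilder<Null, byte[]>(config).Build();

using var stream = new MemoryStream();
stream.Write(new byte[10_000]); // stand-in for serializing a ~10 KB protobuf object

// MemoryStream.GetBuffer() would avoid a copy, but it returns the whole underlying
// buffer including unused trailing bytes. Since Message<,>.Value must be an
// exact-size byte[], a full copy via ToArray() is required today.
byte[] payload = stream.ToArray();

producer.Produce("my-topic", new Message<Null, byte[]> { Value = payload });
producer.Flush(TimeSpan.FromSeconds(10));
```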
Solution
There are many open PRs aiming to extend the API to allow low-allocation produce, but they all have some caveats:
- One exposes `Span`, but given the constraints of that type, it means creating a whole new set of APIs.
- One uses `IBufferWriter` instead, which, unlike `byte[]` or `Span`, supports non-contiguous memory, kind of like `Stream`, but it enables a lot of ways to reduce allocations. I don't think `IBufferWriter` is ideal here because in the end a contiguous memory buffer is expected for both the key and the value. Also, that PR introduces several breaking changes.
- One takes `ArraySegment`s as parameters. This is a little awkward because you need to create a `Producer<Null, Null>`, and it's also inconsistent with how you would produce `byte[]` (using `Producer<byte[], byte[]>`).
- One uses `ArraySegment`, which does not break the API and fixes the original problem, though it would prevent the use of `Memory<byte>`, which is a more general type, preferred when dealing with contiguous memory when `Span<byte>` can't be used. That type is for example used in `Stream.WriteAsync`, `Socket.SendAsync`, and also in `MemoryPool.Rent`.

Instead of adding a third type of serializer (after `ISerializer` and `IAsyncSerializer`), this PR makes a special case for `Memory<byte>` (and `ReadOnlyMemory<byte>`), just like `byte[]`, so that from the user's point of view they can be used directly as `Message.Key` and `Message.Value`.

So in the end, you can write something like the sketch below.
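This sketch assumes `ReadOnlyMemory<byte>` is accepted by `ProducerBuilder` and `Message<,>` just like `byte[]`; the broker address, topic, and payload are placeholders rather than the PR's exact snippet:

```csharp
using System;
using System.IO;
using Confluent.Kafka;

var config = new ProducerConfig { BootstrapServers = "localhost:9092" };
using var producer = new ProducerBuilder<Null, ReadOnlyMemory<byte>>(config).Build();

using var stream = new MemoryStream();
stream.Write(new byte[10_000]); // stand-in for serializing a ~10 KB protobuf object

// Slice the stream's underlying buffer directly: no exact-size byte[] allocation, no copy.
ReadOnlyMemory<byte> payload = stream.GetBuffer().AsMemory(0, (int)stream.Length);

producer.Produce("my-topic", new Message<Null, ReadOnlyMemory<byte>> { Value = payload });
producer.Flush(TimeSpan.FromSeconds(10));
```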
Benchmark
Here is a benchmark using BenchmarkDotNet which serializes a protobuf object of ~10 KB from open-telemetry/opentelemetry-proto and produces it:
- `ProduceByteArray` copies the `MemoryStream` to a `byte[]`.
- `ProduceMemory` gets a slice of the `MemoryStream` with no copy. It's 35% faster and uses 20% less memory.
- `ProducePooledMemory` is similar to `ProduceMemory` but uses Microsoft.IO.RecyclableMemoryStream to show how pooling could work. It's 32% faster and uses 83% less memory.

In real-world usage the benefits could be much higher, because the benchmark doesn't capture the decreased pressure on the GC from the avoided allocations, nor the pinned buffers being reused.
Results
Code
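For reference, here is a rough sketch of how the three cases above could be structured with BenchmarkDotNet; the producer setup, payload, and `RecyclableMemoryStreamManager` usage are assumptions rather than the PR's actual benchmark code:

```csharp
using System;
using System.IO;
using BenchmarkDotNet.Attributes;
using Confluent.Kafka;
using Microsoft.IO;

[MemoryDiagnoser]
public class ProduceBenchmark
{
    private static readonly RecyclableMemoryStreamManager StreamManager = new();
    private IProducer<Null, byte[]> _byteArrayProducer = null!;
    private IProducer<Null, ReadOnlyMemory<byte>> _memoryProducer = null!;
    private readonly byte[] _serialized = new byte[10_000]; // stand-in for the ~10 KB protobuf payload

    [GlobalSetup]
    public void Setup()
    {
        var config = new ProducerConfig { BootstrapServers = "localhost:9092" };
        _byteArrayProducer = new ProducerBuilder<Null, byte[]>(config).Build();
        // Assumes ReadOnlyMemory<byte> is handled natively, as proposed in this PR.
        _memoryProducer = new ProducerBuilder<Null, ReadOnlyMemory<byte>>(config).Build();
    }

    [Benchmark(Baseline = true)]
    public void ProduceByteArray()
    {
        using var stream = new MemoryStream();
        stream.Write(_serialized);
        // Exact-size copy required to hand a byte[] to the client.
        _byteArrayProducer.Produce("topic", new Message<Null, byte[]> { Value = stream.ToArray() });
    }

    [Benchmark]
    public void ProduceMemory()
    {
        using var stream = new MemoryStream();
        stream.Write(_serialized);
        // Slice of the stream's buffer, no copy.
        var payload = stream.GetBuffer().AsMemory(0, (int)stream.Length);
        _memoryProducer.Produce("topic", new Message<Null, ReadOnlyMemory<byte>> { Value = payload });
    }

    [Benchmark]
    public void ProducePooledMemory()
    {
        // Pooled stream from RecyclableMemoryStream; its buffer is reused across iterations.
        using var stream = StreamManager.GetStream();
        stream.Write(_serialized);
        var payload = stream.GetBuffer().AsMemory(0, (int)stream.Length);
        _memoryProducer.Produce("topic", new Message<Null, ReadOnlyMemory<byte>> { Value = payload });
    }
}
```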
Production test
We have also load tested a production service that uses Kafka extensively.
Before:

After:

What can be observed is that, with the change, the latency during the third step is much better, whereas before the change some requests even failed.