-
Notifications
You must be signed in to change notification settings - Fork 299
Update bytes/bytearray init #787
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good! Two minor comments to think about.
_bytes = new ArrayData<byte>(); | ||
IEnumerator ie = PythonOps.GetEnumerator(context, source); | ||
while (ie.MoveNext()) { | ||
Add(GetByte(ie.Current)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we use _bytes.Add
here and avoid the lock on each call? I guess it's hard to answer since our threading is different than CPython's. Maybe the perf difference is negligible...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also regret the performance impact of having to initialize the array one by one (without a length hint!), but this is how CPython works and exactly the reason why we need to lock on each call. The test I added is single-threaded to have a deterministic result, but it is possible to start another thread while the generator is going (e.g. from within the generator). One IronPython's big advantage over CPython is that is doesn't have GIL, but this means that types have to be thread-safe.
The alternative to locking on each call would be to lock the whole body of this method, but I am always uneasy of putting on a lock over code that calls out some user-provided function... Deadlocks start lurking...
Practically, though, we can alleviate the performance hit by providing overloads for most common scenarios. ByteArray
already has overloads for bytes-likes, strings, byte enumerables, and memory. If we add PythonList
and PythonTuple
to the party, I think all practical use cases will be covered.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good. Time is probably better spent on getting things working for the moment than on potential performance issues (although we should still try not to be negligent).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is still true, there is no GIL in IronPython. Indeed, multithreading differences are interesting enough that an explanation in that document would be desirable.
_bytes = new byte[size]; | ||
} | ||
|
||
public Bytes(BigInteger size) : this((int)size) { } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if we should force cast BigInteger
to int
instead of preferring the object overloads? I guess this is part of the whole what to do about integers discussion...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added it here to avoid calls with BigInteger
argument bound to object?
overload. It changes TypeError: 'long' object is not iterable
to OverflowError: Arithmetic operation resulted in an overflow.
or OverflowError: Value was either too large or too small for an Int32.
, depending on the value. This better matches CPython's behaviour: OverflowError: cannot fit 'int' into an index-sized integer
.
As for a general approach to such cases, I agree, it is this big long/int (#52) discussion... I started contributing to this project by working on UTF-8 source code support, got carried over to codecs, then StringOps, then realized this depends on buffer protocol, this led to bytes/bytearray, and now I see that it would be good to have int/long sorted out first... It is a big topic, and I am continually thinking about possibilities, but would like to get to some baseline with the holes I have already started digging up.
As for more specific ideas simmering: I too was thinking about forcing cast from BigInteger
to long
and int
instead of binding to object
, but it cannot be done unconditionally. In some (most) cases the out-of-range cast should result in OverflowError
, but in some other, the value should be clipped to MinValue...MaxValue
range. I was thinking this could be done with a dedicated parameter attribute, say ClippingAttribute
or CoercingAttribute
(I think I could not stomach [Clamping]
😉 ). The attribute could have optional constructor parameters to indicate to what value the out-of-range values should be coerced to, with MinValue
and MaxValue
being the defaults. This depends on how the particular function handles magic values. The attribute could also be applicable to other integer types (byte
?), if a use case arises.
In any case, a change like this has a significant impact: practically the whole API will have to be reviewed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We already have a cast from BigInteger to int when there is no object overload so I guess the attribute would only need to be used in those cases (although maybe a more general AllowBigIntegerAttribute(Method)
). I'm not sure I could come up with a good example on the public API surface where clamping would be the expected behavior? Anyway, discussion for another time...
Apparently it is possible to modify a
bytearray
while it is being initialized.