Skip to content

Update bytes/bytearray init #787

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Apr 16, 2020
Merged

Conversation

BCSharp
Copy link
Member

@BCSharp BCSharp commented Apr 15, 2020

Apparently it is possible to modify a bytearray while it is being initialized.

Copy link
Contributor

@slozier slozier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! Two minor comments to think about.

_bytes = new ArrayData<byte>();
IEnumerator ie = PythonOps.GetEnumerator(context, source);
while (ie.MoveNext()) {
Add(GetByte(ie.Current));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we use _bytes.Add here and avoid the lock on each call? I guess it's hard to answer since our threading is different than CPython's. Maybe the perf difference is negligible...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also regret the performance impact of having to initialize the array one by one (without a length hint!), but this is how CPython works and exactly the reason why we need to lock on each call. The test I added is single-threaded to have a deterministic result, but it is possible to start another thread while the generator is going (e.g. from within the generator). One IronPython's big advantage over CPython is that is doesn't have GIL, but this means that types have to be thread-safe.

The alternative to locking on each call would be to lock the whole body of this method, but I am always uneasy of putting on a lock over code that calls out some user-provided function... Deadlocks start lurking...

Practically, though, we can alleviate the performance hit by providing overloads for most common scenarios. ByteArray already has overloads for bytes-likes, strings, byte enumerables, and memory. If we add PythonList and PythonTuple to the party, I think all practical use cases will be covered.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good. Time is probably better spent on getting things working for the moment than on potential performance issues (although we should still try not to be negligent).

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One IronPython's big advantage over CPython is that is doesn't have GIL,

@BCSharp - is this no longer true? Its not listed on the "Differences between IronPython and Cython" page

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is still true, there is no GIL in IronPython. Indeed, multithreading differences are interesting enough that an explanation in that document would be desirable.

_bytes = new byte[size];
}

public Bytes(BigInteger size) : this((int)size) { }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we should force cast BigInteger to int instead of preferring the object overloads? I guess this is part of the whole what to do about integers discussion...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added it here to avoid calls with BigInteger argument bound to object? overload. It changes TypeError: 'long' object is not iterable to OverflowError: Arithmetic operation resulted in an overflow. or OverflowError: Value was either too large or too small for an Int32., depending on the value. This better matches CPython's behaviour: OverflowError: cannot fit 'int' into an index-sized integer.

As for a general approach to such cases, I agree, it is this big long/int (#52) discussion... I started contributing to this project by working on UTF-8 source code support, got carried over to codecs, then StringOps, then realized this depends on buffer protocol, this led to bytes/bytearray, and now I see that it would be good to have int/long sorted out first... It is a big topic, and I am continually thinking about possibilities, but would like to get to some baseline with the holes I have already started digging up.

As for more specific ideas simmering: I too was thinking about forcing cast from BigInteger to long and int instead of binding to object, but it cannot be done unconditionally. In some (most) cases the out-of-range cast should result in OverflowError, but in some other, the value should be clipped to MinValue...MaxValue range. I was thinking this could be done with a dedicated parameter attribute, say ClippingAttribute or CoercingAttribute (I think I could not stomach [Clamping] 😉 ). The attribute could have optional constructor parameters to indicate to what value the out-of-range values should be coerced to, with MinValue and MaxValue being the defaults. This depends on how the particular function handles magic values. The attribute could also be applicable to other integer types (byte?), if a use case arises.

In any case, a change like this has a significant impact: practically the whole API will have to be reviewed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We already have a cast from BigInteger to int when there is no object overload so I guess the attribute would only need to be used in those cases (although maybe a more general AllowBigIntegerAttribute(Method)). I'm not sure I could come up with a good example on the public API surface where clamping would be the expected behavior? Anyway, discussion for another time...

@slozier slozier merged commit fb863da into IronLanguages:master Apr 16, 2020
@BCSharp BCSharp deleted the bytes_init branch April 17, 2020 02:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants