Update bytes/bytearray init #787

BCSharp · 2020-04-15T05:05:21Z

Apparently it is possible to modify a bytearray while it is being initialized.

slozier

Looks good! Two minor comments to think about.

slozier · 2020-04-15T17:10:09Z

Src/IronPython/Runtime/ByteArray.cs

+            _bytes = new ArrayData<byte>();
+            IEnumerator ie = PythonOps.GetEnumerator(context, source);
+            while (ie.MoveNext()) {
+                Add(GetByte(ie.Current));


Could we use _bytes.Add here and avoid the lock on each call? I guess it's hard to answer since our threading is different than CPython's. Maybe the perf difference is negligible...

I also regret the performance impact of having to initialize the array one by one (without a length hint!), but this is how CPython works and exactly the reason why we need to lock on each call. The test I added is single-threaded to have a deterministic result, but it is possible to start another thread while the generator is going (e.g. from within the generator). One IronPython's big advantage over CPython is that is doesn't have GIL, but this means that types have to be thread-safe.

The alternative to locking on each call would be to lock the whole body of this method, but I am always uneasy of putting on a lock over code that calls out some user-provided function... Deadlocks start lurking...

Practically, though, we can alleviate the performance hit by providing overloads for most common scenarios. ByteArray already has overloads for bytes-likes, strings, byte enumerables, and memory. If we add PythonList and PythonTuple to the party, I think all practical use cases will be covered.

Sounds good. Time is probably better spent on getting things working for the moment than on potential performance issues (although we should still try not to be negligent).

One IronPython's big advantage over CPython is that is doesn't have GIL,

@BCSharp - is this no longer true? Its not listed on the "Differences between IronPython and Cython" page

This is still true, there is no GIL in IronPython. Indeed, multithreading differences are interesting enough that an explanation in that document would be desirable.

slozier · 2020-04-15T20:36:33Z

Src/IronPython/Runtime/Bytes.cs

            _bytes = new byte[size];
        }

+        public Bytes(BigInteger size) : this((int)size) { }


I wonder if we should force cast BigInteger to int instead of preferring the object overloads? I guess this is part of the whole what to do about integers discussion...

I added it here to avoid calls with BigInteger argument bound to object? overload. It changes TypeError: 'long' object is not iterable to OverflowError: Arithmetic operation resulted in an overflow. or OverflowError: Value was either too large or too small for an Int32., depending on the value. This better matches CPython's behaviour: OverflowError: cannot fit 'int' into an index-sized integer.

As for a general approach to such cases, I agree, it is this big long/int (#52) discussion... I started contributing to this project by working on UTF-8 source code support, got carried over to codecs, then StringOps, then realized this depends on buffer protocol, this led to bytes/bytearray, and now I see that it would be good to have int/long sorted out first... It is a big topic, and I am continually thinking about possibilities, but would like to get to some baseline with the holes I have already started digging up.

As for more specific ideas simmering: I too was thinking about forcing cast from BigInteger to long and int instead of binding to object, but it cannot be done unconditionally. In some (most) cases the out-of-range cast should result in OverflowError, but in some other, the value should be clipped to MinValue...MaxValue range. I was thinking this could be done with a dedicated parameter attribute, say ClippingAttribute or CoercingAttribute (I think I could not stomach [Clamping] 😉 ). The attribute could have optional constructor parameters to indicate to what value the out-of-range values should be coerced to, with MinValue and MaxValue being the defaults. This depends on how the particular function handles magic values. The attribute could also be applicable to other integer types (byte?), if a use case arises.

In any case, a change like this has a significant impact: practically the whole API will have to be reviewed.

We already have a cast from BigInteger to int when there is no object overload so I guess the attribute would only need to be used in those cases (although maybe a more general AllowBigIntegerAttribute(Method)). I'm not sure I could come up with a good example on the public API surface where clamping would be the expected behavior? Anyway, discussion for another time...

BCSharp added 3 commits April 14, 2020 11:58

Pass test_from_ssize

81199af

Fix bytearray initializer for using generators

6248bbf

Follow CPython in __length_hint__ use

e9857cd

slozier approved these changes Apr 15, 2020

View reviewed changes

slozier merged commit fb863da into IronLanguages:master Apr 16, 2020

BCSharp deleted the bytes_init branch April 17, 2020 02:21

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Update bytes/bytearray init #787

Update bytes/bytearray init #787

Uh oh!

BCSharp commented Apr 15, 2020

Uh oh!

slozier left a comment

Uh oh!

slozier Apr 15, 2020

Uh oh!

BCSharp Apr 15, 2020

Uh oh!

slozier Apr 16, 2020

Uh oh!

StingyJack Dec 9, 2023

Uh oh!

BCSharp Dec 10, 2023

Uh oh!

slozier Apr 15, 2020

Uh oh!

BCSharp Apr 15, 2020

Uh oh!

slozier Apr 16, 2020

Uh oh!

Uh oh!

Update bytes/bytearray init #787

Update bytes/bytearray init #787

Uh oh!

Conversation

BCSharp commented Apr 15, 2020

Uh oh!

slozier left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!