-
Notifications
You must be signed in to change notification settings - Fork 1.1k
HashCode of enum cases is not stable #19177
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
so currently its the default would this be considered a breaking change to make stable? |
this isn't too wild to implement either, the shared anonymous class of the |
Making it stable would not be a breaking change. |
This stackoverflow post suggests implementing the hashCode like this: ordinal ^ this.getClass.getName.hashCode because
|
I don't see why Scala should be different from Java here. If you need a different hashcode it's very easy to compute it from oridinal. |
why is this a requirement? I would assume no one would use the hashcode as a persistent key |
So, should we close this as "not planned"? |
Spark (and potentially other distributed frameworks) uses hashCode in shuffles in order to assign a partitions and make sure all the data with the same value ends up in the same partition. So for users of such framework it would be a nice convenience if enums just worked out of the box. |
jshell
| Welcome to JShell -- Version 21.0.5
| For an introduction type: /help intro
jshell> enum Foo { A, B, C}
| created enum Foo
jshell> Foo.A.hashCode()
$2 ==> 466505482
jshell
| Welcome to JShell -- Version 21.0.5
| For an introduction type: /help intro
jshell> enum Foo { A, B, C}
| created enum Foo
jshell> Foo.A.hashCode()
$2 ==> 1006485584 java enums do not guarantee a stable hash between runs as far as i can see, (although it is based on System.identityHashCode - so it seems as long as the enum is always the very first object instantiated then it will be stable, otherwise no) but still if its the expectations that enum replaces case object then good to know |
At a core meeting a few weeks ago we decided that we should make the hash code of
|
This issue was picked for the Scala Issue Spree of Monday, May 19th. @mbovel and @mtrigueira will be working on it. |
edit - changed recommendation to without ordinal and add explanation about current behaviour of parameterised values with empty parameters. I've written up my thoughts on this before we tackle this later today. This is written with quite an opinionated style for clarity, but as always is open to challenge and change. All suggestions welcome. BackgroundJavaTo be clear. In Java the Notably the documentation explicitly states:
The documentation also states that The expectation that a ScalaGranted that case classes may return stable hashes across executions. That apache spark abuses hashCode() for partitioning is unfortunate. This breaks our "typiness". One now has a While we should not pander to requirements outside of the remit of the contract, the java contract does not preclude returning a consistent In this case we seem to lose nothing. ObjectionsOne objection I can think of is that it creates the expectation that this restriction will be honoured going forward. Given that enums are by nature enumerable, and integer, I have not been able to envision a scenario where we would be unable to generate a consistent hashCode across executions. It should also be noted that there is no guarantee that hashCode will remain stable across different versions. Different JVM's and different versions of java can change this. We should lean heavily on existing hashCode() infrastructure based on the DRY principle. If this infrastructure changes then the hashCodes will change (E.g. if the default hashing algorithm in scala changed.) (This would break spark anyway since practically all hashCodes would change.) Implementation concept - Include ordinalPresuming a particular implementation basing the hashCode() on the enum declaration AND ordinal : Let's walk through an example:
VersionsThere may exists some oddities across different versions of the enum. Especially if the hashCode has been used as a key. There are some corollaries:
Note in these cases if different behaviour is required in the the LimitationsThis implementation will not provide drop-in compatibility for an almost-equivalent case class. The case class has no concept of ordinality, and the inclusion of ordinality precludes the Implementation concept - Exclude ordinalDifferencesThis changes the outcome of the second corollary when versions change.
BenefitIt would be possible to produce an LimitationsThe equals will change, since the ordinal will still change. This is a recipe for confusion and chaos. However it must be noted that enums are syntactic sugar for case classes (similar to how java enums are sugar for java classes). Parametered enum values (with empty parenthesis) already use the hash of the name only, and ignore the ordinality. Example
HashingThe particular hashing algorithm, I would tend to lean on the existing hashing used in scala rather than any sort of custom algorithm. I don't believe the java hashCode is being used in case classes. I suggest following closely whatever case classes are using for WhenWhen should the hash be calculated is interesting. Since it is constant it only needs to be computed once. If it is fast then it can be done on the fly without the memory overhead of storing it. Looks like case classes store it, trading ram for compute. So will follow that lead. ConclusionAlthough ordinality in an enumeration is part of its contract. It would be better to include the ordinal and change the affected hashCodes if the order changes. However given that the parameterised values (with empty parenthesis) are already resolved into case classes that use only the name hashCode as the hashCode, we should follow this Use an implementation concept that FeedbackAs stated at the start, this is just a starting point, all suggestions welcome. |
Thanks a lot for this summary @mtrigueira! As you noted, for case objects and case classes, the hash code is just the hash of the name: scala3/compiler/src/dotty/tools/dotc/transform/SyntheticMembers.scala Lines 340 to 343 in 66d15b2
scala3/compiler/src/dotty/tools/dotc/transform/SyntheticMembers.scala Lines 359 to 360 in 66d15b2
I think this is a strong argument to do the same for enum members. |
A note when we were working on this - that I did not appreciate before. If you specify parameters (e.g. |
I'll just mention this is what I use as a workaround: trait StableEnum extends scala.reflect.Enum:
override def hashCode: Int =
if (this.getClass.isAnonymousClass())
// uses both the enum class name and
// the case ordinal to get a unique and stable hash
(this.getClass.getName, this.ordinal).hashCode()
else
// uses both the enum class name and
// the product iterator to get a unique and stable hash
(this.getClass.getName, this.productIterator.toList).hashCode()
enum Test extends StableEnum:
case A, B, C
case Foo(arg: Int) |
Uh oh!
There was an error while loading. Please reload this page.
Compiler version
3.3.1
Minimized code
Scastie
Output
The outputted value changes for each run. Most likely the hashCode is being computed from the memory position.
Expectation
The hashCode should be stable across multiple JVMs.
The text was updated successfully, but these errors were encountered: