Skip to content

HashCode of enum cases is not stable #19177

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
Lasering opened this issue Dec 3, 2023 · 15 comments · May be fixed by #23218
Open

HashCode of enum cases is not stable #19177

Lasering opened this issue Dec 3, 2023 · 15 comments · May be fixed by #23218
Labels
area:enums itype:enhancement Spree Suitable for a future Spree

Comments

@Lasering
Copy link

Lasering commented Dec 3, 2023

Compiler version

3.3.1

Minimized code

enum Test:
  case A

Test.A.hashCode

Scastie

Output

The outputted value changes for each run. Most likely the hashCode is being computed from the memory position.

Expectation

The hashCode should be stable across multiple JVMs.

@Lasering Lasering added itype:bug stat:needs triage Every issue needs to have an "area" and "itype" label labels Dec 3, 2023
@bishabosha
Copy link
Member

bishabosha commented Dec 3, 2023

so currently its the default java.lang.Object hashcode - this does make it a departure from case objects which singleton cases are meant to replace (when case objects are used as part of an ADT).

would this be considered a breaking change to make stable?

@bishabosha bishabosha added itype:enhancement area:enums and removed stat:needs triage Every issue needs to have an "area" and "itype" label itype:bug labels Dec 3, 2023
@bishabosha
Copy link
Member

this isn't too wild to implement either, the shared anonymous class of the $new factory can derive the hashCode from the name parameter (do we want to only mix in name, or should ordinal count too?)

@sjrd
Copy link
Member

sjrd commented Dec 3, 2023

Making it stable would not be a breaking change.

@Lasering
Copy link
Author

Lasering commented Dec 3, 2023

This stackoverflow post suggests implementing the hashCode like this:

ordinal ^ this.getClass.getName.hashCode

because

would be deterministic across different JVMs. It would even work a bit better, since the least significant bits would "change as much as possible"

@mbovel mbovel added exp:intermediate Spree Suitable for a future Spree labels Feb 19, 2024
@odersky
Copy link
Contributor

odersky commented Feb 27, 2024

I don't see why Scala should be different from Java here. If you need a different hashcode it's very easy to compute it from oridinal.

@bishabosha
Copy link
Member

The hashCode should be stable across multiple JVMs.

why is this a requirement? I would assume no one would use the hashcode as a persistent key

@mbovel mbovel removed the Spree Suitable for a future Spree label Apr 9, 2024
@mbovel
Copy link
Member

mbovel commented Apr 9, 2024

So, should we close this as "not planned"?

@eejbyfeldt
Copy link

Spark (and potentially other distributed frameworks) uses hashCode in shuffles in order to assign a partitions and make sure all the data with the same value ends up in the same partition. So for users of such framework it would be a nice convenience if enums just worked out of the box.

@mbovel mbovel added stat:needs decision Some aspects of this issue need a decision from the maintainance team. and removed exp:intermediate labels Feb 14, 2025
@bishabosha
Copy link
Member

bishabosha commented May 5, 2025

I don't see why Scala should be different from Java here.

jshell
|  Welcome to JShell -- Version 21.0.5
|  For an introduction type: /help intro

jshell> enum Foo { A, B, C}
|  created enum Foo

jshell> Foo.A.hashCode()
$2 ==> 466505482

jshell
|  Welcome to JShell -- Version 21.0.5
|  For an introduction type: /help intro

jshell> enum Foo { A, B, C}
|  created enum Foo

jshell> Foo.A.hashCode()
$2 ==> 1006485584

java enums do not guarantee a stable hash between runs as far as i can see, (although it is based on System.identityHashCode - so it seems as long as the enum is always the very first object instantiated then it will be stable, otherwise no)

but still if its the expectations that enum replaces case object then good to know

@sjrd
Copy link
Member

sjrd commented May 7, 2025

At a core meeting a few weeks ago we decided that we should make the hash code of enum values stable.

  • There shouldn't be any downside to it
  • It makes it consistent with enum class cases, whose hash code is stable assuming the hash codes of its components are stable
  • It preserves the status quo of case objects.

@Gedochao Gedochao removed the stat:needs decision Some aspects of this issue need a decision from the maintainance team. label May 7, 2025
@mbovel mbovel added Spree Suitable for a future Spree stat:needs decision Some aspects of this issue need a decision from the maintainance team. and removed stat:needs decision Some aspects of this issue need a decision from the maintainance team. labels May 7, 2025
@mbovel
Copy link
Member

mbovel commented May 16, 2025

This issue was picked for the Scala Issue Spree of Monday, May 19th. @mbovel and @mtrigueira will be working on it.

@mtrigueira
Copy link

mtrigueira commented May 19, 2025

edit - changed recommendation to without ordinal and add explanation about current behaviour of parameterised values with empty parameters.

I've written up my thoughts on this before we tackle this later today. This is written with quite an opinionated style for clarity, but as always is open to challenge and change. All suggestions welcome.

Background

Java

To be clear. In Java the hashCode() method is inherited from Object. It has a special relationship with equals(). If equals() is true then the hashCodes must also be equal and since hashCode() returns a primitive also ==. A hashCode() for two objects can be == but the objects themselves not be equals().

Notably the documentation explicitly states:

The documentation also states that hashCode() is for the benefit of hash tables. As such it is a sort of "exposed" memento.

The expectation that a hashCode() will remain stable is not required.

Scala

Granted that case classes may return stable hashes across executions.

That apache spark abuses hashCode() for partitioning is unfortunate. This breaks our "typiness". One now has a hashCode() contract that does not require the behaviour relied upon. The spark documentation laments this (e.g. HashPartitioner).

While we should not pander to requirements outside of the remit of the contract, the java contract does not preclude returning a consistent hashCode() across executions.

In this case we seem to lose nothing.

Objections

One objection I can think of is that it creates the expectation that this restriction will be honoured going forward. Given that enums are by nature enumerable, and integer, I have not been able to envision a scenario where we would be unable to generate a consistent hashCode across executions.

It should also be noted that there is no guarantee that hashCode will remain stable across different versions. Different JVM's and different versions of java can change this. We should lean heavily on existing hashCode() infrastructure based on the DRY principle. If this infrastructure changes then the hashCodes will change (E.g. if the default hashing algorithm in scala changed.)

(This would break spark anyway since practically all hashCodes would change.)

Implementation concept - Include ordinal

Presuming a particular implementation basing the hashCode() on the enum declaration AND ordinal :

Let's walk through an example:

  • Given an enum Foo { A, B, C }
  • >Foo.A.hashCode() returns e.g. 466505482
  • The enum is changed to enum Foo { A, D, B, C }
  • >Foo.A.hashCode() should still return the same example value 466505482
  • Similarly in both versions B.hashCode() should return the same values.

Versions

There may exists some oddities across different versions of the enum. Especially if the hashCode has been used as a key.

There are some corollaries:

  • If an enum name is changed (e.g. enum Foo { Q, B, C }) the hashCode() for Q will probably not match A but B and C will be the same.
  • If the order is changed (e.g. enum Foo { C, B, A } ) the ordinal values will change and the hashCode() will probably change for C and A, but B will remain the same.

Note in these cases if different behaviour is required in the the hashCode() could be overridden and explicit values returned. (e.g. it may be desired for the newer Q to return the same hashCode() as the older A). I don't think we should explicitly cater for this scenario.

Limitations

This implementation will not provide drop-in compatibility for an almost-equivalent case class. The case class has no concept of ordinality, and the inclusion of ordinality precludes the hashCodes() of the case class from being == (except accidentally).

Implementation concept - Exclude ordinal

Differences

This changes the outcome of the second corollary when versions change.

  • If the order is changed (e.g. enum Foo { C, B, A } ) the ordinal values will change and the hashCode()s will all remain the same.

Benefit

It would be possible to produce an == hashCode for an equivalent case class + objects if we used the same algorithm.

Limitations

The equals will change, since the ordinal will still change. older Foo.A.equals(newer Foo.A) would be false. But older Foo.A.hashCode() == newer Foo.A.hashCode(). Worse, in the spark scenario, these would be used for persistence. So you will find A and C in the partition where the hashCodes are equal, and disambiguation would be painful. Especially adding a second change that affects order, this could be come incalculable without another point of reference.

This is a recipe for confusion and chaos. Do not recommend.

However it must be noted that enums are syntactic sugar for case classes (similar to how java enums are sugar for java classes). Parametered enum values (with empty parenthesis) already use the hash of the name only, and ignore the ordinality.

Example

  • enum Foo { A, B } then Foo.A.hashCode() returns e.g. 97
  • enum Foo { A(), B } then Foo.A.hashCode() would return the same value 97
  • enum Foo { B(), A } then Foo.A.hashCode() would return the same value 97

Hashing

The particular hashing algorithm, I would tend to lean on the existing hashing used in scala rather than any sort of custom algorithm. I don't believe the java hashCode is being used in case classes. I suggest following closely whatever case classes are using for hashCode(), and adding in ordinal / or not, this would effectively include the parameter values of the enumeration which I believe we should, and not just the name.

When

When should the hash be calculated is interesting. Since it is constant it only needs to be computed once. If it is fast then it can be done on the fly without the memory overhead of storing it. Looks like case classes store it, trading ram for compute. So will follow that lead.

Conclusion

Although ordinality in an enumeration is part of its contract. It would be better to include the ordinal and change the affected hashCodes if the order changes. However given that the parameterised values (with empty parenthesis) are already resolved into case classes that use only the name hashCode as the hashCode, we should follow this

Use an implementation concept that includes excludes the ordinal, and uses the hashCode of the name as the hashCode of the enum value, following closely what is done for case classes hashCode().

Feedback

As stated at the start, this is just a starting point, all suggestions welcome.

@mbovel
Copy link
Member

mbovel commented May 19, 2025

Thanks a lot for this summary @mtrigueira!

As you noted, for case objects and case classes, the hash code is just the hash of the name:

* A `case object C` or a `case class C()` without parameters gets the `hashCode` method
* ```
* def hashCode: Int = "C".hashCode // constant folded
* ```

def chooseHashcode(using Context) =
if (accessors.isEmpty) Literal(Constant(ownName.hashCode))

I think this is a strong argument to do the same for enum members.

@mtrigueira
Copy link

A note when we were working on this - that I did not appreciate before. If you specify parameters (e.g. enum Foo { A() }) then the compiler de-sugars into a case class anyway and the values are constant. So this fix only needs to deal with the parameter-less case.

@soronpo
Copy link
Contributor

soronpo commented May 19, 2025

I'll just mention this is what I use as a workaround:

trait StableEnum extends scala.reflect.Enum:
  override def hashCode: Int =
    if (this.getClass.isAnonymousClass())
      // uses both the enum class name and
      // the case ordinal to get a unique and stable hash
      (this.getClass.getName, this.ordinal).hashCode()
    else
      // uses both the enum class name and
      // the product iterator to get a unique and stable hash
      (this.getClass.getName, this.productIterator.toList).hashCode()

enum Test extends StableEnum:
  case A, B, C
  case Foo(arg: Int)

@mbovel mbovel linked a pull request May 21, 2025 that will close this issue
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:enums itype:enhancement Spree Suitable for a future Spree
Projects
None yet
Development

Successfully merging a pull request may close this issue.

9 participants