Shady Minds

Oleksii Diagiliev on computer science and related ..

Java enums in distributed systems

Did you ever think about how hashCode() of java.lang.Enum is implemented?

Surprisingly or not, it’s

public final int hashCode() {
  return super.hashCode();
}

It returns Object’s hashCode, which is derived from the identity of the object (to a certain extent, its internal address) rather than from its contents. At first glance it totally makes sense, since Enum values are singletons.
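To see this in practice, here is a minimal sketch. The Color enum and the EnumIdentityHashDemo class are made up for illustration. It checks that an enum constant’s hashCode() is the same value System.identityHashCode() reports, and prints the values so you can compare them between runs.

```java
// Hypothetical enum, used only for this illustration.
enum Color { RED, GREEN, BLUE }

final class EnumIdentityHashDemo {

    // Enum.hashCode() is final and delegates to Object.hashCode(),
    // so it matches the identity hash of the constant.
    static boolean usesIdentityHash(Enum<?> value) {
        return value.hashCode() == System.identityHashCode(value);
    }

    public static void main(String[] args) {
        for (Color c : Color.values()) {
            // The printed numbers are not guaranteed to repeat in another JVM run.
            System.out.println(c + " hashCode=" + c.hashCode()
                    + " usesIdentityHash=" + usesIdentityHash(c));
        }
    }
}
```

Within one JVM the value stays the same for the lifetime of the constant, which is all the hashCode contract asks for.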

Now imagine you are building a distributed system. Distributed systems use hashCode to

  • determine which worker in a cluster should handle part of a huge job
  • determine which node in a cluster should store a given item of a dataset (e.g. in a distributed cache)
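Both cases boil down to mapping a key’s hashCode onto a fixed number of buckets. A minimal sketch of that idea follows. The Partitioner class and partitionFor method are hypothetical names, not the API of any particular framework.

```java
final class Partitioner {

    // Maps a key to a bucket index in [0, numPartitions).
    // Math.floorMod keeps the result non-negative even for negative hash codes.
    static int partitionFor(Object key, int numPartitions) {
        if (numPartitions <= 0) {
            throw new IllegalArgumentException("numPartitions must be positive");
        }
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    public static void main(String[] args) {
        // String.hashCode() is specified by the JDK, so every JVM picks the same bucket.
        System.out.println(partitionFor("user-42", 8));
    }
}
```

This only works if every node computes the same hashCode for the same logical key.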

The same Enum constant can give you a different hashCode value in different JVMs/hosts, screwing up your Hadoop job or a put/lookup in distributed storage. Just something I faced recently.
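One possible workaround, sketched here under my own assumptions and not necessarily the fix used in the case above, is to hash something that is stable across JVMs, such as the constant’s name. The Status enum and the StableEnumHash helper are hypothetical.

```java
// Hypothetical enum, used only for this illustration.
enum Status { ACTIVE, SUSPENDED, CLOSED }

final class StableEnumHash {

    // String.hashCode() is fully specified by the JDK, so hashing the
    // constant's name gives the same value on every JVM and host.
    static int stableHash(Enum<?> value) {
        return value.name().hashCode();
    }

    // Picks a bucket from the stable hash instead of Enum.hashCode().
    static int partitionFor(Enum<?> value, int numPartitions) {
        return Math.floorMod(stableHash(value), numPartitions);
    }

    public static void main(String[] args) {
        for (Status s : Status.values()) {
            System.out.println(s + " -> partition " + partitionFor(s, 4));
        }
    }
}
```

The same applies to composite keys: if a key class has an enum field, its hashCode() should mix in name() (or ordinal()) instead of the enum’s own hashCode(). The trade-off is that a name-based hash changes when a constant is renamed, and an ordinal-based one changes when constants are reordered.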
