The hashCode()
and equals()
methods
The collection classes interact with the objects they contain. Somehow the set needs to know whether an object that it contains already exists or not.
Some of the collections, such as ArrayList
or HashSet
use methods defined by the Object
class, such as the equals()
and the hashCode()
methods. Some other types of collections make use of the Comparable
or Comparator
interfaces to determine whether an element already exists or not and where to store the element in the collection.
Table of contents
- The
List
and theequals()
method - Hash based Collections and the
equals()
andhashCode()
methods - How does the
hashCode()
method effect performance?
The List
and the equals()
method
Consider the following example.
package demo;
import lombok.AllArgsConstructor;
import java.util.ArrayList;
import java.util.List;
public class App {
@AllArgsConstructor
public static class Person {
private String name;
@Override
public String toString() {
return name;
}
}
public static void main( String[] args ) {
final List<Person> list = new ArrayList<>();
final Person aden = new Person( "Aden" );
list.add( aden );
System.out.printf( "Is %s in the list? %s%n", aden, list.contains( aden ) );
final Person search = new Person( "Aden" );
System.out.printf( "Is %s in the list? %s%n", search, list.contains( search ) );
}
}
The program is able to first a match for the first example, but fails on the second try.
Is Aden in the list? true
Is Aden in the list? false
This is happening as the equals()
is not overridden. The Object
’s version of the relies on the ==
operator. In the first case, we used the same object instance that is available in the list to search. The ==
operator returns true
then, as that’s the same object in the Java heap. In the second case, we have two objects that have the same value. Given that these are two objects in the Java heap, the ==
operator will return false
.
In order to address this problem we need to override the equals()
method.
@Override
public boolean equals( final Object object ) {
if ( this == object )
return true;
if ( !( object instanceof Person ) )
return false;
final Person other = (Person) object;
return Objects.equals( name, other.name );
}
Following is a complete example.
hashCode()
method. While it works for this example, this is highly discouraged and considered as bad programming practice.package demo;
import lombok.AllArgsConstructor;
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;
public class App {
@AllArgsConstructor
public static class Person {
private String name;
@Override
public boolean equals( final Object object ) {
if ( this == object )
return true;
if ( !( object instanceof Person ) )
return false;
final Person other = (Person) object;
return Objects.equals( name, other.name );
}
@Override
public String toString() {
return name;
}
}
public static void main( String[] args ) {
final List<Person> list = new ArrayList<>();
final Person aden = new Person( "Aden" );
list.add( aden );
System.out.printf( "Is %s in the list? %s%n", aden, list.contains( aden ) );
final Person search = new Person( "Aden" );
System.out.printf( "Is %s in the list? %s%n", search, list.contains( search ) );
}
}
This seems to be have fixed the problem.
Is Aden in the list? true
Is Aden in the list? true
The above example is incorrect as we only overrode the equals()
method. The relation between these the equals()
and the hashCode()
methods is so strong that the Effective Java book has an item about this, Item 11: Always override hashCode when you override equals.
When we override the equals()
we should also override the hashCode()
method.
@Override
public int hashCode() {
return Objects.hash( name );
}
Following is a correct example.
package demo;
import lombok.AllArgsConstructor;
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;
public class App {
@AllArgsConstructor
public static class Person {
private String name;
@Override
public boolean equals( final Object object ) {
if ( this == object )
return true;
if ( !( object instanceof Person ) )
return false;
final Person other = (Person) object;
return Objects.equals( name, other.name );
}
@Override
public int hashCode() {
return Objects.hash( name );
}
@Override
public String toString() {
return name;
}
}
public static void main( String[] args ) {
final List<Person> list = new ArrayList<>();
final Person aden = new Person( "Aden" );
list.add( aden );
System.out.printf( "Is %s in the list? %s%n", aden, list.contains( aden ) );
final Person search = new Person( "Aden" );
System.out.printf( "Is %s in the list? %s%n", search, list.contains( search ) );
}
}
Hash based Collections and the equals()
and hashCode()
methods
Consider the following example.
package demo;
import lombok.AllArgsConstructor;
import java.util.HashSet;
import java.util.Set;
public class App {
@AllArgsConstructor
public static class Key {
private final String value;
}
public static void main( String[] args ) {
final Set<Key> set = new HashSet<>();
set.add( new Key( "x" ) );
set.add( new Key( "x" ) );
System.out.printf( "Set contains %d elements%n", set.size() );
}
}
The inner class Key
has one constant field of type String
. The Key
class is immutable and once created its state cannot be changed. While this seems to be a good candidate for sets, or as a map key, the above example will not work as expected. Despite the fact that both Key
instances have the same value, the Set
treats them as different objects.
Set contains 2 elements
This is because the equals()
and hashCode()
methods were not overloaded.
Implement
equals()
method@Override public boolean equals( final Object object ) { if ( this == object ) return true; if ( !( object instanceof Key ) ) return false; final Key other = (Key) object; return Objects.equals( value, other.value ); }
Implement
hashCode()
method@Override public int hashCode() { return Objects.hash( value ); }
Complete example
package demo;
import lombok.AllArgsConstructor;
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;
public class App {
@AllArgsConstructor
public static class Key {
private final String value;
@Override
public boolean equals( final Object object ) {
if ( this == object )
return true;
if ( !( object instanceof Key ) )
return false;
final Key key = (Key) object;
return Objects.equals( value, key.value );
}
@Override
public int hashCode() {
return Objects.hash( value );
}
}
public static void main( String[] args ) {
final Set<Key> set = new HashSet<>();
set.add( new Key( "x" ) );
set.add( new Key( "x" ) );
System.out.printf( "Set contains %d elements%n", set.size() );
}
}
Now that both the equals()
and hashCode()
methods are implements, the hash based set and map will work as expected.
Set contains 1 elements
How does the hashCode()
method effect performance?
The performance of the hash-based collections is directly related to how well distributed the elements are in its buckets. A poorly implemented hashCode()
method may drive the performance of such collections down.
The hashCode()
method must return the same value for two objects that are considered equal. Two objects that are not equal can return the same hash code value.
Consider the following inefficient example.
package demo;
import lombok.AllArgsConstructor;
import org.apache.commons.lang3.time.StopWatch;
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.IntStream;
public class App {
@AllArgsConstructor
public static class Key {
private final String value;
@Override
public boolean equals( final Object object ) {
if ( this == object )
return true;
if ( !( object instanceof Key ) )
return false;
final Key key = (Key) object;
return Objects.equals( value, key.value );
}
@Override
public int hashCode() {
/* ⚠️ This will cause the hash-based collections to perform poorly!! */
return 1;
}
}
public static void main( String[] args ) {
StopWatch stopWatch = new StopWatch();
stopWatch.start();
final Set<Key> set = createSet( 1_000 );
stopWatch.stop();
System.out.printf( "Creating a set of %d unique elements took %d milliseconds%n", set.size(), stopWatch.getTime() );
}
private static Set<Key> createSet( final int size ) {
return IntStream
.range( 0, size )
.mapToObj( i -> new Key( String.format( "Element %d", i ) ) )
.collect( Collectors.toCollection( HashSet::new ) )
;
}
}
Creating a set of 1,000 elements takes few milliseconds but creating another set with 100,000 elements takes minutes. The following table compares several set sizes and the time it takes to create such tests with a poorly implemented hashCode()
method.
Size | Time Taken |
---|---|
1,000 | 102 milliseconds |
10,000 | 3 seconds |
25,000 | 15 seconds |
50,000 | 93 seconds |
100,000 | 8 minutes |
The time take to populate the set grows exponential to the size of the set as shown in the following graph.
The data of the above graph was derived from the following data.
Creating a set of 1000 unique elements took 102 milliseconds
Creating a set of 10000 unique elements took 2527 milliseconds
Creating a set of 25000 unique elements took 15355 milliseconds
Creating a set of 50000 unique elements took 92971 milliseconds
Creating a set of 100000 unique elements took 494658 milliseconds
Using the Objects.hashCode()
method to compute the hash will improve the performance drastically.
Consider the following example.
package demo;
import lombok.AllArgsConstructor;
import org.apache.commons.lang3.time.StopWatch;
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.IntStream;
public class App {
@AllArgsConstructor
public static class Key {
private final String value;
@Override
public boolean equals( final Object object ) {
if ( this == object )
return true;
if ( !( object instanceof Key ) )
return false;
final Key key = (Key) object;
return Objects.equals( value, key.value );
}
@Override
public int hashCode() {
/**/ return Objects.hashCode( value );
}
}
public static void main( String[] args ) {
StopWatch stopWatch = new StopWatch();
stopWatch.start();
final Set<Key> set = createSet( 1_000 );
stopWatch.stop();
System.out.printf( "Creating a set of %d unique elements took %d milliseconds%n", set.size(), stopWatch.getTime() );
}
private static Set<Key> createSet( final int size ) {
return IntStream
.range( 0, size )
.mapToObj( i -> new Key( String.format( "Element %d", i ) ) )
.collect( Collectors.toCollection( HashSet::new ) )
;
}
}
Using the default implementation, inserting 100,000 unique elements takes less than a second ,as shown in the following table.
Size | Poor Implementation | Default Implementation |
---|---|---|
1,000 | 102 milliseconds | 78 milliseconds |
10,000 | 3 seconds | 235 milliseconds |
25,000 | 15 seconds | 296 milliseconds |
50,000 | 93 seconds | 415 milliseconds |
100,000 | 8 minutes | 657 milliseconds |
The difference in the hash-based collection is captured by the following graph.
The data of the above graph was derived from the following data.
Creating a set of 1000 unique elements took 78 milliseconds
Creating a set of 10000 unique elements took 235 milliseconds
Creating a set of 25000 unique elements took 296 milliseconds
Creating a set of 50000 unique elements took 415 milliseconds
Creating a set of 100000 unique elements took 657 milliseconds
The above example made use from the default string’s hash code method which has a very good distribution. There are cases where we need to compute the hash code ourselves and take advantage of the current domain. Before doing so though, test first to verify whether the default implementation is good enough. Implement a new hash code algorithm only if the default implementation is truly hurting the application’s performance.