HashMap
A structure that trades memory for near-constant-time lookup and underpins caches, indexes, and associative containers.
Context
A HashMap is an associative array that maps keys to values using a hash function to determine where entries are stored internally. Instead of searching linearly through elements, it computes an index and jumps directly to a bucket.
Under good hash distribution, lookup, insertion, and deletion run in amortized O(1) time. This efficiency depends on two assumptions:
- The hash function distributes keys uniformly.
- The number of entries per bucket remains small.
When many keys collide into the same bucket, operations degrade to O(n) because the bucket must be traversed. In other words, constant time is not guaranteed — it is statistical.
A HashMap therefore represents a tradeoff:
- More memory and occasional expensive rehashing
- In exchange for very fast average-case access
This tradeoff makes it a foundational building block for caches, indexing systems, symbol tables, and countless higher-level abstractions.
Core idea
At its heart, a HashMap combines two components:
- An array of buckets
- A collision-handling strategy (e.g., separate chaining)
Each key is transformed into an integer via hashCode(). That hash determines which bucket the entry belongs to.
If multiple keys map to the same bucket, they are stored together in a small collection (commonly a linked list).
As long as buckets stay short, operations remain close to constant time.
To keep buckets short, the structure maintains a load factor: load factor = number of entries / number of buckets
When the load factor exceeds a threshold (commonly ~0.7–0.75), the map resizes, typically doubling its capacity and redistributing all entries.
Resizing is an O(n) operation — every entry must be rehashed — but it happens infrequently. Because of this, insertion remains amortized O(1) over time.
Key insights:
- A lower load factor → fewer collisions → more memory usage.
- A higher load factor → more collisions → less memory usage.
- Resizing balances performance and space.
- True O(1) guarantees are impossible; performance depends on hash quality and distribution.
A HashMap is therefore not “magic constant time.” It is a carefully engineered compromise between probability, memory, and rehashing cost.
Implementation
import java.util.*;
import java.util.stream.Collectors;
class HashMap<K, V> implements Iterable<HashMap.Entry<K, V>> {
private int capacity;
private int size;
private List<Entry<K, V>>[] buckets;
private int modCount;
public HashMap(int capacity) {
this.capacity = capacity;
initEmptyBuckets();
}
public void put(K key, V value) {
assertNotNullKey(key);
resizeIfNecessary();
var bucket = getBucket(key);
modCount++;
for (var entry : bucket) {
if (entry.key.equals(key)) {
entry.value = value;
return;
}
}
size++;
bucket.add(new Entry<>(key, value));
}
public V get(K key) {
assertNotNullKey(key);
for (var entry : getBucket(key)) {
if (entry.key.equals(key)) return entry.value;
}
return null;
}
public V remove(K key) {
assertNotNullKey(key);
if (isEmpty()) return null;
var bucket = getBucket(key);
var iter = bucket.listIterator();
while (iter.hasNext()) {
var next = iter.next();
if (next.key.equals(key)) {
iter.remove();
size--;
modCount++;
return next.value;
}
}
return null;
}
public void clear() {
size = 0;
modCount++;
initEmptyBuckets();
}
public Set<K> keySet() {
if (isEmpty()) return Set.of();
return Arrays.stream(buckets)
.flatMap(List::stream)
.map(entry -> entry.key)
.collect(Collectors.toSet());
}
public Collection<V> values() {
if (isEmpty()) return List.of();
return Arrays.stream(buckets)
.flatMap(List::stream)
.map(entry -> entry.value)
.toList();
}
public Collection<Entry<K, V>> entries() {
if (isEmpty()) return List.of();
return Arrays.stream(buckets)
.flatMap(List::stream)
.toList();
}
public boolean isEmpty() {
return size == 0;
}
public int size() {
return size;
}
@Override
public Iterator<Entry<K, V>> iterator() {
return new HashMapIterator();
}
private void assertNotNullKey(K key) {
if (key == null) throw new NullPointerException("Key must not be null");
}
private List<Entry<K, V>> getBucket(K key) {
var hash = key.hashCode();
var bucketIndex = Math.floorMod(hash, capacity);
return buckets[bucketIndex];
}
private void resizeIfNecessary() {
var loadFactor = (double) size / capacity;
if (loadFactor > 0.7) {
capacity *= 2;
var currentBuckets = buckets;
initEmptyBuckets();
size = 0;
for (var bucket : currentBuckets) {
for (var entry : bucket) {
put(entry.key, entry.value);
}
}
}
}
@SuppressWarnings("unchecked")
private void initEmptyBuckets() {
buckets = new LinkedList[capacity];
for (var i = 0; i < buckets.length; i++) {
buckets[i] = new LinkedList<>();
}
}
public static class Entry<K, V> {
private final K key;
private V value;
private Entry(K key, V value) {
this.key = key;
this.value = value;
}
}
private final class HashMapIterator implements Iterator<Entry<K, V>> {
private final int expectedModCount = modCount;
private int bucketIndex;
private Iterator<Entry<K, V>> bucketIterator;
public HashMapIterator() {
bucketIndex = findNextNonEmptyBucket(0);
bucketIterator = bucketIndex != -1 ? buckets[bucketIndex].iterator() : null;
}
@Override
public boolean hasNext() {
if (expectedModCount != modCount)
throw new ConcurrentModificationException();
if (bucketIterator == null) return false;
if (bucketIterator.hasNext()) return true;
var nextBucket = findNextNonEmptyBucket(bucketIndex + 1);
if (nextBucket == -1) return false;
bucketIndex = nextBucket;
bucketIterator = buckets[bucketIndex].iterator();
return bucketIterator.hasNext();
}
@Override
public Entry<K, V> next() {
if (!hasNext()) throw new NoSuchElementException();
return bucketIterator.next();
}
private int findNextNonEmptyBucket(int index) {
for (var i = index; i < buckets.length; i++) {
if (!buckets[i].isEmpty()) return i;
}
return -1;
}
}
}