Geohashing
Geohashing is a method of encoding geographic coordinates (latitude and longitude) into a short alphanumeric string. It was invented by Gustavo Niemeyer in 2008 as a technique for efficient spatial indexing and proximity search in databases and distributed systems.
The core problem it solves: querying latitude/longitude pairs to find nearby points is slow at scale. Most databases use B-tree indexes, which work well for one-dimensional data but struggle with 2D range queries. Geohashing converts a 2D location into a 1D string that a standard index can search efficiently.
Example: the coordinates for downtown San Francisco (37.7749°N, -122.4194°W) encode to the geohash 9q8yyf.
Key properties
- Spatial locality
-
Locations that share a longer common prefix are geographically closer. Each additional character in the hash zooms in to a more precise region of the earth. The hash is therefore hierarchical: longer = more precise.
Example:
9q8yyfand9q8yyxshare five characters — they are in the same neighbourhood.9q8yyfanda2sed7share no characters — they may be on different continents. - One-dimensional and indexable
-
Because a geohash is a plain string, it can be stored in a
VARCHARcolumn and indexed with a standard B-tree or trie. This enables fast prefix-based lookups and efficient sorting — operations that are difficult to perform directly on floating-point coordinate pairs.
How geohashing works
Encoding a coordinate pair into a geohash involves three steps.
Step 1: Recursive bisection
The globe is divided recursively, alternating between latitude (-90° to +90°) and longitude (-180° to +180°). At each step the current range is split in half:
-
If the coordinate falls in the lower half, a
0bit is appended. -
If it falls in the upper half, a
1bit is appended.
This is essentially binary search, but recording each decision rather than just finding the value.
Step 2: Interleave bits
Rather than encoding latitude and longitude as separate binary sequences, their bits are interleaved (odd positions = longitude bits, even positions = latitude bits). This produces a single binary string that traces a Z-order (Morton) curve across the surface of the earth, which is what preserves spatial locality between nearby points.
Step 3: Encode as Base32
The interleaved binary string is split into 5-bit groups. Each group is mapped to a character from a 32-character alphabet (0–9, bcdefghjkmnpqrstuvwxyz — letters a, i, l, and o are omitted to avoid confusion with digits). The result is the final geohash string.
Precision vs. length
Each additional character adds approximately 2.5 bits of precision in each dimension:
| Hash length | Approximate cell width | Approximate cell height |
|---|---|---|
1 |
~5,000 km |
~5,000 km |
3 |
~156 km |
~156 km |
5 |
~4.9 km |
~4.9 km |
6 |
~1.2 km |
~0.6 km |
7 |
~152 m |
~152 m |
9 |
~4.8 m |
~4.8 m |
12 |
~3.7 cm |
~1.9 cm |
Precision is controlled entirely by truncating the string. This makes it trivially easy to "zoom out" from a precise point to its containing region.
Querying for nearby items
To find all items within a given radius of a user:
-
Calculate the user’s geohash at the desired precision (e.g. 7 characters for ~150m cells).
-
Retrieve the 8 neighbouring cell hashes using a geohash library.
-
Query the database for all items whose stored geohash starts with any of the 9 hashes (the user’s cell plus its 8 neighbours).
-
Post-filter the results using an exact distance formula (e.g. the Haversine formula) to remove items outside the true radius and rank the remainder by distance.
The border-crossing problem is why step 2 is essential: a point 20 metres away may sit in a neighbouring cell and would be missed by a query on just the user’s own cell. Querying all 9 cells eliminates this blind spot. The expensive distance calculation in step 4 operates on a small candidate set (hundreds of rows), not the full dataset (millions of rows).
Use cases
-
Ridesharing (Uber, Lyft): group active drivers into geohash cells; when a rider requests a trip, search only drivers in the matching and neighbouring cells.
-
Food delivery (Deliveroo, DoorDash): find nearby restaurants by matching stored geohashes against the user’s current cell and its neighbours.
-
Geospatial databases: Elasticsearch supports
geohash_gridaggregations for bucketing documents by location; MongoDB supports geohash-based prefix queries. -
Spatial sharding: distribute data across servers by geohash prefix, keeping nearby records on the same node.
Limitations
- The edge case problem
-
Points near a cell boundary may be closer to a point in a neighbouring cell than to points at the far edge of their own cell. Always query neighbouring cells and post-filter by exact distance.
- Cell shape distortion
-
Geohash cells are rectangular, and their physical size and shape vary with latitude (they are smaller near the poles). A geohash does not represent a uniform circular area.
- False positives
-
Two points can be in the same cell but at opposite corners, making them farther apart than a point in a neighbouring cell. The post-filtering step (step 4 above) is required to correct for this.
- Not a true k-NN
-
Geohashing is a proximity-filtering technique, not a true k-nearest-neighbours algorithm. Combine it with an exact distance computation for ranked nearest-neighbour results.
Alternatives
- Quadtrees / K-d trees
-
Tree-based structures that recursively partition 2D (or k-dimensional) space. Geohashing can be seen as a linearised representation of a quadtree traversal along a Z-order curve.
- R-trees
-
Used in spatial databases such as PostGIS, Oracle Spatial, and SQLite SpatiaLite. Group nearby objects into minimum bounding rectangles, supporting efficient containment, intersection, and nearest-neighbour queries.
See also
-
Hashing — the general concept of hash functions; geohashing applies the same ideas to geographic space.
-
Consistent hashing — another specialised hashing technique used in distributed systems.
-
Distributed system — geohashing is commonly used for spatial sharding in distributed architectures.
-
Databases — spatial indexing strategies and how standard indexes support geohash prefix queries.