1.7 Trillion Points: Ingesting the Entire North Island into LumiDB

We ingested 1.73 trillion lidar points covering New Zealand’s North Island, transforming 13 TB of data and putting it behind a seamless, queryable API. Instead of a traditional workflow that relies on roughly 300,000 fragmented “tiles”, this country-scale dataset now acts as a single source of truth that significantly reduces time to insight and enables instant, interactive visualization directly in the browser. Whether you are querying a single building or a 700 km cross-section, the system eliminates file boundaries so that managing a massive archive feels as lightweight and accessible as a million-point scan.

Following up on our previous post on importing 250 billion points from the Gisborne region, we recently ingested airborne lidar scans covering the majority of New Zealand’s North Island. This dataset shows how LumiDB handles country-scale scans without the trade-offs inherent in traditional file-based workflows for terabyte-scale point clouds.
Huge thanks to OpenTopography.org for providing the source data that made this possible. The datasets are released under Creative Commons CC BY 4.0.

The dataset

  • Total points: 1.735 trillion
  • Input files: ~300,000
  • Data size: 13 TB (compressed LAZ)
  • Area: 110,000 km²
  • Average density: 15.83 pts/m² (see the quick check below)
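As a quick check that these headline figures hang together, here are a couple of lines of Python using the rounded numbers from the list above (so the outputs are only approximate):

    # Rough consistency check on the headline figures; all inputs are rounded.
    total_points = 1.735e12
    area_m2 = 110_000 * 1e6        # 110,000 km² expressed in m²
    input_files = 300_000

    print(total_points / area_m2)      # ~15.8 pts/m², matching the reported average
    print(total_points / input_files)  # ~5.8 million points per input LAZ file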
The data has been collected over multiple years and is divided into datasets roughly corresponding to New Zealand’s administrative regions. The scans vary in density and attributes; for instance, not all regions have RGB colors. The combined dataset is really interesting and creates quite an immersive experience. Having never visited New Zealand, I now really want to start booking flights.
The landscape of New Zealand is amazing!
The North Island (well, almost!) colored by elevation

Scalability equals value

Managing a dataset of this size is rarely about just looking at the whole island at once. The real value lies in centralization and accessibility. We see organizations lose significant time managing fragmented “tiles” or files scattered across network drives and silos. By moving terabytes of data into a single source of truth that can easily handle this scale, they get the following benefits:
First, time to insight is reduced. Engineers and planners can access any area instantly, replacing the hours or even days typically spent locating files across fragmented systems. Second, every stakeholder accesses the same authoritative data, ensuring measurements and annotations remain synchronized across teams. Third, regional infrastructure, such as rail networks or power grids, can be analyzed as a single entity rather than through disconnected project files.

Eliminating file boundaries

A major hurdle in traditional workflows is managing 300,000 individual files. Usually this requires manual merging or indexing that breaks down at the terabyte scale. LumiDB’s ingestion process does not simply “upload” files and make them available on a map view; it reorganizes the data into a global spatial index. During this phase, the system extracts metadata, normalizes coordinate systems, and builds a hierarchical structure that allows for progressive streaming. Even heterogeneous datasets work fine, whether the capture methods differ or, as here, the point attributes vary from file to file (not all regions have RGB colors at all). Things still just work.
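To make “hierarchical structure” a bit more concrete, the sketch below maps a point to a node address in a world-spanning octree, which is the kind of key a progressively streamable index is organized around. The key layout, bounds, and level here are invented for illustration; the post does not describe LumiDB’s internal format.

    # Illustrative only: one way to key points into a hierarchical spatial index.
    # This is not LumiDB's actual scheme, just the general idea behind it.
    from dataclasses import dataclass

    @dataclass(frozen=True)
    class NodeKey:
        level: int   # 0 = a single root node covering the whole dataset bounds
        x: int
        y: int
        z: int

    def node_key(px, py, pz, bounds_min, bounds_max, level):
        """Return the octree node at `level` that contains the point (px, py, pz)."""
        cells = 1 << level  # 2^level cells per axis at this level
        def axis_index(p, lo, hi):
            i = int((p - lo) / (hi - lo) * cells)
            return min(max(i, 0), cells - 1)  # clamp points on the max boundary
        return NodeKey(level,
                       axis_index(px, bounds_min[0], bounds_max[0]),
                       axis_index(py, bounds_min[1], bounds_max[1]),
                       axis_index(pz, bounds_min[2], bounds_max[2]))

    # Rough NZTM2000-style bounds for the North Island scan area (illustrative values).
    bounds_min = (1_540_000.0, 5_380_000.0, -100.0)
    bounds_max = (2_110_000.0, 6_230_000.0, 3_000.0)
    print(node_key(1_757_000.0, 5_920_000.0, 120.0, bounds_min, bounds_max, level=8))

Every point lands in exactly one node per level, so a viewer or a query can start from coarse nodes and only descend into deeper levels where more density is actually needed.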
The NZ North Island dataset of 300,000 files visualized as a patchwork where each patch is one LAZ file. Quite a few!
The 1.7 trillion points, originally split into multiple regions with varying properties, are indexed as a single continuous repository. You can then use the browser-based tooling or the API to query this data essentially as if it were a single file. This architecture removes the concept of “files” for the end user, unless they specifically want to download the originals (LumiDB does keep the references).
Instead of downloading a directory of LAZ tiles, a user can query any arbitrary polygon and receive the result either as a file export or via the API in a standard format such as 3D Tiles or Entwine Point Tiles. The system identifies and retrieves only the points within that boundary, honoring any filters applied, typically within seconds or faster, regardless of which dataset the points came from, the total archive size, or the number of intersecting files. You no longer have to care about file boundaries or “seams” in the data. Whether you are cropping a single building or querying a 50 km corridor, the response time remains consistently fast.
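As a rough illustration of what such a query could look like from a script, here is a sketch using Python’s requests library. The endpoint URL, payload fields, and format string are placeholders invented for this example, not LumiDB’s documented API; only the general shape (a polygon, optional filters, a target export format) follows what is described above.

    import requests

    # Placeholder endpoint and credentials; the real LumiDB API may look different.
    LUMIDB_URL = "https://lumidb.example.com/api/query"
    API_KEY = "YOUR_API_KEY"

    payload = {
        # Any arbitrary polygon; here a small rectangle given as GeoJSON lon/lat.
        "polygon": {
            "type": "Polygon",
            "coordinates": [[
                [174.760, -36.850], [174.770, -36.850],
                [174.770, -36.843], [174.760, -36.843],
                [174.760, -36.850],
            ]],
        },
        # Optional filters, e.g. keep only ground-classified points (LAS class 2).
        "filters": {"classification": [2]},
        # Ask for a streaming-friendly export such as 3D Tiles or Entwine Point Tiles.
        "format": "3d-tiles",
    }

    response = requests.post(
        LUMIDB_URL,
        json=payload,
        headers={"Authorization": f"Bearer {API_KEY}"},
        timeout=60,
    )
    response.raise_for_status()
    print(response.json())  # e.g. a URL or job handle for the exported data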

Interactive visualization in the browser

Despite the volume of data, the experience remains interactive, as shown in the video below. The web-based viewer maintains high frame rates by dynamically loading only the density required for your current view. This performance isn’t limited to visualization. Downstream operations, such as combining files, filtering by classification, and exporting specific areas, are just as fast as they would be on a dataset a fraction of this size. The system is built to ensure that 1.7 trillion points feel as lightweight as 1 million.
Video preview
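“Loading only the density required for the current view” is typically implemented with a screen-space error test: an LOD node is refined only while the gaps between its points would be visible at the current camera distance. The sketch below shows that test in a generic form; the constants and function names are assumptions for illustration, not taken from LumiDB’s viewer.

    import math

    def point_spacing_on_screen(node_spacing_m, distance_m,
                                fov_y_rad=math.radians(60), viewport_height_px=1080):
        """Approximate on-screen size, in pixels, of one point-spacing unit of an LOD node."""
        if distance_m <= 0:
            return float("inf")
        # Pixels per metre at this distance under a perspective projection.
        pixels_per_m = viewport_height_px / (2.0 * distance_m * math.tan(fov_y_rad / 2.0))
        return node_spacing_m * pixels_per_m

    def should_refine(node_spacing_m, distance_m, max_gap_px=1.5):
        """Fetch a node's denser children only if its points would look sparse on screen."""
        return point_spacing_on_screen(node_spacing_m, distance_m) > max_gap_px

    # A node whose points are ~2 m apart does not need refining when seen from 5 km away...
    print(should_refine(node_spacing_m=2.0, distance_m=5_000.0))  # False
    # ...but it does when the camera is 200 m away.
    print(should_refine(node_spacing_m=2.0, distance_m=200.0))    # True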

The 700 km cross-section

Having such a dataset behind a queryable API opens up all kinds of new opportunities, even weird ones. For example, we can interactively take a 700 km cross-section across the entire island, yielding an interesting height profile of the terrain:
Taking a fantastic 700 km cross-section across four different NZ regions. Not sure why though!
I don't know if anyone actually needs to take cross sections spanning hundreds of kilometers, but showing that it works just as easily and quickly as a 7 m one sure makes for a nice demo!
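Geometrically, a cross-section like this is just another polygon query: buffer a long line into a thin corridor and send it through the same API as any other crop. The sketch below builds such a corridor with shapely; the endpoint coordinates are illustrative, and the resulting GeoJSON would feed into a query call like the hypothetical one shown earlier.

    from shapely.geometry import LineString, mapping

    # Two illustrative NZTM2000 (EPSG:2193) endpoints spanning roughly 700 km;
    # in practice these would come from the map view.
    section_line = LineString([(1_580_000, 5_420_000), (1_900_000, 6_040_000)])

    # Buffer the line into a thin corridor (here ±5 m) to use as the query polygon.
    corridor = section_line.buffer(5.0, cap_style=2)  # cap_style=2 gives flat ends

    print(f"corridor length = {section_line.length / 1000:.0f} km")  # ~698 km
    geojson_polygon = mapping(corridor)  # ready to use as the query geometry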

Pffffft… 13 terabytes is peanuts!

Do you have even larger datasets you’d like to see if LumiDB can handle? Petabytes? We’re up for the challenge if you are. Get in touch!

Written by

Jasin Bushnaief

CTO, Co-founder