Leveraging Metadata in LumiDB Queries: A Multi-Scanner Example

Managing large 3D scan datasets efficiently is challenging—especially when dealing with strict memory constraints. In this post, we explore how metadata queries in LumiDB let you interactively enable and disable scans without ever loading the full dataset into memory. We’ll walk through a real-world example where a building scan is captured from multiple scanner positions, and show how LumiDB’s built-in filtering and level-of-detail (LOD) handling can keep your application fast and responsive. 🚀

We're LumiDB, a reality capture data management system built to scale across industries, making reality capture data easy, accessible, and fun to use. Request access to the LumiDB alpha, and try our API-driven reality capture management for yourself.
We talk a lot about metadata and how LumiDB helps you leverage it with your 3D scans. In this post, I want to show you what we mean by "metadata queries"—a way to retrieve specific subsets of 3D scan data based on metadata attributes like scanner position, timestamp, or custom tags. Instead of loading raw point clouds and filtering them manually, metadata queries let you request only the data you need, right from the source. To make this more concrete, let's walk through a simple example.
Let’s say you've scanned a building by placing a tripod scanner in each room, capturing scans until you have a set of point clouds covering the entire structure. You store this data in E57 files, as the format neatly supports multiple point clouds per file, along with their relative transformations.
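As an aside, those per-scan poses live in the E57 headers themselves, so you can inspect them without reading a single point. Here’s a minimal sketch, assuming the third-party pye57 Python library and the example file name used later in this post:

# Minimal sketch: list per-scan poses in an E57 file without loading points.
# Assumes the third-party pye57 library; the file name matches the example
# metadata shown later in this post.
import pye57

e57 = pye57.E57("my_data/building.e57")
print(f"{e57.scan_count} scans in file")

for i in range(e57.scan_count):
    header = e57.get_header(i)  # per-scan header only, no point data read
    print(f"scan {i}: {header.point_count} points, scanner at {header.translation}")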
Now, let’s say you want to create a visualization where you can interactively enable and disable scans coming from any scanner location. To make things more challenging, you have a strict budget of 10 million points that you can process or render at any given time—otherwise, your computer will run out of memory and melt down. Unfortunately, the building is large, and the total number of points in the dataset is far greater than that. So, what do you do?
First, let’s consider how you’d do this without LumiDB. The brute-force approach would be to load the entire dataset into memory, file by file, and then dynamically include or exclude points in the visualization based on which scanners the user has toggled on. But this doesn’t work with the “max 10M points or meltdown” constraint, especially if the dataset doesn’t fit into RAM at all. Another option would be to re-read the files from disk whenever the user toggles a scanner on or off, loading only the enabled scans’ points into memory, but this would be terrible for performance. Either way, you’d have to implement some form of level-of-detail (LOD) management or decimation yourself, which requires non-trivial engineering effort better spent elsewhere.
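To make the trade-off concrete, here’s roughly what the brute-force version looks like. This is a sketch, again assuming pye57; with the example file’s ~894 million points, the loading loop alone blows past the 10M budget:

# Illustrative sketch of the brute-force approach: load every scan up front,
# then filter in memory on each toggle. The full dataset stays resident in RAM.
import numpy as np
import pye57

e57 = pye57.E57("my_data/building.e57")

all_points = []  # one xyz array per scan, all kept in memory
for i in range(e57.scan_count):
    data = e57.read_scan(i, transform=True)  # reads the full scan, pose applied
    all_points.append(np.column_stack(
        (data["cartesianX"], data["cartesianY"], data["cartesianZ"])))

def visible_points(enabled_scanners):
    # Re-filter in memory every time the user toggles a scanner on or off.
    return np.concatenate([all_points[i] for i in enabled_scanners])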
So, how could LumiDB help? As a database specifically designed for efficient indexing and querying of large 3D scan datasets, LumiDB provides the building blocks for making this happen—batteries included. After importing the files into LumiDB (which supports all standard file formats out of the box), you can query the dataset’s metadata. Here’s a simplified example of what that metadata might look like:
{
    "input_files": {
        "0": {
            "crs": "N/A",
            "file_size": 20536164352,
            "filename": "my_data/building.e57",
            "num_points": 894117432,
            "point_clouds": {
                "0": {
                    "sensor_pose": [
                        20.66949955796885,
                        -38.81337540699597,
                        -0.02414025318102635
                    ],
                    "transformed_aabb": {
                        "max": [
                            71.39484061323014,
                            28.54835052056189,
                            57.62159258982914
                        ],
                        "min": [
                            -129.8692745231457,
                            -134.7249588760113,
                            -9.557221904212565
                        ]
                    }
                },
                ...
            },
            "user_data": {
                "owner": "My Company",
                "capture_date": "2024-03-16"
            }
        }
    },
    ...
}
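In code, fetching this metadata is a single lightweight call that transfers no point data. The sketch below uses a hypothetical lumidb Python client; the Client and get_metadata names (and the dataset name) are illustrative, not LumiDB’s published API:

# Hypothetical sketch: "lumidb.Client" and "get_metadata" are illustrative
# names, not LumiDB's actual client API. The call returns only the JSON
# metadata shown above -- no point data is transferred.
import lumidb  # hypothetical client library, for illustration only

db = lumidb.Client("https://lumidb.example.com", api_key="...")
meta = db.get_metadata(dataset="building")

# Build a scanner list for the UI: one toggle per (file, point cloud) pair.
scanners = []
for file_id, file_meta in meta["input_files"].items():
    for pc_id, pc in file_meta["point_clouds"].items():
        scanners.append({
            "id": (file_id, pc_id),
            "position": pc["sensor_pose"],  # e.g. for placing toggle markers
        })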
Notice that the metadata includes unique identifiers for each scan and the file it was imported from, scanner positions useful for spatial filtering and visualization, and capture dates for date-based filtering. More importantly, every point in a LumiDB database is unambiguously linked to this metadata: at import time, LumiDB automatically attaches a unique attribute to each point, mapping it back to its source.
The LumiDB query is where it all comes together. The application maintains a list of identifiers for the enabled scanners and simply passes this list as an additional metadata filter to the LumiDB query, retrieving only the points whose metadata mapping matches the filter, without ever touching the full dataset. Oh, and LumiDB will automatically provide you with a nice-looking LOD of the requested points, containing up to 10M points (or whatever point budget you provided), so you don’t have to worry about running out of memory even if your dataset is terabytes in size.
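Continuing the hypothetical client sketch from above (query, metadata_filter, and point_budget are illustrative names, not documented API), the toggling logic boils down to something like this:

# Hypothetical sketch of the toggling loop; names are illustrative only.
import lumidb  # hypothetical client, as in the previous sketch

db = lumidb.Client("https://lumidb.example.com", api_key="...")

# Identifiers of the scanners the user currently has toggled on.
enabled = [("0", "0"), ("0", "3")]

points = db.query(
    dataset="building",
    metadata_filter={"source": enabled},  # only points from enabled scanners
    point_budget=10_000_000,              # server-side LOD within the budget
)
# The result never exceeds 10M points, no matter how large the dataset is,
# so the viewer stays safely under the meltdown limit.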
To drive the point home, we implemented this in our web demo viewer app, and here’s what it looks like in action:
[Image: toggling individual scanner positions on and off in the LumiDB web demo viewer]
Of course, there are many other interesting features you can build using this same mechanism, such as time-series visualizations and date filtering, but those are topics for another blog post.
Interested in testing LumiDB? Request access to our closed alpha program and join the ranks of companies testing LumiDB today.

Written by

Jasin Bushnaief
CTO, Co-founder