AI Mapping Benchmark 2.0: Performance Insights for Multi-Modal Spatial AI

Eva Cheng

30/06/2025

Three months ago, we introduced the world’s first AI indoor mapping benchmark — a transparent, standardized way to evaluate AI performance in floor plan analysis and map generation. Following the recent launch of MapScale® v9.0, we’re releasing AI Mapping Benchmark 2.0 to share updated results on how the new version interprets complex floor plans across sectors such as healthcare, education, and workplace environments.

This benchmark highlights a significant improvement in spatial understanding, driven by multi-modal large language models (MLLMs) in MapScale® v9.0.

MapScale® v9.0 combines visual analysis with unstructured, non-machine-readable metadata—such as scanned drawings, handwritten notes, and embedded text—to extract accurate mapping data without relying on structured inputs.

Key improvements include:

A 34% increase in classification accuracy over v8.17.
Broader support for real-world layouts across education, hospitality, and workplace verticals.
Classification of rooms with little or no metadata.
Handling of incomplete or missing vector geometry data.
Interpretation of spatial context in loosely structured layouts.

Benchmark Results

To evaluate performance, MapScale® v9.0 was tested on 19 floor plans* from sectors including workplace, education, hospitality, and healthcare.

*Note: We initially selected 20 floor plans at random from online sources. One of the files turned out to be corrupt, leaving 19 valid files for benchmarking.

Three key mapping tasks were measured:

Key Metrics

#1 Detection

Measures the model’s ability to identify spatial elements (e.g., walls, doors, POIs) and their boundaries.

#2 Classification

Evaluates the accuracy of categorizing architectural elements (e.g., meeting-space vs. work-space).

#3 Identification

Measures how well the model extracts and identifies the metadata available in a floor plan, including unit names, IDs, and annotations (e.g., distinguishing Meeting Room A by the exit from Meeting Room B by the window).
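The benchmark does not publish its exact scoring formulas. As a minimal sketch of how a classification score like the one above could be computed, assume each room has a reference category and the score is the share of rooms whose predicted category matches it. The function name, room IDs, and labels below are hypothetical and are not part of MapScale®.

```python
def classification_accuracy(predicted, expected):
    """Share of reference rooms whose predicted category matches the reference.

    Both arguments map a room ID to a category label. Rooms missing from
    `predicted` (for example, rooms that were never detected) count as wrong.
    """
    if not expected:
        raise ValueError("expected labels must not be empty")
    correct = sum(
        1 for room_id, label in expected.items()
        if predicted.get(room_id) == label
    )
    return correct / len(expected)


# Hypothetical reference labels and model output for one small floor plan.
expected = {
    "r1": "meeting-space",
    "r2": "work-space",
    "r3": "work-space",
    "r4": "meeting-space",
}
predicted = {
    "r1": "meeting-space",
    "r2": "work-space",
    "r3": "meeting-space",  # wrong category
    # "r4" was not returned by the model at all
}

accuracy = classification_accuracy(predicted, expected)
print(f"Classification accuracy: {accuracy:.1%}")
```

Counting undetected rooms as misses is one possible design choice; a benchmark could equally score classification only over the rooms that were detected, which would give a higher number for the same output.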

Results

Note: Detection was evaluated only for MapScale®, as current general-purpose models lack support for geometry detection.

What stands out?

Detection: MapScale® achieves 90.2%; the general-purpose models tested do not support geometry detection.
Classification: MapScale® reaches 81.5%, nearly double the score of ChatGPT-4o.
Identification: MapScale® scores 71.5%, outperforming all other models tested.
General LLMs lack the spatial understanding required for floor plan interpretation; MapScale® is purpose-built for this task.

From v8.17 to v9.0: The Shift in Capabilities

The main improvement from MapScale® v8.17 to v9.0 is the ability to classify and label rooms even when structured metadata is missing. This is achieved through better use of visual context and available metadata. MapScale® v9.0 can now detect, classify, and identify rooms and POIs more effectively.

While detection remains strong, the most significant improvement is a 34% increase in classification accuracy, driven by enhanced handling of visual features and unstructured data. Trained on real-world floor plans, MapScale® v9.0 can now interpret a broader range of layouts across healthcare, education, hospitality, and workplace environments with greater accuracy.
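A "34% increase" can be read in two ways: as 34 percentage points, or as a 34% relative gain. The text does not say which is meant, and it does not state the v8.17 score. The sketch below only shows what each reading would imply if the 81.5% classification result were the v9.0 figure being compared. Both outputs are illustrative arithmetic under that assumption, not published v8.17 results.

```python
V9_CLASSIFICATION = 81.5  # percent, the v9.0 classification score reported above
REPORTED_GAIN = 34.0      # the "34% increase" as stated in the text


def baseline_if_percentage_points(current, gain_points):
    """Implied earlier score if the gain is an absolute difference in points."""
    return current - gain_points


def baseline_if_relative(current, gain_percent):
    """Implied earlier score if the gain is relative to the earlier score."""
    return current / (1 + gain_percent / 100)


points_baseline = baseline_if_percentage_points(V9_CLASSIFICATION, REPORTED_GAIN)
relative_baseline = baseline_if_relative(V9_CLASSIFICATION, REPORTED_GAIN)

print(f"If 34 percentage points: implied baseline {points_baseline:.1f}%")
print(f"If 34% relative gain:    implied baseline {relative_baseline:.1f}%")
```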

AI Classification Accuracy by Sector

MapScale® v9.0 leads all models in classification accuracy across four sectors, significantly outperforming general-purpose models like ChatGPT-4o, Gemma 3, and Llama-3.2.

Its classification scores are 89% in Hospitality, 85% in Education, 80.3% in Workplace, and 53.2% in Healthcare. The first three show strong performance in reading and labeling floor plans, even in complex, unstructured layouts, while Healthcare remains the most challenging sector.

Examples:

Hospitality (Hotels) - Classification Score: 89%
- v8.17: High reliance on “Unspecified” tags for rooms and amenities.
- v9.0: “Guest rooms,” “Restrooms,” “Lounges,” and “Storage” areas are now clearly classified, offering a more complete and functional layout for hospitality use cases.

Education (Universities) - Classification Score: 85%
- v8.17: Large areas remained unlabeled or generically marked as “Office” or “Academic Room,” with minimal detail.
- v9.0: Accurately identifies specific spaces like “Teams Room,” “Break Out,” and “Tea Point,” with improved room detection and functional labeling throughout.

Healthcare (Hospitals) - Classification Score: 53.2%
- v8.17: Key areas like “Prep Area,” “Delivery,” “Lounge,” “Pharmacy,” and “Surgeons Training” were marked as “Unspecified” or missing entirely, limiting functional understanding of the layout.
- v9.0: These rooms are now correctly labeled, with added detail for Waiting Rooms, Isolated WCs, Radiodiagnostic areas, and multiple Offices—greatly improving clarity, accuracy, and room coverage across the map.

Workplace (Offices) - Classification Score: 80.3%
- v8.17: Some areas were generically labeled with room numbers or left blank; large open zones were unlabeled or marked as “Support Space.”
- v9.0: Clearly distinguishes functional areas like “Work Space,” “Meeting Space,” “Restrooms,” and “Elevator,” providing a more structured and informative layout for office environments.

What’s Next?

MapScale® v9.0 builds on the foundation of v8.17 by integrating multi-modal LLMs (MLLMs), enabling it to interpret both visual and textual elements, even when metadata is limited or embedded in the drawing. This results in a 34% boost in classification accuracy and more reliable performance across diverse, real-world floor plans.

Looking ahead, the upcoming v9.2 release will introduce a revised room taxonomy, structured into clearer core categories and a hierarchy of subtypes. This change is expected to matter most in sectors like healthcare, where rooms with similar layouts may serve distinct functions (e.g., consultation, diagnostics, staff). By addressing this semantic complexity, v9.2 is expected to deliver a significant boost in classification accuracy, with a target of 80–90% for healthcare maps.
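To illustrate why a hierarchy of core categories and subtypes can help, here is a minimal sketch of a two-level taxonomy scored at both levels. The v9.2 taxonomy has not been published, so every category name, the mapping, and the scoring function below are hypothetical.

```python
# Hypothetical taxonomy: subtype -> core category. Names are illustrative only.
TAXONOMY = {
    "consultation-room": "clinical",
    "diagnostics-room": "clinical",
    "staff-room": "staff",
    "staff-office": "staff",
}


def score(predicted, expected):
    """Return (subtype_accuracy, core_accuracy) for paired lists of subtype labels."""
    if len(predicted) != len(expected) or not expected:
        raise ValueError("label lists must be non-empty and of equal length")
    pairs = list(zip(predicted, expected))
    # Exact match on the fine-grained subtype.
    subtype_hits = sum(p == e for p, e in pairs)
    # Match on the parent category only; unknown predictions map to None.
    core_hits = sum(TAXONOMY.get(p) == TAXONOMY[e] for p, e in pairs)
    return subtype_hits / len(pairs), core_hits / len(pairs)


# Hypothetical reference labels and model output for four hospital rooms.
expected = ["consultation-room", "diagnostics-room", "staff-room", "staff-office"]
predicted = ["diagnostics-room", "diagnostics-room", "staff-room", "consultation-room"]

subtype_acc, core_acc = score(predicted, expected)
print(f"Subtype accuracy: {subtype_acc:.0%}")
print(f"Core-category accuracy: {core_acc:.0%}")
```

In this toy example, confusing a consultation room with a diagnostics room is a miss at the subtype level but still a hit at the core level, so a hierarchy makes it possible to report how close a wrong label was rather than treating every error equally.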

With continued improvements in taxonomy, data coverage, and sector-specific logic, MapScale® is evolving to meet the increasing demands of real-world spatial intelligence.

Written by Eva Cheng with contributions from Melih Peker.

Melih is Pointr’s AI mapping lead, with years of experience in computer vision and deep learning. He helped develop MapScale®, the patented AI that converts CAD files into smart digital maps for real-world spaces.