Beyond the Blueprint: Choosing the Best Data Structure for Architectural ML
Hi Fellow Magicians 🧙‍♂️
Before we jump into the world of Regression, I want to share some thoughts from my work at DLR Group. A core question I’ve been exploring is:
“What data structures are best for representing design and architectural work for analysis and machine learning?”
Framing the Problem: Evaluating Representations of Data for Machine Learning
How do we evaluate different representations of information? Here are a few quality guidelines for assessing statistical datasets that are guiding me in this journey.
🧩 Feature Completeness
Feature completeness refers to the dataset’s coverage of all the essential information needed to predict an outcome. Essentially, it speaks to how fully the data describes each example. The more completely we capture relevant information, the more accurately the model can infer patterns.
⚡ Determinism Between Inputs and Outputs
Determinism in relationships means that the same set of input features should reliably lead to the same outcome. Consistent, repeatable patterns between features and labels allow models to learn effectively.
Think of this like the “vertical line test” from algebra, which confirms that each input has only one output. Determinism between inputs and outputs is crucial because statistical models approximate functions, and a function cannot return two different outputs for the same input.
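Here is a rough way to run that vertical line test on a dataset: group records by their input features and flag any group that carries more than one label. This is only a sketch. The `find_conflicts` helper, the field names, and the room records are all made up for illustration.

```python
from collections import defaultdict


def find_conflicts(records, feature_keys, label_key):
    """Return the feature combinations that map to more than one label."""
    labels_by_features = defaultdict(set)
    for rec in records:
        key = tuple(rec[k] for k in feature_keys)
        labels_by_features[key].add(rec[label_key])
    # Keep only the inputs that fail the "one input, one output" test.
    return {k: v for k, v in labels_by_features.items() if len(v) > 1}


# Hypothetical records: two rooms share identical inputs but differ in label.
rooms = [
    {"area": 144, "perimeter": 48, "label": "Bedroom"},
    {"area": 144, "perimeter": 48, "label": "Office"},
    {"area": 150, "perimeter": 50, "label": "Bathroom"},
]

conflicts = find_conflicts(rooms, ["area", "perimeter"], "label")
print(conflicts)
```

Any entry in `conflicts` is a place where the chosen features cannot tell two outcomes apart, which usually means a feature is missing rather than that the model is at fault.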
🎲 Independent & Identically Distributed (I.I.D.)
The i.i.d. assumption is foundational in many statistical models, where each observation is expected to be:
- Independent: Observations should not influence each other.
- Identically Distributed: Observations come from the same distribution, ensuring relationships hold consistently across the dataset.
Armed with these guidelines, let’s evaluate some popular representations for architectural data!
Images
The go-to answer for architectural representation is often images: sketches, renderings, construction plans. It’s natural, given that architecture communicates through visuals. But there’s a catch: images rely heavily on inferred information. Much of what we interpret from an architectural image (like function, spatial hierarchy, and flow) is not directly encoded and depends on our knowledge.

Take this floor plan as an example. How many rooms do you see? Using a basic definition of a room as “a part of a building enclosed by walls, floor, and ceiling,” we might find 6 rooms. But, experience-wise, people may break this up differently, seeing separate functional areas for living, dining, and cooking, even if they’re in a single large space.

With images, we face an issue with Determinism Between Inputs and Outputs. Interpreting spaces purely from images is ambiguous, and our models are likely to be just as confused as we are. My rule of thumb:
“If I’m confused about the information in the dataset, the computer will be completely lost.”
While images capture form, shape, and interconnectivity well, they do not explicitly encode each element’s purpose or function, which limits feature completeness.
Tables
Next, we have tables, which in my work are usually pulled from Revit data exports. Tables give structured, detailed information: room areas, dimensions, materials, and so on. However, tabular data lacks critical relational information.
Name | Area | Perimeter |
---|---|---|
Kitchen | 500 | 80 |
Dining Room | 900 | 130 |
Living Room | 800 | 100 |
Bedroom 1 | 450 | 60 |
Bedroom 2 | 450 | 60 |
Corridor | 200 | 140 |
Bathroom | 150 | 50 |
For example, a table of room data tells us each room’s size, type, and perimeter, but it doesn’t reveal spatial relationships like adjacency or connectivity. This absence is a feature completeness issue. Without layout context, we can’t determine how spaces interact, creating a huge limitation for our models.
This example also highlights low input/output determinism. Many potential floor plans could match a single table of room data, leading to ambiguity in spatial understanding.
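To make that ambiguity concrete, here is a small sketch with two layouts that produce the same table rows but disagree on which rooms touch. The room values come from the table above. The two adjacency sets are invented for illustration and do not describe any real plan.

```python
# Room attributes taken from the table above: name -> (area, perimeter).
table = {
    "Kitchen": (500, 80),
    "Dining Room": (900, 130),
    "Living Room": (800, 100),
    "Corridor": (200, 140),
}

# Two hypothetical layouts, each a set of undirected adjacency pairs.
layout_a = {
    frozenset({"Kitchen", "Dining Room"}),
    frozenset({"Dining Room", "Living Room"}),
    frozenset({"Living Room", "Corridor"}),
}
layout_b = {
    frozenset({"Kitchen", "Corridor"}),
    frozenset({"Dining Room", "Corridor"}),
    frozenset({"Living Room", "Corridor"}),
}


def rooms_in(layout):
    """Collect every room name mentioned in a layout's adjacency pairs."""
    return set().union(*layout)


# Both layouts contain exactly the rooms listed in the table...
same_rooms = rooms_in(layout_a) == rooms_in(layout_b) == set(table)
# ...yet they disagree about which rooms are adjacent.
same_adjacency = layout_a == layout_b
print(same_rooms, same_adjacency)  # True False
```

A model trained only on the table sees one input for both layouts, so it has no way to tell them apart.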

Finally, using tables for ML often implies assuming Independent & Identically Distributed (I.I.D.) data, where records are isolated from each other. But architectural data records (like rooms) are interdependent, with much information embedded in their relationships. Ignoring this structure limits our understanding and weakens model performance.
🕸 Graphs
Now that we’ve covered images and tables, let’s dive into graphs, the structure I believe holds the most promise for architectural information.
Graphs consist of nodes (representing entities like rooms, spaces, or materials) and edges (representing relationships like adjacency, hierarchy, or flow). This setup naturally captures complex, interconnected information that mirrors real-world design.
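As a minimal sketch of that idea, a room graph can be stored as a dictionary of node attributes plus an adjacency map. The `RoomGraph` class and its method names are hypothetical, and the two connections are assumed for illustration. Only the area and perimeter values come from the table earlier in the post.

```python
from dataclasses import dataclass, field


@dataclass
class RoomGraph:
    nodes: dict = field(default_factory=dict)  # room name -> attribute dict
    edges: dict = field(default_factory=dict)  # room name -> set of neighbor names

    def add_room(self, name, **attrs):
        """Add a node and store its attributes (area, perimeter, ...)."""
        self.nodes[name] = attrs
        self.edges.setdefault(name, set())

    def connect(self, a, b):
        """Add an undirected edge, e.g. two rooms sharing a wall or a door."""
        self.edges[a].add(b)
        self.edges[b].add(a)

    def neighbors(self, name):
        """Return the neighbors of a room in sorted order."""
        return sorted(self.edges[name])


graph = RoomGraph()
graph.add_room("Kitchen", area=500, perimeter=80)
graph.add_room("Dining Room", area=900, perimeter=130)
graph.add_room("Corridor", area=200, perimeter=140)
graph.connect("Kitchen", "Dining Room")   # assumed adjacency
graph.connect("Dining Room", "Corridor")  # assumed adjacency

print(graph.neighbors("Dining Room"))  # ['Corridor', 'Kitchen']
```

The node dictionary holds everything the table held, and the edge map adds the relationships the table could not express.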

Let’s use rooms and floor plans as an example. We can create a node for each room and use edges to indicate adjacency or connectivity. Evaluating this structure against our guidelines:
- Feature Completeness: Graphs can represent both attributes of rooms (size, function) and their relationships, like which rooms are adjacent, an area where images and tables fell short.
- Determinism Between Inputs and Outputs: Given a graph with room attributes and connections, there are fewer possible interpretations. Knowing adjacency and size creates a smaller, more defined set of potential layouts, which improves determinism.
- Independent & Identically Distributed (I.I.D.): Graphs inherently violate i.i.d. by capturing dependencies between nodes. But unlike standard ML models, graph learning methods are specifically designed to leverage these interdependencies, so violating i.i.d. is actually a strength here.
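To give a feel for how graph learning methods use those dependencies, here is a toy version of neighbor aggregation, the basic step behind many graph neural networks: each room's feature is paired with the average of its neighbors' features. This is a plain-Python sketch, not any particular library's implementation. The `aggregate_neighbors` helper and the adjacency map are assumptions for illustration.

```python
# Hypothetical room graph: one feature per node plus an undirected adjacency map.
areas = {"Kitchen": 500.0, "Dining Room": 900.0, "Corridor": 200.0}
adjacency = {
    "Kitchen": ["Dining Room"],
    "Dining Room": ["Kitchen", "Corridor"],
    "Corridor": ["Dining Room"],
}


def aggregate_neighbors(values, adjacency):
    """One round of mean aggregation over each node's neighbors.

    Returns node -> (own value, mean of neighbor values). Isolated nodes
    get 0.0 as their neighbor mean, which is an arbitrary choice here.
    """
    result = {}
    for node, neighbors in adjacency.items():
        if neighbors:
            neighbor_mean = sum(values[n] for n in neighbors) / len(neighbors)
        else:
            neighbor_mean = 0.0
        result[node] = (values[node], neighbor_mean)
    return result


features = aggregate_neighbors(areas, adjacency)
print(features["Dining Room"])  # (900.0, 350.0)
```

After one round, every room's feature vector already depends on its surroundings, which is exactly the information an i.i.d. table throws away.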
In summary, while images and tables each have value, graphs offer the most promise for representing architectural data in a way that aligns with our machine learning needs. Beyond simply passing our tests, I think there is a deeper relationship between how we develop architecture and graphs, which I plan to go into soon! So if that sounds interesting, please subscribe!!