Not Magic, Just Math

Beyond the Blueprint: Choosing the Best Data Structure for Architectural ML

Benjamin Friedman
November 12, 2024

Hi Fellow Magicians 🧙‍♂️

Before we jump into the world of Regression, I want to share some thoughts from my work at DLR Group. A core question I’ve been exploring is:

“What data structures are best for representing design and architectural work for analysis and machine learning?”

Framing the Problem: Evaluating Representations of Data for Machine Learning 📐

How do we evaluate different representations of information? Here are a few quality guidelines for assessing statistical datasets that are guiding me in this journey.

🧩 Feature Completeness

Feature completeness refers to the dataset’s coverage of all the essential information needed to predict an outcome. Essentially, it speaks to how fully the data describes each example. The more completely we capture relevant information, the more accurately the model can infer patterns.

➡ Determinism Between Inputs and Outputs

Determinism in relationships means that the same set of input features should reliably lead to the same outcome. Consistent, repeatable patterns between features and labels allow models to learn effectively.

Think of this like the “vertical line test” from algebra, which confirms that each input has only one output. This matters because statistical models approximate functions: if identical inputs show up with different labels, no single function can fit both, and the model is left guessing between them.
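One rough way to run a “vertical line test” on a dataset is to group records by their features and flag any group that carries more than one label. The sketch below assumes hashable feature tuples and uses made-up toy rows; the function name find_conflicts is my own, not from any library.

```python
from collections import defaultdict


def find_conflicts(rows):
    """Return feature tuples that appear with more than one distinct label."""
    labels_by_features = defaultdict(set)
    for features, label in rows:
        labels_by_features[features].add(label)
    return {
        features: labels
        for features, labels in labels_by_features.items()
        if len(labels) > 1
    }


# Hypothetical toy data: (area, perimeter) -> room type.
rows = [
    ((450, 60), "Bedroom"),
    ((450, 60), "Office"),  # same features, different label
    ((150, 50), "Bathroom"),
]

conflicts = find_conflicts(rows)
print(conflicts)  # any entry here is an input with more than one output
```

An empty result does not prove the relationship is deterministic, since real features are rarely repeated exactly, but a non-empty one is a clear sign that the features alone cannot pin down the label.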

🎲 Independent & Identically Distributed (I.I.D.)

The i.i.d. assumption is foundational in many statistical models, where each observation is expected to be:

Independent: Observations should not influence each other.
Identically Distributed: Observations come from the same distribution, ensuring relationships hold consistently across the dataset.

Armed with these guidelines, let’s evaluate some popular representations for architectural data!


🌆 Images

The go-to answer for architectural representation is often images—sketches, renderings, construction plans. It’s natural, given that architecture communicates through visuals. But there’s a catch: images rely heavily on inferred information. Much of what we interpret from an architectural image—like function, spatial hierarchy, and flow—is not directly encoded and depends on our knowledge.

Take this floorplan as an example. How many rooms do you see? Using a basic definition of a room as “a part of a building enclosed by walls, floor, and ceiling,” we might find 6 rooms. But people who actually experience the space may divide it differently, seeing separate functional areas for living, dining, and cooking even when they share a single large space.

With images, we face an issue with Determinism Between Inputs and Outputs. Interpreting spaces purely from images is ambiguous, and our models are likely to be just as confused as we are. My rule of thumb:

“If I’m confused about the information in the dataset, the computer will be completely lost.”

While images capture form, shape, and interconnectivity well, they don’t give easy access to specifics about each element’s purpose or function, which limits feature completeness.

📊 Tables

Next, we have tables, which in my work are usually pulled from Revit data exports. Tables give structured, detailed information—room areas, dimensions, materials, and so on. However, tabular data lacks critical relational information.

Name	Area	Perimeter
Kitchen	500	80
Dining Room	900	130
Living Room	800	100
Bedroom 1	450	60
Bedroom 2	450	60
Corridor	200	140
Bathroom	150	50

For example, a table of room data tells us each room’s name, area, and perimeter, but it doesn’t reveal spatial relationships like adjacency or connectivity. This absence is a feature completeness issue. Without layout context, we can’t determine how spaces interact, which is a huge limitation for our models.

This example also highlights low input/output determinism. Many potential floor plans could match a single table of room data, leading to ambiguity in spatial understanding.
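To make that ambiguity concrete, here is a minimal sketch that takes the room table above and pairs it with two different layouts. Both adjacency sets are invented purely for illustration (they are not taken from a real plan), and export_table is a hypothetical stand-in for a tabular export that drops adjacency.

```python
# Room table from the post: name -> (area, perimeter).
ROOMS = {
    "Kitchen": (500, 80),
    "Dining Room": (900, 130),
    "Living Room": (800, 100),
    "Bedroom 1": (450, 60),
    "Bedroom 2": (450, 60),
    "Corridor": (200, 140),
    "Bathroom": (150, 50),
}


def edge(a, b):
    """An undirected adjacency between two rooms."""
    return frozenset((a, b))


# Two hypothetical layouts, invented for illustration only.
LAYOUT_A = {
    edge("Corridor", "Bedroom 1"),
    edge("Corridor", "Bedroom 2"),
    edge("Corridor", "Bathroom"),
    edge("Corridor", "Living Room"),
    edge("Living Room", "Dining Room"),
    edge("Dining Room", "Kitchen"),
}
LAYOUT_B = {
    edge("Corridor", "Kitchen"),
    edge("Corridor", "Dining Room"),
    edge("Corridor", "Living Room"),
    edge("Living Room", "Bedroom 1"),
    edge("Living Room", "Bedroom 2"),
    edge("Bedroom 1", "Bathroom"),
}


def export_table(rooms, adjacency):
    """Flatten a layout into (name, area, perimeter) rows; adjacency is never written out."""
    return sorted((name, area, perimeter) for name, (area, perimeter) in rooms.items())


same_table = export_table(ROOMS, LAYOUT_A) == export_table(ROOMS, LAYOUT_B)
same_layout = LAYOUT_A == LAYOUT_B
print(same_table, same_layout)  # True False
```

Two different buildings, one identical table: anything a model needs to know about how the rooms connect is simply not in the rows.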

Finally, using tables for ML often implies assuming Independent & Identically Distributed (I.I.D.) data, where records are isolated from each other. But architectural data records (like rooms) are interdependent, with much information embedded in their relationships. Ignoring this structure limits our understanding and weakens model performance.

🕸 Graphs

Now that we’ve covered images and tables, let’s dive into graphs—the structure I believe holds the most promise for architectural information.

Graphs consist of nodes (representing entities like rooms, spaces, or materials) and edges (representing relationships like adjacency, hierarchy, or flow). This setup naturally captures complex, interconnected information that mirrors real-world design.

Let’s use rooms and floor plans as an example. We can create a node for each room and use edges to indicate adjacency or connectivity. Evaluating this structure against our guidelines:

Feature Completeness: Graphs can represent both attributes of rooms (size, function) and their relationships, like which rooms are adjacent—an area where images and tables fell short.
Determinism Between Inputs and Outputs: Given a graph with room attributes and connections, there are fewer possible interpretations. Knowing adjacency and size creates a smaller, more defined set of potential layouts, which improves determinism.
Independent & Identically Distributed (I.I.D.): Graphs inherently violate i.i.d. by capturing dependencies between nodes. But unlike standard ML models that treat each record in isolation, graph learning methods are specifically designed to leverage these interdependencies, so violating i.i.d. is actually a strength here.
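Here is a minimal sketch of the room graph in plain Python, using the areas from the table earlier. The edge list is an assumption I made up for illustration, and the helper names are my own. The last step averages each room’s neighbor areas, which is one simple way a node’s features can draw on its connections; it is a single aggregation step, not a full graph learning model.

```python
# Node attributes: room areas from the table earlier in the post.
AREA = {
    "Kitchen": 500,
    "Dining Room": 900,
    "Living Room": 800,
    "Bedroom 1": 450,
    "Bedroom 2": 450,
    "Corridor": 200,
    "Bathroom": 150,
}

# Hypothetical adjacency, invented for illustration only.
EDGES = [
    ("Corridor", "Bedroom 1"),
    ("Corridor", "Bedroom 2"),
    ("Corridor", "Bathroom"),
    ("Corridor", "Living Room"),
    ("Living Room", "Dining Room"),
    ("Dining Room", "Kitchen"),
]


def build_adjacency(nodes, edges):
    """Build an undirected adjacency map: room -> set of neighboring rooms."""
    adjacency = {node: set() for node in nodes}
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    return adjacency


def mean_neighbor_area(adjacency, area):
    """For each room, average the areas of its neighbors (0.0 if it has none)."""
    return {
        room: sum(area[n] for n in neighbors) / len(neighbors) if neighbors else 0.0
        for room, neighbors in adjacency.items()
    }


adjacency = build_adjacency(AREA, EDGES)
neighbor_mean = mean_neighbor_area(adjacency, AREA)
print(neighbor_mean["Corridor"])  # 462.5
```

Notice that Bedroom 1 and Bedroom 2 are identical rows in the table, but in a graph each one also carries its context, and that context changes as soon as the edges do.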

In summary, while images and tables each have value, graphs offer the most promise for representing architectural data in a way that aligns with our machine learning needs. Beyond simply passing our tests, I think there is a deeper relationship between graphs and how we develop architecture, which I plan to go into soon! If that sounds interesting, please subscribe!
