Not Magic, Just Math
NMJM #3: Unlocking the Power of ML Typologies

Benjamin Friedman
October 01, 2024

🧙‍♂️ Hello Fellow Magicians!

Last week, I touched on how I determine which ML typology a problem falls into, along with some of the cue words and phrases clients often use when describing it. These typologies help guide us toward the right category of solution.

This week, let’s dig deeper into the value of ML typologies and how they can help us figure out what systems and data we need to develop or source.

🔑 ML Typologies to Keep in Mind
Here’s a recap of the typologies I shared last week:

🛠️ Optimization
📈 Regression
🧩 Classification
🔍 Similarity
🎨 Generative
✂️ Segmentation
🔄 Translation
🧐 Explainable ML

💡 The Value of Typologies
“Typology: the study of or analysis or classification based on types or categories.” (Merriam-Webster Dictionary)

There are many ways to categorize ML systems. The most common lists I see are:

The Learning List

Supervised Learning
Unsupervised Learning
Semi-Supervised Learning
Reinforcement Learning

The Task-Based List

Regression
Classification
Clustering
Transcription
…and so on.

Personally, I approach ML from a task-focused perspective. I find the type of learning (supervised, unsupervised, and so on) more useful to consider later, once we have a better grasp of the data at hand.

🧠 Why My Typology List?
My set of ML typologies groups data science and ML systems by their desired effect. Once you know which typology a problem falls into, you can often intuit what you may need in each of these areas:

Data
Models
Infrastructure
Monitoring

For example:

Data for Regression: Involves independent features and a continuous dependent variable (the target, or criterion) to predict.
Data for Segmentation: Requires segmentation masks that mark the region to be segmented, typically one label per pixel.
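
To make those two data shapes concrete, here is a minimal sketch in plain Python. Everything in it is an assumption for illustration: the feature names (`sqft`, `bedrooms`), the tiny 3x4 image, and the helper functions `mask_matches_image` and `mask_coverage` are all made up, and a real project would more likely use arrays or a dataframe.

```python
# Regression: each row pairs independent features with one continuous target.
# Feature names and values are hypothetical.
regression_rows = [
    {"features": {"sqft": 850.0, "bedrooms": 2.0}, "target": 210_000.0},
    {"features": {"sqft": 1400.0, "bedrooms": 3.0}, "target": 340_000.0},
]

# Segmentation: each image comes with a mask of the same height and width.
# In this toy mask, 1 marks pixels inside the region and 0 marks background.
image = [
    [0.1, 0.8, 0.9, 0.2],
    [0.1, 0.9, 0.9, 0.1],
    [0.0, 0.2, 0.1, 0.0],
]
mask = [
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]


def mask_matches_image(image, mask):
    """Return True when the mask has exactly one label per image pixel."""
    if len(image) != len(mask):
        return False
    return all(len(img_row) == len(mask_row) for img_row, mask_row in zip(image, mask))


def mask_coverage(mask):
    """Fraction of pixels labeled as part of the segmented region."""
    total = sum(len(row) for row in mask)
    labeled = sum(sum(row) for row in mask)
    return labeled / total
```

The point of the shape check is that a segmentation dataset is only usable when every image has a mask aligned to it, which is a sourcing cost a regression dataset does not carry.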

On the infrastructure side:

Classification models tend to be lightweight and often need relatively little compute power.
Generative or Translation systems, however, tend to be computationally heavier, since they often involve both encoding and decoding information.

When it comes to monitoring, typologies also provide clues:

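Regression models should be monitored for issues like outliers in incoming data and dataset drift, where live data moves away from the data the model was trained on.
Classification models need attention to class imbalance, where one class heavily outnumbers the others.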
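
As a rough sketch of what those monitoring checks might look like, the snippet below uses only the Python standard library. The function names, the z-score threshold of 3, and the use of a simple mean shift as a drift signal are my own assumptions for illustration, not a recommended production setup; real systems often use more robust statistical tests.

```python
from collections import Counter
from statistics import mean, stdev


def _reference_stats(reference):
    """Mean and sample standard deviation of the reference (training) values."""
    mu, sigma = mean(reference), stdev(reference)
    if sigma == 0:
        raise ValueError("reference values have no spread")
    return mu, sigma


def outlier_fraction(values, reference, z_threshold=3.0):
    """Share of new values lying more than z_threshold reference
    standard deviations away from the reference mean."""
    mu, sigma = _reference_stats(reference)
    flagged = [v for v in values if abs(v - mu) / sigma > z_threshold]
    return len(flagged) / len(values)


def mean_shift(values, reference):
    """Crude drift signal: how far the new mean has moved,
    measured in reference standard deviations."""
    mu, sigma = _reference_stats(reference)
    return (mean(values) - mu) / sigma


def imbalance_ratio(labels):
    """Count of the most common class divided by the count of the rarest."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())
```

For example, `imbalance_ratio(["ok"] * 9 + ["fraud"])` gives 9.0, which is the kind of number worth tracking on both training data and live predictions.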

🔦 Next Steps
Over the next few weeks, I’ll dive into each typology and explore what intuitions and insights we can gather when diagnosing a problem in that context.

📬 Share Your Thoughts!
Do you use your own ML typologies when diagnosing systems? I’d love to hear about your approach! Feel free to drop a comment or send an email.
