Diagnosing Business Problems and Finding ML Solutions

Not Magic, Just Math - Field Note #1

Benjamin Friedman
September 24, 2024

🧙‍♂️ Hello fellow Magicians!

Welcome to the first field note entry of Not Magic, Just Math! Thanks for joining this project! This week, I thought it would be interesting to dive into something I spend a lot of time doing - diagnosing a business problem and figuring out what type of ML solution fits best.

Thank you to the Data Science meme Facebook group

💡 Most Business Users Don’t Know How ML Should Be Built

“Hey Benny! I’d love an ML system that can provide me with relevant images for my presentations based on slide content! Maybe you can use semantic similarity between image and text embeddings!!”

— Unfortunately, No one ever

Most of the time, business groups, clients, or users don’t know how an ML system should be built or what methods it should leverage. It’s up to us practitioners to translate their unique problem into a set of potential ML approaches we can explore and refine to meet their needs.

🔑 ML Typologies to Keep in Mind

I find it helpful to think of the many ML approaches we have at our fingertips as existing in a set of overarching ML typologies:

🛠️ Optimization
📈 Regression
🧩 Classification
🔍 Similarity
🎨 Generative
✂️ Segmentation
🔄 Encoder/Decoder
🧐 Explainable ML

These are the groupings that I find most helpful. I suspect many of you have your own lists. I invite you to share yours or offer improvements to mine in a comment or in an email response!

When scoping a new ML project or solution with teammates, I often listen out for key words and phrases to help me quickly identify which ML typologies might solve a particular problem. I thought I’d share some of the phrases on my list:

📢 Common Phrases To Listen For

“I want help finding the best”
When someone needs to identify the best, worst, quickest, or another superlative, that’s a strong clue they’re after optimization.
“I want to know what will happen if...”
If a colleague is interested in outcomes, I usually lean toward either regression or classification, depending on the kind of answer they need:
- A precise value on a continuous scale? → Regression.
- A coarser answer or a discrete category? → Classification.

“I want to be able to search by / classify this by”
This one's simple—if they’re talking about classification or tagging, they're likely looking for a classification solution.
“I want to find something similar to...”
If they want to find similar items based on a known one (think recommendation systems or search engines), they’re probably after similarity-based search.
“I want to be shown new options”
When someone wants to see something new based on given data, they’re likely in need of a generative solution.
“I’d like to detect...”
When someone says detect, it usually means they need a classification system or segmentation, depending on the data type:
- Tabular data → classification
- Non-tabular data → segmentation
“I’m hoping to understand...”
People asking to understand often need highly explainable models like linear regressions, generalized additive models, or decision trees. These models offer insights into factor contributions, which help build understanding.
If the phrase ends with “the difference,” they may be interested in unsupervised classification (clustering), seeking to uncover differences in existing data.
“I’d like to translate...”
When someone says translate, it suggests they need an encoder-decoder system. These days, for text, I’d typically go with a generative pre-trained transformer (GPT) model, even though GPT models are decoder-only rather than true encoder-decoders.
“I’d like to locate X in Y”
This phrase often indicates a need for segmentation.
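
As a rough illustration, the phrase-to-typology associations above can be written down as a small lookup table. This is only a sketch: the `PHRASE_CUES` table, the regular expressions, and the `candidate_typologies` function are hypothetical names made up for this example, and real conversations are far messier than keyword matching.

```python
import re

# Hypothetical lookup table: cue phrases from the list above, mapped to the
# typologies they hint at. The patterns are illustrative, not exhaustive.
PHRASE_CUES = [
    (r"\b(best|worst|quickest)\b", ["Optimization"]),
    (r"\bwhat will happen if\b", ["Regression", "Classification"]),
    (r"\b(search by|classify)\b", ["Classification"]),
    (r"\bsimilar to\b", ["Similarity"]),
    (r"\bnew options\b", ["Generative"]),
    (r"\bdetect\b", ["Classification", "Segmentation"]),
    (r"\bunderstand\b", ["Explainable ML"]),
    (r"\btranslate\b", ["Encoder/Decoder"]),
    (r"\blocate\b", ["Segmentation"]),
]


def candidate_typologies(request: str) -> list[str]:
    """Return the typologies whose cue phrases appear in the request.

    Results keep the order of PHRASE_CUES and contain no duplicates.
    """
    text = request.lower()
    found: list[str] = []
    for pattern, typologies in PHRASE_CUES:
        if re.search(pattern, text):
            for typology in typologies:
                if typology not in found:
                    found.append(typology)
    return found


print(candidate_typologies("I want help finding the best delivery route"))
print(candidate_typologies("I'd like to detect defects and locate them in each photo"))
```

The output is a list of candidates to explore, not a decision. A request that matches several cues (or none) is a prompt to ask more questions.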

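To make the “I want to find something similar to...” case from the list above a bit more concrete, here is a minimal sketch of similarity search using cosine similarity. The three-dimensional vectors and item names are made up for illustration; in practice the vectors would come from an embedding model, and `most_similar` is a hypothetical helper, not a library function.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(known: str, catalog: dict[str, list[float]], top_k: int = 2):
    """Rank every other catalog item by similarity to the known item."""
    query = catalog[known]
    scored = [
        (name, cosine_similarity(query, vector))
        for name, vector in catalog.items()
        if name != known  # do not recommend the item itself
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Made-up "embeddings" for illustration only.
catalog = {
    "hiking boots": [0.9, 0.1, 0.0],
    "trail shoes": [0.8, 0.3, 0.1],
    "espresso machine": [0.0, 0.2, 0.9],
}

for name, score in most_similar("hiking boots", catalog):
    print(f"{name}: {score:.3f}")
```

With these toy vectors, the item ranked closest to "hiking boots" is "trail shoes", since the two vectors point in nearly the same direction.
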
🔦 Digging in on a Candidate System

By associating these phrases with ML typologies, I can zero in on candidate systems and start asking specific questions. This is where things really start to get interesting, with questions like:

What data types are we working with?
How extensive is the data?
What is the size of the data set?
How was the data collected?
What are the requirements for inference speed?
What level of quality is acceptable?
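
One lightweight way to keep track of these questions during scoping is a small checklist structure. The sketch below is an assumption about how one might do it: the `ScopingNotes` class and its field names are hypothetical, and the example answers are invented.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class ScopingNotes:
    """One field per scoping question; None means not answered yet."""
    data_types: Optional[str] = None
    data_coverage: Optional[str] = None       # how extensive the data is
    dataset_size: Optional[str] = None
    collection_method: Optional[str] = None
    inference_speed: Optional[str] = None
    acceptable_quality: Optional[str] = None

    def open_questions(self) -> list[str]:
        """Names of the questions that still need an answer."""
        return [f.name for f in fields(self) if getattr(self, f.name) is None]


# Invented answers, for illustration only.
notes = ScopingNotes(data_types="images and captions", dataset_size="about 50k rows")
print(notes.open_questions())
```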

I imagine many of you who work end-to-end on ML systems have your own key phrases to listen for. I’d love to hear them! Feel free to share your thoughts in the comments, or if you’d like to contribute your own field note to Not Magic, Just Math, reach out!

🎉 Thanks for reading the first field note of Not Magic, Just Math!

If you have ideas for future topics, let me know! To new readers—welcome aboard 🚀
