It’s easy to believe that machine learning is hard, an arcane craft known only to a select few academics.
After all, you’re teaching machines that work in ones and zeros to reach their own conclusions about the world. You’re teaching them how to think! However, it’s not nearly as hard as the complex and formula-laden literature would have you believe.
Like all of the best frameworks we have for understanding our world (e.g. Newton’s Laws of Motion, Jobs to be Done, Supply & Demand), the best ideas and concepts in machine learning are simple. The majority of literature on machine learning, however, is riddled with complex notation, formulae and superfluous language. It puts walls up around fundamentally simple ideas.
Let’s take a practical example. Say we wanted to include a “you might also like” section at the bottom of this post. How would we go about that?
To clarify the idea, let’s look at a naive solution:
- Split the current post title into its individual words
- Get all other posts
- Sort the other posts by how many of our title’s words appear in their body, highest first
Or, in Ruby:
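The three steps above can be sketched roughly as follows. This is a minimal illustration, not the blog's actual code: it assumes each post is a Hash with `:title` and `:body` keys, and the helper names `words_in` and `naive_similar_posts` are hypothetical.

```ruby
# Assumed data shape: each post is a Hash with :title and :body strings.

# Split text into a list of distinct lowercase words.
def words_in(text)
  text.downcase.scan(/\w+/).uniq
end

# Rank every other post by how many of the current title's words
# also appear in its body, highest count first.
def naive_similar_posts(current_post, all_posts, limit = 10)
  title_words = words_in(current_post[:title])
  others = all_posts.reject { |post| post.equal?(current_post) }
  ranked = others.sort_by do |post|
    -(title_words & words_in(post[:body])).size
  end
  ranked.first(limit)
end
```

Note that common words such as “how” and “the” count just as much as “support” here, which is one reason the ranking is so noisy.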
Using this method to find posts on this blog similar to “How The Support Team Improves The Product” gives us the following top 10:
- How To Launch With A Validated Idea
- Know Your Customers and How They Decide
- Designing First Run Experiences To Delight Users
- How to hire designers
- The Dribbblisation of Design
- An interview with Ryan Singer
- Why Being First Doesn’t Matter
- Proactive Support with Intercom
- An interview with Joshua Porter
- Retention, Cohorts, and Visualisations
As you can see, posts about running an effective support process have little in common with cohort analysis, or debate around the merits of design. We can do better.
Let’s try a real machine learning approach. We’re going to break this into two parts:
- Represent posts mathematically.
- Cluster these mathematical representations with K-Means.
1. Representing posts mathematically
If we can represent our posts mathematically, we can plot the posts, compare distances between posts, and identify clusters of similar posts.
Mapping each post to a mathematical representation is easy. We can do it in two steps:
- Find all words in all posts.
- Convert each post into an array. Each element is a 1 or a 0, denoting the presence or absence of a word. The array has the same length and word order for every post, because it is built from the word list in step 1.
Or, in Ruby:
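A minimal sketch of the two steps, assuming post bodies are plain strings. The helper names `tokenize`, `vocabulary` and `to_vector` are hypothetical, and the tokenizer is deliberately simplistic.

```ruby
# Split text into lowercase words (a simple tokenizer, chosen for illustration).
def tokenize(text)
  text.downcase.scan(/\w+/)
end

# Step 1: every distinct word across all post bodies, in first-seen order.
def vocabulary(bodies)
  bodies.flat_map { |body| tokenize(body) }.uniq
end

# Step 2: one element per vocabulary word; 1 if the post contains it, else 0.
def to_vector(body, vocab)
  present = tokenize(body)
  vocab.map { |word| present.include?(word) ? 1 : 0 }
end
```

Because every vector is built from the same `vocab`, position 3 always refers to the same word, whichever post the vector came from.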
For example, if the full list of words were ['hello', 'inside', 'intercom', 'readers', 'blog', 'post'], a post with the body “hello blog post readers” would be mapped to [1, 0, 0, 1, 1, 1].
We don’t have simple tools for plotting vectors in 6 dimensions as we do for 2 dimensions, but concepts like distance extend naturally to any number of dimensions. (The 2-dimensional picture is still a useful way to think about it.)
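To make “distance” concrete, here is a small sketch using Euclidean distance. That choice is an assumption, since the post does not say which distance measure it uses; the same function works for 2, 6 or any number of dimensions.

```ruby
# Euclidean distance between two equal-length vectors.
def distance(a, b)
  Math.sqrt(a.zip(b).sum { |x, y| (x - y)**2 })
end

post_a = [1, 0, 0, 1, 1, 1]
post_b = [1, 1, 1, 0, 0, 0]

distance(post_a, post_a) # => 0.0 (identical posts)
distance(post_a, post_b) # => Math.sqrt(5), roughly 2.236 (five words differ)
```

For 0/1 vectors like these, the squared distance is simply the number of words that appear in one post but not the other.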
2. Clustering posts with K-Means
Now that we have a mathematical representation of our blog posts, let’s try to find clusters of similar posts. To do this we’re going to use a crazy simple clustering algorithm called K-Means, which can be described in 5 steps:
- Set ‘K’ to the number of clusters you want
- Choose ‘K’ random points
- Assign each document to its closest point
- Choose ‘K’ new points, each the ‘average’ of all documents assigned to the corresponding old point
- Repeat steps 3-4 until the documents’ assignments stop changing
Let’s visualize these steps. First, we choose 2 (i.e. k = 2) random points, in the same space as our posts:
We assign each document to its closest point:
We re-evaluate the center of each of these clusters, to be the average of all posts in that cluster:
That’s the end of our first iteration. Now we re-assign each post to its new closest point:
We’ve found our clusters! We know this because further iterations would not change any of the assignments.
Or, in Ruby:
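The five steps can be sketched as below. This is an illustrative version rather than the blog's own code. It assumes Euclidean distance, starts from ‘K’ randomly chosen documents instead of arbitrary random points (a common simplification), and caps the loop with a hypothetical `max_iterations` parameter.

```ruby
# Euclidean distance between two equal-length vectors.
def distance(a, b)
  Math.sqrt(a.zip(b).sum { |x, y| (x - y)**2 })
end

# Element-wise average of a non-empty list of vectors.
def mean(vectors)
  vectors.transpose.map { |column| column.sum.to_f / column.size }
end

# Index of the centroid closest to the given vector.
def closest(vector, centroids)
  centroids.each_index.min_by { |i| distance(vector, centroids[i]) }
end

# Returns one cluster index (0...k) per input vector.
def k_means(vectors, k, max_iterations: 100, random: Random.new)
  # Steps 1-2: start from k randomly chosen documents.
  centroids = vectors.sample(k, random: random)
  assignments = nil

  max_iterations.times do
    # Step 3: assign each document to its closest point.
    new_assignments = vectors.map { |vector| closest(vector, centroids) }
    # Step 5: stop once the assignments no longer change.
    break if new_assignments == assignments
    assignments = new_assignments

    # Step 4: move each point to the average of its documents
    # (a point with no documents stays where it is).
    centroids = centroids.each_index.map do |i|
      members = vectors.zip(assignments).select { |_, a| a == i }.map(&:first)
      members.empty? ? centroids[i] : mean(members)
    end
  end

  assignments
end
```

Posts that end up with the same cluster index as the current post are then the candidates for a “you might also like” list. How to rank posts within a cluster is left open here.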
Here are the top 10 posts most similar to “How The Support Team Improves The Product” using this method:
- Are you being Clear, or Clever?
- 3 Rules for Customer Feedback
- Asking customers what you want to hear
- Shipping is the beginning of a process
- What Does Feature Creep Look Like?
- Getting Insight Into Your Userbase
- Converting Customers with the Right Message at the Right time
- Conversations With Your Customers
- Does your app have a message schedule?
- Have You Tried Talking To Your Customers?
The results speak for themselves.
We achieved all of this with fewer than 40 lines of code, and some simple algorithms that can be described in a blog post. However, you would never know how simple some of these ideas are from reading academic literature. Here’s an excerpt from the paper introducing K-Means (it’s hard to pinpoint the exact first introduction of K-Means, but this was the first paper to use the term “K-Means”):
The academic literature can often be useful, if you’re willing to work through the notation. However, there are a lot of excellent alternative resources that are more practical and approachable:
- Wikipedia (e.g. Latent Semantic Indexing, Cluster analysis)
- Source code of open-source machine learning libraries (e.g. SciPy’s K-Means, scikit-learn’s DBSCAN)
- Books written for programmers, not academics (e.g. “Programming Collective Intelligence” by Toby Segaran, “Machine Learning for Hackers” by Conway & White)
- Khan Academy
Give it a try
Want to suggest tags in your project management app? Or assignees in your customer support tool? Or members of a group on a social network? The chances are that some simple code and an easy algorithm will get you there. So, when faced with a challenge in your product where you believe machine learning can help, don’t be discouraged.
Machine learning is easier than you might think.