

By Yusuf Olodo

This article focuses on some of the things I have picked up as a Data Scientist.

What is customer segmentation and how can you supercharge customer engagement thereby driving revenue and retention in your business?

This article approaches the market segmentation problem from the point of view of a stakeholder's business problem. How does a Data Scientist formulate the right solution to the right problem?

Before attacking the problem with your fancy clustering algorithm and minimizing the cost function, brainstorm solutions with your stakeholders and ask the right questions.

Customer segmentation is the bedrock of a successful marketing strategy that every customer-centric organization must adopt. In the age of the internet and social media, the new customer is dynamic. How do we keep track of that dynamism and leverage it to supercharge our business? Companies are starting to use data science to better understand customer behaviours and identify different customer segments based on their activity patterns.

How do we make our ads, email marketing, or non-digital channel marketing more personalized? Companies need to structure their business processes from the demand side of the business equation rather than purely from the supply side; in other words, the needs of the customers in terms of products/services required need to come first.

An advantage of segmentation is that it lets you serve a niche well and stand out as a product leader amongst your competition, because you know exactly what the customer wants. Developing and delivering a carefully targeted value proposition should be the main goal of the product and marketing departments. Marketing plays a key role in identifying and quantifying the needs of the various customer groups or segments within these markets, communicating with customers through various channels, and conveying the value of the products/services the business has to offer.

As a Data Scientist placed within the marketing/product team, think of yourself as Robin to their Batman. They have Batman's authority and make the final decisions, but you have Robin's acrobatic flexibility to dig into the data and find the patterns (yes, I do love DC comics!).


The popularity of Google Ads has taken the world of digital marketing to a whole new level. Years of gathering data on how customers use its products, including its search engine, have allowed Google to become number one in this field.

Google reportedly made 146.92 billion dollars in ad revenue, and there has been a steady increase since 2001. Google realized early on that the key to becoming a multibillion-dollar organization is to build products that people want, leverage the behaviour of customers on those products, and advertise other companies' products, thereby making itself the number one one-stop shop for targeted advertising. In this day and age it is crucial to know who your customers are. Why do they use the products and services you offer them? What keeps them interested, and why do they keep coming back? It's alright to have assumptions and hypotheses, but you need the data to justify them.

Say, for example, that as the head of a video game company you are looking to roll out a new feature on your most successful mobile game. If you do not know the exact group or population in your customer base this new feature will appeal to, that is the very definition of flying blind, and you might be in for a world of hurt and surprise. I still remember 2018's Fallout 76, developed by Bethesda Game Studios, as being the worst game in the Fallout franchise when it launched.


It is essential to build dashboards and reporting frameworks across all levels of the business. This is how you make real-time decisions that impact the business in the short and long term. As a Data Scientist, it is not enough to build the models and find the patterns in customer behaviour. You also need the help of reporting specialists in the Business Intelligence or wider data team to build dashboards and reports, so your stakeholders can consume the results of your models or A/B test experiments on marketing campaigns and use them in their real-time decision making. Metrics like cost per acquisition and return on investment (ROI) are very important to measure after applying specialized marketing strategies to smaller subgroups of the customer base.
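To make those two metrics concrete, here is a minimal sketch of computing them per segment. The segment names, the `CampaignResult` structure, and all the figures are made up for illustration, and ROI is taken as (revenue - spend) / spend, which is one common definition rather than the only one.

```python
from dataclasses import dataclass


@dataclass
class CampaignResult:
    """Hypothetical outcome of one campaign aimed at one segment."""
    segment: str
    spend: float        # total campaign cost for this segment
    acquisitions: int   # new customers attributed to the campaign
    revenue: float      # revenue attributed to the campaign


def cost_per_acquisition(result):
    """Spend divided by acquisitions; None if nobody was acquired."""
    if result.acquisitions == 0:
        return None
    return result.spend / result.acquisitions


def return_on_investment(result):
    """(revenue - spend) / spend, expressed as a fraction."""
    if result.spend == 0:
        raise ValueError("ROI is undefined when spend is zero")
    return (result.revenue - result.spend) / result.spend


# Illustrative numbers only, not real campaign data.
results = [
    CampaignResult("weekend_gamers", spend=1000.0, acquisitions=50, revenue=1500.0),
    CampaignResult("lapsed_players", spend=800.0, acquisitions=10, revenue=600.0),
]

for r in results:
    print(r.segment, cost_per_acquisition(r), return_on_investment(r))
```

Reporting the two numbers side by side for each segment is what makes it obvious when a campaign acquires customers cheaply in one group while losing money in another.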


The last thing you want to do as a Data Scientist is confidently tell your stakeholders what kind of segments your clustering algorithm will output. You might have a clue based on the features you think you can engineer from the data, but sometimes features are just not as differentiating as you thought they would be. It is always best to err on the side of caution when promising your stakeholders certain ML solutions.

In my experience, segmentation projects are largely driven not by the type of clustering algorithm you decide to use (which can still matter for computation, efficiency, and inference time) but by feature selection and engineering. You have probably heard it a lot in the data science industry: business acumen and expertise in the industry you work in will make you 10x better at delivering tangible business solutions than deploying fancy ML algorithms just for the sake of it.
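As one illustration of what feature engineering can look like here, the sketch below turns a raw transaction log into recency, frequency, and monetary (RFM) features per customer. RFM is a common starting point that I am using as an example, not something this article prescribes, and the transaction data and the `rfm_features` helper are hypothetical.

```python
from collections import defaultdict
from datetime import date

# Hypothetical transaction log: (customer_id, purchase_date, amount).
transactions = [
    ("c1", date(2023, 1, 5), 20.0),
    ("c1", date(2023, 3, 1), 35.0),
    ("c2", date(2022, 11, 20), 200.0),
    ("c3", date(2023, 2, 14), 15.0),
    ("c3", date(2023, 2, 28), 10.0),
    ("c3", date(2023, 3, 10), 5.0),
]


def rfm_features(rows, as_of):
    """Build recency/frequency/monetary features for each customer."""
    grouped = defaultdict(list)
    for customer_id, purchase_date, amount in rows:
        grouped[customer_id].append((purchase_date, amount))

    features = {}
    for customer_id, purchases in grouped.items():
        last_purchase = max(d for d, _ in purchases)
        features[customer_id] = {
            # Days since the most recent purchase.
            "recency_days": (as_of - last_purchase).days,
            # Number of purchases in the log.
            "frequency": len(purchases),
            # Total amount spent.
            "monetary": sum(amount for _, amount in purchases),
        }
    return features


features = rfm_features(transactions, as_of=date(2023, 3, 31))
for customer_id, row in sorted(features.items()):
    print(customer_id, row)
```

Whether these three features separate your customers well depends entirely on the business, which is the point: a feature table like this is where industry knowledge goes in, long before any clustering algorithm sees the data.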

If you are a new Data Scientist/ML engineer, learn about your company's industry. Every project you embark on will start to make more sense, because you will see which problems can be solved using ML and know the right kind of data to deliver those solutions.


New data scientists entering the field are often quick to implement the fancy new algorithm and chase the best metric score for whatever algorithm they choose. All of that is important, and we all want robust models that generalize to unseen data, but think for a second: is this model solving the business problem at hand? For a supervised learning task, I have my labels to predict, but what is the composition of those labels, and why am I predicting them? For unsupervised learning tasks, why are my data points clustered this way? Is there a story these clusters are trying to tell? Can I go deeper into the segments created and tell a better story?

The idea is to have some answers to these questions before you start training your model. The data will guide you, but it's essential to understand the business problem you are trying to solve and to remember that a simple, explainable solution is king.


A/B testing is the most popular statistical test in product and marketing, and for good reason. There are many pitfalls in taking numbers generated in business at face value. It is your job as a Data Scientist to speak up when stakeholders don't adhere to the statistics of experiment results or fall into the trap of false positives.

There are many pitfalls out there, but a common one is spurious correlation. A spurious correlation is a relationship between two variables that is actually driven by a third, outside variable (sometimes one you are unaware of). You have heard the popular saying "Correlation does not imply causation". Funnily enough, it gets ignored a lot in the media: "because this big event happened, it led to this other big event happening", either because both events were trending in the same direction or because they happened so close to each other in time. Sometimes it's true, but most of the time it's not. The media can get away with it most of the time, but can a business afford to imply causation without proper experimentation and testing? I think not! It is important to know that correlation, although it measures the strength of a linear relationship between two variables, does not imply that one causes the other.
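A small simulation makes the idea tangible. In this sketch a hidden variable drives two others, which then look strongly correlated even though neither causes the other. All the data is synthetic, and the variable names (`hype`, `subscribers`, `viewing`) are my own illustrative choices. Controlling for the third variable with a partial correlation makes the apparent relationship disappear.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def partial_correlation(xs, ys, zs):
    """Correlation between xs and ys after controlling for zs."""
    r_xy, r_xz, r_yz = pearson(xs, ys), pearson(xs, zs), pearson(ys, zs)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


rng = random.Random(42)  # fixed seed so the run is repeatable
n_days = 5000

# Synthetic data: a hidden driver plus independent noise on each series.
hype = [rng.gauss(0, 1) for _ in range(n_days)]
subscribers = [h + rng.gauss(0, 0.5) for h in hype]
viewing = [h + rng.gauss(0, 0.5) for h in hype]

raw = pearson(subscribers, viewing)
controlled = partial_correlation(subscribers, viewing, hype)
print(f"raw correlation: {raw:.2f}")
print(f"after controlling for hype: {controlled:.2f}")
```

In practice the hard part is that you often do not have the third variable in your data at all, which is why knowing your industry matters as much as knowing the formula.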

Say, for example, a sports streaming platform recently had a big jump in the number of subscribers and, at the same time, a big jump in viewing figures. It is tempting to conclude that the former caused the latter. It is a reasonable explanation, after all, and music to the ears of stakeholders. So they decide to roll out a fancy marketing and advertising campaign to get even more subscribers. After all that money is spent, they notice that viewing figures have remained stagnant or even decreased, even though they got more subscribers. They hire a data scientist to find the real cause. It turns out that the least watched sport on the platform had two historic teams with big fan bases meeting in the final of a competition for the first time, and there was a huge spike in engagement they never accounted for initially (another case of not knowing thy industry).

That was one big pitfall. Sometimes that happens and people overlook things. But when you control the setup, as you do in A/B testing, you should aim to avoid pitfalls such as the one above, along with failing to randomize the control and treatment groups, not using the right statistical test for the distribution of your data, not setting the significance level (alpha) before carrying out your experiment, not calculating the sample size/power of your experiment and, most importantly, not defining the null and alternative hypotheses. I am sure there are many more I did not mention.
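The checklist above can be sketched in a few lines for the simplest case, comparing two conversion rates. This is a minimal illustration under assumptions of my own: a two-sided two-proportion z-test, a standard normal-approximation sample size formula, alpha fixed at 0.05 and power at 0.8 before looking at any data, and made-up conversion counts. The function names are hypothetical. The null hypothesis is that both groups convert at the same rate, and the alternative is that they differ.

```python
import math
import random
from statistics import NormalDist

ALPHA = 0.05   # significance level, chosen before the experiment
POWER = 0.80   # desired probability of detecting a true effect


def sample_size_per_group(p_base, p_target, alpha=ALPHA, power=POWER):
    """Approximate users needed per group to detect p_base -> p_target."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = p_base * (1 - p_base) + p_target * (1 - p_target)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p_target - p_base) ** 2)


def assign_groups(user_ids, seed=0):
    """Randomly split users into control and treatment groups."""
    shuffled = list(user_ids)
    random.Random(seed).shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test of H0: both groups share one conversion rate."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / std_err
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# How many users per group to detect a lift from 10% to 12%?
needed = sample_size_per_group(0.10, 0.12)
print("users needed per group:", needed)

control, treatment = assign_groups(range(2000), seed=7)

# Made-up results: 100/1000 conversions in control, 130/1000 in treatment.
z, p_value = two_proportion_z_test(100, 1000, 130, 1000)
print(f"z = {z:.2f}, p = {p_value:.3f}, reject H0: {p_value < ALPHA}")
```

The z-test is only appropriate for binary outcomes with reasonably large samples. For skewed metrics such as revenue per user, a different test would be the right choice, which is exactly the "right statistical test for the distribution of your data" point above.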

If you would like to know more about A/B testing, this link will get you started on the basics of what it entails.


  • Customer segmentation is important for a customer-centric/product-focused business.
  • Simple and explainable solutions are always preferred.
  • Know thy industry.
  • Avoid pitfalls in data analysis and testing.