Reinforcement Learning Empowers Retailers
Part I. Communication Chains Optimization Issue
The T1A Advanced Analytics team develops ML solutions for applied industries such as retail, banking, and telecommunications, where many problems still call for better solutions. One of them, covered in this article, is the optimization of customer communication chains with Reinforcement Learning (RL).
First things first, let’s define a few terms.
CRM stands for customer relationship management. As a rule, CRM comprises the process of collecting and analyzing customer knowledge in order to increase sales and improve the level of service.

A Customer is an entity that uses the services of an organization.

Customer Attributes are items of customer knowledge. For example:
  • Average bill
  • Frequency of monthly purchase
  • Age
  • Residence.

Marketing Campaign / Communication / Offering is a promotional offer made to customers by an organization. For example:

  • You have got XXX reward points valid until YYY.
  • You have got XXX discount on YYY brand products.

Communication Chain is a sequence of marketing campaigns.

Loyalty Program is a combination of marketing efforts aimed at increasing customer value. A discount card is a typical example.

Customer Clustering divides customers into groups based on similar behavior.

A Recommendation System is an engine that generates the best possible offers for customers, in terms of business value.

LTV (lifetime value) is the predicted profit attributed to the entire future relationship with a customer.
When an analyst develops a loyalty program, the common expectation is that the key objective is to create an excellent recommendation system, one that knows exactly what the customer wants, when, and how much. Indeed, this is important, and it does generate some profit, but it is not the core objective of the business. First and foremost, every company wants its customers to make a habit of using its services. An ideal customer uses only this company’s services, consistently generates profit, recommends the company’s services to friends, and requires minimal spending from the company. Customer loyalty is not an instant gain. Thus, the company’s objective is to lead the customer from the first order to regular purchases in the most efficient way.

Let us imagine a school class where the teacher has to do more than explain a rule or an algorithm to the pupils: the teacher also has to encourage a love for learning and for the subject. Experienced teachers know that studying is not always a joy. Sometimes it hurts. But what really matters is the final result. Thus, the teacher develops a personal approach to every pupil, taking into account multiple individual factors.

Unlike a small school class, a company may have tens of millions of customers, and every customer has to be taken by the hand and led to the required state. It is not enough to satisfy the customer’s wishes just once. This task obviously goes beyond human capabilities.

The problem setting is as follows:
  • The goal is to drive the customer to some ideal state (e.g., high LTV or regular response to marketing campaigns). Let’s assume that we can clearly determine whether the customer is in the ideal state or not.
  • We have some customer knowledge derived from the customer’s transaction history, registration form, etc.
  • We can influence customer behavior through marketing communications that encourage certain activities.
Thus, our task is to find the communication chain that best drives the customer to the ideal state. Every time we decide on a marketing campaign, we take into account the customer knowledge currently available (ref. i.2). Since every customer has individual characteristics, the best communication chain differs from one customer to another.
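To make the setting concrete, here is a minimal Python sketch of it. Everything in it is a hypothetical illustration rather than our actual implementation: the two customer attributes, the campaign names, the "ideal state" threshold, the toy response model, and the hand-written rule that stands in for a decision policy. It only shows how a communication chain emerges from repeated decisions based on the customer knowledge available at each step.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Customer:
    # Hypothetical customer attributes (a tiny subset of customer knowledge).
    average_bill: float
    purchases_per_month: int


# A policy maps the current customer knowledge to the next campaign.
Policy = Callable[[Customer], str]


def is_ideal(customer: Customer) -> bool:
    # Assumption: the "ideal state" is at least 4 purchases per month.
    return customer.purchases_per_month >= 4


def toy_response(customer: Customer, campaign: str) -> Customer:
    # Toy stand-in for real customer behavior, purely illustrative:
    # low-bill customers react to discounts, high-bill ones to reward points.
    reacts = (
        (campaign == "brand_discount" and customer.average_bill < 50)
        or (campaign == "reward_points" and customer.average_bill >= 50)
    )
    if reacts:
        return Customer(customer.average_bill, customer.purchases_per_month + 1)
    return customer


def rule_policy(customer: Customer) -> str:
    # Hand-written rule standing in for a learned decision policy.
    return "brand_discount" if customer.average_bill < 50 else "reward_points"


def run_chain(customer: Customer, policy: Policy, max_steps: int = 10) -> List[str]:
    # Send campaigns one by one until the customer reaches the ideal state
    # or the step budget runs out; the list of sent campaigns is the chain.
    chain: List[str] = []
    for _ in range(max_steps):
        if is_ideal(customer):
            break
        campaign = policy(customer)
        chain.append(campaign)
        customer = toy_response(customer, campaign)
    return chain


if __name__ == "__main__":
    print(run_chain(Customer(average_bill=30.0, purchases_per_month=1), rule_policy))
    print(run_chain(Customer(average_bill=80.0, purchases_per_month=3), rule_policy))
```

In this toy world, the two customers end up with different chains, and a policy that ignores customer attributes (for example, always sending reward points) never moves the low-bill customer forward. This is the gap that a learned policy is meant to close.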
Our solution to this task is based on Reinforcement Learning. However, before we get down to our approach, let’s take a short tour of the theory.