Intelligent Agents for Analytics
Whether you are in marketing, web analytics, data science, or even building a Lean Startup, you are probably on board with the importance of analytical decision-making. Go to any related conference, blog, or meetup and you will hear at least one of the following terms: Optimization, AB & Multivariate Testing, Behavioral Targeting, Attribution, Predictive Analytics, LTV … the list just keeps growing. There are so many terms, techniques, and next big things that it is no surprise that things start to get a little confusing.
Often I hear people talk about AB Testing when they really mean Optimization. Or I might hear someone ask if behavioral targeting is better than testing. Part of the problem, I believe, is that folks tend not to have a general way to think about how the concepts above fit together. Each new concept is presented as a separate idea or approach – leaving the impression that there is just a jumbled collection of methods in the online analyst’s toolbox – when really, they are best thought of as related parts of a larger framework.
In fact all of the methods/ideas above can be recast as components of a simple, yet powerful framework borrowed from the field of Artificial Intelligence, the intelligent agent.
Of course the intelligent agent is not my idea. I came across it while I was in grad school for AI. The IA approach is used as the guiding principle in Russell and Norvig’s excellent AI text Artificial Intelligence: A Modern Approach – it’s an awesome book, and I recommend anyone who wants to learn more to go get a copy or check out their online AI course.
Why should you care?
Well, personally, I have found that by thinking about analytics problems as intelligent agents, I can instantly see how each of these concepts relates to the others and apply them most effectively, individually or in concert. Intelligent Agents are a great way to organize your analytics toolbox, letting you grab the right tool at the right time. Additionally, since the conceptual focus of an agent is to figure out what action to take, the approach is oriented toward goals and actions rather than data collection and reporting.
The Intelligent Agent
So what is an intelligent agent? You can think of an agent as an autonomous entity that takes actions in an environment in order to achieve some sort of goal. If that sounds simple, it is – but don’t let that simplicity fool you into thinking it is not powerful.
An example of an agent is the Roomba – a robot for vacuuming floors. The Roomba’s environment is the room/floor it is trying to clean. It wants to clean the floor as quickly as possible. Since it doesn’t have an internal map of your room, it needs to use sensors to observe bits of information about the room, which it can use to build an internal representation of it. To do this, it takes some time at first to learn the outline of your room in order to figure out the most efficient way to clean.
Let’s take a look at the basic components of the agent and its environment, and walk through the major elements.
First off, we have both the agent, on the left, and its environment. You can think of the environment as where the agent ‘lives’ and goes about its business of trying to achieve its goals.
What are Goals and Reward?
The goals are what the agent wants to achieve, what it is striving to do. Often, agents are set up so that the goals have a value, and when the agent achieves a goal, it gets a reward based on that value. If the goal of the agent is to increase online sales, the reward might be the value of the sale.
Given that the agent has a set of goals and allowable actions, the agent’s task is to learn what actions to take based on its observations of the environment – so what it ‘sees’, ‘hears’, ‘feels’, etc. Assuming the agent is trying to maximize the total value of its goals over time, then it needs to select the action that maximizes this value, given the observations.
So how does the agent determine how to act based on what it observes? The agent accomplishes this by taking the following basic steps:
- Observe the environment to determine its current situation.
- Refer to its internal model of the environment to select an action from the collection of allowable actions.
- Take the action.
- Observe the environment to determine its new situation.
- Evaluate the ‘goodness’ of its new situation – did it reach a goal? If not, does it seem closer to or further from reaching a goal than before it took the action?
- Update its internal model of how taking that action ‘moved’ it in the environment and whether it helped it reach, or get closer to, a goal.
By repeating this process, the agent’s internal model of the environment continuously improves and better approximates the true environment.
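The loop above can be sketched in a few lines of code. This is a minimal, hypothetical sketch – the actions, conversion rates, and reward function are invented for illustration, and the action selection here is naively greedy (the explore vs. exploit trade-off discussed in a moment shows why a real agent needs more than this).

```python
import random

ACTIONS = ["A", "B"]

# Internal model: estimated value of each action, plus how often it was tried.
model = {a: {"value": 0.0, "n": 0} for a in ACTIONS}

def select_action(model):
    # Step 2: consult the internal model; greedily pick the highest-value
    # action, breaking ties at random.
    best = max(model.values(), key=lambda s: s["value"])["value"]
    return random.choice([a for a, s in model.items() if s["value"] == best])

def environment(action):
    # Hypothetical environment: action "B" converts slightly more often.
    return 1.0 if random.random() < (0.10 if action == "A" else 0.12) else 0.0

def update(model, action, reward):
    # Steps 5-6: evaluate the outcome and fold it into the model
    # as a running average of observed reward.
    s = model[action]
    s["n"] += 1
    s["value"] += (reward - s["value"]) / s["n"]

for _ in range(1000):
    action = select_action(model)   # steps 1-2: observe and select
    reward = environment(action)    # steps 3-4: act, observe the outcome
    update(model, action, reward)   # steps 5-6: evaluate and update
```

With each pass through the loop, the running averages in `model` become a better approximation of the true conversion rates – exactly the "continuously improving internal model" described above.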
Learning and Control
The intelligent agent has two interrelated tasks – to learn and to control. In fact, all online testing and behavioral targeting tools can be thought of as being composed of these two primary components, a learning/analysis component and a controller component. The controller makes decisions about what actions the application is to take. The learner’s task is to make predictions on how the environment will respond to the controller’s actions. Ah, but we have a bit of a problem. The agent’s main objective is to get as much reward as possible. However, in order to do that, it needs to figure out what action to take in each environmental situation.
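The learner/controller split can be made concrete with a small sketch. The class names and structure here are my own illustration, not taken from any particular testing tool: the learner predicts how the environment responds to each action, and the controller consults those predictions to decide what to do.

```python
class Learner:
    """Predicts how the environment will respond to each action."""
    def __init__(self, actions):
        self.stats = {a: {"reward": 0.0, "n": 0} for a in actions}

    def observe(self, action, reward):
        # Accumulate evidence about this action's outcomes.
        s = self.stats[action]
        s["n"] += 1
        s["reward"] += reward

    def predict(self, action):
        # Predicted value: average reward seen so far (0.0 if untried).
        s = self.stats[action]
        return s["reward"] / s["n"] if s["n"] else 0.0

class Controller:
    """Decides which action to take, consulting the learner."""
    def __init__(self, learner):
        self.learner = learner

    def act(self):
        actions = list(self.learner.stats)
        return max(actions, key=self.learner.predict)
```

The controller here is fully greedy, which is exactly where the trouble starts: to trust the learner's predictions, the controller first has to have tried each action enough times.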
Explore vs. Exploit
The intelligent agent will need to try out each of the possible actions in order to determine the optimal solution. Of course, to achieve the greatest overall success, poorly performing actions should be taken as infrequently as possible. This leads to an inherent tension between the desire to select the high value action against the need to try seemingly sub-optimal but under explored actions. This tension is often referred to as the “Explore vs. Exploit” trade-off and is a part of optimizing in uncertain environments. Really, what this is getting at is that there are Opportunity Costs to Learn (OCL).
To provide some context for the explore/exploit trade-off, consider the standard A/B approach to optimization. The application runs the A/B test by first randomly exposing different users to the A/B treatments. This initial period, where the application is gathering information about each treatment, can be thought of as the exploration period. Then, after some statistical threshold has been reached, one treatment is declared the ‘winner’ and is thus selected to be part of the default user experience. This is the exploit period, since the application is exploiting what it has learned in order to provide the optimal user experience.
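The two-phase pattern looks like this in code. A hedged sketch only: the conversion rates, traffic volume, and the crude fixed-sample stopping rule are all invented – a real test would use a proper statistical threshold, not a hard-coded visitor count.

```python
import random

# Hypothetical true conversion rates (unknown to the application).
TRUE_RATES = {"A": 0.10, "B": 0.12}

def serve(treatment):
    # Simulate showing the treatment and observing a conversion.
    return random.random() < TRUE_RATES[treatment]

counts = {"A": [0, 0], "B": [0, 0]}  # [conversions, visitors]

# EXPLORE: split traffic 50/50 for a fixed sample (a stand-in for a
# real statistical stopping rule).
for _ in range(5000):
    t = random.choice(["A", "B"])
    counts[t][0] += serve(t)
    counts[t][1] += 1

# EXPLOIT: declare the higher-converting treatment the winner and send
# all remaining traffic to it.
winner = max(counts, key=lambda t: counts[t][0] / counts[t][1])
```

Every visitor sent to the losing treatment during the explore phase is the opportunity cost to learn in action.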
AB/Multivariate Testing Agent
In the case of AB Testing, both the learning and controller components are fairly unsophisticated. The controller selects actions by just picking one of them at random. If you are doing a standard AB-style test, then the controller picks from a uniform distribution – all actions have an equal chance of selection.
The learning component is essentially just a report or set of reports, perhaps calculating significance tests. Often there is no direct communication from the learning module to the controller. In order to take advantage of the learning, a human analyst is required to review the reporting and then, based on the results, make adjustments to the controller’s action selection policy. Usually this means that the analyst will declare one of the test options the ‘winner’ and remove the rest from consideration. So AB Testing can be thought of as a method for the agent to determine the value of each action.
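To make the "learning component as a report" idea concrete, here is a sketch of the kind of significance calculation such a report might contain: a two-proportion z-test comparing the conversion rates of two treatments. The visitor and conversion counts are made up for illustration, and a real analyst would of course check the test's assumptions before acting on it.

```python
from math import sqrt, erf

def z_test(conv_a, n_a, conv_b, n_b):
    """Two-proportion z-test on conversion counts for treatments A and B."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    # Pooled conversion rate under the null hypothesis (no difference).
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value via the standard normal CDF.
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return z, p_value

# Illustrative numbers: 10% vs. 13% conversion on 1,000 visitors each.
z, p = z_test(conv_a=100, n_a=1000, conv_b=130, n_b=1000)
```

If `p` falls below the analyst's chosen threshold (say 0.05), treatment B gets declared the winner and the controller's policy is updated by hand.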
I just want to quickly point out, however, that the AB Testing with analyst approach is not the only way to go about determining and selecting best actions. There are alternative approaches that try to balance learning (exploration) and optimization (exploitation) in real time. They are often referred to as adaptive learning and control. For adaptive solutions, the controller is made ‘aware’ of the learner and is able to autonomously make decisions based on the most recent ‘beliefs’ about the effectiveness of each action. This approach requires that the information stored in the learner be made accessible to the controller component. We will see a bit of this when we look at Multi-armed Bandits in an upcoming post.
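One of the simplest adaptive schemes is epsilon-greedy (my choice of example here – not the only adaptive method, and the rates below are invented): on each request the controller reads the learner's current value estimates directly, exploiting the best-looking action most of the time while still exploring at random a small fraction of the time.

```python
import random

EPSILON = 0.1  # fraction of traffic reserved for exploration
ACTIONS = ["A", "B"]

# Shared state: the learner's beliefs, readable by the controller.
estimates = {a: {"value": 0.0, "n": 0} for a in ACTIONS}

def choose():
    # Controller: consult the learner's current beliefs on every request.
    if random.random() < EPSILON:
        return random.choice(ACTIONS)              # explore
    best = max(estimates.values(), key=lambda s: s["value"])["value"]
    return random.choice(
        [a for a, s in estimates.items() if s["value"] == best])  # exploit

def record(action, reward):
    # Learner: fold the new observation into a running average.
    s = estimates[action]
    s["n"] += 1
    s["value"] += (reward - s["value"]) / s["n"]

for _ in range(2000):
    a = choose()
    record(a, 1.0 if random.random() < (0.10 if a == "A" else 0.15) else 0.0)
```

No analyst in the loop: the split of traffic shifts toward the better action automatically as the estimates sharpen.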
Behavioral Targeting
Maybe you call it targeting, or segmentation, or personalization, but whatever you call it, the idea is that different folks get different experiences. In the intelligent agent framework, targeting is really just about specifying the environment that the agent lives in.
Let’s revisit the AB Testing agent, but add some user segments to it.
You can see the segmented agent differs in that its environment is a bit more complex. Unlike before, where the AB Test agent just needed to be aware of the conversions (reward) after taking an action, it now also needs to ‘see’ which user segment it is dealing with.
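In code, the only real change is the shape of the internal model: instead of one value per action, the table is keyed by (segment, action), so each segment can learn its own winner. The segment names and conversion rates below are invented for illustration.

```python
import random

SEGMENTS = ["new_visitor", "returning"]
ACTIONS = ["A", "B"]

# One cell per (segment, action) pair instead of one per action.
table = {(s, a): {"value": 0.0, "n": 0} for s in SEGMENTS for a in ACTIONS}

def select(segment):
    # Plain AB-style controller: uniform random within the observed segment.
    return random.choice(ACTIONS)

def update(segment, action, reward):
    cell = table[(segment, action)]
    cell["n"] += 1
    cell["value"] += (reward - cell["value"]) / cell["n"]

for _ in range(1000):
    seg = random.choice(SEGMENTS)   # the agent now 'sees' the segment too
    act = select(seg)
    # Hypothetical rates: 'B' wins for new visitors, 'A' for returning users.
    rate = {("new_visitor", "B"): 0.2, ("returning", "A"): 0.2}.get((seg, act), 0.1)
    update(seg, act, 1.0 if random.random() < rate else 0.0)
```

Notice the action selection logic is untouched – only the environment (and hence the model's keys) got richer.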
Targeting or Testing? It is the Wrong Question
Notice that with the addition of segment based targeting, we still need to have some method of determining what actions to take. So targeting isn’t an alternative to testing, or vice versa. Targeting is just when you use a more complex environment for your optimization problem. You still need to evaluate and select the action. In simpler targeting environments, it might make sense to use the AB Testing approach as we did above. Regardless, Targeting and Testing shouldn’t be confused as competing approaches – they are really just different parts of a more general problem.
Ah, well you may say, ‘Hey, that is just AB Testing with Segments, not behavioral targeting. Real targeting uses fancy math – it is a totally different thing.’ Actually, not really. Let’s look at another targeting agent, but this time instead of a few user segments, we have a bunch of user features.
Now the environment is made up of many individual bits of information, such that there could be millions or even billions of possible unique combinations. Hmm, it is going to get a little tricky to run your standard AB-style test here. There are too many possible micro-segments to just enumerate them all in a big table, and even if you did, you wouldn’t have enough data to learn from, since most of the combinations would have at most one user.
That isn’t too much of a problem actually, because rather than setting up a big table, we can use approximating functions to represent the mapping from observed features to the value of each action.
Function Mapping Observed Features to Actions
Not only does function approximation reduce the size of the internal representation, but it also allows us to generalize to observations that the agent has not come across before. We are also free to pick whatever functions, models, etc. we want here. How we go about selecting and calculating these relationships is often the domain of Predictive Analytics.
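As one concrete (and deliberately tiny) example of the idea – my illustration, not a prescription – here is a hand-rolled logistic model per action, trained by stochastic gradient descent, standing in for the big table. Each model maps a user feature vector to a predicted conversion probability for its action.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

class LogisticValueModel:
    """Approximates P(conversion | features) for a single action."""
    def __init__(self, n_features, lr=0.1):
        self.w = [0.0] * n_features  # one weight per user feature
        self.lr = lr

    def predict(self, x):
        return sigmoid(sum(wi * xi for wi, xi in zip(self.w, x)))

    def update(self, x, reward):
        # One stochastic gradient step on the log-loss.
        err = reward - self.predict(x)
        self.w = [wi + self.lr * err * xi for wi, xi in zip(self.w, x)]

# One model per action; the agent picks the action whose model
# predicts the highest value for the observed features.
models = {"A": LogisticValueModel(3), "B": LogisticValueModel(3)}

def best_action(features):
    return max(models, key=lambda a: models[a].predict(features))
```

A handful of weights per action replaces billions of table cells, and `predict` gives a sensible answer even for feature combinations the agent has never seen – that is the generalization payoff.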
Ah, but we still have to figure out how to select the best action. The exploration/exploitation tradeoff hasn’t gone away. If we didn’t care about the opportunity costs to learn, then we could try all the actions out randomly for a time, train our models and then switch off the learning and apply the models. Of course there is a cost to learn, which is why Google, Yahoo! and other Ad targeting platforms, spend quite a bit of time and resources trying to come up with sophisticated ways to learn as fast as possible.
This post is probably way too long as it is, so let me just summarize a few points.
- Most online learning problems can be reformulated as intelligent agent problems.
- Optimization – is the discovery of the best action for each observation of the environment in the least amount of time. In other words, optimization should take into account the opportunity cost to learn.
- Testing – either AB or Multivariate, is just one way, of many, to learn the value of taking each action in a given environment.
- Targeting – is really just specifying the agent’s environment. Efficient targeting provides the agent with just enough detail so that the agent can select the best actions for each situation it finds itself in.
- Predictive Analytics – covers how to specify which internal models to use and how to best establish the mapping between the agent’s observations and the actions. This allows the agent to predict what the outcome will be for each action it can take.
I didn’t get to talk about attribution and LTV. I will save that for another post since this post is already long, but in a nutshell, you just need to extend the agent to handle sequential decision processes.
Please comment and let me know what you think.