The Holy Grail of mobile apps: How to predict customer LTV

Predictive LTV - Fry shut up and take my money

Martin Colaco will be giving a talk on “Feature Extraction for Predictive LTV Modeling using Hadoop, Hive, and Cascading” this Wednesday, April 10 at 6 p.m. at Kontagent’s San Francisco office at 201 Mission St., 25th Floor.

In the world of mobile applications, the ability to predict a customer’s lifetime value (LTV) is akin to finding the Holy Grail of mobile analytics. Imagine if you knew how much your customers were going to spend in your app over their entire lifetime. It’d be an incredible business advantage, one with uses beyond just being able to accurately forecast your revenue for investors.

You’d be able to hone your marketing to target only the highest monetizing customers and increase your profits by tweaking your app to keep your big spenders engaged longer.

But the applications don’t stop there. Predictive analytics could also solve the litany of other problems plaguing the app market, like fraud. Heck, with the right data scientists and business analysts on hand, you could potentially even figure out a way to transform app pirates into paying customers.

Sounds great, right? Unfortunately, being able to accurately predict LTV at this level is incredibly challenging. But building the perfect predictive model might not be so far away, and the process starts with feature extraction.

What is feature extraction?

The process for predicting LTV (or any desired customer behavior) can be broken down into three basic steps:

  1. Feature extraction
  2. Building a predictive model
  3. Applying that model to predict future customer behaviors

In essence, feature extraction is the foundation of predictive analytics. It’s the first and arguably most vital step in this process because any mistakes here will snowball and lead to invalid predictions. Feature extraction works by examining a set of customer input data to identify the behaviors that lead a person to take a desired action.

By “extracting” these features from one customer data set, you can use them to develop a predictive model that can be applied to other data sets in order to forecast when those same behaviors will occur.

It sounds complicated, but it’s actually pretty intuitive.

Kontagent Predicting customer lifetime value - Halfbrick Games

By extracting features that lead to spending in one data set, you can build a model to predict LTV in another.

Let’s say you have a game app that’s been out for a year and you want to find out which of your newest customers are going to buy your virtual power-up. How would you figure this out?

Using feature extraction, you would first find all the customers who bought your power-up in the past. Once you’ve identified these spenders, you would then analyze their past behaviors in order to pinpoint and extract the actions that eventually led them to monetize (e.g., time played, level reached, customer age).
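
To make that concrete, here’s a minimal sketch of what extraction could look like in Python. The event log, the event names, and the `extract_features` helper are all made up for illustration; a real pipeline would read from your own analytics data and would likely track many more features.

```python
from collections import defaultdict

# Hypothetical raw event log: (customer_id, event_type, value).
# Event names and fields are assumptions made for this illustration.
EVENTS = [
    ("alice", "session", 30),      # minutes played in one session
    ("alice", "level_up", 4),      # level reached
    ("alice", "purchase", 0.99),   # bought the power-up (USD)
    ("bob", "session", 5),
    ("bob", "level_up", 1),
    ("alice", "session", 45),
    ("carol", "session", 60),
    ("carol", "level_up", 7),
    ("carol", "purchase", 0.99),
]


def extract_features(events):
    """Collapse a raw event log into one feature row per customer."""
    rows = defaultdict(
        lambda: {"minutes_played": 0, "max_level": 0, "bought": False}
    )
    for customer_id, event_type, value in events:
        row = rows[customer_id]
        if event_type == "session":
            row["minutes_played"] += value
        elif event_type == "level_up":
            row["max_level"] = max(row["max_level"], value)
        elif event_type == "purchase":
            row["bought"] = True
    return dict(rows)


features = extract_features(EVENTS)

# Step one from the text: find everyone who bought the power-up.
spenders = sorted(c for c, row in features.items() if row["bought"])
print(spenders)
```

Each customer ends up as a single row of numbers plus a "bought" flag, which is the shape most modeling techniques expect as input.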

By extracting these relevant features, you could then piece together the patterns of behavior (or models) customers exhibit prior to buying your power-up. And if your new customers are taking these same actions, there’s a good chance they’ll buy your power-up in the future, too.
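
Here’s one way that "build a model, then apply it" step might look, as a rough sketch. It uses logistic regression trained by plain gradient descent, which is just one possible technique and not necessarily what any production LTV system uses. The training rows, feature choices, and function names are all invented for this example.

```python
import math

# Hypothetical rows for customers whose outcome is already known:
# (hours_played, max_level, bought_power_up). All values are made up.
HISTORY = [
    (0.5, 1, 0), (1.0, 2, 0), (1.5, 2, 0), (2.0, 3, 0),
    (4.0, 5, 1), (5.0, 6, 1), (6.0, 8, 1), (7.0, 9, 1),
]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train(rows, epochs=2000, lr=0.05):
    """Fit weights [bias, w_hours, w_level] with batch gradient descent."""
    weights = [0.0, 0.0, 0.0]
    for _ in range(epochs):
        grad = [0.0, 0.0, 0.0]
        for hours, level, bought in rows:
            x = (1.0, hours, level)
            score = sum(w * xi for w, xi in zip(weights, x))
            error = sigmoid(score) - bought
            for i in range(3):
                grad[i] += error * x[i]
        # Step against the average gradient of the log loss.
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grad)]
    return weights


def predict(weights, hours, level):
    """Estimated probability that a customer buys the power-up."""
    return sigmoid(weights[0] + weights[1] * hours + weights[2] * level)


weights = train(HISTORY)

# Apply the model to two new (hypothetical) customers.
heavy_player = predict(weights, hours=6.5, level=8)
light_player = predict(weights, hours=0.5, level=1)
print(f"heavy: {heavy_player:.2f}, light: {light_player:.2f}")
```

The new customer who behaves like past buyers gets a higher score than the one who doesn’t. With only eight training rows this is a toy; the point is the shape of the workflow, not the numbers.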

The big data problem

The problem with feature extraction is that it gets far more difficult as the number of customers you’re extracting information from increases.

It’s the difference between finding similarities among a hundred people versus millions. Feature extraction works relatively easily when it comes to predicting LTV at a smaller scale because it’s possible to analyze every customer.

But when you’re dealing with the kind of big data sets common in the mobile domain, individually examining millions of customers just isn’t feasible. Moreover, you may also find that features that didn’t seem important at a smaller scale become critical at larger ones.
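
One common way around this is to split the work so that no single machine has to look at every customer. The sketch below imitates that map-and-reduce pattern in plain Python. It does not use the Hadoop, Hive, or Cascading APIs; the shards, names, and numbers are assumptions chosen only to show the idea.

```python
from collections import Counter
from functools import reduce

# Hypothetical event shards, as if the log were split across machines.
# Each record is (customer_id, minutes_played).
SHARDS = [
    [("alice", 30), ("bob", 5), ("alice", 45)],
    [("carol", 60), ("bob", 10)],
    [("alice", 15), ("carol", 20)],
]


def map_shard(shard):
    """Map step: total minutes per customer within a single shard."""
    partial = Counter()
    for customer_id, minutes in shard:
        partial[customer_id] += minutes
    return partial


def merge(left, right):
    """Reduce step: combine two partial results.

    Counter addition sums matching keys (and drops non-positive totals,
    which cannot occur here because minutes are always positive).
    """
    return left + right


# Each shard is summarized on its own, then the summaries are merged.
totals = reduce(merge, map(map_shard, SHARDS), Counter())
print(dict(totals))
```

Because each shard is summarized independently and merging doesn’t depend on order, the map step can run on as many machines as you have shards. That’s roughly what a GROUP BY over a distributed table does for you.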

Predictive LTV modeling - Suit and tie

“I guess I shouldn’t wear pajamas today since I’m going to a funeral.”

Remember, predicting customer LTV is, in essence, an attempt to predict human behavior. And people are fickle.

Think of how you decided what to wear this morning. What factors caused you to pick a sweater over a shirt, black shoes instead of brown, or a tie instead of a scarf? The possible influences are endless. Now imagine trying to create a formula that can be applied to the whole world. That’s what data scientists are facing in their quest to predict mobile customer LTV.

Of course, that’s not to say the task is impossible. While it may never be perfect, in the confined realm of apps it can come pretty darn close. Success depends largely on the data scientists themselves and their ability to interpret data and understand how people interact with technology.

Predicting LTV is as much about human psychology as it is about metrics, business and statistics. And there are a lot of breakthroughs in feature extraction happening every day that are getting us closer than ever to this Holy Grail of data analysis.

To learn more about feature extraction and how it’s being used to predict mobile customer LTV, attend Martin Colaco’s presentation on “Feature Extraction for Predictive LTV Modeling using Hadoop, Hive, and Cascading” this Wednesday, April 10 at 6 p.m. at Kontagent’s San Francisco office at 201 Mission St., 25th Floor.


About the author: Martin Colaco is head of Kontagent’s data science team. Prior to Kontagent, Martin served as a data consultant for the Denver Nuggets where he developed a game state matrix for analyzing win/loss probability. He received his bachelor’s degree and Ph.D. in chemical engineering from Stanford and UC Berkeley, respectively. Send him your thoughts and questions at
