The potential for mining cost-saving and revenue-boosting ideas from data is increasing as companies build bigger data warehouses, applications become more integrated, computers grow more powerful and vendors of analytic software introduce products that are easier to use.
But many companies that have made huge investments in terabyte-size data stores aren't using them effectively to forecast the future -- to predict, for example, which customers are likely to leave, which ones will probably respond to the next promotion, which ones are ripe for cross-selling and what will happen to sales if prices are increased by 5 percent.
While many of the products that can answer those questions use esoteric techniques such as neural networks, logistic regression and support-vector machines, they don't require a Ph.D. in math, users say. Indeed, the biggest stumbling block to using "predictive analytics" is getting the data, not analyzing it, they say.
That has been the case so far at BankFinancial in Chicago. It uses the Clementine data mining "workbench" from SPSS to develop models that predict customer behavior so the bank can, for example, more accurately target promotions to customers and prospects.
The bank uses Clementine's neural network and regression routines for these models. It's also beginning to use PredictiveMarketing, SPSS's new package of "best-practice templates" for helping users set up predictive models.
Models Easy, Data Hard
PredictiveMarketing will reduce the time it takes the bank to develop a model by 50 percent to 75 percent, says William Connerty, assistant vice president of market research. The first major application is a model to predict customer "churn," the rate at which customers come and go. It will be used to identify the customers most likely to leave the bank during the coming month.
The problem is, the model has access only to account information prepared from weekly and monthly summaries, not to the daily customer activity that would make it more timely. "The biggest obstacle is getting transaction data and dealing with disparate data sources," Connerty says.
The data that BankFinancial needs in order to assess customer loyalty comes from several bank systems and unintegrated customer survey databases. A lot of systems integration and interface work needs to be done before the bank will see the full fruits of its modeling tools, Connerty says.
"We need to increase our efficiency, our ability to deliver actionable information to decision-makers," he says. "I'm under a lot of pressure to deliver."
KXEN (Knowledge Extraction Engines), an analytic software company in San Francisco, is another vendor that has heard users' cries for easier modeling. It claims that its Analytic Framework product can greatly reduce the time it takes to define, develop and run a model. For example, KXEN's Consistent Coder module automatically transforms raw, inconsistent data into clean, uniformly formatted data that's ready for modeling.
"The big sweet spot for KXEN is it cuts data preparation time in half," says KXEN user Seymour Douglas, director of CRM and database marketing at Cox Communications in Atlanta. It also masks complexity, he says, "so you don't need a big-dollar statistician; you can put someone at a more junior level, because a lot of heavy lifting is done by the tool."
Cox, a cable services provider, uses KXEN's Analytic Framework to identify its most loyal and profitable customers, predict churn and forecast who might be most receptive to cross-selling pitches.
One model revealed that customers in apartments tend to be relatively short-term Cox customers. "So we now offer them product packages where we try to recover our investment quicker," Douglas says. "Without KXEN, that would not have been obvious at all."
But the labor-saving benefits of KXEN come at a stiff price, he says. "For a five-seat license, you'll pay about $360,000, plus an annual fee of about $60,000."
Robert Berry, president and CEO of Central Michigan University Research, says many companies have made huge investments in data warehouses but tend to use them more for analysis of past performance than for "predictive intelligence." One reason is they aren't organized for it, he says.
Berry says predictive modeling should involve collaboration among people who have IT, analytical and business expertise.
"You have to build a business-intelligence team," he says. "But companies are struggling with common issues like who owns it, who manages it and so on. How do you pull the business skills, the IT skills and the analytical skills across corporate silos and create this team? It's not easy."
Berry advises having the business-intelligence team report directly to a business unit. "It needs to have a definite link to corporate profits," he says.
Giving Ratings to Leads
Hewlett-Packard's Enterprise Systems Group pulls together people with diverse backgrounds and strong analytical skills -- including some people who also have IT skills -- for its group that does predictive modeling of customer behavior. The group is part of "CRM operations" under a vice president for sales, says Randy Collica, a senior business/data mining analyst. On a project-by-project basis, people from sales, marketing and other departments may participate, he says.
Collica says it's not necessary to have a professional mathematician on staff in order to do statistical modeling. "But you need some basic statistics," he says. "If someone says, 'This is a normal distribution,' you at least need to know what that means."
HP uses software from SAS Institute to mine its database of customers and prospects, using regression and other techniques to predict churn, loyalty and where to target promotions. HP also mines its huge stores of unformatted text information, conducting a kind of predictive analytics that's much less common.
HP has some 750GB of customer information, including data from premerger Compaq Computer that dates to 1984. It has customer data from its call centers, including e-mails from customers and prospects and text typed in during voice calls. Included in these call records are "lead ratings," call center personnel's assessments of a caller's readiness to buy -- coded as "hot," "warm" or "suspect."
But some records lack lead ratings, so HP has used SAS's Text Miner to predict the rating these customers should get. Text Miner does that by comparing text from unrated customers to "clusters" of text from rated customers that contain similar terms and concepts.
Text Miner works by preprocessing the raw text, transforming it into a grid, or matrix, that relates terms to documents. The matrix indicates the frequency of every term in the document collection. Specific bits of important information, such as customer names, are extracted and summarized.
Next, a mathematical technique called singular-value decomposition replaces the original matrix with a much smaller matrix by purging unimportant words and highlighting more relevant ones.
The new matrix can be used to place associated terms and documents into categories. HP helps standardize the matrix with synonym lists that say, for example, that customers calling about "disk drives" or "hard disks" are really all interested in storage.
Finally, clustering, classification and predictive methods are applied to the reduced data using traditional data mining techniques. HP uses "memory-based reasoning," a technique that makes a prediction about a record by comparing it with past records with similar characteristics.
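The pipeline Collica describes can be sketched in miniature. The call-center notes, ratings and dimension count below are invented for illustration; HP's actual SAS Text Miner workflow is far more elaborate:

```python
import numpy as np

# Hypothetical call-center notes with known lead ratings (made-up data).
docs = [
    "customer asking price quote server storage ready buy",   # hot
    "wants demo storage server budget approved",              # hot
    "general question warranty disk drive",                   # suspect
    "complaint disk drive noise warranty claim",              # suspect
]
ratings = ["hot", "hot", "suspect", "suspect"]

# 1. Build the term-by-document frequency matrix.
vocab = sorted({w for d in docs for w in d.split()})
A = np.array([[d.split().count(w) for w in vocab] for d in docs], dtype=float)

# 2. Singular-value decomposition: keep only the k strongest dimensions,
#    producing the "much smaller matrix" that drops unimportant terms.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2
docs_reduced = U[:, :k] * s[:k]          # each document as a k-dim vector

# 3. Memory-based reasoning: project an unrated record into the same
#    reduced space and copy the rating of its most similar past record.
new_doc = "caller requesting price quote for storage server"
v = np.array([new_doc.split().count(w) for w in vocab], dtype=float)
v_reduced = v @ Vt[:k].T

dists = np.linalg.norm(docs_reduced - v_reduced, axis=1)
predicted = ratings[int(np.argmin(dists))]
```

In a real deployment the matrix would hold thousands of terms, and the reduced space would keep dozens of dimensions rather than two.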
These techniques can predict the customer-lead rating with 85 percent accuracy, Collica says. "Without this technique, I'd have had to go back to the original records and actually read them," he explains. "And when you have that much volume, you can't read them all."
HP also intends to use the text mining and clustering techniques to find out what loyal customers tend to talk about when they contact an HP call center, as well as what's on the minds of those customers deemed least loyal. The goal, of course, is to win over the less loyal ones.
Collica says HP has yet to exploit a number of promising text data sources. For example, it will analyze the text in warranty claims to glean insights about problems customers are having with products and the text in warranty cards to better understand its customers.
HP will also try to mine information from customers' and prospects' own Web sites. "Web sites are a great source of wonderful information about your customers," Collica says.
Tool Box for Forecasting and Analysis
Regression: Fits a line to a set of historical data points by minimizing the sum of the squared distances from the data points to the line. For example, if the line expresses the relationship of independent variables such as age, sex and income to a dependent variable such as sales, then it defines an equation that can be used to forecast sales.
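A minimal least-squares fit, with made-up advertising and sales figures standing in for the historical data:

```python
import numpy as np

# Invented history: ad spend (independent) vs. sales (dependent).
ad_spend = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
sales    = np.array([2.1, 4.0, 6.2, 7.9, 10.1])

# Fit y = a*x + b by minimizing the sum of squared distances.
X = np.column_stack([ad_spend, np.ones_like(ad_spend)])
(a, b), *_ = np.linalg.lstsq(X, sales, rcond=None)

# The fitted equation can now forecast sales for a new spend level.
forecast = a * 6.0 + b
```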
Time Series Analyses
-- Moving average: Each new point in the time series is the average of some number of earlier consecutive data points, sometimes chosen to eliminate seasonal factors or other irregularities.
-- Exponential smoothing: Similar to the moving average, except more recent data points are given more weight.
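Both smoothers take only a few lines; the monthly sales series here is invented for illustration:

```python
# Invented monthly sales figures.
series = [100, 120, 110, 130, 125, 140]

# Moving average: each new point averages the last n observations.
n = 3
moving_avg = [sum(series[i - n:i]) / n for i in range(n, len(series) + 1)]

# Exponential smoothing: more recent points get more weight via alpha.
alpha = 0.5
smoothed = [series[0]]
for x in series[1:]:
    smoothed.append(alpha * x + (1 - alpha) * smoothed[-1])
```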
Memory-based reasoning: Sometimes called the "nearest neighbor method," it's an artificial intelligence technique that can forecast something by identifying the most similar past cases and applying that information to a new case.
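A bare-bones nearest-neighbor sketch; the customer attributes and outcomes are fabricated for illustration:

```python
# Past cases: (age, income) with a known outcome (made-up data).
history = [
    ((25, 30), "churn"),
    ((45, 80), "stay"),
    ((30, 35), "churn"),
    ((50, 90), "stay"),
]

def predict(case, k=3):
    # Rank past cases by similarity (squared Euclidean distance) and
    # let the k most similar cases vote on the outcome.
    ranked = sorted(history,
                    key=lambda h: sum((a - b) ** 2 for a, b in zip(h[0], case)))
    votes = [outcome for _, outcome in ranked[:k]]
    return max(set(votes), key=votes.count)
```

For a new case of (28, 32), the two closest past records are both "churn", so that outcome wins the vote.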
Artificial neural networks: Patterned after the human brain, they're composed of a large number of processing elements (neurons) tied together with weighted connections (synapses). They're trained by looking at real-world examples -- for example, historical sales data and the past values of variables that may influence sales. The training adjusts the weights, which store the data needed to solve specific problems, such as sales forecasting.
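A single trained neuron illustrates the idea in miniature; production networks have many layers of them, and the data here is invented:

```python
# One "neuron" learns to output 1 when x1 + x2 is large, else 0.
w1, w2, b = 0.0, 0.0, 0.0            # weights and bias, initially zero
data = [((0.2, 0.1), 0), ((0.9, 0.8), 1),
        ((0.1, 0.3), 0), ((0.7, 0.9), 1)]

for _ in range(200):                  # training adjusts the weights
    for (x1, x2), y in data:
        out = 1 if w1 * x1 + w2 * x2 + b > 0.5 else 0
        err = y - out                 # compare output to the real example
        w1 += 0.1 * err * x1          # nudge each weight toward the answer
        w2 += 0.1 * err * x2
        b  += 0.1 * err
```

After training, the adjusted weights store what the examples taught; feeding in a new case through the same weighted sum yields the forecast.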
Decision trees: Sequential decisions are drawn as branches of a tree, stemming from an initial decision point and branching out to multiple possible outcomes. The trees can be used to predict the most likely outcome and to forecast financial outcomes by multiplying the cost or return at each branch by the probability of that branch being taken.
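The branch-weighting arithmetic in its simplest form, with made-up payoffs and probabilities:

```python
# Two-branch decision: launch a product or not. Each branch's return
# is multiplied by the probability of that branch being taken.
launch = 0.6 * 500_000 + 0.4 * (-200_000)   # 60% success, 40% flop
dont_launch = 0.0                            # do nothing, earn nothing

best = max(launch, dont_launch)              # expected value favors launching
```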
-- Gary H. Anthes