Data Mining
Author: Mehmed Kantardzic

| Data-Mining Vendor | Address | Web Site/Phone Number |
| --- | --- | --- |
| MapInfo Corp. | 1 Global View, Troy, NY 12180, USA | www.mapinfo.com |
| Information Builders, Inc. | 1250 Broadway, 30th Floor, New York, NY 10001-3782, USA | Phone: 212-736-4433 |
| Prism Solutions, Inc. | 1000 Hamlin Court, Sunnyvale, CA 94089, USA | Phone: 408-752-1888 |
| Oracle Corp. | 500 Oracle Parkway, Redwood Shores, CA 94086, USA | Phone: 800-633-0583 |
| Evolutionary Technologies, Inc. | 4301 Westbank Drive, Austin, TX 78746, USA | Phone: 512-327-6994 |
| Information Advantage, Inc. | 12900 Whitewater Drive, Suite 100, Minnetonka, MN 55343, USA | Phone: 612-938-7015 |
| IntelligenceWare, Inc. | 55933 W. Century Blvd., Suite 900, Los Angeles, CA 90045, USA | Phone: 310-216-6177 |
| Microsoft Corporation | One Microsoft Way, Redmond, WA 98052, USA | Phone: 206-882-8080 |
| Computer Associates International, Inc. | One Computer Associates Plaza, Islandia, NY 11788-7000, USA | Phone: 516-342-5224 |

APPENDIX B
DATA-MINING APPLICATIONS

Many businesses and scientific communities are currently employing data-mining technology, and their number continues to grow as more data-mining success stories become known. Here we present a small collection of real-life examples of data-mining implementations from the business and scientific world. We also present some pitfalls of data mining to make readers aware that this process must be applied with care, and with knowledge of both the application domain and the methodology, to obtain useful results.

In the previous chapters of this book, we have studied the principles and methods of data mining. Since data mining is a young discipline with wide and diverse applications, there is still a serious gap between the general principles of data mining and the domain-specific knowledge required to apply it effectively. In this appendix, we examine a few application domains, illustrated by the results of data-mining systems that have been implemented.

B.1 DATA MINING FOR FINANCIAL DATA ANALYSIS

Most banks and financial institutions offer a wide variety of banking services, such as checking and savings accounts, business and individual customer transactions, investment services, credit, and loans. Financial data collected in the banking and financial industry are often relatively complete, reliable, and of high quality, which facilitates systematic data analysis and data mining to improve a company’s competitiveness.

In the banking industry, data mining is used heavily in modeling and predicting credit fraud, evaluating risk, performing trend analyses, analyzing profitability, and supporting direct-marketing campaigns. In the financial markets, neural networks have been used for forecasting stock prices, options trading, bond rating, portfolio management, commodity-price prediction, and mergers-and-acquisitions analysis; they have also been used to forecast financial disasters. Daiwa Securities, NEC Corporation, Carl & Associates, LBS Capital Management, Walkrich Investment Advisors, and O’Sullivan Brothers Investments are only a few of the financial companies that use neural-network technology for data mining. A wide range of successful business applications has been reported, although technical details are not always easy to retrieve. The number of investment companies and banks that mine data is far larger than this list, but they are rarely willing to be referenced; usually, they have policies against discussing it. Therefore, finding articles about banks that use data mining is not easy unless you look at the SEC reports of data-mining companies that sell their tools and services. There, you will find customers such as Bank of America, First USA Bank, Wells Fargo Bank, and U.S. Bancorp.

The widespread use of data mining in banking has not gone unnoticed. Bank Systems & Technology commented that data mining was the most important application in financial services in 1996. Fraud alone costs industries billions of dollars, so it is not surprising that systems have been developed to combat fraudulent activities in areas such as credit cards, the stock market, and other financial transactions. Fraud is an extremely serious problem for credit-card companies; Visa and MasterCard, for example, lost over $700 million to fraud in 1995. A neural network-based credit-card fraud-detection system implemented at Capital One has cut the company’s fraud losses by more than 50%. Several successful data-mining systems are described here to illustrate the importance of data-mining technology in financial institutions.

U.S. Treasury Department

Worth particular mention is a system called “FAIS,” developed by the Financial Crimes Enforcement Network (FINCEN) of the U.S. Treasury Department. FAIS detects potential money-laundering activities within a large number of big cash transactions. The Bank Secrecy Act of 1970 requires the reporting of all cash transactions greater than $10,000, and these transactions, about 14 million a year, are the basis for detecting suspicious financial activities. By combining user expertise with the system’s rule-based reasoner, visualization facilities, and association-analysis module, FAIS uncovers previously unknown and potentially high-value leads for possible investigation. The reports generated by FAIS have helped FINCEN uncover more than 400 cases of money-laundering activities, involving more than $1 billion in potentially laundered funds. In addition, FAIS is reported to discover criminal activities that law enforcement in the field would otherwise miss, for example, connections in cases involving nearly 300 individuals, more than 80 front operations, and thousands of cash transactions.
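The starting point for FAIS is a simple rule: only cash transactions above $10,000 are reported. The sketch below shows that filter followed by a crude linking step that groups the people who appear on reports for the same account. It is a minimal illustration under stated assumptions, not FAIS itself (which combines a rule-based reasoner, visualization, and association analysis with analyst expertise); the record fields `person`, `account`, and `amount` and the toy records are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass

REPORTING_THRESHOLD = 10_000  # cash transactions greater than $10,000 are reported


@dataclass
class CashTransaction:
    person: str    # hypothetical field: individual named on the report
    account: str   # hypothetical field: account involved in the transaction
    amount: float  # cash amount in dollars


def reportable(transactions):
    """Keep only the transactions that exceed the reporting threshold."""
    return [t for t in transactions if t.amount > REPORTING_THRESHOLD]


def link_by_account(transactions):
    """Map each account to the individuals on its reports, keeping only
    accounts shared by more than one person (candidate leads for an analyst)."""
    people_per_account = defaultdict(set)
    for t in transactions:
        people_per_account[t.account].add(t.person)
    return {acct: people for acct, people in people_per_account.items() if len(people) > 1}


# Hypothetical toy reports; E's transaction of exactly $10,000 is not reportable.
sample = [
    CashTransaction("A", "acct-1", 12_500),
    CashTransaction("B", "acct-1", 15_000),
    CashTransaction("C", "acct-2", 9_000),
    CashTransaction("D", "acct-3", 11_000),
    CashTransaction("E", "acct-3", 10_000),
]
leads = link_by_account(reportable(sample))
print({acct: sorted(people) for acct, people in leads.items()})  # {'acct-1': ['A', 'B']}
```

Real link analysis would connect reports through many shared attributes (addresses, businesses, agents), not only accounts, but the two stages, threshold filtering and then linking, are the same.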

Mellon Bank, USA

Mellon Bank has used data on existing credit-card customers to characterize their behavior and to predict what they will do next. Using IBM Intelligent Miner, Mellon developed a credit-card attrition model to predict which customers will stop using Mellon’s credit card in the next few months. Based on the prediction results, the bank can take marketing actions to retain these customers.

Capital One Financial Group

Financial companies are among the biggest users of data-mining technology. One such user is Capital One Financial Corp., one of the largest credit-card issuers in the United States. It offers 3000 financial products, including secured, joint, co-branded, and college-student cards. Using data-mining techniques, the company tries to market and sell the most appropriate financial product to each of the 150 million potential prospects stored in its Oracle-based data warehouse of more than 2 terabytes. Even after a customer has signed up, Capital One continues to use data mining to track the ongoing profitability and other characteristics of each customer. Data mining, together with other strategies, has helped Capital One grow from $1 billion to $12.8 billion in managed loans over 8 years. Fraud detection is another successful data-mining application at Capital One.

American Express

Another example of data mining is at American Express, where data warehousing and data mining are being used to cut spending. American Express has created a single Microsoft SQL Server database by merging its worldwide purchasing system, corporate purchasing card, and corporate-card databases. This allows American Express to find exceptions and patterns to target for cost cutting. One of the main applications is loan-application screening. American Express used statistical methods to divide loan applications into three categories: those that should definitely be accepted, those that should definitely be rejected, and those that required a human expert to judge. The human experts could correctly predict whether an applicant would default on the loan in only about 50% of the cases. Machine learning produced rules that were much more accurate, correctly predicting default in 70% of the cases, and these rules were immediately put into use.
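The three-way split used for loan screening amounts to two cut-offs on a risk score. The sketch below assumes that some model already outputs a probability of default for each application; the `screen` function and the threshold values 0.2 and 0.8 are hypothetical, chosen only for illustration, and are not taken from the American Express case.

```python
from enum import Enum


class Decision(Enum):
    ACCEPT = "accept"
    REJECT = "reject"
    REFER = "refer to human expert"


def screen(default_probability: float,
           accept_below: float = 0.2,
           reject_above: float = 0.8) -> Decision:
    """Route a loan application using two cut-offs on a model's default probability.

    The default thresholds are illustrative assumptions, not case-study figures.
    """
    if not 0.0 <= default_probability <= 1.0:
        raise ValueError("default_probability must lie in [0, 1]")
    if accept_below >= reject_above:
        raise ValueError("accept_below must be smaller than reject_above")
    if default_probability < accept_below:
        return Decision.ACCEPT   # clearly low risk
    if default_probability > reject_above:
        return Decision.REJECT   # clearly high risk
    return Decision.REFER        # borderline: needs further judgment


for p in (0.05, 0.5, 0.93):
    print(p, screen(p).value)
```

Applications that land between the two cut-offs form the third category, the one the text says human experts judged correctly in only about 50% of the cases; widening or narrowing that band trades expert workload against automatic decisions.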

MetLife, Inc.

MetLife’s Intelligent Text Analyzer was developed to help automate the underwriting of the 260,000 life-insurance applications the company receives every year. Automation is difficult because the applications include many free-form text fields. Keyword matching and simple parsing proved inadequate for understanding these fields, while full semantic natural-language processing was judged too complex and unnecessary. As a compromise, an “information-extraction” approach was used, in which the input text is skimmed only for specific information relevant to the particular application. The system currently processes 20,000 life-insurance applications a month, and 89% of the text fields it processes are reported to exceed the established confidence-level threshold.
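The information-extraction idea, skimming free text only for specific items and trusting a result only when it clears a confidence threshold, can be sketched with regular expressions. This is a minimal illustration, not MetLife’s Intelligent Text Analyzer: the field names, patterns, per-pattern confidence scores, and the 0.8 threshold are all hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass
class Extraction:
    field: str
    value: str
    confidence: float


# Hypothetical patterns: each targets one piece of underwriting information and
# carries an assumed confidence score for the matches it produces.
PATTERNS = [
    ("weight_lbs", re.compile(r"\b(\d{2,3})\s*(?:lbs?|pounds)\b", re.IGNORECASE), 0.95),
    ("tobacco", re.compile(r"\b(smok(?:es|er|ing)|cigarettes?)\b", re.IGNORECASE), 0.90),
    ("tobacco", re.compile(r"\b(tobacco)\b", re.IGNORECASE), 0.60),
]

CONFIDENCE_THRESHOLD = 0.8  # assumed cut-off; weaker results go to a human underwriter


def skim(text: str) -> list[Extraction]:
    """Scan free text for the targeted fields only, ignoring everything else."""
    found = []
    for field, pattern, confidence in PATTERNS:
        for match in pattern.finditer(text):
            found.append(Extraction(field, match.group(1), confidence))
    return found


def split_by_confidence(extractions):
    """Separate extractions that clear the threshold from those needing review."""
    accepted = [e for e in extractions if e.confidence >= CONFIDENCE_THRESHOLD]
    review = [e for e in extractions if e.confidence < CONFIDENCE_THRESHOLD]
    return accepted, review


note = "Applicant weighs 180 lbs and smokes daily. Tobacco history unclear."
accepted, review = split_by_confidence(skim(note))
print([(e.field, e.value) for e in accepted])  # [('weight_lbs', '180'), ('tobacco', 'smokes')]
print([(e.field, e.value) for e in review])    # [('tobacco', 'Tobacco')]
```

Patterns this simple are easily fooled (for example, by negations such as "quit smoking"), which is why a production system needs richer extraction rules; the structure, targeted patterns plus a confidence cut-off, is the compromise the text describes.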

Bank of America (USA)

Bank of America is one of the world’s largest financial institutions. With approximately 59 million consumer and small-business relationships, 6,000 retail banking offices, and more than 18,000 ATMs, Bank of America is among the world’s leading wealth-management companies and is a global leader in corporate and investment banking and trading across a broad range of asset classes. Bank of America identified savings of $4.8 million in 2 years (a 400% return on investment) from a credit-risk management system provided by SAS Institute consultants and based on statistical and data-mining analytics [“Predicting Returns from the Use of Data Mining to Support CRM,” http://insight.nau.edu/WhitePapers.asp]. The bank has also developed profiles of its most valuable accounts, assigning relationship managers to the top 10% of its customers to identify opportunities to sell them additional services [“Using Data Mining on the Road to Successful BI, Part 3,” Information Management Special Reports, Oct. 2004]. More recently, to retain deposits, the Global Wealth and Investment Management division has used the KXEN Analytic Framework to identify clients likely to move their assets and then to create offers that encourage them to stay [“KXEN Analytic Framework,” Information Management Magazine, July/Aug 2009].

B.2 DATA MINING FOR THE TELECOMMUNICATIONS INDUSTRY

The telecommunications industry has quickly evolved from offering local and long-distance telephone services to providing many other comprehensive communication services, including voice, fax, pager, cellular phone, images, e-mail, computer and Web-data transmission, and other data traffic. The integration of telecommunications, computer networks, the Internet, and numerous other means of communication and computing is under way. The U.S. Telecommunications Act of 1996 allowed Regional Bell Operating Companies to enter the long-distance market as well as offer “cable-like” services. The European Liberalization of Telecommunications Services took effect at the beginning of 1998. Besides deregulation, the FCC has sold airwaves to companies pioneering new ways to communicate. The cellular industry is rapidly taking on a life of its own. With all this deregulation, the telecommunications market is expanding rapidly and becoming highly competitive.

The hypercompetitive nature of the industry has created a need to understand customers, to keep them, and to model effective ways to market new products. This creates a great demand for data mining to help understand the new business involved, identify telecommunication patterns, catch fraudulent activities, make better use of resources, and improve the quality of services. In general, the telecommunications industry is interested in answering some strategic questions through data-mining applications such as:

  • How does one retain customers and keep them loyal as competitors make special offers and cut rates?
  • Which customers are most likely to churn?
  • What characteristics indicate high-risk investments, such as investing in new fiber-optic lines?
  • How does one predict whether customers will buy additional products like cellular services, call waiting, or basic services?
  • What characteristics differentiate our products from those of our competitors?

Companies such as AT&T, AirTouch Communications, and AMS Mobile Communication Industry Group have announced the use of data mining to improve their marketing activities. Several companies, including Lightbridge and Verizon, use data-mining technology to analyze cellular fraud in the telecommunications industry. Another trend has been the use of advanced visualization techniques to model and analyze wireless-telecommunication networks. Selected examples of data-mining applications in the telecommunications industry follow.

Cablevision Systems, Inc.

Cablevision Systems Inc., a cable TV provider from New York, was concerned about its competitiveness after deregulation allowed telecom companies into the cable industry. As a consequence, it decided that it needed a central data repository so that its marketing people could have faster and more accurate access to data. Using data mining, the marketing people at Cablevision identified nine primary customer segments among the company’s 2.8 million customers, including a segment of customers likely to “switch” to another provider. Cablevision also focused on the segments most likely to buy its new service offerings. The company used data mining to compare the profiles of two sets of targeted customers: those who bought new services and those who did not. This led the company to change some of its messages to customers, which, in turn, produced a 30% increase in targeted customers signing up for new services.
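Comparing the profiles of buyers and non-buyers can be as simple as contrasting the average value of each attribute in the two groups. The sketch below assumes each customer is a dictionary of numeric attributes; the attribute names (`tenure_months`, `premium_channels`) and the records are hypothetical toy data, not Cablevision’s.

```python
from statistics import mean


def profile(customers, attributes):
    """Average value of each attribute over a group of customers."""
    return {a: mean(c[a] for c in customers) for a in attributes}


def compare_profiles(bought, did_not_buy, attributes):
    """Per-attribute difference between the buyer and non-buyer profiles."""
    p_buy = profile(bought, attributes)
    p_not = profile(did_not_buy, attributes)
    return {a: p_buy[a] - p_not[a] for a in attributes}


ATTRIBUTES = ["tenure_months", "premium_channels"]  # hypothetical attributes

# Hypothetical toy customers who bought, or did not buy, a new service.
bought = [
    {"tenure_months": 48, "premium_channels": 3},
    {"tenure_months": 36, "premium_channels": 2},
]
did_not_buy = [
    {"tenure_months": 12, "premium_channels": 0},
    {"tenure_months": 24, "premium_channels": 1},
]

diff = compare_profiles(bought, did_not_buy, ATTRIBUTES)
print(diff)
```

Raw differences depend on each attribute’s scale, so in practice one would standardize the attributes (or apply a statistical test) before deciding which differences are worth acting on in customer messages.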

Worldcom

Worldcom is another company that has found great value in data mining. By mining its customer-service and telemarketing databases, Worldcom has discovered new ways to sell voice and data services. For example, it found that people who buy two or more services are likely to be relatively loyal customers. It also found that people were willing to buy packages of products such as long-distance, cellular-phone, Internet, and other services. Consequently, Worldcom started to offer more such packages.
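A finding such as "customers who buy two or more services are relatively loyal" can be checked by comparing churn rates across groups. In the sketch below, churn rate is used as an assumed proxy for loyalty, and the customer records are hypothetical toy data.

```python
from collections import defaultdict


def churn_rate_by_bundle(customers):
    """Churn rate for customers with one service versus two or more services."""
    counts = defaultdict(lambda: [0, 0])  # group -> [number churned, group size]
    for services, churned in customers:
        group = "2+ services" if services >= 2 else "1 service"
        counts[group][0] += int(churned)
        counts[group][1] += 1
    return {group: churned / total for group, (churned, total) in counts.items()}


# Hypothetical toy records: (number of services bought, did the customer churn?)
customers = [(1, True), (1, False), (1, True), (1, False),
             (2, False), (3, False), (2, True), (4, False)]
print(churn_rate_by_bundle(customers))  # {'1 service': 0.5, '2+ services': 0.25}
```

A lower churn rate in the bundled group is the kind of pattern that would justify offering more packages, although on real data one would also check that the gap is not explained by other factors such as customer tenure.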

BBC TV

TV-program schedulers would like to know the likely audience for a proposed program and the best time to show it. The data for audience prediction are fairly complex. The factors that determine the audience share gained by a particular program include not only the characteristics of the program itself and the time at which it is shown, but also the nature of the competing programs on other channels. Using Clementine, Integral Solutions Limited developed a system to predict television audiences for the BBC. Its prediction accuracy was reported to match that achieved by the BBC’s best-performing planners.

Bell Atlantic

Bell Atlantic developed a telephone-technician dispatch system. When a customer reports a telephone problem, the company must decide what type of technician to dispatch to resolve the issue. Starting in 1991, this decision was made using a hand-crafted expert system, but in 1999 that system was replaced by a set of rules created with machine learning. The learned rules save Bell Atlantic more than $10 million per year because they make fewer erroneous decisions. In addition, the original expert system had reached a stage in its evolution where it could no longer be maintained cost-effectively. Because the learned system was built by training it on examples, it is easy to maintain and to adapt to regional differences and changing cost structures.
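Learning dispatch rules from labelled examples can be illustrated with OneR, one of the simplest rule learners. This is a swapped-in technique for illustration only: the text does not say which learning method Bell Atlantic used, and the attributes (`symptom`, `area`) and technician categories below are hypothetical.

```python
from collections import Counter, defaultdict


def one_r(examples, attributes, target):
    """OneR: for each attribute, map every value to its most frequent class,
    then keep the attribute whose rules misclassify the fewest examples."""
    best = None
    for attr in attributes:
        by_value = defaultdict(Counter)
        for ex in examples:
            by_value[ex[attr]][ex[target]] += 1
        rules = {value: classes.most_common(1)[0][0] for value, classes in by_value.items()}
        errors = sum(ex[target] != rules[ex[attr]] for ex in examples)
        if best is None or errors < best[2]:
            best = (attr, rules, errors)
    return best


# Hypothetical trouble reports, labelled with the technician type that fixed them.
reports = [
    {"symptom": "no dial tone", "area": "urban", "technician": "field"},
    {"symptom": "no dial tone", "area": "rural", "technician": "field"},
    {"symptom": "static", "area": "urban", "technician": "cable"},
    {"symptom": "static", "area": "rural", "technician": "cable"},
    {"symptom": "cannot call out", "area": "urban", "technician": "central office"},
    {"symptom": "cannot call out", "area": "rural", "technician": "central office"},
]

attr, rules, errors = one_r(reports, ["symptom", "area"], "technician")
print(attr, errors)  # symptom 0
for value, technician in rules.items():
    print(f"IF {attr} = {value!r} THEN dispatch {technician}")
```

Retraining on another region’s examples yields a new rule set without hand-editing any rules, which is the maintainability advantage the text attributes to the learned system.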
