Discover Performance: HP Software's community for IT leaders // March 2013
Adding power to data mining with R
Technologies such as the R language are making it easier than ever to get big business value out of data mining.
If you’ve got data, then data mining could improve your business. Identifying hidden patterns within data positions your organization to make better business decisions in less time.
But data mining requires specialized domain expertise that many organizations can’t afford. Fortunately, new techniques and technologies are emerging to help drive down the complexity and smooth over some of the biggest technical challenges.
Data mining is used to analyze sequences, make associations between data, and cluster and classify data in many different ways. Organizations in a wide variety of industries use it to provide much-needed business value, identify risk, and find new business opportunities. It can help answer critical questions in a variety of sectors:
- Financial services—What is the probability of default for each mortgage in our portfolio?
- Sensor data—What is the probability of failure for each of my in-home devices?
- Health care—What is the probability that this medical insurance claim is fraudulent?
- Retail—What items are my customers most likely to purchase next from me?
The use cases range from behavior analytics (making meaningful predictions based on past and current buying behavior) to claims analyses (identifying anomalies, such as fraud, or identifying product defects early in the product release phase).
Waiting on the sidelines
While the potential benefits are well documented, many organizations cannot achieve their expected results from data mining. The most common barriers to data mining adoption are:
- Complexity—Data mining requires specialized domain expertise and can be difficult to integrate into applications.
- Cost—The initial costs of software, hardware, and implementation are a concern for many organizations.
- Performance—Many algorithms associated with data mining are compute-intensive, which limits the potential areas for application because decision makers cannot obtain results in a timely manner.
When confronting the reality of these challenges, many organizations conclude they are not ready for data mining: it just takes too much time, money, and effort to obtain justifiable business value. Others have looked for alternatives to implementing data mining, mostly by writing custom code that is error-prone and expensive to maintain. These companies are paying a high price to work around the challenges of data mining. Fortunately, some better options have emerged.
R you experienced?
Today’s organizations can move beyond these inhibitors. One of the most effective tools in the arsenal is R, a powerful open-source programming language for statistical computing. Especially when combined with a commercial analytics platform, R can help you extract big value from your big data.
R provides many ready-made algorithms that make data mining results much simpler to achieve. In this way, R reduces the learning curve and makes data mining more accessible to organizations with limited resources. As an open-source project, R is free, which takes software licensing cost out of the equation and makes it an attractive alternative to maintaining custom code.
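To give a sense of how little code such an algorithm can take, here is a minimal sketch using only functions that ship with base R. It is an illustration, not the article's or any vendor's method: the built-in `mtcars` and `iris` datasets are stand-ins for real business data, and the column choices are assumptions made for the example.

```r
# Illustrative sketch only: mtcars and iris ship with R and stand in for
# real business data; they are not taken from the article.

# Classification-style question: "what is the probability of outcome X?"
# The binary column `am` plays the role of a yes/no outcome such as
# default / no default; `wt` plays the role of a predictor.
model <- glm(am ~ wt, data = mtcars, family = binomial)

# One estimated probability (between 0 and 1) per row of the data.
scores <- predict(model, type = "response")
print(head(round(scores, 2)))

# Clustering-style question: "which records naturally group together?"
# k-means on the four numeric iris columns, asking for three groups.
set.seed(42)  # k-means starts from random centers; fix the seed for repeatability
segments <- kmeans(iris[, 1:4], centers = 3)
print(table(segments$cluster))
```

In a real project the same calls would be pointed at a portfolio, claims, or sensor table instead of the sample data, and the model would need to be validated before anyone acts on its scores.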
R does not have to stand alone. When R is integrated with a highly capable analytics platform, organizations get its reduced complexity along with massive scalability, data compression, high performance, and a variety of visualization tools. For example, the integration of R into the HP Vertica Analytics Platform lets your enterprise sift through your big data quickly to find anomalies using advanced data mining algorithms provided by R.
To learn about Vertica’s approach to data mining using R, download the white paper “R You Ready? Turning big data into big value with the HP Vertica Analytics Platform and R” (registration required).