Discover Performance

HP Software's community for IT leaders // January 2013

Can the cloud solve big data's new challenge?

The cofounder and CTO of Hadoop powerhouse Cloudera considers the next phase of taming big data, and whether the solution is in the cloud.

New issues come to enterprise IT all the time, but information management in 2013 will face at least one continuing challenge: big data. In years past, enterprises were primarily engaged in digging trenches: Where is big data coming from? Why do we need it? And where do we store it? For many organizations, 2013 will mark the start of the next step: What can we do with it?

To understand the ascendant challenges and opportunities, we spoke to Amr Awadallah, CTO of Cloudera, a leading developer of Hadoop solutions to big data’s challenges.

Data interactivity is job #1

The most critical issue that CIOs need to be thinking about in the year ahead, says Awadallah, is how to put their data to good use.

“CIOs need to figure out how to extract more value out of the data they have. How to use data not just as numbers that feed into a dashboard, but rather something that can become a product in itself or can help the business make more money,” he says.

Extracting value from data requires interactivity—the ability to dynamically change the view of data or drill down to greater granularity. “Many enterprises want to see that happen right away,” Awadallah says, “but you have to process and deliver large amounts of data in less than a second, and that’s where the challenge lies.”

To make this possible without shrinking or predetermining the available data, the marketplace is responding with two basic approaches: 

  • heavy parallelism: a fully distributed system that uses many nodes to attack the data set in parallel;
  • in-memory: keeping all of the data in memory so that new views of the data don't require round trips to the file system.

In-memory systems generally provide the lowest latency, but at a higher cost, especially when data size reaches the hundreds of terabytes.
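To make the two approaches concrete, here is a minimal Python sketch. It is an illustration only, not how Hadoop or any specific product works: the sales records, the `PARTITIONS` layout, and every function and class name are hypothetical, and threads on one machine stand in for the separate nodes of a real distributed system.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical data set: sales records split into partitions, one per "node".
PARTITIONS = [
    [("east", 120), ("west", 80)],
    [("east", 40), ("north", 60)],
    [("west", 10), ("north", 30)],
]


def partial_totals(partition):
    """Aggregate one partition independently, as a single node would."""
    totals = {}
    for region, amount in partition:
        totals[region] = totals.get(region, 0) + amount
    return totals


def merge(results):
    """Combine the per-partition totals into one answer."""
    combined = {}
    for totals in results:
        for region, amount in totals.items():
            combined[region] = combined.get(region, 0) + amount
    return combined


def parallel_totals(partitions):
    """Heavy parallelism: each partition is scanned by its own worker."""
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        return merge(pool.map(partial_totals, partitions))


class InMemoryView:
    """In-memory: load every row once, then answer new views from RAM."""

    def __init__(self, partitions):
        # Assumption: the whole data set fits in memory on this machine.
        self.rows = [row for part in partitions for row in part]

    def total_for(self, region):
        # A new "view" is just another pass over rows already held in memory.
        return sum(amount for r, amount in self.rows if r == region)


if __name__ == "__main__":
    print(parallel_totals(PARTITIONS))
    print(InMemoryView(PARTITIONS).total_for("east"))
```

The sketch mirrors the trade-off in the text: the parallel version spreads the scan across workers and merges the partial results, while the in-memory version pays once to hold everything in RAM so that each later question is answered without going back to storage.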

Adjusting to server sprawl

When data grows, infrastructure grows in kind. Furthermore, “organizations are just getting way more sophisticated in how they run their business,” Awadallah says, which means that more business functions are being computerized.

The result is an orders-of-magnitude increase in servers. Traditionally, enterprises had only a few hundred servers, maybe a few thousand at the high end, Awadallah explains. Thanks to growing data and business complexity, today’s enterprises have to manage not a thousand servers, but tens of thousands. Another key challenge, then, is making that infrastructure more scalable to manage.

“And both of these things, more data and more complexity, are driven by employees and customers being way more connected because of mobile devices,” Awadallah says. Those devices are constantly on, constantly connected, and constantly using services and creating data in the cloud.

Speaking of the cloud …

Data interactivity is the primary concern for the year ahead, with infrastructure sprawl close behind. A third problem, Awadallah says, is figuring out how to use the cloud to get outside help with enterprise data management without putting data or compliance at risk.

With an external cloud, meaning third-party infrastructure-as-a-service such as Amazon Web Services (AWS) or Microsoft Azure, you don’t have to manage the infrastructure yourself, Awadallah explains. You can leave it to someone else. But that strategy doesn’t work for everyone.

“There are some businesses that will never let their data live outside of their perimeter,” he says. “They look at their data as their bloodline, and nobody would like to have their blood outside the body. That’s why financial institutions, hospitals, and many other institutions might never adopt external cloud.”

But they do have alternatives. Awadallah says an internal cloud, which leverages virtualization to treat your internal infrastructure as a cloud service, is something that these organizations “will certainly move toward.”

Finally, a fully managed external cloud, in which a bank of dedicated servers is managed by a third party in a secured environment that no one else can share or access, can be an acceptable middle ground for organizations that need infrastructure management but can’t abide the increased risks of an external cloud.

Awadallah discusses the importance of hierarchy in big data on the Discover Performance blog. To read more about the power of Hadoop, check out Vertica’s enterprise-level approach to Hadoop, which can integrate with Cloudera.



