Discover Performance: HP Software's community for IT leaders // January 2014
HP Labs: Big Data and the future of computing
A top HP Labs researcher discusses how context will turn information into value.
It’s pretty much a given in enterprise computing that ‘Big Data’ is both the industry’s biggest current challenge and its greatest hope for future growth. It’s also very much on the minds of researchers at HP Labs.
We met up recently with John Sontag, who leads the storied research organization’s investigations into enterprise computing, to find out how he and his colleagues are thinking about Big Data right now.
Q: First, John, can you share what Big Data means to you?
John Sontag: Sure. Here’s one way to look at it: in the future, everyone is going to become a data scientist, whether they like it or not. That’s because we’re moving to a world where, in 10 years, we’ll be creating 20 times as much data as there is now. And it will be all around us, not just in one place. The businesses that will thrive in that environment will have employees who can look at the data sources around them and use those sources to improve their customer intimacy, to understand what problems they can solve better, and to build a stronger ecosystem of partners around them to impact whatever sector they serve, be it IT, finance, healthcare, government, or any other.
Q: What does that mean for technology companies like HP?
JS: It means that we need to be asking how we can help people look at the contexts in which their businesses are operating and at all the data they have, and empower them to experiment on it, find patterns, institutionalize those workflows, eventually automate them, and do all those things in real time.
Q: Can you give us an example of what that might look like in practice?
JS: I’ll give you one that’s already happened. We’ve been working closely with a number of leading hospitals. Not long ago, a physician at one of these hospitals saw a teenage girl with a very rare medical condition that might elevate her risk of having a stroke. Her care team faced a difficult decision: whether to give her treatment that would lower the risk of stroke but introduce other risks, such as bleeding. The treating physician hadn’t seen anyone with this syndrome before, and it was so rare that there was neither clinical literature on it nor agreement among the world-class experts in the hospital itself. Ten years ago, they’d have been stuck. But this time they were able very quickly to gather data from every patient who had been seen with this condition before, and build a virtual cohort for this girl. They could get an accurate assessment of the risks of treating or not treating, see what others had done in similar situations, and choose the best course of action given the available information. They saved her life, and in doing so, those doctors just became data scientists. They were able to make a better decision, in real time, thanks to data. It’s those kinds of experiences that we’re trying to make accessible to everybody who works in a business, where they don’t need to plan for it beforehand but can quickly assemble data sources, conduct experiments, and gather insights whenever they need.
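To make the idea of a "virtual cohort" concrete, here is a minimal Python sketch. It is not the hospital's or HP's actual method. The records, field names, and helper functions are all hypothetical, and a real analysis would need far more careful statistics. It only shows the basic shape of the workflow: select past patients who share the condition, then compare outcomes between those who were treated and those who were not.

```python
from collections import defaultdict

# Made-up toy records; every field name and value here is hypothetical.
RECORDS = [
    {"condition": "X", "treated": True, "adverse_event": False},
    {"condition": "X", "treated": True, "adverse_event": False},
    {"condition": "X", "treated": True, "adverse_event": True},
    {"condition": "X", "treated": True, "adverse_event": False},
    {"condition": "X", "treated": False, "adverse_event": True},
    {"condition": "X", "treated": False, "adverse_event": False},
    {"condition": "X", "treated": False, "adverse_event": True},
    {"condition": "X", "treated": False, "adverse_event": False},
    {"condition": "Y", "treated": True, "adverse_event": True},
    {"condition": "Y", "treated": False, "adverse_event": False},
]


def build_cohort(records, condition):
    """Return only the records of patients with the given condition."""
    return [r for r in records if r["condition"] == condition]


def adverse_rate_by_treatment(cohort):
    """Map treated (True/False) to the fraction of records with an adverse event."""
    counts = defaultdict(lambda: [0, 0])  # treated -> [adverse events, total]
    for record in cohort:
        tally = counts[record["treated"]]
        tally[1] += 1
        if record["adverse_event"]:
            tally[0] += 1
    return {treated: events / total for treated, (events, total) in counts.items()}


cohort = build_cohort(RECORDS, "X")
rates = adverse_rate_by_treatment(cohort)
print(f"cohort size: {len(cohort)}")
print(f"adverse-event rate if treated: {rates[True]:.2f}")
print(f"adverse-event rate if untreated: {rates[False]:.2f}")
```

The point of the sketch is that the query is assembled on demand from data that already exists, rather than planned in advance as a study.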
Q: Where is HP today in terms of helping everyone become a data scientist?
JS: For that to happen we need a set of tools that allows us to be data scientists in more than the ad hoc way I just described. These tools should let us operate productively and repeatably, using vocabulary that we can share—so that each of us doesn’t have to learn the same lessons over and over again. Currently at HP, we’re building a software tool set that’s helping people find value in the data they’re already surrounded by. We have HAVEn for data management, which includes the Vertica data store and Autonomy for analysis. For enterprise security, we have ArcSight and Threat Central. We have our work around StoreOnce to compress things, and Express Query to allow us to consume data in huge volumes. Then we have hardware initiatives like Moonshot, which is bringing different kinds of accelerators to bear so we can actually change how fast—and how effectively—we can chew on data.
Q: And how is HP Labs helping shape where we are going?
JS: One thing we’re doing on the software front is creating new ways to interrogate data in real time through an interface that doesn’t require you to be a computer scientist. We’re also looking at how we present the answers you get in a way that brings attention to the things you most need to be aware of. And then we’re thinking about how to let people who don’t have massive compute resources at their disposal also become data scientists.
Q: What’s the answer to that?
JS: For that, we need to rethink the nature of the computer itself. If Moonshot is helping us make computers smaller and less energy hungry, then our work on memristors will allow us to collapse the old processor/memory/storage hierarchy, and put processing right next to the data. Next, our work on photonics will help collapse the communication fabric and bring systems at these very large scales into closer proximity. That lets us combine systems in new and interesting ways. And then we’re thinking about how to package these reimagined computers into boxes of different sizes that match the needs of everyone from the individual to the massive, multinational entity. On top of all that, we need to reduce costs (if we tried to process all the data that we’re predicting we’ll want to at today’s prices, we’d collapse the world economy), and we need to think about how we secure and manage that data, and how we deliver algorithms that let us transform it fast enough so that you, your colleagues, and partners across the world can conduct experiments on this data literally as fast as we can think them up.
Q: Could you talk a little more about the impact of the new hardware you are developing on how we use software?
JS: Sure—the combination of non-volatile, memristor-powered memory and very large scales is causing the people who think about storage and algorithms to realize that the tradeoff has changed. For the last 50 years, we’ve had to think of every bit of data that we process as something that eventually has to get put on a disk drive if you intend to keep it. That means you have to think about the time to fetch it, to re-sort it into whatever way you want it to rest in memory, and to put it back when you’re done, as one of your costs of doing business. If you don’t have those issues to worry about, you can leave things in memory—graphs, for example, which are powerful expressions of complex data—that at present you have to spend a lot of compute time and effort pulling apart for storage.
The same goes for processing. Right now we have to worry about how we break data up, what questions we ask it, and how many of us are asking it at the same time. It makes experimentation hard, because you don’t know whether the answer’s going to come immediately or an hour later. Our vision is that you can sit at your desk and know you’ll get your answer instantly. Today we can do that for small-scale problems, but we want to make that happen for all of the problems that you care about. What’s great is that we can begin to do this with some questions that we have right now. We don’t have to wait for this to change all at once. We can go at it in an incremental way and have pieces at multiple stages of evolution concurrently—which is exactly what we’re doing.
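The storage tradeoff described in this answer can be illustrated with a small Python sketch. It is a toy model under stated assumptions, not a description of any HP system: the graph, the row format, and the function names are all hypothetical. It contrasts the "pull apart and rebuild" cycle that disk-oriented storage imposes on a graph with simply querying the graph where it already sits in memory.

```python
from collections import deque

# Hypothetical toy graph: each node maps to the nodes it points to.
GRAPH = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}


def to_rows(graph):
    """Pull the graph apart into flat (source, target) rows for storage."""
    return [(src, dst) for src, targets in graph.items() for dst in targets]


def from_rows(rows, nodes):
    """Rebuild the adjacency mapping from stored rows (the fetch-and-re-sort cost)."""
    graph = {node: [] for node in nodes}
    for src, dst in rows:
        graph[src].append(dst)
    return graph


def reachable(graph, start):
    """Breadth-first search: the set of nodes reachable from start, including start."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


# Disk-oriented path: flatten, (pretend to) store, then rebuild before querying.
rebuilt = from_rows(to_rows(GRAPH), GRAPH)
print(sorted(reachable(rebuilt, "a")))

# Persistent-memory path: the graph never leaves memory, so query it directly.
print(sorted(reachable(GRAPH, "a")))
```

Both paths give the same answer. The difference is the flatten-and-rebuild work in the first one, which is the cost the interview suggests could disappear if data can simply stay in non-volatile memory.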
Q: What kind of impact do you see these changes having on businesses and on our lives more generally?
JS: One thing is that there are people who have given up on thinking about certain problems because there’s no way to compactly express them with the systems we have today. They’re going to be able to look at those problems again—it’s already happening with Moonshot and HAVEn—and at each stage of this evolution, we’re going to allow another set of people to realize that the problem they thought was impossible is now within reach.
One example of where this has already happened is aircraft design. When we moved to 64-bit processors that fit on your desktop and could address more than 4 gigabytes of memory, the people who built software that modeled the mechanical stresses on aircraft realized that they could write completely different algorithms. Instead of having to have a supercomputer to run just a part of their query, they could do it on their desktop. They could hold an entire problem in memory, and then they could look at it differently. From that we got the Airbus A380, the Boeing 777 and 787, and, jumping industries, most new cars. I am absolutely certain there are lots of problems like that, ones that at present are just too big, where applying this new compute power will offer us a totally new insight on the world. We’ll see it in retail, in finance, in healthcare, everywhere: people stitching together the information that’s in their ecosystem to more effectively connect the dots. It’s an exciting time.
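The 4-gigabyte figure mentioned above follows from simple arithmetic on address width, which this short Python sketch works through. It assumes byte-addressable memory and treats the full pointer width as usable. Real processors typically implement fewer physical address bits than their nominal width, so the 64-bit figure is a theoretical ceiling rather than a practical limit.

```python
GIB = 2**30  # bytes in one gibibyte


def addressable_bytes(address_bits):
    """Number of distinct byte addresses expressible with the given pointer width."""
    return 2**address_bits


for bits in (32, 64):
    gib = addressable_bytes(bits) // GIB
    print(f"{bits}-bit addresses: {gib:,} GiB")
```

A 32-bit address can name 2^32 bytes, which is exactly 4 GiB, so any model larger than that could not be held in one flat address space. A 64-bit address raises the ceiling by a factor of 2^32.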
Learn more about Big Data in the new Enterprise 20/20 chapter, "Big Data 20/20," looking at the transformative future of advanced analytics. Download the complete chapter—and find previous chapters—on our Enterprise 20/20 page.
About John Sontag:
John Sontag is vice president and director of systems research at HP Labs. With more than 30 years of experience at HP in systems and operating system design and research, Sontag has had a variety of leadership roles in the development of HP-UX on PA-RISC and IPF, including 64-bit systems, support for multiple input/output systems, multi-system availability, and Symmetric Multi-Processing scaling for OLTP and web servers. Sontag received a bachelor of science degree in electrical engineering from Carnegie Mellon University.