Discover Performance: HP Software's community for IT leaders // November 2013
Vertica 7: Combining flexibility with deeper insight
Vertica VP and GM Colin Mahony discusses the advantages of flexibility and deeper insight.
Amid the launch of the new Vertica 7, Discover Performance sat down with Colin Mahony, vice president and general manager of HP Vertica, to talk about how the new release reflects Vertica’s approach to the fast-evolving analytics/Big Data marketplace. Mahony also touched on how Vertica is increasing its integration with HAVEn, the pan-HP platform for Big Data.
Q: So what was the thinking that guided work on the new iteration of Vertica?
Colin Mahony: We’re focused on three things. First, we want to make it really easy for customers to bring their information into Vertica, because analysis cannot happen until you get the data in. The second thing we focus on is what we let you do with that data once it’s in Vertica. We’re constantly expanding our platform to offer our customers more ways to extract value and insight from their information. Third, we want them to be able to extract that value in whatever environment and on whatever platform is most convenient to them. If they want to download our bits and run them on their own hardware, they certainly can. If they want to buy a Vertica platform packaged as an appliance from HP, all the better. If they want to run it on the cloud, public or private, we can do that too.
Those are the three things that we’re committed to with every release, and we’ve greatly enhanced each of those aspects recently. We just announced Vertica 7, and one of the huge innovations there is a feature we call FlexTable, and a new related offering built on it called Flex Zone.
These new capabilities help companies tackle the historic tension that has existed between traditional relational databases—which require customers to clearly define the data’s structure, or “schema,” before data is loaded—and having the flexibility to just dump the data in something like a file system, where you don’t need to worry about structure, but it is significantly harder to analyze and extract value from.
So, on the one side of the industry you have established traditional database vendors that are struggling with scaling, performance, and this flexibility issue, especially with new forms of data. On the other side, you have the NoSQL vendors and technologies like Hadoop that offer flexibility and scale but require a lot more heavy lifting and custom coding. But there are still tradeoffs. A traditional relational database gives you good performance when it comes to querying and deriving value out of the data, but you have to do a lot of upfront work defining models and optimizations. With Hadoop and other NoSQL products, it might be less work to get the data in, but they were not designed for interactive analytics. Furthermore, they have limited business intelligence ecosystem support, which is why most of the Hadoop vendors are now trying to add a SQL capability.
HP Vertica FlexTable and Flex Zone give you the best of both worlds. Flex Zone enables you to load data into Vertica as a big data lake—extremely cost effectively, without having to define structure or schemas upfront—and FlexTable allows us to ingest and analyze all sorts of new forms of un- and semi-structured data such as social media, machine data, and log files. With FlexTable, we can automatically infer the structure, and automatically build the schemas under the covers, so that the data is seamlessly loaded into the Vertica database, where it can be analyzed and monetized for the highest performance.
We absolutely believe this is one of the most monumental changes to the database world in the last 40 years. Simply drop files into Vertica, where we’ll do the heavy lifting and immediately make it available to you for analysis in your favorite tool. Traditionally, it has been very difficult for people to analyze emerging data types, especially semi-structured data like JSON and other key value structures.
So this falls into that first bucket of “make it easier to get data in”—and we give great performance with that simplicity, so there’s no tradeoff there. Also, because we embrace all of the traditional BI and ecosystem tools, you can immediately visualize and analyze your data: Unlike Hadoop or some of the other NoSQL options out there that can’t take advantage of these products natively, we can. Vertica 7 automatically bridges the gap, so that the NoSQL world has a seamless onramp and destination to leverage SQL and three decades of tooling around it.
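To make the "infer the structure" idea concrete, here is a minimal Python sketch of deriving columns from semi-structured JSON records at load time. It is an illustration only and not how Vertica implements FlexTable: the function names `flatten` and `infer_columns`, the dotted column naming, and the sample records are all assumptions made for this example.

```python
import json


def flatten(record, prefix=""):
    """Flatten nested dicts into dotted keys: {"user": {"id": 1}} -> {"user.id": 1}."""
    flat = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=name + "."))
        else:
            flat[name] = value
    return flat


def infer_columns(json_lines):
    """Parse JSON lines and return (columns, rows).

    columns maps each discovered key to an inferred Python type name,
    in order of first appearance; rows are the flattened records.
    """
    columns = {}
    rows = []
    for line in json_lines:
        row = flatten(json.loads(line))
        rows.append(row)
        for name, value in row.items():
            kind = type(value).__name__
            # If a key appears with conflicting types, fall back to str.
            if columns.setdefault(name, kind) != kind:
                columns[name] = "str"
    return columns, rows


# Hypothetical social-media records with differing shapes.
raw = [
    '{"user": {"id": 1, "name": "ann"}, "text": "great product"}',
    '{"user": {"id": 2}, "text": "slow delivery", "retweets": 3}',
]

columns, rows = infer_columns(raw)

# Project every record onto the inferred columns; missing keys become None.
table = [[row.get(col) for col in columns] for row in rows]

print(columns)
for line in table:
    print(line)
```

The point of the sketch is that no schema is declared before the load: the column list is discovered from the data, and records that lack a key simply yield an empty value for that column.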
Q: What about the “doing more once it’s in there” category?
CM: For that, we’re expanding our software development kit, the feature within Vertica that allows you to write custom logic beyond SQL. Our SDK now supports Java as a coding language. The Vertica SDK already supports the R statistical language, as well as C and C++, but now it will support the huge number of Java developers out there. They can write their Java code right in there, intermixed with SQL, and leverage Vertica’s massive performance and scale. This is tremendously powerful, and again, it helps seamlessly bring together the SQL and NoSQL worlds.
We are also releasing a bunch of packs that plug into our SDK. We have a sentiment analysis pack so that you can take different types of data, such as social media, emails, or even transcripts of customer calls, and do sentiment analysis against them right in Vertica. We have a pack for geospatial capabilities so you can analyze not just what, but where. We have a number of other packs that we’re coming out with around specific analyses and statistical functions, all of them enabling people to get more work done faster, and addressing the issue of the industry-wide shortage of data scientists. By building better tools and packs for the most common forms of analysis that our customers have asked for, we empower them to do more, and do it more quickly and effectively. We have truly delivered on our promise to build an environment where analytics comes to the data, not the other way around.
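The general pattern of custom logic called from inside SQL, so that the analysis runs where the data lives, can be sketched as follows. This is a hedged illustration that uses Python's built-in `sqlite3` module as a stand-in engine; it does not use the Vertica SDK or any Vertica pack. The word lists, the `sentiment` function, and the `feedback` table are hypothetical.

```python
import sqlite3

# Toy word lists; a real sentiment model would be far richer.
POSITIVE = {"great", "love", "fast"}
NEGATIVE = {"slow", "broken", "hate"}


def sentiment(text):
    """Toy score: count of positive words minus count of negative words."""
    words = text.lower().split()
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


conn = sqlite3.connect(":memory:")

# Register the Python function so SQL statements can call it by name.
conn.create_function("sentiment", 1, sentiment)

conn.execute("CREATE TABLE feedback (source TEXT, body TEXT)")
conn.executemany(
    "INSERT INTO feedback VALUES (?, ?)",
    [
        ("twitter", "love the fast checkout"),
        ("email", "app is slow and broken"),
        ("call", "great support but slow shipping"),
    ],
)

# The custom function is intermixed with ordinary SQL.
scores = conn.execute(
    "SELECT source, sentiment(body) FROM feedback ORDER BY source"
).fetchall()

for source, score in scores:
    print(source, score)
```

Because the function is registered with the engine, the query needs no separate export step: the scoring happens as part of the SELECT, which is the "analytics comes to the data" idea in miniature.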
Q: Are the packs based on other things within HP HAVEn?
CM: Yeah, great question. We’re working with Autonomy to create an Autonomy pack for Vertica, for a future release leveraging our SDK. We’re also working with HP Enterprise Security’s ArcSight team on a connector that brings some of our capabilities together there. There are a lot of these efforts going on inside the company, powering the HAVEn platform. The other nice thing about the Vertica SDK is that this functionality can be released whenever it is ready—it doesn’t have to wait for another major Vertica release.
And Flex Zone really helps a lot with HAVEn as well. Flex Zone makes it so much easier for us to receive data, whether it’s Autonomy information or ArcSight common event format (CEF) information. Our internal and external partners are then taking full advantage of our analytic packs and the Java SDK to do more with the data once it’s in Vertica.
Q: Okay—that leaves your third category: broader access to that information.
CM: Yep. We’re working with HP’s Enterprise Group on a new Vertica appliance, and also building the framework for some HAVEn appliances. And you’re going to see a lot more from us with HP Cloud Services around both public and private cloud.
Q: That’s part of the ongoing evolution of the HAVEn Big Data platform?
CM: Right. We really are focused on enhancing HAVEn, pulling together all these capabilities across HP, with tighter integration than the initial HAVEn platform. With the next generation of HAVEn, we still have discrete products such as Vertica, ArcSight, and Autonomy IDOL, but we’re able to, in a modular way, really couple them more tightly when the customer wants it.
Regardless of whether a customer’s next step is a one-off project that just employs Vertica or another individual HAVEn component, or a whole transformational, multifaceted project, HAVEn delivers a path for being successful with Big Data. If you invest in any component of HAVEn, we’ve got your back, and we’re going to make it very easy to take advantage of a lot of different areas as part of that investment. I think that approach is very different from a lot of our big competitors. And one thing we have learned since the inception is that as soon as a customer gets Vertica, they immediately start doubling and tripling the amount of data they want in it, while also dramatically increasing their analytics capabilities.