Pivotal Releases GemFire 8.1 With Updates and New Features

Cross posted from The Pivotal POV Blog…


Pivotal GemFire 8 was the first major release of the in-memory distributed database since it joined Pivotal’s portfolio of products. Today, we’re announcing the release of Pivotal GemFire 8.1. Part of the Pivotal Big Data Suite, Pivotal GemFire enables developers to deploy their big data NoSQL apps at massive scale. In addition to incremental product improvements, 8.1 enhances GemFire’s availability and resilience within a distributed system and improves its management and monitoring features.

Allowing High Availability, Resilience, and Global Scale

[Read more…]

GemFire XD 1.4 Now Available for Download

Cross posted from The Pivotal POV Blog…


The latest release of GemFire XD, version 1.4, is now available for download. Its biggest improvements include single-hop inserts for 50% faster performance and support for JSON document objects in SQL tables. This makes GemFire XD even better for write-intensive use cases, such as high-speed ingest, and it adds schema flexibility to the otherwise well-defined relational structure of GemFire XD.
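To make those two features concrete, here is a minimal, hedged sketch of what they might look like from a thin-client Java application. The JDBC URL scheme, default port, single-hop-enabled connection property, JSON column type, and PARTITION BY clause are assumptions based on my recollection of the GemFire XD documentation of this era, so check the official docs for exact syntax.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.Statement;

public class GemFireXDJsonSketch {
    public static void main(String[] args) throws Exception {
        // Hypothetical thin-client URL; single-hop-enabled lets the driver
        // route each write directly to the member that owns the row's bucket,
        // skipping the extra server hop that slows down high-speed ingest.
        String url = "jdbc:gemfirexd://locatorhost:1527/;single-hop-enabled=true";

        try (Connection conn = DriverManager.getConnection(url)) {
            try (Statement stmt = conn.createStatement()) {
                // A relational table with a schemaless JSON document column
                // (assumed 1.4 syntax), partitioned across the cluster.
                stmt.execute("CREATE TABLE orders ("
                        + "id INT PRIMARY KEY, "
                        + "customer VARCHAR(64), "
                        + "details JSON) "
                        + "PARTITION BY PRIMARY KEY");
            }
            try (PreparedStatement ps =
                         conn.prepareStatement("INSERT INTO orders VALUES (?, ?, ?)")) {
                ps.setInt(1, 1001);
                ps.setString(2, "ACME Corp");
                ps.setString(3, "{\"items\":[{\"sku\":\"X1\",\"qty\":3}]}");
                ps.executeUpdate(); // ideally a single network hop
            }
        }
    }
}
```

The appeal of single hop is that the client-side driver learns the cluster’s partitioning metadata and writes straight to the owning member, which is where the ingest speedup comes from.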

[Read more…]

10 Amazing Things to Do With a Hadoop-Based Data Lake

Cross posted from The Pivotal POV Blog.

The following is a summary of a talk I gave at Strata NY that is proving popular with people who are still trying to understand use cases for Apache Hadoop® and big data. In this talk, I introduce the concept of a Big Data Lake, which uses Apache Hadoop® for storage together with powerful open source and Pivotal technologies. Here are 10 amazing things companies can do with such a big data lake, ordered by increasing impact on the business.


[Read more…]

TEDx Talk: “What’s the Big Deal about Big Data for Humans?”

I gave this TEDx talk to a general audience at the SJSU TEDx event. It’s an interesting challenge to explain fairly complex technical concepts to a mixed audience of many different ages and backgrounds, and it makes you realize just how much we depend on a common vocabulary and shared understanding in the IT industry.

And in the tradition of many TED talks, the point is to motivate an action, not just educate.

How do you think I did?

And here are the slides…

Announcing the New Version of GemFire XD and SQLFire: Pivotal GemFire XD 1.3

Cross posted from my blog at Pivotal POV…

The newest versions of SQLFire and GemFire XD are one and the same: Pivotal GemFire XD version 1.3. What were previously two separate products are now merged, so current licensees of either product are entitled to upgrade to the new version.

[Read more…]

What’s New in Pivotal GemFire 8

Reposted from Pivotal POV…


On September 23, 2014, Pivotal announced the release of Pivotal GemFire 8, part of the Pivotal Big Data Suite. This is the first major release of GemFire since it became part of the Pivotal portfolio.

Born from the experience of working with over 3,000 of the largest in-memory data grid projects out there, including China Railways, GIRE, and Southwest Airlines, we’ve invested more into the needs of the most demanding enterprises: more scale, more resilience, and more developer APIs.

This release is a significant enhancement for developers looking to take their big data NoSQL apps to massive scale. For the complete technical details, check out the new datasheet and the official product documentation.

Here’s what’s new, organized by the five areas where GemFire leads the industry:

Providing Scale Out Performance

This is why most of Pivotal’s customers begin looking at GemFire in the first place: they can’t make traditional RDBMSs scale to the number of concurrent transactions and the volume of data they need to manage.

Pivotal GemFire manages data in-memory, distributed across multiple systems on commodity hardware (hundreds of nodes if you like) in a shared-nothing architecture, so there’s plenty of compute and memory to host all your data and deliver real-time response.
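As a rough illustration of the programming model, here is a minimal sketch of a member hosting a partitioned region, using the GemFire Java API of this generation (the com.gemstone.gemfire packages). The locator address is a placeholder, and a real deployment would run many such members.

```java
import com.gemstone.gemfire.cache.Cache;
import com.gemstone.gemfire.cache.CacheFactory;
import com.gemstone.gemfire.cache.Region;
import com.gemstone.gemfire.cache.RegionShortcut;

public class PartitionedRegionSketch {
    public static void main(String[] args) {
        // Join the distributed system via a locator (placeholder address).
        Cache cache = new CacheFactory()
                .set("locators", "locatorhost[10334]")
                .create();

        // PARTITION_REDUNDANT spreads the region's buckets across members in
        // shared-nothing fashion and keeps a redundant copy of each bucket.
        Region<String, String> orders = cache
                .<String, String>createRegionFactory(RegionShortcut.PARTITION_REDUNDANT)
                .create("orders");

        // The put lands on whichever member owns this key's bucket.
        orders.put("order-1001", "3 widgets");
        System.out.println(orders.get("order-1001"));

        cache.close();
    }
}
```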

WHAT’S NEW

We’ve added in-memory compression, effectively giving each node the capacity to hold up to 50% more data. Compression is achieved through Snappy, a speed-optimized algorithm, although the compression codec is replaceable with whatever algorithm you want to use.
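To illustrate that pluggability, here is a hedged sketch of a custom codec using java.util.zip’s Deflater in place of Snappy. The Compressor interface and the setCompressor region hook are named as I recall them from the GemFire 8 API; treat the exact names as assumptions and verify against the product documentation.

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Assumed import: the GemFire 8 pluggable-compression SPI.
import com.gemstone.gemfire.compression.Compressor;

// A custom codec plugged in where SnappyCompressor would otherwise run;
// Deflater stands in for "whatever algorithm you want to use."
public class DeflateCompressor implements Compressor {

    @Override
    public byte[] compress(byte[] input) {
        Deflater deflater = new Deflater(Deflater.BEST_SPEED);
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream(input.length);
        byte[] buffer = new byte[1024];
        while (!deflater.finished()) {
            out.write(buffer, 0, deflater.deflate(buffer));
        }
        deflater.end();
        return out.toByteArray();
    }

    @Override
    public byte[] decompress(byte[] input) {
        Inflater inflater = new Inflater();
        inflater.setInput(input);
        ByteArrayOutputStream out = new ByteArrayOutputStream(input.length * 2);
        byte[] buffer = new byte[1024];
        try {
            while (!inflater.finished()) {
                out.write(buffer, 0, inflater.inflate(buffer));
            }
        } catch (DataFormatException e) {
            throw new IllegalStateException("Corrupt compressed value", e);
        }
        inflater.end();
        return out.toByteArray();
    }
}
```

Wiring it in would then be a one-liner on the regions whose values you want compressed, e.g. regionFactory.setCompressor(new DeflateCompressor()).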

Maintaining Consistent Database Operations Across Globally Distributed Nodes

[Read more…]

Our Customers at Pivotal Recognize the Importance of Bridging Traditional Data Warehousing into a Next-Generation Platform

Cross posted from my blog at Pivotal POV:

Recently, Gartner published the report “Gartner Critical Capabilities for Data Warehouse Database Management Systems,” which shares survey results from customers of a variety of data warehouse solution vendors. The report ranks vendors across four use-case categories in the data warehouse market: “Traditional Data Warehouse,” “Operational Data Warehouse,” “Logical Data Warehouse,” and “Context Independent Data Warehouse.”

Based on existing customer implementations and their experiences with data warehouse DBMS products, the report scored Pivotal in the top 2 out of 16 vendors in two use cases: “Traditional Data Warehouse” and “Logical Data Warehouse”.  In a third use case, “Context Independent Data Warehouse”, Pivotal scored in the top 3 relative to the 15 other vendors.

In the report, Gartner writes “the adoption rate for modern use cases (such as the logical data warehouse and the context independent warehouse) is increasing year over year by more than 50%—but the net percentage for the context independent and logical data warehouse combined remains below 8% of the total market.”

Modern Data Warehouse Use Cases Generate Trillions in Value

Many of Pivotal’s big data analytics customers started out as Greenplum Database customers. These customers are well established in traditional data warehousing techniques and also take advantage of modern data warehousing scenarios supported by Greenplum Database’s advanced analytics capabilities and the other products of the Pivotal Big Data Suite: Pivotal HAWQ and Pivotal HD.

Industry leaders like General Electric are using Pivotal Big Data Suite to create new solutions that cut weeks of analysis time that would be required using traditional data warehouse approaches. For example, a process for refining insightful analytics from sensor data streams generated by industrial machinery was compressed from 30 days to just 20 minutes.

Other companies are using these approaches to improve customer retention, target advertising, detect anomalies, improve asset utilization and more. The combined potential benefit of these opportunities is staggering. GE alone predicts its solutions will boost GDP by $10-15 trillion in the next 20 years by saving labor costs and improving energy efficiency. [Read more…]

What Does “Data-Driven Company” Mean for a Developer?


At Pivotal Software, we frequently pitch a virtuous cycle of data-driven app development where:

  1. A cache of data is collected and stored.
  2. A team of analysts or data scientists discovers an insight or optimization opportunity.
  3. That insight shapes app development, which in turn generates more data.
  4. The company moves through the cycle with ever-increasing agility.

Wash. Rinse. Repeat.

However, the story above is more of an analytics story than a developer story. This became clear to me while sitting with some crack developers at dinner at the SpringOne 2GX conference.

One of my table mates asked me, “I understand what Pivotal is saying about the virtuous cycle, but what exactly happens between analytics and apps stages?”

[Read more…]

New Benchmark Results: Pivotal Query Optimizer Speeds Up Big Data Queries Up To 1000x

Cross posted from my original blog at Pivotal POV…

Have you heard about the new super-efficient Pivotal Query Optimizer developed by the Greenplum engineering team? Previously codenamed “Orca,” this new feature has been released as part of the HAWQ query engine in Pivotal HD, Pivotal’s commercially supported distribution of Apache Hadoop.

This new optimizer has been undergoing months of performance testing and improvement and is nearly ready for market. Pivotal will be presenting a peer-reviewed paper on the results of this performance study at the ACM SIGMOD 2014 conference, June 22–27. Titled “Orca: A Modular Query Optimizer Architecture for Big Data,” the paper explains how the team built the query optimizer and shows the results they’ve seen so far in customer usage and ongoing testing. If you would like a copy of the paper and the detailed benchmark results, ask at the Pivotal booth (booth S32) at this week’s Hadoop Summit in San Jose.

The Pivotal Query Optimizer is now also available to Pivotal Greenplum DB customers as part of an early access program. Customers interested in trying it out can register here.

Sophisticated Computer Science

Developing a query optimizer involves some very sophisticated computer science. The team wanted to create a new SQL-compliant query technology that was better suited to the trends we are seeing in big data:

  • Increasing volume from companies keeping detail data, not aggregates, from many more sources.
  • More variety in the types of data to be incorporated into queries such as application logs, sensor time series, geospatially tagged data, genomics data, and social media feeds.
  • Diverse storage due to an increasing variety of data technologies being used instead of traditional RDBMSs for storing and managing this data.
  • Complex queries generated by advanced analytics algorithms being applied to all this data.

This technology is laser-focused on providing fast SQL query results on petabytes of data while remaining portable across data architectures, such as Pivotal HD and Pivotal Greenplum.

© 2014 ACM, used with permission.

Figure 1. The Pivotal Query Optimizer is a stand-alone optimizer that is portable across databases that implement the Data eXchange Language (DXL).

[Read more…]

SAP SAPPHIRE NOW 2014 Kickoff Keynote – First Impressions

Bill McDermott, CEO of SAP, intends to prove to SAP customers that SAP is now a “cloud company.”

I must admit, it’s kind of strange not being part of the festivities in Orlando this week since I’m no longer with SAP.

I didn’t wake up early enough to catch all of Bill McDermott’s opening keynote, but reviewing the announcements gives a pretty good idea of where SAP is heading this year. That direction is especially interesting given the recent departure of SAP HANA’s executive sponsor, former CTO Vishal Sikka.

[Read more…]