Book title: Getting Started with Greenplum for Big Data Analytics
Author: Sunila Gollapudi
Last updated: 2025-02-22 07:02:40

Preface

Big Data started off as a technology buzzword and rapidly grew into a headline item on the corporate strategy agenda across industry verticals. As the amount of structured and unstructured data available to organizations explodes, analysis of these large data sets is increasingly becoming a key basis of competition, productivity growth, and, more importantly, product innovation.

Most technology approaches to Big Data come across as linear deployments of new technology stacks on top of existing databases or data warehouses. Big Data strategy is partly about solving the "computational" challenge that comes with exponentially growing data, and, more importantly, about "uncovering the patterns" and trends hidden in these large data sets. As data storage and processing challenges change, existing data warehousing and business intelligence solutions need a face-lift, and new agile platforms that address all aspects of Big Data have become a necessity. From loading and integrating data to presenting analytical visualizations and reports, new Big Data platforms such as Greenplum cover the full spectrum. Clearly, this opportunity calls for a combination of the "art of data science" and the "related tools and technologies".

This book is meant to serve as a practical, hands-on guide to learning and implementing Big Data analytics using Greenplum and related tools and frameworks such as Hadoop, R, MADlib, and Weka. Some key Big Data architectural patterns are covered, along with details on a few relevant advanced analytics techniques. The book also includes the details needed to onboard readers to all the concepts, tools, and frameworks required to implement a data analytics project.

R, Weka, MADlib, advanced SQL functions, and window functions are covered for implementing in-database analytics. The infrastructure and hardware aspects of Greenplum are also covered, along with some detail on configuration and tuning.

Overall, from processing structured and unstructured data to presenting results and insights to key business stakeholders, this book introduces all the key aspects of both the technology and the science.

Note

Greenplum UAP is currently being repositioned by Pivotal. Its modules and components are being rebranded to include the "Pivotal" tag and are being packaged under PivotalOne. A few VMware products, such as GemFire and SQLFire, are being included in the Pivotal Solution Suite along with RabbitMQ. Additionally, support for and integration with Complex Event Processing (CEP) has been added for real-time analytics. The Hadoop (HD) distribution, now called Pivotal HD, includes a new framework, HAWQ, which supports SQL-like querying of Hadoop data (similar to the open source Impala framework). However, the features and capabilities of Greenplum UAP detailed in this book will continue to exist.