“Pentaho Data Integration Cookbook” 2nd edition Review

In 2011, the first edition of “Pentaho Data Integration Cookbook” was published. At the time, the book was valuable for a PDI (Pentaho Data Integration) developer because it provided relevant answers for many of the common tasks that have to be carried out in data warehousing processes.

Two years later, the data market has evolved greatly. Big Data is one of the major trends, and PDI now includes numerous new features to connect to and use Hadoop and NoSQL databases.

The idea behind the second edition is to include some of the brand new tasks required to tame Big Data using PDI and to update the content of the previous edition. Alex Meadows, from Red Hat, has joined the previous authors (María Carina Roldán and Adrián Sergio Pulvirenti) for this second edition. María is the author of four books about Pentaho Data Integration.

What is Pentaho Data Integration?

I’m sure that many of you already know it. For those who don’t, PDI is an open source Swiss army knife of tools to extract, move, transform and load data.

What is this book?

To put it simply, it includes practical, handy recipes for many of the everyday situations a PDI developer faces. All recipes follow the same structure:

  • State the problem
  • Create a transformation or job to solve the problem
  • Explain in detail and provide potential pitfalls

What is new?

One question a potential reader may ask is: if I already have the previous edition, is it worth reading this new one? If you are a Pentaho Data Integration developer, the easy answer is yes, mainly because the book includes new chapters and sections on Big Data and Business Analytics, technologies that are becoming core corporate capabilities in the information age.

So, in my humble opinion, the most interesting chapters are:

  1. Chapter 3: where the reader will learn how to load data into, and get data from, Hadoop, HBase and MongoDB.
  2. Chapter 12: where the reader will learn how to read data from a SAS data file, create statistics from a data stream and build a random data sample for Weka.

What is missing or could be improved?

More screenshots; some readers will probably think the same. To be honest, while I’m happy with chapters 3 and 12, it would be interesting to have more content on these topics. So, let’s put it this way: I am counting down the days until the next edition.

In summary, an interesting book for PDI and data warehousing practitioners that gives some information about how to use PDI for Big Data and Analytics. If you are interested you can find it here.


Review of “Pentaho Reporting 5.0 by Example Beginner’s Guide” by Mariano García Mattío and Dario R. Bernabeu

A few weeks ago, Pentaho released the new version of its products (both CE and commercial). The latest version, currently 5.0, is accompanied by the latest developer tools. As usual, a new major release means new features. For example, the new version focuses on a better user interface and support for Big Data.

We have a new version of Pentaho Reporting as well. This tool helps to create professional reports with graphics, formulas, subreports, and so on.

If you want to master this tool you have several options: (1) learning it yourself by trial and error (and/or searching for information in forums), (2) training (through a certified partner or not) or (3) using a book.

That brings me to the topic I want to talk about in this post. Packt Publishing has given me the opportunity to review the new book on Pentaho Reporting, titled “Pentaho Reporting 5.0 by Example Beginner’s Guide”. This book provides a detailed, example-driven overview of using Pentaho Report Designer.

The book starts with the usual suspects: what Pentaho Reporting and Pentaho Report Designer (PRD) are, what the main components of PRD are, and how Pentaho Reporting has evolved since 2002. Nothing new for the everyday Pentaho developer, but still interesting for a newcomer.

Why might this book still be interesting for you? If you are a Pentaho developer, the initial chapters offer nothing new. Chapter 2 is about the installation of PRD, Chapter 3 is about the user interface and Chapter 4 is about your first report. So you are probably going to skip them.

The interesting part starts with Chapter 5. Even if you are a regular developer, it is easy to forget some features or the proper way to do things. Following a step-by-step approach, the book provides numerous examples and helps you improve your skills. Among the topics, it is worth highlighting: how to connect to a database, how to create formulas, how to add a new JDBC driver, how to add a group, how to add parameters, how to add charts, how to add subreports, how to publish your reports to the Pentaho server, and more.

That is, the main things that a reporting developer should know and master to start working with Pentaho Report Designer.

It’s worth noting that this is one of the earliest books about the newest version of Pentaho. That means the chapter that covers the Pentaho server (Chapter 11) includes screenshots of the new interface.

What if I am a beginner developer? This is your book. A no-brainer.

What if you are an expert developer? It may be a nice addition to your library if you don’t have any book on the subject, but you probably already know almost everything that is explained (even hyperlinks, sparklines, stylesheets and crosstabs). So, just remember what the title says: it’s for beginners.

Book Review – “Instant Pentaho Data Integration Kitchen”

A new year, a new book about Pentaho. And, more often than not, a book about Pentaho Data Integration (PDI). While this may sound like a cliché, it has been true at least for the last few years. Some of you might think that this is a bad thing, but it’s not. If there is one great piece of software in the whole Pentaho stack that is worth spending time to learn and master, it is PDI, a.k.a. Kettle to old friends. Don’t doubt it, not for a second. If you want to be a Data Scientist, it is a good skill to add to your toolbox.

So, when Packt Publishing offered me the chance to review the new book by Sergio Ramazzina, “Instant Pentaho Data Integration Kitchen”, it was a big yes. A no-brainer.

For those who still don’t know what Pentaho Data Integration is, the simplest answer is: it’s an open source ETL tool created 10 years ago by Matt Casters.

About the book

Like many other books about PDI, this one starts by explaining what PDI is and adds a brief summary of its history. As many of you already know, PDI is quite a powerful tool, but mastering all its features requires time and commitment before you are able to design enterprise-level ETLs.

This book can help with that goal. Starting with how to create a simple transformation and a simple job (the two types of ETL processes in PDI), the book provides valuable information, tips and insights on how to master the command line, the repository, the execution log and the scheduling of jobs and transformations. To put it plainly, it helps you master some of the most important features you need when using PDI as the main ETL tool in a project. The chapter “Scheduling PDI jobs and transformations” is particularly interesting and useful.
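To give a flavour of what running and scheduling a job from the command line looks like, here is a minimal sketch of my own (it is not taken from the book). It assumes a Linux installation where Kitchen is started with kitchen.sh and accepts the -file, -level and -param options; the job path, the RUN_DATE parameter and the log file are hypothetical names chosen only for illustration.

```python
import shlex


def kitchen_command(job_file, level="Basic", params=None, kitchen="./kitchen.sh"):
    """Build the argument list for running a PDI job with Kitchen.

    Assumes the Linux launcher script and the -file, -level and -param options.
    """
    args = [kitchen, f"-file={job_file}", f"-level={level}"]
    # Named parameters are passed to the job as -param:NAME=VALUE
    for name, value in (params or {}).items():
        args.append(f"-param:{name}={value}")
    return args


def cron_line(schedule, args, log_file):
    """Format a crontab entry that runs the command and appends its output to a log."""
    command = " ".join(shlex.quote(a) for a in args)
    return f"{schedule} {command} >> {shlex.quote(log_file)} 2>&1"


# Hypothetical job, parameter and log path, for illustration only
cmd = kitchen_command("/opt/etl/load_sales.kjb", params={"RUN_DATE": "2013-12-01"})
print(cron_line("0 2 * * *", cmd, "/var/log/etl/load_sales.log"))
```

The same argument list could be handed to subprocess.run for an ad hoc execution, while the printed line is the kind of entry you would add to a crontab to run the job every night at 02:00.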

With a straightforward narrative, this short book is easy to read and, in my humble opinion, it could be an interesting complement to your PDI library if you are looking for a quick guide.

However, it should be said that if you are looking for a book describing the data warehousing process and how to use PDI for that process, this is not the book for you.