A new year, a new book about Pentaho — and, more often than not, a book about Pentaho Data Integration (PDI). While this might sound like an exaggeration, it has certainly been true in recent years. Some of you might think that is a bad thing, but it isn't. If there is one piece of software in the whole Pentaho stack that is worth the time to learn and master, it is PDI, a.k.a. Kettle to old friends. Don't doubt it. Not for a second. If you want to be a Data Scientist, it is a good skill to add to your toolbox.
For those who still don't know what Pentaho Data Integration is, the simplest answer is: it's an open source ETL tool created ten years ago by Matt Casters.
About the book
As many other books about PDI do, this one starts by explaining what PDI is, along with a brief summary of its history. As many of you already know, PDI is quite a powerful tool, but mastering all of its features requires time and commitment before you are able to design enterprise-level ETL processes.
This book can help with that goal. Starting with how to create a simple transformation and a simple job (the two types of ETL processes in PDI), the book provides valuable information, tips, and insights on mastering the command line, the repository, the execution log, and the scheduling of jobs and transformations. To put it plainly: it helps you master some of the most important features for using PDI as the main ETL tool in a project. The chapter "Scheduling PDI jobs and transformations" is particularly interesting and useful.
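For readers who haven't touched the command line side of PDI yet, a quick sketch of what those runs look like: PDI ships with Pan (which executes transformations) and Kitchen (which executes jobs). The file names below are made-up examples; the tools and their basic options are real, though exact paths depend on your PDI installation.

```shell
# Run a transformation (.ktr file) with Pan.
# -file points at the transformation; -level controls logging verbosity.
./pan.sh -file=/etl/load_customers.ktr -level=Basic

# Run a job (.kjb file) with Kitchen, writing its log to a file.
# This is the typical form you would wrap in a cron entry for scheduling.
./kitchen.sh -file=/etl/nightly_load.kjb -level=Basic -log=/var/log/etl/nightly.log
```

On Windows the equivalents are `pan.bat` and `kitchen.bat`; when your jobs live in a repository rather than on disk, Kitchen and Pan take repository connection options (`-rep`, `-user`, `-pass`, `-dir`) instead of `-file`.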
With a straightforward narrative, this short book is easy to read, and in my humble opinion it could be an interesting complement to your PDI library if you are looking for a quick guide.
However, it should be said that if you are looking for a book that describes the data warehousing process and how to use PDI for it, this is not your book.