Pentaho Kettle Solutions: Building Open Source ETL Solutions with Pentaho Data Integration
D**N
It's in the book
I wanted to do this review much sooner, but I've been too busy using the book.

Jos and Roland have taken the proven formula they used in Pentaho Solutions and focused it on ETL and Kettle, AKA Pentaho Data Integration. Their magic formula is to seamlessly mix a product user's guide with equal parts real-world examples and best-practices training. With the addition of Matt Casters, Mr. Kettle himself, the depth of knowledge in the book is now equal to its breadth. The result is a book that you can read cover to cover to learn about all aspects of building and deploying ETL solutions, and that is equally useful as a day-to-day reference.

The book is divided into five parts, starting with an obligatory Getting Started. Getting Started, however, goes beyond the traditional "here's how to install it" guide and presents a nice tutorial on the sometimes confusing terminology and practices used in the data world. It explains how Kettle fits into this world and talks about the key concepts in Kettle. The first part ends with an excellent example ETL solution that populates a non-trivial yet easily understood star schema. The example covers fact and dimension tables, change data capture, generating date dimensions, and the ETL jobs and transforms required to populate the data.

The organization of the second part of the book is based on the 34 subsystems of ETL as defined by Ralph Kimball in "The Data Warehouse Lifecycle Toolkit", considered by many (including me) to be the bible of data warehousing. For each subsystem, Kettle Solutions refers to the original chapters that describe the topic and provides examples of how to solve those issues using Kettle. It is a must-have for anyone struggling with the concepts presented in the Kimball book. For the rare cases where Kettle does not have a straightforward solution, the book points you to other open source software that can get the job done.
The authors stay true to the task of helping the ETL developer solve real problems, regardless of whether Kettle is the complete solution or not.

The first two parts take up about half the book, and if the authors had stopped there, it would be worthy of at least 4 stars. But as with most software development, the base code (in this case, the jobs and transforms) is the easy part and usually the most fun. The really hard stuff comes when you have to deploy your solution into the real world, keep it running, add new capability, explain it to others and be confident that it is actually working. Part three walks you through the ETL lifecycle with best practices and pitfalls from three people who do (or have done) this for a living. Everything is covered, from development and testing through documentation, monitoring, migrating and auditing. Part four finishes what part three started by covering performance tuning and scaling, with topics like clustering, partitioning, cloud and real-time ETL.

The last part is my favorite since it covers the advanced stuff like writing Kettle plugins, complex data formats, integrating data from web services, dynamic ETL, embedding Kettle, etc. There are many ways to extend Kettle via defined APIs, and Kettle Solutions covers them all.

As you can probably tell, I like the book and I use it often. I have the luxury of being able to ask Matt questions when I run into trouble. Since writing the book, he now answers "it's in the book" and, needless to say, it is. I can honestly say that having this book sitting on your desk is better than having Matt sitting on your desk. Kettle Solutions is also available for Kindle which, much to my surprise, has proven very useful. I use it on my iPhone and Mac via the Kindle app, and despite some Kindle app limitations, such as the lack of cut and paste and of a good search, it is always available as a reference.
The links are live, which is a bonus.

I'm a fifteen-year veteran of building BI software, one of the original Pentaho developers, and currently the Pentaho community guy. I work with Matt Casters; I'm not professionally affiliated with Jos, Roland or Wiley and receive no benefit from this book beyond the satisfaction of having Pentaho software be so well represented. I do consider all three of them good personal friends, and I provide this review at the risk that it may greatly inflate their heads.

Doug Moran
Pentaho
S**P
Very good book. Could have been even better ...
I've given it 5 stars because, for me, the value I got out of just the one chapter on the Data Vault was worth the money.

However, I definitely would have wanted more coverage of Pentaho Data Integration (Kettle) itself. The online documentation is good but still leaves a lot to be desired in real, workable examples. The book does fill that need to an extent and covers some pretty decent logic flows.

It would have been nicer to have a step-by-step source-to-target guide, with reasoning and explanations of the steps, and a LOT more focus on the tool than on ETL subsystems. I didn't even care about the subsystem material because, as an experienced data integration expert, I don't need it.

What I wanted was a walkthrough of each step, debugging, tuning, patterns and more on the tool's metadata, which is extremely powerful. I think the book fell a bit short on this, even though it did deliver to an extent. For someone wanting to learn Pentaho DI (Kettle) from scratch, it's really not enough, and it should have been.

The "Advanced Topics" are very well written and cover some ground that is inspiring.

Overall, it's a really good book.
F**S
Must have!
The guys who developed Pentaho Data Integration, aka PDI or Kettle, teamed up to write a definitive book on the software. Everything you always wanted to know about PDI but didn't know you needed! Plus a Dimensional Modeling chapter written by Kimball himself and an appendix teaching the basics of Data Vault: how to create one and use it to populate a dimensional model. Buy it! It is worth much more than they are asking for!
B**L
Good Starting Point
This book is pretty good for learning more about Pentaho. Although online sources are plentiful, they do not supply enough information, and I think this book is a must-have. It is also good for people who are new to ETL and want to learn about data warehousing.
A**B
Too much Kimball's subsystems, not enough Pentaho.
Finished _Pentaho Kettle Solutions_, finding it generally OK. For me, it spends too much time covering Kimball's "34 Subsystems of ETL", fitting Pentaho into that framework. However, I got quite a bit out of the fifth section of the book "Advanced Topics".
J**T
John
While I haven't read this book end-to-end (and never planned to), it is my main reference for everything to do with PDI. Everyone who is developing in PDI should have this book sitting beside them.
S**R
Already Out of Date
The book is great; the problem is that Pentaho has released updates to their product so frequently that this book is no longer current.
V**L
Hands on
Excellent book, very much hands on. If you are skeptical about using this amazing open source ETL solution (or any sustainable open source for that matter), this book will surely put you at ease...
A**F
Five Stars
Great
D**L
The original guide of Pentaho Data Integration, an excellent reference book
This is a guide to all the capabilities included in Kettle (Pentaho Data Integration), provided with transformations and jobs to interact directly with metadata.
A**R
Five Stars
Good
Y**M
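Covers ETL from the basics to advanced applications
The author is the founder and project chief of Pentaho Data Integration (Kettle). The content ranges widely from the basics to advanced applications, organized into 23 chapters and 600 pages. In particular, the data conversion from operational OLTP systems to analytical OLAP systems, and the concepts behind it, are explained with examples, and it is excellent that the how-to of building a DWH can be understood systematically. I think it will become something of a bible for anyone involved with Kettle, from ETL beginners to advanced users. Because it relies mainly on technical terminology, some prior BI knowledge and English reading ability (or patience?) are required, but even so it is worth reading.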