Learning Spark: Lightning-Fast Data Analytics
A**Z
Covers theoretical and practical aspects of the Spark ecosystem in great depth
This book is a great resource for learning about Spark. It covers in detail the concepts behind the Spark architecture, the theory of parallelization, and topics related to optimizing analytical pipelines running on Spark. The book has a very nice section about Delta Lake. It also covers MLflow in a good level of detail, more as a complement to the docs. The section on machine learning includes theoretical explanations of how some ML algorithms change when they run in parallel, as MLlib does. I used the book as an extra study resource when taking some Databricks certifications. It was a great addition to my study materials.
J**A
Well organized and solid information
The book was easy to follow. The setup of the Spark shell was also clearly written, and I found the online instructions for installing Spark locally to be sufficient as well. The book is well organized and clearly delineates the different components of Spark, e.g. intro, structured API, streaming, optimizations, data lake, ML deployment options. While ML deployment needs for individual business use cases are highly specific, I found the overview of a deployment framework provided by the book to be helpful. I also liked that the book uses screenshots of the Spark UI, with arrows pointing to the relevant parts, to explain the UI, since the UI can be hard to understand. The code samples and the graphics in other sections are useful as well. There’s also coverage of how to connect to different apps, like Beeline (which I’d never heard of), Tableau, and Thrift. Overall, the book contains solid information on the inner workings of Spark. I would recommend giving this book a read!
S**E
Decent introduction to Spark
I am always trying to learn new skills to make myself more marketable in the workplace. My background is mainly in SQL with some Python, and I am learning JS right now. I decided to give this book a shot to see whether Spark is another tool I want to add to my arsenal. The book does what it promises; it gives you a good introduction to Spark. I did have some issues installing the required programs on a MacBook, but once I had everything installed, I was able to follow along. My big complaint is what others have mentioned: concepts are introduced without any background on what they are or why they matter. If you have some programming background, this book should be sufficient to get you up and running in Spark.
C**S
Good book for getting started with Spark
It gives good examples in both Scala and Python. The examples are not always in Python, but the Scala language is similar (like a mix of Java and Python). It suggests that if you want to practice, you use Databricks if you don't want to install anything on-premise, or, if you prefer, install Spark using WSL on Windows or a Linux virtual machine.
M**D
Must read
This book is a must read for anyone trying to learn Spark in the big data environment.
A**R
More Databricks-centric
Nice book if you really want to work hands-on without having to worry about the internals of Spark.
T**S
Great beginner book
I'm a software engineer who knows his way around SQL, mostly running queries/transforms on Postgres and Redshift. The majority of my background is in building and supporting services. Having no background knowledge in Spark, I was looking for a book that explains the fundamental concepts, helps me get up and running, and helps me expand my toolkit for working with "big data".
I was able to follow along in this book fairly easily. Working on a MacBook, I did have to first install Scala, download Spark, enable Spark in IntelliJ, etc. I didn't have trouble with this as it was fairly straightforward. With my environment set up, I found the book presents every code sample in Scala and Python. I worked through the code samples, chapter by chapter, writing Scala in IntelliJ or sometimes writing Scala in the Spark CLI itself.
I did take a slight detour from the book to learn a bit more about sbt, which is the Scala build tool.
For a beginner such as myself, this book is a godsend, but I do wish the authors had approached some things differently.
In my opinion, some topics are covered in a very "hand-wavy" manner. For example, Chapter 4 discusses managed vs. unmanaged tables. While knowing this difference exists is helpful for the reader, the authors never discuss when you should use a managed table or an unmanaged table. They could have included that information or pointed the reader to some external source. This part of Chapter 4 then shows sample code on how to create a managed table from a CSV file. However, it's not clear what I should do with that information. What are the patterns applicable to a managed table vs. an unmanaged table? What are the trade-offs? Even for a beginner book, I feel the authors could have written just one page on this, which would have added significant value to this section.
Sometimes the book will share an interesting tidbit but use terminology or concepts that the authors haven't really described. I found this very frustrating. For example:
> (Chapter 4, page 92) ... you can create multiple SparkSessions within a single Spark application—this can be handy, for example, in cases where you want to access (and combine) data from two different SparkSessions that don’t share the same Hive metastore configurations.
If you search for mentions of Hive, you see the authors briefly mentioned that Spark uses a Hive metastore to persist table metadata. So are the authors saying I can use one Spark installation and access table metadata from different Hive metastores? Why would I ever want to access only the metadata for different tables? Again, the use case isn't clear.
As a beginner, I found this book very valuable, and I believe it is a great investment.
E**C
Decent introduction to Spark
You should probably have some familiarity with machine learning and Python before you pick this book up, but it's a decent introduction to Spark.
I**R
Best Reference for Spark
If you want to learn Spark 3.x+, this book is for you. Easy to read and with practical examples. Recommend it!
F**O
Up-to-date content
I think this is a good introductory book on the framework, especially because, as of this review, it is one of the only ones with content updated for Spark 3.0. Together with the book "Spark: The Definitive Guide: Big Data Processing Made Simple", it helped me a lot in passing the Databricks certification. I recommend it.
J**Y
Nicely laid out and explained
I've just started my role as a Data Engineer, where I looked at Azure's Data Factory. I needed to learn PySpark, so I picked up this book and found it a super useful guide. It is explained clearly, and whilst it's clearly aimed at someone who has been in the industry longer than I have, I found I could easily understand it. I haven't read the chapter on streaming or the two chapters on machine learning as they aren't applicable to me, but everything else has been just what I needed. Well done to the authors for putting together such an amazing guide. If you want to see the different chapter contents, I've added them as photos for your ease.
B**R
Very Good
Concepts are explained very well