I**G
Beautiful with excellent references but the code needs work
This is a beautiful, thoughtful survey with excellent references. I am an academic data scientist with nearly 20 years of experience and I wanted a book to offer my students who are starting in the field. This is it.
The "difficulty" with data science is in the breadth of skills that are needed. Because data scientists need training in art, communication, statistics, and programming, nobody is prepared to handle all the tasks, and the neophyte (and expert) will need to fill in around their weaknesses. This book does a brilliant job of working around that issue. The writing is superb for a beginning to intermediate reader and the graphics and aside boxes are engaging. More importantly, the references are plentiful and spot on. In the areas I know well, the authors suggest the things I recommend, and where I am weak, the recommendations have proven interesting.
While this is a broad survey, there is some depth here. There are formulas throughout but the book does not get bogged down in proofs and derivations. There are programs written in R code scattered throughout. The code is nicely commented but there is not a deep dive into how it works. So, the reader who knows some R will learn a few new tricks but it does not interrupt the flow of the book.
A reader who types the R code will run into problems. Clearly the authors/editors did not attempt to run the code after the typesetter mangled it. For example, on page 39 there is a line which begins with a + and that character needed to be on the previous line. In other places (like page 49), functions are invoked (count) but the authors have not included the commands to make the functions available (in this case library(plyr)). Sadly, there does not seem to be an errata list for the book, and these will be major headaches or show stoppers for novices.
While this book could be improved with a code supplement on the web (including the code to make all the graphics, complete solutions to the example/problems, and an errata list), this is a wonderful buy for readers of all levels.
D**L
Great Introduction to the Field!
I purchased and read this book because I was interested in learning about what data scientists "do" on a day-to-day basis, and the book definitely delivers in that respect. It is very up-front in stating that it does not cover many of the details (e.g., requisite math/statistics background, coding in R, etc.), but it provides numerous references for learning this material for those who wish to take a deeper dive. It provides perspectives from a number of experts in the field, along with their associated domains of expertise, and hence appears to give a good sampling of the different backgrounds and personalities of self-labelled data scientists.
From a pedagogical perspective, I think it's a valiant attempt to create a class that's not so "academic." The book is a compendium of individual lectures that were the basis of a data science class at Columbia University, and the corresponding assignments were aimed at giving students a flavor of real-world data science problems (where data is messy, specific questions regarding outcomes are not well formed, etc.), which I think is an amazingly valuable experience to give students perspective on what the field is about. But I wonder if the course went a little *too* far into "the real world is not pretty" aspect of things. The students wrote a chapter at the end of the book where they criticize a standard academic problem that comes from a statistical learning textbook and that, in my opinion, is a negative outcome. While elegant mathematical theories do not describe the complexities of the real world, understanding the subtleties of algorithms is an important part of any scientific field, and to discount that is a disservice to the students. Too much depth and not enough breadth is bad in the real world, but so is not having *enough* depth. On the whole, though, I really applaud the approach of the authors at building a "real-world" class.
One of the things that the book has left me pondering is whether there is a clear distinction between data *science* and data *engineering*, and whether it is possible to have clear roles for people who are predominantly scientists versus those who are predominantly engineers. A few of the guest lecturers claim that 90% of the work of a data scientist is organizing the data, but I can't help but wonder if that's the bias of the individual speakers: that they *like* this component of the work and, therefore, focus on that more, rather than it being an absolute must to spend 90% of your time working on data structuring. I am sure that if the authors were to read this comment, they would think that I am missing one of their major points (namely, that one can make a lot of mistakes in data leakage and the like from poor data structures, hence making this an integral part of the process). But I think there's a certain personality type that really loves working on the actual processing component who would otherwise be turned off by too much data structuring if it really composed 90% of the work. And harnessing the skills of those types of people is undoubtedly invaluable. So I'm really wondering if many companies are able to break up roles where some people (the scientists) focus more on the processing part, while others (the engineers) focus more on the structuring part.
C**A
A healthy blend of competence and humility
I make my living working with businesses to help them build their analytic capabilities in sales and marketing, by working on real opportunities and generalizing lessons from specific experiences and results. Business sort of demands that over time you narrow problem-defining frameworks, problem-solving techniques, and problem-processing operations to what you can scale and consistently deliver. But what I have learned is that if you push these past use as points of departure, and force them as points of arrival, results and learning suffer. (As they say, if you have a hammer, then everything looks like, etc. etc.) This is especially true as things get more complex on all these fronts. What I appreciated about this book was that even as it is an excellent survey of specific approaches and techniques at the coal-face of their application (there are code samples here, people!), it's also a thoughtful exposition of the limits of these and of the pitfalls they present. You might be tempted to skip Chapters 15 and 16 if you're just looking to RTFM, but you shouldn't; for me they were the real meat of the book. Maybe it's confirmation bias at work, but it was reassuring to hear such a humble tone coming at the end of such a competent treatment of different data science challenges. Doing this work very occasionally brings triumph and much more often despair, so it helps to have reminders from seasoned pros that what really matters is just trying, steadily, to get better at your craft.