Essential Math for Data Science

adanai

What are some concepts you’d recommend reading over after completing the book?

adanai

Do you feel that learning to build the algorithms from scratch is important or using the tools is enough? eg. Implementation of gradient descent using libraries vs using numpy to build from scratch

Thomas Nield

As I posted in my previous answer to Kevin’s post above, I believe it’s necessary to build from scratch at least once to gain strategic value and insight into how these black boxes work. It really depends on how seriously you want to become a subject matter expert on a given topic, and of course you have to prioritize what you learn.
In my book, I teach how to do gradient descent and stochastic gradient descent from scratch for linear regression, logistic regression, and neural networks. For the first two of those I show how to do that without NumPy!
What you study after the book is dependent on your goals and what problems interest you. In the last chapter I emphasize it’s truly a “choose your own adventure” based on what you want to pursue! It wouldn’t be right for me to presume what every reader should pursue after the book, because I have a feeling many readers will react differently to what’s next for them.
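As a rough illustration of the from-scratch gradient descent mentioned above, here is a minimal sketch for a one-variable linear regression in plain Python with no NumPy. It is not the book's code: the function name `fit_line`, the learning rate, the iteration count, and the toy data are all assumptions chosen for the example.

```python
def fit_line(xs, ys, learning_rate=0.01, iterations=20_000):
    """Fit y = m*x + b by batch gradient descent on the mean squared error."""
    m, b = 0.0, 0.0
    n = len(xs)
    for _ in range(iterations):
        # Partial derivatives of MSE = (1/n) * sum((m*x + b - y)^2)
        grad_m = (2 / n) * sum((m * x + b - y) * x for x, y in zip(xs, ys))
        grad_b = (2 / n) * sum((m * x + b - y) for x, y in zip(xs, ys))
        # Step against the gradient to reduce the error
        m -= learning_rate * grad_m
        b -= learning_rate * grad_b
    return m, b


# Hypothetical toy data lying exactly on y = 2x + 3
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [5.0, 7.0, 9.0, 11.0, 13.0]

m, b = fit_line(xs, ys)
print(round(m, 3), round(b, 3))
```

On this toy data the loop should settle on a slope close to 2 and an intercept close to 3. A learning rate that is too large for the data makes the updates diverge, which is one of the things a library normally hides.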

David J.

Hi Thomas Nield, thank you for taking the time to answer questions about your book! My question is, what is something you learned later on in your data science journey that you wish you knew earlier on?

Thomas Nield

Oh boy, the way I describe my book to friends and colleagues is “it’s what I wished I knew 12 years ago before data science and AI became a thing.”
I think the biggest lesson I’ve learned is to ask questions when nobody else is. We live in a strange corporate culture where people don’t ask questions, especially the “elephant in the room” ones that people are afraid to ask (maybe out of fear they are missing something, or appearing ignorant?) I actually give this a name in my book, the Jabberwocky Effect.
What I find surreal is that asking questions, especially the uncomfortable ones that challenge popular assumptions, really took me for a ride. It frustrated some folks, but I got relieved gratitude from others simply by cutting through narratives and seeking grounded information. I now teach at the University of Southern California, advising government, military, and aerospace agencies on artificial intelligence system safety… simply by asking questions. I have now written a book on a subject I never thought I was qualified to write about, simply by asking questions.
Ted Lasso said it best, “Be curious, not judgmental.” You don’t have to be confrontational or rock the boat. Just be curious, tell people you’re confused, and you need help understanding. At worst you learn something new and show you aren’t afraid to admit you don’t know something. At best, you bring clarity to glaring problems that aren’t being acknowledged or addressed.

David J.

Thank you for the thoughtful response! I am definitely someone who has struggled with the Jabberwocky Effect for the reasons you stated (fear of missing something/appearing ignorant), but I think I’ve gotten better at trusting myself in knowing when I haven’t properly reflected on a question and when I think my question/confusion is legitimate. Sometimes that’s hard to gauge though, so when in doubt I try to embrace question-asking and follow Ted Lasso’s advice 🙂

Christian

Hi Thomas Nield, thanks for sharing your thoughts with us.
What do you think are the best ways to convert our reading activity on essential or fundamental Math principles for Machine Learning to expertise in solving real world issues?

Thomas Nield

I get this question in some flavor a lot. It’s easy to feel like a hammer looking for nails, which is pretty common for those who study machine learning. You can always create your own self-study projects, on public datasets or toy datasets you create (which I like doing as controlled experiments). But if you have a job that has you doing less glamorous tasks with data and isn’t providing you opportunities to use machine learning, try to take a problem-first approach. What problems does your employer have? And once you’ve identified that, try not to bludgeon the problem with machine learning but rather look at what other solutions are out there: linear programming, optimization, heuristics, metaheuristics… pairing the right solution to a problem is an invaluable skill. I think half the value of knowing machine learning is simply recognizing what it doesn’t do, and confused employers can benefit from that kind of expert knowledge.
The other route is to explicitly pursue roles that are machine learning geared, like ML Engineer rather than a generic data scientist.

Christian

Thanks a lot 🙏

Ricky McMaster

Hi Thomas Nield thanks a lot for doing this! Along with the main areas covered in the book of calculus, probability, linear algebra, and statistics, do you address (or do you have any thoughts on) other areas such as Optimization (which is of course highly related to linear algebra and calculus)?
The reason I ask is that I have sometimes seen candidates or colleagues approach a task by devising complex machine learning models, when in fact all that was really required was to solve a non-linear equation, for example.
Update:
I see that you’ve already mentioned this in your last answer to Christian, so my question would then be: what do you advise those who do not come with significant academic grounding in mathematics in choosing the appropriate technique for a task (like the scenario above)?

Thomas Nield

Glad you discovered my answer to your first question. I do talk about optimization a bit in my book, and I even throw a section on linear programming in the appendix. I would have loved to include a chapter on metaheuristics and optimization, like the traveling salesman problem. But I had a 350 page book already so I just mention these other techniques.
What’s funny about academics and people with PhDs is they often are hyperspecialized in their own rabbit holes. Many are much better at writing papers on the HOW rather than the WHY. They can fill a whole whiteboard full of Greek symbols and impressive math equations, but handwave over the silliness of building an algorithm to separate pictures of hot dogs from dachshunds wearing hot dog costumes. What value does this create? So you have to take the academic gravitas with a grain of salt, but there are exceptionally talented academics of course who do valuable work. There are a few I look up to.
A cherry-picked example, though, is a secondhand account from a colleague who went to a predictive modeling conference. A PhD candidate created an elaborate model trying to pinpoint factors causing a major airport to have sudden delay problems. During his presentation full of theories, data, and regression models, my colleague said “the airport closed a terminal for construction this year! Your model is not at all accounting for this!” He reacted sheepishly and carried on, even though a quick Google search would have nullified his entire project, as the delay cause was well known.
The question then becomes how do you stand out and have a skill set most academics do not have? I think the best answer to that is to have more awareness of business context and what’s practical, not having deep specialization in one topic with little insight on the overlap with the real world. I hate to sound like I’m advocating becoming a “jack of all trades, master of none” but there is so much information and complexity out there. Special interests are selling silver bullets to get more funding and investors. Somebody has to be the one to know how to pair the right solution to the right problem, and question experts who can’t see beyond their own field.

Ricky McMaster

Wow thanks a lot for the considered response! That’s a great and memorable anecdote about making sure you keep practical context and current available knowledge in mind.
So in other words…. there’s just as much danger in applying too much academic theory as not enough, possibly more so?!
I suppose from the point of view of practical statistics, this would tie in somewhat with checking your assumptions (and indeed your biases) at the outset of the project. And of course Occam’s Razor is always good to keep in mind - actually Alexey, the creator of this group, has something similar at the beginning of his Machine Learning book. Namely, ask yourself whether an ML model is actually relevant to the current task.

Thomas Nield

Exactly 💯

Evren Unal

Hi Thomas Nield,
When I was looking at your book’s content I realized that it is very concise.
How did you choose the content of the book?

Thomas Nield

Choosing content wasn’t easy. There are certainly topics I wish I could have included such as how to build simulations as well as optimization algorithms in more depth. But I made machine learning the end goal of the book, and to get there I guided readers through foundational topics like linear algebra, calculus, and statistics which then feed into linear regression, logistic regression, and neural networks. The “build upon” approach worked quite nicely, and areas I couldn’t get to like optimization could at least get called out as other areas to explore, and I provide tons of resources throughout the book to learn more. I made a diligent effort as well to tie in real world examples and insights, as well as pitfalls to watch out for.
The last chapter covers career advice and my polite rant on the state of data science. I give advice on how to navigate a field that’s largely been co-opted by many interests due to poor definition on what is and is not data science, and how to thrive and avoid career pitfalls. That was a fun chapter to write and one I think will help most readers. Based on the questions I’m getting on this channel, I am getting further affirmation that’s the case.

Evren Unal

Thank you very much
I hope your experience gives good insight to readers.

Matthew Emerick

Thank you for doing this, Thomas Nield. We appreciate your time.
Is your book geared more toward newcomers to the mathematical side of ML? Or is it better suited for those who once learned the subjects at university and are trying to relearn it and dig a bit deeper?

Matthew Emerick

Should the learner pick up some basic ML first, or start with the math?

Matthew Emerick

What is your opinion of learning through methods such as your book and YouTube videos versus taking classes?

Thomas Nield

This book is definitely geared more towards newcomers and beginners who have at least some understanding of high school algebra. No other knowledge is assumed, including machine learning. That being said, I think a lot of folks who have been dabbling in data science and machine learning will learn something new from the book. I pack in a lot of lesser-known knowledge that I wish more people knew going into the field, including brief insights into how self-driving cars work and how the application of ML matters in terms of hazard and risk.
I think learning is all about pursuing good information regardless of the medium. Classes, videos, books… I have found many good instructors across all these media and I share them in my book. It’s not so much which medium is best but rather whether the instructor is good and knowledgeable, has experience, and is able to explain things in a way that clicks with learners.

Alena Kniazeva

Hi Thomas Nield, thanks for sharing your expertise in math and data science.
What do you think is more valuable in data science: a mathematical or a programming background? I mean, who is more likely to succeed in data science:

an experienced programmer, if he learns math concepts and takes a data science course, or
an investigator or university professor with a strong knowledge of math, if he learns Python and also takes a data science course?

Thomas Nield

I talk about this extensively in my book, and you’ll probably not be surprised by my answer based on some previous answers I gave to other questions ; ) I think the experienced programmer is going to do better in a majority of data science job listings out there, because most tasks in data science are unglamorous data wrangling and moving it from one place to another. Then there is a growing awkward need to put models in production, and a programmer is already going to know how to do this well. This is 95-99% of useful data science work.
There are exceptions where the PhD would do better, if the role is hyperspecialized and requires said PhD. Maybe a high profile role at a big tech company would require that kind of credential as well, and I’m guessing if they are qualified for that role they already know how to code. If there isn’t programming involved, the role would probably be more advisory than coding. These are just my observations though and are somewhat anecdotal, but I’ve seen this pattern from what other people have shared with me too.

Alena Kniazeva

Thank you for a thorough answer. It is very, very useful to hear a position that is based on real practical experience.

onyeka okonji

Hi Thomas Nield, how important is an understanding of Maths for a non-research career in Deep Learning with a focus on computer vision?
Secondly, how well do you think one needs to master the Maths of DS, say on a scale of 10?

Thomas Nield

I think there is definitely some mathematical proficiency needed, especially on the linear algebra and basic calculus front (and yes, my book covers both). There are definitely rabbit holes knowing every minute detail on how TF works. But conceptually knowing gradient descent, matrices, vectors, tensors, and mathematical functions is largely unavoidable if you want a productive understanding of TF. Chapter 7 teaches how to build a neural network from scratch too.
For data science in general, data science can mean something different to each organization. But at minimum I would be familiar with statistics and hypothesis testing. It’s just as important to have comfort and proficiency working with data (SQL, pandas) and programming in general from my experiences.

Thomas Nield

The best thing you can do is to learn what’s relevant for your job and to always find the right tool for the problem, not the other way around! Math may be involved, or it may not.

onyeka okonji

Thank you Thomas Nield, I hope I get lucky. I’ll definitely want to read the book.

onyeka okonji

I should ask, does the book cover statistics and probability too?

Thomas Nield

onyeka okonji yup! Each of those topics gets its own chapter.

Alexey Grigorev

Animals for O’Reilly books always seem a bit random - but why mice? 😃
I know you probably didn’t have any control over it, but maybe you have a theory how the cover is related to the content? 😅

Bhupendrasinh Thakre

Read somewhere (maybe rumor) that the animals in their books are about to go extinct or need attention. Probably these are not mice 🐁

Rafael Socorro

https://www.oreilly.com/content/a-short-history-of-the-oreilly-animals/

Thomas Nield

Last I checked, the cover artist chooses the animal. My first book (Getting Started with SQL) had a natterjack toad. This one had mice. One thing that seems somewhat consistent is that the animals are thematically grouped by taxonomy. Reptiles are used for data books? Cats for Java? Rodents for data science? It’s a mystery…

Diogo Telheiro do Nascimento

Hey Thomas Nield! Nowadays it is getting much easier to try ML models and check whether they fit your data (especially classic ML approaches). It is commonly done almost like searching for your sneaker size.
My question is: how should math be used in this context of model selection?

Thomas Nield

What a question. I could give you the conventional answer that you should choose the model that best fits the test dataset and ROC/AUC (I cover this ad nauseam in my book) or produces the highest R².
But I caution a lot in the book that math and data do not capture context. Just because your test dataset or validation dataset scores well or you found a set of hyperparameters that give a result, it does not mean your model is at all connected to reality. The data (inevitably) is biased, the model (inevitably) has assumptions. The hyperparameters are easily p-hacked. A magic math formula is not going to quantify any of that or capture those qualitative issues that only a human in the loop can solve. Computers are incapable of discerning correlation from causation, detecting bias in data, or having any notion of ground truth in higher dimensional problems.
This is why I tell people to be analysis-driven, not data-driven.
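To make the warning about scoring well on a test set more concrete, here is a toy sketch in plain Python. It is an assumption-laden illustration, not anything from the book: the data is pure noise, the 200 "candidate configurations" are simply 200 unrelated noise features, and the names `make_data` and `accuracy` are made up for the example. Picking the candidate that scores best on a small test set yields a flattering accuracy that should vanish on fresh data.

```python
import random


def make_data(n_rows, n_features, rng):
    """Pure-noise dataset: features and labels are generated independently."""
    X = [[rng.random() for _ in range(n_features)] for _ in range(n_rows)]
    y = [rng.randint(0, 1) for _ in range(n_rows)]
    return X, y


def accuracy(feature_idx, X, y):
    """Accuracy of the rule 'predict 1 when the chosen feature exceeds 0.5'."""
    hits = sum(int(row[feature_idx] > 0.5) == label for row, label in zip(X, y))
    return hits / len(y)


rng = random.Random(0)
N_CANDIDATES = 200  # hypothetical number of configurations tried

# Small test set, reused to score every candidate
X_test, y_test = make_data(40, N_CANDIDATES, rng)
best = max(range(N_CANDIDATES), key=lambda j: accuracy(j, X_test, y_test))
best_test_acc = accuracy(best, X_test, y_test)

# Fresh data the selection step never saw
X_fresh, y_fresh = make_data(5000, N_CANDIDATES, rng)
fresh_acc = accuracy(best, X_fresh, y_fresh)

print(f"best candidate on the test set: {best_test_acc:.2f}")
print(f"same candidate on fresh data:   {fresh_acc:.2f}")
```

Nothing in the numbers themselves flags the problem. The only reason to distrust the winning score is knowing how the data was generated and how many candidates were tried, which is context the metric cannot see.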

Diogo Telheiro do Nascimento

Thomas Nield, thank you a lot for answering my question. This is by far the best answer I have ever had in this subject. 🤯

Sandhya G

Thomas Nield, I was wondering about the reasoning for including SciKit Learn in a chapter on Neural Networks. Thanks!

Thomas Nield

Good question. I didn’t want to inflict another library on readers when they had worked with a few already throughout the book (numpy, scipy, sympy, sklearn). It was basic and simple to just use what sklearn already provided, following the API patterns from previous chapters. Also, a majority of that chapter focuses on building a neural network from scratch using NumPy, including backpropagation and stochastic gradient descent. TF and PyTorch were mentioned but not given focus, since they would have been tangential to the purpose of that chapter: just getting insight on how neural networks actually work.

cactusmkt

Hi Thomas Nield, thanks for coming to talk about maths in data science! How do you suggest data professionals navigate maths concepts and applications in data science? Is there a way to know which ones are must-knows across the different paths in data science: analyst, data scientist, machine learning engineer, and so on? If I don’t use some concepts often in my work, it’s very difficult to remember them. Any suggestions would be helpful, thanks!

Thomas Nield

I talk about this A TON in the final chapter of the book, which is career advice paired with my polite rant on the state of data science. What seems to be widespread is employers jumping on the data science bandwagon without a good definition that everyone can agree on. This is problematic for anyone working in the field, because there’s no scope or restrictions on what is and is not data science. This happens because of organizational politics, and I expand on this in my book. I also provide a sensible and practical definition: a data scientist is a software engineer with proficiency in statistics, machine learning, and optimization.
My advice is to learn what solves the immediate problems in front of you. Learn some foundational building blocks which my book attempts to share, but don’t bias towards a specific tool out of FOMO. The greatest solutions to everyday problems are often obscure and sensible, not making news headlines : )

Ricky McMaster

Thomas Nield I guess from the above that you might have something to say about communication barriers between business and data/tech? This has certainly been a recurring issue for me, and I presume many others.

Thomas Nield

Those barriers exist as always, yes, but I also think there are larger systemic cultural issues that are unique to the current corporate climate. The barriers to what is and is not data science have regressed to the point that anyone who touches data can call themselves a data scientist. I think this is largely because middle managers under pressure to check the “data science” box have every incentive to rebrand their existing analysts, SQL developers, and Excel jockeys as data scientists.
But this isn’t just due to ignorance, but rather the wrong incentives that are put in place. High dollar management consultancies tell their F500 clients that sentient AI is just around the corner (not true) and this further enables them to sell more services to make the organization “AI-ready.” Management then grasps at straws trying to get talent as stated earlier and they haven’t even tangibly defined what they are trying to achieve…
And that’s another thing to consider! One has to weigh the gold rush that machine learning has created, and you have to look at the people selling shovels: consultants, cloud vendors, media outlets, GPU vendors, people who hold stock in these companies… You also have to observe the speculative markets pushing stories about AI to increase stock valuations. Slight tangent, I find it well-timed that Elon Musk used the Twitter fiasco as a vehicle to sell his Tesla stock, shortly after admitting that full self-driving is harder than he thought, and it would devastate Tesla’s stock valuation if it didn’t pan out. Maybe coincidence, maybe not. But it certainly was convenient he could signal he wasn’t selling Tesla stock out of concerns for the company’s future, but rather he wanted to buy Twitter for free speech activism reasons and then blame it for its well-known bot problem.
But I digress. Pulling back, there are a lot of valuable career paths in becoming proficient with data and computer science techniques to work with it. The challenge is navigating the current corporate culture that’s highly vulnerable to narratives and financial incentives that distort expectations on what’s possible, as well as what’s actually useful. I talk about this a lot in the final chapter.

David J.

Thomas Nield Super interesting. A somewhat digressive question related to your Tesla digression: are there any blogs/publications that you read on a regular basis (daily/weekly) to keep up-to-date on data and data-adjacent news and perspectives?

Thomas Nield

David J. For vehicle automation, I like Autonocast. Great podcast. Here are two of my favorite episodes:
http://www.autonocast.com/blog/2020/3/27/181-stefan-seltz-axmacher-on-the-end-of-starsky-and-the-future-of-autonomy?format=amp

Thomas Nield

http://www.autonocast.com/blog/2020/4/1/182-nancy-post-of-john-deere?format=amp

Thomas Nield

For staying up to speed, what’s funny is I spend a lot of time studying classical statistics and finding how much has been forgotten by contemporary practices. I try to understand how that progression happened, and more often than not it’s a little scary how messy things have become.
I read a lot of books (Aurélien Géron’s on Machine Learning is fantastic) but synthesize information from a lot of disparate sources, from research papers to YouTube videos. For data-adjacent I read WSJ and nonfiction books. Most importantly I do targeted research and try to pursue information relevant to what I’m interested in, while occasionally chasing my tail with what the cool kids are talking about this week. I need to dive into transformers at some point…

Ricky McMaster

Very interesting (Tesla etc.). For sure it would benefit the public and indeed our field if stats and data were better understood - indeed that’s why How to Lie With Statistics was written, among others. I just hadn’t considered that there were now such huge vested interests against it.

Thomas Nield

Companies don’t like sharing their safety data because of confidentiality and competitive secrets as well. That is a major frustration in the safety world. It usually takes a high profile accident to make any strides in getting insight, but this struggle is still ongoing.
https://www.ntsb.gov/investigations/AccidentReports/Reports/HAR1903.pdf

Kevin

Hi Thomas Nield, I remember from my time at college that my math teacher used to say that we would probably never perform matrix multiplication manually, and that we should learn the tools (at college, Maple) and understand what we were doing instead of memorizing how to perform those operations manually.
What is your opinion about it?

Thomas Nield

Good question. I say this in the last chapter of my book, but what you learn has to be prioritized. I can’t tell you how a regular expression compiles but I am very good at using them. I have no reason to go down that rabbit hole unless my job suddenly needs me to become a subject matter expert on the ins and outs of regular expressions… and that’s the key determining factor!
But for things like machine learning, and if you want to practice machine learning, it is beneficial to attempt building a linear regression, logistic regression, and neural network from scratch at least once. And yes my book covers this! This requires some matrix multiplication, but by doing this exercise you can speak to the libraries you use with more insight and subject matter expertise. That’s not just invaluable but arguably necessary.
So it really depends on how much knowledge authority you need on a subject, and whether doing a deep dive into the black box has strategic value. Machine learning is a topic that very few people actually understand and yet are making strategic decisions on, so it might be a liability to just have a black box understanding and nothing more.
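For a sense of how small the from-scratch version of matrix multiplication is, here is a minimal sketch in plain Python. The function name `matmul` and the example matrices are assumptions for illustration, not taken from the book. It uses the product to compute linear regression predictions from a design matrix and a weight vector.

```python
def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    if len(A[0]) != len(B):
        raise ValueError("inner dimensions must match")
    # Each output cell is the dot product of a row of A and a column of B
    return [
        [sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
        for row in A
    ]


# Hypothetical design matrix: a column of ones (intercept) plus one feature
X = [[1.0, 1.0],
     [1.0, 2.0],
     [1.0, 3.0]]
# Hypothetical weights: intercept 3, slope 2
w = [[3.0],
     [2.0]]

predictions = matmul(X, w)
print(predictions)
```

Writing this once makes it clearer what a library call such as a dot product is doing when a model computes its predictions, which is the kind of insight the answer above is arguing for.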

Kevin

Thomas Nield thanks for the answer!

DataTalks.Club

Essential Math for Data Science

by Thomas Nield

The book of the week from 29 Aug 2022 to 02 Sep 2022

Questions and Answers