Questions and Answers
Hello Giuseppe Bonaccorso and thank you for taking the time to answer our questions! I looked at your book on Amazon and it looks really fascinating and comprehensive.
If you could rewrite it today, what algorithms would you include (that weren’t included) and which algorithms would you exclude (that were included), and why?
Hi Sara Lane,
the book was meant as a “continuation” of Machine Learning Algorithms (2nd ed. too), which contains more fundamentals.
In another project, I’d probably completely remove the Deep Learning part (which can be expanded in a separate book) and focus more on:
- All evaluation metrics with pros and cons
- A deeper (it’s already quite complex) emphasis on statistical learning
- Probabilistic graphical models (more complex methods and examples)
- Time-series analysis (also in this case, with many more details)
The reason is to write a more “complete” book focused on “classic” ML. This is not a limitation, considering the number of applications and the usage of these concepts in the context of DL.
Also, I see that you go through all the different areas of machine learning (or at least most of them) and explain the various algorithms for each of them. Which algorithms, overall, do you think are the most overlooked?
Dimensionality reduction (both linear and non-linear), component analysis, and, of course, DL models.
Why do you say that DL models are overlooked? In what way do you mean?
It’s more of a marketing problem. In general, DL practitioners search for books that emphasize them. This book is about ML (more generically), but many people enjoyed the DL part :)
Thank you for taking the time to answer my questions.
Hi, my questions are:
- Are there any algorithms that you think many people make mistakes with when using them (e.g., not understanding the assumptions a particular algorithm makes about the data, using it in an unsuitable context, etc.)?
- What are good ways of becoming more familiar with algorithms in depth? (It can be really dry reading about them, and it’s also not all that interesting to write them from scratch to work on a toy dataset, or at least I don’t find it so ;))
For example, component analysis is a tricky area that is often misunderstood. Another area where it’s all too easy to make mistakes is clustering. Many newcomers have no idea about the concept of distance measures and tend to use default approaches even when they are completely inappropriate.
In general, every algorithm has been developed with a purpose. That should be the starting point. Why is this algorithm different? Which peculiarities does it have? Answering these questions lets you understand the meaning of all the hyperparameters and how to tune them in real cases. It’s also important to compare the performance of different algorithms, focusing on the differences. Expertise can only be built starting from the foundations of the algorithm. That’s why, in many cases, it’s also helpful to read the original papers, where the authors explained the context that led them to develop a new algorithm.
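To make the point about distance measures concrete, here is a minimal sketch showing that the "nearest" point can change depending on the chosen distance. The vectors and names are made-up illustrative values, not taken from the book.

```python
import math

def euclidean(a, b):
    """Straight-line distance: sensitive to vector magnitude."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_distance(a, b):
    """1 - cosine similarity: depends only on the angle between vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def nearest(query, candidates, distance):
    """Return the name of the candidate closest to the query."""
    return min(candidates, key=lambda name: distance(query, candidates[name]))

# Hypothetical data: one candidate points in the same direction but is far
# away, the other is close but points in a different direction.
query = (1.0, 1.0)
candidates = {
    "same_direction_far": (10.0, 10.0),
    "different_direction_close": (2.0, 0.0),
}

nearest_euclidean = nearest(query, candidates, euclidean)
nearest_cosine = nearest(query, candidates, cosine_distance)
print(nearest_euclidean, nearest_cosine)
```

With these values the two measures disagree, which is why a default distance can silently produce clusters that do not match the structure you care about.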
Hi Giuseppe Bonaccorso
As far as I can see, your book covers a wide range of topics.
Congratulations. It must have required quite an effort to gather all those topics in one book.
My question is:
To get the most out of your book, should the reader have any prior knowledge of ML or math?
Yes. The book requires math knowledge to understand the theory behind the algorithms. In some cases, the paragraphs are self-contained, but, in many others, it’s necessary to take background knowledge for granted. As I often suggest, this can be a good way to learn something new. When you meet a concept you don’t understand, “bookmark” it, go, for example, to Wikipedia (or a textbook), and study what you don’t know. In this way, the gap will be slowly filled.
Thank you for the answer 👍
Hello Giuseppe Bonaccorso,
First of all, thank you for your time!
- How is it different from other books on the market?
- At the end of the book, what kind of intuition did you want the reader to create about algorithms?
Hi Alper Demirel,
- The book has been “designed” to focus both on theory (with proofs and mathematical explanations) and practice. All paragraphs contain examples to show how the algorithms work and how to implement them in real cases.
- The goal is to let the reader understand how machine learning can be achieved and which mathematical frameworks have been developed to do so. Every new step in this field requires that we not forget the foundations, to avoid the mistake of thinking that progress has been obtained “by magic”.
I understand sir, thank you very much for your answers! Now I’m starting to wonder more about the book.
Hello Giuseppe Bonaccorso,
My questions are as follows:
- How do you categorize the book: is it for beginners or for experts? And what is the best way to use it: do you suggest studying it in full, or using it when a data scientist faces a scenario and wants to find the best way to address it?
- After studying the book, what comes next? I think it depends on the interests of the learner, but I am asking what is next from your side: what is your next book or plan?
Hi Ghaith Sankari,
- The book is for mid-level and advanced users. Everybody can choose the best way to use it. In general, I jump directly to the topic I’m interested in, but this is not a general rule.
- I don’t have any plan to write a new book. Every reader can pick the topics that most capture her attention and look for more detailed resources. In general, I add references for this reason.
thank you very much
Hello Giuseppe Bonaccorso,
Thank you so much for your time and allowing this opportunity to ask you questions!
ML algorithms are many, and their parameters even more.
- Is there a logical way of choosing one algorithm over another, other than a process of elimination and trial and error?
- Once you have picked your algorithm and applied the various parameters suitable for your given project, is there a methodology for the finer tuning, rather than memory and trial and error?
- Mastering successful algorithm implementation and developing expert intuition, is achieved from acquiring more knowledge and understanding (logical) or achieved from building on past experiences - successes and failures (deductible)? Or both?
Hi Livsha Klingman,
- Unguided trial and error is definitely not a good strategy. I always invite the reader to acquire a “basic” awareness and to create a subset of really appropriate algorithms (possibly covering the peculiar aspects of the dataset). For example, in a clustering task, it’s always possible to include K-Means, but if we know that the data is fragmented into irregular clusters (like in a geographic dataset), it’s also helpful to evaluate the performance of algorithms like DBSCAN.
- Hyperparameter tuning can start from default values, but it must go on according to each specific scenario. Considering again DBSCAN, for example, if there are too many noisy points, it’s easy to understand that the radius is too small. Of course, it’s also possible to employ methods like grid search or Bayesian selection. Again, searching without an understanding of the effect of every hyperparameter can result in a waste of time and unacceptable results.
- Definitely both.
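The DBSCAN remark (too many noisy points suggests a radius that is too small) can be illustrated with a small sketch. It is a simplified, pure-Python computation of only the noise points, not a full DBSCAN implementation; the data is made up, and the parameter names `eps` and `min_samples` simply mirror the ones scikit-learn uses.

```python
import math

def count_noise_points(points, eps, min_samples):
    """Count the points DBSCAN would mark as noise.

    A point is a core point if at least `min_samples` points (itself
    included) lie within `eps` of it. A point is noise if it is neither
    a core point nor within `eps` of a core point.
    """
    neighborhoods = [
        [j for j, q in enumerate(points) if math.dist(p, q) <= eps]
        for p in points
    ]
    core = {i for i, nb in enumerate(neighborhoods) if len(nb) >= min_samples}
    noise = [
        i for i, nb in enumerate(neighborhoods)
        if i not in core and not any(j in core for j in nb)
    ]
    return len(noise)

# Hypothetical data: two small square groups plus one isolated point.
points = [
    (0, 0), (1, 0), (0, 1), (1, 1),
    (10, 10), (11, 10), (10, 11), (11, 11),
    (5, 5),
]

noise_small_eps = count_noise_points(points, eps=0.5, min_samples=3)
noise_larger_eps = count_noise_points(points, eps=1.1, min_samples=3)
print(noise_small_eps, noise_larger_eps)
```

With the smaller radius no point has enough neighbors, so everything is labeled noise; with the larger one only the isolated point remains. Knowing what the radius does makes this diagnosis immediate, without a blind search.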
I noticed that you included VC dimension in the ML fundamentals chapter. I thought it is a notion that lives only in academia. Are there ways one can actually use it? For example, can one use it to anticipate the complexity of algorithm required to solve a classification problem on a given dataset? If so, are there algorithms to evaluate it?
I agree that VC is a theoretical concept and there’s only a paragraph to explain to the readers the efforts made to evaluate the capacity of a model.
The complexity of this theory is very high and it requires the usage of lots of maths. I referenced books and papers dedicated to this topic, but all practical applications are extremely difficult.
I included this concept for completeness, discussing an easy example. In practice (in particular in DL), it’s almost impossible to make a correct evaluation (unless we fall back on the universal approximation approach). I hope my answer is satisfactory.
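For readers who have not met the concept, one classical form of the VC generalization bound (due to Vapnik; the notation here is an assumption and may differ from the one used in the book) states that, with probability at least $1 - \delta$, a classifier $h$ from a hypothesis class with VC dimension $d$, trained on $n$ samples, satisfies:

```latex
R(h) \;\le\; R_{\mathrm{emp}}(h) + \sqrt{\frac{d\left(\ln\frac{2n}{d} + 1\right) + \ln\frac{4}{\delta}}{n}}
```

Here $R(h)$ is the expected risk and $R_{\mathrm{emp}}(h)$ the empirical risk on the training set. For linear classifiers with a bias term in $\mathbb{R}^m$, $d = m + 1$. The bound is usually very loose, which is consistent with the difficulty of practical applications mentioned above.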
Thanks for the clarification. I was just surprised to see that it is even mentioned; I never saw it in a “practical ML” book before. I am a mathematician myself, and I keep being reminded that the industry doesn’t have much reason to care about understanding algorithms or theoretical concepts.
After all, people just need to know which algorithm works better under certain conditions, and in most cases this is summarized in one line in any book/blog with common interview questions.
Was hoping there is a use of knowing what VC dimension is, but I guess there is not much…
This book looks like a serious work that required a lot of research. What gap were you looking to fill when you wrote it?
It’s indeed a very long work that required a lot of time… The gap I had to fill is the one normally present between papers (where almost everything is taken for granted) and practical books (where there are only examples). I wanted to include theory contextualized with practical examples, avoiding too many “holes”.
Thank you so much for answering me previously!
Could I ask you another question?
My past ML experience came almost entirely from asking others and from trial and error. Although I got very successful results, I felt that my personal understanding of the specific hyperparameters, and of the ‘weight’ each one carries in an individual algorithm, was very limited, and I did not find adequate resources to lead me to the clarity I was looking for.
I clearly saw that the value and the effect of a given hyperparameter depended on the algorithm and were not necessarily the same for that hyperparameter in other algorithms. Is that correct? And is your book targeting the gap in information that I am describing?
Algorithms are presented together with their peculiar hyperparameters, so it’s relatively easy to make your own personal experiments.
Absolutely, the effect of hyperparameters can be very different when changing algorithms (in some cases, they might not exist at all). But if the algorithm is described together with its hyperparameters, the selection work can be easier.
Of course, trial and error can be helpful, but, at least, you know that a hyperparameter can have one effect or another.
Just to summarize, some experience is necessary, but you should know that, e.g., an L1 penalty will induce sparsity. I hope that’s clear.
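As a minimal sketch of why an L1 penalty induces sparsity, consider the special case of an orthonormal design, where each coefficient can be solved for separately. Under that assumption, minimizing `0.5 * (b - b_ols)**2 + lam * abs(b)` gives soft-thresholding, while the L2 version `0.5 * (b - b_ols)**2 + 0.5 * lam * b**2` only rescales. The coefficient values below are made up for illustration.

```python
def soft_threshold(value, lam):
    """L1-penalized solution for one coefficient (orthonormal design)."""
    if value > lam:
        return value - lam
    if value < -lam:
        return value + lam
    return 0.0  # small coefficients are set exactly to zero

def ridge_shrink(value, lam):
    """L2-penalized solution for one coefficient (orthonormal design)."""
    return value / (1.0 + lam)  # shrinks, but never reaches zero

# Hypothetical unpenalized (least-squares) coefficients.
ols_coefficients = [3.0, -0.4, 0.2, -2.5, 0.05]
lam = 0.5

l1_coefficients = [soft_threshold(c, lam) for c in ols_coefficients]
l2_coefficients = [ridge_shrink(c, lam) for c in ols_coefficients]

print(l1_coefficients)
print(l2_coefficients)
```

The L1 result contains exact zeros for every coefficient whose magnitude is below `lam`, whereas the L2 result keeps all coefficients nonzero.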
Thank you so much - I’m assuming then that your book will give valuable insight into mapping through the maze of algorithms! Thank you again for your time!
Giuseppe Bonaccorso, do you think hyperparameters could be made adaptive in the models? For example, is it possible to adjust the (L1 or L2) regularization constant during the training of a linear regression, instead of doing a grid search? For example, some optimizers can somehow adjust the learning rate while training if they don’t like the progress.
Vladimir Finkelshtein Have you ever looked into Azure ML? It has both AutoML and a Hyperdrive option where you can specify parameters (like the learning rate) and use one of 3 types of parameter sampling: Random, Grid or Bayesian. It sounds like you’re talking about Bayesian Sampling.
I am not familiar with this exactly, but it seems that they replace greedy grid search with sampling, while still running a training session for each choice of hyperparameters. I am wondering if one can adjust those parameters during a single training session (like some optimizers do). Another simple example of adaptive behavior is early stopping: one can think of the number of epochs as a hyperparameter, but during training it can change if some conditions are met…
Yes, you’re right - they run a separate training session for each choice of hyperparameters. Interesting idea for the parameters to adapt according to the performance - if it hasn’t been done yet maybe you’ll be the one to do it!
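The early-stopping example from this thread can be sketched as follows. This is a generic illustration with a hypothetical `patience` rule and made-up validation losses, not the behavior of any specific library.

```python
def early_stopping_epoch(validation_losses, patience):
    """Return (stop_epoch, best_epoch) for per-epoch validation losses.

    Training stops once the loss has failed to improve on the best value
    seen so far for `patience` consecutive epochs.
    """
    best_loss = float("inf")
    best_epoch = -1
    epochs_without_improvement = 0
    for epoch, loss in enumerate(validation_losses):
        if loss < best_loss:
            best_loss = loss
            best_epoch = epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch, best_epoch
    # The budget of epochs ran out before the stopping rule fired.
    return len(validation_losses) - 1, best_epoch

# Hypothetical validation losses, one per epoch.
losses = [1.00, 0.80, 0.70, 0.71, 0.72, 0.73, 0.60]
stop_epoch, best_epoch = early_stopping_epoch(losses, patience=2)
print(stop_epoch, best_epoch)
```

The effective number of epochs is decided during the run rather than fixed in advance. Note the trade-off: with a small `patience` the later improvement at the last epoch is never seen.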
Hi Giuseppe Bonaccorso! Thanks for taking our questions. I have two I’d like to ask.
- Do you handle working with datasets with unbalanced labels (e.g., 20 bad labels for 50,000 objects)?
- Do you have a kind of meta algorithm for how you decide the dataset does not contain enough information to answer the question as stated? Or a method for trying to suss that out before trying every algorithm in your book? :)
- The techniques to manage unbalanced datasets (like SMOTE) are discussed in the book Machine Learning Algorithms (2nd ed.). In this book, I discuss different semi-supervised algorithms to work with partially labeled datasets.
- No. I rely on evaluation metrics to understand whether a model is working properly or not. XAI techniques (like SHAP) can help understand how the features are contributing to the outcomes, so a domain expert might check whether the algorithm is working properly or not.
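For readers unfamiliar with SMOTE, its core idea is to create synthetic minority samples by interpolating between a minority point and one of its nearest minority neighbors. The following is a simplified sketch of that idea only (function name, parameters, and data are hypothetical); real projects would normally use a maintained implementation.

```python
import math
import random

def smote_like_oversample(minority, n_new, k=2, seed=0):
    """Generate `n_new` synthetic points from the minority samples.

    Each synthetic point lies on the segment between a randomly chosen
    minority point and one of its `k` nearest minority neighbors.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = minority[:i] + minority[i + 1:]
        neighbors = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbor = rng.choice(neighbors)
        t = rng.random()  # position along the segment, in [0, 1)
        synthetic.append(
            tuple(b + t * (n - b) for b, n in zip(base, neighbor))
        )
    return synthetic

# Hypothetical minority-class samples with two features.
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
synthetic = smote_like_oversample(minority, n_new=6)
print(synthetic)
```

Because every new point is an interpolation, the synthetic samples stay inside the region already covered by the minority class. The method rebalances the classes, but it cannot add information that the data does not contain.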
Hello Giuseppe Bonaccorso, I do like your book as it is a good mix of theory and practice. I’m usually concerned when books contain sample code because, as we all know, libraries like scikit-learn periodically make changes to their tools. How do you react to this? Do you follow the tide by making a new edition? Experienced users don’t have a problem making these changes as they know how to look for answers on GitHub and Stack Overflow. Beginners become stuck as they don’t have enough experience to know that a change has occurred.
I start from the assumption that a user who understands the theoretical part can check the documentation to find out, for example, whether a parameter has been renamed. Of course, it’s impossible to guarantee complete future compatibility, but I never refer to package, function, or parameter names, only to the mathematical parameter.
Alright, thank you
I know that your book is used as a course textbook by some universities. How did it happen? Do you have a list of courses that use your books as a reference?
Yes, the books (both Machine Learning Algorithms and Mastering Machine Learning Algorithms) have been used as textbooks. Some time ago, I posted a list on LinkedIn (I hope to find it quickly). I never promoted them to universities, but I have quite a good number of references in academic papers. Maybe that’s the reason. I don’t know more :)
That’s nice! Thanks for sharing!
Since you mentioned Shapley values in one of the answers, when do you think interpretability techniques will become a part of standard ML curriculum? Books I have seen rarely mention anything beyond feature importance for decision trees (and even that without explanation of how it works or without mentioning its caveats).
XAI is a field that still requires a lot of basic research. I think many methods (like LIME or SHAP) are already part of some advanced programs, but, considering the importance of their application (e.g., medical imaging), some time is still needed to find solutions that are as solid as the DT/random forest feature importance. However, interpretability is essential to create engagement and increase confidence, in particular when black-box applications must be employed in critical sectors.
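Since Shapley values come up in this exchange, here is a minimal sketch of how they can be computed exactly for a tiny model by averaging each feature's marginal contribution over all feature orderings. The toy model, the instance, and the all-zeros baseline are assumptions for illustration; libraries such as SHAP rely on approximations because this brute-force approach grows factorially with the number of features.

```python
from itertools import permutations

def shapley_values(model, instance, baseline):
    """Exact Shapley values of `instance` relative to a single `baseline`."""
    n = len(instance)
    totals = [0.0] * n
    orderings = list(permutations(range(n)))
    for ordering in orderings:
        current = list(baseline)
        previous_output = model(current)
        for feature in ordering:
            # Switch this feature from its baseline value to the real one
            # and record how much the prediction changes.
            current[feature] = instance[feature]
            output = model(current)
            totals[feature] += output - previous_output
            previous_output = output
    return [total / len(orderings) for total in totals]

def model(x):
    # Hypothetical toy model with an interaction between features 0 and 1.
    return 2.0 * x[0] + 1.0 * x[1] + 3.0 * x[0] * x[1] - 0.5 * x[2]

instance = [1.0, 1.0, 2.0]
baseline = [0.0, 0.0, 0.0]
values = shapley_values(model, instance, baseline)
print(values)
```

The interaction term is split evenly between the two features involved, and the values sum to the difference between the prediction for the instance and the prediction for the baseline.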
Has working on this book inspired you to develop your own algorithms?
Working on the examples was an extremely helpful exercise. In fact, I had to find the elements in the algorithms that had to be emphasized. From this viewpoint, I also became more mentally flexible when working on new algorithms, in particular in all those contexts where it’s necessary to find “unique” solutions and different aspects of several algorithms must be joined together.
Sounds pretty fascinating!
Not sure if this list of books is complete, but it’s amazing! How did you manage to write so many? What keeps you motivated?
Yes, more or less, it’s complete! I wrote a lot in the past 3 years. Now I’m taking a break. I always liked the idea of expressing the concepts I loved using my language and experience. Therefore I started writing. Every new book is a sorta new step because I keep on learning from mistakes and I discover new possibilities to expand what I’ve already discussed. However, it’s hard work and, when you have a “regular” job, it can become very demanding. That’s why I decided to slow down a little bit and restart when fully refreshed.
Indeed, it’s not easy to do it when you have a job. Do you have some sort of routine that helps you stay on track?
Discipline. And a lot of working weekends…
Discipline - that’s something I definitely need. Thanks a lot!
Giuseppe Bonaccorso, what benefits does an author get from writing a book, apart from the monetary ones?
Excluding the monetary benefit (which is almost negligible), writing helps to improve all ML skills, as it’s necessary to think about the concepts from different viewpoints (in particular the learner’s, which is generally one of the most difficult to manage).
Giuseppe Bonaccorso Hello, my question is: how does this book differ from other similar books? There are so many ML books on the market. Thanks in advance!
Hi Ufuk Eskici,
as said in other answers, my main goal is to join theory and practice without sacrificing the former for the latter or vice versa. Every paragraph starts with a complete theoretical discussion (sometimes more or less complex) that should help the reader understand how the algorithm works and continues with a practical example. In this way, it’s easier to employ any other framework.
I appreciate your reply. Thank you!
Can you tell us about Bonaccorso’s Law? What is it? And how did the name appear? 🙂
It started as a joke because I used to repeat that it’s only possible to learn what is already somehow encoded in the data. A friend of mine suggested that I call it “Bonaccorso’s law”. However, I think the concept is very important because nowadays so many people tend to think that ML is a sort of magic that can invent something from nothing.
It is definitely important!
Not sure if this is the right channel but came across this short paper identifying a bunch of real life technical debt we face daily in ML: paper
This is a great paper! Probably #engineering is the best channel to discuss such papers
One of my favorite papers ever!
How do you think the machine learning world is going to change over the next decade?
The field that is going to change a lot is certainly deep learning. Lower and lower hardware prices and more and more powerful systems allow training huge models with tons of data.
There are fields (like the Human Brain Project) that can benefit from this research, but the business world is more interested in systems that can be monetized in some way. So, today’s “fancy” will probably become more “classical”. Moreover, the diffusion of several automation tools will probably reduce the expertise required by many companies (while it will increase for cutting-edge ones). I don’t know if data science will be the sexiest job for a long time, but I’m sure it will have more and more tools to express its power.
Thank you for your response!
In which order do you think we should learn ML algorithms?
Do we first learn logistic regression and then decision trees? Or first decision trees and then logistic regression?
Relatedly, is there a red thread (Leitfaden) through the book other than the order it is laid out?
There’s no specific red thread, in particular considering that the different families of algorithms share only a few basic elements. I generally suggest following the path that best suits everyone’s needs. Sometimes, it’s necessary to “jump back” if a concept is missing, but normally this process works fine.
To answer your question Alexey Grigorev, I don’t think there’s a reason to select one algorithm as the first one. From a statistical viewpoint, logistic regression is indeed a regression, therefore it’s often studied before any other ML algorithm. On the other hand, DTs are very easy to understand and they can also be presented to laypeople. In my personal experience, logistic regression is generally explained before any other algorithm, simply because it’s linear and the logic behind it is mathematically extremely simple. However, there are courses where DTs are explained first because the “technicalities” can be limited to just a few purity criteria. I don’t think there’s a golden rule.
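As an illustration of how few “technicalities” decision trees need, the two most common purity criteria, Gini impurity and entropy, fit in a few lines. This is a minimal sketch with made-up labels; the function names are hypothetical.

```python
import math
from collections import Counter

def gini_impurity(labels):
    """Gini impurity: 1 - sum of squared class proportions."""
    total = len(labels)
    return 1.0 - sum((c / total) ** 2 for c in Counter(labels).values())

def entropy(labels):
    """Shannon entropy of the class proportions, in bits."""
    total = len(labels)
    return sum(
        -(c / total) * math.log2(c / total)
        for c in Counter(labels).values()
    )

# Hypothetical node contents: a perfectly mixed node and a pure node.
mixed_node = ["a", "a", "b", "b"]
pure_node = ["a", "a", "a", "a"]

print(gini_impurity(mixed_node), entropy(mixed_node))
print(gini_impurity(pure_node), entropy(pure_node))
```

Both criteria are zero for a pure node and maximal for an evenly mixed one; a tree chooses the split that reduces the chosen criterion the most.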
My personal preference is to do it immediately after linear regression because we can build on top of that.
But this was recently challenged, so I wanted to know what you think about it.