DataTalks.Club

Reinforcement Learning

by Phil Winder

The book of the week from 11 Jan 2021 to 15 Jan 2021

Reinforcement learning (RL) will deliver one of the biggest breakthroughs in AI over the next decade, enabling algorithms to learn from their environment to achieve arbitrary goals. This exciting development avoids constraints found in traditional machine learning (ML) algorithms. This practical book shows data science and AI professionals how to learn by reinforcement and enable a machine to learn by itself.

Questions and Answers

Dmitry Yemelyanov

Hello, Phil Winder! First of all, thanks for kindly agreeing to share your knowledge with us :muscle:
:question: So my question is:
Could it be possible to improve performance of RL agent doing humanoid motions by virtual demonstrations of a person wearing a mocap suit and what, in your opinion, would be TOP challenges in order to do this?

Phil Winder

:100: yes. This is a perfect example where Behaviour cloning/Imitation RL will be useful. In fact, this reminds me of a paper that I read a while ago… Here: https://bair.berkeley.edu/blog/2020/04/03/laikago/
Gif for example. 1) Motion capture, 2) no IRL, 3) with IRL.

Alexey Grigorev

Apart from multi-armed bandits, what are the other RL techniques that are getting wide adoption in the industry?

Phil Winder

Good question but it’s hard to obtain any real numbers on this. From my research/reading, most people tend to follow the media. If a particular algorithm gets media attention then it becomes quite popular in the frameworks, which then leads to adoption.
In general though, the tried and tested, simple models tend to remain the most popular. From basic Q-learning based algorithms, to simple policy gradient algorithms like SAC.
There’s no one-size fits all “best algo” though, like in ML, the “no free lunch” theorem. So you have to evaluate and experiment for your particular application.
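To make "basic Q-learning" concrete, here is a minimal tabular sketch. The five-state corridor, the reward of 1.0 at the far end, and the values of `ALPHA`, `GAMMA` and `EPSILON` are all assumptions chosen for illustration; they do not come from the book or from any framework.

```python
import random

# Hypothetical 5-state corridor: start in state 0, reward 1.0 for reaching state 4.
N_STATES = 5
ACTIONS = (-1, +1)  # step left or step right
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.1  # illustrative values, not tuned


def step(state, action):
    """Move along the corridor; the episode ends at the right-hand end."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def train(episodes=500, seed=0):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            if rng.random() < EPSILON:
                # Explore: pick a random action.
                action = rng.choice(ACTIONS)
            else:
                # Exploit: pick the best-known action, breaking ties randomly.
                action = max(ACTIONS, key=lambda a: (q[(state, a)], rng.random()))
            next_state, reward, done = step(state, action)
            # Q-learning update: bootstrap from the best action in the next state.
            best_next = 0.0 if done else max(q[(next_state, a)] for a in ACTIONS)
            q[(state, action)] += ALPHA * (reward + GAMMA * best_next - q[(state, action)])
            state = next_state
    return q


q_table = train()
# Greedy action for each non-terminal state after training.
greedy_policy = [max(ACTIONS, key=lambda a: q_table[(s, a)]) for s in range(N_STATES - 1)]
print(greedy_policy)
```

In this toy setting the learned greedy policy steps right in every state. Whether Q-learning or something else suits a real application is exactly the "evaluate and experiment" point made above.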

Phil Winder

Morning all. I’ll be online all week and responding to questions in threads. Be sure to tag me so I don’t miss the message. Thanks in advance!

Alexey Grigorev

Good morning!

Sara Lane

Good morning Phil Winder!
Which industries do you see being most affected by advancements in reinforcement learning? More specifically, in which industries do you think it will prove the most useful and also will be open to actually implementing the necessary changes?

Phil Winder

Hi SL,
For reference, see pages 5-7 of the book.
“Industry” is a tricky word because it is broad and out-dated. It’s similar to asking what industry could make use of software. Of course, all of them could. There are opportunities everywhere.
With that said, it’s a valid question. So far, robotics seems to be the number 1 use case. Simply because it’s hard to derive control programs for complex tasks. It’s easier to learn them.
Pricing/bidding/recommendations/advertising/etc. are largely similar tasks and have also had a lot of press.
The finance industry are going to be big users. I’ve spoken to people already that are using it.
Healthcare and specifically personalised medicine is a perfect match, although the regulatory requirements are likely to prevent this from taking off.
The Tech industry can leverage it to much greater extents for automation. E.g. ML, auto-ML, neural architecture search, etc. Lots of mundane automation like Alexa, email control, etc.
And lots more… :smile:

Sara Lane

Thanks for the clear response!
I know that all industries could technically use this technology, but I’ve seen that many businesses are hesitant to take on new things. Many of them live by the adage, if it ain’t broke, don’t fix it. So I’m curious more where you see it taking off practically, not theoretically.
What you said about healthcare is interesting, why would regulatory requirements prevent reinforcement learning from improving things there?
As far as pricing/bidding/recommendation/advertising, why is reinforcement learning superior to other forms of machine learning for this?
Thanks!

Phil Winder

Healthcare == people’s lives. So there’s lots of rules and regulations to prevent accidents. This means there’s a very high barrier to entry. (I’m talking from a UK/EU perspective by the way :wink: - there may be fewer regulations in, say, the US for e.g.)
Better depends on your application. Testing, experimentation and evidence will prove whether it’s better. But in general, any application that involves multi-step decision making could be improved by RL. ML makes one-shot decisions which are unlikely to be optimal in the long run.
See page 5 in the book.
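A tiny worked example of the one-shot versus multi-step point. The two actions, their rewards and the discount factor `GAMMA` are invented numbers, used only to show how the best immediate choice can differ from the best long-run choice.

```python
# Hypothetical two-step decision: each first action leads to a fixed follow-up reward.
IMMEDIATE = {"discount_now": 5.0, "build_loyalty": 2.0}
FOLLOW_UP = {"discount_now": 0.0, "build_loyalty": 10.0}
GAMMA = 0.9  # discount factor (illustrative)

# A one-shot decision only looks at the immediate reward.
one_shot_choice = max(IMMEDIATE, key=IMMEDIATE.get)

# A multi-step decision compares the discounted return of each option.
returns = {a: IMMEDIATE[a] + GAMMA * FOLLOW_UP[a] for a in IMMEDIATE}
multi_step_choice = max(returns, key=returns.get)

print(one_shot_choice, multi_step_choice)
```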

Sara Lane

Phil Winder I just read pages 5-7, fascinating! As you wrote, I always associated RL with robotics and not much else. Thanks for clarifying and I look forward to reading more.
And I believe that you’re correct, in the USA there aren’t as many regulations for these things and I wonder if we will indeed see RL being used in healthcare in the coming years.

Neal Lathia

:question: Phil Winder What are your top tips for debugging RL algos?

Phil Winder

Great question. Check out chapter 11 for more detail on this.
Here’s some random thoughts off the top of my head:

  1. Visualise what is going on (like any data-related task)
  2. If you are given the environment, start with the simplest algorithm and work up (e.g. random/CEM).
  3. If you have control over the environment/simulation, make that as simple as possible and solve that first. Then make the environment/simulation more complex.
  4. Split the tech. If you’re working with deep models, attempt to decouple the training of the deep NN from the RL. Not always optimal, but makes development much easier. For example, use autoencoders, train the autoencoder first and verify it works. Then pass the much lower-dimensional state into the RL algo. It will train much faster (possibly less optimally) and it will be easier to figure out issues.
  5. Split the problem. Try and halve the problem. Halve it again. Solve each quarter independently.
  6. Consider hierarchical policies (similar to 5). If you can manually design the hierarchy, even better for understanding/explainability. But you can automate that process too.
  7. Good old debugging techniques. Prints are your friend.
  8. Assert expected array sizes
  9. Don’t overcomplicate the reward function.
And more and more…

Dmitry Yemelyanov

Great answer! :thumbsup:
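Three of the debugging tips in Phil's list (start with a random agent, lean on prints, assert expected sizes) can be sketched together. `ToyEnv` is a hypothetical stand-in with a reset/step shape; it is not an environment from the book, and the sizes and rewards are made up.

```python
import random


class ToyEnv:
    """Hypothetical stand-in environment: 4-number observation, 2 actions."""

    OBS_SIZE, N_ACTIONS, MAX_STEPS = 4, 2, 20

    def reset(self):
        self.t = 0
        return [0.0] * self.OBS_SIZE

    def step(self, action):
        assert action in range(self.N_ACTIONS), f"unexpected action {action}"
        self.t += 1
        obs = [float(self.t)] * self.OBS_SIZE
        reward = 1.0 if action == 1 else 0.0
        return obs, reward, self.t >= self.MAX_STEPS


def random_baseline(env, episodes=10, seed=0, verbose=False):
    """Tip 2: measure what a random agent scores before trying anything clever."""
    rng = random.Random(seed)
    returns = []
    for episode in range(episodes):
        obs, done, total = env.reset(), False, 0.0
        while not done:
            # Tip 8: fail fast when the observation has the wrong size.
            assert len(obs) == env.OBS_SIZE, f"bad observation size: {len(obs)}"
            obs, reward, done = env.step(rng.randrange(env.N_ACTIONS))
            total += reward
        if verbose:
            # Tip 7: a plain print is often enough to spot a broken episode.
            print(f"episode={episode} return={total}")
        returns.append(total)
    return sum(returns) / len(returns)


baseline = random_baseline(ToyEnv())
print(f"random baseline mean return: {baseline:.2f}")
```

Any learning algorithm tried afterwards should beat this number; if it does not, the bug is more likely in the training loop than in the environment.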

Jayesh Garg

Are there customers already live with RL algorithms?

Phil Winder

Hi Jayesh. Oh if only my sales team had access to that information. :smile:
So, there’s a variety of things that I’ve heard. Some public, some not. Let me try and recall some:

  • Covariant AI demoed a super cool RL-driven pick and place robot.
  • I’ve spoken to engineers that have used RL to improve their recommendations.
  • I’ve spoken to leaders that have deployed RL as part of a continuous-learning strategy for their ML models.
  • I spoke to another leader that managed to reduce the size of the ML team running their core recommendations algorithm by using RL.
And there’s loads of use cases reported in the papers. But of course, whether you call that production or not depends on what they are doing. Many are pure research. But lots are for research on current production systems. For example, this one from the YouTube team,
More on https://rl-book.com/applications/ and in the book.
Of course if you know anyone that wants to develop production RL algorithms, let me know. :wink:

Vladimir Finkelshtein

What would be an example of an environment with which one can experiment at home? I have neither a robotic hand at home nor trading partners willing to start bidding wars. The card/text/video games are covered in much detail in the books. It would be more interesting to play with something resembling a commercial use case.

Phil Winder

Hi Vladimir,
The world really is your oyster here. You can create your own in a domain that you want more experience in (that’s a great way to gain experience). Or you can search through the thousands of gyms other people have created.
For example:
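As a sketch of the "create your own" route, here is a minimal toy pricing environment. It is not one of the gyms referred to above: the class name `PricingEnv`, the price list and the demand dynamics are all invented for illustration, and it only mimics the familiar reset/step shape rather than using any particular gym library.

```python
import random


class PricingEnv:
    """Hypothetical toy market: choose a price each day, earn that day's revenue."""

    PRICES = (5.0, 10.0, 15.0)  # the discrete action set

    def __init__(self, horizon=30, seed=0):
        self.horizon = horizon
        self.rng = random.Random(seed)

    def reset(self):
        self.day = 0
        self.demand = 10.0  # made-up starting demand level
        return (self.day, self.demand)

    def step(self, action):
        price = self.PRICES[action]
        # Made-up dynamics: sales fall with price, plus a little noise.
        units_sold = max(0.0, self.demand - 0.5 * price + self.rng.uniform(-1, 1))
        # High prices erode future demand, low prices grow it.
        self.demand = max(0.0, self.demand + (10.0 - price) * 0.2)
        self.day += 1
        reward = price * units_sold
        done = self.day >= self.horizon
        return (self.day, self.demand), reward, done


# Run one episode with a random policy to check the environment behaves.
env = PricingEnv()
obs, done, total = env.reset(), False, 0.0
while not done:
    obs, reward, done = env.step(random.randrange(len(PricingEnv.PRICES)))
    total += reward
print(f"random policy revenue over {env.horizon} days: {total:.1f}")
```

Because today's price changes tomorrow's demand, the environment has the multi-step structure that makes it an RL problem and not a one-shot prediction task.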

Rishabh Bhargava

Phil Winder what are the most common use cases in industry where problems are framed as supervised learning (or ranking) problems, but you would reframe them as RL problems? what would be the evidence you would give such teams?

Phil Winder

Hi Rishabh. Really great question and one that deserves a much more comprehensive and evidence-based answer.
But, if I had to try and fit it in a chat window….
I’d summarise the dilemma by reminding you of the Markov Decision Process (MDP - page 35 of the book).
If you have an environment that has state that can be mutated, if it can be observed, if you can alter the state through your agent’s actions, and if you have a business problem where it pays to move the environment into a certain state, then by definition you have an RL problem.
To the first part of your question, common use cases masquerading as supervised ML…
Any recommendations task. I think that’s broad enough for you! I would suggest that the vast majority of cases where people use recommendations are optimising for the wrong thing. The goal is to help the user find things as easily as possible so that they value the functionality and keep coming back/buying unnecessary plastic stuff.
A standard solution (I’m grossly simplifying here) would build a model, in a supervised manner, that maps user intent to products, quantified by click through rate or something.
That’s entirely the wrong metric. You could use RL and train over full customer lifecycles. You could train on raw profit. Or the amount of time individual users spend on the site. Or whatever is most applicable for your problem.
So the action is the recommendation (lots of research available on this). The environment is the user and possibly the business/products. The observation is the product catalogue, user demographics, past history, information, the weather, etc. The reward is customer lifetime value or whatever.
Look up any of the RL recommendations papers for an academic argument as to why RL is better suited.
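The metric argument above can be illustrated with a small simulation. This is not an RL algorithm: it is a Monte Carlo comparison of two fixed recommendation policies, scored once by click-through rate and once by value over the whole customer lifecycle. The two items and all of their probabilities are invented for the example.

```python
import random

# Hypothetical items: (click probability, profit per click, chance the user returns).
ITEMS = {
    "clickbait": (0.6, 1.0, 0.3),
    "useful": (0.4, 1.0, 0.9),
}


def simulate_lifecycle(item, rng, max_visits=50):
    """Recommend the same item on every visit until the user stops returning."""
    click_prob, profit, return_prob = ITEMS[item]
    visits, clicks, total_profit = 0, 0, 0.0
    for _ in range(max_visits):
        visits += 1
        if rng.random() < click_prob:
            clicks += 1
            total_profit += profit
        if rng.random() >= return_prob:
            break  # the user churns
    return visits, clicks, total_profit


def evaluate(item, lifecycles=5000, seed=0):
    """Average both metrics over many simulated customer lifecycles."""
    rng = random.Random(seed)
    visits = clicks = 0
    profit = 0.0
    for _ in range(lifecycles):
        v, c, p = simulate_lifecycle(item, rng)
        visits, clicks, profit = visits + v, clicks + c, profit + p
    return {"click_through_rate": clicks / visits, "lifetime_value": profit / lifecycles}


scores = {item: evaluate(item) for item in ITEMS}
best_by_ctr = max(scores, key=lambda i: scores[i]["click_through_rate"])
best_by_value = max(scores, key=lambda i: scores[i]["lifetime_value"])
print(best_by_ctr, best_by_value)
```

With these made-up numbers the click metric favours the clickbait item while lifetime value favours the useful one, which is the mismatch an RL agent trained on the long-term reward is meant to avoid.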

Rishabh Bhargava

Thanks for the great answer.
On the topic of recommender systems: if the metric that is being optimized might be off, do you think this is more down to Product Managers not setting the metrics correctly or ML engineers choosing to solve for the simpler problem (even though it might not be the right formulation)? Given that this is such a technical domain, who should be pushing for RL adoption?

Phil Winder

Like most things in life, I suspect there’s no easy or right answer. I’m no expert in management, but I think POs or PMs should be steering product development, but decisions should be agreed/discussed as a team. Ideas, solutions, metrics, everything, have to be defined by “the team” because no one person can know everything and get everything right.
I have the same argument with people that have the word “architect” in their title. :wink:

Ashutosh Sanzgiri

Hi Phil Winder - I am curious as to why RL techniques are not widely used as a means to improve on supervised learning problems - by this I mean both in model training (hyperparameter optimization, architecture search, ensembling models etc.) and in model deployment (measuring model drift, correcting for it etc.)

Phil Winder

> why RL techniques are not widely used as a means to improve on supervised learning problems
Why? I guess it’s some complex combination of attention, media, ease of use, advice, reading, media and something with the word OpenAI or Google in the name. :smile:
I mean, it’s there, it’s possible. Maybe it’s just waiting for someone to wrap it or market it better than the last person? Hint hint, nudge nudge. If you have a spare 6 months on your hands. :slightly_smiling_face:
To be fair there are things out there already. For example I’ve used Optuna for hyperparameter optimisation, which has an RL solution in there.
But they’re not selling the fact that it’s using RL. They’re selling the fact that it automatically does hyperparameter tuning for you.
Same with Kubeflow’s Katib. That has an RL mode too.
That’s the thing about engineering in general. People don’t care how the sausage is made. It’s the product that counts. And it’s why UI engineers take all the glory!

A McCauley

Hi Phil Winder , congratulations on the book! I have a question on the topic.
Is reinforcement learning considered a crucial approach in robotics (or do you have an opinion on its use for this)?
With a robot learning behaviours through ‘trial and error’ of their interactions. It seems like RL would have many advantages which ML just couldn’t solve in this case - becoming useful with dynamically changing constraints, or environments, they would experience.

Phil Winder

Hi A,
Crucial. Hmmm. Depends on how you define the word. I wouldn’t say it’s CRUCIAL, in capital letters, no. You can create perfectly adequate solutions using simple stuff like PID controllers and inverse kinematics.
The threshold is complexity. Once you need to do something remotely complex, like more complex than just “move to coordinates x,y” or as soon as it involves a non-trivial number of interacting components, then yes, RL is probably necessary.
But I think that’s missing the point slightly. The great thing about RL is the interface. The MDP. It’s a way of defining problems, not solutions. And it can be applied to any project, simple or complex. If the interface is the same then you can use the same processes, the same techniques to solve a wide variety of problems. It scales from simple to mind-bendingly complex, very few ML techniques can say the same.
For example, if you worked for a robotics company and you sold a bomb-disposal robot and a floor-cleaning robot, you’d have to develop completely different architectures, systems, code, solutions, etc. But if you’re using RL, it’s the same. Define the environment, define what you’re trying to do, try lots of actions and learn which ones maximise the reward.
Sorry, rambling a bit. But yes, I think we’re on the same page!
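The "same interface" point can be sketched in code. The `Environment` protocol and the two toy robots below are hypothetical, and `best_constant_action` is a deliberately trivial stand-in for a learner: it only compares constant actions. A real RL algorithm would plug into the same interface.

```python
from typing import Protocol, Sequence, Tuple


class Environment(Protocol):
    """The only contract the agent code relies on."""

    actions: Sequence[str]

    def reset(self) -> int: ...

    def step(self, action: str) -> Tuple[int, float, bool]: ...


class FloorCleaner:
    """Hypothetical: one unit of reward for each tile cleaned while moving forward."""

    actions = ("forward", "spin")

    def reset(self):
        self.position = 0
        return self.position

    def step(self, action):
        if action == "forward":
            self.position += 1
            return self.position, 1.0, self.position >= 5
        return self.position, 0.0, True  # spinning wastes the episode


class BombDisposal:
    """Hypothetical: cutting the wrong wire is heavily penalised."""

    actions = ("cut_red", "cut_blue", "wait")

    def reset(self):
        return 0

    def step(self, action):
        reward = {"cut_red": -100.0, "cut_blue": 10.0, "wait": -1.0}[action]
        return 1, reward, True


def best_constant_action(env: Environment, episodes_per_action: int = 5) -> str:
    """Try every action repeatedly and keep the one with the highest mean return."""
    mean_return = {}
    for action in env.actions:
        total = 0.0
        for _ in range(episodes_per_action):
            env.reset()
            done = False
            while not done:
                _, reward, done = env.step(action)
                total += reward
        mean_return[action] = total / episodes_per_action
    return max(mean_return, key=mean_return.get)


# The same function handles both robots without modification.
print(best_constant_action(FloorCleaner()), best_constant_action(BombDisposal()))
```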

Alexey Grigorev

Phil Winder you mentioned in a thread that RL was used for “reducing the size of the ML team running their core recommendations algorithm”.
I’m curious to learn more about it. If it’s possible, can you give more details about this case? How did they actually do it?
And another question - when will RL replace all the data scientists? :sweat_smile:

Phil Winder

Haha. Thanks Alexey. I knew someone would pick up on that.
This is not another clickbait “we all won’t have jobs next year”. :smile:
No I can’t I’m afraid, it’s not public knowledge.
To summarise, consider:
a) a team of 10+ highly educated, very expensive smart people tweaking neural network architectures and running massive expensive experiments (for example). This is what large tech companies do to solve heavily used, data-intensive systems.
vs.
b) an RL algorithm, with a decent reward function, that trains itself over the long term, to solve the actual business metric that the business is keen on improving.
RL can easily match and with effort surpass the performance of that team quite quickly.
To be clear, the engineering challenge doesn’t go away, it shifts. Now these people are curators. Guardians of the RL algorithm that is actually doing the number crunching. There’s still a lot of engineering work that goes into building a system like that, but it’s not pure data science any more.
I’m being intentionally vague and speculative here, but you can see it happening.
edited for clarity.

Alexey Grigorev

Makes sense. I like the “guardian” metaphor! Thank you

Alexey Grigorev

Good morning!
Phil Winder you mentioned that “the engineering challenge doesn’t go away, it shifts”.
I’m curious to learn about these engineering challenges that come with training and deploying RL algorithms. How are they different from “classical” ML and DL models? What are the typical tools for training and deploying?

Phil Winder

Morning Alexey.
First, bear in mind that there isn’t much industrial experience of running RL in production, yet. It’s not like ML, where there’s now years’ worth of experience to leverage. But I can speculate.
One of the key issues with RL is state. By definition the MDP loop is constantly evolving. New observations, new models, new actions. In particular, if you’re running an algorithm which is actively learning (most, but not all implementations), the underlying state of the model (the trained parameters) is changing ALL the time.
One of the definitions of “modern” software is immutability and software that is free of side effects. By definition, an actively learning RL algorithm is mutable and most definitely has side effects!
So over the next few years I predict that there is going to be industrial research (i.e. new frameworks/blog posts/presentations/etc.) into how to run mutable RL algorithms in a robust way. I imagine that under the hood there will be a strategy to do some kind of checkpointing to make it pseudo-immutable.
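One way the checkpointing idea could look, as a speculative sketch only: the learner keeps mutating its parameters, while serving reads from frozen, versioned snapshots. `OnlineLearner`, the single `bias` parameter and the in-memory `registry` are hypothetical; no existing framework's API is implied.

```python
import copy
import hashlib
import json
from types import MappingProxyType


class OnlineLearner:
    """Hypothetical actively learning model: its parameters mutate on every update."""

    def __init__(self):
        self.weights = {"bias": 0.0}

    def update(self, reward, learning_rate=0.1):
        # Move the single parameter towards the observed reward.
        self.weights["bias"] += learning_rate * (reward - self.weights["bias"])

    def checkpoint(self):
        """Return (version, read-only copy) of the current parameters."""
        frozen = copy.deepcopy(self.weights)
        payload = json.dumps(frozen, sort_keys=True).encode()
        version = hashlib.sha256(payload).hexdigest()[:12]  # content-derived id
        return version, MappingProxyType(frozen)


learner = OnlineLearner()
registry = {}  # version -> read-only snapshot

learner.update(reward=1.0)
serving_version, snapshot = learner.checkpoint()
registry[serving_version] = snapshot

learner.update(reward=1.0)  # learning continues after the checkpoint

# Serving stays pinned to the snapshot, so its behaviour is reproducible.
served = registry[serving_version]
print(serving_version, dict(served), learner.weights)
```

Pinning serving to a version also gives a rollback path: if a later snapshot misbehaves, an earlier entry in the registry can be served again.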
On the training side, there’s loads. I can’t keep up. I did a review a long time ago and I’ve been meaning to update it (here: https://rl-book.com/rl-frameworks/). Take your pick.
On the deployment side, less so. Many of the frameworks above have some kind of serving mode, but I get the impression that most people have to roll their own serving infrastructure and tooling.

Alexey Grigorev

Thank you! Looking forward to seeing how this field develops

Ritobrata Ghosh

Phil Winder, can this book be treated as a primary textbook of Reinforcement Learning or a reference book for studying Reinforcement Learning?

Phil Winder

Hi Ritobrata,
Sorry I don’t quite understand your question. Can you explain what you are looking for?
I wrote this book from an industrial perspective. It contains more “advice” than you would expect from an academic reference. It also contains less mathematics than you would expect from academia.
My goal was to try and be a bridge between the industrial, software-driven world and academic research.

Samuel O. Alfred

Phil Winder I actually like this question. I have read the popular reinforcement learning book by Barto and Sutton. I don’t have access to your book. So, is your book an extension or looking at things from a different perspective with the same underlying principles?

Phil Winder

Yes, the underlying principles are the same. We’re both talking about RL in the context of the MDP and build up from there.
My book is far more focussed towards industry. I cover more modern algorithms and talk a LOT more about how to do RL in industry. Sutton/Barto’s book is more formal, has a lot more maths, talks less about industrial concerns. In short, Sutton/Barto’s is a textbook. Mine is an O’Reilly book. :slightly_smiling_face:

Phil Winder

Sutton/Barto’s book is excellent for what it is, by the way. I recommend getting both. :smile:

Phil Winder

You can find more info on the main page of the website: https://rl-book.com/
I might add some pages from the preface there too, to answer this question outright.
Thanks!

Ritobrata Ghosh

Phil Winder, thanks for the reply. Appreciate it. Look forward to reading your book. I suggest Sutton and Barto’s book to everyone who asks me. Many came back to me looking for an alternative. While not a substitute, Thomas Simonini’s tutorials do offer a different attempt in learning RL. So I was asking you if learners could read your book to gain a fair level of knowledge in RL before eventually graduating towards Sutton and Barto.

Phil Winder

Yeah I’d agree with that. Most engineers in industry are probably going to struggle a bit with Sutton’s because it’s too academic. So yes, I’d definitely recommend reading mine first. :blush:

Ritobrata Ghosh

Phil Winder Yes, you are right. Even with my background in Physics, I found Deep Learning textbooks such as Goodfellow’s to be easier to read than Sutton, Barto’s book. I have my answer, thanks. Look forward to reading this book!

Leonid Kholkine

Hello Phil Winder!
Happy to see a book more focused on the industry. I know that more and more companies are exploring the application of RL, but, at least in Portugal, it is still in a very embryonic stage. I’m wondering how you see the adoption of RL by the industry?

Phil Winder

Hi Leonid,
Like you said, nascent at this point. But it is moving. I don’t think it will be anywhere as big as the generic ML/analytics industry, which in turn isn’t as big as the software industry.
But as you probably know already, these are just tools in your tool belt. The trick is to pick the right tool for the job.
In terms of adoption, I think it’s being adopted already. It’s just a matter of size. I think it will cascade as more “normal” use cases come into popular industrial culture. And as frameworks/libraries start to offer easy to use and robust RL serving, natively.
In short, we’re fighting against low-hanging fruit here. Quite often something very simple is good enough and/or better than nothing. It takes quite a lot to jump up through the hoops of full ML to full RL.
This probably means that it’s going to be larger companies that adopt first. Smaller ones (at least in the non-tech industry) will probably have to wait.

Phil Winder

Yeah, to be clear, I see RL taking a slice of the ML industry. So RL depends on the underlying size of the ML and software industries.

Leonid Kholkine

That leads me to another thought, will there be then more out of the box tools as it happens now more and more with ML?
I think that also might shorten this gap

Leonid Kholkine

And a more interesting question for me, it’s how do you see RL being applied besides the classical cases such as recommender systems, Auto ML, finances, robotics, etc… :slightly_smiling_face:

Phil Winder

Hi again Leonid,
I’m afraid I’m going to have to resort to: ${insert any use case here}. :smile:
Sorry, I couldn’t help it. :stuck_out_tongue: It has a very broad applicability. In fact, you could technically use it anywhere you currently use ML. Although it may not technically be more performant. But in many cases it could be.
I’ve been trying to collate use cases here (https://rl-book.com/applications/). There’s lots already, but I’m already well out of date.
Check out some of the other answers here too: https://rl-book.com/learn/faq/frequently_asked_questions/
Apologies for the generic answer but the real answer is really broad.

Leonid Kholkine

That’s a perfect answer, I did miss those use cases :slightly_smiling_face:

Alexey Grigorev

Good morning!
Phil Winder I know you also have a lot of interest in MLOps. Is there any connection between it and RL in your work?

Phil Winder

Great question.
You’re right. I am very interested and we’ve gained a lot of experience delivering MLOps projects.
The connection to RL is the operational part. RLOps, if you will. Just like in ML, data scientists probably aren’t that interested in spending massive amounts of time messing about with infra/tooling. Their job and responsibility is extracting value from data, not building supporting infra.
The same is true in RL too. The value is delivering the algorithm that optimises the business metric. The Ops part is irrelevant. The business doesn’t care how it happens, just that it does.
But the business certainly does care how long it takes and whether it is operationally viable. They’d be the first to complain if it breaks.
So there is value in the supporting tech/infra, but it’s not directly tied to the business objective. The value is “making it easier for other people to do their job”.
Since RL is hard to do well, and very difficult to operationalise/productionise, RLOps certainly has a very important role to play.

Alexey Grigorev

Great, thank you! Excited to see how “RLOps” is going to develop

Alexey Grigorev

I was checking the book on Amazon and noticed this:
Best Sellers Rank:

  • #136 in Machine Theory (Books)
  • #155 in Minecraft Guides
  • #165 in Artificial Intelligence (Books)
The second category is quite an interesting one. I’m curious how it ended up there? :slightly_smiling_face:
Do you use Minecraft as one of the examples?

Phil Winder

Haha. Yeah I saw that too. Hilarious.
The US metrics are more stable, because there’s been more sales there.
But yeah, Minecraft. Someone needs to do some NLP consulting to Amazon to fix their broken categorisation algorithm!

Phil Winder

I’ve just grepped the book and I never mention the word minecraft. So I can only assume that there is some overlap in embedding-space between my content and other minecraft books.

Ritobrata Ghosh

No, Amazon should never fix it. Think about it. You get to brag about writing a bestselling book about Minecraft! :wink:

Phil Winder

Hahaha. Yay! :joy: Can you imagine…
Media person: “… and here to talk about minecraft is bestselling author…”
Me: “errrm…. blocks and stuff?”

Ritobrata Ghosh

Just bragging rights to teenagers! Don’t go deep into it, or you’d be caught!

Ritobrata Ghosh

Phil Winder, in your opinion, what factors have prevented the wide adoption of Reinforcement Learning in the industry as opposed to Machine Learning and to some extent, Deep Learning as academic fields widely adopted in the industry?

Phil Winder

Hi Ritobrata,
Good question. Probably just a combination of time, media exposure, market size, processing power, low-hanging fruit.
You say
> ML widely adopted in the industry.
But statistics, and therefore ML, has existed forever. Only recently (i.e. a decade, maybe) has ML “taken off”. So you could argue that it took 200 years for ML to be adopted.
RL originated around the 90’s, so wait until 2290, then ask your question again. :smile:
So the real answer is market size and perception. It goes like this:
IT -> Software -> ML -> DL -> RL.
Because they are applied by/for:
Everyone -> Companies -> One-shot decisions -> Complex decisions -> Strategic/long-term decisions.
Each time you’re reducing the market size. And when you do that you are reducing media exposure. So it might seem like ML has been adopted and RL hasn’t, but in fact the market is just smaller. When normalised the perceived adoption is the same.
With that said, I do think you’re right, it’s not been adopted yet. Mainly because there’s a lack of books like mine and well defined use cases. We’ll get there…

Ritobrata Ghosh

Thanks for the detailed answer. :slightly_smiling_face:

Leonid Kholkine

Phil Winder A bit more of a generic question, but how do you see the field of RL unfolding in the next 3-5 years?

Phil Winder

The correct answer to this is probably more boring than you were hoping for.
It will expand, it will get used more. It will become easier to use and will become more obvious where to use it (because you can use off the shelf open-source solutions).
Then RLOps will become a thing.
Then people will perceive it as being “adopted”.
Then something else will take the limelight.
If I put my marketing hat on it would sound similar except with more hyperbole! :smile:

Timothy Wolodzko

Phil Winder RL is still a niche within ML, and there are far fewer books & courses on it compared to general ML. Besides your book, what would you suggest for someone interested in learning it? Where to start? What to focus on first? Moreover, I have a feeling that online you can find either trivial examples, or very complicated applications like AlphaGo, with not many intermediate ones. So any suggestions for planning the learning journey further?

Phil Winder

It’s the same as anything technical IMO. There is stuff to be learnt, and you can do that by reading. Read all the books and papers you can.
But the real learning experience is… experience. Do it for real. Do it in your company. Do it at work. Then and only then do you learn what you need to learn to do your job.
Yeah, that’s the point. Doing your job is nothing like anybody else’s job. I could tell you to do certain projects but it wouldn’t make sense for your unique situation. Your first challenge is finding a problem that is valuable and makes sense for RL. Then work on that. Start simple. Start with software, then ML, then RL. Work your way up.
What you need is a RL driven learning curriculum that delivers training that suits your unique needs. :smile:

Alexey Grigorev

Can one use RL to come up with the best curriculum to study it?

Alexey Grigorev

Sounds like a good project :sweat_smile:

Ritobrata Ghosh

Timothy Wolodzko, the author has answered. I would like to add a few things. You could try University of Alberta’s RL Specialization in Coursera, and definitely read the Sutton, Barto’s book. I highly recommend David Silver’s Lectures. There are other lectures from DeepMind as well. There’s Spinning Up from OpenAI. You could also read Thomas Simonini’s introductory RL blogs. There’s a plethora of beginners’ and intermediate stuff out there for RL, not as much as DL and ML, but enough for an individual to learn.

Timothy Wolodzko

Ritobrata Ghosh & Phil Winder thanks! I already have some of those books etc, just wanted to learn if there’s anything more I should look for.

Vladimir Finkelshtein

I think one obstacle for learning is the lack of plug and play libraries like for more classical ML. Even with openai.gym, some of the recent books have code that doesn’t compile, because the libraries are still being developed and change too often. It is certainly an obstacle for people who are less experienced in programming.

Phil Winder

Hi Vladimir Finkelshtein,
Although I have seen people/companies try to do data science without software experience/capabilities, I would recommend that gaining software engineering experience is as important as ML/RL experience.
Software is the language of applications, so if you want to build useable ML/RL, you need software. Of course this doesn’t apply to everything. E.g. you can just about get by with hosted tools for an analytics project and larger companies can hire multiple people with different skills (this is the general solution, by the way). But sooner or later you’ll need to code :slightly_smiling_face:
I have sympathy for your despair, however. I think the main issue is the complexity. Any complex system has a million ways to fail and it sounds like you’ve found most of them. :slightly_smiling_face:

Ashutosh Sanzgiri

Phil Winder Do you think that the field needs a new name or branding? Maybe the word “reinforcement” is not catchy enough or does not have the right connotations? Also when do you think we will start seeing “self-help” apps (weight loss etc.) that claim to be powered by RL?

Phil Winder

Not necessarily a new name, no. But I would like to see RL become more mainstream in the ML toolbox.
I’d like to be at the point where people say (at the most general level) “we’re working on a data project and we might need to dip into our toolbox, rummage around, and we might need to pick RL for the job”.

Phil Winder

And the term RL represents quite a small spectrum of techniques. You could use the words for all the sub-techniques too if you want to be more specific (e.g. imitation RL, inverse RL, curriculum RL, etc. etc.).
“Claims” are powered by marketing/advertising. So that’s entirely powered by marketing.
I’d suspect at some point someone in marketing will hear the term, go “oh that’s cool, is that like AI?” and then they’ll run with it. “The first app to use RL…”
Then there will be a domino effect, then users will get confused and annoyed, and then people will stop using it again and move on to the next thing.
This is why I tend to try to avoid predicting marketing hype cycles. They are so fickle. The core technologies and concepts are useful in certain applications and that is why it will stick around for a long time.

George Melvin

Phil Winder Hi Phil, Pavlov’s famous experiments (:bell:) are a great example of reinforcement learning for non-machines. I’m interested to know: do you foresee any interplay between reinforcement learning for biological/machine entities in the future? e.g. do you expect to see research insights from (machine) reinforcement learning having application in psychology, and vice-versa?

Phil Winder

Great question. I think the answer depends on how deep you want to go.
At a superficial level, yes, definitely, health apps in particular. RL driven, personalised nudges to help you lose weight, get fit, learn a new subject, etc. are an obvious use case.
At a slightly deeper level, the introduction of RL in core front-line healthcare, like personalised medicine, shows strong signs.
But at the full-on I’ve-had-too-many-beers-deep level, you could imagine RL providing “life” strategies. Like a personalised, optimal route to getting a job that you want. Or “automated relationships”.
Haha. I need that. Imagine not having to remember anniversaries, the perfect present automatically ordered.

Phil Winder

And in psychological wellness. Yes, definitely.
“Hi Dave, you look sad Dave.”

Phil Winder

TIL there’s a poor selection of Red Dwarf gifs available on the internet…

To take part in the book of the week event:

  • Register in our Slack
  • Join the #book-of-the-week channel
  • Ask as many questions as you'd like
  • The book authors answer questions from Monday till Thursday
  • On Friday, the authors decide who wins free copies of their book
