Questions and Answers
Hi Raghav Bali and Joseph. Thanks for doing this. I am curious to know -
- What problems can generative models solve right now?
- What are some of the latest developments in the field of generative models?
- With deepfakes being a thing now, do you think the world has enough artillery to deal with them?
Hey Kshitiz, interesting and important questions… Here’s my take:
- Generative models are being leveraged for a number of applications currently. Some popular examples are: painting/art generation (neural artists are all the rage nowadays), face generation (check thispersondoesnotexist.com) for stock photos, music generation for high-quality background scores (free from copyright issues), style transfer for fun use cases (Snapchat/Instagram filters), dataset augmentation (for classification use cases), and a number of other domains.
- Recent developments in the field of generative models: well, this is quite a broad question, but a number of amazing works are focusing on improving output quality with far less infrastructure and training time. GANs are being improved to be more stable and lean, yet generate extremely high-quality outputs. See the likes of StyleGAN3 from the folks at NVIDIA. We cover quite a few of these recent architectures in our book as well.
- Well, with any powerful technology, there is always a danger of it being used improperly. Researchers and practitioners like us have a huge task on our hands to make people aware of things like deepfakes. Apart from awareness, a number of research labs are focusing on ways of identifying fake (deepfake) content, though there is a long way to go on this front. tl;dr: we are not there yet, but efforts are being made nonetheless.
Raghav Bali Thanks for the responses!
Hi Raghav and Joseph, interesting stuff thanks for taking the time. I’m curious when making content in the creative field based on generative models, are there any methods you use to avoid “traps” like copyright infringement?
Hey Cam Buchanan
Apologies, looks like I somehow missed answering this very interesting and pertinent question.
Copyright infringement (and other aspects of the law) is quite varied and dependent on interpretation. But keeping the nuances aside, the following are some rules of thumb to keep in mind when generating content using models:
- Generative models are searching the training space at a very abstract level. There is a very high chance the output will be derivative of the training data. Even solutions like GitHub's Copilot and the like have raised similar questions.
- Make use of available tools for a quick check. For instance, when uploading content to SoundCloud, YouTube, etc., pay attention to the built-in copyright checks. If your content is getting flagged, go back to the drawing board. But again, these tools are not always foolproof.
- Always include a caveat or disclaimer about how you generated the content, and if someone claims copyright, it is best to oblige or collaborate (if the claim is indeed valid).
Hi Raghav Bali, Good to have you here again 🙂
I was wondering whether generative AI can help with video summarisation within each frame of context, e.g. in YouTube videos it is called video chapters (https://techcrunch.com/2020/05/28/youtube-introduces-video-chapters-to-make-it-easier-to-navigate-through-longer-videos/), where we have to manually create the window with its relevant text summary. This would be helpful for DTC since we do it manually 😄
Does it have a specific name in ML research where you generate a super short 2-5 words summary for long piece of video, text or audio?
Glad to have you back in this discussion WingCode. Excellent question, and I can shamelessly admit that it took me down a rabbit hole. I am still looking to find more details on this (haven't got too far yet), but here's my take:
- Video segmentation sounds like an apt name for it, but fortunately or unfortunately it refers to segmenting objects within a given video frame, so we might have to get creative here. Let's brainstorm till we find some papers detailing this?
- Contrary to your mention, it seems YouTube has started creating video chapters automatically using “ML”. The support pages do not detail much about it though. See here: https://support.google.com/youtube/answer/9884579
Seems like the manual work for DTC could be managed through this feature? The documentation says that the service does the segmentation based on different text in the video to generate titles etc., but I am pretty sure there is more here than meets the eye.
- Creating a summary from a video frame sounds similar to the task of image captioning. There are a number of works which do this quite nicely; good starting points are: https://arxiv.org/abs/1411.4555 and https://arxiv.org/abs/1601.03896
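To make the image-captioning idea concrete, here is a minimal sketch of the encoder-decoder data flow described in the Show-and-Tell paper (arXiv:1411.4555). Everything here is a hypothetical stand-in: `cnn_encode` fakes a CNN encoder and `rnn_decode_step` fakes a trained RNN decoder with random scores, just to show how an image feature vector conditions word-by-word generation.

```python
import numpy as np

rng = np.random.default_rng(1)
VOCAB = ["<start>", "a", "scenic", "mountain", "lake", "<end>"]

def cnn_encode(image):
    """Stand-in for a CNN encoder: pool the image to a fixed-size vector.

    A real pipeline would use a pretrained network (e.g. an ImageNet CNN)
    and take an intermediate layer's activations.
    """
    return image.mean(axis=(0, 1))

def rnn_decode_step(features, prev_word_idx):
    """Stand-in for one RNN decoder step: score each vocabulary word.

    The random logits fake learned weights; the image features and the
    previous word both influence the score, as in the real model.
    """
    logits = rng.normal(size=len(VOCAB)) + features.sum() * 0.01
    logits[prev_word_idx] -= 5.0  # discourage immediate repetition
    return int(np.argmax(logits))

def caption(image, max_len=5):
    """Greedy decoding: emit words until <end> or max length."""
    feats = cnn_encode(image)
    words, idx = [], VOCAB.index("<start>")
    for _ in range(max_len):
        idx = rnn_decode_step(feats, idx)
        if VOCAB[idx] == "<end>":
            break
        words.append(VOCAB[idx])
    return " ".join(words)

img = rng.random((32, 32, 3))  # dummy RGB image in place of a video frame
cap = caption(img)
print(cap)
```

For the video-chapters use case, you would run something like this per key frame and then cluster/merge adjacent captions into chapter titles; the papers above cover the real training setup (maximum-likelihood training on image-caption pairs).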
Thank you Raghav Bali for digging up all the resource and the elaborate answers as usual 🙂
What’s the easiest way to generate an intro tune for a podcast? Asking for a friend
Well, generative models to the rescue here (RNNs in particular). Shameless promotion and plug: refer to my article here: https://towardsdatascience.com/lstms-for-music-generation-8b65c9671d35
The article also points to a few samples generated using the said architectures. The book explores this in greater depth.
Hi Raghav Bali, asking this totally out of curiosity: is it possible to create a sort of reverse subtitling with generative AI? E.g. given the text “a beautiful place”, the model has to generate a picture/video/art of a scenic place, similar to how our mind does.
If so, how are these models trained?
You might want to look at what OpenAI has been doing with DALL-E/CLIP etc. (one blog post with runnable code: https://minimaxir.com/2021/08/vqgan-clip/)
Thanks for your question Lavanya M K, and kudos to Wendy Mak for the perfectly crisp answer. DALL-E and CLIP are state-of-the-art works and generate some really wonderful pieces of art.
Lavanya M K, found something interesting along the lines of your question. I know it's well past the AMA, but who cares 😉
https://blogs.nvidia.com/blog/2021/11/22/gaugan2-ai-art-demo/
This looks super exciting. Thanks Raghav Bali for sharing.
Really interested to know how training is done for such models
Not answering a question, but generative models are becoming more and more common and creative nowadays. This recent art gallery exhibition by Emil Wallner (leading AI researcher with Google) takes style transfer to the next level; see here: https://twitter.com/EmilWallner/status/1453050980438843397
Another interesting take on Generative AI
https://twitter.com/_joelsimon/status/1458507647515254785?t=rrN80ZIEweAX0ITBvmV7VQ&s=19
Hi Raghav Bali, thanks for being here again!
Hey Tim Becker nice to meet you 🙂
Thank you for answering my questions 🙂
- In your book, are you talking about data augmentation with generative AI? I am particularly interested in when this technique is useful and when not? And how beneficial is it? I guess, there are certain limits to this approach?
Unfortunately, no. Data augmentation using generative models is definitely a topic worth exploring, but given that there weren't many books introducing generative models and their associated nuances, we decided to focus the book on different types of generative models along with their applications.
But yeah, outside the book, using generative models for data augmentation is certainly gathering steam. There are works by folks like Antoniou et al. spearheading this space (important paper: https://arxiv.org/abs/1711.04340)
My two cents on the topic (though I am pretty new to this aspect itself):
- Highly beneficial if you have a training data space which is not very expansive/diverse (for lack of a better word). This would help you generate additional samples to train a more robust model.
- Limits: certainly risky if you do not have a metric to understand the quality of your samples. For instance, I am working in the healthcare space; imagine a scenario where we want to generate sample X-rays for lung cancer. We would need very strict quality checks to ensure that the generated samples indeed make sense.
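The two points above (generate extra samples, but gate them on a quality metric) can be sketched as a single augmentation loop. Both `generator` and `quality_score` below are hypothetical stand-ins: a real pipeline would plug in a trained GAN generator and a proper quality check (e.g. a held-out classifier's confidence or an FID-style distance), whereas here they are faked with simple numpy code just to show the gate-then-augment structure.

```python
import numpy as np

rng = np.random.default_rng(0)

def generator(n, dim=4):
    """Stand-in for a trained GAN generator: n synthetic feature vectors."""
    return rng.normal(loc=1.0, scale=0.5, size=(n, dim))

def quality_score(samples):
    """Toy quality metric: negative distance from the real-class mean.

    A real check would use a held-out classifier or a domain expert,
    especially in high-stakes settings like medical imaging.
    """
    real_mean = np.ones(samples.shape[1])
    return -np.linalg.norm(samples - real_mean, axis=1)

def augment(real_x, n_extra, threshold=-1.5):
    """Oversample candidates, keep only those passing the quality gate."""
    candidates = generator(n_extra * 3)          # generate with headroom
    keep = candidates[quality_score(candidates) > threshold][:n_extra]
    return np.vstack([real_x, keep])

real = rng.normal(1.0, 0.5, size=(20, 4))        # small "real" dataset
augmented = augment(real, n_extra=10)
print(real.shape, augmented.shape)
```

The key design choice is that the gate can reject samples, so you may end up with fewer than `n_extra` additions; that is the safe failure mode for the X-ray scenario above, where a bad synthetic sample is worse than no sample.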
- Could you give an example of an adversarial learning paradigm?
This is something I have wanted to explore for quite some time but haven't been able to. Really interested to read about and understand the poisoning strategies. Maybe we can catch up sometime soon to discuss more.
Question to the larger community here: any pointers/materials to get started?
For me, CryoGAN is a very interesting piece of research (and paper) in this area.
- For you personally what is the most exciting topic in generative AI?
On a broader level, the whole concept of GAN is very exciting to me. Every new architecture and the ideas behind them simply amaze me.
From an application standpoint, I believe Music Generation and DeepFakes (though notorious) have great potential
Hi Raghav Bali,
Have you come across any work on unsupervised, labelled, interpretable controls for GANs? https://www.youtube.com/watch?v=jdTICDa_eAI In this video, they manually play around with each component to understand the changes it brings about, but that is manual and requires a human annotator to label each component.
Digging into the above paper, I found a reference to this paper, https://arxiv.org/abs/2002.03754, which finds these manipulation axes in an unsupervised manner, but I am not sure whether it generates a text label for each specific axis 🙂
Awesome week and an equally awesome set of questions. Thank you to all the participants and congratulations to all the winners. It's been fun interacting and discussing stuff with you all.
Thanks Alexey Grigorev again for this wonderful platform. Would love to keep this discussion going.
Cheers and keep exploring, guys and girls!