Questions and Answers
Hello Shubham Saboo and Sandra Kublik, it's my understanding that LLMs such as GPT have been mainly trained on vast amounts of English corpus data. My question is: does the book cover how to proceed if one is interested in building a replica of these LLMs for a specific local language, provided enough corpus in that language is available for training? If not, what suggestions would you propose as a starting point? The whole point is to bridge the communication gap between people and communities.
Thanks.
Hey Ataliba Miguel! Most of the training corpus is indeed in English, but there is a surprising number of languages GPT-3.5/ChatGPT is proficient at, despite having only a small sample of data in those languages in the training corpus.
There is also a surprising number of existing GPT replicas in particular languages, developed by countries (Sweden https://www.ai.se/en/node/81535/gpt-sw3) or AI communities (check out Hugging Face https://huggingface.co/asi/gpt-fr-cased-base).
My suggestion would be to dig into models in the language you are interested in. Perhaps the language is already covered by GPT-3.5/ChatGPT because there was a small corpus from that language in the training data, or perhaps there is an open-source language-specific alternative out there already, e.g. on Hugging Face. To get a multilingual model you can try stacking them together, or wait until one of the LLM providers releases a more robust multilingual GPT model. :)
(P.S. Generative models like GPT are not the only type of LLMs; you also have BERT-style models that are good for use cases like semantic search, and here the best option on the market right now is the multilingual embeddings model from Cohere, which supports 100+ languages. https://txt.cohere.ai/multilingual/)
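To make that concrete, here is a minimal, hedged sketch of what cross-lingual semantic search with a multilingual embeddings model could look like using the Cohere Python SDK. The API key is a placeholder and the model name is an assumption; check the Cohere docs for the current model and SDK version.

```python
# Sketch of cross-lingual semantic search with Cohere multilingual embeddings.
# The model name below is an assumption; check the Cohere docs for the current one.
import cohere
import numpy as np

co = cohere.Client("YOUR_API_KEY")  # placeholder key

docs = [
    "O relatório anual será publicado em março.",                   # Portuguese
    "The quarterly results exceeded expectations.",                 # English
    "Le conseil d'administration se réunit la semaine prochaine.",  # French
]
query = "When will the annual report be published?"

# Embed documents and query into the same multilingual vector space.
doc_emb = np.array(co.embed(texts=docs, model="embed-multilingual-v2.0").embeddings)
query_emb = np.array(co.embed(texts=[query], model="embed-multilingual-v2.0").embeddings[0])

# Rank documents by cosine similarity to the query and print the best match.
scores = doc_emb @ query_emb / (np.linalg.norm(doc_emb, axis=1) * np.linalg.norm(query_emb))
print(docs[int(scores.argmax())])
```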
Hi Shubham Saboo and Sandra Kublik. From your point of view, what would be the main challenges in creating products as powerful using Bloom, Flan-T5, or other open-source models?
What do you believe is the main limitation of GPT-3 at the moment?
Some challenges include:
- Access to Data: Open-source models like Bloom or Flan-T5 rely on large amounts of data for training, which can be a challenge for smaller companies or organizations that may not have access to such data.
- Extensive Compute Resources: Models like Bloom or Flan-T5 have a huge number of parameters, making training them a compute-intensive exercise.
- Model Tuning: Open-source models may require more fine-tuning and customization to achieve optimal performance for specific tasks, which can be time-consuming and require expertise.
The main limitation of GPT-3 at the moment is its lack of ability to truly understand context and meaning in the way that humans do. While GPT-3 can generate human-like text, it still lacks true comprehension and common sense reasoning abilities. This means that it can sometimes produce inaccurate or nonsensical responses to certain queries or prompts, and requires significant human oversight and guidance to ensure that its outputs are reliable and trustworthy.
Hello Sandra Kublik and Shubham Saboo, thanks for being here to answer our questions. What are the key concepts and skills a beginner in the field should work on in order to get started with GPT-3 and building NLP applications?
Hey Bhaskar Sarma! Love your question :heart: A bunch of suggestions from me:
- It’s great to learn the basics of prompt engineering. It’s very satisfying once you get the knack of it, and it requires zero technical knowledge (you can interact with the models via the Playground, a simple UI).
- Try to build example projects for any GPT-3 use case you are interested in: find as much info as you can about that particular use case, put something together, share it with the world, join a couple of hackathons, and learn from other folks interested in the same thing. Once you get hooked, you will figure out how to move forward and learn much more than you expected.
- There isn’t any specific tech stack you need to know, although most of the products I’ve seen are built with Python. Demos often use Streamlit or Replit.
- Read about other LLM providers besides OpenAI, like Cohere, Anthropic, and AI21 Labs, and check out open-source communities like Hugging Face.
- Observe startups in this field, their journey is super informative when it comes to understanding how GPT-3 translates to products.
Three things:
- Basic understanding of Natural Language Processing (NLP): GPT-3 is a language model that uses NLP techniques to generate human-like text. A beginner should have a basic understanding of NLP and the different techniques used in it, such as tokenization, part-of-speech tagging, and named entity recognition.
- Programming: GPT-3 is accessed through programming languages such as Python, so a beginner should have a basic understanding of programming concepts and syntax.
- API Integration: GPT-3 is accessed through an API, so a beginner should have a basic understanding of API integration and how to make API requests using programming languages.
You can start by reading the official documentation and tutorials provided by OpenAI, and check out our book if you’re interested in going more in depth.
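If it helps, here is a minimal sketch of what a first API request could look like with the openai Python library (v0.x-style Completions interface; the model name, prompt, and parameters are just illustrative):

```python
# A minimal GPT-3 Completions request (openai v0.x-style interface).
# The model, prompt, and parameters are illustrative, not prescriptive.
import openai

openai.api_key = "YOUR_API_KEY"  # placeholder key

response = openai.Completion.create(
    model="text-davinci-003",
    prompt="Explain tokenization in one sentence.",
    max_tokens=60,
    temperature=0.7,
)

print(response["choices"][0]["text"].strip())
```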
Hi Sandra Kublik and Shubham Saboo. Exciting book. Very much needed.
How does GPT handle tabular data? Suppose I have an annual report (PDF) of a firm and would like to input it to GPT and do question answering on top of it. The data has textual and tabular information. Most of the key numbers are in the form of a table in the report. Would GPT be able to process it and provide good responses?
Thank you Prashant!
GPT is primarily designed for processing natural language text data, and while it can handle some forms of structured data, it may not be the ideal tool for processing tabular data. However, GPT can still be used to extract information from tables in the report by training it on a large corpus of documents that contain tables, and using techniques such as fine-tuning or transfer learning to adapt it to the specific task.
To extract information from the tabular data, one approach would be to preprocess the annual report to extract the tables and convert them into a format that can be easily ingested by GPT, such as a list of values or a structured JSON object. Then, this data can be fed into GPT along with the textual content of the report to generate responses to specific questions.
However, it is worth noting that the quality of the responses generated by GPT will depend on the quality of the input data and the training data used to train the model. Additionally, GPT is a language model that generates text, so it may not be able to output tabular data directly. It can provide answers to questions based on the information in the table, but the output will be in natural language text format rather than a table.
Preprocessing the data to extract the tables and convert them into a format that can be easily ingested by GPT can improve the model’s performance on this task.
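As a rough, hedged sketch of such a pipeline (pdfplumber is just one possible extractor, and the prompt wording, truncation, and openai v0.x-style call are illustrative):

```python
# Rough sketch: extract tables from the report PDF, serialize them as JSON,
# and ask GPT a question about them. pdfplumber is one choice of extractor;
# the prompt, truncation, and parameters are illustrative, not prescriptive.
import json
import openai
import pdfplumber

openai.api_key = "YOUR_API_KEY"  # placeholder key

tables = []
with pdfplumber.open("annual_report.pdf") as pdf:
    for page in pdf.pages:
        for table in page.extract_tables():
            # Each table is a list of rows (lists of cell values).
            tables.append(table)

question = "What was the total revenue for the last fiscal year?"
prompt = (
    "The following tables were extracted from a company's annual report, "
    "serialized as JSON (lists of rows):\n"
    f"{json.dumps(tables)[:6000]}\n\n"  # naive truncation to respect context limits
    f"Question: {question}\nAnswer:"
)

response = openai.Completion.create(
    model="text-davinci-003",
    prompt=prompt,
    max_tokens=100,
    temperature=0,
)
print(response["choices"][0]["text"].strip())
```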
That answers my question. Thank you Shubham Saboo
Hi Sandra Kublik and Shubham Saboo,
I’ve recently taken an interest in ML and AI. I’ve been working in the DE space all this while. I wanted to understand if I can straight up pick up this book to understand how GPT works?
Or, do you guys recommend going through some pre-requisites to understand the content better?
Hello Pritam Pathak! No prerequisites are needed. The book is intended for people from any tech or non-tech background 🙂 You will understand how GPT works, how it shaped a new ecosystem for business, and you’ll build a basic demo using GPT-3 after reading this book.
Feels awesome to see a TLP alum feature at DTC BoTW! Great work Shubham and Sandra!
- What models/ideas would you suggest to approach domain specific document summarisation?
Thank you, adanai :heart: Cohere just launched a specialized endpoint (currently in beta) for this use case called Summarize https://txt.cohere.ai/summarize-beta/
Otherwise, any GPT-x model will handle it well when fine-tuned and prompted to summarize.
what specific domain/s are you looking at?
Thank you adanai!!
You can try the Cohere Summarization endpoint as Sandra mentioned, or check out gpt-3.5-turbo by OpenAI (which is used by ChatGPT under the hood) for domain-specific summarization problems!
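For example, a domain-specific summarization call with gpt-3.5-turbo could look roughly like this (openai v0.x-style Chat Completions interface; the system prompt and parameters are illustrative, so adjust them to your domain):

```python
# Sketch of domain-specific summarization with gpt-3.5-turbo.
# The system prompt and parameters are illustrative, not prescriptive.
import openai

openai.api_key = "YOUR_API_KEY"  # placeholder key

document = "..."  # the domain-specific text to summarize

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": "You are an expert summarizer of clinical trial reports."},
        {"role": "user", "content": f"Summarize the following document in 3 bullet points:\n\n{document}"},
    ],
    temperature=0.3,
)
print(response["choices"][0]["message"]["content"])
```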
Thank you! I’ll read more and experiment with cohere
thanks for presenting the book here!
- how do you see the future of GPT-X evolving, and what do you expect from GPT-4?
- what potential new applications or improvements do you anticipate in the near future?
- how do you envision GPT-3 being used in industries outside of tech, such as healthcare or finance?
- how much knowledge of NLP is necessary to build products on top of these APIs?
Our pleasure JaimeRV, here are the answers to your questions:
- GPT-3 has made significant strides in natural language processing, and its future development will undoubtedly be even more impressive. GPT-4 is likely to continue this trend, with a focus on improving the model’s performance on complex tasks, expanding its language coverage, and addressing ethical concerns such as bias and fairness. It’s possible that GPT-4 could incorporate multimodality and expand its task horizon considerably.
- There are many potential applications for large language models like GPT-3 in the near future, including language translation, chatbots, customer support, and content generation. One area where GPT-3 could have a significant impact is in the field of education, where it could be used to generate personalized learning materials based on a student’s individual learning style and interests.
- In healthcare, GPT-3 could be used to analyze medical records and assist with diagnosis, generate personalized treatment plans, and even develop new drugs. In finance, GPT-3 could be used to analyze financial data and assist with investment decisions, detect fraudulent transactions, and provide personalized financial advice to customers. However, it’s important to note that these applications would need to be carefully developed and regulated to ensure patient privacy and financial security.
- While a basic understanding of NLP is necessary to work with GPT-3 and similar language models, it’s not necessarily required to build products on top of these APIs. Many API providers offer pre-built models and tools that can be used with minimal programming knowledge. However, to develop more advanced applications, a deeper understanding of NLP concepts such as semantic parsing, entity recognition, and sentiment analysis would be helpful.
https://twitter.com/Saboo_Shubham_/status/1614141269051322369
Do you have any advice for implementing GPT-3 at more regulated institutions?
We dedicated a chapter to GPT-3 for enterprises (generally highly regulated orgs) in the book. Microsoft has a special Azure OpenAI Service that caters to such institutions and has a framework for working with them. My advice would be to try it and make sure to sign a solid contract. Another way to use GPT-3 at your org would be to use an open-source GPT architecture (e.g. from Hugging Face) and self-host it for maximum data security.
Here are some considerations that regulated institutions should keep in mind when implementing GPT-3:
- Privacy and Security: Regulated institutions should be aware of the privacy and security implications of using GPT-3. They should ensure that sensitive data is not exposed to the API and that appropriate security measures are in place to protect against unauthorized access or breaches.
- Bias and Fairness: Regulated institutions should be aware of potential biases in the GPT-3 model and take steps to mitigate them. They should also ensure that the use of GPT-3 is fair and equitable and does not discriminate against any particular group or individual.
- Compliance: Regulated institutions should ensure that the use of GPT-3 is compliant with relevant regulations and standards. For example, in the healthcare industry, the use of GPT-3 must comply with HIPAA regulations.
- Testing and Validation: Regulated institutions should thoroughly test and validate the use of GPT-3 to ensure that it is accurate and reliable. This includes testing for accuracy, completeness, and reliability, as well as testing for potential errors or biases.
- Governance and Oversight: Regulated institutions should establish clear governance and oversight processes for the use of GPT-3. This includes identifying roles and responsibilities, establishing processes for monitoring and managing risk, and ensuring ongoing compliance with relevant regulations and standards.
Learn more about it in Chapters 5 and 6 of our book!
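For the self-hosting route Sandra mentioned, a minimal sketch with Hugging Face transformers could look like this (the model choice is illustrative; pick one whose size and license fit your infrastructure):

```python
# Sketch of self-hosting an open-source GPT-style model with Hugging Face
# transformers, so prompts and data never leave your infrastructure.
# The model name is illustrative; choose one that fits your hardware and license needs.
from transformers import pipeline

generator = pipeline("text-generation", model="EleutherAI/gpt-neo-1.3B")

out = generator(
    "Summarize the key obligations under our data retention policy:",
    max_new_tokens=100,
    do_sample=False,
)
print(out[0]["generated_text"])
```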
With regards to future trends, do you feel that as GPT-3 and other models become increasingly integrated that there will be a corresponding increase in resistance from consumers who likely don’t understand the technology? Or is it becoming ubiquitous and therefore a non-issue?
Great question Anna! Honest answer: we shall find out 🙂 But my gut feeling, and the general trend we see, is that the barrier to entry is decreasing (see the progress from GPT-3 to ChatGPT and its user-friendly chatbot interface), and interacting with the models/products that incorporate these models will likely become more and more intuitive and not pose an issue.
Having said that, human-AI alignment is one of the biggest and most interesting challenges at the moment, and the way it is solved will no doubt impact our degree of LLM adoption.
Thank you Sandra 🙂
Hi Shubham Saboo and Sandra Kublik - such an exciting book that you have written! Very apt and very timely indeed.
With the proliferation of Gen AI and LLMs, do you see the landscape favoring more of OpenAI’s model (exposing endpoints but keeping model weights proprietary), more of the OSS model (like that of HF), or will it be a hybrid?
Thank you Bengsoon! Personally, I think both models are very much needed (APIs for flexibility and ease of use, open source for self-hosting, which is best for security concerns) and will continue to be needed equally by organizations with different needs (e.g. startup agility vs enterprise compliance). I recently did an interview with Julien Simon from HF where we chatted about this: https://youtu.be/safSwHoA8qw
Thank you for the kind words Bengsoon!!
Here is my take on it:
The landscape for AI language models is likely to be a hybrid of both the proprietary and open-source models. OpenAI’s model of exposing endpoints but keeping model weights proprietary allows for controlled access to the API while still protecting the model’s intellectual property. This model also allows OpenAI to ensure that the model is used ethically and responsibly.
On the other hand, open-source models like those of Hugging Face (HF) enable developers to modify the code and adapt the models to their specific needs. This model fosters collaboration and innovation, as developers can build on top of each other’s work and share their contributions with the community.
Both models have their advantages and disadvantages, and the landscape is likely to favor a hybrid approach that combines the best of both worlds. For example, OpenAI could partner with companies and institutions to develop custom versions of their models that meet specific needs while still protecting the intellectual property. OpenAI could also work with the community to develop open-source tools and libraries that can be used in conjunction with their models.
Ultimately, the success of AI language models will depend on their ability to solve real-world problems and create value for businesses and society. The landscape is likely to evolve as the technology and use cases mature, and we can expect to see a mix of proprietary and open-source models that meet the needs of different stakeholders.
that’s great! I agree 💯 with what both of you mentioned. I personally think the two models also cater to different demographics of the landscape, which means that innovation can happen at so many levels: from developing solutions that connect to an endpoint, to fine-tuning / model training.
Thank you both for taking your time to answer in detail, Sandra Kublik and Shubham Saboo! Again, deeply appreciate your work in this space!
Many thanks again to @DTC for enabling this correspondence. Thanks also to the authors for participating in it🙂
I would be interested in knowing:
- as a person who wants to build an API with GPT:
  ◦ what is the main focus of the chapter on OpenAI, given the alternative of reading the OpenAI docs?
  ◦ closely linked to the first question: does the book also cover using FastAPI in this context?
- while writing the book:
  ◦ did you encounter enterprises interested in adopting APIs based on GPT, but actually not adopting them because of (from their point of view, perhaps especially in finance) the “risks” that open source implies?
  ◦ did you evaluate the possibility of applying everything you show also to the public sector (e.g. “govtech”)?
Thanks again!
Hi Sandra Kublik and Shubham Saboo. My questions are for a scenario where an enterprise is currently building, or has already built, apps using GPT-3, mostly through the API. 1. How much will change when GPT-4 comes in? How should an enterprise assess moving? What are the recommendations for handling the API change? 2. How will BYOM (Bring Your Own Model) work? Say there is a trained model, but an enterprise would like to fine-tune it with more corpus specific to the enterprise and then use it in apps; how should they go about it? Thanks for sharing your expertise!
Hi Sandra Kublik, Shubham Saboo, thanks for the opportunity.
- What is your view on new programming paradigms for GPT-3-like applications, which could be different from using normal APIs (like the current OpenAI APIs) and be more elegant/appropriate? For comparison, the MapReduce programming paradigm changed the way we do distributed computing; similarly, will we see a completely new paradigm for such NLU applications, with built-in support for various prompt engineering techniques (more like a plumbing framework for different one-shot/few-shot/chain-of-thought prompts, etc.)? What are your thoughts? Thanks.
- Practically, only a very few big tech companies will be commercially able to train such large models from scratch. I understand transfer learning on top of these for fine-tuning on a domain-specific corpus, but will it not create huge bias/trustworthiness issues when only a handful of companies decide on the base pretrained models? With the number of parameters used in training only ever increasing, do you think smaller models can provide similarly impactful responses through some innovations (and not by depending on pretrained models)?
Hey Sri, thanks for the question! There is a lot of attention dedicated to eradicating bias in the foundation models. For these few companies that are able to train and deliver them via APIs, it makes sense business-wise to plan for making their models universal and adaptable with further fine-tuning to any company use case, so they can capture as big of a market share as possible.
Yes, smaller models will keep delivering better performance. Currently, the trend with model size is actually reversing for the top LLM providers: companies are focusing on developing smaller, more robust models thanks to new hardware advancements (mostly NVIDIA) and model architecture improvements, like incorporating reinforcement learning with human feedback, which delivers the desired performance with a smaller number of parameters (last year’s Chinchilla paper demonstrated you can train a smaller model that outperforms GPT-3). You can see that OpenAI’s new pricing makes GPT-3.5 10x cheaper than before. My thesis is that this is possible because the model is getting smaller and the costs of hosting it are dropping for the company as well. Other LLM providers trying to stay competitive will need to focus on minimizing costs while improving performance, and that’s tricky if you keep scaling the number of params.
Having said all that, you don’t have to rely on LLM providers; you can always opt for self-hosting, using existing open-source models, e.g. from Hugging Face. They give full visibility in terms of model params, weights, and potential biases, and you can adapt them to your needs. It all depends on the use case and the resources available to you to execute it.
Thanks Sandra Kublik, makes sense.