Applied Natural Language Processing in the Enterprise

Rimma Shafikova

👋 NLP beginner here. When we speak about progress in NLP, is it always limited to English? Are state-of-the-art approaches transferable to other languages? (of interest: Russian, Mandarin)

Ankur Patel

Yes, the beauty of today's state-of-the-art approaches is that they are very transferable to other languages, especially languages where large corpora of text are readily available (e.g., Russian, Mandarin, etc.). Progress is more limited in languages where large corpora of text are harder to come by, though

Ankur Patel

Basically, to develop very good NLP models today, you need access to massive volumes of text (in any language of your choice), which is easy to come by for widely used languages

Rimma Shafikova

thanks Ankur!

Alex

Hi there Ankur Patel! First of all, thanks a lot for taking the time to reply to our questions 😄
Since this book aims to show applied NLP in companies/orgs: Do you consider getting executive buy-in difficult, given the perceived complexity of getting effective NLP models to work on real-world business problems?

Ankur Patel

In general, yes, getting executive buy-in is difficult, mostly because this is a new space, and executives may prefer the tried and tested over something new that they do not fully understand yet

Ankur Patel

That being said, it does depend on the executive. In my experience, defining the deliverable narrowly and quickly showing value in a modest way is a good way to garner the attention and interest of executives

Ankur Patel

Once they see the return on investment, you can pitch more ambitious projects

Ankur Patel

Too often, I see people promising too much, and then things take longer and cost more, and executives become disenchanted with the space because of it

Dr Abdulrahman Baqais

Thank you Ankur Patel for the book.
A few questions:
1) Most NLP advances now are based on huge pretrained models. What about the classical ML models (Logistic Regression, Naive Bayes, LSTM, etc.)?
Do you feel that today’s NLP practitioners should jump directly to transformers and pretrained models?

Ankur Patel

I think the older approaches (regex and classical ML models) still have a place even with transformers

Ankur Patel

It really depends on the task. For example, if you are trying to process invoices (OCR + text classification), using transformers is the way to go

Ankur Patel

If you are performing entity resolution or linking, then a simpler approach based on cosine similarity or regex or Elasticsearch may be better
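
A minimal sketch of the cosine-similarity route mentioned above, using only the Python standard library. It is an illustration rather than anything from the discussion: the helper names (`normalize`, `char_ngrams`, `best_match`), the suffix list, and the 0.5 threshold are all assumptions made for this example.

```python
import math
import re
from collections import Counter


def normalize(name: str) -> str:
    """Lowercase, drop punctuation, and strip a few company suffixes (regex rules)."""
    name = re.sub(r"[^a-z0-9 ]", " ", name.lower())
    # Hypothetical suffix list; a real project would tune this to its own data.
    name = re.sub(r"\b(?:inc|corp|corporation|llc|ltd|co)\b", " ", name)
    return " ".join(name.split())


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Count overlapping character n-grams, padding with spaces at both ends."""
    padded = f" {text} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[g] * b[g] for g in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def best_match(mention: str, canonical_names: list, threshold: float = 0.5):
    """Return the most similar canonical name, or None if nothing clears the threshold."""
    query = char_ngrams(normalize(mention))
    scored = [(cosine(query, char_ngrams(normalize(c))), c) for c in canonical_names]
    score, name = max(scored, key=lambda pair: pair[0])
    return name if score >= threshold else None
```

For instance, `best_match("Acme Corp.", ["ACME Corporation", "Apex Industries"])` resolves to `"ACME Corporation"`, because both names normalize to `acme`.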

Ankur Patel

I think having awareness of all the different potential approaches to the problem will help you pick the right tool for the given job at hand

Doink

How do you know which to use when? Is there a mind map, especially for those who don’t have much NLP domain knowledge?

Ankur Patel

Here is a good approach.

If your task is the same or similar to one of the core NLP tasks here, use a Transformer-based model. https://huggingface.co/transformers/task_summary.html
If your task is for a small dataset or relatively simple or requires interpretability, start with the older yet simpler approaches such as rules-based NLP (e.g., regex) or classical ML.
If your task is for a large dataset and is more complex in nature, skip the rules-based NLP or classical ML and research some of the state-of-the-art NLP approaches to solving the problem. Implement one of these approaches.
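
As a rough illustration of the "start simple" branch above, here is what a rules-based (regex) baseline might look like. The labels and patterns are hypothetical and invented for this sketch; they are not from the book or the discussion.

```python
import re

# Hypothetical labels and patterns, listed in priority order.
RULES = [
    ("invoice", re.compile(r"\binvoice\b|\bamount due\b", re.IGNORECASE)),
    ("cancellation", re.compile(r"\bcancel(?:s|led|ling|lation)?\b", re.IGNORECASE)),
]


def classify(text: str, default: str = "other") -> str:
    """Return the label of the first rule whose pattern matches the text."""
    for label, pattern in RULES:
        if pattern.search(text):
            return label
    return default
```

A baseline like this is easy to read and audit, which is the interpretability benefit mentioned above; once the rules stop covering enough cases, that is a signal to move to classical ML or a Transformer-based model.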

Dr Abdulrahman Baqais

2) The NLP domain seems daunting, with many subdomains (NER, summarization, translation, etc.) and many tools, libraries, and packages. Yet a black-box pretrained model can achieve very high accuracy even when it is run by a junior DS who has no clue what is going on.

What kind of skills should NLP practitioners equip themselves with in order to digest all this information and still be in demand in the industry job market?

Ankur Patel

I think spending 20% of your time learning what is new and effective (in an applied setting) is a must. It is true that some of the latest methods are so good that they may leapfrog existing approaches even without too much tuning

Ankur Patel

For example, we write about spaCy, Hugging Face, and fastai in our book. These are must-haves today if you are doing NLP in the enterprise

Ankur Patel

Older libraries such as StanfordNLP and NLTK are much more dated and less effective

Dr Abdulrahman Baqais

3) NLP is taught at an advanced level, as a separate track, in many DS bootcamps, assuming solid DS knowledge.
Can we teach NLP to non-DS people to create citizen NLP practitioners?

Ankur Patel

It’s possible now, but it wasn’t really possible a few years ago. Now there are more easy-to-use libraries available, but we still have a long way to go

Ankur Patel

I would say that you need some Python and DS knowledge to begin the NLP journey today, but NLP is not necessarily a crazy advanced field

Ankur Patel

Our book assumes that the reader has some basic Python experience but not much more than that

Matthew Emerick

Hey, Ankur Patel! Thanks for doing this!
What is the single biggest challenge in putting NLP into production?

Ankur Patel

I find that most NLP specialists struggle with the engineering aspects, such as refactoring code, developing Docker containers, writing unit and integration tests, and deploying the model as a service on AWS / Azure / GCP

Matthew Emerick

How important is the study of linguistics to NLP developers?

Ankur Patel

Nope, not necessary today. It helps but it is not required by any means

Matthew Emerick

What do you see as the next big step in NLU?

Ankur Patel

We need better representations to achieve “longer-term memory.” Without this, NLU is very hard. Attention mechanisms and transformers have helped on this front, but we need machines to hold memory over very long spans (chapters and chapters of a book), and this is really hard today

Matthew Emerick

Would you think symbolic AI mixed with today’s statistical methods might help?

Ankur Patel

I think so. I haven’t studied symbolic AI enough and it hasn’t been in vogue recently, but I think using just statistical methods may be limiting. I’m hoping more research is done in this and other areas soon (to complement the statistical approaches)

David Cox

I appreciate you taking the time to respond to questions, Ankur Patel! I second Matthew’s questions with respect to getting NLP products into production. Also, many database systems (e.g., AWS) are offering out-of-the-box NLP solutions. I’m wondering if you have any thoughts or recommendations on cloud computing systems that do simple NLP solutions well and also allow individuals with training in NLP to engage in more advanced analytics?

Ankur Patel

I think the major cloud providers have done a great job lowering the barrier to entry for performing NLP

Ankur Patel

I find Google best in this regard today, followed closely by Amazon. Today, you can perform some of the core NLP tasks with just basic Python knowledge without knowing ML or DL

Ankur Patel

I think we will see this trend continue quite a bit: easy-to-use NLP APIs tailored to the developer community at large instead of just to data scientists

ASHISH SONI

Hey Ankur Patel! Really curious about the content of your book!
What kind of business problems/examples have you covered in the book?

Ankur Patel

We cover a lot of the more popular NLP tasks in enterprise today such as named entity recognition, text classification, sentiment analysis, and summarization

Ankur Patel

We don’t focus on any one vertical because a lot of these tasks are appropriate across verticals

ASHISH SONI

Thank you Ankur! 🙏

Ajay Arasanipalai

To add on: there’s been lots of evidence recently that backs up what Ankur Patel says here about the same techniques working across verticals.

Ajay Arasanipalai

The real beauty of neural nets and deep learning is that they actually do live up to the promise of generalizing across many different datasets. For example, GitHub Copilot and AI Dungeon, two very successful and popular commercial products that are actually deployed in the real world and whose domains are quite different (software engineering vs. entertainment), use nearly identical models.

David Cox

Ankur Patel Based on the trends you’ve watched over the past decade and that led into your book, what do you think are the next areas for major advancement in NLP? And what are some lingering challenges that we don’t seem to be close to solving?

Ankur Patel

Let’s start with the next major (applied) advancement in NLP. Combining computer vision with NLP to process documents is and will become a very large area of investment (from the applied / enterprise community).

Ankur Patel

For example, think of all the paper heavy industries today that require visual and textual cues to interpret properly (e.g., invoices, health statements, receipts, legal documents, financial documents, etc.). These areas are ripe for automation using not just NLP but also computer vision (jointly).

Ankur Patel

The most hyped area is NLG (natural language generation), largely off the back of GPT-2 and GPT-3 from OpenAI. But we are still quite a way from having human-like generation

Ankur Patel

That’s one of the biggest challenges, but GPT-3 has caught the attention of many people, both in academia and in industry, so I expect a lot more innovation here in the coming decade

David Cox

Very interesting! Especially in thinking about businesses that are paper heavy. Do you know, offhand, if anyone has looked at the carbon offsets between creating and using paper as compared to the resources needed to build the models specific to the different business cases and the resulting data storage/use?

David Cox

I wonder a lot about getting to language that is similar to human language, in particular because of the known contextual factors that go into human language outside of simply looking at the structure of what was said prior to the generated statement. It seems this will require combining other datasets to get to that structural piece as well as advances in NLP generally.

Wendy Mak

Hi Ankur, what business problems do you think would benefit a lot from modern NLP methods but are commonly overlooked, either in research as too boring or by businesses that are not aware it is possible?

Ankur Patel

A few problems come to mind, but one of the biggest areas is information retrieval. Researchers find it “boring,” at least compared to NLU and NLG tasks, but information retrieval (think Google on domain-specific private documents such as legal and finance and healthcare) is incredibly valuable to businesses.

Ankur Patel

Businesses are using CTRL-F and manual organization and tagging of data to find what they need but NLP could really unlock a lot of value here. It’s not a Google scale problem so it often gets overlooked by the tech giants. Curious what you think about this (and others).

Wendy Mak

Yeah, I agree. I think it’s really relevant, but you don’t see a lot of papers about document tagging etc. at, e.g., NeurIPS… At the last company I worked for, there was a lot of legal documentation that could really have benefited from automatic tagging (unfortunately that project got parked when the biz people decided it was low priority…)

Ankur Patel

Part of the challenge here is that the effort to build a model to do this well isn’t worth it for any given organization, so software companies will need to provide this service to many clients to make the investment worthwhile. I think we are starting to see more of this now

Jeff Herman

Hi Ankur. Thanks again for taking some time to give your great insights on our questions! With transformers how easy is it to really understand how the model is making the predictions that it is? For example if we have a text classification using a transformer, can we see which words were of most importance for a prediction?

Ankur Patel

Initially, this was hard to do, but since 2018 there has been some good progress on introducing interpretability to the Transformer models

Ankur Patel

You can now see which word(s) the model paid most “attention” to as it was making a prediction

Ankur Patel

It’s still difficult to truly understand the black box at scale (across many predictions), but for one-off predictions, this is possible today

Ajay Arasanipalai

To add on: I know it wasn’t originally your question, but let me also mention that I don’t think trying to find “keywords” is necessarily a great idea for interpretability. As Ankur Patel mentioned, I don’t think this approach scales well in practice: when you have thousands of users querying your model with many gigabytes of text, what insight do you hope to get by finding the most “attentive” words?

Ajay Arasanipalai

In the case of text classification, it might be smarter to just use basic word/subword counting post-classification (i.e. what are the most common words among novels that have been classified as horror).
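
A minimal sketch of that post-classification word counting, using only the standard library. The tokenizer, the tiny stopword list, and the function names are assumptions made for illustration; the predicted labels would come from whatever classifier is being audited.

```python
import re
from collections import Counter, defaultdict

# Deliberately tiny, hypothetical stopword list for the example.
STOPWORDS = {"the", "a", "an", "and", "of", "in", "to", "was", "is", "it"}


def tokenize(text: str) -> list:
    """Lowercase the text and keep alphabetic tokens that are not stopwords."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]


def top_words_by_label(texts, predicted_labels, k: int = 3) -> dict:
    """Map each predicted label to its k most frequent words across all its texts."""
    counts = defaultdict(Counter)
    for text, label in zip(texts, predicted_labels):
        counts[label].update(tokenize(text))
    return {label: [w for w, _ in c.most_common(k)] for label, c in counts.items()}
```

Comparing the top words of a single misclassified document against the top words of each predicted genre gives a cheap first pass at auditing, without inspecting attention weights.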

Jeff Herman

Hi Ajay Arasanipalai. Appreciate your insight! Originally, I was thinking of how to audit the model. For example, if we are looking at novels and we predict a novel as horror when it is actually historical, I wanted to know the most important words for why the model predicted it to be horror. I like your approach; I could compare the most common words in that novel vs. the most common words in the different novel genres

Krzysztof Ograbek

Hi Ankur Patel, thank you for doing this.
How will NLP be different in 3 years from how it is today? Are there any tasks that will explode in popularity?

Ankur Patel

I think we have seen NLP in good use in consumer applications today (for example, with social apps), but the next 3 years will focus on getting NLP into the workplace, automating tasks that white collar workers are doing today

Ankur Patel

For example, analyzing and processing invoices, bank statements, legal memos, health care statements, financial documents, etc.

Ankur Patel

The barrier to entry to use NLP will also come down, just like it did for computer vision since 2012

Ankur Patel

Another trend that intersects well with NLP is the no-code movement in software development and machine learning. We should be able to load documents, highlight text, and have NLP models perform an array of valuable tasks such as document classification, sentiment analysis, summarization, etc. We are not quite there yet

Krzysztof Ograbek

I love the answers. Thank you!

Krzysztof Ograbek

Is your book for everyone, regardless of the level of NLP experience?

Ankur Patel

Yes, it is. In fact, it is positioned best for newcomers and intermediate users

Ankur Patel

You will need to know Python and have some awareness of data science and ML though

Ajay Arasanipalai

To clarify: we don’t really require any NLP experience, but it’s definitely helpful. If you’re a complete beginner, you can definitely pick up the book and get started - we have a bunch of external resources (which I think is actually one of the more valuable parts) to help. But it definitely won’t be enough on its own if you don’t have any experience with Python, PyTorch, or basic deep learning.

Ajay Arasanipalai

One thing we realized early on is that there are a lot of resources that help you get started, but fewer that dive into the details of how to go from copy-pasting scikit-learn snippets to implementing and deploying state-of-the-art models in production.

Giuditta Parolini

Why are chatbots so frustrating for the user, when NLP-based translation tools (I am thinking about DeepL for instance) can do a very good job?

Ankur Patel

Part of this has to do with NLU. Chatbots today don’t do a great job of holding relevant context across questions. You can see this firsthand if you try to ask Google Assistant or Siri or Alexa a series of related questions

Ankur Patel

Another part has to do with conversational language that many of us use in chatbots (for example, the use of casual language or idioms, both of which are very hard for NLP models today)

Ankur Patel

I think both of these items will have to get solved before chatbots appear “intelligent”

Ajay Arasanipalai

The other thing to consider is that most chatbots deployed today probably aren’t using the very best of modern deep learning techniques. While the GPT-3 demos certainly look convincing, keep in mind that they require 22 GPUs for inference…

Ajay Arasanipalai

Especially for those not utilizing the latest tools (like Hugging Face’s transformers library), the engineering effort and compute resources required to run a medium-sized model like BERT for a simple chatbot that really only needs to cancel orders every once in a while may be prohibitively expensive.

Giuditta Parolini

Thanks for your answers. At this point one can only say that, most of the time, chatbots are the fancy equivalent of switchboards with intolerably long recorded instructions. Both chatbots and switchboards do not give the user what (s)he needs, but they gain time when companies do not have enough customer service staff. I will try not to be too upset next time a chatbot wastes my time.

Ajay Arasanipalai

Hi everyone, I’m Ajay, Ankur Patel’s coauthor for “Applied Natural Language Processing in the Enterprise.” Thanks for setting this up, and I’m happy to stick around and answer any questions you all may have.

Ajay Arasanipalai

Quick comment: while no one specifically asked this question, it seems like many of you here are asking something along the lines of which NLP applications are going to be hot in the next few years. Note that this isn’t just about completely new products and services; things that we’ve all been using for a while from established companies may also start getting better as they incorporate transformer models.

Ajay Arasanipalai

You might have noticed that Gmail’s autocomplete and predictive keyboards have been getting better over the years. I think this trend will continue, and we’ll start to see even more autosuggestions and prompts popping up across different applications/domains. A great example of this is GitHub Copilot - https://copilot.github.com/

Alexey Grigorev

Which NLP papers are your favourite? Why?

Ankur Patel

I personally love this paper: https://arxiv.org/abs/2012.14740v1

Ankur Patel

I’m working on document understanding problems, and this is an excellent paper on the topic

Krzysztof Ograbek

What are the characteristics of a great NLP Engineer? Can you tell that someone has potential despite a lack of experience?

Ankur Patel

Most of the time it comes down to practical, hands-on experience for me

Ankur Patel

I love to see GitHub repos and projects, even if the work experience isn’t quite there

Ankur Patel

Any indication that the individual is learning new materials and experimenting with them is a huge positive

Ankur Patel

Thirst for the space is big

Mansi Parikh

Hi, Ankur and Ajay. Thanks for interacting with the community! We appreciate your time and enthusiasm.
At what point does it become crucial for an organization to adopt an NLP-focused analytics strategy? Is it only when you’ve exhausted all other analytical opportunities for non-text data that you need to dive into this and develop new capabilities to continue adding incremental value to your business? Is it based on the actions of competitors in the market? Basically, how do you know when to seriously introduce this to an organization?

Ankur Patel

I would frame it a bit differently. If your organization uses text or documents today at reasonably high volume, then it is time to invest in NLP.

Ankur Patel

If your organization does not have high text or audio needs, there is no need to dabble in NLP

Mansi Parikh

Thank you!! I just wasn’t sure if NLP was essentially a right step forward for even organizations that may not have this type of data yet but now are realizing that they have to collect it and if NLP techniques could eventually be valuable to them. I was considering it like that, but maybe the business should just focus where it’s meant to focus.

Ankur Patel

Yes exactly. Let the business need drive the technology instead of the other way around

Mansi Parikh

One more, please, for both of you.
How can you estimate the value of an NLP undertaking to a business? Given its popularity, it might not take much to convince leadership that this is a promising route forward, but as there are usually many options for ways an organization can proceed in the future, you may still need to justify building expertise around this subject compared to alternatives and that can be done by estimating potential (or expected) business value, I suppose.

Ankur Patel

I would start by framing the problem in terms of savings to the organization if you could introduce x% automation
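
One way to make that framing concrete is a back-of-the-envelope calculation. The formula and every figure below are hypothetical placeholders rather than numbers from the book; it assumes savings scale linearly with the share of manual document handling that gets automated.

```python
def estimated_annual_savings(
    docs_per_year: int,
    minutes_per_doc: float,
    hourly_cost: float,
    automation_rate: float,
) -> float:
    """Estimate yearly savings if `automation_rate` (0 to 1) of manual work is automated."""
    if not 0.0 <= automation_rate <= 1.0:
        raise ValueError("automation_rate must be between 0 and 1")
    manual_hours = docs_per_year * minutes_per_doc / 60  # hours spent by hand today
    return manual_hours * hourly_cost * automation_rate


# Hypothetical inputs: 120,000 documents a year, 5 minutes each,
# 30 per hour fully loaded labor cost, 25% automation.
savings = estimated_annual_savings(120_000, 5, 30.0, 0.25)
print(savings)  # 75000.0
```

Setting this figure against the cost of building and running the model gives leadership a number to weigh against the alternatives.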

Ankur Patel

By itself, applying new tech for the sake of using new tech isn’t going to be too convincing, no matter how hot the new tech is

Ankur Patel

If you could deliver some value fast, that will also help with buy in

Mansi Parikh

Great! That sounds reasonable to do and provide. Sometimes fast POCs are difficult, but necessary to win over the decision-makers. Thanks again, Ankur.

Ricky McMaster

Hi Ajay Arasanipalai/Ankur Patel, thanks for doing this! Just following from the point Ajay makes above about improvements in autocomplete/predictive text, which is something I’ve been thinking about anyway recently.
As we as users become more reliant on such features, how do NLP models account for their training data potentially becoming more and more machine-generated (or at least machine-influenced), whilst humans might lose more of their standards in grammar or general literacy?
Is there a risk that it makes it more difficult for models to keep up with developing linguistic trends whilst remaining grammatically ‘correct’?

Ajay Arasanipalai

Good question. I think this is still an unsolved problem, but there are a few things that we might be able to do short-term as a quick fix. The most obvious solution is flagging the data your model generates and avoiding training on it. High-quality test and validation sets also help a lot here. You could also measure user satisfaction with the prompts and use that as a metric.
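
A small sketch of the flagging idea, assuming each logged example carries a provenance flag set at the moment the text is produced. The `Example` class and the `machine_generated` field are hypothetical names chosen for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Example:
    text: str
    machine_generated: bool  # hypothetical flag, set when a suggestion is accepted as-is


def training_subset(examples):
    """Keep only the examples that were not produced by the model itself."""
    return [ex for ex in examples if not ex.machine_generated]


logged = [
    Example("see you tomorrow at noon", machine_generated=False),
    Example("Thanks, sounds good!", machine_generated=True),
]
clean = training_subset(logged)
```

The hard part in practice is setting the flag reliably, for example when a user accepts a suggestion and then edits it; this sketch does not address that.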

Ajay Arasanipalai

As for humans losing grammar standards, I suppose that’s more of a social problem. The same could be said of self-driving cars, but we work on that anyway. I think worrying about polluting the training set with bad grammar due to mass normalization of autocorrect is a fairly out-there long-term concern.

Ricky McMaster

Thanks, appreciate the response.

Laia

Hi Ajay Arasanipalai and Ankur Patel very interesting topic!
NLP products have become better and better, but what are the current NLP frontiers?

Ankur Patel

The two frontiers that I see are NLU and NLG, both of which are sub-areas in NLP more broadly. Here is a good post on it

Ankur Patel

NLP is much more mature today, and it is being used to process and structure all sorts of text and audio data (e.g., invoice processing).

Ankur Patel

NLU and NLG are more open areas of research that aren’t ripe enough for broad industry application just yet, partly because of the inference costs as Ajay Arasanipalai mentioned yesterday and partly because the tech isn’t quite good enough yet to be applied to broad settings (above and beyond autocomplete, for example).

Ankur Patel

NLU can unlock a lot of value because, once it has matured, we will see more context-aware conversational bots that handle queries much more like humans would, instead of the fairly “dumb” question and answer bots today

Ankur Patel

NLG will help us generate more open-ended text and audio, perhaps even assist in the creative fields such as novel writing

Ankur Patel

We have a ways to go before we get to that point though

Laia

Thanks for your answer!

WingCode

Thank you Ankur Patel & Ajay Arasanipalai for the great Q&A.
Do you think the future will be ruled by compute intensive models (example: Transformers, GPT-3) ?
Will there be more efforts put into less compute intensive techniques (example: distillation) thereby making the state of the art accessible to all?

Ajay Arasanipalai

I’m conflicted on this. Note that the other way you can get accessibility is by making compute cheaper. We’re starting to see many accelerator and deep learning hardware startups that promise huge gains in efficiency and affordability. Even Nvidia, who has an undisputed monopoly on this market, continues to make better GPUs year over year. It’s entirely possible that what we consider “compute intensive” today will be very easy for the average practitioner to run 5-10 years from now.

Ajay Arasanipalai

The reason we’re seeing an interest in larger language models at the moment is that they are continuing to scale nicely, and it’s a relatively “safe” bet to improve your model’s performance. Would you rather hire a team of researchers to work on distillation for a year who may or may not be able to produce a ~20% improvement, or just change a parameter in your initializer that doubles the number of layers?

Ajay Arasanipalai

But even then, the only company I know of that regularly scales language models is OpenAI. I think most others have settled on BERT and its variants for practical use. I think it will be similar to the story in vision today - ResNet50 and YOLO are the standard, but there are better alternatives if you’re willing to invest the time, energy, and compute.

WingCode

Thank you for the answer Ajay 🙂 Interesting take.
Followup question. Do you think the future of GPU hardware for DL will be dominated by NVIDIA GPUs because of the wide adoption of their CUDA language? Do you think any other player can pull an “Apple” (with their ARM transition) and we get a new open source API interface?

Ajay Arasanipalai

I think the lesson we’ve learned from Nvidia is that programmability > raw performance. If developers don’t like using your platform, it won’t work.

Ajay Arasanipalai

Whether or not these hardware startups end up providing competitive products remains to be seen (and is probably 10+ years away, so is hard to predict). But I think the most promising thing to look for short-term is software/compiler improvements to Google’s TPUs. The really big issue there is that Google doesn’t want to sell their TPUs though.

Ajay Arasanipalai

As for an open source CUDA alternative, I don’t think we’ll see anything promising within the next 1-2 years, unfortunately. But that’s just my guess.

WingCode

Thank you again for the answers Ajay

Ankur Patel

Thanks everyone! And big shoutout to Alexey Grigorev for inviting us and organizing this. If you want to follow me personally on trends in the AI/ML and NLP space, please feel free to subscribe here. Hope to see you all again soon

DataTalks.Club

Applied Natural Language Processing in the Enterprise

by Ankur A. Patel, Ajay Uppili Arasanipalai

The book of the week from 26 Jul 2021 to 30 Jul 2021

Questions and Answers