I prepared this tutorial because it is surprisingly difficult to find a blog post with working BERT code from beginning to end; the examples I found were always full of bugs. So I dug into several articles, combined and edited their code, and finally arrived at a working BERT model. By running the code in this tutorial, you can create a BERT model and fine-tune it for sentiment analysis.
Natural language processing (NLP) is one of the most cumbersome areas of artificial intelligence when it comes to data preprocessing. Apart from preprocessing and tokenizing text datasets, it takes a lot of time to train successful NLP models. But today is your lucky day! We will build a sentiment classifier with a pre-trained NLP model: BERT.
What is BERT?
BERT stands for Bidirectional Encoder Representations from Transformers, and it is a state-of-the-art machine learning model used for NLP tasks. Jacob Devlin and his colleagues developed BERT at Google in 2018. They trained BERT on English Wikipedia (2,500M words) and BooksCorpus (800M words) and achieved the best accuracies for some NLP tasks in 2018. There are two pre-trained general BERT variations: the base model is a 12-layer, 768-hidden, 12-head, 110M-parameter neural network architecture, whereas the large model is a 24-layer, 1024-hidden, 16-head, 340M-parameter neural network architecture. Figure 2 shows the visualization of the BERT network created by Devlin et al.
So, I don’t want to dive deep into BERT since we need a whole different post for that. In fact, I already scheduled a post aimed at comparing rival pre-trained NLP models. But, you will have to wait for a bit.
Additionally, I should mention that although OpenAI's GPT-3 outperforms BERT, the limited access to GPT-3 forces us to use BERT. But rest assured, BERT is also an excellent NLP model. Here is a basic visual network comparison among rival NLP models: BERT, GPT, and ELMo:
Installing Hugging Face Transformers Library
One of the questions I had the most difficulty resolving was where to find a BERT model that I can use with TensorFlow. Finally, I discovered Hugging Face's Transformers library.
We can easily load a pre-trained BERT from the Transformers library. But, make sure you install it since it is not pre-installed in the Google Colab notebook.
Sentiment Analysis with BERT
Now that we have covered the basics of BERT and Hugging Face, we can dive into our tutorial. We will do the following operations to train a sentiment analysis model:
- Install Transformers library;
- Load the BERT Classifier and Tokenizer along with Input modules;
- Download the IMDB Reviews Data and create a processed dataset (this will take several operations);
- Configure the Loaded BERT model and Train for Fine-tuning;
- Make Predictions with the Fine-tuned Model
Let’s get started!
Note that I strongly recommend using a Google Colab notebook.
Installing Transformers
Installing the Transformers library is fairly easy. Just run the following pip line in a Google Colab cell:
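In a Colab cell the pip line is usually written as !pip install transformers. Here is a minimal sketch that does the same thing from plain Python and skips the install when the package is already importable. The helper name ensure_package is my own illustration, not part of any library.

```python
import importlib.util
import subprocess
import sys


def ensure_package(name: str) -> bool:
    """Install `name` with pip if it cannot be imported.

    Returns True when an install was attempted and False when the
    package was already available.
    """
    if importlib.util.find_spec(name) is not None:
        return False
    # Use the interpreter that runs this notebook so the package lands
    # in the right environment.
    subprocess.check_call([sys.executable, "-m", "pip", "install", name])
    return True


# Uncomment to install. The equivalent Colab cell is: !pip install transformers
# ensure_package("transformers")
```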
After the installation is completed, we will load the pre-trained BERT Tokenizer and Sequence Classifier as well as InputExample and InputFeatures. Then, we will build our model with the Sequence Classifier and our tokenizer with BERT's Tokenizer.
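A minimal sketch of this loading step could look like the following. The checkpoint name bert-base-uncased is my assumption, and the code assumes a Transformers release that still ships the TensorFlow classes. The wrapper function load_bert is only for illustration.

```python
def load_bert(checkpoint: str = "bert-base-uncased", num_labels: int = 2):
    """Return (model, tokenizer) for sequence classification.

    `checkpoint` is an assumed choice; other BERT checkpoints from the
    Hugging Face Hub should work the same way.
    """
    # Imported here so the function can be defined before the library
    # has been installed in the notebook.
    from transformers import BertTokenizer, TFBertForSequenceClassification

    model = TFBertForSequenceClassification.from_pretrained(
        checkpoint, num_labels=num_labels
    )
    tokenizer = BertTokenizer.from_pretrained(checkpoint)
    return model, tokenizer


# model, tokenizer = load_bert()  # downloads the weights on first use
# model.summary()                 # prints the layers of the loaded model
```

InputExample and InputFeatures can be imported from the same transformers package when they are needed.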
Let’s see the summary of our BERT model:
Here are the results. We have the main BERT model, a dropout layer to prevent overfitting, and finally a dense layer for classification task:
Now that we have our model, let’s create our input sequences from the IMDB reviews dataset:
IMDB Dataset
The IMDB Reviews dataset is a large movie review dataset collected and prepared by Andrew L. Maas from the popular movie rating service IMDB. It is used for binary sentiment classification: deciding whether a review is positive or negative. It contains 25,000 movie reviews for training and 25,000 for testing. All 50,000 of these reviews are labeled and can be used for supervised deep learning. In addition, there are 50,000 unlabeled reviews that we will not use in this case study. We will only use the training dataset.
Initial Imports
We will first have two imports: TensorFlow and Pandas.
Get the Data from the Stanford Repo
Then, we can download the dataset from Stanford's relevant directory with the tf.keras.utils.get_file function, as shown below:
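The download can be sketched roughly as follows. The argument values (cache location, file name) are my assumptions, and the call follows the TensorFlow 2.x signature of get_file. Newer Keras releases prefer extract=True over untar=True and may return a different path, so treat the return value with care.

```python
IMDB_URL = "https://ai.stanford.edu/~amaas/data/sentiment/aclImdb_v1.tar.gz"


def download_imdb(cache_dir: str = "."):
    """Download and unpack the IMDB archive; return the path reported by Keras."""
    import tensorflow as tf  # imported lazily; requires TensorFlow

    return tf.keras.utils.get_file(
        fname="aclImdb_v1.tar.gz",
        origin=IMDB_URL,
        untar=True,           # unpack the .tar.gz after downloading
        cache_dir=cache_dir,  # keep the data next to the notebook
        cache_subdir="",
    )


# dataset = download_imdb()
```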
Remove Unlabeled Reviews
To remove the unlabeled reviews, we need the following operations. The comments below explain each operation:
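Here is a small sketch of that cleanup using only the standard library. It assumes the archive was unpacked to aclImdb/train and that the unlabeled reviews live in a sub-folder called unsup. The function name remove_unlabeled is my own.

```python
import os
import shutil


def remove_unlabeled(train_dir: str, folder: str = "unsup") -> list:
    """Delete the unlabeled-review folder under `train_dir`.

    Returns the sorted names of whatever remains in `train_dir`.
    """
    target = os.path.join(train_dir, folder)
    if os.path.isdir(target):
        # Removes the folder and everything inside it.
        shutil.rmtree(target)
    return sorted(os.listdir(train_dir))


# Example (path is an assumption about where the archive was unpacked):
# print(remove_unlabeled(os.path.join("aclImdb", "train")))
```

Checking that the folder exists first makes the cell safe to re-run.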
Train and Test Split
Now that we have our data cleaned and prepared, we can create our train and test datasets with text_dataset_from_directory, using the following lines. I want to process the entire data in a single batch. That's why I selected a very large batch size:
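A hedged sketch of the split is below. The 80/20 split, the seed, and the batch size of 30,000 are my assumptions. With 25,000 training reviews, an 80/20 split gives 20,000 and 5,000 reviews, and both fit in one batch of that size.

```python
def load_splits(train_dir: str = "aclImdb/train", batch_size: int = 30000,
                validation_split: float = 0.2, seed: int = 123):
    """Return (train, test) tf.data.Dataset objects read from labeled folders."""
    import tensorflow as tf  # imported lazily; requires TensorFlow

    # The same split fraction and seed must be used for both subsets so
    # that they do not overlap.
    common = dict(batch_size=batch_size,
                  validation_split=validation_split,
                  seed=seed)
    train = tf.keras.utils.text_dataset_from_directory(
        train_dir, subset="training", **common)
    test = tf.keras.utils.text_dataset_from_directory(
        train_dir, subset="validation", **common)
    return train, test


# train, test = load_splits()
```

Labels are assigned from the sub-folder names in alphabetical order, so neg becomes 0 and pos becomes 1.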
Convert to Pandas to View and Process
Now that we have our basic train and test datasets, I want to prepare them for our BERT model. To make them more comprehensible, I will create a pandas dataframe from our TensorFlow dataset object. The following code converts our train Dataset object to a train pandas dataframe:
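One way to sketch this conversion is shown below. It reads the first (and, with the large batch size, only) batch and decodes the raw bytes into strings. The column names DATA_COLUMN and LABEL_COLUMN and the function name are my assumptions.

```python
def dataset_to_dataframe(dataset):
    """Turn the first batch of a (text, label) dataset into a pandas DataFrame."""
    import pandas as pd  # imported lazily; requires pandas

    # Each element of the dataset is a (texts, labels) pair of arrays.
    texts, labels = next(iter(dataset.as_numpy_iterator()))
    return pd.DataFrame({
        # Reviews arrive as bytes, so decode them into regular strings.
        "DATA_COLUMN": [t.decode("utf-8") for t in texts],
        "LABEL_COLUMN": [int(label) for label in labels],
    })


# train = dataset_to_dataframe(train)
# train.head()
```

The same function can be reused for the test dataset.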
Here are the first 5 rows of our dataset:
I will do the same operations for the test dataset with the following lines:
Creating Input Sequences
We have two pandas Dataframe objects waiting for us to convert them into suitable objects for the BERT model. We will take advantage of the InputExample class, which helps us create sequences from our dataset. InputExample can be called as follows:
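A minimal sketch of such a call, wrapped in a small helper of my own (make_example) so that the import only happens when it is used:

```python
def make_example(text: str, label: int):
    """Wrap one review and its label in a transformers InputExample."""
    # Lazy import; requires the transformers package to be installed.
    from transformers import InputExample

    # text_b is only needed for sentence-pair tasks, so it stays None here.
    return InputExample(guid=None, text_a=text, text_b=None, label=label)


# example = make_example("Hello, world", 1)
# example.text_a and example.label hold the text and the label.
```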
Now we will create two main functions:
1. convert_data_to_examples: This will accept our train and test datasets and convert each row into an InputExample object.
2. convert_examples_to_tf_dataset: This function will tokenize the InputExample objects, then create the required input format with the tokenized objects, and finally create an input dataset that we can feed to the model.
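The two functions described above can be sketched as follows. This is one possible implementation under my own assumptions, not necessarily the one from the original notebook: it tokenizes all texts in one call and builds the dataset with from_tensor_slices, and the maximum length of 128 tokens is an assumed value.

```python
def convert_data_to_examples(train, test, data_column, label_column):
    """Convert each row of the train and test frames into an InputExample."""
    from transformers import InputExample  # lazy import

    def to_examples(frame):
        return [
            InputExample(guid=None, text_a=text, text_b=None, label=int(label))
            for text, label in zip(frame[data_column], frame[label_column])
        ]

    return to_examples(train), to_examples(test)


def convert_examples_to_tf_dataset(examples, tokenizer, max_length=128):
    """Tokenize the examples and return a tf.data.Dataset of (inputs, label)."""
    import tensorflow as tf  # lazy import

    encoded = tokenizer(
        [e.text_a for e in examples],
        max_length=max_length,
        padding="max_length",   # pad every review to the same length
        truncation=True,        # cut reviews longer than max_length
        return_token_type_ids=True,
        return_attention_mask=True,
        return_tensors="tf",
    )
    labels = tf.constant([e.label for e in examples], dtype=tf.int64)
    # dict(encoded) holds input_ids, token_type_ids and attention_mask.
    return tf.data.Dataset.from_tensor_slices((dict(encoded), labels))


# Assumed column names and batch size:
# train_examples, test_examples = convert_data_to_examples(
#     train, test, "DATA_COLUMN", "LABEL_COLUMN")
# train_data = convert_examples_to_tf_dataset(train_examples, tokenizer)
# train_data = train_data.shuffle(100).batch(32)
# validation_data = convert_examples_to_tf_dataset(test_examples, tokenizer).batch(32)
```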
We can call the functions we created above with the following lines:
Our datasets containing processed input sequences are ready to be fed to the model.
Configuring the BERT model and Fine-tuning
We will use Adam as our optimizer, SparseCategoricalCrossentropy as our loss function (our labels are integer class ids), and SparseCategoricalAccuracy as our accuracy metric. Fine-tuning the model for 2 epochs will give us around 95% accuracy, which is great.
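A hedged sketch of the compile and fit step follows. It uses the sparse variant of cross-entropy because the labels are integer class ids (0 or 1), and from_logits=True because the classifier head returns raw logits. The learning rate of 3e-5 is my assumption, and the function name is illustrative.

```python
def compile_and_fit(model, train_data, validation_data,
                    epochs: int = 2, learning_rate: float = 3e-5):
    """Compile a TF sequence classifier for integer labels and fine-tune it."""
    import tensorflow as tf  # imported lazily; requires TensorFlow

    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=learning_rate),
        # The model outputs logits, and labels are integer class ids.
        loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
        metrics=[tf.keras.metrics.SparseCategoricalAccuracy("accuracy")],
    )
    return model.fit(train_data, epochs=epochs,
                     validation_data=validation_data)


# history = compile_and_fit(model, train_data, validation_data)
```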
Training the model might take a while, so make sure you have enabled GPU acceleration in the Notebook Settings. After our training is completed, we can move on to making sentiment predictions.
Making Predictions
I created a list of two reviews. The first one is a positive review, while the second one is clearly negative.
We need to tokenize our reviews with our pre-trained BERT tokenizer. We will then feed these tokenized sequences to our model and run a final softmax layer to get the predictions. We can then use the argmax function to determine whether our sentiment prediction for the review is positive or negative. Finally, we will print out the results with a simple for loop. The following lines do all of these operations:
Also, with the code above, you can predict as many reviews as you like.
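The prediction steps can be sketched like this. The softmax and argmax are written in plain Python so they are easy to follow and test, and predict_sentiment ties them to the tokenizer and the model. The function names, the maximum length of 128, and the label order (index 0 is negative, index 1 is positive) are my assumptions.

```python
import math

LABELS = ["Negative", "Positive"]  # index 0 = negative, index 1 = positive


def softmax(row):
    """Numerically stable softmax over one list of logits."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def logits_to_labels(logits):
    """Map each row of logits to (label, probability of that label)."""
    results = []
    for row in logits:
        probs = softmax(row)
        best = max(range(len(probs)), key=probs.__getitem__)  # argmax
        results.append((LABELS[best], probs[best]))
    return results


def predict_sentiment(model, tokenizer, sentences, max_length=128):
    """Tokenize `sentences`, run the fine-tuned model, decode the predictions."""
    batch = tokenizer(sentences, max_length=max_length, padding=True,
                      truncation=True, return_tensors="tf")
    logits = model(batch).logits.numpy().tolist()
    return logits_to_labels(logits)


# Placeholder reviews for illustration only:
# pred_sentences = ["I loved every minute of it.", "A complete waste of time."]
# for text, (label, prob) in zip(pred_sentences,
#                                predict_sentiment(model, tokenizer, pred_sentences)):
#     print(text, ":", label)
```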
Congratulations
You have successfully built a Transformer network with a pre-trained BERT model and achieved ~95% accuracy on the sentiment analysis of the IMDB reviews dataset! If you are curious about saving your model, I would like to direct you to the Keras Documentation. After all, to efficiently use an API, one must learn how to read and use the documentation.
Since you are reading this article, I am sure that we share similar interests and are/will be in similar industries. So let’s connect via Linkedin! Please do not hesitate to send a contact request! Orhan G. Yalçın — Linkedin