Accuracy of different sentiment analysis models on IMDB dataset
Sentiment analysis is like a gateway to AI based text analysis. For any company or data scientist looking to extract meaning out of an unstructured text corpus, sentiment analysis is one of the first steps which gives a high RoI of additional insights with relatively low investment of time and efforts. With an explosion of text data available in digital formats, the need for sentiment analysis and other NLU techniques for analysing this data is growing rapidly. Sentiment Analysis looks relatively simple and works very well today, but we have reached here after significant efforts by researchers who have invented different approaches and tried numerous models.
In the chart above, we give a snapshot to the reader about the different approaches tried and their corresponding accuracy on the IMDB dataset. However, we highlight that the IMDB dataset is relatively small and this makes other methods seem competitive compared to Neural Networks. When doing this analysis on a larger dataset (like the Yelp Dataset with 5 output classes and 4 million+ reviews), the rankings change. For example, Very Deep CNNs deliver up-to 64.1% accuracy on the Yelp Dataset compared to 63% for FastText.
So,actually, when we study sentiment analysis today, we have the advantage of standing on the shoulders of giants.
Recurrent Neural Networks were developed in the 1980s. A lot of algorithms we’re going to discuss in this piece are based on RNNs. RNNs recursively apply the same function (the function it learns during training) on a combination of previous memory (called hidden unit gathered from time 0 through t-1) and new input (at time t) to get output at time t. General RNNs have problems like gradients becoming too large and too small when you try to train a sentiment model using them due to the recursive nature.
An LSTM block. Credits: Nvidia
Long Short-Term Memory (LSTM) is gradient-based learning algorithm which can learn to bridge time intervals even in case of noisy, incompressible input sequences, without loss of short time lag capabilities. This is achieved by gradient-based algorithm which enforces a constant error flow through internal states of special LSTM units. The algorithm was presented by Sepp Hochrieter et al. in 1997. Essentially, the recursive application of function in RNN is controlled by converting it into an addition process.
Gated Feedback Recurrent Neural Network extends the existing approach of stacking multiple recurrent layers by allowing and controlling signals flowing from upper recurrent layers to lower layers using a global gating unit for each pair of layers. The recurrent signals exchanged between layers are gated adaptively based on the previous hidden states and the current input.
Both LSTM and GF-RNN weren’t written specifically focusing on sentiment analysis, but a lot of sentiment analysis models are based on these two highly cited papers.
RNTN was introduced in 2011-2012 by Richard Socher et al. from Standford’s NLP group. The authors introduced the Recursive Neural Tensor Network which was trained on a different kind of dataset, called the Standford Sentiment Treebank. The Stanford Sentiment Treebank was the first dataset with fully labelled parse trees that allows for a complete analysis of the compositional effects of sentiment and allows to analyze the intricacies of sentiment and to capture complex linguistic phenomena.
Example of the Recursive Neural Tensor Network accurately predicting 5 sentiment classes, very negative to very positive (– –, –, 0, +, + +), at every node of a parse tree.
The model outperformed all previous methods on several metrics and pushed the state-of-the-art in single sentence positive/negative classification from 80% up to 85.4%. A more advanced version of this algorithm called Tree LSTM was proposed by Christopher Manning et al. in 2015. The Tree-LSTM model is a generalization of LSTMs to tree-structured network topologies. TreeLSTMs outperform all existing systems and strong LSTM baselines on sentiment classification on Stanford Sentiment Treebank dataset.
Attention as a concept has given a lot of success in sequence modeling tasks, where input/output both are a sequence (like translation). Attention Mechanism can be viewed as a method for making the RNN work better by letting the network know where to look as it is performing its task. For a long time, attention based models did not fit for analyzing sentiment and related tasks, where the output is one attribute (+/-/neutral) for the entire sentence.
The invention of Self-attention has given a superb performance in certain NLP tasks, such as summarization and many other fields which were even deemed unsolvable.
This paper by Zhouhan Lin et al. proposed a new model for extracting an interpretable sentence embedding by introducing self-attention. Instead of using a vector, the authors used a 2-D matrix to represent the embedding, with each row of the matrix attending on a different part of the sentence. Self Attention is weighted information spread across different part of a sentence according to the global meaning.
A sample model structure showing the sentence embedding model combined with a fully connected and softmax layer for sentiment analysis
The proposed sentence embedding model consists of two parts. The first part is a bidirectional LSTM, and the second part is the self-attention mechanism, which provides a set of summation weight vectors for the LSTM hidden states. These set of summation weight vectors are dotted with the LSTM hidden states, and the resulting weighted LSTM hidden states are considered as an embedding for the sentence.
Attention Is All You Need, a paper published by Ashish Vaswani et al in Jun 2017, presented the Transformer, the first sequence transduction model based entirely on attention, replacing the recurrent layers most commonly used in encoder-decoder architectures with multi-headed self-attention. “Attention-only” based networks might finish RNN based networks (RNTN/RNN/LSTM/GRU) altogether for some NLP tasks(such as translation), but we still have to produce a strong case for sentiment analysis using these models.
Another method that can give major success is Multi-Task Learning (MTL). MTL is a sub-field of machine learning in which multiple learning tasks are solved at the same time while exploiting commonalities and differences across tasks. This can result in improved learning efficiency and prediction accuracy for the task-specific models when compared to training the models separately.
A Joint Many-Task Model (Hashimoto et al., 2016)
At ParallelDots, we have an MTL dataset tagged by our own tagging team and our new sentiment model (launching soon) is an MTL model with Self Attention.
Convnets can be used for sentiment prediction is as well. Yoon Kim proposed a Convnet architecture for sentence classification. Kim reported on a series of experiments with convolutional neural networks (CNN) trained on top of pre-trained word vectors for sentence-level classification tasks and showed that a simple CNN with little hyperparameter tuning and static vectors achieves excellent results on multiple benchmarks. Learning task-specific vectors through fine-tuning offers further gains in performance of the model.
At Paralleldots, we had this as our sentiment model for a long time. Our current sentiment model is a multilayered LSTM. As mentioned before, we are moving to MTL model with self-attention.
Kim’s model architecture with two channels for an example sentence
A better Convnet version for sentiment analysis is published by Nal Kalchbrenner et al. in this paper. The author proposed a Dynamic Convolutional Neural Network (DCNN) architecture for sentence modeling tasks.
If you have a large corpus of emotion/polarity filled sentences, Open AI showed that you don’t even need to tag the corpus to train a supervised sentiment. They showcased a normal character level RNN can figure out the positive/negative sentiment on its own. This study highlights the importance of Data Quantum in Machine Learning.
The L1-regularized model (pretrained in an unsupervised fashion on Amazon reviews) matched multichannel CNN performance with only 11 labeled examples, and state-of-the-art CT-LSTM Ensembles with 232 examples
Open AI’s Unsupervised model using this representation achieved state-of-the-art sentiment analysis accuracy on a small but extensively-studied dataset, the Stanford Sentiment Treebank, churning 91.8% accuracy versus the previous best of 90.2%. Also, the model matched the performance of previous supervised systems using 30-100x fewer labeled examples. The representation created by the model contains a distinct Sentiment Neuron which contains almost all of the sentiment signal.
A few non-neural networks based models have achieved significant accuracy in analyzing the sentiment of a corpus.
- Naive Bayes – Support Vector Machines (NBSVM) works very well when the dataset is very small, at times it worked better than the neural networks based models. Sida Wang et al. showed that the simple NB and SVM variants outperformed most published results on several sentiment analysis datasets (snippets and longer documents) sometimes providing a new state-of-the-art performance level.
- FastText is a Supervised Word2Vec model. It may not be the best in terms of accuracy but explores a simple and efficient baseline for text classification. FastText is many orders of magnitude faster for training and evaluation than the deep learning based models. It can be trained on more than one billion words in less than ten minutes using a standard multicore CPU and classifies half a million sentences among 312K classes in less than a minute. It can be used for sentiment classification as well.
- DeepForest came out in early 2017 and claimed state of the art on sentiment analysis using Decision Tree like methods, even better than any neural networks based model. Zhi-Hua Zhou et al. proposed this model, also known as gcForest (multi-Grained Cascade Forest), which is a tree ensemble method. This method generates a deep forest ensemble with a cascade structure which enables gcForest to do representation learning. DeepForest enhanced the state-of-the-art for sentiment analysis on IMDB dataset by getting an 89.16% test accuracy.
The overall procedure of gcForest
For a lot of time this method was not replicated in the real world, but now the paths have been cleared for it to embark the journey upon. You can follow this discussion on Microsoft LightGBM’s Github page for more information on DeepForest. For a non-neural network based models, DeepForest seems to be the best bet.
With extensive research happening on both neural network and non-neural network-based models, the accuracy of sentiment analysis and classification tasks is destined to improve. Earlier, a major challenge associated with Deep Learning models was that the neural network architectures were highly specialized to specific domains of application. Even solving a very similar problem required retraining and reassessment of the model. With the advent of Multi-task Learning, we are envisioning models which can perform several related tasks simultaneously. These strides in developing a versatile model for performing similar tasks seems to be a path of the future.