A Tutorial to Find the Best Scikit-learn Classifiers for Sentiment Analysis

As more people interact online, sentiment analysis has become a key area of machine learning. We will use scikit-learn to predict the bad comments in a given dataset.

Here’s the flow chart of the approach that we are going to take.

[Figure: Steps to sentiment analysis]

Now that we have defined the approach, let’s get our hands dirty with the code. I have written a Python notebook explaining each step. We try to predict bad comments using four popular classifiers: SVC, MultinomialNB, LogisticRegression, and SGDClassifier.
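If the notebook does not load for you, here is a minimal sketch of the idea. It is not the notebook's code: the toy comments, the labels, and the use of CountVectorizer for features are assumptions made for illustration. Each classifier is left at its defaults, except that SGDClassifier gets a fixed random_state so the run is repeatable.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression, SGDClassifier
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Hypothetical toy data standing in for the real dataset: 1 = bad comment, 0 = fine.
comments = [
    "you are an idiot",
    "what a stupid thing to say",
    "I hate you, loser",
    "shut up you moron",
    "thanks for the helpful post",
    "great explanation, I learned a lot",
    "have a nice day",
    "I really enjoyed this article",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

# The four classifiers from the post, with default settings
# (random_state is only fixed for reproducibility).
classifiers = {
    "SVC": SVC(),
    "MultinomialNB": MultinomialNB(),
    "LogisticRegression": LogisticRegression(),
    "SGDClassifier": SGDClassifier(random_state=0),
}

new_comments = ["you stupid idiot", "thanks, nice article"]
predictions = {}
for name, clf in classifiers.items():
    # Bag-of-words features followed by the classifier.
    model = make_pipeline(CountVectorizer(), clf)
    model.fit(comments, labels)
    predictions[name] = model.predict(new_comments)
    print(name, predictions[name])
```

With only eight training comments the predictions mean very little; the point is just the shape of the loop, which stays the same when you swap in the real dataset.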

Click here to download the dataset

View this gist on GitHub


Final result:

  • SVC: 66%
  • MultinomialNB: 11%
  • LogisticRegression: 59%
  • SGDClassifier: 47%

So Naive Bayes gives a very poor result: it predicts only 11% of the bad comments. SGDClassifier predicted 47% of the bad comments correctly, a considerable improvement over Naive Bayes. Logistic Regression, despite having "regression" in its name, is a classifier, and at 59% it improves further on SGDClassifier.

SVC comes out as the winner, with 66% correct predictions for sentiment analysis.
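The percentages above are described as the share of bad comments each classifier predicted. Assuming that means recall on the "bad" class (the post does not name the metric, so this is a guess), it could be computed as in the sketch below. The labels are made up for illustration.

```python
from sklearn.metrics import recall_score

# Hypothetical labels: 1 = bad comment, 0 = fine.
y_true = [1, 1, 1, 0, 0, 1, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]

# Recall on the positive class:
# bad comments correctly flagged / all bad comments.
bad_recall = recall_score(y_true, y_pred, pos_label=1)
print(f"{bad_recall:.0%} of bad comments were predicted")  # prints "60% of bad comments were predicted"
```

Here five comments are truly bad and three of them are flagged, hence 60%. If the notebook measured plain accuracy instead, accuracy_score from the same module would be the function to use.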

As you can see, each classifier takes many different parameters.

For example, MultinomialNB(alpha=1.0, class_prior=None, fit_prior=True) has the parameters alpha, class_prior, and fit_prior.
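You can inspect these defaults yourself with get_params(), which every scikit-learn estimator provides. A small sketch follows. The alpha=0.5 value is only an illustration of how a non-default setting is passed, not a recommended value, and newer scikit-learn releases may list extra parameters beyond these three.

```python
from sklearn.naive_bayes import MultinomialNB

# Default-configured classifier, as used in this post.
clf = MultinomialNB()
params = clf.get_params()
for name in ("alpha", "class_prior", "fit_prior"):
    print(name, "=", params[name])

# Non-default values are passed as keyword arguments (illustrative value).
smoothed = MultinomialNB(alpha=0.5)
print(smoothed.get_params()["alpha"])
```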

In this post, we have run each classifier with its default settings. In the next post, we will see how to tune performance by changing these parameters.


  • unicode(message, 'utf8').lower() throws an error in Python 3; replace it with str(message).lower(). This is because Python 3 has no unicode built-in: its str type is already Unicode text. If message is a bytes object, decode it first with message.decode('utf8').

  • You may have to separately download additional NLTK resources such as 'wordnet' or 'punkt' for language processing, using the nltk.download() command in your IDE.
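For the first note, a small helper can make the lower-casing step work whether the message arrives as bytes or as text. This is a sketch, not code from the notebook: the function name to_lower_text is hypothetical, and it assumes byte input is UTF-8 encoded.

```python
def to_lower_text(message):
    """Return message as lower-cased text (hypothetical helper)."""
    # Bytes must be decoded explicitly; str(b"...") would give "b'...'".
    if isinstance(message, bytes):
        message = message.decode("utf8")
    # str() is a no-op for text and converts other objects to text.
    return str(message).lower()


print(to_lower_text("Hello World"))
print(to_lower_text(b"Hello World"))
```

Both calls print "hello world".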

Read More about classification

Logistic Regression


Naive Bayes

Count Vectorizer

Text Blob

Productionalise your scikit model

If you liked this article and would like one such blog to land in your inbox every week, consider subscribing to our newsletter: https://skillcaptain.substack.com
