A Tutorial to Find the Best Scikit Classifiers for Sentiment Analysis

As people interact more and more online, sentiment analysis has become a key area of machine learning. We will use scikit-learn to predict the bad comments in a given data set.

Here’s the flow chart of the approach that we are going to take.

Figure: Steps to sentiment analysis

Now that we have defined the approach, let’s get our hands dirty with the code. I have written a Python notebook explaining each step. We try to predict bad comments using four well-known classifiers: SVC, MultinomialNB, LogisticRegression, and SGDClassifier.
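To give a feel for the comparison, here is a minimal sketch and not the notebook itself. It assumes a bag-of-words representation built with CountVectorizer and uses a tiny made-up data set; the sample comments, the labels, and the helper name compare_classifiers are all hypothetical. Each classifier runs with its default settings, apart from a fixed random_state on SGDClassifier so that reruns give the same result.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression, SGDClassifier
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Hypothetical toy data: 1 = bad comment, 0 = acceptable comment.
train_texts = [
    "you are an idiot",
    "what a stupid idea",
    "shut up you moron",
    "thanks for the helpful post",
    "great tutorial, well done",
    "really enjoyed reading this",
]
train_labels = [1, 1, 1, 0, 0, 0]
test_texts = ["you stupid moron", "helpful and great post"]
test_labels = [1, 0]


def compare_classifiers(train_x, train_y, test_x, test_y):
    """Fit each classifier on word counts and return its accuracy by name."""
    classifiers = {
        "SVC": SVC(),
        "MultinomialNB": MultinomialNB(),
        "LogisticRegression": LogisticRegression(),
        # random_state is fixed only to make the sketch reproducible.
        "SGDClassifier": SGDClassifier(random_state=0),
    }
    scores = {}
    for name, clf in classifiers.items():
        # The vectorizer lowercases the text and turns it into word counts.
        model = make_pipeline(CountVectorizer(lowercase=True), clf)
        model.fit(train_x, train_y)
        scores[name] = model.score(test_x, test_y)  # mean accuracy
    return scores


scores = compare_classifiers(train_texts, train_labels, test_texts, test_labels)
for name, score in scores.items():
    print(f"{name}: {score:.0%}")
```

With so few samples the numbers mean nothing; the point is only the shape of the loop, which stays the same when you swap in the real data set.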

Click here to download the dataset



Final result:

  • SVC: 66%
  • MultinomialNB: 11%
  • LogisticRegression: 59%
  • SGDClassifier: 47%

So, Naive Bayes gives a very poor result: it identifies only 11% of the bad comments. SGDClassifier correctly predicts 47% of the bad comments, a considerable improvement over Naive Bayes. Logistic Regression, despite having "regression" in its name, is a classifier, and at 59% it improves further on SGDClassifier.

SVC comes out as the winner, with 66% correct predictions for sentiment analysis.
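The post does not spell out how these percentages were computed. One plausible reading, and it is only an assumption, is the share of truly bad comments that a classifier flags as bad, which scikit-learn calls recall. The sketch below computes it by hand and with recall_score on made-up labels.

```python
from sklearn.metrics import recall_score

# Hypothetical labels: 1 = bad comment, 0 = acceptable comment.
y_true = [1, 1, 1, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0]

# Manual count: of the comments that really are bad, how many were flagged?
bad_total = sum(1 for t in y_true if t == 1)
bad_caught = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
manual_rate = bad_caught / bad_total

# The same quantity from scikit-learn, treating label 1 as the positive class.
library_rate = recall_score(y_true, y_pred, pos_label=1)

print(f"bad comments caught: {manual_rate:.0%}")
```

Here four comments are truly bad and two of them are caught, so both calculations give 50%.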

As you can see, each classifier takes many different parameters.

For example, MultinomialNB(alpha=1.0, class_prior=None, fit_prior=True) has the parameters alpha, class_prior, and fit_prior.
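If you want to check the parameters of a classifier yourself, every scikit-learn estimator exposes get_params() and set_params(). The short sketch below prints the three MultinomialNB parameters named above; the value 0.5 passed to set_params is an arbitrary illustration, not a recommended setting.

```python
from sklearn.naive_bayes import MultinomialNB

clf = MultinomialNB()

# get_params() returns a dict of parameter names and their current values.
params = clf.get_params()
for name in ("alpha", "class_prior", "fit_prior"):
    print(name, "=", params[name])

# set_params() changes a value on the existing estimator.
# 0.5 is an arbitrary example value, chosen only to show the call.
clf.set_params(alpha=0.5)
print("alpha is now", clf.get_params()["alpha"])
```

Newer scikit-learn releases may list additional parameters in the dict, so the sketch looks up only the three that the post mentions.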

In this post, we ran each classifier with its default settings. In the next post, we will see how to tune performance by changing these parameters.


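  • unicode(message, 'utf8').lower() raises a NameError in Python 3, because the unicode built-in no longer exists; replace it with str(message).lower(). In Python 3, str is already a Unicode string, so no conversion is needed. If message is a bytes object, decode it first with message.decode('utf8').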

  • You may have to download additional NLTK resources such as 'wordnet' or 'punkt' separately for language processing, using the nltk.download() command in your IDE.
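The two notes above can be sketched as follows. The helper names to_lower and download_nltk_resources are hypothetical and do not come from the notebook. The download helper is defined but not called, since it needs NLTK installed and a network connection.

```python
def to_lower(message):
    """Lowercase a message in Python 3, whether it arrives as str or bytes."""
    if isinstance(message, bytes):
        # Raw bytes must be decoded; str(b"...") would give "b'...'" instead.
        message = message.decode("utf8")
    return str(message).lower()


def download_nltk_resources(names=("wordnet", "punkt")):
    """Fetch the NLTK resources mentioned above (requires nltk and network)."""
    import nltk  # imported here so to_lower works without NLTK installed

    for name in names:
        nltk.download(name)


print(to_lower("You Are WRONG"))
print(to_lower(b"Bad Comment"))
```

Call download_nltk_resources() once before running any code that depends on those resources; NLTK keeps the downloaded data on disk for later runs.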

Read More about classification

Logistic Regression


Naive Bayes

Count Vectorizer

Text Blob

Productionalise your scikit model
