With the growing interaction of people in cyberspace, sentiment analysis has become a key area of machine learning. We will use scikit-learn to predict bad comments in the given data set.
Here’s the flow chart of the approach that we are going to take.

Now that we have defined the approach, let's get our hands dirty with the code. I have written a Python notebook explaining each step. We have tried to predict bad comments using four well-known classifiers: SVC, MultinomialNB, LogisticRegression, and SGDClassifier.
Click here to download the dataset
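To make the flow concrete, here is a minimal sketch of how the four classifiers can be trained on TF-IDF features with scikit-learn. The file name comments.csv and the column names comment and is_bad are assumptions for illustration only; adapt them to the actual dataset and notebook.

```python
# Minimal sketch: train the four classifiers on TF-IDF features.
# Assumptions: a CSV named "comments.csv" with a text column "comment"
# and a binary label column "is_bad" (hypothetical names).
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import SVC
from sklearn.naive_bayes import MultinomialNB
from sklearn.linear_model import LogisticRegression, SGDClassifier

df = pd.read_csv("comments.csv")
X_train, X_test, y_train, y_test = train_test_split(
    df["comment"], df["is_bad"], test_size=0.2, random_state=42
)

classifiers = {
    "SVC": SVC(),
    "MultinomialNB": MultinomialNB(),
    "LogisticRegression": LogisticRegression(),
    "SGDClassifier": SGDClassifier(),
}

for name, clf in classifiers.items():
    # TF-IDF turns raw comments into numeric features the classifier can use.
    pipeline = Pipeline([("tfidf", TfidfVectorizer()), ("clf", clf)])
    pipeline.fit(X_train, y_train)
    print(name, pipeline.score(X_test, y_test))
```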
Final Result:
- SVC: 66%
- MultinomialNB: 11%
- LogisticRegression: 59%
- SGDClassifier: 47%
So, Naive Bayes gives a very poor result: it correctly identifies only 11% of bad comments. SGDClassifier predicted 47% of bad comments correctly, a considerable improvement over Naive Bayes. Logistic Regression, despite having "regression" in its name, is a classifier, and at 59% it improves further on SGDClassifier.
SVC comes out as the winner, correctly predicting 66% of bad comments.
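These percentages read like the share of bad comments each model catches, i.e. recall on the bad class. A sketch of how such a number can be computed, assuming one of the fitted pipelines and the test split from the earlier snippet, and assuming label 1 means "bad comment":

```python
# Sketch: measure what fraction of bad comments a model catches (recall).
# Assumes `pipeline`, `X_test`, and `y_test` from the earlier sketch,
# with label 1 meaning "bad comment" (an assumption about the dataset).
from sklearn.metrics import recall_score, classification_report

y_pred = pipeline.predict(X_test)
print("Recall on bad comments:", recall_score(y_test, y_pred, pos_label=1))
print(classification_report(y_test, y_pred))
```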
As you can see, each classifier has many different parameters.
For example, MultinomialNB(alpha=1.0, class_prior=None, fit_prior=True) exposes alpha, class_prior, and fit_prior.
In this post, we have run each classifier with its default settings. In the next post, we will look at performance tuning by changing these parameters.
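For reference, here is a quick sketch of how defaults can be inspected and overridden in scikit-learn; the alpha value shown is just an illustrative choice, not a tuned one.

```python
# Sketch: inspect and override a classifier's default parameters.
from sklearn.naive_bayes import MultinomialNB

nb = MultinomialNB()
print(nb.get_params())               # shows alpha, class_prior, fit_prior defaults

nb_tuned = MultinomialNB(alpha=0.5)  # illustrative value, not a tuned one
nb.set_params(alpha=0.5)             # parameters can also be changed on an existing estimator
```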
Troubleshooter:
- unicode(message, 'utf8').lower() may throw an error in Python 3; replace it with str(message).lower(). This is because Python 3 removed the unicode() builtin: str is already Unicode text (see the sketch after this list).
- You may have to separately download additional corpora such as 'wordnet' or 'punkt' for language processing, using the nltk.download() command in your IDE or Python shell (also shown in the sketch below).
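A small sketch combining both fixes, using a hypothetical message variable standing in for a comment from the dataset:

```python
# Sketch of the two troubleshooter fixes for Python 3.
import nltk

# One-time downloads of the NLTK data used for tokenization/lemmatization.
nltk.download('punkt')
nltk.download('wordnet')

message = "Some RAW comment text"   # hypothetical example message
# Python 2: unicode(message, 'utf8').lower()
# Python 3: str is already Unicode, so this is enough:
lowered = str(message).lower()
print(lowered)
```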
Read More about classification
Productionalise your scikit model
If you liked this article and would like one such blog to land in your inbox every week, consider subscribing to our newsletter: https://skillcaptain.substack.com