Amazon Fine Food Reviews

The Amazon Fine Food Reviews dataset consists of reviews of fine foods from Amazon. The data span a period of more than 10 years, including all ~500,000 reviews up to October 2012. Each review includes product and user information, a rating, and a plain-text review. The dataset also includes reviews from all other Amazon categories.

Number of reviews : 568,454

Number of users : 256,059

Number of products : 74,258

Timespan : Oct 1999 - Oct 2012

Number of Attributes/Columns in data : 10

Attribute Information :

Id - serial number

ProductId - unique identifier for the product

UserId - unique identifier for the user

ProfileName - name which is used by the user

HelpfulnessNumerator - number of users who found the review helpful

HelpfulnessDenominator - number of users who indicated whether they found the review helpful or not

Score - rating between 1 and 5

Time - timestamp for the review

Summary - brief summary of the review

Text - text of the review
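
As a minimal sketch of how one record with these ten columns might be read, the snippet below parses a CSV row into typed fields. It assumes the data is a CSV file whose header uses the attribute names listed above, and that Time is stored as seconds since the Unix epoch. The sample row (P0001, U0001, example_user and so on) is invented for illustration and is not taken from the dataset, and parse_review is a hypothetical helper name.

```python
import csv
import io
from datetime import datetime, timezone

# Hypothetical sample: the header matches the ten attributes above,
# but the row itself is made up for illustration.
SAMPLE_CSV = (
    "Id,ProductId,UserId,ProfileName,HelpfulnessNumerator,"
    "HelpfulnessDenominator,Score,Time,Summary,Text\n"
    '1,P0001,U0001,example_user,3,4,5,1350000000,Great snack,'
    '"Crunchy, fresh and tasty."\n'
)


def parse_review(row):
    """Convert the numeric columns of one CSV row (a dict of strings)."""
    helpful = int(row["HelpfulnessNumerator"])
    voted = int(row["HelpfulnessDenominator"])
    return {
        "id": int(row["Id"]),
        "product_id": row["ProductId"],
        "user_id": row["UserId"],
        "profile_name": row["ProfileName"],
        "score": int(row["Score"]),
        # Assumption: Time is seconds since the Unix epoch.
        "time": datetime.fromtimestamp(int(row["Time"]), tz=timezone.utc),
        # Fraction of voters who found the review helpful; None if nobody voted.
        "helpfulness": helpful / voted if voted > 0 else None,
        "summary": row["Summary"],
        "text": row["Text"],
    }


reviews = [parse_review(r) for r in csv.DictReader(io.StringIO(SAMPLE_CSV))]
```

Returning None when HelpfulnessDenominator is zero keeps reviews with no votes apart from reviews that were voted unhelpful.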

Problem Statement : Given a review, determine whether it is positive (rating of 4 or 5) or negative (rating of 1 or 2).

P.S - A rating of 4 or 5 is considered a positive review, and a rating of 1 or 2 is considered negative. A rating of 3 is neutral and is ignored. This is an approximate, proxy way of determining the polarity (positivity/negativity) of a review.
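
The labelling rule above can be sketched as a small function. This is only an illustration of the stated mapping, assuming the score is an integer between 1 and 5. The names score_to_polarity and label_reviews are hypothetical, and the example reviews are made up.

```python
def score_to_polarity(score):
    """Map a 1-5 rating to a polarity label; neutral ratings map to None."""
    if score in (4, 5):
        return "positive"
    if score in (1, 2):
        return "negative"
    if score == 3:
        return None  # neutral reviews are ignored
    raise ValueError(f"score must be between 1 and 5, got {score!r}")


def label_reviews(reviews):
    """Return (text, label) pairs, dropping neutral reviews."""
    labelled = []
    for text, score in reviews:
        label = score_to_polarity(score)
        if label is not None:
            labelled.append((text, label))
    return labelled


# Made-up examples, not taken from the dataset.
examples = [("Loved it", 5), ("It was okay", 3), ("Stale and bland", 1)]
labelled = label_reviews(examples)
```

Raising an error for out-of-range scores, rather than silently dropping them, makes malformed rows visible early.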

Source : https://www.kaggle.com/snap/amazon-fine-food-reviews

Real-world/Business Objectives and Constraints :

  1. The cost of a mis-classification can be high.
  2. No strict latency concerns.
  3. It will help everyone understand the insights of large-scale online businesses.

To learn more please visit :

  1. t-SNE

  2. KNN

  3. Naive Bayes

  4. Logistic Regression

  5. Support Vector Machines

  6. Decision Trees

  7. Random Forest

  8. Clustering

  9. Truncated-SVD