Image Classification Decoded - A step-by-step guide for Newbies

23 Nov 2019

The RVL-CDIP (Ryerson Vision Lab Complex Document Information Processing) dataset consists of 400,000 grayscale images in 16 classes, with 25,000 images per class. There are 320,000 training images, 40,000 validation images, and 40,000 test images. The images are sized so their largest dimension does not exceed 1000 pixels.
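As a quick sanity check on the figures above, the following minimal sketch recomputes the dataset total and the share of each split. The helper `fit_within` is a hypothetical illustration of what "largest dimension does not exceed 1000 pixels" means for an image's size; it is an assumption for demonstration, not the procedure the dataset authors used.

```python
# Figures taken from the dataset description above.
NUM_CLASSES = 16
IMAGES_PER_CLASS = 25_000
SPLIT_SIZES = {"train": 320_000, "validation": 40_000, "test": 40_000}

total_images = NUM_CLASSES * IMAGES_PER_CLASS

# Fraction of the whole dataset held by each split.
split_fractions = {name: size / total_images for name, size in SPLIT_SIZES.items()}


def fit_within(width, height, max_side=1000):
    """Hypothetical helper: scale (width, height) down so the larger side is at most max_side.

    Sizes that already satisfy the constraint are returned unchanged.
    """
    largest = max(width, height)
    if largest <= max_side:
        return width, height
    scale = max_side / largest
    return round(width * scale), round(height * scale)


if __name__ == "__main__":
    print("total images:", total_images)
    for name, fraction in split_fractions.items():
        print(f"{name}: {SPLIT_SIZES[name]} images ({fraction:.0%})")
    print("2000x1500 scaled to fit:", fit_within(2000, 1500))
```

The split works out to 80% training, 10% validation, and 10% test.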

Link to the RVL-CDIP dataset


Problem Statement - Detect the type of each document image and classify it into one of 16 classes: letter, form, email, handwritten, advertisement, scientific report, scientific publication, specification, file folder, news article, budget, invoice, presentation, questionnaire, resume, memo. No document can belong to more than one class, so this is a multiclass classification problem. Because the inputs are images, it is also a computer vision task.
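Since each document gets exactly one label, the 16 classes can be mapped to integer indices and, if needed, to one-hot vectors. The sketch below shows one way to set that up. The index order simply follows the order in which the classes are listed above, which is an assumption; the label numbering shipped with the dataset may differ, so check the dataset's own label files before training.

```python
# Class names in the order listed in the problem statement.
# NOTE: this ordering is an assumption for illustration; the official
# dataset labels may use a different numbering.
CLASS_NAMES = [
    "letter", "form", "email", "handwritten", "advertisement",
    "scientific report", "scientific publication", "specification",
    "file folder", "news article", "budget", "invoice",
    "presentation", "questionnaire", "resume", "memo",
]

# Lookup tables between class names and integer labels.
CLASS_TO_INDEX = {name: index for index, name in enumerate(CLASS_NAMES)}
INDEX_TO_CLASS = {index: name for name, index in CLASS_TO_INDEX.items()}


def one_hot(class_name):
    """Return a one-hot list for class_name: a single 1, all other entries 0."""
    vector = [0] * len(CLASS_NAMES)
    vector[CLASS_TO_INDEX[class_name]] = 1
    return vector


if __name__ == "__main__":
    print(len(CLASS_NAMES), "classes")
    print("invoice ->", CLASS_TO_INDEX["invoice"])
    print("one-hot for memo:", one_hot("memo"))
```

Exactly one entry of each vector is set, which reflects the multiclass (single-label) nature of the problem; a multilabel problem would allow several entries to be 1.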


Business objectives and constraints - Like every problem, this document image classification task comes with real-life/business objectives and constraints: the cost of misclassification can be high, there are no strict latency concerns, and the task is computationally expensive.