Personalized Cancer Diagnosis

Once sequenced, a cancer tumor can have thousands of genetic mutations. The challenge is distinguishing the mutations that contribute to tumor growth (drivers) from the neutral mutations (passengers). Currently, this interpretation of genetic mutations is done manually: a clinical pathologist has to review and classify every single genetic mutation based on evidence from text-based clinical literature, which is very time-consuming.

For this competition, MSKCC (Memorial Sloan Kettering Cancer Center) is making available an expert-annotated knowledge base in which world-class researchers and oncologists have manually annotated thousands of mutations. We need your help to develop a Machine Learning algorithm that, using this knowledge base as a baseline, automatically classifies genetic variations.

Problem Statement: In this challenge, we are trying to classify each given genetic variation/mutation into one of nine classes based on evidence from text-based clinical literature.

Source: https://www.kaggle.com/c/msk-redefining-cancer-treatment/data

Data Description:

We have two data files: one contains information about the genetic mutations, and the other contains the clinical evidence (text) that human experts/pathologists use to classify those mutations.

Both data files share a common column, ID.

Data files and their columns:

  1. training_variants (ID, Gene, Variation, Class)
  2. training_text (ID, Text)
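Joining the two files on their shared ID column can be sketched with Python's standard library alone. This is a minimal sketch under two assumptions: training_variants is a comma-separated file with the header ID,Gene,Variation,Class, and training_text has an "ID,Text" header line followed by rows of the form ID||Text (a double-pipe separator, since the article text itself contains commas). The function names and the tiny inline sample rows are hypothetical placeholders, not real data.

```python
import csv
import io
from typing import Dict, List, TextIO


def read_variants(f: TextIO) -> Dict[str, dict]:
    """Parse a comma-separated variants file (ID, Gene, Variation, Class), keyed by ID."""
    reader = csv.DictReader(f)
    return {row["ID"]: row for row in reader}


def read_text(f: TextIO) -> Dict[str, str]:
    """Parse a text file assumed to be a header line followed by 'ID||Text' rows."""
    next(f)  # skip the "ID,Text" header line
    texts = {}
    for line in f:
        line = line.rstrip("\r\n")
        if not line:
            continue
        # Split only on the first '||' so any '||' inside the article text is kept.
        doc_id, text = line.split("||", 1)
        texts[doc_id] = text
    return texts


def join_on_id(variants: Dict[str, dict], texts: Dict[str, str]) -> List[dict]:
    """Inner-join the two tables on ID; variants without a text entry are dropped."""
    merged = []
    for doc_id, row in variants.items():
        if doc_id in texts:
            merged.append({**row, "Text": texts[doc_id]})
    return merged


# Hypothetical placeholder rows standing in for the real files.
VARIANTS_CSV = (
    "ID,Gene,Variation,Class\n"
    "0,GENE_A,VARIANT_1,1\n"
    "1,GENE_B,VARIANT_2,4\n"
    "2,GENE_C,VARIANT_3,7\n"
)
TEXT_FILE = (
    "ID,Text\n"
    "0||Placeholder article text, with a comma.\n"
    "1||Another placeholder article.\n"
)

rows = join_on_id(
    read_variants(io.StringIO(VARIANTS_CSV)),
    read_text(io.StringIO(TEXT_FILE)),
)
for r in rows:
    print(r["ID"], r["Gene"], r["Class"], r["Text"][:30])
```

With the real files, the same functions can be called on handles from `open("training_variants", encoding="utf-8")` and `open("training_text", encoding="utf-8")`; an inner join is used here so that every merged row has both its variant metadata and its evidence text.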

Real-world/Business Objectives and Constraints:

  1. No low-latency requirement.
  2. Interpretability is important.
  3. Errors can be very costly.
  4. The probability of a data point belonging to each class is needed.
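Constraints 3 and 4 point toward scoring models on their predicted probabilities rather than on hard labels. Multi-class log loss, the evaluation metric of this Kaggle competition, does exactly that: it averages -log p over the probability p each data point assigns to its true class, so a confident wrong answer is penalized far more than a hesitant one. The following is a minimal stdlib sketch; the function name, the clipping constant eps = 1e-15, and the row renormalisation step are assumptions modelled on common practice, not taken from the source.

```python
import math
from typing import Sequence


def multiclass_log_loss(
    y_true: Sequence[int],
    probs: Sequence[Sequence[float]],
    eps: float = 1e-15,
) -> float:
    """Mean negative log-probability assigned to the true class.

    y_true holds 0-based class indices; probs[i] is a probability vector
    over all classes for data point i.
    """
    if len(y_true) != len(probs):
        raise ValueError("y_true and probs must have the same length")
    total = 0.0
    for label, row in zip(y_true, probs):
        # Clip every probability into [eps, 1 - eps] so log(0) never occurs.
        clipped = [min(max(p, eps), 1 - eps) for p in row]
        # Renormalise so the clipped row sums to 1 again.
        row_sum = sum(clipped)
        total -= math.log(clipped[label] / row_sum)
    return total / len(y_true)


# A uniform guess over 9 classes scores log(9), a simple random-model baseline.
uniform = multiclass_log_loss([0], [[1 / 9] * 9])
# A confident, correct prediction scores close to 0.
confident_right = multiclass_log_loss([0], [[1.0, 0.0]])
# A confident, wrong prediction is punished heavily (about -log(1e-15)).
confident_wrong = multiclass_log_loss([1], [[1.0, 0.0]])
print(uniform, confident_right, confident_wrong)
```

Because the dataset's classes are numbered from 1, labels would need to be shifted down by one before being passed as 0-based indices. The gap between the last two calls illustrates why "errors can be very costly" under this metric: over-confident mistakes dominate the average.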
