Skip to content

Classification Exercise

What is classification in machine learning?

Classification in machine learning is when the feature to be predicted contains categories of values. Each of these categories is considered as a class into which the predicted value falls and hence has its name, classification. An example of this could be predicting the parts of speech (verb, noun, adjective, etc.) of words within a given text.

Objectives

By the end of this module, the participant will have learned:

  • What classification in Machine Learning is
  • Basics of training a Naive Bayes Model using sklearn
  • Basics of training a logistic regression model using sklearn

About the DataSet

In this tutorial, we use a data set that contains information about customers of an online trading platform to classify whether a given customer’s probability of churn will be high, medium, or low. This provides a good example to learn how a classification model is built from start to end. The three classes that prediction will fall under are high, medium, and low. The data is available to us in the form of a .csv file and is imported using the pandas library. We use numpy and matplotlib to get some statistics and visualize data.

Hist plot of the Data

Pay close attention to the pre-processing sections of the notebook associated with this module. The steps should give you an idea of the kinds of processing needed to prepare the data for classification models.

Open the Jupyter notebook

  1. Go the (☰) navigation menu and under the Projects section click on All Projects.

    (☰) Menu -> Projects

  2. Click the project name you created in the Workshop Setup section.

  3. From your Project overview page, click on the Assets tab to open the assets page where your project assets are stored and organized.

  4. Scroll down to the Notebooks section of the page and click on the pencil icon at the right of the classification-exercise.ipynb notebook.

    open notebook

    Note

    You may see additional notebooks that differ from the ones in this screenshot.

  5. When the Jupyter notebook is loaded and the kernel is ready, we will be ready to start executing it in the next section.

Run the Jupyter notebook

Spend some time looking through the sections of the notebook to get an overview. A notebook is composed of text (markdown or heading) cells and code cells. The markdown cells provide comments on what the code is designed to do.

You will run cells individually by highlighting each cell, then either click the Run button at the top of the notebook. While the cell is running, an asterisk ([*]) will show up to the left of the cell. When that cell has finished executing a sequential number will show up (i.e. [17]).

Hint

You can also run a cell by pressing Shift + Enter on your keyboard instead of clicking the run () button.

Summary

In this module we learned the basics of supervised machine learning using the Naive Bayes and logistic regression models and compared the effectiveness of the two against the actual data.

To learn more and for a more in-depth survey of commonly used classification algorithms, click on the learn more button below and follow the learning path at your own time.

Learn More about Classification