About the workshop

Analyzing Credit Risk data with Cloud Pak for Data on OpenShift

Welcome to our workshop! We'll be using the Cloud Pak for Data platform to Collect Data, Organize Data, Analyze Data, and Infuse AI into our applications. The goals of this workshop are:

Collect and virtualize data
Visualize data with Data Refinery
Create and deploy a machine learning model
Monitor the model
Create a Python app to use the model

About this workshop

The introductory page of the workshop is broken down into the following sections:

Agenda
Compatibility
Credits

About the data set

In this workshop we will be using a credit risk / lending scenario. In this scenario, lenders respond to increased pressure to expand lending to larger and more diverse audiences by using different approaches to risk modeling. This means going beyond traditional credit data sources to alternative credit sources (e.g. mobile phone plan payment histories, education), which may introduce a risk of bias or other unexpected correlations.

Use Case Diagram

The credit risk model that we are exploring in this workshop uses a training data set that contains 20 attributes about each loan applicant. The scenario and model use synthetic data based on the UCI German Credit dataset. The data is split into three CSV files, which are located in the data directory of the GitHub repository you will download in the pre-work section.

Applicant Financial Data

This file has the following attributes:

CUSTOMERID (hex number, used as Primary Key)
CHECKINGSTATUS
CREDITHISTORY
EXISTINGSAVINGS
INSTALLMENTPLANS
EXISTINGCREDITSCOUNT

Applicant Loan Data

This file has the following attributes:

CUSTOMERID
LOANDURATION
LOANPURPOSE
LOANAMOUNT
INSTALLMENTPERCENT
OTHERSONLOAN
RISK

Applicant Personal Data

This file has the following attributes:

CUSTOMERID
EMPLOYMENTDURATION
SEX
CURRENTRESIDENCEDURATION
OWNSPROPERTY
AGE
HOUSING
JOB
DEPENDENTS
TELEPHONE
FOREIGNWORKER
FIRSTNAME
LASTNAME
EMAIL
STREETADDRESS
CITY
STATE
POSTALCODE
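
All three files share the CUSTOMERID column, so they can be combined into one record per applicant. The sketch below is only an illustration of that relationship, not the method used in the workshop labs. It relies on the Python standard library alone, and the sample rows, the values in them, and the helper names (`read_by_customer`, `join_applicants`, `drop_pii`) are invented for this example. Treating the name, email, and address columns as personal identifiers to exclude from training is likewise an assumption.

```python
import csv
import io

# Tiny in-memory stand-ins for the three CSV files. The rows and values are
# invented for illustration, and only a few of the real columns are shown.
FINANCIAL_CSV = """CUSTOMERID,CHECKINGSTATUS,EXISTINGSAVINGS
a1b2,no_checking,less_100
c3d4,0_to_200,100_to_500
"""
LOAN_CSV = """CUSTOMERID,LOANDURATION,LOANAMOUNT,RISK
a1b2,24,4200,No Risk
c3d4,36,9100,Risk
"""
PERSONAL_CSV = """CUSTOMERID,AGE,FIRSTNAME,LASTNAME,EMAIL
a1b2,31,Ada,Example,ada@example.com
c3d4,45,Bo,Sample,bo@example.com
"""

# Assumption: these columns identify a person rather than describe credit
# behaviour, so they are left out of the rows used for modeling.
PII_COLUMNS = {
    "FIRSTNAME", "LASTNAME", "EMAIL",
    "STREETADDRESS", "CITY", "STATE", "POSTALCODE",
}


def read_by_customer(text):
    """Index the rows of one CSV text by their CUSTOMERID value."""
    return {row["CUSTOMERID"]: row for row in csv.DictReader(io.StringIO(text))}


def join_applicants(*csv_texts):
    """Inner-join any number of CSV texts on CUSTOMERID."""
    tables = [read_by_customer(text) for text in csv_texts]
    shared_ids = set(tables[0]).intersection(*tables[1:])
    joined = {}
    for customer_id in sorted(shared_ids):
        merged = {}
        for table in tables:
            merged.update(table[customer_id])
        joined[customer_id] = merged
    return joined


def drop_pii(record):
    """Return a copy of the record without the assumed PII columns."""
    return {key: value for key, value in record.items() if key not in PII_COLUMNS}


applicants = join_applicants(FINANCIAL_CSV, LOAN_CSV, PERSONAL_CSV)
training_rows = [drop_pii(record) for record in applicants.values()]
print(training_rows[0])
```

With the real files, the in-memory strings would be replaced by the contents of the CSV files in the repository's data directory. `csv.DictReader` returns every value as a string, so numeric columns such as LOANAMOUNT or AGE would still need converting before modeling.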

Agenda


Duration	Session	Description
00:05	Welcome	Welcome to the Cloud Pak for Data workshop
00:20	Lecture - Intro and Overview	Introduction to Cloud Pak for Data and an overview of this workshop
00:20	Lecture - Data Refinery and Data Virtualization	Data Refinery and Data Virtualization
00:30	Lab - Data Connection and Virtualization and importing the data into the project	Creating a new connection, virtualizing the data, importing the data into the project
00:10	Walkthrough - Data Connection and Virtualization	Creating a new connection, virtualizing the data, importing the data into the project
00:15	Lab - Data Visualization with Data Refinery	Refining the data, visualizing and profiling the data
00:10	Walkthrough - Data Visualization with Data Refinery	Refining the data, visualizing and profiling the data
00:15	Lecture - Watson Knowledge Catalog	Enterprise governance with Watson Knowledge Catalog
00:20	Lab - Enterprise data governance for Viewers using Watson Knowledge Catalog	Use an enterprise data catalog to search, manage, and protect data
00:05	Walkthrough - Enterprise data governance for Viewers using Watson Knowledge Catalog	Use an enterprise data catalog to search, manage, and protect data
00:20	Lab - Enterprise data governance for Admins using Watson Knowledge Catalog	Create new Categories, Business terms, Policies and Rules in Watson Knowledge Catalog
00:05	Walkthrough - Enterprise data governance for Admins using Watson Knowledge Catalog	Create new Categories, Business terms, Policies and Rules in Watson Knowledge Catalog
00:15	Lecture - Machine Learning	Machine Learning and Deep Learning concepts
00:20	Lab - Machine Learning with AutoAI	Use AutoAI to quickly generate a Machine Learning pipeline and model
00:10	Walkthrough - Machine Learning with AutoAI	Use AutoAI to quickly generate a Machine Learning pipeline and model
00:10	Closing	Other capabilities, review, and next steps

Compatibility

This workshop has been tested on the following platforms:

macOS: Mojave (10.14), Catalina (10.15)
