The notebook provided is a boilerplate to get going with this workshop. For starters, we’ve included the methods to import the data, clean the data, and build 3 different models using the Scikit-learn library.
The first task in the notebook is to import our CSV file as a pandas dataframe. To do this, click the find and add data button, a sidebar will open
Next, select the cell with the comment IMPORT DATA FRAME HERE
Now, in the sidebar that opened, select our CSV data source, select the dropdown to Insert to code and select the option Insert pandas DataFrame
You should see the generated code appear. You can close the sidebar using the X button.
The last thing we need to do is a housekeeping step.
When Watson Studio generates the code to import our CSV as a dataframe, is generates a variable name to assign the DataFrame to, in the form of df_data_X
, in the cell beloow our import code, rename the variable with your own.
(For example, in our case my_df = df_data_X
would be changed to my_df = df_data_2
)
Now that you’ve made these changes, it’s time to run! One of the benefits of using a notebook, is that instead of having to run your entire program at once, you can run each cell individually. This allows you to debug more effeciently, seeing where errors arise, but also gives you the ability to quickly test things out. In order to run the notebook, select the cell at the top of the notebook, and run by using the key-combo shift
+enter
this will run one cell and then move your cursor to the next cell. If you’d rather run the whole document at once, click the Cell menu at the top, and select Run All.