Monitoring Kubernetes infrastructure with Sysdig Monitor
As a prerequisite to this lab, complete the setup instructions described here.
Deploy the sample Petclinic application to the IKS cluster as described here in this exercise. Skip the optional MySQL section. This exercise deploys the application and sets up the ingress routes to access the application and its related microservices.
Step 1: Overview of the infrastructure
The Overview tab provides a unified view of the health, risk, and capacity of the infrastructure.
- Select the `Overview` tab and pick `Clusters` from the list of options. This view shows the overall health of the clusters being monitored. Hover over the area marked by the red arrow to open an additional context menu. Use the options in this menu for further analysis.
- Click on the `Overview` tab and select `Workloads` to segment the view by workload. Choose the `Namespace` `ibm-observe` to view the workloads belonging to Sysdig.
Step 2: Using Dashboards
Dashboards provide a collection of relevant views and metrics for the infrastructure in a single place.
- Let's try a couple of views from the existing list of templates. Select the `Dashboard` tab and pick `Containers > Containers Resource Usage`.
- Select the `Dashboard` tab again and pick `Troubleshooting > Process Resource Usage`.
To add your own custom view, click on the Dashboards tab and select the (+) icon.
Step 3: Troubleshooting with Explore
This tab helps you view and troubleshoot key metrics and entities of your infrastructure stack.
- Explore `net.http.request.count`

    To test this out, let's artificially trigger some traffic to the `Customer` service. Set the `HOST` variable and invoke the REST service API in a loop:

        export HOST="https://petclinic.$INGRESS_SUBDOMAIN"
        echo $HOST

        for i in $(seq 1 1000); do
          echo -e "\n ======= Loop count: $i ========= \nCalling owners:"
          wget -q -O - "${HOST}/api/customer/owners"
        done

    Click on the `Explore` tab and select `Containerized Apps`. In the Metrics dropdown list, select `net.http.request.count` to view the request spikes.

- View response times between containers

    Select the `Response Times` dashboard to get a general view of the network traffic between the containers and their response times.
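
The traffic loop from the first item can also be wrapped in a small reusable function. The following is a minimal sketch and not part of the original lab: the function name `generate_traffic` and its failure counter are hypothetical, and it assumes `wget` is installed and `HOST` is set as in the commands above.

```shell
# Hypothetical helper (not part of the lab): request a URL COUNT times
# and report how many of the requests failed.
generate_traffic() {
  url="$1"
  count="${2:-100}"   # default to 100 requests if no count is given
  i=1
  failures=0
  while [ "$i" -le "$count" ]; do
    # -q silences wget; -O /dev/null discards the response body
    if ! wget -q -O /dev/null "$url"; then
      failures=$((failures + 1))
    fi
    i=$((i + 1))
  done
  echo "requests=$count failures=$failures"
}
```

For example, `generate_traffic "${HOST}/api/customer/owners" 1000` would send the same 1000 requests as the loop and print a one-line summary at the end. Discarding the response body keeps the terminal readable while the spike builds up in the `net.http.request.count` metric.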
Step 4: Setting up Alerts
Alerts generate notifications based on certain conditions or events that require further attention.
- Crash the Visits service.

        $ kubectl get pods
        NAME                           READY   STATUS    RESTARTS   AGE
        api-gateway-575f59b7d8-vlm6x   1/1     Running   0          115m
        customers-687749cfb-vzblv      1/1     Running   0          115m
        vets-6bb6655b7f-dpf88          1/1     Running   0          115m
        visits-784749c647-rxh2n        1/1     Running   0          115m

        $ kubectl delete pods visits-784749c647-rxh2n
        pod "visits-784749c647-rxh2n" deleted

- Set up a notification channel

    Open the settings menu from the bottom left.

    Add a `Notification Channel` of type `Email`.

- Add a new alert

    Click on the `Alerts` tab and then click on `(+) Add Alert` at the top right. Select `Event` as the `Alert Type`. Enter `ExitCode = 143` for `Tag or Description` and `containerd` as the `Source Tag` value. Enable the notification channel `Visits Service Crash` and click on the `CREATE` button.

    Rerun the Visits service crash step to trigger an alert. You will receive separate notifications for the `Triggered` and `Resolved` statuses.
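
Two details of this step can be sketched in shell. First, the pod name suffix changes every time the pod is recreated, so the name has to be looked up again before each crash. Second, an exit code above 128 conventionally means that the process was ended by a signal, with the code equal to 128 plus the signal number; 143 therefore corresponds to signal 15 (SIGTERM), which Kubernetes sends by default when a pod is deleted. The helper names below are hypothetical and not part of the lab, and the first helper assumes the default `kubectl get pods` table layout.

```shell
# Hypothetical helpers (not part of the lab).

# Print the name of the first pod whose name starts with PREFIX.
# Expects `kubectl get pods` table output on stdin (header on line 1).
pod_by_prefix() {
  awk -v prefix="$1" 'NR > 1 && index($1, prefix) == 1 { print $1; exit }'
}

# Translate a container exit code above 128 into the name of the signal
# that ended the process (exit code = 128 + signal number).
signal_from_exit_code() {
  if [ "$1" -gt 128 ]; then
    kill -l "$(($1 - 128))"
  else
    echo "none"
  fi
}
```

With these in place, `kubectl delete pod "$(kubectl get pods | pod_by_prefix visits-)"` would crash the current Visits pod without copying its name by hand, and `signal_from_exit_code 143` prints `TERM`.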
Step 5: Analysing Events
The Events view displays a comprehensive list of events that have occurred in the environment. Let's create one additional event before looking at the Events view. The `noimage-service.yaml` manifest points to a non-existent image.
- Create the `Image Pull` error event

        $ cd $HOME/kubernetes-logging-and-monitoring/src
        $ kubectl create -f k8s/monitor/noimage-service.yaml
        deployment.apps/missing-image created
        service/missing-image-service created

    The pod with the missing image will show a status of `ErrImagePull`.

        $ kubectl get pods
        NAME                             READY   STATUS         RESTARTS   AGE
        api-gateway-575f59b7d8-vlm6x     1/1     Running        0          11h
        customers-687749cfb-vzblv        1/1     Running        0          11h
        missing-image-6c677574d8-zqc57   0/1     ErrImagePull   0          54s
        vets-6bb6655b7f-dpf88            1/1     Running        0          11h
        visits-784749c647-t6tb9          1/1     Running        0          9h

- View the events

    Open the Events view to find `Back Off Container Start or Image Pull` and `Container Image Pull, Create or Start Failed` at the top of the events list. Select an event to get additional details.
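
Before switching to the Events view, the failing pod can also be picked out on the command line. The following is a minimal sketch under stated assumptions: the helper name `unhealthy_pods` is hypothetical, and it assumes the default single-namespace `kubectl get pods` layout in which `STATUS` is the third column.

```shell
# Hypothetical helper (not part of the lab): print "NAME STATUS" for every
# pod whose STATUS column is neither Running nor Completed.
# Expects `kubectl get pods` table output on stdin (header on line 1).
unhealthy_pods() {
  awk 'NR > 1 && $3 != "Running" && $3 != "Completed" { print $1, $3 }'
}
```

Running `kubectl get pods | unhealthy_pods` against the listing in this step would print `missing-image-6c677574d8-zqc57 ErrImagePull`. Passing that name to `kubectl describe pod` shows the recent events recorded for the pod, which can be compared with what the Events view reports.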
Step 6: Creating Captures
The Captures tab gives a list of capture files with system calls and OS events for further analysis.