Monitoring Kubernetes infrastructure with Sysdig Monitor
As a prerequisite to this lab, complete the setup instructions described here.
Deploy the sample Petclinic application to the IKS cluster as described here in this exercise. Skip the optional MySQL section. That exercise deploys the application and sets up the ingress routes for accessing the application and its related microservices.
Step 1: Overview of the infrastructure
The Overview tab provides a unified view of the health, risk, and capacity of the infrastructure.
- Select the Overview tab and pick Clusters from the list of options. This view shows the overall health of the clusters being monitored. Hover over the area marked by the red arrow to open an additional context menu, and use the options in this menu for further analysis.
- Click on the Overview tab and select Workloads to segment the view by workload. Choose the Namespace ibm-observe to view the workloads belonging to Sysdig.
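
The Workloads view for the ibm-observe namespace can be cross-checked from the command line. The following is a minimal sketch, not part of the original lab: the helper name list_workloads is hypothetical, and it assumes kubectl is configured against the cluster and that the Sysdig components run as Deployments or DaemonSets in that namespace.

```shell
# list_workloads NAMESPACE: print the Deployments and DaemonSets in a namespace.
# Defaults to ibm-observe, the namespace used in this step.
list_workloads() {
  local ns="${1:-ibm-observe}"
  kubectl get deployments,daemonsets -n "$ns"
}

# Usage (requires kubectl access to the cluster):
#   list_workloads ibm-observe
```

The names listed should correspond to the entries shown in the Workloads view once the namespace filter is applied.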
Step 2: Using Dashboards
Dashboards provide a collection of relevant views and metrics for the infrastructure in a single view.
- Let's try a couple of views from the existing list of templates. Select the Dashboard tab and pick Containers > Containers Resource Usage.
- Select the Dashboard tab again and pick Troubleshooting > Process Resource Usage.
To add your own custom view, click on the Dashboards tab and select the (+) icon.
Step 3: Troubleshooting with Explore
This tab helps you view and troubleshoot key metrics and entities of your infrastructure stack.
- Explore net.http.request.count

  To test this out, let's artificially trigger some traffic to the Customer service. Set the HOST variable and invoke the REST service API in a loop:

      export HOST="https://petclinic.$INGRESS_SUBDOMAIN"
      echo "$HOST"

      for i in $(seq 1 1000); do
        echo -e "\n ======= Loop count: $i ========= \nCalling owners:" && \
          wget -q -O - "${HOST}/api/customer/owners"
      done

  Click on the Explore tab and select Containerized Apps. In the Metrics dropdown list, select net.http.request.count to view the request spikes.
- View response times between containers

  Select the Response Times dashboard to get a general view of the network traffic between the containers and the response times.
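
The request spikes are easier to correlate with the service's behaviour if each call also reports its HTTP status. The following is a minimal sketch, not part of the original lab: the function name generate_traffic is hypothetical, it assumes curl is installed, and it reuses the HOST variable and the /api/customer/owners path from the step above.

```shell
# generate_traffic COUNT: call the owners endpoint COUNT times (default 100)
# and print one line per request with the loop index and HTTP status code.
generate_traffic() {
  local count="${1:-100}"
  # Abort with a message if HOST has not been exported yet.
  local url="${HOST:?set HOST first}/api/customer/owners"
  local i status
  for i in $(seq 1 "$count"); do
    # -s: no progress output, -o /dev/null: discard the response body,
    # -w '%{http_code}': print only the HTTP status code.
    status=$(curl -s -o /dev/null -w '%{http_code}' "$url")
    echo "request $i -> HTTP $status"
  done
}

# Usage:
#   generate_traffic 1000
```

Printing only the status code keeps the terminal output short, so a run of non-200 responses stands out while the net.http.request.count graph is open.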
Step 4: Setting up Alerts
Alerts generate notifications based on certain conditions or events that require further attention.
- Crash the Visits service.

      $ kubectl get pods
      NAME                           READY   STATUS    RESTARTS   AGE
      api-gateway-575f59b7d8-vlm6x   1/1     Running   0          115m
      customers-687749cfb-vzblv      1/1     Running   0          115m
      vets-6bb6655b7f-dpf88          1/1     Running   0          115m
      visits-784749c647-rxh2n        1/1     Running   0          115m
      $ kubectl delete pods visits-784749c647-rxh2n
      pod "visits-784749c647-rxh2n" deleted
- Set up a notification channel

  Open the settings menu at the bottom left. Add a Notification Channel of type Email.
- Add a new alert

  Click on the Alerts tab and then click on (+) Add Alert at the top right. Select Event as the Alert Type. Enter ExitCode = 143 for Tag or Description, and containerd as the Source Tag value. Enable the notification channel Visits Service Crash and click on the CREATE button.

  Rerun the Visits service crash step to trigger an alert. You will receive separate notifications for the Triggered and Resolved statuses.
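
The ExitCode = 143 filter fits this crash because deleting a pod normally sends SIGTERM to its containers first, and a process that dies from a signal is conventionally reported with exit code 128 plus the signal number (SIGTERM is 15). The following sketch reproduces that arithmetic locally with a throwaway child shell. It is an illustration only and does not touch the cluster; a container that handles SIGTERM itself may exit with a different code.

```shell
# Start a child shell that sends SIGTERM to itself, then capture its exit status.
sh -c 'kill -TERM $$'
code=$?
echo "exit code of the terminated process: $code"

# By convention: 128 + signal number, and SIGTERM is signal 15.
echo "128 + 15 = $((128 + 15))"
```

A container killed with SIGKILL instead would report 137 (128 + 9), so an alert keyed on 143 would not match it.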
Step 5: Analysing Events
The Events view displays a comprehensive list of events that have occurred in the environment. Let's create one additional event before looking at the Events view. The manifest noimage-service.yaml points to a non-existent image.
- Create the Image Pull error event

      $ cd $HOME/kubernetes-logging-and-monitoring/src
      $ kubectl create -f k8s/monitor/noimage-service.yaml
      deployment.apps/missing-image created
      service/missing-image-service created

  The pod with the missing image will show a status of ErrImagePull.

      $ kubectl get pods
      NAME                             READY   STATUS         RESTARTS   AGE
      api-gateway-575f59b7d8-vlm6x     1/1     Running        0          11h
      customers-687749cfb-vzblv        1/1     Running        0          11h
      missing-image-6c677574d8-zqc57   0/1     ErrImagePull   0          54s
      vets-6bb6655b7f-dpf88            1/1     Running        0          11h
      visits-784749c647-t6tb9          1/1     Running        0          9h
- View the events

  Open the Events view to find the "Back Off Container Start or Image Pull" and "Container Image Pull, Create or Start Failed" events at the top of the events list. Select an event to get additional details.
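
Before switching to the Events view, it can help to confirm from the terminal which pods are unhealthy. The following is a minimal sketch, not part of the original lab: the helper name not_running is hypothetical, and it simply filters the STATUS column of kubectl get pods output. The sample input is copied from the pod listing in the previous step so that the snippet runs without a cluster.

```shell
# not_running: read `kubectl get pods` output on stdin and print the name and
# status of every pod whose STATUS column is not "Running" (header skipped).
not_running() {
  awk 'NR > 1 && $3 != "Running" { print $1, $3 }'
}

# Sample input taken from the pod listing in this step.
# On a live cluster use:  kubectl get pods | not_running
sample='NAME READY STATUS RESTARTS AGE
api-gateway-575f59b7d8-vlm6x 1/1 Running 0 11h
customers-687749cfb-vzblv 1/1 Running 0 11h
missing-image-6c677574d8-zqc57 0/1 ErrImagePull 0 54s
vets-6bb6655b7f-dpf88 1/1 Running 0 11h
visits-784749c647-t6tb9 1/1 Running 0 9h'

printf '%s\n' "$sample" | not_running
```

With the sample input, only the missing-image pod is printed. The filter is deliberately simple, so pods in other non-Running states, such as Completed, would be listed as well.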
Step 6: Creating Captures
The Captures tab gives a list of capture files with system calls and OS events for further analysis.