IBM DataPower Operator Documentation

Stale Webhooks

The DataPower Operator implements and uses Kubernetes Admission Controllers for validating and defaulting DataPowerService requests.

Background

During the DataPower Operator pod’s boot sequence, it will attempt to fetch and/or create several Kubernetes resources for the validating and defaulting webhooks. These resources include:

  • Secret containing TLS key/cert for webhook API
  • Service to expose the webhook API
  • MutatingWebhookConfiguration for the defaulting webhook
  • ValidatingWebhookConfiguration for the validating webhook

The Secret and Service are owned by the DataPower Operator’s ReplicaSet, and thus share the same life cycle as the operator pod itself. However, the two webhook configurations are cluster-scoped resources and are instead owned by the operator’s ClusterRole. The expectation is that the ClusterRole shares the same life cycle as the DataPower Operator, but there are cases where this link is broken. The result is a mismatch between the TLS key/cert referenced by the webhook configurations and the TLS key/cert used by the current operator pod.

Problem determination

If the DataPower Operator pod logs errors about a bad certificate or x509 certificate validation, you are most likely seeing this issue. The examples below show what these errors look like.

2020/05/05 04:13:22 http: TLS handshake error from x.x.x.x:57076: remote error: tls: bad certificate
Internal error occurred: failed calling webhook "datapowerservices.defaulter.datapower.ibm.com": Post https://datapower-operator.namespace.svc:443/default-datapower-ibm-com-v1beta2-datapowerservice?timeout=2s: x509: certificate signed by unknown authority (possibly because of "crypto/rsa: verification error" while trying to verify candidate authority certificate "datapower-operator-ca")
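
As a rough aid for spotting these messages, the sketch below filters log text for the two error signatures quoted above. The function name find_stale_webhook_errors is hypothetical and is not part of the operator. The patterns come only from the example lines in this section, so differently worded errors would be missed.

```shell
# Print log lines that match the error signatures quoted in this section.
# Reads log text on stdin; returns 0 if at least one line matched, 1 otherwise.
# NOTE: find_stale_webhook_errors is an illustrative name, not an operator tool.
find_stale_webhook_errors() {
  grep -E 'tls: bad certificate|x509: certificate signed by unknown authority'
}

# Possible usage, where "namespace" and "name" are placeholders for the
# operator's namespace and pod name:
#   kubectl -n namespace logs pod/name | find_stale_webhook_errors
```

An empty result does not rule out the problem; it only means that neither of the two quoted signatures appeared in the text that was scanned.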

Resolution

To recover from this state, both the MutatingWebhookConfiguration and the ValidatingWebhookConfiguration must be deleted manually so that the DataPower Operator can recreate them. At a high level, the process is:

  1. Delete stale MutatingWebhookConfiguration resource
  2. Delete stale ValidatingWebhookConfiguration resource
  3. Delete datapower-operator Pod

When the datapower-operator Pod is deleted, it will automatically be recreated by the Deployment, and the newly created Pod will recreate the webhook configuration resources.

Steps to Recover

  1. Fetch the MutatingWebhookConfiguration resource from the cluster, taking note of its name (the first column of the output).

    kubectl get MutatingWebhookConfiguration | grep datapower.ibm.com

    To narrow the results further, also filter by the namespace in which you deployed the DataPower Operator, replacing namespace with that namespace's name. For example:

    kubectl get MutatingWebhookConfiguration | grep datapower.ibm.com | grep namespace
  2. Delete the MutatingWebhookConfiguration resource, where name is the name found in Step 1.

    kubectl delete MutatingWebhookConfiguration/name
  3. Fetch the ValidatingWebhookConfiguration resource from the cluster, taking note of its name (the first column of the output).

    kubectl get ValidatingWebhookConfiguration | grep datapower.ibm.com

    To narrow the results further, also filter by the namespace in which you deployed the DataPower Operator, replacing namespace with that namespace's name. For example:

    kubectl get ValidatingWebhookConfiguration | grep datapower.ibm.com | grep namespace
  4. Delete the ValidatingWebhookConfiguration resource, where name is the name found in Step 3.

    kubectl delete ValidatingWebhookConfiguration/name
  5. Fetch the DataPower Operator pod, taking note of its name (the first column of the output).

    kubectl [-n namespace] get pod | grep datapower-operator
  6. Delete the DataPower Operator pod, where name is the name of the pod found in Step 5.

    kubectl [-n namespace] delete pod/name

The DataPower Operator’s ReplicaSet will create a new Pod, which will create all the resources in the cluster again in a clean state.
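
The six steps above can be collected into a few shell functions. This is a minimal sketch, not an IBM-provided script. All function names are hypothetical. It assumes kubectl is on the PATH and already points at the affected cluster. Unlike the manual steps, it deletes every match without asking for confirmation, so review the matching names first with the kubectl get commands shown above.

```shell
# Sketch of the recovery steps as shell functions (all names are illustrative).

# Read `kubectl get` table output on stdin and print the first column (NAME)
# of every row that mentions datapower.ibm.com, like the grep steps above.
datapower_webhook_names() {
  awk '/datapower\.ibm\.com/ { print $1 }'
}

# Steps 1-4: delete every matching webhook configuration of the given kind.
delete_stale_webhooks() {
  local kind="$1"
  kubectl get "$kind" | datapower_webhook_names | while read -r name; do
    kubectl delete "$kind/$name" </dev/null
  done
}

# Steps 5-6: delete pods in the given namespace whose row contains
# "datapower-operator". Narrow the pattern if other pods share that text.
restart_operator_pod() {
  local namespace="$1"
  kubectl -n "$namespace" get pod | awk '/datapower-operator/ { print $1 }' |
    while read -r name; do
      kubectl -n "$namespace" delete "pod/$name" </dev/null
    done
}

# Run the whole sequence in the order used by the manual steps.
recover_stale_webhooks() {
  local namespace="$1"
  delete_stale_webhooks MutatingWebhookConfiguration
  delete_stale_webhooks ValidatingWebhookConfiguration
  restart_operator_pod "$namespace"
}
```

After sourcing these definitions, running recover_stale_webhooks namespace (with namespace replaced by the operator's namespace) would perform the full sequence. The sketch omits the optional namespace filter on webhook names, so on a cluster with several DataPower Operator installations it would delete the webhook configurations of all of them.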

Further Troubleshooting

If you continue to see issues after following the above steps, please engage IBM Support.