FAQ

2024-03-29 02:45:42  Updated

1. How do I make sure that a resource is ready?

Assume that you want to view the resource status of a cluster. Run the following command:

kubectl get obclusters.oceanbase.oceanbase.com test -n oceanbase

If the status is running in the response, the resource is ready.

# desired output
NAME   STATUS    AGE
test   running   6m2s

2. How do I view the O&M status of a resource?

Assume that you want to view the resource status of a cluster. Run the following command:

kubectl get obclusters.oceanbase.oceanbase.com test -n oceanbase -o yaml

You can check the status and progress of O&M tasks based on values of parameters in the operationContext section in the response.

status:
  image: oceanbase/oceanbase-cloud-native:4.2.0.0-101000032023091319
  obzones:
  - status: delete observer
    zone: obcluster-1-zone1
  - status: delete observer
    zone: obcluster-1-zone2
  - status: delete observer
    zone: obcluster-1-zone3
  operationContext:
    failureRule:
      failureStatus: running
      failureStrategy: retry over
      retryCount: 0
    idx: 2
    name: modify obzone replica
    targetStatus: running
    task: wait obzone topology match
    taskId: c04aeb28-01e7-4f85-b390-8d855b9f30e3
    taskStatus: running
    tasks:
    - modify obzone replica
    - wait obzone topology match
    - wait obzone running
  parameters: []
  status: modify obzone replica

3. How do I do troubleshooting for ob-operator and OceanBase?

  • Generally, you need to first analyze the logs of ob-operator to locate an error. Run the following command to view the logs of ob-operator:
kubectl logs oceanbase-controller-manager-86cfc8f7bf-js95z -n oceanbase-system -c manager  | less
  • View the logs of the OBServer node
# Log on to the container of the OBServer node.
kubectl exec -it obcluster-1-zone1-8ab645f4d0f9 -n oceanbase -c observer -- bash

# The directory where the log files are located.
cd /home/admin/oceanbase/log

4. How do I fix a "stuck" resource in ob-operator?

As ob-operator uses a state machine and task flow to manage custom resources (CRs) and their O&M operations, there may be situations where CRs are in an unexpected state. This could include continuously retrying a task flow that is bound to fail, failing to delete a resource, or mistakenly deleting a resource that needs to be recovered. In cases where a CR cannot be restored to normal through regular operations, you can use the OBResourceRescue resource to rescue the problematic CR. The OBResourceRescue resource includes four types of operations: reset, delete, retry, and skip.

A typical OBResourceRescue CR configuration is as follows:

apiVersion: oceanbase.oceanbase.com/v1alpha1
kind: OBResourceRescue
metadata:
  generateName: rescue-reset- # generateName needs to be used with kubectl create -f
spec:
  type: reset
  targetKind: OBCluster
  targetResName: test
  targetStatus: running # The target status needs to be filled in when the type is reset

The key configurations are explained in the following table:

Configuration item Optional values Description
type reset, delete, retry, skip The type of the resource rescue action
targetKind OBCluster, OBZone, OBTenant, and other CRD kind managed by ob-operator The kind of the resource to be rescued
targetResName / The name of the resource to be rescued
targetStatus / This field needs to be filled in when the type is reset, indicating the status of the resource after the reset

Reset

The configuration example of the typical CR above is a reset type of resource rescue. After creating this resource in the K8s cluster using the kubectl create -f command, ob-operator sets the status.status of the resource whose kind is OBCluster and name is test to running (the targetStatus set in the configuration file), and sets the status.operationContext of the resource to empty.

Delete

The configuration example of the delete type of rescue action is as follows. After creating this resource in the cluster, ob-operator clears the finalizers field of the target resource and sets the deletionTimestamp of the resource to the current time.

# ...
spec:
  type: delete
  targetKind: OBCluster
  targetResName: test

Retry

The configuration example of the retry type of rescue action is as follows. After creating this resource in the cluster, ob-operator sets the status.operationContext.retryCount of the target resource to 0 and sets the status.operationContext.taskStatus to pending. Resources in this state will retry the current task.

# ...
spec:
  type: retry
  targetKind: OBCluster
  targetResName: test

Skip

The configuration example of the skip type of rescue action is as follows. After creating this resource in the cluster, ob-operator directly sets the status.operationContext.taskStatus of the target resource to successful. After receiving this message, the task manager will execute the next task in the tasks field.

# ...
spec:
  type: skip
  targetKind: OBCluster
  targetResName: test

Contact Us