|
| 1 | + |
| 2 | +## Manager ## |
| 3 | + |
| 4 | +### Processing ### |
| 5 | + |
| 6 | +The manager processes tasks in a _main_ loop that runs at a fixed interval (default: 1 second). Each cycle: |
| 7 | +1. Fetch cluster resources using a k8s cached client. |
| 8 | +2. Process queued task delete and cancel requests. |
| 9 | +3. Delete orphaned pods. Orphans are pods within the namespace with task |
| 10 | + labels that do not correspond to a task in the running state. |
| 11 | +4. Fetch running tasks; update their status based on the associated pod/container status. |
| 12 | +5. Kill zombies. Zombies are _sidecar_ containers that have not terminated on their own after the |
| 13 | + _main_ (addon) container has terminated. |
| 14 | +6. Fetch and run new (state=Ready) tasks: |
| 15 | + 1. select addon. See: _Addons.Selection_. |
| 16 | + 2. select extensions. See: _Extensions.Selection_. |
| 17 | + 3. create pod. See: _Pods_. |
| 18 | + |
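The orphan check in step 3 can be sketched as follows. This is a minimal illustration, not the manager's implementation: pods are modeled as plain dictionaries, the `find_orphans` helper is hypothetical, and the label names (`app`, `role`, `task`) are taken from the _Pods_ section.

```python
def find_orphans(pods, running_task_ids):
    """Return names of task pods whose task is not in the running state.

    pods: list of dicts with "name" and "labels" keys (hypothetical shape).
    running_task_ids: set of task ids (as strings) currently running.
    """
    orphans = []
    for pod in pods:
        labels = pod.get("labels", {})
        # Only pods carrying the task labels are candidates.
        if labels.get("app") != "tackle" or labels.get("role") != "task":
            continue
        if labels.get("task") not in running_task_ids:
            orphans.append(pod["name"])
    return orphans


pods = [
    {"name": "task-1", "labels": {"app": "tackle", "role": "task", "task": "1"}},
    {"name": "task-2", "labels": {"app": "tackle", "role": "task", "task": "2"}},
    {"name": "other", "labels": {"app": "db"}},
]
print(find_orphans(pods, {"1"}))
```

Pods without the task labels are ignored, so unrelated workloads in the namespace are never treated as orphans.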
| 19 | +### Priority ### |
| 20 | + |
| 21 | +Tasks with state=Ready are started based on their `Priority` property. |
| 22 | +Priority zero (0) is the lowest and the default. There is no maximum. The manager processes tasks ordered by |
| 23 | +priority. As a result, task pods are created in order of priority. However, after the pod is created, |
| 24 | +the pod scheduling order is at the discretion of the k8s node-scheduler. To maximize the influence of task |
| 25 | +priority ordering, administrators are strongly encouraged to create a k8s _Resource Quota_ in the |
| 26 | +namespace to restrict the number of pods created. |
| 27 | + |
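As a rough sketch of the ordering described above, ready tasks can be sorted so the highest priority comes first. The `Task` class and the tie-break on task id are assumptions made for illustration; the document does not say how ties are resolved.

```python
from dataclasses import dataclass


@dataclass
class Task:
    id: int
    priority: int = 0  # zero is the lowest priority and the default


def start_order(ready):
    """Order ready tasks highest priority first; ties by id (assumed)."""
    return sorted(ready, key=lambda t: (-t.priority, t.id))


tasks = [Task(id=1), Task(id=2, priority=5), Task(id=3, priority=1)]
print([t.id for t in start_order(tasks)])  # [2, 3, 1]
```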
| 28 | +### Resource Quota ### |
| 29 | + |
| 30 | +When a pod cannot be created due to a quota restriction, the manager sets the task state=_QuotaBlocked_ |
| 31 | +and a _QuotaBlocked_ event is reported on the task. The manager retries creating the pod in |
| 32 | +every processing cycle. |
| 33 | + |
| 34 | +### Priority Escalation ### |
| 35 | + |
| 36 | +A task's priority may be escalated (increased) when one of its dependencies is also in the |
| 37 | +set of tasks ready to be started. The goal is to prevent lower priority dependencies |
| 38 | +from impeding higher priority tasks. |
| 39 | + |
| 40 | +Example: |
| 41 | + |
| 42 | +- Task id=10 (kind=A) (priority=0) |
| 43 | +- Task id=12 (kind=B) (priority=1) depends on: `A` |
| 44 | + |
| 45 | +When scheduling both tasks, task(12) cannot run until task(10) has completed. This |
| 46 | +condition effectively makes task(12) priority=0. To prevent this, the manager |
| 47 | +will _escalate_ task(10) to priority=1 to match task(12). |
| 48 | + |
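The escalation example can be sketched as below. The `Task` class, the `escalate` helper, and the `depends_on` mapping (task kind to the kinds it depends on) are hypothetical shapes chosen for illustration; the loop repeats until nothing changes so that chains of dependencies are also covered.

```python
from dataclasses import dataclass


@dataclass
class Task:
    id: int
    kind: str
    priority: int = 0


def escalate(ready, depends_on):
    """Raise each ready dependency to the priority of its ready dependent.

    Returns the ids of the tasks that were escalated.
    """
    escalated = []
    changed = True
    while changed:
        changed = False
        for task in ready:
            for dep_kind in depends_on.get(task.kind, []):
                for dep in ready:
                    if dep.kind == dep_kind and dep.priority < task.priority:
                        dep.priority = task.priority
                        escalated.append(dep.id)
                        changed = True
    return escalated


t10 = Task(id=10, kind="A", priority=0)
t12 = Task(id=12, kind="B", priority=1)
print(escalate([t10, t12], {"B": ["A"]}), t10.priority)  # [10] 1
```

Priorities only ever increase and are bounded by the highest priority in the set, so the loop terminates even if the dependency map contains a cycle.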
| 49 | +### Preemption ### |
| 50 | + |
| 51 | +To prevent priority inversions, the manager supports preempting a _running_ task so that a |
| 52 | +higher priority _ready_ task pod may be scheduled. Preemption is the act of killing (deleting) |
| 53 | +the pod of a _running_ task so that the higher priority _blocked_ task may be created/scheduled |
| 54 | +by the node-scheduler. A task is considered _blocked_ when it cannot be created due to |
| 55 | +a resource quota (state=QuotaBlocked) or cannot be scheduled by the node-scheduler |
| 56 | +(state=Pending) for a defined duration (default: 1 minute). |
| 57 | +To trigger preemption, the _blocked_ task must have Policy.PreemptEnabled=TRUE. When |
| 58 | +the need for preemption is detected, the manager will preempt a percentage (default: 10%) of the |
| 59 | +newest, lower priority tasks each processing cycle. To prevent _thrashing_, a preempted task will |
| 60 | +be postponed for a defined duration (default: 1 minute). |
| 61 | +When a task is preempted: |
| 62 | +1. The pod is deleted. |
| 63 | +2. The task state is reset to Ready. |
| 64 | +3. A `Preempted` event is recorded. |
| 65 | + |
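Victim selection might look like the sketch below. Several details are assumptions, since the text does not specify them: the percentage is applied to the set of eligible lower priority tasks, the count is rounded down with a minimum of one, and tasks with `PreemptExempt` (see _Policies_) are skipped. The `Running` class and `select_victims` helper are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Running:
    id: int
    priority: int
    started: float  # start time, seconds since epoch
    preempt_exempt: bool = False


def select_victims(running, blocked_priority, percent=10):
    """Pick the newest lower priority running tasks to preempt."""
    candidates = [
        t for t in running
        if t.priority < blocked_priority and not t.preempt_exempt
    ]
    if not candidates:
        return []
    candidates.sort(key=lambda t: t.started, reverse=True)  # newest first
    count = max(1, math.floor(len(candidates) * percent / 100))
    return candidates[:count]


running = [Running(id=i, priority=0, started=float(i)) for i in range(20)]
running[19].preempt_exempt = True
print([t.id for t in select_victims(running, blocked_priority=1)])  # [18]
```

Preempting the newest tasks first limits the amount of completed work that is thrown away.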
| 66 | +### Macros ### |
| 67 | + |
| 68 | +The manager supports injecting values into Addon and Extension specifications. |
| 69 | +Each macro has the syntax of: ${_name_}. |
| 70 | + |
| 71 | +Supported: |
| 72 | + |
| 73 | +- ${**seq**:_pool_} - Number sequence generator. The _pool_ is both the sequence identifier and its starting number. |
| 74 | + Example usage for network port assignment: |
| 75 | + ```yaml |
| 76 | + PORT_A: ${seq:8000} |
| 77 | + PORT_B: ${seq:8000} |
| 78 | + ``` |
| 79 | + Results in: |
| 80 | + ```yaml |
| 81 | + PORT_A: 8000 |
| 82 | + PORT_B: 8001 |
| 83 | + ``` |
| 84 | + |
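A minimal sketch of how the `seq` macro could be expanded over a block of text. The regular expression, the `expand` helper, and the scope of the counters (one dictionary per call) are assumptions for illustration only.

```python
import re

# Matches ${seq:<pool>} where the pool is a number.
SEQ = re.compile(r"\$\{seq:(\d+)\}")


def expand(spec_text, counters=None):
    """Replace each ${seq:pool} with the next number in that pool."""
    counters = {} if counters is None else counters

    def next_value(match):
        pool = match.group(1)
        value = counters.get(pool, int(pool))  # pool is also the first value
        counters[pool] = value + 1
        return str(value)

    return SEQ.sub(next_value, spec_text)


text = "PORT_A: ${seq:8000}\nPORT_B: ${seq:8000}\n"
print(expand(text))
```

Passing the same `counters` dictionary to several calls would continue a sequence across specifications, for example across the addon and its extensions.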
| 85 | +### Pods ### |
| 86 | + |
| 87 | +Tasks are executed using Kubernetes Pods. When a task is ready to run (state=Ready), the |
| 88 | +manager creates a Pod resource associated with the task. Task pods have the |
| 89 | +following labels: |
| 90 | +- app:`tackle` |
| 91 | +- role:`task` |
| 92 | +- task: _id_ |
| 93 | + |
| 94 | +The manager injects a few environment variables: |
| 95 | + |
| 96 | +| Name | Definition | |
| 97 | +|--------------|----------------------------------------------------------------------------------------------------| |
| 98 | +| ADDON_HOME | Path to an EmptyDir mounted as the working directory. (default: /addon) | |
| 99 | +| SHARED_PATH | Path to an EmptyDir mounted in all containers within the pod for sharing files. (default: /shared) | |
| 100 | +| CACHE_PATH | Path to a volume mounted in all containers in all pods for cached files. (default: /cache) | |
| 101 | +| HUB_BASE_URL | The hub API base url. | |
| 102 | +| TASK | The task id (to be acted on). | |
| 103 | +| TOKEN | An authentication token for the hub API. | |
| 104 | + |
| 105 | +#### Retention #### |
| 106 | + |
| 107 | +The pod associated with a completed task is retained for a defined duration, after |
| 108 | +which the pod is deleted to prevent leaking pod resources indefinitely. |
| 109 | + |
| 110 | +| State | Retention (default) | |
| 111 | +|-----------|---------------------| |
| 112 | +| Succeeded | 1 (minute) | |
| 113 | +| Failed | 72 (hour) | |
| 114 | + |
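The retention table can be read as a simple lookup, sketched below. The `pod_expired` helper and its arguments are hypothetical; only the default durations come from the table.

```python
from datetime import datetime, timedelta, timezone

# Default retention per terminal state, from the table above.
RETENTION = {
    "Succeeded": timedelta(minutes=1),
    "Failed": timedelta(hours=72),
}


def pod_expired(state, terminated, now):
    """Return True when the pod has outlived its retention period."""
    retention = RETENTION.get(state)
    if retention is None:
        return False  # no retention rule for this state
    return now - terminated > retention


done = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
print(pod_expired("Succeeded", done, done + timedelta(minutes=2)))  # True
print(pod_expired("Failed", done, done + timedelta(hours=1)))       # False
```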
| 115 | +#### Containers #### |
| 116 | + |
| 117 | +The pod is created with a _main_ container (0) for the selected addon using the image |
| 118 | +defined by the Addon CR. Additional _sidecar_ containers are created for each extension |
| 119 | +selected as defined by the Extension CR. After the _main_ (addon) container has terminated, |
| 120 | +the manager will kill extension containers should they not terminate on their own. This is to |
| 121 | +ensure complete termination of the pod after the addon container has terminated. |
| 122 | + |
| 123 | +#### Log Collection #### |
| 124 | + |
| 125 | +The manager _tails_ the log for each container in the task pod. Each is stored as a `File` in the |
| 126 | +inventory and associated with the task as an attachment. The file is named using the |
| 127 | +convention of the _container-name_.yaml. |
| 128 | + |
| 129 | +## Task ## |
| 130 | + |
| 131 | +Tasks are used to execute Addons. |
| 132 | + |
| 133 | +### Properties ### |
| 134 | + |
| 135 | +`*` indicates reported by addon. |
| 136 | + |
| 137 | +| Name | Definition | |
| 138 | +|-------------|--------------------------------------------------------------------------------------------------------------------| |
| 139 | +| ID | Unique identifier. | |
| 140 | +| CreateTime | The timestamp of when the task was created. | |
| 141 | +| CreateUser | The user (name) that created the task. | |
| 142 | +| UpdateUser | The user (name) that last updated the task. | |
| 143 | +| Name | The task name (non-unique). | |
| 144 | +| Kind | The kind references a Task (kind) CR by name. | |
| 145 | +| Addon | The addon to be executed. References an Addon CR by name. When not specified, the addon is selected based on kind. | |
| 146 | +| Extension | The list of extensions to be injected into the addon pod as _sidecar_ containers. | |
| 147 | +| State | The task state. See: _States_. | |
| 148 | +| Locator | The task locator. An arbitrary user-defined value used for lookup. | |
| 149 | +| Priority | The task execution priority. See: _Priority_. | |
| 150 | +| Policy | The task execution policy. Determines when the task is postponed. See: _Policies_. | |
| 151 | +| TTL | The task Time-To-Live in each state. See: _TTL_. | |
| 152 | +| Data | The data provided to the addon. The schema is dictated by each addon. This may be _ANY_ document. | |
| 153 | +| Started | The UTC timestamp when the task execution started. | |
| 154 | +| Terminated | The UTC timestamp when execution completed. | |
| 155 | +| Errors | A list of reported errors. See: _Errors_. | |
| 156 | +| Events | A list of reported task processing events. See: _Events_. | |
| 157 | +| Pod | The fully qualified name of the pod created. | |
| 158 | +| Retries | The number of times failure to create a pod is retried. This does not include when blocked by resource quota. | |
| 159 | +| Attached | Files attached to the task. | |
| 160 | +| \*Activity | The activity (log) entries are reported by the addon. Intended to reflect what the addon is doing. | |
| 161 | +| \*Total | Progress: The total number of items to be completed by the addon. | |
| 162 | +| \*Completed | Progress: The number of items completed by the addon. | | |
| 163 | + |
| 164 | +### Events ### |
| 165 | + |
| 166 | +Task events are used to record and report events related to task lifecycle and scheduling. |
| 167 | + |
| 168 | +Fields: |
| 169 | +- **Kind** - The kind of event. |
| 170 | +- **Count** - The number of times the event is reported. |
| 171 | +- **Reason** - The reason or cause of the event. |
| 172 | +- **Last** - Timestamp when last reported. |
| 173 | + |
| 174 | +| Event | Meaning | |
| 175 | +|-------------------|-------------------------------------------------------| |
| 176 | +| AddonSelected | An addon has been selected. | |
| 177 | +| ExtensionSelected | An extension has been selected. | |
| 178 | +| ImageError | The pod (k8s) reported an image error. | |
| 179 | +| PodNotFound | The pod associated with a running task does not exist. | |
| 180 | +| PodCreated | A pod has been created. | |
| 181 | +| PodPending | The pod (k8s) reported phase=Pending. | |
| 182 | +| PodRunning | The pod (k8s) reported phase=Running. | |
| 183 | +| Preempted | The task has been preempted by the manager. | |
| 184 | +| PodSucceeded | The pod (k8s) has reported phase=Succeeded. | |
| 185 | +| PodFailed | The pod (k8s) has reported phase=Failed. | |
| 186 | +| PodDeleted | The pod has been deleted. | |
| 187 | +| Escalated | The manager has escalated the task priority. | |
| 188 | +| Released | The task's resources have been released. | |
| 189 | +| ContainerKilled | The specified (zombie) container needed to be killed. | |
| 190 | + |
| 191 | +### Errors ### |
| 192 | + |
| 193 | +Task errors are used to report problems with scheduling for execution. |
| 194 | + |
| 195 | +Fields: |
| 196 | +- **Severity** - Error severity. The values are at the discretion of the reporter. |
| 197 | +- **Description** - Error description. Format: (_reporter_) _description_. |
| 198 | + |
| 199 | +Note: A task may complete with state=Succeeded and still have reported errors. |
| 200 | + |
| 201 | +### States ### |
| 202 | + |
| 203 | +`*` indicates _terminal_ states. |
| 204 | + |
| 205 | +| State | Definition | |
| 206 | +|:-------------|:----------------------------------------------------------------------------------------------| |
| 207 | +| Created | The task has been created but not submitted. | |
| 208 | +| Ready | The task has been submitted to the manager and will be scheduled for execution. | |
| 209 | +| Postponed | The task has been postponed until another task has completed based on task scheduling _policy_. | |
| 210 | +| QuotaBlocked | The task pod has been (temporarily) prevented from being created by k8s resource quota. | |
| 211 | +| Pending | The task pod has been created and is awaiting k8s scheduling. | |
| 212 | +| Running | The task pod is running. | |
| 213 | +| \*Succeeded | The task pod successfully completed. | |
| 214 | +| \*Failed | The task pod either failed to be started by k8s or completed with errors. | |
| 215 | +| \*Canceled | The task has been canceled. | |
| 216 | + |
| 217 | + |
| 218 | +### Policies ### |
| 219 | + |
| 220 | +The task supports policies designed to influence scheduling. |
| 221 | + |
| 222 | +| Name | Definition | |
| 223 | +|----------------|----------------------------------------------------------| |
| 224 | +| Isolated | ALL other tasks are postponed while the task is running. | |
| 225 | +| PreemptEnabled | When (true), the task _may_ trigger preemption. | |
| 226 | +| PreemptExempt | When (true), the task may NOT be preempted. | |
| 227 | + |
| 228 | + |
| 229 | +### TTL (Time-To-Live) ### |
| 230 | + |
| 231 | +The TTL determines how long a task may exist in a given state before the task |
| 232 | +or associated resources are reaped. |
| 233 | + |
| 234 | +## (Task) Kinds ## |
| 235 | + |
| 236 | +The `Task` CR defines a named kind of task. Each kind may define: |
| 237 | +- **Priority** - The default priority. |
| 238 | +- **Dependencies** - List of dependencies (other task kinds). When created/ready concurrently, |
| 239 | + a task's dependencies must complete before the task is scheduled. |
| 240 | +- **Metadata** - **TBD**. |
| 241 | + |
| 242 | +## Addons ## |
| 243 | + |
| 244 | +An `Addon` CR defines a named addon (aka plugin). It defines functionality provided by an image to |
| 245 | +be executed. The definition includes a container specification and selection criteria. An addon |
| 246 | +may have extensions. See: _Extensions_. |
| 247 | + |
| 248 | +### Selection ### |
| 249 | + |
| 250 | +When a task is created, either the `kind` or the `addon` may be specified. When the |
| 251 | +`addon` is specified, the addon is selected by matching the name. When the `kind` is specified, |
| 252 | +the addon is selected by matching the `Addon.Task` and evaluating the `Addon.Selector`. |
| 253 | + |
| 254 | +## Extensions ## |
| 255 | + |
| 256 | +An extension defines an additional _sidecar_ container to be included in the task pod. |
| 257 | + |
| 258 | +### Selection ### |
| 259 | + |
| 260 | +When a task is created, it may define a list of extensions. When specified, extensions are |
| 261 | +selected by name. When not specified, extensions are selected by matching the `Extension.Addon` |
| 262 | +and evaluating the `Extension.Selector`. The selector includes logical `||` and `&&` operators |
| 263 | +and `()` parens for grouping expressions. |
| 264 | + |
| 265 | +Supported selectors: |
| 266 | +- tag:_category_=_tag_ - match application tags. |
| 267 | + ```yaml |
| 268 | + spec: |
| 269 | + addon: ^(analyzer|tech-discovery)$ |
| 270 | + selector: tag:Language=Java |
| 271 | + ``` |
| 272 | + |
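To illustrate how a selector with `||`, `&&` and `()` could be evaluated against an application's tags, here is a small recursive-descent sketch. It is not the hub's parser: the `tokenize` and `matches` helpers are hypothetical, `&&` is assumed to bind tighter than `||`, and categories and tags are assumed to contain no whitespace.

```python
import re

# A token is an operator, a paren, or a tag:<category>=<tag> predicate.
TOKEN = re.compile(r"\s*(\|\||&&|\(|\)|tag:[^\s()|&=]+=[^\s()|&]+)")


def tokenize(selector):
    tokens, pos = [], 0
    selector = selector.strip()
    while pos < len(selector):
        m = TOKEN.match(selector, pos)
        if not m:
            raise ValueError(f"unexpected input: {selector[pos:]!r}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


def matches(selector, tags):
    """Evaluate selector against tags, a set of (category, tag) tuples."""
    tokens = tokenize(selector)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        tok = peek()
        pos += 1
        return tok

    def parse_or():
        value = parse_and()
        while peek() == "||":
            take()
            rhs = parse_and()
            value = value or rhs
        return value

    def parse_and():
        value = parse_term()
        while peek() == "&&":
            take()
            rhs = parse_term()
            value = value and rhs
        return value

    def parse_term():
        tok = take()
        if tok == "(":
            value = parse_or()
            if take() != ")":
                raise ValueError("missing ')'")
            return value
        if tok is None or not tok.startswith("tag:"):
            raise ValueError(f"unexpected token: {tok!r}")
        category, _, tag = tok[len("tag:"):].partition("=")
        return (category, tag) in tags

    result = parse_or()
    if peek() is not None:
        raise ValueError(f"unexpected token: {peek()!r}")
    return result


app_tags = {("Language", "Java"), ("OS", "Windows")}
print(matches("tag:Language=Java && (tag:OS=Linux || tag:OS=Windows)", app_tags))
```

Each right-hand side is parsed before it is combined, so the whole expression is always consumed and syntax errors are reported even when the result is already decided.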
| 273 | +## Authorization ## |
| 274 | + |
| 275 | +When the task pod is created and _Auth_ is enabled, a token is generated with the |
| 276 | +necessary scopes. The token is mounted as a secret in the pod. The token is only |
| 277 | +valid while the task is running. |
| 278 | + |
| 279 | +## Reaping ## |
| 280 | + |
| 281 | +A task may be reaped after existing in a state for the defined duration. |
| 282 | +This is to prevent orphaned or stuck tasks from leaking resources such as buckets and files. |
| 283 | + |
| 284 | +| State | Duration (default) | Action | |
| 285 | +|-----------|--------------------|----------| |
| 286 | +| Created | 72 (hour) | Deleted | |
| 287 | +| Succeeded | 72 (hour) | Deleted | |
| 288 | +| Failed | 30 (day) | Released | |
| 289 | + |