Commit 5040af2

👻 Add task package README. (#785)
Signed-off-by: Jeff Ortel <jortel@redhat.com>
1 parent d438f39 commit 5040af2

1 file changed

+289
-0
lines changed

task/README.md

@@ -0,0 +1,289 @@

## Manager ##

### Processing ###

The manager processes tasks in a _main_ loop, one cycle at a time (default interval: 1 second). Each cycle:
1. Fetch cluster resources using a k8s cached client.
2. Process queued task delete and cancel requests.
3. Delete orphaned pods. Orphans are pods within the namespace with task
   labels that do not correspond to a task in the running state.
4. Fetch running tasks; update their status based on the associated pod/container status.
5. Kill zombies. Zombies are _sidecar_ containers that have not terminated on their own after the
   _main_ (addon) container has terminated.
6. Fetch and run new (state=Ready) tasks:
   1. select addon. See: _Addons.Selection_.
   2. select extensions. See: _Extensions.Selection_.
   3. create pod. See: _Pods_.

### Priority ###

Tasks with state=Ready are started based on their `Priority` property.
Priority zero (0) is the lowest and the default. There is no maximum. The manager processes tasks ordered by
priority. As a result, task pods are created in the order of priority. However, after the pod is created,
the pod scheduling order is at the discretion of the k8s node-scheduler. To maximize the influence of task
priority ordering, it is highly recommended that administrators create a k8s _Resource Quota_ in the
namespace to restrict the number of pods created.

### Resource Quota ###

When a pod cannot be created due to a quota restriction, the manager sets the task state=_QuotaBlocked_
and a _QuotaBlocked_ event is reported on the task. The manager will retry creating the pod in
every processing cycle.

### Priority Escalation ###

A task's priority may be escalated (increased) when a higher priority task that depends on it
is also in the set of tasks ready to be started. The goal is to prevent lower priority dependencies
from impeding higher priority tasks.

Example:

- Task id=10 (kind=A) (priority=0)
- Task id=12 (kind=B) (priority=1) depends on: `A`

When scheduling both tasks, task(12) cannot run until task(10) has completed. This
condition effectively makes task(12) priority=0. To prevent this, the manager
will _escalate_ task(10) to priority=1 to match task(12).
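
The escalation rule in the example above can be sketched as follows. This is only an illustration under stated assumptions, not the manager's implementation: the `Task` class, the `escalate` function, and the shape of the `dependencies` mapping (task kind to the kinds it depends on) are hypothetical, and propagating escalation through chains of dependencies is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Task:
    id: int
    kind: str
    priority: int = 0


def escalate(ready, dependencies):
    """Raise each dependency's priority to match the ready tasks that depend on it.

    `dependencies` maps a task kind to the kinds it depends on (hypothetical shape).
    Returns the ids of the tasks that were escalated.
    """
    escalated = set()
    changed = True
    while changed:  # repeat so escalation propagates through chains (assumption)
        changed = False
        for task in ready:
            for dep_kind in dependencies.get(task.kind, []):
                for dep in ready:
                    if dep.kind == dep_kind and dep.priority < task.priority:
                        dep.priority = task.priority
                        escalated.add(dep.id)
                        changed = True
    return escalated


# The example from the text: task(12) of kind B depends on kind A.
ready = [Task(id=10, kind="A", priority=0), Task(id=12, kind="B", priority=1)]
escalated = escalate(ready, {"B": ["A"]})
# Highest priority first, as in the manager's processing order.
ordered = sorted(ready, key=lambda t: t.priority, reverse=True)
```

After the call, task(10) has priority=1. Escalation only changes the ordering; task(12) still has to wait until task(10) has completed.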

### Preemption ###

To prevent priority inversions, the manager supports preempting a _running_ task so that a
higher priority _ready_ task pod may be scheduled. Preemption is the act of killing (deleting)
the pod of a _running_ task so that the higher priority _blocked_ task may be created/scheduled
by the node-scheduler. A task is considered _blocked_ when it cannot be created due to
a resource quota (state=QuotaBlocked) or cannot be scheduled by the node-scheduler
(state=Pending) for a defined duration (default: 1 minute).

To trigger preemption, the _blocked_ task must have Policy.PreemptEnabled=TRUE. When
the need for preemption is detected, the manager will preempt a percentage (default: 10%) of the
newest, lower priority tasks each processing cycle. To prevent _thrashing_, a preempted task will
be postponed for a defined duration (default: 1 minute).

When a task is preempted:
1. The pod is deleted.
2. The task state is reset to Ready.
3. A `Preempted` event is recorded.
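
A minimal sketch of how victims might be chosen for one blocked task. The `Task` class, `select_victims`, and `preempt` are hypothetical names. Taking the percentage of the eligible candidates (rather than of all running tasks) and always preempting at least one task are assumptions; the text does not specify either.

```python
import math
from dataclasses import dataclass


@dataclass
class Task:
    id: int
    priority: int
    state: str
    started: float  # start time in seconds; larger means newer
    preempt_enabled: bool = False
    preempt_exempt: bool = False


def select_victims(blocked, running, percent=10):
    """Pick the running tasks to preempt on behalf of one blocked task."""
    if not blocked.preempt_enabled:
        return []
    candidates = [
        t for t in running
        if t.state == "Running"
        and t.priority < blocked.priority
        and not t.preempt_exempt
    ]
    if not candidates:
        return []
    # Newest first; take the configured percentage, but at least one (assumption).
    candidates.sort(key=lambda t: t.started, reverse=True)
    count = max(1, math.floor(len(candidates) * percent / 100))
    return candidates[:count]


def preempt(task, events):
    """Apply the documented effects; deleting the pod is not modeled here."""
    task.state = "Ready"
    events.append((task.id, "Preempted"))
```

The postponement that prevents _thrashing_ is left out of the sketch; it would need a per-task timestamp checked before the task is started again.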

### Macros ###

The manager supports injecting values into Addon and Extension specifications.
Each macro has the syntax: ${_name_}.

Supported:

- ${**seq**:_pool_} - Number sequence generator. The _pool_ is both the identifier and the starting number.
Example usage for network port assignment:
```yaml
PORT_A: ${seq:8000}
PORT_B: ${seq:8000}
```
Results in:
```yaml
PORT_A: 8000
PORT_B: 8001
```
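
The expansion shown above can be sketched as follows. The `Sequences` class and `expand` function are hypothetical, and so is the scope of a sequence (here, one `Sequences` object shared by all values being expanded); the sketch also assumes the pool is always a number.

```python
import re

SEQ_MACRO = re.compile(r"\$\{seq:(\d+)\}")


class Sequences:
    """Tracks the next value of each pool; a pool is keyed by its starting number."""

    def __init__(self):
        self._next = {}

    def next(self, pool):
        value = self._next.get(pool, int(pool))
        self._next[pool] = value + 1
        return value


def expand(text, sequences):
    """Replace each ${seq:pool} in text with the next number from that pool."""
    return SEQ_MACRO.sub(lambda m: str(sequences.next(m.group(1))), text)


seqs = Sequences()
env = {"PORT_A": "${seq:8000}", "PORT_B": "${seq:8000}", "PORT_C": "${seq:9000}"}
expanded = {name: expand(value, seqs) for name, value in env.items()}
```

Here `PORT_A` and `PORT_B` draw from the same pool and receive 8000 and 8001, while `PORT_C` starts a separate pool at 9000.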

### Pods ###

Tasks are executed using Kubernetes Pods. When a task is ready to run (state=Ready), the
manager creates a Pod resource which is associated with the task. Task pods have the
following labels:
- app:`tackle`
- role:`task`
- task: _id_

The manager injects a few environment variables:

| Name | Definition |
|--------------|----------------------------------------------------------------------------------------------------|
| ADDON_HOME | Path to an EmptyDir mounted as the working directory. (default: /addon) |
| SHARED_PATH | Path to an EmptyDir mounted in all containers within the pod for sharing files. (default: /shared) |
| CACHE_PATH | Path to a volume mounted in all containers in all pods for cached files. (default: /cache) |
| HUB_BASE_URL | The hub API base url. |
| TASK | The task id (to be acted on). |
| TOKEN | An authentication token for the hub API. |

#### Retention ####

The pod associated with a completed task is retained for a defined duration, after
which the pod is deleted to prevent leaking pod resources indefinitely.

| State | Retention (default) |
|-----------|---------------------|
| Succeeded | 1 (minute) |
| Failed | 72 (hour) |

#### Containers ####

The pod is created with a _main_ container (0) for the selected addon using the image
defined by the Addon CR. Additional _sidecar_ containers are created for each selected
extension as defined by the Extension CR. After the _main_ (addon) container has terminated,
the manager will kill extension containers should they not terminate on their own. This is to
ensure complete termination of the pod after the addon container has terminated.

#### Log Collection ####

The manager _tails_ the log for each container in the task pod. Each is stored as a `File` in the
inventory and associated with the task as an attachment. The file is named using the
convention: _container-name_.yaml.

## Task ##

Tasks are used to execute Addons.

### Properties ###

`*` indicates reported by addon.

| Name | Definition |
|-------------|--------------------------------------------------------------------------------------------------------------------|
| ID | Unique identifier. |
| CreateTime | The timestamp of when the task was created. |
| CreateUser | The user (name) that created the task. |
| UpdateUser | The user (name) that last updated the task. |
| Name | The task name (non-unique). |
| Kind | The kind references a Task (kind) CR by name. |
| Addon | The addon to be executed. References an Addon CR by name. When not specified, the addon is selected based on kind. |
| Extension | The list of extensions to be injected into the addon pod as _sidecar_ containers. |
| State | The task state. See: _States_. |
| Locator | The task locator. An arbitrary user-defined value used for lookup. |
| Priority | The task execution priority. See: _Priority_. |
| Policy | The task execution policy. Determines when the task is postponed. See: _Policies_. |
| TTL | The task Time-To-Live in each state. See: _TTL_. |
| Data | The data provided to the addon. The schema is dictated by each addon. This may be _ANY_ document. |
| Started | The UTC timestamp when the task execution started. |
| Terminated | The UTC timestamp when execution completed. |
| Errors | A list of reported errors. See: _Errors_. |
| Events | A list of reported task processing events. See: _Events_. |
| Pod | The fully qualified name of the pod created. |
| Retries | The number of times failure to create a pod is retried. This does not include when blocked by resource quota. |
| Attached | Files attached to the task. |
| \*Activity | The activity (log) entries are reported by the addon. Intended to reflect what the addon is doing. |
| \*Total | Progress: The total number of items to be completed by the addon. |
| \*Completed | Progress: The number of items completed by the addon. |

### Events ###

Task events are used to record and report events related to task lifecycle and scheduling.

Fields:
- **Kind** - The kind of event.
- **Count** - The number of times the event is reported.
- **Reason** - The reason or cause of the event.
- **Last** - Timestamp when last reported.

| Event | Meaning |
|-------------------|-------------------------------------------------------|
| AddonSelected | An addon has been selected. |
| ExtensionSelected | An extension has been selected. |
| ImageError | The pod (k8s) reported an image error. |
| PodNotFound | The pod associated with a running task does not exist. |
| PodCreated | A pod has been created. |
| PodPending | The pod (k8s) reported phase=Pending. |
| PodRunning | The pod (k8s) reported phase=Running. |
| Preempted | The task has been preempted by the manager. |
| PodSucceeded | The pod (k8s) has reported phase=Succeeded. |
| PodFailed | The pod (k8s) has reported phase=Failed. |
| PodDeleted | The pod has been deleted. |
| Escalated | The manager has escalated the task priority. |
| Released | The task's resources have been released. |
| ContainerKilled | The specified (zombie) container needed to be killed. |

### Errors ###

Task errors are used to report problems with scheduling for execution.

Fields:
- **Severity** - Error severity. The values are at the discretion of the reporter.
- **Description** - Error description. Format: (_reporter_) _description_.

Note: A task may complete with state=Succeeded and still have reported errors.

### States ###

`*` indicates _terminal_ states.

| State | Definition |
|:-------------|:----------------------------------------------------------------------------------------------|
| Created | The task has been created but not submitted. |
| Ready | The task has been submitted to the manager and will be scheduled for execution. |
| Postponed | The task has been postponed until another task has completed based on task scheduling _policy_. |
| QuotaBlocked | The task pod has been (temporarily) prevented from being created by k8s resource quota. |
| Pending | The task pod has been created and is awaiting k8s scheduling. |
| Running | The task pod is running. |
| \*Succeeded | The task pod successfully completed. |
| \*Failed | The task pod either failed to be started by k8s or completed with errors. |
| \*Canceled | The task has been canceled. |

### Policies ###

The task supports policies designed to influence scheduling.

| Name | Definition |
|----------------|----------------------------------------------------------|
| Isolated | ALL other tasks are postponed while the task is running. |
| PreemptEnabled | When (true), the task _may_ trigger preemption. |
| PreemptExempt | When (true), the task may NOT be preempted. |

### TTL (Time-To-Live) ###

The TTL determines how long a task may exist in a given state before the task
or associated resources are reaped.

## (Task) Kinds ##

The `Task` CR defines a named kind of task. Each kind may define:
- **Priority** - The default priority.
- **Dependencies** - List of dependencies (other task kinds). When created/ready concurrently,
  a task's dependencies must complete before the task is scheduled.
- **Metadata** - **TBD**.

## Addons ##

An `Addon` CR defines a named addon (aka plugin). It defines functionality provided by an image to
be executed. The definition includes a container specification and selection criteria. An addon
may have extensions. See: _Extensions_.

### Selection ###

When a task is created, either the `kind` or the `addon` may be specified. When the
`addon` is specified, the addon is selected by matching the name. When the `kind` is specified,
the addon is selected by matching the `Addon.Task` and evaluating the `Addon.Selector`.

## Extensions ##

An extension defines an additional _sidecar_ container to be included in the task pod.

### Selection ###

When a task is created, it may define a list of extensions. When specified, extensions are
selected by name. When not specified, extensions are selected by matching the `Extension.Addon`
and evaluating the `Extension.Selector`. The selector includes logical `||` and `&&` operators
and `()` parens for grouping expressions.

Supported selector:
- tag:_category_=_tag_ - match application tags.
```yaml
spec:
addon: ^(analyzer|tech-discovery)$
selector: tag:Language=Java
```
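
A minimal sketch of how such a selector could be evaluated against an application's tags. Everything here is an assumption made for illustration: the function names, the representation of tags as `(category, tag)` pairs, `&&` binding tighter than `||`, tag names without whitespace, and `Extension.Addon` being treated as a regular expression matched against the addon name (as the pattern in the example suggests).

```python
import re

TOKEN = re.compile(r"\s*(\|\||&&|\(|\)|tag:[^=()|&\s]+=[^()|&\s]+)")


def tokenize(selector):
    """Split a selector into operators, parens and tag:category=tag terms."""
    tokens, pos = [], 0
    selector = selector.strip()
    while pos < len(selector):
        match = TOKEN.match(selector, pos)
        if not match:
            raise ValueError(f"unexpected input at: {selector[pos:]!r}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def evaluate(selector, tags):
    """Evaluate a selector against a set of (category, tag) pairs."""
    tokens = tokenize(selector)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        token = peek()
        pos += 1
        return token

    def parse_or():
        value = parse_and()
        while peek() == "||":
            take()
            rhs = parse_and()  # always parse the right side to consume its tokens
            value = value or rhs
        return value

    def parse_and():
        value = parse_atom()
        while peek() == "&&":
            take()
            rhs = parse_atom()
            value = value and rhs
        return value

    def parse_atom():
        token = take()
        if token == "(":
            value = parse_or()
            if take() != ")":
                raise ValueError("missing ')'")
            return value
        if token and token.startswith("tag:"):
            category, _, tag = token[len("tag:"):].partition("=")
            return (category, tag) in tags
        raise ValueError(f"unexpected token: {token!r}")

    result = parse_or()
    if peek() is not None:
        raise ValueError(f"unexpected token: {peek()!r}")
    return result


def extension_matches(spec, addon_name, tags):
    """True when the extension spec applies to the addon and the application tags."""
    return re.search(spec["addon"], addon_name) is not None and evaluate(spec["selector"], tags)
```

With the spec from the example above, `extension_matches` accepts the addon named `analyzer` for an application tagged `Language=Java` and rejects an addon with any name other than `analyzer` or `tech-discovery`.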

## Authorization ##

When the task pod is created and _Auth_ is enabled, a token is generated with the
necessary scopes. The token is mounted as a secret in the pod. The token is only
valid while the task is running.

## Reaping ##

A task may be reaped after existing in a state for the defined duration.
This is to prevent orphaned or stuck tasks from leaking resources such as buckets and files.

| State | Duration (default) | Action |
|-----------|--------------------|----------|
| Created | 72 (hour) | Deleted |
| Succeeded | 72 (hour) | Deleted |
| Failed | 30 (day) | Released |
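
The table above can be read as a lookup from state to a duration and an action. The sketch below illustrates that reading; `REAPING` and `reap_action` are hypothetical names, and measuring the duration from the time the task entered the state is an assumption.

```python
from datetime import datetime, timedelta, timezone

# (duration, action) per state; defaults taken from the table above.
REAPING = {
    "Created": (timedelta(hours=72), "Deleted"),
    "Succeeded": (timedelta(hours=72), "Deleted"),
    "Failed": (timedelta(days=30), "Released"),
}


def reap_action(state, entered, now=None):
    """Return the action due for a task, or None when nothing is due.

    `entered` is the time the task entered `state` (assumption).
    """
    rule = REAPING.get(state)
    if rule is None:
        return None  # states not listed in the table are never reaped here
    duration, action = rule
    now = now or datetime.now(timezone.utc)
    return action if now - entered >= duration else None
```

A per-task TTL (see: _TTL_) would presumably override these defaults; that is not modeled here.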