@@ -91,10 +91,12 @@ To enable the DRA feature, you must enable the following feature gates and API g

<!-- lessoncontent -->

-## Explore the DRA initial state
+## Explore the initial cluster state {#explore-initial-state}

-With no driver installed or Pod claims yet to satisfy, you can observe the
-initial state of a cluster with DRA enabled.
+You can spend some time observing the initial state of a cluster with DRA
+enabled, especially if you have not used these APIs extensively before. If you
+set up a new cluster for this tutorial, with no driver installed and no Pod
+claims yet to satisfy, the output of these commands won't show any resources.

1. Get a list of {{< glossary_tooltip text="DeviceClasses" term_id="deviceclass" >}}:

@@ -106,10 +108,6 @@ initial state of a cluster with DRA enabled.
   No resources found
   ```

-   If you set up a new blank cluster for this tutorial, it's normal to find that
-   there are no DeviceClasses. [Learn more about DeviceClasses
-   here.](/docs/concepts/scheduling-eviction/dynamic-resource-allocation/#deviceclass)
-
1. Get a list of {{< glossary_tooltip text="ResourceSlices" term_id="resourceslice" >}}:

   ```shell
@@ -120,11 +118,7 @@ initial state of a cluster with DRA enabled.
   No resources found
   ```

-   If you set up a new blank cluster for this tutorial, it's normal to find that
-   there are no ResourceSlices advertised. [Learn more about ResourceSlices
-   here.](/docs/concepts/scheduling-eviction/dynamic-resource-allocation/#resourceslice)
-
-1. View {{< glossary_tooltip text="ResourceClaims" term_id="resourceclaim" >}} and {{<
+1. Get a list of {{< glossary_tooltip text="ResourceClaims" term_id="resourceclaim" >}} and {{<
glossary_tooltip text="ResourceClaimTemplates" term_id="resourceclaimtemplate"
>}}

@@ -138,12 +132,6 @@ glossary_tooltip text="ResourceClaimTemplates" term_id="resourceclaimtemplate"
   No resources found
   ```

-   If you set up a new blank cluster for this tutorial, it's normal to find that
-   there are no ResourceClaims or ResourceClaimTemplates as you, the user, have
-   not created any. [Learn more about ResourceClaims and ResourceClaimTemplates
-   here.](/docs/concepts/scheduling-eviction/dynamic-resource-allocation/#resourceclaims-templates)
-
-
At this point, you have confirmed that DRA is enabled and configured properly in
the cluster, and that no DRA drivers have advertised any resources to the DRA
APIs yet.
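+
+As an additional check (not one of the original steps; `kubectl api-versions`
+is a standard kubectl command), you can confirm that the `resource.k8s.io` API
+group is being served:
+
+```shell
+kubectl api-versions | grep resource.k8s.io
+```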
@@ -158,15 +146,22 @@ selection of the nodes (using {{< glossary_tooltip text="selectors"
term_id="selector" >}} or similar mechanisms) in your cluster.

Check your driver's documentation for specific installation instructions, which
-may include a Helm chart, a set of manifests, or other deployment tooling.
+might include a Helm chart, a set of manifests, or other deployment tooling.
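+
+For example, a vendor that distributes its driver as a Helm chart might have
+you run something like the following sketch (the repository, chart, and release
+names are illustrative, not from any specific vendor):
+
+```shell
+# Hypothetical example of installing a vendor's DRA driver chart
+helm install dra-driver example-vendor/dra-driver --namespace dra-driver --create-namespace
+```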

This tutorial uses an example driver which can be found in the
[kubernetes-sigs/dra-example-driver](https://github.com/kubernetes-sigs/dra-example-driver)
-repository to demonstrate driver installation.
+repository to demonstrate driver installation. This example driver advertises
+simulated GPUs to Kubernetes for your Pods to interact with.

-### Prepare your cluster for driver installation
+### Prepare your cluster for driver installation {#prepare-cluster-driver}
+
+To simplify cleanup, the steps in this section use a namespace named
+`dra-tutorial`:
+
+1. Create the namespace:

-To make it easier to cleanup later, create a namespace called `dra-tutorial` in your cluster.
+   ```shell
+   kubectl create namespace dra-tutorial
+   ```

In a production environment, you would likely be using a previously released or
qualified image from the driver vendor or your own organization, and your nodes
@@ -175,12 +170,6 @@ hosted. In this tutorial, you will use a publicly released image of the
dra-example-driver to simulate access to a DRA driver image.


-1. Create the namespace:
-
-   ```shell
-   kubectl create namespace dra-tutorial
-   ```
-
1. Confirm your nodes have access to the image by running the following
   from within one of your cluster's nodes:
186175
@@ -231,12 +220,10 @@ on this cluster:
231220 ```
232221
2332221. Create a {{< glossary_tooltip term_id="priority-class" >}} for the DRA
234- driver. The DRA driver component is responsible for important lifecycle
235- operations for Pods with claims, so you don' t want it to be preempted. Learn
236- more about [pod priority and preemption
237- here](/docs/concepts/scheduling-eviction/pod-priority-preemption/). Learn
238- more about [good practices when maintaining a DRA driver
239- here](/docs/concepts/cluster-administration/dra/).
223+ driver. The PriorityClass prevents preemption of th DRA driver component,
224+ which is responsible for important lifecycle operations for Pods with
225+ claims. Learn more about [pod priority and preemption
226+ here](/docs/concepts/scheduling-eviction/pod-priority-preemption/).
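+
+   The sample file referenced below contains the actual manifest. As an
+   illustrative sketch only (the name, value, and description here are
+   assumptions, not the sample file's contents), a PriorityClass for a DRA
+   driver might look like:
+
+   ```yaml
+   apiVersion: scheduling.k8s.io/v1
+   kind: PriorityClass
+   metadata:
+     name: dra-driver-high-priority
+   value: 1000000            # high value so that driver Pods are not preempted
+   globalDefault: false      # only Pods that reference this class receive it
+   description: "Priority class for the DRA example driver."
+   ```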

   {{% code_sample language="yaml" file="dra/driver-install/priorityclass.yaml" %}}

@@ -245,21 +232,22 @@ on this cluster:
   ```

1. Deploy the actual DRA driver as a DaemonSet configured to run the example
-   driver binary with the permissions provisioned above.
+   driver binary. The DaemonSet runs with the permissions that you granted to
+   the ServiceAccount in the previous steps.

   {{% code_sample language="yaml" file="dra/driver-install/daemonset.yaml" %}}

   ```shell
   kubectl apply --server-side -f http://k8s.io/examples/dra/driver-install/daemonset.yaml
   ```
-   It is configured with
+
+   The DaemonSet is configured with
   the volume mounts necessary to interact with the underlying Container Device
-   Interface (CDI) directory, and to expose its socket to kubelet via the
-   kubelet plugins directory.
+   Interface (CDI) directory, and to expose its socket to `kubelet` via the
+   `kubelet/plugins` directory.
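+
+   As an illustrative sketch (the mount names and paths are typical for DRA
+   drivers but are assumptions here; check the sample manifest for the real
+   values), the relevant mounts look like:
+
+   ```yaml
+   volumeMounts:
+   - name: plugins-registry
+     mountPath: /var/lib/kubelet/plugins_registry   # kubelet plugin discovery
+   - name: plugins
+     mountPath: /var/lib/kubelet/plugins            # driver socket for kubelet
+   - name: cdi
+     mountPath: /var/run/cdi                        # generated CDI spec files
+   ```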

-### Verify the DRA driver installation
+### Verify the DRA driver installation {#verify-driver-install}

-1. Observe the Pods of the DRA driver DaemonSet across all worker nodes:
+1. Get a list of the Pods of the DRA driver DaemonSet across all worker nodes:

   ```shell
   kubectl get pod -l app.kubernetes.io/name=dra-example-driver -n dra-tutorial
@@ -293,7 +281,7 @@ At this point, you have successfully installed the example DRA driver, and
confirmed its initial configuration. You're now ready to use DRA to schedule
Pods.

-## Claim resources and deploy a Pod
+## Claim resources and deploy a Pod {#claim-resources-pod}

To request resources using DRA, you create ResourceClaims or
ResourceClaimTemplates that define the resources that your Pods need. In the
@@ -309,12 +297,11 @@ learn more about ResourceClaims.

### Create the ResourceClaim

-The Pod manifest itself will include a reference to its relevant ResourceClaim
-object, which you will create now. Whatever the claim, the `deviceClassName` is
-a required field, narrowing down the scope of the request to a specific device
-class. The request itself can include a {{< glossary_tooltip term_id="cel" >}}
-expression that references attributes that may be advertised by the driver
-managing that device class.
+In this section, you create a ResourceClaim and reference it in a Pod. In any
+ResourceClaim, `deviceClassName` is a required field that narrows the scope of
+the request to a specific device class. The request can also include a {{<
+glossary_tooltip term_id="cel" >}} expression that references attributes that
+might be advertised by the driver managing that device class.
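+
+As a sketch of the shape of such a claim (illustrative only; the tutorial's own
+sample file contains the actual manifest, and the device class name and API
+version shown here are assumptions based on the example driver):
+
+```yaml
+apiVersion: resource.k8s.io/v1beta1
+kind: ResourceClaim
+metadata:
+  name: some-gpu
+spec:
+  devices:
+    requests:
+    - name: some-gpu
+      deviceClassName: gpu.example.com
+      selectors:
+      - cel:
+          # Match devices that advertise more than 10Gi of memory capacity
+          expression: "device.capacity['gpu.example.com'].memory.compareTo(quantity('10Gi')) > 0"
+```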

In this example, you will create a request for any GPU advertising over 10Gi
memory capacity. The attribute exposing capacity from the example driver takes
@@ -341,20 +328,6 @@ underlying container.
kubectl apply --server-side -f http://k8s.io/examples/dra/driver-install/example/pod.yaml
```

-### Explore the DRA state
-
-The cluster now tries to schedule that Pod to a node where Kubernetes can
-satisfy the ResourceClaim. In our situation, the DRA driver is deployed on all
-nodes, and is advertising mock GPUs on all nodes, all of which have enough
-capacity advertised to satisfy the Pod's claim, so this Pod may be scheduled to
-any node and any of the mock GPUs on that node may be allocated.
-
-The mock GPU driver injects environment variables in each container it is
-allocated to in order to indicate which GPUs _would_ have been injected into
-them by a real resource driver and how they would have been configured, so you
-can check those environment variables to see how the Pods have been handled by
-the system.
-
1. Confirm the pod has deployed:

   ```shell
@@ -367,7 +340,22 @@ the system.
   pod0   1/1     Running   0          9s
   ```

-1. Observe the pod logs which report the name of the mock GPU allocated:
+### Explore the DRA state
+
+After you create the Pod, the cluster tries to schedule that Pod to a node where
+Kubernetes can satisfy the ResourceClaim. In this tutorial, the DRA driver is
+deployed on all nodes and advertises mock GPUs on every node, all of which have
+enough advertised capacity to satisfy the Pod's claim, so Kubernetes can
+schedule this Pod on any node and can allocate any of the mock GPUs on that
+node.
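+
+To see which node the Pod landed on, you can use standard kubectl output
+options (an extra check, not one of the tutorial's steps):
+
+```shell
+kubectl get pod pod0 -n dra-tutorial -o wide
+```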
+
+When Kubernetes allocates a mock GPU to a Pod, the example driver sets
+environment variables in each container that the GPU is allocated to. These
+variables indicate which GPUs a real resource driver _would_ have injected and
+how it would have configured them, so you can check those environment variables
+to see how the system handled the Pods.
+
+1. Check the Pod logs, which report the name of the mock GPU that was allocated:

   ```shell
   kubectl logs pod0 -c ctr0 -n dra-tutorial | grep -E "GPU_DEVICE_[0-9]+=" | grep -v "RESOURCE_CLAIM"
@@ -378,10 +366,7 @@ the system.
   declare -x GPU_DEVICE_4="gpu-4"
   ```

-1. Observe the ResourceClaim object:
-
-   You can observe the ResourceClaim more closely, first only to see its state
-   is allocated and reserved.
+1. Check the state of the ResourceClaim object:

   ```shell
   kubectl get resourceclaims -n dra-tutorial
@@ -394,9 +379,12 @@ the system.
   some-gpu   allocated,reserved   34s
   ```

-   Looking deeper at the `some-gpu` ResourceClaim, you can see that the status
-   stanza includes information about the device that has been allocated and for
-   what pod it has been reserved for:
+   In this output, the `STATE` column shows that the ResourceClaim is allocated
+   and reserved.
+
+1. Check the details of the `some-gpu` ResourceClaim. The `status` stanza of
+   the ResourceClaim has information about the allocated device and the Pod it
+   has been reserved for:

   ```shell
   kubectl get resourceclaim some-gpu -n dra-tutorial -o yaml
@@ -453,8 +441,8 @@ the system.
     resourceVersion: ""
   {{< /highlight >}}

-1. Observe the driver by checking the pod logs for pods backing the driver
-   daemonset:
+1. To check how the driver handled device allocation, get the logs for the
+   driver DaemonSet Pods:

   ```shell
   kubectl logs -l app.kubernetes.io/name=dra-example-driver -n dra-tutorial
@@ -466,17 +454,16 @@ the system.
   I0729 05:11:52.684450       1 driver.go:112] Returning newly prepared devices for claim '79e1e8d8-7e53-4362-aad1-eca97678339e': [&Device{RequestNames:[some-gpu],PoolName:kind-worker,DeviceName:gpu-4,CDIDeviceIDs:[k8s.gpu.example.com/gpu=common k8s.gpu.example.com/gpu=79e1e8d8-7e53-4362-aad1-eca97678339e-gpu-4],}]
   ```

-You have now successfully deployed a Pod with a DRA based claim, and seen it
-scheduled to an appropriate node and the associated DRA APIs updated to reflect
-its status.
+You have now successfully deployed a Pod that claims devices using DRA, verified
+that the Pod was scheduled to an appropriate node, and confirmed that the
+associated DRA API kinds were updated with the allocation status.

-## Remove the Pod with a claim
+## Delete a Pod that has a claim {#delete-pod-claim}

When a Pod with a claim is deleted, the DRA driver deallocates the resource so
-it can be available for future scheduling. You can observe that by deleting our
-pod with a claim and seeing that the state of the ResourceClaim changes.
-
-### Delete the pod using the resource claim
+it can be available for future scheduling. To validate this behavior, delete the
+Pod that you created in the previous steps and watch the corresponding changes
+to the ResourceClaim and driver.

1. Delete the `pod0` Pod:
