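- Environment
- OS: CentOS 7.9
- Kubernetes version: 1.29
- containerd version: 1.6.33
- Installation notes
- If the data path is changed after the NVIDIA plugin has been installed, the servers must be rebooted
- Recommended order: change the data path first, then install the NVIDIA plugin
- References
1. Changing the containerd data path
- Change the containerd socket path
- Create the directory for the new socket path
- Always create the root and state directories separately
- Example layout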
- lib = /var/lib/containerd
- state = /run/containerd
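The directory step above can be sketched as a small shell helper. This is only a minimal sketch, not necessarily the exact procedure used here: the function name `make_containerd_dirs` is hypothetical, and the `/data/containerd` base path in the usage comment is taken from the modified config.toml in this post.

```shell
# Hypothetical helper: create separate root (lib) and state directories
# under a new base directory for containerd.
make_containerd_dirs() {
  base="$1"                              # e.g. /data/containerd (assumed base path)
  mkdir -p "$base/lib" "$base/state"     # root data and runtime state must not share a directory
}

# Usage (as root, on every node whose data path is being moved):
#   make_containerd_dirs /data/containerd
```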
- Default config.toml
disabled_plugins = []
imports = []
oom_score = 0
plugin_dir = ""
required_plugins = []
root = "/var/lib/containerd"
state = "/run/containerd"
temp = ""
version = 2
[cgroup]
path = ""
[debug]
address = ""
format = ""
gid = 0
level = ""
uid = 0
[grpc]
address = "/run/containerd/containerd.sock"
gid = 0
max_recv_message_size = 16777216
max_send_message_size = 16777216
tcp_address = ""
tcp_tls_ca = ""
tcp_tls_cert = ""
tcp_tls_key = ""
uid = 0
- Modified config.toml
disabled_plugins = []
imports = []
oom_score = 0
plugin_dir = ""
required_plugins = []
root = "/data/containerd/lib"
state = "/data/containerd/state"
temp = ""
version = 2
[cgroup]
path = ""
[debug]
address = ""
format = ""
gid = 0
level = ""
uid = 0
[grpc]
address = "/data/containerd/state/containerd.sock"
gid = 0
max_recv_message_size = 16777216
max_send_message_size = 16777216
tcp_address = ""
tcp_tls_ca = ""
tcp_tls_cert = ""
tcp_tls_key = ""
uid = 0
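Only three values differ between the two files: `root`, `state`, and the gRPC socket `address`. Assuming the node still uses the default paths shown above, the change could be applied with sed instead of by hand. The helper name `rewrite_containerd_paths` is hypothetical, and the config file location passed to it is an assumption (commonly `/etc/containerd/config.toml`).

```shell
# Hypothetical helper: point root, state, and the gRPC socket at a new base directory.
# Assumes the file still contains the default paths shown above.
rewrite_containerd_paths() {
  config="$1"                       # path to config.toml
  base="$2"                         # new base directory, e.g. /data/containerd
  cp -p "$config" "$config.bak"     # keep a backup of the original file
  tmp_file="$(mktemp)"
  sed \
    -e "s|^root = \"/var/lib/containerd\"|root = \"$base/lib\"|" \
    -e "s|^state = \"/run/containerd\"|state = \"$base/state\"|" \
    -e "s|address = \"/run/containerd/containerd.sock\"|address = \"$base/state/containerd.sock\"|" \
    "$config" > "$tmp_file" && cat "$tmp_file" > "$config"
  rm -f "$tmp_file"
}

# Usage (as root):
#   rewrite_containerd_paths /etc/containerd/config.toml /data/containerd
```

Writing the result back with `cat` rather than `mv` keeps the ownership and permissions of the existing file.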
- Restart the service
systemctl restart containerd
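After the restart it is worth confirming that the socket was created at the new location before touching kubelet. A minimal sketch, assuming the socket path from the modified config.toml; the function name `check_containerd_socket` is hypothetical.

```shell
# Hypothetical helper: succeed only if a Unix socket exists at the given path.
check_containerd_socket() {
  sock="$1"
  if [ -S "$sock" ]; then
    echo "socket present: $sock"
  else
    echo "socket missing: $sock" >&2
    return 1
  fi
}

# Usage (as root, after restarting containerd):
#   check_containerd_socket /data/containerd/state/containerd.sock \
#     && ctr --address /data/containerd/state/containerd.sock version
```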
- Change the kubelet socket path
vi /var/lib/kubelet/kubeadm-flags.env
## Before
KUBELET_KUBEADM_ARGS="--container-runtime-endpoint=unix:///var/run/containerd/containerd.sock --pod-infra-container-image=registry.k8s.io/pause:3.9"
## After (use the socket path set in config.toml)
KUBELET_KUBEADM_ARGS="--container-runtime-endpoint=unix:///data/containerd/state/containerd.sock --pod-infra-container-image=registry.k8s.io/pause:3.9"
- Restart the service
systemctl restart kubelet
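The kubelet edit above changes only the value of `--container-runtime-endpoint`. As a rough sketch, the same edit could be scripted as follows; `set_kubelet_runtime_endpoint` is a hypothetical helper, and the socket path in the usage comment assumes the modified config.toml from this post.

```shell
# Hypothetical helper: replace the unix:// runtime endpoint in kubeadm-flags.env,
# leaving every other kubelet flag untouched.
set_kubelet_runtime_endpoint() {
  env_file="$1"     # e.g. /var/lib/kubelet/kubeadm-flags.env
  sock="$2"         # absolute socket path, e.g. /data/containerd/state/containerd.sock
  tmp_file="$(mktemp)"
  sed -e "s|--container-runtime-endpoint=unix://[^ \"]*|--container-runtime-endpoint=unix://$sock|" \
    "$env_file" > "$tmp_file" && cat "$tmp_file" > "$env_file"
  rm -f "$tmp_file"
}

# Usage (as root), followed by the kubelet restart shown above:
#   set_kubelet_runtime_endpoint /var/lib/kubelet/kubeadm-flags.env /data/containerd/state/containerd.sock
```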
2. Configuring containerd for GPUs
- Install packages
- Install only on GPU worker nodes
- Register the production repository
curl -s -L https://nvidia.github.io/libnvidia-container/stable/rpm/nvidia-container-toolkit.repo | sudo tee /etc/yum.repos.d/nvidia-container-toolkit.repo
- Install the NVIDIA Container Toolkit
sudo yum install -y nvidia-container-toolkit
- Configure containerd
- Original config.toml
[plugins."io.containerd.grpc.v1.cri".containerd]
default_runtime_name = "runc"
disable_snapshot_annotations = true
discard_unpacked_layers = false
ignore_rdt_not_enabled_errors = false
no_pivot = false
snapshotter = "overlayfs"
[plugins."io.containerd.grpc.v1.cri".containerd.default_runtime]
base_runtime_spec = ""
cni_conf_dir = ""
cni_max_conf_num = 0
container_annotations = []
pod_annotations = []
privileged_without_host_devices = false
runtime_engine = ""
runtime_path = ""
runtime_root = ""
runtime_type = ""
[plugins."io.containerd.grpc.v1.cri".containerd.default_runtime.options]
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes]
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc]
base_runtime_spec = ""
cni_conf_dir = ""
cni_max_conf_num = 0
container_annotations = []
pod_annotations = []
privileged_without_host_devices = false
runtime_engine = ""
runtime_path = ""
runtime_root = ""
runtime_type = "io.containerd.runc.v2"
- Modified config.toml
[plugins."io.containerd.grpc.v1.cri".containerd]
default_runtime_name = "nvidia"
#default_runtime_name = "runc"
disable_snapshot_annotations = true
discard_unpacked_layers = false
ignore_rdt_not_enabled_errors = false
no_pivot = false
snapshotter = "overlayfs"
[plugins."io.containerd.grpc.v1.cri".containerd.default_runtime]
base_runtime_spec = ""
cni_conf_dir = ""
cni_max_conf_num = 0
container_annotations = []
pod_annotations = []
privileged_without_host_devices = false
runtime_engine = ""
runtime_path = ""
runtime_root = ""
runtime_type = ""
[plugins."io.containerd.grpc.v1.cri".containerd.default_runtime.options]
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes]
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia]
privileged_without_host_devices = false
runtime_engine = ""
runtime_root = ""
runtime_type = "io.containerd.runc.v2"
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia.options]
BinaryName = "/usr/bin/nvidia-container-runtime"
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc]
base_runtime_spec = ""
cni_conf_dir = ""
cni_max_conf_num = 0
container_annotations = []
pod_annotations = []
privileged_without_host_devices = false
runtime_engine = ""
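The key changes are the `default_runtime_name` switch from `runc` to `nvidia` and the new `runtimes.nvidia` section pointing at `/usr/bin/nvidia-container-runtime`. A quick way to confirm which default runtime is active, ignoring the commented-out line, might look like the sketch below. The function name `default_runtime_name` is hypothetical and the config file path in the usage comment is an assumption.

```shell
# Hypothetical helper: print the active (uncommented) default_runtime_name
# from a containerd config file.
default_runtime_name() {
  sed -n 's/^[[:space:]]*default_runtime_name[[:space:]]*=[[:space:]]*"\([^"]*\)".*/\1/p' "$1" | head -n 1
}

# Usage (assumed config location):
#   default_runtime_name /etc/containerd/config.toml
```

containerd reads config.toml only at startup, so the change takes effect after `systemctl restart containerd`.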
- Install the NVIDIA plugin (on the master)
kubectl create -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/v0.14.3/nvidia-device-plugin.yml
- Check the allocatable GPU count per node (on the master)
kubectl get nodes "-o=custom-columns=NAME:.metadata.name,GPU:.status.allocatable.nvidia\.com/gpu"
- GPU allocation test
apiVersion: v1
kind: Pod
metadata:
  name: gpu-pod
spec:
  restartPolicy: Never
  containers:
    - name: cuda-container
      image: nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda10.2
      command:
        - sleep
      args:
        - '1000'
      resources:
        limits:
          nvidia.com/gpu: '1'
- Verify GPU allocation
kubectl describe nodes "<GPU node name>"
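The describe output is long, and only the `nvidia.com/gpu` entries matter for this check. A small filter, sketched here with the hypothetical name `gpu_lines`, keeps just those lines from whatever is piped into it.

```shell
# Hypothetical helper: keep only the nvidia.com/gpu lines from text on stdin.
gpu_lines() {
  grep -F 'nvidia.com/gpu'
}

# Usage (on the master):
#   kubectl describe nodes "<GPU node name>" | gpu_lines
```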