
Changing the Kubernetes Containerd Data Path

by aws-evan 2024. 8. 9.

 

 

  • Environment
    • OS: CentOS 7.9
    • Kubernetes version: 1.29
    • Containerd version: 1.6.33
  • Installation notes
    • If the data path is changed after the NVIDIA plugin has been installed, the servers must be rebooted
    • Change the data path first, then install the NVIDIA plugin
 

Installing the NVIDIA Container Toolkit — NVIDIA Container Toolkit 1.16.0 documentation

You installed a supported container engine (Docker, Containerd, CRI-O, Podman).

docs.nvidia.com

 

 

GitHub - NVIDIA/k8s-device-plugin: NVIDIA device plugin for Kubernetes

NVIDIA device plugin for Kubernetes. Contribute to NVIDIA/k8s-device-plugin development by creating an account on GitHub.

github.com

 

 

1. Changing the Containerd Data Path

  • Changing the Containerd socket path
    • Create the directories the socket and data will move to (see the sketch after this list)
    • The root and state directories must be created as separate directories
    • Example structure
      • lib = /var/lib/containerd
      • state = /run/containerd
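The target directories can be created up front. A minimal sketch, assuming the /data/containerd/lib and /data/containerd/state layout used in the modified config.toml below:

# Create the new root (lib) and state directories on the larger data volume
sudo mkdir -p /data/containerd/lib /data/containerd/state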

 

  • Default config.toml
disabled_plugins = []
imports = []
oom_score = 0
plugin_dir = ""
required_plugins = []
root = "/var/lib/containerd"
state = "/run/containerd"
temp = ""
version = 2

[cgroup]
  path = ""

[debug]
  address = ""
  format = ""
  gid = 0
  level = ""
  uid = 0

[grpc]
  address = "/run/containerd/containerd.sock"
  gid = 0
  max_recv_message_size = 16777216
  max_send_message_size = 16777216
  tcp_address = ""
  tcp_tls_ca = ""
  tcp_tls_cert = ""
  tcp_tls_key = ""
  uid = 0

 

 

  • Modified config.toml
disabled_plugins = []
imports = []
oom_score = 0
plugin_dir = ""
required_plugins = []
root = "/data/containerd/lib"
state = "/data/containerd/state"
temp = ""
version = 2

[cgroup]
  path = ""

[debug]
  address = ""
  format = ""
  gid = 0
  level = ""
  uid = 0

[grpc]
  address = "/data/containerd/state/containerd.sock"
  gid = 0
  max_recv_message_size = 16777216
  max_send_message_size = 16777216
  tcp_address = ""
  tcp_tls_ca = ""
  tcp_tls_cert = ""
  tcp_tls_key = ""
  uid = 0

 

 

  • Restart the service
systemctl restart containerd
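To confirm containerd came back up on the new socket, a quick check (a sketch; ctr is bundled with containerd, and the address matches the [grpc] section above):

# Prints client and server versions if the daemon answers on the new socket
sudo ctr --address /data/containerd/state/containerd.sock version

# The new root directory should start filling up with containerd data
ls /data/containerd/lib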

 

  • Changing the kubelet socket path
vi /var/lib/kubelet/kubeadm-flags.env
## Original
KUBELET_KUBEADM_ARGS="--container-runtime-endpoint=unix:///var/run/containerd/containerd.sock --pod-infra-container-image=registry.k8s.io/pause:3.9"


## Modified (use the new socket path set in the [grpc] section of config.toml)
KUBELET_KUBEADM_ARGS="--container-runtime-endpoint=unix:///data/containerd/state/containerd.sock --pod-infra-container-image=registry.k8s.io/pause:3.9"
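The same edit can also be scripted (a sketch, assuming the file still contains the default endpoint shown above):

# Back up the file, then rewrite the runtime endpoint in place
sudo cp /var/lib/kubelet/kubeadm-flags.env /var/lib/kubelet/kubeadm-flags.env.bak
sudo sed -i 's#unix:///var/run/containerd/containerd.sock#unix:///data/containerd/state/containerd.sock#' /var/lib/kubelet/kubeadm-flags.env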

 

 

  • Restart the service
systemctl restart kubelet
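A quick health check after the restart (a sketch; the kubectl command runs on the master node):

# kubelet should be active and the node should return to Ready
systemctl status kubelet --no-pager
kubectl get nodes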

 

 


2. Containerd GPU Configuration

 

  • Package installation
      • Install only on the GPU worker nodes
      • Register the production repository  
    curl -s -L https://nvidia.github.io/libnvidia-container/stable/rpm/nvidia-container-toolkit.repo | \
      sudo tee /etc/yum.repos.d/nvidia-container-toolkit.repo
      • Install the NVIDIA Container Toolkit (see the note after this block)  
    sudo yum install -y nvidia-container-toolkit
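For reference, the toolkit also ships the nvidia-ctk CLI, which can generate the nvidia runtime entry shown in the next step instead of editing config.toml by hand (a sketch; either approach should end in the same configuration):

# Writes the nvidia runtime block into /etc/containerd/config.toml
sudo nvidia-ctk runtime configure --runtime=containerd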

 

  • Containerd configuration
    • Original config.toml
          [plugins."io.containerd.grpc.v1.cri".containerd]
            default_runtime_name = "runc"
            disable_snapshot_annotations = true
            discard_unpacked_layers = false
            ignore_rdt_not_enabled_errors = false
            no_pivot = false
            snapshotter = "overlayfs"
      
            [plugins."io.containerd.grpc.v1.cri".containerd.default_runtime]
              base_runtime_spec = ""
              cni_conf_dir = ""
              cni_max_conf_num = 0
              container_annotations = []
              pod_annotations = []
              privileged_without_host_devices = false
              runtime_engine = ""
              runtime_path = ""
              runtime_root = ""
              runtime_type = ""
      
              [plugins."io.containerd.grpc.v1.cri".containerd.default_runtime.options]
      
            [plugins."io.containerd.grpc.v1.cri".containerd.runtimes]
      
              [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc]
                base_runtime_spec = ""
                cni_conf_dir = ""
                cni_max_conf_num = 0
                container_annotations = []
                pod_annotations = []
                privileged_without_host_devices = false
                runtime_engine = ""
                runtime_path = ""
                runtime_root = ""
                runtime_type = "io.containerd.runc.v2"
    • Modified config.toml (restart containerd afterwards, as shown after this block)
          [plugins."io.containerd.grpc.v1.cri".containerd]
            default_runtime_name = "nvidia"
            #default_runtime_name = "runc"
            disable_snapshot_annotations = true
            discard_unpacked_layers = false
            ignore_rdt_not_enabled_errors = false
            no_pivot = false
            snapshotter = "overlayfs"
      
            [plugins."io.containerd.grpc.v1.cri".containerd.default_runtime]
              base_runtime_spec = ""
              cni_conf_dir = ""
              cni_max_conf_num = 0
              container_annotations = []
              pod_annotations = []
              privileged_without_host_devices = false
              runtime_engine = ""
              runtime_path = ""
              runtime_root = ""
              runtime_type = ""
      
              [plugins."io.containerd.grpc.v1.cri".containerd.default_runtime.options]
      
            [plugins."io.containerd.grpc.v1.cri".containerd.runtimes]
              [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia]
                privileged_without_host_devices = false
                runtime_engine = ""
                runtime_root = ""
                runtime_type = "io.containerd.runc.v2"
                [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia.options]
                  BinaryName = "/usr/bin/nvidia-container-runtime"
      
              [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc]
                base_runtime_spec = ""
                cni_conf_dir = ""
                cni_max_conf_num = 0
                container_annotations = []
                pod_annotations = []
                privileged_without_host_devices = false
                runtime_engine = ""
                runtime_path = ""
                runtime_root = ""
                runtime_type = "io.containerd.runc.v2"
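After saving the modified config.toml, restart containerd so that the nvidia runtime is picked up:

systemctl restart containerd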
  • Install the NVIDIA plugin (on the master node)
kubectl create -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/v0.14.3/nvidia-device-plugin.yml
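To verify the device plugin rolled out (a sketch; the DaemonSet name below is what the v0.14.3 manifest creates in kube-system, adjust if yours differs):

# One plugin pod should be Running on each GPU worker node
kubectl -n kube-system get daemonset nvidia-device-plugin-daemonset
kubectl -n kube-system get pods -o wide | grep nvidia-device-plugin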

 

  • Check the GPU core status (on the master node)
kubectl get nodes "-o=custom-columns=NAME:.metadata.name,GPU:.status.allocatable.nvidia\.com/gpu"

 

  • GPU core allocation test (apply the manifest as shown below)
apiVersion: v1
kind: Pod
metadata:
  name: gpu-pod
spec:
  restartPolicy: Never
  containers:
    - name: cuda-container
      image: nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda10.2
      command:
        - sleep
      args:
        - '1000'
      resources:
        limits:
          nvidia.com/gpu: '1'
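To run the test, a sketch assuming the manifest above is saved as gpu-pod.yaml (nvidia-smi is normally injected by the NVIDIA runtime, so the exec check may vary by image):

kubectl apply -f gpu-pod.yaml
kubectl get pod gpu-pod

# If the NVIDIA runtime handled the pod, the allocated GPU should be listed
kubectl exec gpu-pod -- nvidia-smi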

 

  • Verify the GPU allocation
kubectl describe nodes "<GPU node name>"
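In the describe output, nvidia.com/gpu should now appear under Capacity and Allocatable, and under Allocated resources once the test pod is scheduled (the values below are illustrative):

Capacity:
  nvidia.com/gpu:  1
Allocatable:
  nvidia.com/gpu:  1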
