Training and Inference with YOLOv8 on a Custom Label Studio Dataset

Vang-z · about 1,999 words · roughly a 7-minute read

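Summary

Picking up from the previous article (Annotating a YOLOv8 dataset with Label Studio), this article shows how to train YOLOv8 on a custom dataset annotated in Label Studio, and how to use Python to save the inference video to local disk.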

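Download/Install

This article builds on the previous one, so the virtual environment setup covered there is not repeated here.

(Required) Open a terminal, activate the virtual environment, and run pip install ultralytics to install YOLOv8.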
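
To confirm that the package landed in the active virtual environment, a quick check along these lines can help. It is a minimal sketch that uses only the standard library; the helper name installed_version is made up for illustration and is not part of Ultralytics.

```python
from importlib import metadata, util
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not available."""
    # find_spec returns None when the top-level package cannot be located
    if util.find_spec(package) is None:
        return None
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version("ultralytics")
    print(f"ultralytics: {version or 'not installed in this environment'}")
```

If the script reports that the package is not installed, the most likely cause is that pip ran outside the virtual environment.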

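Import the dataset

(Required) Import the dataset from the previous article into the project's dataset directory. To make the demonstration below easier, I replaced it with a dataset built from a bilibili video, 【海底世界】珊瑚鱼群 (an undersea clip of coral reef fish). The dataset configuration file and the directory structure are shown below:

File contents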

# dataset path
train: ./images/train
val: ./images/val
test: ./images/test

# number of classes
nc: 1

# class names
names: [
  "fish",
]

Directory structure

.
│
└─dataset
  └─fish
    │ classes.txt
    │ data.yaml
    │ notes.json
    │
    ├─images
    │  ├─test
    │  │   ...
    │  │
    │  ├─train
    │  │   ...
    │  │
    │  └─val
    │      ...
    │
    └─labels
        ├─test
        │   ...
        │
        ├─train
        │   ...
        │
        └─val
            ...
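
Before training, it can be worth checking that the images and labels folders in the tree above line up. The sketch below assumes the usual YOLO convention that each image has a label file with the same stem and a .txt extension; the function name and the list of image suffixes are illustrative choices, not something taken from the dataset export.

```python
from pathlib import Path
from typing import List

# Assumed set of image extensions; adjust to match the exported dataset
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".bmp"}


def images_missing_labels(dataset_root: Path, split: str) -> List[str]:
    """List image names in images/<split> that have no labels/<split>/<stem>.txt."""
    image_dir = dataset_root / "images" / split
    label_dir = dataset_root / "labels" / split
    if not image_dir.is_dir():
        return []
    missing = []
    for image in sorted(image_dir.iterdir()):
        if image.suffix.lower() not in IMAGE_SUFFIXES:
            continue
        if not (label_dir / f"{image.stem}.txt").exists():
            missing.append(image.name)
    return missing


if __name__ == "__main__":
    root = Path.cwd() / "dataset" / "fish"
    for split in ("train", "val", "test"):
        print(split, images_missing_labels(root, split))
```

An image without a label file is not necessarily an error, since it may simply be a frame with no fish in it, but an unexpectedly long list usually points to a copy mistake.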

Note

The demo video is only used to detect fish, so there is just 1 class. For other datasets, adjust the number of classes accordingly.
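
For a dataset with more classes, nc and names have to stay in sync. One way to avoid editing them by hand is to generate data.yaml from classes.txt. This is only a sketch: it assumes classes.txt holds one class name per line in class-index order, and the helper names are hypothetical.

```python
from pathlib import Path
from typing import List


def read_class_names(classes_file: Path) -> List[str]:
    """Read one class name per line, skipping blank lines."""
    lines = classes_file.read_text(encoding="utf-8").splitlines()
    return [line.strip() for line in lines if line.strip()]


def render_data_yaml(class_names: List[str]) -> str:
    """Render a data.yaml with the same layout as the file shown above."""
    names = "\n".join(f'  "{name}",' for name in class_names)
    return (
        "# dataset path\n"
        "train: ./images/train\n"
        "val: ./images/val\n"
        "test: ./images/test\n"
        "\n"
        "# number of classes\n"
        f"nc: {len(class_names)}\n"
        "\n"
        "# class names\n"
        "names: [\n"
        f"{names}\n"
        "]\n"
    )


if __name__ == "__main__":
    dataset_dir = Path.cwd() / "dataset" / "fish"
    classes_file = dataset_dir / "classes.txt"
    if classes_file.exists():
        print(render_data_yaml(read_class_names(classes_file)))
```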

Create the configuration file

(Required) Create a fish.yaml file to hold the basic training settings.

# Ultralytics YOLO 🚀, AGPL-3.0 license
# Default training settings and hyperparameters for medium-augmentation COCO training

task: detect  # (str) YOLO task, i.e. detect, segment, classify, pose
mode: train  # (str) YOLO mode, i.e. train, val, predict, export, track, benchmark

# Train settings -------------------------------------------------------------------------------------------------------
model:  # (str, optional) path to model file, i.e. yolov8n.pt, yolov8n.yaml
data:  # (str, optional) path to data file, i.e. coco128.yaml
epochs: 100  # (int) number of epochs to train for
patience: 100  # (int) epochs to wait for no observable improvement for early stopping of training
time: # (float, optional) number of hours to train for, overrides epochs if supplied
batch: -1  # (int) number of images per batch (-1 for AutoBatch)
imgsz: 640  # (int | list) input images size as int for train and val modes, or list[w,h] for predict and export modes
save: True  # (bool) save train checkpoints and predict results
save_period: -1 # (int) Save checkpoint every x epochs (disabled if < 1)
cache: False  # (bool) True/ram, disk or False. Use cache for data loading
device:  # (int | str | list, optional) device to run on, i.e. cuda device=0 or device=0,1,2,3 or device=cpu
workers: 8  # (int) number of worker threads for data loading (per RANK if DDP)
project:  # (str, optional) project name
name:  # (str, optional) experiment name, results saved to 'project/name' directory
exist_ok: False  # (bool) whether to overwrite existing experiment
pretrained: True  # (bool | str) whether to use a pretrained model (bool) or a model to load weights from (str)
optimizer: auto  # (str) optimizer to use, choices=[SGD, Adam, Adamax, AdamW, NAdam, RAdam, RMSProp, auto]
verbose: True  # (bool) whether to print verbose output
seed: 0  # (int) random seed for reproducibility
deterministic: True  # (bool) whether to enable deterministic mode
single_cls: False  # (bool) train multi-class data as single-class
rect: False  # (bool) rectangular training if mode='train' or rectangular validation if mode='val'
cos_lr: False  # (bool) use cosine learning rate scheduler
close_mosaic: 10  # (int) disable mosaic augmentation for final epochs (0 to disable)
resume: False  # (bool) resume training from last checkpoint
amp: True  # (bool) Automatic Mixed Precision (AMP) training, choices=[True, False], True runs AMP check
fraction: 1.0  # (float) dataset fraction to train on (default is 1.0, all images in train set)
profile: False  # (bool) profile ONNX and TensorRT speeds during training for loggers
freeze: None  # (int | list, optional) freeze first n layers, or freeze list of layer indices during training
multi_scale: False # (bool) Whether to use multiscale during training
# Segmentation
overlap_mask: True  # (bool) masks should overlap during training (segment train only)
mask_ratio: 4  # (int) mask downsample ratio (segment train only)
# Classification
dropout: 0.0  # (float) use dropout regularization (classify train only)

# Val/Test settings ----------------------------------------------------------------------------------------------------
val: True  # (bool) validate/test during training
split: val  # (str) dataset split to use for validation, i.e. 'val', 'test' or 'train'
save_json: False  # (bool) save results to JSON file
save_hybrid: False  # (bool) save hybrid version of labels (labels + additional predictions)
conf:  # (float, optional) object confidence threshold for detection (default 0.25 predict, 0.001 val)
iou: 0.7  # (float) intersection over union (IoU) threshold for NMS
max_det: 300  # (int) maximum number of detections per image
half: False  # (bool) use half precision (FP16)
dnn: False  # (bool) use OpenCV DNN for ONNX inference
plots: True  # (bool) save plots during train/val

# Prediction settings --------------------------------------------------------------------------------------------------
source: # (str, optional) source directory for images or videos
vid_stride: 1 # (int) video frame-rate stride
stream_buffer: False # (bool) buffer all streaming frames (True) or return the most recent frame (False)
visualize: False # (bool) visualize model features
augment: False # (bool) apply image augmentation to prediction sources
agnostic_nms: False # (bool) class-agnostic NMS
classes: # (int | list[int], optional) filter results by class, i.e. classes=0, or classes=[0,2,3]
retina_masks: False # (bool) use high-resolution segmentation masks
embed: # (list[int], optional) return feature vectors/embeddings from given layers

# Visualize settings ---------------------------------------------------------------------------------------------------
show: False # (bool) show predicted images and videos if environment allows
save_frames: False # (bool) save predicted individual video frames
save_txt: False # (bool) save results as .txt file
save_conf: False # (bool) save results with confidence scores
save_crop: False # (bool) save cropped images with results
show_labels: True # (bool) show prediction labels, i.e. 'person'
show_conf: True # (bool) show prediction confidence, i.e. '0.99'
show_boxes: True # (bool) show prediction boxes
line_width: # (int, optional) line width of the bounding boxes. Scaled to image size if None.

# Export settings ------------------------------------------------------------------------------------------------------
format: torchscript # (str) format to export to, choices at https://docs.ultralytics.com/modes/export/#export-formats
keras: False # (bool) use Keras
optimize: False # (bool) TorchScript: optimize for mobile
int8: False # (bool) CoreML/TF INT8 quantization
dynamic: False # (bool) ONNX/TF/TensorRT: dynamic axes
simplify: False # (bool) ONNX: simplify model using `onnxslim`
opset: # (int, optional) ONNX: opset version
workspace: 4 # (int) TensorRT: workspace size (GB)
nms: False # (bool) CoreML: add NMS

# Hyperparameters ------------------------------------------------------------------------------------------------------
lr0: 0.01 # (float) initial learning rate (i.e. SGD=1E-2, Adam=1E-3)
lrf: 0.01 # (float) final learning rate (lr0 * lrf)
momentum: 0.937 # (float) SGD momentum/Adam beta1
weight_decay: 0.0005 # (float) optimizer weight decay 5e-4
warmup_epochs: 3.0 # (float) warmup epochs (fractions ok)
warmup_momentum: 0.8 # (float) warmup initial momentum
warmup_bias_lr: 0.1 # (float) warmup initial bias lr
box: 7.5 # (float) box loss gain
cls: 0.5 # (float) cls loss gain (scale with pixels)
dfl: 1.5 # (float) dfl loss gain
pose: 12.0 # (float) pose loss gain
kobj: 1.0 # (float) keypoint obj loss gain
label_smoothing: 0.0 # (float) label smoothing (fraction)
nbs: 64 # (int) nominal batch size
hsv_h: 0.015 # (float) image HSV-Hue augmentation (fraction)
hsv_s: 0.7 # (float) image HSV-Saturation augmentation (fraction)
hsv_v: 0.4 # (float) image HSV-Value augmentation (fraction)
degrees: 0.0 # (float) image rotation (+/- deg)
translate: 0.1 # (float) image translation (+/- fraction)
scale: 0.5 # (float) image scale (+/- gain)
shear: 0.0 # (float) image shear (+/- deg)
perspective: 0.0 # (float) image perspective (+/- fraction), range 0-0.001
flipud: 0.0 # (float) image flip up-down (probability)
fliplr: 0.5 # (float) image flip left-right (probability)
bgr: 0.0 # (float) image channel BGR (probability)
mosaic: 1.0 # (float) image mosaic (probability)
mixup: 0.0 # (float) image mixup (probability)
copy_paste: 0.0 # (float) segment copy-paste (probability)
auto_augment: randaugment # (str) auto augmentation policy for classification (randaugment, autoaugment, augmix)
erasing: 0.4 # (float) probability of random erasing during classification training (0-0.9), 0 means no erasing, must be less than 1.0.
crop_fraction: 1.0 # (float) image crop fraction for classification (0.1-1), 1.0 means no crop, must be greater than 0.

# Custom config.yaml ---------------------------------------------------------------------------------------------------
cfg: # (str, optional) for overriding defaults.yaml

# Tracker settings ------------------------------------------------------------------------------------------------------
tracker: botsort.yaml # (str) tracker type, choices=[botsort.yaml, bytetrack.yaml]
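
To make the lr0 and lrf entries above more concrete, the sketch below computes the learning rate per epoch under the assumption of a plain linear decay from lr0 down to lr0 * lrf, which is my understanding of what happens when cos_lr is False. Warmup is deliberately ignored, and the function name linear_lr is illustrative rather than an Ultralytics API.

```python
def linear_lr(epoch: int, epochs: int = 100, lr0: float = 0.01, lrf: float = 0.01) -> float:
    """Learning rate at `epoch`, assuming linear decay from lr0 to lr0 * lrf."""
    # The factor goes from 1.0 at epoch 0 down to lrf at the final epoch
    factor = max(1 - epoch / epochs, 0) * (1.0 - lrf) + lrf
    return lr0 * factor


if __name__ == "__main__":
    for epoch in (0, 25, 50, 75, 100):
        print(f"epoch {epoch:3d}: lr = {linear_lr(epoch):.6f}")
```

With the values in fish.yaml, the rate starts at lr0 and ends at one hundredth of it, so lowering lrf makes the final epochs more conservative without touching the starting rate.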

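Train the model

(Required) Create and run train.py to train the model. When training finishes, the best weights, best.pt, are written to the runs -> detect -> train -> weights folder under the project root.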

from ultralytics import YOLO
from pathlib import Path

if __name__ == '__main__':
    BASE_PATH = Path.cwd()

    model = YOLO('yolov8m.pt')
    # fish.yaml is the configuration file created above; data.yaml is the dataset configuration file
    model.train(cfg=f'{BASE_PATH}/fish.yaml', data=f'{BASE_PATH}/dataset/fish/data.yaml')
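
With exist_ok set to False, repeated runs normally end up in sibling folders such as train, train2 and so on, so the newest best.pt is not always under train. The helper below is a small sketch for picking the most recently written weights; the name latest_best_weights and the choice of modification time as the ordering are my own assumptions.

```python
from pathlib import Path
from typing import Optional


def latest_best_weights(runs_dir: Path) -> Optional[Path]:
    """Return the most recently modified <run>/weights/best.pt under runs_dir, if any."""
    candidates = list(runs_dir.glob("*/weights/best.pt"))
    if not candidates:
        return None
    return max(candidates, key=lambda path: path.stat().st_mtime)


if __name__ == "__main__":
    weights = latest_best_weights(Path.cwd() / "runs" / "detect")
    print(weights if weights else "no best.pt found yet")
```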

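Note

Before running the code, execute yolo settings dvc=False in the terminal; otherwise an error is raised once training completes. Alternatively, pushing the project to a git repository also resolves the error.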
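
The same setting can also be changed from Python, which may be convenient at the top of train.py. This sketch assumes the settings object exported by the ultralytics package and its update method behave as described in the Ultralytics settings documentation; the wrapper function is hypothetical and simply returns False when the package is not importable.

```python
def disable_dvc_logging() -> bool:
    """Turn off the DVC integration in the Ultralytics settings, if Ultralytics is installed."""
    try:
        from ultralytics import settings
    except ImportError:
        return False
    # Equivalent in intent to running: yolo settings dvc=False
    settings.update({"dvc": False})
    return True


if __name__ == "__main__":
    print("DVC logging disabled:", disable_dvc_logging())
```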

Inference

(Required) Create and run inference.py to run inference. When it finishes, the result, inference.mp4, is written to the project root.

from ultralytics import YOLO
from pathlib import Path
import cv2
import time

if __name__ == '__main__':
    BASE_PATH = Path.cwd()

    # Build the network from the model YAML, then load the trained weights into it
    model = YOLO(f'{BASE_PATH}/cfg/models/v8/yolov8m-fish.yaml')
    model.load(f'{BASE_PATH}/runs/detect/train/weights/best.pt')
    model.model.names[0] = "fish"
    print(model.model.names)
    # video = 0  # use the default webcam instead of a file
    video = f'{BASE_PATH}/videos/【海底世界】珊瑚鱼群_哔哩哔哩_bilibili.mp4720.mp4'
    cap = cv2.VideoCapture(video)

    # Reuse the source frame rate and resolution for the output file
    video_fps = cap.get(cv2.CAP_PROP_FPS)
    frame_width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
    frame_height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))

    output_path = f'{BASE_PATH}/inference.mp4'
    # 'avc1' (H.264) is not available in every OpenCV build; 'mp4v' is a common fallback
    fourcc = cv2.VideoWriter_fourcc(*'avc1')
    out = cv2.VideoWriter(output_path, fourcc, video_fps, (frame_width, frame_height))

    fps = 0.0  # smoothed inference speed, drawn on every frame
    while cap.isOpened():
        success, frame = cap.read()
        if not success:
            break
        t1 = time.time()
        # persist=True keeps track IDs across consecutive frames of the same video
        results = model.track(frame, persist=True, verbose=False)
        annotated_frame = results[0].plot()
        # Average the previous value with the speed of the current frame
        fps = (fps + (1. / (time.time() - t1))) / 2
        annotated_frame = cv2.putText(annotated_frame, f"fps: {int(fps)}", (0, 40), cv2.FONT_HERSHEY_SIMPLEX, 1,
                                      (0, 255, 0), 2)
        out.write(annotated_frame)
        cv2.imshow("inference", annotated_frame)
        # Press Esc to stop early
        if cv2.waitKey(1) & 0xFF == 27:
            break
    cap.release()
    out.release()
    cv2.destroyAllWindows()
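
The fps value drawn on each frame is a running average: every update takes the mean of the previous value and the speed of the current frame, which amounts to an exponential moving average with a weight of 0.5. The sketch below isolates that step so it can be tried without a model or a video; the function name smooth_fps and the small floor on the elapsed time are my additions, not part of the script above.

```python
def smooth_fps(previous: float, frame_seconds: float) -> float:
    """Blend the previous FPS estimate with the speed of the latest frame."""
    # Floor the elapsed time to avoid dividing by zero on very fast frames
    instantaneous = 1.0 / max(frame_seconds, 1e-6)
    return (previous + instantaneous) / 2


if __name__ == "__main__":
    fps = 0.0
    # Simulated per-frame processing times in seconds
    for elapsed in (0.05, 0.05, 0.04, 0.05, 0.10):
        fps = smooth_fps(fps, elapsed)
        print(f"frame took {elapsed:.2f}s, displayed fps: {int(fps)}")
```

Because the estimate starts at 0, the first few displayed values are lower than the real speed and then converge toward it within a handful of frames.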

That completes training and inference with YOLOv8 on a custom Label Studio dataset.
