
Key Changes#
- To keep training and validation working on PyTorch 2.6, torch.load calls are set to weights_only=False for now (see the sketch after this list)
- Fixed assorted minor conflicts caused by updated libraries (especially PIL)
- Fixed a bug in the protomask handling (it had been hard-coded)
- Added tiny and small segmentation pre-trained models
- Request the weights from 윤영준 주임 if you need them
Metric | tiny | small |
---|---|---|
GFLOPs (640x640 input) | 13.2 | 30 |
mAP (SAMA-COCO dataset) | 42.2 | 47.3 |
Epochs trained | 320 | 67 |
- SAMA-COCO is a re-annotated version of the COCO dataset: the original annotations were reviewed and relabeled once more to fix errors. The masks were refined and, in particular, crowded/densely packed objects that COCO had lumped together under a single CROWDED label were all split into individual instance labels. These weights show especially good recognition of the person class (over 60%).
- However, this also deepened the class imbalance in the data; handling the under-represented classes requires additional data, transfer learning to the specific target task, or changes to the data-augmentation pipeline.
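A minimal sketch of the torch.load workaround mentioned above, assuming a YOLOv9-style checkpoint that stores the pickled model object under the 'model' key (the file name here is only a placeholder):
import torch
# PyTorch 2.6 changed the default of torch.load to weights_only=True, which
# rejects the pickled nn.Module stored in these checkpoints; weights_only=False
# restores the old behaviour, so only load checkpoints you trust.
ckpt = torch.load('gelan-t-seg.pt', map_location='cpu', weights_only=False)
model = ckpt['model'].float().eval()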
from | n | params | module | arguments |
---|---|---|---|---|
-1 | 1 | 464 | models.common.Conv | [3, 16, 3, 2] |
-1 | 1 | 4672 | models.common.Conv | [16, 32, 3, 2] |
-1 | 1 | 7872 | models.common.ELAN1 | [32, 32, 32, 16] |
-1 | 1 | 18560 | models.common.AConv | [32, 64] |
-1 | 1 | 65216 | models.common.RepNCSPELAN4 | [64, 64, 64, 32, 3] |
-1 | 1 | 55488 | models.common.AConv | [64, 96] |
-1 | 1 | 145824 | models.common.RepNCSPELAN4 | [96, 96, 96, 48, 3] |
-1 | 1 | 110848 | models.common.AConv | [96, 128] |
-1 | 1 | 258432 | models.common.RepNCSPELAN4 | [128, 128, 128, 64, 3] |
-1 | 1 | 41344 | models.common.SPPELAN | [128, 128, 64] |
-1 | 1 | 0 | torch.nn.modules.upsampling.Upsample | [None, 2, 'nearest'] |
[-1, 6] | 1 | 0 | models.common.Concat | [1] |
-1 | 1 | 158112 | models.common.RepNCSPELAN4 | [224, 96, 96, 48, 3] |
-1 | 1 | 0 | torch.nn.modules.upsampling.Upsample | [None, 2, 'nearest'] |
[-1, 4] | 1 | 0 | models.common.Concat | [1] |
-1 | 1 | 71360 | models.common.RepNCSPELAN4 | [160, 64, 64, 32, 3] |
-1 | 1 | 27744 | models.common.AConv | [64, 48] |
[-1, 12] | 1 | 0 | models.common.Concat | [1] |
-1 | 1 | 150432 | models.common.RepNCSPELAN4 | [144, 96, 96, 48, 3] |
-1 | 1 | 55424 | models.common.AConv | [96, 64] |
[-1, 9] | 1 | 0 | models.common.Concat | [1] |
-1 | 1 | 266624 | models.common.RepNCSPELAN4 | [192, 128, 128, 64, 3] |
15 | 1 | 45376 | models.common.RepNCSPELAN4 | [64, 64, 64, 32, 1] |
-1 | 1 | 0 | torch.nn.modules.upsampling.Upsample | [None, 2, 'nearest'] |
-1 | 1 | 55488 | models.common.Conv | [64, 96, 3, 1] |
[15, 18, 21, 24] | 1 | 1055184 | models.yolo.DSegment | [80, 16, 128, [64, 96, 128, 96]] |
gelan-t-seg_v1 summary: 1015 layers, 2594464 parameters, 2594448 gradients, 13.8 GFLOPs
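The parameter count and GFLOPs in the summary line above can be re-checked for any checkpoint with thop (installed in the Installation section below); a rough sketch, with the checkpoint name as a placeholder and FLOPs approximated as 2x MACs:
import torch
from thop import profile
ckpt = torch.load('gelan-t-seg.pt', map_location='cpu', weights_only=False)
model = ckpt['model'].float().eval()
dummy = torch.zeros(1, 3, 640, 640)  # 640x640 input, matching the tables below
macs, params = profile(model, inputs=(dummy,), verbose=False)
print(f'{params / 1e6:.2f}M parameters, {macs * 2 / 1e9:.1f} GFLOPs')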
yolov9-t (13 GFLOPs)#
Class Images Instances Box(P R mAP50 mAP50-95) Mask(P R mAP50 mAP50-95): 100%|██████████| 79/79 00:38
all 5000 58597 0.559 0.4 0.422 0.301 0.557 0.38 0.4 0.251
person 5000 19918 0.713 0.532 0.609 0.393 0.702 0.499 0.565 0.309
bicycle 5000 445 0.514 0.238 0.239 0.137 0.461 0.209 0.198 0.0841
car 5000 2825 0.663 0.426 0.484 0.302 0.661 0.405 0.453 0.243
motorcycle 5000 668 0.679 0.322 0.388 0.232 0.613 0.277 0.329 0.172
airplane 5000 186 0.73 0.575 0.633 0.506 0.707 0.543 0.577 0.386
bus 5000 447 0.626 0.456 0.499 0.414 0.649 0.45 0.489 0.374
train 5000 370 0.679 0.451 0.496 0.386 0.671 0.432 0.48 0.366
truck 5000 602 0.378 0.229 0.218 0.142 0.4 0.224 0.215 0.13
boat 5000 473 0.535 0.29 0.333 0.197 0.512 0.26 0.295 0.145
traffic light 5000 524 0.557 0.365 0.396 0.217 0.552 0.342 0.368 0.165
fire hydrant 5000 97 0.695 0.722 0.747 0.576 0.699 0.711 0.729 0.557
stop sign 5000 56 0.711 0.768 0.778 0.728 0.723 0.768 0.777 0.705
parking meter 5000 53 0.636 0.491 0.534 0.417 0.656 0.491 0.534 0.416
bench 5000 770 0.489 0.16 0.181 0.118 0.481 0.144 0.157 0.0765
bird 5000 587 0.529 0.4 0.431 0.25 0.527 0.368 0.389 0.203
cat 5000 242 0.724 0.678 0.717 0.607 0.738 0.674 0.725 0.595
dog 5000 274 0.585 0.526 0.52 0.43 0.604 0.526 0.526 0.403
horse 5000 526 0.527 0.39 0.404 0.272 0.539 0.373 0.389 0.212
sheep 5000 462 0.555 0.57 0.564 0.373 0.569 0.558 0.548 0.302
cow 5000 500 0.517 0.464 0.476 0.325 0.5 0.428 0.441 0.27
elephant 5000 488 0.607 0.605 0.624 0.459 0.605 0.584 0.605 0.392
bear 5000 158 0.711 0.348 0.395 0.33 0.729 0.342 0.393 0.314
zebra 5000 428 0.757 0.6 0.681 0.491 0.755 0.577 0.651 0.406
giraffe 5000 354 0.825 0.647 0.729 0.568 0.822 0.63 0.702 0.449
backpack 5000 533 0.367 0.126 0.141 0.0707 0.386 0.122 0.139 0.0646
umbrella 5000 613 0.6 0.354 0.38 0.25 0.64 0.356 0.39 0.248
handbag 5000 729 0.415 0.126 0.139 0.0829 0.457 0.128 0.145 0.0723
tie 5000 290 0.723 0.334 0.387 0.272 0.732 0.324 0.373 0.236
suitcase 5000 409 0.536 0.286 0.324 0.212 0.554 0.273 0.32 0.188
frisbee 5000 131 0.68 0.616 0.659 0.518 0.708 0.595 0.657 0.457
skis 5000 459 0.549 0.266 0.303 0.153 0.548 0.244 0.267 0.0853
snowboard 5000 74 0.442 0.311 0.33 0.241 0.466 0.297 0.312 0.177
sports ball 5000 221 0.631 0.466 0.489 0.352 0.627 0.448 0.468 0.258
kite 5000 257 0.513 0.611 0.564 0.387 0.501 0.564 0.499 0.26
baseball bat 5000 254 0.527 0.259 0.269 0.184 0.535 0.256 0.262 0.142
baseball glove 5000 146 0.657 0.445 0.469 0.31 0.679 0.438 0.48 0.282
skateboard 5000 226 0.645 0.506 0.522 0.347 0.64 0.496 0.507 0.238
surfboard 5000 409 0.624 0.377 0.411 0.273 0.632 0.364 0.392 0.223
tennis racket 5000 339 0.656 0.445 0.5 0.359 0.645 0.422 0.45 0.28
bottle 5000 1345 0.567 0.338 0.375 0.237 0.565 0.314 0.354 0.192
wine glass 5000 449 0.666 0.265 0.341 0.219 0.7 0.267 0.331 0.181
cup 5000 937 0.467 0.386 0.382 0.269 0.483 0.379 0.378 0.249
fork 5000 271 0.443 0.25 0.265 0.195 0.389 0.199 0.198 0.0829
knife 5000 315 0.39 0.157 0.157 0.1 0.363 0.133 0.126 0.0696
spoon 5000 306 0.294 0.101 0.1 0.0595 0.292 0.0882 0.0741 0.0372
bowl 5000 659 0.5 0.355 0.346 0.257 0.434 0.294 0.256 0.142
banana 5000 1910 0.599 0.491 0.496 0.278 0.53 0.41 0.381 0.158
apple 5000 773 0.404 0.445 0.32 0.207 0.41 0.426 0.308 0.157
sandwich 5000 141 0.444 0.404 0.352 0.262 0.493 0.411 0.377 0.252
orange 5000 550 0.516 0.503 0.505 0.396 0.529 0.482 0.484 0.345
broccoli 5000 436 0.467 0.475 0.441 0.253 0.471 0.448 0.425 0.23
carrot 5000 1532 0.56 0.319 0.342 0.177 0.511 0.269 0.282 0.121
hot dog 5000 109 0.339 0.459 0.38 0.261 0.293 0.367 0.285 0.171
pizza 5000 312 0.68 0.593 0.623 0.515 0.686 0.582 0.614 0.481
donut 5000 499 0.532 0.495 0.51 0.389 0.525 0.471 0.493 0.331
cake 5000 438 0.474 0.342 0.352 0.223 0.476 0.324 0.349 0.21
chair 5000 3480 0.54 0.275 0.311 0.181 0.522 0.249 0.265 0.111
couch 5000 523 0.471 0.314 0.32 0.241 0.451 0.283 0.3 0.195
potted plant 5000 360 0.395 0.222 0.212 0.1 0.386 0.194 0.185 0.0692
bed 5000 310 0.547 0.294 0.326 0.237 0.551 0.277 0.309 0.189
dining table 5000 1615 0.446 0.285 0.283 0.191 0.449 0.264 0.258 0.136
toilet 5000 226 0.664 0.58 0.603 0.499 0.693 0.584 0.601 0.466
tv 5000 305 0.65 0.61 0.642 0.496 0.661 0.597 0.629 0.438
laptop 5000 288 0.65 0.503 0.543 0.457 0.65 0.486 0.519 0.3
mouse 5000 112 0.594 0.661 0.669 0.534 0.624 0.651 0.665 0.475
remote 5000 279 0.494 0.208 0.24 0.149 0.486 0.19 0.222 0.121
keyboard 5000 238 0.68 0.474 0.566 0.414 0.691 0.459 0.559 0.394
cell phone 5000 327 0.518 0.339 0.335 0.243 0.497 0.312 0.318 0.218
microwave 5000 58 0.585 0.535 0.578 0.483 0.592 0.534 0.589 0.447
oven 5000 149 0.548 0.369 0.396 0.27 0.572 0.362 0.388 0.222
toaster 5000 12 0.773 0.288 0.352 0.202 0.766 0.277 0.352 0.192
sink 5000 252 0.556 0.44 0.473 0.345 0.574 0.444 0.472 0.308
refrigerator 5000 157 0.609 0.459 0.498 0.392 0.623 0.452 0.495 0.36
book 5000 1082 0.398 0.308 0.283 0.152 0.31 0.217 0.173 0.0692
clock 5000 281 0.647 0.584 0.581 0.422 0.668 0.581 0.587 0.392
vase 5000 642 0.512 0.24 0.269 0.167 0.53 0.234 0.257 0.147
scissors 5000 43 0.39 0.209 0.209 0.188 0.372 0.186 0.187 0.0807
teddy bear 5000 254 0.668 0.449 0.505 0.346 0.654 0.423 0.481 0.303
hair drier 5000 12 0 0 0.0607 0.0286 0 0 0.0508 0.0167
toothbrush 5000 67 0.517 0.224 0.262 0.15 0.491 0.164 0.227 0.104
yolov8-n (12.6 GFLOPs), ultralytics (AGPL, do not use in any production setting)#
Class Images Instances Box(P R mAP50 mAP50-95) Mask(P R mAP50 mAP50-95): 100%|██████████| 313/313 [00:22<00:00, 13.90
all 5000 58597 0.57 0.386 0.41 0.295 0.579 0.368 0.396 0.253
person 2771 19918 0.727 0.405 0.489 0.328 0.732 0.386 0.465 0.266
bicycle 156 445 0.447 0.229 0.224 0.128 0.442 0.204 0.201 0.0767
car 575 2825 0.607 0.401 0.44 0.276 0.617 0.377 0.417 0.222
motorcycle 164 668 0.644 0.333 0.374 0.22 0.649 0.311 0.352 0.175
airplane 98 186 0.697 0.602 0.659 0.527 0.724 0.608 0.651 0.431
bus 190 447 0.704 0.445 0.482 0.378 0.712 0.437 0.474 0.356
train 155 370 0.752 0.389 0.432 0.32 0.759 0.381 0.426 0.323
truck 230 602 0.426 0.261 0.249 0.16 0.443 0.251 0.246 0.145
boat 111 473 0.464 0.339 0.299 0.174 0.467 0.302 0.285 0.139
traffic light 170 524 0.499 0.437 0.443 0.252 0.497 0.399 0.396 0.175
fire hydrant 81 97 0.801 0.753 0.795 0.638 0.809 0.743 0.788 0.6
stop sign 53 56 0.581 0.768 0.773 0.712 0.611 0.768 0.773 0.681
parking meter 32 53 0.731 0.604 0.664 0.558 0.765 0.604 0.682 0.535
bench 218 770 0.469 0.153 0.161 0.104 0.47 0.144 0.149 0.0741
bird 132 587 0.643 0.354 0.418 0.265 0.65 0.327 0.39 0.209
cat 185 242 0.723 0.702 0.734 0.592 0.741 0.698 0.734 0.601
dog 184 274 0.66 0.567 0.603 0.497 0.678 0.558 0.597 0.47
horse 130 526 0.61 0.312 0.346 0.225 0.611 0.308 0.332 0.176
sheep 57 462 0.54 0.535 0.517 0.363 0.548 0.517 0.5 0.29
cow 83 500 0.623 0.489 0.507 0.365 0.606 0.454 0.478 0.3
elephant 90 488 0.796 0.506 0.556 0.416 0.815 0.507 0.559 0.37
bear 53 158 0.759 0.379 0.402 0.341 0.749 0.361 0.385 0.324
zebra 86 428 0.763 0.558 0.618 0.466 0.775 0.547 0.601 0.398
giraffe 104 354 0.851 0.579 0.636 0.501 0.843 0.571 0.627 0.425
backpack 261 533 0.393 0.144 0.144 0.0777 0.393 0.124 0.139 0.0712
umbrella 175 613 0.601 0.367 0.392 0.246 0.639 0.37 0.406 0.256
handbag 334 729 0.429 0.133 0.151 0.0872 0.497 0.137 0.158 0.0808
tie 142 290 0.574 0.341 0.382 0.257 0.603 0.331 0.372 0.225
suitcase 102 409 0.507 0.301 0.353 0.24 0.527 0.286 0.344 0.222
frisbee 84 131 0.754 0.641 0.683 0.543 0.762 0.611 0.674 0.437
skis 114 459 0.239 0.102 0.0929 0.0395 0.236 0.0915 0.0637 0.0167
snowboard 45 74 0.507 0.351 0.358 0.259 0.478 0.311 0.329 0.188
sports ball 152 221 0.578 0.516 0.522 0.367 0.568 0.475 0.463 0.228
kite 80 257 0.466 0.646 0.546 0.369 0.461 0.599 0.503 0.254
baseball bat 101 254 0.486 0.24 0.232 0.129 0.565 0.252 0.259 0.13
baseball glove 97 146 0.583 0.507 0.513 0.337 0.572 0.466 0.493 0.292
skateboard 127 226 0.628 0.54 0.541 0.369 0.596 0.509 0.533 0.26
surfboard 147 409 0.521 0.328 0.318 0.196 0.589 0.34 0.342 0.166
tennis racket 168 339 0.665 0.433 0.46 0.298 0.677 0.419 0.44 0.281
bottle 414 1345 0.6 0.36 0.402 0.257 0.598 0.325 0.378 0.212
wine glass 115 449 0.613 0.276 0.342 0.214 0.673 0.272 0.334 0.181
cup 383 937 0.497 0.412 0.417 0.303 0.507 0.392 0.405 0.269
fork 148 271 0.507 0.296 0.293 0.207 0.5 0.262 0.241 0.108
knife 168 315 0.375 0.184 0.172 0.105 0.341 0.152 0.143 0.0791
spoon 143 306 0.327 0.127 0.117 0.0738 0.32 0.111 0.0972 0.0495
bowl 269 659 0.457 0.412 0.383 0.286 0.391 0.331 0.261 0.151
banana 97 1910 0.374 0.0545 0.132 0.0697 0.333 0.044 0.0944 0.0436
apple 65 773 0.462 0.0899 0.19 0.127 0.46 0.0828 0.176 0.1
sandwich 75 141 0.375 0.496 0.389 0.289 0.413 0.496 0.408 0.287
orange 79 550 0.601 0.299 0.414 0.326 0.628 0.291 0.405 0.295
broccoli 61 436 0.542 0.346 0.384 0.225 0.561 0.326 0.374 0.203
carrot 68 1532 0.557 0.122 0.21 0.119 0.59 0.11 0.191 0.0935
hot dog 37 109 0.551 0.541 0.54 0.372 0.532 0.45 0.46 0.292
pizza 142 312 0.67 0.606 0.645 0.525 0.695 0.59 0.641 0.499
donut 50 499 0.622 0.419 0.481 0.38 0.632 0.395 0.469 0.337
cake 123 438 0.575 0.352 0.411 0.268 0.596 0.336 0.412 0.264
chair 601 3480 0.564 0.216 0.27 0.169 0.54 0.187 0.235 0.102
couch 203 523 0.447 0.251 0.247 0.177 0.47 0.241 0.245 0.145
potted plant 163 360 0.364 0.289 0.227 0.0971 0.354 0.25 0.191 0.0681
bed 136 310 0.449 0.274 0.272 0.181 0.476 0.269 0.268 0.155
dining table 411 1615 0.261 0.115 0.0966 0.0526 0.189 0.0774 0.0598 0.0244
toilet 143 226 0.631 0.588 0.632 0.542 0.664 0.597 0.631 0.524
tv 209 305 0.624 0.636 0.653 0.491 0.633 0.61 0.631 0.435
laptop 181 288 0.599 0.535 0.558 0.464 0.629 0.521 0.548 0.324
mouse 87 112 0.615 0.686 0.687 0.522 0.619 0.67 0.673 0.467
remote 122 279 0.368 0.247 0.23 0.146 0.39 0.232 0.226 0.119
keyboard 149 238 0.696 0.501 0.556 0.406 0.712 0.475 0.549 0.378
cell phone 212 327 0.471 0.313 0.328 0.229 0.504 0.307 0.329 0.209
microwave 55 58 0.512 0.569 0.611 0.507 0.54 0.552 0.611 0.465
oven 97 149 0.558 0.499 0.504 0.345 0.592 0.49 0.499 0.31
toaster 10 12 1 0.185 0.501 0.327 1 0.165 0.501 0.303
sink 182 252 0.534 0.464 0.465 0.329 0.572 0.462 0.488 0.315
refrigerator 95 157 0.646 0.523 0.548 0.433 0.658 0.503 0.533 0.396
book 217 1082 0.413 0.166 0.201 0.106 0.337 0.107 0.123 0.0489
clock 207 281 0.622 0.598 0.612 0.436 0.628 0.58 0.612 0.386
vase 230 642 0.567 0.22 0.263 0.174 0.586 0.212 0.255 0.15
scissors 29 43 0.425 0.256 0.252 0.201 0.466 0.256 0.252 0.129
teddy bear 96 254 0.638 0.437 0.489 0.368 0.672 0.42 0.473 0.33
hair drier 6 12 1 0 0.0339 0.00601 1 0 0.0362 0.0106
toothbrush 32 67 0.429 0.209 0.161 0.0991 0.448 0.206 0.167 0.0828
yolov11-n (10.2 GFLOPs) (AGPL, do not use in any production setting)#
Class Images Instances Box(P R mAP50 mAP50-95) Mask(P R mAP50 mAP50-95): 100%|██████████| 313/313 [00:23<00:00, 13.09
all 5000 58597 0.594 0.392 0.424 0.309 0.601 0.376 0.41 0.265
person 2771 19918 0.74 0.392 0.487 0.33 0.749 0.38 0.468 0.27
bicycle 156 445 0.51 0.248 0.235 0.138 0.502 0.225 0.21 0.0858
car 575 2825 0.636 0.401 0.451 0.284 0.643 0.382 0.428 0.227
motorcycle 164 668 0.666 0.319 0.387 0.235 0.687 0.313 0.364 0.19
airplane 98 186 0.762 0.613 0.69 0.557 0.784 0.608 0.679 0.439
bus 190 447 0.727 0.423 0.478 0.381 0.758 0.425 0.475 0.36
train 155 370 0.734 0.395 0.454 0.347 0.716 0.378 0.444 0.34
truck 230 602 0.468 0.252 0.27 0.175 0.494 0.246 0.267 0.158
boat 111 473 0.534 0.342 0.369 0.22 0.539 0.313 0.345 0.163
traffic light 170 524 0.539 0.418 0.462 0.263 0.542 0.387 0.419 0.186
fire hydrant 81 97 0.819 0.773 0.837 0.665 0.841 0.763 0.826 0.642
stop sign 53 56 0.637 0.75 0.761 0.709 0.642 0.75 0.761 0.674
parking meter 32 53 0.745 0.66 0.659 0.547 0.774 0.66 0.68 0.558
bench 218 770 0.459 0.143 0.167 0.112 0.458 0.134 0.159 0.0792
bird 132 587 0.647 0.356 0.443 0.288 0.661 0.334 0.415 0.23
cat 185 242 0.755 0.707 0.75 0.635 0.775 0.707 0.756 0.632
dog 184 274 0.744 0.602 0.645 0.553 0.756 0.602 0.647 0.524
horse 130 526 0.668 0.338 0.37 0.237 0.664 0.331 0.351 0.187
sheep 57 462 0.574 0.539 0.544 0.383 0.577 0.517 0.515 0.312
cow 83 500 0.681 0.478 0.531 0.387 0.673 0.454 0.496 0.312
elephant 90 488 0.802 0.512 0.582 0.441 0.815 0.514 0.582 0.395
bear 53 158 0.812 0.367 0.415 0.357 0.829 0.361 0.418 0.335
zebra 86 428 0.796 0.547 0.629 0.471 0.787 0.528 0.608 0.404
giraffe 104 354 0.845 0.588 0.637 0.515 0.841 0.576 0.626 0.444
backpack 261 533 0.442 0.154 0.149 0.0863 0.471 0.145 0.155 0.078
umbrella 175 613 0.608 0.372 0.414 0.261 0.653 0.375 0.416 0.267
handbag 334 729 0.458 0.125 0.148 0.0876 0.481 0.115 0.153 0.0774
tie 142 290 0.606 0.36 0.379 0.256 0.61 0.328 0.354 0.221
suitcase 102 409 0.553 0.34 0.398 0.26 0.575 0.33 0.385 0.242
frisbee 84 131 0.733 0.664 0.701 0.559 0.745 0.672 0.707 0.468
skis 114 459 0.261 0.0959 0.0949 0.0406 0.224 0.0741 0.0667 0.0175
snowboard 45 74 0.515 0.405 0.408 0.295 0.545 0.378 0.401 0.21
sports ball 152 221 0.678 0.507 0.524 0.38 0.669 0.489 0.496 0.256
kite 80 257 0.488 0.665 0.585 0.415 0.474 0.607 0.538 0.266
baseball bat 101 254 0.533 0.232 0.256 0.153 0.613 0.252 0.282 0.141
baseball glove 97 146 0.609 0.541 0.55 0.351 0.622 0.541 0.555 0.325
skateboard 127 226 0.653 0.549 0.557 0.384 0.645 0.513 0.544 0.261
surfboard 147 409 0.523 0.306 0.316 0.198 0.566 0.311 0.346 0.168
tennis racket 168 339 0.698 0.435 0.477 0.298 0.686 0.407 0.441 0.279
bottle 414 1345 0.622 0.364 0.405 0.264 0.62 0.341 0.378 0.217
wine glass 115 449 0.616 0.285 0.35 0.223 0.651 0.283 0.346 0.19
cup 383 937 0.5 0.414 0.41 0.3 0.518 0.403 0.404 0.266
fork 148 271 0.511 0.28 0.316 0.229 0.485 0.247 0.259 0.125
knife 168 315 0.399 0.203 0.188 0.113 0.352 0.16 0.139 0.073
spoon 143 306 0.332 0.154 0.136 0.083 0.313 0.134 0.115 0.0526
bowl 269 659 0.467 0.41 0.372 0.276 0.388 0.319 0.249 0.143
banana 97 1910 0.373 0.0508 0.15 0.0808 0.322 0.0393 0.113 0.0522
apple 65 773 0.488 0.11 0.194 0.136 0.52 0.106 0.185 0.113
sandwich 75 141 0.427 0.525 0.434 0.325 0.458 0.525 0.456 0.327
orange 79 550 0.643 0.304 0.385 0.302 0.653 0.291 0.38 0.276
broccoli 61 436 0.57 0.351 0.398 0.233 0.567 0.321 0.381 0.218
carrot 68 1532 0.589 0.127 0.223 0.128 0.606 0.12 0.211 0.104
hot dog 37 109 0.576 0.596 0.578 0.417 0.501 0.486 0.48 0.325
pizza 142 312 0.67 0.631 0.661 0.543 0.682 0.612 0.651 0.51
donut 50 499 0.653 0.403 0.496 0.391 0.678 0.391 0.489 0.355
cake 123 438 0.655 0.345 0.458 0.3 0.677 0.333 0.455 0.297
chair 601 3480 0.568 0.214 0.274 0.175 0.552 0.189 0.239 0.109
couch 203 523 0.467 0.25 0.266 0.191 0.494 0.247 0.257 0.16
potted plant 163 360 0.367 0.286 0.237 0.107 0.345 0.247 0.2 0.0761
bed 136 310 0.551 0.294 0.313 0.214 0.554 0.284 0.302 0.186
dining table 411 1615 0.292 0.12 0.101 0.054 0.219 0.0842 0.0594 0.0231
toilet 143 226 0.714 0.615 0.669 0.572 0.713 0.597 0.66 0.545
tv 209 305 0.696 0.639 0.682 0.517 0.703 0.623 0.659 0.461
laptop 181 288 0.671 0.583 0.588 0.509 0.681 0.566 0.571 0.349
mouse 87 112 0.617 0.652 0.684 0.537 0.63 0.643 0.678 0.492
remote 122 279 0.433 0.276 0.274 0.177 0.441 0.262 0.266 0.147
keyboard 149 238 0.69 0.496 0.581 0.421 0.681 0.471 0.563 0.393
cell phone 212 327 0.567 0.349 0.375 0.267 0.574 0.324 0.369 0.237
microwave 55 58 0.611 0.603 0.64 0.547 0.643 0.586 0.642 0.479
oven 97 149 0.614 0.483 0.502 0.365 0.637 0.47 0.508 0.324
toaster 10 12 0.28 0.0833 0.222 0.137 0.313 0.0833 0.222 0.151
sink 182 252 0.544 0.472 0.486 0.349 0.566 0.456 0.486 0.321
refrigerator 95 157 0.69 0.554 0.576 0.478 0.727 0.548 0.574 0.441
book 217 1082 0.383 0.153 0.203 0.108 0.357 0.119 0.145 0.0594
clock 207 281 0.692 0.601 0.619 0.453 0.712 0.594 0.619 0.406
vase 230 642 0.57 0.221 0.269 0.181 0.579 0.208 0.264 0.156
scissors 29 43 0.7 0.302 0.284 0.246 0.716 0.294 0.282 0.137
teddy bear 96 254 0.726 0.461 0.537 0.404 0.718 0.445 0.518 0.374
hair drier 6 12 1 0 0.00566 0.00199 1 0 0.00561 0.00224
toothbrush 32 67 0.318 0.164 0.208 0.149 0.439 0.209 0.253 0.14
YOLOv9#
Implementation of paper - YOLOv9: Learning What You Want to Learn Using Programmable Gradient Information
Performance#
MS COCO
Model | Test Size | AP (val) | AP50 (val) | AP75 (val) | Param. | FLOPs |
---|---|---|---|---|---|---|
YOLOv9-T | 640 | 38.3% | 53.1% | 41.3% | 2.0M | 7.7G |
YOLOv9-S | 640 | 46.8% | 63.4% | 50.7% | 7.1M | 26.4G |
YOLOv9-M | 640 | 51.4% | 68.1% | 56.1% | 20.0M | 76.3G |
YOLOv9-C | 640 | 53.0% | 70.2% | 57.8% | 25.3M | 102.1G |
YOLOv9-E | 640 | 55.6% | 72.8% | 60.6% | 57.3M | 189.0G |
Useful Links#
Custom training: https://github.com/WongKinYiu/yolov9/issues/30#issuecomment-1960955297
ONNX export: https://github.com/WongKinYiu/yolov9/issues/2#issuecomment-1960519506 https://github.com/WongKinYiu/yolov9/issues/40#issue-2150697688 https://github.com/WongKinYiu/yolov9/issues/130#issue-2162045461
ONNX export for segmentation: https://github.com/WongKinYiu/yolov9/issues/260#issue-2191162150
TensorRT inference: https://github.com/WongKinYiu/yolov9/issues/143#issuecomment-1975049660 https://github.com/WongKinYiu/yolov9/issues/34#issue-2150393690 https://github.com/WongKinYiu/yolov9/issues/79#issue-2153547004 https://github.com/WongKinYiu/yolov9/issues/143#issue-2164002309
QAT TensorRT: https://github.com/WongKinYiu/yolov9/issues/327#issue-2229284136 https://github.com/WongKinYiu/yolov9/issues/253#issue-2189520073
TensorRT inference for segmentation: https://github.com/WongKinYiu/yolov9/issues/446
TFLite: https://github.com/WongKinYiu/yolov9/issues/374#issuecomment-2065751706
OpenVINO: https://github.com/WongKinYiu/yolov9/issues/164#issue-2168540003
C# ONNX inference: https://github.com/WongKinYiu/yolov9/issues/95#issue-2155974619
C# OpenVINO inference: https://github.com/WongKinYiu/yolov9/issues/95#issuecomment-1968131244
OpenCV: https://github.com/WongKinYiu/yolov9/issues/113#issuecomment-1971327672
Hugging Face demo: https://github.com/WongKinYiu/yolov9/issues/45#issuecomment-1961496943
CoLab demo: https://github.com/WongKinYiu/yolov9/pull/18
ONNXSlim export: https://github.com/WongKinYiu/yolov9/pull/37
YOLOv9 ROS: https://github.com/WongKinYiu/yolov9/issues/144#issue-2164210644
YOLOv9 ROS TensorRT: https://github.com/WongKinYiu/yolov9/issues/145#issue-2164218595
YOLOv9 Julia: https://github.com/WongKinYiu/yolov9/issues/141#issuecomment-1973710107
YOLOv9 MLX: https://github.com/WongKinYiu/yolov9/issues/258#issue-2190586540
YOLOv9 StrongSORT with OSNet: https://github.com/WongKinYiu/yolov9/issues/299#issue-2212093340
YOLOv9 ByteTrack: https://github.com/WongKinYiu/yolov9/issues/78#issue-2153512879
YOLOv9 DeepSORT: https://github.com/WongKinYiu/yolov9/issues/98#issue-2156172319
YOLOv9 counting: https://github.com/WongKinYiu/yolov9/issues/84#issue-2153904804
YOLOv9 speed estimation: https://github.com/WongKinYiu/yolov9/issues/456
YOLOv9 face detection: https://github.com/WongKinYiu/yolov9/issues/121#issue-2160218766
YOLOv9 segmentation onnxruntime: https://github.com/WongKinYiu/yolov9/issues/151#issue-2165667350
Comet logging: https://github.com/WongKinYiu/yolov9/pull/110
MLflow logging: https://github.com/WongKinYiu/yolov9/pull/87
AnyLabeling tool: https://github.com/WongKinYiu/yolov9/issues/48#issue-2152139662
AX650N deploy: https://github.com/WongKinYiu/yolov9/issues/96#issue-2156115760
Conda environment: https://github.com/WongKinYiu/yolov9/pull/93
AutoDL docker environment: https://github.com/WongKinYiu/yolov9/issues/112#issue-2158203480
Installation#
Docker environment (recommended)
# create the docker container, you can change the shared memory size if you have more.
nvidia-docker run --name yolov9 -it -v your_coco_path/:/coco/ -v your_code_path/:/yolov9 --shm-size=64g nvcr.io/nvidia/pytorch:21.11-py3
# apt install required packages
apt update
apt install -y zip htop screen libgl1-mesa-glx
# pip install required packages
pip install seaborn thop
# go to code folder
cd /yolov9
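A quick sanity check of the environment before training; this just prints the library versions that the compatibility notes at the top of this page are concerned with:
import torch
import PIL
print('torch:', torch.__version__, '| CUDA available:', torch.cuda.is_available())
print('Pillow:', PIL.__version__)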
Evaluation#
yolov9-s-converted.pt
yolov9-m-converted.pt
yolov9-c-converted.pt
yolov9-e-converted.pt
yolov9-s.pt
yolov9-m.pt
yolov9-c.pt
yolov9-e.pt
gelan-s.pt
gelan-m.pt
gelan-c.pt
gelan-e.pt
# evaluate converted yolov9 models
python val.py --data data/coco.yaml --img 640 --batch 32 --conf 0.001 --iou 0.7 --device 0 --weights './yolov9-c-converted.pt' --save-json --name yolov9_c_c_640_val
# evaluate yolov9 models
# python val_dual.py --data data/coco.yaml --img 640 --batch 32 --conf 0.001 --iou 0.7 --device 0 --weights './yolov9-c.pt' --save-json --name yolov9_c_640_val
# evaluate gelan models
# python val.py --data data/coco.yaml --img 640 --batch 32 --conf 0.001 --iou 0.7 --device 0 --weights './gelan-c.pt' --save-json --name gelan_c_640_val
You will get the results:
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.530
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.702
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.578
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.362
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.585
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.693
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.392
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.652
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.702
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.541
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.760
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.844
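The block above is the standard pycocotools summary. If you need the same numbers programmatically, the file written by --save-json can be re-scored directly (a sketch; the annotation path and the predictions file name under runs/val/ are assumptions):
from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval
anno = COCO('coco/annotations/instances_val2017.json')                # ground-truth annotations
pred = anno.loadRes('runs/val/yolov9_c_c_640_val/predictions.json')   # --save-json output (name may differ)
coco_eval = COCOeval(anno, pred, 'bbox')
coco_eval.evaluate()
coco_eval.accumulate()
coco_eval.summarize()  # prints the AP/AR table shown above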
Training#
Data preparation
bash scripts/get_coco.sh
- Download MS COCO dataset images (train, val, test) and labels. If you have previously used a different version of YOLO, we strongly recommend that you delete the train2017.cache and val2017.cache files and redownload the labels (see the cleanup sketch below).
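A small cleanup sketch for the stale caches; the ../coco dataset root is an assumption based on the default layout created by get_coco.sh:
from pathlib import Path
# remove label caches left behind by a previous YOLO version
for cache in Path('../coco').rglob('*.cache'):
    print('removing', cache)
    cache.unlink()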
Single GPU training
# train yolov9 models
python train_dual.py --workers 8 --device 0 --batch 16 --data data/coco.yaml --img 640 --cfg models/detect/yolov9-c.yaml --weights '' --name yolov9-c --hyp hyp.scratch-high.yaml --min-items 0 --epochs 500 --close-mosaic 15
# train gelan models
# python train.py --workers 8 --device 0 --batch 32 --data data/coco.yaml --img 640 --cfg models/detect/gelan-c.yaml --weights '' --name gelan-c --hyp hyp.scratch-high.yaml --min-items 0 --epochs 500 --close-mosaic 15
Multiple GPU training
# train yolov9 models
python -m torch.distributed.launch --nproc_per_node 8 --master_port 9527 train_dual.py --workers 8 --device 0,1,2,3,4,5,6,7 --sync-bn --batch 128 --data data/coco.yaml --img 640 --cfg models/detect/yolov9-c.yaml --weights '' --name yolov9-c --hyp hyp.scratch-high.yaml --min-items 0 --epochs 500 --close-mosaic 15
# train gelan models
# python -m torch.distributed.launch --nproc_per_node 4 --master_port 9527 train.py --workers 8 --device 0,1,2,3 --sync-bn --batch 128 --data data/coco.yaml --img 640 --cfg models/detect/gelan-c.yaml --weights '' --name gelan-c --hyp hyp.scratch-high.yaml --min-items 0 --epochs 500 --close-mosaic 15
Re-parameterization#
Inference#
# inference converted yolov9 models
python detect.py --source './data/images/horses.jpg' --img 640 --device 0 --weights './yolov9-c-converted.pt' --name yolov9_c_c_640_detect
# inference yolov9 models
# python detect_dual.py --source './data/images/horses.jpg' --img 640 --device 0 --weights './yolov9-c.pt' --name yolov9_c_640_detect
# inference gelan models
# python detect.py --source './data/images/horses.jpg' --img 640 --device 0 --weights './gelan-c.pt' --name gelan_c_c_640_detect
Citation#
@article{wang2024yolov9,
title={{YOLOv9}: Learning What You Want to Learn Using Programmable Gradient Information},
author={Wang, Chien-Yao and Liao, Hong-Yuan Mark},
journal={arXiv preprint arXiv:2402.13616},
year={2024}
}
@article{chang2023yolor,
title={{YOLOR}-Based Multi-Task Learning},
author={Chang, Hung-Shuo and Wang, Chien-Yao and Wang, Richard Robert and Chou, Gene and Liao, Hong-Yuan Mark},
journal={arXiv preprint arXiv:2309.16921},
year={2023}
}
Teaser#
Parts of the code for YOLOR-Based Multi-Task Learning are released in this repository.
Object Detection#
object detection
# coco/labels/{split}/*.txt
# bbox or polygon (1 instance 1 line)
python train.py --workers 8 --device 0 --batch 32 --data data/coco.yaml --img 640 --cfg models/detect/gelan-c.yaml --weights '' --name gelan-c-det --hyp hyp.scratch-high.yaml --min-items 0 --epochs 300 --close-mosaic 10
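For reference, each line of a label file under coco/labels/{split}/ describes one instance: a class id followed by the normalized box center x, center y, width, and height. The values below are purely illustrative:
# 45 0.479 0.688 0.955 0.595
# 45 0.736 0.247 0.498 0.426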
Model | Test Size | Param. | FLOPs | AP (box) |
---|---|---|---|---|
GELAN-C-DET | 640 | 25.3M | 102.1G | 52.3% |
YOLOv9-C-DET | 640 | 25.3M | 102.1G | 53.0% |
Instance Segmentation#
object detection
instance segmentation
# coco/labels/{split}/*.txt
# polygon (1 instance 1 line)
python segment/train.py --workers 8 --device 0 --batch 32 --data coco.yaml --img 640 --cfg models/segment/gelan-c-seg.yaml --weights '' --name gelan-c-seg --hyp hyp.scratch-high.yaml --no-overlap --epochs 300 --close-mosaic 10
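For segmentation labels, each line is a class id followed by a normalized polygon x1 y1 x2 y2 ... for that instance (illustrative values):
# 0 0.681 0.485 0.670 0.487 0.676 0.499 0.690 0.501 0.702 0.494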
Model | Test Size | Param. | FLOPs | AP (box) | AP (mask) |
---|---|---|---|---|---|
GELAN-C-SEG | 640 | 27.4M | 144.6G | 52.3% | 42.4% |
YOLOv9-C-SEG | 640 | 27.4M | 145.5G | 53.3% | 43.5% |
Panoptic Segmentation#
object detection
instance segmentation
semantic segmentation
stuff segmentation
panoptic segmentation
# coco/labels/{split}/*.txt
# polygon (1 instance 1 line)
# coco/stuff/{split}/*.txt
# polygon (1 semantic 1 line)
python panoptic/train.py --workers 8 --device 0 --batch 32 --data coco.yaml --img 640 --cfg models/panoptic/gelan-c-pan.yaml --weights '' --name gelan-c-pan --hyp hyp.scratch-high.yaml --no-overlap --epochs 300 --close-mosaic 10
Model | Test Size | Param. | FLOPs | AP (box) | AP (mask) | mIoU semantic (164k/10k) | mIoU (stuff) | PQ (panoptic) |
---|---|---|---|---|---|---|---|---|
GELAN-C-PAN | 640 | 27.6M | 146.7G | 52.6% | 42.5% | 39.0%/48.3% | 52.7% | 39.4% |
YOLOv9-C-PAN | 640 | 28.8M | 187.0G | 52.7% | 43.0% | 39.8%/- | 52.2% | 40.5% |
Image Captioning (not yet released)#
object detection
instance segmentation
semantic segmentation
stuff segmentation
panoptic segmentation
image captioning
# coco/labels/{split}/*.txt
# polygon (1 instance 1 line)
# coco/stuff/{split}/*.txt
# polygon (1 semantic 1 line)
# coco/annotations/*.json
# json (1 split 1 file)
python caption/train.py --workers 8 --device 0 --batch 32 --data coco.yaml --img 640 --cfg models/caption/gelan-c-cap.yaml --weights '' --name gelan-c-cap --hyp hyp.scratch-high.yaml --no-overlap --epochs 300 --close-mosaic 10
Model | Test Size | Param. | FLOPs | AP (box) | AP (mask) | mIoU semantic (164k/10k) | mIoU (stuff) | PQ (panoptic) | BLEU@4 (caption) | CIDEr (caption) |
---|---|---|---|---|---|---|---|---|---|---|
GELAN-C-CAP | 640 | 47.5M | - | 51.9% | 42.6% | 42.5%/- | 56.5% | 41.7% | 38.8 | 122.3 |
YOLOv9-C-CAP | 640 | 47.5M | - | 52.1% | 42.6% | 43.0%/- | 56.4% | 42.1% | 39.1 | 122.0 |