Hand Gesture Detection - Rev #03
0. Overview
- The Conv3D-based approach covered in the previous post ( https://moonlight314.github.io/deep/learning/Hand_Gesture_Detection_Rev_02/ ) failed to train properly.
- In this post, each frame of the video is treated as an image: a pre-trained model extracts a feature vector from every frame, and the resulting feature sequences are fed into an RNN to detect the hand gesture.
- The overall flow is sketched below.
- MobileNetV2 will be used as the pre-trained model that extracts features from each image.
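- Conceptually, each clip flows through the stages below (a rough sketch added here for orientation; the exact frame size and MAX_FRAME come from constants defined later in this post).
# Per-clip data flow (conceptual sketch, not the actual training code):
#
#   video clip                          -> N frames, each resized to (160, 160, 3)
#   MobileNetV2 (include_top=False)
#       + GlobalAveragePooling2D        -> N feature vectors of shape (1280,)
#   pad front/back to MAX_FRAME frames  -> feature sequence of shape (128, 1280)
#   Bidirectional LSTM stack + softmax  -> probabilities over 4 gesture classes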
1. Load Module
- Load the required modules.
- Nothing special here, just the modules we always use.
import pandas as pd
import cv2 as cv
import numpy as np
from collections import deque
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split
import tensorflow as tf
from tensorflow.keras.applications import MobileNetV2
from tensorflow.keras.applications.mobilenet_v2 import preprocess_input
from tensorflow.keras.models import Sequential, Model
from tensorflow.keras.layers import LSTM, Dense, Bidirectional, Dropout, BatchNormalization, GlobalAveragePooling2D
2. Feature Extractor
- First, define the constants.
MAX_FRAME = 128
BATCH_SIZE = 32
EPOCHS = 5
DEST_SIZE = (160,160)
- As mentioned above, MobileNetV2 is used as the feature extractor.
- The weights are initialized with the ImageNet weights and the top (classification) layers are removed.
- GlobalAveragePooling is added to flatten the output into a 1-D feature vector, and the model is frozen.
- With this, we are ready to extract features from images.
def Feature_Extractor():
    model = Sequential()

    # MobileNetV2 backbone with ImageNet weights and without the classification head
    model.add(MobileNetV2(include_top=False,
                          weights='imagenet',
                          input_shape=(160, 160, 3)))
    model.add(GlobalAveragePooling2D())   # collapse the (5, 5, 1280) feature map to (1280,)
    model.trainable = False               # freeze the extractor; only the RNN will be trained

    return model
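- As a quick sanity check (my own addition, not part of the original code), a dummy batch of frames should come out as one 1280-dimensional feature vector per frame.
# Shape check with a dummy batch of 4 frames (illustrative only)
fe = Feature_Extractor()
dummy_frames = np.zeros((4, 160, 160, 3), dtype=np.float32)
print(fe.predict_on_batch(dummy_frames).shape)   # expected: (4, 1280)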
3. RNN Model
- The RNN model is built much like the one in the previous post.
- Only the input shape was changed, to match the output of the feature extractor: (MAX_FRAME, 1280).
def Make_RNN_Model():
    model = Sequential()
    model.add(Bidirectional(LSTM(512, return_sequences=True, activation='tanh'),
                            input_shape=(MAX_FRAME, 1280)))
    model.add(Bidirectional(LSTM(256, return_sequences=True, activation='tanh')))
    model.add(Bidirectional(LSTM(64, return_sequences=False, activation='tanh')))
    model.add(Dense(32, activation='relu'))
    model.add(Dense(4, activation='softmax'))

    return model
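- Another quick shape check (also my own addition): one dummy feature sequence goes in, four class probabilities come out.
# Shape check: (batch, MAX_FRAME, 1280) in -> (batch, 4) out
rnn = Make_RNN_Model()
dummy_seq = np.zeros((1, MAX_FRAME, 1280), dtype=np.float32)
print(rnn(dummy_seq).shape)   # expected: (1, 4)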
4. Generate Train Data
- The following functions extract features from the recorded hand-gesture video files and adjust every feature sequence to the same length.
- Adjust_Length pads a feature sequence whose length does not match MAX_FRAME with dummy data at the front and back so that all sequences end up the same length.
def Adjust_Length(data):
    # Features used as padding:
    # the front is padded with copies of the first feature,
    # the back with copies of the last feature.
    front = data[0]
    back = data[-1]

    d = np.array(data)
    data = deque(data)
    length = d.shape[0]

    for _ in range(int((MAX_FRAME - length) / 2)):
        data.append(back)
        data.appendleft(front)

    # If the difference was odd, add one more frame at the back
    if len(data) != MAX_FRAME:
        data.append(back)

    aligned_data = np.array(data)

    return aligned_data
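- A small illustrative check (not in the original post): a 10-frame feature sequence is padded out to MAX_FRAME, using the first and last feature vectors as padding.
# Pad a short dummy sequence of 1280-d features to MAX_FRAME
short_seq = np.random.rand(10, 1280)
padded = Adjust_Length(short_seq)
print(padded.shape)                               # (128, 1280)
print(np.array_equal(padded[0], short_seq[0]))    # True (front padded with the first feature)
print(np.array_equal(padded[-1], short_seq[-1]))  # True (back padded with the last feature)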
- generate_train_data reads each video file, runs every frame through the feature extractor, collects the results, applies the length adjustment described above, and returns data ready to be fed into the RNN model.
def generate_train_data(file_path, label, feature_extractor):
    output = []

    for f in file_path:
        batch_frames = []

        # Map the path recorded in the meta data onto the local folder
        filename = f.replace("./Train_Data", "C:/Users/csyi/Desktop/Hand_Gesture_Detection/Train_Data")
        filename = filename.replace("'", "")

        cap = cv.VideoCapture(filename)
        if cap.isOpened() == False:
            print("Open Error")

        # Read and preprocess every frame of the clip
        while True:
            ret, frame = cap.read()
            if ret == False:
                break

            frame = cv.resize(frame, dsize=DEST_SIZE, interpolation=cv.INTER_AREA)
            frame = preprocess_input(frame)
            batch_frames.append(frame)

        cap.release()

        # Extract one (num_frames, 1280) feature sequence for this clip
        batch_frames = np.reshape(np.array(batch_frames), (-1, 160, 160, 3))
        ret = feature_extractor.predict_on_batch(batch_frames)

        # Pad the sequence to MAX_FRAME and collect it
        data = Adjust_Length(ret)
        output.append(data)

    output = np.reshape(np.array(output), (-1, MAX_FRAME, 1280))
    label = np.reshape(np.array(label), (-1, 4))

    return output, label
- This is the structure of the feature extractor built on MobileNetV2.
# Load Feature Extractor
feature_extractor = Feature_Extractor()
feature_extractor.summary()
Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
mobilenetv2_1.00_160 (Functi (None, 5, 5, 1280) 2257984
_________________________________________________________________
global_average_pooling2d (Gl (None, 1280) 0
=================================================================
Total params: 2,257,984
Trainable params: 0
Non-trainable params: 2,257,984
_________________________________________________________________
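- For reference, a call on a small batch of clips looks roughly like this (the file names and labels below are made-up placeholders; it assumes the clips exist under the Train_Data folder).
# Hypothetical smoke test (file names and labels are placeholders)
sample_paths  = ["./Train_Data/clip_000.avi", "./Train_Data/clip_001.avi"]
sample_labels = [[1., 0., 0., 0.], [0., 1., 0., 0.]]

X_batch, y_batch = generate_train_data(sample_paths, sample_labels, feature_extractor)
print(X_batch.shape, y_batch.shape)   # expected: (2, 128, 1280) (2, 4)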
- Now let's build the RNN as well.
RNN_Model = Make_RNN_Model()

RNN_Model.compile(optimizer=tf.keras.optimizers.Adam(1e-3),
                  loss='categorical_crossentropy',
                  metrics=['categorical_accuracy'])
- Load the meta data file.
# Load meta data file
meta = pd.read_csv("Meta_Data_220117_Rev_01.csv")
# Data File Path
file_path = meta['file_path'].tolist()
# Label
labels = meta['action'].tolist()
print(len(file_path) , len(labels))
389 389
- Prepare the labels as well: the action names are integer-encoded with LabelEncoder and then one-hot encoded.
le = LabelEncoder()
le_action = le.fit(labels)
le_action = le.transform(labels)
print(le.classes_)
y = tf.keras.utils.to_categorical(le_action, num_classes=4)
print(y)
['1_Finger_Click' '2_Fingers_Left' '2_Fingers_Right' 'Shake_Hand']
[[1. 0. 0. 0.]
[1. 0. 0. 0.]
[1. 0. 0. 0.]
...
[0. 0. 0. 1.]
[0. 0. 0. 1.]
[0. 0. 0. 1.]]
# Split into Train / Test sets
X_train, X_test, y_train, y_test = train_test_split(file_path, y, test_size=0.25 , stratify = y)
tmp_val_data = []
tmp_val_target = []
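- A quick check of the split sizes (my own addition): with 389 clips and test_size=0.25, there should be roughly 291 training and 98 test files.
# Sanity check on the split sizes
print(len(X_train), len(X_test))   # expected: 291 98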
- Now for the long-awaited training stage.
- Each batch of video files is first turned into feature sequences by the feature extractor, and those sequences are then fed to the RNN as its input.
for epoch in range(EPOCHS):
    print("### Epoch : {0} ###\n\n".format(epoch))

    for idx in range(0, len(X_train), BATCH_SIZE):
        batch_file_list = []
        batch_target = []

        # Collect the file paths and labels for this batch
        for batch in range(BATCH_SIZE):
            if idx + batch >= len(X_train):
                break

            batch_file_list.append(X_train[idx + batch])
            batch_target.append(y_train[idx + batch])

        # Extract the feature sequences for this batch and run one training step
        train_data, target = generate_train_data(batch_file_list, batch_target, feature_extractor)

        RNN_Model.fit(x=train_data,
                      y=target,
                      verbose=1)

    # Eval.
    print("### Evaluation... Epoch : {0} ###".format(epoch))

    # The validation features only need to be extracted once and are then reused
    if len(tmp_val_data) == 0:
        tmp_val_data, tmp_val_target = generate_train_data(X_test, y_test, feature_extractor)

    ret = RNN_Model.evaluate(tmp_val_data, tmp_val_target)
### Epoch : 0 ###
1/1 [==============================] - 6s 6s/step - loss: 1.3582 - categorical_accuracy: 0.2188
1/1 [==============================] - 0s 246ms/step - loss: 1.6103 - categorical_accuracy: 0.2812
1/1 [==============================] - 0s 248ms/step - loss: 1.9793 - categorical_accuracy: 0.0938
1/1 [==============================] - 0s 246ms/step - loss: 1.2979 - categorical_accuracy: 0.4062
1/1 [==============================] - 0s 244ms/step - loss: 1.4777 - categorical_accuracy: 0.1875
1/1 [==============================] - 0s 247ms/step - loss: 1.3085 - categorical_accuracy: 0.2812
1/1 [==============================] - 0s 247ms/step - loss: 1.3324 - categorical_accuracy: 0.2188
1/1 [==============================] - 0s 247ms/step - loss: 1.2749 - categorical_accuracy: 0.4062
1/1 [==============================] - 0s 244ms/step - loss: 1.1534 - categorical_accuracy: 0.4375
1/1 [==============================] - 4s 4s/step - loss: 1.1634 - categorical_accuracy: 0.0000e+00
### Evaluation... Epoch : 0 ###
4/4 [==============================] - 2s 89ms/step - loss: 1.2826 - categorical_accuracy: 0.4592
### Epoch : 1 ###
1/1 [==============================] - 0s 246ms/step - loss: 1.2443 - categorical_accuracy: 0.4375
1/1 [==============================] - 0s 245ms/step - loss: 1.1586 - categorical_accuracy: 0.4688
1/1 [==============================] - 0s 243ms/step - loss: 1.0803 - categorical_accuracy: 0.6562
1/1 [==============================] - 0s 246ms/step - loss: 0.9259 - categorical_accuracy: 0.7812
1/1 [==============================] - 0s 243ms/step - loss: 0.9260 - categorical_accuracy: 0.5938
1/1 [==============================] - 0s 240ms/step - loss: 0.7924 - categorical_accuracy: 0.7500
1/1 [==============================] - 0s 244ms/step - loss: 0.6781 - categorical_accuracy: 0.8125
1/1 [==============================] - 0s 247ms/step - loss: 0.5861 - categorical_accuracy: 0.8438
1/1 [==============================] - 0s 238ms/step - loss: 0.5703 - categorical_accuracy: 0.8438
1/1 [==============================] - 0s 106ms/step - loss: 0.5877 - categorical_accuracy: 1.0000
### Evaluation... Epoch : 1 ###
4/4 [==============================] - 0s 88ms/step - loss: 0.8222 - categorical_accuracy: 0.6020
### Epoch : 2 ###
1/1 [==============================] - 0s 240ms/step - loss: 0.7703 - categorical_accuracy: 0.5938
1/1 [==============================] - 0s 239ms/step - loss: 0.6549 - categorical_accuracy: 0.8438
1/1 [==============================] - 0s 245ms/step - loss: 0.6331 - categorical_accuracy: 0.8438
1/1 [==============================] - 0s 242ms/step - loss: 0.5142 - categorical_accuracy: 0.8125
1/1 [==============================] - 0s 228ms/step - loss: 0.4535 - categorical_accuracy: 0.7812
1/1 [==============================] - 0s 251ms/step - loss: 0.3349 - categorical_accuracy: 0.8438
1/1 [==============================] - 0s 244ms/step - loss: 0.2939 - categorical_accuracy: 1.0000
1/1 [==============================] - 0s 243ms/step - loss: 0.3022 - categorical_accuracy: 1.0000
1/1 [==============================] - 0s 242ms/step - loss: 0.2264 - categorical_accuracy: 0.9688
1/1 [==============================] - 0s 107ms/step - loss: 0.1069 - categorical_accuracy: 1.0000
### Evaluation... Epoch : 2 ###
4/4 [==============================] - 0s 87ms/step - loss: 0.2742 - categorical_accuracy: 0.9184
### Epoch : 3 ###
1/1 [==============================] - 0s 249ms/step - loss: 0.2286 - categorical_accuracy: 0.9688
1/1 [==============================] - 0s 248ms/step - loss: 0.2914 - categorical_accuracy: 0.9688
1/1 [==============================] - 0s 241ms/step - loss: 0.2398 - categorical_accuracy: 1.0000
1/1 [==============================] - 0s 247ms/step - loss: 0.1969 - categorical_accuracy: 0.9688
1/1 [==============================] - 0s 242ms/step - loss: 0.1515 - categorical_accuracy: 1.0000
1/1 [==============================] - 0s 245ms/step - loss: 0.2328 - categorical_accuracy: 0.8438
1/1 [==============================] - 0s 248ms/step - loss: 0.1201 - categorical_accuracy: 1.0000
1/1 [==============================] - 0s 247ms/step - loss: 0.2063 - categorical_accuracy: 0.9375
1/1 [==============================] - 0s 241ms/step - loss: 0.0878 - categorical_accuracy: 1.0000
1/1 [==============================] - 0s 105ms/step - loss: 0.0248 - categorical_accuracy: 1.0000
### Evaluation... Epoch : 3 ###
4/4 [==============================] - 0s 87ms/step - loss: 0.1733 - categorical_accuracy: 0.9490
### Epoch : 4 ###
1/1 [==============================] - 0s 241ms/step - loss: 0.1172 - categorical_accuracy: 0.9688
1/1 [==============================] - 0s 243ms/step - loss: 0.1761 - categorical_accuracy: 0.9688
1/1 [==============================] - 0s 250ms/step - loss: 0.2952 - categorical_accuracy: 0.8750
1/1 [==============================] - 0s 244ms/step - loss: 0.1936 - categorical_accuracy: 0.9688
1/1 [==============================] - 0s 243ms/step - loss: 0.3419 - categorical_accuracy: 0.9062
1/1 [==============================] - 0s 244ms/step - loss: 0.1645 - categorical_accuracy: 0.9375
1/1 [==============================] - 0s 239ms/step - loss: 0.3557 - categorical_accuracy: 0.8438
1/1 [==============================] - 0s 253ms/step - loss: 0.1632 - categorical_accuracy: 0.9688
1/1 [==============================] - 0s 243ms/step - loss: 0.1602 - categorical_accuracy: 0.9688
1/1 [==============================] - 0s 107ms/step - loss: 0.0483 - categorical_accuracy: 1.0000
### Evaluation... Epoch : 4 ###
4/4 [==============================] - 0s 86ms/step - loss: 0.4784 - categorical_accuracy: 0.8469
- Fortunately, training actually works this time. Loss and accuracy improve on both the training and validation data, with validation accuracy reaching about 0.95 by epoch 3.