| Authors (1) |
Asst. Prof. Dr. Senem TANBERK
Huawei R&D İstanbul, Türkiye |
| Abstract |
| Emotion recognition has become an important research topic in human-computer interaction. Studies that use audio and video to understand emotions have focused mainly on analyzing facial expressions and classifying six basic emotions. This study compares the performance of sequence models frequently used in the literature on the multi-modal emotion recognition problem. The audio and images were first processed by multi-layered CNN models, and the outputs of these models were fed into various sequence models: GRU, Transformer, LSTM, and max pooling. Accuracy, precision, harmonic F1 score, and macro F1 score were calculated for all models. The multi-modal CREMA-D dataset was used in the experiments. On the CREMA-D dataset, the GRU-based architecture achieved the best harmonic F1 score, at 0.640, LSTM-based … |
| Keywords |
| Paper Type | Conference Paper |
| Paper Subtype | Paper Published in Full Text (International Congress/Symposium) |
| Paper Status | Peer-Reviewed International Congress/Symposium in the Field |
| Paper Language | English |
| Conference Name | 2023 Innovations in Intelligent Systems and Applications Conference (ASYU) |
| Conference Date | 11-10-2023 / 11-10-2023 |
| Country of Publication | Türkiye |
| City of Publication |