Resume Information Extraction via Post-OCR Text Processing

TANBERK, SENEM

Authors (1)

Asst. Prof. Dr. Senem TANBERK, Huawei R&D, İstanbul, Türkiye

Paper Type	Conference Paper	Paper Language	English
Paper Subtype	Full-Text Published Paper (International Congress/Symposium)
Paper Status	Peer-Reviewed International Congress/Symposium in the Field
Conference Name	2023 Innovations in Intelligent Systems and Applications Conference (ASYU)
Conference Date	11-10-2023 / 11-10-2023
Country of Publication	Türkiye
Paper Link	https://ieeexplore.ieee.org/abstract/document/10296715/
UAK Research Areas	Educational Sciences

Abstract

Information extraction (IE), one of the main tasks of natural language processing (NLP), has recently gained importance for processing resumes. Previous studies that extract information from CV text have generally relied on sentence classification with NLP models. In this study, we aim to extract information by classifying entire text groups obtained after preprocessing the resumes with Optical Character Recognition (OCR) and object detection using the YOLOv8 model. The text dataset consists of 286 resumes collected for job descriptions in the IT industry and covers five classes (education, experience, talent, personal, and language). The object detection dataset consists of 1,198 resumes collected from open-source datasets and labeled by text group. BERT, BERT-t, DistilBERT, RoBERTa, and XLNet were used as models, and their results were compared using F1 scores. In addition …
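The abstract does not state how OCR output is grouped into the regions found by YOLOv8 before classification. The sketch below assumes one simple rule: each OCR word is assigned to the first detected region that contains the center of its bounding box, and the words in each region are joined into one text block. The `Box` and `Word` types and `group_words_by_region` are hypothetical names; in a real pipeline the region boxes would come from the YOLOv8 detector and the word boxes from an OCR engine rather than being written by hand.

```python
from dataclasses import dataclass


@dataclass
class Box:
    """Axis-aligned bounding box in pixel coordinates, (x1, y1) to (x2, y2)."""
    x1: float
    y1: float
    x2: float
    y2: float

    def contains(self, x: float, y: float) -> bool:
        return self.x1 <= x <= self.x2 and self.y1 <= y <= self.y2


@dataclass
class Word:
    """One OCR token with its bounding box (hypothetical OCR output format)."""
    text: str
    box: Box


def group_words_by_region(words: list[Word], regions: list[Box]) -> list[str]:
    """Return one text block per detected region.

    Each word goes to the first region containing the center of its box;
    words outside every region are dropped. Within a group, words keep
    the order in which the OCR engine returned them.
    """
    groups: list[list[str]] = [[] for _ in regions]
    for word in words:
        cx = (word.box.x1 + word.box.x2) / 2
        cy = (word.box.y1 + word.box.y2) / 2
        for i, region in enumerate(regions):
            if region.contains(cx, cy):
                groups[i].append(word.text)
                break
    return [" ".join(group) for group in groups]


# Two hand-written "detected" sections and four OCR words.
regions = [Box(0, 0, 100, 50), Box(0, 60, 100, 120)]
words = [
    Word("Education", Box(5, 5, 60, 15)),
    Word("BSc", Box(5, 20, 30, 30)),
    Word("Python", Box(5, 70, 50, 80)),
    Word("footer", Box(5, 200, 40, 210)),  # falls outside both regions
]
print(group_words_by_region(words, regions))  # ['Education BSc', 'Python']
```

Each joined string would then be one input to a text classifier such as BERT or RoBERTa, which assigns it to one of the five classes. Center containment is only one possible rule; assigning words by bounding-box overlap is a common alternative when OCR boxes straddle region borders.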

Citation Counts
Google Scholar	6