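Ahmed Nizhan Haikal (2025) IMPLEMENTASI MODEL VISUAL QUESTION ANSWERING MENGGUNAKAN VISION TRANSFORMER DAN EFFICIENTNET-V2 DENGAN BERT [Implementation of a Visual Question Answering Model Using Vision Transformer and EfficientNet-V2 with BERT]. Undergraduate thesis (Skripsi), Universitas Pembangunan Nasional Veteran Jakarta.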
File | Access | Size |
---|---|---|
ABSTRAK.pdf (abstract) | Open | 159kB |
AWAL.pdf (front matter) | Open | 3MB |
BAB 1.pdf (Chapter 1) | Restricted to Repository UPNVJ Only | 197kB |
BAB 2.pdf (Chapter 2) | Restricted to Repository UPNVJ Only | 1MB |
BAB 3.pdf (Chapter 3) | Restricted to Repository UPNVJ Only | 897kB |
BAB 4.pdf (Chapter 4) | Restricted to Repository UPNVJ Only | 8MB |
BAB 5.pdf (Chapter 5) | Open | 169kB |
DAFTAR PUSTAKA.pdf (references) | Open | 179kB |
DAFTAR RIWAYAT HIDUP.pdf (curriculum vitae) | Restricted to Repository UPNVJ Only | 252kB |
ARTIKEL KI.pdf (article) | Restricted to Repository staff only | 7MB |
HASIL PLAGIARISME.pdf (plagiarism check results) | Restricted to Repository staff only | 17MB |
Abstract
Visual Question Answering (VQA) is a multimodal task that combines computer vision and natural language processing to answer natural-language questions about images. This research examines how the choice of visual backbone (EfficientNet V2 or Vision Transformer, ViT) affects VQA performance. BERT serves as the text backbone for its bidirectional contextual understanding of the question. We used the DT-VQA dataset and applied a thresholding strategy to address label imbalance. Among the ViT-BERT variants, the fine-tuned ViT-BERT model performed best, with an accuracy of 53.19% and an ANLS (Average Normalized Levenshtein Similarity) of 61.95% at a threshold of 75. The best overall performance came from the EfficientNet V2-BERT model trained with transfer learning, which reached an accuracy of 56.3% and an ANLS of 62.9%. These findings underscore the importance of tailoring architectural choices and training strategies to the characteristics of the data.
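The abstract reports both accuracy and ANLS. The following is a minimal sketch of ANLS, assuming the standard definition introduced with the ST-VQA benchmark:

- a per-answer threshold τ = 0.5;
- lowercase and whitespace normalization;
- the best score taken over all reference answers for each question.

The thesis's own evaluation code is not shown in this record, and the function names below are illustrative. The τ here is a property of the metric and is unrelated to the threshold of 75 mentioned above for the label-imbalance strategy.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions) via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def nls(pred: str, gt: str, tau: float = 0.5) -> float:
    """Normalized Levenshtein similarity, set to 0 when the normalized distance reaches tau.

    Assumption: answers are compared after lowercasing and stripping whitespace.
    """
    p, g = pred.strip().lower(), gt.strip().lower()
    longest = max(len(p), len(g))
    if longest == 0:
        return 1.0
    dist = levenshtein(p, g) / longest
    return 1.0 - dist if dist < tau else 0.0


def anls(predictions: list[str], answers: list[list[str]], tau: float = 0.5) -> float:
    """Mean over questions of the best NLS against any of that question's reference answers."""
    scores = [max(nls(p, gt, tau) for gt in gts) for p, gts in zip(predictions, answers)]
    return sum(scores) / len(scores) if scores else 0.0
```

For example, predicting "recieve" when the reference is "receive" gives an edit distance of 2 over 7 characters. That question therefore scores 5/7 ≈ 0.714 under ANLS, whereas exact-match accuracy would give 0. Because near-miss answers earn partial credit in this way, ANLS is typically at least as high as exact-match accuracy, which is consistent with the figures reported above.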
Item Type: | Thesis (Skripsi) |
---|---|
Additional Information: | [Call number: 2110511022] [Supervisor 1: Ridwan Raafi'udin] [Supervisor 2: Muhammad Adrezo] [Examiner 1: Neny Rosmawarni] [Examiner 2: Nurul Afifah Arifuddin] |
Uncontrolled Keywords: | BERT, EfficientNet V2, Multimodal Processing, Visual Question Answering, Vision Transformer |
Subjects: | Q Science > QA Mathematics > QA75 Electronic computers. Computer science |
Divisions: | Faculty of Computer Science (Fakultas Ilmu Komputer) > Informatics Undergraduate Program (Program Studi Informatika, S1) |
Depositing User: | AHMED NIZHAN HAIKAL |
Date Deposited: | 07 Aug 2025 01:17 |
Last Modified: | 07 Aug 2025 01:17 |
URI: | http://repository.upnvj.ac.id/id/eprint/37040 |