IMPLEMENTASI MODEL VISUAL QUESTION ANSWERING MENGGUNAKAN VISION TRANSFORMER DAN EFFICIENTNET-V2 DENGAN BERT

Ahmed Nizhan Haikal (2025) IMPLEMENTASI MODEL VISUAL QUESTION ANSWERING MENGGUNAKAN VISION TRANSFORMER DAN EFFICIENTNET-V2 DENGAN BERT. Undergraduate thesis (Skripsi), Universitas Pembangunan Nasional Veteran Jakarta.

ABSTRAK.pdf (159kB)
AWAL.pdf (3MB)
BAB 1.pdf (197kB), restricted to Repository UPNVJ only
BAB 2.pdf (1MB), restricted to Repository UPNVJ only
BAB 3.pdf (897kB), restricted to Repository UPNVJ only
BAB 4.pdf (8MB), restricted to Repository UPNVJ only
BAB 5.pdf (169kB)
DAFTAR PUSTAKA.pdf (179kB)
DAFTAR RIWAYAT HIDUP.pdf (252kB), restricted to Repository UPNVJ only
ARTIKEL KI.pdf (7MB), restricted to Repository staff only
HASIL PLAGIARISME.pdf (17MB), restricted to Repository staff only

Abstract

Visual Question Answering (VQA) is a multimodal task that integrates computer vision and natural language processing to answer questions about images. This research explores the impact of two visual architectures, EfficientNet V2 and Vision Transformer (ViT), on VQA performance, using BERT as the text backbone for bidirectional semantic context understanding. We employed the DT-VQA dataset and implemented a thresholding strategy to address label imbalance. Results indicate that the ViT-BERT model with fine-tuning achieved the best performance among the ViT-BERT variants, recording an accuracy of 53.19% and an ANLS (Average Normalized Levenshtein Similarity) of 61.95% at a threshold of 75. Overall, the highest performance was achieved by the EfficientNet V2-BERT model with transfer learning, with an accuracy of 56.3% and an ANLS of 62.9%. These findings underscore the importance of tailoring architectural choices and training strategies to the characteristics of the data.
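
The ANLS figures reported above can be easier to interpret with a concrete computation in hand. The following is a minimal sketch of ANLS as it is commonly defined for text-oriented VQA benchmarks, not the thesis's own evaluation code. The function names `levenshtein` and `anls`, the lowercasing and whitespace stripping, and the cutoff `tau=0.5` are assumptions made for illustration; in particular, `tau` is unrelated to the "threshold of 75" mentioned in the abstract.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def anls(predictions, references, tau=0.5):
    """Average Normalized Levenshtein Similarity over a set of questions.

    predictions: list of predicted answer strings, one per question.
    references:  list of lists of accepted answers, one list per question.
    tau:         assumed cutoff; a normalized distance of tau or more scores 0.
    """
    if not predictions:
        return 0.0
    total = 0.0
    for pred, refs in zip(predictions, references):
        p = pred.strip().lower()
        best = 0.0
        for ref in refs:
            r = ref.strip().lower()
            longest = max(len(p), len(r))
            nl = levenshtein(p, r) / longest if longest else 0.0
            score = 1.0 - nl if nl < tau else 0.0
            best = max(best, score)  # keep the best-matching reference
        total += best
    return total / len(predictions)


if __name__ == "__main__":
    # Hypothetical predictions and references, for illustration only.
    print(anls(["stop sign", "exit"], [["stop sing"], ["exit", "way out"]]))
```

Unlike exact-match accuracy, this metric gives partial credit to answers that differ from a reference by a few characters, which may explain why the ANLS values in the abstract are higher than the corresponding accuracies.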

Item Type: Thesis (Skripsi)
Additional Information: [Call No.: 2110511022] [Supervisor 1: Ridwan Raafi'udin] [Supervisor 2: Muhammad Adrezo] [Examiner 1: Neny Rosmawarni] [Examiner 2: Nurul Afifah Arifuddin]
Uncontrolled Keywords: BERT, EfficientNet V2, Multimodal Processing, Visual Question Answering, Vision Transformer
Subjects: Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Divisions: Fakultas Ilmu Komputer > Program Studi Informatika (S1)
Depositing User: AHMED NIZHAN HAIKAL
Date Deposited: 07 Aug 2025 01:17
Last Modified: 07 Aug 2025 01:17
URI: http://repository.upnvj.ac.id/id/eprint/37040
