As online content increasingly combines images with text, models capable of processing
multimodal input are becoming not only feasible but necessary. This project proposes a
transformer-based multimodal sentiment classification system that analyzes visual and textual
information jointly. Specifically, the model extracts textual features with BERT and visual
features with a Vision Transformer (ViT) to capture the sentiment expressed in both
modalities. During training, the text was cleaned, tokenized, and fed into a pretrained BERT
model, while the images were resized and normalized so that ViT could convert them into patch
embeddings. The two feature sets were then fused by concatenation and passed to fully
connected layers for classification. Experiments on the MVSA dataset show that the proposed
model outperforms traditional unimodal models, achieving 84.2% accuracy, 85.1% precision,
83.5% recall, and an F1-score of 84.3%. For practical use, a user-friendly interface was built
in Flutter, through which a user can upload tweets and photos and obtain instant sentiment
predictions. The work demonstrates the advantage of the attention-based fusion strategy in
capturing subtle emotional cues and establishes the viability of transformer architectures for
real-world social media text processing. Future work could address language-independent
sentiment recognition and extend the approach to combined voice and image recognition for
more holistic applications in affective computing.
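
The fusion step described above (concatenating the text and image features and passing them to fully connected layers) can be illustrated with a minimal sketch. This is not the project's implementation. The feature vectors are random stand-ins for the pooled BERT and ViT outputs, the layer sizes are arbitrary, the weights are untrained, and the three-class output (positive, neutral, negative) is an assumption. All function and variable names are hypothetical.

```python
import math
import random

# Illustrative sizes only (assumptions): real pooled BERT/ViT features are
# much larger, e.g. 768 values each for the base variants.
TEXT_DIM = 8
IMAGE_DIM = 8
HIDDEN_DIM = 4
NUM_CLASSES = 3  # assumed label set: positive, neutral, negative


def make_layer(in_dim, out_dim, rng):
    """Create a fully connected layer with small random weights and zero biases."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]
    biases = [0.0] * out_dim
    return weights, biases


def dense(x, layer):
    """Apply a fully connected layer: one weighted sum plus bias per output unit."""
    weights, biases = layer
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]


def relu(x):
    return [max(0.0, v) for v in x]


def softmax(logits):
    """Convert logits to probabilities; subtracting the max keeps exp() stable."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_and_classify(text_features, image_features, hidden_layer, output_layer):
    """Concatenate the two feature vectors, then classify with two dense layers."""
    fused = list(text_features) + list(image_features)
    hidden = relu(dense(fused, hidden_layer))
    return softmax(dense(hidden, output_layer))


rng = random.Random(0)
hidden_layer = make_layer(TEXT_DIM + IMAGE_DIM, HIDDEN_DIM, rng)
output_layer = make_layer(HIDDEN_DIM, NUM_CLASSES, rng)

# Random stand-ins for the features BERT and ViT would produce for one tweet.
text_features = [rng.gauss(0, 1) for _ in range(TEXT_DIM)]
image_features = [rng.gauss(0, 1) for _ in range(IMAGE_DIM)]

probs = fuse_and_classify(text_features, image_features, hidden_layer, output_layer)
print(probs)  # three class probabilities that sum to 1
```

In a trained system the two dense layers would be learned jointly with (or on top of) the pretrained encoders. Concatenation keeps each modality's features intact and leaves the fully connected layers to learn how to weigh them against each other.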
Research Area
Machine Learning: Machine Learning (ML) research in Computer Science and Information Technology focuses on the development of algorithms and models that enable computers to learn from data and improve their performance over time without being explicitly programmed. It is a subset of Artificial Intelligence that uses statistical techniques to give machines the ability to learn patterns, make decisions, and predict outcomes based on data.
Supervised learning, a key area of ML research, involves training models on labeled data, where the input-output relationships are predefined. This method is widely used for tasks such as classification (e.g., spam detection) and regression (e.g., predicting house prices). Unsupervised learning, on the other hand, involves finding hidden patterns in data without predefined labels, with clustering and association being typical applications in areas such as customer segmentation and anomaly detection.
Reinforcement learning is another area of ML that focuses on teaching agents to make decisions by interacting with their environment and receiving feedback in the form of rewards or penalties. It is often applied in robotics, game playing, and autonomous systems, where continuous learning and adaptation are required.
Project Main Objective
To design and implement an efficient and scalable multimodal sentiment analysis system using a transformer-based architecture.