ITDS | SPMS

Multimodal Sentiment Classification Using Transformer-Based Architectures

Project Description
Supervisor

As online content increasingly combines images with text, models that can process multimodal input are not only possible but necessary. This project proposes a transformer-based multimodal sentiment classification system that analyzes visual and textual information jointly. Specifically, the model extracts text features with BERT and visual features with a Vision Transformer (ViT) to capture the sentiment expressed in both modalities. During training, the text was cleaned, tokenized, and fed into a pretrained BERT model, while the images were resized and normalized before ViT converted them into patch embeddings. The two feature sets were then fused by concatenation and passed to fully connected layers for classification. Experiments on the MVSA dataset show that the proposed model outperforms traditional unimodal models, achieving accuracy, precision, recall, and F1-score of 84.2, 85.1, 83.5, and 84.3, respectively. To make the system usable in practice, a user-friendly interface was built in Flutter, through which a user can upload tweets and photos and obtain instant sentiment predictions. The work demonstrates the advantage of the attention-based fusion strategy in capturing subtle emotional information and establishes the viability of transformer architectures for real-world social media text processing. Future work could address language-neutral sentiment recognition and extend the approach to voice and image recognition for more holistic applications in affective computing.
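The fusion step described above can be illustrated with a minimal sketch. It is not the project's implementation. It assumes each encoder yields a 768-dimensional feature vector (the usual size for the base BERT and ViT checkpoints), a 64-unit hidden layer, and a three-class label set; none of these values are stated in the description. The weights are random and the feature vectors are stand-ins for real encoder outputs, so the prediction is meaningless, and only the data flow (concatenate, fully connected layers, softmax) is shown. All function and variable names are hypothetical.

```python
import math
import random

TEXT_DIM = 768    # assumed size of the BERT text feature vector
IMAGE_DIM = 768   # assumed size of the ViT image feature vector
HIDDEN_DIM = 64   # assumed width of the fully connected hidden layer
LABELS = ["negative", "neutral", "positive"]  # assumed label set


def init_linear(in_dim, out_dim, rng):
    """Create random weights and zero biases for one fully connected layer."""
    scale = 1.0 / math.sqrt(in_dim)
    weights = [[rng.uniform(-scale, scale) for _ in range(in_dim)]
               for _ in range(out_dim)]
    bias = [0.0] * out_dim
    return weights, bias


def linear(x, weights, bias):
    """Apply y = Wx + b to a single input vector."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def relu(x):
    return [max(0.0, v) for v in x]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(text_features, image_features, params):
    """Fuse the two modalities by concatenation, then classify."""
    fused = list(text_features) + list(image_features)
    hidden = relu(linear(fused, *params["hidden"]))
    probs = softmax(linear(hidden, *params["output"]))
    best = max(range(len(LABELS)), key=probs.__getitem__)
    return LABELS[best], probs


rng = random.Random(0)
params = {
    "hidden": init_linear(TEXT_DIM + IMAGE_DIM, HIDDEN_DIM, rng),
    "output": init_linear(HIDDEN_DIM, len(LABELS), rng),
}

# Stand-ins for real encoder outputs (random values, for illustration only).
text_features = [rng.gauss(0.0, 1.0) for _ in range(TEXT_DIM)]
image_features = [rng.gauss(0.0, 1.0) for _ in range(IMAGE_DIM)]

label, probs = classify(text_features, image_features, params)
print(label, [round(p, 3) for p in probs])
```

In a real system the two feature vectors would come from the pretrained encoders and the fully connected layers would be trained on labeled examples. Concatenation keeps both modalities intact and leaves it to the downstream layers to learn how they interact.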

Machine Learning: Machine Learning (ML) research in Computer Science and Information Technology focuses on the development of algorithms and models that enable computers to learn from data and improve their performance over time without being explicitly programmed. It is a subset of Artificial Intelligence that uses statistical techniques to give machines the ability to learn patterns, make decisions, and predict outcomes based on data. Supervised learning, a key area of ML research, involves training models on labeled data, where the input-output relationships are predefined. This method is widely used for tasks such as classification (e.g., spam detection) and regression (e.g., predicting house prices). Unsupervised learning, on the other hand, involves finding hidden patterns in data without predefined labels, with clustering and association being typical applications in areas such as customer segmentation and anomaly detection. Reinforcement learning is another area of ML that focuses on teaching agents to make decisions by interacting with their environment and receiving feedback in the form of rewards or penalties. It is often applied in robotics, game playing, and autonomous systems, where continuous learning and adaptation are required.