A Hybrid KNN–SVD, CNN, and RoBERTa-Enhanced Framework for Sentiment Analysis of OTT Film Reviews
DOI:
https://doi.org/10.70917/ijcisim-2026-2706
Keywords:
Sentiment Analysis, KNN, Hybrid Model, Complementary Membership Model, OTT, Reviews, Dimensionality Reduction, Machine Learning, Classification
Abstract
The exponential growth of Over-the-Top (OTT) broadcasting platforms has resulted in a massive inflow of user-generated film reviews, requiring scalable and highly accurate sentiment analysis systems to support effective recommendation. The present study proposes a hybrid KNN–SVD, CNN, and RoBERTa-enhanced framework for sentiment classification of OTT film reviews, integrating lightweight machine learning with transformer-based contextual understanding. A benchmark dataset of 50,000 multilingual OTT movie reviews, sourced from IMDb, Amazon Prime Video reviews, and YouTube comments, was pre-processed using NLP techniques including tokenization, lemmatization, tri-gram TF–IDF vectorization, and semantic standardization, with RoBERTa-based contextual embeddings generated for comparative analysis. Singular Value Decomposition (SVD) reduces the feature space from 32,000 dimensions to 250 latent components, reducing the computational cost by 68%. The reduced vectors were classified using a cosine-similarity-based K-Nearest Neighbor (KNN) model with k=5, optimized using Bayesian search. The model achieved 94.27% accuracy, 93.81% precision, 94.12% recall, and a 94.01% F1-score, exceeding the baseline machine learning models. For further robustness evaluation, a Convolutional Neural Network (CNN) produces probabilistic soft-label distributions, effectively detecting ambiguity in context-dependent reviews. To benchmark transformer performance and support the reliability of the hybrid model, a fine-tuned RoBERTa model was integrated, and analysis of the confusion matrix revealed reduced misclassification rates (FPR 3.9%, FNR 4.6%). The novelty of this work lies in combining SVD-enabled latent semantic compression, non-parametric KNN similarity analysis, CNN-driven soft decision labeling, and RoBERTa-based contextual understanding within a single integrated architecture.
This fusion achieves high-accuracy sentiment analysis while remaining computationally efficient and explainable, making it well suited to real-time sentiment monitoring in large OTT systems.
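
The lightweight branch described in the abstract (tri-gram TF–IDF, SVD compression, cosine-similarity KNN) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the toy reviews, the helper names (`ngrams`, `fit_tfidf`, `transform_tfidf`, `fit_svd`, `knn_predict`), the smoothed IDF formula, and the small values of `n_components` and `k` are all assumptions chosen so the example runs on a handful of sentences. The abstract reports 250 latent components and k=5 on a 32,000-dimensional feature space. Lemmatization, Bayesian search, and the CNN and RoBERTa branches are omitted.

```python
import math
from collections import Counter

import numpy as np


def ngrams(tokens, n_max=3):
    """Return all word n-grams for n = 1..n_max (uni-, bi-, and tri-grams)."""
    return [" ".join(tokens[i:i + n])
            for n in range(1, n_max + 1)
            for i in range(len(tokens) - n + 1)]


def fit_tfidf(docs):
    """Build a vocabulary and smoothed IDF weights from the training documents."""
    counts = [Counter(ngrams(d.lower().split())) for d in docs]
    vocab = {g: j for j, g in enumerate(sorted({g for c in counts for g in c}))}
    doc_freq = Counter(g for c in counts for g in c)
    idf = np.array([math.log((1 + len(docs)) / (1 + doc_freq[g])) + 1.0
                    for g in vocab])
    return vocab, idf


def transform_tfidf(docs, vocab, idf):
    """Map documents to TF-IDF vectors; n-grams outside the vocabulary are ignored."""
    X = np.zeros((len(docs), len(vocab)))
    for i, d in enumerate(docs):
        for g, c in Counter(ngrams(d.lower().split())).items():
            if g in vocab:
                X[i, vocab[g]] = c
    return X * idf


def fit_svd(X, n_components):
    """Truncated SVD: keep the top right-singular vectors as a projection basis."""
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    return vt[:n_components].T  # shape: (n_features, n_components)


def l2_normalize(Z):
    """Scale rows to unit length, leaving all-zero rows unchanged."""
    norms = np.linalg.norm(Z, axis=1, keepdims=True)
    return Z / np.where(norms == 0, 1.0, norms)


def knn_predict(Z_train, y_train, Z_query, k):
    """Majority vote among the k training points with highest cosine similarity."""
    sims = l2_normalize(Z_query) @ l2_normalize(Z_train).T
    top = np.argsort(-sims, axis=1)[:, :k]
    return np.array([np.bincount(y_train[idx]).argmax() for idx in top])


# Toy data (hypothetical): 1 = positive, 0 = negative.
train_docs = [
    "a wonderful film with a moving story",
    "brilliant acting and a wonderful script",
    "loved every minute of this series",
    "a moving story and brilliant direction",
    "a dull film with a weak story",
    "terrible acting and a boring script",
    "hated every minute of this series",
    "a weak plot and boring direction",
]
y_train = np.array([1, 1, 1, 1, 0, 0, 0, 0])

vocab, idf = fit_tfidf(train_docs)
X_train = transform_tfidf(train_docs, vocab, idf)

V = fit_svd(X_train, n_components=4)  # the abstract reports 250 components
Z_train = X_train @ V                 # latent representation of the training set

queries = ["a wonderful and moving film", "boring script and terrible story"]
Z_query = transform_tfidf(queries, vocab, idf) @ V
preds = knn_predict(Z_train, y_train, Z_query, k=3)  # the abstract reports k=5
print(preds)
```

Because the rows are L2-normalized before the dot product, ranking by `sims` is equivalent to ranking by cosine similarity. On a corpus of realistic size, a sparse TF–IDF matrix and a randomized truncated SVD would likely be needed in place of the dense `np.linalg.svd` call used here.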