SALiEnSeA: Spatial Action Localization and Temporal Attention for Video Event Recognition

Authors

  • Prithwish Jana
  • Swarnabja Bhaumik
  • Partha Pratim Mohanta

Keywords

Video Classification, Event and Activity Recognition, Unsupervised Action Localization, Motion and Video Analysis, Deep Neural Network

Abstract

Automated event and activity recognition in unconstrained videos has become a societal necessity. In this paper, we address video event classification and analyze how preprocessing through action localization influences the classification task. We propose an approach to event classification in videos that is aided by unsupervised preprocessing: temporal attention, followed by spatial action localization at the attended instants of time. The unsupervised temporal attention is achieved through a graph-based algorithm that selects representative (key) frames. Our spatial action localization technique, SALiEnSeA, identifies the most ‘dynamic’ motion patch in each key frame; it is based on an oil-painting approach of refining and stacking motion components. These focused actions, along with spatial and temporal information, are fed into three separate deep neural-network pipelines consisting of ResNet50 and LSTM. A multi-tier hierarchical fusion then consolidates frame-level and video-level predictions. Experiments are performed on four benchmark datasets: CCV, KCV, UCF-101 and HMDB-51. The resulting framework for action-localization-aided event classification provides encouraging results. Adding a separate modality for action-localized SALiEnSeA patches on top of the traditional RGB-frame modality improves video classification performance. The approach outperforms standard neural-network-based approaches as well as state-of-the-art multimodal models for video classification.
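As a rough illustration of what consolidating frame-level and video-level predictions across three pipelines can look like, the sketch below uses plain averaging. It is a stand-in, not the fusion scheme of the paper, which the abstract does not specify. The stream names (`spatial`, `temporal`, `action_patch`), the equal weights, and the probability values are all hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def mean_over_frames(frame_probs: Sequence[Sequence[float]]) -> List[float]:
    """Tier 1 (assumed): average per-frame class probabilities into one video-level vector."""
    n_frames = len(frame_probs)
    n_classes = len(frame_probs[0])
    return [sum(f[c] for f in frame_probs) / n_frames for c in range(n_classes)]


def fuse_streams(video_probs: Dict[str, List[float]],
                 weights: Dict[str, float]) -> List[float]:
    """Tier 2 (assumed): weighted average of the video-level vectors of all streams."""
    total = sum(weights[name] for name in video_probs)
    n_classes = len(next(iter(video_probs.values())))
    return [
        sum(weights[name] * probs[c] for name, probs in video_probs.items()) / total
        for c in range(n_classes)
    ]


def classify(frame_level: Dict[str, Sequence[Sequence[float]]],
             weights: Dict[str, float]) -> Tuple[int, List[float]]:
    """Run both tiers and return (predicted class index, fused probabilities)."""
    video_level = {name: mean_over_frames(p) for name, p in frame_level.items()}
    fused = fuse_streams(video_level, weights)
    label = max(range(len(fused)), key=fused.__getitem__)
    return label, fused


# Hypothetical per-key-frame class probabilities (2 key frames, 2 classes)
# from three separate pipelines; the numbers are made up for illustration.
frame_level = {
    "spatial": [[0.6, 0.4], [0.8, 0.2]],
    "temporal": [[0.3, 0.7], [0.5, 0.5]],
    "action_patch": [[0.9, 0.1], [0.7, 0.3]],
}
weights = {"spatial": 1.0, "temporal": 1.0, "action_patch": 1.0}

label, fused = classify(frame_level, weights)
print(label, fused)
```

In this toy setting, raising the weight of one stream shifts the fused prediction toward that stream, which is one simple way to probe how much an additional modality contributes.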

Published

2022-04-11

How to Cite

Prithwish Jana, Swarnabja Bhaumik, & Partha Pratim Mohanta. (2022). SALiEnSeA: Spatial Action Localization and Temporal Attention for Video Event Recognition. International Journal of Computer Information Systems and Industrial Management Applications, 14, 15. Retrieved from https://cspub-ijcisim.org/index.php/ijcisim/article/view/580

Section

Original Articles