
Salient Region Guided Deep Network for Violence Detection in Surveillance Systems

Gajendra Singh, Arun Khosla, Rajiv Kapoor


Abstract: Automatically detecting violent actions is important for video surveillance in public places such as bus stands, malls and railway stations. Earlier detection techniques, however, generally extract statistical features around spatiotemporal interest points or compute descriptors only in regions where motion occurs, which limits their ability to detect violent activity in surveillance video. To address this problem, this paper proposes a new approach for automatic violence detection in video surveillance systems. The proposed algorithm first extracts salient regions from the frames using PFT and Temporal PFT; these salient regions are treated as candidate regions of violence. The salient-region images are then fed to pre-trained deep networks to extract features, and the extracted features are passed to an AdaBoost Support Vector Machine classifier that decides whether the image contains a violent candidate. Experiments on two challenging standard datasets indicate that the proposed method outperforms previous methods.
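The saliency step described in the abstract can be illustrated with a short example. This is a minimal sketch, assuming PFT denotes the phase spectrum of Fourier transform saliency model: the frame is transformed to the frequency domain, only the phase is kept, and the squared magnitude of the inverse transform is smoothed with a Gaussian. The function names `pft_saliency` and `gaussian_kernel_1d`, the smoothing width `sigma`, and the use of NumPy are illustrative assumptions, not the authors' implementation. Temporal PFT, the pre-trained feature extractor and the classifier are not shown.

```python
import numpy as np


def gaussian_kernel_1d(sigma):
    """Normalized 1-D Gaussian kernel covering roughly +/- 3 sigma (assumed width)."""
    radius = max(1, int(3 * sigma))
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    return kernel / kernel.sum()


def pft_saliency(gray, sigma=3.0):
    """Phase-only Fourier saliency map for a 2-D grayscale frame.

    `gray` is a 2-D float array whose sides are at least as long as the
    Gaussian kernel. Returns a map of the same shape scaled to [0, 1].
    """
    gray = np.asarray(gray, dtype=float)

    # Keep only the phase of the 2-D spectrum; discard the amplitude.
    spectrum = np.fft.fft2(gray)
    phase_only = np.exp(1j * np.angle(spectrum))

    # Squared magnitude of the phase-only reconstruction.
    saliency = np.abs(np.fft.ifft2(phase_only)) ** 2

    # Separable Gaussian smoothing: along rows, then along columns.
    kernel = gaussian_kernel_1d(sigma)
    saliency = np.apply_along_axis(
        lambda row: np.convolve(row, kernel, mode="same"), 1, saliency)
    saliency = np.apply_along_axis(
        lambda col: np.convolve(col, kernel, mode="same"), 0, saliency)

    # Rescale to [0, 1]; a constant map is returned as all zeros.
    value_range = saliency.max() - saliency.min()
    if value_range == 0:
        return np.zeros_like(saliency)
    return (saliency - saliency.min()) / value_range
```

On a blank frame containing a single bright pixel, the resulting map peaks at that pixel. In a pipeline like the one the abstract outlines, such a map could be thresholded to crop candidate regions before feature extraction; the threshold choice is likewise an assumption.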

Keywords: AdaBoost support vector machine, CNN, deep networks, PFT, Temporal-PFT

Cite this Article: Gajendra Singh, Arun Khosla, Rajiv Kapoor. Salient Region Guided Deep Network for Violence Detection in Surveillance Systems. Journal of Computer Technology & Applications. 2019; 10(3): 19–28p.









Copyright (c) 2019 Journal of Computer Technology & Applications