DESCRIPTION:
This project aims to detect critical situations in CCTV video streams. Weakly-supervised video anomaly detection (wVAD) has recently gained popularity thanks to its ability to produce frame-level binary labels (i.e., 0: normal, 1: anomaly) while requiring only video-level labels during training. Despite decent progress on simple anomalies (such as explosions), recent methods still struggle with complex real-world anomalies (such as shoplifting). This is mainly due to two reasons: (I) anomaly diversity is undermined during training, because previous methods group diverse categories of anomalies under a single label and thereby ignore category-specific key attributes; (II) the lack of precise temporal information (i.e., weak supervision) limits the ability of these methods to capture complex abnormal attributes that can blend seamlessly with normal events. To address this, we plan to first decompose the anomaly diversity into multiple experts that encode category-specific representations, and then to entangle the pertinent cues of each expert by exploiting the semantic inter-correlation between them.
Furthermore, existing anomaly detection methods focus primarily on immediate detection and lack the capability to anticipate anomalies well in advance. This shortcoming is particularly critical in systems where early warning can prevent anomalies. By leveraging auto-regressive models, which predict future values from historical data, we aim to extend the predictive horizon, allowing for timely and informed decision-making.
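As a rough illustration of these two ingredients, the PyTorch sketch below shows a mixture-of-experts frame scorer trained with a top-k multiple-instance objective from video-level labels, plus a small auto-regressive head that forecasts the next anomaly score. The feature dimension, number of experts, gating scheme, and top-k loss are illustrative assumptions, not the project's finalized design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoEAnomalyScorer(nn.Module):
    def __init__(self, feat_dim=1024, n_experts=4, hidden=256):
        super().__init__()
        # one expert per assumed anomaly category; each maps a frame feature to a scalar score
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(feat_dim, hidden), nn.ReLU(), nn.Linear(hidden, 1))
            for _ in range(n_experts)
        ])
        # gating network that weighs ("entangles") the expert cues per frame
        self.gate = nn.Linear(feat_dim, n_experts)

    def forward(self, feats):                                           # feats: (B, T, D) frame features
        scores = torch.cat([e(feats) for e in self.experts], dim=-1)    # (B, T, E)
        weights = F.softmax(self.gate(feats), dim=-1)                   # (B, T, E)
        return torch.sigmoid((scores * weights).sum(dim=-1))            # (B, T) frame-level scores

class ScoreForecaster(nn.Module):
    """Illustrative auto-regressive head: predicts the next anomaly score from past scores."""
    def __init__(self, hidden=32):
        super().__init__()
        self.rnn = nn.GRU(input_size=1, hidden_size=hidden, batch_first=True)
        self.out = nn.Linear(hidden, 1)

    def forward(self, past_scores):                           # (B, T) observed scores
        h, _ = self.rnn(past_scores.unsqueeze(-1))            # (B, T, hidden)
        return torch.sigmoid(self.out(h[:, -1]))              # (B, 1) next-step score

def mil_topk_loss(frame_scores, video_labels, k=8):
    # video-level supervision only: the mean of the top-k frame scores
    # should match the video label (1 = contains an anomaly, 0 = normal)
    topk = frame_scores.topk(k, dim=1).values.mean(dim=1)
    return F.binary_cross_entropy(topk, video_labels.float())

# toy usage with random features standing in for a frozen backbone's output
model, forecaster = MoEAnomalyScorer(), ScoreForecaster()
feats = torch.randn(2, 64, 1024)                              # 2 videos, 64 frames, 1024-d features
labels = torch.tensor([1, 0])
frame_scores = model(feats)
loss = mil_topk_loss(frame_scores, labels)
next_score = forecaster(frame_scores.detach())                # anticipate the next frame's score
loss.backward()
```

In the actual project, the gating and the anticipation horizon would be driven by the semantic inter-correlations between experts and by longer-range auto-regressive modelling, rather than this single-step toy.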
Assigned mission
We will leverage state-of-the-art vision-language models (VLMs) to bridge the gap between visual data and linguistic interpretations. By interacting with VLMs and large language models (LLMs), users can query, interpret, and refine the detection process, fostering a more dynamic and adaptable anomaly detection system. To further enhance interpretability, we will integrate LLMs with Chain-of-Thought (CoT) reasoning and Retrieval-Augmented Generation (RAG). CoT enables LLMs to break complex reasoning tasks into intermediate steps, mirroring human cognitive processes. Combined with RAG, which retrieves relevant external knowledge to ground responses, this approach significantly reduces hallucination while improving the explainability of detected anomalies.
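A minimal sketch of the retrieval-plus-CoT idea is given below; the knowledge-base entries, the random vectors standing in for text embeddings, and the prompt template are all hypothetical placeholders for a real VLM/LLM pipeline.

```python
import numpy as np

def cosine_topk(query, corpus, k=2):
    # retrieve the k corpus entries most similar to the query embedding
    sims = corpus @ query / (np.linalg.norm(corpus, axis=1) * np.linalg.norm(query) + 1e-8)
    return np.argsort(-sims)[:k]

# hypothetical knowledge base of textual anomaly cues
knowledge_base = [
    "Shoplifting often involves concealing items before leaving without paying.",
    "Explosions produce a sudden flash followed by smoke and fleeing crowds.",
    "Loitering near exits for long periods can precede theft.",
]

# random embeddings stand in for a text encoder (e.g. the VLM's text tower)
rng = np.random.default_rng(0)
kb_emb = rng.normal(size=(len(knowledge_base), 64))
caption = "A person places an item under their jacket near the store exit."
query_emb = rng.normal(size=64)                    # stand-in for embedding the caption

retrieved = [knowledge_base[i] for i in cosine_topk(query_emb, kb_emb)]
prompt = (
    "Clip description: " + caption + "\n"
    "Relevant knowledge:\n- " + "\n- ".join(retrieved) + "\n"
    "Reason step by step, then state whether this clip is anomalous and why."
)
print(prompt)                                      # this prompt would be sent to the LLM
```

In practice, the clip description would come from the VLM, the embeddings from its text encoder, and the assembled CoT prompt would be passed to the LLM for a grounded explanation.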
Main activities
The Inria STARS team is seeking a PhD student with a strong background in computer vision, deep learning, and machine learning.
The candidate is expected to conduct research related to the development of computer vision algorithms for video understanding.
Main activities:
Analyze the requirements of end-users and study the limitations of existing solutions
Propose a new algorithm for weakly-supervised video anomaly detection (wVAD)
Evaluate and optimize the proposed algorithm on the targeted video datasets
Give oral presentations and write reports
Submit a scientific paper to a conference
Job code: PhD student (m/f)
Education level: Master's degree (Bac+5)
Part-time / Full-time: Full-time
Contract type: Fixed-term contract (CDD)
Skills: Computer Vision, C++ (Programming Language), Encodings, Computer Programming, Linux, Python (Programming Language), Machine Learning, OpenCV, TensorFlow, PyTorch, Large Language Models, Deep Learning, Information Technology, Decision-Making, Public Speaking, Emotional Stability, Research, Algorithms, Assembly and Installation, Cognitive Processes, Event Organization, Mathematics, Predictive Analytics, Report Writing, Video Surveillance, Anomaly Detection
Email:
webmaster@inria.fr
Phone:
0139635511
Advertiser type: Direct employer