This project implements state-of-the-art Human-Object Interaction (HOI) detection models that identify relationships between humans and objects in visual scenes. The system can detect not ...
Large Vision-Language Models (LVLMs) often hallucinate objects by effectively ignoring the image and instead relying on previously generated output (prelim) tokens. We quantify this phenomenon using ...