Visual Text Comprehension

EraX-JS-Company/EraX-MTVQA-Benchmark

Text-Centric Visual Question Answering (TEC-VQA) in its proper format not only facilitates human-machine interaction in text-centric visual environments but also serves as a de facto gold proxy to ...

Nature

Visual Grounding and Language Comprehension in Robotics

Visual grounding and language comprehension in robotics represent a rapidly evolving interdisciplinary field that integrates computer vision, natural language processing and robotic control systems.

Europe PMC

The role of working memory in structure prediction during language comprehension: Evidence from visual-world structural priming paradigm

Comprehenders can not only predict syntactic information of an upcoming word, but also of a larger unit, such as sentence structure. It is unclear whether such prediction effects are driven by an ...

Frontiers

How sure am I? How text genre and question type shape comprehension calibration in primary and secondary school students

Metacognitive skills in text comprehension are fundamental for students' learning, yet their development may differ depending on text genre (narrative vs. expository), question type (factual vs. ...

GitHub

Janus-Pro-R1: Advancing Collaborative Visual Comprehension and Generation

Stage 2 – Reinforcement Learning (RL): The model is trained using a token-level Markov decision process with bi-level QA-based rewards to encourage spontaneous reasoning and correction, optimizing via ...
