
Visual Reasoning Unit


This research group brings together members with distinct research focuses and is dedicated to advancing artificial intelligence and computer vision through their specialized research. Regular group activities include sharing papers related to individual research topics and to text-image multimodal studies, along with updates on the progress of members' research projects.

Members

  • Tao Jin, D2
  • Yuting Shi (Unit leader), D1
  • Zhenzhu Yan, D1
  • Houjing Wei, M2
  • Mariko Kato, M1
  • Naoya Inoue

Meetings Log

Meeting Date | Presenter | Topic
2025/03/31 | Houjing Wei | Paper Reading: [2405.16700] Implicit Multimodal Alignment: On the Generalization of Frozen LLMs in Multimodal Inputs
2025/03/24 | Yuting Shi | Paper Reading: [2411.17491] What's in the Image? A Deep-Dive into the Vision of Vision Language Models
2025/03/17 | Zhenzhu Yan | Paper Reading: [2502.00372] NAVER: A Neuro-Symbolic Compositional Automaton for Visual Grounding with Explicit Logic Reasoning
2025/03/03 | Yuting Shi | Paper Reading: [2502.05390] Learning Task Representations from In-Context Learning
2025/02/17 | Houjing Wei | Paper Reading: [2410.12816] Rethinking Misalignment in Vision-Language Model Adaptation from a Causal Perspective
2025/02/03 | Zhenzhu Yan | Paper Reading: LogicAD: Explainable Anomaly Detection via VLM-based Text Feature Extraction
2025/01/27 | Yuting Shi | Paper Reading: [2412.06769] Training Large Language Models to Reason in a Continuous Latent Space
2025/01/06 | Houjing Wei | Paper Reading: [2411.09968] Seeing Clearly by Layer Two: Enhancing Attention Heads to Alleviate Hallucination in LVLMs
2024/12/23 | Yuting Shi | Paper Reading: [2412.02946] Who Brings the Frisbee: Probing Hidden Hallucination Factors in Large Vision-Language Model via Causality Analysis
2024/12/16 | Zhenzhu Yan | Paper Reading: [2411.06048] An Empirical Analysis on Spatial Reasoning Capabilities of Large Multimodal Models
2024/12/09 | Tao Jin | Paper Reading: Can LLMs Learn from Mistakes? An Empirical Study on Reasoning Tasks (https://aclanthology.org/2024.findings-emnlp.46/)
2024/12/02 | Houjing Wei | Paper Reading: [2411.03312] Inference Optimal VLMs Need Only One Visual Token but Larger Models
2024/11/25 | Yuting Shi | Paper Reading: [2303.08112] Eliciting Latent Predictions from Transformers with the Tuned Lens
2024/11/18 | Mariko Kato | Paper Reading: [2410.22330] Task Vectors are Cross-Modal
2024/11/11 | Zhenzhu Yan | Paper Reading: [2410.03062] Image First or Text First? Optimising the Sequencing of Modalities in Large Language Model Prompting and Reasoning Tasks
2024/10/21 | Tao Jin | Paper Reading: https://arxiv.org/abs/2312.08914
2024/10/08 | Houjing Wei | Paper Reading: https://arxiv.org/abs/2310.05916
2024/09/24 | Tao Jin | Paper Reading: https://aclanthology.org/2024.findings-eacl.87/
2024/09/10 | Zhenzhu Yan | Paper Reading: [2403.16442] If CLIP Could Talk: Understanding Vision-Language Model Representations Through Their Preferred Concept Descriptions
2024/09/03 | Yuting Shi | Paper Reading: https://arxiv.org/abs/2408.17006
2024/08/26 | Yuting Shi | Paper Reading: [2402.14328] Understanding and Patching Compositional Reasoning in LLMs
2024/08/06 | Tao Jin | Paper Reading: https://arxiv.org/abs/2406.10819
2024/07/23 | Zhenzhu Yan | Paper Reading: 43ba0466af2b1ac76aa85d8fbec714e3-Paper-Conference.pdf (neurips.cc)
2024/07/16 | Houjing Wei | Paper Reading: [2406.17759] Interpreting Attention Layer Outputs with Sparse Autoencoders
2024/07/09 | Yuting Shi | Paper Reading: https://arxiv.org/abs/2305.15054
2024/07/02 | Zhenzhu Yan | Paper Reading: [2402.12289] DriveVLM: The Convergence of Autonomous Driving and Large Vision-Language Models
2024/06/25 | Tao Jin | Reading: LangChain documentation (https://python.langchain.com/v0.2/docs/introduction/)
2024/06/21 | Houjing Wei | Paper Reading: [2312.06742] Honeybee: Locality-enhanced Projector for Multimodal LLM; [2405.20985] DeCo: Decoupling Token Compression from Semantic Abstraction in Multimodal Large Language Models
2024/06/10 | Zhenzhu Yan | Paper Reading: [2010.11929] An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
2024/05/29 | Naoya Inoue | LLaVA/InstructBLIP/MiniGPT-4: model architecture and source code
2024/05/20 | Yuting Shi | Paper Reading: [2311.03079] CogVLM: Visual Expert for Pretrained Language Models
2024/05/13 | Tao Jin | Paper Reading: [2404.16054] LlamaTouch: A Faithful and Scalable Testbed for Mobile UI Automation Task Evaluation
2024/04/22 | Houjing Wei | Paper Reading: [2402.12865] Backward Lens: Projecting Language Model Gradients into the Vocabulary Space
2024/04/15 | Yuting Shi | Paper Reading: [2402.04236] CogCoM: Train Large Vision-Language Models Diving into Details through Chain of Manipulations
2024/04/01 | Yuting Shi, Houjing Wei, Tao Jin | Progress Report
2024/03/25 | Tao Jin | Paper Reading: [2402.06596] Understanding the Weakness of Large Language Model Agents within a Complex Android Environment
2024/03/18 | Houjing Wei | Paper Reading: [2202.05262] Locating and Editing Factual Associations in GPT
2024/03/04 | Yuting Shi, Houjing Wei, Tao Jin | Progress Report
2024/02/26 | Yuting Shi | Paper Reading: [1907.03950] Learning by Abstraction: The Neural State Machine
2024/02/19 | Yuting Shi, Houjing Wei, Tao Jin | Progress Report
2024/01/29 | Tao Jin | Paper Reading: [2401.10935] SeeClick: Harnessing GUI Grounding for Advanced Visual GUI Agents
2024/01/22 | Yuting Shi, Houjing Wei, Tao Jin | Progress Report
2024/01/15 | Houjing Wei | Paper Reading: Multimodal Neurons in Pretrained Text-Only Transformers
2024/01/09 | Yuting Shi, Houjing Wei, Tao Jin | Progress Report
2023/12/25 | Yuting Shi, Houjing Wei, Tao Jin | Progress Report
2023/12/05 | Yuting Shi | Paper Reading
2023/11/27 | Houjing Wei | Paper Reading: Do Vision-and-Language Transformers Learn Grounded Predicate-Noun Dependencies? (ACL Anthology)
2023/11/20 | Tao Jin | Paper Reading
2023/11/13 | Yuting Shi | Paper Reading: [2309.17421] The Dawn of LMMs: Preliminary Explorations with GPT-4V(ision)
2023/11/04 | Houjing Wei | Paper Reading: A Survey on Bridge-style MLMs
2023/10/27 | Tao Jin | Progress Report
2023/10/16 | Yuting Shi | Task assignment for LREC
2023/10/02 | Houjing Wei | Paper Reading
2023/09/25 | Yuting Shi | Progress Report
2023/07/03 | Yuting Shi | Progress Report
2023/06/26 | Houjing Wei | Paper Reading: [2209.14927] Spotlight: Mobile UI Understanding using Vision-Language Models with a Focus
2023/05/19 | Tao Jin | Presentation
2023/05/15 | Yuting Shi | Yuting's experiment
2023/05/10 | Tao Jin | Presentation
2023/05/02 | Houjing Wei | Augmented LLMs for visual reasoning
2023/04/24 | Houjing Wei, Yuting Shi | Paper Reading: "Language Modeling with Pixels"; "Multimodal C4: An Open, Billion-scale Corpus of Images Interleaved with Text"
2023/04/20 | Yuting Shi | Towards Inductive Reasoning from Visual Information
