
Visual Reasoning Unit


This research group brings together members with distinct research focuses and is dedicated to advancing artificial intelligence and computer vision through their specialized research. Regular group activities include sharing papers related to individual research topics and to text-image multimodal studies, along with updates on the progress of members' research projects.

Members

  • Tao Jin, D2
  • Yuting Shi (Unit leader), D1
  • Zhenzhu Yan, D1
  • Houjing Wei, M2
  • Mariko Kato, M1
  • Naoya Inoue

Meetings Log

Meeting Date | Presenter | Topic
2025/03/31 | Houjing Wei | Paper Reading: [2405.16700] Implicit Multimodal Alignment: On the Generalization of Frozen LLMs in Multimodal Inputs
2025/03/24 | Yuting Shi | Paper Reading: [2411.17491] What's in the Image? A Deep-Dive into the Vision of Vision Language Models
2025/03/17 | Zhenzhu Yan | Paper Reading: [2502.00372] NAVER: A Neuro-Symbolic Compositional Automaton for Visual Grounding with Explicit Logic Reasoning
2025/03/03 | Yuting Shi | Paper Reading: [2502.05390] Learning Task Representations from In-Context Learning
2025/02/17 | Houjing Wei | Paper Reading: [2410.12816] Rethinking Misalignment in Vision-Language Model Adaptation from a Causal Perspective
2025/02/03 | Zhenzhu Yan | Paper Reading: LogicAD: Explainable Anomaly Detection via VLM-based Text Feature Extraction
2025/01/27 | Yuting Shi | Paper Reading: [2412.06769] Training Large Language Models to Reason in a Continuous Latent Space
2025/01/06 | Houjing Wei | Paper Reading: [2411.09968] Seeing Clearly by Layer Two: Enhancing Attention Heads to Alleviate Hallucination in LVLMs
2024/12/23 | Yuting Shi | Paper Reading: [2412.02946] Who Brings the Frisbee: Probing Hidden Hallucination Factors in Large Vision-Language Model via Causality Analysis
2024/12/16 | Zhenzhu Yan | Paper Reading: [2411.06048] An Empirical Analysis on Spatial Reasoning Capabilities of Large Multimodal Models
2024/12/09 | Tao Jin | Paper Reading: Can LLMs Learn from Mistakes? An Empirical Study on Reasoning Tasks (https://aclanthology.org/2024.findings-emnlp.46/)
2024/12/02 | Houjing Wei | Paper Reading: [2411.03312] Inference Optimal VLMs Need Only One Visual Token but Larger Models
2024/11/25 | Yuting Shi | Paper Reading: [2303.08112] Eliciting Latent Predictions from Transformers with the Tuned Lens
2024/11/18 | Mariko Kato | Paper Reading: [2410.22330] Task Vectors are Cross-Modal
2024/11/11 | Zhenzhu Yan | Paper Reading: [2410.03062] Image First or Text First? Optimising the Sequencing of Modalities in Large Language Model Prompting and Reasoning Tasks
2024/10/21 | Tao Jin | Paper Reading: https://arxiv.org/abs/2312.08914
2024/10/08 | Houjing Wei | Paper Reading: https://arxiv.org/abs/2310.05916
2024/09/24 | Tao Jin | Paper Reading: https://aclanthology.org/2024.findings-eacl.87/
2024/09/10 | Zhenzhu Yan | Paper Reading: [2403.16442] If CLIP Could Talk: Understanding Vision-Language Model Representations Through Their Preferred Concept Descriptions
2024/09/03 | Yuting Shi | Paper Reading: https://arxiv.org/abs/2408.17006
2024/08/26 | Yuting Shi | Paper Reading: [2402.14328] Understanding and Patching Compositional Reasoning in LLMs
2024/08/06 | Tao Jin | Paper Reading: https://arxiv.org/abs/2406.10819
2024/07/23 | Zhenzhu Yan | Paper Reading: 43ba0466af2b1ac76aa85d8fbec714e3-Paper-Conference.pdf (neurips.cc)
2024/07/16 | Houjing Wei | Paper Reading: [2406.17759] Interpreting Attention Layer Outputs with Sparse Autoencoders
2024/07/09 | Yuting Shi | Paper Reading: https://arxiv.org/abs/2305.15054
2024/07/02 | Zhenzhu Yan | Paper Reading: [2402.12289] DriveVLM: The Convergence of Autonomous Driving and Large Vision-Language Models
2024/06/25 | Tao Jin | Reading: LangChain documentation (https://python.langchain.com/v0.2/docs/introduction/)
2024/06/21 | Houjing Wei | Paper Reading: [2312.06742] Honeybee: Locality-enhanced Projector for Multimodal LLM; [2405.20985] DeCo: Decoupling Token Compression from Semantic Abstraction in Multimodal Large Language Models
2024/06/10 | Zhenzhu Yan | Paper Reading: [2010.11929] An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
2024/05/29 | Naoya Inoue | LLaVA/InstructBLIP/MiniGPT-4: model architecture and source code
2024/05/20 | Yuting Shi | Paper Reading: [2311.03079] CogVLM: Visual Expert for Pretrained Language Models
2024/05/13 | Tao Jin | Paper Reading: [2404.16054] LlamaTouch: A Faithful and Scalable Testbed for Mobile UI Automation Task Evaluation
2024/04/22 | Houjing Wei | Paper Reading: [2402.12865] Backward Lens: Projecting Language Model Gradients into the Vocabulary Space
2024/04/15 | Yuting Shi | Paper Reading: [2402.04236] CogCoM: Train Large Vision-Language Models Diving into Details through Chain of Manipulations
2024/04/01 | Yuting Shi, Houjing Wei, Tao Jin | Progress Report
2024/03/25 | Tao Jin | Paper Reading: [2402.06596] Understanding the Weakness of Large Language Model Agents within a Complex Android Environment
2024/03/18 | Houjing Wei | Paper Reading: [2202.05262] Locating and Editing Factual Associations in GPT
2024/03/04 | Yuting Shi, Houjing Wei, Tao Jin | Progress Report
2024/02/26 | Yuting Shi | Paper Reading: [1907.03950] Learning by Abstraction: The Neural State Machine
2024/02/19 | Yuting Shi, Houjing Wei, Tao Jin | Progress Report
2024/01/29 | Tao Jin | Paper Reading: [2401.10935] SeeClick: Harnessing GUI Grounding for Advanced Visual GUI Agents
2024/01/22 | Yuting Shi, Houjing Wei, Tao Jin | Progress Report
2024/01/15 | Houjing Wei | Paper Reading: Multimodal Neurons in Pretrained Text-Only Transformers
2024/01/09 | Yuting Shi, Houjing Wei, Tao Jin | Progress Report
2023/12/25 | Yuting Shi, Houjing Wei, Tao Jin | Progress Report
2023/12/05 | Yuting Shi | Paper Reading
2023/11/27 | Houjing Wei | Paper Reading: Do Vision-and-Language Transformers Learn Grounded Predicate-Noun Dependencies? (ACL Anthology)
2023/11/20 | Tao Jin | Paper Reading
2023/11/13 | Yuting Shi | Paper Reading: [2309.17421] The Dawn of LMMs: Preliminary Explorations with GPT-4V(ision)
2023/11/04 | Houjing Wei | Paper Reading: A Survey on Bridge-style MLMs
2023/10/27 | Tao Jin | Progress Report
2023/10/16 | Yuting Shi | Task assignment for LREC
2023/10/02 | Houjing Wei | Paper Reading
2023/09/25 | Yuting Shi | Progress Report
2023/07/03 | Yuting Shi | Progress Report
2023/06/26 | Houjing Wei | Paper Reading: [2209.14927] Spotlight: Mobile UI Understanding using Vision-Language Models with a Focus
2023/05/19 | Tao Jin | Presentation
2023/05/15 | Yuting Shi | Yuting's experiment
2023/05/10 | Tao Jin | Presentation
2023/05/02 | Houjing Wei | Augmented LLMs for visual reasoning
2023/04/24 | Houjing Wei, Yuting Shi | Paper Reading: "Language Modeling with Pixels"; "Multimodal C4: An Open, Billion-scale Corpus of Images Interleaved with Text"
2023/04/20 | Yuting Shi | Towards Inductive Reasoning from Visual Information
