Visual Reasoning Unit
This research group brings together members with distinct research focuses and is dedicated to advancing artificial intelligence and computer vision through their specialized research. Regular group events include presentations of papers related to individual research topics and text-image multimodal studies, along with updates on the progress of members' respective research projects.
Members
- Tao Jin, D2
- Yuting Shi (Unit leader), D1
- Zhenzhu Yan, D1
- Houjing Wei, M2
- Mariko Kato, M1
- Naoya Inoue
Meeting Log
Meeting Date | Presenter | Topic |
---|---|---|
2025/03/03 | Yuting Shi | Paper Reading: [2502.05390] Learning Task Representations from In-Context Learning |
2025/02/17 | Wei Houjing | Paper Reading: [2410.12816] Rethinking Misalignment in Vision-Language Model Adaptation from a Causal Perspective |
2025/02/03 | Yan Zhenzhu | Paper Reading: LogicAD: Explainable Anomaly Detection via VLM-based Text Feature Extraction |
2025/01/27 | Yuting Shi | Paper Reading: [2412.06769] Training Large Language Models to Reason in a Continuous Latent Space |
2025/01/06 | Wei Houjing | Paper Reading: [2411.09968] Seeing Clearly by Layer Two: Enhancing Attention Heads to Alleviate Hallucination in LVLMs |
2024/12/23 | Yuting Shi | Paper Reading: [2412.02946] Who Brings the Frisbee: Probing Hidden Hallucination Factors in Large Vision-Language Model via Causality Analysis |
2024/12/16 | Yan Zhenzhu | Paper Reading: [2411.06048] An Empirical Analysis on Spatial Reasoning Capabilities of Large Multimodal Models |
2024/12/09 | Jin Tao | Paper Reading: https://aclanthology.org/2024.findings-emnlp.46/ Can LLMs Learn from Mistakes? An Empirical Study on Reasoning Tasks |
2024/12/02 | Wei Houjing | Paper Reading: [2411.03312] Inference Optimal VLMs Need Only One Visual Token but Larger Models |
2024/11/25 | Yuting Shi | Paper Reading: https://arxiv.org/abs/2303.08112 Eliciting Latent Predictions from Transformers with the Tuned Lens |
2024/11/18 | Mariko Kato | Paper Reading: https://arxiv.org/abs/2410.22330 Task Vectors are Cross-Modal |
2024/11/11 | Yan Zhenzhu | Paper Reading: [2410.03062] Image First or Text First? Optimising the Sequencing of Modalities in Large Language Model Prompting and Reasoning Tasks |
2024/10/21 | Jin Tao | Paper Reading: https://arxiv.org/abs/2312.08914 CogAgent: A Visual Language Model for GUI Agents |
2024/10/08 | Wei Houjing | Paper Reading: https://arxiv.org/abs/2310.05916 |
2024/09/24 | Jin Tao | Paper Reading: https://aclanthology.org/2024.findings-eacl.87/ |
2024/09/10 | Yan Zhenzhu | |
2024/09/03 | Yuting Shi | Paper Reading: https://arxiv.org/abs/2408.17006 |
2024/08/26 | Yuting Shi | Paper Reading: [2402.14328] Understanding and Patching Compositional Reasoning in LLMs |
2024/08/06 | Jin Tao | Paper Reading: https://arxiv.org/abs/2406.10819 |
2024/07/23 | Yan Zhenzhu | Paper Reading: 43ba0466af2b1ac76aa85d8fbec714e3-Paper-Conference.pdf (neurips.cc) |
2024/07/16 | Wei Houjing | Paper Reading: [2406.17759] Interpreting Attention Layer Outputs with Sparse Autoencoders |
2024/07/09 | Yuting Shi | Paper Reading: https://arxiv.org/abs/2305.15054 |
2024/07/02 | Yan Zhenzhu | Paper Reading: [2402.12289] DriveVLM: The Convergence of Autonomous Driving and Large Vision-Language Models |
2024/06/25 | Jin Tao | Paper Reading: https://python.langchain.com/v0.2/docs/introduction/ |
2024/06/21 | Wei Houjing | Paper Reading: [2312.06742] Honeybee: Locality-enhanced Projector for Multimodal LLM; [2405.20985] DeCo: Decoupling Token Compression from Semantic Abstraction in Multimodal Large Language Models |
2024/06/10 | Yan Zhenzhu | Paper Reading: [2010.11929] An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale |
2024/05/29 | Naoya Inoue | LLaVA/InstructBLIP/MiniGPT-4: model architecture/source code |
2024/05/20 | Yuting Shi | Paper Reading: [2311.03079] CogVLM: Visual Expert for Pretrained Language Models |
2024/05/13 | Jin Tao | Paper Reading: [2404.16054] LlamaTouch: A Faithful and Scalable Testbed for Mobile UI Automation Task Evaluation |
2024/04/22 | Wei Houjing | Paper Reading: [2402.12865] Backward Lens: Projecting Language Model Gradients into the Vocabulary Space |
2024/04/15 | Yuting Shi | Paper Reading: [2402.04236] CogCoM: Train Large Vision-Language Models Diving into Details through Chain of Manipulations |
2024/04/01 | Yuting Shi, Wei Houjing, Jin Tao | Progress Report |
2024/03/25 | Jin Tao | Paper Reading: [2402.06596] Understanding the Weakness of Large Language Model Agents within a Complex Android Environment |
2024/03/18 | Wei Houjing | Paper Reading: [2202.05262] Locating and Editing Factual Associations in GPT |
2024/03/04 | Yuting Shi, Wei Houjing, Jin Tao | Progress Report |
2024/02/26 | Yuting Shi | Paper Reading: [1907.03950] Learning by Abstraction: The Neural State Machine |
2024/02/19 | Yuting Shi, Wei Houjing, Jin Tao | Progress Report |
2024/01/29 | Jin Tao | Paper Reading: [2401.10935] SeeClick: Harnessing GUI Grounding for Advanced Visual GUI Agents |
2024/01/22 | Yuting Shi, Wei Houjing, Jin Tao | Progress Report |
2024/01/15 | Wei Houjing | Paper Reading: Multimodal Neurons in Pretrained Text-Only Transformers |
2024/01/09 | Yuting Shi, Wei Houjing, Jin Tao | Progress Report |
2023/12/25 | Yuting Shi, Wei Houjing, Jin Tao | Progress Report |
2023/12/05 | Yuting Shi | Paper Reading |
2023/11/27 | Wei Houjing | Paper Reading: Do Vision-and-Language Transformers Learn Grounded Predicate-Noun Dependencies? |
2023/11/20 | Jin Tao | Paper Reading |
2023/11/13 | Yuting Shi | Paper Reading: [2309.17421] The Dawn of LMMs: Preliminary Explorations with GPT-4V(ision) |
2023/11/04 | Wei Houjing | Paper Reading: A Survey on Bridge-style MLMs |
2023/10/27 | Jin Tao | Progress Report |
2023/10/16 | Yuting Shi | Task assignment for LREC |
2023/10/02 | Wei Houjing | Paper Reading |
2023/09/25 | Yuting Shi | Progress Report |
2023/07/03 | Yuting Shi | Progress Report |
2023/06/26 | Wei Houjing | Paper Reading: [2209.14927] Spotlight: Mobile UI Understanding using Vision-Language Models with a Focus |
2023/05/19 | Jin Tao | Presentation |
2023/05/15 | Yuting Shi | Yuting’s experiment |
2023/05/10 | Jin Tao | Presentation |
2023/05/02 | Wei Houjing | Augmented LLMs for visual reasoning |
2023/04/24 | Wei Houjing, Yuting Shi | Paper Reading: “Language Modeling with Pixels”; “Multimodal C4: An Open, Billion-scale Corpus of Images Interleaved with Text” |
2023/04/20 | Yuting Shi | Towards Inductive Reasoning from Visual Information |