
Meetings Archive

Meeting Date
Presenter
Topic
2025/03/31
Wei Houjing

Paper Reading: [2405.16700] Implicit Multimodal Alignment: On the Generalization of Frozen LLMs in Multimodal Inputs

2025/03/24
Yuting Shi

Paper Reading: [2411.17491] What's in the Image? A Deep-Dive into the Vision of Vision Language Models

2025/03/17
Yan Zhenzhu

Paper Reading: [2502.00372] NAVER: A Neuro-Symbolic Compositional Automaton for Visual Grounding with Explicit Logic Reasoning

2025/03/03
Yuting Shi

Paper Reading: [2502.05390] Learning Task Representations from In-Context Learning

2025/02/17
Wei Houjing

Paper Reading: [2410.12816] Rethinking Misalignment in Vision-Language Model Adaptation from a Causal Perspective

2025/02/03
Yan Zhenzhu

Paper Reading: LogicAD: Explainable Anomaly Detection via VLM-based Text Feature Extraction

2025/01/27
Yuting Shi

Paper Reading: [2412.06769] Training Large Language Models to Reason in a Continuous Latent Space

2025/01/06
Wei Houjing

Paper Reading: [2411.09968] Seeing Clearly by Layer Two: Enhancing Attention Heads to Alleviate Hallucination in LVLMs

2024/12/23
Yuting Shi

Paper Reading: [2412.02946] Who Brings the Frisbee: Probing Hidden Hallucination Factors in Large Vision-Language Model via Causality Analysis

2024/12/16
Yan Zhenzhu

Paper Reading: [2411.06048] An Empirical Analysis on Spatial Reasoning Capabilities of Large Multimodal Models

2024/12/09
Jin Tao

Paper Reading: https://aclanthology.org/2024.findings-emnlp.46/ Can LLMs Learn from Mistakes? An Empirical Study on Reasoning Tasks

2024/12/02
Wei Houjing

Paper Reading: [2411.03312] Inference Optimal VLMs Need Only One Visual Token but Larger Models

2024/11/25
Yuting Shi

Paper Reading: https://arxiv.org/abs/2303.08112 Eliciting Latent Predictions from Transformers with the Tuned Lens

2024/11/18
Mariko Kato

Paper Reading: https://arxiv.org/abs/2410.22330 Task Vectors are Cross-Modal

2024/11/11
Yan Zhenzhu

Paper Reading: [2410.03062] Image First or Text First? Optimising the Sequencing of Modalities in Large Language Model Prompting and Reasoning Tasks

2024/10/21
Jin Tao

Paper Reading: https://arxiv.org/abs/2312.08914

2024/10/08
Wei Houjing

Paper Reading: 2310.05916 (arxiv.org)

2024/09/24
Jin Tao

Paper Reading: 2024.findings-eacl.87.pdf (aclanthology.org)

2024/09/10
Yan Zhenzhu

Paper Reading: [2403.16442] If CLIP Could Talk: Understanding Vision-Language Model Representations Through Their Preferred Concept Descriptions (arxiv.org)

2024/09/03
Yuting Shi

Paper Reading: 2408.17006 (arxiv.org)

2024/08/26
Yuting Shi

Paper Reading: [2402.14328] Understanding and Patching Compositional Reasoning in LLMs (arxiv.org)

2024/08/06
Jin Tao

Paper Reading: https://arxiv.org/abs/2406.10819

2024/07/23
Yan Zhenzhu

Paper Reading: 43ba0466af2b1ac76aa85d8fbec714e3-Paper-Conference.pdf (neurips.cc)

2024/07/16
Wei Houjing

Paper Reading: [2406.17759] Interpreting Attention Layer Outputs with Sparse Autoencoders (arxiv.org)

2024/07/09
Yuting Shi

Paper Reading: 2305.15054 (arxiv.org)

2024/07/02
Yan Zhenzhu

Paper Reading: [2402.12289] DriveVLM: The Convergence of Autonomous Driving and Large Vision-Language Models (arxiv.org)

2024/06/25
Jin Tao

Reading: LangChain documentation (https://python.langchain.com/v0.2/docs/introduction/)

2024/06/21
Wei Houjing

Paper Reading: [2312.06742] Honeybee: Locality-enhanced Projector for Multimodal LLM (arxiv.org); [2405.20985] DeCo: Decoupling Token Compression from Semantic Abstraction in Multimodal Large Language Models (arxiv.org)

2024/06/10
Yan Zhenzhu

Paper Reading: [2010.11929] An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale (arxiv.org)

2024/05/29
Naoya Inoue

LLaVA/InstructBLIP/MiniGPT-4: model architecture/source code

2024/05/20
Yuting Shi

Paper Reading: [2311.03079] CogVLM: Visual Expert for Pretrained Language Models (arxiv.org)

2024/05/13
Jin Tao

Paper Reading: [2404.16054] LlamaTouch: A Faithful and Scalable Testbed for Mobile UI Automation Task Evaluation (arxiv.org)

2024/04/22
Wei Houjing

Paper Reading: [2402.12865] Backward Lens: Projecting Language Model Gradients into the Vocabulary Space (arxiv.org)

2024/04/15
Yuting Shi

Paper Reading: [2402.04236] CogCoM: Train Large Vision-Language Models Diving into Details through Chain of Manipulations (arxiv.org)

2024/04/01
Yuting Shi, Wei Houjing, Jin Tao

Progress Report

2024/03/25
Jin Tao

Paper Reading: [2402.06596] Understanding the Weakness of Large Language Model Agents within a Complex Android Environment (arxiv.org)

2024/03/18
Wei Houjing

Paper Reading: [2202.05262] Locating and Editing Factual Associations in GPT (arxiv.org)

2024/03/04
Yuting Shi, Wei Houjing, Jin Tao

Progress Report

2024/02/26
Yuting Shi

Paper Reading: [1907.03950] Learning by Abstraction: The Neural State Machine (arxiv.org)

2024/02/19
Yuting Shi, Wei Houjing, Jin Tao

Progress Report

2024/01/29
Jin Tao

Paper Reading: [2401.10935] SeeClick: Harnessing GUI Grounding for Advanced Visual GUI Agents (arxiv.org)

2024/01/22
Yuting Shi, Wei Houjing, Jin Tao

Progress Report

2024/01/15
Wei Houjing

Paper Reading: Multimodal Neurons in Pretrained Text-Only Transformers

2024/01/09
Yuting Shi, Wei Houjing, Jin Tao

Progress Report

2023/12/25
Yuting ShiWei HoujingJin Tao

Progress Report

2023/12/05
Yuting Shi

Paper Reading

2023/11/27
Wei Houjing

Paper Reading: Do Vision-and-Language Transformers Learn Grounded Predicate-Noun Dependencies? (aclanthology.org)

2023/11/20
Jin Tao

Paper Reading

2023/11/13
Yuting Shi

Paper Reading: [2309.17421] The Dawn of LMMs: Preliminary Explorations with GPT-4V(ision) (arxiv.org)

2023/11/04
Wei Houjing

Paper Reading: A Survey on Bridge-style MLMs

2023/10/27
Jin Tao

Progress Report

2023/10/16
Yuting Shi

Task assignment for LREC

2023/10/02
Wei Houjing

Paper Reading

2023/09/25
Yuting Shi

Progress Report

2023/07/03
Yuting Shi

Progress Report

2023/06/26
Wei Houjing

Paper Reading: [2209.14927] Spotlight: Mobile UI Understanding using Vision-Language Models with a Focus (arxiv.org)

2023/05/19
Jin Tao

Jin Tao's presentation

2023/05/15
Yuting Shi

Yuting’s experiment

2023/05/10
Jin Tao

Jin Tao's presentation

2023/05/02
Wei Houjing

Augmented LLMs for visual reasoning

2023/04/24
Wei Houjing, Yuting Shi

Paper Reading: “Language Modeling with Pixels”; “Multimodal C4: An Open, Billion-scale Corpus of Images Interleaved with Text”

2023/04/20
Yuting Shi

Towards Inductive Reasoning from Visual Information


© RebelsNLU at JAIST
