Meeting Date | Presenter | Topic |
---|---|---|
2025/03/31 | Wei Houjing | Paper Reading: [2405.16700] Implicit Multimodal Alignment: On the Generalization of Frozen LLMs to Multimodal Inputs |
2025/03/24 | Yuting Shi | Paper Reading: [2411.17491] What's in the Image? A Deep-Dive into the Vision of Vision Language Models |
2025/03/17 | Yan Zhenzhu | Paper Reading: [2502.00372] NAVER: A Neuro-Symbolic Compositional Automaton for Visual Grounding with Explicit Logic Reasoning |
2025/03/03 | Yuting Shi | Paper Reading: [2502.05390] Learning Task Representations from In-Context Learning |
2025/02/17 | Wei Houjing | Paper Reading: [2410.12816] Rethinking Misalignment in Vision-Language Model Adaptation from a Causal Perspective |
2025/02/03 | Yan Zhenzhu | Paper Reading: LogicAD: Explainable Anomaly Detection via VLM-based Text Feature Extraction |
2025/01/27 | Yuting Shi | Paper Reading: [2412.06769] Training Large Language Models to Reason in a Continuous Latent Space |
2025/01/06 | Wei Houjing | Paper Reading: [2411.09968] Seeing Clearly by Layer Two: Enhancing Attention Heads to Alleviate Hallucination in LVLMs |
2024/12/23 | Yuting Shi | Paper Reading: [2412.02946] Who Brings the Frisbee: Probing Hidden Hallucination Factors in Large Vision-Language Model via Causality Analysis |
2024/12/16 | Yan Zhenzhu | Paper Reading: [2411.06048] An Empirical Analysis on Spatial Reasoning Capabilities of Large Multimodal Models |
2024/12/09 | Jin Tao | Paper Reading: https://aclanthology.org/2024.findings-emnlp.46/ Can LLMs Learn from Mistakes? An Empirical Study on Reasoning Tasks |
2024/12/02 | Wei Houjing | Paper Reading: [2411.03312] Inference Optimal VLMs Need Only One Visual Token but Larger Models |
2024/11/25 | Yuting Shi | Paper Reading: https://arxiv.org/abs/2303.08112 Eliciting Latent Predictions from Transformers with the Tuned Lens |
2024/11/18 | Mariko Kato | Paper Reading: https://arxiv.org/abs/2410.22330 Task Vectors are Cross-Modal |
2024/11/11 | Yan Zhenzhu | Paper Reading: [2410.03062] Image First or Text First? Optimising the Sequencing of Modalities in Large Language Model Prompting and Reasoning Tasks |
2024/10/21 | Jin Tao | Paper Reading: https://arxiv.org/abs/2312.08914 CogAgent: A Visual Language Model for GUI Agents |
2024/10/08 | Wei Houjing | Paper Reading: https://arxiv.org/abs/2310.05916 |
2024/09/24 | Jin Tao | Paper Reading: https://aclanthology.org/2024.findings-eacl.87/ |
2024/09/10 | Yan Zhenzhu | |
2024/09/03 | Yuting Shi | Paper Reading: https://arxiv.org/abs/2408.17006 |
2024/08/26 | Yuting Shi | Paper Reading: [2402.14328] Understanding and Patching Compositional Reasoning in LLMs |
2024/08/06 | Jin Tao | Paper Reading: https://arxiv.org/abs/2406.10819 |
2024/07/23 | Yan Zhenzhu | Paper Reading: 43ba0466af2b1ac76aa85d8fbec714e3-Paper-Conference.pdf (neurips.cc) |
2024/07/16 | Wei Houjing | Paper Reading: [2406.17759] Interpreting Attention Layer Outputs with Sparse Autoencoders |
2024/07/09 | Yuting Shi | Paper Reading: https://arxiv.org/abs/2305.15054 |
2024/07/02 | Yan Zhenzhu | Paper Reading: [2402.12289] DriveVLM: The Convergence of Autonomous Driving and Large Vision-Language Models |
2024/06/25 | Jin Tao | Paper Reading: https://python.langchain.com/v0.2/docs/introduction/ |
2024/06/21 | Wei Houjing | Paper Reading: [2312.06742] Honeybee: Locality-enhanced Projector for Multimodal LLM; [2405.20985] DeCo: Decoupling Token Compression from Semantic Abstraction in Multimodal Large Language Models |
2024/06/10 | Yan Zhenzhu | Paper Reading: [2010.11929] An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale |
2024/05/29 | Naoya Inoue | LLaVA/InstructBLIP/MiniGPT-4: model architecture/source code |
2024/05/20 | Yuting Shi | Paper Reading: [2311.03079] CogVLM: Visual Expert for Pretrained Language Models |
2024/05/13 | Jin Tao | Paper Reading: [2404.16054] LlamaTouch: A Faithful and Scalable Testbed for Mobile UI Automation Task Evaluation |
2024/04/22 | Wei Houjing | Paper Reading: [2402.12865] Backward Lens: Projecting Language Model Gradients into the Vocabulary Space |
2024/04/15 | Yuting Shi | Paper Reading: [2402.04236] CogCoM: Train Large Vision-Language Models Diving into Details through Chain of Manipulations |
2024/04/01 | Yuting Shi, Wei Houjing, Jin Tao | Progress Report |
2024/03/25 | Jin Tao | Paper Reading: [2402.06596] Understanding the Weakness of Large Language Model Agents within a Complex Android Environment |
2024/03/18 | Wei Houjing | Paper Reading: [2202.05262] Locating and Editing Factual Associations in GPT |
2024/03/04 | Yuting Shi, Wei Houjing, Jin Tao | Progress Report |
2024/02/26 | Yuting Shi | Paper Reading: [1907.03950] Learning by Abstraction: The Neural State Machine |
2024/02/19 | Yuting Shi, Wei Houjing, Jin Tao | Progress Report |
2024/01/29 | Jin Tao | Paper Reading: [2401.10935] SeeClick: Harnessing GUI Grounding for Advanced Visual GUI Agents |
2024/01/22 | Yuting Shi, Wei Houjing, Jin Tao | Progress Report |
2024/01/15 | Wei Houjing | Paper Reading: Multimodal Neurons in Pretrained Text-Only Transformers |
2024/01/09 | Yuting Shi, Wei Houjing, Jin Tao | Progress Report |
2023/12/25 | Yuting Shi, Wei Houjing, Jin Tao | Progress Report |
2023/12/05 | Yuting Shi | Paper Reading |
2023/11/27 | Wei Houjing | Paper Reading: Do Vision-and-Language Transformers Learn Grounded Predicate-Noun Dependencies? (ACL Anthology) |
2023/11/20 | Jin Tao | Paper Reading |
2023/11/13 | Yuting Shi | Paper Reading: [2309.17421] The Dawn of LMMs: Preliminary Explorations with GPT-4V(ision) |
2023/11/04 | Wei Houjing | Paper Reading: A Survey on Bridge-style MLMs |
2023/10/27 | Jin Tao | Progress Report |
2023/10/16 | Yuting Shi | Task assignment for LREC |
2023/10/02 | Wei Houjing | Paper Reading |
2023/09/25 | Yuting Shi | Progress Report |
2023/07/03 | Yuting Shi | Progress Report |
2023/06/26 | Wei Houjing | Paper Reading: [2209.14927] Spotlight: Mobile UI Understanding using Vision-Language Models with a Focus |
2023/05/19 | Jin Tao | Presentation |
2023/05/15 | Yuting Shi | Yuting’s experiment |
2023/05/10 | Jin Tao | Presentation |
2023/05/02 | Wei Houjing | Augmented LLMs for visual reasoning |
2023/04/24 | Wei Houjing, Yuting Shi | Paper Reading: “Language Modeling with Pixels”; “Multimodal C4: An Open, Billion-scale Corpus of Images Interleaved with Text” |
2023/04/20 | Yuting Shi | Towards Inductive Reasoning from Visual Information |