Overview
Individual projects
Mechanistic Interpretability
Mechanistic Interpretability. We aim to decompose the complex inference of language models into a sequence of simpler processes. For instance, in the work shown in the figure, we break down the in-context learning process of language models into three simple steps through careful measurements, and use this decomposition to explain many observed phenomena. This approach falls under mechanistic interpretability, offering a transparent, step-by-step understanding of how neural networks perform tasks. Such a decomposition may not always yield precise models, but as the saying goes, "all models are wrong, but some are useful." Rather than pursuing the theoretical elegance and precision of traditional machine learning theory, we prioritize empirical practicality that can guide better practice.
Our Efforts: Cho et al. 2024 (shown in the figure), Cho et al. 2024
Application: Improving In-context Learning Performance. If we input a prompt of text-label pairs and leave the final label blank (as shown in the figure), the language model predicts the missing label through ordinary causal language modeling. In this way, the model learns from the few-shot text-label pairs and generates a response to the new query, which is called in-context learning. As mentioned above, we also analyze and improve the in-context learning capabilities of language models. For example, in the work shown in the figure, we examine the decision boundaries formed during in-context learning and refine them to improve both accuracy and stability. We believe this line of work can substantially enhance the practical utility of language models in downstream tasks.
Our Efforts: Cho et al. 2024 (shown in the figure, Japanese), Cho et al. 2024
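To make the prompt format concrete, below is a minimal sketch of few-shot in-context learning with a causal language model via the Hugging Face transformers library. The model name (gpt2), the sentiment labels, and the prompt wording are illustrative assumptions, not the exact setup used in the cited papers.

```python
# Minimal few-shot in-context learning sketch (model, labels, and prompt are illustrative).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # any causal LM; chosen here only for illustration
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Text-label pairs are concatenated into a prompt; the final label is left blank,
# and the model fills it in via ordinary next-token prediction.
prompt = (
    "Review: The plot was dull and predictable. Sentiment: negative\n"
    "Review: A moving, beautifully acted film. Sentiment: positive\n"
    "Review: I would happily watch it again. Sentiment:"
)

inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits[0, -1]  # next-token distribution at the blank label

# Score each candidate label by the probability of its first token.
labels = [" positive", " negative"]
label_ids = [tokenizer(lab, add_special_tokens=False).input_ids[0] for lab in labels]
scores = torch.softmax(logits, dim=-1)[label_ids]
print(dict(zip(labels, scores.tolist())))
```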
Towards Improving Reasoning Capability of LLMs
Benchmarking Multi-hop QA in Japanese. JEMHopQA is a Japanese multi-hop QA dataset for developing explainable QA systems. It consists of question-answer pairs covering two types of questions, together with derivation triples of supporting evidence. The dataset is built from Japanese Wikipedia using both crowd-sourced human annotation and prompting of a large language model (LLM). Evaluating several state-of-the-art LLMs on the proposed dataset shows that it is sufficiently challenging.
Our Efforts: Ai Ishii et al. LREC-COLING 2024
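As a rough illustration of the data format described above, the sketch below shows what a multi-hop question with an answer and derivation triples of supporting evidence might look like. The field names and the example content are hypothetical and do not reproduce actual JEMHopQA entries.

```python
# Hypothetical illustration of a multi-hop QA item with derivation triples.
# Field names and content are invented for explanation; they do not reproduce JEMHopQA.
example_item = {
    "question": "Which prefecture is the birthplace of the author of 'Kokoro'?",
    "answer": "Tokyo",
    "derivations": [
        ("Kokoro", "author", "Natsume Soseki"),         # hop 1: identify the author
        ("Natsume Soseki", "place of birth", "Tokyo"),  # hop 2: identify the birthplace
    ],
}

# An explainable QA system is expected to output both the answer and the supporting
# triples, so its explanation can be checked against the gold derivation.
predicted = {"answer": "Tokyo", "derivations": example_item["derivations"]}
print(predicted["answer"] == example_item["answer"])
```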
Datasets for Logical Fallacy Detection. This paper introduces four sets of templates for common informal logical fallacies. Using the proposed templates, we conduct an annotation study on 400 fallacious arguments taken from the LOGIC dataset, achieving a high agreement score and reasonable coverage. Extensive experiments on detecting the structure of fallacies show that state-of-the-art language models struggle to detect fallacy templates.
Our Efforts: Irfan Robbani et al. EMNLP 2024
Reasoning and Probing for Vision-Language Models
Benchmark for Inductive Visual Reasoning. We introduce the Find-the-Common (FTC) benchmark, which consists of 353 instances, each providing (i) four 3D scenes containing 2-6 objects and (ii) four multiple-choice options, including a decoy option that is only partially true across the scenes. Models are required to identify the answer that explains the attributes common to all visual scenes. We propose Image-Based Reasoning, Text-Based Reasoning, and Image-Text-Based Reasoning settings for evaluating various VL models. Extensive experiments show that even state-of-the-art models such as GPT-4V struggle on FTC, establishing it as a new challenge for visual reasoning.
Our Efforts: Yuting Shi et al. LREC-COLING 2024
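As a rough illustration of the task format, the sketch below packages one FTC-style instance (four scene descriptions and four candidate statements) into a multiple-choice prompt for Text-Based Reasoning. The instance content, prompt wording, and scoring are assumptions made for illustration and do not reproduce the benchmark's exact protocol.

```python
# Hypothetical FTC-style instance for Text-Based Reasoning (content invented for illustration).
instance = {
    "scenes": [
        "A red cube, a red sphere, and a blue cylinder on a table.",
        "Two red cones and a green cube.",
        "A red pyramid next to a yellow sphere.",
        "Three red cubes of different sizes.",
    ],
    "choices": [
        "Every scene contains a cube.",        # decoy: true for only some scenes
        "Every scene contains a red object.",  # common attribute across all scenes
        "Every scene contains exactly two objects.",
        "Every scene contains a sphere.",
    ],
    "answer": 1,  # index of the correct choice
}

def build_prompt(inst):
    """Format the scene descriptions and candidate answers as a multiple-choice prompt."""
    scenes = "\n".join(f"Scene {i + 1}: {s}" for i, s in enumerate(inst["scenes"]))
    choices = "\n".join(f"({i + 1}) {c}" for i, c in enumerate(inst["choices"]))
    return (f"{scenes}\n\nWhich statement describes what all four scenes have in common?\n"
            f"{choices}\nAnswer:")

print(build_prompt(instance))
# The model's chosen option is then compared against instance["answer"] to compute accuracy.
```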
Grants
- Naoya Inoue (PI). Language Processing Mechanisms with Self-Critical Thinking that People Want to Rely On (人々が頼りたくなる自己批判的思考力を備えた言語処理機構). JST FOREST Program (創発的研究支援事業), FY2023. 2024/10-2028/03.
- Naoya Inoue (PI). Research on Reliable AI Capable of Self-Aware Reasoning (自己認識的に推論ができる信頼性の高いAIの研究). Nakajima Foundation (中島国際交流財団), Start-up Grant for Japanese Independent Researchers (日本人独立研究者始動助成金). 2024/04-2027/03, 5,000,000 JPY.
- Naoya Inoue (PI). Developing Flexible Inference Mechanism by Embedding Causality Knowledge into Continuous Space (事象間関係知識の連続空間への埋め込みによる柔軟な推論機構の開発). Japan Society for the Promotion of Science (JSPS) Grants-in-Aid for Scientific Research (KAKENHI 若手研究). 2019/4-2020/3, 2022/6/1-2024/3, 4,160,000JPY. 19K20332
- Kentaro Inui, Chihiro Nakagawa, and Naoya Inoue. Deep Modeling of Argumentation and its Application to Argumentative Feedback System (深い論述理解の計算モデリングと論述学習支援への応用). Japan Society for the Promotion of Science (JSPS) Grants-in-Aid for Scientific Research (KAKENHI 基盤A). Co-investigator. 2022/4-2027/3. 22H00524
- Hiroki Ouchi. Foundational Research on Grounding the Movement Trajectories of People Described in Text onto Real-World Maps, and its Applications (文章中の人物の移動軌跡を実世界の地図上に接地するための基礎研究とその応用). Japan Society for the Promotion of Science (JSPS) Grants-in-Aid for Scientific Research (KAKENHI 基盤B). Research collaborator (研究協力者). 2022/4-2025/3. 13,910,000 JPY. 22H03648
Past grants can be found here.
Main collaborators
- Tohoku NLP Lab
- RIKEN AIP: Natural Language Understanding Team, Language Information Access Technology Team