Research Areas
IISR advances intelligent information services through three major research thrusts (2021–2025) that integrate large language models (LLMs), retrieval-augmented generation (RAG), data-centric AI, and cross-domain representation learning.
Thrust 1 — Taiwan-Localized LLMs & Low-Resource Languages
ACL 2024
EMNLP 2024
NAACL 2025
COLING 2024
Motivation: Mainstream LLMs often lack Traditional Chinese (Taiwan) cultural grounding and have limited coverage for local languages (e.g., Taiwanese Hokkien).
🔑 Key Contributions
- Chat Vector (ACL 2024) [📄 Paper] [💻 Code]
  A lightweight, economical alignment technique that transfers instruction-following ability through weight-space additivity, reducing dependence on expensive RLHF.
- TWBias (EMNLP 2024) [📄 Paper] [💻 Code]
  The first benchmark for assessing social bias in Traditional Chinese LLMs through a Taiwan cultural lens, supporting trustworthy evaluation for Taiwan-localized models.
- Hokkien NLP & Multimodal Learning
  - Standardizing and bridging multiple writing systems (COLING 2024). [📄 Paper]
  - ATAIGI (NAACL 2025): A multimodal learning app leveraging generative models for low-resource Hokkien education. [📄 Paper] [💻 Code]
Why it matters:
We build methods, benchmarks, and applications as a complete localization stack, going beyond "translation-only" pipelines to ensure cultural depth and practical usability.
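The weight-space additivity behind Chat Vector can be illustrated with a toy sketch. It assumes the usual task-arithmetic reading: the difference between an instruction-tuned model and its base model is added to a separately adapted (for example, continually pretrained) model that shares the same architecture. The parameter names, the `scale` argument, and the list-of-floats stand-in for tensors are illustrative assumptions, not the released implementation.

```python
from typing import Dict, List

# Toy stand-in for a model checkpoint: parameter name -> flattened values.
Weights = Dict[str, List[float]]


def chat_vector(chat: Weights, base: Weights) -> Weights:
    """Per-parameter difference between an instruction-tuned model and its base."""
    return {
        name: [c - b for c, b in zip(chat[name], base[name])]
        for name in base
    }


def apply_chat_vector(target: Weights, delta: Weights, scale: float = 1.0) -> Weights:
    """Add the (optionally scaled) difference to a model with the same shapes."""
    return {
        name: [t + scale * d for t, d in zip(target[name], delta[name])]
        for name in target
    }


# Hypothetical toy weights; real checkpoints hold large tensors.
base = {"layer.weight": [0.0, 1.0, 2.0]}        # original pretrained model
chat = {"layer.weight": [0.5, 1.0, 1.5]}        # instruction-tuned version of `base`
localized = {"layer.weight": [1.0, 1.0, 1.0]}   # `base` after language adaptation

delta = chat_vector(chat, base)
merged = apply_chat_vector(localized, delta)
print(merged)
```

No gradient updates are involved in the merge itself, which is why the approach is cheap compared with running a full alignment stage on the adapted model.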
Thrust 2 — Biomedical NLP (BioNLP): Data-Centric & Privacy-Aware
🏆 BioASQ Champion (6 yrs)
Briefings in Bioinformatics
npj Digital Medicine
Database (Oxford)
Motivation: Biomedical data are often noisy and low-resource, and clinical text requires strict privacy protection.
🔑 Key Contributions
- BioASQ Biomedical QA (2020–2025) [🌐 Project]
  Ranked as the top team for six consecutive years in tasks involving evidence retrieval, semantic search, and evidence-grounded answering.
- Relation Extraction: Survey & New Resource (PEDD) [📄 Paper] [💾 Data]
  Proposed PEDD, a high-quality document-level dataset originally developed for the AI CUP 2019 competition, addressing critical inconsistencies in legacy benchmarks (Briefings in Bioinformatics, 2024).
- Clinical De-identification & Temporal Normalization with LLMs [📄 Paper]
  - Observed inverse scaling beyond ~6B parameters in the absence of targeted adaptation (npj Digital Medicine, 2025).
  - Proposed an efficient remedy via PEFT/LoRA that improves privacy protection while preserving clinical utility.
- Data-centric Ensemble Learning [📄 Paper]
  Enhancing biomedical relation extraction through a data-centric, preprocessing-robust ensemble learning approach (Database, 2025).
Why it matters:
Many global BioNLP systems optimize model architectures on curated datasets; we emphasize real-world robustness and privacy constraints, addressing barriers to actual hospital deployment.
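For readers unfamiliar with the PEFT/LoRA adaptation mentioned above, the following is a minimal sketch of the generic LoRA update, in which a frozen weight matrix W is adjusted by a scaled low-rank product, W + (alpha / r) · B · A. The matrices, rank, and `alpha` value are toy assumptions chosen for readability; they do not reflect the configuration used in the cited work.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product, enough for a toy example."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def lora_merge(w: Matrix, b: Matrix, a: Matrix, alpha: float, r: int) -> Matrix:
    """Return W + (alpha / r) * B @ A, where only B and A would be trained."""
    scale = alpha / r
    delta = matmul(b, a)
    return [
        [w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]


# Hypothetical toy values: a frozen 2x2 weight and a rank-1 adapter.
w = [[1.0, 0.0], [0.0, 1.0]]   # frozen pretrained weight
b = [[1.0], [2.0]]             # trainable, shape (2, r)
a = [[0.5, 0.5]]               # trainable, shape (r, 2)

merged = lora_merge(w, b, a, alpha=2.0, r=1)
print(merged)
```

For realistic layer sizes the two adapter matrices together hold far fewer parameters than W itself, which is what makes targeted adaptation of larger models affordable.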
Thrust 3 — Digital Humanities & Historical GIS
IJGIS 2025
EMNLP 2023
DSH 2025
Motivation: Humanities data are frequently unstructured, archaic in language, and lack annotations, making standard supervised learning ineffective.
🔑 Key Contributions
- Historical Maps Without Labels (IJGIS 2025) [📄 Paper] [💻 Code]
  Unsupervised domain adaptation (UDA) bridging modern labeled maps and historical unlabeled maps for land-use understanding.
- MingOfficial (EMNLP 2023) [📄 Paper] [💾 Data (Depository)]
  A historical context-aware representation learning framework embedding time, space, and events into career trajectory representations.
- Computational Analysis of Ming Military Power [📄 Paper] [📰 Media (中文)]
  A digital humanities approach to analyzing the roles of Supreme Commanders and Grand Coordinators in Ming Shilu (Digital Scholarship in the Humanities, 2025).
- LLM-based Long-document Analysis
  Designing grounded RAG workflows with context fields (metadata + text) and human-in-the-loop benchmarks to improve traceability in historical sources.
Why it matters:
Instead of applying generic tools, we develop algorithmic methods (such as UDA and context embedding) that specifically target the fundamental "no-label" barrier in historical materials.
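The idea of context fields (metadata + text) in a grounded RAG workflow can be sketched as follows. Everything here is an illustrative assumption: the `Passage` fields, the word-overlap scoring, the prompt wording, and the placeholder passages are hypothetical and stand in for whatever retriever and schema a real system would use.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Passage:
    source_id: str  # hypothetical identifier used for citation
    period: str     # hypothetical metadata field
    text: str       # passage content


def score(query: str, passage: Passage) -> int:
    """Count query words that also appear in the passage text."""
    terms = set(query.lower().split())
    return len(terms & set(passage.text.lower().split()))


def retrieve(query: str, passages: List[Passage], k: int = 2) -> List[Passage]:
    """Return up to k passages with a non-zero overlap score, best first."""
    ranked = sorted(passages, key=lambda p: score(query, p), reverse=True)
    return [p for p in ranked[:k] if score(query, p) > 0]


def build_prompt(query: str, hits: List[Passage]) -> str:
    """Keep metadata next to each passage so answers can cite their sources."""
    context = "\n".join(f"[{p.source_id} | {p.period}] {p.text}" for p in hits)
    return (
        "Answer using only the sources below and cite their ids.\n"
        f"{context}\nQuestion: {query}"
    )


# Placeholder passages, not real historical records.
passages = [
    Passage("doc-1", "period-A", "grain transport along the canal"),
    Passage("doc-2", "period-B", "appointment of a regional commander"),
    Passage("doc-3", "period-A", "repair of the canal embankment"),
]

hits = retrieve("canal transport", passages)
prompt = build_prompt("canal transport", hits)
print(prompt)
```

Because each retrieved passage carries its identifier and metadata into the prompt, a human reviewer can trace any generated claim back to a specific source.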
🚀 Selected Deployment Context
| Semiconductor | Domain LLMs for fab operation (TSMC, 2025) |
| National Scale | AI semantic search for theses/dissertations (National Central Library, 2025) |
| Public / Education | Teacher co-planning support with NARLabs / NCHC (2025) |
| Healthcare | Clinical collaborations with Cathay General & Landseed Int'l Hospital (2021–2025) |
| International | East Asian multilingual model merging with KISTI (Korea, 2025) |