MLEC-QA: A Chinese Multi-Choice Biomedical Question Answering Dataset

Authors:

Li, Jing [1] | Zhong, Shangping (钟尚平) [2] | Chen, Kaizhi (陈开志) [3]

Abstract:

Question Answering (QA) has been successfully applied in human-computer interaction scenarios such as chatbots and search engines. In the biomedical domain, however, QA systems remain immature because expert-annotated datasets are limited in category and scale. In this paper, we present MLEC-QA, the largest-scale Chinese multi-choice biomedical QA dataset, collected from the National Medical Licensing Examination in China. The dataset is composed of five subsets with 136,236 biomedical multi-choice questions with extra materials (images or tables) annotated by human experts, and is the first to cover the following biomedical sub-fields: Clinic, Stomatology, Public Health, Traditional Chinese Medicine, and Traditional Chinese Medicine Combined with Western Medicine. We implement eight representative control methods and open-domain QA methods as baselines. Experimental results demonstrate that even the current best model achieves accuracies of only 40% to 55% on the five subsets, performing especially poorly on questions that require sophisticated reasoning ability. We hope the release of the MLEC-QA dataset can serve as a valuable resource for research and evaluation in open-domain QA, and also help advance biomedical QA systems. © 2021 Association for Computational Linguistics

Keywords:

Computational linguistics; Human computer interaction; Natural language processing systems; Search engines

Affiliations:

[1] Li, Jing: College of Computer and Data Science, Fuzhou University, Fuzhou, China
[2] Zhong, Shangping: College of Computer and Data Science, Fuzhou University, Fuzhou, China
[3] Chen, Kaizhi: College of Computer and Data Science, Fuzhou University, Fuzhou, China

Year: 2021

Pages: 8862-8874

Language: English

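Affiliated Colleges:

College of Computer and Data Science / College of Software (records not explicitly attributed to a sub-unit of this college/department)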
