
LG CNS releases updated language comprehension dataset KorQuAD 2.0

LG CNS announced Wednesday it will release KorQuAD 2.0 for free, adding 80,000 question-and-answer sets to the dataset, for the benefit of South Korea’s artificial intelligence industry.


The initial release of KorQuAD, short for the Korean Question Answering Dataset, is a Korean-language machine reading comprehension dataset with over 20,000 question-and-answer sets compiled from Korean Wikipedia entries. It is used to train AI systems to extract answers to questions asked in natural language.

At the “AI Tech Talk for NLU” held by LG CNS on Thursday, the firm announced the release of KorQuAD 2.0, which expands the dataset to over 100,000 question-and-answer sets.

AI systems trained on KorQuAD 2.0 will be able to answer long questions and process information contained in tables and lists, the firm said.

LG CNS Vice President and Chief Technical Officer Hyun Shin-kyun said the language data accumulated by the firm is being released for free to better aid the AI industry here.

By Cho Hyee-su