As the world’s leading AI companies release new models at breakneck speed, Kakao is sticking to a narrower path, looking to refine AI that works better in Korean rather than trying to keep up in the global race.

Kakao on Friday released new benchmark results for Kanana-o, its in-house multimodal AI that handles text, speech and images, alongside a search-focused model called Kanana-v-Embedding.

The disclosure comes just after OpenAI launched GPT-5.2, one month after GPT-5.1, following what its chief executive, Sam Altman, reportedly described as a “code red” push to counter Google’s Gemini 3, released last month.

Kanana-o is designed primarily for voice interaction in Korean. In simple English tasks such as recognizing spoken words or generating speech, Kakao’s comparison charts show little difference between major models, with GPT-4o, Gemini 2.5 Pro and Kanana-o all performing at similarly high levels.

But a gap opens up when tasks require understanding spoken instructions or combining speech with images, which demands stronger reasoning. On English speech instruction benchmarks, Kanana-o averaged 77.54, compared with 87.12 for GPT-4o and 90.41 for Gemini 2.5 Pro.

GPT-4o was introduced by OpenAI in May 2024, while Google released Gemini 2.5 Pro in March 2025.

In Korean, the results are more favorable. Although Kakao did not release exact figures for every Korean benchmark, its charts consistently place Kanana-o ahead of mid-tier models such as Qwen 2.5 O and MiniCPM o 2.6 across Korean speech recognition, speech synthesis, emotional speech detection and following spoken instructions.

GPT-4o and Gemini remain ahead overall, but the performance gap narrows in Korean, especially in tasks involving emotional tone, such as recognizing whether a speaker sounds happy, angry or sad.

Kakao says these gains come from training on large volumes of Korean audio, including long-form conversational recordings, and from preference-based fine-tuning. The company also said Kanana-o was optimized to respond more naturally to everyday cultural references.

Kakao also released results for Kanana-v-Embedding, a model used for search and recommendation. Embedding models convert text and images into numerical vectors so computers can compare different items. In Korean sentence retrieval tests, Kanana-v-Embedding scored 77.09, compared with 36.73 for Jina Embeddings v4, a widely used multilingual search model.
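
For readers unfamiliar with how embedding-based search compares items, the general idea can be sketched in a few lines of Python. This is only an illustration of the common technique (cosine similarity between vectors), not Kakao's implementation: the three-number vectors, the document names and the helper functions `cosine_similarity` and `rank` are all made up for the example, and real embedding models produce vectors with hundreds or thousands of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank(query_vec, docs):
    """Sort documents from most to least similar to the query vector."""
    scored = [(name, cosine_similarity(query_vec, vec)) for name, vec in docs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy embeddings; a real model would generate these from text or images.
query = [0.9, 0.1, 0.0]
documents = {
    "doc_a": [0.8, 0.2, 0.1],  # points in nearly the same direction as the query
    "doc_b": [0.0, 0.1, 0.9],  # points in a very different direction
}

ranking = rank(query, documents)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

Items whose vectors point in similar directions score close to 1 and rise to the top of the ranking, which is roughly what retrieval benchmarks measure: how often the right item ends up near the top.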

Kakao said the embedding model is already deployed internally for advertising content review, and framed both systems as infrastructure for its own services rather than global products.