Using ChatGPT to Learn Chinese
I got back from a family trip to Lijiang (Yunnan) last week. It was quite interesting being there again, since my only previous trip to Lijiang happened a full 20 years ago! (Yeah, yeah… I’m old!)
I remember even then that some tourists were lamenting that Lijiang was “too commercialized.” Obviously it’s more commercialized now, but I still found it nice. One big difference is that while I used to see Naxi women in the “old town” of Lijiang wearing their traditional clothing and selling food, clothing, and trinkets, I no longer do. Instead, it’s lots of tourists (mostly young women) decked out in rented Naxi (or even Tibetan) garb in order to take photos.
But anyway, I’m back in Shanghai now, and one of the reasons I’m happy to be back is that I can continue to experiment with ChatGPT every day. It’s super addictive and fun, and it also has great potential for Chinese learning. So I thought I’d share a few things I’ve discovered. I’m planning to go into more specifics in an upcoming series on the AllSet Learning blog.
What Can ChatGPT Do in Chinese?
Here’s a simple rundown of some of the many things you can do with ChatGPT, starting with the obvious and getting more specific:
- Translate between English and Chinese (or whatever other languages ChatGPT can handle)
- Add pinyin to Chinese (it often does this by default for Chinese output when the prompt is in English)
- Provide both simplified and traditional characters, or convert between the two
- Create frequency lists of Chinese characters (limited)
- Create lists of Chinese words
- Provide Chinese characters with pinyin for an article in English that mentions Chinese words
- Generate short paragraphs in Chinese on any topic
- Generate short paragraphs of text in both English and Chinese (parallel texts)
- Respond in Chinese to chat prompts in Chinese
Sounds pretty cool, right? OK, but hold on… it doesn’t do any of these tasks perfectly. It does most of them well, though (especially general translation).
Here’s an example of me asking for the Chinese words (with pinyin) for English output about Lijiang it had just given me:
I had it generate a frequency list of Chinese characters using a custom format I wanted (with the slashes and parentheses), and also including Japanese. It did take a bit of back and forth to get the formatting right, but here’s the result:
BUT, you may be surprised to learn that ChatGPT doesn’t have access to Chinese word frequency data, so word frequency lists will be trickier.
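If you want a character frequency list you can actually verify, one option is to count the characters yourself in a text you trust. Here is a minimal Python sketch of that idea. The function name and the sample sentence are just illustrations, and it only looks at the main CJK character block, so treat it as a starting point rather than a complete tool:

```python
from collections import Counter


def char_frequencies(text):
    """Return (character, count) pairs, most frequent first."""
    # Keep only characters in the main CJK Unified Ideographs block,
    # which drops punctuation, spaces, Latin letters, and digits.
    hanzi = [ch for ch in text if "\u4e00" <= ch <= "\u9fff"]
    return Counter(hanzi).most_common()


# Illustrative sample sentence (an assumption, not from ChatGPT output).
sample = "我喜欢学习中文，我也喜欢吃中国菜。"
for ch, count in char_frequencies(sample)[:4]:
    print(ch, count)
```

Note that this counts characters, not words. Counting words would first require word segmentation, which is a harder problem.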
Let’s get into more specifics about some of the other problems.
Problems with Using ChatGPT for Chinese
OK, first of all, the obvious: it sometimes just plain gets stuff wrong. You can correct it, and it will readily admit and correct its mistake, but you might not realize the mistake is there at all.
Here’s a simple pinyin mistake I called it out on:
Another annoying thing is that, by default, pinyin is not grouped by words (it’s not word segmented). You can get ChatGPT to segment by words, but it’s harder than you might expect. It took me quite a few tries, and in the end, I had to manually segment the pinyin of the first sentence for it and provide that as a model. After that, ChatGPT was able to do the word segmentation (more or less) correctly.
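The “provide a model” trick can be made reusable by building the prompt from a template. Below is a rough Python sketch that only assembles the prompt text. The function name, the wording of the instructions, and the sample sentences are all assumptions for illustration, and there’s no guarantee ChatGPT will follow the model every time:

```python
def build_segmentation_prompt(model_hanzi, model_pinyin, text):
    """Build a prompt that shows one hand-segmented example as a model."""
    return (
        "Add pinyin to the Chinese text below. Group the pinyin syllables "
        "by word, with a space between words, following this model.\n\n"
        f"Model sentence: {model_hanzi}\n"
        f"Model pinyin: {model_pinyin}\n\n"
        f"Text: {text}"
    )


# The model pinyin is segmented by hand; the sentences are made-up examples.
prompt = build_segmentation_prompt(
    "我喜欢学习中文。",
    "Wǒ xǐhuan xuéxí Zhōngwén.",
    "我今天去图书馆。",
)
print(prompt)
```

You would then paste the printed prompt into the chat and check the result against a dictionary, since the segmentation may still come back wrong.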
It can also easily make factual errors about stuff related to Chinese. It looks more “correct” when it’s accompanied by pinyin, but it still might be just plain wrong!
Sometimes it uses the wrong punctuation for Chinese (maybe because I was mixing the languages in my prompts). You can ask it to fix that, but it’s still annoying.
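If you’d rather not ask it again, the punctuation is also easy to patch up on your own. Here is a minimal Python sketch that swaps a few common ASCII marks for their full-width Chinese equivalents. The mapping is a small assumed subset, and it’s deliberately naive: it should only be run on text that is entirely Chinese, because it would also change the punctuation in any English or pinyin mixed in:

```python
# Map a few ASCII punctuation marks to their full-width Chinese forms.
# This is an assumed, incomplete subset chosen for illustration.
PUNCT_MAP = str.maketrans({
    ",": "，",
    ".": "。",
    "?": "？",
    "!": "！",
    ":": "：",
    ";": "；",
})


def fix_punctuation(text):
    """Replace ASCII punctuation with full-width Chinese punctuation."""
    return text.translate(PUNCT_MAP)


print(fix_punctuation("你好,世界!"))
```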
How about simplifying? You might have high hopes for this, considering that ChatGPT is quite good at simplifying English explanations. It can do it in Chinese, too, but often not well. It does not seem to have a sense of what is “difficult” for learners. I imagine this is largely the result of an absence of such material in the AI’s training data, but the result is very real. ChatGPT is not good at writing very simple texts for learners.
Here’s an example of me trying to get it to simplify text as far as possible:
This went on for a bit longer, and the end result was:
“历史遗迹” 是古代建筑或地方。
I’m sure ChatGPT will get better at a lot of this stuff, but likely not before ChatGPT becomes a paid service. (There’s plenty of room in the market for specialized AI products, though!)
What to Use ChatGPT for to Learn Chinese
So what should you use ChatGPT for as a learner?
- Learn about Chinese culture and get the vocabulary with it. I love being able to ask for the Chinese (with pinyin) when discussing any topic in English. This is something that annoys me when reading in-depth articles on topics related to China: they often don’t provide the Chinese at all (or maybe just pinyin with no tone marks).
- Chat in Chinese. Hey, it’s a chatbot, after all! It’s infinitely patient, can provide pinyin, can say things in different ways, etc. It might not always use simple Chinese, but there’s still loads of potential there.
- Generate short texts. If you’re intermediate (or close to it), you could try generating short texts on topics you’re interested in. You could try jokes or even short stories. I’ve found that in general, the fewer constraints you give ChatGPT on its output, the more natural that output will be. So while you may not be able to restrict a text to only the Chinese that a first grader would know, you can still ask for texts on simple topics.
Keep experimenting! This stuff is only the tip of the iceberg. This blog post is more of a “quick and dirty assessment” than a comprehensive review. If you’ve discovered some cool uses of ChatGPT related to learning Chinese, I would love to hear about it.
The one thing to keep in mind with ChatGPT is that its fundamental strength is text generation. In no way is it a search engine, nor does it have any notion of truth. All it does is literally make things up, based on the massive corpus of content it has ingested – massive and yet with inherent biases. It does indeed make factual errors, about Chinese or otherwise; all it does is give you a plausible answer.
Yes, good points.
The 家 / Family mistake was weird, though, because it had gotten the name correct earlier. So it’s not just a matter of sources or methods, but also consistency.
I’ve previously tried to have it explain four-character idioms to me. Obviously it’s dangerous to trust it too much, but what it generates often at least has its own internal logic that can help you remember the phrase (even though it might not be 100% accurate etymologically).
Me: Help me learn Chinese. Take the idiom 空前絕後, explain the meaning of the whole idiom, then separate it into individual characters. Explain the meaning of every single character. Finally explain the logic by which these characters combine to make up the idiom’s total meaning.
ChatGPT: The idiom “空前絕後” (Kōng qián jué hòu) translates to “unprecedented” or “without equal before or after.”
The characters:
“空” (kōng) means “empty” or “unfilled.”
“前” (qián) means “before.”
“絕” (jué) means “cut off” or “extinct.”
“後” (hòu) means “after.”
The combination of these characters creates an idiomatic expression meaning that something is without equal or comparison, either in the past or future. This phrase is often used to describe events or situations that are truly unique and unparalleled.
Me: Go into detail how this combination of characters works. I don’t see the connection between the individual meanings and the totality.
ChatGPT: The combination of the characters in “空前絕後” works by describing something that is both without equal in the past and without equal in the future. The character “空” (kōng) represents the idea of emptiness or lack of comparison, while “前” (qián) and “後” (hòu) represent the past and future respectively. The character “絕” (jué) is used to emphasize the complete absence of comparison, meaning that there is no match either before or after the event or situation in question.
So, the combination of these characters creates a phrase that is used to describe something that is truly unique and unparalleled, both in the past and the future. The idiomatic expression is meant to convey the idea that there is no event or situation that can compare to what is being described, in either the past or the future.
Very cool example! Thanks for sharing. I’ll have to try more of these. (I would expect it to occasionally be incorrect in its explanation.)
I got better luck by using English to request the simplification, e.g.
=============
ME:
I’m going to give you a Chinese word, and then you will define it in Chinese using only vocabulary and grammar that a beginner student would know.
For example:
Word: 家
家是我们住的地方。
CHATGPT:
Understood! Please provide the Chinese word that you would like me to define.
ME:
历史遗迹
CHATGPT:
历史遗迹是过去很多年前建造的建筑或地方,现在仍然存在并被保存作为我们了解历史的重要资料。
ME:
A beginner wouldn’t know some of those words. Please only use words that a beginner would know.
CHATGPT:
历史遗迹是很久以前建的房子或地方,现在还在,让我们知道历史。
=============
So even the first definition it provided was somewhat beginner-friendly (or at least intermediate-friendly). And the second, more simplified version is definitely beginner-friendly (with the exception of 建 as a single-character word), although less natural-sounding.
Good example @Robert! Are you using v4.0? I was playing with it too: after some back and forth as in John’s formatting example…
——
…
Me: I see it’s difficult for you to control the alignment – but no problem, it’s still impressive! Can you regenerate the last response with the English translation also added below the pinyin line?
——
ChatGPT: Sure! Here are the first sentences of each of the Four Great Chinese Classic Novels with improved formatting and the English translation added below the pinyin line:
1. Journey to the West (西游记):
开始时,天地浑沌,盘古开辟世界。
Kāishǐ shí, tiāndì húndùn, Pángǔ kāipì shìjiè.
“In the beginning, there was only chaos and darkness, and the great god Pangu created the world.”
2. Romance of the Three Kingdoms (三国演义):
天下大势,分久必合,合久必分。
Tiānxià dàshì, fēn jiǔ bì hé, hé jiǔ bì fēn.
“The empire, long divided, must unite; long united, must divide.”
3. Water Margin (水浒传):
宋朝末年,天下大乱,民不聊生。
Sòngcháo mònián, tiānxià dàluàn, mín bù liáoshēng.
“It was the end of the Northern Song dynasty, and chaos and corruption were rife throughout the empire.”
4. Dream of the Red Chamber (红楼梦):
此案是一处位于京城东北的府第,名叫石头记。
Cǐ àn shì yī chù wèiyú jīngchéng dōngběi de fǔdì, míng jiào Shítou Jì.
“The Stone, or Shi, Mansion stood in the north-eastern part of the capital.”
——
I didn’t fact check but it looks plausible 🙂
There was this article/small independent study written on this for those interested.
https://imlearningmandarin.com/2024/05/19/does-chatgpt-speak-authentic-chinese-the-results-of-my-study-may-surprise-you/