Explanation of the Database for the 1,945 Basic Japanese Kanji 
(J-1945D)


TAMAOKA, Katsuo (Hiroshima University, Japan)
KIRSNER, Kim (University of Western Australia, Australia)
YANASE, Yushi (Ehime University, Japan)
MIYAOKA, Yayoi (Hiroshima University, Japan)
KAWAKAMI, Masahiro (Nagoya University, Japan)

Produced on May 1, 2000





Address for correspondence:
Katsuo Tamaoka, Institute for International Education, Hiroshima University, 1-2, 1-chome, 
Kagamiyama, Higashi-Hiroshima, Japan 739-8523
Tel: 0824-24-6288 (Office)
e-mail: ktamaoka@hiroshima-u.ac.jp


Japanese kanji provide a stimulus-rich environment for research focusing 
on the perceptual and cognitive processes required for reading, memory and 
language acquisition in general.
There are several potentially important differences between the Japanese 
writing system and other writing systems. The first is that there are three 
different Japanese scripts: kanji, hiragana and katakana. Kanji developed from 
pictures used by the Chinese thousands of years ago to represent objects and 
events in the world around them. Some kanji have preserved their pictographic 
form and are still similar in appearance to the objects which they were intended 
to represent. Others were designed to represent more abstract ideas, and still 
others involve kanji combinations which were created to convey information 
about a related idea. A fourth type of kanji consists of elements that hint at 
pronunciation. Japanese also has two scripts, hiragana and katakana, that 
represent morae (units slightly smaller than syllables); both depict the same 
set of 46 basic sounds. Hiragana is used for verb endings, particles and 
to write words not usually written in kanji. Katakana is used to write words and 
names which are not of Japanese or Chinese origin. No further consideration 
will be given to hiragana and katakana in this paper or in the associated 
database.
The second important difference has to do with the number of different kanji 
characters. In the version of kanji now in common use there is a total of 1,945 
characters, and the pedagogic load associated with mastery of this set of 
characters is evident in the way acquisition is spread out over the first nine 
years of schooling. A third difference is the fact that kanji are constructed from 
a set of 214 constituents, or 'radicals'. Each of the 1,945 basic kanji 
contains one of these 214 radicals. There is a parallel in 
Indo-European writing systems where many words have evolved from the 
same stem or root of a remote language such as Latin. However, the radicals 
in kanji are 'pictographs' rather than letters.
A fourth difference involves the fact that the spoken forms associated with 
an individual kanji character are often shared by several other kanji, and are 
therefore homophones. The English parallel involves words such as BARE and 
BEAR which, unlike most kanji examples, are visually similar. A fifth 
difference involves the fact that many kanji have two readings, On and Kun, 
based on the words from Chinese and Japanese origins respectively. 
Homographs are of course present in the Indo-European languages as well, 
with the two interpretations of BANK being an obvious example. The database 
described and illustrated in this paper was created to facilitate general access 
to information about these and other distributional characteristics of kanji and, 
in turn, to facilitate research into the perceptual and cognitive processes that 
may or may not be unique to kanji.

Kanji

The kanji script used in the Japanese language consists of morphemic units, 
the smallest unit of meaning in a language. About 70 percent of the 51,962 
words listed in a Japanese dictionary are composed of two kanji (Yokosawa 
& Umeda, 1988).  In 1981, the Japanese government published a list of the 
1,945 most commonly-used basic kanji.  Called 'Jooyoo Kanji Hyoo' (常用漢字表), 
the list established the standard for kanji usage (Ministry of Education, 
Science and Culture, Government of Japan, 1987, 1998). According to a 
survey on frequency of kanji in print conducted by the National Language 
Research Institute (1976), 2,000 kanji encompassed 99.6 percent of the kanji 
used in three major Japanese newspapers (Asahi, Mainichi and Yomiuri) 
published during 1966. Although the 1,945 basic kanji and the 2,000 kanji 
mentioned above were not identical in each case, it is estimated that the 1,945 
basic kanji cover approximately 99 percent of kanji used in Japanese 
newspapers.
In 1989, the Ministry of Education, Science and Culture, Government of Japan 
released a revised version of the Japanese language curriculum (Nihon-go 
Gakushuu Shidoo Yooryoo; 日本語学習指導要領) which included a list of kanji 
to be mastered from Grades 1 to 6 (Gakunen-betsu Kanji Haitoo-hyoo; 
学年別漢字配当表).  Of the 1,006 kanji on the list, 80 are taught in Grade 1, 160 in 
Grade 2, 200 each in both Grades 3 and 4, 185 in Grade 5, and 181 in Grade 
6. All of these 1,006 kanji are taken from the 1,945 basic kanji. The remaining 
939 kanji are taught from Grades 7 to 9.  Because the Education Act in Japan 
stipulates that all Japanese citizens complete the ninth grade, every Japanese 
person must study all 1,945 basic kanji by Grade 9.  We can, therefore, 
expect that native Japanese speakers educated in Japan will know at least 
these 1,945 basic kanji.
The 1,945 basic kanji are ideal for experimental use in studies involving the 
Japanese language.  The present database provides 27 cells which cover 
various aspects of the fundamental characteristics of the 1,945 basic Japanese 
kanji. This information is stored in a Microsoft Excel 2000 file.  Using this 
database, researchers will be able to conduct planned experiments based on 
the known characteristics of selected kanji.
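As an illustration of selecting kanji by known characteristics, the following 
is a minimal sketch that filters rows by grade, stroke count and frequency. 
The field names, the function select_stimuli and every value in it are 
hypothetical; they are not taken from the Excel file, whose actual layout 
should be checked before adapting this.

```python
# Hypothetical rows: the field names and every value below are invented
# for illustration and are not taken from the database.
rows = [
    {"id": 1, "grade": 1, "strokes": 4, "freq": 3.2},
    {"id": 2, "grade": 3, "strokes": 11, "freq": 0.8},
    {"id": 3, "grade": 7, "strokes": 15, "freq": 0.1},
    {"id": 4, "grade": 2, "strokes": 5, "freq": 1.9},
]


def select_stimuli(rows, max_grade, max_strokes, min_freq):
    """Keep rows that satisfy all three selection criteria."""
    return [
        r for r in rows
        if r["grade"] <= max_grade
        and r["strokes"] <= max_strokes
        and r["freq"] >= min_freq
    ]


# Early-taught, visually simple, relatively frequent items.
selected = select_stimuli(rows, max_grade=3, max_strokes=8, min_freq=1.0)
selected_ids = [r["id"] for r in selected]
```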

The Kanji Table

The database includes 27 variables describing the 1,945 basic kanji. 
Regarding the database cells from left to right, the variables are (1) ID 
classification according to the Japanese 50-Sound System (50-Onzu, 五十音図), 
(2) kanji orthography, (3) classification based on six categories provided 
by Shirakawa (1994), (4) school grade during which the kanji is taught, (5) 
number of strokes, (6) kanji frequency provided by the National Language 
Research Institute (1976), (7) kanji frequency published by Yokoyama, 
Sasahara, Nozaki and Long (1998), (8) kanji frequency on CD-ROM provided 
by Yokoyama, et al. (1998), (9) kanji neighborhood size of the left-hand side 
position provided by Kawakami (1997), (10) kanji neighborhood size of the 
right-hand side position provided by Kawakami (1997), (11) a total of kanji 
neighborhoods adding left-hand and the right-hand sides together, (12) 
accumulative kanji neighborhood of the left-hand side position, (13) 
accumulative kanji neighborhood of the right-hand side position, (14) a total of 
accumulative kanji neighborhood adding the left-hand and right-hand sides 
together, (15) name of radicals used for the kanji, (16) radical frequency in the 
1,945 basic kanji, (17) number of constituents which construct the kanji, (18) 
number of kanji homophones, (19) number of On-readings, (20) On-reading 
pronunciations, (21) English translation of On-readings, (22) number of 
Kun-readings, (23) Kun-reading pronunciations, (24) English translation of 
Kun-readings, (25) On-reading frequency calculated from the index provided 
by the National Language Research Institute (1976), (26) On- and Kun-reading 
accumulative frequency calculated from the index of the National Language 
Research Institute (1976), and (27) On-reading ratio (Cell 25 divided by Cell 
26).

Cell 1: Kanji identification number as it ranks in the Japanese 50-Sound 
System.
Cell 2: Kanji character. This cell presents the orthography of the character 
itself.
Cell 3: Kanji classification. The most common classification system for kanji is 
one developed by the Chinese and adopted by the Japanese which divides all 
kanji into six categories (Rikusho Bunrui, 六書分類). These six categories are 
(1) 'pictographic kanji' (Shookei, 象形) derived from the shapes of objects, 
(2) 'ideographic kanji' (Shiji, 指事) which express ideas and qualities, (3) 
'compound ideographic kanji' (Kaii, 会意) formed by combining more than 
one internal constituent to represent ideas and their associations, (4) 
'phonetic compound kanji' (Keisei, 形声) constructed by phonetic and 
semantic components, (5) 'loan kanji' (Kashaku, 仮借) whose original 
sounds were adopted but not their original meaning and (6) 'analogous 
kanji' (Tenchuu, 転注) which are new kanji patterned after old kanji to denote 
new meanings. To represent these categories, our Cell 3 used these six terms: 
'象形 Pictographic', '指事 Ideographic', '会意 Com. Ideographic', 
'形声 Phonetic', '仮借 Loan', and '転注 Analogous'. (Analogous 
kanji are not found in the 1,945 basic Japanese kanji.)  These six categories 
were based upon a system of categorization provided by Shirakawa (1994). It 
should be noted that there are five kanji (i.e., ?, ?, ?, ? and ?) that 
cannot be classified according to this system of categorization. These kanji are 
original Japanese characters (Kokuji, 国字). They are labeled as '国字 
Original'. The kanji ? is also an original Japanese character but it is also a 
phonetic compound kanji, so it is listed as '形声 Phonetic'.
Cell 4: Grade of instruction. This cell specifies the school grade in which a kanji 
is taught.  The figures for the 1,006 kanji follow the Japanese language 
curriculum as published by the Ministry of Education, Science and Culture, 
Government of Japan in 1989.  Since the remaining kanji are taught in 
Grades 7-9, these are all indicated with the number '7'.
Cell 5: Number of strokes. This cell specifies the number of strokes in each 
kanji.  The strokes required to write a kanji are taken from a Japanese kanji 
dictionary edited by Kamata (1991).
Cell 6: Frequency of occurrences for kanji in print (kanji frequency) as indexed 
in 1976.  Kanji frequency was calculated from all words printed in three major 
newspapers (Asahi, Yomiuri and Mainichi) during the year 1966 and was 
published by the National Language Research Institute in 1976. The study 
sampled 991,375 kanji and, for each kanji, calculated its frequency of 
appearance per 1,000 kanji.
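The per-1,000 figure can be illustrated with a short sketch. The sample size 
is the one stated above; the raw count and the function name per_thousand are 
hypothetical, and the rounding used in the published index is not known here.

```python
SAMPLE_SIZE = 991_375  # total number of kanji sampled, as stated in the text


def per_thousand(count, sample_size=SAMPLE_SIZE):
    """Convert a raw occurrence count into occurrences per 1,000 kanji."""
    return count / sample_size * 1000


# Hypothetical raw count for one kanji (not a value from the 1976 index).
example_count = 1983
example_rate = per_thousand(example_count)  # roughly 2 per 1,000 kanji
```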
Cell 7: Kanji frequency as indexed in 1998. Yokoyama, Sasahara, Nozaki and 
Long (1998) published frequency of occurrence data based on all the kanji in 
the Tokyo edition of the Asahi newspaper printed in 1993.  We used 
Pearson's correlation to measure the relationship between the frequencies of 
occurrence of the 1,945 basic kanji (n=1945) for 1966, as recorded in 
Cell 6, and for 1993, as recorded in Cell 7.  The correlation was 
0.969, a figure which indicates that the frequency of occurrence for printed 
kanji was stable over a period of 27 years, from 1966 to 1993.
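For readers who wish to repeat this kind of comparison between two frequency 
columns, the following is a minimal sketch of Pearson's correlation. The 
function name pearson_r and the two short lists are hypothetical stand-ins; 
they are not values from the database and will not reproduce the figure 
reported above.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 items")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squares around the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical frequencies for four kanji in two different years.
freq_earlier = [1.0, 2.0, 3.0, 4.0]
freq_later = [2.0, 4.0, 6.0, 8.0]
r = pearson_r(freq_earlier, freq_later)  # perfectly linear, so r is 1
```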
Cell 8: Frequency of occurrence for kanji on CD-ROM (KF on CD). Yokoyama 
et al. (1998) calculated and published frequency of occurrence data for all the 
kanji used in the CD-ROM version of the Asahi newspaper (CD-HIASK'93), which 
contains 110,000 newspaper articles published in 1993.  Thus, all the words 
in the Tokyo edition of the Asahi newspaper used in Cell 7 are included in the 
CD-ROM version. Pearson's correlation between the kanji frequencies of 
occurrence for 1966 (Cell 6) and for the 1993 CD-ROM version (Cell 8) was 
0.971.  Thus, kanji frequency indexes changed little over 27 years. The 
correlation between the two kanji frequencies for the 1993 Tokyo edition of 
the Asahi newspaper and its CD-ROM version was 0.996, which suggests that the 
smaller sample of newspaper text was enough to represent the kanji frequency 
index.
Cell 9: Kanji neighborhood size of the left-hand side of two-kanji compound 
words. An index of kanji neighborhood size is provided by Kawakami (1997). 
The term 'kanji neighborhood size' refers to the number of two-kanji compound 
words that a given kanji forms in combination with other kanji.  Two-kanji 
compound words are produced by the combination of kanji placed in the 
left-hand and right-hand side positions of the word. Cell 9 provides the kanji 
neighborhood size of the left-hand side.
Cell 10: Kanji neighborhood size of the right-hand side of two-kanji compound 
words.
Cell 11: A total of kanji neighborhood size for both the left-hand and right-hand 
sides of two-kanji compound words.
Cell 12: Accumulative kanji neighborhood of the left-hand side of two-kanji 
compound words. Kanji neighborhood size is simply a count of the number of 
two-kanji compound words, with no consideration of word frequency. The 
accumulative kanji neighborhood, by contrast, is calculated by adding up the 
frequencies of occurrence in print of all these words, as provided by the 
National Language Research Institute (1973).  Since two-kanji compound words 
can be produced by the combination of kanji placed in the left-hand or 
right-hand side of the word, Cell 12 provides the accumulative kanji 
neighborhood of the left-hand side.
Cell 13: Accumulative kanji neighborhood of the right-hand side of two-kanji 
compound words.
Cell 14: A total of accumulative kanji neighborhoods of both the left-hand and 
right-hand sides of two-kanji compound words.
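The relationship among Cells 9 to 14 can be sketched as follows. This is an 
illustration under stated assumptions, not the procedure used by Kawakami 
(1997): the word list, its frequencies and the function name 
neighborhood_table are hypothetical, and single letters stand in for kanji.

```python
from collections import defaultdict

# Hypothetical two-character words with invented print frequencies.
# Each letter stands in for one kanji.
word_freq = {"AB": 10, "AC": 5, "BA": 2, "CA": 7}


def neighborhood_table(word_freq):
    """For each character, count the words (size) and sum the word
    frequencies (accumulative) by left-hand and right-hand position."""
    table = defaultdict(
        lambda: {"size_left": 0, "size_right": 0, "acc_left": 0, "acc_right": 0}
    )
    for word, freq in word_freq.items():
        if len(word) != 2:
            continue  # only two-character compounds are considered
        left, right = word
        table[left]["size_left"] += 1
        table[left]["acc_left"] += freq
        table[right]["size_right"] += 1
        table[right]["acc_right"] += freq
    for entry in table.values():
        # Totals are the sums of the left-hand and right-hand values.
        entry["size_total"] = entry["size_left"] + entry["size_right"]
        entry["acc_total"] = entry["acc_left"] + entry["acc_right"]
    return dict(table)


table = neighborhood_table(word_freq)
```

In this toy list, the character A is on the left of two words (frequencies 10 
and 5) and on the right of two words (frequencies 2 and 7), so its sizes are 
2 and 2 and its accumulative values are 15 and 9.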
Cell 15: Name of radical.  This cell indicates the name of the radical used in 
kanji.  Japanese kanji dictionaries traditionally classify kanji by way of 214 
radicals.  The name used for each radical has been taken from the Japanese 
kanji dictionary edited by Kamata (1987).
Cell 16: Radical frequency calculated using the 1,945 basic kanji. The radical 
frequency indicates how many of the 1,945 basic kanji share the same radical.  
A large body of kanji (1,057 characters or 54.34% of the 1,945 basic kanji) is 
constructed with only 24 radicals out of a possible 214.
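A radical frequency of this kind can be sketched as a simple count. The 
mapping below is hypothetical (placeholder identifiers and radical names, not 
entries from the database); only the figures 1,057 and 1,945 come from the 
text above.

```python
from collections import Counter

# Hypothetical mapping from kanji identifiers to radical names.
kanji_radical = {
    "k1": "water",
    "k2": "water",
    "k3": "tree",
    "k4": "water",
    "k5": "person",
}

# How many kanji share each radical.
radical_count = Counter(kanji_radical.values())

# Radical frequency per kanji: the count for that kanji's own radical.
radical_frequency = {k: radical_count[r] for k, r in kanji_radical.items()}

# The share quoted in the text: 1,057 of the 1,945 basic kanji.
share_percent = round(1057 / 1945 * 100, 2)
```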
Cell 17: Number of constituents. Single kanji are often composed of two or 
more constituents: a radical and secondary elements. We conducted a 
survey of visually complex kanji to identify how native Japanese speakers 
divide up a single kanji character. Japanese speakers were asked to divide up 
kanji into smaller constituents by circling them. Our survey found that subjects 
were likely to divide up kanji based on radical units and then other elements.  
Our database followed this survey in defining the number of components in 
each kanji.
Cell 18: Number of kanji homophones. A single kanji pronunciation is often 
shared by multiple kanji.  For example, the sound /yoku/ can be written with 
five different characters of the 1,945 basic kanji.  Each of these five kanji is 
indicated by the number 5. Both On- and Kun-readings were counted when 
identifying kanji homophones. A majority of kanji homophones were found in 
On-readings.
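Counting how many kanji share a pronunciation can be sketched as below. The 
identifiers and reading lists are hypothetical, and the sketch only counts 
kanji per reading; how the database assigns a single number to a kanji with 
several readings is not specified in this description.

```python
from collections import defaultdict

# Hypothetical kanji identifiers with invented lists of readings.
readings = {
    "k1": ["yoku"],
    "k2": ["yoku"],
    "k3": ["yoku", "hoshi"],
    "k4": ["hoshi"],
    "k5": ["iku"],
}


def homophone_counts(readings):
    """Map each reading to the number of distinct kanji that carry it."""
    by_reading = defaultdict(set)
    for kanji, kanji_readings in readings.items():
        for reading in kanji_readings:
            by_reading[reading].add(kanji)
    return {reading: len(kanji_set) for reading, kanji_set in by_reading.items()}


counts = homophone_counts(readings)
```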
Cell 19: Number of On-readings. Japanese kanji often have multiple 
On-readings. On-readings were adopted from the original Chinese 
pronunciation during the Chinese dynastic periods when contact occurred 
between Japan and China.  The count for On-readings was taken from the 
On-readings listed in the kanji dictionary edited by Kamata (1991). There were 
779 kanji (40.05%) out of the 1,945 basic kanji which had only one type of 
reading, either On or Kun.
Cell 20: Pronunciation of On-readings. The pronunciation of On-readings is 
written in the Hepburn system of romanization provided by Nelson (1992). 
However, when pronunciation of an On-reading ends with a geminate sound, 
the symbol /Q/ is used. This phonemic symbol is common in presenting 
Japanese special sounds. In order to precisely describe Japanese sounds, 
long vowels are transcribed by repeating the same vowel twice such as /oo/ 
and /uu/. Some rare pronunciations of On-readings were excluded from the 
database.
Cell 21: English translation of On-readings.
Cell 22: Number of Kun-readings. Kun-readings originated in Japan. The count 
for the Kun-readings was taken from the Kun-readings listed in the kanji 
dictionary edited by Kamata (1987). Rare Kun-readings were not included.
Cell 23: Pronunciation of Kun-readings. Kun-readings were written in the 
Hepburn system of romanization provided by Nelson (1992).  Again, long 
vowels are transcribed by repeating the same vowel.  There was no ending of 
the geminate sound /Q/ in Kun-readings. Rare pronunciations of Kun-readings 
were excluded from the database.
Cell 24: English translation of Kun-readings.
Cell 25: Accumulative frequency for kanji with On-readings. Using the kanji 
frequency index of 1976 provided by the National Language Research Institute, 
we calculated the frequency of occurrence for the On-reading(s) of each kanji.  
The resulting figures actually give frequency of occurrence as it appeared in 
the three newspapers published in 1966.  For example, the kanji 育, which 
has an On-reading of /iku/, appeared in 19 different words used 918 times in 
the sample taken from the newspapers.  Accumulative frequency is therefore 
listed as 918.  The names of persons and places were not included in this 
frequency.  When the overall accumulative kanji frequency was less than 9, 
the index of 1976 did not provide detailed frequencies of either On- or 
Kun-readings. Accumulative frequency for these kanji is indicated by a hyphen 
'-'.
Cell 26: Accumulative frequency for combined On- and Kun-readings.  Using 
the kanji frequency index of 1976, the total accumulative frequency was 
calculated for each kanji using both its On- and Kun-readings. In the process of 
calculation, Cell 26 excluded kanji used for proper nouns. This exclusion 
slightly alters the figures in this cell from the kanji frequency figures of Cell 6.  
Cell 27: On-reading ratio.  A database consisting of 996 kanji published by 
Kaiho and Nomura (1983) provided On-reading ratios. Kaiho and Nomura 
calculated this ratio by working with kanji which have both On- and 
Kun-readings.  For each of these kanji, they totaled the number of subjects 
who had employed only the On-reading and then they simply divided this figure 
by the number of subjects who had applied both readings.  The shortcoming 
of this approach is that the question of whether all subjects were familiar or not 
with both readings was disregarded.  For our database, we calculated the 
On-reading ratio by dividing the accumulative On-reading frequency in Cell 25 
by the accumulative frequency for the combined On- and Kun-readings in Cell 
26.  The On-reading ratios in the present database are objective figures using 
kanji frequency of occurrence. When a kanji has only one type of reading 
(either On or Kun), the figure is indicated by a hyphen '-'.
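The calculation of Cell 27 from Cells 25 and 26 can be sketched as follows. 
The function name, its parameters and the example total of 1,000 are 
hypothetical; treating a hyphen in Cell 25 or Cell 26 as yielding a hyphen in 
Cell 27 is an assumption made for this sketch.

```python
MISSING = "-"  # marker the database uses where no figure is given


def on_reading_ratio(on_freq, total_freq, has_on, has_kun):
    """Return Cell 25 divided by Cell 26, or '-' when the ratio is undefined."""
    # A kanji with only one type of reading gets a hyphen.
    if not (has_on and has_kun):
        return MISSING
    # Assumption: missing or zero frequencies also yield a hyphen.
    if on_freq == MISSING or total_freq == MISSING or total_freq == 0:
        return MISSING
    return on_freq / total_freq


# 918 is the On-reading frequency from the example in Cell 25;
# the combined total of 1000 is invented for illustration.
ratio = on_reading_ratio(918, 1000, has_on=True, has_kun=True)
only_on = on_reading_ratio(500, 500, has_on=True, has_kun=False)
no_detail = on_reading_ratio(MISSING, MISSING, has_on=True, has_kun=True)
```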

References

Kaiho, H., & Nomura, Y. (1983). Kanji joohoo shori no shinrigaku [Psychology 
of kanji information processing]. Tokyo: Kyouiku Shuppan. (In Japanese)

Kamata, T. (1991). Kuwashii shoogakkoo kanji jiten [The detailed elementary 
school kanji dictionary]. Tokyo: Buneido. (In Japanese)

Kawakami, M. (1997). JIS 1-shu kanji 2965 ji wo mochiite sakusei sareru kanji 
niji jukugosuu hyoo [Tables of two-kanji compound words constructed with 
2,965 JIS 1st kanji characters]. School of Education Bulletin, Nagoya 
University, 44, 243-299. (In Japanese)

Kokuritsu Kokugo Kenkyuujo [National Language Research Institute]. (1973). 
Shinbun no goi chosa (IV) [A study of Japanese word usage in 
newspapers]. Tokyo: National Language Research Institute. (In Japanese)

Kokuritsu Kokugo Kenkyuujo [National Language Research Institute]. (1976). 
Gendai Shinbun no Kanji [Japanese kanji characters in modern 
newspapers]. Tokyo: National Language Research Institute. (In Japanese)

Ministry of Education, Science and Culture, Government of Japan. (1978). 
Chuugakkoo shidoosho - Kokugohen [The Japanese language - the 
course of study at junior high-school]. Tokyo: Tokyo Shoseki. (In 
Japanese)

Ministry of Education, Science and Culture, Government of Japan. (1987). 
Shoogakkoo shidoosho - Kokugohen [The Japanese language - the 
course of study at elementary school]. Osaka: Osaka Shoseki. (In 
Japanese)

Ministry of Education, Science and Culture, Government of Japan. (1998). 
Monbushoo kokuji - Shoogakkoo gakushuu shidoo yooryoo [The 
announcement of the elementary school course of study by the Ministry of 
Education, Science and Culture, Government of Japan]. Tokyo: Gyoosei. 
(In Japanese)

Nelson, A. N. (1992). The modern reader's Japanese-English character 
dictionary (2nd revised edition, 35th printing). Tokyo: Charles E. Tuttle 
Company.

Shirakawa, S. (1994). Jitoo [Kanji etymology]. Tokyo: Heibonsha. (In 
Japanese)

Yokosawa, K., & Umeda, M. (1988). Processes in human Kanji-word 
recognition. Proceedings of the 1988 IEEE international conference on 
systems, man, and cybernetics (pp. 377-380). August 8-12, 1988, Beijing 
and Shenyang, China.

Yokoyama, S., Sasahara, H., Nozaki, H., & Long, E. (1998). Shinbun denshi 
media no kanji: Asahi shinbun CD-ROM niyoru kanji hindo hyoo 
[Japanese kanji in the newspaper media: Kanji frequency index from the 
Asahi newspaper on CD-ROM]. Tokyo: Sanseidoo. (In Japanese)

