A language family is a group of languages related by descent from a common ancestral language, called the proto-language of that family. Most languages in the world belong to a specific family. Languages that have no demonstrable relationship with any other, and so cannot be classified within a family, are generally known as language isolates. Creole languages are the only ones that are neither isolates nor members of a language family; they form a type of their own.
The genetic classifications given in the language entries name 141 different language families (that is, top-level genetic groups). Six of these, each of which has at least 5% of the world's languages, stand out as the major language families of the world. Together they account for nearly two-thirds of all languages and five-sixths of the world's population.
The Afroasiatic languages constitute a language family with about 375 living languages and more than 500 million speakers spread throughout North Africa, the Horn of Africa, and Southwest Asia, as well as parts of the Sahel, and East Africa. The most widely spoken Afroasiatic language is Arabic, with 230 million speakers. In addition to languages now spoken, Afroasiatic includes several ancient languages, such as Ancient Egyptian, Biblical Hebrew, and Akkadian.
The Afroasiatic language family is usually considered to include the following branches: Berber, Chadic, Cushitic, Egyptian, Omotic, Semitic.
The Berber languages are a group of very closely related and similar languages and dialects spoken in Morocco, Algeria, Tunisia, Libya, and the Egyptian area of Siwa, as well as by large Berber communities in parts of Niger and Mali.
The Chadic languages constitute a language family spoken across northern Nigeria, Niger, Chad, Central African Republic and Cameroon. The most widely spoken Chadic language is Hausa, the lingua franca of much of West Africa.
The Cushitic languages are spoken in the Horn of Africa, Tanzania, Kenya, Sudan and Egypt. They are named after the Biblical figure Cush by analogy with Shem's being the eponym of Semitic. The most populous Cushitic language is Oromo with about 35 million speakers, followed by Somali with about 15 million speakers, and Sidamo in Ethiopia with about 2 million speakers.
Egyptian is the indigenous language of Egypt. Written records of the Egyptian language have been dated from about 3400 BC, making it one of the oldest recorded languages known. Egyptian was spoken until the late 7th century AD in the form of Coptic. The national language of modern-day Egypt is Egyptian Arabic, which gradually replaced Coptic as the language of daily life in the centuries after the Muslim conquest of Egypt.
The Omotic languages are a branch of the Afroasiatic family spoken in southwestern Ethiopia. The Ge'ez alphabet is used to write some Omotic languages, the Roman alphabet for some others.
The Semitic languages are a group of related languages whose living representatives are spoken by more than 467 million people across much of the Middle East, North Africa and the Horn of Africa. The most widely spoken Semitic language by far today is Arabic with around 230 million native speakers. It is followed by Amharic (27 million), Tigrinya (5.8 million), and Hebrew (about 5 million).
Semitic languages are attested in written form from a very early date, with texts in Eblaite and Akkadian appearing from around the middle of the third millennium BC, written in a script adapted from Sumerian cuneiform. The other scripts used to write Semitic languages are alphabetic. Among them are the Ugaritic, Phoenician, Aramaic, Hebrew, Syriac, Arabic, South Arabian, and Ge'ez alphabets. Maltese is the only Semitic language written in the Latin alphabet and the only official Semitic language of the European Union.
Altaic is a disputed language family that is generally held by its proponents to include the Turkic, Mongolic, Tungusic, and Japonic language families and the Korean language. These languages are spoken in a wide arc stretching from northeast Asia through Central Asia to Anatolia and eastern Europe.
The group is named after the Altai Mountains, a mountain range in Central Asia. These language families share numerous characteristics. The debate is over the origin of their similarities. One camp, often called the "Altaicists", views these similarities as arising from common descent from a Proto-Altaic language spoken several thousand years ago. The other camp, often called the "anti-Altaicists", views these similarities as arising from areal interaction between the language groups concerned. Some linguists believe the case for either interpretation is about equally strong.
Another view accepts Altaic as a valid family but includes in it only Turkic, Mongolic, and Tungusic. This view was widespread prior to the 1960s, but has almost no supporters among specialists today. The expanded grouping, including Korean and Japanese, came to be known as "Macro-Altaic", leading to the designation by back-formation of the smaller grouping as "Micro-Altaic". Most proponents of Altaic continue to support the inclusion of Korean and Japanese. Micro-Altaic would include about 66 living languages, to which Macro-Altaic would add Korean, Japanese, and the Ryukyuan languages for a total of about 74. Micro-Altaic would have a total of about 348 million speakers today, Macro-Altaic about 558 million.
The Austro-Asiatic languages are a large language family of Southeast Asia, with members also scattered throughout India and Bangladesh. The name comes from the Latin word for "south" and the Greek name of Asia, hence "South Asia."
Among these languages, only Khmer, Vietnamese, and Mon have a long-established recorded history, and only Vietnamese and Khmer have official status (in Vietnam and Cambodia, respectively). The rest of the languages are spoken by minority groups.
Ethnologue identifies 168 Austro-Asiatic languages. These are traditionally divided into two families, Mon-Khmer and Munda, but two recent classifications have abandoned Mon-Khmer as a valid node, although this is tentative and not generally accepted.
Austro-Asiatic languages have a disjunct distribution across India, Bangladesh and Southeast Asia, separated by regions where other languages are spoken. It is widely believed that the Austro-Asiatic languages are the autochthonous languages of Southeast Asia and the eastern Indian subcontinent, and that the other languages of the region, including the Indo-European, Kradai, Dravidian and Sino-Tibetan languages, are the result of later migrations of people.
The Austro-Asiatic languages are well known for having a "sesqui-syllabic" pattern, with basic nouns and verbs consisting of a reduced minor syllable plus a full syllable. Many of them also have infixes.
The Austronesian languages are a language family widely dispersed throughout the islands of Southeast Asia and the Pacific, with a few members spoken in continental Asia. It is on a par with Bantu, Indo-European, Afroasiatic and Uralic as one of the best-established ancient language families.
The name Austronesian comes from Latin auster "south wind" plus Greek nêsos "island". The family is aptly named, as the vast majority of Austronesian languages are spoken on islands: only a few languages, such as Malay and the Chamic languages, are indigenous to mainland Asia.
Many Austronesian languages have very few speakers, but the major Austronesian languages are spoken by tens of millions of people. There is debate among linguists as to which language family comprises the largest number of languages. Austronesian is clearly one candidate, with 1,268 (according to Ethnologue), or roughly one-fifth of the known languages of the world. The geographical span of the homelands of its languages is also among the widest, ranging from Madagascar to Easter Island. Hawaiian, Rapanui, and Malagasy (spoken on Madagascar) are the geographic outliers of the Austronesian family.
Austronesian has several primary branches, all but one of which are found exclusively on Taiwan. The Formosan languages of Taiwan are grouped into as many as nine first-order subgroups of Austronesian. All Austronesian languages spoken outside Taiwan, together with the Yami language spoken off Taiwan's coast, belong to the Malayo-Polynesian branch, sometimes called Extra-Formosan.
It is difficult to make generalizations about the languages that make up a family as diverse as Austronesian. Very broadly, the Austronesian languages can be divided into three groups: Philippine-type languages, Indonesian-type languages and post-Indonesian-type languages. The first group is characterized by relatively strong verb-initial word order and Philippine-type voice alternations.
The Austronesian languages tend to use reduplication (repetition of all or part of a word, such as wiki-wiki), and, like many East and Southeast Asian languages, have highly restrictive phonotactics, with small numbers of phonemes and predominantly consonant-vowel syllables.
Basque is the ancestral language of the Basque people, who inhabit the Basque Country, a region spanning an area in northeastern Spain and southwestern France. It is spoken by 25.7% of Basques in all territories. Of these speakers, 614,000 live in the Spanish part of the Basque Country and the remaining 51,800 live in the French part.
In academic discussions of the distribution of Basque in Spain and France, it is customary to refer to three ancient provinces in France and four Spanish provinces. Native speakers are concentrated in a contiguous area including parts of the Spanish Autonomous Communities of the Basque Autonomous Community (Spanish: País Vasco; Euskara: Euskadi) and Navarre and in the western half of the French Département of Pyrénées-Atlantiques.
These provinces and many areas of Navarre are heavily populated by ethnic Basques, but the Euskara language had, at least until the 1990s, all but disappeared from most of Álava, western parts of Biscay and central and southern areas of Navarre. In southwestern France, the ancient Basque-populated provinces were Labourd, Lower Navarre, and Soule.
A standardized form of the Basque language, called Batua, was developed by the Basque Language Academy in the late 1960s. Batua is mainly used in the Spanish Basque Country. In France, the Basque language school Seaska and the association for bilingual schooling Ikasbi meet a wide range of Basque language educational needs up to the Sixth Form.
Apart from this standardized version, there are six main Basque dialects, corresponding to the above-mentioned historic provinces populated by Basques: Bizkaian, Gipuzkoan, and Upper Navarrese in Spain, and Lower Navarrese, Lapurdian, and Zuberoan in France. However, the dialect boundaries are not congruent with political boundaries.
Though geographically surrounded by Indo-European Romance languages, Basque is classified as a language isolate. It is the last remaining pre-Indo-European language in Western Europe. Consequently, its prehistory may not be reconstructible by means of the comparative method except by applying it to differences between dialects within the language. Little is known of its origins, but it is likely that an early form of the Basque language was present in Western Europe before the arrival of the Indo-European languages in the area.
The Dravidian family of languages includes approximately 85 languages, spoken by around 200 million people. They are mainly spoken in southern India and parts of eastern and central India as well as in northeastern Sri Lanka, Pakistan, Nepal, Bangladesh, Afghanistan, Iran, and overseas in other countries such as Malaysia and Singapore. Among them Telugu, Tamil, Kannada and Malayalam are the members with the most speakers. There are also small groups of Dravidian-speaking scheduled tribes, who live beyond the mainstream communities.
It is often speculated that Dravidian languages are native to India. Epigraphically the Dravidian languages have been attested since the 6th century BC. The origins of the Dravidian languages, as well as their subsequent development and the period of their differentiation are unclear, partially due to the lack of comparative linguistic research into the Dravidian languages.
The Dravidian languages have remained an isolated family to the present day and have defied all of the attempts to show a connection with the Indo-European tongues, Mitanni, Basque, Sumerian, or Korean. The theory that the Dravidian languages display similarities with the Uralic language group, suggesting a prolonged period of contact in the past, is popular amongst Dravidian linguists and has been supported by a number of scholars. This theory has, however, been rejected by some specialists in Uralic languages, and has in recent times also been criticised by other Dravidian linguists like Bhadriraju Krishnamurti.
Although in modern times speakers of the various Dravidian languages have mainly occupied the southern portion of India, nothing definite is known about the ancient domain of the Dravidian parent speech. It is, however, a well-established and well-supported hypothesis that Dravidian speakers must have been widespread throughout India, including the northwest region, before the arrival of Indo-European speakers. Proto-Dravidian is thought to have differentiated into Proto-North Dravidian, Proto-Central Dravidian, Proto-South-Central Dravidian and Proto-South Dravidian around 500 BC, although some linguists have argued that the degree of differentiation between the sub-families points to an earlier split. Dravidian is a close-knit family.
The languages are much more closely related than, say, the Indo-European languages. There is a fair degree of agreement on how they are related to each other. The Dravidian languages have not been shown to be related to any other language family. Comparisons have been made not just with the other language families of the Subcontinent (Indo-European, Austro-Asiatic, Tibeto-Burman, and Nihali), but with all typologically similar language families of the Old World.
Dravidian is one of the primary linguistic groups in the Nostratic proposal, which would link most languages in North Africa, Europe and Western Asia into a family with its origins in the Fertile Crescent sometime between the last Ice Age and the emergence of Proto-Indo-European around 4000 to 6000 BCE. However, the general consensus is that such deep connections are not, or not yet, demonstrable.
The Indo-European languages are a family (or phylum) of several hundred related languages and dialects, including most major languages of Europe, the Iranian plateau, and Southern Asia, and historically also predominant in Anatolia and Central Asia.
With written attestations appearing since the Bronze Age, in the form of the Anatolian languages and Mycenaean Greek, the Indo-European family is significant to the field of historical linguistics as possessing the longest recorded history after the Afroasiatic family. The languages of the Indo-European group are spoken by approximately three billion native speakers, the largest number for any recognised language family. Of the top 20 contemporary languages in terms of native speakers according to SIL Ethnologue, 12 are Indo-European: Spanish, English, Hindi, Portuguese, Bengali, Russian, German, Marathi, French, Italian, Punjabi and Urdu, accounting for over 1.6 billion native speakers.
Membership of these languages in the Indo-European language family is determined by genetic relationships, meaning that all members are presumed to be descendants of a common ancestor, Proto-Indo-European. Membership in the various branches, groups and subgroups of Indo-European is also genetic, but here the defining factors are shared innovations among various languages, suggesting a common ancestor that split off from other Indo-European groups. For example, what makes the Germanic languages a branch of Indo-European is that much of their structure and phonology can be stated in rules that apply to all of them. Many of their common features are presumed to be innovations that took place in Proto-Germanic, the source of all the Germanic languages.
Some linguists propose that Indo-European languages form part of a hypothetical Nostratic language superfamily, and attempt to relate Indo-European to other language families, such as South Caucasian languages, Uralic languages, Dravidian languages, and Afroasiatic languages. This theory, like the similar Eurasiatic theory of Joseph Greenberg, and the Proto-Pontic postulation of John Colarusso, remains highly controversial, however, and is not accepted by most linguists in the field. Objections to such groupings are not based on any theoretical claim about the likely historical existence or non-existence of such super-families; it is entirely reasonable to suppose that they might have existed.
The serious difficulty lies in identifying the details of actual relationships between language families; it is very hard to find concrete evidence that transcends chance resemblance. Since the signal-to-noise ratio in historical linguistics declines steadily over time, at great enough time-depths it becomes open to reasonable doubt that it can even be possible to distinguish between signal and noise.
Japonic or Japanese-Ryukyuan is a language family composed of Japanese and Ryukyuan. Their common ancestral language is known as Proto-Japonic or Proto-Japanese-Ryukyuan. The essential feature of this classification is that the first split in the family resulted in the separation of all dialects of Japanese from all dialects of Ryukyuan.
Japanese is a language spoken by over 130 million people in Japan and in Japanese emigrant communities. There are a number of proposed relationships with other languages, but none of them has gained unanimous acceptance. Japanese is an agglutinative language. It is distinguished by a complex system of honorifics reflecting the nature of Japanese society, with verb forms and particular vocabulary to indicate the relative status of the speaker, the listener, and persons mentioned in conversation.
The language has a relatively small sound inventory, and a lexically significant pitch-accent system. Japanese is a mora-timed language. The Japanese language is written with a combination of three scripts: Chinese characters called kanji and two syllabic scripts made up of modified Chinese characters, hiragana and katakana. The Latin alphabet is also often used in modern Japanese, especially for company names and logos, advertising, and when entering Japanese text into a computer. Arabic numerals are generally used for numbers, but traditional Sino-Japanese numerals are also commonplace.
Dozens of dialects are spoken in Japan. The profusion is due to many factors, including the length of time the archipelago has been inhabited, its mountainous island terrain, and Japan's long history of both external and internal isolation. Dialects typically differ in terms of pitch accent, inflectional morphology, vocabulary, and particle usage. Some even differ in vowel and consonant inventories, although this is uncommon.
The main distinction in Japanese accents is between Tokyo-type and Kyoto-Osaka-type, though Kyushu-type dialects form a third, smaller group. Within each type are several subdivisions. Kyoto-Osaka-type dialects are in the central region, with borders roughly formed by Toyama, Kyoto, Hyogo, and Mie Prefectures; most Shikoku dialects are also of that type. The final category consists of dialects descended from the Eastern dialect of Old Japanese; these are spoken on Hachijo-jima and a few other islands.
Dialects from peripheral regions, such as Tohoku or Kagoshima, may be unintelligible to speakers from other parts of the country. The several dialects of Kagoshima in southern Kyushu are famous for being unintelligible not only to speakers of standard Japanese but to speakers of nearby dialects elsewhere in Kyushu as well. This is probably due in part to the Kagoshima dialects' peculiarities of pronunciation, which include the existence of closed syllables. The Kansai dialect group is spoken and known by many Japanese, and the Osaka dialect in particular is associated with comedy. The dialects of Tohoku and North Kanto are stereotypically associated with farmers. The Ryukyuan languages, spoken in Okinawa and in the Amami Islands, which are politically part of Kagoshima, are distinct enough to be considered a separate branch of the Japonic family.
However, many ordinary Japanese people tend to regard the Ryukyuan languages as dialects of Japanese. Not only is each language unintelligible to Japanese speakers, but most are unintelligible to those who speak other Ryukyuan languages.
Recently, Standard Japanese has become prevalent nationwide (including the Ryukyu Islands) owing to education, mass media, increased mobility within Japan, and economic integration.
Korean is the official language of Korea, both South and North. It is also one of the two official languages in the Yanbian Korean Autonomous Prefecture in China. There are about 78 million Korean speakers worldwide.
In the 15th century a national writing system, now called Hangul, was commissioned by Sejong the Great. Prior to the development of Hangul, Koreans used Hanja (Chinese characters) to write for over a millennium. The genealogical classification of the Korean language is debated by a small number of linguists. Most classify it as a language isolate, while a few consider it to be in the Altaic language family. Others believe it to be distantly related to Japanese-Ryukyuan.
Some linguists support the hypothesis that Korean can be classified as an Altaic language or as a relative of proto-Altaic. Korean is similar to the Altaic languages in that they both lack certain grammatical elements, including articles, fusional morphology and relative pronouns.
However, linguists today agree that typological resemblances cannot be used to prove the genetic relatedness of languages, as these features are typologically connected and easily borrowed. Points of typological divergence, such as the gender agreement exhibited by Middle Mongolian, can be used to argue that a genetic relationship is unlikely.
The hypothesis that Korean might be related to Japanese has had somewhat more supporters, owing to considerable overlap in vocabulary and similar grammatical features. Linguists have found about 25% potential cognates in the Japanese-Korean 100-word Swadesh list, which, if valid, would place these two languages closer together than other possible members of the Altaic family.
Other linguists argue, however, that the similarities are not due to any genetic relationship, but rather to a sprachbund effect and heavy borrowing, especially from Korean into Western Old Japanese.
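The cognate figure quoted for the Japanese-Korean Swadesh list is a simple proportion: the number of list items judged to be potential cognates divided by the length of the list. A minimal Python sketch of that calculation follows. The function name `cognate_share` and the input list are illustrative assumptions, not data from any published study; the list contains only placeholder judgements arranged to mirror the roughly 25% figure, and no actual word pairs.

```python
from typing import Sequence


def cognate_share(judgements: Sequence[bool]) -> float:
    """Return the fraction of word-list items judged to be potential cognates.

    Each entry is True if the pair of words for that meaning was judged a
    potential cognate, and False otherwise.
    """
    if not judgements:
        raise ValueError("the word list must not be empty")
    return sum(judgements) / len(judgements)


# Hypothetical placeholder judgements for a 100-item list: 25 items marked
# as potential cognates and 75 as not. These are not real linguistic data.
judgements = [True] * 25 + [False] * 75

share = cognate_share(judgements)
print(f"{share:.0%} of the list items are potential cognates")
```

Such a share only summarizes the judgements fed into it. Whether a given pair is a true cognate, a loanword, or a chance resemblance is the disputed step, and the calculation cannot settle it.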
Korean has several dialects. The standard language (pyojuneo or pyojunmal) of South Korea is based on the dialect of the area around Seoul, and the standard for North Korea is based on the dialect spoken around P'yongyang. All dialects of Korean are similar to each other, and are in fact all mutually intelligible.
One of the most notable differences between dialects is the use of stress: speakers of Seoul dialect use very little stress, and standard South Korean has a very flat intonation; on the other hand, speakers of the Gyeongsang dialect have a very pronounced intonation.
There is substantial evidence for a history of extensive dialect levelling, or even convergent evolution or intermixture of two or more originally distinct linguistic stocks, within the Korean language and its dialects. Many Korean dialects have basic vocabulary that is etymologically distinct from vocabulary of identical meaning in Standard Korean or other dialects.
This suggests that the Korean Peninsula may have at one time been much more linguistically diverse than it is at present. There is a very close connection between the dialects of Korean and the regions of Korea, since the boundaries of both are largely determined by mountains and seas.
The Sino-Tibetan languages form a language family composed of, at least, the Chinese and the Tibeto-Burman languages, including some 250 languages of East Asia, Southeast Asia and parts of South Asia. They are second only to the Indo-European languages in terms of the number of native speakers.
The Sino-Tibetan language family has sometimes been defined as also including the Tai and Hmong-Mien languages. In the past, Vietnamese and other Mon-Khmer languages were classified under the Sino-Tibetan tree; however, their similarities to Chinese are currently credited to language contact.
In the Western scholarly community, the other tonal language families of East Asia, Kradai and Hmong-Mien (Miao-Yao), are no longer classified under the Sino-Tibetan tree. However, in the Chinese scholarly community, Kradai and Hmong-Mien are still commonly included in the Sino-Tibetan family.
The Uralic languages constitute a language family of 37 languages spoken by approximately 25 million people. The Uralic languages with the most native speakers are Hungarian, Finnish, Estonian, Mari and Udmurt.
Countries that are home to a significant number of speakers of Uralic languages include Estonia, Finland, Hungary, Romania, Russia, Serbia, and Slovakia. The name "Uralic" refers to the suggested Urheimat (original homeland) of the Uralic family, which was often located in the vicinity of the Ural Mountains, as the modern languages are spoken on both sides of this mountain range.
In recent times, linguists have often placed the Urheimat either further to the west and south, in the vicinity of the Volga River and close to the Urheimat of the Indo-European languages, or to the east and southeast of the Urals.