Tajik language
E122576
The Tajik language is a variety of Persian spoken primarily in Tajikistan and written in the Cyrillic script.
All labels observed (5)
| Label | Occurrences |
|---|---|
| Tajik | 14 |
| Tajik language (canonical) | 7 |
| Tajik Persian | 3 |
| Tajik (variety of Persian) | 1 |
| Tajik phonology | 1 |
How this entity was disambiguated
This entity first appeared as the object of triple T1023226; resolving that mention is where its identity was fixed. The disambiguator weighed the candidate entities listed below and picked the one marked “chosen” (or “None of above”, which mints a new entity). This is how homonymy is resolved: the same surface form can point to different entities.
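The decision described above can be sketched as a small function. This is only an illustration under stated assumptions, not the pipeline's actual code: `Candidate`, `resolve_mention`, `choose`, and `mint` are hypothetical names, and the chooser (an LLM call on this page) is replaced by a plain callable.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence, Union

# Hypothetical sentinels mirroring options F and G on this page.
NONE_OF_ABOVE = "none"
UNSURE = "unsure"


@dataclass(frozen=True)
class Candidate:
    label: str
    description: str


# A verdict is either an index into the candidate list or one of the sentinels.
Verdict = Union[int, str]


def resolve_mention(
    mention: str,
    candidates: Sequence[Candidate],
    choose: Callable[[str, Sequence[Candidate]], Verdict],
    mint: Callable[[str], str],
) -> Optional[str]:
    """Return the label of the matched entity, a newly minted one, or None if unsure."""
    verdict = choose(mention, candidates)
    if verdict == NONE_OF_ABOVE:
        # No candidate denotes the same thing, so a new entity is created.
        return mint(mention)
    if verdict == UNSURE:
        # Ambiguous case: leave the mention unresolved.
        return None
    return candidates[verdict].label


candidates = [
    Candidate("Judeo-Tajik", "Jewish ethnolect of the Tajik (Persian) language"),
    Candidate("Tajiks", "Iranian ethnic group of Central Asia"),
]
resolved = resolve_mention(
    "Tajik language",
    candidates,
    choose=lambda mention, cands: NONE_OF_ABOVE,  # stand-in for the model's answer
    mint=lambda mention: "new:" + mention,        # stand-in for entity creation
)
```

With the stand-in chooser answering “none”, the mention is minted as a new entity, which matches the outcome recorded for both NED1 and NED2 on this page.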
NED1
Entity disambiguation (via context triple)
gpt-5-mini-2025-08-07
Target entity: Tajik language
Context triple: [Iranian languages, hasMajorLanguage, Tajik language]
- A. Judeo-Tajik: Judeo-Tajik is a Jewish ethnolect of the Tajik (Persian) language historically spoken by Central Asian Bukharan Jews, written in Hebrew script and enriched with Hebrew and Aramaic loanwords.
- B. Turkmen language: The Turkmen language is a Turkic language spoken primarily in Turkmenistan and surrounding regions, closely related to Turkish and other Oghuz languages.
- C. Uzbek: Uzbek refers to a Turkic ethnic group primarily associated with Uzbekistan and surrounding regions of Central Asia, known for their distinct language, culture, and historical heritage.
- D. Tajiks: Tajiks are an Iranian ethnic group of Central Asia, primarily inhabiting Tajikistan and parts of Afghanistan and Uzbekistan, known for their Persian cultural and linguistic heritage.
- E. Oghuz Uzbek: Oghuz Uzbek refers to a historical Turkic group associated with the Oghuz branch that played a key role in the ethnogenesis of the Uzbek people.
- F. None of above. (chosen)
- G. Unsure - the case is ambiguous/there is not enough information to decide.
NED2
Entity disambiguation (via description)
gpt-5-mini-2025-08-07
Target entity: Tajik language
Target entity description: The Tajik language is a variety of Persian spoken primarily in Tajikistan and written in the Cyrillic script.
- A. Judeo-Tajik: Judeo-Tajik is a Jewish ethnolect of the Tajik (Persian) language historically spoken by Central Asian Bukharan Jews, written in Hebrew script and enriched with Hebrew and Aramaic loanwords.
- B. Turkmen language: The Turkmen language is a Turkic language spoken primarily in Turkmenistan and surrounding regions, closely related to Turkish and other Oghuz languages.
- C. Uzbek: Uzbek refers to a Turkic ethnic group primarily associated with Uzbekistan and surrounding regions of Central Asia, known for their distinct language, culture, and historical heritage.
- D. Tajiks: Tajiks are an Iranian ethnic group of Central Asia, primarily inhabiting Tajikistan and parts of Afghanistan and Uzbekistan, known for their Persian cultural and linguistic heritage.
- E. Oghuz Uzbek: Oghuz Uzbek refers to a historical Turkic group associated with the Oghuz branch that played a key role in the ethnogenesis of the Uzbek people.
- F. None of above. (chosen)
Statements (50)
How these facts were elicited
The pipeline generated the facts above by prompting gpt-5.1 with this entity's name + description and the instruction below.
Instruction
You are a knowledge base construction expert. Given a subject entity and a description of it, return factual statements that you know for the subject as a JSON list of dictionaries(triples), where keys must be "subject", "predicate" and "object". The number of facts may be very high, between 25 to 50 or more, for very popular subjects. For less popular subjects, the number of facts can be very low, like 5 or 10.
# Requirements
- If you don't know the subject at all, return an empty list.
- If the subject is not a named entity, return an empty list.
- Include at least one triple where predicate is "instanceOf".
- Do not get too wordy.
- Separate several objects into multiple triples with one object.
Input
Subject: Tajik language
Description of subject: The Tajik language is a variety of Persian spoken primarily in Tajikistan and written in the Cyrillic script.
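The output format requested by the instruction (a JSON list of dictionaries with the keys "subject", "predicate" and "object", an empty list being allowed, and at least one "instanceOf" triple otherwise) could be checked along the following lines. This is a minimal sketch: `parse_triples` is a hypothetical helper, and the sample triple is a made-up illustration, not one of the statements recorded for this entity.

```python
import json

REQUIRED_KEYS = {"subject", "predicate", "object"}


def parse_triples(raw: str) -> list:
    """Parse a model response and check it against the instruction's format rules."""
    data = json.loads(raw)
    if not isinstance(data, list):
        raise ValueError("expected a JSON list")
    for item in data:
        # Each triple must be a dictionary with exactly the three required keys.
        if not isinstance(item, dict) or set(item) != REQUIRED_KEYS:
            raise ValueError(f"malformed triple: {item!r}")
    # An empty list is valid (unknown or non-named subject); otherwise the
    # instruction asks for at least one "instanceOf" triple.
    if data and not any(t["predicate"] == "instanceOf" for t in data):
        raise ValueError("non-empty answer lacks an instanceOf triple")
    return data


# Illustrative input only.
sample = '[{"subject": "Tajik language", "predicate": "instanceOf", "object": "language"}]'
triples = parse_triples(sample)
```

Rejecting malformed output early keeps partially structured answers from entering the statement list; whether the pipeline itself validates responses this way is not stated on this page.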
Referenced by (26)
Full triples — surface form annotated when it differs from this entity's canonical label.
this entity surface form: Tajik Persian
this entity surface form: Tajik
this entity surface form: Tajik
this entity surface form: Tajik
this entity surface form: Tajik
this entity surface form: Tajik
this entity surface form: Tajik phonology
this entity surface form: Tajik
subject surface form: Middle Persian
this entity surface form: Tajik Persian
this entity surface form: Tajik Persian
this entity surface form: Tajik (variety of Persian)
this entity surface form: Tajik
this entity surface form: Tajik
this entity surface form: Tajik
this entity surface form: Tajik
this entity surface form: Tajik
this entity surface form: Tajik
this entity surface form: Tajik