Similarity Statistics

Balancing the common Slavic language among the national languages is critical to achieving language neutrality. Manual word creation is prone to bias, because each author tends to prefer words from their own mother tongue. An algorithmic approach is not subject to this bias.

For every fully included language, the similarity between the national dictionary and the Slovanto Dictionary is guaranteed to lie between 60% and 70%. However, as the table below shows, some languages are closer and some more distant. This is no surprise, as some languages were more susceptible to foreign influences because of their geographical location or history.

Language Code	Word Count	Total Word Length (of National Dictionary)	Total Word Distance* (to Slovanto Dictionary)	Language Similarity (to Slovanto)
BLR	4953	36136	13764	61.910%
BLG	4881	35029	12527	64.236%
CRO	4846	34525	13020	62.286%
CZE	5044	34524	12853	62.770%
POL	5597	41816	14979	64.177%
RUS	5193	38613	12925	66.526%
SLK	5012	35374	11626	67.132%
SLN	4817	33640	12942	61.527%
SRB	4828	34146	12466	63.492%
UKR	5066	36993	11721	68.314%
Languages Partially Included
CSB	32	128	50	60.937%
HSB	121	604	192	68.211%
MAC	339	1986	683	65.609%
OCS	112	507	152	69.921%
RUE	393	2221	497	77.600%

*Total Levenshtein distance between the national words and the computed common words.
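The similarity column appears to follow from the two totals as one minus the ratio of total distance to total word length. The sketch below is an assumption inferred from the published numbers, not a documented formula; the function name `similarity_percent` is hypothetical. For BLR and CSB it gives about 61.91% and 60.94%, in line with the table, while a few rows (for example OCS) come out slightly differently, so the original computation may differ in detail.

```python
def similarity_percent(total_length: int, total_distance: int) -> float:
    """Percentage of characters left unchanged, assuming
    similarity = 1 - distance / length (an inferred formula)."""
    if total_length <= 0:
        raise ValueError("total_length must be positive")
    return 100.0 * (1.0 - total_distance / total_length)


# Figures copied from the table above: (total word length, total word distance).
rows = {
    "BLR": (36136, 13764),
    "SLK": (35374, 11626),
    "CSB": (128, 50),
}

for code, (length, distance) in rows.items():
    print(f"{code}: {similarity_percent(length, distance):.3f}%")
```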

The most interesting phenomenon is the Rusyn language (RUE), which, even with only partial inclusion, yields over 70% similarity to the "averaged language". This is not easy to explain; however, Rusyn emerged at the center point of Slavic cultures, so it was influenced evenly by Lechitic, Ruthenian, Moravian and Balkan cultures. If one wanted to choose the "most centered Slavic language", Rusyn could be the best pick.

Note that it is not possible to achieve 100% similarity between the Common Dictionary and EVERY Slavic language. To achieve 100% similarity, all languages would have to be identical. Common words are computed algorithmically so that the Levenshtein distance from the Common Word to ALL Slavic languages, taken together, is as low as possible. Changing letters in the Common Word makes it more similar to one language but, at the same time, more distant from another. The algorithm tries to be as balanced as possible, but the similarity cannot be equal for every language; it oscillates between 60% and 70%.
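The trade-off can be illustrated with a small sketch. It is not the Slovanto algorithm: it assumes the goal is the lowest summed Levenshtein distance and, as a simplification, only picks the best candidate from among the national words themselves. The word list (romanized national forms of "milk") and the names `levenshtein`, `total_distance` and `national_words` are chosen here purely for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute or match
            ))
        previous = current
    return previous[-1]


def total_distance(candidate: str, words: list[str]) -> int:
    """Sum of distances from one candidate to every national word."""
    return sum(levenshtein(candidate, word) for word in words)


# Illustrative sample only, not taken from the Slovanto Dictionary.
national_words = ["moloko", "mleko", "mléko", "mlieko", "mlijeko"]

# Simplification: try only the national words themselves as candidates.
best = min(national_words, key=lambda w: total_distance(w, national_words))

for word in national_words:
    print(word, total_distance(word, national_words))
print("lowest total distance:", best)
```

With this sample the lowest total belongs to "mleko". Moving to "mlieko" brings the candidate one step closer to "mlijeko" but one step further from "mleko", "mléko" and "moloko", which is the kind of shift described above.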

