Similarity Statistics

Balancing the common Slavic language across the national languages is critical to achieving language neutrality. Manual word creation is subject to bias: each author tends to prefer words from his or her mother tongue. An algorithmic approach is not subject to this bias.

The similarity of each fully included national dictionary to the Slovanto Dictionary falls between 60% and 70%. However, as the table below shows, some languages are closer and some are more distant. This is no surprise, since some languages were more susceptible to foreign influences because of their geographical location or history.

Language Code	Word Count	Total Word Length (of National Dictionary)	Total Word Distance* (to Slovanto Dictionary)	Language Similarity (to Slovanto)
BLR	6133	45652	17651	61.33%
BLG	6093	45347	16332	63.98%
CRO	6017	43552	16198	62.80%
CZE	6279	43768	16232	62.91%
POL	6940	52705	19281	63.41%
RUS	6447	49072	16533	66.30%
SLK	6235	44661	14699	67.08%
SLN	6006	42785	16403	61.66%
SRB	6015	43258	15482	64.21%
UKR	6296	47095	15193	67.73%
Languages Partially Included
CSB	72	300	147	50.83%
HSB	179	933	354	62.00%
MAC	662	4206	1581	62.39%
OCS	142	708	237	66.45%
RUE	456	2596	636	75.48%
SZL	237	1333	663	50.26%

*Total Levenshtein distance between the national words and the calculated common words
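The source does not state how the similarity column is derived. The figures in the table roughly match one minus the ratio of total word distance to total word length, to within a fraction of a percentage point. The sketch below works under that assumption; the function name `similarity` and the choice of rows are illustrative only, and the author's exact formula or rounding may differ.

```python
# Assumed formula (not stated in the source):
#   similarity = 1 - total_word_distance / total_word_length

def similarity(total_distance: int, total_length: int) -> float:
    """Return similarity as a fraction in [0, 1] under the assumed formula."""
    if total_length <= 0:
        raise ValueError("total_length must be positive")
    return 1.0 - total_distance / total_length


# (total word length, total word distance) copied from the table above.
rows = {
    "BLR": (45652, 17651),
    "SLK": (44661, 14699),
    "RUE": (2596, 636),
}

for code, (length, distance) in rows.items():
    # Print each language code with its similarity as a percentage.
    print(f"{code}: {similarity(distance, length):.2%}")
```

Small differences in the last digit against the table are expected, since the table may truncate rather than round, or may compute the value in a slightly different way.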

The most interesting phenomenon is the Rusyn language (RUE), which, even though it is only partially included, yields 75.48% similarity to the "averaged language". This is not easy to explain, but Rusyn emerged at the center point of the Slavic cultures, so it was evenly influenced by Lechitic, Ruthenian, Moravian, and Balkan cultures. If one wanted to choose the "most centered Slavic language", Rusyn could be the best pick.

Note that it is not possible to achieve 100% similarity between the Common Dictionary and EVERY Slavic language. For that, all the languages would have to be identical. Common words are computed algorithmically so that the Levenshtein distance from the Common Word to ALL Slavic languages taken together is as low as possible. Changing letters in the Common Word makes it more similar to one language but, at the same time, more distant from another. The algorithm tries to be as balanced as possible, but it cannot make the similarity equal for every language, so the similarity oscillates between 60% and 70%.
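The idea of a common word with the lowest overall Levenshtein distance can be illustrated with a small sketch. This is not the Slovanto algorithm itself, which the text does not describe in detail. As a simplifying assumption, the candidates are restricted to the national words themselves, and the one with the smallest summed distance to all the others is chosen. The function names and the sample words (ASCII transliterations of "milk") are illustrative only.

```python
from typing import Dict, List


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def total_distance(candidate: str, words: List[str]) -> int:
    """Sum of edit distances from one candidate to every national word."""
    return sum(levenshtein(candidate, word) for word in words)


def pick_common_word(national_words: Dict[str, str]) -> str:
    """Assumption: choose among the national words only (a medoid)."""
    words = list(national_words.values())
    # sorted(set(...)) makes tie-breaking deterministic.
    return min(sorted(set(words)), key=lambda c: total_distance(c, words))


# Illustrative input only, not taken from the Slovanto dictionary.
milk = {"POL": "mleko", "SLK": "mlieko", "RUS": "moloko",
        "CRO": "mlijeko", "SRB": "mleko"}

print(pick_common_word(milk))
```

For this sample the chosen word is "mleko", with a total distance of 5. Replacing it with "moloko" would bring the distance to the Russian word down to zero but raise the total to 11, which is the trade-off described above.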

