Similarity Statistics

Balancing the common Slavic language among the national languages is critical to achieving language neutrality. Manual word creation is subject to bias: each author tends to prefer words from their own mother tongue. An algorithmic approach is not subject to this bias.

The similarity of each fully included national dictionary to the Slovanto Dictionary falls between 60% and 70%. As the table below shows, however, some languages are closer and some more distant. This is no surprise, since some languages were more exposed to foreign influences because of their geographical location or history.

Language   Word     Total Word Length          Total Word Distance*       Language Similarity
Code       Count    (of National Dictionary)   (to Slovanto Dictionary)   (to Slovanto)

BLR        6133     45652                      17651                      61.33%
BLG        6093     45347                      16332                      63.98%
CRO        6017     43552                      16198                      62.80%
CZE        6279     43768                      16232                      62.91%
POL        6940     52705                      19281                      63.41%
RUS        6447     49072                      16533                      66.30%
SLK        6235     44661                      14699                      67.08%
SLN        6006     42785                      16403                      61.66%
SRB        6015     43258                      15482                      64.21%
UKR        6296     47095                      15193                      67.73%

Languages Partially Included

CSB        72       300                        147                        50.83%
HSB        179      933                        354                        62.00%
MAC        662      4206                       1581                       62.39%
OCS        142      708                        237                        66.45%
RUE        456      2596                       636                        75.48%
SZL        237      1333                       663                        50.26%

*Total Levenshtein distance between the national words and the computed common words.
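
The similarity column appears to follow from the two totals: the listed percentages agree with 1 minus distance divided by length to within about 0.2 percentage points. The sketch below assumes that formula, which is inferred from the table and not stated by the source. The function names `levenshtein` and `similarity` are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))          # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]                          # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def similarity(total_distance: int, total_length: int) -> float:
    """Assumed formula: 1 - (total word distance / total word length)."""
    return 1.0 - total_distance / total_length


# Totals for SLK taken from the table above.
print(f"SLK similarity: {similarity(14699, 44661):.2%}")

# A single word pair: one substituted letter gives a distance of 1.
print(levenshtein("voda", "woda"))
```

Summing `levenshtein` over every word pair of a national dictionary would give the Total Word Distance column, and summing the national word lengths would give the Total Word Length column, under the same assumption.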

The most interesting case is the Rusyn language (RUE), which, even though only partially included, yields over 70% similarity to the "averaged language". This is not easy to explain, but Rusyn emerged at the meeting point of Slavic cultures, so it was influenced evenly by Lechitic, Ruthenian, Moravian and Balkan cultures. If one had to choose the "most centered Slavic language", Rusyn could be the best pick.

Note that it is not possible to achieve 100% similarity between the Common Dictionary and EVERY Slavic language. For that, all the languages would have to be identical. Common words are computed algorithmically so that the Levenshtein distance from the Common Word to ALL Slavic languages together is as low as possible. Changing a letter in the Common Word makes the word more similar to one language but, at the same time, more distant from another. The algorithm tries to be as balanced as possible, but equal similarity for every language cannot be achieved; in practice it oscillates between 60% and 70%.
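
The trade-off described above can be illustrated with a small sketch. It is not Slovanto's actual algorithm, which the source does not specify: it simply picks, among the national forms themselves, the one with the lowest summed Levenshtein distance. The word list is an illustrative, romanized sample for "milk" and is not taken from the Slovanto dictionary. The names `NATIONAL`, `total_distance` and `pick_common` are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


# Illustrative sample only (romanized), not data from the Slovanto dictionary.
NATIONAL = {
    "POL": "mleko",
    "SLN": "mleko",
    "RUS": "moloko",
    "UKR": "moloko",
    "CRO": "mlijeko",
    "SLK": "mlieko",
}


def total_distance(candidate: str, words: dict) -> int:
    """Sum of distances from one candidate to every national word."""
    return sum(levenshtein(candidate, w) for w in words.values())


def pick_common(words: dict) -> str:
    """Simplification: only the national forms are considered as candidates."""
    candidates = sorted(set(words.values()))
    return min(candidates, key=lambda c: total_distance(c, words))


best = pick_common(NATIONAL)
for candidate in (best, "moloko"):
    per_language = {code: levenshtein(candidate, w) for code, w in NATIONAL.items()}
    print(candidate, total_distance(candidate, NATIONAL), per_language)
```

Comparing the two printed rows shows the effect: the second candidate matches the Russian and Ukrainian forms exactly, but its distance to the other forms grows, so its total is higher.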

