Similarity Statistics

Balancing the common Slavic language between the national languages is critical to achieving language neutrality. Manual word creation is subject to bias, because each author tends to prefer words from his or her mother tongue. An algorithmic approach avoids this kind of bias.

The similarity of each fully included national dictionary to the Slovanto Dictionary lies between 60% and 70%. However, as the statistics below show, some languages are closer and some are more distant. This is no surprise, as some languages were more susceptible to foreign influences because of their geographical location or history.

Language  Word    Total Word Length         Total Word Distance*      Language Similarity
Code      Count   (of National Dictionary)  (to Slovanto Dictionary)  (to Slovanto)
BLR       4953    36136                     13764                     61.910%
BLG       4881    35029                     12527                     64.236%
CRO       4846    34525                     13020                     62.286%
CZE       5044    34524                     12853                     62.770%
POL       5597    41816                     14979                     64.177%
RUS       5193    38613                     12925                     66.526%
SLK       5012    35374                     11626                     67.132%
SLN       4817    33640                     12942                     61.527%
SRB       4828    34146                     12466                     63.492%
UKR       5066    36993                     11721                     68.314%

Languages Partially Included
CSB       32      128                       50                        60.937%
HSB       121     604                       192                       68.211%
MAC       339     1986                      683                       65.609%
OCS       112     507                       152                       69.921%
RUE       393     2221                      497                       77.600%

*Total Levenshtein distance between the national words and the computed common words.
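The page does not spell out how the similarity column is derived from the other columns. As a rough sketch, the figures are close to 1 minus (total distance / total word length); for CSB, 1 - 50/128 is about 60.94%. A few rows differ slightly from this formula (OCS, for instance), so the formula should be read as an assumption rather than the exact method used. The function names below are hypothetical and chosen only for illustration.

```python
from typing import Iterable, Tuple


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def similarity(total_length: int, total_distance: int) -> float:
    """ASSUMED formula: share of letters that did not need editing."""
    return 1.0 - total_distance / total_length


def dictionary_similarity(pairs: Iterable[Tuple[str, str]]) -> float:
    """Similarity over (national_word, common_word) pairs.

    Total length is measured on the national words, as in the table header.
    """
    pairs = list(pairs)
    total_length = sum(len(national) for national, _ in pairs)
    total_distance = sum(levenshtein(national, common) for national, common in pairs)
    return similarity(total_length, total_distance)


if __name__ == "__main__":
    # Totals taken from the CSB row of the table above.
    print(f"CSB: {similarity(128, 50):.3%}")
```

Normalizing by the total length of the national words keeps the measure comparable between dictionaries of very different sizes, such as the 32-word CSB sample and the 5597-word POL dictionary.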

The most interesting case is the Rusyn language (RUE), which, even though only partially included, shows 77.6% similarity to the "averaged language". This is not easy to explain; however, Rusyn emerged at the meeting point of Slavic cultures, so it was influenced evenly by Lechitic, Ruthenian, Moravian and Balkan cultures. If one wanted to choose the "most central Slavic language", Rusyn could be the best pick.

Note that it is not possible to achieve 100% similarity between the Common Dictionary and EVERY Slavic language. For that, all the languages would have to be identical. Common words are computed algorithmically so that the Levenshtein distance from the Common Word to the corresponding words in ALL Slavic languages, taken together, is as low as possible. Changing letters in the Common Word makes it more similar to one language but, at the same time, more distant from another. The algorithm tries to be as balanced as possible, but equal similarity for every language cannot be achieved, so the values vary between 60% and 70%.
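The trade-off described above can be illustrated with a small sketch. It is not the Slovanto algorithm, which this page does not describe in detail; it only picks, from a fixed list of candidates, the spelling with the lowest summed Levenshtein distance to all national forms. The input words are simplified ASCII spellings of "milk" (Polish, Russian, Slovak, Croatian) used as toy data, not entries taken from the Slovanto Dictionary, and the function names are hypothetical.

```python
from typing import Dict, List


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,
                               current[j - 1] + 1,
                               previous[j - 1] + cost))
        previous = current
    return previous[-1]


def total_distances(candidates: List[str], national: List[str]) -> Dict[str, int]:
    """Summed distance from each candidate to every national word."""
    return {c: sum(levenshtein(c, w) for w in national) for c in candidates}


def pick_common_word(candidates: List[str], national: List[str]) -> str:
    """Candidate with the lowest summed distance (first one wins on a tie)."""
    totals = total_distances(candidates, national)
    return min(candidates, key=lambda c: totals[c])


if __name__ == "__main__":
    # Toy data: ASCII spellings, NOT taken from the Slovanto Dictionary.
    national_words = ["mleko", "moloko", "mlieko", "mlijeko"]
    # Here the candidates are simply the national forms themselves.
    for word, total in total_distances(national_words, national_words).items():
        print(f"{word:8s} total distance = {total}")
    print("picked:", pick_common_word(national_words, national_words))
```

In this toy run, choosing "moloko" would bring the distance to the Russian form down to zero while raising the summed distance to the other three forms, which is the balancing effect the paragraph above describes. A real implementation would presumably also consider spellings that are not identical to any national form.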

