
Open Education


Automatic vowels selection and ranking in Russian enciphered texts

https://doi.org/10.21686/1818-4243-2018-1-59-69

Abstract

This work grew out of teaching cryptanalysis to students. The course includes the study of the statistics of encrypted Russian texts. The goal of the training is to learn how to extract the redundant information in a text and to decipher a cryptogram without knowing the key. Simple substitution and similar encryption systems, which are covered in most cryptography courses, are among the most convenient for this kind of training. This paper presents a method for automatically separating vowels and consonants in Russian texts, which exposes part of the redundancy of the ciphertext. In addition, the method greatly simplifies deciphering some other symmetric ciphers that can be reduced to simple substitution.

The aim of this work is to develop and implement a method for the automatic selection of vowels in Russian texts enciphered by simple substitution and similar encryption systems.

According to Shannon's theory, unambiguous decipherment of a text requires that the redundancy of the text exceed the entropy of the key. Separating vowels from consonants increases the usable redundancy of the text by up to one bit per symbol, which makes it possible to break shorter ciphertexts. Moreover, separating vowels from consonants greatly simplifies the cryptanalysis of some ciphers. For instance, cryptanalysis of the best-known encryption method, simple substitution, requires choosing one of N! possible keys, where N is the number of letters in the alphabet. For Russian this is 33!, or roughly 2^123 options. Once vowels and consonants are separated, only 10!·23!, or roughly 2^96, options remain. The number of combinations thus drops by a factor of about one hundred million, which makes cryptanalysis much easier. The program that implements this method first builds a matrix of the bigram probabilities of the text.
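The key-space figures quoted above can be checked with exact integer arithmetic. This short Python sketch is not the author's program; it simply takes the 10-versus-23 split of the 33-letter alphabet used in the text (10 vowel letters and 23 others, i.e. 21 consonants plus ъ and ь) and compares the base-2 logarithms of the two key counts. Their ratio is the binomial coefficient C(33, 10), because fixing which letters are vowels leaves only the permutations inside each class.

```python
import math

# Split used in the text: 10 vowel letters and 23 others
# (21 consonants plus the signs ъ and ь).
N_VOWELS, N_OTHERS = 10, 23

full_keys = math.factorial(N_VOWELS + N_OTHERS)                   # 33!
split_keys = math.factorial(N_VOWELS) * math.factorial(N_OTHERS)  # 10! * 23!
reduction = full_keys // split_keys                               # exact, equals C(33, 10)

print(f"log2(33!)       = {math.log2(full_keys):.1f}")
print(f"log2(10! * 23!) = {math.log2(split_keys):.1f}")
print(f"reduction       = {reduction:,}")
```

The two logarithms come out at roughly 123 and 96 bits, and the reduction factor is C(33, 10) = 92,561,040, i.e. on the order of 10^8.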

For the bigram probability matrix, the program computes the Markov criterion, defined as the difference between the conditional probabilities of vowel-consonant and vowel-vowel digrams. For an alphabet of N characters, the program uses exhaustive search to find the combination of a given number k of “vowels” that maximizes the Markov criterion. The order in which new “vowels” appear as k = 1, 2, 3, ... increases reflects their decreasing “strength” and can be used to separate vowels from consonants. In sufficiently long texts, an approximate ranking of the set of vowels is possible. A more accurate ranking is obtained by using the increments of the Markov criterion as the measure of a symbol's “power”. The algorithm can be greatly accelerated with some techniques of the steepest descent method. Tests of the program showed that the Markov criterion is independent of the text's author and is unimodal for long texts. Using this criterion, the algorithm can separate vowels from consonants in short texts (as short as 100 characters) and rank the vowels in texts as short as 250-500 letters. The Markov-criterion statistics of the letters “ь” and “ъ” turn out to be similar to those of the standard vowels, so the method cannot separate these two letters from the vowels. The test results showed that the Markov criterion method can be used for cryptanalysis of short Russian texts as well as texts in other consonantal languages.
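The search described above can be sketched in Python. This is a minimal illustration, not the author's program. It assumes the criterion is P(vowel | consonant) − P(vowel | vowel), one plausible reading of the definition above (the paper may define it differently), estimates it from raw bigram counts, and runs the exhaustive search over k-letter subsets. The helper names (`bigram_counts`, `markov_criterion`, `best_vowel_set`, `greedy_ranking`) are hypothetical, and `greedy_ranking` is a simple one-letter-at-a-time substitute for the steepest-descent speedups mentioned in the text, ranking letters by the criterion increment each one contributes.

```python
from collections import Counter
from itertools import combinations

# The 33 letters of the Russian alphabet, including ё.
RUSSIAN = "абвгдеёжзийклмнопрстуфхцчшщъыьэюя"


def bigram_counts(text, alphabet=RUSSIAN):
    """Count adjacent letter pairs. Characters outside `alphabet` are dropped,
    so pairs also run across word boundaries."""
    letters = [ch for ch in text.lower() if ch in alphabet]
    return Counter(zip(letters, letters[1:]))


def markov_criterion(pairs, vowels):
    """Assumed reading of the criterion: P(vowel | consonant) - P(vowel | vowel),
    estimated from bigram counts for a candidate set of "vowels"."""
    after_v = vv = after_c = cv = 0
    for (first, second), n in pairs.items():
        if first in vowels:
            after_v += n
            vv += n if second in vowels else 0
        else:
            after_c += n
            cv += n if second in vowels else 0
    if after_v == 0 or after_c == 0:
        return float("-inf")  # degenerate split: one class never starts a pair
    return cv / after_c - vv / after_v


def letters_of(pairs):
    """Sorted list of the distinct letters that occur in the bigram table."""
    return sorted({ch for pair in pairs for ch in pair})


def best_vowel_set(pairs, k):
    """Exhaustive search over all C(N, k) subsets of k letters."""
    best = max(combinations(letters_of(pairs), k),
               key=lambda combo: markov_criterion(pairs, set(combo)))
    return best, markov_criterion(pairs, set(best))


def greedy_ranking(pairs, steps):
    """Hypothetical fast alternative (not the author's exact trick): add one
    letter at a time, always the one giving the largest criterion value, and
    record the increment it contributes. The empty set gets baseline 0."""
    letters = letters_of(pairs)
    chosen, ranking, previous = set(), [], 0.0
    for _ in range(min(steps, len(letters) - 1)):
        candidates = [ch for ch in letters if ch not in chosen]
        letter = max(candidates, key=lambda ch: markov_criterion(pairs, chosen | {ch}))
        value = markov_criterion(pairs, chosen | {letter})
        ranking.append((letter, value - previous))
        chosen.add(letter)
        previous = value
    return ranking
```

For example, `best_vowel_set(bigram_counts(ciphertext), 3)`, where `ciphertext` is a hypothetical input string, returns the three-letter set with the highest criterion. Because simple substitution preserves the bigram structure, the same functions apply to ciphertext letters; pass a different `alphabet` if the cipher alphabet is not Cyrillic. The exhaustive search grows as C(N, k): for N = 33 and k = 10 it must score 92,561,040 subsets, which is why a faster search strategy matters.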

About the Author

Yuri I. Petrenko
Moscow Aviation Institute (National Research University)
Russian Federation

Cand. Sci. (Engineering), Senior Researcher, Associate Professor

Tel.: 8-926-237-3501



References

1. Markov A.A. Issledovanie zamechatel’nogo sluchaya zavisimykh ispytaniy. Izvestiya Imperatorskoy akademii nauk. Series 6. 1907. Vol. 1. Iss. 3. P. 61–80. (In Russ.)

2. Markov A.A. Primer statisticheskago izsledovaniya nad tekstom “Evgeniya Onegina”, illyustriruyushchiy svyaz’ ispytaniy v tsep’. Izvestiya Imperatorskoy akademii nauk. Series 6. 1913. Vol. 7. Iss. 3. P. 153–162. (In Russ.)

3. Shennon K. Teoriya svyazi v sekretnykh sistemakh. In: Raboty po teorii informatsii i kibernetike. Moscow: IL, 1963. (In Russ.)

4. Friedman W.F., Callimahos L.D. Military Cryptanalytics. Part I. Vol. 2. Laguna Hills, CA: Aegean Park Press, 1985.

5. Moler C., Morrison D. Singular Value Analysis of Cryptograms. The American Mathematical Monthly. 1983. Vol. 90. No. 2. P. 78–87.

6. Kahn D. The Codebreakers: The Story of Secret Writing. New York: Macmillan, 1967.

7. Alferov A.P., Zubov A.Yu., Kuz’min A.S., Cheremushkin A.V. Osnovy kriptografii. Moscow: Gelios ARV, 2002. (In Russ.)

8. Sikora T.F. Field Manual 34-40-2, Basic Cryptanalysis. Washington, DC, USA, 13 September 1990.

9. Pushkin A.S. Evgeniy Onegin. Moscow: Eksmo, 2017. (In Russ.)

10. Yaglom A.M., Yaglom I.M. Veroyatnost’ i informatsiya. Moscow: Nauka, 1973. (In Russ.)

11. Konan Doyl’ A. Priklyucheniya Sherloka Kholmsa. Plyashushchie chelovechki. Moscow: AST, 2016. (In Russ.)

12. Piotrovskiy R.G., Bektaev K.B., Piotrovskaya A.A. Matematicheskaya lingvistika. Moscow: Vysshaya shkola, 1977. P. 383. (In Russ.)

13. Mashinnyy fond russkogo yazyka. URL: http://cfrl.ruslang.ru. (In Russ.)

14. Pushkin A.S. Kapitanskaya dochka. Moscow: Eksmo, 2015. P. 1–640. (In Russ.)

15. Aksakov S.T. Detskie gody Bagrova-vnuka. Moscow: AST, 2017. P. 1–416. (In Russ.)

16. Kuprin A.I. Poedinok. Moscow: Eksmo, 2013. P. 1–640. (In Russ.)

17. Turgenev I.S. Dvoryanskoe gnezdo. Moscow: AST, 2017. P. 384. (In Russ.)

18. Lermontov M.Yu. Geroy nashego vremeni. Moscow: AST, 2015. P. 1–285. (In Russ.)

19. Chekhov A.P. Tsvety zapozdalye. In: Chekhov A.P. Polnoe sobranie sochineniy i pisem v 30-ti tomakh. Sochineniya. Moscow: Nauka, 1983. Vol. 1. (In Russ.)

20. Shukshin V.M. Kinopovesti. Moscow, 1988. P. 115. (In Russ.)



For citations:


Petrenko Yu.I. Automatic vowels selection and ranking in Russian enciphered texts. Open Education. 2018;22(1):59-69. (In Russ.) https://doi.org/10.21686/1818-4243-2018-1-59-69



This work is licensed under a Creative Commons Attribution 4.0 License.


ISSN 1818-4243 (Print)
ISSN 2079-5939 (Online)