Open Education

The problem of analysis of big web data and the use of data mining technology for processing and searching patterns in big web data on a practical example

https://doi.org/10.21686/1818-4243-2019-2-42-49

Abstract

The purpose of this work is to study current problems in processing big data received from or stored on the Internet (web data) and the prospects for solving them, and to show on a practical example how Data Mining technology can be applied to big web data.

Materials and methods. The study included a review of bibliographic sources on big data analysis problems.

Data Mining technology was used to analyze big web data, together with computer modeling of a practical problem in the C# programming language and a DDL script defining the database structure for accumulating web data.

Results. The work describes the specifics of big data, highlights its main characteristics, and analyzes modern approaches to processing it. A brief description of the horizontally scalable architecture and of the BI-solution architecture for big data processing is given. The problems of processing big web data are formulated: limited speed of access to data, and access provided via network protocols over general-purpose networks.

An example demonstrating the approach to processing big web data was also implemented. Based on the concept of big data, the described difficulties of web data processing, and Data Mining methods, techniques were proposed for effectively solving the practical problem of processing and searching for patterns in a large data array.

The following classes were developed in the C# programming language:

a class for receiving web data via the Internet;

a data conversion class;

an intelligent data processing class.

A DDL script was created that defines a structure for accumulating web data.

A single UML class diagram was developed.
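The division of work among the three classes and the DDL script can be illustrated with a minimal sketch. It is written in Python purely for illustration; the article's own classes are in C# and are not reproduced here. All names (WebDataReceiver, DataConverter, IntelligentProcessor, the web_record table, run_pipeline) are hypothetical, the input is assumed to be a JSON array of numbers, and a simple deviation-from-the-mean rule stands in for the neural-network processing that the abstract mentions.

```python
import json
import sqlite3
import statistics
import urllib.request
from typing import Callable, Iterable

# Hypothetical DDL: one table that accumulates converted web records.
DDL = """
CREATE TABLE IF NOT EXISTS web_record (
    id      INTEGER PRIMARY KEY,
    source  TEXT NOT NULL,
    value   REAL NOT NULL
);
"""


def http_fetch(url: str) -> str:
    """Default fetcher: read the response body as UTF-8 text."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8")


class WebDataReceiver:
    """Receives raw web data; the fetcher is injectable for offline use."""

    def __init__(self, fetch: Callable[[str], str] = http_fetch):
        self._fetch = fetch

    def receive(self, url: str) -> str:
        return self._fetch(url)


class DataConverter:
    """Converts a raw JSON array of numbers into (source, value) rows."""

    def convert(self, source: str, raw: str) -> list[tuple[str, float]]:
        return [(source, float(v)) for v in json.loads(raw)]


class IntelligentProcessor:
    """Stand-in for the mining step: flags values far from the mean."""

    def __init__(self, threshold: float = 2.0):
        self.threshold = threshold

    def find_unusual(self, values: Iterable[float]) -> list[float]:
        data = list(values)
        if len(data) < 2:
            return []
        mean = statistics.fmean(data)
        spread = statistics.pstdev(data)
        if spread == 0:
            return []  # all values identical, nothing stands out
        return [v for v in data if abs(v - mean) / spread > self.threshold]


def run_pipeline(url: str, receiver: WebDataReceiver,
                 conn: sqlite3.Connection) -> list[float]:
    """Receive, convert, accumulate in the database, then mine the stored values."""
    conn.executescript(DDL)
    rows = DataConverter().convert(url, receiver.receive(url))
    conn.executemany(
        "INSERT INTO web_record (source, value) VALUES (?, ?)", rows)
    stored = [r[0] for r in
              conn.execute("SELECT value FROM web_record ORDER BY id")]
    return IntelligentProcessor().find_unusual(stored)
```

Passing a stub fetcher, for example WebDataReceiver(fetch=lambda url: "[10, 11, 9, 10, 12, 10, 11, 95]"), together with sqlite3.connect(":memory:") lets the whole chain run without network access. Keeping the receiver separate from the converter and the processor is one way to isolate the slow, network-bound step named among the problems above.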

The constructed system of data and classes solves the main part of the problems of processing big web data and performs intelligent processing with Data Mining technology in order to identify certain records in a large array. Combining an object-oriented approach, neural networks, and BI analysis to filter data will speed up data processing and obtaining the study result.

Conclusion. The results of the study suggest that the current state of technology for analyzing big web data makes it possible to process data objects efficiently, identify patterns, extract hidden data, and obtain full-fledged statistics.

The results can be used both for an initial study of big data processing technologies and as a basis for developing a real application for analyzing web data. The use of neural networks and the universal handler classes makes the architecture flexible and self-learning, and the class declarations and the base DDL structure will greatly simplify the development of program code.

About the Authors

K. V. Mulyukova
Engineering-Technological Academy of SFU
Russian Federation

Ksenia V. Mulyukova – Postgraduate Student, Department of Automatic Control Systems 

Rostov-on-Don



V. M. Kureichik
Engineering-Technological Academy of SFU
Russian Federation

Victor M. Kureichik – Dr. Sci. (Engineering), Professor, Department of Automatic Control Systems 

Rostov-on-Don



For citations:


Mulyukova K.V., Kureichik V.M. The problem of analysis of big web data and the use of data mining technology for processing and searching patterns in big web data on a practical example. Open Education. 2019;23(2):42-49. (In Russ.) https://doi.org/10.21686/1818-4243-2019-2-42-49


This work is licensed under a Creative Commons Attribution 4.0 License.


ISSN 1818-4243 (Print)
ISSN 2079-5939 (Online)