Open Education

Using A Neural Network to Generate Images When Teaching Students to Develop an Alternative Text

https://doi.org/10.21686/1818-4243-2024-1-9-20

Abstract

The purpose of the research. The purpose of the study is to develop and test an approach to training digital content authors to create alternative text that accurately describes the original image, using a neural network that reconstructs reference images from the text. The absence of textual descriptions of visual content on a web resource limits digital accessibility, especially for users with visual impairments. To ensure accessibility, every informative image should be accompanied by alternative text. Text alternatives generated by automated tools are known to be lower in quality than human-written descriptions. Therefore, a digital content author must be able to write alternative text for images. It has been suggested that a neural network that generates images from text descriptions can serve as a tool for checking the relevance of the alternative text being developed.

Materials and methods. The study was carried out in April-May 2023. Seventeen undergraduate students studied the requirements for developing text alternatives, wrote initial text descriptions for three proposed photographs, and then corrected their texts with the Kandinsky 2.1 neural network according to the following algorithm: generate an image from the description; visually compare the resulting image with the original; return to editing the description or end the process. From the initial and final descriptions, the researchers reconstructed the images using the same neural network. Further work consisted of assessing the quality of all text descriptions and the similarity of all generated images to the originals. The results of the study (text descriptions, expert evaluations, and links to the generated images) were published as a dataset in the Mendeley Data repository. The t-test, Pearson correlation, and multivariate regression were used to analyze the data (at the significance level p = 0.05).
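The correction algorithm above is an iterative refinement loop. The following sketch is purely illustrative: `generate_image`, `similarity`, and `edit` are hypothetical stand-ins for the Kandinsky 2.1 generation step, the visual comparison, and the student's manual editing, all of which were performed by hand in the study.

```python
def refine_description(description, original, generate_image, similarity,
                       edit, threshold=0.8, max_rounds=5):
    """Iteratively edit a text description until the image generated
    from it is judged sufficiently similar to the original photograph."""
    for _ in range(max_rounds):
        generated = generate_image(description)
        if similarity(generated, original) >= threshold:
            break  # the description now reconstructs the original well enough
        description = edit(description, generated)  # return to editing
    return description
```

In the study each step was a human judgment; the loop only formalizes the stopping rule: stop when the generated image matches the original, or when the allotted number of editing rounds runs out.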

Results. The quality scores of the initial and final text descriptions did not differ significantly (p > 0.05), nor did the lengths of the texts (p > 0.05). At the same time, the similarity between the generated images and the original photographs increased considerably after the students used the neural network (p < 0.05). Thus, training with the neural network improved the quality (similarity to the original) of the images generated from the revised text descriptions without degrading the quality of the descriptions themselves. It was also shown that the quality of the final text alternatives was higher the longer they were within the allotted limit and the better and more concise the initial descriptions were (p < 0.05). Thus, after students are trained with a neural network, concise and accurate alternative descriptions of images can be turned into equally high-quality text alternatives whose relevance is increased by adding plot details to the description.
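The before/after comparison of similarity scores is a paired design, so a paired t-test is the natural choice. As a minimal stdlib illustration (a real analysis would normally use a statistics package, e.g. `scipy.stats.ttest_rel`), the t statistic for matched samples can be computed as:

```python
import math
import statistics

def paired_t(before, after):
    """Paired t statistic for matched samples, e.g. similarity scores of
    images generated from the initial vs. the final descriptions."""
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    sd = statistics.stdev(diffs)  # sample standard deviation of the differences
    return statistics.mean(diffs) / (sd / math.sqrt(n))

# Hypothetical scores on a 0-1 similarity scale (not the study's data):
t = paired_t([0.41, 0.52, 0.38], [0.63, 0.70, 0.55])
```

The resulting statistic is compared with the Student t distribution with n − 1 degrees of freedom at the chosen significance level (p = 0.05 in the study).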

Conclusion. Image-generating neural networks can be applied as a software tool that encourages prospective content authors to create more accurate and complete alternative text while keeping it concise. It seems important to continue this research by extending it to other types of images and to a variety of neural networks.

About the Authors

Yekaterina A. Kosova
V.I. Vernadsky Crimean Federal University
Russian Federation

Yekaterina A. Kosova, Cand. Sci. (Pedagogy), Associate Professor, Head of the Department of Applied Mathematics at the Institute of Physics and Technology,

Simferopol.



Kirill I. Redkokosh
V.I. Vernadsky Crimean Federal University
Russian Federation

Kirill I. Redkokosh, Postgraduate Student at the Institute of Physics and Technology,

Simferopol.



Pavel O. Mikheyev
V.I. Vernadsky Crimean Federal University
Russian Federation

Pavel O. Mikheyev, Student at the Institute of Physics and Technology,

Simferopol.



References

1. Web Content Accessibility Guidelines (WCAG) 2.1. [Internet]. 2018. Available from: https://www.w3.org/TR/WCAG21/ (cited 22.11.2023).

2. Web Content Accessibility Guidelines (WCAG) 2.2. [Internet]. 2023. Available from: https://www.w3.org/TR/WCAG22/ (cited 22.11.2023).

3. World Blind Union [Internet]. 2023. Available from: https://worldblindunion.org/ (cited 22.11.2023).

4. Gill K., Sharma R., Gupta R. Empowering visually impaired students through e-learning at higher education: problems and solutions. IOSR Journal of Humanities and Social Science. 2017; 22; 8: 27-35. DOI: 10.9790/0837-2208072735.

5. Marghalani A. Online courses accessibility for low-vision [Internet]. In 2020 AECT Convention Proceedings. 2020: 1–37. Available from: https://members.aect.org/pdf/Proceedings/proceedings20/2020/20_03.pdf (cited 22.11.2023).

6. Jung C., Mehta S., Kulkarni A., Zhao Y., Kim Y.-S. Communicating Visualizations without Visuals: Investigation of Visualization Alternative Text for People with Visual Impairments. In IEEE Transactions on Visualization and Computer Graphics. 2022; 28; 1: 1095-1105. DOI: 10.1109/TVCG.2021.3114846.

7. The WebAIM million: Images and alternative text [Internet]. 2023. Available from: https://webaim.org/projects/million/#alttext (cited 22.11.2023).

8. Usage statistics of image file formats for websites [Internet]. 2023. Available from: https://w3techs.com/technologies/overview/image_format (cited 22.11.2023).

9. Sharma H., Agrahari M., Singh S.K., Firoj M., Mishra R.K. Image Captioning: A Comprehensive Survey. In 2020 International Conference on Power Electronics & IoT Applications in Renewable Energy and its Control (PARC), Mathura, India. 2020: 325-328. DOI: 10.1109/PARC49193.2020.236619.

10. Hanley M., Barocas S., Levy K., Azenkot S., Nissenbaum H. Computer Vision and Conflicting Values: Describing People with Automated Alt Text. In Proceedings of the 2021 AAAI/ACM Conference on AI, Ethics, and Society (AIES '21). Association for Computing Machinery, New York, NY, USA. 2021: 543–554. DOI: 10.1145/3461702.3462620.

11. Lee J., Peng Y.H., Herskovitz J., Guo A. Image Explorer: Multi-Layered Touch Exploration to Make Images Accessible. In Proceedings of the 23rd International ACM SIGACCESS Conference on Computers and Accessibility (ASSETS ‘21). Association for Computing Machinery, New York, NY, USA. 2021; 69: 1–4. DOI: 10.1145/3441852.3476548.

12. Mack K., Cutrell E., Lee B., Morris M.R. Designing Tools for High-Quality Alt Text Authoring. In Proceedings of the 23rd International ACM SIGACCESS Conference on Computers and Accessibility (ASSETS ‘21). Association for Computing Machinery, New York, NY, USA. 2021; 23: 1–14. DOI: 10.1145/3441852.3471207.

13. Jeong H., Chun V., Lee H., Oh S.Y., Jung H. WATAA: Web Alternative Text Authoring Assistant for Improving Web Content Accessibility. In Companion Proceedings of the 28th International Conference on Intelligent User Interfaces (IUI '23 Companion). Association for Computing Machinery, New York, NY, USA. 2023: 41–45. DOI: 10.1145/3581754.3584127.

14. Salisbury E., Kamar E., Morris M. Toward Scalable Social Alt Text: Conversational Crowdsourcing as a Tool for Refining Vision-to-Language Technology for the Blind. Proceedings of the AAAI Conference on Human Computation and Crowdsourcing. 2017; 5(1): 147-156. DOI: 10.1609/hcomp.v5i1.13301.

15. Edwards E.J., Gilbert M., Blank E., Branham S.M. How the Alt Text Gets Made: What Roles and Processes of Alt Text Creation Can Teach Us About Inclusive Imagery. ACM Trans. Access. Comput. 2023; 16(2): 1-18. DOI: 10.1145/3587469.

16. Chintalapati S.S., Bragg J., Wang L.L. A Dataset of Alt Texts from HCI Publications: Analyses and Uses Towards Producing More Descriptive Alt Texts of Data Visualizations in Scientific Papers. In Proceedings of the 24th International ACM SIGACCESS Conference on Computers and Accessibility (ASSETS ‘22). Association for Computing Machinery, New York, NY, USA. 2022: 1–12. DOI: 10.1145/3517428.3544796.

17. Morash V.S., Siu Y.-T., Miele J.A., Hasty L., Landau S. Guiding Novice Web Workers in Making Image Descriptions Using Templates. ACM Trans. Access. Comput. 2015; 7; 4: 1-21. DOI: 10.1145/2764916.

18. Gleason C., Pavel A., Liu X., Carrington P., Chilton L.B., Bigham J.P. Making Memes Accessible. In The 21st International ACM SIGACCESS Conference on Computers and Accessibility (Pittsburgh, PA, USA) (ASSETS ’19). Association for Computing Machinery, New York, NY, USA. 2019: 367–376. DOI: 10.1145/3308561.3353792.

19. Midjourney [Internet]. 2023. Available from: https://www.midjourney.com/home?callbackUrl=%2Fexplore (cited 22.11.2023).

20. DALL-E [Internet]. 2023. Available from: https://openai.com/dall-e-2

21. Kandinsky [Internet]. 2023. Available from: https://rudalle.ru/kandinsky2 (cited 22.11.2023). (In Russ.)

22. Alternative Text [Internet]. 2023. Available from: https://webaim.org/techniques/alttext/ (cited 22.11.2023).

23. Images Tutorial [Internet]. 2022. Available from: https://www.w3.org/WAI/tutorials/images/ (cited 22.11.2023).

24. Dobavleniye zameshchayushchego teksta k figure, kartinke, diagramme, risunku SmartArt ili k drugomu ob”yektu = Add alt text to a shape, picture, chart, SmartArt, or other object [Internet]. 2023. Available from: https://support.microsoft.com/ru-ru/office/83-44989b2a-903c-4d9a-b742-6a75b451c669 (cited 22.11.2023). (In Russ.)

25. GOST R 57891-2022. Tiflocommentation and Tiflocommentary. Terms and definitions: date of introduction 2022-01-01 / FGBU “RST”, NU IPRPP VOS “Reacomp” [Internet]. 2022. Available from: https://nd.gostinfo.ru/document/6880129.aspx (cited 15.12.2023). (In Russ.)

26. On establishing a procedure for ensuring accessibility conditions for visually impaired persons of official websites of state bodies, local governments and subordinate organizations on the Internet information and telecommunications network [Order of the Ministry of Digital Development, Communications and Mass Communications of the Russian Federation dated December 12, 2022 No. 931] [Internet]. 2022. Available from: https://www.garant.ru/products/ipo/prime/doc/405916637/ (cited 15.12.2023). (In Russ.)

27. Lunch atop a Skyscraper [Internet]. 2023. Available from: https://en.wikipedia.org/wiki/Lunch_atop_a_Skyscraper (cited 15.12.2023).

28. Apollo 11 [Internet]. 2023. Available from: https://en.wikipedia.org/wiki/Apollo_11 (cited 15.12.2023).

29. Drysdale J., Regan M. Our Peaceable Kingdom: The photographs of John Drysdale. New York: St. Martin’s Press; 2000. 112 p.

30. Chaddock R.E. Principles and methods of statistics. Boston: Houghton Mifflin Company; 1925. 471 p.

31. Maerten A.-S., Soydaner D. From paintbrush to pixel: A review of deep neural networks in AI-generated art. 2023. DOI: 10.48550/arXiv.2302.10913.



For citations:


Kosova Ye.A., Redkokosh K.I., Mikheyev P.O. Using A Neural Network to Generate Images When Teaching Students to Develop an Alternative Text. Open Education. 2024;28(1):9-20. (In Russ.) https://doi.org/10.21686/1818-4243-2024-1-9-20



This work is licensed under a Creative Commons Attribution 4.0 License.


ISSN 1818-4243 (Print)
ISSN 2079-5939 (Online)