Research Article

Performance Evaluation of the Extractive Methods in Automatic Text Summarization Using Medical Papers

Year 2023, Volume: 9 Issue: 4, 14 - 22, 31.12.2023

Abstract

The rapid advancement of technology has caused a surge in the volume of available digital data. Users who need to locate specific information within this massive collection face a time-consuming search. Automatic Text Summarization systems have been developed as a more effective alternative to conventional summarization techniques, to address this issue and improve users' access to relevant information. Researchers in the health sciences, in particular, find it challenging to keep up with the most recent literature because of their busy schedules. The goal of this study is to generate comprehensive summaries of Turkish-language scientific papers in the field of health sciences. Although scientific papers already contain abstracts, more thorough summaries are still required. To the best of our knowledge, no previous attempt has been made to automatically summarize academic papers on health in the Turkish language. For this purpose, a dataset of 105 Turkish papers was collected from DergiPark. Term Frequency, Term Frequency-Inverse Document Frequency, Latent Semantic Analysis, TextRank, and Latent Dirichlet Allocation were chosen as the extractive text summarization methods because they are frequently used in this field. The performance of the text summarization models was evaluated using Recall, Precision, and F-score metrics, and the algorithms gave satisfactory results for Turkish.
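As a rough illustration of what a Term Frequency based extractive method and a Precision/Recall/F-score evaluation can look like, the following is a minimal sketch and not the authors' implementation. The function names, the naive sentence splitter, the average-frequency scoring rule, and the sentence-level overlap used for evaluation are all assumptions made for this example; the abstract does not specify these details.

```python
import re
from collections import Counter


def split_sentences(text):
    # Naive splitter on '.', '!' or '?' followed by whitespace (an assumption;
    # a real Turkish pipeline would likely use a dedicated tokenizer).
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    # Lowercased word tokens. \w is Unicode-aware in Python 3, but str.lower()
    # does not apply Turkish-specific casing rules (e.g. I -> ı).
    return re.findall(r"\w+", sentence.lower())


def tf_summarize(text, k=2):
    """Return the k sentences with the highest average term frequency,
    kept in their original document order (hypothetical scoring rule)."""
    sentences = split_sentences(text)
    tf = Counter(tok for s in sentences for tok in tokenize(s))

    def score(sentence):
        toks = tokenize(sentence)
        return sum(tf[t] for t in toks) / len(toks) if toks else 0.0

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    return [sentences[i] for i in sorted(ranked[:k])]


def precision_recall_f1(selected, reference):
    """Sentence-level overlap between a system summary and a reference
    summary (an assumed evaluation unit, chosen for simplicity)."""
    selected, reference = set(selected), set(reference)
    overlap = len(selected & reference)
    precision = overlap / len(selected) if selected else 0.0
    recall = overlap / len(reference) if reference else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    doc = "Cats sleep. Cats eat fish. Dogs bark."
    summary = tf_summarize(doc, k=2)
    print(summary)
    print(precision_recall_f1(summary, ["Cats sleep.", "Dogs bark."]))
```

Averaging term frequency over sentence length keeps long sentences from winning on length alone; TF-IDF, LSA, TextRank, or LDA would replace only the `score` step in a sketch like this.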

References

  • [1] J. P. Andersen, M. W. Nielsen, N. L. Simone, R. E. Lewiss, and R. Jagsi, “COVID-19 medical papers have fewer women first authors than expected,” eLife, vol. 9, June 2020. doi:10.7554/elife.58807
  • [2] A. See, P. J. Liu, and C. D. Manning, “Get to the point: Summarization with pointer-generator networks,” arxiv.org, April 2017 [Online]. Available: arXiv, https://arxiv.org/abs/1704.04368. [Accessed: 15 Dec. 2023].
  • [3] S. Narayan, S. B. Cohen, and M. Lapata, “Ranking sentences for extractive summarization with reinforcement learning,” arxiv.org, April 2018 [Online]. Available: arXiv, https://arxiv.org/abs/1802.08636. [Accessed: 15 Dec. 2023].
  • [4] E. Erdağı, “Extractive based automatic text summarization in Turkish texts,” Ph.D. dissertation, Maltepe University, İstanbul, Turkey, 2023.
  • [5] Ö. E. Gündoğdu and N. Duru, “Turkish text summarization and methods,” in Proc. of 18th Academic Computing Conference -AB 2016, Aydın, Turkey, January 30-February 5, 2016, pp. 69–76.
  • [6] DergiPark, DergiPark Akademik [Online]. Available: DergiPark Akademik, https://dergipark.org.tr/tr/. [Accessed: 16 Dec. 2023].
  • [7] J. Beel, B. Gipp, S. Langer, and C. Breitinger, “Research-paper recommender systems: A literature survey,” International Journal on Digital Libraries, vol. 17, no. 4, pp. 305–338, July 2015. doi:10.1007/s00799-015-0156-0
  • [8] A. Güran, “Automatic text summarization system,” Ph.D. dissertation, Yıldız Technical University, Istanbul, Turkey, 2013.
  • [9] O. Kaynar, Y. Işık, Y. Görmez, and F. Demirkoparan, “Genetic algorithm based sentence extraction for automatic text summarization,” Journal of Management Information Systems, vol. 3, no. 2, pp. 62-75, December 2017.
  • [10] H. Torun and A. B. Inner, “Detecting similar news by summarizing Turkish news,” in Proc. of 2018 26th Signal Processing and Communications Applications Conference (SIU), Izmir, Turkey, 2018, pp. 1-4. doi:10.1109/SIU.2018.8404826
  • [11] A. A. Karcioğlu and A. C. Yaşa, “Automatic summary extraction in texts using genetic algorithms,” in Proc. of 2020 28th Signal Processing and Communications Applications Conference (SIU), Gaziantep, Turkey, 2020, pp. 1-4. doi:10.1109/SIU49456.2020.9302205
  • [12] Ankara Journal of Health Services Homepage, DergiPark Akademik [Online]. Available: DergiPark Akademik, https://dergipark.org.tr/tr/pub/ashd. [Accessed: 16 Dec. 2023].
  • [13] Çukurova Medical Journal Homepage, DergiPark Akademik [Online]. Available: DergiPark Akademik, https://dergipark.org.tr/tr/pub/cumj. [Accessed: 16 Dec. 2023].
  • [14] Ege Journal of Medicine Homepage, DergiPark Akademik [Online]. Available: DergiPark Akademik, https://dergipark.org.tr/tr/pub/egetbd. [Accessed: 16 Dec. 2023].
  • [15] Gazi Journal of Health Sciences Homepage, DergiPark Akademik [Online]. Available: DergiPark Akademik, https://dergipark.org.tr/tr/pub/gsbdergi. [Accessed: 16 Dec. 2023].
  • [16] Journal of Hacettepe University Faculty of Health Sciences Homepage, DergiPark Akademik [Online]. Available: DergiPark Akademik, https://dergipark.org.tr/tr/pub/husbfd. [Accessed: 16 Dec. 2023].
  • [17] Journal of Istanbul Faculty of Medicine Homepage, DergiPark Akademik [Online]. Available: DergiPark Akademik, https://dergipark.org.tr/tr/pub/iuitfd. [Accessed: 16 Dec. 2023].
  • [18] Mersin University Journal of Health Sciences Homepage, DergiPark Akademik [Online]. Available: DergiPark Akademik, https://dergipark.org.tr/tr/pub/mersinsbd. [Accessed: 16 Dec. 2023].
  • [19] Journal of Samsun Health Sciences Homepage, DergiPark Akademik [Online]. Available: DergiPark Akademik, https://dergipark.org.tr/tr/pub/jshs. [Accessed: 16 Dec. 2023].
  • [20] S. Bal, “New methods for improving the performance of extractive Turkish text summarization,” Ph.D. dissertation, Eskisehir Osmangazi University, Eskisehir, Turkey, 2022.
  • [21] F. Horasan and B. Bilen, “Extractive text summarization system for news texts,” International Journal of Applied Mathematics Electronics and Computers, vol. 8, no. 4, pp. 179-184, December 2020. doi:10.18100/ijamec.800905
  • [22] N. Abdi Omar, “A user based comparative study of automatic text summarization,” M.Sc. dissertation, Institute of Science and Technology, Kocaeli University, Kocaeli, Turkey, 2018.
  • [23] E. Akulker, “Extractive text summarization for Turkish using TF-IDF and pagerank algorithms,” Ph.D. dissertation, The Graduate School of Natural and Applied Sciences of Atılım University, İstanbul, Turkey, 2019.
  • [24] M. G. Ozsoy, F. N. Alpaslan, and I. Cicekli, “Text summarization using latent semantic analysis,” Journal of Information Science, vol. 37, no. 4, pp. 405–417, June 2011. doi:10.1177/0165551511408848
  • [25] V. Gulati, D. Kumar, D. E. Popescu, and J. D. Hemanth, “Extractive article summarization using integrated TextRank and BM25+ algorithm,” Electronics, vol. 12, no. 2, p. 372, January 2023. doi:10.3390/electronics12020372
  • [26] M. Kar, S. Nunes, and C. Ribeiro, “Summarization of changes in dynamic text collections using Latent Dirichlet Allocation model,” Information Processing & Management, vol. 51, no. 6, pp. 809–833, November 2015. doi:10.1016/j.ipm.2015.06.002
  • [27] S.W. Kim and J.M. Gil, “Research paper classification systems based on TF-IDF and LDA schemes,” Human-Centric Computing and Information Sciences, vol. 9, no. 1, August 2019. doi:10.1186/s13673-019-0192-7
  • [28] M. Zhang, C. Li, M. Wan, X. Zhang, and Q. Zhao, “ROUGE-SEM: Better evaluation of summarization using ROUGE combined with semantics,” Expert Systems with Applications, vol. 237, p. 121364, March 2024. doi:10.1016/j.eswa.2023.121364
  • [29] M. D. Akın and A. A. Akın, “Türk dilleri için açık kaynaklı doğal dil işleme kütüphanesi: ZEMBEREK,” EMO Elektrik Mühendisliği Dergisi, vol. 431, no. 1, pp. 38–44, August 2007.
  • [30] B. Srinivasa-Desikan, Natural Language Processing and Computational Linguistics: A practical guide to text analysis with Python, Gensim, spaCy, and Keras. Packt Publishing Ltd, 2018, pp. 250-287.
  • [31] R. Rani and D. K. Lobiyal, “An extractive text summarization approach using tagged-LDA based topic modeling,” Multimedia Tools and Applications, vol. 80, pp. 3275-3305, January 2021. doi:10.1007/s11042-020-09549-3

Turkish title: Çıkarımsal Otomatik Metin Özetleme Yöntemlerinin Tıp Makaleleri Kullanılarak Performans Değerlendirmesi



Details

Primary Language: English
Subjects: Computer Software, Software Engineering (Other)
Section: Research Article
Authors

Anıl Kuş 0000-0002-5964-3727

Çiğdem İnan Acı 0000-0002-0028-9890

Publication Date: December 31, 2023
Submission Date: November 16, 2023
Acceptance Date: December 16, 2023
Published in Issue: Year 2023 Volume: 9 Issue: 4

Cite

IEEE A. Kuş and Ç. İ. Acı, “Performance Evaluation of the Extractive Methods in Automatic Text Summarization Using Medical Papers,” GMBD, vol. 9, no. 4, pp. 14–22, 2023.

Gazi Journal of Engineering Sciences (GJES) publishes open access articles under a Creative Commons Attribution 4.0 International License (CC BY).