Publication

Comparative evaluation of artificial intelligence chatbots in answering electroencephalography-related questions

dc.contributor.authorProença, Soraia
dc.contributor.authorSoares, Joana Isabel
dc.contributor.authorParra, Joana
dc.contributor.authorMaia, Gisela
dc.contributor.authorLeite, Juliana
dc.contributor.authorBeniczky, Sándor
dc.contributor.authorJesus-Ribeiro, Joana
dc.contributor.authorHenrique Maia, Gisela Maria
dc.date.accessioned2025-12-18T12:31:22Z
dc.date.available2025-12-18T12:31:22Z
dc.date.issued2025-12-05
dc.description.abstractAs large language models (LLMs) become more accessible, they may be used to explain challenging EEG concepts to nonspecialists. This study aimed to compare the accuracy, completeness, and readability of EEG-related responses from three LLM-based chatbots and to assess inter-rater agreement. One hundred questions, covering 10 EEG categories, were entered into ChatGPT, Copilot, and Gemini. Six raters from the clinical neurophysiology field (two physicians, two teachers, and two technicians) evaluated the responses. Accuracy was rated on a 6-point scale, completeness on a 3-point scale, and readability was assessed using the Automated Readability Index (ARI). We used a repeated-measures ANOVA for group differences in accuracy and readability, the intraclass correlation coefficient (ICC) for inter-rater reliability, and a two-way ANOVA, with chatbot and raters as factors, for completeness. Total accuracy was significantly higher for ChatGPT (mean ± SD 4.54 ± .05) than for Copilot (mean ± SD 4.11 ± .08) and Gemini (mean ± SD 4.16 ± .13) (p < .001). ChatGPT's lowest performance was in normal variants and patterns of uncertain significance (mean ± SD 3.10 ± .14), while Copilot and Gemini performed lowest in ictal EEG patterns (mean ± SD 2.93 ± .11 and 3.37 ± .24, respectively). Although inter-rater agreement for accuracy was excellent among physicians (ICC = .969) and teachers (ICC = .926), it was poor for technicians in several EEG categories. ChatGPT achieved significantly higher completeness scores than Copilot (p < .001) and Gemini (p = .01). ChatGPT text (ARI mean ± SD 17.41 ± 2.38) was less readable than that of Copilot (ARI mean ± SD 11.14 ± 2.60) (p < .001) and Gemini (ARI mean ± SD 14.16 ± 3.33). The chatbots achieved relatively high accuracy, but not without flaws, emphasizing that the information they provide requires verification. ChatGPT outperformed the other chatbots in accuracy and completeness, though at the expense of readability. The lower inter-rater agreement among technicians may reflect a gap in standardized training or practical experience, potentially impacting the consistency of EEG-related content assessment.eng
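
Note on methodology: the Automated Readability Index cited in the abstract is a standard formula based on character, word, and sentence counts: ARI = 4.71*(characters/words) + 0.5*(words/sentences) - 21.43, with higher scores indicating text that requires a higher grade level to read. The following is a minimal Python sketch of that formula for context only; it is not the article's code, and the sample sentence is invented.

import re

def automated_readability_index(text: str) -> float:
    # Standard ARI formula: 4.71*(chars/words) + 0.5*(words/sentences) - 21.43.
    # Higher scores correspond to harder-to-read text.
    words = re.findall(r"[A-Za-z0-9']+", text)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    chars = sum(len(w) for w in words)
    return 4.71 * (chars / len(words)) + 0.5 * (len(words) / len(sentences)) - 21.43

# Hypothetical example sentence, not taken from the study's question set.
print(round(automated_readability_index("EEG records the electrical activity of the brain."), 2))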
dc.identifier.citationProença, S., Soares, J. I., Parra, J., Maia, G., Leite, J., Beniczky, S., & Jesus-Ribeiro, J. (2025). Comparative evaluation of artificial intelligence chatbots in answering electroencephalography-related questions. Epileptic Disorders, 1–11. https://doi.org/10.1002/epd2.70156
dc.identifier.doi10.1002/epd2.70156
dc.identifier.eissn1950-6945
dc.identifier.issn1294-9361
dc.identifier.urihttp://hdl.handle.net/10400.22/31262
dc.language.isoeng
dc.peerreviewedyes
dc.publisherWiley
dc.relation.hasversionhttps://onlinelibrary.wiley.com/doi/10.1002/epd2.70156
dc.rights.urihttp://creativecommons.org/licenses/by/4.0/
dc.subjectArtificial intelligence
dc.subjectChatGPT
dc.subjectCopilot
dc.subjectElectroencephalography
dc.subjectGemini
dc.subjectLarge language model
dc.titleComparative evaluation of artificial intelligence chatbots in answering electroencephalography-related questionseng
dc.typejournal article
dspace.entity.typePublication
oaire.citation.endPage11
oaire.citation.startPage1
oaire.citation.titleEpileptic Disorders
oaire.versionhttp://purl.org/coar/version/c_970fb48d4fbd8a85
person.familyNameHenrique Maia
person.givenNameGisela Maria
person.identifier.ciencia-id2014-5C31-EBF4
person.identifier.orcid0000-0002-3199-340X
relation.isAuthorOfPublication0bdd4bff-1f99-4630-b3c2-d5759b95a2eb
relation.isAuthorOfPublication.latestForDiscovery0bdd4bff-1f99-4630-b3c2-d5759b95a2eb

Files

Original bundle
Name:
ART_Gisela Maia.pdf
Size:
15.72 MB
Format:
Adobe Portable Document Format
License bundle
Name:
license.txt
Size:
4.03 KB
Format:
Description:
Item-specific license agreed upon to submission