Fine Tuned Multitasking Neural Network for Parkinson's Disease Detection from Voice Recordings
| dc.creator | López-Santander, Diego Alexander | |
| dc.creator | Ríos-Urrego, Cristian David | |
| dc.creator | Orozco-Arroyave, Juan Rafael | |
| dc.date | 2025-07-28 | |
| dc.date.accessioned | 2025-10-01T23:53:16Z | |
| dc.description | Parkinson's disease (PD) is the second most prevalent neurodegenerative disorder in old age. It is characterized by symptoms such as resting tremor, rigidity, and gait disturbances. It also affects the natural production of speech, causing tremors of the voice and imprecise pronunciation, among others. Given the prevalence of speech disorders in PD, analyzing an individual's speech provides a non-invasive, cost-effective means for detection and monitoring. The objective of this paper was to take advantage of the potential of deep learning, specifically a pre-trained convolutional neural network and a multitasking approach, to classify speech recordings from PD patients and healthy controls (HC) from spectral representations. The proposed multitask analysis methodology aimed to evaluate the effectiveness of pre-trained ResNet models, fine-tuned on Spanish, Italian, and German speech databases, for both single-task and multitask classification approaches. The results indicated that multitask learning, which includes additional tasks such as vowel and sex classification, enhances the model's performance compared to monotask learning by taking advantage of shared representations across related tasks. The multitask approach showed an improvement of up to 5% in classification accuracy and the inclusion of the intermediate models for fine-tuning produced up to 10% better classification accuracy with respect to the implemented baseline. In conclusion, this work contributes to the growing body of literature demonstrating the viability of deep learning methods for non-invasive PD detection and highlights the advantages of multitask learning for pathological speech classification. | en-US |
| dc.description | La enfermedad de Parkinson (EP) es el segundo trastorno neurodegenerativo más prevalente en la vejez. Se caracteriza por síntomas como temblor en reposo, rigidez y alteraciones de la marcha. También afecta a la producción natural del habla, causando temblor de voz y pronunciación imprecisa. Dada la prevalencia de los trastornos del habla en la EP, el análisis del habla de un individuo proporciona un medio no invasivo y económico para su detección y monitorización. El objetivo de este trabajo consistió en aprovechar el potencial del aprendizaje profundo, específicamente una red neuronal convolucional pre entrenada y un enfoque multitarea, para clasificar grabaciones del habla de pacientes con EP y controles sanos (HC) utilizando representaciones espectrales. La metodología de análisis multitarea propuesta consistió en evaluar la eficacia de los modelos ResNet pre entrenados, afinados en bases de datos en español, italiano y alemán, tanto para enfoques de clasificación de una sola tarea como multitarea. Los resultados indicaron que el aprendizaje multitarea, que incluye tareas adicionales como la clasificación de vocales y la clasificación de sexos, mejora el rendimiento del modelo en comparación con el aprendizaje monotarea al aprovechar las representaciones compartidas entre tareas relacionadas. El enfoque multitarea mostró una mejora de hasta el 5 % en la tasa de acierto de la clasificación, y la inclusión de los modelos intermedios para el ajuste fino produjo una mejora de hasta el 10 % con respecto al modelo baseline implementado. Finalmente, se concluye que este trabajo contribuye al creciente cuerpo de literatura que demuestra la viabilidad de los métodos de aprendizaje profundo para la detección no invasiva de la EP y destaca las ventajas del aprendizaje multitarea para la clasificación patológica del habla. | es-ES |
| dc.format | application/pdf | |
| dc.format | text/xml | |
| dc.format | application/zip | |
| dc.identifier | https://revistas.itm.edu.co/index.php/tecnologicas/article/view/3307 | |
| dc.identifier | 10.22430/22565337.3307 | |
| dc.identifier.uri | https://hdl.handle.net/20.500.12622/7936 | |
| dc.language | eng | |
| dc.publisher | Instituto Tecnológico Metropolitano (ITM) | es-ES |
| dc.relation | https://revistas.itm.edu.co/index.php/tecnologicas/article/view/3307/3710 | |
| dc.relation | https://revistas.itm.edu.co/index.php/tecnologicas/article/view/3307/3778 | |
| dc.relation | https://revistas.itm.edu.co/index.php/tecnologicas/article/view/3307/3779 | |
| dc.relation | /*ref*/A. H. V. Schapira, C. Warren Olanow, J. Timothy Greenamyre, and E. Bezard, “Slowing of neurodegeneration in Parkinson's disease and Huntington's disease: future therapeutic perspectives,” Lancet, vol. 384, no. 9942, pp. 545-555, Aug. 2014. https://doi.org/10.1016/S0140-6736(14)61010-2 | |
| dc.relation | /*ref*/J. Jankovic, and A. E. Lang, “Diagnosis and assessment of Parkinson disease and other movement disorders,” in Bradley's Neurology in Clinical Practice E-Book. 8th ed. Oxford, UK: Elsevier, 2021, pp. 310-33. https://www.clinicalkey.com/nursing/#!/content/book/3-s2.0-B9780323642613000243?scrollTo=%23hl0002636 | |
| dc.relation | /*ref*/M. Sapmaz Atalar, O. Oguz, and G. Genc, “Hypokinetic Dysarthria in Parkinson's Disease: A Narrative Review,” Med. Bull. Sisli Etfal Hosp., vol. 57, no. 2, pp. 163-170, 2023. https://doi.org/10.14744/SEMB.2023.29560 | |
| dc.relation | /*ref*/F. Cao, A. P. Vogel, P. Gharahkhani, and M. E. Renteria, “Speech and language biomarkers for Parkinson’s disease prediction, early diagnosis and progression," npj Parkinsons Dis., vol. 11, no. 1, p. 57, Mar. 2025. https://doi.org/10.1038/s41531-025-00913-4 | |
| dc.relation | /*ref*/J. Rusz et al., “Smartphone allows capture of speech abnormalities associated with high risk of developing Parkinson’s disease,” IEEE Transact. Neur. Systems Rehab. Engin., vol. 26 no. 8, pp. 1495-1507, Aug. 2018. https://doi.org/10.1109/TNSRE.2018.2851787 | |
| dc.relation | /*ref*/A. Lowit, A. Marchetti, S. Corson, and A. Kuschmann, “Rhythmic performance in hypokinetic dysarthria: Relationship between reading, spontaneous speech and diadochokinetic tasks,” J. Communic. Disord., vol. 72, no. 26, Mar-Apr. 2018. https://doi.org/10.1016/j.jcomdis.2018.02.005 | |
| dc.relation | /*ref*/P. Kumar Keserwani, S. Das, and N. Sarkar, “A comparative study: prediction of parkinson’s disease using machine learning, deep learning and nature inspired algorithm,” Multimed. Tools Appl., vol. 83, no. 27, pp. 69393-69441, Jan 2024. https://doi.org/10.1007/s11042-024-18186-z | |
| dc.relation | /*ref*/A. Shrestha, and A. Mahmood, “Review of deep learning algorithms and architectures,” IEEE Acc., vol. 7, pp. 53040-53065, Apr. 2019. https://doi.org/10.1109/ACCESS.2019.2912200 | |
| dc.relation | /*ref*/M. Shaban, “Deep learning for Parkinson’s disease diagnosis: a short survey,” Computers, vol. 12, no. 3, p. 58, Mar. 2023. https://doi.org/10.3390/computers12030058 | |
| dc.relation | /*ref*/J. Rasheed, A. Ali Hameed, N. Ajlouni, A. Jamil, A. Özyavaş, and Z. Orman, “Application of adaptive back-propagation neural networks for Parkinson’s disease prediction,” in 2020 Inter. Conf. Data Analytics Bus. Indust.: Way Towards a Sustainable Economy, Sakheer, Bahrain, 2020, pp. 1-5. https://doi.org/10.1109/ICDABI51230.2020.9325709 | |
| dc.relation | /*ref*/S. Rahman, M. Hasan, A. Krishno Sarkar, and F. Khan, “Classification of Parkinson’s Disease using Speech Signal with Machine Learning and Deep Learning Approaches,” Europ. J. Electr. Engin. Comput. Sci., vol. 7, no. 2, pp.20-27, Mar. 2023. https://doi.org/10.24018/ejece.2023.7.2.488 | |
| dc.relation | /*ref*/M. Little, 2007, “Parkinsons” UCI Machine Learning Repository. https://doi.org./10.24432/C59C74 | |
| dc.relation | /*ref*/A. Rehman, T. Saba, M. Mujahid, F. S. Alamri, and N. ElHakim, “Parkinson’s disease detection using hybrid LSTM-GRU deep learning model,” Electronics, vol. 12, no. 13, p. 2856, Jun. 2023. https://doi.org/10.3390/electronics12132856 | |
| dc.relation | /*ref*/J. Mallela et al., “Voice based classification of patients with Amyotrophic Lateral Sclerosis, Parkinson’s Disease and Healthy Controls with CNN-LSTM using transfer learning,” in 2020 IEEE Inter. Conf. Acoust. Speech Sign. Process, Barcelona, Spain, 2020, pp. 6784-6788. https://doi.org/10.1109/ICASSP40776.2020.9053682 | |
| dc.relation | /*ref*/O. Karaman, H. Çakın, A. Alhudhaif, and K. Polat, “Robust automated Parkinson disease detection based on voice signals with transfer learning,” Expert Syst. Appl., vol. 178, p. 115013, Sep. 2021. https://doi.org/10.1016/j.eswa.2021.115013 | |
| dc.relation | /*ref*/K. G. Dávid Sztahó, and T. Miklós Gábriel, “Deep learning solution for pathological voice detection using LSTM-based autoencoder hybrid with multi-task learning,” in I14th Inter. Joint Conf. Biomed. Engin. Syst. Technol, Vienna, Austria, 2021, pp. 135-141. https://www.scitepress.org/PublishedPapers/2021/101931/101931.pdf | |
| dc.relation | /*ref*/J. C. Vásquez-Correa, T. Arias-Vergara, J. R. Orozco-Arroyave, and E. Nöth, “A Multitask Learning Approach to Assess the Dysarthria Severity in Patients with Parkinson's Disease,” in Proceed. Interspeech, Hyderabad, India, 2018, pp. 456-460. https://doi.org/10.21437/Interspeech.2018-1988 | |
| dc.relation | /*ref*/J. R. Orozco-Arroyave, J. D. Arias-Londoño, J. F. Vargas-Bonilla, M. C. Gonzalez-Rátiva, and E. Nöth, “New Spanish speech corpus database for the analysis of people suffering from Parkinson's disease,” in Proceed. LREC, 2014, pp. 342-347. https://www5.informatik.uni-erlangen.de/Forschung/Publikationen/2014/Orozco14-NSS.pdf | |
| dc.relation | /*ref*/C. G. Goetz et al., “Movement Disorder Society‐sponsored revision of the Unified Parkinson's Disease Rating Scale (MDS‐UPDRS): scale presentation and clinimetric testing results,” Movem. Disord., vol. 23, no. 15, pp. 2129-2170, Nov. 2008. https://doi.org/10.1002/mds.22340 | |
| dc.relation | /*ref*/Universidad del Sarre, and Hospital Universitario de Essen, “Saarbrücken Voice Database,” stimmdb.coli. Accessed: Jun. 20. 2024. [Online]. Available: https://stimmdb.coli.uni-saarland.de/ | |
| dc.relation | /*ref*/G. Dimauro, V. Di Nicola, V. Bevilacqua, D. Caivano, and F. Girardi, “Assessment of speech intelligibility in Parkinson’s disease using a speech-to-text system,” IEEE Acc., vol. 5, pp. 22199-22208, Oct. 2017. https://doi.org/10.1109/ACCESS.2017.2762475 | |
| dc.relation | /*ref*/D. A. López-Santander, C. David Rios-Urrego, C. Bergler, E. Nöth, and J. R. Orozco-Arroyave, “Robust Classification of Parkinson’s Speech: An Approximation to a Scenario With Non-controlled Acoustic Conditions,” in Text, Speech, and Dialogue. TSD 2024. Lecture Notes in Computer Science, E. Nöth, A. Horák, P. Sojka, Eds., Cham, Switzerland: Springer, 2024, pp. 252-262. https://doi.org/10.1007/978-3-031-70566-3_22 | |
| dc.relation | /*ref*/K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in 2016 IEEE Conf. Comput. Vision Pattern Recogn. (CVPR), Las Vegas, USA, 2016, pp. 770-778. https://doi.org/10.1109/CVPR.2016.90 | |
| dc.relation | /*ref*/S. Ruder, “An Overview of Multi-Task Learning in Deep Neural Networks,” arXiv: 1706.05098, 2017. https://doi.org/10.48550/arXiv.1706.05098 | |
| dc.relation | /*ref*/M. Fontana, M. Spratling, and M. Shi, “When multitask learning meets partial supervision: A computer vision review,” Proceed. IEEE, vol. 112, no. 6, pp. 516-543, Aug. 2024. https://doi.org/10.1109/JPROC.2024.3435012 | |
| dc.relation | /*ref*/G. Pironkov, S. Dupont, and T. Dutoit, “Multi-Task Learning for Speech Recognition: An Overview,” in ESANN 2016 Proceed. Europ. Symp. Artif. Neur. Net., Comput. Intellig. Mach. Learn., Bruges, Belgium, 2016, pp. 189-194. https://www.esann.org/sites/default/files/proceedings/legacy/es2016-154.pdf | |
| dc.relation | /*ref*/H. Harutyunyan, H. Khachatrian, D. C. Kale, G. Ver Steeg, and A. Galstyan, “Multitask learning and benchmarking with clinical time series data,” Scient. Data, vol. 6, no. 96, Jun. 2019. https://doi.org/10.1038/s41597-019-0103-9 | |
| dc.relation | /*ref*/S. Chen, Y. Zhang, and Q. Yang, “Multi-Task Learning in Natural Language Processing: An Overview,” arXiv: 2109.09138, 2021. https://doi.org/10.48550/arXiv.2109.09138 | |
| dc.relation | /*ref*/F. Amato, L. Borzì, G. Olmo, C. A. Artusi, G. Imbalzano, and L. Lopiano, “Speech impairment in Parkinson’s disease: acoustic analysis of unvoiced consonants in Italian native speakers,” IEEE Acc., vol. 9, pp. 166370-166381, Dec. 2021. https://doi.org/10.1109/ACCESS.2021.3135626 | |
| dc.rights | Derechos de autor 2025 TecnoLógicas | es-ES |
| dc.rights | https://creativecommons.org/licenses/by-nc-sa/4.0 | es-ES |
| dc.source | TecnoLógicas; Vol. 28 No. 63 (2025); e3307 | en-US |
| dc.source | TecnoLógicas; Vol. 28 Núm. 63 (2025); e3307 | es-ES |
| dc.source | 2256-5337 | |
| dc.source | 0123-7799 | |
| dc.subject | aprendizaje profundo | es-ES |
| dc.subject | aprendizaje multitarea | es-ES |
| dc.subject | clasificación de habla patológica | es-ES |
| dc.subject | aprendizaje por transferencia | es-ES |
| dc.subject | deep learning | en-US |
| dc.subject | multitask learning | en-US |
| dc.subject | pathological speech classification | en-US |
| dc.subject | transfer learning | en-US |
| dc.title | Fine Tuned Multitasking Neural Network for Parkinson's Disease Detection from Voice Recordings | en-US |
| dc.title | Red neuronal multitarea para la detección de la enfermedad de Parkinson a partir de grabaciones de voz | es-ES |
| dc.type | info:eu-repo/semantics/article | |
| dc.type | info:eu-repo/semantics/publishedVersion | |
| dc.type | Research Papers | en-US |
| dc.type | Artículos de investigación | es-ES |
Archivos
Bloque original
1 - 3 de 3
Cargando...
- Nombre:
- 3307_Diagramado_Eng_V3.pdf
- Tamaño:
- 642.47 KB
- Formato:
- Adobe Portable Document Format