Human pose estimation vision-based method for upper limb rehabilitation

Pires, Francisco José Preto

http://hdl.handle.net/10400.22/26553


Files: Tese_5530_v3.pdf (41.89 MB, Adobe PDF)


Authors

Pires, Francisco José Preto

Advisor(s)

Silva, Manuel Fernando dos Santos

Abstract(s)

In recent years, the need for real-time, efficient human pose estimation has grown significantly in fields ranging from sports analytics and virtual reality to healthcare. Rehabilitation is a particularly pressing case, as one-third of stroke survivors become dependent on others in daily life. This work addresses this challenge by developing lightweight, resource-efficient deep learning models optimized for markerless, vision-based, real-time upper-limb stroke rehabilitation, as an alternative to cumbersome physical sensors and markers. Traditional pose estimation models, while accurate, are often too computationally expensive for real-time use, particularly on limited hardware. Three models were implemented and tested, each taking a distinct approach to 2D keypoint estimation: ResNet-18, U-Net, and Keypoint Region-based Convolutional Neural Network (R-CNN). The models were evaluated on multiple datasets, and their performance varied with the complexity of the data and the estimation method used. A key aspect of this research is the integration of temporal smoothing using a Long Short-Term Memory (LSTM) model, which aims to improve pose tracking by learning from past keypoint positions. This method improved smoothness for stable keypoints, but challenges remained in dynamic scenarios, particularly for the more mobile body parts. The results showed that the more accurate models, such as Keypoint R-CNN, required more processing time, making them less suitable for real-time scenarios, where speed is crucial. Additionally, the thesis includes efforts to improve system performance through camera calibration, improving the alignment between the color (RGB) and depth data of the RGB-D (Red Green Blue and Depth) camera system.
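The abstract does not describe the calibration procedure itself. As a hedged illustration of what calibration makes possible, the sketch below maps a depth pixel into the color image using the standard pinhole camera model: it back-projects the pixel to 3D with the depth camera's intrinsics, applies the depth-to-color extrinsic rotation and translation, and projects the point with the color camera's intrinsics. The `Intrinsics` class, all function names, and every numeric value are hypothetical, and lens distortion is ignored for brevity.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple

Point3 = Tuple[float, float, float]


@dataclass(frozen=True)
class Intrinsics:
    """Pinhole intrinsics in pixels (hypothetical values, normally obtained by calibration)."""
    fx: float
    fy: float
    cx: float
    cy: float


def deproject(u: float, v: float, depth_m: float, k: Intrinsics) -> Point3:
    """Back-project depth pixel (u, v) with metric depth into the depth camera's 3D frame."""
    return ((u - k.cx) * depth_m / k.fx, (v - k.cy) * depth_m / k.fy, depth_m)


def to_color_frame(p: Point3, rotation: Sequence[Sequence[float]],
                   translation: Sequence[float]) -> Point3:
    """Apply the depth-to-color extrinsics: p_color = R * p_depth + t."""
    x, y, z = (sum(rotation[i][j] * p[j] for j in range(3)) + translation[i]
               for i in range(3))
    return (x, y, z)


def project(p: Point3, k: Intrinsics) -> Tuple[float, float]:
    """Project a 3D point in the color camera frame onto the color image plane."""
    x, y, z = p
    if z <= 0:
        raise ValueError("point is behind the color camera")
    return (k.fx * x / z + k.cx, k.fy * y / z + k.cy)


def depth_pixel_to_color_pixel(u, v, depth_m, depth_k, color_k, rotation, translation):
    """Locate a depth pixel in the color image (lens distortion ignored)."""
    p_depth = deproject(u, v, depth_m, depth_k)
    return project(to_color_frame(p_depth, rotation, translation), color_k)


if __name__ == "__main__":
    k = Intrinsics(fx=600.0, fy=600.0, cx=320.0, cy=240.0)  # hypothetical intrinsics
    identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    offset = [0.05, 0.0, 0.0]  # hypothetical 5 cm horizontal offset between sensors
    print(depth_pixel_to_color_pixel(380, 240, 2.0, k, k, identity, offset))
    # prints (395.0, 240.0)
```

The example shows why calibration matters for alignment: even a few centimetres of offset between the sensors shifts a point at 2 m by 15 pixels in the color image, so skipping the extrinsic transform would pair keypoints with the wrong depth values.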
In summary, this work highlights the trade-offs between model complexity, accuracy, and speed, presenting a solution that advances human pose estimation while identifying areas for future improvement in both real-time processing and system optimization.
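The abstract does not specify the LSTM architecture used for temporal smoothing. The sketch below is one hypothetical way to implement the idea in PyTorch: a window of past 2D keypoint detections is fed to an LSTM, and a linear head predicts a correction to the newest detection. The class name, window length, hidden size, number of keypoints, the residual formulation, and the synthetic training data are all assumptions, and the model has to be trained (here, hypothetically, against reference poses) before its output is any smoother than the raw detections.

```python
import torch
import torch.nn as nn


class KeypointLSTMSmoother(nn.Module):
    """Hypothetical smoother: predicts the current 2D pose from a window of past detections."""

    def __init__(self, num_keypoints: int, hidden_size: int = 64):
        super().__init__()
        self.num_keypoints = num_keypoints
        # One LSTM layer over the flattened (x, y) coordinates of all keypoints.
        self.lstm = nn.LSTM(input_size=2 * num_keypoints,
                            hidden_size=hidden_size,
                            batch_first=True)
        self.head = nn.Linear(hidden_size, 2 * num_keypoints)

    def forward(self, window: torch.Tensor) -> torch.Tensor:
        # window: (batch, time, num_keypoints, 2) raw per-frame keypoint detections
        batch, time, k, _ = window.shape
        seq = window.reshape(batch, time, 2 * k)
        out, _ = self.lstm(seq)   # out: (batch, time, hidden_size)
        last = out[:, -1, :]      # hidden state after the newest frame
        # Assumed residual design: the network learns a correction to the newest
        # raw detection instead of regressing absolute coordinates from scratch.
        smoothed = seq[:, -1, :] + self.head(last)
        return smoothed.reshape(batch, k, 2)


def train_step(model: nn.Module, optimizer: torch.optim.Optimizer,
               noisy_windows: torch.Tensor, clean_targets: torch.Tensor) -> float:
    """One supervised step: noisy windows (B, T, K, 2) -> reference pose (B, K, 2)."""
    model.train()
    optimizer.zero_grad()
    loss = nn.functional.mse_loss(model(noisy_windows), clean_targets)
    loss.backward()
    optimizer.step()
    return loss.item()


if __name__ == "__main__":
    torch.manual_seed(0)
    K, T = 7, 10  # hypothetical: 7 upper-body keypoints, 10-frame window
    model = KeypointLSTMSmoother(num_keypoints=K)
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    clean = torch.rand(32, T, K, 2)                # synthetic stand-in trajectories
    noisy = clean + 0.01 * torch.randn_like(clean)  # simulated detection jitter
    for _ in range(5):
        loss = train_step(model, opt, noisy, clean[:, -1])
    print(f"loss after 5 steps: {loss:.4f}")
```

Because the smoother only sees past frames, it can run online, one window per incoming frame. The same design also hints at the trade-off reported in the abstract: a model trained on smooth windows tends to lag behind fast-moving keypoints, which matches the difficulty observed for the more mobile body parts.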


Description

I gratefully acknowledge the support provided by the AI-Care4U project at INESC TEC. This work was co-financed by Component 5 - Capitalization and Business Innovation of the core funding for Technology and Innovation Centres (CTI), integrated into the Resilience Dimension of the Recovery and Resilience Plan under the Recovery and Resilience Mechanism (MRR) of the European Union (EU), framed within the Next Generation EU, for the period 2021-2026.

Keywords

Human pose estimation; Real-time; Rehabilitation of the upper limb after stroke; Temporal smoothing; Neural networks