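Abstract: | Falls can occur in the daily lives of older adults, inpatients, patients with specific conditions, and people with mobility impairments. They pose a serious threat to personal safety and quality of life, and they can also place a heavy physical and psychological burden on caregivers. Moreover, the later a fall is discovered, the lower the injured person's chance of full recovery may be. To detect falls as early as possible, this study proposes a data-driven fall detection model and a sensor fusion architecture. We combine a commercial webcam with a laboratory-built ultrasonic sensor array to develop a sensing device that automatically detects human falls, together with a corresponding probabilistic model.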
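In the proposed hybrid sensing architecture, the webcam and the ultrasonic array collect time-series signals of human movement in the up, down, left, and right directions (transverse) and in the forward and backward directions (longitudinal), respectively. For subject identification and tracking, two different algorithms are used: a Haar-feature-based cascade classifier and a channel and spatial reliability tracking (CSRT) tracker. The signals are then aligned in time and assembled into a motion trajectory map in three-dimensional (3D) space, after which the discrete fast data density functional transform (D-fDDFT) generates a probability map for fall recognition. The seven subjects in the experiment had an average height of 164.2 ± 12 cm and followed a prescribed movement pattern to simulate falls caused by dizziness under natural conditions. We used D-fDDFT to extract the number of data clusters and the corresponding cluster boundaries from their 3D motion time-series data, and then used these motion features to build a probabilistic model for recognizing human falls.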
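With D-fDDFT feature extraction and classification, the probabilistic model visualizes three possible states of human motion: normal movement, the transition preceding a fall, and the completed fall. In data validation, the hybrid sensing system achieved an accuracy of 90%, a sensitivity of 90%, and a precision of 95%, and the average time from predicting a possible fall to issuing an alarm was about 0.7 seconds. These results show that the detection performance of the hybrid sensing system is comparable to that of other current methods and confirm the practical feasibility of the system. In addition, because human posture tracking and fall detection are still dominated by image-based techniques combined with deep convolutional neural networks (CNNs), we propose a new scheme that combines object detection with stochastic variational inference (SVI): a lightweight single-shot multi-box detector (SSD) neural network is constructed to reduce the size of the recognition model and increase inference speed, so that the technique can be applied to rapid human posture recognition.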
Technically, we adopt an integer-arithmetic-only (IOA) algorithm to reduce the computational complexity of model training and a feature pyramid network (FPN) to better capture the features of small objects. A self-attention mechanism extracts features from consecutive human motion frames, namely the centroid coordinates of the detection boxes. Through a Bayesian neural network and stochastic variational inference, human postures can then be classified in real time by rapidly resolving the data clusters generated by a Gaussian mixture model (GMM). The model takes real-time centroid features as inputs and displays the possible human postures at different positions on the probability map. Compared with the ResNet model used as the baseline, our model has a higher mean average precision (mAP: 34.6 vs. 32.5), a faster inference speed (27 vs. 48 milliseconds), and a smaller model size (46.2 vs. 227.8 MB), and it can issue an alert about 0.66 seconds before a suspected fall occurs.;Elderly people, inpatients, individuals with specific conditions, and those with mobility impairments are at high risk of falls, which can severely affect their health and quality of life and place a significant burden on their caregivers. The later a fall is discovered, the slimmer the chance of full recovery. We develop a data-driven system with a hybrid sensing mechanism and a probabilistic model to enable the automatic detection of fall incidents. In the dual sensing platform, a webcam and an ultrasonic array respectively capture the target subject's transverse and longitudinal time-series signals, which are assembled into a three-dimensional (3D) motion trajectory map. Two different detection and tracking algorithms are utilized to identify the target subject. The synergy of the Haar-feature-based cascade classifier and the channel and spatial reliability tracking (CSRT) tracker ensures continuous face tracking of a moving subject. The average height of the 7 normal subjects participating in our study is 164.2 ± 12 cm. We apply the discrete fast data density functional transform (D-fDDFT) to the 3D motion data to estimate the number of clusters and their corresponding boundaries, and employ a Gaussian mixture model (GMM) as the kernel of D-fDDFT. These features are then used to construct a probabilistic model that visually displays three possible motion states: normal movement, transition from normal to fall, and fall.
The hybrid sensing system achieves an accuracy of 90%, a sensitivity of 90%, and a precision of 95% during data validation. The average time from a fall to the alarm being triggered is approximately 0.7 seconds. These key results not only demonstrate detection performance comparable to contemporary methods but also validate the feasibility of the proposed system. Since image-based approaches employing deep neural networks remain mainstream for posture detection, we further establish a novel framework combining object detection techniques with stochastic variational inference (SVI). By constructing lightweight neural network models, we aim to reduce model size and improve inference speed, facilitating their application in rapid human posture recognition. We adopt the integer-arithmetic-only (IOA) algorithm to lower the computational complexity during model training and utilize the feature pyramid network (FPN) to capture features of small objects. The self-attention mechanism is employed to extract features from continuous human motion frames, namely the centroid coordinates of the bounding boxes. By integrating Bayesian neural networks with SVI, human postures can be classified promptly by efficiently resolving a Gaussian mixture model (GMM). With the instant centroid features as inputs, the potential human postures can be displayed on probabilistic maps. Compared to the ResNet model, which serves as a benchmark, our model demonstrates superior performance with higher mean average precision (mAP: 34.6 vs. 32.5), faster inference speed (27 vs. 48 milliseconds), and smaller model size (46.2 vs. 227.8 MB). Additionally, the model can issue an alert approximately 0.66 seconds before a suspected fall event occurs. |
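The abstract says that the webcam's transverse signal and the ultrasonic array's longitudinal signal are combined into a single 3D motion trajectory. The sketch below illustrates one way such a fusion step might look. It is not the authors' implementation: the function names, the choice of the camera timestamps as the common time base, and the use of linear interpolation for the ultrasonic readings are all assumptions made for illustration, and both timestamp lists are assumed to be strictly increasing.

```python
from bisect import bisect_left


def interpolate(t, times, values):
    """Linearly interpolate values(times) at time t, clamping at both ends.

    Assumes `times` is strictly increasing (hypothetical helper).
    """
    if t <= times[0]:
        return values[0]
    if t >= times[-1]:
        return values[-1]
    i = bisect_left(times, t)  # times[i - 1] < t <= times[i]
    t0, t1 = times[i - 1], times[i]
    w = (t - t0) / (t1 - t0)
    return values[i - 1] + w * (values[i] - values[i - 1])


def fuse_trajectory(cam_t, cam_xy, sonic_t, sonic_z):
    """Build a 3D trajectory [(t, x, y, z)] on the camera time base.

    cam_t   : camera frame timestamps (seconds)
    cam_xy  : (x, y) position of the tracked subject in each frame
    sonic_t : ultrasonic sample timestamps (seconds)
    sonic_z : ultrasonic distance readings (longitudinal axis)
    """
    return [
        (t, x, y, interpolate(t, sonic_t, sonic_z))
        for t, (x, y) in zip(cam_t, cam_xy)
    ]


if __name__ == "__main__":
    # Made-up sample values, purely to show the call shape.
    trajectory = fuse_trajectory(
        cam_t=[0.0, 0.1, 0.2],
        cam_xy=[(10, 20), (11, 19), (12, 15)],
        sonic_t=[0.0, 0.2],
        sonic_z=[100.0, 80.0],
    )
    for point in trajectory:
        print(point)
```

Resampling onto the camera time base keeps one trajectory point per video frame. A real system might instead resample both streams onto a fixed-rate clock or filter the ultrasonic readings first.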