Abstract: | This thesis uses the receiver operating characteristic (ROC) curve to compare how well different biomarkers predict disease. We consider data in which both the covariate values and the disease status change over time. Biomarkers are usually compared by the area under the ROC curve (AUC), but with time-dependent covariates the AUC can differ from one time point to the next, so it may not clearly show which biomarker predicts better. We therefore adopt the volume under the ROC surface (VUS) as the criterion: the larger the volume, the better the predictive ability. The ROC curve is estimated by nearest neighbor estimation (NNE) of the bivariate distribution. In the simulation study we generate two biomarkers and ask which has the better predictive ability. The AUC values show that biomarker one predicts better than biomarker two, and that the linear combination of biomarker one and biomarker two predicts better than biomarker one alone; the VUS likewise shows that the linear combination has the better predictive ability. Two real data examples are given. The first compares CD4 cell count and viral load as predictors of AIDS; the AUC values show that CD4 cell count predicts better than viral load. The second uses medfly data to study the relationship between egg laying and aging, with three biomarkers: the total number of eggs laid during the lifetime, the time of maximum egg laying, and the number of eggs laid daily. The AUC values show that the total number of eggs and the number of eggs laid daily have the greater influence on medfly lifetime, while the VUS shows that the number of eggs laid daily has the greatest influence. |
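The relationship between time-specific AUC values and a single volume summary can be illustrated with a minimal sketch. It is not the estimator used in the thesis: it replaces NNE with a plain empirical (Mann-Whitney) AUC, ignores censoring, and assumes the volume is the integral of AUC(t) over time, divided by the length of the time window so that it stays between 0 and 1. The function names `empirical_auc` and `volume_under_surface` and the example numbers are hypothetical.

```python
import numpy as np


def empirical_auc(marker, diseased):
    """Mann-Whitney AUC at one time point: P(case marker > control marker).

    Ties count as 1/2. Assumes higher marker values indicate disease.
    Returns NaN if there are no cases or no controls.
    """
    marker = np.asarray(marker, dtype=float)
    diseased = np.asarray(diseased, dtype=bool)
    cases, controls = marker[diseased], marker[~diseased]
    if cases.size == 0 or controls.size == 0:
        return float("nan")
    # Compare every case with every control.
    diff = cases[:, None] - controls[None, :]
    return float(np.mean((diff > 0) + 0.5 * (diff == 0)))


def volume_under_surface(times, aucs):
    """Trapezoidal integral of AUC(t) over time, divided by the time span.

    The normalization is an assumption of this sketch, made so that
    volumes over different windows are comparable.
    """
    times = np.asarray(times, dtype=float)
    aucs = np.asarray(aucs, dtype=float)
    widths = np.diff(times)
    area = np.sum(widths * (aucs[:-1] + aucs[1:]) / 2.0)
    return float(area / (times[-1] - times[0]))


# Hypothetical AUC values for two biomarkers at three time points.
times = [0.0, 1.0, 2.0]
auc_marker_a = [0.80, 0.60, 0.70]
auc_marker_b = [0.65, 0.75, 0.70]
vus_a = volume_under_surface(times, auc_marker_a)
vus_b = volume_under_surface(times, auc_marker_b)
```

In this made-up example neither marker has the higher AUC at every time point, which is the situation the abstract describes; the volume gives one number per marker to compare.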