Mixture Model Analysis of Massive Data Sets (龐大資料集之混合模型分析)


Name

Chien-Chang Wen (溫建璋)

Department

Graduate Institute of Statistics

Thesis Title

Mixture Model Analysis of Massive Data Sets (龐大資料集之混合模型分析)



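The author has authorized this electronic thesis for immediate open access.
The open-access full text is licensed to users only for personal, non-profit searching, reading, and printing for the purpose of academic research.
Please comply with the Copyright Act of the Republic of China. Do not reproduce, distribute, adapt, repost, or broadcast the work without authorization, so as to avoid violating the law.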

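Abstract

With advances in technology, the spread of information, and rising living standards, data sets in many industries may now contain hundreds of millions of records. When analyzing such massive data sets, however, the limited storage capacity of computing tools makes traditional methods infeasible. This thesis proposes a blockwise weighted-average method to replace the conventional parameter estimation in normal mixture models and mixture linear regression models. The data are partitioned into blocks, and the maximum likelihood estimates of the parameters are first obtained within each block by the EM algorithm. The variances of the block-level estimators are then taken into account, so that estimates from blocks with larger variance receive smaller weights, and the properties of the resulting estimator are investigated. A method for determining the number of components in massive data sets is also proposed.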
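The blockwise weighted-average idea described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation and rests on several assumptions: a two-component univariate normal mixture, combination of the component means only, a rough variance approximation for each block-level mean (component variance divided by the effective component size, rather than a variance derived from the information matrix), and alignment of components across blocks by sorting the means. The function names `em_two_normal` and `blockwise_weighted_means` are hypothetical.

```python
import numpy as np

def em_two_normal(x, n_iter=200, tol=1e-8):
    """EM for a two-component univariate normal mixture (illustrative only).

    Returns mixing weights, means, variances, and effective component sizes,
    with components ordered by increasing mean.
    """
    x = np.asarray(x, dtype=float)
    med = np.median(x)
    # Crude initialisation (an assumption): split the block at its median.
    mu = np.array([x[x <= med].mean(), x[x > med].mean()])
    var = np.array([x.var(), x.var()])
    pi = np.array([0.5, 0.5])
    prev = -np.inf
    for _ in range(n_iter):
        # E-step: posterior probability of each component for each point.
        dens = pi * np.exp(-0.5 * (x[:, None] - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)
        total = dens.sum(axis=1)
        resp = dens / total[:, None]
        # M-step: update weights, means, and variances.
        nk = resp.sum(axis=0)
        pi = nk / x.size
        mu = (resp * x[:, None]).sum(axis=0) / nk
        var = (resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk
        loglik = np.log(total).sum()
        if loglik - prev < tol:
            break
        prev = loglik
    order = np.argsort(mu)  # sort by mean to align labels across blocks
    return pi[order], mu[order], var[order], nk[order]

def blockwise_weighted_means(x, n_blocks):
    """Fit each block separately, then combine means with inverse-variance weights."""
    blocks = np.array_split(np.asarray(x, dtype=float), n_blocks)
    estimates, precisions = [], []
    for block in blocks:
        _, mu, var, nk = em_two_normal(block)
        estimates.append(mu)
        # Assumed approximation: Var(mu_hat_k) ~ var_k / nk, so precision = nk / var_k.
        precisions.append(nk / var)
    estimates, precisions = np.array(estimates), np.array(precisions)
    return (precisions * estimates).sum(axis=0) / precisions.sum(axis=0)

# Usage on simulated data (true component means 0 and 5).
rng = np.random.default_rng(0)
data = rng.permutation(np.concatenate([rng.normal(0, 1, 6000), rng.normal(5, 1, 4000)]))
print(blockwise_weighted_means(data, n_blocks=10))
```

Only one block needs to be held in memory at a time, which is the practical point of the approach. The inverse-variance weights give blocks with noisier estimates less influence on the combined estimate; setting all precisions equal would reduce this to a simple average of the block estimates.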

Keywords (Chinese)

★ Mixture model (混合模型)

Keywords (English)

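Table of Contents

Chapter 1 Introduction
1.1 Research Motivation
1.2 Literature Review and Research Methods
Chapter 2 Normal Mixture Models
2.1 Parameter Estimation with a Known Number of Components
2.2 Model Selection
2.3 Classification and Prediction
Chapter 3 Mixture Linear Regression Models
3.1 Mixture Linear Regression Models with Constant Component Probabilities
3.2 Mixture Linear Regression Models with Component Probabilities
Chapter 4 Blockwise Weighting Method
4.1 Blockwise Weighted-Average Method
4.2 Model Selection for Massive Data Sets
Chapter 5 Simulation Results and Case Study
5.1 Blockwise Simulations for Normal Mixture Models
5.1.1 Parameter Estimation with a Fixed Number of Components
5.1.2 Model Selection with an Unknown Number of Components
5.2 Simulations for Mixture Linear Regression Models
5.3 Credit Card Case Study
Chapter 6 Discussion and Future Research Directions
References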
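List of Tables
Table 5.1: Weighted-average estimates for the normal mixture model
Table 5.2: Coverage rates and interval-length ratios of interval estimates based on different weighted averages for the normal mixture model (95% confidence coefficient)
Table 5.3: Classification results for simulated data (5.1)
Table 5.4: Proportion of times each number of components is selected under different combinations for simulated data (5.2)
Table 5.5: Proportion of times each number of components is selected under different combinations
Table 5.6: Proportion of times each number of components is selected under different combinations
Table 5.7: Parameter estimates for Model I using different weighted averages
Table 5.8: Support rates for the number of components selected in Model I (1000 simulations)
Table 5.9: Coverage rates and interval-length ratios of weighted interval estimates for Model I using optimal and equal weights
Table 5.10: Classification results for Model I simulated data
Table 5.11: Support rates for the number of components selected in Model II (1000 simulations)
Table 5.12: Parameter estimates for Model II using different weighted averages
Table 5.13: Coverage rates and interval-length ratios of weighted interval estimates for Model II using optimal and equal weights
Table 5.14: Classification results for Model II simulated data
Table 5.15: Support rates for the number of components selected for monthly card spending within blocks ( )
Table 5.16: Weighted-average parameter estimates in the mixture distribution model of monthly card spending
Table 5.17: Support rates for the number of components selected for monthly card spending versus monthly household income
Table 5.18: Parameter estimates for monthly card spending versus monthly household income
Table 5.19: Parameter estimates when the number of components is one (i.e., the simple regression model)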
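List of Figures
Figure 5.1: Estimated density function of the simulated sample from model (5.2)
Figure 5.2: Scatter plot of the response variable against the explanatory variable for Model I
Figure 5.3: Histogram of monthly card spending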


Advisor

Tsai-Hung Fan (樊采虹)

Approval Date

2004-6-14
