dc.description.abstract | To ensure the sustainability of NHI, all citizens who meet the insurance qualifications should be insured and pay premiums. The universal coverage rate has reached 99.9%, but the premium collection rate is lower. The issue of arrears should therefore be addressed actively. Doing so not only makes effective use of resources under limited administrative funds and maximizes the recovery of arrears, but also urges the insured to fulfill their obligation to pay premiums.
This research therefore aims to use machine learning to identify accurately the insured units for which collection strategies can be applied effectively, in order to increase the premium collection rate. The research subject is the 2019 arrear data of the insured units of the northern division of NHIA, which serves as the training dataset for the prediction model. Models were built both without dimension reduction and with dimension reduction by feature selection (information gain, genetic algorithm), using 22 dimensions: 3 features of the arrear, 13 features of the insured unit, and 6 features of the person in charge. Single classifiers (CART decision tree, multi-layer perceptron, and support vector machine) and ensemble learning methods (random forest, Bagging, and AdaBoost) were then used to build prediction models for NHI premium payment by insured units after an arrear reminder.
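As an illustration of the information gain criterion mentioned above, the following minimal sketch ranks two categorical features by how much they reduce the entropy of a binary payment label. The feature names (`prior_arrears`, `unit_type`) and the toy values are hypothetical and are not taken from the study's dataset; the study's actual tooling is not stated in the abstract.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(feature_values, labels):
    """Entropy of the labels minus the weighted entropy after splitting on the feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Hypothetical toy data: 1 = paid within one year, 0 = did not pay.
paid = [1, 1, 1, 0, 0, 0, 1, 0]
prior_arrears = ["no", "no", "no", "yes", "yes", "yes", "no", "yes"]  # separates the label perfectly
unit_type = ["a", "b", "a", "b", "a", "b", "b", "a"]  # carries no information about the label

gains = {
    "prior_arrears": information_gain(prior_arrears, paid),
    "unit_type": information_gain(unit_type, paid),
}
# Features sorted from most to least informative; low-gain features are candidates for removal.
ranked = sorted(gains, key=gains.get, reverse=True)
```

Features with gain near zero contribute little to the prediction, which is the rationale for dropping them during dimension reduction.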
The classifier predicts whether an insured unit will pay the premium within one year after the grace period following an arrear reminder. To target reminders more accurately, we propose an improvement strategy for units predicted not to pay the arrears within one year after the grace period: send the arrear reminder by double registered mail instead of ordinary mail. This strategy not only saves the postage for ordinary mail but, more importantly, achieves delivery at least 4 months earlier, so the subsequent administrative execution process can be accelerated to secure priority of compensation, which can raise the premium collection rate.
A comparison of the AUC value and model building time of each classifier shows that random forest performs best, followed by Boosting combined with CART, Bagging combined with CART, and CART alone. That is, ensemble learning indeed outperforms a single classifier. For the random forest model, the AUC reaches 0.974 with or without dimension reduction, indicating excellent discrimination, and a t-test shows no significant difference between the two. In contrast, the multi-layer perceptron and support vector machine perform relatively poorly owing to the large size of the dataset.
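For readers unfamiliar with the AUC metric used in this comparison, the sketch below computes it directly from its rank interpretation: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The scores and the names `model_a` and `model_b` are hypothetical and are unrelated to the classifiers or results reported here.

```python
def auc(scores, labels):
    """AUC as the fraction of positive/negative pairs ranked correctly (ties count half)."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical scores from two classifiers on the same six insured units.
labels = [1, 1, 1, 0, 0, 0]
model_a = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1]  # ranks every payer above every non-payer
model_b = [0.9, 0.4, 0.7, 0.5, 0.2, 0.1]  # one payer is ranked below one non-payer

auc_a = auc(model_a, labels)
auc_b = auc(model_b, labels)
```

An AUC of 1.0 means perfect ranking and 0.5 means ranking no better than chance, so a value such as 0.974 indicates that nearly all payer/non-payer pairs are ordered correctly.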
To verify prediction performance on new data, the arrear data of January and February 2020 are used as the test dataset. Among the random forest models, the one using information gain performs best, with an AUC of 0.828, slightly higher than the 0.827 obtained without dimension reduction. In addition, feature selection by information gain reduces the number of dimensions, which both reduces storage space and allows models to be built relatively quickly. Overall, the random forest model with information gain is the best classification prediction model. The results of the study can be provided to NHIA as a basis for monitoring arrears to improve the premium collection rate. | en_US |