dc.description.abstract | In previous research, the authors treated each operand in a KPI formula as a query word and searched for similar descriptions of entity attributes to form KPI candidates. However, an operand in a KPI formula contains only two or three words, which are too few to form an effective query. This can degrade the mining results and reduce the number of KPI candidates generated. After generating KPI candidates, we found that some aggregate-value operands could not be mapped to any attribute. Furthermore, we found that some KPI candidates were dropped because the data types of their operands were not numeric. When the previous authors generated the dimensional model, they considered only the entities adjacent to the fact table as dimension tables. They ignored other entities connected to those adjacent entities, which may cause information loss.
In this research, to increase the number of KPI candidates generated, we expand the operand words of existing KPIs in a Data Warehouse (DW) system with their synonyms from a lexical database and attach these synonyms to the operand words as query words. Using text mining, we modified TFIDF to compare the similarity between the descriptions of entity attributes and the query words. We also modified the predefined structure used to generate KPI candidates so that the operand set is switched according to the operator. To reduce the number of meaningless KPI candidates, we filter out entity attributes with uncountable (non-numeric) data types. Eliminating entity attributes with improper data types also makes the TFIDF weighting more precise. For aggregated operands that cannot be mapped to any attribute, we propose an algorithm that disaggregates them to find matching descriptions of entity attributes. Finally, we improve the dimensional model by merging the entities connected to each dimension table according to their hierarchy.
| en_US |
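The abstract above combines three matching steps: synonym expansion of operand words, filtering out attributes with non-numeric types, and TF-IDF similarity between the expanded query and attribute descriptions. The sketch below illustrates how these steps fit together under stated assumptions. It uses a hard-coded `SYNONYMS` table (hypothetical) in place of the lexical database, an assumed `NUMERIC_TYPES` set for "countable" types, and plain smoothed TF-IDF with cosine similarity. It does not reproduce the authors' modified TFIDF, the operand-set switching, or the disaggregation algorithm.

```python
import math
import re
from collections import Counter

# Hypothetical stand-in for a lexical database such as WordNet.
SYNONYMS = {
    "revenue": ["income", "sales"],
    "cost": ["expense", "price"],
}

# Assumed set of countable (numeric) column types; not taken from the thesis.
NUMERIC_TYPES = {"int", "integer", "bigint", "decimal", "numeric", "float", "money"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def expand_query(operand):
    """Attach the synonyms of each operand word to the original words."""
    words = tokenize(operand)
    expanded = list(words)
    for w in words:
        expanded.extend(SYNONYMS.get(w, []))
    return expanded


def rank_attributes(operand, attributes):
    """Rank numeric attributes by TF-IDF cosine similarity to the expanded query.

    attributes: list of (name, description, data_type) tuples.
    Returns a list of (score, name) pairs, highest score first.
    """
    # Drop non-numeric attributes *before* weighting, so their
    # descriptions do not influence the IDF statistics.
    numeric = [a for a in attributes if a[2].lower() in NUMERIC_TYPES]
    docs = [Counter(tokenize(desc)) for _, desc, _ in numeric]
    n = len(docs)
    df = Counter(term for doc in docs for term in doc)
    # Smoothed IDF; query terms absent from every description get weight 0.
    idf = {t: math.log((1 + n) / (1 + d)) + 1 for t, d in df.items()}

    def vectorize(counts):
        return {t: c * idf.get(t, 0.0) for t, c in counts.items()}

    def cosine(u, v):
        dot = sum(w * v.get(t, 0.0) for t, w in u.items())
        nu = math.sqrt(sum(w * w for w in u.values()))
        nv = math.sqrt(sum(w * w for w in v.values()))
        return dot / (nu * nv) if nu and nv else 0.0

    query = vectorize(Counter(expand_query(operand)))
    scored = [
        (cosine(query, vectorize(doc)), name)
        for (name, _, _), doc in zip(numeric, docs)
    ]
    return sorted(scored, reverse=True)


attrs = [
    ("ORDERS.amount", "total sales amount of the order", "decimal"),
    ("ORDERS.note", "free text sales note", "varchar"),
    ("PRODUCT.unit_price", "unit price of the product", "money"),
]
print(rank_attributes("revenue", attrs))
```

In this toy input the operand "revenue" never occurs in any description, so without expansion every score would be zero. The synonym "sales" lets `ORDERS.amount` rank first, while `ORDERS.note` is excluded by its `varchar` type even though its description also mentions "sales".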