Video summarization techniques have long been proposed to give viewers a comprehensive understanding of a video's story. Although these traditional methods provide brief summaries, they do not offer concept-organized or structural views, and the knowledge they convey is often limited to the videos themselves. In this study, we present a structural video content summarization framework that uses four kinds of entities, "who," "what," "where," and "when," to organize video content. Relevant media associated with each entity are also retrieved from online resources to enrich the existing content. With this information, the structure of the story and its complementary knowledge can be built around the entities, so users can not only browse the video efficiently but also focus on what interests them. To construct the underlying system, we employ the maximum entropy criterion to integrate visual and text features extracted from video frames and speech transcripts, producing high-level concept entities. Shots are then linked according to their content to form a relational graph, on which a graph entropy model detects meaningful shots and relations. Finally, social network analysis based on the Markov clustering algorithm is performed to explore relevant information online. The results demonstrate that our system achieves excellent performance and information coverage.
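
As a rough illustration of the clustering step mentioned above, the sketch below shows a minimal Markov clustering (MCL) procedure applied to an entity relation graph. It is not the paper's implementation: the function name, parameter values, and toy adjacency matrix are hypothetical, and only the general expansion/inflation scheme of MCL is assumed.

```python
import numpy as np

def markov_cluster(adjacency, expansion=2, inflation=2.0, iters=100, tol=1e-6):
    """Minimal MCL sketch on a weighted adjacency matrix (hypothetical parameters)."""
    # Add self-loops and column-normalize to obtain a stochastic matrix.
    M = adjacency.astype(float) + np.eye(adjacency.shape[0])
    M /= M.sum(axis=0, keepdims=True)
    for _ in range(iters):
        prev = M.copy()
        # Expansion: simulate random-walk steps via a matrix power.
        M = np.linalg.matrix_power(M, expansion)
        # Inflation: element-wise power strengthens intra-cluster flow; re-normalize columns.
        M = M ** inflation
        M /= M.sum(axis=0, keepdims=True)
        if np.abs(M - prev).max() < tol:
            break
    # Nodes sharing the same attractor (row of maximal flow) form one cluster.
    clusters = {}
    for node, attractor in enumerate(M.argmax(axis=0)):
        clusters.setdefault(int(attractor), []).append(node)
    return list(clusters.values())

# Toy example: two loosely connected groups of entities.
A = np.array([
    [0, 1, 1, 0, 0],
    [1, 0, 1, 0, 0],
    [1, 1, 0, 1, 0],
    [0, 0, 1, 0, 1],
    [0, 0, 0, 1, 0],
])
print(markov_cluster(A))
```

In this toy setting the tightly connected entities tend to collapse onto a common attractor, which is the behavior the social network analysis step relies on when grouping related entities and their associated online media.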