dc.description.abstract | Hadoop is a common tool for building cloud platforms that can be accessed by end-users. In most cases, installing a Hadoop platform requires installing different software packages on tens to thousands of computing machines. With a traditional software deployment strategy, administrators must repeat similar installation procedures on each machine, which is inefficient and time-consuming. In addition, the software components distributed among the computing hosts may later need to be upgraded. Recently, a popular software deployment tool called Puppet has been developed to address the software deployment problem, and it can be applied to Hadoop installation. Puppet uses a server-driven model to install and update software packages on many computing machines: the deployment server periodically receives reports from the deployment clients (targets) and computes which software packages are required by which clients. Although this model simplifies software deployment, it still suffers from a scalability problem --- the server becomes an obvious performance bottleneck when the number of deployment clients is large. To overcome this problem, we have developed an automatic, client-driven deployment tool for Hadoop based on Puppet. First, the proposed deployment tool dispatches different installation rules from the deployment server to the client (target) machines. Then, each client periodically checks the server for the required versions of software packages and installs them if necessary. Compared with similar deployment tools for Hadoop, the proposed tool performs better when the number of deployment clients is relatively large. | en_US |