dc.description.abstract | Since automatic music generation technology emerged, the field has seen a series of advances. From early neural networks such as DNNs and CNNs to more recent developments in GANs and LSTMs, each technique has brought new possibilities to music composition.
Recently, the application of GPT-2 (Generative Pre-trained Transformer 2) has attracted particular attention. GPT-2 is a pre-trained language model based on the Transformer architecture;
it was initially used for natural language processing tasks and has been extended to music
generation in recent years. Compared to other techniques, GPT-2 offers significant advantages,
such as the ability to train on large amounts of music data to better understand musical structure
and style. Because it is pre-trained, it can generate music segments that are more fluent and
natural and that show greater creativity and diversity in composition. Additionally, GPT-2
exhibits good scalability and can be applied to various types and styles of music generation
tasks. However, using GPT-2 for music generation also poses challenges: inherent biases in the model or an incomplete understanding of music can leave the generated music lacking
emotional expression or creativity, so further post-processing and adjustment are required.
CycleGAN (Cycle-Consistent Generative Adversarial Network) has also been applied to music generation. CycleGAN uses generative
adversarial networks for unpaired image-to-image translation and helps ensure the coherence
and authenticity of generated content by introducing a cycle-consistency loss. This technique is
particularly useful in music generation for transforming one musical style into another. For example, CycleGAN can learn to generate music segments with specific stylistic features, which
makes it especially effective at preserving the characteristics of traditional music. Compared to GPT-2, CycleGAN demonstrates advantages in capturing and maintaining stylistic consistency in music
but may be slightly lacking in creativity and diversity. Therefore, combining the strengths of
GPT-2 and CycleGAN can achieve a more balanced effect in music generation, producing natural and creative music segments while preserving the original style features. In this study, we
explored the use of both GPT-2 and CycleGAN techniques to automatically generate percussion
rhythms corresponding to the main melodies of Chinese Jiangnan silk and bamboo music. We
extracted music segments from traditional Jiangnan silk and bamboo music and used this data
to train the GPT-2 model to generate unique percussion rhythms. Additionally, we employed
CycleGAN, which generates percussion rhythms that match the Jiangnan silk
and bamboo style by learning transformations between different musical styles. Compared to
GPT-2, CycleGAN performs well in capturing stylistic features and maintaining consistency in
musical structure, but may be slightly lacking in creativity and diversity. We found that GPT-2
has advantages in generating diverse and innovative rhythms, whereas CycleGAN excels at generating coherent, stylized rhythms that preserve the original style features. Combining the
strengths of both can further improve the quality of percussion rhythm generation, providing a
new path for the automation of Chinese classical music composition. | en_US |