Exploring Segmentation Models for Chinese Ancient Landscape Paintings
Summary
Semantic segmentation is applied to a variety of tasks, such as analyzing road images and medical images and, more generally, any images in which the main objects must be separated from the background. With a high-performance semantic segmentation model, a computer can interpret an image almost as a human does: it can identify which objects the image contains and exactly where they are. This level of understanding allows many tasks to be automated, which saves manpower. In addition, the extracted context and location information lets deep learning models learn hidden features that are hard for humans to observe. State-of-the-art semantic segmentation models achieve high accuracy on tasks that are supported by large-scale training datasets. In this work, we explore a new domain for semantic segmentation: Chinese ancient landscape paintings. A good segmentation model for ancient paintings can make image retrieval for such paintings more convenient and systematic. It can also speed up and improve the process of art digitization, and it can encourage new classification systems that offer new perspectives for interpreting artworks. The main challenges are twofold. First, there is no annotated training data for Chinese landscape paintings. Second, the characteristics of these paintings make the task harder. Most are drawn in black ink, so color information is lacking. Large areas of blank space are also characteristic of Chinese landscape paintings; these areas tend to represent sky or water, so boundary information is lacking as well. Given these challenges, we first tested the state-of-the-art models U-Net, DeepLab-V2 and DeepLab-V3 on our manually annotated test set. Overall, DeepLab-V3 achieved the highest segmentation score among the models.
Furthermore, we propose to improve performance through text removal, style transfer, and the addition of elastic augmentation to the training procedure. Combining text removal with style transfer improved segmentation accuracy for the sky and water classes, while adding elastic data augmentation improved performance for the mountain class.
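To make the elastic augmentation step concrete, the following is a minimal sketch of one common way to elastically deform an image together with its label mask. It is not taken from this work: the function names (`elastic_deform`, `_smooth`), the parameters `alpha` (displacement strength) and `sigma` (smoothness of the displacement field), and their default values are all assumptions chosen for illustration. The sketch uses only NumPy and nearest-neighbour resampling so that label values in the mask are never blended; a practical pipeline would more likely interpolate the image bilinearly.

```python
import numpy as np


def _smooth(field, sigma):
    """Blur a 2-D field with a separable Gaussian kernel (edge-padded)."""
    radius = max(1, int(3 * sigma))
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    kernel /= kernel.sum()
    for axis in (0, 1):
        pad = [(0, 0), (0, 0)]
        pad[axis] = (radius, radius)
        padded = np.pad(field, pad, mode="edge")
        # "valid" convolution on the padded field restores the original size.
        field = np.apply_along_axis(
            lambda v: np.convolve(v, kernel, mode="valid"), axis, padded
        )
    return field


def elastic_deform(image, mask, alpha=80.0, sigma=4.0, seed=0):
    """Warp image and mask with the same smooth random displacement field.

    alpha and sigma are illustrative assumptions, not values from this work.
    """
    rng = np.random.default_rng(seed)
    h, w = mask.shape
    # Random per-pixel offsets, smoothed so that neighbouring pixels move together.
    dy = _smooth(rng.uniform(-1.0, 1.0, (h, w)), sigma) * alpha
    dx = _smooth(rng.uniform(-1.0, 1.0, (h, w)), sigma) * alpha
    rows, cols = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    # Nearest-neighbour lookup keeps mask labels intact (no mixed classes).
    src_r = np.clip(np.rint(rows + dy), 0, h - 1).astype(int)
    src_c = np.clip(np.rint(cols + dx), 0, w - 1).astype(int)
    return image[src_r, src_c], mask[src_r, src_c]
```

Because both arrays are indexed with the same source coordinates, each warped pixel keeps its original label, which is the property that makes this kind of augmentation usable for segmentation training. Setting `alpha=0` yields the unmodified inputs, which is a convenient sanity check.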