Removal and Inpainting of Objects from Street-View Scenes using Diffusion Models
Summary
Inpainting is the process of reconstructing missing parts of an image so that the result looks convincing. This research, carried out in collaboration with Cyclomedia, investigates whether latent diffusion models (Rombach et al., 2022) can inpaint the regions left behind after an object has been removed from a street-view image. Cyclomedia's semantic object masks were refined using the SAM model (Kirillov et al., 2023) so that they cover each object accurately before inpainting. We evaluated whether fine-tuning improves the accuracy and quality of the inpainting results. We also proposed, implemented, and evaluated a partial loss function. Lastly, a feature-based measure of image complexity was used to assess the training data, and a model was trained on a subset of the most complex training images. The evaluation combined computational metrics with a qualitative user study. We found that fine-tuning improves the generative performance of the models, but that neither the partial loss nor the data filtering led to an improvement. We speculate on why that may be the case and give recommendations for future research.