Robustly understanding traffic scenes, as a basis for executing driving strategies and planning routes, is a cornerstone of autonomous driving, and the bird view is an essential component for creating panoramas of the surroundings. Since there is a large gap between bird views and other views, such as the front view, synthesizing the corresponding bird view is quite challenging.
Generative adversarial networks (GANs), which have developed rapidly in recent years, employ a minimax game between a generator module and a discriminator module for image conversion and synthesis. Accordingly, this dissertation applies a new framework for synthesizing bird views for modern autonomous driving:
Firstly, inspired by the correspondence between pixels, this dissertation applies a pixel-level GAN to achieve one-to-one generation from a front view to the related bird view. In the generator module, unlike the original GAN, which uses random vectors as input, the proposed method uses an encoder and a decoder constructed from convolutional neural networks; it directly takes the source-domain image as input and retains its semantic characteristics. In the discriminator module, in addition to the real/fake discriminator, the proposed network adds another discriminator, called the identification discriminator, to strengthen the correlation between the source domain and the target domain and avoid the loss of identification information.
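The two-discriminator objective can be sketched in simplified form: the generator is penalized by both the real/fake discriminator and the identification discriminator, while each discriminator is trained to separate real pairs from synthesized ones. This is a minimal NumPy sketch, not the dissertation's implementation; the function names, the sigmoid-score inputs, and the weighting factor `lam` are assumptions introduced for illustration.

```python
import numpy as np

def bce(pred, target):
    # Binary cross-entropy over sigmoid scores in (0, 1).
    eps = 1e-7
    pred = np.clip(pred, eps, 1 - eps)
    return float(-np.mean(target * np.log(pred) + (1 - target) * np.log(1 - pred)))

def generator_loss(d_rf_fake, d_id_fake, lam=1.0):
    # The generator tries to make BOTH discriminators score its output as real (1):
    # the real/fake term enforces realism, the identification term (weighted by the
    # hypothetical factor lam) enforces correspondence with the source front view.
    return (bce(d_rf_fake, np.ones_like(d_rf_fake))
            + lam * bce(d_id_fake, np.ones_like(d_id_fake)))

def discriminator_loss(d_real, d_fake):
    # Each discriminator separates real samples (label 1) from generated ones (label 0).
    return bce(d_real, np.ones_like(d_real)) + bce(d_fake, np.zeros_like(d_fake))
```

A generator whose outputs fool both discriminators (scores near 1) incurs a low loss, while outputs scored near 0 incur a high one, which is the minimax pressure the text describes.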
Secondly, we use a dataset from the Grand Theft Auto V (GTA5) video game that closely resembles real-world autonomous driving scenes. The camera automatically toggles between the front view and the bird view at each time step; the paired images captured in the same frame, which have low similarity to each other, are then packed into the training set and the test set. In order to output the related bird view, a method for fine-tuning the network is discussed, covering the design of its layers, parameters, and a reasonable number of epochs. Additionally, various front views from more complex scenes are applied for testing; according to the parameter setting, epoch setting, and architecture optimization, the corresponding bird views are generated.
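The pairing-and-splitting step above can be sketched as follows. The file-naming scheme (`front_*.png` / `bird_*.png`), the split ratio, and the helper names are hypothetical, introduced only to illustrate how same-frame view pairs might be grouped and divided into training and test sets.

```python
import random

def make_pairs(frame_ids):
    # Hypothetical naming scheme: one front image and one bird image per frame,
    # paired because they were captured at the same time step.
    return [(f"front_{i:06d}.png", f"bird_{i:06d}.png") for i in frame_ids]

def split_pairs(pairs, test_ratio=0.2, seed=0):
    # Shuffle whole pairs (never splitting a front/bird pair across sets),
    # then carve off a test fraction; the seed keeps the split reproducible.
    rng = random.Random(seed)
    shuffled = pairs[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_ratio)
    return shuffled[n_test:], shuffled[:n_test]

pairs = make_pairs(range(10))
train_set, test_set = split_pairs(pairs)
```

Keeping each front/bird pair intact during the split matters: the network is trained on aligned view pairs, so separating the two halves of a frame would corrupt the supervision.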
Finally, an experimental evaluation is conducted based on the LPIPS (Learned Perceptual Image Patch Similarity) algorithm, which contains two modules: one calculates the distance between image patches, while the other computes the perceptual loss. The evaluation uses the LPIPS algorithm to calculate a difference score between the synthetic image and the real bird view; compared with other methods, the error is reduced by 40.96% on average. The parallax image is also visualized to build a distance map, so that a comprehensive and objective analysis of the pixel-level generative adversarial network can be made based on both the score and the distance map.
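The patch-distance idea behind LPIPS can be illustrated in simplified form: channel-normalize the deep features at each spatial position and average the squared differences. This NumPy sketch is only a conceptual stand-in, assuming feature maps of shape (channels, height, width); the real LPIPS metric additionally uses learned per-channel weights and features from a pretrained network.

```python
import numpy as np

def patch_distance(feat_a, feat_b):
    # LPIPS-style distance on one pair of feature maps (C, H, W):
    # normalize each spatial position's channel vector to unit length,
    # then average the squared differences over channels and positions.
    def unit(f):
        return f / (np.linalg.norm(f, axis=0, keepdims=True) + 1e-10)
    diff = unit(feat_a) - unit(feat_b)
    return float(np.mean(np.sum(diff ** 2, axis=0)))
```

Identical features give a distance of zero, and larger perceptual discrepancies between the synthesized and real bird view yield larger scores, matching how the evaluation ranks methods.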
In summary, the proposed network neither uses complex geometric transformations nor introduces multiple intermediate views; it can be applied to the field of autonomous driving to realize the transformation from a front view into a high-resolution bird view in road environments.