Extracting man-made features such as buildings, roads, and water from aerial images is critical for urban planning, traffic management, and industrial development. Recently, convolutional neural networks (CNNs) have become a popular strategy for automatically capturing contextual features. Training CNNs requires a large amount of labeled data, but freely accessible datasets are difficult to use directly because of imperfect labeling. To address this issue, we build a large-scale dataset by pairing RGB aerial images of the Seoul metropolitan area in South Korea with digital maps that provide location information for roads, buildings, and water. The dataset contains 72,400 training and 9,600 test images. Based on this dataset, we design a multiobject segmentation system and propose an algorithm that adds pyramid pooling layers (PPLs) to U-Net. Test results indicate that U-Net with PPLs, called UNetPPL, learns fine-grained classification maps and outperforms both the fully convolutional network (FCN) and U-Net, achieving a mean intersection over union (mIOU) of 79.52 and a pixel accuracy of 87.61% on four types of objects (i.e., building, road, water, and background).
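The two evaluation metrics quoted above, mean intersection over union (mIOU) and pixel accuracy, are standard for semantic segmentation. A minimal NumPy sketch of how they are typically computed over integer label maps is shown below; the function names are illustrative and not taken from the paper.

```python
import numpy as np

def mean_iou(pred, target, num_classes):
    """Mean intersection over union across classes.

    pred, target: integer label maps of identical shape,
    with class indices in [0, num_classes).
    """
    ious = []
    for c in range(num_classes):
        p = pred == c
        t = target == c
        union = np.logical_or(p, t).sum()
        if union > 0:  # skip classes absent from both maps
            inter = np.logical_and(p, t).sum()
            ious.append(inter / union)
    return float(np.mean(ious))

def pixel_accuracy(pred, target):
    """Fraction of pixels whose predicted class matches the label."""
    return float((pred == target).mean())

# Toy 2x2 example with the paper's four classes
# (0 = background, 1 = building, 2 = road, 3 = water).
pred = np.array([[0, 1], [2, 3]])
target = np.array([[0, 1], [2, 2]])
print(mean_iou(pred, target, 4))   # 0.625
print(pixel_accuracy(pred, target))  # 0.75
```

In practice the per-class intersections and unions would be accumulated over the whole test set before averaging, rather than averaged per image.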