Deep learning-based pedestrian detectors are now deployed in a wide range of applications, including surveillance cameras and autonomous vehicles. However, their limited generalizability remains a problem. Recently, it has been shown that leveraging the knowledge of large-scale models can improve the generalizability of pedestrian detectors. However, the existing method extracts pedestrian knowledge from a large-scale model using only a single pedestrian dataset. In this paper, we propose a data curation method that gathers clean and diverse pedestrian instances from multiple pedestrian datasets. To filter out noisy pedestrian instances, we propose a CLIP-based Pedestrian Filtering Module (CPFM), which exploits the image-text alignment of the CLIP model to identify and discard noisy instances. Through extensive experiments on various pedestrian datasets, we demonstrate the effectiveness and generalizability of the proposed method.
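As a rough illustration of the filtering idea, the sketch below scores cropped instances by cosine similarity between their image embeddings and a "pedestrian" text embedding, keeping only crops above a threshold. This is a minimal sketch under stated assumptions, not the paper's CPFM: the embeddings here are toy vectors, whereas a real pipeline would obtain them from CLIP's image and text encoders, and the function names and the threshold value are hypothetical.

```python
import numpy as np

def cosine_similarity(a, b):
    # Cosine similarity between an image embedding and a text embedding.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def filter_instances(image_embeds, pedestrian_text_embed, threshold=0.25):
    # Keep indices of crops whose embedding aligns with the "pedestrian"
    # text embedding above the similarity threshold; drop the rest as noise.
    # (Hypothetical helper; the actual CPFM criterion is defined in the paper.)
    kept = []
    for i, emb in enumerate(image_embeds):
        if cosine_similarity(emb, pedestrian_text_embed) >= threshold:
            kept.append(i)
    return kept

# Toy 3-D embeddings; real CLIP embeddings are much higher-dimensional.
text = np.array([1.0, 0.0, 0.0])
crops = [np.array([0.9, 0.1, 0.0]),   # pedestrian-like crop: high similarity
         np.array([0.0, 1.0, 0.0])]   # noisy crop: near-zero similarity
print(filter_instances(crops, text))  # → [0]
```

In practice the threshold would be tuned per dataset, since CLIP similarity scores for in-distribution crops tend to cluster in a narrow range.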