Learning visual representations is a fundamental problem in machine learning and computer vision. However, previous studies have focused mainly on improving performance on curated benchmark datasets such as ImageNet, which limits their applicability to real-world scenarios. In this paper, we propose two solutions to the difficulties of representation learning from uncurated datasets. First, we learn object-centric representations by separating objects from backgrounds in multi-object images, which removes scene biases and improves model robustness. Second, we employ semi-supervised learning on uncurated, unlabeled data, improving performance by leveraging large amounts of unlabeled examples. Concretely, for the first problem we develop object-centric learning techniques for unsupervised and patch-based models; for the second, we develop semi-supervised learning techniques for image classification and image-to-text models. Across a variety of experimental settings, the proposed techniques achieve strong performance by efficiently utilizing uncurated data.