In this paper, we address the MediaEval 2014 Retrieving Diverse Social Images challenge, a filtering and refinement problem defined over a ranked set of social images retrieved from Flickr. We build upon the solutions proposed in [5] and focus mainly on jointly exploiting all available modalities. Image features extracted from a deep convolutional neural network, combined with distributed word representations, form the basis of our approach.