Feature selection is one of the most important problems in supervised learning, and many feature selection approaches have been proposed in the literature. Among them, a recent approach is to use a Gaussian process (GP), because a GP can capture well the hidden relevance between the input features and the output. However, existing GP-based feature selection approaches suffer from a scalability problem due to the high computational cost of GP inference. Moreover, they use the Kullback-Leibler (KL) divergence in the sensitivity analysis for feature selection, but we show in this paper that the KL divergence underestimates the relevance of important features in some cases of classification. To remedy these drawbacks of the existing GP-based approaches, we propose a new feature selection method based on a scalable variational Gaussian process (SVGP) and the L2 divergence. With the help of the SVGP, the proposed method exploits large data sets for feature selection through so-called inducing points while avoiding the scalability problem. Moreover, we provide a theoretical analysis that motivates the choice of the L2 divergence for feature selection in both classification and regression. To validate the performance of the proposed method, we compare it with other existing methods through experiments on synthetic and real data sets. (c) 2022 Elsevier B.V. All rights reserved.