Predicting a wind turbine's responses to a complex wind field is challenging because the responses arise from the interaction between a dynamically operating mechanical system and a spatially and temporally coupled stochastic wind field. We propose a physics-inspired, data-driven prediction model called stacked dilated convolutional LSTMs (SDCL) that takes a sequence of wind fields (snapshots) as input and predicts future wind turbine responses. An SDCL is composed of a set of dilated convolutional neural networks (CNNs), each combined with a long short-term memory (LSTM) module, to capture the spatial and temporal evolution of the turbulence structure in the input wind field. Each component of SDCL, a dilated CNN with its own dilation ratio paired with a corresponding LSTM module, is designed to capture the evolution of eddies of a certain size in the turbulent wind field. Together, these components allow SDCL to model the evolution of multiple eddies of different sizes. Through a simulation study, we demonstrate that this physics-inspired network architecture is effective in processing a complex wind field and predicts two representative future wind turbine responses, energy generation and blade root out-of-plane bending moment, more accurately than other standard deep learning architectures.
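To illustrate why different dilation ratios correspond to eddies of different sizes, the following is a minimal sketch of a dilated 2D convolution in plain Python. It is not the SDCL implementation: the function name `dilated_conv2d`, the 9x9 `snapshot` grid, the averaging kernel, and the dilation values 1, 2, and 4 are all assumptions chosen for illustration, and the LSTM stage is omitted. The sketch only shows that a fixed 3x3 kernel covers a wider patch of the wind field as the dilation grows.

```python
def dilated_conv2d(field, kernel, dilation):
    """Valid (unpadded) 2D cross-correlation with a square kernel and a given dilation.

    Returns the output grid and the spatial span (in grid cells) seen by one output value.
    """
    k = len(kernel)
    span = dilation * (k - 1) + 1  # receptive-field width of a single output value
    rows, cols = len(field), len(field[0])
    out = []
    for i in range(rows - span + 1):
        out_row = []
        for j in range(cols - span + 1):
            acc = 0.0
            for a in range(k):
                for b in range(k):
                    # Kernel taps are spaced `dilation` cells apart.
                    acc += kernel[a][b] * field[i + a * dilation][j + b * dilation]
            out_row.append(acc)
        out.append(out_row)
    return out, span


# Hypothetical 9x9 wind-speed snapshot (illustrative values, not data from the study).
snapshot = [[float(r + c) for c in range(9)] for r in range(9)]

# Simple 3x3 averaging kernel; a trained network would learn these weights.
mean_kernel = [[1.0 / 9.0] * 3 for _ in range(3)]

# One branch per assumed dilation ratio: record (span, output rows, output cols).
features = {}
for d in (1, 2, 4):
    out, span = dilated_conv2d(snapshot, mean_kernel, d)
    features[d] = (span, len(out), len(out[0]))

for d, (span, n_rows, n_cols) in features.items():
    print(f"dilation={d}: span={span} cells, output={n_rows}x{n_cols}")
```

With the same nine weights, the branch with the largest dilation responds to structure spanning the whole snapshot, while the branch with dilation 1 responds only to local structure. In an architecture of the kind described above, the feature sequence from each such branch would be passed to its own recurrent module.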