Predicting road traffic speed is challenging because road types differ, speeds change abruptly, and roads depend spatially on one another; a model must capture dynamically changing spatial dependencies among roads as well as temporal patterns over long input sequences. This paper proposes a novel spatio-temporal graph attention network (ST-GRAT) that effectively captures the spatio-temporal dynamics of road networks. The novel aspects of our approach are mainly spatial attention, temporal attention, and spatial sentinel vectors. The spatial attention takes graph structure information (e.g., the distance between roads) and dynamically adjusts spatial correlations based on road states. The temporal attention captures changes in traffic speed, and the sentinel vectors allow the model either to retrieve new features from spatially correlated nodes or to preserve existing features. Experimental results show that ST-GRAT outperforms existing models, especially under difficult conditions in which traffic speeds change rapidly (e.g., rush hours). We additionally provide a qualitative study that analyzes when and where ST-GRAT tends to make accurate predictions during rush hours.
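
To make the role of the sentinel vectors concrete, the following is a minimal sketch of one way a sentinel can enter an attention step for a single road node. It is an illustration under stated assumptions, not the ST-GRAT implementation: the function name `sentinel_attention`, its parameters, and the choice to include the sentinel score in the softmax normalization are all hypothetical. The idea it shows is that the weights on neighboring roads sum to less than one, and the remaining weight goes to the sentinel, which stands in for the node's preserved features.

```python
import math

def sentinel_attention(scores, sentinel_score, neighbor_values, sentinel_value):
    """Blend neighbor features with a sentinel vector for one node.

    Hypothetical sketch: the sentinel score takes part in the softmax
    normalization, so the neighbor weights sum to less than 1 and the
    remainder is assigned to the sentinel (the preserved features).
    """
    # Subtract the maximum score for numerical stability.
    m = max(list(scores) + [sentinel_score])
    exp_scores = [math.exp(s - m) for s in scores]
    exp_sentinel = math.exp(sentinel_score - m)
    denom = exp_sentinel + sum(exp_scores)

    weights = [e / denom for e in exp_scores]  # weights on neighboring roads
    sentinel_weight = exp_sentinel / denom     # equals 1 - sum(weights)

    # Weighted sum of the sentinel vector and the neighbor feature vectors.
    dim = len(sentinel_value)
    out = [sentinel_weight * sentinel_value[d] for d in range(dim)]
    for w, v in zip(weights, neighbor_values):
        for d in range(dim):
            out[d] += w * v[d]
    return out, weights, sentinel_weight
```

In this sketch, a sentinel score that is large relative to the neighbor scores drives the output toward `sentinel_value` (existing features are preserved), while a small one lets the features of spatially correlated neighbors dominate.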