This paper introduces a framework for an augmented reality (AR) telepresence system that uses virtual avatars as the medium for interaction between remote users. Our system is built with only commodity hardware for sensing and display: the Kinect for Windows v2 for capturing a user's motion and detecting objects, and the Oculus Rift DK2 with the Ovrvision for displaying camera images augmented with a virtual avatar. Each site is equipped with the same hardware setup, and each user sees a virtual avatar representing the remote user overlaid on their own local environment. We demonstrate our system with two experiments: a handshake between a user and a virtual avatar, and the generation of an avatar's sitting motion on a chair that differs from the one at the remote site.