This thesis studies the problem of optimization and learning in graphical models through the lens of stochastic approximation theory.

First, in many multi-agent networked environments, the system can benefit from coordinating the actions of pairs of interacting agents, at some coordination cost; a primary goal is then to design a distributed algorithm that maximizes the net coordination effect. Such pairwise coordination gains and node-wise costs in the network can be captured by a graphical model framework, in which the task becomes one of finding the optimal graph parameters. We propose several distributed algorithms that require only one-hop message passing and that can be interpreted through either Lagrange duality or a game-theoretic framework. Our algorithms are motivated by a stochastic approximation method that runs a Markov chain only incompletely at each iteration, yet provably converges to the optimal solution.

Second, in the machine learning domain, for the problem of parameter learning in graphical models with latent variables, the standard approach, the EM algorithm, is computationally intractable for high-dimensional graphs in both the E and M steps. Since replacing either step with a faster surrogate to combat this intractability can often break convergence, we propose a new learning algorithm that is computationally efficient and provably converges to a correct optimum via multi-time-scale stochastic approximation theory; its key idea is to run only a few cycles of a Markov chain in each of the two steps. We demonstrate our theoretical findings through extensive simulations with synthetic and/or real-world datasets.
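To illustrate the core principle behind such schemes (this is a minimal toy sketch, not the thesis's actual algorithm), consider a Robbins-Monro iterate driven by a single Markov chain transition per update: the chain is never run to stationarity before each update, yet the iterate still converges to the stationary expectation. The two-state chain, its transition probabilities, and the function name `sa_markov_average` below are all illustrative assumptions.

```python
import random

def sa_markov_average(n_steps=200000, seed=0):
    # Toy two-state Markov chain with stationary distribution (2/3, 1/3);
    # we estimate E_pi[x] = 1/3 without ever letting the chain mix first.
    rng = random.Random(seed)
    P = {0: 0.8, 1: 0.4}  # P[x] = probability of transitioning to state 0
    x, theta = 0, 0.0
    for n in range(1, n_steps + 1):
        # Advance the chain by a single step: an "incomplete" run of the
        # chain interleaved with the optimization update.
        x = 0 if rng.random() < P[x] else 1
        # Robbins-Monro update with diminishing step size a_n = 1/n.
        theta += (x - theta) / n
    return theta
```

With the step size 1/n, the iterate equals the running average of the visited states, so by the ergodic theorem it approaches 1/3 despite the chain being advanced only one step per iteration.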
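The few-cycles-per-step idea can be sketched on a deliberately tiny model (a hedged illustration under strong simplifying assumptions, not the proposed algorithm): fitting theta in the one-variable exponential family p_theta(x) ∝ exp(theta·x), x ∈ {0,1}, where the intractable model expectation in the likelihood gradient is replaced by a sample obtained from only k Metropolis cycles. The names `few_cycle_fit` and `sigmoid` are hypothetical.

```python
import math
import random

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def few_cycle_fit(data_mean=0.7, n_iter=50000, k=2, seed=1):
    # Fit theta so that E_theta[x] = sigmoid(theta) matches data_mean.
    # The model expectation is never computed exactly; we only advance
    # a Metropolis chain k steps per iteration.
    rng = random.Random(seed)
    theta, x = 0.0, 0
    for n in range(1, n_iter + 1):
        # "E-like" step: k incomplete Markov chain (Metropolis) cycles.
        for _ in range(k):
            x_prop = 1 - x
            if rng.random() < min(1.0, math.exp(theta * (x_prop - x))):
                x = x_prop
        # "M-like" step: stochastic gradient of the log-likelihood,
        # data_mean - E_theta[x], with the chain sample x standing in
        # for the expectation; polynomially decaying step size.
        theta += (data_mean - x) / n ** 0.7
    return theta
```

Because theta moves on a slower time scale than the chain, the chain tracks its (slowly changing) stationary distribution, and the iterate converges to the maximum-likelihood solution sigmoid(theta) = data_mean even though no step ever waits for the chain to mix.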