Sparse representation of images and video has recently been an active research topic in data compression. The transforms most often used to compress images and video are the DCT (discrete cosine transform) and wavelet transforms. Because these transforms concentrate most of the energy of an image into a small number of low-frequency coefficients, they substantially improve compression performance. More recent studies improve compression further by learning a transform or a dictionary that is better suited to specific data, which yields sparser representations of that data than the existing fixed transforms. In this thesis, we study how to learn sparsifying transforms for directionally predicted pixel blocks in H.264/AVC and compare them with the existing linear transform models. Furthermore, by interpreting these linear transform models as neural networks with a single layer, we extend them to nonlinear sparsifying transforms based on multi-layer neural networks in order to obtain sparser representations. We compare the nonlinear sparsifying transforms with the linear sparsifying transforms in terms of their capability for compact representation.
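As a rough illustration of two ideas in this abstract (energy compaction by the DCT, and a linear transform viewed as a one-layer neural network without activation), the following NumPy sketch may help. The 8x8 block size, the synthetic smooth block, and the helper names dct_matrix and energy_in_top_k are assumptions made for illustration only; the sketch does not model H.264/AVC directional prediction or any learned transform from the thesis.

```python
import numpy as np


def dct_matrix(n):
    """Return the orthonormal DCT-II matrix of size n x n."""
    k = np.arange(n).reshape(-1, 1)  # frequency index (rows)
    i = np.arange(n).reshape(1, -1)  # sample index (columns)
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)       # DC row uses a different scale factor
    return c


def energy_in_top_k(values, k):
    """Fraction of total energy held by the k largest-magnitude entries."""
    energy = np.sort(values.ravel() ** 2)[::-1]
    return energy[:k].sum() / energy.sum()


N = 8  # assumed block size
C = dct_matrix(N)

# Hypothetical smooth block (a brightness ramp), standing in for image data.
rows, cols = np.meshgrid(np.arange(N), np.arange(N), indexing="ij")
block = 100.0 + 5.0 * rows + 3.0 * cols

# Separable 2-D DCT applied along rows and columns.
coeffs_2d = C @ block @ C.T

# The same transform written as one linear layer y = W x acting on the
# row-major flattened block, with fixed weights W = kron(C, C) and no bias.
W = np.kron(C, C)
coeffs_layer = W @ block.ravel()

print("layer form matches 2-D DCT:", np.allclose(coeffs_2d.ravel(), coeffs_layer))
print("energy in 4 largest DCT coefficients:", energy_in_top_k(coeffs_2d, 4))
print("energy in 4 largest pixels:", energy_in_top_k(block, 4))
```

In this view, learning a linear sparsifying transform corresponds to replacing the fixed weights W with trained ones, and a nonlinear transform corresponds to stacking further layers with activation functions on top; both steps are outside the scope of this sketch.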