Advances in editing tools and compression technologies have made it easy to manipulate videos without leaving visible traces and then compress them with video codecs. Among the various forging operations, fine-grained manipulations such as filtering and noise addition accompany many forgery scenarios. Detecting the low-level features that such manipulations leave in videos, in order to spot forgeries, is a challenging task. Furthermore, when fine-grained manipulations are applied to videos, compression artifacts introduced by the codec are layered on top of them, making the manipulations even harder to classify. To overcome these obstacles, we propose a dual-path network (DPN) for identifying fine-grained manipulations in H.264 videos. The DPN consists of two single-path networks: one learns low-level features caused by manipulations, and the other uses a discrete cosine transform (DCT) histogram to capture block DCT-based compression artifacts. A fusion network combines the features learned in each path, enabling comprehensive learning of forensic clues. Experimental results indicate that the proposed DPN outperforms comparable baselines in multi-class classification. In addition, the proposed method can localize manipulated regions both temporally and spatially.
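
To make the DCT-histogram input of the second path more concrete, the following is a minimal sketch of how per-frequency histograms of block DCT coefficients could be computed from a single luma frame. It is not the authors' implementation: the function names `dct_matrix` and `block_dct_histogram`, the 8x8 block size, the number of bins, and the clipping range are all assumptions chosen for illustration, and the sketch works on decoded pixel values rather than on coefficients read from the H.264 bitstream.

```python
import numpy as np


def dct_matrix(n):
    """Return the orthonormal DCT-II basis matrix of size n x n."""
    k = np.arange(n).reshape(-1, 1)  # frequency index (rows)
    i = np.arange(n).reshape(1, -1)  # sample index (columns)
    m = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    m[0, :] = np.sqrt(1.0 / n)       # DC row uses a different scale factor
    return m


def block_dct_histogram(frame, block=8, bins=21, limit=50.0):
    """Histogram block DCT coefficients per frequency (illustrative sketch).

    frame : 2-D array holding one grayscale (luma) frame.
    block : block size; 8 is an assumption, not taken from the paper.
    bins, limit : hypothetical histogram settings; coefficients are
        clipped to [-limit, limit] before binning.

    Returns an array of shape (block * block, bins) in which each row is
    the normalized histogram of one DCT frequency across all blocks.
    """
    h, w = frame.shape
    h, w = h - h % block, w - w % block  # drop incomplete border blocks
    d = dct_matrix(block)

    # Split the frame into non-overlapping blocks: (rows, cols, block, block).
    blocks = (
        frame[:h, :w]
        .astype(np.float64)
        .reshape(h // block, block, w // block, block)
        .transpose(0, 2, 1, 3)
    )

    # 2-D DCT of every block: D @ X @ D^T, broadcast over the block grid.
    coeffs = (d @ blocks @ d.T).reshape(-1, block * block)

    edges = np.linspace(-limit, limit, bins + 1)
    clipped = np.clip(coeffs, -limit, limit)
    hist = np.stack(
        [np.histogram(clipped[:, f], bins=edges)[0] for f in range(block * block)]
    )
    return hist / coeffs.shape[0]  # each row sums to 1
```

For example, `block_dct_histogram(frame)` on a 32x48 frame returns a 64x21 array, one histogram per DCT frequency. A feature of this kind could then be fed to a small network, but how the DPN actually consumes the histogram is not specified in the text above.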