Parallel decoding is required for low-density parity-check (LDPC) codes to achieve high decoding throughput, but it requires a large set of registers and complex interconnections because the 1’s in the sparse parity-check matrix are randomly located. This paper proposes a new LDPC decoding architecture that reduces the registers needed to store messages and alleviates the complex interconnections needed to exchange them. To reduce the number of messages exchanged among processing units (PUs), the data flows are reconstructed to be loosely coupled by allowing duplicated operations, so that PUs exchange summation values instead of the original messages. In addition, intermediate values are grouped and stored in local storages, each of which is accessed by only one PU. To save area, the local storages are implemented using memories instead of registers. A partially parallel architecture is proposed to promote memory usage, together with an efficient algorithm that schedules its processing order and reduces the overall processing time by overlapping operations. To verify the proposed architecture, a 1024-bit, rate-1/2 LDPC decoder is implemented in a 0.18 µm CMOS process. The decoder runs correctly at 154 MHz, which enables a decoding throughput of almost 1 Gbps. The proposed decoder occupies an area of $10.08\ \mathrm{mm}^2$, less than one fifth of the area of the previous architecture.
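
The idea of exchanging summation values instead of original messages can be illustrated with a small sketch. The abstract does not give the update equations, so this assumes the standard variable-node update in the log-likelihood domain, where each outgoing message is the per-variable total minus the message received on that same edge. The function names and data layout are hypothetical and are not taken from the proposed architecture.

```python
from typing import Dict, List, Tuple

Edge = Tuple[int, int]  # (check index, variable index); hypothetical layout


def variable_sums(channel_llr: List[int],
                  check_to_var: Dict[Edge, int],
                  var_neighbors: List[List[int]]) -> List[int]:
    """One total per variable: channel value plus all incoming check messages."""
    return [
        channel_llr[v] + sum(check_to_var[(c, v)] for c in checks)
        for v, checks in enumerate(var_neighbors)
    ]


def var_to_check_direct(channel_llr: List[int],
                        check_to_var: Dict[Edge, int],
                        var_neighbors: List[List[int]]) -> Dict[Edge, int]:
    """Conventional flow: one extrinsic message is formed and sent per edge."""
    out: Dict[Edge, int] = {}
    for v, checks in enumerate(var_neighbors):
        for c in checks:
            # Sum every incoming message except the one from check c itself.
            out[(c, v)] = channel_llr[v] + sum(
                check_to_var[(d, v)] for d in checks if d != c
            )
    return out


def var_to_check_from_sums(sums: List[int],
                           check_to_var: Dict[Edge, int],
                           var_neighbors: List[List[int]]) -> Dict[Edge, int]:
    """Sum-exchange flow: only sums[v] is shared; the receiving side subtracts
    the message it already holds locally (the duplicated operation)."""
    out: Dict[Edge, int] = {}
    for v, checks in enumerate(var_neighbors):
        for c in checks:
            out[(c, v)] = sums[v] - check_to_var[(c, v)]
    return out
```

Under these assumptions both flows yield the same messages, but the sum-exchange flow shares one value per variable rather than one per edge, at the cost of repeating a subtraction on the receiving side.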