#### GlobalSIP 2015

A 6.16Gb/s 4.7pJ/bit/iteration LDPC decoder for IEEE 802.11ad standard in 40nm LP-CMOS

<u>Hiroyuki Motozuka</u>, Naoya Yosoku, Takenori Sakamoto, Takayuki Tsukizawa, Naganori Shirakata and Koji Takinami

Panasonic Corporation

This work was partly supported by "The research and development project for expansion of radio spectrum resources" of The Ministry of Internal Affairs and Communications, Japan.

- Background
- Overview of LDPC decoding for IEEE802.11ad
- Proposed low power architecture
  - Column-parallel architecture for IEEE802.11ad
  - Low complexity variable cyclic shifters
- Experiment results
- Conclusion

## Background

- IEEE802.11ad 60 GHz band multi-gigabit wireless
- For mobile, challenge is power consumption
- LDPC decoder is one of the most power-hungry blocks

Achieve low power by improving LDPC decoder architecture

- Previous work [1]: 2.50 Gb/s (3.08 Gb/s uncoded)
- This work: 4.62 Gb/s (6.16 Gb/s uncoded)





[1] Tsukizawa, et al, ISSCC 2013

## LDPC matrices in IEEE802.11ad

- Four matrices for different code rates
- Consist of cyclic shift sub-matrices
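A minimal sketch of how a matrix built from cyclic shift sub-matrices can be expanded into a full binary parity-check matrix. It assumes a sub-matrix size of 42 (consistent with the 0 to 41 shift range quoted in these slides) and uses the 4 x 16 table of shift values shown on the parallel-architecture slide. The rotation direction and the names `BASE` and `expand` are assumptions made for illustration.

```python
Z = 42  # assumed sub-matrix size: shift values in these slides range over 0..41

# Shift values of the 4 x 16 matrix shown in these slides; None marks an all-zero block.
BASE = [
    [35, 19, 41, 22, 40, 41, 39, 6, 28, 18, 17, 3, 28, None, None, None],
    [29, 30, 0, 8, 33, 22, 17, 4, 27, 28, 20, 27, 24, 23, None, None],
    [37, 31, 18, 23, 11, 21, 6, 20, 32, 9, 12, 29, None, 0, 13, None],
    [25, 22, 4, 34, 31, 3, 14, 15, 4, None, 14, 18, 13, 13, 22, 24],
]


def expand(base, z):
    """Expand a base matrix of shift values into a binary parity-check matrix."""
    rows, cols = len(base), len(base[0])
    h = [[0] * (cols * z) for _ in range(rows * z)]
    for i, row in enumerate(base):
        for j, shift in enumerate(row):
            if shift is None:
                continue  # all-zero z x z block
            for r in range(z):
                # Assumed convention: identity matrix cyclically shifted by `shift`.
                h[i * z + r][j * z + (r + shift) % z] = 1
    return h


H = expand(BASE, Z)
print(len(H), len(H[0]))  # number of parity checks, codeword length
```

Each non-empty entry contributes exactly one 1 per row and per column of its block, so the row weight of the expanded matrix equals the number of non-empty entries in the corresponding base-matrix row.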



# Min-Sum algorithm

- Variable-Node processing (VN): column by column
- Check-Node processing (CN): row by row
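The two processing steps can be sketched as follows. This is a textbook floating-point Min-Sum formulation written for illustration, not the fixed-point datapath of the chip; the function names are hypothetical, and each check node is assumed to have at least two connected variable nodes.

```python
def check_node_update(v2c):
    """CN processing for one row.

    v2c: incoming variable-to-check messages (LLRs) of that row.
    Returns one check-to-variable message per connected variable node.
    """
    out = []
    for k in range(len(v2c)):
        others = v2c[:k] + v2c[k + 1:]  # exclude the destination node's own message
        sign = 1
        for m in others:
            if m < 0:
                sign = -sign
        out.append(sign * min(abs(m) for m in others))
    return out


def variable_node_update(channel_llr, c2v):
    """VN processing for one column.

    Returns the outgoing variable-to-check messages and the a-posteriori LLR.
    """
    total = channel_llr + sum(c2v)
    return [total - m for m in c2v], total


print(check_node_update([2.0, -3.0, 4.0]))
print(variable_node_update(1.0, [0.5, -2.0]))
```

The sign of the a-posteriori LLR gives the hard decision for that bit after each iteration.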



#### Parallel decoder architectures

- Parallel processing is required to achieve the very high data rate of 6.16 Gb/s

Row-parallel – conventionally used for 11ad [2-4]

| 35 | 19 | 41 | 22 | 40 | 41 | 39 | 6  | 28 | 18 | 17 | 3  | 28 |    |    |    |
|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|
| 29 | 30 | 0  | 8  | 33 | 22 | 17 | 4  | 27 | 28 | 20 | 27 | 24 | 23 |    |    |
| 37 | 31 | 18 | 23 | 11 | 21 | 6  | 20 | 32 | 9  | 12 | 29 |    | 0  | 13 |    |
| 25 | 22 | 4  | 34 | 31 | 3  | 14 | 15 | 4  |    | 14 | 18 | 13 | 13 | 22 | 24 |

Column-parallel

| 35 | 19 | 41 | 22 | 40 | 41 | 39 | 6  | 28 | 18 | 17 | 3  | 28 |    |    |    |
|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|
| 29 | 30 | 0  | 8  | 33 | 22 | 17 | 4  | 27 | 28 | 20 | 27 | 24 | 23 |    |    |
| 37 | 31 | 18 | 23 | 11 | 21 | 6  | 20 | 32 | 9  | 12 | 29 |    | 0  | 13 |    |
| 25 | 22 | 4  | 34 | 31 | 3  | 14 | 15 | 4  |    | 14 | 18 | 13 | 13 | 22 | 24 |

[2] Park, JSSC 2014

[3] Weiner, ISSCC 2014

[4] Li, SiPS 2013

## **Conventional row-parallel architecture**

- Memory and pipeline registers dominate the power consumption
  - Parallelized VNs require large working memory



### Proposed architecture – Overview

- Column-parallel architecture for IEEE802.11ad
  - Reduce memory size
- Low complexity variable cyclic shifters
  - Shorten critical path, and reduce pipeline registers



## Proposed column-parallel architecture

- Data flows "one way", without a large working memory or accumulators
- Only a small working memory is needed for VN processing



## Proposed column-parallel architecture

- The C2V (check-to-variable) memory only needs to keep the CN results instead of all C2V messages
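The slides do not spell out what the stored CN results contain. A common choice in Min-Sum decoders, assumed here purely for illustration, is to keep per check node the smallest and second-smallest input magnitudes, the position of the smallest, and the input signs; every C2V message can then be regenerated on demand. The names `compress_cn` and `c2v_from_summary` are hypothetical.

```python
def compress_cn(v2c):
    """Summarize one check node from its incoming variable-to-check messages."""
    mags = [abs(m) for m in v2c]
    idx = min(range(len(mags)), key=mags.__getitem__)  # position of the smallest magnitude
    min1 = mags[idx]
    min2 = min(m for k, m in enumerate(mags) if k != idx)  # second-smallest magnitude
    signs = [m < 0 for m in v2c]
    return min1, min2, idx, signs


def c2v_from_summary(summary, k):
    """Regenerate the check-to-variable message for edge k from the stored summary."""
    min1, min2, idx, signs = summary
    mag = min2 if k == idx else min1
    # Parity of the negative signs on all edges except edge k.
    negative = (sum(signs) - signs[k]) % 2 == 1
    return -mag if negative else mag


summary = compress_cn([2.0, -3.0, 4.0])
print([c2v_from_summary(summary, k) for k in range(3)])
```

Under this assumption the storage per check node is two magnitudes, one index and one sign bit per edge, rather than one full message per edge.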



#### Memory size comparison

- Reduces memory bits by 60%
- Memory size reduction directly contributes to power reduction

#### **Comparison of Memory Size**

|                | Park<br>JSSC 2014 | Weiner<br>ISSCC2014             | This work       |
|----------------|-------------------|---------------------------------|-----------------|
| Architecture   | Row-parallel      | Row-parallel with approximation | Column-parallel |
| Implementation | eDRAM, Flip-Flop  | Flip-Flop                       | Flip-Flop       |
| Total bits     | 30240             | 20832                           | 12096           |
| Reduction      |                   | -31%                            | -60%            |
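The reduction figures follow directly from the total bit counts in the table, taking the Park design as the baseline. A short check, with variable names chosen for illustration:

```python
# Total memory bits from the comparison table.
bits = {"Park, JSSC 2014": 30240, "Weiner, ISSCC 2014": 20832, "This work": 12096}
baseline = bits["Park, JSSC 2014"]

# Percentage reduction relative to the baseline, rounded to whole percent.
reduction = {name: round((baseline - b) * 100 / baseline) for name, b in bits.items()}
print(reduction)
```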

## Reduce power in pipeline registers

- By shortening the critical path, the number of pipeline stages is reduced to three while maintaining high-speed processing
  - Conventional designs use 4 to 5 stages



## Conventional variable cyclic shifter

- Variable shifters can be implemented as barrel shifters
- Possible shift values: 0 to 41





Radix-8 barrel shifter
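A behavioral sketch of a two-stage radix-8 barrel shifter for 42-element vectors. The exact staging in the original figure is not recoverable from the text, so the split into a fine stage (0 to 7) and a coarse stage (multiples of 8) is an assumption, as are the function names.

```python
Z = 42  # vector length: possible shift values are 0..41


def rotate(v, s):
    """Cyclically rotate list v left by s positions."""
    s %= len(v)
    return v[s:] + v[:s]


def radix8_barrel_shift(v, s):
    """Cyclic shift by s, decomposed as s = 8*q + r with 0 <= r < 8."""
    assert 0 <= s < Z
    q, r = divmod(s, 8)
    fine = rotate(v, r)          # stage 1: 8-way selection among shifts 0..7
    return rotate(fine, 8 * q)   # stage 2: 6-way selection among shifts 0, 8, ..., 40


v = list(range(Z))
print(radix8_barrel_shift(v, 35)[:4])
```

Every shift value from 0 to 41 must be supported by each such shifter, which is what makes the general-purpose structure costly.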

#### Low complexity variable cyclic shifters

- The number of required shift values is limited to 9 – 13 for each shifter
- Utilize modulo addition
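One way to read the modulo-addition idea is that cascaded cyclic shifts add modulo 42, so a shifter that only ever needs a few shift values can be built from a fixed rotation (wiring only) plus a small variable stage. The sketch below is a hypothetical illustration of that property, not the circuit from the talk: the required set is simply the first column of the 4 x 16 matrix shown in these slides, and `FIXED`, `OFFSETS` and `limited_shift` are invented names.

```python
Z = 42


def rotate(v, s):
    """Cyclically rotate list v left by s positions."""
    s %= len(v)
    return v[s:] + v[:s]


# Illustrative set of shift values one shifter might need (assumption).
REQUIRED = [35, 29, 37, 25]

# Hypothetical decomposition: required shift = (FIXED + offset) mod Z.
FIXED = 25
OFFSETS = sorted((s - FIXED) % Z for s in REQUIRED)


def limited_shift(v, s):
    """Shift by s using a fixed rotation followed by a small variable offset."""
    offset = (s - FIXED) % Z
    if offset not in OFFSETS:
        raise ValueError("shift value not supported by this shifter")
    return rotate(rotate(v, FIXED), offset)


v = list(range(Z))
print(OFFSETS)
print(all(limited_shift(v, s) == rotate(v, s) for s in REQUIRED))
```

In this toy case the variable stage only has to select among four offsets instead of all 42 shift values.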



# Prototype chip

- Fabricated in a 40nm CMOS process
- Achieves 6.16 Gb/s uncoded
  - Maximum data rate in the 11ad SC PHY





Designed LDPC decoder core

Die micrograph

## **Experiment Results**

- Power consumption
  - Measured on the fabricated chip
  - 203 mW at 6.16 Gb/s (16QAM, R=1/2)




#### Performance comparison

- This work achieves the best normalized energy efficiency (lowest energy), 4.7 pJ/bit/iteration, together with the best BER performance among the designs compared

|                                                 | This work        | Weiner, ISSCC 2014 | Li, SiPS 2013     |
|-------------------------------------------------|------------------|--------------------|-------------------|
| CMOS Technology                                 | 40nm LP          | 28nm FDSOI         | 40nm G            |
| Supply Voltage [V]                              | 1.1              | 1.07               | 0.9               |
| Hardware Mapping                                | Column-parallel  | Row-parallel       | Half-row-parallel |
| Scheduling                                      | Flooding         | Flooding           | Layered           |
| Iterations                                      | 7                | 3.75               | 5                 |
| Throughput [Gb/s]                               | 6.16             | 12                 | 5.6               |
| BER @ Eb/N0 = 5 dB, BPSK/QPSK                   | <10<sup>-8</sup> | <10<sup>-6</sup>   | <10<sup>-6</sup>  |
| Normalized Energy Efficiency [pJ/bit/iteration] | 4.7              | 6.0                | 7.0               |
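The 4.7 pJ/bit/iteration figure for this work is consistent with the measured 203 mW at 6.16 Gb/s and 7 iterations, assuming the normalization is power divided by throughput and by iteration count. A short check, with variable names chosen for illustration:

```python
power_w = 0.203          # measured power: 203 mW
throughput_bps = 6.16e9  # 6.16 Gb/s
iterations = 7

# Energy per bit per iteration, converted from joules to picojoules.
energy_pj = power_w / throughput_bps / iterations * 1e12
print(round(energy_pj, 1))
```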

## Conclusion

- We propose a novel LDPC decoder architecture for IEEE802.11ad. It features:
  - A column-parallel architecture, which reduces the required memory bits by 60%
  - Low-complexity variable cyclic shifters, which reduce complexity by 42%
- The proposed decoder achieves:
  - A normalized energy efficiency of 4.7 pJ/bit/iteration without significant BER performance degradation