*Institute of Solid State Physics, Russian Academy of Sciences October 20, 2014* 

## Hybrid Superconducting Ferromagnetic Devices for Energy Efficient Electronics



Igor V. Vernik Email: vernik@hypres.com HYPRES, Inc. 175 Clearbrook Road, Elmsford, NY 10523, USA www.hypres.com



## **HYPRES**, Inc.

www.hypres.com A Full-Cycle Digital-RF Electronics Company

Elmsford, NY A 1983 spinoff of IBM (Yorktown Heights)

- Integrated Circuit Design (Cadence-based suites)
- Integrated Circuit Fabrication (All-thin-film process)
- Multi-Chip Module Integration
- **Cryopackaging and Interface Electronics**
- Final Assembly and Test







From chip design to complete turn-key systems





### **HYPRES: A Brief History**





### The Beginning

#### ➢ Nb/AlOx/Nb JJ Fab

- Innovative 'spray' cooling of chip
- System Integration
- Award-winning Product

#### New Ideas

#### RSFQ Technology

- Ultrafast Digital Logic Circuits
- Analog-to-digital converters: Flash and Oversampled delta modulator

#### **Search of Market**

- Closed-Cycle Refrigerator (COTS Cryocooler)
- Cryopackaging
- Variety of Digital and Mixed-Signal ICs Demonstrations
- Primary Voltage Standard Product

#### Focused Growth

- ➢ Digital-RF
- High performance computing
- System Integration Infrastructure
- All Digital Receivers and Transmitters

### **HYPRES Superconductor IC Foundry**



~6 mask releases per year (current number is 358)

~400 chips per wafer

- High J<sub>c</sub> (20 kA/cm<sup>2</sup>) with MoN resistors – under development
  - RSFQ, ERSFQ & eSFQ (Digital & Mixed-signal)
- Medium J<sub>c</sub> (1 kA/cm<sup>2</sup> and 4.5 kA/cm<sup>2</sup>) with Mo resistors
  - RSFQ, ERSFQ & eSFQ (Digital & Mixed-signal)
- Low J<sub>c</sub> (30 A/cm<sup>2</sup>) with Al, Mo, or Ti/PdAu resistors
  - 1 V and 10 V Voltage Standard
  - QC circuits for mK operation





All-Digital Receiver (ADR) – 12,000 JJs

### **Digital-RF Receiver**



**RSFQ chip:** 1 cm<sup>2</sup>, 11K JJs, 30 GHz clock. Band-pass ADC integrated with a digital signal processor.





30 Gs/s wideband digital receiver for satellite communications

ADR-7 – Complete cryogenic Digital-RF satellite communication receiver system

## Outline



## Motivation

- Energy-efficiency is a priority for high-end computing
- Memory or <u>lack</u> of memory
- Possible approaches to the memory problem
- Superconducting three terminal devices
- Conclusions

## **Computing and Energy**



Starting with *small* – ending up with *LARGE*:

- Shannon-von Neumann-Landauer minimum energy per bit: $E_{BITMIN} = k_B T \ln 2 \approx 3 \times 10^{-21}$ J (at T = 300 K)
- In practical CMOS: $10^5$ to $10^6 \, E_{BITMIN}$ (1-10 $\mu$W per gate at 3 GHz)
- Modern CMOS processor ~10<sup>8</sup> transistors operating at ~ 3 GHz: 0.1-1 kW
- Modern data centers: ~10-100 MW
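The first two figures above can be checked with a few lines of arithmetic. The sketch below is only an illustration: the function names are made up for this example, and it assumes one bit operation per clock cycle for a gate drawing 1-10 μW at 3 GHz. Note that $k_B T \ln 2$ evaluates to about $2.9 \times 10^{-21}$ J at 300 K, while $k_B T$ alone is about $4.1 \times 10^{-21}$ J.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant in J/K (exact SI value)


def landauer_limit(temperature_k: float) -> float:
    """Minimum energy per bit, k_B * T * ln 2, in joules."""
    return K_B * temperature_k * math.log(2)


def energy_per_cycle(power_w: float, clock_hz: float) -> float:
    """Energy dissipated in one clock period by a gate drawing power_w."""
    return power_w / clock_hz


e_min_300k = landauer_limit(300.0)

# Assumed: one bit operation per clock cycle at 3 GHz.
ratios = {}
for gate_power_w in (1e-6, 10e-6):
    e_gate = energy_per_cycle(gate_power_w, 3e9)
    ratios[gate_power_w] = e_gate / e_min_300k
    print(f"{gate_power_w * 1e6:.0f} uW gate: {e_gate:.2e} J per cycle, "
          f"{ratios[gate_power_w]:.1e} x E_BITMIN")
```

Under these assumptions the ratio comes out between $10^5$ and $10^6$, in line with the range quoted for practical CMOS.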







*Intel* Core i7 Processor (Nehalem), 263 mm<sup>2</sup>, 731 Million transistors

Supercomputer: K-Computer (Japan), 12.7 MW


## **Application:** Data Centers and Supercomputers

21-27 PB RAM

~1.07 PUE

1900-6800 PB disk

290,000 ft<sup>2</sup>(27,000 m<sup>2</sup>)


SUPERCOMPUTER SITES

Memory\*

Power

Space

Cooling\*

\* estimated

### Data Centers: Facebook Data Center



- 2014 completion target
- Cost: ~760 M\$
- Nearby Lule River generates 9% of Sweden's electricity (~4.23 GW)
- Average annual temperature: 1.3 °C

#### Courtesy of S. Holmes

#### **Data Centers:**

- **Cloud computing**
- Banking
- Shopping
- **Social Networks**
- Search Engines....

Lulea data center: 120 MW (max power)

#### **Supercomputers:** K-Computer (Japan)



**Top500 No. 4 supercomputer: K-computer (Japan):** 10.51 petaflop/s, 12.7 MW

Top500 No. 1: Tianhe-2 (China): 33.9 petaflop/s, **17.8 MW**

(ratings updated in June 2014)
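To put the two Top500 entries on a common scale, the quoted performance and power figures can be converted into energy efficiency. This is a back-of-the-envelope sketch using only the numbers on this slide; the dictionary layout and function name are illustrative.

```python
# (performance in flop/s, power in W), taken from the figures quoted above
systems = {
    "K-computer": (10.51e15, 12.7e6),
    "Tianhe-2": (33.9e15, 17.8e6),
}


def gflops_per_watt(flops: float, watts: float) -> float:
    """Sustained performance per unit power, in Gflop/s per watt."""
    return flops / watts / 1e9


efficiency = {name: gflops_per_watt(f, p) for name, (f, p) in systems.items()}

for name, eff in efficiency.items():
    nanojoule_per_flop = 1.0 / eff  # 1 / (Gflop/J) equals nJ per flop
    print(f"{name}: {eff:.2f} Gflop/s per W, {nanojoule_per_flop:.2f} nJ per flop")
```

Both machines spend on the order of a nanojoule per floating-point operation at the system level.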

### From CMOS experts at the Rebooting Computing summit Dec. 2013

- 1. Officially Moore's Law ends in 2020 at 7nm, but nobody cares, because <u>11nm isn't any</u> <u>better than 14nm</u>, which was only marginally better than 22nm.
- 2. With Dennard scaling dead since 2004, <u>thermal dissipation</u> issues thoroughly constrain the integration density, effectively <u>ending the multicore era</u>: the "Dark Silicon" problem (only part of the available cores can be run simultaneously).
- 3. CMOS becomes <u>a commodity</u>: cost is the only remaining differentiator at a given technology node (22nm or 14nm)
  - Profit margins plummet, and interest by those with money (T.I., Intel, AMD, IBM) in investing in any more expensive new designs (or silicon process technologies) asymptotically approaches zero. The historical virtuous cycle dies.
- 4. Hardware for servers still needed, but will have to be built out of the commodity parts.
  - This is what NVIDIA means when they call their supercomputer sales a "zero billion dollar business." Their components are essential to it, but they can't justify investing what it would cost to play in that space if they couldn't recoup their money from a much more lucrative area like gaming
- 5. "Quantum computing" isn't going to save us. If some <u>other technology</u> is going to, it <u>had</u> <u>better show up soon</u>, because it's already 8 years late.

Courtesy of Bob Colwell (former Intel chief IA-32 architect on the Pentium Pro, Pentium II, Pentium III, and Pentium 4)

Can superconducting technology respond to this challenge?

## Superconductivity can help



Courtesy of M. Manheimer

# The largest superconducting effort: IBM Josephson project



- Goal: Josephson Signal Processor
- Labor: 120 at maximum (~1980)
- Technology: Nb/IBMium (Pb-alloy) latching
- Performance:
  - 5-ns cycle time,
  - 100-MHz bipolar power supply
  - Dwell time for punch-through suppression
- Successes:
  - ~20-ps turn-on + risetime
  - Multi-chip Cross-Sectional Module (CSM)
  - Nb edge-junction technology
  - 3-D card-on-board packaging work (solder and/or Hg)
- Ended: fall of 1983:
  - Lacked the J<sub>c</sub> control and
  - IBM was ramping up Si, SiGe and GaAs in mid-80s



\* All graphics from IBM J. Res. Dev. Vol. 24, No. 2, March, 1980

### **1983 Issues with Digital SCE**



Competition – "Moore's law"

**Fab** – manufacturing issues (Pb and Pb alloys)

Logic – low performance of latching logic (needs ac power/clock-crosstalk, punchthrough at reset ~ 1ns)

Memory - lack of large RAM

**Cryocooling - relies on liquid He** 

## **Today's Superconductor Digital Electronics**



### Major Changes since the "IBM days" of the 70-80's (since 1983)

- Problems with conventional CMOS scaling
  - **no** more blind reliance on semiconductor (CMOS) progress...
- An all-niobium refractory fabrication process
  - **not** lead ...
- **Rapid Single Flux Quantum (RSFQ) logic family**
  - **not** "latching logic" ...
  - Recent development: energy-efficient SFQ logic families (ERSFQ/eSFQ, RQL, LV-RSFQ, and LR-RSFQ)
- **Off-the-shelf cryocoolers or closed-cycle refrigerators (CCRs)** 
  - **not** liquid helium cooling for products ...
- Memory is still missing

## **Computing Applications: What is needed**



- Energy-efficient logic to implement processing cores
  - New energy-efficient generation of SFQ logic: eSFQ, ERSFQ, RQL, LV-RSFQ
  - Need to increase complexity to implement 64-bit microprocessors: mostly an engineering challenge

Figure panels: ERSFQ adder, eSFQ demux

- Energy-efficient interconnect to communicate between processors and memories
  - Superconducting microstrip interconnect allows ballistic, fast transport
- Memory technology to implement dense Random Access Memories (RAMs)
  - Traditional superconducting electronics based on SIS Josephson junctions could not deliver RAM with capacity >4 kbit
  - New technologies are needed: a physics and engineering challenge

### **Demonstrated ERSFQ Parallel Adders**



36 HA cells, 180 mA dc bias, 372 aJ

16 HA cells, 80 mA dc bias, 166 aJ
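The quoted energies are consistent with the common ERSFQ estimate that the energy drawn from the dc bias per clock period is roughly $I_b \Phi_0$ (equivalently, $P \approx I_b \Phi_0 f_{clk}$). The sketch below applies that estimate to the two bias currents on this slide; treating it as the origin of the slide's numbers is an assumption, and the function name is illustrative.

```python
PHI_0 = 2.067833848e-15  # magnetic flux quantum h / (2e), in Wb


def ersfq_energy_per_clock(bias_current_a: float) -> float:
    """Estimated energy per clock period, I_b * Phi_0, in joules."""
    return bias_current_a * PHI_0


# dc bias currents quoted above for the two adder versions
adders = {"36 HA cells": 180e-3, "16 HA cells": 80e-3}

energies_aj = {
    name: ersfq_energy_per_clock(i_bias) * 1e18  # convert J to aJ
    for name, i_bias in adders.items()
}

for name, e_aj in energies_aj.items():
    print(f"{name}: about {e_aj:.0f} aJ per clock period")
```

The estimate reproduces the 372 aJ and 166 aJ figures to within about 1 aJ.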

#### Adder designs done in HYPRES 4- and 6-layer and MIT-LL 4-layer processes

A. Kirichenko, I. Vernik, J. Vivalda, R. Hunt and D. Yohannes, "ERSFQ 8-bit parallel adders as a process benchmark," submitted for publication at *IEEE Trans. Appl. Supercon.*, August 2014.

## **Demonstrated eSFQ circuits**



1:16 and 1:4 demultiplexers



Shift register, 2JJ/bit and 0.3 aJ/bit



184-bit shift register, 5JJ/bit, 0.8 aJ/bit



Shift Register/Counter circuits: shift register (0.3 aJ/bit) and Toggle Flip-Flops (0.5 aJ/bit)
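For scale, the per-bit energies above can be compared with the thermal minimum $k_B T \ln 2$ at the operating temperature. The sketch assumes operation at 4.2 K (an assumption; the slide does not state the temperature) and ignores the energy cost of refrigeration.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant in J/K
T_OP_K = 4.2        # assumed operating temperature (liquid-helium range)

thermal_limit_j = K_B * T_OP_K * math.log(2)

# energy per bit in joules, as quoted for the demonstrated circuits
circuits_j = {
    "shift register, 2JJ/bit": 0.3e-18,
    "toggle flip-flop": 0.5e-18,
    "184-bit shift register, 5JJ/bit": 0.8e-18,
}

ratio_to_limit = {name: e / thermal_limit_j for name, e in circuits_j.items()}

for name, ratio in ratio_to_limit.items():
    print(f"{name}: about {ratio:,.0f} x k_B T ln 2 at {T_OP_K} K")
```

Under this assumption the demonstrated circuits sit roughly four orders of magnitude above the thermal limit at 4.2 K.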

## **History of Josephson memories – only a selection**



1987 NEC Japan, 1024 bit NDRO Josephson memory. Nagasawa et al., IEEE Journal of Solid-State Circuits, Vol. 24, No. 5 (1989)





1999 NEC Japan, Dr. Nagasawa: 4096 bit vortex transitional memory, organized as 256 x 16 bit, tested at 620 MHz (S. Nagasawa et al., IEEE TAS, Vol. 9, No. 2, p. 3708, 1999)

2000 ISTEC SRL Japan, Dr. Nagasawa: 256 bit vortex transitional memory, all dc-powered

Thomas Ortlepp

### Hybrid CMOS-Josephson memory at UC Berkeley

| Parameter         | **64 kbit** (4096 x 16)                                        |
|-------------------|----------------------------------------------------------------|
| Power consumption | 12 mW read, 21 mW write                                        |
| Read access       | 400 ps (experiment)                                            |
| Cycle time        | 1 ns (1 GHz)*                                                  |
| JJ carrier        | 4.5 kA/cm<sup>2</sup> Hypres, 5 x 5 mm<sup>2</sup>             |
| CMOS              | 65 nm TSMC CMOS, low V<sub>t</sub>, 2 x 1.5 mm<sup>2</sup>     |
| I/O               | 5 mV (latching gates)                                          |
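The power and cycle-time entries in the table can be combined into a rough energy per access. This sketch assumes one 16-bit word is accessed per 1 ns cycle and that the quoted powers apply at that rate; neither assumption is stated in the table, so the results are only indicative.

```python
CYCLE_TIME_S = 1e-9   # 1 ns cycle time from the table
WORD_WIDTH_BITS = 16  # from the 4096 x 16 organization

power_w = {"read": 12e-3, "write": 21e-3}  # power figures from the table

# Energy per access = power x cycle time (assumes one access per cycle).
energy_per_word_pj = {op: p * CYCLE_TIME_S * 1e12 for op, p in power_w.items()}
energy_per_bit_pj = {op: e / WORD_WIDTH_BITS for op, e in energy_per_word_pj.items()}

for op in power_w:
    print(f"{op}: {energy_per_word_pj[op]:.1f} pJ per word, "
          f"{energy_per_bit_pj[op]:.2f} pJ per bit")
```

Under these assumptions a read costs about 12 pJ per word (0.75 pJ per bit) and a write about 21 pJ per word.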

### This is the largest functional 4 Kelvin memory [1].

- [1] T. Van Duzer et al., IEEE TAS, Vol. 23, No. 3, 1700504 (4pp), 2013
- [2] T. Ortlepp et al., Supercond. Sci. Technol., Vol. 26, 035007 (12pp), 2013
- [3] T. Ortlepp et al., IEEE TAS, Vol. 23, No. 3, 1400104, 2013

**Thomas Ortlepp** 

ortlepp@rsfq.de

## **Cryogenic Magnetic Memory**

Hybrid circuits with cryogenic magnetoresistive memory elements

- Memory cell based on spintronic elements with addition of JJs (for low impedance) or nanowire switches (for high impedance)
- Polarized spin injection for magnetization reversal (spin torque transfer)
- JJ periphery (address decoders, sense, etc)

### Superconducting-Ferromagnetic (SF) Junctions

- Memory cell based on Magnetic JJs (MJJ) w/ or w/o additional JJs or SF switches
- Magnetic field for magnetization reversal (field programmable)
- JJ periphery (address decoders, sense, etc)







### Spin Torque Transfer (STT) MRAM Technologies





Courtesy of A. Kent, R. Buhrman and T. Ohki

## Magnetic Josephson Junctions (MJJs)

Long-range spin-triplet MJJs (N. Birge *et al.*, MSU) – switchable π-JJ (programmable phase shifters)



Spin-singlet MJJs (N. Newman *et al.,* ASU) – switchable I<sub>c</sub>




## **NIST** Memory Element Based on a Pseudo-Spin-Valve-Barrier JJ

Low I<sub>c</sub>R<sub>n</sub>

### **Device Structure:**

JJ with two ferromagnetic barriers in series



### Features:

- Demonstrated scalable switching of J<sub>c</sub>
- Josephson phase can also switch between 0 and $\pi$
- Nonvolatile (at a cryogenic temperature)
- Demonstrated $\Delta J_c/J_c$ up to 500 %
- Write: similar to MRAM (field or current)

### **Challenges:**

- Write efficiency and speed
- Control circuit designs and implementation
- Electrical properties not compatible with SIS JJs

#### Courtesy of B. Baek, S. Benz

#### **Principle:**

Exchange field effect on Josephson coupling

- → $J_c$(parallel) $\neq$ $J_c$(anti-parallel)
- → Also Phase(0 state) $\neq$ Phase($\pi$ state)



Baek, B. et al. Nat. Commun. 5:3888 (2014)

### JMRAM is a superconducting MRAM





JMRAM memory element is a magnetic tunnel junction with superconducting electrodes

Memory state – critical current magnetic hysteresis (programmable $0/\pi$ phase shift)

Write – spin reversal by applied magnetic field

Read – use additional SQUID made of fast SIS junctions

Drawback: memory cell size is determined by a readout SQUID – a dense RAM is not likely possible.

Courtesy of A. Herr, D. Miller

#### Fast Magnetic Josephson Junction (MJJ) (ISSP RAS)



T. Larkin, Appl. Phys. Lett., vol. 100, 222601, 2012





At temperatures close to $T_c$, calculations are performed analytically within the framework of the Ginzburg-Landau (GL) equations. At low temperatures, the Usadel equations are solved numerically.

- The Josephson current was calculated as a function of the s and F layer thicknesses, the temperature, and the exchange energy of the F film.
- SIsFS junctions are characterized by several distinct regimes
  - Mode 1a: S-I-sFS tunnel junction in 0 or $\pi$
  - Mode 1b: SIs-F-S sandwich in 0 or $\pi$
  - Mode 2: SInFS tunnel junction in 0 or $\pi$



- The crossover between these regimes is caused by shifting the location of the weak link from the tunnel barrier 'I' to the F-layer.
- A model for critical current switching for SIsFS that is in good agreement with experiment

M. Kupriyanov and A. Golubov group

- Vernik et al., IEEE Trans. Appl. Supercond., 23(3), 1701208 (2013)
- Bakurskiy et al., Appl. Phys. Lett., 102, 192603 (2013)
- Bakurskiy et al., Phys. Rev. B, 88(14), 144519 (2013)



### **Scalability of Fast MJJ**


### Fast MJJ with high IcRn

- 2 μm SIsFS junctions were fabricated and write/read (W/R) operation was observed
- The SIsFS approach to realizing a memory element may hold down to ~1 μm dimensions



**SIsFS** 





## Fast Scalable MJJ with Double-barrier





Exchange field effect on Josephson coupling:

- → $J_c$(parallel) $\neq$ $J_c$(anti-parallel)
- → Can also be Phase(0 state) $\neq$ Phase($\pi$ state)



- SIsFsFS MJJs have high *I*<sub>c</sub>*R*<sub>N</sub> and are electrically compatible with conventional JJs:
  - no need for readout SQUIDs
  - good scalability potential





The 2.8 % difference is not yet sufficient for applications

### **Integration into Random Access Memory Array**



- An MJJ is a programmable JJ, i.e. a nonvolatile memory element, but it is a two-terminal device without input/output isolation
- For random access memory (RAM), one needs to address (select) an individual memory cell without disturbing neighboring cells in RAM array
  - Need a current switch with good input/output isolation (good to have)

### **Example:** conventional room-temperature spintronic RAM (STT MRAM)



### *nTron:* Nanowire 3-terminal Device

Can be used for RAM as line drivers and memory cell selector

- Planar NbN or Nb, simple to fabricate
- SFQ compatible

**Demonstrated:** 

- Comparator; 66 nA grey zone
- Digital Logic, half adder
- 20x gain
- Good In/Out isolation
- High Z drive










### **Superconductor-Ferromagnet Transistor**







Superconductor-Ferromagnet Transistor (SFT): a superconducting 3-terminal device with <u>good input/output isolation</u> that can amplify signals and operates at 4.2 K. It is useful in superconducting memory as a <u>memory cell selector</u>.



Courtesy of I. Nevirkovets

## **SFT Device Configurations**





Figure panels: single-acceptor SFT and double-acceptor SFT

### **Acceptor Ic Modulation by Injector Current**

Single-acceptor SFT



Injector (blue) and acceptor I-V curves at 4.2 K

Dependence of acceptor critical current $I_c$ on injection current $I_i$









*Ic* vs. *H* dependence for the acceptor junction of device $D_2$ at different levels of the injection current. Curves from top to bottom correspond to injection currents from 0 to 4 mA in 0.4 mA increments.

Maximum Josephson current of the acceptor junction vs. the current through the injector junction for devices  $D_1$  and  $D_2$ 



## Voltage Gain and I/O Isolation



### **RIPPLE: Rapid Integrated Process for Layer Extension**



**RIPPLE-2:** a 6 layer process (additional 2 layers)

Uniformity of sheet inductance (L<sub>s</sub>) in planarized Mn2 and Mn1 layers over 150mm wafer:

$L_s$ (Mn2) = 0.279 pH ± 5.66%

$L_s$ (Mn1) = 0.172 pH ± 6.24%

D. Yohannes, R. Hunt, J. Vivalda, D. Amparo, A. Cohen, I. Vernik, and A. Kirichenko, "Planarized, Extendible, Multilayer Fabrication Process for Superconducting Electronics," submitted for publication at IEEE Trans. Appl. Supercon., August 2014



## HYPRES Fab Process: Integrated Process

Use standard HYPRES 6-layer RIPPLE-2 Process as a platform





Memory elements (COST, CSHE, MJJs)

JJ-based digital circuits

Courtesy of D. Yohannes

## **3D: Memory over Processor**



### Vertically integrated distributed memory resources

- Denser memory arrays are accessed vertically from a sparser processing level
- Enables low-power, fast memory access



- Maximizes energy-efficiency and memory access bandwidth
- Applicable for gate-programmable arrays (Spin-PGAs)



## Conclusions

- The energy-efficiency crisis in computing caused a major reset in all computing technologies and created an opening for cryogenic superconductive technologies.
- This caused a major transformation in superconductive digital electronics: <u>new unconventional devices</u> are now acceptable.
  - The memory problem created an opening for new memory devices:
    - Circuit hybrids with cryospintronic elements
    - Superconducting spintronics: superconducting-ferromagnetic Josephson junctions
    - New 3-terminal devices

# If you have new device ideas that might be of interest... this is the time to put them forward