Models in DEQ Zoo

deq-zoo currently supports six implicit models via TorchDEQ. Each project includes a README with data preparation and launch instructions.

DEQ

The first Deep Equilibrium Model (DEQ) is a sequence model built around a transformer block. Given the injection \(U(\mathbf{x}_{0:T})\) from the input sequence and the past context \(\mathbf{z}^\star_{0:t}\), the DEQ transformer predicts the next tokens via the fixed points \(\mathbf{z}^\star_{t:T}\) of a transformer block,

\[ \begin{array}{llll} & \mathbf{q}, \mathbf{k}, \mathbf{v} & = & \mathbf{w} \mathbf{z}^\star_{0:T} + U(\mathbf{x}_{0:T}) \\ & \tilde{\mathbf{z}} & = & \mathbf{z}^\star_{t:T} + \text{Attention}\left(\mathbf{q}, \mathbf{k}, \mathbf{v}\right) \\ & \mathbf{z}^\star_{t:T} & = & \tilde{\mathbf{z}} + \text{FFN}\left(\tilde{\mathbf{z}} \right) \\ \end{array} \]

where Attention is multi-head decoder attention and FFN is a 2-layer feed-forward network.
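To make the notion of a fixed point with input injection concrete, here is a minimal sketch in plain Python. It is not the DEQ transformer and does not use the TorchDEQ API: the toy layer `f`, the solver `solve_fixed_point`, and the values of `w` and `u` are all hypothetical, chosen only so that naive forward iteration converges. The layer stands in for the transformer block, and `u` plays the role of the injection \(U(\mathbf{x}_{0:T})\).

```python
import math


def f(z, u, w):
    """Toy layer (hypothetical): z_next = tanh(w @ z + u), with u as the input injection."""
    return [
        math.tanh(sum(w_ij * z_j for w_ij, z_j in zip(row, z)) + u_i)
        for row, u_i in zip(w, u)
    ]


def solve_fixed_point(u, w, tol=1e-10, max_iter=200):
    """Naive forward iteration z <- f(z, u, w), starting from zeros."""
    z = [0.0] * len(u)
    for step in range(1, max_iter + 1):
        z_next = f(z, u, w)
        # Largest absolute change between consecutive iterates.
        residual = max(abs(a - b) for a, b in zip(z_next, z))
        z = z_next
        if residual < tol:
            break
    return z, step, residual


# Small weights keep this toy layer contractive, so the iteration converges.
w = [[0.2, -0.1], [0.05, 0.3]]
u = [0.5, -0.25]
z_star, steps, residual = solve_fixed_point(u, w)
print(z_star, steps, residual)
```

At convergence, `z_star` satisfies `f(z_star, u, w) ≈ z_star`, which is the same condition that \(\mathbf{z}^\star_{t:T}\) satisfies for the transformer block above. Practical DEQ solvers typically replace this naive loop with faster root-finding methods.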

In DEQ Zoo, we implement the DEQ transformer and benchmark it on word-level language modeling with WikiText-103. The model details and training protocols are redesigned based on TorchDEQ.

  • deq-seq: Language modeling on WikiText-103, implemented with PyTorch DataParallel.

  • deq-lm: Faster, updated implementation using the PyTorch Distributed Data Parallel (DDP) framework. This is the recommended version.

MDEQ

This directory contains the code for Multiscale Deep Equilibrium Models (MDEQ), proposed in the paper Multiscale Deep Equilibrium Models.

  • mdeq: Code for training MDEQs on CIFAR10 and ImageNet (DDP).

IGNN

This directory contains the code for Implicit Graph Neural Networks (IGNN), proposed in the paper Implicit Graph Neural Networks.

  • ignn: Code for conducting graph and node classification tasks, using datasets like PPI.

DEQ-Flow

Deep Equilibrium Optical Flow Estimation

  • deq-flow: Code for training and evaluating optical flow models.

DEQ-INR

\((\text{Implicit})^2\): Implicit Layers for Implicit Representations.

  • deq-inr: Code for converting and compressing image, audio, and video data into implicit layers for implicit representations.

DEQ-DDIM

Deep Equilibrium Approaches to Diffusion Models

  • deq-ddim: Code for performing parallel diffusion sampling & inversion using the joint equilibrium of the sampling trajectory.