Add VQGAN #18150
Conversation
| ## Usage |
| TODO (patil-suraj): add some tips here |

| @@ -0,0 +1,763 @@ |
| # coding=utf-8 |
| # Copyright 2022 The Tamin Transformers authors and The HuggingFace Inc. team. All rights reserved. |

Suggested change:
| # Copyright 2022 The Tamin Transformers authors and The HuggingFace Inc. team. All rights reserved. |
| # Copyright 2022 The Taming Transformers authors and The HuggingFace Inc. team. All rights reserved. |

| logger = logging.get_logger(__name__) |
| class VQGANFeatureExtractor(FeatureExtractionMixin, ImageFeatureExtractionMixin): |

Should we keep the name "...FeatureExtractor" here or not? cc @sgugger @LysandreJik

| The number of channels of the hidden representation. |
| channel_mult (`tuple`, *optional*, defaults to (1, 1, 2, 2, 4)): |
| The channel multipliers for the hidden representation. |
| num_res_blocks (`int`, *optional*, defaults to 2): |

This is `num_res_layers_per_block`, no?

| num_res_blocks (`int`, *optional*, defaults to 2): |
| The number of residual blocks. |
| attn_resolutions (`tuple`, *optional*, defaults to (16,)): |
| The resolutions of the attention heads. |

`attn_resolutions` is a bit misleading IMO, I'd prefer something like `resolutions_with_attention`.

Suggested change:
| The resolutions of the attention heads. |
| The resolutions at which an attention layer is used. |
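
For readers skimming the thread, here is a small, self-contained sketch of what `attn_resolutions` controls in a taming-transformers-style encoder, which is also why a name like `resolutions_with_attention` reads more clearly. The helper name and the loop below are illustrative only, not this PR's actual code.

```python
# Illustrative sketch only (helper name is made up): attention blocks are inserted
# solely at the downsampling levels whose spatial resolution appears in `attn_resolutions`.
def levels_with_attention(resolution: int, channel_mult: tuple, attn_resolutions: tuple):
    curr_res = resolution
    levels = []
    for level in range(len(channel_mult)):
        if curr_res in attn_resolutions:
            levels.append((level, curr_res))
        if level != len(channel_mult) - 1:
            curr_res //= 2  # each downsampling stage halves the spatial resolution
    return levels

# With a 256x256 input, channel_mult=(1, 1, 2, 2, 4) and attn_resolutions=(16,),
# only the deepest level (spatial size 16) gets attention blocks:
print(levels_with_attention(256, (1, 1, 2, 2, 4), (16,)))  # [(4, 16)]
```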

| The dimension of the quantized (latent) embedding vectors. |
| dropout (`float`, *optional*, defaults to 0.0): |
| The dropout probability. |
| resample_with_conv (`bool`, *optional*, defaults to True): |

If I remember correctly this is always `True`, no? Should we maybe just remove this parameter and default it to `True`?

| _CONFIG_FOR_DOC = "VQGANConfig" |
| VQGAN_PRETRAINED_MODEL_ARCHIVE_LIST = [ |
| "valhalla/vqgan_imagenet_f16_16384", # TODO: upload this to CompVis org. |

Let's indeed change this to CompVis.

| def __init__(self, in_channels: int, with_conv: bool): |
| super().__init__() |
| self.with_conv = with_conv |

If I remember correctly this is always true.

Same here, I think this is always true, no?
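
If the flag really is always `True` in the released checkpoints (an assumption based on the two comments above), the block could be simplified roughly along these lines; a sketch with illustrative names, not the PR's final code:

```python
import torch
from torch import nn


class Downsample(nn.Module):
    """Sketch of a downsampling block with the always-True `with_conv` flag removed."""

    def __init__(self, in_channels: int):
        super().__init__()
        # stride-2 convolution halves the spatial resolution
        self.conv = nn.Conv2d(in_channels, in_channels, kernel_size=3, stride=2, padding=0)

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # taming-transformers pads asymmetrically (right/bottom) before the strided conv
        hidden_states = nn.functional.pad(hidden_states, (0, 1, 0, 1), mode="constant", value=0)
        return self.conv(hidden_states)
```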

| self, |
| in_channels: int, |
| out_channels: int = None, |
| use_conv_shortcut: bool = False, |

This param is never used and always defaults to `False` -> let's remove it, and also remove the corresponding `use_conv_short_cut` param.

| super().__init__() |
| self.in_channels = in_channels |
| self.out_channels = out_channels |

Suggested change:
| self.out_channels = out_channels |

Suggested change:
| self.out_channels_ = self.in_channels if self.out_channels is None else self.out_channels |
| self.out_channels_ = self.in_channels if out_channels is None else out_channels |
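
Purely for illustration, the two suggestions above could also be folded together so that only one resolved attribute is stored; the class and argument names below are placeholders, not what the reviewer literally proposed or the PR's final code:

```python
from torch import nn


class ResnetBlockStub(nn.Module):
    def __init__(self, in_channels: int, out_channels: int = None):
        super().__init__()
        self.in_channels = in_channels
        # resolve the default once, so no trailing-underscore attribute is needed later
        self.out_channels = in_channels if out_channels is None else out_channels
```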

| conv = partial(nn.Conv2d, self.in_channels, self.in_channels, kernel_size=1, stride=1, padding=0) |
| self.norm = nn.GroupNorm(num_groups=32, num_channels=self.in_channels, eps=1e-6, affine=True) |
| self.q, self.k, self.v = conv(), conv(), conv() |

No convolution layers for attention blocks anymore, please. We have a working solution in diffusers that makes use of nn.Linear -> let's use this instead.
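
For context, a 1x1 convolution and a linear layer compute the same per-position projection, so the switch is mostly a reshape. Below is a rough, self-contained sketch of an attention block using `nn.Linear` projections; it is not the diffusers implementation verbatim, and the class name is illustrative:

```python
import torch
from torch import nn


class LinearAttnBlock(nn.Module):
    """Sketch of a single-head self-attention block with nn.Linear projections."""

    def __init__(self, channels: int):
        super().__init__()
        self.norm = nn.GroupNorm(num_groups=32, num_channels=channels, eps=1e-6, affine=True)
        self.q = nn.Linear(channels, channels)
        self.k = nn.Linear(channels, channels)
        self.v = nn.Linear(channels, channels)
        self.proj_out = nn.Linear(channels, channels)

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        residual = hidden_states
        batch, channels, height, width = hidden_states.shape
        hidden_states = self.norm(hidden_states)
        # (B, C, H, W) -> (B, H*W, C): nn.Linear then acts on the channel dimension,
        # exactly like a 1x1 convolution would
        hidden_states = hidden_states.view(batch, channels, height * width).transpose(1, 2)
        query, key, value = self.q(hidden_states), self.k(hidden_states), self.v(hidden_states)
        attention = torch.softmax(query @ key.transpose(1, 2) / channels**0.5, dim=-1)
        hidden_states = self.proj_out(attention @ value)
        # back to (B, C, H, W) and add the residual connection
        return hidden_states.transpose(1, 2).reshape(batch, channels, height, width) + residual
```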

| self.proj_out = conv() |

Same here, we should use nn.Linear.

patrickvonplaten left a comment:

This is too much of a simple copy-paste of the original code for me.

- Some parameters and flags are never used, so we should remove them.
- I'm not a big fan of the original config naming such as `attn_resolutions` -> this is extremely hard to understand.
- Let's not use conv layers for attention projection layers.

I'm currently refactoring what I think is the exact same model. How about you wait 1-2 days and then you can copy-paste my refactor + conversion script?

See: huggingface/diffusers#137

VQGAN from taming transformers is IMO too important to have it be a simple copy-paste.

@patil-suraj note that you can use the current main version of diffusers as a reference for how the code should look, and you can use the conversion script to convert the official weights.

Taking over this PR.

What does this PR do?

Adds the VQGAN model, a first step towards adding the DALL·E Mega model to Transformers. VQGAN comes from Taming Transformers; it's a U-Net-like encoder-decoder architecture with a vector-quantizer bottleneck. The model does not support `output_hidden_states` and `output_attentions`, since this is a complex architecture and it's not clear which `hidden_states` to return. Would love to hear your thoughts on whether we should support this.
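
Since the vector-quantizer bottleneck is the core of the model, here is a minimal, self-contained sketch of what such a layer does (nearest-codebook lookup with a straight-through gradient). It's a generic illustration with made-up names, not this PR's implementation:

```python
import torch
from torch import nn


class VectorQuantizerSketch(nn.Module):
    """Generic VQ bottleneck: map each latent vector to its nearest codebook entry."""

    def __init__(self, num_embeddings: int, embedding_dim: int):
        super().__init__()
        self.codebook = nn.Embedding(num_embeddings, embedding_dim)

    def forward(self, latents: torch.Tensor):
        # latents: (B, C, H, W) -> (B*H*W, C), quantizing each spatial position separately
        batch, channels, height, width = latents.shape
        flat = latents.permute(0, 2, 3, 1).reshape(-1, channels)
        # index of the nearest codebook entry (L2 distance) for every position
        indices = torch.cdist(flat, self.codebook.weight).argmin(dim=-1)
        quantized = self.codebook(indices).reshape(batch, height, width, channels).permute(0, 3, 1, 2)
        # straight-through estimator: gradients flow to the encoder as if quantization were identity
        quantized = latents + (quantized - latents).detach()
        return quantized, indices.reshape(batch, height, width)


# e.g. an f16 VQGAN turns a 256x256 image into a 16x16 grid of codebook indices:
# quantized, codes = VectorQuantizerSketch(16384, 256)(torch.randn(1, 256, 16, 16))
```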