Add Ovis2 model and processor implementation #37088
Merged
Commits (110)
b3bfa35  Add Ovis2 model and processor implementation (thisisiron)
51c9efd  Apply style fixes (thisisiron)
9891508  Add unit tests for Ovis2 image processing and processor (thisisiron)
fde1b2a  Refactor image processing functions for clarity and efficiency (thisisiron)
6b0e5d4  Add Ovis2 ImageProcessorFast (thisisiron)
6b8ae7e  Refactor Ovis2 code (thisisiron)
91f72b2  Refactor Ovis2 model components and update processor functionality (thisisiron)
aacbab3  Fix repo consistency issues for Ovis2: docstring, config cleanup (thisisiron)
7305a22  Update Ovis2 model integration tests (thisisiron)
355a91c  Update Ovis2 configuration and processing classes for improved docume… (thisisiron)
ac232e0  Remove duplicate entry for 'ovis2' in VLM_CLASS_NAMES (thisisiron)
16d71f8  Fix conflict (thisisiron)
a7b5094  Fix import order (thisisiron)
4d56043  Update image processor class names (thisisiron)
7f1cbc0  Update Ovis2 model structure (thisisiron)
a4e37e6  Refactor Ovis2 configuration (thisisiron)
11a2a09  Merge remote-tracking branch 'upstream/main' into add-ovis2 (thisisiron)
5999659  Fix typos (thisisiron)
f66426c  Refactor Ovis2 model classes and remove unused code (thisisiron)
ae1ea0d  Fix typos (thisisiron)
4e540b5  Refactor Ovis2 model initialization (thisisiron)
83a7cca  Fix typos (thisisiron)
234edb2  Merge branch 'main' into add-ovis2 (thisisiron)
db59777  Remove Ovis2 model mapping from MODEL_MAPPING_NAMES in modeling_auto.py (thisisiron)
b604a70  Add license and update type hints (thisisiron)
f26717d  Refactor token function and update docstring handling (thisisiron)
890abdc  Add license (thisisiron)
97e84a4  Merge branch 'main' into add-ovis2 (thisisiron)
67a45ab  Merge branch 'main' into add-ovis2 (thisisiron)
764e74f  Merge branch 'main' into add-ovis2 (thisisiron)
178fc10  Add Ovis2 model support and update documentation (thisisiron)
2e278a4  Refactor Ovis2 model structure and enhance multimodal capabilities (thisisiron)
17afef9  Update Ovis2 weight mapping for consistency and clarity in key patterns (thisisiron)
1a87ab3  Remove unused 'grids' parameter from Ovis2 model and Update processin… (thisisiron)
f3c498e  Refactor Ovis2 model test structure to include Ovis2Model (thisisiron)
ec0ffd5  Merge branch 'main' into add-ovis2 (thisisiron)
0f418e8  Add optional disable_grouping param to Ovis2ImageProcessorFast (thisisiron)
afd50aa  Refactor type hints in Ovis2 modules (thisisiron)
bdbcb22  Add licensing information in Ovis2 modules and tests (thisisiron)
cd369a6  Refactor Ovis2 model by removing unused methods (thisisiron)
b459f50  Refactor Ovis2 model tests by renaming test classes and removing skip… (thisisiron)
4ae2f70  Merge branch 'main' into add-ovis2 (thisisiron)
57abe35  Refactor Ovis2 model output classes (thisisiron)
541dc7f  Refactor Ovis2 weight conversion and Update model embedding classes (thisisiron)
5e7846c  Merge branch 'main' into add-ovis2 (thisisiron)
d13eaea  Refactor Ovis2 model imports and remove unused functions (thisisiron)
a10e3db  Enhance vision configuration extraction in Ovis2 weight conversion (thisisiron)
0501e0f  Refactor Ovis2 model's forward method to remove interpolation option (thisisiron)
c19231f  Update Ovis2 model documentation (thisisiron)
6083141  Merge branch 'main' into add-ovis2 (thisisiron)
c27bf25  Refactor Ovis2 model input handling and tokenizer configuration (thisisiron)
58c0c0a  Merge branch 'main' into add-ovis2 (thisisiron)
94fd529  Update return type hints in Ovis2 model (thisisiron)
8402244  Merge branch 'main' into add-ovis2 (thisisiron)
2cd3837  Remove commented-out code (thisisiron)
1a5f6a9  fix config for tests and remove key mappings (Cyrilvallez)
e919722  Update tokenizer configuration to use add_special_tokens method (thisisiron)
2de5a94  Merge branch 'main' into add-ovis2 (thisisiron)
e7e2464  Merge branch 'add-ovis2' of https://github.com/thisisiron/transformer… (thisisiron)
d9a8599  skip torchscript (Cyrilvallez)
94ba3aa  Fix image placeholder generation in Ovis2Processor (thisisiron)
8392223  Merge branch 'add-ovis2' of https://github.com/thisisiron/transformer… (thisisiron)
0f19c79  Merge branch 'main' into add-ovis2 (thisisiron)
d335aaa  Refactor Ovis2 model to rename visual_table to visual_embeddings_table (thisisiron)
91e924c  Enhance Ovis2 model by adding vision_feature_select_strategy parameter (thisisiron)
3b02fe1  Refactor Ovis2 model weights conversion and architecture (thisisiron)
7376160  Refactor Ovis2 model by removing vision_feature_select_strategy param… (thisisiron)
683d3e9  Merge branch 'main' into add-ovis2 (thisisiron)
a8ffbd4  Update Ovis2 model examples (thisisiron)
432a718  Refactor Ovis2 model (thisisiron)
1d4a1e9  Update Ovis2 model (thisisiron)
933cadd  Update Ovis2 model configuration (thisisiron)
9ecdd76  Merge branch 'main' into add-ovis2 (thisisiron)
c024a10  Refactor Ovis2 model test setup (thisisiron)
5fb7870  Merge branch 'main' into add-ovis2 (thisisiron)
3fcdb3a  Merge branch 'main' into add-ovis2 (thisisiron)
a48468a  Refactor flash attention support (thisisiron)
5b02165  Merge branch 'main' into add-ovis2 (thisisiron)
b5b2eb6  Refactor (thisisiron)
5e9c276  Fix typo (thisisiron)
0f3163a  Refactor (thisisiron)
0c13cfc  Refactor model classes (thisisiron)
8d495ee  Update expected output in Ovis2 (thisisiron)
9d995c3  Refactor docstrings (thisisiron)
ccfdb43  Fix (thisisiron)
192cc10  Merge branch 'main' into add-ovis2 (thisisiron)
cfe3a3b  Fix (thisisiron)
530aad0  Fix (thisisiron)
5d92825  Update input in tests (thisisiron)
7bb0e2b  Merge branch 'main' into add-ovis2 (thisisiron)
c4a83b6  Fix (thisisiron)
ac31c2a  Merge branch 'main' into add-ovis2 (thisisiron)
7b78029  Fix get_decoder method (thisisiron)
c230e72  Refactor (thisisiron)
3b0a94a  Refactor Ovis2 (thisisiron)
7cff46b  Merge branch 'main' into add-ovis2 (thisisiron)
9afdbad  Fix (thisisiron)
bd69fb5  Fix (thisisiron)
3ed0cb6  Fix test (thisisiron)
2b0621c  Add get_placeholder_mask (thisisiron)
11802a4  Merge branch 'main' into add-ovis2 (thisisiron)
38b6f15  Merge branch 'main' into add-ovis2 (thisisiron)
0e7d6ed  Refactor Ovis2 model tests (thisisiron)
7ce5c4e  Fix (thisisiron)
0c6571d  Refactor (thisisiron)
13010fa  Merge branch 'main' into add-ovis2 (thisisiron)
2773182  Fix (thisisiron)
8642f7d  Fix (thisisiron)
62f2023  Fix Ovis2 test (thisisiron)
dd47f25  Merge branch 'main' into add-ovis2 (thisisiron)
Add unit tests for Ovis2 image processing and processor
commit 989150862f02f66dffa01835e9793c8aeccf1eee
New file (+160 lines):

```python
import unittest

from transformers.image_utils import SizeDict
from transformers.testing_utils import require_torch, require_vision
from transformers.utils import is_torch_available, is_torchvision_available, is_vision_available

from ...test_image_processing_common import ImageProcessingTestMixin, prepare_image_inputs


if is_torch_available():
    import torch

if is_vision_available():
    from transformers import Ovis2ImageProcessor

if is_torchvision_available():
    from transformers import Ovis2ImageProcessorFast


class Ovis2ImageProcessingTester(unittest.TestCase):
    def __init__(
        self,
        parent,
        batch_size=7,
        num_channels=3,
        image_size=18,
        min_resolution=30,
        max_resolution=400,
        do_resize=True,
        size=None,
        do_normalize=True,
        do_pad=False,
        image_mean=[0.48145466, 0.4578275, 0.40821073],
        image_std=[0.26862954, 0.26130258, 0.27577711],
        do_convert_rgb=True,
    ):
        super().__init__()
        size = size if size is not None else {"height": 20, "width": 20}
        self.parent = parent
        self.batch_size = batch_size
        self.num_channels = num_channels
        self.image_size = image_size
        self.min_resolution = min_resolution
        self.max_resolution = max_resolution
        self.do_resize = do_resize
        self.size = size
        self.do_normalize = do_normalize
        self.image_mean = image_mean
        self.image_std = image_std
        self.do_pad = do_pad
        self.do_convert_rgb = do_convert_rgb

    def prepare_image_processor_dict(self):
        return {
            "do_resize": self.do_resize,
            "size": self.size,
            "do_normalize": self.do_normalize,
            "image_mean": self.image_mean,
            "image_std": self.image_std,
            "do_convert_rgb": self.do_convert_rgb,
            "do_pad": self.do_pad,
        }

    def expected_output_image_shape(self, images):
        return self.num_channels, self.size["height"], self.size["width"]

    def prepare_image_inputs(self, equal_resolution=False, numpify=False, torchify=False):
        return prepare_image_inputs(
            batch_size=self.batch_size,
            num_channels=self.num_channels,
            min_resolution=self.min_resolution,
            max_resolution=self.max_resolution,
            equal_resolution=equal_resolution,
            numpify=numpify,
            torchify=torchify,
        )


@require_torch
@require_vision
class Ovis2ProcessingTest(ImageProcessingTestMixin, unittest.TestCase):
    image_processing_class = Ovis2ImageProcessor if is_vision_available() else None
    fast_image_processing_class = Ovis2ImageProcessorFast if is_torchvision_available() else None

    def setUp(self):
        super().setUp()
        self.image_processor_tester = Ovis2ImageProcessingTester(self)

    @property
    def image_processor_dict(self):
        return self.image_processor_tester.prepare_image_processor_dict()

    def test_image_processor_properties(self):
        for image_processing_class in self.image_processor_list:
            image_processor = image_processing_class(**self.image_processor_dict)
            self.assertTrue(hasattr(image_processor, "do_resize"))
            self.assertTrue(hasattr(image_processor, "size"))
            self.assertTrue(hasattr(image_processor, "do_normalize"))
            self.assertTrue(hasattr(image_processor, "image_mean"))
            self.assertTrue(hasattr(image_processor, "image_std"))
            self.assertTrue(hasattr(image_processor, "do_convert_rgb"))

    def test_slow_fast_equivalence_crop_to_patches(self):
        dummy_image = self.image_processor_tester.prepare_image_inputs(equal_resolution=False, torchify=True)[0]

        image_processor_slow = self.image_processing_class(**self.image_processor_dict, crop_to_patches=True)
        image_processor_fast = self.fast_image_processing_class(**self.image_processor_dict, crop_to_patches=True)

        encoding_slow = image_processor_slow(dummy_image, return_tensors="pt")
        encoding_fast = image_processor_fast(dummy_image, return_tensors="pt")

        torch.testing.assert_close(encoding_slow.num_patches, encoding_fast.num_patches)
        self.assertTrue(torch.allclose(encoding_slow.pixel_values, encoding_fast.pixel_values, atol=1e-1))
        self.assertLessEqual(
            torch.mean(torch.abs(encoding_slow.pixel_values - encoding_fast.pixel_values)).item(), 1e-3
        )

    def test_slow_fast_equivalence_batched_crop_to_patches(self):
        # Prepare image inputs so that we have two groups of images with equal resolution, with a
        # group of images with different resolutions in between
        dummy_images = self.image_processor_tester.prepare_image_inputs(equal_resolution=True, torchify=True)
        dummy_images += self.image_processor_tester.prepare_image_inputs(equal_resolution=False, torchify=True)
        dummy_images += self.image_processor_tester.prepare_image_inputs(equal_resolution=True, torchify=True)

        image_processor_slow = self.image_processing_class(**self.image_processor_dict, crop_to_patches=True)
        image_processor_fast = self.fast_image_processing_class(**self.image_processor_dict, crop_to_patches=True)

        encoding_slow = image_processor_slow(dummy_images, return_tensors="pt")
        encoding_fast = image_processor_fast(dummy_images, return_tensors="pt")

        torch.testing.assert_close(encoding_slow.num_patches, encoding_fast.num_patches)
        self.assertTrue(torch.allclose(encoding_slow.pixel_values, encoding_fast.pixel_values, atol=1e-1))
        self.assertLessEqual(
            torch.mean(torch.abs(encoding_slow.pixel_values - encoding_fast.pixel_values)).item(), 1e-3
        )

    def test_crop_to_patches(self):
        # test slow image processor
        image_processor = self.image_processor_list[0](**self.image_processor_dict)
        image = self.image_processor_tester.prepare_image_inputs(equal_resolution=True, numpify=True)[0]
        processed_images = image_processor.crop_image_to_patches(
            image,
            min_patches=1,
            max_patches=6,
            patch_size={"height": 20, "width": 20},
        )
        self.assertEqual(len(processed_images[0]), 5)
        self.assertEqual(processed_images[0].shape[:2], (20, 20))

        # test fast image processor (process batch)
        image_processor = self.image_processor_list[1](**self.image_processor_dict)
        image = self.image_processor_tester.prepare_image_inputs(equal_resolution=True, torchify=True)[0]
        processed_images = image_processor.crop_image_to_patches(
            image.unsqueeze(0),
            min_patches=1,
            max_patches=6,
            patch_size=SizeDict(height=20, width=20),
        )
        self.assertEqual(len(processed_images[0]), 5)
        self.assertEqual(processed_images.shape[-2:], (20, 20))
```

Note: the `Ovis2ImageProcessorFast` import and `fast_image_processing_class` lines were commented out in this commit (the fast processor landed later in the PR, commit 6b0e5d4), which left `encoding_fast` undefined in the equivalence tests; they are uncommented here so the file is self-consistent.
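The `test_crop_to_patches` assertions above (a square image with `max_patches=6` yields 5 patches) are consistent with the dynamic-tiling scheme used by similar image processors: pick the tiling grid whose aspect ratio best matches the image, then append one global thumbnail when more than one patch is produced. The sketch below is illustrative only; `pick_patch_grid` is a hypothetical helper, not the PR's actual implementation.

```python
def pick_patch_grid(width, height, min_patches=1, max_patches=6):
    """Pick a (cols, rows) tiling with min_patches <= cols*rows <= max_patches whose
    aspect ratio is closest to the input image's; ties go to the larger grid.
    Illustrative sketch only -- not the actual Ovis2 code."""
    candidates = {
        (c, r)
        for c in range(1, max_patches + 1)
        for r in range(1, max_patches + 1)
        if min_patches <= c * r <= max_patches
    }
    target = width / height
    # Sort by (aspect-ratio distance, negative area): closest ratio wins,
    # and among equally close ratios the grid with more patches wins.
    return min(candidates, key=lambda cr: (abs(cr[0] / cr[1] - target), -(cr[0] * cr[1])))
```

Under this reading, a square image maps to a 2x2 grid (ratio distance 0, larger than 1x1), and 4 tiles plus a global thumbnail gives the 5 patches the test expects.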
New file (+109 lines):

```python
import json
import shutil
import tempfile
import unittest

from transformers.testing_utils import require_av, require_vision
from transformers.utils import is_vision_available

from ...test_processing_common import ProcessorTesterMixin


if is_vision_available():
    from transformers import (
        AutoProcessor,
        Ovis2ImageProcessor,
        Ovis2Processor,
        Qwen2TokenizerFast,
    )


@require_vision
class Ovis2ProcessorTest(ProcessorTesterMixin, unittest.TestCase):
    processor_class = Ovis2Processor

    def setUp(self):
        self.tmpdirname = tempfile.mkdtemp()
        image_processor = Ovis2ImageProcessor()
        tokenizer = Qwen2TokenizerFast.from_pretrained("thisisiron/Ovis2-1B-hf")
        processor_kwargs = self.prepare_processor_dict()

        processor = Ovis2Processor(image_processor=image_processor, tokenizer=tokenizer, **processor_kwargs)
        processor.save_pretrained(self.tmpdirname)

    def get_tokenizer(self, **kwargs):
        return AutoProcessor.from_pretrained(self.tmpdirname, **kwargs).tokenizer

    def get_image_processor(self, **kwargs):
        return AutoProcessor.from_pretrained(self.tmpdirname, **kwargs).image_processor

    def prepare_processor_dict(self):
        return {
            "chat_template": "<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n{% for message in messages %}{{'<|im_start|>' + message['role'] + '\n'}}{% if message['content'] is string %}{{ message['content'] }}{% else %}{% for content in message['content'] %}{% if content['type'] == 'image' %}{{ '<image>\n' }}{% elif content['type'] == 'text' %}{{ content['text'] }}{% endif %}{% endfor %}{% endif %}{{'<|im_end|>\n'}}{% endfor %}{% if add_generation_prompt %}{{'<|im_start|>assistant\n' }}{% endif %}",
        }  # fmt: skip

    def test_processor_to_json_string(self):
        processor = self.get_processor()
        obj = json.loads(processor.to_json_string())
        for key, value in self.prepare_processor_dict().items():
            # chat_template is tested as a separate test because it is saved in a separate file
            if key != "chat_template":
                self.assertEqual(obj[key], value)
                self.assertEqual(getattr(processor, key, None), value)

    # Copied from tests.models.llava.test_processor_llava.LlavaProcessorTest.test_chat_template_is_saved
    def test_chat_template_is_saved(self):
        processor_loaded = self.processor_class.from_pretrained(self.tmpdirname)
        processor_dict_loaded = json.loads(processor_loaded.to_json_string())
        # chat templates aren't serialized to json in processors
        self.assertFalse("chat_template" in processor_dict_loaded.keys())

        # they have to be saved as a separate file and loaded back from that file,
        # so we check if the same template is loaded
        processor_dict = self.prepare_processor_dict()
        self.assertTrue(processor_loaded.chat_template == processor_dict.get("chat_template", None))

    def tearDown(self):
        shutil.rmtree(self.tmpdirname)

    def test_chat_template(self):
        processor = AutoProcessor.from_pretrained("thisisiron/Ovis2-1B-hf")
        expected_prompt = "<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n<|im_start|>user\n<image>\nWhat is shown in this image?<|im_end|>\n<|im_start|>assistant\n"

        messages = [
            {
                "role": "user",
                "content": [
                    {"type": "image"},
                    {"type": "text", "text": "What is shown in this image?"},
                ],
            },
        ]

        formatted_prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
        self.assertEqual(expected_prompt, formatted_prompt)

    @require_av
    def test_chat_template_dict(self):
        processor = AutoProcessor.from_pretrained("thisisiron/Ovis2-1B-hf")
        messages = [
            {
                "role": "user",
                "content": [
                    {"type": "image"},
                    {"type": "text", "text": "What is shown in this image?"},
                ],
            },
        ]

        formatted_prompt_tokenized = processor.apply_chat_template(messages, add_generation_prompt=True, tokenize=True)
        expected_output = [[151644, 8948, 198, 2610, 525, 264, 10950, 17847, 13, 151645, 198, 151644, 872, 198, 27, 1805, 397, 3838, 374, 6839, 304, 419, 2168, 30, 151645, 198, 151644, 77091, 198]]  # fmt: skip
        self.assertListEqual(expected_output, formatted_prompt_tokenized)

        out_dict = processor.apply_chat_template(messages, add_generation_prompt=True, tokenize=True, return_dict=True)
        self.assertListEqual(list(out_dict.keys()), ["input_ids", "attention_mask"])
```

Note: the original diff guarded an empty `pass` block with `if is_torch_available:` (missing the call parentheses, so it always evaluated truthy); since nothing was imported under it, the dead block and the unused `is_torch_available` import are dropped here.
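The chat template in `prepare_processor_dict` is plain Jinja, so its control flow can be mirrored in ordinary Python. The sketch below (`render_chat` is a hypothetical helper, not part of the PR) reproduces the `expected_prompt` string asserted in `test_chat_template`:

```python
def render_chat(messages, add_generation_prompt=False):
    """Mirror the Ovis2 chat template's logic in plain Python (illustrative only):
    a fixed system preamble, then one <|im_start|>role ... <|im_end|> block per
    message, with image parts replaced by an image placeholder token."""
    out = "<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n"
    for message in messages:
        out += "<|im_start|>" + message["role"] + "\n"
        content = message["content"]
        if isinstance(content, str):
            out += content  # string content is emitted verbatim
        else:
            for part in content:
                if part["type"] == "image":
                    out += "<image>\n"
                elif part["type"] == "text":
                    out += part["text"]
        out += "<|im_end|>\n"
    if add_generation_prompt:
        out += "<|im_start|>assistant\n"
    return out


messages = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": "What is shown in this image?"},
        ],
    },
]
expected = (
    "<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n"
    "<|im_start|>user\n<image>\nWhat is shown in this image?<|im_end|>\n"
    "<|im_start|>assistant\n"
)
assert render_chat(messages, add_generation_prompt=True) == expected
```

Tracing the template by hand this way makes it easy to see where the `<image>` placeholder lands relative to the user text, which is what `test_chat_template` pins down.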