The MGIE model consists of a Multimodal Large Language Model (MLLM) that expands a user's request into “concise expressive instructions” that a diffusion model can then use to edit the input image. According to the research paper, this approach allows the MGIE model to handle “ambiguous human commands to achieve reasonable editing”.
According to the research, existing approaches such as LLM-Guided Image Editing (LGIE) lack MGIE's visual perception: the Large Language Model (LLM) is confined to a single modality, while the MLLM, with access to the input image and cross-modal understanding, derives more descriptive instructions. For example, if the user asks for a brighter image, the MLLM within MGIE can tell the diffusion model which regions should be brightened.
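To make that division of labour concrete, here is a minimal sketch of the expand-then-edit pattern using off-the-shelf components. This is not Apple's MGIE code: a generic text-generation model stands in for the instruction-expanding MLLM, and the openly available InstructPix2Pix pipeline from the diffusers library stands in for the diffusion editor. The model names, file paths and prompt wording are illustrative assumptions.

```python
# Sketch of the two-stage "expressive instruction" pattern described above.
# NOT Apple's MGIE implementation; placeholder components are used throughout.
import torch
from PIL import Image
from transformers import pipeline
from diffusers import StableDiffusionInstructPix2PixPipeline

# Stage 1: expand a terse user request into a more explicit edit instruction.
# (MGIE uses a multimodal LLM that also sees the image; this text-only model
# is a stand-in to show where that step sits in the pipeline.)
expander = pipeline("text-generation", model="gpt2")  # placeholder model
user_request = "make it brighter"
prompt = f"Rewrite this image edit request as an explicit instruction: {user_request}\n"
expressive_instruction = expander(
    prompt, max_new_tokens=30, return_full_text=False
)[0]["generated_text"]

# Stage 2: feed the expanded instruction and the input image to an
# instruction-guided diffusion editor.
editor = StableDiffusionInstructPix2PixPipeline.from_pretrained(
    "timbrooks/instruct-pix2pix", torch_dtype=torch.float16
).to("cuda")
image = Image.open("input.jpg").convert("RGB")  # illustrative path
edited = editor(prompt=expressive_instruction, image=image).images[0]
edited.save("edited.jpg")
```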
MGIE is available as an open-source project on GitHub and can be downloaded with code, data and pre-trained models. According to VentureBeat, the image editing model is also available through a web demo hosted on Hugging Face Spaces. However, Apple has not yet confirmed how it plans to utilise this model beyond research projects.
Earlier this month, during Apple's quarterly earnings call, CEO Tim Cook confirmed that the company is working on AI features for its devices that will be announced later this year. Apple is expected to incorporate generative AI features into its virtual assistant Siri and the Messages app for features like text summarisation, suggestions and more. Similarly, other services across Apple's platforms, such as Apple Music, Pages and Keynote, will likely get the AI treatment, too.
First Published: Feb 08 2024 | 11:50 AM IST