{"id":23578,"date":"2024-04-11T00:56:46","date_gmt":"2024-04-11T00:56:46","guid":{"rendered":"https:\/\/news.talkwithrattan.com\/index.php\/2024\/04\/11\/apple-working-on-ferret-ui-ai-model-that-can-understand-iphone-ui\/"},"modified":"2024-04-11T00:56:46","modified_gmt":"2024-04-11T00:56:46","slug":"apple-working-on-ferret-ui-ai-model-that-can-understand-iphone-ui","status":"publish","type":"post","link":"https:\/\/news.talkwithrattan.com\/index.php\/2024\/04\/11\/apple-working-on-ferret-ui-ai-model-that-can-understand-iphone-ui\/","title":{"rendered":"Apple Working on \u2018Ferret UI\u2019 AI Model That Can Understand iPhone UI"},"content":{"rendered":"<div style=\"text-align:center\"><img loading=\"lazy\" decoding=\"async\" width=\"1200\" height=\"675\" src=\"https:\/\/i1.wp.com\/i.gadgets360cdn.com\/large\/ferret_ui_1712743038746.jpg?resize=1200,675&amp;ssl=1\" class=\"attachment-post-thumbnail size-post-thumbnail wp-post-image\" alt=\"Apple Working on \u2018Ferret UI\u2019 AI Model That Can Understand iPhone UI\" title=\"Apple Working on \u2018Ferret UI\u2019 AI Model That Can Understand iPhone UI\" \/><\/div><p> <br \/>\n<\/p>\n<div>\n<p><a href=\"https:\/\/www.gadgets360.com\/apple\">Apple<\/a> researchers have published yet another paper on artificial intelligence (AI) models, and this time the focus is on understanding and navigating through smartphone user interfaces (UI). The yet-to-be peer-reviewed research paper highlights a large language model (LLM) dubbed Ferret UI, which can go beyond traditional computer vision and understand complex smartphone screens. Notably, this is not the first paper on AI published by the research division of the tech giant. 
It has already published a <a href=\"https:\/\/www.gadgets360.com\/ai\/news\/apple-researchers-mm1-family-of-multimodal-ai-models-up-to-30-billion-parameters-5260749\">paper<\/a> on multimodal LLMs (MLLMs) and <a href=\"https:\/\/www.gadgets360.com\/ai\/news\/apple-researchers-on-device-ai-model-can-understand-contextual-prompts-5357909\">another<\/a> on on-device AI models.<\/p>\n<p>The pre-print version of the research <a href=\"https:\/\/www.gadgets360.com\/ai\/news\/v\" target=\"_blank\" rel=\"nofollow noopener\">paper<\/a> has been published on arXiv, an open-access online repository of scholarly papers. The paper is titled \u201cFerret-UI: Grounded Mobile UI Understanding with Multimodal LLMs\u201d and focuses on expanding the use cases of MLLMs. It highlights that most language models with multimodal capabilities cannot understand anything beyond natural images and that their functionality is \u201crestricted\u201d. It also states the need for AI models to understand complex and dynamic interfaces such as those on a smartphone.<\/p>\n<p>As per the paper, Ferret UI is \u201cdesigned to execute precise referring and grounding tasks specific to UI screens, while adeptly interpreting and acting upon open-ended language instructions.\u201d In simple terms, the vision-language model can not only process a smartphone screen containing multiple elements that represent different information, but also tell a user about them when prompted with a query.<\/p>\n<p><span class=\"mt-enclosure mt-enclosure-image\" style=\"display: inline;\"><\/span><\/p>\n<p class=\"ins_instory_dv_caption\">How Ferret UI processes information on a screen<br \/><span class=\"ins_instory_span_credit\">Photo Credit: Apple<\/span><\/p>\n<p>\u00a0<\/p>\n<p>Based on an image shared in the paper, the model can understand and classify widgets and recognise icons. It can also answer questions such as \u201cWhere is the launch icon\u201d and \u201cHow do I open the Reminders app\u201d. 
This shows that the AI is not only capable of explaining the screen it sees, but can also navigate to different parts of an iPhone based on a prompt.<\/p>\n<p>To train Ferret UI, the Apple researchers created data of varying complexity themselves. This helped the model learn basic tasks and understand single-step processes. \u201cFor advanced tasks, we use GPT-4 [40] to generate data, including detailed description, conversation perception, conversation interaction, and function inference. These advanced tasks prepare the model to engage in more nuanced discussions about visual components, formulate action plans with specific goals in mind, and interpret the general purpose of a screen,\u201d the paper explained.<\/p>\n<p>The paper is promising, and if it passes the peer-review stage, Apple might be able to use this capability to add powerful tools to the <a href=\"https:\/\/www.gadgets360.com\/tags\/iphone\">iPhone<\/a> that can perform complex UI navigation tasks with simple text or verbal prompts. 
This capability appears to be ideal for Siri.<\/p>\n<style type=\"text\/css\"><![CDATA[.embed-container { position: relative; padding-bottom: 56.25%; height: 0; overflow: hidden; max-width: 100%; } .embed-container iframe, .embed-container object, .embed-container embed { position: absolute; top: 0; left: 0; width: 100%; height: 100%; }]]><\/style>\n<hr\/>\n<div class=\"downloadtxt\"><i>Affiliate links may be automatically generated &#8211; see our <a href=\"https:\/\/www.gadgets360.com\/ethics\" target=\"_blank\" rel=\"noopener\">ethics statement<\/a> for details.<\/i><\/div>\n<\/div>\n<p><br \/>\n<br \/><a href=\"https:\/\/www.gadgets360.com\/ai\/news\/apple-ferret-ui-ai-model-can-understand-iphone-ui-5412515#rss-gadgets-news\">Source link <\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Apple researchers have published yet another paper on artificial intelligence (AI) models, and this time the focus is on understanding and navigating through smartphone user interfaces (UI). The yet-to-be peer-reviewed research paper highlights a large language model (LLM) dubbed Ferret UI, which can go beyond traditional computer vision and understand complex smartphone screens. 
Notably, this [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":23579,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"tdm_status":"","tdm_grid_status":"","fifu_image_url":"https:\/\/i.gadgets360cdn.com\/large\/ferret_ui_1712743038746.jpg","fifu_image_alt":"","footnotes":""},"categories":[607],"tags":[828,5167,28842,890,28843,9353,826,3958,28844,5864],"amp_enabled":true,"_links":{"self":[{"href":"https:\/\/news.talkwithrattan.com\/index.php\/wp-json\/wp\/v2\/posts\/23578"}],"collection":[{"href":"https:\/\/news.talkwithrattan.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/news.talkwithrattan.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/news.talkwithrattan.com\/index.php\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/news.talkwithrattan.com\/index.php\/wp-json\/wp\/v2\/comments?post=23578"}],"version-history":[{"count":1,"href":"https:\/\/news.talkwithrattan.com\/index.php\/wp-json\/wp\/v2\/posts\/23578\/revisions"}],"predecessor-version":[{"id":23580,"href":"https:\/\/news.talkwithrattan.com\/index.php\/wp-json\/wp\/v2\/posts\/23578\/revisions\/23580"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/news.talkwithrattan.com\/index.php\/wp-json\/wp\/v2\/media\/23579"}],"wp:attachment":[{"href":"https:\/\/news.talkwithrattan.com\/index.php\/wp-json\/wp\/v2\/media?parent=23578"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/news.talkwithrattan.com\/index.php\/wp-json\/wp\/v2\/categories?post=23578"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/news.talkwithrattan.com\/index.php\/wp-json\/wp\/v2\/tags?post=23578"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}