Meta, the parent company of Facebook, is bringing its own contender to the table in the race for the most advanced multimodal AI model.
While large language models have been impressive, they are limited to processing text. Multimodal AI takes things a step further by being able to understand and generate not just text, but also images, sound recordings, and even videos.
Meta's answer to this challenge is Chameleon, a multimodal model that utilises an "early-fusion" approach. Unlike earlier "late-fusion" techniques, which process each form of data separately before combining the results, Chameleon represents all data – text, images, and even code – as a single unified stream from the outset.
To achieve this, the Chameleon team developed a system that converts all data into a common set of tokens, similar to how words are processed in large language models.
This allows the same modelling techniques used for text to be applied to the combined data, giving the model a more unified understanding of complex, mixed-modal information.
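To make the idea more concrete, here is a minimal sketch of early fusion in Python. The tokenizer functions, vocabulary sizes, and offsets are hypothetical placeholders rather than Chameleon's actual implementation; the only point being illustrated is that every modality ends up as ids in one shared vocabulary, consumed by a single model as one flat sequence.

```python
# Illustrative sketch of "early fusion": text and images are mapped into one
# shared token vocabulary and fed to a single model as a flat sequence.
# All names and sizes here are assumptions for illustration, not Meta's code.

from typing import List

TEXT_VOCAB_SIZE = 65_536      # assumed size of the text sub-vocabulary
IMAGE_CODEBOOK_SIZE = 8_192   # assumed size of a discrete image codebook

def tokenize_text(text: str) -> List[int]:
    """Stand-in for a real subword tokenizer: map characters to text-range ids."""
    return [ord(ch) % TEXT_VOCAB_SIZE for ch in text]

def tokenize_image(pixels: List[float]) -> List[int]:
    """Stand-in for a learned image tokenizer: quantise values into codebook ids,
    then offset them into their own slice of the shared vocabulary."""
    return [TEXT_VOCAB_SIZE + (int(p * 1000) % IMAGE_CODEBOOK_SIZE) for p in pixels]

def build_sequence(text: str, pixels: List[float]) -> List[int]:
    """Interleave modalities into one token sequence for one downstream model."""
    return tokenize_text(text) + tokenize_image(pixels)

if __name__ == "__main__":
    sequence = build_sequence("A photo of a cat:", [0.12, 0.57, 0.91])
    print(sequence)  # one sequence, one vocabulary, one model
```

Because everything lives in the same token space, the model can attend across modalities freely, which is what distinguishes early fusion from bolting a separate image encoder onto a text model.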
One of the key advantages of Chameleon is that it is an end-to-end model. This means it handles the entire process of understanding and generating multimodal data, from start to finish.
The researchers behind Chameleon also developed specialised training techniques to keep the model stable while learning from this diverse mix of token types.
This involved a two-stage training process and a massive dataset designed specifically for multimodal learning. The models were then trained on high-performance GPUs for a combined total of roughly five million GPU-hours.
Testing has shown that Chameleon is a multimodal powerhouse. It achieves state-of-the-art performance in tasks like image captioning and visual question answering, while remaining competitive with Meta's earlier text-only models on language tasks.
In some cases, Chameleon can even match or outperform significantly larger models such as OpenAI's GPT-4 and Google's Gemini Pro, demonstrating impressive versatility within a single, unified framework.
Recently, Osmond Chia, a journalist for The Straits Times in Singapore, had a disconcerting experience with Meta AI, a new chatbot designed to rival ChatGPT and Google's Gemini.
Intrigued by the capabilities of these large language models, Chia decided to test Meta AI with a simple question: "Who is Osmond Chia?"
The response he received was nothing short of shocking.
Meta AI fabricated an elaborate backstory, portraying Chia as a Singaporean photographer who had been jailed for sexual assault between 2016 and 2020.
The fabricated story included details of a prolonged trial, multiple victims, and widespread public outrage.
Perplexed by the misinformation, Chia pressed Meta AI for further details. The chatbot maintained its fabricated narrative, even citing The Straits Times as a source.
This detail led Chia to believe that Meta AI might have misinterpreted his byline on articles covering court cases.
Despite reporting the error and indicating that the information was demonstrably false, Meta AI continued to return the same inaccurate response. This raised concerns about the chatbot's underlying algorithms and data integration methods.
Experts believe that Meta AI's malfunction stemmed from a technique called Retrieval-Augmented Generation (RAG). RAG lets a chatbot retrieve documents from external sources, such as the web or a search index, and feed them into the model as context, so that its responses are grounded in more relevant, up-to-date information.
However, in Chia's case, RAG seems to have backfired.
Instead of accurately analysing Chia's byline and the content of his articles, Meta AI might have misinterpreted headlines or keywords, leading to the fabricated narrative.
This incident highlights the potential pitfalls of RAG when not implemented with proper safeguards and fact-checking mechanisms.
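For readers unfamiliar with the technique, the sketch below shows the basic RAG loop in heavily simplified form: a retriever pulls candidate documents, and a generator conditions its answer on them. The keyword-overlap retriever and the generate() placeholder are illustrative assumptions, not Meta AI's actual pipeline; the example only shows how a badly matched retrieval result can steer the final answer.

```python
# Minimal RAG sketch: retrieve context, then generate an answer conditioned on it.
# The documents, retriever, and generate() stub are illustrative assumptions only.

from typing import List

DOCUMENTS = [
    "Osmond Chia is a journalist with The Straits Times who covers technology and the courts.",
    "Court report: a photographer was jailed for offences committed between 2016 and 2020.",
]

def retrieve(query: str, docs: List[str], k: int = 2) -> List[str]:
    """Naive keyword-overlap scoring; a stand-in for vector-similarity search."""
    terms = set(query.lower().split())
    ranked = sorted(docs, key=lambda d: len(terms & set(d.lower().split())), reverse=True)
    return ranked[:k]

def generate(query: str, context: List[str]) -> str:
    """Placeholder for an LLM call that conditions on the retrieved context."""
    return f"Answer to '{query}', based on: {' | '.join(context)}"

if __name__ == "__main__":
    question = "Who is Osmond Chia?"
    context = retrieve(question, DOCUMENTS)
    # If retrieval surfaces an unrelated court report (for example, because the
    # journalist's byline appears on court coverage), the generator can blend it
    # into a confident but false narrative about the person being asked about.
    print(generate(question, context))
```

The failure mode described above is not a flaw in retrieval as such; it is what happens when retrieved context is treated as ground truth without checking that it is actually about the subject of the question.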
Chia's experience is not an isolated incident.
In April 2023, ChatGPT falsely accused a law professor of sexual harassment. Similarly, an Air Canada chatbot gave a customer inaccurate information about bereavement fares, and the airline later lost a tribunal case over it.
These cases illustrate the dangers of chatbots as potential vectors for misinformation. The difficulty of holding these AI systems accountable further complicates the issue.
Unlike traditional media platforms, there's no way to track the reach of a chatbot's inaccurate response, making it challenging to prove defamation or hold companies accountable.
Companies like Meta often shield themselves with disclaimers in their terms of use, placing the onus of verifying information on the user.
However, this creates a conundrum.
Chatbots are promoted as reliable sources of information, yet users are expected to independently verify every answer. This inconsistency raises questions about the true purpose of these systems.
Given the high cost of legal battles, most users are likely to resort to reporting misinformation to the platforms themselves. But the effectiveness of this approach remains to be seen.
Yann LeCun, Meta's chief AI scientist, has thrown cold water on the idea that large language models (LLMs) like ChatGPT will ever achieve true human-like intelligence.
In an interview with the Financial Times, LeCun argued that these models have several critical limitations that prevent them from reaching human-level understanding and reasoning.
According to LeCun, LLMs lack a fundamental grasp of the physical world. They don't possess persistent memory, meaning they can't learn and build upon past experiences in the way humans do.
Furthermore, LeCun argues that LLMs are incapable of true reasoning or hierarchical planning.
He emphasises that these models rely heavily on the specific training data they are fed, and their responses are limited by the parameters of that data.
This, according to LeCun, makes them "intrinsically unsafe" because they can be easily manipulated to produce misleading or incorrect outputs.
But isn't this exactly the problem Meta's own product faces? Is LeCun inadvertently highlighting the very shortcomings his employer's chatbot exhibits?
Meta's approach hinges on open-source AI projects like Llama, which has garnered significant attention within the AI community. However, these projects have yet to translate into direct revenue streams.
The hope lies in Meta's vast AI infrastructure, which the company believes will pave the way for global leadership in the field.
Notably, the substantial capital expenditures previously ridiculed for the "metaverse" project are now viewed favourably due to their potential role in AI development.
One key difference between Meta and its competitors is the company's strategy for monetising AI. Meta has begun charging for access to larger AI models, but a significant portion of the technology remains freely available.
This approach aims to leverage the scale of its social media platforms – Facebook, Instagram, Threads, and WhatsApp – to indirectly generate revenue.
Essentially, by making AI a readily available commodity, Meta hopes to attract more users and interactions within its ecosystem, ultimately leading to a more valuable advertising platform.
However, this optimistic outlook is challenged by scepticism from prominent figures like Gary Marcus. Marcus argues that LLMs are overrated and prone to errors.
He believes the current enthusiasm surrounding them represents a "suspension of disbelief," and that alternative approaches like neuro-symbolic AI hold greater promise.
Neuro-symbolic AI combines neural networks with explicit, rule-based symbolic reasoning, a line of research that Marcus believes was prematurely abandoned.
In simpler terms, Marcus suggests that while LLMs can handle basic customer service interactions, they lack the capability to deal with complex situations.
When faced with demanding customers, companies will still require human intervention. If this scepticism becomes mainstream, Meta's investors could face significant losses.
LeCun's perspective stands in contrast to the current wave of investment in LLM technology.
Many companies are pouring resources into developing ever more sophisticated LLMs in the hope of achieving artificial general intelligence (AGI) – a level of machine intelligence that matches or surpasses human capabilities across a broad range of tasks.
However, LeCun proposes a different approach. He and his team at Meta's AI research lab are working on a new generation of AI systems based on "world modelling."
This approach aims to create AI that can build an understanding of the world around it, similar to how humans learn and interact with their environment.
While this approach holds promise for the future of AI, LeCun acknowledges it's a long-term vision, potentially taking ten years or more to bear fruit.
Despite the uncertainties surrounding AI, Meta possesses a powerful advantage: its dominance in the digital advertising landscape.
The company's advertising revenue continues to surge, fueled by its unparalleled ability to target users across various platforms.
This capability allows Meta to present highly relevant advertisements at a lower cost compared to traditional content-based advertising.
Essentially, Meta has established itself as the world's free communication network. Its vast infrastructure facilitates free calls and messaging, funded entirely by advertising revenue.
This model thrives on the ability to connect potential customers based on their conversations and online interactions. Unlike traditional media, Meta doesn't incur content creation costs, further boosting its profitability (currently at 33% net margins).
However, this relentless pursuit of profits raises ethical concerns. Meta's algorithm-driven advertising platform has been criticised for allowing harmful advert content to be promoted across its platforms.
Despite promises to address these issues, critics argue that Meta may prioritise revenue over principles, potentially allowing detrimental content to persist.
This is not the first time Meta has faced scrutiny over its approval of unethical advertisements.
This time, the parent company of Facebook and Instagram faces severe criticism over its failure to prevent the spread of AI-manipulated political adverts during India's recent election.
According to a report shared exclusively with The Guardian, Meta approved a series of adverts containing hate speech, disinformation, and incitement to religious violence targeting religious groups and political leaders, including known slurs against Muslims and Hindu supremacist rhetoric.
Despite Meta's supposed mechanisms for detecting and blocking harmful content, these adverts – submitted as a test by India Civil Watch International and Ekō – were approved, underscoring the platform's capacity to amplify existing harmful narratives.
Pictured: India's Prime Minister, Narendra Modi, alongside Meta CEO Mark Zuckerberg (Source: The Guardian)
Alarmingly, Meta's systems failed to recognise that the approved adverts featured AI-manipulated images, despite the company's public pledge to prevent the spread of AI-generated or manipulated content during the Indian election.
While some adverts were rejected for violating community standards, the ones targeting Muslims were approved, breaching Meta's own policies on hate speech, misinformation, and incitement of violence.
Moreover, Meta failed to acknowledge these adverts as political or election-related, allowing them to circumvent India's election rules, which ban political advertising during voting periods.
This failure highlights the platform's inadequacies in combating hate speech and disinformation, raising questions about its reliability in managing elections worldwide.
In response to the revelations, Meta emphasised the requirement for advertisers to comply with all applicable laws and its commitment to removing violating content.
However, the findings expose significant gaps in Meta's mechanisms for tackling hate speech and disinformation. Despite claims of extensive preparation for India's election, Meta's inability to detect and prevent the dissemination of harmful content calls into question its effectiveness in safeguarding democratic processes.
The company continues to grapple with content-moderation challenges, including the spread of Islamophobic hate speech and conspiracy theories. This fuels concern about its ability to handle similar issues in elections around the world and casts doubt on its trustworthiness as a platform for political discourse.
Meta's platforms, including Facebook, Instagram, and WhatsApp, have become hotbeds for scams globally. From phishing scams to malware distribution, Meta's lax oversight and insufficient safety measures raise concerns about user security.
The company's response to fabricated news further highlights its vulnerability to exploitation, prompting questions about its ability to combat scams effectively.
This has prompted speculation about whether Meta is turning a blind eye to these concerns in favour of funnelling funds into its AI development endeavours.
As these technologies continue to evolve, robust fact-checking mechanisms and clearer user expectations are crucial to prevent the spread of misinformation and protect individuals from reputational harm.
Meta presents itself as an AI leader pushing the boundaries of technology, yet its social media platforms are riddled with safety issues.
While the company invests heavily in AI advancements like Chameleon, its ability to mitigate real-world problems like misinformation and political manipulation remains questionable. This inconsistency raises a critical question: can Meta truly be an AI champion if it can't address the safety concerns plaguing its existing products?
The answer may lie in how effectively Meta bridges the gap between its ambitious AI goals and the urgent need to safeguard its users.
This week's appointment of external advisors suggests a potential shift towards a more holistic approach, but only time will tell if Meta can reconcile its futuristic vision with its present-day shortcomings.