Amazon has launched a new language model to catch up with the wildly popular chatbot ChatGPT
Just over two months ago, OpenAI made ChatGPT available to the general public, thrusting the AI-powered chatbot into the center of popular conversation and igniting discussions about how the new language model could change business, education, and other areas. Then Google and Chinese internet behemoth Baidu debuted their own chatbots to demonstrate to the public that their so-called “generative AI” (technology that can create conversational text, visuals, and more) was also ready for general use. Now Amazon has introduced a new language model that it says outperforms GPT-3.5.
Amazon's newly launched language model is built to outperform GPT-3.5, and it does: it beats GPT-3.5 (75.17% accuracy) by 16 percentage points on the ScienceQA benchmark and also exceeds human performance. ScienceQA is a large collection of annotated multimodal science questions, comprising more than 21,000 multimodal multiple-choice questions (MCQs). Recent advances have enabled large language models (LLMs) to perform well on tasks that require complex reasoning, largely thanks to chain-of-thought (CoT) prompting: the technique of generating intermediate reasoning steps that show how an answer is reached. However, most existing CoT research examines only the language modality; to apply CoT reasoning to multimodal inputs, researchers turn to the Multimodal-CoT paradigm, in which multiple input types, such as language and images, are required.
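To make the CoT idea concrete, here is a minimal sketch of chain-of-thought prompting in Python. The worked example, the build_cot_prompt helper, and the call_llm callback are illustrative assumptions rather than anything from Amazon's system; the point is simply that the prompt demonstrates intermediate reasoning steps before the final answer.

```python
# Minimal chain-of-thought prompting sketch (illustrative; not Amazon's code).

# A few-shot example that shows the step-by-step reasoning format.
COT_EXAMPLE = (
    "Q: A pack has 12 pencils. Maria buys 3 packs and gives away 5 pencils. "
    "How many pencils does she have left?\n"
    "A: Let's think step by step. 3 packs x 12 pencils = 36 pencils. "
    "36 - 5 = 31. The answer is 31.\n"
)

def build_cot_prompt(question: str) -> str:
    """Prepend a worked example so the model imitates the step-by-step format."""
    return f"{COT_EXAMPLE}\nQ: {question}\nA: Let's think step by step."

def answer_with_cot(question: str, call_llm) -> str:
    # call_llm(prompt) -> completion string; a hypothetical stand-in for
    # whichever completion API is actually being used.
    return call_llm(build_cot_prompt(question))
```

Because the few-shot example spells out the intermediate arithmetic, the model is nudged to produce its own reasoning chain before committing to an answer, which is the core of CoT prompting.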
How does it function?
Even when the inputs come from multiple modalities such as language and vision, Multimodal-CoT decomposes multi-step problems into intermediate reasoning steps that lead to the final answer. One of the most common ways to perform Multimodal-CoT is to collapse data from the various modalities into a single modality (for example, describing an image as text) before asking an LLM to perform CoT. This approach has drawbacks, chief among them that a significant amount of information is lost in the conversion. Alternatively, small fine-tuned language models can perform CoT reasoning in multimodality by fusing language and vision features, as sketched below. The fundamental problem with this strategy, however, is that such models tend to generate hallucinated reasoning patterns that mislead the answer inference.
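As a rough illustration of the feature-fusion idea described above, the following PyTorch sketch shows how text-encoder states could be combined with vision features through attention and a learned gate. The class name, dimensions, and gating scheme are assumptions chosen for clarity, not Amazon's published implementation.

```python
# Sketch of fusing language and vision features for multimodal CoT
# (illustrative assumptions; not the authors' exact architecture).
import torch
import torch.nn as nn

class GatedFusion(nn.Module):
    """Fuse text-encoder states with image features via attention and a gate."""

    def __init__(self, text_dim: int, vision_dim: int):
        super().__init__()
        self.project = nn.Linear(vision_dim, text_dim)   # align feature sizes
        self.gate = nn.Linear(2 * text_dim, text_dim)    # learned mixing gate

    def forward(self, text_states: torch.Tensor, vision_feats: torch.Tensor) -> torch.Tensor:
        # text_states:  (batch, seq_len, text_dim)
        # vision_feats: (batch, num_patches, vision_dim)
        v = self.project(vision_feats)                        # (batch, patches, text_dim)
        # Each text token attends over image patches (single-head dot product).
        attn = torch.softmax(text_states @ v.transpose(1, 2), dim=-1)
        v_ctx = attn @ v                                      # (batch, seq_len, text_dim)
        # Sigmoid gate decides, per dimension, how much visual context to mix in.
        g = torch.sigmoid(self.gate(torch.cat([text_states, v_ctx], dim=-1)))
        return (1 - g) * text_states + g * v_ctx

# In a two-stage setup, the fused states would first be decoded into a rationale,
# and the rationale would then be appended to the input to infer the final answer.
```

The fused representation is what allows the rationale-generation step to reference visual evidence directly, rather than relying on a lossy text-only description of the image.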
To mitigate the effects of these errors, Amazon researchers developed Multimodal-CoT, which incorporates visual features in a decoupled training framework that separates rationale generation from answer inference. According to the researchers, this is the first work to study CoT reasoning across different modalities. They report that the method achieves state-of-the-art results on the ScienceQA benchmark, outperforming GPT-3.5 accuracy by 16 percentage points and surpassing human performance.