A recent study from the University of Wisconsin-Madison, the University of Michigan, and Microsoft Research provides empirical support for task superposition across LLM families and scales. Even models trained to learn one task at a time through in-context learning (ICL) exhibit this capacity to perform several tasks simultaneously. This suggests that the capacity is an intrinsic trait that emerges during inference rather than a direct product of the type of training.
Theoretically, task superposition fits the capabilities of transformer architectures, which underlie most contemporary LLMs. Transformers are known for capturing intricate patterns and dependencies in data through mechanisms such as self-attention, which lets them focus on different segments of the input as needed. This versatility lets them represent and interpret task-specific information for several tasks within a single prompt, making it feasible to generate responses that address multiple tasks simultaneously.
The study also explores how LLMs handle task superposition internally. It examines how they combine task vectors, i.e., the internal representations specific to each task. In essence, the model balances these task-specific representations by adjusting its internal state during inference, which enables it to generate accurate outputs for every task type presented in the input.
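To make the idea of balancing task vectors more concrete, here is a minimal sketch that forms a weighted (convex) combination of per-task vectors. It is only an illustration under stated assumptions: the helper `mix_task_vectors`, the task names, and the three-dimensional toy vectors are all hypothetical, and the article does not say that the study combines task vectors in exactly this way.

```python
from typing import Dict, List


def mix_task_vectors(
    task_vectors: Dict[str, List[float]],
    weights: Dict[str, float],
) -> List[float]:
    """Return a convex combination of per-task vectors.

    Hypothetical helper for illustration; real task vectors would be
    extracted from a model's hidden states, not written by hand.
    """
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive value")

    dim = len(next(iter(task_vectors.values())))
    mixed = [0.0] * dim
    for task, weight in weights.items():
        vector = task_vectors[task]
        if len(vector) != dim:
            raise ValueError(f"vector for {task!r} has the wrong dimension")
        # Normalize the weight so that all coefficients sum to 1.
        coefficient = weight / total
        for i, value in enumerate(vector):
            mixed[i] += coefficient * value
    return mixed


# Toy vectors (assumed values, not taken from any model).
task_vectors = {
    "translate_to_french": [1.0, 0.0, 0.0],
    "uppercase": [0.0, 1.0, 0.0],
}

# A prompt with three translation examples and one uppercase example
# would, under this toy assumption, weight the tasks 3:1.
mixed = mix_task_vectors(task_vectors, {"translate_to_french": 3, "uppercase": 1})
print(mixed)
```

In this toy setup, the mixed vector leans toward the task that appears more often in the prompt, which is the intuition behind a model weighting several task-specific representations within one internal state.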
One of the study’s main conclusions is that larger LLMs are typically better at handling several tasks at once. As model size grows, the model can handle more tasks concurrently and calibrates its output probabilities more accurately. Larger models are therefore better at multitasking and produce more precise and dependable answers across the tasks they perform.
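One way to think about calibration here is to compare the mix of tasks in the prompt with the probability the model assigns to each task's answer. The sketch below uses KL divergence for that comparison. This is an assumption made for illustration and not necessarily the metric used in the study; the function name and every number in it are made-up toy values, not results.

```python
import math
from typing import Dict


def kl_divergence(p: Dict[str, float], q: Dict[str, float], eps: float = 1e-12) -> float:
    """KL(p || q) over the keys of p; q is clipped at eps to avoid log(0)."""
    return sum(
        p_k * math.log(p_k / max(q.get(key, 0.0), eps))
        for key, p_k in p.items()
        if p_k > 0
    )


# Share of in-context examples per task in a hypothetical prompt.
prompt_mix = {"task_a": 0.5, "task_b": 0.5}

# Hypothetical output distributions over the two task-specific answers.
poorly_calibrated = {"task_a": 0.9, "task_b": 0.1}
well_calibrated = {"task_a": 0.55, "task_b": 0.45}

gap_poor = kl_divergence(prompt_mix, poorly_calibrated)
gap_good = kl_divergence(prompt_mix, well_calibrated)
print(f"poorly calibrated: {gap_poor:.4f}")
print(f"well calibrated:   {gap_good:.4f}")
```

A smaller divergence means the output probabilities track the task mix in the prompt more closely, which is the sense in which a better-calibrated model serves all of the presented tasks.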
These findings clarify the fundamental capabilities of LLMs and lend credence to the idea that these models are a superposition of simulators. On this view, an LLM can simulate a variety of task-specific models internally, which lets it respond flexibly depending on the input’s context. The results also raise interesting questions about how LLMs actually accomplish several tasks at once, including whether this ability results from their training and optimization or stems from a deeper structural property of the model. A deeper understanding of these mechanisms may help identify the limitations and possible uses of LLMs for intricate, multifaceted tasks.
The team has shared their primary contributions as follows.