
Novel AI model inspired by neural dynamics from the brain

Bashir explains that how quickly a model answers a question has a big impact on its energy use. While the electricity demands of data centers may be getting the most attention in research literature, the amount of water these facilities consume has environmental impacts as well. It has been estimated that, for each kilowatt-hour of energy a data center consumes, it needs two liters of water for cooling, says Bashir. Market research firm TechInsights estimates that the three major producers (NVIDIA, AMD, and Intel) shipped 3.85 million GPUs to data centers in 2023, up from about 2.67 million in 2022. Plus, generative AI models have an especially short shelf-life, driven by rising demand for new AI applications.

MIT Sea Grant students explore the intersection of technology and offshore aquaculture in Norway

It’s now upending educational models and, in some cases, complicating efforts to improve student outcomes. “Throughout my career, I’ve tried to be a person who researches education and technology and translates findings for people who work in the field,” says Reich. “An efficient image-generation model would unlock a lot of possibilities,” he says. “LLMs are a good interface for all sorts of models, like multimodal models and models that can reason.” It uses about 31 percent less computation than state-of-the-art models. An autoregressive model uses an autoencoder to compress raw image pixels into discrete tokens and to reconstruct the image from the predicted tokens.

Konstantin Rusch and Daniela Rus have developed what they call “linear oscillatory state-space models” (LinOSS), which leverage principles of forced harmonic oscillators — a concept deeply rooted in physics and observed in biological neural networks. One new type of AI model, called “state-space models,” has been designed specifically to understand these sequential patterns more effectively. MIT neuroscientists find a surprising parallel in the ways humans and new AI models solve complex problems. AI supports the clean energy transition as it manages power grid operations, helps plan infrastructure investments, guides development of novel materials, and more. Large language models can learn to mistakenly link certain sentence patterns with specific topics — and may then repeat these patterns instead of reasoning. In this context, papers that unify and connect existing algorithms are of great importance, yet they are extremely rare.
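To make the oscillator idea concrete, here is a minimal sketch of a single forced-harmonic-oscillator state channel driven by an input sequence. The function name, the discretization, and the parameter values are illustrative assumptions, not the published LinOSS formulation.

```python
import numpy as np

def oscillatory_ssm(u, omega, dt=1.0):
    """Toy forced-harmonic-oscillator state channel (illustrative, not LinOSS itself).

    Keeps a (position, velocity) state driven by the input sequence u,
    following x'' = -omega^2 * x + u, discretized with a simple symplectic Euler step.
    """
    T = len(u)
    x, v = 0.0, 0.0                                # position and velocity of the oscillator state
    outputs = np.empty(T)
    for t in range(T):
        v = v + dt * (-(omega ** 2) * x + u[t])    # velocity update with input forcing
        x = x + dt * v                             # position update
        outputs[t] = x                             # read out the oscillator position
    return outputs

# Example: a long input sequence; the oscillator state carries information across time.
signal = np.sin(np.linspace(0, 50, 5000)) + 0.1 * np.random.randn(5000)
features = oscillatory_ssm(signal, omega=0.3, dt=0.1)
```

Stacking many such channels with different frequencies, and learning how to read them out, is the rough intuition behind oscillatory state-space layers.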

Lisa Su ’90, SM ’91, PhD ’94 to deliver MIT’s 2026 Commencement address

With HART, the researchers developed a hybrid approach that uses an autoregressive model to predict compressed, discrete image tokens, then a small diffusion model to predict residual tokens. Autoregressive models, commonly used for predicting text, can generate images by predicting patches of an image sequentially, a few pixels at a time. On the other hand, the autoregressive models that power LLMs like ChatGPT are much faster, but they produce poorer-quality images that are often riddled with errors.

But while generative models can achieve incredible results, they aren’t the best choice for all types of data. For instance, Isola’s group is using generative AI to create synthetic image data that could be used to train another intelligent system, such as by teaching a computer vision model how to recognize objects. Diffusion models were introduced in 2015 by researchers at Stanford University and the University of California at Berkeley. Just a few years ago, researchers tended to focus on finding a machine-learning algorithm that makes the best use of a specific dataset. “We were generating things way before the last decade, but the major distinction here is in terms of the complexity of objects we can generate and the scale at which we can train these models,” he explains. Generative AI can be thought of as a machine-learning model that is trained to create new data, rather than making a prediction about a specific dataset.

How artificial intelligence can help achieve a clean energy future

  • In fact, MIT researchers have developed a speech-to-reality system, an AI-driven workflow that allows them to provide input to a robotic arm and “speak objects into existence,” creating things like furniture in as little as five minutes.
  • “They want a human recruiter, a human doctor who can see them as distinct from other people.”
  • “AI isn’t like learning to tie knots; we don’t know what AI is, or is going to be, yet.”
  • Instead of having a model make an image of a chair, perhaps it could generate a plan for a chair that could be produced.

“We’ve shown that just one very elegant equation, rooted in the science of information, gives you rich algorithms spanning 100 years of research in machine learning.” During the development of HART, the researchers encountered challenges in effectively integrating the diffusion model to enhance the autoregressive model. Because the diffusion model de-noises all pixels in an image at each step, and there may be 30 or more steps, the process is slow and computationally expensive. The generation process consumes fewer computational resources than typical diffusion models, enabling HART to run locally on a commercial laptop or smartphone. Their hybrid image-generation tool uses an autoregressive model to quickly capture the big picture and then a small diffusion model to refine the details of the image. In 2017, researchers at Google introduced the transformer architecture, which has been used to develop large language models, like those that power ChatGPT.

Noman Bashir, a fellow with the MIT Climate and Sustainability Consortium and a postdoc at CSAIL, speaks with Wired reporter Molly Taft about AI and energy consumption. A data center is a temperature-controlled building that houses computing infrastructure, such as servers, data storage drives, and network equipment. Chilled water is used to cool a data center by absorbing heat from computing equipment. While it is difficult to estimate how much power is needed to manufacture a GPU, a type of powerful processor that can handle intensive generative AI workloads, it would be more than what is needed to produce a simpler CPU because the fabrication process is more complex. In a 2021 research paper, scientists from Google and the University of California at Berkeley estimated that training GPT-3 alone consumed 1,287 megawatt-hours of electricity (enough to power about 120 average U.S. homes for a year), generating about 552 tons of carbon dioxide.
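As a rough sanity check of the “about 120 homes” comparison, assuming an average U.S. household uses on the order of 10,700 kilowatt-hours per year (an assumed figure, not one from the article):

```python
# Rough sanity check of the "about 120 homes" comparison (household figure is assumed).
training_energy_kwh = 1_287 * 1_000        # 1,287 MWh reported for the training run
avg_home_kwh_per_year = 10_700             # assumed average annual U.S. household use
homes_powered_for_a_year = training_energy_kwh / avg_home_kwh_per_year
print(round(homes_powered_for_a_year))     # ~120
```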

3 Questions: Justin Reich on the state of teacher speech in America

AI often struggles with analyzing complex information that unfolds over long periods of time, such as climate trends, biological signals, or financial data. “With LinOSS, we can now reliably learn long-range interactions, even in sequences spanning hundreds of thousands of data points or more.” This approach provides stable, expressive, and computationally efficient predictions without overly restrictive conditions on the model parameters. By stacking multiple active components based on new materials on the back end of a computer chip, this new approach reduces the amount of energy wasted during computation.

MIT scientists debut a generative AI model that could create molecules addressing hard-to-treat diseases

The researchers filled in one gap by borrowing ideas from a machine-learning technique called contrastive learning and applying them to image clustering. It includes everything from classification algorithms that can detect spam to the deep learning algorithms that power LLMs. After joining the Freeman Lab, Alshammari began studying clustering, a machine-learning technique that classifies images by learning to organize similar images into nearby clusters. The researchers didn’t set out to create a periodic table of machine learning.
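The article doesn’t detail the borrowed machinery, but a generic contrastive objective looks something like the sketch below: embeddings of two augmented views of the same image are pulled together while all other pairings are pushed apart, after which nearby embeddings can be grouped into clusters. The loss here is a standard InfoNCE-style illustration, not the researchers’ actual method.

```python
import numpy as np

def info_nce_loss(z1, z2, temperature=0.1):
    """Standard InfoNCE-style contrastive loss between two augmented views.

    z1[i] and z2[i] are embeddings of two views of the same image (positives);
    every other pairing is treated as a negative.
    """
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / temperature                                  # pairwise similarities
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.diag(log_probs).mean()                                 # positives lie on the diagonal

# After training an encoder with such a loss, similar images end up close together,
# so a simple clustering step (e.g., k-means) can group them.
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(8, 16))
views = embeddings + 0.05 * rng.normal(size=embeddings.shape)         # stand-in "augmented" views
print(info_nce_loss(embeddings, views))
```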

  • However, existing state-space models often face challenges — they can become unstable or require a significant amount of computational resources when processing long data sequences.
  • A user only needs to enter one natural language prompt into the HART interface to generate an image.

Novel AI model inspired by neural dynamics from the brain

Reich invites people directly impacted by AI to help develop solutions to the challenges its ubiquity presents. “If we think teachers provide content and context to support learning and students no longer perform the exercises housing the content and providing the context, that’s a serious problem.” Schools are also struggling to measure how student learning loss looks in the age of AI. AI’s arrival has left schools scrambling to respond to multiple challenges, like how to ensure academic integrity and maintain data privacy.

While bigger datasets are one catalyst that led to the generative AI boom, a variety of major research advances also led to more complex deep-learning architectures. The base models underlying ChatGPT and similar systems work in much the same way as a Markov model. In text prediction, a Markov model generates the next word in a sentence by looking at the previous word or a few previous words. A generative AI system is one that learns to generate more objects that look like the data it was trained on. For instance, such models are trained, using millions of examples, to predict whether a certain X-ray shows signs of a tumor or if a particular borrower is likely to default on a loan. By leveraging natural language, the system makes design and manufacturing more accessible to people without expertise in 3D modeling or robotic programming.
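A toy bigram version of that Markov idea, purely for illustration (nothing like the scale of a real base model), might look like this:

```python
import random
from collections import defaultdict

def train_bigram_model(text):
    """Toy Markov model: record which words follow which in the training text."""
    words = text.split()
    transitions = defaultdict(list)
    for prev, nxt in zip(words, words[1:]):
        transitions[prev].append(nxt)
    return transitions

def generate(transitions, start, length=10):
    """Generate text by repeatedly sampling a next word given only the previous word."""
    out = [start]
    for _ in range(length - 1):
        choices = transitions.get(out[-1])
        if not choices:
            break
        out.append(random.choice(choices))
    return " ".join(out)

corpus = "the model predicts the next word the model predicts text"
model = train_bigram_model(corpus)
print(generate(model, "the"))
```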

Each time a model is used, perhaps by an individual asking ChatGPT to summarize an email, the computing hardware that performs those operations consumes energy. The power needed to train and deploy a model like OpenAI’s GPT-3 is difficult to ascertain. Globally, the electricity consumption of data centers rose to 460 terawatt-hours in 2022. This would have made data centers the 11th largest electricity consumer in the world, between the nations of Saudi Arabia (371 terawatt-hours) and France (463 terawatt-hours), according to the Organization for Economic Co-operation and Development. By 2026, the electricity consumption of data centers is expected to approach 1,050 terawatt-hours (which would bump data centers up to fifth place on the global list, between Japan and Russia). While data centers have been around since the 1940s (the first was built at the University of Pennsylvania in 1945 to support the first general-purpose digital computer, the ENIAC), the rise of generative AI has dramatically increased the pace of data center construction.

Their tool, known as HART (short for hybrid autoregressive transformer), can generate images that match or exceed the quality of state-of-the-art diffusion models, but do so about nine times faster. Popular diffusion models, such as Stable Diffusion and DALL-E, are known to produce highly detailed images. By iteratively refining their output, these models learn to generate new data samples that resemble samples in a training dataset, and have been used to create realistic-looking images. This minimal overhead of the additional diffusion model allows HART to retain the speed advantage of the autoregressive model while significantly enhancing its ability to generate intricate image details. Because the diffusion model only predicts the remaining details after the autoregressive model has done its job, it can accomplish the task in eight steps, instead of the usual 30 or more a standard diffusion model requires to generate an entire image.
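Read as a pipeline, the description above amounts to two stages. The sketch below is schematic only, with stand-in functions (random tokens, a trivial decoder, a dummy refinement loop) in place of HART’s actual autoregressive transformer, tokenizer, and residual diffusion model.

```python
import numpy as np

rng = np.random.default_rng(0)

def predict_tokens_autoregressively(prompt, n_tokens=64):
    """Stand-in for the autoregressive stage: emit discrete image tokens one by one.
    (The prompt is unused in this placeholder.)"""
    return np.array([rng.integers(0, 1024) for _ in range(n_tokens)])

def decode_tokens(tokens):
    """Stand-in decoder: map discrete tokens to a coarse image (the 'big picture')."""
    return tokens.reshape(8, 8).repeat(4, axis=0).repeat(4, axis=1) / 1024.0

def refine_residual(coarse, steps=8):
    """Stand-in for the small diffusion stage: a few denoising-style passes over the
    residual detail, instead of 30+ full-image steps."""
    residual = rng.normal(scale=0.1, size=coarse.shape)
    for _ in range(steps):
        residual *= 0.5                        # placeholder refinement of fine details
    return coarse + residual

coarse = decode_tokens(predict_tokens_autoregressively("an astronaut riding a horse"))
image = refine_residual(coarse, steps=8)       # 8 refinement steps vs. ~30 for full diffusion
```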

Supporting sustainability, digital health, and the future of work

An influential 2015 paper on “algorithm aversion” found that people are less forgiving of AI-generated errors than of human errors, whereas a widely noted 2019 paper on “algorithm appreciation” found that people preferred advice from AI, compared to advice from humans. “There are differences in how these models work and how we think the human brain works, but I think there are also similarities.” Generative AI chatbots are now being used in call centers to field questions from human customers, but this application underscores one potential red flag of implementing these models — worker displacement. As long as your data can be converted into this standard, token format, then in theory, you could apply these methods to generate new data that look similar. In 2014, a machine-learning architecture known as a generative adversarial network (GAN) was proposed by researchers at the University of Montreal.

“We’re starting to see machine learning as a system with structure that is a space we can explore rather than just guess our way through.” The table gives researchers a toolkit to design new algorithms without the need to rediscover ideas from prior approaches, says Shaden Alshammari, an MIT graduate student and lead author of a paper on this new framework. Just like the periodic table of chemical elements, which initially contained blank squares that were later filled in by scientists, the periodic table of machine learning also has empty spaces. Building on these insights, the researchers identified a unifying equation that underlies many classical AI algorithms. The new framework sheds light on how scientists could fuse strategies from different methods to improve existing AI models or come up with new ones.

What all of these approaches have in common is that they convert inputs into a set of tokens, which are numerical representations of chunks of data. In natural language processing, a transformer encodes each word in a corpus of text as a token and then generates an attention map, which captures each token’s relationships with all other tokens. This attention map helps the transformer understand context when it generates new text. This recurrence helps the model understand how to cut text into statistical chunks that have some predictability. And it has been trained on an enormous amount of data — in this case, much of the publicly available text on the internet.
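In its simplest form, such an attention map is just a normalized table of token-to-token similarities. The sketch below uses raw embeddings and scaled dot products for illustration, omitting the learned query/key projections a real transformer would apply.

```python
import numpy as np

def attention_map(token_embeddings):
    """Scaled dot-product attention weights: how much each token attends to every other."""
    d = token_embeddings.shape[1]
    scores = token_embeddings @ token_embeddings.T / np.sqrt(d)   # pairwise similarities
    scores -= scores.max(axis=1, keepdims=True)                   # numerical stability
    weights = np.exp(scores)
    return weights / weights.sum(axis=1, keepdims=True)           # each row sums to 1

# Four tokens with 8-dimensional embeddings (made-up numbers for illustration).
tokens = np.random.default_rng(1).normal(size=(4, 8))
print(attention_map(tokens))          # a 4x4 map of token-to-token attention weights
```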

Rather than falling into camps of techno-optimists and Luddites, people are discerning about the practical upshot of using AI, case by case. A new study finds that people are neither entirely enthusiastic nor totally averse to AI. “We have the ability to think and dream in our heads, to come up with interesting ideas or plans, and I think generative AI is one of the tools that will empower agents to do that, as well,” Isola says. On the other side, Shah proposes that generative AI could empower artists, who could use generative tools to help them make creative content they might not otherwise have the means to produce. In addition, generative AI can inherit and proliferate biases that exist in training data, or amplify hate speech and false statements. The same way a generative model learns the dependencies of language, if it’s shown crystal structures instead, it can learn the relationships that make structures stable and realizable, he explains.
