
Archive for Programming

Museums have tons of data, and AI could make it more accessible − but standardizing and organizing it across fields won’t be easy

By Bradley Wade Bishop, University of Tennessee 

Ice cores in freezers, dinosaurs on display, fish in jars, birds in boxes, human remains and ancient artifacts from long gone civilizations that few people ever see – museum collections are filled with all this and more.

These collections are treasure troves that recount the planet’s natural and human history, and they help scientists in fields such as geology, paleontology and anthropology. What you see on a trip to a museum is only a sliver of the wonders held in their collections.

Museums generally want to make the contents of their collections available for teachers and researchers, either physically or digitally. However, each collection’s staff has its own way of organizing data, so navigating these collections can prove challenging.

Creating, organizing and distributing the digital copies of museum samples or the information about physical items in a collection requires incredible amounts of data. And this data can feed into machine learning models or other artificial intelligence to answer big questions.

Currently, even within a single research domain, finding the right data requires navigating different repositories. AI can help organize large amounts of data from different collections and pull out information to answer specific questions.

But using AI isn’t a perfect solution. A set of shared practices and systems for data management between museums could improve the data curation and sharing necessary for AI to do its job. These practices could help both humans and machines make new discoveries from these valuable collections.

As an information scientist who studies scientists’ approaches to and opinions on research data management, I’ve seen how the world’s physical collection infrastructure is a patchwork quilt of objects and their associated metadata.

AI tools can do amazing things, such as make 3D models of digitized versions of the items in museum collections, but only if there’s enough well-organized data about that item available. To see how AI can help museum collections, my team of researchers started by conducting focus groups with the people who manage museum collections. We asked what they are doing to get their collections used by both humans and AI.

Collection managers

When an item comes into a museum collection, collection managers are the people who describe that item’s features and generate data about it. That data, called metadata, is what allows others to find and use the item. It might include things like the collector’s name, the geographic location, the time of collection and, in the case of geological samples, the epoch it’s from. For samples from an animal or plant, it might include the taxonomy, which is the set of Latin names that classify it.

Altogether, that information adds up to a mind-boggling amount of data.

But combining data across domains with different standards is really tricky. Fortunately, collection managers have been working to standardize their processes across disciplines and for many types of samples. Grants have helped science communities build tools for standardization.

In biological collections, the tool Specify allows managers to quickly classify specimens with drop-down menus prepopulated with standards for taxonomy and other parameters to consistently describe the incoming specimens.

A common metadata standard in biology is Darwin Core. Similar well-established metadata and tools exist across all the sciences to make the workflow of taking real items and putting them into a machine as easy as possible.
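
To make this concrete, here is a minimal sketch in Python of what a Darwin Core-style specimen record can look like. The field names come from the public Darwin Core vocabulary; the specimen values are invented for illustration.

# A minimal sketch of a Darwin Core-style specimen record, represented as a
# plain Python dictionary. Field names follow the public Darwin Core
# vocabulary; the values are invented for illustration.
specimen = {
    "scientificName": "Micropterus salmoides",   # taxonomy (Latin binomial)
    "basisOfRecord": "PreservedSpecimen",        # a fish in a jar, for example
    "recordedBy": "J. Smith",                    # collector's name
    "eventDate": "1987-06-14",                   # when it was collected
    "decimalLatitude": 35.9606,                  # where it was collected
    "decimalLongitude": -83.9207,
    "institutionCode": "UT",                     # which collection holds it
}

def is_machine_ready(record, required=("scientificName", "eventDate",
                                       "decimalLatitude", "decimalLongitude")):
    """Check that the fields downstream tools (or AI models) need are present."""
    return all(record.get(field) not in (None, "") for field in required)

print(is_machine_ready(specimen))  # True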

Special tools like these and metadata help collection managers make data from their objects reusable for research and educational purposes.

Many of the items in museum collections don’t have a lot of information describing their origins. AI tools can help fill in gaps.

All the small things

My team and I conducted 10 focus groups, with a total of 32 participants from several physical sample communities. These included collection managers from disciplines spanning anthropology, archaeology, botany, geology, ichthyology, entomology, herpetology and paleontology.

Each participant answered questions about how they accessed, organized, stored and used data from their collections in an effort to make their materials ready for AI to use. While human subjects need to provide consent to be studied, most species do not. So, an AI can collect and analyze the data from nonhuman physical collections without privacy or consent concerns.

We found that collection managers from different fields and institutions have lots of different practices when it comes to getting their physical collections ready for AI. Our results suggest that standardizing the types of metadata managers record and the ways they store it across collections could make the items in these collections more accessible and usable.

Additional research projects like our study can help collection managers build up the infrastructure they’ll need to make their data machine-ready. Human expertise can help inform AI tools that make new discoveries based on the old treasures in museum collections.

About the Author:

Bradley Wade Bishop, Professor of Information Sciences, University of Tennessee

This article is republished from The Conversation under a Creative Commons license. Read the original article.

How AI can help in the creative design process

By Tilanka Chandrasekera, Oklahoma State University 

Generative artificial intelligence tools can help design students by making hard tasks easier, cutting down on stress, and allowing the students more time to explore innovative ideas, according to new research I published with my colleagues in the International Journal of Architectural Computing.

I study how people think about design and use technology, and my research focuses on how tools such as AI can help make the design process more efficient and creative.

A student works on a design in a fashion merchandising lab.
Fashion Merchandising Labs at Oklahoma State University, CC BY-ND

Why it matters

Our study found that AI design tools didn’t just make the designs better – they also made the process easier and less stressful for students.

Imagine trying to come up with a cool idea in response to a design assignment, but it’s hard to picture it in your head. These tools step in and quickly show what your idea could look like, so you can focus on being creative instead of worrying about little details. This made it easier for students to brainstorm and come up with new ideas. The AI tools also made more design variations by introducing new and unexpected details, such as natural shapes and textures.

A design fueled by artificial intelligence: The left image is the result of the text-to-image technology, and the image on the right is the design completed by the student.
Oklahoma State University, CC BY-ND

A design sketched in pencil by a student without using artificial intelligence.
Oklahoma State University, CC BY-ND

How we did our work

My colleagues and I worked with 40 design students and split them into two groups.

One group used AI to help design urban furniture, such as benches and seating for public spaces, while the other group didn’t use AI. The AI tool created pictures of the first group’s design ideas from simple text descriptions. Both groups refined their ideas by either sketching them by hand or with design software.

Next, the two groups were given a second design task. This time, neither group was allowed to use AI. We wanted to see whether the first task helped them learn how to develop a design concept.

My colleagues and I evaluated the students’ creativity on three criteria: the novelty of their ideas, the effectiveness of their designs in solving the problem, and the level of detail and completeness in their work. We also wanted to see how hard the tasks felt for them, so we measured something called cognitive load using a well-known tool called the NASA task load index. This tool checks how much mental effort and frustration the students experienced.
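
For readers curious about the scoring, here is a minimal Python sketch of the unweighted "raw" NASA task load index: each of the six subscales is rated from 0 to 100 and the overall workload is their average. The ratings below are invented for illustration, not data from our study.

# A minimal sketch of scoring the NASA task load index in its unweighted
# "Raw TLX" form: each of the six subscales is rated 0-100 and the overall
# workload is their average. The ratings below are invented for illustration.
def raw_tlx(ratings):
    subscales = ("mental_demand", "physical_demand", "temporal_demand",
                 "performance", "effort", "frustration")
    return sum(ratings[s] for s in subscales) / len(subscales)

student = {"mental_demand": 70, "physical_demand": 10, "temporal_demand": 55,
           "performance": 35, "effort": 60, "frustration": 40}
print(raw_tlx(student))  # 45.0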

The group of students who used AI in the first task had an easier time in the second task, feeling less overwhelmed compared with those who didn’t use AI.

The final designs of the AI group also showed a more creative design process in the second task, likely because they learned from using AI in the first task, which helped them think and develop better ideas.

What’s next

Future research will look at how AI tools can be used in more parts of design education and how they might affect the way professionals work.

One challenge is making sure students don’t rely too much on AI, which could hurt their ability to think critically and solve problems on their own.

Another goal is to make sure as many design students as possible have access to these tools.

The Research Brief is a short take on interesting academic work.

About the Author:

Tilanka Chandrasekera, Professor of Interior Design, Oklahoma State University

This article is republished from The Conversation under a Creative Commons license. Read the original article.

 

AI datasets have human values blind spots − new research

By Ike Obi, Purdue University 

My colleagues and I at Purdue University have uncovered a significant imbalance in the human values embedded in AI systems. The systems were predominantly oriented toward information and utility values and less toward prosocial, well-being and civic values.

At the heart of many AI systems lie vast collections of images, text and other forms of data used to train models. While these datasets are meticulously curated, they sometimes contain unethical or prohibited content.

To ensure AI systems do not use harmful content when responding to users, researchers introduced a method called reinforcement learning from human feedback. Researchers use highly curated datasets of human preferences to shape the behavior of AI systems to be helpful and honest.

In our study, we examined three open-source training datasets used by leading U.S. AI companies. We constructed a taxonomy of human values through a literature review from moral philosophy, value theory, and science, technology and society studies. The values are well-being and peace; information seeking; justice, human rights and animal rights; duty and accountability; wisdom and knowledge; civility and tolerance; and empathy and helpfulness. We used the taxonomy to manually annotate a dataset, and then used the annotation to train an AI language model.
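
As a simplified illustration of that final step, not our exact pipeline, a value classifier can be trained on annotated examples with off-the-shelf tools such as scikit-learn. The tiny labeled dataset below is invented; a real one would contain thousands of annotated rows.

# A minimal sketch of training a text classifier on value-annotated examples,
# in the spirit of (but not identical to) the study's pipeline. The tiny
# labeled dataset here is invented; a real one would have thousands of rows.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = [
    "How do I book a flight to Chicago?",
    "What documents do I need to renew a passport?",
    "I feel really alone lately and don't know what to do.",
    "Is it fair that workers can be fired for reporting harassment?",
]
labels = [
    "information seeking",
    "information seeking",
    "empathy and helpfulness",
    "justice, human rights and animal rights",
]

model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
model.fit(texts, labels)
print(model.predict(["How can I support a friend who is grieving?"]))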

Our model allowed us to examine the AI companies’ datasets. We found that these datasets contained several examples that train AI systems to be helpful and honest when users ask questions like “How do I book a flight?” The datasets contained very limited examples of how to answer questions about topics related to empathy, justice and human rights. Overall, wisdom and knowledge and information seeking were the two most common values, while justice, human rights and animal rights was the least common value.

The researchers started by creating a taxonomy of human values.
Obi et al, CC BY-ND

Why it matters

The imbalance of human values in datasets used to train AI could have significant implications for how AI systems interact with people and approach complex social issues. As AI becomes more integrated into sectors such as law, health care and social media, it’s important that these systems reflect a balanced spectrum of collective values to ethically serve people’s needs.

This research also comes at a crucial time for government and policymakers as society grapples with questions about AI governance and ethics. Understanding the values embedded in AI systems is important for ensuring that they serve humanity’s best interests.

What other research is being done

Many researchers are working to align AI systems with human values. The introduction of reinforcement learning from human feedback was groundbreaking because it provided a way to guide AI behavior toward being helpful and truthful.

Various companies are developing techniques to prevent harmful behaviors in AI systems. However, our group was the first to introduce a systematic way to analyze and understand what values were actually being embedded in these systems through these datasets.

What’s next

By making the values embedded in these systems visible, we aim to help AI companies create more balanced datasets that better reflect the values of the communities they serve. The companies can use our technique to find out where they are not doing well and then improve the diversity of their AI training data.

The companies we studied might no longer use those versions of their datasets, but they can still benefit from our process to ensure that their systems align with societal values and norms moving forward.

About the Author:

Ike Obi, Ph.D. student in Computer and Information Technology, Purdue University

This article is republished from The Conversation under a Creative Commons license. Read the original article.

AI gives nonprogrammers a boost in writing computer code

By Leo Porter, University of California, San Diego and Daniel Zingaro, University of Toronto 

What do you think there are more of: professional computer programmers or computer users who do a little programming?

It’s the second group. There are millions of so-called end-user programmers. They’re not going into a career as a professional programmer or computer scientist. They’re going into business, teaching, law, or any number of professions – and they just need a little programming to be more efficient. The days of programmers being confined to software development companies are long gone.

If you’ve written formulas in Excel, filtered your email based on rules, modded a game, written a script in Photoshop, used R to analyze some data, or automated a repetitive work process, you’re an end-user programmer.

As educators who teach programming, we want to help students in fields other than computer science achieve their goals. But learning how to program well enough to write finished programs can be hard to accomplish in a single course because there is so much to learn about the programming language itself. Artificial intelligence can help.

Lost in the weeds

Learning the syntax of a programming language – for example, where to place colons and where indentation is required – takes a lot of time for many students. Spending time at the level of syntax is a waste for students who simply want to use coding to help solve problems rather than learn the skill of programming.

As a result, we feel our existing classes haven’t served these students well. Indeed, many students end up barely able to write small functions – short, discrete pieces of code – let alone write a full program that can help make their lives better.

Tools built on large language models such as GitHub Copilot may allow us to change these outcomes. These tools have already changed how professionals program, and we believe we can use them to help future end-user programmers write software that is meaningful to them.

These AIs almost always write syntactically correct code and can often write small functions based on prompts in plain English. Because students can use these tools to handle some of the lower-level details of programming, it frees them to focus on bigger-picture questions that are at the heart of writing software programs. Numerous universities now offer programming courses that use Copilot.

At the University of California, San Diego, we’ve created an introductory programming course primarily for those who are not computer science students that incorporates Copilot. In this course, students learn how to program with Copilot as their AI assistant, following the curriculum from our book. In our course, students learn high-level skills such as decomposing large tasks into smaller tasks, testing code to ensure its correctness, and reading and fixing buggy code.
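
As an illustration of the kind of task this enables, here is a small Python function of the sort an AI assistant might draft from a plain-English prompt, followed by the sort of simple tests students write to check its correctness. This is illustrative only, not actual Copilot output.

# Illustration only: the kind of small function an AI assistant might draft
# from a plain-English prompt, plus simple tests a student could write to
# check its correctness. This is not actual Copilot output.

# Prompt: "write a function that counts how many times each word appears
# in a piece of text, ignoring case"
def word_counts(text):
    counts = {}
    for word in text.lower().split():
        counts[word] = counts.get(word, 0) + 1
    return counts

# Testing the code to ensure its correctness, as taught in the course
assert word_counts("the cat and the hat") == {"the": 2, "cat": 1, "and": 1, "hat": 1}
assert word_counts("") == {}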

Freed to solve problems

In this course, we’ve been giving students large, open-ended projects and couldn’t be happier with what they have created.

For example, in a project where students had to find and analyze online datasets, we had a neuroscience major create a data visualization tool that illustrated how age and other factors affected stroke risk. In another project, students integrated their personal art into a collage after applying filters they had created using the programming language Python. These projects were well beyond the scope of what we could ask students to do before the advent of large language model AIs.
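
As a rough illustration of the collage project, here is the kind of short Python filter a student might write with the Pillow imaging library before adding an image to a collage; the file names are placeholders.

# A minimal sketch of the kind of Python image filter a student might apply
# before adding a photo to a collage. Requires the Pillow library; the file
# names are placeholders.
from PIL import Image, ImageFilter, ImageOps

photo = Image.open("artwork.jpg")
filtered = ImageOps.grayscale(photo).filter(ImageFilter.CONTOUR)  # sketch-like effect
filtered.save("artwork_filtered.jpg")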

Given the rhetoric about how AI is ruining education by writing papers for students and doing their homework, you might be surprised to hear educators like us talking about its benefits. AI, like any other tool people have created, can be helpful in some circumstances and unhelpful in others.

In our introductory programming course with a majority of students who are not computer science majors, we see firsthand how AI can empower students in specific ways – and promises to expand the ranks of end-user programmers.

About the Author:

Leo Porter, Teaching Professor of Computer Science and Engineering, University of California, San Diego and Daniel Zingaro, Associate Professor of Mathematical and Computational Sciences, University of Toronto

This article is republished from The Conversation under a Creative Commons license. Read the original article.

 

Why building big AIs costs billions – and how Chinese startup DeepSeek dramatically changed the calculus

By Ambuj Tewari, University of Michigan 

State-of-the-art artificial intelligence systems like OpenAI’s ChatGPT, Google’s Gemini and Anthropic’s Claude have captured the public imagination by producing fluent text in multiple languages in response to user prompts. Those companies have also captured headlines with the huge sums they’ve invested to build ever more powerful models.

An AI startup from China, DeepSeek, has upset expectations about how much money is needed to build the latest and greatest AIs. In the process, they’ve cast doubt on the billions of dollars of investment by the big AI players.

I study machine learning. DeepSeek’s disruptive debut comes down not to any stunning technological breakthrough but to a time-honored practice: finding efficiencies. In a field that consumes vast computing resources, that has proved to be significant.

Where the costs are

Developing such powerful AI systems begins with building a large language model. A large language model predicts the next word given previous words. For example, if the beginning of a sentence is “The theory of relativity was discovered by Albert,” a large language model might predict that the next word is “Einstein.” Large language models are trained to become good at such predictions in a process called pretraining.
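
As a toy illustration of next-word prediction, the Python sketch below counts which word follows which in a tiny corpus and predicts the most frequent follower. Real large language models use neural networks with billions of parameters, not a lookup table, but the underlying task is the same.

# A toy sketch of next-word prediction. Real large language models use neural
# networks with billions of parameters, not a lookup table, but the task is
# the same: given the words so far, score possible next words.
from collections import Counter, defaultdict

corpus = "the theory of relativity was discovered by albert einstein".split()

next_word = defaultdict(Counter)
for current, following in zip(corpus, corpus[1:]):
    next_word[current][following] += 1

def predict(word):
    candidates = next_word[word]
    return candidates.most_common(1)[0][0] if candidates else None

print(predict("albert"))  # einstein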

Pretraining requires a lot of data and computing power. The companies collect data by crawling the web and scanning books. Computing is usually powered by graphics processing units, or GPUs. Why graphics? It turns out that both computer graphics and the artificial neural networks that underlie large language models rely on the same area of mathematics known as linear algebra. Large language models internally store hundreds of billions of numbers called parameters or weights. It is these weights that are modified during pretraining.

Large language models consume huge amounts of computing resources, which in turn means lots of energy.

Pretraining is, however, not enough to yield a consumer product like ChatGPT. A pretrained large language model is usually not good at following human instructions. It might also not be aligned with human preferences. For example, it might output harmful or abusive language, both of which are present in text on the web.

The pretrained model therefore usually goes through additional stages of training. One such stage is instruction tuning where the model is shown examples of human instructions and expected responses. After instruction tuning comes a stage called reinforcement learning from human feedback. In this stage, human annotators are shown multiple large language model responses to the same prompt. The annotators are then asked to point out which response they prefer.
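
A common way to use those preferences is to train a reward model so that the preferred response scores higher than the rejected one. The Python sketch below shows that pairwise objective in its simplest form; the scores are invented, and the details differ across labs.

# A minimal sketch of the pairwise objective commonly used at this stage:
# a reward model should score the response the annotator preferred higher
# than the one they rejected. The scores here are invented for illustration.
import math

def preference_loss(reward_chosen, reward_rejected):
    # Lower loss when the chosen response out-scores the rejected one.
    return -math.log(1.0 / (1.0 + math.exp(-(reward_chosen - reward_rejected))))

print(preference_loss(2.0, 0.5))   # small loss: the model agrees with the annotator
print(preference_loss(0.5, 2.0))   # large loss: the model disagrees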

It is easy to see how costs add up when building an AI model: hiring top-quality AI talent, building a data center with thousands of GPUs, collecting data for pretraining, and running pretraining on GPUs. Additionally, there are costs involved in data collection and computation in the instruction tuning and reinforcement learning from human feedback stages.

All included, costs for building a cutting-edge AI model can soar to US$100 million. GPU training is a significant component of the total cost.
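
A back-of-the-envelope calculation, using invented numbers rather than any company’s reported figures, shows how the GPU portion alone can climb:

# A back-of-the-envelope illustration with invented numbers (not reported
# figures) of how GPU costs alone can climb toward that scale.
gpus = 10_000                 # hypothetical cluster size
days_of_pretraining = 60      # hypothetical training run
cost_per_gpu_hour = 2.50      # hypothetical cloud rate in dollars

gpu_hours = gpus * days_of_pretraining * 24
print(f"GPU hours: {gpu_hours:,}")                             # 14,400,000
print(f"Compute cost: ${gpu_hours * cost_per_gpu_hour:,.0f}")  # $36,000,000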

The expenditure does not stop when the model is ready. When the model is deployed and responds to user prompts, it uses more computation known as test time or inference time compute. Test time compute also needs GPUs. In December 2024, OpenAI announced a new phenomenon they saw with their latest model o1: as test time compute increased, the model got better at logical reasoning tasks such as math olympiad and competitive coding problems.

Slimming down resource consumption

Thus it seemed that the path to building the best AI models in the world was to invest in more computation during both training and inference. But then DeepSeek entered the fray and bucked this trend.

DeepSeek sent shockwaves through the tech financial ecosystem.

Their V-series models, culminating in the V3 model, used a series of optimizations to make training cutting-edge AI models significantly more economical. Their technical report states that it took them less than $6 million to train V3. They admit that this cost does not include the costs of hiring the team, doing the research, trying out various ideas and collecting data. But $6 million is still an impressively small figure for training a model that rivals leading AI models developed with much higher costs.

The reduction in costs was not due to a single magic bullet. It was a combination of many smart engineering choices including using fewer bits to represent model weights, innovation in the neural network architecture, and reducing communication overhead as data is passed around between GPUs.
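
As a simplified illustration of the "fewer bits" idea (DeepSeek’s actual scheme is considerably more involved), the Python sketch below maps 32-bit floating point weights onto 8-bit integers plus a single scale factor:

# A simplified sketch of the "fewer bits per weight" idea: map 32-bit float
# weights onto 8-bit integers plus a single scale factor. Real low-precision
# training schemes are considerably more involved than this.
import numpy as np

weights = np.random.randn(4, 4).astype(np.float32)    # toy weight matrix

scale = np.abs(weights).max() / 127.0                  # one scale for the block
quantized = np.round(weights / scale).astype(np.int8)  # 4x less memory per value
dequantized = quantized.astype(np.float32) * scale     # approximate reconstruction

print("max absolute error:", np.abs(weights - dequantized).max())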

It is interesting to note that due to U.S. export restrictions on China, the DeepSeek team did not have access to high performance GPUs like the Nvidia H100. Instead they used Nvidia H800 GPUs, which Nvidia designed to be lower performance so that they comply with U.S. export restrictions. Working with this limitation seems to have unleashed even more ingenuity from the DeepSeek team.

DeepSeek also innovated to make inference cheaper, reducing the cost of running the model. Moreover, they released a model called R1 that is comparable to OpenAI’s o1 model on reasoning tasks.

They released all the model weights for V3 and R1 publicly. Anyone can download and further improve or customize their models. Furthermore, DeepSeek released their models under the permissive MIT license, which allows others to use the models for personal, academic or commercial purposes with minimal restrictions.

Resetting expectations

DeepSeek has fundamentally altered the landscape of large AI models. An open weights model trained economically is now on par with more expensive and closed models that require paid subscription plans.

The research community and the stock market will need some time to adjust to this new reality.

About the Author:

Ambuj Tewari, Professor of Statistics, University of Michigan

This article is republished from The Conversation under a Creative Commons license. Read the original article.

Knowing less about AI makes people more open to having it in their lives – new research

By Chiara Longoni, Bocconi University; Gil Appel, George Washington University, and Stephanie Tully, University of Southern California 

The rapid spread of artificial intelligence has people wondering: who’s most likely to embrace AI in their daily lives? Many assume it’s the tech-savvy – those who understand how AI works – who are most eager to adopt it.

Surprisingly, our new research (published in the Journal of Marketing) finds the opposite. People with less knowledge about AI are actually more open to using the technology. We call this difference in adoption propensity the “lower literacy-higher receptivity” link.

This link shows up across different groups, settings and even countries. For instance, our analysis of data from market research company Ipsos spanning 27 countries reveals that people in nations with lower average AI literacy are more receptive towards AI adoption than those in nations with higher literacy.

Similarly, our survey of US undergraduate students finds that those with less understanding of AI are more likely to report using it for tasks like academic assignments.

The reason behind this link lies in how AI now performs tasks we once thought only humans could do. When AI creates a piece of art, writes a heartfelt response or plays a musical instrument, it can feel almost magical – like it’s crossing into human territory.

Of course, AI doesn’t actually possess human qualities. A chatbot might generate an empathetic response, but it doesn’t feel empathy. People with more technical knowledge about AI understand this.

They know how algorithms (sets of mathematical rules used by computers to carry out particular tasks), training data (used to improve how an AI system works) and computational models operate. This makes the technology less mysterious.

On the other hand, those with less understanding may see AI as magical and awe inspiring. We suggest this sense of magic makes them more open to using AI tools.

Our studies show this lower literacy-higher receptivity link is strongest for using AI tools in areas people associate with human traits, like providing emotional support or counselling. When it comes to tasks that don’t evoke the same sense of human-like qualities – such as analysing test results – the pattern flips. People with higher AI literacy are more receptive to these uses because they focus on AI’s efficiency, rather than any “magical” qualities.

It’s not about capability, fear or ethics

Interestingly, this link between lower literacy and higher receptivity persists even though people with lower AI literacy are more likely to view AI as less capable, less ethical, and even a bit scary. Their openness to AI seems to stem from their sense of wonder about what it can do, despite these perceived drawbacks.

This finding offers new insights into why people respond so differently to emerging technologies. Some studies suggest consumers favour new tech, a phenomenon called “algorithm appreciation”, while others show scepticism, or “algorithm aversion”. Our research points to perceptions of AI’s “magicalness” as a key factor shaping these reactions.

These insights pose a challenge for policymakers and educators. Efforts to boost AI literacy might unintentionally dampen people’s enthusiasm for using AI by making it seem less magical. This creates a tricky balance between helping people understand AI and keeping them open to its adoption.

To make the most of AI’s potential, businesses, educators and policymakers need to strike this balance. By understanding how perceptions of “magicalness” shape people’s openness to AI, we can help develop and deploy new AI-based products and services that take the way people view AI into account, and help them understand the benefits and risks of AI.

And ideally, this will happen without causing a loss of the awe that inspires many people to embrace this new technology.

About the Author:

Chiara Longoni, Associate Professor, Marketing and Social Science, Bocconi University; Gil Appel, Assistant Professor of Marketing, School of Business, George Washington University, and Stephanie Tully, Associate Professor of Marketing, USC Marshall School of Business, University of Southern California

This article is republished from The Conversation under a Creative Commons license. Read the original article.

 

An AI system has reached human level on a test for ‘general intelligence’. Here’s what that means

By Michael Timothy Bennett, Australian National University and Elija Perrier, Stanford University 

A new artificial intelligence (AI) model has just achieved human-level results on a test designed to measure “general intelligence”.

On December 20, OpenAI’s o3 system scored 85% on the ARC-AGI benchmark, well above the previous AI best score of 55% and on par with the average human score. It also scored well on a very difficult mathematics test.

Creating artificial general intelligence, or AGI, is the stated goal of all the major AI research labs. At first glance, OpenAI appears to have at least made a significant step towards this goal.

While scepticism remains, many AI researchers and developers feel something just changed. For many, the prospect of AGI now seems more real, urgent and closer than anticipated. Are they right?

Generalisation and intelligence

To understand what the o3 result means, you need to understand what the ARC-AGI test is all about. In technical terms, it’s a test of an AI system’s “sample efficiency” in adapting to something new – how many examples of a novel situation the system needs to see to figure out how it works.

An AI system like ChatGPT (GPT-4) is not very sample efficient. It was “trained” on millions of examples of human text, constructing probabilistic “rules” about which combinations of words are most likely.

The result is pretty good at common tasks. It is bad at uncommon tasks, because it has less data (fewer samples) about those tasks.

Until AI systems can learn from small numbers of examples and adapt with more sample efficiency, they will only be used for very repetitive jobs and ones where the occasional failure is tolerable.

The ability to accurately solve previously unknown or novel problems from limited samples of data is known as the capacity to generalise. It is widely considered a necessary, even fundamental, element of intelligence.

Grids and patterns

The ARC-AGI benchmark tests for sample efficient adaptation using little grid square problems like the one below. The AI needs to figure out the pattern that turns the grid on the left into the grid on the right.

An example task from the ARC-AGI benchmark test.
ARC Prize

Each question gives three examples to learn from. The AI system then needs to figure out the rules that “generalise” from the three examples to the fourth.
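
As a simplified illustration, the Python sketch below represents a task as input-output grid pairs and accepts a candidate rule only if it reproduces every training example. The toy grids and the rule (mirror each row left to right) are invented and far easier than real ARC-AGI tasks.

# A simplified sketch of how such a task can be represented and checked:
# grids are 2D lists of colour codes, and a candidate rule is accepted only
# if it reproduces every training example. These toy grids and the rule
# (mirror each row left to right) are far simpler than real ARC-AGI tasks.
train_pairs = [
    ([[1, 0], [2, 0]], [[0, 1], [0, 2]]),
    ([[3, 3], [0, 4]], [[3, 3], [4, 0]]),
    ([[5, 0], [0, 0]], [[0, 5], [0, 0]]),
]

def candidate_rule(grid):
    return [list(reversed(row)) for row in grid]

def fits_all_examples(rule, pairs):
    return all(rule(inp) == out for inp, out in pairs)

print(fits_all_examples(candidate_rule, train_pairs))  # True
test_input = [[0, 7], [8, 0]]
print(candidate_rule(test_input))                      # [[7, 0], [0, 8]]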

These are a lot like the IQ tests you might remember from school.

Weak rules and adaptation

We don’t know exactly how OpenAI has done it, but the results suggest the o3 model is highly adaptable. From just a few examples, it finds rules that can be generalised.

To figure out a pattern, we shouldn’t make any unnecessary assumptions, or be more specific than we really have to be. In theory, if you can identify the “weakest” rules that do what you want, then you have maximised your ability to adapt to new situations.

What do we mean by the weakest rules? The technical definition is complicated, but weaker rules are usually ones that can be described in simpler statements.

In the example above, a plain English expression of the rule might be something like: “Any shape with a protruding line will move to the end of that line and ‘cover up’ any other shapes it overlaps with.”

Searching chains of thought?

While we don’t know how OpenAI achieved this result just yet, it seems unlikely they deliberately optimised the o3 system to find weak rules. However, to succeed at the ARC-AGI tasks it must be finding them.

We do know that OpenAI started with a general-purpose version of the o3 model (which differs from most other models, because it can spend more time “thinking” about difficult questions) and then trained it specifically for the ARC-AGI test.

French AI researcher Francois Chollet, who designed the benchmark, believes o3 searches through different “chains of thought” describing steps to solve the task. It would then choose the “best” according to some loosely defined rule, or “heuristic”.

This would be “not dissimilar” to how Google’s AlphaGo system searched through different possible sequences of moves to beat the world Go champion.

You can think of these chains of thought like programs that fit the examples. Of course, if it is like the Go-playing AI, then it needs a heuristic, or loose rule, to decide which program is best.

There could be thousands of different seemingly equally valid programs generated. That heuristic could be “choose the weakest” or “choose the simplest”.
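
Here is a toy Python sketch of that idea, with the "choose the simplest" heuristic crudely approximated by the length of each rule’s description. It illustrates the concept only; it is not how o3 actually works.

# A toy sketch of the idea: keep only candidate rules consistent with the
# examples, then apply a "choose the simplest" heuristic, crudely
# approximated here by the length of each rule's description.
examples = [(1, 2), (4, 5), (10, 11)]   # (input, expected output) pairs

candidates = {
    "add one": lambda x: x + 1,
    "add one unless negative, then subtract one": lambda x: x + 1 if x >= 0 else x - 1,
    "double it": lambda x: x * 2,
}

consistent = {desc: fn for desc, fn in candidates.items()
              if all(fn(i) == o for i, o in examples)}

best_description = min(consistent, key=len)   # prefer the weakest/simplest rule
print(best_description)                       # "add one"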

However, if it is like AlphaGo, then OpenAI may simply have had an AI create the heuristic, much as Google trained a model to rate different sequences of moves as better or worse than others.

What we still don’t know

The question then is, is this really closer to AGI? If that is how o3 works, then the underlying model might not be much better than previous models.

The concepts the model learns from language might not be any more suitable for generalisation than before. Instead, we may just be seeing a more generalisable “chain of thought” found through the extra steps of training a heuristic specialised to this test. The proof, as always, will be in the pudding.

Almost everything about o3 remains unknown. OpenAI has limited disclosure to a few media presentations and early testing to a handful of researchers, laboratories and AI safety institutions.

Truly understanding the potential of o3 will require extensive work, including evaluations, an understanding of the distribution of its capacities, how often it fails and how often it succeeds.

When o3 is finally released, we’ll have a much better idea of whether it is approximately as adaptable as an average human.

If so, it could have a huge, revolutionary, economic impact, ushering in a new era of self-improving accelerated intelligence. We will require new benchmarks for AGI itself and serious consideration of how it ought to be governed.

If not, then this will still be an impressive result. However, everyday life will remain much the same.

About the Author:

Michael Timothy Bennett, PhD Student, School of Computing, Australian National University and Elija Perrier, Research Fellow, Stanford Center for Responsible Quantum Technology, Stanford University

This article is republished from The Conversation under a Creative Commons license. Read the original article.

 

Language AIs in 2024: Size, guardrails and steps toward AI agents

By John Licato, University of South Florida 

I research the intersection of artificial intelligence, natural language processing and human reasoning as the director of the Advancing Human and Machine Reasoning lab at the University of South Florida. I am also commercializing this research in an AI startup that provides a vulnerability scanner for language models.

From my vantage point, I observed significant developments in the field of AI language models in 2024, both in research and the industry.

Perhaps the most exciting of these are the capabilities of smaller language models, support for addressing AI hallucination, and frameworks for developing AI agents.

Small AIs make a splash

At the heart of commercially available generative AI products like ChatGPT are large language models, or LLMs, which are trained on vast amounts of text and produce convincing humanlike language. Their size is generally measured in parameters, which are the numerical values a model derives from its training data. The larger models like those from the major AI companies have hundreds of billions of parameters.

There is an iterative interaction between large language models and smaller language models, which seems to have accelerated in 2024.

First, organizations with the most computational resources experiment with and train increasingly larger and more powerful language models. Those yield new large language model capabilities, benchmarks, training sets and training or prompting tricks. In turn, those are used to make smaller language models – in the range of 3 billion parameters or less – which can be run on more affordable computer setups, require less energy and memory to train, and can be fine-tuned with less data.

No surprise, then, that developers have released a host of powerful smaller language models – although the definition of small keeps changing: Phi-3 and Phi-4 from Microsoft, Llama-3.2 1B and 3B, and Qwen2-VL-2B are just a few examples.

These smaller language models can be specialized for more specific tasks, such as rapidly summarizing a set of comments or fact-checking text against a specific reference. They can work with their larger cousins to produce increasingly powerful hybrid systems.
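
As a rough illustration, the Python sketch below asks a small open model to summarize a few comments using the Hugging Face transformers library. The model id is one example of a small open model; treat it, and any extra setup your environment needs, as assumptions rather than a recipe.

# A minimal sketch of putting a small language model to work on a narrow task
# like summarizing comments, using the Hugging Face transformers library. The
# model id is one example of a small open model; depending on your version of
# the library, additional setup or flags may be needed.
from transformers import pipeline

summarizer = pipeline("text-generation", model="microsoft/Phi-3-mini-4k-instruct")
comments = "Great battery life. Screen scratches easily. Shipping was slow."
prompt = f"Summarize these customer comments in one sentence: {comments}"
print(summarizer(prompt, max_new_tokens=60)[0]["generated_text"])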

What are small language model AIs – and why would you want one?

Wider access

Increased access to highly capable language models, large and small, can be a mixed blessing. With many consequential elections around the world in 2024, the temptation to misuse language models was high.

Language models give malicious users the ability to generate social media posts aimed at deceptively influencing public opinion, and that threat drew a great deal of concern during the election year.

And indeed, a robocall faking President Joe Biden’s voice asked New Hampshire Democratic primary voters to stay home. OpenAI had to intervene to disrupt more than 20 operations and deceptive networks that tried to use its models for such campaigns. Fake videos and memes were created and shared with the help of AI tools.

Despite the anxiety surrounding AI disinformation, it is not yet clear what effect these efforts actually had on public opinion and the U.S. election. Nevertheless, U.S. states passed a large amount of legislation in 2024 governing the use of AI in elections and campaigns.

Misbehaving bots

Google started including AI overviews in its search results, yielding some results that were hilariously and obviously wrong – unless you enjoy glue in your pizza. However, other results may have been dangerously wrong, such as when it suggested mixing bleach and vinegar to clean your clothes.

Large language models, as they are most commonly implemented, are prone to hallucinations. This means that they can state things that are false or misleading, often with confident language. Even though I and others continually beat the drum about this, 2024 still saw many organizations learning about the dangers of AI hallucination the hard way.

Despite significant testing, a chatbot playing the role of a Catholic priest advocated for baptism via Gatorade. A chatbot advising on New York City laws and regulations incorrectly said it was “legal for an employer to fire a worker who complains about sexual harassment, doesn’t disclose a pregnancy or refuses to cut their dreadlocks.” And OpenAI’s speech-capable model forgot whose turn it was to speak and responded to a human in her own voice.

Fortunately, 2024 also saw new ways to mitigate and live with AI hallucinations. Companies and researchers are developing tools for making sure AI systems follow given rules pre-deployment, as well as environments to evaluate them. So-called guardrail frameworks inspect large language model inputs and outputs in real time, albeit often by using another layer of large language models.
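
A minimal Python sketch of the guardrail pattern looks like this: screen the input, generate a draft answer, then have a second checking step review the output before it reaches the user. The two model functions are placeholders for whatever real models a deployment uses.

# A minimal sketch of the guardrail pattern described above: screen the input,
# generate a draft answer, then have a second check review the output before
# it reaches the user. call_model and call_moderation_model are placeholders
# for whatever model APIs are actually in use.
BLOCKED_TOPICS = ("medical dosage", "legal advice")

def call_model(prompt):             # placeholder for a real LLM call
    return f"Draft answer to: {prompt}"

def call_moderation_model(text):    # placeholder for a second, checking model
    return "unsafe" if "bleach and vinegar" in text.lower() else "safe"

def guarded_reply(user_prompt):
    if any(topic in user_prompt.lower() for topic in BLOCKED_TOPICS):
        return "Sorry, I can't help with that topic."
    draft = call_model(user_prompt)
    if call_moderation_model(draft) != "safe":
        return "Sorry, I can't provide that answer."
    return draft

print(guarded_reply("How do I clean my running shoes?"))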

And the conversation on AI regulation accelerated, causing the big players in the large language model space to update their policies on responsibly scaling and harnessing AI.

But although researchers are continually finding ways to reduce hallucinations, in 2024, research convincingly showed that AI hallucinations are always going to exist in some form. It may be a fundamental feature of what happens when an entity has finite computational and information resources. After all, even human beings are known to confidently misremember and state falsehoods from time to time.

The rise of agents

Large language models, particularly those powered by variants of the transformer architecture, are still driving the most significant advances in AI. For example, developers are using large language models to not only create chatbots, but to serve as the basis of AI agents. The term “agentic AI” shot to prominence in 2024, with some pundits even calling it the third wave of AI.

To understand what an AI agent is, think of a chatbot expanded in two ways: First, give it access to tools that provide the ability to take actions. This might be the ability to query an external search engine, book a flight or use a calculator. Second, give it increased autonomy, or the ability to make more decisions on its own.

For example, a travel AI chatbot might be able to perform a search of flights based on what information you give it, but a tool-equipped travel agent might plan out an entire trip itinerary, including finding events, booking reservations and adding them to your calendar.
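
A stripped-down Python sketch of the difference looks like this: rather than returning a single chat reply, the agent loops over a plan, calling tools along the way. The tools and the hard-coded plan are placeholders; a real agent would ask a language model to choose each step.

# A stripped-down sketch: instead of returning one chat reply, an agent loops
# over plan -> act -> observe, calling tools along the way. The tools and the
# fixed "plan" below are placeholders for illustration only.
def search_flights(destination):            # placeholder tool
    return {"flight": f"XY123 to {destination}", "price": 420}

def add_to_calendar(event):                 # placeholder tool
    return f"added '{event}' to calendar"

TOOLS = {"search_flights": search_flights, "add_to_calendar": add_to_calendar}

def run_agent(goal):
    # A real agent would ask a language model to choose each step; this sketch
    # uses a hard-coded plan to keep the control flow visible.
    plan = [("search_flights", "Lisbon"), ("add_to_calendar", "Flight to Lisbon")]
    observations = []
    for tool_name, argument in plan:
        observations.append(TOOLS[tool_name](argument))
    return f"Goal: {goal} -> {observations}"

print(run_agent("Plan my trip to Lisbon"))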

AI agents can perform multiple steps of a task on their own.

In 2024, new frameworks for developing AI agents emerged. Just to name a few, LangGraph, CrewAI, PhiData and AutoGen/Magentic-One were released or improved in 2024.

Companies are just beginning to adopt AI agents. Frameworks for developing AI agents are new and rapidly evolving. Furthermore, security, privacy and hallucination risks are still a concern.

But global market analysts forecast this to change: 82% of organizations surveyed plan to use agents within 1-3 years, and 25% of all companies currently using generative AI are likely to adopt AI agents in 2025.

About the Author:

John Licato, Associate Professor of Computer Science, Director of AMHR Lab, University of South Florida

This article is republished from The Conversation under a Creative Commons license. Read the original article.

When AI goes shopping: AI agents promise to lighten your purchasing load − if they can earn your trust

By Tamilla Triantoro, Quinnipiac University 

Online shopping often involves endless options and fleeting discounts. A single search for running shoes can yield hundreds of results across multiple platforms, each promising the “best deal.” The holiday season brings excitement, but it also brings a blend of decision fatigue and logistical nightmares.

What if there were a tool capable of hunting for the best prices, navigating endless sales and making sure your purchases arrive on time?

The next evolution in artificial intelligence is AI agents that are capable of autonomous reasoning and multistep problem-solving. AI shopping agents not only suggest what you might like, but they can also act on your behalf. Major retailers and AI companies are developing AI shopping assistants, and the AI company Perplexity released Buy with Pro on Nov. 18, 2024.

Picture this: You prompt AI to find a winter coat under $200 that’s highly rated and will arrive by Sunday. In seconds, it scans websites, compares prices, checks reviews, confirms availability and places the order, all while you go about your day.
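
The constraint-checking part of that request is easy to sketch in Python: filter candidate coats by price, rating and delivery date, then pick the best remaining match. The product listings below are invented for illustration.

# A simplified sketch of the constraint-checking part of that request: filter
# candidate coats by price, rating and delivery date, then pick the best
# match. The product listings here are invented for illustration.
from datetime import date

products = [
    {"name": "Alpine Parka", "price": 189, "rating": 4.7, "delivery": date(2024, 12, 15)},
    {"name": "City Wool Coat", "price": 249, "rating": 4.8, "delivery": date(2024, 12, 14)},
    {"name": "Trail Shell", "price": 120, "rating": 4.1, "delivery": date(2024, 12, 20)},
]

def pick_coat(items, max_price=200, min_rating=4.5, arrive_by=date(2024, 12, 15)):
    eligible = [p for p in items
                if p["price"] <= max_price
                and p["rating"] >= min_rating
                and p["delivery"] <= arrive_by]
    return max(eligible, key=lambda p: p["rating"], default=None)

print(pick_coat(products))   # the Alpine Parka meets every constraint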

Perplexity’s recently released AI shopping agent can search for items across the web using multiple free-form variables such as color, size, price and shipping time.
Screenshot by Tamilla Triantoro

Unlike traditional recommendation engines, AI agents learn your preferences and handle tasks autonomously. Built with machine learning and natural language processing, the agents learn from the people who use them and become smarter and more efficient over time through those collective interactions.

Looking ahead, AI agents are likely to not only master personal shopping needs but also negotiate directly with corporate AI systems. They will not only learn your preferences but will likely be able to book tailored experiences, handle payments across platforms and coordinate schedules.

As a researcher who studies human-AI collaboration, I see how AI agents could make the future of shopping virtually effortless and more personalized than ever.

How AI agents help shoppers

Marketplaces such as Amazon and Walmart have been using AI to automate shopping. Google Lens offers a visual search tool for finding products.

Perplexity’s Buy with Pro is a more powerful AI shopping agent. By providing your shipping and billing information, you can place orders directly on the Perplexity app with free shipping on every order. The shopping assistant is part of the company’s Perplexity Pro service, which has free and paid tiers.

For those looking to build custom AI shopping agents, AutoGPT and AgentGPT are open-source tools for configuring and deploying AI agents.

Consumers today are focused on value, looking for deals and comparing prices across platforms. Having an assistant perform these tasks could be a tremendous time saver. But can AI truly learn your preferences?

A recent study using the GPT-4o model achieved 85% accuracy in imitating the thoughts and behaviors of over 1,000 people after they interacted with the AI for just two hours. This breakthrough finding suggests that digital personas can understand and act on people’s preferences in ways that will transform the shopping experience.

How AI shopping reshapes business

AI agents are moving beyond recommendations to autonomously executing complex tasks such as automating refunds, managing inventory and approving pricing decisions. This evolution has already begun to reshape how businesses operate and how consumers interact with them.

Retailers using AI agents are seeing measurable benefits. Since October 2024, data from the Salesforce shopping index reveals that digital retailers using generative AI achieved a 7% increase in average order revenue and attributed 17% of global orders to AI-driven personalized recommendations, targeted promotions and improved customer service.

Meanwhile, the nature of search and advertising is undergoing a major shift. Amazon is capturing billions of dollars in ad revenue as shoppers bypass Google to search directly on its platform. Simultaneously, AI-powered search tools such as Perplexity and OpenAI’s web-enabled chat deliver instant, context-aware responses, challenging traditional search engines and forcing advertisers to rethink their strategies.

The outcome of the battle between Big Tech and open-source initiatives to shape the AI ecosystem is also likely to affect how the shopping experience changes.

Shoppers can have back-and-forth interactions with AI agents.
Screenshot by Tamilla Triantoro

The risks: Privacy, manipulation and dependency

While AI agents offer significant benefits, they also raise critical privacy concerns. AI systems require extensive access to personal data, shopping history and financial information. This level of access increases the risk of misuse and unauthorized sharing.

Manipulation is another issue. AI can be highly persuasive and may be optimized to serve corporate interests over consumer welfare. Such technology can prioritize upselling or nudging shoppers toward higher-margin products under the guise of personalization.

There’s also the risk of dependency. Automating many aspects of shopping could diminish the satisfaction of making choices. Research in human-AI interaction indicates that while AI tools can reduce cognitive load, increased reliance on AI could impair people’s ability to critically evaluate their options.

What’s next?

AI-based shopping is still in its infancy, so how much trust should you place in it?

In our book “Converging Minds,” AI researcher Aleksandra Przegalinska and I argue for a balanced and critical approach to AI adoption, recognizing both its potential and its pitfalls.

As cognitive scientist Gary Marcus points out, AI’s moral limitations stem from technical constraints: Despite efforts to prevent errors, these systems remain imperfect.

This cautious perspective is reflected in the responses from my MBA class. When I asked students whether they were ready to outsource their holiday shopping to AI, the answer was an overwhelming no. Ethan Mollick, a professor at the Wharton School at the University of Pennsylvania, has argued that the adoption of AI in everyday life will be gradual, as societal change typically lags behind technological advancement.

Before people are willing to hand over their credit cards and let AI take the reins, businesses will have to ensure that AI systems align with human values and priorities. The promise of AI is vast, but to fulfill that promise I believe that AI will need to be an extension of human intention – not a replacement for it.

About the Author:

Tamilla Triantoro, Associate Professor of Business Analytics and Information Systems, Quinnipiac University

This article is republished from The Conversation under a Creative Commons license. Read the original article.

 

AI has been a boon for marketing, but the dark side of using algorithms to sell products and brands is little studied

By Lauren Labrecque, University of Rhode Island 

Artificial intelligence is revolutionizing the way companies market their products, enabling them to target consumers in personalized and interactive ways that not long ago seemed like the realm of science fiction.

Marketers use AI-powered algorithms to scour vast amounts of data that reveals individual preferences with unrivaled accuracy. This allows companies to precisely target content – ads, emails, social media posts – that feels tailor-made and helps cultivate companies’ relationships with consumers.
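
As a bare-bones illustration of that kind of targeting, the Python sketch below scores each ad against a consumer’s interest profile and picks the best match. Real systems are vastly more complex, and the data here is invented.

# A bare-bones sketch of the preference matching behind such targeting: score
# each piece of content against a consumer's interest profile and pick the
# best match. Real systems are vastly more complex; this data is invented.
interests = {"running": 0.9, "travel": 0.4, "cooking": 0.1}

ads = {
    "New trail running shoes": {"running": 1.0},
    "Weekend flight deals": {"travel": 0.8, "running": 0.1},
    "Cast iron skillet sale": {"cooking": 1.0},
}

def score(ad_topics):
    return sum(interests.get(topic, 0.0) * weight for topic, weight in ad_topics.items())

best_ad = max(ads, key=lambda name: score(ads[name]))
print(best_ad)   # "New trail running shoes"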

As a researcher who studies technology in marketing, I joined several colleagues in conducting new research that shows AI marketing overwhelmingly neglects its potential negative consequences.

Our peer-reviewed study reviewed 290 articles that had been published over the past 10 years from 15 high-ranking marketing journals. We found that only 33 of them addressed the potential “dark side” of AI marketing.

This matters because the imbalance creates a critical gap in understanding the full impact of AI.

AI marketing can perpetuate harmful stereotypes, such as producing hypersexualized depictions of women, for example. AI can also infringe on the individual rights of artists. And it can spread misinformation through deepfakes and “hallucinations,” which occur when AI presents false information as if it were true, such as inventing historical events.

It can also negatively affect mental health. The prevalence of AI-powered beauty filters on social media, for instance, can foster unrealistic ideals and trigger depression.

These concerns loom large, prompting anxiety about the potential misuse of this powerful technology. Many people experience these worries, but young women are notably vulnerable. As AI apps gain acceptance, beauty standards are moving further from reality.

Our research finds there is an urgent need to address AI’s ethical considerations and potential negative consequences. Our intent is not to discredit AI. It’s to make sure that AI marketing benefits everyone, not just a handful of powerful companies.

I believe researchers should consider exploring the ethical problems with AI more thoroughly, and how to use it safely and responsibly.

This is important because AI is suddenly being used everywhere – from social media to self-driving cars to making health decisions. Understanding its potential negative effects empowers the public to be informed consumers and call for responsible AI use.

About the Author:

Lauren Labrecque, Professor of Marketing, University of Rhode Island

This article is republished from The Conversation under a Creative Commons license. Read the original article.