Archive for Programming – Page 5

Noise in the brain enables us to make extraordinary leaps of imagination. It could transform the power of computers too

By Tim Palmer, University of Oxford 

We all have to make hard decisions from time to time. The hardest of my life was whether or not to change research fields after my PhD, from fundamental physics to climate physics. I had job offers that could have taken me in either direction – one to join Stephen Hawking’s Relativity and Gravitation Group at Cambridge University, another to join the Met Office as a scientific civil servant.

I wrote down the pros and cons of both options as one is supposed to do, but then couldn’t make up my mind at all. Like Buridan’s donkey, I was unable to move to either the bale of hay or the pail of water. It was a classic case of paralysis by analysis.

Since it was doing my head in, I decided to try to forget about the problem for a couple of weeks and get on with my life. In that intervening time, my unconscious brain decided for me. I simply walked into my office one day and the answer had somehow become obvious: I would make the change to studying the weather and climate.

More than four decades on, I’d make the same decision again. My fulfilling career has included developing a new, probabilistic way of forecasting weather and climate which is helping humanitarian and disaster relief agencies make better decisions ahead of extreme weather events. (This and many other aspects are described in my new book, The Primacy of Doubt.)

But I remain fascinated by what was going on in my head back then, which led my subconscious to make a life-changing decision that my conscious mind could not. Is there something to be understood here not only about how to make difficult decisions, but about how humans make the leaps of imagination that characterise us as such a creative species? I believe the answer to both questions lies in a better understanding of the extraordinary power of noise.

Imprecise supercomputers

I went from the pencil-and-paper mathematics of Einstein’s theory of general relativity to running complex climate models on some of the world’s biggest supercomputers. Yet big as they were, they were never big enough – the real climate system is, after all, very complex.

In the early days of my research, one only had to wait a couple of years and top-of-the-range supercomputers would become twice as powerful. This was the era when transistors were getting smaller and smaller, allowing more to be crammed onto each microchip. The resulting doubling of transistor counts – and hence of computing performance for roughly the same power – every couple of years became known as Moore’s Law.


There is, however, only so much miniaturisation you can do before the transistor starts becoming unreliable in its key role as an on-off switch. Today, with transistors starting to approach atomic size, we have pretty much reached the limit of Moore’s Law. To achieve more number-crunching capability, computer manufacturers must bolt together more and more computing cabinets, each one crammed full of chips.

But there’s a problem. Increasing number-crunching capability this way requires a lot more electric power – modern supercomputers the size of tennis courts consume tens of megawatts. I find it something of an embarrassment that we need so much energy to try to accurately predict the effects of climate change.

That’s why I became interested in how to construct a more accurate climate model without consuming more energy. And at the heart of this is an idea that sounds counterintuitive: by adding random numbers, or “noise”, to a climate model, we can actually make it more accurate in predicting the weather.

A constructive role for noise

Noise is usually seen as a nuisance – something to be minimised wherever possible. In telecommunications, we speak about trying to maximise the “signal-to-noise ratio” by boosting the signal or reducing the background noise as much as possible. However, in nonlinear systems, noise can be your friend and actually contribute to boosting a signal. (A nonlinear system is one whose output does not vary in direct proportion to the input. You will likely be very happy to win £100 million on the lottery, but probably not twice as happy to win £200 million.)

Noise can, for example, help us find the maximum value of a complicated curve such as in Figure 1, below. There are many situations in the physical, biological and social sciences as well as in engineering where we might need to find such a maximum. In my field of meteorology, the process of finding the best initial conditions for a global weather forecast involves identifying the maximum point of a very complicated meteorological function.

Figure 1

A curve with multiple local peaks and troughs.
Author provided

However, employing a “deterministic algorithm” to locate the global maximum doesn’t usually work. This type of algorithm will typically get stuck at a local peak (for example at point a) because the curve moves downwards in both directions from there.

An answer is to use a technique called “simulated annealing” – so called because of its similarities with annealing, the heat treatment process that changes the properties of metals. Simulated annealing, which employs noise to get round the issue of getting stuck at local peaks, has been used to solve many problems including the classic travelling salesman puzzle of finding the shortest route that visits a large number of cities on a map.

Figure 1 shows a possible route to locating the curve’s global maximum (point 9) by using the following criteria:

  • If a randomly chosen point is higher than the current position on the curve, then the new point is always moved to.
  • If it is lower than the current position, the suggested point isn’t necessarily rejected. It depends whether the new point is a lot lower or just a little lower.

However, the decision to move to a new point also depends on how long the analysis has been running. Whereas in the early stages, random points quite a bit lower than the current position may be accepted, in later stages only those that are higher or just a tiny bit lower are accepted.

The technique is known as simulated annealing because early on – like hot metal in the early phase of cooling – the system is pliable and changeable. Later in the process – like cold metal in the late phase of cooling – it is almost rigid and unchangeable.
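
To make the procedure concrete, here is a minimal Python sketch of simulated annealing applied to a bumpy one-dimensional curve. The curve, the step size and the cooling schedule are purely illustrative choices, not the meteorological function described above.

```python
import math
import random

def f(x):
    """A bumpy curve with several local peaks (illustrative only)."""
    return math.sin(3 * x) + 0.5 * math.sin(7 * x) - 0.05 * (x - 2) ** 2

def simulated_annealing(steps=20000, x0=0.0, temp0=2.0, cooling=0.9995):
    x, temp = x0, temp0
    best_x, best_val = x, f(x)
    for _ in range(steps):
        candidate = x + random.gauss(0, 0.5)       # a noisy, random suggestion
        delta = f(candidate) - f(x)
        # Always accept higher points; accept lower ones with a probability
        # that shrinks as the "temperature" falls (late stage = nearly rigid).
        if delta > 0 or random.random() < math.exp(delta / temp):
            x = candidate
            if f(x) > best_val:
                best_x, best_val = x, f(x)
        temp *= cooling                             # gradual cooling
    return best_x, best_val

print(simulated_annealing())
```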

How noise can help climate models

Noise was introduced into comprehensive weather and climate models around 20 years ago. A key reason was to represent model uncertainty in our ensemble weather forecasts – but it turned out that adding noise also reduced some of the biases the models had, making them more accurate simulators of weather and climate.

Unfortunately, these models require huge supercomputers and a lot of energy to run them. They divide the world into small gridboxes, with the atmosphere and ocean within each assumed to be constant – which, of course, they aren’t. The horizontal scale of a typical gridbox is around 100km – so one way of making a model more accurate is to reduce this distance to 50km, or 10km or 1km. However, halving the width of each gridbox increases the computational cost of running the model by up to a factor of 16 (roughly a factor of two for each horizontal direction, one for the vertical, and one for the shorter timestep needed to keep the simulation stable), meaning it consumes a lot more energy.

Here again, noise offered an appealing alternative. The proposal was to use it to represent the unpredictable (and unmodellable) variations in small-scale climatic processes like turbulence, cloud systems, ocean eddies and so on. I argued that adding noise could be a way of boosting accuracy without having to incur the enormous computational cost of reducing the size of the gridboxes. For example, as has now been verified, adding noise to a climate model increases the likelihood of producing extreme hurricanes – reflecting the potential reality of a world whose weather is growing more extreme due to climate change.
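
A real climate model is far too large to reproduce here, but the flavour of the idea can be sketched with a toy chaotic system. The snippet below integrates the classic Lorenz-63 equations twice from the same starting point, once deterministically and once with the tendencies multiplied by a small random factor, loosely in the spirit of the stochastic schemes used in weather models. The equations, noise amplitude and time step are illustrative assumptions, not those of any operational model.

```python
import numpy as np

def lorenz63_step(state, dt=0.01, noise_amp=0.0, rng=None):
    """One Euler step of the Lorenz-63 'toy weather' system.
    With noise_amp > 0, the tendencies are multiplied by (1 + noise),
    loosely mimicking stochastic perturbation of unresolved processes."""
    x, y, z = state
    dx = 10.0 * (y - x)
    dy = x * (28.0 - z) - y
    dz = x * y - (8.0 / 3.0) * z
    tend = np.array([dx, dy, dz])
    if noise_amp > 0:
        rng = rng or np.random.default_rng()
        tend *= 1.0 + noise_amp * rng.standard_normal(3)
    return state + dt * tend

rng = np.random.default_rng(1)
det = sto = np.array([1.0, 1.0, 1.0])            # identical starting conditions
for _ in range(5000):
    det = lorenz63_step(det)                                  # deterministic model
    sto = lorenz63_step(sto, noise_amp=0.05, rng=rng)         # stochastically perturbed

print("deterministic state:", det)
print("noisy-model state:  ", sto)
```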

The computer hardware we use for this modelling is inherently noisy – electrons travelling along wires in a computer move in partly random ways due to its warm environment. Such randomness is called “thermal noise”. Could we save even more energy by tapping into it, rather than having to use software to generate pseudo-random numbers? To me, low-energy “imprecise” supercomputers that are inherently noisy looked like a win-win proposal.

But not all of my colleagues were convinced. They were uncomfortable that computers might not give the same answers from one day to the next. To try to persuade them, I began to think about other real-world systems that, because of limited energy availability, also use noise that is generated within their hardware. And I stumbled on the human brain.

Noise in the brain

Every second of the waking day, our eyes alone send gigabytes of data to the brain. That’s not much different to the amount of data a climate model produces each time it outputs data to memory.

The brain has to process this data and somehow make sense of it. If it did this using the power of a supercomputer, that would be impressive enough. But it does it using one millionth of that power, about 20W instead of 20MW – what it takes to power a lightbulb. Such energy efficiency is mind-bogglingly impressive. How on Earth does the brain do it?

An adult brain contains some 80 billion neurons. Each neuron has a long slender biological cable – the axon – along which electrical impulses are transmitted from one set of neurons to the next. But these impulses, which collectively describe information in the brain, have to be boosted by protein “transistors” positioned at regular intervals along the axons. Without them, the signal would dissipate and be lost.

The energy for these boosts ultimately comes from ATP (adenosine triphosphate), an organic compound that cells make from the glucose and oxygen delivered by the blood. This enables electrically charged atoms of sodium and potassium (ions) to be pushed through small channels in the neuron walls, creating electrical voltages which, much like those in silicon transistors, amplify the neuronal electric signals as they travel along the axons.

With 20W of power spread across tens of billions of neurons, the voltages involved are tiny, as are the axon cables. And there is evidence that axons with a diameter less than about 1 micron (which most in the brain are) are susceptible to noise. In other words, the brain is a noisy system.

If this noise simply created unhelpful “brain fog”, one might wonder why we evolved to have so many slender axons in our heads. Indeed, there are benefits to having fatter axons: the signals propagate along them faster. If we still needed fast reaction times to escape predators, then slender axons would be disadvantageous. However, developing communal ways of defending ourselves against enemies may have reduced the need for fast reaction times, leading to an evolutionary trend towards thinner axons.

Perhaps, serendipitously, evolutionary mutations that further increased neuron numbers and reduced axon sizes, keeping overall energy consumption the same, made the brain’s neurons more susceptible to noise. And there is mounting evidence that this had another remarkable effect: it encouraged in humans the ability to solve problems that required leaps in imagination and creativity.

Perhaps we only truly became Homo sapiens when significant noise began to appear in our brains?

Putting noise in the brain to good use

Many animals have developed creative approaches to solving problems, but there is nothing to compare with a Shakespeare, a Bach or an Einstein in the animal world.

How do creative geniuses come up with their ideas? Here’s a quote from Andrew Wiles, perhaps the most famous mathematician alive today, about the time leading up to his celebrated proof of the maths problem (misleadingly) known as Fermat’s Last Theorem:

When you reach a real impasse, then routine mathematical thinking is of no use to you. Leading up to that kind of new idea, there has to be a long period of tremendous focus on the problem without any distraction. You have to really think about nothing but that problem – just concentrate on it. And then you stop. [At this point] there seems to be a period of relaxation during which the subconscious appears to take over – and it’s during this time that some new insight comes.

BBC’s Horizon unpicks Andrew Wiles’s novel approach to proving Fermat’s Last Theorem.

This notion seems universal. Physics Nobel Laureate Roger Penrose has spoken about his “Eureka moment” when crossing a busy street with a colleague (perhaps reflecting on their conversation while also looking out for oncoming traffic). For the father of chaos theory Henri Poincaré, it was catching a bus.

And it’s not just creativity in mathematics and physics. Comedian John Cleese, of Monty Python fame, makes much the same point about artistic creativity – it occurs not when you are focusing hard on your trade, but when you relax and let your unconscious mind wander.

Of course, not all the ideas that bubble up from your subconscious are going to be Eureka moments. Physicist Michael Berry talks about these subconscious ideas as if they are elementary particles called “claritons”:

Actually, I do have a contribution to particle physics … the elementary particle of sudden understanding: the “clariton”. Any scientist will recognise the “aha!” moment when this particle is created. But there is a problem: all too frequently, today’s clariton is annihilated by tomorrow’s “anticlariton”. So many of our scribblings disappear beneath a rubble of anticlaritons.

Here is something we can all relate to: that in the cold light of day, most of our “brilliant” subconscious ideas get annihilated by logical thinking. Only a very, very, very small number of claritons remain after this process. But the ones that do are likely to be gems.

In his renowned book Thinking, Fast and Slow, the Nobel prize-winning psychologist Daniel Kahneman describes the brain in a binary way. Most of the time when walking, chatting and looking around (in other words when multitasking), it operates in a mode Kahneman calls “system 1” – a rather fast, automatic, effortless mode of operation.

By contrast, when we are thinking hard about a specific problem (unitasking), the brain is in the slower, more deliberative and logical “system 2”. To perform a calculation like 37×13, we have to stop walking, stop talking, close our eyes and even put our hands over our ears. No chance for significant multitasking in system 2.

My 2015 paper with computational neuroscientist Michael O’Shea interpreted system 1 as a mode where available energy is spread across a large number of active neurons, and system 2 as where energy is focused on a smaller number of active neurons. The amount of energy per active neuron is therefore much smaller when in the system 1 mode, and it would seem plausible that the brain is more susceptible to noise when in this state. That is, in situations when we are multitasking, the operation of any one of the neurons will be most susceptible to the effects of noise in the brain.

Berry’s picture of clariton-anticlariton interaction seems to suggest a model of the brain where the noisy system 1 and the deterministic system 2 act in synergy. The anticlariton is the logical analysis that we perform in system 2 which, most of the time, leads us to reject our crazy system 1 ideas.

But sometimes one of these ideas turns out to be not so crazy.

This is reminiscent of how our simulated annealing analysis (Figure 1) works. Initially, we might find many “crazy” ideas appealing. But as we get closer to locating the optimal solution, the criteria for accepting a new suggestion become more stringent and discerning. Now, system 2 anticlaritons are annihilating almost everything the system 1 claritons can throw at them – but not quite everything, as Wiles found to his great relief.

The key to creativity

If the key to creativity is the synergy between noisy and deterministic thinking, what are some consequences of this?

On the one hand, if you do not have the necessary background information then your analytic powers will be depleted. That’s why Wiles says that leading up to the moment of insight, you have to immerse yourself in your subject. You aren’t going to have brilliant ideas which will revolutionise quantum physics unless you have a pretty good grasp of quantum physics in the first place.

But you also need to leave yourself enough time each day to do nothing much at all, to relax and let your mind wander. I tell my research students that if they want to be successful in their careers, they shouldn’t spend every waking hour in front of their laptop or desktop. And swapping it for social media probably doesn’t help either, since you still aren’t really multitasking – each moment you are on social media, your attention is still fixed on a specific issue.

But going for a walk or bike ride or painting a shed probably does help. Personally, I find that driving a car is a useful activity for coming up with new ideas and thoughts – provided you don’t turn the radio on.

When making difficult decisions, this suggests that, having listed all the pros and cons, it can be helpful not to actively think about the problem for a while. I think this explains how, years ago, I finally made the decision to change my research direction – not that I knew it at the time.

Because the brain’s system 1 is so energy efficient, we use it to make the vast majority of the many decisions in our daily lives (some say as many as 35,000) – most of which aren’t that important, like whether to continue putting one leg in front of the other as we walk down to the shops. (I could alternatively stop after each step, survey my surroundings to make sure a predator was not going to jump out and attack me, and on that basis decide whether to take the next step.)

However, this system 1 thinking can sometimes lead us to make bad decisions, because we have simply defaulted to this low-energy mode and not engaged system 2 when we should have. How many times do we say to ourselves in hindsight: “Why didn’t I give such and such a decision more thought?”

Of course, if instead we engaged system 2 for every decision we had to make, then we wouldn’t have enough time or energy to do all the other important things we have to do in our daily lives (so the shops may have shut by the time we reach them).

From this point of view, we should not view giving wrong answers to unimportant questions as evidence of irrationality. Kahneman cites the fact that more than 50% of students at MIT, Harvard and Princeton gave the incorrect answer to this simple question – a bat and a ball cost $1.10 in total; the bat costs one dollar more than the ball; how much does the ball cost? – as evidence of our irrationality. The correct answer, if you think about it, is 5 cents: the bat then costs $1.05, and together they come to $1.10. But system 1 screams out ten cents.

If we were asked this question on pain of death, one would hope we would spend enough thought to come up with the correct answer. But if we were asked the question as part of an anonymous after-class test, when we had much more important things to spend time and energy doing, then I’d be inclined to think of it as irrational to give the right answer.

If we had 20MW to run the brain, we could spend part of it solving unimportant problems. But we only have 20W and we need to use it carefully. Perhaps it’s the 50% of MIT, Harvard and Princeton students who gave the wrong answer who are really the clever ones.

Just as a climate model with noise can produce types of weather that a model without noise can’t, so a brain with noise can produce ideas that a brain without noise can’t. And just as these types of weather can be exceptional hurricanes, so the idea could end up winning you a Nobel Prize.

So, if you want to increase your chances of achieving something extraordinary, I’d recommend going for that walk in the countryside, looking up at the clouds, listening to the birds cheeping, and thinking about what you might eat for dinner.

So could computers be creative?

Will computers, one day, be as creative as Shakespeare, Bach or Einstein? Will they understand the world around us as we do? Stephen Hawking famously warned that AI will eventually take over and replace mankind.

However, the best-known advocate of the idea that computers will never understand as we do is Hawking’s old colleague, Roger Penrose. In making his claim, Penrose invokes an important “meta” theorem in mathematics known as Gödel’s incompleteness theorem, which implies that there are mathematical truths that cannot be proven by any fixed deterministic algorithm.

There is a simple way of illustrating Gödel’s theorem. Suppose we make a list of all the most important mathematical theorems that have been proven since the time of the ancient Greeks. First on the list would be Euclid’s proof that there are infinitely many prime numbers, which requires one really creative step (multiply the supposedly finite list of primes together and add one). Mathematicians would call this a “trick” – shorthand for a clever and succinct mathematical construction.

But is this trick useful for proving important theorems further down the list, like Pythagoras’s proof that the square root of two cannot be expressed as the ratio of two whole numbers? It’s clearly not; we need another trick for that theorem. Indeed, as you go down the list, you’ll find that a new trick is typically needed to prove each new theorem. It seems there is no end to the number of tricks that mathematicians will need to prove their theorems. Simply loading a given set of tricks on a computer won’t necessarily make the computer creative.

Does this mean mathematicians can breathe easily, knowing their jobs are not going to be taken over by computers? Well maybe not.

I have been arguing that we need computers to be noisy rather than entirely deterministic, “bit-reproducible” machines. And noise, especially if it comes from quantum mechanical processes, would break the assumptions of Gödel’s theorem: a noisy computer is not an algorithmic machine in the usual sense of the word.

Does this imply that a noisy computer can be creative? Alan Turing, pioneer of the general-purpose computing machine, believed this was possible, suggesting that “if a machine is expected to be infallible then it cannot also be intelligent”. That is to say, if we want the machine to be intelligent then it had better be capable of making mistakes.

Others may argue there is no evidence that simply adding noise will make an otherwise stupid machine into an intelligent one – and I agree, as it stands. Adding noise to a climate model doesn’t automatically make it an intelligent climate model.

However, the type of synergistic interplay between noise and determinism – the kind that sorts the wheat from the chaff of random ideas – has hardly yet been developed in computer codes. Perhaps we could develop a new type of AI model where the AI is trained by getting it to solve simple mathematical theorems using the clariton-anticlariton model; by making guesses and seeing if any of these have value.

For this to be at all tractable, the AI system would need to be trained to focus on “educated random guesses”. (If the machine’s guesses are all uneducated ones, it will take forever to make progress – like waiting for a group of monkeys to type the first few lines of Hamlet.)

For example, in the context of Euclid’s proof that there are an unlimited number of primes, could we train an AI system in such a way that a random idea like “multiply the assumed finite number of primes together and add one” becomes much more likely than the completely useless random idea “add the assumed finite number of primes together and subtract six”? And if a particular guess turns out to be especially helpful, can we train the AI system so that the next guess is a refinement of the last one?

If we can somehow find a way to do this, it could open up modelling to a completely new level that is relevant to all fields of study. And in so doing, we might yet reach the so-called “singularity” when machines take over from humans. But only when AI developers fully embrace the constructive role of noise – as it seems the brain did many thousands of years ago.

For now, I feel the need for another walk in the countryside. To blow away some fusty old cobwebs – and perhaps sow the seeds for some exciting new ones.

About the Author:

Tim Palmer, Royal Society Research Professor, University of Oxford

This article is republished from The Conversation under a Creative Commons license. Read the original article.

A new type of material called a mechanical neural network can learn and change its physical properties to create adaptable, strong structures

By Ryan H. Lee, University of California, Los Angeles 

The Research Brief is a short take about interesting academic work.

The big idea

A new type of material can learn and improve its ability to deal with unexpected forces thanks to a unique lattice structure with connections of variable stiffness, as described in a new paper by my colleagues and me.

Architected materials – like this 3D lattice – get their properties not from what they are made out of, but from their structure.
Ryan Lee, CC BY-ND

The new material is a type of architected material, which gets its properties mainly from the geometry and specific traits of its design rather than what it is made out of. Take hook-and-loop fabric closures like Velcro, for example. It doesn’t matter whether it is made from cotton, plastic or any other substance. As long as one side is a fabric with stiff hooks and the other side has fluffy loops, the material will have the sticky properties of Velcro.

My colleagues and I based our new material’s architecture on that of an artificial neural network – layers of interconnected nodes that can learn to do tasks by changing how much importance, or weight, they place on each connection. We hypothesized that a mechanical lattice with physical nodes could be trained to take on certain mechanical properties by adjusting each connection’s rigidity.

To find out if a mechanical lattice would be able to adopt and maintain new properties – like taking on a new shape or changing directional strength – we started off by building a computer model. We then selected a desired shape for the material as well as input forces and had a computer algorithm tune the tensions of the connections so that the input forces would produce the desired shape. We did this training on 200 different lattice structures and found that a triangular lattice was best at achieving all of the shapes we tested.

Once the many connections are tuned to achieve a set of tasks, the material will continue to react in the desired way. The training is – in a sense – remembered in the structure of the material itself.
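
The following is a deliberately tiny sketch of the training idea, using a chain of three springs in series rather than the paper’s full lattice: the stiffnesses are nudged iteratively until a given input force produces a target displacement. The geometry, update rule and numbers are illustrative assumptions, not the authors’ actual algorithm.

```python
def displacement(force, stiffnesses):
    """Springs in series: compliances add, so x = F * sum(1/k_i)."""
    return force * sum(1.0 / k for k in stiffnesses)

def train(force, target, stiffnesses, rate=0.2, steps=500):
    """Iteratively tune the stiffnesses so `force` produces `target` displacement."""
    for _ in range(steps):
        error = displacement(force, stiffnesses) - target
        # d(displacement)/dk_i = -force / k_i**2, so stiffening the springs when
        # we overshoot (error > 0) reduces the displacement, and vice versa.
        stiffnesses = [max(0.05, k + rate * error * force / k**2)
                       for k in stiffnesses]
    return stiffnesses

springs = [1.0, 2.0, 4.0]                  # initial stiffness of each connection
trained = train(force=1.0, target=0.8, stiffnesses=springs)
print(trained, displacement(1.0, trained))
```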

We then built a physical prototype with adjustable electromechanical springs arranged in a triangular lattice. The prototype is made of 6-inch connections and is about 2 feet long by 1½ feet wide. And it worked. When the lattice and algorithm worked together, the material was able to learn and change shape in particular ways when subjected to different forces. We call this new material a mechanical neural network.

The prototype is 2D, but a 3D version of this material could have many uses.
Jonathan Hopkins, CC BY-ND

Why it matters

Besides some living tissues, very few materials can learn to be better at dealing with unanticipated loads. Imagine a plane wing that suddenly catches a gust of wind and is forced in an unanticipated direction. The wing can’t change its design to be stronger in that direction.

The prototype lattice material we designed can adapt to changing or unknown conditions. In a wing, for example, these changes could be the accumulation of internal damage, changes in how the wing is attached to a craft or fluctuating external loads. Every time a wing made out of a mechanical neural network experienced one of these scenarios, it could strengthen and soften its connections to maintain desired attributes like directional strength. Over time, through successive adjustments made by the algorithm, the wing adopts and maintains new properties, adding each behavior to the rest as a sort of muscle memory.

This type of material could have far reaching applications for the longevity and efficiency of built structures. Not only could a wing made of a mechanical neural network material be stronger, it could also be trained to morph into shapes that maximize fuel efficiency in response to changing conditions around it.

This connection of springs is a new type of material that can change shape and learn new properties.
Jonathan Hopkins, CC BY-ND

What’s still not known

So far, our team has worked only with 2D lattices. But using computer modeling, we predict that 3D lattices would have a much larger capacity for learning and adaptation. This increase is due to the fact that a 3D structure could have tens of times more connections, or springs, that don’t intersect with one another. However, the mechanisms we used in our first model are far too complex to support in a large 3D structure.

What’s next

The material my colleagues and I created is a proof of concept and shows the potential of mechanical neural networks. But to bring this idea into the real world will require figuring out how to make the individual pieces smaller and with precise properties of flex and tension.

We hope new research in the manufacturing of materials at the micron scale, as well as work on new materials with adjustable stiffness, will lead to advances that make powerful smart mechanical neural networks with micron-scale elements and dense 3D connections a ubiquitous reality in the near future.

About the Author:

Ryan H. Lee, PhD Student in Mechanical and Aerospace Engineering, University of California, Los Angeles

This article is republished from The Conversation under a Creative Commons license. Read the original article.

Google’s powerful AI spotlights a human cognitive glitch: Mistaking fluent speech for fluent thought

By Kyle Mahowald, The University of Texas at Austin College of Liberal Arts and Anna A. Ivanova, Massachusetts Institute of Technology (MIT) 

When you read a sentence like this one, your past experience tells you that it’s written by a thinking, feeling human. And, in this case, there is indeed a human typing these words: [Hi, there!] But these days, some sentences that appear remarkably humanlike are actually generated by artificial intelligence systems trained on massive amounts of human text.

People are so accustomed to assuming that fluent language comes from a thinking, feeling human that evidence to the contrary can be difficult to wrap your head around. How are people likely to navigate this relatively uncharted territory? Because of a persistent tendency to associate fluent expression with fluent thought, it is natural – but potentially misleading – to think that if an AI model can express itself fluently, that means it thinks and feels just like humans do.

Thus, it is perhaps unsurprising that a former Google engineer recently claimed that Google’s AI system LaMDA has a sense of self because it can eloquently generate text about its purported feelings. This event and the subsequent media coverage led to a number of rightly skeptical articles and posts about the claim that computational models of human language are sentient, meaning capable of thinking and feeling and experiencing.

The question of what it would mean for an AI model to be sentient is complicated (see, for instance, our colleague’s take), and our goal here is not to settle it. But as language researchers, we can use our work in cognitive science and linguistics to explain why it is all too easy for humans to fall into the cognitive trap of thinking that an entity that can use language fluently is sentient, conscious or intelligent.

Using AI to generate humanlike language

Text generated by models like Google’s LaMDA can be hard to distinguish from text written by humans. This impressive achievement is a result of a decadeslong program to build models that generate grammatical, meaningful language.

The first computer system to engage people in dialogue was psychotherapy software called Eliza, built more than half a century ago.
Rosenfeld Media/Flickr, CC BY

Early versions dating back to at least the 1950s, known as n-gram models, simply counted up occurrences of specific phrases and used them to guess what words were likely to occur in particular contexts. For instance, it’s easy to know that “peanut butter and jelly” is a more likely phrase than “peanut butter and pineapples.” If you have enough English text, you will see the phrase “peanut butter and jelly” again and again but might never see the phrase “peanut butter and pineapples.”
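
The counting idea behind n-gram models fits in a few lines of Python. This sketch builds bigram counts from a toy corpus and uses them to guess the most likely next word; the tiny “corpus” is obviously just for illustration.

```python
from collections import Counter, defaultdict

corpus = ("i like peanut butter and jelly . "
          "she ate peanut butter and jelly . "
          "we bought peanut butter and pineapples once .").split()

# Count how often each word follows each other word (a bigram model).
following = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    following[prev][nxt] += 1

def predict_next(word):
    """Return the most frequent continuation seen in the corpus."""
    counts = following[word]
    return counts.most_common(1)[0][0] if counts else None

print(predict_next("and"))   # 'jelly' – seen twice, versus 'pineapples' once
```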

Today’s models, sets of data and rules that approximate human language, differ from these early attempts in several important ways. First, they are trained on essentially the entire internet. Second, they can learn relationships between words that are far apart, not just words that are neighbors. Third, they are tuned by a huge number of internal “knobs” – so many that it is hard for even the engineers who design them to understand why they generate one sequence of words rather than another.

The models’ task, however, remains the same as in the 1950s: determine which word is likely to come next. Today, they are so good at this task that almost all sentences they generate seem fluid and grammatical.

Peanut butter and pineapples?

We asked a large language model, GPT-3, to complete the sentence “Peanut butter and pineapples___”. It said: “Peanut butter and pineapples are a great combination. The sweet and savory flavors of peanut butter and pineapple complement each other perfectly.” If a person said this, one might infer that they had tried peanut butter and pineapple together, formed an opinion and shared it with the reader.

But how did GPT-3 come up with this paragraph? By generating a word that fit the context we provided. And then another one. And then another one. The model never saw, touched or tasted pineapples – it just processed all the texts on the internet that mention them. And yet reading this paragraph can lead the human mind – even that of a Google engineer – to imagine GPT-3 as an intelligent being that can reason about peanut butter and pineapple dishes.
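
GPT-3 itself sits behind a commercial API, but the same next-word-prediction game can be played with a small open model. Assuming the Hugging Face transformers package (and the GPT-2 weights it downloads) are available, a sketch might look like this; the prompt and sampling settings are arbitrary.

```python
# Requires: pip install transformers torch  (downloads the GPT-2 weights on first run)
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")
result = generator("Peanut butter and pineapples",
                   max_new_tokens=25, do_sample=True, temperature=0.8)
print(result[0]["generated_text"])
```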

Large AI language models can engage in fluent conversation. However, they have no overall message to communicate, so their phrases often follow common literary tropes, extracted from the texts they were trained on. For instance, if prompted with the topic “the nature of love,” the model might generate sentences about believing that love conquers all. The human brain primes the viewer to interpret these words as the model’s opinion on the topic, but they are simply a plausible sequence of words.

The human brain is hardwired to infer intentions behind words. Every time you engage in conversation, your mind automatically constructs a mental model of your conversation partner. You then use the words they say to fill in the model with that person’s goals, feelings and beliefs.

The process of jumping from words to the mental model is seamless, getting triggered every time you receive a fully fledged sentence. This cognitive process saves you a lot of time and effort in everyday life, greatly facilitating your social interactions.

However, in the case of AI systems, it misfires – building a mental model out of thin air.

A little more probing can reveal the severity of this misfire. Consider the following prompt: “Peanut butter and feathers taste great together because___”. GPT-3 continued: “Peanut butter and feathers taste great together because they both have a nutty flavor. Peanut butter is also smooth and creamy, which helps to offset the feather’s texture.”

The text in this case is as fluent as our example with pineapples, but this time the model is saying something decidedly less sensible. One begins to suspect that GPT-3 has never actually tried peanut butter and feathers.

Ascribing intelligence to machines, denying it to humans

A sad irony is that the same cognitive bias that makes people ascribe humanity to GPT-3 can cause them to treat actual humans in inhumane ways. Sociocultural linguistics – the study of language in its social and cultural context – shows that assuming an overly tight link between fluent expression and fluent thinking can lead to bias against people who speak differently.

For instance, people with a foreign accent are often perceived as less intelligent and are less likely to get the jobs they are qualified for. Similar biases exist against speakers of dialects that are not considered prestigious, such as Southern English in the U.S., against deaf people using sign languages and against people with speech impediments such as stuttering.

These biases are deeply harmful, often lead to racist and sexist assumptions, and have been shown again and again to be unfounded.

Fluent language alone does not imply humanity

Will AI ever become sentient? This question requires deep consideration, and indeed philosophers have pondered it for decades. What researchers have determined, however, is that you cannot simply trust a language model when it tells you how it feels. Words can be misleading, and it is all too easy to mistake fluent speech for fluent thought.

About the Authors:

Kyle Mahowald, Assistant Professor of Linguistics, The University of Texas at Austin College of Liberal Arts and Anna A. Ivanova, PhD Candidate in Brain and Cognitive Sciences, Massachusetts Institute of Technology (MIT)

This article is republished from The Conversation under a Creative Commons license. Read the original article.

 

What are digital twins? A pair of computer modeling experts explain

By Amlan Ganguly, Rochester Institute of Technology and Nalini Venkatasubramanian, University of California, Irvine 

A digital twin is a virtual representation of a real system – a building, the power grid, a city, even a human being – that mimics the characteristics of the system. A digital twin is more than just a computer model, however. It receives data from sensors in the real system to constantly parallel the system’s state.

A digital twin helps people analyze and predict a system’s behavior under different conditions. The systems being twinned are typically very complex and require significant effort to model and track.

Digital twins are useful in a wide variety of domains, including supply chains, health care, buildings, bridges, self-driving cars and retail customer personas to improve efficiency and reliability. For example, a warehouse operator can optimize a warehouse’s performance by exploring the response of its digital twin to various material handling policies and equipment without incurring the cost of making actual changes.
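
As a rough sketch of the pattern rather than any particular product’s software, a digital twin pairs a live state, kept in sync with sensor readings, with a model that can be run forward under hypothetical conditions. The warehouse example below is entirely invented for illustration.

```python
import copy

class WarehouseTwin:
    """Toy digital twin of a warehouse: mirrors live inventory levels and
    lets an operator test a restocking policy virtually before applying it."""

    def __init__(self):
        self.state = {"inventory": 0, "backlog": 0}

    def ingest(self, sensor_reading):
        """Keep the twin in sync with the real system."""
        self.state.update(sensor_reading)

    def simulate(self, restock_per_day, demand_per_day, days):
        """What-if run on a copy of the current state; the real warehouse
        is never touched."""
        s = copy.deepcopy(self.state)
        for _ in range(days):
            s["inventory"] += restock_per_day
            shortfall = max(0, demand_per_day - s["inventory"])
            s["inventory"] = max(0, s["inventory"] - demand_per_day)
            s["backlog"] += shortfall
        return s

twin = WarehouseTwin()
twin.ingest({"inventory": 120})                       # latest sensor data
print(twin.simulate(restock_per_day=30, demand_per_day=45, days=7))
```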

Even a wildfire can be represented by a digital twin. Government agencies can predict the spread of the fire and its impact under different conditions such as wind velocity, humidity and proximity to habitats, and use this information to guide evacuations.

Why digital twins matter

Digital twins are often used to model, understand and analyze complex systems where performance, reliability and security of the system are critical. In such systems it is paramount to test any changes, whether planned or unplanned.

In order to accurately test changes to the state of the actual system and the effects of any possible stimulus, the digital twin must accurately represent the physical system in its current state. This requires the digital twin to receive continuous updates from the physical system via fast and reliable communications channels.

Digital twins are a key part of the push to create “smart” cities.

Creating and maintaining digital twins often involves vast amounts of data to represent various features of the real system. Collecting and processing this data requires advanced communication and computing technologies. Communication support typically involves high-speed internet connections and wireless networks such as Wi-Fi and 5G. Computational support is typically in the form of servers, either in the cloud or closer to the physical system.

We and other faculty members at Rochester Institute of Technology and the University of California, Irvine are starting the Center for Smart Spaces Research, a research center sponsored by the National Science Foundation. One of the primary ongoing projects within this center is building the basic technologies for creating digital twins in a variety of applications.


About the Authors:

Amlan Ganguly, Associate Professor of Computer Engineering, Rochester Institute of Technology and Nalini Venkatasubramanian, Professor of Computer Science, University of California, Irvine

This article is republished from The Conversation under a Creative Commons license. Read the original article.

AI and machine learning are improving weather forecasts, but they won’t replace human experts

By Russ Schumacher, Colorado State University and Aaron Hill, Colorado State University 

A century ago, English mathematician Lewis Fry Richardson proposed a startling idea for that time: constructing a systematic process based on math for predicting the weather. In his 1922 book, “Weather Prediction By Numerical Process,” Richardson tried to write an equation that he could use to solve the dynamics of the atmosphere based on hand calculations.

It didn’t work because not enough was known about the science of the atmosphere at that time. “Perhaps some day in the dim future it will be possible to advance the computations faster than the weather advances and at a cost less than the saving to mankind due to the information gained. But that is a dream,” Richardson concluded.

A century later, modern weather forecasts are based on the kind of complex computations that Richardson imagined – and they’ve become more accurate than anything he envisioned. Especially in recent decades, steady progress in research, data and computing has enabled a “quiet revolution of numerical weather prediction.”

For example, a forecast of heavy rainfall two days in advance is now as good as a same-day forecast was in the mid-1990s. Errors in the predicted tracks of hurricanes have been cut in half in the last 30 years.

There still are major challenges. Thunderstorms that produce tornadoes, large hail or heavy rain remain difficult to predict. And then there’s chaos, often described as the “butterfly effect” – the fact that small changes in complex processes make weather less predictable. Chaos limits our ability to make precise forecasts beyond about 10 days.

As in many other scientific fields, the proliferation of tools like artificial intelligence and machine learning holds great promise for weather prediction. We have seen some of what’s possible in our research on applying machine learning to forecasts of high-impact weather. But we also believe that while these tools open up new possibilities for better forecasts, many parts of the job are handled more skillfully by experienced people.

Australian meteorologist Dean Narramore explains why it’s hard to forecast large thunderstorms.

Predictions based on storm history

Today, weather forecasters’ primary tools are numerical weather prediction models. These models use observations of the current state of the atmosphere from sources such as weather stations, weather balloons and satellites, and solve equations that govern the motion of air.

These models are outstanding at predicting most weather systems, but the smaller a weather event is, the more difficult it is to predict. As an example, think of a thunderstorm that dumps heavy rain on one side of town and nothing on the other side. Furthermore, experienced forecasters are remarkably good at synthesizing the huge amounts of weather information they have to consider each day, but their memories and bandwidth are not infinite.

Artificial intelligence and machine learning can help with some of these challenges. Forecasters are using these tools in several ways now, including making predictions of high-impact weather that the models can’t provide.

In a project that started in 2017 and was reported in a 2021 paper, we focused on heavy rainfall. Of course, part of the problem is defining “heavy”: Two inches of rain in New Orleans may mean something very different than in Phoenix. We accounted for this by using observations of unusually large rain accumulations for each location across the country, along with a history of forecasts from a numerical weather prediction model.

We plugged that information into a machine learning method known as “random forests,” which uses many decision trees to split a mass of data and predict the likelihood of different outcomes. The result is a tool that forecasts the probability that rains heavy enough to generate flash flooding will occur.
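
A stripped-down sketch of this kind of pipeline is shown below, using scikit-learn’s random forest on made-up predictor data; the real system is trained on archives of past forecasts and observations with many more predictors.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Made-up training data: each row holds a few model-forecast predictors
# (e.g. forecast precipitation, moisture, instability); the label says
# whether locally "excessive" rain was later observed.
n = 5000
X = rng.normal(size=(n, 3))
y = (0.9 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.8, size=n) > 1.5).astype(int)

model = RandomForestClassifier(n_estimators=200, random_state=0)
model.fit(X, y)

# Probability of excessive rainfall for two new forecast scenarios.
new_cases = np.array([[2.0, 1.0, 0.0],    # very wet, unstable forecast
                      [-0.5, 0.0, 0.0]])  # dry forecast
print(model.predict_proba(new_cases)[:, 1])
```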

We have since applied similar methods to forecasting of tornadoes, large hail and severe thunderstorm winds. Other research groups are developing similar tools. National Weather Service forecasters are using some of these tools to better assess the likelihood of hazardous weather on a given day.

An excessive rainfall forecast from the Colorado State University-Machine Learning Probabilities system for the extreme rainfall associated with the remnants of Hurricane Ida in the mid-Atlantic states in September 2021. The left panel shows the forecast probability of excessive rainfall, available on the morning of Aug. 31, more than 24 hours ahead of the event. The right panel shows the resulting observations of excessive rainfall. The machine learning program correctly highlighted the corridor where widespread heavy rain and flooding would occur.
Russ Schumacher and Aaron Hill, CC BY-ND

Researchers also are embedding machine learning within numerical weather prediction models to speed up tasks that can be intensive to compute, such as predicting how water vapor gets converted to rain, snow or hail.

It’s possible that machine learning models could eventually replace traditional numerical weather prediction models altogether. Instead of solving a set of complex physical equations as the models do, these systems instead would process thousands of past weather maps to learn how weather systems tend to behave. Then, using current weather data, they would make weather predictions based on what they’ve learned from the past.

Some studies have shown that machine learning-based forecast systems can predict general weather patterns as well as numerical weather prediction models while using only a fraction of the computing power the models require. These new tools don’t yet forecast the details of local weather that people care about, but with many researchers carefully testing them and inventing new methods, there is promise for the future.

A forecast from the Colorado State University-Machine Learning Probabilities system for the severe weather outbreak on Dec. 15, 2021, in the U.S. Midwest. The panels illustrate the progression of the forecast from eight days in advance (lower right) to three days in advance (upper left), along with reports of severe weather (tornadoes in red, hail in green, damaging wind in blue).
Russ Schumacher and Aaron Hill, CC BY-ND

The role of human expertise

There are also reasons for caution. Unlike numerical weather prediction models, forecast systems that use machine learning are not constrained by the physical laws that govern the atmosphere. So it’s possible that they could produce unrealistic results – for example, forecasting temperature extremes beyond the bounds of nature. And it is unclear how they will perform during highly unusual or unprecedented weather phenomena.

And relying on AI tools can raise ethical concerns. For instance, locations with relatively few weather observations with which to train a machine learning system may not benefit from forecast improvements that are seen in other areas.

Another central question is how best to incorporate these new advances into forecasting. Finding the right balance between automated tools and the knowledge of expert human forecasters has long been a challenge in meteorology. Rapid technological advances will only make it more complicated.

Ideally, AI and machine learning will allow human forecasters to do their jobs more efficiently, spending less time on generating routine forecasts and more on communicating forecasts’ implications and impacts to the public – or, for private forecasters, to their clients. We believe that careful collaboration between scientists, forecasters and forecast users is the best way to achieve these goals and build trust in machine-generated weather forecasts.

About the Authors:

Russ Schumacher, Associate Professor of Atmospheric Science and Colorado State Climatologist, Colorado State University and Aaron Hill, Research Scientist, Colorado State University

This article is republished from The Conversation under a Creative Commons license. Read the original article.

 

Nonprogrammers are building more of the world’s software – a computer scientist explains ‘no-code’

By Tam Nguyen, University of Dayton 

Traditional computer programming has a steep learning curve that requires learning a programming language, for example C/C++, Java or Python, just to build a simple application such as a calculator or Tic-tac-toe game. Programming also requires substantial debugging skills, which easily frustrates new learners. The study time, effort and experience needed often stop nonprogrammers from making software from scratch.

No-code is a way to build websites, mobile apps and games without writing code or scripts (sets of commands). People readily learn from visual cues, which led to the development of “what you see is what you get” (WYSIWYG) document and multimedia editors as early as the 1970s. WYSIWYG editors allow you to work in a document as it appears in finished form. The concept was extended to software development in the 1990s.

There are many no-code development platforms that allow both programmers and nonprogrammers to create software through drag-and-drop graphical user interfaces instead of traditional line-by-line coding. For example, a user can drag a label and drop it to a website. The no-code platform will show how the label looks and create the corresponding HTML code. No-code development platforms generally offer templates or modules that allow anyone to build apps.
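
Under the hood, “drag a label onto the page” amounts to translating a description of the component into markup. Here is a toy version of that translation step; the component schema is invented purely for illustration.

```python
def render_component(component):
    """Turn a drag-and-drop component description into HTML."""
    kind = component["type"]
    if kind == "label":
        return f'<span style="color:{component.get("color", "black")}">{component["text"]}</span>'
    if kind == "button":
        return f'<button>{component["text"]}</button>'
    raise ValueError(f"unknown component type: {kind}")

page = [
    {"type": "label", "text": "Welcome!", "color": "navy"},
    {"type": "button", "text": "Sign up"},
]
html = "\n".join(render_component(c) for c in page)
print(html)
```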

Early days

In the 1990s, websites were the most familiar interface to users. However, building a website required HTML coding and script-based programming that are not easy for a person lacking programming skills. This led to the release of early no-code platforms, including Microsoft FrontPage and Adobe Dreamweaver, to help nonprogrammers build websites.

Traditional programming requires learning a programming language.
WILLPOWER STUDIOS/Flickr, CC BY

Following the WYSIWYG mindset, nonprogrammers could drag and drop website components such as labels, text boxes and buttons without using HTML code. In addition to editing websites locally, these tools also helped users upload the built websites to remote web servers, a key step in putting a website online.

However, the websites created by these editors were basic static websites. There were no advanced functions such as user authentication or database connections.

Website development

There are many current no-code website-building platforms such as Bubble, Wix, WordPress and GoogleSites that overcome the shortcomings of the early no-code website builders. Bubble allows users to design the interface by defining a workflow. A workflow is a series of actions triggered by an event. For instance, when a user clicks on the save button (the event), the current game status is saved to a file (the series of actions).
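
The workflow idea, an event triggering a series of actions, maps naturally onto a small event-dispatch table. The sketch below is a generic illustration of the concept, not Bubble’s actual implementation.

```python
import json

game_status = {"level": 3, "score": 1240}

def save_to_file():
    """Action 1: write the current game status to disk."""
    with open("save.json", "w") as f:
        json.dump(game_status, f)

def show_message():
    """Action 2: confirm the save to the player."""
    print("Game saved.")

# A workflow: the event name maps to the series of actions it triggers.
workflows = {"save_button_clicked": [save_to_file, show_message]}

def trigger(event):
    for action in workflows.get(event, []):
        action()

trigger("save_button_clicked")
```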

Meanwhile, Wix launched an HTML5 site builder that includes a library of website templates. In addition, Wix supports modules – for example, data analysis of visitor data such as contact information, messages, purchases and bookings; booking support for hotels and vacation rentals; and a platform for independent musicians to market and sell their music.

WordPress was originally developed for personal blogs. It has since been extended to support forums, membership sites, learning management systems and online stores. Like WordPress, GoogleSites lets users create websites with various embedded functions from Google, such as YouTube, Google Maps, Google Drive, calendar and online office applications.

Game and mobile apps

In addition to website builders, there are no-code platforms for game and mobile app development. The platforms are aimed at designers, entrepreneurs and hobbyists who don’t have game development or coding knowledge.

GameMaker provides a user interface with built-in editors for raster graphics, game level design, scripting, paths and “shaders” for representing light and shadow. GameMaker is primarily intended for making games with 2D graphics and 2D skeletal animations.

Buildbox is a no-code 3D game development platform. The main features of Buildbox include the image drop wheel, asset bar, option bar, collision editor, scene editor, physics simulation and even monetization options. While using Buildbox, users also get access to a library of game assets, sound effects and animations. In addition, Buildbox users can create the story of the game. Then users can edit game characters and environmental settings such as weather conditions and time of day, and change the user interface. They can also animate objects, insert video ads, and export their games to different platforms such as PCs and mobile devices.

Games such as Minecraft and SimCity can be thought of as tools for creating virtual worlds without coding.

Future of no-code

No-code platforms help increase the number of developers, in a time of increasing demand for software development. No-code is showing up in fields such as e-commerce, education and health care.

I expect that no-code will play a more prominent role in artificial intelligence, as well. Training machine-learning models, the heart of AI, requires time, effort and experience. No-code programming can help reduce the time to train these models, which makes it easier to use AI for many purposes. For example, one no-code AI tool allows nonprogrammers to create chatbots, something that would have been unimaginable even a few years ago.

About the Author:

Tam Nguyen, Assistant Professor of Computer Science, University of Dayton

This article is republished from The Conversation under a Creative Commons license. Read the original article.

How a simple crystal could help pave the way to full-scale quantum computing

By Jarryd Pla, UNSW and Andrew Dzurak, UNSW 

Vaccine and drug development, artificial intelligence, transport and logistics, climate science — these are all areas that stand to be transformed by the development of a full-scale quantum computer. And there has been explosive growth in quantum computing investment over the past decade.

Yet current quantum processors are relatively small in scale, with fewer than 100 qubits — the basic building blocks of a quantum computer. The bit is the smallest unit of information in computing, and the term qubit stems from “quantum bit”.

While early quantum processors have been crucial for demonstrating the potential of quantum computing, realising globally significant applications will likely require processors with upwards of a million qubits.

Our new research tackles a core problem at the heart of scaling up quantum computers: how do we go from controlling just a few qubits, to controlling millions? In research published today in Science Advances, we reveal a new technology that may offer a solution.

What exactly is a quantum computer?

Quantum computers use qubits to hold and process quantum information. Unlike the bits of information in classical computers, qubits make use of the quantum properties of nature, known as “superposition” and “entanglement”, to perform some calculations much faster than their classical counterparts.

Unlike a classical bit, which is represented by either 0 or 1, a qubit can exist in two states (that is, 0 and 1) at the same time. This is what we refer to as a superposition state.
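
For readers who want to see the standard notation, such a state can be written as a weighted combination of the two basis states (this is textbook quantum mechanics, not something specific to the device described below):

\[ |\psi\rangle = \alpha\,|0\rangle + \beta\,|1\rangle, \qquad |\alpha|^2 + |\beta|^2 = 1 \]

Here |α|² and |β|² are the probabilities of reading out 0 or 1 when the qubit is measured; an equal superposition corresponds to α = β = 1/√2.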

Demonstrations by Google and others have shown even current, early-stage quantum computers can outperform the most powerful supercomputers on the planet for a highly specialised (albeit not particularly useful) task — reaching a milestone we call quantum supremacy.

Google’s quantum computer, built from superconducting electrical circuits, had just 53 qubits and was cooled to a temperature close to -273℃ in a high-tech refrigerator. This extreme temperature is needed to remove heat, which can introduce errors to the fragile qubits. While such demonstrations are important, the challenge now is to build quantum processors with many more qubits.

Major efforts are underway at UNSW Sydney to make quantum computers from the same material used in everyday computer chips: silicon. A conventional silicon chip is thumbnail-sized and packs in several billion bits, so the prospect of using this technology to build a quantum computer is compelling.

The control problem

In silicon quantum processors, information is stored in individual electrons, which are trapped beneath small electrodes at the chip’s surface. Specifically, the qubit is encoded in the electron’s spin, which can be pictured as a small compass inside the electron. The needle of the compass can point north or south, representing the 0 and 1 states.

To set a qubit in a superposition state (both 0 and 1), an operation that occurs in all quantum computations, a control signal must be directed to the desired qubit. For qubits in silicon, this control signal is in the form of a microwave field, much like the ones used to carry phone calls over a 5G network. The microwaves interact with the electron and cause its spin (compass needle) to rotate.
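
As a rough illustration of what that rotation does (a minimal numerical sketch with made-up parameter values, not the control software used in the experiment), a resonant microwave pulse can be modelled as a rotation of the qubit state whose angle grows with the pulse duration:

import numpy as np

# Minimal sketch of a resonant microwave pulse rotating a spin qubit.
# Idealised Rabi-rotation model with illustrative (hypothetical) numbers,
# not the control scheme used in the actual device.

state = np.array([1.0, 0.0], dtype=complex)  # start in |0> ("needle pointing south")

rabi_frequency = 1e6      # assumed drive strength: 1 MHz
pulse_duration = 0.25e-6  # assumed pulse length: 0.25 microseconds

theta = 2 * np.pi * rabi_frequency * pulse_duration  # rotation angle from the pulse

# Rotation about the X axis of the Bloch sphere: the "compass needle" tips over.
rx = np.array([
    [np.cos(theta / 2), -1j * np.sin(theta / 2)],
    [-1j * np.sin(theta / 2), np.cos(theta / 2)],
])

state = rx @ state
print("P(0) =", abs(state[0]) ** 2, "P(1) =", abs(state[1]) ** 2)
# With these numbers theta = pi/2, giving an equal superposition (P of about 0.5 each);
# doubling the pulse length gives theta = pi and flips the spin completely.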

Currently, each qubit requires its own microwave control field. It is delivered to the quantum chip through a cable running from room temperature down to the bottom of the refrigerator at close to -273℃. Each cable brings heat with it, which must be removed before it reaches the quantum processor.

At around 50 qubits, which is state-of-the-art today, this is difficult but manageable. Current refrigerator technology can cope with the cable heat load. However, it represents a huge hurdle if we’re to use systems with a million qubits or more.

The solution is ‘global’ control

An elegant solution to the challenge of how to deliver control signals to millions of spin qubits was proposed in the late 1990s. The idea of “global control” was simple: broadcast a single microwave control field across the entire quantum processor.

Voltage pulses can be applied locally to qubit electrodes to make the individual qubits interact with the global field (and produce superposition states).

It’s much easier to generate such voltage pulses on-chip than it is to generate multiple microwave fields. The solution requires only a single control cable and removes obtrusive on-chip microwave control circuitry.

For more than two decades, global control in quantum computers remained just an idea: researchers could not devise a suitable technology that could be integrated with a quantum chip and generate microwave fields at suitably low powers.

In our work we show that a component known as a dielectric resonator could finally allow this. The dielectric resonator is a small, transparent crystal which traps microwaves for a short period of time.

The trapping of microwaves, a phenomenon known as resonance, allows them to interact with the spin qubits longer and greatly reduces the power of microwaves needed to generate the control field. This was vital to operating the technology inside the refrigerator.

In our experiment, we used the dielectric resonator to generate a control field over an area that could contain up to four million qubits. The quantum chip used in this demonstration was a device with two qubits. We were able to show the microwaves produced by the crystal could flip the spin state of each one.

The path to a full-scale quantum computer

There is still work to be done before this technology is up to the task of controlling a million qubits. For our study, we managed to flip the state of the qubits, but not yet produce arbitrary superposition states.

Experiments are ongoing to demonstrate this critical capability. We’ll also need to further study the impact of the dielectric resonator on other aspects of the quantum processor.

That said, we believe these engineering challenges will ultimately be surmountable — clearing one of the greatest hurdles to realising a large-scale spin-based quantum computer.

About the Author:

Jarryd Pla, Senior Lecturer in Quantum Engineering, UNSW and Andrew Dzurak, Scientia Professor in Quantum Engineering, UNSW

This article is republished from The Conversation under a Creative Commons license. Read the original article.

Machine learning plus insights from genetic research shows the workings of cells – and may help develop new drugs for COVID-19 and other diseases

By Shang Gao, University of Illinois at Chicago and Jalees Rehman, University of Illinois at Chicago 

The Research Brief is a short take about interesting academic work.

The big idea

We combined a machine learning algorithm with knowledge gleaned from hundreds of biological experiments to develop a technique that allows biomedical researchers to figure out the functions of the proteins that turn genes on and off in cells, called transcription factors. This knowledge could make it easier to develop drugs for a wide range of diseases.

Early on during the COVID-19 pandemic, scientists who worked out the genetic code of the RNA molecules of cells in the lungs and intestines found that only a small group of cells in these organs were most vulnerable to being infected by the SARS-CoV-2 virus. That allowed researchers to focus on blocking the virus’s ability to enter these cells. Our technique could make it easier for researchers to find this kind of information.

The biological knowledge we work with comes from this kind of RNA sequencing, which gives researchers a snapshot of the hundreds of thousands of RNA molecules in a cell as they are being translated into proteins. A widely praised machine learning tool, the Seurat analysis platform, has helped researchers all across the world discover new cell populations in healthy and diseased organs. This machine learning tool processes data from single-cell RNA sequencing without any information ahead of time about how these genes function and relate to each other.
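
To make “without any information ahead of time” concrete, here is a deliberately generic sketch (in Python with scikit-learn and synthetic data, not Seurat itself) of unsupervised clustering of a cell-by-gene expression matrix to surface candidate cell populations:

import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

# Synthetic stand-in for a single-cell RNA-seq count matrix:
# rows are cells, columns are genes (real datasets have thousands of both).
rng = np.random.default_rng(0)
counts = rng.poisson(rng.uniform(0.5, 5.0, size=(300, 2000)))

# Standard preprocessing: library-size normalisation and a log transform.
normalised = counts / counts.sum(axis=1, keepdims=True) * 1e4
log_expr = np.log1p(normalised)

# Reduce dimensionality, then cluster with no labels and no prior biology.
embedding = PCA(n_components=20).fit_transform(log_expr)
clusters = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(embedding)

# Each cluster is a candidate cell population, to be interpreted afterwards,
# for example by checking which genes are highly expressed within it.
print(np.bincount(clusters))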

Our technique takes a different approach by adding knowledge about certain genes and cell types to find clues about the distinct roles of cells. There has been more than a decade of research identifying all the potential targets of transcription factors.

Armed with this knowledge, we used a mathematical approach called Bayesian inference. In this technique, prior knowledge is converted into probabilities that can be calculated on a computer. In our case it’s the probability of a gene being regulated by a given transcription factor. We then used a machine learning algorithm to figure out the function of the transcription factors in each one of the thousands of cells we analyzed.
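
The flavour of that calculation can be shown with a deliberately tiny toy example (not the authors’ published model; all numbers are invented for illustration): Bayes’ rule updates a prior probability that a transcription factor regulates a gene using how well the observed expression fits each hypothesis.

# Toy illustration of a Bayesian update of transcription factor activity.
# This is NOT the published model; the probabilities below are invented.

prior_regulated = 0.30  # prior from curated experiments: chance the TF regulates this gene

# Likelihoods: how probable the observed expression pattern is under each hypothesis
# (in the real model these come from fitting thousands of genes across many cells).
likelihood_if_regulated = 0.80
likelihood_if_not = 0.10

# Bayes' rule: posterior = likelihood * prior / evidence.
evidence = (likelihood_if_regulated * prior_regulated
            + likelihood_if_not * (1.0 - prior_regulated))
posterior_regulated = likelihood_if_regulated * prior_regulated / evidence

print(f"Posterior probability of regulation: {posterior_regulated:.2f}")
# 0.8 * 0.3 / (0.8 * 0.3 + 0.1 * 0.7) = 0.24 / 0.31, roughly 0.77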

We published our technique, called Bayesian Inference Transcription Factor Activity Model, in the journal Genome Research and also made the software freely available so that other researchers can test and use it.

Why it matters

Our approach works across a broad range of cell types and organs and could be used to develop treatments for diseases like COVID-19 or Alzheimer’s. Drugs for these difficult-to-treat diseases work best if they target cells that cause the disease and avoid collateral damage to other cells. Our technique makes it easier for researchers to home in on these targets.

[Image: A human cell (greenish blob) heavily infected with SARS-CoV-2 (orange dots), the virus that causes COVID-19, in a colorized microscope image. Credit: National Institute of Allergy and Infectious Diseases]

What other research is being done

Single-cell RNA-sequencing has revealed how each organ can have 10, 20 or even more subtypes of specialized cells, each with distinct functions. A very exciting new development is the emergence of spatial transcriptomics, in which RNA sequencing is performed in a spatial grid that allows researchers to study the RNA of cells at specific locations in an organ.

A recent paper used a Bayesian statistics approach similar to ours to figure out distinct roles of cells while taking into account their proximity to one another. Another research group combined spatial data with single-cell RNA-sequencing data and studied the distinct functions of neighboring cells.

What’s next

We plan to work with colleagues to use our new technique to study complex diseases such as Alzheimer’s disease and COVID-19, work that could lead to new drugs for these diseases. We also want to work with colleagues to better understand the complexity of interactions among cells.

About the Author:

Shang Gao, Doctoral student in Bioinformatics, University of Illinois at Chicago and Jalees Rehman, Professor of Medicine, Pharmacology and Biomedical Engineering, University of Illinois at Chicago

This article is republished from The Conversation under a Creative Commons license. Read the original article.

 

Shape-shifting computer chip thwarts an army of hackers

By Todd Austin, University of Michigan and Lauren Biernacki, University of Michigan 

The Research Brief is a short take about interesting academic work.

The big idea

We have developed and tested a secure new computer processor that thwarts hackers by randomly changing its underlying structure, thus making it virtually impossible to hack.

Last summer, 525 security researchers spent three months trying to hack our Morpheus processor as well as others. All attempts against Morpheus failed. This study was part of a program sponsored by the U.S. Defense Advanced Research Projects Agency (DARPA) to design a secure processor that could protect vulnerable software. DARPA released the results of the program to the public for the first time in January 2021.

A processor is the piece of computer hardware that runs software programs. Since a processor underlies all software systems, a secure processor has the potential to protect any software running on it from attack. Our team at the University of Michigan first developed Morpheus, a secure processor that thwarts attacks by turning the computer into a puzzle, in 2019.

A processor has an architecture – x86 for most laptops and ARM for most phones – which is the set of instructions software needs to run on the processor. Processors also have a microarchitecture, or the “guts” that enable the execution of the instruction set, the speed of this execution and how much power it consumes.

Hackers need to be intimately familiar with the details of the microarchitecture to graft their malicious code, or malware, onto vulnerable systems. To stop attacks, Morpheus randomizes these implementation details to turn the system into a puzzle that hackers must solve before conducting security exploits. From one Morpheus machine to another, details like the commands the processor executes or the format of program data change in random ways. Because this happens at the microarchitecture level, software running on the processor is unaffected.
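
A loose software analogy (a toy sketch only; Morpheus does this in hardware, at the microarchitecture level) is to encode stored values with a per-machine random key: software that goes through the legitimate interface is unaffected, while an exploit that hard-codes assumptions about the raw representation breaks on every machine it meets.

import secrets

# Toy analogy for randomised implementation details; not Morpheus itself.
class EncodedMemory:
    def __init__(self):
        # Each "machine" draws its own secret encoding key at start-up.
        self._key = secrets.randbits(64)
        self._cells = {}

    def store(self, address, value):
        self._cells[address] = value ^ self._key   # encode on the way in

    def load(self, address):
        return self._cells[address] ^ self._key    # decode on the way out

mem = EncodedMemory()
mem.store(0x1000, 0xDEADBEEF)

# Legitimate software always uses load/store, so it still works.
assert mem.load(0x1000) == 0xDEADBEEF

# An "exploit" that assumes it knows the raw in-memory representation fails,
# because the encoding differs from machine to machine (and Morpheus goes
# further, re-randomising every few hundred milliseconds).
raw_guess = 0xDEADBEEF
print("exploit sees expected raw value?", mem._cells[0x1000] == raw_guess)  # almost surely False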

[Image: The Morpheus computer processor, inside the square beneath the fan on this circuit board, rapidly and continuously changes its underlying structure to thwart hackers. Credit: Todd Austin, CC BY-ND]

A skilled hacker could reverse-engineer a Morpheus machine in as little as a few hours, if given the chance. To counter this, Morpheus also changes the microarchitecture every few hundred milliseconds. Thus, attackers not only have to reverse-engineer the microarchitecture, they have to do it very fast. With Morpheus, a hacker is confronted with a computer that has never been seen before and will never be seen again.

Why it matters

To conduct a security exploit, hackers use vulnerabilities in software to get inside a device. Once inside, they graft their malware onto the device. Malware is designed to infect the host device to steal sensitive data or spy on users.

The typical approach to computer security is to fix individual software vulnerabilities to keep hackers out. For these patch-based techniques to succeed, programmers must write perfect software without any bugs. But ask any programmer, and the idea of creating a perfect program is laughable. Bugs are everywhere, and security bugs are the most difficult to find because they don’t impair a program’s normal operation.

Morpheus takes a distinct approach to security by augmenting the underlying processor to prevent attackers from grafting malware onto the device. With this approach, Morpheus protects any vulnerable software that runs on it.

What other research is being done

For a long time, processor designers considered security a problem for software programmers, since it is bugs in software that lead to security concerns. But recently computer designers have discovered that hardware can help protect software.

Academic efforts, such as Capability Hardware Enhanced RISC Instructions at the University of Cambridge, have demonstrated strong protection against memory bugs. Commercial efforts have begun as well, such as Intel’s soon-to-be-released Control-flow Enforcement Technology.

Morpheus takes a notably different approach of ignoring the bugs and instead randomizes its internal implementation to thwart exploitation of bugs. Fortunately, these are complementary techniques, and combining them will likely make systems even more difficult to attack.

[Image: The Morpheus secure processor works like a puzzle that keeps changing before hackers have a chance to solve it. Credit: Alan de la Cruz via Unsplash]

What’s next

We are looking at how the fundamental design aspects of Morpheus can be applied to protect sensitive data on people’s devices and in the cloud. In addition to randomizing the implementation details of a system, how can we randomize data in a way that maintains privacy while not being a burden to software programmers?

About the Author:

Todd Austin, Professor of Electrical Engineering and Computer Science, University of Michigan and Lauren Biernacki, Ph.D. Candidate in Computer Science & Engineering, University of Michigan

This article is republished from The Conversation under a Creative Commons license. Read the original article.

The Colonial Pipeline ransomware attack and the SolarWinds hack were all but inevitable – why national cyber defense is a ‘wicked’ problem

By Terry Thompson, Johns Hopkins University 

Takeaways:

· There are no easy solutions to shoring up U.S. national cyber defenses.

· Software supply chains and private sector infrastructure companies are vulnerable to hackers.

· Many U.S. companies outsource software development because of a talent shortage, and some of that outsourcing goes to companies in Eastern Europe that are vulnerable to Russian operatives.

· U.S. national cyber defense is split between the Department of Defense and the Department of Homeland Security, which leaves gaps in authority.

The ransomware attack on Colonial Pipeline on May 7, 2021, exemplifies the huge challenges the U.S. faces in shoring up its cyber defenses. The private company, which controls a significant component of the U.S. energy infrastructure and supplies nearly half of the East Coast’s liquid fuels, was vulnerable to an all-too-common type of cyber attack. The FBI has attributed the attack to a Russian cybercrime gang. It would be difficult for the government to mandate better security at private companies, and the government is unable to provide that security for the private sector.

Similarly, the SolarWinds hack, one of the most devastating cyber attacks in history, which came to light in December 2020, exposed vulnerabilities in global software supply chains that affect government and private sector computer systems. It was a major breach of national security that revealed gaps in U.S. cyber defenses.

These gaps include inadequate security by a major software producer, fragmented authority for government support to the private sector, blurred lines between organized crime and international espionage, and a national shortfall in software and cybersecurity skills. None of these gaps is easily bridged, but the scope and impact of the SolarWinds attack show how critical controlling these gaps is to U.S. national security.

The SolarWinds breach, likely carried out by a group affiliated with Russia’s FSB security service, compromised the software development supply chain used by SolarWinds to update 18,000 users of its Orion network management product. SolarWinds sells software that organizations use to manage their computer networks. The hack, which allegedly began in early 2020, was discovered only in December when cybersecurity company FireEye revealed that it had been hit by the malware. More worrisome, this may have been part of a broader attack on government and commercial targets in the U.S.

The Biden administration is preparing an executive order that is expected to address these software supply chain vulnerabilities. However, these changes, as important as they are, would probably not have prevented the SolarWinds attack. And preventing ransomware attacks like the Colonial Pipeline attack would require U.S. intelligence and law enforcement to infiltrate every organized cyber criminal group in Eastern Europe.

Supply chains, sloppy security and a talent shortage

The vulnerability of the software supply chain – the collections of software components and software development services companies use to build software products – is a well-known problem in the security field. In response to a 2017 executive order, a report by a Department of Defense-led interagency task force identified “a surprising level of foreign dependence,” workforce challenges and critical capabilities such as printed circuit board manufacturing that companies are moving offshore in pursuit of competitive pricing. All these factors came into play in the SolarWinds attack.

SolarWinds, driven by its growth strategy and plans to spin off its managed service provider business in 2021, bears much of the responsibility for the damage, according to cybersecurity experts. I believe that the company put itself at risk by outsourcing its software development to Eastern Europe, including a company in Belarus. Russian operatives have been known to use companies in former Soviet satellite countries to insert malware into software supply chains. Russia used this technique in the 2017 NotPetya attack that cost global companies more than US$10 billion.

SolarWinds also failed to practice basic cybersecurity hygiene, according to a cybersecurity researcher.

Vinoth Kumar reported that the password for the software company’s development server was allegedly “solarwinds123,” an egregious violation of fundamental standards of cybersecurity. SolarWinds’ sloppy password management is ironic in light of the Password Management Solution of the Year award the company received in 2019 for its Passportal product.

In a blog post, the company admitted that “the attackers were able to circumvent threat detection techniques employed by both SolarWinds, other private companies, and the federal government.”

The larger question is why SolarWinds, an American company, had to turn to foreign providers for software development. A Department of Defense report about supply chains characterizes the lack of software engineers as a crisis, partly because the education pipeline is not providing enough software engineers to meet demand in the commercial and defense sectors.

There’s also a shortage of cybersecurity talent in the U.S. Software development, network engineering and related engineering roles are among the most in-demand skills across the U.S., and the shortage of software engineers who focus on the security of software in particular is acute.

Fragmented authority

Though I’d argue SolarWinds has much to answer for, it should not have had to defend itself against a state-orchestrated cyber attack on its own. The 2018 National Cyber Strategy describes how supply chain security should work. The government determines the security of federal contractors like SolarWinds by reviewing their risk management strategies, ensuring that they are informed of threats and vulnerabilities and responding to incidents on their systems.

However, this official strategy split these responsibilities between the Pentagon for defense and intelligence systems and the Department of Homeland Security for civil agencies, continuing a fragmented approach to information security that began in the Reagan era. Execution of the strategy relies on the DOD’s U.S. Cyber Command and DHS’s Cyber and Infrastructure Security Agency. DOD’s strategy is to “defend forward”: that is, to disrupt malicious cyber activity at its source, which proved effective in the runup to the 2018 midterm elections. The Cyber and Infrastructure Security Agency, established in 2018, is responsible for providing information about threats to critical infrastructure sectors.

Neither agency appears to have sounded a warning or attempted to mitigate the attack on SolarWinds. The government’s response came only after the attack. The Cyber and Infrastructure Security Agency issued alerts and guidance, and a Cyber Unified Coordination Group was formed to facilitate coordination among federal agencies.

These tactical actions, while useful, were only a partial solution to the larger, strategic problem. The fragmentation of the authorities for national cyber defense evident in the SolarWinds hack is a strategic weakness that complicates cybersecurity for the government and private sector and invites more attacks on the software supply chain.

A wicked problem

National cyber defense is an example of a “wicked problem,” a policy problem that has no clear solution or measure of success. The Cyberspace Solarium Commission identified many inadequacies of U.S. national cyber defenses. In its 2020 report, the commission noted that “There is still not a clear unity of effort or theory of victory driving the federal government’s approach to protecting and securing cyberspace.”

Many of the factors that make developing a centralized national cyber defense challenging lie outside of the government’s direct control. For example, economic forces push technology companies to get their products to market quickly, which can lead them to take shortcuts that undermine security. Legislation along the lines of the Gramm-Leach-Bliley Act of 1999, which placed security requirements on financial institutions, could help counter this need for speed in software development. But software development companies are likely to push back against additional regulation and oversight.

The Biden administration appears to be taking the challenge seriously. The president has appointed a national cybersecurity director to coordinate related government efforts. It remains to be seen whether and how the administration will address the problem of fragmented authorities and clarify how the government will protect companies that supply critical digital infrastructure. It’s unreasonable to expect any U.S. company to be able to fend for itself against a foreign nation’s cyberattack.

Steps forward

In the meantime, software developers can apply the secure software development approach advocated by the National Institute of Standards and Technology. Government and industry can prioritize the development of artificial intelligence that can identify malware in existing systems. All this takes time, however, and hackers move quickly.

Finally, companies need to aggressively assess their vulnerabilities, particularly by engaging in more “red teaming” activities: that is, having employees, contractors or both play the role of hackers and attack the company.

Recognizing that hackers in the service of foreign adversaries are dedicated, thorough and not constrained by any rules is important for anticipating their next moves and reinforcing and improving U.S. national cyber defenses. Otherwise, Colonial Pipeline is unlikely to be the last victim of a major attack on U.S. infrastructure and SolarWinds is unlikely to be the last victim of a major attack on the U.S. software supply chain.

This is an updated version of an article originally published on February 9, 2021.

About the Author:

Terry Thompson, Adjunct Instructor in Cybersecurity, Johns Hopkins University

This article is republished from The Conversation under a Creative Commons license. Read the original article.