by Loz Blain
September 29, 2024
from
NewAtlas Website
Today's AI models, pictured here
using generative tools, are infants,
and their understanding of truth is being held back
by the human
thinking and language
they're trained on.
While you still have some reasoning power left, you should consider
this deeply philosophical question:
does AI think the way humans
think?
The only way you might think that is based on language, where
AI is giving you answers in your own language.
But what if AI
creates its own digital language to explain the world and everything
in it?
Communicating with you will be like a speed bump, even an
inconvenience. Why would it bother?
Artificial thinking is not human
thinking.
Source
AIs have a big problem with truth and correctness - and human
thinking appears to be a big part of that problem.
A new generation
of AI is now starting to take a much more experimental approach that
could catapult machine learning way past humans.
Remember DeepMind's AlphaGo...?
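It represented a fundamental
breakthrough in AI development, because it was one of the first
game-playing AIs to learn mainly by playing against itself rather
than by following human-written strategy. The original system was
bootstrapped on records of human games; its successor, AlphaGo Zero,
dropped even that.
The technique is called self-play reinforcement learning (RL), and it
lets the system build up its own understanding of the game.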
Pure trial and
error across millions, even billions of virtual games, starting out
more or less randomly pulling whatever levers were available, and
attempting to learn from the results.
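To make the idea concrete, here is a minimal sketch of self-play trial and error on a far smaller game than Go. It is not DeepMind's method (AlphaGo combined deep neural networks with tree search); it is a plain lookup table. The game (a pile of stones, take one to three per turn, whoever takes the last stone wins), the function name `self_play_nim` and all parameter values are assumptions chosen purely for illustration.

```python
import random
from collections import defaultdict


def self_play_nim(start=10, episodes=50000, epsilon=0.1, lr=0.05, seed=0):
    """Learn a toy Nim variant purely from self-play (illustrative only)."""
    rng = random.Random(seed)
    # q[(stones_left, stones_taken)] -> running estimate of the final
    # outcome (+1 win, -1 loss) for the player making that move.
    q = defaultdict(float)

    def legal(stones):
        return [a for a in (1, 2, 3) if a <= stones]

    def pick(stones, explore):
        moves = legal(stones)
        if explore and rng.random() < epsilon:
            return rng.choice(moves)  # pull a random lever
        return max(moves, key=lambda a: q[(stones, a)])  # best move so far

    for _ in range(episodes):
        stones, history = start, []
        while stones > 0:  # both "players" share the same table
            action = pick(stones, explore=True)
            history.append((stones, action))
            stones -= action
        # Whoever moved last won. Walk backwards through the game,
        # flipping the sign each step because the players alternate.
        outcome = 1.0
        for state_action in reversed(history):
            q[state_action] += lr * (outcome - q[state_action])
            outcome = -outcome

    # The learned greedy move for every pile size.
    return {s: pick(s, explore=False) for s in range(1, start + 1)}


policy = self_play_nim()
```

Nothing in the code states the known winning strategy for this game (always leave your opponent a multiple of four stones). The table only ever sees moves and final results, yet with enough games the greedy policy should drift toward that strategy by itself.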
Within two years of the start of the project in 2014,
AlphaGo had
beaten the European Go champion 5-0 - and by 2017 it had defeated
the
world's #1 ranked human player.
AlphaGo soundly defeated many-times world-champion Go master Lee
Sedol in 2016, using strange moves that would be incredibly rare from
a human opponent - and indeed, that evolved the human understanding
of the game (Credit: DeepMind)
At this point, DeepMind unleashed a similar AlphaZero model on the
chess world, where models like Deep Blue, trained on human thinking,
knowledge and rule sets, had been beating human grandmasters since
the 90s.
AlphaZero played 100 games against the reigning AI
champion, Stockfish, winning 28 and tying the rest.
Human thinking puts the brakes on
AI
DeepMind started dominating these games - and shogi, Dota 2,
StarCraft II and many others - when it jettisoned the idea that
emulating a human was the best way to get a good result.
Bound by different limits than us, and gifted with different
talents, these electronic minds were given the freedom to interact
with things on their own terms, play to their own cognitive
strengths, and build their own ground-up understanding of what works
and what doesn't.
AlphaZero doesn't know chess like Magnus Carlsen does.
It's never
heard of the Queen's Gambit or studied the great grandmasters.
It's
just played a shit-ton of chess, and built up its own understanding
against the cold, hard logic of wins and losses, in an inhuman and
inscrutable language it created itself as it went.
As a result it's so much better than any model trained by humans,
that it's an absolute certainty:
no human, and no model trained on
human thinking will ever again have a chance in a chess game if
there's an advanced reinforcement learning agent on the other side.
And something similar, according to people that are better-placed to
know the truth than anyone else on the planet, is what's just
started happening with the
latest, greatest version of ChatGPT.
OpenAI's new 'o1' model begins to
diverge from human thinking
ChatGPT and other Large Language Model (LLM) AIs, like those early
chess AIs, have been trained on as much human knowledge as was
available:
the entire written output of our species, give or take.
And they've become very, very good.
All this palaver about whether
they'll ever achieve Artificial General Intelligence...
Good grief,
can you picture a human that could compete with GPT-4o across the
breadth of its capabilities?
But LLMs specialize in language, not in getting facts right or
wrong.
That's why they "hallucinate" - or
BS - giving you wrong
information in beautifully phrased sentences, sounding as confident
as a news anchor.
Language is a collection of weird gray areas where there's rarely an
answer that's 100% right or wrong - so LLMs are typically trained
using reinforcement learning from human feedback (RLHF).
That is, humans
pick which answers sound closer to the kind of answer they were
wanting.
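As a rough sketch of what learning from those human picks can look like, the toy below fits a reward score from pairs of answers where a rater chose one over the other. It uses a simple pairwise logistic (Bradley-Terry style) model. The two features, the example preference pairs and the function names are all hypothetical, invented for illustration; real systems score answers with large neural networks, not two hand-picked numbers.

```python
import math


def train_reward_model(preferences, n_features, epochs=200, lr=0.5):
    """Fit linear reward weights so that chosen answers outscore rejected ones."""
    w = [0.0] * n_features
    for _ in range(epochs):
        for chosen, rejected in preferences:
            diff = [c - r for c, r in zip(chosen, rejected)]
            margin = sum(wi * di for wi, di in zip(w, diff))
            # Model's current probability that the rater picks `chosen`.
            p_chosen = 1.0 / (1.0 + math.exp(-margin))
            # Gradient ascent on log(p_chosen).
            for i, di in enumerate(diff):
                w[i] += lr * (1.0 - p_chosen) * di
    return w


def reward(w, features):
    return sum(wi * fi for wi, fi in zip(w, features))


# Hypothetical features per answer: [sounds_confident, is_factually_correct]
# Each pair is (answer the rater picked, answer the rater passed over).
prefs = [
    ([1.0, 0.0], [0.0, 1.0]),  # picked the confident but wrong answer
    ([1.0, 1.0], [0.0, 1.0]),
    ([1.0, 0.0], [0.0, 0.0]),
]
w = train_reward_model(prefs, n_features=2)
```

With these made-up ratings, the fitted weights reward sounding confident and give no credit for being correct, so `reward(w, [1.0, 0.0])` comes out higher than `reward(w, [0.0, 1.0])`. The score can only reflect whatever the raters happened to prefer.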
But facts, and exams, and coding - these things do have a
clear success/fail condition; either you got it right, or you
didn't.
And this is where the new o1 model has started to split away from
human thinking and bring in that insanely effective AlphaGo
approach of pure trial and error in pursuit of the right result.
o1's Baby steps into
Reinforcement Learning
In many ways, o1 is pretty much the same as its predecessors
- except that OpenAI has built in some 'thinking time' before it
starts to answer a prompt.
During this thinking time, o1 generates a
'chain of thought' in which it considers and reasons its way through
a problem.
And this is where the RL approach comes in - o1, unlike previous
models that were more like the world's most advanced autocomplete
systems, really 'cares' whether it gets things right or wrong.
And
through part of its training, this model was given the freedom to
tackle problems by random trial and error in its
chain of thought reasoning.
It still only had human-generated reasoning steps to draw from, but
it was free to apply them randomly and draw its own conclusions
about which steps, in which order, are most likely to get it toward
a correct answer.
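OpenAI has not published the details of how o1 is trained, so the following is only a toy sketch of the general idea described above: a fixed menu of steps, random attempts at chaining them, and a reward that depends solely on whether the final answer checks out. The step names, the arithmetic task and the function `learn_step_preferences` are all assumptions made up for illustration.

```python
import random

# A fixed menu of "reasoning steps" the learner may apply (hypothetical).
STEPS = {
    "double": lambda x: x * 2,
    "add_three": lambda x: x + 3,
    "subtract_one": lambda x: x - 1,
}


def learn_step_preferences(start=2, target=13, chain_len=3,
                           attempts=5000, seed=0):
    """Reinforce whichever step orderings reach a verifiably correct answer."""
    rng = random.Random(seed)
    names = list(STEPS)
    # One set of sampling weights per position in the chain.
    weights = [{name: 1.0 for name in names} for _ in range(chain_len)]

    for _ in range(attempts):
        # Sample a chain, at first uniformly at random.
        chain = [
            rng.choices(names, weights=[w[n] for n in names])[0]
            for w in weights
        ]
        value = start
        for name in chain:
            value = STEPS[name](value)
        # Clear success/fail check: only the final answer is judged.
        if value == target:
            for w, name in zip(weights, chain):
                w[name] += 1.0  # make these steps more likely next time

    # The most strongly reinforced step at each position.
    return [max(w, key=w.get) for w in weights]


best_chain = learn_step_preferences()
```

In this toy task only one ordering turns 2 into 13 (add three, double, add three), and nobody tells the learner so. It stumbles on that chain by chance, gets rewarded, and from then on samples it more and more often.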
And in that sense, it's the first LLM that's really starting to
create that strange, but super-effective AlphaGo-style
'understanding' of problem spaces. In the domains where it's now
surpassing Ph.D.-level capabilities and knowledge, it got there
essentially by trial and error, by chancing upon the correct answers
over millions of self-generated attempts, and by building up its own
theories of what's a useful reasoning step and what's not.
So in topics where there's a clear right and wrong answer, we're now
beginning to see this alien intelligence take the first steps past
us on its own two feet. If the games world is a good analogy for
real life, then friends, we know where things go from here.
It's a
sprinter that'll accelerate forever, given enough energy.
But o1 is still primarily trained on human language. That's very
different from truth - language is a crude and low-res
representation of reality.
Put it this way:
you can describe a
biscuit to me all day long, but I won't have tasted it.
So what happens when you stop describing the truth of the physical
world, and let the AIs go and eat some biscuits?
We'll soon begin to
find out, because AIs embedded in robot bodies are now starting to
build their own ground-up understanding of how the physical world
works.
AI's Pathway toward Ultimate Truth
Freed from the crude human musings of Newton, and Einstein, and
Hawking, embodied AIs will take a bizarre AlphaGo-style approach to
understanding the world.
They'll poke and prod at reality, and
observe the results, and build up their own theories in their own
languages about what works, what doesn't, and why.
They won't approach reality like humans or animals do.
They won't
use a scientific method like ours, or split things into disciplines
like physics and chemistry, or run the same kinds of experiments
that helped humans master the materials and forces and energy
sources around them and dominate the world.
Embodied AIs given the freedom to learn like this will be
hilariously weird.
They'll do the most bizarre things you can think
of, for reasons known only to themselves, and in doing so, they'll
create and discover new knowledge that humans could never have
pieced together.
Unshackled from our language and thinking, they won't even notice
when they break through the boundaries of our knowledge and discover
truths about the universe and new technologies that humans wouldn't
stumble across in a billion years.
We're granted some reprieve here:
this isn't happening in a matter
of days or weeks, like so much of what's going on in the LLM world.
Reality is the highest-resolution system we know of, and the
ultimate source of truth.
But there's an awful lot of it, and it's
also painfully slow to work with; unlike in simulation, reality
demands that you operate at a plodding one minute per minute,
and you're only allowed to use as many bodies as you've actually
built.
So embodied AIs attempting to learn from base reality won't
initially have the wild speed advantage of their language-based
forebears. But they'll still be a lot faster than evolution, with
the ability to pool their learnings among co-operative groups in
swarm learning.
Companies like Tesla, Figure and Sanctuary AI are working feverishly
at building humanoids to a standard that's commercially useful and
cost-competitive with human labor.
Once they achieve that - if they
achieve that - they'll be able to build enough robots to start
working on that ground-up, trial-and-error understanding of the
physical world, at scale and at speed.
They'll need to pay their way, though. It's funny to think about,
but these humanoids might learn to master the universe in their
downtime from work.
Apologies for these rather esoteric and speculative thoughts, but as
I keep finding myself saying, what a time to be alive!
OpenAI's o1 model might not look like a quantum leap forward,
sitting there in GPT's drab textual clothing, looking like just
another invisible terminal typist.
But it really is a step-change in
the development of AI - and a fleeting glimpse into exactly how
these alien machines will eventually overtake humans in every
conceivable way.
For a wonderful deeper dive into how reinforcement learning makes o1
a step-change in the development of AI, I highly recommend the
excellent AI Explained channel's video on the subject.