by Loz Blain, September 29, 2024
from the NewAtlas website

Today's AI models, pictured here using generative tools, are infants, and their understanding of truth is being held back by the human thinking and language they're trained on.
 
				
					
						
While you still have some reasoning power left, you should consider this deeply philosophical question: does AI think the way humans think?

The only way you might think that is based on language, where AI is giving you answers in your own language. But what if AI creates its own digital language to explain the world and everything in it? Communicating with you would be like a speed bump, even an inconvenience. Why would it bother?

Artificial thinking is not human thinking.
AIs have a big problem with truth and correctness - and human thinking appears to be a big part of that problem. A new generation of AI is now starting to take a much more experimental approach that could catapult machine learning way past humans.
Remember DeepMind's AlphaGo...?
 
				
It represented a fundamental breakthrough in AI development, because it was one of the first game-playing AIs that took no human instruction and read no rules. Instead, it used a technique called self-play reinforcement learning (RL) to build up its own understanding of the game.
   
Pure trial and error across millions, even billions of virtual games: starting out more or less randomly, pulling whatever levers were available, and attempting to learn from the results. Within two years of the start of the project in 2014, AlphaGo had beaten the European Go champion 5-0 - and by 2017 it had defeated the world's #1 ranked human player.
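
To make the idea concrete, here's a minimal sketch of self-play reinforcement learning, scaled down to a toy game - an illustration of the general technique, not DeepMind's actual code. The game (Nim), the reward scheme and all the parameters here are my own assumptions:

```python
# A toy version of self-play reinforcement learning - an illustration of
# the general technique, not DeepMind's code. The game is Nim: players
# alternately take 1-3 stones from a pile, and whoever takes the last
# stone wins. The agent starts with no strategy at all and learns purely
# from win/loss outcomes across many self-play games.
import random
from collections import defaultdict

Q = defaultdict(float)   # Q[(stones_left, move)] -> learned value of that move
EPSILON = 0.1            # exploration rate: how often to try a random move
ALPHA = 0.5              # learning rate

def legal_moves(stones):
    return [m for m in (1, 2, 3) if m <= stones]

def choose(stones, explore=True):
    moves = legal_moves(stones)
    if explore and random.random() < EPSILON:
        return random.choice(moves)                # pure trial and error
    return max(moves, key=lambda m: Q[(stones, m)])

def play_one_game(stones=21):
    history = {0: [], 1: []}                       # (state, move) pairs per player
    player = 0
    while stones > 0:
        move = choose(stones)
        history[player].append((stones, move))
        stones -= move
        if stones == 0:
            winner = player                        # took the last stone
        player = 1 - player
    # Nudge every move the winner made toward +1, the loser's toward -1.
    for p, reward in ((winner, 1.0), (1 - winner, -1.0)):
        for state, move in history[p]:
            Q[(state, move)] += ALPHA * (reward - Q[(state, move)])

for _ in range(50_000):                            # "millions of games", toy-sized
    play_one_game()

# The known perfect strategy is to always leave a multiple of 4 stones;
# the learned policy converges toward it from nothing but wins and losses.
print({s: choose(s, explore=False) for s in range(5, 22)})
```

The key point is what's missing: no opening book, no strategy guide, no human games to imitate - just the legal moves and the bare win/loss signal, exactly the recipe described above.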
 
			  
			
			
			 
			
AlphaGo soundly defeated many-times world-champion Go master Lee Sedol in 2016, using strange moves that would be incredibly rare from a human opponent - and indeed, that evolved the human understanding of the game (Image: DeepMind)
At this point, DeepMind unleashed a similar AlphaZero model on the chess world, where models like Deep Blue, trained on human thinking, knowledge and rule sets, had been beating human grandmasters since the 90s.
 
			  
AlphaZero played 100 games against the reigning AI champion, Stockfish, winning 28 and drawing the rest.
Human thinking puts the brakes on AI
 
DeepMind started dominating these games - and shogi, Dota 2, StarCraft II and many others - when it jettisoned the idea that emulating a human was the best way to get a good result.
 
Bound by different limits than us, and gifted with different talents, these electronic minds were given the freedom to interact with things on their own terms, play to their own cognitive strengths, and build their own ground-up understanding of what works and what doesn't.
 
AlphaZero doesn't know chess like Magnus Carlsen does.
 
				
It's never heard of the Queen's Gambit or studied the great grandmasters. It's just played a shit-ton of chess, and built up its own understanding against the cold, hard logic of wins and losses, in an inhuman and inscrutable language it created itself as it went.

As a result, it's so much better than any model trained by humans that it's an absolute certainty:
 
				
no human, and no model trained on human thinking, will ever again have a chance in a chess game if there's an advanced reinforcement learning agent on the other side. And something similar, according to people who are better placed to know the truth than anyone else on the planet, is what's just started happening with the latest, greatest version of ChatGPT.
			  
			  
			  
OpenAI's new 'o1' model begins to diverge from human thinking
 
ChatGPT and other Large Language Model (LLM) AIs, like those early chess AIs, have been trained on as much human knowledge as was available: the entire written output of our species, give or take. And they've become very, very good.
			  
All this palaver about whether they'll ever achieve Artificial General Intelligence... Good grief, can you picture a human that could compete with GPT-4o across the breadth of its capabilities?
But LLMs specialize in language, not in getting facts right or wrong. That's why they "hallucinate" - or BS - giving you wrong information in beautifully phrased sentences, sounding as confident as a news anchor.
Language is a collection of weird gray areas where there's rarely an answer that's 100% right or wrong - so LLMs are typically trained using reinforcement learning with human feedback. That is, humans pick which answers sound closer to the kind of answer they were wanting.

But facts, and exams, and coding - these things do have a clear success/fail condition; either you got it right, or you didn't.
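
To see the difference between those two training signals, here's a small toy sketch - my own illustration, not OpenAI's implementation. The first half is a Bradley-Terry-style preference score, the kind of model RLHF reward models are built on; the second is a verifiable pass/fail check. The answer names and numbers are invented for the example:

```python
# Two kinds of reward signal, side by side - a toy sketch, not anyone's
# production code. "answer_a" and "answer_b" are hypothetical responses.
import math

# --- RLHF-style signal: learned from "A sounds better than B" votes ---
# Bradley-Terry model: P(A preferred over B) = sigmoid(score_A - score_B).
scores = {"answer_a": 0.0, "answer_b": 0.0}
LR = 0.1  # learning rate for the score updates

def record_human_preference(preferred: str, rejected: str) -> None:
    """Nudge the scores so the human-preferred answer ranks higher."""
    p_preferred = 1.0 / (1.0 + math.exp(scores[rejected] - scores[preferred]))
    scores[preferred] += LR * (1.0 - p_preferred)
    scores[rejected] -= LR * (1.0 - p_preferred)

for _ in range(100):  # simulate 100 raters who all liked answer_a more
    record_human_preference("answer_a", "answer_b")
print(scores)  # answer_a drifts higher - the reward is a matter of taste

# --- Verifiable signal: no taste involved, just right or wrong ---
def exam_reward(model_answer: str, correct_answer: str) -> float:
    return 1.0 if model_answer.strip() == correct_answer.strip() else 0.0

print(exam_reward("42", "41"))  # 0.0 - either you got it right, or you didn't
```

One signal encodes taste and can be confidently wrong; the other is anchored to a ground truth you can check.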
And this is where the new o1 model has started to split away from human thinking, bringing in that insanely effective AlphaGo approach of pure trial and error in pursuit of the right result.
			  
			  
			  
o1's baby steps into reinforcement learning
 
In many ways, o1 is pretty much the same as its predecessors - except that OpenAI has built in some 'thinking time' before it starts to answer a prompt.

During this thinking time, o1 generates a 'chain of thought' in which it considers and reasons its way through a problem.
And this is where the RL approach comes in - o1, unlike previous models that were more like the world's most advanced autocomplete systems, really 'cares' whether it gets things right or wrong.
 
			  
And through part of its training, this model was given the freedom to attack problems by random trial and error in its chain-of-thought reasoning. It still only had human-generated reasoning steps to draw from, but it was free to apply them randomly and draw its own conclusions about which steps, in which order, are most likely to get it toward a correct answer.
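
Here's a toy sketch of that idea - an analogy built on my own assumptions, not OpenAI's training code. The 'reasoning steps' are a fixed, human-written library; the agent is free to chain them in any order, gets a verifiable right/wrong signal at the end, and slowly learns which steps tend to lead to correct answers:

```python
# A toy sketch of trial-and-error over reasoning steps - an analogy, not
# OpenAI's training code. The "steps" are a fixed human-written library;
# the task (reduce a number to exactly 1) stands in for any problem with
# a verifiable right answer. All names and numbers here are invented.
import random
from collections import defaultdict

STEPS = {
    "halve":      lambda n: n // 2 if n % 2 == 0 else n,  # no-op on odd numbers
    "decrement":  lambda n: n - 1,
    "triple_add": lambda n: 3 * n + 1,                    # usually a wrong turn
}

step_value = defaultdict(float)   # learned usefulness of each step

def attempt(start=27, max_steps=120, epsilon=0.3):
    """One 'chain of thought': pick steps greedily, sometimes at random."""
    n, chain = start, []
    for _ in range(max_steps):
        if random.random() < epsilon:
            name = random.choice(list(STEPS))               # explore
        else:
            name = max(STEPS, key=lambda s: step_value[s])  # exploit
        chain.append(name)
        n = STEPS[name](n)
        if n == 1:                                          # verifiable success
            return chain, True
    return chain, False

for _ in range(5_000):                # many self-generated attempts
    chain, solved = attempt()
    reward = 1.0 if solved else -0.1  # right or wrong - not "sounds right"
    for name in chain:                # credit every step the chain used
        step_value[name] += 0.001 * reward

# Steps that reliably lead to a verified answer end up valued highest.
print(dict(step_value))
```

That's the AlphaGo recipe transplanted into reasoning: human-made pieces, but a machine-discovered sense of which ones actually work.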
 
And in that sense, it's the first LLM that's really starting to create that strange but super-effective AlphaGo-style 'understanding' of problem spaces. In the domains where it's now surpassing Ph.D.-level capabilities and knowledge, it got there essentially by trial and error, by chancing upon the correct answers over millions of self-generated attempts, and by building up its own theories of what's a useful reasoning step and what's not.
 
So in topics where there's a clear right and wrong answer, we're now beginning to see this alien intelligence take the first steps past us on its own two feet. If the games world is a good analogy for real life, then friends, we know where things go from here.

It's a sprinter that'll accelerate forever, given enough energy.
But o1 is still primarily trained on human language. That's very different from truth - language is a crude and low-res representation of reality.
 
			  
Put it this way: you can describe a biscuit to me all day long, but I won't have tasted it.

So what happens when you stop describing the truth of the physical world, and let the AIs go and eat some biscuits? We'll soon begin to find out, because AIs embedded in robot bodies are now starting to build their own ground-up understanding of how the physical world works.
			  
			  
			  
			AI's Pathway toward Ultimate Truth
 
Freed from the crude human musings of Newton, and Einstein, and Hawking, embodied AIs will take a bizarre AlphaGo-style approach to understanding the world.

They'll poke and prod at reality, and observe the results, and build up their own theories in their own languages about what works, what doesn't, and why.
				
They won't approach reality like humans or animals do. They won't use a scientific method like ours, or split things into disciplines like physics and chemistry, or run the same kinds of experiments that helped humans master the materials and forces and energy sources around them and dominate the world.

Embodied AIs given the freedom to learn like this will be hilariously weird. They'll do the most bizarre things you can think of, for reasons known only to themselves, and in doing so, they'll create and discover new knowledge that humans could never have pieced together.
Unshackled from our language and thinking, they won't even notice when they break through the boundaries of our knowledge and discover truths about the universe and new technologies that humans wouldn't stumble across in a billion years.
 
We're granted some reprieve here: this isn't happening in a matter of days or weeks, like so much of what's going on in the LLM world.

Reality is the highest-resolution system we know of, and the ultimate source of truth. But there's an awful lot of it, and it's also painfully slow to work with; unlike in simulation, reality demands that you operate at one minute per minute, and you're only allowed to use as many bodies as you've actually built.
So embodied AIs attempting to learn from base reality won't initially have the wild speed advantage of their language-based forebears. But they'll still be a lot faster than evolution, with the ability to pool their learnings among co-operative groups in swarm learning.
 
Companies like Tesla, Figure and Sanctuary AI are working feverishly at building humanoids to a standard that's commercially useful and cost-competitive with human labor.

Once they achieve that - if they achieve that - they'll be able to build enough robots to start working on that ground-up, trial-and-error understanding of the physical world, at scale and at speed.
They'll need to pay their way, though. It's funny to think about, but these humanoids might learn to master the universe in their downtime from work.
 
Apologies for these rather esoteric and speculative thoughts, but as I keep finding myself saying, what a time to be alive!
 
OpenAI's o1 model might not look like a quantum leap forward, sitting there in GPT's drab textual clothing, looking like just another invisible terminal typist.

But it really is a step-change in the development of AI - and a fleeting glimpse into exactly how these alien machines will eventually overtake humans in every conceivable way.
For a wonderful deeper dive into how reinforcement learning makes o1 a step-change in the development of AI, I highly recommend the video from the excellent AI Explained channel.
 
			  
			  
			  
			
 
			  
			 
			